One of my favorite characters in William Gibson’s Neuromancer was a so-called “psychological construct” named The Dixie Flatline. Dixie wasn’t a person, really, but an emulation of a famous computer hacker named McCoy Pauley (based on a brain scan that was made before he died). As he — or, it — said in a conversation with the novel’s protagonist Henry Case:
“Me, I’m not human … but I respond like one, see? … But I’m really just a bunch of ROM. It’s one of them, ah, philosophical questions, I guess….” The ugly laughter sensation rattled down Case’s spine. “But I ain’t likely to write you no poem, if you follow me.”
The Flatline was neither a human nor an artificial intelligence, but a machine that partially emulated how a human thought. It did a pretty good job, too, playing the central role of “smart guy” in the novel’s main cyberpunk-heist plotline. Yet it wasn’t a perfect human emulation: its laugh was “wrong,” and it was self-aware enough to note its own lack of creativity. Turning its ROM disk off and back on again totally reset Dixie’s memory, and later in the story the villain tried to take out Case first (who was still alive and human) precisely because the Flatline was a machine, and therefore much more predictable.
Cognitive Models and Their Uses
Regardless, it’s pretty cool to think about what we can accomplish with computational cognitive models derived from real data about real people. In Neuromancer, the data was McCoy Pauley’s brain scan, which was modeled and encoded into a computer program called The Dixie Flatline. The model wasn’t quite right, but it was still useful. All of that is science fiction, of course, but we are making progress in the real world, too. There are both practical and theoretical uses for these kinds of models, such as:
- “Encoding” a human thought process into a computer. It’s hard to “teach” computers directly. Most machine learning algorithms learn by example (i.e., from observational data), but there aren’t great ways for people to inject their instincts about a problem into the machine. If we have a good cognitive model that captures properties of our thinking, though, we can perhaps encode those properties more directly into a learning algorithm.
- Understanding how people think. If a computational model predicts real human behavior pretty well, then there’s a chance that it captures something real about how we think. And if its parameters are easily interpretable, we can gain insight into how our brains work, too.
With these in mind, let me summarize a recent collaboration with fellow computer/cognitive scientists at my alma mater UW-Madison. Here, the data consist of word lists that people think up, which we model computationally for both the practical and theoretical uses mentioned above. In fact, the paper is being presented at the ICML 2013 conference this week in Atlanta. We made a short video overview of the research, too:
That’s mostly me talking in the video, but Kwang-Sung will present it at the conference. The paper itself is here:
K.S. Jun, X. Zhu, B. Settles, and T.T. Rogers. Learning from Human-Generated Lists. Proceedings of the International Conference on Machine Learning (ICML), pages 181-189. 2013.
The SWIRL Model
Our new statistical model is called SWIRL for sampling with reduced replacement. As far as we know, this is the first attempt at a computational cognitive model for verbal fluency, the technical term for generating word lists from memory. Without going into too many details — read the paper for those — SWIRL estimates three kinds of parameters (s, α, λ):
- s – the “word size” statistics (uhmm… watch the video if you skipped it)
- α – the discount factor that reduces a word’s size after it’s been listed
- λ – the length: how long people go before running out of ideas
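To make the three parameters concrete, here is a minimal Python sketch of a SWIRL-style generative process. It assumes a simplified reading of the model (see the paper for the exact formulation): a list’s length is drawn from a Poisson distribution with rate λ, each word is drawn with probability proportional to its current size s, and a listed word’s size is then multiplied by α instead of being removed. The function names `sample_swirl_list` and `swirl_log_likelihood` are hypothetical, and the sketch leaves out parameter estimation entirely.

```python
import math
import random


def sample_swirl_list(sizes, alpha, lam, rng=None):
    """Draw one list from a SWIRL-style generative process (illustrative sketch).

    sizes: dict mapping each word to a positive "size" s_w
    alpha: discount in [0, 1]; a listed word's size is multiplied by alpha
    lam:   Poisson rate controlling how long the list is
    """
    rng = rng or random.Random()
    # Draw the list length n ~ Poisson(lam) with Knuth's multiplication method.
    n, p, threshold = 0, 1.0, math.exp(-lam)
    while True:
        p *= rng.random()
        if p <= threshold:
            break
        n += 1

    current = dict(sizes)  # working copy; sizes shrink as words are listed
    words = list(current)
    result = []
    for _ in range(n):
        weights = [current[w] for w in words]
        if sum(weights) <= 0:
            break  # every size is exhausted (only possible when alpha == 0)
        # Pick a word with probability proportional to its current size.
        w = rng.choices(words, weights=weights)[0]
        result.append(w)
        current[w] *= alpha  # "reduced replacement": shrink the word, don't remove it
    return result


def swirl_log_likelihood(words, sizes, alpha, lam):
    """Log-probability of an observed list under the same sketch (requires lam > 0)."""
    n = len(words)
    # Length term: log Poisson(n; lam).
    ll = n * math.log(lam) - lam - math.lgamma(n + 1)
    current = dict(sizes)
    for w in words:
        size = current.get(w, 0.0)
        if size <= 0:
            return -math.inf  # word unknown, or already used up when alpha == 0
        ll += math.log(size / sum(current.values()))
        current[w] = size * alpha
    return ll
```

With α = 1 the sizes never change, so this reduces to sampling with replacement from a fixed multinomial: swapping the order of two listed words leaves the likelihood unchanged. With α < 1, order matters and a repeated word becomes less probable each time it is listed; α = 0 is sampling without replacement.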
The set of s parameters is useful for encoding the human list-generation process into computer systems, while α and λ are also useful for understanding what’s going on inside our noggins. I’ll tackle these in turn…
Improving Text Classifiers with SWIRL
Most supervised machine learning algorithms take data as input → output pairs, and “learn” how to map inputs to outputs from statistical regularities in the data. Consider text classifiers like the ones that filter your email or classify online product reviews as positive or negative in sentiment. These classifiers are usually “trained” on hand-labeled texts, and it often takes a ton of these texts before the statistics can do a good job of teasing out which words predict which category labels.
But you already know that “excellent” indicates a positive review and “horrible” indicates a negative one… even before anyone labels any text! Over the past five years (or so), we’ve seen several new semi-supervised learning algorithms that use this kind of human “domain knowledge” together with a ton of unlabeled text. People can generate word lists for each category label, which can then be converted into priors for naïve Bayes (Settles, 2011) or constraints for logistic regression (Druck et al., 2008), among other things.
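One simple way to turn such lists into naïve Bayes priors is to add pseudo-counts to the Dirichlet prior of each word that people listed for a category. The sketch below assumes that scheme; it is an illustration rather than the exact method of Settles (2011), and the names `train_nb_with_word_priors`, `predict`, and the flat weight of 5.0 in the usage lines are hypothetical.

```python
import math
from collections import Counter


def train_nb_with_word_priors(docs, labels, word_weights, vocab, base=1.0):
    """Multinomial naive Bayes whose Dirichlet priors come from human word lists.

    docs:         list of token lists for any hand-labeled documents (may be empty)
    labels:       class label for each document in docs (must be keys of word_weights)
    word_weights: dict class -> {word: extra pseudo-count} built from people's lists
    vocab:        set of words the model scores
    base:         symmetric pseudo-count given to every (class, word) pair
    """
    classes = sorted(word_weights)
    counts = {c: Counter({w: base for w in vocab}) for c in classes}
    for c, weights in word_weights.items():
        for w, extra in weights.items():
            if w in vocab:
                counts[c][w] += extra  # human knowledge enters as pseudo-counts
    for doc, y in zip(docs, labels):
        counts[y].update(w for w in doc if w in vocab)  # real evidence, if any

    log_prob = {}
    for c in classes:
        total = sum(counts[c].values())
        log_prob[c] = {w: math.log(counts[c][w] / total) for w in vocab}
    # Add-one smoothed class priors (uniform when there are no labeled docs).
    class_counts = Counter(labels)
    log_prior = {
        c: math.log((class_counts[c] + 1) / (len(labels) + len(classes)))
        for c in classes
    }
    return log_prior, log_prob


def predict(doc, log_prior, log_prob):
    """Return the class with the highest naive Bayes score; unknown words are skipped."""
    scores = {
        c: log_prior[c] + sum(log_prob[c][w] for w in doc if w in log_prob[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Usage: no labeled documents at all, only one listed word per class.
vocab = {"excellent", "horrible", "plot", "acting"}
lists = {"pos": ["excellent"], "neg": ["horrible"]}
weights = {c: {w: 5.0 for w in ws} for c, ws in lists.items()}
log_prior, log_prob = train_nb_with_word_priors([], [], weights, vocab)
print(predict(["excellent", "plot"], log_prior, log_prob))  # pos
```

Giving every listed word the same flat weight treats each list as an unordered bag of words. Per-word weights from a model of how people generate lists could be passed in through `word_weights` instead, without changing the classifier itself.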
The problem is that these fancy approaches don’t account for the human mental process of generating word lists. The priors and constraints used in previous work generally assumed that people generate words in an IID fashion, ignoring valuable information from word order and repetition. SWIRL, on the other hand, can capture these properties and train better text classifiers:
This chart compares naïve Bayes (NB) and logistic regression (LR) classifiers on three data sets using semi-supervised word-labeling algorithms. The only difference is whether the priors (for NB) or constraints (for LR) were created using the SWIRL s parameters or a previously-published method. In nearly all cases, SWIRL is better and the gains are statistically significant (asterisks). This is an exciting step toward machine learning algorithms that benefit from cognitive models of how people think about the problem.
Understanding Cognitive Disorders with SWIRL
Verbal fluency tests are also widely used in neurology and psychology to help diagnose cognitive disorders. In particular, experts think that damage to the prefrontal cortex in the brain makes people more likely to repeat items (Baldo et al., 1998), and damage to the temporal cortex makes them generate shorter lists overall (Rogers et al., 2006). SWIRL’s α and λ parameters can help quantify both of these hypotheses. We ran the SWIRL algorithm on word lists from two kinds of people: patients with temporal-lobe epilepsy, and healthy control subjects. Here are the results for four different word categories:
According to the SWIRL models, patients (1) repeat words more often, with larger α, and (2) produce shorter lists, with smaller λ. Both findings are consistent with, and provide quantitative evidence for, the psychological hypotheses mentioned above. For the most part, the patients’ α parameter is larger than that of the controls, meaning their “word sizes” shrink less after being listed (which makes those words more likely to be repeated). However, the word size parameters s do not differ much between the two groups, suggesting that healthy and brain-damaged individuals have essentially the same “mental distributions” over words in these categories. So brain injuries might not alter what the patients know as much as how they recall it. SWIRL can also do a pretty good job of predicting whether people have epilepsy or not based simply on the lists they generate.
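One way to picture that prediction step is as a likelihood-ratio test: fit one set of SWIRL parameters per group, then assign a new list to whichever group’s model makes it more probable. The sketch below assumes that approach (the paper’s classifier may differ in its details) and reuses a simplified SWIRL likelihood with a Poisson length term. The function `classify_list` is hypothetical, and the group parameters are made-up numbers chosen only to mirror the pattern above (patients: larger α, smaller λ, same s), not estimates from the study.

```python
import math


def swirl_log_likelihood(words, sizes, alpha, lam):
    """Simplified SWIRL log-likelihood: Poisson(lam) length, size-proportional
    draws, and each listed word's size multiplied by alpha afterward."""
    n = len(words)
    ll = n * math.log(lam) - lam - math.lgamma(n + 1)
    current = dict(sizes)
    for w in words:
        size = current.get(w, 0.0)
        if size <= 0:
            return -math.inf  # unknown word, or a word already used up
        ll += math.log(size / sum(current.values()))
        current[w] = size * alpha
    return ll


def classify_list(words, group_params, log_priors=None):
    """Pick the group whose SWIRL parameters give the list the highest score.

    group_params: dict group -> (sizes, alpha, lam)
    log_priors:   optional dict group -> log prior probability of that group
    """
    scores = {}
    for group, (sizes, alpha, lam) in group_params.items():
        prior = 0.0 if log_priors is None else log_priors[group]
        scores[group] = prior + swirl_log_likelihood(words, sizes, alpha, lam)
    return max(scores, key=scores.get), scores


# Hypothetical parameters: identical word sizes, different alpha and lambda.
sizes = {f"animal{i}": 1.0 for i in range(30)}
params = {
    "control": (sizes, 0.1, 15.0),
    "patient": (sizes, 0.6, 8.0),
}
short_with_repeats = ["animal0", "animal1", "animal0", "animal2", "animal1"]
long_no_repeats = [f"animal{i}" for i in range(14)]
print(classify_list(short_with_repeats, params)[0])  # patient
print(classify_list(long_no_repeats, params)[0])  # control
```

Because both groups share the same sizes here, the decision is driven entirely by α (how costly repeats are) and λ (how plausible the list’s length is), which is the pattern the results above describe.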
I’m excited about this new courtship between cognitive modeling and applied machine learning. Even if we are still decades away from psychological constructs like the one in Neuromancer, there is a lot of potential today in using simpler cognitive models and machine learning to encode human thought processes into computer systems.
In the meantime, we still have many improvements to make to SWIRL. For one thing, right now we ignore the fact that people go on semantic runs (e.g., naming several sea creatures before switching to a string of different bird species). If we can do a better job of incorporating the semantics of consecutive words in these lists, it might yield a more accurate model of the human list-generation process, and might also provide more useful information that can be encoded into machine learning algorithms…