Chapter 1 Reflections on our tour of Cognitive Technologies
Matthew Crump, Brooklyn College and Graduate Center of The City University of New York
In Spring 2018 a small group of doctoral and master’s students and I took a tour on the theme of Cognitive Technologies. We were interested in two sub-themes that would connect cognition with technology: 1) examples where basic research and theory in cognition have led to applied technologies, and 2) examples where new tech has the potential to augment human cognition. Each week we covered a different topic, including speed reading, computational models of semantics, machine learning for classification, computational models of object recognition, decoding brain states, brain training, apps for education, lie detection technology, augmented reality and smart spaces, cognition on drugs, and the use of big data for informing theories of cognition.
1.1 Snake oil: An old technology
We found some “cognitive” technologies did not live up to their hype. For example, speed-reading techniques do little more than train people how to skim text quickly. Too bad any gains in speed are accompanied by losses in comprehension (Rayner et al. 2016). There appears to be no strong evidence that brain training games do anything beyond training people on the games themselves (Simons et al. 2016). No far transfer here. Similarly, the evidence for cognition-enhancing drugs (that do anything beyond enhancing alertness) is just as disappointing (Battleday and Brem 2015; Marraccini et al. 2016; Mehlman 2004).
1.2 Tech that works
We found several promising technologies and endeavored to connect their successes to formative ideas in Cognitive Psychology. We focused a great deal on insights from the instance-based view of cognition, and their connection to the many recent successes of machine learning applied to various classification problems. For example, problems that used to be identified as easy for people and hard for machines, like face and object recognition, word and document similarity analysis, gesture, posture, and emotion recognition, voice transcription, and language translation, all have working computational solutions. In general, many of these problems were solved by the same kind of solution: gather a large database of examples, represent each example as a collection of digital features, then train a machine learning algorithm on the examples and measure whether the classifier can generalize to accurately classify new examples that were not from the training set. This big-data approach often works quite well. Why does it work? My view is that its success was anticipated by instance theories.
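To make that recipe concrete, here is a minimal sketch in Python. It is only an illustration, not any of the systems mentioned above: the toy data, the two classes “A” and “B”, the nearest-neighbour classifier, and every function name are assumptions made up for the example.

```python
import math
import random


def make_examples(n_per_class, n_features, rng):
    """Generate toy labelled examples: each is (feature_vector, label)."""
    # Hypothetical classes: features of "A" cluster near 0, features of "B" near 5.
    centers = {"A": 0.0, "B": 5.0}
    examples = []
    for label, center in centers.items():
        for _ in range(n_per_class):
            features = [rng.gauss(center, 1.0) for _ in range(n_features)]
            examples.append((features, label))
    return examples


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def classify(features, training_set):
    """Give a new example the label of its nearest training example."""
    nearest = min(training_set, key=lambda ex: euclidean(features, ex[0]))
    return nearest[1]


def held_out_accuracy(n_per_class=50, n_features=4, test_fraction=0.2, seed=1):
    """Train on most examples, then score on examples never seen in training."""
    rng = random.Random(seed)
    examples = make_examples(n_per_class, n_features, rng)
    rng.shuffle(examples)
    n_test = int(len(examples) * test_fraction)
    test_set, training_set = examples[:n_test], examples[n_test:]
    correct = sum(classify(f, training_set) == label for f, label in test_set)
    return correct / n_test


if __name__ == "__main__":
    print(f"Accuracy on held-out examples: {held_out_accuracy():.2f}")
```

The important part is the split: the classifier is scored only on examples that were held out of the training set, which is what “generalize” means here. Real applications swap in far richer features and far more powerful learning algorithms, but the logic of the test is the same.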
1.3 Connection to Instance Theory
Instance theories of cognition have been developed to account for a range of cognitive abilities, from learning (Jamieson, Crump, and Hannah 2012) and memory (D. L. Hintzman 1988; D. L. Hintzman 1984; D. Hintzman 1986), through skill acquisition (Logan 1988) and categorization and concept formation (Jacoby and Brooks 1984), to judgment and decision-making (Dougherty, Gettys, and Ogden 1999). A common assumption among instance theories is that single experiences are a foundational unit of knowledge representation: People retain the details of their specific experiences. In some computational models, such as MINERVA (D. Hintzman 1986), this idea is expressed in terms of a multiple-trace architecture. Every experience is represented as a feature vector coding the “gory detail” of the elemental features of that experience. These experiences are laid down in an instance-based memory that has an ever-expanding body of unique traces, one for each successive new experience. The notion that people have instance-based memory representations has been met with some skepticism, perhaps due to the implausible idea that brains could have enough storage space to hold such a large repository of experiences: Instance theory must be wrong because you would receive a “hard drive is too full” message. Regardless of the limits of brain-space, the more interesting implication from instance theory is what can be done with a large pool of examples.
In Jacoby and Brooks’ (1984) view, a large pool of examples can provide a non-analytic basis for cognition. When people have a large number of examples, they can perform classification tasks on the basis of analogy by similarity. When presented with a face, rather than pulling up a feature-list for faces and inspecting each element in the stimulus to determine whether the rules say it can be called a face, people can look at the face, find that they can spontaneously recall many similar prior examples of this stimulus, then call it a face because it looks globally similar to other faces they have seen.
The essential ingredients for instance theory are 1) a large pool of examples, and 2) sensitivity to similarity. In computational models of instance theory, like most global matching memory models (Eich 1982; D. L. Hintzman 1984; Humphreys et al. 1989; Murdock 1993), both of these ingredients are present. The models store items, experiences, or examples, as high-dimensional feature vectors in a memory. The models also assume that retrieval from memory is done by cue-driven similarity. Specifically, the cues or features present in the current environment are represented as a high-dimensional feature vector, and this cue-of-the-present-moment (probe) is used to retrieve similar traces from memory. This can be achieved mathematically by computing the correlation or cosine between the features of the probe and the features of each trace in memory. Then, the model assumes that people might respond to the probe in the same way that they responded to the traces retrieved from memory.
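Here is a minimal sketch of those two ingredients in Python. It is not an implementation of MINERVA or of any other published model: the class name InstanceMemory, the six-feature vectors, the “face” and “house” responses, and the rule of responding the way most of the k most similar traces did are all assumptions chosen to keep the example small.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


class InstanceMemory:
    """A toy multiple-trace memory: every experience is kept as its own trace."""

    def __init__(self):
        self.traces = []  # list of (feature_vector, response) pairs

    def store(self, features, response):
        """Lay down a new trace; nothing already stored is overwritten or averaged."""
        self.traces.append((list(features), response))

    def retrieve(self, probe, k=3):
        """Return the k traces most similar to the probe, most similar first."""
        ranked = sorted(self.traces, key=lambda t: cosine(probe, t[0]), reverse=True)
        return ranked[:k]

    def respond(self, probe, k=3):
        """Respond to the probe the way most of the retrieved traces were responded to."""
        votes = {}
        for _, response in self.retrieve(probe, k):
            votes[response] = votes.get(response, 0) + 1
        return max(votes, key=votes.get)


memory = InstanceMemory()
# Hypothetical past experiences, coded as +1/-1 features.
memory.store([1, 1, 1, -1, -1, -1], "face")
memory.store([1, 1, -1, -1, -1, -1], "face")
memory.store([1, 1, 1, -1, -1, 1], "face")
memory.store([-1, -1, -1, 1, 1, 1], "house")
memory.store([-1, -1, 1, 1, 1, 1], "house")
memory.store([-1, 1, -1, 1, 1, 1], "house")

# A new stimulus that matches no stored trace exactly.
probe = [1, 1, 1, -1, 1, -1]
print(memory.respond(probe))
```

Note that nothing in the memory states a rule for what makes something a face. The probe gets called a face only because the traces it most resembles were called faces.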
Finally, instance theories work well when there is structure in the data. Structure refers to the fact that the features of our experiences are not random. They are correlated with themselves over time and space. If our experiences were random, our world would look and sound like the white noise from old TVs not capable of receiving a station. The insight from instance theory is that sensitivity to the structure of the world around us can be obtained by a reconstructive memory process capable of preserving the details of our experiences. The experiences contain the structure, and we use similarity between the experiences to become sensitive to that structure. Reasoning by analogy, we see this idea being validated by the many successes of machine learning techniques applied to previously hard classification problems. Those solutions were obtained by harvesting enough examples to support accurate classifier generalization to new exemplars.
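The point about structure can be illustrated with a small simulation, sketched here in Python under made-up assumptions: a “structured world” in which each experience is a noisy copy of one of two hypothetical category prototypes, and a “white noise world” in which every experience is an independent random vector. The function names, the 100 features, and the 10% noise level are all invented for the illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def random_vector(n_features, rng):
    """An experience with no structure: every feature is an independent coin flip."""
    return [rng.choice([-1, 1]) for _ in range(n_features)]


def noisy_copy(prototype, flip_prob, rng):
    """A structured experience: the prototype with each feature flipped with probability flip_prob."""
    return [-x if rng.random() < flip_prob else x for x in prototype]


def mean_similarity(group_a, group_b):
    """Average cosine over every pairing of a vector from group_a with one from group_b."""
    sims = [cosine(u, v) for u in group_a for v in group_b]
    return sum(sims) / len(sims)


def compare_worlds(n_features=100, n_items=20, flip_prob=0.1, seed=0):
    rng = random.Random(seed)
    proto_a = random_vector(n_features, rng)
    proto_b = random_vector(n_features, rng)

    # Structured world: two separate batches from category A, one batch from category B.
    a_first = [noisy_copy(proto_a, flip_prob, rng) for _ in range(n_items)]
    a_second = [noisy_copy(proto_a, flip_prob, rng) for _ in range(n_items)]
    b_batch = [noisy_copy(proto_b, flip_prob, rng) for _ in range(n_items)]

    # White noise world: two batches of experiences that share nothing.
    noise_first = [random_vector(n_features, rng) for _ in range(n_items)]
    noise_second = [random_vector(n_features, rng) for _ in range(n_items)]

    return {
        "structured_within": mean_similarity(a_first, a_second),
        "structured_between": mean_similarity(a_first, b_batch),
        "unstructured": mean_similarity(noise_first, noise_second),
    }


if __name__ == "__main__":
    for name, value in compare_worlds().items():
        print(f"{name}: {value:.2f}")
```

In the structured world, experiences from the same category should come out much more similar to each other than to experiences from the other category, so similarity carries information about category membership. In the white noise world the average similarity should hover near zero, and a similarity-driven memory has nothing to latch onto.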
1.3.1 Procedures of Mind
Our review of instance theories also anticipated some of the lackluster findings for cognition-enhancing technologies, like brain-training. The work of Kolers was particularly insightful (Kolers and Roediger 1984). Among other things Kolers spilled a great deal of ink on the topic of learning to read upside down (and other geometric rotations of text). Unsurprisingly, he found that people are worse at reading weird rotated text compared to normal upright text, and that people can learn to get better at reading these unfamiliar rotations. More important were his findings about what it was that people had learned: the details. For example, people learned about the very specific thing they were practicing. A subject learning to read upside-down text would not get better in general at reading just any upside-down text; instead, they would get better at reading the specific letters, words, and sentences contained in the examples they were learning. In other words, there wasn’t much far transfer to be had. In Kolers’ view, people were learning “procedures of mind” for solving the specific pattern-analyzing problems they were confronted with in the training examples. Here, specificity is the rule and generalization is the exception. Generalization can occur when the specific procedures applied to one problem happen to be useful for another. Taking a broader view, it appears that the specificity and lack of far transfer associated with learning new skills is a kind of hegemonic principle. For example, brain-training games train the game in particular, not the brain in general (Simons et al. 2016). Lumosity should have read Kolers.
1.4 Exciting Directions
Every week we read a few papers and everyone was assigned to find neat papers that we should all be aware of. We found some cool stuff. Here are a few highlights in no particular order.
1.4.1 Conversational AI
Really hard problems like making computers capable of being an engaging conversational partner are being tackled with some success. Amazon created the Alexa Prize, which would be awarded to a research group who could produce a chatbot (installed in Amazon’s talking speaker Alexa) that could engage a person in chat for 20 minutes, without the person getting tired and cancelling the conversation. The challenge is still standing after the 2017 competition, but they are running it again in 2018, presumably with more training examples to improve the algorithms. It’s around the corner.
1.4.2 Decoding Brain States
Multi-voxel pattern analysis, which classifies cognitive states from neural data, has been applied to many problems. It often works. So, now we can get some idea of what dreams you were dreaming while you were lying in the scanner (Horikawa et al. 2013). There are too many other examples to list.
1.4.3 Detecting Deception
Lie detection has moved beyond the polygraph. People might be bad at detecting some of the signals of lying vs. telling the truth (Vrij, Granhag, and Porter 2010), but machine learning techniques are being thrown at the problem with some success. How about lie detection using fMRI (Langleben et al. 2005), videos of your face (Meservy et al. 2005), records of language production (Matsumoto and Hwang 2015), or typing (Derrick et al. 2013)? It seems to work better than chance, often much better.
1.4.4 Inner Voice decoding with a chinstrap!
It’s still in development, but this chinstrap tech called AlterEgo can be used to decode what your inner voice is saying. Umm what?
1.4.5 Image Memorability
What if there were a machine that could tell you how intrinsically memorable something is? Ad agencies would be into this: they might want to know what pictures would stick most strongly in people’s memories. Some big data initiatives are now sorting this out by having loads of people do memory tasks for loads of pictures (Isola et al. 2011). The result is a massive database of pictures normed for their memorability (Khosla et al. 2015). This can be used to predict memorability of pictures. Baby steps, but moving forward.
1.5 That’s all
There’s a lot going on in the broad area of cognitive technologies. Researchers in cognition have growing opportunities to make use of computational models and big data to make theoretical and applied progress. It’ll be fun to see what happens.
Rayner, Keith, Elizabeth R. Schotter, Michael EJ Masson, Mary C. Potter, and Rebecca Treiman. 2016. “So Much to Read, So Little Time: How Do We Read, and Can Speed Reading Help?” Psychological Science in the Public Interest 17 (1): 4–34. http://psi.sagepub.com/content/17/1/4.abstract.
Simons, Daniel J., Walter R. Boot, Neil Charness, Susan E. Gathercole, Christopher F. Chabris, David Z. Hambrick, and Elizabeth A. L. Stine-Morrow. 2016. “Do ‘Brain-Training’ Programs Work?” Psychological Science in the Public Interest 17 (3): 103–86. doi:10.1177/1529100616661983.
Battleday, Ruairidh M., and A.-K. Brem. 2015. “Modafinil for Cognitive Neuroenhancement in Healthy Non-Sleep-Deprived Subjects: A Systematic Review.” European Neuropsychopharmacology 25 (11): 1865–81.
Marraccini, Marisa E., Lisa L. Weyandt, Joseph S. Rossi, and Bergljot Gyda Gudmundsdottir. 2016. “Neurocognitive Enhancement or Impairment? A Systematic Meta-Analysis of Prescription Stimulant Effects on Processing Speed, Decision-Making, Planning, and Cognitive Perseveration.” Experimental and Clinical Psychopharmacology 24 (4): 269.
Mehlman, Maxwell J. 2004. “Cognition-Enhancing Drugs.” The Milbank Quarterly 82 (3): 483–506. doi:10.1111/j.0887-378X.2004.00319.x.
Jamieson, Randall K., M. J. C. Crump, and Samuel D. Hannah. 2012. “An Instance Theory of Associative Learning.” Learning & Behavior 40 (1): 61–82. doi:10.3758/s13420-011-0046-2.
Hintzman, Douglas L. 1988. “Judgments of Frequency and Recognition Memory in a Multiple-Trace Memory Model.” Psychological Review 95 (4): 528.
Hintzman, Douglas L. 1984. “MINERVA 2: A Simulation Model of Human Memory.” Behavior Research Methods, Instruments, & Computers 16 (2): 96–101.
Hintzman, D. 1986. “Schema Abstraction in a Multiple-Trace Memory Model.” Psychological Review 93 (4): 411–28.
Logan, Gordon D. 1988. “Toward an Instance Theory of Automatization.” Psychological Review 95 (4): 492–527.
Jacoby, Larry L., and Lee R. Brooks. 1984. “Nonanalytic Cognition: Memory, Perception, and Concept Learning.” The Psychology of Learning and Motivation 18: 1–47.
Dougherty, Michael RP, Charles F. Gettys, and Eve E. Ogden. 1999. “MINERVA-DM: A Memory Processes Model for Judgments of Likelihood.” Psychological Review 106 (1): 180.
Eich, Janet M. 1982. “A Composite Holographic Associative Recall Model.” Psychological Review 89 (6): 627–61.
Humphreys, Michael S., Ray Pike, John D. Bain, and Gerald Tehan. 1989. “Global Matching: A Comparison of the SAM, Minerva II, Matrix, and TODAM Models.” Journal of Mathematical Psychology 33 (1): 36–67.
Murdock, Bennet B. 1993. “TODAM2: A Model for the Storage and Retrieval of Item, Associative, and Serial-Order Information.” Psychological Review 100 (2): 183–203.
Kolers, Paul A., and Henry L. Roediger. 1984. “Procedures of Mind.” Journal of Verbal Learning and Verbal Behavior 23 (4): 425–49.
Horikawa, Tomoyasu, Masako Tamaki, Yoichi Miyawaki, and Yukiyasu Kamitani. 2013. “Neural Decoding of Visual Imagery During Sleep.” Science 340 (6132): 639–42. http://science.sciencemag.org/content/340/6132/639.short.
Vrij, Aldert, Pär Anders Granhag, and Stephen Porter. 2010. “Pitfalls and Opportunities in Nonverbal and Verbal Lie Detection.” Psychological Science in the Public Interest 11 (3): 89–121. doi:10.1177/1529100610390861.
Langleben, Daniel D., James W. Loughead, Warren B. Bilker, Kosha Ruparel, Anna Rose Childress, Samantha I. Busch, and Ruben C. Gur. 2005. “Telling Truth from Lie in Individual Subjects with Fast Event-Related fMRI.” Human Brain Mapping 26 (4): 262–72.
Meservy, Thomas O., Matthew L. Jensen, John Kruse, Judee K. Burgoon, and Jay F. Nunamaker. 2005. “Automatic Extraction of Deceptive Behavioral Cues from Video.” In International Conference on Intelligence and Security Informatics, 198–208. Springer.
Matsumoto, David, and Hyisung C. Hwang. 2015. “Identifying Universal Linguistic Features Associated with Veracity and Deception.” Humintell LLC, Berkeley, CA.
Derrick, Douglas C., Thomas O. Meservy, Jeffrey L. Jenkins, Judee K. Burgoon, and Jay F. Nunamaker Jr. 2013. “Detecting Deceptive Chat-Based Communication Using Typing Behavior and Message Cues.” ACM Transactions on Management Information Systems (TMIS) 4 (2): 9.
Isola, Phillip, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. 2011. “What Makes an Image Memorable?” In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, 145–52. IEEE.
Khosla, Aditya, Akhil S. Raju, Antonio Torralba, and Aude Oliva. 2015. “Understanding and Predicting Image Memorability at a Large Scale.” In Computer Vision (ICCV), 2015 IEEE International Conference on, 2390–8. IEEE.