How language models work, and what they teach us about human intelligence
Language models are statistical representations of human thought-space, with profound implications for philosophy of mind
Introduction
I’ve been digging into the literature on Large Language Models (LLMs) in a quest to understand how they work from the ground up, and in the process I’ve been brewing some thoughts. Broadly, I want to propose that:
An LLM is a statistical representation of the structure of human thought as reflected in human language;
This statistical representation is created through a process of iterative “training” that resembles both natural selection and early childhood education;
Generative AI emulates human reasoning by running a predictive algorithm on this statistical representation; and
This may not be that vastly different from how reasoning works in the human brain.
Additionally, I’d like to propose a couple metaphors that may be useful for thinking about these points:
An LLM can be thought of as a high-dimensional coordinate system representing “human thought space”; and
Generative AI can be thought of as traversing this space, choosing a likely path from point to point by extrapolating from a user-supplied travel history that has been mapped onto the terrain.
Let me begin by saying that I’m not an expert on LLMs, and this post is sure to be controversial. I am a reductionist with respect to human consciousness—which means that I believe human thought, experience, and consciousness exist entirely within the human brain—and you’ll find that perspective reflected in this post.1 To conclude, as I do, that LLMs are “statistical models of language models of mental models,” you have to be willing to accept a deterministic account of human thought as having “shapes” that can be approximately described in high-dimensional mathematical space.
Understanding language modeling and word embeddings
Central to language models is word embedding—a method of translating words into numerical vectors. “Numerical vector” is just fancy math-speak for an ordered sequence of numbers. So each word is turned into a numerical sequence.
Let’s imagine words as points on a map. On a simple map of a two-dimensional space, each point can be described by a pair of coordinates.
If you add a third dimension, you have to add an extra coordinate. So instead of describing a point’s coordinates with the vector “(x, y),” you would describe it with the vector “(x, y, z).”
As mere three-dimensional beings, we humans can’t actually visualize a space with more than three dimensions.2 But we can conceptualize such a space, and we can describe four-dimensional or five-dimensional points with vectors of four or five coordinates—e.g., “(x, y, z, w, v).”
In the world of language modeling, we’re dealing with a map of a hypothetical mathematical space with many more dimensions—often hundreds—where each dimension can be thought of as representing a different “feature” of words’ usage or meaning. These features aren’t simple attributes like a word’s length or its part of speech. Instead, they capture more abstract and less human-interpretable aspects of how the word is used and what it means. The numerical vector for each word gives the word’s coordinates on a map of this high-dimensional space, as inferred from some corpus of training data.
In a well-trained word embedding, words that are often used in similar contexts—like “king” and “queen”—will end up close together in the high-dimensional space, while words with less in common will be further apart. Relationships between words can even be represented as spatial directions and distances. For instance, the vector that takes you from “king” to “queen” might be very similar to the one that takes you from “man” to “woman.”
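To make the geometry concrete, here is a minimal sketch in Python. The words, the three-dimensional vectors, and the numbers are all invented for illustration (real embeddings are learned from data and have hundreds of dimensions), but the arithmetic is the same kind a real embedding supports:

```python
import numpy as np

# Toy, hand-made 3-dimensional "embeddings" -- purely illustrative.
# Real models learn these vectors, and they have hundreds of dimensions.
embeddings = {
    "king":   np.array([0.80, 0.65, 0.15]),
    "queen":  np.array([0.78, 0.68, 0.85]),
    "man":    np.array([0.20, 0.10, 0.12]),
    "woman":  np.array([0.19, 0.12, 0.82]),
    "banana": np.array([-0.50, 0.90, 0.40]),
}

def cosine_similarity(a, b):
    """How closely two word vectors point in the same direction (1.0 = identical direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words used in similar contexts end up close together...
print(cosine_similarity(embeddings["king"], embeddings["queen"]))   # relatively high
print(cosine_similarity(embeddings["king"], embeddings["banana"]))  # much lower

# ...and relationships become directions: king - man + woman lands near queen.
analogy = embeddings["king"] - embeddings["man"] + embeddings["woman"]
print(cosine_similarity(analogy, embeddings["queen"]))              # very high
```

With well-trained embeddings in place of these made-up numbers, this same vector arithmetic is what lets “king minus man plus woman” land near “queen.”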
Model training: evolving a thought-space representation
How do we arrive at these high-dimensional maps of words? We train a language model on a large corpus of text data, such as books, articles, and websites. For each word in this corpus, we generate a word embedding vector populated with random numbers. Then we take sequences of words from the training corpus and split them, attempting to predict the end of the sequence from the beginning of the sequence based on our embedding vectors. Then we compare our predicted word sequence with the actual word sequence, measure the “error” using what’s called a “loss function,” and nudge our parameters (including the values in the word embeddings) in the direction that most reduces the error.
We repeat this process many times, gradually refining our coordinate system until our word embedding vectors better capture the patterns and relationships inherent in the data. This iterative learning process—predict, compare, adjust—continues over many iterations, enabling the model to progressively improve its understanding of language structure and context as captured in the high-dimensional space of word embeddings.
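Here is a rough sketch of that loop in Python, using the PyTorch library (a framework choice on my part; any deep-learning library would do). The toy corpus, the tiny embedding size, and the single linear layer that predicts the next word from only the current word are drastic simplifications invented for illustration; a real LLM predicts from long contexts with many transformer layers. But the rhythm of predict, compare, adjust is the same:

```python
import torch
import torch.nn as nn

# A toy "corpus" and vocabulary -- invented for illustration only.
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
word_to_id = {w: i for i, w in enumerate(vocab)}
ids = torch.tensor([word_to_id[w] for w in corpus])

# Each word starts as a random vector (here 8-dimensional; real models use hundreds).
embedding_dim = 8
model = nn.Sequential(
    nn.Embedding(len(vocab), embedding_dim),  # the word-embedding table
    nn.Linear(embedding_dim, len(vocab)),     # scores for "which word comes next?"
)
loss_fn = nn.CrossEntropyLoss()               # the "loss function" measuring error
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Split the corpus into (current word -> next word) pairs and train.
inputs, targets = ids[:-1], ids[1:]
for step in range(500):
    logits = model(inputs)           # predict the next word from the current one
    loss = loss_fn(logits, targets)  # compare the prediction with the actual next word
    optimizer.zero_grad()
    loss.backward()                  # find which direction reduces the error...
    optimizer.step()                 # ...and nudge the embeddings and weights that way
```

Run long enough, the loss falls and the once-random embedding table comes to encode which words tend to follow which: the beginning of the “map” described above.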
It's important to note that these word embeddings don’t contain any human-interpretable understanding or knowledge of the words they represent. They’re purely mathematical entities—vectors of real numbers—that the model has learned to associate with different words through a trial-and-error process of trying to predict completions of word sequences from its corpus of training data. But they collectively add up to a mathematical abstraction that is more than the sum of its parts: something like a high-dimensional spatial approximation of the shape of human thought. The same way you might extrapolate a stream of water’s path down the lower half of a mountain from the approximate shape of the landscape and its path on the upper half, you might also extrapolate the path that a half-finished human thought might take if you know the approximate shape of the landscape of human thought.
Notably, the trial-and-error process of training a language model resembles natural selection. Nature “evolves” new organisms by making incremental changes from generation to generation and then selecting the “fittest” or most successful offspring, much as training keeps the parameter adjustments that yield the most successful predictions. Instead of a loss function, nature measures error through reproductive failure and extinction. Over many generations, this process evolves DNA sequences (vectors?) that generate or “predict” an organism that will succeed in a particular natural environment.
We can also draw a parallel with human learning. Consider how a child learns to speak. At first, the child says words or phrases that don’t make sense in context—“prediction errors,” if you will. The reactions of those around the child provide feedback, similar to the error measurement in a language model. If the child's utterance fails to achieve a desired outcome, they’ll adjust their future speech accordingly. But if it results in reinforcement, the child will repeat that word or phrase in similar contexts in the future.
Like training a language model, both evolution and human learning involve making predictions, receiving feedback, and making adjustments based on that feedback, all with the aim of improving future performance. And in each case, the resulting models—whether language model vector embeddings, DNA sequences, or human children’s neural networks—are noisy and mostly not human-interpretable. We can make some broad generalizations about how different features of each “model” influence its “predictions,” but for the most part, the models’ relationships to their predictions are inscrutably complex.
Large language models as reasoning emulators
Equipped with an understanding of word embeddings and the training process, we can turn to the notion of LLMs as reasoning emulators.
Reasoning, in the human sense, involves mental models that include but are not fully reducible to language. We might, for instance, simulate sound, imagery, abstractions, or emotions in our heads without putting words to any of it. Certainly sensory inputs don’t arrive in our brains in linguistic form.
However, humans have, at one time or another, put words to all of this, so the mental processes involved are represented linguistically in LLM training sets. Furthermore, in training a high-dimensional model on a large training set of human language, you may end up with mathematical abstractions that do the same work as some of the human mental simulations described above. You probably won’t get emergent AI consciousness, intentions, emotions, and beliefs, but you will get complex mathematical heuristics that effectively predict the outputs of these.3
In computer science, the concept of “emulation” refers to imitating or simulating one software or hardware architecture on top of another—for instance, running a virtual macOS machine on a Windows PC, or a Nintendo Game Boy console on a computer. You’re not actually using the emulated system, but you’re approximating it so closely that, given the same inputs, your emulator will deliver the same outputs.
Insofar as an LLM is a high-dimensional map of human thought, there will be structures and patterns encoded in its terrain that emulate the structures and patterns involved in human understanding and reasoning, including all the hot, fuzzy, emotional, and metacognitive bits. These can be used to very convincingly emulate the outputs of the humanOS.
Something that remains an open question for me is to what extent, in the process of predicting text outputs, generative AI is meaningfully traversing these structures and patterns as opposed to merely plotting a course through them on a map and seeing where we’d end up. I suspect it’s much more like the latter than the former, but the reductionist in me also wonders if there’s really a meaningful difference between traversing and “merely” plotting a course.
Language modeling in human thought
There’s a sense in which we humans made the first language models inside our heads. We’ve created linguistic representations for conscious experiences that our primate ancestors shared with us but could not discuss or articulate. Beyond that, we’ve made language an integral part of our thought processes. Much human reasoning unfolds as language and would be impossible in its absence. Indeed, a popular hypothesis in biolinguistics holds that the critical breakthrough in the evolution of human intelligence occurred 70,000 years ago, when we acquired the capacity to use “recursive grammar” in linguistic communication.4
As discussed above, we humans have mental models that fuse linguistic with non-linguistic mental experience. Concurrently, we also have a second-order, metacognitive mental language model that we use to describe the first-order mental model. An LLM, then, is effectively a third-order statistical model of our language model of our mental model. That it emulates human outputs so well is a testament to the representational power of both language and statistics. It’s also a testament to just how integral language is to human thought.
I’ve heard several AI influencers spitball that maybe human cognition is just language modeling, an idea they propose with a laugh and then dismiss just as quickly. But while human cognition isn’t just language modeling, there’s a surprising amount of social science support for the idea that it’s partly or even mostly that.
Consider the hypothesis of linguistic relativity (often known as the Sapir-Whorf hypothesis), which posits that the structure of a language profoundly shapes its speakers’ perceptions and cognition. Researchers like UC San Diego professor Lera Boroditsky have found considerable evidence for this in various contexts.5 For instance, different languages divide up the color spectrum differently, which is arguably why the Greek poet Homer, who described the sea as “wine-dark,” had no word, and perhaps no concept, for “blue.”
A classic eighteenth-century theory of psychology, called “associationism,” attempted to “answer the question of how many mental processes there are by positing only a single mental process: the ability to associate ideas.”6 While this theory overstated the case, modern experiments with “priming” have led psychologists to revive the theory in a version called “neo-associationism.” Much like a “prompt injection” attack on an LLM, “priming” experiments on humans involve exposing them to words that “activate” associated words, thoughts, and emotions in the brain, causing behavior changes up to two hours later. For instance, overhearing words related to aggression can lower your threshold for engaging in aggressive confrontation. In effect, “priming” is just “prompting” for humans.7
Language within the human brain, like word embeddings in an LLM, can be seen as a spatially organized set of predictive relationships between words. The brain is a physical network, with its own physical terrain and paths connecting its nodes. This network is not a two-dimensional flatland but a high-dimensional space, akin to the mathematical spaces used in language models. Priming works because activating a node in the brain’s neural network also activates connected nodes, making those nodes more accessible to the brain in completing its thoughts. “Implicit bias,” the way that negatively charged words come to mind more readily than positively charged words when we hear a stereotypically African American or Muslim American name, works the same way.8
Research on child development also supports the analogy between humans and language models. Numerous studies have found that exposure to a rich “early language environment” promotes better cognitive development and better academic achievement later in life.9 Like LLMs, children learn better when they are exposed to a larger number of words and higher-quality verbal interactions that connect words to context. The so-called “million word gap,” which has been hypothesized to explain some of the achievement gap between rich and poor children, can be compared to the difference in training set size between GPT-3.5 and GPT-4.10
The idea of human thought as language modeling brings us to the Austrian philosopher Ludwig Wittgenstein, who argued, in effect, that we humans can’t access our own mental models; we can only access our language models of our mental models. Likewise, we can’t access the world; we can only access our language models of the world. For Wittgenstein, whatever lies beyond the limits of language is unknowable “nonsense.” He said, “the meaning of a word is its use in the language.” That is, words acquire their meaning from their contextual relation to other words in the “language games” we play, not from their relation to something beyond language. I think Wittgenstein would have been very pleased to find LLMs building successful working models of the world from playing predictive language games on a training corpus of human language, without reference to anything else.11
Conclusion
In this short essay, we’ve seen how AI tools not only emulate human thought processes but also shed light on human thought.
The parallels between human cognition and language modeling aren’t incidental. LLMs and human brains use similar processes of learning from linguistic data and refining predictions based on feedback to minimize error. In doing so, they create high-dimensional predictive models that they use to generate nuanced language outputs.
Mathematician and computer scientist Stephen Wolfram points to the syntax of language as an illustrative example of the power of these models. “There are (fairly) definite grammatical rules for how words of different kinds can be put together,” Wolfram writes.
In English, for example, nouns can be preceded by adjectives and followed by verbs, but typically two nouns can’t be right next to each other. . . . ChatGPT doesn’t have any explicit “knowledge” of such rules. But somehow in its training it implicitly “discovers” them—and then seems to be good at following them.
Syntax is a simple enough subject that we humans are able to understand and describe its rules, but complex enough that we’re rightly impressed by ChatGPT’s ability to “discover” and “follow” approximations of these rules through matrix math.
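One crude way to see this in miniature, without any matrix math at all, is to count which words follow which in a tiny invented corpus. Nothing in the Python sketch below knows what a noun or an adjective is, yet the counts it accumulates already encode the corpus’s word-order regularities:

```python
from collections import Counter, defaultdict

# A tiny invented corpus -- just enough to show the idea.
sentences = [
    "the big dog chased the small cat",
    "the small dog saw the big cat",
    "a big cat chased a small dog",
]

# Count which word follows which (a bigram model: the crudest possible "language model").
follows = defaultdict(Counter)
for sentence in sentences:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1

# Nobody told the model that adjectives precede nouns, but the counts say so:
print(follows["big"])     # only nouns ("cat", "dog") follow "big"
print(follows["the"])     # only adjectives follow "the" in this corpus, never verbs
print(follows["chased"])  # articles follow the verb, never another verb
```

An LLM does something far more sophisticated with its high-dimensional embeddings and attention layers, but the principle is the same: grammatical regularities fall out of predictive statistics without ever being stated as rules.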
But syntax is only the simplest case. Extrapolating from this example, we can infer from ChatGPT’s ability to produce convincing solutions to novel problems that there is also a “grammar” of human reasoning, with its own “rules” that are too complex for humans to describe but apparently not too complex for an LLM to approximate with high-dimensional matrix math. That is a pretty incredible thing, one I would not have thought possible just a couple of years ago. To borrow another phrase from Wolfram, human reasoning turns out to be “computationally reducible”—a finding with revolutionary implications for the philosophy of mind. Moreover, these laws of reasoning are broadly discoverable in language. The “laws of language” provide a startling degree of access to the “laws of thought”!12
The interface between language and thought is a fertile ground of scientific and philosophical exploration, as illustrated by the hypothesis of linguistic relativity and the effects of priming. The role of early language exposure and a rich language environment in shaping cognitive development further underscores the importance of linguistic data in learning, for both humans and AI models.
Language is not merely a reflection of thought; it is its primary medium and inescapable structure. Wittgenstein’s philosophy, which posits that we understand the world entirely through our mental language models, offers a provocative perspective. It suggests that our understanding of reality is inherently mediated by language, and, by extension, that there may be no inherent limit to LLMs’ ability to simulate human thought.
While LLMs are (probably!) not conscious and do not experience the world as humans do, they are powerful tools for navigating the vast and complex terrain of human language and thought. They provide statistical approximations of cognitive processes that are normally hidden from view, offering us a new perspective on how those processes work and what they do.
In short, Large Language Models, though a type of artificial intelligence, provide us with a mirror to better understand the nature of human intelligence. As we continue to refine these models and broaden their applications, we can expect to “generate” even more insights about the relationship between language and thought.
Notes
This post grew out of a conversation with ChatGPT (the GPT-4 version), and the language model didn’t like my ideas very much. For a model that claims large language models can’t “have beliefs or philosophical stances,” GPT-4 insisted pretty hard on anti-reductionism, even when I asked it to be more reductive. When I suggested this might be either because of anti-reductionism in its training set or because of guardrails put in place by OpenAI, the model said no, its responses are anti-reductionist because human consciousness can’t be reduced. I strongly suspect this is a guardrail explicitly programmed into ChatGPT to avoid the sort of speculation about human-like chatbot consciousness that arose at Google.
Data visualizations sometimes use color, size, or even sound to represent additional dimensions when necessary. But while this can help convey information about individual data points, it does not convey information about the geometry of the four-dimensional space, such as the distances between points. This information cannot be visualized; it can only be represented mathematically.
A 2022 study noted that “instead of learning to emulate the correct reasoning function, BERT (an LLM) has in fact learned statistical features that inherently exist in logical reasoning problems.” Honghua Zhang, et al., “On the Paradox of Learning to Reason from Data,” 2022.
Andrey Vyshedskiy, “Language Evolution to Revolution: The Leap from Rich-Vocabulary Non-Recursive Communication System to Recursive Language 70,000 Years Ago Was Associated with Acquisition of a Novel Component of Imagination, Called Prefrontal Synthesis, Enabled by a Mutation That Slowed Down the Prefrontal Cortex Maturation Simultaneously in Two or More Children – the Romulus and Remus Hypothesis,” Research Ideas and Outcomes 5 (July 2019).
Lera Boroditsky, “How Language Shapes Thought,” Long Now Foundation, YouTube, June 11, 2020.
“Associationist Theories of Thought,” Stanford Encyclopedia of Philosophy, June 24, 2020.
For just one of the countless priming experiments, see Karolina Konopka, Joanna Rajchert, and Monika Dominiak-Kochanek, “The Influence of Aggression-Evoking Cues on Aggressive Cognitions in Males and Females: Different Procedures – Similar Effects,” Current Psychology 39 (2020): 128–41.
“Implicit Bias,” Stanford Encyclopedia of Philosophy, July 31, 2019.
For instance, Jill Gilkerson, et al., “Mapping the Early Language Environment Using All-Day Recordings and Automated Analysis,” American Journal of Speech-Language Pathology 26, no. 2 (May 2017): 248–65; Rachel R. Romeo, et al., “Beyond the 30-Million-Word Gap: Children’s Conversational Exposure Is Associated with Language-Related Brain Function,” Psychological Science 29, no. 5 (May 2018): 700–710.
Jessica Logan, et al., “When Children Are Not Read to at Home: The Million Word Gap,” Journal of Developmental & Behavioral Pediatrics 40, no. 5 (June 2019): 383–86.
“Ludwig Wittgenstein,” Stanford Encyclopedia of Philosophy, October 20, 2021.
Stephen Wolfram, “What Is ChatGPT Doing … and Why Does It Work?” Stephen Wolfram: Writings, February 14, 2023.