How we think (and AI systems don’t)

Image by Andrew Ostrovsky  © 2011

In my April 9 post, I discussed the possible benefits and (more significantly) drawbacks and likely dangers of using Large Language Model (LLM) AI systems in mathematics. I promised I would post a follow-up article on how humans do mathematics. Or, more accurately, how we think we do it. (Not the general “we”, rather the cognitive scientists who study that kind of thing.)

Though many of us have been surprised at the performance of the more recent LLMs, humans designed them and we know how they work, down to the fine detail. In contrast, we have no real knowledge of how our minds work. We did not design them. In fact, it’s by no means certain that science, in its current state, is capable of ever describing, let alone explaining, how our minds work. Indeed, there is a strong natural-selection argument to be made that we will never have conscious access to the “inner workings” of our minds. (See the MAA Devlin’s Angle essays for January and February 2023.)

The best we can do is list some features of how we think. Insofar as mathematical thought is just a highly restricted form of human thought (as language and music are restricted forms of human communication), these considerations provide a starting point for speculating how we do mathematics. 

  • Our brains evolved to help us survive in our environment. We possess perception and action systems that intervene on the external world and generate new information about it. Those action systems utilize causal representations that are embodied in theories (scientific or intuitive) and are also the result of truth-seeking epistemic processes. We evaluate those theories with respect to an external world and make predictions about and shape actions in that world; new evidence from that world can radically revise them.
  • Causal representations, such as perceptual representations, are designed (by natural selection) to solve “the inverse problem”: reconstructing the structure of a novel, changing external world from the data we receive from that world.
  • Those representations may be very abstract, as in scientific theories, but they ultimately depend on perception and action—on being able to perceive the world and act on it in new ways.
  • As a result of cultural evolution, we can learn not only from our experience in the world, but also from one another. There is a balance between two different kinds of cognitive mechanisms. Innovation produces novel knowledge or skill through contact with a changing world.  Imitation allows the transmission of knowledge or skill from one person to another.
  • Imitation means that each individual does not have to innovate—we can take advantage of the cognitive discoveries of others. But imitation by itself would be useless if some agents did not also have the capacity to innovate. It is the combination of the two that allows cultural and technological progress. 
  • At virtually every stage, our cognitive apparatus and reasoning confront the world, either directly or through other agents we interact with.

Now contrast the above considerations with an LLM, which aggregates large amounts of information that have been generated by people and uses relatively simple statistical inference to extract patterns from that information. No contact with the world, no mediation by the world.

The design of an LLM certainly allows for the production of information we were not previously aware of, and we may sometimes be surprised by the results we get. Nevertheless (in contrast to the human mind) we understand the mechanism of production, down to the fine detail. And it has none of the real-world guardrails of human cognitive activity.

Nothing in the training or objective functions of an LLM is designed to fulfill the epistemic functions of truth-seeking systems such as perception, causal inference, or theory formation.
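
To make that point concrete in a rough, schematic way (this is the standard next-token training objective used for autoregressive language models in general, not the proprietary details of any particular system): the model’s parameters $\theta$ are tuned to minimize a cross-entropy loss over its training corpus,

$$\mathcal{L}(\theta) \;=\; -\sum_{t}\log p_\theta\big(w_t \mid w_1,\dots,w_{t-1}\big),$$

where $w_1, w_2, \dots$ are the tokens of the training text. Every term in that sum rewards reproducing the statistical distribution of text people have already written; no term measures correspondence with the world.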

That leaves them with the role of (potentially) useful tools for our individual and societal use. That may turn out to be progress. This new tool may change the way we live and work; if so, then likely in ways that will surprise us (maybe even horrify us, at least initially). Or it may turn out to be, for the most part, a relatively short-lived, hype-driven bubble. My instinct says it’s the latter, but as a scientist I remain open to being convinced otherwise.

As with any tool that has “hidden parts”, however, it is critical that anyone who uses it understands how it works, what its limitations are, and what dangers those limitations can lead to. Emily Bender’s wonderful term “stochastic parrots” to describe LLMs first appeared in a 2021 research paper titled On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Note the word “dangers” in that title. (The paper appeared before ChatGPT burst onto the scene.) I’m on record as describing LLMs, tongue-in-cheek, as “tech-bro mansplaining”.

Reference: There’s a mass of literature on the issues described above. I’ll cite one recent paper that provides an initial gateway to that literature; I used it as a reference source for the outline above: Eunice Yiu, Eliza Kosoy, and Alison Gopnik, “Transmission Versus Truth, Imitation Versus Innovation: What Children Can Do That Large Language and Language-and-Vision Models Cannot (Yet)”, Perspectives on Psychological Science (Association for Psychological Science), 2023, pp. 1–10, DOI: 10.1177/17456916231201401.