Cats and butterflies: 2 misunderstood analogies in scientistic discourse

The butterfly effect and Schrödinger’s cat are 2 very common ways of signalling one’s belonging to the class of the scientifically literate. But they are almost always told wrong. Both were constructed as illustrations of paradoxes or counterintuitive findings in science, and their retelling almost always misses the crucial ‘as if’.

This is an example of metaphor becoming real through its sheer imagistic and rhetorical power. But it also underscores the need to carefully investigate both domains being mapped onto each other, as well as the act of mapping. Metaphors used generatively are only useful if they are abandoned quickly enough. In this case, the popular imagination not only did not abandon the metaphor, it made it into a literal statement with practical consequences.

The two narratives are usually constructed in the form of:

Science shows us that

  1. Cats can be both dead and alive in a box with an apparatus controlled by superposed quantum states.
  2. Butterflies can cause hurricanes on the other side of the world.

But the actual formulation should be:

Science produces a lot of counterintuitive and (seemingly) paradoxical results some of which are at odds with each other and/or our experience of the world. For instance:

  1. If we were to apply the quantum principle of superposition to the world we know, we would have to admit that a cat in a box with a quantum killing machine is dead and alive at the same time. And that is obviously nonsense.
  2. If we were to apply what we know about chaos theory to the world of causes and effects, we would have to admit that a butterfly flapping its wings on one side of the world can cause a hurricane on the other side of the world. And that is obviously nonsense, because butterflies have no real impact on the world on scales larger than a flower.

Both Schrödinger and Lorenz were trying to illustrate the counterintuitive conclusions of their respective scientific fields – quantum mechanics and dynamical (chaotic) systems. And in both cases, it badly backfired.

Although Schrödinger’s dilemma was much more foundational to the structure of our universe, it was Lorenz whose funny paper at an obscure symposium did the more lasting damage.

It is of no practical consequence whether, in the retelling of Schrödinger’s paradox, we omit the word ‘paradox’ and assert that ‘science tells us that there are machines that can make cats alive and dead at the same time.’ This is merely par for the course in the generally magical nature of public scientific discourse. And it can even spur the development of new models of the physical universe.

But the ‘butterfly effect’ is more dangerous because it seems like it could have practical applications. Lorenz was not stating a paradox per se, only a counterintuitive conclusion that goes against our most common scenarios of causal relations. Our basic experience of the world is that big things move big things and small things don’t. So any suggestion that we can make small things move big things seems intriguing. We know levers and pulleys can get us some of the way there, so the dream of a magical lever is always with us. Homeopathy “works” on the same magical principle. But this is not the lesson of complexity.

The most common reformulation of the ‘butterfly effect’ is: ‘small actions can have a big impact’. However, this confuses sensitivity to initial conditions with a cumulative cascade effect. Every single snowflake contributes equally to an avalanche, as do all the other aspects of the environment. Although none of them individually has any effect on the world at human scale, combined they can move much larger objects. But it is that combination (into a larger whole) that has the impact.

Whether any particular snowflake is the last to fall, or whether somebody clapped loudly near a snowdrift just before the avalanche fell, says nothing about the ability of small things to have big effects – only about our inability to measure small variations in big things accurately enough.

It is not true that a butterfly’s wings cause anything but minute variations of the air right next to them. But it may be true that they are one of the infinite variations of the whole weather system that is simply impossible to measure with finite precision. It’s not that it is hard to calculate all the variations; it is that there are more variations than we have atoms to calculate them with.

Sure, we can talk about proximate and ultimate causes, but that again hides the problem of calculation. And if we ignore the practical problems of measuring the infinite with finite tools, we only get mired in philosophical musings on prime movers and free will. And these have yet to lead anywhere in two and a half millennia.

The only thing we can learn from the butterfly effect is that we cannot measure complex systems accurately enough to predict their long-term behavior. The big mismatch is that while the variation in ‘initial conditions’ is too small to measure, the variation in the outcomes is not. And that feels wrong.
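To see that mismatch concretely, here is a minimal numerical sketch in Python (the Lorenz parameters are the conventional ones; the step size, horizon and perturbation are arbitrary illustrative choices, not anything from Lorenz’s paper). Two trajectories of the same system start an unmeasurably small distance apart:

```python
# Two trajectories of the Lorenz system started 1e-9 apart.
# Simple forward-Euler integration; good enough for illustration.
def step(state, dt=0.01, s=10.0, r=28.0, b=8.0 / 3.0):
    x, y, z = state
    return (x + dt * s * (y - x),
            y + dt * (x * (r - z) - y),
            z + dt * (x * y - b * z))

a = (1.0, 1.0, 1.0)
b_ = (1.0 + 1e-9, 1.0, 1.0)  # a difference far below any measurement
for i in range(1, 5001):
    a, b_ = step(a), step(b_)
    if i % 1000 == 0:
        print(i, abs(a[0] - b_[0]))  # the gap grows by orders of magnitude
```

The perturbation does not ‘cause’ the divergence in any meaningful sense; it is simply amplified, step by step, until the two futures have nothing in common.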

Complexity is unsurprisingly too complicated to be captured by a single metaphor. The ‘butterfly effect’ is a good metaphor for the sensitivity to initial conditions aspect of it. But only if we understand that it is a metaphor that illustrates the counterintuitive nature of complexity and not complexity itself.

The larger lesson here is that metaphor is a process. It doesn’t just lie in a bit of text waiting for us to encounter it and understand it. It is picked up as part of stories. It is told, retold, reexamined, abandoned, readopted, and so on. If you unleash a generative metaphor on the world, you should keep an eye on it to make sure it’s still doing the job you meant it to do. That means a lot of talking and then more talking. Just like with butterflies, the ultimate outcome is never certain. That is fine. Metaphors are supposed to open up new spaces. But some of those spaces may have lions in them and we do know that lions have big impacts on human scales. Bon appetit!

3 “easy” things that are hard for both humans and AI

Everybody is agog at what AI systems can do. Nobody thought even 10 years ago that machines could be trained to recognise images or transcribe natural speech as well as they do now. And because of this leap forward, everybody has started worrying about AI taking over the world because it will soon be able to do everything people can do, only better.

On the other hand, there are AI naysayers who point at incredible feats of human creativity and ingenuity and say ‘no machines will ever be able to write a poem’ or ‘manage a company’.

While I’m more than skeptical about the true possibilities of AI, I am equally skeptical about this supposed limitless human creativity that is beyond the bounds of computation.

I think we can reveal more about the limits and nature of human intelligence, and thus the targets (and possible limits) of AI development, if we look at very simple things with which both humans and AI struggle, albeit in different ways.

Machines are often thought of as capable only of algorithmic processing (such as adding lots of numbers) while humans are thought to excel at massively parallel tasks – also known as intuition (such as telling dogs apart from cats). But we will see that they seem to trade these roles in the ways they approach, and fail at, these apparently simple problems.

I call these problems ‘easy’ because they can be broken into very easy and straightforward components. But they are hard, if not impossible, in reality because of the curse of dimensionality. Even the slightest variation in those simple components will grow into an exponential mess.

1. Figuring out time zones

Apple Watches recently stopped working because of Summer Time in Australia. And just the other day, Outlook asked me if I wanted to switch to a continental time zone in Europe. After I said yes, it started scheduling all meetings 2 hours off.

On the other hand, I’ve been arranging meetings between 2 time zones 1 hour apart for 20 years and I still get it wrong about 3 times out of 10.

So what gives? Time zones are conceptually very straightforward. You just have a database of times and places with notes on what time it is when and where relative to some fixed point. Then all you do is add or subtract anywhere between 1 and 12 hours. What could be easier?

Well, you have to account for changes of date, so you have to switch between today, tomorrow and yesterday quite a lot. But still. There is a finite number of times and places and their combinations, so how hard can it be for human programmers to sit down and write all the code once and for all? Turns out, incredibly hard. There are just too many permutations, and they keep changing as the database of times and places is updated with new information.
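A small sketch of how this bites in practice (Python 3.9+, which ships the standard zoneinfo module; the cities and dates are arbitrary examples): the offset between two zones is not a constant, because each zone changes its clocks on its own schedule.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # in the standard library since Python 3.9

prague = ZoneInfo("Europe/Prague")
sydney = ZoneInfo("Australia/Sydney")

# "The same meeting at 16:00 Prague time" on three Mondays in autumn 2023.
for y, m, d in ((2023, 9, 25), (2023, 10, 9), (2023, 10, 30)):
    meeting = datetime(y, m, d, 16, 0, tzinfo=prague)
    print(meeting.isoformat(), "->", meeting.astimezone(sydney).isoformat())

# The Sydney wall time drifts from midnight to 1 a.m. to 2 a.m., because
# Sydney starts daylight saving in early October while Prague ends it in
# late October.
```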

So, the magnitude of the problem seems to be too great for humans to come up with exhaustively detailed algorithms to deal with it. (To be clear, the core has been solved, but we don’t seem to be able to nail all the edge cases. Things would be a lot worse without computers.)

So why don’t we unleash deep learning on the problem? Well, partly because there’s no good data for a machine to learn on. This is mostly an algorithmic problem. But the inputs to the algorithm come from very human perceptions of how time relates to cyclical things like days, and from relational judgments like comparing states between time zones with respect to days. Again, none of this is all that complex. But the algorithmic part is too complex for humans to describe as a series of if-then commands to a computer without making lots of mistakes. And the perspective and context part seems to be completely outside of what any ML algorithm can access at the moment.

So we’re stuck with something that mostly works but not always and is mostly understood but also always confusing.

2. Scheduling a meeting

Scheduling meetings is another simple algorithmic problem related to time. Simply compare two series of numbers, find the slots that are free in both and spit out the result. But all of this starts interacting with a lot of human complexities that make the problem completely intractable if what we want to do is write a series of commands in the form of ‘if you see this, do that’.
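The ‘simple’ core really is simple. A toy sketch (the names and busy hours are invented for illustration):

```python
# Find the hours when neither person is busy, modelling a working day
# as integer hours from 9 to 16 inclusive.
def free(busy, start=9, end=17):
    return {h for h in range(start, end) if h not in busy}

alice_busy = {9, 10, 13}       # hypothetical calendars
bob_busy = {10, 11, 12, 14}

print(sorted(free(alice_busy) & free(bob_busy)))  # -> [15, 16]
```

Everything that makes scheduling hard happens before this code ever runs: deciding what counts as busy, whose priorities bend, and whether the calendars are telling the truth.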

That’s why the work of the human assistant handling the scheduling for a busy person (one who rates an assistant) is not just to provide her intelligence or understanding of calendars. It involves conversations with the person whose calendar is being managed about their priorities, options and possible scenarios, and conversations with other people and other assistants, eventually arriving at some compromise which is then entered into the straightforward if-then format of a calendar.

It is this conversation with other people that is often overlooked (“tell your people to call my people”). The assistants of the other busy people likewise have to synthesize the priorities and values of their charges through the same process of conversations and adjustments.

The final matching algorithm is very simple – so simple that it seems like no one should need a human assistant any more. But the inputs into the algorithm need to come from sources that are too rich and complex to treat algorithmically or through some multi-dimensional analysis of hidden regularities (like deep neural nets). The inputs require either a fairly general artificial intelligence (although not full-blown AGI) or that everyone keeps their calendar in the same way. (Even then we’d probably have to deal with travelling-salesman problems – but at least we have some ideas about the limits on the computability of those.)

There are many individual components of this process that could be algorithmically assisted. But often the simpler algorithms and heuristics such as preference polls and shared calendars are more effective aids than opaque machine learning output.

So although this looks like a problem that should be solvable through ML, early attempts have been less than impressive.

3. Importing data about people into a table (deduplication)

Computers are great as aids to managing structured data. But the input into the structure has to be provided by humans. Can AI help here?

Imagine you’re organising an event and you want people to tell you if they’re coming. You can just ask and keep in your head who said yes. But that soon becomes too much. As a next step, you ask them to email or send an RSVP so you can look at the messages and remind yourself who said yes. But even that becomes difficult soon, so you start a list. And the more people and events you need to manage, the more complicated the list gets and the more time you have to spend structuring your data and inputting it into some sort of a table for managing and reviewing it.

The world is littered with Excel sheets kept by event organisers. Now imagine you wanted to feed all the idiosyncratic Excel sheets with event information at a large organisation into a machine learning algorithm and get a single number for the total count of participants or the total cost of lunch breaks.

If everybody kept their spreadsheets exactly the same way, this would be trivial. But they don’t. Computers make the task of managing this kind of structured data much easier but they constantly struggle with errors in the input from busy, overworked and cognitively limited humans.

On the surface of it, there’s nothing to prevent this part (i.e. participant registration and management) of event management from being completely automated. But there’s always a person involved in dealing with it. So could there be an AI system that does all of this? So far, we’re not even very close. A system that processes some RSVPs via email, others via forms, and others from other sources (“Hi, Clare, Frances told me she was coming to your party!”) does not exist.

So let’s simplify the task even more. Take data from one table of a system (let’s say a registration table) and put it into another table on a different system (let’s say account creation). All the AI system would have to do is figure out what is important to one system and get the right data from the other. At the moment, humans are involved. In better cases, by creating an API and programming algorithms to transfer data between the systems. In worse cases, they download a spreadsheet from one system, modify it if needed, and upload it into another system.

This is trivial if you’ve designed both systems and know they have to integrate. But the permutations get out of hand surprisingly quickly when you take any 2 random systems designed by different people for a similar purpose but without the intention to integrate. Is ‘Last name’ always the ‘Second name’? When the full name is in one column, is the first name always first? Any one difficulty is easy for a human to spot and disambiguate. But it gets very error prone at scale, and there are always some unexpected edge cases.
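Here is a toy sketch of what that human disambiguation looks like once you try to write it down (the synonym table is invented and hopelessly incomplete, which is exactly the point):

```python
# Map a column header from one system onto a canonical field name.
SYNONYMS = {
    "last_name": {"last name", "second name", "surname", "family name"},
    "first_name": {"first name", "given name", "forename"},
}

def match_column(header):
    h = header.strip().lower()
    for canonical, variants in SYNONYMS.items():
        if h in variants:
            return canonical
    return None  # unmapped edge case: a human has to look at it

for col in ("Surname", "Given Name", "Name"):
    print(col, "->", match_column(col))
# "Name" falls through: is it "First Last" or "Last, First"?
```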

Even such a simple thing as contact deduplication between two devices of one person is not a completely solved problem.
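Part of the reason is that the fuzzy matching at the heart of deduplication yields scores, not decisions. A sketch using Python’s standard difflib (the names are invented):

```python
from difflib import SequenceMatcher

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

print(similarity("J. Smith", "John Smith"))    # same person: ~0.78
print(similarity("John Smith", "Joan Smith"))  # different people: ~0.90
# The true match scores lower than the false one; no fixed threshold
# can separate them.
```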

Why isn’t there an AI system that can evaluate the data and transfer it as appropriate, at scale and without the errors that human data processors, programmers or data processing algorithms make?

As always, even the most trivial algorithms require very complex inputs. And with even minor variation in the possible inputs, the if-then logic becomes too unwieldy. Although computers in general are great at pursuing if-then logic chains regardless of complexity (within limits), AI algorithms are not. They provide guesses with probabilities. In certain areas, most notably speech and image recognition, their guesses are becoming very good, approaching human performance. They may even outperform humans at scale.

But the if-then part of what to do with these guesses is still handled by if-then algorithms designed by humans. There’s some talk of ‘Programming 2.0’, but nobody seems to be applying it to these day-to-day simple problems with complex scaling. Because even small errors in the inputs result in big aggregate problems, and AI systems have no way of assessing whether their guesses ‘make sense’.

Is AI impossible?

Maybe AI is just too hard. But these examples don’t show that it is impossible; they show that some difficult problems are simply difficult, even if they appear straightforward on the surface.

I have learned not to bet against engineers’ ability to figure out solutions in the long run. It’s not always clear ahead of time what is solvable by AI and what is not. Sometimes, specialised ML systems can be developed to solve problems in ways that don’t generalise (e.g. Go or chess machines). But I would have expected more people to be working on these problems if there were an easy solution to be found. And there are hundreds more similar task-based problems that just won’t be magicked away by slapping the label ‘AI’ on them. Individual ones may be solved one way or another, perhaps by breaking them into component parts. But I’m not seeing any specific steps being taken to create general-purpose machine learning that would deal with all of them. Just wishful thinking about AGI (Artificial General Intelligence) emerging to solve these problems without regard to the actual complexity of some of them or the complexity of the intermediate steps it would take to get there.

So you think you have a historical analogy? Revisionist history and anthropology reading list

What is this about

How badly we’re getting history wrong

While the history and anthropology of the last 30-40 years have completely redrawn the picture of our past, the common perception of the overall shape of history and the development of humanity is still firmly rooted in the view that took hold in the 1800s, a mixture of Enlightenment and Romanticism.

On this view, we are the pinnacle of development, the logical and inevitable outcome of all that came before us. The development of what is us, the changes in history and culture, can be traced in a straight line from the primitive of the past to the sophisticated of the present. From the savage to the civilized (even if we may eschew these terms for more polite ones).

But nothing could be farther from the truth. The shape of global history looks nothing like what we have in our minds from textbooks and popular culture. For a start, it is a lot more complicated, circuitous and fuzzy than we might imagine. That won’t surprise many people. Things are always more complicated when looked at closely. But what I would suggest is that the popular image has completely misplaced the centre of gravity of historical and cultural development. It is the universe before Copernicus and Galileo; it is the physics before Einstein and Heisenberg.

Yet, all we need to find the right balance is readily available in print, online lectures and courses. We just need to seek it out.

What is on this list

In this post, I compiled what I consider key books of the last 20 years (with a few older exceptions) that can help anyone get a better picture of the history of human politics and culture. And through that history, we can also see the balance of the present better.

Not all these books are flawless and they all bring new biases into the picture. No doubt they, too, will eventually be subject to revision as new perspectives open up. Also, they don’t entirely reject all that came before them. They simply provide a better balance and shine light on important blind spots.

I can imagine that many people reading any one of these books might feel compelled to reject them as outliers. But together, they are hard to ignore. They come from different perspectives and disciplines, yet, they complement and reinforce each other.

This was originally meant to be a short list of a few key works, but as I was going through my notes, I kept adding new ones. I tried to keep the list to books that synthesize larger areas rather than histories or ethnographies of individual societies, even though these can often be just as illustrative.

Most of these books are histories or contain historical data. Yet, many are written by anthropologists or historians with a distinctly anthropological point of view. This very much reflects my personal bias towards the ethnographic.

I divided the list into 2 sections: 1. Easy reads for a general audience and 2. Dense and extensive works for specialists. But in this, I was very much going by intuition.

I decided to provide some illustrative quotes for each book but I went a bit too far with some of them. At the same time, I could have quoted many more important passages. Remember, they all make much more sense in context.

Where available, I also provided links to podcasts or online lectures by the authors. I also compiled a YouTube playlist with key videos which I will keep up-to-date as I discover more.

I would also recommend the New Books Network podcasts to anybody. I find those from New Books in History, Military History, South Asian Studies, Islamic Studies, Anthropology and Genocide Studies particularly illuminating and would recommend going through the archives, as well.

What are the key lessons

This section was rewritten based on Reddit comments.

The overarching message of these books is one of anti-reductionism. They do not look for inevitable overarching trends, but they do show repeating patterns. The key lessons that stand out to me from reading these books are:

  • The global dominance of Western-European culture and politics is a lot more recent than our history books taught us, pretty much starting with the Industrial Revolution and not completed until the end of the 19th century.
  • The balance of global history lies in the East rather than the West. Even those we consider the roots of our civilisation (Rome, Greece) looked to the East.
  • We are blinkered by focusing our perspective on civilizational artifacts such as architecture and writing. This leads us to overlook important political and social units that outnumbered those we can see at any one point in history.
  • The role of the state throughout history was much more complex and uncertain than it may seem from today’s perspective. It was much weaker, less stable and more transient. And it was also not nearly as attractive to its subjects – i.e. walls were often built more to keep people in than out.
  • We cannot view the ‘hunter gatherers’ and other ‘pre-technological’ societies of today as remnants of previous evolutionary stages of history. They are as much part of modernity as the technologically-dependent urban centres we know.

Who should read this

  • Anyone who thinks ‘Guns, Germs, and Steel’ is the last word in historical analysis.
  • Rationalists, economists and futurists. I very much enjoy listening to podcasts like EconTalk and Rationally Speaking. But whenever they or their guests make any points regarding history, I cannot but cringe.
  • Anyone who makes historical analogies based on what they learned in school.
  • When I last worked with Peace Corps volunteers, I shared some of these books with them and they were well received. So I think many development and international policy workers would also benefit.
  • Curriculum reformers in the mould of Michael Gove or Pat Buchanan.

Easy, accessible reads

I felt the books in this section are more accessible and aimed at audiences outside the strict confines of their discipline. Some of them are fairly popular accounts but they are all sufficiently scholarly that it is possible to track down their sources and confront them with alternative perspectives. None of them are by popularisers in the vein of Gladwell or Pinker.

‘Against the Grain: A Deep History of the Earliest States’ by James C. Scott, 2018

Scott is best known for ‘Seeing Like a State’ but this is a much more important and in many ways better book. His main thesis is ‘Everything we thought about the invention of agriculture and its role in the formation of early civilisations is wrong.’

In this book, Scott summarises recent decades of research on the emergence of agriculture and of early states and finds that we cannot trust any of our assumptions. The early states were temporary, partial and patchy. They cannot be seen as the final stage in some sort of process of social evolution – a point elaborated in greater detail by Yoffee below.

My main impression from this book is how recent the dominance of state control is. Until about 1500, most people lived outside the control of the great civilisational behemoths. And this was, for many of them, a conscious choice, as Scott described in his earlier book ‘The Art of Not Being Governed’ (also well worth a read).

Similarly to Diamond, Scott also focuses on the importance of certain crops but from the perspective of their utility for taxation. This point is elaborated in Graeber’s ‘Debt’ (see below).

You can see Scott speak about many of these points in several lectures.

Note: I wrote a review of this book for the Czech daily Lidové noviny.

Illustrative quotes

“Contrary to earlier assumptions, hunters and gatherers—even today in the marginal refugia they inhabit—are nothing like the famished, one-day-away-from-starvation desperados of folklore. Hunters and gatherers have, in fact, never looked so good—in terms of their diet, their health, and their leisure. Agriculturalists, on the contrary, have never looked so bad—in terms of their diet, their health, and their leisure.”

“In unreflective use, “collapse” denotes the civilizational tragedy of a great early kingdom being brought low, along with its cultural achievements. We should pause before adopting this usage. Many kingdoms were, in fact, confederations of smaller settlements, and “collapse” might mean no more than that they have, once again, fragmented into their constituent parts, perhaps to reassemble later. In the case of reduced rainfall and crop yields, “collapse” might mean a fairly routine dispersal to deal with periodic climate variation. Even in the case of, say, flight or rebellion against taxes, corvée labor, or conscription, might we not celebrate—or at least not deplore—the destruction of an oppressive social order?”

“until the past four hundred years, one-third of the globe was still occupied by hunter-gatherers, shifting cultivators, pastoralists, and independent horticulturalists, while states, being essentially agrarian, were confined largely to that small portion of the globe suitable for cultivation. Much of the world’s population might never have met that hallmark of the state: a tax collector.”

“Where grain, and therefore agrarian taxes, stopped, there too did the state’s power begin to degrade. The power of the early Chinese states was confined to the arable drainage basins of the Yellow and Yangzi Rivers. […] The territory of the Roman Empire, for all its imperial ambitions, did not extend much beyond the grain line.”

“much that passes as collapse is, rather, a disassembly of larger but more fragile political units into their smaller and often more stable components.”

‘Seven Myths of the Spanish Conquest’ by Matthew Restall, 2003

The title of the book says it all. Almost anything we say (and Jared Diamond said) about the likes of Columbus, Cortez or Pizarro is wrong. Factually and structurally. Perhaps the most important myth Restall presents is that of ‘completion’. The Spanish and later other conquests were more a case of expanding enclaves and negotiations. To imagine the conquistadors as ruling a geographic area in the same way a modern state governs its territory is completely misleading. This is also a point repeated in Yoffee and Scott with respect to ‘ancient civilisations’.

The other point made by Restall is the complete dependence of the European invaders on local political alliances and the relative ineffectiveness and ultimate irrelevance of their ‘technology’. We see this expanded into other contexts in Thornton and Sharman.

You can hear Restall talk about many of the same themes in an interview about his more recent book “When Montezuma Met Cortés” in this New Books Podcast.

There is also an illustrated lecture available on YouTube that covers the same topics.

Illustrative quotes

“Looking at Spanish America in its entirety, the Conquest as a series of armed expeditions and military actions against Native Americans never ended.”

“Only very gradually did community autonomy erode under demographic and political pressures from non-native populations. From the native perspective, therefore, the Conquest was not a dramatic singular event, symbolized by any one incident or moment, as it was for Spaniards. Rather, the Spanish invasion and colonial rule were part of a larger, protracted process of negotiation and accommodation.”

‘Empires of the Weak: The Real Story of European Expansion and the Creation of the New World Order’ by Jason Sharman, 2019

The central thesis here is that the relationship between the European conquerors and the conquered around the world was very different from the traditional stories. It was not sudden overwhelming military force but gradual exploitation of local political conditions taking place over the course of centuries that resulted in the world we see today.

Most importantly, the thesis of political competition in Europe resulting in European dominance by 1800 purely through superiority of Western military technology is completely dismantled. European weapons made little difference until the 1800s. Updated based on Reddit comments.

Sharman is a political scientist, so he could perhaps be accused of moonlighting outside his core expertise, but we’ll see that this thesis is repeated again and again in many of the other books on this list from various perspectives.

I could not find any videos or audio recordings of Sharman talking about the book. But I’m sure some will appear soon.

Key quote

‘Europeans did not enjoy any significant military superiority vis-à-vis non-Western opponents in the early modern era, even in Europe. Expansion was as much a story of European deference and subordination as one of dominance. Rather than state armies or navies, the vanguards of expansion were small bands of adventurers or chartered companies, who relied on the cultivation of local allies.’

‘The Silk Roads: A New History of the World’ by Peter Frankopan, 2015

‘The Silk Roads’ is the history of the world that should be the core textbook for anyone interested in the balance of events. It provides the same correction to the shape of history that an alternative projection gives to the distortions taught to us by the Mercator projection of school atlases.

Frankopan’s book on the First Crusade is also extremely eye-opening and worth a read. It is the one that most balances the perspectives of east, west and the Byzantines.

You can see Frankopan talk about his book in several recorded lectures and interviews.

Jerry Brotton’s ‘This Orient Isle’ could be thought of as a companion book in that it rethinks the position of Britain in this newly rebalanced history. You can watch Brotton talk about his 2016 book in an online lecture. Brotton’s ‘History of the World in 12 Maps’ also adds new perspectives on the orientation of the world.

Illustrative quotes

We think of globalisation as a uniquely modern phenomenon; yet 2,000 years ago too, it was a fact of life, one that presented opportunities, created problems and prompted technological advance.

Rome’s transition into an empire had little to do with Europe or with establishing control across a continent that was poorly supplied with the kind of resources and cities that were honeypots of consumers and taxpayers. What propelled Rome into a new era was its reorientation towards the Eastern Mediterranean and beyond. Rome’s success and its glory stemmed from its seizure of Egypt in the first instance, and then from setting its anchor in the east – in Asia.

the ancient world was much more sophisticated and interlinked than we sometimes like to think. Seeing Rome as the progenitor of western Europe overlooks the fact that it consistently looked to and in many ways was shaped by influences from the east.

Cities like Merv, Gundesāpūr and even Kashgar, the oasis town that was the entry point to China, had archbishops long before Canterbury did. These were major Christian centres many centuries before the first missionaries reached Poland or Scandinavia.

Baghdad is closer to Jerusalem than to Athens, while Teheran is nearer the Holy Land than Rome, and Samarkand is closer to it than Paris and London.

‘Lost Enlightenment: Central Asia’s Golden Age from the Arab Conquest to Tamerlane’ by S. Frederick Starr, 2013

Starr’s book was a real revelation. I had spent a lot of time in Central Asia and read some history of the region. But other than Samarkand, all of that history has now been lost. And I didn’t get a sense that the people living in the region knew much about it.

Much like Davies in ‘Vanished Kingdoms’ in Europe, Starr shows on a global scale how even major civilisations with real impact can disappear without much trace. But even more importantly, he shows that the trajectory of ‘modern’ intellectual development was much more complex than most people believe.

You can see Starr talk about his book in this online lecture.

Illustrative quotes

“This was truly an Age of Enlightenment, several centuries of cultural flowering during which Central Asia was the intellectual hub of the world. India, China, the Middle East, and Europe all boasted rich traditions in the realm of ideas, but during the four or five centuries around AD 1000 it was Central Asia, the one world region that touched all these other centers, that surged to the fore. It bridged time as well as geography, in the process becoming the great link between antiquity and the modern world.”

“every major Central Asian city at the time boasted one or more libraries, some of them governmental and others private.”

“Above all, Central Asia was a land of cities. Long before the Arab invasion, the most renowned Greek geographer, Strabo, writing in the first century BC, described the Central Asian heartland as ‘a land of 1,000 cities.’”

“At the Merv oasis the outermost rampart ran for more than 155 miles, three times the length of Hadrian’s Wall separating England from Scotland. At least ten days would have been required to cover this distance on camelback.”

‘Genghis Khan and the Making of the Modern World’ by Jack Weatherford, 2004

Jack Weatherford’s portrayal of the Mongol conquests is definitely not non-partisan. He’s with the Mongols. Nevertheless, he opens important vistas about the foundations of modern interconnectedness. This is a good complement to Starr’s coverage of the preceding period in the same region.

Here’s a video lecture by Weatherford about some aspects of this story.

Illustrative quotes

In twenty-five years, the Mongol army subjugated more lands and people than the Romans had conquered in four hundred years. Genghis Khan, together with his sons and grandsons, conquered the most densely populated civilizations of the thirteenth century. Whether measured by the total number of people defeated, the sum of the countries annexed, or by the total area occupied, Genghis Khan conquered more than twice as much as any other man in history.

The majority of people today live in countries conquered by the Mongols; on the modern map, Genghis Khan’s conquests include thirty countries with well over 3 billion people.

Genghis Khan’s empire connected and amalgamated the many civilizations around him into a new world order. At the time of his birth in 1162, the Old World consisted of a series of regional civilizations each of which could claim virtually no knowledge of any civilization beyond its closest neighbor. No one in China had heard of Europe, and no one in Europe had heard of China, and, so far as is known, no person had made the journey from one to the other. By the time of his death in 1227, he had connected them with diplomatic and commercial contacts that still remain unbroken.

‘Lies my Teacher Told Me: Everything Your American History Textbook Got Wrong’ by James W. Loewen, 1995

This book is slightly outside the scope of this list, but I thought it would be of interest to those who were educated in the American school system. Many of its points, though, apply to all school history books. It will open your eyes to how little you can trust what you learned in school and what was then reinforced through the popular cultural reflection of history.

Here’s an extended interview with the author.

Illustrative quotes

“Many history textbooks list up-to-the-minute secondary sources in their bibliographies, yet the narratives remain totally traditional, unaffected by recent research.”

“Most Americans tend automatically to equate educated with informed or tolerant. Traditional purveyors of social studies and American history seize upon precisely this belief to rationalize their enterprise, claiming that history courses lead to a more enlightened citizenry. The Vietnam exercise suggests the opposite is more likely true.”

Comprehensive and/or less accessible

These books require more serious commitment and possibly some comfort with reading relatively dense historical and ethnographic accounts. They are not necessarily poorly written or full of jargon but they are not primarily aimed at an audience too far outside the profession of the author (except ‘Debt’ which I included here because it is so long).

‘A Cultural History of the Atlantic World: 1250–1820’ by John K. Thornton, 2012

This is a truly impressive historical synthesis that covers an extensive geographic area as well as a significant stretch of time. It provides detailed elaborations of the central thesis of Sharman’s and Restall’s books and should be consulted every time we feel like we want to make a general statement about the developments in that region and in that time. Which we do all the time.

A podcast interview about this book from the New Books Network will give a good sense of what the book is about.

You can also hear Thornton speak on a related topic in this YouTube lecture on the Slave trade.

Illustrative quotes

“Europeans did not possess decisive advantages over any of the people they met, even though their sailing craft were indeed capable of nautical achievements that no other culture up to that time was able to perform.”

“there was really no economic Third World at the time of European expansion in the fifteenth and sixteenth century, if one uses proxy measures of average quality of life as a guide. The crucial quality-of-life determinant was, in fact, social and economic stratification.”

“African states had the upper hand if the game of force was to be played. Although Europeans often fortified their “factories,” as trading posts were usually called, these fortifications could not resist an attack by determined African authorities.”

“Slow-firing weapons cannot allow small numbers of people to defeat larger numbers unless other factors are in play.”

“Cavalry are most effective only when massed in sufficient numbers to inflict sustained casualties on fleeing infantry, and the dozens and on occasion low hundreds of mounted men in Spanish service did not meet this decisive threshold. Native Americans were reasonably quick in establishing tactical countermeasures against the horsemen after the initial encounters.”

‘Myths of the Archaic State: Evolution of the Earliest Cities, States, and Civilizations’ by Norman Yoffee, 2005

This was perhaps the most embarrassingly eye-opening book for me, given that I started out my early adult life studying Egyptology. Yoffee, building on his work and that of others, shows the limits of what a so-called ‘ancient civilisation’ was and could have been. Collapses and interregna were much less of a tragedy for those involved – a point made by Scott in a more accessible way. His reimagining of the laws of Hammurabi as a political and literary rather than a legal document was just one of the many myths this book burst for me.

Norman Yoffee speaks about new perspectives on collapse, summarizing his more recent work.

Illustrative quotes

“[Myths of the archaic state include:] (1) the earliest states were basically all the same kind of thing (whereas bands, tribes, and chiefdoms all varied within their types considerably); (2) ancient states were totalitarian regimes, ruled by despots who monopolized the flow of goods, services, and information and imposed “true” law and order on their powerless citizens; (3) the earliest states enclosed large regions and were territorially integrated; (4) typologies should and can be devised in order to measure societies in a ladder of progressiveness; (5) prehistoric representatives of these social types can be correlated, by analogy, with modern societies reported by ethnographers; and (6) structural changes in political and economic systems were the engines for, and are hence necessary and sufficient conditions that explain, the evolution of the earliest states.”

That the laws of Hammurabi were copied in Mesopotamian schools for over a millennium after Hammurabi’s death attests to the literary success of the composition and has nothing to do with its juridical applicability. […] There is no mention of the code of Hammurabi in the thousands of legal documents that date to his reign and those of his immediate successors.

Order could not survive the frequent shocks it suffered if people were not able to construct the institutions of legitimacy and to determine the quality of illegitimacy. Legitimacy normally invokes the past as something that is absolute and that acts as a point of reference for the present, normally by transmuting the past into some form of the present.

‘Debt: The First Five Thousand Years’ by David Graeber, 2011

I think this is perhaps the best intro to modern anthropological thinking in general. It is very readable and accessible but also very comprehensive. It certainly has its agenda but Graeber tells a convincing story that undermines the classical thinking about the role of exchange in maintaining civilisations. It is easy to get bogged down in the discussions about the nature of money when discussing this book but what it really does is show the great variety of ways in which people relate to each other.

Graeber gave a lecture on his book at Google which is available on YouTube. But this book works best when read as a whole.

Illustrative quotes

there is good reason to believe that barter is not a particularly ancient phenomenon at all, but has only really become widespread in modern times. Certainly in most of the cases we know about, it takes place between people who are familiar with the use of money, but for one reason or another, don’t have a lot of it around.

Through most of history, when overt political conflict between classes did appear, it took the form of pleas for debt cancellation—the freeing of those in bondage, and usually, a more just reallocation of the land.

“Kingdoms rise and fall; they also strengthen and weaken; governments may make their presence known in people’s lives quite sporadically, and many people in history were never entirely clear whose government they were actually in. … It’s only the modern state, with its elaborate border controls and social policies, that enables us to imagine “society” in this way, as a single bounded entity.”

there are three main moral principles on which economic relations can be founded, all of which occur in any human society, and which I will call communism, hierarchy, and exchange.

“communism” is not some magical utopia, and neither does it have anything to do with ownership of the means of production. It is something that exists right now—that exists, to some degree, in any human society, although there has never been one in which everything has been organized in that way, and it would be difficult to imagine how there could be. All of us act like communists a good deal of the time. None of us acts like a communist consistently.

“baseline communism”: the understanding that, unless people consider themselves enemies, if the need is considered great enough, or the cost considered reasonable enough, the principle of “from each according to their abilities, to each according to their needs” will be assumed to apply.

In many periods—from imperial Rome to medieval China—probably the most important relationships, at least in towns and cities, were those of patronage.

‘Vanished Kingdoms: The Rise and Fall of Nations’ by Norman Davies, 2010

Most of the books on the list focus on forgotten, misunderstood or ignored aspects of global history or culture. But Davies shows that even in our own backyard, entire kingdoms vanished without a trace in our consciousness. Perhaps, I should have chosen his ‘Europe: A History’ which is also revisionist in that it places Europe’s cultural, geographic and historical center of gravity much further east and south than is typical. But I found this book much more revelatory and impactful for the purposes of this list.

Davies gave a lecture about his book at the LSE, which is available as a recording.

A brief interview with Davies about this book is available on YouTube.

Illustrative quotes

“As soon as great powers arise, whether the United States in the twentieth century or China in the twenty-first, the call goes out for offerings on American History or Chinese History, and siren voices sing that today’s important countries are also those whose past is most deserving of examination, that a more comprehensive spectrum of historical knowledge can be safely ignored.”

Most importantly, students of history need to be constantly reminded of the transience of power, for transience is one of the fundamental characteristics both of the human condition and of the political order.

Popular memory-making plays many tricks. One of them may be called ‘the foreshortening of time’. Peering back into the past, contemporary Europeans see modern history in the foreground, medieval history in the middle distance, and the post-Roman twilight as a faint strip along the far horizon.

One has to put aside the popular notion that language and culture are endlessly passed on from generation to generation, rather as if ‘Scottishness’ or ‘Englishness’ were essential constituents of some national genetic code.

To all who have been seduced by the concept of ‘Western Civilization’, therefore, the Byzantine Empire appears as the antithesis – the butt, the scapegoat, the pariah, the undesirable ‘other’. Although it formed part of a story that lasted longer than any other kingdom or empire in Europe’s past, and contains in its record a full panoply of all the virtues, vices and banalities that the centuries can muster, it has been subjected in modern times to a campaign of denigration of unparalleled virulence and duration.

‘The Inheritance of Rome: A History of Europe from 400 to 1000’ by Chris Wickham, 2009

The ‘Fall of Rome’ and the subsequent ‘dark ages’ have been one of the big obsessions of historical introspection for centuries. They are the frequent source domain of civilizational analogies even though, as Chris Wickham shows, almost nothing we think of as a given holds up. This book is just one of many in recent historical scholarship that revisits the notion of the dark ages and shines a light on the period of ‘late Rome’ as seen from the perspective of its own time. Of course, many controversies remain but the change in emphasis seems incontrovertible. There are echoes of similar points in the early chapters of Davies’ ‘Vanished Kingdoms’.

I could not find any lectures on the subject by Wickham but some of these questions were raised in this panel he chaired on the middle ages.

Note: I wrote a review of this book for the Czech daily Lidové noviny.

There are also a number of lecture series on this period that reflect the latest scholarship in The Great Courses from the Teaching Company that are available via Audible, as well. I particularly recommend those by Kenneth Harl.

Illustrative quotes

“Anyone in 1000 looking for future industrialization would have put bets on the economy of Egypt, not of the Rhineland and Low Countries, and that of Lancashire would have seemed like a joke.”

Byzantine ‘national identity’ has not been much considered by historians, for that empire was the ancestor of no modern nation state, but it is arguable that it was the most developed in Europe at the end of our period.

the East remained politically and fiscally strong, and eastern Mediterranean commerce was as active in 600 as in 400.

Far from ‘corruption’ being an element of Roman weakness, this vast network of favours was one of the main elements that made the empire work. It was when patronage failed that there was trouble.

The Persian state was almost as large as the Roman empire, extending eastwards into central Asia and what is now Afghanistan; it is much less well documented than the Roman empire, but it, too, was held together by a complex tax system, although it had a powerful military aristocracy as well, unlike Rome.

‘The Anthropology of Eastern Religions: Ideas, Organizations, and Constituencies’ by Murray Leaf, 2014

This was a late addition to this list and it is an imperfect volume in that, as one reviewer put it: “[its] worthwhile aims are met unevenly, resulting in a book that is certainly informed and informative, but often inconsistent in tone and level of analysis.” But its core message in chapter one – religion as a social institution which has much more in common with other institutions than traditional religious studies would have us believe – makes it worth including.

I couldn’t find any interviews or lectures. But I believe that this interview with Russell McCutcheon about the limits of religious studies would provide a useful complement.

Illustrative quotes

The world religions are complex social phenomena. They use ideas of several different kinds. They include substantial systems of physical infrastructure. They have provisions for economic support. They embody their own systems of scholarship. They produce propaganda and they are politically important in many different ways. From time to time their leaders in various places have commanded armies and conducted wars. This cannot be explained simply by reviewing them as so many sets of beliefs.

The most conspicuous problem in contemporary comparative religion is that they underrate diversity. […] One result is to overstate what the major religions have in common with each other while understating or ignoring what they have in common with traditions considered non-religious.

The general class of cultural phenomena to which world religions belong can be described as large-scale, translocal, multi-organizational, professionalized cultural complexes.

Virtually all Japanese have recourse to the ideas and organizations of Buddhism and Shinto, and for the most part this is also true of Confucianism. […] Japanese parks are Shinto and the system of Shinto shrines in Japan has much the same place in Japanese emotional life as the system of national parks does for Americans. Confucian ideas are important in administrative and professional contexts.

‘Europe and the People without History’ by Eric R Wolf, 1982

This is the oldest book on the list and it has inspired many others.

Unlike Graeber, reading Wolf is hard going. This is certainly for the committed but it repays the effort. Even just looking at the maps showing the intricate trade routes going from the heart of Africa to the Baltic Sea is eye-opening.

Wolf’s central point is also the central point all the authors on this list return to again and again. We invented history based on the things that were easy to see. But this was very much looking for the keys under the lamppost where the light was and not where we lost them. Wolf (similarly to Graeber and Scott) has a distinctly untraditional politics leaning to the left (if perhaps not as much to anarchism).

Illustrative quotes

“Africa south of the Sahara was not the isolated, backward area of European imagination, but an integral part of a web of relations that connected forest cultivators and miners with savanna and desert traders and with the merchants and rulers of the North African settled belt. This web of relations had a warp of gold, “the golden trade of the Moors,” but a weft of exchanges in other products. The trade had direct political consequences. What happened in Nigerian Benin or Hausa Kano had repercussions in Tunis and Rabat. When the Europeans would enter West Africa from the coast, they would be setting foot in a country already dense with towns and settlements, and caught up in networks of exchange that far transcended the narrow enclaves of the European emporia on the coast. We can see such repercussions at the northern terminus of the trade routes in Morocco and Algeria. Here one elite after another came to the fore, each one dependent on interaction with the Sahara and the forest zone. Each successive elite was anchored in a kin-organized confederacy, usually mobilized around a religious ideology.”

Precursors and proto-revisionists

In many ways, almost any history is revisionist history. Each generation writes its own history books to reflect new knowledge but also new perspectives. Most history book authors feel they have something new to contribute, something that can revise the current understanding of the subject matter.

So it is not surprising that even revisionism in the vein that I’m looking at here is not just a matter of the last 20 or so years.

Much of the current revision was inspired by ‘The Great Transformation’ by Karl Polanyi, published in 1944, which in turn rests on many of the anthropological revisions started by people like Franz Boas in the US and Bronisław Malinowski.

There has been a continued thread of back and forth since at least then. Marshall Sahlins’ ‘Stone Age Economics’ from 1972 (with papers going back to the mid 1960s) started much of the revision of our picture of the hunter-gatherer condition. And so on.

At the same time but independently, people like Joseph Needham were painstakingly collecting data on the great civilizations of the ‘East’ which can now give us a fuller and more balanced picture of the world.

We should also not forget the work that has gone into revising the simplistic view of the 19th and 20th centuries, most notably to do with the emergence of the nation state. Here names such as Eric Hobsbawm, Ernest Gellner, Miroslav Hroch and, of course, Benedict Anderson come to mind. And then there are the many people who are rethinking more recent events, such as Timothy Snyder or Antony Beevor.

The list just goes on.

The other side of the coin

Of course, there is also the other side. Historical revisionists with grand schemes and overarching historical narratives. I’ve already mentioned Jared Diamond but also worth reading is Ian Morris. I’ve critiqued some of their work in my thesis proposing the metaphor of ‘History as Weather’. There are also people like Niall Ferguson who cannot be doing with all this rebalancing and want to put ‘the West’ back at the centre of things. I found his attempt in Civilization extremely unconvincing but he is a prominent voice in the anti-revisionist camp.

Note: I wrote a joint review of Morris and Ferguson for the Czech daily Lidové noviny under the title of ‘New historical eschatology’. I also wrote a positive review of Jared Diamond’s ‘Collapse’ soon after it came out which I would now revise significantly – see Questioning Collapse and After Collapse.

Steven Pinker’s ‘Better Angels of Our Nature’ is another example of a grand sweep of history that tries to make the progress of humanity appear more directional and straightforward than the books on this list suggest it is or can be. And we should not forget the great systematisers of the 1990s, Fukuyama and Huntington.

The temptation to discover the key to what makes human history tick is great. From Toynbee to Hari Seldon. And it is not difficult to discover interesting patterns.

The picture that the books on this list paint is that grand narratives of history do not stand up well to scrutiny. They may provide a useful lens through which to view the past, or more often the present. But there is always another grand narrative just around the corner.

In thinking about the predictive utility of history, I asked: “So what is the point of history then? Its accurate predictions are not very useful and its useful predictions are not very accurate.” History and ethnography show us the range of possible ways of being human. They don’t tell us what to do next or how to be, but they are essential components of our never-ending quest to find out what we could be.

Fruit loops and metaphors: Metaphors are not about explaining the abstract through the concrete but about the dynamic process of negotiated sensemaking

Note: This is a slightly edited version of a post that first appeared on Medium. It elaborates on and adds examples to the points I made in the more recent posts on metaphor and explanation and understanding.

One of the less fortunate consequences of the popularity of the conceptual metaphor paradigm (which is also the one I by and large work with on this blog) is the highlighting of the embodied metaphor at the expense of others. This gives the impression that metaphors are there to explain more abstract concepts in terms of more concrete ones.

Wikipedia: “Conceptual metaphors typically employ a more abstract concept as target and a more concrete or physical concept as their source. For instance, metaphors such as ‘the days [the more abstract or target concept] ahead’ or ‘giving my time’ rely on more concrete concepts, thus expressing time as a path into physical space, or as a substance that can be handled and offered as a gift.”

And it is true that many of the more interesting conceptual metaphors that help us frame the fundamentals of language are projections from a concrete domain to one that we think of as more abstract. We talk about time in terms of space, emotions in terms of heat, thoughts in terms of objects, conversations as physical interactions, etc. We can even deploy this aspect of metaphor in a generative way, for instance when we think of electrons as a crowd of little particles.

But I have come to view this as a very unhelpful perspective on what metaphor is and how it works. Instead, going back to Lakoff’s formulation in Women, Fire, and Dangerous Things, I’d like to propose we think of a metaphor as a principle that helps us give structure to our mental models (or frames). But unlike Lakoff, I like to think of this as an incredibly dynamic and negotiated process rather than as a static part of our mental inventory. And I like to use conceptual integration or blending as a way of thinking about the underlying cognitive processes.

Metaphor does two things: 1. It helps us (re)structure one conceptual domain by projecting another conceptual domain onto it and 2. In the process of 1, it creates a new conceptual domain that is a blend of the two source domains.

We do not really understand one domain in terms of another through metaphor. We ‘understand’ both domains in different ways. And this helps us create new perspectives which are themselves conceptual domains that can be projected from or projected into. (As described by Fauconnier and Turner in The Way We Think.)

This makes sense when we look at some of the conventional examples used to illustrate metaphors. “The man is a lion” does not help us understand lesser known or more abstract ‘man’ by using the better known or more concrete ‘lion’. No, we actually know a lot more about men and the specific man we’re thus describing than we do about lions. We are just projecting the domain of ‘lions’ including the conventionalised schemas of bravery and fierceness onto a particular man.

This perspective depends on our conventionalised way of projecting these 2 domains. Comparison between languages illustrates this further. The Czech framing of lions is essentially the same as English but the projection into people also maps lion’s vigour into work to mean ‘hard working’. So you can say “she works as a lion”, meaning she works hard. But in the age of documentaries about lions, a joke subverting the conventionalised mapping also appeared and people sometimes say “I work like a lion. I roar and go take a nap.” This is something that could only emerge as more became conventionally known about lions.

But even more embodied metaphors do not always go in a predictable direction. We often structure affective states in terms of the physical world or bodily states. We talk about ‘being in love’ or ‘love hitting a rocky patch’ or ‘breaking hearts’ (where metonymy also plays a role). But does that really mean that we somehow know less about love than we know about travelling on roads? Love is conventionally seen as less concrete than roads or hearts but here we allow ourselves to be misled by traditional terminology. The domain of ‘love’ is richly structured and does not ‘feel’ all that abstract to the participants. (I’d prefer to think of ‘love’ as a non-prototypical noun; more prototypical than ‘rationalisation’ but less prototypical than ‘cat’).

Which is why ‘love’ can also be used as the source domain. We can say things like “The camera loves him.” and it is clear what we mean by it. We can talk about physical things “being in harmony” with each other and thus helping us understand them in different ways despite harmony being supposedly more abstract than the things being in harmony.

The conceptual domains that enter into metaphoric relationships are incredibly rich and multifaceted (nothing like the dictionaries or encyclopedias we often model linguistic meaning after). And the most important point of unlikeness is their dynamic nature. They are constantly adapting to the context of the listeners and speakers, never exactly the same from use to use. We have a rich inventory of them at our disposal but by reaching into it, we are also constantly remaking it.

We assume that the words we use have some meanings but it is we who have the meanings. The words and other structures just carry the triggers we use to create meanings in the process of negotiation with the world and our interlocutors.

But this sounds much more mysterious and ineffable than it actually is. These things are completely mundane and they are happening every time we open our mouths or our minds. Here’s a very simple but nevertheless illuminating illustration of the process.

Not too long ago, there were two TV shows with similar premises (Psych and The Mentalist). One of them came out a year earlier and its creators felt their premise had been copied by the other one. They used the following analogy:

“When you go to the cereal aisle in a grocery store, and you see Fruit Loops there. If you look down on the bottom, there’s something that looks just like Fruit Loops, and it’s in a different bag, and it’s called Fruity Loop-Os.” 

I was watching both shows at the time but their similarity did not jump out at me. But as soon as I read that comparison it was immediately clear to me what the speaker was trying to say. I could automatically see the projection between the two domains. But even though it seemed the cereal domain was more specific, it actually brought a lot more with it than the specificity of cereal boxes and their placement on store shelves. What it brought over was the abstract relationship between them in quality and value but also many cultural scripts and bits of propositional knowledge associated with cereal brands and their copycats.

But there was even more to it than that. The metaphor does not stop at its first outing (it’s kind of like mushrooms and their mycelia in this way). Whenever I see a powerful analogy or generative metaphor on the internet, I always look for the comments where people try to reframe it and create new meanings. Something I have been calling ‘frame negotiation’. Take almost any salient metaphoric domain projection and you will find that it is only a part in a process of negotiated sensemaking. This goes beyond people simply disagreeing with each other’s metaphors. It includes the marshalling of complex structuring conceptual phenomena from schemas, rich images, scenarios, scripts, to propositions, definitions, taxonomies and conventionalised collocations.

This blog post and its comments contain almost all of them. First, the post author spends three paragraphs (from the third on) comparing the two shows and finding similarities and differences. This may not seem like anything interesting but it reveals that the conceptual blends compressed in the cereal analogy are completely available and can be discussed as if they were a literal statement of fact.

Next, the commenters, who have much less space, return to debating the proposition by recompressing it into more metaphors. These are the first four comments in full:

  1. Anonymous said… They’re not totally different. It’s more like comparing Fruit Loops to Fruit Squares which happen to taste like beef.
  2.  said… I think a better comparison would Corn Flakes and Frosted Flakes. Both are made with the same cereal, but one’s sweeter (Psych).
  3.  said… Sweeter as in more comedy oriented? They are vastly different shows that are different on so many levels.
  4. Anonymous said… nikki could not be more right with the corn flakes and frosties analogy

Here we see the process of sensemaking in action. The metaphoric projection is used as one of several structuring devices around which frames are made. Comment 1 opens the process by bringing in the idea of reframing through other analogs in the cereal domain. Comment 2 continues that process by offering an alternative. Comment 3 challenges the very idea of using these two domains and comment 4 agrees with 2 as if it were a literal statement while also referring to the metalinguistic tool being used.

The subsequent comments return to comparing the two shows. Some do so by offering propositions and scenarios, others by marshalling a new analogy.

 said… The reason the Mentalist feels like House is because house is a modern day medical version of Homes as in Holmes Sherlock. Also both Psych and The Mentalist are both Holmsian in creation. That being said I love the wit and humor of psych

Again, there is no evidence of the concrete/abstract duality or even one between less and better known domains. It is all about making sense of the domains in both cognitive and affective ways. Some domains have very shallow projections (partial mappings) such as cornflakes and frosty flakes, others have very deep mappings such as Sherlock Holmes. They are not providing new information or insight in the way we traditionally think of them. Nor are they providing an explanation to the uninitiated. They are giving new structure to the existing knowledge and thus recreating what is known.

The reason I picked such a seemingly mundane example is because all of this is mundane and it’s all part of the same process. One of my disagreements with much of metaphor application is the overlooking of the ‘boring’ bits surrounding the first time a metaphor is used. But metaphors are always a part of complex textual and discursive patterns and while they are not parasitic on the literal, as was the traditional slight against them, they are also not the only thing that goes on when people make sense.

3 burning issues in the study of metaphor

I’m not sure how ‘burning’ these issues are as such but if they’re not, I’d propose that they deserve to have some kindling or other accelerant thrown on them.

1. What is the interaction between automatic metaphor processing and deliberate metaphor application?

Metaphors have always been an attractive subject of study. But they have seen an explosion of interest since ‘Metaphors we live by’ by Lakoff and Johnson. In an almost Freudian turn, these previously seemingly superfluous baubles of language and mind became central to how we think and speak. All of a sudden, it appeared that metaphors reveal something deeper about our mind that would otherwise remain hidden from view.

But our ability to construct and deconstruct metaphors was mostly left unexamined. And this happens ‘literally’ all the time. People test the limits of ‘metaphor’ through all kinds of discursive patterns. From saying things like ‘X is more like Y’ to ‘X is actually Y’ or even ‘X is like Y because’.

How does this interact with the automatic, instantaneous and unconscious processing of language? (Let’s not forget that this is the more common mode.)

2. What is the relationship between the cognitive (conceptual) and textual metaphor?

Another way to pose this question is: What happens in text and cognition in between all the metaphors? Many approaches to the study of metaphor only focus on the metaphors they see. They seem to ignore all the text and thought in between the metaphorical. But, often, that is most of what goes on.

With a bit of effort, metaphors can be seen everywhere but they are not all the same kind of thing. ‘Time is money’, ‘stop wasting my time’, and ‘we spent some time together’ are all metaphorical and rely on the same conceptual metaphor of TIME IS SOMETHING THAT CAN BE EXCHANGED. But they are clearly not doing the same job of work for the speaker and will be interpreted very differently by the listener.

But there’s even more at stake. Imagine a sentence like ‘Stop wasting my time. I could have been weeding my garden or spending time with my children instead of listening to you.’ Obviously, ‘wasting time’ plays a different role here than in a sentence like ‘Stop wasting my time. My time is money and when you waste my time, you waste my money.’ The conceptual underpinnings are the same, but the way they can be marshalled into meaning is different.

Metaphor analysts are only too happy to ignore the context – which could often be most of the text. I propose that we need a better model for accounting for metaphor in use.

3. What are the different processes involved in processing figurative language?

There are 2 broad schools of the psychology of metaphor. They are represented by the work of Sam Glucksberg and Raymond Gibbs. The difference between them can be summarised as ‘metaphor as polysemy’ vs ‘metaphor as cognition’. Metaphor, according to the first, is only a kind of additional meaning that words or phrases have. The second approach sees it as a deep interconnected underpinning of our language and thought.

Personally, I’m much closer to the cognitive approach but it’s hard to deny that the experimental evidence is all over the place. The more I study metaphor, the more I’m convinced that we need a unified theory of metaphor processing that takes both approaches into account. But I don’t pretend I have a very clear idea of where to even start.

I think such a theory would also have to account for differences in how individuals process metaphors. There are figurative language pathologies (e.g. gaps in the ability to process metaphor are associated with autism). But clearly, there are also gradations in how well individuals can process metaphor.

Any one individual is also going to vary over time and across specific instances in how much they are able and/or willing to consider something to be metaphorical. Let’s take the example of ‘education is business’. Some people may not consider this to be a metaphor and will treat it as a straightforward descriptive statement along the lines of ‘dolphins are mammals’. Others will treat it more or less propositionally but will dispute it on the grounds that ‘education is education’, and therefore clearly not business. But those same people may pursue some of the metaphorical mappings to bolster their arguments. E.g. ‘Education is business and therefore teachers need to be more productive.’ or ‘Education is not business because schools cannot go bankrupt.’

Bonus issue: What are the cognitive foundations shared by metaphor with the rest of language?

This is not really a burning issue for metaphor studies so much as it is one for linguistics. Specifically semantics and pragmatics but also syntax and lexicography.

If we think of metaphor as conceptual domain (frame) mapping, we find that this is fundamental to all of language. Our understanding of attributes and predicates relies on the same ability to project between 2 domains as does understanding metaphor (although there does seem to be some additional processing burden on novel metaphors).

Even seemingly simple predicates such as ‘is white’ or ‘is food’ require a projection between domains.

Compare:

  1. Our car is white.
  2. Milk chocolate is white.
  3. His hair is white.

Our ability to understand 1 – 3 requires that we map the domain of the ‘subject’ onto the domain of the ‘is white’ predicate. Chocolate is white through and through whereas cars are only white in certain parts (usually not the tires). Hair, on the other hand, is white in different ways. And in fact, ‘is white’ can never be fully informative when it comes to hair because there are too many models. It is even possible to have opposite attributes mean the same thing. ‘Nazi holocaust’ and ‘Jewish holocaust’ are both used to label the same event (with similar frequency) and yet it is clear that they refer to one event. But this ‘clarity of meaning’ depends on projections between various domains. Some of these include ‘encyclopedic knowledge’. For instance, ‘Hungarian holocaust’ does not possess such clarity outside of specialist circles.

It appears that understanding simple predicates relies on the same processes as understanding metaphor does. What makes metaphor special then? Do we perhaps need to return to a more traditional view of metaphor as a rhetorical device but change the way we think about language?

That is what I’ve been doing in my thinking about language and metaphor but most linguistic theories treat these as unremarkable phenomena. This leads them to ignore some pretty fundamental things about language.

Innovation is bad for business: 3 more ‘I’ words to compare innovation to

Innovation is the ‘in’ thing. Innovate or die is the buzz up and down the hive mind. Everybody feels they must innovate all of the things all of the time. But is incessant innovation the right mode of approach?

We constantly spin up stories of the intrepid innovator and the change they bring about in the world. But is that what is really happening on the ground? I think we can bring some metaphors to bear on this that may open up some different possibilities of conceptualizing innovation.

Note: I’m writing this as an early adopter and sometime professional innovator. But despite a personal drive to constantly try new things, I find that approaching innovation uncritically and without regard to the full diversity of its instantiations is counterproductive. Therefore, this is meant to be a corrective as much to myself as to others.

Infection as metaphor of innovation

When an innovation spreads through an institution, it does so just like an infection. It starts attacking existing systems, which then have to spend time and resources on averting the damage done by the attack. We only tell the stories where the attack led to the strengthening of the system – like with childhood maladies or inoculation. But chronic disease and the death of the system are also not uncommon – we just don’t tell those stories in association with innovation.

This weakening happens through many processes that every innovator and innovatee (and this covers most people in one way or another) will have direct experience of. Innovators have experienced resistance, doubt, slow response times. Those are all defense mechanisms the disease of innovation has to overcome.

The people who experience the innovation as the infected cells can attest to lower productivity because of the need to learn new things, endless meetings on how to implement the new thing keeping them from doing the job, and miscommunication and misunderstanding leading to a higher error rate (be it in production or management).

Where this metaphor breaks down is that if the innovation is successful, the system is transformed. Almost as if, instead of a tumor killing us, it grew us a new useful organ while others fell off without much harm to the organism as a whole.

On the other hand, the success of the system in extinguishing a malignant innovation can make it more resilient to innovation in the future. And this may lower its chances of survival in the face of environmental changes that favour the systems innovation took over. This latter aspect is what proponents of innovation as an unalloyed good point to. But that is a backward-looking perspective. From the ground, the system as an organism will always have to start by defending against innovation as infection, no matter how well-intentioned everybody involved may be.

Ignorance as metaphor of innovation

Innovation is often equated with knowledge. People research new ways of doing things, they bring together existing strands of knowledge and weave from them beautiful tapestries of brighter futures. But in practice, innovation almost always benefits from ignorance. Or even depends on it.

Ignorance by inventors is a well-known companion to some of the biggest inventions and discoveries. When Morse set out to create the telegraph system, it was received wisdom that what he was trying to do was physically impossible – but he did not know that. Columbus is often given as the example of the innovator who was right and pursued his correct knowledge in spite of ridicule. But in truth he was a zealous crank, completely wrong about everything. Everybody had known the Earth was round for over a thousand years by the time of Columbus, and they also knew how large it was. He did not doubt it was round, but he subscribed to a crank theory that it was much smaller than it actually is. He just got lucky there was a continent in the way.

This ignorance-of-the-impossible narrative can be applied to many other famous inventors. The story is often told with the naysayers as hidebound blocks to progress and the inventors as courageous pursuers of the truth. But told this way, the narrative misses the fact that the overwhelming majority of ignorant would-be inventors were simple cranks. For every clueless Morse and cranky Columbus, there were thousands of unknown failures who did not know or believe that something truly impossible was impossible (just like most random mutations do not win the natural selection lottery). The correct response to somebody claiming they are able to do something known to be impossible is to doubt it. The trick is being able to update one’s priors in a way that helps us better judge the signs of success.

But we should also not overlook the ignorance among the adopters of innovations. The lack of information on the side of adopters is another necessary ingredient of success. Every innovation is too uncertain and often unreliable in its earliest stages to be considered by the well-informed as anything other than a bet. This ignorance is partly a result of pure uncertainty as to the viability of something new. But much more commonly, it is just ignorance of how the new thing works, what its limitations are – and how it truly differs from the old. This results in Potemkin innovations like the original Mechanical Turk. The innovations or their effects are often too complex to be fully understood (even by their inventors). This then leads to the creation of a zeitgeist (a sort of general framing) which provides the innovation with enough vectors for infection and the possibility for improvement.

But it also makes it very easy for impostors to sneak in. At present, there are many examples of companies simply labeling products with ‘machine learning’ or ‘blockchain’ and selling them to credulous investors and customers even if the underlying technology is not actually using anything that could be meaningfully described that way.

Imitation as metaphor of innovation

Innovation is associated with inventiveness and creativity. Strokes of brilliance and flashes of genius. But almost all of the great innovations were imitating a previous less successful attempt. It is well known that great inventions and discoveries often appear multiple times simultaneously as different people synthesize available information into similar outcomes. Perhaps the most famous examples are Newton and Leibniz for calculus and Darwin and Wallace for natural selection. But this holds true for almost all the great inventions. Either somebody figured it out as well, or was getting very close.

But more importantly, by the time we get to talk about almost any innovation, it will have reached us through a long chain of imitations. Novel ways of thinking or doing things really only become innovations when somebody copies them.

This process of imitation is similar to that of natural selection, so it usually leads to refinement and strengthening of the original idea. But let’s not forget that natural selection is based around the idea of imperfect copies (random mutations) finding uses that increase their chances of spreading (reproduction). So as part of this metaphor, innovation without copying would just be lots of random ideas that go nowhere.

Not only is innovation the result of imitation, without imitation, there would be no point to it in the first place.

Conclusion

There is not meant to be a conclusion here. Investigating metaphors just opens up new prisms that slightly change the way we look at things. Sometimes, it’s the process of thinking through the mappings in the metaphor that forces you to investigate one of the domains more closely. And that’s what this is all about.

Does machine learning produce mental representations?

TL;DR

  • Why is this important? Many people believe that mental representations are the next goal for ML and a prerequisite for AGI.
  • Does machine learning produce mental representations equivalent to human ones in kind (if not in quality or quantity)? Definitely not, and there is no clear pathway from current approaches to a place where it would. But it is worth noting that mental representations in humans are also not something straightforward to identify or describe.
  • Is there a currently viable approach to ML that could eventually lead to mental representations with more engineering? It appears not, but then again, no one expected neural nets to become so successful.

Update: Further discussion on Reddit.

Background

Over the last few months, I’ve been catching up more systematically on what’s been happening in machine learning and AI research in the last 5 years or so and noticed that a lot of people are starting to talk about the neural net developing a ‘mental’ representation of the problem at hand. As someone who spends a lot of time thinking about mental representations, this struck me as odd because what was being described for the machine learning algorithms did not seem to match what else we know about mental representations.

I had been formulating this post when I was pointed to this interview with Judea Pearl. And he makes exactly the same point:

“That sounds like sacrilege, to say that all the impressive achievements of deep learning amount to just fitting a curve to data. From the point of view of the mathematical hierarchy, no matter how skillfully you manipulate the data and what you read into the data when you manipulate it, it’s still a curve-fitting exercise, albeit complex and nontrivial.”

He continues:

“If a machine does not have a model of reality, you cannot expect the machine to behave intelligently in that reality.”

What does this model of reality look like? Pearl seems to reduce it to ‘cause and effect’ but I would suggest that the model needs more than that. (Note: I haven’t read his book, just the interview and this intro.)

What are mental representations?

Mental representations are all sorts of images (ranging from rich to schematic and from static to dynamic) in our mind on which we draw sometimes consciously but mostly unconsciously to deal with the world. They are essential for producing and understanding language (from even the simplest sentence) and for basic reasoning. They can be represented as schemas, rich images, scenarios, scripts, dictionaries or encyclopedic entries. They can be in many modalities – speech, sound, image, moving picture.

Here are some examples to illustrate.

Static schemas

What does ‘it’ refer to in pairs of sentences such as these (example from here):

  1. The trophy wouldn’t fit into the suitcase because it was too big.
  2. The trophy wouldn’t fit into the suitcase because it was too small.

It takes no effort at all for a human to determine that ‘it’ in (1) refers to the trophy and in (2) to the suitcase. Why? Because we have schemas of containment and we know almost intuitively that big things don’t fit into smaller things. And when we project that schema onto the trophy and the suitcase, we immediately know what has to be too big or too small in order for one not to fit into the other.
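
To make the kind of knowledge involved concrete, here is a minimal hand-coded sketch in Python. The function and its two-entry ‘schema’ are invented for illustration only; nothing this crude is being claimed about actual minds.

    # Toy, hand-coded 'containment' schema for resolving 'it' in:
    # 'The trophy wouldn't fit into the suitcase because it was too <adj>.'
    # Purely illustrative: the schema is reduced to a two-entry lookup.

    def resolve_it(adjective):
        # Fitting fails when the contained object is too big
        # OR when the container is too small.
        roles = {"big": "trophy", "small": "suitcase"}
        return roles[adjective]

    assert resolve_it("big") == "trophy"
    assert resolve_it("small") == "suitcase"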

You can even do it with a single sentence, as in ‘Jane is standing behind Clare so you cannot see her.’ It is clear that ‘her’ refers to Jane and not Clare, but only because we can project a schema of 2 similar-sized objects positioned relative to the observer’s line of sight.

So we also know that only sentence 1 below makes sense because of the schema we have for things of unequal size being positioned relative to each other and their impact on our ability to see them.

  1. The statue is in front of the cathedral.
  2. The cathedral is in front of the statue.

However, unlike with the trophy and suitcase, it is possible to imagine contexts in which sentence 2 would be acceptable. For instance, in a board game where all objects are printed on blocks of the same size and positioned on a 2D space.

This is to illustrate that the schemas are not static but interact with the rich conceptualisations we create in context.

Force dynamics

This is a notion pioneered by Leonard Talmy that explains many aspects of cognitive and linguistic processes through dynamic schemas of proportional interaction. Thus we know that all things being equal, bigger things will influence smaller things, faster things will overtake slower things, etc.

So we can immediately interpret the it in sentences such as:

  1. The foot hit the ball and it flew off.
  2. The bird landed on the perch and it fell apart.

But we also apply these to more abstract domains. We can thus easily interpret the situations behind these 2 sentences:

  1. The mother walked in and the baby calmed down.
  2. The baby walked in and the mother calmed down.

If asked to tell the story that led to 1 or 2, people would converge on very similar scenarios around each sentence.

Knowledge of the world

Sometimes, we marshal quite rich (encyclopedic) knowledge of the world to understand what we hear or see. Imagine what is required to match the following 2 pairs of sentences (drawing on Langacker):

  1. The Normans conquered England with …
  2. The Smiths conquered England with …
    a. … their moody music.
    b. … their superior army.

Obviously the right pairings are 1b and 2a. But none of this is contained in the surface form. We must have the ‘encyclopedic’ knowledge of who The Normans and The Smiths were, but also the force-dynamic schemas of who can conquer whom.

So on hearing the sentence ‘Mr and Mrs Smith conquered Britain’, we would be looking for some metaphorical mapping to explain the mismatch between the force we know conquering requires and the force we know a married couple can exert. With sufficiently rich knowledge, this is immediately obvious as in ‘John and Yoko conquered America.’

How does machine learning do on interpreting human mental representations?

For AI, examples such as the above are a difficult challenge. It was recently proposed that a much more effective and objective Turing test would be to ask an AI to interpret sentences such as these under the Winograd Schema Challenge (https://en.wikipedia.org/wiki/Winograd_Schema_Challenge).

The challenge consists of a database of pairs of sentences such as:

  1. The city councilmen refused the demonstrators a permit because they feared violence.
  2. The city councilmen refused the demonstrators a permit because they advocated violence.

This has the great advantage of perfect objectivity. Unlike with the Turing test, it is always clear which answer is correct.

The best machine learning algorithms use various tricks but they still only do slightly better than chance (57%) at interpreting these schemas.

The only problem is that it is quite hard to construct these pairs in a way that could not be solved with simple statistical distributions. For instance, the Smiths and Normans example above could be easily resolved with current techniques simply by searching which words occur most frequently together.
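
As a toy illustration of that collocation shortcut (the counts below are invented; a real system would estimate them from a large corpus):

    # Resolving the Normans/Smiths pairings from raw co-occurrence counts
    # alone -- no schemas, no model of the world. Counts are made up.

    cooccurrence = {
        ("Normans", "army"): 120, ("Normans", "music"): 2,
        ("Smiths", "army"): 1, ("Smiths", "music"): 85,
    }

    def pick_continuation(subject, options):
        # Choose the continuation that co-occurs most often with the subject.
        return max(options, key=lambda w: cooccurrence.get((subject, w), 0))

    print(pick_continuation("Normans", ["army", "music"]))  # -> army
    print(pick_continuation("Smiths", ["army", "music"]))   # -> music

No projection between domains is happening here, which is exactly why such tricks tell us so little about representation.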

Also, it is not clear how the schematic and force-dynamic aspects interact with the encyclopedic aspects. Can you have one without the other? Can we classify the Winograd schema sentences into different types, some of which would be more susceptible to ML approaches?

Do mental representations exist?

There is a school of thought that claims that mental representations do not actually exist. There is nothing like what I described above in the brain. It is actually just a result of perceptual task orientation. This is the ecological approach developed in the study of perception and physical manipulation (such as throwing or catching a ball).

I am always very sceptical of any approach that requires we find some bits of information resembling what we see stored in the brain. Which is why I am quite sympathetic to the notion that there are no actual mental representations directly encoded into the synaptic activations of our brain.

But even if all of these were just surface representations of completely different neural processes, it is undeniable that something like mental representations is necessary to explain how we think and speak, at some level. At the very least to articulate the problems that have to be solved by machine learning.

Note: I have completely ignored the problem of embodiment, which would make things even more complicated. Our bodily experience of the world is definitely involved. But to what extent our bodies are actually a part of the reasoning process (as opposed to the brain as an independent computational control module) is a subject of hot debate.

How does machine learning represent the problem space?

Now, ML experts are not completely wrong to speak about representations. Neural nets certainly build some sort of representation of the problem space (note, I don’t call it world). We have 4 sources of evidence:

  1. Structure of data inputs: Everything is a vector encoded as a string of numbers.
  2. Patterns of activation in the neural nets (weights): This is where the ‘curve fitting’ happens.
  3. Performance on real world tasks: More reliable than humans on dog breed recognition but penguins can also be identified as pandas.
  4. Adversarial attacks: Adding seemingly random and imperceptible noise to an image or sound can make it produce radically different outputs.
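
The last point deserves a small illustration. Here is a minimal numpy sketch of the gradient-sign idea behind many such attacks, on a toy linear classifier; the weights and input are invented, and real attacks target deep nets, but the principle is the same.

    import numpy as np

    # A tiny, structured nudge to the input changes the classifier's verdict
    # even though the input barely moves. All numbers are invented.

    w = np.array([2.0, -1.0, 0.5])   # toy model weights
    x = np.array([0.1, 0.3, 0.2])    # toy input near the decision boundary

    def score(x):
        return 1 / (1 + np.exp(-w @ x))  # sigmoid confidence

    eps = 0.05                        # 'imperceptible' perturbation budget
    x_adv = x - eps * np.sign(w)      # step against the gradient's sign

    print(score(x), score(x_adv))     # 0.5 -> ~0.46: below the 0.5 threshold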

If we take together the vector inputs and the weights on the nodes in the neural net, we have one level of representation. But that is perhaps the less interesting level and, as complexity increases, it becomes impossible to truly figure out much about it.

But is it possible that all of that actually creates some intermediate layer that has the same representational properties as mental representations? I would argue that at this stage, it is all inputs and weights and all the representational aspects are provided by the human interpreting the outputs. If we only had the outputs, we could still posit some representational aspects. But the adversarial attacks reveal that the representational level is missing.

Note: Humans can also be subject to adversarial attacks with all sorts of perceptual and cognitive illusions. They seem to be on a different representational level to me but they would be worth exploring further in this context.

Update: A commenter on Reddit suggested that I look at this post on feature visualisation and I think it mostly supports my point. It looks like there are lots of representations shown in that article, but they are really just visualisations of what inputs lead to certain neuron activations on specific layers of the neural net. Those are not ‘representations’ the neural net has independent access to. In the same way, we would not think of Pavlov’s dogs salivating at the sound of the bell as having a ‘mental representation’ of the ‘bell means food’ causal connection. Perhaps we could rephrase this as the question of whether training a neural net is similar to classical or operant conditioning, and what that means with respect to the question of representation.

Can we create mental representations in machines?

Judea Pearl thinks that nothing current ML is doing is going to lead to a ‘model of the world’ or as I call it ‘mental representations’. But I’m skeptical that his solution is a path to mental representations either:

“The first step, one that will take place in maybe 10 years, is that conceptual models of reality will be programmed by humans.”

This is what the early AI expert systems tried to do but it proved very elusive. One example of manually coding mental representations is FrameNet, a database of words linked to semantic frames, but it barely scratches the surface. For instance, here’s the frame for container, which links to suitcase. But that still doesn’t help with the idea of a trophy being sometimes small enough to fit and sometimes too big. I can see how FrameNet could be used on very small subsets of problems but I don’t see a way to scale it up that could take into account everything involved in the examples I mentioned. We are faced with the curse of dimensionality here. The possible combinations just grow too fast for us to compute them.
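
For a sense of what manually coding a representation looks like, here is a hypothetical sketch of a containment frame in the spirit of FrameNet; the slot names are mine, not FrameNet’s actual schema, and the brittleness shows immediately.

    from dataclasses import dataclass

    # Hypothetical hand-coded frame in the spirit of FrameNet.
    # The slot names are invented for illustration.

    @dataclass
    class ContainmentFrame:
        container: str   # e.g. 'suitcase'
        contents: str    # e.g. 'trophy'
        fits: bool       # does contents fit into container?

    # Whether the trophy fits depends on relative sizes that would have to
    # be filled in for every object pair in every context -- the curse of
    # dimensionality mentioned above.
    print(ContainmentFrame(container="suitcase", contents="trophy", fits=False))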

I’m also not sure that simply running more data through bigger and bigger RNNs or CNNs will get us there either. But I can’t rule out that brute force will get us close enough for it not to matter that mental representations are not involved.

Perhaps, if we label enough text of some subdomain with FrameNet schemas, we could train a neural net on this. But that will only help with the examples where rich knowledge of the world is not required. We can combine a schema of a suitcase and a trophy with that of ‘fit’ and match ‘it’ with the more likely antecedent. But would that approach help with the demonstrators and councilmen? And even if so, the Winograd Schema Challenge is only an artificially constructed set of sentence pairs designed for a particular purpose. The mental representations involved crop up everywhere all the time. So we not only need a way of invoking mental representations but also a way to decide if they are needed and, if so, which ones.

Machine learning fast and slow up the garden path

Let’s imagine that we can somehow engineer a solution that can beat the Winograd Schema Challenge. Would that mean that it has created mental representations? We may want to reach for Searle’s ‘Chinese Room Argument’ and the various responses to it. But I don’t think we need to go that deep.

One big aspect of human intelligence that is often lumped together with the rest is metacognition. This is the ability to bring the process of thinking (or speaking) to conscious awareness and control it (at least to a degree). It is reminiscent of Kahneman’s two systems in ‘Thinking Fast and Slow’.

Machine learning produces almost exclusively ‘fast thinking’ – instantaneous matching of inputs to outputs. It is the great advance over previous expert system models of AI which tried to reproduce slow thinking.

Take for instance the famous Garden path sentences. Compare these 2:

  1. The horse raced past the barn quickly.
  2. The horse raced past the barn fell.

Imagine the mental effort required to pause and retrace your steps when you reach the word ‘fell’ in the second sentence. It is a combination of the instantaneous production of mental images that crash and the slow, deliberate parsing of the sentence to construct a new image that is consistent with our knowledge of the world and the syntactic schema used to generate it.

Up until the advent of stochastic approaches to machine learning in the 1990s (and neural nets in the 2010s), most AI systems tried to reproduce the slow thinking through expert systems encoded as decision trees. But they mostly failed because the slow thinking only works because of the fast thinking which provides the inputs to it. Now neural nets can match complex patterns that we once thought impossible. But they do it very differently from us. There doesn’t seem to be much thinking about how to go about developing the sort of metacognition that is required to combine the two. All of the conditional decision-making around what to do with the outputs of ML algorithms has to be hardcoded. Alexa can recognize my saying ‘turn on bedroom light’ but I had to give it a name, and if I want to make it part of a more complex process (make sure the bedroom light is off when I leave home), I have to go to IFTTT.
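
A caricature of what that hardcoding looks like (the intent name and the trivial ‘recognizer’ are invented stand-ins, not Alexa’s actual API):

    # The recognizer may be a learned model, but everything downstream
    # of it is hand-written if/else glue. All names are invented.

    def recognize_intent(utterance):
        # Stand-in for a trained speech/intent model.
        return "turn_on_light" if "light" in utterance else "unknown"

    def handle(utterance):
        intent = recognize_intent(utterance)
        if intent == "turn_on_light":       # every branch hand-wired
            return "switching on the bedroom light"
        return "sorry, I don't understand"  # no metacognition, no reflection

    print(handle("turn on bedroom light"))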

I don’t see how Pearl’s approach will take us there. But I don’t see an alternative, either. Perhaps the mental representations will emerge epiphenomenally as the neural nets grow and receive more sophisticated inputs about the spatial nature of the world (rather than converting everything to vectors). Maybe they will be able to generate their own schemas as training inputs. I doubt it, but I wouldn’t want to bet against it.

What is just as likely is that we will reach a plateau (maybe even resulting in a new AI winter) that will only see incremental improvements and won’t take the next step until a completely new paradigm emerges (which may not happen for decades if ever).

Conclusion

It is not always obvious that more in-depth knowledge of a domain contributes to a better model of it. We are just as likely to overfit our models as to improve them when we dive too deep. But I think that mental representations at least reveal an important problem domain which should be somehow reflected in what machines are being taught to learn.

Update

In response to a comment on Reddit, I wanted to add the following qualification.

I think I ended up sounding a bit more certain than I feel. I know I’m being speculative but I note that all the critics are pointing at hypotheticals and picking at my definition of mental representation (which is not necessarily unwarranted).

But what I would like to hear is a description of the next 5 specific problems to be solved to get nearer to, say, 75% on the Winograd Schema Challenge that can then be built on further (i.e. not just hacking around collocation patterns, Watson-style).

I also wanted to note that I omitted a whole section on the importance of collocability in language, with a reference to Michael Hoey’s work on Lexical Priming, which I think is one of the 2 most important contributions to the study of language in the last 20 years, the other being William Croft’s Radical Construction Grammar. Reading either would benefit many ML researchers, along with Fauconnier and Turner’s The Way We Think.

Not ships in the night: Metaphor and simile as process

In some circles (rhetoric and analytic philosophy come to mind), much is made of the difference between metaphor and simile.

(Rhetoricians pay attention to it because they like taxonomies of communicative devices and analytic philosophers spend time on it because of their commitment to a truth-theoretical account of meaning and naive assumptions about compositionality).

It is true that their surface and communicative differences have an impact in certain contexts but if we’re interested in the conceptual underpinnings of metaphor, we’re more likely to ignore the distinction altogether.

But what’s even more interesting is to think about metaphor and simile as just part of the process of interpersonal meaning construction. Consider this quote from a blog on macroeconomics:

[1a] Think of [1b] the company as a ship. [2] The captain has steered the ship too close to the rocks, and seeing the impending disaster has flown off in the ship’s helicopter and with all the cash he could find. After the boat hit the rocks no lives were lost, but many of the passengers had a terrifying ordeal in the water and many lost possessions, and the crew lost their jobs. [3] Now if this had happened to a real ship you would expect the captain to be in jail stripped of any ill gotten gains. [4] But because this ship is a corporation its captains are free and keep all their salary and bonuses. [5] The Board and auditors which should have done something to correct the ship’s disastrous course also suffer no loss.

Now, this is really a single conceptual creation but it happens in about 5 moves which I highlighted above. (Note: I picked these 5 as an illustrative heuristic but this is not to assume some fixed sequence).

[1] The first move establishes an idea of similarity through a simile. But it is not in the traditional form of ‘X is like Y’. Rather, it starts with the performative ‘Think of’ [1a] and then uses the simile ‘as’ [1b]. ‘Think of X as Y’ is a common construction but it is rarely seen as an example in discussions of similes.

[2] This section lays out an understanding of the source domain for the metaphorical projection. It also sets the limit on the projection in that it is talking about ‘company as a ship traveling through water’ in this scenario, not a ship as a metonym for its internal structure (for instance, the similarities in the organisational structure of ships and companies.) This is another very common aspect of metaphor discourse that is mostly ignored. It is commonly deployed as an instrument in the process of what I like to call ‘frame negotiation’. On the surface, this part seems like a narrative with mostly propositional content that could easily stand alone. But…

[3] By saying, ‘if this happened to a real ship’ the author immediately puts the preceding segment into question as an innocent proposition and reveals that it was serving a metaphorical purpose all along. Not that any of the readers were really lulled into a false sense of security, nor that the author was intending some dramatic reveal. But it is an interesting illustration of how the process of constructing analogies contains many parts.

[4] This part looks like a straightforward metaphor: ‘the ship is a corporation’, but it is flipped around (one would expect ‘the corporation is a ship’). This move links [2] and [3] and reminds us of [1].

[5] This last bit seems to refer to both domains at once. ‘The Board and auditors’ to the business case and the ‘ship’s course’ to the narrative in the simile. But we could even more profitably think of it as referring to a new blended domain, a hypothetical model in which both the shipping and business characteristics are integrated.

But the story does not end there, even though people who are interested in metaphors often feel that they’ve done enough at this stage (if they ever reach it). My recommended heuristic for metaphor analysts is to always look at what comes next. This is the start of the following paragraph:

To say this reflects everything that is wrong with neoliberalism is I think too imprecise. [1] I also think focusing on the fact that Carillion was a company built around public sector contracts misses the point. (I discussed this aspect in an earlier post.)

If you study metaphor in context, this will not surprise you. The blend is projected into another domain that is in a complex relationship to what precedes and what follows. This is far too conceptually intricate to take apart here but it is of course completely communicatively transparent to the reader and would have required little constructive effort on the part of the author (who is most likely to have spent time on constructing the simile/metaphor and its mappings but little on their embedding into the syntactic and textual weave that give it its intricacy).

In the context of the whole text, this is a local metaphor that plays as much an affective as a cognitive role. It opens up some conceptual spaces but does not structure the whole argument.

The metaphor comes up again later and in this case it also plays the role of an anaphor by linking 2 sections of the text:

Few people would think that never being able to captain a ship again was a sufficient disincentive for the imaginary captain who steered his boat too close to the rocks.

Also of note is the use of the word ‘imaginary’ which puts that statement somewhere between a metaphor (similarity expressed as identity) and simile (similarity expressed as comparison).

There are two lessons here:

  1. The distinction between metaphor and simile could be useful in certain contexts but in practice their uses blend together and it is not always easy to establish boundaries between them. But even if we could, the underlying cognition is the same (even if truth-conditionally they may differ on the surface). We could complicate things further and introduce terms such as analogy, allegory, or even parable in this context, but it is hard to see how much they would help us elucidate what is going on.

  2. Both metaphor and simile are not static components of a larger whole (like bricks in a wall or words in a dictionary). They are surface aspects of a rich and dynamic process of meaning making. And the meaning is ‘literally’ (but not really literally) being made here right in front of our eyes, or rather by our eyes. What metaphor and simile (or the sort of hybrid metasimile present here) do is help structure the conceptual spaces (frames) being created, but they are not doing it alone. There are also narratives, schemas, propositions, definitions, etc. All of these help fill out the pool of meaning into which we may slowly immerse ourselves or hurtle into headlong. This is not easy to see if we only look at metaphor and simile outside their natural habitat of real discourse. Let that be a lesson to us.

Therapy for Frege: A brief outline of the theory of everything

Frege’s trauma

I found the following quote from Frege on the Language goes on holiday blog and it struck me as the perfect starting point for this essay, which had been written for a while already:

“Frege (“Logic in Mathematics”): Definitions proper must be distinguished from elucidations [Erläuterungen]. In the first stages of any discipline we cannot avoid the use of ordinary words. But these words are, for the most part, not really appropriate for scientific purposes, because they are not precise enough and fluctuate in their use. Science needs technical terms that have precise and fixed Bedeutungen, and in order to come to an understanding about these Bedeutungen and exclude possible misunderstandings, we provide elucidations. Of course in so doing we have again to use ordinary words, and these may display defects similar to those which the elucidations are intended to remove. So it seems that we shall then have to provide further elucidations. Theoretically one will never really achieve one’s goal in this way. In practice, however, we do manage to come to an understanding about the Bedeutungen of words. Of course we have to be able to count on a meeting of minds, on others’ guessing what we have in mind.”

Duncan Richter’s commentary then follows:

“Frege’s problem is of a different kind [from Mill]. There is something wrong with what he wants. He sees the problems himself, but still, apparently, goes on wanting the same thing. So pointing out the problems won’t help at all. We might say he needs a kind of therapy, although this won’t be regular psycho-therapy.”

Well, I have been thinking about the need for exactly such a therapy and it must stem from an understanding that Frege was wrong about the extent to which we can in practice determine the precise Bedeutungen of our terms. As I hope to show below, the infinite regress of elucidation intrudes on our everyday thinking in many ways that make even relatively simple communication or understanding difficult (a never-ending process of negotiation). Difficulties stemming from what I call below the impossibility of perfect reference are not a matter of some distant periphery of hypothetical paradoxes; they make themselves known as insurmountable obstacles in seemingly innocuous situations. Or in other words, it is Erläuterungen all the way down.

And this problem does not have an epistemological solution (even if we don’t have to go as far as Rorty in rejecting epistemology as a beneficial enterprise altogether). Our only course of action is acceptance and making peace with the fundamental indeterminacy of reference. The acknowledgment of the need to make peace is the therapeutic part because the alternative is dissolution into madness of circularity or arbitrary absolutism (which is a kind of madly willful blindness, in itself).

Halting Problem of Rationality

The original impetus for these notes was reading a recent review by Scott Alexander of Eliezer Yudkowsky’s new book Inadequate Equilibria. Yudkowsky’s book and Alexander’s review seem to me an object lesson in what I’ve come to think of as the halting problem of rationality.

This problem has many formal kindred spirits in the form of undecidability, computability (P=NP), etc. From everything we know, we should be extremely skeptical of rationality’s ability to solve its own problems without any appeal to a sort of axiomatic arbiter (a Gödelian ‘because I said so’, perhaps).

Scott Alexander shows the infinite regress in the process of finding the final level at which to decide which perspective is valid (or even useful). Based on Yudkowsky’s book, he arbitrarily (or perhaps magically) uses two perspectives but they are clearly just points on a continuum which itself sits on an infinite plane rather than just a neat straight line.

Now, Yudkowsky does not seem to be bothered by the infinity of it all. He uses a whole lot of Bayesian heuristics to build up a priors machine that spits out one good decision after another. Prior ex machina, if you will. And it’s not always good. That’s why Alexander calls the book’s core argument ‘theodicy’. And that’s how most rationalist epistemological arguments strike me. They are the same sort of hermeneutics performed on the Bayesian heuristic canon that biblical scholars engaged in with the Bible. Read the text and its understanding will reveal THE truth.

The impossibility of true hermeneutics

My argument is that hermeneutics (in this sense) is impossible and always the wrong goal. What’s more, it is very easy to mistake our heuristics for hermeneutics. In other words, it is almost an instinct to assume that the analytic instruments we use to handle the world around us for specific (if often implicit) purposes are isomorphic with the world. And the more successful the instrument, the more likely we are to assume it reflects the actual ‘true’ and complete image of the world. So computers have been hugely influential and successful in emulating (and enhancing) some previously difficult mental processes, and therefore the world is made up of information and our minds are just computers. We can control so much of the world around us by manipulating chemical elements, and therefore everything we are is really chemistry and our goal in describing the human condition should be a transcription into chemical notation because only that is the language in which a true image of the world can be captured. We can describe a sentence with a transformational rule, and therefore the true representation of language is a formal description. We can design precise logical proofs for truth conditions, and therefore all that a meaning of a word or a statement is, is its conditions for truth. We can describe the utility of an economic transaction by its marginal value, and therefore all that defines value is the margin. And so on.

Richard Rorty pretty much showed how this works in Philosophy and the Mirror of Nature and later on also showed how to deal with it through his ironist approach. But rationalists are too cool to read Rorty. Wittgenstein and Derrida saw the problem and instead of talking about it, they tried to reveal it through cryptic koans.

I’d like to go about this differently and offer an outline of what a proof might look like that there is no ultimate external referent available for adjudication of referential problems. I also show that this causes problems not just on the edges but all the time across all aspects of academic and daily life.

Outline of the theory of everything

Let’s start with a key assumption from which everything else derives:

Everything exists!

On the word ‘exist’

Now, the word exist obviously has multiple meanings. I’m obviously not saying that everything exists as an object in the world. So I’m perfectly happy with the statement ‘Unicorns don’t exist’. I’m using it in the most universal sense, similar to the logical notation ∃. In this sense, it is impossible for something I can refer to with the word ‘something’, or even think about, not to exist. But I don’t have to have a word or a thought for something to exist. In fact, words make it seem as if everything existed as some kind of entity. But those words and thoughts themselves exist and so does the relationship between them and the things they refer to, as well as my reference to that relationship and my reference to that reference. And so on ad infinitum. In fact, the very act of naming brings things into existence. Existence in this sense is a Parmenidean totality – it is not temporal. Everything includes past and future. It is not dimensional – if it turns out there are infinite parallel worlds, everything will still exist. Parallel worlds are also part of everything. And if it turns out there’s no such thing, everything will still exist. The parallel worlds will just exist as an idea that turned out not to have an identifiable external correlate. Everything does not require finiteness nor infinity. Infinity is still everything. But even if it turns out that infinity is just a mathematical construct and the physical world is actually finite in the shape of some bizarre multidimensional space-time sphere, that’s still everything. When Wittgenstein said ‘That of which I cannot speak, I must stay silent’, he was alluding to the same concept of everything. If it can exist it does exist; if it cannot, it does not. Everything exists. Anything that does not exist does not exist. What this means is that there is nothing outside of existence, in the sense of x ∈ everything. There is no such special mode of being as metaexistence – existence beyond existence, existence about existence. Now, this is not the proof, this is the Cartesian axiom abstracted – X exists, therefore X exists.

Impossibility of perfect reference

The key consequence of everything existing is the impossibility of perfect referentiality. This presents a problem because our entire epistemology is built on the assumption of referentiality. If something exists, we can refer to it with a concept, word, label, or at least point at it. In other words, signifier vs signified. We cannot speak or think without relying on the perfect applicability of this abstraction. And most of the time it sort of works. In ‘Pass the salt’, ‘pass’ refers to an action, ‘salt’ to an object, ‘the’ to a relationship between the object and our perception of it. The ‘sort of’ refers to the fact that even simple sign/meaning pairings get very complicated very fast. Semioticians have been dining out on this since at least Peirce. (But medieval logicians, and Indian ones before that, had also taken this complexity apart as far as it can be taken apart.)

But it stops even sort of working as soon as we get close to any attempt at metareferentiality. Just look at what sort of verbal gymnastics I had to go through to even hint at what I mean by the simple statement ‘everything exists’. The problem is that referentiality is not a passive fact outside of existence. Every act of reference creates a new relationship between the referent, the referrer, and the reference (at its most oversimplified). And that’s something we can then go and refer to, thus creating an infinite regress that’s not linear but exponential. Because any new act of reference creates not one but at least four potential things to refer to: 1. the act of reference itself, 2. the referrer in the act of reference, 3. the referent as being referred to, 4. the signifier being used for that reference. Most often we can multiply that by referring to other participants in the act of reference, the relationship of that act to prior acts, and their relationship to this act. In short, it’s not a pretty picture.
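
To make the arithmetic of that regress concrete, here is a toy sketch in Python. It is purely illustrative; its only assumption is the ‘at least four’ from the paragraph above. If every act of reference makes at least four new things available to refer to, the pool of referable things grows geometrically and can never be exhausted by ‘taking a bit more time’.

```python
# Toy model of the referential regress: each act of reference spawns at
# least four new potential referents (the act itself, the referrer in the
# act, the referent as referred to, and the signifier used). Start with
# one thing and refer to everything in the pool at each round.

pool = 1
for round_no in range(1, 6):
    pool += 4 * pool  # every item, once referred to, yields four more referables
    print(f"after round {round_no}: {pool} things to refer to")

# Prints 5, 25, 125, 625, 3125 -- growth by a factor of five per round,
# so the regress outruns any finite attempt to pin reference down.
```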

Borges, in his psychedelic way, showed how the quest for perfect reference falls on its face in his short story about the mapmakers trying to create an ever better map, making it resemble reality more and more closely until it became as big as the land it was representing. By the end of his story, it simply lay abandoned on the edge of town. But the mapmakers did not even come close to achieving perfection. Because in a perfect representation of the world, the map itself would have to be included as well. But then an even bigger map would have to be created to capture the map, the reality, and their relationship, and then we’d need another, even bigger map to capture the previous relationships. And so on. A perfect map is a physical impossibility. Even in an infinite universe, there’s not enough transfinity to hold it.

There’s nothing new about this. Zeno, Russell, Gödel, Turing, Mandelbrot are just the most famous of the names who dealt with this problem in one way or another in the formal realm of mathematics. And Rorty did it for philosophy – though of course all the major philosophers of the last 300 years had hints of it as well: Hume, Kant, Hegel, Marx, Nietzsche, Heidegger, Wittgenstein, Rorty, Feyerabend. Of the Western ancients, Parmenides. And of course, so-called Eastern philosophy is rife with this as well.

Meaning without perfect reference

So what does this mean? Is meaning impossible? Can we not speak? No. Meaning is obviously possible. But not in the way it suggests itself to us. When we say something means something, we are implying a perfect one-to-one mapping of symbol to entity. But this is a false implication. When I point at an object and say this is a ‘chair’, I have a feeling that I have thus exhaustively described that object. That I have engaged in perfect reference. But because everything about that chair exists, not just its chairness, I have simply pointed to a whole complex of existence, and the word ‘chair’ only describes one of its infinitely many dimensions. When I set the chair on fire, at what point does it stop being a chair? When does it start being a chair during the construction process? When the tree’s cut down with the intention of making furniture? When the last bit of varnish dries? Or somewhere in between? Maybe when it takes on the recognizable shape of a chair, or when it can start functioning as a chair. What if it is a modernist chair and I can only recognize it as such when somebody puts a label on it? What if it is a chair in a picture? The label ‘chair’ can do a lot of this work, but it is not a perfect reference that maps nicely onto a thing.

This is all kind of obvious, so obvious that we take it in our stride in our everyday acts of reference. But it starts causing problems as soon as we try to pin it down with the assumption that if we only stop being everyday about our reference, we can easily identify the ‘real’ referent exactly in the way our usual, unthinking, every-day reference suggests we are doing already. Oh, we’re just being sloppy thinkers, taking quick shortcuts for convenience. But if we sacrifice some of that convenience, take a bit more time, we will be able to stop the infinite referential regress. There has to be an end to it. But there cannot be. Not within the system of reference itself. Every moment we take to try to nail down the reference creates another referent for us to refer to. It just never ends.

Infinite perfect reference is impossible in principle. And we cannot resolve this by stepping outside the system of reference, as we can do with maths in Gödel’s theorem. Because we can only consider reference using referential tools. This is so crazy-making and frustrating that generations of great thinkers simply assumed that it cannot be so. But in fact, it cannot be otherwise. Or if you think it can, show me how! I’ve been wrong before. (Obviously the Augustinian God who is outside of time – and presumably reference – or Buddhist karma – the extinguishment of existence itself – are pretty good conceptual exits out of the worry, but they don’t provide any usable heuristics for dealing with the paradoxes of reference within the referential model itself.)

Summary of the key consequence

In summary, there’s a paradoxical consequence of the theory of everything. Because everything exists, perfect reference is impossible, and therefore nothing exists in the way our words and thoughts make it seem it does. Or, in a pithier (but less accurate) heuristic I recommend to all philosophers and rationalists:
“Just because there’s a word for it, it does not mean it exists.”

Edge cases in our midst

So what? Who cares about some edge cases on the margins of infinity? We can just happily go on to use our ‘ordinary’ language and take care of the really important problems like designing more efficient energy storage.

If only it were that easy. But as the example with the chair showed, the problems of reference are all around us. They pop up all the time in daily conversation and in basic academic discourse. They are not just something people in the most abstract domains have to deal with on their darker days. They are something we all deal with every day – all of us – from Socrates to the Macedonian swineherd.

Let’s take energy storage. It is a perfect way of thinking about batteries or pumping water up an incline. But is there really a thing called energy we are storing, the same way we may be storing bags of dried beans in a cupboard? Is there even such a ‘thing’ as energy? Well, there’s a whole lot of maths used to describe measurements in the physical world that makes it easy to think about a lot of things in terms of energy. Not only can we think of the world that way, we can all of a sudden compare things like the burning of fire and the rubbing of hands and the running of horses, a pile of coal and a pile of dinosaurs, etc. But what is happening when we say X is releasing energy? Is the pumping of water up a hill the same thing as a burning fire? What is it that we’re describing with the maths? It is certainly not a given that energy is always a useful concept. People say things like ‘because everything is energy, I don’t believe in God but in a universal energy that connects us all’. We may laugh at charlatans like Deepak Chopra, but what is the mathematics describing energy really referring to? Is that one example of perfect reference? Is there one energy and one value of energy in the world? Further indivisible? The ultimate building block of our semantics?

No. The theory of everything does not claim that no reference is possible. Or even that it is impossible to have a perfect one-to-one relationship between a signifier and a signified. Just that that sort of atomic reference is not very useful. I can agree with my fellow referrers that henceforth ‘dog’ refers to Spot at 5pm on July 23, 2011 in my living room (with the rest of the infinite specification taken as read). But that will render the word completely useless. I will then have to come up with a new word to refer to Spot at 5.01pm, or to Spot who’s wandered into the garden. Or I may choose the much more sensible option of referring to the fuzzy and ever-changing universe of dogness. That word will be imprecise and fuzzy, but that is what will make it useful. We will have broad agreement and negotiate around the edges.

So I can equally say that the word ‘energy’ refers only to a set of mathematical formulae. But then I severely constrain what I can do with it. Which (in the case of physics) may be exactly what I want. But it is a solution that does not scale, as every effort to come up with a precise language has demonstrated, and even if it did, it would necessarily run into the paradoxes predicted by the theory of everything.

Possible objections

What are some possible objections to the theory of everything? I can think of several.

  1. The premise is wrong. Everything does not exist. There is a mode of metaexistence (for instance, human consciousness or a state of nirvana) that will make it possible to know all.
  2. There’s no problem. We just need an alternative epistemology which does not rely on reference.
  3. So what if perfect reference is impossible? We just need to come up with simpler formulae that will describe more complex ones and build a perfect reference by proxy.
  4. How does this apply to the theory of everything? How can you say everything exists when by your definition you should not be able to make any statements like that?
  5. You made a logical mistake and it is indeed possible to have perfect reference even when everything exists.

Re 1: Many years ago I read about a Buddhist school of semantics that claimed that the meaning of anything is everything that it is not. And the way Buddha himself was able to confirm that something does not exist was by looking at everything and finding that nothing was it. (This was a long time ago and I’m probably mangling this, but it will suffice for illustration.) So is it possible that we can achieve some alternate level of consciousness – perhaps even stepping outside the ‘karmic wheel’ on which everything turns and grasping the whole world non-referentially as one, or simply being aware of everything through a vastly expanding consciousness where the limits of infinity don’t apply? Every mystical tradition would have you believe that you can.

But even if you could (and why not), it wouldn’t solve any of the problems in the here and now. Maybe we should realign our goals and, instead of striving to accumulate ever more referential possessions, seek this new alternative consciousness. Sure. But again, this does not solve the problem for this consciousness.

Re 2: Well, if you can come up with an epistemology not based around some notion of reference, I’d like to see it. Now, there are many philosophical approaches that take the very impossibility (or at least great difficulty) of perfect (or even very good) reference to heart and integrate it into their epistemological toolbox. Zen koans are one example, the floating signifiers of post-modernist semioticians another. But these approaches don’t actually transcend referentiality. They merely break it, and through that breakage reveal the boundaries that reference imposes on us. The best Zen masters, such as Derrida in his postcards or Wittgenstein in his investigations, do a great job.

But again, this only exacerbates the problem rather than resolving it. There is no bullshit filter on koans. I can just as easily remain clueless as become enlightened, and I have no way of knowing which one I am. Most reference-transcending statements are as likely as not interpreted as if they were referential and simply referring to something not yet seen. Well, that does not help anyone.

Re 3: The whole point of reference is that it simplifies the world. Who cares about perfection? As long as we can come up with simple and beautiful mathematics to describe the complex world, we’ll be in good shape. I call this generative referentiality. And if it could get us out of the jam, it would be nice. But it fails on two counts.

Count 1: Assume you come up with a nice function to describe a chunk of the world. Now, if you plug it into a computer, it will eventually spit out a perfect image of that chunk of the world. But then you’ve created a new object that needs to be generated by another function – including that function itself. Now, you might think that you could Cantor your way out of this: just map one to one until infinity – no problem if it seems that one set should be smaller than the other. Yes, but Cantor never worried about everything. Russell did, and look where that got him.

Count 2: But even if we assumed that generative referentiality can solve this problem, it is still questionable whether it actually does the job we assume referentiality does. Look at the Mandelbrot set. It is a dead simple formula (albeit with complex numbers) that generates infinitudes of self-similar shapes when plotted in a 2D space. But does knowing the formula actually constitute knowing the set? Can we know the set without knowing the formula? Do we need to know both? We can certainly take the formula as the signifier of the whole complex thing. But then it would seem to be mostly doing the job of referring to something complicated, and calling it Bob (or the Mandelbrot set) would be just as good. There is something magical about knowing the names of things, but knowing the names is not knowing the things. Generative referentiality is extremely useful, and we might say it provides the foundations of our current civilisation. But confusing it with perfect referentiality has caused a lot of problems.
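
To see just how short the signifier is compared to what it generates, here is a minimal sketch of the Mandelbrot formula in Python. The escape-time test and the function name `in_mandelbrot` are my illustrative choices, not anything from the text: the entire ‘name’ of the set is the one line z = z*z + c, yet the test can only ever sample the infinitely detailed thing it names.

```python
# A sketch of the Mandelbrot formula as a 'generative signifier'.
# The whole rule is one line; the set it generates is infinitely detailed.

def in_mandelbrot(c: complex, max_iter: int = 100) -> bool:
    """Approximate membership test: iterate z -> z*z + c starting from 0.

    If |z| ever exceeds 2, the orbit escapes and c is not in the set.
    Staying bounded for max_iter steps is only evidence of membership,
    not proof -- sampling the set is all the formula ever lets us do.
    """
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return False
    return True

print(in_mandelbrot(0))    # True: the orbit stays at 0 forever
print(in_mandelbrot(-1))   # True: the orbit oscillates between -1 and 0
print(in_mandelbrot(1))    # False: 0, 1, 2, 5, 26... escapes quickly
```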

Re 4: How do the epistemological limitations of the theory of everything apply to the theory itself? This is a typical worry for any foundational epistemological theory that tries to encompass all of cognition. How do you deal with self-referentiality without running into a paradox? The strictures are even more severe on any theory that tries to deal with self-referentiality itself. The limits on perfect reference of course apply to anything I say just as much as to anything else. However, there is a small reprieve for reference that does not try to do anything useful. The whole point of reference is that it allows us to grasp something external to us. And the hidden strength of reference (at least hidden from most mainstream logicians) is that it is profoundly simplifying. It only works because it ignores almost everything and zooms in on what is most important. However, there is a kind of perfect reference that is profoundly useless except as a foundational axiom. And that is tautology.

I can in fact avoid all the problems with chairs, love, kings of France, or anything else referentiality struggles with if I just say they are exactly what they are. So instead of positing that X = a, I simply say X = X. I can thus refer to everything as being everything and be quite happy that that reference includes itself and everything that surrounds it. Just like I can say that a set of all sets is a set of truly all sets, including itself. The problems start when I try to build a non-self-referential system out of this assumption. Because I can’t.

I would say that the foundation of the theory of everything is purely therapeutic. It points to some fundamental impossibilities of our system without saying ‘and for my next trick, I will now show you how to simply resolve it’. There is no next trick. However, I will try to outline some heuristics that can be used to get around this. Deconstruction is one such approach – Derrida’s horizons come to mind here (though not something I know a lot about). But even very simple rationalist heuristics will do, as long as we don’t assume that they are external to the limits on perfect reference.

Re 5: It is possible that I made a mistake somewhere. In fact, I would not be surprised in the least if I did – this kind of thinking is hard and not my strong suit. But what remains is the empirical fact that perfect reference is nearly impossible. It is so hard that nobody has yet managed to crack it in any system capable of expressing something like language. Even algebra. I never quite managed to understand the details of Gödel’s proof, but this is what I imagine he was after. For him, though, undecidability was an internal problem for any system with an outside observer. But with everything there is no outside observer. (Or at least not any outside observer we have access to.)

Words-as-models heuristic and the halting problem

So what are we to do? Perfect reference is impossible, but our language-thought processes behave as if all reference were perfect. Is there a way out? No, there is no way out. You cannot be out of everything. But there is a way of living with this limitation.

One simple heuristic I suggest is to think of anything we say or think as a model. Each word, sentence, concept. It is a model of the thing it refers to. Then we can go on and live with the statistician’s dictum: all models are wrong, but some are useful.

Of course, the world does not need me for this. Those assumptions have been around for a long time. But what has been missing is the next step. OK, so some models are useful; how do we know which ones? Can we come up with a universal procedure for determining the usefulness of models? And here the analogy with the halting problem comes in.

Models are a type of (by definition) imperfect reference. So, if we could get a perfect procedure for identifying the utility of models, we could build out a model of the whole world just based on utility. But the utility of models is itself a model and, therefore, by its nature imperfect. Which means we cannot have a perfect external procedure for identifying utility. So, what can we have?

As always, we need to remind ourselves of the heuristic ‘just because there’s a word for it, does not mean it exists’. We have a notion and a word for utility, but that does not mean that there is a nice monadic entity of utility floating around in the world that we can attach that word to. We can pretend there is (just like the utilitarians), but that is not going to help us avoid paradoxes and other odious conclusions (just like the utilitarians). We don’t know whether a model is useful until we have examined all of its aspects with respect to all aspects of reality. But that is no more possible than it is possible to examine all steps of an infinitely recursive algorithm. At best we can follow the line of steps as far as the eye can see and say, well, it seems like it will continue for a while. Let’s go get a sandwich.
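
For readers who want the halting-problem analogy spelled out, here is a minimal sketch of Turing’s classic diagonal argument in Python. The names `halts` and `contrarian` are mine, purely for illustration: any claimed universal ‘does this stop?’ oracle can be fed a program built to contradict it, and the same squeeze applies to any would-be universal procedure for the utility of models, since that procedure is itself a model it would have to judge.

```python
# Sketch of the diagonal argument behind the halting problem.
# All names are illustrative; none of this comes from a real library.

def halts(program, arg) -> bool:
    """Hypothetical universal oracle: True iff program(arg) terminates.

    No correct implementation can exist -- which is exactly what the
    construction below shows.
    """
    raise NotImplementedError

def contrarian(program):
    """Do the opposite of whatever the oracle predicts about a program
    examining itself."""
    if halts(program, program):
        while True:   # oracle says 'halts', so loop forever
            pass
    else:
        return        # oracle says 'loops', so halt immediately

# Now ask: does contrarian(contrarian) halt? If the oracle says yes,
# contrarian loops; if it says no, contrarian halts. Either answer is
# wrong, so no such oracle exists.
```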

But with utility, things are even more difficult because it is not intrinsically a point on a simple scale from less useful to more useful. To simplify dealing with utility, we may convert it into a unidimensional scale of ‘utils’ spanning from negative to positive infinity. But that only makes the calculations of utility themselves easier by pushing all the difficult work one step down the line. We still have to decide in every case how to map the utility we perceive onto that scale. And we also have to decide how to measure that mapping. So by committing to a simple scale we have simplified one part of the process, but we didn’t solve any of the problems. We simply pushed them upstream, to the foundational issues.
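
A tiny sketch of that move, with every name (`Model`, `utils`, `weigh`, the aspects and their scores) hypothetical and chosen only for illustration: collapsing utility to a single number decides nothing by itself, it just relocates all the contested decisions into a weighting function we still have to write.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Model:
    name: str
    aspects: Dict[str, float]  # each score is itself an unexplained judgment

def utils(model: Model, weigh: Callable[[str], float]) -> float:
    """Collapse a model's many aspects into one 'util' number.

    The arithmetic is trivial; everything contentious now lives in
    'weigh' -- which aspects count, and how much -- i.e., exactly the
    questions the single scale was supposed to settle.
    """
    return sum(weigh(aspect) * score for aspect, score in model.aspects.items())

# Two weightings, two different 'objective' rankings of the same model:
m = Model("energy-as-substance", {"predictive": 0.9, "intuitive": 0.7, "scales_up": 0.2})
print(utils(m, weigh=lambda a: 1.0))                               # flat weights: 1.8
print(utils(m, weigh=lambda a: 2.0 if a == "scales_up" else 0.5))  # different foundations: 1.2
```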

How do we halt the infinite regress if we don’t know whether there is an actual end to it? In practice, we already do the only thing we can do. We give up when it feels right. Or when we’re exhausted. Or when we’ve reached a point of some sort of equilibrium or, conversely, of leverage. Our only sane option is to do what we’re doing and not pretend that we’ve cracked the halting problem. Pretend (with conviction to the point of self-delusion) that we’ve come to a decision because a decision at that point makes sense. Dance as if no one is watching and there’s an externally arbitrated rational reason for stopping. Or a common-sense one. But those are just pragmatic, ad hoc (or, as Rorty insists, contingent) decisions. The assumptions of external rationality are therapeutic ones, not epistemological.

Dealing with imperfect reference through heuristics: rationalists, postmodernists and pragmatists

Now, given we know all of the above, and assuming we want to be reasonably honest about acknowledging there’s a problem, how do we go about continuing to speak and reason referentially while knowing that the perfect reference we treat as real is actually impossible? The postmodernists have suggested provisional knowledge. And they’re not wrong. All knowledge has to be provisional. The rationalists have come up with the Bayesian ‘strong opinions weakly held’ and updating priors. And they’re not wrong. And the pragmatists have come up with conflating epistemology with ethics. I like them the most.

But these are just the general slogans of intent. What is really interesting (and actually useful) are the heuristics developed by each of these traditions.

The rationalists assume (implicitly) that perfect reference is indeed possible but very hard. They have come up (as the scholastics – Western and Eastern – before them) with a number of heuristics in the form of logical fallacies that help point out some of the paradoxes. They sort of present them as if avoiding these fallacies would avoid all problems. But while they help avoid a lot of problems, they don’t avoid all or even most, and they also create new ones. But simply dismissing them because of this would be foolish.

The postmodernists, on the other hand, focus on the impossibility of perfect reference and emphasize the provisionality of knowledge. They have developed a lot of deconstructive techniques to direct the mind to the boundaries of possibility. They almost write poetry about the abundance of everything and the futility of its conquest (Feyerabend being one of the most eloquent here). But they tend to reject even some of the more useful heuristics and are very likely to drown in bullshit. The rationalists are prone to nonsense as well, but I think the profound embarrassment of the Sokal hoax is unique to the postmodernists. The rationalists just assume that the infinite regress can be halted if we put up enough barriers of logic in its way, but postmodernists are sometimes all too happy to see something rhyme and don’t care whether it could be made reasonable sense of (albeit provisionally) with some simple rationalist heuristics.

Then there are the pragmatists. They are closest to my heart, and I think Rorty pretty much said everything that I ever wanted to say. They emphasize the contingency of knowledge on situation and social commitments. But unlike the postmodernists, they are happy to take provisional stances for something and do something specific with them. When James spoke about the importance of commitments to others as being the foundation of epistemology, he touched on something fundamental. I came up with the slogan ‘epistemology is ethics’ without knowing about James or the details of Rorty’s analysis, but when I read Philosophy and the Mirror of Nature, I knew Rorty and I were soul mates.

But I think Rorty was too quick to dismiss epistemology. He rightly took it down a lot of pegs and showed the impossibility of an ultimate epistemological theory. But he did not give it enough credit for thinking through some of the impossible problems, even while assuming they were merely very hard. His ‘liberal ironist’ stance in later essays is a good practical application of the core insight, but again, it does not give enough room to the basic heuristics.

That makes it much easier for traditional epistemologists and scientists to dismiss him as irrelevant, while in fact he speaks to the very core of their enterprise. But it feels to them like he is taking away the very foundations on which all of their heuristics stand, invalidating them along with it.

But Rorty should be viewed as therapeutic. If I can hope to add anything to Rorty, it is this. Similar to the New Wittgenstein studies. Every time we run into a referential paradox, we can take solace in its totality and turn away from the brink. We can also simply save time and not worry about justifying where we stop following the referential regress. But we can also leave ourselves an out by remembering that we stopped simply for pragmatic reasons. And if new reasons (contingencies) appear, we can resume our journey along the infinite referential web.

Serenity through disciplined conversation

What I am ironically calling ‘theory of everything’ is designed to do just that. Acknowledge that there is a problem and that there’s nothing that can be done about it.

Very much like Alcoholics Anonymous. The difference is that the wisdom to tell the difference between things we can and cannot do something about is not revealed by a deity but is a constant subject of disciplined conversation. Conversation that reflects the contingencies of the present as much as those of the past. A conversation that cannot have an end but in which we must inevitably take part. The serenity one hopes to get out of this will not come from resignation but from an embracing of the totality without assuming that we can grasp its every possible aspect.

This is the therapy Frege needs. As do we all.

How to read ‘Women, Fire and Dangerous Things’: Guide to essential reading on human cognition

Note:

These are rough notes for a metaphor reading group, not a continuous narrative. Any comments, corrections or elaborations are welcome.

Why should you read WFDT?

Women, Fire, and Dangerous Things: What Categories Reveal About the Mind is still a significantly underappreciated and (despite its high citation count) not-enough-read book that has a lot to contribute to thinking about how the mind works.

I think it provides one of the most concise and explicit models for how to think about the mind and language from a cognitive perspective. I also find its argument against the still prevalent approach to language and the mind as essentially fixed objects very compelling.

The thing that has been particularly underused in subsequent scholarship is the concept of ‘ICMs’ or ‘Idealised Cognitive Models’, which both puts metaphor (the work for which Lakoff is best known) in its rightful context and outlines what we should look for when we think about things like frames, models, scripts, scenarios, etc. Using this concept would have avoided many undue simplifications in work in the social sciences and humanities.

Why this guide

Unfortunately, the concision and explicitness I extolled above is surrounded by hundreds of pages of arguments and elaborations that are often less well thought out than the central thesis and have been a vector for criticism (I’ve responded to some of these in my review of Verena Haser’s book).

As somebody who translated the whole book into Czech and penned extensive commentary on its relevance to the structuralist linguistic tradition, I have perhaps spent more time with it than most people other than the author and his editors.

Which is why, when people ask me whether to read it, I usually recommend an abbreviated tour of the core argument with some selections depending on the individual’s interest.

Here are some of my suggestions.

Chapters everyone should read

Chapters 3, 4, 5, 6 – Core contribution of the book – Fundamental structuring principles of human cognition

These four chapters summarize what I think everybody who thinks about language, mind and society should know about how categories work. Even if it is not necessarily the last word on every (or any) aspect, it should be the starting point for inquiry.

All the key concepts (see below) are outlined here.

Preface and Chapter 1 – Outline of the whole argument and its implications

These brief chapters lay out succinctly and, I think, very clearly the overall argument of the book and its implications. This is where he outlines the core of the critique of objectivism, which I think is very important (if itself open to criticism).

Chapter 2: Precursors

This is where he outlines the broader panoply of thinkers and research outcomes in recent intellectual history whose insights this book tries to systematise and take further.

The chapter takes up some of the key thinkers who have been critical of the established paradigm. Read it not necessarily for understanding them but for a way of thinking about their work in the context of this book.

Case studies

The case studies represent a large chunk of the book and few people will read all 3. But I think at least one of them should be part of any reading of the book. Most people will be drawn to number 1 on metaphor, but I find that number 2 shows off the key concepts in most depth. It will require some focus and patience from non-linguists, but I think it is worth the effort.

Case study 3 is perhaps too linguistic (even though it introduces the important concept of constructions) for most non-linguists.

Key concepts

No matter how the book is read, these are the key concepts I think people should walk away understanding.

Idealized Cognitive Models (also called Frames in Lakoff’s later work)

I don’t know of any more systematic treatment of how our conceptual system is structured than this. It is not necessarily the last word but should not be overlooked.

Radial Categories

When people talk about family resemblances they ignore the complexity of the conceptual work that goes into them. Radial categories give a good sense of that depth.

Schemas and rich images

While image schemas are still a bit controversial as actual cognitive constructs, Lakoff’s treatment of them alongside rich images shows the importance of both as heuristics to interpreting cognitive phenomena.

Objectivism vs Basic Realism

Although objectivism (nothing to do with Ayn Rand) is not a position taken by any practicing philosophers and feels a bit straw-manny, I find Lakoff’s outline of it eerily familiar as I read works across the humanities and social sciences, let alone philosophy. When people read the description, they should avoid dismissing it with ‘of course nobody thinks that’ and reflect on how many people approach problems of mind and language as if they did think that.

Prototype effects and basic-level categories

These concepts are not original to Lakoff but are essential to understanding the others.

Role of metaphor and metonymy

Lakoff is best known for his earlier work on metaphor (which is why figurative language is not a key concept in itself), but this book puts metaphor and metonymy in the perspective of broader cognition.

Embodiment and motivation

Embodiment is an idea thrown around a lot these days. Lakoff’s is an important early contribution that shows some of the actual interaction between embodiment and cognition.

I find it particularly relevant when he talks about how concepts are motivated but not determined by embodied cognition.

Constructions

Lakoff’s work was taking shape alongside Fillmore’s work on construction grammar and Langacker’s on cognitive grammar. While the current construction grammar paradigm is much more influenced by those, I think it is still worth reading Lakoff for his contribution here. Particularly case studies 2 and 3 are great examples of the power of this approach.

Additional chapters of interest

Elaborations of core concepts

Chapters 17 and 18 elaborate on the core concepts in important ways but many people never reach them because they follow a lot of work on philosophical implications.

Chapter 17 on Cognitive Semantics takes another, deeper look at ICMs (idealized cognitive models) across various dimensions.

Chapter 18 deals with the question of how conceptual categories work across languages in the context of relativism. The name of the book is derived from a non-English example, and this chapter takes the question of universals and language specificity head on. Perhaps not in the most comprehensive way (the debate on relativism has moved on), but it illuminates the core concepts further.

Case studies

Case Studies 2 and 3 should be of great interest to linguists. Not because they are perfect but because they show the depth of analysis required of even relatively simple concepts.

Philosophical implications

Lakoff is not shy about placing his work in the context of the disruption of the reigning philosophical paradigm of his (and to a significant extent our) day. Chapter 11 goes into more depth on how he understands the ‘objectivist paradigm’. It has been criticised for not representing actual philosophical positions (which he explicitly says he’s not doing), but I think it’s representative of many actual philosophical and other treatments of language and cognition.

This is then elaborated in chapters 12 – 16 and of course in his subsequent book with Mark Johnson Philosophy in the Flesh. I find the positive argument they’re making compelling but it is let down by staying on the surface of the issues they’re criticising.

What to skip

Where Lakoff (and elsewhere Lakoff and Johnson) most open themselves to criticism is their relatively shallow reading of their opponents. Most philosophers don’t engage with this work because they don’t find it speaks their language, and when it does, it is easily dismissed as too light.

While I think that the broad critique this book presents of what it calls ‘objectivist approaches’ is correct, I don’t recommend that anyone take the details too seriously. Lakoff simultaneously gives them too little and too much attention. He argues against very small details but leaves too many gaps.

This means that those who should be engaging with the very core of the work’s contribution fixate on errors and gaps in his criticism and feel free to dismiss the key aspects of what he has to say (much to their detriment).

For example, his critique of situational semantics leaves too many gaps and left him open to successful rejoinders even if he was probably right.

What is missing

While Lakoff engages with cognitive anthropology (and he and Johnson acknowledge their debts in the preface to Metaphors We Live By), he does not reflect the really interesting work in this area. Goffman (shockingly) gets no mention, nor does Victor Turner, whose work on liminality is a pretty important companion.

There’s also little acknowledgement of work on texts such as that by Halliday and Hasan (although that was arguably still waiting for its greatest impact in the mid 1980s with the appearance of corpora). Lakoff and most of the researchers in this area stay firmly at the level of the clause. But given that my own work mostly focuses on discourse and text-level phenomena, I would say that.

What to read next

Here are some suggestions for where to go next for elaborations of the key concepts or ideas with relevance to those outlined in the book.

  • Moral Politics by Lakoff launched his forays into political work, but I think it’s more important as an example of this way of thinking applied for a real purpose. He replaces Idealized Cognitive Models with Frames but shows many great examples of them at work. Even if it falls short as an exhaustive analysis of the issues, it is very important as a methodological contribution on how frames work in real life. I think of it almost as a fourth case study to this book.
  • The Way We Think by Gilles Fauconnier and Mark Turner provides a model of how cognitive models work ‘online’ during the process of speaking. Although it has made a more direct impact in the field of construction grammar, its importance is still underappreciated outside it. I think of it as an essential companion to the core contribution of this book. Lakoff himself draws on Fauconnier’s earlier work on mental spaces in this book.
  • Work on construction grammar This book was one of the first places where the notion of ‘construction’ in the sense of ‘construction grammar’ was introduced. It has since developed into its own substantive field of study, driven by others. I’d say the work of Adele Goldberg is still the best introduction, but for my money William Croft’s ‘Radical Construction Grammar’ is the most important. Taylor’s overview of the related ‘Cognitive Grammar’ is also not a bad next read.
  • Work on cognitive semantics There is much to read here. Talmy’s massive 2 volumes of ‘Cognitive Semantics’ are perhaps the most comprehensive but most of the work here happens across various journals. I’m not aware of a single shorter introduction.
  • Philosophy and the Mirror of Nature by Richard Rorty is a book I frankly wish Lakoff had read. Rorty’s taking apart of philosophy’s epistemological imaginings is very much complementary to Lakoff’s critique of ‘objectivism’, but done while engaging deeply with the philosophical issues. While I basically go along with Lakoff’s and later Lakoff and Johnson’s core argument, I can see why it could be more easily dismissed than Rorty’s. Of course, Rorty’s work is also better known by its reputation than deeply reflected in much of today’s philosophy. Lakoff and Johnson’s essential misunderstanding of Rorty’s contribution, and of its fundamental compatibility with their project, in Philosophy in the Flesh is an example of why so many don’t take that aspect of their work seriously. (Although they are right that both Rorty and Davidson would have been better served by a less impoverished view of meaning and language.)

Hacking Metaphors, Frames and Other Ideas