All posts by Dominik Lukeš

Turing tests in Chinese rooms: What does it mean for AI to outperform humans

TL;DR

  • Reports that AI beat humans on certain benchmarks or very specialised tasks don’t mean that AI is actually better at those tasks than any individual human.
  • They certainly don’t mean that AI is approaching the task with any of the same understanding of the world people do.
  • People actually score 100% on these tasks when they are administered individually under ideal conditions (no distraction, typical cognitive development, enough time, etc.). They start making errors only when we give them too many tasks in too short a time.
  • This means that just adding more of these results will NOT cumulatively approach general human cognition.
  • But it may mean that AI can replace people on certain tasks that were previously mistakenly thought to require general human intelligence.
  • All tests of artificial intelligence suffer from Goodhart’s law.
  • A test more closely resembling an internship or an apprenticeship than a gameshow may be a more effective version of the Imitation Game.
  • Worries about ‘superintelligence’ are very likely to be irrelevant because they are based on an unproven notion of arbitrary scalability of intelligence and ignore limits on computability.

Reports of my intelligence have been greatly exaggerated

Over the last few years, there have been various pronouncements about AI being better than humans at tasks such as image recognition, speech transcription, or even translation. And that’s not even taking into account the bogus winners of the Turing test challenge. To make things worse, there’s always the implication that this means machine learning is getting closer to human learning and that artificial intelligence is only a step away from going general.

All of those reports were false. Every. Single. One. How do we know this? Well, because none of them were followed by “and therefore we have decided to replace all humans doing job X with machine learning algorithms”. But even if this were the case, it still would not necessarily mean that the algorithm outperforms humans at the task. Just that it can outperform them when the task is repeated time after time and the algorithm ends up making fewer mistakes because, unlike people, it does not get tired or distracted, or simply tick the wrong box.

But even if the aggregate number of errors is lower for a machine learning algorithm, it may still not make sense to use it, because it makes qualitatively different errors. Errors that are more random and unpredictable are worse than more systematic errors that can be corrected for. Also, AI has no metacognitive mechanisms that would let it identify its own errors by doing a ‘sense check’. This often makes AI-generated transcripts difficult to correct, because their errors don’t make intuitive sense.

Pattern matching in radiology and law

The closest machine learning has gotten to outperforming humans doing real jobs is in radiology. (I’m discounting games like Go here.) But even here it only equalled the performance of the best experts, though that could easily be enough. Still, interpreting X-rays is an extremely specialised task that requires lots of training and has a built-in error rate. It is a pattern recognition exercise, not a general reasoning exercise. All the general reasoning about the results of the X-rays is still left to the human physician.

In a similar instance, AI was able to spot inconsistencies in complex contracts better than lawyers. Again, this is very plausible, but again this was a pattern-matching exercise, with a machine pitted against human distractibility and stamina. Definitely impressive, useful, and not something expected even a few years ago. But it is not in any meaningful way replacing the lawyer, any more than a contract template I downloaded from the internet does.

This is definitely a case where an AI can significantly augment what an unassisted human can do. And while it will not replace radiologists or lawyers as a category, it could certainly greatly decrease their numbers.

Machine learning to the test

So on very specialised tasks involving complex pattern recognition, we could say that AI can genuinely outperform humans.

But in all the instances involving language and reasoning tasks, even if an AI beats humans on a test, it does not actually ‘outperform’ them on the task. That’s because tests are always imperfect proxies for the competence they measure.

For example, native speakers often don’t get 100% on English proficiency tests and can even do worse than non-native speakers in certain contexts. Why? Two reasons: 1. They can imagine contexts not expected of non-native speakers. 2. The non-native speakers have practised taking these tests a lot, so they make fewer formal mistakes.

We face exactly the same problem when comparing machine learning and human performance on tests designed to evaluate machine learning. Humans are the native speakers: they perform at 100% on all these tasks in their daily lives, but their performance seems less than perfect under test conditions.

BLEU and overblown claims about Machine Translation

Sometimes the problem is a poorly designed test. This is the case with the common machine translation metric called BLEU (Bilingual Evaluation Understudy). BLEU essentially measures how many words and short word sequences (n-grams) a machine translation shares with one or more human reference translations. It is obvious that this is not a good metric of translation quality. It can easily assign a lower score to a good translation and a high score to a patently bad one. For instance, it treats a missing ‘not’, which reverses the meaning of a sentence, like any other missing word.
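To make this concrete, here is a minimal sketch of sentence-level BLEU in Python, assuming a single reference translation, whitespace tokenisation and no smoothing. Real BLEU implementations work at the corpus level and handle tokenisation and multiple references differently, so the numbers are illustrative only; the example sentences are made up for this illustration.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Share of candidate n-grams that also occur in the reference (counts clipped)."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
    return clipped / total


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU: geometric mean of 1..max_n precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = [modified_precision(cand, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing: one empty n-gram order zeroes the whole score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Penalise candidates that are shorter than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(log_mean)


reference = "the patient did not respond to the treatment"
wrong_meaning = "the patient did respond to the treatment"      # drops 'not'
good_paraphrase = "the treatment had no effect on the patient"  # same meaning

print(round(sentence_bleu(wrong_meaning, reference), 2))    # 0.52
print(round(sentence_bleu(good_paraphrase, reference), 2))  # 0.0
```

The translation that flips the meaning keeps more than half of the maximum score, while a faithful paraphrase that happens to use different words scores zero.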

What human translators do is translate whole texts NOT sentences. This sometimes means they drop things, add things, rearrange things. This involves a lot of judgment and therefore no two translations are ever the same. And outside trivial cases they’re never perfect. But a reliable translator can make sure they convey the key message and they could provide footnotes to explain where this was not possible. Machine learning can get surprisingly good at translating texts by brute force. But it is NOT reliable because it operates with no underlying understanding of the overall meaning of the text.

That’s why we can easily dismiss Microsoft’s claim that its English-to-Chinese interpreter outperformed human translators. They could only make that claim because they relied on the BLEU metric rather than having professional translators evaluate the AI output against the work of other professional translators. And since Microsoft has yet to announce that it is no longer using human interpreters when its executives visit China, we can safely assume that this ‘outperformance’ is not real.

Now, could machine translation ever get good enough to replace human translators? Possibly. But it is still very far from that for texts of any complexity. Transformer models are very promising for improving translation quality, but they still only match patterns. To translate, you need to make quite rich inferences, and we’re nowhere near that.

GLUE and machine understanding come unstuck

Speaking of inferences: how good is AI at making those? Awful. Here we have another metric to look at: GLUE! Unlike BLEU, which is a really bad representation of translation quality, GLUE (General Language Understanding Evaluation) is a really good representation of human intelligence. If you wanted to know what the components of human intelligence are, you could do a lot worse than look at the GLUE test.

But the GLUE leaderboard has a human benchmark, and it comes 4th with a score of 87.1. This puts it 1.4 points behind the leader, Facebook, at 88.5. So, it’s done. AI has not only reached the human level of reasoning, it has surpassed it! Of course not. Apart from the fact that we don’t know how much of a difference in reasoning ability a point or so represents, this tells us nothing about the human ability to reason compared to that of a machine learning model. Here’s why.

How people and machines make errors

I would argue that a successful machine learning algorithm does not actually outperform humans on these tasks even if it scored 100%, because humans also score 100%, and they devised the test in the first place.

Isn’t this a contradiction? How can humans get 100% if they consistently score in the mid-80s when given the test? Well, humans designed the test and the correctness criteria. And a machine learning algorithm must match the best human on every single answer to equal them. The benchmark here is just an average of many people over many answers, and it reflects not just the human ability to reason but also the human ability to take tests.

Let’s explain by comparing what it means when a human makes an error on a test and when a machine does. There are three sources of human error: 1. an erroneous choice despite knowing the right answer (i.e. clicking a when meaning to click b), 2. lack of attention (i.e. choosing a because we didn’t spend enough time reading the task to choose correctly), 3. overinterpretation (providing context in our head that makes the incorrect answer make sense).

These benchmarks are not Mensa tests; they measure what all people with typical linguistic and cognitive development can do. Take the Winograd Schema test. Here’s an often-quoted example:

The trophy didn’t fit into the suitcase because it was too big.
The trophy didn’t fit into the suitcase because it was too small.

It is very possible that out of 100 people, 5 would get this wrong because they click the wrong answer, 10 because they didn’t process the sentence structure correctly and 1 because they constructed a scenario in their head in which it is normal for suitcases to be smaller than the thing in them (as in Terry Pratchett’s books).

But not a single one would get it wrong because they thought that a thing can be bigger than the thing it fits in.
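The arithmetic behind this illustration can be spelled out in a few lines of Python. The counts are the hypothetical ones from the example above, not measured data; the point is that a group in which nobody lacks the underlying competence still produces an aggregate score in the mid-80s.

```python
# Hypothetical error counts per 100 test-takers, from the illustration above.
errors_by_source = {
    "misclick": 5,            # knew the answer, clicked the wrong option
    "misparse": 10,           # did not process the sentence structure correctly
    "overinterpretation": 1,  # imagined a context in which the wrong answer works
    "lack_of_competence": 0,  # nobody thinks a thing can be bigger than what it fits in
}

test_takers = 100
wrong = sum(errors_by_source.values())
accuracy = (test_takers - wrong) / test_takers

print(f"aggregate accuracy: {accuracy:.0%}")                            # aggregate accuracy: 84%
print(f"competence errors: {errors_by_source['lack_of_competence']}")  # competence errors: 0
```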

Now, when a machine learning model gets it wrong, it does so because it miscalculated a probability based on an opaque feature set it constructed from lots of examples. When you put 2 people together, they can always figure out the right answer and discuss why one of them got it wrong. No machine learning algorithm can do that.

This becomes even more obvious when we take an example from the actual GLUE benchmark:

Maude and Dora had seen the trains rushing across the prairie, with long, rolling puffs of black smoke streaming back from the engine. Their roars and their wild, clear whistles could be heard from far away. Horses ran away when they came in sight.

So what does ‘they’ refer to here? The obvious candidate is ‘trains’. But it is easy to imagine that a person could click the option where ‘puffs of black smoke’ or even ‘Maude and Dora’ are the antecedent. That’s because both of those can be ‘seen’ and could theoretically cause horses to run away. If this is the 10th sentence I’m parsing in one go, I may easily shortcut the rather complex syntactic processing. I can even see someone choosing “whistles”: whistles cannot “come in sight”, but they are a very strong candidate for causing horses to run away. But nobody would choose ‘horses’ unless they misclicked. A machine learning algorithm could very easily do this, simply because ‘they’ and ‘horses’ match grammatically.

But all of this is actually irrelevant because of how the ML algorithms are tested. They are given multiple pairs of sentences and asked to answer 1 or 0 on whether they match, that is, whether the second sentence follows from the first. So some candidate sentences for the passage above are “Horses ran away when the trains came in sight.”, “Horses ran away when Maude and Dora came in sight.” or “Horses ran away when the whistles came in sight.” What the test does NOT do is ask “Which of the words in the sentence does ‘they’ refer to?”, because the ML model has no understanding of such questions. You would have to train it for that task separately or just write a sequential algorithm to process these questions.
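A minimal sketch of that reformulation in Python, assuming a simple (premise, hypothesis, label) triple rather than the dataset’s actual file format: the pronoun is replaced by each candidate antecedent, and the model only ever sees yes/no judgements on the resulting sentence pairs. The candidate list and the gold answer follow the discussion above; the helper name `make_pairs` is hypothetical.

```python
from typing import List, Tuple

PREMISE = (
    "Maude and Dora had seen the trains rushing across the prairie, with long, "
    "rolling puffs of black smoke streaming back from the engine. Their roars and "
    "their wild, clear whistles could be heard from far away. Horses ran away when "
    "they came in sight."
)
TEMPLATE = "Horses ran away when {} came in sight."


def make_pairs(premise: str, template: str, candidates: List[str],
               gold: str) -> List[Tuple[str, str, int]]:
    """Turn one pronoun-resolution question into several binary entailment pairs."""
    return [(premise, template.format(c), int(c == gold)) for c in candidates]


pairs = make_pairs(PREMISE, TEMPLATE,
                   ["the trains", "Maude and Dora", "the whistles"],
                   gold="the trains")
for _, hypothesis, label in pairs:
    print(label, hypothesis)
# 1 Horses ran away when the trains came in sight.
# 0 Horses ran away when Maude and Dora came in sight.
# 0 Horses ran away when the whistles came in sight.
```

Nowhere in this format does the question “what does ‘they’ refer to?” appear; the model is scored only on the 1s and 0s.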

What people running these contests also cannot do is ask the model to explain its choice in a way that would show some understanding. There is a lot of work being done on interpretability, but it just spits out a bunch of parameters that have to be interpreted by people. Game, set and match to humans.

Chinese room revisited

But let’s also think about what it means for a neural network model to get things right. This brings us back to Searle’s famous Chinese room argument. Every single choice a model makes has a probability assigned to it, and even quite ridiculous choices have a non-zero chance of being right according to the model. Let’s look at another common example:

The animal didn’t cross the road because it was too busy.

Here it is sensible to assign ‘it’ to ‘road’ because that makes the most sense, but one could imagine a context in which it refers to ‘the animal’. Animals can be thought of as busy, and we can imagine that this could be a reason for not crossing the road. But we know with 100% certainty that it does not refer to ‘the’ or even ‘cross’. Yet a neural model has no such assurance. It may never choose ‘the’ as the antecedent of ‘it’ in practice, but it will never completely discount it, either.
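One reason a neural model never fully discounts an option can be seen in the softmax function that typically turns its raw scores into probabilities: every exponential is positive, so every candidate gets a probability above zero, however implausible. The sketch below uses made-up scores (logits) for the candidate antecedents of ‘it’; they do not come from any actual model. (In floating point, extremely low scores can underflow to exactly 0.0, but that is a rounding artefact, not the model ruling the option out.)

```python
import math
from typing import Dict


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


# Hypothetical scores for what 'it' refers to in
# "The animal didn't cross the road because it was too busy."
logits = {"road": 4.0, "animal": 2.0, "cross": -3.0, "the": -6.0}
probs = softmax(logits)

for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word:>6}: {p:.6f}")  # 'the' comes out tiny, but never exactly zero
```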

So even if the model got everything right, we could hardly think of it as making human-like inferences unless it could label certain antecedents as having 0% probability and others (much rarer) as having 100%. (Note: Programming it to change 10% to 0% or 90% to 100% does not count.)

This feels like a very practical expression of Searle’s Chinese room argument, albeit in a weak form. Neural networks pose a challenge to Searle because their algorithmic guts are not as exposed as those of the expert systems of Searle’s time. But we can still see echoes of their lack of actual human-like reasoning in their scores.

Is a test of artificial intelligence possible under Goodhart’s Law?

I once attended a conference on AI risk where a skeptic said he wasn’t going to worry “until an AI could do Winograd schemas”. This referred to a test of common sense and linguistic ambiguity that AIs have long been famously bad at. Now Microsoft claims to have developed a new AI that is comparable to humans on this measure. (Scott Alexander)

This post was inspired by the above remark by Scott Alexander. I wanted to explain why even the Winograd challenge being conquered is not enough in and of itself.

AI proponents constantly complain of sceptics’ shifting standards. When AI achieves a benchmark, everybody scrambles to find something else that could be required of it before it gets a pass. And I admit that I may have made a claim similar to that of the AI researcher quoted by Scott Alexander when I was writing about the Winograd schemas.

But the problem here is not that machines became intelligent and everybody is scrambling to deny the reality. The problem is that they got better at passing the test in ways that nobody envisioned when the test was designed. All this while taking no steps towards actual intelligence, although perhaps gaining some practical utility.

This is the essence of Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.” The Winograd Schema Challenge seemed so perfect. Yet I can imagine a machine learning model getting good at passing the challenge but still not actually having any of the cognition necessary to really deal with the tasks in real life, in the same way that IBM Watson got really good at Jeopardy but failed at everything else.

None of this is to say that machine learning could not get good enough at performing many tasks that were previously thought to require generalised cognitive capacity. But when machines actually achieve human-level artificial intelligence, we will know. It will not be that hard to tell. But it is unlikely to happen just because we’re doing more of the same.

The problem with the Turing test or imitation game is not that it cannot produce reliable results on any one run of it. The problem is that if any single test becomes not only the measure but also a target, it is entirely possible to focus on passing the test on the surface while bypassing the underlying abilities the test is meant to measure. But the problem is not just with the individual tests but rather with the illusion that we can design a test that will determine AGI-level performance simply by reaching an arbitrary threshold.

The current Turing test winners won by misdirection that hid the fact that they refused to answer the questions. This could be fixed by requiring that the maxims of Grice’s cooperative principle are observed (especially quality and relevance), but even then, I could see a system trained to handle a single time-bound conversation passing without any underlying intelligence.

As Scott Aaronson showed, it is possible to defeat a current-level AI system simply by asking ‘Which is bigger, a shoebox or Mount Everest?’ But once a pattern of questioning becomes known, it becomes a target and therefore a bad measure.

Similar things happen with all standardised aptitude tests designed so that they cannot be studied for, and with job interview techniques designed to get interviewees to reveal their inner strengths and weaknesses. All of these immediately spawn industries of prep schools, instructional guides, etc. That makes them less useful over time (assuming they were all that useful to start with).

Towards a test by Critical Turing Internship

That’s why the Turing test cannot be a ‘test’ in the traditional sense. At the very least, it cannot be a single test.

History and a lot of human-computer interaction research have also shown that people are very bad at administering the Turing test (or playing the imitation game). But this is, paradoxically, because they’re very good at the very thing the machines have been failing at: meaning making. Because we almost never encounter meaningless symbols but often encounter incomplete ones, we are conditioned to always infer some sort of meaning from any communication. And it is difficult, if not impossible, to turn this off.

Every time we see a bit of language we automatically imbue it with some meaning. So, any Turing tester must not only be trained in the principles of cognition but also to discard their own linguistic instincts. We don’t know what it will take for a machine to become truly intelligent but we do know that humans are notoriously bad at telling machines apart from other humans. We simply cannot entrust this sort of thing to such feeble foundations.

As I said above, I suspect that by the time machines do achieve human-level performance on these tasks, it will be obvious. We probably won’t need such a test. Assuming we get there, which is not a given. But if a test were needed, it could look something like this.

To replace the Turing test, I would like to propose a sort of Turing Internship. We don’t entrust critical tasks in fields like medicine to people who have just passed a test; we require them to prove themselves in a closely supervised context. In the same way, we should not trust any AI system based on a benchmark.

Any proposed human-level AI system could be placed in multiple real contexts with several well-informed human supervisors who would monitor its performance for weeks or months, to allow any tricks to be exposed. For example, after a few weeks with Alexa, Google Assistant or Siri, most people get a clear picture of its strengths and limitations. Five minutes with Alexa may make you feel like the singularity is here. Five months will firmly convince you that it is nowhere in sight.

But at the moment, we don’t need this. We don’t need months or weeks to evaluate AI for human-level intelligence. We need minutes. I estimate that we will not need this kind of AI internship for at least another 50 years, and likely for much longer. We are too obsessed with the rapid progress of some basic technologies and ignore many examples of stagnation. My favourite here is the Roomba, which has been on the market for 17 years now and has hardly progressed at all. Equally, the current NLP technologies have made massive strides in utility but have not progressed towards anything that could be meaningfully described as understanding.

That is not to say that tests like GLUE or even BLEU are completely useless. They can certainly help us compare ML approaches (up to a point). They’re just useless for comparing human performance with that of machine learning models.

Note on Nick Bostrom and Superintelligence

One obvious objection to the Turing Internship idea is that if human-level AI is the last step before Bostrom’s ‘Superintelligence’, unleashing it in any real context would be extremely dangerous.

If you believe in this ‘demon in the machine’ option, there’s nothing I can do to convince you. But I personally don’t find Superintelligence in any way persuasive. The reason is that most of the scenarios described are computationally infeasible in the first place. Bostrom barely mentions the issue of computability or questions like P vs NP. And he completely ignores questions of nonlinear complexity.

It is hard to judge whether a ‘superintelligent’ system could take over the world. But could it predict the weather 20 days out with a 1% tolerance on temperature estimates in any location? The answer is most likely not. There may not be enough atoms in the universe to compute the weather arbitrarily precisely more than a few days in advance. Could it predict earthquakes? Could it run an economy more efficiently than an open market relying on price signals? The answers to all those questions are most likely no. Not because the superintelligence is not super enough, but because these may not be problems that can be solved by adding ‘more’ intelligence, assuming that ‘intelligence’ is a linearly scalable property in the first place. It may well be like body size: after a certain amount of increase, it would just collapse onto itself.
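The point about nonlinear complexity can be illustrated with a standard toy model of chaos, the logistic map x -> r*x*(1 - x) with r = 4. It is not a weather model, only a stand-in for the general behaviour: two starting values that differ by one part in ten billion soon produce completely different trajectories, so every extra step of forecast demands ever more precise initial measurements. The function name and parameters below are illustrative.

```python
from typing import List


def logistic_trajectory(x0: float, r: float = 4.0, steps: int = 100) -> List[float]:
    """Iterate the logistic map x -> r * x * (1 - x), returning every state."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)  # a measurement error of one part in ten billion
gaps = [abs(x - y) for x, y in zip(a, b)]

for step in (0, 10, 20, 30, 40, 50):
    print(step, f"{gaps[step]:.2e}")
# At r = 4 the gap roughly doubles per step on average, so within a few dozen
# steps the two "forecasts" no longer have anything to do with each other.
```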

Superintelligence requires a conspiracy theorist’s mindset. Not that people who believe in it are conspiracy theorists. But they assume that complexity can be conquered with intelligence. They don’t believe that humans are ‘smart’ enough to control everything, but they believe that it is inherently possible. Everything we know about complexity suggests that this is not the case. And that is why I’m not worried.

Fruit loops and metaphors: Metaphors are not about explaining the abstract through concrete but about the dynamic process of negotiated sensemaking

Note: This is a slightly edited version of a post that first appeared on Medium. It elaborates on and illustrates examples I gave in the more recent posts on metaphor and on explanation and understanding.

One of the less fortunate consequences of the popularity of the conceptual metaphor paradigm (which is also the one I by and large work with on this blog) is the highlighting of the embodied metaphor at the expense of others. This gives the impression that metaphors are there to explain more abstract concepts in terms of more concrete ones.

Wikipedia: “Conceptual metaphors typically employ a more abstract concept as target and a more concrete or physical concept as their source. For instance, metaphors such as ‘the days [the more abstract or target concept] ahead’ or ‘giving my time’ rely on more concrete concepts, thus expressing time as a path into physical space, or as a substance that can be handled and offered as a gift.”

And it is true that many of the more interesting conceptual metaphors that help us frame the fundamentals of language are projections from a concrete domain to one that we think of as more abstract. We talk about time in terms of space, emotions in terms of heat, thoughts in terms of objects, conversations as physical interactions, etc. We can even deploy this aspect of metaphor in a generative way, for instance when we think of electrons as a crowd of little particles.

But I have come to view this as a very unhelpful perspective on what metaphor is and how it works. Instead, going back to Lakoff’s formulation in Women, Fire, and Dangerous Things, I’d like to propose we think of a metaphor as a principle that helps us give structure to our mental models (or frames). But unlike Lakoff, I like to think of this as an incredibly dynamic and negotiated process rather than as a static part of our mental inventory. And I like to use conceptual integration or blending as a way of thinking about the underlying cognitive processes.

Metaphor does two things: 1. It helps us (re)structure one conceptual domain by projecting another conceptual domain onto it and 2. In the process of 1, it creates a new conceptual domain that is a blend of the two source domains.

We do not really understand one domain in terms of another through metaphor. We ‘understand’ both domains in different ways. And this helps us create new perspectives which are themselves conceptual domains that can be projected or projected into. (As described by Fauconnier and Turner in The Way We Think).

This makes sense when we look at some of the conventional examples used to illustrate metaphors. “The man is a lion” does not help us understand lesser known or more abstract ‘man’ by using the better known or more concrete ‘lion’. No, we actually know a lot more about men and the specific man we’re thus describing than we do about lions. We are just projecting the domain of ‘lions’ including the conventionalised schemas of bravery and fierceness onto a particular man.

This perspective depends on our conventionalised way of projecting these 2 domains. Comparison between languages illustrates this further. The Czech framing of lions is essentially the same as the English one, but the projection onto people also maps the lion’s vigour onto work to mean ‘hard-working’. So you can say “she works like a lion”, meaning she works hard. But in the age of documentaries about lions, a joke subverting the conventionalised mapping also appeared, and people sometimes say “I work like a lion. I roar and go take a nap.” This is something that could only emerge as more became conventionally known about lions.

But even more embodied metaphors do not always go in a predictable direction. We often structure affective states in terms of the physical world or bodily states. We talk about ‘being in love’ or ‘love hitting a rocky patch’ or ‘breaking hearts’ (where metonymy also plays a role). But does that really mean that we somehow know less about love than we know about travelling on roads? Love is conventionally seen as less concrete than roads or hearts, but here we allow ourselves to be misled by traditional terminology. The domain of ‘love’ is richly structured and does not ‘feel’ all that abstract to the participants. (I’d prefer to think of ‘love’ as a non-prototypical noun: more prototypical than ‘rationalisation’ but less prototypical than ‘cat’.)

Which is why ‘love’ can also be used as the source domain. We can say things like “The camera loves him.” and it is clear what we mean by it. We can talk about physical things “being in harmony” with each other and thus helping us understand them in different ways despite harmony being supposedly more abstract than the things being in harmony.

The conceptual domains that enter into metaphoric relationships are incredibly rich and multifaceted (nothing like the dictionaries or encyclopedias we often model linguistic meaning after). And the most important point of unlikeness is their dynamic nature. They are constantly adapting to the context of the listeners and speakers, never exactly the same from use to use. We have a rich inventory of them at our disposal but by reaching into it, we are also constantly remaking it.

We assume that the words we use have meanings, but it is we who have the meanings. The words and other structures just carry the triggers we use to create meanings in the process of negotiation with the world and our interlocutors.

But this sounds much more mysterious and ineffable than it actually is. These things are completely mundane and they are happening every time we open our mouths or our minds. Here’s a very simple but nevertheless illuminating illustration of the process.

Not too long ago, there were two TV shows that had some premise similarities (Psych and The Mentalist). One of them came out a year earlier and its creators were feeling like their premise was copied by the other one. And they used the following analogy:

“When you go to the cereal aisle in a grocery store, and you see Fruit Loops there. If you look down on the bottom, there’s something that looks just like Fruit Loops, and it’s in a different bag, and it’s called Fruity Loop-Os.” 

I was watching both shows at the time but their similarity did not jump out at me. But as soon as I read that comparison it was immediately clear to me what the speaker was trying to say. I could automatically see the projection between the two domains. But even though it seemed the cereal domain was more specific, it actually brought a lot more with it than the specificity of cereal boxes and their placement on store shelves. What it brought over was the abstract relationship between them in quality and value but also many cultural scripts and bits of propositional knowledge associated with cereal brands and their copycats.

But there was even more to it than that. The metaphor does not stop at its first outing (it’s kind of like mushrooms and their  in this way). Whenever I see a powerful analogy or generative metaphor on the internet, I always look for the comments where people try to reframe it and create new meanings, something I have been calling ‘frame negotiation’. Take almost any salient metaphoric domain projection and you will find that it is only one part of a process of negotiated sense making. This goes beyond people simply disagreeing with each other’s metaphors. It includes the marshalling of complex structuring conceptual phenomena, from schemas, rich images, scenarios and scripts to propositions, definitions, taxonomies and conventionalised collocations.

This blog post and its comments contain almost all of them: . First, the post author spends three paragraphs (from the third on) comparing the two shows and finding similarities and differences. This may not seem like anything interesting, but it reveals that the conceptual blends compressed in the cereal analogy are completely available and can be discussed as if they were a literal statement of fact.

Next, the commenters, who have much less space, return to debating the proposition by recompressing it into more metaphors. These are the first four comments in full:

  1. Anonymous said… They’re not totally different. It’s more like comparing Fruit Loops to Fruit Squares which happen to taste like beef.
  2.  said… I think a better comparison would Corn Flakes and Frosted Flakes. Both are made with the same cereal, but one’s sweeter (Psych).
  3.  said… Sweeter as in more comedy oriented? They are vastly different shows that are different on so many levels.
  4. Anonymous said… nikki could not be more right with the corn flakes and frosties analogy

Here we see the process of sense making in action. The metaphoric projection is used as one of several structuring devices around which frames are made. Comment 1 opens the process by bringing in the idea of reframing through other analogs in the cereal domain. 2 continues that process by offering an alternative. 3 challenges the very idea of using these two domains, and 4 agrees with 2 as if this were a literal statement while also referring to the metalinguistic tool being used.

The subsequent comments return to comparing the two shows, some by offering propositions and scenarios, others by marshalling a new analogy.

 said… The reason the Mentalist feels like House is because house is a modern day medical version of Homes as in Holmes Sherlock. Also both Psych and The Mentalist are both Holmsian in creation. That being said I love the wit and humor of psych

Again, there is no evidence of a concrete/abstract duality, or even of one between less and better-known domains. It is all about making sense of the domains in both cognitive and affective ways. Some domains have very shallow projections (partial mappings), such as cornflakes and frosty flakes; others have very deep mappings, such as Sherlock Holmes. The analogies are not providing new information or insight in the way we traditionally think metaphors do. Nor are they providing an explanation to the uninitiated. They are giving new structure to existing knowledge and thus recreating what is known.

The reason I picked such a seemingly mundane example is that all of this is mundane and it’s all part of the same process. One of my disagreements with much work on applying metaphor is that it overlooks the ‘boring’ bits surrounding the first time a metaphor is used. But metaphors are always part of complex textual and discursive patterns, and while they are not parasitic on the literal (as the traditional slight against them had it), they are also not the only thing that goes on when people make sense.

5 books on knowledge and expertise: Reading list for exploring the role of knowledge and deliberate practice in the development of expert performance


Recently, I’ve been exploring the notion of explanation and understanding. I was (partly implicitly) relying on the notion of ‘mental representations’ as built through deliberate practice. My plan was to write next about how I think we can reconceptualize deliberate practice in such a way that it draws on a richer conception of ‘mental representations’. But that is turning out to be a much longer project.

Meanwhile, in a recent conversation about teaching practitioners, somebody mentioned Kahneman’s ‘Thinking Fast and Slow’ as relevant to the problem and we discussed maybe starting a reading group. This got me thinking about what such a reading group should have on its reading list.

The literature on expertise is vast (just look at the Cambridge Handbook of Expertise and Expert Performance). In my proposed reading list, I would focus on identifying different perspectives on how our mental representations of the world are structured, how we develop them (or how we can help others develop them), how we solve problems with them, and how they are embedded in the social environment in which we function.

1. Thinking Fast and Slow by Daniel Kahneman (2011)

Kahneman’s famous book is not really focused on experts but rather on the limitations of our thought – summarised under the heuristics and biases banner. But Kahneman’s notion of ‘System 1’ (fast) and ‘System 2’ (slow) thinking is directly relevant to the question of expertise. Expertise means that one can think about complex issues quickly but also that one can analyze that same issue with deliberate attention to detail. Exactly how this applies to the question of educating experts is a matter of discussion that I think the other books on my list can help elucidate.

2. Peak: Secrets from the new science of expertise by Anders Ericsson with Robert Pool (2016)

In this book, Ericsson (helped by journalist Pool) outlines, in the concept of ‘deliberate practice’, a cognitive mechanism by which fast thinking is acquired without sacrificing deliberation. I propose that the key to understanding deliberate practice is not the process of practice but rather Ericsson’s rethinking of the target that the practice should help us achieve. According to Ericsson, what deliberate practice leads to is not knowledge or skill but rather ‘mental representations’. Mental representations are best thought of as chunks of knowledge (frames, scripts, schemas, etc. – which makes this approach overlap with Kahneman and Tversky’s work even though Ericsson does not mention this). They allow experts to perform complex mental operations on very rich subject domains, operations which would be beyond the computational powers of anyone’s raw intelligence alone. The best analogies are playing chess and speaking a language: neither is possible by simply knowing the rules; we need a rich complex of mental representations to compete at chess or to speak with any fluency.

3. The Way We Think: Conceptual Blending and the Mind’s Hidden Complexities by Gilles Fauconnier and Mark Turner (2002)

Where Kahneman provides the framework and Ericsson the mechanism of acquisition, Fauconnier and Turner offer us a much more detailed description of the actual structure of ‘mental representation’ and how it is used during live processing of information. Building on work in cognitive linguistics and semantics, they develop the notion of ‘conceptual integration’ (or ‘blending’ as it’s more popularly referred to in the field) that explains how multiple ‘mental spaces’ or ‘domains’ can be merged seemingly without any conscious effort into new domains (blends) that we can then build further understanding on.

In this context, I’d also recommend reading the parts of Lakoff’s ‘Women, Fire, and Dangerous Things’ that describe what he then called ‘Idealized Cognitive Models’ and now calls ‘frames’. The book is quite vast and not all of it is relevant to this question, which is why I wrote a guide to it.

4. Rethinking Expertise by Harry Collins and Robert Evans (2008)

What’s missing in all the works I’ve looked at so far is any awareness of the social embeddedness of expert performance. There is little discussion of types or levels of expertise and barely any mention of how experts interact with one another. In ‘Rethinking Expertise’, Collins and Evans propose what they call a ‘periodic table of expertise’ (which happens to overlap quite nicely with my 5 types of understanding). They think not just about specialist expert knowledge but also about what they call ‘ubiquitous expertise’ – all the underlying skills and knowledge required to even get started (such as languages, basic social skills, metacognition, etc.). Most importantly, they also pay attention to ‘meta-expertise’, i.e. how non-experts evaluate experts and how experts judge other experts.

Their notion of expertise relies on the concept of ‘tacit knowledge’ (later developed by Collins in a separate book) which is reminiscent of Ericsson’s ‘mental representations’ and echoes Kahneman, as well.

5. The Reflective Practitioner: How Professionals Think in Action by Donald A. Schön (1983)

While Schön’s book has had a profound impact in terms of citation and ways of thinking, I suggest that it has been largely under-appreciated for its depth of epistemological insight. Despite being more than two decades older than any of the other books on this list, it is very much still relevant. It considers the very nature of ‘practical knowledge’ as opposed to ‘academic knowledge’. Schön, more than any of the others, thinks about the needs of a person who has to accomplish practical tasks with their knowledge in a complex situation. He highlights the tensions between the technical preparation of experts, which focuses on knowledge about a subject, and the needs of a practitioner who must act in situations where simply recalling information would not be sufficient. His concept of ‘reflection-in-action’ could be seen as a precursor, or better still a companion, to the notion of ‘deliberate practice’.

Schön followed this up with Educating The Reflective Practitioner which focuses on the practical question of structuring a training course. Another reason to include Schön on this list is that he focuses more directly on ‘professional’ expertise.

Bringing it all together

What these books have in common is an underlying conception of knowledge and its processing. But what they lack is almost any awareness of each other. This makes them add up to more than just the sum of their parts.

Kahneman mentions Ericsson in a footnote and Ericsson and Collins appear jointly in the Cambridge Handbook I mentioned at the start. But they largely travel in separate spheres. Bizarrely, none of them refers to Schön. And all of them are completely unaware of Fauconnier and Turner, who in turn ignore the work done outside their field of cognition (even though we can trace the lineage of their work on cognitive domains directly to Schön’s earlier work on metaphor).

All these approaches are clearly converging on the same thing but they don’t do it using the same terminology, methods or even a shared conceptual framework. Which is why reading just any one of them would probably not be enough to get at the full scope of the issues involved.

I’m not certain that this selection is the most representative of the field. It is certainly not exhaustive and it is definitely shaped by my idiosyncratic intellectual journey and personal interests. But my hope is that it does triangulate the problem domain in a way that a more narrowly focused selection would not.

Writing as translation and translation as commitment: Why is (academic) writing so hard?


This book will perhaps only be understood by those who have themselves already thought the thoughts which are expressed in it—or similar thoughts. It is therefore not a text-book. Its object would be attained if there were one person who read it with understanding and to whom it afforded pleasure.
(opening sentences of the preface to Tractatus Logico-Philosophicus by Ludwig Wittgenstein, 1918)

Background

I’ve recently been commenting quite a lot on Inframethodology, Thomas Basbøll’s excellent academic writing blog (which I mostly read for the epistemology). Thomas and I disagree on a lot of details but we have a very similar approach to formulating questions about knowledge and its expression.

The recent discussion was around the problem of ‘writing as expressing what you know’. While I find it very useful to distinguish between writing to describe what you know and writing to explore and discover new ideas (something I first reflected on after reading Inframethodology), I commented:

I still find that no matter how well I think I know my subject, I discover new things by trying to write it down (at least with anything worth writing).

Thomas responded in a separate blogpost, first picking up on my parenthetical:

Can it really be true that the straightforward representation of a known fact is not “worth writing”? Is the value of writing always to be discovered (by way of discovering something new in the moment of writing)? I think Dominik is thinking of kinds of writing that are indeed very valuable because they present ideas that move our own thinking forward and, ideally, contribute positively to the thinking of our peers. But I also think there is value is writing that doesn’t do this, writing that is, for lack of a better word, boring.

With this, I agree wholeheartedly. 110% coach! Yes, this was a throwaway line I wasn’t comfortable with even as I was writing it. The majority of my writing is mundane: emails, instruction manuals, project proposals, etc. They may or may not be “worthy” but they certainly have a worth. And people who do nothing but that sort of writing certainly do not do anything I would find ‘beneath me’ or not worthy. I might have been better served by the term ‘quotidian’ or even ‘instrumental’ writing.

I agree even more with Thomas’s elaboration (my emphasis):

In fact, I think it’s the primary of value of academic writing and one of the reasons that so many people (and even academics themselves) almost equate “academic” (adj.) with “boring”. The business of scholarship is not to bring new ideas into the world, indeed, the function of distinctively academic work (in contrast to, say, scientific or philosophical or literary work) is not to innovate or discover but to critique, to expose ideas to criticism. In order for this happen efficiently and regularly, academics must spend some of their time representing ideas that are not especially exciting to them along with their grounds for entertaining them. They must present their beliefs to their peers along with their justification for thinking they’re true. And they must do this honestly, which is to say, they must not invent new beliefs or new reasons for holding them in the moment of writing. They must write down, not what they’re thinking right now, but what they’ve been thinking all along.

I find this an incredibly valuable perspective and when I think of my own writing, I think this is precisely where I’ve often been going wrong. This is partly because academic writing is more of a hobby than a job for me, so I don’t have the time to do more than write to discover. But it is partly because of my temperament. I don’t enjoy the boring duties of writing down things I know and then formatting them for submission to a journal. I prefer to work with editors, which is why the bulk of my published writing is in journalism or book chapters.

But there is still another aspect that needs to be explored, namely: why do most people find it so difficult to write down what they know, even taking all of the above into account?

Writing as translation

I propose that a good way to think about the difficulty of writing to describe our thoughts is to use the metaphor of translation. We can then think of the content of our thoughts in our head as a series of propositions expressed in some kind of ‘mentalese’. And when we come to write them down, we are essentially translating them into ‘writtenese’ or in this case, one of its dialects ‘academic writtenese’.

This is made more complicated by the existence of a third language – let’s call it ‘spokenese’. We are all natively bilingual in ‘mentalese’ and ‘spokenese’ even if not everybody is very good at translating between these two languages. In fact, children find it very difficult until quite late (age 10 and up) to coherently express what they think, and many adults never achieve great facility with this, just as many natively bilingual speakers are not very good at translating between their two languages.

But nobody is a native speaker of ‘writtenese’. Everybody had to learn it in school with all its weird conventions and specific processing requirements. It is not too outlandish to say (and I owe this to the linguist Jim Miller) that writing is like a foreign language. (Note: see some important qualifications below).

When we are translating from mentalese to academic writtenese, we are facing many of the same problems that translators between very different languages face. The one I want to focus on is ‘making commitments’.

Translation as commitment: Making the implicit explicit

Perhaps the most difficult problem for a translator (I speak as someone who has translated hundreds of thousands of words) is the issue of being forced by the way the target language operates to commit to meanings in the translation where the structure of the source language left more options for interpretation.

Let’s take a simple paragraph consisting of three sentences (Note: this is a paraphrase of an example given by a Czech-Finnish translator at a conference I attended some years ago):

The prime minister committed to pursue a dialogue with the opposition. This was after the opposition leader complained about not being involved. She confirmed that he would have a seat at the table in the upcoming negotiations.

The first commitment I have to make at some point is to the gender of the participants in the actions I write about. In English, I can leave the gender ambiguous until the third sentence. In Finnish, which does not have gendered third-person-singular pronouns, I don’t have to express the gender at all.

In Czech (and many other languages), on the other hand, I have to know the gender of the prime minister from the very first word. Like actor and actress in English, all nouns describing professions have built-in genders (this is not optional as in English, because all Czech nouns are assigned a grammatical gender). I also need to express gender as part of the past-tense morphology of all verbs. So even if I could skirt the gender of the ‘leader’ (there are some gender-ambiguous nouns in Czech), I would have to commit to it immediately with the verb ‘complained’. This is why knowledge of their subject is essential to simultaneous translators.

But this is a relatively simple problem that can be solved by reference to known facts about the world. A much more significant issue is the differential completion of certain schemas associated with types of expressions. Let’s take the phrase ‘committed to pursue’. The closest translation to the word ‘commit’ is ‘zavázat se’ which unfortunately has the root ‘bind’. It is therefore ever so slightly more ‘binding’ than ‘commit’. I can also look into something like ‘promise’ which of course is precisely what the prime minister did not do.

Then, there is the word ‘pursue’. One way to translate it is ‘usilovat o’ which has connotations of ‘struggle to’. So ‘usilovat o dialog’ is in the neighborhood of ‘pursue a dialog’ but lacks the sense of forward motion making it seem slightly less like the dialog is going to happen. So here each language is making subtly different commitments.

When you’re translating academic writing, there are hundreds of similar examples where you have to fill in blanks and make some claims seem stronger and others weaker. And even if you know the subject intimately (which I did in most cases), you often have to insert your judgement and interpretation. The more you do that, the less certain you feel that you got the meaning of the original exactly right. This is true even when, while reading the original, I had no sense of something being left unexpressed. The only way to get this right is to ask the author. But even that may not always work because they may not remember their exact mental disposition at the time of writing.

Writing as filling in holes in our mind

I believe that this is exactly the experience we have when we write about something that only exists in our head or something we’ve only previously talked about. Even when I’ve given talks at conferences and had many conversations with colleagues, writing my ideas down remains a difficult task.

When writing, the structure of ‘writtenese’ (as well as the demands of its particular medium) forces me to make certain commitments I never had to make in ‘mentalese’ (or even ‘spokenese’). I have to fill out schemas with detail that never seemed necessary. I have to make more commitments to the linearity of arguments that could previously run in parallel in my head. So when I write, it is not clear what should come first and what last.

When I just write down what’s in my head (or as close to it as it is possible), it is unlikely to make any sense to anybody. Often including myself after some time. I need to translate it in such a way that all the necessary background is filled out. I also need to use the instruments of cohesion to restore coherence to the written text that I felt in my mind without any formal mental structure.

But during this process, I often become less certain. The act of writing things down triggers other associations and all of a sudden I literally see things from a different perspective. And this is often not a comfortable experience. Many writers find this a source of great stress.

This is, of course, true even of writing instructions and directions. Often, when describing a process, we find there are gaps in it. And when writing down directions, we come to realise that we may not know all aspects of the familiar sufficiently well to mediate the experience to someone else.

Teaching writing as translation

Translation is a skill that requires a lot of training and practice. In many ways, a translator needs to know more about both languages than a native speaker of either. And then they need to know about different ways of finding equivalent expressions between the two languages in such a way that the content expressed in the source language produces similar mental effects when read in the target language. This is not easy. In fact, it is frequently impossible to achieve perfectly.

When I translate, I often refer to a dictionary (such as slovnik.cz) that lists as many alternative translations of a word as possible, even if I know exactly what the original ‘means’. This is because I want to see multiple options for expressing something, options which may not be immediately triggered by my understanding of the whole.

But for this to work, I need to have done a lot of deliberate reading in both languages to know how they tend to express similar things. At the early stages, I may approach this more simply as learning to speak a language. I may learn that ‘commit to pursue’ is best translated as ‘zavázat se usilovat o’. But I have to back that up by a lot of reading in both languages, studying other translators’ work and making hypotheses about both languages and the differences between them. Eventually, this becomes second nature and to translate fluently, we need to ‘forget’ the rules and ‘just do it’.

So how could we apply this to teaching (academic) writing? We need to start by ensuring that students have enough facility in both the source and the target languages. We usually assume greater fluency in the source language (most translators work primarily in the direction of native to non-native). So in this case, we need to focus on the structures and ways of ‘academic writtenese’.

We can very much approach this as teaching a foreign language. Our first aim should be to help students acquire fluency in the language of academic writing. We need to give them some target structures to learn. This should ideally be based on an actual analysis of that writing rather than focusing on random salient features. But ultimately, the key element here is practice.

Then we also need to focus on helping the students develop better awareness of their native mentalese and how to best map its structures onto the structures of writtenese. We can do this by helping them write outlines, create mind maps, come up with relevant key words, and of course, read a lot of other people’s writing, think about it, and then write summaries in similar ways.

None of these are particularly revolutionary ideas and they are being used by writing teachers all over the world. What I’m hoping to do here is to provide a metaphor to help focus the efforts on particular aspects of what makes the translation from thought to writing difficult.

Writing as playing a musical instrument

One final analogy that can help us here is the idea of writing as playing a musical instrument. This analogy is in many ways even more apt. When we play a musical instrument, we are initially translating relatively vague musical ideas into actual notes (melodies and harmonies) by way of the structures given to us by the musical instrument.

We may start by learning some chords to accompany a song we hear, but later we will progress to more details of music theory, which will allow us to express more elaborate ideas. But, in fact, this also allows us to have those more elaborate ideas in the first place.

Initially, our ability to express musical ideas via an instrument (such as piano or guitar) will be limited by our skill. We may not even realize what exactly the idea in our head was until we’ve played it. And often, what we can play limits the ideas we have. Jazz teachers often say something like ‘sing your solos first and then play’ (others call it ‘audiation’). But this is not trivial and requires extensive training. Which is why one common piece of advice for jazz musicians is to transcribe (or at least copy) famous songs and solos. But as you’re transcribing and copying, you’re supposed to notice patterns in how musical ideas are expressed. You can then recombine them to express what is in your ‘musical mind’.

But it seems that the musical ideas and their form of expression are never completely separate. They are not a pure translation but rather a co-creation. And this is true of any good translation and probably also ultimately true about any act of writing. We are using a different medium to express an existing idea but in the process, we are filling gaps in the ideas, creating new connections until we ultimately cannot be completely certain which came first.

As we get better at translation, music or writing, there are some levels at which the last part does not hold true. There are some ideas we can truly and faithfully translate from our head to paper, to a musical instrument, or from one language to another. This is why practice is so important. But at the highest levels of difficulty, writing, translation and music making will always be acts of co-creation between the medium and the message.

Teaching writing as music

So finally, could we teach writing in the same way as we teach music? We certainly could. Just like teaching a foreign language, teaching music is mostly dependent on a lot of practice.

But perhaps there are some techniques that music teachers use that could be useful for language teachers, translators and writing coaches alike.

One is the emphasis on patterns. The idea of practicing scales, licks, or chords relentlessly (up to hours a day) holds a lot of appeal. Perhaps we start teaching self-expression through writing too soon. Maybe we should give students some practice patterns to repeat in different combinations. Then we could tell them to just copy and then dissect parts of good texts. The idea of ‘mindless’ copying will probably stick in many teachers’ craws. But just analysing reading will never be enough. Students need the experience of producing some good writing, if only to develop some muscle memory. And while it should never be completely mindless, it should also perhaps not be completely meaningful from the very start. Of course, we could invent numerous variations on this approach to transform the texts in various fun ways while still making sure students are writing extended chunks and developing fluency. The point is that we would not be focusing on self-expression but on developing a language for self-expression.

Music teachers and students use what has been described by Anders Ericsson as ‘deliberate practice’. Ericsson gives the example of Benjamin Franklin who used similar techniques to improve his writing:

He first set out to see how closely he could reproduce the sentences in an article once he had forgotten their exact wording. So he chose several of the articles whose writing he admired and wrote down short descriptions of the content of each sentence—just enough to remind him what the sentence was about. After several days he tried to reproduce the articles from the hints he had written down. His goal was not so much to produce a word-for-word replica of the articles as to create his own articles that were as detailed and well written as the original. Having written his reproductions, he went back to the original articles, compared them with his own efforts, and corrected his versions where necessary. This taught him to express ideas clearly and cogently.

Obviously, this was not all there was to it, but it is very much reminiscent of what music students do. It seems to me that most beginner writers are often asked to do too much at the very start and they never get a chance to improve because they essentially give up too soon.

Writing is NOT foreign language, translation or music: The Unmetaphor

Writing is writing! It has its specific properties that we need to attend to if we want to see all of its complexities. We must use metaphors to help us do this but always by remembering that metaphors hide as much as they reveal. One useful way of understanding something is to create a sort of unmetaphor: a listing of similar things that are different from it in various respects. This is something that, while not uncommon, is done much less than it should be when using analogies.

Written language is not a foreign language

Some of the fundamental mental orientations of a language are shared between the written and spoken forms. This includes tense, aspect, modality, definiteness, case morphology, word categories, meanings of most function words, the shape of words, etc. These present some of the most significant difficulties to learners of foreign languages making it very difficult to acquire a second language by exposure alone after a certain age for most adults.

Writing, on the other hand, can be acquired predominantly by exposure alone for many (if not most) adults. There are many people who acquire native-like competence in the written code in the same way they acquired their spoken language competence (even if there are just as many who never do). And we must also be mindful (as Douglas Biber’s research revealed) that there is a bigger difference between some written genres than there is between writing and speech overall. So we should perhaps attend to that.

Writing is not translation

That writing is not actually translation follows from the fact that written language is not actually a foreign language. There are many genres and registers in any language, each with its specific codes. And we could call going from one code to another translation much more easily than going from what I called ‘mentalese’ to ‘writtenese’. (Again, the work of Douglas Biber should be the first port of call for anyone interested in this aspect of writing.)

But most importantly, what I called ‘mentalese’ does not actually have the form of a language. Individuals differ in how they represent thoughts that end up being represented by very similar sentences. Some people rely on images, others on words. For some, the mental images are more schematic; for others, they have more filled-in details. For instance, Lakoff asked how different people imagine the ‘hand’ in ‘Keep somebody at arm’s length’. The responses he got were that for some the hand is oriented with the palm out, for others with the palm in. For some, it includes a sleeve, for others it does not. Etc.

Writing is not music

I’ve already written about the 8 ways in which language is not like music. And they all apply to writing, as well. The key difference for us here is that music cannot express propositions. This means that musical expression can be a lot freer than expressing ideas through writing.

We could argue that writing is more like music than spoken language because it requires some kind of an instrument. Pen, paper, computer, etc. But we usually learn these independently of the skill of expressing ourselves through writing. My ability to play the piano is much more closely tied to my ability to express my musical meanings. However, people write just as expressive prose by the hunt and peck method as when they touch type. One can even dictate a ‘written text’ – that’s how independent it is of the method of production.

Of course, improving one’s facility with the tools of production can improve the writing output just by removing barriers. This is why students are well-advised to learn to touch type or to use a speech-to-text method if they struggle for other reasons (e.g. visual impairment or dyslexia). But when it comes down to it, this is just writing down words and as we established, writing in most senses is more than that.

Conclusions and limitations

Ultimately, writing and translation are not the same. Just as writing and music are not the same. But there are enough similarities to make it worthwhile learning from each other.

Many writers have developed great skills by the ‘tried and tested’ approach of ‘just doing it’. But we also know that even many people who do write a lot never become very ‘good’ at it. They struggle with the mechanics, ability to express cogently what’s in their minds, or just hate everything about it.

For some beginner writers, the worst thing we could do is give them a lot of mindless exercises. These people will want to do it first and would hate to be held back, just as many students of languages or music like to dive in at the deep end. But equally, for many others, telling them to ‘just do it’ is the perfect recipe for developing an inferiority complex or even a downright phobia of writing.

But all of these writers will need lots of practice – regardless of whether we provide lots of ladders and scaffolding or just put a trampoline next to the edifice of their skill. In this, writing is exactly like music, language and translation. You can only get better at it by doing it. A lot!

I started with a quote from Wittgenstein. But he also famously said in summarising his book:

What can be said at all can be said clearly; and whereof one cannot speak thereof one must be silent.

I think we saw here that this is not necessarily how the act of writing presents itself to most people.

He then continued:

The book will, therefore, draw a limit to thinking, or rather—not to thinking, but to the expression of thoughts; for, in order to draw a limit to thinking we should have to be able to think both sides of this limit (we should therefore have to be able to think what cannot be thought). The limit can, therefore, only be drawn in language and what lies on the other side of the limit will be simply nonsense.

This was the so-called “early Wittgenstein”, before the language games and family resemblances. He spent the rest of his career unpicking this boundary between sense and nonsense, coming to terms with the fact that the relationship between a thought and its expression is not straightforward.

So all the metaphors notwithstanding, we should be mindful of the constant tensions involved in the writing process and be compassionate with those who struggle to navigate them.

5 kinds of understanding and metaphors: Missing pieces in pedagogical taxonomies

TL;DR

This post outlines 5 levels or types of understanding to help us think better about the role of metaphor in explanation:

  1. Associative understanding: Place a concept in context without any further understanding of it.
  2. Dictionary (lexical) understanding: Repeat definitions, give examples, and make basic connections.
  3. Inferential understanding: Make useful inferences based on knowledge about a domain, but without the ability to use the understanding in practice. Requires more than just one concept.
  4. Instrumental understanding: Use the understanding as part of work in a field of expertise. Impossible to acquire for an isolated concept.
  5. Creative understanding: Transform understanding of one domain by importing elements from another. Requires instrumental understanding – goes beyond hints and hunches.

Introduction

In a previous post, I proposed three uses of metaphor leading to different levels of understanding.

  1. Metaphor as invitation
  2. Metaphor as an instrument
  3. Metaphor as catalyst

Only 2 and 3 led to any meaningful understanding and that could only be achieved by acquiring some ‘native’ structure of the target domain. But I was rather loose with how I used the word ‘understanding’. I was using notions like ‘meaningful understanding’ or ‘useful understanding’ but never went into any detail. That is the purpose of this post.

In what follows, I sketch one way of classifying different kinds of understanding. These kinds are not meant to describe, let alone discover, some sort of ‘natural kinds’. Instead, I find them a useful way of looking at understanding from the perspective of metaphoric cognition.

Associative understanding

Associative understanding is the ability to place something in a context or category without necessarily knowing much about it. So, we may know that an emu is a flightless bird without knowing anything else about it. We could also think of this kind of understanding as a vague notion.

This is the kind of understanding the vast majority of education leaves us with after a few years. Watching a documentary, a TV quiz show, or reading a popular news article fosters this kind of understanding.

Many people can get very far by displaying this kind of understanding and successfully imitating experts; con artists impersonating doctors are an example. The famous Sokal hoax was based on the same principle: making plausible-sounding noises can get you published in a prestigious publication. It is even possible to pass a poorly constructed multiple-choice knowledge test with just this understanding, by eliminating the wrong options rather than by knowing the correct ones.

The associations can be of various kinds. They can be in the form of basic-category labels (such as ‘this is an animal’). They could place the thing into a discipline, such as ‘something they do in chemistry’. They could simply be in the form of ‘this is the thing my friend always talks about’. Or they could just be parts of the cultural vocabulary without a proper object of understanding.

For example, in the 1960s there was a famous pop song called ‘Pták Rosomák’ (The Bird Wolverine). The band simply liked the sound of the Czech word for ‘wolverine’ and its rhyme with the word for ‘bird’. Wolverines are not native to Central Europe, and the animal is little known there outside of this song. I did not find out what the word meant until I learned it in English (in fact, I knew what the word wolverine meant long before I looked it up in a Czech dictionary). When I presented this at a conference on cognition in Prague, most of the Czech academics present were surprised by the meaning. Yet, if you had asked them, ‘Do you understand the word rosomák?’, they would have said, ‘Of course I do.’ But theirs was just an associative understanding.

My claim is that the vast majority of what passes for understanding and knowledge in ‘polite society’ is of the associative kind. People feel comfortable when concepts like evolution or philosophy are mentioned but have only the vaguest idea of where they belong.

My favourite example of this is Monty Python’s ‘Philosophers’ Song’. All the audience needs to know to appreciate the jokes is that there is a philosopher stereotype and that certain names belong to philosophers. In fact, by their own admission (citation needed, but I did hear it in an interview), the authors of these sketches also did not know much more than the names. Even the little nod to knowledge in ‘John Stuart Mill, of his own free will’ is just a glimmer of something deeper.

Associative understanding is pretty much only useful for social signalling. It can also play a role in making a new field appear more familiar at a later stage. I have had that experience several times: when I set about studying a subject in depth, vague memories from school made me feel more confident that I was on the right track, even though I had little more than a vague feeling about it. But on its own, this kind of understanding has little practical value.

In formal instruction, we generally start with the next step, but over time, without practice, this is the kind of understanding we’re left with. Yet the literature on pedagogy mostly leaves it unaddressed. It is the kind of understanding below the bottom rung of Bloom’s taxonomy. But many teachers encounter it when, at the end of a class, students come up and ask questions that show barely a hint of understanding, making it seem as if they may not even have been in the same room.

Lexical understanding

At this level, we can repeat a definition as we might find it in a dictionary and give a few examples. We can look at a picture and say, this is an emu. It lives in Australia and it is a large flightless bird similar to an ostrich. For something like an emu, that may well be enough for most of us.

This is the kind of understanding we may be able to take away from a quick explanation of something. It is the sort of understanding most tests check for. It is also often used as a proxy for intelligence or ‘being smart’. Lexical understanding is what is required of successful quiz show contestants. UK shows such as ‘Mastermind’, ‘Brain of Britain’ or ‘University Challenge’ are great examples of this.

Conversely, lack of lexical (and sometimes even associative) understanding is also often given as an example of educational decline or lack of intelligence.

This would be roughly equivalent to the ‘Knowledge’ and ‘Comprehension’ levels of Bloom’s taxonomy. It is the minimum target for instruction, but it is very unstable. Unless it has been used recently, it often reverts to the associative kind of understanding.

This kind of understanding is generally not very useful outside the educational context. This is the kind of understanding that is the result of ‘teaching to the test’. It can be leveraged into something more but only with practice and application.

In terms of frames or mental representations, we could say that the only mental representations developed as part of this understanding are propositions or rich images. That is, we have sentences or images in our head that we can draw on, but we would find it very hard to combine them into larger wholes.

This level and the transition from this level to the next are where what we call pedagogy plays the most important role.

Inferential understanding

This kind of understanding lets us make useful inferences about the concept in context. It requires some knowledge of a whole domain or several domains. You can never understand a solitary concept at this level. But it does not necessarily require deep ability or skill. I know nothing about emus, so I cannot think of an example that would not be misleadingly trivial.

But I have a personal example from when I was recently catching up on the latest developments in machine learning. I was reading about different types of neural nets. When I was reading about CNNs (Convolutional Neural Networks), which are usually used for images, I had an idea for using a similar approach to process language by representing text in a way similar to how images are represented. And it turned out there are already papers and models out there that do just that.

Inferential understanding is the kind of understanding that good students develop about favourite subjects that they later pursue. It is the kind of understanding that collaborators develop about each other’s disciplines in interdisciplinary projects, that good generalist managers develop about the domains in which they supervise subject experts, or that really good journalists develop about the areas on which they report. This is also the kind of understanding experts have about related fields or that teachers have about some of the more advanced areas of their field.

The sociologist of science Harry Collins described in one of his books (I think it was ‘Rethinking Expertise’) how he could pass some knowledge tests in gravitational wave physics better than professional physicists from adjacent specialisations. This was after many years of observing these physicists, but without any real ability to do the actual calculations or research required.

It may not always be easy to tell the boundary between this and lexical or even associative understanding. This is the kind of understanding potentially displayed by an audience member at a lecture who asks a question that is then described as ‘a good question’ by the presenter. But often this is just a fluke. A random hit based on superficial resemblance of words in a definition.

This is the kind of understanding that sort of ‘does not count’ in terms of Bloom’s taxonomy. We feel it is insufficient because it is not something people consciously aim at in instruction. But in many ways it is the best we can hope for. It is the first kind of understanding that is useful at all.

It requires more developed mental representations: representations in which the definitions and rich images are replaced by schemas and scenarios. These are a sort of useful compression that can be blended (or integrated) with others. This means that when reasoning with these concepts, we can use them as whole units (mental chunks) rather than laboriously compute them from first principles.

It may also derive from some basic level of instrumental understanding. The humour in XKCD cartoons can be understood with a combination of inferential and instrumental understanding. I immediately understood one comic famous among programmers without being a programmer myself, because I have some skills with databases and some knowledge of common security problems.

But for the most part, we cannot use this understanding for actual work. This is where the humanities and sciences often diverge. It is possible to pretend (even to oneself) that this understanding lets us do real useful work in history or sociology. Whereas with mathematics, engineering, medicine, or biology, the barrier between this and instrumental understanding is much more clearly defined by specialised tools such as mathematics and chemistry. But if we look at the many former physicists or biologists who have tried their hand at philosophy, sociology or even literary criticism, we see that even here, this kind of understanding is not enough.

You really need more to have a chance of doing something useful.

Instrumental understanding

This is the kind of understanding experts and practitioners have. It requires being able to use the concepts or tools in practice. I don’t have any instrumental understanding of convolutional neural networks. I couldn’t build one and possibly couldn’t even reconstruct the exact way in which it works.

This level of understanding (or ability, or skill) requires more than just reading or learning about something. It requires practice and the building of mental representations, which only comes from long-term engagement with a subject. For example, I don’t have that kind of understanding of neural nets, but I do have it of metaphor.

I can create metaphors, identify them in text, speak to the controversies around them, compare and contrast the various theories of metaphor. I can teach somebody how metaphors work. I can write a successful paper or give a conference presentation in the field. If somebody wants to know about metaphor they can come to me. Other people with good instrumental understanding of metaphor may disagree with some of what I have to say, but they won’t do it (I hope) as they would with somebody who has just an associative, lexical or even inferential level of understanding – e.g. knowing that metaphor has something to do with poetry. You have to put in the work.

This work may require actual repetitive practice (such as working out math problems or analysing text). It absolutely requires extensive engagement with other experts in the field: taking classes, going to conferences, reading the latest research, writing papers, blogs, etc. That’s why loner autodidacts almost never reach this level of understanding.

Here the distinction between understanding and ability or skill becomes blurred. Mental representations develop at the highest levels of schematicity. This means that an expert can look at a very complex situation and treat it as one unit that can be blended with other complex units in a way that only the relevant parts are engaged.

For instance, I can read a complex argument about metaphor and immediately compare it with three other complex arguments about metaphor – not because I have a large mental capacity for abstract concepts but because I have developed a number of highly schematic mental representations about the shapes of arguments people make about metaphor. This way, I can project these schemas onto the argument as one big chunk.

Perhaps an even better analogy is learning a foreign language. I may know all the rules and words but I cannot speak the language with any level of fluency until I have developed larger chunks I can just slightly modify. It is simply impossible for even the most highly mentally endowed human to dredge up individual words, apply rules to them and combine them into a sentence quickly enough to speak with any level of coherence. It’s even worse for understanding. Just reading a text with a dictionary is such a slow affair that we forget what a sentence was about before we get to the end.

In other words, we can define instrumental understanding as developing a basic fluency in the language of the discipline. And this takes time, targeted practice, and active ‘communicative’ engagement across a whole field.

In the ‘hard sciences,’ it requires a good facility with formalisms or even equipment and in the ‘softer’ disciplines it relies on extensive reading, talking, and writing.

Here we are at a much wider aperture of our knowledge funnel. It is therefore impossible to exactly compare 2 people’s levels of instrumental understanding. Everybody will have a slightly different set of mental representations. Also, many people will only be able to ‘perform’ at this level some of the time or only for small chunks of their discipline.

At this level, pedagogy is much less relevant. This is where it makes a lot less sense to talk about teaching and learning if only because it is impossible to acquire this level of understanding purely in the classroom. Training, coaching or even an apprenticeship are much better models.

Creative understanding

Creative understanding is instrumental understanding with a transformative element. This requires knowledge of several domains and their creative intermingling. It is the sort of understanding innovators in their field have. This can lead us to a complete rejection of the thing we understand as an independent concept.

For example, I have long argued that metaphor is only one place in language where domain projection occurs and that we should not think of it as something special but rather as a shortcut for thinking about broader phenomena of framing or cognitive models. I found this a useful way of extending the concept. So, I can make a serious statement such as ‘metaphor and metonymy are the same thing’ that can be productive in the study of metaphor. But it only makes sense because I can actually distinguish between metaphors, similes, synecdoches and metonymies, and I can also reproduce arguments that maintain that the difference between metaphor and metonymy is crucial for understanding figurative language.

It is hard to say whether this type of understanding is even a part of the funnel hierarchy. Perhaps it is just an ingredient (catalyst) to instrumental understanding. But I do want to stress that it only works as a catalyst to instrumental understanding. As I showed in my post on types of metaphors, creativity needs to start from somewhere.

We may often confuse almost accidental insights by people with inferential or even just lexical understanding for creativity. But this is like recognising a melody in the sounds a child makes by randomly banging on the piano keyboard.

We often valorise the outsider perspective in a field. And it certainly can act as a catalyst for creativity but only if it has proper instrumental understanding to lean on.

Conclusions and limitations

I cannot stress enough that this classification is just a useful heuristic. I am not claiming that this kind of classification of understanding is exhaustive or even that it represents some sort of a natural category. But I found it useful when thinking about explanations and pedagogy.

Approaches to classifying understanding

It is quite common to distinguish between shallow and deep understanding. This is intuitively obvious but not very helpful because it assumes the existence of some sort of objective scale for the depth of understanding.

We can also distinguish understanding from knowledge for example by differentiating between explicit and tacit knowledge. Understanding and explicit knowledge intuitively overlap even if we don’t have a firm definition of either. If we understand something, we can mentally manipulate it and, most importantly, pass it along.

But the boundaries between tacit and explicit knowledge are not firm. All explicit knowledge depends on some tacit knowledge – or in other words, all understanding depends on knowledge. We could even say that deep learning is the process of transforming understanding into knowledge, in the sense that we need to build up schematic mental representations to be able to manipulate ever more complex combinations of concepts.

Another way to try to get at understanding is to investigate how to achieve it. Bloom’s taxonomy of educational objectives is one famous example. There are many tweaks and elaborations – some as extreme as Jack Koumi’s 33 pedagogic roles. But they are ultimately not very satisfying because they already assume we know what the understanding is.

Understanding as a process revisited: The wave and the funnel

Even though these different types of understanding are ‘broadly hierarchical’, I want the emphasis to be on ‘broadly’. It would make no sense to think of these as a straightforward linear hierarchy measurable on a scale of discrete and comparable units. They are more like overlapping waves. Layers of water covering the beach in successive bursts as the tide is coming in.

But that metaphor does not make it easy to visualise the differences and mutual interdependence. It only evokes how hard and unreliable it is to do so. But for the purposes of this comparison, I’d like to offer something more like a funnel (which I also brought up in the context of the metaphor explanation hierarchy) or inverted cone.

The substance that fills the funnel might be a mixture of effort and coverage of material. This makes it easy to visualise the fact that it takes much more effort, time and background knowledge to get from level 3 to level 4 than it does to get from level 1 to level 2. Also, at the higher levels, the concepts themselves transform and interconnect. So it is not possible to understand them in isolation.

This truly takes into account the processual nature of understanding. The funnel also needs to be constantly topped up to maintain certain levels. But it can also underscore the fact that we can never perfectly compare 2 people’s levels of understanding. Because at the higher levels, the funnel is so broad, not everybody will have filled it in the same way with exactly the same substance.

I got this idea from ACTFL language competency levels and I think it is one of the most underappreciated metaphors in education.

Another really useful thing ACTFL does is define low, mid and high sublevels for most of its levels. Part of the definition of the ‘high’ sublevel is that the person can function at the ‘low’ sublevel of the next level about half the time (e.g. a Novice-High speaker can function as Intermediate-Low about 50% of the time). During the test (most often an interview), the examiner establishes a floor and a ceiling rather than pinpointing an exact point on a scale.

This very much applies to the levels in my metaphor. There are no clear boundaries between these levels of understanding, insofar as they are levels in the first place.

Explanation is an event, understanding is a process: How (not) to explain anything with metaphor

TL;DR

  • There are at least 3 uses of metaphor in the educational process: 1. Invitation to enter; 2. An instrument to grasp knowledge with; 3. Catalyst to transform understanding. Many educators assume that 1 is enough but it rarely leads to any useful understanding.
  • Explanation is a salient part of the educational process to such an extent that it is often allowed to stand for all of it even though it is only one step.
  • Explanation often helps the person doing the explaining more than the person being explained at.
  • Metaphors and explanations have been misused by educators from Socrates to Rousseau.
  • A metaphor can only be successful if the student already has some knowledge of the target domain. Knowledge of the source domain is often less important.
  • Metaphor only makes sense if it’s part of a process of learning. It doesn’t do much good on its own.

How metaphors work in helping us understand things

Teachers love explaining things. Students love understanding things. On the rare occasions that the two coincide, the feeling of joy shines like a beacon for the power of explanation. Teachers tell stories of seeing the “lightbulbs come on” in their students’ eyes. Students remember fondly the ecstatic moments of sudden illumination as their teacher’s words lit up the darkness within them. Thus the myth of teaching as explaining and learning as understanding those explanations was born.

Most of the more powerful explanations rely on metaphor in the broadest possible sense. In fact, all explanation is to some extent metaphorical in that it provides a projection from one domain of understanding onto another. Metaphor brings out the familiar – or ex plains it – in the unfamiliar. Or so the story goes.

We can think of the metaphoric projection as putting two thin sheets of paper on top of each other and looking at them against a bright light. What can be on these sheets? Sketches, images, words or even just smudges of color. The projection then obscures certain things and shows others in new contexts. Sometimes, with more complex sheets, we may see completely new shapes and color hues. The process of making sense of the metaphor then involves slight adjustments in how those two sheets align against one another. This can be described as the metaphor giving a new structure to the target domain.

Another way to think about metaphoric projection is as two sets of items which are mapped onto each other. We can put the sets side by side and draw lines between items we think match. Or we can take them out and place them side by side in a new set. We often see them displayed in this way.

Note: This way of thinking about metaphor started with Lakoff and Johnson’s ‘Metaphors we live by’ from 1980. This led to the formulation of the Conceptual Metaphor Theory. It was later developed into a more general theory of frames or mental models by Turner and Fauconnier (2002) known as the theory of conceptual integration or blending. But it can also be found in Donald A Schön’s ‘Displacement of Concepts’ from 1963 which indirectly inspired Lakoff and Johnson.

But despite all this, it is easy to overlook that in order to form a projection from one mental space into another, we have to have some structure in both. In fact, metaphor often assumes equal knowledge of both domains, and in the process of projecting one onto the other, a new, previously unimagined structure emerges that is a blend of both domains. Because of the complexity, it is hard to give brief examples, but Turner’s and Fauconnier’s ‘The Way We Think’ is full of very illuminating case studies.

But it is also not at all uncommon for metaphor to borrow from a domain we know much less about to elucidate a domain we know a lot about. For example, if I hear, ‘don’t go into that office, the boss is on the warpath’, I understand a lot more about the boss’s behaviour than I do about any warpaths. Here, only the general feeling of ferocity is transferred, with none of the possible associations of weaponry or military supply lines.

Metaphor is also always partial. It would make no sense to project every aspect of both domains onto one another. But the ability to understand which bits it makes sense to project and which must be left out also requires at least some understanding of both domains. To understand what we mean when we call a piece of software a ‘virus’ we must know enough about computers to know that the infection cannot be transmitted through simple touch.

Metaphor at its most powerful helps us understand both domains better. It also often results in the creation of new understanding of both domains as we strive to find the limits of possible cross-domain mappings. Often, this happens with honest historical explanations of the present. By comparing the Iraq war to Vietnam, we may choose to transfer only the emotions and sense of loss associated with the latter. But we may also choose to explore both in their own right to find the best way in which they project onto one another. And this gives us new understanding of both.

Three uses of metaphor in explanation

There are many ways to classify the uses of metaphors; I’ve outlined some in an early paper. But for the purposes of metaphor in explanation, I’d like to offer three broad types: 1. Metaphor as invitation; 2. Metaphor as instrument; 3. Metaphor as catalyst. I fear that the first type may be the most common, while only the latter two play any real role in building understanding. These three types could also be viewed as forming a sort of process, but this is not inherent in the definition.

As we will see, sometimes the same metaphor can serve all three roles, providing a certain thread through the process of learning. But most often, we need new metaphors for each type or stage.

1. Metaphor as invitation

Novice students often come to a new subject with no knowledge and a healthy dose of fear of the unknown. To help them feel more comfortable, teachers often reach for metaphors relying on the familiar. This gives the learner a chance to grasp onto something while they build up sufficient mental representations of the new domain.

But this use of metaphor usually does not help understanding. It just provides emotional support along the arduous journey towards that understanding. It can also backfire. Teachers often spin up these kinds of metaphors in such a way that they assume an understanding of the unfamiliar. And it is only once students have bootstrapped themselves into some understanding of the subject that the metaphor starts to make any sense to them.

For instance (to use a famous example), we can teach students that the electrical current is like a flow of water. This certainly takes some fear out of the invisible world of electrons. But unless students have at least some prior understanding of electricity, they may ask questions like ‘how do you get the water into the wires?’

This type of metaphor can only be used for a fleeting moment and it must be followed by hard work of accumulating understanding of the new domain on its own terms. Perhaps with the use of more metaphors, this time of the instrumental kind.

2. Metaphor as an instrument

The instrumental use of metaphor for explanation is where real understanding starts to happen. But not all teachers are good at it. In this case, the metaphor provides a way for the student to grasp the new subject: a lens to see it through, or a mental instrument to manipulate it with. Such metaphors are essential to the learning process. However, they do not rely on the moment of instant insight, which they can sometimes trigger, but rather on continuing exploration of the projection between the two domains. Their usefulness is less in the feeling of illumination than in their availability to be used over and over again.

For instance, electrical engineers may be able to make better judgments about certain properties of electrical circuits when they think of electrons as a flow of water. But in other instances, they may be better off when they think about electrons as lots of tiny balls rubbing against one another, generating heat. This metaphor can come up over and over to help them mentally manipulate the two domains.

Here, as with all metaphors, it is essential that we know when to let go. Or even better, when to switch to a different or even a contradictory metaphor. These instrumental metaphors can be local or global but it is rare that one will be enough.

3. Metaphor as catalyst

In the third use, the metaphor plays the role of a catalyst. Like a powder dissolved into a liquid, it makes a new substance in which both domains are transformed into one unified understanding. This is when the student transforms into a scholar. Making independent judgments, challenging the teacher’s own understanding, and ultimately becoming her own teacher. To work as a catalyst, the metaphor may be very rich and detailed or just a quick sketch resulting in a slight shift of perspective. But it always requires solid knowledge of the target domain.

Let’s continue with our electrical current example. Here, the student comes to understand not only that electricity sometimes behaves like a liquid and sometimes like a collection of particles, but also the complexity of liquids and particles themselves. They start making predictions both ways and asking questions like ‘What if we thought of the flow of water as a collection of particles?’ and so on.

Here the metaphor becomes a process without an end. It spurs new mixtures and remixtures as one finds out more about the two (and often more) domains. Unlike with instrumental and invitational metaphors, it is no longer important that the metaphor be apt. What matters is that it is useful for new understandings or for opening up the possibility of them. Donald Schön called one subtype of these ‘generative metaphor’.

But as with the other types, it is important that these metaphors come with some sort of self-destruct mechanism.

What often happens is that these metaphors are taken up by those who presume that they map fully onto the target domain and that no other understanding of the target domain is necessary. I described how this is a problem with Schrödinger’s cat and Lorenz’s hurricane-triggering butterflies.

What’s even worse, teachers often use these metaphors far too soon. This either confuses students or, worse, it gives them an illusion of understanding that they do not possess.

How NOT to use metaphor to explain something – two case studies

Case study 1: Metaphor gap in data science

My first case study of a bad use of metaphor in explanation is the podcast Data Skeptic. In fact, listening to its most recent episodes is what prompted me to write this in the first place.

I must preface this by saying that I like the podcast and recommend it to others who want to understand modern data science. It covers important subjects and there is much to learn from it. Its one unfortunate feature, however, is a set of episodes in which the host, data scientist Kyle Polich, brings in his wife, project manager and English major Linh Da Tran, as co-host and tries to explain concepts from abstract computational theory to her. Or rather at her.

This almost invariably fails. Not because Linh Da does not possess the raw intelligence or aptitude to understand these concepts but because Kyle confuses metaphor with explanation and explanation with understanding.

In two recent episodes, he attempted to explain attention in neural networks and Neural Turing Machines. It was an unmitigated disaster. As the metaphors kept piling up, Linh Da finally cried out “I don’t know what you want me to understand”. That’s exactly the problem with a metaphor that only relies on the understanding of the source domain. It serves as a good invitation to the subject but as a very bad instrument for developing an understanding.

There are several problems with this setup that make it a bad place for too many metaphors. First, Linh Da is clearly just humoring Kyle. She’s vaguely interested in machine learning as a phenomenon but has no real interest in putting in much work to learn how it works. This forces Kyle into more and more metaphors about their pet bird Yoshi. These are useful socially and emotionally because they allow Linh Da to contribute to the discussion. But her contributions at every turn show that she cannot use any of the analogies to make useful inferences about the subject. She almost never brings up previous subjects. At the end of the episode on Neural Turing Machines, she asked who owns the Turing Machine. In all the torrent of analogies, Kyle neglected to stress that the Turing Machine is itself a metaphor. This is despite a prior episode in which another guest explained very clearly why Turing Machines are important.

The conceit of the episodes is that data science can be explained even to English majors. That is certainly correct. But those majors must be willing to put in some work between episodes or have some prior knowledge. And as the subjects get more technical or abstract, the explanations have to get longer and include some practice time. The amount of this practice needs to increase as if it were filling a funnel rather than a test tube: getting from level B to C requires more work than getting from level A to B. Otherwise, the metaphors have nothing to hold on to. They constantly invite the student in but then offer no tools for going further. At best, they will confuse the learner and, at worst, they will give them an illusion of understanding. About as useful as a seat belt made from masking tape.

While it is pleasant to learn about these concepts by listening in on a married couple having a light-hearted conversation, at a certain stage this pedagogic device just gets in the way of the audience’s learning. Initially, listeners can do their own metaphor mapping and ask the right questions in their heads. But as the level of abstraction increases, the host doing the explaining cannot go into sufficient depth because the co-host can’t keep up. And the increasingly convoluted and unnecessary metaphors just create a mental fog that descends over everyone.

I was particularly disappointed in the episode on attention in neural networks, a topic I wanted to learn more about. I found the initial metaphor of attention as a sort of memory span very useful, but then the discussion got stuck because Linh Da could not use it to go any further. This was because she was not given a chance to integrate the previous episodes where similar things were discussed. It was still useful to me because I could then go and read about attention with a renewed perspective. But an opportunity for a deeper exploration of the metaphor was wasted.

This would have been fine if the episode had been aimed at a general public with no prior understanding rather than at an interested audience with some background. But even then, the general public would have needed more and different information to make any sense of it.

At one point, Kyle raised the possibility that maybe he wasn’t an effective teacher because Linh Da could not understand something he had explained. But in fact, he was not being a teacher at all. In this setting, he’s just a provider of images. Like a documentary from the Serengeti after which the audience remembers there are lions but could not place the Serengeti on a map.

I can imagine that Kyle would be a very effective teacher with students who are interested in the subject, if he had a chance to take them through it step by step. And his use of metaphors would be a valuable contribution to that. But in the podcast, he’s only playing at being a teacher with Linh Da, and she’s only pretending to be a student. His only goal is to get her to answer questions within his metaphor in a way that makes it seem she has understood. This means she never gets a chance to try out the structures of the source domain on the target domain. And because of this, she never gets to develop any understanding that could later serve as a foundation for further metaphors. Without it, adding more to the mix feels like an avalanche of analogies.

Case study 2: The explanation illusion at Wired magazine

But Data Skeptic is not the worst example of this type of pseudo-teaching by explanation, only the most recent in my mind. A possibly much worse example is the Wired magazine series in which one expert supposedly explains a technical concept at 5 levels of difficulty: to a 5-7-year-old child, a young teen, a college student, a graduate student, and another expert. These explanations often involve some level of metaphor, but they are mostly pointless. The conceit is that anybody can understand these concepts at “some level”. But explanation does not equal understanding, as is amply demonstrated in the videos. The people being explained to do not usually develop any new understanding. And it is doubtful whether the people watching do either.

Some of these fail because the topic is just not appropriate for a certain audience. A 5- or 13-year-old does not need to understand (nor have the background for) things like CRISPR or the connectome. At best, they may learn which discipline these belong to, but that’s just teaching them a new name. No understanding of the phenomena is necessary.

But even when the understanding is well within reach and might have its use, the ‘expert’ fumbles. Thus the great and inventive musician Jacob Collier failed to explain the concept of ‘harmony’ to any of his charges. First, he tried to convince a five-year-old that harmony is a way of expressing a feeling with music (as opposed to melody). This is not only too abstract, it is also wrong. Both harmony and melody express feelings. But harmony is different notes played on top of one another rather than in sequence, as in a melody (the feelings come from the pitch distance between the tones). This is well within the scope of understanding of a 5-year-old when accompanied by some examples. No elaborate metaphors are necessary. But Jacob Collier goes into a very abstract explanation, concluding with the most pointless question in any teacher’s arsenal, ‘does this make sense?’, to which he gets an ‘uh-huh’ from the child, who clearly has no clue.

But explaining anything to 5-year-olds is hard. So does he do better with a teen? No. He still sticks with the metaphor of harmony as adding emotion to a melody. But then he mixes in the idea of harmony being a journey. To illustrate this, he goes from demonstrating a simple major/minor chord distinction to a jazz chord substitution. This is wonderful and impresses the student, but it does not illustrate the concept of harmony to her.

No explanation happens at the higher levels either, because all of the others (culminating in jazz giant Herbie Hancock) already know the key concepts. So Collier just chats with them about harmonization and reharmonization. This also reveals what he had in mind with the 5-year-old and the teen: he was explaining a much more advanced concept under the label of the simple one.

One of the commenters on the video made an astute observation:

“it’s interesting how in the earlier levels it has to do more with theory and as you get higher up the level it goes back to nature and life experience and emotions. It’s almost as if, as the complexity increases, there’s also a level of fundamental basic understanding of nature and how it goes hand in hand at the most complex level” (emusik97531 [DL fixed small typos])

Essentially, as the level of the underlying understanding grows, the simple metaphors of journey, place, and feeling have the most impact. At the lower levels, they just hang there, not doing much of anything. They may feel like an invitation, but there is no way to use them as a tool for understanding.

At the higher levels, Collier also shows that maybe he could be a great teacher to somebody closer to his level of skill and understanding. But it also reveals the pointlessness of an isolated act of explanation with (or without) metaphor if it is not supported by the hard work of making the connections necessary for the metaphor to become a proper instrument or a catalyst.

This is not a particular critique of Jacob Collier but rather of the whole set up of the series by Wired. Nobody could succeed in this setting. The concept is either going to be hard at the low levels or too basic at the higher ones.

The inglorious history of metaphorical explanation in education

Collier and Polich, as well as countless others, are in the illustrious company of people who overestimate what explanation can do in the process of learning.

In a famous scene from the ‘Meno’, Socrates walks a slave boy through a series of questions “proving” that he already knew how to ‘double’ the area of a square. B. F. Skinner (1965) called the Socratic method modeled on this example “one of the great frauds in the history of education”. Setting aside the metaphysics of innate transcendental knowledge Socrates was after, the boy clearly did not learn anything through the interaction. He would not even be able to recreate the proof at a later point. He never got a chance to develop an understanding. This is very much reminiscent of the long-suffering Linh Da, who simply answers questions without getting the point of them at any stage and who is clearly unable to reconstruct the argument later.

Another giant of philosophy, Rousseau, constructed a thought-experiment student in Emile (because, by his own admission, he found teaching actual students ill-suited to his temperament). Rousseau took the imaginary Emile on a similarly Socratic journey to create the perfect ‘natural man’. Rousseau’s Emile always immediately gets the point of his metaphors and learns the right lesson as if by magic. He rarely does anything in the way of practice, although he perhaps has more time to assimilate new knowledge than Socrates’ victim.

There is much of Rousseau and Socrates in all teachers. Explanations and metaphors are heady stuff, while boring practice of the kind Skinner hoped to hand over to his teaching machines is the embodiment of tedium for all involved. But without some sort of practice-like engagement with the subject, no understanding is possible. Educators often leave this for the spaces ‘in-between’ teaching events, invisible to them other than as returned homework assignments. Students who succeed have somehow figured out how to do that unmentioned task of conceptual practice. To the students who struggle, this then looks like effortless insight.

How to actually use metaphors for explanation

So, is there a way to avoid the pitfalls we encountered above? As we saw, the first step should be to ask ourselves whether this is a time for more explanations and whether metaphors are the best way of arriving at a useful understanding.

We must also remember that there is no such thing as a perfect explanation or a perfect metaphor. Not everybody finds the conceptual work of cognitively decoupling one domain so that it can be projected onto another easy to do, or even useful. But at some point, a metaphor is the only way to go about explaining something.

So when it comes time to construct the metaphor, we must make sure of two things.

First, we have to find the right source domain for the metaphor, one that can be projected onto the target domain so that the student can achieve a useful understanding of some aspect of the target. This happens largely through a process of trial and error, which means we’re unlikely to hit on the right metaphor on the first try.

Second, we have to make sure we have a good grasp on the possible projections between the two domains. I broadly described the process in my guide to metaphor hacking. We have to decide on what the purpose of the metaphor is and whether successful mappings can be made between the two domains. But we have to keep exploring both domains to see if there are any mappings that would result in a misunderstanding. These then have to be explicitly cut off from the metaphor.

For example, a virus is a good metaphor for a piece of software that ‘infects’ your computer. But we must also specify that this can only happen by executing the software, not simply by having two PCs in the same room.

The teacher must know when to abandon a metaphor as much as when to bring one up. Some metaphors are local and others are global. The global metaphors are particularly dangerous because they can lock out possible alternative sources of understanding.

Switching between metaphors is essential. But it also contains a danger. The biggest mistake teachers (including this one) make when students say they don’t understand is to fill the air with more and different explanations. Yes, these may be necessary. But first give the student some space and time to integrate what they have already heard into their current level of understanding.

The teacher also has to make sure that the student already has sufficient mental representations from both domains for the projections between them to catch onto something. A computer virus metaphor is useless if the student knows nothing about viruses, but it does not help either if the student knows nothing about computers.

Particularly when metaphors are used as catalysts, it is important to investigate the source domain as much as the target domain. For instance, if we use the metaphor ‘education is business’, we may want to look at various aspects of the way businesses work to see if there are unexpected dangers in using this metaphor globally. Then, if we decide that schools should run on the same model as New York restaurants, we should ask what the equivalent is of a restaurant going out of business, or of a customer having a bad meal. And what happens if we start thinking of education as a dining experience? Etc.

Finally, it is essential that we pay attention to what happens before and after the metaphor. Each student will bring a slightly different understanding of both the source and the target domains. Can we rely on them coming up with the same mappings on their own? And, if we think of the metaphor as an instrument for dealing with a particular concept, we must make sure we teach the students how it works and give them enough time to practice with it before we leave them to their own devices.

There is no perfect procedure for building a metaphor that explains a new concept. And the metaphor is always only a small part of the process of understanding. We must pay attention to the hard work necessary before a metaphor can be used. And we must think about the work required afterward for the metaphor to continue its usefulness.

Good metaphors are often remembered by students and teacher alike for a long time with emotional salience. But even the best metaphor becomes simply a fond memory of a past moment of enlightenment without any understanding if it is not being continually exercised and stretched. It is far too common for people to just remember the source domain with only the vaguest glimpses of the target domain distorted by time.

Ultimately, any metaphor-based explanation can be but a singular event in the continual process of understanding. Metaphors, when used well, can be great instruments for further exploration. But when used poorly, they are but ornaments on an empty box of the vacant mind.

Post script:

Lest there be any doubt: I have not only seen others make the mistakes I mention here. I have made them all myself. Again and again and again. Deepest apologies to my students.

What would make linguistics a better science? Science as a metaphor

Background

This is a lightly edited version of a comment posted on Martin Haspelmath’s blog post “Against traditional grammar – and for normal science in linguistics”. In it, he offers a critique of the current linguistic scene as being unclear about its goals and in need of better definitions. He proposes ‘normal science’ as an alternative:

In many fields of science, comparative research is based on objective measurements, not on categories that are hoped to be universal natural kinds. In linguistics, we can work with objectively defined comparative concepts (Haspelmath 2010).

While I am in broad agreement with the critique, I’m not sure the solution is going to lead to a ‘better science’ of linguistics. (Also, I’m not sure that this is an accurate description of how science actually works.)

Problem with ‘normal science’ approach to linguistics

I would say that the problem with the ‘normal science’ approach is that it makes it seem natural to turn to structural description as the mode of ‘doing good linguistics’. But I think that this is misleading as to the nature of language. The current challenge of the radical potential coming from constructionism (covering Fillmore, Croft, Goldberg) on the one hand and cognitive semantics (Lakoff, Talmy, Langacker) on the other makes a purely structural description (even one imbued with functionalism) less appealing as the foundation of a newly scientificised linguistics.

It’s curious that it’s physics and chemistry that get mentioned in this context, with their fully mathematised personas, and not biology or geography. In both of those, precise definitions are much more provisional and iterative. Even foundational terms such as species or gene are much more fluid and less well defined than it might seem (I recommend Keller’s ‘Century of the Gene’ for an account of how discrepancies in how the gene was defined among various labs were actually beneficial for the development of genetics).

That’s not to say that I strongly disagree with any of Haspelmath’s proposals but they don’t particularly make me excited to do linguistics. I found Dixon’s ‘Basic Linguistic Theory’ an exhilarating read but it was not because I felt that Dixon’s programme would lead to more consistency but because it was a radically new proposal (despite his claims to the contrary) for a theoretical basis for a comparative linguistic research agenda. Which is also why I like Haspelmath’s body of work (exemplified in projects such as the World Atlas of Language Structures and Glottolog).

But I doubt that the road ahead lies in better definitions. I’m not opposed to them, just skeptical that they will lead to much. The road ahead is in better data and better theory. I think that between corpus linguistics, frame semantics and construction grammar we can get both. I proposed the analogy of ‘dictionary and grammar being to language what standing on one foot is to running’. I think linguistics needs to embrace the dynamism of language as a human property rather than as a fixed effect (to borrow Clark’s phrase). Fillmore and Kay’s early writing on construction grammar was a first step, but things seem to have settled into the bad old ways of static structural description.

Data and theory need each other in a dialectic fashion. You need data to create a theory, but you need some proto-theory to see the data in the first place. And then you need your theory to collect more data, and that data then further shapes your theory, which in turn lets you see the data in different ways. The difference between biology and linguistics is that our proto-theories of the biological world correspond much better to the dynamic structures which can be theorized (modeled) based on systematic data collection and its modeling. This is why folk taxonomies of the biological world are much closer to those of botany or zoology than folk taxonomies of language are to linguistic structures. (They are much more elaborate to start with, at least at the level available to human perception.)

My proposal is to take seriously the human ability to reflect (hypostatically) on the way we speak (cf. Talmy’s defense of introspection), because this is at the start of any process that bootstraps a theory of language. We then need to be mindful of the way this awareness interacts with the subconscious automaticity with which the patterns of regularity we call structures seem to be used. In the same way that Fillmore and Kay asked any theory of grammar to account for the exceptions (or even take them as a starting point), I’d want to ask any theory of language to take bilingualism and code mixing as its starting point (inspired by Elaine Chaika) and to take seriously the variability in the acquisition of the ability to automatically use those structures.

None of this precludes or denies the utility of the great work of linguists like Haspelmath. But it is what I think would lead to linguistics being a ‘better’ science (at least in the sense of Wissenschaft or ‘natural philosophy’ rather than in the sense implied by the physics envy which often characterizes these efforts).

Update: 

After I finished writing this, I was listening to this episode of the Unsupervised Thinking podcast, where the group was discussing two papers critiquing some of the theoretical foundations of biology (“Can a biologist fix a radio?”) and neuroscience (“Could a Neuroscientist Understand a Microprocessor?”). The general thrust of the discussion was that better definitions would be important because they would allow better measurement and thus quantifiable models. But the discussion also veered towards the question of theory and pre-theoretical knowledge. To me, it underscored the tension between data and theory.

My concern is only with the assumption that definitions are the solution. I’d say that a definition (unless it is purely disambiguating polysemy) is just a distillation, a snapshot of a slice in time in the never-ending push and pull between data and the model used to make sense of it as well as to collect it (otherwise known as theory). This is not that different from the definition of a lexical item in a dictionary.

That is not to deny the heuristic usefulness of definitions. This reminded me of the critique of modern axiomatic mathematics (in particular set theory and number theory) exemplified by NJ Wildberger in his online courses on Math Foundations. Wildberger is also calling for more precise definitions in mathematics and less reliance on axioms.

Future directions:

I outlined some of the fundamental epistemological problems with definitions (as a species of a referential theory of meaning).

I’m working on a more extensive elaboration of some of the issues of comparing epistemic heuristics used to model the physical and the social world with a subtitle “The differential susceptibility of units to idealization in the social and physical realms” that addresses some of the questions I outlined above.

In it, I want to suggest that the key difference between the social and physical sciences is due to how easy it is to usefully idealize units and sets in the physical and the social world. Key passage from this:

All of physics is based on idealization. You have ideal gas, perfect motion, perfect vacuum, etc. All of Newtonian physics is based on the mathematical description of a world where things like friction don’t exist. An ideal world, if you will. Platonic, almost. And it turned out that this type of idealization can take us extremely far, if we let engineers loose on it.

Because all the progress we attribute to science has really been made by engineers. People who take a ballistic curve and ask ‘how about we add a little cross wind’. The modern world of technology around us is all built on tolerances, encoded in books of tables describing how far we can take the idealized formulas of science into the non-ideal conditions of the ‘real’ world.

In the social world, the ideal individual and the ideal society are more difficult to treat as units of analysis than a perfect vacuum or an ideal gas. But even if we could define them, there’s far less we can do with them that would make them anywhere near as useful as the idealizations of physics and chemistry. That’s why ‘engineering a solution’ is a positive description when we talk about the physical world but a negative one when we talk about the social world.

Cats and butterflies: 2 misunderstood analogies in scientistic discourse

The butterfly effect and Schrödinger’s cat are two very common ways of signalling one’s membership in the class of the scientifically literate. But they are almost always told wrong. They were both constructed as illustrations of paradoxes or counterintuitive findings in science. Their retelling always misses the crucial ‘as if’.

This is an example of metaphor becoming real through its sheer imagistic and rhetorical power. But it also underscores the need to carefully investigate both domains being mapped onto each other, as well as the act of mapping. Metaphors used generatively are only useful if they are abandoned quickly enough. In this case, the popular imagination not only did not abandon the metaphor, it made it into a literal statement with practical consequences.

The two narratives are usually constructed in the form of:

Science shows us that

  1. Cats can be both dead and alive in a box with an apparatus controlled by superposed quantum states.
  2. Butterflies can cause hurricanes on the other side of the world.

But the actual formulation should be:

Science produces a lot of counterintuitive and (seemingly) paradoxical results some of which are at odds with each other and/or our experience of the world. For instance:

  1. If we were to apply the principle of quantum superposition to the world we know, we would have to admit that a cat in a box with a quantum killing machine is dead and alive at the same time. And that is obviously nonsense.
  2. If we were to apply what we know about chaos theory to the world of causes and effects, we would have to admit that a butterfly flapping its wings on one side of the world can cause a hurricane on the other side of the world. And that is obviously nonsense because butterflies have no real impact on the world on scales larger than a flower.

Both Schrödinger and Lorenz were trying to illustrate the counterintuitive conclusions of their respective scientific fields: quantum mechanics and dynamical (chaotic) systems. And in both cases, it badly backfired.

Although Schrödinger’s dilemma was much more foundational to the structure of our universe, it was Lorenz’s light-hearted paper to an obscure symposium that did the more lasting damage.

It is of no practical consequence whether or not in the retelling of Schrödinger’s paradox, we omit the word ‘paradox’ and assert that ‘science tells us that there are machines that can make cats alive and dead at the same time.’ This is merely par for the course in the general magical nature of the public scientific discourse. And it can even spur the development of new models of the physical universe.

But the ‘butterfly effect’ is more dangerous because it seems like it could have practical applications. Lorenz was not stating a paradox per se, only a counterintuitive conclusion that goes against our most common scenarios of causal relations. Our basic experience of the world is that big things move big things and small things don’t. So any suggestion that we can make small things move big things seems intriguing. We know levers and pulleys can get us some of the way towards that, so the dream of a magical lever is always there. Homeopathy “works” on the same magical principle. But this is not the lesson of complexity.

The most common reformulation of the ‘butterfly effect’ is: ‘small actions can have a big impact’. However, this confuses sensitivity to initial conditions with a cumulative cascade effect. Every single snowflake contributes to an avalanche just as all the other aspects of the environment do. Although none of them individually has any effect on the world at human scale, when combined, they can move much larger objects. But it is that combination (into a larger whole) that has the impact.

Whether any particular snowflake is the last to fall or somebody clapped loudly near a snow drift just before the avalanche fell says nothing about the ability of small things to have big effects. Only about our inability to measure small variations in big things accurately enough.

It is not true that a butterfly’s wings cause anything but minute variations of air right next to them. But it may be true that they are one of the infinite variations of the whole weather system that is simply impossible to measure with finite precision. It’s not that it is hard to calculate all the variations, it is that there are more variations than we have atoms to calculate them with.

Sure, we can talk about proximate and ultimate causes, but that again hides the problem of calculation. And if we ignore the practical problems of measuring the infinite with finite tools, we only get mired in philosophical musings on prime movers and free will. And these have yet to lead anywhere in two and a half millennia.

The only thing we can learn from the butterfly effect is that we cannot measure complex systems accurately enough to predict their behavior over the long term with enough precision. The big mismatch is that while the variation in ‘initial conditions’ is too small to measure, the variation in the outcomes is not. And that feels wrong.
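To make the measurement point concrete, here is a minimal sketch that is not from the original post. It uses the logistic map, a standard toy chaotic system, as a stand-in for weather (Lorenz himself worked with simplified atmospheric models, not this map). Two starting values that differ by one part in ten billion, far below anything we could measure in a real system, are run through exactly the same rule. The parameter choices (r = 4, a divergence threshold of 0.1, 100 steps) and the helper names are illustrative assumptions.

```python
def logistic_trajectory(x0, r=4.0, steps=100):
    """Iterate the logistic map x -> r * x * (1 - x) and return every state."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


def first_divergence(a, b, threshold=0.1):
    """Return the first step at which two trajectories differ by more than threshold."""
    for step, (x, y) in enumerate(zip(a, b)):
        if abs(x - y) > threshold:
            return step
    return None  # never diverged within the simulated horizon


# Two "initial conditions" that no real instrument could tell apart.
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)

step = first_divergence(a, b)
print(f"initial gap: {abs(a[0] - b[0]):.0e}")
print(f"gap first exceeds 0.1 at step {step}")
```

Nothing in the rule changes between the two runs. For r = 4 the tiny initial gap roughly doubles per step on average, so after a few dozen iterations the two trajectories are unrelated. The "butterfly" does not push the outcome anywhere; the sketch only shows that a difference too small to measure ends up producing a difference in outcomes that is not.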

Complexity is, unsurprisingly, too complicated to be captured by a single metaphor. The ‘butterfly effect’ is a good metaphor for one aspect of it: sensitivity to initial conditions. But only if we understand that it is a metaphor that illustrates the counterintuitive nature of complexity and not complexity itself.

The larger lesson here is that metaphor is a process. It doesn’t just lie in a bit of text waiting for us to encounter it and understand it. It is picked up as part of stories. It is told, retold, reexamined, abandoned, readopted, and so on. If you unleash a generative metaphor on the world, you should keep an eye on it to make sure it’s still doing the job you meant it to do. That means a lot of talking and then more talking. Just like with butterflies, the ultimate outcome is never certain. That is fine. Metaphors are supposed to open up new spaces. But some of those spaces may have lions in them and we do know that lions have big impacts on human scales. Bon appetit!

3 burning issues in the study of metaphor


I’m not sure how ‘burning’ these issues are as such but if they’re not, I’d propose that they deserve to have some kindling or other accelerant thrown on them.

1. What is the interaction between automatic metaphor processing and deliberate metaphor application?

Metaphors have always been an attractive subject of study. But they have seen an explosion of interest since Lakoff and Johnson’s ‘Metaphors We Live By’. In an almost Freudian turn, these previously seemingly superfluous baubles of language and mind became central to how we think and speak. All of a sudden, it appeared that metaphors reveal something deeper about our mind that would otherwise remain hidden from view.

But our ability to construct and deconstruct metaphors was mostly left unexamined, even though this happens ‘literally’ all the time. People test the limits of ‘metaphor’ through all kinds of discursive patterns, from saying things like ‘X is more like Y’ to ‘X is actually Y’ or even ‘X is like Y because’.

How does this interact with the automatic, instantaneous and unconscious processing of language? (Let’s not forget that automatic processing is the more common of the two.)

2. What is the relationship between the cognitive (conceptual) and textual metaphor?

Another way to pose this question is: What happens in text and cognition in between all the metaphors? Many approaches to the study of metaphor only focus on the metaphors they see. They seem to ignore all the text and thought in between the metaphorical. But often, that is most of what goes on.

With a bit of effort, metaphors can be seen everywhere, but they are not all the same kind of thing. ‘Time is money’, ‘stop wasting my time’, and ‘we spent some time together’ are all metaphorical and rely on the same conceptual metaphor of TIME IS SOMETHING THAT CAN BE EXCHANGED. But they are clearly not doing the same job for the speaker and will be interpreted very differently by the listener.

But there’s even more at stake. Imagine a sentence like ‘Stop wasting my time. I could have been weeding my garden or spending time with my children instead of listening to you.’ Obviously, ‘wasting time’ plays a different role here than in a sentence like ‘Stop wasting my time. My time is money and when you waste my time, you waste my money.’ The conceptual underpinnings are the same, but the way they can be marshalled into meaning is different.

Metaphor analysts are only too happy to ignore the context, which can often be most of the text. I propose that we need a better model of metaphor in use.

3. What are the different processes involved in processing figurative language?

There are 2 broad schools in the psychology of metaphor, represented by the work of Sam Glucksberg and Raymond Gibbs. The difference between them can be summarised as ‘metaphor as polysemy’ vs ‘metaphor as cognition’. According to the first, metaphor is only a kind of additional meaning that words or phrases have. The second approach sees it as a deep, interconnected underpinning of our language and thought.

Personally, I’m much closer to the cognitive approach but it’s hard to deny that the experimental evidence is all over the place. The more I study metaphor, the more I’m convinced that we need a unified theory of metaphor processing that takes both approaches into account. But I don’t pretend I have a very clear idea of where to even start.

I think such a theory would also have to account for differences in how individuals process metaphors. There are figurative language pathologies (e.g. gaps in the ability to process metaphor are associated with autism). But clearly, there are also gradations in how well individuals can process metaphor.

Any one individual is also going to vary, over time and from one instance to the next, in how much they are able and/or willing to consider something to be metaphorical. Let’s take the example of ‘education is business’. Some people may not consider this to be a metaphor and will treat it as a straightforward descriptive statement along the lines of ‘dolphins are mammals’. Others will treat it more or less propositionally but will dispute it on the grounds that ‘education is education’, and therefore clearly not business. But those same people may pursue some of the metaphorical mappings to bolster their arguments, e.g. ‘Education is business and therefore, teachers need to be more productive.’ or ‘Education is not business because schools cannot go bankrupt’.

Bonus issue: What cognitive foundations does metaphor share with the rest of language?

This is not really a burning issue for metaphor studies so much as it is one for linguistics. Specifically semantics and pragmatics but also syntax and lexicography.

If we think of metaphor as conceptual domain (frame) mapping, we find that this is fundamental to all of language. Our understanding of attributes and predicates relies on the same ability to project between 2 domains as does understanding metaphor. (Although there seems to be some additional processing burden with novel metaphors.)

Even seemingly simple predicates such as ‘is white’ or ‘is food’ require a projection between domains.

Compare:

  1. Our car is white.
  2. Milk chocolate is white.
  3. His hair is white.

Our ability to understand 1 – 3 requires that we map the domain of the ‘subject’ onto the domain of the ‘is white’ predicate. Chocolate is white through and through, whereas cars are only white in certain parts (usually not the tires). Hair, on the other hand, is white in different ways. And in fact, ‘is white’ can never be fully informative when it comes to hair because there are too many ways in which hair can be white. It is even possible for opposite attributes to mean the same thing. ‘Nazi holocaust’ and ‘Jewish holocaust’ are both used to label the same event (with similar frequency): one names the perpetrators, the other the victims, and yet it is clear that they refer to one event. But this ‘clarity of meaning’ depends on projections between various domains, some of which involve ‘encyclopedic knowledge’. For instance, ‘Hungarian holocaust’ does not possess such clarity outside of specialist circles.

It appears that understanding simple predicates relies on the same processes as understanding metaphor does. What makes metaphor special then? Do we perhaps need to return to a more traditional view of metaphor as a rhetorical device but change the way we think about language?

That is the direction my own thinking about language and metaphor has been taking, but most linguistic theories treat these projections as unremarkable phenomena. This leads them to ignore some pretty fundamental things about language.

3 “easy” things that are hard for both humans and AI


Everybody is agog at what AI systems can do. Nobody thought even 10 years ago that machines could be trained to recognise images or transcribe natural speech as well as they do now. And because of this leap forward, everybody has started worrying about AI taking over the world because it will soon be able to do everything people can do, only better.

On the other hand, there are AI naysayers who point at incredible feats of human creativity and ingenuity and say ‘no machines will ever be able to write a poem’ or ‘manage a company’.

While I’m more than skeptical about the true possibilities of AI, I am equally skeptical about this supposed limitless human creativity that is beyond the bounds of computation.

I think we can reveal more about the limits and nature of human intelligence and thus the targets (possible limits) for AI development, if we look at very simple things with which both humans and AI struggle albeit in different ways.

Machines are often thought of as capable only of algorithmic processing (such as adding lots of numbers), and humans are thought to excel at massively parallel tasks, also known as intuition (such as telling dogs apart from cats). But we will see that they seem to trade these roles in the ways they approach, and fail at, these apparently simple problems.

I call these problems ‘easy’ because they can be broken into very easy and straightforward components. But they are hard, if not impossible, in reality because of the curse of dimensionality. Even the slightest variation in those simple components will grow into an exponential mess.

1. Figuring out time zones

Apple Watches recently stopped working because of Summer Time in Australia. And just the other day, Outlook asked me if I wanted to switch to a continental time zone in Europe. After I said yes, it started scheduling all meetings 2 hours off.

On the other hand, I’ve been arranging meetings between 2 time zones 1 hour apart for 20 years and I still get it wrong about 3 times out of 10.

So what gives? Time zones are conceptually very straightforward. You just have a database of times and places with notes on what time it is when and where relative to some fixed point. Then all you do is subtract anywhere between 1 and 12. What could be easier?
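The ‘database of times and places’ really exists: it is the IANA time zone database, which Python’s standard `zoneinfo` module (3.9 and later) reads. A minimal sketch, with London and Sydney as arbitrary examples, of why ‘subtract a number’ is already not the whole story: the number itself depends on the date. (Not every offset is a whole number of hours either; India is UTC+5:30 and Nepal UTC+5:45.)

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, reads the IANA time zone database

LONDON = ZoneInfo("Europe/London")
SYDNEY = ZoneInfo("Australia/Sydney")


def hours_ahead(home: ZoneInfo, other: ZoneInfo, instant: datetime) -> float:
    """How many hours `other` is ahead of `home` at a given moment."""
    home_offset = instant.astimezone(home).utcoffset()
    other_offset = instant.astimezone(other).utcoffset()
    return (other_offset - home_offset) / timedelta(hours=1)


january = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
july = datetime(2024, 7, 15, 12, 0, tzinfo=timezone.utc)
print(hours_ahead(LONDON, SYDNEY, january))  # 11.0: Sydney is on summer time
print(hours_ahead(LONDON, SYDNEY, july))     # 9.0: London is on summer time
```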

Well, you have to add in change of dates, so you have to switch between today, tomorrow and yesterday quite a lot. But still. There is a finite number of times and places and their combinations, so how hard can it be for human programmers to sit down and write all the code once and for all? Turns out, incredibly hard. There are just too many permutations and they keep changing as the database of times and places is being updated with new information.
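The change of dates is easy to see with Python’s standard `zoneinfo` module (3.9+). In this minimal sketch (the call itself is hypothetical), one and the same instant falls on different calendar days for the two participants:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")
SYDNEY = ZoneInfo("Australia/Sydney")

# A hypothetical 18:00 call on Monday 15 January 2024, booked in London time.
call = datetime(2024, 1, 15, 18, 0, tzinfo=LONDON)
in_sydney = call.astimezone(SYDNEY)

# Same instant, different calendar day: the invitation has to say "Monday"
# to one participant and "Tuesday" to the other.
print(call.date(), call.time(), "London")            # 2024-01-15 18:00:00 London
print(in_sydney.date(), in_sydney.time(), "Sydney")  # 2024-01-16 05:00:00 Sydney
```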

So, the magnitude of the problem seems to be too great for humans to come up with exhaustively detailed algorithms to deal with it. (To be clear, the core has been solved, but we don’t seem to be able to nail all the edge cases. Things would be a lot worse without computers.)
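One example of such an edge case, sketched with Python’s `zoneinfo` module: on the night the clocks go forward, some local times simply do not exist, yet nothing stops a program (or a person) from entering them. The helper `wall_time_exists` is hypothetical, written for this illustration; it detects the problem by converting to UTC and back.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")


def wall_time_exists(wall: datetime, zone: ZoneInfo) -> bool:
    """True if the naive wall-clock time actually occurs in `zone`."""
    local = wall.replace(tzinfo=zone)  # accepted without any error either way
    round_trip = local.astimezone(timezone.utc).astimezone(zone)
    return round_trip.replace(tzinfo=None) == wall


# On 31 March 2024 UK clocks went straight from 01:00 to 02:00,
# so a local time of 01:30 never happened in London that night.
print(wall_time_exists(datetime(2024, 3, 31, 1, 30), LONDON))  # False
print(wall_time_exists(datetime(2024, 3, 31, 2, 30), LONDON))  # True
```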

So why don’t we unleash deep learning on the problem? Well, partly because there’s no good data for a machine to learn on. This is mostly an algorithmic problem. But the inputs of the algorithm come from very human perceptions of how time relates to cyclical things like days and relationships like comparing states between time zones with respect to days. Again, none of this is all that complex. But the algorithmic part is too complex for humans to describe as a series of if-then commands to a computer without making lots of mistakes. And the perspective and context part seems to be completely outside of what any ML algorithm can access at the moment.

So we’re stuck with something that mostly works but not always and is mostly understood but also always confusing.

2. Scheduling a meeting

Scheduling meetings is another simple algorithmic problem related to time. Simply compare two series of numbers, find the gaps they have in common and spit out the result. But all of this starts interacting with a lot of human complexities that make the problem completely intractable if what we want to do is write a series of commands in the form of ‘if you see this, do that’.
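The ‘simple’ algorithmic core can indeed be written in a few lines. This sketch assumes each calendar is nothing more than a list of busy intervals in minutes after midnight; the people (`alice`, `bob`) and their times are made up. Everything the next paragraphs describe has to happen before data in this form exists.

```python
Interval = tuple[int, int]  # (start, end) in minutes after midnight


def to_minutes(hhmm: str) -> int:
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def span(start: str, end: str) -> Interval:
    return to_minutes(start), to_minutes(end)


def fmt(minutes: int) -> str:
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def common_free_slots(busy_a: list[Interval], busy_b: list[Interval],
                      day: Interval, min_length: int) -> list[Interval]:
    """Gaps of at least min_length minutes (min_length > 0) when neither person is busy."""
    free, cursor = [], day[0]
    for start, end in sorted(busy_a + busy_b):  # pool both calendars
        gap_end = min(start, day[1])
        if gap_end - cursor >= min_length:
            free.append((cursor, gap_end))
        cursor = max(cursor, end)
    if day[1] - cursor >= min_length:
        free.append((cursor, day[1]))
    return free


day = span("09:00", "17:00")
alice = [span("09:00", "10:30"), span("13:00", "14:00")]
bob = [span("10:00", "11:00"), span("15:00", "16:30")]

for start, end in common_free_slots(alice, bob, day, 60):
    print(f"{fmt(start)}-{fmt(end)}")  # 11:00-13:00, then 14:00-15:00
```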

That’s why the work of the human assistant handling the scheduling for a busy person (who rates an assistant) is not just to provide her intelligence or understanding of calendars. It involves conversations with the person whose calendar is being managed about their priorities, options, possible scenarios, conversations with other people, other assistants, eventually arriving at some compromise which is then entered into the straightforward if-then format of a calendar.

It is this conversation with other people that is often overlooked (“tell your people to call my people”). The assistants of the other busy people also have to synthesize the priorities and values of their charges through the same process of conversation and adjustment.

The final matching algorithm is very simple – so simple that it seems like no one should need a human assistant any more. But the inputs into the algorithm need to come from sources that are too rich and complex to treat algorithmically or through some multi-dimensional analysis of hidden regularities (like deep neural nets). The inputs require either a fairly general artificial intelligence (although not full-blown AGI) or that everyone keeps their calendar in the same way. (Even then we’d probably have to deal with travelling-salesman problems – but at least we have some ideas about the limits on the computability of those.)

There are many individual components of this process that could be algorithmically assisted. But often the simpler algorithms and heuristics such as preference polls and shared calendars are more effective aids than opaque machine learning output.
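A preference poll, for instance, reduces to a tally. In this sketch (names and slots are hypothetical), the algorithm only counts; deciding who to ask, which slots to offer, and what to do when a key person votes no remains a human job.

```python
from collections import Counter

# Hypothetical poll: each person lists every candidate slot they could attend.
responses = {
    "Clare": {"Mon 10:00", "Tue 14:00"},
    "Frances": {"Tue 14:00", "Wed 09:00"},
    "Sam": {"Mon 10:00", "Tue 14:00", "Wed 09:00"},
}


def best_slots(responses: dict[str, set[str]]) -> tuple[list[str], int]:
    """Return the slot(s) with the most 'yes' answers, and how many that is."""
    votes = Counter(slot for slots in responses.values() for slot in slots)
    if not votes:
        return [], 0
    top = max(votes.values())
    return sorted(slot for slot, count in votes.items() if count == top), top


print(best_slots(responses))  # (['Tue 14:00'], 3)
```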

So although this looks like a problem that should be solvable through ML, early attempts have been less than impressive.

3. Importing data about people into a table (deduplication)

Computers are great as aids to managing structured data. But the input into the structure has to be provided by humans. Can AI help here?

Imagine you’re organising an event and you want people to tell you if they’re coming. You can just ask and keep in your head who said yes. But that soon becomes too much. As a next step, you ask them to email or to send an RSVP so you can look at the messages and remind yourself who said yes. But even that becomes difficult soon, so you start a list. And the more people and events you need to manage, the more complicated the list is and the more time you have to spend structuring your data and inputting it into some sort of table for managing and reviewing the data.

The world is littered with Excel sheets kept by event organisers. Now imagine you wanted to feed all the idiosyncratic Excel sheets with event information at a large organisation into a machine learning algorithm and get a single figure for the total number of participants or the total cost of lunch breaks.

If everybody kept their spreadsheets exactly the same way, this would be trivial. But they don’t. Computers make the task of managing this kind of structured data much easier but they constantly struggle with errors in the input from busy, overworked and cognitively limited humans.

On the surface of it, there’s nothing to prevent this part (ie participant registration and management) of event management from being completely automated. But there’s always a person involved in dealing with this. So could there be an AI system that does all of this? So far we’re not even very close to this. A system that processes some RSVPs via email, others via forms, and others from other sources (“Hi, Clare, Frances told me she was coming to your party!”) does not exist.

So let’s simplify the task even more. Take data from one table of a system (let’s say a registration table) and put it into another table on a different system (let’s say account creation). All the AI system would have to do is figure out what is important to one system and get the right data from the other. At the moment, humans are involved: in the better cases, by creating an API and programming algorithms to transfer data between systems; in the worse cases, by downloading a spreadsheet from one system, modifying it if needed, and uploading it into the other.
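Even the simplest version of this task needs someone to decide which column in one system corresponds to which field in the other. A minimal sketch, assuming a hand-written synonym table (`TARGET_FIELDS` and its entries are hypothetical):

```python
# Hypothetical synonym table that a person has to write and keep up to date.
# Note the guess that 'second name' means surname; another system may disagree.
TARGET_FIELDS = {
    "first_name": {"first name", "given name", "forename"},
    "last_name": {"last name", "surname", "family name", "second name"},
    "email": {"email", "e-mail", "email address"},
}


def map_columns(headers: list[str]) -> tuple[dict[str, str], list[str]]:
    """Map source column headers to target fields; report anything unmatched."""
    mapping, unmatched = {}, []
    for header in headers:
        key = " ".join(header.lower().split())
        field = next((f for f, names in TARGET_FIELDS.items() if key in names), None)
        if field is None:
            unmatched.append(header)
        else:
            mapping[header] = field
    return mapping, unmatched


mapping, unmatched = map_columns(
    ["First Name", "Surname", "E-mail", "Dietary requirements", "Full name"]
)
print(mapping)    # {'First Name': 'first_name', 'Surname': 'last_name', 'E-mail': 'email'}
print(unmatched)  # ['Dietary requirements', 'Full name']
```

Every header not in the table ends up in `unmatched` for a person to sort out, and every entry in the table is itself a human judgement.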

This is trivial if you’ve designed both systems and know they have to integrate. But the permutations get out of hand surprisingly quickly when you take any 2 random systems designed by different people for a similar purpose but without the intention to integrate. Is ‘Last name’ always the same as ‘Second name’? When the full name is in one column, is the first name always first? Any one difficulty is easy for a human to spot and disambiguate. But it gets very error-prone at scale, and there are always some unexpected edge cases.
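The full-name question shows how quickly heuristics run out. This hypothetical helper handles the two most common layouts and gives up on everything else, but it also confidently returns wrong answers for names written family name first:

```python
from typing import Optional


def split_full_name(full: str) -> Optional[tuple[str, str]]:
    """Guess (first, last) from a single 'full name' cell, or None if unsure.

    Hypothetical heuristics, for illustration only:
    - 'Last, First'      -> split at the comma and swap
    - exactly two words  -> assume 'First Last'
    - anything else      -> give up and leave it to a person
    """
    full = " ".join(full.split())  # collapse stray whitespace
    if "," in full:
        last, _, first = full.partition(",")
        first, last = first.strip(), last.strip()
        return (first, last) if first and last else None
    words = full.split(" ")
    if len(words) == 2:
        return words[0], words[1]
    return None


print(split_full_name("Lovelace, Ada"))   # ('Ada', 'Lovelace')
print(split_full_name("Mary Ann Evans"))  # None: three words, no rule applies
print(split_full_name("Mao Zedong"))      # ('Mao', 'Zedong'): wrong, Mao is the family name
```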

Even such a simple thing as contact deduplication between two devices of one person is not a completely solved problem.
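Part of the difficulty is that ‘the same contact’ has to be defined by rules someone chooses. In this sketch (the normalisation rules and the ‘last nine digits’ assumption are hypothetical, not how any particular device does it), accents, case, spacing and phone formatting are ironed out before comparing:

```python
import re
import unicodedata
from collections import defaultdict


def match_key(name: str, phone: str) -> tuple[str, str]:
    """Reduce a contact to a crude comparison key (hypothetical rules)."""
    # Strip accents: NFKD splits 'ë' into 'e' plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    clean_name = " ".join(plain.casefold().split())
    # Assumption: the last nine digits are enough to identify a phone number,
    # which lets '+44 7700 900123' and '07700 900123' match.
    digits = re.sub(r"\D", "", phone)[-9:]
    return clean_name, digits


def deduplicate(contacts: list[dict[str, str]]) -> list[list[dict[str, str]]]:
    """Group contacts whose keys match exactly."""
    groups = defaultdict(list)
    for contact in contacts:
        groups[match_key(contact["name"], contact["phone"])].append(contact)
    return list(groups.values())


phone_book = [
    {"name": "Zoë Ashworth", "phone": "+44 7700 900123"},  # from a phone
    {"name": "zoe  ashworth", "phone": "07700 900123"},    # from a laptop
    {"name": "Zoe Ashworth", "phone": "+44 7700 900456"},  # same person? unknown
]
groups = deduplicate(phone_book)
print([len(group) for group in groups])  # [2, 1]
```

The first two entries merge, but the third stays separate, even though it may well be the same person with a second number. No amount of normalisation can settle that from this data alone.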

Why isn’t there an AI system that can evaluate the data and transfer it as appropriate but at scale and without the errors human data processors or programmers or data processing algorithms make?

As always, even the most trivial algorithms require very complex inputs. And with even minor variation in the possible inputs, the if-then logic becomes too unwieldy. Although computers in general are great at pursuing if-then logic chains regardless of complexity (within limits), AI algorithms are not. They provide guesses with probabilities. In certain areas, most notably speech and image recognition, their guesses are becoming very good and resemble human performance. They may even outperform humans at scale.

But all the if-then part of what to do with these guesses is still handled by if-then algorithms designed by humans. There’s some talk of ‘Programming 2.0’ but nobody seems to be applying it to some of the day-to-day simple problems with complex scaling. Because even small errors in the inputs result in big aggregate problems and AI systems have no way of assessing whether their guesses ‘make sense’.
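The division of labour described here can be sketched in a few lines. The `Guess` type and both thresholds are hypothetical; the point is that the model only supplies a label and a probability, while a human-written if-then rule decides what happens next. The last lines assume the probabilities are perfectly calibrated, which real models rarely are:

```python
from dataclasses import dataclass


@dataclass
class Guess:
    label: str          # what the model thinks a value is, e.g. "last_name"
    probability: float  # the model's confidence, between 0 and 1


# Hypothetical thresholds: chosen by a person, not learned by the model.
AUTO_ACCEPT = 0.95
DISCARD_BELOW = 0.50


def decide(guess: Guess) -> str:
    """Human-written if-then rules wrapped around a probabilistic guess."""
    if guess.probability >= AUTO_ACCEPT:
        return "accept"
    if guess.probability < DISCARD_BELOW:
        return "discard"
    return "ask a person"


# Assumes perfectly calibrated probabilities: a 0.97 guess is wrong 3% of the time.
guesses = [Guess("last_name", 0.97)] * 10_000
expected_errors = sum(1 - g.probability for g in guesses if decide(g) == "accept")
print(round(expected_errors))  # 300
```

Even at 97% confidence, ten thousand automatic decisions can be expected to contain around 300 errors, and nothing in the model flags which ones they are.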

Is AI impossible?

Maybe AI is just too hard. But these examples don’t claim it’s impossible; they only show that some difficult problems are simply difficult, even if they appear straightforward on the surface.

I have learned not to bet against engineers’ ability to figure out solutions in the long run. It’s not always clear ahead of time what is solvable by AI and what is not. Sometimes, specialised ML systems can be developed to solve problems that don’t generalise (e.g. Go or chess machines). But I would have expected more people to be dealing with these problems if there were an easy solution to be found. And there are hundreds more similar task-based problems that just won’t be magicked away by slapping the label ‘AI’ on them. Individual ones may be solved in one way or another, perhaps by breaking them into component parts. But I’m not seeing any specific steps being taken to create general-purpose Machine Learning that would deal with all of them. Just wishful thinking about AGI (Artificial General Intelligence) emerging to solve these problems without regard to the actual complexity of some of them or the complexity of the intermediate steps it would take to get there.