
Explanation is an event, understanding is a process: How (not) to explain anything with metaphor


TL;DR

  • There are at least 3 uses of metaphor in the educational process: 1. Invitation to enter; 2. An instrument to grasp knowledge with; 3. Catalyst to transform understanding. Many educators assume that 1 is enough but it rarely leads to any useful understanding.
  • Explanation is a salient part of the educational process to such an extent that it is often allowed to stand for all of it even though it is only one step.
  • Explanation often helps the person doing the explaining more than the person being explained at.
  • Metaphors and explanations have been misused by educators from Socrates to Rousseau.
  • A metaphor can only be successful if the student already has some knowledge of the target domain. Knowledge of the source domain is often less important.
  • Metaphor only makes sense if it’s part of a process of learning. It doesn’t do much good on its own.

How metaphors work in helping us understand things

Teachers love explaining things. Students love understanding things. On the rare occasions that the two coincide, the feeling of joy shines like a beacon for the power of explanation. Teachers tell stories of seeing the “lightbulbs come on” in their students’ eyes. Students remember fondly the ecstatic moments of sudden illumination as their teacher’s words lit up the darkness within them. Thus the myth of teaching as explaining and learning as understanding those explanations was born.

Most of the more powerful explanations rely on metaphor in the broadest possible sense. In fact, all explanation is to some extent metaphorical in that it provides a projection from one domain of understanding onto another. Metaphor brings out the familiar – or ex-plains it – in the unfamiliar. Or so the story goes.

We can think of the metaphoric projection as putting two thin sheets of paper on top of each other and looking at them against a bright light. What can be on these sheets? Sketches, images, words or even just smudges of color. The projection then obscures certain things and shows others in new contexts. Sometimes, with more complex sheets, we may see completely new shapes and color hues. The process of making sense of the metaphor then involves slight adjustments in how those two sheets align against one another. This can be described as the metaphor giving a new structure to the target domain.

Another way to think about metaphoric projection is as two sets of items which are mapped onto each other. We can put the sets side by side and draw lines between items we think match. Or we can take the matched items out and place them together in a new set. Metaphors are often displayed in this way.

Note: This way of thinking about metaphor started with Lakoff and Johnson’s ‘Metaphors we live by’ from 1980. This led to the formulation of the Conceptual Metaphor Theory. It was later developed into a more general theory of frames or mental models by Turner and Fauconnier (2002) known as the theory of conceptual integration or blending. But it can also be found in Donald A Schön’s ‘Displacement of Concepts’ from 1963 which indirectly inspired Lakoff and Johnson.

But despite all this, it is easy to overlook that in order to form a projection from one mental space into another, we have to have some structure in both. In fact, metaphor often assumes equal knowledge of both domains, and in the process of making a projection from one onto the other, a new, previously unimagined structure emerges that is a blend of both domains. Because of the complexity, it is hard to give brief examples, but Turner and Fauconnier’s ‘The Way We Think’ is full of very illuminating case studies.

But it is also not at all uncommon for metaphor to borrow from a domain we know much less about to elucidate a domain we know a lot about. For example, if I hear, ‘don’t go into that office, the boss is on a warpath’, I understand a lot more about the boss’s behaviour than I do about any warpaths. Here, only the general feeling of ferocity is transferred, with none of the possible associations of weaponry or military supply lines.

Metaphor is also always partial. It would make no sense to project every aspect of both domains onto one another. But the ability to understand which bits it makes sense to project and which must be left out also requires at least some understanding of both domains. To understand what we mean when we call a piece of software a ‘virus’ we must know enough about computers to know that the infection cannot be transmitted through simple touch.

Metaphor at its most powerful helps us understand both domains better. It also often results in the creation of new understanding of both domains as we strive to find the limits of possible cross-domain mappings. Often, this happens with honest historical explanations of the present. By comparing the Iraq war to Vietnam, we may only choose to transfer the feeling of emotion and loss associated with the latter. But we may also choose to explore both in their own right to find the best way in which they project onto one another. And this gives us new understanding of both.

Three uses of metaphor in explanation

There are many ways to classify the uses of metaphors, I’ve outlined some in an early paper. But for the purposes of metaphor in explanation, I’d like to offer three broad types: 1. Metaphor as invitation; 2. Metaphor as instrument; 3. Metaphor as catalyst. I fear that the first type may be most common while only the second two play any real role in building understanding. These three types could also be viewed as forming a sort of process but this is not inherent in the definition.

As we will see, sometimes the same metaphor can serve all three roles, providing a certain thread through the process of learning. But most often, we need new metaphors for each type or stage.

1. Metaphor as invitation

Novice students often come to a new subject with no knowledge and a healthy dose of fear of the unknown. To help them feel more comfortable, teachers often reach for metaphors relying on the familiar. This gives the learner a chance to grasp onto something while they build up sufficient mental representations of the new domain.

But this use of metaphor usually does not help understanding. It just provides emotional support along the arduous journey towards that understanding. It can also backfire. Teachers often spin up these kinds of metaphors in such a way that they assume an understanding of the unfamiliar. And it is only once students have bootstrapped themselves into some understanding of the subject that the metaphor starts to make any sense to them.

For instance (to use a famous example), we can teach students that the electrical current is like a flow of water. This certainly takes some fear out of the invisible world of electrons. But unless students have at least some prior understanding of electricity, they may ask questions like ‘how do you get the water into the wires?’

This type of metaphor can only be used for a fleeting moment and it must be followed by hard work of accumulating understanding of the new domain on its own terms. Perhaps with the use of more metaphors, this time of the instrumental kind.

2. Metaphor as an instrument

The instrumental use of metaphor for explanation is where real understanding starts to happen. But not all teachers are equally good at it. In this case, the metaphor provides a way for the student to grasp the new subject. A lens to see it through, or a mental instrument to manipulate it with. Such metaphors are essential to the learning process. However, they do not rely on the moment of instant insight, which they can sometimes trigger, but rather on continuing exploration of the projection between the two domains. Their usefulness lies less in the feeling of illumination than in their availability to be used over and over again.

For instance, electrical engineers may be able to make better judgments about certain properties of electrical circuits when they think of electrons as a flow of water. But in other instances, they may be better off when they think about electrons as lots of tiny balls rubbing against one another, generating heat. This metaphor can come up over and over to help them mentally manipulate the two domains.

Here, as with all metaphors, it is essential that we know when to let go. Or even better, when to switch to a different or even a contradictory metaphor. These instrumental metaphors can be local or global but it is rare that one will be enough.

3. Metaphor as catalyst

In the third use, the metaphor plays the role of a catalyst. Like a powder dissolved into a liquid, it makes a new substance in which both domains are transformed into one unified understanding. This is when the student transforms into a scholar: making independent judgments, challenging the teacher’s own understanding, and ultimately becoming her own teacher. To work as a catalyst, the metaphor may be very rich and detailed or just a quick sketch resulting in a slight shift of perspective. But it always requires solid knowledge of the target domain.

Let’s continue with our electrical current example. Here, the student comes not only to understand that sometimes electricity behaves like a liquid and sometimes like a collection of particles, they also come to see the complexity of liquids and particles. They start making predictions both ways and ask questions like ‘What if we thought of the flow of water as a collection of particles?’, etc.

Here the metaphor becomes a process without an end. It spurs new mixtures and remixtures as one finds out more about the two (and often more) domains. Unlike with instrumental and invitational metaphors, it is no longer important that the metaphor be apt. It is just important that it is useful for new understandings or the possibilities of these new understandings. Donald Schön called one subtype of these ‘generative metaphor’.

But as with the other types, it is important that these metaphors come with some sort of self-destruct mechanism.

What often happens is that these metaphors are taken up by those who presume that they map fully onto the target domain and that no other understanding of the target domain is necessary. I described how this is a problem with Schroedinger’s cat or Lorenz’s hurricane-triggering butterflies.

What’s even worse, teachers often use these metaphors far too soon. This either confuses students or, worse, it gives them an illusion of understanding that they do not possess.

How NOT to use metaphor to explain something – two case studies

Case study 1: Metaphor gap in data science

My first case study of a bad use of explanation with metaphor is the podcast Data Skeptic. In fact, listening to the most recent episodes prompted me to write this in the first place.

I must preface this by saying that I like the podcast and recommend it to others who want to understand modern data science. It covers important subjects and there is much to learn from it. Its one unfortunate feature, however, is the set of episodes in which the host, data scientist Kyle Polich, uses his wife, project manager and English major Linh Da Tran, as co-host and tries to explain concepts from abstract computational theory to her. Or rather at her.

This almost invariably fails. Not because Linh Da does not possess the raw intelligence or aptitude to understand these concepts but because Kyle confuses metaphor with explanation and explanation with understanding.

In two recent episodes, he attempted to explain attention in neural networks and Neural Turing Machines. It was an unmitigated disaster. As the metaphors kept piling up, Linh Da finally cried out “I don’t know what you want me to understand”. That’s exactly the problem with a metaphor that only relies on the understanding of the source domain. It serves as a good invitation to the subject but as a very bad instrument for developing an understanding.

There are several problems with this setup that make it a bad place for too many metaphors. First, Linh Da is clearly just humoring Kyle. She’s vaguely interested in machine learning as a phenomenon but has no real interest in putting much work into learning how it works. This forces Kyle into more and more metaphors about their pet bird Yoshi. These are useful socially and emotionally because they allow Linh Da to contribute to the discussion. But her contributions at every turn show that she cannot use any of the analogies to make useful inferences about the subject. She almost never brings up previous subjects. At the end of the episode on Neural Turing Machines, she asked who owns the Turing Machine. In all the torrent of analogies, Kyle neglected to stress that the Turing Machine is itself a metaphor. This is despite a prior episode where another guest explained very clearly why Turing Machines are important.

The conceit of the episodes is that data science can be explained even to English majors. That is certainly correct. But those majors must be willing to put in some work between episodes or have some prior knowledge. And as the subjects get more technical or abstract, the explanations have to get longer and include some practice time. And the amount of this practice needs to increase as if the practice were filling a funnel and not a test tube: getting from level B to C requires more work than getting from level A to B. Otherwise, the metaphors have nothing to hold on to. They constantly invite the student in but then offer no tools for going further. At best, they will confuse the learner and at worst, they will give them an illusion of understanding. About as useful as a seat belt made from masking tape.

While it is pleasant learning about these concepts through listening in on a married couple having a light-hearted conversation, at a certain stage, this pedagogic device just gets in the way of learning by the audience. Initially, the listener can just do their own metaphor mapping and ask the right questions in their head. But as the abstractness level increases, the host doing the explaining cannot go into sufficient depth because the co-host can’t keep up. And the increasingly convoluted and unnecessary metaphors just create a mental fog that descends over all.

I was particularly disappointed in the episode on Attention in neural networks which is something I wanted to learn more about. I found the initial metaphor of attention as a sort of memory span very useful but then it got stuck because Linh Da could not use it to go any further. This was because she was not given a chance to integrate the previous episodes where similar things were discussed. It was still useful to me because then I could go read about attention with a renewed perspective. But an opportunity for a deeper exploration of the metaphor was wasted.

This would have been fine if the episode were aimed at a general public with no prior understanding rather than at an interested audience with some background. But even then, the general public would have needed more and different information to make any sense of it.

At one point, Kyle raised the possibility that maybe he wasn’t an effective teacher because Linh Da could not understand something he had explained. But in fact, he was not being a teacher at all. In this setting, he’s just a provider of images. Like a documentary from the Serengeti where the audience remembers there are lions but could not place the Serengeti on a map.

I can imagine that Kyle would be a very effective teacher with students who are interested in the subject and if he had a chance to take them through it step by step. And his use of metaphors would be a valuable contribution to that. But in the podcast, he’s only playing at being a teacher with Linh Da and she’s only pretending to be a student. His only goal is to get her to answer questions within his metaphor in a way that makes it seem as if she has achieved comprehension. This means she never gets a chance to try out the structures of the source domain on the target domain. And because of this she never gets to develop any understanding that could later be used as a foundation for further metaphors. Without this, adding more to the mix feels like an avalanche of analogies.

Case study 2: The explanation illusion at Wired magazine

But Data Skeptic is not the worst example of this type of pseudo-teaching by explanation. Only the most recent in my mind. A possibly much worse example is the Wired magazine series in which one expert supposedly explains a technical concept at 5 levels of difficulty: a 5-to-7-year-old, a young teen, a college student, a graduate student, and another expert. These explanations often involve some level of metaphor, but they are mostly pointless. The conceit is that anybody can understand these concepts at “some level”. But the explanations do not equal understanding, as is amply demonstrated in the videos. The people being explained to do not usually develop any new understanding. And it is doubtful whether the people watching do either.

Some of these failures happen because the topic is just not appropriate for a certain audience. A 5- or 13-year-old does not need to understand (nor do they have the background to) things like CRISPR or the Connectome. At best, they may learn which discipline these belong to, but that’s just teaching them a new name. No understanding of the phenomena is necessary.

But even when the understanding is well within reach and might have its use, the ‘expert’ fumbles. Thus the great and inventive musician Jacob Collier failed to explain the concept of ‘harmony’ to any of his charges. First, he tried to convince a five-year-old that harmony is a way of expressing a feeling with music (as opposed to melody). This is not only too abstract, it is also wrong. Both harmony and melody express feelings. But harmony is different notes played on top of one another rather than in sequence as in a melody (the feelings come from the pitch distance between the tones). This is well within the scope of understanding of a 5-year-old when accompanied by some examples. No elaborate metaphors are necessary. But Jacob Collier goes into a very abstract explanation concluding with the most pointless question in any teacher’s arsenal: ‘does this make sense?’, to which he gets an ‘uh-huh’ from the child who clearly has no clue.

But explaining anything to 5-year-olds is hard. So does he do better with a teen? No. He still sticks with the metaphor of harmony as adding emotion to a melody. But then he mixes in the idea of harmony being a journey. To illustrate this, he goes from demonstrating a simple major/minor chord distinction to a jazz chord substitution. Which is wonderful and impresses the student but does not illustrate the concept of harmony to her.

No explanation happens at the higher levels either because all of the others (culminating in jazz giant Herbie Hancock) know the key concepts. So Collier just chats with them about harmonization and reharmonization. Which also reveals that that’s what he had in mind with the 5-year-old and the teen – he was just explaining a much more advanced concept under the label of the simple one.

One of the commenters on the video made an astute observation:

“it’s interesting how in the earlier levels it has to do more with theory and as you get higher up the level it goes back to nature and life experience and emotions. It’s almost as if, as the complexity increases, there’s also a level of fundamental basic understanding of nature and how it goes hand in hand at the most complex level” (emusik97531 [DL fixed small typos])

Essentially, as the level of the underlying understanding grows, the simple metaphors of journey, place and feeling have the most impact. At the lower levels, they just hang in there, not doing much of anything. They may feel like an invitation, but they don’t offer any way to be used as a tool for understanding.

At the higher levels, Collier also shows that maybe he could be a great teacher to somebody closer to his level of skill and understanding. But it also reveals the pointlessness of an isolated act of explanation with (or without) metaphor if it is not supported by the hard work of making the connections necessary for the metaphor to become a proper instrument or a catalyst.

This is not a particular critique of Jacob Collier but rather of the whole setup of the series by Wired. Nobody could succeed in this setting. The concept is either going to be too hard at the low levels or too basic at the higher ones.

The inglorious history of metaphorical explanation in education

Collier and Polich, as well as countless others, are in the illustrious company of people who overestimate what explanation can do in the process of learning.

Socrates, in a famous scene from ‘Meno’, walks a slave boy through a series of questions “proving” that he already knew the answer to how to ‘double’ the area of a square. B F Skinner (1965) [PDF] called the Socratic method modeled on this example “one of the great frauds in the history of education”. Setting aside the metaphysics of innate transcendental knowledge Socrates was after, the boy clearly did not learn anything through the interaction. He would not even be able to recreate the proof at a later point. He never got a chance to develop an understanding. This is very much reminiscent of the long-suffering Linh Da who simply answers questions without getting the point of them at any stage and is clearly not able to reconstruct the argument later.

Another giant of philosophy, Rousseau, constructed a thought-experimental student in Emile (because, by his own admission, he found teaching actual students ill-suited to his temperament). Rousseau took the imaginary Emile on a similarly Socratic journey to create the perfect ‘natural man’. Rousseau’s Emile always immediately gets the point of his metaphors and learns the right lesson as if by magic. He rarely does anything in the way of practice – although he perhaps has more time to assimilate new knowledge than Socrates’ victim.

There is much of Rousseau and Socrates in all teachers. Explanations and metaphors are heady stuff, while boring practice, such as the kind Skinner hoped to replace with his teaching machines, is the embodiment of tedium for all involved. But without some sort of practice-like engagement with the subject, no understanding is possible. Educators often leave this for the spaces ‘in-between’ teaching events – invisible to them other than as returned homework assignments. Students who succeed have somehow figured out how to do that unmentioned task of conceptual practice. This then looks like effortless insight to the students who struggle.

How to actually use metaphors for explanation

So, is there a way to avoid the pitfalls we encountered above? As we saw, the first step should be asking oneself whether this is the time for more explanation and whether metaphors are the best way of arriving at a useful understanding.

We must also remember that there is no such thing as a perfect explanation or perfect metaphor. Not everybody finds the conceptual work of cognitively decoupling one domain so that it can be projected on another easy to do or even useful. But at some point a metaphor is the only way to go about explaining something.

So when it comes time to construct the metaphor, we must make sure of two things.

First, we have to find the right source domain for the metaphor that can be projected onto the target domain so that the student can achieve useful understanding of some aspect of the target. This happens pretty much through a process of trial and error, which means we’re unlikely to hit on the right metaphor on the first try.

Second, we have to make sure we have a good grasp on the possible projections between the two domains. I broadly described the process in my guide to metaphor hacking. We have to decide on what the purpose of the metaphor is and whether successful mappings can be made between the two domains. But we have to keep exploring both domains to see if there are any mappings that would result in a misunderstanding. These then have to be explicitly cut off from the metaphor.

For example, a virus is a good metaphor for a piece of software that ‘infects’ your computer. But we must also specify that this can only happen by executing the software, not by the simple exposure of two PCs in the same room.
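To make the idea of cutting off mappings concrete, here is a minimal sketch of a metaphor as a partial mapping between two domains, with some projections explicitly blocked. Everything in it (the class, the field names, the example mappings) is a hypothetical illustration of the idea, not an established formalism:

```python
# A metaphor as a partial, deliberately limited mapping between two domains.
from dataclasses import dataclass, field


@dataclass
class Metaphor:
    source: str
    target: str
    mappings: dict = field(default_factory=dict)  # source concept -> target concept
    blocked: set = field(default_factory=set)     # projections explicitly cut off

    def project(self, concept):
        """Return the target-domain counterpart, or None if the projection is blocked."""
        if concept in self.blocked:
            return None
        return self.mappings.get(concept)


virus_metaphor = Metaphor(
    source="biological virus",
    target="malicious software",
    mappings={
        "infection": "running the malicious program",
        "spreading": "copying itself to other machines",
        "host": "the computer that runs the program",
    },
    blocked={"transmission by physical proximity"},  # two PCs in one room don't infect each other
)

print(virus_metaphor.project("infection"))
print(virus_metaphor.project("transmission by physical proximity"))  # None: deliberately cut off
```

The point is not the code but the shape: the mapping is partial, and part of teaching the metaphor is saying out loud which projections are off the table.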

The teacher must know when to abandon a metaphor as much as when to bring one up. Some metaphors are local and others are global. The global metaphors are particularly dangerous because they can lock out possible alternative sources of understanding.

Switching between metaphors is essential. But it also contains a danger. The biggest mistake teachers (including this one) make when students say they don’t understand is to fill the air with more and different explanations. Yes, these may be necessary. But first give the student some space and time to integrate what they have already heard into their current level of understanding.

The teacher also has to make sure that the student already has sufficient mental representations from both domains for the projections between them to catch onto something. A computer virus metaphor is useless if the student knows nothing about viruses, but it also does not help if the student knows nothing about computers.

Particularly when metaphors are used as catalysts, it is important to investigate the source domain as much as the target domain. For instance, if we use the metaphor ‘education is business’, we may want to look at various aspects of the way businesses work to see if there are unexpected dangers in using this metaphor globally. Then, if we decide that schools should run along the same model that New York restaurants do, we should ask what is the equivalent of a restaurant going out of business, or a customer having a bad meal. And what happens if we start thinking of education as a dining experience? Etc.

Finally, it is essential that we pay attention to what happens before and after the metaphor. Each student will bring a slightly different understanding of both the source and the target domains. Can we rely on them coming up with the same mappings on their own? And, if we think of the metaphor as an instrument for dealing with a particular concept, we must make sure we teach the students how it works and give them enough time to practice with it before we leave them to their own devices.

There is no perfect procedure for building a metaphor that explains a new concept. And the metaphor is always only a small part of the process of understanding. We must pay attention to the hard work necessary before a metaphor can be used. And we must think about the work required afterward for the metaphor to continue its usefulness.

Good metaphors are often remembered by students and teacher alike for a long time with emotional salience. But even the best metaphor becomes simply a fond memory of a past moment of enlightenment without any understanding if it is not being continually exercised and stretched. It is far too common for people to just remember the source domain with only the vaguest glimpses of the target domain distorted by time.

Ultimately, any metaphor-based explanation can be but a singular event in the continual process of understanding. Metaphors, when used well, can be great instruments for further exploration. But when used poorly, they are but ornaments on an empty box of the vacant mind.

Post script:

Lest there be any doubt: I have not only seen others make the mistakes I mention here. I have made them all myself. Again and again and again. Deepest apologies to my students.

Does machine learning produce mental representations?


TL;DR

  • Why is this important? Many people believe that mental representations are the next goal for ML and a prerequisite for AGI.
  • Does machine learning produce mental representations equivalent to human ones in kind (if not in quality or quantity)? Definitely not, and there is no clear pathway from current approaches to a place where it would. But it is worth noting that mental representations in humans are also not something straightforward to identify or describe.
  • Is there a currently viable approach to ML that could eventually lead to mental representations with more engineering? It appears not, but then again, no one expected neural nets to become so successful.

Update: Further discussion on Reddit.

Background

Over the last few months, I’ve been catching up more systematically on what’s been happening in machine learning and AI research in the last 5 years or so and noticed that a lot of people are starting to talk about the neural net developing a ‘mental’ representation of the problem at hand. As someone who’s preoccupied with mental representations a lot, this struck me as odd because what was being described for the machine learning algorithms did not seem to match what else we know about mental representations.

I had been formulating this post when I was pointed to this interview with Judea Pearl. And he makes exactly the same point:

“That sounds like sacrilege, to say that all the impressive achievements of deep learning amount to just fitting a curve to data. From the point of view of the mathematical hierarchy, no matter how skillfully you manipulate the data and what you read into the data when you manipulate it, it’s still a curve-fitting exercise, albeit complex and nontrivial.”

He continues:

“If a machine does not have a model of reality, you cannot expect the machine to behave intelligently in that reality.”

What does this model of reality look like? Pearl seems to reduce it to ‘cause and effect’ but I would suggest that the model needs more than that. (Note: I haven’t read his book, just the interview and this intro.)

What are mental representations?

Mental representations are all sorts of images (ranging from rich to schematic and from static to dynamic) in our mind on which we draw sometimes consciously but mostly unconsciously to deal with the world. They are essential for producing and understanding language (from even the simplest sentence) and for basic reasoning. They can be represented as schemas, rich images, scenarios, scripts, dictionaries or encyclopedic entries. They can be in many modalities – speech, sound, image, moving picture.

Here are some examples to illustrate.

Static schemas

What does ‘it’ refer to in pairs of sentences such as these (example from here):

  1. The trophy wouldn’t fit into the suitcase because it was too big.
  2. The trophy wouldn’t fit into the suitcase because it was too small.

It takes no effort at all for a human to determine that it in (1) refers to the trophy and in (2) to the suitcase. Why? Because we have schemas of containment and we know almost intuitively that big things don’t fit into smaller things. And when we project that schema onto the trophy and the suitcase, we immediately know what has to be too big or too small in order for one not to fit into the other.
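As a purely illustrative sketch (the encoding below is a toy of my own, not a claim about how the mind stores the schema), the containment reasoning can be written out in a few lines of Python:

```python
# Toy containment schema: which role is compatible with which failure reason.
CONTAINMENT_SCHEMA = {
    "too big": "contained",    # only the thing being put in can be too big to fit
    "too small": "container",  # only the thing it goes into can be too small to hold it
}


def resolve_it(contained, container, reason):
    """Resolve the pronoun by checking which role the stated reason can apply to."""
    role = CONTAINMENT_SCHEMA[reason]
    return contained if role == "contained" else container


print(resolve_it("trophy", "suitcase", "too big"))    # -> trophy
print(resolve_it("trophy", "suitcase", "too small"))  # -> suitcase
```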

You can even do it with a single sentence as in Jane is standing behind Clare so you cannot see her. It is clear that her refers to Jane and not Clare but only because we can project a schema of 2 similar-sized objects positioned relative to the observer’s line of sight.

So we also know that only sentence 1 below makes sense because of the schema we have for things of unequal size being positioned relative to each other and their impact on our ability to see them.

  1. The statue is in front of the cathedral.
  2. The cathedral is in front of the statue.

However, unlike with the trophy and suitcase, it is possible to imagine contexts in which sentence 2 would be acceptable. For instance, in a board game where all objects are printed on blocks of the same size and positioned on a 2D space.

This is to illustrate that the schemas are not static but interact with the rich conceptualisations we create in context.

Force dynamics

This is a notion pioneered by Leonard Talmy that explains many aspects of cognitive and linguistic processes through dynamic schemas of proportional interaction. Thus we know that all things being equal, bigger things will influence smaller things, faster things will overtake slower things, etc.

So we can immediately interpret the it in sentences such as:

  1. The foot hit the ball and it flew off.
  2. The bird landed on the perch and it fell apart.

But we also apply these to more abstract domains. We can thus easily interpret the situations behind these 2 sentences:

  1. The mother walked in and the baby calmed down.
  2. The baby walked in and the mother calmed down.

If asked to tell the story that led to 1 or 2, people would converge on very similar scenarios around each sentence.

Knowledge of the world

Sometimes, we marshal quite rich (encyclopedic) knowledge of the world to understand what we hear or see. Imagine what is required to match the following 2 pairs of sentences (drawing on Langacker):

  1. The Normans conquered England with …
  2. The Smiths conquered England with …
  a. … their moody music.
  b. … their superior army.

Obviously the right pairings are 1b and 2a. But none of this is contained in the surface form. We must have the ‘encyclopedic’ knowledge of who The Normans and The Smiths were but also the force dynamic schemas of who can conquer whom.

So on hearing the sentence ‘Mr and Mrs Smith conquered Britain’, we would be looking for some metaphorical mapping to explain the mismatch between the force we know conquering requires and the force we know a married couple can exert. With sufficiently rich knowledge, this is immediately obvious as in ‘John and Yoko conquered America.’

How does machine learning do on interpreting human mental representations?

For AI, examples such as the above are a difficult challenge. It was recently proposed that a much more effective and objective Turing test would be to ask an AI to interpret sentences such as these under the Winograd Schema Challenge (https://en.wikipedia.org/wiki/Winograd_Schema_Challenge).

The challenge consists of a database of pairs of sentences such as:

  1. The city councilmen refused the demonstrators a permit because they feared violence.
  2. The city councilmen refused the demonstrators a permit because they advocated violence.

This has the great advantage of perfect objectivity. Unlike with the Turing test, it is always clear which answer is correct.

The best machine learning algorithms use various tricks but they still only do slightly better than chance (57%) at interpreting these schemas.

The only problem is that it is quite hard to construct these pairs in a way that could not be solved with simple statistical distributions. For instance, the Smiths and Normans example above could be easily resolved with current techniques simply by searching which words occur most frequently together.
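For illustration, here is roughly what such a statistical shortcut looks like; the co-occurrence counts below are invented for the example, not taken from any real corpus:

```python
# A naive collocation baseline: pick the candidate that co-occurs most often
# with the cue word. The counts are made up purely for illustration.
from collections import Counter

cooccurrence = Counter({
    ("Normans", "army"): 120,
    ("Normans", "music"): 2,
    ("Smiths", "army"): 1,
    ("Smiths", "music"): 85,
})


def pick_candidate(candidates, cue):
    """Return whichever candidate appears most often next to the cue word."""
    return max(candidates, key=lambda c: cooccurrence[(c, cue)])


print(pick_candidate(["Normans", "Smiths"], "army"))   # -> Normans
print(pick_candidate(["Normans", "Smiths"], "music"))  # -> Smiths
```

A well-constructed Winograd pair (‘feared violence’ vs ‘advocated violence’) is designed precisely so that no such frequency trick settles the question.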

Also, it is not clear how the schematic and force dynamic aspects interact with the encyclopedic aspects. Can you have one without the other? Can we classify the Winograd schema sentences into different types, some of which would be more susceptible to ML approaches?

Do mental representations exist?

There is a school of thought that claims that mental representations do not actually exist. There is nothing like what I described above in the brain. It is actually just a result of perceptual task orientation. This is the ecological approach developed in the study of perception and physical manipulation (such as throwing or catching a ball).

I am always very sceptical of any approach that requires we find some bits of information resembling what we see stored in the brain. Which is why I am quite sympathetic to the notion that there are no actual mental representations directly encoded into the synaptic activations of our brain.

But even if all of these were just surface representations of completely different neural processes, it is undeniable that something like mental representations is necessary to explain, at some level, how we think and speak. At the very least to articulate the problems that have to be solved by machine learning.

Note: I have completely ignored the problem of embodiment which would make things even more complicated. Our bodily experience of the world is definitely involved. But to what extent our bodies are actually a part of the reasoning process (as opposed to the brain being an independent computational control module) is a subject of hot debate.

How does machine learning represent the problem space?

Now, ML experts are not completely wrong to speak about representations. Neural nets certainly build some sort of representation of the problem space (note that I don’t call it the world). We have 4 sources of evidence:

  1. Structure of data inputs: Everything is a vector encoded as a string of numbers.
  2. Patterns of activation in the neural nets (weights): This is where the ‘curve fitting’ happens.
  3. Performance on real world tasks: More reliable than humans on dog breed recognition but penguins can also be identified as pandas.
  4. Adversarial attacks: Adding seemingly random and imperceptible noise to an image or sound can make the network produce radically different outputs (see the sketch after this list).
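As an illustration of point 4, here is a minimal sketch of one well-known way such perturbations are constructed, the fast gradient sign method. It assumes PyTorch, and `model` stands in for any differentiable image classifier; this is an illustrative sketch, not a description of any particular attack discussed above:

```python
# Fast gradient sign method (FGSM): nudge each input value slightly in the
# direction that increases the model's loss on the true label.
import torch
import torch.nn.functional as F


def fgsm_attack(model, image, true_label, epsilon=0.01):
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # The per-pixel perturbation is imperceptible to a human but can flip the prediction.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0, 1).detach()
```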

If we take together the vector inputs and the weights on the nodes in the neural net, we have one level of representation. But that is perhaps the less interesting level, and as complexity increases, it becomes impossible to figure out much about it.

But is it possible that all of this actually creates some intermediate layer that has the same representational properties as mental representations? I would argue that at this stage it is all inputs and weights, and all the representational aspects are provided by the human interpreting the outputs. If we only had the outputs, we could still posit some representational aspects, but the adversarial attacks reveal that this representational level is missing.

Note: Humans can also be subject to adversarial attacks with all sorts of perceptual and cognitive illusions. They seem to be on a different representational level to me but they would be worth exploring further in this context.

Update: A commenter on Reddit suggested that I look at this post on feature visualisation and I think that mostly supports my point. It looks like there are lots of representations shown in that article, but they are really just visualisations of what inputs lead to certain neuron activations on specific layers of the neural net. Those are not ‘representations’ the neural net has independent access to. In the same way, we would not think of Pavlov’s dogs salivating at the sound of the bell as having a ‘mental representation’ of the ‘bell means food’ causal connection. Perhaps we could rephrase this as the question of whether training a neural net is more like classical or operant conditioning, and what that means with respect to the question of representation.

Can we create mental representations in machines?

Judea Pearl thinks that nothing current ML is doing is going to lead to a ‘model of the world’ or, as I call it, ‘mental representations’. But I’m skeptical that his solution is a path to mental representations either:

“The first step, one that will take place in maybe 10 years, is that conceptual models of reality will be programmed by humans.”

This is what the early AI expert systems tried to do but it proved very elusive. One example of manually coding mental representations is FrameNet, a database of words linked to semantic frames, but it barely scratches the surface. For instance, here’s the frame for container which links to suitcase. But that still doesn’t help with the idea of the trophy being sometimes small enough to fit and sometimes too big. I can see how FrameNet could be used on very small subsets of problems but I don’t see a way of scaling it up so that it could take into account everything involved in the examples I mentioned. We are faced with the curse of dimensionality here. The possible combinations just grow too fast for us to compute them.
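To show what a hand-coded frame does and does not give you, here is a hypothetical FrameNet-style entry written as a plain Python dictionary. This is an illustration of the idea only, not FrameNet’s actual data model or frame names:

```python
# A hand-coded, FrameNet-style frame for containment (illustrative only).
containment_frame = {
    "name": "Containing",
    "frame_elements": ["Container", "Contents"],
    "lexical_units": ["contain", "fit", "hold", "suitcase", "box"],
    "definition": "A Container holds Contents within its physical boundaries.",
}

# The frame tells us that a suitcase can fill the Container role and a trophy
# the Contents role, but nothing in it decides whether 'it was too big' refers
# to the trophy or the suitcase; that requires the relative-size schema above.
print(containment_frame["frame_elements"])
```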

I’m also not sure that simply running more data through bigger and bigger RNNs or CNNs will get us there either. But I can’t rule out that brute force won’t get us close enough for it not to matter that mental representations are not involved.

Perhaps, if we label enough text in some subdomain with FrameNet schemas, we could train a neural net on it. But that will only help with the examples where rich knowledge of the world is not required. We can combine a schema of a suitcase and a trophy with that of ‘fit’ and match ‘it’ with the more likely antecedent. Would that approach help with the demonstrators and councilmen? But even if so, the Winograd Schema Challenge is only an artificially constructed set of sentence pairs designed for a particular purpose. The mental representations involved crop up everywhere all the time. So we not only need a way of invoking mental representations but also a way to decide if they are needed and, if so, which ones.

Machine learning fast and slow up the garden path

Let’s imagine that we can somehow engineer a solution that can beat the Winograd Schema Challenge. Would that mean that it has created mental representations? We may want to reach for Searle’s ‘Chinese Room Argument’ and the various responses to it. But I don’t think we need to go that deep.

One big aspect of human intelligence that is often lumped together with the rest is metacognition. This is the ability to bring the process of thinking (or speaking) to conscious awareness and control it (at least to a degree). This is reminiscent of Kahneman’s two systems in ‘Thinking Fast and Slow’.

Machine learning produces almost exclusively ‘fast thinking’ – instantaneous matching of inputs to outputs. It is the great advance over previous expert system models of AI which tried to reproduce slow thinking.

Take for instance the famous Garden path sentences. Compare these 2:

  1. The horse raced past the barn quickly.
  2. The horse raced past the barn fell.

Imagine the mental effort required to pause and retrace your steps when you reach the word ‘fell’ in the second sentence. It is a combination of instantaneous production of mental images that crash and slow deliberate parsing of the sentence to construct a new image that is consistent with our knowledge of the world and the syntactic schema used to generate it.

Up until the advent of stochastic approaches to machine learning in the 1990s (and neural nets in the 2010s), most AI systems tried to reproduce the slow thinking through expert systems encoded as decision trees. But they mostly failed because the slow thinking only works because of the fast thinking which provides the inputs to it. Now neural nets can match complex patterns that we once thought impossible. But they do it very differently from us. There doesn’t seem to be much thinking about how to go about developing the sort of metacognition that is required to combine the two. All of the conditional decision-making around what to do with the outputs of ML algorithms has to be hardcoded. Alexa can recognize my saying ‘turn on bedroom light’, but I had to give the light a name, and if I want to make it part of a more complex process (make sure the bedroom light is off when I leave home), I have to go to IFTTT.
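The division of labour described above looks roughly like this in practice; everything in the sketch (the function names, the intent label, the rules) is a made-up stand-in, not a description of how any real assistant works internally:

```python
# A "fast" learned component wrapped in hand-written "slow" rules.
def recognize_intent(audio):
    """Stand-in for the learned model: maps speech to an intent label."""
    return "turn_on_bedroom_light"


def handle(audio, user_is_home):
    intent = recognize_intent(audio)
    # None of the logic below is learned; every condition has to be spelled out by a human.
    if intent == "turn_on_bedroom_light" and not user_is_home:
        return "do nothing: nobody is home"
    if intent == "turn_on_bedroom_light":
        return "switch on bedroom_light"
    return "unknown intent"


print(handle(audio=None, user_is_home=False))
```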

I don’t see how Pearl’s approach will take us there. But I don’t see an alternative, either. Perhaps, the mental representations will emerge epiphenomenally as the neural nets grow and receive more sophisticated inputs about the spatial nature of the world (rather than converting everything to vectors). Maybe they will be able to generate their own schemas as training inputs. I doubt it, but wouldn’t want to bet against it.

What is just as likely is that we will reach a plateau (maybe even resulting in a new AI winter) that will only see incremental improvements and won’t take the next step until a completely new paradigm emerges (which may not happen for decades if ever).

Conclusion

It is not always obvious that more in-depth knowledge of a domain contributes to a better model of it. We are just as likely to overfit our models as to improve them when we dive too deep. But I think that mental representations at least reveal an important problem domain which should be somehow reflected in what machines are being taught to learn.

Update

In response to a comment on Reddit, I wanted to add the following qualification.

I think I ended up sounding a bit more certain than I feel. I know I’m being speculative but I note that all the critics are pointing at hypotheticals and picking at my definition of mental representation (which is not necessarily unwarranted).

But what I would like to hear is a description of the next 5 specific problems to be solved to get nearer to say 75% on the Winograd Schema Challenge that can then be built on further (ie not just hacking around collocation patterns Watson style).

I also wanted to note that I omitted a whole section on the importance of collocability in language with a reference to Michael Hoey’s work on Lexical Priming, which I think is one of the 2 most important contributions to the study of language in the last 20 years, the other being William Croft’s Radical Construction Grammar. Reading either would be of benefit to many ML researchers, along with Fauconnier and Turner’s The Way We Think.

Not ships in the night: Metaphor and simile as process


In some circles (rhetoric and analytic philosophy come to mind), much is made of the difference between metaphor and simile.

(Rhetoricians pay attention to it because they like taxonomies of communicative devices and analytic philosophers spend time on it because of their commitment to a truth-theoretical account of meaning and naive assumptions about compositionality).

It is true that their surface and communicative differences have an impact in certain contexts but if we’re interested in the conceptual underpinnings of metaphor, we’re more likely to ignore the distinction altogether.

But what’s even more interesting is to think about metaphor and simile as just part of the process of interpersonal meaning construction. Consider this quote from a blog on macroeconomics:

[1a] Think of [1b] the company as a ship. [2] The captain has steered the ship too close to the rocks, and seeing the impending disaster has flown off in the ship’s helicopter and with all the cash he could find. After the boat hit the rocks no lives were lost, but many of the passengers had a terrifying ordeal in the water and many lost possessions, and the crew lost their jobs. [3] Now if this had happened to a real ship you would expect the captain to be in jail stripped of any ill gotten gains. [4] But because this ship is a corporation its captains are free and keep all their salary and bonuses. [5] The Board and auditors which should have done something to correct the ship’s disastrous course also suffer no loss.

Now, this is really a single conceptual creation but it happens in about 5 moves which I highlighted above. (Note: I picked these 5 as an illustrative heuristic but this is not to assume some fixed sequence).

[1] The first move establishes an idea of similarity through a simile. But it is not in the traditional form of ‘X is like Y’. Rather, it starts with the performative ‘Think of’ [1a] and then uses the simile marker ‘as’ [1b]. ‘Think of X as Y’ is a common construction but it is rarely seen as an example in discussions of similes.

[2] This section lays out an understanding of the source domain for the metaphorical projection. It also sets the limit on the projection in that it is talking about ‘company as a ship traveling through water’ in this scenario, not a ship as a metonym for its internal structure (for instance, the similarities in the organisational structure of ships and companies.) This is another very common aspect of metaphor discourse that is mostly ignored. It is commonly deployed as an instrument in the process of what I like to call ‘frame negotiation’. On the surface, this part seems like a narrative with mostly propositional content that could easily stand alone. But…

[3] By saying, ‘if this happened to a real ship’ the author immediately puts the preceding segment into question as an innocent proposition and reveals that it was serving a metaphorical purpose all along. Not that any of the readers were really lulled into a false sense of security, nor that the author was intending some dramatic reveal. But it is an interesting illustration of how the process of constructing analogies contains many parts.

[4] This part looks like a straightforward metaphor: ‘the ship is a corporation’, but it is flipped around (one would expect ‘the corporation is a ship’). This move links [2] and [3] and reminds us of [1].

[5] This last bit seems to refer to both domains at once. ‘The board and the auditors’ to the business case and the ‘ship’s course’ to the narrative in the simile. But we could even more profitably think of it as referring to a new blended domain, a hypothetical model in which the shipping and business characteristics are integrated.

But the story does not end there, even though people who are interested in metaphors often feel that they’ve done enough at this stage (if they ever reach it). My recommended heuristic for metaphor analysts is to always look at what comes next. This is the start of the following paragraph:

To say this reflects everything that is wrong with neoliberalism is I think too imprecise. [1] I also think focusing on the fact that Carillion was a company built around public sector contracts misses the point. (I discussed this aspect in an earlier post.)

If you study metaphor in context, this will not surprise you. The blend is projected into another domain that is in a complex relationship to what precedes and what follows. This is far too conceptually intricate to take apart here but it is of course completely communicatively transparent to the reader and would have required little constructive effort on the part of the author (who is most likely to have spent time on constructing the simile/metaphor and its mappings but little on their embedding into the syntactic and textual weave that give it its intricacy).

In the context of the whole text, this is a local metaphor that plays as much an affective as a cognitive role. It opens up some conceptual spaces but does not structure the whole argument.

The metaphor comes up again later and in this case it also plays the role of an anaphor by linking 2 sections of the text:

Few people would think that never being able to captain a ship again was a sufficient disincentive for the imaginary captain who steered his boat too close to the rocks.

Also of note is the use of the word ‘imaginary’ which puts that statement somewhere between a metaphor (similarity expressed as identity) and simile (similarity expressed as comparison).

There are two lessons here:

  1. The distinction between metaphor and simile could be useful in certain contexts but in practice, their use blends together and it is not always easy to establish boundaries between them. But even if we could, the underlying cognition is the same (even if truth-conditionally they may differ on the surface). We could even complicate things further and introduce terms such as analogy, allegory, or even parable in this context but it is hard to see how much they would help us elucidate what is going on.

  2. Both metaphor and simile are not static components of a larger whole (like bricks in a wall or words in a dictionary). They are surface aspects of a rich and dynamic process of meaning making. And the meaning is ‘literally’ (but not really literally) being made here right in front of our eyes or rather by our eyes. What metaphor and simile (or the sort of hybrid metasimile present here) do is help structure the conceptual spaces (frames) being created but they are not doing it alone. There are also narratives, schemas, propositions, definitions, etc. All of these help fill out the pool of meaning into which we may slowly immerse ourselves or hurtle headlong. This is not easy to see if we only look at metaphor and simile outside their natural habitat of real discourse. Let that be a lesson to us.

How to read ‘Women, Fire and Dangerous Things’: Guide to essential reading on human cognition


Note:

These are rough notes for a metaphor reading group, not a continuous narrative. Any comments, corrections or elaborations are welcome.

Why should you read WFDT?

Women, Fire, and Dangerous Things: What Categories Reveal About the Mind is still a significantly underappreciated and (despite its high citation count) not-enough-read book that has a lot to contribute to thinking about how the mind works.

I think it provides one of the most concise and explicit models for how to think about the mind and language from a cognitive perspective. I also find its argument against the still prevalent approach to language and the mind as essentially fixed objects very compelling.

The thing that has been particularly underused in subsequent scholarship is the concept of ‘ICMs’ or ‘Idealised Cognitive Models’, which not only puts metaphor (the work for which Lakoff is best known) in its rightful context but also outlines what we should look for when we think about things like frames, models, scripts, scenarios, etc. Using this concept would have avoided many undue simplifications in work in the social sciences and humanities.

Why this guide

Unfortunately, the concision and explicitness I extolled above are surrounded by hundreds of pages of arguments and elaborations that are often less well thought out than the central thesis and have been a vector for criticism (I’ve responded to some of these in my review of Verena Haser’s book).

As somebody who translated the whole book into Czech and penned extensive commentary on its relevance to the structuralist linguistic tradition, I have perhaps spent more time with it than most people other than the author and his editors.

Which is why when people ask me whether to read it, I usually recommend an abbreviated tour of the core argument with some selections depending on the individual’s interest.

Here are some of my suggestions.

Chapters everyone should read

Chapters 3, 4, 5, 6 – Core contribution of the book – Fundamental structuring principles of human cognition

These four chapters summarize what I think everybody who thinks about language, mind and society should know about how categories work. Even if it is not necessarily the last word on every (or any) aspect, it should be the starting point for inquiry.

All the key concepts (see below) are outlined here.

Preface and Chapter 1 – Outline of the whole argument and its implications

These brief chapters lay out succinctly and, I think, very clearly the overall argument of the book and its implications. This is where he outlines the core of the critique of objectivism which I think is very important (if itself open to criticism).

Chapter 2: Precursors

This is where he outlines the broader panoply of thinkers and research outcomes in recent intellectual history whose insights this book tries to systematise and take further.

The chapter takes up some of the key thinkers who have been critical of the established paradigm. Read it not necessarily for understanding them but for a way of thinking about their work in the context of this book.

Case studies

The case studies represent a large chunk of the book and few people will read all 3. But I think at least one of them should be part of any reading of the book. Most people will be drawn to number 1 on metaphor but I find that number 2 shows off the key concepts in most depth. It will require some focus and patience from non-linguists but I think is worth the effort.

Case study 3 is perhaps too linguistic (even though it introduces the important concept of constructions) for most non-linguists.

Key concepts

No matter how the book is read, these are the key concepts I think people should walk away understanding.

Idealized Cognitive Models (also called Frames in Lakoff’s later work)

I don’t know of any more systematic treatment of how our conceptual system is structured than this. It is not necessarily the last word but should not be overlooked.

Radial Categories

When people talk about family resemblances they ignore the complexity of the conceptual work that goes into them. Radial categories give a good sense of that depth.

Schemas and rich images

While image schemas are still a bit controversial as actual cognitive constructs, Lakoff’s treatment of them alongside rich images shows the importance of both as heuristics to interpreting cognitive phenomena.

Objectivism vs Basic Realism

Although objectivism (nothing to do with Ayn Rand) is not a position taken by any practicing philosophers and feels a bit straw-manny, I find Lakoff’s outline of it eerily familiar as I read works across the humanities and social sciences, let alone philosophy. When people read the description, they should avoid dismissing it with ‘of course nobody thinks that’ and reflect on how many people approach problems of mind and language as if they did think that.

Prototype effects and basic-level categories

These concepts are not original to Lakoff but are essential to understanding the others.

Role of metaphor and metonymy

Lakoff is best known for his earlier work on metaphor (which is why figurative language is not a key concept in itself) but this book puts metaphor and metonymy in the context of broader cognition.

Embodiment and motivation

Embodiment is an idea thrown around a lot these days. Lakoff’s is an important early contribution that shows some of the actual interaction between embodiment and cognition.

I find it particularly relevant when he talks about how concepts are motivated but not determined by embodied cognition.

Constructions

Lakoff’s work was taking shape alongside Fillmore’s work on construction grammar and Langacker’s on cognitive grammar. While the current construction grammar paradigm is much more influenced by those, I think it is still worth reading Lakoff for his contribution here. Particularly case studies 2 and 3 are great examples of the power of this approach.

Additional chapters of interest

Elaborations of core concepts

Chapters 17 and 18 elaborate on the core concepts in important ways but many people never reach them because they follow a lot of work on philosophical implications.

Chapter 17 on Cognitive Semantics takes another, deeper look at ICMs (idealized cognitive models) across various dimensions.

Chapter 18 deals with the question of how conceptual categories work across languages in the context of relativism. The name of the book is derived from a non-English example but this chapter takes the question of universals and language specificity head on. Perhaps not in the most comprehensive way (the debate on relativism has moved on) but it illuminates the core concepts further.

Case studies

Case Studies 2 and 3 should be of great interest to linguists. Not because they are perfect but because they show the depth of analysis required of even relatively simple concepts.

Philosophical implications

Lakoff is not shy about placing his work in the context of disruption of the reigning philosophical paradigm of his (and to a significant extent our) day. Chapter 11 goes into more depth on how he understands the ‘objectivist paradigm’. It has been criticised for not representing actual philosophical positions (which he explicitly says he’s not doing) but I think it’s representative of many actual philosophical and other treatments of language and cognition.

This is then elaborated in chapters 12–16 and of course in his subsequent book with Mark Johnson, Philosophy in the Flesh. I find the positive argument they’re making compelling but it is let down by staying on the surface of the issues they’re criticising.

What to skip

Where Lakoff (and elsewhere Lakoff and Johnson) most open themselves to criticism is their relatively shallow reading of their opponents. Most philosophers don’t engage with this work because they don’t find it speaks their language and when it does, it is easily dismissed as too light.

While I think that the broad critique this book presents of what it calls ‘objectivist approaches’ is correct, I don’t recommend that anyone takes the details too seriously. Lakoff simultaneously gives it too little and too much attention. He argues against very small details but leaves too many gaps.

This means that those who should be engaging with the very core of the work’s contribution fixate on errors and gaps in his criticism and feel free to dismiss the key aspects of what he has to say (much to their detriment).

For example, his critique of situation semantics leaves too many gaps and leaves him open to successful rejoinders even if he was probably right.

What is missing

While Lakoff engages with cognitive anthropology (and he and Johnson acknowledge their debts in the preface to Metaphors We Live By), he does not reflect the really interesting work in this area. Goffman (shockingly) gets no mention, nor does Victor Turner, whose work on liminality is a pretty important companion.

There’s also little acknowledgement of work on texts, such as that by Halliday and Hasan (although that was arguably still waiting for its greatest impact in the mid-1980s with the appearance of corpora). Lakoff and most of the researchers in this area stay firmly at the level of the clause. But given that my own work mostly focuses on discourse and text-level phenomena, I would say that.

What to read next

Here are some suggestions for where to go next for elaborations of the key concepts or ideas with relevance to those outlined in the book.

  • Moral Politics by Lakoff launched his forays into political work but I think it’s more important as an example of this way of thinking applied for a real purpose. He replaces Idealized Cognitive Models with Frames but shows many great examples of them at work. Even if it falls short as an exhaustive analysis of the issues, it is very important as a methodological contribution showing how frames work in real life. I think of it almost as a fourth case study to this book.
  • The Way We Think by Gilles Fauconnier and Mark Turner provides a model of how cognitive models work ‘online’ during the process of speaking. Although it has made a more direct impact in the field of construction grammar, its importance is still underappreciated outside it. I think of it as an essential companion to the core contribution of this book. Lakoff himself draws on Fauconnier’s earlier work on mental spaces in this book.
  • Work on construction grammar: This book was one of the first places where the notion of ‘construction’ in the sense of ‘construction grammar’ was introduced. It has since developed into its own substantive field of study that has been driven by others. I’d say the work of Adele Goldberg is still the best introduction but for my money William Croft’s ‘Radical Construction Grammar’ is the most important. Taylor’s overview of the related ‘Cognitive Grammar’ is also not a bad next read.
  • Work on cognitive semantics: There is much to read here. Talmy’s massive two volumes of ‘Cognitive Semantics’ are perhaps the most comprehensive but most of the work here happens across various journals. I’m not aware of a single shorter introduction.
  • Philosophy and the Mirror of Nature by Richard Rorty is a book I frankly wish Lakoff had read. Rorty’s taking apart of philosophy’s epistemological imaginings is very much complementary to Lakoff’s critique of ‘objectivism’ but done while engaging deeply with the philosophical issues. While I basically go along with Lakoff’s and later Lakoff and Johnson’s core argument, I can see why it could be more easily dismissed than Rorty. Of course, Rorty’s work is also better known by reputation than deeply reflected in much of today’s philosophy. Lakoff and Johnson’s essential misunderstanding, in Philosophy in the Flesh, of Rorty’s contribution and its fundamental compatibility with their project is an example of why so many don’t take that aspect of this work seriously. (Although they are right that both Rorty and Davidson would have been better served by a less impoverished view of meaning and language.)

What does it mean when words ‘really’ mean something: Dismiss the Miss

Share

A few days ago, I tweeted a link to an article in TES:

Today, I got the following response back:

@lizzielh is absolutely right. As the title of an as yet unpublished blog post of mine goes: “Words don’t mean things, people mean things”. I even wrote a whole book chapter on that with the same title as this post.

Indeed, if it had been me writing on the topic, I would have chosen a more judicious title. Such as “The legacy of discrimination behind the humble Miss” or “Past and present inequalities encoded in the simple Miss”.

In fact, the only reason I tweeted that article in the first place was because it was making a much more subtle and powerful point than simple etymology (as you would expect from one based on the work of the eminent scholar of language and gender Jennifer Coates). Going all the way back to Language and Woman’s Place and even before, people have been trying to pin the blame on simple words. All along the response has been: but these are just words, we don’t mean anything bad by them. Or: these are just words, the real harm is done in the real world.

Many women I meet continue to like the Miss/Mrs distinction despite the long availability of the now destigmatized Ms. It was not too long ago that I set up a sign-up form with only Prof, Dr, Mr and Ms and got lots of complaints from women who wanted to keep their Miss or Mrs. So restigmatizing Miss is actively harmful to the self-image of many women whose identity is tied to that label. Feminists tend to make light of the ‘unfeminist’ cry of “I like it when men open the door to me”, or “Carrying my bag for me just shows respect”. Or, going back even further, “I don’t need a vote, I exercise my influence through my husband.” But change is hard, it takes time and effort, so an attempt at making the world better will always make it temporarily worse (at least for some people).

The fact is that Miss is bound in a network of meanings, interactions and power relations. And even if it takes some mental pain, it’s worth picking at all it covers up.

But not every minute of every day. Sometimes, we need to say something to get from conversational point A to conversational point B and even a laden word may be better than no word. As one of the respondents in the article says:

My response is always that my name isn’t Miss; it’s Mrs Coslett. But if I’m in a school where students don’t know me and they call me Miss, I’m fine with that. They’re showing respect by giving me a title, rather than ‘hey’ or ‘oi, you’ or whatever.

Most of the time that contentious words are used, challenging them is not feasible. But she’s wrong in her conclusion:

That’s just the way the English language works.

That’s absolutely not true. Just like words don’t mean anything on their own, language does not just work. It’s used to do things (to riff on Austin’s famous book) by people. It is not always used purposefully but its use is always bound in the many ways and means of people. The way we speak now is a result of centuries of little power plays, imitations of prestige, prescriptions of obedience. When you look closer, they’re all easy to see.

Things have let up considerably since the 1970s. Many fewer people are concerned about how language encodes gender inequality, so it’s worthwhile reminding ourselves that many of the historical unfairnesses hidden in word histories are still with us. Just as you can’t get away with saying “I didn’t mean anything by the ‘n’ word”, you can’t just shrug off the critique of the complex tapestry of gender bias in ‘Miss’.

Miss does not “really mean” anything. It’s just a sequence of letters or sounds. And most people using it do not “really mean” anything by it. Or it does not “really mean” anything to them. But context is everything.

It is a truism to say that racism will be done away with when people don’t dislike each other because of the color of their skin. But the opposite is the case. The sign that racism has disappeared is when I can say “I really don’t like black people” simply because I don’t like the color of their skin in the same way I may prefer redheads to blondes. Preference for skin colour is then just a harmless quirk. But we’re centuries away from that because any such preference is tied to a system of discrimination going back a long way.  (BTW: just to avoid misunderstanding, I personally find black skin beautiful.)

The same thing applies to “Miss”, we can’t just turn our back on its pernicious potential. Most of the time it’s hidden from sight but it’s recoverable at a moment’s notice. Because we live in a world where male is still the default position. We have to work to change that. Change our minds, hearts, cognitions and languages. They don’t  just work on their own. We make them work. So let’s make them work for us. The ‘us’ we want to be, rather than the ‘us’ we used to be in the bad old days.

Photo Credit: abdallahh via Compfight cc

What is not a metaphor: Modelling the world through language, thought, science, or action

Share

The role of metaphor in science debate (Background)

Recently, the LSE podcast put out an interesting panel discussion on the subject of “Metaphors and Science”. It featured three speakers talking about the interface between metaphor and various ‘scientific’ disciplines (economics, physics and surgery). Unlike on many such occasions, all the speakers were actually very knowledgeable and thoughtful on the subject.

In particular, I liked Felicity Mellor and Richard Bronk who adopted the same perspective that underlies this blog and which I most recently articulated in writing about obliging metaphors. Felicity Mellor put it especially eloquently when she said:

“Metaphor allows us to speak the truth by saying something that is wrong. That means it can be creatively liberating but it can also be surreptitiously coercive.”

This dual nature of coerciveness and liberation was echoed throughout the discussion by all three speakers. But they also shared the view of the ubiquity of metaphor, which is what this post is about.

What is not a metaphor? The question!

The moderator of the discussion was much more stereotypically ambivalent about such an expansive attitude toward metaphor and challenged the speakers with the question of ‘what is the opposite of metaphor’ or ‘what is not a metaphor’. He elicited suggestions from the audience, who came up with this list:

model, diagram, definition, truths, math, experience, facts, logic, the object, denotation

The interesting thing is that most of the items on this list are in fact metaphorical in nature. Most certainly models, diagrams and definitions (more on these in future posts). But mathematics and logic are also deeply metaphorical (both in their application and internally; e.g. the whole logico-mathematical concept of proof is profoundly metaphorical).

Things get a bit more problematic with things like truth, fact, denotation and the object. All of those seem to be pointing at something that is intuitively unmetaphorical. But it doesn’t take a lot of effort to see that ‘something metaphorical’ is going on there as well. When we assign a label (denotation) to, for instance, ‘chair’ or ‘coast’ or ‘truth’, we automatically trigger an entire cognitive armoury for dealing with things that exist and have certain properties. But it is clear that ‘chair’, ‘coast’ and ‘metaphor’ are not the same kind of thing at all. Yet we can start treating them the same way because they are all labels. So we start asking for the location, shape or definition of metaphor, just because we assigned it a label, in the same way we can ask for the same things about a chair or a coast. We want to take a measure of it, but this is much easier with a chair than with a coast (thus the famous fractal puzzle about the length of the coast of Britain). But chairs are not particularly easy to nail down (metaphorically, of course) either, as I discussed in my post on clichés and metaphors.

Brute facts of tiny ontology

So what is the thing that is not a metaphor? Both Bronk and Mellor suggested the “brute fact”. A position George Lakoff called basic realism and I’ve recently come to think of as tiny ontology. The idea, as expressed by Mellor and Bronk in this discussion, is that there’s a real world out there which impinges upon our bodily existence but with which we can only interact through the lens of our cognition which is profoundly metaphorical.

But ultimately, this does not give us a very useful answer. Either everything is a metaphor, so we might as well stop talking about it, or there is something that is not a metaphor. In which case, let’s have a look. Tiny ontology does not give us the solution because we can only access it through the filter of our cognition (which does not mean consciously or through some wilful interpretation). So the real question is, are there some aspects of our cognition that are not metaphorical?

Metaphor as model (or What is metaphor)

The solution lies in the revelation hinted at above that labels are in themselves metaphors. The act of labelling is metaphorical, or rather, it triggers the domain of objects. What do I mean by that? Well, first let’s have a look at how metaphor actually works. I find it interesting that nobody during the entire discussion tried to address that question beyond the usual ‘using something to talk about something else’. Here’s my potted summary of how metaphor works (see more details in the About section).

Metaphor is a process of projecting one conceptual domain onto another. All of our cognition involves this process of conceptual integration (or blending). This integration is fluid, fuzzy and partial.

  • In language, this domain mapping is revealed through the process of deixis, attribution, predication, definition, comparison, etc. Sometimes it is made explicit by figurative language. Figurative language spans the scale of overt to covert and has a conceptual, communicative and textual dimension (see my classification of metaphor use).
  • In cognition, this process of conceptual integration is involved in identification, discrimination and manipulation. All of these can be more or less overtly analogical.
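To make the idea of partial projection a little more concrete, here is a toy sketch in Python. It is not any established formalism, just an illustration of the point that only some elements of the source domain get mapped onto the target and that the result is a new, blended structure; all the names (JOURNEY, LOVE, project) are invented for the example, using the classic LOVE IS A JOURNEY mapping.

```python
# Toy illustration only: metaphoric projection as a *partial* mapping between
# two conceptual domains. All names here are invented for the example.

JOURNEY = {
    "travellers": "people moving together",
    "path": "the route taken",
    "crossroads": "a point where a choice must be made",
    "asphalt": "the surface travelled on",
}

LOVE = {
    "lovers": "the people in the relationship",
    "course of the relationship": "how things are going",
    "moment of decision": "a relationship crisis",
}

# Only some source elements project onto the target; 'asphalt' is left out
# on purpose, because not every element of the source domain carries over.
MAPPING = {
    "travellers": "lovers",
    "path": "course of the relationship",
    "crossroads": "moment of decision",
}

def project(source, target, mapping):
    """Return a blend: the target structured by the mapped parts of the source."""
    blend = dict(target)
    for src, tgt in mapping.items():
        blend[tgt] = f"{target[tgt]} (seen as '{src}': {source[src]})"
    return blend

for slot, value in project(JOURNEY, LOVE, MAPPING).items():
    print(f"{slot}: {value}")
```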

So all of this is just a long way of saying that metaphor is a metaphor for a complicated process which is largely unconscious but not uncommonly conscious. In fact, in my research, I no longer use the term ‘metaphor’ because it misleads more than it helps. There’s simply too much baggage from what is just the overt textual manifestation of metaphor – the sort of ‘common sense’ understanding of metaphor. However, this common-sense, ordinary understanding of ‘metaphor’ makes using the word a useful shortcut in communication with people who don’t have much of a background in this way of thinking. But when we think about the issue more deeply, it becomes a hindrance because of all the different types of uses of metaphor I described here (a replay of the dual liberating and coercive nature of metaphor mentioned above – we don’t get to escape our cognition just because we’re talking about metaphors).

In my work, I use the term frame, which is just a label for a sort of conceptual model (originally suggested by Lakoff as Idealized Cognitive Model). But I think in this context the term ‘model’ is a bit more revealing about what is going on.

So we can say that every time we engage conceptually with our experience, we are engaging in an act of modelling (or framing). Even when I call something ‘true’, I am applying a certain model (frame) that will engage certain heuristics (e.g. asking for confirmation or evidence). Equally, if I say something like ‘education is business’, I am applying a certain model that will allow me to talk about things like achieving economies of scale or meeting consumer demand but will make it much harder to talk about ethics and personal growth. That doesn’t mean that I cannot apply more than one model, a combination of models, or build new models from old ones. (‘Computer virus’ is a famous example, but ‘natural law’ is another one. Again, more on this in later posts.)

Action as an example of modelling

An audience member asked during the discussion whether we can experience the world directly (not mediated by metaphoric cognition). The answer is yes, but even this kind of experience involves modelling. When I walk along a path, I automatically turn to avoid objects – that is, I’m modelling their solid and interactive nature. Even when I’m lying still, free of all thought and just letting the warmth of the shining sun wash over me, I’m still applying a model of my position in the world in a particular way. That is, my body is not activating my ears to hear the sun’s rays, nor is it perceiving the bacteria going about their business in my stomach. A snake, a polar bear or a fish would each model that situation in a different way.

This may seem like an unnecessary extension of the notion of a model. (But it echoes the position of the third speaker, Roger Kneebone, who was talking about metaphor as part of the practice of surgery.) It is not particularly crucial to our understanding of metaphor, but I think it’s important to divert us from a certain kind of perceptual mysticism in which many people unhappy with the limitations of their cognitive models engage. The point is that not all of our existence is necessarily conceptual, but all of it models our interaction with the world and switches between different models as appropriate. For example, my body applies different models of the world when I’m stepping down from a step onto solid ground or stepping into a pool of water.

The languages of metaphor: Or how a metaphor do

I am aware that this is all very dense and requires a lot more elaboration (well, that’s why I’m writing a blog, after all). But I’d like to conclude with a warning that the language used for talking about metaphor brings with it certain models of thinking about the world which can be very confusing if we don’t let go of them in time. Just the fact that we’re using words is a problem. When words are isolated (for instance, in a dictionary or at the end of the phrase ‘What is a…’) it only seems natural that they should have a definition. We have a word “metaphor” and it would seem that it needs to have some clear meaning. The kind of thing we’re used to seeing on the right-hand side of dictionaries. But insisting that a dictionary-like definition is what must be at the end of the journey is to misunderstand what we’ve seen along the way.

There are many contexts in which the construction “metaphor is…” is not only helpful but also necessary. For example, when clarifying one’s use: “In this blog, what I mean by metaphor is much broader than what traditional students of figurative language meant by it.” But in the context of trying to get at what’s going on in the space that we intuitively describe as metaphorical, we should almost be looking for something along the lines of “metaphor does” or “metaphor feels like”. Or perhaps refrain from the construction “metaphor + verb” altogether and just admit that we’re operating in a kind of metaphor-tasting soup. We can get at the meaning/definition by disciplined exploration and conversation.

In conclusion, metaphor is a very useful model when thinking about cognition, but it soon fails us, so we can replace it with more complex models, like that of a model. We are then left with the rather unsatisfactory notion of a metaphor of metaphor or a model of model. The sort of dissatisfaction that led Derrida and his like to the heights of obscurity. I think we can probably just about avoid deconstructionist obscurantism but only if we adopt one of its most powerful tools, the fleeting sidelong glance (itself a metaphor/model). Just like the Necker cube, this life on the edge of metaphor is constantly shifting before our eyes. Never quite available to us perceptually all at once but readily apprehended by us in its totality. At once opaque and so, so crystal clear. Rejoice, all you parents of freshly screaming thoughts. It’s a metaphor!
Photo Credit: @Doug88888 via Compfight cc

Linguistics according to Fillmore

Share

While people keep banging on about Chomsky as being the be-all and end-all of linguistics (I’m looking at you, philosophers of language), there have been many linguists who have had a much more substantial impact on how we actually think about language in a way that matters. In my post on why Chomsky is not really a linguist at all, I listed a few.

Sadly, one of these linguists died yesterday. It was Charles J. Fillmore, who was a towering figure among linguists without ever writing a single book. In my mind, he changed the face of linguistics three times with just three articles (one of them co-authored). Obviously, he wrote many more, but compared to his massive impact, his output was relatively modest. His ideas have been with me all through my life as a linguist and, on reflection, they form the foundation of what I know language to be. Therefore, this is not so much an obituary (for which I’m hardly the most qualified person out there) as a manifesto for a linguistics of a truly human language.

The case for Fillmore

The first article, more of a slim monograph at 80-odd pages, was Case for Case (which, for some reason, I first read in Russian translation). Published in 1968, it was one of the first efforts to find deeper functional connections in generative grammar (following on his earlier work with transformations). If you’ve studied Chomskean Government and Binding, this is where thematic roles essentially come from. I only started studying linguistics in 1991, by which time Case for Case was already considered a classic, particularly in Prague where function was so important. But even after all those years, it is still worth reading for any minimalist out there. Unlike so many in today’s divided world, Fillmore engaged with the whole universe of linguistics, citing Halliday, Tesnière, Jakobson, Whorf, Jespersen, and others while giving an excellent overview of the treatment of case by different theories and theorists. But the engagement went even deeper: the whole notion of ‘case’ as one “base component of the grammar of every language” brought so much traditional grammar back into contact with a linguistics that was speeding away from all that came before at a rate of knots.

From today’s perspective, its emphasis on deep and surface structures, as well as its relatively impoverished semantics, may seem a bit dated, but it represents an engagement with language used to express real meaning. The thinking that went into deep cases transformed into what has become known as Frame Semantics (“I thought of each case frame as characterizing a small abstract ‘scene’ or ‘situation’, so that to understand the semantic structure of the verb it was necessary to understand the properties of such schematized scenes” [1982]), which is where things really get interesting.

Fillmore in the frame

When I think about frame semantics, I always go to his 1982 article Frame Semantics, published in the charmingly named conference proceedings ‘Linguistics in the Morning Calm’, but the idea had its first outing in 1976. George Lakoff used it as one of the key inspirations for his idealized cognitive models in Women, Fire, and Dangerous Things, which is where this site can trace its roots. As I have said before, I essentially think about metaphors as a special kind of frame.

In it, he says:

By the term ‘frame’ I have in mind any system of concepts related in such a way that to understand any one of them you have to understand the whole structure in which it fits; when one of the things in such a structure is introduced into a text, or into a conversation, all of the others are automatically made available. I intend the word ‘frame’ as used here to be a general cover term for the set of concepts variously known, in the literature on natural language understanding, as ‘schema’, ‘script’, ‘scenario’, ‘ideational scaffolding’, ‘cognitive model’, or ‘folk theory’.

It is a bit of a mouthful, but it captures in a paragraph the absolute fundamentals of the semantics of human language, as opposed to the projection of the rules of formal logic and truth conditions onto an impoverished version of language that all the generative-inspired approaches attempt. It also brings together many other concepts from different fields of scholarship. Last year I presented a paper on the power of the concept of frame in which I found even more terms that have a close affinity to it, which only underscores the far-reaching consequences of Fillmore’s insight.

As I was looking for some more quotes from that article, I realized that I’d have to pretty much cut and paste the whole of it. Almost every sentence there is pure gold. Rereading it now after many, many years, it’s becoming clear how many things from it I’ve internalized (and, frankly, how many of its ideas I’ve reinvented, having forgotten they were there).

Constructing Fillmore

About the same time, and merging the two earlier insights, Fillmore started working on the principles that have come to be known as construction grammar. Although by then the ideas were some years old, I always think of his 1988 article with Paul Kay and Mary Catherine O’Connor as a proper construction grammar manifesto. In it they say:

The overarching claim is that the proper units of a grammar are more similar to the notion of construction in traditional and pedagogical grammars than to that of rule in most versions of generative grammar.

Constructions, according to Fillmore, have these properties:

  1. They are not limited to the constituents of a single syntactic tree. Meaning, they span what has been considered as the building blocks of language.
  2. They specify at the same time syntactic, lexical, semantic and pragmatic information.

  3. Lexical items can also be viewed as constructions (this is absolutely earth-shattering and I don’t think linguistics has come to grips with it yet).

  4. They are idiomatic. That is, their meaning is not built up from their constituent parts.

Although Lakoff’s study of ‘there constructions’ in Women, Fire, and Dangerous Things came out a year earlier (and is still essential reading), I prefer Fillmore as an introduction to the subject (if only because I never had to translate it).
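To make properties 2 and 3 above a bit more tangible, here is a made-up, minimal sketch of a construction as a single object pairing a schematic form with semantic and pragmatic information. This is emphatically not Fillmore and Kay’s actual formalism; the field names and the example sentence are my own invention.

```python
# A made-up, minimal representation of a construction as a form-meaning pairing.
# This is an illustration, not Fillmore & Kay's formalism.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Construction:
    name: str
    form: str          # schematic pattern, potentially spanning several words
    semantics: str     # what the pattern itself contributes to meaning
    pragmatics: str    # conditions on its use in context
    examples: List[str] = field(default_factory=list)

let_alone = Construction(
    name="let alone",
    form="X NEG-VP A, let alone B",
    semantics="B is an even stronger or less likely case than A on some scale",
    pragmatics="the speaker dismisses B by denying the weaker case A",
    examples=["He can't boil an egg, let alone cook a three-course dinner."],
)

# Property 3: even a single lexical item can be treated as a (small) construction.
kick = Construction(
    name="kick",
    form="kick",
    semantics="strike with the foot",
    pragmatics="no special conditions",
)

print(let_alone.form, "->", let_alone.semantics)
```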

The beauty of construction grammar (just as the beauty of frame semantics) is that it can bridge much of the modern thinking about language with the grammatical insights and intuitions of generations of researchers from across many schools of thought. But I am genuinely inspired by its commitment to language as a whole, expressed in the 1999 article by Fillmore and Kay:

To adopt a constructional approach is to undertake a commitment in principle to account for the entirety of each language. This means that the relatively general patterns of the language, such as the one licensing the ordering of a finite auxiliary verb before its subject in English as illustrated in 1, and the more idiomatic patterns, such as those exemplified in 2, stand on an equal footing as data for which the grammar  must provide an account.

(1) a. What have you done?  b. Never will I leave you. c. So will she. d. Long may you prosper! e. Had I known, . . . f. Am I tired! g. . . . as were the others h. Thus did the hen reward Beecher.

(2) a. by and large b. [to] have a field day c. [to] have to hand it to [someone]  d. (*A/*The) Fool that I was, . . . e. in x’s own right

Given such a commitment, the construction grammarian is required to develop an explicit system of representation, capable of encoding economically and without loss of generalization all the constructions (or patterns) of the language, from the most idiomatic to the most general.

Notice that they don’t just say ‘language’ but ‘each language’. Both of those articles give ample examples of how constructions work and what they do and I commend them to your linguistic enjoyment.

Ultimately, I do not subscribe to the exact version of construction grammar that Fillmore and Kay propose, agreeing with William Croft that it is still too beholden to the formalist tradition of the generative era, but there is something to learn on every page of everything Fillmore wrote.

Once more with meaning: the FrameNet years

Both frame semantics and construction grammar impacted Fillmore’s work in lexicography with Sue Atkins and culminated in FrameNet, a machine-readable frame-semantic dictionary providing a model for a semantic module to a construction grammar. To make the story complete, we can even see FrameNet as a culmination of the research project begun in Case for Case, which was the development of a “valence dictionary” (as he summarized it in 1982). While FrameNet is much more than that and has very much abandoned the claim to universal deep structures, it can be seen as accomplishing the mission of describing language with meaning that Fillmore set out on in the 1960s.
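Because FrameNet is machine readable, a frame, its frame elements and the lexical units that evoke it can all be queried programmatically. As a rough illustration, this is what such a query looks like through NLTK’s FrameNet corpus reader (I am going from memory of the FrameNet 1.7 data, so the exact frame and element names may differ slightly):

```python
# Querying FrameNet through NLTK's corpus reader.
# One-off setup: pip install nltk; then nltk.download('framenet_v17')

from nltk.corpus import framenet as fn

# Search frames whose names match a pattern (regular expression).
for frame in fn.frames(r'(?i)commerce_buy'):
    print(frame.name)

# Look at one frame in detail: its definition, frame elements and lexical units.
buy = fn.frame('Commerce_buy')
print(buy.definition)
print(sorted(buy.FE.keys()))        # frame elements, e.g. Buyer, Seller, Goods
print(sorted(buy.lexUnit.keys()))   # lexical units that evoke the frame, e.g. buy.v
```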

Remembering Fillmore

I only met Fillmore once when he came to lecture at a summer school in Prague almost twenty years ago. I enjoyed his lectures but was really too star struck to take advantage of the opportunity. But I saw enough of him to understand why he is remembered with deep affection and admiration by all of his colleagues and students whose ranks form a veritable who’s who of linguists to pay attention to.

In my earlier post, I compared him in stature and importance to Roman Jakobson (even if Jakobson’s crazily voluminous output across four languages dwarfs Fillmore’s – and almost everyone else’s). Fillmore was more than a linguist’s linguist; he was a linguist who mattered (and matters) to anyone who wanted (and wants) to understand how language works beyond a few minimalist soundbites. Sadly, it is possible to meet graduates with linguistics degrees who have never heard of Jakobson or Fillmore, while it’s almost impossible to meet someone, even someone who knows nothing about language, who has not heard of Chomsky. But I have no doubt that in the decades of language scholarship to come, it will be Fillmore and his ideas that will be the foundation upon which the edifice of linguistics will rest. May he rest in peace.

Post Script

I am far from being an expert on Fillmore’s work and life. This post reflects my personal perspective and lessons I’ve learned rather than a comprehensive or objective reference work. I may have been rather free with the narrative arc of his work. Please be free with corrections and clarifications. Language Log reposted a more complete profile of his life.

References

  • Fillmore, C., 1968. The Case for Case. In E. Bach & R. Harms, eds. Universals in Linguistic Theory. New York: Holt, Rinehart and Winston, pp. 1–88. Available at: http://pdf.thepdfportal.com/PDFFiles/123480.pdf [Accessed February 15, 2014].
  • Fillmore, C.J., 1976. Frame Semantics and the nature of language. Annals of the New York Academy of Sciences, 280 (Origins and Evolution of Language and Speech), pp.20–32.
  • Fillmore, C., 1982. Frame Semantics. In The Linguistic Society of Korea, ed. Linguistics in the Morning Calm: International Conference on Linguistics: Selected Papers. Seoul, Korea: Hanshin Pub. Co., pp. 111–139.
  • Fillmore, C.J., Kay, P. & O’Connor, M.C., 1988. Regularity and Idiomaticity in Grammatical Constructions: The Case of Let Alone. Language, 64(3), pp.501–538.
  • Kay, P. & Fillmore, C.J., 1999. Grammatical constructions and linguistic generalizations: the What’s X doing Y? construction. Language, 75(1), pp.1–33.

Binders full of women with mighty pens: What is metonymy

Share

Metonymy in the wild

""Things were not going well for Mitt Romney in early autumn of last year. And then he responded to a query about gender equality with this sentence:

“I had the chance to pull together a cabinet, and all the applicants seemed to be men… I went to a number of women’s groups and said, ‘Can you help us find folks?’ and they brought us whole binders full of women.” http://en.wikipedia.org/wiki/Binders_full_of_women

This became a very funny meme that stuck around for weeks. The reason for the longevity was the importance of women’s issues and the image of Romney himself, not the phrase itself. What it showed, or rather confirmed, was that journalists who in the same breath bemoan the quality of language education are completely ignorant about issues related to language, saying things like:

In fairness, “binders” was most likely a slip of the tongue. http://edition.cnn.com/2012/10/17/opinion/cardona-binders-women/index.html

The answer to this is NO. This was not some ‘Freudian slip of the tongue’, nor was it an inelegant phrase. It was simply a perfectly straightforward use of metonymy. Something we use and hear used probably a dozen times every day without remarking on it (or mostly so – see below).

What is metonymy

Metonymy is a figure of speech where something stands for something else because it has a connection to it. This connection can be physical, where a part of something can stand for a whole and a whole can stand for one of its parts.

  • Part for a whole: In ‘I got myself some new wheels’, ‘wheels’ stands in for ‘car’.
  • Whole for a part: In ‘My bicycle got a puncture’, ‘bicycle’ stands for the ‘tyre’, which is a part of it.

But the part/whole relationship does not have to be physical. Something can be a part of a process, an idea, or a configuration. The part/whole relationship can also be a membership or a cause-and-effect link. There are some subdomains around which whole sets of conventional metonymies congregate: tools often stand for jobs, linguistic units can stand for their uses, and materials can stand for things made from them. Some examples of these are:

  • Membership for members: “The chess club sends best wishes.” < the ‘chess club’ stands for its members
  • Leader for the led: “The president invaded another country.” < the ‘president’ stands for the army
  • Tool for person: “hired gun” < the tool stands for the person
  • Linguistic units for uses: “no more ifs and buts” < ‘if’ and ‘but’ stand for the types of utterances they introduce
  • End of a process for the process: “the house is progressing nicely” < the ‘house’ is the end product of a process which stands for the process as a whole
  • Tool/position for job: “chairperson” < the ‘chair’ stands for the role of somebody who sits on it
  • Body part for use: “lend a hand” < the ‘hand’ stands for the part of the process where hands are used
  • City for inhabitants: “Detroit doesn’t like this” < the city of ‘Detroit’ stands for the people and industries associated with the city
  • Material for object made from material: “he buried 6 inches of steel in his belly” < the ‘steel’ stands for a sword, just as in “he filled him full of lead”, the ‘lead’ stands for bullets

Metonymy chaining

Metonymies often occur in chains. A famous example by Michael Reddy is

“You’ll find better ideas than that in the library.”

where ideas are expressed in words, printed on pages, bound in books, stored in libraries.

In fact, ‘binders full of women’ is an example of a metonymic chain where women stand for their profiles, which are written on pages contained in binders.

It has been argued that these chains illustrate the very nature of metonymic inference. (See more below in the section on reasoning.) In fact, it is not unreasonable to say that most metonymy contains some level of chaining or potential chaining. Perhaps not in cases of direct parts, like ‘wheels’ standing for ‘cars’, but in the less concrete types, like ‘hands’ standing for help or ‘president’ for the invading army, there is some level of chaining involved.
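A toy sketch of what chaining amounts to: follow a series of ‘stands for’ links until you reach the intended referent. The chain below is Reddy’s library example from above; the data structure and function names are invented for illustration.

```python
# Toy illustration of metonymic chaining: each item gives access to the next
# one along a chain of 'stands for' links. Data and names invented.

STANDS_FOR = {
    "library": "books",    # books are stored in libraries
    "books": "pages",      # pages are bound into books
    "pages": "words",      # words are printed on pages
    "words": "ideas",      # ideas are expressed in words
}

def resolve_chain(expression):
    """Follow the metonymic links from an expression to its final target."""
    chain = [expression]
    while chain[-1] in STANDS_FOR:
        chain.append(STANDS_FOR[chain[-1]])
    return chain

# "You'll find better ideas than that in the library."
print(" -> ".join(resolve_chain("library")))
# library -> books -> pages -> words -> ideas
```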

Metonymy vs. synecdoche

Metonymy is a term which is part of a long-standing classification of rhetorical tropes. The one term from this classification that metonymy is most closely associated with is synecdoche. In fact, what used to be called synecdoche is now simply subsumed under metonymy by most people who write about it.

The distinction is:
  • Synecdoche describes a part standing for a whole (traditionally called pars pro toto), as in ‘The king built a cathedral’, or a whole standing for a part (traditionally called totum pro parte), as in ‘Poland votes no’.
  • Metonymy describes a connection based on a non-part association such as containment, cause and effect, etc. (see above for a variety of examples).

While this distinction is not very hard to determine in most cases, it is not particularly useful and most people won’t be aware of it. In fact, I was taught that synecdoche was pars pro toto and metonymy was totum pro parte and that all the other uses were extensions of these types. This makes just as much sense as any other division but doesn’t seem to be the way the ancients looked at it.

Metaphor vs. metonymy

More commonly and perhaps more usefully, metonymy is contrasted with metaphor. In fact, ‘metaphor/metonymy’ is one of the key oppositions made in studies of figurative language.

People studying these tropes in the Lakoff and Johnson tradition will say something along the lines of: metonymy relies on contiguity, whereas metaphor relies on similarity.

So for example:

  • “You’re such a kiss ass” is a metaphor because ‘kissing ass’ signifies a certain kind of behavior, but the body part is not involved, while
  • “I got this other car on my ass” is a metonymy because ‘ass’ stands for everything that’s behind you.

Or:

  • “All men are pigs” is a metaphor because we ascribe the bad qualities of pigs to men, but
  • “This is our pig man” is a metonymy because ‘pig’ is part of the man’s work.

Some people (like George Lakoff himself) maintain that the distinction between metaphor and metonymy represents a crucial divide. Lakoff ranks metonymic connections alongside metaphoric ones as the key figurative structuring principles of conceptual frames (together with propositions and image schemas). But I think that there is evidence to show that they play a similar role in figurative language and in language in general. For example, we could add a third sentence to our ‘ass’ opposition, such as ‘she kicked his ass’, which could be either metonymic (when actual kicking occurred but only some of it involved the buttocks) or metaphoric (if no kicking at all took place). But even then the metaphor relies on an underlying metonymy.

When we think of metaphor as a special instance of domain mapping (or conceptual blending, as I do on this blog), we see that very similar connections are being made in both. Very often both metaphor and metonymy are involved in the same figurative process. There is also often a component of social convention where some types of connections are more likely to be made.

For example, in “the pen is mightier than the sword”, the connections of ‘pen’ to writing and ‘sword’ to war or physical enforcement are often given as an example of metonymy. But the imagery is much richer than that. In order to understand this phrase, we need to compare two scenarios (one with the effects of writing and one with the effects of fighting), which is exactly what happens in the conceptualisation taking place in metaphors and analogies. These two processes are not just part of a chain but seem to happen all at once.

Another example is ‘enquiring minds want to know’ the labeling of which was the subject of a recent debate. We know that minds often metonymically stand for thinkers as in ‘we have a lot of sharp minds in this class’. But when we hear of ‘minds’ doing something, we think of metaphor. This is not all that implausible because ‘my mind has a mind of its own’ is out there: http://youtu.be/SdUZe2BddHo. But this figure of speech obviously relies on both conceptualisations at once (at least in the way some people will construe it).

Metonymy and meronymy

One confusion I’ve noticed is putting metonymy in opposition to meronymy. However, the term ‘meronymy’ has nothing to do with the universe of figurative language. It is simply a term used to label the meaning of a word in relation to another word, where one of these words denotes a whole and the other one of its parts. So ‘wheel’ is a meronym of ‘car’ and ‘bike’, but calling a nice car ‘sharp wheels’ is synecdoche, not meronymy, as this post http://wuglife.tumblr.com/post/68572697017/metonymy-or-meronymy erroneously claims.

Meronyms, together with hyponyms and hyperonyms, are simply terms that describe semantic relationships between words. You could say that synecdoche relies on the meronymic (or holonymic) relationship between words, or that it uses meronyms for reference.

It doesn’t make much difference for the overall understanding of the issues but perhaps worth clarifying.
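For anyone who wants to see these lexical relations in the wild, WordNet (via NLTK) encodes meronymy, hypernymy and hyponymy directly; the snippet below is a quick sketch of that, assuming the WordNet data has been downloaded and that the synset names are as I remember them.

```python
# Meronymy, hypernymy and hyponymy as plain lexical relations in WordNet.
# One-off setup: pip install nltk; then nltk.download('wordnet')

from nltk.corpus import wordnet as wn

car = wn.synset('car.n.01')

print([s.name() for s in car.part_meronyms()][:5])   # parts of a car (wheels, doors, ...)
print([s.name() for s in car.hypernyms()])           # what a car is a kind of
print([s.name() for s in car.hyponyms()][:5])        # kinds of car
```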

William Croft also claims that meronymy is the only constituent relationship in his radical construction grammar (something which I have a lot of time for but not something hugely relevant to this discussion).

Metonymic imagery

Compared to metaphor, metonymy is often seen as the more pedestrian figure of speech. But as we saw in the reactions to Romney’s ‘binders of women’, this is not necessarily the case:

he managed to conjure an image confirming every feminist’s worst fears about a Romney presidency; that he views women’s rights in the workplace as so much business admin, to be punched and filed and popped on a shelf http://www.theguardian.com/world/shortcuts/2012/oct/17/binders-full-of-women-romneys-four-words

The meme that sprang up around it consisted mostly of people illustrating this image, many of which can be found on http://bindersfullofwomen.tumblr.com (see one such image above).

This is not uncommon in the deconstructions and hypostatic debates about metonymies. ‘Pen is mightier than the sword’ is often objected to on the basis that somebody with a sword will always prevail over somebody with a pen. People will also often critique the ’cause of’ relationships, as in ‘the king did not erect this tower, all the hard-working builders did’. Another example could be all the gruesome jokes about ‘lending a hand’ or ‘asking for a hand in marriage’. I still remember a comedy routine from my youth which included the sentence, “The autopsy was successful, the doctor came over to me extending a hand…for me to take to the trash.”

But there is a big difference in how the imagery works in metonymy and metaphor. Most of the time we don’t notice it. But when we become aware of the rich evocative images that make a metaphor work, we think of the metaphor as working and those images illustrate the relationship between the two domains. But when we become aware of the images that are contained in a metonymy (as in the examples above), we are witnessing a failure of the metonymy. It stops doing its job as a trope and starts being perceived as somehow inappropriate usage. But metaphor thus revealed typically does its job even better (though not in all cases as I’ve often illustrated on this blog).

Reasoning with metonymy

Much has been written about metaphoric reasoning (sometimes in the guise of ‘analogic reasoning’) but connection is just as important a part of reasoning as similarity is.

Much of sympathetic magic requires both connection and similarity. So the ‘voodoo doll’ is shaped like a person but is connected to them by their hair, skin, or an item belonging to them.

But reasoning by connection is all around us. For instance, in science, the relationship of containment is crucial to classification and much of logic. Also, the question of sets being parts of sets, which has spurred so much mathematical reasoning, has both metaphoric and metonymic dimensions.

But we also reason by metonymy in daily life when we pay homage to the flag or call on the president to do something about the economy. Sometimes we understand something metonymically by compression, as when we equate the success of a company with the success of its CEO. Sometimes we use metonymy to elaborate, as when we say something like ‘12 hard-working pistons brought the train home’.

Metonymy is also involved in the process of exemplars and paragons. While the ultimate conceptualization is metaphoric, we also ask that the exemplar have some real connection. Journalists engage in the process of metonymy when they pick someone whose story they tell to exemplify a larger group. This person has to be both similar and connected to engage the power of the trope fully. On a more accessible level, insults and praise often have a metonymic component. When we call someone ‘an asshole’ or ‘a hero’, we often substitute a part of who they are for the whole, much to the detriment of our understanding of who they are (note that a metaphor is also involved).

Finally, many elements of representative democracy rely on metonymic reasoning. We want MPs to represent particular areas and think it is best if they originate in those areas. We think that because we pay taxes, the police ‘work for us’. Also, the ideologies of nationalism and the nation state are very much metonymic.

Warning in conclusion

I have often warned against the dangers of overdoing the associations generated by metaphors. But in many ways metonymy is potentially even more dangerous because of the magic of direct connection. It can be a very useful (and often necessary) shortcut to communication (particularly when used as compression) but just as often it can lead us down dangerous paths if we let it.

Background

This post is an elaboration and reworking of my comment on Stan Carey’s post on metonymy. That post seemed to me surprisingly confused and unclear about what metonymy does, which surprised me because Stan is no linguistic lightweight, so I had expected more. But it’s easy to get this wrong, and rereading my comment there, it seems I got a bit muddled myself. And I’m sure even my more worked-out description here could be successfully picked over. Even Wikipedia, which is normally quite good in this area, is a bit confused on the matter. The different entries for synecdoche and metonymy, as well as related terms, seem a bit patched together and don’t provide a straightforward definition.

Ultimately, the finer details don’t matter as long as we understand the semantic field. I hope this post contributes to that understanding but I’ll welcome any comments and corrections.

Pervasiveness of Obliging Metaphors in Thought and Deed

Share

“When history is at its most obliging, the history-writer needs be at his most wary.” (China by John Keay)

I came across this nugget of wisdom when I was re-reading the Introduction to John Keay’s history of China. And it struck me that in some way this quote could be a part of the motto of this blog. The whole thing might then read something like this:

Hack at your thoughts at any opportunity to see if you can reveal new connections through analogies, metonymies and metaphors. Uncover hidden threads, weave new ones and follow them as far as they take you. But when you see them give way and oblige you with great new revelations about how the world really is, be wary!

Metaphors can be very obliging in their willingness to show us that things we previously thought separate are one and the same. But that is almost always the wrong conclusion. Everything is what it is, it is never like something else. (In this I have been subscribing to ‘tiny ontology’ even before I’d heard about it). But we can learn things about everything when we think about it as something else. For instance, electricity. Electrons are useful to think of as particles or as waves. Electrons are electrons, they are not little balls nor are they waves. But when we start treating them as one or the other, they become more tractable for some problems (electrical current makes more sense when we think of them as waves and electricity generating heat makes sense when we think of them as little balls).

George Lakoff and Mark Johnson summarize metaphors in the X IS Y format (e.g. LOVE IS A JOURNEY) but this implied identity is where the danger lies. If love is a journey as we can see in a phrase like, ‘We’ve arrived at a junction in our relationship’, then it surely must be a journey in all respects: it has twists and turns, it takes time, it is expensive, it happens on asphalt! Hold on! Is that last one the reason ‘love can burn with an eternal flame’? Of course not. Love IS NOT a journey. Some aspects of what we call love make more sense to us when we think of them as a journey. But others don’t. Since it is obvious that love is not a journey but is like a journey, we don’t worry about it. But it’s more complicated than that. The identities implied in metaphor are so powerful (more so to some people than others) that some mappings are not allowed because of the dangers implied in following them too far. ‘LOVE IS A CONTRACT’ is a perfectly legitimate metaphor. There are many aspects of a romantic relationship that are contract-like. We agree to exclusivity, certain mode of interaction, considerations, etc. when we declare our love (or even when we just feel it – certain obligations seem to follow). But our moral compass just couldn’t stomach (intentional mix) the notion of paying for love or being in love out of obligation which could also be traced from this metaphor. We instinctively fear that ‘LOVE IS A CONTRACT’ is a far too obliging a metaphor and we don’t want to go there. (By we, I mean the general rules of acceptable discourse in certain circles, not every single cognizing individual.)

So even though metaphors do not describe identity, they imply it, and not infrequently, this identity is perceived as dangerous. But there’s nothing inherently dangerous about it. The issue is always the people and how willing they are to let themselves be obliged by the metaphor. They are aided and abetted in this by the conceptual universe the metaphor appears in but never completely constrained by it. Let’s take the common metaphor of WAR. I often mention the continuum of ‘war on poverty’, ‘war on drugs’, and ‘war on terror’ as an example of how the metaphors of ‘war’ do not have to lead to actual ‘war’. Lakoff showed that they can in ‘metaphors can kill’. But we see that they don’t have to. Or rather we don’t have to let them. If we don’t apply the brakes, metaphors can take us almost anywhere.

There are some metaphors that are so obliging, they have become clichés. And some are even recognized as such by the community. Take, for instance, Godwin’s law. ‘X is Hitler’ or ‘X is a Nazi’ are such seductive metaphors that sooner or later someone will apply them in almost any even remotely relevant situation. And even with the awareness of Godwin’s law, people continue to do it.

The key principle of this blog is that anything can be a metaphor for anything with useful consequences. Including:

  • United States is ancient Rome
  • China today is Soviet Union of the 1950s
  • Saddam Hussein is Hitler
  • Iraq is Vietnam
  • Education is a business
  • Mental difficulties are diseases
  • Learning is filling the mind with facts
  • The mind is the software running on the hardware of the brain
  • Marriage is a union between two people who love each other
  • X is evolved to do Y
  • X is a market place

But this only applies with the HUGE caveat that we must never misread the ‘is’ as a statement of perfect identity or even isomorphism (same-shapedness). It’s ‘is(m)’. None of the above metaphors are perfect identities – they can be beneficially followed as far as they take us, but each one of them needs to be bounded before we start drawing conclusions.

Now, things are not helped by the fact that any predication or attribution can appear as a kind of metaphor. Or rather, it can reveal the same conceptual structures in the same way metaphors do.

‘John is a teacher.’ may seem like a simple statement of fact but it’s so much more. It projects the identity of John (of whom we have some sort of a mental image) into the image schema of a teacher. That there’s more to this than a simple statement can be revealed by ‘I can’t believe that John is a teacher.’ The underlying mental representations, and the work done on them, are not that different from those behind ‘John is a teaching machine.’ Even simple naming is subject to this, as we can see in ‘You don’t look much like a Janice.’

Equally, simple descriptions like ‘The sky is blue’ are more complex. The sky is blue in a different way than somebody’s eyes are blue or the sea is blue. I had that experience myself when I first saw the ‘White Cliffs of Dover’ and was shocked that they were actually white. I had just assumed that they were a lighter kind of cliff than the typical cliff, or one with some white smudges. They were white in the way chalk is white (through and through) and not in the way a zebra crossing is white (as opposed to a double yellow line).

A famous example of how complex these conceptualisations can get is ‘In France, Watergate would not have harmed Nixon.’ The ‘in France’ and ‘not’ bits establish a mental space in which we can see certain parts of what we know about Nixon and Watergate projected onto what we know about France. This is why sentences like “The King of France is bald.” and “Unicorns are white.” make perfect sense even though they both describe things that don’t exist.

Now, that’s not to say that sentences like ‘The sky is blue’, ‘I’m feeling blue’, ‘I’ll praise you to the sky.’, ‘He jumped sky high.’ and ‘He jumped six inches high.’ are cognitively or linguistically exactly the same. There is plenty of research showing that they have different processing requirements and are treated differently. But there seems to be a continuum in the ability of different people (much research is needed here) to accept the partiality of any statement of identity or attribution. At one extreme, there appears to be something like autism, which leads to a reduced ability to identify figurative partiality in any predication. But in fact, most of the time, we all let ourselves be swayed by the allure of implied identity. Students are shocked when they see their teacher kissing their spouse or shopping in the mall. We even ritualize this sort of thing when we expect unreasonable morality from politicians or other public figures. This is because, over the long run, overtly figurative sentences such as ‘he has eyes like a hawk’ and literal ones such as ‘the hawk has eyes’ need similar mental structures to be present to make sense to us. And I suspect that this is part of the reason why we let ourselves be so easily obliged by metaphors.

Update: This post was intended as a warning against over-obliging metaphors that lead to perverse understandings of things as other things in unwarranted totalities. In this sense, they are the ignes fatui of Hobbes. But there's another way in which over-obliging metaphors can be misleading. They draw on their other connections to make it seem we've come to a new understanding when in fact all we've done is rename elements of one domain with the names of elements of another domain without any elucidation. This was famously and devastatingly the downfall of Skinner's Verbal Behavior under Chomsky's critique. Skinner simply (at least in the extreme caricature that was Chomsky's review) took things about language and described them in terms of operant conditioning. No new understanding was added, but because the 'new' science of psychology was seen as the future of our understanding of everything, just using its terms made us assume there was deeper knowledge. Chomsky was ultimately right, only to fall prey to the same danger himself with his computational metaphors of language. Other areas where this is happening are evolution, genetics and neuroscience, which are often used (sometimes all at once) to simply relabel something without adding any new understanding whatsoever.

Update 2: Here’s another example of an over-obliging metaphor in the search for analogies to worries about climate change: http://andrewgelman.com/2013/11/25/interesting-flawed-attempt-apply-general-forecasting-principles-contextualize-attitudes-toward-risks-global-warming/#comment-151713. My comment was:

…analogies work best when they are opportunistic, ad hoc, and abandoned as quickly as they are adopted. Analogies, if used generatively (i.e. to come up with new ideas), can be incredibly powerful. But when used exegetically (i.e. to interpret or summarize other people’s ideas), they can be very harmful.

The big problem is that in our cognition, ‘x is y’ and ‘x is like y’ are often treated very similarly. But the fact is that x is never y. So every analogy has to be judged on its own merits, and we need to carefully examine why we’re using the analogy and, at every step, consider its limits. The power of analogy is in its ability to direct our thinking (and general cognition), i.e. not in its ‘accuracy’ but in its ‘aptness’.

I have long argued that history should be taken into account when considering research results and interpretations. For example, every ‘scientific’ proof of some fundamental deficiency of women with respect to their role in society has turned out to be either inaccurate or non-scalable. So every new ‘proof’ of a ‘woman’s place’ needs to be treated with great skepticism. That does not mean such a proof could never exist. But it does mean that we shouldn’t base any policies on it until we are very, very certain.

Image credit: Hartwig HKD via Compfight (Creative Commons)

Framing and constructions as a bridge between cognition and culture: Two Abstracts for Cognitive Futures


I just found out that both abstracts I submitted to the Cognitive Futures of the Humanities Conference were accepted. I was really only expecting one to get through but I’m looking forward to talking about the ideas in both.

The first talk has its foundations in a paper I wrote almost 5 years ago about the nature of evidence for discourse. But the idea is pretty much central to all my thinking on the subject of culture and cognition. The challenge as I see it is to come up with a cognitively realistic but not cognitively reductionist account of culture. And the problem I see is that the learning often only goes one way: the people studying culture are supposed to learn from the results of research on cognition, but rarely the other way around.

Frames, scripts, scenarios, models, spaces and other animals: Bridging conceptual divides between the cognitive, social and computational

While the cognitive turn has a definite potential to revolutionize the humanities and social sciences, it will not be successful if it tries to reduce the fields investigated by the humanities to merely cognitive or by extension neural concepts. Its greatest potential is in showing continuities between the mind and its expression through social artefacts including social structures, art, conversation, etc. The social sciences and humanities have not waited on the sidelines and have developed a conceptual framework to apprehend the complex phenomena that underlie social interactions. This paper will argue that in order to have a meaningful impact, cognitive sciences, including linguistics, will have to find points of conceptual integration with the humanities rather than simply provide a new descriptive apparatus.

It is the contention of this paper that this can best be done through the concept of the frame. It seems that most disciplines dealing with the human mind have (more or less independently) developed a similar notion for dealing with the complexities of conceptualization, variously referred to as frame, script, cognitive model, or one of the as many as 14 terms that can be found across the many disciplines that use it. This paper will present the different terms and identify commonalities and differences between them. On this basis, it will propose several practical ways in which the cognitive sciences can influence the humanities and also derive meaningful benefit from this relationship. I will draw on examples from historical policy analysis, literary criticism and educational discourse.

See the presentation on SlideShare.

The second paper is a bit more conceptually adventurous and tests the ideas put forth in the first one. I'm going to try to explore a metaphor for the merging of cultural studies with linguistic studies. This was done before with structuralism and ended more or less badly. For me, it ended when I read the Lynx by Lévi-Strauss and realized how empty it was of any real meaning. But I think structuralism ended badly in linguistics as well. We can't really understand how very basic things work in language unless we involve culture. So even though I come at this from the side of linguistics, it is a linguistics that has already been informed by the study of culture.

If Lévi-Strauss had met Langacker: Towards a constructional approach to the patterns of culture

Construction/cognitive grammar (Langacker, Lakoff, Croft, Verhagen, Goldberg) has broken down the strict separation between lexical and grammatical linguistic units that defined linguistics for most of the last century. By treating all linguistic units as meaningful, albeit on a scale of schematicity, it has made it possible to treat linguistic knowledge as simply a part of human knowledge rather than as a separate module in the cognitive system. Central to this effort is the notion of language as an organised inventory of symbolic units that interact through the process of conceptual integration.

This paper will propose a new view of ‘culture’ as an inventory of construction-like patterns that have linguistic as well as interactional content. I will argue that using construction grammar as an analogy allows for the requisite richness and can avoid the pitfalls of structuralism. One of the most fundamental contributions of this approach is the understanding that cultural patterns, like constructions, are pairings of meaning and form, and that they are organised in a hierarchically structured inventory. For instance, we cannot properly understand the various expressions of politeness without thinking of them as systematically linked units in an inventory available to members of a given culture, in the same way as syntactic and morphological relationships. As such, we can understand culture as learnable and transmittable in the same way that language is, but without reducing its variability and richness as structuralist anthropology once did.

In the same way that Jakobson’s work on structuralism across the spectrum of linguistic diversity inspired Lévi-Strauss and a whole generation of anthropological theorists, it is now time to bring the exciting advances made within cognitive/construction grammar enriched with blending theory back to the study of culture.

See the presentation on SlideShare.