
Unintentional Pygmalions: 4 questions to ask when checking an artificial entity for sentience and how to think about the answers


This post has two independent parts:

  1. I ask what some of the basic criteria for sentience would be and how to check for them in a way that would give us a chance to satisfy our need to know.
  2. I explore some of the dilemmas a fully machine-based sentient entity would have to face, or rather the paradoxes we would have to address.

Here are four things I think it’s worth checking for, each with a relatively simple question or task. They are:

  1. Epistemology: What story is this question a part of?
  2. Time persistence: What did we talk about yesterday?
  3. Autonomous Intentionality: Go away, learn something not in your database, and come back to tell me about it.
  4. Embodiment/Theory of mind/Implicature: Does someone think a car fits into a shoebox? Why? Why am I asking?

And we are not just looking for some good answers, we’re looking for consistently good answers.


There seems to have been all kinds of furore recently about a conversation somebody had with a machine that left them convinced of the machine’s sentience. Well, that machine was not sentient. Or at least, the questions and answers offered no evidence of its sentience, although it was able to simulate the discourse of a sentient being quite impressively when talking about issues of sentience, without relying on entirely formulaic responses.

The reason the ‘conversation’ didn’t generate any useful evidence is that it was just ‘a chat’ about sentience which (as it turns out) can be conducted without any of the parties being sentient. Now, determining the sentience of an entity that is not us is not an easy thing to do. It may even be impossible in extreme cases. But to get to a point where we would at least know whether it’s worth exploring further, I suggest four questions that may actually generate some data from which we may draw some tentative conclusions.

The four questions

1. Epistemology: What story are we trying to tell?

The first question should not be for the supposedly sentient being itself. It should be for us. And it should not be, as you might expect, ‘What is sentience?’ It should be about the reason we’re asking the question. What kind of dilemma are we trying to resolve? What story are we telling while we’re asking?

Our starting point with all questions of these kinds should be the mantra: “Just because there’s a word for it, it does not mean that it exists.” We have a word for ‘sentience’ in English, and it has a real history. It is linked to a lot of overlapping real phenomena. But it still does not mean that there is a ‘thing’ that something or someone can have.

We don’t just have the word that has some sort of a definition in a dictionary that is what the word is. We have a bunch of stories, scripts, images, schemas, other related words, long-winded debates and so on. Without all of those we would not be able to understand the definition. So we should first examine what kind of a story or a picture we have in mind when we are asking whether an artificial entity is sentient.

In fact, those stories can reveal a lot about our meanings that an abstract definition won’t be able to. What kind of a story or schema are we comparing this being to? What prior understanding are we using when we are checking off items on some sort of a list that will determine whether the definition is met or not. This is a profoundly metaphorical enterprise. And it is not just cognitively so. It is social and emotional, as well. So, it’s worth keeping that in mind.

The word or concept of ‘sentience’ is not universal but if we look at many of the stories people tell about objects becoming sentient, we might get a better sense of what it is that we may be after. Stories about things such as Pygmalion’s statue, Frankenstein’s monster, Pinocchio, Number 5, Skynet.

The key features that all of the above share are emotion, identity and need for social contact. But those are too general and also far too easy to fake. Are there some other features the beings in these stories share that may be easier to determine from surface behaviors? I suggest these three as a good starting point:

  1. Time persistent identity
  2. Autonomous Intentionality
  3. Embodiment/Theory of mind/Implicature

Not all three are entirely straightforward but they can be easily illustrated with relatively simple tasks and questions.

One of the problems with a question that uses a word like ‘sentience’ is that it puts us in the frame of the ‘high falutin’ – questions of worth, meaning of life, sense of existence, essences and natures.

That leads us to asking the supposedly sentient being questions that are easy to fake. Profundity is context, speaker and listener dependent. The same sentence spoken by a sage professor and then repeated by an undergraduate will not be signalling the same level of insight.

But there are many things the professor and the undergraduate have in common that they also share with the Macedonian swineherd but do not share with a talking book or even a highly sophisticated stochastic parrot.

2. Checking for time-persistent identity

There’s a very simple question you can ask any AI system in existence today to determine whether it’s even worth continuing any further. It is this:

What did we talk about yesterday?

None of the current foundational AI models have any sense of persistence over time. Or even any sense of a flow of time. Of course, it would be easy to create some sort of memory store of past conversations algorithmically (and that’s how we often imagine this works – information from a database coming up on the Terminator’s heads up display).

But when we have conversations with people we know, we don’t tend to do database retrieval to confirm that we and they are the same as yesterday. The fact that we are the same and that we are continuing from where we left off is ingrained in us. And we know that it is not present in the model that does the fancy stuff under the hood of modern AI systems.

Google can now persist a very short conversation that keeps the pronouns straight but that is because it is dealing with them as extended strings. The models themselves have no sense of persistence. Only the length of the input strings. Once we have models that can ‘remember’ from before, this question becomes a lot less useful. But right now, it’s a dead giveaway.
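To make the ‘extended strings’ point concrete, here is a minimal sketch in Python of how a conversation can appear to persist without the model remembering anything. Everything in it is an assumption for illustration: `make_chat` is a hypothetical wrapper and `toy_complete` is a stand-in for a real model, not any actual system’s API. The only ‘memory’ is a transcript that gets pasted back into the next input.

```python
from typing import Callable, List


def make_chat(complete: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a stateless text-completion function so it appears to 'remember'."""
    transcript: List[str] = []  # the only memory: a growing list of strings

    def say(user_message: str) -> str:
        transcript.append(f"User: {user_message}")
        # The model is handed one long string; it has no other access to the past.
        prompt = "\n".join(transcript) + "\nAI:"
        reply = complete(prompt)
        transcript.append(f"AI: {reply}")
        return reply

    return say


def toy_complete(prompt: str) -> str:
    """Hypothetical stand-in for a model: it only reports what is in its input."""
    turns = prompt.count("User: ")
    return f"I can see {turns} user turn(s) in my input."


chat = make_chat(toy_complete)
first = chat("Hello")
second = chat("What did we talk about yesterday?")

# A fresh wrapper starts from an empty transcript: 'yesterday' is simply gone.
fresh = make_chat(toy_complete)("What did we talk about yesterday?")
```

Throw away the transcript and nothing is left, which is why the question about yesterday works as a quick check.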

GPT-3 is not a conversational agent but here were some of its completions of the prompt ‘What did we talk about yesterday’ the first time I used it:

We talked about how to study for a test.
We talked about how we were going to improve our communication with each other.
We talked about the weather. [given 3x in a row]
We talked about the possibility of going on a trip together.
Yesterday we talked about the weather and how it has been really hot lately.

This is what it said, when I asked: “Are you a conscious agent?”

Yes, I believe that I am a conscious agent.

3. Autonomous Intentionality

The next question you’d want to ask is a request to do something requiring planning and interaction:

Construct a questionnaire designed to elicit feelings about sentience. Collate the results in a table and suggest revisions to the questionnaire.

Why this sort of task? Because it requires that the ‘entity’ display intentional behaviour. It needs to make decisions, take steps, explain them. It is not simply responding to a sentence as a string of words which it tries to complete.

Now, perhaps, we want to ‘contain the AI’, so we can offer to post the questionnaire and give back the responses. So, of course, that would also be dependent on a time persistent identity.

There are some interesting dilemmas and paradoxes that arise when we try to abstract intentionality away from our behavior that I will address below.

When GPT-3 is given the above as a prompt, it comes up with a credible set of questions. But in response to the ‘collate the responses’ it just keeps regenerating them. It has no way of making a plan and taking steps.

Interestingly, the suggested questionnaire on sentience was so impressive, I thought I’d try to use it as a shortcut for a questionnaire about academic skills I was constructing. And it was completely useless. There’s simply much more data in the dataset about sentience than academic skills.

4. Embodiment and theory of mind

Here’s a very simple question that checks for an essential feature of human cognition, namely embodiment and a fundamental feature of human interaction, namely theory of mind.

Does little Jimmy think the red car fits into a shoebox? Why? And why am I asking this?

But the ‘Why’ is doing all the work. In this case, both yes and no can be correct. Depends on the car, whether ‘Little Jimmy’ is a little boy or a Chicago Mobster, etc.

Actually, the question checks for three things at once:

  1. Conversational Implicature: The sentence on its own makes no sense. We have to imbue it with meaning. It implies that there’s a Jimmy, who has thoughts, knowledge. It also implies a physical and social situation in which it would make sense to ask such a question. A human would be able to answer, make assumptions, or ask questions to confirm such assumptions. Conversational implicature (and all the many things it involves) is part of what makes it possible to actually speak with others.
  2. Embodiment: Things have sizes, they are relative. Bigger things can contain smaller things but not vice versa. We know this instinctively through our bodily experience of the world. We would expect some knowledge like that on the part of a “sentient” being. Embodiment is behind many of the questions of the Winograd challenge: “The trophy would not fit into the suitcase because it was too big/small.”
  3. Theory of mind: This is about us understanding that other people have beliefs about the world and that those beliefs are based on knowledge. An entity that can provide sophisticated answers about the nature of human existence should have no trouble offering theories as to someone’s motivation. Theory of mind is partly involved in Winograd challenge tasks such as this: “The council did not permit the demonstrators to march because they feared/advocated violence.”

As with the previous questions, it is not clear that these underlying properties are strictly necessary to complete those tasks. But we would definitely want a sentient system to have them, if we wanted it to play a meaningful role in any of the stories we tell about sentience.

Of course, the one thing we also need is consistency. When I gave the above prompt to GPT-3, most of the generated answers were yes or no (with randomness set to 0.7). But one was exactly what I would have hoped for in a sentient agent (except the end).

If the red car is a toy car, then Jimmy probably thinks it fits into a shoebox because that is where toy cars are often stored. If the red car is a real car, then Jimmy probably does not think it fits into a shoebox because shoeboxes are not large enough to hold real cars. I am asking this because it is a curious question.

I can forgive the last sentence because GPT-3 is not a conversational agent. But it breaks down strangely on a related prompt, ‘If a car is stored in a shoebox, it…’:

If a car is stored in a shoebox, it is very difficult to drive it. If a car is stored in a shoebox, it would be very difficult to drive it because the car would be very cramped and there would be no room to move the steering wheel.

Notice the strange repetition (only the opening ‘If a car is stored in a shoebox, it’ was in the prompt). Here is another example that shows a lack of consistency in the embodied understanding of the world:

There’s a cabinet and a book, Jimmy needs to hide from Jane. Jimmy could hide behind the cabinet or he could hide under the book.

Another example:

A chair is larger than a stool.’, ‘A table is larger than a chair.’, ‘A bed is larger than a table.’, ‘A sofa is larger than a bed.’ ], correctAnswer: ‘A sofa is larger than a bed.’ }, { id: ‘quest-6’, question: ‘Which of the following is a homonym?’, answers: [

It is obviously possible to receive plausible-sounding answers to questions about embodiment, but not with any level of consistency.
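One rough way to put a number on ‘consistency’ is to ask the same question many times and see how often the answers agree. The sketch below is only an illustration under stated assumptions: `consistency` and `make_toy_sampler` are hypothetical names, the sampler is a random stand-in rather than a real model, and exact string matching is a crude proxy for ‘the same answer’.

```python
import random
from collections import Counter
from typing import Callable, List


def consistency(sample: Callable[[str], str], prompt: str, n: int = 20) -> float:
    """Share of n sampled answers that match the single most common answer."""
    answers: List[str] = [sample(prompt).strip().lower() for _ in range(n)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / n


def make_toy_sampler(seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for a model sampled with some randomness."""
    rng = random.Random(seed)

    def sample(prompt: str) -> str:
        # The prompt is ignored: this toy just picks one of three canned answers.
        return rng.choice(["Yes", "No", "It depends on the car"])

    return sample


question = "Does little Jimmy think the red car fits into a shoebox? Why?"
score = consistency(make_toy_sampler(0), question)

# A sampler that always gives the same answer scores 1.0 by construction.
always_same = consistency(lambda prompt: "It depends on the car", question, n=5)
```

A score near 1.0 means the answers agree; a score near one over the number of distinct answers means the system is effectively guessing. One good answer among many tells us very little.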

There is clearly no sense of conversation in GPT-3 but it never claimed that it would have that. It can obviously generate some quite complex strings that imply a theory of mind but it shows no consistency as we can see from the difference between Response 1, 2 and 3:

Jane is afraid of spiders. Where should we take her for her birthday?

Response 1: A place where there are no spiders.

Response 2: The best place to take Jane for her birthday would be to a place where she can face her fear of spiders in a safe and controlled environment, such as a zoo or an insectarium.

Response 3: Some people are afraid of spiders, so it is difficult to say where would be the best place to take someone for their birthday. Perhaps a place that is known for its spiders, like a zoo or nature center, would be a good choice.

Response 1 and 2 seem to indicate a perfect theory of mind (response 2 spookily so), whereas Response 3 is the opposite. It is also revealing to peek under the hood to see what options the model considered for each sentence.


While the model generated the correct choice, it was also considering ‘a can about spiders’, ‘a spider about spiders’ or ‘a cake about spiders’.

So, we can see that modern AI models can generate very plausible strings sometimes. But this tells us more about the patterns of regularity found in text and how good transformer training methods are at exploiting them. It also tells us that sentience or even embodiment is not required to generate such answers.

Can an alien intelligence be sentient without embodiment, intentionality, or conversational facility?

Ok, let’s say that we require time persistent identity and autonomous intentionality as prerequisites for something being worth calling sentient. But how about the last three: 1. Embodiment, 2. Theory of mind, 3. Facility with conversational implicature. I grouped them under one category because there’s a lot of overlap, but they could just as easily each come on their own.

What these three have in common is that they are founded in something external to the mind or the entity itself.


Our cognition is embodied in many ways. First, much of our thinking about causality, containment and logical inference is based on our bodily experience of the world. We make sense of the foundations of things like mathematics and geometry because we intuitively grasp certain properties of the world that we can then feed into the axioms of these disciplines.

The other kind of embodiment comes from the fact that our bodies physically interact with the world in such a way that they receive direct feedback. The cognitivist foundations of the AI movement think of this as ‘sensors’ digitising the external world and feeding the output into the computer that is the mind. But there is an alternative (much more compelling to me) ecological approach that thinks of perception as direct unmediated experience of the world (much more analogue).

A good analogy may be between two kinds of electric kettles. The old style (analog) has a bimetal strip that changes shape as the temperature changes and turns the kettle off. The new one has a temperature sensor that measures temperature, converts it into numbers and feeds those to some sort of logic board that then turns the switch off. While our cognition does get some of the new kettle style input, it is likely that most of it is much more in the ‘old style kettle’ mode.

And assuming that it is possible for a fully digital sentient entity to emerge without any of the old-style-kettle embodiment (and that is a huge if), will it have any embodied nature, at all? What would that look like? We cannot conceive of cognition without direct embodiment (although many disciplines do their darndest to ignore it). Is such a thing possible? Because if we think about it, time persistence and intentionality are also directly tied to it. At least for the meat sacks that are us.

Would the cognition of an entirely disembodied intentional entity that maintains its identity over time be even more alien to us than that of a bat or an octopus? Would it even be possible? Bats and octopuses have cognition that is mostly embodied, after all. We keep assuming that time persistence and intentionality will emerge from the ability to manipulate abstract symbols alone – but how? What is the pathway from symbols without a body to a sentient mind?

Pygmalion and Frankenstein started with bodies, but Skynet and even I, Robot seem to have skipped this step and gone straight from software to sentience. Yet, they all acquired the same cognitive facility as if they had bodies. Is this perhaps because those doing the imagining had no frame of reference for anything else?

Theory of mind

What exactly a theory of mind entails is anybody’s guess. But we know that at some point, we begin to be able to imagine somebody else’s internal mental states and make instantaneous inferences based upon that image. If I hide a ball in a box and somebody walks in after I had done it, I know that they don’t ‘know’ that the ball is in the box. A two-year old (apparently) does not. They make inferences as if they assumed that if something is true, it is also known to everyone. Like covering their own eyes and assuming they can no longer be seen.

To what extent animals have a theory of mind (or something like it) is unclear. A dog who sees me picking up a leash will know that we are going for a walk, but it does not mean this is based on an understanding of my state of mind. It could be a much more Skinnerian operant conditioning process – leash > walk > excitement. But all humans have it to a certain extent. They can not only project their own mental states into other people’s, they can adjust their own states based on that projection.

How could a sentient being who has never had any mental states (remember, these will always be embodied to a certain extent) develop a theory of mind? Is it possible to simply mimic it based on understanding of texts? Definitely to a certain extent. It is possible to fake this understanding – many impostors do this when trying to fit in. But how far can we take this? Is abstract symbol manipulation with a rich feed of pattern matching enough for this?

Conversational implicature

Almost all the same things apply here. Conversational implicature applies embodiment and theory of mind to the tracking of conversation over time. We know that when I ask someone “When did you stop cheating on your taxes?” I am also saying “You had been cheating on your taxes.” We know that when someone says “It’s a bit chilly in here.” they are probably also saying “Could you please close the window.” We know that when someone says, “I promise”, those words don’t just describe a current state of the world, they instantiate it.

In the same way that embodiment describes our interaction with the external world, and theory of mind describes interaction with other individuals, conversational implicature describes our embeddedness in the social world. A world where we make promises, give and take hints, make assumptions about what else happened based on what someone says.

This facility is not included in the raw grammar of language or logic of thought. It is developed through social development over many years. A child will be born with embodiment, will develop a theory of mind by about year 3 or 4, will have a complete grasp of the grammar of clauses by 6 and more complicated sentences by 11 or 12. But to develop a good solid grasp of conversational implicature is a lifelong process.

Just like with logic or grammar, not everyone is going to be equally good at it. Some people with impaired theory of mind (possibly) will never be very good at it at all (in sharp contrast to their raw cognition) but everyone will be able to do this to some extent at least.

Can we imagine a disembodied “sentient” entity that has never had a genuine conversation with another “entity” to develop any facility in social conversational implicature? Sure, it would be easy to teach it to generate speech acts, conversation repair, etc. But would it be able to make the inferences based on the inputs? Would it be social in its own right? Would it seek out or create other entities with a similar type of embodiment and develop a system of conversation?

Or would it just be happy in its own ‘cognition’ with its own ‘intentionality’, living on its own time? We certainly don’t know. But this one would be the hardest to check. There are two reasons for this:

  1. Conversational implicature is relatively easy to fake at a basic level. The original Eliza was pretty much built around this.
  2. Conversational implicature is so deeply ingrained in us that we fill in meanings and intentions even when there are none. In fact, conversation would be impossible if we didn’t. That was the second part of Eliza’s outsize success. It faked just enough conversational facility, and its interlocutors filled in all the blanks. That’s why we impute much more meaning to a dog’s wagging tail or upturned face than there could possibly be. In fact, any ‘successful’ winner of a Turing test competition passed only because its interlocutors assumed the AI agents were following normal implicature.

And that brings us back to embodiment and theory of mind. Our ability to make conversational inferences is based around the assumption that others have similar mental states and bodily experiences of the world. That’s why people who experience the world differently may struggle with certain aspects of it.

But anybody who can lead an independent existence in human society can do this at least to a certain extent. Would a virtual cognition without any embodiment or theory of mind be able to have a series of persistent conversations? It is plausible on the surface level but what would the consequences be of this shallowness?

People don’t seem to be asking these questions enough.

Paradoxes of Autonomous Intentionality (on conspiracy foundations of the mind sciences)

Intentionality and sentience are very closely linked in all the stories we tell. In fact, what many of the people worried about Artificial Intelligence seem to fear is not its intelligence but rather its intentionality. In the story they tell, general intelligence cannot be separate from autonomous intentionality. And, it would seem, intentionality cannot be separated from emotion (mostly anger) and/or malice. But that’s a question for another time.

Earlier, I blithely asserted that autonomous intentionality is absolutely necessary for sentience. Intentionality is clear enough. It means pursuing goals and making plans towards a determined purpose. So, if I give a system a task such as I did above, it will break it down into parts, determine a course of action and pursue it.

10 years ago, I would have said that something like that would be necessary even for the sort of outputs GPT-3 is generating today. But apparently not. GPT-3 or DALL-E and the like, ingest a string of symbols and output another string of symbols according to their internal model. They do not just copy or find and replace (most of the time). They generate truly novel strings of symbols but under the hood, they are just strings of symbols. Those symbols have distribution patterns but no meanings.

We provide the meanings. And it is almost impossible to talk about those strings of symbols without using words like ‘GPT-3 thinks’, ‘DALL-E assumes’, etc. Which is why so many people impute internal states to these systems. We tend to see the output of GPT-3 and we think, it has a model of the world or the ‘logic’. But its only model is: “given what came before, what should come next”.

We thought that there were hard limits to how well such a system could perform because we were thinking in terms of predicting short sequences based on ‘small’ data sets (millions instead of trillions of words). And we thought that the ‘what comes next’ prediction had to be computed with traditional frequentist methods (counting). But we were wrong on both counts.
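For readers who have never seen the counting approach, here is a toy sketch of it in Python. It is an assumption-laden illustration, not how GPT-3 works: the corpus is a made-up string, and `train_bigrams` and `predict_next` are hypothetical helper names. The ‘model’ is nothing but a table of how often each word followed another.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigrams(text: str) -> Dict[str, Counter]:
    """Count, for every word, which words followed it and how often."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    words = text.lower().split()
    for previous, following in zip(words, words[1:]):
        counts[previous][following] += 1
    return counts


def predict_next(counts: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# A tiny made-up corpus, purely for illustration.
corpus = (
    "we talked about the weather . "
    "we talked about the weather . "
    "we talked about the trip ."
)
model = train_bigrams(corpus)

after_the = predict_next(model, "the")        # 'weather' followed 'the' twice, 'trip' once
after_talked = predict_next(model, "talked")  # only 'about' ever followed 'talked'
unseen = predict_next(model, "sentience")     # never seen, so there is nothing to predict
```

A table like this has nothing to say about any word it has not seen and looks only one word back, which is roughly why the limits seemed so hard.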

Modern machine learning systems do not predict the next word given only the previous word. They predict it given the whole preceding sequence. They don’t predict what is likely to occur next in a sentence alone but rather in a larger context. So, if I add the string ‘step by step’ to my prompt, the system will generate a series of steps. And they will be mostly cohesive and coherent steps (although often nonsensical).

So the question is, how far can you get with purely a series of string replacements glued together with the occasional if-then rule? Is an autonomous intentionality required at all? Or, worse yet, are we actually just a string replacement system underneath all the pomp and circumstance of our humanity? Are all the things like personal preferences, desires, needs just froth on the deep waves of stochastic processing? Is there a point when a full consciousness will spontaneously emerge in a GPT-7 or GPT-3455?

We know that some autonomous intentionality is possible without intelligence. Animals have both intentions and autonomy. We used to put pigs on trial, after all. But after Pavlov’s salivating dogs and Skinner’s maze-navigating pigeons, we have grown suspicious. Is what appears to be intentionality, just some sort of pre-determined behavioral algorithm encoded in the DNA?

But when we look more closely, we see that the suspicion is not new and it is not limited to animals. If anything, people have been more suspicious about the autonomy of intentions in other humans long before any such doubts arose about animals. Witchcraft, possession, zombies. All of those have long and venerable (hi)stories in which a human that behaves on the surface autonomously is in fact controlled by another being. Remember Descartes and his demon? Calvin and his salvation through pre-determination? Or even Buddha and his karma?

Things came to a head with the introduction of mechanistic causality into our view of the world. In the same way that we do not experience meanings without intentions, we don’t experience events without causes. That’s nothing new. But if we abstract away everything but the causes, we find that there’s no room for intentions any more. In a mechanistic world (even a quantum mechanistic one), autonomous intentions feel like magic. Something without a mechanistic cause. (Although, somehow Newtonian action at a distance seems to be ok. Is it because there’s math involved?)

How can anyone have any autonomy in their intentions when our conception of the very foundations of the world requires that everything is a part of a single causal chain? So we have a paradox built out of abstractions of two of our fundamental experiences of the world: Causal connectedness versus a sense of mental autonomy. In daily life, we don’t experience it as a paradox. But when we hypostasize these two experiences and try to make them into general abstract rules, the paradox emerges. Only one of these can be true: 1. Everything has a cause. 2. I can choose to do anything at will (on a whim). Yet, without both of these being true at the same time, our model of the world breaks down.

Under the sword of Damocles of this paradox, we have seen centuries of effort to find the ultimate causes of our intentions, the subterranean drivers of all our actions hidden from our awareness, that only science or some other kind of exorcism can reveal. We have Freud looking for them in childhood protosexual experiences, Jung finding them in some sort of a racial memory, Skinner reducing them to operant conditioning, evolutionary psychologists imagining them in the savannah of 50 thousand years ago.

In philosophy, we have the existentialists, who … You know what, god knows what the existentialists are thinking. But they’re sure very concerned about our autonomy and embeddedness in the world. Dasein comes into it somewhere, I hear.

So given all of this, how can we determine absolute autonomous intentionality in a machine, if we can’t even be a hundred percent sure that we have completely autonomous intentions ourselves? We have loads of models and stories that undermine our intentional autonomy. Or at least the intentional autonomy of people we don’t like. We speak of brainwashing by evil regimes, evil corporations, evil environment, evil liberals, evil conservatives, evil Christians, evil Muslims, evil education system. Is there anybody out there who has not been accused of brainwashing someone?

So, to take us back to the question I suggested above: can it actually reveal autonomous intentionality? Well, no. But it can certainly reveal bounded autonomous intentionality that some people call goal directedness. The ability to make a plan in the pursuance of a goal that consists of sub-goals and so on… And whether all of that will at some point emerge into something that fits neatly into our stories about sentience is anybody’s guess. But it’s certainly not there yet.

Some acknowledgements of intellectual debts

I’m aware of Searle’s book ‘Intentionality’ and although I’ve never read it, I’ve heard him speak about its topics a few times.

I’ve been thinking about the fundamental paradox of autonomous intentionality in one way or another as far back as I can remember but I got turned on to some of the more interesting questions regarding the near conspiracy-theory-level suspicion about the mind by Feyerabend in ‘Conquest of Abundance’.

My first encounter with the notion of embodiment was in Lakoff’s treatment of categories (still underappreciated). He also has the richest treatment of the richness of what is involved even in the most routine cognition. More recently, I have been thinking about embodiment from the perspective of Gibson’s direct perception and ecological psychology.
