The Turing Test
The layman's interpretation of the big bang theory: In the beginning, there was nothing, which exploded!
You insist that there is something a machine cannot do. If you will tell me precisely what it is that a machine cannot do, then I can always make a machine which will do just that!
John von Neumann
Alan Mathison Turing (June 23, 1912–June 7, 1954) was a British mathematician, logician, and cryptographer, and is considered to be one of the fathers of modern computer science. He provided an influential formalization of the concept of algorithm and computation: the Turing machine. He formulated the now widely accepted 'Turing' version of the Church-Turing thesis, namely that any practical computing model has either the equivalent or a subset of the capabilities of a Turing machine. During World War II he worked on breaking German ciphers, particularly the Enigma machine; he was the director of the Naval Enigma section at Bletchley Park for some time. After the war, he designed one of the earliest electronic programmable digital computers at the National Physical Laboratory and, shortly thereafter, actually built another early machine at the University of Manchester. He also, amongst many other things, made significant and characteristically provocative contributions to the discussion "Can machines think?"
In his 1950 paper “Computing Machinery and Intelligence” he proposed the following thought experiment that he called “The Imitation Game”: Imagine a locked room with a computer inside. Questions can be fed into the room, and its hidden inhabitant must reply. If, based on such a dialogue, we cannot determine whether the inhabitant is human or machine, then the machine can think.
This came to be known as the “Turing Test”, and many people believe that it can be practically implemented and used to determine whether a machine is intelligent. In this article I will discuss some of the issues regarding the use of the Turing Test and what it would take for a machine to pass it. Then, based on that assessment, I will cast my doubts as to whether we will ever build such a machine. Finally, I will discuss a practical implementation of the Turing Test and its contribution to the goal of machine intelligence.
This article discusses some of the philosophical aspects of intelligence but concentrates on the practical aspects of achieving human-level intelligence in artificial devices. It is thus mainly directed at people who concern themselves with the practical aspects of Artificial Intelligence. The ideas expressed herein describe some of the underlying theoretical assumptions that presided over the design and implementation of the Corby system.
If you want to know what I think about intelligence, in its natural and artificial varieties, read my article What is (Artificial) Intelligence?. If you want to know what I think about Searle’s Chinese Room thought experiment, read my article The Chinese Room.
It is very difficult for a machine to pass the Turing Test. The criterion, set by Turing in his 1950 paper, states that a machine passes the test if the “average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning”. Turing does not say what an “average interrogator” is, so we must assume that this can include any person, with every kind of education and background imaginable, who will look at the problem from many different perspectives.
If someone were to ask me to put in a single sentence what it would take for a machine to be able to pass the Turing Test, I would say: It must interact with our environment just like humans do.
This statement has many implications, the first of which is that the machine must share our environment. A fish, no matter how intelligent, would never be able to pass as a human.
Not only must the machine share our environment, it must perceive it like we do. Therefore it must have a set of sensors that can accurately duplicate the functionality of the human senses. There is also the requirement that the machine use each sense in the way that humans do. Imagine, for instance, that a dog were able to participate in a Turing Test in which you are the examiner. The conversation could go like this:
- What’s she like?
- Oh, she’s hot. She smells like parsley and sage, with a touch of rosemary and a dash of thyme. You know, like when you go to the Scarborough fair through the meadows covered with mist, in spring time, on a Sunday morning.
- Hum… let me see. You’re a dog, right?
- Well, yeah. How did you guess?
To be able to interact with the world, our machine must have actuators that can manipulate things in the environment very much like we do. Then the machine must have a body where the sensors and actuators are mounted. This body must be able to move in the environment very much like we humans do.
In sum, our Turing Test passing machine must be a mobile robot with very much the same capabilities that a human body has. Not only that, it must have the same physical appearance. Imagine this snippet of conversation in a Turing Test environment:
- Put your hand over your head. What do you feel?
- Well, a few years ago I would have felt some hair, now I only feel something like a big egg.
Or this one:
- Put your left hand over your left eye.
- Now put your right hand over your right eye.
- Nah, you won’t talk me into that one again. The last time someone did that I thought I was going blind.
This implies that our mobile robot be an android, that is, it must have the appearance of a human being.
But the most difficult part is yet to come. Our machine must be able to learn from its interactions with the environment and build a world model that resembles the one humans carry in their heads, even if it is not very accurate in some respects. A conversation between two humans relies primarily on their world models, and conversation is not possible if the models are too different.
But it gets even worse. Our android robot must also have a history, as any human being does. Any human being remembers himself as a child, playing with other children, going to school, the house where he lived as a teenager, and so on. This lifetime of personal experiences is perhaps the most difficult thing for our robot to acquire.
When the time came to start building a machine that would pass the Turing Test, people considered the above requirements carefully, took a deep breath, and then decided to fake the whole thing. Turing himself set the ball rolling when he said:
The machine would be unmasked because of its deadly accuracy. The reply to this is simple. The machine (programmed for playing the game) would not attempt to give the right answers to the arithmetic problems. It would deliberately introduce mistakes in a manner calculated to confuse the interrogator.
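Turing's suggestion is easy to sketch. The toy responder below (hypothetical, not taken from any real system) answers addition questions but occasionally slips in a near-miss, imitating the calculated mistakes Turing describes; the function name and error model are my own illustration:

```python
import random

def humanlike_sum(a, b, error_rate=0.1):
    """Answer a + b, but occasionally make a plausible 'human' slip.

    With probability error_rate the result is off by a small amount,
    e.g. an off-by-one or a forgotten carry, as Turing suggested a
    machine playing the game should deliberately do.
    """
    exact = a + b
    if random.random() < error_rate:
        # A near-miss answer: off by one, or off by ten (a lost carry).
        return exact + random.choice([-10, -1, 1, 10])
    return exact
```

With `error_rate=0.0` the responder is deadly accurate; raising it trades accuracy for the fallibility an interrogator expects of a human.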
Other people followed suit and the rest, as they say, is history. Nowadays hardly anyone does Artificial Intelligence research anymore. Most practitioners have shifted their focus to derivative products like expert systems, data mining, and similar areas. When confronted with Turing’s original question, “Can machines think?”, people either dodge the question or plainly answer no. Researchers in natural language processing discuss things like parsers, part-of-speech taggers, and lemmatizers; others dedicate themselves to building ontology databases by hand. It is not that these things are not important or useful, but Artificial Intelligence they are not.
A few faithful remain, but they are trapped in the same old paradigm. The result is that today we are not able to build a credible model of the simplest animal, much less something that even roughly emulates the behaviour of a human being.
The Turing Test thus became a contest to find the machine best able to fool the examiner into thinking it is the real thing. Instead of being “The Imitation Game”, as Turing would have it, his test became “The Deceiving Game”, and that is the original sin of the Turing Test.
Of course, we have to put this into perspective. Half a century ago, computers were a novelty, and I am sure that the first time a computer output the sentence “Hi there, how’s things going?” people started jumping around, crying “It’s alive! It’s alive!”. Nowadays computers are household appliances like a TV or a washing machine, and we do not have that excuse anymore. If we are to rekindle the old flame and regain the credibility lost, we must face the tough issues head on.
When people set out to build machines that would pass the Turing Test and realized that doing the thing properly would be too difficult, they started to think about workarounds. At some point they realized that all the test demands is that the machine give the right answers, so they thought: if I build a machine and cram into it as many question-answer pairs as possible, chances are that the examiner will only ask questions that are among them.
Then they started to build machines that engage in a kind of conversation by proxy, because what happens here is that the examiner is really dialoguing with the person who programmed the machine. The machine is just a sort of communications tool, with the ability to play back pre-defined answers. A machine like this cannot be considered intelligent, because the intelligence in this case is not in the machine but in the head of the person who programmed its database of question-answer pairs.
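The "conversation by proxy" paradigm can be sketched in a few lines (the table entries and function name below are illustrative, not the code of any actual contest entry). Note where the intelligence actually resides: entirely in the hand-filled table, with a stock evasion covering everything else:

```python
# A toy "conversation by proxy" bot: a hand-filled table of
# question-answer pairs, plus a stock evasion for everything else.
CANNED_ANSWERS = {
    "what is your name?": "My name is George.",
    "how are you?": "Fine, thanks. And you?",
    "do you like music?": "I love music, especially loud music.",
}

def reply(question: str) -> str:
    # All the "understanding" happened when a human filled the table;
    # the program itself only normalises the text and looks it up.
    key = question.strip().lower()
    return CANNED_ANSWERS.get(key, "That's interesting. Tell me more.")
```

Any question the programmer foresaw gets its canned answer; anything else falls through to the evasion, which is exactly the failure mode discussed below.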
For a machine like this to be considered intelligent, it would need to demonstrate somehow that it can understand what people say. This is the semantic problem, a very tough nut to crack and one that has not received a satisfactory answer so far. This is perhaps the first problem to tackle if we are to revive the old dream and try to answer in a practical and positive way Turing’s original question, “Can machines think?”. In my article The Chinese Room I discuss the semantic problem and present some ideas about how we can approach it.
If you are still not convinced that a machine based on the "conversation by proxy" paradigm will never be able to pass the Turing Test, consider this: for such a machine to be successful, you have to contemplate every possible question that someone could ever ask it. Just consider the multiple ways in which an idea can be expressed, the little variations like punctuation and the use of synonyms. The number of possible combinations quickly gets out of hand. This raises two very important issues: one is the huge amount of storage space required for such a thing; the other is the amount of work needed to set up such a system. If the machine were able to get the idea behind the words, things would be easier, but that requires understanding, and so far we have not addressed that problem.
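The explosion is easy to demonstrate. Taking one short question and a handful of illustrative synonym and punctuation alternatives per slot (the word lists below are my own examples), the number of surface forms a lookup table must store is the product of the options:

```python
from itertools import product

# Illustrative alternatives for each slot of one short question.
slots = [
    ["what", "which"],
    ["is", "was"],
    ["your", "ur"],
    ["favorite", "favourite", "preferred"],
    ["movie", "film", "picture"],
    ["?", "??", "???"],
]

# Every combination is a distinct surface form the table must contain.
variants = [" ".join(choice) for choice in product(*slots)]
print(len(variants))  # 2 * 2 * 2 * 3 * 3 * 3 = 216 forms of one question
```

Two hundred and sixteen entries for a single six-word question, before we even consider word order, contractions, or spelling mistakes.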
Consider the old “say X” routine that we use with toddlers to improve their pronunciation. The adult says, for instance, “say cat” and the child replies “cat”. Imagine how many thousands of entries a database would need for a machine based on the "conversation by proxy" paradigm to handle this simple exchange, which any small child can cope with. Now imagine yourself filling in those entries by hand.
But it gets worse. Now consider a normal conversation between two persons. What one person says depends not only on what the other person just said but also on all the previous sentences uttered by both persons during the whole conversation. Not only that, but a response may depend on some aspect of the world as perceived by either of the interlocutors. The number of responses you must contemplate beforehand for each question then rises to impossible levels. This is where most existing programs fall on their faces; the problem is so difficult that some people do not even bother to address it.
As if the above were not bad enough, the worst is yet to come. It is not a matter of if, but of when, someone comes up with a question that you have not contemplated. In that event, such a system would be at a complete loss. Human beings cope nicely with this situation: if everything else fails, they can always ask for clarification. Our mechanical system cannot do that, for the simple reason that the question was not contemplated in the first place. But humans can do even better: they can infer the meaning of the question by establishing how close it is to other questions for which they know the response, and then use that response with a degree of confidence proportional to the degree of similarity.
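This human fallback, mapping an unseen question onto the closest known one and measuring confidence by similarity, can be sketched with a simple word-overlap (Jaccard) score. The threshold, the sample questions, and the function names below are all illustrative assumptions, not anyone's published method:

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two sentences, in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def best_match(question, known, threshold=0.5):
    """Return (closest known question, similarity score).

    When nothing is similar enough, return (None, score): the point
    at which a human would fall back to asking for clarification.
    """
    best = max(known, key=lambda k: jaccard(question, k))
    score = jaccard(question, best)
    if score < threshold:
        return None, score
    return best, score

known = ["what is your name", "do you like music", "where do you live"]
```

A near-variant such as "what is your full name" maps onto the stored question with high confidence, while a question about, say, astronomy scores near zero and triggers the clarification path, which is precisely the behaviour the canned-answer systems lack.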
But the fundamental flaw with a system based on the "conversation by proxy" paradigm is that it relies entirely on the programmer’s ability to understand the language it deals with. Therefore it cannot even be considered an intelligent system.
Dr. Hugh G. Loebner is a philanthropist who at some point in his life decided to spend some of his money promoting intelligent artificial devices, using the Turing Test as the fundamental tool.
In 1990 Dr. Loebner agreed with the Cambridge Center for Behavioral Studies to underwrite a contest designed to implement the Turing Test. Dr. Loebner pledged a Grand Prize of $100,000 and a Gold Medal for the first computer whose responses were indistinguishable from a human's. Each year an annual prize of $2,000 and a bronze medal is awarded to the most human-like computer. The winner of the annual contest is the best entry relative to the other entries that year, irrespective of how good it is in an absolute sense.
The contest has taken place every year up to the present, and the transcripts of the contesting systems are available on the Internet. The contestants, commonly called chatbots, try to engage in a conversation with each of several judges, and the results are later accumulated into a final score.
I have a lot of respect for Dr. Loebner and all the participants in his contest, because they have the courage to back up their convictions with real money or effort. But it is clear, from reading the transcripts, that they are not getting the best for their money or effort. In fact, the results can best be described as pathetic, not just because they are very modest, which they are, but because of the huge distance between what they intend to do and what they in fact manage to achieve. Imagine a program that cannot even be considered intelligent trying to pass as a human being in a conversation.
The programs presented are little more than toys, based on, and with little improvement over, Joseph Weizenbaum’s old program ELIZA. Here is a small excerpt from the dialogue between a human judge and the Jabberwacky entry in the Loebner Prize 2004:
PROGRAM: Hello, my name is George and I am the human.
JUDGE : Hi George, my name is S t e v e n.
PROGRAM: Sally, was a clown.
JUDGE : Who's sally?
PROGRAM: You wrote sally!
JUDGE : I did?
PROGRAM: Yes. You are very confusing.
JUDGE : Please accept my abject apologies.
PROGRAM: That's ok. What are you doing today?
JUDGE : Chatting with you, at the moment.
PROGRAM: What part of the grass do you stand smart guy.
JUDGE : Do you mean what part do I stand *on*?
JUDGE : Then what do you mean?
PROGRAM: I mean Radiohead!!
JUDGE : What is Radiohead?
When two strangers engage in a conversation for the first time, they try to find common life experiences or common interests to base the conversation on. This can be anything from where they live or where they were born, to what kind of music or movies they like, to what sports team they support. A chatbot has no life experiences, and we can hardly say that it has interests at all. What it has are canned answers that it clumsily tries to match to the questions posed by the judges. Most of the bots completely forget what they said a moment ago, and their responses disregard the current context.
From the outset, the programmers competing in the Loebner Prize followed the Turing Test’s lead and engaged in the “Deceiving Game”. The aim is to fool the judges for the short duration of the interviews, and that is all. Learning ability is not valued, or even mentioned, in the contest. It is doubtful that this fundamentally flawed process will ever lead to really intelligent machines that we could use in real-life tasks.
Perhaps the problem here is that we want to jump from zero to an extremely advanced level in a single step. And that is the hard way of doing things; the usual way is to advance one step at a time, so that mistakes are corrected early on, before they become too costly to correct.
History teaches us that we should think at least twice before we make predictions about something never happening, unless the reasoning is based on some well-established law of physics. So, after thinking at least three times about the matter, I will make this bold prediction: We will never build a machine that passes the Turing Test.
Let me add some weasel words, just in case someone decides to embarrass me by building such a machine during my lifetime. This prediction is based on the fact that the cost of building such a machine is so large compared to the benefits it would bring that nobody in their right mind would engage in such an adventure. These considerations, valid in today’s world, may not apply in the future.
But there is nothing to prevent us from building a machine that demonstrates human-level (as opposed to human-like as required by the Turing Test) intelligence. Such a machine would not need to have a body or any of the artefacts described at the beginning of this article; it could, in fact, be just a computer program.
That machine would fully assume its artificial status. Just as in a Turing Test environment we could have this conversation:
- What do you feel when you see a beautiful sunset?
- Nothing. I’m blind.
We could have this conversation with our human-level intelligent machine:
- Did you see the Foobar talk show on TV last night?
- No, I don’t have eyes, I’m just a simple-minded conversation program. I leave those kinds of activities to intellectuals like you.
This kind of system could be included in many useful applications. In my article What is (Artificial) Intelligence? I present a road map for building such a system and provide a definition of intelligence compatible with it.
Perhaps now is the time to revisit Alan Turing’s old dream of machines that truly deserve to be called intelligent. Hopefully, this time we will have learned the lessons of past mistakes and will avoid the easy paths that led us astray and proved to lead nowhere.
Comments and suggestions about this page are welcome and should be sent to firstname.lastname@example.org
Rev 1.0 - This page was last modified 2005-07-26 - Copyright © 2004-2005 A.C.Esteves