The Chinese Room
"Don't you see that the whole aim of Newspeak is to narrow the range of thought? In the end we shall make thoughtcrime literally impossible, because there will be no words in which to express it."
Syme, in George Orwell's "1984"
I could not see very far because giants were standing on my shoulders.
John Rogers Searle (born July 1932) is Mills Professor of Philosophy at the University of California, Berkeley, and is noted for contributions to the philosophy of language, the philosophy of mind and consciousness, to the study of socially constructed versus physical realities, and to practical reason.
In 1980 he proposed the Chinese Room thought experiment, which goes like this: a person who understands no Chinese sits in a room into which written Chinese characters are passed. The room also contains a book with a complex set of rules (established ahead of time) for manipulating these characters and passing other characters out of the room. This is done on a rote basis, e.g., "When you see character X, write character Y". The idea is that a Chinese-speaking interviewer passes questions written in Chinese into the room and the corresponding answers come out, so that from the outside it appears as if there were a native Chinese speaker inside. The whole set-up depicts a computer executing instructions (a program) to manipulate abstract symbols.
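In computational terms, the rote procedure just described amounts to a pure lookup table. Here is a deliberately crude sketch, using invented placeholder symbols instead of real Chinese characters (Searle's rulebook is, of course, supposed to be vastly more elaborate than this caricature):

```python
# Caricature of the Chinese Room rulebook: a pure lookup table that
# maps incoming symbol strings to outgoing ones. The "characters"
# X1, Y1, ... are invented placeholders, not actual Chinese.

RULEBOOK = {
    "X1 X2": "Y1",      # "When you see characters X1 X2, write Y1"
    "X3": "Y2 Y3",
}

def chinese_room(incoming: str) -> str:
    """Apply the rulebook by rote; no understanding is involved."""
    return RULEBOOK.get(incoming, "")  # silence when no rule matches

print(chinese_room("X1 X2"))  # -> Y1
```

The point of the sketch is that nothing in `chinese_room` depends on what the symbols mean; whatever meaning the exchange has was put into `RULEBOOK` by whoever wrote it.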
It is Searle's belief that such a system would indeed pass the Turing Test, yet the person manipulating the symbols would obviously not understand Chinese any better than he did before entering the room. Searle then tries to refute the claims of strong AI: that if a machine passes the Turing Test, it can be regarded as "thinking" in the same sense as human thought; or, put another way, that the human mind is some kind of computer running a program.
To say that the debate over the Chinese room thought experiment generated much controversy is an understatement. It raises many profound questions concerning the foundations of Artificial Intelligence. It is certainly the mark of a great mind to be able to shake the beliefs of so many people with a simple thought experiment.
You might be of the opinion that debating the Chinese Room argument again is like flogging the ground where tradition has it that a dead horse once stood. However, these thought experiments, when well designed, as the Chinese Room is, have the bad habit of biting you in the back as soon as you turn away with that warm feeling of mission accomplished in your heart, after having presented the ultimate argument on the subject. The problem is, of course, that as soon as you finish presenting your argument, someone else presents an equally compelling counter-argument. Then, as you prepare to start rolling that rock uphill again, you wonder how much blood there is still in the old gentleman.
In this article, I will try to dismiss Searle’s conclusions, based on the argument that the Chinese Room cannot be considered an intelligent system by any stretch of the word. Then, I will cast my doubts as to the possibility of a system like the Chinese Room being able to pass the Turing Test. Finally I will try to determine what it would take for a system like this to understand Chinese, based on some observations on the use of language.
I would also like to use the Chinese Room thought experiment as an excuse to expound my views on some practical aspects pertaining to the implementation of human-level intelligent devices, which is, after all, the ultimate goal of Artificial Intelligence. The ideas expressed in this article constitute some of the underlying theoretical assumptions that presided the design and implementation of the Corby system.
If you want to know my opinions about the Turing Test read my article The Turing Test. If you want to know what I think about intelligence, in its natural and artificial varieties, read my article What is (Artificial) Intelligence?.
In order to investigate what is going on in the Chinese Room, I went there myself and, using a small part of my extensive Chinese language skills, I asked this simple question: "How do you like Beethoven?". The answer, in Chinese, left no doubt about the matter: "I hate Beethoven; I only like Rachmaninoff".
It happens that I know Beethoven to be the favourite composer of the Englishman in the Chinese Room. As physical rooms, Chinese or otherwise, do not have musical preferences, who in the whole set-up of the Chinese Room dislikes that great musical genius so much? There can be only one answer: someone who is not even there, the man who wrote the rulebook.
The Chinese Room is just a device that allows us to engage in a conversation by proxy with the man who wrote the rulebook. This "conversation by proxy" paradigm is the problem that plagues most Artificial Intelligence systems of this kind in use today. They rely entirely on the programmer's ability to understand the language they deal with, and that is the main cause of their failure to come even close to emulating a human being. The whole thing hinges precisely on the device's ability to understand what it is being told, and until it does, there is not much hope that Artificial Intelligence will come close to reaching its goal. Trying to give an answer without really understanding the question is like trying to solve a problem without really understanding what the problem is. Until you really understand a problem, the chances of finding a solution for it range from dim to non-existent.
A book, from a certain perspective, can be considered a device that enables its author to answer hypothetical questions posed by its readers. To publish a book you go to a publisher who, in turn, contracts a print shop to produce it. In the old days, a book would be typeset manually by a typographer who would pick up letters cast in lead and align them to form the words, then the lines and the pages of the book. It did not matter that the book was written in Latin and the typographer did not understand a word of that language. This is, in essence, what goes on in the Chinese Room, where the Englishman takes the place of the typographer.
Nowadays, just about everybody uses a computer program to typeset a book. This typesetting program is no more intelligent than the whole Chinese Room set-up. The only intelligence in these examples resides in the author's head in the case of the book, or in the head of the man who wrote the rulebook in the case of the Chinese Room.
Any conclusion that we attempt to draw from the Chinese Room experiment relative to intelligent systems is fundamentally flawed due to the fact that the Chinese Room is not an intelligent system.
What would it take, then, for the Chinese Room to become an intelligent system? The answer is deceptively simple: it would need to be able to write the rulebook itself. But that would require the Chinese Room to be able to understand Chinese, and this is by no means a simple matter.
Imagine that there is a fact that you must be aware of because, for some reason, it is relevant to you: your dog is barking in your backyard. You can be made aware of this fact in a number of ways, for instance:
· Although you are not able to see any dog at the moment, you can hear one barking and you can identify it as being your dog; judging by the direction the sound comes from you conclude that it comes from your own backyard.
· You can actually see your dog in the backyard and, although you cannot hear it because your windows are soundproof, the dog is moving as if it were barking.
· Someone calls you on the phone and says: Your dog is barking in the backyard.
· Someone sends you an email saying: Your dog is barking in the backyard.
All the above have one thing in common: They have the ability to change your world model to include a new fact that probably was not there before. The last two instances are of particular interest to us here because they involve the use of language.
It is appropriate for us to discuss this world model concept briefly. Many people disagree with the idea that we have a world model. Their objection can roughly be formulated like this: why do we need a world model when we have the real thing out there?
The need for a world model arises precisely because, in many circumstances, some aspect of the world is not available at the moment that we need it. Then we use our world model, which may not be very accurate or even be properly updated, as the next best substitute.
The world model is what allows us to respond to events shifted in time or space. In the above example of the barking dog, if you move to a place where you cannot see or hear the dog and someone asks you the question "Is that your dog barking in the backyard?", you are able to answer correctly because your world model was updated earlier with that fact. Also using the same example, you may decide to do something about it the next morning. Then, when you wake up the next day, you will be able to respond to an event that occurred the day before, because it was stored as part of your world model.
Other people object to world models because, when they hear the word model, they immediately associate it with formal, scaled-down models. They then imagine that we are talking about having in our heads little houses, with little backyards where little dogs are presently barking. This could not be further from the truth. A weather forecast program that deals with hurricanes does not need to have a little hurricane in the computer's memory. A hurricane is more likely modelled by a set of equations, each one dealing with a particular aspect of hurricane behaviour. In the same way, intelligent systems do not need formal models of the world. What they need is some relevant information that they can check when the need arises to build a response involving some aspect of the real world and the real thing is not directly available at the moment.
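To make the point concrete, a world model in this minimal sense need be nothing more than a store of facts that can be queried later, when the corresponding aspect of the world is out of sight. The following sketch is purely illustrative (the class name and structure are my own assumptions, not a description of any actual system):

```python
# Toy world model: a store of timestamped facts that can be queried
# later, when the corresponding aspect of the world is not directly
# observable. Names and structure are illustrative only.

class WorldModel:
    def __init__(self):
        self.facts = {}  # subject -> (predicate, timestamp)

    def update(self, subject, predicate, timestamp):
        """Record a new fact, e.g. one reported by phone or email."""
        self.facts[subject] = (predicate, timestamp)

    def query(self, subject):
        """Answer from memory when the real thing is unavailable."""
        return self.facts.get(subject)

wm = WorldModel()
wm.update("my dog", "barking in the backyard", "yesterday 18:00")

# Next morning, with the dog neither visible nor audible:
print(wm.query("my dog"))  # -> ('barking in the backyard', 'yesterday 18:00')
```

No little dog lives inside `WorldModel`; it holds only the relevant information needed to answer a question shifted in time.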
The main source for the individual’s world model is the information it collects directly from the environment through its sensors. But some more evolved animals soon discovered that they could improve their world models if one individual in a community, upon perceiving some aspect of the environment relevant to all, would signal the fact through some kind of behaviour. It is common, for instance, for some animals to emit some kind of sound when they perceive a predator in the neighbourhood. This is probably how language was born: As a conventional set of behaviours, which don’t have any meaning by themselves, but get their significance from their relationship with some aspect of the environment.
Humans took language to new heights, using it not only in relation to the environment but also to encompass any aspect of their world models. These in turn include not only aspects of the physical world, but also mental constructs that go beyond that. We also invented a way to record the behaviour associated with language where writing a set of symbols is used to represent that behaviour. Language can then be understood as a tool used by intelligent beings to synchronize their world models, making the knowledge acquired by one individual available to all the individuals in the community.
Language is then a set of conventions implicitly agreed upon by all the individuals in the community. It doesn’t matter if one particular aspect of the environment is associated with this or that behaviour, what really matters is the agreement among the individuals about it. A dog could very well be called xpto, if all English-speaking individuals would agree on that. The same can be said about the symbols used in the written form of the language: Some human languages use the same set of symbols but not all of them do. Again, that is a matter of agreement between the language’s users.
We now come to the crucial part of this article and must confront the question of meaning: how, for instance, do we come to understand that the latter two methods in the list at the beginning of the preceding section mean exactly the same thing as the former two? This is the question of semantics, and it is crucial for Artificial Intelligence to bear fruit.
Stevan Harnad's Symbol Grounding Problem is, from some perspective, an extension of the Chinese Room argument. It starts by proposing a "Chinese dictionary" as the first and only reference offered to an intelligent entity. When the entity receives from the outside questions written in Chinese, it looks up each symbol of the question in the Chinese dictionary. But what it finds there is just other Chinese symbols. From this, Harnad develops some considerations about discrimination and identification, to finally propose the necessity of "grounding" symbols in iconic representations, and these in the distal objects that impinge on our sensory surfaces.
According to this theory, we cannot determine the meaning of words just by explaining them in terms of other words. At some point we need to ground the meaning of words in terms of direct sensory experiences.
We can raise at least three kinds of objection against this theory: the first is based on words that represent abstract concepts, the second on the false dichotomy between two things that are essentially the same, and the third on people with disabilities.
The first argument against the Symbol Grounding theory is related to the understanding of words that represent abstract concepts. The Symbol Grounding theory seems to work very well for things that are in some way related to other things that we can experience directly: A giraffe is like a horse with a long neck, a zebra is like a mule with stripes, and so on. But how on earth do you ground words representing abstract concepts like beauty, honour, courage, shame, freedom or god?
You can try this simple experiment: get yourself a book on some esoteric subject, like a philosophy treatise, and read it until you find a word that you do not understand. At that point get yourself a dictionary, of the kind that does not give examples of word use, and try to understand the word using only the definitions contained in the dictionary. In many instances this will not work; you need something else to understand the word, and this goes against the Symbol Grounding theory, because a dictionary should be able, albeit with some indirection, to explain every symbol in terms of other symbols for which you have the proper grounding.
In the above experiment you must use a dictionary that does not contain examples of word use. This is, by the way, an important clue as to what you need to understand a word: to see it used in context. You will need many examples of context before you can determine the rules that govern the use of the unknown word. You can say that you fully understand the word when you are able to use it correctly in all the appropriate contexts.
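Characterizing a word by the contexts it appears in is essentially what distributional methods in computational linguistics exploit: words used in similar contexts tend to have related uses. A bare-bones co-occurrence sketch (the tiny corpus and the word "xpto" are invented for illustration):

```python
from collections import Counter

# Bare-bones distributional sketch: characterize an unknown word by
# the words that appear around it. The corpus is an invented example.
corpus = [
    "the dog barked in the backyard",
    "the xpto barked in the garden",
    "the dog chased a rabbit",
    "the xpto chased a cat",
]

def context_profile(word, window=2):
    """Count the words occurring within `window` positions of `word`."""
    profile = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, t in enumerate(tokens):
            if t == word:
                lo, hi = max(0, i - window), i + window + 1
                profile.update(tokens[lo:i] + tokens[i + 1:hi])
    return profile

# "xpto" and "dog" end up with heavily overlapping context profiles,
# hinting that the two words are used in the same way.
overlap = set(context_profile("xpto")) & set(context_profile("dog"))
print(sorted(overlap))  # -> ['a', 'barked', 'chased', 'in', 'the']
```

With enough contexts, the profile of the unknown word converges on the "rules that govern its use", which is just what the dictionary without examples fails to provide.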
The false dichotomy argument is based on the fact that you perceive words by exactly the same mechanism by which you perceive the objects they represent. You perceive the written word "dog" because some photons impinge selectively on some areas of your retina. This is exactly the same process by which you perceive a real dog. Why, then, should one of the things take precedence over the other when they are essentially the same?
Now this raises the problem of the transfer of understanding. Once you understand what a dog is, you can associate that understanding with something else, for instance a picture of a dog or the written word "dog". For that you establish some simple relationship, like putting the word "dog" under the picture of a dog, or uttering the word "dog" while pointing to the real thing. But what we are talking about here is transference, not acquisition, of understanding.
Finally, the people-with-disabilities argument deals with the fact that many people are unable to perceive directly many things that people without those disabilities can. However, that fact does not seem to impair their understanding of the world.
One of the most dramatic examples in this respect is that of Helen Keller (Helen Adams Keller, 1880-1968). She was both deaf and blind, yet this did not prevent her from becoming an author, activist, lecturer and otherwise an influential member of her community. She wrote a total of eleven books and authored numerous articles.
But the main problem with the Symbol Grounding theory is that it does not explain how we are able to understand even the things that we can experience directly, or indeed what understanding really is. Imagine that, while taking a walk through the woods, you get a glimpse of an animal that you have never seen before. As we do not know the name of the animal, let us call it X. Then the question arises: “Do you understand X?”.
Surprisingly, we can answer that question in a reasonably precise way. The degree of understanding that you have about something can be expressed as the ratio between the number of questions that you can answer about that thing and the total number of possible questions about it. Expressed as a percentage, this ranges from 0%, when you cannot answer a single question, to 100%, when you can answer all of them.
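The measure fits in one line of code; this sketch simply counts the answerable questions (the question set and known answers are invented placeholders):

```python
# Degree of understanding as the percentage of questions about a
# thing that can be answered. Questions and answers are placeholders.

def understanding(known_answers: set, questions: list) -> float:
    answerable = sum(1 for q in questions if q in known_answers)
    return 100.0 * answerable / len(questions)

questions = ["what does X eat?", "is X a predator?",
             "can X fly?", "where does X hide?"]
known = {"is X a predator?"}

print(f"{understanding(known, questions):.0f}%")  # -> 25%
```

In practice the set of "all possible questions" is open-ended, so the measure is best read as a relative gauge rather than an absolute figure.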
Back to our X: we could ask many questions, like what does X eat, is it a predator, is it a prey, can it fly, where does it hide, and so on. You must concede that, at this point, your understanding of X, in spite of the fact that you have experienced it directly, is very close to 0%. In order to increase your understanding of X you must go back to the woods and see more of it.
Now suppose that you manage to see X again, but this time the animal is perfectly still. You stay there until the cows come home, and at the end of the day you realize that you have not increased your understanding of X by one iota. You could have been looking at a picture of X all day and the result would be the same. Then you realize that what you need to increase your understanding of X is to see how it relates to other things in the environment. For an animate thing, as is the case of X, the best way to do this is to watch how the animal interacts with its environment: if you see it making a nest in a eucalyptus you know where it lives, if you see it chasing a rabbit you know that it is a predator, and so on. In the case of an inanimate object, we can look at things like how living things behave towards the object, or whether the object appears preferentially in the vicinity of some other object.
Now suppose that I tell you a new word: Xdruch. You can gain understanding of this new word using the same mechanisms described above to understand X, precisely because they are the same kind of thing: an element of your environment. What you need here is to discover how Xdruch relates to other elements of the environment, natural or artificial, including, of course, other words.
It can very well happen that, as you gain understanding about Xdruch, you come to the conclusion that the meaning of the word overlaps what you know about X, the animal that you encountered in the woods. Then you can merge the knowledge that you acquired separately into a single entity and from now on, the appearance of the animal or the word now associated with it evokes the same thoughts.
In that case, what some people call grounding is no more than a process of associating the understanding of some element of the environment with some other element. It is a process of transference and has nothing to do with the process of acquiring understanding.
It can happen, of course, that Xdruch is a word that I have just invented and that therefore has no relationships. Then I can establish some artificially, for instance by associating the word with X, the animal that you saw earlier in the forest. Or I can associate it with some other artefact, like, for instance, that beautiful chair that I have just made. In this case we are again just transferring the understanding of some element of the environment to some other element.
Once we know the relationships between some element and other elements of the environment we can proceed to the next step: the discovery of the rules that govern those relationships. This provides us with what we need to attempt a definition of understanding: it is the process of discovering the rules underlying the relationships between elements of the environment. This makes understanding the main objective of intelligence, because once you understand the rules you can make predictions, and that is one of the most powerful survival tools an individual can possess.
The Chinese Room thought experiment is clearly a philosopher's device. Philosophers like to set up these mental constructs, based on impossible premises pulled out of the thin air of the stratosphere. Then, although they virtuously deny it in public, we all know that they secretly wish their little experiments to have an impact on real life; that is human nature and it is understandable. So, when they say that the Chinese Room is able to answer any question that an outsider may pose, there is nothing we can do about it, because this is a premise of the problem.
However, for us, lesser mortals, who have to deal with the realities of everyday life, things are not so simple. Therefore, we must investigate whether it is possible to build such a system, using the knowledge and technologies currently available. As we have seen in a previous section, what is involved here is whether a system that engages in a conversation by proxy with its users can indeed pass the Turing Test.
In my article The Turing Test I make the case that we will probably never build a machine that passes the Turing Test. Not because this is prevented by some well-established law of physics, but because the benefits that such a machine would bring do not justify the costs involved. This is not a defeatist position, as I think that some day we will be able to build human-level (as opposed to human-like) intelligence, if not superior intelligence. In any case, if we ever come close to building a machine that passes the Turing Test it will, almost certainly, be an embodied machine with human-like sensors, capable of interacting with the world very much in the way we do. In other words, it will be a mobile robot.
From the above you may infer that my answer to the question of whether it is possible to build a machine that behaves like the Chinese Room is no. The most compelling argument for this position is the fact that the best minds in the field have been trying to do precisely that for the best part of 50 years, backed by huge amounts of money, and we are no closer to the goal than we were 50 years ago.
Some people, however, are not convinced by this argument and think that perhaps just a little more effort is all that is needed. If you are among them, just consider this: for such a machine to be successful, you have to contemplate every possible question that anyone could ever ask it. Consider the multiple ways in which an idea can be expressed, the little variations like punctuation and the use of synonyms; the number of possible combinations quickly gets out of hand. If the machine were able to get the idea behind the words it would be easier, but that requires understanding, and that is what a machine modelled after the Chinese Room, by definition, cannot do.
But it gets worse. Now consider a normal conversation between two persons. What one person says depends not only on what the other person just said but also on all the previous sentences uttered by both persons during the whole conversation. Not only that, but a response may depend on some aspect of the world as perceived by either of the interlocutors. Then the number of responses for each question that you must contemplate beforehand rises to impossible levels. This raises two very important issues: one is the amount of work needed to set up such a system; the other is the huge amount of storage space it would require.
As if the above were not bad enough, the worst is yet to come. It is not a matter of if, but of when, someone comes up with a question that you have not contemplated. In that event, a system like the Chinese Room would be at a complete loss. Human beings cope nicely with this situation: if everything else fails, they can always resort to asking for clarification. Our mechanical system cannot do that, for the simple reason that the question was not contemplated in the first place. But humans can do even better: they can infer the meaning of the question by establishing how close it is to other questions for which they know the response. Then they can use that response with a degree of confidence proportional to the degree of similarity.
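That human fallback, matching a new question against known ones and answering with proportional confidence, can at least be approximated mechanically. A crude sketch using word-overlap (Jaccard) similarity; all the questions and answers below are invented examples:

```python
# Crude nearest-question matching: answer an unseen question with the
# response of the most similar known question, at a confidence
# proportional to the similarity. All data below is invented.

def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two sentences, in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

KNOWN = {
    "is your dog barking in the backyard": "yes, it is",
    "do you like beethoven": "no, i prefer rachmaninoff",
}

def answer(question: str):
    best = max(KNOWN, key=lambda k: jaccard(question, k))
    confidence = jaccard(question, best)
    return KNOWN[best], confidence

response, conf = answer("is your dog barking outside")
print(response, round(conf, 2))  # -> yes, it is 0.5
```

Word overlap is, of course, a very shallow stand-in for similarity of meaning; the point is only that graded matching already does better than the Chinese Room's all-or-nothing lookup.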
In this section we will investigate what it would take for a machine to learn Chinese. Here the Chinese Room metaphor fails us because it contains an element, the Englishman, which already possesses all that is needed to learn Chinese. Therefore we will abandon for a moment the Chinese Room metaphor and discuss first what is needed for the Englishman to learn Chinese and then ask the same thing for a computer system.
It is reasonable to assume that nobody disputes the fact that the Englishman already has in his brain all the machinery needed to learn Chinese, or indeed any other language. Then the question that remains is whether the Chinese Room provides a suitable environment for him to do so.
There are probably several methods by which we can learn a language. But one popular method, used throughout the world in language schools dedicated to teaching a second language, goes like this: the teacher says a sentence in the target language and lets the student find the appropriate response. If the student is incapable of doing so, the teacher supplies the correct response, lets the student repeat it, and then moves on to the next phrase.
This method is then based on the fundamental fact that the student is supplied with sentence-response pairs and he is able to learn the target language from that. If this is so, then the Chinese Room has everything that the Englishman needs to learn the language. He has access to both the questions posed by outsiders and the respective responses that he can get by following the instructions in the rulebook.
We are in an altogether different situation if we consider the Chinese Room as a computer system running a program. Here we do not have a rulebook written by someone intelligent to provide the responses. Therefore for the system to learn Chinese, we must supply both the questions posed by outsiders and the respective responses.
But the program inside the computer plays the same role as the rulebook in the Chinese Room metaphor. If this rulebook does not contain the responses to the questions directly, what does it contain? This is the question that Artificial Intelligence must answer. If what has been said in previous sections is valid, it will contain primarily instructions that enable the system to discover and exploit useful relationships between elements of the language.
From the above, we must conclude the Chinese Room is not an intelligent system. Therefore, any conclusion that we attempt to draw from the experiment relative to intelligent systems is fundamentally flawed.
This article also raises serious doubts as to whether a system based on the "conversation by proxy" paradigm can ever hope to pass the Turing Test.
On a more positive note, this article shows some avenues to explore in order to tackle the thorny problem of semantics.
So where do we stand in relation to the claims of Strong AI? Back where we started, I suppose. But Artificial Intelligence people should take heart in the fact that the Chinese Room didn't manage to prove that their efforts would never bear fruit.
I would like to end this article by launching a contest to find the best question to ask the Chinese Room, that is, the one that would provide the deepest insight into what is going on there. My entry for that contest would be:
Do you understand Chinese?
Comments and suggestions about this page are welcome and should be sent to firstname.lastname@example.org
Rev 1.0 - This page was last modified 2005-07-22 - Copyright © 2004-2005 A.C.Esteves