The Chinese Room
"Don't you see that the whole aim of
Newspeak is to narrow the range of thought? In the end we shall make
thoughtcrime literally impossible, because there will be no words in which to
express it."
George Orwell as Syme in
"1984"
I could not see very far because giants were
standing on my shoulders.
Anon.
John Rogers
Searle (born July 1932) is Mills Professor of Philosophy at the University
of California, Berkeley, and is noted for contributions in the philosophy of
language, philosophy of mind and consciousness, on the characteristics of
socially constructed versus physical realities, and on practical reason.
In 1980 he
proposed the Chinese Room thought experiment that goes like this: A person who
understands no Chinese sits in a room into which written Chinese characters are
passed. In the room there is also a book containing a complex set of rules
(established ahead of time) to manipulate these characters, and pass other
characters out of the room. This would be done on a rote basis, e.g. "When
you see character X, write character Y". The idea is that a
Chinese-speaking interviewer would pass questions written in Chinese into the
room, and the corresponding answers would come out of the room appearing from
the outside as if there were a native Chinese speaker in the room. This whole
set-up depicts a computer executing instructions (program) to manipulate
abstract symbols.
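The rote procedure described above can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the two rulebook entries and the name `operate_room` are hypothetical, and a real rulebook would have to be unimaginably larger. The point of the sketch is that the operator matches shapes and never consults meaning.

```python
# Hypothetical rulebook, written ahead of time by someone who understands
# Chinese. The two entries are invented for illustration only.
RULEBOOK = {
    "你好吗？": "我很好，谢谢。",        # "How are you?" -> "I am fine, thanks."
    "你叫什么名字？": "我叫小明。",      # "What is your name?" -> "My name is Xiaoming."
}


def operate_room(incoming: str) -> str:
    """Return the characters the rulebook prescribes for the incoming ones.

    Nothing here inspects meaning: it is an exact match on the symbols.
    An input that is not in the rulebook yields an empty slip of paper.
    """
    return RULEBOOK.get(incoming, "")


print(operate_room("你好吗？"))
```

Whatever understanding shows up in the output was put there by whoever filled in `RULEBOOK`, not by `operate_room`.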
It is
Searle's belief that such a system would indeed pass the Turing Test, yet the
person who manipulated the symbols would obviously not understand Chinese any
better than he did before entering the room. Searle proceeds to try to refute
the claims of strong AI: that if a machine were to pass a Turing test, then it
can be regarded as "thinking" in the same sense as human thought; or
put another way, that the human mind is some kind of computer running a
program.
To say that
the debate over the Chinese room thought experiment generated much controversy
is an understatement. It raises many profound questions concerning the
foundations of Artificial Intelligence. It is certainly the mark of a great
mind to be able to shake the beliefs of so many people with a simple thought
experiment.
You might
be of the opinion that debating the Chinese Room argument again is like
flogging the ground where tradition has it that a dead horse once stood.
However these thought experiments, when well designed, as is the case of the
Chinese Room, have this bad habit of biting you in the back as soon as you turn
away, with that warm feeling of mission accomplished in your heart, after
having presented the ultimate argument on the subject. The problem is, of
course, that as soon as you have finished presenting your argument, someone else
presents an equally compelling counter-argument. Then, as you prepare to start
rolling that rock uphill again, you wonder how much blood there is still in
the old gentleman.
In this
article, I will try to dismiss Searle’s conclusions, based on the argument that
the Chinese Room cannot be considered an intelligent system by any stretch of
the imagination. Then, I will cast doubt on the possibility of a system like
the Chinese Room being able to pass the Turing Test. Finally I will try to
determine what it would take for a system like this to understand Chinese,
based on some observations on the use of language.
I would
also like to use the Chinese Room thought experiment as an excuse to expound my
views on some practical aspects pertaining to the implementation of human-level
intelligent devices, which is, after all, the ultimate goal of Artificial
Intelligence. The ideas expressed in this article constitute some of the
underlying theoretical assumptions that presided over the design and implementation
of the Corby
system.
If you want
to know my opinions about the Turing Test read my article The Turing Test. If
you want to know what I think about intelligence, in its natural and artificial
varieties, read my article What is (Artificial)
Intelligence?.
The strange case of the barking dog
In order to
investigate what is going on in the Chinese Room, I went there myself and,
using a small part of my extensive Chinese language skills, I asked this simple
question: “How do you like Beethoven?” The answer, in Chinese, left no doubt
about the matter. It said: “I hate Beethoven; I only like Rachmaninoff.”
It happens
that I know Beethoven to be the favourite composer of the Englishman in the
Chinese Room. As physical rooms, Chinese or otherwise, do not have musical
preferences, who in the whole set-up of the Chinese Room dislikes that
great musical genius so much? There can be only one answer: Someone who is not even
there, the man who wrote the rulebook.
The Chinese
Room is just a device that allows us to engage in a conversation by proxy with
the man who wrote the rulebook. This “conversation by proxy” paradigm is the
problem that plagues most of the Artificial Intelligence systems of this kind in
use today. They rely entirely on the programmer’s ability to understand the
language they deal with. And that is the main cause of their failure to even
come close to emulating a human being. The whole thing hinges precisely on the
device’s ability to understand what it is being told and until it does, there is
not much hope that Artificial Intelligence will even come close to reaching its
goal. Trying to give an answer without really understanding the question is
similar to trying to solve a problem without really understanding what the
problem is. Until you do really understand a problem, the chances of finding a
solution for it vary from dim to non-existent.
A book,
from a certain perspective, can be considered a device that enables its author to
answer hypothetical questions posed by its readers. To publish a book you go to
a publisher who, in turn, contracts a printing house to produce the book. In the
old days, a book would be typeset manually by a typographer who would pick up
letters cast in lead and align them together to form the words, then the
lines and the pages of the book. It did not matter that the book was written in
Latin and the typographer did not understand a word of that language. This is,
in essence, what goes on in the Chinese Room, where the Englishman takes the
place of the typographer.
Nowadays,
just about everybody uses a computer program to typeset a book. This
typesetting program is no more intelligent than the whole Chinese Room set-up.
The only intelligence in these examples resides in the author’s head in the
case of the book or in the head of the man who wrote the rulebook in the case
of the Chinese Room.
Any
conclusion that we attempt to draw from the Chinese Room experiment relative to
intelligent systems is fundamentally flawed due to the fact that the Chinese
Room is not an intelligent system.
What would
it take then for the Chinese Room to become an intelligent system? The answer is
deceptively simple: It would need to be able to write the rulebook itself. But
that would require that the Chinese Room be able to understand Chinese and
this is by no means a simple matter.
Imagine
that there is this fact that you must be aware of, because, for some reason it
is relevant to you: Your dog is barking in your backyard. You can be made aware
of this fact in a number of ways, for instance:
· Although you are not able to see any dog at the moment, you can hear one barking and you
can identify it as being your dog; judging by the direction the sound comes
from you conclude that it comes from your own backyard.
· You can actually see your dog in the backyard and, although you cannot hear it
because your windows are soundproof, the dog is moving as if it were barking.
· Someone calls you on the phone and says: Your dog is barking in the backyard.
· Someone sends you an email saying: Your dog is barking in the backyard.
All the
above have one thing in common: They have the ability to change your world
model to include a new fact that probably was not there before. The last two
instances are of particular interest to us here because they involve the use of
language.
It is
appropriate for us to discuss briefly this world model concept. Many people
disagree with this idea of us having a world model. Their objection can roughly
be formulated like this: Why do we need a world model when we have the real
thing out there?
The need
for a world model arises precisely because, in many circumstances, some aspect
of the world is not available at the moment that we need it. Then we use our
world model, which may not be very accurate or even be properly updated, as the
next best substitute.
The world
model is what allows us to respond to events shifted in time or space. In the
above example of the barking dog, if you move to a place where you cannot see
or hear the dog and someone asks you the question “Is that your dog barking in
the backyard?”, you are able to answer correctly because your world model has
been updated before with that fact. Also using the same example, you may decide
to do something about it the next morning. Then, when you wake up the next day,
you will be able to respond to an event that occurred the day before, because it
was stored as part of your world model.
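To make the idea concrete, here is a minimal sketch of a world model as a store of facts that is updated when information arrives and consulted later, when the real thing is no longer available. The class name `WorldModel` and its methods are my own assumptions for the purpose of illustration; nothing here claims to describe how a brain, or any particular system, actually does it.

```python
from dataclasses import dataclass, field


@dataclass
class WorldModel:
    """A store of facts, each tagged with the channels that delivered it."""
    facts: dict = field(default_factory=dict)

    def update(self, fact: str, channel: str) -> None:
        # The same fact can arrive by hearing, sight, phone or email;
        # the model keeps the fact and remembers every channel.
        self.facts.setdefault(fact, []).append(channel)

    def knows(self, fact: str) -> bool:
        # Answering consults the model, not the world itself.
        return fact in self.facts


model = WorldModel()
model.update("my dog is barking in the backyard", channel="phone call")
model.update("my dog is barking in the backyard", channel="email")

# Later, out of sight and earshot of the dog:
print(model.knows("my dog is barking in the backyard"))
```

Note that the phone call and the email end up as the same entry: the channel differs, the fact stored does not.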
Other
people object to world models because, when they hear the word model they
immediately associate it with formal, scaled-down models. Then they imagine that
we are talking about having in our heads little houses, with little backyards
where little dogs are presently barking. This could not be further from the
truth. A weather forecast program that deals with hurricanes does not need to
have a little hurricane in the computer memory. A hurricane is probably
modelled by a set of equations, each one dealing with a particular aspect of
hurricane behaviour. In the same way, intelligent systems do not need formal
models of the world. What they need is some relevant information that they can
check when the need arises to build a response involving some aspect of the
real world and the real thing is not directly available at the moment.
The main
source for the individual’s world model is the information it collects directly
from the environment through its sensors. But some more evolved animals soon
discovered that they could improve their world models if one individual in a
community, upon perceiving some aspect of the environment relevant to all, would
signal the fact through some kind of behaviour. It is common, for instance, for
some animals to emit some kind of sound when they perceive a predator in the
neighbourhood. This is probably how language was born: As a conventional set of
behaviours, which don’t have any meaning by themselves, but get their
significance from their relationship with some aspect of the environment.
Humans took
language to new heights, using it not only in relation to the environment but
also to encompass any aspect of their world models. These in turn include not
only aspects of the physical world, but also mental constructs that go beyond
that. We also invented a way to record the behaviour associated with language,
namely writing, in which a set of symbols is used to represent that behaviour. Language
can then be understood as a tool used by intelligent beings to synchronize
their world models, making the knowledge acquired by one individual available
to all the individuals in the community.
Language is
then a set of conventions implicitly agreed upon by all the individuals in the
community. It doesn’t matter if one particular aspect of the environment is
associated with this or that behaviour, what really matters is the agreement
among the individuals about it. A dog could very well be called xpto, if all
English-speaking individuals would agree on that. The same can be said about
the symbols used in the written form of the language: Some human languages use
the same set of symbols but not all of them do. Again, that is a matter of
agreement between the language’s users.
We now come
to the crucial part of this article and must confront the question of meaning:
How, for instance, do we come to understand that the latter two methods in the
list at the beginning of the preceding section mean exactly the same thing as
the former two? This is the question of semantics and it is crucial for
Artificial Intelligence to bear fruit.
Stevan
Harnad's Symbol Grounding Problem is, from some perspective, an extension of the
Chinese Room argument. It starts by proposing the "Chinese dictionary"
as the first and only reference offered to an intelligent entity. When it
receives from the outside questions written in Chinese, it looks up each symbol
of the question in the Chinese dictionary. But what it finds there is just
other Chinese symbols. From this, Harnad develops some considerations about
discrimination and identification to finally propose the necessity of
"grounding" symbols in iconic representations, and these in turn in the
projections of distal objects on our sensory surfaces.
According
to this theory, we cannot determine the meaning of words just by explaining
them in terms of other words. At some point we need to ground the meaning of
words in terms of direct sensory experiences.
We can
raise at least three kinds of objections against this theory: The first one is
based on words that represent abstract concepts, the second is based on the
false dichotomy between two things that are essentially the same and finally a
third one based on people with disabilities.
The first
argument against the Symbol Grounding theory is related to the understanding of
words that represent abstract concepts. The Symbol Grounding theory seems to
work very well for things that are in some way related to other things that we
can experience directly: A giraffe is like a horse with a long neck, a zebra is
like a mule with stripes, and so on. But how on earth do you ground words
representing abstract concepts like beauty, honour, courage, shame, freedom or
god?
You can try
this simple experiment: Get yourself a book on some esoteric subject, like a
philosophy treatise, and read it until you find a word that you do not understand.
At that point get yourself a dictionary, of the kind that does not have
examples of word use, and try to understand the word using only the definitions
contained in the dictionary. In many instances, this will not work; you need
something else to understand the word and this goes against the Symbol
Grounding theory because a dictionary should be able, albeit with some
indirections, to explain every symbol in terms of other symbols for which you
have the proper grounding.
In the
above experiment you must use a dictionary that does not contain examples of
word use. This is, by the way, an important clue as to what you need to
understand a word: To see it used in context. You will need many examples of
context until you can determine the rules that govern the use of the unknown word.
You can say that you fully understand the word when you are able to use it
correctly in all the appropriate contexts.
The false
dichotomy argument is based on the fact that you perceive words by exactly the
same mechanism by which you perceive the objects they represent. You perceive the
written word “dog” because some photons impinge selectively on some areas of
your retina. This is exactly the same process by which you perceive a real dog.
Then why should one of these things take precedence over the other when they are
essentially the same?
Now this
raises the problem of the transfer of understanding. Once you understand what a
dog is, you can associate that understanding with something else, for instance
a picture of a dog or the written word “dog”. For that you establish some
simple relationship like putting the word “dog” under the picture of a dog or
uttering the word “dog” while pointing to the real thing. But what we are
talking about here is transference, not acquisition of understanding.
Finally,
the people with disabilities argument deals with the fact that many people are
not able to perceive directly many of the things that people without those
disabilities can. However, that fact doesn’t seem to impair their understanding of
the world.
One of the
most dramatic examples in this respect is that of Helen Keller (Helen Adams
Keller, 1880-1968). She was both deaf and blind and this did not prevent her
from becoming an author, activist, lecturer and otherwise an influential member
of her community. She wrote a total of eleven books, and authored numerous
articles.
But the
main problem with the Symbol Grounding theory is that it does not explain how
we are able to understand even the things that we can experience directly, or
indeed what understanding really is. Imagine that, while taking a walk through
the woods, you get a glimpse of an animal that you have never seen before. As
we do not know the name of the animal, let us call it X. Then the question
arises: “Do you understand X?”.
Surprisingly,
we can define a good way to answer that question in a reasonably precise way.
The degree of understanding that you have about something can be expressed as a
ratio between the number of questions that you can answer about that thing and
the total number of possible questions about the thing. Expressed as a
percentage, this ranges from 0%, when you cannot answer a single question, to
100%, when you can answer all the questions.
Back to our
X, we could ask many questions like what does X eat, is it a predator, is it a
prey, can it fly, where does it hide and so on. You must concede that, at this
point, your understanding of X, in spite of the fact that you have experienced
it directly, is very close to 0. In order to increase your understanding of X
you must go back to the woods and see more of it.
Now suppose
that you manage to see X again, but this time the animal is perfectly still.
You stay there, until the cows come home and at the end of the day you realize
that you did not increase your understanding of X by one iota. You could have
been looking at a picture of X all day and the result would be the same. Then
you realize that what you need to increase your understanding of X is to see
how it relates to other things in the environment. For an animate thing, as is
the case of X, the best way to do this is to watch how the animal interacts
with its environment: If you see it making a nest in a eucalyptus you will know
where it lives, if you see it chasing a rabbit you will know that it is a
predator, and so on. In the case of an inanimate object, we can look at things
like how other living things behave towards the object, or if the object
appears preferentially in the vicinity of some object.
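The ratio defined earlier, and the way observation raises it, can be illustrated with a short sketch. Two assumptions should be stated loudly: the real set of possible questions about X is open-ended, so the finite list below merely stands in for it, and the function name `understanding` is mine.

```python
def understanding(answers: dict) -> float:
    """Percentage of the listed questions about a thing that can be answered.

    `answers` maps each question to an answer, or to None when unknown.
    The finite list is a stand-in for the open-ended set of questions.
    """
    if not answers:
        return 0.0
    answered = sum(1 for answer in answers.values() if answer is not None)
    return 100.0 * answered / len(answers)


about_x = {
    "What does X eat?": None,
    "Is it a predator?": None,
    "Is it a prey?": None,
    "Can it fly?": None,
    "Where does it hide?": None,
}
print(understanding(about_x))  # 0.0 after a single glimpse of X

# Watching X interact with its environment answers one question.
about_x["Is it a predator?"] = "yes, it was seen chasing a rabbit"
print(understanding(about_x))  # 20.0
```

Staring at a motionless X all day would leave every entry at `None`; only an observed relationship fills one in.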
Now,
suppose that I tell you a new word: Xdruch. You can gain understanding about
this new word using the same mechanisms described above to understand X,
precisely because they are the same thing: An element of your environment. What
you need here is to discover how Xdruch relates to other elements, natural or
artificial, of the environment including, of course, other words.
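As a rough sketch of what discovering such relationships might look like when the only other elements available are words, the following counts which words keep company with the new one. The example sentences, the stop-word list and the function name `neighbours` are all invented for illustration; this is a crude stand-in for the idea, not a claim about how the process really works.

```python
from collections import Counter

# Invented usage examples for the unknown word.
SENTENCES = [
    "the xdruch chased a rabbit",
    "a hungry xdruch chased a rabbit",
    "the rabbit hid from the xdruch",
    "the chair stood in the corner",
]

# Words too common to tell us anything (an assumed, minimal list).
STOP_WORDS = {"the", "a", "from", "in"}


def neighbours(word: str, corpus: list) -> Counter:
    """Count the words that appear in the same sentences as `word`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        if word in tokens:
            counts.update(
                t for t in tokens if t != word and t not in STOP_WORDS
            )
    return counts


print(neighbours("xdruch", SENTENCES).most_common(2))
```

With these sentences the word turns out to be strongly related to "rabbit" and "chased" and not at all to "chair", which is already a small piece of understanding gained without any direct experience of the thing named.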
It can very
well happen that, as you gain understanding about Xdruch, you come to the
conclusion that the meaning of the word overlaps what you know about X, the
animal that you encountered in the woods. Then you can merge the knowledge that
you acquired separately into a single entity and from now on, the appearance of
the animal or the word now associated with it evokes the same thoughts.
In that
case, what some people call grounding is no more than a process of associating
the understanding of some element of the environment with some other element.
It is a process of transference and has nothing to do with the process of
acquiring understanding.
It can
happen, of course, that Xdruch is a word that I have just invented and
therefore, it will not have any relationships. Then I can establish some artificially,
for instance by associating the word with X, the animal that you saw earlier in
the forest. Or I can associate it with some other artefact, like, for instance
that beautiful chair that I have just made. In this case we are again just
transferring the understanding of some element of the environment to some
other element.
Once we
know the relationships between some element and other elements of the
environment we can proceed to the next step: The discovery of the rules that
govern those relationships. This provides us with what we need to make an
attempt at a definition of understanding: It is the process of discovering the
rules underlying the relationships between elements of the environment. This
makes understanding the main objective of intelligence, because once you
understand the rules you can make predictions and this is one of the most
powerful survival tools that an individual can possess.
The Chinese
Room thought experiment is clearly a philosopher’s device. They like to set up
these mental constructs, based on impossible premises that they pull out of the
thin air of the stratosphere. Then, although they virtuously deny it in public, we
all know that they secretly wish that their little experiments will have an impact on
real life; that is human nature and it is understandable. So, when they say
that the Chinese Room is able to answer any question that an outsider may pose,
there is nothing we can do about it because this is a premise of the problem.
However,
for us, lesser mortals, who have to deal with the realities of everyday life,
things are not so simple. Therefore, we must investigate whether it is possible
to build such a system, using the knowledge and technologies currently
available. As we have seen in a previous section, what is involved here is
whether a system that engages in a conversation by proxy with its users can
indeed pass the Turing Test.
In my
article The Turing
Test I make the case that probably we will never build a machine that
passes the Turing Test. Not that this is prevented by some well-established law
of physics, but because the benefits that such a machine would bring do not
justify the costs involved. This is not a defeatist position, as I think that
some day we will be able to build human-level (as opposed to human-like)
intelligence, if not superior. In any case, if we ever come close to building a
machine that passes the Turing Test it will, almost certainly, be an embodied
machine with human-like sensors and capable of interacting with the world very
much in the way we do. In other words, it will be a mobile robot.
From the
above you must infer that the answer to the question of whether it is possible
to build a machine that behaves like the Chinese Room is no. The most
compelling argument for this position is really the fact that the best minds in
the field have been trying precisely that for the best part of 50 years now,
backed by huge amounts of money and we are no closer to the goal than we were
50 years ago.
Some
people, however, are not convinced by this argument and think that perhaps just
a little more effort is what is needed. If you are among those, just consider
this: For such a machine to be successful you have to contemplate every
possible question that someone could ever ask the machine. Just consider the
multiple ways in which an idea can be expressed, the little variations like
punctuation and the use of synonyms. The number of possible combinations gets
quickly out of hand. If the machine were able to get the idea behind the words,
it would be easier, but that requires understanding and that is what a machine
modelled after the Chinese Room, by definition, cannot do.
But it gets
worse. Now consider a normal conversation between two persons. What one person
says depends not only on what the other person just said but also on all the
previous sentences uttered by both persons during the whole conversation. Not
only that but a response may depend on some aspect of the world, as perceived
by any of the interlocutors. Then the number of responses for each question
that you must contemplate beforehand just rises to impossible levels. This
raises two very important issues: One is the amount of work needed to set up
such a system; the other is the huge amount of storage space for such a thing.
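A back-of-the-envelope calculation shows how quickly this gets out of hand. The sketch below assumes, purely for illustration, that each turn of the conversation can be one of a fixed number of distinct sentences and that a response depends on the whole history so far. Both the figure of 1000 sentences per turn and the function name `rulebook_entries` are hypothetical, and the sketch ignores the state of the world entirely, so it can only underestimate the problem.

```python
def rulebook_entries(sentences_per_turn: int, turns: int) -> int:
    """Number of distinct conversation histories of the given length.

    If the response depends on everything said so far, the rulebook
    needs one entry for each of these histories.
    """
    return sentences_per_turn ** turns


for turns in (1, 2, 5, 10):
    print(turns, rulebook_entries(1000, turns))
```

With these assumed numbers, a conversation only ten turns long already calls for 10^30 entries, each of which someone would have to write and somewhere would have to store.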
As if the
above were not bad enough, the worst is yet to come. It is not a matter of if,
it is a matter of when someone comes up with a question that you have not
contemplated. In that event, a system like the Chinese Room would be at a
complete loss. Human beings can cope nicely with this situation: If all
else fails, they can always resort to asking for clarifications. Our mechanical
system cannot do that, for the simple reason that the question was not
contemplated in the first place. But humans can do even better: They can infer
the meaning of the question by establishing how close it is to other questions
for which they know the response. Then they can use that with a degree of confidence
that is proportional to the degree of similarity.
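The behaviour just described, answering by similarity to known questions and asking for clarification when nothing is close enough, can be sketched as follows. The sketch uses `difflib.SequenceMatcher` from the Python standard library, which measures similarity of the characters, not of the meaning, so it is only a crude stand-in for what humans do. The table `KNOWN`, the second answer in it, the threshold of 0.6 and the function names are assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Questions for which a response is already known (illustrative entries).
KNOWN = {
    "how do you like beethoven?": "I hate Beethoven; I only like Rachmaninoff.",
    "is that your dog barking in the backyard?": "Yes, that is my dog.",
}

CLARIFY = "Could you clarify what you mean?"


def similarity(a: str, b: str) -> float:
    """Surface similarity of two strings, between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def answer(question: str, threshold: float = 0.6):
    """Return (response, confidence) for the closest known question.

    The confidence is simply the degree of similarity; below the
    threshold the system asks for clarification instead of guessing.
    """
    best = max(KNOWN, key=lambda known: similarity(question, known))
    confidence = similarity(question, best)
    if confidence < threshold:
        return CLARIFY, confidence
    return KNOWN[best], confidence


print(answer("How do you like Beethoven"))
print(answer("xyz"))
```

A rulebook of the Chinese Room kind has no counterpart to the `confidence` value: an input either matches an entry exactly or it does not.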
In this
section we will investigate what it would take for a machine to learn Chinese.
Here the Chinese Room metaphor fails us because it contains an element, the
Englishman, which already possesses all that is needed to learn Chinese.
Therefore we will abandon for a moment the Chinese Room metaphor and discuss
first what is needed for the Englishman to learn Chinese and then ask the same
thing for a computer system.
It is
reasonable to assume that nobody disputes the fact that the Englishman already
has in his brain all the machinery needed to learn Chinese, or indeed any other
language. Then the question that remains is whether the Chinese Room provides a
suitable environment for him to do so.
There are
probably several methods by which we learn our native language. But one popular
method, used throughout the world in language schools dedicated to teaching a
second language goes like this: The professor says some sentence in the target
language and lets the student find the appropriate response. If the student is
incapable of doing so, the professor supplies the correct response, lets the
student repeat it and then moves on to the next phrase.
This method
is then based on the fundamental fact that the student is supplied with
sentence-response pairs and he is able to learn the target language from that.
If this is so, then the Chinese Room has everything that the Englishman needs
to learn the language. He has access to both the questions posed by outsiders
and the respective responses that he can get by following the instructions in
the rulebook.
We are in
an altogether different situation if we consider the Chinese Room as a computer
system running a program. Here we do not have a rulebook written by someone
intelligent to provide the responses. Therefore for the system to learn
Chinese, we must supply both the questions posed by outsiders and the
respective responses.
But the
program inside the computer plays the same role as the rulebook in the Chinese
Room metaphor. If this rulebook does not contain the responses to the questions
directly, what does it contain? This is the question that Artificial
Intelligence must answer. But if what has been said in previous sections is
valid, it will contain primarily instructions that enable it to discover and
explore useful relationships between elements of the language.
From the
above, we must conclude that the Chinese Room is not an intelligent system.
Therefore, any conclusion that we attempt to draw from the experiment relative
to intelligent systems is fundamentally flawed.
This
article also raises serious doubts as to whether a system based on the
“conversation by proxy” paradigm can aspire to one day be able to pass the
Turing Test.
On a more
positive note, this article shows some avenues to explore in order to tackle
the thorny problem of semantics.
So where do
we stand in relation to the claims of Strong AI? Back where we started, I
suppose. But Artificial Intelligence people should take heart in the fact that
the Chinese Room didn't manage to prove that their efforts would never bear
fruit.
I would
like to end this article by launching a contest to find the best question to
ask the Chinese Room, that is, the one that would provide the deepest insight
into what is going on there. My entry for that contest would be:
Do you understand Chinese?
Comments and suggestions about this page are welcome and should be sent
to fadevelop@clix.pt
Rev 1.0 - This page was last modified
2005-07-22 - Copyright © 2004-2005 A.C.Esteves