The Resources Page
This is the
resources page of the Operation Manual for the Corby application. The other
pages of the manual are:
Main page – This is the main page of the Operation Manual.
Using Corby – Describes the usual operation procedures.
Learning – Explains how Corby learns.
Options – Describes the options that control application behaviour.
Knowledge base – Describes back-up, restore and recovery procedures.
This page includes the following sections:
Maximum time for an interactive response
Maximum time for a batch response
Maximum size of the Knowledge Base
The Corby application is very demanding in terms of computing resources. This
is true of hard-disk bandwidth, but it is particularly acute as far as
processing power is concerned. Personal computers available today are barely
able to cope with it.
This should come as a surprise to no one, as what we are trying to do is
simulate the human brain, which has billions of computing devices working in
parallel: the neurones. It is true that each individual device, in itself, is
not very powerful in terms of either functionality or speed. But several
million of them working in parallel amount to considerable computing power.
Every
effort has been made during the design and coding phases of the Corby
application to ensure that the program is as efficient as possible. But, at the
end of the day, the only way to cope with the problem is to trade computing
resources for quality, response time or both.
Corby lets you adjust some of those aspects to your machine, so that you can
make the most of the computing resources that you have. This page contains
several sections, each one dedicated to one parameter that you can adjust,
explaining the trade-offs involved and how they affect program behaviour.
This
parameter appears in the Advanced Options dialog box and defines the size of
the learning context, that is, the number of paragraphs before the current one
that Corby keeps for learning purposes.
The basic learning model that Corby follows is based on stimulus-response
pairs: the system must learn the appropriate response to a given stimulus.
However, in many instances the response depends not only on the stimulus but
also on the paragraphs that immediately precede it. In other words, it depends
on the context.
To establish the required relationships between a response and the context,
Corby scans all the elements in the current context. The bigger the context,
the more elements Corby has to scan, and therefore the heavier the learning
process is in terms of computing resources.
You should
adjust this parameter to the minimum possible that still allows you to include
in the context all the information required in a learning situation.
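The trade-off can be pictured with a minimal sketch. It assumes the learning context behaves like a fixed-size sliding window over recent paragraphs; the class name, the `size` argument and the word-level notion of an "element" are illustrative assumptions, not Corby's actual internals.

```python
from collections import deque


class LearningContext:
    """Hypothetical sliding window over the most recent paragraphs."""

    def __init__(self, size):
        # `size` stands in for the context-size parameter (name assumed).
        self.paragraphs = deque(maxlen=size)

    def add(self, paragraph):
        # Once the window is full, the oldest paragraph drops out.
        self.paragraphs.append(paragraph)

    def elements(self):
        # Everything a learner would have to scan for one learning step.
        # Treating each word as one element is an assumption.
        return [word for p in self.paragraphs for word in p.split()]


context = LearningContext(size=2)
for text in ["Hello there", "How are you", "Fine thanks"]:
    context.add(text)

print(list(context.paragraphs))  # the two most recent paragraphs
print(len(context.elements()))   # amount of scanning work per learning step
```

In this sketch, a larger `size` makes `elements()` return more items to scan, which mirrors the cost described above.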
This parameter, which appears in the Advanced Options dialog box, is perhaps
the most important of all in terms of its effect on the use of computing
resources.
This parameter controls how many links each knowledge element maintains with
other elements in the knowledge base. It indirectly affects things such as the
maximum size of the knowledge base and the quality of the concepts formed by
Corby.
You should
be very careful in changing this parameter because the effects of a change will
not be immediately apparent. Moreover, when you decrease this value, the
program will discard the links that are the least important and this means
losing some information, which can only be replaced through learning.
If the overall response time is far too slow on your system, you should
decrease this value. If you conclude that the knowledge base is growing too
slowly, you can increase this value, keeping in mind that Corby will then
require more system resources in most of its operations.
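What happens when the value is decreased can be sketched as follows. This is only an illustration: it assumes each link carries a numeric weight expressing its importance, and `prune_links` and `max_links` are hypothetical names, not part of Corby.

```python
def prune_links(links, max_links):
    """Keep only the `max_links` strongest links of one knowledge element.

    `links` maps a target element id to a link weight. Both the mapping
    and the idea of a numeric weight are assumptions for illustration.
    """
    strongest = sorted(links.items(), key=lambda item: item[1], reverse=True)
    return dict(strongest[:max_links])


links = {"cat": 0.9, "dog": 0.4, "fish": 0.1, "bird": 0.6}
print(prune_links(links, max_links=2))  # only the two strongest links survive
```

The discarded links are gone for good in this sketch, which is why the text above warns that the lost information can only be recovered through further learning.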
This parameter appears in the Advanced Options dialog box and controls how
deeply Corby will look for correlations between a response and elements of the
learning context. The greater the depth, the more efficient the learning
process will be. The drawback is again the time needed to process a learning
item.
You should lower the value of this parameter if the learning process takes too
long, keeping in mind that by doing so you will decrease learning efficiency.
You should increase the value of this parameter if doing so does not
noticeably increase the response time.
The maximum time, expressed in seconds, that Corby spends looking for a
response is determined by this parameter, which appears in the Advanced dialog
box.
During the time allowed, the system will try to find the best possible
response. The time that this process takes depends on several factors, such as
the number of possible responses available and the level of inference that
must be applied. If the time elapses, the system will return the best response
it has found so far.
This is really the time that you are willing to wait for a response from the
system. You can adjust it to any value you find comfortable, knowing that too
small a value will give lower quality responses.
Corby shows
you that the response timed out in the Status Bar of the main window. If you
see many responses with that indication, it means that your computer is not
fast enough and you must put up with a longer response time.
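The time-limited search described in this section can be sketched as a loop with a deadline. The function below is a generic illustration under stated assumptions, not Corby's code: `best_response`, `score` and `max_seconds` are hypothetical names, and candidate responses are assumed to be examined one at a time.

```python
import time


def best_response(candidates, score, max_seconds):
    """Return (best candidate found, timed_out) within a time budget."""
    deadline = time.monotonic() + max_seconds
    best, best_score, timed_out = None, float("-inf"), False
    for candidate in candidates:
        if time.monotonic() >= deadline:
            timed_out = True  # budget spent: keep the best found so far
            break
        value = score(candidate)
        if value > best_score:
            best, best_score = candidate, value
    return best, timed_out


# Here the length of the text stands in for response quality.
answer, timed_out = best_response(
    ["hi", "hello", "good morning"], len, max_seconds=5.0
)
print(answer, timed_out)
```

The `timed_out` flag plays the role of the indication shown in the Status Bar: a caller that sees it set often could decide to allow a larger budget.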
This is the
equivalent of the "Maximum time for an interactive response"
parameter but now for writing files. It appears in the Advanced dialog box.
The reason for having two parameters is that you probably want to set a
different value for each situation: when Corby is writing a file, you are not
required to attend to the system for each response it finds.
This group of buttons in the General Options dialog box defines how Corby
should treat text files. It has three settings: "Chained", "Independent" and
"Channel specific". The last one is not used in the current version of the
software.
Usually a text file contains a set of paragraphs where each one follows
logically from the preceding paragraph or, according to Corby’s base paradigm,
one paragraph is the stimulus and the one that immediately follows it is the
response. To process a file like this, you should use the "Chained" setting.
If, however, you have a file where each paragraph is independent of its
neighbours, you should submit it with the "Independent" setting. Learning in
this case will not be very efficient, but it is much faster. This is useful if
you have a file containing, for instance, a set of famous sentences. But even
with normal files, in the earlier phases of the learning process, when Corby is
picking up the basic elements of the language, you can use this setting to
speed things up.
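The difference between the two settings can be illustrated by how a file's paragraphs might be turned into learning items. This is a sketch under assumptions: the function name is hypothetical, and representing an independent paragraph as a pair with no stimulus is a guess made for illustration, not a description of Corby's internals.

```python
def learning_items(paragraphs, mode):
    """Turn a file's paragraphs into (stimulus, response) learning items."""
    if mode == "chained":
        # Each paragraph is the stimulus for the one that follows it.
        return list(zip(paragraphs, paragraphs[1:]))
    if mode == "independent":
        # Each paragraph stands alone, with no stimulus attached (assumed).
        return [(None, paragraph) for paragraph in paragraphs]
    raise ValueError(f"unknown mode: {mode}")


paragraphs = ["Good morning.", "Good morning to you.", "How are you?"]
print(learning_items(paragraphs, "chained"))
print(learning_items(paragraphs, "independent"))
```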
In order to
increase overall system performance, the knowledge base works in conjunction
with a memory cache, where the most recently used records are kept. The cache
size, expressed as a percentage of total system physical memory, is a parameter
of the application that can be set in the General Options dialog box.
Ideally, the cache should keep the records most likely to be needed in a given
situation. That, however, is impossible to determine. Therefore, it keeps the
most recently used records in the hope that they will be reused later.
You should adjust the cache size to the value that gives the best overall
results, knowing that too small a cache increases hard-disk activity and too
large a cache slows down access to all the records in the cache.
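Keeping the most recently used records is the classic least-recently-used (LRU) policy, sketched below. The sketch makes several assumptions: the capacity is counted in records rather than as a percentage of physical memory, and `RecordCache`, `load_record` and `disk_reads` are hypothetical names.

```python
from collections import OrderedDict


class RecordCache:
    """Hypothetical cache that keeps the most recently used records."""

    def __init__(self, capacity, load_record):
        self.capacity = capacity          # number of records kept (assumed unit)
        self.load_record = load_record    # stands in for a hard-disk read
        self.records = OrderedDict()
        self.disk_reads = 0

    def get(self, key):
        if key in self.records:
            self.records.move_to_end(key)  # mark as most recently used
            return self.records[key]
        self.disk_reads += 1               # cache miss: go to the disk
        record = self.load_record(key)
        self.records[key] = record
        if len(self.records) > self.capacity:
            self.records.popitem(last=False)  # drop the least recently used
        return record


cache = RecordCache(capacity=2, load_record=lambda key: f"record-{key}")
for key in [1, 2, 1, 3, 2]:
    cache.get(key)
print(cache.disk_reads)  # misses caused by the small capacity
```

With a larger `capacity` the same access sequence causes fewer disk reads, which is the first half of the trade-off described above. The sketch does not model the slowdown of a very large cache.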
This
parameter appears in the General Options dialog box and specifies the maximum
size of the knowledge base, expressed in Mbytes.
When the real size of the knowledge base, as shown in the Statistics dialog
box, exceeds the value set by this parameter, Corby starts deleting records
according to their relative importance. If you have already read the Knowledge
base page of the Operation Manual, you know that each record in the knowledge
base contains many references to other records and that Corby cannot deal with
a reference to a non-existing record.
Therefore,
a record that is referenced by any other record in the knowledge base cannot be
deleted because referential integrity would be compromised. Corby approaches this
problem by progressively weakening all the links in all records in the
knowledge base. When a link goes below a given threshold, it is broken. Note
that deleting a link to a record does not compromise referential integrity;
only deleting a referenced record is problematic. As soon as a record has no
more references to it, it can be safely deleted.
This process, as you can imagine, is very heavy in terms of computing
resources. It is therefore handled by a low-priority background process that
only runs when there is nothing else to do. Consequently, you should not expect
Corby to set the knowledge base size to the value you want instantly. It will
take some time, especially if the application is doing something else.
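The link-weakening scheme described above can be sketched as a single pass over the knowledge base. Everything specific here is an assumption made for illustration: the multiplicative `factor`, the `threshold` value, the dictionary layout, and the simplification that any record with no incoming links is deleted straight away.

```python
def decay_step(records, factor, threshold):
    """One weakening pass over every link, then delete unreferenced records.

    `records` maps a record id to {target id: link strength}.
    """
    for links in records.values():
        for target in list(links):
            links[target] *= factor
            if links[target] < threshold:
                del links[target]  # breaking a link is always safe
    # A record may only go once no other record references it.
    referenced = {target for links in records.values() for target in links}
    for record_id in list(records):
        if record_id not in referenced:
            del records[record_id]
    return records


records = {
    "a": {"b": 0.9, "c": 0.2},
    "b": {"a": 0.8},
    "c": {},
}
decay_step(records, factor=0.5, threshold=0.2)
print(sorted(records))  # "c" lost its only incoming link and was deleted
```

In a real system such a pass would presumably be repeated in the background until the knowledge base fits the configured size; the sketch shows only one pass.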
Of course the TANSTAAFL (There ain’t no such thing as a free lunch) principle
applies here: using the above process will cause Corby to forget things.
Fortunately, the things it forgets are the least important ones, given the
orderly progress of the deletion process. Also, the most valuable things in the
knowledge base, concepts, are less prone to be affected by the process. This
weakening of the links simulates the natural decay that explains why we forget
things over time.
You set this parameter to the value you intend for the size of the knowledge
base. However, until that limit is reached, you should set it to a value
slightly above the current size. The reason for this latter setting is that
link and record deletion is actually beneficial in some circumstances.
Suppose, for instance, that you make a spelling mistake in a sentence submitted
to Corby. This is going to cause the creation of links and records in the
knowledge base that serve no useful purpose. These links, as they are seldom
reinforced (unless it is a systematic error), will be the first ones to go in
the process described above.
Comments and suggestions about this page are welcome and should be sent
to fadevelop@clix.pt
Rev 1.0 - This page was last modified
2005-07-11 - Copyright © 2004-2005 A.C.Esteves