The Knowledge Base Page
This page
is part of the Operation Manual for the Corby application. It describes the
care and feeding of the Knowledge Base. The other pages of the manual are:
Main page – This is the main page of the Operation Manual.
Using Corby – Describes the usual operation procedures.
Learning – Explains how Corby learns.
Options – Describes the options that control application behaviour.
Resources – Explains how to best use your computing resources.
Everything that Corby knows about the world is
stored in the knowledge base. When you say something to Corby, the first thing
it does is to parse your input into the appropriate knowledge structures and
then store the result in the knowledge base in the form of links between
knowledge elements. Therefore, the knowledge base contains Corby’s world
model, which it uses to build responses to your questions.
The knowledge base also contains many other things,
for instance the appropriate response for each question and the
relationships between each response and elements of the context.
The
knowledge base is a set of files located in Corby’s working folder. That is the
folder that appears in the field labelled "Working folder" in the
General Options dialog box. Of all the files in that folder, the most important
(and the largest) is the data file, whose name is kbase.db. All the other files
are auxiliary files that either contain housekeeping information or serve the
purpose of speeding up access to the main data file. As we will see later on,
all the information contained in the auxiliary files can be recovered from the
main data file.
Technically,
the knowledge base is a multi-user database of the network type, with variable
length records, with single record lock control and transaction control
capability. Some of these terms need explanation.
The
knowledge base must be multi-user because it needs to be accessed by more than
one process at the same time: you can be using the program interactively while
the learning background task is also active; both access the knowledge base.
That the
knowledge base is of the network type reflects the fact that, according to
Corby’s underlying model, intelligence is really about relationships between
knowledge elements. This follows closely the connectionist model of Artificial
Intelligence that is at the base of Corby’s architecture. Therefore each record
in the knowledge base contains multiple links to other records. This has some
implications for error control, as we will see later on.
The single
record lock aspect means that while a process is updating a record, all the
other records in the knowledge base are available to other processes. The
alternative would be to lock the entire database while a record is being
updated, but that, although far easier to implement, would be much less
efficient.
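The difference between the two locking schemes can be illustrated with a minimal Python sketch. It is not Corby's implementation: the RecordStore class and its methods are hypothetical, and an in-memory dictionary stands in for the data file. Each record gets its own lock, so updating one record never blocks access to another.

```python
import threading
from collections import defaultdict


class RecordStore:
    """Toy in-memory store with one lock per record (hypothetical design)."""

    def __init__(self):
        self._records = {}
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, record_id):
        # Creating a per-record lock must itself be thread-safe.
        with self._locks_guard:
            return self._locks[record_id]

    def update(self, record_id, change):
        # Only this record is locked; all other records stay available.
        with self._lock_for(record_id):
            current = self._records.get(record_id, 0)
            self._records[record_id] = change(current)

    def read(self, record_id):
        with self._lock_for(record_id):
            return self._records.get(record_id)


store = RecordStore()


def worker(record_id):
    for _ in range(1000):
        store.update(record_id, lambda value: value + 1)


# Two workers compete for record "a"; a third works on "b" undisturbed.
threads = [threading.Thread(target=worker, args=(rid,)) for rid in ("a", "a", "b")]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

A single lock around the whole store would be simpler to write, but every worker would then wait for every other one, even when they touch different records.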
Finally,
the transaction control capability has to do with the fact that in some
situations a process needs to make several updates to the knowledge base
as a single unit and that, in order to maintain the integrity of the database,
either all of them or none at all must be done. So, if the update
operation is interrupted in the middle, due for example to a power failure, the
system must either go back to the original version of the database or finish
the update when the system is restarted. Corby uses the latter approach.
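The "finish the update on restart" approach can be sketched as follows. This is only an illustration under stated assumptions, not Corby's actual mechanism: a small JSON file stands in for the database, and the journal file and the commit, apply_journal, load and recover functions are hypothetical. The idea is to record the whole set of updates in a journal before touching the data, so that an interrupted update can be replayed later.

```python
import json
import os
import tempfile


def load(db_path):
    """Read the toy database; an absent file means an empty database."""
    if not os.path.exists(db_path):
        return {}
    with open(db_path, encoding="utf-8") as db:
        return json.load(db)


def apply_journal(db_path, journal_path):
    """Apply journaled updates. Each update is a plain assignment, so
    repeating this step after a crash gives the same result."""
    with open(journal_path, encoding="utf-8") as journal:
        updates = json.load(journal)
    records = load(db_path)
    records.update(updates)
    tmp_path = db_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as db:
        json.dump(records, db)
        db.flush()
        os.fsync(db.fileno())
    os.replace(tmp_path, db_path)
    os.remove(journal_path)  # the transaction is now complete


def commit(db_path, journal_path, updates):
    """Write all updates to the journal first, then apply them."""
    tmp_path = journal_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as journal:
        json.dump(updates, journal)
        journal.flush()
        os.fsync(journal.fileno())
    os.replace(tmp_path, journal_path)  # the journal appears in one step
    apply_journal(db_path, journal_path)


def recover(db_path, journal_path):
    """On restart, finish any update that was interrupted."""
    if os.path.exists(journal_path):
        apply_journal(db_path, journal_path)


folder = tempfile.mkdtemp()
db_path = os.path.join(folder, "toy.db")
journal_path = os.path.join(folder, "toy.journal")

commit(db_path, journal_path, {"a": 1, "b": 2})

# Simulate a power failure: the journal was written but never applied.
with open(journal_path, "w", encoding="utf-8") as journal:
    json.dump({"a": 10, "c": 3}, journal)

recover(db_path, journal_path)  # finishes the interrupted update
result = load(db_path)
```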
As said
above, each record in the knowledge base contains multiple links to other
records. The loss of one of those links is not catastrophic because Corby can
easily replace it through learning. But it is imperative that all the
referenced records exist. This is known in database parlance as referential
integrity.
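To make the idea concrete, here is a minimal Python sketch of a referential integrity check. The record layout is an assumption made for the example (a dictionary mapping each record id to the ids it links to) and says nothing about how kbase.db is organized internally; find_dangling_links is a hypothetical helper.

```python
def find_dangling_links(records):
    """Return (source, target) pairs whose target record does not exist.

    `records` maps a record id to the list of record ids it links to.
    """
    return [
        (source, target)
        for source, targets in records.items()
        for target in targets
        if target not in records
    ]


records = {
    "dog": ["animal", "bark"],
    "animal": [],
    "bark": ["sound"],  # "sound" is not a record: integrity is broken
}
dangling = find_dangling_links(records)
```

An empty result means that every referenced record exists.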
Due to
architectural constraints, Corby cannot deal with a knowledge base where
referential integrity has been compromised. Therefore, a missing record or a
corrupted link makes the entire knowledge base unusable. Every effort has been
made during the design and implementation phases of the application to prevent
errors that lead to a corrupted knowledge base. However, as you very well know,
“stuff happens” and due to a hardware malfunction, a software error or even a
virus, the knowledge base can get damaged. There is only one remedy for that: a
good back-up policy.
Most of
Corby’s operations lead, in one way or another, to the update of one or more
records in the knowledge base. Therefore, the program is very demanding in
terms of hard-disk use, and its performance depends heavily on hard-disk
bandwidth and access speed. To improve matters,
Corby uses a memory cache to speed up access to the knowledge base. The cache
also allows the update of a record to be delayed, further reducing the load on
the file system and improving response time. The cache size, expressed as a
percentage of total physical system memory, is an application parameter
that can be set in the General Options dialog box, under the title
"Maximum cache size (% of main memory)".
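The delayed update can be pictured with a small write-back cache. The Python sketch below is an illustration only: the WriteBackCache class is hypothetical, a dictionary stands in for the data file, and the limit is given as a number of items rather than as a percentage of memory. The least recently used eviction policy is also an assumption, since this page does not say which policy Corby uses.

```python
from collections import OrderedDict


class WriteBackCache:
    """Small LRU cache that delays writes until eviction or flush."""

    def __init__(self, backing_store, max_items):
        self._store = backing_store  # dict standing in for the data file
        self._max_items = max_items  # assumed to be at least 1
        self._items = OrderedDict()  # record id -> value, oldest first
        self._dirty = set()          # ids changed but not yet written

    def get(self, record_id):
        if record_id not in self._items:
            self._insert(record_id, self._store[record_id])
        self._items.move_to_end(record_id)  # mark as recently used
        return self._items[record_id]

    def put(self, record_id, value):
        self._insert(record_id, value)
        self._dirty.add(record_id)  # the write to the store is postponed

    def _insert(self, record_id, value):
        self._items[record_id] = value
        self._items.move_to_end(record_id)
        while len(self._items) > self._max_items:
            old_id, old_value = self._items.popitem(last=False)
            if old_id in self._dirty:  # write back only what changed
                self._store[old_id] = old_value
                self._dirty.discard(old_id)

    def flush(self):
        for record_id in self._dirty:
            self._store[record_id] = self._items[record_id]
        self._dirty.clear()


disk = {"r1": "old"}
cache = WriteBackCache(disk, max_items=2)
cache.put("r1", "new")
before_eviction = disk["r1"]  # the store has not been touched yet
cache.put("r2", "x")
cache.put("r3", "y")          # the cache is full: r1 is evicted and written
```

Several updates to the same record while it sits in the cache cost a single write to the store, which is where the reduction in file system load comes from.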
You can
obtain information about several aspects of the knowledge base in the
Statistics dialog box, which is opened with the View/Statistics command.

The
variables in this window that are related to the knowledge base are:
Knowledge Base size
This
variable indicates the size of the knowledge base expressed in megabytes. It
corresponds to the real size occupied by the main knowledge base data
file. This value is usually less than the one reported by the file manager, due
to deleted records, which occupy file space but are no longer in use.
This
variable doesn’t take into account the space used by the auxiliary files
(usually a fraction of the space occupied by the main data file).
The
information provided by this variable is very accurate. Unfortunately, it is
not very up to date, the reason being that it is collected by a low priority
task, which only runs when there is nothing else to do. This is the case for
some of the other variables as well.
Records in the Knowledge Base
This
variable indicates the number of active records in the knowledge base. It
ignores the records that have been deleted.
Concepts in the Knowledge Base
This
variable indicates the total number of concepts in the knowledge base. The
product of this variable and "Instances per concept" constitutes the
Figure of Merit, which indicates the quality level of the knowledge base.
Instances per concept
This is the
average number of instances per concept in the knowledge base.
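The Figure of Merit mentioned under "Concepts in the Knowledge Base" is simply the product of the two variables. As a worked example in Python, with figures invented purely for illustration (figure_of_merit is a hypothetical helper name):

```python
def figure_of_merit(concepts, instances_per_concept):
    """Product of the number of concepts and the average instances per concept."""
    return concepts * instances_per_concept


# Invented values: 12,000 concepts with 3.5 instances each on average.
fom = figure_of_merit(12_000, 3.5)
```

With these values the Figure of Merit is 42,000.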
Items in the cache
This
variable indicates the number of records currently in the cache.
Memory used
This is the
amount of main memory currently used by the cache, expressed in kilobytes.
Knowledge base version
This
variable indicates the knowledge base version. It is used to verify the
compatibility between the knowledge base and the current version of the Corby
software.
When the
knowledge base is created, it inherits the software version of Corby that
created it. This is a number in the form x.y where x is the major revision
number and y is the minor one.
As the
software evolves, it may no longer be compatible with earlier versions of the
knowledge base. In those cases, a utility will be provided that upgrades the
old knowledge base to the new version.
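Version numbers of the form x.y should be compared as pairs of integers, not as text or decimal fractions, otherwise 1.10 would look older than 1.9. The Python sketch below shows such a comparison. The function names are hypothetical, and the sketch only tells whether a knowledge base is older than the software. Whether an older knowledge base actually needs upgrading is decided by Corby and is not specified on this page.

```python
def parse_version(text):
    """Split a version of the form x.y into (major, minor) integers."""
    major, minor = text.split(".")
    return int(major), int(minor)


def is_older(kb_version, software_version):
    """True if the knowledge base was created by an earlier revision."""
    return parse_version(kb_version) < parse_version(software_version)


older = is_older("1.9", "1.10")  # minor revision 9 comes before 10
same = is_older("2.0", "2.0")
```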
As you
continue using Corby, the knowledge base soon becomes a multi-gigabyte affair.
At some point, you realize that you have invested a lot of CPU
resources and some of your own time in it. The knowledge base thus becomes a
valuable asset, worth preserving.
The only
way to protect the knowledge base against accidental loss is to do regular
back-ups. You should really look at it this way: it is not a matter of
whether your hard-disk will fail, but of when.
There are
several back-up strategies; some give more protection than others, and some are
easier to do than others. You have to find the compromise that best suits your
needs. Here is a list of possible back-up strategies, in order of increasing
security.
Same hard-disk
This is the
absolute minimum back-up protection that you should have. It protects you from
some software and hardware errors but it offers no protection at all in case of
hard-disk failure, unfortunately a very common occurrence these days.
Different hard-disk
This is
actually a very good compromise between the hassle it takes and the protection
it affords. As the cost of hard-disks gets lower, you can afford to have a
second hard-disk just for back-up purposes.
This is the
philosophy behind RAID systems. Of course if your computer has a RAID system,
it will provide this level of back-up automatically for you.
Removable media
This is the
next step in terms of security. It has the additional advantage that you can
keep the removable media in a safe place, possibly off-site.
Server based
If your
computer is in a network and has access to a server that is backed up regularly,
this is probably the best possible back-up strategy. It is easy to do and the
server back-up is usually done according to very strict procedures that provide
a very good level of security.
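Whichever strategy you choose, the back-up itself amounts to copying the working folder somewhere else. Here is a minimal Python sketch of a time-stamped copy. The back_up_folder function and the folder names are hypothetical, and the sketch assumes that Corby is not running while the copy is made, so that the files are not changing underneath it.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def back_up_folder(working_folder, backup_root):
    """Copy the whole working folder into a new time-stamped folder."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"corby-kb-{stamp}"
    shutil.copytree(working_folder, target)  # raises if target already exists
    return target


# Demonstration with temporary folders standing in for the real ones.
root = Path(tempfile.mkdtemp())
working = root / "working"
working.mkdir()
(working / "kbase.db").write_bytes(b"example data")

copy = back_up_folder(working, root / "backups")
```

Pointing backup_root at a second hard-disk, a removable drive or a server share gives the higher levels of protection described above.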
As we have
seen above, Corby is able to recover all the auxiliary files from its main data
file. This process, which can take a considerable time, is called knowledge
base recovery.
When the
Corby application starts, the first thing it does is to check all the knowledge
base files. If it finds no files, Corby will show this error message:

If, during
the file check operation, Corby detects that the main data file is present but
one or more of the auxiliary files are missing or damaged, it will show this
error message:

At that
point you may opt to recover the missing files. After some time, the process
terminates and Corby will check the files again. If it finds no errors it will
start normally.
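The start-up check described above can be summarized in a few lines of Python. This is a sketch under assumptions: only the name kbase.db comes from this page, the auxiliary file name kbase.idx is invented for the example, and the check only looks for missing files, whereas Corby also detects damaged ones.

```python
import tempfile
from pathlib import Path

MAIN_FILE = "kbase.db"
AUX_FILES = ("kbase.idx",)  # hypothetical name; the real names are not listed here


def check_files(folder, aux_files=AUX_FILES):
    """Classify the state of a working folder at start-up."""
    folder = Path(folder)
    if not (folder / MAIN_FILE).is_file():
        return "no knowledge base"  # nothing to recover from
    missing = [name for name in aux_files if not (folder / name).is_file()]
    return "recovery needed" if missing else "ok"


folder = Path(tempfile.mkdtemp())
state_empty = check_files(folder)
(folder / MAIN_FILE).write_bytes(b"")
state_main_only = check_files(folder)
(folder / "kbase.idx").write_bytes(b"")
state_complete = check_files(folder)
```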
If you are
doing a back-up using removable media, for instance a CD-ROM, or if you just
want to transport your knowledge base to another system using the same method,
it may happen that the file will not fit on a single CD-ROM. At that point you
need a file splitter/joiner. This is an application that allows you to break up
a file into chunks of a certain size and then put them back together to
reconstitute the original file.
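For readers curious about what a splitter/joiner does, here is a minimal generic sketch in Python. The function names and the part naming scheme are assumptions made for the example, and the sketch does not reproduce the file format of any particular tool.

```python
import tempfile
from pathlib import Path


def split_file(source, out_dir, chunk_size, prefix="part"):
    """Break `source` into numbered files of at most `chunk_size` bytes."""
    out_dir = Path(out_dir)
    parts = []
    with open(source, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of file reached
                break
            part = out_dir / f"{prefix}.{index:03d}"
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def join_files(parts, target):
    """Concatenate the parts, in order, to rebuild the original file."""
    with open(target, "wb") as dst:
        for part in parts:
            dst.write(Path(part).read_bytes())


folder = Path(tempfile.mkdtemp())
source = folder / "big.bin"
source.write_bytes(bytes(range(250)) * 10)  # 2500 bytes of sample data

parts = split_file(source, folder, chunk_size=1000)
restored = folder / "restored.bin"
join_files(parts, restored)
```

The parts must be joined in the same order in which they were written, which is why the sketch numbers them.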
You can use
your favourite splitter/joiner with the Corby knowledge base but the Corby
distribution package includes its own splitter/joiner. It is in the same folder
as the Corby executable and it is called KbImpEx, which stands for “knowledge
base Import/Export”. When you activate it, it looks like this:

This
program compacts the Knowledge Base files into a set of files, called a
save-set. The size of each file in the save-set doesn't exceed the value
specified in the field labelled “Maximum size of a file in the save-set” in the
main window.
This
program is also able to do the reverse, that is, take a save-set and restore
the original Knowledge Base files.
To create a save-set, do the following:
1. Set the input folder to where the Knowledge Base files are.
2. Create a temporary folder and set that as the output folder. You can also
use an existing folder, as long as it is empty.
3. Set the maximum size of each file in the save-set.
4. Hit go.
At the end
of this process, the output folder will have a set of files with the generic
name CorbyKB.nnn, where nnn represents a number starting at 0.
To restore the Knowledge Base files from a save-set, do the following:
1. Set the input folder to where the save-set files are.
2. Set the output folder to where the Knowledge Base files will be restored.
3. Hit go.
At the end
of this process, the Knowledge Base files will be ready to use. The integrity of
the files is verified by means of a 32-bit CRC value.
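A CRC check works by computing a short value from the data when it is saved and computing it again when the data is restored; if the two values differ, the data was damaged. The Python sketch below uses the CRC-32 from the standard zlib module purely as an illustration. This page does not say which CRC-32 variant KbImpEx uses, and crc32_of is a hypothetical helper.

```python
import zlib


def crc32_of(data):
    """Return the CRC-32 of a bytes object as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


original = b"knowledge base contents"
stored_crc = crc32_of(original)  # saved alongside the data

damaged = b"knowledge base c0ntents"  # one byte changed in transit
intact_ok = crc32_of(original) == stored_crc
damaged_ok = crc32_of(damaged) == stored_crc
```

Here intact_ok is true and damaged_ok is false. A CRC detects accidental damage very reliably, but it is not a protection against deliberate tampering.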
IMPORTANT NOTE
For
safety reasons, this program doesn't delete any files, nor does it overwrite
existing ones. If the need arises, you must delete the files yourself. This
program also requires that the output folder be empty. This implies that the
input and output folders must be different.
The Corby
application is never idle. Even when it is neither interacting with you nor
reading or writing files, and the learning queue is empty, there are several
background tasks that keep working, doing the knowledge base housekeeping.
In many
instances, a knowledge element is not created in its definitive form. Instead
it evolves through a series of metamorphoses, consisting of intermediate
knowledge elements, driven by learning. When the knowledge element finally
reaches its definitive form, the intermediate ones, which served only as
scaffolding, can be discarded, as they no longer serve any useful purpose.
There are
other instances where there is the need to delete records from the Knowledge
base, namely when its size exceeds the maximum allowed. However, a record that
is referenced by any other record in the knowledge base cannot be deleted
because referential integrity would be compromised. Corby approaches this
problem by progressively weakening all the links in all records in the
knowledge base. When a link goes below a given threshold, it is broken. As soon
as a record has no more references to it, it can be safely deleted.
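One weakening pass can be sketched in Python as follows. The details are assumptions made for the illustration: links are stored as a dictionary of strengths, weakening multiplies each strength by a fixed factor, and the factor and threshold values are invented. The page does not describe how Corby actually weakens links.

```python
def weaken_and_prune(links, records, factor=0.5, threshold=0.1):
    """Weaken every link, break the weak ones and report unreferenced records.

    `links` maps (source, target) pairs to a strength; `records` is the
    set of all record ids. Returns (surviving links, unreferenced records).
    """
    weakened = {pair: strength * factor for pair, strength in links.items()}
    # A link that falls below the threshold is broken.
    kept = {pair: s for pair, s in weakened.items() if s >= threshold}
    referenced = {target for _source, target in kept}
    # Records that nothing points to can be deleted without breaking integrity.
    unreferenced = records - referenced
    return kept, unreferenced


links = {
    ("a", "b"): 1.0,
    ("c", "a"): 1.0,
    ("a", "c"): 0.15,
    ("b", "c"): 0.1,
}
kept, unreferenced = weaken_and_prune(links, {"a", "b", "c"})
```

After this pass both links into record c have been broken, so c is the only record that can be deleted. Links that learning keeps reinforcing stay above the threshold.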
The
housekeeping tasks have a very low priority and therefore they only run when
the CPU is not needed by the other more important tasks. It is therefore
important that Corby be left alone for long periods of time, so that it can do
the required housekeeping. For instance, in a learning situation where Corby
takes several hours to read a file, after it finishes, you should leave it
alone for the same amount of time it took to process the file.
Corby is
able to create some kinds of concepts just by examining the relationships of
elements it already has in the knowledge base. This is done by a background task
and is another reason why you should leave it alone for long periods of time.
The concepts created this way are not the most important ones but they have the
advantage that their creation does not need human intervention.
One
consequence of the fact that the Corby application is never idle is that in the
case of a power failure, the probability is high that it occurs in the middle
of a knowledge base update. All kinds of mischief can happen in those
situations when several independent entities are trying to close up shop in a
graceful manner. This situation is beyond Corby’s control, because it involves
the operating system and the hard disk firmware, and it is one more reason for
you to make regular backups of the knowledge base. Even the lowest level of
backup (to the same hard disk) will sort you out in this situation.
Comments and suggestions about this page are welcome and should be sent
to fadevelop@clix.pt
Rev 1.0 - This page was last modified
2005-07-09 - Copyright © 2004-2005 A.C.Esteves