The Knowledge Base Page
This page is part of the Operation Manual for the Corby application. It describes the care and feeding of the Knowledge Base. The other pages of the manual are:
Main page – This is the main page of the Operation Manual.
Using Corby – Describes the usual operation procedures.
Learning – Explains how Corby learns.
Options – Describes the options that control application behaviour.
Resources – Explains how to best use your computing resources.
Everything that Corby knows about the world is stored in the knowledge base. When you say something to Corby, it first parses your input into the appropriate knowledge structures and then stores the result in the knowledge base as links between knowledge elements. The knowledge base therefore contains Corby’s world model, which it uses to build responses to your questions.
The knowledge base also contains many other things, such as the appropriate response to each question and the relationships between each response and elements of the context.
The knowledge base is a set of files located in Corby’s working folder, that is, the folder shown in the field labelled "Working folder" in the General Options dialog box. Of all the files in that folder, the most important (and the largest) is the main data file, kbase.db. All the other files are auxiliary files that either contain housekeeping information or serve to speed up access to the main data file. As we will see later on, all the information contained in the auxiliary files can be recovered from the main data file.
Technically, the knowledge base is a multi-user database of the network type, with variable-length records, single-record locking and transaction control. These terms deserve some explanation.
The knowledge base must be multi-user because it needs to be accessed by more than one process at the same time: you can be using the program interactively while the background learning task is also active, and both access the knowledge base.
The knowledge base is of the network type because, according to Corby’s underlying model, intelligence is really about the relationships between knowledge elements. This closely follows the connectionist model of Artificial Intelligence on which Corby’s architecture is based. Each record in the knowledge base therefore contains multiple links to other records. This has some implications for error control, as we will see later on.
Single-record locking means that while a process is updating a record, all the other records in the knowledge base remain available to other processes. The alternative would be to lock the entire database while a record is being updated; that would be far easier to implement, but much less efficient.
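As an illustration of the single-record locking idea, here is a minimal Python sketch; it is not Corby’s actual code, and the class and function names are hypothetical. Each record id gets its own lock, so two updates to different records never wait for each other. Threads in one process stand in for Corby’s processes here; a real multi-process database would need file-level locks instead of in-process locks.

```python
import threading
from collections import defaultdict


class RecordLockManager:
    """Hands out one lock per record id (hypothetical sketch)."""

    def __init__(self):
        self._guard = threading.Lock()              # protects the lock table itself
        self._locks = defaultdict(threading.Lock)   # record id -> its own lock

    def lock_for(self, record_id):
        with self._guard:
            return self._locks[record_id]


def update_record(store, locks, record_id, value):
    # Only this record is locked; every other record stays available.
    with locks.lock_for(record_id):
        store[record_id] = value


# Ten threads update ten different records without blocking each other.
store = {}
locks = RecordLockManager()
workers = [threading.Thread(target=update_record, args=(store, locks, i, i * i))
           for i in range(10)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(store.items())[:3])   # [(0, 0), (1, 1), (2, 4)]
```

This sketch never removes entries from the lock table; a real implementation would also have to reclaim the locks of records that are no longer in use.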
Finally, transaction control has to do with the fact that in some situations a process needs to make several updates to the knowledge base as a single unit and, to maintain the integrity of the database, either all of them or none at all must be done. So, if an update is interrupted in the middle, for example by a power failure, the system must either go back to the original version of the database or finish the update when it is restarted. Corby uses the latter approach.
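Finishing an interrupted update on restart (roll-forward) is commonly done with a redo journal. The Python sketch below assumes that technique; the manual does not say how Corby implements it, and the functions and journal file name are hypothetical. The complete set of updates is written to the journal and forced to disk before the data is touched; on restart, a complete journal is simply applied again, which is harmless because each update sets an absolute value.

```python
import json
import os


def commit(store, updates, journal_path, crash_after_journal=False):
    """Apply all updates as one unit, using a redo journal (hypothetical sketch)."""
    # 1. Save the complete set of updates, with a commit marker, and force it to disk.
    #    A journal cut short by a crash fails to parse, so the marker is never seen.
    with open(journal_path, "w") as f:
        json.dump({"updates": updates, "committed": True}, f)
        f.flush()
        os.fsync(f.fileno())
    if crash_after_journal:
        raise RuntimeError("simulated power failure")
    # 2. Apply the updates to the data itself.
    store.update(updates)
    # 3. The transaction is complete; the journal is no longer needed.
    os.remove(journal_path)


def recover(store, journal_path):
    """On restart, finish any journaled transaction; True if a journal was found."""
    if not os.path.exists(journal_path):
        return False
    try:
        with open(journal_path) as f:
            entry = json.load(f)
    except json.JSONDecodeError:
        entry = {}                       # incomplete journal: the data was never touched
    if entry.get("committed"):
        store.update(entry["updates"])   # re-applying absolute values is harmless
    os.remove(journal_path)
    return True


store = {"a": 1}
try:
    commit(store, {"a": 2, "b": 3}, "kb.journal", crash_after_journal=True)
except RuntimeError:
    pass
print(store)                  # {'a': 1}  (update interrupted)
recover(store, "kb.journal")
print(store)                  # {'a': 2, 'b': 3}  (update finished on restart)
```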
As said above, each record in the knowledge base contains multiple links to other records. The loss of one of those links is not catastrophic because Corby can easily replace it through learning. But it is imperative that all the referenced records exist. This is known in database parlance as referential integrity.
Due to architectural constraints, Corby cannot deal with a knowledge base whose referential integrity has been compromised: a single missing record or corrupted link makes the entire knowledge base unusable. Every effort has been made during the design and implementation of the application to prevent errors that corrupt the knowledge base. However, as you very well know, “stuff happens”, and a hardware malfunction, a software error or even a virus can damage the knowledge base. There is only one remedy for that: a good back-up policy.
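The kind of check that referential integrity implies can be sketched in a few lines of Python. The dictionary layout used here (each record id mapped to the ids it links to) is a hypothetical simplification, not the format of kbase.db.

```python
def find_dangling_links(records):
    """Return (record_id, missing_target) for every link to a non-existent record.

    `records` maps each record id to the ids it links to (a simplified,
    hypothetical layout).
    """
    return [(rec_id, target)
            for rec_id, links in records.items()
            for target in links
            if target not in records]


kb = {1: [2, 3], 2: [3], 3: [1]}
print(find_dangling_links(kb))   # []
del kb[3]                        # simulate a lost record
print(find_dangling_links(kb))   # [(1, 3), (2, 3)]
```

Losing a single entry from one of the link lists would leave no dangling reference, which is why a lost link is recoverable while a lost record is not.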
Most of Corby’s operations lead, in one way or another, to the update of one or more records in the knowledge base. The program is therefore very demanding in terms of hard-disk use, and its performance depends heavily on hard-disk bandwidth and access time. To improve matters, Corby uses a memory cache to speed up access to the knowledge base. The cache also allows a record update to be delayed, which further reduces the load on the file system and improves response time. The cache size, expressed as a percentage of total physical memory, can be set in the General Options dialog box, in the field labelled "Maximum cache size (% of main memory)".
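A memory cache with delayed updates is usually called a write-back cache. The Python sketch below assumes a least-recently-used eviction policy, which the manual does not specify, and uses a dictionary in place of kbase.db; all names are hypothetical. A write only marks the cached record as dirty; the disk is touched when the record is evicted or the cache is flushed.

```python
from collections import OrderedDict


def cache_budget(physical_memory_bytes, percent):
    """Turn the 'Maximum cache size (% of main memory)' option into bytes."""
    return physical_memory_bytes * percent // 100


class WriteBackCache:
    """LRU cache that delays record writes (hypothetical sketch)."""

    def __init__(self, disk, max_bytes):
        self.disk = disk                 # a dict standing in for kbase.db
        self.max_bytes = max_bytes
        self.entries = OrderedDict()     # record id -> (data, dirty flag)
        self.used = 0
        self.disk_writes = 0

    def read(self, rec_id):
        if rec_id in self.entries:
            self.entries.move_to_end(rec_id)     # now the most recently used
            return self.entries[rec_id][0]
        data = self.disk[rec_id]
        self._put(rec_id, data, dirty=False)
        return data

    def write(self, rec_id, data):
        self._put(rec_id, data, dirty=True)      # no disk access yet

    def flush(self):
        for rec_id, (data, dirty) in list(self.entries.items()):
            if dirty:
                self._write_to_disk(rec_id, data)
                self.entries[rec_id] = (data, False)

    def _put(self, rec_id, data, dirty):
        if rec_id in self.entries:
            old, was_dirty = self.entries.pop(rec_id)
            self.used -= len(old)
            dirty = dirty or was_dirty           # keep pending changes pending
        self.entries[rec_id] = (data, dirty)
        self.used += len(data)
        # Evict least recently used records until the budget is respected.
        while self.used > self.max_bytes and len(self.entries) > 1:
            old_id, (old_data, old_dirty) = self.entries.popitem(last=False)
            self.used -= len(old_data)
            if old_dirty:
                self._write_to_disk(old_id, old_data)

    def _write_to_disk(self, rec_id, data):
        self.disk[rec_id] = data
        self.disk_writes += 1


disk = {"r1": b"aaaa", "r2": b"bbbb", "r3": b"cccc"}
cache = WriteBackCache(disk, max_bytes=8)
for value in (b"AAA1", b"AAA2", b"AAA3"):
    cache.write("r1", value)           # three updates, no disk writes yet
cache.read("r2")
cache.read("r3")                       # over budget: r1 is evicted and written once
print(disk["r1"], cache.disk_writes)   # b'AAA3' 1
```

The example shows both benefits described above: three updates to a cached record cost a single disk write, and the byte budget derived from the percentage option bounds the memory the cache can use.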
You can obtain information about several aspects of the knowledge base in the Statistics dialog box, which is opened with the View/Statistics command.
The variables in this window that are related to the knowledge base are:
Knowledge Base size
This variable indicates the size of the knowledge base in megabytes. It corresponds to the space actually in use in the main knowledge base data file. This value is usually less than the file size reported by the file manager, because deleted records still occupy file space even though they are no longer in use.
This variable doesn’t take into account the space used by the auxiliary files (usually a fraction of the space occupied by the main data file).
The information provided by this variable is very accurate. Unfortunately, it is not always up to date, because it is collected by a low-priority task that only runs when there is nothing else to do. The same applies to some of the other variables.
Records in the Knowledge Base
This variable indicates the number of active records in the knowledge base. It ignores the records that have been deleted.
Concepts in the Knowledge Base
This variable indicates the total number of concepts in the knowledge base. The product of this variable and "Instances per concept" is the Figure of Merit, which indicates the quality level of the knowledge base.
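Assuming "Instances per concept" is a plain arithmetic mean, the Figure of Merit works out to the total number of instances in the knowledge base, as this short Python calculation shows (the instance counts are made up for illustration):

```python
def figure_of_merit(concepts, instances_per_concept):
    """Figure of Merit: number of concepts times average instances per concept."""
    return concepts * instances_per_concept


instances = [4, 2, 3]                       # made-up instance counts for three concepts
average = sum(instances) / len(instances)   # what "Instances per concept" would show
print(figure_of_merit(len(instances), average))   # 9.0, the total number of instances
```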
Instances per concept
This is the average number of instances per concept in the knowledge base.
Items in the cache
This variable indicates the number of records currently in the cache.
A separate variable shows the amount of main memory currently used by the cache, expressed in Kbytes.
Knowledge base version
This variable indicates the knowledge base version. It is used to verify the compatibility between the knowledge base and the current version of the Corby software.
When the knowledge base is created, it inherits the version number of the Corby software that created it. This is a number of the form x.y, where x is the major revision number and y is the minor one.
As the software evolves, it may no longer be compatible with earlier versions of the knowledge base. In those cases, a utility will be provided to upgrade the old knowledge base to the new version.
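The following Python sketch shows how an x.y version could be checked. The compatibility rule used here (same major revision means compatible) is a hypothetical example, not Corby’s documented policy. The version should be compared as a pair of integers, because as strings "1.10" sorts before "1.9".

```python
def parse_version(text):
    """Split an 'x.y' version string into (major, minor) integers."""
    major, minor = text.split(".")
    return int(major), int(minor)


def is_compatible(kb_version, software_version):
    """Hypothetical rule: same major revision means no upgrade is needed."""
    return parse_version(kb_version)[0] == parse_version(software_version)[0]


print(is_compatible("1.3", "1.7"))                   # True
print(is_compatible("1.3", "2.0"))                   # False: run the upgrade utility
print("1.10" > "1.9")                                # False (string comparison)
print(parse_version("1.10") > parse_version("1.9"))  # True
```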
As you continue using Corby, the knowledge base soon grows to several gigabytes. At some point, you realize that you have invested a lot of CPU resources and some of your own time in it. The knowledge base thus becomes a valuable asset, worth preserving.
The only way to protect the knowledge base against accidental loss is to make regular back-ups. You should really look at it this way: it is not a matter of if, but of when your hard disk will fail.
There are several back-up strategies; some give more protection than others, and some are easier to carry out than others. You have to find the compromise that best suits your needs. Here is a list of possible back-up strategies in order of increasing security.
This is the absolute minimum back-up protection that you should have. It protects you from some software and hardware errors but it offers no protection at all in case of hard-disk failure, unfortunately a very common occurrence these days.
This is actually a very good compromise between the hassle it takes and the protection it affords. As the cost of hard disks gets lower, you can afford to have a second hard disk just for back-up purposes.
This is the philosophy behind RAID systems. Of course if your computer has a RAID system, it will provide this level of back-up automatically for you.
This is the next step in terms of security. It has the additional advantage that you can keep the removable media in a safe place, possibly off-site.
If your computer is on a network and has access to a server that is backed up regularly, this is probably the best possible back-up strategy. It is easy to do, and the server back-up is usually done according to very strict procedures that provide a very good level of security.
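Whichever destination you choose (the same disk, a second disk, removable media or a network share), the copy itself can be scripted. This Python sketch assumes that Corby is not running while the copy is made, so that kbase.db and the auxiliary files are consistent with each other; the function name and the folder naming scheme are hypothetical.

```python
import shutil
import time
from pathlib import Path


def back_up_knowledge_base(working_folder, backup_root):
    """Copy Corby's working folder into a new time-stamped folder under backup_root.

    Assumes Corby is not running, so kbase.db and the auxiliary files
    are consistent with each other. Returns the path of the new copy.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"kbase-{stamp}"
    shutil.copytree(working_folder, target)   # raises if target already exists
    return target
```

For example, `back_up_knowledge_base(r"C:\Corby", r"E:\Backups")` (both paths hypothetical) creates a new dated copy on each run. Strictly speaking, only kbase.db needs to be saved, since the auxiliary files can be recovered from it, but copying the whole folder avoids a lengthy recovery after a restore.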
As we have seen above, Corby is able to recover all the auxiliary files from its main data file. This process, which can take a considerable time, is called knowledge base recovery.
When the Corby application starts, the first thing it does is check all the knowledge base files. If it finds no files at all, Corby shows an error message.
If, during this check, Corby detects that the main data file is present but one or more of the auxiliary files are missing or damaged, it shows an error message to that effect.
At that point you may choose to recover the missing files. When the recovery process finishes, Corby checks the files again and, if it finds no errors, starts normally.
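To see why recovery can take a long time, consider how an auxiliary index might be rebuilt: the whole main data file has to be read from start to finish. The Python sketch below assumes a hypothetical record layout (record id, payload length, deleted flag, then the payload); the real layout of kbase.db is not documented here.

```python
import struct

# Hypothetical record header: record id, payload length, deleted flag.
HEADER = struct.Struct("<IIB")


def append_record(data_file, rec_id, payload, deleted=False):
    """Append one variable-length record to a bytearray standing in for kbase.db."""
    data_file.extend(HEADER.pack(rec_id, len(payload), int(deleted)))
    data_file.extend(payload)


def rebuild_index(data_file):
    """Scan the whole data file once and rebuild a record id -> offset index."""
    index = {}
    offset = 0
    while offset < len(data_file):
        rec_id, length, deleted = HEADER.unpack_from(data_file, offset)
        if not deleted:                   # deleted records keep their space but get no entry
            index[rec_id] = offset
        offset += HEADER.size + length    # jump to the next variable-length record
    return index


db = bytearray()
append_record(db, 7, b"cat")
append_record(db, 8, b"dog", deleted=True)
append_record(db, 9, b"mouse")
print(rebuild_index(db))   # {7: 0, 9: 24}
```

On a multi-gigabyte data file this single sequential pass is what dominates the recovery time.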
If you are making a back-up on removable media such as a CD-ROM, or if you just want to move your knowledge base to another system that way, the data file may not fit on a single CD-ROM. In that case you need a file splitter/joiner: an application that breaks a file up into chunks of a given size and then puts them back together to reconstitute the original file.
You can use your favourite splitter/joiner with the Corby knowledge base, but the Corby distribution package includes its own. It is located in the same folder as the Corby executable and is called KbImpEx, which stands for “knowledge base Import/Export”.
This program compacts the Knowledge Base files into a set of files, called a save-set. The size of each file in the save-set doesn't exceed the value specified in the field labelled “Maximum size of a file in the save-set” in the main window.
This program is also able to do the reverse, that is, take a save-set and restore the original Knowledge Base files.
To create a save-set, do the following:
1. Set the input folder to where the Knowledge Base files are.
2. Create a temporary folder and set that as the output folder. You can also use an existing folder, as long as it is empty.
3. Set the maximum size of each file in the save-set.
4. Hit go.
At the end of this process, the output folder will have a set of files with the generic name CorbyKB.nnn, where nnn represents a number starting at 0.
To restore the Knowledge Base files from a save-set, do the following:
1. Set the input folder to where the save-set files are.
2. Set the output folder to where the Knowledge Base files will be restored.
3. Hit go.
At the end of this process, the Knowledge Base files will be ready to use. The integrity of the files is guaranteed by a 32-bit CRC value.
For security reasons, this program does not delete any files, nor does it overwrite existing ones. If the need arises, you must delete the files yourself. The program also requires the output folder to be empty, which implies that the input and output folders must be different.
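Both procedures can be sketched in Python. The container format used here (each file stored as its name, its length, its data and a CRC-32 of the data, with the resulting stream cut into pieces named CorbyKB.000, CorbyKB.001 and so on) is a hypothetical illustration, since KbImpEx’s actual format is not documented; the zero-padded three-digit suffix is also an assumption. The sketch follows the same safety rules as KbImpEx: the folders must differ, the output folder must be empty, and existing files are never overwritten. For simplicity it holds all the data in memory, which a real tool handling a multi-gigabyte knowledge base would avoid.

```python
import struct
import zlib
from pathlib import Path

PIECE_NAME = "CorbyKB.{:03d}"   # assumed: zero-padded three-digit suffix


def _check_folders(input_folder, output_folder):
    # Same rules as KbImpEx: different folders and an empty output folder.
    if Path(input_folder).resolve() == Path(output_folder).resolve():
        raise ValueError("input and output folders must be different")
    if any(Path(output_folder).iterdir()):
        raise ValueError("output folder must be empty")


def create_save_set(kb_folder, out_folder, max_piece_size):
    """Pack every file in kb_folder into pieces of at most max_piece_size bytes."""
    _check_folders(kb_folder, out_folder)
    stream = bytearray()
    for path in sorted(Path(kb_folder).iterdir()):
        if not path.is_file():
            continue
        name, data = path.name.encode("utf-8"), path.read_bytes()
        # Hypothetical entry: name length, name, data length, data, CRC-32 of data.
        stream += struct.pack("<H", len(name)) + name
        stream += struct.pack("<Q", len(data)) + data
        stream += struct.pack("<I", zlib.crc32(data))
    pieces = []
    for n, start in enumerate(range(0, len(stream), max_piece_size)):
        piece = Path(out_folder) / PIECE_NAME.format(n)
        with open(piece, "xb") as f:   # mode "x" never overwrites an existing file
            f.write(stream[start:start + max_piece_size])
        pieces.append(piece)
    return pieces


def restore_save_set(save_set_folder, kb_folder):
    """Join CorbyKB.000, .001, ... and restore the original files, checking each CRC."""
    _check_folders(save_set_folder, kb_folder)
    stream = bytearray()
    n = 0
    while (Path(save_set_folder) / PIECE_NAME.format(n)).exists():
        stream += (Path(save_set_folder) / PIECE_NAME.format(n)).read_bytes()
        n += 1
    restored, pos = [], 0
    while pos < len(stream):
        (name_len,) = struct.unpack_from("<H", stream, pos)
        name = stream[pos + 2:pos + 2 + name_len].decode("utf-8")
        pos += 2 + name_len
        (size,) = struct.unpack_from("<Q", stream, pos)
        data = bytes(stream[pos + 8:pos + 8 + size])
        pos += 8 + size
        (crc,) = struct.unpack_from("<I", stream, pos)
        pos += 4
        if zlib.crc32(data) != crc:
            raise ValueError(f"CRC mismatch in {name}: save-set is damaged")
        with open(Path(kb_folder) / name, "xb") as f:
            f.write(data)
        restored.append(name)
    return restored
```

A piece size of 650 * 1024 * 1024 bytes, for instance, would suit a typical 650 MB CD-ROM.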
The Corby application is never idle. Even when it is not interacting with you, not reading or writing files, and the learning queue is empty, several background tasks keep working on knowledge base housekeeping.
In many instances, a knowledge element is not created in its definitive form. Instead, it evolves through a series of metamorphoses, driven by learning, that produce intermediate knowledge elements. When the knowledge element finally reaches its definitive form, the intermediate ones, which served only as scaffolding, can be discarded, as they no longer serve any useful purpose.
There are other instances where records need to be deleted from the knowledge base, namely when its size exceeds the maximum allowed. However, a record that is referenced by any other record in the knowledge base cannot be deleted, because referential integrity would be compromised. Corby approaches this problem by progressively weakening all the links in all the records in the knowledge base. When a link falls below a given threshold, it is broken. As soon as no other record refers to a record, that record can be safely deleted.
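The weakening-and-collection scheme can be sketched as a single housekeeping pass in Python. The multiplicative decay, the threshold value and the data layout are all assumptions made for illustration; the manual does not give Corby’s actual rule. Only records that no other record references are deleted, so referential integrity is preserved. The hypothetical `protected` set keeps records that are legitimately unreferenced from being collected.

```python
def housekeeping_pass(links, decay=0.9, threshold=0.1, protected=frozenset()):
    """Weaken every link, break weak ones and delete unreferenced records.

    `links` maps each record id to {target id: link strength}. Records in
    `protected` are never deleted. Returns the set of deleted record ids.
    """
    for targets in links.values():
        for target in list(targets):
            targets[target] *= decay            # progressive weakening
            if targets[target] < threshold:
                del targets[target]             # the link is broken
    referenced = {t for targets in links.values() for t in targets}
    unreferenced = {r for r in links if r not in referenced and r not in protected}
    for rec_id in unreferenced:
        del links[rec_id]                       # safe: nothing points to it any more
    return unreferenced


kb = {"root": {"b": 1.0, "c": 0.105}, "b": {}, "c": {}}
print(sorted(housekeeping_pass(kb, protected={"root"})))   # ['c']
print(kb)   # {'root': {'b': 0.9}, 'b': {}}
```

In this sketch, a record referenced only by a record deleted in the current pass survives until the next pass, when it is found to be unreferenced.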
The housekeeping tasks have a very low priority and therefore only run when the CPU is not needed by other, more important tasks. It is therefore important to leave Corby alone for long periods of time, so that it can do the required housekeeping. For instance, in a learning situation where Corby takes several hours to read a file, you should afterwards leave it alone for as long as it took to process the file.
Corby is able to create some kinds of concepts just by examining the relationships between elements it already has in the knowledge base. This is done by a background task and is another reason why you should leave it alone for long periods of time. The concepts created this way are not the most important ones, but they have the advantage that their creation does not need human intervention.
One consequence of Corby never being idle is that, in the case of a power failure, the failure is likely to occur in the middle of a knowledge base update. All kinds of mischief can happen in those situations, when several independent entities are trying to close up shop gracefully. This situation is beyond Corby’s control, because it involves the operating system and the hard-disk firmware, and it is one more reason to make regular back-ups of the knowledge base. Even the lowest level of back-up (to the same hard disk) will sort you out in this situation.
Comments and suggestions about this page are welcome and should be sent to firstname.lastname@example.org
Rev 1.0 - This page was last modified 2005-07-09 - Copyright © 2004-2005 A.C.Esteves