Corby Home page

 

 


This is the resources page of the Operation Manual for the Corby application. The other pages of the manual are:

 

Main page – This is the main page of the Operation Manual.

Using Corby – Describes the usual operation procedures.

Learning – Explains how Corby learns.

Options – Describes the options that control application behaviour.

Knowledge base – Describes back-up, restore and recovery procedures.

 

This page includes the following sections:

Overview

Learning context depth

Memory depth control

Learning depth control

Maximum time for an interactive response

Maximum time for a batch response

Text file mode

Maximum cache size

Maximum size of the Knowledge Base

 


Overview

The Corby application is very demanding in terms of computing resources. This is true of hard-disk bandwidth, but it is particularly acute as far as processing power is concerned. Personal computers available today are barely able to cope with it.

This should come as a surprise to no one, as what we are trying to do is simulate the human brain, which has billions of computing devices working in parallel: the neurones. It is true that each individual device, in itself, is not very powerful in terms of either functionality or speed. But when you consider billions of them working in parallel, that amounts to considerable computing power.

Every effort has been made during the design and coding phases of the Corby application to ensure that the program is as efficient as possible. But, at the end of the day, the only way to cope with the problem is to trade computing resources for quality, response time or both.

Corby lets you adjust some of those aspects to your machine, so that you can make the most of the computing resources that you have. This page contains several sections, each one dedicated to one parameter that you can adjust, explaining the trade-offs involved and how they affect program behaviour.

 

 


Learning context depth

This parameter appears in the Advanced Options dialog box and defines the size of the learning context, that is, the number of paragraphs before the current one that Corby keeps for learning purposes.

The basic learning model that Corby follows is based on stimulus-response pairs: the system must learn the appropriate response to a given stimulus. However, in many instances the response depends not only on the stimulus but also on the paragraphs that immediately precede it. In other words, it depends on the context.

To establish the required relationships between a response and the context, Corby scans all the elements in the current context. The bigger the context, the more elements Corby has to scan, and therefore the heavier the learning process is in terms of computing resources.

You should set this parameter to the lowest value that still allows the context to include all the information required in a learning situation.
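The effect of the context depth can be pictured with a small sketch. It is only an illustration, not Corby's actual code: the class `LearningContext` and its methods are hypothetical names, and treating each word as one "element" is an assumption made for the example.

```python
from collections import deque


class LearningContext:
    """Hypothetical model: keeps the last `depth` paragraphs for learning."""

    def __init__(self, depth):
        # A deque with maxlen drops the oldest paragraph automatically.
        self.paragraphs = deque(maxlen=depth)

    def add(self, paragraph):
        self.paragraphs.append(paragraph)

    def elements(self):
        # Assumption: every word in the context is one element to scan.
        return [word for p in self.paragraphs for word in p.split()]


context = LearningContext(depth=2)
for text in ["It is raining", "Take an umbrella", "Thank you"]:
    context.add(text)

print(list(context.paragraphs))  # the two most recent paragraphs
print(len(context.elements()))   # number of elements to scan per learning step
```

With a greater depth, more paragraphs stay in the window and the number of elements to scan for each learning item grows accordingly.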

 

 


Memory depth control

This parameter, which appears in the Advanced Options dialog box, is perhaps the most important of all in terms of how it affects the use of computing resources.

This parameter controls how many links each knowledge element maintains with other elements in the knowledge base. It indirectly affects things like the maximum size of the knowledge base and the quality of the concepts formed by Corby.

You should be very careful when changing this parameter, because the effects of a change will not be immediately apparent. Moreover, when you decrease this value, the program discards the least important links. This means losing some information, which can only be replaced through learning.

If the overall response time is far too slow on your system, you should decrease this value. If you conclude that the knowledge base is growing too slowly, you can increase this value, keeping in mind that Corby will then require more system resources in most of its operations.
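What happens when the memory depth is decreased can be sketched as follows. This is a simplified illustration under stated assumptions, not the program's real data structure: the function `prune_links` is a hypothetical name, and representing the importance of a link as a single number is an assumption.

```python
import heapq


def prune_links(links, memory_depth):
    """Keep only the `memory_depth` strongest links of one knowledge element.

    `links` maps the name of a linked element to an assumed numeric strength.
    """
    strongest = heapq.nlargest(
        memory_depth, links.items(), key=lambda item: item[1]
    )
    return dict(strongest)


links = {"cat": 0.9, "dog": 0.7, "bird": 0.1, "pet": 0.5}
pruned = prune_links(links, memory_depth=2)
print(pruned)  # only the two strongest links remain
```

The discarded links are gone for good in this sketch, which mirrors the warning above: raising the value again does not bring them back, only further learning does.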

 


Learning depth control

This parameter appears in the Advanced Options dialog box and controls how deeply Corby will look for correlations between a response and elements of the learning context. The greater the depth, the more efficient the learning process will be. The drawback is again the time needed to process a learning item.

You should lower the value of this parameter if the learning process takes too long, keeping in mind that by doing so you will decrease learning efficiency. You should increase it if doing so does not noticeably increase the response time.

 


Maximum time for an interactive response

The maximum time, expressed in seconds, that Corby spends looking for a response is determined by this parameter, which appears in the Advanced Options dialog box.

During the time allowed, the system will try to find the best possible response. The time that this process takes depends on several factors, such as the number of possible responses available and the level of inference that must be applied. If the time elapses, the system returns the best response it has found so far.

This is really the time that you are willing to wait for a response from the system. You can adjust it to any value you find comfortable, knowing that too small a value will give lower-quality responses.

Corby indicates in the Status Bar of the main window when a response has timed out. If you see many responses with that indication, it means that your computer is not fast enough and you must put up with a longer response time.
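The behaviour described in this section, searching until the time runs out and then returning the best response found so far, can be sketched like this. The function `best_response`, its arguments and the scoring function are all hypothetical; how Corby really evaluates candidate responses is not described on this page.

```python
import time


def best_response(candidates, score, max_seconds):
    """Return (best candidate found, timed_out flag).

    `candidates` is any iterable of possible responses and `score` is an
    assumed function that rates one candidate; higher is better.
    """
    deadline = time.monotonic() + max_seconds
    best, best_score, timed_out = None, float("-inf"), False
    for candidate in candidates:
        if time.monotonic() >= deadline:
            timed_out = True  # would be reported in the Status Bar
            break
        value = score(candidate)
        if value > best_score:
            best, best_score = candidate, value
    return best, timed_out


# Generous limit: every candidate is examined.
print(best_response(["a", "bbb", "cc"], score=len, max_seconds=5))
# No time at all: the search stops before examining anything.
print(best_response(["a", "bbb", "cc"], score=len, max_seconds=0))
```

A shorter limit means fewer candidates are examined before the deadline, which is why too small a value lowers the quality of the responses.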

 


Maximum time for a batch response

This is the equivalent of the "Maximum time for an interactive response" parameter, but for writing files. It appears in the Advanced Options dialog box.

The reason for having two parameters is that you probably want to set a different value for each situation: when Corby is writing a file, you are not required to attend to the system for each response it finds.

 


Text file mode

This group of buttons in the General Options dialog box defines how Corby should treat text files. It has three settings: "Chained", "Independent" and "Channel specific". The last one is not used in the current version of the software.

Usually a text file contains a set of paragraphs where each one follows logically from the preceding paragraph or, according to Corby’s base paradigm, one paragraph is the stimulus and the one that immediately follows it is the response. To process a file like this, you should use the "Chained" setting.

If, however, you have a file where each paragraph is independent of its neighbours, you should submit it with the "Independent" setting. Learning in this case will not be very efficient, but it is much faster. This is useful if you have a file containing, for instance, a set of famous quotations. But even with normal files, in the earlier phases of the learning process, when Corby is picking up the basic elements of the language, you can use this setting to speed things up.
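The difference between the two settings can be illustrated with a short sketch. The function `learning_items` is a hypothetical name, and representing an independent paragraph as a pair with no stimulus (`None`) is an assumption made for the example, not a description of Corby's internals.

```python
def learning_items(paragraphs, mode):
    """Turn the paragraphs of a text file into (stimulus, response) pairs."""
    if mode == "chained":
        # Each paragraph is the stimulus for the one that follows it.
        return list(zip(paragraphs, paragraphs[1:]))
    if mode == "independent":
        # Assumption: each paragraph is learned on its own, with no stimulus.
        return [(None, p) for p in paragraphs]
    raise ValueError(f"unsupported mode: {mode}")


paragraphs = ["How are you?", "Fine, thanks.", "Glad to hear it."]
print(learning_items(paragraphs, "chained"))
print(learning_items(paragraphs, "independent"))
```

In the chained case every paragraph except the first is tied to its predecessor, whereas in the independent case no such relationship has to be established, which is consistent with that setting being faster.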

 


Maximum cache size

In order to increase overall system performance, the knowledge base works in conjunction with a memory cache, where the most recently used records are kept. The cache size, expressed as a percentage of total system physical memory, is a parameter of the application that can be set in the General Options dialog box.

Ideally, the cache should keep the records most likely to be needed in a given situation. That, however, is impossible to determine. Therefore, it keeps the most recently used records in the hope that they will be reused later.

You should adjust the cache size to the value that gives the best overall results, knowing that too small a cache increases hard-disk activity and too large a cache slows down access to all the records in the cache.
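A cache that keeps the most recently used records can be sketched as below. This is a minimal illustration, not Corby's implementation: the class `RecordCache` and the `load_record` callback are hypothetical, and the size is counted in records for simplicity, whereas the real parameter is a percentage of physical memory.

```python
from collections import OrderedDict


class RecordCache:
    """Hypothetical cache that keeps the most recently used records."""

    def __init__(self, max_records, load_record):
        self.max_records = max_records
        self.load_record = load_record  # stands in for a hard-disk read
        self.records = OrderedDict()
        self.disk_reads = 0

    def get(self, key):
        if key in self.records:
            # Cache hit: mark the record as the most recently used.
            self.records.move_to_end(key)
            return self.records[key]
        # Cache miss: read from disk, then evict the least recently used.
        self.disk_reads += 1
        value = self.load_record(key)
        self.records[key] = value
        if len(self.records) > self.max_records:
            self.records.popitem(last=False)
        return value


cache = RecordCache(max_records=2, load_record=lambda key: f"record {key}")
for key in [1, 2, 1, 3, 2]:
    cache.get(key)

print(cache.disk_reads)     # number of simulated hard-disk reads
print(list(cache.records))  # keys still in the cache, oldest first
```

With a larger `max_records`, fewer of the requests in this example would go to disk, which is the hard-disk activity that a very small cache increases.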

 


Maximum size of the Knowledge Base

This parameter appears in the General Options dialog box and specifies the maximum size of the knowledge base, expressed in Mbytes.

When the real size of the knowledge base, as shown in the Statistics dialog box, exceeds the value set by this parameter, Corby starts deleting records, according to their relative importance. If you have read the Knowledge base page of the Operation Manual already, you know by now that each record in the knowledge base contains many references to other records and that Corby cannot deal with a reference to a non-existent record.

Therefore, a record that is referenced by any other record in the knowledge base cannot be deleted because referential integrity would be compromised. Corby approaches this problem by progressively weakening all the links in all records in the knowledge base. When a link goes below a given threshold, it is broken. Note that deleting a link to a record does not compromise referential integrity; only deleting a referenced record is problematic. As soon as a record has no more references to it, it can be safely deleted.

This process, as you can imagine, is very heavy in terms of computing resources. It is therefore handled by a low-priority background process that only runs when there is nothing else to do. Consequently, you should not expect Corby to bring the knowledge base down to the size you want instantly. It will take some time, especially if the application is doing something else.

Of course the TANSTAAFL (There ain’t no such thing as a free lunch) principle applies here: using the above process will cause Corby to forget things. Fortunately, the things it forgets are the least important ones, given the orderly progress of the deletion process. Also, the most valuable things in the knowledge base, the concepts, are less likely to be affected by the process. This weakening of the links simulates the natural decay that explains why we forget things over time.

Ultimately, you set this parameter to the size you intend for the knowledge base. However, while the knowledge base is still below that limit, you should set it to a value slightly above the current size. The reason for this latter setting is that link and record deletion is actually beneficial in some circumstances. Suppose, for instance, that you make a spelling mistake in a sentence submitted to Corby. This is going to cause the creation of links and records in the knowledge base that serve no useful purpose. These links, as they are seldom reinforced (unless the error is systematic), will be the first ones to go in the process described above.
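The weakening process described in this section can be sketched as follows. The sketch rests on several assumptions: the function names, the representation of a link as a (source, target) pair with a numeric strength, and the values chosen for the weakening factor and the threshold are all hypothetical and serve only to show the idea.

```python
def weaken_links(links, factor, threshold):
    """Weaken every link and break those that fall below the threshold.

    `links` maps a (source, target) pair of record names to a strength.
    """
    weakened = {pair: strength * factor for pair, strength in links.items()}
    return {pair: s for pair, s in weakened.items() if s >= threshold}


def deletable_records(records, links):
    """Records that no remaining link refers to can be deleted safely."""
    referenced = {target for _source, target in links}
    return records - referenced


records = {"cat", "cta", "pet"}
links = {
    ("pet", "cat"): 0.9,  # reinforced often
    ("pet", "cta"): 0.3,  # created by a spelling mistake, seldom reinforced
    ("cat", "pet"): 0.8,
}

links = weaken_links(links, factor=0.5, threshold=0.2)
print(sorted(links))                      # the weak link has been broken
print(deletable_records(records, links))  # the misspelt record can now go
```

Breaking the link never leaves a dangling reference, because the record it pointed to is only removed afterwards, once nothing refers to it any more.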

 


Feedback

Comments and suggestions about this page are welcome and should be sent to fadevelop@clix.pt

 

 


Rev 1.0 - This page was last modified 2005-07-11 - Copyright © 2004-2005 A.C.Esteves
