Up: The Alchemy Lite Tutorial Previous: 5 Probabilistic Relational Modeling

Overview of Code

We are working on adding more functionality to Alchemy Lite. Current lines of research include:

- Learning TPKBs
- Parsing

If you are inclined to add your own functionality, however, this page can serve as a rough guide to the Alchemy Lite source files. Understanding the code is not necessary to use Alchemy Lite out of the box. :)

High-level overview
Alchemy Lite takes in a rule file and an evidence file. The rule file contains information about the classes in the model; the evidence file contains information about particular objects. (Note: The evidence file can be blank.) The main data structure, the TMLKB, reads in the information from the files and runs inference.

The rule file is read in first (function: readInTMLRules). Afterwards, the TMLKB holds an array of TMLClass objects that store the information about each class. If the information in the rule file violates a TML constraint, an error message is printed and Alchemy Lite exits.

The evidence file is read second (function: readInTMLFacts). As information about objects is read in, the structure of object:class nodes that represents the KB is created. After all the evidence about objects has been read in, the KB structure is fleshed out for object:class pairs that were not mentioned in the evidence file. The function that does this, fillOutSPN, also computes the partition function of the KB given the evidence in the evidence file.

If a query was specified on the command line (using the -q flag) or MAP inference was requested (using the -map flag), the TMLKB object runs inference and returns an answer. For query inference, the function computeQueryOrAddEvidence is called. (This function can also be used to add new evidence in interactive mode.) For MAP inference, the computeMAPState function is called first; it runs the upward pass of the dynamic programming algorithm, replacing sum nodes with max nodes and setting pointers to the highest-weighted children. The printMAPState function then runs a downward pass through the KB and prints the MAP subclasses and relations of objects for which evidence is not known.
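The MAP computation above pairs an upward max pass with a downward pass that follows the stored pointers. What follows is a minimal sketch of that idea on a toy tree, not Alchemy Lite's code. MapNode, map_up, and map_down are hypothetical names. Weights are assumed to be in log space, so a product node adds its children's values. The downward pass collects leaf labels instead of printing MAP subclasses and relations.

```c
/* Hypothetical node kinds for a toy SPN; not Alchemy Lite's Node type. */
typedef enum { MAP_LEAF, MAP_SUM, MAP_PRODUCT } MapKind;

typedef struct MapNode {
    MapKind kind;
    const char *label;        /* leaf label, e.g. "Smiths:OneParentFamily" */
    double leaf_logw;         /* log value of a leaf */
    struct MapNode **kids;
    const double *edge_logw;  /* log weights on a sum node's edges */
    int nkids;
    int best;                 /* index of the best child, set by map_up at sum nodes */
} MapNode;

/* Upward pass: treat each sum node as a max node and remember its best child. */
double map_up(MapNode *n) {
    if (n->kind == MAP_LEAF) return n->leaf_logw;
    if (n->kind == MAP_PRODUCT) {
        double total = 0.0;   /* log of a product is the sum of the logs */
        for (int i = 0; i < n->nkids; i++) total += map_up(n->kids[i]);
        return total;
    }
    double best_v = 0.0;
    n->best = -1;
    for (int i = 0; i < n->nkids; i++) {
        double v = n->edge_logw[i] + map_up(n->kids[i]);
        if (n->best < 0 || v > best_v) { best_v = v; n->best = i; }
    }
    return best_v;
}

/* Downward pass: follow the best pointer at max nodes and every child at product
   nodes, collecting the labels of MAP leaves. Returns the new count. */
int map_down(const MapNode *n, const char **out, int count, int cap) {
    if (n->kind == MAP_LEAF) {
        if (count < cap) out[count++] = n->label;
        return count;
    }
    if (n->kind == MAP_SUM)
        return n->best >= 0 ? map_down(n->kids[n->best], out, count, cap) : count;
    for (int i = 0; i < n->nkids; i++) count = map_down(n->kids[i], out, count, cap);
    return count;
}
```

Calling map_up on the root and then map_down on the same root yields the leaves of the highest-weighted configuration. Storing the argmax pointer at each max node means the downward pass is a single traversal that never re-scores a subtree.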

If Alchemy Lite enters interactive mode, the user can add simple facts (i.e., new subclass information, names for subparts, or relation evidence) or query the KB. Interactive mode also lets the user print out a new version of the evidence file if new facts were added during the session.

For all queries (MAP and otherwise), both the command line and interactive mode allow the user to specify a file in which to save the results of inference. If no file is specified, Alchemy Lite prints the results to the screen.
Middle-level overview
The parsing functions rely on getLineToSemicolon(), which reads in the next "line" of the file (i.e., up to the next semicolon or end bracket). This function ignores comments at the ends of lines, as well as lines that contain only whitespace or comments. If more than one semantic line occurs before a line break, the function stores the rest of the physical line in a char* owned by the calling parsing function.
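A reader in the spirit of getLineToSemicolon() can be sketched over an in-memory string. This is not the actual function. next_logical_line is a hypothetical name, and the `//` comment syntax is an assumption. The cursor argument plays the role of the leftover char* that the parsing function owns.

```c
#include <ctype.h>
#include <stddef.h>

/* Characters that end a logical line: a semicolon or a closing bracket. */
static int is_terminator(char c) { return c == ';' || c == '}'; }

/* Skip a "//" comment (assumed syntax) up to, but not including, the newline. */
static const char *skip_comment(const char *p) {
    while (*p && *p != '\n') p++;
    return p;
}

/* Copy the next logical line at *cursor into out (at most cap-1 chars), skipping
   whitespace-only text and comments, and advance *cursor past the terminator.
   Returns 1 if a line was read, 0 at end of input, -1 if cap is 0. */
int next_logical_line(const char **cursor, char *out, size_t cap) {
    if (cap == 0) return -1;
    const char *p = *cursor;
    size_t n = 0;
    /* Skip leading whitespace and lines that are only comments. */
    for (;;) {
        while (*p && isspace((unsigned char)*p)) p++;
        if (p[0] == '/' && p[1] == '/') { p = skip_comment(p); continue; }
        break;
    }
    if (*p == '\0') { *cursor = p; return 0; }
    /* Copy until a terminator, dropping end-of-line comments. */
    while (*p) {
        if (p[0] == '/' && p[1] == '/') { p = skip_comment(p); continue; }
        char c = *p++;
        if (n + 1 < cap) out[n++] = c;
        if (is_terminator(c)) break;
    }
    out[n] = '\0';
    *cursor = p;  /* the rest of the physical line stays available for the next call */
    return 1;
}
```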
- readInTMLRules()
  Alchemy Lite reads in the class information in three passes. The first pass counts the classes so that an array of the proper size can be allocated. The second pass reads in the names of all the classes and sets up the TMLClass objects. The third pass reads in the information about each class. Because every class name is known before any class body is parsed, the parser can look up the classes of subclasses and subparts while reading a particular class, and it can reject references to unknown classes.
  
  In the third pass, readInTMLRules() calls readInOneTMLClass to read in each class's information. readInOneTMLClass works in two passes: first the subclasses and subparts, then the relations and attributes. Relations are defined over subparts, so the subparts must be identified first. Depending on the line being read, readInOneTMLClass calls readInSubclasses, readInSubparts, readInRelations, or readInAttribute.
  
  When subpart information is read in, Alchemy Lite checks the ancestors of the class to see whether the part overrides one declared higher up. For example, a Family may have 2 parents by default, but a OneParentFamily may override this value to 1. This information is added to the ancestor class so that inference waits until it reaches the finest subclass that contains a particular part before computing the subpart trees for that part. Continuing the Family example, if the Smiths are known to be a OneParentFamily, then during inference only the single parent's subpart branch is examined, from the Smiths:OneParentFamily node, and the parent branches are ignored at the Smiths:Family node. If we only know that the Smiths are a Family, then the two parent subpart branches are examined from the Smiths:Family node.
  
  The same process occurs when reading in relations and attributes. If a relation with the same name has been defined in an ancestor class, Alchemy Lite (1) checks that the two definitions are the same, and (2) annotates the ancestor class to record that its relation will be overridden in that subclass branch. This is needed because the number of possible ground relations for an object may differ depending on its subclass. If an attribute has been declared in an ancestor class, any values it has in the ancestor that are not listed in the finer class are added to the finer attribute's list of possible values. For values defined in both classes, the two weights are added together in the finer class's attribute.
  After all the class information is read in, Alchemy Lite checks that there is only one Top Class, i.e., a class that is neither a subclass nor a subpart of any other class and has no descendant that is a subpart of another class. In the future, this check could become a warning, with the extra top-level classes ignored.
- readInTMLFacts()
  This function reads in the evidence file. Instead of creating a full KB with a node for every object:class pair and then removing branches based on evidence, it creates nodes only for the object:class pairs for which there is evidence. Afterwards, any remaining nodes that are possible given the evidence are created. For example, if the evidence file says that the Smiths are a OneParentFamily, then no branch for Smiths.Adult[2] is ever created. This can reduce memory use.
  
  An object is assigned a class either when it is declared as a subpart of another object or on the first line of its description (e.g., Family Smiths {). Alchemy Lite initializes the object by creating a sequence of Nodes that represent the object's classes. Branches for classes that the object cannot belong to, given its initial class information, are never created. For each object description, Alchemy Lite calls readInOneObject(). This function processes the description in two passes, for the same reason as with classes: the first reads in subclasses and subparts, and the second reads in relations and attributes. Once the subclasses line has been read in (or if there is no such line), Alchemy Lite fills in the rest of the possible object:class Nodes for the object (fillOutSubclasses).
  
  Each Node stores a set of counts for each relation of the class in its object:class pair. These counts record how many positive, negative, and unknown groundings of that relation the object has, and they are used during inference. (The function that computes a relation's weight for a given node during inference is relWeight.) String versions of known relation facts are stored, together with their polarity, in a hash map for each object. When queried about a relation, Alchemy Lite first checks the object's hash map; inference is run only if the relation is not found.
- fillOutSPN
  This function creates Nodes for subparts that were left unnamed in the evidence file. It also computes log(Z), the log of the KB's partition function given the evidence file.
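The ancestor check that readInTMLRules() performs when a subpart is declared can be sketched as follows. Part, Class, and mark_overridden_part are hypothetical simplifications of the records in TMLClass.h. The sketch assumes each class has at most one superclass. It also simplifies the bookkeeping: it flags same-named ancestor parts but does not record which subclass branch overrides them.

```c
#include <string.h>

/* Hypothetical, simplified part record. */
typedef struct {
    const char *name;   /* e.g. "Adult" */
    int count;          /* e.g. 2 in Family, 1 in OneParentFamily */
    int overridden;     /* set when a finer subclass redeclares this part */
} Part;

/* Hypothetical, simplified class record. */
typedef struct Class {
    const char *name;
    struct Class *parent;  /* superclass, or NULL for a class with none */
    Part *parts;
    int nparts;
} Class;

/* Called when `cls` declares a part named `part_name`: flag every same-named part
   in its ancestors as overridden. Returns how many ancestor parts were flagged. */
int mark_overridden_part(Class *cls, const char *part_name) {
    int flagged = 0;
    for (Class *a = cls->parent; a != NULL; a = a->parent)
        for (int i = 0; i < a->nparts; i++)
            if (strcmp(a->parts[i].name, part_name) == 0) {
                a->parts[i].overridden = 1;
                flagged++;
            }
    return flagged;
}
```

With Family declaring Adult and OneParentFamily redeclaring it, the call for OneParentFamily flags Family's Adult part, which is what lets inference defer that branch to the finer node.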
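The attribute merge described for readInTMLRules() can be sketched as a single pass over the ancestor's values. AttrValue, Attribute, MAX_VALUES, and merge_inherited_attribute are hypothetical names. The fixed-size array stands in for whatever storage TMLAttribute actually uses.

```c
#include <string.h>

#define MAX_VALUES 16  /* hypothetical capacity for this sketch */

typedef struct {
    const char *value;
    double weight;
} AttrValue;

typedef struct {
    AttrValue vals[MAX_VALUES];
    int n;
} Attribute;

/* Merge an ancestor's attribute values into the finer class's attribute:
   values present in both get their weights added; ancestor-only values are
   appended. Returns 0 on success, -1 if the finer attribute runs out of room. */
int merge_inherited_attribute(Attribute *finer, const Attribute *ancestor) {
    for (int i = 0; i < ancestor->n; i++) {
        int j;
        for (j = 0; j < finer->n; j++)
            if (strcmp(finer->vals[j].value, ancestor->vals[i].value) == 0) break;
        if (j < finer->n) {
            finer->vals[j].weight += ancestor->vals[i].weight;  /* defined in both */
        } else {
            if (finer->n == MAX_VALUES) return -1;
            finer->vals[finer->n++] = ancestor->vals[i];        /* inherited only */
        }
    }
    return 0;
}
```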
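The Top Class check at the end of readInTMLRules() can be sketched over a flattened class table. ClassInfo and find_top_class are hypothetical. The sketch assumes the class hierarchy is a tree, so walking parent links terminates. It also assumes that the classes used as subpart classes are already known.

```c
#include <stdlib.h>

/* Hypothetical flattened class table entry. */
typedef struct {
    int parent;         /* index of the superclass, or -1 if none */
    int is_part_class;  /* 1 if some subpart is declared with this class */
} ClassInfo;

/* Returns the index of the single Top Class, or -1 if there is not exactly one
   (or if allocation fails). A root class is disqualified if it, or any of its
   descendants, is the class of some subpart. */
int find_top_class(const ClassInfo *cls, int n) {
    if (n <= 0) return -1;
    int *used = calloc((size_t)n, sizeof *used);
    if (used == NULL) return -1;
    /* Mark every part class together with all of its ancestors. */
    for (int i = 0; i < n; i++)
        if (cls[i].is_part_class)
            for (int a = i; a != -1; a = cls[a].parent) used[a] = 1;
    int top = -1, roots = 0;
    for (int i = 0; i < n; i++)
        if (cls[i].parent == -1 && !used[i]) { roots++; top = i; }
    free(used);
    return roots == 1 ? top : -1;
}
```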
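The fillOutSubclasses step of readInTMLFacts() can be sketched as building nodes only at and below the object's known class. Class, Node, fill_out_subclasses, count_nodes, and free_nodes are hypothetical simplifications; the real Node is described in Node.h. The sketch ignores relations, attributes, and subparts. It only illustrates why branches outside the known class are never allocated.

```c
#include <stdlib.h>

/* Hypothetical class tree: each class lists its direct subclasses. */
typedef struct Class {
    const char *name;
    struct Class **subs;
    int nsubs;
} Class;

/* Hypothetical object:class node. */
typedef struct Node {
    const Class *cls;
    struct Node **children;
    int nchildren;
} Node;

void free_nodes(Node *n);

/* Create the node for `cls` and, recursively, nodes for all of its subclasses.
   Returns NULL on allocation failure (after freeing anything already built). */
Node *fill_out_subclasses(const Class *cls) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->cls = cls;
    n->nchildren = 0;
    n->children = NULL;
    if (cls->nsubs > 0) {
        n->children = calloc((size_t)cls->nsubs, sizeof *n->children);
        if (n->children == NULL) { free(n); return NULL; }
    }
    for (int i = 0; i < cls->nsubs; i++) {
        Node *child = fill_out_subclasses(cls->subs[i]);
        if (child == NULL) { free_nodes(n); return NULL; }
        n->children[n->nchildren++] = child;
    }
    return n;
}

/* Count the nodes in the subtree rooted at n. */
int count_nodes(const Node *n) {
    int total = 1;
    for (int i = 0; i < n->nchildren; i++) total += count_nodes(n->children[i]);
    return total;
}

/* Free the subtree rooted at n. */
void free_nodes(Node *n) {
    for (int i = 0; i < n->nchildren; i++) free_nodes(n->children[i]);
    free(n->children);
    free(n);
}
```

Starting from Family builds nodes for Family and both of its subclasses. Starting from an object already known to be a OneParentFamily builds a single node, so the TwoParentFamily branch never exists for that object.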
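The relation query path described for readInTMLFacts() checks the object's known facts first and runs inference only on a miss. It can be sketched as follows. KnownFact, InferFn, and query_relation are hypothetical. A linear scan over an array stands in for the per-object uthash table, so the example needs no external header.

```c
#include <string.h>

/* Hypothetical known-fact record: a ground relation string and its polarity. */
typedef struct {
    const char *atom;  /* ground relation written as a string */
    int polarity;      /* 1 = known true, 0 = known false */
} KnownFact;

/* Hypothetical inference callback returning P(atom) given the evidence. */
typedef double (*InferFn)(const char *atom);

/* Return 1.0 or 0.0 for an atom found among the object's known facts;
   otherwise fall back to inference. */
double query_relation(const KnownFact *facts, size_t nfacts,
                      const char *atom, InferFn infer) {
    for (size_t i = 0; i < nfacts; i++)
        if (strcmp(facts[i].atom, atom) == 0)
            return facts[i].polarity ? 1.0 : 0.0;
    return infer(atom);
}
```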
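The partition function that fillOutSPN computes comes from a bottom-up evaluation of the SPN. Leaves give their values, sum nodes take weighted sums, and product nodes multiply. SpnNode and spn_value are hypothetical names. For readability, this sketch computes Z directly. An implementation that reports log(Z), as fillOutSPN does, would more likely accumulate log values to avoid numerical underflow on large KBs.

```c
/* Hypothetical toy SPN node. */
typedef enum { SPN_LEAF, SPN_SUM, SPN_PRODUCT } SpnKind;

typedef struct SpnNode {
    SpnKind kind;
    double value;            /* leaf value */
    struct SpnNode **kids;
    const double *weights;   /* edge weights, used only by sum nodes */
    int nkids;
} SpnNode;

/* Bottom-up evaluation: at the root this yields Z. */
double spn_value(const SpnNode *n) {
    double acc;
    switch (n->kind) {
    case SPN_LEAF:
        return n->value;
    case SPN_SUM:
        acc = 0.0;
        for (int i = 0; i < n->nkids; i++) acc += n->weights[i] * spn_value(n->kids[i]);
        return acc;
    case SPN_PRODUCT:
        acc = 1.0;
        for (int i = 0; i < n->nkids; i++) acc *= spn_value(n->kids[i]);
        return acc;
    }
    return 0.0;  /* unreachable for valid kinds */
}
```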
Low-level overview
- Alchemy Lite is written in C, which has no built-in general-purpose data structures such as hash tables, so we use uthash (http://troydhanson.github.io/uthash/, written by Troy D. Hanson) for hash tables. The uthash files are packaged with the Alchemy Lite release, so no outside downloads or installations are required. For the functions used to work with uthash hash tables, see the uthash website above, which has a very detailed user guide.
- Data Structures
  The main data structure in Alchemy Lite is the TMLKB which stores all the information about the TML knowledge base. The TMLKB stores:
  - An array of all the classes (stored as TMLClass objects) and a pointer to the Top Class (topcl)
  - A pointer to the root of the knowledge base, stored as a Name_and_Ptr object that stores the name of the Top Object and a pointer to the root node
  - Hash maps from object names/pathnames to Nodes in the knowledge base
  - A hash map from object names to hash maps of known relation facts for each object
  - The value of the KB (i.e., the value of the SPN created from the KB)
  - A stack of edits (KBEdit*s) created during interactive mode, stored so that the user can reset the database
  - classToObjPtrs, a numClasses-sized array of hash maps from object names to each object's coarsest-typed node
  The data structure for class information is the TMLClass. The data structure for a relation (for a particular class) is TMLRelation. The information about a subpart (e.g., name, class, whether it is overridden at finer subclasses) is in a TMLPart. Information about an attribute is in TMLAttribute, with TMLAttrValue storing information about a particular value of an attribute. More detailed information can be found in TMLClass.h.
  
  The data structure for an object:class node in the SPN created by a TMLKB is a Node. It stores information about the parent and child nodes it connects to as well as information about the object at that level of class information. More detailed information can be found in Node.h.
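One plausible way to support resetting is a linked stack that is unwound in reverse order when the user resets the KB. The fields of this KBEdit and the helpers push_edit and reset_edits are hypothetical. The real KBEdit in the Alchemy Lite source may store different information, and the undo callback stands in for however each edit is actually reverted.

```c
#include <stdlib.h>

/* Hypothetical edit record. */
typedef struct KBEdit {
    const char *fact;     /* the fact that was added, as entered */
    struct KBEdit *next;  /* the previous (older) edit */
} KBEdit;

/* Push a new edit on top of the stack. Returns 0 on success, -1 on allocation failure. */
int push_edit(KBEdit **top, const char *fact) {
    KBEdit *e = malloc(sizeof *e);
    if (e == NULL) return -1;
    e->fact = fact;
    e->next = *top;
    *top = e;
    return 0;
}

/* Undo all edits, most recent first, so each undo sees the state its edit produced.
   Returns the number of edits undone. */
int reset_edits(KBEdit **top, void (*undo)(const char *fact)) {
    int undone = 0;
    while (*top != NULL) {
        KBEdit *e = *top;
        *top = e->next;
        if (undo != NULL) undo(e->fact);
        free(e);
        undone++;
    }
    return undone;
}
```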



Chloe Kiddon 2013-04-01