Next: 3 The Basics Up: The Alchemy Tutorial Previous: The Alchemy Tutorial

2 Tractable Markov Logic

TML is a language for building models of domains that have both complex logical structure and uncertainty. It can represent objects, classes, and relations between objects, subject to certain restrictions which ensure that inference in any model built in TML (we call such models Tractable Probabilistic Knowledge Bases, or TPKBs) can be queried efficiently. In TML, the structure of objects is determined by their class memberships. A class determines what subparts any instance object has, as well as what relations are allowed over its subparts, and with what probability those relations hold. Classes may also describe multi-valued attributes, along with, for each attribute, a probability distribution over its possible values. The class itself may have subclasses, which describe further structure, as well as a probability distribution over possible subclass memberships. All probabilities are specified with weights by way of log-linear models. Relations are restricted to only be over the subparts of the same object, which prevents the model from becoming intractable while still allowing for intuitive models. Below we described TML in enough detail to start using it. For more information, consult the original TML paper.

Structure

A TPKB is a set of class declarations and object declarations that obey certain restrictions.

A class declaration for class ClassName specifies the subparts, subclasses, relations and attributes for ClassName, as well as weights for the subclasses, relations, and attributes.

class ClassName { subclasses Subclass1 wt1, Subclass2 wt2, ..., SubclassN wtN; subparts SubpartClass1[n], SubpartClass2 SubpartName2[n], SubpartClass3 SubpartName3, SubpartClass4, ...; relations Relation(SubpartArg,...,SubpartArg) wt, HardRelation(SubpartArg), !NegHardRelation(), ....; ... }

When we specify that a class ClassName has subclasses SubClass1,...,SubClassN, it means that any object of class ClassName must belong to exactly one of the subclasses SubClass1,...,SubClassN, with the probability distribution over subclass memberships determined by the weights w1 , ...,wN . For instance, the class Family may have subclasses TwoParentFamily, which has two Adult parts, and OneParentFamily, with one, with respective weights 1.2 and 0.3. We refer to the subclasses of class ClassName as S(ClassName).

The subparts of ClassName are parts which every object of class ClassName must have and are specified by the class membership of that part, SubpartClass, an optional part name SubpartName, and the number of parts of that type where the number is optional and 1 by default. For instance an object of class TwoParentFamily might have a single Pet subpart of class Animal and two Adult subparts, each of class Person, so if the Smiths are a TwoParentFamily then we can refer to their pet as Smiths.Pet and the adults as Smiths.Adult[1] and Smiths.Adult[2]. We refer to the subparts of class ClassName as P(ClassName).

The number and class of subparts can be overridden by declarations in subclasses. For example, the default for a Family might be 2 Adults. However, its subclass OneParentFamily might override the default to have only 1 Adult subpart. The classes of subparts can also be overridden, but the overriding classes must be subclasses of the declarations at the higher levels. For example, a Family might default to 1 Animal as a Pet. Then a CatLadyFamily might override Family to have ten Cats as Pets. However, you could not have a subclass that has a Toaster as a Pet since Toaster is not a subclass of Animal.

In Alchemy Lite, all subparts with the same subpart name are treated the same in relations; that is, you cannot specify a relation with an argument that is a particular indexed subpart with some subpart name. To illustrate, if TwoParentFamily was defined with the subpart line

subparts Adult[2];

then you could define the relation Married(Adult,Adult), which would create four groundings. However you cannot define the relation Married(Adult[1],Adult[2]). If two subparts are going to be treated differently in some relation(s), then they must be defined with different subpart names. E.g.:

subparts Adult1, Adult2;
relations Married(Adult1,Adult2) 1.2

The relations of ClassName specify what relationships may hold among the subparts of any object of class ClassName, and with what weight. For instance, the class TwoParentFamily may have the relation Married(Adult1, Adult2).

Relations may apply to the object as a whole, rather than to its subparts, in which case we can omit parentheses and arguments, for example the relation Mortgage may apply to a family, and be true if that family holds a mortgage. Note that relations may only take an object and/or its subparts as arguments. This restriction is key in maintaining tractability. Negated relations are written with a ! immediately preceding them. The reason we use weights rather than probabilities is compactness - the weight of a relation rule in a class need only represent the change in the relation's log probability from whatever ancestor class it was introduced at. For instance, if TwoParentFamily and and OneParentFamily are both subclasses of class Family, and two- and one-parent families are equally likely to hold a mortgage, then the Mortgage relation need only appear at the Family class level. However, if two parent families are more likely to hold a mortgage, then Mortgage would appear again under TwoParentFamily with positive weight. We refer to the relations of class ClassName as R(ClassName).

The weights of relations can be overridden in subclasses; however, the relation itself must contain the same arguments.

Relations can be more general than those defined above, specifically, we can have multivalued (as opposed to binary-valued) attributes. However, these are handled semantically like subclasses, so we omit them here for brevity but describe how to use them in the next section.

Object declarations introduce instances of classes, along with whatever information is known about the object declared. For example,

Family Smiths { TwoParentFamily; Adult[1] Sam, Adult[2] Avery, Child[1] Anna; Parent(Sam,Anna), !Parent(Avery,Anna); }

This describes a family, the Smiths, which has two parents, named Sam and Avery, and one child, named Anna, in which Sam is Anna's biological parent but Avery is not.

Restrictions

There are a number of structural restrictions on class and object declarations necessary to ensure that inference is tractable. To describe these we must first define a notion of ancestor/descendent in the TPKB. It is useful to think of an object ObjectName at a particular class level ClassName, referred to as (ObjectName,ClassName). We extend this notation to include the ground atom R(ObjectName,...), which refers to the relation R(...) of class ClassName grounded with the relevant subparts of object ObjectName. With the, we can formalize the notion of a descendant.

The descendants of the pair (ObjectName,ClassName) in TPBK K are

1. (P(ObjectName),SubpartClass) for each P (with class SubpartClass) a subpart of ObjectName according to ClassName.

2. (ObjectName,SubclassName) for each SubclassName a subclass of ClassName.

3. R(ObjectName,...) for each relation R of ClassName

4. The descendants of each of the above.

There are no other descendants of (ObjectName,ClassName), and the first three types of descendant are referred to as children of (ObjectName,ClassName).

With this, the full definition of a TPKB is straightforward. The overarching purpose of the following definition is to allow the distribution defined by the TPKB to have an efficient, recursive decomposition. The first restriction gives the decomposition a starting point. The second ensures that the subparts of an object are independent, not mentioning anything they can disagree on, so that the distribution factors as a product over them. The third prevents contradictory relation truth values. The fourth prevents an object from having itself as a descendant, which is nonsensical, or from having infinitely many descendants.

A set K of class and object declarations is a TPKB iff it satisfies the following conditions:

1. There is a single object, TopObject which is the sole instance of class TopClass, such that (TopObject,TopClass) has every other (ObjectName,ClassName) in K as a descendant.

2. For any two subparts PartA(ObjectName) and PartB(ObjectName) (introduced with classes ClassA and ClassB) of object ObjectName , (PartA(ObjectName),ClassA) and (PartB(ObjectName),ClassB) share as descendants only objects (ObjectNameX,ClassX) such that ClassX has no superclasses, subclasses, or soft relations. Note that this holds even if the two subparts are introduced in different possible classes of ObjectName.

3. If (ObjectName,ClassName) has a hard relation R(ObjectName,...) as a child, then it may have no descendant R(ObjectName,...) that is soft or !R(ObjectName,...) that is hard.

4. No object (ObjectName,ClassName) may have a descendant of form (ObjectName2,ClassName) or (ObjectName,ClassName2) with ClassName distinct from ClassName2 and ObjectName distinct from ObjectName2, which is to say that no object may have itself as a descendent, or have an descendent with the same class as itself.

Semantics

A TPKB defines a distribution over possible worlds. However, TPKBs employ a much richer kind of possible world, which consists of a set of objects and the truth values of the atoms that involve them. Different possible worlds may contain different objects, atoms, and truth values. A key feature that makes TPKBs easy to reason about is that atoms appear in a possible world only when their arguments also appear in that world. More concretely, applying this to the earlier example with children and hair color, the hair color predicate applied to the child will only be present in worlds where the child actually exists, that is, worlds in which the family in question have a child.

The possible objects of a TPKB are the top object and every subpart of a possible object, under every possible class of that object. No other objects are possible.

We define the class membership predicate Is, where Is(ObjectName, ClassName) is true if object ObjectName is of class ClassName and false otherwise. A predicate (e.g., Is) applied to a set of arguments (e.g., ObjectName and ClassName) is an atom (e.g., Is(ObjectName, ClassName)). Likewise, R(ObjectName, ...) from earlier can be thought of as a relation atom which is true if the relation holds on ObjectName (or its subparts) and false otherwise. We refer to class membership and relation atoms and their negations as literals.

A possible world W of a TPKB K is an ordered pair (X,L) where X is a set of objects and L is a set of literals that satisfies the following requirements:

1. TopObject is in X and Is(TopObject,TopClass) is in L.

2. If ObjectName is in X and Is(ObjectName,ClassName) is in L, then

(a) if SubclassName is a subclass of ClassName then either Is(ObjectName,SubclassName) is in L or !Is(ObjectName,SubclassName) is in L,

(b) exactly one of the positive subclass literals Is(ObjectName,SubclassName) is in L,

(c) if R is a relation belonging to ClassName then either R(ObjectName,...) is in L or !R(ObjectName,...) is in L, and

(d) The subparts of ObjectName according to ClassName are all in X, and Is(SubpartName,SubpartClass) is in L for each part SubpartName of ObjectName and corresponding class SubpartClass according to ClassName.

3. No other objects are members of X and no other literals are members of L.

A closely related notion is that of a possible subworld of an object ObjectName at class ClassName, which is just the definition above except with ObjectName in place of TopObject and ClassName in place of TopClass.

We say that a possible world W=(X, L) does not contradict a literal l if !l is not in L (note that this does not imply that l is in L). Recall that facts are stated in object declarations, and function either to name a subpart of an object, specify a class that that object is in or a set of classes it isn't in, or specify the truth value of a relation. Facts are implemented as follows. If K contains the assertion that relation R of object O is true (false), then R(O, ...) (!R(O, ...)) is included in K. A fact declaring that O is of class S_i means that both Is(O, S_i) and !Is(O,S_j), for every j not equal to i (assuming the S's are subclasses of some other class), are in K. However, if the fact is just that O is not of class S_i, this simply means that !Is(O, S_i) is in K.To denote the set of possible worlds of (O,C) that do not contradict any of the facts in a TPKB K we write W_K(O, C). We are now ready to specify the distribution that a TPKB K defines over its possible worlds. This distribution is constructed recursively from distributions over possible subworlds of object-class pairs in K. The unnormalized distribution over possible subworlds W = (X, L) of (O,C) can be written recursively, φ_K(O,C,W) = 0 if !Is(O,C) is in L and otherwise,

(1)

Where

φ

^R_K(O,R_k,W) = e^w_k if R_k(O,...) is in L,

φ

^R_K(O,R_k,W) = 1 if !R_k(O,...) is in L, and

φ

^R_K(O,R_k,W) = 1 + e^w_k if neither atom appears in L. If R_k is hard, then !R_k(O,...) can be interpreted as !Is(O,C). If an object has no subparts, no subclasses, or no relations, the corresponding terms of the above equation are just 1. The w's are weights of the respective subclasses and relations and the n_i's are the numbers of the respective subparts P_i.

The unnormalized distribution over possible worlds W of TPKB K is $φ$ _K(TopObject,TopClass,W) and the partition function of K is

(2)

The probability of a possible world is

(3)

Inference

TPKBs as implemented in Alchemy Lite currently perform two basic types of inference, conditional marginal and MAP. Conditional marginals are all computable as ratios of partition functions, as

(4)

Q is the set of query atoms, like Married(Sam, Avery), and adding them to the knowledge base K eliminates the possible worlds that contradict the atoms in Q. Thus, the ratio of partition functions is clearly interpretable as a probability, being the sum of unnomralized probabilities of possible worlds of K that don't contradict the facts in Q divided by the sum of unnormalized probabilities of all possible worlds of K.

We can perform MAP inference simply by replacing sums over subclasses with maxes and performaing a traceback. Both types of inference are guaranteed to be efficient and exact, with the time to run inference being proportional to the product of the number of objects in the knowledge base multiplied by the number of (class and relation) rules.

Other kinds of inference are possible, such as those that take more explicit advantage of TPKB's existence uncertainty semantics. We plan on implementing these in the future.

Next: 3 The Basics Up: The Alchemy Lite Tutorial Previous: The Alchemy Lite Tutorial

Chloe Kiddon 2013-04-01