2 Tractable Markov Logic

TML is a language for building models of domains that have both complex logical structure and uncertainty. It can represent objects, classes, and relations between objects, subject to certain restrictions which ensure that inference in any model built in TML (we call such models Tractable Probabilistic Knowledge Bases, or TPKBs) can be queried efficiently. In TML, the structure of objects is determined by their class memberships. A class determines what subparts any instance object has, as well as what relations are allowed over its subparts, and with what probability those relations hold. Classes may also describe multi-valued attributes, along with, for each attribute, a probability distribution over its possible values. The class itself may have subclasses, which describe further structure, as well as a probability distribution over possible subclass memberships. All probabilities are specified with weights by way of log-linear models. Relations are restricted to only be over the subparts of the same object, which prevents the model from becoming intractable while still allowing for intuitive models. Below we described TML in enough detail to start using it. For more information, consult the original TML paper.

**Structure**

A TPKB is a set of class declarations and object declarations that obey certain restrictions.

A **class declaration** for class `ClassName` specifies the subparts, subclasses, relations and attributes for `ClassName`, as well as weights for the subclasses, relations, and attributes.

`
class ClassName {
subclasses Subclass1 wt1, Subclass2 wt2, ..., SubclassN wtN;
subparts SubpartClass1[n], SubpartClass2 SubpartName2[n], SubpartClass3 SubpartName3, SubpartClass4, ...;
relations Relation(SubpartArg,...,SubpartArg) wt, HardRelation(SubpartArg), !NegHardRelation(), ....;
...
}
`

When we specify that a class `ClassName` has **subclasses** `SubClass1`,...,`SubClassN`, it means that any object of class `ClassName` must belong to exactly one of the subclasses `SubClass1`,...,`SubClassN`, with the probability distribution over subclass memberships determined by the weights `w1` , ...,`wN` . For instance, the class `Family` may have subclasses `TwoParentFamily`, which has two `Adult` parts, and `OneParentFamily`, with one, with respective weights 1.2 and 0.3. We refer to the subclasses of class `ClassName` as `S(ClassName)`.

The subparts of `ClassName` are parts which every object of class `ClassName` must have and are specified by the class membership of that part, `SubpartClass`, an optional part name `SubpartName`, and the number of parts of that type where the number is optional and 1 by default. For instance an object of class `TwoParentFamily` might have a single `Pet` subpart of class `Animal` and two `Adult` subparts, each of class `Person`, so if the `Smiths` are a `TwoParentFamily` then we can refer to their pet as `Smiths.Pet` and the adults as `Smiths.Adult[1]` and `Smiths.Adult[2]`. We refer to the subparts of class `ClassName` as `P(ClassName)`.

The number and class of subparts can be overridden by declarations in subclasses. For example, the default for a
`Family` might be 2 `Adult`s. However, its subclass `OneParentFamily` might override the default
to have only 1 `Adult` subpart. The classes of subparts can also be overridden,
but the overriding classes must be subclasses of the declarations at the higher levels.
For example, a `Family` might default to 1 `Animal` as a `Pet`.
Then a `CatLadyFamily` might override `Family` to have ten `Cat`s as `Pet`s.
However, you could not have a subclass that has a `Toaster` as a `Pet` since `Toaster` is
not a subclass of `Animal`.

In Alchemy Lite, all subparts with the same subpart name are treated the same in relations; that is, you cannot specify a relation with an
argument that is a particular indexed subpart with some subpart name.
To illustrate, if `TwoParentFamily` was defined with the subpart line

`subparts Adult[2];`

then you could define the relation `Married(Adult,Adult)`, which would create four groundings. However you cannot define
the relation `Married(Adult[1],Adult[2])`. If two subparts are going to be treated differently in some relation(s), then
they must be defined with different subpart names. E.g.:

subparts Adult1, Adult2; relations Married(Adult1,Adult2) 1.2

The **relations** of `ClassName` specify what relationships may hold among the subparts of any object of class `ClassName`, and with what weight. For instance, the class `TwoParentFamily` may have the relation `Married(Adult1, Adult2)`.

Relations may apply to the object as a whole, rather than to its subparts, in which case we can omit parentheses and arguments, for example the relation `Mortgage` may apply to a family, and be true if that family holds a mortgage. Note that relations may only take an object and/or its subparts as arguments. This restriction is key in maintaining tractability.
Negated relations are written with a `!` immediately preceding them. The reason we use weights rather than probabilities is compactness - the weight of a relation rule in a class need only represent the change in the relation's log probability from whatever ancestor class it was introduced at. For instance, if `TwoParentFamily` and and `OneParentFamily` are both subclasses of class `Family`, and two- and one-parent families are equally likely to hold a mortgage, then the `Mortgage` relation need only appear at the `Family` class level. However, if two parent families are more likely to hold a mortgage, then `Mortgage` would appear again under `TwoParentFamily` with positive weight. We refer to the relations of class `ClassName` as `R(ClassName)`.

The weights of relations can be overridden in subclasses; however, the relation itself must contain the same arguments.

Relations can be more general than those defined above, specifically, we can have multivalued (as opposed to binary-valued) attributes. However, these are handled semantically like subclasses, so we omit them here for brevity but describe how to use them in the next section.

**Object declarations** introduce instances of classes, along with whatever information is known about the object declared. For example,

`
Family Smiths {
TwoParentFamily;
Adult[1] Sam, Adult[2] Avery, Child[1] Anna;
Parent(Sam,Anna), !Parent(Avery,Anna);
}
`

This describes a family, the Smiths, which has two parents, named Sam and Avery, and one child, named Anna, in which Sam is Anna's biological parent but Avery is not.

**Restrictions**

There are a number of structural restrictions on class and object declarations necessary to ensure that inference is tractable. To describe these we must first define a notion of ancestor/descendent in the TPKB. It is useful to think of an object `ObjectName` at a particular class level `ClassName`, referred to as `(ObjectName,ClassName)`. We extend this notation to include the ground atom `R(ObjectName,...)`, which refers to the relation `R(...)` of class `ClassName` grounded with the relevant subparts of object `ObjectName`. With the, we can formalize the notion of a descendant.

The **descendants** of the pair `(ObjectName,ClassName)` in TPBK K are

1. `(P(ObjectName),SubpartClass)` for each `P` (with class `SubpartClass`) a subpart of `ObjectName` according to `ClassName`.

2. `(ObjectName,SubclassName)` for each `SubclassName` a subclass of `ClassName`.

3. `R(ObjectName,...)` for each relation `R` of `ClassName`

4. The descendants of each of the above.

There are no other descendants of `(ObjectName,ClassName)`, and the first three types of descendant are referred to as **children** of `(ObjectName,ClassName)`.

With this, the full definition of a TPKB is straightforward. The overarching purpose of the following definition is to allow the distribution defined by the TPKB to have an efficient, recursive decomposition. The first restriction gives the decomposition a starting point. The second ensures that the subparts of an object are independent, not mentioning anything they can disagree on, so that the distribution factors as a product over them. The third prevents contradictory relation truth values. The fourth prevents an object from having itself as a descendant, which is nonsensical, or from having infinitely many descendants.

A set K of class and object declarations is a **TPKB** iff it satisfies the following conditions:

1. There is a single object, `TopObject` which is the sole instance of class `TopClass`, such that `(TopObject,TopClass)` has every other `(ObjectName,ClassName)` in K as a descendant.

2. For any two subparts `PartA(ObjectName)` and `PartB(ObjectName)` (introduced with classes `ClassA` and `ClassB`) of object `ObjectName` , `(PartA(ObjectName),ClassA)` and `(PartB(ObjectName),ClassB)` share as descendants only objects `(ObjectNameX,ClassX)` such that `ClassX` has no superclasses, subclasses, or soft relations. Note that this holds even if the two subparts are introduced in different possible classes of `ObjectName`.

3. If `(ObjectName,ClassName)` has a hard relation `R(ObjectName,...)` as a child, then it may have no descendant `R(ObjectName,...)` that is soft or `!R(ObjectName,...)` that is hard.

4. No object `(ObjectName,ClassName)` may have a descendant of form `(ObjectName2,ClassName)` or `(ObjectName,ClassName2)` with `ClassName` distinct from `ClassName2` and `ObjectName` distinct from `ObjectName2`, which is to say that no object may have itself as a descendent, or have an descendent with the same class as itself.

**Semantics**

A TPKB defines a distribution over possible worlds. However, TPKBs employ a much richer kind of possible world, which consists of a set of objects and the truth values of the atoms that involve them. Different possible worlds may contain different objects, atoms, and truth values. A key feature that makes TPKBs easy to reason about is that atoms appear in a possible world only when their arguments also appear in that world. More concretely, applying this to the earlier example with children and hair color, the hair color predicate applied to the child will only be present in worlds where the child actually exists, that is, worlds in which the family in question have a child.

The **possible objects** of a TPKB are the top object and every subpart of a possible object, under every possible class of that object. No other objects are possible.

We define the class membership predicate `Is`, where `Is(ObjectName, ClassName)` is true if object `ObjectName` is of class `ClassName` and false otherwise. A predicate (e.g., `Is`) applied to a set of arguments (e.g., `ObjectName` and `ClassName`) is an atom (e.g., `Is(ObjectName, ClassName)`). Likewise, `R(ObjectName, ...)` from earlier can be thought of as a relation atom which is true if the relation holds on `ObjectName` (or its subparts) and false otherwise. We refer to class membership and relation atoms and their negations as literals.

A **possible world** *W* of a TPKB K is an ordered pair *(X,L)* where *X* is a set of objects and *L* is a set of literals that satisfies the following requirements:

1. `TopObject` is in *X* and `Is(TopObject,TopClass)` is in *L*.

2. If `ObjectName` is in *X* and `Is(ObjectName,ClassName)` is in *L*, then

(a) if `SubclassName` is a subclass of `ClassName` then either `Is(ObjectName,SubclassName)` is in *L* or `!Is(ObjectName,SubclassName)` is in *L*,

(b) exactly one of the positive subclass literals `Is(ObjectName,SubclassName)` is in *L*,

(c) if `R` is a relation belonging to `ClassName` then either `R(ObjectName,...)` is in *L* or `!R(ObjectName,...)` is in *L*, and

(d) The subparts of `ObjectName` according to `ClassName` are all in *X*, and `Is(SubpartName,SubpartClass)` is in *L* for each part `SubpartName` of `ObjectName` and corresponding class `SubpartClass` according to `ClassName`.

3. No other objects are members of *X* and no other literals are members of *L*.

A closely related notion is that of a **possible subworld**
of an object `ObjectName` at class `ClassName`, which is just the definition above except with `ObjectName` in place of `TopObject` and `ClassName` in place of `TopClass`.

We say that a possible world *W=(X, L)* does not contradict a literal *l* if *!l* is not in *L* (note that this does not imply that *l* is in *L*). Recall that facts are stated in object declarations, and function either to name a subpart of an object, specify a class that that object is in or a set of classes it isn't in, or specify the truth value of a relation. Facts are implemented as follows. If K contains the assertion that relation `R` of object `O` is true (false), then `R(O, ...)` (`!R(O, ...)`) is included in K. A fact declaring that `O` is of class `S _{i}` means that both

(1) |

Where $\phi R$

The unnormalized distribution over possible worlds W of TPKB K is $\phi $_{K}`(TopObject,TopClass,W)` and the partition function of K is

(2) |

The probability of a possible world is

(3) |

**Inference**

TPKBs as implemented in Alchemy Lite currently perform two basic types of inference, conditional marginal and MAP. Conditional marginals are all computable as ratios of partition functions, as

(4) |

Q is the set of query atoms, like `Married(Sam, Avery)`, and adding them to the knowledge base K eliminates the possible worlds that contradict the atoms in Q. Thus, the ratio of partition functions is clearly interpretable as a probability, being the sum of unnomralized probabilities of possible worlds of K that don't contradict the facts in Q divided by the sum of unnormalized probabilities of all possible worlds of K.

We can perform MAP inference simply by replacing sums over subclasses with maxes and performaing a traceback. Both types of inference are guaranteed to be efficient and exact, with the time to run inference being proportional to the product of the number of objects in the knowledge base multiplied by the number of (class and relation) rules.

Other kinds of inference are possible, such as those that take more explicit advantage of TPKB's existence uncertainty semantics. We plan on implementing these in the future.