Next: 4 Logistic Regression Up: The Alchemy Tutorial Previous: 2.3 Multinomial Distribution


3 Social Network Analysis

Now that we have seen some trivial examples, we might want to do something more practical. We start with a simple social network of friends, smoking, and cancer. The network models friendship ties between people, their smoking habits, and the link between smoking and cancer.

If we want to model ``smoking causes cancer'' in first-order logic, this would look like:

$\displaystyle \forall x \;\; \mathtt{Smokes}(x) \Rightarrow \mathtt{Cancer}(x)$

Of course, this does not hold for all smokers, so in Markov logic we can just tack a weight on to the rule, or, as we do here, learn a weight from training data. In Alchemy all free variables are universally quantified, so the formula is
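To see what the weight means, recall that a Markov logic network defines a probability distribution over possible worlds $x$, where each formula $i$ has weight $w_i$ and $n_i(x)$ counts its true groundings in world $x$:

$\displaystyle P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big)$

Worlds that violate a weighted formula become less probable, not impossible, which is exactly what we need for a rule like ``smoking causes cancer'' that holds only statistically.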

Smokes(x) => Cancer(x)

Other rules we might add are ``People with friends who smoke also smoke'' and ``People with friends who don't smoke don't smoke'', i.e.:

Friends(x,y) => (Smokes(x) <=> Smokes(y))

Converted to CNF, this becomes the two clauses:

!Friends(x,y) v Smokes(x) v !Smokes(y)
!Friends(x,y) v !Smokes(x) v Smokes(y)
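As a sanity check, we can verify that these two clauses really are equivalent to the original biconditional rule by enumerating all truth assignments. The short Python sketch below (not part of Alchemy, just a propositional check over one grounding) does this:

```python
from itertools import product

def implies(a, b):
    """Material implication: a => b."""
    return (not a) or b

# For every assignment to Friends(x,y), Smokes(x), Smokes(y),
# the rule Friends(x,y) => (Smokes(x) <=> Smokes(y)) must agree
# with the conjunction of its two CNF clauses.
for f, sx, sy in product([False, True], repeat=3):
    rule    = implies(f, sx == sy)
    clause1 = (not f) or sx or (not sy)   # !Friends(x,y) v Smokes(x) v !Smokes(y)
    clause2 = (not f) or (not sx) or sy   # !Friends(x,y) v !Smokes(x) v Smokes(y)
    assert rule == (clause1 and clause2)

print("CNF conversion verified")
```

Alchemy performs this CNF conversion automatically; the learned weight of the original formula is split across its clauses.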

We can learn weights from the training data with the command

learnwts -d -i smoking.mln -o smoking-out.mln -t smoking-train.db
  -ne Smokes,Cancer

Here, -d denotes that we want to run discriminative learning, -i specifies the input MLN, -o designates the output MLN, -t specifies the training data, and -ne indicates which predicates are to be viewed as non-evidence during learning. This produces the file smoking-out.mln with the learned weights. Using this with the test data, we can compute the marginal probabilities of each person smoking and getting cancer:
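For concreteness, the input files might look as follows. The two rules are the ones given above; the predicate declarations are a sketch of the usual Alchemy format, and the constants in the training database are invented for illustration:

```
// smoking.mln -- input MLN (declarations plus the rules from the text)
Smokes(person)
Cancer(person)
Friends(person, person)

Smokes(x) => Cancer(x)
Friends(x,y) => (Smokes(x) <=> Smokes(y))
```

```
// smoking-train.db -- hypothetical training evidence;
// ground atoms not listed are assumed false (closed world)
Friends(Anna, Bob)
Friends(Bob, Anna)
Smokes(Anna)
Smokes(Bob)
Cancer(Anna)
```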

infer -ms -i smoking-out.mln -r smoking.result -e smoking-test.db
  -q Smokes,Cancer

This gives us marginal probabilities of each ground query atom. Alternatively, we might want the most likely state of the query atoms, the MAP state. In this case, we would use the -a option instead of -ms. The output contains each query atom followed by its truth value (1 = true and 0 = false).
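Both kinds of inference refer to the same distribution over possible worlds: marginal inference sums probability mass over worlds, while MAP inference picks the single highest-weight world. The brute-force Python sketch below illustrates the difference on a tiny two-person domain. The weights and constant names are invented for illustration, and real Alchemy inference uses approximate algorithms such as MC-SAT rather than enumeration:

```python
import itertools
import math

# Hypothetical learned weights (for illustration only).
W_CANCER = 1.5   # Smokes(x) => Cancer(x)
W_FRIENDS = 1.1  # Friends(x,y) => (Smokes(x) <=> Smokes(y))

PEOPLE = ["Anna", "Bob"]
# Evidence: Anna and Bob are friends (in both directions).
friends = {("Anna", "Bob"): True, ("Bob", "Anna"): True,
           ("Anna", "Anna"): False, ("Bob", "Bob"): False}

def weight_of(world):
    """world maps ('Smokes'|'Cancer', person) -> bool.
    Returns the total weight of satisfied ground formulas."""
    total = 0.0
    for x in PEOPLE:
        # Smokes(x) => Cancer(x)
        if (not world[("Smokes", x)]) or world[("Cancer", x)]:
            total += W_CANCER
        for y in PEOPLE:
            # Friends(x,y) => (Smokes(x) <=> Smokes(y))
            if (not friends[(x, y)]) or (world[("Smokes", x)] == world[("Smokes", y)]):
                total += W_FRIENDS
    return total

# Enumerate all possible worlds over the four query ground atoms.
atoms = [(p, x) for p in ("Smokes", "Cancer") for x in PEOPLE]
worlds = [dict(zip(atoms, vals))
          for vals in itertools.product([False, True], repeat=len(atoms))]

# Marginal inference: P(Smokes(Anna)) = sum over worlds where it holds.
Z = sum(math.exp(weight_of(w)) for w in worlds)
marginal = sum(math.exp(weight_of(w))
               for w in worlds if w[("Smokes", "Anna")]) / Z

# MAP inference: the single most likely world.
map_world = max(worlds, key=weight_of)

print(f"P(Smokes(Anna)) = {marginal:.3f}")
print("MAP: Smokes(Anna) =", map_world[("Smokes", "Anna")])
```

This makes the contrast concrete: infer -ms estimates a number in (0,1) per query atom, while the MAP option commits to one truth value (1 or 0) per atom.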


Marc Sumner 2010-01-22