
3.2 Weight Learning

To learn the weights of formulas, run the learnwts executable, e.g., ALCHDIR/bin/learnwts -g -i univ.mln -o univ-out.mln -t univ-train.db. -g specifies that generative learning is to be used. Alternatively, you can use -d for discriminative learning, e.g., ALCHDIR/bin/learnwts -d -i univ.mln -o univ-out.mln -t univ-train.db -ne advisedBy,student,professor. -i and -o specify the input and output .mln files (univ.mln and univ-out.mln respectively). If neither -g nor -d is specified, discriminative learning is performed.

-t specifies the .db file to be used by weight learning. You can specify more than one .db file after -t in a comma-separated list (e.g., -t univ1.db,univ2.db). The universe of constants consists of those that appear in the .db files. By default, all the constants are assumed to belong to one database. If this is not the case, you can use the -multipleDatabases option to specify that the constants in each .db file belong to a separate database and should not be mixed with those in the other .db files (e.g., -t ai.db,graphics.db,systems.db -multipleDatabases).
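For example, the following command (the file names and the combination of options are illustrative) learns weights discriminatively from three separate databases:

  ALCHDIR/bin/learnwts -d -i univ.mln -o univ-out.mln -t ai.db,graphics.db,systems.db -multipleDatabases -ne advisedBy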

In the current version of Alchemy, .db files used for learning can only contain true or false atoms (no unknowns). If there are constants that do not appear in the .db files, you can specify one or more .mln files containing the missing constants, appending them after the input .mln file, e.g., -i univ.mln,univ-train.mln. (You may wish to specify extra .mln files when there are constants that appear only in false ground atoms of a closed-world predicate, or only in unknown ground atoms of an open-world predicate; such ground atoms need not be defined in .db files.) By default, unit clauses for all predicates are added to the MLN during weight learning. (You can change this with the -noAddUnitClauses option.)
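As a sketch, an extra .mln file such as univ-train.mln may contain nothing more than declarations of the missing constants (the type and constant names below are illustrative):

  // univ-train.mln: declares constants that do not appear in the .db files
  person = { Alice, Bob, Carol }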

The -ne option is used to specify non-evidence predicates. For discriminative learning, at least one non-evidence predicate must be specified. For generative learning, the specified predicates are included in the (weighted) pseudo-log-likelihood computation; if none are specified, all are included.
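For example, to perform generative learning with the pseudo-log-likelihood computation restricted to a single predicate (the choice of advisedBy is illustrative):

  ALCHDIR/bin/learnwts -g -i univ.mln -o univ-out.mln -t univ-train.db -ne advisedBy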

During weight learning, each formula is converted to conjunctive normal form (CNF), and a weight is learned for each of its clauses. If a formula is preceded by a weight in the input .mln file, the weight is divided equally among the formula's clauses. The weight of a clause is used as the mean of a Gaussian prior for the learned weight. If a formula is terminated by a period (i.e., the formula is a hard one), each of the clauses in its CNF is given a prior weight that is twice the maximum of the soft clause weights. If neither a weight nor a period is specified, a default prior weight is used for each of the formula's clauses: 1.5 times the maximum of the soft clause weights. You can override this default with the -priorMean option. (See the developer's manual on how to change the default prior weights.)
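As an illustration (the formula itself is made up), a formula with weight 2.0 whose CNF contains two clauses yields a prior mean of 1.0 for each clause:

  // input formula; its CNF has two clauses
  2.0 advisedBy(x, y) => (student(x) ^ professor(y))
  // resulting CNF clauses, each with a Gaussian prior mean of 1.0:
  //   !advisedBy(x, y) v student(x)
  //   !advisedBy(x, y) v professor(y)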

When multiple databases are used, the CNF of a formula with existentially quantified variables, or with variables that have mutually exclusive and exhaustive values, may differ across the databases. This occurs because such variables must be grounded with constants, and the constants differ from database to database. When this happens, we learn a weight for the formula rather than for each clause in its CNF.
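For instance (a sketch using Alchemy's EXIST quantifier), the formula EXIST y advisedBy(x, y) is converted to a disjunction over the constants that can ground y, e.g., advisedBy(x, P1) v advisedBy(x, P2) v ... Since each database may contain different constants, the resulting clause differs from database to database, and so a single weight is learned for the formula as a whole.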

You can view all the options by typing ALCHDIR/bin/learnwts without any parameters. After weight learning, the output .mln file contains the weights of the original formulas (commented out), as well as the weights of the clauses derived from them.
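For example, the output file might contain entries like the following (a sketch with made-up weights; the exact layout may vary across Alchemy versions):

  // 1.10986  advisedBy(x, y) => (student(x) ^ professor(y))
  0.52107  !advisedBy(x, y) v student(x)
  0.58879  !advisedBy(x, y) v professor(y)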

