Statistical parsing is the task of computing the most probable parse of a sentence given a probabilistic (or weighted) context-free grammar (CFG). The weights of the probabilistic or weighted CFG are typically learned on a corpus of texts. Here, we demonstrate how to translate a given CFG to an MLN and learn weights for the grammar on a small corpus of simple sentences.
A context-free grammar in Chomsky normal form consists of rules of the form:

```
X -> Y Z
X -> t
```

where X, Y, and Z are non-terminal symbols and t is a terminal symbol.
Here, we want to construct a grammar for parsing simple English language
sentences. One such simple grammar might look like this:
```
S  -> NP VP
NP -> Adj N
NP -> Det N
VP -> V NP
```

Here, S stands for sentence, NP for noun phrase, VP for verb phrase, Adj for adjective, N for noun, Det for determiner, and V for verb. To translate this into an MLN, we first encode each production rule in the grammar as a clause:
```
NP(i,j) ^ VP(j,k) => S(i,k)
Adj(i,j) ^ N(j,k) => NP(i,k)
Det(i,j) ^ N(j,k) => NP(i,k)
V(i,j) ^ NP(j,k) => VP(i,k)
```
Here, i and j indicate indices between the words, including at the
beginning and at the end of the sentence. So, a sentence with n words has n+1 indices. For example, the words in the sentence ``The dog chases the cat'' would be indexed as (0) The (1) dog (2) chases (3) the (4) cat (5).
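In the databases used for learning and inference below, a sentence is supplied as evidence through the Token predicate, one ground atom per word at its starting index (assuming words are lowercased to match the lexicon). For this sentence, the evidence would be:

```
Token("the",0)
Token("dog",1)
Token("chases",2)
Token("the",3)
Token("cat",4)
```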
In addition, we need a lexicon for our text telling us the parts of speech of each word. For our small corpus, it looks like this:
```
// Determiners
Token("a",i) => Det(i,i+1)
Token("the",i) => Det(i,i+1)

// Adjectives
Token("big",i) => Adj(i,i+1)
Token("small",i) => Adj(i,i+1)

// Nouns
Token("dogs",i) => N(i,i+1)
Token("dog",i) => N(i,i+1)
Token("cats",i) => N(i,i+1)
Token("cat",i) => N(i,i+1)
Token("fly",i) => N(i,i+1)
Token("flies",i) => N(i,i+1)

// Verbs
Token("chase",i) => V(i,i+1)
Token("chases",i) => V(i,i+1)
Token("eat",i) => V(i,i+1)
Token("eats",i) => V(i,i+1)
Token("fly",i) => V(i,i+1)
Token("flies",i) => V(i,i+1)
```
If we left it at this, then two problems would occur. First, if there are two
or more production rules with the same left-hand side (such as NP -> Adj N and NP -> Det N), then we need to enforce
the constraint that only one of them fires. This is achieved with a rule of the
form:
```
NP(i,k) ^ Det(i,j) => !Adj(i,j)
```
Basically, this is saying ``If a noun phrase results in a determiner and a noun, it cannot result in an adjective and a noun''. The second problem involves ambiguities in the lexicon. If we have homonyms belonging to different parts of speech, such as ``fly'' (noun or verb), then we have to make sure that only one of these parts of speech is assigned. We can enforce this constraint in a general manner by adding mutual exclusion rules for each pair of parts of speech, i.e.:
```
!Det(i,j) v !Adj(i,j)
!Det(i,j) v !N(i,j)
!Det(i,j) v !V(i,j)
!Adj(i,j) v !N(i,j)
!Adj(i,j) v !V(i,j)
!N(i,j) v !V(i,j)
```
We now have a complete MLN (consisting of the grammar and the lexicon) for which we can learn weights based on training data. We treat each sentence as a separate database, so the call is:
```
learnwts -d -i cfg.mln -o cfg-out.mln -t s1-train.db,s2-train.db,...,s12-train.db -ne NP,VP,N,V,Det,Adj -multipleDatabases
```
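Each training database contains the Token evidence for one sentence together with the ground atoms of its correct parse for the non-evidence predicates listed after -ne. As an illustrative sketch (the file contents below are plausible, not taken from the original corpus), s1-train.db for ``The dog chases the cat'' would combine the Token atoms shown earlier with the parse atoms:

```
// Token("the",0) ... Token("cat",4) as above, plus the correct parse
Det(0,1)
N(1,2)
NP(0,2)
V(2,3)
Det(3,4)
N(4,5)
NP(3,5)
VP(2,5)
S(0,5)
```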
This produces the weighted CFG in cfg-out.mln with which we can parse our test sentences:
```
infer -m -i cfg-out.mln -r parse.result -e test/s1-test.db -q NP,VP,N,V,Det,Adj
```
The extension to weighted definite clause grammars is intuitive because Markov
logic can capture relations between parts of speech by simply adding an
argument to the predicate. For example, if we want to enforce noun-verb
agreement, we would add the number to the noun phrase and verb phrase
predicates, i.e. NP(pos, pos, num) and VP(pos, pos, num), and our top-level rule would become NP(i,j,n) ^ VP(j,k,n) => S(i,k).
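As a sketch of what this could look like (the Sg/Pl constants and the particular lexicon and grammar clauses below are illustrative additions, not part of the grammar above):

```
// Lexicon entries now carry a number argument (Sg = singular, Pl = plural)
Token("dog",i)    => N(i,i+1,Sg)
Token("dogs",i)   => N(i,i+1,Pl)
Token("chases",i) => V(i,i+1,Sg)
Token("chase",i)  => V(i,i+1,Pl)

// Grammar clauses propagate the number up to the phrase level
Det(i,j) ^ N(j,k,n) => NP(i,k,n)
V(i,j,n) ^ NP(j,k,n2) => VP(i,k,n)

// Noun-verb agreement: the subject NP and the VP must share the same number
NP(i,j,n) ^ VP(j,k,n) => S(i,k)
```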