5.2 Information Retrieval

With the growth of the Internet, information retrieval has become an important field. The task is to retrieve all documents relevant to a query consisting of several words. A simple approach, vector-space information retrieval, is easy to implement in Alchemy. As in the previous section, the HasWord predicate represents the words in a document. In addition, the predicate InQuery(w) is true if and only if the word w appears in the query, and the predicate Relevant(page) expresses the relevance of a page to the query. Our simple MLN for information retrieval looks like this:

InQuery(word)
HasWord(word, page)
Relevant(page)

InQuery(+w) ^ HasWord(+w, p) => Relevant(p)
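
Because of the + operator, a separate weight is attached to each word. With this formula alone, a ground formula whose antecedent holds is satisfied only when Relevant(p) is true, so the log-odds of Relevant(p) are the sum of the weights of the query words that occur on p. The following Python sketch illustrates that scoring rule outside Alchemy. The weights, words, and page names are invented for illustration, and the function name relevance_probability is hypothetical, not part of Alchemy.

```python
import math

def relevance_probability(query, page_words, word_weights):
    """P(Relevant(p)) under the single formula
    InQuery(+w) ^ HasWord(+w, p) => Relevant(p).

    The log-odds are the summed weights of the words shared by the
    query and the page; the logistic function turns them into a probability.
    """
    shared = set(query) & set(page_words)
    log_odds = sum(word_weights.get(w, 0.0) for w in shared)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical per-word weights, standing in for weights Alchemy would learn.
word_weights = {"markov": 1.5, "logic": 1.0, "the": 0.0}

# Hypothetical pages, each given as its list of words.
pages = {
    "PageA": ["markov", "logic", "networks"],
    "PageB": ["the", "logic", "puzzle"],
    "PageC": ["cooking", "recipes"],
}
query = ["markov", "logic"]

scores = {p: relevance_probability(query, words, word_weights)
          for p, words in pages.items()}
# Pages ordered from most to least probably relevant.
ranking = sorted(scores, key=scores.get, reverse=True)
```

A page that shares no word with the query gets log-odds of zero and hence probability 0.5 under this formula alone; in this toy data that is PageC.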

As web search engines have shown, pages that relevant pages link to are sometimes relevant as well. This can be modeled by adding one formula involving the LinkTo predicate to the MLN:

Relevant(p1) ^ LinkTo(id, p1, p2) => Relevant(p2)
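
To see how the link formula interacts with the word formula, the following Python sketch enumerates every truth assignment to Relevant on a three-page toy example and weights each world by the exponentiated sum of the weights of its satisfied ground formulas, which is the standard MLN semantics. It is a brute-force illustration and not how Alchemy performs inference. The pages, the link, the per-page word scores, and LINK_WEIGHT are all assumed values.

```python
import itertools
import math

# Hypothetical toy data: all names and weights are illustrative assumptions.
PAGES = ["PageA", "PageB", "PageC"]
# Summed weights of the query words found on each page (word formula).
WORD_SCORE = {"PageA": 2.5, "PageB": 0.0, "PageC": 0.0}
# True LinkTo(id, p1, p2) atoms as (p1, p2) pairs: PageA links to PageB.
LINKS = [("PageA", "PageB")]
LINK_WEIGHT = 1.0  # single weight shared by all groundings of the link formula

def log_weight(world):
    """Sum of weights of satisfied ground formulas that depend on Relevant.

    Groundings that hold in every world (false antecedent, or LinkTo false)
    only add a constant and cancel out during normalization.
    """
    total = 0.0
    for page in PAGES:
        # Word-formula groundings with a true antecedent hold iff Relevant(page).
        if world[page]:
            total += WORD_SCORE[page]
    for src, dst in LINKS:
        # Relevant(src) ^ LinkTo(id, src, dst) => Relevant(dst)
        if (not world[src]) or world[dst]:
            total += LINK_WEIGHT
    return total

def marginals():
    """Exact P(Relevant(p)) for every page by enumerating all 2^n worlds."""
    z = 0.0
    mass = {p: 0.0 for p in PAGES}
    for values in itertools.product([False, True], repeat=len(PAGES)):
        world = dict(zip(PAGES, values))
        w = math.exp(log_weight(world))
        z += w
        for p in PAGES:
            if world[p]:
                mass[p] += w
    return {p: mass[p] / z for p in PAGES}

probs = marginals()
```

In this toy setting PageB contains no query words, yet its probability rises above 0.5 because PageA links to it, while the unlinked PageC stays at 0.5.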

Of course, scaling to the Internet requires much more work on document indexing, query processing, and so on, but these two formulas represent the core of PageRank-style information retrieval.

Marc Sumner 2010-01-22