


Passage Retrieval (PR) is an Information Retrieval (IR) task in which the system returns short passages, rather than whole documents, in response to a user query. But how should the size and style of such a passage be defined? Should it be the paragraph most likely to contain the answer? Should we retrieve the whole section of the original document? Or should we care about only a single sentence, or part of one?

A simple way to define passages is based on the document structure. This means using author-provided markers (e.g. periods, indentation, or empty lines) as passage boundaries. Examples of such passages include sentences, paragraphs, and sections.
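As a rough illustration of structure-based passages, the sketch below splits a text into paragraphs on empty lines and then into sentences on end punctuation. The function names and the sample text are made up for this example, and the sentence splitter is deliberately naive (it would break on abbreviations such as "e.g.").

```python
import re

def split_paragraphs(text):
    """Treat one or more empty lines as a paragraph boundary."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_sentences(paragraph):
    """Naive split: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]

# Hypothetical sample document with two paragraphs.
doc = "Passage retrieval returns short texts. It is useful.\n\nWindows are another option."

paragraphs = split_paragraphs(doc)
sentences = [split_sentences(p) for p in paragraphs]
```

Either list can then be indexed as the unit of retrieval, depending on how fine-grained the passages should be.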

Passages can also be defined according to the subject or content of the text. The main idea is to divide documents into coherent units, with each unit corresponding to a subtopic. A well-known algorithm for deriving such passages is TextTiling.
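To give a feel for content-based segmentation, here is a heavily simplified sketch: it starts a new passage whenever two adjacent sentences share almost no vocabulary. This is not TextTiling itself, which compares larger blocks of text and scores the depth of similarity valleys; the function names, the cosine measure over raw word counts, and the threshold value are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def bag(sentence):
    """Lower-cased word counts for one sentence."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def topical_passages(sentences, threshold=0.1):
    """Start a new passage where adjacent sentences have similarity below threshold."""
    if not sentences:
        return []
    passages = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bag(prev), bag(cur)) < threshold:
            passages.append([cur])       # low overlap: assume a subtopic shift
        else:
            passages[-1].append(cur)     # enough overlap: same subtopic
    return passages

# Hypothetical input: two sentences about animals, then two about stocks.
sample = ["cats chase mice", "mice fear cats", "stocks fell today", "stocks rose later"]
groups = topical_passages(sample)
```

On this toy input the only boundary falls between the second and third sentences, where the vocabulary changes completely.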

The third type of passage is window-based: a passage consists of a fixed number of words or bytes. Passages in this category may or may not take the logical structure of the document into account. Overlapped windows (Callan-1994) and non-overlapped windows (Kaszkiel-2001) do not depend on the structure of the text, whereas pages (Zobel-1995) and bounded paragraphs (Callan-1994) make use of paragraph boundary information and restrict windows to some minimum length.
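The difference between overlapped and non-overlapped windows can be sketched with a single helper that takes a window size and a step, both counted in words. The function name, the sizes used, and the choice to keep a shorter final window are assumptions for this example, not details taken from the cited papers.

```python
def word_windows(words, size, step):
    """Cut a word list into windows of `size` words, moving `step` words each time.

    step == size gives non-overlapped windows; step < size gives overlapped ones.
    The last window may be shorter than `size`.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    passages = []
    for start in range(0, len(words), step):
        passages.append(words[start:start + size])
        if start + size >= len(words):
            break  # this window already reaches the end of the document
    return passages

# Hypothetical 10-word document: w0 ... w9.
words = [f"w{i}" for i in range(10)]

non_overlapped = word_windows(words, size=4, step=4)
overlapped = word_windows(words, size=4, step=2)  # each window shares half its words with the next
```

With overlap, a relevant stretch of text that straddles a boundary in one window still appears whole in a neighbouring window, at the cost of indexing more passages.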

A more dynamic alternative to windows is the arbitrary passage (Kaszkiel-1997, Kaszkiel-2001), which can start at any word in the document. Two subclasses are further defined. Fixed-length arbitrary passages resemble overlapped windows but with an arbitrary starting point. Variable-length arbitrary passages can be of any length. Unlike structural, topical, and window passages, which are typically predefined (before or at indexing time), arbitrary passages are defined at query time.
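Because arbitrary passages are chosen at query time, the query itself can decide where a passage starts. The sketch below tries every word position as the start of a fixed-length passage and keeps the one containing the most query terms. The function name and the simple term-count score are assumptions for illustration; a real system would use a proper ranking function.

```python
def best_fixed_length_passage(words, query_terms, length):
    """Return (start, passage, score) for the best fixed-length passage.

    Every word position is a candidate start. The score is the number of
    words in the passage that are query terms (a stand-in for a real ranker).
    Ties keep the earliest start.
    """
    query = {t.lower() for t in query_terms}
    best_start, best_score = 0, -1
    last_start = max(len(words) - length, 0)
    for start in range(last_start + 1):
        score = sum(1 for w in words[start:start + length] if w.lower() in query)
        if score > best_score:
            best_start, best_score = start, score
    return best_start, words[best_start:best_start + length], best_score

# Hypothetical document and query.
words = "the cat sat on the mat while the dog chased the cat".split()
start, passage, score = best_fixed_length_passage(words, ["dog", "cat"], length=4)
```

A variable-length variant would add a second loop over candidate lengths, which is why variable-length arbitrary passages are more expensive to evaluate.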

References: Passage Retrieval Based On Language Models

