Information Retrieval

Thomas Mandl (Universität Hildesheim, Germany) and Christa Womser-Hacker (Universität Hildesheim, Germany)
DOI: 10.4018/978-1-4666-5888-2.ch386

Chapter Preview



The user is at the center of the information retrieval process. Most research tends to be either user-oriented or algorithm- and system-oriented. User-oriented research pursues a holistic view of the process, observes information behavior, and develops measures of user satisfaction. System-oriented research develops new algorithms, measures the effect of system components, and tries to resolve efficiency issues.

Key Terms in this Chapter

Term Weighting: Determines the importance of a term for a document. Weights are calculated by many different formulas, which consider the frequency of each term in a document and in the collection, as well as the length of the document and the average or maximum length of any document in the collection.
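
As one illustration of such a formula, the following minimal sketch computes a weight in the style of Okapi BM25, which combines term frequency, document frequency, document length, and average document length. It is only one of many weighting schemes and is not necessarily the one the chapter has in mind; the function name, the parameter names, and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration.

```python
import math


def bm25_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25-style weight of one term in one document (illustrative sketch).

    tf          -- frequency of the term in the document
    df          -- number of documents in the collection containing the term
    n_docs      -- number of documents in the collection
    doc_len     -- length of the document (in tokens)
    avg_doc_len -- average document length in the collection
    k1, b       -- tuning parameters (assumed default values)
    """
    # Collection component: terms found in few documents get a higher weight.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    # Document component: term frequency, dampened and normalized by length.
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)
```

Under these assumptions, a term that is frequent in a short document and rare in the collection receives a high weight, while the same frequency in a much longer document counts for less.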

Stemming: The mapping of word forms to stems or basic word forms. Word forms may differ from stems because of morphological changes required by grammar. The plural of English nouns, for example, is mostly formed by adding an s to the basic noun. In most European languages, stemming needs to strip suffixes from word forms.
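
Suffix stripping can be sketched with a deliberately crude example. The function below is not an established stemmer such as the Porter algorithm; the suffix list and the minimum stem length of three characters are assumptions made for illustration, and the function will mis-stem many real words.

```python
def naive_stem(word, suffixes=("ing", "ed", "es", "s")):
    """Strip the first matching suffix from a word (toy example only)."""
    word = word.lower()
    for suffix in suffixes:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word
```

With this sketch, "documents" and "document" map to the same stem, so a query containing one form can match a document containing the other.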

Precision: A quality measure for information retrieval evaluation. It gives the percentage of relevant documents within the retrieved document set. Precision is calculated by dividing the number of relevant documents found by the total number of documents found.

Link Analysis: The links between pages on the web are a large knowledge source that link analysis algorithms exploit for many purposes. Many algorithms similar to PageRank determine a quality or authority score based on the number of incoming links of a page.
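
A minimal sketch of a PageRank-style computation by power iteration is shown below. The damping factor of 0.85, the fixed number of iterations, and the even redistribution of rank from pages without outgoing links are common conventions assumed here for illustration; the function name and the input format (a dictionary mapping each page to the pages it links to) are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores for a small link graph (sketch)."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base score independent of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to its targets.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumed convention: a page without out-links spreads its
                # rank evenly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

For a graph in which pages a and b both link to c, and c links back to a, page c ends up with the highest score because it has the most incoming links, and b, which has none, ends up with the lowest.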

Indexing: The assignment of terms (words) which represent a document. Indexing can be carried out manually or automatically. Automatic indexing typically involves removing stopwords and applying stemming.
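
Automatic indexing can be sketched as building an inverted index that maps each term to the documents it represents. The tiny stopword list, the tokenization by a regular expression, and the function name below are assumptions for illustration; stemming is left out to keep the example short.

```python
import re
from collections import defaultdict

# Hypothetical, very small stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "to"}


def build_index(docs):
    """Map each non-stopword term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Lowercase the text and split it into alphanumeric tokens.
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            if token not in STOPWORDS:
                index[token].add(doc_id)
    return index
```

Looking up a query term in the resulting index returns the candidate documents directly, without scanning the full text of the collection.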

Recall: A quality measure for information retrieval evaluation. It can be calculated by dividing the number of relevant documents which were found by the number of relevant documents in the collection. The second figure, the total number of relevant documents in the collection, can often only be estimated.
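
The two evaluation measures, precision and recall, can be sketched together, since both start from the number of relevant documents that were found. The function name and the convention of returning 0.0 for an empty set are assumptions made for illustration, and the set of relevant documents is assumed to be known here although in practice it often has to be estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query (illustrative sketch)."""
    retrieved, relevant = set(retrieved), set(relevant)
    # Relevant documents which were found.
    hits = len(retrieved & relevant)
    # Assumed convention: an empty denominator yields 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

If a system returns four documents of which two are relevant, and the collection holds five relevant documents in total, precision is 2/4 = 0.5 and recall is 2/5 = 0.4.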

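Inverse Document Frequency (IDF): A traditional weighting scheme for terms. It can be calculated as the logarithm of the number of documents in the collection divided by the number of documents which contain the term, so that terms occurring in few documents receive a higher weight. It is commonly multiplied by the frequency of the term in the document to obtain the final term weight.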
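
In its common textbook form, IDF is the logarithm of the collection size divided by the number of documents containing the term. The sketch below follows that form; the function name, the representation of documents as sets of terms, and the choice to return 0.0 for a term that occurs in no document are assumptions made for illustration.

```python
import math


def inverse_document_frequency(term, documents):
    """IDF of a term over a list of documents given as sets of terms (sketch)."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain the term.
    df = sum(1 for doc in documents if term in doc)
    if df == 0:
        return 0.0  # assumed convention for terms missing from the collection
    return math.log(n_docs / df)
```

A term contained in every document gets an IDF of zero, which reflects that it does not help to distinguish one document from another.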
