This article describes the most prominent approaches to applying artificial intelligence technologies to information retrieval (IR). Information retrieval is a key technology for knowledge management. It deals with the search for information and with the representation, storage and organization of knowledge. Information retrieval is concerned with search processes in which a user needs to identify, within a large amount of knowledge, the subset of information that is relevant to his or her information need. The information seeker formulates a query that tries to describe this need. The query is compared to document representations which were extracted during an indexing phase. The representations of documents and queries are typically matched by a similarity function such as the cosine measure. The most similar documents are presented to the users, who can evaluate their relevance with respect to their problem (Belkin, 2000). The difficulty of properly representing documents and of matching imprecise representations soon led to the application of techniques developed within artificial intelligence to information retrieval.
In the early days of computer science, information retrieval (IR) and artificial intelligence (AI) developed in parallel. In the 1980s, the two fields started to cooperate, and the term intelligent information retrieval was coined for AI applications in IR. In the 1990s, information retrieval saw a shift from set-based Boolean retrieval models to ranking systems like the vector space model and probabilistic approaches. These approximate reasoning systems opened the door for more intelligent value-added components. The large amount of text documents available in professional databases and on the internet has led to a demand for intelligent methods in text retrieval and to considerable research in this area. Better preprocessing that extracts more knowledge from data has become an important way to improve systems. Off-the-shelf approaches promise worse results than systems adapted to users, domain and information needs. Today, most techniques developed in AI have been applied to retrieval systems with more or less success. When data from users is available, systems often use machine learning to optimize their results.
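The matching step described above can be illustrated with a minimal sketch. It assumes that queries and documents are represented as raw term-count vectors obtained by whitespace tokenization, which is a simplification; the function names `cosine` and `rank` and the three sample documents are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty representation matches nothing
    return dot / (norm_a * norm_b)


def rank(query: str, docs: dict) -> list:
    """Return (doc_id, score) pairs, most similar document first."""
    q = Counter(query.lower().split())
    scored = [
        (doc_id, cosine(q, Counter(text.lower().split())))
        for doc_id, text in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "neural networks for text retrieval",
    "d2": "boolean retrieval models",
    "d3": "cooking with garlic",
}
ranking = rank("text retrieval", docs)
```

In this toy collection, the document sharing both query terms is ranked first and the document sharing none receives a score of zero. Real systems would usually apply term weighting before computing the similarity rather than using raw counts.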
Artificial Intelligence Methods in Information Retrieval
Artificial intelligence methods are employed throughout the standard information retrieval process and for novel value-added services. The first section gives a brief overview of information retrieval. The subsequent sections are organized along the steps in the retrieval process and give examples of applications.
Key Terms in this Chapter
Recommendation Systems: Actions or content is suggested to the user based on past experience collected from other users. Very often, documents are recommended based on similarity profiles between users.
Term Expansion: Terms that are not present in the query originally entered by the user into an information retrieval system are added automatically. The expanded query is then sent to the system again.
Weighting: Weighting determines the importance of a term for a document. Weights are calculated using many different formulas which consider the frequency of each term in a document and in the collection as well as the length of the document and the average or maximum length of any document in the collection.
Information Retrieval: Information retrieval is concerned with the representation of knowledge and the subsequent search for relevant information within these knowledge sources. Information retrieval provides the technology behind search engines.
Link Analysis: The links between pages on the web are a large knowledge source which is exploited by link analysis algorithms for many ends. Many algorithms similar to PageRank determine a quality or authority score based on the number of incoming links of a page. Furthermore, link analysis is applied to identify thematically similar pages, web communities and other social structures.
Indexing: Indexing means the assignment of terms (words) which represent a document in an index. Indexing can be carried out manually or automatically. Automatic indexing usually involves stopword elimination and stemming.
Adaptation: Adaptation is a process of modification based on input or observation. An information system should adapt itself to the specific needs of individual users in order to produce optimized results.
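The Indexing and Weighting entries above can be illustrated together in a short sketch. It makes several assumptions that are not part of the definitions: a toy stopword list, a crude suffix-stripping function standing in for a real stemmer (it is not the Porter algorithm), and a plain tf-idf formula as one of the many possible weighting schemes, without length normalization. All names (`STOPWORDS`, `stem`, `index_terms`, `build_index`) and the sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Assumed toy stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "of", "and", "in", "for", "is"}


def stem(word: str) -> str:
    """Crude suffix stripping, for illustration only (not a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text: str) -> list:
    """Automatic indexing: drop stopwords, then stem the remaining words."""
    return [stem(w) for w in text.lower().split() if w not in STOPWORDS]


def build_index(docs: dict) -> dict:
    """Return {term: {doc_id: weight}} using weight = tf * log(N / df)."""
    tf = {doc_id: Counter(index_terms(text)) for doc_id, text in docs.items()}
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for counts in tf.values() for term in counts)
    n_docs = len(docs)
    index = defaultdict(dict)
    for doc_id, counts in tf.items():
        for term, freq in counts.items():
            index[term][doc_id] = freq * math.log(n_docs / df[term])
    return dict(index)


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "indexing of documents",
    "d2": "the ranking of documents in search engines",
}
index = build_index(docs)
```

With this formula, a term that occurs in every document of the collection receives a weight of zero, which reflects the idea that frequent terms discriminate poorly between documents. Formulas that also consider document length, as mentioned in the Weighting entry, would add a normalization factor to the term frequency.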