GA on IR: Study the Effectiveness of the Developed Fitness Function on IR

Ammar Al-Dallal (School of Information Systems Computing and Mathematics, Brunel University, West London, UK) and Rasha S. Abdul-Wahab (College of Information Technology, Ahlia University, Manama, Bahrain)
Copyright: © 2012 | Pages: 14
DOI: 10.4018/jalr.2012040101

Abstract

The rapid growth in the number of websites has made it increasingly challenging to help Web users find relevant information on the Internet, which calls for intelligent search engines. Information retrieval (IR) is an essential and useful technique for Web users, and many strategies and techniques have been designed for this purpose. Recently, interest in applying Artificial Intelligence (AI) to IR has grown. One AI area is Evolutionary Computation (EC), which is based on models of natural selection. A classical and important technique in EC is the Genetic Algorithm (GA); this paper adopts the GA to enhance the retrieval of HTML documents. The improvement is obtained by designing a new evaluation (fitness) function and applying a hybrid crossover operator. The proposed evaluation function is based on term proximity, keyword probability within the document, and the weights of the HTML tags in which the query terms occur. Experimental results are compared with two well-known evaluation functions used in the IR domain: Okapi-BM25 and the Bayesian inference network model. The results show improved recall and precision. In addition, the documents retrieved by the proposed system were more accurate and more relevant to the queries than those retrieved by the other models.
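The abstract uses Okapi-BM25 as one of its baselines. For readers unfamiliar with it, the following is a minimal sketch of the standard BM25 scoring formula, not the baseline code used in the paper. It assumes whitespace tokenization, the common defaults k1 = 1.2 and b = 0.75 (not values reported by the authors), and the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)), which is one of several variants in use.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a bag-of-words query.

    corpus is a list of token lists; it supplies the document
    frequencies and the average document length.
    """
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_terms)
    dl = len(doc_terms)
    score = 0.0
    for term in set(query_terms):
        n_t = sum(1 for d in corpus if term in d)  # document frequency
        idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
        f = tf[term]  # term frequency in this document (0 if absent)
        # Saturating TF component, normalized by document length.
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
    return score


# Hypothetical toy corpus, for illustration only.
corpus = [
    "genetic algorithm for information retrieval".split(),
    "okapi bm25 ranking function".split(),
    "genetic programming for ranking function discovery".split(),
]
query = "genetic ranking".split()
ranked = sorted(range(len(corpus)),
                key=lambda i: bm25_score(query, corpus[i], corpus),
                reverse=True)
print(ranked)  # [2, 1, 0]
```

The third document ranks first because it contains both query terms; the second outranks the first only because it is shorter than average, which is the length normalization controlled by `b`.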

1. Introduction

The rapid growth in the number of Web pages poses a continuing challenge: helping Web users find relevant information on the Internet. Information Retrieval (IR) is an essential and useful technique for Web users, so research on IR systems has increased since the advent of the World Wide Web.

In recent years, interest in applying Artificial Intelligence (AI) to IR has increased. One of the AI areas is Evolutionary Computation (EC), which is based on models of natural selection. A classical and important technique in EC is the Genetic Algorithm (GA). The GA is biologically inspired, and many of its mechanisms (selection, crossover, and mutation) mimic natural evolution. Because it searches high-dimensional spaces in parallel, by evolving a whole population of candidate solutions rather than a single one, the GA has been used to solve many scientific and engineering problems. This has in turn encouraged researchers to apply it to IR, where it offers a useful approach to providing information that suits the user's needs.
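To make the GA mechanisms mentioned above concrete, here is a minimal sketch of a generational GA over bit-string chromosomes. It uses textbook operators (tournament selection, one-point crossover, bit-flip mutation, and elitism); these are assumptions for illustration and are not the hybrid crossover operator proposed in this paper. The toy fitness "OneMax" (count the 1 bits) and all parameter values are likewise hypothetical.

```python
import random


def run_ga(fitness, n_bits=20, pop_size=30, generations=50,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Minimal generational GA over bit-string chromosomes.

    Returns the best chromosome found and the best fitness recorded
    before each generation (plus once after the last one).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []

    def tournament(k=3):
        # Pick k random individuals from the current population; keep the fittest.
        return max(rng.sample(pop, k), key=fitness)

    for _ in range(generations):
        best = max(pop, key=fitness)
        history.append(fitness(best))
        next_pop = [best[:]]  # elitism: the best individual survives unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop

    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = run_ga(fitness=sum)  # OneMax: fitness is the number of 1 bits
print(sum(best), history[0], history[-1])
```

Because of elitism, the best fitness never decreases from one generation to the next; in an IR setting the chromosome would instead encode something like term or tag weights, and the fitness would be a retrieval-quality measure.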

IR and GA have been integrated to spare Web users specific problems they face when trying to retrieve useful information, such as:

  • Many of the retrieved documents are not related to the user query.

  • Some relevant documents are not retrieved at all (Picarougne et al., 2002).
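The two problems listed above correspond to the standard IR measures the abstract reports: irrelevant documents in the result set lower precision, and relevant documents left out lower recall. A minimal sketch of set-based precision and recall for a single query follows; the document IDs are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# d2 and d4 are retrieved but irrelevant (first problem);
# d5 is relevant but not retrieved (second problem).
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, r)  # 0.5 0.6666666666666666
```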

Retrieving relevant information is not a simple process, and its complexity is further increased by the fact that more and more of this information appears in natural language rather than in structured formats (Liu, 2006).

The rest of this paper is organized as follows. Section 2 describes the problem statement. Section 3 describes the main objectives of this research. Section 4 discusses related work. Section 5 introduces document representation. Section 6 introduces the proposed approach. Section 7 presents the experimental results of the proposed method. Section 8 gives the conclusions of this study.

GAs have a long history in IR, where they have been integrated with the aim of solving IR problems. One such approach was developed by Kim and Zhang (2000; 2003), who proposed a GA-based retrieval method that learns the importance of HTML tags. This method improved average precision when tag information was used compared with when it was not.
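One way to picture a GA that learns the importance of HTML tags is to look at how a single chromosome could be evaluated. The sketch below assumes, for illustration only, that a chromosome is a mapping from tag name to weight, that a document's score is the tag-weighted count of query-term occurrences, and that fitness is the average precision of the resulting ranking against known relevance judgments. These are hypothetical choices, not Kim and Zhang's exact formulation; a GA would evolve the weights to maximize this fitness.

```python
from typing import Dict, List, Set


def tag_weighted_score(query: List[str], doc: Dict[str, List[str]],
                       weights: Dict[str, float]) -> float:
    """Score a document whose terms are grouped by the HTML tag they occur in.

    Each occurrence of a query term contributes the weight of its tag.
    """
    return sum(weights.get(tag, 0.0) * terms.count(q)
               for tag, terms in doc.items() for q in query)


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at the rank of each relevant document."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def fitness(weights, query, docs, relevant):
    """Fitness of one chromosome (a tag-weight mapping): AP of its ranking."""
    ranking = sorted(docs,
                     key=lambda d: tag_weighted_score(query, docs[d], weights),
                     reverse=True)
    return average_precision(ranking, relevant)


# Hypothetical documents, split by tag, and a single relevance judgment.
docs = {
    "a": {"title": ["genetic", "retrieval"], "body": ["web"]},
    "b": {"title": ["web"], "body": ["genetic", "genetic", "algorithm"]},
}
query, relevant = ["genetic"], {"a"}
print(fitness({"title": 3.0, "body": 1.0}, query, docs, relevant))  # 1.0
print(fitness({"title": 1.0, "body": 3.0}, query, docs, relevant))  # 0.5
```

Here the title-heavy chromosome ranks the relevant document first and so receives the higher fitness, which is the kind of signal a GA would exploit when learning tag weights.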

Picarougne et al. (2002) reported their experience in designing different fitness functions for Web search using Genetic Programming (GP). They tested this set of fitness functions against other existing order-based fitness functions on the task of ranking function discovery. Their results show that such fitness function designs lead to improved performance.
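An order-based fitness function rewards a ranking, not just a result set. As an illustration, the sketch below uses discounted cumulative gain (DCG) with binary relevance, a standard order-based measure; it is an example of the idea, not necessarily one of the functions tested by Picarougne et al. The two hypothetical rankings contain the same documents, so their precision is identical, but DCG prefers the one that places relevant documents first.

```python
import math


def dcg(ranking, relevant):
    """Discounted cumulative gain with binary relevance.

    A relevant document at rank i contributes 1 / log2(i + 1), so the
    same documents score higher when relevant ones appear earlier.
    """
    return sum(1.0 / math.log2(rank + 1)
               for rank, doc_id in enumerate(ranking, start=1)
               if doc_id in relevant)


relevant = {"d1", "d3"}
good = ["d1", "d3", "d2", "d4"]  # relevant documents at ranks 1 and 2
poor = ["d2", "d4", "d1", "d3"]  # relevant documents at ranks 3 and 4
print(dcg(good, relevant) > dcg(poor, relevant))  # True
```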
