Ontology-Based Clustering of the Web Meta-Search Results


Constanta-Nicoleta Bodea (Academy of Economic Studies, Romania), Adina Lipai (Academy of Economic Studies, Romania) and Maria-Iuliana Dascalu (Academy of Economic Studies, Romania)
DOI: 10.4018/978-1-4666-1833-6.ch019


The chapter presents a meta-search tool developed to deliver search results structured according to the specific interests of users. Meta-search means that several search mechanisms can be applied simultaneously to a single query. Through clustering, thematically homogeneous groups are built from the initial list returned by the standard search mechanisms. Because the clustering process takes an ontological approach, the results are more user-oriented. After the initial search on multiple search engines, the results are pre-processed and transformed into vectors of words. These vectors are mapped into vectors of concepts by calling an educational ontology and using the WordNet lexical database. The vectors of concepts are refined through concept space graphs and projection mechanisms before the clustering procedure is applied. Implementation details and early experimental results are also provided.
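The pipeline described above (word vectors, then concept vectors, then clustering) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the chapter: the `WORD_TO_CONCEPT` dictionary is a toy stand-in for the educational ontology and the WordNet lookup, the stop-word list is arbitrary, and the greedy cosine-threshold grouping is a simple substitute for whatever clustering procedure the authors apply. The refinement step through concept space graphs and projection is omitted, and the snippets are assumed to have been gathered from several search engines already.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the educational ontology and WordNet lookup:
# each surface word maps to a single concept label.
WORD_TO_CONCEPT = {
    "course": "Course", "courses": "Course", "lecture": "Course",
    "exam": "Assessment", "test": "Assessment", "quiz": "Assessment",
    "teacher": "Instructor", "professor": "Instructor",
}

# Arbitrary illustrative stop-word list.
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "to", "in", "on", "by", "with"}


def word_vector(snippet):
    """Pre-process a result snippet into a bag-of-words vector."""
    tokens = re.findall(r"[a-z]+", snippet.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def concept_vector(words):
    """Map a word vector to a concept vector; unmapped words are dropped."""
    concepts = Counter()
    for word, count in words.items():
        concept = WORD_TO_CONCEPT.get(word)
        if concept is not None:
            concepts[concept] += count
    return concepts


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, threshold=0.5):
    """Greedy single-pass grouping: a vector joins the first cluster whose
    seed (first member) is at least `threshold` similar, else starts a new one."""
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


snippets = [
    "Online course with weekly lecture notes",
    "Final exam and quiz preparation",
    "Lecture schedule for the courses",
    "Practice test for the exam",
]
concept_vectors = [concept_vector(word_vector(s)) for s in snippets]
groups = cluster(concept_vectors)
print(groups)  # -> [[0, 2], [1, 3]]
```

In this toy run, "lecture" and "courses" never co-occur with "course" as identical words in a way that a plain word-level comparison would reward strongly, yet snippets 0 and 2 end up together because both map to the same concept. That is the kind of effect the mapping from words to concepts is meant to achieve, although a real ontology would also carry relations between concepts that a flat dictionary cannot express.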
Chapter Preview


Information retrieval refers to the “representation, storage, organization and access to information items” and its success is strongly related to users’ needs (Baeza-Yates & Ribeiro-Neto, 1999), (Heisig, Caldwell, Grebici, & Clarkson, 2010), (Domingo-Ferrer, Bras-Amorós, Wu, & Manjón, 2009). Nevertheless, defining users’ needs is not straightforward. Building a query from a set of keywords as an expression of those needs and applying it to a large set of data is not enough: users have to receive the results most relevant to the query. The task became more challenging once the World Wide Web came onto the scene: “The Web is becoming a universal repository of human knowledge and culture which has allowed unprecedented sharing of ideas and information in a scale never seen before” (Baeza-Yates & Ribeiro-Neto, 1999). To keep up with the continuous growth of the World Wide Web (WWW), retrieval tools are engaged in a permanent race for faster development and better performance (Ajayi, Aderounmu, & Soriyan, 2010), (Wang, Tsai, & Hsu, 2009), (Tu & Seng, 2009). Information retrieval means not only information access (summarization, filtering, search, categorization) but also knowledge acquisition (visualization, mining, extraction, clustering). Thus, besides simple retrieval applications, mining and learning applications are needed. Many operations in information retrieval, such as document indexing or query refinement, can be automated, but classification is more often performed manually. To save time, algorithms have been developed for mining documents (Qiu, 2010), (Jeng, Chuang, & Tao, 2010), (Chen, Tseng, & Liang, 2010).
These algorithms are based on machine learning, “a dynamic, burgeoning area of computer science which is finding application in domains ranging from ‘expert systems’, where learning algorithms supplement, or even supplant, domain experts for generating rules and explanations (Langley & Simon, 1995), to ‘intelligent agents’, which learn to play particular, highly-specialized, support roles for individual people and are seen by some to herald a new renaissance of artificial intelligence in information technology (Hendler, 1997)” (Cunningham, Littin, & Witten, 2001). A good example of a machine learning algorithm used in information retrieval is the case in which knowledge bases are built as mirrors of the WWW on a local computer, thus optimizing the search process (Craven, et al., 2000).
