Analyst-Ready Large Scale Real Time Information Retrieval Tool for E-Governance

Eugene Santos Jr. (Dartmouth College, USA), Eunice E. Santos (Virginia Polytechnic Institute & State University, USA), Hien Nguyen (University of Wisconsin, Whitewater, USA), Long Pan (Virginia Polytechnic Institute & State University, USA) and John Korah (Virginia Polytechnic Institute & State University, USA)
DOI: 10.4018/978-1-60566-130-8.ch016


With the proliferation of the Internet and the rapid development of information and communication infrastructure, E-governance has become a viable option for the effective deployment of government services and programs. Areas of E-governance such as homeland security and disaster relief have to deal with vast amounts of dynamic, heterogeneous data. Providing rapid real-time search capabilities for such databases/sources is a challenge. Intelligent Foraging, Gathering, and Matching (I-FGM) is an established framework developed to help analysts find information quickly and effectively by incrementally collecting, processing, and matching information nuggets. This framework has previously been used to develop a distributed, free-text information retrieval application. In this chapter, we provide a comprehensive solution for the E-Gov analyst by extending the I-FGM framework to image collections and creating a “live” version of I-FGM deployable for real-world use. We present a Content Based Image Retrieval (CBIR) technique that incrementally processes the images, extracts low-level features, and maps them to higher-level concepts. Our empirical evaluation of the algorithm shows that our approach performs competitively with some existing approaches in terms of retrieving relevant images, while offering the speed advantages of a distributed and incremental process and a unified framework for both text and images. We describe our production-level prototype, which has a sophisticated user interface and can handle multiple queries from multiple users. The interface updates the search results in real time and provides “under the hood” details of I-FGM processes as the queries are being processed.
Chapter Preview


One of the main challenges in E-governance is to find relevant information effectively and efficiently from vast amounts of dynamic, heterogeneous sources under the pressures and limitations of time, supporting tools, and resources. For instance, when natural disasters such as Hurricane Katrina (2005) or the Asian Tsunami of 2004 happen, we need to quickly locate the areas that are most affected and collect information in order to estimate the amount of relief items needed, such as medicines, food, and drinking water. Unfortunately, in such a situation, frontline communications are typically chaotic, and/or there are too many channels of information from different sources, which makes retrieving the relevant pieces of information much harder. For “hot spots” such as disaster relief areas and combat zones, information changes rapidly, so there is only a small window of time during which it remains valid. Additionally, various types of data representation are used, such as images, blogs, maps, news reports, audio, and video. Each type of data contains important and indispensable information for the various governmental agencies. Therefore, to better assist these agencies in addressing these challenges, there is a clear and urgent need for a system that rapidly provides real-time retrieval capabilities over heterogeneous sources of information. There are three main issues that we need to address: (i) how to gather and retrieve information quickly in a real-time setting given the limitations of resources and time; (ii) how to address the problem of heterogeneous data; and (iii) how to improve retrieval success.

We address the above issues by developing a framework for intelligent foraging, gathering, and matching (I-FGM) that gathers, processes, and matches information nuggets incrementally and in a distributed fashion to help users find information quickly and effectively. In our previous work (Santos et al., 2005, 2006), I-FGM was empirically demonstrated to be an effective tool for text retrieval on large and dynamic search spaces. Even though unstructured text is a typical format for most databases/sources, images are also popular, with significant support from commercial search engines such as Google, Yahoo!, and MSN. To demonstrate that I-FGM is a general framework for information retrieval, it is necessary to study the system’s ability to handle heterogeneous databases that contain at least text and images. In this chapter, we apply the I-FGM framework to image collections by using a Content Based Image Retrieval (CBIR) method. We approach this by incrementally processing the images, extracting low-level features, and then mapping them to higher-level concepts. The novelty of our approach lies in the distributed storage and the incremental processing and matching of information nuggets extracted from a region-based wavelet image retrieval scheme. Because the concept-based image retrieval algorithm maps the low-level features of the images to high-level concepts, we are also able to translate the visual information of images into document graphs (Santos et al., 2005), which are used in I-FGM as a common representation of information for heterogeneous data types. Thus, I-FGM provides a seamless integration of text and images through a single unifying semantic representation of content. By implementing and testing our image retrieval algorithm in I-FGM, we can validate the I-FGM framework as a method for future unified rankings of heterogeneous documents.
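The idea of incrementally processing and matching nuggets against a common representation can be illustrated with a small sketch. It is not the I-FGM implementation: the `Nugget` class, the slice structure, the triple-based stand-in for a document graph, the overlap-based `similarity` measure, and the priority rule are all assumptions made for illustration, and the concept triples for the image are assumed to have been produced already by some feature-to-concept mapping.

```python
from dataclasses import dataclass, field
import heapq


@dataclass
class Nugget:
    """A text or image document processed slice by slice (hypothetical structure)."""
    name: str
    kind: str       # "text" or "image"
    slices: list    # each slice is a set of (concept, relation, concept) triples
    graph: set = field(default_factory=set)  # partial document graph built so far
    done: int = 0   # number of slices processed so far


def similarity(query_graph, doc_graph):
    """Assumed measure: fraction of query relations present in the partial graph."""
    if not query_graph:
        return 0.0
    return len(query_graph & doc_graph) / len(query_graph)


def incremental_rank(query_graph, nuggets, budget):
    """Process at most `budget` slices, always advancing the most promising nugget."""
    # heapq is a min-heap, so priorities are negated scores. Every nugget starts
    # with an optimistic priority so that each one gets a first look.
    heap = [(-1.0, i) for i, n in enumerate(nuggets) if n.slices]
    heapq.heapify(heap)
    scores = {n.name: 0.0 for n in nuggets}
    while heap and budget > 0:
        _, i = heapq.heappop(heap)
        nugget = nuggets[i]
        nugget.graph |= nugget.slices[nugget.done]  # fold in the next slice
        nugget.done += 1
        budget -= 1
        scores[nugget.name] = similarity(query_graph, nugget.graph)
        if nugget.done < len(nugget.slices):
            # Re-queue with the updated score as the new priority.
            heapq.heappush(heap, (-scores[nugget.name], i))
    # Text and images share one ranking because both are scored on the same graphs.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


query = {("flood", "affects", "city"), ("city", "needs", "water")}
report = Nugget("report.txt", "text",
                [{("flood", "affects", "city")}, {("city", "needs", "water")}])
aerial = Nugget("aerial.jpg", "image",
                [{("sky", "above", "field")}, {("flood", "affects", "city")}])
print(incremental_rank(query, [report, aerial], budget=3))
```

With a budget of three slices, the sketch spends its third step on the text report because that nugget looked more promising after its first slice, so the second slice of the image is never processed. This is the kind of trade-off an incremental scheme makes when time and resources are limited.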

The prototypes presented in our previous efforts were primarily aimed at validating the I-FGM framework and were not meant to be deployed in the field. To fully validate I-FGM as an effective tool for the E-Gov analyst, we implemented a production-level system. This system uses 79 high-end computing nodes, versus the 20 nodes in the earlier prototypes. The extensive computing resources enable us to deliver results quickly even when multiple queries from multiple users are being processed simultaneously. One notable difference from the previous prototype is that text and image retrievals are performed simultaneously for a given query. Additionally, we provide a set of tools that let users monitor the progress of their search through a graphical user interface. This interface displays the status of the internal processes in the system and allows users to tailor the framework to their needs or areas of expertise.
