Information Retrieval in Conjunction With Deep Learning

Anu Bajaj (Department of Computer Science and Engineering, Guru Jambheshwar University of Science and Technology, Hisar, India), Tamanna Sharma (Department of Computer Science and Technology, Guru Jambheshwar University of Science and Technology, Hisar, India) and Om Prakash Sangwan (Department of Computer Science and Technology, Guru Jambheshwar University of Science and Technology, Hisar, India)
DOI: 10.4018/978-1-5225-9643-1.ch014

Abstract

Information is the second level of abstraction, after data and before knowledge. Information retrieval helps bridge the gap between information and knowledge by storing, organizing, representing, maintaining, and disseminating information. Manual information retrieval underutilizes resources and takes a long time to process, whereas machine learning techniques are applications of statistical models that are flexible, adaptable, and fast to learn. Deep learning extends machine learning with hierarchical levels of learning that make it suitable for complex tasks. Deep learning can be the best choice for information retrieval because the field offers numerous sources of information and large datasets for computation. In this chapter, the authors discuss applications of information retrieval with deep learning (e.g., web search that reduces noise and collects precise results, trend detection in social media analytics, anomaly detection in music datasets, and image retrieval).

Introduction

We encounter huge amounts of data every day, mainly because of social media, web, and mobile application usage; for example, 15 GB of data is generated by Facebook alone (Kanimozhi & Padmini, 2018). This exponentially growing unstructured data, in the form of web logs, data records, sensor data, and so on, needs to be converted into useful information. Information is what we extract from this unconstrained data to fill the knowledge gap. For example, when we want to buy a product, we consult customer reviews of that product; if the reviews are positive we are likely to buy it, and otherwise not. This is just a small example of why information retrieval is important. The archiving of written information can be traced back to 3000 BC, when the Sumerians stored clay tablets with cuneiform inscriptions (Singhal, 2001). To use this information efficiently, they even devised special classifications to identify each tablet and its contents.

Information retrieval (IR) is thus the process of archiving, organizing, and maintaining information collected from a huge database and disseminating it to meet the user’s needs. In other words, an IR system reads the user’s query, looks for information in the documents (database and knowledge base), which may hold images, text, sound, sensor data, and so on, and returns the retrieved information to the user (Guan & Zhang, 2008). The retrieved documents are ranked by their estimated importance for the particular query. The intermediate stages are indexing, filtering, searching, matching, and ranking of the documents. In indexing, the documents are indexed using signature files, inverted indices, and the like; filters remove stop words, white space, and so on; the query is then searched using a method such as brute-force or linear search (Kanimozhi & Padmini, 2018); and the matched documents are ranked by their similarity to the query. The user receives the top-ranked documents. Several models have been proposed for this purpose (Singhal, 2001), as shown in Table 1.
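The stages described above (filtering, indexing, searching, and ranking) can be illustrated with a minimal sketch. It is not the chapter authors' implementation: the stop-word list, the sample review documents, and the function names `tokenize`, `build_inverted_index`, and `search` are all assumptions made for illustration. It looks terms up in an inverted index rather than using the brute-force or linear search mentioned in the text, and it scores each document simply by the number of distinct query terms it contains.

```python
from collections import defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "in", "to", "for"}


def tokenize(text):
    """Filtering: lowercase, split on white space, strip punctuation, drop stop words."""
    tokens = (word.strip(".,;:!?()\"'") for word in text.lower().split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def build_inverted_index(documents):
    """Indexing: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(query, index):
    """Searching and ranking: score documents by the number of distinct query terms matched."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up product reviews, echoing the purchasing example in the text.
documents = {
    "d1": "The battery life of the phone is excellent.",
    "d2": "Poor battery and a weak camera.",
    "d3": "Excellent camera for the price.",
}
index = build_inverted_index(documents)
results = search("excellent battery life", index)
print(results)
```

In this toy collection, `d1` matches all three query terms and is ranked first, while `d2` and `d3` each match one. A realistic system would replace the match count with a weighted similarity measure.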

Table 1. IR models

| IR Model | Description |
|---|---|
| Boolean Model | The query is represented by a Boolean expression of terms, and the terms are connected with Boolean operators. |
| Vector Space Model | Words and phrases are known as terms, and these terms are represented in the form of vectors. |
| Probabilistic Model | It estimates the probability that a document is relevant to the query. Documents are ordered by decreasing probability of relevance, which is known as the probability ranking principle. |
| Inference Network Model | Documents are modeled using the inference process in inference networks. Documents are ranked according to term strength. |
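To make the first two models in Table 1 concrete, the following sketch contrasts them on a tiny made-up collection. It is an illustration under stated assumptions, not a reference implementation: the documents, the helper names `docs_with`, `cosine`, and `rank`, and the use of raw term counts as vector weights are all hypothetical choices. The probabilistic and inference network models are not covered here.

```python
import math
from collections import Counter

# Made-up documents, for illustration only.
documents = {
    "d1": "deep learning for image retrieval",
    "d2": "web search with deep learning",
    "d3": "music anomaly detection",
}
tokens = {doc_id: text.split() for doc_id, text in documents.items()}


# Boolean model: a query is a set expression over terms.
def docs_with(term):
    """Return the set of document ids containing the term."""
    return {doc_id for doc_id, words in tokens.items() if term in words}


# Query: deep AND learning AND NOT web
boolean_hits = (docs_with("deep") & docs_with("learning")) - docs_with("web")


# Vector space model: documents and queries are term-count vectors.
def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query):
    """Rank all documents by cosine similarity to the query, best first."""
    q = Counter(query.split())
    scored = [(doc_id, cosine(q, Counter(words))) for doc_id, words in tokens.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


ranking = rank("deep learning image")
print(boolean_hits)
print(ranking)
```

The Boolean query returns an unordered set of exact matches, whereas the vector space query returns every document with a graded score, which is what allows the top-ranked documents to be returned to the user.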
