Movie Analytics for Effective Recommendation System using Pig with Hadoop

Arushi Jain (Ambedkar Institute of Advanced Communication Technologies and Research, New Delhi, India) and Vishal Bhatnagar (Department of Computer Science and Engineering, Ambedkar Institute of Advanced Communication Technologies and Research, New Delhi, India)
Copyright: © 2016 | Pages: 19
DOI: 10.4018/IJRSDA.2016040106


Movies have been a major source of entertainment ever since their inception in the late 19th century. The term movie is very broad; it spans many languages and genres such as drama, comedy, science fiction, and action. The data accumulated about movies over the years is vast, and analyzing it requires moving beyond traditional analytics techniques to big data analytics. In this paper, the authors take a movie data set and analyze it against various queries to uncover useful insights for an effective recommendation system and for rating upcoming movies.

Data has been growing exponentially over the years. A matching increase in processing capability is needed to handle the huge amount of data generated every second. Data that is large in size and complex in nature must be stored and processed properly, and this is how the term Big Data came into existence. Big Data is a buzzword, or catchphrase, used to describe large and complex data sets that cannot be processed easily with current analytical tools and technologies. Over time, the way data is harnessed has gradually evolved. Earlier, only companies generated data, and some of them were also its consumers. Now, in the Information Era, a new model has emerged in which all of us are both producers and consumers of data. Handling such massive volumes of structured, semi-structured, and unstructured data is an intricate task.

Figure 1 explains the transition from relational data to big data for processing varying amounts of data. Earlier, relational models were used only for OLTP (Online Transaction Processing) in a DBMS, with data stored as rows and columns in tables as structured data. Then, for better performance and output from a business perspective, another model was developed, one that emphasizes multi-dimensional analysis. Business dimensions play a prominent role in the decision-support environment. Online Analytical Processing (OLAP) serves this multi-dimensional model: it answers ad hoc queries and provides decision support, but the size of the data was still a hurdle. The main transition, from models operating on small data sets to Real-Time Analytics and Processing (RTAP) on large and multifaceted data sets, finally happened because of the Big Data architecture model. It enables all of us (companies, customers, web log files) to be producers as well as consumers of data. It is the current trend in the IT world, much as the Internet became one as more and more people connected to it.

Figure 1. From relational data to big data (Bhandarkar, 2010)
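The difference between row-oriented OLTP access and multi-dimensional OLAP analysis described above can be illustrated with a minimal Python sketch. The records, field names, and the helper functions `lookup` and `average_by` are hypothetical and chosen only for illustration; they are not taken from the paper, and a real OLAP system would use a dedicated engine rather than in-memory dictionaries.

```python
from collections import defaultdict

# Hypothetical rating records, laid out like rows of a relational table.
ratings = [
    {"movie": "Movie A", "genre": "drama", "year": 2014, "rating": 4.0},
    {"movie": "Movie B", "genre": "comedy", "year": 2014, "rating": 3.0},
    {"movie": "Movie C", "genre": "drama", "year": 2015, "rating": 5.0},
    {"movie": "Movie D", "genre": "comedy", "year": 2015, "rating": 2.0},
    {"movie": "Movie E", "genre": "drama", "year": 2015, "rating": 3.0},
]


def lookup(rows, movie):
    """OLTP-style access: fetch the single row that matches one key."""
    return next((r for r in rows if r["movie"] == movie), None)


def average_by(rows, *dimensions):
    """OLAP-style access: average rating grouped by the chosen dimensions."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of ratings, count]
    for r in rows:
        key = tuple(r[d] for d in dimensions)
        totals[key][0] += r["rating"]
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


row = lookup(ratings, "Movie C")                   # one transaction-like read
by_genre = average_by(ratings, "genre")            # one analytical dimension
by_genre_year = average_by(ratings, "genre", "year")  # two dimensions
```

Adding or removing dimension names in the call to `average_by` changes the level of aggregation, which is the kind of ad hoc question OLAP is meant to answer.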


The definition of Big Data is broadened using five characteristics, or “V’s”:

  1. Volume: the sheer amount of data, on the order of terabytes and even petabytes;
  2. Velocity: the speed at which the data is generated;
  3. Variety: the different forms of data present in big data, namely structured, semi-structured, and unstructured data;
  4. Value: the intrinsic value contained in big data;
  5. Veracity: the uncertainties in big data, such as missing, duplicate, and incomplete entries.
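As a rough illustration of the veracity characteristic, the following Python sketch counts incomplete and duplicate entries in a small set of movie rating records. The sample records, the field names, and the function `veracity_report` are assumptions made for this example and do not come from the paper's data set.

```python
def veracity_report(records, required_fields):
    """Count incomplete and duplicate records in a list of dicts."""
    seen = set()
    incomplete = 0
    duplicates = 0
    for rec in records:
        # A record is incomplete if any required field is absent or empty.
        if any(rec.get(f) in (None, "") for f in required_fields):
            incomplete += 1
        # Two records are duplicates if all required fields match.
        key = tuple(rec.get(f) for f in required_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "total": len(records),
        "incomplete": incomplete,
        "duplicates": duplicates,
    }


# Hypothetical sample data, for illustration only.
sample = [
    {"user": "u1", "movie": "Movie A", "rating": 4.0},
    {"user": "u1", "movie": "Movie A", "rating": 4.0},   # exact duplicate
    {"user": "u2", "movie": "Movie B", "rating": None},  # missing rating
    {"user": "u3", "movie": "", "rating": 3.0},          # empty movie title
]
report = veracity_report(sample, ["user", "movie", "rating"])
```

A check of this kind would typically run before any analysis, so that missing or repeated entries do not distort the resulting ratings.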
