Collaborative Filtering Based Data Mining for Large Data


Amrit Pal (Indian Institute of Information Technology Allahabad, India) and Manish Kumar (Indian Institute of Information Technology Allahabad, India)
Copyright: © 2017 | Pages: 13
DOI: 10.4018/978-1-5225-0489-4.ch006

Abstract

The size of data is increasing, which creates challenges for its processing and storage. Cluster-based techniques are available for storing and processing such large amounts of data. MapReduce provides an effective programming framework for developing distributed programs whose tasks produce results as key-value pairs. Collaborative filtering is the process of making recommendations based on users' previous ratings of particular items or services. Implementing collaborative filtering techniques on these distributed models poses challenges, and several techniques address them, including cluster-based collaborative filtering and MapReduce-based collaborative filtering. This chapter covers these techniques along with some basics of collaborative filtering.
Chapter Preview

Collaborative Filtering

Collaborative filtering is a rating-based system in which a user provides responses (ratings) in a specific domain; these responses help in recommending further items to similar users. There are two basic approaches, neighborhood-based and model-based, for selecting users and finding the similarity among them (Resnick, 1994).

A system holds two types of user information: active users and passive users. Users who are currently using the system are active users, while the information stored in a database about past activity and responses to items acts as the passive users (or passive user information). Neighborhood-based filtering (Herlocker, 2002) starts by selecting a sample of users from the set of passive users based on the similarity of their responses to a particular item.

The process of predicting an active user's rating for an item from the item set can be described as follows:

  • Select a set of passive users based on their similarity with the active user.

  • Calculate the mean rating for the active and passive users.

  • Measure the similarity between the active user and each passive user, for example with the Pearson correlation coefficient.

  • Select the users that have a high similarity value with respect to the active user.

  • Use these similarity values as weights to calculate the weighted average of the neighbors' deviations from their mean ratings.
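
The prediction equation itself does not appear in this preview. As a rough illustration only, the steps above can be sketched in Python. The sketch assumes the commonly used form of the neighborhood prediction, in which the predicted rating is the active user's mean rating plus the similarity-weighted average of each neighbor's deviation from that neighbor's own mean, normalized by the sum of the absolute weights. The function names (mean_rating, pearson, predict), the parameter k, the restriction to positively correlated neighbors, and the sample ratings are all hypothetical choices made for this example, not taken from the chapter.

```python
from math import sqrt

def mean_rating(ratings):
    """Mean of one user's ratings, given as a dict of item -> rating."""
    return sum(ratings.values()) / len(ratings)

def pearson(a, b):
    """Pearson correlation over co-rated items; 0.0 when it is undefined."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def predict(active, passive_users, item, k=2):
    """Predict the active user's rating for `item` from up to k neighbors."""
    # Only passive users who rated the item can contribute to the prediction.
    scored = [(pearson(active, r), r)
              for r in passive_users.values() if item in r]
    # Assumption: keep only positively correlated users, highest weight first.
    scored = [s for s in scored if s[0] > 0]
    neighbors = sorted(scored, key=lambda s: s[0], reverse=True)[:k]
    norm = sum(abs(w) for w, _ in neighbors)
    if norm == 0:
        # No usable neighbor: fall back to the active user's own mean.
        return mean_rating(active)
    # Weighted average of each neighbor's deviation from that neighbor's mean.
    deviation = sum(w * (r[item] - mean_rating(r)) for w, r in neighbors)
    return mean_rating(active) + deviation / norm

# Hypothetical sample data: ratings on a 1-5 scale.
active = {"A": 5, "B": 3, "C": 4}
passive = {
    "u1": {"A": 5, "B": 3, "C": 4, "D": 5},
    "u2": {"A": 1, "B": 5, "C": 2, "D": 1},
}
prediction = predict(active, passive, "D")
```

In this toy data, u1 agrees with the active user on every co-rated item while u2 is negatively correlated, so only u1 contributes to the prediction for item D. Other variants (for example, keeping negatively correlated neighbors or computing means over co-rated items only) are equally plausible readings of the steps above.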
