Peer-to-Peer Usage Analysis

Florent Masseglia; Pascal Poncelet; Maguelonne Teisseire

doi:10.4018/978-1-60566-014-1.ch154

Special Offers
- IGI Global’s New Emerging Topic e-Book Collections
  Acquire highly focused and affordable Cutting-Edge Peer-Reviewed Research Content through a selection of 17 topic-focused e-Book Collections discounted up to 90%, compared to list prices. Collection topics include Artificial Intelligence, Data Science, Language Learning, Marketing and Customer Relations, Sustainability, and many more. Hosted on the InfoSci^® platform, these collections feature no DRM, no additional cost for multi-user licensing, no embargo of content, full-text PDF & HTML format, and more.
  Learn More
- Open Access Book (Free Access) - Encyclopedia of Information Science and Technology, Sixth Edition (ISBN: 9781668473665)
  The Encyclopedia of Information Science and Technology, Sixth Edition) continues the legacy set forth by the first five editions by providing comprehensive coverage and up-to-date definitions of the most important issues, concepts, and trends pertaining to technological advancements and information management within a variety of settings and industries. The entire book is being published under open access.
  Read Now
- Open Access Book (Free Access) - Food Sustainability, Environmental Awareness, and Adaptation and Mitigation Strategies for Developing Countries (ISBN: 9781668456293)
  Food Sustainability, Environmental Awareness, and Adaptation and Mitigation Strategies for Developing Countries provides information on the recent technology, mitigation, and environmental protection that must be applied for food sustainability in developing countries. This book is being published under Platinum Open Access through funding from Diponegoro University, Indonesia.
  Read Now
- Open Access Book (Free Access) - New Models of Higher Education: Unbundled, Rebundled, Customized, and DIY (ISBN: 9781668438091)
  The Walmart Corporation and the Lumina Foundation have provided funding to make New Models of Higher Education: Unbundled, Rebundled, Customized, and DIY fully open access, completely removing any paywall between scholars in education and the latest research on new models for the future of higher education.
  Read Now
- Open Access Book (Free Access) - Handbook of Research on the Global View of Open Access and Scholarly Communications (ISBN: 9781799898054)
  Through a collaboration between IGI Global and the University of North Texas, the Handbook of Research on the Global View of Open Access and Scholarly Communications has been published as fully open access, completely removing any paywall between researchers of any field, and the latest research on the equitable and inclusive nature of Open Access and all of its complications.
  Read Now
Books
- - Books by Subject
  - Business, Administration, & Management
  - Scientific, Technical, & Medical (STM)
  - Education
  - Books by Field
Journals
- - Journals
  - OnDemand Journal Articles
  - Journals by Subject
  - Business, Administration, & Management
  - Scientific, Technical, & Medical (STM)
  - Education
  - Journals by Field
e-Collections
OnDemand
Open Access
- View All Open Access Opportunities
  Search across all of IGI Global’s available open access publishing opportunities to unleash your research potential.
  Find an Open Access Journal for Your Next Manuscript
  Search across all of IGI Global’s available open access publishing opportunities to unleash your research potential.
  Submit an Open Access Book Proposal
  Learn more about open access book publishing and how it can propel your research forward in the field.
  Convert Your Work to Open Access
  Already published? You can convert your work to open access to increase its impact through IGI Global’s Restrospective Open Access Program.
  Utilize Open Access Collection Database
  Open up your research potential by utilizing our open access content or integrating the open access collection into your library
  Consider Open Access Agreements
  For Libraries: consider no-cost or investment-level open access agreements with IGI Global to support your faculty's research endeavors.
  Search Funding Resources
  Looking for additional funding resources to support your open accesss endeavors? View industry resources compiled by our open access team.
  Review Open Access Policies & Ethical Guidelines
  Considering IGI Global to publish your work under open access? Review IGI Global’s open access policies and ethical guidelines
Publish with Us
Resources
- - Instructors
  - Course Adoption
  - Teaching Cases
  - K-12 Online Learning Collection
  - Authors and Editors
  - eEditorial Discovery^® System
  - Peer Review Process
  - Ethics and Malpractice
  - COPE Membership
  - Fair Use Policy
  - Open Access Publishing
  - FAQ
Catalogs
About Us

Peer-to-Peer Usage Analysis

Florent Masseglia, Pascal Poncelet, Maguelonne Teisseire

Source Title: Encyclopedia of Multimedia Technology and Networking, Second Edition

DOI: 10.4018/978-1-60566-014-1.ch154

OnDemand:

(Individual Chapters)

Available

$37.50

Current Special Offers

No Current Special Offers

Abstract

With the huge number of information sources available on the Internet and the high dynamics of their data, peer-to-peer (P2P) systems propose a communication model in which each party has the same capabilities and can initiate a communication session. These networks allow a group of computer users with the same networking program to connect with each other and directly access resources from one another. P2P architectures also provide a good infrastructure for data and computer intensive operations such as data mining. In this article we consider a new data mining approach for improving resource searching in a dynamic and distributed database such as an unstructured P2P system, that is, in Masseglia, Poncelet, and Teisseire (2006) we call this problem P2P usage analysis. More precisely we aim at discovering frequent behaviors among users of such a system. We will focus on the sequential order between actions performed on each node (requests or downloads) and show how this order has to be taken into account for extracting useful knowledge. For instance, it may be discovered, in a P2P file sharing network that for 77% of nodes from which a request is sent for “Mandriva Linux,” the file “Mandriva Linux 2005 CD1 i585-Limited- Edition-Mini.iso” is chosen and downloaded; then a new request is performed with the possible name of the remaining iso images (i.e., “Mandriva Linux 2005 Limited Edition”), and in the large number of returned results the image corresponding to “Mandriva Linux 2005 CD2 i585-Limited-Edition-Mini.iso” is chosen and downloaded. Such knowledge is very useful for proposing the user with often downloaded or requested files according to a majority of behaviors. It could also be useful in order to avoid extra bandwidth consumption, which is the main cost of P2P queries (Ng, Chu, Rao, Sripanidkulchai, & Zhang, 2003).

Chapter Preview

Top

Background

Mining either association rules or sequential patterns in very large distributed databases as unstructured P2P systems is far away from trivial. For instance, in Wolff and Schuster (2003), authors propose to mine association rules, that is, sets of objects that tend to associate with one another, in a P2P system. The proposed algorithm combines, at each node, association rule mining algorithm with a majority voting protocol to discover all of the association rules that exist in the distributed database.

By nature P2P systems are dynamic, that is, nodes act independently of one another, and intermediate results may be overturned as new data arrives. Furthermore, whenever a node departs, the sequence of that node also disappears, and the global database has to be reconsidered. Traditional approaches for mining sequential patterns (Agrawal & Srikant, 1995; Pei et al., 2001) are irrelevant in such a dynamic context because they consider that the whole database is available. In Masseglia, Teisseire, and Poncelet (2003), we proposed to discover relationships and global patterns that exist between connecting users (Web usage mining). We proposed a new “client/server/engine” architecture for taking advantage of the computing power available on the machine a user navigates with. In this article our goal is different since it takes into account the dynamic nature of the considered system. We consider that the connected nodes can act with a special peer (called meter peer) in order to provide the end user with a good approximation of patterns embedded in this very large distributed database.

Key Terms in this Chapter

Peer-to-Peer (P2P) Systems: “Peer-to-peer” (or P2P) refers to a class of systems and applications that employ distributed resources to perform a critical function in a decentralized manner. The resources encompass computing power, data (storage and content), network bandwidth, and presence (computers, human, and other resources). Among the critical functions that have to be performed by P2P systems, we can cite distributed computing, data/content sharing, communication and collaboration, or platform services.

Data Sequence: The sequence of itemsets representing the behavior of a client over a specific period. The database involved in a sequential pattern mining process is a (usually large) set of data sequences. Data sequence could be purchase transaction data collected by retail stores or requested URLs. See also Web Usage Mining.

Sequential Pattern: A sequence included in a data sequence such that each item in the sequential pattern appears in this data sequence with respect to the order between the itemsets in both sequences. See also Data Sequence.

Client/Server: Client/server describes the relationship between two computer programs in which one program, called the client, makes a service request from another program, called the server, which fulfills the request.

Itemset: Set of items that occur together.

Peer-to-Peer (P2P) Usage Mining: With the huge number of information sources available on the Internet and the high dynamics of their data, (P2P) systems allow a group of computer users with the same networking program to connect with each other and directly access resources from one another. P2P Usage Analysis aims at discovering frequent behaviors among users of such a system.

Web Usage Mining: With the growing popularity of the World Wide Web, large volumes of data such as the user’s address or requested URLs are automatically gathered by Web servers and stored in access log files (access log, error log …). Analysis of such files in order to understand user behavior is called Web usage mining and can provide useful information to improve a network’s performance, to rebuild a Web site dynamically, or even to target customers in an e-business environment.

Items: Items are merely characteristics of individuals or observations of individual behaviors.

Maximal Frequent Sequential Pattern: A sequential pattern included in at least n data sequences (with n the minimum support specified by the user). A sequential pattern is maximal when it is not included in another frequent sequential pattern. A frequent sequential pattern may represent, for instance, a frequent behavior of a set of customers on a P2P system.

Complete Chapter List

Search this Book:

Reset

MLA

APA

Chicago

Export Reference

Peer-to-Peer Usage Analysis

Abstract

Background

Key Terms in this Chapter

Complete Chapter List