User Profiles for Personalizing Digital Libraries
Giovanni Semeraro (University of Bari, Italy), Pierpaolo Basile (University of Bari, Italy), Marco de Gemmis (University of Bari, Italy) and Pasquale Lops (University of Bari, Italy)
Copyright: © 2009
Exploring digital collections to find information relevant to a user’s interests is a challenging task. Information preferences vary greatly across users; therefore, filtering systems must be highly personalized to serve the interests of each individual user. Algorithms designed to solve this problem base their relevance computations on user profiles, which maintain representations of the users’ interests. The main focus of this chapter is the adoption of machine learning to build user profiles that capture user interests from documents. These profiles are then used for intelligent document filtering in digital libraries. This work proposes exploiting the knowledge stored in machine-readable dictionaries to obtain accurate user profiles that describe user interests by referring to concepts in those dictionaries. The main aim of the proposed approach is to show a real-world scenario in which the combination of machine learning techniques and linguistic knowledge helps achieve intelligent document filtering.
Our research was mainly inspired by the following works.
Syskill & Webert (Pazzani & Billsus, 1997) is an agent that learns a user profile to identify interesting Web pages. The learning process first converts the HTML source of each page into positive and negative examples, represented as keyword vectors, and then applies learning algorithms such as Bayesian classifiers, a nearest neighbor algorithm, and a decision tree learner.
Personal WebWatcher (Mladenic, 1999) is a Web browsing recommendation service that generates a user profile based on the content analysis of the requested pages. Learning is done by a naïve Bayes classifier where documents are represented as weighted keyword vectors, and classes are “interesting” and “not interesting.”
Mooney and Roy (2000) adopt a text categorization method in their Libra system that performs content-based book recommendations by exploiting product descriptions obtained from the Web pages of the Amazon online digital store. Also in this case, documents are represented by using keywords, and a naïve Bayes text classifier is adopted.
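The keyword-based profile learning shared by these three systems can be illustrated with a minimal sketch. It assumes a multinomial naïve Bayes model with add-one smoothing over bag-of-words documents; the function names, the toy training data, and the smoothing choice are illustrative assumptions and are not taken from any of the cited systems.

```python
import math
from collections import Counter

def train_profile(examples):
    """Learn a keyword-based profile from (tokens, label) pairs.

    Labels are assumed to be "interesting" or "not interesting".
    """
    class_counts = Counter()   # number of documents per class
    word_counts = {}           # per-class keyword frequencies
    vocabulary = set()
    for tokens, label in examples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary

def classify(profile, tokens):
    """Return the class with the highest log posterior for a new document."""
    class_counts, word_counts, vocabulary = profile
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore keywords never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training set (hypothetical documents already split into keywords).
examples = [
    (["machine", "learning", "classifier"], "interesting"),
    (["digital", "library", "retrieval"], "interesting"),
    (["football", "match", "score"], "not interesting"),
    (["celebrity", "gossip", "news"], "not interesting"),
]
profile = train_profile(examples)
prediction = classify(profile, ["learning", "retrieval"])
```

Because the profile stores only surface keywords, a document that uses a synonym of a training keyword contributes nothing to the score, which is the limitation discussed next.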
The main limitation of these approaches is that they represent items by using keywords. The objective of our research is to create accurate semantic user profiles. Among the state-of-the-art systems that produce semantic user profiles, SiteIF (Magnini & Strapparava, 2001) is a personal agent for a multilingual news Web site that exploits a sense-based representation to build a user profile as a semantic network, whose nodes represent senses of the words in documents requested by the user.
The role of linguistic ontologies in knowledge-retrieval systems is explored in OntoSeek (Guarino, Masolo, & Vetere, 1999), a system designed for content-based information retrieval from online yellow pages and product catalogs. OntoSeek combines an ontology-driven content-matching mechanism based on WordNet with a moderately expressive representation formalism. The approach has shown that structured content representations coupled with linguistic ontologies can increase both recall and precision of content-based retrieval.
We adopted a content-based method that learns user profiles from documents represented by word senses rather than keywords. The senses are obtained through a word sense disambiguation strategy that exploits the WordNet IS-A hierarchy.
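To make the idea of disambiguating with an IS-A hierarchy concrete, the following sketch picks the sense of a target word that lies closest, in a hypernym tree, to the senses of its context words. It is not the chapter's actual strategy: the tiny hand-built hierarchy, the sense labels, and the path-based similarity measure are all assumptions chosen for illustration, and a real system would query WordNet instead.

```python
# Hypothetical IS-A (hypernym) links: child -> parent.
HYPERNYM = {
    "mouse.animal": "rodent",
    "rodent": "mammal",
    "cat.animal": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "animal": "entity",
    "mouse.device": "pointing_device",
    "pointing_device": "hardware",
    "keyboard.device": "hardware",
    "hardware": "artifact",
    "artifact": "entity",
}

# Hypothetical sense inventory: word -> candidate senses.
SENSES = {
    "mouse": ["mouse.animal", "mouse.device"],
    "cat": ["cat.animal"],
    "keyboard": ["keyboard.device"],
}

def ancestors(node):
    """Return the node followed by its IS-A ancestors up to the root."""
    chain = [node]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def path_similarity(a, b):
    """Similarity = 1 / (1 + edges on the path through the lowest common ancestor)."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    depth_b = {node: i for i, node in enumerate(chain_b)}
    for i, node in enumerate(chain_a):
        if node in depth_b:
            return 1.0 / (1 + i + depth_b[node])
    return 0.0  # no common ancestor

def disambiguate(word, context):
    """Choose the sense of `word` most similar to the senses of the context words."""
    def score(sense):
        return sum(
            max((path_similarity(sense, s) for s in SENSES.get(c, [])), default=0.0)
            for c in context
        )
    return max(SENSES[word], key=score)

animal_sense = disambiguate("mouse", ["cat"])
device_sense = disambiguate("mouse", ["keyboard"])
```

In this toy hierarchy, "mouse" next to "cat" resolves to the animal sense because both meet at "mammal", while "mouse" next to "keyboard" resolves to the device sense because both meet at "hardware".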
Key Terms in this Chapter
Synset: A group of data elements that are considered semantically equivalent for the purposes of information retrieval.
Personalization: The process of tailoring products or services to users based on their user profiles.
Word Sense Disambiguation: The problem of determining which sense of a word with several distinct senses is used in a given sentence.
User Profile: A structured representation of interests (and disinterests) of a user or group of users.
NLP (Natural Language Processing): A subfield of artificial intelligence and linguistics that studies the problems of automated generation and understanding of natural human languages. It converts samples of human language into more formal representations that are easier for computer programs to manipulate.
WordNet: A semantic lexicon for the English language. It groups English words into sets of synonyms called synsets. It provides short, general definitions, and records the various semantic relations between these synonym sets.
Recommender system: A system that guides users in a personalized way to interesting or useful objects in a large space of possible options.