Web Mining to Identify People of Similar Background
Quanzhi Li (Avaya, Inc., USA) and Yi-fang Brook Wu (New Jersey Institute of Technology, USA)
Copyright © 2009.
17 pages.
DOI: 10.4018/978-1-59904-990-8.ch023
MLA
Li, Quanzhi and Yi-fang Brook Wu. "Web Mining to Identify People of Similar Background." Handbook of Research on Text and Web Mining Technologies. IGI Global, 2009. 369-385. Web. 18 Jun. 2013. doi:10.4018/978-1-59904-990-8.ch023
APA
Li, Q., & Brook Wu, Y. (2009). Web Mining to Identify People of Similar Background. In M. Song, & Y. Brook Wu (Eds.), Handbook of Research on Text and Web Mining Technologies (pp. 369-385). Hershey, PA: Information Science Reference. doi:10.4018/978-1-59904-990-8.ch023
Chicago
Li, Quanzhi and Yi-fang Brook Wu. "Web Mining to Identify People of Similar Background." In Handbook of Research on Text and Web Mining Technologies, ed. Min Song and Yi-Fang Brook Wu, 369-385 (2009), accessed June 18, 2013. doi:10.4018/978-1-59904-990-8.ch023
Abstract
This chapter presents a new approach to mining the Web to identify people of similar background. Finding people on the Web who are similar to a given person raises two major research issues: person representation and person matching. The chapter proposes a person representation method that uses a person’s personal Web site to represent that person’s background. Based on this representation, the main proposed algorithm integrates the textual content and hyperlink information of all the Web pages belonging to a personal Web site to represent a person and to match persons. Other algorithms are also explored and compared with the main proposed algorithm. The evaluation methods and experimental results are presented.

Previous Studies
In this section, we introduce previous studies on person search and people search. Previous studies of person search mainly focus on how to find Web pages related to a person, given this person’s name as the query. In these systems, the query, which is usually a person’s name, is sent to regular search engines, and the search results are then refined to find Web pages related to this person.

Complete Chapter List
1.
Ying Liu (The Hong Kong Polytechnic University Hong Kong SAR, China)
In the automated text classification, a bag-of-words representation followed by the tfidf weighting is the most popular approach to convert the textual documents int...
2.
Yi-fang Brook Wu (New Jersey Institute of Technology, USA), Quanzhi Li (Avaya, Inc., USA)
Document keyphrases provide semantic metadata which can characterize documents and produce an overview of the content of a document. This chapter describes a Keyphra...
3.
John Atkinson (Universidad de Concepción, Chile)
This chapter introduces a novel evolutionary model for intelligent text mining. The model deals with issues concerning shallow text representation and processing for...
4.
Xiaoyan Yu (Virginia Tech, USA), Manas Tungare (Virginia Tech, USA), Weigo Yuan (Virginia Tech, USA), Yubo Yuan (Virginia Tech, USA), Manuel Pérez-Quiñones (Virginia Tech, USA), Edward A. Fox (Virginia Tech, USA)
Syllabi are important educational resources. Gathering syllabi that are freely available and creating useful services on top of the collection presents great value f...
5.
Xiao-Li Li (Institute for Infocomm Research, A* STAR, Singapore)
In traditional text categorization, a classifier is built using labeled training documents from a set of predefined classes. This chapter studies a different problem...
6.
Yu-Jin Zhang (Tsinghua University, Beijing, China)
Mining techniques can play an important role in automatic image classification and content-based retrieval. A novel method for image classification based on feature...
7.
Han-joon Kim (University of Seoul, Korea)
This chapter introduces two practical techniques for improving Naïve Bayes text classifiers that are widely used for text classification. The Naïve Bayes has been ev...
8.
Ricco Rakotomalala (University of Lyon, France), Faouzi Mhamdi (University of Jandouba, Tunisia)
In this chapter, we are interested in proteins classification starting from their primary structures. The goal is to automatically affect proteins sequences to their...
9.
Wilson Wong (University of Western Australia, Australia)
Feature-based semantic measurements have played a dominant role in conventional data clustering algorithms for many existing applications. However, the applicability...
10.
Xiaohui Cui (Oak Ridge National Laboratory, USA)
In this chapter, we introduce three nature inspired swarm intelligence clustering approaches for document clustering analysis. The major challenge of today’s informa...
11.
P. Viswanth (Indian Institute of Technology Guwahati, India)
Clustering is a process of finding natural grouping present in a dataset. Various clustering methods are proposed to work with various types of data. The quality of...
12.
Abdelmalek Amine (Djillali Liabes University, Algeria & Taher Moulay University Center, Algeria), Zakaria Elberrichi (Djillali Liabes University, Algeria), Michel Simonet (Joseph Fourier University, France), Ladjel Bellatreche (University of Poitiers, France), Mimoun Malki (Djillali Liab)
The classification of textual documents has been the subject of many studies. Technologies like the Web and numerical libraries facilitated the exponential growth of...
13.
Lean Yu (Chinese Academy of Sciences, China), Shouyang Wang (Chinese Academy of Sciences, China), Kin Keung Lai (City University of Hong Kong, China)
With the rapid increase of the huge amount of online information, there is a strong demand for Web text mining which helps people discover some useful knowledge from...
14.
Sangeetha Kutty (Queensland University of Technology, Australia)
With the emergence of XML standardization, XML documents have been widely used and accepted in almost all the major industries. As a result of the widespread usage,...
15.
Richi Nayak (Queensland University of Technology, Australia)
XML has gained popularity for information representation, exchange and retrieval. As XML material becomes more abundant, its heterogeneity and structural irregularit...
16.
Francesco Buccafurri (University “Mediterranea” of Reggio Calabria, Italy)
In the context of Knowledge Discovery in Databases, data reduction is a pre-processing step delivering succinct yet meaningful data to sequent stages. If the target...
17.
Jan H. Kroeze (University of Pretoria, South Africa)
This chapter discusses the application of some data warehousing techniques on a data cube of linguistic data. The results of various modules of clausal analysis can...
18.
Yi-fang Brook Wu (New Jersey Institute of Technology, USA), Xin Chen (Microsoft Corporation, USA)
This chapter presents a methodology for personalized knowledge discovery from text. Traditionally, problems with text mining are numerous rules derived and many alre...
19.
Catia Pesquita (University of Lisbon, Portugal)
Biomedical research generates a vast amount of information that is ultimately stored in scientific publications or in databases. The information in scientific texts...
20.
Luis M. de Campos (University of Granada, Spain)
In this chapter, we present a thesaurus application in the field of text mining and more specifically automatic indexing on the set of descriptors defined by a thesa...
21.
Stanley Loh (Lutheran University of Brazil, Brazil), Leandro Krug Wives (Federal University of Rio Grande do Sul, Brazil), Daniel Lichtnow (Catholic University of Pelotas, Brazil), José Palazzo M. de Oliveira (Federal University of Rio Grande do Sul, Brazil)
The goal of this chapter is to present an approach to mine texts through the analysis of higher level characteristics (called “concepts”), minimizing the vocabulary...
22.
Marcello Pecoraro (University of Naples Federico II, Italy)
This chapter aims at providing an overview about the use of statistical methods supporting the Web Usage Mining. Within the first part is described the framework of...
23.
Quanzhi Li (Avaya, Inc., USA), Yi-fang Brook Wu (New Jersey Institute of Technology, USA)
This chapter presents a new approach of mining the Web to identify people of similar background. To find similar people from the Web for a given person, two major re...
24.
Pawan Lingras (Saint Mary’s University, Canada)
This chapter describes how Web usage patterns can be used to improve the navigational structure of a Web site. The discussion begins with an illustration of visualiz...
25.
Rosa Meo (Università di Torino, Italy), Maristella Matera (Politecnico di Milano, Italy)
In this chapter, we present the usage of a modeling language, WebML, for the design and the management of dynamic Web applications. WebML also makes easier the analy...
26.
Brigitte Trousse (INRIA Sophia Antipolis, France), Marie-Aude Aufaure (INRIA Sophia and Supélec, France), Bénédicte Le Grand (Laboratoire d’Informatique de Paris 6, France), Yves Lechevallier (INRIA Rocquencourt, France), Florent Masseglia (INRIA Sophia Antipolis, France)
This chapter proposes an original approach for ontology management in the context of Web-based information systems. Our approach relies on the usage analysis of the...
27.
Yue-Shi Lee (Ming Chuan University, Taiwan, ROC)
Web mining is one of the mining technologies, which applies data mining techniques in large amounts of Web data to improve the Web services. Web traversal pattern mi...
28.
Stanley R.M. Oliveira (Embrapa Informática Agropecuária, Brazil), Osmar R. Zaïane (University of Alberta, Edmonton, Canada)
Privacy-preserving data mining (PPDM) is one of the newest trends in privacy and security research. It is driven by one of the major policy issues of the information...
29.
G.S. Mahalakshmi (Anna University, Chennai, India), S. Sendhilkumar (Anna University, Chennai, India)
Automatic reference tracking involves systematic tracking of reference articles listed for a particular research paper by extracting the references of the input seed...
30.
Wilson Wong (University of Western Australia, Australia)
As more electronic text is readily available, and more applications become knowledge intensive and ontology-enabled, term extraction, also known as automatic term re...
31.
Fotis Lazarinis (University of Sunderland, UK)
Over 60% of the online population are non-English speakers and it is probable the number of non-English speakers is growing faster than English speakers. Most search...
32.
Anne Kao (The Boeing Phantom Works, USA)
Latent Semantic Analysis (LSA) or Latent Semantic Indexing (LSI), when applied to information retrieval, has been a major analysis approach in text mining. It is an...
33.
Ganesh Ramakrishnan (IBM India Research Labs, India), Pushpak Bhattacharyya (IIT Bombay, India)
Text mining systems such as categorizers and query retrievers of the first generation were largely hinged on word level statistics and provided a wonderful first-cut...
34.
Giuseppe Manco (Italian National Research Council, Italy), Riccardo Ortale (University of Calabria, Italy), Andrea Tagarelli (University of Calabria, Italy)
Personalization is aimed at adapting content delivery to users’ profiles: namely, their expectations, preferences and requirements. This chapter surveys some well-kn...
35.
Alexander Dreweke (Friedrich-Alexander University Erlangen-Nuremberg, Germany), Ingrid Fischer (University of Konstanz, Germany), Tobias Werth (Friedrich-Alexander University Erlangen-Nuremberg, Germany), Marc Wörlein (Friedrich-Alexander University Erlangen-Nure)
Searching for frequent pieces in a database with some sort of text is a well-known problem. A special sort of text is program code as e.g. C++ or machine code for em...
36.
Nitin Agarwal (Arizona State University, USA), Huan Liu (Arizona State University, USA), Jianping Zhang (MITRE Corporation, USA)
In Golbeck and Hendler (2006), authors consider those social friendship networking sites where users explicitly provide trust ratings to other members. However, for...
37.
Pasquale De Meo (Università degli Studi Mediterranea di Reggio Calabria, Italy)
In this chapter we present an information system conceived for supporting managers of Public Health Care Agencies to decide the new health care services to propose....
38.
Diego Liberati (Istituto di Elettronica e Ingegneria dell’Informazione e delle Telecomunicazioni Consiglio Nazionale delle Ricerche Politecnico di Milano, Italy)
Building effective multitarget classifiers is still an on-going research issue: this chapter proposes the use of the knowledge gleaned from a human expert as a pract...
39.
Shuting Xu (Virginia State University, USA)
Text mining is an instrumental technology that today’s organizations can employ to extract information and further evolve and create valuable knowledge for more effe...
40.
E. Thirumaran (Indian Institute of Science, India)
This chapter introduces Collaborative filtering-based recommendation systems, which has become an integral part of E-commerce applications, as can be observed in sit...
41.
Hanna Suominen (Turku Centre for Computer Science (TUCS), Finland & University of Turku, Finland)
The purpose of this chapter is to provide an overview of prevalent measures for evaluating the quality of system output in seven key text mining task domains. For ea...
42.
Yanliang Qi (New Jersey Institute of Technology, USA)
The biology literatures have been increased in an exponential growth in recent year. The researchers need an effective tool to help them find out the needed informat...
43.
Ki Jung Lee (Drexel University, USA)
With the increased use of Internet, a large number of consumers first consult on line resources for their healthcare decisions. The problem of the existing informati...
44.
Richard S. Segall (Arkansas State University, USA)
This chapter presents background on text mining, and comparisons and summaries of seven selected software for text mining. The text mining software selected for disc...
45.
Ah Chung Tsoi (Monash University, Australia), Phuong Kim To (Tedis P/L, Australia), Markus Hagenbuchner (University of Wollongong, Australia)
This chapter describes the application of a number of text mining techniques to discover patterns in the health insurance schedule with an aim to uncover any inconsi...
46.
Miao-Ling Wang (Minghsin University of Science & Technology, Taiwan, ROC), Hsiao-Fan Wang (National Tsing Hua University, Taiwan, ROC)
With the ever-increasing and ever-changing flow of information available on the Web, information analysis has never been more important. Web text mining, which inclu...
47.
Neil Davis (The University of Sheffield, UK)
Text mining technology can be used to assist in finding relevant or novel information in large volumes of unstructured data, such as that which is increasingly avail...
Key Terms in this Chapter
People Search: A search for other people who have interests or a background similar to those of a given person. It is called “people search” because its purpose is to find a list of people who are similar to the given one in terms of interests and background.
Link Similarity: The degree of similarity between two Web sites (or Web pages), based on the link information (inlinks and outlinks) of the two Web sites (or Web pages).
Content Similarity: The degree of similarity between two Web sites (or Web pages), based on the textual content (the terms appearing in them) of the two Web sites (or Web pages).
Web Page and Web Site: In this study, a Web page is a single Web document in a Web site. A Web site holds one or more Web pages.
Person Search: A type of search that finds pages related to a specific person, given this person’s name as the query. It aims at finding pages authored by that person or containing information about that person.
Inlink and Outlink: For a Web page W, an inlink is the URL of another Web page that contains a link pointing to W, and an outlink is a link (URL) appearing in W that points to another Web page.
Term Weight: Different terms have different importance in a textual unit, e.g., a document, a document collection, or a Web site. A term’s weight is a value representing how important the term is in that textual unit. Usually a term’s frequency of appearance in a document, or its TF.IDF value, is used as its weight.
Word Stemming: A process that strips off word endings, reducing words to a root form or a common stem. For example, after word stemming is applied to the words “designed,” “designs,” and “designing,” they share the same root form, “design.”
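The content similarity, link similarity, and term weight definitions above can be illustrated with a small sketch. This is not the chapter's algorithm: its weighting and combination formulas are not given on this page. The sketch assumes TF.IDF weights compared by cosine similarity for content, Jaccard overlap of link URL sets for links, and a hypothetical mixing parameter alpha. All function names, site names, and URLs are illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(sites):
    """Return {site: {term: weight}} with weight = tf * log(N / df).

    `sites` maps a site name to the list of terms (assumed already
    stemmed) collected from all pages of that personal Web site.
    """
    n = len(sites)
    df = Counter()
    for terms in sites.values():
        df.update(set(terms))  # count each term once per site
    vectors = {}
    for name, terms in sites.items():
        tf = Counter(terms)
        vectors[name] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard(a, b):
    """Overlap of two URL sets (inlinks and outlinks pooled together)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(content_sim, link_sim, alpha=0.5):
    """Hypothetical linear mix; alpha is an assumed tuning parameter."""
    return alpha * content_sim + (1 - alpha) * link_sim


# Illustrative data: three personal sites with stemmed terms and link URLs.
sites = {
    "alice": ["web", "mine", "search", "web"],
    "bob": ["web", "mine", "cluster"],
    "carol": ["protein", "sequence", "cluster"],
}
links = {
    "alice": {"http://a.example/x", "http://b.example/y"},
    "bob": {"http://a.example/x", "http://c.example/z"},
    "carol": {"http://d.example/w"},
}

vectors = tfidf_vectors(sites)


def score(other, target="alice"):
    """Combined similarity between `other` and the target person's site."""
    return combined_similarity(
        cosine(vectors[target], vectors[other]),
        jaccard(links[target], links[other]),
    )


# People ranked by similarity to "alice", most similar first.
ranked = sorted((name for name in sites if name != "alice"),
                key=score, reverse=True)
```

In this toy data, "bob" shares both terms and a link with "alice" while "carol" shares neither, so "bob" ranks first. A linear mix is only one way to integrate the two signals; the weight given to links versus content would need to be tuned against an evaluation set.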