Automatic reference tracking is the systematic tracking of the reference articles listed in a research paper: the references of the input seed publication are extracted, and the relevance of each referred paper to the seed paper is then analyzed. Tracking continues recursively, with every reference paper treated as a seed paper at each track level, until the system reaches references deep within the reference tracks that are irrelevant (or only distantly relevant) and contribute little to the understanding of the seed paper at hand. Relevance is analyzed using keywords collected from the title and abstract of the referred article. The objective of the reference tracking system is to automatically list closely relevant reference articles that aid the understanding of the seed paper, thereby facilitating the literature survey of the aspiring researcher. This paper proposes the design of an automatic reference tracking system, presents its evaluation, and discusses the observations obtained.
The World Wide Web (WWW), which is expanding every day, has become a repository of up-to-date information for scientific scholars and researchers. Every day numerous publications are deposited in a highly distributed fashion across various sites, and the collection of online journals is also increasing rapidly. The growing proportion of online scholarly literature makes it desirable to cite these sources as references. However, the Web’s current navigation model of browsing from site to site does not facilitate retrieving and integrating data from multiple sites.
Linking documents seems to be a natural proclivity of scholars. Whether searching for additional information or for information presented in simpler terms, the user is often misled by the abundant pool of publications, loses the direction of the search in relation to the research problem, and forgets the source that motivated the navigation in the first place. Budding researchers frequently find themselves lost amid conceptually ambiguous references while exploring a seed article, whether they are aiming at a clearer understanding of the seed document or trying to find the crux of the concept behind the journal or conference article. Integrating the bibliographical information of publications available on the Web is therefore an important task: it helps researchers understand the essence of recent advancements and keeps them from being misled by publication information distributed across various sites.
Automatic reference tracking (ART) acts as a good shepherd for struggling researchers by bringing the relevant reference articles into the user’s purview. The relevant articles that are identified and downloaded are stored for later use. ART-listed references (the actual references across the various levels from the seed article) are presented year-wise, author-wise or relevance-wise [Mahalakshmi G.S. and Sendhilkumar Selvaraju, 2006] for easier browsing of the ART output.
ART recursively tracks the reference articles listed in a research paper by extracting the references of the input seed publication and analyzing the relevance [Valerie V. Cross, 2001] of each referred paper with respect to the seed paper. The recursion continues, with every reference paper treated as a seed paper at each track level, until the system reaches references deep within the reference tracks that are irrelevant (or only distantly relevant) and contribute little to the understanding of the input seed publication.
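The recursion described above can be sketched roughly as follows. This is a minimal illustration, not the ART implementation: the Jaccard keyword overlap, the `threshold` and `max_depth` parameters, the tiny stop-word list, the choice to score every reference against the original seed, and the in-memory `PAPERS` dictionary (which stands in for articles harvested from the Web) are all assumptions made for the example.

```python
import re

# Tiny stop-word list, assumed for the example only.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to", "with"}

def keywords(text):
    """Lower-cased word set with stop words removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def relevance(seed, paper):
    """Jaccard overlap of title+abstract keywords (an assumed measure)."""
    a = keywords(seed["title"] + " " + seed["abstract"])
    b = keywords(paper["title"] + " " + paper["abstract"])
    return len(a & b) / len(a | b) if (a | b) else 0.0

def track(seed_id, papers, threshold=0.2, max_depth=3):
    """Return (paper_id, level, score) for references judged relevant."""
    seed = papers[seed_id]
    results, visited = [], {seed_id}

    def visit(current_id, level):
        if level > max_depth:
            return
        for ref_id in papers[current_id]["refs"]:
            if ref_id in visited or ref_id not in papers:
                continue
            visited.add(ref_id)
            score = relevance(seed, papers[ref_id])
            if score < threshold:
                continue  # prune: do not descend below an irrelevant reference
            results.append((ref_id, level, round(score, 2)))
            visit(ref_id, level + 1)  # the reference becomes the next seed

    visit(seed_id, 1)
    return results

# Hypothetical toy citation graph; titles and abstracts are invented.
PAPERS = {
    "seed": {"title": "Automatic reference tracking",
             "abstract": "Recursive tracking of reference articles by relevance",
             "refs": ["p1", "p2"]},
    "p1": {"title": "Reference linking for scholarly articles",
           "abstract": "Tracking reference articles on the web",
           "refs": ["p3"]},
    "p2": {"title": "Protein folding dynamics",
           "abstract": "Molecular simulation of folding",
           "refs": ["p4"]},
    "p3": {"title": "Relevance of reference articles",
           "abstract": "Automatic relevance tracking",
           "refs": []},
    "p4": {"title": "Automatic reference tracking tools",
           "abstract": "Reference tracking",
           "refs": []},
}

print(track("seed", PAPERS))  # [('p1', 1, 0.33), ('p3', 2, 0.83)]
```

In this toy graph `p2` shares no keywords with the seed, so the track below it is abandoned and `p4` is never examined, which is the pruning behaviour the text describes. A real system would also need to decide whether relevance is measured against the original seed or against the paper currently acting as seed; the sketch picks the former.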
Tracking of references implies locating the references on the Web through commercial search engines and successfully downloading them [Junjie Chen, Lizhen Liu, Hantao Song and Xueli Yu, 2001] so that they can be fed into subsequent recursions. Web pages, however, change constantly in both content and existence, often without notification [Xiangzhu Gao, San Murugesan and Bruce Lo, 2003]. Maintaining an up-to-date Web address for every reference article is therefore not feasible. ART instead takes a dynamic approach to retrieving articles across the Web. The title of every reference article is extracted from the reference region after successful reference parsing. The title is then submitted as a query to search engines, and the article is harvested irrespective of its location on the Web. During the tracking of references, pulling down each reference entry and parsing it yields metadata [Day M.Y., Tzong-Hon Tsai, Sung C.L., Lee C.W., Wu S.H., Ong C.S. and Hsu W.L., 2005; Mahalakshmi G.S. and Sendhilkumar Selvaraju, 2006], which is populated into the database to support further search in a recursive fashion.
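As a rough illustration of the title-as-query step, the sketch below parses one reference entry and builds a search URL from its title. The reference layout handled by `REF_PATTERN` ("Authors (Year). Title. Venue."), the function names, and the endpoint `https://search.example.org/search` are assumptions for the example; the parsing method actually used by ART is not specified here, and real reference lists vary far more than a single regular expression can handle.

```python
import re
from urllib.parse import urlencode

# Assumed layout: "Authors (Year). Title. Venue details."
REF_PATTERN = re.compile(
    r"^(?P<authors>.+?)\s*\((?P<year>\d{4})\)\.\s*"
    r"(?P<title>.+?)\.\s*(?P<venue>.*)$"
)

def parse_reference(entry):
    """Split one reference entry into metadata fields, or return None."""
    match = REF_PATTERN.match(entry.strip())
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}

def title_query_url(title, base="https://search.example.org/search"):
    """Build a search URL with the quoted title as the query.

    The base URL is a placeholder, not a real search engine endpoint.
    """
    return base + "?" + urlencode({"q": '"' + title + '"'})

# Invented reference entry, used only to exercise the functions.
entry = ("Doe, J., & Roe, R. (2004). A sample title about reference linking. "
         "Journal of Examples, 1(2), 3-4.")
meta = parse_reference(entry)
print(meta["title"])
print(title_query_url(meta["title"]))
```

Querying by title rather than by a stored address is what makes the approach robust to pages that move or disappear: the metadata record keeps the title, and the location is rediscovered at tracking time.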
Depending on the user’s needs, the data is projected in a suitable representation. The representation is used to track information by a specific attribute chosen by the user, such as author, title or year of publication, thereby providing complete information about the references cited in the seed article.
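One way to picture this attribute-based projection is a small metadata store queried with different sort orders. The sketch below uses an in-memory SQLite database; the table name `refs`, its columns, the sample rows and the `list_references` helper are all hypothetical and are not taken from the ART system.

```python
import sqlite3

# Whitelisted orderings, so user input never reaches the SQL text directly.
ALLOWED_SORT = {
    "year": "year DESC",
    "author": "first_author ASC",
    "relevance": "relevance DESC",
}

def build_store(rows):
    """Create an in-memory table of reference metadata (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE refs (title TEXT, first_author TEXT, "
        "year INTEGER, level INTEGER, relevance REAL)"
    )
    conn.executemany("INSERT INTO refs VALUES (?, ?, ?, ?, ?)", rows)
    return conn

def list_references(conn, by="relevance"):
    """Return reference titles ordered by the chosen attribute."""
    if by not in ALLOWED_SORT:
        raise ValueError("unsupported attribute: " + by)
    sql = "SELECT title FROM refs ORDER BY " + ALLOWED_SORT[by]
    return [row[0] for row in conn.execute(sql)]

# Invented sample rows: (title, first author, year, track level, relevance).
conn = build_store([
    ("Paper A", "Zed", 2001, 1, 0.4),
    ("Paper B", "Abel", 2005, 2, 0.9),
    ("Paper C", "Moss", 2003, 1, 0.1),
])
for attribute in ("year", "author", "relevance"):
    print(attribute, list_references(conn, by=attribute))
```

Keeping the orderings in a fixed dictionary is a deliberate choice: the user picks an attribute name, and only a known `ORDER BY` clause is ever appended to the query.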
This chapter discusses the design and implementation of a reference tracking system that brings the value of reference linking to the scholarly side of the Web and visualizes the collected documents in a more understandable manner. The research described demonstrates experimentally that such systems encourage the viewing and use of information that would not otherwise be viewed, by reducing the cognitive effort required to find, evaluate and access information.
Key Terms in this Chapter
Web Search Query: A Web search query is a query that the user enters into a Web search engine to satisfy his or her information needs.
Relevance Feedback: The idea behind relevance feedback is to take the results that are initially returned from a given query and to use information about whether or not those results are relevant to perform a new query.
Information Filtering System: An information filtering system is a system that removes redundant or unwanted information from an information stream using (semi)automated or computerized methods prior to presentation to a human user. Its main goals are to manage information overload and to increase the semantic signal-to-noise ratio.
Information Retrieval: Information retrieval is the science of searching for information in documents, searching for documents themselves, searching for metadata that describes documents, or searching within databases, whether relational stand-alone databases or hypertextually networked databases such as the World Wide Web.
Text Mining: Text mining usually involves structuring the input text (usually by parsing, adding some linguistic features and removing others, and then inserting the result into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output.
Web Search: Web search means searching for documents located on the Web by supplying queries to search engines.