Automatically Extracting and Tagging Business Information for E-Business Systems Using Linguistic Analysis
Sumali J. Conlon (University of Mississippi, USA), Susan Lukose (University of Mississippi, USA), Jason G. Hale (University of Mississippi, USA) and Anil Vinjamur (University of Mississippi, USA)
Copyright: © 2009
The Semantic Web will require semantic representations of information that computers can understand when they process business applications. Most Web content is currently represented in formats, such as text, that facilitate human understanding rather than in the more structured formats that allow automated processing and computer understanding. This chapter explores how natural language processing (NLP) principles, using linguistic analysis, can be employed to extract information from unstructured Web documents and translate it into Extensible Markup Language (XML), the enabling currency of today’s e-business applications and the foundation for the emerging Semantic Web languages of tomorrow. Our prototype system is built and tested with online financial documents.
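As a rough illustration of the text-to-XML idea described above, the following Python sketch pulls one kind of fact out of a financial sentence and tags it as XML. It is not the chapter's prototype: the single regular expression stands in for real linguistic analysis, and the pattern, the function name `extract_to_xml`, the element names (`financialFacts`, `fact`, `company`, `metric`, `amount`), and the sample sentence are all assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical surface pattern: "<Company> reported <metric> of $<amount> <scale>".
# A real system would rely on linguistic analysis rather than one regex.
PATTERN = re.compile(
    r"(?P<company>[A-Z][\w&.]*(?: [A-Z][\w&.]*)*) reported "
    r"(?P<metric>net income|revenue) of "
    r"\$(?P<amount>\d+(?:\.\d+)?) (?P<scale>million|billion)"
)


def extract_to_xml(text: str) -> str:
    """Return an XML string with one <fact> element per pattern match in text."""
    root = ET.Element("financialFacts")  # illustrative element names, not a standard
    for match in PATTERN.finditer(text):
        fact = ET.SubElement(root, "fact")
        ET.SubElement(fact, "company").text = match.group("company")
        ET.SubElement(fact, "metric").text = match.group("metric")
        amount = ET.SubElement(
            fact, "amount", currency="USD", scale=match.group("scale")
        )
        amount.text = match.group("amount")
    return ET.tostring(root, encoding="unicode")


# Invented sample sentence, used only to exercise the sketch.
sample = "Acme Corp reported net income of $12.5 million for the quarter."
print(extract_to_xml(sample))
```

Once the facts are in XML, a downstream application can query them by element name instead of re-reading the prose, which is the kind of automated processing the abstract contrasts with human-oriented text.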