News Trends Processing Using Open Linked Data

Antonio Garrote (Universidad de Salamanca, Spain) and María N. Moreno García (Universidad de Salamanca, Spain)
Copyright: © 2013 | Pages: 7
DOI: 10.4018/978-1-4666-2827-4.ch010

Abstract

In this chapter we describe a news trends detection system built to detect daily trends in a large collection of news articles extracted from the web and to expose the computed trends as open linked data that other components of the IT infrastructure can consume. Because of the volume of data involved, the system relies on big data technologies to process the raw news data and compute the trends, which are later exposed as open linked data. Thanks to the open linked data interface, the data can be easily consumed by other components of the application, such as a JavaScript front-end, or re-used by different IT systems. The case is a good example of how open linked data can provide a convenient interface to big data systems.

Case Description

The main goal of the project was to make available daily news trends as a structured data source that could be used as an additional input in any data analysis task being performed in the organization. Computation of the news trends was to be achieved in a series of steps involving:

  • Crawling of raw news data from web sources.

  • Classification of the news data by country, language, and topic.

  • Extraction of trends using natural language processing techniques.

  • Storage of the processed trends in the data cluster in a structured format compatible with Apache Hive.

  • Building a data interface that exposes the available data as a collection of web services, so that other applications can re-use it without directly accessing the data stored in HDFS.

  • Providing a web application that exposes the news trend data through a user interface suitable for non-technical users.
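
The middle steps of this list can be illustrated with a minimal in-memory sketch. It is not the system described in the chapter, which runs on Hadoop and stores its results in a Hive-compatible format. It assumes that articles have already been crawled and classified by country, language, and topic, it substitutes plain term frequency for the natural language processing techniques (which this preview does not name), and it serializes the result in a JSON-LD style. The function names, the stop-word list, and the `http://example.org/` vocabulary and URIs are all hypothetical.

```python
import json
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real system would use a per-language list.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for", "is"}


def tokenize(text):
    """Lowercase the text and return its ASCII alphabetic tokens minus stop words."""
    # The [a-z]+ pattern only suits English-like text; it is a simplification.
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def extract_trends(articles, top_n=3):
    """Group articles by (country, language, topic) and rank terms by frequency.

    Term frequency is a stand-in for the chapter's unspecified NLP techniques.
    """
    counts = defaultdict(Counter)
    for article in articles:
        key = (article["country"], article["language"], article["topic"])
        counts[key].update(tokenize(article["text"]))
    return {key: counter.most_common(top_n) for key, counter in counts.items()}


def to_jsonld(trends, date):
    """Serialize trends as JSON-LD-style dictionaries under an assumed vocabulary."""
    docs = []
    for (country, language, topic), terms in sorted(trends.items()):
        for rank, (term, count) in enumerate(terms, start=1):
            docs.append({
                # Assumed vocabulary and URI scheme, for illustration only.
                "@context": {"@vocab": "http://example.org/trends#"},
                "@id": f"http://example.org/trends/{date}/{country}/{language}/{topic}/{rank}",
                "@type": "Trend",
                "date": date,
                "country": country,
                "language": language,
                "topic": topic,
                "term": term,
                "count": count,
            })
    return docs
```

Calling `json.dumps(to_jsonld(extract_trends(articles), "2013-01-01"))` on a list of classified article dictionaries yields a document that a web service could return to a JavaScript front-end, without that client reading anything from HDFS.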

The team assigned to the project consisted of two developers with a good knowledge of statistical techniques for natural language processing, experience with the underlying Hadoop platform, and web development skills.
