ETL
Many different database management systems (DBMSs) can serve as data sources, and it is usually best when the source is such a system. The reason is simple: one can use Structured Query Language (SQL) to extract data from it (Rabuzin, 2012; Rabuzin, 2014). More often, however, the data sources are flat files, Excel files, or even old, poorly documented applications. Other types of data sources may require specialized knowledge and skills in order to extract and use the data. As Liu and Shi (2014) recognized, business data can cause severe problems:
- Business data often encounters quality issues and needs substantial cleaning efforts;
- Business data is large in overall size but cannot be fully shared due to the concern of data security;
- Business data often needs to be cross-referenced with public databases to reveal more information and knowledge.
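The contrast between SQL-accessible sources and flat files, and the cleaning effort named in the first point above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production ETL step: the `customers` table, its columns, the CSV content and the helper function names are all hypothetical, and an in-memory SQLite database stands in for whatever DBMS is actually available.

```python
import csv
import io
import sqlite3

# Hypothetical relational source: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ana", "Zagreb"), (2, "  Ivan ", "Split"), (3, None, "Rijeka")],
)

# Hypothetical flat-file source: CSV text held in memory instead of on disk.
FLAT_FILE = "id,name,city\n4,Marko ,Osijek\n5,,Pula\n"


def extract_from_dbms(connection):
    """Extract rows with a plain SQL query."""
    cursor = connection.execute("SELECT id, name, city FROM customers ORDER BY id")
    return [{"id": r[0], "name": r[1], "city": r[2]} for r in cursor]


def extract_from_flat_file(text):
    """Extract rows from CSV text; every value arrives as a string."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"id": int(row["id"]), "name": row["name"], "city": row["city"]}
        for row in reader
    ]


def clean(rows):
    """Trim whitespace in names and drop rows whose name is missing or empty."""
    cleaned = []
    for row in rows:
        name = (row["name"] or "").strip()
        if name:
            cleaned.append({**row, "name": name})
    return cleaned


rows = clean(extract_from_dbms(conn) + extract_from_flat_file(FLAT_FILE))
for row in rows:
    print(row)
```

The relational source needs only a query, whereas the flat file requires explicit parsing and type conversion before the same cleaning rule can be applied to both.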
In recent years, many NoSQL systems that contain relevant data have become available. NoSQL systems are usually characterized by large volumes of data that do not share the same structure and that change frequently. Volume, variety and velocity represent big data challenges, which cause difficulties in capture, storage, search, sharing, analysis and visualization (Abdelhafez, 2014). Extracting data from all these sources is therefore problematic, because they are not standardized: each system uses a different query language. For example, graph databases are interesting because they can represent nodes and their relationships, and they are queried with languages such as Cypher Query Language or Gremlin. Column-oriented databases (e.g., HBase) use another specialized language. Key-value databases are no exception, and neither are document-oriented databases. One therefore needs to be familiar with each of these systems in order to extract data. For more information on NoSQL databases, see Redmond and Wilson (2012) or Robinson, Webber and Eifrem (2013).
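The lack of a common query language can be illustrated with one logical request, the names of Alice's friends, written for three kinds of systems. This is a sketch under stated assumptions: the `person` and `friend` tables, the `Person` label and the `FRIEND` relationship type are hypothetical. Only the SQL variant is executed, against an in-memory SQLite database, because the Cypher and Gremlin variants need a running graph database; they are kept as plain strings for comparison.

```python
import sqlite3

# The same logical request in three query languages.
# Table names, labels and relationship types are hypothetical.
QUERIES = {
    "sql": (
        "SELECT f.name FROM person p "
        "JOIN friend fr ON fr.person_id = p.id "
        "JOIN person f ON f.id = fr.friend_id "
        "WHERE p.name = 'Alice' ORDER BY f.name"
    ),
    # Not executed here: requires a graph database that understands Cypher.
    "cypher": (
        "MATCH (p:Person {name: 'Alice'})-[:FRIEND]->(f:Person) "
        "RETURN f.name ORDER BY f.name"
    ),
    # Not executed here: requires a Gremlin-enabled graph database.
    "gremlin": (
        "g.V().has('Person', 'name', 'Alice')"
        ".out('FRIEND').values('name').order()"
    ),
}

# Small relational stand-in so that the SQL variant can actually run.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friend (person_id INTEGER, friend_id INTEGER);
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO friend VALUES (1, 2), (1, 3), (2, 3);
    """
)

friends = [row[0] for row in conn.execute(QUERIES["sql"])]
print(friends)

# Show the three formulations side by side.
for language, query in QUERIES.items():
    print(f"{language}: {query}")
```

In an extraction setting, each source would need its own routine of this kind, which is why familiarity with every system involved is required.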
Furthermore, Facebook and Twitter have attracted increasing interest in recent years, because people write and publish large amounts of text on them, and text analysis (mining) is becoming more and more important. Measuring users’ opinions and attitudes can be very challenging, but at the same time it is known that people do publish things that could be disturbing or even dangerous. There is an increasing number of cases in which people wrote on Facebook that they were going to do something criminal or dangerous, and then actually did it.