Concept-Oriented Query Language

Alexandr Savinov
Copyright: © 2014 | Pages: 11
DOI: 10.4018/978-1-4666-5202-6.ch046

Chapter Preview

Introduction

With the explosion of data volume and the variety of data sources (Cohen et al., 2009), two aspects of the big data problem, we observe significant difficulties in applying conventional data analysis methodologies to real-world problems. Existing technologies for data management and analytics have been pushed to the limits of their ability to solve increasingly complex analysis tasks:

  • Agile Analytics: Perhaps the most widely used methodology for data analysis over the past several decades is based on the multidimensional metaphor, where data is viewed as existing in a multidimensional space. A problem with this approach is that it relies on application-specific scenarios with predefined roles for dimensions, measures, cubes and facts. Changing such scenarios is difficult because they are embedded in both database systems and client software. The goal of agile analytics is to go beyond standard OLAP analysis by facilitating exploratory ad hoc analytics in which the user can freely vary all data processing and visualization parameters.

  • Self-Service Analytics: The conventional approach to analysis is to turn to the IT department, which has several drawbacks: business users frequently do not trust data provided by IT; IT is unable to understand the needs of the user (which leads to frustration and low motivation); IT might not be able to respond to user requests as quickly as desired (and the requirements may well change in the meantime); and existing BI tools are not intended for non-professional users. Self-service analytics has been one of the most significant trends in the BI industry over the last few years, and its tools aim to give non-professional users the ability to solve analytical tasks with little or no help from IT.

  • Near Real Time Analytics: It may take days to generate a BI report in a typical enterprise system, and there is strong demand for reducing the time between data acquisition and making a business decision. One of the core problems is that traditional systems are based on two separate technology stacks: one for transactional workloads and one for analytical workloads. The design principles and techniques of these two subsystems are quite different, and they cannot provide the necessary response time and agility of decision making on large volumes of data (Chaudhuri, Dayal, & Narasayya, 2011; Thiele & Lehner, 2012). Although modern hardware provides a basis for a new generation of in-memory, columnar databases (Boncz, 2012; Larson, 2013) with potentially higher query performance on analytical workloads, it is important to understand that real-time analytics is not a hardware problem: new data models, new query languages, new analysis scenarios and new analysis algorithms are needed.

  • Semantic Analysis: In the conventional approach, it is the task of the human analyst to understand the meaning of the data, while the system only has to execute precise queries. However, a typical enterprise system can contain tens of thousands of data tables, and open systems can involve numerous external data sources. In this situation it is extremely difficult to get meaningful results manually. Existing solutions add semantics via a separate layer which is based on quite different data modeling and analysis techniques. This leads to complex mappings and translations at all levels of the system architecture.

  • Reasoning about Data: The goal of this type of analysis is to answer questions by automatically deriving the answers from the available data. This task has traditionally been the preserve of systems based on formal logic, which have several drawbacks: formal logic is not natural for expressing analysis tasks; it is not very suitable for numeric analysis; it requires a separate system because it is not directly compatible with the available data storage; and queries in formal logic are computationally expensive.

  • Analytical Computations: Analysis is not limited to the operations of grouping and aggregation. Analysts now need to embed arbitrary computations in their analysis tasks. Such tasks are normally expressed as batch jobs where data is exported from one or more databases and then processed by an analysis program. Executing arbitrary analysis tasks close to the data (ideally directly where the data resides) is still a major problem. It is a new incarnation of the old problem of incompatibility between programming and data modeling (impedance mismatch), because data is modeled and manipulated differently in programming languages and in databases.

Key Terms in this Chapter

Logical Navigation: Is an approach to querying where the result is specified via a path in the model structure which leads from source elements to the elements from the result set. In COQL, logical navigation is supported by projection and de-projection operations. A sequence of projection and de-projection operations is referred to as a logical access path.

Inclusion: Is a relation between concepts which generalizes classical inheritance. One difference between inclusion and inheritance is that inclusion describes a hierarchy of data elements where child elements share their parent element. Another difference is that it also models the containment relation, where child elements exist within their parent element.

Arrow Notation: Is an approach to data access where fields are used to navigate through a structure. The main difference from dot notation is that arrow notation is a set-oriented approach and arrow operators are applied to and return sets of elements rather than individual elements. Another difference from dot notation is that arrow notation uses two opposite operators for navigating in both directions. In COQL, arrows denote projection and de-projection operators.

Concept-Oriented Query Language (COQL): Is a syntactic embodiment of the concept-oriented model (COM). It is a join-free query language which uses references and the multidimensional structure of the model for connectivity. At the same time, it is a set-oriented approach because its operators manipulate sets of data elements rather than individual elements. It is also a semantic language because its constructs reflect and rely on basic semantic relationships existing in the model. The main operations of this query language are projection, de-projection and product (cube).

Inference: Is a procedure where constraints imposed on some source sets of elements are automatically propagated to some target set by returning related data elements as a result set. In COQL, the inference operator is implemented as an access path consisting of two parts: a de-projection part and a projection part.
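
The two-part access path behind inference can be illustrated outside COQL. The following is a minimal Python sketch, not COQL syntax and not the author's implementation; the OrderItems data, the field names `product` and `order`, and the function `infer` are all hypothetical and chosen only for illustration. It assumes a lesser set (OrderItems) whose elements each reference one Product and one Order.

```python
# Hypothetical toy data: every order item references one product and one order.
order_items = [
    {"id": 1, "product": "p1", "order": "o1"},
    {"id": 2, "product": "p2", "order": "o1"},
    {"id": 3, "product": "p1", "order": "o2"},
    {"id": 4, "product": "p3", "order": "o3"},
]


def infer(source_ids, items, source_dim, target_dim):
    """Propagate a constraint on one greater set to another greater set."""
    # De-projection part: keep the items that reference a selected source element.
    lesser = [item for item in items if item[source_dim] in source_ids]
    # Projection part: collect what those items reference along target_dim.
    return {item[target_dim] for item in lesser}


# Constraint: only product "p1". Result: the orders related to that product.
orders_with_p1 = infer({"p1"}, order_items, "product", "order")
```

The same function works in the opposite direction (from selected orders to related products), because the path is determined only by the two dimensions passed in.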

De-Projection: Is an operation applied to a set of elements and returning all their lesser elements in the partially ordered set. In terms of references, it returns a set of elements which reference the source elements along the specified dimension.

Projection: Is an operation applied to a set of elements and returning all their greater elements in the partially ordered set. In terms of references, it returns a set of elements which are referenced by the source elements along the specified dimension.
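
As a rough illustration of how projection and de-projection mirror each other, the sketch below models references as dictionary fields in Python. It is not COQL; the `employees` table, the `dept` dimension, and the helper names `project` and `deproject` are hypothetical assumptions made for the example.

```python
# Hypothetical toy data: each employee references one department via "dept".
employees = {
    "alice": {"dept": "sales"},
    "bob": {"dept": "sales"},
    "carol": {"dept": "research"},
}


def project(source_keys, source_table, dimension):
    """Return the set of elements referenced by the sources along a dimension."""
    return {source_table[key][dimension] for key in source_keys}


def deproject(target_keys, source_table, dimension):
    """Return the set of elements that reference the targets along a dimension."""
    return {
        key for key, row in source_table.items()
        if row[dimension] in target_keys
    }


sales_depts = project({"alice", "bob"}, employees, "dept")   # greater elements
sales_staff = deproject({"sales"}, employees, "dept")        # lesser elements
```

Both helpers take and return sets, which reflects the set-oriented character of the arrow operators: two employees in the same department project to a single department, and a department that nobody references de-projects to the empty set.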

Concept: Is a syntactic construct which is used to describe a data type and generalizes conventional classes. A concept is defined as a pair of one identity class and one entity class. Thus concepts can model both values (if the entity class is empty) and objects (if the identity class is empty).
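
One possible way to picture the identity/entity pair is sketched below in Python. This is only an interpretation under stated assumptions, not the chapter's definition: the class names `AccountIdentity` and `AccountEntity` and the dictionary `accounts` are hypothetical, and the sketch assumes that the identity part behaves like an immutable value while the entity part holds mutable state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountIdentity:
    """Identity part (assumed here to behave like a value): immutable, hashable."""
    number: str


@dataclass
class AccountEntity:
    """Entity part (assumed here to behave like an object): mutable state."""
    balance: float = 0.0


# A concept element is pictured as an (identity, entity) pair; a dict keyed by
# identity is one simple way to keep the two parts together.
accounts = {AccountIdentity("A-1"): AccountEntity(balance=10.0)}

# An equal identity value reaches the same entity and can update its state.
accounts[AccountIdentity("A-1")].balance += 5.0
```

In this picture, a pure value would be an identity with no entity behind it, and a conventional object would be an entity with no user-defined identity.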
