1. Introduction
The majority of contemporary scientific advances have relied on the ability to identify specific properties of data and to derive both analytical and predictive capabilities from them. Furthermore, the increasing availability of large data sets has given rise to new challenges and opportunities, which are at the very core of Big Data research.
In particular, data come in a variety of types, forms, and sizes, which makes the way we extract and assess information a crucial step in gathering intelligence. Consequently, large data sets need to be suitably manipulated and assessed to ensure that they can be analysed effectively.
In this paper, we introduce a novel method to topologically reduce networks built from the elements of data sets and their mutual relationships. The method provides a tool for superimposing networks on real-world data to describe their main properties, whilst remaining computationally efficient.
Network theory, which has been developed since the birth of discrete and combinatorial mathematics (Bollobas, 1998), broadly aims to describe and represent relations, referred to as edges, between objects, referred to as nodes. It has a wide range of applications across multidisciplinary research fields, including applied mathematics, psychology, biomedical research and computer science, to name but a few (Dingli, et al., 2012).
Formally, a network is defined as a collection of nodes, called the node set, which are connected as specified by the edge set (Albert, et al., 2002).
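As an illustration of this definition, the following minimal sketch stores a network given by a node set and an edge set as an adjacency map. It assumes an undirected, unweighted network; the function name `build_network` and the example labels are hypothetical and are not taken from this paper.

```python
def build_network(nodes, edges):
    """Return an undirected adjacency map for a network.

    nodes: iterable of hashable labels (the node set)
    edges: iterable of (u, v) pairs (the edge set)
    """
    # Every node starts with no neighbours, so isolated nodes are kept.
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        if u not in adjacency or v not in adjacency:
            raise ValueError(f"edge ({u!r}, {v!r}) uses a node outside the node set")
        # Undirected network: record the relation in both directions.
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


# Hypothetical example: four nodes, two edges, one isolated node ("d").
network = build_network(["a", "b", "c", "d"], [("a", "b"), ("b", "c")])
print(sorted(network["b"]))  # ['a', 'c']
print(network["d"])          # set()
```

An adjacency map is used here because it keeps the node set explicit (isolated nodes survive) while making neighbour lookups cheap, which suits the sparse networks typically extracted from data.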
Although networks are based on relatively simple mathematical concepts, their general properties exhibit powerful features that can be applied to model complex scenarios (Trovati, et al., 2014).
Data often consist of elements, such as numeric values, physical entities or general semantic concepts, that are linked by relationships. Despite the intrinsic vagueness of this description, such structures can be effectively represented by networks, although populating the node and edge sets is typically a complex task. In fact, extracting the relevant information can be challenging, especially when dealing with unstructured data sets, and it becomes even more difficult to carry out effectively when size plays a crucial role, as in Big Data. Accordingly, several methods exist to generate networks from data, and the resulting networks can in turn be investigated according to their overall features.
One of the most important steps in this investigation is to determine the topological structure of a network, which enables a complete mathematical and statistical investigation of the data set(s) associated with it.
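Determining the topological structure of a network usually starts from a few standard descriptors. The sketch below computes, for an undirected network stored as an adjacency map, the degree of each node, the connected components (found by breadth-first search) and the edge density 2|E| / (|V|(|V| - 1)). These are generic, well-known descriptors chosen for illustration only, not the topological reduction method introduced in this paper; the function names are hypothetical, and the code assumes the network has no self-loops.

```python
from collections import deque


def degree_sequence(adjacency):
    """Map each node to its number of neighbours."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def connected_components(adjacency):
    """Return the connected components as lists of nodes (breadth-first search)."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def density(adjacency):
    """Fraction of possible undirected edges that are present (no self-loops assumed)."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    # Each undirected edge is stored twice in the adjacency map.
    m = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    return 2 * m / (n * (n - 1))


# Hypothetical example: a path a-b-c plus an isolated node d.
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
degrees = degree_sequence(adjacency)
components = connected_components(adjacency)
print(degrees)            # {'a': 1, 'b': 2, 'c': 1, 'd': 0}
print(len(components))    # 2
print(density(adjacency))
```

Breadth-first search visits each node and edge once, so the component count scales linearly with the size of the network, which matters when the network is derived from a large data set.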
Network analysis techniques have been extensively investigated, and the use of network data has previously been proposed in a wide range of complex real-world settings (Akoumianakis, et al., 2012; Zelenkauskaite, et al., 2012). In general, however, the majority of network analyses overlook the structure of the network itself, which is the actual focus of this work.
Networks are relatively simple to define from suitably processed data sets. In fact, data and text mining techniques make it possible to isolate semantic objects, including both physical and conceptual entities, together with their mutual relationships as determined by the hierarchical properties of the corresponding data sets.
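One simple way to turn text into a network is sentence-level co-occurrence: two entities are linked when they appear in the same sentence, and the edge weight counts how many sentences they share. The sketch below assumes a fixed, hand-supplied list of single-word entities and a naive regular-expression tokeniser. It is a common but crude text-mining heuristic offered for illustration, not the extraction technique used in this paper, and the function name and example sentences are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_network(sentences, entities):
    """Build a weighted co-occurrence network from sentences.

    Returns a Counter mapping each (u, v) pair, with u < v, to the number
    of sentences in which both single-word entities occur.
    """
    vocabulary = {entity.lower() for entity in entities}
    weights = Counter()
    for sentence in sentences:
        # Naive tokenisation: lower-case alphabetic runs only.
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        present = sorted(tokens & vocabulary)
        # Every pair of entities in the same sentence gets one co-occurrence.
        for u, v in combinations(present, 2):
            weights[(u, v)] += 1
    return weights


# Hypothetical example data.
sentences = [
    "Insulin regulates glucose.",
    "Glucose levels affect insulin and glucagon.",
    "Glucagon raises glucose.",
]
weights = cooccurrence_network(sentences, ["insulin", "glucose", "glucagon"])
for (u, v), w in sorted(weights.items()):
    print(u, v, w)
```

The resulting pairs form the edge set and the matched entities form the node set, so the output can be passed to any of the network representations discussed above. Real pipelines would typically replace the fixed vocabulary with named-entity recognition and handle multi-word entities.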