Introduction
The rapid growth of both context-aware environments and user-generated content from social network communities has prompted the investigation of new ways to represent domain knowledge and its relationships. Semantic Web tools provide the instruments to significantly enrich knowledge representation through a wide range of semantics-based models. These models, often called ontologies, support users in understanding the meaning of a resource and the related domain. Ontologies are comprehensive models for describing domain-specific knowledge. Knowledge representation entails (i) shared agreement on meaning, (ii) term disambiguation, and (iii) domain description through concepts and relationships. The combination of advanced data mining algorithms and semantics-based knowledge representation may enhance the knowledge discovery process in a broad range of application contexts, such as social behavior analysis, knowledge discovery from user-generated content, and Web service personalization.
Useful ontologies for a given application domain can be either provided by domain experts or (semi-)automatically inferred from the data of interest. Although the Semantic Web already provides a full technological stack to access semantics-based resources, most existing approaches, such as the creation of ontologies in the Web Ontology Language (OWL) (World Wide Web Consortium, 2009), still rely heavily on human intervention. Hence, the machine-driven construction of meaningful ontologies is becoming an increasingly appealing target in several research fields, including information retrieval, data mining, and data summarization. For example, the exponential growth of social media such as blogs and social network services has significantly increased the need for useful ontologies to efficiently support the analysis of large data volumes. Thus, novel and more efficient approaches to automatically construct useful ontologies tailored to the analyzed data are desirable.
This paper presents a novel and effective semi-automatic approach to construct ontologies tailored to structured data. Structured datasets are data collections whose content is organized by means of a schema that describes the relevant data features of interest. For instance, a relational dataset schema is characterized by a set of attributes that describe the main data features. Similarly, an XML dataset is characterized by a set of tags (elements). For the sake of simplicity, in the rest of this paper we denote the data features belonging to the dataset schema as attributes for both relational and XML data. Ontology construction commonly entails a two-level analysis: (i) an intensional data analysis, to represent shared data concepts and their relationships, and (ii) an extensional data analysis, to represent instances of concepts (i.e., the individuals) and their associations. Different approaches can be used to model and represent ontologies. For instance, Description Logic (DL) (Baader, Calvanese, McGuinness, Nardi, & Patel-Schneider, 2003) can be profitably used to represent ontologies. Ontology representation based on Description Logic relies on two main components: (i) the TBox component, which includes intensional statements about general concepts, and (ii) the ABox component, which includes both extensional statements about the individuals of those concepts and membership statements, which bind instances to their concepts.
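To make the distinction between intensional and extensional statements concrete, the following minimal sketch splits a small structured dataset into a TBox-like and an ABox-like set of statements. It is an illustration under stated assumptions, not the approach proposed in this paper: the names `KnowledgeBase` and `build_kb`, the triple encoding, the `has<Attribute>` role naming, and the sample record are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Toy knowledge base split into two components (hypothetical encoding)."""
    tbox: set = field(default_factory=set)  # intensional statements about concepts
    abox: set = field(default_factory=set)  # extensional and membership statements


def build_kb(concept, attributes, rows):
    """Derive TBox/ABox-style triples from a schema and its records.

    Assumption: every schema attribute is treated as a concept, and the
    record concept is linked to it through a role named 'has<Attribute>'.
    """
    kb = KnowledgeBase()

    # Intensional level: relationships among concepts, taken from the schema.
    for attr in attributes:
        kb.tbox.add((concept, f"has{attr}", attr))

    # Extensional level: individuals, their concept membership, and their links.
    for i, row in enumerate(rows):
        individual = f"{concept.lower()}_{i}"
        kb.abox.add((individual, "instanceOf", concept))  # membership statement
        for attr in attributes:
            value = str(row[attr])
            kb.abox.add((value, "instanceOf", attr))           # membership statement
            kb.abox.add((individual, f"has{attr}", value))     # extensional statement
    return kb


if __name__ == "__main__":
    # Hypothetical relational record with two attributes.
    kb = build_kb("User", ["City", "Age"], [{"City": "Turin", "Age": 30}])
    for statement in sorted(kb.tbox):
        print("TBox:", statement)
    for statement in sorted(kb.abox):
        print("ABox:", statement)
```

In this sketch the schema alone determines the TBox, whereas the ABox grows with the number of records, which mirrors the two-level analysis described above. A real ontology would typically be serialized in a language such as OWL rather than as Python tuples.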