Semantic Web Data Partitioning

Trupti Padiya, Minal Bhise, Sanjay Chaudhary
DOI: 10.4018/978-1-4666-2494-8.ch008

Abstract

A Semantic Web database is an RDF database. Because the Semantic Web is increasingly used in real-life applications, RDF databases are growing rapidly, and with this tremendous increase in RDF data, performance and scalability have become the main concerns. This chapter discusses how to improve and scale up query performance for the growing Semantic Web. It reviews current Semantic Web data storage techniques, which have been found to scale poorly and to deliver poor query performance. It then discusses two partitioning techniques, vertical and horizontal partitioning, for improving query performance. Various compression techniques can be used alongside these partitioning techniques to improve query performance further. Relational data offers faster query execution than RDF data, so to demonstrate these ideas the chapter converts semantic data to relational data and then applies the query performance improvement techniques. Scaling up Semantic Web data is also discussed.
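Converting RDF data to relational form with vertical partitioning can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation. The triples and the `ex:` names are hypothetical, and the helper functions are made up for this sketch. The single three-column triples table is split into one two-column (subject, object) table per property. A query that needs only one property then scans only that table, and a query that combines properties becomes a join on subject.

```python
from collections import defaultdict

# Hypothetical sample data: <subject, property, object> triples.
TRIPLES = [
    ("ex:v1", "ex:name", "Acme"),
    ("ex:v1", "ex:email", "sales@acme.com"),
    ("ex:v2", "ex:name", "Globex"),
    ("ex:v2", "ex:email", "info@globex.org"),
    ("ex:v2", "ex:city", "Springfield"),
]


def vertically_partition(triples):
    """Split one triples table into a (subject, object) table per property."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def objects_of(tables, prop):
    """All values of one property: only that property's table is scanned."""
    return [o for _, o in tables.get(prop, [])]


def join_on_subject(tables, p1, p2):
    """Combine two properties of the same subject with a hash join on subject."""
    right = defaultdict(list)
    for s, o in tables.get(p2, []):
        right[s].append(o)
    return [(s, o1, o2) for s, o1 in tables.get(p1, []) for o2 in right.get(s, [])]


tables = vertically_partition(TRIPLES)
print(sorted(tables))                                 # ['ex:city', 'ex:email', 'ex:name']
print(objects_of(tables, "ex:email"))                 # ['sales@acme.com', 'info@globex.org']
print(join_on_subject(tables, "ex:city", "ex:name"))  # [('ex:v2', 'Springfield', 'Globex')]
```

Properties that few subjects have, such as `ex:city` here, produce small tables, so queries on them touch little data. The cost is that reassembling a full entity requires one join per property.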
Chapter Preview

Issues In Accessing Semantic Web Data

Semantic Web data is expressed in RDF as <Subject, Property, Object> statements, known as RDF triples, and can be stored in database tables. To retrieve data efficiently, a table can be stored in one of two ways: row by row or column by column. The former approach keeps all information about an entity together. In a vendor table, for example, it stores all information about the first vendor, then all information about the second vendor, and so on. The latter approach keeps all values of an attribute together: all vendor names are stored consecutively, then all vendor addresses, and so on.

Since neither design is inherently better, the choice depends on the expected workload. Some queries operate at the granularity of an entity, e.g., find a vendor, add a vendor, or delete a vendor. For these, row-by-row storage is preferable because all of the required information is stored together. Other queries tend to read only a few attributes from many records, e.g., a query that finds the most common e-mail address domain. For these, column-by-column storage is preferable because attributes that the query does not need are never accessed. This is partitioning: dividing a logical database or its constituent elements into distinct, independent parts for manageability, performance, or availability. These properties are the basic requirements for scaling up Semantic Web data.
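The trade-off between the two layouts can be sketched in Python. This is a toy, in-memory illustration only. The vendor records and field names are hypothetical, and Python lists stand in for on-disk pages. The entity lookup reads one contiguous row. The e-mail domain aggregate reads only the `email` column and never touches names or addresses.

```python
from collections import Counter

# Hypothetical vendor records; field names are illustrative only.
VENDORS = [
    {"name": "Acme", "address": "1 Main St", "email": "sales@acme.com"},
    {"name": "Globex", "address": "2 Oak Ave", "email": "info@globex.org"},
    {"name": "Initech", "address": "3 Elm Rd", "email": "help@acme.com"},
]
FIELDS = ("name", "address", "email")


def to_row_store(records, fields):
    """Row by row: all values of one entity are stored together."""
    return [tuple(r[f] for f in fields) for r in records]


def to_column_store(records, fields):
    """Column by column: all values of one attribute are stored together."""
    return {f: [r[f] for r in records] for f in fields}


def find_vendor(rows, name):
    """Entity-level lookup; 'name' is the first field in FIELDS."""
    for row in rows:
        if row[0] == name:
            return row
    return None


def most_common_domain(columns):
    """Attribute-level aggregate that reads only the email column."""
    domains = Counter(e.split("@", 1)[1] for e in columns["email"])
    return domains.most_common(1)[0][0]


row_store = to_row_store(VENDORS, FIELDS)
column_store = to_column_store(VENDORS, FIELDS)
print(find_vendor(row_store, "Globex"))   # ('Globex', '2 Oak Ave', 'info@globex.org')
print(most_common_domain(column_store))   # acme.com
```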

Scalability

As the Semantic Web evolves, scalability becomes increasingly important. Triples are used to describe resources on the Web, and each resource is typically described by many triples. Together these triples form a complex graph of relationships that includes references to other resources, and the relationships in Semantic Web data are themselves typed. Triple stores must therefore be able to handle the large numbers of triples found in real-life data. Fetching and storing triples also affects scalability. It brings in issues of timeliness and caching, along with the general problems of distributed systems, whose components can fail or be delayed.
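One way to spread a large number of triples over several machines is horizontal partitioning, which the abstract mentions alongside vertical partitioning. The sketch below assumes a simple scheme that this preview does not specify. Each triple is assigned to a shard by a stable hash of its subject, so all triples describing one resource land on the same shard, and a lookup for that resource contacts a single node. The shard count, the synthetic data, and the helper names are hypothetical.

```python
import zlib
from collections import defaultdict


def shard_of(subject, num_shards):
    """Stable subject -> shard mapping (the built-in hash() of str is salted per process)."""
    return zlib.crc32(subject.encode("utf-8")) % num_shards


def horizontally_partition(triples, num_shards):
    """Assign every triple to the shard chosen by its subject."""
    shards = defaultdict(list)
    for triple in triples:
        shards[shard_of(triple[0], num_shards)].append(triple)
    return shards


def describe(shards, subject, num_shards):
    """Fetch all triples about one subject by reading a single shard."""
    shard = shards.get(shard_of(subject, num_shards), [])
    return [t for t in shard if t[0] == subject]


NUM_SHARDS = 4
# Synthetic data: 1,000 resources, each described by three triples.
triples = [(f"ex:r{i}", p, f"v{i}") for i in range(1000) for p in ("ex:a", "ex:b", "ex:c")]
shards = horizontally_partition(triples, NUM_SHARDS)

print({k: len(v) for k, v in sorted(shards.items())})  # triples per shard
print(describe(shards, "ex:r42", NUM_SHARDS))
```

Subject hashing keeps resource descriptions local but does nothing for queries that follow links between resources. Those queries still have to cross shards and therefore face the timeliness and failure issues described above.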

Performance

A triple store must support manipulation of data, since triples can be added, modified, and removed. This requires support for identifying triples and querying the graph. It also requires support for administering the graph (i.e., creating and deleting it) and for other operations that transfer data to and from the network. Indexes should be created for efficient searching and for specialized searches, such as text-based search or searches based on properties, sub-properties, logical inference, etc. Data from multiple sources will often be merged into a single graph, with the relationships between the sources connecting them. The triple store must support such merging even when the graphs are large, and it should allow the merged graphs to be separated again later. Applications using Semantic Web data perform such manipulations many times even in simple systems. These operations therefore need to be lightweight, fast, conceptually easy to understand, and easy to use via APIs from software or through Web methods.
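The operations listed above can be sketched as a toy in-memory store in Python. This is a hypothetical illustration, not any particular triple store's API, and all class and method names are made up. The store supports adding and removing triples (a modification is a remove followed by an add). It keeps one exact-match index per position (subject, property, object) and answers wildcard pattern queries. Every triple is tagged with a source graph name, so merged sources can be queried together and separated again. Text search and inference are out of scope here.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory store of (s, p, o) triples tagged with a source graph name."""

    def __init__(self):
        self.quads = set()            # {(s, p, o, graph)}
        self.by_s = defaultdict(set)  # subject  -> quads
        self.by_p = defaultdict(set)  # property -> quads
        self.by_o = defaultdict(set)  # object   -> quads

    def add(self, s, p, o, graph="default"):
        quad = (s, p, o, graph)
        if quad not in self.quads:
            self.quads.add(quad)
            self.by_s[s].add(quad)
            self.by_p[p].add(quad)
            self.by_o[o].add(quad)

    def remove(self, s, p, o, graph="default"):
        quad = (s, p, o, graph)
        if quad in self.quads:
            self.quads.discard(quad)
            self.by_s[s].discard(quad)
            self.by_p[p].discard(quad)
            self.by_o[o].discard(quad)

    def match(self, s=None, p=None, o=None, graphs=None):
        """Pattern query; None is a wildcard. Matches from all listed graphs are merged."""
        buckets = [index.get(key, set())
                   for index, key in ((self.by_s, s), (self.by_p, p), (self.by_o, o))
                   if key is not None]
        # Scan the smallest index bucket, or every quad when nothing is bound.
        pool = min(buckets, key=len) if buckets else self.quads
        return {
            (qs, qp, qo)
            for qs, qp, qo, g in pool
            if (s is None or qs == s) and (p is None or qp == p)
            and (o is None or qo == o) and (graphs is None or g in graphs)
        }

    def drop_graph(self, graph):
        """Separate a previously merged source again by deleting its triples."""
        for quad in [q for q in self.quads if q[3] == graph]:
            self.remove(*quad)


store = TripleStore()
store.add("ex:v1", "ex:name", "Acme", graph="sourceA")
store.add("ex:v1", "ex:email", "sales@acme.com", graph="sourceB")
print(store.match(s="ex:v1"))                      # merged view over both sources
print(store.match(s="ex:v1", graphs={"sourceA"}))  # {('ex:v1', 'ex:name', 'Acme')}
store.drop_graph("sourceB")
print(store.match(p="ex:email"))                   # set()
```

Keying the indexes on (triple, graph) pairs rather than bare triples means that dropping one source does not delete a triple that another source also asserted.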
