Stochastically Balancing Trees for File and Database Systems

Stochastically Balancing Trees for File and Database Systems

Aziz Barbar (Department of Computer Science, American University of Science & Technology, Beirut, Lebanon) and Anis Ismail (Department of Communication and Computer Network Engineering, University Institute of Technology, Lebanese University, Saida, Lebanon)
Copyright: © 2013 |Pages: 13
DOI: 10.4018/jgc.2013010104
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

With the constant improvement in data storage technologies, a new generation of indexing mechanisms is to be created to exploit the improvements in disk access speeds that were previously impractical. The self-balancing tree B-Tree, has long been the indexing structure of choice for reducing the amount of disk access at the expense of size of data block to be read or written. A new technique based on a dynamically growing multilevel list structure, which is stochastically balanced rather than self balanced, is discussed and compared to the B-Tree. An analogy between the technique and the structures is established to better compare the computational complexities.
Article Preview

Introduction

As far as regularly available storage devices are concerned (Sugaya, 2006; Kawamoto, 2008) the read and write time can be described as:Tread=TA +S*TRTwrite=TA +S*TR(1) Where, TA is the access time for the disk, S represents how much data needs to be read/written and TR/TW represents how much time is needed to actually read/write data.

TA is an overhead which is present in hard drives due to the mechanical nature of the access that is vastly slower than the electronic operation.

Historically indexing techniques were designed to reduce the total amount of disk operations to minimize the effect of TA on the overall performance of the technique. However, these techniques might be inefficient in the case of newer storage devices such as flash memory and other forms of random access memory where TA is due to an electronic process and therefore becomes negligible with respect to TR/TW.

In case of random access memory, the dominant factor in the read operation becomes S and TR/TW under which circumstances we should optimize the indexing technique to reduce the product. For a fixed TR/TW, the only variable that can be reduced is S, that is the total amount of data that is read/written at once.

There are two main data structures that are of interest in this document. They are the Self Balancing B-Tree, and the List structure.

Self Balancing B-trees (Bayer, 1971; Bayer et al., 2002) are most commonly found in databases and file systems. The idea behind B-trees is that internal nodes can have a variable number of child nodes within some predefined range called the order of the B-Tree. As data are inserted or removed from the data structure, the number of child nodes varies within a node and so internal nodes are coalesced or split so as to maintain the designed range.

For a 3-4 B-tree (shown in Figure 1), each internal node may have only 3 or 4 child nodes. A node is considered to be in an illegal state if it has an invalid number of child nodes; it must be split. Accessing a key in the tree on average takes logn(N)operations where N is the amount of keys in the tree and n is the order of the tree.

Figure 1.

3-4 B-tree

Many types of variants to the B-Tree have been developed with very subtle differences to the B-Tree. Such variants include the B*-Tree (Berliner, 1978) and the B+-Tree (Taniar et al., 2003).

B-Trees are not the only types of indexing structures; other indexing structures, which are specialized in certain types of indexing, have been developed. These include structures dedicated to Video Indexing (Chen et al., 2002), Image Indexing (Ljosa et al., 2006), String Indexing (Kahveci et al., 2001), Regular Expression Indexing (Chan et al., 2003), and Indexing techniques for Data Warehouses (Ester et al., 2000).

One of the most active areas of research is the use of indexing in applications like geographic information systems where we refer to it as Spatial Indexing (Guttman, 1984). There are many approaches to such Spatial Indexing, especially high dimensional spatial indexing (Sakuraim et al., 2000; Chakrabarti et al., 1999; Berchtold et al., 1996; Katayama et al., 1997).

General purpose indexing techniques include the graph index approach (Yan et al., 2004) and hashing (Ramabhadran et al., 2004). Charguéraud (Charguéraud, 2010) verified many functional tree algorithms in Okasaki’s book (Okasaki, 2010) with a new method of transforming a program into a proposition transformer. However, neither Charguéraud’s verification nor the book contains WBT algorithms. Fundamental modules Data.Set and Data.Map in Haskell (Marlow, 2010) and the wttree.scm library in MIT/GNU Scheme and slib are based on a variant of the WBT algorithm.

Complete Article List

Search this Journal:
Reset
Open Access Articles: Forthcoming
Volume 8: 2 Issues (2017): Forthcoming, Available for Pre-Order
Volume 7: 1 Issue (2016)
Volume 6: 2 Issues (2015)
Volume 5: 2 Issues (2014)
Volume 4: 2 Issues (2013)
Volume 3: 2 Issues (2012)
Volume 2: 2 Issues (2011)
Volume 1: 2 Issues (2010)
View Complete Journal Contents Listing