Article Preview
TopIntroduction
Trees are among the most important nonlinear structure arising in computer science (Knuth,2011). They occur in several diverse areas such as genealogical studies, computer vision and pattern recognition (Shih,1991), programming compilation and natural language processing.
The diversity of these application fields and the nature of the already studied trees leads to divide this family into three major classes: ordered trees, when there is a predefined order within each set of sibling nodes in the tree (Zhang,1995), unordered trees, when there is no significant order within each set of sibling nodes (Zhang,1996), and semi-ordered trees if some elements of a set of sibling are unordered while some others can be totally ordered (Aïda,2009).
This diversity induces a wide variety of algorithms resolving queries on tree structures. However, the increase in the amount of tree data especially in biology and also the size of each data requires the implementation of appropriate tools (O’Donoghue,2010). This requires not only effective processing, but also effective representation taking into account the complexity of tree data. A way to address the data storage problem is to use redundancies within the object and store them once. This principle can be applied for the compression of tree data since they often present internal redundancies of substructures in different scales of the tree. Hence, the problem of tree compression aims to condense its similar parts to build a reduced form that keeps the major original tree properties and achieves a better algorithmic conditions.
In the literature, data compression is an operation which consists in passing from an initial representation of an object to a more compact representation, that is to say requiring less storage space. A compression algorithm takes as input a data and generates a representation which requires fewer bits, and there is a reconstruction algorithm which generates from the compressed representation a reconstruction (Figure 1). Based on the reconstruction requirements, data compression techniques can be divided into two classes: lossless compression, in which is identical to , and lossy compression which allows to be different from (Sayood,2017). In this paper, we are concerned by the lossless compression.
Figure 1. Compression and reconstruction scheme
Several attempts have been proposed to solve the problem of ordered tree compression. In (Salah,2019) authors assign signatures to each node of the input tree to identify similar subtrees. For that, subtrees rooted in nodes with similar signatures are replaced by a single occurrence, then they used hierarchical relations between these occurrences to form a graph in which they augment each of its edges by the order of the child occurrence among the other children of the parent. This additional information makes possible the exact reconstitution of the initial tree from its compression. Bill and al in (Bill,2015) present another tree compression called top tree compression, that exploits repetitive patterns in a tree and that support navigational queries in a time. Ordered tree compression showed its importance in the representation of structured files (Lohrey,2013), this has largely contributed to the processing of queries on XML file compression, where Sakr presented a survey in this context (see (Sakr,2009) for review).