Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of large-scale electronic publishing, XML also plays an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere (Quin, 2016). Consequently, many studies have focused on distributing XML documents so that XML data can be stored in distributed environments.
Distributed XML documents, which may be called distributed trees, are documents that have been partitioned, sent to various nodes, and linked together to form a complete XML document (Abiteboul, Gottlob, & Manna, 2008). The linking can be accomplished through function calls, embedded in a centralized node of the distributed system, that reach the separate documents over a network (Abiteboul et al., 2008). Some previous research has distributed XML documents based on data size, while other work has relied on data structure. The most efficient distributed systems consider both data size and structure. The system described in (Seyed-Abbassi & Gordon, 2015; Aljawarneh, 2011) distributes XML documents in a cloud services system using a kernel document and many distributed cloud nodes. Figure 1 describes the system, which distributes the XML document through a least-load algorithm (Seyed-Abbassi & Gordon, 2015) based on the number of cloud servers available and the size of the original XML document, while preserving the tree structure of the XML document. The algorithm splits each of the subtrees into parts based on which distribution node has the most space still available (Seyed-Abbassi & Gordon, 2015) and, once the load is determined, stores each subtree in the corresponding cloud. The kernel document indicates which cloud service holds which partitioned document. For backup, the system stores the same partitions with the same loads in a second cloud service. This redundancy allows lost data to be recovered if a node becomes unavailable because of electrical issues, compromise, hacking, or any other problem (Seyed-Abbassi & Gordon, 2015; Aljawarneh et al., 2015).
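The least-load idea can be illustrated with a short sketch. This is not the implementation of Seyed-Abbassi and Gordon (2015); it is a minimal greedy approximation written under stated assumptions: each top-level child of the root is treated as one indivisible subtree, load is measured as the byte length of the serialized subtree, and the names `distribute_subtrees`, `node_capacities`, and the `part` element of the kernel document are hypothetical. The backup step is modeled simply as a copy of the same assignment.

```python
import heapq
import xml.etree.ElementTree as ET


def distribute_subtrees(xml_text, node_capacities):
    """Greedy least-load sketch (hypothetical helper, not the published algorithm).

    Each top-level subtree goes to the node with the most free space.
    Returns a kernel element recording which node holds which subtree,
    and a dict mapping node name to the serialized subtrees it stores.
    """
    root = ET.fromstring(xml_text)

    # heapq is a min-heap, so free space is negated to pop the largest first.
    heap = [(-capacity, name) for name, capacity in node_capacities.items()]
    heapq.heapify(heap)

    # The kernel keeps the root tag and one pointer entry per subtree.
    kernel = ET.Element(root.tag, dict(root.attrib))
    stored = {name: [] for name in node_capacities}

    for index, subtree in enumerate(root):
        payload = ET.tostring(subtree, encoding="unicode")
        size = len(payload.encode("utf-8"))

        neg_free, name = heapq.heappop(heap)
        if size > -neg_free:
            raise ValueError(f"no node has room for subtree {index}")

        stored[name].append(payload)
        heapq.heappush(heap, (neg_free + size, name))

        # Assumed kernel layout: a <part> element naming the holding node.
        ET.SubElement(
            kernel, "part",
            {"index": str(index), "tag": subtree.tag, "node": name},
        )

    return kernel, stored


def mirror(stored):
    """Copy the same partitions with the same loads, as for a backup service."""
    return {name: list(parts) for name, parts in stored.items()}
```

Calling `distribute_subtrees` on a document with three children and two nodes returns a kernel whose `part` entries name the node chosen for each child. Processing the children in document order keeps the tree structure recoverable from the kernel alone. A fuller version would also split subtrees that exceed any single node's free space, which the source describes but this sketch does not attempt.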