Article Preview
Top1. Introduction
The Optimal Binary Search Tree (OBST) problem is of great interest in data processing, as well as in distributed and centralized environments. The input is a sequence of n weighted keys (k1, k2, …, kn), which are to be placed into a binary search tree in such a manner that the expected search cost of the resulting binary search tree is minimized (Cormen et al., 2001). The classical sequential algorithm for this problem is based on the dynamic programming technique (Aho et al., 1974). It requires time steps and memory space. By using the monotonicity property of optimal binary search trees (defined later), Knuth (1972) derived an time steps algorithm in the same space. Yao (1982) obtained the same result thanks to the quadrangle inequalities. Eppstein et al. (1988) put forth an algorithm using the restrictive assumption of convexity, whereas the parallelization of the classical version was extensively utilized by the community of parallel processing researchers for different parallel computing models: Bradford (1994), Guibas, Kung and Thompson (1979), Tang and Gupta (1995), Karypis et al. (1994, 1996), Rytter (1988), and Kechid and Myoupo (2008). Little work has been produced for the parallelization of the Knuth approach. An efficient parallelization of this version in an abstract model, such as PRAM (Parallel Random Access Machine), is difficult. In PRAM, it is unknown how to use the monotonicity property to reduce the number of processors in poly-logarithmic-time computations (Karpinski et al., 1994, 1996).
In this work, we tackle the problem of parallelizing the OBST algorithm on the Bridging Coarse Grain BSP/CGM (Bulk Synchronous Parallel/Coarse Grained Multicomputer) model (Dehne et al., 1994, 1996; Cáceres et al., 1997; Cheatham et al., 1995; Valiant, 1990). CGM seems best suited for the design of algorithms that are not too dependent on an individual architecture. A BSP/CGM machine is a set of p processors, each having its own local memory of size m (with O(m)>>O(1)) and connected to a router able to deliver messages in point-to-point fashion. A BSP/CGM algorithm consists of alternating between local computations and global communication rounds. Each communication round consists of routing a single h-relation with h=O(m). A CGM computation/communication round corresponds to a BSP super-step with communication cost (Bradford, 1994). is the cost of the communication of a word in the BSP model. To produce an efficient BSP/CGM algorithm, designers tend to maximize speedup and minimize the number of communication rounds (ideally independent from the problem size, and constant in the optimum case).