Introduction
Scalability, alongside high performance and fault tolerance, is an important feature of modern applications (Tanenbaum & van Steen, 2006; Alvin et al., 2010; Lavinia, Dobre, Pop, & Cristea, 2011). Scalable software applications may be distributed over the nodes of a multicomputer. Such a loosely coupled architecture is very cost-effective and easy to assemble. The only extra effort required is the development of additional middleware and/or software tools for managing the resources of the cooperating multicomputer (Sbalzarini, 2010). Such tools and frameworks may assist developers at different levels of programming, requiring more or less attention to be devoted to the problem of scalability.
At a very low level, simple libraries such as MPI (Message Passing Interface) and PVM (Parallel Virtual Machine) may be applied (Gropp & Lusk, 1998). They support the message-passing programming model for low-level programming. Both libraries are flexible and universal because they assume no shared memory and make no assumptions about the network topology; however, they must be used by a skilled programmer. In practice, the final scalability of an application using PVM or MPI strongly depends on the programmer's skills. On the other hand, it is the most flexible approach, and scalability may concern many different aspects and resources (memory, CPU, disk, etc.).
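The message-passing model underlying MPI and PVM can be illustrated with a short sketch. The code below is not the MPI or PVM API; it is a minimal Python simulation in which each "node" has only private memory and a mailbox, so nodes cooperate exclusively by exchanging messages (all names, such as send and recv, are hypothetical stand-ins for the corresponding library calls):

```python
import threading
import queue

# One mailbox per "node"; private data is never shared directly,
# mirroring the no-shared-memory assumption of MPI/PVM.
mailboxes = {rank: queue.Queue() for rank in range(2)}

def send(dest, data):
    mailboxes[dest].put(data)

def recv(rank):
    return mailboxes[rank].get()

def worker(rank):
    # Node 1 receives a task, computes on its private copy of the data,
    # and sends the partial result back to node 0.
    task = recv(rank)
    send(0, sum(task))

t = threading.Thread(target=worker, args=(1,))
t.start()
send(1, [1, 2, 3, 4])   # node 0 distributes work
total = recv(0)         # node 0 collects the result
t.join()
```

In a real MPI program the same pattern would use ranks within a communicator and blocking point-to-point operations, and the scalability of the result would, as noted above, depend entirely on how the programmer partitions data and communication.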
Google MapReduce (Dean & Ghemawat, 2004) is a framework and a programming model. A problem to be solved is split into sub-problems (in the "Map" stage) and processed separately by multicomputer nodes. Then the sub-results are combined (in the "Reduce" stage) to obtain the final solution of the problem. Pattern matching and sorting are examples of problems which can be solved in this way. MapReduce is highly scalable and may efficiently utilize a large number of nodes. In summary, it requires less programmer effort to build a scalable application, but it is well-suited to specific problems only.
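The two stages can be made concrete with the classic word-count example. The sketch below is a single-process Python illustration of the programming model only (the function names are illustrative, not Google's API); in the real framework each call to the map function, and each reduction of a key group, could run on a different node:

```python
from collections import defaultdict

def map_stage(document):
    # "Map": split one input document into intermediate (key, value) pairs.
    return [(word, 1) for word in document.split()]

def reduce_stage(key, values):
    # "Reduce": combine all values collected for one key.
    return key, sum(values)

def map_reduce(documents):
    # Map stage: each document may be processed by a separate node.
    intermediate = defaultdict(list)
    for doc in documents:
        for key, value in map_stage(doc):
            intermediate[key].append(value)
    # Reduce stage: sub-results are combined into the final solution.
    return dict(reduce_stage(k, v) for k, v in intermediate.items())

counts = map_reduce(["to be or not to be", "to do"])
```

The framework handles the distribution, grouping of intermediate keys, and fault tolerance, which is why the programmer's effort is small but the model fits only problems decomposable into this map-then-reduce shape.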
The above solutions may be helpful in developing a scalable application, where scalability concerns different resources as well as CPU computing power. An important resource of a scalable application may be distributed RAM used as data storage. It has many advantages over a hard disk, including very fast access to data and the possibility of processing the data simultaneously by many nodes of the system running the application. Thus, the usage of RAM as data storage, instead of a hard disk, is investigated in this paper.
A Distributed Shared Memory (DSM) system (Nitzberg & Lo, 1991), such as IVY (Li & Hudak, 1989), simplifies the process of developing an application. It creates a shared memory out of the separate RAM of multicomputer nodes. Moreover, applications designed for typical shared-memory systems (even home computers) may be easily adapted to a multicomputer by means of DSM. DSM may be implemented in hardware and/or in software, and it may be used to develop a scalable application. On the other hand, its applicability is limited, and it may become a bottleneck if used inefficiently.
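The core idea of page-based DSM can be sketched as follows. This toy Python model (illustrative only; it is not the IVY design, and ToyDSM and its static page placement are assumptions for the sketch) presents one flat address space to the application while each page physically resides in the private RAM of one node; reads and writes are transparently routed to the page's owner:

```python
PAGE_SIZE = 4  # words per page (tiny, for illustration)

class ToyDSM:
    """A flat shared address space backed by the private RAM of several
    nodes, one owner node per page."""

    def __init__(self, num_nodes, num_pages):
        # node_memory[n] holds only the pages owned by node n.
        self.node_memory = [dict() for _ in range(num_nodes)]
        # Static round-robin placement: page p lives on node p % num_nodes.
        self.owner = {p: p % num_nodes for p in range(num_pages)}

    def write(self, addr, value):
        page, offset = divmod(addr, PAGE_SIZE)
        node = self.owner[page]  # locate the owner; may be a remote node
        frame = self.node_memory[node].setdefault(page, [0] * PAGE_SIZE)
        frame[offset] = value

    def read(self, addr):
        page, offset = divmod(addr, PAGE_SIZE)
        node = self.owner[page]  # remote access is transparent to the caller
        return self.node_memory[node].get(page, [0] * PAGE_SIZE)[offset]

dsm = ToyDSM(num_nodes=3, num_pages=6)
dsm.write(17, 42)        # address 17 falls on page 4, owned by node 1
value = dsm.read(17)
```

A real DSM must additionally handle page migration, replication, and consistency protocols; it is exactly this hidden communication that can turn DSM into a bottleneck when access patterns are unfavorable.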
Scalable Distributed Data Structures (SDDS) (Litwin, Neimat, & Schneider, 1996) are middleware that can use distributed RAM as a scalable file of records. An SDDS file may be implemented in any programming language and may be used with many applications, even as a block device for a filesystem (Chrobot, Lukawski, & Sapiecha, 2008) or as a part of a database (for example, AMOS (Ndiaye, Diene, Litwin, & Risch, 2001)). However, all SDDS architectures require a large number of records to be moved between buckets (located on nodes of a multicomputer) during data expansion (Litwin, Neimat, & Schneider, 1996). Moreover, the addressing rules should take into account the characteristics of the data, e.g. the probability distribution of record keys; otherwise, the RAM may be used inefficiently. These are serious drawbacks that considerably diminish the area of SDDS applications, despite their fast access to the buckets.
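The record movement during expansion can be seen in a toy linear-hashing sketch in the spirit of LH* (the class below is a simplified, single-process illustration, not the LH* protocol: real SDDS buckets reside on different nodes and clients keep a possibly outdated image of the file state). When a bucket overflows, the bucket indicated by the split pointer is split, and roughly half of its records are rehashed and moved to a newly created bucket:

```python
class LinearHashFile:
    """Toy LH-style file: buckets would sit in the RAM of different nodes;
    each split moves about half of one bucket's records to a new bucket."""

    def __init__(self):
        self.i = 0            # file level
        self.n = 0            # split pointer: next bucket to split
        self.buckets = [[]]   # bucket 0 only, at the start
        self.capacity = 4     # records per bucket before a split triggers

    def addr(self, key):
        a = key % (2 ** self.i)
        if a < self.n:                       # bucket a was already split
            a = key % (2 ** (self.i + 1))    # use the next-level function
        return a

    def insert(self, key):
        a = self.addr(key)
        self.buckets[a].append(key)
        if len(self.buckets[a]) > self.capacity:
            self.split()

    def split(self):
        # Split bucket n (not necessarily the overflowing one): rehash its
        # records; those mapping elsewhere move to new bucket n + 2**i.
        old = self.buckets[self.n]
        h = 2 ** (self.i + 1)
        self.buckets[self.n] = [k for k in old if k % h == self.n]
        self.buckets.append([k for k in old if k % h != self.n])
        self.n += 1
        if self.n == 2 ** self.i:            # all buckets of this level split
            self.i += 1
            self.n = 0

f = LinearHashFile()
for k in range(10):
    f.insert(k)
```

Even in this tiny example, every split physically transfers records between buckets, and the even spread of buckets relies on keys being roughly uniform; a skewed key distribution would leave some buckets nearly empty, which is the inefficient RAM usage noted above.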