Introduction
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel"). There are different forms of parallel computing: bit-level, instruction-level, data, and task parallelism (Culler, Singh, & Gupta, 1998). Parallelism has been employed for many years, mainly in high-performance computing, but interest in it has grown lately because physical constraints prevent further frequency scaling as a way to improve microprocessor performance. As power consumption (and the resulting heat generation) in computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multicore processors.
Parallel computers can be roughly classified according to the level at which the hardware supports parallelism: multi-core and multi-processor computers have multiple processing elements within a single machine, while clusters, MPPs, and grids use multiple computers to work on the same task. Specialized parallel computer architectures are sometimes used alongside traditional processors to accelerate specific tasks. Parallel computer programs are more difficult to write than sequential ones, because concurrency introduces several new classes of potential software bugs, of which race conditions are the most common. Communication and synchronization between the different subtasks are typically among the greatest obstacles to good parallel program performance. The maximum possible speedup of a program as a result of parallelization is given by Amdahl's law.
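For reference, the standard form of Amdahl's law (not spelled out in the original text) is

$$ S(N) = \frac{1}{(1 - P) + \frac{P}{N}} $$

where P is the fraction of the program that can be parallelized and N is the number of processors. As N grows, the speedup approaches 1/(1 - P), so the serial fraction of a program bounds the speedup that parallelization can ever deliver.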
OpenMP is very effective at reducing the time required to compute many mathematical workloads. It was first released in 1997 and has evolved since then; several versions have followed, the most recent at the time of writing being Version 3.1, released in July 2011. OpenMP offers a convenient interface for developing parallel programs as applications grow larger and more complex. It provides shared-memory multiprocessing and multi-threading in commonly used programming languages, including C and C++, as described in the documentation (Developing Multithreaded Applications, 2005). Programs written with OpenMP scale from a desktop to a supercomputer. OpenMP uses pragmas and work-sharing constructs to divide the workload of a task among a team of threads, so that several threads cooperate on a single task rather than one thread doing all the work.
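As an illustration of this pragma-based, work-sharing style, the following minimal C sketch (illustrative only, not taken from the article) parallelizes a vector addition with an OpenMP work-sharing construct; it would typically be compiled with `gcc -fopenmp`:

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    static double a[N], b[N], c[N];
    long i;

    /* Initialize the input arrays sequentially. */
    for (i = 0; i < N; i++) {
        a[i] = (double)i;
        b[i] = 2.0 * (double)i;
    }

    /* The work-sharing pragma splits the loop iterations among the
       threads of the team; each thread processes its own chunk. */
    #pragma omp parallel for
    for (i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }

    printf("c[N-1] = %f\n", c[N - 1]);
    return 0;
}
```

Without the pragma the loop runs on a single thread; with it, the same source code uses as many threads as the runtime makes available, which is the essence of the incremental parallelization style described above.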
Parallelism can be exploited at two levels. The first is process-level parallelism, in which many processes run simultaneously. A typical example is running several applications together on an operating system: you can listen to music in a media player, edit a word-processing document, and copy data between a CD and the hard disk, all at the same time. The second is thread-level parallelism, in which several threads run concurrently within one process. For example, within a word processor, operations such as editing text, cutting, pasting, and previewing are handled by different threads in parallel. Thread-level parallelism is generally preferred over process-level parallelism for several reasons, which are summarized in Table 1; a short sketch following the table illustrates the key difference.
Table 1. Comparison between a process and a thread
| Process | Thread |
| --- | --- |
| Each process has its own address space. | Threads of the same process share one address space. |
| Processes are heavyweight. | Threads are lighter weight than processes. |
| Processes have more overhead. | Threads have less overhead than processes. |
| Context switching between processes is more expensive. | Context switching between threads is cheaper. |
| Inter-process communication is costly. | Inter-thread communication is cheap. |
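To make the address-space distinction in Table 1 concrete, the following minimal C sketch (illustrative only; the article gives no code, and POSIX fork and pthreads are assumed here) creates one child process and one thread, then shows which of the two can make an update visible to the creator. It would typically be compiled with `gcc demo.c -pthread`:

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <pthread.h>

static int shared_counter = 0;   /* global variable in the parent's address space */

static void *thread_body(void *arg)
{
    (void)arg;
    /* A thread shares the creating process's address space,
       so this update is visible after the join. */
    shared_counter = 42;
    return NULL;
}

int main(void)
{
    /* Process-level parallelism: the child receives a copy of the
       address space, so its update is invisible to the parent. */
    pid_t pid = fork();
    if (pid == 0) {
        shared_counter = 7;      /* modifies only the child's copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("after fork/wait:   shared_counter = %d\n", shared_counter); /* still 0 */

    /* Thread-level parallelism: the new thread shares memory. */
    pthread_t tid;
    pthread_create(&tid, NULL, thread_body, NULL);
    pthread_join(tid, NULL);
    printf("after thread join: shared_counter = %d\n", shared_counter); /* now 42 */

    return 0;
}
```

The child process's write never reaches the parent, while the thread's write does; this shared address space is also why inter-thread communication is cheaper than inter-process communication, and why OpenMP builds on threads rather than processes.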