Efficient approaches to computationally intensive image processing tasks are currently highly sought after. In this chapter, the authors show how a blackboard paradigm, originally developed for collaborative problem solving, can be used as an efficient and effective vehicle for distributed computation. Through the design of dedicated intelligent agents, typical image processing algorithms can be applied in parallel on multiple loosely coupled machines, leading to a significant overall speedup, as verified in a series of experiments.
In a tightly-coupled architecture, all processors share the same main memory and work concurrently on the same data. Consequently, this type of system largely eliminates the need for explicit message passing between concurrent tasks. Multi-threaded programming allows an application to branch into independent concurrent threads and is not restricted to shared-memory multi-processor architectures. In general, however, multi-threaded applications are well suited to such architectures because individual threads can run concurrently on different processors. As the threads of an application share the same address space, creating them incurs considerably less overhead than creating an equivalent number of processes.
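As a minimal sketch of the shared-address-space idea, the Python fragment below lets several threads modify disjoint row ranges of one image held in common memory, with no message passing between them. The names `invert_rows` and `threaded_invert` are hypothetical and chosen for illustration only, and the image is assumed to be a list of rows of 8-bit grey values. In CPython the global interpreter lock prevents a real speedup for this kind of work, so the sketch illustrates the sharing of data rather than the performance gain.

```python
import threading


def invert_rows(image, first, last):
    """Invert rows [first, last) in place; `image` is shared by all threads."""
    for r in range(first, last):
        image[r] = [255 - p for p in image[r]]


def threaded_invert(image, n_threads=2):
    """Invert an image using threads that each own a disjoint band of rows."""
    # Ceiling division gives the band height; max() guards the empty image.
    step = max(1, -(-len(image) // n_threads))
    threads = [
        threading.Thread(
            target=invert_rows,
            args=(image, start, min(start + step, len(image))),
        )
        for start in range(0, len(image), step)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every band has been processed
    return image
```

Because each thread writes only to its own rows, no locking is needed in this particular case; operations whose bands overlap would require synchronisation.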
In a loosely-coupled architecture, parallel image processing tasks typically consist of four main steps: image distribution, local processing, data transfer during processing, and segment accumulation. Distribution is the process of dividing an image into segments, each of which is assigned to a unique processor (Taniguchi et al., 1997). Under a duplicate distribution scheme, each processor is sent an exact copy of the original image. Alternatively, more complex schemes can be adopted in which an image is divided into a variable-sized matrix (Nicolescu and Jonker, 2000). After distribution, each processor applies local image processing to its allocated segment. When data allocated to other processors are required, they are transferred by inter-processor communication. Finally, after application of the parallel algorithm, the segments are accumulated into a resulting image.
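The distribution, local processing, and accumulation steps can be sketched as follows. This is an illustrative outline under stated assumptions, not the scheme of any of the cited works: the image is a list of rows of 8-bit grey values, segments are contiguous row bands, and a thread pool merely stands in for the separate machines of a loosely-coupled system. The function names are hypothetical. The data transfer step is omitted because the chosen operation, brightening, needs no data from other segments.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(image, n_workers):
    """Split an image into at most n_workers contiguous row bands."""
    base, extra = divmod(len(image), n_workers)
    segments, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)  # spread the remainder rows
        if size:
            segments.append(image[start:start + size])
        start += size
    return segments


def brighten(segment, offset=40):
    """Local processing: add an offset to every pixel, clamped to 255."""
    return [[min(255, p + offset) for p in row] for row in segment]


def accumulate(segments):
    """Concatenate the processed row bands into the resulting image."""
    return [row for segment in segments for row in segment]


def parallel_brighten(image, n_workers=4):
    """Distribute, process each segment on a worker, then accumulate."""
    segments = distribute(image, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in submission order, so the bands
        # are accumulated in their original positions.
        processed = list(pool.map(brighten, segments))
    return accumulate(processed)
```

On a real cluster the call to `pool.map` would be replaced by sending each segment to a remote processor and receiving the result, which is where the cost of distribution and accumulation arises.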
Inter-processor communication, which is required when data allocated to other processors are needed, can be categorised according to its pattern of data access (Seinstra, Koelma and Geusebroek, 2002). These patterns also represent a strategy for synchronisation between communicating processors. One-to-one access is common in tasks such as image brightening or colour correction, where an output pixel maps directly to a pixel in the input image. Alternatively, a one-to-many relationship exists in neighbourhood operators, such as edge detection filters, which calculate an output as a function of the input pixel’s immediate neighbourhood. Naturally, the handling and transmission of non-contiguous data differ from those of data stored as contiguous blocks. Data scattered in memory incur additional overhead because they must be packed into a contiguous buffer before transmission (Hoare, 1985).
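The following sketch illustrates two consequences of these access patterns, assuming a row-major image stored in a flat `bytes` buffer and distributed as row bands; all function names are hypothetical. The first function computes which rows a processor must hold, including rows owned by its neighbours, before it can apply a neighbourhood operator of a given radius. The other two contrast the transmission of contiguous rows, which needs only a single slice, with that of a column, whose elements lie `width` bytes apart and must be gathered one by one into a contiguous buffer.

```python
def rows_needed(start, stop, height, radius=1):
    """Row range a band [start, stop) needs for a neighbourhood operator.

    The band is widened by `radius` rows on each side (clipped at the
    image border); the extra rows belong to neighbouring processors and
    must be obtained by inter-processor communication.
    """
    return max(0, start - radius), min(height, stop + radius)


def pack_rows(flat, width, first, last):
    """Rows [first, last) are contiguous in memory: one slice suffices."""
    return bytes(flat[first * width:last * width])


def pack_column(flat, width, height, col):
    """Gather one column into a contiguous buffer, element by element."""
    return bytes(flat[r * width + col] for r in range(height))
```

For a one-to-one operator such as brightening, `rows_needed` with `radius=0` returns the band unchanged, so no rows are exchanged at all.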