Proposed in 1962, the Hough transform (HT) has been widely applied and investigated for detecting curves, shapes, and motions in the fields of image processing and computer vision. However, the HT has several shortcomings, including high computational cost, low detection accuracy, vulnerability to noise, and the possibility of missing objects. Many efforts over the decades have targeted some of these problems, while the key idea has remained more or less the same. Proposed in 1989 and further developed thereafter, the Randomized Hough Transform (RHT) considerably overcomes these shortcomings through innovations in the fundamental mechanisms: random sampling in place of pixel scanning, converging mapping in place of diverging mapping, and dynamic storage in place of an accumulation array. This article provides an overview of advances and applications of the RHT over the past one and a half decades.
Taking straight line detection as an example, the upper part of Fig. 1 shows the key idea of the Hough Transform (HT) (Hough, 1962). A set of points on a line y=kx+b in the image is mapped into a set of lines crossing a common point (k, b) in the parameter space. A uniform grid is placed on a window in the (k, b) space, with an accumulator a(k, b) at each bin. As each point (x, y) in the image is mapped into a line in the (k, b) space, every accumulator a(k, b) that this line passes through is incremented by 1. Lines are detected by finding every accumulator whose score a(k, b) is larger than a given threshold.
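The voting scheme just described can be sketched in a few lines of Python (a minimal illustration; the grid ranges, bin count, and threshold below are arbitrary choices for the example, not values from the text):

```python
def hough_lines(points, k_min=-2.0, k_max=2.0, b_min=-10.0, b_max=10.0,
                bins=64, threshold=5):
    """Minimal (k, b)-space Hough transform for lines y = k*x + b."""
    acc = [[0] * bins for _ in range(bins)]        # accumulator grid a(k, b)
    ks = [k_min + i * (k_max - k_min) / (bins - 1) for i in range(bins)]
    for x, y in points:
        for i, k in enumerate(ks):
            b = y - k * x                          # mapped line b = y - k*x
            j = round((b - b_min) / (b_max - b_min) * (bins - 1))
            if 0 <= j < bins:
                acc[i][j] += 1                     # one vote per bin on the mapped line
    # pick every accumulator with score at or above the threshold
    return [(ks[i], b_min + j * (b_max - b_min) / (bins - 1))
            for i in range(bins) for j in range(bins) if acc[i][j] >= threshold]

# Points on y = 0.5*x + 1 concentrate their votes near (k, b) = (0.5, 1)
pts = [(x, 0.5 * x + 1.0) for x in range(8)]
detected = hough_lines(pts, threshold=8)
```

Note that every image point votes in all `bins` slope cells, which already hints at the computational cost discussed next.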
From Hough Transform to Randomized Hough Transform
The Hough Transform was brought to the attention of the mainstream image processing community by Rosenfeld (1969). Then Duda and Hart (1972) not only introduced the polar parameterization for more efficient line detection, but also demonstrated how a circle can be detected. Kimme, Ballard and Sklansky (1975) made circular curve detection significantly more effective by using the gradient information of pixels. Merlin and Farber (1975) showed how the HT could be generalized to detect an arbitrary shape at a given orientation and a given scale. Ballard (1981) eventually generalized the HT to detect curves of a given arbitrary shape for any orientation and any scale. Since then, many applications, variants, and extensions of the HT have been published in the literature. A survey of these developments of the HT is given by Illingworth and Kittler (1988).
However, the HT has several critical drawbacks as follows:
All pixels are mapped, and every bin in the grid needs an accumulator. If there are d parameters, each represented by M bins or grid points, one needs M^d accumulators.
To reduce the computational cost, quantization resolution cannot be high, which blurs the peaks and leads to low detection accuracy.
Each pixel activates every accumulator on its mapped line, but only one of them corresponds to the true parameters while all the others are disturbances.
If the grid window is set inappropriately, some objects may lie outside the window and thus cannot be detected.
Disturbing and noisy pixels cause many interfering accumulations.
Key Terms in this Chapter
Threshold Based Voting vs. Local Maxima Finding: Given a pre-specified threshold, an accumulator in an array is picked if it receives more votes than the threshold, without considering any neighborhood. Finding a local maximum means finding an accumulator whose votes are larger than those of all accumulators located in its neighborhood.
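The two peak-picking rules can be contrasted on a toy accumulator array (an illustrative sketch; the array values and threshold are made up for the example):

```python
def pick_by_threshold(acc, thresh):
    """Pick every accumulator whose votes exceed the threshold, ignoring neighbours."""
    return {(i, j) for i, row in enumerate(acc) for j, v in enumerate(row) if v > thresh}

def pick_local_maxima(acc):
    """Pick accumulators whose votes exceed those of all (up to 8) neighbours."""
    n, m = len(acc), len(acc[0])
    peaks = set()
    for i in range(n):
        for j in range(m):
            neighbours = [acc[a][b]
                          for a in range(max(0, i - 1), min(n, i + 2))
                          for b in range(max(0, j - 1), min(m, j + 2))
                          if (a, b) != (i, j)]
            if acc[i][j] > max(neighbours):
                peaks.add((i, j))
    return peaks

acc = [[0, 1, 0],
       [1, 9, 8],
       [0, 1, 0]]
# Threshold-based voting with threshold 5 picks both the 9 and the 8;
# local-maxima finding keeps only the 9, since 8 has a larger neighbour.
```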
d Band Test: A pixel is said to fall in the d band of Γ (a curve or surface) in the image space if the shortest distance from this pixel to Γ is less than a pre-specified threshold d. Pixels falling in the d band of Γ are regarded as belonging to Γ, and a d band test can be designed according to these pixels.
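For a straight line y = kx + b, the d band test reduces to a perpendicular point-to-line distance check (a sketch under that assumption; the function names are ours):

```python
import math

def in_d_band(pixel, k, b, d):
    """True if the pixel lies within distance d of the line y = k*x + b."""
    x, y = pixel
    # perpendicular distance from (x, y) to the line k*x - y + b = 0
    dist = abs(k * x - y + b) / math.sqrt(k * k + 1.0)
    return dist < d

def band_support(pixels, k, b, d):
    """Pixels falling in the d band are regarded as belonging to the line."""
    return [p for p in pixels if in_d_band(p, k, b, d)]
```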
Under-Constrained vs. Over-Constrained Equations: For a parametric equation with κ free parameters, m < κ pixels give a set of under-constrained equations, while m > κ pixels in a non-degenerate configuration give a set of over-constrained equations (m = κ pixels determine the parameters exactly).
Kernel Estimator: Every mapped point is memorized as the centre of a kernel function, e.g., a bell-shaped function such as a Gaussian. Collectively, the mapped points form a density estimate of a multi-mode distribution, with each mode playing the role of the above cluster centre.
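A one-dimensional sketch of the idea (the Gaussian kernel and the bandwidth value are illustrative choices, not prescribed by the definition):

```python
import math

def kernel_density(mapped_points, x, bandwidth=0.5):
    """Density estimate at x: each memorized mapped point is the centre of a
    Gaussian kernel; modes of the summed density play the role of cluster centres."""
    n = len(mapped_points)
    return sum(math.exp(-((x - c) / bandwidth) ** 2 / 2.0)
               for c in mapped_points) / (n * bandwidth * math.sqrt(2.0 * math.pi))
```

The density is high near concentrations of mapped points and near zero elsewhere, so mode finding replaces peak finding in an accumulator array.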
Diverging Mapping vs. Converging Mapping: Given m pixels with m < κ, a set of under-constrained equations specifies a curve or manifold of dimension κ − m in R^κ. E.g., a line y=kx+b passing through a given pixel in the image yields a line b=y−kx in R^2. This case is called diverging mapping because the m pixels are mapped diversely into the R^κ space. On the other hand, if m ≥ κ, a unique point in the R^κ space may be determined by solving the set of joint equations, or by optimizing a cost when the joint equations are over-constrained; i.e., we have a converging mapping that maps the m pixels into one point in R^κ.
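For line detection, the converging mapping takes m = κ = 2 pixels to a single point (k, b) by solving the two joint equations (a sketch; the degenerate vertical-line case is simply skipped here):

```python
def converging_map(p1, p2):
    """Map m = 2 pixels to one point (k, b): the line through both pixels.

    Returns None for a degenerate pair (equal x, so the slope k is undefined)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        return None                      # degenerate configuration
    k = (y2 - y1) / (x2 - x1)            # solve y1 = k*x1 + b and y2 = k*x2 + b
    b = y1 - k * x1
    return (k, b)
```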
Cluster Analysis: Instead of using an accumulation array, in the case of a converging mapping every mapped point in R^κ is memorized. After a sufficient number of converging mappings, we obtain a set of points on which cluster analyses can be performed to find the cluster centres (means or medians).
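A minimal illustration of such clustering on mapped (k, b) points (the greedy grouping rule and the radius value are our simplification for the example, not a prescribed RHT step):

```python
def cluster_centres(points, radius=0.2):
    """Greedy clustering of memorized mapped points: assign each point to the
    first cluster whose running mean is within `radius` per coordinate,
    otherwise start a new cluster; return the cluster centres (means)."""
    clusters = []                        # each cluster: [sum_k, sum_b, count]
    for k, b in points:
        for c in clusters:
            ck, cb = c[0] / c[2], c[1] / c[2]
            if abs(k - ck) <= radius and abs(b - cb) <= radius:
                c[0] += k; c[1] += b; c[2] += 1
                break
        else:
            clusters.append([k, b, 1])
    return [(c[0] / c[2], c[1] / c[2]) for c in clusters]
```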
Random Sampling: Given a set of N pixels, we take m pixels, each picked randomly with probability 1/N. Repeating this sampling a sufficient number of times, a global configuration of the N pixels emerges without enumerating all N pixels.
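Combining random sampling with the converging mapping and dynamic storage described above, a bare-bones RHT line detector might look like this (the trial count, cell resolution, and vote threshold are illustrative choices):

```python
import random
from collections import defaultdict

def rht_lines(pixels, trials=2000, resolution=0.05, min_votes=10, seed=0):
    """RHT sketch: sample pixel pairs at random, converge each pair to one
    (k, b) point, and store votes dynamically in a hash map instead of
    allocating a full accumulator array."""
    rng = random.Random(seed)
    votes = defaultdict(int)                       # dynamic storage: only visited cells
    for _ in range(trials):
        p1, p2 = rng.sample(pixels, 2)             # random sampling of m = 2 pixels
        if p1[0] == p2[0]:
            continue                               # skip degenerate (vertical) pairs
        k = (p2[1] - p1[1]) / (p2[0] - p1[0])      # converging mapping: 2 pixels -> (k, b)
        b = p1[1] - k * p1[0]
        cell = (round(k / resolution), round(b / resolution))
        votes[cell] += 1
    return [(ck * resolution, cb * resolution)
            for (ck, cb), v in votes.items() if v >= min_votes]
```

Because each sampled pair contributes exactly one vote to one cell, true lines accumulate votes quickly while noise votes stay scattered over rarely repeated cells.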