1 May 2009 Novel template matching method with sub-pixel accuracy based on correlation and Fourier-Mellin transform
Author Affiliations +
Template matching is the process of determining the presence and the location of a reference image or an object inside a scene image under analysis by a spatial cross-correlation. Conventional cross-correlation type algorithms are computationally expensive. Furthermore, when the object in the image is rotated, the conventional algorithms cannot be used for practical purposes. An algorithm for a rotation-invariant template matching with subpixel accuracy is proposed based on the combination of the correlation and Fourier-Mellin transformation when the fluctuating scope of the rotation angle is [-20 deg, 20 deg]. The algorithm consists of two stages. In the first stage, the matching candidates are selected using a computationally low-cost improved correlation algorithm. The operation of AND is adopted to reduce the computational cost for this stage. In the second stage, rotation invariant template matching is performed only on the matching candidates using the cross-correlation algorithm after adjusting image with a Fourier-Mellin invariant (FMI) descriptor, and the matching precision is subpixel by the novel method using the Fermat point. Experimental results show that the proposed method is very robust to Gaussian noise and rotation, and it also achieves high matching accuracy and matching precision.
Zhang, Lei, and Liu: Novel template matching method with sub-pixel accuracy based on correlation and Fourier-Mellin transform



Template matching is a classical problem in scene analysis: given a reference image of an object, decide whether that object exists in a scene image under analysis, and find its location if it does. The template matching process involves cross-correlating the template with the scene image and computing a measure of similarity between them to determine the displacement.1, 2, 3 Since the evaluation of the correlation is computationally expensive, there has been a need for low-cost correlation algorithms for real-time processing. A large number of correlation-type algorithms have been proposed.4, 5, 6 One of the approaches is to use an image pyramid for both the template and the scene image and to perform the registration by a top-down search.7 Other fast matching techniques use two-pass algorithms, using a subtemplate at a coarsely spaced grid in the first pass, and searching for a better match in the neighborhood of the previously found positions in the second pass.8

However, when objects in the image are rotated with respect to each other, the methods described earlier cannot be used, and a set of templates at different orientations is to be used. This procedure is hardly practical for real-time processing when the rotation angle is arbitrary or unconstrained. In such applications, rotation-invariant moment-based methods can be used.9, 10 The moment-based methods, however, require too much computation for practical purposes and are sensitive to noise.

In this paper, we propose a novel fast rotation-invariant template matching method when the fluctuating scope of the rotation angle is [20deg,20deg] . The method consists of three stages. In the first stage, the matching candidates are selected using an improved correlation algorithm with a computationally low cost. In the second stage, rotation-invariant template matching is performed only on the matching candidates using correlation algorithms after adjusting the image with a Fourier-Mellin invariant (FMI) descriptor.11, 12 In the third stage, the exact location of the template is obtained with subpixel interpolation. The methods in the first and third stages are novel, and an improved method in the second stage is used: The 2-D phase correlation is replaced by 1-D phase correlation to accelerate the registration parameter estimation process.

The remainder of the paper is organized as follows. Section 2 gives a brief overview of convolution and the matching method based on correlation. Section 3 presents our proposed algorithm for candidate selection, the improved FMI, and the subpixel interpolation method. In Sec. 4, various simulation results from the use of the proposed schemes are reported and compared with results from the conventional methods in terms of the speed of the algorithm, the presence of noise, and the brightness and contrast variations. Last, we give concluding remarks in Sec. 5.


Convolution and Correlation Algorithms

The convolution algorithm is often used in digital image processing,13, 14 which is simple and efficient. The convolution algorithm and the correlation algorithm are all based on the template matching. Meanwhile, they bear partial similarities in terms of algorithm [see Eq. 2 and the numerator in Eq. 3]. Inspired by this a novel and simplified algorithm similar to the convolution algorithm is proposed, which has not only reduced the complexity of the correlation algorithm, but also withstands rotation within certain angles. The theories of the convolution algorithm and the correlation algorithm are as follows.


Convolution Algorithm

For 2-D images, we will explain only the theory of 2-D convolution. The convolution h of the bivariate continuous functions f and g , given in Eq. 1, is similar to that of the 1-D convolution; the only difference is that the two variables x and y are independent. Their 2-D convolution expression is as follows:



The convolution of the digital image is similar to the continuous function; the only difference is that the value of the independent variable is an integer. The double integration is transformed into a double sum. Therefore, we get Eq. 2 for a digital image:


where G is the convolution kernel, F denotes digital images, and H is the result of the convolution. Because the values of F and G are nonzero only within a limited range, we need to calculate only the sum of the field that is overlapped with the nonzero part.


Cross-Correlation Algorithm

The normalized cross-correlation theory is as follows. Let us define a template to be Gs with the size of Ms×Ns pixels and an image Gr with the size of Mr<Nr pixels, respectively, and Ms<Mr , Ns<Nr . Define the top-left corner of the image as the origin and the subimage as Gr(u,v) in the image with the size of Ms×Ns pixels. The normalized cross-correlation coefficient ρ(u,v) between Gs and Gr(u,v) is then defined as15


where the grayscale average of Gr(u,v) and Gs is respectively represented with Gr(u,v)¯ and Gs¯ . ρ(u,v) in the scope of [1,1] . The best matching location is determined by the maximum cross-correlation coefficient corresponding to the correlation peak, and the matching error is determined by the location of the correlation peak and the size and position of the template.


Template Matching Method Based on Correlation and Fourier-Mellin Transform


Candidate Selection Process

Enlightened by the convolution algorithm, we considered to simplify the crosscorrelation algorithm so that it will have the algorithmic mode of the convolution algorithm; in this way, the computation will be greatly reduced. However, we must point out that this change just takes the convolution algorithm as reference. Although they seem similar to convolution in appearance, they are different in meaning—the novel algorithm does not have the operational properties of the convolution. The realization of the simplification is as follows.

In Eq. 3, we presumed that

Then, Eq. 3 can be transformed into Eq. 4 as follows:



Suppose that we study only the binary image. Obviously, to remove the denominator in Eq. 4 and make it similar to the mode of the convolution, we must focus on the denominator in Eq. 4. After studying Eq. 4, we find that GSi,j is a fixed value, while others are all variable, which increases the complexity of the computation. This is because ρ(u,v) is mainly about the degree of the cross-correlation. So we consider just the correlation coefficient based on the template and abandon the computation of GRi,j in Eq. 4, reducing the computational cost. The method is as follows.

The pixel values of the binary image are usually 0 and 255 or 0 and 1. Suppose that the pixel values of the binary image are only 0 and r . Since it is based on the template, the sum of square of each pixel in the template can be used as the denominator, instead of the denominator in Eq. 4. So the correlation coefficient ρ(u,v) can be simplified as



Obviously, the denominator of Eq. 5 is a constant. Suppose that it is N . Equation 5 can be further simplified into Eq. 6:



Apparently, Eq. 6 is like the mode of the convolution algorithm, and 0ρ1 . In addition, there are only two pixel values (0 and r ). If r=1 , then the operation in brackets in Eq. 6 is actually equal to the sum of the AND operation of the correspondent pixels in the two images. Otherwise, the operation will be equal to the sum of squares of the AND operation. Because it is a binary image, no matter the former or the latter, their computational performances are the same as that of the AND operation. But the latter offers r2 , which is also a fixed constant. Therefore, Eq. 6 can be changed into Eq. 7:



In Eq. 7, denotes the AND operation. Undoubtedly, it is of great value to make the computation more efficient and the hardware implementation of the algorithm more realizable. On the other hand, there might be mismatching that makes ρ=1 where all the pixel values are r in some region of the reference image. For example, when there are only 0 and 1 for pixel values in an image, if the subimage covered by the template is 1, it would cause an error of final correlation degree ρ=1 by using Eq. 7 through the AND operation (similar to the AND operation of two images). Therefore, these regions must be removed by forcedly making ρ=0 there. Here, the subimage of universal 1 is eliminated through the comparison of the template image pixel sum and that of the subimage, covered by the template. If a subimage has universal 1, the sum of pixel values will be greater than that of the template because there is a region of 0 within the template. The universal 1 or 0 would make a template meaningless. Therefore, we would consider only the subimages in which the pixel value sum is less than that of the template. This would avoid error matching. Meanwhile, for a region of universal 0, it is not necessary to have a matching calculation. For a real matching region, the sum of pixel values should be approximate to that of the template. Thus, we consider only the regions in which the sum of pixel values is greater than half that of the template and less than that of the entire template. Suppose that the sum of the pixel values in the subimage covered by the template is Ψ , making M<ΨΦ , where Φ is the sum of the pixel values in the template. M can be adapted according to the need of precision. Here, M=Φ2 . When the template is being moved, it needs to compute just the sum of the pixel values in the subimage covered by the template within this range. As a result, not only can it accelerate the operational speed, but it can also remove a large amount of false matching positions.

Equations 3, 7 are used for the same image and template, respectively. Figure 1 shows the distributions of correlation peaks. Figure 1 shows the plan view of the correlation peak generated from Eq. 3, and Fig. 1 shows the correlation peak generated from Eq. 3. Figures 1 and 1 show the correlation peak with image rotating by 30deg . In particular, Figs. 1 and 1 show the plan view of the correlation peak and the correlation peak, respectively, both generated from Eq. 3. The same in Fig. 1 and 1, but generated from Eq. 7. In the illustration, the pane stands for the place where the value of ρ is highest, and the location of the pane in Fig. 1 is the factual location of the template in the reference image. In Fig. 1, in order to show better comparison of the two algorithms, the denoted matching positions are marked according to the highest correlation peak. But in the following experiments, the candidate matching positions are selected through a definite number of candidates.

Fig. 1

Correlation peaks. (a) Plan view of the correlation peak generated from Eq. 3; (b) correlation peak generated from Eq. 3; (c) plan view of the correlation peak generated from Eq. 3 (image rotates by 30deg ); (d) correlation peak generated from Eq. 3 (image rotates by 30deg ); (e) plan view of the correlation peak generated from Eq. 7 (image rotates by 30deg ); and (f) correlation peak generated from Eq. 7 (image rotates by 30deg ).


Obviously, after the rotation, the classic algorithms have lost the correct matching position [as shown in Fig. 1]. With the refined algorithm, the effective matching positions will still stay near the correct positions [as shown in Fig. 1]. The refined algorithm thus has better flexibility to rotations. Meanwhile, it can isolate the possible matching regions within a smaller region [as shown in the light region of Fig 1], which benefits the following fine matching.

Further, it has to be pointed out that matching candidates are not derived through Eq. 7 after binary images are set through binarization. They are derived, based on the adaptive threshold, through the pixel values, either 0 or 1. This has speeded up the computation. Here, the adaptive thresholds are calculated by the grayscale mean values of the subimages. The subimages are divided equally from reference image into n copies according to the template. If the reference image cannot be divided equally into subimages according to the template, the rest of the image that cannot be put in the subimage will calculate its threshold with its neighboring subimage. This binarization is simple and fast. But just like many other binarizations, it cannot keep consistency of binarization for complex images. Therefore, after the coarse matching (CMA) stage, Fourier-Mellin transformation is introduced. Thus, the performance is guaranteed as long as the matching candidates are near the good matching points. Further, with the cost of additional computation, more candidates can be selected in order that some candidates can fall around the fine matching points.

The Fourier-Mellin transformation has the advantage of low cost of matching duration and overlapping range. Even though the matching candidates are selected at locations far from the fine matching points, a correct matching still can be guaranteed. The performance of these two indices has the best effect compared to other algorithms.


Image Registration Based on Fourier-Mellin Transform

The image registration algorithm based on Fourier-Mellin transform is an effective matching method insensitive to noise and with high veracity.7 It has been applied to image registration, curve matching, and other fields,11, 16, 17 where it is used only to estimate registration parameters. To accelerate the registration parameter estimation process, the 2-D phase correlation in Chen 11 is replaced by 1-D phase correlation. The improved image registration theory is as follows.


Principle of 1-D phase correlation

Phase correlation is one of the most widely used techniques for image registration in the frequency domain to determine the relative translation between two images.18, 19 This method relies on the translation property of the Fourier transform, referred to as the Fourier translation theorem, which is easy to implement, has less computation complexity, and thus requires less process time due to the high efficiency of fast Fourier transform (FFT). Furthermore, it is robust to interference. However, phase correlation can deal with translations only between two images in the x axis and y axis.20

In this paper, with the 1-D signal f(x) as an example, the Fourier translation theory is as follows:



Let f1(x) and f2(x) be two input signals that differ only by a displacement x0 , i.e.,



From Eq. 8, the corresponding Fourier transforms F1(ω) and F2(ω) will be related by



The cross-power spectrum of two signals f1 and f2 with F1(ω) and F2(ω) is defined as


where Φ1(ω) and Φ2(ω) are the spectral phases of f1(x) and f2(x) , F2*(ω) is the complex conjugate of F2(ω) , and the translation theorem guarantees that the phase of the cross-power spectrum is equivalent to the phase difference between the two signals. By taking the inverse Fourier transform of the representation in the frequency domain, we will have a function that is an impulse19—that is, it is approximately zero everywhere except at the displacement x0 that is needed to optimally register the two signals:


Based on the property of the Fourier transform, the inverse Fourier transform of exp(j2πx0ω) is δ(xx0) . Equation 12 gives the 1-D δ function. So D12(ω) is a function that is also an impulse.

Since the phase difference for every frequency component contributes equally, the location of the peak will not change if there are noises or part distortion between the two signals.


FMI descriptor

The FMI descriptor is translation invariant and represents rotation and scaling as translations in parameter space. Its theory is as follows. Consider image s(x,y) and image r(x,y) , among which, s(x,y) is obtained from r(x,y) by the translation, rotation, and scaling transform, i.e.,



Then the corresponding Fourier transforms S(u,v) and R(x,y) will be related by


where, we use ∣.∣ to denote spectrum amplitude. From Eq. 14, spectrum amplitude is related only to rotation angle α and scaling factor σ , not to displacement x0 and y0 . Hence, the transformation parameters can be computed by two steps: In step 1, α and σ can be obtained by the spectrum amplitude; in step 2, x0 and y0 can be achieved.

Step 1

Achieving the rotation angle α and the scaling factor σ using the phase correlation algorithm. The theory is as follows




where the spectrum amplitudes of the image r and the image s in polar coordinates are respectively represented with rp and sp . Then we can get the following equation:




where λ=logρ , κ=logσ .

Here, we use the 1-D projection, so the image in polar coordinates can be devided into two 1-D datasets in the x axis and y axis, respectively. Then, Eq. 18 can be divided into the following two equations:





We can see that Eqs. 19, 20 have the form of Eq. 9 in nature. So syp1 and ryp1 can be calculated using Eqs. 10, 11 ordered in log-polar coordinate space, and the same for sxp1 and rxp1 . Then by taking the inverse Fourier transform, the distribution of the correlation coefficient can be achieved. α and κ can be obtained, according to the location of the corresponding peak.

If the logarithm’s fundus is e , then


Thus, α and σ can all be obtained.

Step 2

Achieving the displacement x0 and y0 using the phase correlation algorithm. By taking the inverse transform, image s1(x,y) can be obtained from the image s(x,y) using α and σ from step 1. Equations 10, 11 are used to image s1(x,y) and image r(x,y) , respectively. Then, by taking the inverse Fourier transform, the distribution of the correlation coefficient can be achieved. The displacement x0 and y0 can be obtained according to the location of the corresponding peak.


Subpixel Interpolation Method

Rectify the image by the method mentioned in Sec. 3.2, and then take the right matching point as the center and search by expanding several pixels. (The range of the searching can be expanded according to different demands—obviously, the bigger the range is, the more reliable the result will be, but the more time it will consume.) Then, according to the correlation peak, the novel subpixel interpolation method is used to gain the final matching position by the principle of parabola interpolation method (Fig. 2) The concrete steps are as follows.

Fig. 2

Parabola interpolation method.


Making eight longitudinal cuts from the summit of the correlation peak (vertically, horizontally, and diagonally), we will get four sections that can be precisely described by the fitting parabola. In this way, 2-D information can be applied to describe 3-D information. Therefore, four pairs of points at the 0.1-pixel level are obtained: (x1,y1) , (x2,y2) , (x3,y3) , (x4,y4) . Suppose that the point (x,y) is the minimum value point—that is, the sum of all distances between this point and other points is the smallest—then this point is the Fermat point. Forming function f(x,y) as follows:

then the (x,y) with fmin in the preceding expression is the wanted subpixel position. Therefore, we can heighten the interpolation accuracy, making it as accurate as 0.01 subpixel.

Next, we will further explain this method. Vertically, the biggest correlation coefficient can be obtained by using the parabola interpolation method and then putting the result into the equation y=ax2+bx+c so that we can get the correspondent values in the x axis and y axis, respectively. The coefficient of the equation a , b , and c can be obtained by the three nearby coordinate values of the correlation peaks. Suppose that the correlation coefficient of the matching position is ρ0 and that its horizontal coordinate value is x0 . We will then get the equation ρ0=ax02+bx0+c , taking two points near (x0,ρ0) —namely, (x1,ρ1) and (x2,ρ2) , respectively—and then solving the following equations:

Then, the coefficients a , b , and c can be obtained. The real matching position should be the place where the correlation coefficient is the biggest—that is, the highest point wmax in the fitting parabola (Fig. 2). In this condition, the correspondent horizontal coordinate can be obtained along the fitting parabola.

Similarly, the correspondent vertical coordinate can be obtained along the fitting parabola by using its vertical coordinate—that is, ρ0=dy02+ey0+f . We will then get the coordinate value (x1,y1) .

When fitting horizontal and diagonal sections, as mentioned earlier, we can get the correspondent coordinate values of the other three correspondent points—say, (x2,y2) , (x3,y3) , and (x4,y4) . The eight directions can be illustrated by Fig. 3 as follows.

Fig. 3

Eight directions along fitting parabolas.


The four points can generally form three shapes—namely, convex quadrilateral, concave quadrilateral, and triangle (see Fig. 4). Making use of basic mathematical knowledge—that is, that the other two sides of a triangle were greater than the third—it is easy to prove that when the four points form a convex quadrilateral, the Fermat point (the point where the sum of the distance between it and other peaks is smallest) is the intersection of the diagonals; when they form a concave quadrilateral, the Fermat point is the point at the bottom of the concave part; and when they form a triangle, the Fermat point is the point in the middle of the line where the three points are in. Because the positions of the points are the same, the conditions when they form a convex quadrilateral or a triangle can be classified as one category. Then by the features of the Fermat point, the coordinates of the subpixel position can be easily obtained.

Fig. 4

Position about four points.



The Matching Algorithm Flowchart

In the preceding sections, we have carefully analyzed the steps of the proposed algorithm. The main flowchart is shown in Fig. 5, which consists of the following steps:

  • 1. Image preprocessing.

  • 2. Implementing coarse matching and finding matching candidates.

  • 3. Implementing image registration and outputting the translation, rotation, and scaling parameters.

  • 4. Image rectification.

  • 5. Implementing fine matching and outputting the position at pixel level.

  • 6 Implementing subpixel interpolation and outputting the precious position of the subpixel

Fig. 5

The matching algorithm flowchart.



Experimental Results

Experiments were carried out to measure the performance of our algorithm in terms of speed, matching precision, and matching accuracy. As proposed in this paper, methods of computation for selecting candidates, the hierarchical Zernike method,21 and the circular projection method of computation for selecting candidates in the frequency domain22 are compared. The hierarchical Zernike method utilizes the pyramid structure of the image and yielded results comparable to ours in terms of accuracy.


Test Images

The proposed algorithm has been tested with four types of variations: rotation, brightness change, contrast change, and added Gaussian noise.

Figure 6 shows a set of templates masked off from the original image, and Figures 7, 7, 7, 7 show rotated images at arbitrary angles in the scope of [20deg,20deg] . In particular, Fig. 7 shows a set for rotation only and Fig. 7 shows a set with additive Gaussian noise in addition to the rotation. Figs. 7 and 7 have brightness and contrast changes, respectively, both with rotations at arbitrary angles in the scope of [20deg,20deg] . In all of the images with different settings, the proposed algorithm has located the position of each template precisely.

Fig. 6

A set of templates.


Fig. 7

Test images (a circuit board). (a) Rotated image; (b) rotated image with Gaussian noise; (c) rotated image with brightness variation; and (d) rotated image with contrast variation.



Speed Test

To verify the matching performance of our proposed algorithm in terms of speed, several experiments were carried out with various sizes of templates for a set of 256×256 images. Template sizes varied from 31×31 to 71×71 . The proposed method was also compared in terms of the speed to the existing algorithm, the hierarchical Zernike method, and the circular projection method of computation for selecting candidates in the frequency domain. The speed test was performed on a 2.4-GHz Pentium 4 PC (Table 1).

Table 1

Speed of matching (in s).

Size oftemplateProposed methodCircular projectionHierarchical Zernike
31×31 0.02500.06250.3855
41×41 0.04200.12550.6612
51×51 0.05750.18111.0015
61×61 0.06070.25761.3232
71×71 0.07990.32421.6202

Figure 8 shows the speed comparison of several algorithms. The proposed algorithm turned out to be the fastest. For the time required to select candidates, Figure 9 shows that it increases very slowly in relation to the template size (Table 2).

Fig. 8

Speed comparison.


Fig. 9

Speed of each stage (first stage: candidate selection; second stage: fine matching).


Table 2

Speed of each stage (in s).

Size of template 31×31 41×41 51×51 61×61 71×71
First stageCandidate selection0.00320.00630.00820.00930.0109
Second stageFine matching0.02180.03570.04930.05140.0690


Matching Precision

To verify the interpolation performance of our proposed algorithm in terms of precision, the matching error was compared with the paraboloid surface fitting method (Fig. 10). Figure 11 shows the relationship between sections and precision. In the experiment, the interpolation algorithm was performed when the matching location is precise. Table 3 shows the experiment results of the matching precision comparison between the paraboloid surface fitting method and the proposed algorithm. Table 4 shows the results of the proposed algorithm under different sections.

Fig. 10

Comparison of the matching error between the paraboloid surface fitting method and the proposed method. (a) Error comparison on the x axis; and (b) error comparison on the y axis.


Fig. 11

Comparison of the matching error with different sections. (a) Error comparison on the x axis; and (b) error comparison on the y axis.


Table 3

Comparison of the experiment results.

SequenceNumberCoordinatesParaboloid surface fitting methodProposed method (4 sections)
Matching positionErrorMatching positionError
1(40, 90)(40.101, 89.909)( 0.101 , 0.091)(39.99975, 90.00036)(0.00025, 0.00036)
2(175, 262)(175.097, 261.898)( 0.097 , 0.102)(174.99980, 261.99904)(0.00020, 0.00096)
3(67, 196)(67.108, 196.068)( 0.108 , 0.068 )(66.99975, 195.99944)(0.00025, 0.00056)
4(30, 140)(30.02, 139.874)( 0.02 , 0.126)(30.00023, 139.99984)( 0.00023 , 0.00016)
5(24, 129)(23.981, 128.907)(0.019, 0.093)(24.00314, 129.00107)( 0.00314 , 0.00107 )
6(190, 116)(190.035, 115.897)( 0.035 , 0.103)(190.00019, 115.99832)( 0.00019 , 0.00168)

Table 4

Comparison of the experiment results.

SNCoordinatesProposed method (different sections)
1 point(1 section)2 points(2 sections)3 points(3 sections)4 points(4 sections)
1(40, 90)(40.00043,90.00043)( 0.00043 , 0.00043 )(39.99827,89.99911)(0.00173,0.00089)(39.99958,89.99943)(0.00042,0.00057)(39.99975,90.00036)(0.00025,0.00036)
3(67,196)(66.99935,195.99935)(0.00065,0.00065)(67.00115,195.99986)( 0.00115 ,0.00014)(67,195.99972)(0,0.00028)(66.99975,195.99944)(0.00025,0.00056)
4(30,140)(29.99980,139.99980)(0.0002,0.0002)(30.00089,139.99869)( 0.00089 ,0.00131)(30.00030,139.99941)( 0.00030 ,0.00059)(30.00023,139.99984)( 0.00023 ,0.00016)
5(24,129)(24.01803,129.01803)( 0.01803 , 0.01803 )(24.00110,129.00539)( 0.00110 , 0.00539 )(24.00236,129.00936)( 0.00236 , 0.00936 )(24.00314,129.00107)( 0.00314 , 0.00107 )
6(190,116)(190.00166,116.00166)( 0.00166 , 0.00166 )(190.00047,115.99938)( 0.00047 ,0.00062)(190.00093,116)( 0.00093 ,0)(190.00019,115.99832)( 0.00019 ,0.00168)

In Table 3, for the method of the paraboloid surface fitting, it is obvious that the minimum of the absolute error is 0.019 and 0.068 in the x axis and y axis, respectively, while the maximum of the absolute error is 0.00314 for x and 0.00168 for y , according to the proposed algorithm (four sections). Obviously, the proposed algorithm is more accurate than the method of the paraboloid surface fitting, which improves the matching accuracy by at least an order of magnitude. Figure 10 shows the error comparison for the two methods.

In Table 4, when one point is selected, it is the point corresponding to the vertex of the fitting parabola along the direction of 135deg . When two points are selected, they are the vertices of the horizontal and vertical fitting parabolas, respectively; when three points are selected, they are the vertices of the fitting parabolas corresponding to the horizontal, vertical, and 135deg directions.

Obviously, the precision increases continuously with the increase of the sections (Fig. 11). However, the quantity and complexity of the computation also increase accordingly with more points selected: When only one point is selected, this point is the best matching location; when two points are selected, it is obvious that the midpoint is the best matching location; when three points are selected, the best matching location is the Fermat point. Fig. 4 shows the analytic results for four points, and the complexity of the computation increases tremendously for more than five points.


Matching Accuracy

For the accuracy of our proposed algorithm, a matching ratio was also compared. The matching ratio is defined as the number of correct matches to the number of trials. For this accuracy testing, 30 similar images of electronic circuit boards had been captured, and 15 templates of the area of interest were selected manually from each image, resulting in 450 templates in total. Various types of noise such as Gaussian noise and contrast/brightness changes were added to each image. Then each template was searched for from the corresponding image. The matching ratio indicates how well the queried template has been retrieved at the precise location from the corresponding image. Obviously, the matching ratio depends on the number of candidates for each trial. The ratio becomes higher as the number of candidates for each template increases, but the computation takes longer.


Rotation angles

Figure 12 shows the relationship between the accuracy and the rotation angles. The rotation matching test was performed with each rotation of 5deg . The transverse error Δx and the longitudinal error Δy were required to satisfy the constraint (NΔx)(NΔy)80%N2 in order to have 80% of the overlap between the template and the subimage the same size as the template, which can guarantee obtaining the accurate scaled and rotated factors by using the Fourier-Mellin transform (Fig. 13),

Fig. 12

Matching ratio with overlapping variation.


Fig. 13

Matching ratio with rotation angle variation.


where N is the size of the template, and N2 is the area of the template.


Gaussian noise

Additive white Gaussian noise was added to each image. The plot in Fig. 14 shows the relationship between the accuracy and the number of candidates. As evidenced by the experiment, when 40 candidates were selected for each template, the accuracy ratio ranked highest. The accuracy degrades by only a small portion as the SNR decreases because the binarization reduces the effect of Gaussian noise.

Fig. 14

Matching ratio under Gaussian noise.



Brightness/contrast variation

Figure 15 shows the matching accuracy result when the brightness of the image changes due to the lighting conditions. The proposed algorithm survives by more than 90% even with severe changes ( 40 to 40 in gray values) in brightness. This is because the binarization eliminates the effect of the overall brightness change. On the other hand, the correlation algorithms are not sensitive to brightness variation. Figure 16 shows the matching accuracy when the contrast of the image changes. The proposed algorithm turned out to be robust to contrast change too.

Fig. 15

Matching ratio with brightness variation.


Fig. 16

Matching ratio with contrast variation.


Equations 22, 23 were used to change the brightness and the contrast of the images. The abscissa in Figs. 15 and 16 indicates the value of d in the following equations:






Terrain Matching Application

We applied the proposed method for terrain matching using a sand table simulating real terrain. Generally, the real-time 3-D terrain data is recovered from the optical images in the observed scene, which is called the reference evaluation map (REM). Suppose that a reference digital elevation map (DEM) called the airborne-referenced DEM is achieved with the feature matching method and surface natural neighbor interpolation in position A (Ref. 23, 24. In position B , with the same method, we will achieve REM, which is also called real-time REM. With the proposed method, we match the achieved REM and DEM.

REM and DEM are both converted to contour maps, which are called the REM contour map (RCM) and the DEM contour map (DCM), using the same method as in Yu .25 Thus, the problem of terrain matching is converted to the matching of RCM and DCM.

For this particular experiment, the scaling factor between RCM and DCM is 1 because the REM has the same resolution as the DEM. Therefore, we consider only the situation in which the two maps contain rotation and translations.

Additive white Gaussian noise was added to the 3-D terrain data. Figure 17 is the matching results of the REM with 15-dB Gaussian noise, and the joined 15-dB noise can indicate that the fluctuating scope of 3-D data is [17.4,17.4]mm , while the real sand table height is 287mm therefore, the fluctuating scope of 3-D is equivalent to the precision changes δ±17.4287 —that is, about ±6.1% . Figure 17 shows the sparse feature matching result using the method in Li and Zhang.23 Figure 17 is the sparse feature matching result with 15-dB noise and 20-deg rotation. Figures 17 and 17 are DEM and REM, respectively. The contours of DEM are shown in Figure 17. Figure 17 is the contour maps of REM. A set of templates intercepted from Fig. 17 is shown in Fig. 17, and the size of RCM is 60×60pixels . Using the method proposed in Sec. 3.2, the rotation angle is 19.8757deg between RCM and DCM, so the error is 0.1243deg . RCM overlaid onto DCM is shown in Fig. 17.

Fig. 17

Matching result of an REM with 15-dB noise to the rotated reference DEM. (a) The sparse feature matching result; (b) the sparse feature matching result with 15-dB noise and 20-deg rotation; (c) reference DEM; (d) REM; (e) contour map of (c); (f) contour map of (d); (g) a set of templates; and (h) RCM overlaid onto DCM.


Experiment 1

Figure 18 shows the matching results of REM and DEM obtained in the different position and height. There are certain differences between them due to surface interpolation, so the constructed contour maps are different. The matching results are that the rotation is 11.2541deg and the translations are (22.7124,12.3686) . But the real parameters are not known.

Fig. 18

Matching results of sand table simulating the real terrain. (a) RCM; (b) DCM; (c) matching results; and (d) RCM overlaid onto DCM.


Experiment 2

Figure 19 shows the matching results of REM and DEM obtained in the different position and height. There are a certain differences between them due to surface interpolation, so the constructed contour maps are different. The matching results are that the rotation is 20.5101deg and translations are (4.9662,19.0128) . And the real parameters are also not known.

Fig. 19

Matching results of sand table simulating the real terrain. (a) RCM; (b) DCM; (c) matching results; and (d) RCM overlaid onto DCM.




In this paper, we proposed a fast and robust algorithm for template matching of an arbitrarily rotated image when the fluctuating scope of the rotation angle is [20deg,20deg] . The proposed method employed to select candidates has efficiently reduced the order of feature space using the AND operation. In the experiment, we showed that it was computationally simple and fairly reliable for selecting candidates. In addition, the time complexity for the candidate selection process increased very slowly compared to the template size. But it is necessary to point out that the excellent performance of the proposed method employed to select candidates was achieved at the expense of matching accuracy. Therefore, the Fourier-Mellin transform was used to remedy this problem. Experimental results also show that the matching precision is subpixel using a novel method, which is better than the paraboloid surface fitting method.


We appreciate the effort of the reviewers, whose helpful comments improved this paper.


1.  J. K. Aggarwal, L. S. Davis, and W. N. Martin, “Correspondence process in dynamic scene analysis,” Proc. IEEE0018-9219 69(5), 562–572 (1981). 10.1109/PROC.1981.12025 Google Scholar

2.  E. S. Baker and R. D. Degroat, “A correlation-based subspace tracking algorithm,” IEEE Trans. Signal Process.1053-587X 46(11), 3112–3116 (1998). 10.1109/78.726827 Google Scholar

3.  L. G. Brown, “A survey of image registration techniques,” ACM Comput. Surv.0360-0300 24(4), 325–376 (1992). 10.1145/146370.146374 Google Scholar

4.  U. M. Cahnvon and R. Bajcsy, “Adaptive correlation tracking of targets with changing scale,” Univ. of Pennsylvania: GRASP Laboratory Technical Report No. MS-CIS-96-22 (1996). Google Scholar

5.  R. G. Caves, P. J. Harley, and S. Quegan, “Matching map features to synthetic aperture radar (SAR) images using template matching,” IEEE Trans. Geosci. Remote Sens.0196-2892 30(4), 680–685 (1992). 10.1109/36.158861 Google Scholar

6.  J. P. Secilla and N. Garcia, “Template location in noisy pictures,” Signal Process.0165-1684 14, 347–361 (1987). 10.1016/0165-1684(88)90093-X Google Scholar

7.  R. Y. Wong and E. L. Hall, “Sequential hierarchical scene matching,” IEEE Trans. Comput.0018-9340 27(4), 359–366 (1978). 10.1109/TC.1978.1675108 Google Scholar

8.  A. Rosenfeld and A. Kak, Digital Image Processing, 2nd ed., vol. 2, Academic Press, Orlando (1982). Google Scholar

9.  S. W. Lee and W. Y. Kim, “Rotation-invariant template matching using projection method,” Proc. Korean Inst. Telecommun. Electron. 19(1), 475–476 (1996). Google Scholar

10.  R. J. Prokop and A. P. Reeves, “A survey of moment-based techniques for unoccluded object representation and recognition,” Graph. Models Image Process.1077-3169 54(5), 438–460 (1992). 10.1016/1049-9652(92)90027-U Google Scholar

11.  Q. S. Chen, D. Michel, and F. Deconinck, “A symmetric phase-only matched filtering of Fourier-Mellin transforms for image registration and recognition,” IEEE Trans. Pattern Anal. Mach. Intell.0162-8828 16(12), 1156–1168 (1994). 10.1109/34.387491 Google Scholar

12.  C. Kan and M. D. Srinath, “Invariant character recognition with Zernike and orthogonal Fourier-Mellin moments,” Pattern Recogn.0031-3203 35(1), 143–154 (2002). 10.1016/S0031-3203(00)00179-5 Google Scholar

13.  V. Martin, “Wavelets and filter banks: theory and design,” IEEE Trans. Signal Process.1053-587X 40(9), 2207–2232 (1992). 10.1109/78.143454 Google Scholar

14.  B. Murat, E. K. Misha, and L. M. Eric, “Wavelet domain image restoration with adaptive edge-preserving regularization,” IEEE Trans. Image Process.1057-7149 9(4), 597–608 (2000). 10.1109/83.841937 Google Scholar

15.  L. D. Stephano and S. Mattoccia, “Fast template matching using bounded partial correlation,” Mach. Vision Appl.0932-8092 13(4), 213–221 (2003). 10.1007/s00138-002-0070-5 Google Scholar

16.  L. A. D. Hutchinson and W. A. Barrett, “Fast registration of tabular document images using the Fourier-Mellin transform,” in IEEE Proc. First Int. Workshop Document Image Analysis for Libraries (DIAL‘04) 16(12), 1256–1270 (2004). Google Scholar

17.  X. Zhao, “A study on robust image curve matching techniques,” PhD. thesis, Beijing Institute of Automation, Chinese Academy of Sciences (2004). Google Scholar

18.  A. Averbuch and Y. Keller, “FFT based image registration,” in IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 4, pp. 3601–3608 (2002). Google Scholar

19.  Reddy B. Srinivasa and B. N. Chatterji, “An FFT-based techniques for translation, rotation, and scale-invariant image registration,” IEEE Trans. Image Process.1057-7149 5(8), 1266–1271 (1996). 10.1109/83.506761 Google Scholar

20.  C. Siwaphon, C. Orachat, and T. Kiatnarong, “Image registration using Hough transform and phase correlation,” in Proc. Int. Congress on Anti-Cancer Treatment (ICACT), pp. 20–22 (2006). Google Scholar

21.  S. G. Kim and W. Y. Kim, “Hierarchical grayscale template matching for an arbitrarily oriented object,” in Proc. Korea Signal Process. Conf. 96, pp. 1335–1338 (1996). Google Scholar

22.  M. S. Choi and W. Y. Kim, “A novel two stage template matching method for rotation and illumination invariance,” Pattern Recogn.0031-3203 35(1), 119–129 (2002). 10.1016/S0031-3203(01)00025-5 Google Scholar

23.  H. Li and G. Zhang, “A new fast edge-matching algorithm based on corner constraint and edge constraint,” Proc. SPIE0277-786X 6358, 635815 (2006). 10.1117/12.717813 Google Scholar

24.  M. Yanalak, “Effect of gridding method on digital terrain model profile data based on scattered data,” J. Comput. Civ. Eng.0887-3801 17(1), 58–67 (2003). 10.1061/(ASCE)0887-3801(2003)17:1(58) Google Scholar

25.  Q. Yu, J. Tian, and J. Liu, “A novel contour-based 3D terrain matching algorithm using wavelet transform,” Pattern Recogn. Lett.0167-8655 25(1), 87–99 (2004). 10.1016/j.patrec.2003.09.004 Google Scholar


057001_1_m1.jpg Guangjun Zhang received his BS, MS, and PhD degrees from the Department of Precision Instruments Engineering, Tianjin University, China, in 1986, 1989, and 1991, respectively. He was a visiting professor at North Dakota State University from 1997 to 1998. He is currently a professor and dean of the School of Automation Science and Electrical Engineering, BeiHang University, China. His research interests are machine vision, laser precision measurement, optoelectronic precision instrumentation, and artificial intelligence.

057001_1_m2.jpg Ming Lei received both his BS and his MS from the Automatic Control Department, Daqing Petroleum Institute, China, in 2000 and 2003, respectively. Since 2004, he has been a PhD candidate at Beijing University of Aeronautics and Astronautics, China. His research interests include pattern recognition for machine vision systems and content-based image retrieval.

Xulin Liu biography and photograph not available.

Guangjun Zhang, Ming Lei, Xulin Liu, "Novel template matching method with sub-pixel accuracy based on correlation and Fourier-Mellin transform," Optical Engineering 48(5), 057001 (1 May 2009). https://doi.org/10.1117/1.3125425

Back to Top