Vision-based displacement measurement sensor using modified Taylor approximation approach

The development of image sensors and optical lenses has contributed to the rapidly increasing use of vision-based methods for noncontact measurement in many areas. A high-speed camera system is developed to realize displacement measurement in real time. Conventional visual measurement algorithms commonly suffer from various shortcomings, such as complex processing, multiparameter adjustment, or integer-pixel accuracy. Inspired by the combination of a block-matching algorithm and simplified optical flow, a motion estimation algorithm that uses a modified Taylor approximation is proposed and applied to the vision sensor system. Replacing integer-pixel searching with a rounding-iterative operation enables the modified algorithm to complete one displacement extraction within 1 ms and yield satisfactory subpixel accuracy. The performance of the vision sensor is evaluated through a simulation test and two experiments on a grating ruler motion platform and the steering wheel system of a forklift. Experimental results show that the developed vision sensor can extract accurate displacement signals and accomplish vibration measurement of engineering structures.


Introduction
Noncontact measurement techniques, such as speckle photography, 1 hologram interferometry, 2 and laser Doppler vibrometry, 3 have been developed for years and are well applied in various fields. Compared with traditional measurement devices, such as accelerometers or linear displacement gauges, noncontact devices offer more flexible installation and provide an intuitive view of the actual movements of the target without affecting its behavior. In environments where traditional sensors have no clear access or cannot work effectively, e.g., remote measurement targets or fields with high temperature or strong magnetic fields, noncontact measurement devices have great advantages over conventional ones. However, most noncontact equipment is expensive and requires strict mounting structures, which limits the wide use of such systems in practical applications.
Technological developments in image sensors and optical lenses have contributed to the rapidly increasing use of vision-based measurement methods as noncontact measurement methods in numerous research and industrial areas, such as vibration analysis, 4,5 condition monitoring, [6][7][8][9][10][11] human motion, 12,13 and underwater measurement. 14 With relatively lower cost and better structural flexibility, optical devices and cameras offer effective alternatives to other noncontact equipment. Benefiting from the wide availability of affordable high-quality digital image sensors and high-performance computers, inexpensive high-resolution cameras have been increasingly used in many areas. Recently, vision-based techniques have been successfully used to measure various structures with satisfactory results. [15][16][17][18][19][20][21][22][23][24] Quan et al. 25 achieved three-dimensional displacement measurement based on two-dimensional (2-D) digital image correlation (DIC). Kim et al. 26 proposed a vision-based monitoring system that uses DIC to evaluate the cable tensile force of a cable-stayed bridge. The same method has also been applied in experimental mechanics for noncontact, full-field deformation measurement. 27,28 Park et al. 29 realized displacement measurement for high-rise building structures using a partitioning approach. Wahbeh et al. 30 measured the displacements and rotations of the Vincent Thomas Bridge in California by using a highly accurate camera in conjunction with a laser tracking reference. Fukuda et al. 31 proposed a camera-based sensor system in which a robust object search algorithm was used to measure the dynamic displacements of large-scale structures. Feng et al. 32 developed a vision-based sensor that employs an up-sampled cross-correlation (UCC) algorithm for noncontact structural displacement measurement, which can accurately measure the displacements of bridges. 33
The traditional camera system for displacement measurement is composed of commercial digital cameras and video-processing devices (normally computers). However, ordinary digital cameras often have a low video-sampling rate, which limits the range of vibration frequencies they can measure. 34 To overcome this restriction, high-speed vision systems with 1000 frames per second (fps) or even higher have been developed and applied in practice. 35 In the present paper, a high-speed camera sensor system is developed, composed of a zoom optical lens, a high-speed camera body with a CCD receiver, and a notebook computer. A USB 3.0 interface is used to ensure stable data transfer between the camera body and the computer. On the notebook computer, the captured video is processed by software to track an object and extract motion information in real time.
Similar to other measurement equipment, a vision-based measurement system is mainly concerned with measurement speed and accuracy, both of which depend significantly on the performance of the image-processing algorithm. Owing to their high sampling rate, high-speed sensors place strict demands on the computing speed of motion-tracking algorithms to satisfy real-time signal processing. Conventional motion extraction algorithms based on template-matching registration techniques [e.g., sum of absolute differences (SAD) or normalized cross-correlation (NCC)] are mostly complex and computationally heavy. Moreover, template-matching techniques can only achieve integer-pixel resolution because the minimal unit in a video image is 1 pixel. Such accuracy is far from satisfactory in numerous practical applications, particularly those that must measure the small vibrations of structures. Various methods have been proposed to refine measurement accuracy, including interpolation techniques and subpixel registration, [36][37][38] most of which indeed improve accuracy but exhibit low computational efficiency to some degree.
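As a point of comparison, integer-pixel template matching of the NCC type can be sketched in a few lines (an illustrative NumPy implementation, not the code used in this paper; the function name `ncc_match` is ours). It scans every candidate position, so the returned offset is limited to whole pixels:

```python
import numpy as np

def ncc_match(frame, template):
    """Integer-pixel template matching by normalized cross-correlation.

    Exhaustively scans every candidate position in `frame` and returns
    the (row, col) offset of the best match. Resolution is limited to
    whole pixels, which is the deficiency discussed in the text.
    """
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(frame.shape[0] - th + 1):
        for c in range(frame.shape[1] - tw + 1):
            w = frame[r:r + th, c:c + tw]
            wz = w - w.mean()
            denom = np.sqrt((wz ** 2).sum()) * t_norm
            score = (wz * t).sum() / denom if denom > 0 else -np.inf
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos
```

The double loop over every candidate position illustrates why such exhaustive matching is computationally heavy compared with the gradient-based approach developed later in the paper.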
Chan et al. 39 proposed a subpixel motion estimation method that uses a combination of classical block matching and simplified optical flow. The method is tremendously faster than existing block-matching algorithms because no interpolation is needed. In the first step, a block-matching algorithm, such as three-step search (TSS) or cross-diamond search, is used to determine the integer-pixel displacement. The result is then refined to subpixel level through local approximation using simplified optical flow. In this paper, we simplify Chan's algorithm by replacing block matching with a rounding-iterative Taylor approximation. With no subpixel interpolation needed during frame cutting in each iteration, the modified algorithm runs much faster than conventional iterative DIC optical flow. Given that the improvement introduces no additional parameter requiring specification, the modified algorithm naturally executes with a high degree of automation. After several rounds of optimization, the computation time of one extraction in the modified algorithm is reduced to less than 1 ms. The modified algorithm is utilized in the high-speed camera system for its high efficiency and satisfactory subpixel accuracy. A simulation and two experiments under laboratory and realistic conditions are carried out for performance verification. The positive results demonstrate the accuracy and efficiency of the camera sensor system in measuring dynamic displacement.
The rest of the paper is organized as follows. Section 2 introduces the components and capability parameters of the high-speed vision sensor system. Section 3 presents the theory of motion estimation algorithm without interpolation and the modified Taylor algorithm. Section 4 evaluates the performance of the modified algorithm with a simulation test. Section 5 presents two experiments for performance verification. Section 6 discusses the results and outlook.

High-Speed Vision Sensor System
Traditional camera systems for displacement measurement are commonly composed of commercial digital cameras and personal computers. However, commercial digital cameras usually have low frame rates (e.g., 100 fps), which limits their application to vibration frequencies below 50 Hz. In this paper, a high-speed sensor system composed of a notebook computer (Intel Core processor 2.9 GHz, 2.75 GB RAM) and a video camera with a telescopic lens is developed for displacement measurement, as shown in Fig. 1(a). The telescopic lens has a large zooming capability [Fig. 1(b)] that can meet measurement requirements at different distances. The camera head uses a CCD sensor as the image receiver, which can capture 8-bit gray-scale images at a maximum of 1000 fps when the image resolution is set to 300 pixels × 300 pixels. A USB 3.0 interface is used to ensure stable data transfer between the camera and the computer. With its high sampling rate and computing efficiency, the image-processing software on the notebook computer can use the refined Taylor algorithm to track specific fast-moving objects and extract motion information in real time.
A target panel preinstalled on the target is very helpful for ensuring extraction accuracy during measurement. If a target panel is unavailable because of limitations of the measurement environment, distinct surface patterns of the structure, such as textures or edges, can be used as tracking templates. The camera system is then ready to capture images from a remote location, and the displacement time history of the structure can be obtained by applying the displacement tracking algorithm to the digital video images.

Subpixel Motion Estimation Without Interpolation
Figure 2 shows the subpixel motion estimation algorithm, which combines a block-matching algorithm with simplified optical flow. Two consecutive frames, f(x, y) and g(x, y), with real displacement (Δx, Δy) are given. The real displacement can be divided into an integer part (Δx̄, Δȳ) and a subpixel part (δx, δy) as

$$ (\Delta x, \Delta y) = (\overline{\Delta x} + \delta x, \overline{\Delta y} + \delta y). \qquad (1) $$

A block-matching algorithm is first applied to estimate the integer-pixel displacements Δx̄ and Δȳ. When the integer part is determined, the image block f(x, y) is shifted by Δx̄ pixels in the x-direction and Δȳ pixels in the y-direction.
For the subpixel part, a Taylor series approximation is used to refine the search. The shifted image f(x + Δx̄, y + Δȳ) differs from the accurate location only by |δx| < 1 and |δy| < 1, which can be computed using a one-step Taylor approximation.
The total displacement is determined by combining the integer part and the subpixel part. An analytical error analysis 39,40 was derived in one dimension and generalizes straightforwardly to the 2-D case. The results imply that this two-step method can extract more accurate motion vectors than other block-matching algorithms. Because no interpolation or motion-compensated frames are required, the algorithm is much faster than conventional methods.

Taylor Approximation with Rounding-Iterative Operation
In this section, an analytic model is built to illustrate the proposed modified algorithm used in the sensor system. Figure 3 illustrates the displacement extraction procedure from consecutive frames k and k + 1. A subimage f(x, y) in frame k is selected as the matching template. With all its pixels moved by the displacement vector p⃗ = (Δx, Δy)^T, the template image becomes a new subimage g(x, y) at the same position in the next frame.
With the assumption of brightness constancy (intensity conservation), the relationship between the template images f(x, y) and g(x, y) at the same position in frame k + 1 can be written as

$$ g(x, y) = f(x + \Delta x, y + \Delta y). \qquad (2) $$

Note that the assumption of surface radiance remaining fixed from one frame to the next rarely holds exactly. Provided that the scene contains no specularities, object rotations, or secondary illumination (shadows or inter-surface reflections), the brightness constancy assumption works well in practice. 40 Given that the displacement vector p⃗ = (Δx, Δy)^T is usually small (normally a few pixels), Eq. (2) can be approximated using a first-order Taylor expansion, with the higher-order terms ignored, as

$$ g(x, y) \approx f(x, y) + \Delta x \, f_x(x, y) + \Delta y \, f_y(x, y), \qquad (3) $$

where f_x and f_y denote the partial derivatives of f(x, y). With two unknowns, Δx and Δy, in one equation, a linear least squares (LS) estimator minimizes the squared errors:

$$ E(\vec{p}) = \sum_{x, y} \left[ f(x, y) + \Delta x \, f_x(x, y) + \Delta y \, f_y(x, y) - g(x, y) \right]^2. \qquad (4) $$

As a linear LS problem, the minimum of E(p⃗) can be found by setting its derivatives with respect to p⃗ to zero:

$$ \frac{\partial E}{\partial \Delta x} = 0, \qquad \frac{\partial E}{\partial \Delta y} = 0. \qquad (5) $$

Equation (5) can be written in matrix form as

$$ \nabla I \, \vec{p} = \Delta I, \qquad (6) $$

in which ΔI is the difference matrix and ∇I denotes the gradient matrix, stacked over the template pixels as

$$ \nabla I = \begin{bmatrix} f_x^{(1)} & f_y^{(1)} \\ \vdots & \vdots \\ f_x^{(n)} & f_y^{(n)} \end{bmatrix}, \qquad \Delta I = \begin{bmatrix} g^{(1)} - f^{(1)} \\ \vdots \\ g^{(n)} - f^{(n)} \end{bmatrix}, \qquad (7) $$

where n refers to the number of pixels in the selected template.
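This one-step least-squares estimate can be sketched compactly (illustrative NumPy code, not the paper's implementation; the name `taylor_step` is ours, gradients are taken by central differences as in the text, and the estimate is only valid for shifts below 1 pixel):

```python
import numpy as np

def taylor_step(f, g):
    """One-step Taylor displacement estimate between templates f and g.

    Builds the gradient matrix and difference matrix of the linearized
    brightness-constancy equation and solves the linear least-squares
    system grad(I) . p = dI for the subpixel shift p = (dx, dy).
    Valid when |dx| < 1 and |dy| < 1.
    """
    # central-difference gradients; np.gradient returns d/drows (y) first
    fy, fx = np.gradient(f.astype(float))
    A = np.column_stack([fx.ravel(), fy.ravel()])    # gradient matrix, n x 2
    b = (g.astype(float) - f.astype(float)).ravel()  # difference matrix, n x 1
    # least-squares solution of A p = b
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p  # (dx, dy)
```

On a smooth synthetic image shifted by a known subpixel amount, this single solve recovers the shift to a few hundredths of a pixel, which matches the role it plays in the refinement step.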
In the general LS sense, if ∇I^T∇I is invertible (full rank), the displacement vector can be expressed by the LS estimate

$$ \vec{p} = (\nabla I^T \nabla I)^{-1} \nabla I^T \Delta I. \qquad (8) $$

With Eq. (8), displacement vectors between adjacent frames can be obtained precisely on the condition of a minute interframe displacement, because the Taylor approximation is valid only when |Δx| < 1 and |Δy| < 1. In practical applications, however, the interframe displacement may be larger than this, in which case the vector p⃗ may be correct in direction but inaccurate in magnitude. Therefore, a rounding-iterative process is introduced to solve this problem and guarantee accuracy. In each calculation step, the components Δx_j and Δy_j of p⃗_j = (Δx_j, Δy_j)^T are rounded to the nearest integers, and the iteration continues until the termination condition (|Δx_j| < 0.5 and |Δy_j| < 0.5) is satisfied. Figure 4 shows the architecture of the proposed displacement extraction method with the modified iteration process involved. The procedure of the proposed modified method can be summarized as follows:

Step 1: Cut f(x, y) and g(x, y) from consecutive frames at the same position.
Step 2: Compute the partial derivatives f_x and f_y of f(x, y) by central differences.
Step 3: Compute the difference matrix ΔI and the gradient matrix ∇I according to Eq. (7).
Step 4: Compute the displacement vector p⃗_j = (Δx_j, Δy_j)^T according to Eq. (8). If both |Δx_j| and |Δy_j| are less than 0.5, proceed to Step 5; otherwise, update f(x, y) to f[x + round(Δx_j), y + round(Δy_j)] and return to Step 2 (round denotes rounding to the nearest integer).
Step 5: Accumulate all Δx_j and Δy_j to obtain the refined displacement vector p⃗ = (Δx, Δy)^T.
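The five steps above can be sketched end to end as follows (an illustrative NumPy version, not the sensor software; the window handling and the name `modified_taylor` are our assumptions). Because each re-cut shifts the template window by a whole number of pixels, no interpolation is ever needed:

```python
import numpy as np

def modified_taylor(frame_k, frame_k1, top, left, size, max_iter=20):
    """Rounding-iterative Taylor displacement extraction (Steps 1-5).

    Each iteration solves the one-step Taylor least-squares system,
    rounds the update to the nearest integers to re-cut the template
    (no interpolation), and stops once both components of the update
    fall below 0.5 pixel.
    """
    # Step 1: cut g from frame k+1 at the fixed template position
    g = frame_k1[top:top + size, left:left + size].astype(float)
    rx = ry = 0  # accumulated integer shift of the template window
    for _ in range(max_iter):
        # re-cut f from frame k, shifted by the accumulated integers
        f = frame_k[top + ry:top + ry + size,
                    left + rx:left + rx + size].astype(float)
        # Steps 2-3: central-difference gradients, then gradient and
        # difference matrices of Eq. (7)
        fy, fx = np.gradient(f)
        A = np.column_stack([fx.ravel(), fy.ravel()])
        b = (g - f).ravel()
        # Step 4: least-squares displacement update, Eq. (8)
        (dx, dy), *_ = np.linalg.lstsq(A, b, rcond=None)
        if max(abs(dx), abs(dy)) < 0.5:  # termination condition
            break
        rx += int(round(dx))             # integer re-cut, no interpolation
        ry += int(round(dy))
    # Step 5: accumulate integer and subpixel parts
    return rx + dx, ry + dy
```

On a smooth synthetic scene shifted by several pixels plus a subpixel fraction, the loop typically converges in two or three iterations, consistent with the small per-frame computation time reported for the method.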
With this rounding-iterative modification, integer-level motion estimation is also accomplished with optical flow instead of block matching. The modification of the original method is simple in that it introduces no unnecessary computation or pixel interpolation. The rounding operation eliminates the subpixel interpolation in each frame cutting, which makes the algorithm much faster than conventional iterative DIC optical flow. The algorithm naturally executes with a high degree of automation because the improvement brings no additional parameter that requires specification. Although the rounding operation significantly decreases the time consumed on subpixel interpolation, the modified algorithm may, to some extent, sacrifice accuracy because of its relatively loose termination condition. Fortunately, the proposed method delivers stable, satisfactory subpixel results and executes with high efficiency in the following simulation and experiments. Thus, the algorithm can be used in high-speed camera systems to measure displacement in real time. The comparative simulation and experiments for validation are presented in the following sections.

Simulation Test
The performance of the proposed modified Taylor algorithm is first evaluated through a simulation test. The simulation presents a simple case with only one vignetted black circle (160 pixels in diameter) on a white background, as shown in Fig. 5. The black circle is programmed to move along the following ellipse:

$$ x(t) = 10 \cos(2\pi f t), \qquad y(t) = 6 \sin(2\pi f t). \qquad (9) $$

The maximum displacements in the x- and y-directions are 10 and 6 pixels, respectively. The rotation frequency is set to 1 Hz, and the sampling frequency is 50 Hz. Four algorithms, namely, classical NCC, UCC, TSS with optical flow, and the proposed modified Taylor, are applied to extract the motion displacement of the moving circle. All programming and computation are done in MATLAB R2015a.
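The ground-truth trajectory follows directly from Eq. (9); a short sketch with the stated parameters (1 Hz rotation, 50 Hz sampling) reproduces the sampled motion curves against which the extraction results are compared (illustrative NumPy code; variable names are ours):

```python
import numpy as np

# Ground-truth motion of the simulated circle, Eq. (9):
#   x(t) = 10 cos(2*pi*f*t),  y(t) = 6 sin(2*pi*f*t)
f_rot = 1.0                         # rotation frequency, Hz
fs = 50.0                           # sampling frequency, Hz
t = np.arange(0, 1.0, 1.0 / fs)     # one full period, 50 samples
x = 10.0 * np.cos(2 * np.pi * f_rot * t)   # horizontal displacement, pixels
y = 6.0 * np.sin(2 * np.pi * f_rot * t)    # vertical displacement, pixels
```

The sampled maxima match the stated peak displacements of 10 and 6 pixels (up to the 0.02 s sampling grid).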
During the test, an 80 × 80 pixels region (within the red box) is selected as the tracking template to ensure a stable tracking error. 41 The UCC algorithm is an advanced subpixel image registration technique that allows the resolution to be adjusted by changing the up-sampling factor. 34 The up-sampling factors for the UCC algorithm are set to 1, 10, and 100 for subpixel levels of one integer pixel, 0.1 pixel, and 0.01 pixel, respectively. However, the UCC algorithm gives accurate displacement results only when the template is sufficiently large; its results under the same template condition are marked with an asterisk in Table 1.
Motion extraction results of the two integer-level methods, namely, NCC and UCC (usfac = 1), are shown in Figs. 6(a) and 6(b). These two algorithms locate the template's best-matching region only to whole-pixel precision, and this deficiency leads to step-shaped curves and reduced extraction accuracy. Results of subpixel-level motion extraction are shown in Figs. 6(c)-6(f). The figures show that the motion curves of the subpixel-level algorithms are obviously smoother than those of NCC and UCC (usfac = 1).
Quantitative comparison results regarding tracking error and computation time are given in Table 1. The table indicates that as the subpixel resolution level improves from 1 to 0.01 pixel, the absolute average horizontal error of the UCC algorithm reduces from 0.2309 to 0.0548 pixel, and the absolute average vertical error reduces from 0.2378 to 0.0481 pixel, while the computation time increases accordingly.

Owing to its satisfactory time efficiency and accuracy during displacement extraction in the simulation test, the modified Taylor algorithm was integrated into the real-time vision sensor system described in Sec. 2. The software contains several modules. The high-speed camera module controls the parameters of the digital camera, such as contrast, brightness, and exposure time. The calibration module computes the actual displacement represented by one pixel based on a target of known size. The image-capturing module acquires the streaming image data in real time and sends them to the template-tracking module, where the modified Taylor algorithm operates. The entire sensor system is implemented based on the Qt and OpenCV libraries and is capable of realizing displacement measurement of actual structures.

Case 1: Experiment on a Grating Ruler Motion Platform
To evaluate the performance of the developed vision-based sensor system, an experimental verification was carried out on a laboratory platform with a conventional grating ruler, as shown in Fig. 7. Using the Moiré fringe technology of gratings and photoelectric conversion, incremental grating displacement sensors are widely used as high-accuracy displacement measurement tools with numerous advantages, such as stability, reliability, and high accuracy. The experimental installation is shown in Fig. 7(a). The grating ruler displacement sensor was installed on the moving table platform, with its reading head moving synchronously with the junction plate in the horizontal direction. With this structure, the displacement of the target can be recorded simultaneously by the high-speed sensor system and the grating ruler for comparison. The sampling frequency of the grating ruler sensor used in the experiment is 20 Hz, and the grating pitch is 0.02 mm with a resolution of 1 μm.
In the experiment, the vision-based high-speed camera system was evaluated against the grating ruler. As shown in Fig. 7(a), a circular target with a diameter of 20 mm was installed on the junction plate in advance. The target can be programmed to move with arbitrary amplitudes and frequencies in the horizontal direction. The video camera was placed at a stationary position 3 m away from the platform and captured the moving target at a resolution of 160 × 160 pixels at 200 fps. To express the measured displacement in physical units, the actual size of the preinstalled target panel in the video images was calculated. The results showed that 20 mm in the real world corresponds to 104.7 pixels in the captured images, which means the pixel resolution is 0.191 mm∕pixel. A 50 × 50 pixels region on the target, as shown in Fig. 7(b), was chosen as the template image.
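The scale calibration here is a single ratio of known physical size to measured pixel size; as a quick arithmetic check using the values from the text (variable names are ours):

```python
# mm-per-pixel calibration from the target of known size (values from the text)
target_size_mm = 20.0    # physical diameter of the circular target
target_size_px = 104.7   # measured diameter in the captured image, pixels

mm_per_pixel = target_size_mm / target_size_px   # ~0.191 mm/pixel, as reported
```

All subsequent pixel displacements are multiplied by this factor to obtain millimeters, which is how the software's calibration module converts tracked motion into physical units.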
The guide screw was driven by a 10-s manual arbitrary input. As shown in Fig. 8, the horizontal displacement time history measured by the vision-based system is compared with that measured by the grating ruler sensor. The grating ruler data (green dashed line) match well with the vision-based sensor data (blue dashed line). The integer-level tracking result is marked with a red solid line; its step-type shape indicates that integer-level algorithms can only acquire the integer-pixel motion of the target, which leads to a large measurement error.
Similar to the simulation test, the captured video was analyzed by the different motion extraction algorithms previously mentioned. Quantitative experimental results regarding tracking error and computation time are given in Table 2. To further evaluate the error performance, the normalized root mean squared error (NRMSE) is introduced, in which n denotes the frame number and a and b refer to the displacement data measured by the vision-based sensor and the grating ruler, respectively. The results indicate that the vision sensor system with the modified Taylor algorithm has the lowest NRMSE of 0.75% and the fastest average computing time per frame of 0.22 ms. The absolute average measuring error of the modified Taylor algorithm was 0.020 mm. With a pixel resolution of 0.191 mm∕pixel, the proposed sensor system achieved an accuracy of roughly 1/9 pixel in this experiment.
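The NRMSE metric can be computed as below. Since the section does not reproduce the exact formula, this sketch assumes one common convention, the RMSE normalized by the range of the reference signal and expressed as a percentage; the paper's normalization may differ:

```python
import numpy as np

def nrmse(a, b):
    """NRMSE between vision-sensor data a and reference data b, in percent.

    Assumes the common convention of normalizing the RMSE by the range
    of the reference signal (an assumption on our part; the paper does
    not reproduce its exact normalization here).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rmse = np.sqrt(np.mean((a - b) ** 2))   # root mean squared error
    return 100.0 * rmse / (b.max() - b.min())
```

Under this convention, identical signals give 0%, and the value grows with the error relative to the full excursion of the reference measurement.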

Case 2: Vibration Measurement of Steering Wheel System
To validate the effectiveness of the proposed sensor system in a practical environment, a vibration measurement experiment on a forklift's steering wheel system was conducted. The steering wheel system consists of several components, including the front panel, installation plate, mounting plate, commutator pump, steering wheel, and tubular column. As shown in Fig. 9(a), the steering wheel and tubular column are assembled with an interference fit, and the mounting plate is connected to the commutator pump by a locking device. The mounting plate and front panel are both welded to the installation plate. Owing to design defects, resonance occurs in the steering wheel system when the engine works at idle speed (22.8 to 28.3 Hz). This resonance may lead to physical complaints and industrial accidents if drivers operate the forklift for long periods. A finite element model (FEM) of the steering wheel system was built with Pro/Engineer, as shown in Fig. 9(b). The locking device between the commutator pump and the mounting plate was simplified into a bolted connection in the FEM. Figure 9 also illustrates the meshing result obtained with a hexahedral mesh. Modal analysis results showed that the first-order natural frequency occurs at 22.3262 Hz. As shown in Fig. 10, the vibration mode at this frequency is a horizontal bending of the steering wheel. The FEM analysis confirmed the resonance speculation because this natural frequency is close to the resonance frequency range.
The vision-based experimental setup on the forklift's steering wheel system is shown in Fig. 11. The high-speed camera sensor was installed on a special support to avoid additional interference. Measurement targets with a size of 10 mm × 10 mm were marked on the upper surface of the steering wheel. The distance between the camera and the steering wheel was about 60 cm, and the actual size of one pixel, calculated as in the previous experiment, was 0.0375 mm∕pixel. Figure 12 shows the horizontal and vertical vibration displacements with their corresponding Fourier spectra after the modified Taylor algorithm was applied to the vibration video. The results show that the center of the steering wheel vibrates with an amplitude under 0.5 mm after excitation. Two obvious spectral peaks can be observed at 22.27 and 44.24 Hz in the Fourier spectra; these peaks can be considered the first-order natural frequency of the steering wheel system and its double frequency. These values are very close to the natural frequency obtained with the FEM analysis, with an acceptable error; therefore, the same frequency can be obtained accurately from the proposed vision-based displacement sensor. During the motion extraction process, the elapsed time for each frame was less than 0.4 ms, and more than 87% of the extractions were completed within 0.1 ms.
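Reading the resonance off the measured displacement amounts to locating the largest peak of its Fourier spectrum; a minimal sketch (illustrative NumPy code; `dominant_frequency` is a hypothetical name, and `fs` is the camera frame rate):

```python
import numpy as np

def dominant_frequency(signal, fs):
    """Frequency (Hz) of the largest peak in the one-sided Fourier spectrum."""
    sig = np.asarray(signal, dtype=float)
    sig = sig - sig.mean()                        # remove the DC component
    spectrum = np.abs(np.fft.rfft(sig))           # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(sig.size, d=1.0 / fs)
    return freqs[np.argmax(spectrum)]
```

For a recording of N frames, the spectral resolution is fs/N, so a displacement signal dominated by a component near 22 Hz returns that frequency to within one bin.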

Conclusions
This study developed a vision-based high-speed sensor system for dynamic displacement measurement. The sensor system is composed of a high-speed camera head with a zoom optical lens and a notebook computer. To meet the requirement of real-time measurement, a motion extraction algorithm with high efficiency is used. With the combination of a block-matching algorithm and simplified optical flow, motion vectors between frames can be extracted accurately. The method is proven to be much faster than conventional algorithms because it requires no interpolation or motion-compensated frames. However, this combined method still has room for improvement.
In our proposed algorithm, the integer-pixel search is replaced with a rounding-iterative operation on the Taylor approximation. This simple modification brings no unnecessary computation or pixel interpolation to the original method. Because no additional parameter requires specification, the modified algorithm executes with a high degree of automation and runs even faster with good accuracy. Based on the assumption of brightness constancy (intensity conservation), the proposed algorithm obtains the displacement vector between frames in the least-squares sense and achieves fast automatic computation by iteratively updating the template's position. Without an image feature extraction process, the algorithm avoids threshold selection and is completed through simple matrix operations. The high efficiency, high precision, and good robustness of the proposed algorithm contribute to the applications of the high-speed camera sensor system.
A simulation tracking the rotational motion of a black circle, as well as two experiments, on a grating ruler motion platform and on the vibration of a steering wheel system, were conducted to verify the effectiveness of the modified algorithm and the developed sensor system. The displacement extraction results of the modified algorithm are compared with the actual values and with the results of three other existing extraction algorithms. In the simulation test, satisfactory agreement is observed between the real motion curve and the curve obtained by the modified algorithm. In the grating ruler motion platform experiment, the motion of the platform is accurately measured by the developed sensor system. In a realistic environment, the performance of the vision sensor is further confirmed by the vibration analysis of the forklift's steering wheel system. Across the simulation and experiments, the modified algorithm outperforms the others in computing efficiency: the average elapsed time for handling one frame is reduced to less than 1 ms with an impressively small measurement error. Although the brightness constancy assumption works well in practice, large variations in illumination intensity may still influence the measurement results and lead to large errors. Unlike the multiframe improvement in Ref. 39, the modified algorithm acquires its image basis from a single frame. This characteristic makes the method concise and highly effective, but the differential operation may amplify image noise and cause undesired errors. The developed sensor system can only achieve real-time measurement at sampling frequencies below 500 Hz, a limit set by the camera module available to us. Future work will focus on improving the algorithm's robustness under large illumination changes and on developing a sensor system with sampling frequencies above 500 Hz.