Object tracking is a core subject in computer vision and has significant meaning in both theory and practice. We propose a novel tracking method in which a robust discriminative classifier is built based on both object and context information. In this method, we consider multiple frames of local invariant features on and around the object and construct the object template and the context template. To overcome the limitation of the invariant representations, we also design a nonparametric learning algorithm using transitive matching perspective transformation, called LUPT (Learning Using Perspective Transformation). This learning algorithm can keep adding new object appearances into the object template while avoiding improper updating when occlusions appear. We also analyze the asymptotic stability of our method and prove its drift-free capability in long-term tracking. Extensive experiments on challenging publicly available video sequences that cover most of the critical conditions in tracking demonstrate the enhanced strength and robustness of our method. Moreover, in comparison with several state-of-the-art tracking systems, our method shows superior performance in most cases, especially on long sequences.
This paper reports an efficient method for line matching that utilizes local intensity gradient information and neighboring geometric attributes. Lines are detected in a multi-scale way to make the method robust to scale changes. A descriptor based on local appearance is built to generate candidate matching pairs. The key idea is to accumulate intensity gradient information into histograms based on intensity orders, to overcome the fragmentation problem of lines. In addition, a local coordinate system is built for each line to achieve rotation invariance. For each line segment in the candidate matching pairs, a histogram is built by aggregating geometric attributes of neighboring line segments. The final matching measure derives from the distance between normalized geometric-attribute histograms. Experiments show that the proposed method is robust to large illumination changes and is rotation invariant.
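The final matching measure described above, a distance between normalized histograms, can be sketched in a few lines. This is a hypothetical illustration: the bin count, value range, and Euclidean distance metric are assumptions, not the paper's exact choices.

```python
import math

def normalized_hist(values, bins, lo, hi):
    """Accumulate values into a fixed-range histogram and L1-normalize it."""
    h = [0.0] * bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / (hi - lo) * bins)))
        h[idx] += 1.0
    total = sum(h) or 1.0
    return [x / total for x in h]

def hist_distance(h1, h2):
    """Euclidean distance between two normalized histograms;
    smaller values indicate a better line-pair match."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))
```

In the actual method, the values being accumulated would be the geometric attributes of neighboring line segments rather than raw scalars.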
Constructing robust binary local feature descriptors is receiving increasing interest because their binary nature enables fast processing while requiring significantly less memory than their floating-point competitors. To bridge the performance gap between binary and floating-point descriptors without increasing the cost of computation and matching, optimal binary weights are learned and assigned to the binary descriptor, since each bit may contribute differently to distinctiveness and robustness. Technically, a large-scale regularized optimization method is applied to learn a float weight for each bit of the binary descriptor. Furthermore, a binary approximation of the float weights is computed with an efficient alternating greedy strategy, which significantly improves the discriminative power while preserving the fast-matching advantage. Extensive experimental results on two challenging datasets (the Brown dataset and the Oxford dataset) demonstrate the effectiveness and efficiency of the proposed method.
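The effect of per-bit weights on matching can be illustrated with a weighted Hamming distance; once the float weights are binarized into a mask, the distance collapses back to a single masked popcount, which is why fast matching is preserved. A minimal sketch with made-up 4-bit descriptors, not the paper's implementation:

```python
def weighted_hamming(d1, d2, weights):
    """Weighted Hamming distance: each differing bit contributes its learned weight."""
    x, dist, bit = d1 ^ d2, 0.0, 0
    while x:
        if x & 1:
            dist += weights[bit]
        x >>= 1
        bit += 1
    return dist

def masked_hamming(d1, d2, mask):
    """With binarized (0/1) weights packed into a bitmask, the weighted distance
    reduces to a plain popcount over the masked XOR, keeping matching fast."""
    return bin((d1 ^ d2) & mask).count("1")
```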
Fault inspection plays an important role in ensuring the safe operation of freight trains. With the development of computer vision technology, vision-based inspection has become one of the principal means of fault detection. The coupler yoke is an important component of a train's connection system, and a fault in it can cause the separation of the train, leading to a serious accident. We propose an automatic image inspection system that inspects coupler yokes for faults while a freight train is running. The inspection process is divided into two parts: localization and recognition. In the localization part, we propose multiple-dimension features, design a fast algorithm to compute multiresolution image features, and use a linear support vector machine classifier to locate the position of the coupler yoke. In the recognition part, we propose a fast decision-tree training method that pre-prunes noneffective features, and use AdaBoost decision trees as the final fault classifier. Experimental results show that the proposed method achieves a fault inspection rate of 98.6% with an average processing time of about 98 ms per image, which shows that our system has high inspection accuracy and good real-time performance.
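The localization step can be pictured as scoring candidate windows with a linear SVM and keeping those above a threshold. This is a schematic sketch with hypothetical feature vectors; in the actual system the features per window come from the fast multiresolution feature computation.

```python
def svm_score(features, w, b):
    """Linear SVM decision value: dot(w, features) + bias."""
    return sum(f * wi for f, wi in zip(features, w)) + b

def locate_windows(windows, w, b, threshold=0.0):
    """Return indices of candidate windows whose score exceeds the threshold,
    i.e. the windows likely to contain the coupler yoke."""
    return [i for i, f in enumerate(windows) if svm_score(f, w, b) > threshold]
```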
We propose a simple yet effective method for long-term object tracking. Different from traditional visual tracking methods, which mainly depend on frame-to-frame correspondence, we combine high-level semantic information with low-level correspondences. Our approach is formulated in a confidence selection framework, which allows our system to recover from drift and partly deal with occlusion. To summarize, our algorithm can be roughly decomposed into an initialization stage and a tracking stage. In the initialization stage, an offline detector is trained to obtain object appearance information at the category level, which is used to detect the potential target and initialize the tracking stage. The tracking stage consists of three modules: the online tracking module, the detection module, and the decision module. The pretrained detector is used to constrain the drift of the online tracker, while the online tracker is used to filter out false positive detections. A confidence selection mechanism is proposed to optimize the object location based on the online tracker and the detector. If the target is lost, the pretrained detector is utilized to reinitialize the whole algorithm once the target is relocated. In experiments, we evaluate our method on several challenging video sequences, and it demonstrates substantial improvement compared with detection-only and online-tracking-only baselines.
A new flexible method to calibrate the external parameters of two cameras with non-overlapping fields of view (FOV) is proposed in this paper. A flexible target consisting of four spheres and a 1D bar is designed. All spheres can move freely along the bar to ensure that each camera can capture a clear image of two spheres. As the radius of each sphere is known exactly, the center of each sphere in its corresponding camera coordinate system can be determined from the sphere's projection. The centers of the four spheres are collinear during calibration, so the relationship among the four centers can be expressed using only the external parameters of the two cameras. When such expressions are obtained at different positions, the external parameters of the two cameras can be determined. In the proposed method, the center of each sphere can be determined accurately because the sphere's projection does not depend on its orientation; meanwhile, the free movement of the spheres ensures that they are imaged clearly. Experimental results show that the proposed calibration method achieves acceptable accuracy: the calibrated vision system reaches 0.105 mm when measuring a distance section of 1040 mm. Moreover, the calibration method is efficient, convenient, and easy to operate.
Boosted decision trees are fast at test time, but their training is often too slow for applications that require real-time learning. To overcome this drawback, we propose a fast decision-tree training method that prunes noneffective features in advance, and based on it we design a fast Boosting decision-tree training algorithm. First, we analyze the structure of each decision-tree node and derive a bound on the classification error of each node. Then, by using this error bound to prune noneffective features at an early stage, we greatly accelerate decision-tree training without affecting the training results at all. Finally, the accelerated decision-tree training method is integrated into the general Boosting process, forming a fast Boosting decision-tree training algorithm. This algorithm is not a new variant of Boosting; on the contrary, it should be used in conjunction with existing Boosting algorithms to achieve further training acceleration. To test the algorithm's speedup and its performance when combined with other acceleration algorithms, the original AdaBoost and two typical acceleration algorithms, LazyBoost and StochasticBoost, were each combined with this algorithm into three fast versions, and their classification performance was tested on the Lsis face database, which contains 12,788 images. Experimental results reveal that this fast algorithm can achieve more than double the training speed without affecting the resulting classifier, and that it can be combined with other acceleration algorithms. Key words: Boosting algorithm, decision trees, classifier training, preliminary classification error, face detection
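The early-pruning idea can be sketched for a single boosting round: a feature whose precomputed lower bound on weighted error already exceeds the best error found so far cannot yield the best split, so its thresholds are never scanned. This is a simplified decision-stump illustration with a caller-supplied bound, not the paper's exact derivation.

```python
def best_stump(X, y, w, error_bound):
    """Find the best decision stump (feature, threshold, weighted error) over
    samples X with labels y in {-1, +1} and weights w, skipping any feature
    whose error lower bound already exceeds the current best (pruning)."""
    best = (None, None, float("inf"))
    for j in range(len(X[0])):
        if error_bound[j] >= best[2]:
            continue  # prune: this feature cannot beat the current best split
        for t in sorted(set(x[j] for x in X)):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if (1 if xi[j] > t else -1) != yi)
            if err < best[2]:
                best = (j, t, err)
    return best
```

Note that with a valid lower bound the pruning never changes the returned stump, which matches the abstract's claim that acceleration does not affect the training results.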
In this paper we propose a simple yet effective and efficient method for long-term object tracking. Different from traditional visual tracking methods, which mainly depend on frame-to-frame correspondence, we combine high-level semantic information with low-level correspondences. Our approach is formulated in a confidence selection framework, which allows our system to recover from drift and partly deal with the occlusion problem. To summarize, our algorithm can be roughly decomposed into an initialization stage and a tracking stage. In the initialization stage, an offline classifier is trained to obtain object appearance information at the category level. When the video stream arrives, the pre-trained offline classifier is used to detect the potential target and initialize the tracking stage. The tracking stage consists of three parts: an online tracking part, an offline detection part, and a confidence judgment part. The online tracking part captures the appearance information of the specific target, while the detection part localizes the object based on the pre-trained offline classifier. Since there is no data dependence between online tracking and offline detection, these two parts run in parallel, which significantly improves the processing speed. A confidence selection mechanism is proposed to optimize the object location. Besides, we also propose a simple mechanism to judge the absence of the object. If the target is lost, the pre-trained offline classifier is utilized to re-initialize the whole algorithm as soon as the target is re-located. In experiments, we evaluate our method on several challenging video sequences and demonstrate competitive results.
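The confidence selection and absence judgment can be sketched as a small decision rule. The threshold values and the tie-breaking policy below are illustrative assumptions, not the paper's tuned settings.

```python
def select_location(track_box, track_conf, det_box, det_conf,
                    track_thresh=0.5, det_thresh=0.5):
    """Pick the final object location from the online tracker and the offline
    detector. Returns None when both confidences are low, signaling that the
    object is absent and the detector should re-initialize the tracker."""
    if track_conf < track_thresh and det_conf < det_thresh:
        return None          # target judged lost
    if det_conf >= track_conf:
        return det_box       # detection corrects tracker drift
    return track_box         # tracker filters out a weak detection
```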
In this paper, we aim to reconstruct 3D points of a scene from related images. The Scale Invariant Feature Transform (SIFT), a feature extraction and matching algorithm, has been proposed and improved over the years and is widely used in image alignment and stitching, image recognition, and 3D reconstruction. Because of the robustness and reliability of SIFT's feature extraction and matching, we use it to find correspondences between images. Hence, we describe a SIFT-based method to reconstruct sparse 3D points from ordered images. In the matching step, we modify the process of finding correct correspondences and obtain a satisfying matching result: rejecting "questionable" points before initial matching makes the final matching more reliable. Given SIFT's invariance to image scale, rotation, and variable changes in the environment, we propose a way to delete the duplicate reconstructed points that occur in the sequential reconstruction procedure, which improves the accuracy of the reconstruction. By removing the duplicated points, we avoid the possible collapse caused by inexact initialization or error accumulation. The limitation of some methods, that all reprojected points must be visible at all times, also does not apply in our setting. Small imprecisions can cause large changes as the number of images increases, so the paper contrasts the modified algorithm with the unmodified one. Moreover, we present an approach to evaluate the reconstruction by comparing the reconstructed angles and length ratios with actual values using a calibration target placed in the scene. The proposed evaluation method is easy to carry out and broadly applicable: even without Internet image datasets, we can evaluate our own results. The whole algorithm has been tested on several image sequences, both from the Internet and from our own shots.
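The duplicate-point removal can be sketched as a greedy distance filter: a newly triangulated point is kept only if it lies farther than a tolerance from every point kept so far. An O(n²) illustration; the paper's actual duplicate criterion may differ.

```python
def dedup_points(points, eps=1e-3):
    """Remove duplicate reconstructed 3D points: keep a point only if it is
    farther than eps (Euclidean) from every point already kept."""
    kept = []
    for p in points:
        if all(sum((a - b) ** 2 for a, b in zip(p, q)) > eps ** 2 for q in kept):
            kept.append(p)
    return kept
```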
The homography matrix is a matrix representation of the projective relation between a space plane and its corresponding image plane in computer vision. It is widely used in visual metrology, camera calibration, 3D reconstruction, etc. Therefore, accurate estimation of the homography matrix is significant. Here, the quantum-behaved particle swarm optimization (QPSO) method, which is globally convergent, is first introduced into the estimation of the homography matrix. When a suitable cost function is chosen, enough point correspondences can be utilized to search for the optimal homography matrix, which makes the estimation accurate. To evaluate the proposed method, simulations and experiments are conducted to confirm its feasibility and robustness. The points obtained from the estimated homography matrix are reprojected onto the image plane to evaluate the accuracy. For comparison, the Levenberg-Marquardt method, a typical iterative minimization method, is also utilized to obtain the homography matrix. Simulation and experimental results show that the proposed method is reasonable, accurate, and excellently robust. When 10 correspondences and 20 particles are utilized, the root mean square error of the re-projected points reaches about 0.019 mm. Furthermore, our proposed method does not depend on initialization and is less sensitive to the chosen cost function, which are deficiencies of common estimation methods.
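A typical cost function for such a swarm search is the reprojection error of the correspondences under a candidate homography. A minimal sketch, with the row-major 3×3 nested-list layout and RMS error both being assumed conventions rather than the paper's specification:

```python
import math

def apply_homography(H, pt):
    """Map a 2D point through a 3x3 homography (row-major nested lists)."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def rms_reprojection_error(H, src_pts, dst_pts):
    """Root-mean-square distance between mapped source points and their
    measured correspondences; a natural cost for the particle swarm to minimize."""
    se = 0.0
    for s, d in zip(src_pts, dst_pts):
        px, py = apply_homography(H, s)
        se += (px - d[0]) ** 2 + (py - d[1]) ** 2
    return math.sqrt(se / len(src_pts))
```

Each particle in the swarm would encode the entries of a candidate H and be scored by this function.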
The stereo light microscope (SLM) plays an important role in the measurement of three-dimensional geometry at the microscopic scale. We propose a fast and precise affine calibration algorithm for the SLM based on the invariable extrinsic parameters. This calibration algorithm, which uses a free planar reference, consists of three steps: first, derive the extrinsic parameters based on their invariable definition in the pinhole and affine models; second, calculate the intrinsic parameters through the homography matrix; finally, refine all the model parameters by global optimization, with the previous closed-form solutions as the initial values. The effectiveness of treating a noncoaxial optical system as an affine camera is also verified, so that all types of SLMs can be affinely modeled. The calibration experiments show that the affine calibration is preferable under multiple criteria, including running time, relative positioning precision, and absolute positioning precision. With PlanApo S 1.5× and a total magnification of 3.024×, the proposed affine calibration algorithm achieves a distance error of 0.423 μm and a positioning error of 0.195 mm within 10.6 s.
Determining the relative position between the camera and the structured-light-plane projector is a classical problem in the measurement of line-structured light vision sensors. A geometrical calibration method based on the theory of vanishing points and vanishing lines is proposed. In this method, a planar target with several parallel lines is used. By moving the target to at least two different positions randomly, we can obtain the normal vector of the structured light plane in the camera coordinate system; as the distance between every two adjacent parallel lines is known exactly, the parameter D of the structured light plane is determined, and therefore the equation of the structured light plane can be confirmed. Experimental results show that the accuracy of the proposed calibration method reaches 0.09 mm within a field of view of about 200×200 mm. Moreover, the target used in our calibration method can easily be produced precisely, and the calibration method is efficient and convenient owing to its simple calculation and easy operation, which is especially valuable for on-site calibration.
We focus on two key problems in the calibration of multi-sensor visual measurement systems based on structured light: the calibration of the structured light vision sensor, and the global calibration of multiple vision sensors. In the calibration of the vision sensor, the light-plane equation is computed by combining the Plücker matrices of light stripes obtained at different target positions. Since the light-plane equation is optimized using all the light-stripe center points, the robustness and accuracy of calibration are considerably improved. For the global calibration of multiple vision sensors, the relative positions of two vision sensors with non-overlapping fields of view are calibrated by means of two planar targets (fixed together), using the constraint that the relative positions of the two targets are invariable. The mutual transformations between the two targets need not be known. Using one of the vision sensors as the base vision sensor, the global calibration of multiple vision sensors is achieved by calibrating each of the other vision sensors against the base vision sensor. The proposed method has already been successfully applied in practice.
A global calibration method for multi-sensor vision systems based on a flexible 3D target is proposed to solve the calibration problem of multi-sensor vision systems with a large inspection range. The flexible 3D target consists of several planar targets, called sub-targets, which are placed flexibly according to the sensors' positions. The coordinate frame of one of the vision sensors is selected as the global coordinate frame. Using the invariance of the relative positions between sub-targets in the flexible 3D target, a closed-form solution for the transformation from the local coordinate frame of each sensor to the global coordinate frame can be computed. The result is refined by nonlinear optimization, and the maximum likelihood estimate of the transformation matrices can be achieved. Experimental results demonstrate the high accuracy of the proposed calibration method. The proposed method greatly simplifies the process and reduces the cost of global calibration, for it needs neither high-accuracy 3D measuring equipment nor a special 3D target, as most traditional global calibration methods do; it only needs several planar targets combined to carry out the global calibration. It is applicable to the global calibration of multi-sensor vision systems at the working location.
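The closed-form chaining of coordinate frames can be sketched with homogeneous 4×4 transforms: when the base sensor and another sensor each observe a common target frame (directly, or via the known sub-target geometry), the sensor-to-global transform follows by composition. A pure-Python sketch; the notation T_a_b (mapping frame b into frame a) is an assumed convention.

```python
def matmul4(A, B):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_rigid(T):
    """Invert a rigid transform [R t; 0 1] as [R^T, -R^T t; 0 1]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(Rt[i][j] * T[j][3] for j in range(3)) for i in range(3)]
    return [Rt[0] + [t[0]], Rt[1] + [t[1]], Rt[2] + [t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def sensor_to_global(T_base_target, T_sensor_target):
    """Chain frames through the shared target:
    T_base_sensor = T_base_target * inv(T_sensor_target)."""
    return matmul4(T_base_target, invert_rigid(T_sensor_target))
```

The closed-form solutions obtained this way are exactly the kind of initial values that the subsequent nonlinear refinement would start from.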
A novel 3-D terrain matching algorithm is presented for a passive aircraft navigation system. Stereo matching of a pair of overlapping images can yield a recovered digital elevation model (DEM), which can be matched to the airborne reference DEM of the 3-D terrain. The two DEMs can be represented by the compact representation of contour maps, so the terrain matching is converted to contour-map matching. A contour-map matching algorithm using a combination of Fourier transform and polar transform is then proposed to estimate the associated translation and rotation parameters, which provide the desired position and orientation of the aircraft. Experimental results with real terrain data demonstrate that the proposed algorithm is insensitive to large noise and distortion compared to the previous state of the art, and also has the merits of high reliability and accuracy.
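The Fourier half of the contour-map matcher, recovering a shift from the peak of the normalized cross-power spectrum, can be sketched in one dimension with a direct O(n²) DFT. This is a pure-Python illustration; the actual system works on 2-D contour maps and adds a polar transform to recover rotation the same way.

```python
import cmath

def phase_correlation_1d(a, b):
    """Estimate the circular shift s such that b[k] == a[(k - s) % n] from the
    peak of the normalized cross-power spectrum (direct DFT for clarity)."""
    n = len(a)

    def dft(x):
        return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * f / n)
                    for k in range(n)) for f in range(n)]

    fa, fb = dft(a), dft(b)
    # Normalized cross-power spectrum: conj(F_a) * F_b / |conj(F_a) * F_b|.
    cross = [(y * x.conjugate()) / (abs(y * x.conjugate()) or 1.0)
             for x, y in zip(fa, fb)]
    # Inverse transform magnitude peaks at the shift index.
    corr = [abs(sum(cross[f] * cmath.exp(2j * cmath.pi * k * f / n)
                    for f in range(n))) for k in range(n)]
    return corr.index(max(corr))
```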
To analyze the impact of the slanted installation of a PSD, used as a photoelectric detector, on the measured spot position, a mathematical model of the resulting distortion error of the spot position is established and simulations are performed. The results show that the distortion error of the spot position increases with the slanting angle of the PSD surface, the beam waist radius, and the distance between the PSD and the beam waist position. Within a small range, the effect of the first two factors on spot-positioning precision can be ignored, while the last one has a great effect. The distortion error model of the spot position and the simulation results provide a useful theoretical reference for practical engineering applications of PSDs.
OpenGL is the international standard for 3D graphics. The generation of a 3D image by OpenGL is similar to image capture by a camera. This paper focuses on the application of OpenGL to computer vision, in which the OpenGL 3D image is regarded as a virtual camera image. First, the imaging mechanism of OpenGL is analyzed in terms of the perspective projection transformation of a computer vision camera. Then, the relationship between the intrinsic and extrinsic parameters of the camera and the function parameters in OpenGL is analyzed, and the transformation formulas are deduced. On this basis, computer vision simulation is realized. The comparison between actual CCD camera images and virtual camera images (with the actual camera's parameters identical to the virtual camera's), together with the experimental results of stereo vision 3D reconstruction simulation, verifies the effectiveness of the method by which the intrinsic and extrinsic parameters of the OpenGL-based virtual camera are determined.
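The deduced relationship can be illustrated by the standard mapping from pinhole intrinsics to an OpenGL perspective matrix. This is a common textbook form, shown row-major; the signs of the third-column terms depend on whether the image y-axis points up or down, so treat them as an assumed convention rather than the paper's exact formulas.

```python
def intrinsics_to_gl_projection(fx, fy, cx, cy, width, height, near, far):
    """Build a 4x4 OpenGL-style perspective matrix (row-major here) from
    pinhole intrinsics fx, fy, cx, cy given in pixels."""
    return [
        [2.0 * fx / width, 0.0, 1.0 - 2.0 * cx / width, 0.0],
        [0.0, 2.0 * fy / height, 2.0 * cy / height - 1.0, 0.0],
        [0.0, 0.0, -(far + near) / (far - near),
         -2.0 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ]
```

A principal point at the image center (cx = width/2, cy = height/2) makes the third-column offsets vanish, recovering the symmetric frustum produced by `gluPerspective`.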
Structured-light-based 3D vision is widely applied in inspecting form and position errors, such as straightness and coaxiality, of cylindrical workpieces. For these applications, however, the light stripe on the workpiece's surface is very short and contains inadequate data, often with noise. Under such circumstances, ellipse fitting to the scattered data of the light stripe is not efficient enough, and its fitting accuracy is usually poor. To address this problem, a new least-squares fitting method based on the constraint of the ellipse minor axis (called the CEMA method) is presented in detail in this paper. Simulations are given for the proposed method and for five other popular methods described in the literature. The results show that the proposed method can efficiently improve the accuracy and robustness of ellipse fitting to the scattered data of a short light stripe.
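For contrast with the constrained method, a plain unconstrained least-squares conic fit (the kind of baseline whose accuracy degrades on short stripes) can be sketched via the normal equations. This is a generic algebraic fit with the constant term fixed to 1, not the CEMA method itself.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_conic(points):
    """Unconstrained least-squares fit of a*x^2 + b*x*y + c*y^2 + d*x + e*y = 1
    to scattered 2D points, via the normal equations."""
    rows = [[x * x, x * y, y * y, x, y] for x, y in points]
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    Atb = [sum(r[i] for r in rows) for i in range(5)]
    return solve(AtA, Atb)
```

On well-spread data this recovers the conic exactly; the point of the CEMA method is that when the points cover only a short arc, an additional minor-axis constraint is needed to keep the fit stable.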