Consumer-grade range cameras are widely used in three-dimensional reconstruction. However, their limited resolution and stability constrain reconstruction quality, especially for transparent objects. We propose a method that reconstructs transparent surfaces while improving the reconstruction quality of an indoor scene using a single RGB-D sensor. Transparent regions are localized from zero-depth and erroneous-depth measurements. The lost surface of a transparent object is recovered by modeling the statistics of zero depth, variance, and the residual error of the signed distance function (SDF) during depth data fusion. The camera pose is first initialized by minimizing the error of the depth map against the SDF under a k-color-frame constraint. The pose is then optimized with a penalized coefficient function, which lowers the weight of voxels with higher SDF error. The method is shown to localize transparent objects effectively and to achieve a more robust camera pose under complex backgrounds.
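The depth fusion step described above can be sketched per voxel as a weighted running average, with the new observation down-weighted when its SDF residual is large. This is a minimal sketch: the exact form of the penalized coefficient function is an assumption, since the abstract does not specify it.

```python
def fuse_sdf(tsdf, weight, new_sdf, alpha=2.0):
    """One weighted SDF fusion step for a single voxel.

    Observations whose SDF residual against the stored value is large
    (typical near transparent surfaces) receive a penalized, lower
    weight, so unstable depth measurements contribute less.
    The 1/(1 + alpha*residual) penalty is illustrative only.
    """
    residual = abs(new_sdf - tsdf)
    w_new = 1.0 / (1.0 + alpha * residual)   # penalize high residual
    fused = (weight * tsdf + w_new * new_sdf) / (weight + w_new)
    return fused, weight + w_new
```

A conflicting observation (residual 1.0 with `alpha=2.0`) is fused at one third of the weight of a consistent one, pulling the stored value only slightly toward the new measurement.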
Designing effective local descriptors is crucial for many computer vision tasks such as image matching and patch verification. We propose a convolutional neural network (CNN)-based local descriptor named DHNet with a considerate sampling strategy and a dedicated loss function. By considerate sampling, both the closest nonmatching sample and the farther matching sample can be obtained for effectively training a discriminative model. In addition, an improved triplet loss is designed by adding a constraint that limits the absolute distance for the closest nonmatching pair. Based on hard samples and the constraint, our lightweight CNN can quickly generate local descriptors with enhanced intraclass compactness and interclass separation. Experimental results show that our method significantly outperforms the state-of-the-art methods in terms of strong discrimination ability, as evidenced by a considerable performance improvement on several benchmarks.
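The improved triplet loss can be sketched as a standard triplet margin term plus an absolute-distance term on the closest nonmatching pair. The margin form and the constant `neg_bound` are assumptions; the abstract states only that the absolute distance of the closest nonmatching pair is constrained.

```python
import numpy as np

def improved_triplet_loss(anchor, positive, hard_negative,
                          margin=1.0, neg_bound=1.4):
    """Triplet margin loss plus an absolute-distance constraint on the
    closest nonmatching (hard negative) pair.

    The relative term pushes the matching pair closer than the hard
    nonmatching pair by `margin`; the extra term additionally requires
    the hard-negative distance to exceed `neg_bound` in absolute terms,
    improving interclass separation.
    """
    d_pos = float(np.linalg.norm(anchor - positive))
    d_neg = float(np.linalg.norm(anchor - hard_negative))
    relative = max(0.0, d_pos - d_neg + margin)  # standard triplet term
    absolute = max(0.0, neg_bound - d_neg)       # push d_neg above bound
    return relative + absolute
```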
Binary descriptors have been widely used in many real-time applications due to their efficiency. These descriptors are commonly designed for perspective images but perform poorly on omnidirectional images, which are severely distorted. To address this issue, this paper proposes tangent plane BRIEF (TPBRIEF) and adapted log polar grid-based motion statistics (ALPGMS). TPBRIEF projects keypoints onto a unit sphere and applies the fixed test set of the BRIEF descriptor on the tangent plane of the unit sphere. The fixed test set is then backprojected onto the original distorted images to construct a distortion-invariant descriptor. TPBRIEF directly enables keypoint detection and feature description on the original distorted images, whereas other approaches correct the distortion through image resampling, which introduces artifacts and adds time cost. With ALPGMS, omnidirectional images are divided into circular arches named adapted log polar grids. Whether a match is true or false is then determined by simply thresholding the match count in the grid pair where the two matched points are located. Experiments show that TPBRIEF greatly improves feature matching accuracy and ALPGMS robustly removes wrong matches. Our proposed method outperforms the state-of-the-art methods.
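The core BRIEF binary test that TPBRIEF applies on the tangent plane can be sketched as follows; the spherical projection and backprojection steps are omitted here, so this shows only the intensity-comparison test itself.

```python
import numpy as np

def brief_descriptor(patch, test_pairs):
    """BRIEF binary test: one bit per point pair, set to 1 if the
    intensity at point p is lower than at point q.

    In TPBRIEF this fixed test set is laid out on the tangent plane of
    the unit sphere and backprojected onto the distorted image; here
    the pairs index an ordinary image patch for illustration.
    """
    bits = [1 if patch[p] < patch[q] else 0 for p, q in test_pairs]
    return np.array(bits, dtype=np.uint8)
```

Matching two such descriptors reduces to a Hamming distance, which is what makes binary descriptors attractive for real-time use.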
Per-pixel hand detection plays an important role in many human–computer interaction applications, yet accurate and robust hand detection remains a challenging task due to the large appearance variance of hands in images. We introduce a per-pixel hand detection system using a single depth image. We propose a circle-sampling depth-context feature for hand region representation, and a multilayered hand detection model is built for hand region detection. Finally, a postprocessing step based on spatial constraints is applied to refine the detection results and further improve the detection accuracy. We evaluate the accuracy of our method on a public dataset and investigate the effect of key parameters in our system. The results of the qualitative and quantitative evaluation reveal that the proposed method performs well on per-pixel hand detection tasks. Furthermore, an additional experiment on hand parts segmentation shows that the depth-context feature generalizes to more complex multiclass classification tasks.
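A circle-sampling depth-context feature can be sketched as depth differences between a center pixel and points sampled on surrounding circles. The radii and sample counts below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def depth_context_feature(depth, y, x, radii=(2, 4), n_samples=8):
    """Depth-context feature for pixel (y, x): depth differences to
    points sampled uniformly on circles of the given radii.

    Out-of-bounds samples are clamped to the image border. Radii and
    n_samples are hypothetical; the paper tunes such parameters.
    """
    h, w = depth.shape
    center = depth[y, x]
    feats = []
    for r in radii:
        for k in range(n_samples):
            theta = 2.0 * np.pi * k / n_samples
            sy = min(max(int(round(y + r * np.sin(theta))), 0), h - 1)
            sx = min(max(int(round(x + r * np.cos(theta))), 0), w - 1)
            feats.append(depth[sy, sx] - center)
    return np.array(feats)
```

Using depth differences rather than raw depth makes the feature invariant to the absolute distance of the hand from the camera, which is the usual motivation for depth-context designs.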
The objective of large-scale object retrieval systems is to search for images that contain the target object in an image database. Whereas state-of-the-art approaches rely on global image representations to conduct searches, we consider many boxes per image as candidates for local search within a picture. In this paper, a feature quantization algorithm called binary quantization is proposed. In binary quantization, a scale-invariant feature transform (SIFT) feature is quantized into a descriptive and discriminative bit-vector, which fits naturally into the classic inverted file structure for box indexing. The inverted file, which stores the bit-vector and the ID of the box in which the SIFT feature is located, is compact and can be loaded into main memory for efficient box indexing. We evaluate our approach on available object retrieval datasets. Experimental results demonstrate that the proposed approach is fast and achieves excellent search quality. Therefore, the proposed approach is an improvement over state-of-the-art approaches for object retrieval.
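The indexing scheme can be sketched as a bit-vector quantizer feeding an inverted file keyed by the bit-vector. The per-dimension thresholding quantizer below is an illustrative assumption; the paper's actual binary quantization of SIFT features may differ.

```python
import numpy as np
from collections import defaultdict

def binary_quantize(feat, thresholds):
    """Quantize a feature into a bit-vector by per-dimension
    thresholding (an illustrative quantizer, not the paper's exact one)."""
    return tuple(int(v > t) for v, t in zip(feat, thresholds))

def add_feature(index, feat, thresholds, image_id, box_id):
    # Inverted file: bit-vector -> list of (image, box) postings.
    index[binary_quantize(feat, thresholds)].append((image_id, box_id))

def query(index, feat, thresholds):
    # Exact-match lookup on the bit-vector retrieves candidate boxes.
    return index.get(binary_quantize(feat, thresholds), [])
```

Because each posting is just a bit-vector key and a box ID, the whole structure stays small enough to hold in main memory, which is what enables fast box-level search.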
Abnormal event detection in crowded scenes has been a challenge due to the volatility of the definitions of both normality and abnormality, the small number of pixels on the target, appearance ambiguity resulting from dense packing, and severe inter-object occlusions. A novel framework is proposed for the detection of unusual events in crowded scenes using trajectories produced by moving pedestrians, based on the intuition that the motion patterns of usual behaviors are similar to those of the group activity, whereas unusual behaviors are not. First, spectral clustering is used to group trajectories with similar spatial patterns. Different trajectory clusters represent different activities. Then, unusual trajectories can be detected using these patterns. Furthermore, the behavior of a moving pedestrian can be characterized by comparing its direction with these patterns, such as moving in the opposite direction of the group or traversing the group. Experimental results indicate that the proposed algorithm reliably locates abnormal events in crowded scenes.
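The direction-comparison step can be sketched as follows. This is a simplified stand-in: the paper first groups trajectories by spectral clustering, whereas here the learned cluster direction patterns are assumed to be given.

```python
import numpy as np

def is_abnormal(traj, pattern_dirs, cos_thresh=0.0):
    """Flag a trajectory whose dominant motion direction aligns with
    none of the learned cluster patterns, e.g., a pedestrian moving
    against the direction of the group.

    `traj` is an (N, 2) array of positions; `pattern_dirs` are unit
    direction vectors of the trajectory clusters; `cos_thresh` is an
    assumed alignment threshold.
    """
    step = traj[-1] - traj[0]                # net displacement
    d = step / np.linalg.norm(step)          # dominant direction
    sims = [float(d @ p) for p in pattern_dirs]
    return max(sims) < cos_thresh            # no pattern matches
```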
Coded structured light can rapidly acquire the shape of unknown surfaces by projecting suitable patterns onto a measured surface and capturing the distorted patterns with a camera. By analyzing the deformation of the patterns appearing in the images, depth information of the surface can be calculated. This paper presents a new, concise, and efficient mathematical model of a coded structured light measurement system for obtaining depth information. The interrelations among the model parameters and the errors in depth information are investigated. Based on the system's geometric structure, the effects of the system parameters on object imaging are derived. The dynamic deformation patterns can also be captured under different measurement conditions. By analyzing the system parameters and depth information errors, the system constraint conditions can be determined; the model simulation and error analysis are also discussed in experiments. Finally, the system model with optimal parameters is used to reconstruct two objects.
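The core depth computation in such a system reduces to triangulation between camera and projector. The sketch below shows a generic pinhole triangulation relation, assuming the stripe displacement (disparity) has already been decoded; the paper's full model relates more geometric parameters to the depth error.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Classic camera-projector triangulation: depth is inversely
    proportional to the observed stripe displacement on the sensor.

    focal_px     -- focal length in pixels
    baseline_m   -- camera-projector baseline in meters
    disparity_px -- decoded stripe displacement in pixels
    """
    return focal_px * baseline_m / disparity_px
```

The inverse relation also explains the error behavior analyzed in the paper: at large depths, a one-pixel disparity error produces a much larger depth error than at small depths.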
This paper presents a method for multi-curve spectrum representation of facial movements and expressions. Based on the 3DMCF (3D muscle-controlled facial) model, facial movements and expressions are controlled by 21 virtual muscles. Facial movements and expressions can therefore be described by a group of time-varying curves of normalized muscle contraction, called the multi-curve spectrum. The structure and basic characteristics of the multi-curve spectrum are introduced. The performance of the proposed method is among the best reported; it requires only a small amount of data and is easy to apply. It can also be used to transfer facial animation between different faces.
Based on an analysis of the maximum stripe deformation (due to depth changes on surfaces) and of measurement resolution limits, a principle of spatial periodicity for coding is proposed. When spatial periodicity is used for coding, the resolution is greatly improved, or the number of patterns is greatly reduced, for real-time structured light systems. A novel coded pattern based on spatial periodicity is presented for real-time structured light systems; it allows range scanning of moving objects with simple decoding and high measurement resolution. Using alternate time-space coding in a structured light system, we achieve a measurement speed of 20 frames per second with two stripe patterns.
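The decoding idea behind combining a periodic spatial pattern with a temporal code can be sketched as follows. This is a minimal illustration under the assumption that a coarse temporal code disambiguates which period a fine, spatially periodic stripe index falls in; the paper's actual pattern design is more involved.

```python
def absolute_stripe_index(coarse_code, fine_index, period):
    """Recover the absolute stripe number from alternate time-space
    coding: the temporal (coarse) code selects the period, and the
    spatially periodic (fine) index locates the stripe within it.

    All three arguments are hypothetical decoded quantities, used here
    only to illustrate why periodicity reduces the pattern count: the
    spatial pattern must distinguish only `period` stripes, not all.
    """
    return coarse_code * period + fine_index
```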