Existing motion-capture systems rely on expensive, specialized equipment and typically require the subject to wear devices or to have markers attached to the body joints. This limits their accessibility and adoption in the mass consumer market. To address these problems, we present a pipeline for 3D human motion capture from binocular stereo vision, built on MobilePose, a supervised learning method for real-time 2D skeletal joint detection. Because both 2D joint detection and 3D reconstruction are fast, our approach captures 3D human motion in real time. We also use the captured 3D motion to drive a simple human body animation application. Compared with Microsoft's Kinect, which uses a depth camera, our method runs on ordinary cameras, which are low-cost and widely available.
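The 3D reconstruction step can be illustrated with standard rectified-stereo triangulation. The sketch below is not taken from the pipeline itself: it assumes an ideal rectified camera pair, and the function name `triangulate_joint` and the parameters `focal_px`, `baseline_m`, `cx`, and `cy` are hypothetical names introduced for illustration. Given the pixel position of one detected 2D joint in the left and right images, it recovers the joint's 3D position in the left camera frame.

```python
def triangulate_joint(x_left, y_left, x_right, focal_px, baseline_m, cx, cy):
    """Recover a 3D joint position from a rectified stereo pair.

    Assumptions (not from the source): the images are rectified, so the
    joint lies on the same row in both views; focal_px is the focal length
    in pixels; baseline_m is the camera separation in metres; (cx, cy) is
    the principal point of the left camera.
    """
    disparity = x_left - x_right  # horizontal shift between the two views
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_m / disparity  # depth from similar triangles
    x = (x_left - cx) * z / focal_px       # back-project the pixel to metric X
    y = (y_left - cy) * z / focal_px       # back-project the pixel to metric Y
    return (x, y, z)


# Hypothetical example: one joint seen at column 420 (left) and 380 (right).
joint = triangulate_joint(420.0, 260.0, 380.0,
                          focal_px=800.0, baseline_m=0.1, cx=320.0, cy=240.0)
```

With these illustrative numbers the disparity is 40 pixels, so the joint lies about 2 m from the cameras, at roughly (0.25, 0.05, 2.0) metres. Applying the same function to every detected joint yields a 3D skeleton per frame.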
For disparity acquisition, phase-shift profilometry can achieve high disparity accuracy, but it is a stereo matching technique that relies on phase unwrapping, which in turn depends on multifrequency fringe patterns. Recent work also shows that deep learning can estimate disparity from a stereo image pair, but high accuracy is difficult to obtain. To tackle these problems, we propose a stereo matching method that obtains high-accuracy disparity by combining a fringe pattern stereo matching network with the phase-shift method. First, a coarse disparity is obtained with the fringe pattern stereo matching network; then the wrapped phase map computed by the three-step phase-shift method is used to refine the coarse disparity into a high-accuracy disparity map. The method rectifies the left and right fringe patterns in advance, which eliminates the influence of the calibration parameters without loss of generality. It also avoids the phase unwrapping that phase-shift profilometry requires. We show that stereo matching on sinusoidal fringe patterns yields a better coarse disparity than stereo matching on texture images, especially in textureless areas. Experiments show that high-precision disparity can be obtained with only three frames of high-frequency fringe patterns.
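The two-stage idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the refinement used in the method: the three fringe images are assumed to have phase shifts of -2π/3, 0, and +2π/3, the coarse disparity is assumed to be off by less than half a fringe period, and the right-image phase is assumed to be locally linear. The names `wrapped_phase`, `wrap`, and `refine_disparity` are hypothetical.

```python
import math


def wrapped_phase(i1, i2, i3):
    """Wrapped phase from three intensities with assumed shifts -2pi/3, 0, +2pi/3."""
    return math.atan2(math.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)


def wrap(delta):
    """Wrap a phase difference into [-pi, pi)."""
    return (delta + math.pi) % (2.0 * math.pi) - math.pi


def refine_disparity(x_left, phi_left, phi_right_row, coarse_disp):
    """Refine a coarse disparity using wrapped phase only (no unwrapping).

    phi_right_row holds the wrapped phase along the matching row of the
    rectified right image. The coarse match is moved to the sub-pixel
    column whose phase equals phi_left, using the local phase gradient.
    """
    xi = int(round(x_left - coarse_disp))          # coarse match column
    xi = max(1, min(len(phi_right_row) - 2, xi))   # keep both neighbours in range
    # Central-difference phase gradient (radians per pixel), wrap-safe.
    grad = wrap(phi_right_row[xi + 1] - phi_right_row[xi - 1]) / 2.0
    if abs(grad) < 1e-9:
        return float(coarse_disp)                  # no usable fringe signal here
    residual = wrap(phi_left - phi_right_row[xi])  # phase error of the coarse match
    x_match = xi + residual / grad                 # sub-pixel column in the right image
    return x_left - x_match


# Hypothetical synthetic check: fringe period of 16 px, true match at column 37.3.
period = 16.0
row = [wrap(2.0 * math.pi * x / period) for x in range(64)]
phi_l = wrap(2.0 * math.pi * 37.3 / period)
refined = refine_disparity(50, phi_l, row, coarse_disp=14.0)
```

In this synthetic case the coarse disparity of 14 px is corrected to about 12.7 px. The wrapped phase resolves the match only within one fringe period, which is why a sufficiently accurate coarse disparity is needed first.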