Photoplethysmography (PPG) is widely used in wearable heart rate monitoring. By tracking cardiac rhythm during physical exercise, it can reveal potential risks to heart condition and cardiopulmonary function. However, the quality of the photoelectric signal measured at the wrist is very sensitive to motion artifacts because of the thicker tissue and the smaller number of capillaries there. Motion artifacts are therefore the major factor impeding heart rate measurement during high-intensity exercise. In this research, one accelerometer and three light channels with different wavelengths are used to analyze the coupled form of the motion artifact. A novel approach is proposed to separate the pulse signal from the motion artifact by exploiting their mixing ratios in different optical paths. The method has four major steps: preprocessing, motion artifact estimation, adaptive filtering, and heart rate calculation. Five healthy young men participated in the experiment. The treadmill speed was set to 12 km/h, and each subject ran for 3-10 minutes while swinging his arms naturally. The results are compared with those of a chest strap. The average mean square error (MSE) is less than 3 beats per minute (BPM). The proposed method performs well during intense physical exercise and is robust to individuals with different running styles and postures.
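The adaptive filtering step can be illustrated with a minimal sketch. It assumes a normalized LMS (NLMS) filter driven by a single accelerometer channel as the noise reference; the abstract does not say which adaptive filter the authors use, and the function name `nlms_cancel`, the tap count, the step size, and the synthetic signals below are all hypothetical.

```python
import math

def nlms_cancel(observed, reference, taps=10, mu=0.02, eps=1e-6):
    """Remove the part of `observed` that is linearly predictable from `reference`.

    Assumption: a normalized LMS filter; the source does not name its filter.
    Returns the error signal, which serves as the cleaned pulse estimate.
    """
    weights = [0.0] * taps
    cleaned = []
    for n in range(len(observed)):
        # Most recent `taps` reference samples, zero-padded at the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        estimate = sum(w * xi for w, xi in zip(weights, x))
        error = observed[n] - estimate
        norm = eps + sum(xi * xi for xi in x)
        # Normalized gradient step toward the artifact component.
        weights = [w + mu * error * xi / norm for w, xi in zip(weights, x)]
        cleaned.append(error)
    return cleaned

# Synthetic data (hypothetical): a 1.5 Hz pulse plus a 2.5 Hz arm-swing artifact.
fs = 50.0
t = [i / fs for i in range(1000)]
pulse = [math.sin(2 * math.pi * 1.5 * ti) for ti in t]
accel = [math.sin(2 * math.pi * 2.5 * ti + 0.3) for ti in t]
ppg = [p + 0.8 * a for p, a in zip(pulse, accel)]

cleaned = nlms_cancel(ppg, accel)

# Compare the error against the true pulse over the second half, after convergence.
half = len(t) // 2
residual = sum((c - p) ** 2 for c, p in zip(cleaned[half:], pulse[half:])) / half
raw = sum((o - p) ** 2 for o, p in zip(ppg[half:], pulse[half:])) / half
```

Here `raw` is the artifact power before filtering and `residual` is what remains afterwards. A real wrist signal would need the three optical channels and the mixing-ratio analysis described above, which this sketch does not attempt.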
An active depth sensing approach based on a laser speckle projection system is proposed. After capturing the speckle pattern with an infrared digital camera, we extract the pure speckle pattern using a direct-global separation method. The pure speckles are then represented by Census binary features. By evaluating the matching cost and uniqueness between the real-time image and the reference image, robust correspondences are selected as support points. We then build a disparity grid and propose a generative graphical model to compute disparities. An iterative approach is designed to propagate messages between blocks and update the model. Finally, a dense depth map is obtained by subpixel interpolation and transformation. The experimental evaluations demonstrate the effectiveness and efficiency of our approach.
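As a rough illustration of Census features and their matching cost, the sketch below computes a 3×3 Census code per pixel and compares codes by Hamming distance along a row. The function names, the window radius, and the convention that a live pixel at column x matches the reference at column x - d are assumptions for illustration, not details taken from the paper.

```python
def census_transform(image, radius=1):
    """Return a grid of Census codes (ints) for a 2D list of intensities."""
    height, width = len(image), len(image[0])
    codes = [[0] * width for _ in range(height)]
    for y in range(radius, height - radius):
        for x in range(radius, width - radius):
            center = image[y][x]
            code = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue
                    # One bit per neighbor: 1 if the neighbor is darker than the center.
                    bit = 1 if image[y + dy][x + dx] < center else 0
                    code = (code << 1) | bit
            codes[y][x] = code
    return codes

def hamming(a, b):
    """Matching cost between two Census codes: the number of differing bits."""
    return bin(a ^ b).count("1")

def best_disparity(live_codes, ref_codes, y, x, max_disparity):
    """Return (disparity, costs) with the lowest Hamming cost along row y."""
    costs = []
    for d in range(max_disparity + 1):
        if x - d < 0:
            break
        costs.append(hamming(live_codes[y][x], ref_codes[y][x - d]))
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs
```

A uniqueness check, such as requiring the best cost to be clearly lower than the second-best, could be layered on top of `costs` to pick support points.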
Proc. SPIE. 9045, 2013 International Conference on Optical Instruments and Technology: Optoelectronic Imaging and Processing Technology
KEYWORDS: Image fusion, Optical filters, Digital filtering, Machine learning, Active remote sensing, Optimization (mathematics), 3D vision, Magnetorheological finishing, 3D image processing, RGB color model
In this paper, we consider the task of hole filling in depth maps with the help of an associated color image. We take a supervised learning approach to this problem. The model is learned from a training set that contains the pixels with known depth values, and is then used to predict the depth values in the holes. Our model uses a regional Markov Random Field (MRF) that incorporates multiscale absolute and relative features (computed from the color image), and models depths not only at individual points but also between adjacent points. The experiments show that the proposed approach is able to recover fairly accurate depth values and achieve a high-quality depth map.
Face recognition in surveillance is a hot topic in computer vision owing to the strong demand for public security, and it remains challenging because of large variations in camera viewpoint and illumination. In surveillance, tracking makes image sets the most natural form of input. Recent advances in set-based matching also show great potential for exploring the feature space by making use of multiple samples of each subject. In this paper, we propose a novel method that exploits salient facial features (the eyes, nose, and mouth) in set-based matching. To represent image sets, we adopt the affine hull model, which can generate unseen appearances as affine combinations of sample images. In our proposal, a robust part detector first finds four salient parts in each face image: the two eyes, the nose, and the mouth. For each part, we construct an affine hull model from the local binary pattern histograms of multiple samples of that part. We also construct an affine hull model for the whole face region. We then take the closest distance between corresponding affine hull models as the measure of similarity between parts or face regions, and introduce a weighting scheme that combines the five distances (four parts and the whole face region) into the final distance between two subjects. In the recognition phase, a nearest neighbor classifier is used. Experiments on the public ChokePoint dataset and our own dataset demonstrate the superior performance of our method.
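The closest distance between two affine hulls can be sketched as a small least-squares problem. The code below is a minimal illustration on toy vectors rather than local binary pattern histograms; the function names, the choice of the first sample as the hull origin, and the small ridge term added for numerical stability are assumptions, not the authors' implementation.

```python
import math

def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution

def affine_hull_distance(set_a, set_b, ridge=1e-9):
    """Closest distance between the affine hulls of two sets of feature vectors."""
    base_a, base_b = set_a[0], set_b[0]
    # Columns: hull directions of set_a, then negated hull directions of set_b.
    columns = [[x - b for x, b in zip(s, base_a)] for s in set_a[1:]]
    columns += [[b - x for x, b in zip(s, base_b)] for s in set_b[1:]]
    target = [b - a for a, b in zip(base_a, base_b)]
    if not columns:
        return math.sqrt(sum(v * v for v in target))
    k = len(columns)
    # Normal equations with a tiny ridge term (assumption) to avoid singular systems.
    gram = [[sum(p * q for p, q in zip(columns[i], columns[j]))
             + (ridge if i == j else 0.0) for j in range(k)] for i in range(k)]
    rhs = [sum(p * v for p, v in zip(col, target)) for col in columns]
    coeffs = solve(gram, rhs)
    residual = [sum(coeffs[i] * columns[i][d] for i in range(k)) - target[d]
                for d in range(len(target))]
    return math.sqrt(sum(r * r for r in residual))
```

For example, `affine_hull_distance([[0, 0, 0], [1, 0, 0]], [[0, 1, 1], [0, 2, 1]])` measures the gap between two skew lines in 3D. In the setting described above, one such distance would be computed per part and for the whole face before the weighted combination.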
Accurate real-time motion detection is a key step in many visual applications, such as object detection and smart video surveillance. Although considerable research effort has been devoted to it, it remains a challenging task because of factors such as illumination variation. To improve robustness to illumination changes, many block-based motion detection algorithms have been proposed. However, these methods usually neglect the influence of block size, and they cannot choose the background-modeling scale automatically as the environment changes. These weaknesses limit the algorithms' flexibility and range of application. In this paper, we propose a multi-scale motion detection algorithm that benefits from different block sizes. An adaptive linear fusion strategy is designed by analyzing the accuracy and robustness of the background models at different scales. During detection, the fusion ratios of the different scales are adjusted as the scene changes. In addition, to reduce the computational cost at each scale, we design an integral image structure for HOG features at different scales, so that all features need to be computed only once. Experiments on several outdoor scenes demonstrate the performance of the proposed model.
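The reason an integral image lets features be computed once and reused at every block size can be shown with a short sketch. It assumes a single scalar channel, for instance the gradient magnitude accumulated in one HOG orientation bin; the function names and the one-table-per-bin layout are illustrative assumptions rather than the paper's actual data structure.

```python
def integral_image(channel):
    """Build a summed-area table with one extra row and column of zeros."""
    height, width = len(channel), len(channel[0])
    table = [[0.0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0.0
        for x in range(width):
            row_sum += channel[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table

def block_sum(table, top, left, height, width):
    """Sum of the channel over a block of any size, using four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])
```

Once the table exists, an 8×8 block and a 32×32 block cost the same four lookups, so adding scales adds little work.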
In this paper, we propose a real-time action recognition algorithm based on the 3D human skeleton positions provided by a depth camera. Our contributions are threefold. First, considering that skeleton positions in different actions at different times can be similar, we adopt the Naive-Bayes-Nearest-Neighbor (NBNN) method for classification. Second, because distinct but similar actions would noticeably lower the recognition rate, we present a hierarchical model that increases the recognition rate significantly. Third, for real-time application, we use a sliding window to buffer the input and a threshold on the ratio of the second-nearest distance to the nearest distance to smooth the output. Our method also rejects undefined actions. Experimental results on the Microsoft Research Action3D dataset demonstrate that our algorithm outperforms other state-of-the-art methods in both recognition rate and computing speed. Our algorithm increases the recognition rate by about 10% at an average speed of 30 fps (at a resolution of 640×480).
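A minimal sketch of NBNN classification with a distance-ratio test is given below. It assumes squared Euclidean distance, a plain dictionary of per-class descriptor pools, and a hypothetical threshold `min_ratio`; none of these specifics come from the abstract, and the hierarchical model and sliding window are omitted.

```python
def nbnn_distances(descriptors, class_descriptors):
    """For each class, sum the squared distance from every query descriptor
    to its nearest descriptor in that class (the NBNN image-to-class distance)."""
    totals = {}
    for label, pool in class_descriptors.items():
        total = 0.0
        for d in descriptors:
            total += min(sum((a - b) ** 2 for a, b in zip(d, p)) for p in pool)
        totals[label] = total
    return totals

def classify_with_rejection(descriptors, class_descriptors, min_ratio=1.5):
    """Return the nearest class, or None when the second-nearest class is almost as close."""
    totals = nbnn_distances(descriptors, class_descriptors)
    ranked = sorted(totals.items(), key=lambda item: item[1])
    (best_label, best), (_, second) = ranked[0], ranked[1]
    if best == 0.0:
        return best_label  # exact match; the ratio is undefined
    if second / best < min_ratio:
        return None  # ambiguous: treat as an undefined action
    return best_label
```

With two toy classes, a query close to one class is accepted, while a query roughly midway between them is rejected with `None`.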
Hand gesture recognition has recently attracted growing interest in computer vision and image processing. Recent work on hand gesture recognition faces two major problems. The first is how to detect and extract the hand region from background objects of confusingly similar color. The second is the high computational cost of a kinematic hand model with up to 27 degrees of freedom. This paper proposes a stable, real-time static hand gesture recognition system. Our contributions are as follows. First, to deal with color-confusing background objects, we take RGB-D (RGB-Depth) information into account, so that foreground and background objects can be segmented well. In addition, a coarse-to-fine model is proposed that uses skin color to extract the hand region robustly and accurately. Second, because the principal direction of the hand region is arbitrary, we use principal component analysis (PCA) to estimate and then compensate for the direction. Finally, to avoid the high computational cost of traditional optimization, we design a fingertip filter and detect extended fingers simply by calculating their distances to the palm center and their curvature. The number of extended fingers is then reported as the recognition result. Experiments verify the stability and high speed of our algorithm. On a data set captured by the depth camera, our algorithm robustly recognizes the 6 predefined static hand gestures with an average accuracy of about 98.0%. Furthermore, the average computation time for each image (at a resolution of 640×480) is 37 ms, which makes the method suitable for many real-time applications.
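Estimating and compensating the principal direction with PCA can be sketched for 2D pixel coordinates as follows. The closed-form angle of the leading eigenvector of a 2×2 covariance matrix is standard; the function names and the choice to rotate about the centroid are assumptions made for illustration.

```python
import math

def principal_direction(points):
    """Angle in radians of the first principal axis of 2D points, in (-pi/2, pi/2]."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Closed form for the major-axis orientation of a 2x2 covariance matrix.
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)

def compensate(points, angle):
    """Rotate points about their centroid by -angle so the principal axis lies along x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    rotated = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        rotated.append((mean_x + cos_a * dx + sin_a * dy,
                        mean_y - sin_a * dx + cos_a * dy))
    return rotated
```

Note that PCA fixes the axis only up to a 180-degree flip, so a real system would need an extra cue, such as the wrist position, to decide which end of the axis the fingers are on.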
High-performance pedestrian detection with good accuracy and fast speed is an important yet challenging task in computer vision. We design a novel feature named pair normalized channel feature (PNCF), which simultaneously combines and normalizes two channel features in image channels, achieving high discriminative power and computational efficiency. PNCF applies to both gradient channels and color channels, so that shape and appearance information are described and integrated in the same feature. To efficiently explore the formidably large PNCF feature space, we propose a statistics-based feature learning method to select a small number of potentially discriminative candidate features, which are fed into the boosting algorithm. In addition, channel compression and a hybrid pyramid are employed to speed up multiscale detection. Experiments illustrate the effectiveness of PNCF and its learning method. Our proposed detector outperforms the state of the art on several benchmark datasets in both detection accuracy and efficiency.
Gaussian Mixture Model (GMM) background subtraction (BGS) is widely used for detecting and tracking objects in video sequences. Although the GMM can provide good results, its low processing speed has become a bottleneck for real-time applications. We propose a novel method to accelerate the GMM algorithm on a graphics processing unit (GPU). Because the GPU excels at massively parallel operations, the novelty lies in how various optimization strategies are adopted to fully exploit the GPU's resources. The parallel design consists of three levels. Building on the first-level implementation, we apply techniques such as memory access coalescing and memory address saving in the second-level optimization and the third-level modification, which greatly reduces the time cost and increases the bandwidth. Experimental results demonstrate that the proposed method achieves 145 frames per second (fps) for VGA (640×480) video and 505 fps for QVGA (320×240) video, outperforming the CPU counterparts by 24× and 23×, respectively. The resulting surveillance system can process five VGA videos simultaneously with strong robustness and high efficiency.
The selective enhancement mechanism of Fine-Granular-Scalability (FGS) in MPEG-4 can enhance specific objects under bandwidth variation. We propose a novel technique for self-adaptive enhancement of regions of interest based on the Motion Vectors (MVs) of the base layer. It suits video sequences with a still background in which only the moving objects are of interest, such as news broadcasting, video surveillance, and Internet education. The motion vectors generated during base layer encoding are obtained and analyzed. A Gaussian model is introduced to describe non-moving macroblocks, which may have non-zero MVs caused by random noise or luminance variation. The MVs of these macroblocks are set to zero to prevent them from being enhanced. A region-growing segmentation algorithm based on MV values is used to separate foreground from background. Post-processing is needed to reduce the influence of burst noise so that only the moving regions of interest are left. In our experiments, applying the result to selective enhancement during enhancement layer encoding significantly improves the visual quality of the regions of interest in such video transmitted at different bit rates.
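The two MV-processing steps, zeroing vectors that a Gaussian noise model explains and then growing a foreground region, might look like the sketch below. It works on a grid of MV magnitudes, one per macroblock; the k-sigma cutoff, 4-connectivity, and the similarity tolerance are illustrative assumptions, since the abstract does not give these details.

```python
from collections import deque

def suppress_noise(mv_magnitudes, sigma, k=3.0):
    """Zero the MV magnitudes that fall within k standard deviations of
    a zero-mean Gaussian noise model (assumed form of the noise test)."""
    return [[m if m > k * sigma else 0.0 for m in row] for row in mv_magnitudes]

def grow_region(mv_magnitudes, seed, tolerance):
    """Grow a 4-connected region from `seed` over macroblocks whose nonzero
    MV magnitude is within `tolerance` of the seed's magnitude."""
    rows, cols = len(mv_magnitudes), len(mv_magnitudes[0])
    seed_value = mv_magnitudes[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                value = mv_magnitudes[nr][nc]
                if value > 0 and abs(value - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region
```

An isolated moving macroblock that is not connected to the grown region stays out of it, which is one simple way burst noise could be filtered in post-processing.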
In this paper, we present a novel scheme for delivering scalable video over a priority network. First, we describe the background of scalable video transmission over networks and the motivation for this research. Second, a new scheme is proposed, covering bitstream classification, prioritization, and packetization. Third, we present a simple and effective mechanism for rate control and adaptation that selectively drops packets in order to minimize the end-to-end distortion. We also describe a transmission framework. Simulations show that our scheme is also effective in video multicast scenarios.
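Priority-based packet dropping under a rate budget can be sketched as follows. This is a deliberately simple model, not the paper's mechanism: it assumes each packet carries a layer index as its priority, that a layer is useless once any lower-layer packet has been dropped, and that the budget is given in bytes. The `Packet` type and `drop_to_budget` name are hypothetical.

```python
from typing import NamedTuple

class Packet(NamedTuple):
    layer: int  # 0 = base layer, higher values = enhancement layers
    size: int   # payload size in bytes

def drop_to_budget(packets, budget):
    """Keep packets in priority order until the budget is reached, then drop the rest.

    Assumption: once one packet is dropped, every lower-priority packet is
    dropped too, since it would depend on the missing data.
    """
    kept, dropped, used = [], [], 0
    cutoff = False
    for packet in sorted(packets, key=lambda p: p.layer):
        if not cutoff and used + packet.size <= budget:
            kept.append(packet)
            used += packet.size
        else:
            cutoff = True
            dropped.append(packet)
    return kept, dropped
```

A distortion-aware version would weigh each packet's contribution to end-to-end distortion instead of using the layer index alone.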
This paper proposes a novel robust framework for scalable video over the Internet. The main contribution of our work is a simplified rate-distortion theory developed specifically for scalable bitstreams over the network, together with the corresponding bit allocation that determines the set of channel rates for each video layer. Compared with traditional iterative optimal bit allocation, which has complexity O(n^L), simulations show that our scheme achieves high-quality video transmission with the much lower complexity O(L × n), differing by no more than 0.2 dB under different network conditions (different bandwidths and packet loss cases). In addition, our error control scheme can be naturally combined with congestion control and error-resilient techniques to enhance the performance of the overall system.
Error concealment is becoming increasingly important because of the growing interest in multimedia transmission over unreliable channels such as wireless channels. At present, most concealment methods have their own advantages as well as limits of applicability, and they achieve different concealment results in different cases. In this paper, we present a novel feature-based image error detection and error concealment algorithm to improve the quality of images degraded during transmission over a wireless channel. First, a simulated channel based on the Rayleigh model is implemented to emulate the actual wireless fading channel, which is characterized by fading, multipath, and Doppler frequency shift. The damaged image blocks are detected by exploring contextual information in the images, such as their consistency and edge continuity. The statistical characteristics of the missing blocks are then estimated from the types of their surrounding blocks (e.g., smooth, textured, or edge). Finally, different error concealment strategies are applied to different types of blocks in order to achieve better visual quality. Instead of assuming random errors in packets, we simulate the errors of the wireless channel based on the Rayleigh model. The proposed algorithm is tested on a number of still images. Simulation results demonstrate that our proposed algorithm is effective in terms of both visual quality assessment and PSNR.
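A very reduced sketch of a Rayleigh-model channel is shown below. It draws independent flat-fading gains as the magnitude of a complex Gaussian and marks a block as lost when the gain falls below a threshold. Multipath and Doppler shift, which the simulation described above includes, are not modeled here, and the function names, threshold, and seed are hypothetical.

```python
import math
import random

def rayleigh_gains(count, sigma=1.0, seed=0):
    """Draw `count` Rayleigh-distributed channel gains.

    Each gain is the magnitude of a complex sample whose real and imaginary
    parts are independent zero-mean Gaussians with standard deviation sigma.
    """
    rng = random.Random(seed)
    gains = []
    for _ in range(count):
        in_phase = rng.gauss(0.0, sigma)
        quadrature = rng.gauss(0.0, sigma)
        gains.append(math.hypot(in_phase, quadrature))
    return gains

def lost_blocks(gains, fade_threshold):
    """Mark a block as lost when its channel gain is in a deep fade (assumed loss rule)."""
    return [g < fade_threshold for g in gains]
```

For this distribution, the mean gain is sigma·sqrt(pi/2) and the probability of a gain below a threshold t is 1 − exp(−t²/(2·sigma²)), which gives a quick sanity check on the simulated loss rate.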
With the emergence of third generation (3G) wireless technology, demand for digital media such as image and video over wireless channels keeps growing. In this paper, measurement metrics for wireless images are proposed and a QoS-guaranteed error control scheme is presented that combines Unequal Error Protection (UEP) with Forward Error Correction (FEC) and Automatic Repeat reQuest (ARQ), aiming at high-quality image transmission with short delay and low energy consumption. Simulation results show that our scheme achieves well-reconstructed images with few retransmissions and a small bit budget under different channel conditions, which reduces the energy consumed in the network interface.
Scalable video delivery over a wireless link is a very challenging task because of the time-varying characteristics of wireless channels. This paper proposes a channel-adaptive error control scheme for efficient video delivery, which consists of dynamic channel estimation and channel-adaptive Unequal Error Protection (UEP). In our proposed channel-adaptive UEP scheme, a bit allocation algorithm periodically allocates the available bits among the different video layers according to the varying channel conditions so as to minimize the end-to-end distortion. Simulation results show that our proposed scheme is efficient under various channel conditions.