This paper presents a human pose recognition method that simultaneously reconstructs a human volume from a single depth image in real time, based on an ensemble of voxel classifiers. Human pose recognition is a difficult task because a single depth camera can capture only the visible surfaces of a human body. In order to recognize invisible (self-occluded) surfaces of a human body, the proposed algorithm employs voxel classifiers trained with multi-layered synthetic voxels. Specifically, ray-casting onto a volumetric human model generates synthetic voxels, where each voxel consists of a 3D position and an ID corresponding to a body part. The synthesized volumetric data, which contain both visible and invisible body voxels, are utilized to train the voxel classifiers. As a result, the voxel classifiers not only identify the visible voxels but also reconstruct the 3D positions and IDs of the invisible voxels. The experimental results show improved performance in estimating human poses, owing to the capability of inferring the invisible human body voxels. The proposed algorithm is expected to be applicable to many fields, such as telepresence, gaming, virtual fitting, the wellness business, and 3D content control on real 3D displays.
This paper presents a method for tracking human poses in real time from depth image sequences. The key idea is to adopt recognition for generating the model to be tracked. In contrast to traditional methods that utilize a single fixed 3D body model, we directly define the human body model based on the body part recognition result of the captured depth image, which leads to reliable tracking regardless of the user's appearance. Moreover, the proposed method can efficiently reduce tracking drift by exploiting the joint information embedded in our body model. Experimental results in real-world environments show that the proposed method is effective for estimating various human poses in real time.
Finding defects with automatic visual inspection techniques is an essential task in various industrial fields. Despite considerable research on this task, most previous methods remain vulnerable to ambiguities arising from the diverse shapes and sizes of defects. We introduce a simple yet powerful method to segment defects on various texture surfaces in an unsupervised manner. Specifically, our method is based on a multiscale scheme over the phase spectrum of the Fourier transform. The proposed method can even handle one-dimensional elongated defect patterns (e.g., streaks caused by scratches), which previous methods are known to handle poorly. In contrast to traditional inspection methods limited to locating particular sorts of defects, our approach has the advantage that it can segment arbitrary defects, owing to the nonlinear diffusion involved in the multiscale scheme. Extensive experiments demonstrate that the proposed method provides much better defect segmentation results than several competitive methods in the literature.
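The core phase-spectrum idea can be illustrated in a few lines. The sketch below is a simplified, single-scale version of the principle (the full method described above is multiscale and involves nonlinear diffusion); the function name and the toy image are our own:

```python
import numpy as np

def phase_only_residual(image):
    """Highlight irregularities by reconstructing an image from its
    Fourier phase spectrum only (magnitude set to 1).

    Regular, repetitive texture concentrates in the magnitude spectrum,
    so discarding the magnitude leaves mostly the irregular parts --
    i.e., the defects.
    """
    f = np.fft.fft2(image.astype(float))
    phase_only = f / (np.abs(f) + 1e-12)      # keep phase, drop magnitude
    return np.abs(np.fft.ifft2(phase_only))   # residual map: high at defects

# Toy example: a flat surface with a single defective pixel.
surface = np.full((32, 32), 0.5)
surface[10, 20] = 1.0                          # the "defect"
residual = phase_only_residual(surface)
# The residual map peaks at the defect location (10, 20).
```

On a uniform surface the phase-only reconstruction collapses almost everything except the anomaly, which is why no model of "normal" texture needs to be learned.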
We propose a novel image-importance model for content-aware image resizing. In contrast to previous gradient magnitude-based approaches, we focus on the power of gradient-domain statistics. The proposed scheme originates from a well-known property of the human visual system: human visual perception is highly adaptive and more sensitive to structural information in images than to nonstructural information. We do not model the image structure explicitly, because image structure has diverse aspects that cannot be easily modeled from cluttered natural images. Instead, our method obtains the structural information in an image by exploiting gradient-domain statistics in an implicit manner. Extensive tests on a variety of cluttered natural images show that the proposed method is more effective than previous content-aware image-resizing methods and, unlike previous schemes, is very robust to images with cluttered backgrounds.
This paper presents a method to estimate the number of people in crowded scenes without using explicit object
segmentation or tracking. The proposed method consists of three steps as follows: (1) extracting space-time interest
points using eigenvalues of the local spatio-temporal gradient matrix, (2) generating crowd regions based on space-time
interest points, and (3) estimating the crowd density using multiple regression. Experimental results on the PETS 2009
dataset demonstrate the efficiency and robustness of the proposed method.
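Steps (1) and (3) above can be sketched as follows. This is an illustrative implementation under our own assumptions — the neighborhood size, eigenvalue threshold, toy videos, and toy regression features are not taken from the paper:

```python
import numpy as np

def local_sum(a, r=1):
    """Sum of a over a (2r+1)^3 spatio-temporal neighborhood via shifts."""
    out = np.zeros_like(a)
    for dt in range(-r, r + 1):
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(a, (dt, dy, dx), axis=(0, 1, 2))
    return out

def space_time_interest_points(video, thresh=1e-3):
    """Detect space-time interest points: voxels whose local
    spatio-temporal gradient matrix has a large smallest eigenvalue."""
    gt, gy, gx = np.gradient(video.astype(float))
    g = np.stack([gt, gy, gx], axis=-1)              # (T, H, W, 3)
    # Windowed structure tensor M = sum over neighborhood of g g^T.
    M = np.empty(video.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            M[..., i, j] = local_sum(g[..., i] * g[..., j])
    min_eig = np.linalg.eigvalsh(M)[..., 0]          # smallest eigenvalue
    return np.argwhere(min_eig > thresh)

# Toy videos: a bright dot moving diagonally vs. a completely static scene.
moving = np.zeros((5, 16, 16))
for t in range(5):
    moving[t, 5 + t, 5 + t] = 1.0
static = np.zeros((5, 16, 16))

n_moving = len(space_time_interest_points(moving))   # motion -> points found
n_static = len(space_time_interest_points(static))   # no variation -> none

# Step (3): the people count is then fit by multiple regression on
# per-frame crowd features (here, just a toy interest-point count).
X = np.array([[0.0], [10.0], [20.0]])                # toy feature vectors
y = np.array([0.0, 5.0, 10.0])                       # toy ground-truth counts
w, *_ = np.linalg.lstsq(np.c_[X, np.ones(3)], y, rcond=None)
```

Requiring a large *smallest* eigenvalue selects voxels with intensity variation along all three axes, i.e., moving structure rather than static edges.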
User-friendliness and cost-effectiveness have contributed to the growing popularity of mobile phone cameras.
However, images captured by such mobile phone cameras are easily distorted by a wide range of factors, such
as backlight, over-saturation, and low contrast. Although several approaches have been proposed to solve the
backlight problems, most of them still suffer from distorted background colors and high computational complexity.
Thus, they are not deployable in mobile applications requiring real-time processing with very limited resources. In
this paper, we present a novel framework to compensate image backlight for mobile phone applications based on
an adaptive pixel-wise gamma correction which is computationally efficient. The proposed method is composed
of two sequential stages: 1) illumination condition identification and 2) adaptive backlight compensation. Given an
input image, we first classify it as facial or non-facial to provide prior knowledge for identifying the illumination
condition. We then further categorize the facial images into backlight and non-backlight images based on local image
statistics obtained from the corresponding face regions. We finally compensate
the image backlight using an adaptive pixel-wise gamma correction method while preserving global and local
contrast effectively. To demonstrate the superiority of our algorithm, we compare it with other state-of-the-art
methods in the literature.
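A minimal sketch of the adaptive pixel-wise gamma correction idea follows. The linear illumination-to-gamma mapping, the box-blur illumination estimate, and all parameter values are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def box_blur(img, r=2):
    """Cheap local-mean illumination estimate via shifted sums."""
    acc = np.zeros_like(img)
    n = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            acc += np.roll(img, (dy, dx), axis=(0, 1))
            n += 1
    return acc / n

def backlight_compensate(img, gamma_min=0.5, gamma_max=1.0):
    """Adaptive pixel-wise gamma correction on luminance in [0, 1].

    Dark (backlit) regions receive a small gamma (strong brightening),
    while well-lit regions receive a gamma near 1 (almost unchanged),
    which preserves contrast in already-bright areas.
    """
    illum = box_blur(img)                                  # local illumination
    gamma = gamma_min + (gamma_max - gamma_min) * illum    # per-pixel gamma
    return np.clip(img, 0.0, 1.0) ** gamma

# Toy image: left half backlit (dark), right half well exposed.
img = np.full((16, 16), 0.9)
img[:, :8] = 0.1
out = backlight_compensate(img)
# Dark pixels are brightened substantially; bright pixels barely change.
```

Because gamma varies per pixel with the *local* illumination rather than a global histogram, the well-exposed background is left essentially untouched — the property that keeps the method cheap enough for mobile use.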
Mobile IPTV is a multimedia service based on wireless networks with interactivity and mobility. Under mobile IPTV
scenarios, people can watch various content whenever they want and even deliver requests to service providers
through the network. However, the frequent change of the wireless channel bandwidth may hinder the quality of service.
In this paper, we propose an objective video quality measure (VQM) for mobile IPTV services, which is focused on the
jitter measurement. Jitter, the result of frame repetition during transmission delay, is one of the most severe
impairments in video transmission over mobile channels. We first use the YUV color space to compute the duration and
occurrence of jitter as well as the motion activity. The VQM is then modeled as a combination of these three factors
fitted to the results of subjective assessment. Since the proposed VQM is based on a no-reference (NR) model, it can be
applied to real-time applications. Experimental results show that the proposed VQM correlates highly with subjective evaluation.
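The jitter-measurement stage can be sketched as follows. The repetition threshold, the function interface, and the toy frames are our own assumptions; the weights combining the three factors would be fitted to subjective scores, as described above:

```python
import numpy as np

def jitter_features(frames, eps=1e-3):
    """Measure jitter (frame repetition) and motion activity on Y-channel frames.

    A frame whose mean absolute difference from its predecessor is below
    eps is treated as a repeated (frozen) frame.  Returns:
      duration    -- total number of repeated frames,
      occurrences -- number of distinct freeze events,
      motion      -- mean frame difference over non-repeated frames.
    """
    diffs = [np.abs(b - a).mean() for a, b in zip(frames, frames[1:])]
    repeated = [d < eps for d in diffs]
    duration = sum(repeated)
    occurrences = sum(1 for i, r in enumerate(repeated)
                      if r and (i == 0 or not repeated[i - 1]))
    moving = [d for d, r in zip(diffs, repeated) if not r]
    motion = float(np.mean(moving)) if moving else 0.0
    return duration, occurrences, motion

# Toy sequence A A A B C C: two freeze events, three repeated frames.
A, B, C = (np.full((4, 4), v) for v in (0.0, 0.5, 1.0))
duration, occurrences, motion = jitter_features([A, A, A, B, C, C])
# duration == 3, occurrences == 2, motion == 0.5
```

Since only the received frames are inspected (no pristine reference), the measure stays no-reference and thus usable in real time at the receiver.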
The scorebox plays an important role in understanding the content of sports videos. However, its small size can make it
difficult for viewers on small displays to grasp the game situation. In this paper, we propose a novel
framework to extract the scorebox from sports video frames. We first extract candidates by using accumulated intensity
and edge information after a short learning period. Since various types of scoreboxes are inserted in sports videos,
multiple attributes need to be used for effective extraction. Based on these attributes, the information gain of each
attribute is computed, and the top three attributes ranked by information gain are selected as a three-dimensional feature
vector for a Support Vector Machine (SVM) to distinguish the scorebox from other candidates, such as logos and advertisement
boards. The proposed method is tested on various sports videos, and experimental results show its efficiency
and robustness.
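The information-gain ranking step can be sketched as follows; the attribute names, the toy labels, and the candidate data are hypothetical examples, not values from the paper:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(attribute, labels):
    """IG(label; attribute) = H(label) - H(label | attribute)."""
    n = len(labels)
    cond = 0.0
    for value in set(attribute):
        subset = [l for a, l in zip(attribute, labels) if a == value]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

# Toy candidates: 1 = scorebox, 0 = other (logo, advertisement board).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
attrs = {
    "static_position": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly discriminative
    "high_contrast":   [1, 0, 1, 0, 1, 0, 1, 0],  # uninformative
    "contains_digits": [1, 1, 1, 0, 0, 0, 0, 0],  # partially informative
}
ranked = sorted(attrs, key=lambda a: information_gain(attrs[a], labels),
                reverse=True)
# The top-ranked attributes form the feature vector fed to the SVM.
```

Ranking attributes by information gain before training keeps the SVM's feature vector small (three-dimensional here) while retaining the most discriminative cues.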