As multimedia services become increasingly available over networks where bandwidth is not always guaranteed, quality monitoring has become an important issue. For instance, quality of experience and quality monitoring are important problems in internet protocol television applications, since transmission errors may introduce various additional video quality degradations. In this paper, we present a reduced-reference objective model for video quality measurement in multimedia applications. The proposed method first measures edge degradations, which are critical for perceptual video quality, and then considers the effects of transmission errors. We compared the proposed method with several existing methods. Independent verification confirmed that the proposed method performs well, and it was consequently included in an International Telecommunication Union recommendation. The proposed method can be used to monitor video quality at receivers while requiring minimal additional bandwidth.
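The reduced-reference idea above can be sketched as follows: the sender computes a compact edge-activity feature from the reference video and transmits only that feature; the receiver recomputes it on the decoded video and compares the two. The gradient-magnitude feature and the comparison below are illustrative assumptions for exposition, not the exact feature of the standardized model.

```python
import numpy as np

def edge_feature(frame):
    """Mean horizontal/vertical gradient magnitude as a crude edge-activity
    feature. A reduced-reference scheme transmits only this small vector,
    never the full reference frame. (Illustrative sketch only.)"""
    f = np.asarray(frame, dtype=float)
    gx = np.abs(np.diff(f, axis=1)).mean()   # horizontal edge activity
    gy = np.abs(np.diff(f, axis=0)).mean()   # vertical edge activity
    return np.array([gx, gy])

def rr_degradation(ref_feature, received_frame):
    """Receiver-side comparison of the transmitted feature against the
    feature recomputed from the (possibly degraded) received frame."""
    return float(np.abs(ref_feature - edge_feature(received_frame)).sum())
```

A lossless channel yields zero degradation, while blurring or blocking that destroys edges produces a positive score that can be tracked over time at the receiver.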
Detecting foreground objects in image sequences plays an important role in many machine vision applications. Background modeling, a preliminary processing step for foreground detection, is a challenging task due to the complexity and variety of background regions, unexpected situations, and image artifacts such as noise and other impairments. In this work, we propose a pixel-based background modeling method that uses nonparametric kernel density estimation and foreground/background classification based on the Bayesian decision rule. To reduce the complexity of the kernel density estimation technique, we estimate the probability density function of the background regions using histograms. Hue, saturation, and value (HSV) color and gradient information is also used to represent the background features. After the background statistics are estimated, we detect the foreground regions using a background subtraction method based on the Bayesian decision rule, which eliminates the need to select and tune a threshold value for foreground/background classification. The proposed algorithm is validated using datasets acquired in indoor and outdoor environments with a fixed camera and is quantitatively compared with two existing background modeling methods. The experimental results show that the proposed algorithm produces more accurate and stable results.
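The histogram-based density estimate and the thresholdless Bayesian decision can be sketched as below. This is a minimal per-pixel intensity model, not the paper's full HSV-plus-gradient feature set; the uniform foreground likelihood and the prior value are assumptions made for illustration.

```python
import numpy as np

def build_histograms(frames, bins=16):
    """Per-pixel intensity histograms as a cheap stand-in for kernel
    density estimation (sketch; the actual model uses HSV and gradients)."""
    frames = np.asarray(frames)              # (T, H, W), values in [0, 256)
    t, h, w = frames.shape
    idx = (frames * bins // 256).astype(int)
    hist = np.zeros((h, w, bins))
    for f in idx:
        np.add.at(hist, (np.arange(h)[:, None], np.arange(w)[None, :], f), 1)
    return hist / t                          # P(bin | background) per pixel

def classify(frame, hist, prior_bg=0.7, bins=16):
    """Bayesian decision rule: a pixel is background iff
    P(x|bg) * P(bg) > P(x|fg) * P(fg). Assuming a uniform foreground
    likelihood removes any hand-tuned subtraction threshold."""
    idx = (np.asarray(frame) * bins // 256).astype(int)
    p_bg = np.take_along_axis(hist, idx[..., None], axis=2)[..., 0]
    p_fg = 1.0 / bins                        # uniform foreground likelihood
    return (p_bg * prior_bg) <= (p_fg * (1 - prior_bg))   # True = foreground
```

The decision compares two posterior-proportional quantities instead of testing a distance against a tuned threshold, which is the practical benefit the abstract highlights.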
A problem with face recognition is that lighting conditions can affect performance, since facial features are easily distorted by varying illumination. To address this problem, we present a new preprocessing method using the census transform. In the proposed method, we generate a number of binary values for the current pixel by comparing it with its neighboring pixels, and encode the binary values into a single gray-scale image. Finally, we apply principal component analysis (PCA) to the gray-scale images. Experiments show that the proposed method provides promising results.
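The census transform step can be sketched as follows: each pixel's 3x3 neighborhood is reduced to an 8-bit code whose bits record the outcome of comparing each neighbor with the center, which is why the result is robust to monotonic illumination changes. The bit ordering and the `>=` comparison are one common convention, assumed here for illustration.

```python
import numpy as np

def census_transform(img):
    """3x3 census transform: encode each pixel as an 8-bit value whose
    bits indicate whether each neighbor is >= the center pixel.
    Only orderings matter, so monotonic lighting changes cancel out."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    padded = np.pad(img, 1, mode='edge')     # replicate border pixels
    out = np.zeros((h, w), dtype=np.uint8)
    bit = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue                     # skip the center pixel
            neighbor = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            out |= (neighbor >= img).astype(np.uint8) << bit
            bit += 1
    return out
```

The resulting gray-scale code image is what PCA would then be applied to in the pipeline the abstract describes.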
Brain segmentation is a challenging problem due to the complexity of the brain. In this paper, we propose an automated brain segmentation method for 3D magnetic resonance (MR) brain images, which are represented as sequences of 2D brain images. The proposed method consists of three steps: pre-processing, removal of non-brain regions (e.g., the skull, meninges, and other organs), and spinal cord restoration. In the pre-processing step, we perform adaptive thresholding, which takes into account the variable intensities of MR brain images arising from different image acquisition conditions. In the segmentation step, we iteratively apply 2D morphological operations and masking to the sequences of 2D sagittal, coronal, and axial planes in order to remove non-brain tissue. Next, the final 3D brain region is obtained by applying an OR operation to the segmentation results of the three planes. Finally, we restore the spinal cord, which is truncated during the previous steps. Experiments were performed on fifteen 3D MR brain image sets with 8-bit gray scale. The results show that the proposed algorithm is fast and provides robust, satisfactory results.
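The per-slice morphological step and the cross-plane OR can be sketched as below. This is a minimal binary opening with a 3x3 structuring element, assumed here for illustration; the paper's exact operator sequence, structuring elements, and iteration counts are not specified in the abstract.

```python
import numpy as np

def _neighbor_views(mask):
    """All nine 3x3 neighbor views of a zero-padded binary mask."""
    p = np.pad(mask, 1, constant_values=False)
    h, w = mask.shape
    return [p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def erode(mask):
    """3x3 binary erosion: a pixel survives only if its whole window is set."""
    out = np.ones(mask.shape, dtype=bool)
    for v in _neighbor_views(mask):
        out &= v
    return out

def dilate(mask):
    """3x3 binary dilation: a pixel is set if any window pixel is set."""
    out = np.zeros(mask.shape, dtype=bool)
    for v in _neighbor_views(mask):
        out |= v
    return out

def open_mask(mask, iterations=1):
    """Morphological opening (erosion then dilation), applied iteratively
    per 2D slice to strip thin non-brain structures (illustrative sketch)."""
    for _ in range(iterations):
        mask = dilate(erode(mask))
    return mask
```

After processing the sagittal, coronal, and axial stacks this way, the final volume would be combined per voxel as `brain3d = sag | cor | axi`, mirroring the OR step in the abstract.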
Speaker location detection in video conferencing is generally audio-based. However, the physical room environment, which is beyond the control of the speaker detection system, can severely change the room acoustics. Room acoustics introduce interference and can degrade the performance of audio-based speaker detection systems. In this paper, we propose a video-based speaker detection method that can be used independently or alongside audio-based detection systems. The speaker location information is intended to drive 3-dimensional audio reproduction, making the video conference more realistic. In the proposed method, we detect moving lips in video sequences: we first detect lips using color information and then determine whether the lips are moving. Experiments with real videos provide promising results.
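The two-stage idea (color-based lip detection, then a motion test) can be sketched as below. The red-over-green rule and both thresholds are hypothetical placeholders for illustration; the paper's actual color model and decision criteria are not given in the abstract.

```python
import numpy as np

def lip_mask(frame_rgb):
    """Very rough lip-color mask: lip pixels tend to have red dominant
    over green. The threshold of 30 is an illustrative assumption."""
    r = frame_rgb[..., 0].astype(float)
    g = frame_rgb[..., 1].astype(float)
    return (r - g) > 30

def lips_moving(prev, curr, min_changed=20):
    """Flag the person as speaking when enough lip-colored pixels change
    between consecutive frames (min_changed is an assumed parameter)."""
    changed = np.logical_xor(lip_mask(prev), lip_mask(curr))
    return bool(changed.sum() >= min_changed)
```

In a multi-party view, running this per face region would indicate which participant is speaking, and that location could then steer the 3-dimensional audio reproduction.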
In this paper, we propose an efficient and fast feature extraction method for iris recognition using wavelet transforms. Wavelet transforms have good space-frequency localization, and a number of fast algorithms exist for computing them. In particular, the coefficients of the lowest-frequency band, which reflect the characteristics of the whole iris pattern, are used as a feature vector. However, a major problem in iris recognition is that noise such as eyelids, eyebrows, and glints may be included in the iris texture, and such noise adversely affects the performance of iris recognition systems. To address this problem, we divide the iris texture into a number of sub-regions, apply the wavelet transform separately to each sub-region, and extract a feature vector from each sub-region. In the matching module, we discard the sub-regions with the largest differences in order to exclude potential noise. Experiments were performed using 3136 eye images acquired from 94 individuals. The results show that the performance of the proposed method is comparable to that of a Gabor-transform-based method, and that region division noticeably improves the recognition performance of both methods. Notably, however, the proposed method requires much less processing time than the Gabor-based method.
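The sub-region scheme can be sketched as follows. A repeated 2x2 averaging stands in for the lowest-frequency (LL) band of a Haar decomposition; the grid size, decomposition depth, distance measure, and number of discarded regions are all illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def haar_ll(img, levels=2):
    """Lowest-frequency (LL) band of a repeated Haar decomposition,
    computed as 2x2 block averaging per level (sketch)."""
    x = np.asarray(img, dtype=float)
    for _ in range(levels):
        x = (x[0::2, 0::2] + x[0::2, 1::2] +
             x[1::2, 0::2] + x[1::2, 1::2]) / 4.0
    return x

def region_features(iris, grid=(2, 4), levels=2):
    """Split the iris texture into sub-regions and extract one LL feature
    vector per region, so noisy regions can be dropped at matching time."""
    h, w = iris.shape
    gh, gw = grid
    feats = []
    for i in range(gh):
        for j in range(gw):
            sub = iris[i * h // gh:(i + 1) * h // gh,
                       j * w // gw:(j + 1) * w // gw]
            feats.append(haar_ll(sub, levels).ravel())
    return feats

def match_score(f1, f2, discard=2):
    """Per-region distances; drop the `discard` largest so sub-regions
    corrupted by eyelids, eyebrows, or glints do not dominate the score."""
    d = sorted(np.linalg.norm(a - b) for a, b in zip(f1, f2))
    return float(sum(d[:len(d) - discard]))
```

Discarding the worst sub-regions localizes the damage from occlusions: a corrupted region raises only its own distance, which is then excluded rather than contaminating the whole match.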