We propose a support-vector-machine (SVM) tree that learns hierarchically from domain knowledge represented by low-level features, toward automatic classification of sports videos. The proposed SVM tree adopts a binary tree structure to exploit the binary nature of SVM classification: each internal node is a single SVM learning unit, and each external node represents a classified output type. Such an SVM tree offers a number of advantages: (1) low computing cost; (2) integrated learning and classification that preserves the learning strength of each individual SVM; and (3) flexibility in both structure and learning modules, in that different numbers of nodes and features can be added to address specific learning requirements, and other learning models, such as neural networks, AdaBoost, hidden Markov models, and dynamic Bayesian networks, can be added as individual nodes. Experiments show that the proposed SVM tree achieves good performance in sports video classification.
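The tree structure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Node`, `Leaf`, `linear_unit`, and `classify` names are hypothetical, the class labels are invented, and each internal node uses a hand-set linear decision function as a stand-in for a trained SVM.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

Features = Sequence[float]


@dataclass
class Leaf:
    """External node: a classified output type."""
    label: str


@dataclass
class Node:
    """Internal node: one binary learning unit (an SVM in the abstract)."""
    decide: Callable[[Features], bool]
    if_true: Union["Node", Leaf]
    if_false: Union["Node", Leaf]


def linear_unit(weights: Sequence[float], bias: float) -> Callable[[Features], bool]:
    """Hypothetical stand-in for a trained linear SVM: sign of w.x + b."""
    def decide(x: Features) -> bool:
        return sum(w * v for w, v in zip(weights, x)) + bias >= 0.0
    return decide


def classify(tree: Union[Node, Leaf], x: Features) -> str:
    """Walk from the root to a leaf, making one binary decision per level."""
    node = tree
    while isinstance(node, Node):
        node = node.if_true if node.decide(x) else node.if_false
    return node.label


# Assumed two-feature example with invented labels and hand-set weights.
tree = Node(
    decide=linear_unit([1.0, 0.0], -0.5),
    if_true=Leaf("football"),
    if_false=Node(
        decide=linear_unit([0.0, 1.0], -0.5),
        if_true=Leaf("tennis"),
        if_false=Leaf("swimming"),
    ),
)
```

Because `decide` is just a callable, a node could equally wrap a neural network or another learner, which is the flexibility the abstract points to. Classifying one sample costs at most one decision per tree level.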
We describe a new pose-estimation algorithm that integrates the strengths of empirical mode decomposition (EMD) and mutual information. Mutual information is used to measure the similarity between facial images in order to estimate poses, while EMD is used to decompose input facial images into a number of intrinsic mode function (IMF) components. This decomposition redistributes the effects of noise, expression changes, and illumination variations such that, when the input facial image is described by the selected IMF components, these negative effects are minimized. Extensive experiments were carried out in comparison with representative existing techniques, and the results show that the proposed algorithm achieves better pose-estimation performance, with robustness to noise corruption, illumination variation, and facial expressions.
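The mutual-information similarity measure can be illustrated with a histogram-based estimate. This is a sketch under assumptions: images are flat lists of 8-bit grey levels, the bin count is arbitrary, the EMD stage is omitted entirely, and `estimate_pose` with its reference dictionary is a hypothetical way of using the measure.

```python
import math
from collections import Counter


def mutual_information(img_a, img_b, bins=8, levels=256):
    """Mutual information (in bits) between two equally sized grey-level images."""
    if len(img_a) != len(img_b) or not img_a:
        raise ValueError("images must be non-empty and of equal size")
    # Quantise grey levels into a small number of histogram bins.
    pairs = [(a * bins // levels, b * bins // levels) for a, b in zip(img_a, img_b)]
    n = len(pairs)
    joint = Counter(pairs)
    hist_a = Counter(a for a, _ in pairs)
    hist_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts.
        mi += (count / n) * math.log2(count * n / (hist_a[a] * hist_b[b]))
    return mi


def estimate_pose(probe, references):
    """Return the label of the reference image sharing the most information with the probe."""
    return max(references, key=lambda label: mutual_information(probe, references[label]))
```

For example, with `references = {"frontal": [0, 0, 255, 255], "profile": [0, 255, 0, 255]}`, the probe `[0, 0, 255, 255]` is assigned `"frontal"`, since its mutual information with the identical image is 1 bit and with the other image is 0.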
We describe a new algorithm to construct pseudo-3-D videos out of conventional 2-D videos at the viewing end, where no additional 3-D information is attached at the source of 2-D video production. We name such constructed videos pseudo-3-D on the grounds that the converted video is not true 3-D but presents a perceptual 3-D effect when viewed with a pair of polarized glasses. The proposed algorithm consists of two video processing stages: (1) semantic video object segmentation; and (2) estimation of disparities. In the first stage, we propose a constrained region-growing and filtering approach to improve existing segmentation techniques based on change detection. This stage ensures that disparities are estimated in terms of semantic video objects rather than textured regions, and thus improves the pseudo-3-D effect in terms of human visual perception. In the second stage, we propose a video object (VO)-size-based disparity estimation to construct additional video frames for the proposed pseudo-3-D video conversion. Experiments demonstrate that the proposed algorithm effectively produces perceptually harmonious pseudo-3-D videos with the advantages of simplicity and low computing cost.
We propose a fast and simple shot-cut detection algorithm that operates directly in the compressed domain and is suitable for real-time implementation. The proposed algorithm exploits existing MPEG techniques by examining the prediction status of each macro-block inside the B and P frames. As a result, both abrupt and dissolved shot cuts are located by a sequence of comparison tests, and no feature extraction or histogram differencing is needed. Although the description of the algorithm is primarily based on MPEG-1 and MPEG-2 streams, the scheme can be readily extended to other video compression standards such as MPEG-4 and H.264 by following the principle of monitoring: (1) the balance between forward prediction and backward prediction; and (2) the boundaries among the P, B, and I frames. Extensive experiments illustrate that the proposed algorithm outperforms similar existing algorithms, providing a useful technique for fast, online video content processing.
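The idea of monitoring the balance between forward and backward prediction can be sketched for abrupt cuts as below. This is an illustrative reading, not the published algorithm: it assumes a parser has already counted forward-predicted and backward-predicted macro-blocks in each B frame between two reference frames, and the `ratio` threshold and function names are hypothetical. B frames before a cut can only be predicted well from the past reference, and B frames after it only from the future reference, so the cut lies where dominance flips.

```python
def dominant_direction(fwd, bwd, ratio=4.0):
    """Classify one B frame by which prediction direction dominates its macro-blocks."""
    if fwd >= ratio * max(bwd, 1):
        return "forward"
    if bwd >= ratio * max(fwd, 1):
        return "backward"
    return "balanced"


def locate_cut(b_frames, ratio=4.0):
    """Locate an abrupt cut among the B frames lying between two reference frames.

    b_frames holds (forward_count, backward_count) per B frame in display order.
    Returns k when the cut falls just before B frame k (k == len(b_frames) means
    just before the following reference frame), or None when no cut is indicated.
    """
    dirs = [dominant_direction(f, b, ratio) for f, b in b_frames]
    if not dirs:
        return None
    k = 0
    while k < len(dirs) and dirs[k] == "forward":
        k += 1
    # A cut shows up as a run of forward-dominated frames followed only by
    # backward-dominated frames; anything else is treated as "no cut".
    if all(d == "backward" for d in dirs[k:]):
        return k
    return None
```

Only integer comparisons on counts already present in the stream are used, which matches the abstract's point that no feature extraction or histogram computation is required. Dissolves would need a separate test that this sketch does not attempt.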
As video databases become increasingly important for full exploitation of multimedia resources, this paper describes our recent feasibility studies towards building a content-based, high-level video retrieval and management system. The study focuses on constructing a semantic tree structure by combining low-level image processing techniques with high-level interpretation of visual content. Specifically, two separate algorithms were developed to organise input videos into two layers: the shot layer and the key-frame layer. While the shot layer is derived by a multi-featured shot cut detection, the key-frame layer is extracted automatically by a genetic algorithm. This paves the way for applying pattern recognition techniques to analyse those key frames and thus extract high-level information to interpret the visual content or objects. Correspondingly, content-based video retrieval can be conducted in three stages. The first stage is to browse the digital video via the semantic tree at the structural level. The second stage is to match the key frames in terms of low-level features, such as colour, shape of objects, and texture. Finally, the third stage is to match the high-level information, such as a conversation with an indoor background or moving vehicles along a seaside road. Extensive experiments are reported in this paper for shot cut detection and key frame extraction, enabling the tree structure to be constructed.
We describe a texture description algorithm, designed in the wavelet domain, to reduce the dimension of an existing texture-based feature vector and improve on the existing texture description algorithm in terms of both effectiveness and efficiency. We first demonstrate that the estimated texture directions at different wavelet decomposition scales are very similar. Thus, a new texture description with explicit direction representation can be constructed to improve discrimination capability. Second, we propose a subband integration scheme to further improve the texture description and achieve robustness to rotation of texture patterns. Third, a range of successful texture description elements developed in the pixel domain are applied to the LL subband and added to the texture descriptor to further enhance the proposed algorithm. Extensive testing, benchmarked against existing techniques, shows that the proposed algorithm not only reduces the sensitivity of retrieval to image texture rotation, but also improves the retrieval accuracy.
Conventional pointwise adaptive contrast enhancement is very effective in adjusting the contrast of medical images with large regional brightness differences. However, the pointwise algorithm is computationally intensive, thus limiting its practical application. Two existing stepwise algorithms cut down the computational overhead by adopting regional interpolation and averaging techniques respectively. Based on these two stepwise algorithms, we describe an adaptive contrast enhancement in which a new kind of variable image filter suitable for array multiplication is employed. Experiments show that the new algorithm can avoid the generation of blocking effects and effectively reduce image distortion in enhancement. Moreover, the speed of the algorithm can be adjusted according to different requirements.
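For orientation, the conventional pointwise scheme that this abstract starts from is commonly written as: output = local mean + gain × (pixel − local mean). The sketch below assumes that form with a constant gain and a square window; it is the baseline only and does not reproduce the variable image filter proposed in the abstract. The function names and default parameters are hypothetical.

```python
def local_mean(img, r, c, radius):
    """Mean grey level in a square window centred on (r, c), clipped at the borders."""
    rows, cols = len(img), len(img[0])
    values = [
        img[i][j]
        for i in range(max(0, r - radius), min(rows, r + radius + 1))
        for j in range(max(0, c - radius), min(cols, c + radius + 1))
    ]
    return sum(values) / len(values)


def pointwise_ace(img, radius=1, gain=2.0, lo=0, hi=255):
    """Amplify each pixel's deviation from its own local mean (one window per pixel)."""
    out = []
    for r, row in enumerate(img):
        out_row = []
        for c, value in enumerate(row):
            m = local_mean(img, r, c, radius)
            enhanced = m + gain * (value - m)
            out_row.append(min(hi, max(lo, round(enhanced))))
        out.append(out_row)
    return out
```

Every output pixel needs its own window statistics, which is the computational burden the abstract refers to. The stepwise variants reduce it by computing those statistics on a coarser grid and interpolating or averaging.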
The increasing availability and use of online video has led to a high demand for accurate and efficient automated video analysis techniques. Previous research has concentrated on segmenting videos by detecting the boundaries between camera shots. Shot cut detection is the first step in every video indexing and retrieval algorithm. A number of automated shot-change detection methods for indexing a video sequence to facilitate browsing and retrieval have been proposed in recent years. However, there is no internationally agreed dataset that can be used as a benchmark for evaluating emerging techniques. In this paper, a new algorithm is proposed for shot cut detection. The detection algorithm consists of three major stages: the morphological Hit-Miss transform is applied in the first stage, the watershed transform is applied next, and finally feature extraction is carried out. To enable comparison with previous work, the dataset used for this new algorithm is also applied to the technique introduced by T. Y. Liu et al. (2004). Our algorithm shows a remarkable difference and provides better recall and precision rates.
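Recall and precision for shot cut detection are usually computed by matching detected boundaries against a ground-truth list. The sketch below shows one common way of doing so; the frame `tolerance` and the one-to-one greedy matching are assumptions, not details taken from the abstract.

```python
def precision_recall(detected, truth, tolerance=1):
    """Precision and recall of detected cut positions (frame indices).

    A detection counts as correct when it lies within `tolerance` frames of a
    ground-truth cut that has not already been matched to another detection.
    """
    unmatched = sorted(set(truth))
    detections = sorted(set(detected))
    n_truth = len(unmatched)
    hits = 0
    for d in detections:
        for t in unmatched:
            if abs(d - t) <= tolerance:
                unmatched.remove(t)  # each true cut can be claimed only once
                hits += 1
                break
    precision = hits / len(detections) if detections else 0.0
    recall = hits / n_truth if n_truth else 0.0
    return precision, recall
```

With detections at frames 10, 50, 90, and 130 and true cuts at frames 10, 51, and 200, two detections are correct, giving a precision of 2/4 and a recall of 2/3.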
Block-based motion estimation is widely used in video compression because of its high processing speed and competitive compression efficiency. In the chain of compression operations, however, motion estimation remains the most time-consuming process. As a result, any improvement in fast motion estimation will make practical applications of MPEG techniques more efficient and more sustainable in terms of both processing speed and computing cost. To meet the requirements of real-time compression of videos and image sequences, in applications such as video conferencing, remote video surveillance, and video phones, we propose a new search algorithm that achieves fast motion estimation for MPEG compression standards, building on existing algorithm developments. To evaluate the proposed algorithm, we adopted MPEG-4 and the prediction line search algorithm as the benchmarks in designing the experiments. Performance is measured by: (i) reconstructed video quality; and (ii) processing time. The results reveal that the proposed algorithm provides a competitive alternative to the existing prediction line search algorithm. In comparison with MPEG-4, the proposed algorithm shows significant advantages in terms of processing speed and
We propose a fast partial decoding algorithm for content access to MPEG compressed videos, where full decompression is not necessarily required, such as compressed video browsing, content analysis, and specific pattern search. The proposed decoding bypasses the inverse DCT via an approximation process to extract average pixels directly from compressed DCT coefficients. Although such extracted pixels may incur differences compared with their fully decompressed counterparts, extensive experiments show that such partially decoded video frames preserve their content very well and achieve reasonable perceptual quality in terms of visual inspections.
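The simplest case of extracting average pixels without an inverse DCT uses the DC coefficient alone: with the 8×8 DCT normalisation used by JPEG and MPEG, the DC term equals eight times the block mean. The sketch below checks that identity; it assumes dequantised coefficients of intra-coded blocks, and the function names are hypothetical. The approximation in the abstract may use further coefficients to recover finer averages, which is not attempted here.

```python
import math


def dct_coefficient(block, u, v):
    """One coefficient of the 8x8 DCT-II in the JPEG/MPEG normalisation."""
    cu = 1 / math.sqrt(2) if u == 0 else 1.0
    cv = 1 / math.sqrt(2) if v == 0 else 1.0
    total = 0.0
    for x in range(8):
        for y in range(8):
            total += (
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
            )
    return 0.25 * cu * cv * total


def block_average_from_dc(dc):
    """Mean pixel value of an 8x8 block, read straight from its DC coefficient."""
    return dc / 8.0


def thumbnail_from_dc(dc_grid):
    """One pixel per block: a 1/8-scale frame built without any inverse DCT."""
    return [[block_average_from_dc(dc) for dc in row] for row in dc_grid]


# Arbitrary test block: compare the true mean with the value recovered from DC.
block = [[(3 * x + 5 * y) % 256 for y in range(8)] for x in range(8)]
true_mean = sum(map(sum, block)) / 64
recovered_mean = block_average_from_dc(dct_coefficient(block, 0, 0))
```

Here `dct_coefficient` exists only to produce test data; a partial decoder would read the DC values from the bit stream and call `thumbnail_from_dc` directly.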
The explosive growth of images and videos on the World Wide Web (WWW) is making the Web a huge resource of visual information. Among the various types of multimedia information, still images and dynamic images (video clips) in compressed format are the most widely accepted on the WWW. It is therefore essential to achieve maximum efficiency in transmitting and decoding those compressed images on the Internet. Progressive coding provides a mode that allows a coarse version of an image to be transmitted at a lower bit rate and then gradually refined by subsequent transmissions. Compared with conventional coding, it is more suitable for interactive applications such as those involving JPEG images on the Internet. In this paper, we first give an approximation of the cosine function used in the IDCT for various orders. Based on this approximation and a series analysis, we then develop a progressive decoding scheme that combines successive approximation and spectral selection. The analysis and experiments establish that our proposed method saves significant computational cost in comparison with the existing spectral-selection-based progressive decoding proposed by JPEG. Extensive experiments carried out to evaluate the proposed algorithm reveal that the reconstructed images, even at the lowest bit rate and with a lower-order approximation, can still achieve encouraging PSNR values.
In this paper, we describe our recent work on improving motion estimation and compensation performance in MPEG without introducing any significant computing cost. To achieve low complexity, MPEG uses a fixed classification, or allocation, of I-, B-, and P-frames for motion estimation and compensation. By introducing an adaptive mechanism that connects the frame content and the analysis of previous records with the arrangement of video frames, the classification inside the video stream can be determined according to the nature of each frame's content in relation to its neighboring video frames. During this process, records of motion estimation and compensation for previous frames are taken into consideration to form a basis for analysis. As a result, a histogram of the major directions is formulated to allow the information flow to be analyzed before the frame is classified. Thresholds are then applied along the analyzed direction, and the classification of each video frame as an I-, B-, or P-frame is determined on the fly. Hence, the performance of motion estimation and compensation can be improved across the whole compression system. Extensive experiments are carried out to support our work, and the results show that our proposed method outperforms the existing MPEG-2 scheme in terms of mean squared error.
Following the successful launch of a series of MPEG standards on video compression, their applications reveal an ever-increasing need for content access without full decompression, that is, in the compressed format. To this end, we further investigated a number of partial decoding schemes to address the issue of efficient content access to compressed video streams. By controlling the number of DCT coefficients involved in the inverse DCT, a number of partial decoding schemes can be designed featuring fast processing speed and low computing cost. By controlling the size of video frames, perceptual quality can be adjusted to suit various applications, including thumbnail image browsing, low-resolution image processing, head tracking, skin detection, face recognition, and object segmentation, where full-resolution frames are often not required. While reducing computing cost and improving processing speed, our work also features: (i) reasonably good image quality for content browsing; (ii) compatibility with original MPEG-2 bit streams; and (iii) enormous potential for further application of MPEG-2 in video content management, content-based video frame retrieval, compressed video editing, and low bit-rate video communication such as that involving mobile phones and telephone networks. In addition, extensive experiments were carried out and are reported to support our design.
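One well-known way to trade coefficients for frame size is to keep only the lowest-frequency DCT coefficients and run a correspondingly smaller inverse DCT. The 1-D sketch below illustrates this under assumptions: an orthonormal DCT-II, hypothetical function names, and no claim that these are the specific schemes investigated in the abstract. A 2-D version would apply the same step to rows and then columns.

```python
import math


def _scale(k, n):
    """Orthonormal DCT-II scaling factor for frequency index k and length n."""
    return math.sqrt((1.0 if k == 0 else 2.0) / n)


def dct_1d(samples):
    """Orthonormal DCT-II, used here only to produce test coefficients."""
    n = len(samples)
    return [
        _scale(k, n)
        * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i, v in enumerate(samples))
        for k in range(n)
    ]


def idct_1d(coeffs):
    """Inverse of dct_1d for a coefficient list of any length."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for k, c in enumerate(coeffs))
        for i in range(n)
    ]


def reduced_decode(coeffs, keep):
    """Decode `keep` samples from the `keep` lowest-frequency coefficients.

    The gain sqrt(keep / n) compensates for the different normalisation of the
    shorter transform, so that a flat signal keeps its grey level.
    """
    gain = math.sqrt(keep / len(coeffs))
    return idct_1d([c * gain for c in coeffs[:keep]])
```

Calling `reduced_decode(coeffs, 4)` on an 8-point row yields a half-resolution row, while `reduced_decode(coeffs, 1)` returns the row mean. The cost of the inverse transform shrinks with `keep`, which is the speed and quality trade-off the abstract describes.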
A content-adaptive near-lossless image codec is designed in this paper to accommodate a visual-perception-based rate control scheme. The near-lossless codec is a modified version of JPEG-LS, in which the modification is typified by the addition of: (i) two levels of information loss distribution; (ii) line-fitting target prediction; and (iii) visual-perception-based rate control, implemented via quantization and redistribution of distortion. To minimise the excessive information loss that may be introduced during the rate control process, a technique called moving-across-arrays is proposed to arrange the adjustment of distortion levels. Combined with the line-fitting-based target prediction, the proposed scheme successfully controls the compression ratio within a limited range of targets, while the decoded image quality remains competitive with that of JPEG-LS without rate control. Extensive experiments show that the proposed scheme achieves effective rate control for a range of image samples and competitive image quality in terms of visual perception.
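The near-lossless mode of standard JPEG-LS, on which this codec builds, quantises each prediction error so that no reconstructed pixel differs from the original by more than a bound usually called NEAR. The sketch below shows that baseline step only. The function names are hypothetical, and the two-level loss distribution, line-fitting prediction, and moving-across-arrays adjustment from the abstract are not reproduced.

```python
def quantize_error(err, near):
    """Uniformly quantise a prediction error with step 2*near + 1."""
    step = 2 * near + 1
    if err > 0:
        return (near + err) // step
    return -((near - err) // step)


def dequantize_error(q, near):
    """Reconstruct the prediction error from its quantised index."""
    return q * (2 * near + 1)


def max_reconstruction_error(near, span=50):
    """Worst-case absolute error over prediction errors in [-span, span]."""
    return max(
        abs(e - dequantize_error(quantize_error(e, near), near))
        for e in range(-span, span + 1)
    )
```

Setting `near = 0` gives lossless coding, and larger values shrink the symbol alphabet and hence the bit rate. A rate controller could therefore steer the compression ratio by varying the distortion level across the image, which is the general lever the abstract relies on.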
The rate control phase in MPEG-2 is crucial for the encoding of video sequences for two reasons: first, it ensures timely delivery of video without buffer overflows or underflows; second, it indirectly determines the encoded video quality by moderating the quantization parameter on a macro-block basis. We propose a novel robust estimate that combines local activity estimates with the average activity of the previously encoded frame to improve the rate-distortion performance of MPEG-2. We then propose a family of exponential modulators for reducing the "over-normalization" effect, which occurs when the activity of the macro-block to be encoded is higher than the activity of the previously encoded frame. Extensive experiments show that the proposed low-complexity schemes outperform MPEG-2 in terms of PSNR values for the same number of bits produced. We report increases of up to 5 dB for the luminance component and up to 3.5 dB and 3 dB for the two chrominance components, respectively.
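For context, the adaptive quantization step of the MPEG-2 Test Model 5 rate control, as it is commonly described, normalises each macro-block's activity against the average activity of the previously encoded frame. The sketch below shows that baseline only; it is not the robust estimate or the exponential modulators proposed in the abstract, and the function names are hypothetical.

```python
def normalized_activity(act, avg_act):
    """TM5-style normalised activity; always lies between 0.5 and 2."""
    return (2.0 * act + avg_act) / (act + 2.0 * avg_act)


def macroblock_quantizer(q_ref, act, avg_act):
    """Modulate the reference quantiser by activity and clip to the range 1..31."""
    return min(31, max(1, round(q_ref * normalized_activity(act, avg_act))))
```

A macro-block much busier than the previous frame's average pushes the factor toward 2, coarsening its quantisation. When the previous frame's average is a poor predictor of the current content, that push can be excessive, which is the "over-normalization" effect the abstract sets out to reduce.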
Conventional stereo image coders employ disparity-compensated prediction followed by transform coding of the residual to encode the second image of a stereo pair. However, transform coders, such as the DCT, are generally not efficient in coding the residuals. In order to improve the coding gain, subspace projection techniques have been proposed in the literature. The idea is to apply a transform to each block, Rb, in the right image in such a way that it exploits stereo and spatial redundancy simultaneously. The transformation is chosen to be a reduced-order operator that projects block Rb onto a subspace spanned by a block-dependent vector and a set of fixed vectors. Building on that work, we propose a novel local-texture-adaptive technique that selects between two sets of fixed polynomial vectors to improve the prediction. The choice of this adaptive technique was motivated by the distinctively different orientations of pixel value variation trends that are often present in natural scenes. Extensive experimental results indicate that the proposed technique outperforms the existing techniques in terms of both compression efficiency and reconstructed image quality. In particular, the proposed algorithm performs well on natural scenes, where most stereo image compression techniques perform sub-optimally.
In this paper, we propose a novel object-driven, block-based algorithm for the compression of stereo image pairs. The algorithm combines the simplicity and adaptability of existing block-based stereo image compression techniques with an edge/contour-based object extraction technique to determine an appropriate compression strategy for various areas of the right image. Extensive experiments show that the proposed algorithm achieves improvements of up to 20% in compression ratio compared with existing stereo image compression techniques, while the reconstructed image quality is maintained at an equivalent level in terms of PSNR values. In terms of visual quality, the right image reconstructed by the proposed algorithm shows no noticeable difference from the outputs of the best algorithms. The proposed algorithm performs object extraction and matching between the reconstructed left frame and the original right frame to identify those objects that match but are displaced by varying amounts due to binocular parallax. Different coding strategies are then applied separately to the internal areas and the bounding areas of each identified object. Based on the mean squared matching error of the internal blocks and a selected threshold, a decision is made whether or not to encode the prediction errors inside these objects. The output bit stream includes entropy coding of the object disparity, the block disparity, and possibly some prediction errors for blocks that fail to meet the threshold requirement of the proposed algorithm.
In this paper, we describe a genetic learning neural network system that vector quantizes images directly to achieve data compression. The genetic learning algorithm is designed to have two levels: one is the level of code words, in which each neural network is updated through reproduction every time an input vector is processed; the other is the level of code-books, in which five neural networks are included in the gene pool. Extensive experiments on a group of image samples show that the genetic algorithm outperforms other vector quantization algorithms, including competitive learning, frequency-sensitive learning, and LBG.
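As background for the comparison in this abstract, vector quantization maps each input vector to the index of its nearest code word. Plain competitive learning, one of the baselines named above, trains the code-book by nudging the winning code word toward each input. The sketch below shows only that baseline with hypothetical names and an arbitrary learning rate; the genetic reproduction step and the five-network gene pool are not modelled.

```python
def nearest(codebook, vec):
    """Index of the code word closest to vec in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def competitive_update(codebook, vec, rate=0.1):
    """Move only the winning code word a fraction `rate` of the way toward vec."""
    winner = nearest(codebook, vec)
    codebook[winner] = [c + rate * (v - c) for c, v in zip(codebook[winner], vec)]
    return winner


def encode(codebook, vectors):
    """Compress: replace every vector by the index of its nearest code word."""
    return [nearest(codebook, v) for v in vectors]


def decode(codebook, indices):
    """Decompress: look each index up in the code-book."""
    return [list(codebook[i]) for i in indices]
```

Compression comes from transmitting the indices instead of the vectors: with a code-book of 256 code words and 4×4 pixel blocks, each 16-pixel block is represented by a single byte.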