H.264/AVC Scalable Video Coding (SVC) is an emerging video coding standard developed by the Joint Video Team
(JVT) that supports multiple scalability features. With these scalabilities, SVC video data can be easily adapted to the
characteristics of heterogeneous networks and various devices. Furthermore, SVC is required to achieve coding
efficiency competitive with, or better than, single-layer H.264/AVC. Motion prediction at the level of Fine Grain
Scalability (FGS) enhancement layers, together with inter-layer motion prediction, was proposed to improve coding
efficiency. However, removing an FGS enhancement layer that is used for inter-layer motion prediction causes
significant visual errors due to encoder-decoder mismatches of motion vectors and macroblock (MB) modes. In this
paper, we analyze these visual errors to identify their cause and derive a method for reducing them. Experimental
results showed that the proposed method allowed SVC bitstreams to be decoded with reduced visual errors, even when
the FGS enhancement layer used for inter-layer motion prediction was removed.
SVC (Scalable Video Coding) is being standardized to provide outstanding coding efficiency together with scalability
functions. SVC supports spatial, temporal, and SNR scalability, and these scalabilities make it possible to provide
smooth video streaming even over a time-varying network such as a mobile environment. However, current SVC is
insufficient to support dynamic scalability conversion, so bitrate adaptation to fluctuating network conditions is
limited. In this paper, we propose dynamic full-scalability conversion methods for QoS-adaptive video streaming in
SVC. To accomplish dynamic full-scalability conversion, we develop corresponding bitstream extraction, encoding,
and decoding schemes. At the encoder, we insert IDR NAL units periodically to solve the problems of spatial
scalability conversion. At the extractor, we analyze the SVC bitstream to obtain the information that enables dynamic
extraction; real-time extraction is achieved by using this information. Finally, we develop the decoder so that it can
manage the changing scalability. Experimental results verified dynamic full-scalability conversion and showed that it
is necessary under time-varying network conditions.
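The extraction step above operates on the layered structure of the SVC bitstream. As a minimal sketch (not the paper's extractor), the following shows how the dependency/quality/temporal identifiers carried in the NAL unit header SVC extension (per H.264/AVC Annex G) can be read and used to drop enhancement NAL units above a target operating point; the byte layout follows the standard, but the surrounding filtering logic is illustrative:

```python
def parse_dtq(nal: bytes):
    """Read (dependency_id, temporal_id, quality_id) from an SVC NAL unit.

    Applies only to NAL unit types 14 (prefix NAL) and 20 (coded slice in
    scalable extension), whose header carries a 3-byte SVC extension.
    Returns None for plain AVC (base-layer) NAL units.
    """
    nal_unit_type = nal[0] & 0x1F
    if nal_unit_type not in (14, 20):
        return None
    dependency_id = (nal[2] >> 4) & 0x07   # 3 bits
    quality_id = nal[2] & 0x0F             # 4 bits
    temporal_id = (nal[3] >> 5) & 0x07     # 3 bits
    return dependency_id, temporal_id, quality_id


def extract(nals, max_d, max_t, max_q):
    """Keep base-layer NAL units and SVC NAL units at or below the
    target (D, T, Q) operating point -- an illustrative extractor."""
    kept = []
    for nal in nals:
        dtq = parse_dtq(nal)
        if dtq is None or (dtq[0] <= max_d and dtq[1] <= max_t and dtq[2] <= max_q):
            kept.append(nal)
    return kept
```

Because the operating point is just a (D, T, Q) comparison per NAL unit, such filtering can run in real time while the network condition varies.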
Scalable Video Coding (SVC) is one of the promising techniques for ensuring Quality of Service (QoS) in multimedia
communication over heterogeneous networks. SVC compresses a raw video into multiple bitstreams composed of a
base bitstream and enhancement bitstreams to support multiple scalabilities: SNR, temporal, and spatial. Therefore, an
appropriate bitstream can be extracted from the original coded bitstream without re-encoding, adapting the video to the
user's environment. In this flexible environment, QoS has emerged as an important issue for service acceptability, so
there has been a need to measure the degree of video quality in order to guarantee the quality of a video streaming
service. Existing studies on video quality metrics have mainly focused on temporal and SNR scalability.
In this paper, we propose an efficient quality metric that accounts for spatial scalability as well as temporal and SNR
scalability. To this end, we study the effects of frame rate, SNR, spatial scalability, and motion characteristics through
subjective quality assessment, and then propose a new video quality metric supporting full scalability. Experimental
results show that this quality metric has a high correlation with subjective quality. Because the proposed metric can
measure the degree of video quality as scalability varies, it can play an important role at the extraction point in
determining the quality of SVC.
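The paper's actual metric is fitted to subjective tests and is not reproduced here; as a purely hypothetical sketch of the general form such a full-scalability metric can take, the function below combines an SNR term with penalty factors for temporal and spatial down-sampling, weighting the frame-rate penalty by motion activity. All coefficients and the functional form are illustrative assumptions:

```python
import math


def quality_index(psnr_db, frame_rate, scale_factor, motion_activity,
                  full_rate=30.0, a=0.1, b=0.5):
    """Hypothetical full-scalability quality index (illustrative only).

    Combines a normalized SNR term with penalty factors for temporal
    down-sampling (frame-rate drop, weighted by motion activity in [0, 1])
    and spatial down-sampling (scale_factor in (0, 1]). The coefficients
    a and b are made-up placeholders, not the paper's fitted values.
    """
    snr_term = psnr_db / 50.0                                   # PSNR mapped to ~[0, 1]
    temporal = math.exp(-a * motion_activity * (full_rate / frame_rate - 1.0))
    spatial = scale_factor ** b
    return snr_term * temporal * spatial
```

The multiplicative form captures one of the effects the abstract mentions: the same frame-rate reduction costs more perceived quality for high-motion content than for low-motion content.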
In this paper, we propose a new functionality for Scalable Video Coding: the support of multiple regions of interest (ROIs). SVC targets the flexible extraction of a sub-bitstream from the original SVC bitstream, and it is being discussed in the MPEG committee for standardization. A region of interest is an area that is semantically important to a particular user, and a bitstream containing the ROI is expected to be extractable without any transcoding operations. In many cases, however, the user may want to see more than one ROI at the same time, and the existence of multiple ROIs creates difficulties in extracting a bitstream that contains more than one ROI. In this paper, we present solutions to address these difficulties.
In the modern digital broadcasting environment, broadcast content filtering can provide a useful function: a TV viewer can find or store personally desired scenes from programs on multiple channels, even while watching a program on another channel. To achieve this filtering function in live broadcast, real-time processing is essential. In this paper, a broadcast content filtering algorithm is proposed, and the system requirements for real-time content filtering are analyzed. To achieve real-time content processing, a buffer control algorithm is proposed as well. The usefulness of broadcast content filtering is demonstrated with experiments on a test-bed system.
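The buffer control problem can be illustrated with a generic bounded-buffer sketch (an illustration under assumptions, not the paper's algorithm): when the filtering stage cannot keep up with the live multi-channel input, the oldest unprocessed frames are discarded so that processing latency stays bounded:

```python
from collections import deque


class FilteringBuffer:
    """Illustrative bounded buffer between a live broadcast feed and a
    content-filtering stage. Dropping the oldest frame on overflow keeps
    the filter operating on recent content in real time."""

    def __init__(self, capacity):
        self.buf = deque()
        self.capacity = capacity
        self.dropped = 0          # count of frames sacrificed to stay real-time

    def push(self, frame):
        if len(self.buf) >= self.capacity:
            self.buf.popleft()    # drop oldest unprocessed frame
            self.dropped += 1
        self.buf.append(frame)

    def pop(self):
        return self.buf.popleft() if self.buf else None
```

A real system would also throttle the analysis granularity when `dropped` grows, but that policy is deployment-specific.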
With the growth of multimedia content on the internet, people need to handle large amounts of multimedia content on the web as well as in e-mail. Visual data mining is needed to find appropriate visual data among such large collections. However, editing, which is common on the web, alters the features of visual data and causes false retrievals in current visual mining systems. In this paper, we propose an improved visual mining method that detects and reduces image editing effects.
We propose a video genre classification method using multimodal features. The proposed method is applicable to the preprocessing stage of automatic video summarization and to the retrieval and classification of broadcast video content. Through a statistical analysis of low-level and mid-level audio-visual features, the proposed method achieves good performance in classifying several broadcast genres such as cartoon, drama, music video, news, and sports. In this paper, we adopt MPEG-7 audio-visual descriptors as multimodal features of video content and evaluate classification performance by feeding the features into a decision-tree-based classifier trained by CART. The experimental results show that the proposed method can recognize several broadcast video genres with high accuracy, and that classification performance with multimodal features is superior to that with unimodal features.
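CART builds its tree by repeatedly choosing the split that minimizes weighted Gini impurity. As a minimal self-contained illustration of that splitting rule (the feature values and genre labels in the test are invented toys, not the MPEG-7 descriptors used in the paper):

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(X, y):
    """Exhaustively choose the (feature, threshold) pair minimizing the
    weighted Gini impurity of the two children -- the core CART rule.
    X is a list of feature rows; y the corresponding class labels."""
    best, best_score = None, float("inf")
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= thr]
            right = [y[i] for i, row in enumerate(X) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best_score, best = score, (f, thr)
    return best
```

A full CART classifier applies this rule recursively to each child node until the leaves are pure or a stopping criterion is met.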
We propose a robust video segmentation algorithm for video summarization. Exact shot boundary detection and segmentation of video into meaningful scenes are essential for automatic video summarization. In this paper, we present shot boundary detection using audio and visual features defined in MPEG-7, the standard for multimedia content description. By using a Hidden Markov Model (HMM) classifier based on statistics of the audio and visual features, exact shot boundaries are detected and over-segmentation, a common problem in automatic video segmentation, is reduced.
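An HMM classifier of this kind decodes a per-frame state sequence from observed feature symbols. The following is a minimal log-domain Viterbi sketch with two states, 'shot' and 'boundary', over quantized frame-difference observations; all probabilities are invented for illustration and would in practice be estimated from the MPEG-7 feature statistics:

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Minimal log-domain Viterbi decoder: returns the most likely state
    sequence for the observation sequence `obs`."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    path = {s: [s] for s in states}
    for o in obs[1:]:
        V.append({})
        new_path = {}
        for s in states:
            prob, prev = max(
                (V[-2][p] + math.log(trans_p[p][s]) + math.log(emit_p[s][o]), p)
                for p in states
            )
            V[-1][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return path[best]
```

Because the transition model makes isolated 'boundary' frames cheap but long boundary runs expensive, decoding the whole sequence jointly suppresses the spurious extra cuts that per-frame thresholding produces, which is how the HMM reduces over-segmentation.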
We propose a multi-agent platform for an interactive broadcasting system based on the MPEG-7 and TV-Anytime standards. In the system, an intelligent agent technique is adopted from FIPA, which provides software standards for interaction among heterogeneous agents. Using the MDS (Multimedia Description Scheme) of MPEG-7 and the TV-Anytime standard, interactive functions such as user preference handling and audio/video content summarization are applied. In this paper, we propose a technique that uses multimodal features as well as multiple MPEG-7 features, and we evaluate MPEG-7-based video summarization and filtering on top of the intelligent agent platform.