The ability to assess the quality of new multimedia tools and applications relies heavily on the perception of the end user. To quantify this perception, subjective tests are required to evaluate the effectiveness of new technologies. However, the standard for subjective user studies requires a highly controlled test environment and is costly in terms of both money and time. To circumvent these issues we utilize crowdsourcing platforms such as CrowdFlower and Amazon's Mechanical Turk. The reliability of the results depends on factors that are not controlled and can be considered “hidden”. We use a pre-test survey to collect responses from subjects that reveal some of these hidden factors. Using statistical analysis, we build a parameterized model that allows for proper adjustment of the collected test scores.
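The adjustment step can be illustrated with a toy sketch. The additive per-group bias model, the group names, and the reference mean are illustrative assumptions, not the paper's actual parameterized model.

```python
# Hypothetical sketch: adjust crowdsourced opinion scores using a bias
# estimated per hidden-factor group (e.g., self-reported viewing device).
# The additive-bias model and values below are illustrative assumptions.

def group_bias(scores_by_group, reference_mean):
    """Mean deviation of each group's scores from a trusted reference mean."""
    return {g: sum(s) / len(s) - reference_mean for g, s in scores_by_group.items()}

def adjust(score, group, biases):
    """Remove the estimated group bias from a raw score."""
    return score - biases.get(group, 0.0)

raw = {"phone": [3.0, 3.4, 3.2], "desktop": [4.0, 4.2, 4.1]}
biases = group_bias(raw, reference_mean=3.7)
adjusted_phone = adjust(3.2, "phone", biases)
```

A survey answer (here, device type) selects the bias that is subtracted from each raw score before the scores are pooled.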
This paper presents a system to detect and extract identifiable information, such as license plates, make, model, color, and bumper stickers, from vehicles. The goal of this work is to develop a system that automatically describes a vehicle just as a person would. This information can be used to improve traffic surveillance systems. The presented solution relies on efficient segmentation and the structure of license plates to identify and extract information from vehicles. The system was evaluated on videos captured on Florida highways and is expected to work in other regions with little or no modification. Results show that the license plate was successfully segmented in 92% of the cases, the make and model of the car in 93% of the cases, and bumper stickers in 92.5% of the cases. Overall recognition accuracy was 87%.
Nowadays it is very hard to find available spots in public parking lots and even harder in public facilities such as
universities and sports venues. A system that provides drivers with parking availability and parking lot occupancy will
allow users to find a parking space much more easily and quickly. This paper presents a system for automatic parking lot
occupancy computation using motion tracking. Methods for complexity reduction are presented. The system showed
approximately 96% accuracy in determining parking lot occupancy. We showed that by optimizing the resolution and
bitrate of the input video, we can reduce the complexity by 70% and still achieve over 90% accuracy. The results
showed that high quality video is not necessary for the proposed algorithm to obtain accurate results.
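A minimal sketch of the occupancy idea, assuming per-spot regions of interest and simple frame differencing (the actual system uses motion tracking); frames, ROIs, and thresholds below are made up for illustration:

```python
# Illustrative sketch of occupancy from pixel change per parking spot.
# Frames are grayscale 2-D lists; spot ROIs and the change threshold are
# invented values, not the paper's tracker.

def spot_changed(prev, curr, roi, thresh=30, frac=0.5):
    """A spot's state changes when enough pixels differ between frames."""
    x0, y0, x1, y1 = roi
    pixels = [(r, c) for r in range(y0, y1) for c in range(x0, x1)]
    changed = sum(1 for r, c in pixels if abs(curr[r][c] - prev[r][c]) > thresh)
    return changed / len(pixels) >= frac

def occupancy(prev, curr, rois, occupied):
    """Toggle each spot whose ROI shows significant change; return count."""
    state = list(occupied)
    for i, roi in enumerate(rois):
        if spot_changed(prev, curr, roi):
            state[i] = not state[i]
    return state, sum(state)
```

Because only the ROI pixels are examined, lowering the input resolution shrinks the per-frame work roughly in proportion, which is consistent with the complexity-reduction result above.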
This paper presents a method for perceptual video compression that exploits the phenomenon of backward temporal
masking. We present an overview of visual temporal masking and discuss models to identify portions of a video
sequence masked due to this phenomenon exhibited by the human visual system. A quantization control model based
on the psychophysical model of backward visual temporal masking was developed. We conducted two types of
subjective evaluations and demonstrated that the proposed method achieves up to 10% bitrate savings on top of a state-of-the-art
encoder with visually identical video. The proposed methods were evaluated using an HEVC encoder.
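The quantization control idea can be sketched as follows, under the simplifying assumptions that scene cuts are found by an activity jump and that a fixed QP offset is applied to a fixed window of frames before each cut; the thresholds are invented, not the paper's psychophysical model.

```python
# Hedged sketch of quantization control based on backward temporal
# masking: frames just before a scene cut are poorly perceived, so they
# can be quantized more coarsely. Cut detection by an activity jump and
# the window/offset values are simplifying assumptions.

def qp_offsets(activity, cut_thresh=100, window=3, offset=4):
    """Return a per-frame QP offset; masked frames get a positive offset."""
    n = len(activity)
    offs = [0] * n
    for i in range(1, n):
        if activity[i] - activity[i - 1] > cut_thresh:  # scene cut at frame i
            for j in range(max(0, i - window), i):      # frames masked backward
                offs[j] = offset
    return offs
```

The positive offsets raise the quantizer (and save bits) only where the masking model predicts the distortion will not be seen.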
Perceptual video coding methods attempt to improve compression efficiency by discarding visual information not
perceived by end users. Most of the current approaches for perceptual video coding use only visual features, ignoring the
auditory component. Many psychophysical studies have demonstrated that auditory stimuli affect our visual perception.
In this paper we present our study of audio-triggered emotional attention and its applicability to perceptual video
coding. Experiments with movie clips show that the reaction time to detect video compression artifacts was longer when
video was presented with the audio information. The results reported are statistically significant with p=0.024.
A large number of health-related applications are being developed using web infrastructure. Video is increasingly used in healthcare applications to enable communication between patients and care providers. We present a video conferencing system designed for healthcare applications. In the face of network congestion, the system uses role-based adaptation to ensure seamless service. A new web technology, WebRTC, is used to enable seamless conferencing applications. We present the video conferencing application and demonstrate the usefulness of role-based adaptation.
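One way to picture role-based adaptation is as priority-ordered bitrate allocation under congestion; the role names, rates, and greedy policy below are hypothetical, not the paper's actual scheme.

```python
# Minimal sketch of role-based adaptation: under congestion, the available
# bitrate is granted in role-priority order, so the clinically important
# stream (e.g., the patient's video) degrades last. Roles and rates are
# hypothetical values.

PRIORITY = {"patient": 0, "doctor": 1, "observer": 2}

def allocate(available_kbps, participants, full_kbps=500, min_kbps=100):
    """Give each role its full rate while the budget lasts, then the minimum."""
    rates = {}
    budget = available_kbps
    for p in sorted(participants, key=lambda r: PRIORITY[r]):
        rate = full_kbps if budget >= full_kbps else min_kbps
        rates[p] = rate
        budget -= rate
    return rates
```

When bandwidth drops, lower-priority roles fall to the minimum rate first, keeping the session seamless for the roles that matter most.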
Video consumption patterns continue to change with consumers relying more and more on on-demand Internet video
and portable devices rather than traditional TV services. This new form of video service delivery and consumption
makes possible more interactive and social experiences for video consumers, commonly referred to as Social TV
services. This paper presents an overview of technologies and guidelines for the development of Social TV applications.
A prototype using three core technologies (WebRTC, DASH, and WebSocket) was developed to understand the
challenges and demonstrate the feasibility of such applications.
Video carving has become an essential tool in digital forensics, enabling the recovery of deleted video files from hard disks. Processing data to extract videos is a computationally intensive task. In this paper we present two methods to accelerate video carving: one to accelerate fragment extraction, and one to accelerate combining these fragments into video segments. Simulation results show that the complexity of video fragment extraction can be reduced by as much as 75% with minimal impact on the videos recovered.
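The fragment-extraction step can be illustrated by a signature scan over raw bytes. The signatures shown (the MP4 `ftyp` box marker and the AVI `RIFF` header) are real, but the simple boundary logic is an assumption, not the paper's accelerated method.

```python
# Illustrative fragment extraction: scan raw bytes for known container
# signatures and cut candidate fragments between hits.

SIGNATURES = [b"ftyp", b"RIFF"]  # MP4 box type marker, AVI/RIFF header

def _find_all(data, sig):
    """Yield every offset at which `sig` occurs in `data`."""
    o = data.find(sig)
    while o != -1:
        yield o
        o = data.find(sig, o + 1)

def find_fragments(data):
    """Return (offset, length) of spans starting at each signature hit."""
    hits = sorted(o for sig in SIGNATURES for o in _find_all(data, sig))
    return [(h, (hits[i + 1] if i + 1 < len(hits) else len(data)) - h)
            for i, h in enumerate(hits)]
```

Because the scan is a pure byte search, it parallelizes and vectorizes easily, which is the kind of structure an accelerated extractor can exploit.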
Mobile compute environments present a unique set of user needs and expectations that designers must consider. With increased multimedia use in mobile environments, the video encoding methods used within the smartphone market segment are key factors contributing to a positive user experience. Currently available display resolutions and expected cellular bandwidth are major factors the designer must consider when determining which encoding methods should be supported. The desired goal is to maximize the consumer experience, reduce cost, and reduce time to market. This paper presents a comparative evaluation of the quality of user experience when the HEVC and AVC/H.264 video coding standards are used. The goal of the study was to evaluate any improvements in user experience when using HEVC. Subjective comparisons were made between H.264/AVC and HEVC in accordance with the double-stimulus impairment scale (DSIS) as defined by ITU-R BT.500-13. Test environments were based on smartphone LCD resolutions and expected cellular bit rates, such as 200 kbps and 400 kbps. Subjective feedback shows both encoding methods are adequate at a 400 kbps constant bit rate. However, a noticeable consumer experience gap was observed at 200 kbps. H.264 subjective quality was significantly lower for video sequences with multiple moving objects and no single point of visual attraction. Video sequences with a single point of visual attraction or few moving objects tended to have higher H.264 subjective quality.
Cues from the human visual system (HVS) can be used to further optimize compression in modern hybrid video coding platforms. We present work that explores and exploits motion-related attentional limitations. Algorithms for exploiting motion-triggered attention were developed and compared with an MPEG AVC/H.264 encoder using various settings at different bitrate levels. For sequences with high motion activity our algorithm provides up to 8% bitrate savings.
In this paper we present a solution to improve the performance of adaptive HTTP streaming services. The proposed
approach uses a content-aware method to determine whether switching to a higher bitrate can improve video quality. The
proposed solution can be implemented as a new parameter in segment description to enable content switching only in
cases with a meaningful increase in quality. Results of our experiments show clear advantages of using the additional
parameter in a DASH implementation. The proposed approach enables significant bandwidth savings with minimal
decrease in quality. It guarantees an optimal adaptation path in various scenarios, which can be beneficial both for network
providers and end users.
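As a client-side rule, the idea above can be sketched like this. The `quality_gain` field name and the gain threshold are hypothetical stand-ins for the proposed segment-description parameter, not part of the MPD specification.

```python
# Sketch of content-aware up-switching: move to a higher bitrate only when
# the segment description advertises a meaningful quality gain over the
# current representation. Field names and threshold are assumptions.

def should_switch_up(higher_rep, throughput_kbps, min_gain_db=0.3):
    """Up-switch only when bandwidth allows AND the advertised gain is
    meaningful; otherwise stay, saving bandwidth at near-equal quality."""
    return (higher_rep["bitrate"] <= throughput_kbps
            and higher_rep["quality_gain"] >= min_gain_db)
```

For easy content where the higher representation adds little quality, the client stays at the lower bitrate even though bandwidth would permit a switch.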
Games have become important applications on mobile devices. A mobile gaming approach known as remote gaming is being developed to support games on low cost mobile devices. In the remote gaming approach, the responsibility of rendering a game and advancing the game play is put on remote servers instead of the resource constrained mobile devices. The games rendered on the servers are encoded as video and streamed to mobile devices. Mobile devices gather user input and stream the commands back to the servers to advance game play. With this solution, mobile devices with video playback and network connectivity can become game consoles. In this paper we present the design and development of such a system and evaluate the performance and design considerations to maximize the end user gaming experience.
Proc. SPIE. 7543, Visual Information Processing and Communication
In this paper, we show that it is possible to reduce the complexity of Intra MB coding in H.264/AVC based
on a novel chance-constrained classifier. Using pairs of simple mean and variance values, our technique is able
to reduce the complexity of the Intra MB coding process with a negligible loss in PSNR. We present an alternate
machine learning approach to the classification problem. Implementation results
show that the proposed method reduces encoding time to about 20% of that of the reference implementation, with an
average loss of 0.05 dB in PSNR.
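The mean/variance idea can be pictured with a toy decision rule. The fixed threshold below is invented for illustration; the paper learns a chance-constrained classifier rather than hand-picking thresholds.

```python
# Sketch: classify a macroblock as "flat" (likely Intra 16x16) or
# "detailed" (search 4x4 modes) from per-block statistics, avoiding the
# exhaustive mode search. The threshold is an illustrative assumption.

def mb_features(mb):
    """Mean and variance of a macroblock given as a flat list of pixels."""
    m = sum(mb) / len(mb)
    return m, sum((p - m) ** 2 for p in mb) / len(mb)

def intra_mode(mb, var_thresh=50.0):
    """Pick a coding-mode class directly instead of searching all modes."""
    _, var = mb_features(mb)
    return "I16x16" if var < var_thresh else "I4x4"
```

Computing two statistics per macroblock is far cheaper than rate-distortion evaluation of every Intra mode, which is where the time savings come from.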
H.264/AVC encoder complexity is mainly due to the variable block sizes in Intra and Inter frames. This makes
H.264/AVC very difficult to implement, especially for real-time applications and mobile devices. The current
technological challenge is to preserve the compression capacity and quality that H.264 offers while reducing the
encoding time and, therefore, the processing complexity. This paper applies a machine learning technique to video
encoding mode decisions and investigates ways to improve the process of generating more general low-complexity
H.264/AVC video encoders. The proposed H.264 encoding method decreases the complexity of the mode decision
for Inter frames. Results show, on average, a 67.36% reduction in encoding time, a 0.2 dB decrease in PSNR,
and an average bit rate increase of 0.05%.
This paper describes complexity reduction in MPEG-2 to H.264 transcoding with resolution reduction. The methods
developed are applicable to transcoding any DCT based video such as MPEG-2, MPEG-4, and H.263 to the recently
standardized H.264 video at a reduced resolution. H.264 is being adopted by the mobile device industry, and devices such as
the iPod use H.264. Mobile devices, however, need the video at a reduced resolution. The proposed transcoder
accelerates the H.264 encoding stage by performing motion estimation for only one block size as determined by the
trained decision trees. Our solution allows conversion of MPEG-2 video to the H.264 video format at a reduced
resolution with substantially less computing complexity. We use machine learning based approaches to significantly
reduce the complexity of this transcoding. Experimental results show a reduction in transcoding time of about 67% (a 3x
speedup) with less than a 0.5 dB loss in PSNR.
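Motion-vector reuse under 2:1 downscaling can be sketched as follows: four incoming MPEG-2 macroblock vectors map onto one outgoing block, so a candidate H.264 vector can be formed by averaging and halving them, then locally refined instead of searched from scratch. The averaging rule is one common heuristic, not necessarily the step chosen by the paper's trained decision trees.

```python
# Hedged sketch of MV composition for resolution-reduction transcoding.
# Four co-located 2x2 MPEG-2 macroblock vectors become one candidate
# vector for the downscaled H.264 block.

def compose_mv(mvs):
    """mvs: four (dx, dy) vectors from the co-located 2x2 MPEG-2 MBs."""
    sx = sum(dx for dx, _ in mvs)
    sy = sum(dy for _, dy in mvs)
    # Average over the four blocks, then halve for the reduced resolution.
    return (sx / 4 / 2, sy / 4 / 2)
```

Starting the H.264 motion search from this composed vector, for the single block size the decision tree selects, is what removes most of the motion-estimation cost.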
Video applications on handheld devices such as smart phones pose a significant challenge to achieve high quality user
experience. Recent advances in processor and wireless networking technology are producing a new class of multimedia
applications (e.g., video streaming) for mobile handheld devices. These devices are lightweight and small,
and therefore have very limited resources: lower processing power, smaller display resolution, less memory, and limited
battery life compared to desktop and laptop systems. Multimedia applications, on the other hand, have extensive
processing requirements, which makes mobile devices extremely resource hungry. In addition, device-specific
properties (e.g. display screen) significantly influence the human perception of multimedia quality. In this paper we
propose a saliency based framework that exploits the structure in content creation as well as the human vision system to
find the salient points in the incoming bitstream and adapt it to the target device, thus improving the quality of
the adapted area around the salient points. Our experimental results indicate that an adaptation process that is cognizant of
video content and user preferences can produce better perceptual quality video for mobile devices. Furthermore, we
demonstrated how such a framework can affect user experience on a handheld device.
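A minimal sketch of saliency-driven adaptation, reduced to a single salient point: instead of scaling the whole frame to the small display, crop a window centered on that point (clamped to the frame bounds) before encoding for the target device. The single-point model is a simplification of the framework described above.

```python
# Illustrative saliency-driven crop for a small target display.

def crop_window(frame_w, frame_h, salient, out_w, out_h):
    """Top-left corner of an out_w x out_h crop centered on `salient`."""
    sx, sy = salient
    x = min(max(sx - out_w // 2, 0), frame_w - out_w)
    y = min(max(sy - out_h // 2, 0), frame_h - out_h)
    return x, y
```

Cropping keeps the salient region at full pixel density on the small screen, whereas uniform downscaling would shrink it along with everything else.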
VP6 is a video coding standard developed by On2 Technologies. It is the preferred codec in the Flash 8/9 format used by
many popular online video services and user generated content sites. The wide adoption of Flash video for video delivery
on the Internet has made VP6 one of the most widely used video compression standards on the Internet. With the wide
adoption of VP6 comes the need for transcoding other video formats to the VP6 format. This paper presents algorithms
to transcode H.263 to the VP6 format. The transcoder has applications in media adaptation, including converting older
Flash video formats to the Flash 8 format. The transcoding algorithms reuse information from the H.263 decoding stage
and accelerate the VP6 encoding stage. Experimental results show that the proposed algorithms are able to reduce the
encoding complexity by up to 52% while reducing the PSNR by at most 0.42 dB in the worst case.
H.264 is a highly efficient and complex video codec. The complexity of the codec makes it difficult to use all its features
in resource constrained mobile devices. This paper presents a machine learning approach to reducing the complexity of
Intra encoding in H.264. Determining the macro block coding mode requires substantial computational resources in
H.264 video encoding. The goal of this work is to reduce MB mode selection from a search operation, as is done in
encoders today, to a direct computation. We have developed a methodology based on machine learning that computes the MB
coding mode instead of searching for the best match thus reducing the complexity of Intra 16x16 coding by 17 times and
Intra 4x4 MB coding by 12.5 times. The proposed approach uses simple mean value metrics at the block level to
characterize the coding complexity of a macro block. A generic J4.8 classifier is used to build the decision trees to
quickly determine the mode. We present a methodology for Intra MB coding. The results show that the Intra MB mode can
be determined with over 90% accuracy. The proposed approach can also be used for determining MB prediction modes with an
accuracy varying between 70% and 80%.
Reducing the product development cycle time is one of the most important and challenging problems faced by the
industry today. As the functionality and complexity of devices increases, so does the time required to design, test, and
develop the devices. Developing products rapidly in the face of this increasing complexity requires new methodologies
and tools. This paper presents a methodology for estimating the resources consumed by a video decoder. The proposed
methodology enables resource estimation based on high level user requirements. Component architecture for a H.264
video decoder is developed to enable design space exploration. The resources required to decode H.264 video are
estimated based on a measure of the complexity of the H.264 bitstreams and the target architecture. The proposed
approach is based on the hypothesis that the complexity of a H.264 video bitstream significantly influences resource
consumption and the complexity of a bitstream can thus be used to determine resource estimation. The bitstream
complexity is characterized to capture the data dependencies using a process called Bitstream Abstraction. The decoder
is componentized and component level resource requirements determined in a process called Decoder Abstraction. The
proposed methodology uses Bitstream Abstraction together with Decoder Abstraction to estimate resource
requirements. A component model for the H.264 video decoder is developed. Resources consumed by each component
are determined using the VTune performance analyzer. These resource estimates and video bitstream complexity are
used in developing a parametric model for resource estimation based on bitstream complexity. The proposed
methodology enables high-level resource estimation for multimedia applications without the need for extensive and time-consuming simulations.
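The parametric-model step can be sketched with a single-feature linear fit: decoding time as a linear function of a scalar bitstream-complexity measure, estimated by ordinary least squares. The one-feature linear form and the numbers below are illustrative simplifications of the paper's model.

```python
# Sketch: fit decode cost ~ a * complexity + b, then estimate resources
# for an unseen bitstream. Data values are invented for illustration.

def fit_linear(xs, ys):
    """Least-squares slope and intercept for y ~ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

complexity = [10.0, 20.0, 30.0, 40.0]   # bitstream-complexity measure
decode_ms  = [25.0, 45.0, 65.0, 85.0]   # per-component cost from a profiler
a, b = fit_linear(complexity, decode_ms)
estimate = a * 25.0 + b                  # predicted cost for a new stream
```

In the methodology above, Bitstream Abstraction supplies the x-values and Decoder Abstraction (component costs measured with VTune) supplies the y-values for each component's fit.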
MPEG-2 compressed digital video content is used in a number of products including DVDs, camcorders, digital TV, and HDTV. The ability to access this widely available MPEG-2 content on low-power end-user devices such as PDAs and mobile phones depends on effective techniques for transcoding the MPEG-2 content to a more appropriate, low-bitrate video format such as MPEG-4. In this paper we present the software and algorithmic optimizations performed in developing a real-time MPEG-2 to MPEG-4 video transcoder. A brief overview of the transcoding architectures is also provided. The details of the transcoding architectures for MPEG-2 to MPEG-4 video transcoding can be found in. The transcoder was targeted and optimized for Windows PCs with Intel Pentium-4 processors. The optimizations performed exploit the SIMD parallelism offered by the Intel Pentium-4 processors. The transcoder consists of two distinct components: the MPEG-2 video decoder and the MPEG-4 video transcoder. The MPEG-2 video decoder is based on the MPEG-2 Software Simulation Group’s reference implementation, while the MPEG-4 transcoder was developed from scratch with portions taken from the MOMUSYS implementation of the MPEG-4 video encoder. The optimizations include: 1) generic block-processing optimizations that affected both the MPEG-2 decoder and the MPEG-4 transcoder, and 2) optimizations specific to the MPEG-2 video decoder and the MPEG-4 video transcoder. The optimizations resulted in significant improvements in both MPEG-2 decoding and MPEG-4 transcoding. With optimizations, the total time spent by the transcoder was reduced by over 82%, with MPEG-2 decoding reduced by over 56% and MPEG-4 transcoding reduced by over 86%.
In this paper, we present a real-time adaptive streaming video platform. This platform is fully compliant with the Internet Streaming Media Alliance Implementation Specification. It has been used in experiments on real-time video streaming and transcoding via unicast and multicast over heterogeneous networks. An example of streaming video over a lossy channel is given, and a simple and efficient scheme for packet loss recovery is presented.
We briefly describe the process for creating an MP4 file and introduce the software tools used for the creation. Then, we describe the architecture of an MP4 player, the Flavor Player, which implements the MPEG-4 Systems specification. The Flavor Player implements 2-D composition and depth ordering of objects, object animation, user interaction, MPEG-J, the IPMP framework, and MP4 file support. Additionally, we describe a simplified version of the Flavor Player, Mild Flavor, that implements only the Object Descriptor Profile. Unlike the Flavor Player, Mild Flavor can also be used to create and edit MP4 files in addition to playing them back.