We propose a practical technique for the classification of facial images across multiple criteria, such as gender, age, ethnicity, and expression. The technique uses a novel form of Gabor-based features followed by application of the PCA and LDA algorithms. The computation of class scores in the context of nearest-centroid classification is also novel and relies, in part, on properties of the proposed features. We demonstrate that the proposed form of Gabor features is particularly suitable for achieving simultaneous classification. The reported results are obtained on a set of standard databases and include comparisons against known state-of-the-art algorithms. The utility of the proposed scheme is demonstrated by practical applications requiring multiple classification results to be obtained in real time while using typical consumer devices (cellphones, tablets, PCs) as computing platforms.
We discuss the problem of optimal design of encoding profiles for adaptive bitrate (ABR) streaming applications.
We show that under certain conditions and optimization targets, this problem becomes equivalent to the problem of quantization of a random variable, which in this case is the bandwidth of the communication channel between the streaming server and the client. By using this reduction to a known information-theoretic problem, we immediately arrive at a class of algorithms for solving the design problem optimally. We illustrate the effectiveness of our approach with examples of optimal encoding ladders designed for different networks and reproduction devices.
Specific techniques and models utilized in this paper include:
- modeling of SSIM-rate functions for modern video codecs (H.264, HEVC) and different content
- adaptation of SSIM (by using scaling & CSF-filtering) to account for different resolutions and reproduction settings
- SSIM-to-MOS scale mapping
- CDF models of typical communication networks (wireless, cable, WiFi, etc.)
- algorithms for solving the quantization problem (Lloyd-Max algorithm, analytic solutions, etc.)
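To make the quantization connection concrete, the following is a minimal sketch of the Lloyd-Max iteration applied to a channel-bandwidth distribution. The log-normal bandwidth model, its parameters, and the five-rung ladder size are illustrative assumptions, not values from the paper:

```python
import numpy as np

def lloyd_max(samples, n_levels, n_iters=100):
    """Iteratively refine quantizer levels to minimize mean squared error."""
    # Initialize representative levels from sample quantiles.
    levels = np.quantile(samples, (np.arange(n_levels) + 0.5) / n_levels)
    for _ in range(n_iters):
        # Decision boundaries are midpoints between adjacent levels.
        bounds = (levels[:-1] + levels[1:]) / 2
        # Each level moves to the centroid (mean) of its decision region.
        idx = np.digitize(samples, bounds)
        levels = np.array([samples[idx == k].mean() for k in range(n_levels)])
    return levels

rng = np.random.default_rng(0)
bandwidth = rng.lognormal(mean=1.0, sigma=0.5, size=100_000)  # Mbps (assumed model)
ladder = lloyd_max(bandwidth, n_levels=5)  # bitrates of an example 5-rung ladder
```

In the ABR setting, each resulting level corresponds to one encoding-profile bitrate, so the ladder is matched to where the channel-bandwidth distribution actually concentrates its mass.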
We describe the design of a video streaming system that uses adaptation to viewing conditions to reduce the bitrate needed for delivery of video content. A visual model is used to determine the sufficient resolution needed under various viewing conditions. Sensors on a mobile device estimate properties of the viewing conditions, particularly the distance to the viewer. We leverage the framework of existing adaptive bitrate streaming systems such as HLS, Smooth Streaming, or MPEG-DASH. The client rate selection logic is modified to include a sufficient resolution computed using the visual model and the estimated viewing conditions. Our experiments demonstrate significant bitrate savings compared to conventional streaming methods that do not exploit viewing conditions.
We describe an implementation of a DASH streaming client for mobile devices which uses adaptation to user
behavior and viewing conditions as a means of improving the efficiency of streaming delivery. The proposed design relies
on sensors in a mobile device to detect the presence of the user, their proximity to the screen, and other factors such
as motion, screen brightness, and ambient lighting conditions. This information is subsequently used to
select a stream that delivers the adequate resolution implied by viewing conditions and the natural limits of human vision.
We show that in a mobile environment such adaptation can result in a significant reduction of bandwidth usage
compared to traditional streaming systems.
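A minimal sketch of the kind of visual-model computation such a client could perform: mapping viewing distance to a sufficient horizontal resolution. The ~60 cycles/degree acuity limit (a common figure for 20/20 vision) and the screen dimensions are assumptions for illustration, not values from the paper:

```python
import math

def sufficient_resolution(screen_width_m, distance_m, acuity_cpd=60.0):
    """Pixels needed so pixel pitch stays at or below the visual acuity limit."""
    # Total horizontal visual angle subtended by the screen, in degrees.
    angle_deg = 2 * math.degrees(math.atan(screen_width_m / (2 * distance_m)))
    # Nyquist: two pixels per cycle at the acuity limit.
    return math.ceil(2 * acuity_cpd * angle_deg)

# A screen held farther away subtends a smaller angle, so fewer pixels suffice
# and the client can safely select a lower-resolution (lower-bitrate) stream.
near = sufficient_resolution(0.07, 0.25)  # ~7 cm wide screen at 25 cm
far = sufficient_resolution(0.07, 0.60)   # same screen at arm's length
```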
The ultimate goal of network resource allocation for video teleconferencing is to optimize the Quality of Experience
(QoE) of the video. The IPPP video coding structure with macroblock intra refresh is widely used for video
teleconferencing. With such a coding structure, the loss of a frame generally causes error propagation to
subsequent frames. A resource allocation decision of a communication network determines the QoE given that
other conditions such as viewing conditions are fixed. Therefore, to optimize the QoE, a communication network
needs to be able to accurately predict the QoE for each of its resource allocation decisions and then select the
decision corresponding to the best QoE. In our previous work, we reduced the QoE prediction problem to one of
predicting the per-frame PSNR time series. The accuracy of the proposed per-frame PSNR prediction method
was demonstrated, however, only for low-resolution video sequences. In this paper, we show via simulations
that the per-frame PSNR prediction method achieves good performance for higher-resolution video sequences as well.
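For reference, per-frame PSNR, the quantity whose time series the prediction method targets, is computed as follows; the tiny 4x4 test frames are made-up inputs, and 8-bit samples are assumed:

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames."""
    diff = reference.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10 * np.log10(peak ** 2 / mse)

ref = np.full((4, 4), 128, dtype=np.uint8)
rec = ref.copy()
rec[0, 0] = 138          # introduce a small reconstruction error
value = psnr(ref, rec)   # one sample of the per-frame PSNR time series
```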
In this paper, we present an error resilient video coding scheme for wireless video telephony applications that uses
feedback to limit error propagation. In conventional feedback-based error resilient schemes, error propagation
can significantly degrade visual quality when the feedback delay is on the order of a few seconds. We propose a coding
structure based on multiple description coding that mitigates error propagation during feedback delay, and uses
feedback to adapt its coding structure to effectively limit error propagation. We demonstrate the effectiveness
of our approach at different error rates when compared to conventional coding schemes that use feedback.
We propose a design of a rate control algorithm for low-delay video transmission over wireless channels. Our
objective is to meet delay constraints and to ensure that the decoder buffer neither overflows nor underflows.
We approach this problem analytically, by studying the leaky bucket model in the variable rate transmission
scenario, and deriving sufficient conditions for meeting our objective. We then employ these conditions in the
design of the rate control algorithm. We report results obtained by using this algorithm with an MPEG-4
AVC/H.264 encoder and LTE channel simulator as a test platform.
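The leaky-bucket model underlying this analysis can be sketched as a simple decoder-buffer simulation. All rates, frame sizes, and buffer bounds below are made-up example numbers, not values from the paper:

```python
def simulate_buffer(frame_bits, channel_bits_per_frame, buffer_capacity, initial_fullness):
    """Track decoder buffer fullness; return (trace, underflow, overflow) flags."""
    fullness = initial_fullness
    trace = []
    underflow = overflow = False
    for frame, arriving in zip(frame_bits, channel_bits_per_frame):
        fullness += arriving          # bits drain from the channel into the buffer
        if fullness > buffer_capacity:
            overflow = True           # encoder sent too much: bits would be dropped
            fullness = buffer_capacity
        fullness -= frame             # decoder removes one frame's worth of bits
        if fullness < 0:
            underflow = True          # frame not fully received in time: stall
        trace.append(fullness)
    return trace, underflow, overflow

# A channel whose per-frame delivery matches the average frame size stays safe.
trace, under, over = simulate_buffer(
    frame_bits=[10_000] * 30,
    channel_bits_per_frame=[10_000] * 30,
    buffer_capacity=100_000,
    initial_fullness=50_000,
)
```

A rate control algorithm of the kind described above would choose `frame_bits` so that, for all admissible channel traces, neither flag is ever raised.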
We offer a probabilistic interpretation of the meaning of SIFT image features. This allows us to derive formulae connecting SIFT feature values to parameters of gradient distributions. We also study KL distances between gradient distributions and establish their connections to values in SIFT descriptors.
This paper reports on recent work by the MPEG committee towards defining a standard for visual search applications. This
standardization initiative is referred to as Compact Descriptors for Visual Search (CDVS). The call for proposals for this
standard was issued by MPEG in March 2011, with responses due in October 2011. When completed, it is envisioned
that this standard will ensure high performance and interoperability of visual search applications, will simplify their
design, and will reduce the amount of visual search-related data that needs to be stored or transmitted over networks.
State-of-the-art image retrieval pipelines are based on "bag-of-words" matching. We note that the original order in which
features are extracted from the image is discarded in the "bag-of-words" matching pipeline. As a result, a set of features
extracted from a query image can be transmitted in any order. A set of m unique features has m! orderings, and if the order
of transmission can be discarded, one can reduce the query size by an additional log2(m!) bits. In this work, we compare
two schemes for discarding ordering: one based on Digital Search Trees, and another based on location histograms. We
apply the two schemes to a set of low bitrate Compressed Histogram of Gradient (CHoG) features, and compare their
performance. Both schemes achieve approximately log2(m!) reduction in query size for a set of m features.
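The size of this saving is easy to check numerically; the feature count below is an arbitrary example value:

```python
import math

def order_savings_bits(m):
    """Bits saved by not encoding the order of m distinguishable features."""
    # m! equally likely orderings -> log2(m!) bits of pure ordering information.
    return math.log2(math.factorial(m))

m = 100
savings = order_savings_bits(m)   # total bits saved for the whole query
per_feature = savings / m         # amortized saving per transmitted feature
```

By Stirling's approximation, log2(m!) grows as m*log2(m/e), so the per-feature saving itself grows with m, which is why discarding order is worthwhile for feature-rich queries.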
We propose fast algorithms for computing Discrete Sine and Discrete Cosine Transforms (DCT and DST) of types
VI and VII. Particular attention is paid to derivation of fast algorithms for computing DST-VII of lengths 4 and 8,
which are currently under consideration for inclusion in the ISO/IEC/ITU-T High Efficiency Video Coding (HEVC) standard.
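As a sanity check on the transform family involved, the following sketch builds the 4-point DST-VII basis directly from its textbook definition (not from the fast factorizations the paper derives) and verifies orthonormality:

```python
import numpy as np

def dst7_matrix(n):
    """Orthonormal DST-VII basis: T[k, j] ~ sin(pi*(2j+1)*(k+1)/(2n+1))."""
    k, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.sqrt(4.0 / (2 * n + 1)) * np.sin(
        np.pi * (2 * j + 1) * (k + 1) / (2 * n + 1)
    )

T4 = dst7_matrix(4)
# For an orthonormal transform, T @ T.T must equal the identity matrix.
identity_error = np.abs(T4 @ T4.T - np.eye(4)).max()
```

A direct matrix-vector product of this form costs O(n^2) operations; the fast algorithms proposed in the paper exploit structure in these sine values to reduce that count.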
This paper describes the video coding technology proposal submitted by Qualcomm Inc. in response to a joint call for
proposals (CfP) issued by ITU-T SG16 Q.6 (VCEG) and ISO/IEC JTC1/SC29/WG11 (MPEG) in January 2010. The proposed
video codec follows a hybrid coding approach based on temporal prediction, followed by transform, quantization, and
entropy coding of the residual. Some of its key features are extended block sizes (up to 64x64), recursive integer
transforms, single pass switched interpolation filters with offsets (single pass SIFO), mode dependent directional
transform (MDDT) for intra-coding, luma and chroma high precision filtering, geometry motion partitioning, adaptive
motion vector resolution. It also incorporates internal bit-depth increase (IBDI), and modified quadtree based adaptive
loop filtering (QALF). Simulation results are presented for a variety of bit rates, resolutions and coding configurations to
demonstrate the high compression efficiency achieved by the proposed video codec at a moderate level of encoding and
decoding complexity. For the random access hierarchical B configuration (HierB), the proposed video codec achieves an
average BD-rate reduction of 30.88% compared to the H.264/AVC alpha anchor. For the low delay hierarchical P (HierP)
configuration, the proposed video codec achieves an average BD-rate reduction of 32.96% and 48.57%, compared to the
H.264/AVC beta and gamma anchors, respectively.
We describe the design of lossless block codes for geometric, Laplacian, and similar distributions frequently arising
in image and video coding. The proposed codes can be understood as a generalization of Golomb codes, allowing
more precise adaptation to the values of distribution parameters, and resulting in lower redundancy. The design of
universal block codes for a class of geometric distributions is also studied.
A number of popular image matching algorithms, such as the Scale Invariant Feature Transform (SIFT), are based on local
image features. They first detect interest points (or keypoints) across an image and then compute descriptors based on
patches around them. In this paper, we observe that in textured or feature-rich images, keypoints typically appear in
clusters following patterns in the underlying structure. We show that this clustering phenomenon can be used to:
1) enhance the recall and precision of the descriptor matching process, and 2) improve the convergence rate of the
RANSAC algorithm used in the geometric verification stage.
We describe the design of a low-complexity lossless and near-lossless image compression system with random access,
suitable for embedded memory compression applications. This system employs a block-based DPCM coder using
variable-length encoding for the residual. As part of this design, we propose to use non-prefix (one-to-one) codes for
coding of residuals, and show that they offer improvements in compression performance compared to conventional
techniques, such as Golomb-Rice and Huffman codes.
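The advantage of non-prefix (one-to-one) codes can be seen from their codeword lengths. In a one-to-one code every integer simply receives the shortest unused bit string (including the empty string), so symbol i costs floor(log2(i+1)) bits; decodability relies on the block structure conveying where each codeword ends, which is what lets such codes undercut prefix codes like Huffman. This enumeration is a generic textbook construction, not necessarily the exact code used in the paper:

```python
import math

def one_to_one_length(i):
    """Length in bits of the i-th one-to-one codeword, i = 0, 1, 2, ..."""
    # Codewords in order: "", "0", "1", "00", "01", "10", "11", "000", ...
    return int(math.floor(math.log2(i + 1)))

lengths = [one_to_one_length(i) for i in range(7)]

# Compare with the best any prefix code can do: the Kraft inequality forces a
# prefix code over these 7 symbols to use at least ceil(log2(7)) = 3 bits for
# some codeword, while the one-to-one code never exceeds 2 bits here.
max_one_to_one = max(lengths)
```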
We review the construction of the Compressed Histogram of Gradients (CHoG) image feature descriptor, and study
the quantization problem that arises in its design. We explain our choice of algorithms for solving it, addressing
both complexity and performance aspects. We also study the design of algorithms for decoding and matching
compressed descriptors, and offer several techniques for speeding up these operations.
This paper describes the design of transforms for extended block sizes for video coding. The proposed transforms are
orthogonal integer transforms, based on a simple recursive factorization structure, and allow very compact and efficient
implementations. We discuss the techniques used for finding integer and scale factors in these transforms, and describe our
final design. We evaluate the efficiency of our proposed transforms in VCEG's H.265/JMKTA framework, and show that
they achieve nearly identical performance compared to much more complex transforms in the current test model.
We review design of 4-, 8-, and 16-point transforms currently used in image and video coding standards, and compare
them with fast implementations of Discrete Cosine Transform of various other sizes (including non-dyadic even and odd
numbers) in the range of 2-64. We show that among such transforms there exist a few that offer better complexity/coding
gain tradeoffs than the current dyadic-sized transforms. In our construction and analysis we utilize an array of known
techniques (such as Heideman's mapping between DCT and DFT, Winograd short-length DFT modules, prime-factor and
common-factor algorithms), and also offer a new factorization scheme for even-sized scaled transforms.
We study the problem of approximate computation of color transforms (with real and possibly irrational factors) using integer arithmetic. We show that the precision of such computations can be significantly improved if we allow input or output variables to be scaled by some constant. The problem of finding such a constant turns out to be related to the classic Diophantine approximation problem. We use this relation to explain how the best scaled approximations can be derived, and provide several examples of using this technique for the design of color transforms.
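The effect of allowing a scale factor can be demonstrated with a small numeric experiment. The coefficient (1/sqrt(2), common in transform matrices), the 5-bit budget, and the brute-force search over scales are illustrative assumptions; the paper's continued-fraction (Diophantine) machinery finds the best scale analytically rather than by search:

```python
import math

c = math.sqrt(0.5)   # example irrational coefficient to approximate
k = 5                # bit budget: approximate by n / 2**k with integer n

def approx_error(target, k):
    """Error of rounding target * 2**k to the nearest integer multiple of 2**-k."""
    n = round(target * 2 ** k)
    return abs(target - n / 2 ** k)

# Plain dyadic approximation of c itself.
plain = approx_error(c, k)

# Allow the variable to be pre-scaled by a small integer s: approximate s*c
# instead, then divide the error back by s. Some s makes s*c nearly dyadic.
scaled = min(approx_error(s * c, k) / s for s in range(1, 32))
```

Intuitively, good scales s are denominators of good rational approximations to c, which is exactly where the Diophantine approximation connection enters.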
This paper describes fixed-point design methodologies and several resulting implementations of the Inverse
Discrete Cosine Transform (IDCT) contributed by the authors to MPEG's work on defining the new 8x8 fixed-point
IDCT standard, ISO/IEC 23002-2. The algorithm currently specified in the Final Committee Draft (FCD)
of this standard is also described herein.
In the context of a Call for Proposals for integer IDCTs issued by MPEG in July 2005, a full 2D integer IDCT based on
previous work by Feig and Winograd has been proposed. It achieves high precision, meeting all IEEE 1180 conditions,
and is suitable for hardware implementation since it can be performed using only shifts and additions. Furthermore, it
can be useful in high-resolution video scenarios such as 720p/1080i/p due to its feedforward operation mode, which avoids
the loops usual in row-column implementations. The proposed transformation can be implemented without changing other
functional blocks at either the encoder or the decoder, or alternatively as a scaled version incorporating the scaling
factors into the dequantization stage. Our algorithm uses only 1328 operations for 8x8 blocks, including scaling factors.
This paper analyzes the drift phenomenon that occurs between video encoders and decoders that employ different
implementations of the Inverse Discrete Cosine Transform (IDCT). Our methodology utilizes MPEG-2, MPEG-4
Part 2, and H.263 encoders and decoders to measure drift occurring at low QP values for CIF resolution video
sequences. Our analysis is conducted as part of the effort to define specific implementations for the emerging ISO/IEC
23002-2 Fixed-Point 8x8 IDCT and DCT standard. Various IDCT implementations submitted as proposals for the new
standard are used to analyze drift. Each of these implementations complies with both the IEEE Standard 1180 and the
new MPEG IDCT precision specification ISO/IEC 23002-1. Reference implementations of the IDCT/DCT, and
implementations from well-known video encoders/decoders are also employed. Our results indicate that drift is
eliminated entirely only when the implementations of the IDCT in both the encoder and decoder match exactly. In this
case, the precision of the IDCT has no influence on drift. In cases where the implementations are not identical, the
use of a highly precise IDCT in the decoder will reduce drift in the reconstructed video sequence only to the extent that
the IDCT used in the encoder is also precise.