We introduce a real-time implementation and evaluation of a new, fast, accurate, full-reference image quality metric.
The popular general image quality metric known as the Structural Similarity Index Metric (SSIM) has been shown to be
effective, efficient and useful, finding many practical and theoretical applications. Recently the authors have proposed
an enhanced version of the SSIM algorithm known as the Rotated Gaussian Discrimination Metric (RGDM). This
approach uses a Gaussian-like discrimination function to evaluate local contrast and luminance. RGDM was inspired by
an exploration of local statistical parameter variations in relation to variation of Mean Opinion Score (MOS) for a range
of particular distortion types. In this paper we outline the salient features of the derivation of RGDM and show how
analyses of local statistics of distortion type necessitate variation in discrimination function width. Results on the LIVE
image database show tight banding of RGDM metric values when plotted against mean opinion score, indicating the
usefulness of this metric. We then explore a number of strategies for algorithmic speed-up including the application of
Integral Images for patch based computation optimisation, cost reduction for the evaluation of the discrimination
function and general loop unrolling. We also employ fast Single Instruction Multiple Data (SIMD) intrinsics and explore
data parallel decomposition on a multi-core Intel Processor.
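The integral-image speed-up mentioned above can be illustrated with a minimal sketch. It is not the authors' RGDM implementation; it only shows how the local mean and variance of any rectangular patch can be obtained from two summed-area tables in constant time per patch. The function names are assumptions made for illustration.

```python
def integral_image(img):
    """Return a summed-area table with one extra zero row and column."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def patch_sum(sat, top, left, height, width):
    """Sum of a patch using four table look-ups, independent of patch size."""
    bottom, right = top + height, left + width
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


def patch_mean_and_variance(img, top, left, height, width):
    """Local mean and (population) variance of one patch.

    In a real pipeline the two tables would be built once per image and
    reused for every patch; they are rebuilt here only to keep the
    example short.
    """
    sat = integral_image(img)
    sat_sq = integral_image([[v * v for v in row] for row in img])
    n = height * width
    mean = patch_sum(sat, top, left, height, width) / n
    variance = patch_sum(sat_sq, top, left, height, width) / n - mean * mean
    return mean, variance
```

For the 3x3 image `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, the 2x2 patch at the top-left corner contains 1, 2, 4 and 5, so the sketch yields a mean of 3.0 and a variance of 2.5.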
In this study, we develop a high-speed color-histogram-based tracking system that can be applied to 512x511 pixel images at 2000 fps using the hardware implementation of an improved CamShift algorithm.
In the improved algorithm, the size, position, and orientation of an object to be tracked can be extracted using only the hardware implementation of color conversion and the moment feature calculation of 16 binary images quantized by color bins on a high-speed vision platform. We demonstrate its effectiveness by presenting several target tracking results at 2000 fps when color-patterned objects move rapidly under complicated backgrounds.
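As a rough software illustration of the moment feature calculation, the sketch below derives area, centroid and orientation from the raw moments of a single binary image. The hardware design described above operates on 16 color-bin images; the single-image form and the function name `moment_features` are assumptions made here for brevity.

```python
import math


def moment_features(binary):
    """Area, centroid and principal-axis angle of a binary image (rows of 0/1)."""
    m00 = m10 = m01 = m20 = m02 = m11 = 0
    for y, row in enumerate(binary):
        for x, value in enumerate(row):
            if value:
                m00 += 1
                m10 += x
                m01 += y
                m20 += x * x
                m02 += y * y
                m11 += x * y
    if m00 == 0:
        return None  # nothing to track in this image
    cx, cy = m10 / m00, m01 / m00
    # Central second-order moments, normalised by the area.
    mu20 = m20 / m00 - cx * cx
    mu02 = m02 / m00 - cy * cy
    mu11 = m11 / m00 - cx * cy
    angle = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return {"area": m00, "centroid": (cx, cy), "angle": angle}
```

Only sums of `x`, `y` and their products are accumulated per pixel, which is why this kind of feature maps well onto streaming hardware.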
This paper presents a real-time iris detection procedure for gray intensity images. Typical
applications for iris detection utilize template and feature based methods. These methods are
generally time and memory intensive and not applicable for all practical real-time embedded
realizations. Here, we propose a simple, time-efficient method with a high detection rate and a low
error rate, implemented on a smart camera. The system used for this
research involves a National Instruments smart camera with LabVIEW Real-Time Module. First, the
images are analyzed to determine the region of interest (face). The iris location is determined by
applying a convolution-based algorithm on the edge image and then using the Hough Transform.
The edge-based algorithm is less complex and less computationally expensive, which results in an efficient
analysis method. The extracted iris location information is stored in the camera's image buffer, and
used to model one specific eye pattern. The location of the iris thus determined is used as a reference
to reduce the search region for the iris in the subsequent images. The iris detection algorithm has
been applied at different frame rates. The results demonstrate that the speed of this algorithm allows
tracking of the iris when the eyes or the subject are moving in front of the camera at reasonable
speeds and with limited occlusions.
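The idea of using the previous iris location to shrink the search region can be sketched as follows. The margin parameter and the function name are hypothetical; the abstract does not state how the reduced region is sized.

```python
def search_window(prev_x, prev_y, margin, width, height):
    """Clamp a square window around the last known iris position to the image.

    Returns (left, top, right, bottom) with right and bottom exclusive.
    """
    left = max(0, prev_x - margin)
    top = max(0, prev_y - margin)
    right = min(width, prev_x + margin + 1)
    bottom = min(height, prev_y + margin + 1)
    return left, top, right, bottom
```

With a 20-pixel margin in a 640x480 image, the edge and Hough stages would visit at most 41x41 pixels per frame instead of the full frame.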
This work presents a technique to optimize popular image processing algorithms on mobile platforms such as cell
phones, netbooks and personal digital assistants (PDAs). The increasing demand for video applications like context-aware
computing on mobile embedded systems requires the use of computationally intensive image processing
algorithms. The system engineer has a mandate to optimize them so as to meet real-time deadlines. A methodology to
take advantage of the asymmetric dual-core processor, which includes an ARM and a DSP core supported by shared
memory, is presented with implementation details. The target platform chosen is the popular OMAP 3530 processor for
embedded media systems. It has an asymmetric dual-core architecture with an ARM Cortex-A8 and a TMS320C64x
Digital Signal Processor (DSP). The development platform was the BeagleBoard with 256 MB of NAND flash and 256
MB of SDRAM. The basic image correlation algorithm is chosen for benchmarking as it finds widespread
application in template matching tasks such as face recognition. The basic algorithm prototypes conform to
OpenCV, a popular computer vision library. OpenCV algorithms can be easily ported to the ARM core which runs a
popular operating system such as Linux or Windows CE. However, the DSP is architecturally more efficient at handling
DFT algorithms. The algorithms are tested on a variety of images and performance results are presented measuring the
speedup obtained due to dual-core implementation. A major advantage of this approach is that it allows the ARM
processor to perform important real-time tasks, while the DSP addresses performance-hungry algorithms.
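For readers unfamiliar with the benchmark, a minimal spatial-domain sketch of correlation-based template matching is given below. It is not the OpenCV or DSP implementation discussed above, which would typically evaluate the correlation through DFTs; the function names are illustrative assumptions.

```python
import math


def ncc(patch, template):
    """Normalised cross-correlation between two equally sized 2-D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0  # flat patches carry no information


def match_template(image, template):
    """Exhaustively slide the template and return (row, col, score) of the best fit."""
    th, tw = len(template), len(template[0])
    best = (0, 0, -2.0)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[2]:
                best = (r, c, score)
    return best
```

The nested loops make the cost proportional to image size times template size, which is the kind of workload the abstract proposes to offload to the DSP core.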
In this paper we present a scalable software architecture for on-line multi-camera video processing that guarantees
a good trade-off between computational power, scalability and flexibility. The software system is modular and
its main blocks are the Processing Units (PUs), and the Central Unit. The Central Unit works as a supervisor
of the running PUs and each PU manages the acquisition phase and the processing phase. Furthermore, an
approach to easily parallelize the desired processing application is presented. As a case study,
we apply the proposed software architecture to a multi-camera system in order to efficiently manage multiple
2D object detection modules in a real-time scenario. System performance has been evaluated under different
load conditions such as number of cameras and image sizes. The results show that the software architecture
scales well with the number of cameras and can easily work with different image formats while respecting real-time
constraints. Moreover, the parallelization approach can be used in order to speed up the processing tasks with
a low level of overhead.
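The division of labour between Processing Units and the Central Unit might be sketched as below. This is only a structural illustration under simplifying assumptions: each PU is a thread, "acquisition" reads from an in-memory list, and the processing step is a placeholder that counts bright pixels. None of the class or function names come from the paper.

```python
import queue
import threading


class ProcessingUnit(threading.Thread):
    """One PU per camera: acquires frames and processes them independently."""

    def __init__(self, camera_id, frames, results):
        super().__init__()
        self.camera_id = camera_id
        self.frames = frames
        self.results = results

    def run(self):
        for index, frame in enumerate(self.frames):        # acquisition phase
            detections = sum(1 for v in frame if v > 128)  # placeholder processing
            self.results.put((self.camera_id, index, detections))


def central_unit(camera_frames):
    """Supervisor: starts one PU per camera and gathers their results."""
    results = queue.Queue()
    units = [ProcessingUnit(cid, frames, results)
             for cid, frames in camera_frames.items()]
    for unit in units:
        unit.start()
    for unit in units:
        unit.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return sorted(collected)
```

Adding a camera only adds one more PU, which is the property that lets such a design scale with the number of cameras.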
This paper presents the real-time implementation of our previously developed logo detection and tracking algorithm on
the open source BeagleBoard mobile platform. This platform has an OMAP processor that incorporates an ARM Cortex
processor. The algorithm combines Scale Invariant Feature Transform (SIFT) with k-means clustering, online color
calibration and moment invariants to robustly detect and track logos in video. Various optimization steps that are carried
out to allow the real-time execution of the algorithm on BeagleBoard are discussed. The results obtained are compared to
the PC real-time implementation results.
In this paper we describe a low complexity image orientation detection algorithm which can be implemented in real-time
on embedded devices such as low-cost digital cameras, mobile phone cameras and video surveillance cameras. Providing
orientation information to tamper detection algorithms in surveillance cameras, color enhancement algorithms and various
scene classifiers can help improve their performance. Various image orientation detection algorithms have been developed
in the last few years for image management systems, as a post-processing tool. However, these techniques use certain high-level
features and object classification to detect the orientation, so they are not suitable for implementation on a capturing
device in real-time. Our algorithm uses low-level features such as texture, lines and source of illumination to detect
orientation. We implemented the algorithm on a mobile phone camera device with a 180 MHz ARM926 processor. The
orientation detection takes 10 ms per frame, which makes it suitable for use in image capture as well as video mode. It
can be used efficiently in parallel with the other processes in the imaging pipeline of the device. On hardware, the algorithm
achieved an accuracy of 92% with a rejection rate of 4% and a false detection rate of 8% on outdoor images.
Smoothing filters are a method of choice for image
preprocessing and pattern recognition. We present a
new concurrent method for smoothing 2D objects in
the binary case. The proposed method provides parallel
computation while preserving the topology by using
homotopic transformations. We introduce an adapted
parallelization strategy called split, distribute and
merge (SDM) strategy which allows efficient
parallelization of a large class of topological
operators including, mainly, smoothing,
skeletonization, and watershed algorithms. To achieve
a good speedup, we paid particular attention to task
scheduling. The work during the smoothing process is
distributed over a variable number of threads. Tests on
a 2D binary image (512×512), using a shared memory
parallel machine (SMPM) with 8 CPU cores (2× Xeon
E5405 running at a frequency of 2 GHz), showed a
speedup of 5.2, so that a rate of 32 images per
second is achieved.
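A minimal sketch of the split, distribute and merge pattern
is given below. It is an illustration only: the per-pixel
operator is a plain 3x3 majority vote, which is not the
homotopic (topology-preserving) smoothing of the paper, and
the function names are assumptions. Each strip reads its
neighbouring rows from the shared input image, so the merged
result is identical to a sequential pass.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_rows(image, first, last):
    """3x3 majority vote for rows first..last-1, reading neighbours from the full image."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(first, last):
        row = []
        for x in range(w):
            ones = total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += 1
                        ones += image[yy][xx]
            row.append(1 if 2 * ones > total else 0)
        out.append(row)
    return out


def sdm_smooth(image, workers=4):
    """Split into row strips, distribute to threads, merge in order."""
    h = len(image)                       # assumes a non-empty image
    step = -(-h // workers)              # ceiling division
    strips = [(s, min(s + step, h)) for s in range(0, h, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda b: smooth_rows(image, *b), strips))
    return [row for part in parts for row in part]
```

In CPython the global interpreter lock prevents a real
speedup for this pure-Python operator; the sketch shows the
structure of the strategy rather than its performance.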
Recently, 3D displays and videos have generated a lot of interest in the consumer electronics industry. To make
3D capture and playback popular and practical, a user friendly playback interface is desirable. Towards this end,
we built a real time software 3D video player. The 3D video player displays user captured 3D videos, provides
for various 3D specific image processing functions and ensures a pleasant viewing experience. Moreover, the
player enables user interactivity by providing digital zoom and pan functionalities. This real time 3D player was
implemented on the GPU using CUDA and OpenGL. The player provides user interactive 3D video playback.
Stereo images are first read by the player from a fast drive and rectified. Further processing of the images
determines the optimal convergence point in the 3D scene to reduce eye strain. The rationale for this convergence
point selection takes into account scene depth and display geometry. The first step in this processing chain is
identifying keypoints by detecting vertical edges within the left image. Regions surrounding reliable keypoints
are then located on the right image through the use of block matching. The difference in position between
the corresponding regions in the left and right images is then used to calculate disparity. The extrema of
the disparity histogram give the scene disparity range. The left and right images are shifted based upon the
calculated range, in order to place the desired region of the 3D scene at convergence. All the above computations
are performed on one CPU thread which calls CUDA functions. Image upsampling and shifting are performed in
response to user zoom and pan. The player also includes a CPU display thread, which uses OpenGL rendering
(quad buffers). This thread also gathers user input for digital zoom and pan and sends it to the processing thread.
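The disparity-range and image-shift steps can be sketched on the CPU as follows. The choice of the mid-range disparity as the convergence target, the `min_votes` outlier guard and all function names are assumptions made for illustration; the abstract only states that the desired region is placed at convergence.

```python
from collections import Counter


def disparity_range(left_xs, right_xs, min_votes=1):
    """Extrema of the disparity histogram built from matched keypoint columns."""
    histogram = Counter(xl - xr for xl, xr in zip(left_xs, right_xs))
    supported = [d for d, votes in histogram.items() if votes >= min_votes]
    return min(supported), max(supported)


def convergence_shifts(d_min, d_max):
    """Horizontal shifts (left, right) that move the mid-range disparity to zero.

    After shifting, a point with disparity d has disparity d - target,
    where target is the middle of the measured range.
    """
    target = (d_min + d_max) // 2
    left_shift = -(target // 2)
    right_shift = target + left_shift
    return left_shift, right_shift
```

For matched columns `[10, 20, 30, 40]` in the left image and `[8, 14, 24, 30]` in the right image, the disparities are 2, 6, 6 and 10, so the range is (2, 10) and the images would be shifted by -3 and +3 pixels respectively.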
Users of the next generation wireless paradigm known as multihomed mobile networks expect satisfactory quality of
service (QoS) when accessing streamed multimedia content. The recent H.264 Scalable Video Coding (SVC) extension
to the Advanced Video Coding standard (AVC), offers the facility to adapt real-time video streams in response to the
dynamic conditions of multiple network paths encountered in multihomed wireless mobile networks. Nevertheless, preexisting
streaming algorithms were mainly proposed for AVC delivery over multipath wired networks and were
evaluated by software simulation. This paper introduces a practical, hardware-based testbed upon which we implement
and evaluate real-time H.264 SVC streaming algorithms in a realistic multihomed wireless mobile network
environment. We propose an optimised streaming algorithm with several technical contributions. Firstly, we extended
the AVC packet prioritisation schemes to reflect the three-dimensional granularity of SVC. Secondly, we designed a
mechanism for evaluating the effects of different streamer 'read ahead window' sizes on real-time performance. Thirdly,
we took account of the previously unconsidered path switching and mobile networks tunnelling overheads encountered
in real-world deployments. Finally, we implemented a path condition monitoring and reporting scheme to facilitate
intelligent path switching. The proposed system has been experimentally shown to offer a significant improvement in
PSNR of the received stream compared with representative existing algorithms.
A context adaptive variable length coding (CAVLC) decoder does not know the exact start position of the k-th syntax
element in a bitstream until it finishes parsing the (k-1)-th syntax element. This makes parallel CAVLC decoding difficult.
Predicting the exact start position of a syntax element prior to parsing its
previous one significantly increases implementation cost. In this paper, we propose a new bitstream structure to concurrently access multiple syntax elements for
parallel CAVLC decoding. The method divides a bitstream into N kinds of segments whose size is M bits and puts
syntax elements into the segments, based on a proposed rule. Then, a CAVLC decoder can simultaneously access N
segments to read N syntax elements from a single bitstream and decode them in parallel. This technique increases the
speed of CAVLC decoding by up to N times. Since the method just rearranges the generated bitstream, it does not affect
coding efficiency. Experimental results show that the proposed algorithm significantly increases decoding speed.
A novel approach for 3D image and video reconstruction is proposed and implemented. This is based on the wavelet
atomic functions (WAF) that have demonstrated better approximation properties in different processing problems in
comparison with classical wavelets. Disparity maps using WAF are formed, and then they are employed in order to
present 3D visualization using color anaglyphs. Additionally, compression via the Pth law is performed to improve the
disparity map quality. Other approaches, such as optical flow and stereo matching algorithms, are also implemented
for comparison. Numerous simulation results confirm the efficiency of the novel framework. The
implementation of the proposed algorithm on the Texas Instruments TMS320DM642 DSP demonstrates that
real-time processing is possible during 3D video reconstruction for images and video sequences.
In future 3D videoconferencing systems, depth estimation is required to support autostereoscopic displays and,
even more importantly, to provide eye contact. Real-time 3D video processing is currently possible, but within
some limits. Since traditional CPU centred sub-pixel disparity estimation is computationally expensive, the
depth resolution of fast stereo approaches is directly linked to pixel quantization and the selected stereo baseline.
In this work we present a novel, highly parallelizable algorithm that is capable of dealing with arbitrary depth
resolutions while avoiding texture interpolation related runtime penalties by application of GPU centred design.
The cornerstone of our patch sweeping approach is the fusion of space sweeping and patch based 3D estimation
techniques. Especially for narrow baseline multi-camera configurations, as commonly used for 3D videoconferencing
systems (e.g. ), it preserves the strengths of both techniques and avoids their shortcomings at the same
time. Moreover, we provide a sophisticated parameterization and quantization scheme that establishes a very
good scalability of our algorithm in terms of computation time and depth estimation quality. Furthermore, we
present an optimized CUDA implementation for a multi GPU setup in a cluster environment. For each GPU, it
performs three pairwise high-quality depth estimations for a trifocal narrow baseline camera configuration on a
256x256 image block in real time.
Many scene change detection techniques have been developed for scene cuts, fade in and fade out by analyzing
video encoder input signals. For real time scene change detection, sensor input signals provide first-hand
information which can be used for scene change detection. In this paper, by analyzing camcorder front end
sensor input signals with our proposed algorithms based on camera 3A (auto exposure, auto white balance and
auto focus), a novel scene change detection technique is described. The camera 3A based scene change detection
algorithm can detect scene changes in a timely manner and is therefore well suited to real-time scene change detection
applications. Experimental results show that this algorithm can detect scene changes with good accuracy. The
proposed algorithm is computationally efficient and easy to implement.
In this work we develop a new algorithm that extends the bidimensional Fast Digital Radon transform from
Götz and Druckmüller (1996), to digitally simulate the refocusing of a 4D light field into a 3D volume of
photographic planes, as previously done by Ren Ng et al. (2005), but with the minimum number of operations.
This new algorithm does not require multiplications, just sums, and its computational complexity is O(N^4) to
achieve a volume consisting of 2N photographic planes focused at different depths, from an N^4 plenoptic image.
This reduced complexity allows for the acquisition and processing of a plenoptic sequence with the purpose of
estimating 3D shape at video rate. Examples are given of implementations on GPU and CPU platforms. Finally,
a modified version of the algorithm that handles domains whose sizes are not a power of two is proposed.
In this paper, we present a local adaptive filter for fast edge-preserving smoothing, a so-called cross-based filter.
The filter is mainly built on upright crosses and captures the local image structures adaptively. The cross-based
filter resembles the classic bilateral filter when the support weight is binarized and a spatial connectivity
constraint is imposed. For edge-preserving smoothing, our cross-based filter is capable of reaching
performance similar to that of the bilateral filter, while being dozens of times faster. The proposed filter can be applied in
near-constant time, using the integral images technique. In addition, the cross-based filter is highly parallel and
suitable for parallel computing platforms, e.g. GPUs. The strength of the proposed filter is illustrated in several
applications, i.e. denoising and image abstraction.
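To make the binarized support and the integral-image idea concrete, the sketch below applies a one-dimensional simplification to a single image row: each pixel grows a left and right arm while neighbouring intensities stay within a tolerance of the anchor, and the mean over that arm is read from a prefix sum. The full filter combines horizontal and vertical arms; the parameter names `tau` and `max_arm` are assumptions for illustration.

```python
def arm_lengths(row, tau, max_arm):
    """For each pixel, count how far the support extends to the left and right."""
    n = len(row)
    arms = []
    for x in range(n):
        left = 0
        while (left < max_arm and x - left - 1 >= 0
               and abs(row[x - left - 1] - row[x]) <= tau):
            left += 1
        right = 0
        while (right < max_arm and x + right + 1 < n
               and abs(row[x + right + 1] - row[x]) <= tau):
            right += 1
        arms.append((left, right))
    return arms


def cross_filter_row(row, tau=10, max_arm=3):
    """Average each pixel over its adaptive arm using a prefix sum (1-D integral image)."""
    prefix = [0]
    for value in row:
        prefix.append(prefix[-1] + value)
    out = []
    for x, (left, right) in enumerate(arm_lengths(row, tau, max_arm)):
        lo, hi = x - left, x + right + 1
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out
```

On the row `[10, 12, 11, 100, 101, 99]` with `tau=10`, the arms stop at the step between 11 and 100, so the two plateaus are smoothed to 11.0 and 100.0 without blurring the edge.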
In this paper, a linear discriminant analysis (LDA) based classifier employed in a tree structure is presented to
recognize the human actions in a wide and complex environment. In particular, the proposed classifier is based
on a supervised learning process and achieves the required classification in a multi-step process. This multi-step
process is performed simply by adopting a tree structure which is built during the training phase. Hence, there
is no need for any a priori information, as in other classifiers, such as the number of hidden neurons or hidden
layers in a multilayer neural network based classifier or an exhaustive search as used in training algorithms
for decision trees. A skeleton based strategy is adopted to extract the features from a given video sequence
representing any human action. A Pan-Tilt-Zoom (PTZ) camera is used to monitor the wide and complex test
environment. A background mosaic image is built offline and used to compute the background images in real
time. A background subtraction strategy has been adopted to detect objects in the frames and to
extract their corresponding silhouettes. A skeleton based process is used to extract attributes of a feature vector
corresponding to a human action. Finally, the proposed framework is tested on various indoor and outdoor
scenarios and encouraging results are achieved in terms of classification accuracy.
Wide area airborne surveillance (WAAS) systems are a new class of remote sensing imagers which have many
military and civilian applications. These systems are characterized by long loiter times (extended imaging
time over fixed target areas) and large footprint target areas. These characteristics complicate moving object
detection and tracking due to the large image size and high number of moving objects. This research evaluates
existing object detection and tracking algorithms with WAAS data and provides enhancements to the processing
chain which decrease processing time and maintain or increase tracking accuracy. Decreases in processing time
are needed to perform real-time or near real-time tracking either on the WAAS sensor platform or in ground
station processing centers. Increased tracking accuracy benefits real-time users and forensic (off-line) users.
Results of a comparative study of the computational complexity of different algorithms for numerical reconstruction of
electronically recorded holograms are presented and discussed. The following algorithms were compared: different types
of Fourier and convolutional algorithms and a new universal DCT-based algorithm, in terms of the number of operations.
Based on the comparison results, the feasibility of real-time implementation of numerical reconstruction of holograms is
Modern microscopy techniques allow imaging of circulating blood components under vascular flow conditions.
The resulting video sequences provide unique insights into the behavior of blood cells within the vasculature and
can be used as a method to monitor and quantitate the recruitment of inflammatory cells at sites of vascular
injury/inflammation and potentially serve as a pharmacodynamic biomarker, helping screen new therapies and
individualize dose and combinations of drugs. However, manual analysis of these video sequences is intractable,
requiring hours per 400 second video clip. In this paper, we present an automated technique to analyze the
behavior and recruitment of human leukocytes in whole blood under physiological conditions of shear through
a simple multi-channel fluorescence microscope in real-time. This technique detects and tracks the recruitment
of leukocytes to a bioactive surface coated on a flow chamber. Rolling cells (cells which partially bind to the
bioactive matrix) are detected, counted, and have their velocity measured and graphed. The challenges here
include high cell density, appearance similarity, and low (1 Hz) frame rate. Our approach performs frame
differencing based motion segmentation, track initialization and online tracking of individual leukocytes.
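A minimal sketch of frame-differencing motion segmentation followed by a simple track initialization step is shown below. The threshold, the use of 4-connected components and the function names are assumptions; the abstract does not specify these details.

```python
from collections import deque


def motion_mask(previous, current, threshold):
    """Mark pixels whose intensity changed by more than the threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(previous, current)]


def blob_centroids(mask):
    """Centroid (row, col) of every 4-connected blob, in scan order."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                pending = deque([(y, x)])
                seen[y][x] = True
                pixels = []
                while pending:
                    cy, cx = pending.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            pending.append((ny, nx))
                centroids.append((sum(p[0] for p in pixels) / len(pixels),
                                  sum(p[1] for p in pixels) / len(pixels)))
    return centroids
```

Each centroid could seed one candidate track; associating centroids across frames is the online tracking step and is not covered by this sketch.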
The H.264 video coding standard achieves high performance compression and image quality at the expense of increased
encoding complexity, due to the very refined Motion Estimation (ME) and mode decision processes. This paper focuses
on decreasing the complexity of the mode selection process by effectively applying a novel fast mode decision
Firstly the phase correlation is analysed between a macroblock and its prediction obtained from the previously encoded
adjacent block. Relationships are established between the correlation value and object size and also best fit motion
vector. From this a novel fast mode decision and motion estimation technique has been developed utilising preprocessing
frequency domain ME in order to accurately predict the best mode and the search range. We measure the
correlation between a macroblock and the corresponding prediction. Based on the result we select the best mode, or
limit the mode selection process to a subset of modes. Moreover the correlation result is also used to select an
appropriate search range for the ME stage.
Experimental results show that the proposed algorithm significantly reduces the motion estimation time whilst
maintaining similar Rate Distortion performance, when compared to both the H.264/AVC Joint Model (JM) reference
software and recently reported work.
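Phase correlation itself can be illustrated with a short one-dimensional sketch. It uses a naive DFT for self-containment, whereas a practical encoder would use a 2-D FFT on macroblocks; the function names are assumptions. The location of the peak gives the displacement between the two signals and its height indicates how well a pure shift explains the difference.

```python
import cmath


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a short sequence."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(signal[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out


def phase_correlation(reference, shifted):
    """Return (circular shift, peak height) of `shifted` relative to `reference`."""
    spectrum_a = dft(reference)
    spectrum_b = dft(shifted)
    cross = []
    for fa, fb in zip(spectrum_a, spectrum_b):
        product = fa.conjugate() * fb
        magnitude = abs(product)
        # Keep only the phase; skip bins that carry no energy.
        cross.append(product / magnitude if magnitude > 1e-12 else 0)
    surface = [value.real for value in dft(cross, inverse=True)]
    peak = max(range(len(surface)), key=surface.__getitem__)
    return peak, surface[peak]
```

A peak close to 1 means the block is well predicted by a single translation, which is the kind of cue the abstract uses to restrict the candidate modes and the search range.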
The objective of scalable video coding is to enable the generation of a unique bitstream that can adapt to various bitrates,
transmission channels and display capabilities. Scalability is categorised as temporal, spatial, and
quality scalability. To improve encoding efficiency, the SVC scheme incorporates inter-layer prediction mechanisms, which
increase the complexity of the overall encoding.
In this paper several conditional probabilities are established relating motion estimation characteristics and the mode
distribution at different layers of the H.264/SVC. An evaluation of these probabilities is used to structure a low-complexity
prediction algorithm for Group of Pictures (GOP) in H.264/SVC, reducing computational complexity whilst
maintaining similar performance. When compared to the JSVM software, this algorithm achieves a significant reduction
of encoding time, with a negligible average PSNR loss and bit-rate increase in temporal, spatial and SNR scalability.
Experiments are conducted to provide a comparison between our method and a recently developed fast mode selection
algorithm. These demonstrate our method achieves appreciable time savings for scalable spatial and scalable quality
video coding, while maintaining similar PSNR and bit rate.
This paper proposes a smart portable device, named the X-Eye, which provides a gesture interface with a small size but a
large display for the application of photo capture and management. The wearable vision system is implemented with
embedded systems and can achieve real-time performance. The hardware of the system includes an asymmetric dual-core
processor with an ARM core and a DSP core. The display device is a pico projector, which has a small volume
but can project a large screen. A triple buffering mechanism is designed for efficient memory management. Software
functions are partitioned and pipelined for effective execution in parallel. The gesture recognition is achieved first by a
color classification which is based on the expectation-maximization algorithm and Gaussian mixture model (GMM). To
improve the performance of the GMM, we devise a LUT (Look Up Table) technique. Fingertips are extracted and
geometrical features of the fingertip shapes are then matched to recognize the user's gesture commands.
In order to verify the accuracy of the gesture recognition module, experiments are conducted in eight scenes with
400 test videos including the challenge of colorful background, low illumination, and flickering. The processing speed of
the whole system, including the gesture recognition, is 22.9 FPS. Experimental results give a 99%
recognition rate. The experimental results demonstrate that this small-size, large-screen wearable system has an effective
gesture interface with real-time performance.
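The look-up-table idea for speeding up GMM color classification might look like the sketch below. It assumes a one-dimensional color feature quantized to 256 levels and hypothetical mixture parameters; the actual feature space, mixture size and decision threshold of the X-Eye system are not given in the abstract.

```python
import math


def gmm_density(value, components):
    """Evaluate a 1-D Gaussian mixture given (weight, mean, std) triples."""
    total = 0.0
    for weight, mean, std in components:
        z = (value - mean) / std
        total += weight * math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))
    return total


def build_lut(components, threshold, levels=256):
    """Precompute the classification decision once for every quantized value."""
    return [gmm_density(level, components) >= threshold for level in range(levels)]


def classify(pixels, lut):
    """Per-pixel classification becomes a single table look-up."""
    return [lut[p] for p in pixels]
```

The exponentials are evaluated only while the table is built, so the per-frame cost on the embedded processor reduces to one memory access per pixel.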
Tracking multiple vehicles with multiple cameras is a challenging problem of great importance in tunnel surveillance.
One of the main challenges is accurate vehicle matching across the cameras with non-overlapping fields
of view. Since systems dedicated to this task can contain hundreds of cameras which observe dozens of vehicles
each, computational efficiency is essential for real-time performance. In this paper, we propose a low complexity,
yet highly accurate method for vehicle matching using vehicle signatures composed of Radon-transform-like
projection profiles of the vehicle image. The proposed signatures can be calculated by a simple scan-line
algorithm, by the camera software itself and transmitted to the central server or to the other cameras in a smart
camera environment. The amount of data is drastically reduced compared to the whole image, which relaxes the
data link capacity requirements. Experiments on real vehicle images, extracted from video sequences recorded
in a tunnel by two distant security cameras, validate our approach.
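A minimal sketch of signature-based matching is given below. It uses only horizontal and vertical projections (row and column sums) and a correlation coefficient as the similarity measure, and it assumes equally sized vehicle images; the exact profiles, normalisation and matching measure used in the paper are not stated in the abstract, and the function names are illustrative.

```python
import math


def projection_signature(image):
    """Concatenate row sums and column sums, computed in one scan of the image."""
    rows = [sum(row) for row in image]
    cols = [sum(col) for col in zip(*image)]
    return rows + cols


def correlation(a, b):
    """Pearson correlation between two equally long signatures."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def best_match(query, candidates):
    """Index of the candidate image whose signature correlates best with the query."""
    q = projection_signature(query)
    scores = [correlation(q, projection_signature(c)) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

For an image of H rows and W columns the signature holds H + W values instead of H x W pixels, which illustrates why transmitting signatures relaxes the data link requirements.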