Automatic vehicle Make and Model Recognition (MMR) systems provide useful performance enhancements to vehicle
recognition systems that are based solely on Automatic License Plate Recognition (ALPR). Several car MMR
systems have been proposed in the literature. However, these approaches are based on feature detection algorithms that
can perform sub-optimally under adverse lighting and/or occlusion conditions. In this paper we propose a real-time,
appearance-based car MMR approach using Two-Dimensional Linear Discriminant Analysis (2D-LDA) that is capable of
addressing this limitation. We provide experimental results to analyse the proposed algorithm's robustness under varying
illumination and occlusion conditions. We show that the best performance with the proposed 2D-LDA based car
MMR approach is obtained when the eigenvectors of lower significance are ignored. For the given database of 200 car
images of 25 different make-model classes, a best accuracy of 91% was obtained with the 2D-LDA approach. We
use a direct Principal Component Analysis (PCA) based approach as a benchmark to compare and contrast the
performance of the proposed 2D-LDA approach to car MMR. We conclude that, in general, the 2D-LDA based algorithm
surpasses the performance of the PCA based approach.
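As a rough illustration of the technique (not the authors' implementation), 2D-LDA operates directly on image matrices rather than on flattened vectors: column-wise between-class and within-class scatter matrices are accumulated, and the projection is taken from the leading generalized eigenvectors. A minimal numpy sketch, with all function and variable names our own:

```python
import numpy as np

def two_d_lda(images_by_class, k):
    """Compute a 2D-LDA projection matrix from grayscale image matrices.

    images_by_class: list of lists of 2-D arrays (one inner list per class).
    k: number of leading eigenvectors to keep (lower-significance ones dropped).
    """
    all_imgs = [img for cls in images_by_class for img in cls]
    global_mean = np.mean(all_imgs, axis=0)
    n_cols = global_mean.shape[1]
    Gb = np.zeros((n_cols, n_cols))   # between-class scatter
    Gw = np.zeros((n_cols, n_cols))   # within-class scatter
    for cls in images_by_class:
        class_mean = np.mean(cls, axis=0)
        d = class_mean - global_mean
        Gb += len(cls) * d.T @ d
        for img in cls:
            e = img - class_mean
            Gw += e.T @ e
    # Solve the generalized eigenproblem Gb w = lambda Gw w.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Gw) @ Gb)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:k]].real   # (n_cols, k) projection matrix
```

A test image A is then represented by its projection A @ W and classified by nearest neighbour in that reduced space.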
In this paper we extend our previous work to address vehicle differentiation in traffic density computations1. The main
goal of this work is to create vehicle density history for given roads under different weather or light conditions and at
different times of the day. Vehicle differentiation is important to account for connected or otherwise long vehicles, such
as trucks or tankers, which lead to over-counting with the original algorithm. Average vehicle size in pixels, given the
magnification within the field of view for a particular camera, is used to separate regular cars and long vehicles.
A separate algorithm and procedure have been developed to determine traffic density after dark when the vehicle
headlights are turned on. Nighttime vehicle recognition utilizes blob analysis based on head/taillight images. The
high-intensity vehicle lights are identified in binary images for nighttime vehicle detection.
The stationary traffic image frames are downloaded from the internet as they are updated. The procedures are
implemented in MATLAB. The results of both nighttime traffic density and daytime long vehicle identification
algorithms are described in this paper. The determination of nighttime traffic density and the identification of long
vehicles in daytime are improvements over the original work1.
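The size-based separation of regular cars from long vehicles can be sketched in a few lines; the threshold factor below is our own illustrative choice, not a value from the paper:

```python
def classify_vehicle(blob_area_px, avg_car_area_px, long_factor=1.8):
    """Label a detected blob as a regular car or a long vehicle by comparing
    its pixel area with the average car area for this camera's magnification
    and field of view. long_factor is an assumed, illustrative threshold."""
    return "long" if blob_area_px > long_factor * avg_car_area_px else "car"
```

A long vehicle then contributes a single count instead of the several counts its connected blob would otherwise produce.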
Video surveillance is ubiquitous in modern society, but surveillance cameras are severely limited in utility by their low
resolution. With this in mind, we have developed a system that can autonomously take high resolution still frame
images of moving objects. In order to do this, we combine a low resolution video camera and a high resolution still
frame camera mounted on a pan/tilt mount. In order to determine what should be photographed (objects of interest), we
employ a hierarchical method which first separates foreground from background using a temporal-based median
filtering technique. We then use a feed-forward neural network classifier on the foreground regions to determine
whether the regions contain the objects of interest. This is done over several frames, and a motion vector is deduced for
the object. The pan/tilt mount then focuses the high resolution camera on the next predicted location of the object, and
an image is acquired. All components are controlled through a single MATLAB graphical user interface (GUI). The
final system we present will be able to detect multiple moving objects simultaneously, track them, and acquire high
resolution images of them. Results will demonstrate performance in tracking and imaging varying numbers of objects
moving at different speeds.
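The temporal-median foreground/background separation step can be sketched with numpy (the deviation threshold is illustrative):

```python
import numpy as np

def foreground_mask(history, frame, thresh=25):
    """Estimate the background as the per-pixel temporal median of recent
    frames, then flag pixels that deviate strongly as foreground."""
    background = np.median(history, axis=0)
    return np.abs(frame.astype(float) - background) > thresh
```

The resulting foreground regions are what the neural-network classifier then inspects for objects of interest.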
In this paper, a method that estimates the trajectory of a vehicle from a single on-board camera is proposed.
The proposed method is model based and assumes that the vehicle is travelling on a planar road. Each input
image is converted to a Top-View image, and a matching (registration) against the next Top-View image
is performed. The registration is carried out for an assumed velocity parameter and repeated over all candidate
parameters. In this paper, a simple motion model and a particle filter are introduced to decrease the computational cost.
The simple model constrains the registration of the Top-View images, and the particle filter reduces the
number of candidate parameters. The position of the camera is obtained by accumulating the velocity parameters.
Experiments show three results: a sufficient reduction of the computational cost, suitable estimated trajectories, and
a computational cost small enough to estimate the trajectory of the vehicle in practice.
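The registration-by-hypothesis step can be illustrated as an exhaustive search over candidate displacements, each scored against the next Top-View image; a particle filter would evaluate only a sampled subset of these candidates. The SSD score and all names below are our own:

```python
import numpy as np

def best_shift(prev_top, next_top, candidates):
    """Score each candidate row displacement by the sum of squared
    differences after shifting, and keep the best-matching one."""
    def ssd(d):
        return float(np.sum((np.roll(prev_top, d, axis=0) - next_top) ** 2))
    return min(candidates, key=ssd)
```

Accumulating the winning displacement per frame pair yields the camera trajectory, as the abstract describes.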
Print-from-video can be achieved via super-resolution techniques, which involve combining information from multiple low resolution images to generate a high resolution image. Due to inaccuracies of sub-pixel motion estimation and motion modeling, undesired artifacts or outliers are produced when using such techniques. This paper discusses the use of the direct approach for the print-from-video application and introduces an outlier reduction algorithm, named pattern filtering, as part of the super-resolution reconstruction process. The introduced algorithm is non-iterative, making it computationally efficient for deployment on digital camera platforms.
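The paper's pattern-filtering algorithm is not reproduced here, but the general idea of non-iterative, outlier-robust fusion in direct super-resolution can be conveyed by a per-pixel median over registered frames; shifts and names below are our own illustrative assumptions:

```python
import numpy as np

def fuse_frames(frames, shifts):
    """Undo each frame's known integer (dy, dx) shift, then take the
    per-pixel median across the aligned stack; the median rejects the
    outliers that inaccurate motion estimates would otherwise inject."""
    aligned = [np.roll(f, (-dy, -dx), axis=(0, 1))
               for f, (dy, dx) in zip(frames, shifts)]
    return np.median(aligned, axis=0)
```

Because this is a single pass with no iteration, it matches the computational profile the abstract targets for in-camera deployment.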
We describe a fast contour descriptor algorithm and its application to a distributed supernova detection system (the
Nearby Supernova Factory) that processes 600,000 candidate objects in 80 GB of image data per night. Our shape-detection
algorithm reduced the number of false positives generated by the supernova search pipeline by 41% while
producing no measurable impact on running time. Fourier descriptors are an established method of numerically
describing the shapes of object contours, but transform-based techniques are ordinarily avoided in this type of
application due to their computational cost. We devised a fast contour descriptor implementation for supernova
candidates that meets the tight processing budget of the application. Using the lowest-order descriptors (F1 and F-1) and
the total variance in the contour, we obtain one feature representing the eccentricity of the object and another denoting
its irregularity. Because the number of Fourier terms to be calculated is fixed and small, the algorithm runs in linear
time, rather than the O(n log n) time of an FFT. Constraints on object size allow further optimizations so that the total
cost of producing the required contour descriptors is about 4n addition/subtraction operations, where n is the length of
the contour.
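Because only F1 and F-1 are needed, the descriptors can be evaluated by direct summation in linear time, as the abstract notes. A small stdlib sketch (reading the eccentricity feature as the ratio |F-1|/|F1| is our interpretation):

```python
import cmath, math

def low_order_fds(contour):
    """Directly evaluate the Fourier descriptors F1 and F-1 of a closed
    contour given as complex points z_j, in O(n) time with no FFT."""
    n = len(contour)
    f1 = sum(z * cmath.exp(-2j * math.pi * j / n)
             for j, z in enumerate(contour)) / n
    fm1 = sum(z * cmath.exp(2j * math.pi * j / n)
              for j, z in enumerate(contour)) / n
    return abs(f1), abs(fm1)
```

For a circle |F-1| vanishes, while for an ellipse with semi-axes a and b the ratio |F-1|/|F1| equals (a-b)/(a+b), which is why these two terms capture eccentricity.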
We present two fast algorithms that approximate the distance transformation of 2D binary images. The distance
transformation finds the minimum distance of every data point from a set of given object points;
however, such an exhaustive search for the minimum distances is infeasible in larger data spaces.
Unlike conventional approaches, we extract the minimum distances with no explicit distance computation
by using either multi-directional dual scan-line propagation or wave propagation. We
iteratively move along a scan line in opposite directions and assign an incremental counter to the underlying
data points while checking for object points. To our advantage, the precision of the dual scan propagation
method can be set according to the available computational power. Alternatively, we start a wavefront
from the object points and propagate it outward at each step while assigning the number of steps taken as
the minimum distance. Unlike most existing approaches, the computational load of our algorithm
also does not depend on the number of object points.
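The wave-propagation variant is essentially a breadth-first flood from the object points; a pure-Python sketch with a 4-neighbour front (which yields city-block distances):

```python
from collections import deque

def wavefront_distance(grid):
    """grid: 2-D list where 1 marks object points. Seed a wavefront at every
    object point and expand it one step at a time; the step count at which
    the front reaches a cell is recorded as its minimum distance."""
    h, w = len(grid), len(grid[0])
    dist = [[None] * w for _ in range(h)]
    q = deque()
    for y in range(h):
        for x in range(w):
            if grid[y][x]:
                dist[y][x] = 0
                q.append((y, x))
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                q.append((ny, nx))
    return dist
```

Each cell is visited exactly once regardless of how many object points seeded the front, which mirrors the abstract's claim that the load does not depend on the number of object points.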
Due to the intricacies in the algorithms involved, the design of imaging software is considered to be more complex than
non-image processing software (Sangwan et al, 2005). A recent investigation (Larsson and Laplante, 2006) examined
the complexity of several image processing and non-image processing software packages along a wide variety of
metrics, including those postulated by McCabe (1976), Chidamber and Kemerer (1994), and Martin (2003). This work
found that it was not always possible to quantitatively compare the complexity between imaging applications and
non-image processing systems. Newer research and an accompanying tool (Structure 101, 2006), however, provide a
greatly simplified approach to measuring software complexity. Therefore it may be possible to definitively quantify the
complexity differences between imaging and non-imaging software, between imaging and real-time imaging software,
and between software programs of the same application type.
In this paper, we review prior results and describe the methodology for measuring complexity in imaging systems. We
then apply a new complexity measurement methodology to several sets of imaging and non-imaging code in order to
compare the complexity differences between the two types of applications. The benefit of such quantification is far
reaching, for example, leading to more easily measured performance improvement and quality in real-time imaging
systems.
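As a toy example of the kind of metric involved, McCabe's cyclomatic complexity can be approximated as one plus the number of decision points in the code; a sketch using Python's ast module (the set of node types counted is a simplification, not the metric as any particular tool computes it):

```python
import ast

def cyclomatic_complexity(source):
    """Approximate McCabe (1976) complexity of Python source:
    1 + the number of branching constructs found in the AST."""
    decision_nodes = (ast.If, ast.For, ast.While, ast.IfExp,
                      ast.ExceptHandler, ast.BoolOp)
    tree = ast.parse(source)
    return 1 + sum(isinstance(n, decision_nodes) for n in ast.walk(tree))
```

Running such a metric uniformly over imaging and non-imaging code bases is the sort of quantitative comparison the paper pursues.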
This paper presents a software framework providing a platform for parallel and distributed processing of video
data on a cluster of SMP computers. Existing video-processing algorithms can be easily integrated into the
framework by considering them as atomic processing tiles (PTs). PTs can be connected to form processing graphs
that model the data flow of a specific application. This graph also defines the data dependencies that determine
which tasks can be computed in parallel. Scheduling of the tasks in this graph is carried out automatically using
a pool-of-tasks scheme. The data format that can be processed by the framework is not restricted to image data,
so that intermediate data, such as detected feature points or object positions, can also be transferred between PTs.
Furthermore, the processing can optionally be carried out efficiently on special-purpose processors with separate
memory, since the framework minimizes the transfer of data. Finally, we describe an example application for a
multi-camera view-interpolation system that we successfully implemented on the proposed framework.
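The pool-of-tasks idea, dispatching every PT whose input data are available, can be sketched with the standard library's graphlib; the processing graph and tile names below are invented for illustration:

```python
from graphlib import TopologicalSorter

# A toy processing graph: each tile (PT) lists the tiles it depends on.
graph = {"decode": set(),
         "features": {"decode"},
         "positions": {"decode"},
         "interpolate": {"features", "positions"}}

ts = TopologicalSorter(graph)
ts.prepare()
schedule = []                       # batches that could run in parallel
while ts.is_active():
    ready = sorted(ts.get_ready())  # all tiles whose inputs are available
    schedule.append(ready)          # a real scheduler dispatches these to workers
    for tile in ready:
        ts.done(tile)
```

Here "features" and "positions" land in the same batch: the data dependencies alone determine which tasks the pool may run concurrently.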
Long-range observation systems attract a lot of interest in many fields, such as astronomy (i.e.
planet exploration), geology, ecology, traffic control, remote sensing, and homeland security (surveillance and military
intelligence). Ideally, image quality would be limited only by the optical setup used, but in such systems the major
cause of image distortion is atmospheric turbulence. The paper presents a real-time algorithm that compensates for
image distortion due to atmospheric turbulence in video sequences, while keeping the real moving objects in the video
unharmed. The algorithm is based on moving-object extraction; hence turbulence distortion compensation is applied
only to the static areas of images. For that purpose a hierarchical decision mechanism is suggested. First, a lightweight
computational decision mechanism which extracts most stationary areas is applied. Then a second step improves
accuracy by more computationally complex algorithms. Finally, all areas in the incoming frame that were tagged as
stationary are replaced with an estimation of the stationary scene. The restored videos exhibit excellent stability for
stationary objects while retaining real motion. This is achieved in real-time on standard computer hardware.
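The final replacement step can be sketched as maintaining a running estimate of the stationary scene and substituting it into the regions tagged as stationary; the blending factor and names are our illustrative choices, not the paper's:

```python
import numpy as np

def compensate(frame, scene_estimate, stationary_mask, alpha=0.05):
    """Update the stationary-scene estimate where the frame is tagged as
    stationary, and output that estimate there; tagged moving areas pass
    through untouched so real motion is preserved."""
    scene_estimate = np.where(stationary_mask,
                              (1 - alpha) * scene_estimate + alpha * frame,
                              scene_estimate)
    output = np.where(stationary_mask, scene_estimate, frame)
    return output, scene_estimate
```

Averaging over many frames cancels the zero-mean geometric jitter that turbulence induces in static areas.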
The MPEG-4 video standard extends the traditional frame-based processing with the option to compose several
video objects (VO) superimposed on a background sprite image. In our previous work, we presented a distributed,
multiprocessor based, scalable implementation of an MPEG-4 arbitrary-shaped decoder, which forms together
with the background sprite decoder an essential part for further scene rendering. For control of the multiprocessor
architecture, we have constructed a Quality-of-Service (QoS) management that monitors the availability of
required data and distributes the processing of individual tasks with guaranteed or best-effort services of the
platform. However, the proposed architecture with the combined guaranteed and best-effort services poses
problems for real-time scene rendering.
In this paper, we present a technique for proper run-time rendering of the final scene after decoding one VO
Layer. The individual video-object monitors check the data availability and select the highest quality for the
final scene rendering. The algorithm operates hierarchically both at the scene level and at the task level of the
video object processing. Whereas the earlier work on scalable implementation concentrated only on guaranteed
services, we now introduce a new element in the system architecture for the real-time control and fall back
mechanism of the best-effort services. This element is based on first, controlling data availability at task level,
and second, introducing the propagation service to QoS management. We present our simulation results in
comparison with the standard "frame-skipping" technique, which is currently the only available solution for this
type of rendering in a scalable processing system.
This paper provides an overall description of a new image compression technology, Xena, and of its
strengths in lossless compression capability and speed in comparison with JPEG2000 and JPEG-LS, the world
standards in the continuous-tone image compression field. Xena achieves an extremely high compression
speed, over 20 times faster than that of JPEG2000, while its compression capability remains almost the same.
Nowadays, videoconferencing is becoming more and more advantageous because of the economic and
ecological cost of transport. Several platforms exist. The goal of the TIFANIS immersive platform is to let
users interact as if they were physically together. Unlike previous teleimmersion systems, TIFANIS uses
generic hardware to achieve an economically realistic implementation. The basic functions of the system are
to capture the scene, transmit it through digital networks to other partners, and then render it according to
each partner's viewing characteristics. The image processing part should run in real time.
We propose to analyze the whole system. It can be split into different services, such as central processing
unit (CPU), graphical rendering, direct memory access (DMA), and communications through the network.
Most of the processing is done by the CPU resource. It is composed of the 3D reconstruction and the detection
and tracking of faces from the video stream. However, the processing needs to be parallelized into several
threads that have as few dependencies as possible. In this paper, we present these issues and the way we deal
with them.
In this paper, we discuss a hardware-based low-complexity JPEG 2000 video coding system. The
hardware system is based on a software simulation system in which temporal redundancy is exploited by coding
differential frames arranged in an adaptive GOP structure, whereby the GOP structure itself is determined by
statistical analysis of the differential frames. We present a hardware video coding architecture which applies this
inter-frame coding system on a Digital Signal Processor (DSP). The system consists mainly of a microprocessor
(ADSP-BF533 Blackfin processor) and a JPEG 2000 chip (ADV202).
Processing of vector image information is very important because multichannel sensors are used in many different
applications. We introduce novel algorithms to process color images that are based on order statistics and vectorial
processing techniques: the Video Adaptive Vector Directional (VAVDF) and the Vector Median M-type K-Nearest
Neighbour (VMMKNN) filters presented in this paper. It is demonstrated that the novel algorithms effectively
suppress impulsive noise in 3D video color sequences in comparison with several other methods. Simulation
results have been obtained using the video sequences "Miss America" and "Flowers", which were corrupted by noise.
The filters KNNF, VGVDF, VMMKNN, and, finally, the proposed VAVDATM have been investigated. The
PSNR, MAE and NCD criteria demonstrate that the VAVDATM filter shows the best performance on each
criterion when the noise intensity is more than 7-10%. An attempt to realize real-time processing of the
median-type algorithms on a DSP is presented.
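This family of filters builds on the vector median, which picks the window sample minimizing the aggregate distance to all other samples in the window; a stdlib sketch of that building block (the adaptive and directional extensions are not reproduced):

```python
import math

def vector_median(window):
    """Return the pixel (an RGB tuple) of the window that minimizes the sum
    of Euclidean distances to every other pixel in the window; impulsive
    outliers are never selected because they lie far from the rest."""
    return min(window, key=lambda v: sum(math.dist(v, u) for u in window))
```

Unlike channel-wise scalar medians, the output is always one of the original vectors, so no artificial colors are introduced.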
In high-speed image acquisition and processing systems, the speed of operation is often limited by the amount
of available light, due to short exposure times. Therefore, high-speed applications often use line-scan cameras, based on
charge-coupled device (CCD) sensors with time delayed integration (TDI). Synchronous shift and accumulation of
photoelectric charges on the CCD chip - according to the objects' movement - result in a longer effective exposure time
without introducing additional motion blur. This paper presents a high-speed color line-scan camera based on a
commercial complementary metal oxide semiconductor (CMOS) area image sensor with a Bayer filter matrix and a field
programmable gate array (FPGA). The camera implements a digital equivalent to the TDI effect exploited with CCD
cameras. The proposed design benefits from the high frame rates of CMOS sensors and from the possibility of arbitrarily
addressing the rows of the sensor's pixel array. For the digital TDI, only a small number of rows is read out from the
area sensor; these are then shifted and accumulated according to the movement of the inspected objects. This paper gives a
detailed description of the digital TDI algorithm implemented on the FPGA. Relevant aspects for the practical
application are discussed and key features of the camera are listed.
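The shift-and-accumulate core of digital TDI can be sketched as follows: each output line sums the same scene line as it traverses the N read-out rows over N consecutive frames. This is a simplified model assuming exactly one row of motion per frame; all names are ours:

```python
import numpy as np

def digital_tdi(frames):
    """frames: list of (N, W) readouts of the same N sensor rows, with the
    scene moving one row per frame. Output line t accumulates frames[t+k][k]
    over the N stages k, extending the effective exposure N-fold."""
    N, W = frames[0].shape
    T = len(frames) - N + 1
    out = np.zeros((T, W))
    for t in range(T):
        for k in range(N):
            out[t] += frames[t + k][k]
    return out
```

Because the shift exactly tracks the object motion, the N-fold accumulation brightens the image without adding motion blur, which is the TDI effect the camera reproduces digitally on the FPGA.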
This paper presents an approach based on linear combinations of order statistics for speckle and impulsive noise
reduction in 3-D ultrasound images. The proposed technique uses the Rank M-type (RM) estimator, adapted here to
3-D image processing applications. A real-time implementation on the TMS320C6711 DSP is presented using real
clinical ultrasound images. In addition, the results of known techniques are compared with the proposed method to
demonstrate its performance in terms of noise suppression, fine-detail preservation, and processing time.
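A heavily simplified scalar stand-in for the rank-based estimation idea (the actual RM estimator combines order statistics with M-estimation; the symmetric trimming below is only illustrative):

```python
def trimmed_rank_estimate(window, trim=1):
    """Sort the window samples, discard `trim` extreme values on each side
    (rejecting impulses), and average the remaining ranks."""
    s = sorted(window)
    core = s[trim:len(s) - trim]
    return sum(core) / len(core)
```

Weighting the retained ranks non-uniformly, rather than averaging them, is what a linear combination of order statistics adds over this sketch.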
Region quadtrees are convenient tools for hierarchical image analysis. Like the related Haar wavelets, they are simple to
generate within a fixed calculation time. The clustering at each resolution level requires only local data, yet they deliver
intuitive classification results. Although the region quadtree partitioning is very rigid, it can be rapidly computed from
arbitrary imagery. This research article demonstrates how graphics hardware can be utilized to build region quadtrees at
unprecedented speeds. To achieve this, a data-structure called HistoPyramid registers the number of desired image
features in a pyramidal 2D array. Then, this HistoPyramid is used as an implicit indexing data structure through
quadtree traversal, creating lists of the registered image features directly in GPU memory, and virtually eliminating bus
transfers between CPU and GPU. With this novel concept, quadtrees can be applied in real-time video processing on
standard PC hardware. A multitude of applications in image and video processing arises, since region quadtree analysis
becomes a light-weight preprocessing step for feature clustering in vision tasks, motion vector analysis, PDE
calculations, or data compression. As a side note, we outline how this algorithm can be applied to 3D volume data,
effectively generating region octrees purely on graphics hardware.
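The HistoPyramid itself is a reduction pyramid of feature counts; a numpy sketch of its construction for a square power-of-two image (the GPU-side quadtree traversal that builds feature lists from it is omitted):

```python
import numpy as np

def histopyramid(mask):
    """Level 0 holds per-cell feature counts (0/1); each coarser level sums
    2x2 blocks of the level below, so the single apex cell holds the total
    feature count and each cell indexes its quadrant's contribution."""
    levels = [mask.astype(np.int64)]
    while levels[-1].shape[0] > 1:
        a = levels[-1]
        levels.append(a[0::2, 0::2] + a[1::2, 0::2]
                      + a[0::2, 1::2] + a[1::2, 1::2])
    return levels
```

Descending from the apex and using the four child counts as offsets is what lets the GPU write a compact feature list without ever transferring intermediate data to the CPU.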
This paper presents a new algorithm for color-based tracking of objects with radically changing color, using a modified mean shift. Conventional color-based object tracking using mean shift does not provide appropriate results when the initial color distribution disappears. In the proposed algorithm, mean-shift analysis is first used to derive the object candidate in the direction of the maximum increase in density from the current position. The proposed algorithm is then applied iteratively to update the object color information whenever the object color changes. The implementation of the new algorithm achieves effective real-time tracking of objects whose color changes completely over time. The validity of the approach is illustrated by experimental results obtained using the methods described in the paper.
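For reference, the underlying mean-shift step moves a window to the local centroid of a weight image (e.g. color-likelihood scores) until it stops; a pure-Python sketch of that step only (the paper's color-model update on top of it is not reproduced):

```python
def mean_shift(weight, x, y, r, iters=20):
    """Move a square window of radius r centred at (x, y) to the centroid of
    the weights it covers, repeating until the position stops changing."""
    h, w = len(weight), len(weight[0])
    for _ in range(iters):
        sx = sy = sw = 0.0
        for j in range(max(0, y - r), min(h, y + r + 1)):
            for i in range(max(0, x - r), min(w, x + r + 1)):
                sw += weight[j][i]
                sx += i * weight[j][i]
                sy += j * weight[j][i]
        if sw == 0:
            break                       # no support in the window
        nx, ny = int(sx / sw + 0.5), int(sy / sw + 0.5)
        if (nx, ny) == (x, y):
            break                       # converged
        x, y = nx, ny
    return x, y
```

When the tracked object's color drifts, the weight image computed from the initial histogram collapses, which is exactly the failure mode the paper's iterative color update addresses.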
Today's technologies in video analysis use state-of-the-art systems and formalisms, such as ontologies and data
warehousing, to handle the huge amounts of data generated from low-level to high-level descriptors. In the IST
CARETAKER project we develop a multi-dimensional database with distributed features to provide a centric data
view of the scene shared between all the sensors of a network.
We propose to enhance the possibilities of this kind of system by delegating the intelligence to many other
entities, known as "agents": small specialized applications able to move across the network and
work on dedicated sets of data related to their core domain. In other words, we can reduce, or enhance, the
complexity of the analysis by adding or removing feature-specific agents, and processing is limited to the data
concerned.
This article explains how to design and develop an agent-oriented system that can be used by a video
analysis data warehouse. We also describe how this methodology can distribute the intelligence over the system,
and how the system can be extended to obtain a self-reasoning architecture using cooperative agents. We
demonstrate this approach.
In this paper, we present a scale-independent automatic face location technique that can detect the locations of frontal
human faces in images. Our hierarchical approach to knowledge-based face detection is composed of three levels. Level
1 consists of a simple but effective eye model that generates a set of rules to judge whether or not a human
face candidate exists in the current search area, in a scale-independent manner and in a single scan of the image. To
utilize this model, we define a new operator, the extended projection, and two new concepts: the single projection line
and the pair projection line. At level 2, an improved version of Yang's mosaic image model is applied to check the
consistency of visual features with respect to the human face within each 3x3 block of a candidate face image. At the
third level, we apply an SVM-based face model to eliminate the false positives obtained from level 2. Experimental
results show that the combined rule-based and statistical approach works well in detecting frontal human faces in
uncluttered scenes.
A digital architecture for real-time processing in vision systems for traffic-light control is presented. The main idea of
this work is to identify cars at intersections, switching traffic lights in order to reduce traffic jams. The architecture is
based on a color image segmentation algorithm that comprises three stages. Stage one is a color space transformation:
in order to measure color differences properly, image colors are represented in a modified L*u*v* color space. Stage
two consists of a color reduction, where image colors are projected onto a small set of prototypes using a self-organizing
map (SOM). Stage three performs color clustering, where simulated annealing (SA) seeks the optimal clusters from the
SOM prototypes. The proposed hardware architecture is implemented and tested in a Virtex II Pro FPGA, achieving a
processing time below 25 ms per 128x128-pixel image. The implementation comprises 262,479 equivalent gates.
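The effect of stage two, mapping every pixel to its nearest prototype color, can be sketched independently of how the SOM learned the prototypes; the prototype set below is purely illustrative:

```python
def quantize(pixels, prototypes):
    """Replace each RGB pixel by the index of its nearest prototype color
    (squared Euclidean distance), as in the color-reduction stage."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return [min(range(len(prototypes)), key=lambda k: d2(p, prototypes[k]))
            for p in pixels]
```

Clustering then operates on the small prototype set rather than on the full image, which is what makes the subsequent simulated-annealing stage cheap enough for the FPGA's timing budget.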