In this paper, a novel distortion model based on a mixture of Laplacian distributions is presented for the transform coefficients of prediction residues in quadtree coding. The mixture Laplacian distribution is constructed over the coding structure with different quadtree coding unit (CU) depths. Moreover, for intra-coded CUs, the distortion model is asymptotically simplified based on the signal characteristics of the transform coefficients. The proposed mixture model of multiple Laplacian distributions is tested on the High Efficiency Video Coding (HEVC) Test Model (HM) with its quadtree-structured CUs and Transform Units (TUs). The experimental results show that the proposed model achieves more accurate distortion estimates than single-distribution models.
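As an illustration of the underlying idea (not the paper's exact model), a mixture of Laplacian densities with one component per CU depth can be evaluated and used to predict quantization distortion numerically; the weights, scales, and uniform quantizer below are hypothetical:

```python
import numpy as np

def laplacian_pdf(x, b):
    # Zero-mean Laplacian density with scale b
    return np.exp(-np.abs(x) / b) / (2.0 * b)

def mixture_pdf(x, weights, scales):
    # weights: mixing proportions (e.g., fraction of blocks at each CU depth)
    return sum(w * laplacian_pdf(x, b) for w, b in zip(weights, scales))

def expected_distortion(weights, scales, q_step, n=20001, span=50.0):
    # Numerically integrate (x - Q(x))^2 * p(x) for a uniform quantizer
    x = np.linspace(-span, span, n)
    xq = np.round(x / q_step) * q_step
    p = mixture_pdf(x, weights, scales)
    return np.trapz((x - xq) ** 2 * p, x)
```

For fine quantization the result approaches the familiar q²/12; the mixture weights let the estimate reflect the CU-depth composition of a frame.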
HEVC is the new video coding standard developed in a joint effort (JCT-VC) by ISO MPEG and ITU-T VCEG. Like other state-of-the-art block-based inter-prediction codecs, it is very sensitive to illumination variations between frames. To cope with this limitation, the weighted prediction (WP) tool has been proposed. A comparison of the performance of WP in HEVC and MPEG-4 AVC/H.264 is carried out. The efficiency of WP depends strongly on the quality of the estimated WP parameters. The different stages of state-of-the-art WP parameter estimators are discussed and a new algorithm is proposed, based on histogram matching with global motion compensation. Several options are evaluated and compared with other existing methods.
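A minimal sketch of WP parameter estimation via histogram matching (omitting the global motion compensation stage; the bin settings and the linear fit over populated bins are illustrative assumptions):

```python
import numpy as np

def wp_params_histogram_matching(ref, cur, bins=256):
    # Cumulative histograms of the reference and current frames
    h_ref, edges = np.histogram(ref, bins=bins, range=(0, 256))
    h_cur, _ = np.histogram(cur, bins=bins, range=(0, 256))
    c_ref = np.cumsum(h_ref) / h_ref.sum()
    c_cur = np.cumsum(h_cur) / h_cur.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Map each reference intensity to the current-frame intensity with the
    # same cumulative probability, then fit cur ~ w * ref + o (weight, offset)
    matched = np.interp(c_ref, c_cur, centers)
    mask = (c_ref > 0.01) & (c_ref < 0.99)   # ignore empty tail bins
    w, o = np.polyfit(centers[mask], matched[mask], 1)
    return w, o
```

The recovered weight/offset pair is what a WP-enabled encoder would signal for the reference picture.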
We examine the impact of various encoding parameters on the distribution of the DCT coefficients for H.264-like
video coders. We model the distribution of the frame DCT coefficients using the most common Laplacian and
Cauchy distributions. We show that the resolution, the quantization levels and the coding type have significant
impact on the accuracy of the Laplacian and Cauchy distribution based models. We also show that the transform
kernel (4×4 vs. 8×8) has little impact. Moreover, we show that for the video sources that have little temporal
or spatial detail, such as flat regions, the distribution of the frame DCT coefficients resembles a Laplacian
distribution. When the video source exhibits more detail, such as texture and edges, the distribution of the
frame DCT coefficients resembles a Cauchy distribution. The correlation between the detail level of the video source and the two probability distributions can be used to further improve the estimation of the distribution of the frame DCT coefficients by using a classification-based approach.
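The comparison between the two candidate models can be sketched as follows, using the maximum-likelihood Laplacian scale and a median-based Cauchy scale (the latter is a simplifying assumption), then keeping the distribution with the higher log-likelihood:

```python
import numpy as np

def fit_laplacian(coeffs):
    b = np.mean(np.abs(coeffs))           # ML estimate of the Laplacian scale
    ll = np.sum(-np.abs(coeffs) / b - np.log(2.0 * b))
    return b, ll

def fit_cauchy(coeffs):
    g = np.median(np.abs(coeffs))         # robust scale estimate (assumption)
    ll = np.sum(-np.log(np.pi * g * (1.0 + (coeffs / g) ** 2)))
    return g, ll

def better_model(coeffs):
    # Classify a set of frame DCT coefficients by log-likelihood
    _, ll_lap = fit_laplacian(coeffs)
    _, ll_cau = fit_cauchy(coeffs)
    return "laplacian" if ll_lap >= ll_cau else "cauchy"
```

Flat, low-detail content tends to win the Laplacian comparison, while heavy-tailed coefficient sets from textured content favor the Cauchy model.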
To improve both coding efficiency and visual quality of video coding, in this paper we present an adaptive loop filtering design which exploits local directional characteristics exhibited in the video content. The design combines linear spatial filtering and directional filtering with a similarity mapping function. We compute and compare multiple simple directional features to classify blocks in a video frame into classes with different dominant orientations. Each class of blocks adapts to a directional filter, with symmetric constraints imposed on the filter coefficients according to the dominant orientation determined by the classification. To emphasize pixel similarity for explicit adaptation to edges, we use a simple hard-threshold mapping function to avoid artifacts arising from across-edge filtering. Our design uses only 4 filters per frame with a fixed 7×7 diamond-shaped filter support, while achieving better coding efficiency and improved visual quality, especially along edges, compared to other approaches using up to 16 filters with up to 7 vertical × 9 horizontal diamond-shaped filter support.
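One simple realization of such directional features (the class labels and difference measures here are illustrative, not the paper's actual classifier) compares absolute pixel differences along four orientations and assigns the block to the orientation with the least variation:

```python
import numpy as np

def classify_block(block):
    b = block.astype(float)
    # Total absolute difference when stepping along each candidate orientation
    d_h = np.abs(b[:, 1:] - b[:, :-1]).sum()       # along horizontal
    d_v = np.abs(b[1:, :] - b[:-1, :]).sum()       # along vertical
    d_d1 = np.abs(b[1:, 1:] - b[:-1, :-1]).sum()   # along 135-degree diagonal
    d_d2 = np.abs(b[1:, :-1] - b[:-1, 1:]).sum()   # along 45-degree diagonal
    feats = {"horizontal": d_h, "vertical": d_v,
             "diag135": d_d1, "diag45": d_d2}
    # Dominant orientation = direction of least pixel variation
    return min(feats, key=feats.get)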
A distributed video coding (DVC) system based on wavelet transform and set partition coding (SPC) is presented
in this paper. Conventionally the significance map (sig-map) of SPC is not conducive to Slepian-Wolf
(SW) coding, because of the difficulty of generating a side information sig-map and the sensitivity to decoding
errors. The proposed DVC system utilizes a higher structured significance map, named progressive significance
map (prog-sig-map), which structures the significance information into two parts: a high-level summation significance
map (sum-sig-map) and a low-level complementary significance map (comp-sig-map). This prog-sig-map
alleviates the above difficulties and thus makes part of the prog-sig-map (specifically, the fixed-length-coded
comp-sig-map) suitable for SW coding. Simulation results are provided showing the improved rate-distortion
performance of the DVC system even with a simple system configuration.
A new side information generation algorithm using dynamic motion estimation and post processing is proposed for
improved distributed video coding. Multiple reference frames are employed for motion estimation at the side
information frame generation block of the decoder. After motion estimation and compensation, post-processing is applied to repair holes and overlapped areas in the reconstructed side information frame. The proposed side information method contributes to improving the quality of reconstructed frames at the distributed video decoder. The
average encoding time of the distributed video coding is around 15% of H.264 inter coding and 40% of H.264 intra
coding. The proposed side information based distributed video coding demonstrates improved performance compared
with that of H.264 intra coding.
Image interpolation is one of the most elementary imaging research topics. A number of image interpolation methods
have been developed for uncompressed images in the literature. However, a lot of videos have already been stored in
MPEG-2 format or have to be transmitted in MPEG-2 format due to bandwidth limitation. The image interpolation
methods developed for uncompressed images may not be effective when directly applied to compressed videos because, on the one hand, they do not utilize the information present in the coded bitstreams; on the other hand, they do not consider quantization error, which may be dominant in some cases. Inspired by the success of the intra prediction in
H.264/AVC and the edge-directed image interpolation methods (such as LAZA and NEDI), we propose a directional
frame interpolation for MPEG compressed video. In the proposed method, 8×8 intra blocks in I frames are first classified into nine block directions in the transform domain. Then the interpolation on each block is performed along its block
direction. For each block direction, an optimal Wiener filter is trained based on the representative video sequences and
then used for its interpolation. In a similar way, for each pixel in an inter block in P or B frames, the interpolation is
performed along the direction of its corresponding reference block. The experimental results demonstrate that the
proposed method achieves better performance than traditional linear methods such as bicubic and bilinear interpolation and edge-directed methods such as LAZA and NEDI, while keeping the computational complexity low enough to meet the requirements of practical applications.
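Training one interpolation filter per direction class reduces to a least-squares problem over training examples; a generic sketch (the variable names and neighborhood layout are illustrative):

```python
import numpy as np

def train_wiener_filter(neighborhoods, targets):
    """Least-squares (Wiener) filter: each row of `neighborhoods` holds the
    known pixels around one position to interpolate, and `targets` holds the
    true value of that position in the representative training sequences."""
    A = np.asarray(neighborhoods, dtype=float)
    y = np.asarray(targets, dtype=float)
    taps, *_ = np.linalg.lstsq(A, y, rcond=None)
    return taps

def apply_filter(neighborhood, taps):
    # Interpolate one pixel as a weighted sum of its neighborhood
    return float(np.dot(neighborhood, taps))
```

At decode time, the block's direction selects which trained tap set is applied.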
To achieve higher coding performance than previous video coding standards, high efficiency video coding (HEVC) adopts an angular intra prediction method, which incurs heavy computational complexity due to the increased number of intra prediction modes. In this paper, we propose a fast intra prediction mode decision based on estimating the rate-distortion cost with the Hadamard transform, which reduces the number of candidate intra prediction modes and enables early termination of the decision on whether the current coding unit is split. The experimental results show that the proposed method reduces the computational complexity of intra prediction in HEVC and achieves coding performance similar to that of the HEVC test model 2.1.
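The Hadamard-based cost used in such fast mode decisions is typically the sum of absolute transformed differences (SATD); a 4×4 sketch of that cost measure:

```python
import numpy as np

def hadamard4():
    # 4x4 Hadamard matrix built from the 2x2 kernel
    h2 = np.array([[1, 1], [1, -1]])
    return np.kron(h2, h2)

def satd(orig, pred):
    # Sum of absolute transformed differences for one 4x4 block:
    # a cheap proxy for the rate-distortion cost of a prediction mode.
    diff = np.asarray(orig, dtype=float) - np.asarray(pred, dtype=float)
    h = hadamard4()
    return np.abs(h @ diff @ h.T).sum()
```

Modes whose SATD is far above the current best candidate's can be skipped before the expensive full rate-distortion evaluation.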
Improvements in scanning technologies have led to the acquisition and management of range image databases. Hence, the need to describe and index this type of data arises. Since a range model has different properties compared to complete 3D models, we propose a method that relies on the Spherical Harmonics Transform (SHT) for retrieving similar models, where the query and the database both consist of only range models. Although SHT is not a new concept in shape retrieval research, we propose to utilize it for range images by representing the models in a world seen from the camera. The difference and advantage of our algorithm is that it is information-lossless: the available shape information is completely included in computing the descriptor, whereas other mesh retrieval applications utilizing SHT approximate the shape, which leads to information loss. The descriptor is invariant to scale and to rotations about the z-axis. The proposed method is tested on a large database with high diversity. Its performance is superior to that of the D2 distribution.
Multiview video in "texture-plus-depth" format enables the decoder to synthesize freely chosen intermediate views for enhanced visual experience. Nevertheless, transmission of multiple texture and depth maps over bandwidth-constrained and loss-prone networks is challenging, especially for conferencing applications with stringent deadlines.
In this paper, we examine the problem of loss-resilient coding of depth maps by exploiting two observations.
First, different depth macroblocks have significantly different error sensitivities with respect to the reconstructed
images. Second, unlike texture, the relative overhead of using reference pictures with large prediction distance is
low for depth maps. This motivates our approach of assigning a weight to represent the varying error sensitivity
of each macroblock and using such weights to guide the selection of reference frames. Results show that (1) errors in depth maps in sequences with high motion yield a significant drop in quality of the reconstructed images, and (2) the proposed scheme can efficiently maintain the quality of reconstructed images even at relatively high packet loss rates of 3-5%.
The increasing popularity of 3D TV creates the desire for more 3D video content. Unfortunately, it will take much time
for there to be an abundance of 3D video content derived from stereoscopic cameras. However, there currently exists a
vast quantity of 2D video material that can potentially be converted to 3D. Converting 2D into 3D is a complex process,
and so can be costly. Thus, an automated solution that can be achieved with low complexity would be desirable. Our past research work has already resulted in a real-time 2D-to-3D conversion technique, but this generates a surrogate depth map that results in pseudo-3D and not necessarily accurate 3D. Our current research focuses on improving the accuracy of the 3D effect by implementing a multi-step process to determine the depth order of objects, with respect to the camera, in each frame of a video sequence, and incorporating it into our existing technique.
The multi-step process can be summarized as follows: detect pixels that belong to an edge; use block-based motion
estimation to determine if an edge pixel is moving and thus belongs to a moving edge (i.e., occlusion boundary);
determine whether the left or the right side block moves with the moving edge pixel, and by deduction identify the
occluding object; select seed points from the moving edge pixels; implement color-only region growing from each seed;
cluster regions into objects based on their proximity; globally assign depth-order to the objects based on perceived
viewing perspective of a frame; and modify the original surrogate depth map to create a more accurate depth map. Test
results show that this is a very effective and fast technique for deriving the depth-order of objects and generating more
accurate depth map values.
Bit allocation is a key issue in image/video coding. An optimal bit allocation can improve the encoding performance, which means maximizing the image/video quality under a bit rate constraint or, vice versa, minimizing the bit rate under a quality constraint. We suggest that the bit allocation for macroblocks (MBs) can be optimized by aiming at constant perceptual quality (CPQ) inside an image or a frame. Based on the MINMAX criterion, we propose a multi-pass block-layer bit allocation scheme for intra frame encoding, in which all the local areas in a frame get approximately the same perceptual quality by choosing the quantization parameter (QP) for each MB. The experimental results show that the proposed method noticeably improves the encoding performance.
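A toy version of the MINMAX-style multi-pass idea (the balancing heuristic and the distortion callback are hypothetical, not the paper's exact algorithm): repeatedly lower the QP of the worst-quality macroblock and raise the QP of the best one until perceptual quality is roughly equalized.

```python
def equalize_quality(distortion, qps, qp_min=0, qp_max=51, passes=50):
    """distortion(i, qp): perceptual distortion of macroblock i at that QP.
    Nudge all MBs toward the same perceptual quality (MINMAX heuristic)."""
    qps = list(qps)
    for _ in range(passes):
        d = [distortion(i, q) for i, q in enumerate(qps)]
        worst = d.index(max(d))
        best = d.index(min(d))
        if worst == best or max(d) - min(d) < 1e-9:
            break
        if qps[worst] > qp_min:
            qps[worst] -= 1   # spend more bits on the worst block
        if qps[best] < qp_max:
            qps[best] += 1    # recover bits from the best block
    return qps
```

A rate-control loop would wrap this with a check that the total bit budget is still met.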
Data-dependent filtering methods are powerful techniques for image denoising. Beginning with any base procedure
(nonlinear filter), repeated applications of the same process can be interpreted as a discrete version of
anisotropic diffusion. As such, a natural question is "What is the best stopping time in iterative data-dependent
filtering?" This is the general question we address in this paper. To develop our new method, we estimate the
mean-squared-error (MSE) in each image patch. This estimate is used to characterize the effectiveness of the
iterative filtering process, and its minimization yields the ideal stopping time for the diffusion process.
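The stopping-time search itself can be sketched generically; here the per-patch MSE estimator is abstracted as a callback (the paper estimates MSE without the clean image, so the oracle used in the usage below is purely illustrative):

```python
import numpy as np

def best_stopping_time(noisy, filter_step, mse_estimate, max_iters=30):
    """Iterate a base filter; return the iterate with the lowest estimated
    MSE, that MSE, and the iteration index at which it occurred."""
    x = np.array(noisy, dtype=float)
    best_x, best_mse, best_t = x.copy(), mse_estimate(x), 0
    for t in range(1, max_iters + 1):
        x = filter_step(x)          # one diffusion / filtering step
        m = mse_estimate(x)
        if m < best_mse:
            best_x, best_mse, best_t = x.copy(), m, t
    return best_x, best_mse, best_t
```

With a simple box-blur as the base procedure and a noisy 1D signal, the returned iterate is never worse than the input under the chosen MSE estimate.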
A viewer's visual attention during video playback is the matching of his eye gaze movement to the changing
video content over time. If the gaze movement matches the video content (e.g., follow a rolling soccer ball),
then the viewer keeps his visual attention. If the gaze location moves from one video object to another, then
the viewer shifts his visual attention. A video that causes a viewer to shift his attention often is a "busy" video.
Determination of which video content is busy is an important practical problem; a busy video is difficult for an encoder to deploy region of interest (ROI)-based bit allocation on, and hard for a content provider to insert additional overlays into, such as advertisements, which make the video even busier. One way to determine the busyness of video content is to conduct eye gaze experiments with a sizable group of test subjects, but this is time-consuming and cost-ineffective. In this paper, we propose an alternative method to determine the busyness of a video, formally called video attention deviation (VAD): analyze the spatial visual saliency maps of the video frames across time. We first derive transition probabilities of a Markov model for eye gaze using saliency maps of a number of consecutive frames. We then compute the steady state probability of the saccade state in the model, which is our estimate of VAD.
We demonstrate that the computed steady state probability for saccade using saliency map analysis matches
that computed using actual gaze traces for a range of videos with different degrees of busyness. Further, our
analysis can also be used to segment video into shorter clips of different degrees of busyness by computing the
Kullback-Leibler divergence using consecutive motion compensated saliency maps.
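For a two-state (track/saccade) chain, the steady-state saccade probability has a closed form; a sketch with hypothetical transition probabilities:

```python
def saccade_steady_state(p_track_to_saccade, p_saccade_to_track):
    """Stationary probability of the saccade state in a two-state Markov
    chain, i.e. the VAD estimate in this simplified two-state setting."""
    p_ts, p_st = p_track_to_saccade, p_saccade_to_track
    # Solving pi_s = (1 - pi_s) * p_ts + pi_s * (1 - p_st) gives:
    return p_ts / (p_ts + p_st)
```

For example, transition probabilities of 0.1 (track to saccade) and 0.3 (saccade to track) give a steady-state saccade probability of 0.25, i.e. a moderately busy video under this measure.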
Given a blurred image of a known test grid and an accurate estimate of the unblurred image, it has been demonstrated
that the underlying blur kernel (or point-spread function, PSF) can be reliably estimated. Unfortunately,
the estimate of the sharp image can be sensitive to common imperfections in the setup used to obtain the blurred
image, and errors in the image estimate result in an unreliable PSF estimate.
We propose a robust ad-hoc method to estimate a sharp prior image, given a blurry, noisy image of the test
grid from Joshi1 taken in imperfect lab and lighting conditions. The proposed algorithm is able to reliably reject
superfluous image content, can deal with spatially-varying lighting, and is insensitive to errors in alignment of
the grid with the image plane.
We demonstrate the algorithm's performance through simulation, and with a set of test images. We also show
that our grid registration algorithm leads to improved PSF estimation and deblurring, compared to an affine
registration using spatially invariant lighting correction.
In this paper, we present a new scalable segmentation algorithm called JHMS (Joint Hierarchical and Multiresolution
Segmentation) that is characterized by region-based hierarchy and resolution scalability. Most of the
proposed algorithms either apply a multiresolution segmentation or a hierarchical segmentation. The proposed
approach combines both multiresolution and hierarchical segmentation processes. Indeed, the image is considered
as a set of images at different levels of resolution, where at each level a hierarchical segmentation is performed.
Multiresolution implies that a segmentation at a given level is reused in further segmentation processes operated at subsequent levels, so as to ensure contour consistency between different resolutions. Each level of resolution provides
a Region Adjacency Graph (RAG) that describes the neighborhood relationships between regions within
a given level of the multiresolution representation. Region label consistency is preserved thanks to a dedicated
projection algorithm based on inter-level relationships. Moreover, a preprocessing step based on a quadtree partitioning
reduces the amount of input data thus leading to a lower overall complexity of the segmentation framework.
Experiments show that we obtain effective results when compared to the state of the art, together with a lower overall complexity.
LED-backlit LCD displays hold the promise of improving image quality while reducing energy consumption via signal-dependent local dimming. To fully realize this potential, we propose a novel local dimming
technique that jointly optimizes the intensities of LED backlights and the attenuations of LCD pixels. The
objective is to minimize the distortion in luminance reproduction due to the leakage of LCD and the coarse
granularity of the LED lights. The optimization problem is formulated as one of linear programming, and both
exact and approximate algorithms are proposed. Simulation results demonstrate superior performances of the
proposed algorithms over the existing local dimming algorithms.
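As a rough illustration of the optimization (a greedy approximation for exposition, not the paper's linear-programming formulation), one can raise LED intensities until the required backlight luminance is covered at every pixel:

```python
import numpy as np

def dim_leds_approx(psf, required, max_iters=100):
    """Greedy local dimming sketch. psf[i, j] is the contribution of LED j
    to pixel i; required[i] is the minimum backlight needed at pixel i.
    Returns LED intensities in [0, 1]."""
    n_led = psf.shape[1]
    b = np.zeros(n_led)
    for _ in range(max_iters):
        deficit = required - psf @ b
        i = int(np.argmax(deficit))          # worst under-lit pixel
        if deficit[i] <= 1e-9:
            break                            # coverage achieved everywhere
        j = int(np.argmax(psf[i]))           # LED contributing most to it
        b[j] = min(1.0, b[j] + deficit[i] / psf[i, j])
    return b
```

Minimizing leakage and luminance error jointly, as in the paper, would replace this greedy loop with an exact LP solve over the same variables.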
In the context of immersive communications, we propose a method enabling natural video interactions through hand
gesture recognition between users and a video meeting system. The interaction can be performed either by means of hand posture recognition or by dynamic hand gesture recognition, according to the user's preference. The statistical
approach adopted in our work to recognize hand posture has shown accurate results for both performance evaluation and
user test. Besides, the combination of data-mining fields and signal processing for dynamic gestures recognition allows
us to define the appropriate rules and to reduce the confusion between gestures. Furthermore, the hand region extraction
is based on both skin color and background subtraction to avoid the detection of static objects that have a similar skin
color. Finally, the collected user feedback allows us to evaluate our approach from the user's point of view and to define the limitations that will be discussed in our perspectives in order to improve the results.
Underwater survey videos of the seafloor are usually plagued with heavy vignetting (radial falloff) outside of
the light source beam footprint on the seabed. In this paper we propose a novel multi-frame approach for
removing this vignetting phenomenon which involves estimating the light source footprint on the seafloor, and
the parameters for our proposed vignetting model. This estimation is accomplished in a Bayesian framework with
an iterative SVD-based optimization. Within the footprint, we leave the image contents as is, whereas outside
this region, we perform vignetting correction. Our approach does not require images with different exposure
values or recovery of the camera response function, and is entirely based on the attenuation experienced by
point correspondences across multiple frames. We verify our algorithm with both synthetic and real data, and
then compare it with an existing technique. Results obtained show a significant improvement in the fidelity of the corrected imagery.
The image processing pipeline of a traditional digital camera is often limited by processing power. A better
image quality could be generated only if more complexity was allowed. In a raw data workflow most algorithms
are executed off-camera. This allows the use of more sophisticated algorithms for increasing image quality
while reducing camera complexity. However, this requires a major change in the processing pipeline: a lossy
compression of raw camera images might be used early in the pipeline. Subsequent off-camera algorithms then
need to work on modified data. We analyzed this problem for the interpolation of defect pixels. We found
that a lossy raw compression spreads the error from uncompensated defects over many pixels. This leads to a
problem as this larger error cannot be compensated after compression. The use of high quality, high complexity
algorithms in the camera is also not an option. We propose a solution to this problem: Inside the camera only a
simple and low complexity defect pixel interpolation is used. This significantly reduces the compression error for
neighbors of defects. We then perform a lossy raw compression and compensate for defects afterwards. The high
complexity defect pixel interpolation can be used off-camera. This leads to a high image quality while keeping
the camera complexity low.
In this paper we address the problem of cubic panorama image dataset compression. Two state-of-the-art approaches, namely H.264/MPEG-4 AVC and the Dirac video codec, are used and compared for the application of
virtual navigation in image based representations of real world environments. Different prediction structures and
Group Of Pictures (GOP) sizes are investigated and compared on this new type of visual data. Based on the
obtained results, as well as the requirements of the system, an efficient prediction structure and bitstream syntax
are proposed. The concept of epipolar geometry is introduced and a method to facilitate efficient disparity
estimation is suggested.
A novel context template design method is presented for lossless compression of halftone images. In each pixel traversal,
the proposed method modifies context template according to inter-pixel correlation. Then, each pixel is arithmetic coded
by using the updated context template. Owing to its adaptation to local pixel correlation, the proposed design scheme outperforms standard JBIG arithmetic coding by 29% in bit savings.
This paper builds on prior work for player detection, and proposes an efficient and effective method to distinguish among players based on the numbers printed on their jerseys. To extract the numbers, the dominant colors of the jersey are learnt during an initial phase and used to speed up the segmentation of the candidate digit regions. An additional set of criteria, considering the relative position and size of the digit (compared to the player bounding box) and its density (compared to the digit's rectangular support), is used to filter out the regions that obviously do not correspond to a digit. Once the plausible digit regions have been extracted, their recognition is based on feature-based classification. A number of original features are proposed to increase the robustness against digit appearance changes resulting from font thickness variability and from deformations of the jersey during the game. Finally, the efficiency and the effectiveness of the proposed method are demonstrated on a real-life basketball dataset. The results show that the proposed segmentation runs about ten times faster than the mean-shift algorithm, and that the proposed additional features significantly increase the digit recognition accuracy. Despite significant deformations, 40% of the samples that can be visually recognized as digits are correctly classified as numbers. Of these classified samples, more than 80% are correctly recognized. Besides, more than 95% of the samples that are not numbers are correctly identified as non-numbers.
The present study is focused on the problem of quality-driven cross-layer optimization of Direct Sequence Code
Division Multiple Access (DS-CDMA) Wireless Visual Sensor Networks (WVSNs). We consider a centralized
topology where each sensor transmits directly to a Centralized Control Unit (CCU), which manages the network
resources. In real environments, the visual sensors view and transmit scenes with varying amount of motion.
Thus, each recorded video has its individual motion characteristics. Our aim is to enable the CCU to jointly
allocate the transmission power and source-channel coding rates for each WVSN node under certain quality-
driven criteria and constant chip rate. We consider two approaches for the cross-layer optimization scheme.
In the first, the optimal set of network resources is assigned to each node according to its individual motion
characteristics. In the second approach, the nodes are partitioned into clusters according to the amount of motion
in the recorded scenes. Then, all nodes within a cluster are assigned identical network resources. Both approaches
result in mixed-integer optimization problems, which are solved with the Particle Swarm Optimization algorithm.
Experimental results demonstrate the quality/complexity trade-off for the two approaches.
Surveillance applications usually require high levels of video quality, resulting in high power consumption. The
existence of a well-behaved scheme to balance video quality and power consumption is crucial for the system's
performance. In the present work, we adopt the game-theoretic approach of Kalai-Smorodinsky Bargaining
Solution (KSBS) to deal with the problem of optimal resource allocation in a multi-node wireless visual sensor
network (VSN). In our setting, the Direct Sequence Code Division Multiple Access (DS-CDMA) method is
used for channel access, while a cross-layer optimization design, which employs a central processing server,
accounts for the overall system efficacy through all network layers. The task assigned to the central server is
the communication with the nodes and the joint determination of their transmission parameters. The KSBS
is applied to non-convex utility spaces, efficiently distributing the source coding rate, channel coding rate and
transmission powers among the nodes. In the underlying model, the transmission powers assume continuous
values, whereas the source and channel coding rates can take only discrete values. Experimental results are
reported and discussed to demonstrate the merits of KSBS over competing policies.
The research described in this paper uses the CMA-ES evolution strategy to optimize matched forward and inverse
transform pairs for the compression and reconstruction of images transmitted from Mars rovers under conditions subject
to quantization error. Our best transforms outperform the 2/6 wavelet (whose integer variant was used onboard the
rovers), substantially reducing error in reconstructed images without allowing increases in compressed file size. This
result establishes a new state-of-the-art for the lossy compression of images transmitted over the deep-space channel.
There is a world-wide effort to apply 21st century intelligence to evolving our transportation networks. The goals of
smart transportation networks are quite noble and manifold, including safety, efficiency, law enforcement, energy
conservation, and emission reduction. Computer vision is playing a key role in this transportation evolution. Video
imaging scientists are providing intelligent sensing and processing technologies for a wide variety of applications and
services. There are many interesting technical challenges including imaging under a variety of environmental and
illumination conditions, data overload, recognition and tracking of objects at high speed, distributed network sensing and
processing, energy sources, as well as legal concerns. This conference presentation and publication is a brief introduction to the field, and will be followed by an in-depth journal paper that provides more details on the imaging systems and applications.
In-car navigation systems have grown in complexity over the recent years, most notably in terms of route
calculation, usability and graphical rendering. In order to guarantee correct system behavior, navigation systems
need to be tested under real operating conditions, i.e. with field-tests on the road. In this paper, we will focus
on a fast compression solution for 2D navigation renderings, so that field-tests can be archived and handed over
to software engineers for subsequent evaluation. No parameters from the rendering procedure are available since
access to the system is limited to the raw display signal. Rotation is a dominant factor throughout all navigation
sequences, so we show how to reconstruct rotational motion parameters with high accuracy and develop a Global
Motion Estimation (GME) method as support for a subsequent H.264/AVC video encoder. By integrating rate-distortion optimization concepts into our scheme, we can efficiently omit the segmentation of static and non-static
areas. The runtime of the compression solution, which achieves bitrate savings of up to 19.5%, is evaluated both
on a laptop CPU and an embedded OMAP4430 system on chip.
The availability of large-scale databases containing street-level panoramic images offers the possibility to perform
semi-automatic surveying of real-world objects such as traffic signs. These inventories can be performed
significantly more efficiently than using conventional methods. Governmental agencies are interested in these
inventories for maintenance and safety reasons. This paper introduces a complete semi-automatic traffic sign inventory
system. The system consists of several components. First, a detection algorithm locates the 2D position
of the traffic signs in the panoramic images. Second, a classification algorithm is used to identify the traffic sign.
Third, the 3D position of the traffic sign is calculated using the GPS position of the photographs. Finally, the
results are listed in a table for quick inspection and are also visualized in a web browser.
Automatic license plate recognition (ALPR) is an important capability for traffic surveillance applications, including toll
monitoring and detection of different types of traffic violations. ALPR is a multi-stage process comprising plate
localization, character segmentation, optical character recognition (OCR), and identification of originating jurisdiction
(i.e. state or province). Training of an ALPR system for a new jurisdiction typically involves gathering vast amounts of
license plate images and associated ground truth data, followed by iterative tuning and optimization of the ALPR
algorithms. The substantial time and effort required to train and optimize the ALPR system can result in excessive
operational cost and overhead. In this paper we propose a framework to create an artificial set of license plate images for
accelerated training and optimization of ALPR algorithms. The framework comprises two steps: the synthesis of license
plate images according to the design and layout for a jurisdiction of interest; and the modeling of imaging
transformations and distortions typically encountered in the image capture process. Distortion parameters are estimated
by measurements of real plate images. The simulation methodology is successfully demonstrated for training of OCR.
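The distortion-modeling step can be sketched as a simple separable blur plus additive sensor noise (the specific kernel and noise model here are illustrative assumptions, not the measured parameters of the paper):

```python
import numpy as np

def degrade_plate(img, rng, blur_radius=1, noise_sigma=5.0):
    """Apply a separable box blur and additive Gaussian noise to a clean,
    synthetically rendered plate image (uint8 grayscale)."""
    k = np.ones(2 * blur_radius + 1) / (2 * blur_radius + 1)
    out = img.astype(float)
    # Horizontal then vertical pass of the box kernel
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, out)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)
    out += rng.normal(0.0, noise_sigma, out.shape)
    return np.clip(out, 0, 255).astype(np.uint8)
```

A training pipeline would render clean plates for the target jurisdiction's layout and pass them through a chain of such degradations before feeding the OCR trainer.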
In this paper, we present a novel video markup language for articulating semantic traffic data from surveillance cameras
and other sensors. The markup language includes three layers: sensor descriptions, traffic measurement, and application
interface descriptions. The multi-resolution based video codec algorithm enables quality-of-service-aware video streaming according to the data traffic. A set of object detection APIs is developed using Convex Hull and Adaptive Proportion models and 3D modeling. It is found that our approach outperforms 3D modeling and Scale-Invariant Feature Transform (SIFT) algorithms in terms of robustness. Furthermore, our empirical data shows that it is
feasible to use TCML to facilitate the real-time communication between an infrastructure and a vehicle for safer and
more efficient traffic control.
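The abstract names a Convex Hull model among its detection APIs. A standard way to compute the hull of an object's foreground points is Andrew's monotone chain, sketched below; this is a generic textbook algorithm, not the paper's specific API.

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                       # build lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):             # build upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]      # endpoints shared, drop duplicates
```

Given the foreground pixels of a detected object, the hull gives a compact outline whose shape statistics (area, aspect) can feed a model such as the Adaptive Proportion test mentioned above.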
The Digital Imaging and Remote Sensing Laboratory (DIRS) at the Rochester Institute of Technology, along
with the Savannah River National Laboratory is investigating passive methods to quantify vehicle loading.
The research described in this paper investigates multiple vehicle indicators including brake temperature, tire
temperature, engine temperature, acceleration and deceleration rates, engine acoustics, suspension response, tire
deformation and vibrational response. Our investigation into these variables includes building and implementing a
sensing system for data collection as well as multiple full-scale vehicle tests. The sensing system includes infrared
video cameras, triaxial accelerometers, microphones, video cameras, and thermocouples. The full-scale testing
includes both a medium-size dump truck and a tractor-trailer truck on closed courses with loads spanning the
full range of the vehicle's capacity. Statistical analysis of the collected data is used to determine the effectiveness
of each of the indicators for characterizing the weight of a vehicle. The final sensing system will monitor multiple
load indicators and combine the results to achieve a more accurate measurement than any of the indicators could
provide individually.
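One textbook way to combine independent noisy indicators so that the fused estimate is more accurate than any single one is inverse-variance weighting. This is a generic sketch of the idea, not the statistical method used in the paper.

```python
def fuse_estimates(estimates, variances):
    """Fuse independent load estimates by inverse-variance weighting.

    The fused variance 1 / sum(1/var_i) is always <= min(var_i),
    i.e. the combination beats every individual indicator.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fused = sum(w * x for w, x in zip(weights, estimates)) / total
    fused_var = 1.0 / total
    return fused, fused_var
```

For example, fusing a brake-temperature-based estimate with variance 4.0 and a suspension-based estimate with variance 1.0 yields a combined estimate with variance 0.8, tighter than either input.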
Machine learning methods have been successfully applied to image object classification problems where there is clear
distinction between classes and where a comprehensive set of training samples and ground truth are readily available.
The transportation domain is an area where machine learning methods are particularly applicable, since the classification
problems typically have well defined class boundaries and, due to high traffic volumes in most applications, massive
roadway data is available. Though these classes tend to be well defined, the particular image noises and variations can be
challenging. Another challenge is the extremely high accuracy required in most traffic applications, since incorrect
assignment of fines or tolls due to imaging mistakes is not acceptable. For the front-seat vehicle
occupancy detection problem, classification amounts to determining whether one face (driver only) or two faces (driver
+ passenger) are detected in the front seat of a vehicle on a roadway. For automatic license plate recognition, the
classification problem is a type of optical character recognition problem encompassing multi-class classification. The
SNoW machine learning classifier using local SMQT features is shown to be successful in these two transportation
applications.
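The local SMQT features come from the Successive Mean Quantization Transform, which recursively splits pixel values around the local mean to build an L-bit code per pixel that is insensitive to illumination gain and bias. A minimal sketch of the transform (function and parameter names are ours):

```python
def smqt(values, levels):
    """Successive Mean Quantization Transform of a 1-D patch.

    At each level the current group is split around its mean; elements
    above the mean get a 1 bit, the rest a 0 bit, and both halves recurse.
    The result is an illumination-normalized `levels`-bit code per element.
    """
    codes = [0] * len(values)

    def split(idx, depth):
        if depth == levels or not idx:
            return
        mean = sum(values[i] for i in idx) / len(idx)
        hi = [i for i in idx if values[i] > mean]
        lo = [i for i in idx if values[i] <= mean]
        for i in hi:
            codes[i] |= 1 << (levels - 1 - depth)
        split(lo, depth + 1)
        split(hi, depth + 1)

    split(list(range(len(values))), 0)
    return codes
```

Because only orderings relative to local means matter, scaling or shifting all pixel intensities leaves the codes unchanged, which is what makes SMQT features attractive for faces seen through windshields under varying lighting.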
Automotive Active Safety (AAS) is a main branch of intelligent automobile research, and pedestrian detection is a key
problem in AAS because it is closely tied to the casualties in most vehicle accidents. For on-board pedestrian detection
algorithms, the main problem is to balance efficiency and accuracy so that the on-board system works in real scenes,
so an on-board pedestrian detection and warning system, whose algorithm accounts for the features of side pedestrians,
is proposed. The system includes two modules: pedestrian detection and warning. A Haar feature with a cascade of stage
classifiers trained by AdaBoost is first applied, and then an HOG feature and an SVM classifier are used to refine false
positives. To make these time-consuming algorithms available for real-time use, a divide-window method together with an
operator context scanning (OCS) method is applied to increase efficiency. By merging the velocity information of the
vehicle, the distance of the detected pedestrian is also obtained, so the system can judge whether there is a potential
danger for the pedestrian ahead. On a new dataset captured in urban environments with side pedestrians on zebra
crossings, the embedded system and its algorithm achieve real-time on-board performance on side pedestrian detection.
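The two-stage strategy above (a cheap cascade first, an expensive HOG + SVM refinement only on survivors) can be sketched generically. Here `fast_score` and `slow_score` stand in for the Haar cascade and SVM stages, and the thresholds are illustrative, not the paper's values.

```python
def two_stage_detect(windows, fast_score, slow_score,
                     t_fast=0.5, t_slow=0.8):
    """Two-stage detection over candidate windows.

    Stage 1: a cheap cascade-style test rejects most windows early,
    which is what makes real-time on-board use feasible.
    Stage 2: the expensive classifier (HOG + SVM in the paper) runs
    only on the survivors to prune false positives.
    """
    candidates = [w for w in windows if fast_score(w) >= t_fast]
    return [w for w in candidates if slow_score(w) >= t_slow]
```

The divide-window idea fits naturally here: the image is partitioned so that stage 1 touches every window cheaply, while stage 2's cost is proportional only to the small candidate set.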