In our previous studies, vibrations of vehicle surfaces caused by operating engines, measured with a Laser Doppler Vibrometer (LDV), have been effectively exploited to classify vehicles of different types, e.g., vans, 2-door sedans, 4-door sedans, trucks, and buses, as well as different types of engines, such as inline-four engines, V-6 engines, 1-axle diesel engines, and 2-axle diesel engines. These results were achieved with an array of machine learning classifiers, including AdaBoost, random forests, neural networks, and support vector machines. To achieve effective classification performance, we seek a more reliable way to pick up authentic engine vibrations from a trustworthy surface. Compared with vibrations taken directly from uncooperative vehicle surfaces, which are rigidly connected to the engines, the vibrations of such planted surfaces are much weaker in magnitude. In this work we conducted a systematic study on different types of objects: we tested several types of engines, including those of electric shavers, electric fans, and coffee machines, against different surfaces, such as a whiteboard, a cement wall, and a steel case, to investigate the characteristics of the LDV signals from these surfaces in both the time and spectral domains. Preliminary engine classification results using several machine learning algorithms point in the right direction on the choice of object surfaces to be planted for LDV measurements.
Recently, Laser Doppler Vibrometry (LDV) has been widely employed for long-range sensing in military applications, owing to its high spatial and spectral resolutions in vibration measurements, which facilitate effective analysis using signal processing and machine learning techniques. Through the collaboration of The City College of New York and the Air Force Research Laboratory over the last several years, we have developed a bank of algorithms to classify different types of vehicles, such as sedans, vans, pickups, motorcycles and buses, and to identify various kinds of engines, such as inline-4, V6, and 1- and 2-axle truck engines. Thanks to the similarity of LDV signals to acoustic and other time-series signals, a large body of existing approaches in the literature has been employed, including speech coding, time-series representation, Fourier analysis, pyramid analysis, support vector machines, random forests, neural networks, and deep learning algorithms. We have found that the classification results based on some of these methods are extremely promising. For instance, our vehicle engine classification algorithm, based on pyramid Fourier analysis of the engine vibrations and the fundamental frequencies of vehicle surfaces, has consistently attained 96% precision on the data collected by our LDV in the summer of 2014. In laboratory studies or well-controlled environments, vehicle owners permit a great array of high-quality LDV measurement points all over the vehicles, so extensive classifier training can be conducted to effectively capture the innate properties of surfaces in the spatial and spectral domains. However, in real contested environments, which are of the utmost interest and practical importance to military applications, uncooperative vehicles are either fast moving or purposely concealed, so few high-quality LDV measurements can be made.
In this work, an intensive study is performed to compare vehicle classification performance from LDV measurements under cooperative and uncooperative environments, based on a content-based indexing approach. The method uses an iterative Fourier analysis and an artificial feed-forward neural network. As our empirical studies suggest, even in uncooperative and contested environments, our classification approach can still yield promising recognition rates given an adequate training dataset of similar vehicles.
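The pipeline sketched below illustrates the general idea behind this line of work: spectral features extracted from a vibration signal are fed to a small feed-forward neural network. It is a minimal sketch on synthetic "engine" signals; the sampling rate, class fundamentals, and network size are invented for the illustration and are not the values used in our experiments.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

FS = 1000  # sampling rate in Hz; illustrative value only

def spectral_features(x, n_bins=64):
    """Normalized magnitude spectrum of a vibration snippet, pooled into coarse bins."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    mag = mag[: (len(mag) // n_bins) * n_bins].reshape(n_bins, -1).mean(axis=1)
    return mag / (mag.sum() + 1e-12)

def synth_engine(f0, rng, n=2048):
    """Toy 'engine' vibration: a fundamental with decaying harmonics plus noise."""
    t = np.arange(n) / FS
    x = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in (1, 2, 3))
    return x + 0.3 * rng.standard_normal(n)

rng = np.random.default_rng(42)
X, y = [], []
for label, f0 in enumerate((30.0, 45.0)):      # two hypothetical engine classes
    for _ in range(60):
        X.append(spectral_features(synth_engine(f0 + rng.normal(0, 0.5), rng)))
        y.append(label)
Xtr, Xte, ytr, yte = train_test_split(np.array(X), np.array(y),
                                      test_size=0.25, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                    random_state=0).fit(Xtr, ytr)
acc = clf.score(Xte, yte)
```

Because the two toy classes differ in fundamental frequency, their harmonic energy falls into different spectral bins, which is what makes such spectral features discriminative.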
Moving vehicle detection and classification using multimodal data is a challenging task in data collection, audio-visual
alignment, data labeling and feature selection under uncontrolled environments with occlusions, motion blurs, varying
image resolutions and perspective distortions. In this work, we propose an effective multimodal temporal panorama (MTP)
approach for the task, using a novel long-range audio-visual sensing system. A new audio-visual vehicle (AVV) dataset
for moving vehicle detection and classification is created, which features automatic vehicle detection and audio-visual
alignment, accurate vehicle extraction and reconstruction, and efficient data labeling. In particular, vehicles' visual
images are reconstructed once detected in order to remove most of the occlusions, motion blurs, and variations of
perspective views. Multimodal audio-visual features are extracted, including global geometric features (aspect ratios,
profiles), local structure features (HOGs), as well as various audio features (MFCCs, etc.). Using SVMs with radial basis function kernels, the effectiveness of integrating these multimodal features is thoroughly and systematically studied. The concept of the MTP is not limited to the visual, motion, and audio modalities; it could also apply to other sensing modalities that can obtain data in the temporal domain.
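As a rough illustration of the audio side of such a feature set, the sketch below computes minimal MFCC-style features (power spectrum, mel filterbank, log, DCT) and feeds them to an SVM with a radial basis function kernel. The synthetic "vehicle" sounds, sampling rate, and filterbank sizes are assumptions made for the example, not the AVV dataset settings.

```python
import numpy as np
from scipy.fftpack import dct
from sklearn.svm import SVC

FS = 8000  # sampling rate in Hz; illustrative, not the actual recording setup

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, fs=FS, n_filters=20, n_ceps=12):
    """Minimal MFCC of one frame: power spectrum -> mel filterbank -> log -> DCT."""
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2), n_filters + 2))
    fbank = np.zeros((n_filters, len(power)))
    for i in range(n_filters):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rise = (freqs - lo) / (ctr - lo)     # triangular filter, rising edge
        fall = (hi - freqs) / (hi - ctr)     # falling edge
        fbank[i] = np.clip(np.minimum(rise, fall), 0.0, None)
    return dct(np.log(fbank @ power + 1e-10), norm="ortho")[:n_ceps]

def synth_vehicle(f0, rng, n=1024):
    """Toy vehicle sound: a harmonic stack at firing frequency f0 plus noise."""
    t = np.arange(n) / FS
    x = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in (1, 2, 3, 4))
    return x + 0.2 * rng.standard_normal(n)

rng = np.random.default_rng(7)
X, y = [], []
for label, f0 in enumerate((100.0, 180.0)):   # two hypothetical vehicle classes
    for _ in range(50):
        X.append(mfcc(synth_vehicle(f0 + rng.normal(0, 2.0), rng)))
        y.append(label)
X, y = np.array(X), np.array(y)
clf = SVC(kernel="rbf").fit(X[::2], y[::2])   # even rows train, odd rows test
acc = clf.score(X[1::2], y[1::2])
```

In a real system these audio features would be concatenated with the geometric and HOG features before classification; the sketch keeps only the audio branch for brevity.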
In both military and civilian applications, abundant data from diverse sources captured on airborne platforms are often
available for a region attracting interest. Since the data often includes motion imagery streams collected from multiple
platforms flying at different altitudes, with sensors of different fields of view (FOVs), resolutions, frame rates and
spectral bands, it is imperative that a cohesive site model encompassing all the information can be quickly built and
presented to the analysts. In this paper, we propose to develop an Uncertainty Preserving Patch-based Online Modeling
System (UPPOMS) leading towards the automatic creation and updating of a cohesive, geo-registered, uncertainty-preserving,
efficient 3D site terrain model from passive imagery with varying field-of-views and phenomenologies. The
proposed UPPOMS has the following technical thrusts that differentiate our approach from others: (1) An uncertainty-preserved,
patch-based 3D model is generated, which enables the integration of images captured with a mixture of narrow-field-of-view (NFOV) and wide-field-of-view (WFOV) and/or visible and infrared motion imagery sensors. (2) Patch-based stereo matching and multi-view
3D integration are utilized, which are suitable for scenes with many low texture regions, particularly in mid-wave
infrared images. (3) In contrast to the conventional volumetric algorithms, whose computational and storage costs grow
exponentially with the amount of input data and the scale of the scene, the proposed UPPOMS system employs an online
algorithmic pipeline and scales well to large amounts of input data. Experimental results and discussions of future work
will be provided.
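To make the contrast with batch volumetric methods concrete, the following is a minimal sketch of one way an online, uncertainty-preserving patch update could work: each patch keeps a running mean and variance of its height observations (Welford's online algorithm), so per-frame cost and total memory do not grow with the amount of input imagery. The PatchModel class and its interface are hypothetical, not the actual UPPOMS design.

```python
import numpy as np

class PatchModel:
    """Online per-patch elevation fusion that preserves uncertainty.

    Each terrain patch keeps a running mean and variance of the height
    observations seen so far (Welford's online algorithm), so the model is
    updated one image at a time and memory stays O(number of patches),
    independent of how many frames have been processed.
    """

    def __init__(self, n_patches):
        self.count = np.zeros(n_patches)
        self.mean = np.zeros(n_patches)
        self.m2 = np.zeros(n_patches)  # running sum of squared deviations

    def update(self, patch_ids, heights):
        """Fold one frame's height observations into the model."""
        for i, h in zip(patch_ids, heights):
            self.count[i] += 1
            delta = h - self.mean[i]
            self.mean[i] += delta / self.count[i]
            self.m2[i] += delta * (h - self.mean[i])

    def variance(self):
        """Per-patch sample variance; infinite where fewer than 2 observations."""
        out = np.full_like(self.mean, np.inf)
        seen = self.count > 1
        out[seen] = self.m2[seen] / (self.count[seen] - 1)
        return out

# two 'frames' observing patch 0; patch 1 is observed once, patch 2 never
pm = PatchModel(3)
pm.update([0, 0, 1], [1.0, 2.0, 5.0])
pm.update([0, 0], [3.0, 4.0])
```

A patch seen by only one sensor keeps an infinite (unknown) variance rather than a false certainty, which is the sense in which uncertainty is preserved rather than discarded.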
Associating audio events with video events captured from a large distance is a challenge for the typical camera-microphone approach: setting up a long-range microphone array and geo-calibrating both the audio and video sensors is difficult. In this work, in addition to a geo-calibrated electro-optical camera, we propose to use a novel optical sensor, a Laser Doppler Vibrometer (LDV), for real-time audio sensing, which allows us to capture acoustic signals from a large distance and to use the same geo-calibration for both the camera and the audio (via the LDV). We have promising preliminary results on associating the audio recording of speech with the video of the corresponding speaker.
Circular aerial video provides a persistent view over a scene and generates a large amount of imagery, much of which is
redundant. The interesting features of the scene are the 3D structural data, moving objects, and scenery changes. Mosaic-based
scene representations work well in detecting and modeling these features while greatly reducing the amount of
storage required to store a scene. In the past, mosaic-based methods have worked well for video sequences with straight
camera paths in a dominant motion direction [11]. Here we expand on this method to handle circular camera motion. By
using a polar transformation about the center of the scene, we are able to transform circular motion into an approximate
linear motion. This allows us to employ proven 3D reconstruction and moving object detection methods that we have
previously developed. Once features are found, they only need to be transformed back to the Cartesian space from the
polar coordinate system.
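The polar transformation at the heart of this method can be sketched as a simple resampling: pixels are read along circles about the scene center, so rotation about that center becomes a shift along the angle axis. The nearest-neighbour version below is only a toy illustration, not the implementation used in the system.

```python
import numpy as np

def polar_warp(img, center, n_theta=360, n_r=None):
    """Resample an image into (angle, radius) coordinates about `center`,
    so that motion along circles around the scene center becomes an
    approximately linear shift along the angle (row) axis."""
    h, w = img.shape[:2]
    cy, cx = center
    if n_r is None:
        n_r = int(min(cy, cx, h - cy, w - cx))
    thetas = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    radii = np.arange(n_r)
    tt, rr = np.meshgrid(thetas, radii, indexing="ij")
    ys = np.clip(np.round(cy + rr * np.sin(tt)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rr * np.cos(tt)).astype(int), 0, w - 1)
    return img[ys, xs]   # shape (n_theta, n_r)

# tiny demo: bright pixels at radius 5, at angles 0 and 90 degrees
img = np.zeros((64, 64))
img[32, 37] = 1.0          # theta = 0
img[37, 32] = 2.0          # theta = pi/2
pol = polar_warp(img, (32, 32))
```

Once features are detected in the (angle, radius) image, the inverse mapping returns them to Cartesian coordinates, as described above.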
For evaluating the contents of trucks, containers, cargo, and passenger vehicles with a non-intrusive gamma-ray or X-ray imaging system to determine the possible presence of contraband, three-dimensional (3D) measurements can provide more information than 2D measurements. In this paper, a linear pushbroom scanning model is built for such a commonly used gamma-ray or X-ray cargo inspection system. Accurate 3D measurements of the objects inside a cargo container can be obtained by using two such scanning systems with different scanning angles to construct a pushbroom stereo system. A simple but robust calibration method is proposed to find the important parameters of the linear pushbroom sensors. Then, a fast and automated stereo matching algorithm based on free-form deformable registration is developed to obtain 3D measurements of the objects under inspection. A user interface is designed for 3D visualization of the objects of interest. Experimental results of sensor calibration, stereo matching, 3D measurement, and visualization of a 3D cargo container and the objects inside are presented.
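As a toy model of why two scanning angles yield depth, consider a point at height z above the conveyor: under a linear pushbroom scan tilted by angle theta it is displaced by z*tan(theta) along the scan direction, so the disparity between the two scans is z*(tan(theta1) - tan(theta2)). The function below inverts this simplified relation; the angles and pixel size are invented for the example, and the real system recovers its parameters through the calibration method described above.

```python
import math

def height_from_disparity(d_pix, pix_size, theta1_deg, theta2_deg):
    """Toy height recovery for two linear pushbroom scans with different
    scanning angles: disparity d = z * (tan(theta1) - tan(theta2)),
    so z = d / (tan(theta1) - tan(theta2)). Simplified model only."""
    d = d_pix * pix_size  # convert pixel disparity to metric units
    return d / (math.tan(math.radians(theta1_deg)) -
                math.tan(math.radians(theta2_deg)))

# round trip: a 0.5 m tall object seen at +15 and -15 degree scan angles
true_z = 0.5
disparity_m = true_z * (math.tan(math.radians(15)) - math.tan(math.radians(-15)))
z_hat = height_from_disparity(disparity_m, pix_size=1.0,
                              theta1_deg=15, theta2_deg=-15)
```

The wider the angular separation between the two scans, the larger the disparity per unit height, and hence the better the height resolution for a given matching accuracy.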
We propose a content-based 3D mosaic (CB3M) representation for long video sequences of 3D and dynamic scenes captured by a camera on a mobile platform. The camera motion has a dominant direction (as on an airplane or ground vehicle), but 6 DOF motion is allowed. In the first step, a set of parallel-perspective (pushbroom) mosaics with varying viewing directions is generated to capture both the 3D and dynamic aspects of the scene under the camera coverage. In the second step, a segmentation-based stereo matching algorithm is applied to extract parametric representations of the color, structure and motion of the dynamic and/or 3D objects in urban scenes, where many planar surfaces exist. Multiple pairs of stereo mosaics are used to facilitate reliable stereo matching, occlusion handling, accurate 3D reconstruction and robust moving target detection. We use the fact that all static objects obey the epipolar geometry of pushbroom stereo, whereas an independently moving object either violates the epipolar geometry (if its motion is not in the direction of sensor motion) or exhibits unusual 3D structure. The CB3M is a highly compressed visual representation for a very long video sequence of a dynamic 3D scene. More importantly, the CB3M representation captures object content in both 3D structure and motion. Experimental results are given for CB3M construction on both simulated and real video sequences to show the accuracy and effectiveness of the representation.
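The epipolar test for moving targets can be illustrated with a toy check: with sensor motion along the x axis, the epipolar lines of parallel-perspective stereo are horizontal, so a static point's match should have near-zero vertical disparity. The helper below is hypothetical and covers only this one cue; its tolerance value is invented for the example.

```python
def flag_movers(matches, tol=1.5):
    """Toy epipolar-violation check for pushbroom stereo with sensor motion
    along x: a match ((x1, y1), (x2, y2)) of a static point should have
    |y2 - y1| near zero; a large vertical disparity flags a candidate mover."""
    return [(p, q) for p, q in matches if abs(q[1] - p[1]) > tol]

matches = [((10, 40), (18, 40)),     # static: disparity purely along x
           ((50, 12), (55, 20)),     # mover: large vertical disparity
           ((30, 60), (33, 60.5))]   # static: within tolerance
movers = flag_movers(matches)
```

Movers travelling along the sensor motion direction pass this check and must instead be caught by the unusual-3D-structure cue mentioned above.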
A growing number of law enforcement applications, especially in the areas of border security, drug enforcement and anti-terrorism, require high-resolution wide-area surveillance from unmanned air vehicles. At the University of Massachusetts we are developing an aerial reconnaissance system capable of generating high-resolution, geographically registered terrain models (in the form of a seamless mosaic) in real time from a single down-looking digital video camera. The efficiency of the processing algorithms, as well as the simplicity of the hardware, will provide the user with the ability to produce and roam through stereoscopic geo-referenced mosaic images in real time, and to automatically generate highly accurate 3D terrain models offline in a fraction of the time currently required by conventional softcopy photogrammetry systems. The system is organized around a set of integrated sensor and software components. The instrumentation package comprises several inexpensive commercial-off-the-shelf components, including a digital video camera, a differential GPS, and a 3-axis heading and reference system. At the heart of the system is a set of software tools for image registration, mosaic generation, geo-location and aircraft state vector recovery. Each process is designed to efficiently handle the data collected by the instrument package. Particular attention is given to minimizing geospatial errors at each stage, as well as to modeling the propagation of errors through the system. Preliminary results for an urban and a forested scene are discussed in detail.
This paper presents a fractal-based method for natural-scene image segmentation. The main goal is to find man-made (artifact) objects in complex natural scenes. We propose a set of fractal measurements to capture various aspects of the roughness of each part of an image. The performance of the data fitting in the box-dimension estimation is analyzed, and an improved algorithm is proposed. Experiments show that the proposed approach is suitable for texture segmentation and for finding artifact objects in natural-environment images.
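The box-dimension estimation step can be sketched as the classic box-counting fit: count the boxes of side s that touch the foreground, then fit log N(s) against log(1/s). The minimal version below uses an ordinary least-squares fit and arbitrary box sizes; the paper's contribution is an improved fitting algorithm, which is not reproduced here.

```python
import numpy as np

def box_count_dimension(mask, sizes=(2, 4, 8, 16, 32)):
    """Estimate the box-counting (fractal) dimension of a binary image:
    for each box side s, count boxes containing any foreground pixel,
    then take the slope of log N(s) vs log(1/s) (the 'data fitting' step)."""
    counts = []
    for s in sizes:
        h = (mask.shape[0] // s) * s
        w = (mask.shape[1] // s) * s
        blocks = mask[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(blocks.any(axis=(1, 3)).sum())
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

square = np.ones((128, 128), dtype=bool)   # a filled region: dimension ~2
line = np.zeros((128, 128), dtype=bool)
line[0, :] = True                          # a straight line: dimension ~1
d_square = box_count_dimension(square)
d_line = box_count_dimension(line)
```

Smooth man-made surfaces tend toward low, stable box dimensions while natural textures such as foliage give higher values, which is the intuition behind using roughness measurements for artifact detection.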
Correlation of images has seldom been used in vision-based navigation. In this paper we present a novel method to estimate the orientation of a vehicle relative to the roadway from an image sequence. A parallel correlation algorithm detects the difference between the current view of the roadway and the next view, so that the orientation of the vehicle can be estimated reliably in real time. To account for image variation caused by the 3D dynamic environment and the perspective effect of the camera system, a weighted correlation method has been developed based on a planar-motion assumption and a reprojection transformation. The method has been implemented on PIPE, a pipelined image processing system, and tested on our campus roadway in combination with an optical flow method for detecting moving objects. The advantage of the method is that the motion parameters can be extracted reliably without prerequisites on the image sequence, and no special road model is needed, as the method adapts itself to rather complicated situations with other objects sharing the environment.
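A minimal stand-in for the correlation step is shown below: the lateral offset between two roadway image profiles is found by maximizing normalized correlation over candidate shifts. This plain Python loop only illustrates the idea; it is not the parallel PIPE implementation, and the weighting by the reprojection transformation is omitted.

```python
import numpy as np

def best_shift(ref, cur, max_shift=20):
    """Estimate the lateral offset between two road-image profiles by
    normalized correlation over candidate shifts (a sequential stand-in
    for a parallel correlation search)."""
    ref = (ref - ref.mean()) / (ref.std() + 1e-12)
    best, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        cand = np.roll(cur, s)
        cand = (cand - cand.mean()) / (cand.std() + 1e-12)
        score = float(np.dot(ref, cand)) / len(ref)   # normalized correlation
        if score > best_score:
            best, best_score = s, score
    return best

# demo: the current profile is the reference shifted left by 7 pixels
rng = np.random.default_rng(1)
ref = rng.standard_normal(256)
shift = best_shift(ref, np.roll(ref, -7))
```

The recovered shift between successive views, combined with the planar-motion assumption, is what allows the vehicle's orientation relative to the roadway to be estimated.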