This PDF file contains the front matter associated with SPIE Proceedings Volume 10645 including the Title Page, Copyright information, Table of Contents, Introduction, and Conference Committee listing.
The electrical power consumption of an area is indicative of industrial activity at high power-consumption facilities. Research on detection and characterization of loads on AC lines has shown that magnetic field sensors at a fixed point can be used to determine the electrical current flowing through a line as a means of determining power consumption versus time. The relative frequency harmonic power and phase content of the current flow can distinguish between types of electrical loads (i.e., resistive or inductive) and changes in those loads. Coupled with knowledge of the line geometry (the conductors and their cross-sectional locations), we can model the geographic distribution of the fields from a given current source. Experimentally, we collect multi-axis magnetic field data from moving vehicles, with GPS time and location data for the recordings. The collection regions include rural, interstate, and suburban areas with both overhead and buried power lines contributing to the signals. We analyze the data using ArcGIS to visualize the geospatial content and to compare the power levels qualitatively and quantitatively to data layers such as the area land use. We examine the 60 Hz fundamental frequency as well as harmonic and non-harmonic signals, and compare the results to 2-D and 3-D modeling tools using known power line conductors. We discuss the effects of the time-varying presence of vehicles in modifying the detected signals, as well as changes in the spectral information over time.
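As a concrete illustration of the kind of field modeling this abstract describes, the sketch below superposes the fields of infinitely long straight conductors using the standard result B = μ0 I / (2π r). The three-phase conductor geometry, currents, and sensor position are hypothetical, not taken from the paper.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability (T*m/A)

def b_field_infinite_line(obs_xy, line_xy, current_a):
    """Magnetic flux density (T) at a 2-D observation point from an
    infinitely long straight conductor perpendicular to the x-y plane.
    |B| = mu0*I / (2*pi*r), direction tangential to the radial vector."""
    r_vec = np.asarray(obs_xy, float) - np.asarray(line_xy, float)
    r = np.linalg.norm(r_vec)
    mag = MU0 * current_a / (2.0 * np.pi * r)
    # Tangential unit vector (radial vector rotated by +90 degrees).
    t_hat = np.array([-r_vec[1], r_vec[0]]) / r
    return mag * t_hat

# Hypothetical three-phase line: conductors 1 m apart, 8 m above ground,
# carrying 100 A with 120-degree phase offsets (instantaneous snapshot).
conductors = [(-1.0, 8.0), (0.0, 8.0), (1.0, 8.0)]
currents = [100 * np.cos(np.deg2rad(p)) for p in (0, 120, 240)]
obs = (5.0, 1.0)  # sensor 5 m off-axis, 1 m above ground
b_total = sum(b_field_infinite_line(obs, c, i)
              for c, i in zip(conductors, currents))
print(f"|B| = {np.linalg.norm(b_total) * 1e9:.1f} nT")
```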
One challenging problem in many remote sensing applications is identifying building footprints in 2D and/or 3D imagery. Existing solutions to this problem use a variety of sensing modalities as input. Recent public challenges have yielded high-quality building footprint detection algorithms using high-resolution 2D and 3D imaging modalities as input. However, the performance of many of these algorithms typically degrades as the fidelity and post spacing of the input imagery are reduced. Other challenges have used lower-resolution 2D satellite imagery alone. The United States Special Operations Command (USSOCOM) sponsored a public prize challenge aimed at identifying building footprints using 2D RGB orthorectified imagery and coincident 3D Digital Surface Models (DSMs) created from commercial satellite imagery. The top 6 winning solutions have been made publicly available as open source software. This paper summarizes the public challenge and provides results and data analysis. In addition, we provide lessons learned, and we hope to encourage additional research by publicly releasing the benchmark dataset to the community.
Monitoring the level and type of power consumption in an area over a period of time by mapping field strengths provides detailed information about human activities and the presence of facilities. Power grid usage traditionally has been collected for subsequent viewing at a few fixed locations. Transitioning from collect-and-view to real-time geospatial analytics over continuous spatial coverage requires making more extensive use of moving sensors. Unmanned airborne systems (UAS) provide mobility in three dimensions, but also present noise issues and severe weight constraints. We discuss our work with collection of multi-axis magnetic and electric field data from a quadcopter UAS. We model the physical collection geometries of the sensors as well as the sensor electronics in order to discuss the performance trade-offs. We collect electromagnetic data at several heights above the ground plane of the target sites and calculate the fundamental and harmonic frequency power of the data as well as the self-sense noise. We analyze the data using ArcGIS to visualize power “hot spots” for different combinations of power spectral data and compare the results to 3-D modeling tools to estimate the magnetic and electric field strengths. We discuss the results of our experiments in using the UAS to perform advanced processing on board in real time in order to initiate cross-cueing of sensors.
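The fundamental and harmonic power computation can be illustrated with a short FFT-based sketch; the sampling rate, band width, and synthetic test signal below are illustrative assumptions, not parameters from the collection.

```python
import numpy as np

def harmonic_power(signal, fs, f0=60.0, n_harmonics=5, bw=2.0):
    """Estimate the power at the fundamental f0 and its harmonics from a
    uniformly sampled time series (e.g., one magnetometer axis)."""
    n = len(signal)
    # Hann window to reduce spectral leakage, then power spectrum.
    spec = np.abs(np.fft.rfft(signal * np.hanning(n)))**2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    powers = {}
    for k in range(1, n_harmonics + 1):
        band = (freqs > k * f0 - bw) & (freqs < k * f0 + bw)
        powers[k * f0] = spec[band].sum()
    return powers

# Synthetic test: 60 Hz tone plus a weaker 180 Hz harmonic and noise.
fs = 2000.0
t = np.arange(0, 2.0, 1.0 / fs)
sig = (np.sin(2 * np.pi * 60 * t) + 0.2 * np.sin(2 * np.pi * 180 * t)
       + 0.05 * np.random.randn(len(t)))
print(harmonic_power(sig, fs))
```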
Transmission and analysis of imagery for law enforcement and military missions is often constrained by the capacity of available communications channels. Nevertheless, achieving success in operational missions requires acquisition and analysis of imagery that satisfies specific interpretability requirements. By expressing these requirements in terms of the National Imagery Interpretability Ratings Scale (NIIRS), we have developed a method for predicting the NIIRS loss associated with various methods and levels of imagery compression. Our method, known as the Compression Degradation Image Function Index (CoDIFI) framework, automatically predicts the NIIRS degradation associated with a specific image compression method and level of compression. In this paper, we first review NIIRS and methods for predicting it, then present the CoDIFI framework, with emphasis on the results of the empirical validation experiments. By leveraging CoDIFI in operational settings, our goal is to ensure mission success in terms of the NIIRS level of imagery data delivered to users, while optimizing the use of scarce data transmission capacity.
The use of LIDAR (Light Detection and Ranging) data for detailed terrain mapping and object recognition is becoming increasingly common. While the rendering of LIDAR imagery is expressive, there is a need for a comprehensive performance metric that captures the quality of a LIDAR image. A metric or scale for quantifying the interpretability of LIDAR point clouds would be extremely valuable to support image chain optimization, sensor design, tasking and collection management, and other operational needs. For many imaging modalities, including visible electro-optical (EO) imagery, thermal infrared, and synthetic aperture radar, the National Imagery Interpretability Ratings Scale (NIIRS) has been a useful standard. In this paper, we explore methods for developing a comparable metric for LIDAR. The approach leverages the General Image Quality Equation (GIQE) and constructs a LIDAR quality metric based on the empirical properties of the point cloud data. We present the rationale and the construction of the metric, illustrating its properties with both measured and synthetic data.
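For context, the EO NIIRS prediction that such a LIDAR metric would parallel can be written down directly. The sketch below implements the published GIQE version 4 form, where GSD is the geometric-mean ground sample distance in inches and the coefficients switch at RER = 0.9; a LIDAR analogue would substitute point cloud properties for these EO terms. The input values in the example are illustrative.

```python
import math

def giqe4_niirs(gsd_in, rer, h_overshoot, gain, snr):
    """NIIRS prediction from the General Image Quality Equation, v4.
    gsd_in: geometric-mean ground sample distance (inches);
    rer: geometric-mean relative edge response;
    h_overshoot: geometric-mean edge overshoot;
    gain, snr: noise gain and signal-to-noise ratio."""
    if rer >= 0.9:
        a, b = 3.32, 1.559
    else:
        a, b = 3.16, 2.817
    return (10.251 - a * math.log10(gsd_in) + b * math.log10(rer)
            - 0.656 * h_overshoot - 0.344 * (gain / snr))

# Illustrative inputs: ~0.3 m GSD imagery with a sharp, low-noise chain.
print(giqe4_niirs(gsd_in=12.0, rer=0.9, h_overshoot=1.0, gain=1.0, snr=50.0))
```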
Maritime collisions involving multiple ships are considered rare, but in 2017 several United States Navy vessels were involved in fatal at-sea collisions that resulted in the deaths of seventeen American Servicemembers. The experimentation introduced in this paper is a direct response to these incidents. We propose a shipboard collision-at-sea avoidance system, based on video image processing, that will help ensure the safe stationing and navigation of maritime vessels. Our system leverages a convolutional neural network trained on synthetic maritime imagery to detect nearby vessels within a scene, perform heading analysis of detected vessels, and provide an alert in the presence of an inbound vessel. Additionally, we present the Navigational Hazards - Synthetic (NAVHAZ-Synthetic) dataset. This dataset comprises one million annotated images of ten vessel classes observed from virtual vessel-mounted cameras, as well as a human “Topside Lookout” perspective. NAVHAZ-Synthetic includes imagery displaying varying sea states, lighting conditions, and optical degradations such as fog, sea spray, and salt accumulation. We present promising results on the use of synthetic imagery in a computer-vision-based collision-at-sea warning system.
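The paper's CNN and heading-analysis pipeline are not reproduced here, but the alerting logic can be sketched with the classic constant-bearing, decreasing-range rule applied to a detector's bounding-box track over time. The thresholds and the example track below are hypothetical.

```python
import numpy as np

def inbound_alert(track_boxes, grow_thresh=1.15, bearing_tol=0.1):
    """Simplified inbound-vessel heuristic on a track of bounding boxes
    (x, y, w, h) in normalized image coordinates: a box that grows while
    holding a nearly constant bearing suggests a vessel on a closing course."""
    boxes = np.asarray(track_boxes, float)
    areas = boxes[:, 2] * boxes[:, 3]
    centers_x = boxes[:, 0] + boxes[:, 2] / 2
    growing = areas[-1] > grow_thresh * areas[0]          # decreasing range
    constant_bearing = np.ptp(centers_x) < bearing_tol    # constant bearing
    return growing and constant_bearing

# A detected vessel whose box grows ~2x at nearly constant bearing.
track = [(0.48, 0.5, 0.10, 0.050), (0.49, 0.5, 0.11, 0.055),
         (0.48, 0.5, 0.12, 0.060), (0.49, 0.5, 0.14, 0.070)]
print("ALERT" if inbound_alert(track) else "clear")
```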
Target tracking derived from motion imagery enables automated activity analysis. In this paper, we develop methods for automatically exploiting the track data to detect and recognize activities, develop models of normal behavior, and detect departure from normalcy. We have developed methods for representing activities through syntactic analysis of the track data, by “tokenizing” the track, i.e. converting the kinematic information into strings of symbols amenable to further analysis. The syntactic analysis of target tracks is the foundation for constructing an expandable “dictionary of activities.” Through unsupervised learning on the syntactic representations, we discover the canonical activities in a corpus of motion imagery data. The probability distribution of the learned activities is the “dictionary”. Newly acquired track data is compared to the dictionary to flag atypical behaviors as departures from normalcy. We demonstrate the methods with relevant data.
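A minimal sketch of the "tokenizing" step: kinematic quantities are quantized into a small symbol alphabet, producing one token per track step that syntactic analysis can then consume. The speed bins, turn threshold, and alphabet here are illustrative choices, not the paper's.

```python
import numpy as np

SPEED_SYMBOLS = "SLMF"   # stopped, low, medium, fast
TURN_SYMBOLS = "lsr"     # left, straight, right

def tokenize_track(xy, dt=1.0, speed_bins=(0.5, 5.0, 15.0), turn_thresh=0.2):
    """Convert a kinematic track (N x 2 positions) into a symbol string:
    one (speed, turn) token per step, amenable to syntactic analysis."""
    xy = np.asarray(xy, float)
    v = np.diff(xy, axis=0) / dt
    speed = np.linalg.norm(v, axis=1)
    heading = np.arctan2(v[:, 1], v[:, 0])
    dtheta = np.diff(np.unwrap(heading), prepend=heading[0])
    tokens = []
    for s, dh in zip(speed, dtheta):
        si = int(np.digitize(s, speed_bins))
        ti = 0 if dh > turn_thresh else (2 if dh < -turn_thresh else 1)
        tokens.append(SPEED_SYMBOLS[si] + TURN_SYMBOLS[ti])
    return " ".join(tokens)

# Slow track that drifts into a left turn -> "Ls Ls Ll Ll".
track = [(0, 0), (1, 0), (2, 0.2), (3, 1.0), (3.5, 2.0)]
print(tokenize_track(track))
```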
Motion imagery can be used in some circumstances as a source of data for reconstructing three-dimensional (3D) representations of targets and objects in a scene. This research explores the utility of simulated motion imagery collections as input to a 3D target reconstruction algorithm based on structure-from-motion. The use of simulated video is advantageous for testing potential collections on targets or geographical areas for which real video and images may be unavailable. Examples are provided of tests of angular sampling and occlusion, degradation of input imagery through blurring, and measurement of their effects on 3D reconstruction quality.
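The blur-degradation test can be sketched as a simple sweep: each simulated frame is blurred before reconstruction, and model quality is scored per blur level. The Gaussian blur model, sigma values, and random stand-in frames below are illustrative; a real run would render video and execute the structure-from-motion pipeline at the marked step.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade_frames(frames, blur_sigma):
    """Apply Gaussian blur to each video frame before feeding it to the
    3D reconstruction pipeline, to measure the effect on model quality."""
    return [gaussian_filter(f, sigma=blur_sigma) for f in frames]

# Synthetic stand-in frames; a real test would use rendered video.
frames = [np.random.rand(480, 640) for _ in range(3)]
for sigma in (0.5, 1.0, 2.0, 4.0):
    blurred = degrade_frames(frames, sigma)
    # ...run structure-from-motion on `blurred` and score the 3D model here...
    print(f"sigma={sigma}: frame std {np.std(blurred[0]):.3f}")
```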
Interest in computer vision has grown at a profound rate in recent years. This interest has brought numerous state-of-the-art approaches to all aspects of computer vision, from object detection and recognition in still imagery to action recognition and navigation in driverless vehicles. However, when applied to DoD-relevant real-world data, these approaches struggle to produce the quality of results seen on academic datasets. To deal with real-world data, video quality assessment algorithms are often used to understand the difficulty of a particular dataset; they may provide guidance for algorithm selection, bandwidth requirements, and other information pertinent to the automatic analysis of imagery and video. In this work we study the aggregation of motion estimation on video frames and image quality metrics on still frames for the automatic assessment of video quality. We study several state-of-the-art optical flow algorithms as well as commonly used image quality algorithms, and methods to combine the two to form an aggregate video quality score. We test our approach on real-world videos and compare the combined results to the original scores to study the efficacy of our approach.
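One simple way to realize such an aggregate score is sketched below: per-frame sharpness (variance of the Laplacian) is blended with a Farneback optical-flow motion statistic. This is a toy aggregation, not the scoring method studied in the paper; the tanh scaling constants, the blend weight alpha, and the file name example.mp4 are all placeholder assumptions.

```python
import numpy as np
import cv2

def video_quality_score(path, alpha=0.5, max_frames=200):
    """Toy aggregate score: per-frame sharpness blended with
    frame-to-frame optical-flow magnitude (Farneback)."""
    cap = cv2.VideoCapture(path)
    prev_gray, sharp, flow_mag = None, [], []
    while len(sharp) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        sharp.append(cv2.Laplacian(gray, cv2.CV_64F).var())
        if prev_gray is not None:
            flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            flow_mag.append(np.linalg.norm(flow, axis=2).mean())
        prev_gray = gray
    cap.release()
    # Crude normalization of each cue to [0, 1] before blending.
    s = np.tanh(np.mean(sharp) / 1000.0)
    m = np.tanh(np.mean(flow_mag) / 10.0) if flow_mag else 0.0
    return alpha * s + (1 - alpha) * (1 - m)  # heavy motion lowers the score

print(video_quality_score("example.mp4"))  # placeholder path
```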
Multi-object tracking is one of the most challenging problems in computer vision due to computational cost, partial or full occlusions, and crowded scenes. It has many real-life uses, from surveillance to video analysis and video summarization. In this paper, we propose a hybrid tracking-by-detection system that combines local and global data association schemes to ensure efficiency and reduce complexity. In local data association, spatial and appearance modules are used to make a first-step assignment for the strongest object matches. Tracklet linking is then applied during the global data association step, after filtering out all unreliable and distractor hypotheses using spatial, temporal, and appearance descriptors. Our framework can handle the appearance of new objects, temporary disappearances, object terminations, and object occlusions. Our experiments on the MOT16 dataset,1 which consists of challenging real-world videos, show that the integration of local and global data association is important and yields promising performance.
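A minimal sketch of the local (first-step) association: detections are assigned to existing tracks one-to-one by maximizing total spatial overlap with the Hungarian algorithm. The appearance and temporal cues the paper combines are omitted here, and the IoU gate of 0.3 is an illustrative choice.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(tracks, detections, iou_min=0.3):
    """Local data association: one-to-one matching of track boxes to
    detection boxes maximizing total IoU (Hungarian algorithm), gated
    so that weak overlaps are left unmatched."""
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_min]

tracks = [(10, 10, 50, 50), (100, 100, 140, 160)]
dets = [(105, 98, 142, 162), (12, 9, 52, 48)]
print(associate(tracks, dets))  # -> [(0, 1), (1, 0)]
```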
This paper presents recommended principles and processes for the Quality Assurance (QA) and Quality Control (QC) of estimators and their outputs in Geolocation Systems. Relevant estimators include both batch estimators, such as Weighted Least Squares (WLS) estimators, and (near) real-time sequential estimators, such as Kalman filters. The estimators typically solve for (estimate) the value of a state vector X_true containing 3D geolocations and/or corrections to the sensor metadata corresponding to the measurements supplied to the estimator. Along with a best estimate X of X_true, the estimator outputs predicted accuracy, typically an error covariance matrix CovX corresponding to the error in the solution X. It is essential that the estimator output a reliable and near-optimal estimate X as well as a reliable error covariance matrix CovX. This paper presents various procedures, including detailed algorithms, to help ensure that this is the case, and, if it is not, to flag the problem along with supporting metrics. The majority of the QA/QC procedures involve data internal to the estimator, such as measurement residuals, and can be built into the estimator. Examples include measurement editing, solution convergence detection, and confidence interval tests based on the reference variance.
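As an example of a confidence interval test based on the reference variance, the sketch below implements the standard a posteriori variance factor check for a WLS adjustment with diagonal weights: the weighted sum of squared residuals divided by the degrees of freedom should be close to 1, and is tested against a two-sided chi-squared interval. The significance level and simulated residuals are illustrative, not the paper's algorithm.

```python
import numpy as np
from scipy.stats import chi2

def reference_variance_test(residuals, weights, n_params, alpha=0.05):
    """QA check for a WLS estimator: the a posteriori reference variance
    sigma0^2 = (v^T W v) / dof should be ~1 if the measurement weights
    and model are consistent (W diagonal here, weights = 1/sigma_i^2).
    Tested with a two-sided chi-squared confidence interval."""
    v = np.asarray(residuals, float)
    w = np.asarray(weights, float)
    dof = len(v) - n_params
    sigma0_sq = (v**2 * w).sum() / dof
    lo = chi2.ppf(alpha / 2, dof) / dof
    hi = chi2.ppf(1 - alpha / 2, dof) / dof
    return sigma0_sq, (lo <= sigma0_sq <= hi)

rng = np.random.default_rng(0)
res = rng.normal(scale=2.0, size=100)   # simulated measurement residuals
w = np.full(100, 1 / 4.0)               # consistent weights = 1/sigma^2
print(reference_variance_test(res, w, n_params=6))  # sigma0^2 near 1, passes
```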
Reconstruction of 3D objects from UAV EO imagery yields useful information, but can be time consuming and computationally expensive. View planning reduces processing time by selecting the optimal image set needed to reconstruct a scene. This paper demonstrates how view planning is used in a targeted manner to select a subset of images from a large existing image set to model specific vehicles or structures. Potential applications of the method include enabling 3D target classification algorithms and rapid geo-location. The method could also facilitate on-board reconstruction. The view planning algorithm is tested on five different targets, and is shown to reduce processing time for target models by up to a factor of 50 with little decrease in accuracy.
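The paper's view planner is not detailed in this abstract; one common formulation of the underlying image-subset selection problem is greedy set cover over an image-to-surface visibility matrix, sketched below with synthetic coverage data.

```python
import numpy as np

def greedy_view_planning(coverage, k):
    """Greedy subset selection: `coverage` is an (n_images x n_points)
    boolean matrix of which target surface points each image observes;
    pick up to k images, each maximizing newly covered points."""
    covered = np.zeros(coverage.shape[1], dtype=bool)
    chosen = []
    for _ in range(k):
        gains = (coverage & ~covered).sum(axis=1)
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break  # nothing new to cover
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered.mean()

rng = np.random.default_rng(1)
cov = rng.random((200, 1000)) < 0.05   # 200 candidate images, sparse coverage
views, frac = greedy_view_planning(cov, k=10)
print(views, f"{frac:.1%} of target covered")
```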
Bayesian state estimators, unlike maximum likelihood estimators, generate a state estimate and a probability density function (PDF) representing the predicted uncertainty of the estimate. While it is relatively straightforward to determine the accuracy of the state estimate, verifying the accuracy of the predicted uncertainty is more difficult, especially when the uncertainty is time-varying. In this work, we review two prior techniques that verify the predicted uncertainty of an estimator. We show that each technique verifies the accuracy of the estimator's uncertainty by checking whether normalized state estimates follow a chi-squared distribution. If these normalized samples do not follow the correct chi-squared distribution, one can conclude that the predicted uncertainty is unreliable. In this work, we propose to use goodness-of-fit tests to detect when normalized state estimates do not follow the correct distribution. Our results demonstrate that one of the prior techniques achieves superior performance if the true uncertainty is Gaussian. When the true uncertainty is non-Gaussian, however, our proposed goodness-of-fit method demonstrates higher discriminative power.
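The chi-squared consistency idea can be made concrete with the normalized estimation error squared (NEES) statistic: if the filter's predicted covariance is correct, each normalized error is chi-squared distributed with state-dimension degrees of freedom. The sketch below checks this with a Kolmogorov-Smirnov goodness-of-fit test; the abstract does not say which goodness-of-fit test the paper uses, so KS is an assumed example.

```python
import numpy as np
from scipy.stats import chi2, kstest

def nees(errors, covariances):
    """Normalized estimation error squared for each sample,
    e_k^T P_k^{-1} e_k, which should be chi-squared with dim(x)
    degrees of freedom if the predicted covariance P_k is consistent."""
    return np.array([e @ np.linalg.solve(P, e)
                     for e, P in zip(errors, covariances)])

def consistency_gof(errors, covariances):
    """Kolmogorov-Smirnov goodness-of-fit of the NEES samples against
    the chi-squared distribution with state-dimension dof."""
    d = errors.shape[1]
    return kstest(nees(errors, covariances), chi2(df=d).cdf)

rng = np.random.default_rng(2)
P = np.diag([1.0, 4.0])                                # claimed covariance
errs = rng.multivariate_normal([0, 0], P, size=500)    # consistent errors
print(consistency_gof(errs, [P] * 500))                # large p-value: passes
```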
Much progress has been made in recent years in almost every research area within computer vision. This has led to an increased interest in applying computer vision algorithms to real-world problems, such as robot navigation, driverless cars, and first-person video analysis. However, in each of these real-world applications, there are still significant challenges in processing degraded data, particularly when estimating motion from a single camera, which is commonly solved using optical flow. Previous studies have shown that state-of-the-art optical flow methods fail under realistic conditions of added noise, compression artifacts, and other types of degradations. In this paper we investigate strategies to improve the robustness of optical flow to these degradations by applying them as data augmentations during the training and fine-tuning stages of deep learning approaches to optical flow. We test these strategies using real and simulated data and aim to draw the community's attention to this important area of research.
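A minimal sketch of such a degradation augmentation: each training image pair is corrupted with additive Gaussian noise followed by JPEG compression artifacts, while the ground-truth flow is left untouched so the network learns invariance to the degradation. The noise level and JPEG quality below are illustrative, and OpenCV's codec round-trip stands in for whatever compression model a given training pipeline assumes.

```python
import numpy as np
import cv2

def degrade_pair(img1, img2, noise_sigma=5.0, jpeg_q=40):
    """Degradation augmentation for an optical-flow training pair:
    additive Gaussian noise, then a JPEG encode/decode round-trip.
    The ground-truth flow for the pair is unchanged."""
    out = []
    for img in (img1, img2):
        noisy = img.astype(np.float32) + np.random.randn(*img.shape) * noise_sigma
        noisy = np.clip(noisy, 0, 255).astype(np.uint8)
        ok, buf = cv2.imencode(".jpg", noisy,
                               [cv2.IMWRITE_JPEG_QUALITY, jpeg_q])
        out.append(cv2.imdecode(buf, cv2.IMREAD_UNCHANGED))
    return out

a = (np.random.rand(128, 128, 3) * 255).astype(np.uint8)
b = np.roll(a, 2, axis=1)       # synthetic pair with a known 2-pixel shift
a_deg, b_deg = degrade_pair(a, b)
```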
Imagery acquired by airborne sensors is used to address many different tasks in various fields of application. Many of those tasks require the imagery to be georeferenced, i.e., to provide a relation between the image coordinate of an image pixel and the real-world coordinate of the location on the earth's surface it represents. The georeference of airborne imagery is usually implemented via GPS and INS sensors on board the sensor platform, but potential problems, such as transmission problems, jamming, or temporary sensor malfunction, together with potentially poor knowledge of ground elevation, can render location information accuracy less than sufficient for a given task. We established an image registration workflow that can improve the georeference of an image in such cases by matching it with a reference image of satisfactory georeference accuracy, i.e., an image covering the same area at a similar resolution. This is achieved in four steps: an object extraction step is followed by a contour extraction step, then a contour point reduction step, and finally a contour matching step. This approach has proven to be both feasible and robust to appearance dissimilarity between the image and the reference image. As each step of the workflow has well-defined interfaces for both its input and output, we can easily exchange the methods implementing the operation to be performed in the respective step. This allows us to easily and efficiently evaluate different methods for these operations. The scope of this work was both the implementation of a new method for the Transformation Estimation step, namely the Downhill Simplex Method in Multidimensions, and the systematic analysis of the quantitative influence of different methods and their parametrizations for the Contour Point Number Reduction and Transformation Estimation steps on the accuracy of the georeference.
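The Downhill Simplex (Nelder-Mead) method for a Transformation Estimation step can be sketched as follows, here minimizing the mean squared distance between matched contour points under a four-parameter similarity transform. The transform parametrization, cost function, and synthetic contour points are illustrative assumptions; the paper's formulation may differ.

```python
import numpy as np
from scipy.optimize import minimize

def transform(params, pts):
    """Similarity transform: scale s, rotation theta, translation (tx, ty)."""
    s, th, tx, ty = params
    R = np.array([[np.cos(th), -np.sin(th)],
                  [np.sin(th),  np.cos(th)]])
    return s * pts @ R.T + [tx, ty]

def estimate_transform(src, dst, x0=(1.0, 0.0, 0.0, 0.0)):
    """Transformation estimation via the Downhill Simplex (Nelder-Mead)
    method: minimize the mean squared distance between matched contour
    points of the image and the reference image."""
    cost = lambda p: np.mean(np.sum((transform(p, src) - dst)**2, axis=1))
    return minimize(cost, x0, method="Nelder-Mead").x

rng = np.random.default_rng(3)
src = rng.random((40, 2)) * 100                  # reduced contour points
dst = transform([1.05, 0.03, 4.0, -2.0], src)    # known ground-truth mapping
print(estimate_transform(src, dst))              # ~ [1.05, 0.03, 4.0, -2.0]
```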
We describe the challenges and capabilities of implementing a fast, efficient georegistration system on low-power embedded systems. The inputs to the system are 2-D images and refined camera metadata from sensor measurements; the outputs are registered aerial images that can be used in image tracking and 3D reconstruction. Given initial speeds on embedded systems, we propose future application in real-time on-device processing. Because the method does not rely on image-to-image feature correspondences as other methods do, the computation is significantly faster, and it is improved further by GPU programming.
In this study, we devise a method for summarizing the geospatial content of aerial imagery using mosaicking. Mosaicking has been a popular method in computer vision for combining long sequences of images and providing a wide field of view. We propose a feature-based registration method that takes the complexity and homogeneous appearance of maize data into consideration and formulates a two-step registration scheme that reduces the error accumulation incurred by traditional frame-to-frame registration. Experimental results show good-quality mosaic generation for long aerial video sequences without any metadata available. This geospatial summarization of crop field data can be helpful for phenotypic analysis, monitoring of the field, tracking the movement and growth of plants, detecting changes in field dynamics, etc.
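The paper's two-step scheme is not spelled out in this abstract, but the building block it improves on, feature-based pairwise registration, can be sketched with standard OpenCV components; registering frames to a common anchor frame, as below, is one simple way to limit frame-to-frame error accumulation. ORB features, the 0.75 ratio test, and the 3-pixel RANSAC threshold are illustrative choices.

```python
import cv2
import numpy as np

def register_pair(img_ref, img_new, min_matches=12):
    """Feature-based registration of one frame to a reference (anchor)
    frame: ORB features, ratio-test matching, RANSAC homography.
    Returns the homography that warps img_new into the reference frame
    for mosaicking, or None if too few reliable matches are found."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img_ref, None)
    k2, d2 = orb.detectAndCompute(img_new, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    good = [m for m, n in matcher.knnMatch(d2, d1, k=2)
            if m.distance < 0.75 * n.distance]          # Lowe ratio test
    if len(good) < min_matches:
        return None
    src = np.float32([k2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H
```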
Local feature matching has been proven successful for computer vision tasks such as Structure-from-Motion (SfM) and 3D reconstruction. The reliability of features, in terms of being precisely detected and persistently matched along a sequence, can have a great impact on the quality of the SfM result and even on its convergence. Since many feature detectors and descriptors are designed exclusively for specific applications, it is important to find a feature detector-descriptor combination that performs well for SfM. In this paper we evaluate the quality of different image features such as FAST,1 SIFT,2 SURF,3 and BRISK4 and their effects on Structure-from-Motion performance. To this end, we design and perform two evaluation procedures to assess feature matching results on a wide area motion imagery dataset. A matching result is represented in the form of feature tracks, where a track is a collection of continuously matched feature points along the sequence. First, we use epipolar geometry to measure the error in each correspondence (matching pair): the distance from a matched feature point to the corresponding epipolar line is the error metric. Second, we compute optimized metadata from SfM using the feature matching tracks and then compare it with the ground truth metadata for evaluation. Experimental results demonstrate that the SURF detector combined with the SURF descriptor generates the longest feature tracks, while the FAST detector plus the SIFT descriptor produces the highest matching precision.
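The first error metric is straightforward to implement given a fundamental matrix F: each point x in one image defines an epipolar line l' = F x in the other, and the error is the point-to-line distance. The sketch below uses the fundamental matrix of a pure horizontal translation as a worked example; it is not tied to the paper's dataset.

```python
import numpy as np

def epipolar_distances(F, pts1, pts2):
    """Distance from each matched point in image 2 to the epipolar line
    F @ x1 induced by its partner in image 1 (the paper's first metric)."""
    x1 = np.column_stack([pts1, np.ones(len(pts1))])   # homogeneous coords
    x2 = np.column_stack([pts2, np.ones(len(pts2))])
    lines = x1 @ F.T                                   # row i is F @ x1_i
    num = np.abs(np.sum(lines * x2, axis=1))           # |x2^T F x1|
    den = np.hypot(lines[:, 0], lines[:, 1])           # ||(a, b)|| of line
    return num / den

# For a pure horizontal translation, corresponding points must lie on
# the same image row; row offsets show up directly as epipolar error.
F = np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]], float)
pts1 = np.array([[100.0, 50.0], [200.0, 80.0]])
pts2 = np.array([[120.0, 50.0], [215.0, 82.5]])
print(epipolar_distances(F, pts1, pts2))   # [0.0, 2.5] pixels
```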
This paper proposes an improved solution to image-based three-dimensional (3D) modeling (also known as “multi-view stereo”) that outputs surfaces visible in high-resolution wide-area format video, also known as wide-area motion imagery (WAMI), consisting of a dense set of small 3D points. The improved approach, named 3D patch-based multi-view stereo, is an extension of PMVS1 and is likewise implemented as a match, expand, and filter procedure. The approach takes a sequence of image frames and corresponding camera parameters together with a sparse set of matched feature points. As an initial step, it formulates a small 3D patch for each of the matched feature points. It then finds the best-fitting curved surface inside the 3D patch based on the photometric consistency of each 3D point inside it. Expansion and filtering procedures are then recursively applied to those initial surfaces until a certain percentage of image coverage is achieved. The proposed solution is able to precisely preserve small details and to automatically detect and discard outliers. Moreover, this approach does not require any initialization in the form of a visual hull, a bounding box, or valid depth ranges. We have tested our algorithm on various datasets, including a single object with fine surface details and an extremely large, occluded outdoor WAMI dataset, where moving or static obstacles appear in front of the static structures of interest and large areas of repetitive texture are present.
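The photometric consistency test at the heart of such patch-based methods is typically a normalized cross-correlation between a patch's projections into the images that see it; the sketch below shows that core scoring idea with synthetic patches. The 0.7 acceptance threshold is an illustrative value, and the full method's grid sampling of patch textures is omitted.

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Normalized cross-correlation between two sampled image patches:
    the photometric-consistency score used to accept or reject a 3D patch."""
    a = patch_a.ravel() - patch_a.mean()
    b = patch_b.ravel() - patch_b.mean()
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def consistent(patches, thresh=0.7):
    """A candidate 3D patch is kept only if its projections into the
    visible images agree photometrically with a reference view."""
    ref = patches[0]
    return all(ncc(ref, p) >= thresh for p in patches[1:])

rng = np.random.default_rng(4)
ref = rng.random((5, 5))
views = [ref, ref + 0.01 * rng.random((5, 5)), rng.random((5, 5))]
print([f"{ncc(ref, v):.2f}" for v in views[1:]])  # high, then near zero
print(consistent(views))   # False: the outlier view breaks consistency
```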
Designing a robust and accurate object tracker is important in many computer vision applications. The problem becomes more complicated when additional factors like changing appearance, illumination, and scale are introduced in the sequence. Recently, trackers based on the correlation filter method, like Sum of Template and Pixel-wise Learners (STAPLE),1 have shown state-of-the-art short-term tracking performance. STAPLE consists of two major modules: learning a correlation filter on HOG features and representing color information using an RGB histogram. In this paper, we propose an improved STAPLE (iSTAPLE) tracker by adding Color Names (CN)2 features to the correlation part of the tracker. CN complements the HOG features because using only HOG can lead to tracking failures in some cases where occlusion or deformation is present. Because color information can be a confusing factor and unreliable under rapid illumination changes, the Bhattacharyya distance is used to measure the color similarity between the target and the surrounding area to decide whether the color information is helpful. Since we use multiple feature cues to improve tracking performance, a robust approach to fusing the multiple features is required. To fully utilize all features and optimize the tracking result, numerous weight combinations assigned to each feature are tested. We show through comprehensive experiments on the VOT Challenge 2016 dataset3 that iSTAPLE obtains a gain of 25% in tracking robustness.
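The color-reliability gate can be sketched directly: compute the Bhattacharyya distance between the target's color histogram and that of its surrounding region, and trust the color cue only when the two distributions are sufficiently distinct. The sketch below uses the Hellinger form sqrt(1 - BC), as in OpenCV's histogram comparison; the 4-bin histograms and the 0.3 threshold are illustrative, not the paper's settings.

```python
import numpy as np

def bhattacharyya_distance(hist_p, hist_q):
    """Bhattacharyya distance (Hellinger form) between two color
    histograms; a small distance means similar color distributions."""
    p = np.asarray(hist_p, float); p /= p.sum()
    q = np.asarray(hist_q, float); q /= q.sum()
    bc = np.sum(np.sqrt(p * q))            # Bhattacharyya coefficient
    return np.sqrt(max(0.0, 1.0 - bc))

def color_is_reliable(target_hist, surround_hist, min_dist=0.3):
    """Gate the color cue: trust it only when the target's histogram is
    sufficiently distinct from the surrounding background's histogram."""
    return bhattacharyya_distance(target_hist, surround_hist) >= min_dist

target = np.array([40, 5, 3, 2])        # e.g., a 4-bin hue histogram
background = np.array([6, 30, 28, 20])
print(color_is_reliable(target, background))  # True: color cue is useful
```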