In this paper, we present a pipeline and prototype vision system for near-real-time semantic segmentation and classification of objects such as roads, buildings, and vehicles in large, high-resolution, wide-area, real-world aerial LiDAR point-cloud and RGBD imagery. Unlike previous works, which have focused on exploiting ground-based sensors or narrowed the scope to detecting the density of large objects, here we address the full semantic segmentation of aerial LiDAR and RGBD imagery by exploiting crowd-sourced labels that densely cover each image in the 2015 Dublin dataset. Our results indicate important improvements in detection and segmentation accuracy with the addition of aerial LiDAR over RGB imagery alone, which has important implications for civilian applications such as autonomous navigation and rescue operations. Moreover, the prototype system can segment and search geographic areas as large as 1 km² in a matter of seconds on commodity hardware with high accuracy (approximately 90%), suggesting the feasibility of real-time scene understanding on small aerial platforms.
We introduce a new approach for designing deep learning algorithms for computed tomography applications. Rather than training generically-structured neural network architectures to equivalently perform imaging tasks, we show how to leverage classical iterative-reconstruction algorithms such as Newton-Raphson and expectation-maximization (EM) to bootstrap network performance to a good initialization point, with a well-understood baseline of performance. Specifically, we demonstrate a natural and systematic way to design these networks for both transmission-mode x-ray computed tomography (XRCT) and emission-mode single-photon computed tomography (SPECT), highlighting that our method is capable of preserving many of the desirable properties, such as convergence and understandability, that are featured in classical approaches. The key contribution of this work is a formulation of the reconstruction task that enables data-driven improvements in image clarity and artifact reduction without sacrificing understandability. In this early work, we evaluate our method on a number of synthetic phantoms, highlighting some of the benefits and difficulties of this machine-learning approach.
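As a point of reference for the classical EM baseline mentioned above, the following is a minimal sketch of the standard maximum-likelihood EM (MLEM) update for emission tomography. It is not the network design described in the abstract; the function name `mlem`, the dense list-of-lists system matrix, and the toy geometry are assumptions made purely for illustration.

```python
def mlem(system, counts, iterations=50):
    """Classical MLEM iteration for emission data (illustrative sketch).

    system[i][j] is the (assumed known) contribution of pixel j to detector i;
    counts[i] is the measured count at detector i.
    """
    n_det, n_pix = len(system), len(system[0])
    # Sensitivity of each pixel: column sums of the system matrix.
    sensitivity = [sum(system[i][j] for i in range(n_det)) for j in range(n_pix)]
    estimate = [1.0] * n_pix  # strictly positive starting image
    for _ in range(iterations):
        # Forward-project the current estimate.
        projection = [
            sum(system[i][j] * estimate[j] for j in range(n_pix))
            for i in range(n_det)
        ]
        # Ratio of measured to predicted counts (guarding against division by zero).
        ratio = [
            counts[i] / projection[i] if projection[i] > 0 else 0.0
            for i in range(n_det)
        ]
        # Back-project the ratio and apply the multiplicative update.
        estimate = [
            estimate[j] / sensitivity[j]
            * sum(system[i][j] * ratio[i] for i in range(n_det))
            for j in range(n_pix)
        ]
    return estimate


# Toy example: two pixels seen by three detectors, noise-free counts.
toy_system = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_counts = [2.0, 5.0, 7.0]
toy_image = mlem(toy_system, toy_counts)
```

The update is multiplicative, so a positive starting image stays non-negative at every iteration, which is one of the well-understood properties of the classical method.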
We present a new data-driven technique for non-invasive electronic imaging of cardiovascular tissues using routinely-measured body-surface electrocardiogram (ECG) signals. While traditional ECG imaging and 3D reconstruction algorithms typically rely on a combination of linear Fourier theory, geometric and parametric modeling, and invasive measurements via catheters, we show in this work that it is possible to learn the complicated inverse map, from body-surface potentials to epicardial or endocardial potentials, by exploiting the powerful approximation properties of neural networks. The key contribution here is a formulation of the inverse problem that allows historical data to be leveraged as ground-truth for training the inverse operator. We provide some initial experiments, and outline a path for extending this technique for real-time diagnostic applications.
Despite the large availability of geospatial data, registration and exploitation of these datasets remains a persistent challenge in geoinformatics. Popular signal processing and machine learning algorithms, such as non-linear SVMs and neural networks, rely on well-formatted input models as well as reliable output labels, which are not always immediately available. In this paper we outline a pipeline for gathering, registering, and classifying initially unlabeled wide-area geospatial data. As an illustrative example, we demonstrate the training and testing of a convolutional neural network to recognize 3D models in the OGRIP 2007 LiDAR dataset using fuzzy labels derived from OpenStreetMap as well as other datasets available on OpenTopography.org. When auxiliary label information is required, various text and natural language processing filters are used to extract and cluster keywords useful for identifying potential target classes. A subset of these keywords is subsequently used to form multi-class labels, with no assumption of independence. Finally, we employ class-dependent geometry extraction routines to identify candidates from both training and testing datasets. Our regression networks are able to identify the presence of 6 structural classes, including roads, walls, and buildings, in volumes as large as 8000 m³ in as little as 1.2 seconds on a commodity 4-core Intel CPU. Because of the registration process, the presented framework is limited to neither a particular dataset nor a particular sensor modality, and is capable of multi-sensor data fusion.
In this paper we present methods for scene understanding, localization and classification of complex, visually
heterogeneous objects from overhead imagery. Key features of this work include: determining boundaries of objects
within large field-of-view images, classification of increasingly complex object classes through hierarchical
descriptions, and exploiting automatically extracted hypotheses about the surrounding region to improve classification
of a more localized region. Our system uses a principled probabilistic approach to classify increasingly
larger and more complex regions, and then iteratively uses this automatically determined contextual information
to reduce false alarms and misclassifications.
This paper describes an extension of the Minimum Sobolev Norm interpolation scheme to an approximation
scheme. A fast implementation of the MSN interpolation method using the methods for Hierarchical Semiseparable
(HSS) matrices is described and experimental results are provided. The approximation scheme is
introduced along with a numerically stable solver. Several numerical results are provided comparing the interpolation
scheme, the approximation scheme and Thin Plate Splines. A method to decompose images into smooth
and rough components is presented. A metric that could be used to distinguish edges and textures in the rough
component is also introduced. Suitable examples are provided for both.
We have investigated adaptive mechanisms for high-volume transform-domain data hiding in MPEG-2 video
which can be tuned to sustain varying levels of compression attacks. The data is hidden in the uncompressed domain
by scalar quantization index modulation (QIM) on a selected set of low-frequency discrete cosine transform
(DCT) coefficients. We propose an adaptive hiding scheme where the embedding rate is varied according to the
type of frame and the reference quantization parameter (decided according to MPEG-2 rate control scheme) for
that frame. For a 1.5 Mbps video and a frame-rate of 25 frames/sec, we are able to embed almost 7500 bits/sec.
Also, the adaptive scheme hides 20% more data and incurs significantly fewer frame errors (frames for which the
embedded data is not fully recovered) than the non-adaptive scheme. Our embedding scheme incurs insertions
and deletions at the decoder which may cause de-synchronization and decoding failure. This problem is solved
by the use of powerful turbo-like codes and erasures at the encoder. The channel capacity estimate gives an idea
of the minimum code redundancy factor required for reliable decoding of hidden data transmitted through the
channel. To that end, we have modeled the MPEG-2 video channel using the transition probability matrices
given by the data hiding procedure, using which we compute the (hiding scheme dependent) channel capacity.
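For readers unfamiliar with scalar QIM, the following minimal sketch shows how a single bit can be embedded in, and recovered from, one coefficient. It illustrates the general technique only, not the adaptive scheme described above; the function names, the step size `delta`, and the sample coefficient values are assumptions for illustration.

```python
def qim_embed(coeff, bit, delta):
    """Quantize coeff onto the lattice assigned to bit.

    Bit 0 uses multiples of delta; bit 1 uses the same lattice shifted by delta / 2.
    """
    offset = bit * delta / 2.0
    return round((coeff - offset) / delta) * delta + offset


def qim_decode(received, delta):
    """Return the bit whose lattice contains the point nearest to received."""
    d0 = abs(received - qim_embed(received, 0, delta))
    d1 = abs(received - qim_embed(received, 1, delta))
    return 0 if d0 <= d1 else 1


# Hypothetical low-frequency DCT coefficients and message bits.
delta = 8.0
coeffs = [13.2, -4.7, 30.1]
bits = [1, 0, 1]
marked = [qim_embed(c, b, delta) for c, b in zip(coeffs, bits)]
# Perturbations smaller than delta / 4 do not change the decoded bit.
recovered = [qim_decode(m + 1.5, delta) for m in marked]
```

A larger `delta` tolerates stronger compression at the cost of more distortion per coefficient, which is the trade-off an adaptive embedding rate has to balance.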
In this paper we study steganalysis, the detection of hidden data. Specifically, we focus on detecting data hidden in grayscale images with spread spectrum hiding. To accomplish this we use a statistical model of images and estimate the detectability of a few basic spread spectrum methods. To verify the results of these findings, we create a tool to discriminate between natural "cover" images and "stego" images (containing hidden data) taken from a diverse database. Existing steganalysis schemes that exploit the spatial memory found in natural images are particularly effective. Motivated by this, we include inter-pixel dependencies in our model of image pixel probabilities and use an appropriate statistical measure for the security of a steganography system subject to optimal hypothesis testing. Using this analysis as a guide, we design a tool for detecting hiding by various spread spectrum methods. Depending on the method and power of the hidden message, we correctly detect the presence of hidden data in about 95% of images.
Print-scan resilient data hiding finds important applications in document security and image copyright protection. In this paper, we build upon our previous work on print-scan resilient data hiding with the goal of providing a mathematical foundation for computing information-theoretic limits and guiding the design of more complicated hiding schemes that allow a higher volume of embedded data. A model for the print-scan process is proposed, which has three main components: a) effects due to mild cropping, b) colored high-frequency noise, and c) non-linear effects. It can be shown that cropping introduces an unknown but smoothly varying phase shift in the image spectrum. A new hiding method called Differential Quantization Index Modulation (DQIM) is proposed, in which information is hidden in the phase spectrum of images by quantizing the difference in phase of adjacent frequency locations. The unknown phase shift is cancelled when the difference is taken. Using the proposed DQIM hiding in phase, we are able to survive the print-scan process with several hundred information bits hidden in the images.
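The cancellation argument can be illustrated with a small sketch that quantizes differences between adjacent phase values and then decodes after an unknown, slowly varying shift has been added. This is a simplified illustration under stated assumptions, not the published DQIM method: the function names, the step size `delta` (chosen to divide 2π so the lattices survive phase wrapping), and the sample phases are all hypothetical.

```python
import math


def quantize(value, bit, delta):
    """Nearest point to value on the lattice for bit (bit 1 is offset by delta / 2)."""
    offset = bit * delta / 2.0
    return round((value - offset) / delta) * delta + offset


def dqim_embed(phases, bits, delta):
    """Hide bits[k] in the difference between phases[k + 1] and the marked phases[k]."""
    marked = [phases[0]]  # the first phase is left unmodified as a reference
    for phase, bit in zip(phases[1:], bits):
        diff = quantize(phase - marked[-1], bit, delta)
        marked.append(marked[-1] + diff)
    return marked


def dqim_decode(received, delta):
    """Decode one bit from each difference of adjacent received phases."""
    bits = []
    for prev, cur in zip(received, received[1:]):
        diff = cur - prev
        d0 = abs(diff - quantize(diff, 0, delta))
        d1 = abs(diff - quantize(diff, 1, delta))
        bits.append(0 if d0 <= d1 else 1)
    return bits


delta = math.pi / 4  # assumed step; 2 * pi is an integer multiple of it
phases = [0.4, 1.9, -2.2, 0.7, 3.0]  # hypothetical phases at adjacent frequencies
bits = [1, 0, 0, 1]
marked = dqim_embed(phases, bits, delta)
# Unknown shift that changes only slightly between adjacent locations.
shifted = [p + 0.8 + 0.01 * k for k, p in enumerate(marked)]
decoded = dqim_decode(shifted, delta)
```

In this sketch the common part of the shift cancels exactly in each difference, and the small residual variation between neighbors stays well inside the decoding margin of `delta / 4`.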