In this paper, we present a pipeline and prototype vision system for near-real-time semantic segmentation and classification of objects such as roads, buildings, and vehicles in large, high-resolution, wide-area, real-world aerial LiDAR point clouds and RGBD imagery. Unlike previous works, which have focused on exploiting ground-based sensors or narrowed the scope to detecting the density of large objects, here we address the full semantic segmentation of aerial LiDAR and RGBD imagery by exploiting crowd-sourced labels that densely cover each image in the 2015 Dublin dataset. Our results indicate significant improvements in detection and segmentation accuracy when aerial LiDAR is added to RGB imagery alone, which has important implications for civilian applications such as autonomous navigation and rescue operations. Moreover, the prototype system can segment and search geographic areas as large as 1 km² in a matter of seconds on commodity hardware with high accuracy (≈90%), suggesting the feasibility of real-time scene understanding on small aerial platforms.
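As a rough illustration of the RGB + LiDAR fusion this abstract describes (not the authors' actual architecture), the minimal PyTorch sketch below stacks a rasterized LiDAR height channel onto the RGB channels and feeds the result to a small fully convolutional segmenter; the layer sizes, tile size, and class list are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical fusion: RGB (3 channels) plus a rasterized LiDAR height channel (1 channel)
# fed to a small fully convolutional network that predicts per-pixel class scores.
N_CLASSES = 4   # e.g. road, building, vehicle, background (illustrative)

segmenter = nn.Sequential(
    nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, N_CLASSES, 1),            # per-pixel class logits
)

rgb = torch.rand(1, 3, 256, 256)            # normalized RGB tile (placeholder data)
lidar_height = torch.rand(1, 1, 256, 256)   # normalized height-above-ground raster (placeholder)
fused = torch.cat([rgb, lidar_height], dim=1)

logits = segmenter(fused)                   # shape (1, N_CLASSES, 256, 256)
labels = logits.argmax(dim=1)               # per-pixel class predictions
```

The same pattern extends to additional rasterized LiDAR features (e.g. intensity or return counts) by widening the input channel dimension.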
We introduce a new approach for designing deep learning algorithms for computed tomography applications. Rather than training generically structured neural network architectures to perform equivalent imaging tasks, we show how to leverage classical iterative-reconstruction algorithms such as Newton-Raphson and expectation-maximization (EM) to bootstrap the network to a good initialization point with a well-understood baseline of performance. Specifically, we demonstrate a natural and systematic way to design these networks for both transmission-mode x-ray computed tomography (XRCT) and emission-mode single-photon computed tomography (SPECT), highlighting that our method preserves many of the desirable properties, such as convergence and understandability, that are featured in classical approaches. The key contribution of this work is a formulation of the reconstruction task that enables data-driven improvements in image clarity and artifact reduction without sacrificing understandability. In this early work, we evaluate our method on a number of synthetic phantoms, highlighting some of the benefits and difficulties of this machine-learning approach.
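For context, the classical EM (MLEM) update that this abstract proposes to build on can be written in a few lines of NumPy; the sketch below shows only that baseline iteration on a toy system matrix, not the unrolled, learnable network described in the paper.

```python
import numpy as np

def mlem_update(x, y, A, eps=1e-12):
    """One classical MLEM (EM) iteration for emission tomography.

    x : current image estimate, shape (n,)
    y : measured sinogram counts, shape (m,)
    A : system matrix, shape (m, n); A[i, j] ~ probability pixel j contributes to bin i
    """
    forward = A @ x + eps                 # predicted counts for the current estimate
    backproj = A.T @ (y / forward)        # back-project the measured/predicted ratio
    sensitivity = A.sum(axis=0) + eps     # A^T 1, per-pixel sensitivity
    return x * backproj / sensitivity     # multiplicative EM update

# Tiny synthetic example: 2 detector bins, 3 pixels.
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])
x_true = np.array([2.0, 1.0, 3.0])
y = A @ x_true                            # noiseless "measured" counts

x = np.ones(3)                            # flat initialization
for _ in range(50):
    x = mlem_update(x, y, A)
print(x)                                  # approaches an estimate consistent with y
```

In an unrolled design, each such iteration would become a network layer whose fixed operations are augmented with learnable components, which is one way to retain the convergence behavior and interpretability of the classical scheme while allowing data-driven refinement.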
We present a new data-driven technique for non-invasive electronic imaging of cardiovascular tissues using routinely measured body-surface electrocardiogram (ECG) signals. While traditional ECG imaging and 3D reconstruction algorithms typically rely on a combination of linear Fourier theory, geometric and parametric modeling, and invasive measurements via catheters, we show in this work that it is possible to learn the complicated inverse map, from body-surface potentials to epicardial or endocardial potentials, by exploiting the powerful approximation properties of neural networks. The key contribution here is a formulation of the inverse problem that allows historical data to be leveraged as ground truth for training the inverse operator. We provide initial experiments and outline a path for extending this technique toward real-time diagnostic applications.
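A minimal sketch of the learned inverse operator described above might look like the following PyTorch snippet, which regresses epicardial potentials from body-surface potentials; the lead counts, network width, and synthetic training data here are placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions): 252 body-surface leads, 490 epicardial nodes.
N_SURFACE, N_EPI = 252, 490

# A small fully connected network standing in for the learned inverse operator.
inverse_net = nn.Sequential(
    nn.Linear(N_SURFACE, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, N_EPI),
)

# Synthetic stand-in for historical paired data (surface potentials -> epicardial potentials).
surface = torch.randn(1024, N_SURFACE)
epicardial = torch.randn(1024, N_EPI)

optimizer = torch.optim.Adam(inverse_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(inverse_net(surface), epicardial)   # supervised regression of the inverse map
    loss.backward()
    optimizer.step()
```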
Despite the wide availability of geospatial data, registration and exploitation of these datasets remain a persistent challenge in geoinformatics. Popular signal processing and machine learning algorithms, such as non-linear SVMs and neural networks, rely on well-formatted input models as well as reliable output labels, which are not always immediately available. In this paper we outline a pipeline for gathering, registering, and classifying initially unlabeled wide-area geospatial data. As an illustrative example, we demonstrate the training and testing of a convolutional neural network to recognize 3D models in the OGRIP 2007 LiDAR dataset using fuzzy labels derived from OpenStreetMap as well as other datasets available on OpenTopography.org. When auxiliary label information is required, various text and natural language processing filters are used to extract and cluster keywords useful for identifying potential target classes. A subset of these keywords is subsequently used to form multi-class labels, with no assumption of independence. Finally, we employ class-dependent geometry extraction routines to identify candidates from both training and testing datasets. Our regression networks are able to identify the presence of six structural classes, including roads, walls, and buildings, in volumes as large as 8000 m³ in as little as 1.2 seconds on a commodity 4-core Intel CPU. Owing to the registration process, the presented framework is limited to neither a specific dataset nor a particular sensor modality, and is capable of multi-sensor data fusion.
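As a hedged sketch of the volume-classification step (the exact voxelization, network, and label construction in the paper may differ), the following PyTorch snippet trains a small 3D CNN to predict the presence of each structural class in a voxelized LiDAR volume, using multi-label targets of the kind one might derive from OpenStreetMap tags; it does not capture the paper's treatment of label dependence.

```python
import torch
import torch.nn as nn

# Illustrative multi-label classifier over voxelized LiDAR volumes.
N_CLASSES = 6                         # e.g. road, wall, building, ... (illustrative)
VOXEL_GRID = 32                       # assumed voxelization of the query volume

classifier = nn.Sequential(
    nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
    nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
    nn.Flatten(),
    nn.Linear(32 * (VOXEL_GRID // 4) ** 3, N_CLASSES),
)

volume = torch.rand(1, 1, VOXEL_GRID, VOXEL_GRID, VOXEL_GRID)   # occupancy grid (placeholder)
fuzzy_labels = torch.tensor([[1.0, 0.0, 1.0, 0.0, 0.0, 0.0]])   # e.g. derived from OpenStreetMap tags

optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
logits = classifier(volume)                                     # one score per class
loss = nn.BCEWithLogitsLoss()(logits, fuzzy_labels)             # sigmoid per class, so classes can co-occur
loss.backward()
optimizer.step()
```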