This PDF file contains the front matter associated with SPIE Proceedings Volume 12525, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Satellite-based remote sensing imagery is an effective means for detecting objects and structures in support of many applications. However, detecting the spatial and temporal bounds of a specific activity in satellite imagery is inherently more complex, and research in this area is nascent. One reason for this is that describing an activity implies defining both spatial and temporal bounds: while activity is inherently continuous in nature, the geospatial (imagery) time series for any particular swath of ground provided by satellite imagery is comparatively sparse and discrete. The IARPA Space-Based Machine Automated Recognition Technique (SMART) program is the first large-scale research program to target advancing the state of the art for automatically detecting, characterizing, and monitoring large-scale anthropogenic activity in global, multispectral satellite imagery. The program has two primary research objectives: 1) the “harmonization” of multiple imagery sources and 2) automated reasoning at scale to detect, characterize, and monitor activities of interest. This paper provides details on the goals, dataset, metrics, and lessons learned of the IARPA SMART program. By releasing the annotated dataset, the program aims to foster additional research in this area by the community at large.
The Sustainable Development Goal (SDG) number 11 aims at making cities and human settlements more inclusive, safe, resilient, and sustainable. Complying with SDG 11 is a difficult task, especially when considering rural settlements where: (i) the population settles in a dispersed manner; and (ii) the geographic complexity and social dynamics of the area make it difficult to monitor and capture data. One example of such areas can be found in the South-West of Colombia, in the Las Piedras River sub-basin. The National Administrative Department of Statistics in Colombia (DANE in Spanish) aims at mapping the population and houses in dispersed and difficult-to-access rural settlements in an accurate and continuous way. Nevertheless, several difficulties derived from the in-situ way of collecting the data prevent such data from being generated. This research presents a methodology to carry out an updated mapping of rural areas with high-spatial-resolution data from PlanetScope (3 m). The mapping considers the dynamics of housing growth, focusing on dispersed and difficult-to-access rural settlements. To this aim, Convolutional Neural Networks (CNNs) are used together with PlanetScope data, allowing the average house size (≥12 m²) in the study area to be accounted for. Preliminary results show a detection accuracy above 95% on average, varying with the geographic complexity.
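As a rough illustration of the detection setup described above, the following is a minimal sketch of a patch-based CNN classifier over 4-band PlanetScope chips; the patch size, channel count, and layer widths are assumptions for illustration, not the authors' architecture.

```python
# Minimal sketch of a patch-based CNN house detector for 4-band PlanetScope
# imagery (hypothetical patch size and layer widths; not the authors' exact model).
import torch
import torch.nn as nn

class HousePatchCNN(nn.Module):
    def __init__(self, in_bands: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_bands, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 4 * 4, 64), nn.ReLU(),
            nn.Linear(64, 2),  # house / background
        )

    def forward(self, x):  # x: (N, 4, 16, 16) reflectance patches
        return self.classifier(self.features(x))

logits = HousePatchCNN()(torch.randn(8, 4, 16, 16))
print(logits.shape)  # (8, 2)
```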
Understanding a region’s socio-economic conditions can inform the development of policies in both the public and private sectors. Commercial satellite imagery provides socio-economic context. By combining commercial imagery with geospatially enhanced social media, we generate local measures of political and economic instability risks at a regional and national scale. We present models that generate instability estimates by fusing socio-economic contextual data from commercial imagery with high-tempo social media data. To assess model performance, we predict annual indicators of conditions for a country as assessed by the World Bank. The models relate model-derived features to indicators of political stability, control of corruption, rule of law, government effectiveness, voice and accountability, and gross domestic product using data from multiple countries. Comparison of our methods to the World Bank data demonstrates the strengths of our approach.
The Las Piedras River sub-basin, located in the department of Cauca, Colombia, is very important for the region, especially for the capital (Popayán), because the sub-basin contributes around 68.17% of the water supply for the city. To guarantee continuity of this resource, good management of the Water Ecosystem Services (WES) must be carried out. To this aim, periodic environmental assessments of the water resource in the region are necessary. Such an Environmental Assessment of WES (EAWES) is possible when an accurate and up-to-date land cover map is available. However, obtaining such a product is quite complex due to the heterogeneous conditions of both the land cover and the orography of the studied region. Another impacting factor is the weather conditions of the region, which make it difficult to access the areas and/or to acquire information for land cover mapping. This research proposes a robust model, based on deep learning and Sentinel-2 satellite images, able to perform land cover classification with reliable accuracy (>90%) at a low computational cost. A variant of the LeNet convolutional neural network is used together with features extracted from the original spectral bands, radiometric indices, and a digital elevation model. Preliminary results show an overall accuracy of 95.49% on the training data and 96.51% on the validation data.
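To make the input representation concrete, here is a minimal sketch of how spectral bands, a radiometric index, and elevation might be stacked into the multi-channel input of such a network; the band indices and normalization are assumptions, not the authors' exact pipeline.

```python
# Minimal sketch of assembling the input stack for land-cover classification:
# Sentinel-2 spectral bands, a radiometric index (NDVI), and a DEM layer.
# Band order and normalization are illustrative assumptions.
import numpy as np

def build_feature_stack(bands: np.ndarray, dem: np.ndarray) -> np.ndarray:
    """bands: (B, H, W) Sentinel-2 reflectances; dem: (H, W) elevation in metres."""
    red, nir = bands[3], bands[7]          # assuming index 3 = B4 (red), 7 = B8 (NIR)
    ndvi = (nir - red) / (nir + red + 1e-6)
    dem_norm = (dem - dem.mean()) / (dem.std() + 1e-6)
    return np.concatenate([bands, ndvi[None], dem_norm[None]], axis=0)

stack = build_feature_stack(np.random.rand(10, 64, 64), np.random.rand(64, 64) * 3000)
print(stack.shape)  # (12, 64, 64): 10 bands + NDVI + DEM
```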
Geospatial Analytics II: SAR, Aerial, and Satellite Imagery
Deep learning models trained on imbalanced datasets with long-tailed label distributions are biased towards the most frequently occurring classes, giving rise to a large number of false negatives in the tail classes. In this paper, we utilize a cross-modal knowledge distillation framework with balanced sampling for learning SAR image classification from an Electro-Optical (EO) image classification model that is trained on co-registered data. Knowledge distillation from the soft outputs of the EO model aids in the effective training of the SAR model. However, a class-balanced sampler adversely affects the performance of the head classes. To mitigate these negative effects, we propose Balanced Cross-KD to efficiently train the SAR model end-to-end in a single stage, via a carefully crafted sampling strategy that strikes a balance between instance- and class-balanced sampling. Balanced Cross-KD performs training on a long-tailed EO/SAR dataset by alternating between instance- and class-balanced sampling at fixed intervals. Training utilizes our novel distributed entropy loss and equal diversity loss to encourage compact yet diverse prediction probabilities. Additionally, pretraining the SAR network on another SAR dataset is considered to obtain improved features, and an ablation study further demonstrates the utility of each component of our model. Our Balanced Cross-KD model improves performance across the tail classes and increases overall mean per-class accuracy, while minimally compromising performance for the head classes on a registered EO/SAR dataset.
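The core distillation step can be sketched as a standard cross-entropy term on the SAR student plus a temperature-scaled KL term against the frozen EO teacher, with the data sampler alternated at fixed intervals. The temperature, loss weights, and alternation interval below are illustrative assumptions, and the paper's distributed entropy and equal diversity losses are not reproduced here.

```python
# Minimal sketch of cross-modal knowledge distillation from a frozen EO teacher to
# a SAR student (temperature, loss weights, and alternation interval are assumptions).
import torch
import torch.nn.functional as F

def cross_kd_loss(sar_logits, eo_logits, labels, T=2.0, alpha=0.5):
    """Hard-label CE on the SAR student plus soft-label KL against the EO teacher."""
    ce = F.cross_entropy(sar_logits, labels)
    kd = F.kl_div(F.log_softmax(sar_logits / T, dim=1),
                  F.softmax(eo_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    return alpha * ce + (1.0 - alpha) * kd

def pick_sampler(epoch, instance_sampler, class_balanced_sampler, interval=5):
    """Alternate between instance-balanced and class-balanced sampling at fixed intervals."""
    return instance_sampler if (epoch // interval) % 2 == 0 else class_balanced_sampler
```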
Recent advances in machine learning for geospatial imagery have facilitated image analysis for tasks such as building footprint extraction and urban land cover classification. The current state-of-the-art semantic segmentation networks (including the many variants of the U-Net architecture) have shown promise for such tasks but have a shortcoming in that the networks utilize a loss function that is computed only per pixel. This precludes spatial context from being leveraged as part of the objective function during the training phase of the models. In this study, we propose a modified loss function for semantic segmentation networks that incorporates the spatial context from the ground truth images in an effort to improve building footprint extraction. Specifically, our approach uses neighborhood pixels to provide an adjustment factor for model training. In this work, we use imagery from the SpaceNet-2 dataset consisting of aerial images of buildings vs. landscape. We demonstrate that by adding spatial context to the loss function of semantic segmentation networks, the semantic features extracted by such networks become more aware of spatial context, which helps the underlying segmentation task. Our experiments demonstrate both quantitative (e.g., via Dice scores) and qualitative (e.g., via more effective building footprint extraction) improvements to semantic segmentation networks when the proposed loss function is incorporated compared to when it is not. Using the proposed spatially aware loss function, the resulting U-Net converges faster than when using a standard binary cross-entropy loss function. This improvement comes at no additional expense with regard to the amount of training data used, modification of the model architecture, or the number of parameters.
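One possible reading of such a neighborhood-based adjustment factor is sketched below: each pixel's binary cross-entropy term is reweighted by how mixed the ground truth is in its local window, so pixels near footprint boundaries contribute more. The 3x3 window and weighting rule are illustrative assumptions, not necessarily the paper's exact formulation.

```python
# Hedged sketch of a spatially aware loss: weight per-pixel BCE by how much the
# ground truth varies in a pixel's neighborhood (building boundaries get more weight).
import torch
import torch.nn.functional as F

def spatial_bce(pred_logits, target, boundary_weight=2.0):
    """pred_logits, target: (N, 1, H, W); target values in {0, 1}."""
    mean_nbhd = F.avg_pool2d(target, kernel_size=3, stride=1, padding=1)
    # High where the 3x3 neighborhood is mixed (near footprint edges), 0 in flat regions.
    edge_factor = 4.0 * mean_nbhd * (1.0 - mean_nbhd)
    weights = 1.0 + boundary_weight * edge_factor
    bce = F.binary_cross_entropy_with_logits(pred_logits, target, reduction="none")
    return (weights * bce).mean()
```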
Transformer models are demonstrating remarkable and emergent capabilities in the natural language processing domain. These models are bounded only by the availability of large training datasets, which can be tractably obtained since natural language models are pre-trained using self-supervision in the form of token masking. Papers by He et al. and Cao et al. have recently shown the power of this token-masking technique by utilizing masked autoencoders as scalable vision learners in combination with a self-supervised pre-training technique for vision transformer models. Feichtenhofer et al. extended these techniques to video, showing that masked autoencoders are scalable spatiotemporal learners as well. To the best of our knowledge, these techniques have only been experimented with on ground-level, object-centric imagery and video. Extending them to remote or overhead imagery presents two significant problems. First, the objects of interest are small compared to the typical mask patch size. Second, the frames are not object-centered. In this study, we explore whether modern self-supervised pre-training techniques like masked autoencoding extend well to overhead wide area motion imagery (WAMI) data. We argue that modern pre-training techniques like MAE are well suited to WAMI data given the typical object size in this domain as well as the ability to leverage strong global spatial contextual information. To this end, we conduct a comprehensive exploration of different patch sizes and masking ratios on the popular WAMI dataset, WPAFB 2009. We find that domain-specific adjustments to these pre-training techniques result in downstream performance improvements on computer vision tasks including object detection.
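The masking step being swept in the study can be sketched as standard MAE random masking over patch tokens; the 14x14 token grid and 75% masking ratio below are illustrative defaults rather than the settings found to work best for WAMI.

```python
# Minimal sketch of MAE-style random patch masking (patch grid and masking ratio
# here are illustrative; the paper sweeps these parameters for WAMI imagery).
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """patches: (N, L, D) tokenized image patches. Returns kept patches and their indices."""
    N, L, D = patches.shape
    n_keep = int(L * (1.0 - mask_ratio))
    noise = torch.rand(N, L)                       # one random score per patch
    keep_idx = noise.argsort(dim=1)[:, :n_keep]    # keep the lowest-scoring patches
    kept = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    return kept, keep_idx

tokens = torch.randn(2, 196, 768)                  # e.g. a 14x14 grid of patch embeddings
kept, idx = random_masking(tokens, mask_ratio=0.75)
print(kept.shape)                                  # (2, 49, 768)
```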
Multiple object tracking (MOT) is a common computer vision problem that focuses on detecting objects and maintaining their identities through a sequence of image frames. Until now, there have been three main approaches to improving MOT performance: 1) improving the detector’s quality, 2) improving the tracker’s quality, or 3) creating novel approaches that jointly model detection and tracking. In this work, we argue that there is a fourth, simpler way to improve MOT performance: fusing the outputs of multiple multi-object trackers. In this paper, we introduce a novel approach, TrackFuse, that aims to fuse the final tracks from two different models into a single output, similar to classification ensembling or weighted box fusion for object detection. The fundamental assumption of TrackFuse is that multiple trackers will fail uniquely, and similarly, multiple detectors will fail uniquely too. Thus, by fusing the output of multiple approaches to MOT, we can improve tracking performance. We test our approach on combinations of several high-performing trackers and show state-of-the-art results on the MOTA metric on a held-out validation set of the MOT17 dataset, compared to individual tracking models. Furthermore, we consistently show that fusing multiple object trackers provides a performance boost on multiple metrics compared to the results of the individual model outputs sent for fusion. Our code will be released soon.
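A minimal sketch of the general idea (not the published TrackFuse algorithm) is to associate final tracks from two trackers by their mean box IoU over shared frames and merge associated pairs:

```python
# Hedged sketch: greedy track-level fusion of two trackers' outputs by mean IoU over
# shared frames. Thresholds and the merge rule are assumptions for illustration.
def iou(a, b):
    ax1, ay1, ax2, ay2 = a; bx1, by1, bx2, by2 = b
    ix1, iy1, ix2, iy2 = max(ax1, bx1), max(ay1, by1), min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def fuse_tracks(tracks_a, tracks_b, thresh=0.5):
    """tracks_*: dicts {track_id: {frame: (x1, y1, x2, y2)}}. Returns fused track list."""
    fused, used_b = [], set()
    for ta in tracks_a.values():
        best_id, best_score = None, thresh
        for tb_id, tb in tracks_b.items():
            shared = set(ta) & set(tb)
            if not shared or tb_id in used_b:
                continue
            score = sum(iou(ta[f], tb[f]) for f in shared) / len(shared)
            if score > best_score:
                best_id, best_score = tb_id, score
        if best_id is not None:
            used_b.add(best_id)
            ta = {**tracks_b[best_id], **ta}   # union of frames; tracker A wins on overlaps
        fused.append(ta)
    fused += [tb for tb_id, tb in tracks_b.items() if tb_id not in used_b]
    return fused
```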
In the physical universe, truth for computer vision (CV) is impractical if not impossible to obtain. As a result, the CV community has resorted to qualitative practices and sub-optimal quantitative measures. This is problematic because it limits our ability to train, evaluate, and ultimately understand algorithms such as single image depth estimation (SIDE) and structure from motion (SfM). How good are these algorithms, individually and relatively, and where do they break? Herein, we discuss that while truth evades both the real and simulated (SIM) universes, a SIM CV gold standard can be achieved. We outline an extensible SIM framework and data collection workflow using Unreal Engine with the Robot Operating System (ROS) for three-dimensional mapping on low-altitude aerial vehicles. Furthermore, voxel-based mapping measures from algorithm output to a SIM gold standard are discussed. The proposed metrics are demonstrated by analyzing performance across changes in platform context. Ultimately, the current article is a step towards an improved process for comparing algorithms, evaluating their strengths and weaknesses, and automating algorithm design.
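Voxel-based comparison against a simulated gold standard can be sketched as follows: voxelize both the reconstructed point cloud and the reference geometry at a common resolution and score the overlap of occupied voxels. The 0.25 m voxel size and the precision/recall scores are illustrative choices, not necessarily the paper's metrics.

```python
# Hedged sketch of voxel-based map scoring against a simulated gold standard.
import numpy as np

def voxelize(points: np.ndarray, voxel_size: float = 0.25) -> set:
    """points: (N, 3) array of mapped 3D points -> set of occupied voxel indices."""
    return set(map(tuple, np.floor(points / voxel_size).astype(int)))

def voxel_scores(estimated: np.ndarray, gold: np.ndarray, voxel_size: float = 0.25):
    est, ref = voxelize(estimated, voxel_size), voxelize(gold, voxel_size)
    tp = len(est & ref)                                  # voxels occupied in both maps
    precision = tp / len(est) if est else 0.0
    recall = tp / len(ref) if ref else 0.0
    return precision, recall
```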
An open research question is how to best pair a human and an agent (e.g., AI, autonomous) relative to a complex, multi-objective task in a dynamic and unknown partially observable environment. At the heart of this challenge reside even deeper questions, such as what AI is needed and how bi-directional and multi-directional human-robot trust can be established. In this paper, the theoretical framework for a simple 2D grid world-based cooperative search and rescue game is explored. The resultant prototype interface enables the study of human-robot interaction for human-robot teaming. First, the design and implementation of the prototype interface are discussed. A 2D grid world was selected to simplify the investigation and eliminate confounding factors that arise in more complicated simulated 3D and real-world experiments. Next, different types of autonomous agents are introduced, as they impact our studies and ultimately are an integral element of the underlying research question. This is followed by three open-ended game levels of increasing complexity: easy, medium, and hard. The current paper does not contain human experimentation results; that is the next step in this research. Instead, this article introduces, explains, and defends a set of design choices, and working examples are provided to facilitate open discussion.
Autonomous or semi-autonomous navigation of UAVs is of great interest in the Defense and Security domains, as it significantly improves their efficiency and responsiveness during operations. Perception of the environment, and in particular dense, metric 3D mapping in real time, is a priority for navigation and obstacle avoidance. We therefore present our strategy to estimate a dense 3D map by combining a sparse map estimated by a state-of-the-art Simultaneous Localization and Mapping (SLAM) system with a dense depth map predicted by a monocular self-supervised method. A lightweight, volumetric multi-view fusion solution is then used to build and update a voxel map.
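Before volumetric fusion, the self-supervised monocular prediction has to agree with the metric scale of the sparse SLAM map. A common way to do this, sketched below under the assumption of median-ratio scaling, is to rescale the dense prediction using SLAM landmarks reprojected into the image; the paper's actual alignment and fusion strategy may differ.

```python
# Hedged sketch: align a relative monocular depth map to the metric scale of sparse
# SLAM depths before feeding it to volumetric fusion.
import numpy as np

def align_dense_depth(dense_depth: np.ndarray, sparse_uv: np.ndarray, sparse_z: np.ndarray):
    """dense_depth: (H, W) relative depth; sparse_uv: (K, 2) pixel coords of SLAM points;
    sparse_z: (K,) their metric depths. Returns a metrically scaled dense map."""
    u, v = sparse_uv[:, 0].astype(int), sparse_uv[:, 1].astype(int)
    pred_at_sparse = dense_depth[v, u]
    scale = np.median(sparse_z / np.clip(pred_at_sparse, 1e-6, None))
    return scale * dense_depth
```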
Real and Synthetic Data Collection and Applications
We introduce a novel collection planning and orchestration framework called Cognitive Tip and Cue (CTC). CTC consists of three components: a natural language processing module that extracts entities and collection intent from free text, a sensor recommendation module that scores sensor modalities based on collection parameters and environmental context, and an optimization module that uses a multi-objective genetic algorithm to evolve optimal data provider selections. CTC learns an association between intelligence, surveillance, and reconnaissance needs and the historical performance of similar requests. We demonstrate that CTC generates collection plans that appropriately account for collection constraints while selecting data providers that align with tradecraft intuition.
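As a purely hypothetical illustration of the sensor-recommendation step, a modality could be scored against the extracted collection parameters and environmental context along the lines below; the attributes, weights, and scoring rule are invented for illustration and are not CTC's actual model.

```python
# Hypothetical sketch of scoring sensor modalities against a collection request.
def score_modality(modality: dict, request: dict) -> float:
    score = 0.0
    if modality["gsd_m"] <= request["required_gsd_m"]:
        score += 0.4                                    # meets the resolution need
    if request["cloud_cover"] > 0.5 and modality["all_weather"]:
        score += 0.4                                    # e.g. SAR favored under cloud
    score += 0.2 * modality["historical_success_rate"]  # learned from similar past requests
    return score

sensors = [
    {"name": "EO-sat", "gsd_m": 0.5, "all_weather": False, "historical_success_rate": 0.9},
    {"name": "SAR-sat", "gsd_m": 1.0, "all_weather": True, "historical_success_rate": 0.7},
]
request = {"required_gsd_m": 1.0, "cloud_cover": 0.8}
print(max(sensors, key=lambda s: score_modality(s, request))["name"])
```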
Object detection in 3D point clouds is essential in fields such as geospatial intelligence and autonomous driving. The common machine learning problem of scarce labeled training data is even more acute with 3D point cloud data. Active learning provides a framework to prioritize the additional effort to manually annotate unlabeled training data. Most active learning methods for deep learning fall into one of two categories: uncertainty methods and diversity methods. Uncertainty methods select data by assessing model outputs for their confidence and consistency and are therefore dependent on the expected output of each deep learning task. These methods tend to select batches of informative yet highly similar samples to label. Diversity-based active learning aims to create a labeled dataset that is both varied and representative of the remaining unlabeled data. Diversity methods operate directly on the feature representations of the inputs and are thus more flexible with respect to the specifics of the deep learning task. Our current work explores applying diversity methods and uncertainty-diversity hybrid methods to 3D object detection. We evaluate various approaches to incorporate diversity, including K-Medoids clustering, core set selection, and furthest nearest neighbors. We address the high dimensionality of the features extracted from a VoxelNet-based object detector by varying the distance metric used in the active learning algorithms. Furthermore, we compare our results to those obtained using only uncertainty methods. We assess the performance and efficiency of each active learning method in addition to the representativeness and diversity of the labeled datasets produced. We find that hybrid uncertainty-diversity methods outperform other methods in terms of object detection AP50 throughout active learning, annotation efficiency, and class balance.
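Among the diversity methods evaluated, core-set selection is commonly implemented as k-center greedy over the detector's feature embeddings, as in the minimal sketch below; the distance metric is left as a parameter, echoing the point that metric choice matters for high-dimensional VoxelNet features.

```python
# Minimal sketch of core-set style selection (k-center greedy) on detector features.
import numpy as np
from scipy.spatial.distance import cdist

def k_center_greedy(unlabeled_feats, labeled_feats, budget, metric="euclidean"):
    """Greedily pick `budget` unlabeled samples that are farthest from the labeled set."""
    dists = cdist(unlabeled_feats, labeled_feats, metric=metric).min(axis=1)
    selected = []
    for _ in range(budget):
        idx = int(np.argmax(dists))            # farthest point from everything chosen so far
        selected.append(idx)
        new_d = cdist(unlabeled_feats, unlabeled_feats[idx:idx + 1], metric=metric).ravel()
        dists = np.minimum(dists, new_d)       # update coverage distances
    return selected
```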
Monitoring of air pollutants across space and time is critical to understanding pollution trends and reporting air quality. The Air Quality Index (AQI) is a tool used to communicate air quality that incorporates atmospheric concentrations of five major pollution indicators: ground-level ozone, particulate matter, carbon monoxide, sulfur dioxide, and nitrogen dioxide. The ability to accurately forecast these concentrations and identify unusual levels is of particular importance. In this work, we develop a generative time series model for air quality indicators and use it for long- and short-term probabilistic forecasts. Air quality data are multivariate and exhibit high variability across indicators in both space and time. Marginal indicator distributions are typically skewed and contain a substantial proportion of zeros, while indicator-wise cross-correlations can be highly non-linear. We find that hourly measurements additionally exhibit substantial temporal cross-correlation, long-term dependence, and daily periodicity. To capture these complexities, we employ a recurrent extension of the variational autoencoder (VAE) to sequential data. The VAE is a generative neural network architecture capable of learning complex, high-dimensional manifolds on which data are distributed. Furthermore, recurrent architectures can capture non-linear and long-term temporal qualities of time series data. We train the proposed time series model on historical air quality measurements at multiple locations and demonstrate its ability to capture observed indicator-wise and temporal complexities. We additionally use the trained model to compute probabilistic forecasts and credible intervals of air quality indicators.
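A minimal sketch of a recurrent VAE over hourly indicator vectors is shown below, assuming a GRU encoder/decoder, a Gaussian latent, and a squared-error reconstruction term; latent and hidden sizes are illustrative and the paper's architecture may differ.

```python
# Hedged sketch of a recurrent VAE for multivariate air-quality sequences.
import torch
import torch.nn as nn

class RecurrentVAE(nn.Module):
    def __init__(self, n_indicators=5, hidden=64, latent=16):
        super().__init__()
        self.enc = nn.GRU(n_indicators, hidden, batch_first=True)
        self.to_mu, self.to_logvar = nn.Linear(hidden, latent), nn.Linear(hidden, latent)
        self.dec = nn.GRU(latent, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_indicators)

    def forward(self, x):                      # x: (N, T, n_indicators)
        _, h = self.enc(x)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
        z_seq = z.unsqueeze(1).expand(-1, x.size(1), -1)
        recon = self.out(self.dec(z_seq)[0])
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar):
    rec = ((recon - x) ** 2).mean()            # Gaussian reconstruction term
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```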
Deep neural networks (DNNs), enabled by massive open datasets like ImageNet, have produced impressive results in a wide range of fields and applications. ImageNet, a database of over 15 million high-resolution images categorized into 22,000 categories, has revolutionized the field of computer vision, with state-of-the-art models achieving 98% accuracy. However, this performance comes at a cost. Recent advances in adversarial machine learning have revealed inherent vulnerabilities in DNN-based models. Adversarial patches have been successfully used to disrupt the performance of artificial intelligence (AI) systems that leverage DNN-based computer vision models, but the trade space of these attacks is not fully understood; adversarial attack generation and validation methods are still nascent. In this paper we explore the generation and performance of synthetically trained attacks against models trained on real data such as MSCOCO, VIRAT, and VisDrone. Using a synthetic environment tool built on the Unreal Engine, we generate a synthetic dataset consisting of pedestrians and vehicles, train synthetic object detection models, and optimize adversarial patch attacks on the synthetic feature space of those models. We then apply our synthetic attacks to real image data and examine the efficacy of synthetic patch attacks against models trained on real-world image data. The implications of synthetically optimized attacks are broad: a much larger attack surface for DNN-based computer vision models, development of simulation-based validation pipelines, more effective attacks, and stronger defenses against adversarial examples.
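Patch optimization of this kind is typically a gradient-ascent loop on the victim detector's loss, as in the hedged sketch below; the fixed corner placement, patch size, and `model_loss_fn` hook are placeholders rather than the paper's pipeline, which optimizes patches against detectors trained on Unreal-rendered synthetic scenes.

```python
# Hedged sketch of adversarial patch optimization by gradient ascent on a detector's loss.
import torch

def optimize_patch(model_loss_fn, images, patch_size=64, steps=200, lr=0.01):
    """model_loss_fn(batch) -> detection loss to *increase*. images: (N, 3, H, W) in [0, 1]."""
    patch = torch.rand(3, patch_size, patch_size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        x = images.clone()
        x[:, :, :patch_size, :patch_size] = patch      # fixed corner placement for simplicity
        loss = -model_loss_fn(x)                       # ascend the detector's loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            patch.clamp_(0.0, 1.0)                     # keep the patch a valid image
    return patch.detach()
```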
Complex human events are high-level human activities that are composed of a set of interacting primitive human actions over time. Complex human event recognition is important for many applications, including security surveillance, healthcare, sports, and games. It requires recognizing not only the constituent primitive actions but also, more importantly, their long-range spatiotemporal interactions. To meet this requirement, we propose to exploit the self-attention mechanism in the Transformer to model and capture the long-range interactions among primitive actions. We further extend the conventional Transformer to a probabilistic Transformer in order to quantify the event recognition confidence and to detect anomalous events. Specifically, given a sequence of human 3D skeletons, the proposed model first performs primitive action localization and recognition. The recognized primitive human actions and their features are then fed into the probabilistic Transformer for complex human event recognition. By using a probabilistic attention score, the probabilistic Transformer can not only recognize complex events but also quantify its prediction uncertainty. Using the prediction uncertainty, we further propose to detect anomalous events in an unsupervised manner. We evaluate the proposed probabilistic Transformer on the FineDiving and Olympics Sports datasets for both complex event recognition and abnormal event detection; these datasets consist of complex events composed of primitive actions. The experimental results demonstrate the effectiveness and superiority of our method against baseline methods.
Human action recognition is important for many applications such as surveillance monitoring, safety, and healthcare. As 3D body skeletons can accurately characterize body actions and are robust to camera views, we propose a 3D skeleton-based human action recognition method. Different from existing skeleton-based methods that use only geometric features for action recognition, we propose a physics-augmented encoder-decoder model that produces physically plausible geometric features for human action recognition. Specifically, given the input skeleton sequence, the encoder performs a spatiotemporal graph convolution to produce spatiotemporal features for both predicting human actions and estimating the generalized positions and forces of body joints. The decoder, implemented as an ODE solver, takes the joint forces and solves the Euler-Lagrange equation to reconstruct the skeletons in the next frame. By training the model to simultaneously minimize the action classification and the 3D skeleton reconstruction errors, the encoder is ensured to produce features that are consistent with both body skeletons and the underlying body dynamics as well as being discriminative. The physics-augmented spatiotemporal features are used for human action classification. We evaluate the proposed method on NTU-RGB+D, a large-scale dataset for skeleton-based action recognition. Compared with existing methods, our method achieves higher accuracy and better generalization ability.
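For reference, the dynamics the decoder integrates can be written in the standard forced Euler-Lagrange form over generalized joint coordinates; the notation below is the generic textbook form, not necessarily the paper's exact parameterization.

```latex
% Forced Euler-Lagrange equations in generalized joint coordinates q, where \mathcal{L}
% is the Lagrangian, M(q) the inertia matrix, C(q, \dot{q}) the Coriolis/centripetal
% terms, g(q) the gravity term, and \tau the generalized joint forces.
\[
  \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{q}}
  - \frac{\partial \mathcal{L}}{\partial q} = \tau
  \qquad\Longleftrightarrow\qquad
  M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + g(q) = \tau
\]
```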
In this paper, we present a newly developed extended reality (XR) environment focused on qualitative and quantitative analysis and visualization of data on deforestation in urban areas and its impact on the area communities. The design and development process followed a user-centric approach that engaged researchers and practitioners. Using off-the-shelf technology such as Meta Quest headsets, the environment was developed in Unity, C#, and Python, and incorporates USGS GIS data layers. Other aspects such as affordability and accessibility were considered by acknowledging individuals with different learning styles and examining a new way to understand data. Compute and storage limitations imposed by the headset were overcome through data sampling and by offloading some of the computing tasks to a separate computer, with the synthesized results transmitted back to the headset. Initial experiments focused on the ingestion of New York City area data. The region was chosen due to its population density and the significant socio-economic disparities among various communities, but also due to the availability of ancillary data, such as that provided by NYC Open Data, that can be used to complement the USGS data. Urban and suburban areas were used to find indicators of vegetation and to learn about the challenges associated with developing spatial data at different densities. The visualization also showed that while changes in deforestation over the past decade have been fairly uniform in both area types, some sub-areas have seen a significant decrease in green space. While the current XR environment is envisioned as the first step in the creation of a virtual interactive interface that shows predictive models of urban deforestation, it already constitutes an example of an educational approach to XR development. The code and system description will be made publicly available as open source and will include mechanisms for community code contributions.
Reinforcement learning for agent autonomous actions requires many repetitive trials to succeed. The idea of this paper is to distribute the trials across a city-scale geospatial map. This has the advantage of providing a rationale for the trial-to-trial variance, because each location is slightly different. The technique can simultaneously train the agent and deduce where difficult and potentially dangerous intersections exist in the city. The concept is illustrated using readily available open-source tools.
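The scheme can be sketched in a few lines: each episode is run at a randomly sampled intersection, and per-intersection failure statistics are accumulated as a by-product of training. The environment factory and agent interface below are hypothetical placeholders, not the paper's open-source tooling.

```python
# Hypothetical sketch: distribute RL episodes across city intersections and rank
# intersections by empirical failure rate to flag potentially dangerous locations.
import random
from collections import defaultdict

def train_across_city(intersections, make_env, agent, episodes=1000):
    failures, trials = defaultdict(int), defaultdict(int)
    for _ in range(episodes):
        loc = random.choice(intersections)       # each trial lands at a different place
        env = make_env(loc)                      # slightly different geometry per location
        success = agent.run_episode(env)         # the agent learns during the episode
        trials[loc] += 1
        failures[loc] += 0 if success else 1
    return sorted(intersections,
                  key=lambda l: failures[l] / max(trials[l], 1),
                  reverse=True)
```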
From the point of view of satellite monitoring, construction objects include fixed objects of artificial origin (buildings and structures for various purposes) created from building materials and lying directly on the earth's surface. The life cycle of a construction object can be conditionally divided into initial, main, and final stages. From the point of view of satellite, ground, and aerial photography, it is possible to distinguish building objects at the initial, main, and final stages from one another. However, not all initial stages are visible in the images, and additional conditions are required to distinguish the final stages in the images (the same applies to the main stages). The current stage of the life cycle of a building object can be determined from various indications. The paper considers some features of deciphering construction objects at the initial and final stages of their life cycle, primarily those in an emergency or abandoned state. Relevant types of construction objects are identified, the structure of decoding signs is determined, decoding areas are established, and the signs themselves are derived for different construction objects. An experiment on the detection of abandoned construction sites using several databases is presented.