Simulation has become an important enabler in the development and testing of autonomous ground vehicles (AGVs), being used both to generate training data for AI/ML-based segmentation and classification algorithms and to enable in-the-loop testing of the AGV systems that use those algorithms. Furthermore, digital twins of physical test areas provide a safe, repeatable way to conduct critical safety and performance testing of these AI/ML algorithms and of their performance on AGV systems. For both these digital twins and the sensor models that use them to generate synthetic data, it is important to understand the relationship between the fidelity of the scene/model and the accuracy of the resulting synthetic sensor data. This work presents a quantitative evaluation of the relationship between digital scene fidelity, sensor model fidelity, and the quality of the resulting synthetic sensor data, with a focus on camera data typically used on AGVs to enable autonomous navigation.
Object detection in aerial images is a challenging task: some objects are only a few pixels wide, some are occluded, and some are in shade. With the cost of drones decreasing, the amount of aerial data is surging, so models that can extract valuable features from such data are increasingly useful. Convolutional neural networks (CNNs) are a useful tool for object detection and other machine learning applications, but they require labeled data for training and testing. In this work, we used a simulator to automatically generate labeled synthetic aerial imagery for training and testing machine learning algorithms. The synthetic aerial data was developed using a physics-based software tool called the Mississippi State University Autonomous Vehicle Simulator (MAVS). We generated a dataset of 871 aerial images at 640×480 resolution and implemented the Keras-RetinaNet framework with a ResNet-50 backbone for object detection; Keras-RetinaNet is a popular object detection model for aerial imagery. As a preliminary task, we detected buildings in the synthetic aerial imagery, and our results show a high mean average precision (mAP) of 77.99% using the state-of-the-art RetinaNet model.
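For context, a minimal inference sketch with the fizyr/keras-retinanet package is shown below, assuming a converted (inference-ready) model trained on the synthetic imagery; the file names and the 0.5 confidence threshold are illustrative assumptions, not the paper's exact setup.

    import numpy as np
    from keras_retinanet.models import load_model
    from keras_retinanet.utils.image import preprocess_image, read_image_bgr, resize_image

    # Load a converted (inference-ready) RetinaNet model with a ResNet-50 backbone.
    # The file names below are hypothetical placeholders.
    model = load_model('retinanet_buildings.h5', backbone_name='resnet50')

    image = read_image_bgr('aerial_0001.png')   # a 640x480 synthetic MAVS frame
    image = preprocess_image(image)             # backbone-specific normalization
    image, scale = resize_image(image)          # resize; returns the scale factor used

    boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
    boxes /= scale                              # map boxes back to the original image size

    for box, score, label in zip(boxes[0], scores[0], labels[0]):
        if score < 0.5:                         # illustrative confidence threshold
            break                               # detections are sorted by score
        print(label, score, box)                # with one class, label 0 is "building"

For training, the same package's CSV generator expects one annotation per line in the form image_path,x1,y1,x2,y2,class_name, which is convenient when bounding boxes are emitted automatically by a simulator.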
LiDAR-based 3D semantic segmentation is one of the most widely used perception methods to support scene understanding of self-driving vehicles. Most publicly available LiDAR datasets for driving scene segmentation, such as SemanticKITTI, nuScenes, and SemanticPOSS, provide only a single type of LiDAR configuration. Therefore, testing a trained model with a different channel configuration than the training dataset is sometimes inevitable in real-world applications. Despite the significance of this LiDAR channel mismatch problem in the machine learning pipeline, little research has focused on investigating the impact of the LiDAR configuration shift on a model’s test performance. This paper aims to provide referenceable baseline experiments for LiDAR configuration shifts. We explore the effect of using different LiDAR channels when training and testing a 3D LiDAR point cloud semantic segmentation model, utilizing Cylinder3D for the experiments. A Cylinder3D model is trained and tested on simulated 3D LiDAR point cloud datasets created using the Mississippi State University Autonomous Vehicle Simulator (MAVS) and on the 32- and 64-channel 3D LiDAR point clouds of the RELLIS-3D dataset, collected in a real-world off-road environment. Our experimental results demonstrate that sensor and spatial domain shifts significantly impact the performance of LiDAR-based semantic segmentation models. In the absence of spatial domain changes between training and testing, models trained and tested on the same sensor type generally exhibited better performance. Moreover, higher-resolution sensors showed improved performance compared to lower-resolution ones. However, results varied when spatial domain changes were present. In some cases, the advantage of a sensor’s higher resolution led to better performance both with and without sensor domain shifts. In other instances, the higher resolution resulted in overfitting within a specific domain, causing a lack of generalization capability and decreased performance when tested on data with different sensor configurations.
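As an illustration of the channel-mismatch setup, the sketch below shows one way to emulate a lower-channel sensor by subsampling the rings of a SemanticKITTI-style 64-channel scan. The uniform vertical beam spacing assumed here is a simplification; datasets that store ring indices directly should use those instead.

    import numpy as np

    def load_scan(path):
        """Load a SemanticKITTI-style .bin scan: N x (x, y, z, intensity)."""
        return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

    def estimate_rings(points, n_rings=64):
        """Assign each point to a ring (laser channel) by its elevation angle.
        Assumes roughly uniform vertical beam spacing; if a dataset stores
        ring indices directly, those should be preferred."""
        horizontal_range = np.linalg.norm(points[:, :2], axis=1)
        elevation = np.arctan2(points[:, 2], horizontal_range)
        edges = np.linspace(elevation.min(), elevation.max(), n_rings + 1)
        return np.clip(np.digitize(elevation, edges) - 1, 0, n_rings - 1)

    def emulate_32_channels(points):
        """Keep every other ring of a 64-channel scan to emulate a 32-channel sensor."""
        rings = estimate_rings(points, n_rings=64)
        return points[rings % 2 == 0]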
Failures of autonomous ground vehicles (AGVs) may be caused by many different factors in hardware, software, or integration. Effective safety and reliability testing for AGVs is complicated by the fact that failures are not only infrequent but also difficult to diagnose. In this work, we discuss the results of a three-phase project to develop a simulation-based approach to AGV architecture design, test implementation, and simulation integration. This approach features a modular AGV architecture, reliability testing with a physics-based simulator (the MSU Autonomous Vehicle Simulator, or MAVS), and validation with a limited number of field trials.
Autonomous driving in off-road environments is challenging because the terrain lacks a definite structure. Assessment of terrain traversability is the main factor in deciding the autonomous driving capability of a ground vehicle. In off-road environments, traversability is defined as the track on a trail that a given vehicle can drive. It is crucial for an autonomous ground vehicle (AGV) to avoid obstacles such as trees and boulders while traversing trails. This research has three main objectives: a) collection of 2D camera data in off-road/unstructured environments, b) annotation of the 2D camera data according to each vehicle's ability to drive through the trails, and c) application of a semantic segmentation algorithm to the labeled dataset to predict the trajectory based on the type of ground vehicle. Our models and labeled datasets will be publicly available.
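As a minimal sketch of objective (c), the snippet below fine-tunes an off-the-shelf segmentation network for per-pixel traversability labels. The DeepLabV3 architecture and the three-class label set are illustrative assumptions, since the abstract does not name the model used.

    import torch
    import torch.nn as nn
    from torchvision.models.segmentation import deeplabv3_resnet50

    # Hypothetical label set: traversable by all vehicles, traversable only by
    # larger vehicles, and non-traversable obstacle.
    NUM_CLASSES = 3

    model = deeplabv3_resnet50(num_classes=NUM_CLASSES)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()

    def train_step(images, labels):
        """images: (B, 3, H, W) float tensor; labels: (B, H, W) long tensor."""
        model.train()
        optimizer.zero_grad()
        logits = model(images)['out']   # per-pixel class scores, (B, NUM_CLASSES, H, W)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        return loss.item()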
Autonomous navigation (also known as self-driving) has rapidly advanced in the last decade for on-road vehicles. In contrast, off-road vehicles still lag in autonomous navigation capability. Sensing and perception strategies used successfully in on-road driving fail in the off-road environment. This is because on-road environments can often be neatly categorized both semantically and geometrically into regions like driving lane, road shoulder, and passing lane and into objects like stop sign or vehicle. The off-road environment is neither semantically nor geometrically tidy, leading to difficulty not only in developing perception algorithms that can distinguish drivable from non-drivable regions, but also in determining what constitutes "drivable" for a given vehicle. In this work, the factors affecting traversability are discussed, and an algorithm for assessing the traversability of off-road terrain in real time is developed and presented. The predicted traversability is compared to ground-truth traversability metrics in simulation. Finally, we show how this traversability metric can be automatically calculated by using physics-based simulation with the MSU Autonomous Vehicle Simulator (MAVS). A simulated off-road autonomous navigation task using a real-time implementation of the traversability metric is presented, highlighting the utility of this approach.
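To make the idea concrete, the sketch below computes a simple grid-based traversability score from the slope and roughness of a terrain height map. The cost formulation and thresholds are illustrative assumptions, not the calibrated metric developed in the paper.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def traversability(height_map, cell_size, max_slope_deg=20.0, max_rough_m=0.15):
        """Grid-based traversability score in [0, 1], where 1 is fully traversable.
        height_map: 2D array of terrain heights (m); cell_size: grid spacing (m).
        The thresholds are illustrative, not the paper's calibrated values."""
        gy, gx = np.gradient(height_map, cell_size)
        slope_deg = np.degrees(np.arctan(np.hypot(gx, gy)))
        # Roughness: deviation from the local mean height over a 3x3 neighborhood.
        roughness = np.abs(height_map - uniform_filter(height_map, size=3))
        slope_cost = np.clip(slope_deg / max_slope_deg, 0.0, 1.0)
        rough_cost = np.clip(roughness / max_rough_m, 0.0, 1.0)
        return 1.0 - np.maximum(slope_cost, rough_cost)

Taking the maximum of the two costs reflects the intuition that a cell is only as traversable as its worst property; a smooth but steep slope is as risky as a flat but rough patch.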
Semantic segmentation using convolutional neural networks is a trending technique in scene understanding. Because these techniques are data-intensive, many devices struggle to store and process even a small batch of images at a time. Moreover, since the volume of training data required by these algorithms is very high, it can be wise to store datasets in compressed form, and images may also need to be compressed before transmission to accommodate the limited bandwidth of the network. JPEG (Joint Photographic Experts Group) is a widely used technique for image compression; however, it introduces several unwanted artifacts into compressed images. In this paper, we explore the effect of JPEG compression on the performance of several deep-learning-based semantic segmentation techniques, for both synthetic and real-world datasets, at various compression levels. For some established architectures trained with compressed synthetic and real-world datasets, we observed performance equivalent to (and sometimes better than) that obtained with the uncompressed datasets, with a substantial reduction in storage space. We also analyzed the effect of combining the original dataset with datasets compressed at different JPEG quality levels and observed a performance improvement over the baseline. Our evaluation and analysis indicate that a segmentation network trained on compressed data can be a better option in terms of performance. We also show that JPEG compression can act as a data augmentation technique, improving the performance of semantic segmentation algorithms.
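A minimal sketch of JPEG compression used as an on-the-fly augmentation is shown below; the particular quality levels are illustrative, whereas the paper sweeps its own set of levels.

    import io
    import random
    from PIL import Image

    def jpeg_augment(image, qualities=(10, 30, 50, 70, 90)):
        """Re-encode a PIL image at a randomly chosen JPEG quality level.
        The quality set here is illustrative; the paper sweeps its own levels."""
        quality = random.choice(qualities)
        buffer = io.BytesIO()
        image.convert('RGB').save(buffer, format='JPEG', quality=quality)
        buffer.seek(0)
        return Image.open(buffer).convert('RGB')

Applied during training, each epoch sees a differently compressed version of each image, which is what lets the compression double as augmentation rather than a fixed preprocessing step.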
For autonomous vehicles, 3D rotating LiDAR sensors are often critically important to the vehicle's ability to sense its environment. Generally, these sensors scan their environment using multiple laser beams to gather information about the range and the intensity of the reflection from an object. LiDAR capabilities have evolved such that some autonomous systems employ multiple rotating LiDARs to gather greater amounts of data about the vehicle's surroundings. For these multi-LiDAR systems, the placement of the sensors determines the density of the combined point cloud. We performed preliminary research on the optimal LiDAR placement strategy for an off-road autonomous vehicle known as the Halo project. We used the Mississippi State University Autonomous Vehicle Simulator (MAVS) to generate large amounts of labeled LiDAR data for training and evaluating a neural network used to process LiDAR data on the vehicle. The trained networks were evaluated, and their performance metrics were used to characterize the performance of each sensor pose. Data generation, training, and evaluation were performed iteratively to conduct a parametric analysis of the effectiveness of various LiDAR poses in the multi-LiDAR system. We also describe and evaluate the intrinsic and extrinsic calibration methods applied in the multi-LiDAR system. In conclusion, we found that our simulations are an effective way to evaluate the efficacy of various LiDAR placements, based on the performance of the neural network used to process the data and the density of the point cloud in areas of interest.
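As a sketch of how the clouds from a multi-LiDAR rig are fused after extrinsic calibration, the snippet below applies each sensor's 4x4 extrinsic transform to bring its points into a common vehicle frame; the function names and calling convention are assumptions for illustration.

    import numpy as np

    def to_vehicle_frame(points_xyz, extrinsic):
        """Map an (N, 3) cloud from a sensor frame into the vehicle frame.
        extrinsic: 4x4 homogeneous transform obtained from extrinsic calibration."""
        homogeneous = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])
        return (homogeneous @ extrinsic.T)[:, :3]

    def merge_clouds(clouds, extrinsics):
        """Fuse per-sensor clouds into a single combined point cloud."""
        return np.vstack([to_vehicle_frame(c, T) for c, T in zip(clouds, extrinsics)])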