From Event: SPIE Defense + Commercial Sensing, 2023
Large and diverse datasets can now be simulated with associated ground truth to train and evaluate artificial intelligence (AI) and machine learning (ML) algorithms. This convergence of readily accessible simulation (SIM) tools, real-time high-performance computing, and large repositories of high-quality, free-to-inexpensive photorealistic scanned assets is a potential AI/ML game changer. While this feat is now within our grasp, open questions remain: what SIM data should be generated, how should it be generated, and how can this be achieved in a controlled and scalable fashion? First, we discuss a formal procedural language for specifying scenes (LSCENE) and collecting sampled datasets (LCAP). Second, we discuss specifics regarding our production and storage of data, ground truth, and metadata. Last, two LSCENE/LCAP examples are discussed and three unmanned aerial vehicle (UAV) AI/ML use cases are provided to demonstrate the range and behavior of the proposed ideas. Overall, this article is a step towards closed-loop automated AI/ML design and evaluation.
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jeffrey Kerley, Derek T. Anderson, Brendan Alvey, and Andrew Buck, "How should simulated data be collected for AI/ML and unmanned aerial vehicles?," Proc. SPIE 12529, Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 125290J (Presented at SPIE Defense + Commercial Sensing: May 02, 2023; Published: 13 June 2023); https://doi.org/10.1117/12.2663717.