From Event: SPIE Defense + Commercial Sensing, 2019
In this work, we aim to address the needs of human analysts who must consume and exploit data from a growing number of overhead imaging sensors. We have investigated automatic captioning methods that describe and summarize scenes and activities in overhead full motion video (FMV) using natural-language text. We have integrated methods that provide three types of output: (1) summaries of short video clips; (2) semantic maps, in which each pixel is labeled with a semantic category; and (3) dense object descriptions that capture object attributes and activities. We show results obtained on the publicly available VIRAT and Aeroscapes datasets.
© (2019) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Marc Bosch, Christopher Gifford, Agata Ciesielski, Scott Almes, Rachel Ellison, and Gordon Christie, "Captioning of full motion video from unmanned aerial platforms," Proc. SPIE 10992, Geospatial Informatics IX, 1099202 (Presented at SPIE Defense + Commercial Sensing: April 15, 2019; Published: 13 May 2019); https://doi.org/10.1117/12.2518163.