Transforming a research-oriented dataset for evaluation of tactical information extraction technologies
12 May 2016
The most representative and accurate data for testing and evaluating information extraction technologies is real-world data. Real-world operational data can provide important insights into human and sensor characteristics, interactions, and behavior. However, several challenges limit the feasibility of experimenting with real-world operational data. Real-world data lacks precise knowledge of the “ground truth,” a critical factor for benchmarking progress in the development of automated information processing technologies. Additionally, the use of real-world data is often limited by classification restrictions arising from the methods of collection, the procedures for processing, and tactical sensitivities related to the sources, events, or objects of interest. These challenges, along with the growing development of automated information extraction technologies, are fueling an emerging demand for operationally realistic datasets for benchmarking. One approach to meeting this demand is to create synthetic datasets that are operationally realistic yet unclassified in content. Because these synthetic datasets are unclassified, they can be shared between military and academic researchers, thereby supporting coordinated testing efforts. This paper describes the expansion and augmentation of two synthetic text datasets, one of which was initially developed through academic research collaborations with the Army. Both datasets feature simulated tactical intelligence reports regarding fictitious terrorist activity occurring within a counterinsurgency (COIN) operation. The datasets were expanded and augmented to create two military-relevant datasets. The first resulting dataset was created by augmenting and merging the two original datasets into a single, larger dataset containing ground truth. The second resulting dataset was restructured to represent the format and content of intelligence reports more realistically.
The dataset transformation effort, the final datasets, and their applicability for research are presented.
© (2016) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Heather Roy, Sue E. Kase, and Joanne Knight "Transforming a research-oriented dataset for evaluation of tactical information extraction technologies", Proc. SPIE 9851, Next-Generation Analyst IV, 98510O (12 May 2016);