Within operational environments decisions must be made quickly based on the information available. Identifying an appropriate knowledge base and accurately formulating a search query are critical tasks for decision-making effectiveness in dynamic situations. The spreading of graph data management tools to access large graph databases is a rapidly emerging research area of potential benefit to the intelligence community. A graph representation provides a natural way of modeling data in a wide variety of domains. Graph structures use nodes, edges, and properties to represent and store data. This research investigates the advantages of information search by graph query initiated by the analyst and interactively refined within the contextual dimensions of the answer space toward a solution. The paper introduces SLQ, a user-friendly graph querying system enabling the visual formulation of schemaless and structureless graph queries. SLQ is demonstrated with an intelligence analyst information search scenario focused on identifying individuals responsible for manufacturing a mosquito-hosted deadly virus. The scenario highlights the interactive construction of graph queries without prior training in complex query languages or graph databases, intuitive navigation through the problem space, and visualization of results in graphical format.
KEYWORDS: Sensors, Computer simulations, Data processing, Information technology, Analytical research, Tactical intelligence, Signals intelligence, Military intelligence, Alternate lighting of surfaces, Social network analysis
The most representative and accurate data for testing and evaluating information extraction technologies is real-world data. Real-world operational data can provide important insights into human and sensor characteristics, interactions, and behavior. However, several challenges limit the feasibility of experimentation with real-world operational data. Realworld data lacks the precise knowledge of a “ground truth,” a critical factor for benchmarking progress of developing automated information processing technologies. Additionally, the use of real-world data is often limited by classification restrictions due to the methods of collection, procedures for processing, and tactical sensitivities related to the sources, events, or objects of interest. These challenges, along with an increase in the development of automated information extraction technologies, are fueling an emerging demand for operationally-realistic datasets for benchmarking. An approach to meet this demand is to create synthetic datasets, which are operationally-realistic yet unclassified in content. The unclassified nature of these unclassified synthetic datasets facilitates the sharing of data between military and academic researchers thus increasing coordinated testing efforts. This paper describes the expansion and augmentation of two synthetic text datasets, one initially developed through academic research collaborations with the Army. Both datasets feature simulated tactical intelligence reports regarding fictitious terrorist activity occurring within a counterinsurgency (COIN) operation. The datasets were expanded and augmented to create two military relevant datasets. The first resulting dataset was created by augmenting and merging the two to create a single larger dataset containing ground-truth. The second resulting dataset was restructured to more realistically represent the format and content of intelligence reports. The dataset transformation effort, the final datasets, and their applicability for research are presented.