Many approaches in software analysis, particularly dynamic malware analysis, benefit greatly from the use of linked data and
other Semantic Web technologies. In this paper, we describe AIS, Inc.’s Semantic Extractor (SemEx) component from the
Malware Analysis and Attribution through Genetic Information (MAAGI) effort, funded under DARPA’s Cyber Genome
program. The SemEx generates OWL-based semantic models of high- and low-level behaviors in malware samples from
system call traces produced by AIS’s introspective hypervisor, IntroVirt™. Within MAAGI, these semantic models were
used by modules that cluster malware samples by functionality and construct “genealogical” malware lineages. Herein, we
describe the design, implementation, and use of the SemEx, as well as the C2DB, an OWL ontology used for representing
software behavior and cyber-environments.