3 May 2017 Human-machine interaction to disambiguate entities in unstructured text and structured datasets
Author Affiliations +
Abstract
Creating entity network graphs is a manual, time consuming process for an intelligence analyst. Beyond the traditional big data problems of information overload, individuals are often referred to by multiple names and shifting titles as they advance in their organizations over time which quickly makes simple string or phonetic alignment methods for entities insufficient. Conversely, automated methods for relationship extraction and entity disambiguation typically produce questionable results with no way for users to vet results, correct mistakes or influence the algorithm’s future results. We present an entity disambiguation tool, DRADIS, which aims to bridge the gap between human-centric and machinecentric methods. DRADIS automatically extracts entities from multi-source datasets and models them as a complex set of attributes and relationships. Entities are disambiguated across the corpus using a hierarchical model executed in Spark allowing it to scale to operational sized data. Resolution results are presented to the analyst complete with sourcing information for each mention and relationship allowing analysts to quickly vet the correctness of results as well as correct mistakes. Corrected results are used by the system to refine the underlying model allowing analysts to optimize the general model to better deal with their operational data. Providing analysts with the ability to validate and correct the model to produce a system they can trust enables them to better focus their time on producing higher quality analysis products.
© (2017) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Kevin Ward, Jack Davenport, "Human-machine interaction to disambiguate entities in unstructured text and structured datasets", Proc. SPIE 10207, Next-Generation Analyst V, 102070I (3 May 2017); doi: 10.1117/12.2265825; https://doi.org/10.1117/12.2265825
PROCEEDINGS
7 PAGES


SHARE
Back to Top