Currently, obtaining reliable situational awareness of the social landscape is an arduous, lengthy process that relies on manual analyses by social scientists. These traditional methods do not scale to the speed and diversity required by DoD operations or by the fast-paced, international business model of today’s corporate environment. Conversely, “big data” approaches scale easily to meet these challenges but lack the rigor of social-science theory. We present Big Open-Source Social Science (BOSSS), a research and development project that combines the strengths of social science and computer science to address the operational need for rapid, reliable situational awareness of the human landscape. BOSSS iteratively filters, navigates, and summarizes diverse open-source data to characterize a local population’s social structure, conflicts, cleavages, affinities, and animosities. BOSSS automatically scrapes open-access data from the web and applies natural language processing to populate a knowledge graph with a custom schema. BOSSS then mines the graph to extract key, theory-agnostic social-science principles of human interrelations and dynamics: homophily, stratification, sentiment, and conflict. Automated quantitative social-network analysis provides up-to-date indicators of trends or anomalies in the local population’s social landscape. BOSSS’s emerging technology will dramatically reduce the cognitive workload of the next generation of analysts and will enable more rapid situational awareness, both for deployed soldiers and for private companies operating abroad.
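To illustrate one of the principles listed above, homophily (the tendency of ties to form between similar actors) can be quantified on a mined graph. One approach compares the observed share of same-group ties with the share expected if ties ignored group membership. The sketch below is not the BOSSS implementation. It assumes the knowledge graph has already been reduced to a hypothetical edge list plus a per-actor attribute map, and the function name `homophily` is illustrative.

```python
from collections import Counter


def homophily(edges, group):
    """Return (observed, expected) share of ties that join same-group actors.

    edges: list of (u, v) pairs, one per undirected tie between actors.
    group: dict mapping each actor to a categorical attribute (hypothetical).
    """
    if not edges:
        raise ValueError("need at least one edge")
    # Observed: fraction of ties whose endpoints share the attribute.
    same = sum(1 for u, v in edges if group[u] == group[v])
    observed = same / len(edges)
    # Expected under group-blind tie formation, approximated from each
    # group's share of actors (pairs drawn with replacement).
    counts = Counter(group.values())
    n = len(group)
    expected = sum((c / n) ** 2 for c in counts.values())
    return observed, expected
```

For example, with actors a and b in group X, c and d in group Y, and ties (a, b), (c, d), (a, c), the observed share is 2/3 against an expected 0.5. An observed share above the expected share suggests homophily along that attribute.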
Personalized search is a potentially powerful tool; however, its usefulness is limited because a person has many roles: parent, employee, consumer, and so on. We present the role-relevance algorithm, a search technique that favors results relevant to the user’s current role. The role-relevance algorithm scores documents on three factors: (1) the number of keywords each document contains; (2) each document’s geographic relevance to the user’s role (if applicable); and (3) each document’s topical relevance to the user’s role (if applicable). On a pre-labeled corpus, the algorithm improves search precision by approximately 20% on average compared with keyword search alone. We also discuss several extensions to the algorithm.
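A minimal sketch of how the three factors might be combined into a single score, assuming a weighted sum. The abstract does not specify the combination rule, the geographic test, or the topical measure. Here geographic relevance is a hypothetical exact match of the document’s location against the role’s locations, topical relevance is the Jaccard overlap of topic sets, and the weights `w_kw`, `w_geo`, `w_topic` and the `Role` and `Document` types are illustrative assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Role:
    """A user role; a None field means that factor is not applicable."""
    name: str
    locations: Optional[Set[str]] = None
    topics: Optional[Set[str]] = None


@dataclass
class Document:
    text: str
    location: Optional[str] = None
    topics: Set[str] = field(default_factory=set)


def role_relevance_score(doc, keywords, role, w_kw=1.0, w_geo=1.0, w_topic=1.0):
    """Hypothetical weighted sum of the three role-relevance factors."""
    tokens = set(re.findall(r"\w+", doc.text.lower()))
    # (1) Number of query keywords the document contains.
    score = w_kw * sum(1 for k in keywords if k.lower() in tokens)
    # (2) Geographic relevance: 1 if the document's location is one of the role's.
    if role.locations is not None:
        score += w_geo * (1.0 if doc.location in role.locations else 0.0)
    # (3) Topical relevance: Jaccard overlap of document and role topics.
    if role.topics is not None:
        union = doc.topics | role.topics
        if union:
            score += w_topic * len(doc.topics & role.topics) / len(union)
    return score


def rank(docs, keywords, role):
    """Return documents sorted from most to least role-relevant."""
    return sorted(docs, key=lambda d: role_relevance_score(d, keywords, role), reverse=True)


# Illustrative usage with made-up documents.
parent = Role("parent", locations={"Springfield"}, topics={"schools", "childcare"})
docs = [
    Document("Quarterly earnings report for school supply retailers", "New York", {"finance"}),
    Document("Springfield school board meeting on childcare funding", "Springfield", {"schools"}),
]
ranked = rank(docs, ["school"], parent)
```

With keyword matching alone, both documents score 1.0. For the parent role, the geographic and topical factors raise the Springfield document to 2.5 and rank it first. A factor that does not apply (a `None` field on the role) contributes nothing, mirroring the “(if applicable)” qualifier on factors (2) and (3).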