Artificial intelligence (AI) has enormous potential for military applications. Fully realizing the conceived benefits of AI requires effective interactions among Soldiers and computational agents in highly uncertain and unconstrained operational environments. Because AI can be complex and unpredictable, computational agents should support their human teammates by adapting their behavior to the human's elected strategy for a given task, facilitating mutually adaptive behavior within the team. While some situations entail explicit and easy-to-understand human top-down strategies, more often than not, human strategies are implicit, ad hoc, exploratory, and difficult to describe. To facilitate mutually adaptive human-agent team behavior, computational teammates must identify human strategies and adapt their behaviors to support them with little or no a priori experience. This challenge may be addressed by training learning agents with examples of successful group strategies. Therefore, this paper focuses on an algorithmic approach to extract group strategies from multi-agent teaming behaviors in a game-theoretic environment: predator-prey pursuit. Group strategies are illuminated with a new method inspired by graph theory. This method treats agents as vertices to generate a time series of group dynamics and analytically compares time-series segments to identify coordinated group behaviors. Ultimately, this approach may lead to the design of agents that can recognize and align with strategies implicitly adopted by human teammates. This work can substantially advance the field of human-agent teaming by facilitating natural interactions within heterogeneous teams.
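As a concrete illustration of this graph-based idea, the sketch below treats agent positions as vertices of a complete weighted graph, summarizes each timestep with a scalar graph statistic, and compares sliding windows of the resulting time series. The summary statistic (mean pairwise distance), window length, and similarity measure are illustrative assumptions, not the exact quantities used in the paper.

```python
import numpy as np

def group_dynamics_series(positions):
    """positions: array of shape (T, n_agents, 2) of agent coordinates.

    Treat agents as vertices of a complete graph whose edge weights are
    pairwise distances; summarize each timestep by the mean edge weight.
    (The summary statistic is an illustrative choice.)
    """
    T, n, _ = positions.shape
    series = np.empty(T)
    iu = np.triu_indices(n, k=1)  # upper-triangle edge indices
    for t in range(T):
        diffs = positions[t, :, None, :] - positions[t, None, :, :]
        dists = np.linalg.norm(diffs, axis=-1)  # n x n distance matrix
        series[t] = dists[iu].mean()
    return series

def segment_similarity(series, win=50, step=25):
    """Compare sliding windows of the dynamics series via Pearson
    correlation; highly similar segment pairs suggest recurring
    coordinated behaviors."""
    segs = [series[i:i + win] for i in range(0, len(series) - win + 1, step)]
    k = len(segs)
    sim = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            sim[i, j] = sim[j, i] = np.corrcoef(segs[i], segs[j])[0, 1]
    return sim
```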
In recent work, we utilized convergent cross mapping (CCM) to quantify coordination in a multi-agent reinforcement learning (MARL) paradigm by measuring causal influence between pairs of agents in a joint task. CCM was originally developed to detect causal influences within ecological systems, and as we previously demonstrated, it can be used to measure causal dependencies between pairs of time-series data. While this work has provided important insight into the coordination between two teammates, it is not clear how such coordination scales with the number of agents working toward a shared goal. Within a predator-prey pursuit environment, the current study investigates how incrementally increasing the number of predator agents affects the inherent causal relationship between predators working together to pursue a single prey. We hypothesize that averaged CCM values will decrease as the number of predators increases, due to a redistribution of coordination across all predator agents. This work provides a quantitative assessment of the fundamental influence that the number of cooperative agents has on the causal relationship between agents working together on a joint task, and offers insight into coordinated group behaviors.
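For readers unfamiliar with CCM, the following minimal sketch cross-maps one agent's trajectory from the time-delay embedding (shadow manifold) of another's, following the standard procedure; the embedding dimension E, delay tau, and exponential neighbor weighting are illustrative defaults rather than the settings used in this study.

```python
import numpy as np

def time_delay_embed(x, E=3, tau=1):
    """Shadow-manifold reconstruction: row t is the delay vector
    [x_{t+(E-1)tau}, x_{t+(E-2)tau}, ..., x_t]."""
    T = len(x) - (E - 1) * tau
    return np.column_stack([x[(E - 1 - i) * tau:(E - 1 - i) * tau + T]
                            for i in range(E)])

def ccm_skill(x, y, E=3, tau=1):
    """Estimate how well the shadow manifold of y cross-maps x.

    Nonzero skill (correlation of cross-mapped vs. actual x) is read as
    evidence that x causally influences y.
    """
    My = time_delay_embed(y, E, tau)
    xt = x[(E - 1) * tau:]             # align x with the embedding
    preds = np.empty(len(My))
    for t in range(len(My)):
        d = np.linalg.norm(My - My[t], axis=1)
        d[t] = np.inf                  # exclude the self-match
        nbrs = np.argsort(d)[:E + 1]   # E+1 nearest neighbors
        w = np.exp(-d[nbrs] / max(d[nbrs][0], 1e-12))
        preds[t] = np.sum(w * xt[nbrs]) / w.sum()
    return np.corrcoef(preds, xt)[0, 1]
```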
While deep reinforcement learning techniques have produced agents that can successfully learn a number of tasks that were previously unlearnable, these techniques are still susceptible to the longstanding problem of reward sparsity. This is especially true for tasks such as training an agent to play StarCraft II, a real-time strategy game in which reward is given only at the end of a typically very long game. While this problem can be addressed through reward shaping, such approaches typically require a human expert with specialized knowledge. Inspired by the vision of enabling reward shaping through the more accessible paradigm of natural-language narration, we investigate to what extent we can contextualize these narrations by grounding them to goal-specific states. We present a mutual-embedding model using a multi-input deep neural network that projects a sequence of natural-language commands into the same high-dimensional representation space as the corresponding goal states. We show that with this model we can learn an embedding space with separable and distinct clusters that accurately maps natural-language commands to corresponding game states. We also discuss how this model can allow narrations to serve as a robust form of reward shaping to improve RL performance and efficiency.
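A minimal sketch of such a mutual-embedding model is shown below, assuming a GRU text encoder, an MLP state encoder, and a simple contrastive pairing loss over matched command/state pairs; the layer sizes, encoder choices, and loss are illustrative stand-ins, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MutualEmbedding(nn.Module):
    """Two-branch network projecting language commands and game states
    into a shared embedding space (illustrative architecture)."""
    def __init__(self, vocab_size, state_dim, embed_dim=128):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, 64)
        self.text_rnn = nn.GRU(64, embed_dim, batch_first=True)
        self.state_net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim))

    def forward(self, tokens, states):
        # tokens: (B, L) token ids; states: (B, state_dim) features
        _, h = self.text_rnn(self.token_embed(tokens))
        text_vec = nn.functional.normalize(h[-1], dim=-1)
        state_vec = nn.functional.normalize(self.state_net(states), dim=-1)
        return text_vec, state_vec

def pairing_loss(text_vec, state_vec, margin=0.5):
    """Pull matched command/state pairs together and push mismatched
    pairs apart (a simple margin-based contrastive objective)."""
    sim = text_vec @ state_vec.T                      # cosine similarities
    pos = sim.diag()                                  # matched pairs
    neg = sim - torch.eye(len(sim), device=sim.device) * 1e9  # mask diagonal
    return torch.relu(margin - pos[:, None] + neg).mean()
```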