Heterogeneity and Uncertainty in Perimeter Defense

Surveilling and securing perimeters are important tasks in civilian and military defense applications, and it has become practical to deploy large numbers of autonomous agents to address these problems with multi-robot systems.

A recent paper by members of the DCIST alliance formulates this scenario as a variant of multi-player pursuit-evasion games, in which intruders must be intercepted before they can reach a region of interest. Finding optimal strategies in such a game is challenging due to the high dimensionality of the joint state space. To make the analysis tractable, we developed a decomposition method that reduces the problem to an efficient combinatorial optimization problem [1]. We derived a set of intruder and defender team strategies and proved their optimality in the sense that they form a Nash equilibrium. In addition, our algorithm finds an upper bound on the number of intrusions, which the defender team can certify at the beginning of a game.
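To illustrate the combinatorial core of such a decomposition, consider assigning defenders to intruders so as to minimize total interception cost. The sketch below is purely illustrative (the cost matrix and the brute-force matching are our own simplification, not the paper's algorithm):

```python
from itertools import permutations

def min_cost_assignment(cost):
    """Brute-force minimum-cost defender-to-intruder assignment.
    cost[i][j] = cost for defender i to intercept intruder j (square matrix).
    Returns (best_total_cost, assignment), where assignment[i] is the
    intruder assigned to defender i."""
    n = len(cost)
    best = (float("inf"), None)
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best[0]:
            best = (total, perm)
    return best

# Toy example: 3 defenders, 3 intruders, illustrative arc-length costs.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
total, assignment = min_cost_assignment(cost)
```

For larger teams, a polynomial-time assignment solver (e.g., the Hungarian algorithm) would replace the brute-force search over permutations.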

Highlight Video: https://youtu.be/6zUPkzh_iPU

The performance certified by the defender team can also be used to inform higher-level coalition-forming algorithms. In an extension of the above work, we considered a scenario where defensive agents have limited sensor footprints, and so they deploy both defenders, which intercept along the perimeter, and patrollers, which scout outside the defended region. In this context, the problems of team formation and task allocation were studied [2]. In particular, we consider how uncertainty in the environment or in opponent capabilities introduces uncertainty in the game parameters (speed advantage, detection range, and capture range). When designing a team to satisfy a task specification, species may be selected to complement each other not only with respect to trait specialization under a known environment but also with respect to their complementary variability when the environment (as modeled by the trait parameters) is uncertain.

Related Publications:
[1] D. Shishika, J. Paulos, and V. Kumar. “Cooperative team strategies for multi-player perimeter-defense games,” Accepted to IEEE Robotics and Automation Letters, available at https://arxiv.org/abs/1912.04342.
[2] D. Shishika, J. Paulos, M. R. Dorothy, M. A. Hsieh, and V. Kumar. “Team Composition for Perimeter Defense with Patrollers and Defenders,” IEEE Conference on Decision and Control, pp. 7325-7332, France, 2019.

Tasks:
RA2.A1: Abstraction of Task Diversity
RA2.A3: Composable Autonomy in Heterogeneous Groups

Points of Contact:
Vijay Kumar (PI), Daigo Shishika, James Paulos, Douglas Macharet


Human Information Processing in Complex Networks

In this work, we study the structure of real-world communication systems to understand how information can be rapidly and efficiently communicated to humans, for example from swarms of drones or other agents. Humans constantly receive information from systems of interconnected stimuli or concepts — from language and music to literature and science — yet it remains unclear how, if at all, the structure of these networks supports the communication of information. Although information theory provides tools to quantify the information produced by a system, traditional metrics do not account for the inefficient ways that humans process this information.

Here we develop an analytical framework to study the information generated by a system as perceived by a human observer. We demonstrate experimentally that this perceived information depends critically on a system’s network topology. Applying our framework to several real networks, we find that they communicate a large amount of information (having high entropy) and do so efficiently (maintaining low divergence from human expectations). Moreover, we show that such efficient communication arises in networks that are simultaneously heterogeneous, with high-degree hubs, and clustered, with tightly-connected modules — the two defining features of hierarchical organization. Together, these results suggest that rapid and efficient communication is constrained by the structural properties of communication systems, with implications for the design of optimal channels for robot-human and human-human information transmission.
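A minimal sketch of the entropy side of this framework, assuming information is generated by a random walk on the network (the paper's perceived-information measure additionally models human memory errors, which this sketch omits):

```python
import math

def transition_probs(adj):
    """Row-normalize an adjacency matrix into random-walk transition probabilities."""
    return [[a / sum(row) for a in row] for row in adj]

def entropy_rate(adj):
    """Entropy rate (bits per step) of a random walk on an undirected graph,
    weighting each node's local entropy by its stationary probability
    pi_i = degree_i / (2 * number_of_edges)."""
    P = transition_probs(adj)
    deg = [sum(row) for row in adj]
    total = sum(deg)
    H = 0.0
    for i, row in enumerate(P):
        pi = deg[i] / total
        H -= pi * sum(p * math.log2(p) for p in row if p > 0)
    return H

# Complete graph on four nodes: every step offers three equally likely options.
k4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
H = entropy_rate(k4)
```

For the complete graph on four nodes, the entropy rate equals log2(3), since each step chooses uniformly among three neighbors.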

Source: Lynn, C. W., Papadopoulos, L., Kahn, A. E., and Bassett, D. S. “Human information processing in complex networks.” In revision, Nature Physics <arxiv.org/abs/1906.00926>.

Points of contact: Danielle S. Bassett (PI) and Christopher W. Lynn.

Resilient Active Information Acquisition with Mobile Robots

In the future, teams of heterogeneous robots will operate in unknown and adversarial environments. In failure-prone or adversarial settings, resilience is crucial to ensuring that the robots can complete their mission. Mission resilience to robot failures, sensor attacks, or communication disruptions is currently an afterthought, leading to over-provisioned designs. DCIST researchers are leading the research community by designing resilient algorithms for distributed planning and control that embrace, from the outset, the possibility of failures or attacks. Resilience is defined as the robot team’s ability to recover from failures or attacks on members of the team while completing the task.

A recent paper by members of the DCIST alliance develops efficient algorithms for information acquisition with teams of mobile robots that must remain resilient while maintaining situational awareness. The work develops a computationally efficient algorithm, with linear complexity in the number of robots, that achieves a provable approximation guarantee for information-gathering tasks such as active mapping and tracking moving targets in attack-prone environments. A key mathematical property used in the analysis, common to many information measures, is submodularity: a diminishing-returns property of information gain. Experiments indicate that the algorithm leads to superior target-tracking performance across a variety of robot team sizes, numbers of targets, and attacks.
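The diminishing-returns structure can be illustrated with a toy coverage objective. The greedy sketch below is a generic submodular-maximization example, not the paper's resilient algorithm (which additionally guards against worst-case removals of robots):

```python
def coverage(selected, targets_covered_by):
    """Submodular coverage objective: number of distinct targets covered.
    Adding a robot never helps more later than it would have helped earlier."""
    covered = set()
    for s in selected:
        covered |= targets_covered_by[s]
    return len(covered)

def greedy_select(candidates, targets_covered_by, k):
    """Greedy maximization; for monotone submodular objectives this
    achieves the classic (1 - 1/e) approximation guarantee."""
    selected = []
    for _ in range(k):
        best = max((c for c in candidates if c not in selected),
                   key=lambda c: coverage(selected + [c], targets_covered_by))
        selected.append(best)
    return selected

# Toy example: which targets each candidate robot can observe.
targets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {5}}
team = greedy_select(["A", "B", "C"], targets, k=2)
```

Here robot A is picked first (it covers three targets); B and C then both add only one new target each, illustrating the diminishing returns that the analysis exploits.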


Photo:
A resilient active target tracking problem with a three-robot team. The compromised robot is highlighted in red, the remaining robots in blue, and the target uncertainty in cyan.

Schlotfeldt, B., Tzoumas, V., Thakur, D., & Pappas, G. J. (2018, October). Resilient active information gathering with mobile robots. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 4309-4316). IEEE.
Task: RA3.C1
Points of Contact: George J Pappas

Finite-Time Performance of Distributed Temporal Difference Learning with Linear Function Approximation

While distributed reinforcement learning (RL) has emerged as one of the important paradigms in distributed control, we are only beginning to understand the fundamental behavior of these algorithms. Two recent papers from the DCIST alliance provide important progress in this direction.

In the multi-agent policy evaluation problem, a group of agents operate in a common environment under a fixed control policy and work together to discover the value (the global discounted cumulative reward) associated with each environmental state. Over a series of time steps, the agents act, receive rewards, update their local estimates of the value function, and then communicate with their neighbors. To solve this problem, a distributed variant of the popular temporal difference (TD) learning method is proposed. The main contribution is a finite-time analysis of the performance of this distributed TD algorithm for both constant and time-varying step sizes. In addition, the results provide a mathematical explanation for observations that have appeared previously in the literature about the choice of algorithm parameters that yields the best performance of (distributed) TD learning.
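A minimal, self-contained sketch of the algorithmic pattern described above, using an illustrative two-state chain with a one-dimensional linear value approximation (the chain, features, rewards, and step sizes are our own toy choices, not the paper's setting):

```python
def distributed_td0(num_agents=3, steps=200, alpha=0.1, gamma=0.9):
    """Sketch of distributed TD(0) with linear function approximation:
    each agent applies a local TD update on its own transition from a
    deterministic two-state chain, then averages its weight with the
    team (a fully connected communication graph, for simplicity)."""
    phi = [1.0, 2.0]       # feature of each state; value estimate = theta * phi(s)
    reward = [0.0, 1.0]    # reward received on leaving each state
    theta = [0.0] * num_agents
    state = [i % 2 for i in range(num_agents)]   # agents start in different states
    for _ in range(steps):
        new_theta = []
        for i in range(num_agents):
            s, s2 = state[i], 1 - state[i]       # deterministic two-state cycle
            delta = reward[s] + gamma * theta[i] * phi[s2] - theta[i] * phi[s]
            new_theta.append(theta[i] + alpha * delta * phi[s])
            state[i] = s2
        # Consensus step: on a complete graph, averaging reaches agreement.
        avg = sum(new_theta) / num_agents
        theta = [avg] * num_agents
    return theta

theta = distributed_td0()
```

Each iteration interleaves a local TD(0) update with a consensus-averaging step; because the toy team is fully connected, all agents share a common weight after every round, and the weight converges to the fixed point of the averaged update map.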

This work is currently being applied to more complex control problems in robotic networks using reinforcement learning. A similar distributed Q-learning algorithm is being used to design an optimal sequence of coordinated behaviors for multi-robot systems operating in an unknown environment. Simulations conducted on Georgia Tech’s Robotarium have demonstrated the effectiveness of these methods in executing complex tasks with a network of autonomous robots.

Sources:
Conference: Thinh T. Doan, Siva Theja Maguluri, and Justin Romberg, “Finite-Time Performance of Distributed Temporal Difference Learning with Linear Function Approximation,” Proceedings of the 36th International Conference on Machine Learning (ICML), 2019.
Journal: Submitted to SIAM Journal on Mathematics of Data Science.

Points of Contact: Justin Romberg (PI) and Thinh T. Doan.

Learning to Learn with Probabilistic Task Embeddings

To operate successfully in a complex and changing environment, learning agents must be able to acquire new skills quickly. Humans display remarkable skill in this area — we can learn to recognize a new object from one example, adapt to driving a different car in a matter of minutes, and add a new slang word to our vocabulary after hearing it once. Meta-learning is a promising approach for enabling such capabilities in machines. In this paradigm, the agent adapts to a new task from limited data by leveraging a wealth of experience collected in performing related tasks. For agents that must take actions and collect their own experience, meta-reinforcement learning (meta-RL) holds the promise of enabling fast adaptation to new scenarios. Unfortunately, while the trained policy can adapt quickly to new tasks, the meta-training process requires large amounts of data from a range of training tasks, exacerbating the sample inefficiency that plagues RL algorithms. As a result, existing meta-RL algorithms are largely feasible only in simulated environments.

The lustre of off-policy meta-RL

While policy gradient RL algorithms can achieve high performance on complex high-dimensional control tasks (e.g., controlling a simulated humanoid robot to run), they are woefully sample-inefficient. For example, the state-of-the-art policy gradient method PPO requires 100 million samples to learn a good policy for the humanoid. If we were to run this algorithm on a real robot, running continuously with a 20 Hz controller and without counting time for resets, it would take nearly two months to learn this policy. This sample inefficiency is largely because the data used to form the policy gradient update must be sampled from the current policy, precluding the re-use of previously collected data during training. Recent off-policy algorithms (TD3, SAC) have matched the performance of policy gradient algorithms while requiring up to 100X fewer samples. If we could leverage these algorithms for meta-RL, weeks of data collection could be reduced to half a day, putting meta-learning within reach of our robotic arms. Off-policy learning offers further benefits beyond better sample efficiency when training from scratch: we could also make use of previously collected static datasets, and leverage data from other robots in other locations.
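The off-policy advantage described above rests on reusing stored transitions many times rather than discarding them after one gradient step. A minimal replay-buffer sketch (our own illustrative code, not PEARL's implementation):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay: off-policy methods can resample old
    transitions repeatedly, whereas on-policy gradient updates must use
    fresh data from the current policy."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest entries evicted first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self.buffer), batch_size)

buf = ReplayBuffer()
for t in range(100):
    buf.add(t, 0, 1.0, t + 1, False)   # dummy transitions
batch = buf.sample(32)
```

Every stored transition remains available for future updates, which is what lets off-policy algorithms amortize the cost of data collection across many gradient steps.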

Source: K. Rakelly, “BAIR: Berkeley Artificial Intelligence Research,” June 10, 2019.
https://bair.berkeley.edu/blog/2019/06/10/pearl/

Task: RA1.A1: The Swarm’s Knowledge Base: Contextual Perceptual Representations

Localization and Mapping using Instance-specific Mesh Models

A recent paper by members of the DCIST alliance proposes an approach for building semantic maps, containing object poses and shapes, in real time, onboard an autonomous robot equipped with a monocular camera. A rich understanding of the geometry and context of a robot’s surroundings is important for the specification and the safe, efficient execution of complex missions. This work develops a deformable mesh model of object shape that can be optimized online based on semantic information (object parts and segmentation) extracted from camera images. Multi-view constraints on the object shape are obtained by detecting objects and extracting category-specific keypoints and segmentation masks. The paper shows that the errors between projections of the mesh model and the observed keypoints and masks can be differentiated in order to obtain accurate instance-specific object shapes. The potential of this approach to build large-scale object-level maps will be investigated in DCIST autonomous navigation and situational awareness tasks.
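As a toy illustration of the differentiable-fitting idea, the sketch below optimizes a single scale parameter against observed keypoints by gradient descent; the actual system optimizes full mesh vertex positions against keypoint and mask reprojection errors (the one-parameter model and data here are our own simplification):

```python
def fit_scale(model_pts, observed_pts, lr=0.01, iters=500):
    """Gradient descent on the squared error between scaled model keypoints
    and observed keypoints -- a one-parameter stand-in for optimizing mesh
    vertices against differentiable reprojection errors."""
    s = 1.0
    for _ in range(iters):
        # d/ds of sum (s*m - o)^2 is 2 * sum m * (s*m - o)
        grad = 2 * sum(m * (s * m - o) for m, o in zip(model_pts, observed_pts))
        s -= lr * grad
    return s

model = [1.0, 2.0, 3.0]
observed = [2.0, 4.0, 6.0]   # the model keypoints scaled by 2
s = fit_scale(model, observed)
```

Because the error is differentiable in the shape parameter, the same pattern extends to thousands of mesh vertex coordinates, which is what makes online, instance-specific shape optimization feasible.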


Additional Details: https://fengqiaojun.github.io/IROS19_InstanceMesh.html

Source: Q. Feng, Y. Meng, M. Shan, and N. Atanasov, “Localization and Mapping using Instance-specific Mesh Models,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 2019.

Task: RA1.A1: The Swarm’s Knowledge Base: Contextual Perceptual Representations

Points of Contact: Nikolay Atanasov (PI) and Qiaojun Feng.

Aerial Robot Prototype