Asymptotically Optimal Planning for Non-myopic Multi-Robot Information Gathering

A recent paper by members of the DCIST alliance develops a novel highly scalable sampling-based planning algorithm for multi-robot active information acquisition tasks in complex environments. Active information gathering scenarios include target localization and tracking, active Simultaneous Localization and Mapping (SLAM), surveillance, environmental monitoring and others. The goal is to compute control policies for mobile robot sensors which minimize the accumulated uncertainty of a dynamic hidden state over an a priori unknown horizon. To design optimal sensor policies, we propose a novel nonmyopic sampling-based approach that simultaneously explores both the robot motion space and the information space reachable by the sensors. We show that the proposed algorithm is probabilistically complete, asymptotically optimal, and convergences exponentially fast to the optimal solution. Moreover, we demonstrate that by biasing the sampling process towards regions that are expected to be informative, the proposed method can quickly compute sensor policies that achieve user-specified levels of uncertainty in large-scale estimation tasks that may involve large multi-robot teams, workspaces, and dimensions of the hidden state. We provide extensive simulation results that corroborate the theoretical analysis and show that the proposed algorithm can address large-scale estimation tasks.
Target localization and tracking scenario: Two robots with limited field-of-view (blue ellipses) navigate an environment with obstacles to localize and track six targets of interest. Target uncertainty is illustrated in red.
Source: Yiannis Kantaros, Brent Schlotfeldt, Nikolay Atanasov, and George J. Pappas: ‘Asymptotically Optimal Planning for Non-myopic Multi-Robot Information Gathering’ In Proceedings of the 2019 Robotics: Science and Systems (RSS), Freiburg, Germany, June 2019.
 
Points of Contact: George J. Pappas

Active Exploration in Signed Distance Fields

When performing tasks in unknown environments it is useful for a team of robots to have a good map of the area to assist in efficient, collision-free planning and navigation. A recent paper by members of the DCIST alliance tackles the problem of autonomous mapping of unknown environments using information theoretic metrics and signed distance field maps. Signed distance fields are discrete representations of environmental occupancy in which each cell of the environment stores a distance to the nearest obstacle surface, with negative distances indicating that the cell is within an obstacle. Such a representation has many benefits over the more traditional occupancy grid map including trivial collision checking, and easy extraction of mesh representations of the obstacle surfaces. The researchers use a truncated signed distance field, which only keeps track of cells near obstacle surfaces, and model each cell as a Gaussian random variable with an expected distance and a variance determined incrementally using a realistic RGB-D sensor noise model. The use of Gaussian random variables enables the closed form computation of Shannon mutual information between a Gaussian sensor measurement and the Gaussian cells it intersects. This allows for efficient evaluations of expected information when planning and evaluating possible future trajectories. Using these tools, a robot is able to efficiently evaluate a large number of trajectories before choosing the best next step to increase its information about the environment. The researchers show the resulting active exploration algorithm running on several simulated 2D environments of varying complexity. The figure shows a snapshot of the robot exploring the most complex of the three environments. These simulations can be viewed in more detail in the video linked below.

Points of Contact: Vijay Kumar (PI), Kelsey Saulnier.

Citation:  K. Saulnier., N. Atanasov, G. J.Pappas, & V. Kumar, “Information Theoretic Active Exploration in Signed Distance Fields,” IEEE International Conference on Robotics and Automation (ICRA), Paris, France, June 2020. (Accepted)

Learning Multi-Agent Policies from Observations

A recent paper from the DCIST team introduces a framework for learning to perform multi-robot missions by observing an expert system executing the same
mission. The expert system is a team of robots equipped with a library of controllers, each designed to solve a specific task. The expert system’s policy selects the controller necessary to successfully execute the mission at each time step, based on the states of the robots and the environment. The objective of the learning framework is to enable an un-trained team of robots (i.e., imitator system) — equipped with the same library of controllers but not the expert policy — to learn to execute the mission with performance comparable to that of the expert system. Based on un-annotated and noisy observations of the expert system, a multi-hypothesis filtering technique estimates the series of individual controllers executed by the expert policy. Then, the history of estimated controllers and environmental states provide supervision to train a neural network policy for the imitator system. When evaluated on a perimeter protection scenario, experimental results suggest that the learned policy endows the imitator system with performance comparable to that of the expert system.
Source: P. Pierpaoli, H. Ravichandar, N. Waytowich, A. Li, D. Asher, M. Egerstedt.  “Inferring and Learning Multi-Robot Policies from Observations”, International Conference on Intelligent Robots and Systems (IROS), 2020 – under review
Points of Contact: Pietro Pierpaoli; Harish Ravichandar {pietro.pierpaoli, harish.ravichandar} @gatech.edu

Sim-to-(Multi)-Real: Transfer of Low-Level Robust Control Policies to Multiple Quadrotors

A recent paper by members of the DCIST alliance develops the use of reinforcement learning techniques to train policies in simulation that transfer remarkably well to multiple different physical quadrotors. Quadrotor stabilizing controllers often require careful, model-specific tuning for safe operation. The policies developed are low-level, i.e., they map the rotorcrafts’ state directly to the motor outputs. The trained control policies are very robust to external disturbances and can withstand harsh initial conditions such as throws. The work shows how different training methodologies (change of the cost function, modeling of noise, use of domain randomization) might affect flight performance. The is the first work that demonstrates that a simple neural network can learn a robust stabilizing low-level quadrotor controller (without the use of a stabilizing PD controller) that is shown to generalize to multiple quadrotors.

Project page:

Sim-to-(Multi)-Real: Transfer of Low-Level Robust Control Policies to Multiple Quadrotors
A.Molchanov, T. Chen, W. Hönig, J. A.Preiss, N. Ayanian, G. S. Sukhatme
IEEE/RSJ International Conference on Robots and Systems (IROS) 2019

DCIST Task: RA3.A1 Robust Adaptive Machine Learning
Contact: Gaurav Sukhatme

Planning with Uncertain Specifications (PUnS)

Consider the task of setting a dinner table. It involves placing the appropriate serving utensils and silverware according to the dishes being served. Some of the objects need to be placed in a particular order as they might be stacked on top of each other or due to cultural traditions. Many real-world tasks demonstrate such temporal characteristics that can succinctly modeled with linear temporal logic (LTL).

However defining sound and complete task specifications in LTL is non-trivial, and it remains desirable to allow robots to infer the task specifications through intuitive modalities like demonstrations, spoken instructions or corrections. Thus specifications for a task can at best be stated as a belief over candidate LTL formulas. An ideal learner would try to simultaneously satisfy as many of these candidate properties as possible while performing the task.

We tackle this problem in this work, where we introduces a novel problem formulation for planning with uncertain specifications (PUnS). To do that we first identify four evaluation criteria than capture the semantics of satisfying a belief over LTL formulas. These include:

  • Most likely: Here only the formula with the highest probability in the belief distribution is satisfied, the other formulas are ignored.
  • Maximum coverage: The largest set of unique formulas in the belief distribution is satisfied.
  • Minimum regret: The hypothesis averaged satisfaction of the belief distribution is maximised.
  • Chance constrained: The maximum probability of failure is determined by the user and the distribution is modified to only capture a set of formulas that have a cumulative probability larger than the failure probability.

We demonstrate that the choice of evaluation criterion results in a trade-off between fliexibility in task execution and risk-aversion. Finally we also demonstrate that every instance of a PUnS problem can be compiled into an equivalent reward machine. This is an equivalent Markov decision process (MDP) for the original non-Markov PUnS problem.

We demonstrate our technique on the table-setting task where the task specifications were inferred from 30 demonstrations, and the policy computed using the PUnS formulation sets the table with a very low error rate (0.001%).

Source: A. Shah, S. Li, J. Shah, ”Planning With Uncertain Specifications (PUnS),” IEEE International Conference on Robotics and Automation (ICRA), Paris, France, June 2020 (accepted).

Points of Contact: Julie Shah (PI) and Ankit Shah

Synthesis of a Time-Varying Communication Network by Robot Teams with Information Propagation Guarantees

A recent paper by Xi Yu and M. Ani Hsieh from the University of Pennsylvania presents a distributed control and coordination strategy that allows a swarm of mobile robots to form an intermittently connected communication network as they monitor a given environment. The approach assumes robots are tasked to patrol a set of perimeters in a region of interest and are only able to communicate with one another when they move into each other’s communication range as they monitor their assigned perimeters. The work shows how intermittent connectivity can be achieved as each robot synchronizes its speed with other robots moving along neighboring perimeters. As robots synchronize with one another, they effectively ensure future pair-wise rendezvous with robots on neighboring perimeters.  This enables the team to form a time-varying communication network where information can be successfully transmitted between any pair of robots within some finite time period.  Additionally, the approach provides bounds on the time needed to propagate information throughout the network. The feasibility of the strategy and the validity of the approach is validated in simulation.

Fig. 1.  An overhead view of an urban environment (left) where robots are tasked to patrol perimeters around different buildings.  As robots move into each other’s communication range (robots on perimeters 7 and 8 and robots on perimeters 11 and 12 in the right figure), information can be exchanged.  In both figures, blue robots are robots who have obtained a new piece of information from a robot on a neighboring perimeter while red robots are robots who have not received the information.  As robots rendezvous with each other, the information will eventually propagate throughout the entire network.

Learning Decentralized Controllers with Graph Neural Networks

A recent paper by members of the DCIST alliance develops a method for distributed control of large networks of mobile robots with interacting dynamics and sparsely available communications. Their approach is to learn local controllers that require only local information and communications at test time by imitating the policy of centralized controllers using global information at training time. By extending aggregation graph neural networks to time varying signals and time varying network support, they learn a single common local controller which exploits information from distant teammates using only local communication interchanges. The researchers apply this approach to the problem of flocking to demonstrate performance on communication graphs that change as the robots move. They examine how a decreasing communication radius and faster velocities increase the value of multi-hop information.

Task: RA1.C1 Joint Resource Allocation in Perception-Action-Communication Loops
Points of Contact: Alejandro Ribeiro (PI) and Ekaterina Tolstaya

Video: https://youtu.be/Ph-GX0lSKME 

Paper: https://arxiv.org/pdf/1903.10527.pdf 

Citation: E. Tolstaya, F. Gama, J. Paulos, G. Pappas, V. Kumar, A. Ribeiro “Learning Decentralized Controllers for Robot Swarms with Graph Neural Networks.” Conference on Robot Learning, October 2019.

Heterogeneity and Uncertainty in Perimeter Defense

Surveillance of perimeters and securing perimeters are important tasks in civilian and military defense applications, and it has become practical to deploy a large number of autonomous agents to address these problems using multi-robot systems.  

A recent paper by members of the DCIST alliance formulates this scenario as a variant of multi-player pursuit-evasion games, where we suppose that intruders must be intercepted before they can reach a region of interest. Finding the optimal strategies in such a game is challenging due to the high dimensionality of the joint state space.  To perform a tractable analysis, we developed a decomposition method that reduces the problem to an efficient combinatorial optimization problem [1]. We derived a set of intruder and defender team strategies, and proved their optimality in the sense that they form a Nash equilibrium. In addition, our algorithm finds the upper bound on the number of intrusions which the defender team can certify at the  beginning of a game.

Highlight Video: https://youtu.be/6zUPkzh_iPU

The performance certified by the defender team can also be used to inform higher level coalition forming algorithms. In the extension of the above work, we considered a scenario where defensive agents have limited sensor footprint, and so they deploy both defenders to intercept along the perimeter and patrollers to scout outside the defended region. In this context, the problems of team formation and task allocation were studied [2].  In particular, we consider how uncertainty in the environment or opponent capabilities introduce uncertainty in the parameters (speed advantage, detection range, and capture range). When designing a team to satisfy a task specification, species may be selected to complement each other both with respect to trait specialization under any known environment but also with respect to their complementary variability when the environment (as modeled by trait parameters) is uncertain.

Related Publications:
 [1] D. Shishika, J. Paulos, and Vijay Kumar. “Cooperative team strategies for multi-player perimeter- defense games,” Accepted to IEEE Robotics and Automation Letters, available at https://arxiv. org/abs/1912.04342.
[2] D. Shishika, J. Paulos, M. R. Dorothy, M. A. Hsieh, and V. Kumar. “Team Composition for Perimeter Defense with Patrollers and Defenders,” IEEE Conference on Decision and Control, pp. 7325-7332, France, 2019.

Tasks:
RA2.A1: Abstraction of Task Diversity
RA2.A3: Composable Autonomy in Heterogeneous Groups

Points of Contact:
Vijay Kumar (PI), Daigo Shishika, James Paulos, Douglas Macharet

 

Human Information Processing in Complex Networks

In this work, we study the structure of real-world communication systems to understand how information can be rapidly and efficiently communicated to humans, for example from swarms of drones or other agents. Humans constantly receive information from systems of interconnected stimuli or concepts — from language and music to literature and science — yet it remains unclear how, if at all, the structure of these networks supports the communication of information. Although information theory provides tools to quantify the information produced by a system, traditional metrics do not account for the inefficient ways that humans process this information.

Here we develop an analytical framework to study the information generated by a system as perceived by a human observer. We demonstrate experimentally that this perceived information depends critically on a system’s network topology. Applying our framework to several real networks, we find that they communicate a large amount of information (having high entropy) and do so efficiently (maintaining low divergence from human expectations). Moreover, we show that such efficient communication arises in networks that are simultaneously heterogeneous, with high-degree hubs, and clustered, with tightly-connected modules — the two defining features of hierarchical organization. Together, these results suggest that rapid and efficient communication is constrained by the structural properties of communication systems, with implications for the design of optimal channels for robot-human and human-human information transmission.

Source: Lynn, C. W., Papadopoulos, L., Kahn, A. E., and Bassett, D. B. “Human information processing in complex networks.” In revision, Nature Physics <arxiv.org/abs/1906.00926>.

Points of contact: Danielle S. Bassett (PI) and Christopher W. Lynn.

Resilient Active Information Acquisition with Mobile Robots

In the future, teams of heterogeneous robot teams will be operating in unknown and adversarial environments.   In failure prone or adversarial environments, the capability of resilience is crucial to ensuring the robots can complete their mission. Mission resilience to robot failures, sensor attacks or communication disruptions is currently an afterthought leading to optimal over-provisioned designs.   DCIST researchers are leading the research community by designing resilient algorithms for distributed planning and control that from the outset embrace the possibility of failures or attacks.   Resilience is defined as the robot team’s ability to recover from failures or attacks to members of the team, while completing the task.

A recent paper by members of the DCIST alliance develops efficient algorithms for information acquisition problems for teams of mobile robots who need to exhibit resiliency in situational awareness.  The work develops a computationally efficient algorithm with linear complexity in the number of robots that achieves a provable approximation performance for information gathering tasks such as active mapping and tracking moving targets in attack prone environments. A key mathematical property used in the algorithm analysis, and common to many information measures is a property called submodularity, which is a diminishing returns property on information gain. Experiments indicate that the algorithm leads to superior target tracking performance under a variety of different robot team sizes, number of targets, and attacks.  


Photo:
A resilient active target tracking problem with a 3 robot team. The compromised robot is highlighted in red, the remaining robots in blue, and the target uncertainty is highlighted in cyan

Schlotfeldt, B., Tzoumas, V., Thakur, D., & Pappas, G. J. (2018, October). Resilient active information gathering with mobile robots. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 4309-4316). IEEE.
Task: RA3.C1
Points of Contact: George J Pappas