Diana Spears on the safety of adaptive agents

Diana Spears is an Owner and Research Scientist at Swarmotics, LLC. Previously, she worked at US government laboratories (Goddard, NIST, NRL) and afterwards she was an Associate Professor of Computer Science at the University of Wyoming. She received her Ph.D. from the University of Maryland, College Park.

Dr. Spears’s research interests include machine learning, adaptive swarm robotic sensing networks/grids, computational fluid dynamics based algorithms for multi-robot chemical/biological plume tracing and plume mapping, and behaviorally-assured adaptive and machine learning systems. Dr. Spears pioneered the field of safe adaptive agents with her award-winning (2001 NRL Alan Berman Research Publication Award) publication entitled “Asimovian adaptive agents.” Most recently she and her husband co-edited the book Physicomimetics: Physics-Based Swarm Intelligence, published by Springer-Verlag in 2012.

Luke Muehlhauser: In Spears (2006) and other publications, you discuss the challenge of ensuring the safety of adaptive (learning) agents:

a designer cannot possibly foresee all circumstances that will be encountered by the agent. Therefore, in addition to supplying an agent with a plan, it is essential to also enable the agent to learn and modify its plan to adapt to unforeseen circumstances. The introduction of learning, however, often makes the agent’s behavior significantly harder to predict. The goal of this research is to verify the behavior of adaptive agents. In particular, our objective is to develop efficient methods for determining whether the behavior of learning agents remains within the bounds of prespecified constraints… after learning…

…Our results include proofs that… with respect to important classes of properties… if the [safety] property holds for the agent’s plan prior to learning, then it is guaranteed to still hold after learning. If an agent uses these “safe” learning operators, it will be guaranteed to preserve the properties with no reverification required. This is the best one could hope for in an online situation where rapid response time is critical. For other learning operators and property classes our a priori results are negative. However, for these cases we have developed incremental reverification algorithms that can save time over total reverification from scratch.

What do you mean by “incremental” reverification algorithms, as in the last sentence I quoted?


Diana Spears: Verification (model checking, in particular) consists of proving that a computer program satisfies a desirable property/constraint and, if it does not, a counterexample is provided. In my work, I assume that this program is a (multi)agent plan for action. In most real-world applications, plans are typically enormous and therefore verification may be quite time-consuming. Suppose the safety property/constraint that agent A must always obey is that “agent A should always be at least M units away from agent B” (to prevent collisions). Let’s assume that initial verification proved that the entire plan (consisting of all action sequences that agent A could ever possibly take) is guaranteed to obey this property in all circumstances. Furthermore, let’s assume that adaptation is required after the agent has been fielded, where adaptation consists of applying a machine learning operator to modify the plan. For example, suppose a specific part of the plan states that agent A should “move from location 3 to location 4 if there is a clear path between 3 and 4, the ground is fairly level (e.g., nowhere higher than X or lower than Y between locations 3 and 4), and if the schedule permits such movement at this time.” Then an example machine learning operator might change the “4” to “6” based on new information about the task being performed.

Notice that this learning operator modifies only one condition in one minuscule portion of the entire plan. Therefore, why re-verify that the entire plan still satisfies the desired property after learning? Why not re-verify only the specific portion of the plan that was modified, as well as any parts of the plan that depend on the modified portion? That is what “incremental re-verification” does. It localizes post-adaptation verification to only those parts of the plan that are essential to re-verify. In doing so, it improves the time complexity of re-verification. Time complexity is a very important and practical consideration for online systems – especially those that operate in real-time or in time-critical situations. In my research I ran numerous experiments comparing the CPU time of re-verifying an entire plan versus localized incremental re-verification of the plan after learning. My results showed as much as a 1/2-billion-fold speedup using incremental re-verification! And that’s using a plan that’s tiny compared to what agents would typically use in the real world.
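To make the flavor of this concrete, here is a minimal sketch (an illustration, not the actual algorithm from the publications): a plan is stored as a graph of states with guarded transitions, full verification checks a safety predicate on every transition, and incremental re-verification re-checks only the transitions in the neighborhood of the edit, on the assumption that everything outside that neighborhood was verified before learning and is unchanged. The `Plan` class, the per-transition notion of safety, and the depth-limited notion of “dependence” are all simplifying assumptions:

```python
from collections import deque

class Plan:
    """A plan as a graph: state -> list of (action, guard, next_state)."""
    def __init__(self):
        self.transitions = {}

    def add(self, state, action, guard, next_state):
        self.transitions.setdefault(state, []).append((action, guard, next_state))

def verify_all(plan, safe):
    """Full verification: check the safety predicate on every transition."""
    for state, edges in plan.transitions.items():
        for action, _guard, nxt in edges:
            if not safe(state, action, nxt):
                return False, (state, action, nxt)  # counterexample
    return True, None

def reverify_incremental(plan, safe, modified_states, depth=2):
    """Re-check only transitions within `depth` hops of the modified states.

    Anything farther away was verified before learning and is untouched,
    so (for purely local properties) it cannot have become unsafe.
    """
    frontier = deque((s, 0) for s in modified_states)
    visited = set(modified_states)
    while frontier:
        state, d = frontier.popleft()
        for action, _guard, nxt in plan.transitions.get(state, []):
            if not safe(state, action, nxt):
                return False, (state, action, nxt)
            if d + 1 <= depth and nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, d + 1))
    return True, None
```

After the hypothetical “4 → 6” edit described above, one would call `reverify_incremental(plan, safe, {state_3})` rather than `verify_all(plan, safe)`; the savings grow with the size of the plan, which is the source of the speedups reported above.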


Luke: With what kinds of agent programs have you explored this issue? What do the agents do, in what environment, and what kinds of safety properties do you prove about them?


Diana: Because this topic, “safe adaptation,” has strong relevance to aerospace applications, I chose to focus on NASA-related (multi)agent programs. I pictured a scenario in which a spacecraft has landed on another planet, e.g., Mars, and multiple mobile rovers emerge from it. The plans (programs) of the spacecraft lander and the planetary rovers are for gathering, retrieving, and transmitting data and/or samples from the planet’s surface to Earth. I proved “safety” and “liveness” properties. An example of “safety” is, “It is always the case that agent R is not delivering at the same time that agent L is transmitting.” Here, L is the lander and R is one of the rovers. This property/constraint prevents problems that could arise if the lander receives new data at the same time that it transmits old data to Earth. An example of “liveness” is, “If agent R performs a ‘deliver’ action, then eventually agent L will perform a ‘receive’ action.”
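Rendered in linear temporal logic (an illustrative notation; the exact formalism in the original publications may differ), these two example properties read:

```latex
% Safety: agent R is never delivering while agent L is transmitting.
\Box\, \neg(\mathit{deliver}_R \wedge \mathit{transmit}_L)

% Liveness: every delivery by R is eventually answered by a receive by L.
\Box\, (\mathit{deliver}_R \rightarrow \Diamond\, \mathit{receive}_L)
```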


Luke: Has anyone, yourself included, built on this particular work since 2006? And what are some natural “next steps” for this particular line of research?


Diana: There were essentially three main offshoots of my research that I’m aware of – from NASA Ames, SRI, and USC. I’ll start with the NASA Ames offshoot. Around the year 2000 I gave a talk about “Asimovian Adaptive Agents” at NASA Ames. My instincts about the strong relevance of this work to NASA, and more generally aerospace, proved to be correct. (Furthermore, it appears to be highly relevant to any automated transportation, including automated cars/highways.) The researchers at NASA Ames rapidly and eagerly followed up on my talk with a flurry of related work, including research and publications. These researchers focused on “reference models,” which are used for online, run-time I/O checking. Instead of using temporal logic properties to verify, they used control-theoretic properties such as “stability” and “performance.” Perkins and Barto also used Lyapunov stability as the property of interest when building adaptive learning agents.[1] For examples of the NASA Ames research and other related work, see the papers that appeared in the NIPS’04 workshop on “Verification, validation, and testing of learning systems.”[2] Dietterich gave a follow-up talk at NIPS’06 on this topic.[3] The NASA Ames offshoot has continued to be active post-2006, as exemplified by the many recent contributing papers in Schumann’s 2010 book.[4] Furthermore, Vahram Stepanyan and others at NASA Ames have been working on a project called “Integrated Resilient Aircraft Control” (IRAC), whose goal is to validate multidisciplinary integrated aircraft control design tools and techniques that will enable safe flight despite unexpected adverse conditions.[5]

The second offshoot was initiated by John Rushby at SRI International shortly after my Ames talk. The SRI follow-up research has continued to focus on the formal methods for model checking that I originally worked with. More recently, however, this work has moved in a direction more similar to that of Ames.[6] For example, in this paper Rushby introduces the idea of using “safety cases” that lead to online performance monitors. Even more recently, in the context of the IRAC project, Ashish Tiwari of SRI has worked on bounded verification of adaptive neural networks.[7]

Next, consider a third offshoot. This is the research at the University of Southern California (USC) by Milind Tambe and others. These USC researchers built on my specific line of work, but they decided to address the important issue of mixed-initiative situations (also called “adjustable autonomy”), where humans and artificial agents collaborate to achieve a goal. Their multiagent plans are in the form of Partially Observable Markov Decision Processes (POMDPs) and they check safety constraints in this context. The first paper of theirs that I’m aware of on the topic of Asimovian adaptive (multi)agents appeared in 2006.[8] In 2007, Nathan Schurr got his Ph.D. on this topic.[9] Milind Tambe continues to teach a very popular course on “Artificial Intelligence and Science Fiction,” in which he discusses his research on Asimovian multiagents.

Finally, I’ll mention miscellaneous post-2006 research that continues to build on my earlier line of work. For one, during 2006-2008 I was part of a DARPA Integrated Learning initiative that focused on methods for airspace deconfliction. Two of my graduate students, Antons Rebguns and Derek Green, along with Geoffrey Levine (U of Illinois) and Ugur Kuter (U of Maryland), applied safety constraints to planners.[10] Their work was inspired by my earlier research on Asimovian agents. There are also researchers currently building on the NASA Ames work: an international group,[11] Zhang and other researchers at Michigan State University,[12] and Italian researchers who build on Zhang’s work.[13] Also, Musliner and Pelican (Honeywell Labs), along with Goldman (SIFT, LLC), started building on my incremental re-verification work in particular – back in 2005[14] – and apparently Musliner is still working on the verification and validation (V&V) of adaptive systems.

Now I’ll answer your second question, about natural “next steps” for this particular line of research. I believe that all of the research mentioned above is exciting and shows promise. But I would especially single out the NASA/SRI direction as potentially fruitful for the future. This is based on my personal experience with machine learning, formal methods, V&V, and AI. I have always found that formal methods and other logic-based methods are difficult to scale to complex real-world problems, for computational reasons. Throughout my career I have leaned more toward stochastic methods for machine learning, and run-time checking for V&V. I therefore expect that aerospace researchers will continue in the direction they have adopted. However, I also believe they should broaden their horizons. There are numerous existing techniques for run-time monitoring and checking, e.g., the program checking research of Rubinfeld[15] or the run-time monitoring and checking of Insup Lee and Oleg Sokolsky.[16] I believe it would be potentially very fruitful to explore how many of the available monitoring and checking techniques are applicable to the behavioral assurance of adaptive systems.

But, most importantly, there is a topic that is critical to the future of building trustworthy adaptive systems and needs to be explored in great depth. That’s the issue of self-recovery/repair. Since around 1998-1999, my colleagues and I have been addressing self-repair in the context of swarm robotics.[17][18] Our research has focused primarily on physics-based approaches to controlling swarm robotic formations – because physics naturally obeys the “principle of least action,” i.e., if a formation is disturbed then it will automatically perform the minimal actions required to repair the disturbance. This repair is locally optimal but is not necessarily globally optimal. In other words, we have dropped the requirement of global optimality, focusing on “satisficing” behavior instead. Organic and natural physical systems are not perfect, but their lack of perfection often makes them more robust. There are systems where we need precise guarantees of behavior (e.g., the dynamic control of an airplane wing, to ensure that the plane does not stall and crash). But for other tasks, perfection and optimality are not even relevant (e.g., the Internet). We have demonstrated the feasibility of our research both in simulation and on real robots on numerous tasks, including uniform coverage, chain formations, surveillance, the movement of formations through environments with obstacles, and chemical source localization.[19] Hopefully other researchers will also explore physics-based systems. I believe that an excellent “safe adaptive (multi)agent” architecture would consist of a physics-based controller at the lower level, with a monitor/checker at a higher level to provide strict behavioral guarantees when needed. In particular, a more sophisticated version of our original design in [17] would be quite promising.

In summary, I sincerely hope that the above-mentioned research will continue in the fruitful directions it has already taken, and I also hope that students and researchers will pursue additional, novel research along these lines. It seems to me that the topic of “safe adaptation” is “low-hanging fruit.” DARPA[20] and other funding agencies have expressed to me their desire to fund research on this topic – because it must be satisfactorily addressed if we are to have deployable, adaptive systems that we can trust.


Luke: In the lines of work you outlined above, what kinds of AI-like functionality are included in the parts of the code that are actually verified? E.g. does the verified code include classical planning algorithms, modern planning algorithms, logical agent architectures, or perhaps even machine learning algorithms in some cases?


Diana: The code that gets verified consists of reactive, “anytime” plans, which are plans that get continually executed in response to internal and external environmental conditions. Each agent’s plan is a finite-state automaton (FSA), which consists of states and state-to-state transitions. Each state in the FSA corresponds to a subtask of the overall task (which is represented by the entire FSA). And each transition corresponds to an action taken by the agent. In general, there are multiple transitions exiting each state, corresponding to the choice of action taken by the agent. For example, consider the scenario I described in one of my previous answers in this interview, i.e., that of a planetary lander along with rovers. Two possible states for a planetary lander L might be “TRANSMITTING DATA” and “RECEIVING DATA.” Suppose the lander is in the former state. If it takes the action “PAUSE” then it will stay in its current state, but if it takes the action “TRANSMIT” then after this action has completed the lander will transition to the latter state. Furthermore, the conditions for transitioning from one state to the next depend not only on what action the agent takes, but also on what’s going on in the environment, including what this agent observes the other agents (e.g., the rovers) doing. For this reason, I call the plans “reactive.”

Every FSA has an initial state, but no final state. The philosophy is that the agents are indefinitely reactive to environmental conditions subsequent to task initiation, and their task is continually ongoing. FSAs are internally represented as finite graphs, with vertices (nodes) corresponding to behavioral states and directed edges corresponding to state-to-state transitions.
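As a minimal sketch of this representation (hypothetical names and encoding; the actual internal representation in the papers differs), the lander’s FSA might look like:

```python
# Each state maps actions to (guard, next_state). Guards consult observations
# of the environment and of other agents, which is what makes the plan reactive.
FSA = {
    "TRANSMITTING_DATA": {
        "PAUSE":    (lambda obs: True,               "TRANSMITTING_DATA"),
        "TRANSMIT": (lambda obs: obs["r1_received"], "RECEIVING_DATA"),
    },
    "RECEIVING_DATA": {
        "PAUSE":   (lambda obs: True,                    "RECEIVING_DATA"),
        "RECEIVE": (lambda obs: obs["rover_delivering"], "TRANSMITTING_DATA"),
    },
}

INITIAL_STATE = "TRANSMITTING_DATA"  # an initial state, but no final state

def step(state, action, obs):
    """Follow one transition if its guard holds; otherwise stay in place."""
    guard, next_state = FSA[state][action]
    return next_state if guard(obs) else state
```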

Machine learning (ML) is applied to the FSA plans for agent initialization and adaptation. Learning is done with evolutionary algorithms (EAs), using traditional generalization and specialization operators. These operators add, delete, move, or modify vertices and edges, as well as the actions associated with the edges. For example, suppose the lander’s transition from its “TRANSMITTING DATA” to its “RECEIVING DATA” state depends not only on its own “TRANSMIT” action, but it also requires that rover R1 successfully received the data transmitted by lander L before the lander can make this state-to-state transition in its FSA. This is a very reasonable requirement. Now suppose that R1’s communication apparatus has catastrophically failed. Then L will need to adapt its FSA by modifying the requirement of checking R1’s receipt of the transmission before it can transition to its “RECEIVING DATA” state. One possibility is that it replaces “R1” with “R2” in its FSA. Many other alternative learning operators are of course also possible, depending on the circumstances.
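Continuing the sketch above, one hypothetical way to realize that operator is to rewrite the guard on the single affected edge, leaving the rest of the FSA untouched – which is exactly why incremental re-verification only needs to revisit the modified region:

```python
def replace_receipt_check(fsa, state, action, new_flag):
    """Illustrative learning operator: swap which rover's receipt is required."""
    _old_guard, next_state = fsa[state][action]
    fsa[state][action] = (lambda obs: obs[new_flag], next_state)
    return {state}  # the modified region, handed to incremental re-verification

# R1's radio has catastrophically failed: wait on R2's receipt instead.
modified_states = replace_receipt_check(FSA, "TRANSMITTING_DATA", "TRANSMIT",
                                        "r2_received")
```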

Machine learning is assumed to occur in two phases: offline, then online. During the offline initialization phase, each agent begins with a randomly initialized population of candidate FSA plans, which it evolves using an evolutionary algorithm. The main loop of the EA consists of selecting parent plans from the population, applying ML operators to produce offspring, evaluating the fitness of the offspring, and then returning the offspring to the population if they are sufficiently “fit.” After evolving good candidate plans, the agent selects the “best” plan (according to its fitness criterion) from its population. This plan is then verified, along with repair if needed. During the online phase, the agent interleaves plan execution with learning (to adapt to environmental changes, e.g., agent hardware failures), re-verification, and plan repair as needed.
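A sketch of that offline loop, under the stated assumptions (the fitness function and operator set are placeholders, and real EAs use more sophisticated selection and replacement schemes):

```python
import random

def evolve_plan(initial_population, fitness, operators, generations=1000):
    """Offline phase sketch: evolve a population of candidate FSA plans."""
    population = list(initial_population)
    for _ in range(generations):
        parent = random.choice(population)         # select a parent plan
        operator = random.choice(operators)        # pick an ML operator
        offspring = operator(parent)               # produce an offspring plan
        if fitness(offspring) >= fitness(parent):  # keep it if sufficiently "fit"
            population.append(offspring)
    best = max(population, key=fitness)
    return best  # then verified (and repaired if needed) before being fielded
```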

The main point of my “Asimovian adaptive agents” paper is that by knowing what adaptation was done by the agent, i.e., what machine learning operator was applied to the FSA, we can streamline the re-verification process enormously.


Luke: AI systems are becoming increasingly autonomous in operation: self-driving cars, robots navigating disaster sites, HFT programs that trade stocks so rapidly they can “flash crash” the market or nearly bankrupt a large stock trader, etc.

How can current AI safety methods (formal verification and recovery, program synthesis, simplex architectures, hybrid systems control, hierarchical architectures, etc.) be extended to meet the safety challenges that will be raised by the future’s highly autonomous systems operating in unknown, continuous, dynamic environments? Do you think our capacity to make systems more autonomous and capable will outpace our capacity to achieve confident safety assurances for those systems?


Diana: My answer to your first question is that it is problem- and context-dependent. I know of many AI communities that are built around a single algorithm, whose researchers try to apply that algorithm to as many problems as possible. I believe this is a misguided approach to research. Instead, I have always tried to take a problem-driven approach. The best way to solve a problem scientifically is to study it in depth and to select the most suitable solution based on an a priori problem/task analysis – including the planner or problem solver, the properties/constraints to be verified, the method of adaptation, and so on. This will require a large suite of different AI safety/verification methods to choose from. I cannot foresee the nature of this suite; it will have to be constructed empirically. As we tackle increasingly complex autonomous systems, our repertoire of verification techniques will grow accordingly.

Luke, as for your second question, about whether autonomy will outpace safety: based on the applications you listed in your first paragraph, I can see that your concerns extend to security as well. In fact, your safety concerns also apply to the “Internet of Things,” which includes electronic, interconnected, remotely-accessible autonomous devices such as washing machines, ovens, and thermostats that will be installed in “smart homes” of the future. Businesses usually lack the motivation to install safety and security measures without some incentive. For example, leading software companies release beta versions of their programs with the expectation that the public will find and report the bugs. This is unacceptable as we transition to increasingly capable and potentially hazardous autonomous systems. I believe that the primary incentive will be government regulations. But we can’t wait until disasters arise before putting these regulations in place! Instead, we need to be proactive.

In 2008-2009 I was a member of an Association for the Advancement of Artificial Intelligence (AAAI) presidential panel that studied these issues. It was a fabulous panel, and it raised awareness within the community of AI researchers. Now, however, it is time to raise awareness beyond that community. One suggestion I have is to assign a new or existing member of the United States President’s Council of Advisors on Science and Technology the task of studying and assessing the safety and security of autonomous systems. This council member should consult the following people:

  1. AI researchers with extensive experience in developing autonomous systems
  2. Engineers from aerospace, transportation, and other applications where safety is paramount
  3. Lawyers and legislators aware of the legal issues that could arise
  4. Cyber security experts.

I envision this council member studying the topic, consulting with others, holding meetings, and concluding with a report and recommendations. Furthermore, I strongly believe that such a task should be assigned as soon as possible. We are already in a state where autonomy is outpacing safety and security, particularly in the commercial sector outside of the transportation industry.


Luke: Given that “autonomy is outpacing safety and security,” what are some other ideas you have for increasing the odds of reliably good outcomes from future autonomous systems?

By the way, I’ve only ever seen an “interim” report from that AAAI panel. Is there supposed to be a follow-up report at some point?


Diana: I haven’t heard about any follow-up or final report for the AAAI panel, unfortunately.

One idea is that we should have extensive safety and security testing prior to product release, based on established industry/government standards. We may not be able to enforce 100% compliance, but the presence of something like a “Safe and Secure Autonomous Product” certification could motivate consumers to favor purchasing tested and certified products over others lacking compliance. This would be like the existing UL product certification.

Another idea is to have monitoring, safe shutdown, self-recovery, and self-repair capabilities associated with autonomous devices. Furthermore, for security reasons these mechanisms should be decoupled from the autonomous system’s control, and they should also be detached from communications (e.g., not connected to the Internet) so as to avoid malicious tampering.

I don’t believe it’s possible to ensure complete safety and security at all times with autonomous systems. As you stated above, the best we can do is to increase “the odds of reliably good outcomes.” Nevertheless, I believe that substantial progress can be made if there is financial, technical, legal and programmatic support in this direction.


Luke: Thanks, Diana!


  1. Perkins, T. and Barto, A. “Lyapunov design for safe reinforcement learning control.” Proceedings of AAAI’02.
  2. Margineantu, Schumann, Gupta, Drumheller, and Fresnedo (co-chairs). Workshop on “Verification, validation, and testing of learning systems.” NIPS’04.
  3. Dietterich, T. “Research issues in deployed adaptive systems.” NIPS’06.
  4. Schumann, J. “Applications of Neural Networks in High Assurance Systems.” Springer-Verlag, 2010.
  5. Stepanyan, V. et al. “Stability and performance metrics for adaptive flight control.” AIAA’09.
  6. Rushby, J. “A safety-case approach for certifying adaptive systems.” AIAA’09.
  7. Tiwari, A. “Bounded verification of adaptive flight control systems.” AIAA’10.
  8. Schurr, N. et al. “Asimovian multiagents: Applying laws of robotics to teams of humans and agents.” 2006.
  9. Schurr, N. “Toward human-multiagent teams.” USC Ph.D. dissertation, 2007.
  10. Levine, G. et al. “Learning and verifying safety constraints for planners in a knowledge-impoverished system.” Computational Intelligence 28 (3), 2012.
  11. Tamura, G. et al. “Towards practical runtime verification and validation of self-adaptive software systems.” Self-Adaptive Systems, LNCS 7475, Springer-Verlag, 2013.
  12. Zhang et al. “Modular verification of dynamically adaptive systems.” AOSD’09.
  13. Sharifloo, A. and Spoletini, P. “LOVER: Light-weight formal verification of adaptive systems at run time.” Formal Aspects of Component Software. Lecture Notes in Computer Science vol. 7684, pp. 170-187, 2013.
  14. Musliner, D. et al. “Incremental verification for on-the-fly controller synthesis.” MoChArt’05.
  15. Rubinfeld, R. Checking.
  16. Sokolsky, O. Selected Recent Publications by Subject.
  17. Gordon, D. et al. “Distributed spatial control, global monitoring and steering of mobile physical agents.” ICIIS’99.
  18. Spears, W. and Spears, D. (Eds.) “Physicomimetics: Physics-Based Swarm Intelligence.” Springer-Verlag, 2012.
  19. Spears, W. and Spears, D. (Eds.) 2012.
  20. The DARPA-sponsored ISAT meeting on “Trustworthy Adaptive Systems,” held at SRI in 2006.