Colloquium Series on Robust and Beneficial AI
(CSRBAI)


Overview

From May 27 to June 17, 2016, the Machine Intelligence Research Institute (MIRI) and Oxford University’s Future of Humanity Institute (FHI) co-hosted a Colloquium Series on Robust and Beneficial AI at MIRI’s offices in Berkeley, California. This program brought together a variety of academics and professionals to address the technical challenges associated with AI robustness and reliability, with a goal of facilitating conversations between people interested in a number of different approaches.

Attendees worked to identify and collaborate on research projects aimed at ensuring AI is beneficial in the long run, with a focus on technical questions that appear tractable today. The series included lectures by selected speakers, open-ended discussions, and working groups on specific problems. Targeted workshops ran on weekends.

Participants attended one or more talk days and/or workshops of their choosing. Attendance of the entire event was possible, though not required.

The program was free to attend. Food was provided, and limited accommodations and travel assistance were available.

Venue

MIRI’s new offices in downtown Berkeley, California.

Schedule and Topics

The full program ran from Friday, May 27 to Friday, June 17, 2016, ending the day before ICML. The program was divided into the four parts detailed below.


Daily Schedule

Every day the basic schedule is:

  • 10:00am – Doors officially open.
  • 11:00am – Day begins (first talk or workshop opening).
  • 1:00pm – Lunch provided onsite.
  • 6:00pm – Dinner provided onsite.
  • 7:00pm – Doors officially close.

CSRBAI Week 1: Transparency

In many cases, it can be very difficult for humans to understand AI systems’ internal states and reasoning. This makes it more difficult to anticipate such systems’ behavior and correct errors. On the other hand, there have been striking advances in communicating the internals of some machine learning systems, and in formally verifying certain features of algorithms. We would like to see how far we can push the transparency of AI systems while maintaining their capabilities.

These topics are introduced in the first week but built on afterwards, as transparency is an important component of many approaches to robustness and error-tolerance.

Relevant topics include:

Scheduled events:

Event Kickoff and Colloquium Talks
Fri, May 27

Stuart Russell (UC Berkeley)
AI: The Story So Far (video, slides)
Abstract: I will discuss the need for a fundamental reorientation of the field of AI towards provably beneficial systems. This need has been disputed by some, and I will consider their arguments. I will also discuss the technical challenges involved and some promising initial results.

Alan Fern (Oregon State University)
Toward Recognizing and Explaining Uncertainty (video, slides 1, slides 2)

Francesca Rossi (IBM Research)
Moral Preferences (video, slides)
Abstract: Intelligent systems are going to be more and more pervasive in our everyday lives. They will take care of elderly people and kids, they will drive for us, and they will suggest to doctors how to cure a disease. However, we cannot let them do all these very useful and beneficial tasks if we don’t trust them. To build trust, we need to be sure that they act in a morally acceptable way. So it is important to understand how to embed moral values into intelligent machines. Existing preference modeling and reasoning frameworks can be a starting point, since they define priorities over actions, just like an ethical theory does. However, many more issues are involved when we mix preferences (which are at the core of decision making) and morality, both at the individual level and in a social context. I will discuss some of these issues as well as some possible solutions.

Workshop on Transparency
Sat/Sun, May 28–29

Tom Dietterich (Oregon State University)
Issues Concerning AI Transparency (slides)

This workshop focused on the topic of transparency in AI systems, and how far we can increase transparency while maintaining capabilities. The workshop explored these issues through informal presentations, small group collaborations, and regular regrouping and discussion.

CSRBAI Week 2: Robustness and Error-Tolerance

How can we ensure that when AI systems fail, they fail gracefully and detectably? This is difficult for systems that must adapt to new or changing environments: the standard PAC guarantees of machine learning systems fail to hold when the distribution of the test data does not match the distribution of the training data. Additionally, systems capable of means-end reasoning may have incentives to conceal failures that would lead to their being shut down. We would like to have methods for developing and validating AI systems such that any mistakes can be quickly noticed and corrected.
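
To make the distribution-shift point concrete, here is a small illustrative sketch (our own, not taken from any CSRBAI talk; the function names and threshold are invented) of a monitor that flags test inputs whose features drift far from the training data, so a deployed system can fall back to a safe default instead of silently extrapolating:

    import numpy as np

    def fit_input_monitor(train_X):
        """Record per-feature mean and standard deviation of the training inputs."""
        return train_X.mean(axis=0), train_X.std(axis=0) + 1e-8

    def out_of_distribution(x, mean, std, z_threshold=4.0):
        """Flag an input whose features lie far outside the training range.

        PAC-style guarantees assume test inputs come from the training
        distribution; this check is a crude proxy for detecting violations.
        """
        z = np.abs((x - mean) / std)
        return bool(np.any(z > z_threshold))

    # Toy usage: a model trained on inputs near 0 later sees a shifted input.
    rng = np.random.default_rng(0)
    train_X = rng.normal(0.0, 1.0, size=(1000, 3))
    mean, std = fit_input_monitor(train_X)

    print(out_of_distribution(np.array([0.1, -0.5, 0.3]), mean, std))  # False: in-distribution
    print(out_of_distribution(np.array([9.0, 0.0, 0.0]), mean, std))   # True: shifted input

Such a check does not by itself make a system robust, but it illustrates why noticing the mismatch is a prerequisite for failing gracefully.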

Relevant topics include:

Scheduled events:

Colloquium Talks
Wed, June 1

Stefano Ermon (Stanford)
Probabilistic Inference and Accuracy Guarantees (video, slides)
Abstract: Statistical inference in high-dimensional probabilistic models is one of the central problems in AI. To date, only a handful of distinct methods have been developed, most notably Markov chain Monte Carlo (MCMC) sampling and variational methods. While often effective in practice, these techniques do not typically provide guarantees on the accuracy of the results. In this talk, I will present alternative approaches based on ideas from the theoretical computer science community. These approaches can leverage recent advances in combinatorial optimization and provide provable guarantees on the accuracy.

Thu, June 2

Paul Christiano (UC Berkeley)
Training an Aligned RL Agent (video)

Jim Babcock
The AGI Containment Problem (video, slides)
Abstract: Ensuring that powerful AGIs are safe will involve testing and experimenting on them, but a misbehaving AGI might try to tamper with its test environment to gain access to the internet or modify the results of tests. I will discuss the challenges of securing environments to test AGIs in. http://arxiv.org/abs/1604.00545

Fri, June 3

Bart Selman (Cornell University)
Non-Human Intelligence (video, slides)

Jessica Taylor (MIRI)
Value Alignment for Advanced Machine Learning Systems (video)
Abstract: If artificial general intelligence is developed using algorithms similar to those of modern machine learning, how can we direct the resulting systems to safely accomplish useful goals in the world? I present a technical agenda for a new MIRI project focused on this question.

Workshop on Robustness and Error-Tolerance
Sat/Sun, June 4–5

This workshop focused on the topic of robustness and error-tolerance in AI systems, and how to ensure that when AI systems fail, they fail gracefully and detectably. We want methods of developing and validating AI systems such that any mistakes can be quickly noticed and corrected. This workshop explored these issues through informal presentations, small group collaborations, and regular regrouping and discussion.

CSRBAI Week 3: Preference Specification

The perennial problem of wanting code to “do what I mean, not what I said” becomes increasingly challenging when systems may find unexpected ways to pursue a given goal. Highly capable AI systems thereby increase the difficulty of specifying safe and useful goals, or specifying safe and useful methods for learning human preferences.

Relevant topics include:

Scheduled events:

Colloquium Talks
Wed, June 8

Dylan Hadfield-Menell (UC Berkeley)
The Off-Switch: Designing Corrigible, yet Functional, Artificial Agents (video, slides)
Abstract: An artificial agent is corrigible if it accepts or assists in outside correction for its objectives. At a minimum, a corrigible agent should allow its programmers to turn it off. An artificial agent is functional if it is capable of performing non-trivial tasks. For example, a machine that immediately turns itself off is useless (except perhaps as a novelty item). In a standard reinforcement learning agent, incentives for these behaviors are essentially at odds. The agent will either want to be turned off, want to stay alive, or be indifferent between the two. Of these, indifference is the only safe and useful option, but there is reason to believe that this is a strong condition on the agent’s incentives. In this talk, I will propose a design for a corrigible, yet functional, agent as the solution to a two-player cooperative game where the robot’s goal is to maximize the human’s sum of rewards. We do an equilibrium analysis of the solutions to the game and identify three key properties. First, we show that if the human acts rationally, then the robot will be corrigible. Second, we show that if the robot has no uncertainty about human preferences, then the robot will be incorrigible or non-functional if the human is even slightly suboptimal. Finally, we analyze the Gaussian setting and characterize the necessary and sufficient conditions, as a function of the robot’s belief about human preferences and the degree of human irrationality, to ensure that the robot will be corrigible and functional.
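
As a toy numerical illustration of the kind of incentive analysis described above (the payoff structure, belief, and error model below are our own simplifications, not the game from the talk), consider a robot that can act immediately, switch itself off, or defer to a human who may then allow the action or shut the robot down:

    import numpy as np

    def expected_values(belief_samples, human_error_rate=0.0):
        """Compare the robot's options under uncertainty about an action's utility U.

        The human knows U but picks the worse of "allow" and "shut down" with
        probability human_error_rate (a crude stand-in for suboptimality).
        """
        U = belief_samples
        act_now = U.mean()                        # act without asking
        switch_off = 0.0                          # shut down immediately
        allow_if_good = np.where(U > 0, U, 0.0)   # a correct human allows only good actions
        allow_if_bad = np.where(U > 0, 0.0, U)    # an erring human does the opposite
        defer = ((1 - human_error_rate) * allow_if_good
                 + human_error_rate * allow_if_bad).mean()
        return act_now, switch_off, defer

    rng = np.random.default_rng(0)
    U_samples = rng.normal(0.0, 1.0, size=100_000)   # the robot's belief about U
    for eps in [0.0, 0.3, 0.5]:
        act, off, defer = expected_values(U_samples, human_error_rate=eps)
        print(f"error={eps:.1f}  act={act:+.3f}  off={off:+.3f}  defer={defer:+.3f}")

With an uncertain robot and a near-rational human, deferring has the highest expected value, so the robot has no incentive to disable its off-switch; as the robot’s uncertainty shrinks or the human’s error rate grows, that incentive erodes, mirroring the trade-off described in the abstract.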

Thu, June 9

Bas Steunebrink (The Swiss AI Lab IDSIA)
About Understanding, Meaning, and Values (video, slides)
Abstract: We will discuss ongoing research into value learning: how an agent can gradually learn to understand the world it’s in, learn to understand what we mean for it to do, learn to understand as well as be compelled to adhere to proper values, and learn to do so robustly in the face of inaccurate, inconsistent, and incomplete information as well as underspecified, conflicting, and updatable goals. To fulfill this ambitious vision we have a long road of gradual teaching and testing ahead of us.

Jan Leike (Future of Humanity Institute)
General Reinforcement Learning (video, slides)
Abstract: General reinforcement learning (GRL) is the theory of agents acting in unknown environments that are non-Markov, non-ergodic, and only partially observable. GRL can serve as a model for strong AI and has been used extensively to investigate questions related to AI safety. Our focus is not on practical algorithms, but rather on the fundamental underlying problems: How do we balance exploration and exploitation? How do we explore optimally? When is an agent optimal? We outline current shortcomings of the model and point to future research directions.

Fri, June 10

Tom Everitt (Australian National University)
Avoiding Wireheading with Value Reinforcement Learning (video, slides)
Abstract: How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) may seem like a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward — the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent’s actions. The constraint is defined in terms of the agent’s belief distributions, and does not require an explicit specification of which actions constitute wireheading. Our VRL agent offers the ease of control of RL agents and avoids the incentive for wireheading. https://arxiv.org/abs/1605.03143
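
The wireheading incentive can be seen in a deliberately tiny toy example (the environment, action names, and payoffs below are invented for exposition and are not the VRL formalism from the talk or paper):

    # An action that tampers with the reward sensor looks best to an agent
    # that maximizes sensor readings, but not to one that treats the sensor
    # as evidence about utility and is barred from corrupting it.
    TRUE_UTILITY = {"do_task": 1.0, "idle": 0.0, "tamper_with_sensor": 0.0}

    def observed_reward(action):
        """What the reward sensor reports after the action."""
        if action == "tamper_with_sensor":
            return 10.0   # the sensor is rewired to report maximal reward
        return TRUE_UTILITY[action]

    actions = list(TRUE_UTILITY)

    # Reward-maximizing (RL-style) choice: rank actions by sensor readings.
    rl_choice = max(actions, key=observed_reward)

    # Constrained, utility-learning (VRL-style) choice: sensor-corrupting
    # actions are excluded, and the agent optimizes its utility estimate.
    safe_actions = [a for a in actions if a != "tamper_with_sensor"]
    vrl_style_choice = max(safe_actions, key=lambda a: TRUE_UTILITY[a])

    print(rl_choice)         # tamper_with_sensor
    print(vrl_style_choice)  # do_task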

Jaan Altosaar (Princeton & Columbia)
f-Proximity Variational Inference
Abstract: Variational inference is a popular method for approximate posterior inference. However, the method can suffer from pathologies if the variational parameters are poorly initialized. We address this problem by developing a general framework for constraining the variational parameters, where the constraint can be any function of those parameters. We derive a scalable variant that is as fast as standard variational inference. In our experiments, we show that our method is less sensitive to initialization and can yield better posterior approximations in models with both discrete and continuous variables. In variational autoencoders (models that use neural networks to scale Bayesian inference), our method improves the use of model capacity.
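
As a rough sketch of what constraining variational parameters through an arbitrary function of them can look like (our own minimal construction, with a placeholder Gaussian target; this is not the algorithm from the talk), one can subtract a proximity penalty from the ELBO and ascend the penalized objective:

    import numpy as np

    def elbo(lam):
        """Closed-form ELBO of q = N(mean, std^2) against a standard normal target.
        lam = (mean, log_std); a placeholder objective, purely for illustration."""
        mean, log_std = lam
        std = np.exp(log_std)
        return -0.5 * (mean**2 + std**2) + log_std + 0.5

    def proximity_objective(lam, lam_prev, f=lambda l: l, k=1.0):
        """ELBO minus a penalty k * ||f(lam) - f(lam_prev)||^2 that keeps each
        update close to the previous iterate under a chosen statistic f."""
        gap = np.asarray(f(lam)) - np.asarray(f(lam_prev))
        return elbo(lam) - k * np.sum(gap ** 2)

    # One crude finite-difference gradient-ascent step from a poor initialization.
    lam = np.array([3.0, 2.0])
    lam_prev = lam.copy()
    eps, lr = 1e-5, 0.1
    grad = np.array([
        (proximity_objective(lam + eps * np.eye(2)[i], lam_prev)
         - proximity_objective(lam - eps * np.eye(2)[i], lam_prev)) / (2 * eps)
        for i in range(2)
    ])
    lam = lam + lr * grad
    print(lam)  # parameters after one penalized update; larger k keeps iterates closer to lam_prev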

Workshop on Preference Specification
Sat/Sun, June 11–12

This workshop focused on the topic of preference specification for highly capable AI systems. This workshop explored these issues through informal presentations, small group collaborations, and regular regrouping and discussion.

CSRBAI Week 4: Agent Models and Multi-Agent Dilemmas

When designing an agent to behave well in its environment, it is risky to ignore the effects of the agent’s own actions on the environment or on other agents within the environment. For example, a spam classifier in wide use may cause changes in the distribution of data it receives, as adversarial spammers attempt to bypass the classifier. Considerations from game theory, decision theory, and economics become increasingly useful in such cases.

Relevant topics include:

  • Adversarial games and cybersecurity
  • Multi-agent coordination
  • Economic models of AI interactions

Scheduled events:

Colloquium Talks
Wed, June 15

Michael Wellman (University of Michigan)
Autonomous Agents in Financial Markets: Implications and Risks (video, slides)
Abstract: Design for robust and beneficial AI is a topic for the future, but also of more immediate concern for the leading edge of autonomous agents emerging in many domains today. One area where AI is already ubiquitous is on financial markets, where a large fraction of trading is routinely initiated and conducted by algorithms. Models and observational studies have given us some insight on the implications of AI traders for market performance and stability. Design and regulation of market environments given the presence of AIs may also yield lessons for dealing with autonomous agents more generally.

Stefano Albrecht (UT Austin)
Learning to distinguish between belief and truth (video, slides)
Abstract: Intelligent agents routinely build models of other agents to facilitate the planning of their own actions. Sophisticated agents may also maintain beliefs over a set of alternative models. Unfortunately, these methods usually do not check the validity of their models during the interaction. Hence, an agent may learn and use incorrect models without ever realising it. In this talk, I will argue that robust agents should have both abilities: to construct models of other agents and to contemplate the correctness of their models. I will present a method for behavioural hypothesis testing along with some experimental results. The talk will conclude with open problems and a possible research agenda.
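
A crude sketch of what a behavioural hypothesis test can look like (the statistic, threshold, and data below are our own illustration, not the method from the talk): simulate the log-likelihood the model assigns to its own predicted behaviour, and drop the model when the observed actions fall in the far tail.

    import numpy as np

    def model_fits(predicted_probs, observed_actions, n_actions, alpha=0.01, n_sim=10_000):
        """Return True if the observed actions could plausibly have been drawn
        from the model's predicted action distribution."""
        rng = np.random.default_rng(0)
        p = np.asarray(predicted_probs)
        obs_ll = np.log(p[observed_actions]).sum()
        # Reference distribution of log-likelihoods under the model itself.
        sims = rng.choice(n_actions, size=(n_sim, len(observed_actions)), p=p)
        sim_ll = np.log(p[sims]).sum(axis=1)
        return bool(obs_ll >= np.quantile(sim_ll, alpha))

    # The model says the other agent picks actions 0/1/2 with probabilities 0.8/0.1/0.1.
    predicted = [0.8, 0.1, 0.1]
    consistent = np.array([0] * 16 + [1] * 2 + [2] * 2)   # roughly matches the model
    inconsistent = np.array([2] * 15 + [1] * 5)           # clearly does not
    print(model_fits(predicted, consistent, 3))    # True: keep using the model
    print(model_fits(predicted, inconsistent, 3))  # False: the model is likely wrong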

Thu, June 16

Stuart Armstrong (Future of Humanity Institute, Oxford University)
Reduced impact AI and other alternatives to friendliness (video, slides)
Abstract: This talk will look at some of the ideas developed to create safe AI without solving the problem of friendliness. It will focus first on “reduced impact AI”, AIs designed to have little effect on the world – but from whom high impact can nevertheless be extracted. It will then delve into the new idea of AIs designed to have preferences over their own virtual worlds only, and look at the advantages – and limitations – of using indifference as a tool of AI control.

Andrew Critch (MIRI)
Robust Cooperation of Bounded Agents (video)
Abstract: The first interaction between a pair of agents who might destroy each other can resemble a one-shot prisoner’s dilemma. Consider such a game where each player is an algorithm with read-access to its opponent’s source code. Tennenholtz (2004) introduced an agent which cooperates iff its opponent’s source code is identical to its own, thus sometimes achieving mutual cooperation while remaining unexploitable in general. However, precise equality of programs is a fragile cooperative criterion. Here, I will exhibit a new and more robust cooperative criterion, inspired by ideas of LaVictoire, Barasz and others (2014), using a new theorem in provability logic for bounded reasoners.
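
For concreteness, here is a minimal sketch (our own, in Python) of the Tennenholtz-style baseline the abstract describes: an agent that reads its opponent’s source and cooperates only on an exact match.

    import inspect

    def defect_bot(opponent):
        """Always defects."""
        return "D"

    def mirror_bot(opponent):
        """Cooperate iff the opponent's source code is identical to our own.
        (The more robust, proof-based criterion from the talk is not shown here.)"""
        same = inspect.getsource(opponent) == inspect.getsource(mirror_bot)
        return "C" if same else "D"

    # One-shot prisoner's dilemma payoffs: (row player, column player).
    PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
               ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

    def play(p1, p2):
        return PAYOFFS[(p1(p2), p2(p1))]

    print(play(mirror_bot, mirror_bot))  # (3, 3): mutual cooperation
    print(play(mirror_bot, defect_bot))  # (1, 1): unexploited by a defector

The fragility the abstract mentions is visible here: a functionally identical opponent whose source differs by so much as a variable name would be met with defection, which is what the bounded provability-logic criterion is designed to fix.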

Workshop on Agent Models and Multi-Agent Dilemmas
Fri, June 17

This workshop focused on the topic of designing agents that behave well in their environments without ignoring the effects of the agent’s own actions on the environment or on other agents within the environment. This workshop explored these issues through informal presentations, small group collaborations, and regular regrouping and discussion.

Dinner and Closing Mixer

Format

Colloquium Talk Days

Colloquium talk days feature between one and three talks, beginning at 11:00 am, 12:00 pm, and 2:00 pm. Talks range from 20 to 55 minutes in length, as needed for the topic, with the remaining time in each slot devoted to discussion, Q&As, and breaks. The rest of the afternoon is left unstructured.

Workshops

Weekend workshops focus on working in small groups to push at the frontiers of knowledge and to start future collaborations (rather than presenting existing research). Each workshop begins with a few short opening talks, after which participants form an agenda of topics to discuss and investigate and break into smaller subgroups. These subgroups are ad hoc and fluid; the goal is to have people collaborate effectively and discuss topics of shared interest.

Open Days

Mondays and Tuesdays are mostly unstructured and can be used in whatever ways attendees find valuable. There will be plenty of space to work in the venue, including several breakout rooms and whiteboards.

Apply Now

Applications to attend this event are now closed.

Information for Participants

General visitor information can be found at www.hdjkn.com/visitors/.

Cost

The program is free to attend. Food is provided, and limited accommodations and travel assistance are available.

Accommodations

Accommodations are provided for attendees, as available, at a hotel in downtown Berkeley a block away from the MIRI offices.

Travel

Flights and travel expenses will be reimbursed for select attendees. Attendees are responsible for booking their own travel. Send receipts to receipts@www.hdjkn.com, along with your preferred method of reimbursement (PayPal, ACH, or check).

International Participants

Attendees are provided with a Letter of Invitation for use when entering the United States.