New paper: “Incorrigibility in the CIRL Framework”

August 31, 2017|Matthew Graves|Papers

MIRI assistant research fellow Ryan Carey has a new paper out discussing situations where good performance in Cooperative Inverse Reinforcement Learning (CIRL) tasks fails to imply that software agents will assist or cooperate with programmers.

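The paper, titled “Incorrigibility in the CIRL Framework,” lays out four scenarios in which CIRL violates the four conditions for corrigibility defined in Soares et al. (2015). Abstract:

A value learning system has incentives to follow shutdown instructions, assuming the shutdown instruction provides information (in the technical sense) about which actions lead to valuable outcomes. However, this assumption is not robust to model mis-specification (e.g., in the case of programmer errors). We demonstrate this by presenting some Supervised POMDP scenarios in which errors in the parameterized reward function remove the incentive to follow shutdown commands. These difficulties parallel those discussed by Soares et al. (2015) in their paper on corrigibility.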

We argue that it is important to consider systems that follow shutdown commands under some weaker set of assumptions (e.g., that one small verified module is correctly implemented; as opposed to an entire prior probability distribution and/or parameterized reward function). We discuss some difficulties with simple ways to attempt to attain these sorts of guarantees in a value learning framework.

The paper is a response to a paper by Hadfield-Menell, Dragan, Abbeel, and Russell, “The Off-Switch Game.” Hadfield-Menell et al. show that an AI system will be more responsive to human inputs when it is uncertain about its reward function and thinks that its human operator has more information about this reward function. Carey shows that the CIRL framework can be used to formalize the problem of corrigibility, and that the known assurances for CIRL systems, given in “The Off-Switch Game”, rely on strong assumptions about having an error-free CIRL system. With less idealized assumptions, a value learning agent may have beliefs that cause it to evade redirection from the human.
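The incentive described above can be illustrated with a toy calculation. The sketch below is not the model from either paper: it assumes a single proposed action with unknown utility U, a human who knows U and permits the action exactly when U > 0, and a robot that compares acting directly, shutting itself down, and deferring to the human. The function name `option_values` and the specific belief distributions are hypothetical choices made for illustration.

```python
import random


def option_values(utility_samples):
    """Expected payoff of each robot option in a toy off-switch game.

    utility_samples are draws from the robot's belief about the utility U of
    its proposed action. Assumption: the human knows U and permits the action
    exactly when U > 0.
    """
    n = len(utility_samples)
    return {
        "act": sum(utility_samples) / n,  # act without consulting the human
        "shut_down": 0.0,                 # switch itself off
        # Defer: the human lets the action through only when U > 0.
        "defer": sum(max(u, 0.0) for u in utility_samples) / n,
    }


random.seed(0)

# A robot that is genuinely uncertain about U gains from deferring.
uncertain_values = option_values(
    [random.gauss(0.5, 1.0) for _ in range(10_000)]
)

# A robot whose (possibly misspecified) belief is a point mass gains nothing.
overconfident_values = option_values([2.0] * 10_000)

print(uncertain_values)
print(overconfident_values)
```

Under the uncertain belief, deferring is worth strictly more than acting, because the human filters out the cases where U is negative. Under the point-mass belief the two options are worth the same, so the robot has no positive incentive to accept oversight. If that belief came from a programmer error and the true U were negative, nothing in this toy model would push the robot toward allowing shutdown.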

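The purpose of a shutdown button is to shut the AI system down in the event that all other assurances have failed, e.g., in the event that the AI system is ignoring (for one reason or another) the instructions of the operators. If the designers of [the AI system] R have programmed the system so perfectly that the prior and [reward function] R are completely free of errors, then the theorems of Hadfield-Menell et al. (2017) do apply. In practice, this means that in order to be corrigible, it would be necessary to have an AI system that was uncertain about all things that could possibly matter. The problem is that performing Bayesian reasoning over all possible worlds and all possible value functions is quite intractable. Realistically, humans will likely have to use a large number of heuristics and approximations in order to implement the system’s belief system and updating rules. […]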

Soares et al. (2015) seem to want a shutdown button that works as a mechanism of last resort, to shut an AI system down in cases where it has observed and refused a programmer suggestion (and the programmers believe that the system is malfunctioning). Clearly, some part of the system must be working correctly in order for us to expect the shutdown button to work at all. However, it seems undesirable for the working of the button to depend on there being zero critical errors in the specification of the system’s prior, the specification of the reward function, the way it categorizes different types of actions, and so on. Instead, it is desirable to develop a shutdown module that is small and simple, with code that could ideally be rigorously verified, and which ideally works to shut the system down even in the event of large programmer errors in the specification of the rest of the system.

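In order to do this in a value learning framework, we require a value learning system that (i) is capable of having its actions overridden by a small verified module that watches for shutdown commands; (ii) has no incentive to remove, damage, or ignore the shutdown module; and (iii) has some small incentive to keep its shutdown module around; even under a broad range of cases where R, the prior, the set of available actions, etc. are misspecified.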

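Even if the utility function is learned, there is still a need for additional lines of defense against unforeseen failures. The hope is that this can be achieved by modularizing the AI system. For that purpose, we would need a model of an agent that will behave corrigibly in a way that is robust to misspecification of other system components.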


August 2017 Newsletter

August 16, 2017|Rob Bensinger|Newsletters

Research updates

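“A Formal Approach to the Problem of Logical Non-Omniscience”: We presented our work on logical induction at the 16th Conference on Theoretical Aspects of Rationality and Knowledge.
New at IAFF: Smoking Lesion Steelman; “Like This World, But…”; Jessica Taylor’s Current Thoughts on Paul Christiano’s Research Agenda; Open Problems Regarding Counterfactuals: An Introduction For Beginners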
“A Game-Theoretic Analysis of The Off-Switch Game”: researchers from Australian National University and Linköping University release a new paper on corrigibility, spun off from a MIRIx workshop.

General updates

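Daniel Dewey of the Open Philanthropy Project writes up his current thoughts on MIRI’s highly reliable agent design work, with discussion from Nate Soares and others in the comments section.
Sarah Marquart of the Future of Life Institute discusses MIRI’s work on logical inductors, corrigibility, and other topics.
We attended the Workshop on Decision Theory & the Future of Artificial Intelligence and the 5th International Workshop on Strategic Reasoning.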

News and links

Open Phil awards a four-year $2.4 million grant to Yoshua Bengio’s group at the Montreal Institute for Learning Algorithms “to support technical research on potential risks from advanced artificial intelligence”.
A new IARPA-commissioned report discusses the potential for AI to accelerate technological innovation and lead to “a self-reinforcing technological and economic edge”. The report suggests that AI “has the potential to be a worst-case scenario” in combining high destructive potential, military/civil dual use, and difficulty of monitoring with potentially low production difficulty.
Elon Musk and Mark Zuckerberg criticize each other’s statements on AI risk.
China makes plans for major investments in AI (full text, translation note).
Microsoft opens a new AI lab with a goal of building “more general artificial intelligence”.
New from the Future of Humanity Institute: “Trial without Error: Towards Safe Reinforcement Learning via Human Intervention.”
FHI is seeking two research fellows to study AI macrostrategy.
Daniel Selsam and others release certigrad (arXiv, github), a system for creating formally verified machine learning systems; see discussion on Hacker News (1, 2).
Applications are open for the Center for Applied Rationality’s AI Summer Fellows Program, which runs September 8–25.

July 2017 Newsletter

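July 25, 2017|Rob Bensinger|Newsletters

A number of major mid-year MIRI updates: we received our largest donation to date, $1.01 million from an Ethereum investor! Our research priorities have also shifted somewhat, reflecting the addition of four new full-time researchers (Marcello Herreshoff, Sam Eisenstat, Tsvi Benson-Tilsen, and Abram Demski) and the departure of Patrick LaVictoire and Jessica Taylor.

Research updates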

New at IAFF: Futarchy Fix, Cooperative Oracles: Stratified Pareto Optima and Almost Stratified Pareto Optima
New at AI Impacts: Some Survey Results!, AI Hopes and Fears in Numbers

General updates

We attended the Effective Altruism Global Boston event. Speakers included Allan Dafoe on “The AI Revolution and International Politics” (video) and Jason Matheny on “Effective Altruism in Government” (video).
MIRI COO Malo Bourgon moderated an IEEE workshop revising a section of Ethically Aligned Design.

News and links

New from DeepMind researchers: “Interpreting Deep Neural Networks Using Cognitive Psychology”
New from OpenAI researchers: “Corrigibility”
A collaboration between DeepMind and OpenAI: “Learning from Human Preferences”
Recent progress in deep learning: “Self-Normalizing Neural Networks”
From Ian Goodfellow and Nicolas Papernot: “The Challenge of Verification and Testing of Machine Learning”
From 80,000 Hours: a guide to working in AI policy and strategy and a related interview with Miles Brundage of the Future of Humanity Institute.

Updates to the research team, and a major donation

July 4, 2017|Malo Bourgon|News

We have several major announcements to make, covering new developments in the two months since our 2017 strategy update:

1. On May 30th, we received a surprise $1.01 million donation from an Ethereum cryptocurrency investor. This is the single largest contribution we have received to date by a large margin, and will have a substantial effect on our plans over the coming year.

2. Two new full-time researchers are joining MIRI: Tsvi Benson-Tilsen and Abram Demski. This comes in the wake of Sam Eisenstat and Marcello Herreshoff’s addition to the team in May. We’ve also begun working with engineers on a trial basis for our new slate of software engineer job openings.

3. Two of our researchers have recently left: Patrick LaVictoire and Jessica Taylor, researchers previously heading work on our “Alignment for Advanced Machine Learning Systems” research agenda.

For more details, see below.

June 2017 Newsletter

June 16, 2017|Rob Bensinger|Newsletters

Research updates

A new AI Impacts paper: “When Will AI Exceed Human Performance?” News coverage at Digital Trends and MIT Technology Review.
New at IAFF: Cooperative Oracles; Jessica Taylor on the AAMLS Agenda; An Approach to Logically Updateless Decisions
Our 2014 technical agenda, “Agent Foundations for Aligning Machine Intelligence with Human Interests,” is now available as a book chapter in the anthology The Technological Singularity: Managing the Journey.

General updates

readthesequences.com: supporters have put together a web version of Eliezer Yudkowsky’s Rationality: From AI to Zombies.
The Oxford Prioritisation Project publishes a model of MIRI’s work as an existential risk intervention.

News and links

From MIT Technology Review: “Why Google’s CEO Is Excited About Automating Artificial Intelligence.”
A new alignment paper from researchers at Australian National University and DeepMind: “Reinforcement Learning with a Corrupted Reward Channel.”
New from OpenAI: Baselines, a tool for reproducing reinforcement learning algorithms.
The Future of Humanity Institute and Centre for the Future of Intelligence join the Partnership on AI alongside twenty other groups.
New AI safety job postings include research roles at the Future of Humanity Institute and the Center for Human-Compatible AI, as well as a UCLA PULSE fellowship for studying AI’s potential large-scale consequences and appropriate preparations and responses.

May 2017 Newsletter

May 10, 2017|Rob Bensinger|Newsletters

Research updates

New at IAFF: The Ubiquitous Converse Lawvere Problem; Two Major Obstacles for Logical Inductor Decision Theory; Generalizing Foundations of Decision Theory II.
New at AI Impacts: Guide to Pages on AI Timeline Predictions
“Decisions Are For Making Bad Outcomes Inconsistent”: Nate Soares dialogues on some of the deeper issues raised by our “Cheating Death in Damascus” paper.
We ran a machine learning workshop in early April.
“Ensuring Smarter-Than-Human Intelligence Has a Positive Outcome”: Nate’s talk at Google (video) provides probably the best general introduction to MIRI’s work on AI alignment.

General updates

Our strategy update discusses changes to our AI forecasts and research priorities, new outreach goals, a MIRI/DeepMind collaboration, and other news.
MIRI is hiring software engineers! If you’re a programmer who’s passionate about MIRI’s mission and wants to directly support our research efforts, apply here to trial with us.
MIRI Assistant Research Fellow Ryan Carey has taken on an additional affiliation with the Centre for the Study of Existential Risk, and is also helping edit an issue of Informatica on superintelligence.

News and links

DeepMind researcher Viktoriya Krakovna lists security highlights from ICLR.
DeepMind is seeking applicants for a policy research position “to carry out research on the social and economic impacts of AI”.
The Center for Human-Compatible AI is hiring an assistant director. Interested parties may also wish to apply for the event coordinator position at the new Berkeley Existential Risk Initiative, which will help support work at CHAI and elsewhere.
80,000 Hours lists other potentially high-impact openings, including ones at Stanford’s AI Index project, the White House OSTP, IARPA, and IVADO.
New papers: “One-Shot Imitation Learning” and “Stochastic Gradient Descent as Approximate Bayesian Inference.”
The Open Philanthropy Project summarizes its findings on early field growth.
The Centre for Effective Altruism is collecting donations for the Effective Altruism Funds in a range of cause areas.

2017 Updates and Strategy

April 30, 2017|Rob Bensinger|MIRI Strategy

In our last strategy update (August 2016), Nate wrote that MIRI’s priorities were to make progress on our agent foundations agenda and begin work on our new “Alignment for Advanced Machine Learning Systems” agenda, to collaborate and communicate with other researchers, and to grow our research and ops teams.

Since then, senior staff at MIRI have reassessed their views on how far off artificial general intelligence (AGI) is and concluded that shorter timelines are more likely than they were previously thinking. A few lines of recent evidence point in this direction, such as:¹

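AI research is becoming more visibly exciting and well-funded. This suggests that more top talent (in the next generation as well as the current generation) will probably turn their attention to AI.
AGI is attracting more scholarly attention as an idea, and is the stated goal of top AI groups like DeepMind, OpenAI, and FAIR. In particular, many researchers seem more open to thinking about general intelligence now than they did a few years ago.
Research groups associated with AGI are showing much clearer external signs of profitability.
AI successes like AlphaGo indicate that it’s easier to outperform top humans in domains like Go (without any new conceptual breakthroughs) than might have been expected.² This lowers our estimate for the number of significant conceptual breakthroughs needed to rival humans in other domains.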

There’s no consensus among MIRI researchers on how long timelines are, and our aggregated estimate puts medium-to-high probability on scenarios in which the research community hasn’t developed AGI by, e.g., 2035. On average, however, research staff now assign moderately higher probability to AGI’s being developed before 2035 than we did a year or two ago. This has a few implications for our strategy:

1. Our relationships with current key players in AGI safety and capabilities play a larger role in our strategic thinking. Short-timeline scenarios reduce the expected number of important new players who will enter the space before we hit AGI, and increase how much influence current players are likely to have.

2. Our research priorities are somewhat different, since shorter timelines change what research paths are likely to pay out before we hit AGI, and also concentrate our probability mass more on scenarios where AGI shares various features in common with present-day machine learning systems.

Both updates represent directions we’ve already been trending in for various reasons.³ However, we’re moving in these two directions more quickly and confidently than we were last year. As an example, Nate is spending less time on staff management and other administrative duties than in the past (having handed these off to MIRI COO Malo Bourgon) and less time on broad communications work (having delegated a fair amount of this to me), allowing him to spend more time on object-level research, research prioritization work, and more targeted communications.⁴

I’ll lay out what these updates mean for our plans in more concrete detail below.

Software Engineer Internship / Staff Openings

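April 30, 2017|Alex Vermeer|News

The Machine Intelligence Research Institute is looking for highly capable software engineers to directly support our AI alignment research efforts, with a focus on projects related to machine learning. We’re seeking engineers with strong programming skills who are passionate about MIRI’s mission and looking for challenging and intellectually engaging work.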

While our goal is to hire full-time, we are initially looking for paid interns. Successful internships may then transition into staff positions.

About the Internship Program

The start time for interns is flexible, but we’re aiming for May or June. We will likely run several batches of internships, so if you are interested but unable to start in the next few months, do still apply. The length of the internship is flexible, but we’re aiming for 2–3 months.

Examples of the kinds of work you’ll do during the internship:

Replicate recent machine learning papers, and implement variations.
Learn about and implement machine learning tools (including results in the fields of deep learning, convex optimization, etc.).
Run various coding experiments and projects, either independently or in small groups.
Rapidly prototype, implement, and test AI alignment ideas related to machine learning (after demonstrating successes in the above points).

For MIRI, the benefit of this program is that it’s a great way to get to know you and assess you for a potential hire. For applicants, the benefits are that this is an excellent opportunity to get your hands dirty and level up your machine learning skills, and to get to the cutting edge of the AI safety field, with a potential to stay in a full-time engineering role after the internship concludes.

Our goal is to trial many more people than we expect to hire, so our threshold for keeping on engineers long-term as full staff will be higher than for accepting applicants to our internship.

The Ideal Candidate

Some qualities of the ideal candidate:

Extensive breadth and depth of programming skills. Machine learning experience is not required, though it is a plus.
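Highly familiar with basic ideas related to AI alignment.
Able to work independently with minimal supervision, and in team/group settings.
Willing to accept a below-market rate. Since MIRI is a non-profit, we can’t compete with the Big Names in the Bay Area.
Enthusiastic about working at MIRI and helping advance the field of AI alignment.
Not looking for a “generic” software engineering position.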

Working at MIRI

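We strive to make working at MIRI a rewarding experience.

Modern Work Spaces - Many of us have adjustable standing desks with large external monitors. We consider workspace ergonomics important, and try to rig up workstations to be as comfortable as possible. Our office also provides free snacks, drinks, and meals.
Flexible Hours - We don’t have strict office hours, and we don’t limit employees’ vacation days. Our goal is to make rapid progress on our research agenda, and we would prefer that staff take a day off than that they extend tasks to fill an extra day.
Living in the Bay Area - MIRI’s office is located in downtown Berkeley, California. From our office, you’re a 30-second walk to the BART (Bay Area Rapid Transit), which can get you around the Bay Area; a 3-minute walk to the UC Berkeley campus; and a 30-minute BART ride to downtown San Francisco.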

EEO & Employment Eligibility

MIRI is an equal opportunity employer. We are committed to making employment decisions based on merit and value. This commitment includes complying with all federal, state, and local laws. We desire to maintain a work environment free of harassment or discrimination due to sex, race, religion, color, creed, national origin, sexual orientation, citizenship, physical or mental disability, marital status, familial status, ethnicity, ancestry, status as a victim of domestic violence, age, or any other status protected by federal, state, or local laws.

Apply

If interested, click here to apply. For questions or comments, email engineering@www.hdjkn.com.

Update (December 2017): We’re now putting less emphasis on finding interns and looking for highly skilled engineers available for full-time work. Updated job post here.
