2015 in review


As Luke had done in years past (see 2013 in review and 2014 in review), I (Malo) wanted to take some time to review our activities from last year. In the coming weeks Nate will provide a big-picture strategy update. Here, I’ll take a look back at 2015, focusing on our research progress, academic and general outreach, fundraising, and other activities.

After seeing signs in 2014 that interest in AI safety issues was on the rise, we made plans to grow our research team. Spurred by the response to Bostrom’s Superintelligence and the Future of Life Institute’s “Future of AI” conference, interest continued to grow in 2015. This suggested that we could afford to accelerate our plans, but it wasn’t clear how quickly.

In 2015 we did not release a mid-year strategic plan, as Luke did in 2014. Instead, we laid out various conditional strategies dependent on how much funding we raised during our 2015 Summer Fundraiser. The response was great; we had our most successful fundraiser to date. We hit our first two funding targets (and then some), and set out on an accelerated 2015/2016 growth plan.

As a result, 2015 was a big year for MIRI. After publishing our technical agenda at the start of the year, we made progress on many of the open problems it outlined, doubled the size of our core research team, strengthened our connections with industry groups and academics, and raised enough funds to maintain our growth trajectory. We’re very grateful to all our supporters, without whom this progress wouldn’t have been possible.

2015 Research Progress

Our “Agent Foundations for Aligning Machine Intelligence with Human Interests” research agenda divides open problems into three categories: high reliability (which includes logical uncertainty, naturalized induction, decision theory, and Vingean reflection), error tolerance, and value specification.1 MIRI’s top goal in 2015 was to make progress on these problems.

We met our expectations for research progress in each category, with the exception of logical uncertainty and naturalized induction (where we made more progress than expected) and error tolerance (where we made less progress than expected).

Below I’ve provided a brief summary of our progress in each area, with additional details and a full publication list in collapsed “Read More” sections. Some of the papers we published in 2015 were based on research from 2014 or earlier, and some of our 2015 results weren’t published until 2016 (or remain unpublished). In this review I’ll focus on 2015’s new technical developments, rather than on pre-2015 material that happened to be published in that year.

Logical Uncertainty and Naturalized Induction

We expected to make modest progress on these two problems in 2015. I’m pleased to report we made sizable progress.

2015 saw the tail end of our work developing reflective oracles, and early work on “optimal estimators.” Our most important research advance of the year, however, was likely our success dividing logical uncertainty into two subproblems, which happened in late 2015 and the very beginning of 2016.

One intuitive constraint on logically uncertain reasoning is that one’s probabilities should reflect known logical relationships between claims. For example, if you know that two claims are mutually exclusive (such as “this computation outputs a 3” and “this computation outputs a 7”), then even if you can’t evaluate the claims, you should assign probabilities to the two claims that sum to at most 1.
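This coherence constraint is easy to state operationally. The sketch below is a toy illustration only (not one of the algorithms discussed in this post; the claim names and probabilities are invented): it checks whether a set of assigned probabilities respects known mutual-exclusivity relationships.

```python
# Toy check of the coherence constraint: probabilities assigned to a set
# of claims known to be mutually exclusive should sum to at most 1, even
# when the claims themselves (e.g. outputs of an unevaluated computation)
# can't yet be checked directly.

def coherent(probs, exclusive_groups):
    """probs: dict mapping claim -> probability.
    exclusive_groups: iterable of sets of claims known to be mutually
    exclusive. Returns True iff every group's probabilities sum to <= 1."""
    return all(sum(probs[c] for c in group) <= 1.0
               for group in exclusive_groups)

beliefs = {"outputs 3": 0.4, "outputs 7": 0.5, "halts": 0.95}
print(coherent(beliefs, [{"outputs 3", "outputs 7"}]))  # True: 0.4 + 0.5 <= 1

beliefs["outputs 7"] = 0.7
print(coherent(beliefs, [{"outputs 3", "outputs 7"}]))  # False: 0.4 + 0.7 > 1
```

The hard part, of course, is not checking the constraint but designing a reasoner whose probability assignments satisfy it by construction.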

A second intuitive constraint is that one’s probabilities reflect empirical regularities. Once you observe enough digits of π, you should eventually guess that the numbers 8 and 3 occur equally often in π’s decimal expansion, even if you have not yet proven that π is normal.
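The empirical-regularity idea can be made concrete with a short sketch (again a toy illustration, not MIRI’s algorithm): compute a prefix of π’s decimal expansion and simply tally digit frequencies. With no proof of normality in hand, the tallies already justify assigning roughly equal probabilities to “8” and “3” appearing at an unseen position.

```python
# Tally digit frequencies in the first 1000 decimals of pi. An
# empirically-minded reasoner can use such tallies to assign
# probabilities long before any proof that pi is normal exists.
from decimal import Decimal, getcontext

def pi_digits(n):
    """Return the first n decimal digits of pi (after the "3."),
    computed with Machin's formula: pi = 16*atan(1/5) - 4*atan(1/239)."""
    getcontext().prec = n + 10                  # guard digits
    def arctan_inv(x):                          # atan(1/x) via Taylor series
        eps = Decimal(10) ** -(n + 5)
        total = term = Decimal(1) / x
        k, sign = 3, -1
        while term > eps:
            term = Decimal(1) / Decimal(x) ** k
            total += sign * term / k
            k, sign = k + 2, -sign
        return total
    pi = 16 * arctan_inv(5) - 4 * arctan_inv(239)
    return str(pi)[2:2 + n]                     # strip the leading "3."

digits = pi_digits(1000)
freq = {d: digits.count(str(d)) / len(digits) for d in range(10)}
# The observed frequencies of 8 and 3 are both already near 0.1.
print(freq[8], freq[3])
```

The challenge the “Asymptotic Logical Uncertainty and the Benford Test” line of work takes on is doing this kind of frequency-based extrapolation in a principled, general way, rather than ad hoc per sequence.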

In 2015, we developed two different algorithms to solve these two subproblems in isolation.

In collaboration with Benya Fallenstein and other MIRI researchers, Scott Garrabrant solved the problem of respecting logical relationships in a series of Intelligent Agent Foundations Forum (IAFF) posts, resulting in the “Inductive Coherence” paper. Scott and the MIRIxLosAngeles group solved the problem of respecting empirical regularities in “Asymptotic Logical Uncertainty and the Benford Test,” which was further developed into the “Asymptotic Convergence in Online Learning with Unbounded Delays” paper in 2016.

These two approaches to logical uncertainty were not only nonequivalent, but seemed to preclude each other. The obvious next step is to investigate whether there is a way to solve both subproblems at once with a single procedure—a task we have since made some (soon-to-be-announced) progress on in 2016.

MIRI research associate Vanessa Kosoy’s work on her “optimal estimators” framework represents a large separate corpus of work on logical uncertainty, which may also have applications for decision theory. Vanessa’s work has not yet been officially published, but much of it is available on IAFF.

Our other significant result in logical uncertainty was Benya Fallenstein, Jessica Taylor, and Paul Christiano’s reflective oracles, building on work that began before 2015 (IAFF digest). Reflective oracles avoid many of the paradoxes that normally arise when agents attempt to answer questions about equivalently powerful agents, allowing us to study multi-agent dilemmas and reflective reasoning with greater precision.

Reflective oracles are interesting in their own right, and have proven applicable to a number of distinct open problems. The fact that reflective oracles require no privileged agent/environment distinction suggests that they’re a step in the right direction for naturalized induction. Jan Leike has recently demonstrated that reflective oracles also solve a longstanding open problem in game theory, the grain of truth problem. Reflective oracles thereby give game theory its first complete decision-theoretic foundation, showing that general-purpose methods for maximizing expected utility can achieve approximate Nash equilibria in repeated games.

In summary, our 2015 logical uncertainty and naturalized induction papers based on pre-2015 work were:

2015 research published the same year:

2015 research published in 2016 or forthcoming:

For other logical uncertainty work on IAFF, see The Two-Update Problem, Subsequence Induction, and Strict Dominance for the Modified Demski Prior.

Decision Theory

In 2015 we produced a number of new incremental advances in decision theory, constituting modest progress, in line with our expectations.

Of these advances, we have published Andrew Critch’s proof of a version of Löb’s theorem and Gödel’s second incompleteness theorem that holds for bounded reasoners.

Critch applies this parametric bounded version of Löb’s theorem to prove that a wide range of resource-limited software agents, given access to each other’s source code, can achieve unexploitable mutual cooperation in the one-shot prisoner’s dilemma. Although we considered our past robust cooperation results strong reason to believe that bounded cooperation was possible, the confirmation is useful and gives us new formal tools for studying bounded reasoners.
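To convey the flavor of such results, here is a toy, simulation-based analogue. To be clear, this is not Critch’s proof-based construction and makes no unexploitability guarantee; it is a hypothetical sketch in which each agent receives the other’s decision procedure plus a computation budget, and cooperates only if its bounded simulation of the opponent cooperates back.

```python
# Toy bounded "FairBot"-style agents in the one-shot prisoner's dilemma.
# Each agent is a function taking (opponent, budget) and returning
# "C" (cooperate) or "D" (defect).

def fair_bot(opponent, budget):
    """Cooperate iff a bounded simulation of the opponent (playing
    against us) cooperates. When the budget runs out, optimistically
    assume "C" -- the choice that lets two fair_bots bottom out in
    mutual cooperation instead of mutual defection."""
    if budget == 0:
        return "C"
    return "C" if opponent(fair_bot, budget - 1) == "C" else "D"

def defect_bot(opponent, budget):
    """Always defect, regardless of the opponent."""
    return "D"

print(fair_bot(fair_bot, 10))    # mutual cooperation: "C"
print(fair_bot(defect_bot, 10))  # no exploitation by defectors: "D"
```

The optimistic base case at budget zero is a hack standing in for the real machinery: in the actual result, it is the parametric bounded Löb theorem that licenses proof-based agents to “bottom out” in cooperation without being exploitable.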

Over this period, Eliezer Yudkowsky, Benya Fallenstein, and Nate Soares also improved our technical (and philosophical) understanding of the decision theory we currently favor, “functional decision theory”—a slightly modified version of updateless decision theory.

The biggest obstacle to formalizing decision theory currently seems to be that we lack a suitable formal account of logical counterfactuals. Logical counterfactuals are questions of the form “If X (which I know to be false) were true, what (if anything) would that imply about Y?” These are important in decision theory, one special case being off-policy predictions. (Even if I can predict that I’m definitely not taking action X, I want to be able to ask what would ensue if I did; a wrong answer to this can lead to me accepting substandard self-fulfilling prophecies like two-boxing in the transparent Newcomb problem.)


We explored some proof-length-based approaches to logical counterfactuals, and ultimately rejected them, though we have continued to devote some thought to this approach. During our first 2015 workshop, Scott Garrabrant proposed an informal conjecture on proof length and counterfactuals, which was subsequently revised; however, both versions of the conjecture were shown to be false by Sam Eisenstat (1, 2). (See also Scott’s Optimal and Causal Counterfactual Worlds.)

In a separate line of research, Patrick LaVictoire and others applied the proof-based decision theory framework to questions of bargaining and division of trade gains. For other decision theory work on IAFF, see Vanessa and Scott’s Superrationality in Arbitrary Games and Armstrong’s Reflective Oracles and Superrationality: Prisoner’s Dilemma.

Our GitHub repository contains lots of new code from our work on modal agents, representing our most novel work on decision theory in the past year. We have one or two papers in progress that will explain the advances we’ve made in decision theory via this work. See “Evil” Decision Problems in Provability Logic and other posts in the decision theory IAFF digest for background on modal universes.

Pre-2015 work published in 2015:

2015 research published in 2016 or forthcoming:

Vingean Reflection

We were expecting modest progress on these problems in 2015, and we made modest progress.

Benya Fallenstein and Ramana Kumar’s “Proof-Producing Reflection for HOL” demonstrates a practical form of self-reference (and a partial solution to both the Löbian obstacle and the procrastination paradox) in the HOL theorem prover. This result provides some evidence that it is possible for a reasoning system to trust another reasoning system that reasons the same way, so long as the systems have different internal states.


There is some internal debate within MIRI about what more is required for real-world Vingean reflection, aside from satisfactory accounts of logical uncertainty and logical counterfactuals. There’s also debate about whether any better results than this are likely to be possible in the absence of a full theory of logical uncertainty. Regardless, “Proof-Producing Reflection for HOL” demonstrates, via machine-checked proof, that it is possible to implement a form of reflective reasoning that is remarkably strong.

Benya and Ramana’s work also provides us with a setting in which to build better toy models of reflective reasoning. MIRI research intern Jack Gallagher is currently implementing a cellular automaton in HOL that will let us implement reflective agents.

By applying results from the reflective oracles framework mentioned above, we also improved our theoretical understanding of Vingean reflection. In the IAFF post A Limit-Computable, Self-Reflective Distribution, research associate Tsvi Benson-Tilsen helped solidify our understanding of what kinds of reflection are and aren’t possible. Jessica, working with Benya and Paul, further showed that reflective oracles can’t readily be used to define reflective probabilistic logics.

Pre-2015 work published in 2015:

2015 research published the same year:

Other relevant IAFF posts include A Simple Model of the Löbstacle, Waterfall Truth Predicates, and Existence of Distributions that are Expectation-Reflective and Know It.

Error Tolerance

We were expecting modest progress on these problems in 2015, but we made only limited progress.

Corrigibility was a mid-level priority for us in 2015, and we spent some effort trying to build better models of corrigible agents. In spite of this, we didn’t achieve any big breakthroughs. We made some progress on fixing minor defects in our understanding of corrigibility, reflected, e.g., in our error-tolerance IAFF digest, Stuart Armstrong’s AI control ideas, and Jessica Taylor’s overview post; but these results are relatively small.

In 2015 our main novelties were Google DeepMind researcher Laurent Orseau and FHI researcher / MIRI research associate Stuart Armstrong’s work on corrigibility (“Safely Interruptible Agents”), along with work on two other error tolerance subproblems: mild optimization (Jessica’s Quantilizers and Abram Demski’s Structural Risk Minimization) and conservative concepts (Jessica’s Learning a Concept Using Only Positive Examples).
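To illustrate the mild-optimization idea, here is a minimal sketch of a quantilizer in the spirit of Jessica’s paper (the action space and utility function below are invented for illustration): rather than taking the argmax action, which maximally exploits any errors in a proxy utility function, sample from the top q fraction of actions under a trusted base distribution.

```python
# Minimal quantilizer sketch: sample uniformly from the top q-fraction
# of actions ranked by a (possibly flawed) proxy utility, assuming a
# uniform base distribution over the listed actions.
import random

def quantilize(actions, utility, q, rng=random):
    """Return one action sampled uniformly from the top q-fraction of
    `actions` by `utility` (at least one action is always retained)."""
    ranked = sorted(actions, key=utility, reverse=True)
    cutoff = max(1, int(q * len(ranked)))
    return rng.choice(ranked[:cutoff])

actions = list(range(100))           # toy action space
utility = lambda a: -abs(a - 70)     # proxy utility, peaked at action 70
print(quantilize(actions, utility, q=0.1))  # one of the 10 best actions
```

Note that q = 1 recovers sampling from the base distribution (maximally conservative), while small q approaches argmax (maximally exploitative), so q parameterizes how hard the agent leans on its proxy utility.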

Pre-2015 work published in 2015:

  • N Soares, B Fallenstein, E Yudkowsky, S Armstrong. “Corrigibility.” 2014 tech report presented at the AAAI 2015 Ethics and Artificial Intelligence Workshop.

2015 research published in 2016 or forthcoming:

Our failure to make much progress on corrigibility may be a sign that corrigibility is not as tractable a problem as we thought, or that more progress is needed in areas like logical uncertainty (so that we can build better models of AI systems that model their operators as uncertain about the implications of their preferences) before we can properly formalize corrigibility.

We are more optimistic about corrigibility research, however, in light of recent advances in logical uncertainty and some promising discussions of related topics at our recent colloquium series: “Cooperative Inverse Reinforcement Learning” (via Stuart Russell’s group), “Avoiding Wireheading with Value Reinforcement Learning” (via Tom Everitt), and some items in Stuart Armstrong’s bag of tricks.

Value Specification

We were expecting limited progress on these problems in 2015, and we made limited progress.

Value learning and related problems were low-priority for us last year, so we didn’t see any big advances.

MIRI research associate Kaj Sotala made value specification his focus, examining several interesting questions outside our core research agenda. Jessica Taylor also began investigating the problem on the research forum.

Pre-2015 work published in 2015:

2015 research published in 2016 or forthcoming:

Error-tolerant agent designs and value specification will be larger focus areas for us going forward, under the Alignment for Advanced Machine Learning Systems research program.


Other Research

We released our technical agenda in late 2014 and early 2015. The overview paper, “Agent Foundations for Aligning Machine Intelligence with Human Interests,” is slated for external publication in The Technological Singularity in 2017.

In 2015 we also produced some research unrelated to our agent foundations agenda. This research generally focused on forecasting and strategy questions.

Pre-2015 work published in 2015:

2015 research published in 2016 or forthcoming:

Beginning in 2015, new AI strategy/forecasting research supported by MIRI has been hosted on Katja Grace’s independent AI Impacts project. AI Impacts featured 31 new articles and 27 new blog posts in 2015, on topics from the range of human intelligence to computing cost trends.

On the whole, we’re happy about our 2015 research output and expect our team growth to further accelerate technical progress.

2015 Research Support Activities

Focusing on activities that directly grew the technical research community or facilitated technical research and collaborations, in 2015 we:

  • Launched the Intelligent Agent Foundations Forum, a public discussion forum for AI alignment researchers. MIRI researchers and collaborators made 139 top-level posts to IAFF in 2015.
  • Hired four new full-time research fellows. Patrick LaVictoire joined in March, Jessica Taylor in August, Andrew Critch in September, and Scott Garrabrant in December. With Nate transitioning to a non-research role, overall we grew from a three-person research team (Eliezer, Benya, and Nate) to a six-person team.
  • Overhauled our research associates program. Prior to 2015, our research associates were mostly unpaid collaborators who participated in our active research to varying degrees. Following our successful summer fundraiser, we made “research associate” a paid position for researchers at other institutions who spend significant time on our research projects. Under this program, Stuart Armstrong, Tsvi Benson-Tilsen, Abram Demski, Vanessa Kosoy, Ramana Kumar, Kaj Sotala, and others have made significant contributions in collaborating roles.
  • Hired three research interns. Kaya Stechly and Rafael Cosman worked on polishing and consolidating old MIRI results (example on IAFF), while Jack Gallagher worked on our type theory in type theory project (GitHub repo).
  • Acquired two new research advisors, Stuart Russell and Bart Selman.
  • Held six summer workshops and sponsored the three-week MIRI Summer Fellows program. These events helped forge a number of new academic connections and directly resulted in us making job offers to two extremely promising attendees: Mihály Bárász (who has plans to join at a future date) and Scott Garrabrant.
  • Helped organize two other academic events, a Cambridge decision theory conference and a ten-week AI alignment seminar series at UC Berkeley. We also ran 6 research retreats, sponsored 36 MIRIx events, and spoke at an Oxford Big Picture Thinking seminar series.
  • Spoke at five other academic events. We participated in the Future of Life Institute’s “Future of AI” conference, AAAI-15, AGI-15, LORI 2015, and APS 2015. We also attended NIPS.

I’m excited about our 2015 progress in growing our team and collaborating with the larger academic community. Over the course of the year, we built closer relationships with people at Google DeepMind, Google Brain, OpenAI, Vicarious, GoodAI, the Future of Humanity Institute, and other research groups. All of this puts us in a better position to share our research results, methodology, and goals with other researchers, and to attract new talent to AI alignment work.

2015 General Activities

Beyond direct research support, in 2015 we:

Although we have deemphasized outreach efforts, we continue to expect these activities to be useful for spreading general awareness about MIRI, our research program, and AI safety research more generally. Ultimately, we expect this to help build our donor base, as well as attract potential future researchers (to MIRI and the field more generally), as with our past outreach and capacity-building efforts.


2015 Fundraising

I am very pleased with our fundraising performance. In 2015 we:

  • Continued our strong fundraising growth, with a total of $1,584,109 in contributions.3
  • Received $166,943 in grants from the Future of Life Institute (FLI), with another ~$80,000 annually for the next two years.4
  • Experimented with a new style of fundraiser (non-matching, with multiple funding targets). I consider these experiments a success. Our summer fundraiser was our biggest fundraiser to date, raising $632,011, and our winter fundraiser also went well, raising $328,148.

Total contributions grew 28% in 2015. This was driven by an increase in contributions from new funders, including a one-time $219,000 contribution from an anonymous funder, $166,943 in FLI grants, and at least $137,023 from Raising for Effective Giving (REG) and regranting from the Effective Altruism Foundation.5 The decrease in contributions from returning funders is due to Peter Thiel ending his support in 2015, and to large one-time outlier donations from Jed McCaleb in the years prior ($526,316 arriving in 2013, $104,822 in 2014).

Drawing conclusions from these year-over-year comparisons is a little tricky. MIRI underwent significant organizational changes over this time span, particularly in 2013. We switched to accrual-based accounting in 2014, which also complicates comparisons with previous years.6 In general, though, we’re continuing to see solid fundraising growth.

The number of new funders decreased from 2014 to 2015. In our 2014 review, Luke explains the large increase in funders in 2014:

New donor growth was strong in 2014, though this mostly came from small donations made during the SV Gives fundraiser. Much of the growth in returning donors can also be attributed to lapsed donors making small donations during the SV Gives fundraiser.

Comparing our numbers in 2015 and 2013, we see healthy growth in the number of returning funders and total number of funders.

The above chart shows contributions in past years from small, mid-sized, large, and very large funder segments. Contributions from the three largest segments increased (approximately) proportionally from last year, with the notable exception of contributions from large funders, which increased from 26% to 31% of total contributions. We had a small year-over-year decrease in contributions in the small funder segment, which is again due to having received an unusually large amount of small contributions during SV Gives in 2014.

As in past years, a full report on our finances (in the form of an independent accountant’s review report) will be made available on our transparency and financials page. That report will likely be out in late August or early September.

2016 and Beyond

What’s next? Beyond our research goal of making significant progress in five of our six focus areas, we set the following operational goals for ourselves in July/August 2015:

  1. Accelerated growth: “expand to a roughly ten-person core research team.” (source)
  2. Type theory in type theory project: “hire one or two type theorists to work on developing relevant tools full-time.” (source)
  3. Visiting scholar program: “have interested professors drop by for the summer, while we pay their summer salaries and work with them on projects where our interests overlap.” (source)
  4. Independent review: “We’re also looking into options for directly soliciting public feedback from independent researchers regarding our research agenda and early results.” (source)
  5. Higher-visibility publications: “Our current plan this year is to focus on producing a few high-quality publications in elite venues.” (source)

In 2015 we doubled the size of our research team, growing from three researchers to six. With the restructuring of our research associates program and the addition of two research interns, I’m pleased with the growth we achieved in 2015. We deemphasized growth in the first half of 2016 in order to focus on onboarding, but plan to expand again by the end of the year.

We have a job ad up for our type theorist position, which will most likely be filled after we finish our next few core researcher hires. In the interim, research intern Jack Gallagher has been working on the type theory project, and we also held a type theory workshop in April 2016.

With help from our research advisors, our visiting scholars program morphed into a three-week-long colloquium series. Rather than hosting a handful of researchers for longer periods of time, we hosted over fifty researchers for shorter stretches of time, comparing notes on a wide variety of active AI safety research projects. Speakers at the event included Stuart Russell, Francesca Rossi, Tom Dietterich, and Bart Selman. We’re also collaborating with Stuart Russell on a corrigibility grant.

Work is underway on conducting an external review of our research program; the results should be available in the next few months.

With regards to our fifth goal, in addition to “Proof-Producing Reflection for HOL” (which was presented at ITP 2015 in late August), we’ve since published papers at LORI-V (“Reflective Oracles”), at UAI 2016 (“Safely Interruptible Agents” and “A Formal Solution to the Grain of Truth Problem”), and at an IJCAI 2016 workshop (“The Value Learning Problem”). Of those venues, UAI is generally considered more prestigious than most of the venues we have published in previously. I’d count this as moderate (but not great) progress toward the goal of publishing in more elite venues. Nate will have more to say about our future publication plans.

Elaborating further on our plans would take me beyond the scope of this review. In the coming weeks, Nate will be providing more details on our 2016 activities and our goals going forward in a big-picture MIRI strategy post.7

  1. This paper was originally titled “Aligning Superintelligence with Human Interests.” We’ve renamed it in order to emphasize that this research agenda takes a specific approach to the alignment problem, and other approaches are possible too—including, relevantly, Jessica Taylor’s new “Alignment for Advanced Machine Learning Systems” agenda.
  2. I (Malo Bourgon)more recentlytook on a leadership role as MIRI’s new COO and second-in-command.
  3. $80,480 of this was earmarked funding for the AI Impacts project.
  4. MIRI is administering three FLI grants (and participated in a fourth). We are to receive $250,000 over three years to fund work on our agent foundations technical agenda, $49,310 towards AI Impacts, and we are administering Ramana’s $36,750 to study self-reference in the HOL theorem prover in collaboration with Benya.
  5. This only counts direct contributions through REG to MIRI. REG’s support for MIRI is likely closer to $200,000 when accounting for contributions made directly to MIRI as a result of REG’s advice to funders.
  6. Also note that numbers in this section might not exactly match previously published estimates, since small corrections are often made to contributions data. Finally, note that these numbers do not include in-kind donations.
  7. My thanks to Rob Bensinger for his substantial contributions to this review.