2013 in Review: Friendly AI Research


This is the 4th part of my personal and qualitative self-review of MIRI in 2013, in which I review MIRI’s 2013 Friendly AI (FAI) research activities.1

Friendly AI research in 2013

  1. In early 2013, we decided to shift our priorities from research plus public outreach to a more exclusive focus on technical FAI research. This resulted in roughly as much public-facing FAI research in 2013 as in all past years combined.
  2. Also, our workshops succeeded in identifying candidates for hire. We expect to hire two 2013 workshop participants in the first half of 2014.
  3. During 2013, I learned a great deal about how to create an FAI research institute, and a field of FAI research. In particular…
  4. MIRI needs to attract more experienced workshop participants.
  5. Much FAI research can be done by a broad community, and need not be labeled as FAI research. But, more FAI progress is made when the researchers themselves conceive of the research as FAI research.
  6. Communication style matters a lot.

The shift to Friendly AI research

From MIRI’s founding in 2000 until our strategic shift in early 2013,2 we did some research and much public outreach (e.g. the Singularity Summit and the Sequences).3 In early 2013, we decided that enough outreach and movement-building had been done that we could productively shift to a primary focus on research, and Friendly AI research specifically.

The task before us was, essentially, to create a new FAI research institute (out of what had previously been primarily an outreach organization), and to create a new field of FAI research. We still had much to learn about how to accomplish these goals (see below).

Our initial steps were to (1) hold a series of research workshops, and (2) describe open problems in Friendly AI theory to potential research collaborators. Our workshops and open problem descriptions aimed at three goals in particular. We wanted them to:

  1. help us identify researchers MIRI should hire to work full-time on Friendly AI theory,
  2. expose additional researchers to the Friendly AI research agenda, and
  3. spur concrete research progress on Friendly AI.

First, I’ll describe our 2013 Friendly AI research activities. After that, I’ll review “how good” I think these results are, and what lessons I’ve learned.

The workshops

The workshops strategy had been suggested by the success of our one-week November 2012 workshop, which had been an experiment involving only four researchers, and had produced the core result of “Definability of Truth in Probabilistic Logic.”

Our first workshop of 2013, held in April, was an attempt to tackle as many open problems as we could, with as many people as we could gather, to quickly learn which problems were most tractable and which researchers were most likely to contribute in the future. It involved 12 participants and lasted 3 weeks, though (due to scheduling constraints) only 5 researchers participated for the entire duration of the workshop. We learned a great deal about the workshop’s participants, and three problems in particular showed the most progress: Christiano’s “definability of truth” framework, LaVictoire’s “Robust Cooperation” framework, and Fallenstein’s “parametric polymorphism” approach to the Löbian obstacle for self-modifying systems. The success of this workshop encouraged us to hold more such workshops, albeit at a smaller scale and with tighter research foci.

Our next workshop, in July 2013, had 8 participants and lasted one week. It focused on issues related to logical omniscience and the Löbian obstacle / self-reflective agents, and produced less progress-per-day than the April workshop. Its chief result was described in a blog post by participant Abram Demski.

Our September workshop focused instead on decision theory. It had 11 participants and lasted one week. Participants brainstormed “well-posed problems” in the area, built on LaVictoire’s robust cooperation framework, made some progress on formalizing updateless decision theory, and formulated additional toy problems such as the Ultimate Newcomb’s Problem.

Our November workshop was our first workshop held outside of Berkeley. FHI graciously hosted us at Oxford University. As with the July workshop, this workshop focused on logical omniscience and self-reflective agents. There were 11 participants, and it lasted one week. November’s theoretical progress flowed into the progress made at our December workshop (same topic; 13 participants; one week), and was captured in 7 new technical reports.

Next, some basic statistics:

  • We held 5 research workshops in 2013, with all but one of them being one week long.
  • These workshops had 35 unique research participants, plus 7 first-day-only visitors (e.g. Hannes Leitgeb and Nik Weaver).4
  • For first-time attendees, the median reply to the question “How happy are you that you came to the workshop, 0-10?” was 8.5.
  • From the time it went live in March 2013 to the end of 2013, about a dozen people contacted us about our Recommended Courses for MIRI Math Researchers page. However, we have reason to believe it has influenced the study patterns of a much larger number of people. Some MIRI supporters have told us they routinely point smart young acquaintances to that page. Moreover, the page received more unique pageviews in 2013 than (e.g.) our Donate or About pages, despite not being linked from every page of the site like the Donate and About pages are. The Recommended Courses page made it possible for at least one person (Nate Soares) to quickly upgrade his math skills and attend a workshop in 2013, which he couldn’t have done before studying several of the textbooks on the Courses page.
  • From the time it went live in June 2013 to the end of 2013, we received 227 non-junk applications5 to attend future MIRI workshops, 47 of which are still being processed. So far, 60 applicants are ones we’ve deemed “promising,” 23 of whom attended a workshop in 2013. Of those 23, about half were researchers with whom we had little to no prior contact.

Describing open problems in Friendly AI

In 2013, MIRI described open problems in Friendly AI (OPFAIs) to researchers via three standard methods: articles, talks, and tutorials at workshops.

On OPFAI articles: Yudkowsky’s article on “OPFAI #1” discussed intelligence explosion microeconomics (aka AI takeoff dynamics), which I consider to be an open problem in “strategy research” rather than in Friendly AI theory, so I discussed it in a previous post. From my perspective, the first written OPFAI description of 2013 was on logical decision theory. Alex Altair (then a MIRI researcher) described the problem in an April 2013 paper called “A Comparison of Decision Algorithms on Newcomblike Problems.” This open problem had been described before, in LessWrong posts and in a 117-page technical report, but Altair’s presentation of the issue was more succinct and formal than previous presentations had been.

The second written OPFAI description of 2013 was on the tiling agents problem, in particular the Löbian obstacle for tiling agents. Yudkowsky brought a draft of this paper to the April workshop, and heavily modified the draft as a result of the progress at that workshop, finally publishing the draft in June 2013. The third written OPFAI description of 2013, by Patrick LaVictoire and co-authors, was on the robust cooperation problem. The fourth written OPFAI description of 2013, on naturalized induction, was begun.

Because the tiling agents paper took ~2 months of FAI researcher time to produce, we decided to experiment with a process that would minimize the amount of FAI-researcher-time required to produce new OPFAI descriptions. First, Yudkowsky brain-dumped the OPFAI to a Facebook group. Then, Robby Bensinger worked with several others to produce Less Wrong posts that described the OPFAI more clearly. The first post produced via this process was published in December 2013: Building Phenomenological Bridges. The rest of the posts explaining this OPFAI will be published in Q1 2014. Because we want to maximize the amount of FAI researcher hours that goes into FAI research rather than exposition, we hope to hire additional expository writing talent in 2014 (see our Careers page).

On OPFAI talks: MIRI scheduled two OPFAI talks in 2013. Yudkowsky’s Oct. 15th talk, “Recursion in rational agents: Foundations for self-modifying AI,” described both the robust cooperation and tiling agents problems to an audience at MIT. Two days earlier, (MIRI research associate) Paul Christiano gave a talk about probabilistic metamathematics at Harvard, following up on the earlier results from the “Definability of Truth” paper.6 Unfortunately, Yudkowsky’s talk was not recorded, but Christiano’s was.

OnOPFAI tutorials at workshops: Each MIRI workshop in 2013 opened with a day or two of tutorials on the open problems being addressed by that workshop. These tutorials exposed ~35 researchers (participants and first-day visitors) to OPFAIs they weren’t previously very familiar with. (The others — e.g. Yudkowsky, Christiano, and Fallenstein — were already pretty familiar with the OPFAIs described in the tutorials.)

How good are these results?

For comparison’s sake, MIRI’s 2000-2012 FAI research efforts consisted in:

  • Yudkowsky’s early research into the general “shape” of the Friendly AI challenge, resulting in publications such as “Creating Friendly AI” (2001), “Coherent Extrapolated Volition” (2004), and “Artificial Intelligence as a Positive and Negative Factor in Global Risk” (2008). These publications did not yet describe any OPFAIs as well-defined as the open problems described in Altair (2013), Yudkowsky & Herreshoff (2013), or LaVictoire et al. (2013).7
  • Yudkowsky’s early decision theory research, which resulted in TDT circa 2005, though this work wasn’t written up in detail until 2009 (1, 2, 3) and 2010.
  • Yudkowsky’s early work on Friendly consequentialist AI from 2003 to 2009, some of it with Marcello Herreshoff, plus one summer (2006) with Peter de Blanc and Nick Hay. This work led to early versions of many of the OPFAIs that MIRI described in 2013, that are currently being written up, or that are currently in Yudkowsky’s queue to be written up. It also led to the “infinite waterfall” approach later described in Yudkowsky & Herreshoff (2013).
  • Yudkowsky’s work with Herreshoff again in the summer of 2009, in part on the Löbian obstacle.
  • MIRI held a decision theory workshop in March 2010, attended by Eliezer Yudkowsky, Wei Dai, Stuart Armstrong, Gary Drescher, Anna Salamon, and roughly a dozen others who were present for some but not all of the discussions.8 That workshop spawned a decision theory mailing list, which from 2010 to the present day has produced much of the recent progress on TDT/UDT-style decision theories, though mostly via non-MIRI researchers like Wei Dai, Vladimir Slepnev, Stuart Armstrong, and Vladimir Nesov.
  • (Former MIRI researcher) Peter de Blanc’s work on “convergence of expected utility for universal AI” and ontological crises, resulting in de Blanc (2009) and de Blanc (2011).
  • (MIRI research associate) Daniel Dewey’s work on value learning, resulting in Dewey (2011).

Thus, MIRI’s public-facing Friendly AI research from 2000-2012 consisted of some non-technical works like “Creating Friendly AI” and “Coherent Extrapolated Volition,” some philosophical work on TDT, and three technical papers by Peter de Blanc and Daniel Dewey. Compare this with MIRI’s public-facing FAI research in 2013: Muehlhauser & Williamson (2013),9 Altair (2013), Christiano et al. (2013), Yudkowsky & Herreshoff (2013), LaVictoire et al. (2013), and these 7 technical reports.10

Subjectively, it feels to me like MIRI produced about as much public-facing Friendly AI research progress in 2013 as in all past years combined (2000-2012), and possibly more. This is good but not particularly surprising, since 2013 was also the first year in which MIRI tried to focus on producing public-facing FAI research progress. (But to be clear: if we remove the “public-facing” qualifier, then it’s clear that Yudkowsky alone produced far more FAI research progress in 2000-2012 than MIRI and its workshops produced in 2013 alone.)

So, did our workshops and open problem descriptions achieve our stated goals? Let’s check:

  1. Yes, they helped us identify candidates for hire. We expect to hire two 2013 workshop participants in the first half of 2014. (One of these hires is pending a visa application approval.)
  2. Yes, they exposed many new researchers to the Friendly AI research program. But, this exposure didn’t lead to as much independent Friendly AI work as I had hoped, and I have some theories as to why this was (see below).
  3. Yes, they spurred concrete research progress on Friendly AI (see above).

While this represents a promising start toward growing an FAI research institute and a new field of FAI research, there are many dimensions on which our output needs to improve for MIRI to have the impact we hope for (see below).

What have I learned about how to create an FAI research institute, and a new field of FAI research?

Some of my “lessons learned” from 2013’s FAI research activities were things I genuinely didn’t know at the start of the year. Most of them are things I suspected already, and I think they were confirmed by our experiences in 2013. Here are a few of them, in no particular order.

1. Keep operations work away from researchers.

In other words: “Don’t be afraid of a high ops-to-researchers ratio.” Operations talent (including executive talent) is much easier to find than FAI research talent, so it’s important to hire enough operations talent to ensure that the FAI researchers we find can spend approximately all their time on FAI research, and almost none of their time on tasks that can mostly be handled by operations staff (writing grant proposals, organizing events, fundraising, paper bibliographies, etc.). MIRI should hire enough operations talent to do this even if it makes our operations-staff-to-researcher ratio look high for a research institute.11

Universities often struggle with this (from a research productivity perspective), loading up some of the best research talent in the world with teaching duties, grant writing duties, and university service.12 As an independent research institute, MIRI can set its own policies and minimize these problems.

2. We need to attract more experienced workshop participants.

Our workshops have attracted some very smart participants, but they have been almost exclusively under 30, with relatively little name recognition. More experienced researchers would likely bring advantages such as (1) knowledge of related results and formal tools, (2) knowledge of productive research strategies, and (3) experience writing up results for peer review.

3. Much FAI research can be done by a broad community, and need not be labeled as FAI research.

Presently, the Yudkowskian paradigm for “Friendly AI research” describes a very large research program that breaks down into dozens of sub-problems (OPFAIs), e.g. thetiling agentstoy problem. Locating and formulating open problems plausibly relevant for Friendly AI is a challenge in itself, one that especially benefits from specializing in Friendly AI for several years.

Many of the OPFAIs themselves, however, can be framed as “ordinary” open problems in AI safety engineering, philosophy, mathematical logic, theoretical computer science, economics, and other fields. These open problems can often be stated without any mention of Friendly AI, and sometimes without any mention of AI in general.

For every OPFAI Yudkowsky has described,13 I’ve been able to locate earlier related work.14 Although this earlier work did not produce what we would consider good solutions to open problems in FAI, it does show that FAI work can be framed in ways that are palatable to mainstream academia. FAI need not be an “alien” research program that operates strictly outside mainstream academia and is pursued only by people explicitly motivated by FAI. Rather, FAI researchers should be able to frame their work in the context of mainstream research paradigms, if they choose to do so. Moreover, much FAI research can be done even by those who are not explicitly motivated by FAI, so long as they find (e.g.) the Löbian obstacle interesting as mathematics, or as computer science, or as philosophy, etc.

4. But, more FAI progress is made when the researchers themselves conceive of the research as FAI research.

Still, researchersseem more likely to produce useful work on Friendly AI if they are thinking about the problems from the perspective of Friendly AI, rather than merely thinking about them as interesting open problems in philosophy, computer science, economics, etc. As I said inmy conversation with Jacob Steinhardt:

People work on different pieces of the problem depending on whether they’re trying to solve the problem for Friendly AI or just for a math journal. If they aren’t thinking about it from the FAI perspective, people can work all day on stuff that’s very close to what we care about in concept-space and yet has no discernable value to FAI theory. Thus, the people who have contributed by far the most novel FAI progress are people explicitly thinking about the problems from the perspective of FAI…

5. Communication style matters a lot.

When I talk to the kinds of top-notch researchers MIRI would like to collaborate with on open problems in Friendly AI, perhaps the most common complaint I hear is that our work is not formal enough, or not described clearly enough for them to understand it without more effort on their part than they are willing to expend. For an example of such a conversation that was recorded and transcribed, see again my conversation with Jacob Steinhardt.

I’ve thought this for a long time, and my experiences in 2013 have only reinforced the point. I’ll be writing more about this in the future.


  1. What counts as “Friendly AI research” is, naturally, a matter of debate. For most of this post I’ll assume “Friendly AI research” means “what Yudkowsky thinks of as Friendly AI research,” with the exception of intelligence explosion microeconomics, for reasons given in this post.
  2. Until early 2013, the organization currently named “Machine Intelligence Research Institute” was known as the “Singularity Institute for Artificial Intelligence.”
  3. From 2000-2004, “MIRI” was just Eliezer Yudkowsky, doing early FAI research. The organization began to grow in 2004, and by 2006 most efforts were outreach-related rather than research-related. This remained true until early 2013.
  4. Some statistics about 2013’s 35 workshop participants: 15 have a PhD, three are women, and 3 hold a university faculty position of assistant professor or higher rank. In short, our workshop participants have thus far largely been graduate students, post-docs, and independent researchers. Among the 15 participants who have a PhD, 9 have a PhD in mathematics, 4 have a PhD in computer science, one has a PhD in cognitive science, and one has a joint PhD in philosophy and computer science.
  5. By “junk applications” I mean to include both spam applications and applications from people who are clearly incapable of math research, e.g. “Hello, I would love to come to America to learn algebra.”
  6. Probabilistic metamathematics is an OPFAI in itself, and also one possible path toward a solution to the tiling agents problem.
  7. The open problems in these publications, too, need additional formalization. Such is the current state of research.
  8. For example, Steve Rayhawk and Henrik Jonsson.
  9. This short paper lies deep in the “philosophy” end of the philosophy → math → engineering spectrum.
  10. For both the 2000-2012 and 2013 calendar periods, when I write of “MIRI’s public-facing FAI work” I’m not including work that was “enabled” but not really “produced” by MIRI or its workshops, for example most work on UDT/ADT (which were nevertheless largely developed on MIRI’s LessWrong.com website and its decision theory mailing list).
  11. At the end of 2013, we had five full-time staff members: Luke Muehlhauser (executive director), Louie Helm (deputy director), Eliezer Yudkowsky (research fellow), Malo Bourgon (program manager), and Alex Vermeer (program management analyst), totaling 4 operations staff and one researcher. This 4:1 ratio will shrink as we are able to hire more FAI researchers, but I think it would have been a mistake to try to get by with fewer operations staff in 2013.
  12. Link et al. (2008); Marsh & Hattie (2002); NSOPF (2004).
  13. Sometimes with much help from Robby Bensinger and/or others.
  14. I’ll list some examples of this earlier related work. (1) Superrationality: getting agents to rationally cooperate with agents like themselves. Before “Robust Cooperation” there was: Rapoport (1966); McAfee (1984); Hofstadter (1985); Binmore (1987); Howard (1988); Tennenholtz (2004); Fortnow (2009); Kalai et al. (2010); Peters & Szentes (2012). (For Rapoport 1966, see especially pages 141-144 and 209-210.) (2) Coherent extrapolated volition: figuring out what we would wish if we knew more, thought better, were more the people we wished we were, etc. Before Yudkowsky (2004) there was: Rawls (1971); Harsanyi (1982); Railton (1986); Rosati (1995). (For an overview of this background, see Muehlhauser & Williamson 2013.) (3) The parliamentary approach to values aggregation: using voting mechanisms to resolve challenges in normative uncertainty and values aggregation. Before Bostrom (2009) there was a large literature on this topic in social choice theory. For recent overviews, see List (2013); Brandt et al. (2012); Rossi et al. (2011); Gaertner (2009). (4) Reasoning under fragility: figuring out how to get an agent not to operate with full autonomy before it has been made fully trustworthy. Before Yudkowsky began to discuss the issue, there was much work on “adjustable autonomy”: Schreckenghost et al. (2010); Mouaddib et al. (2010); Zieba et al. (2010); Pynadath & Tambe (2002); Tambe et al. (2002). (5) Logical decision theory: finding a decision algorithm which can represent the agent’s deterministic decision process. Before Yudkowsky (2010) there was: Spohn (2003); Spohn (2005). (6) Stable self-improvement: getting a self-modifying agent to avoid rewriting its own code unless it has high confidence that the rewrites will preserve desirable agent properties. Before Yudkowsky & Herreshoff (2013) there was: Schmidhuber (2003); Schmidhuber (2009); Steunebrink & Schmidhuber (2012). (7) Naturalized induction: getting an induction algorithm to treat itself, its data inputs, and its hypothesis outputs as reducible to its physical posits. Before “Building Phenomenological Bridges” there was: Orseau & Ring (2011); Orseau & Ring (2012).