2018 Update: Our New Research Directions

MIRI Strategy, News

For many years, MIRI’s goal has been to resolve enough fundamental confusions around alignment and intelligence to enable humanity to think clearly about technical AI safety risks, and to do this before this technology advances to the point of potential catastrophe. This goal has always seemed to us to be difficult, but possible.1

Last year, we said that we were beginning a new research program aimed at this goal.2 Here, we’re going to provide background on how we’re thinking about this new set of research directions, lay out some of the thinking behind our recent decision to do less default sharing of our research, and make the case for interested software engineers to join our team and help push our understanding forward.


  1. Our research
  2. Why deconfusion is so important to us
  3. Nondisclosed-by-default research, and how this policy fits into our overall strategy
  4. Joining the MIRI team

1. Our research


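In 2014, MIRI published its first research agenda, “Agent Foundations for Aligning Machine Intelligence with Human Interests.” Since then, one of our main research priorities has been to develop a better conceptual understanding of embedded agency: formally characterizing reasoning systems that lack a crisp agent/environment boundary, are smaller than their environment, must reason about themselves, and risk having parts that are working at cross purposes. These research problems continue to be a major focus at MIRI, and are being studied in parallel with our new research directions (which I’ll be focusing on more below).3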

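From our perspective, the point of working on these kinds of problems isn’t that solutions directly tell us how to build well-aligned AGI systems. Instead, the point is to resolve confusions we have around ideas like “alignment” and “AGI,” so that future AGI developers have an unobstructed view of the problem. Eliezer illustrates this idea in “The Rocket Alignment Problem,” which imagines a world where humanity tries to land on the Moon before it understands Newtonian mechanics or calculus.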

Recently, some MIRI researchers developed new research directions that seem to enable more scalable progress towards resolving these fundamental confusions. Specifically, the progress is more scalable in researcher hours: we now believe that excellent engineers coming from a variety of backgrounds can have their work efficiently converted into research progress at MIRI, where previously we only knew how to speed our research progress with a (relatively atypical) breed of mathematician.

At the same time, we’ve seen some significant financial success over the past year: not so much that funding is no longer a constraint at all, but enough to pursue our research agenda from new and different directions, in addition to the old.

Furthermore, our view implies that haste is essential. We see AGI as a likely cause of existential catastrophes, especially if it’s developed with relatively brute-force-reliant, difficult-to-interpret techniques; and although we’re very uncertain about when humanity’s collective deadline will come to pass, many of us are somewhat alarmed by the speed of recent machine learning progress.


Comparing our new research directions and Agent Foundations

Our new research directions involve building software systems that we can use to test our intuitions, and building infrastructure that allows us to rapidly iterate this process. Like the Agent Foundations agenda, our new research directions continue to focus on “deconfusion,” rather than on, e.g., trying to improve robustness metrics of current systems—our sense being that even if we make major strides on this kind of robustness work, an AGI system built on principles similar to today’s systems would still be too opaque to align in practice.

In a sense, you can think of our new research directions as tackling the same sort of problem that we’ve always been attacking, but from new angles. In other words, if you aren’t excited about logical inductors or functional decision theory, you probably wouldn’t be excited by our new work either. Conversely, if you already have the sense that becoming less confused is a sane way to approach AI alignment, and you’ve been wanting to see those kinds of confusions attacked with software and experimentation in a manner that yields theoretical satisfaction, then you may well want to work at MIRI. (I’ll have more to say about this below.)

Our new research directions stem from some distinct ideas had by Benya Fallenstein, Eliezer Yudkowsky, and myself (Nate Soares). Some high-level themes of these new directions include:

  1. Seeking entirely new low-level foundations for optimization, designed for transparency and alignability from the get-go, as an alternative to gradient-descent-style machine learning foundations.

    Note that this does not entail trying to beat modern ML techniques on computational efficiency, speed of development, ease of deployment, or other such properties. However, it does mean developing new foundations for optimization that are broadly applicable in the same way, and for some of the same reasons, that gradient descent scales to be broadly applicable, while possessing significantly better alignment characteristics.


  2. Endeavoring to figure out parts of cognition that can be very transparent as cognition, without being GOFAI or completely disengaged from subsymbolic cognition.

  3. Experimenting with some specific alignment problems that are deeper than problems that have previously been put into computational environments.

In common between all our new approaches is a focus on using high-level theoretical abstractions to enable coherent reasoning about the systems we build. A concrete implication of this is that we write lots of our code in Haskell, and are often thinking about our code through the lens of type theory.

We aren’t going to distribute the technical details of this work anytime soon, in keeping with the recent MIRI policy changes discussed below. However, we have a good deal to say about this research on the meta level.

We are excited about these research directions, both for their present properties and for the way they seem to be developing. When Benya began the predecessor of this work ~3 years ago, we didn’t know whether her intuitions would pan out. Today, having watched the pattern by which research avenues in these spaces have opened up new exciting-feeling lines of inquiry, none of us expect this research to die soon, and some of us are hopeful that this work may eventually open pathways to attacking the entire list of basic alignment issues.4

We are also excited by the extent to which useful cross-connections have arisen between initially-unrelated-looking strands of our research. During a period where I was focusing primarily on new lines of research, for example, I stumbled across a solution to the original version of the tiling agents problem from the Agent Foundations agenda.5

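This work seems to “give out its own guideposts” more than the Agent Foundations agenda does. While we used to require an extremely close fit on research taste from our hires, we now think we have enough sense of the terrain that we can relax those requirements somewhat. We’re still looking for hires who are scientifically innovative and who are fairly close on research taste, but our work is now much more scalable with the number of good mathematicians and engineers working at MIRI.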

With all of that said, and despite how promising the last couple of years have seemed to us, this is still “blue sky” research in the sense that we’d guess most outside MIRI would still regard it as of academic interest but of no practical interest. The more principled/coherent/alignable optimization algorithms we are investigating are not going to sort cat pictures from non-cat pictures anytime soon.

The thing that generally excites us about research results is the extent to which they grant us “deconfusion” in the sense described in the next section, not the ML/engineering power they directly enable. This “deconfusion” that they allegedly reflect must, for the moment, be discerned mostly via abstract arguments supported only weakly by concrete “look what this understanding lets us do” demos. Many of us at MIRI regard our work as being of strong practical relevance nonetheless—but that is because we have long-term models of what sorts of short-term feats indicate progress, and because we view becoming less confused about alignment as having a strong practical relevance to humanity’s future, for reasons that I’ll sketch out next.

2. Why deconfusion is so important to us



Quoting Anna Salamon, the president of the Center for Applied Rationality and a MIRI board member:

If I didn’t have the concept of deconfusion, MIRI’s efforts would strike me as mostly inane. MIRI continues to regard its own work as significant for human survival, despite the fact that many larger and richer organizations are now talking about AI safety. It’s a group that got all excited about Logical Induction (and tried paranoidly to make sure Logical Induction “wasn’t dangerous” before releasing it), even though Logical Induction had only a moderate amount of math and no practical engineering at all (and did something similar with Timeless Decision Theory, to pick an even more extreme example). It’s a group that continues to stare mostly at basic concepts, sitting reclusively off by itself, while mostly leaving questions of politics, outreach, and how much influence the AI safety community has, to others.

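However, I do have the concept of deconfusion. And when I look at MIRI’s activities through that lens, MIRI seems to me much more like “oh, yes, good, someone is taking a straight shot at what looks like the critical thing” and “they seem to have a fighting chance” and “gosh, I hope they (or someone somehow) solve many more confusions before the deadline, because without such progress, humanity sure seems kinda sunk.”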

I agree that MIRI’s perspective and strategy don’t make much sense without the idea I’m calling “deconfusion.” As someone reading a MIRI strategy update, you probably already partly have this concept, but I’ve found that it’s not trivial to transmit the full idea, so I ask your patience as I try to put it into words.

By deconfusion, I mean something like “making it so that you can think about a given topic without continuously accidentally spouting nonsense.”

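To give a concrete example, my thoughts about infinity as a 10-year-old were made of rearranged confusion rather than of anything coherent, as were the thoughts of even the best mathematicians from 1700. “What happens if we subtract infinity from both sides of the equation?” But my thoughts about infinity as a 20-year-old were not similarly confused, because, by then, I’d been exposed to the more coherent concepts that later mathematicians labored to produce. I wasn’t as smart or as good of a mathematician as Georg Cantor or the best mathematicians from 1700; but deconfusion can be transferred between people; and this transfer can spread the ability to think actually coherent thoughts.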

In 1998, conversations about AI risk and technological singularity scenarios often went in circles in a funny sort of way. People who are serious thinkers about the topic today, including my colleagues Eliezer and Anna, said things that today sound confused. (When I say “things that sound confused,” I have in mind things like “isn’t intelligence an incoherent concept,” “but the economy’s already superintelligent,” “if a superhuman AI is smart enough that it could kill us, it’ll also be smart enough to see that that isn’t what the good thing to do is, so we’ll be fine,” “we’re Turing-complete, so it’s impossible to have something dangerously smarter than us, because Turing-complete computations can emulate anything,” and “anyhow, we could just unplug it.”) Today, these conversations are different. In between, folks worked to make themselves and others less fundamentally confused about these topics, so that today, a 14-year-old who wants to skip to the end of all that incoherence can just pick up a copy of Nick Bostrom’s Superintelligence.6

Of note is the fact that the “take AI risk and technological singularities seriously” meme started to spread to the larger population of ML scientists only after its main proponents attained sufficient deconfusion. If you were living in 1998 with a strong intuitive sense that AI risk and technological singularities should be taken seriously, but you still possessed a host of confusion that caused you to occasionally spout nonsense as you struggled to put things into words in the face of various confused objections, then evangelism would do you little good among serious thinkers—perhaps because the respectable scientists and engineers in the field can smell nonsense, and can tell (correctly!) that your concepts are still incoherent. It’s by accumulating deconfusion until your concepts cohere and your arguments become well-formed that your ideas can become memetically fit and spread among scientists—and can serve as foundations for future work by those same scientists.


An even more striking example is the case of Archimedes, who intuited his way to the ability to do useful work in both integral and differential calculus thousands of years before calculus became a simple formal thing that could be passed between people.

In both cases, it was the eventual formalization of those intuitions—and the linked ability of these intuitions to be passed accurately between many researchers—that allowed the fields to begin building properly and quickly.7

Why deconfusion (on our view) is highly relevant to AI accident risk


We suspect that today’s concepts about things like “optimization” and “aiming” are incapable of supporting the necessary precision, even if wielded by researchers who care a lot about safety. Part of why I think this is that if you pushed me to explain what I mean by “optimization” and “aiming,” I’d need to be careful to avoid spouting nonsense—which indicates that I’m still confused somewhere around here.

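A worrying fact about this situation is that, as best I can tell, humanity doesn’t need coherent versions of these concepts to hill-climb its way to AGI. Evolution hill-climbed that distance, and evolution had no model of what it was doing. But as evolution applied massive optimization pressure to genomes, those genomes started coding for brains that internally optimize for targets that merely correlated with genetic fitness. Humans find ever-smarter ways to satisfy our own goals (video games, ice cream, birth control…) even when this runs directly counter to the selection criterion that gave rise to us: “propagate your genes into the next generation.”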

If we are to avoid a similar fate—one where we attain AGI via huge amounts of gradient descent and other optimization techniques, only to find that the resulting system has internal optimization targets that are very different from the targets we externally optimized it to be adept at pursuing—then we must be more careful.

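When AI researchers explore the space of optimizers, what will it take to ensure that the first highly capable optimizers that researchers find are optimizers they know how to aim at chosen tasks? I’m not sure, because I’m still confused about the question. I can tell you vaguely how the problem relates to convergent instrumental incentives, and I can observe various reasons why we shouldn’t expect the strategy “train a large cognitive system to optimize for X” to actually result in a system that internally optimizes for X, but there are still wide swaths of the question where I can’t say much without saying nonsense.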

As an example, AI systems like Deep Blue and AlphaGo cannot reasonably be said to be reasoning about the whole world. They’re reasoning about some much simpler abstract platonic environment, such as a Go board. There’s an intuitive sense in which we don’t need to worry about these systems taking over the world, for this reason (among others), even in the world where those systems are run on implausibly large amounts of compute.

Vaguely speaking, there’s a sense in which some alignment difficulties don’t arise until an AI system is “reasoning about the real world.” But what does that mean? It doesn’t seem to mean “the space of possibilities that the system considers literally concretely includes reality itself.” Ancient humans did perfectly good general reasoning even while utterly lacking the concept that the universe can be described by specific physical equations.

It looks like it must mean something more like “the system is building internal models that, in some sense, are little representations of the whole of reality.” But what counts as a “little representation of reality,” and why do a hunter-gatherer’s confused thoughts about a spirit-riddled forest count while a chessboard doesn’t? All these questions are likely confused; my goal here is not to name coherent questions, but to gesture in the direction of a confusion that prevents me from precisely naming a portion of the alignment problem.


For an alternative attempt at naming this concept, refer to Eliezer’s rocket alignment analogy. For a further discussion of some of the reasons today’s concepts seem inadequate for describing an aligned intelligence with sufficient precision, see Scott and Abram’s recent write-up. (Or come discuss with us in person, at an “AI Risk for Computer Scientists” workshop.)


Many types of research become far easier at particular places and times. It seems to me that for the work of becoming less confused about AI alignment, MIRI in 2018 (and for a good number of years to come, I think) is one of those places and times.


Logical inductors, as an example, give us at least a clue about why we’re apt to informally use words like “probably” in mathematical reasoning. It’s not a full answer to “how does probabilistic reasoning about mathematical facts work?”, but it does feel like an interesting hint—which is relevant to thinking about how “real-world” AI reasoning could possibly work, because AI systems might well also use probabilistic reasoning in mathematics.

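A second point is that, if there is something that unites most folks at MIRI besides a drive to increase the odds of human survival, it is probably a taste for getting our understanding of the foundations of the universe right. Many of us came in with this taste. For example, many of us have backgrounds in physics (and fundamental physics in particular), and those of us with a background in programming tend to have an interest in things like type theory, formal logic, and/or probability theory.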

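A third point, as noted above, is that we are excited about our current bodies of research intuitions, and about how they seem to be becoming increasingly transferable/cross-applicable/concretizable over time.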

3. Nondisclosed-by-default research, and how this policy fits into our overall strategy




I’d like to try to share some sense of why we chose this policy, especially because this policy may prove disappointing or inconvenient for many people interested in AI safety as a research area.9 MIRI is a nonprofit, and there’s a natural default assumption that our mechanism for good is to regularly publish new ideas and insights. But we don’t think this is currently the right choice for serving our nonprofit mission.


  • we’re in a hurry to decrease existential risk;

  • in the same way that Faraday’s journals aren’t nearly as useful as Maxwell’s equations, and in the same way that logical induction isn’t all that useful to the average modern ML researcher, we don’t think it would be that useful to try to share lots of half-confused thoughts with a wider set of people;

  • we believe we can have more of the critical insights faster if we stay focused on making new research progress rather than on exposition, and if we aren’t feeling pressure to justify our intuitions to wide audiences;

  • we think it’s not unreasonable to be anxious about whether deconfusion-style insights could lead to capabilities insights, and have empirically observed we can think more freely when we don’t have to worry about this; and

  • even when we conclude that those concerns were paranoid or silly upon reflection, we benefited from moving the cognitive work of evaluating those fears from “before internally sharing insights” to “before broadly distributing those insights,” which is enabled by this policy.


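I should caveat that in what follows, I’m attempting to convey what I believe, but not necessarily why. I am not trying to give an argument that would cause any rational person to take the same strategy in my position; I am shooting only for the more modest goal of conveying how I myself am thinking about the decision.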

I’ll begin by saying a few words about how our research fits into our overall strategy, then discuss the pros and cons of this policy.


At present, MIRI’s aim is to make research progress on the alignment problem. Our focus isn’t on shifting the field of ML toward taking AGI safety more seriously, nor on any other form of influence, persuasion, or field-building. We are simply and only aiming to directly make research progress on the core problems of alignment.

This choice may seem surprising to some readers—field-building and other forms of outreach can obviously have hugely beneficial effects, and throughout MIRI’s history, we’ve been much more outreach-oriented than the typical math research group.

Our impression is indeed that well-targeted outreach efforts can be highly valuable. However, attempts at outreach/influence/field-building seem to us to currently constitute a large majority of worldwide research activity that’s motivated by AGI safety concerns,10 such that MIRI’s time is better spent on taking a straight shot at the core research problems. Further, we think that our own comparative advantage lies here, and not in outreach work.11

My beliefs here are connected to my beliefs about the mechanics of deconfusion described above. In particular, I believe that the alignment problem might start seeming significantly easier once it can be precisely named, and I believe that precisely naming this sort of problem is likely to be a serial challenge, in the sense that some deconfusions cannot be attained until other deconfusions have matured. Additionally, my read on history says that deconfusions regularly come from relatively small communities thinking the right kinds of thoughts (as in the case of Faraday and Maxwell), and that such deconfusions can spread rapidly as soon as the surrounding concepts become coherent (as exemplified by Bostrom’s Superintelligence). I conclude from all this that trying to influence the wider field isn’t the best place to spend our own efforts.


We think that most of MIRI’s expected impact comes from worlds in which our deconfusion work eventually succeeds—that is, worlds where our research eventually leads to a principled understanding of alignable optimization that can be communicated to AI researchers, more akin to a modern understanding of calculus and differential equations than to Faraday’s notebooks (with the caveat that most of us aren’t expecting solutions to the alignment problem to compress nearly so well as calculus or Maxwell’s equations, but I digress).

One pretty plausible way this could go is that our deconfusion work makes alignment possible without much changing the set of available pathways to AGI.12 To pick a trivial analogy illustrating this sort of world, consider interval arithmetic as compared to the usual way of doing floating point operations. In interval arithmetic, an operation like sqrt takes two floating point numbers, a lower and an upper bound, and returns a lower and an upper bound on the result. Figuring out how to do interval arithmetic requires some careful thinking about the error of floating-point computations, and it certainly won’t speed those computations up; the only reason to use it is to ensure that the error incurred in a floating point operation isn’t larger than the user assumed. If you discover interval arithmetic, you’re at no risk of speeding up modern matrix multiplications, despite the fact that you really have found a new way of doing arithmetic that has certain desirable properties that normal floating-point arithmetic lacks.
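To make the interval-arithmetic analogy concrete, here is a minimal sketch in Python of an interval sqrt. It is only an illustration of the general idea, not anything drawn from MIRI’s own work. The function name interval_sqrt is hypothetical, and the sketch assumes Python 3.9 or later (for math.nextafter) and a platform where math.sqrt is accurate to within one unit in the last place, as is the case for IEEE 754 correctly rounded square roots.

```python
import math


def interval_sqrt(lo: float, hi: float) -> tuple[float, float]:
    """Return (lower, upper) bounds enclosing sqrt(x) for every x in [lo, hi].

    Hypothetical helper for illustration; assumes math.sqrt is accurate
    to within one representable float of the true square root.
    """
    if lo < 0.0 or lo > hi:
        raise ValueError("expected 0 <= lo <= hi")
    # sqrt is increasing, so the endpoints of the input interval map to
    # the endpoints of the output interval. Step each result outward by
    # one representable float so that rounding error inside math.sqrt
    # cannot push the true value outside the returned bounds.
    lower = max(0.0, math.nextafter(math.sqrt(lo), -math.inf))
    upper = math.nextafter(math.sqrt(hi), math.inf)
    return lower, upper


# Usage: the true value of sqrt(2) lies strictly between the two bounds.
lower, upper = interval_sqrt(2.0, 2.0)
print(lower, upper)
```

Rounding both endpoints outward makes the result slightly wider than strictly necessary, and it is certainly no faster than a bare math.sqrt call. That matches the point of the analogy: the technique buys a guarantee about error, not speed.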

In worlds where deconfusing ourselves about alignment leads us primarily to insights similar (on this axis) to interval arithmetic, it would be best for MIRI to distribute its research as widely as possible, especially once it has reached a stage where it is comparatively easy to communicate, in order to encourage AI capabilities researchers to adopt and build upon it.



The latter scenario is relatively less important in worlds where AGI timelines are short. If current deep learning research is already on the brink of AGI, for example, then it becomes less plausible that the results of MIRI’s deconfusion work could become a relevant influence on AI capabilities research, and most of the potential impact of our work would come from its direct applicability to deep-learning-based systems. While many of us at MIRI believe that short timelines are at least plausible, there is significant uncertainty and disagreement about timelines inside MIRI, and I would not feel comfortable committing to a course of action that is safe only in worlds where timelines are short.

In sum, if we continue to make progress on, and eventually substantially succeed at, figuring out the actual “cleave nature at its joints” concepts that let us think coherently about alignment, I find it quite plausible that those same concepts may also enable capabilities boosts (especially in worlds where there’s a lot of time for those concepts to be pushed in capabilities-facing directions). There is certainly strong historical precedent for deep scientific insights yielding unexpected practical applications.

By the nature of deconfusion work, it seems very difficult to predict in advance which other ideas a given insight may unlock. These considerations seem to us to call for conservatism and delay on information releases—potentially very long delays, as it can take quite a bit of time to figure out where a given insight leads.


We take our research seriously at MIRI. This means that, for many of us, we know in the back of our minds that deconfusion-style research could sometimes (often in an unpredictable fashion) open up pathways that can lead to capabilities insights in the manner discussed above. As a consequence, many MIRI researchers flinch away from having insights when they haven’t spent a lot of time thinking about the potential capabilities implications of those insights down the line. They usually haven’t spent that time, because it requires a bunch of cognitive overhead. This effect has been evidenced in reports from researchers, myself included, and we’ve empirically observed that when we set up “closed” research retreats or research rooms,13 researchers report that they can think more freely, that their brainstorming sessions extend further and wider, and so on.


At the same time, this kind of caution is an unavoidable consequence of doing deconfusion research in public, since it’s very hard to know what ideas may follow five or ten years after a given insight. AI alignment work and AI capabilities work are close enough neighbors that many insights in the vicinity of AI alignment are “potentially capabilities-relevant until proven harmless,” both for reasons discussed above and from the perspective of the conservative security mindset we try to encourage around here.

In short, if we request that our brains come up with alignment ideas that are fine to share with everybody—and this is what we’re implicitly doing when we think of ourselves as “researching publicly”—then we’re requesting that our brains cut off the massive portion of the search space that is only probably safe.

If our goal is to make research progress as quickly as possible, in hopes of having concepts coherent enough to allow rigorous safety engineering by the time AGI arrives, then it seems worth finding ways to allow our researchers to think without constraints, even when those ways are somewhat expensive.


There may be some additional speed-up effects from helping free up researchers’ attention, though we don’t consider this a major consideration on its own.

Historically, early-stage scientific work has often been done by people who were solitary or geographically isolated, perhaps because this makes it easier to slowly develop a new way to factor the phenomenon, instead of repeatedly translating ideas into the current language others are using. It’s difficult to describe how much mental space and effort turns out to be taken up with thoughts of how your research will look to other people staring at you, until you try going into a closed room for an extended period of time with a promise to yourself that all the conversation within it really won’t be shared at all anytime soon.



ML researchers aren’t running around using logical induction or functional decision theory. These theories don’t have practical relevance to the researchers on the ground, and they’re not supposed to; the point of these theories is just deconfusion.

To be more precise, the theories themselves aren’t the interesting novelty. The thing that’s novel is that a few years ago, we couldn’t write down any theory of how in principle to assign sane-seeming probabilities to mathematical facts, and today we can write down logical induction. In the journey from point A to point B, we became less confused. The logical induction paper is an artifact witnessing that deconfusion, and an artifact which granted its authors additional deconfusion as they went through the process of writing it; but the thing that excited me about logical induction was not any one particular algorithm or theorem in the paper, but rather the fact that we’re a little bit less in-the-dark than we were about how a reasoner can reasonably assign probabilities to logical sentences. We’re not fully out of the dark on this front, mind you, but we’re a little less confused than we were before.14

If the rest of the world were talking about how confusing they find the AI alignment topics we’re confused about, and were as concerned about their confusions as we are concerned about ours, then failing to share our research would feel a lot more costly to me. But as things stand, most people in the space look at us kind of funny when we say that we’re excited about things like logical induction, and I repeatedly encounter deep misunderstandings when I talk to people who have read some of our papers and tried to infer our research motivations, from which I conclude that they weren’t drawing a lot of benefit from my current ramblings anyway.

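In a sense, most of our current research is a form of rambling, in the same way (at best) that Faraday’s journals are rambling. It’s OK if most practical scientists avoid slogging through Faraday’s journals and wait until Maxwell comes along and distills the thing down to three useful equations. And, if Faraday expects that physical theories eventually distill, he doesn’t need to go around evangelizing his journals; he can just wait until the theory has been distilled, and then work to transmit some less-confused concepts.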

We expect our understanding of alignment, which is currently far from complete, to eventually distill, and I, at least, am not very excited about attempting to push it on anyone until it’s significantly more distilled. (Or, barring full distillation, until a project with a commitment to the common good, an adequate security mindset, and a large professed interest in deconfusion research comes knocking.)

In the interim, of course, there are some researchers outside MIRI who care about the same problems we do, and who are also pursuing deconfusion. Our nondisclosed-by-default policy will negatively affect our ability to collaborate with these people on our other research directions, and this is a real cost and not worth dismissing. I don’t have much more to say about this here beyond noting that if you’re one of those people, you’re very welcome to get in touch with us (and you may want to consider joining the team)!


In the long run, if our research is going to be useful, our findings will need to go out into the world where they can impact how humanity builds AI systems. However, it doesn’t follow from this need for eventual distribution (of some sort) that we might as well publish all of our research immediately. As discussed above, as best I can tell, our current research insights just aren’t that practically useful, and sharing early-stage deconfusion research is time-intensive.

Our nondisclosed-by-default policy also allows us to preserve options like:

  • deciding which research findings we think should be developed further, while thinking about differential technological development; and
  • deciding which group(s) to share each interesting finding with (e.g., the general public, other closed safety research groups, groups with strong commitment to security mindset and the common good, etc.).

Future versions of us obviously have better abilities to make calls on these sorts of questions, though this needs to be weighed against many facts that push in the opposite direction—the later we decide what to release, the less time others have to build upon it, and the more likely it is to be found independently in the interim (thereby wasting time on duplicated efforts), and so on.

Now that I’ve listed reasons in favor of our nondisclosed-by-default policy, I’ll note some reasons against.



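  1. We will have a harder time attracting and evaluating new researchers; sharing less research means getting fewer chances to try out various research collaborations and notice which collaborations work well for both parties.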

  2. We lose some of the benefits of accelerating the progress of other researchers outside MIRI via sharing useful insights with them in real time as they are generated.

  3. We will be less able to get useful scientific insights and feedback from visitors, remote scholars, and researchers elsewhere in the world, since we will be sharing less of our work with them.

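  4. We will have a harder time attracting funding and other indirect aid; with less of our work visible, it will be harder for prospective donors to know whether our work is worth supporting.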

  5. We will have to pay various costs associated with keeping research private, including social costs and logistical overhead.

We expect these costs to be substantial. We will be working hard to offset some of the losses from item 1, as I’ll discuss in the next section. For reasons discussed above, I’m not presently very worried about item 2. The remaining costs will probably be paid in full.

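These costs are why we didn’t adopt this policy (for most of our research) years ago. With outreach feeling less like our comparative advantage than it did in the pre-Puerto-Rico days, and with funding seeming like less of a bottleneck than it used to (though still something of a bottleneck), this approach now seems workable.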

We’ve already found it helpful in practice to let researchers have insights first and sort out the safety or desirability of publishing later. On the whole, then, we expect this policy to cause a significant net speed-up to our research progress, while ensuring that we can responsibly investigate some of the most important technical questions on our radar.

4. Joining the MIRI team


I believe that MIRI is, and will be for at least the next several years, a focal point of one of those rare scientifically exciting points in history, where the conditions are just right for humanity to substantially deconfuse itself about an area of inquiry it’s been pursuing for centuries—and one where the output is directly impactful in a way that is rare even among scientifically exciting places and times.


  • Work that Eliezer, Benya, myself, and a number of researchers in AI safety view as having a significant chance of boosting humanity’s survival odds.

  • Work that, if it pans out, visibly has central relevance to the alignment problem—the kind of work that has a meaningful chance of shedding light on problems like “is there a loophole-free way to upper-bound the amount of optimization occurring within an optimizer?”.

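  • Problems that, if your tastes match ours, feel closely related to fundamental questions about intelligence, agency, and the structure of reality; and the associated thrill of working on one of the great and wild frontiers of human knowledge, with large and important insights potentially close at hand.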

  • An atmosphere in which people are taking their own and others’ research progress seriously. For example, you can expect colleagues who come into work every day looking to actually make headway on the AI alignment problem, and looking to pull their thinking different kinds of sideways until progress occurs. I’m consistently impressed with MIRI staff’s drive to get the job done—with their visible appreciation for the fact that their work really matters, and their enthusiasm for helping one another make forward strides.

  • As an increasing focus at MIRI, empirically grounded computer science work on the AI alignment problem, with clear feedback of the form “did my code type-check?” or “do we have a proof?”.

  • Finally, some good, old-fashioned fun—for a certain very specific brand of “fun” that includes the satisfaction that comes from making progress on important technical challenges, the enjoyment that comes from pursuing lines of research you find compelling without needing to worry about writing grant proposals or otherwise raising funds, and the thrill that follows when you finally manage to distill a nugget of truth from a thick cloud of confusion.

Working at MIRI also means working with other people who were drawn by the very same factors—people who seem to me to have an unusual degree of care and concern for human welfare and the welfare of sentient life as a whole, an unusual degree of creativity and persistence in working on major technical problems, an unusual degree of cognitive reflection and skill with perspective-taking, and an unusual level of efficacy and grit.

My own experience at MIRI has been that this is a group of people who really want to help Team Life get good outcomes from the large-scale events that are likely to dramatically shape our future; who can tackle big challenges head-on without appealing to false narratives about how likely a given approach is to succeed; and who are remarkably good at fluidly updating on new evidence, and at creating a really fun environment for collaboration.

Who are we seeking?

We’re seeking anyone who can cause our “become less confused about AI alignment” work to go faster.

In practice, this means: people who natively think in math or code, who take seriously the problem of becoming less confused about AI alignment (quickly!), and who are generally capable. In particular, we’re looking for high-end Google programmer levels of capability; you don’t need a 1-in-a-million test score or a halo of destiny. You also don’t need a PhD, an explicit ML background, or even prior research experience.

Even if you’re not pointed towards our research agenda, we intend to fund or help arrange funding for any deep, good, and truly new ideas in alignment. This might be as a hire, a fellowship grant, or whatever other arrangements may be needed.


If you’d like more information, there are several good options:

  • Chat with Buck Shlegeris, a MIRI computer scientist who helps out with our recruiting. In addition to answering any of your questions and running interviews, Buck can sometimes help skilled programmers take some time off to skill-build through our AI Safety Retraining Program.

  • If you already know someone else at MIRI and talking with them seems better, you might alternatively reach out to that person, especially Blake Borgeson (a new MIRI board member who helps us with technical recruiting) or Anna Salamon (a MIRI board member who is also the president of CFAR, and is helping run some MIRI recruiting events).

  • Come to a 4.5-day AI Risk for Computer Scientists workshop, co-run by MIRI and CFAR. These workshops are open only to people who Buck arbitrarily deems “probably above MIRI’s technical hiring bar,” though their scope is wider than simply hiring for MIRI: the basic idea is to get a bunch of highly capable computer scientists together to try to fathom AI risk (with a good bit of rationality content, and of trying to fathom the way we’re failing to fathom AI risk, thrown in for good measure).

    These are a great way to get a sense of MIRI’s culture, and to pick up a number of thinking tools whether or not you are interested in working for MIRI. If you’d like to either apply to attend yourself or nominate a friend of yours, send us your info here.

  • Come to next year’s MIRI Summer Fellows program, or be a summer intern with us. This is a better option for mathy folks aiming at Agent Foundations than for computer sciencey folks aiming at our new research directions. This last summer we took 6 interns and 30 MIRI Summer Fellows (see Malo’s Summer MIRI Updates post for more details). Also, note that “summer internships” need not occur during summer, if some other schedule is better for you. Contact Colm Ó Riain if you’re interested.

  • You could just try applying for a job.

Some final notes

A quick note on “inferential distance,” or on what it sometimes takes to understand MIRI researchers’ perspectives: To many, MIRI’s take on things is really weird. Many people who bump into our writing somewhere find our basic outlook pointlessly weird/silly/wrong, and thus find us uncompelling forever. Even among those who do ultimately find MIRI compelling, many start off thinking it’s weird/silly/wrong and then, after some months or years of MIRI’s worldview slowly rubbing off on them, eventually find that our worldview makes a bunch of unexpected sense.

If you think that you yourself might be in this latter category, and that such a change of viewpoint, should it occur, would be because MIRI’s worldview is onto something and not because we all got tricked by false-but-compelling ideas… you might want to start exposing yourself to all this funny worldview stuff now, and see where it takes you. Good starting-points are Rationality: From AI to Zombies; Inadequate Equilibria; Harry Potter and the Methods of Rationality; the “AI Risk for Computer Scientists” workshops; ordinary CFAR workshops; or just hanging out with folks in or near MIRI.

I suspect I’ve failed to communicate some key things above, based on past failed attempts to communicate my perspective, and based on some readers of earlier drafts of this post missing key things I’d wanted to say. I’ve tried to clarify as many points as possible (hence this post’s length!), but in the end, “we’re focusing on research and not exposition now” holds for me too, and I need to get back to the work.15

A note on the state of the field: MIRI is one of the dedicated teams trying to solve technical problems in AI alignment, but we’re not the only such team. There are currently three others: the Center for Human-Compatible AI at UC Berkeley, and the safety teams at OpenAI and at Google DeepMind. All three of these safety teams are highly capable, top-of-their-class research groups, and we recommend them too as potential places to join if you want to make a difference in this field.

There are also solid researchers at many other institutions, such as the Future of Humanity Institute, whose Governance of AI Program focuses on the important social/coordination problems associated with AGI development.

To learn more about AI alignment research at MIRI and other groups, I recommend the MIRI-produced Agent Foundations and Embedded Agency write-ups; Dario Amodei, Chris Olah, et al.’s Concrete Problems agenda; the AI Alignment Forum; and Paul Christiano’s and the DeepMind safety team’s blogs.

On working here: Salaries here are more flexible than people usually suppose. I’ve had a number of conversations with folks who assumed that because we’re a nonprofit, we wouldn’t be able to pay them enough to maintain their desired standard of living, meet their financial goals, support their family well, or similar. This is false. If you bring the right skills, we’re likely able to provide the compensation you need. We also place a high value on weekends and vacation time, on avoiding burnout, and in general on people here being happy and thriving.

You do need to be physically in Berkeley to work with us on the projects we think are most exciting, though we have pretty great relocation assistance and ops support for moving.


On the other hand, if you like the idea of an epic calling with a group of people who somehow claim to take seriously a task that sounds more like it comes from a science fiction novel than from a Dilbert strip, while having a lot of scientific fun; or you just care about humanity’s future, and want to help however you can… give us a call.

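  1. This post is an amalgam put together by a variety of MIRI staff. The byline saying “Nate” means that I (Nate) endorse the post, that many of the concepts and themes come in large part from me, and that I wrote a lot of the words. However, I did not write all of the words, and the concepts and themes were built in collaboration with many other MIRI staff. (This is roughly what bylines have meant on the MIRI blog for a while now, and it’s worth mentioning.)
  2. See our 2017 strategic update and fundraiser posts for more details.
  3. In past fundraisers, we have said that with sufficient funding, we would like to pursue alternative lines of attack on the alignment problem. Our new research directions can be seen as following in this spirit, and indeed, at least one of our new research directions is among the alternative approaches we were considering in 2015. Compared with our 2015 fundraiser plans, our new work is quite continuous with our Agent Foundations-style research.
  4. That is, the requisites for aligning AGI systems to perform limited tasks; not all of the requisites for aligning a CEV-class autonomous AGI. Compare Paul Christiano’s distinction between ambitious and narrow value learning (though note that Paul thinks narrow value learning is sufficient for strongly autonomous AGI).
  5. This result is described in more detail in a paper that will be out soon. Or, at least, eventually. For reasons discussed below, I’m not spending much time writing papers these days.
  6. For more discussion of this concept, see “Personal Thoughts on Careers in AI Policy and Strategy” by Carrick Flynn.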
  7. Historical examples of deconfusion work that gave rise to a rich and healthy field include the distillation of Lagrangian and Hamiltonian mechanics from Newton’s laws; Cauchy’s overhaul of real analysis; the slow acceptance of the usefulness of complex numbers; and the development of formal foundations of mathematics.
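  8. I should emphasize that from my perspective, humanity never building AGI, never realizing our potential, and never making use of the cosmic endowment would be a tragedy comparable (on an astronomical scale) to AGI wiping us out. I say “hazardous,” but we shouldn’t lose sight of the upside for humanity.
  9. My own feeling is that I and other senior staff at MIRI have never been particularly good at explaining what we’re doing and why, so this inconvenience may not be a new thing. It’s new, however, for us to not be making it a priority to attempt to explain where we’re coming from.
  10. In other words, many people are explicitly focusing only on outreach, and many others are selecting technical problems with the stated goal of strengthening the field and drawing others into it.
  11. This isn’t meant to suggest that nobody else is taking a straight shot at the core problems. For example, OpenAI’s Paul Christiano is a top-tier researcher who is doing exactly that. But we nonetheless want more of this on the present margin.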
  12. For example, perhaps the easiest path to unalignable AGI involves following descendants of today’s gradient descent and deep learning techniques, and perhaps the same is true for alignable AGI.
  13. In other words, retreats/rooms where it is common knowledge that all thoughts and ideas are not going to be shared, except perhaps after some lengthy and irritating bureaucratic process and with everyone’s active support.
  14. As an aside, perhaps my main discomfort with attempting to publish academic papers is that there appears to be no venue in AI where we can go to say, “Hey, check this out: we used to be confused about X, and now we can say Y, which means we’re a little bit less confused!” I think there are a bunch of reasons behind this, not least the fact that the nature of confusion is such that Y usually sounds obvious once it has been said, which makes it especially hard to make such a result sound like an impressive practical result.

    A side effect of this, unfortunately, is that all MIRI papers that I’ve ever written with the goal of academic publishing do a pretty bad job of saying what I was previously confused about, and how the “result” is indicative of me becoming less confused—for which I hereby apologize.

  15. If you have more questions, I encourage you to email us at contact@www.hdjkn.com.