2018 Update: Our New Research Directions

| MIRI Strategy, News

For many years, MIRI’s goal has been to resolve enough fundamental confusions around alignment and intelligence to enable humanity to think clearly about technical AI safety risks, and to do so before this technology advances to the point of potential catastrophe. This goal has always seemed to us to be difficult, but possible.1

Last year, we said that we were beginning a new research program aimed at this goal.2 Here, we’re going to provide background on how we’re thinking about this new set of research directions, lay out some of the thinking behind our recent decision to do less default sharing of our research, and make the case for interested software engineers to join our team and help push our understanding forward.

Contents:

  1. Our research
  2. Why deconfusion is so important to us
  3. Nondisclosed-by-default research, and how this policy fits into our overall strategy
  4. Joining the MIRI team

1. Our research

In 2014, MIRI published its first research agenda, “Agent Foundations for Aligning Machine Intelligence with Human Interests.” Since then, one of our main research priorities has been to develop a better conceptual understanding of embedded agency: formally characterizing reasoning systems that lack a crisp agent/environment boundary, are smaller than their environment, must reason about themselves, and risk having parts that are working at cross purposes. These research problems continue to be a major focus at MIRI, and are being studied in parallel with our new research directions (which I’ll be focusing on more below).3

From our perspective, the point of working on these problems isn’t that solutions directly tell us how to build well-aligned AGI systems. Rather, the point is to resolve our confusions around ideas like “alignment” and “AGI,” so that future AGI developers have an unobstructed view of the problem. Eliezer illustrates this view in “The Rocket Alignment Problem,” which imagines a world where humanity tries to land on the Moon before it understands Newtonian mechanics or calculus.

Recently, some MIRI researchers developed new research directions that seem to enable more scalable progress towards resolving these fundamental confusions. Specifically, the progress is more scalable in researcher hours—it’s now the case that we believe excellent engineers coming from a variety of backgrounds can have their work efficiently converted into research progress at MIRI—where previously, we only knew how to speed our research progress with a (relatively atypical) breed of mathematician.

At the same time, we’ve seen some significant financial success over the past year—not so much that funding is no longer a constraint at all, but enough to pursue our research agenda from new and different directions, in addition to the old.

Furthermore, our view implies that haste is essential. We see AGI as a likely cause of existential catastrophe, particularly if it is developed using relatively brute-force, difficult-to-interpret techniques; and while we’re quite uncertain about when humanity’s collective deadline will come to pass, many of us are somewhat alarmed by the speed of recent machine learning progress.

For these reasons, we’re eager to locate the right people quickly and offer them work on these new approaches; and with this kind of help, it strikes us as very possible that we can resolve enough fundamental confusion in time to port the understanding to those who will need it before AGI is built and deployed.

Comparing our new research directions and Agent Foundations

Our new research directions involve building software systems that we can use to test our intuitions, and building infrastructure that allows us to rapidly iterate this process. Like the Agent Foundations agenda, our new research directions continue to focus on “deconfusion,” rather than on, e.g., trying to improve robustness metrics of current systems—our sense being that even if we make major strides on this kind of robustness work, an AGI system built on principles similar to today’s systems would still be too opaque to align in practice.

In a sense, you can think of our new research as tackling the same sort of problem that we’ve always been attacking, but from new angles. In other words, if you aren’t excited about logical inductors or functional decision theory, you probably wouldn’t be excited by our new work either. Conversely, if you already have the sense that becoming less confused is a sane way to approach AI alignment, and you’ve been wanting to see those kinds of confusions attacked with software and experimentation in a manner that yields theoretical satisfaction, then you may well want to work at MIRI. (I’ll have more to say about this below.)

Our new research directions stem from some distinct ideas developed by Benya Fallenstein, Eliezer Yudkowsky, and myself (Nate Soares). Some high-level themes of these new directions include:

  1. Seeking entirely new low-level foundations for optimization, designed for transparency and alignability from the get-go, as an alternative to gradient-descent-style machine learning foundations.

    Note that this does not entail trying to beat modern ML techniques on computational efficiency, speed of development, ease of deployment, or other such properties. However, it does mean developing new foundations for optimization that are broadly applicable in the same way, and for some of the same reasons, that gradient descent scales to be broadly applicable, while possessing significantly better alignment characteristics.

    We’re aware that there are many ways to attempt this that are shallow, foolish, or otherwise doomed; and in spite of this, we believe our own research avenues have a shot.

  2. Endeavoring to figure out parts of cognition that can be very transparent as cognition, without being GOFAI or being completely divorced from subsymbolic cognition.

  3. Experimenting with some specific alignment problems that are deeper than problems that have previously been put into computational environments.

A common thread running through all of our new approaches is a focus on using high-level theoretical abstractions to enable coherent reasoning about the systems we build. One concrete implication of this is that we write a lot of our code in Haskell, and we often think about our code through the lens of type theory.
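
To give a flavor of what “thinking about code through the lens of type theory” can mean in practice, here is a toy sketch of my own devising (not an excerpt from MIRI’s codebase): a standard Haskell idiom in which the type system itself rules out incoherent programs, so whole classes of errors cannot even be expressed.

```haskell
{-# LANGUAGE GADTs #-}

-- A tiny expression language whose GADT indexes each expression by the
-- type it evaluates to. Ill-formed terms such as `Add (LitB True) (LitI 1)`
-- are rejected by the compiler rather than discovered by testing.
data Expr a where
  LitI :: Int  -> Expr Int
  LitB :: Bool -> Expr Bool
  Add  :: Expr Int  -> Expr Int -> Expr Int
  If   :: Expr Bool -> Expr a   -> Expr a -> Expr a

-- Because the type index does the bookkeeping, the evaluator is total:
-- there is no runtime type-error case left to handle.
eval :: Expr a -> a
eval (LitI n)   = n
eval (LitB b)   = b
eval (Add x y)  = eval x + eval y
eval (If c t e) = if eval c then eval t else eval e

main :: IO ()
main = print (eval (If (LitB True) (Add (LitI 1) (LitI 2)) (LitI 0)))  -- prints 3
```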

We aren’t going to distribute the technical details of this work anytime soon, in keeping with the recent MIRI policy changes discussed below. However, we have a good deal to say about this research on the meta level.

We are excited about these research directions, both for their present properties and for the way they seem to be developing. When Benya began the predecessor of this work ~3 years ago, we didn’t know whether her intuitions would pan out. Today, having watched the pattern by which research avenues in these spaces have opened up new exciting-feeling lines of inquiry, none of us expect this research to die soon, and some of us are hopeful that this work may eventually open pathways to attacking the entire list of basic alignment issues.4

Similarly, we’ve been happy with the degree to which useful cross-connections have emerged between initially unrelated-seeming strands of our research. For example, during a period when I was focusing mainly on the new lines of research, I stumbled upon a solution to the original version of the tiling agents problem from the Agent Foundations agenda.5

This work seems to “give out its own guideposts” more than the Agent Foundations agenda does. While we used to require extremely close fit of our hires on research taste, we now think we have enough sense of the terrain that we can relax those requirements somewhat. We’re still looking for hires who are scientifically innovative and who are fairly close on research taste, but our work is now much more scalable with the number of good mathematicians and engineers working at MIRI.

That said, however promising the last couple of years have seemed to us, this is still “blue sky” research in the sense that we’d guess most people outside MIRI would still regard it as of academic interest, but of no practical interest. The more principled/coherent/alignable optimization algorithms we are investigating are not going to sort cat pictures from non-cat pictures anytime soon.

The thing that generally excites us about research results is the extent to which they grant us “deconfusion” in the sense described in the next section, not the ML/engineering power they directly enable. This “deconfusion” that they allegedly reflect must, for the moment, be discerned mostly via abstract arguments supported only weakly by concrete “look what this understanding lets us do” demos. Many of us at MIRI regard our work as being of strong practical relevance nonetheless—but that is because we have long-term models of what sorts of short-term feats indicate progress, and because we view becoming less confused about alignment as having a strong practical relevance to humanity’s future, for reasons that I’ll sketch out next.

2. Why deconfusion is so important to us

What we mean by deconfusion

Quoting Anna Salamon, president of the Center for Applied Rationality and a MIRI board member:

If I didn’t have the concept of deconfusion, MIRI’s efforts would seem mostly baffling to me. Many larger, wealthier organizations are talking about AI safety, and yet MIRI continues to regard its own work as existentially significant for humanity’s survival. It’s a group that got excited about Logical Induction (and was paranoid about making sure logical induction “wasn’t dangerous” before releasing it), even though logical induction involves only a moderate amount of math and no practical engineering at all (and the same goes for timeless decision theory, to pick an even more extreme example). It’s a group that continues to stare mostly at basic concepts, sitting reclusively off by itself, while mostly leaving questions of politics, outreach, and how much influence the AI safety community has, to others.

However, I do have the concept of deconfusion. And when I look at MIRI’s activities through that lens, MIRI seems to me much more like “oh, yes, good, someoneistaking a straight shot at what looks like the critical thing” and “they seem to have a fighting chance” and “gosh, I hope they (or someone somehow) solve many many more confusions before the deadline, because without such progress, humanity sure seems kinda sunk.”

I agree that without the idea I’m calling “deconfusion,” MIRI’s perspective and strategy don’t make much sense. You may already have a partial version of this concept from reading MIRI strategy updates, but I’ve found that conveying the full idea is not trivial, so I’ll ask for your patience as I try to put it into words.

By deconfusion, I mean something like “making it so that you can think about a given topic without continuously accidentally spouting nonsense.”

To give a concrete example, my thoughts about infinity as a 10-year-old were made of rearranged confusion rather than of anything coherent, as were the thoughts of even the best mathematicians from 1700. “How can 8 plus infinity still be infinity? What happens if we subtract infinity from both sides of the equation?” But my thoughts about infinity as a 20-year-old were not similarly confused, because, by then, I’d been exposed to the more coherent concepts that later mathematicians labored to produce. I wasn’t as smart or as good of a mathematician as Georg Cantor or the best mathematicians from 1700; but deconfusion can be transferred between people; and this transfer can spread the ability to think actually coherent thoughts.
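
To make that deconfusion concrete (my gloss, using the concepts Cantor later supplied): in cardinal arithmetic,

$$\aleph_0 + 8 = \aleph_0 \quad\text{and}\quad \aleph_0 + 0 = \aleph_0,$$

so if “subtracting infinity from both sides” were a legitimate operation, it would prove that 8 = 0. The deconfused move is not a clever answer to the 10-year-old’s question, but the realization that cardinal subtraction is simply not well-defined; the question dissolves rather than getting answered.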

In 1998, conversations about AI risk and technological singularity scenarios often went in circles in a funny sort of way. People who are serious thinkers about the topic today, including my colleagues Eliezer and Anna, said things that today sound confused. (When I say “things that sound confused,” I have in mind things like “isn’t intelligence an incoherent concept,” “but the economy is already superintelligent,” “if a superhuman AI is smart enough that it could kill us, it’ll also be smart enough to see that that isn’t what the good thing to do is, so we’ll be fine,” “we’re Turing-complete, so it’s impossible to have something dangerously smarter than we are, since Turing-complete computations can emulate anything,” and “anyhow, we could just unplug it.”) Today, these conversations are different. In between, folks worked to make themselves and others less fundamentally confused about these topics—so that today, a 14-year-old who wants to skip to the end of all that incoherence can just pick up a copy of Nick Bostrom’s Superintelligence.6

Of note is the fact that the “take AI risk and technological singularities seriously” meme started to spread to the larger population of ML scientists only after its main proponents attained sufficient deconfusion. If you were living in 1998 with a strong intuitive sense that AI risk and technological singularities should be taken seriously, but you still possessed a host of confusion that caused you to occasionally spout nonsense as you struggled to put things into words in the face of various confused objections, then evangelism would do you little good among serious thinkers—perhaps because the respectable scientists and engineers in the field can smell nonsense, and can tell (correctly!) that your concepts are still incoherent. It’s by accumulating deconfusion until your concepts cohere and your arguments become well-formed that your ideas can become memetically fit and spread among scientists—and can serve as foundations for future work by those same scientists.

Interestingly, the history of science is in fact full of instances in which individual researchers possessed a mostly-correct body of intuitions for a long time, and then eventually those intuitions were formalized, corrected, made precise, and transferred between people. Faraday discovered a wide array of electromagnetic phenomena, guided by an intuition that he wasn’t able to formalize or transmit except through hundreds of pages of detailed laboratory notes and diagrams; Maxwell later invented the language to describe electromagnetism formally by reading Faraday’s work, and expressed those hundreds of pages of intuitions in three lines.
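
For concreteness, here is the distillation in question, in the modern vector-calculus form (due to Heaviside) in which it is usually written today:

$$\nabla \cdot \mathbf{E} = \frac{\rho}{\varepsilon_0}, \qquad \nabla \cdot \mathbf{B} = 0, \qquad \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \qquad \nabla \times \mathbf{B} = \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}.$$

Hundreds of pages of Faraday’s hard-won intuitions about fields, induction, and charge are captured here in a form any physics student can now pick up and apply.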

An even more striking example is the case of Archimedes, who intuited his way to the ability to do useful work in both integral and differential calculus thousands of years before calculus became a simple formal thing that could be passed between people.

In both cases, it was the eventual formalization of those intuitions—and the linked ability of these intuitions to be passed accurately between many researchers—that allowed the fields to begin building properly and quickly.7

Why deconfusion (on our view) is highly relevant to AI accident risk

If human beings eventually build smarter-than-human AI, and if smarter-than-human AI is as powerful and hazardous as we currently expect it to be, then AI will one day bring enormous forces of optimization to bear.8 We believe that when this happens, those enormous forces will need to be brought to bear on real-world problems and subproblems deliberately, in a context where they’re theoretically well-understood. The larger those forces are, the more precision is called for when researchers aim them at cognitive problems.

We suspect that today’s concepts about things like “optimization” and “aiming” are incapable of supporting the necessary precision, even if wielded by researchers who care a lot about safety. Part of why I think this is that if you pushed me to explain what I mean by “optimization” and “aiming,” I’d need to be careful to avoid spouting nonsense—which indicates that I’m still confused somewhere around here.

A worrying fact about this situation is that, as best I can tell, humanity doesn’t need coherent versions of these concepts to hill-climb its way to AGI. Evolution hill-climbed that distance, and evolution had no model of what it was doing. But as evolution applied massive optimization pressure to genomes, those genomes started coding for brains that internally optimized for targets that merely correlated with genetic fitness. Humans find more and more ways to satisfy our own goals (video games, ice cream, birth control…), even when doing so runs directly counter to the selection criterion that gave rise to us: “propagate your genes into the next generation.”

If we are to avoid a similar fate—one where we attain AGI via huge amounts of gradient descent and other optimization techniques, only to find that the resulting system has internal optimization targets that are very different from the targets we externally optimized it to be adept at pursuing—then we must be more careful.

As AI researchers explore the space of optimizers, what will it take to ensure that the first highly capable optimizers that researchers find are optimizers they know how to aim at chosen tasks? I’m not sure, because I’m still in some sense confused about the question. I can tell you vaguely how the problem relates to convergent instrumental incentives, and I can observe various reasons why we shouldn’t expect the strategy “train a large cognitive system to optimize for X” to actually result in a system that internally optimizes for X, but there are still wide swaths of the question where I can’t say much without saying nonsense.

As an example, AI systems like Deep Blue and AlphaGo cannot reasonably be said to be reasoning about the whole world. They’re reasoning about some much simpler abstract platonic environment, such as a Go board. There’s an intuitive sense in which we don’t need to worry about these systems taking over the world, for this reason (among others), even in the world where those systems are run on implausibly large amounts of compute.

Vaguely speaking, there is a sense in which some alignment difficulties don’t arise until an AI system is “reasoning about the real world.” But what does that mean? It doesn’t seem to mean “the space of possibilities the system considers actually includes reality itself.” Ancient humans did perfectly good general reasoning even while utterly lacking the concept that the universe can be described by specific physical equations.

It looks like it must instead mean something more like “the system is building internal models that, in some sense, are little representations of the whole of reality.” But what counts as a “little representation of reality,” and why do a hunter-gatherer’s confused thoughts about a spirit-riddled forest count while a chessboard doesn’t? All these questions are likely confused; my goal here is not to name coherent questions, but to gesture in the direction of a confusion that prevents me from precisely naming a portion of the alignment problem.

Or, to put it briefly: precisely naming a problem is half the battle, and we are currently confused about how to precisely name the alignment problem.

For an alternative attempt to name this concept, refer to Eliezer’s rocket alignment analogy. For a further discussion of some of the reasons today’s concepts seem inadequate for describing an aligned intelligence with sufficient precision, see Scott and Abram’s recent write-up. (Or come discuss with us in person, at an “AI Risk for Computer Scientists” workshop.)

Why this research may be tractable here and now

Many types of research become far easier at particular places and times. It seems to me that for the work of becoming less confused about AI alignment, MIRI in 2018 (and for a good number of years to come, I think) is one of those places and times.

Why? One point is that MIRI has some history of success at deconfusion-style research (according to me, at least), and MIRI’s researchers are beneficiaries of the local research traditions that grew up in dialog with that work. The bits of conceptual progress MIRI has contributed to include logical inductors, functional decision theory, and the tiling agents and embedded agency work mentioned above.

Logical inductors, as an example, give us at least a clue about why we’re apt to informally use words like “probably” in mathematical reasoning. It’s not a full answer to “how does probabilistic reasoning about mathematical facts work?”, but it does feel like an interesting hint—which is relevant to thinking about how “real-world” AI reasoning could possibly work, because AI systems might well also use probabilistic reasoning in mathematics.

A second point is that, if there is anything that unites most folks at MIRI besides the drive to increase humanity’s odds of survival, it is probably a taste for getting our understanding of the foundations of the universe right. Many of us bring that taste to this work—for instance, many of us have backgrounds in physics (particularly foundations of physics), and those of us with programming backgrounds tend to have an interest in topics like type theory, formal logic, and/or probability theory.

A third point, as mentioned above, is that we are excited about our current bodies of research intuitions, and about how they seem increasingly transferable/cross-applicable/concretizable over time.

Finally, I observe that the field of AI at large is currently highly vitalized, largely by the deep learning revolution and various other advances in machine learning. We are not particularly focused on deep neural networks ourselves, but being in contact with a vibrant and exciting practical field is the sort of thing that tends to spark ideas. 2018 really seems like an unusually easy time to be seeking a theoretical science of AI alignment, in dialog with practical AI methods that are beginning to work.

3. Nondisclosed-by-default research, and how this policy fits into our overall strategy

MIRI has recently decided to make most of its research “nondisclosed-by-default,” by which we mean that going forward, most results discovered within MIRI will remain internal-only unless there is an explicit decision to release them, based usually on a specific anticipated safety upside from their release.

I’d like to try to share some sense of why we chose this policy—especially because this policy may prove disappointing or inconvenient for many people interested in AI safety as a research area.9 MIRI is a nonprofit, and there’s a natural default assumption that our mechanism for good is to regularly publish new ideas and insights. But we don’t think this is currently the right choice for serving our nonprofit mission.

The short version of why we chose this policy is:

  • we’re in a hurry to decrease existential risk;

  • in the same way that Faraday’s journals aren’t nearly as useful as Maxwell’s equations, and in the same way that logical induction isn’t all that useful to the average modern ML researcher, we don’t think it would be that useful to try to share lots of half-confused thoughts with a wider set of people;

  • we think we can have more of the critical insights, faster, if we stay focused on making new research progress rather than on exposition, and if we don’t have to justify our intuitions to wide audiences;

  • we think it’s not unreasonable to be anxious about whether deconfusion-style insights could lead to capabilities insights, and have empirically observed we can think more freely when we don’t have to worry about this; and

  • even if, on reflection, we conclude that these concerns are paranoid or silly, we benefit from being able to defer the cognitive work of evaluating such fears from “before sharing an insight internally” to “before distributing that insight widely,” which this policy enables.

The somewhat longer version is below.

I’ll caveat that in what follows I’m attempting to convey what I believe, but not necessarily why—I am not trying to give an argument that would cause any rational person to take the same strategy in my position; I am shooting only for the more modest goal of conveying how I myself am thinking about the decision.

I’ll begin by saying a few words about how our research fits into our overall strategy, then discuss the pros and cons of this policy.

When we say we’re doing AI alignment research, we really genuinely don’t mean outreach

At present, MIRI’s aim is to make research progress on the alignment problem. Our focus isn’t on shifting the field of ML toward taking AGI safety more seriously, nor on any other form of influence, persuasion, or field-building. We are simply and only aiming to directly make research progress on the core problems of alignment.

This choice may seem surprising to some readers—field-building and other forms of outreach can obviously have hugely beneficial effects, and throughout MIRI’s history, we’ve been much more outreach-oriented than the typical math research group.

Our impression is indeed that well-targeted outreach efforts can be highly valuable. However, attempts at outreach/influence/field-building seem to us to currently constitute a large majority of worldwide research activity that’s motivated by AGI safety concerns,10 such that MIRI’s time is better spent on taking a straight shot at the core research problems. Further, we think our own comparative advantage lies here, and not in outreach work.11

My beliefs here are connected to my beliefs about the mechanics of deconfusion described above. In particular, I believe that the alignment problem might start seeming significantly easier once it can be precisely named, and I believe that precisely naming this sort of problem is likely to be a serial challenge—in the sense that some deconfusions cannot be attained until other deconfusions have matured. Additionally, my read on history says that deconfusions regularly come from relatively small communities thinking the right kinds of thoughts (as in the case of Faraday and Maxwell), and that such deconfusions can spread rapidly as soon as the surrounding concepts become coherent (as exemplified by Bostrom’s Superintelligence). I conclude from all this that trying to influence the wider field isn’t the best place to spend our own efforts.

It is difficult to predict whether successful deconfusion work could spark capability advances

We think that most of MIRI’s expected impact comes from worlds in which our deconfusion work eventually succeeds—that is, worlds where our research eventually leads to a principled understanding of alignable optimization that can be communicated to AI researchers, more akin to a modern understanding of calculus and differential equations than to Faraday’s notebooks (with the caveat that most of us aren’t expecting solutions to the alignment problem to compress nearly so well as calculus or Maxwell’s equations, but I digress).

One very plausible way this could happen is for our deconfusion work to make alignment possible without much changing the set of available pathways to AGI.12 To pick a trivial analogy illustrating this sort of world, consider interval arithmetic as compared to the usual way of doing floating point operations. In interval arithmetic, an operation like sqrt takes two floating point numbers, a lower and an upper bound, and returns a lower and an upper bound on the result. Figuring out how to do interval arithmetic requires some careful thinking about the error of floating-point computations, and it certainly won’t speed those computations up; the only reason to use it is to ensure that the error incurred in a floating point operation isn’t larger than the user assumed. If you discover interval arithmetic, you’re at no risk of speeding up modern matrix multiplications, despite the fact that you really have found a new way of doing arithmetic that has certain desirable properties that normal floating-point arithmetic lacks.
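
A minimal sketch of the interval-arithmetic idea just described (a toy example of my own, not a production library): each value carries a lower and an upper bound, and every operation propagates those bounds.

```haskell
-- An interval tracks guaranteed lower and upper bounds on a real value.
data Interval = Interval { lowerBound :: Double, upperBound :: Double }
  deriving Show

-- Addition: the bounds add pointwise.
iAdd :: Interval -> Interval -> Interval
iAdd (Interval a b) (Interval c d) = Interval (a + c) (b + d)

-- Square root: sqrt is monotonic on the nonnegative reals, so the bounds
-- map through directly. (A careful implementation would additionally round
-- the lower bound down and the upper bound up, to account for the
-- floating-point error of sqrt itself; that outward rounding is the point
-- of the whole exercise.)
iSqrt :: Interval -> Interval
iSqrt (Interval a b) = Interval (sqrt (max 0 a)) (sqrt b)

main :: IO ()
main = print (iSqrt (Interval 1.9 2.1))
-- prints guaranteed bounds on the square root of "roughly 2"
```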

In worlds where deconfusing ourselves about alignment leads us primarily to insights similar (on this axis) to interval arithmetic, it would be best for MIRI to distribute its research as widely as possible, especially once it has reached a stage where it is comparatively easy to communicate, in order to encourage AI capabilities researchers to adopt and build upon it.

However, it is also plausible to us that a successful theory of alignable optimization may itself spark new research directions in AI capabilities. For an analogy, consider the progression from classical probability theory and statistics to a modern deep neural net classifying images. Probability theory alone does not let you classify cat pictures, and it is possible to understand and implement an image classification network without thinking much about probability theory; but probability theory and statistics were central to the way machine learning was actually discovered, and still underlie how modern deep learning researchers think about their algorithms.

In worlds where deconfusing ourselves about alignment leads to insights similar (on this axis) to probability theory, it is much less clear whether distributing our results widely would have a positive impact. It goes without saying that we want to have a positive impact (or, at the very least, a neutral impact), even in those sorts of worlds.

The latter scenario is relatively less important in worlds where AGI timelines are short. If current deep learning research is already on the brink of AGI, for example, then it becomes less plausible that the results of MIRI’s deconfusion work could become a relevant influence on AI capabilities research, and most of the potential impact of our work would come from its direct applicability to deep-learning-based systems. While many of us at MIRI believe that short timelines are at least plausible, there is significant uncertainty and disagreement about timelines inside MIRI, and I would not feel comfortable committing to a course of action that is safe only in worlds where timelines are short.

Summarizing: if we keep making progress, and eventually substantially succeed—figuring out concepts that actually “carve nature at its joints” and let us think coherently about alignment—I find it quite plausible that those same concepts could also enable capabilities boosts (especially in worlds where there’s a lot of time for those concepts to be pushed in capabilities-facing directions). There is certainly strong historical precedent for deep scientific insights yielding unexpected practical applications.

By the nature of deconfusion work, it seems very difficult to predict in advance which other ideas a given insight may unlock. These considerations seem to us to call for conservatism and delay on information releases—potentially very long delays, as it can take quite a bit of time to figure out where a given insight leads.

We need our researchers to not have walls within their own heads

We take our research seriously at MIRI. This means that, for many of us, we know in the back of our minds that deconfusion-style research could sometimes (often in an unpredictable fashion) open up pathways that can lead to capabilities insights in the manner discussed above. As a consequence, many MIRI researchers flinch away from having insights when they haven’t spent a lot of time thinking about the potential capabilities implications of those insights down the line—and they usually haven’t spent that time, because it requires a bunch of cognitive overhead. This effect has been evidenced in reports from researchers, myself included, and we’ve empirically observed that when we set up “closed” research retreats or research rooms,13 researchers report that they can think more freely, that their brainstorming sessions extend further and wider, and so on.

This sort of inhibition seems quite bad for research progress. It is not some small domain that our researchers are (un- or semi-consciously) flinching away from; it’s a rather wide swath that very possibly includes most of the deep ideas or insights we’re looking for.

At the same time, this caution is an unavoidable consequence of doing deconfusion research in public, because it’s hard to know what ideas may follow five or ten years after a given insight. AI alignment work and AI capabilities work are close enough neighbors that many insights in the vicinity of AI should be treated as “possibly capabilities-relevant until proven harmless,” both for the reasons discussed above, and as a matter of the conservative security mindset we try to encourage around here.

In short, if we request that our brains come up with alignment ideas that are fine to share with everybody—and this is what we’re implicitly doing when we think of ourselves as “researching publicly”—then we’re requesting that our brains cut off the massive portion of the search space that is only probably safe.

If our goal is to make research progress as quickly as possible, in hopes of having concepts coherent enough to allow rigorous safety engineering by the time AGI arrives, then it seems worth finding ways to allow our researchers to think without constraints, even when those ways are somewhat expensive.

Focus seems unusually useful for this kind of work

There may be some additional speed-up effects from helping free up researchers’ attention, though we don’t consider this a major consideration on its own.

Historically, early-stage scientific work has often been done by people who were solitary or geographically isolated, perhaps because this makes it easier to slowly develop a new way to factor the phenomenon, instead of repeatedly translating ideas into the current language others are using. It’s difficult to describe how much mental space and effort turns out to be taken up with thoughts of how your research will look to other people staring at you, until you try going into a closed room for an extended period of time with a promise to yourself that all the conversation within it really won’t be shared at all anytime soon.

Once we noticed this was going on, we realized that, in retrospect, we may have been ignoring common practice in a way. Many startup founders report finding stealth mode, and funding from sources other than VC outsiders, quite useful for focus. In this spirit, we have recently also been encouraging MIRI researchers not to worry about appealing to wide audiences when doing public-facing work. We want researchers to focus mainly on whatever research directions they find most compelling, treating exposition and distillation as secondary priorities, and not worrying about optimizing ideas for persuasiveness or for being easier to defend.

Early deconfusion work isn’t that useful yet

ML researchers aren’t running around using logical induction or functional decision theory. These theories don’t have practical relevance to the researchers on the ground, and they’re not supposed to; the point of these theories is just deconfusion.

To put it more precisely, the theories themselves aren’t the interesting novelty; the novelty is that a few years ago, we couldn’t write down any theory of how in principle to assign sane-seeming probabilities to mathematical facts, and today we can write down logical induction. In the journey from point A to point B, we became less confused. The logical induction paper is an artifact witnessing that deconfusion, and an artifact which conferred more deconfusion upon its authors as they went through the process of writing it. But the thing that excited me about logical induction was not any one particular algorithm or theorem in the paper; rather, it was the fact that we’re a little bit less in-the-dark than we were about how a reasoner can reasonably assign probabilities to logical sentences. We’re not fully out of the dark on this front, mind you, but we’re a little less confused than we were before.14

If the rest of the world were talking about how confusing they find the AI alignment topics we’re confused about, and were as concerned about their confusions as we are concerned about ours, then failing to share our research would feel a lot more costly to me. But as things stand, most people in the space look at us kind of funny when we say that we’re excited about things like logical induction, and I repeatedly encounter deep misunderstandings when I talk to people who have read some of our papers and tried to infer our research motivations, from which I conclude that they weren’t drawing a lot of benefit from my current ramblings anyway.

And in a sense most of our current research is a form of rambling—in the same way, at best, that Faraday’s journal was rambling. It’s OK if most practical scientists avoid slogging through Faraday’s journal and wait until Maxwell comes along and distills the thing down to three useful equations. And, if Faraday expects that physical theories eventually distill, he doesn’t need to go around evangelizing his journal—he can just wait until it’s been distilled, and then work to transmit some less-confused concepts.

We expect our understanding of alignment, which is currently far from complete, to eventually distill, and I, at least, am not very excited about attempting to push it on anyone until it’s significantly more distilled. (Or, barring full distillation, until a project with a commitment to the common good, an adequate security mindset, and a large professed interest in deconfusion research comes knocking.)

In the interim, of course, some researchers outside MIRI care about the same problems we’re confused about, and are also pursuing deconfusion. Our nondisclosed-by-default policy will have a negative effect on our ability to collaborate with these people on our other research directions, and this is a real cost not worth dismissing. I have nothing further to say here except that, if you’re one of these people: please get in touch (and you may want to consider joining the team)!

We’ll have a better picture of what to share or not share in the future

In the long run, if our research is going to be useful, our findings will need to go out into the world where they can impact how humanity builds AI systems. However, it doesn’t follow from this need for eventual distribution (of some sort) that we might as well publish all of our research immediately. As discussed above, as best I can tell, our current research insights just aren’t that practically useful, and sharing early-stage deconfusion research is time-intensive.

Our nondisclosed-by-default policy also allows us to preserve options like:

  • identifying the research results we think should be developed further, while taking into account differential technological development; and
  • deciding which group(s) to share each interesting finding with (e.g., the general public, other closed safety research groups, groups with strong commitment to security mindset and the common good, etc.).

Future versions of us obviously have better abilities to make calls on these sorts of questions, though this needs to be weighed against many facts that push in the opposite direction—the later we decide what to release, the less time others have to build upon it, and the more likely it is to be found independently in the interim (thereby wasting time on duplicated efforts), and so on.

Now that I’ve listed reasons in favor of our nondisclosed-by-default policy, I’ll note some reasons against.

Considerations against our nondisclosed-by-default policy

There are a host of pathways via which our work will be harder with this nondisclosed-by-default policy:

  1. We will have a harder time attracting and evaluating new researchers; sharing less research means getting fewer chances to try out various research collaborations and notice which collaborations work well for both parties.

  2. We lose some of the benefits of accelerating the progress of other researchers outside MIRI via sharing useful insights with them in real time as they are generated.

  3. We will be less able to get useful scientific insights and feedback from visitors, remote scholars, and researchers elsewhere in the world, since we will be sharing less of our work with them.

  4. We will have a harder time attracting funding and other indirect aid—with less of our work visible, it will be harder for prospective donors to know whether our work is worth supporting.

  5. We will have to pay various costs associated with keeping research private, including social costs and logistical overhead.

We expect these costs to be substantial. We will be working hard to offset some of the losses from the first item, as I’ll discuss in the next section. For reasons discussed above, I’m not presently very worried about the second. The remaining costs will likely just have to be paid.

These costs are why we didn’t adopt this policy (for most of our research) years ago. With outreach feeling less like our comparative advantage than it did in the pre-Puerto-Rico days, and with funding seeming like less of a bottleneck than it used to be (though still a bottleneck), this approach now seems workable.

We’ve already found it helpful in practice to let researchers have insights first and sort out the safety or desirability of publishing later. On the whole, then, we expect this policy to cause a significant net speed-up to our research progress, while ensuring that we can responsibly investigate some of the most important technical questions on our radar.

4. Joining the MIRI team

I believe that MIRI is, and will be for at least the next several years, a focal point of one of those rare scientifically exciting points in history, where the conditions are just right for humanity to substantially deconfuse itself about an area of inquiry it’s been pursuing for centuries—and one where the output is directly impactful in a way that is rare even among scientifically exciting places and times.

What can we offer? On my view:

  • Work that Eliezer, Benya, myself, and a number of researchers in AI safety view as having a significant chance of boosting humanity’s survival odds.

  • Work that, if it pans out, visibly has central relevance to the alignment problem—the kind of work that has a meaningful chance of shedding light on problems like “is there a loophole-free way to upper-bound the amount of optimization occurring within an optimizer?”.

  • Problems that, if your tastes match ours, feel closely related to fundamental questions about intelligence, agency, and the structure of reality; and the associated thrill of working on one of the great and wild frontiers of human knowledge, with large and important insights potentially close at hand.

  • An atmosphere in which people take their own and others’ research progress seriously. For example, you can expect coworkers who show up each day hoping to make real progress on the AI alignment problem, and who are willing to bend their thinking in whatever different directions are needed until that happens. I’ve been impressed by the drive of MIRI staff to actually get the job done—by the fact that their work genuinely matters to them, and by their visible enthusiasm for helping one another.

  • An increasing focus at MIRI on empirically grounded computer science work on the AI alignment problem, with crisp feedback of the form “does my code typecheck?” or “do we have a proof?”.

  • Finally, some good, old-fashioned fun—for a certain very specific brand of “fun” that includes the satisfaction that comes from making progress on important technical challenges, the enjoyment that comes from pursuing lines of research you find compelling without needing to worry about writing grant proposals or otherwise raising funds, and the thrill that follows when you finally manage to distill a nugget of truth from a thick cloud of confusion.

Working at MIRI also means working with other people who were drawn by the very same factors—people who seem to me to have an unusual degree of care and concern for human welfare and the welfare of sentient life as a whole, an unusual degree of creativity and persistence in working on major technical problems, an unusual degree of cognitive reflection and skill with perspective-taking, and an unusual level of efficacy and grit.

My experience of MIRI is that it’s a group of people who genuinely want to help sentient life achieve good outcomes from the large-scale events that may drastically shape our future; who can stare directly at big challenges without appealing to false narratives about how likely a given approach is to succeed; and who are remarkably good at fluidly updating on new evidence, and at creating a really fun environment for collaboration.

Who are we seeking?

We’re seeking anyone who can cause our “become less confused about AI alignment” work to go faster.

In practice, this means: people who think in math or code, who take seriously the problem of becoming less confused about AI alignment (quickly!), and who are generally capable. In particular, we’re looking for high-end Google programmer levels of capability; you don’t need a 1-in-a-million test score or a halo of destiny. You also don’t need a PhD, explicit ML background, or even prior research experience.

And even if you’re not pointed at our research agenda, we still intend to fund or arrange funding for any deep, good, and genuinely new ideas. This might take the form of a hire, a fellowship grant, or whatever other arrangement is needed.

What to do if you think you might want to work here

If you want more information, there are several good options:

  • Chat with Buck Shlegeris, a MIRI computer scientist who is helping with our recruiting. In addition to answering any questions you have and running interviews, Buck can sometimes help skilled programmers through our AI safety retraining program.

  • If you already know someone else at MIRI and it seems better to talk with them, you’re also welcome to reach out to that person—particularly Blake Borgeson (a new MIRI board member who is helping us with technical recruiting) or Anna Salamon (a MIRI board member who is also CFAR’s president, and who is helping run some MIRI recruiting events).

  • Come to a 4.5-day “AI Risk for Computer Scientists” workshop, co-run by MIRI and CFAR. These workshops are open only to people who Buck arbitrarily deems “probably above MIRI’s technical hiring bar,” though their scope is wider than simply hiring for MIRI—the basic idea is to get a bunch of highly capable computer scientists together to try to fathom AI risk (with a good bit of rationality content, and of trying to fathom the way we’re failing to fathom AI risk, thrown in for good measure).

    These workshops are a good way to absorb some of MIRI’s culture, and to pick up a number of thinking tools, whether or not you’re interested in working for MIRI. If you’d like to apply to attend, or want to nominate a friend, send us your information here.

  • Come to next year’s MIRI Summer Fellows program, or be a summer intern with us. These are better options for mathy people aimed at Agent Foundations than for computer science people aimed at our new research directions. Last summer we had 6 interns and 30 MIRI Summer Fellows (see Malo’s Summer MIRI Updates post for more details). Also, note that “summer internships” need not occur during summer, if some other schedule is better for you. Contact Colm Ó Riain if you’re interested.

  • You could just try applying for a job.

Some final notes

A quick note on “inferential distance,” or on what it sometimes takes to understand MIRI researchers’ perspectives:To many, MIRI’s take on things is really weird. Many people who bump into our writing somewhere find our basic outlook pointlessly weird/silly/wrong, and thus find us uncompelling forever. Even among those who do ultimately find MIRI compelling, many start off thinking it’s weird/silly/wrong and then, after some months or years of MIRI’s worldview slowly rubbing off on them, eventually find that our worldview makes a bunch of unexpected sense.

If you think that you may be in this latter category, and that such a change of viewpoint, should it occur, would be because MIRI’s worldview is onto something, not because we’ve all been taken in by false-but-compelling ideas… you might want to start exposing yourself to all this funny worldview stuff now, and see where it takes you. Good starting points are Rationality: From AI to Zombies; Inadequate Equilibria; Harry Potter and the Methods of Rationality; the “AI Risk for Computer Scientists” workshops; ordinary CFAR workshops; or just hanging out with folks in or near MIRI.

I suspect I’ve failed to communicate some key things above, based on past failed attempts to communicate my perspective, and based on some readers of earlier drafts of this post missing key things I’d wanted to say. I’ve tried to clarify as many points as possible—hence this post’s length!—but in the end, “we’re focusing on research and not exposition now” holds for me too, and I need to get back to the work.15

A note on the state of the field: MIRI is one of the dedicated teams trying to solve technical problems in AI alignment, but we are not the only such team. There are currently three others: the Center for Human-Compatible AI at UC Berkeley, and the safety teams at OpenAI and at Google DeepMind. All three of these safety teams are highly capable, top-tier research groups, and if you want to make a difference in this field, we also recommend them as potential places to join.

There are also solid researchers at many other institutions; for example, the Future of Humanity Institute’s Governance of AI Program focuses on the important social/coordination problems associated with AGI development.

To learn more about AI alignment research at MIRI and other groups, I recommend MIRI’s Agent Foundations and Embedded Agency write-ups; Dario Amodei, Chris Olah, et al.’s Concrete Problems agenda; the AI Alignment Forum; and the blogs of Paul Christiano and the DeepMind safety team.

On working here: Salaries here are more flexible than people usually suppose. I’ve had a number of conversations with folks who assumed that because we’re a nonprofit, we wouldn’t be able to pay them enough to maintain their desired standard of living, meet their financial goals, support their family well, or similar. This is false. If you bring the right skills, we’re likely able to provide the compensation you need. We also place a high value on weekends and vacation time, on avoiding burnout, and in general on people here being happy and thriving.

You do need to be physically present in Berkeley to work with us on the projects we consider most exciting, though we provide substantial relocation assistance and ops support for those moving here.

For all the upsides of working at MIRI, I would consider working here a pretty terrible deal if all you wanted was a job. Reorienting to work on major global risks isn’t likely to be the most hedonic or relaxing option available to most people.

On the other hand, if you like the idea of an epic calling with a group of people who somehow claim to take seriously a task that sounds more like it comes from a science fiction novel than from a Dilbert strip, while having a lot of scientific fun; or you just care about humanity’s future, and want to help however you can… give us a call.


  1. This post is an amalgam put together by a variety of MIRI staff. The byline saying “Nate” means that I (Nate) endorse the post, and that many of the concepts and themes come in large part from me, and I wrote a decent number of the words. However, I did not write all of the words, and the concepts and themes were built in collaboration with a bunch of other MIRI staff. (This is roughly what bylines have meant on the MIRI blog for a while now, and it’s worth noting explicitly.)
  2. See our 2017 strategic update and fundraiser posts for more details.
  3. In past fundraisers, we said that with sufficient funding, we hoped to pursue alternative lines of attack on the alignment problem. Our new research directions can be seen as being in that spirit; indeed, at least one of our new research directions grew out of the alternative approaches we were considering in 2015. That said, compared to how we framed things in our 2015 fundraiser, our new work is quite continuous with our Agent-Foundations-style research.
  4. That is, the prerequisites for aligning AGI systems to perform limited tasks; not all of the prerequisites for aligning CEV-class autonomous AGI. Compare Paul Christiano’s distinction between ambitious and narrow value learning (though note that Paul thinks narrow value learning is sufficient for aligning strongly autonomous AGI).
  5. This result is described more in a paper that will be out soon. Or, at least, eventually. I’m not putting a lot of time into writing papers these days, for reasons discussed below.
  6. For more discussion of this concept, see “Personal Thoughts on Careers in AI Policy and Strategy” by Carrick Flynn.
  7. Historical examples of deconfusion work that gave rise to a rich and healthy field include the distillation of Lagrangian and Hamiltonian mechanics from Newton’s laws; Cauchy’s overhaul of real analysis; the slow acceptance of the usefulness of complex numbers; and the development of formal foundations of mathematics.
  8. I should emphasize that from my perspective, humanity never building AGI, never realizing our potential, and failing to make use of the cosmic endowment would be a tragedy comparable (on an astronomical scale) to a catastrophe that wipes us out. I say “dangerous,” but we shouldn’t lose sight of the benefits for humanity.
  9. My own feeling is that I and other senior staff at MIRI have never been particularly good at explaining what we’re doing and why, so this inconvenience may not be a new thing. It’s new, however, for us to not be making it a priority to attempt to explain where we’re coming from.
  10. In other words, many people are explicitly focusing only on outreach, and many others are selecting technical problems to work on with a stated goal of strengthening the field and drawing others into it.
  11. This isn’t meant to suggest that nobody else is taking a straight shot at the core problems. For example, OpenAI’s Paul Christiano is a top-tier researcher who is doing exactly that. But we nonetheless want more of this on the present margin.
  12. For example, perhaps the easiest route to AGI involves descendants of today’s gradient descent and deep learning techniques, and perhaps the same will also be true for aligned AGI.
  13. In other words, retreats/rooms where it is common knowledge that all thoughts and ideas are not going to be shared, except perhaps after some lengthy and irritating bureaucratic process and with everyone’s active support.
  14. As an aside, perhaps my main discomfort with attempting to publish academic papers is that there don’t seem to be venues in AI where we can say: “Hey, check this out—we used to be confused about X, and now we can say Y, which means we’re a little bit less confused!” I think there are a bunch of reasons behind this, not least the fact that the nature of confusion is such that Y usually sounds obviously true once stated, and so it’s particularly difficult to make such a result sound like an impressive practical result.

    A side effect of this, unfortunately, is that all MIRI papers that I’ve ever written with the goal of academic publishing do a pretty bad job of saying what I was previously confused about, and how the “result” is indicative of me becoming less confused—for which I hereby apologize.

  15. If you have more questions, I encourage you to shoot us an email at contact@intelligence.org.