2018 Update: Our New Research Directions


For many years, MIRI's goal has been to resolve enough fundamental confusion around alignment and intelligence to enable humanity to think clearly about technical AI safety risks—and to do this before this technology advances to the point of potential catastrophe. This goal has always seemed to us to be difficult, but possible.1

Last year, we said that we were beginning a new research program aimed at this goal.2 Here, we’re going to provide background on how we’re thinking about this new set of research directions, lay out some of the thinking behind our recent decision to do less default sharing of our research, and make the case for interested software engineers to join our team and help push our understanding forward.

Contents:

  1. Our research
  2. Why deconfusion is so important to us
  3. Nondisclosed-by-default research, and how this policy fits into our overall strategy
  4. Joining the MIRI team

1. Our research

In 2014, MIRI published its first research agenda, “Agent Foundations for Aligning Machine Intelligence with Human Interests.” Since then, one of our main research priorities has been to develop a better conceptual understanding of embedded agency: formally characterizing reasoning systems that lack a crisp agent/environment boundary, are smaller than their environment, must reason about themselves, and risk having parts that work at cross purposes. These research problems remain a major focus at MIRI, and are being studied in parallel with our new research directions (I’ll say more about this below).3

From our perspective, the point of working on these kinds of problems isn’t that solutions directly tell us how to build well-aligned AGI systems. Instead, the point is to resolve confusions we have around ideas like “alignment” and “AGI,” so that future AGI developers have an unobstructed view of the problem. Eliezer illustrates this idea in “The Rocket Alignment Problem,” which imagines a world trying to land on the Moon before it understands Newtonian mechanics or calculus.

Recently, some MIRI researchers have developed new research directions that appear to allow more scalable progress on resolving these fundamental confusions. Specifically, the progress is more scalable in researcher hours—it’s now the case that we believe excellent engineers coming from a variety of backgrounds can have their work efficiently converted into research progress at MIRI—where previously, we only knew how to speed our research progress with a (relatively atypical) breed of mathematician.

At the same time, we’ve seen some significant financial success over the past year—not so much that funding is no longer a constraint at all, but enough to pursue our research agenda from new and different directions, in addition to the old.

Furthermore, our view implies that haste is essential. We see AGI as a likely cause of existential catastrophes, especially if it’s developed with relatively brute-force-reliant, difficult-to-interpret techniques; and although we’re quite uncertain about when humanity’s collective deadline will come to pass, many of us are somewhat alarmed by the speed of recent machine learning progress.

For these reasons, we’re eager to locate the right people quickly and offer them work on these new approaches; and with this kind of help, it strikes us as very possible that we can resolve enough fundamental confusion in time to port the understanding to those who will need it before AGI is built and deployed.

Comparing our new research directions and Agent Foundations

Our new research directions involve building software systems that we can use to test our intuitions, and building infrastructure that allows us to rapidly iterate this process. Like the Agent Foundations agenda, our new research directions continue to focus on “deconfusion,” rather than on, e.g., trying to improve robustness metrics of current systems—our sense being that even if we make major strides on this kind of robustness work, an AGI system built on principles similar to today’s systems would still be too opaque to align in practice.

In a sense, you can think of our new research as attacking the same problems we’ve always been attacking, but from new angles. In other words, if you aren’t excited by logical inductors or functional decision theory, you probably wouldn’t be excited by our new work either. Conversely, if you already have the sense that becoming less confused is a sane way to approach AI alignment, and you’ve been wanting to see those kinds of confusions attacked with software and experimentation in a manner that yields theoretical satisfaction, then you may well want to work at MIRI. (I’ll have more to say about this below.)

Our new research directions stem from some distinct ideas had by Benya Fallenstein, Eliezer Yudkowsky, and myself (Nate Soares). Some high-level themes of these new directions include:

  1. Seeking entirely new low-level foundations for optimization, designed for transparency and alignability from the get-go, as an alternative to gradient-descent-style machine learning foundations.

    Note that this does not entail attempting to beat modern ML techniques on computational efficiency, speed of development, ease of deployment, or other such properties. However, it does mean developing new foundations for optimization that are broadly applicable in the same way, and for some of the same reasons, that gradient descent scales to be broadly applicable, while possessing significantly better alignment characteristics.

    We’re aware that there are many ways to attempt this that are shallow, foolish, or otherwise doomed. Nonetheless, we believe our own research avenues have a shot.

  2. Endeavoring to figure out parts of cognition that can be very transparent as cognition, without being GOFAI or completely disengaged from subsymbolic cognition.

  3. Experimenting with some specific alignment problems that are deeper than problems that have previously been put into computational environments.

In common between all our new approaches is a focus on using high-level theoretical abstractions to enable coherent reasoning about the systems we build. A concrete implication of this is that we write lots of our code in Haskell, and are often thinking about our code through the lens of type theory.
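
To give a hedged sense of what that lens can look like in practice—the following is a generic toy sketch of type-driven design, not MIRI’s actual code—Haskell’s type system can be used to record what has been established about a value, so that an invariant lives in the types rather than in a programmer’s head:

```haskell
-- A toy illustration (not MIRI code): the only way to obtain a UnitInterval
-- is through the checked constructor, so downstream functions can state
-- their assumptions in their types and the compiler enforces them.
newtype Raw          = Raw Double           -- arbitrary, unvalidated input
newtype UnitInterval = UnitInterval Double  -- invariant: the value lies in [0, 1]

checkUnit :: Raw -> Maybe UnitInterval
checkUnit (Raw x)
  | 0 <= x && x <= 1 = Just (UnitInterval x)
  | otherwise        = Nothing

-- This function simply cannot be applied to a value that never passed the
-- check; the logistic map 4x(1-x) also preserves the [0, 1] invariant.
logisticStep :: UnitInterval -> UnitInterval
logisticStep (UnitInterval x) = UnitInterval (4 * x * (1 - x))
```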

We aren’t going to distribute the technical details of this work anytime soon, in keeping with the recent MIRI policy changes discussed below. However, we have a good deal to say about this research on the meta level.

We are excited about these research directions, both for their present properties and for the way they seem to be developing. When Benya began the predecessor of this work ~3 years ago, we didn’t know whether her intuitions would pan out. Today, having watched the pattern by which research avenues in these spaces have opened up new exciting-feeling lines of inquiry, none of us expect this research to die soon, and some of us are hopeful that this work may eventually open pathways to attacking the entire list of basic alignment issues.4

We are also excited by the degree to which useful cross-connections have arisen between initially-unrelated-looking strands of our research. During a period where I was focusing primarily on new lines of research, for example, I stumbled across a solution to the original version of the tiling agents problem from the Agent Foundations agenda.5

This work also seems to “give out its own guideposts” to a greater degree than the Agent Foundations agenda. Where in the past we needed researchers whose research taste matched ours very closely, we now think we have enough of a feel for the terrain to relax those requirements. We are still looking for hires who are scientifically innovative and fairly close to us on research taste, but our work is now much more scalable with the number of good mathematicians and engineers working at MIRI.

That said, however promising the last couple of years have seemed to us, this is still “blue sky” research in the sense that we’d guess most people outside MIRI would still regard it as of academic interest but of no practical interest. The more principled/coherent/alignable optimization algorithms we are investigating are not going to sort cat pictures from non-cat pictures anytime soon.

The thing that generally excites us about research results is the extent to which they grant us “deconfusion” in the sense described in the next section, not the ML/engineering power they directly enable. This “deconfusion” that they allegedly reflect must, for the moment, be discerned mostly via abstract arguments supported only weakly by concrete “look what this understanding lets us do” demos. Many of us at MIRI regard our work as being of strong practical relevance nonetheless—but that is because we have long-term models of what sorts of short-term feats indicate progress, and because we view becoming less confused about alignment as having a strong practical relevance to humanity’s future, for reasons that I’ll sketch out next.

2. Why deconfusion is so important to us

What we mean by deconfusion

Quoting Anna Salamon, the president of the Center for Applied Rationality and a MIRI board member:

If I didn’t have the concept of deconfusion, MIRI’s efforts would strike me as mostly inane. MIRI continues to regard its own work as significant for human survival, despite the fact that many larger and richer organizations are now talking about AI safety. It’s a group that got all excited about Logical Induction (and tried paranoidly to make sure Logical Induction “wasn’t dangerous” before releasing it)—even though Logical Induction had only a moderate amount of math and no practical engineering at all (and did something similar with Timeless Decision Theory, to pick an even more extreme example). It’s a group that continues to stare mostly at basic concepts, sitting reclusively off by itself, while mostly leaving questions of politics, outreach, and how much influence the AI safety community has, to others.

However, I do have the concept of deconfusion. And when I look at MIRI’s activities through that lens, MIRI seems to me much more like “oh, yes, good, someone is taking a direct shot at what looks like the critical thing,” and “they seem to have a fighting chance,” and “gosh, I hope they (or someone, somehow) manage to resolve more of the confusion before the deadline, because without that kind of progress, humanity certainly does seem kind of sunk.”

I agree that MIRI’s perspective and strategy don’t make much sense without the idea I’m calling “deconfusion.” As someone reading a MIRI strategy update, you probably already partly have this concept, but I’ve found that it’s not trivial to transmit the full idea, so I ask your patience as I try to put it into words.

By deconfusion, I mean something like “making it so that you can think about a given topic without continuously accidentally spouting nonsense.”

To give a concrete example, my thoughts about infinity as a 10-year-old were made of rearranged confusion rather than anything coherent, as were the thoughts of even the best mathematicians of 1700. (“What happens if we subtract infinity from both sides of the equation?”) But my thoughts about infinity as a 20-year-old were not similarly confused, because, by then, I’d been exposed to the more coherent concepts that later mathematicians labored to produce. I wasn’t as smart or as good of a mathematician as Georg Cantor or the best mathematicians from 1700; but deconfusion can be transferred between people; and this transfer can spread the ability to think actually coherent thoughts.
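
To spell out the flavor of that confusion (my own gloss, not a quotation of anyone’s actual 1700s argument): the pre-Cantor move treats “infinity” as an ordinary number and promptly derives nonsense, whereas the later concepts make the arithmetic coherent at the cost of giving up cancellation:

```latex
\begin{align*}
  \infty + 1 = \infty \;&\overset{?}{\Longrightarrow}\; 1 = 0
    && \text{(``subtract $\infty$ from both sides'')}\\
  \aleph_0 + 1 = \aleph_0, \quad \aleph_0 + \aleph_0 = \aleph_0,
    \;&\quad\text{yet}\quad \aleph_0 < 2^{\aleph_0}
    && \text{(Cantor's cardinal arithmetic, with no cancellation law)}
\end{align*}
```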

In 1998, conversations about AI risk and technological singularity scenarios often went in circles in a funny sort of way. People who are serious thinkers about the topic today, including my colleagues Eliezer and Anna, said things that today sound confused. (When I say “things that sound confused,” I have in mind things like “isn’t intelligence an incoherent concept,” “but the economy’s already superintelligent,” “if a superhuman AI is smart enough that it could kill us, it’ll also be smart enough to see that that isn’t what the good thing to do is, so we’ll be fine,” “we’re Turing-complete, so it’s impossible to have something dangerously smarter than us, because Turing-complete computations can emulate anything,” and “anyhow, we could just unplug it.”) Today, these conversations are different. In between, folks worked to make themselves and others less fundamentally confused about these topics—so that today, a 14-year-old who wants to skip to the end of all that incoherence can just pick up a copy of Nick Bostrom’s Superintelligence.6

Of note is the fact that the “take AI risk and technological singularities seriously” meme started to spread to the larger population of ML scientists only after its main proponents attained sufficient deconfusion. If you were living in 1998 with a strong intuitive sense that AI risk and technological singularities should be taken seriously, but you still possessed a host of confusion that caused you to occasionally spout nonsense as you struggled to put things into words in the face of various confused objections, then evangelism would do you little good among serious thinkers—perhaps because the respectable scientists and engineers in the field can smell nonsense, and can tell (correctly!) that your concepts are still incoherent. It’s by accumulating deconfusion until your concepts cohere and your arguments become well-formed that your ideas can become memetically fit and spread among scientists—and can serve as foundations for future work by those same scientists.

Interestingly, the history of science is in fact full of instances in which individual researchers possessed a mostly-correct body of intuitions for a long time, and then eventually those intuitions were formalized, corrected, made precise, and transferred between people. Faraday discovered a wide array of electromagnetic phenomena, guided by an intuition that he wasn’t able to formalize or transmit except through hundreds of pages of detailed laboratory notes and diagrams; Maxwell later invented the language to describe electromagnetism formally by reading Faraday’s work, and expressed those hundreds of pages of intuitions in three lines.

An even more striking example is the case of Archimedes, who intuited his way to the ability to do useful work in both integral and differential calculus thousands of years before calculus became a simple formal thing that could be passed between people.

In both cases, it was the eventual formalization of those intuitions—and the linked ability of these intuitions to be passed accurately between many researchers—that allowed the fields to begin building properly and quickly.7

Why deconfusion (on our view) is highly relevant to AI accident risk

If humanity eventually builds smarter-than-human AI, and if that AI is as powerful and hazardous as we currently expect it to be, then AI will one day bring enormous optimization power to bear on the world.8 Our view is that when this happens, those enormous forces will need to be aimed at real-world problems and subproblems with precision, against a backdrop of good theoretical understanding. The larger those forces are, the more precision is required of the researchers aiming them at cognitive problems.

We suspect that today’s concepts about things like “optimization” and “aiming” are incapable of supporting the necessary precision, even if wielded by researchers who care a lot about safety. Part of why I think this is that if you pushed me to explain what I mean by “optimization” and “aiming,” I’d need to be careful to avoid spouting nonsense—which indicates that I’m still confused somewhere around here.

A worrying fact about this situation is that, as best I can tell, humanity doesn’t need coherent versions of these concepts to hill-climb its way to AGI. Evolution hill-climbed that distance, and evolution had no model of what it was doing. But as evolution applied massive optimization pressure to genomes, those genomes started coding for brains that internally optimized for targets that merely correlated with genetic fitness. Humans find ever-smarter ways to satisfy our own goals (video games, ice cream, birth control…) even when this runs directly counter to the selection criterion that gave rise to us: “propagate your genes into the next generation.”

If we are to avoid a similar fate—one where we attain AGI via huge amounts of gradient descent and other optimization techniques, only to find that the resulting system has internal optimization targets that are very different from the targets we externally optimized it to be adept at pursuing—then we must be more careful.

As AI researchers explore the space of optimizers, what will it take to ensure that the first highly capable optimizers that researchers find are optimizers they know how to aim at chosen tasks? I’m not sure, because I’m still in some sense confused about the question. I can tell you vaguely how the problem relates to convergent instrumental incentives, and I can observe various reasons why we shouldn’t expect the strategy “train a large cognitive system to optimize for X” to actually result in a system that internally optimizes for X, but there are still wide swaths of the question where I can’t say much without saying nonsense.

As an example, AI systems like Deep Blue and AlphaGo cannot reasonably be said to be reasoning about the whole world. They’re reasoning about some much simpler abstract platonic environment, such as a Go board. There’s an intuitive sense in which we don’t need to worry about these systems taking over the world, for this reason (among others), even in the world where those systems are run on implausibly large amounts of compute.

Vaguely speaking, there’s a sense in which some alignment difficulties don’t arise until an AI system is “reasoning about the real world.” But what does that mean? It doesn’t seem to mean “the space of possibilities that the system considers literally concretely includes reality itself.” Ancient humans did perfectly good general reasoning even while utterly lacking the concept that the universe can be described by specific physical equations.

It seems like it has to mean something more like “the system is building internal models that, in some sense, are little representations of the whole of reality.” But what counts as a “little representation of reality,” and why do a hunter-gatherer’s confused thoughts about a spirit-riddled forest count while a chessboard doesn’t? All these questions are likely confused; my goal here is not to name coherent questions, but to gesture in the direction of a confusion that prevents me from precisely naming a portion of the alignment problem.

Or, to put it briefly: precisely naming a problem is half the battle, and we are currently confused about how to precisely name the alignment problem.

For an alternative attempt at naming this concept, see Eliezer’s rocket alignment analogy. For a further discussion of some of the reasons today’s concepts seem inadequate for describing an aligned intelligence with sufficient precision, see Scott and Abram’s recent write-up. (Or come discuss with us in person, at an “AI Risk for Computer Scientists” workshop.)

Why this research may be tractable here and now

Many types of research become far easier at particular places and times. It seems to me that for the work of becoming less confused about AI alignment, MIRI in 2018 (and for a good number of years to come, I think) is one of those places and times.

Why? One point is that MIRI has some history of success in deconfusion-style research (at least by my accounting), and MIRI researchers are the beneficiaries of a local research tradition that grew up in dialogue with that work—work that has included conceptual progress on problems like logical induction and functional decision theory.

Logical inductors, as an example, give us at least a clue about why we’re apt to informally use words like “probably” in mathematical reasoning. It’s not a full answer to “how does probabilistic reasoning about mathematical facts work?”, but it does feel like an interesting hint—which is relevant to thinking about how “real-world” AI reasoning could possibly work, because AI systems might well also use probabilistic reasoning in mathematics.

A second point is that if there is anything that unites most people at MIRI besides a drive to increase the odds of human survival, it is probably a taste for getting our understanding of the foundations of the universe right. Many of us came in with this taste—for example, many of us have backgrounds in physics (and fundamental physics in particular), and those of us with a background in programming tend to have an interest in things like type theory, formal logic, and/or probability theory.

A third point, as noted above, is that we are excited about our current body of research intuitions, and about the way they have been becoming more transferable/cross-applicable/concrete over time.

Finally, I observe that the field of AI at large is currently highly vitalized, in large part by the deep learning revolution and various other advances in machine learning. We aren’t particularly focused on deep neural networks ourselves, but contact with a vibrant and exciting practical field is the sort of thing that tends to spark ideas. 2018 really does seem like an unusually easy time to pursue a theoretical science of AI alignment, in dialogue with the practical AI methods that are beginning to work.

3. Nondisclosed-by-default research, and how this policy fits into our overall strategy

MIRI recently decided to make most of its research “nondisclosed-by-default,” by which we mean that going forward, most results discovered within MIRI will remain internal-only unless there is an explicit decision to release those results, based usually on a specific anticipated safety upside from their release.

I’d like to try to share some sense of why we chose this policy—especially because this policy may prove disappointing or inconvenient for many people interested in AI safety as a research area.9 MIRI is a nonprofit, and there’s a natural default assumption that our mechanism for good is to regularly publish new ideas and insights. But we don’t think this is currently the right choice for serving our nonprofit mission.

The short version of why we chose this policy is:

  • we’re in a hurry to decrease existential risk;

  • in the same way that Faraday’s journals aren’t nearly as useful as Maxwell’s equations, and in the same way that logical induction isn’t all that useful to the average modern ML researcher, we don’t think it would be that useful to try to share lots of half-confused thoughts with a wider set of people;

  • we believe we can have more of the critical insights faster if we stay focused on making new research progress rather than on exposition, and if we aren’t feeling pressure to justify our intuitions to wide audiences;

  • we think it’s not unreasonable to be anxious about whether deconfusion-style insights could lead to capabilities insights, and have empirically observed we can think more freely when we don’t have to worry about this; and

  • even when we conclude that those concerns were paranoid or silly upon reflection, we benefited from moving the cognitive work of evaluating those fears from “before internally sharing insights” to “before broadly distributing those insights,” which is enabled by this policy.

The longer version follows below.

I’ll caveat that in what follows, I’m attempting to convey what I believe, but not necessarily why—I’m not trying to present an argument that would cause any rational person in my position to adopt the same strategy; I’m aiming only at the more modest goal of conveying how I myself think about the decision.

I’ll begin by saying a few words about how our research fits into our overall strategy, then discuss the pros and cons of this policy.

When we say we’re doing AI alignment research, we really don’t mean outreach

At present, MIRI’s aim is to make research progress on the alignment problem. Our focus isn’t on shifting the field of ML toward taking AGI safety more seriously, nor on any other form of influence, persuasion, or field-building. We are simply and only aiming to directly make research progress on the core problems of alignment.

This choice may seem surprising to some readers—field-building and other forms of outreach can obviously have hugely beneficial effects, and throughout MIRI’s history, we’ve been much more outreach-oriented than the typical math research group.

Our impression is indeed that well-targeted outreach efforts can be highly valuable. However, attempts at outreach/influence/field-building seem to us to currently constitute a large majority of worldwide research activity that’s motivated by AGI safety concerns,10 such that MIRI’s time is better spent directly confronting the core research problems. Additionally, we think our own comparative advantage lies here, and not in outreach work.11

My beliefs here are connected to my beliefs about the mechanics of deconfusion described above. In particular, I believe that the alignment problem might start seeming significantly easier once it can be precisely named, and I believe that precisely naming this sort of problem is likely to be a serial challenge—in the sense that some deconfusions cannot be attained until other deconfusions have matured. Additionally, my read on history says that deconfusions regularly come from relatively small communities thinking the right kinds of thoughts (as in the case of Faraday and Maxwell), and that such deconfusions can spread rapidly as soon as the surrounding concepts become coherent (as exemplified by Bostrom’s Superintelligence). From all this, I conclude that trying to influence the wider field isn’t the best place to spend our own efforts.

It’s hard to predict whether successful deconfusion work could spark capabilities advances

We think that most of MIRI’s expected impact comes from worlds in which our deconfusion work eventually succeeds—that is, worlds where our research eventually leads to a principled understanding of alignable optimization that can be communicated to AI researchers, more akin to a modern understanding of calculus and differential equations than to Faraday’s notebooks (with the caveat that most of us aren’t expecting solutions to the alignment problem to compress nearly so well as calculus or Maxwell’s equations, but I digress).

One pretty plausible way this could go is that our deconfusion work makes alignment possible, without much changing the set of available pathways to AGI.12 To pick a trivial analogy illustrating this sort of world, consider interval arithmetic as compared to the usual way of doing floating point operations. In interval arithmetic, an operation like sqrt takes two floating point numbers, a lower and an upper bound, and returns a lower and an upper bound on the result. Figuring out how to do interval arithmetic requires some careful thinking about the error of floating-point computations, and it certainly won’t speed those computations up; the only reason to use it is to ensure that the error incurred in a floating point operation isn’t larger than the user assumed. If you discover interval arithmetic, you’re at no risk of speeding up modern matrix multiplications, despite the fact that you really have found a new way of doing arithmetic that has certain desirable properties that normal floating-point arithmetic lacks.
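
As a minimal sketch of the interval-sqrt operation described above (a generic illustration of interval arithmetic, not code from any particular library), assuming crude outward widening of the bounds in place of proper directed rounding:

```haskell
-- A closed interval [lower, upper] of doubles, carrying rigorous bounds
-- through a computation instead of a single rounded value.
data Interval = Interval { lower :: Double, upper :: Double }
  deriving Show

-- Square root on intervals (assuming 0 <= lower): sqrt is monotone, so map
-- the endpoints, then nudge the bounds outward so floating-point rounding
-- error can't push the true result outside the returned interval.
isqrt :: Interval -> Interval
isqrt (Interval a b) = Interval (widenDown (sqrt a)) (widenUp (sqrt b))
  where
    widenDown x = x - abs x * 1e-15
    widenUp   x = x + abs x * 1e-15

-- The true sqrt of anything in [2, 3] is guaranteed to lie inside the
-- (slightly wider) interval this returns; nothing here runs any faster.
example :: Interval
example = isqrt (Interval 2 3)
```

As in the analogy, the payoff is purely a guarantee about where the true answer lies, not a speedup.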

In worlds where deconfusing ourselves about alignment leads us primarily to insights similar (on this axis) to interval arithmetic, it would be best for MIRI to distribute its research as widely as possible, especially once it has reached a stage where it is comparatively easy to communicate, in order to encourage AI capabilities researchers to adopt and build upon it.

However, it also seems quite plausible to us that a successful theory of alignable optimization could itself spark new research directions in AI capabilities. For an analogy, consider the development from classical probability theory and statistics to modern deep neural networks that classify images. Probability theory alone doesn’t let you classify pictures of cats, and an image-classification network can be understood and implemented without thinking much about probability theory. However, probability theory and statistics were critical to how machine learning was in fact discovered, and they remain foundational to how modern deep learning researchers think about their algorithms.

In worlds where deconfusion about alignment yields insights more similar (on this axis) to probability theory, it is far less clear that wide distribution would have a positive impact. Needless to say, we want to have a positive impact (or at the very least a neutral impact) even in those worlds.

The latter scenario is relatively less important in worlds where AGI timelines are short. If current deep learning research is already on the brink of AGI, for example, then it becomes less plausible that the results of MIRI’s deconfusion work could become a relevant influence on AI capabilities research, and most of the potential impact of our work would come from its direct applicability to deep-learning-based systems. While many of us at MIRI believe that short timelines are at least plausible, there is significant uncertainty and disagreement about timelines inside MIRI, and I would not feel comfortable committing to a course of action that is safe only in worlds where timelines are short.

In sum, if we continue to make progress on, and eventually substantially succeed at, figuring out the actual “cleave nature at its joints” concepts that let us think coherently about alignment, I find it quite plausible that those same concepts may also enable capabilities boosts (especially in worlds where there’s a lot of time for those concepts to be pushed in capabilities-facing directions). There is certainly strong historical precedent for deep scientific insights yielding unexpected practical applications.

By the nature of deconfusion work, it seems very difficult to predict in advance which other ideas a given insight may unlock. These considerations seem to us to call for conservatism and delay on information releases—potentially very long delays, as it can take quite a bit of time to figure out where a given insight leads.

We need our researchers to not have walls within their own heads

We take our research seriously at MIRI. This means that, for many of us, we know in the back of our minds that deconfusion-style research could sometimes (often in an unpredictable fashion) open up pathways that can lead to capabilities insights in the manner discussed above. As a consequence, many MIRI researchers flinch away from having insights when they haven’t spent a lot of time thinking about the potential capabilities implications of those insights down the line—and they usually haven’t spent that time, because it requires a bunch of cognitive overhead. This effect has been evidenced in reports from researchers, myself included, and we’ve empirically observed that when we set up “closed” research retreats or research rooms,13 researchers report that they can think more freely, that their brainstorming sessions extend further and wider, and so on.

This sort of inhibition seems quite bad for research progress. It is not a small area that our researchers were (un- or semi-consciously) holding back from; it’s a reasonably wide swath that may well include most of the deep ideas or insights we’re looking for.

At the same time, this kind of caution is an unavoidable consequence of doing deconfusion research in public, since it’s very hard to know what ideas may follow five or ten years after a given insight. AI alignment work and AI capabilities work are close enough neighbors that many insights in the vicinity of AI alignment are “potentially capabilities-relevant until proven harmless,” both for reasons discussed above and from the perspective of the conservative security mindset we try to encourage around here.

In short, if we request that our brains come up with alignment ideas that are fine to share with everybody—and this is what we’re implicitly doing when we think of ourselves as “researching publicly”—then we’re requesting that our brains cut off the massive portion of the search space that is only probably safe.

If our goal is to make research progress as quickly as possible, in hopes of having concepts coherent enough to allow rigorous safety engineering by the time AGI arrives, then it seems worth finding ways to allow our researchers to think without constraints, even when those ways are somewhat expensive.

Focus seems unusually useful for this kind of work

There may be some additional speed-up effects from helping free up researchers’ attention, though we don’t consider this a major consideration on its own.

Historically, early-stage scientific work has often been done by people who were solitary or geographically isolated, perhaps because this makes it easier to slowly develop a new way to factor the phenomenon, instead of repeatedly translating ideas into the current language others are using. It’s difficult to describe how much mental space and effort turns out to be taken up with thoughts of how your research will look to other people staring at you, until you try going into a closed room for an extended period of time with a promise to yourself that all the conversation within it really won’t be shared at all anytime soon.

Once we realized this was going on, we realized that in retrospect, we may have been ignoring common practice, in a way. Many startup founders have reported finding stealth mode, and funding that isn’t from VC outsiders, tremendously useful for focus. For this reason, we’ve also recently been encouraging researchers at MIRI to worry less about appealing to a wide audience when doing public-facing work. We want researchers to focus mainly on whatever research directions they find most compelling, make exposition and distillation a secondary priority, and not worry about optimizing ideas for persuasiveness or for being easier to defend.

Early deconfusion work just isn’t that useful (yet)

ML researchers aren’t running around using logical induction or functional decision theory. These theories don’t have practical relevance to the researchers on the ground, and they’re not supposed to; the point of these theories is just deconfusion.

More precisely, the theories themselves aren’t the interesting novelty. What’s novel is that a few years ago, we couldn’t write down any theory of how in principle to assign sane-seeming probabilities to mathematical facts, and today we can write down logical induction. In the journey from point A to point B, we became less confused. The “Logical Induction” paper is an artifact witnessing that deconfusion, and an artifact which granted its authors additional deconfusion as they went through the process of writing it; but the thing that excited me about logical induction was not any one particular algorithm or theorem in the paper, but rather the fact that we’re a little bit less in-the-dark than we were about how a reasoner can reasonably assign probabilities to logical sentences. We’re not fully out of the dark on this front, mind you, but we’re a little less confused than we were before.14

If the rest of the world were talking about how confusing they find the AI alignment topics we’re confused about, and were as concerned about their confusions as we are concerned about ours, then failing to share our research would feel a lot more costly to me. But as things stand, most people in the space look at us kind of funny when we say that we’re excited about things like logical induction, and I repeatedly encounter deep misunderstandings when I talk to people who have read some of our papers and tried to infer our research motivations, from which I conclude that they weren’t drawing a lot of benefit from my current ramblings anyway.

And in a sense most of our current research is a form of rambling—in the same way, at best, that Faraday’s journal was rambling. It’s OK if most practical scientists avoid slogging through Faraday’s journal and wait until Maxwell comes along and distills the thing down to three useful equations. And, if Faraday expects that physical theories eventually distill, he doesn’t need to go around evangelizing his journal—he can just wait until it’s been distilled, and then work to transmit some less-confused concepts.

We hope that our understanding of alignment, which is currently far from complete, will eventually distill, and I, at least, am not very excited about attempting to push it on anyone until it’s significantly more distilled. (Or, barring full distillation, until a project with a commitment to the common good, an adequate security mindset, and a large professed interest in deconfusion research comes knocking.)

In the meantime, there are of course researchers outside MIRI who care about the same problems we do, and who are also pursuing deconfusion. Our nondisclosed-by-default policy will negatively affect our ability to collaborate with these people on our other research directions, and this is a real cost and not worth dismissing. I don’t have much more to say about this here beyond noting that if you’re one of those people, you’re very welcome to get in touch with us (and you may want to consider joining the team)!

We’ll be better able to decide what to share or not share in the future

In the long run, if our research is going to be useful, our findings will need to go out into the world where they can impact how humanity builds AI systems. However, it doesn’t follow from this need for eventual distribution (of some sort) that we might as well publish all of our research immediately. As discussed above, as best I can tell, our current research insights just aren’t that practically useful, and sharing early-stage deconfusion research is time-intensive.

Our nondisclosed-by-default policy also allows us to preserve options like:

  • deciding which research findings we think should be developed further, while thinking about differential technological development; and
  • deciding which group(s) to share each interesting finding with (e.g., the general public, other closed safety research groups, groups with strong commitment to security mindset and the common good, etc.).

Future versions of us obviously have better abilities to make calls on these sorts of questions, though this needs to be weighed against many facts that push in the opposite direction—the later we decide what to release, the less time others have to build upon it, and the more likely it is to be found independently in the interim (thereby wasting time on duplicated efforts), and so on.

Now that I’ve listed reasons in favor of our nondisclosed-by-default policy, I’ll note some reasons against.

Considerations pulling against our nondisclosed-by-default policy

There are a number of ways in which this nondisclosed-by-default policy will make our work harder:

  1. We will have a harder time attracting and evaluating new researchers; sharing less research means fewer opportunities to try out various research collaborations and notice which collaborations work well for everyone involved.

  2. We lose some of the benefits of accelerating the progress of other researchers outside MIRI via sharing useful insights with them in real time as they are generated.

  3. We will be less able to get useful scientific insights and feedback from visitors, remote scholars, and researchers elsewhere in the world, since we will be sharing less of our work with them.

  4. We will have a harder time attracting funding and other indirect assistance, since with less of our work visible, potential donors will find it harder to tell whether our work is worth supporting.

  5. We will have to pay various costs associated with keeping research private, including social costs and logistical overhead.

We expect these costs to be substantial. We will be working hard to offset some of the losses from the first of these, as I’ll discuss in the next section. For reasons discussed above, I’m not currently very worried about the second. The remaining costs will probably be paid in full.

These costs are why we didn’t adopt this policy (for most of our research) years ago. With outreach feeling less like our comparative advantage than it did in the pre-Puerto-Rico days, and funding seeming like less of a bottleneck than it used to (though still something of a bottleneck), this approach now seems workable.

We’ve already found it helpful in practice to let researchers have insights first and sort out the safety or desirability of publishing later. On the whole, then, we expect this policy to cause a significant net speed-up to our research progress, while ensuring that we can responsibly investigate some of the most important technical questions on our radar.

4. Joining the MIRI team

I believe that MIRI is, and will be for at least the next several years, a focal point of one of those rare scientifically exciting points in history, where the conditions are just right for humanity to substantially deconfuse itself about an area of inquiry it’s been pursuing for centuries—and one where the output is directly impactful in a way that is rare even among scientifically exciting places and times.

What do we have to offer? As I see it:

  • Work that Eliezer, Benya, myself, and a number of researchers in AI safety view as having a significant chance of boosting humanity’s survival odds.

  • Work that, if it pans out, visibly has central relevance to the alignment problem—the kind of work that has a meaningful chance of shedding light on problems like “is there a loophole-free way to upper-bound the amount of optimization occurring within an optimizer?”.

  • If your tastes match ours, problems that are closely connected to fundamental questions about intelligence, agency, and the structure of reality; along with the associated thrill of working at one of the great and wild frontiers of human knowledge, with a number of important insights plausibly close at hand.

  • An atmosphere in which people are taking their own and others’ research progress seriously. For example, you can expect colleagues who come into work every day looking to actually make headway on the AI alignment problem, and looking to pull their thinking different kinds of sideways until progress occurs. I’m consistently impressed with MIRI staff’s drive to get the job done—with their visible appreciation for the fact that their work really matters, and their enthusiasm for helping one another make forward strides.

  • As an increasing focus at MIRI, empirically grounded computer science work on the AI alignment problem, with clear feedback of the form “did my code type-check?” or “do we have a proof?”.

  • Finally, some good, old-fashioned fun—for a certain very specific brand of “fun” that includes the satisfaction that comes from making progress on important technical challenges, the enjoyment that comes from pursuing lines of research you find compelling without needing to worry about writing grant proposals or otherwise raising funds, and the thrill that follows when you finally manage to distill a nugget of truth from a thick cloud of confusion.

Working at MIRI also means working with other people who were drawn by the very same factors—people who seem to me to have an unusual degree of care and concern for human welfare and the welfare of sentient life as a whole, an unusual degree of creativity and persistence in working on major technical problems, an unusual degree of cognitive reflection and skill with perspective-taking, and an unusual level of efficacy and grit.

My own experience at MIRI has been that this is a group of people who really want to help Team Life get good outcomes from the large-scale events that are likely to dramatically shape our future; who can tackle big challenges head-on without appealing to false narratives about how likely a given approach is to succeed; and who are remarkably good at fluidly updating on new evidence, and at creating a really fun environment for collaboration.

Who are we seeking?

We’re seeking anyone who can cause our “become less confused about AI alignment” work to go faster.

In practice, this means: people who think in math or code, who take seriously the problem of becoming less confused about AI alignment (quickly!), and who are generally capable. In particular, we’re looking for high-end Google programmer levels of capability; you don’t need a 1-in-a-million test score or a halo of destiny. You also don’t need a PhD, an explicit ML background, or even prior research experience.

Even if you’re not pointed towards our research agenda, we intend to fund or help arrange funding for any deep, good, and truly new ideas in alignment. This might be as a hire, a fellowship grant, or whatever other arrangements may be needed.

What to do if you think you might want to work here

If you’d like more information, there are a few good options:

  • Chat with Buck Shlegeris, a MIRI computer scientist who helps out with our recruiting. In addition to answering any of your questions and running interviews, Buck can sometimes help skilled programmers take some time off to skill-build through our AI Safety Retraining Program.

  • If you already know someone else at MIRI and talking with them seems better, you might alternatively reach out to that person—especially Blake Borgeson (a new MIRI board member who helps us with technical recruiting) or Anna Salamon (a MIRI board member who is also the president of CFAR, and is helping run some MIRI recruiting events).

  • Come to a 4.5-day AI Risk for Computer Scientists workshop, co-run by MIRI and CFAR. These workshops are open only to people who Buck arbitrarily deems “probably above MIRI’s technical hiring bar,” though their scope is wider than simply hiring for MIRI—the basic idea is to get a bunch of highly capable computer scientists together to try to fathom AI risk (with a good bit of rationality content, and of trying to fathom the way we’re failing to fathom AI risk, thrown in for good measure).

    These are a great way to get a sense of MIRI’s culture, and to pick up a number of thinking tools whether or not you are interested in working for MIRI. If you’d like to either apply to attend yourself or nominate a friend of yours, send us your info here.

  • Come to next year’s MIRI Summer Fellows program, or be a summer intern with us. This is a better option for mathy folks aiming at Agent Foundations than for computer-sciencey folks aiming at our new research directions. This last summer we took 6 interns and 30 MIRI Summer Fellows (see Malo’s Summer MIRI Updates post for more details). Also, note that “summer internships” need not occur during summer, if some other schedule is better for you. Contact Colm Ó Riain if you’re interested.

  • You could just try applying for a job.

Some final notes

A quick note on “inferential distance,” or on what it sometimes takes to understand MIRI researchers’ perspectives: To many, MIRI’s take on things is really weird. Many people who bump into our writing somewhere find our basic outlook pointlessly weird/silly/wrong, and thus find us uncompelling forever. Even among those who do ultimately find MIRI compelling, many start off thinking it’s weird/silly/wrong and then, after some months or years of MIRI’s worldview slowly rubbing off on them, eventually find that our worldview makes a bunch of unexpected sense.

If you suspect you may be in the latter category, and that such a change in view, if it occurred, would be because MIRI’s worldview is onto something and not because we all got tricked by false-but-compelling ideas… you might want to start exposing yourself to all this funny worldview stuff now, and see where it takes you. Good starting-points are Rationality: From AI to Zombies; Inadequate Equilibria; Harry Potter and the Methods of Rationality; the “AI Risk for Computer Scientists” workshops; ordinary CFAR workshops; or just hanging out with folks in or near MIRI.

I suspect I’ve failed to communicate some key things above, based on past failed attempts to communicate my perspective, and based on some readers of earlier drafts of this post missing key things I’d wanted to say. I’ve tried to clarify as many points as possible—hence this post’s length!—but in the end, “we’re focusing on research and not exposition now” holds for me too, and I need to get back to the work.15

A note on the state of the field: MIRI is one of the dedicated teams trying to solve technical problems in AI alignment, but we’re not the only such team. There are currently three others: the Center for Human-Compatible AI at UC Berkeley, and the safety teams at OpenAI and at Google DeepMind. All three are highly capable, top-of-their-class research groups, and we recommend them too as potential places to join if you want to make a difference in this field.

There are also solid researchers based at many other institutions, like the Future of Humanity Institute, whose Governance of AI Program focuses on the important social/coordination problems associated with AGI development.

To learn more about AI alignment research at MIRI and other groups, I recommend the MIRI-produced Agent Foundations and Embedded Agency write-ups; Dario Amodei, Chris Olah, et al.’s Concrete Problems agenda; the AI Alignment Forum; and Paul Christiano’s and the DeepMind safety team’s blogs.

On working here: Salaries at MIRI are more flexible than people generally suppose. I’ve had a number of conversations with folks who assumed that because we’re a nonprofit, we wouldn’t be able to pay them enough to maintain their desired standard of living, meet their financial goals, support their family well, or similar. This is false. If you bring the right skills, we’re likely able to provide the compensation you need. We also place a high value on weekends and vacation time, on avoiding burnout, and in general on people here being happy and thriving.

You do need to be physically in Berkeley to work with us on the projects we think are most exciting, though we have pretty great relocation assistance and ops support for moving.

For all the good things I’ve said about working at MIRI, I would consider working here a pretty terrible deal if all you wanted was a job. Reorienting to work on major global risks isn’t likely to be the most hedonic or relaxing option available to most people.

On the other hand, if you like the idea of an epic calling with a group of people who somehow claim to take seriously a task that sounds more like it comes from a science fiction novel than from a Dilbert strip, while having a lot of scientific fun; or you just care about humanity’s future, and want to help however you can… give us a call.


  1. This post is an amalgam put together by a variety of MIRI staff. The byline saying “Nate” means that I (Nate) endorse the post, and that many of the concepts and themes come in large part from me, and I wrote a decent number of the words. However, I did not write all of the words, and the concepts and themes were built in collaboration with a bunch of other MIRI staff. (This is roughly what bylines have meant on the MIRI blog for a while now, and it’s worth noting explicitly.)
  2. See our 2017 strategic update and fundraiser posts for more details.
  3. In past fundraisers, we’ve said that with sufficient funding we would like to spin up alternative lines of attack on the alignment problem. Our new research directions can be seen as following this spirit, and indeed, at least one of our new research directions is heavily inspired by alternative approaches I was considering back in 2015. That said, unlike many of the ideas I had in mind when writing our 2015 fundraiser posts, our new work is quite contiguous with our Agent-Foundations-style research.
  4. That is, the requisites for aligning AGI systems to perform limited tasks; not all of the requisites for aligning a full CEV-class autonomous AGI. Compare Paul Christiano’s distinction between ambitious and narrow value learning (though note that Paul thinks narrow value learning is sufficient for strongly autonomous AGI).
  5. This result is described more in a paper that will be out soon. Or, at least, eventually. I’m not putting a lot of time into writing papers these days, for reasons discussed below.
  6. For more discussion of this concept, see “Personal Thoughts on Careers in AI Policy and Strategy” by Carrick Flynn.
  7. Historical examples of deconfusion work that gave rise to a rich and healthy field include the distillation of Lagrangian and Hamiltonian mechanics from Newton’s laws; Cauchy’s overhaul of real analysis; the slow acceptance of the usefulness of complex numbers; and the development of formal foundations of mathematics.
  8. I should emphasize that from my perspective, humanity never building AGI, never realizing our potential, and failing to make use of the cosmic endowment would be a tragedy comparable (on an astronomical scale) to AGI wiping us out. I say “hazardous”, but we shouldn’t lose sight of the upside of humanity getting the job done right.
  9. My own sense is that I and MIRI’s other senior staff have never been particularly good at explaining what we’re doing and why, so this inconvenience may not be a new thing. It’s new, however, for us to not be making it a priority to attempt to explain where we’re coming from.
  10. In other words, many people are explicitly focusing only on outreach, and many others are selecting technical problems to work on with a stated goal of strengthening the field and drawing others into it.
  11. This isn’t meant to suggest that nobody else is taking a straight shot at the core problems. For example, OpenAI’s Paul Christiano is a top-tier researcher who is doing exactly that. But we nonetheless want more of this on the present margin.
  12. For example, perhaps the easiest path to unalignable AGI involves following descendants of today’s gradient descent and deep learning techniques, and perhaps the same is true for alignable AGI.
  13. In other words, retreats/rooms where it is common knowledge that all thoughts and ideas are not going to be shared, except perhaps after some lengthy and irritating bureaucratic process and with everyone’s active support.
  14. As an aside, perhaps my main discomfort with attempting to publish academic papers is that there appears to be no venue in AI where we can go to say, “Hey, check this out—we used to be confused about X, and now we can say Y, which means we’re a little bit less confused!” I think there are a bunch of reasons behind this, not least the fact that the nature of confusion is such that Y usually sounds obviously true once stated, and so it’s particularly difficult to make such a result sound like an impressive practical result.

    A side effect of this, unfortunately, is that all MIRI papers that I’ve ever written with the goal of academic publishing do a pretty bad job of saying what I was previously confused about, and how the “result” is indicative of me becoming less confused—for which I hereby apologize.

  15. If you have more questions, I encourage you to email us at contact@intelligence.org.