There’s No Fire Alarm for Artificial General Intelligence

What is the function of a fire alarm?

One might think that the function of a fire alarm is to provide you with important evidence about a fire existing, allowing you to change your policy accordingly and exit the building.

In the classic 1968 experiment by Latane and Darley, eight groups of three students each were asked to fill out a questionnaire in a room that, shortly afterward, began filling up with smoke. Five out of the eight groups didn’t react or report the smoke, even as it became dense enough to make them start coughing. Subsequent manipulations showed that a lone student will respond 75% of the time, while a student accompanied by two actors told to feign apathy will respond only 10% of the time. This experiment and others seemed to establish that what was happening was pluralistic ignorance. We don’t want to look panicky by being afraid of what isn’t an emergency, so we try to look calm while glancing out of the corners of our eyes to see how others are reacting, but of course they are also trying to look calm.

(I’ve read a number of replications and variations on this research, and the effect size is blatant. I would not expect this to be one of the results that dies to the replication crisis, and I haven’t yet heard about the replication crisis touching it. But we have to put a maybe-not marker on everything now.)

A fire alarm creates common knowledge, in the you-know-I-know sense, that there is a fire; after which it is socially safe to react. When the fire alarm goes off, you know that everyone else knows there is a fire, you know you won’t lose face if you proceed to exit the building.

The fire alarm doesn’t tell us with certainty that a fire is there. In fact, I can’t recall a single time in my life when, exiting a building on a fire alarm, there was an actual fire. Really, a fire alarm is weaker evidence of fire than smoke coming from under a door.

But the fire alarm tells us that it’s socially okay to react to the fire. It promises us with certainty that we won’t be embarrassed if we now proceed to exit in an orderly fashion.

It seems to me that this is one of the cases where people have mistaken beliefs about what they believe, like when somebody loudly endorsing their city’s team to win the big game will back down as soon as asked to bet. They haven’t consciously distinguished the rewarding exhilaration of shouting that the team will win, from the feeling of anticipating the team will win.

When people look at the smoke coming from under the door, I think they think their uncertain wobbling feeling comes from not assigning the fire a high-enough probability of really being there, and that they’re reluctant to act for fear of wasting effort and time. If so, I think they’re interpreting their own feelings mistakenly. If that was so, they’d get the same wobbly feeling on hearing the fire alarm, or even more so, because fire alarms correlate to fire less than does smoke coming from under a door. The uncertain wobbling feeling comes from the worry that others believe differently, not the worry that the fire isn’t there. The reluctance to act is the reluctance to be seen looking foolish, not the reluctance to waste effort. That’s why the student alone in the room does something about the fire 75% of the time, and why people have no trouble reacting to the much weaker evidence presented by fire alarms.


Every once in a while it is proposed that we ought to react to the problem of Artificial General Intelligence later (background here), because, it is said, we are so far away from it that it just isn’t possible to do productive work on it today.

(For direct argument about there being things doable today, see: Soares and Fallenstein (2014/2017); Amodei, Olah, Steinhardt, Christiano, Schulman, and Mané (2016); or Taylor, Yudkowsky, LaVictoire, and Critch (2016).)

(If none of those papers existed or if you were an AI researcher who’d read them but thought they were all garbage, and you wished you could work on alignment but knew of nothing you could do, the wise next step would be to sit down and spend two hours by the clock sincerely trying to think of possible approaches. Preferably without self-sabotage that makes sure you don’t come up with anything plausible; as might happen if, hypothetically speaking, you would actually find it much more comfortable to believe there was nothing you ought to be working on today, because e.g. then you could work on other things that interested you more.)

(But never mind.)

So if AGI seems far-ish away, and you think the conclusion licensed by this is that you can’t do any productive work on AGI alignment yet, then the implicit alternative strategy on offer is: Wait for some unspecified future event that tells us AGI is coming near; and then we’ll all know that it’s okay to start working on AGI alignment.

This seems to me to be wrong on a number of grounds. Here are some of them.

One: As Stuart Russell observed, if you get radio signals from space and spot a spaceship there with your telescopes and you know the aliens are landing in thirty years, you still start thinking about that today.

You don’t say, “Well, that’s thirty years off, whatever.” You certainly don’t casually say, “Well, there’s nothing we can do until they’re closer.” Not without spending two hours, or at least five minutes by the clock, brainstorming about whether there is anything you ought to be starting now.

If you said the aliens were coming in thirty years and you were therefore going to do nothing today… well, if these were more effective times, somebody would ask you for a schedule of what you thought ought to be done starting at what time, as a function of how far off the aliens were. If you didn’t have that schedule ready, they’d know you weren’t operating from a worksheet of timed responses, but were just procrastinating and doing nothing; and they’d correctly infer that you probably hadn’t searched very hard for things that could be done today.

In Bryan Caplan’s terms, anyone who seems quite casual about the fact that “nothing can be done now to prepare” is missing a mood; they ought to be far more alarmed at not being able to think of any way to prepare. Maybe ask if somebody else has come up with any ideas? But never mind.

Two: History shows that for the general public, and even for scientists not in a key inner circle, and even for scientists in that key circle, it is very often the case that key technological developments still seem decades away, five years before they show up.

In 1901, two years before helping build the first heavier-than-air flyer, Wilbur Wright told his brother that powered flight was fifty years away.

In 1939, three years before he personally oversaw the first critical chain reaction in a pile of uranium bricks, Enrico Fermi voiced 90% confidence that it was impossible to use uranium to sustain a fission chain reaction. I believe Fermi also said that if net power from fission was even possible (as he then granted some greater plausibility), it would be fifty years off; but for this I neglected to keep the citation.

And of course if you’re not the Wright Brothers or Enrico Fermi, you will be even more surprised. Most of the world learned that atomic weapons were now a thing when they woke up to the headlines about Hiroshima. There were esteemed intellectuals saying, four years after the Wright Flyer, that heavier-than-air flight was impossible, because knowledge propagated more slowly back then.

Were there events that, in hindsight, today, we can see as signs that heavier-than-air flight or nuclear energy were nearing? Sure, but if you go back and read the actual newspapers from that time and see what people actually said about it then, you’ll see that they did not know that these were signs, or that they were very uncertain that these might be signs. Some playing the part of Excited Futurists proclaimed that big changes were imminent, I expect, and others playing the part of Sober Scientists tried to pour cold water on all that childish enthusiasm; I expect that part was more or less exactly the same decades earlier. If somewhere in that din was a superforecaster who said “decades” when it was decades and “5 years” when it was five, good luck noticing them amid all the noise. More likely, the superforecasters were the ones who said “Could be tomorrow, could be decades” both when the big development was a day away and when it was decades away.

One of the major ways that hindsight bias makes the past feel more predictable than anyone was actually able to predict at the time is that, in hindsight, we know what we ought to have noticed, and we fixate on only one thought as to what each piece of evidence indicated. If you look at what people actually said at the time, they usually had no idea what was about to happen three months before it happened, because they didn’t know which signs were which.

I mean, you could say the words “AGI is 50 years away” and have those words happen to be true. People were also saying that powered flight was decades away when it was in fact decades away, and those people happened to be right. The problem is that everything looks the same to you either way, if you are actually living history instead of reading about it afterwards.

It’s not that whenever somebody says “fifty years” the thing always happens in two years. It’s that this confident prediction of things being far away corresponds to an epistemic state about the technology that feels the same way internally until you are very very close to the big development. It’s the epistemic state of “Well, I don’t see how to do the thing” and sometimes you say that fifty years off from the big development, and sometimes you say it two years away, and sometimes you say it while the Wright Flyer is flying somewhere out of your sight.

Three: Progress is driven by peak knowledge, not average knowledge.

If Fermi and the Wrights couldn’t see it coming three years out, imagine how hard it must be for anyone else to see it.

If you’re not at the global peak of knowledge of how to do the thing, and looped in on all the progress being made at what will turn out to be the leading project, you aren’t going to be able to see, from your own knowledge alone, that the big development is imminent. Not unless you are very good at perspective-taking in a way that wasn’t necessary in a hunter-gatherer tribe, and very good at realizing that other people may know techniques and ideas of which you have no inkling even that you do not know them. If you don’t consciously compensate for the lessons of history in this regard, then you will promptly say the decades-off thing. Fermi wasn’t still thinking that net nuclear energy was impossible or decades away by the time he got to 3 months before he built the first pile, because at that point Fermi was looped in on everything and saw how to do it. But anyone not looped in probably still felt like it was fifty years away while the actual pile was fizzing away in a squash court at the University of Chicago.

People don’t seem to automatically compensate for the fact that the timing of the big development is a function of the peak knowledge in the field, a threshold touched by the people who know the most and have the best ideas; while they themselves have average knowledge; and that therefore what they themselves know is not strong evidence about when the big development happens. I think they aren’t thinking about that at all, and they just eyeball it using their own sense of difficulty. If they are thinking anything more deliberate and reflective than that, and incorporating real work into correcting for the factors that might bias their lenses, they haven’t bothered writing down their reasoning anywhere I can read it.

To know that AGI is decades away, we would need to understand AGI well enough to know which pieces of the puzzle are missing, and how hard those pieces are to obtain. That insight is unlikely to be available before the puzzle is actually completed. Which is also to say that the puzzle will look more incomplete to anyone outside the leading edge than it looks at the edge. The leading project may publish its theories before demonstrating them, though I hope not. But there are unproven theories now, too.

And again, that’s not to say that people saying “fifty years” is a certain sign that something is happening in a squash court; they were saying “fifty years” sixty years ago too. It’s saying that anyone who thinks technological timelines are actually forecastable, in advance, by people who are not looped in to the leading project’s progress reports and who don’t share all the best ideas about exactly how to do the thing and how much effort is required for that, is learning the wrong lesson from history. In particular, from reading history books that neatly lay out lines of progress and the visible signs that we all know now were important and evidential. It is sometimes possible to say useful conditional things about the consequences of the big development, but it is rarely possible to make confident predictions about the timing of those developments, beyond a one- or two-year horizon. And if you are one of the rare people who can call the timing, if people like that even exist, nobody else knows to pay attention to you and not to the Excited Futurists or Sober Skeptics.

Four: The future uses different tools, and can therefore easily do things that are very hard now, or do with difficulty things that are impossible now.

How do we know that AGI is decades away? In popular articles penned by heads of AI research labs and the like, there are typically three prominent reasons given:

(A) The author does not know how to build AGI using present technology. The author does not know where to start.

(B) The author finds it genuinely very hard to do the impressive things that modern AI technology does; they have to slave long hours over hot GPU farms, tweaking hyperparameters, to get it done. They think the public does not appreciate how hard it is to get anything done right now, and is panicking prematurely because the public thinks anyone can just fire up Tensorflow and build a robotic car.

(C) The author spends a lot of time interacting with AI systems and is therefore able to personally appreciate all the ways in which they are still stupid and lack common sense.

We’ve now considered some aspects of argument A. Let’s consider argument B for a moment.

Suppose I say: “A comp-sci grad student can now do in one week anything at all that the research community could do with neural networks N years ago.” How large is N?

I got some answers to this on Twitter from people whose credentials I don’t know, but the most common answer was five, which sounds about right to me based on my own acquaintance with machine learning. (Obviously not as a literal universal, because reality is never that neat.) If you could do something in the 2012 era, you can probably do it now, with far less work, using modern GPUs, Tensorflow, Xavier initialization, batch normalization, ReLUs, and Adam or RMSprop or just stochastic gradient descent with momentum. Modern techniques are just that much better. To be sure, there are things we can’t do now with just those simple methods, things that require much more work, but those things simply weren’t possible at all in 2012.
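As a concrete illustration of what that tooling gap looks like, here is a minimal sketch of my own devising (not taken from the survey answers): a small image classifier wired together from exactly the ingredients named above, ReLUs, batch normalization, and Adam, using tf.keras, with MNIST as a stand-in task and untuned, purely illustrative hyperparameters.

```python
# Minimal sketch (assumptions: TensorFlow 2.x installed; MNIST as a stand-in
# "2012-era" task; hyperparameters are illustrative rather than tuned).
import tensorflow as tf

# Load a small benchmark dataset and scale pixel values into [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A few lines buy the ingredients named above: ReLUs, batch norm, Adam.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(256),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training to respectable accuracy takes minutes on a laptop.
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```

The point is not this particular model; it’s that the defaults baked into the modern stack already embody what was hard-won knowledge in 2012.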

In machine learning, when you can do something at all, you are probably at most a few years away from being able to do it easily using the future’s much superior tools. From this standpoint, argument B, “You don’t understand how hard it is to do what we do,” is something of a non-sequitur when it comes to timing.

Statement B sounds to me like the same sentiment voiced by Rutherford in 1933, when he called net energy from atomic fission “moonshine”. If you were a nuclear physicist in 1933 then you had to split all your atoms by hand, by bombarding them with other particles, and it was a laborious business. If somebody talked about getting net energy from atoms, maybe it made you feel that you were unappreciated, that people thought your job was easy.

But of course this will always be the lived experience for AI engineers on serious frontier projects. You don’t get paid big bucks to do what a grad student can do in a week (unless you’re working for a bureaucracy with no clue about AI; but that’s not Google or FB). Your personal experience will always be that what you are paid to spend months doing is difficult. A change in this personal experience is therefore not something you can use as a fire alarm.

Those playing the part of wise skeptical scientists would obviously agree, in the abstract, that our tools will improve. But in the popular articles they pen, they talk only about how hard things are with this year’s tools. I don’t think that, when they are in this mode, they are even trying to forecast what the tools will be like in 5 years; they haven’t written down any such argument as part of the articles I’ve read. I think that when they tell you AGI is decades off, they are really giving an estimate of how long it feels to them like it would take to build AGI using their current tools and knowledge. Which is why they emphasize how hard it is to stir the heap of linear algebra until it spits out good answers; I think they are not imagining, at all, into how this experience may change over considerably less than fifty years. If they’ve explicitly considered the bias of estimating future tech timelines based on their present subjective sense of difficulty, and tried to compensate for that bias, they haven’t written that reasoning down anywhere I’ve read it. Nor have I ever heard of that forecasting method giving good results historically.

Five: Okay, let’s be blunt here. I don’t think most of the discourse about AGI being far away (or imminent, for that matter) is being generated by models of future progress in machine learning. I don’t think we’re looking at wrong models; I think we’re looking at no models.

I was once at a conference where there was a panel full of famous AI luminaries, and most of the luminaries were nodding and agreeing with each other that of course AGI was very far off, except for two famous AI luminaries who stayed quiet and let others take the microphone.

I got up in Q&A and said: “Okay, you’ve all told us that progress won’t be all that fast. But let’s be more concrete and specific. I’d like to know what’s the least impressive accomplishment that you are very confident cannot be done in the next two years.”

There was a silence.

Eventually, two people on the panel ventured replies, in a rather more tentative tone than they had used to pronounce that AGI was decades away. They named “a robot puts away the dishes from a dishwasher without breaking them”, and Winograd schemas. Specifically, “I feel quite confident that the Winograd schemas—where we recently had a result that was in the 50, 60% range—in the next two years, we will not get 80, 90% on that regardless of the techniques people use.”

A few months after that panel, there was unexpectedly a big breakthrough on Winograd schemas. The breakthrough didn’t crack 80%, so three cheers for wide credibility intervals with error margin, but I expect the predictor might be feeling slightly more nervous now with one year left to go. (I don’t think it was the breakthrough I remember reading about, but Rob turned up this paper as an example of one that could have been submitted at most 44 days after the above conference and that gets up to 70%.)

But that’s not the point. The point is the silence that fell after my question, and that eventually I only got two replies, spoken in tentative tones. When I asked for concrete feats that were impossible in the next two years, I think that that’s when the luminaries on that panel switched to trying to build a mental model of future progress in machine learning, asking themselves what they could or couldn’t predict, what they knew or didn’t know. And to their credit, most of them did know their profession well enough to realize that forecasting future boundaries around a rapidly moving field is actuallyreally hard, that nobody knows what will appear on arXiv next month, and that they needed to put wide credibility intervals with very generous upper bounds on how much progress might take place twenty-four months’ worth of arXiv papers later.

(Also, Demis Hassabis was present, so they all knew that if they named something insufficiently impossible, Demis would have DeepMind go and do it.)

The question I asked was in a completely different genre from the panel discussion, requiring a mental context switch: the assembled luminaries actually had to try to consult their rough, scarce-formed intuitive models of progress in machine learning and figure out what future experiences, if any, their model of the field definitely prohibited within a two-year time horizon. Instead of, well, emitting socially desirable verbal behavior meant to kill that darned hype about AGI and get some predictable applause from the audience.

I’ll be blunt: I don’t think this confident long-termism has been thought out at all. If your model has the extraordinary power to say what cannot be done ten years from now, after another one hundred and twenty months of arXiv papers, then you ought to be able to say much weaker things that cannot be done in two years, and you ought to have those predictions lined up and ready to go when asked, rather than lapsing into a nervous silence.

In reality, the two-year problem is hard and the ten-year problem is laughably hard. The future is hard to predict in general, our predictive grasp on a rapidly changing and advancing field of science and engineering is very weak indeed, and it doesn’t permit narrow credible intervals on what can’t be done.

Grace et al. (2017) surveyed the predictions of 352 presenters at ICML and NIPS 2015. Respondents’ aggregate forecast was that the proposition “all occupations are fully automatable” (in the sense that “for any occupation, machines could be built to carry out the task better and more cheaply than human workers”) will not reach 50% probability until 121 years hence. Except that a randomized subset of respondents were instead asked the slightly different question of “when unaided machines can accomplish every task better and more cheaply than human workers”, and in this case held that this was 50% likely to occur within 44 years.

This is what happens when you ask people for an estimate they have no way of estimating, and there is a social sense of what the desirable verbal behavior is supposed to be.


When I observe that there’s no fire alarm for AGI, I’m not saying that there’s no possible equivalent of smoke appearing from under a door.

What I’m saying is that the smoke under the door is always going to be arguable; it is not going to be a clear, undeniable, absolute sign of fire; and so there is never going to be a fire alarm producing common knowledge that action is now due and socially acceptable.

There’s an old trope saying that as soon as something is actually done, it ceases to be called AI. People who work in AI and who are, broadly speaking, pro-accelerationist techno-enthusiasts, what one might call the Kurzweilian camp (of which I am not a member), will sometimes protest that this is unfair judgment, a case of moving the goalposts.

This overlooks a real and important phenomenon of adverse selection against AI accomplishments: If you can do something impressive-sounding with AI in 1974, then that is because that thing turned out to be doable in some cheap cheaty way, not because 1974 was so amazingly great at AI. We are uncertain about how much cognitive effort it takes to perform tasks, and how easy it is to cheat at them, and the first “impressive” tasks to be accomplished will be those where we were most wrong about how much effort was required. There was a time when some people thought that a computer winning the world chess championship would require progress in the direction of AGI, and that this would count as a sign that AGI was getting closer. When Deep Blue beat Kasparov in 1997, in a Bayesian sense we did learn something about progress in AI, but we also learned something about chess being easy. Considering the techniques used to construct Deep Blue, most of what we learned was “It is surprisingly possible to play chess without easy-to-generalize techniques” and not much “A surprising amount of progress has been made toward AGI.”

Was AlphaGo smoke under the door, a sign of AGI in 10 years or less? People had previously given Go as an example of What You See Before The End.

Looking at the paper describing AlphaGo’s architecture, it seemed to me that we were mostly learning that available AI techniques were likely to go further towards generality than expected, rather than about Go being surprisingly easy to achieve with fairly narrow and ad-hoc approaches. Not that the method scales to AGI, obviously; but AlphaGo did look like a product of relatively general insights and techniques being turned on the special case of Go, in a way that Deep Blue wasn’t. I also updated significantly toward “the general learning capabilities of the human cortical algorithm are less impressive, and less difficult to capture with a ton of gradient descent and a zillion GPUs, than I thought”, because if there were anywhere we expected a highly natural-selected but still general cortical algorithm to be hard to match, it would be in humans playing Go.

Maybe if we had watched a thousand Earths undergoing similar events, we would gather the statistics and find that a computer winning the planetary Go championship is a reliable ten-year harbinger of AGI. But I don’t actually know that. Neither do you. Certainly, anyone can publicly argue that we just learned Go was easier to achieve with strictly narrow techniques than expected, as has been true many times in the past. There is no sign of actual AGI, no smoke under the door, about which we know that this is definitely a serious fire and AGI is now 10, 5, or 2 years away. Much less a sign that we know everyone else will believe.

And in any case, multiple leading scientists in machine learning have already published articles telling us their criterion for a fire alarm. They will believe Artificial General Intelligence is imminent:

(A) When they personally see how to construct AGI using their current tools. This is what they are always saying is not currently true in order to castigate the folly of those who think AGI might be near.

(B) When their personal jobs no longer give them a sense of everything being difficult. This, they are at pains to say, is a key piece of knowledge not possessed by the ignorant layfolk who think AGI might be near, who only believe that because they have never stayed up until 2AM trying to get a generative adversarial network to stabilize.

(C) When they are impressed by how smart an AI seems relative to a human being, in respects that still feel magical to them; as opposed to the parts they know how to engineer, which no longer seem magical; a.k.a. the AI seeming pretty smart in interaction and conversation; a.k.a. the AI actually already being AGI.

So there isn’t going to be a fire alarm. Period.

There is never going to be a time before the end when you can look around nervously and see that it is now obviously fine to talk about AGI being imminent, and to take action and exit the building in an orderly fashion, without fear of looking foolish or scared.


So far as I can presently estimate, now that we’ve had AlphaGo and a couple of other maybe/maybe-not shots across the bow, and seen a huge explosion of effort invested into machine learning and an enormous flood of papers, we are probably going to occupy our present epistemic state until very near the end.

By saying that we will probably remain in roughly this epistemic state until almost the end, I don’t mean to say we know that AGI is imminent, or that there won’t be important new breakthroughs in AI in the intervening time. I mean that it’s hard to guess how many further insights are needed for AGI, or how long it will take to reach those insights. After the next breakthrough, we still won’t know how many more breakthroughs are needed, leaving us in pretty much the same epistemic state as before. Whatever discoveries and milestones come next, it will probably continue to be hard to guess how many further insights are needed, and timelines will continue to be similarly murky. Maybe researcher enthusiasm and funding will rise further, and we’ll be able to say that timelines are shortening; or maybe we’ll hit another AI winter, and we’ll know that’s a sign indicating that things will take longer than they would otherwise; but we still won’t know how long.

At some point we may see a sudden flood of arXiv papers in which genuinely interesting, fundamental, and scary cognitive challenges seem to be getting done in increasing numbers. So that as the flood accelerates, even some people who think of themselves as sober and skeptical will be unnerved to the point of venturing that maybe AGI is only 15 years away now, maybe, possibly. The signs might become so blatant, very soon before the end, that it starts to be socially acceptable to think AGI might be only 10 years off. Though the signs would have to be pretty darned blatant, if they’re to overcome the social barrier posed by luminaries who are estimating arrival times to AGI using their personal knowledge and personal difficulties, as well as all the historical bad feelings about AI winters caused by hype.

But even once it becomes sayable that AGI is 15 years off, I would still expect disagreement in those last few years or months. There will still be others protesting that, just as with the now-solved problems of associative memory and human-equivalent cerebellar coordination (or whatever), they still don’t know how to build AGI. They will note that no AIs are writing computer science papers or holding genuinely sensible conversations with people, and will decry the pointless alarmism of those who talk as if we already knew how to do that. They will explain that foolish laypeople don’t realize how much pain and tweaking it takes to get current systems to work. (Even though those modern methods can do almost anything that was possible in 2017 with hardly any effort, and any grad student knows how to roll a stable GAN on the first try in Tensorflow 5.3.1 using the tf.unsupervised module.)

When all the pieces are ready and in place, lacking only the last piece to be assembled by the very peak of knowledge and creativity across the whole world, it will still seem to the average ML person that AGI is an enormous challenge looming in the distance, because they still won’t personally know how to construct an AGI system. Prestigious heads of major AI research groups will still be writing articles decrying the folly of fretting about the total destruction of all Earthly life and all the future value it could have achieved, and saying that we should not let that distract us from real, respectable concerns like loan-approval systems accidentally absorbing human biases.

Of course, the future is very hard to predict in detail. It’s so hard that not only do I confess my own inability, I make the far stronger positive statement that nobody else can do it either. The “flood of groundbreaking arXiv papers” scenario is one way things could maybe possibly go, but it’s an implausibly specific scenario that I made up for the sake of concreteness. It’s certainly not based on my extensive experience watching other Earthlike civilizations develop AGI. I do put a significant chunk of probability mass on “There’s not much sign visible outside a Manhattan Project until Hiroshima,” because that scenario is simple. Anything more complex is just one more story full of burdensome details that aren’t likely to all be true.

But however the details play out, I do predict, in a very general sense, that there will be no fire alarm that is not an actual running AGI—no unmistakable sign before then that everyone knows and agrees on, one that lets people act without feeling nervous about whether they’re worrying too early. That’s just not how the history of technology has usually played out in much simpler cases like flight and nuclear engineering, let alone a case like this one where all the signs and models are disputed. We already know enough about the uncertainty and low quality of discussion surrounding this topic to be able to say with confidence that there will be no unarguable socially accepted sign of AGI arriving 10 years, 5 years, or 2 years beforehand. If there’s any general social panic it will be by coincidence, based on terrible reasoning, uncorrelated with real timelines except by total coincidence, set off by a Hollywood movie, and focused on relatively trivial dangers.

It’s no coincidence that nobody has given any actual account of such a fire alarm, and argued convincingly about how much time it means we have left, and what projects we should only then start. If anyone does write that proposal, the next person to write one will say something completely different. And probably neither of them will succeed at convincing me that they know anything prophetic about timelines, or that they’ve identified any sensible angle of attack that is (a) worth pursuing at all and (b) not worth starting to work on right now.


It seems to me that the decision to delay all action until a nebulous totally unspecified future alarm goes off, implies an order of recklessness great enough that the law of continued failure comes into play.

The law of continued failure is the rule that says that if your country is incompetent enough to use a plaintext 9-numeric-digit password on all of your bank accounts and credit applications, your country is not competent enough to correct course after the next disaster in which a hundred million passwords are revealed. A civilization competent enough to correct course in response to that prod, to react to it the way you’d want them to react, is competent enough not to make the mistake in the first place. When a system fails massively and obviously, rather than subtly and at the very edges of competence, the next prod is not going to cause the system to suddenly snap into doing things intelligently.

The law of continued failure is especially important to keep in mind when you are dealing with big powerful systems or high-status people that you might feel nervous about derogating, because you may be tempted to say, “Well, it’s flawed now, but as soon as a future prod comes along, everything will snap into place and everything will be all right.” The systems about which this fond hope is actually warranted look like they are mostly doing all the important things right already, and only failing in one or two steps of cognition. The fond hope is almost never warranted when a person or organization or government or social subsystem is currently falling massively short.

The folly required to ignore the prospect of aliens landing in thirty years is already great enough that the other flawed elements of the debate should come as no surprise.

And since all of these problems are with us today, all at the same time, we should predict that the same systems and incentives will not produce correct outputs after receiving some uncertain sign that the aliens might land in five years. The law of continued failure says that if existing authorities are currently failing in enough different ways at once to think it makes sense to deflect the issue by saying the real problem is the safety of self-driving cars, the default expectation is that they will still be saying silly things later.

People who make large numbers of simultaneous mistakes don’t generally have all of the incorrect thoughts subconsciously labeled as “incorrect” in their heads. Even when motivated, they can’t suddenly flip to skillfully executing all-correct reasoning steps instead. Yes, we have various experiments showing that monetary incentives can reduce overconfidence and political bias, but (a) that’s reduction rather than elimination, (b) it’s with extremely clear short-term direct incentives, not the nebulous and politicizable incentive of “a lot being at stake”, and (c) that doesn’t mean a switch is flipping all the way to “carry out complicated correct reasoning”. If someone’s brain contains a switch that can flip to enable complicated correct reasoning at all, it’s got enough internal precision and skill to think mostly-correct thoughts now instead of later—at least to the degree that some conservatism and double-checking gets built into examining the conclusions that people know will get them killed if they’re wrong about them.

There is no sign and portent, no threshold crossed, that suddenly causes people to wake up and start doing things systematically correctly. People who can react that competently to any sign at all, let alone a less-than-perfectly-certain not-totally-agreed item of evidence that is likely a wakeup call, have probably already done the timebinding thing. They’ve already imagined the future sign coming, and gone ahead and thought sensible thoughts earlier, like Stuart Russell saying, “If you know the aliens are landing in thirty years, it’s still a big deal now.”


Back in the funding-starved early days of what is now MIRI, I learned that people who donated last year were likely to donate this year, and people who last year were planning to donate “next year” would quite often this year be planning to donate “next year”. Of course there were genuine transitions from zero to one; everything that happens needs to happen for a first time. There were college students who said “later” and gave nothing for a long time in a genuinely strategically wise way, and went on to get nice jobs and start donating. But I also learned well that, like many cheap and easy solaces, saying the word “later” is addictive; and that this luxury is available to the rich as well as the poor.

I don’t expect it to be any different with AGI alignment work. People who are trying to get what grasp they can on the alignment problem will, in the next year, be doing a little (or a lot) better with whatever they grasped in the previous year (plus, yes, any general-field advances that have taken place in the meantime). People who want to defer that until after there’s a better understanding of AI and AGI will, after the next year’s worth of advancements in AI and AGI, want to defer work until a better future understanding of AI and AGI.

Some people really want alignment to get done and are therefore trying, now, to wrack their brains about how to get something like a reinforcement learner to reliably identify a utility function over particular elements in a model of the causal environment instead of a sensory reward term, or to defeat the seeming tautology of updated (non-)deference. Others would rather be working on other things, and will therefore declare that there is no work that can possibly be done today, not spending two hours quietly thinking about it first before making that declaration. And this will not change tomorrow, unless perhaps tomorrow is when we wake up to some interesting newspaper headlines, and probably not even then. The luxury of saying “later” is not available only to the truly poor-in-available-options.
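(For readers unfamiliar with that distinction, here is a deliberately toy sketch of the difference between a reward term computed from a sensory signal and a utility function evaluated over an element of the agent’s model of its environment. This is my own invented illustration, not anyone’s research formalism; every name in it is hypothetical.)

```python
# Toy illustration of "utility over a modeled latent fact" vs. "reward over a
# sensory proxy". Hypothetical example only; not an actual research result.
from dataclasses import dataclass

@dataclass
class WorldState:
    goal_object_present: bool   # the latent fact we actually care about
    screen_looks_green: bool    # the sensory proxy the camera reports

def sensory_reward(observation_is_green: bool) -> float:
    """Reward term defined directly on the sensory signal (the proxy)."""
    return 1.0 if observation_is_green else 0.0

def model_based_utility(believed_state: WorldState) -> float:
    """Utility defined over an element of the agent's model of the environment."""
    return 1.0 if believed_state.goal_object_present else 0.0

# A state where proxy and latent fact come apart, e.g. the agent has painted
# the camera lens green without actually securing the goal object.
hacked = WorldState(goal_object_present=False, screen_looks_green=True)

print(sensory_reward(hacked.screen_looks_green))  # 1.0 -> the proxy is satisfied
print(model_based_utility(hacked))                # 0.0 -> the thing we cared about is not
```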

After a while, I started telling effective altruists in college: “If you’re planning to earn-to-give later, then for now, give around $5 every three months. And never give exactly the same amount twice in a row, or give to the same organization twice in a row, so that you practice the mental habit of re-evaluating causes and re-evaluating your donation amounts on a regular basis, rather than learning the mental habit of just always saying ‘later’.”

Similarly, if somebody was actually going to work on AGI alignment “later”, I would tell them to spend a couple of hours every six months drawing up the best current plan they can devise for aligning AGI and doing useful work on that scenario, supposing they had to take as a given that AGI will be a technology resembling current techniques. At least in the sense of posting it to Facebook, publishing their least-embarrassing attempt; so that they would face the potential embarrassment of naming a plan that doesn’t look like somebody spent two hours trying to think of the best bad approach.

There are things we’ll understand better about AI in the future, and things we’ll learn that might give us more confidence that particular research approaches will be relevant to AGI. There may be more future sociological developments akin to Nick Bostrom publishing Superintelligence, or Elon Musk tweeting about it and thereby heaving a rock through the Overton window, or various other luminaries like Stuart Russell openly coming on board. The future will hold more AlphaGo-like events serving as public and private demonstrations of new fundamental advances in ML technique; and it’s possible that some of these will not leave us in the same epistemic state as having already seen AlphaGo and GANs and the like. It could happen! I can’t see exactly how, but the future does have the capacity to pull surprises in that regard.

But before waiting on that surprise, you should ask whether your uncertainty about AGI timelines is really uncertainty at all. If it feels to you that guessing AGI might have a 50% probability in N years is not enough knowledge to act upon, if that feels scarily uncertain and you want to wait for more evidence before making any decisions… then ask yourself how you’d feel if you believed the probability was 50% in N years, and everyone else on Earth also believed it was 50% in N years, and everyone believed it was right and proper to carry out policy P when AGI has a 50% probability of arriving in N years. If that visualization feels very different, then any nervous “uncertainty” you feel about doing P is not really about whether AGI takes much longer than N years to arrive.

And you will almost certainly be stuck with that feeling of “uncertainty” no matter how close AGI gets; because no matter how close AGI is, whatever signs appear will almost certainly not produce common, shared, agreed-on public knowledge that AGI has a 50% chance of arriving in N years, nor any agreement that it is therefore right and proper to react by doing P.

And if all of that did become common knowledge, then P would be unlikely to still be a neglected intervention, or AI alignment a neglected problem; in which case you’d have waited until sadly late to help.

But more likely, the common knowledge just won’t be there, and so it will always feel nervously “uncertain” to consider acting.

You can act despite that, or not act. In the best case, not acting means not acting until it’s too late to help much; in the average case, it means not acting at all until after it’s essentially over.

I don’t think it’s wise to wait for some unspecified epistemic miracle to change how we feel. You are likely to be in this mental state, including any nervous “uncertainty”, for a while; and if your way of handling that mental state is to say “later”, that general policy is unlikely to produce good outcomes for Earth.


Further resources: