Announcing “Inadequate Equilibria”

||News

MIRI Senior Research Fellow Eliezer Yudkowsky has a new book out today: Inadequate Equilibria: Where and How Civilizations Get Stuck, a discussion of societal dysfunction, exploitability, and self-evaluation. From the preface:

Inadequate Equilibria is a book about a generalized notion of efficient markets, and how we can use this notion to guess where society will or won’t be effective at pursuing some widely desired goal. An efficient market is one where smart individuals should generally doubt that they can spot overpriced or underpriced assets. We can ask an analogous question, however, about the “efficiency” of other human endeavors.

Suppose, for example, that someone thinks they can easily build a better and more profitable social network than Facebook, or easily come up with a new treatment for a widespread medical condition. Should they question whatever clever reasoning led them to that conclusion, in the same way that most smart individuals should question any clever reasoning that leads them to think AAPL stock is underpriced? Should they question whether they can “beat the market” in these areas, or whether they can even spot major in-principle improvements to the status quo? How “efficient,” or adequate, should we expect civilization to be at various tasks?

There will be, as always, good ways and bad ways to reason about these questions; this book is about both.

The book is available from Amazon (in print and Kindle), on iBooks, as a pay-what-you-want digital download, and as a web book at equilibriabook.com. The book is also being posted to Less Wrong 2.0.

The book’s contents are:


1. Inadequacy and Modesty

A comparison of two “wildly different, almost cognitively nonoverlapping” approaches to thinking about outperformance: modest epistemology, and inadequacy analysis.

2. An Equilibrium of No Free Energy

How, in principle, can society end up neglecting obvious low-hanging fruit?

3. Moloch’s Toolbox

Why does our civilization actually end up neglecting low-hanging fruit in practice?

4. Living in an Inadequate World

How can we best take into account civilizational inadequacy in our decision-making?

5. Blind Empiricism

Three examples of modesty in practical settings.

6. Against Modest Epistemology

An argument against the “epistemological core” of modesty: the claim that we shouldn’t take our own reasoning and meta-reasoning at face value in the face of disagreements or novel situations.

7. Status Regulation and Anxious Underconfidence

On causal accounts of modesty.


Although Inadequate Equilibria isn’t about AI, I consider it one of MIRI’s most important nontechnical publications to date, as it helps explain some of the most basic tools and background models we use when we evaluate how promising a potential project, research program, or general strategy is.

A major grant from the Open Philanthropy Project

||News

I’m thrilled to announce that the Open Philanthropy Project has awarded MIRI a three-year $3.75 million general support grant ($1.25 million per year). This grant is, by far, the largest contribution MIRI has received to date, and will have a major effect on our plans going forward.

This grant follows a $500,000 grant we received from the Open Philanthropy Project in 2016. The Open Philanthropy Project’s announcement for the new grant notes that they are “now aiming to support about half of MIRI’s annual budget”.1 The annual $1.25 million represents 50% of a conservative estimate we provided to the Open Philanthropy Project of the amount of funds we expect to be able to usefully spend in 2018–2020.

This expansion in support was also conditional on our ability to raise the other 50% from other supporters. For that reason, I sincerely thank all of the past and current supporters who have helped us get to this point.

The Open Philanthropy Project has expressed openness to potentially increasing their support if MIRI is in a position to usefully spend more than our conservative estimate, if they believe that this increase in spending is sufficiently high-value, and if we are able to secure additional outside support to ensure that the Open Philanthropy Project isn’t providing more than half of our total funding.

We’ll be going into more detail on our organizational plans going forward in a follow-up post on December 1, where we’ll also discuss our end-of-the-year fundraising goals.

In their write-up, the Open Philanthropy Project notes that they have made a positive update about our technical output since 2016, citing a review of our logical induction paper:

We received a very positive review of MIRI’s work on “logical induction” by a machine learning researcher who (i) is interested in AI safety, (ii) is rated as an outstanding researcher by at least one of our close advisors, and (iii) is generally regarded as outstanding by the ML community. As mentioned above, we previously had difficulty evaluating the technical quality of MIRI’s research, and we previously could find no one meeting criteria (i)–(iii) to a comparable extent who was comparably excited about MIRI’s technical research. While we would not generally offer a comparable grant to any lab on the basis of this consideration alone, we consider this a significant update in the context of the original case for the [2016] grant (particularly MIRI’s thoughtfulness on this set of issues, value alignment with us, distinctive perspectives, and history of work in this area). While the balance of our technical advisors’ opinions and arguments still leaves us skeptical of the value of MIRI’s research, the case for the statement “MIRI’s research has a nontrivial chance of turning out to be extremely valuable (when taking into account how different it is from other research on AI safety)” appears much more robust than it did before we received this review.

The announcement also notes: “Since our original grant to MIRI, we have made several more grants within this focus area, and are therefore less concerned that a larger grant will signal an outsized endorsement of MIRI’s approach.”

We’re enormously grateful for the Open Philanthropy Project’s support, and for their deep engagement with the AI safety field as a whole. To learn more about our discussions with the Open Philanthropy Project and their active work in this space, see the group’s previous AI safety grants, our conversation with Daniel Dewey on the Effective Altruism Forum, and the research problems outlined in the Open Philanthropy Project’s recent AI fellows program description.


  1. The Open Philanthropy Project generally prefers not to provide more than half of an organization’s funding, to facilitate funder coordination and ensure that organizations it supports maintain their independence. From a March blog post: “We typically avoid situations in which we provide >50% of an organization’s funding, so as to avoid creating a situation in which an organization’s total funding is ‘fragile’ as a result of being overly dependent on us.”

November 2017 Newsletter

||Newsletters

Eliezer Yudkowsky has written a new book on civilizational dysfunction and outperformance: Inadequate Equilibria: Where and How Civilizations Get Stuck. The full book will be available in print and electronic formats November 16. To preorder the ebook or sign up for updates, visit equilibriabook.com.

We’re posting the full contents online in stages over the next two weeks. The first two chapters are:

  1. Inadequacy and Modesty (discussion: LessWrong, EA Forum, Hacker News)
  2. An Equilibrium of No Free Energy (discussion: LessWrong, EA Forum)

Research updates

General updates

News and links

New paper: “Functional Decision Theory”

||Papers

Functional Decision Theory

MIRI senior researcher Eliezer Yudkowsky and executive director Nate Soares have a new introductory paper out on decision theory: “Functional Decision Theory: A New Theory of Instrumental Rationality.”

Abstract:

This paper describes and motivates a new decision theory known as functional decision theory (FDT), as distinct from causal decision theory and evidential decision theory.

Functional decision theorists hold that the normative principle for action is to treat one’s decision as the output of a fixed mathematical function that answers the question, “Which output of this very function would yield the best outcome?” Adhering to this principle delivers a number of benefits, including the ability to maximize wealth in an array of traditional decision-theoretic and game-theoretic problems where CDT and EDT perform poorly. Using one simple and coherent decision rule, functional decision theorists (for example) achieve more utility than CDT on Newcomb’s problem, more utility than EDT on the smoking lesion problem, and more utility than both in Parfit’s hitchhiker problem.

In this paper, we define FDT, explore its prescriptions in a number of different decision problems, compare it to CDT and EDT, and give philosophical justifications for FDT as a normative theory of decision-making.
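As a rough, unofficial illustration of the Newcomb’s problem comparison mentioned in the abstract, here is a minimal Python sketch. It is not taken from the paper; the payoff amounts and the simple predictor-accuracy model are assumptions made purely for the example.

```python
# Toy Newcomb's problem: a predictor fills an opaque box with $1,000,000
# iff it predicts the agent will take only that box; a transparent box
# always holds $1,000. Payoffs and the accuracy model are illustrative.

BIG, SMALL = 1_000_000, 1_000

def expected_payoff(action: str, accuracy: float) -> float:
    """Expected dollars when the predictor's guess matches the agent's
    actual decision procedure with probability `accuracy`."""
    if action == "one-box":
        return accuracy * BIG                # box is full iff correctly predicted
    return (1 - accuracy) * BIG + SMALL      # two-box: full only if mispredicted

for action in ("one-box", "two-box"):
    print(f"{action}: ${expected_payoff(action, accuracy=0.99):,.0f}")

# FDT asks which output of the agent's (predictable) decision function does
# best, and so one-boxes (~$990,000 here); CDT treats the box contents as
# fixed at decision time and two-boxes, netting ~$11,000 against a
# 99%-accurate predictor, even though two-boxing dominates case-by-case.
```

The paper’s own treatment is more careful about how the predictor’s model of the agent enters the calculation; the numbers above are only meant to show why a reliable predictor makes the two decision rules come apart.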

Our previous introductory paper on FDT, “Cheating Death in Damascus,” focused on comparing FDT’s performance to that of CDT and EDT in fairly high-level terms. Yudkowsky and Soares’ new paper puts a much larger focus on FDT’s mechanics and motivations, making “Functional Decision Theory” the most complete stand-alone introduction to the theory.1

Read more »


  1. “Functional Decision Theory” was originally drafted prior to “Cheating Death in Damascus,” and was significantly longer before we received various rounds of feedback from the philosophical community. “Cheating Death in Damascus” was produced from material that was cut from early drafts; other cut material included a discussion of proof-based decision theory, and some Death in Damascus variants left on the cutting room floor for being needlessly cruel to CDT.

AlphaGo Zero and the Foom Debate

||Analysis

AlphaGo Zerouses 4 TPUs, is built entirely out of neural nets with no handcrafted features, doesn’t pretrain against expert games or anything else human, reaches a superhuman level after 3 days of self-play, and is the strongest version of AlphaGo yet.

The architecture has been simplified. The previous AlphaGo had a policy net that predicted good plays, and a value net that evaluated positions, both feeding into lookahead using MCTS (random probability-weighted plays out to the end of a game). AlphaGo Zero has one neural net that selects moves, and this net is trained by Paul Christiano-style capability amplification, playing games against itself to learn new probabilities of winning.
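For readers who want a concrete picture of the loop being described, here is a deliberately simplified, runnable sketch of an AlphaGo Zero-style self-play cycle. It is not DeepMind’s code: the network, the search, the game, and the training update are all stand-in stubs, and only the overall shape is meant to carry over (search amplifies the net’s play, and the amplified play then becomes the net’s training target).

```python
import random

# Schematic AlphaGo Zero-style loop with every component stubbed out.
# The real system uses a deep residual net and MCTS; here the point is
# only the structure: search amplifies the net, self-play generates
# training targets, and the net is updated toward the amplified play.

def legal_moves(position):
    return ["a", "b", "c"]                     # placeholder move set

def play(position, move):
    return position + (move,)                  # placeholder state transition

def net(position):
    """Stub policy/value net: uniform move prior, neutral value estimate."""
    moves = legal_moves(position)
    return {m: 1.0 / len(moves) for m in moves}, 0.0

def search(position, evaluate, simulations=16):
    """Stub for net-guided MCTS: the real search sharpens the net's prior
    via rollouts; here we just sharpen it toward an arbitrary move."""
    prior, _ = evaluate(position)
    favored = random.choice(list(prior))
    return {m: (0.7 if m == favored else 0.3 / (len(prior) - 1)) for m in prior}

def self_play_game(evaluate, length=5):
    position, records = (), []
    for _ in range(length):
        improved_policy = search(position, evaluate)
        move = max(improved_policy, key=improved_policy.get)
        records.append((position, improved_policy))
        position = play(position, move)
    outcome = random.choice([+1, -1])          # placeholder game result
    return [(pos, policy, outcome) for pos, policy in records]

def train(evaluate, examples):
    """Stub update: the real system regresses the net's policy toward the
    search policy and its value toward the final game outcome."""
    return evaluate

current_net = net
for iteration in range(3):
    examples = self_play_game(current_net)
    current_net = train(current_net, examples)
    print(f"iteration {iteration}: {len(examples)} training examples")
```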

As others have also remarked, this seems to me to be an element of evidence that favors the Yudkowskian position over the Hansonian position in my and Robin Hanson’s AI-foom debate.

As I recall it, and as I understood it:

  • Hanson doubted that what he calls “architecture” matters much, compared to (Hanson said) elements like cumulative domain knowledge, or special-purpose components built by specialized companies in what he expects to be an ecology of companies serving an AI economy.
  • When I remarked that the architectural improvement humans have over chimpanzees sure looked important to me, Hanson replied that this seemed to him like a one-time gain from allowing the cultural accumulation of knowledge.

I emphasize how all the mighty human edifice of Go knowledge, the joseki and tactics developed over centuries of play, the experts teaching children from an early age, was entirely discarded by AlphaGo Zero with a subsequent performance improvement. These mighty edifices of human knowledge, as I understand the Hansonian thesis, are supposed to be the bulwark against rapid gains in AI capability across multiple domains at once. I said, “Human intelligence is crap and our accumulated skills are crap,” and this appears to have been borne out.

Similarly, single research labs like DeepMind are not supposed to pull far ahead of the general ecology, because adapting AI to any particular domain is supposed to require lots of components developed all over the place by a market ecology that makes those components available to other companies. AlphaGo Zero is much simpler than that. To the extent that nobody else can run out and build AlphaGo Zero, it’s either because Google has Tensor Processing Units that aren’t generally available, or because DeepMind has a silo of expertise for being able to actually make use of existing ideas like ResNets, or both.

Sheer speed of capability gain should also be highlighted here. Most of my argument for FOOM in the Yudkowsky-Hanson debate was about self-improvement and what happens when an optimization loop is folded in on itself. Though it wasn’t necessary to my argument, the fact that Go play went from “nobody has come close to winning against a professional” to “so strongly superhuman they’re not really bothering any more” over two years just because that’s what happens when you improve and simplify the architecture, says that you don’t even need self-improvement to get something that looks like FOOM.

Yes, Go is a closed system allowing for self-play. It still took humans centuries to learn how to play it. Perhaps the new Hansonian bulwark against rapid capability gain can be that the environment has lots of empirical bits that are supposed to be very hard to learn, even in the limit of AI thoughts fast enough to blow past centuries of human-style learning in three days; and that humans have learned these vital bits over centuries of cultural accumulation of knowledge, even though we know that humans take centuries to do 3 days of AI learning when humans have all the empirical bits they need; and that AIs cannot absorb this knowledge very quickly using “architecture”, even though humans learn it from each other using architecture. If so, then let’s write down this new world-wrecking assumption (that is, the world ends if the assumption is false) and be on the lookout for further evidence that this assumption might perhaps be wrong.

AlphaGo clearly isn’t a general AI. There’s obviously stuff humans do that make us much more general than AlphaGo, and AlphaGo obviously doesn’t do that. However, if even with the human special sauce we’re to expect AGI capabilities to be slow, domain-specific, and requiring feed-in from a big market ecology, then the situation we see without human-equivalent generality special sauce should not look like this.

To put it another way, I put a lot of weight in the debate on recursive self-improvement and on the change from primate intelligence to human intelligence. That doesn’t mean we can’t get information about the speed of capability gains without self-improvement. It doesn’t mean we can’t get info about the importance and generality of algorithms without the general intelligence trick. The debate can start to settle for fast capability gains before we even get to what I saw as the good parts; I wouldn’t have predicted AlphaGo and lost money betting against the speed of its capability gains, because reality held a more extreme position than I did on the Yudkowsky-Hanson spectrum.

(Reply from Robin Hanson.)

October 2017 Newsletter

||Newsletters

“So far as I can presently estimate, now that we’ve had AlphaGo and a couple of other maybe/maybe-not shots across the bow, and seen a huge explosion of effort invested into machine learning and an enormous flood of papers, we are probably going to occupy our present epistemic state until very near the end.

“[…] [I]t’s hard to guess how many further insights are needed for AGI, or how long it will take to reach those insights. After the next breakthrough, we still won’t know how many more breakthroughs are needed, leaving us in pretty much the same epistemic state as before. […] You can either act despite that, or not act. Not act until it’s too late to help much, in the best case; not act at all until after it’s essentially over, in the average case.”

Read more in a new blog post by Eliezer Yudkowsky: “There’s No Fire Alarm for Artificial General Intelligence.” (Discussion on LessWrong 2.0, Hacker News.)

Research updates

General updates

News and links

There’s No Fire Alarm for Artificial General Intelligence

||Analysis


What is the function of a fire alarm?

One might think that the function of a fire alarm is to provide you with important evidence about a fire existing, allowing you to change your policy accordingly and exit the building.

In the classic experiment by Latane and Darley in 1968, eight groups of three students each were asked to fill out a questionnaire in a room that shortly afterward began filling up with smoke. Five out of the eight groups didn’t react or report the smoke, even as it became dense enough to make them start coughing. Subsequent manipulations showed that a lone student will respond 75% of the time, while a student accompanied by two actors told to feign apathy will respond only 10% of the time. This and other experiments seemed to pin down that what’s happening is pluralistic ignorance. We don’t want to look panicky by being afraid of what isn’t an emergency, so we try to look calm while glancing out of the corners of our eyes to see how others are reacting, but of course they are also trying to look calm.

(I’ve read a number of replications and variations on this research, and the effect size is blatant. I would not expect this to be one of the results that dies to the replication crisis, and I haven’t yet heard about the replication crisis touching it. But we have to put a maybe-not marker on everything now.)

A fire alarm creates common knowledge, in the you-know-I-know sense, that there is a fire; after which it is socially safe to react. When the fire alarm goes off, you know that everyone else knows there is a fire, you know you won’t lose face if you proceed to exit the building.

The fire alarm doesn’t tell us with certainty that a fire is there. In fact, I can’t recall one time in my life when, exiting a building on a fire alarm, there was an actual fire. Really, a fire alarm is weaker evidence of fire than smoke coming from under a door.

But the fire alarm tells us that it’s socially okay to react to the fire. It promises us with certainty that we won’t be embarrassed if we now proceed to exit in an orderly fashion.

It seems to me that this is one of the cases where people have mistaken beliefs about what they believe, like when somebody loudly endorsing their city’s team to win the big game will back down as soon as asked to bet. They haven’t consciously distinguished the rewarding exhilaration of shouting that the team will win, from the feeling of anticipating the team will win.

When people look at the smoke coming from under the door, I think they think their uncertain wobbling feeling comes from not assigning the fire a high-enough probability of really being there, and that they’re reluctant to act for fear of wasting effort and time. If so, I think they’re interpreting their own feelings mistakenly. If that was so, they’d get the same wobbly feeling on hearing the fire alarm, or even more so, because fire alarms correlate to fire less than does smoke coming from under a door. The uncertain wobbling feeling comes from the worry that others believe differently, not the worry that the fire isn’t there. The reluctance to act is the reluctance to be seen looking foolish, not the reluctance to waste effort. That’s why the student alone in the room does something about the fire 75% of the time, and why people have no trouble reacting to the much weaker evidence presented by fire alarms.


Every once in a while, somebody proposes that we ought to react to the problem of artificial general intelligence later (background here), because, it is said, we are so far away from it that it just isn’t possible to do productive work on it today.

(For direct argument about there being things doable today, see: Soares and Fallenstein (2014/2017); Amodei, Olah, Steinhardt, Christiano, Schulman, and Mané (2016); or Taylor, Yudkowsky, LaVictoire, and Critch (2016).)

(If none of those papers existed, or if you were an AI researcher who’d read them but thought they were all garbage, and you wished you could work on alignment but knew of nothing you could do, the wise next step would be to sit down and spend two hours by the clock sincerely trying to think of possible approaches. Preferably without self-sabotage that makes sure you don’t come up with anything plausible; as might happen if, hypothetically speaking, you would actually find it much more comfortable to believe there was nothing you ought to be working on today, because e.g. then you could work on other things that interested you more.)

(But never mind.)

So if AGI seems far-ish away, and you think the conclusion licensed by this is that you can’t do any productive work on AGI alignment yet, then the implicit alternative strategy on offer is: Wait for some unspecified future event that tells us AGI is coming near; andthenwe’ll all know that it’s okay to start working on AGI alignment.

This is wrong, in my opinion. Here are some of the reasons why.

Read more »

September 2017 Newsletter

||Newsletters

Research updates

General updates

  • As part of his engineering internship at MIRI, Max Harms assisted in the construction and extension of RL-Teacher, an open-source tool for training AI systems with human feedback based on the “Deep RL from Human Preferences” OpenAI / DeepMind research collaboration. See OpenAI’s announcement.
  • MIRI COO Malo Bourgon participated in panel discussions on getting things done (video) and working in AI (video) at the Effective Altruism Global conference in San Francisco. AI Impacts researcher Katja Grace also spoke on AI safety (video). Other EAG talks on AI included Daniel Dewey’s (video) and Owen Cotton-Barratt’s (video), and a larger panel discussion (video).
  • Announcing two winners of the Intelligence in Literature prize: Laurence Raphael Brothers’ “Houseproud” and Shane Halbach’s “Human in the Loop”.
  • RAISE, a project to develop online AI alignment course material, is seeking volunteers.

News and links

  • The Open Philanthropy Project is accepting applicants to an AI Fellows Program “to fully support a small group of the most promising PhD students in artificial intelligence and machine learning”. See also Open Phil’s partial list of key research topics in AI alignment.
  • Call for papers: AAAI and ACM are running a new Conference on AI, Ethics, and Society, with submissions due by the end of October.
  • DeepMind’s Viktoriya Krakovna argues for a portfolio approach to AI safety research.
  • “Teaching AI Systems to Behave Themselves”: a solid article from the New York Times on the growing field of AI safety research. The Times also has an opening for an investigative reporter in AI.
  • UC Berkeley’s Center for Long-term Cybersecurity is hiring for multiple roles, including researchers, assistant to the director, and program managers.
  • Life 3.0: Max Tegmark has released a new book on the future of AI (podcast discussion).