

2018 research plans and predictions

MIRI Strategy

Update Nov. 23: This post was edited to reflect Scott’s terminology change from “naturalized world-models” to “embedded world-models.” For a full introduction to these four research problems, see Scott Garrabrant and Abram Demski’s “Embedded Agency.”

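Scott Garrabrant is taking over Nate Soares’ job of predicting how much progress we will make in different research areas this year. Scott divides MIRI’s alignment research into five categories: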

embedded world-models: Problems related to modeling large, complex physical environments that lack a sharp agent/environment boundary. Central examples of problems in this category include logical uncertainty, naturalized induction, multi-level world models, and ontological crises.

Introductory resources: “Formalizing Two Problems of Realistic World-Models,” “Questions of Reasoning Under Logical Uncertainty,” “Logical Induction,” “Reflective Oracles”

Examples of recent work: “Hyperreal Brouwer,” “An Untrollable Mathematician,” “Further Progress on a Bayesian Version of Logical Uncertainty”

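decision theory: Problems related to modeling the consequences of different (actual and counterfactual) decision outputs, so that the decision-maker can choose the output with the best consequences. Central problems include counterfactuals, updatelessness, coordination, extortion, and reflective stability.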

Introductory resources: “Cheating Death in Damascus,” “Decisions Are for Making Bad Outcomes Inconsistent,” “Functional Decision Theory”

Examples of recent work: “Cooperative Oracles,” “Smoking Lesion Steelman” (1, 2), “The Happy Dance Problem,” “Reflective Oracles as a Solution to the Converse Lawvere Problem”

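robust delegation: Problems related to building highly capable agents that can be trusted to carry out some task. Central problems include corrigibility, value learning, informed oversight, and Vingean reflection.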

Introductory resources: “The Value Learning Problem,” “Corrigibility,” “Problem of Fully Updated Deference,” “Vingean Reflection,” “Using Machine Learning to Address AI Risk”

Examples of recent work: “Categorizing Variants of Goodhart’s Law,” “Stable Pointers to Value”

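subsystem alignment: Problems related to ensuring that an AI system’s subsystems are not working at cross purposes, and in particular that the system avoids creating internal subprocesses that optimize for unintended goals. Central problems include benign induction.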

Introductory resources: “What Does the Universal Prior Actually Look Like?”, “Optimization Daemons,” “Modeling Distant Superintelligences”

Examples of recent work: “Some Problems with Making Induction Benign”

other: Alignment research that doesn’t fall into the above categories. If we make progress on the open problems described in “Alignment for Advanced ML Systems,” and the progress is less connected to our agent foundations work and more ML-oriented, then we’ll likely classify it here.


New paper: “Categorizing variants of Goodhart’s Law”


Goodhart’s Law states that “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” However, this is not a single phenomenon. In Goodhart Taxonomy, I propose that there are (at least) four different mechanisms through which proxy measures break when you optimize for them: regressional, extremal, causal, and adversarial.

David Manheim has now helped write up my taxonomy as a paper that goes into more detail on these mechanisms: “Categorizing Variants of Goodhart’s Law.” From the conclusion:

This paper represents an attempt to categorize a class of simple statistical misalignments that occur both in any algorithmic system used for optimization, and in many human systems that rely on metrics for optimization. The dynamics highlighted are hopefully useful to explain many situations of interest in policy design, in machine learning, and in specific questions about AI alignment.

In policy, these dynamics are commonly encountered but too-rarely discussed clearly. In machine learning, these errors include extremal Goodhart effects due to using limited data and choosing overly parsimonious models, errors that occur due to myopic consideration of goals, and mistakes that occur when ignoring causality in a system. Finally, in AI alignment, these issues are fundamental to both aligning systems towards a goal, and assuring that the system’s metrics do not have perverse effects once the system begins optimizing for them.


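Regressional Goodhart: When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.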

  • Model: When U is equal to V + X, where X is some noise, a point with a large U value will likely have a large V value, but also a large X value. Thus, when U is large, you can expect V to be predictably smaller than U.
  • Example: Height is correlated with basketball ability, and does actually directly help, but the best player is only 6’3″, and a random 7′ person in their 20s would probably not be as good.
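
The regressional model can be checked with a short simulation. The following is only a sketch under assumptions the post does not make: V and X are both drawn as independent standard normal variables, and the function name `regressional_goodhart` and its parameters are hypothetical.

```python
import random
import statistics


def regressional_goodhart(n_samples=100_000, top_fraction=0.01, seed=0):
    """Sample U = V + X and compare mean U with mean V among the top-U points."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_samples):
        v = rng.gauss(0.0, 1.0)  # true goal V (assumed standard normal)
        x = rng.gauss(0.0, 1.0)  # independent noise X (assumed standard normal)
        points.append((v + x, v))  # store (proxy U, goal V)

    # Select the points with the largest proxy values.
    points.sort(key=lambda p: p[0], reverse=True)
    top = points[: max(1, int(n_samples * top_fraction))]

    mean_u = statistics.fmean(u for u, _ in top)
    mean_v = statistics.fmean(v for _, v in top)
    return mean_u, mean_v


if __name__ == "__main__":
    mean_u, mean_v = regressional_goodhart()
    print(f"top 1% by U: mean U = {mean_u:.2f}, mean V = {mean_v:.2f}")
```

Because V and X have equal variance in this sketch, the expected value of V given U is U/2, so the selected points should show a mean V of roughly half their mean U. A noisier proxy widens the gap, and a cleaner one narrows it.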

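Extremal Goodhart: Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between the proxy and the goal was observed.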

  • Model: Patterns tend to break at simple joints. One simple subset of worlds is those worlds in which U is very large. Thus, a strong correlation between U and V observed for naturally occurring U values may not transfer to worlds in which U is very large. Further, since there may be relatively few naturally occurring worlds in which U is very large, extremely large U may coincide with small V values without breaking the statistical correlation.
  • Example: The tallest person on record, Robert Wadlow, was 8’11” (2.72m). He grew to that height because of a pituitary disorder; he would have struggled to play basketball because he “required leg braces to walk and had little feeling in his legs and feet.”
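
A toy simulation can illustrate the last point of the extremal model: a rare extreme regime can decouple V from U while leaving the overall correlation almost untouched. This is a sketch under invented assumptions, not something taken from the paper. The `cutoff` at which the pattern breaks, the noise level, and the function names are all hypothetical.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def extremal_goodhart(n_samples=100_000, cutoff=3.5, seed=0):
    """Return (overall correlation, largest U, the V value at that U)."""
    rng = random.Random(seed)
    us, vs = [], []
    for _ in range(n_samples):
        u = rng.gauss(0.0, 1.0)
        if u < cutoff:
            v = u + rng.gauss(0.0, 0.3)  # ordinary regime: V tracks U
        else:
            v = rng.gauss(0.0, 0.3)  # assumed extreme regime: V no longer tracks U
        us.append(u)
        vs.append(v)

    corr = pearson(us, vs)
    best = max(range(n_samples), key=us.__getitem__)  # index of the largest U
    return corr, us[best], vs[best]


if __name__ == "__main__":
    corr, u_max, v_at_max = extremal_goodhart()
    print(f"correlation = {corr:.3f}, max U = {u_max:.2f}, V there = {v_at_max:.2f}")
```

Worlds above the cutoff are so rare in this sketch that the measured correlation stays high, yet picking the single largest U lands in exactly the regime where V is small.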

Causal Goodhart: When there is a non-causal correlation between the proxy and the goal, intervening on the proxy may fail to intervene on the goal.

  • Model: If V causes U (or if V and U are both caused by some third thing), then a correlation between V and U may be observed. However, when you intervene to increase U through some mechanism that does not involve V, you will fail to also increase V.
  • Example: Someone who wishes to be taller might observe that height is correlated with basketball skill and decide to start practicing basketball.
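
The common-cause version of the causal model can be sketched as a tiny structural model. Everything specific here is an assumption made for illustration: a hidden factor `c` drives both U and V, the noise levels are arbitrary, and the names `sample_world` and `causal_goodhart` are hypothetical.

```python
import random
import statistics


def sample_world(rng, force_u=None):
    """Draw one (U, V) pair; if force_u is given, U is set by intervention."""
    c = rng.gauss(0.0, 1.0)  # hidden common cause of U and V (assumed)
    v = c + rng.gauss(0.0, 0.5)  # goal V depends only on c
    if force_u is None:
        u = c + rng.gauss(0.0, 0.5)  # proxy U arises naturally from c
    else:
        u = force_u  # intervention: U is set without touching c or V
    return u, v


def causal_goodhart(n_samples=100_000, threshold=2.0, seed=0):
    """Compare mean V when U is naturally high with mean V when U is forced high."""
    rng = random.Random(seed)

    observed = [sample_world(rng) for _ in range(n_samples)]
    v_given_high_u = statistics.fmean(v for u, v in observed if u > threshold)

    forced = [sample_world(rng, force_u=threshold + 1.0) for _ in range(n_samples)]
    v_after_intervention = statistics.fmean(v for _, v in forced)
    return v_given_high_u, v_after_intervention


if __name__ == "__main__":
    seen, forced = causal_goodhart()
    print(f"mean V when U is naturally high: {seen:.2f}")
    print(f"mean V when U is forced high:    {forced:.2f}")
```

In the observed data, high U goes with high V because both share a cause. Forcing U upward leaves the mean of V near zero, which is the failure the model describes.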

Adversarial Goodhart: When you optimize for a proxy, you provide an incentive for adversaries to correlate their goal with your proxy, thus destroying the correlation with your goal.

  • Model: Consider an agent A with some different goal W. Since they depend on common resources, W and V are naturally opposed. If you optimize U as a proxy for V, and A knows this, A is incentivized to make large U values coincide with large W values, thus stopping them from coinciding with large V values.
  • Example: Aspiring NBA players might just lie about their height.

For more on this topic, see Eliezer Yudkowsky’s write-up, Goodhart’s Curse.




Sam Harris and Eliezer Yudkowsky on “AI: Racing Toward the Brink”


Waking Up with Sam Harris

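MIRI senior researcher Eliezer Yudkowsky was recently invited to be a guest on Sam Harris’ “Waking Up” podcast. Sam is a neuroscientist and popular author who writes on topics related to philosophy, religion, and public discourse.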

The following is a complete transcript of Sam and Eliezer’s conversation, AI: Racing Toward the Brink.



February 2018 Newsletter


January 2018 Newsletter




Our 2017 fundraiser is complete! We’ve had an incredible month, with, by far, our largest fundraiser success to date. More than 300 distinct donors gave just over $2.5M¹, doubling our third fundraising target of $1.25M. Thank you!

$2,504,625 raised in total!

358 donors contributed

Our largest donation came toward the very end of the fundraiser in the form of an Ethereum donation worth $763,970 from Vitalik Buterin, the inventor and co-founder of Ethereum. Vitalik’s donation represents the third-largest single contribution we’ve received to date, after a $1.25M grant disbursement from the Open Philanthropy Project in October, and a $1.01M Ethereum donation in May.

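In our mid-fundraiser update, we noted that MIRI was included in a large Matching Challenge: In partnership with Raising for Effective Giving, professional poker players Dan Smith, Tom Crowley, and Martin Crowley announced that they would match all donations to MIRI and nine other organizations through the end of December. Donors helped get us to our matching cap of $300K within 2 weeks, resulting in a $300K match from Dan, Tom, and Martin (thanks guys!). Other big winners from the Matching Challenge, which raised $4.5M (match included) in less than 3 weeks, include GiveDirectly ($588K donated) and the Good Food Institute ($416K donated).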


We also received substantial support from medium-sized donors: a total of $631,595 from the 42 donors who gave $5,000–$50,000 and a total of $113,556 from the 75 who gave $500–$5,000 (graph). We are also grateful to donors who leveraged their employers’ matching generosity, donating a combined total of over $100,000 during December.



  1. The exact total might increase slightly over the coming weeks as we process donations that were initiated in December 2017 and arrive in 2018.