AI Alignment: Why It’s Hard, and Where to Start

What is it?: A talk by Eliezer Yudkowsky given at Stanford University on May 5, 2016, for the Symbolic Systems Distinguished Speaker series.

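Talk: Full video.

Transcript: Full (including Q&A), partial (including select slides).

Slides without transitions: High-quality, low-quality.

Slides with transitions: High-quality, low-quality.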

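Abstract: If we can build sufficiently advanced machine intelligences, what goals should we point them at? The frontier open problems on this subject are less, “A robot may not injure a human being or, through inaction, allow a human being to come to harm,” and more, “If you could formally specify the preferences of an arbitrarily smart and powerful agent, could you get it to safely move one strawberry onto a plate?” This talk will discuss some of the open technical problems in AI alignment, the probable difficulties that make those problems hard, and the bigger picture into which they fit, as well as what it’s like to work in this relatively new field.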

Notes, references, and resources for learning more are collected here.

Agents and their utility functions

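The best general introductions to the topic of smarter-than-human artificial intelligence are plausibly Nick Bostrom’s Superintelligence and Stuart Armstrong’s Smarter Than Us. For a shorter explanation, see my recent guest post on EconLog.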

A fuller version of Stuart Russell’s quotation (from edge.org):

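There have been many unconvincing arguments [for worrying about AI disasters], especially those involving blunt applications of Moore’s law or the spontaneous emergence of consciousness and evil intent. Many of the contributors to this conversation seem to be responding to those arguments and ignoring the more substantial arguments proposed by Omohundro, Bostrom, and others.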

The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:

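The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.

Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources, not for their own sake, but to succeed in its assigned task.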

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want.
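As a toy illustration of Russell’s point about n variables and an objective that reads only k<n of them, here is a minimal sketch. Everything in it is hypothetical and not taken from the quoted text: the variable names, the `BUDGET` value, and in particular the assumption that the variables compete for one shared resource. The objective looks only at `task_units` (k = 1 of n = 3), so the optimizer drives the two variables it does not score to their lowest possible value.

```python
from itertools import product

BUDGET = 10  # hypothetical number of shared resource units


def objective(task_units, heating_units, lighting_units):
    """Score an allocation; only task_units matters (k = 1 of n = 3)."""
    return task_units


def feasible(allocation):
    """An allocation is allowed if it does not exceed the shared budget."""
    return sum(allocation) <= BUDGET


def optimize():
    """Brute-force search over all integer allocations within the budget."""
    candidates = (
        alloc
        for alloc in product(range(BUDGET + 1), repeat=3)
        if feasible(alloc)
    )
    return max(candidates, key=lambda alloc: objective(*alloc))


best = optimize()
print(best)  # (10, 0, 0): heating and lighting are driven to zero
```

If heating is something we care about, this solution is undesirable even though it is exactly optimal under the stated objective. In this sketch the extreme values come from the shared budget; other couplings between variables would have a similar effect.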

Isaac Asimov introduced the Three Laws of Robotics in the 1942 short story “Runaround.”

The Peter Norvig / Stuart Russell quotation is from Artificial Intelligence: A Modern Approach, the top undergraduate textbook in AI.

The arguments I give for having a utility function are standard, and can be found in e.g. Poole and Mackworth’s Artificial Intelligence: Foundations of Computational Agents. I write about normative rationality at greater length in Rationality: From AI to Zombies (e.g., in The Allais Paradox and Zut Allais!).

Some AI alignment subproblems

My discussion of low-impact agents borrows from a forthcoming research proposal by Taylor et al.: “Value Alignment for Advanced Machine Learning Systems.” For an overview, see Low Impact on Arbital.

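The suspend problem is discussed (under the name “shutdown problem”) in Soares et al.’s “Corrigibility.” The stable policy proposal comes from Taylor’s Maximizing a quantity while ignoring effect through some channel.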

Poe’s argument against the possibility of machine chess is from the 1836 essay “Maelzel’s Chess-Player.”

Fallenstein and Soares’ “Vingean Reflection” is currently the most up-to-date overview of work in goal stability. Other papers cited:

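Yudkowsky and Herreshoff (2013). “Tiling Agents for Self-Modifying AI, and the Löbian Obstacle.” Working paper.
Christiano et al. (2013). “Definability of Truth in Probabilistic Logic.” Working paper.
Fallenstein and Kumar (2015). “Proof-Producing Reflection for HOL: With an Application to Model Polymorphism.” In Interactive Theorem Proving: 6th International Conference, Proceedings.
Yudkowsky (2014). “Distributions Allowing Tiling of Staged Subjective EU Maximizers.” Technical report 2014-1. Machine Intelligence Research Institute.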

Why expect difficulty?

For more about orthogonal final goals and convergent instrumental strategies, see Bostrom’s “The Superintelligent Will” (also reproduced in Superintelligence). Benson-Tilsen and Soares’ “Formalizing Convergent Instrumental Goals” provides a toy model.

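The smile maximizer is based on a proposal by Bill Hibbard. This example and Jürgen Schmidhuber’s compressibility proposal are discussed more fully in Soares’ “The Value Learning Problem.” See also the Arbital pages on Edge Instantiation, Context Disaster, and Nearest Unblocked Strategy.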

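See the MIRI FAQ and GiveWell’s report on potential risks from advanced AI for quick explanations of why AI is likely to be able to surpass human cognitive capabilities, among other topics. Bensinger’s When AI Accelerates AI notes general reasons to expect capability speedup, while “Intelligence Explosion Microeconomics” delves into the specific question of whether self-modifying AI is likely to result in accelerating AI progress.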

Muehlhauser notes the analogy between computer security and AI alignment research in AI Risk and the Security Mindset.

Where we are now

MIRI’s technical research agenda summarizes many of the field’s core open problems.

For more on conservatism, see the Arbital post Conservative Concept Boundary and Taylor’s Conservative classifiers. Also on Arbital: introductions to mild optimization and act-based agents.

Papers cited in the slides:

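Armstrong and Levinstein (2015). “Reduced Impact Artificial Intelligences.” Working paper.
Soares (2015). “Formalizing Two Problems of Realistic World-Models.” Technical report 2015-3. Machine Intelligence Research Institute.
Taylor (2016). “Quantilizers: A Safer Alternative to Maximizers for Limited Optimization.” Paper presented at the AAAI 2016 AI, Ethics and Society Workshop.
Evans et al. (2015). “Learning the Preferences of Bounded Agents.” Paper presented at the NIPS 2015 Workshop on Bounded Optimality.
Hutter (2007). “Universal Algorithmic Intelligence: A Mathematical Top→Down Approach.” arXiv:cs/0701125 [cs.AI].
LaVictoire et al. (2014). “Program Equilibrium in the Prisoner’s Dilemma via Löb’s Theorem.” Paper presented at the AAAI 2014 Multiagent Interaction without Prior Coordination Workshop.
Fallenstein et al. (2015). “Reflective Oracles: A Foundation for Game Theory in Artificial Intelligence.” In Proceedings of LORI 2015.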

Email contact@www.hdjkn.com if you have any questions or want information about opportunities to collaborate on AI alignment projects.
