MIRI Research Guide

Nate Soares



Update March 2019: This research guide has been only lightly updated since 2015. Our new recommendation for people who want to work on the AI alignment problem is:

  • If you have a computer science or software engineering background: apply to attend our new workshops on AI risk and to work as an engineer at MIRI. For this purpose, you don’t need any prior familiarity with our research.

    • If you aren’t sure whether you’re a good fit for the AI risk workshops or for an engineering position, send us an email and we can talk about whether it makes sense.

    • You can find more information about our engineering program in our 2018 strategy update.

  • If you want to learn more about the problems we’re working on (regardless of your answer to the above): See “Embedded Agency” for an introduction to our agent foundations research, and see our Alignment Research Field Guide for general advice on how to get started in AI safety.

    • After checking out those two resources, you can use the links and references in “Embedded Agency” and on this page to learn more about the topics you want to study. If you want to focus on a specific problem, we recommend Scott Garrabrant’s “Fixed Point Exercises.” As Scott notes:

      Sometimes people ask me what math they should study in order to get into agent foundations. My first answer is that I have found the introductory class in every subfield to be helpful, but I have found later classes to be much less helpful. My second answer is to learn enough math to understand all fixed point theorems.

      These two answers are actually very similar. Fixed point theorems span all across mathematics, and are central to (my way of) thinking about agent foundations.

    • If you want people to collaborate and discuss with, we suggest starting or joining a MIRIx group, posting on LessWrong, applying for our AI Risk for Computer Scientists workshops, or otherwise letting us know you’re out there.

If humans are to develop smarter-than-human artificial intelligence that has a positive impact, we must meet three formidable challenges. First, we must design smarter-than-human systems that are highly reliable, so that we can justify confidence that the system will fulfill its specified goals or preferences. Second, the design must be error-tolerant, so that the system is amenable to online modification and correction in the face of inevitable human error. Third, the system must actually learn beneficial goals or preferences.

MIRI’s current research program focuses on understanding how to meet these challenges in principle. There are aspects of reliable reasoning that we do not yet understand even in theory; there are questions of bounded rationality that we could not yet solve even in simplified settings. Our study focuses on finding solutions in simplified settings, as a first step. As such, our modern research looks much more like pure mathematics than software engineering or practical machine learning.

This guide briefly overviews our research priorities, and provides resources that will help you get to the cutting edge in each subject area. This guide is not intended to justify these research topics; for further motivation of our approach, refer to the article “MIRI’s Approach,” or to our technical agenda and supporting papers.

Note (Sep 2016): This research guide is based around our Agent Foundations agenda. As of 2016, we also have a machine-learning-focused agenda. Refer to that document for more information about research directions that we think are promising, and which are not covered by this guide.


How to Use This Guide

This guide is aimed at aspiring researchers who are not yet well-versed in the relevant subject areas. If you are already an AI professional or a seasoned mathematician, consider skipping to our existing publications instead. (Our technical agenda is a fine starting point.) This guide is geared towards students who are wondering what to study if they want to become MIRI researchers in the future, and toward professionals in other fields who want to get up to speed on our work.

Researchers generally end up joining our team by one of two paths. The first is to attend a MIRI workshop and build a relationship with us in person. You can use this form to apply to attend a research workshop. Note that there is often a fair amount of time between workshops, and that capacity is limited.

The second path is to independently make some headway on our research agenda and let us know about your results. You can use our online form to apply for assistance or input on your work, but the quickest way to start contributing is to read the Intelligent Agent Foundations Forum (IAFF), take note of the open problems people are working on, and take a crack at one of them. You can then post your results as a link on the forum.

(Update March 2019: LessWrong and the AI Alignment Forum are now our go-to venues for public discussion of AI alignment problems, superseding the IAFF. See the top of this post for other updates to the suggestions in this section.)

The primary purpose of the research forum is for researchers who are already on the same page to discuss unpolished partial results. As such, posts on the forum can be quite opaque. This research guide can help you get up to speed on the open problems being discussed on the IAFF. It can also help you develop the skills necessary to qualify for a workshop, or find ways to work on open problems in AI alignment at other institutions.

This guide begins with recommendations for basic subjects that are important to understand before attempting this style of research, such as probability theory. After that, it is divided into a series of topic areas, with links to papers that will get you to the state of the art in each area.

This is not a linear guide: if you want to become a MIRI researcher, I recommend first making sure that you understand the basics, and then choosing one topic that interests you and going into depth in that area. Once you understand one topic well, you’ll be ready to try contributing in that topic area on the IAFF.

With all of the material in this guide, don’t grind away for the sake of grinding away. If you already know the material, skip ahead. If one of the active research areas fails to capture your interest, switch to another. If you don’t like one of the recommended textbooks, find a better one or skip it entirely. This guide should serve as a tool for figuring out where you can contribute, not as an obstacle to that goal.


The Basics

Before jumping directly into our active research topics, it is important to have some fluency with basic mathematical concepts. All of our research areas are well-served by a basic understanding of computation, logic, and probability theory. Below are some resources to get you started.

You don’t need to read the books in this section in the order listed. Pick up whatever is interesting, and don’t hesitate to skip back and forth between the research areas and the basics as necessary.

Set Theory

Most of modern mathematics is formalized in set theory, and the textbooks and papers listed here are no exception. This makes set theory a great place to begin.



Chapters 1-18

Computability and Logic

The theory of computability (and the limits posed by diagonalization) is foundational to understanding what can and can’t be done by machines.



Chapters 1-4

Probability Theory

Probability theory is central to an understanding of rational agency. Some familiarity with reasoning under uncertainty is critical in all of our active research areas.



Chapters 1-5

Probabilistic Inference

This book will help with understanding how reasoning can be done using probabilistic world-models.


Statistics

Fluency with statistical modeling will be helpful for contributing to our “Alignment for Advanced Machine Learning Systems” research agenda. Some prior familiarity with probabilistic inference is a good idea here.


Machine Learning

To develop a practical familiarity with machine learning, we highly recommend Andrew Ng’s Coursera course (lecture notes here). For a theoretical introduction to ML, try Understanding Machine Learning.


Artificial Intelligence

Though much of our work is theoretical in character, knowledge of the modern field of artificial intelligence is important to put this work in context.

It’s also important to understand the concept of VNM rationality, which I recommend learning from the Wikipedia article, though it can also be picked up from the original book. Von Neumann and Morgenstern showed that any agent obeying a few simple consistency axioms acts with preferences characterizable by a utility function. While some expect that we may ultimately need to abandon VNM rationality in order to construct reliable intelligent agents, the VNM framework remains the most expressive framework we have for characterizing the behavior of arbitrarily powerful agents. (For example, see the orthogonality thesis and the instrumental convergence thesis from Bostrom’s “The Superintelligent Will.”) The concept of VNM rationality is used throughout all our active research areas.



Realistic World-Models

Good goals are of little use if your smarter-than-human system is unreliable. There are aspects of good reasoning that we do not yet understand even in principle. It may well be possible to gain insight by building practical systems that use algorithms which are not yet fully understood: often, theoretical understanding follows practical application. However, we think that approach is unwise when designing systems with the potential to become superintelligent: we will be much safer if we have a theory of general intelligence before we attempt to create practical superintelligent systems.

As such, many of our active research topics focus on parts of general intelligence that we do not yet understand how to solve, even in principle. For example, consider the following problem:

I have a computer program, known as the “universe.” One function in the universe is undefined. Your job is to provide me with a computer program of the appropriate type to complete my universe program. Then, I’ll run my universe program. My goal is to score your agent according to how well it learns what the original universe program is.

What do I do? Solomonoff’s theory of inductive inference sheds some light on a theoretical solution: it describes a method for making ideal predictions from observations, but only in settings where the predictor lives outside the environment. Solomonoff induction has led to many useful tools for thinking about inductive inference (including Kolmogorov complexity, the universal prior, and AIXI), but the problem becomes much more difficult in settings where the agent is a subprocess of the universe, computed by the universe itself.

In the case where the agent is embedded inside the environment, the induction problem gets murky: what counts as “learning the universe program”? Against what distribution over environments should the agent be scored? What constitutes ideal induction in the case where the boundary between “agent” and “environment” becomes blurry? These are questions of “naturalized induction.”

  1. Soares’ “Formalizing two problems of realistic world-models” further motivates the problems of naturalized induction, relating them to a theory of general intelligence.

  2. Altair’s “An Intuitive Explanation of Solomonoff Induction” explains Solomonoff’s theory of inductive inference, which is important background knowledge for understanding the problems of naturalized induction.

  3. Bensinger’s “Naturalized induction” (series) explores questions of naturalized induction in more detail.

Solving problems of naturalized induction requires gaining a better understanding of realistic world-models: What is the set of “possible realities”? What sort of priors about the environment would an ideal agent use? Answers to these questions must not only allow good reasoning, they must allow for the specification of human goals in terms of those world-models.

For example, in Solomonoff induction (and in Hutter’s AIXI), Turing machines are used to model the environment. Pretend that the only thing we value is diamonds (carbon atoms covalently bound to four other carbon atoms). Now, say I give you a Turing machine. Can you tell me how much diamond is within?

In order to design an agent that pursues goals specified in terms of its world-model, the agent must have some way of identifying the ontology of our goals (carbon atoms) within its world-model (Turing machines). This “ontology identification” problem is discussed in “Formalizing two problems of realistic world-models” (linked above), and was first introduced by de Blanc:

  1. De Blanc’s “Ontological crises in artificial agents’ value systems” asks how one might make an agent’s goals robust to changes in ontology. If the agent starts with an atomic model of physics (where carbon atoms are ontologically basic) then this may not be hard. But what happens when the agent builds a nuclear model of physics (where atoms are constructed from neutrons and protons)? If the “carbon recognizer” was hard-coded, the agent might fail to identify any carbon in this new world-model, and may start acting strangely (in search of hidden “true carbon”). How could the agent be designed so that it can successfully identify “six-proton atoms” with “carbon atoms” in response to this ontological crisis?


Legg and Hutter’s “Universal Intelligence: A Definition of Machine Intelligence” describes AIXI, a universally intelligent agent in settings where the agent is separate from the environment, and a “scoring metric” used to rate the intelligence of various agent programs in this setting. Hutter’s AIXI and Legg’s scoring metric are very similar in spirit to what we are looking for in response to problems of naturalized induction and ontology identification. The two differences are that AIXI lives in a universe where agent and environment are separated whereas naturalized induction requires a solution where the agent is embedded within the environment, and AIXI maximizes rewards specified in terms of observations whereas we desire a solution that optimizes rewards specified in terms of the outside world.

You can learn more about AIXI in Hutter’s book Universal Artificial Intelligence, although reading Legg’s paper (linked above) is likely sufficient for our purposes.


Decision Theory

Say I give you the following: (1) a computer program describing a universe; (2) a computer program describing an agent; (3) a set of actions available to the agent; and (4) a set of preferences over histories of states of the universe. I task you with identifying the best action available with respect to those preferences. For example, your inputs might be:

def Universe():
    outcomes = {Lo, Med, Hi}
    actions = {One: Lo, Two: Med, Three: Hi}
    return actions[Agent()]

def Agent():
    worldmodel = {Lo: One, Hi: Two, Med: Three}
    return worldmodel[Hi]

actions = {One, Two, Three}

Hi > Med > Lo

(Notice how the agent is embedded in the environment.) This is another question that we don’t know how to answer, even in principle. It may seem easy: just iterate over each action, figure out which outcome the agent would get if it took that action, and then pick the action that leads to the best outcome. But as a matter of fact, in this thought experiment, the agent is a deterministic subprocess of a deterministic computer program: there is exactly one action that the agent is going to output, and asking what “would happen” if a deterministic part of a deterministic program did something that it doesn’t do is ill-defined.

In order to evaluate what would happen if the agent took a different action, a “counterfactual environment” must be constructed, in which the agent (a deterministic subprocess of a deterministic program) “does something” that it doesn’t actually do. A satisfactory theory of counterfactual reasoning does not yet exist. We do not yet understand how to identify the best action available to an agent embedded in its environment, even in theory, even given full knowledge of the universe program and of our preferences, and given unlimited computing power.

Solving this problem will require a better understanding of counterfactual reasoning; this is the domain of decision theory.

Decision Theory

Peterson’s textbook explains the field of normative decision theory in broad strokes. For a quicker survey, with a stronger focus on Newcomblike problems, see Muehlhauser’s “Decision theory FAQ.”


Game Theory

Many open problems in decision theory involve multi-agent settings. I have heard good things about Tadelis’ textbook, but have not read it myself. You also may have luck with Scott Alexander’s “Introduction to game theory” on LessWrong.



Chapters 1-5
(+ 6-9 if enthusiastic)

Provability Logic

Toy models of multi-agent settings can be studied in scenarios where agents base their actions on what they can prove about other agents in the same environment. Our current toy models make heavy use of provability logic.

Existing methods of counterfactual reasoning turn out to be unsatisfactory both in the short term (in the sense that they systematically achieve poor outcomes on some problems where good outcomes are possible) and in the long term (in the sense that self-modifying agents reasoning according to those broken counterfactuals would decide that they should not fix all of the flaws in their reasoning). My talk “Why ain’t you rich?” briefly touches on both of these points. To learn more, I suggest the following resources:

  1. Soares & Fallenstein’s “Toward idealized decision theory” serves as a general overview, and further motivates the decision theory problems relevant to MIRI’s research program. The paper discusses the shortcomings of two modern decision theories, along with some new insights in decision theory that point toward new methods for performing counterfactual reasoning.

If “Toward idealized decision theory” moves too quickly, this series of blog posts may be a better place to start:

  1. Yudkowsky’s “The true Prisoner’s Dilemma” explains why cooperation isn’t automatically the “right” or “good” option.

  2. Soares’ “Causal decision theory is unsatisfactory” uses the Prisoner’s Dilemma to illustrate the importance of non-causal connections between decision algorithms.

  3. Yudkowsky’s “Newcomb’s problem and regret of rationality” argues for focusing on decision theories that “win,” not just on decision theories that seem intuitively reasonable. Soares’ “An introduction to Newcomblike problems” covers similar ground.

  4. Soares’ “Newcomblike problems are the norm” points out that human agents model one another’s decision criteria on a routine basis.

MIRI’s research has led to the development of “Updateless Decision Theory” (UDT), a new decision theory which addresses many of the shortcomings discussed above.

  1. Hintze’s “Problem class dominance in predictive dilemmas” summarizes UDT’s dominance over other known decision theories, including Timeless Decision Theory (TDT), another theory which dominates CDT and EDT.

  2. Fallenstein’s “A model of UDT with a concrete prior over logical statements” provides a probabilistic formalization.

However, UDT is by no means a solution, and a number of its shortcomings are discussed in the following places:

  1. Slepnev’s “An example of self-fulfilling spurious proofs in UDT” explains how UDT can achieve suboptimal results due to spurious proofs.

  2. Benson-Tilsen’s “UDT with known search order” is a somewhat unsatisfactory solution. It contains a formalization of UDT with known proof-search order and demonstrates the necessity of using a technique known as “playing chicken with the universe” in order to avoid spurious proofs.

In order to study multi-agent settings, Patrick LaVictoire has developed a modal agents framework, which has also allowed us to use provability logic to make some novel progress in the field of decision theory:

  1. Barasz et al.’s “Robust cooperation in the Prisoner’s Dilemma” allows us to consider agents which decide whether to cooperate with each other based only on what they can prove about each other’s behavior. This prevents infinite regress; in fact, the behavior of two agents which act only according to what they can prove about the behavior of the other can be determined in quadratic time using results from provability logic.


UDT was developed by Wei Dai and Vladimir Slepnev, among others. Dai’s “Towards a new decision theory” introduced the idea, and Slepnev’s “A model of UDT with a halting oracle” provided an early first formalization. Slepnev also described a strange problem with UDT wherein it seems as if agents are rewarded for having less intelligence, in “Agent simulates predictor.”

These blog posts are of historical interest, but almost all of their content is covered by “Toward idealized decision theory,” above.


Logical Uncertainty

Imagine a black box with one input chute and two output chutes. A ball can be dropped into the input chute, and it will come out of one of the two output chutes. Inside the black box is a Rube Goldberg machine which takes the ball from the input chute to one of the output chutes.

A perfect probabilistic reasoner who doesn’t know which Rube Goldberg machine is in the box doesn’t know how the box will behave, but if they could figure out which machine is inside the box, then they would know which chute would take the ball. This reasoner is empirically uncertain.

A realistic reasoner might know exactly which machine is in the box, and might know precisely how the machine works, but may lack the deductive capability to figure out where the machine will drop the ball. This reasoner is logically uncertain.

Probability theory assumes logical omniscience; it assumes that reasoners know all consequences of the things they know. In reality, bounded reasoners are not logically omniscient: we can know precisely which machine the box implements and precisely how the machine works, and just not have the time to deduce where the ball comes out. We reason under logical uncertainty.

A formal theory of reasoning under logical uncertainty does not yet exist. Gaining this understanding is quite important for constructing a highly reliable generally intelligent system: any time an agent reasons about the behavior of complex systems, computer programs, or other agents, it must operate under at least some logical uncertainty.

To understand the state of the art, a solid understanding of probability theory is a must; consider augmenting the first few chapters of Jaynes with Feller, chapters 1, 5, 6, and 9, and then study the following papers:

  1. Soares & Fallenstein’s “Questions of reasoning under logical uncertainty” provides a general introduction, explaining the field of logical uncertainty and motivating its relevance to MIRI’s research program.

  2. Gaifman’s “Concerning measures in first-order calculi” looked at this problem many years ago. Gaifman focused mainly on a related subproblem, that of assigning probabilities to different models of a formal system (under the assumption that once a model is known, all consequences of that model are known). We are now trying to extend this approach to a fuller notion of logical uncertainty (where a reasoner can know what a model is without knowing what that model implies), but Gaifman’s work remains useful for understanding the difficulties surrounding logical uncertainty.

  3. Hutter et al.’s “Probabilities on sentences in an expressive logic” largely looks at the problem of logical uncertainty assuming access to infinite computing power (and many levels of halting oracles). Understanding Hutter’s approach (and what can be done with infinite computing power) helps flesh out our understanding of where the difficult questions lie.

  4. Demski’s “Logical prior probability” provides a computably approximable logical prior. Following Demski, our work has largely focused on constructing approximately coherent prior probability distributions over logical sentences, since the process of refining approximations of a logical prior closely resembles the process of reasoning under logical uncertainty in general.

  5. Christiano’s “Non-omniscience, probabilistic inference, and metamathematics” largely follows this approach. This paper provides some early practical considerations about the generation of logical priors, and highlights a few open problems.


For more historical work on this problem, see Gaifman’s “Probabilities over rich languages…” and “Reasoning with limited resources and assigning probabilities to arithmetical statements.”


Vingean Reflection

Much of what makes the AI problem distinctive is that sufficiently advanced systems will be able to do science and engineering of higher quality than their human programmers. Many of the potential dangers and benefits of advanced systems stem from their potential to bootstrap themselves to ever-higher levels of capability, possibly resulting in an intelligence explosion.

If an agent achieves superintelligence through recursive self-improvement, then the impact of the resulting system depends entirely upon the ability of the initial system to reason reliably about agents that are more intelligent than itself. What sort of reasoning methods could a system use in order to justify extremely high confidence in the behavior of a yet more intelligent system? We refer to this sort of reasoning as “Vingean reflection,” after Vernor Vinge (1993), who noted that it is not in general possible to precisely predict the behavior of agents more intelligent than the reasoner.

A reasoner performing Vingean reflection must necessarily reason abstractly about the smarter agent. This will almost certainly require some form of high confidence in conclusions reached via logically uncertain reasoning, but in lieu of a working theory of logical uncertainty, reasoning about proofs (using formal logic) is the best formalism we have for studying abstract reasoning. Modern research on Vingean reflection therefore requires a background in formal logic:

First-Order Logic

MIRI’s existing toy models for studying self-modifying agents are largely based on this logic. Understanding the nuances of first-order logic is crucial for using the tools we have developed for studying formal systems capable of something approaching confidence in similar systems.

We study Vingean reflection by constructing toy models of agents which are able to gain some form of confidence in highly similar systems. To get to the cutting edge, read the following papers:

  1. Fallenstein & Soares’ “Vingean reflection: Reliable reasoning for self-improving agents” introduces the field of Vingean reflection, and motivates its connection to MIRI’s research program.

  2. Yudkowsky’s “The procrastination paradox” goes into more detail on the need for satisfactory solutions to walk a fine line between the Löbian obstacle (a problem stemming from too little “self-trust”) and the unsoundness that comes from too much self-trust.

  3. Christiano et al.’s “Definability of truth in probabilistic logic” describes an early attempt to create a formal system that can reason about itself while avoiding paradoxes of self-reference. It succeeds, but has ultimately been shown to be unsound. My walkthrough for this paper may help put it in context.

  4. Fallenstein & Soares’ “Problems of self-reference in self-improving space-time embedded intelligence” describes our simple suggester-verifier model for studying agents that produce slightly improved versions of themselves, or “tile” themselves. The paper demonstrates a toy scenario in which sound agents can successfully tile to (i.e., place confidence in) other similar agents.


Yudkowsky & Herreshoff’s “Tiling agents for self-modifying AI” is an older, choppier introduction to Vingean reflection, which may be easier to work through using my walkthrough.

If you are excited by this research topic, there are a number of other relevant technical reports. Unfortunately, most of them don’t explain their motivations well, and haven’t been placed in a larger context.

Fallenstein’s “Procrastination in probabilistic logic” illustrates how the probabilistic reasoning system of Christiano et al. is unsound and vulnerable to the procrastination paradox. Yudkowsky’s “Distributions allowing tiling…” takes some early steps towards probabilistic tiling settings.

Fallenstein’s “Decreasing mathematical strength…” describes one unsatisfactory property of Parametric Polymorphism, a partial solution to the Löbian obstacle. Soares’ “Fallenstein’s monster” describes a hackish formal system which avoids that problem. It also demonstrates a mechanism for restricting the agent’s goal predicate which can be used with Parametric Polymorphism to create a less restrictive version of PP than the one explored in the tiling agents paper. Fallenstein’s “An infinitely descending sequence of sound theories…” describes a more elegant partial solution to the Löbian obstacle, which is now among our favored partial solutions.

An understanding of recursive ordinals provides a useful context from which to understand these results, and can be gained by reading Franzén’s “Transfinite progressions: a second look at completeness.”


Corrigibility

As artificially intelligent systems grow in intelligence and capability, some of their available options may allow them to resist intervention by their programmers. We call an AI system “corrigible” if it cooperates with what its creators regard as a corrective intervention, despite default incentives for rational agents to resist attempts to shut them down or modify their preferences.

This field of research is basically brand-new, so all it takes in order to get up to speed is to read a paper or two:

  1. Soares et al.’s “Corrigibility” introduces the field at large, along with a few open problems.

  2. Armstrong’s “Proper value learning through indifference” discusses one potential approach for making agents indifferent between which utility function they maximize, a small step towards agents that allow themselves to be modified.

Our current work on corrigibility focuses mainly on a small subproblem known as the “shutdown problem”: how do you construct an agent that shuts down upon the press of a shutdown button, and which has no incentives to cause or prevent the pressing of that button? Within that subproblem, we currently focus on the utility indifference problem: how could you construct an agent that switches which utility function it maximizes upon the press of a button, without giving it incentives to affect whether the button is pressed? Even a satisfactory solution to the utility indifference problem would not yield a satisfactory solution to the shutdown problem, as it still seems difficult to adequately specify “shutdown behavior” in a manner that is immune to perverse instantiation. Stuart Armstrong has written several blog posts on the specification of “reduced impact” AGIs:

  1. Domesticating reduced impact AIs
  2. Reduced impact AI: no back channels

These first attempts are not yet a full solution, but they should get you up to speed on our current understanding of the problem.


Earlier work can be found on the web forum LessWrong. Most of the relevant results are captured in the above papers. One of the more interesting of these is “Cake or Death,” an example of the “motivated value selection” problem. In this example, an agent with uncertainty about its utility function benefits from avoiding information that reduces its uncertainty.

Armstrong’s “The mathematics of reduced impact: help needed” lists initial ideas for specifying reduced-impact agents, and his “Reduced impact in practice: randomly sampling the future” sketches a simple method for assessing whether a future has been impacted.

Armstrong’s “Utility indifference” outlines the original utility indifference idea, and is largely interesting for historical reasons. It is subsumed by the “Proper value learning through indifference” paper linked above.


Value Learning

Since our own understanding of our values is fuzzy and incomplete, the most promising approach for loading values into a capable AI system is to specify criteria by which the agent learns our values incrementally. But this presents a number of interesting problems:

Say you construct a training set containing many outcomes filled with happy humans (labeled “good”) and other outcomes filled with sad humans (labeled “bad”). The simplest generalization, from this data, might be that humans really like human-shaped smiling-things: this agent may then try to build many tiny animatronic happy-looking people.

Value learning must be an online process: the system must be able to identify ambiguities and raise queries about these ambiguities to the user. It must not only identify cases that it doesn’t know how to classify (such as cases where it cannot tell whether a face looks happy or sad), but also identify dimensions along which the training data gives no information (such as when your training data never shows outcomes filled with human-shaped automatons that look happy, labeled as worthless).

Of course, ambiguity identification alone isn’t enough: you don’t want a system that spends the first three weeks asking for clarification on whether humans are still worthwhile when they are at different elevations, or when the wind is blowing, before finally (after the operators have stopped paying attention) asking whether it’s important that the human-shaped things be acting of their own will.

In order for an agent to reliably learn our intentions, the agent must construct and refine a model of its operators, and use that model to inform its queries and alter its preferences. To learn more about these and other problems, see the following:

  1. Soares’ “The value learning problem” provides a general overview of some open problems related to value learning.

  2. Dewey’s “Learning What to Value” further discusses the difficulty of value learning.

  3. The orthogonality thesis argues that value learning will not be solved by default.

  4. MacAskill’s “Normative Uncertainty” provides a framework for discussing normative uncertainty. Note that the full work, while it contains many insights, is long. You can get away with skimming parts and/or skipping around, especially if you are more excited by other active research areas.


One way to approach normative uncertainty is Bostrom & Ord’s “parliamentary model,” which suggests that value learning is somewhat equivalent to a voter aggregation problem, and that many value learning systems can be modeled as parliamentary voting systems (where the voters are possible utility functions).

Owen Cotton-Barratt’s “Geometric reasons for normalising…” discusses the normalization of utility functions; this is relevant to toy models of reasoning under moral uncertainty.

Fallenstein & Stiennon’s “Loudness” discusses a concern with aggregating utility functions, stemming from the fact that the preferences encoded by a utility function are preserved under positive affine transformations (e.g., as the utility function is scaled or shifted). This means that special care is required in order to normalize a set of possible utility functions.


Other Tools

Mastery of any topic can be a very powerful tool, especially in the realm of mathematics, where seemingly disjoint topics are actually deeply connected. Many fields of mathematics have the property that if you understand them very, very well, then that understanding is useful no matter where you go. With that in mind, while the subjects listed below are not necessary in order to understand MIRI’s active research, an understanding of each of these subjects constitutes an additional tool in the mathematical toolbox that will often prove quite useful when doing new research.

Discrete Math

The textbook is available online. Most math studies either continuous or discrete structures. Many people find discrete mathematics more intuitive, and a solid understanding of discrete mathematics will help you gain a quick handle on the discrete versions of many other mathematical tools, such as group theory, topology, and information theory.


Linear Algebra

Linear algebra is one of those tools that shows up almost everywhere in mathematics. A solid understanding of linear algebra will be helpful in many domains.


Type Theory

Set theory is commonly taken as the foundation of modern mathematics, but it isn’t the only available candidate. Type theory can also serve as a foundation for mathematics, and in many cases type theory is a better fit for the problem at hand. Type theory also bridges much of the theoretical gap between computer programs and mathematical proofs, and is therefore often relevant to certain types of AI research.


Category Theory

Category theory studies many mathematical structures at a very high level of abstraction. This can help you notice patterns in disparate branches of mathematics, and makes it much easier to transfer your mathematical tools from one domain to another.


Topology

Topology is another one of those subjects that shows up almost everywhere in mathematics. A deep understanding of topology will be helpful in many unexpected places.


Computability and Complexity

MIRI’s math research is working towards solutions that will eventually be relevant to computer programs. A good intuition for what computers are capable of is often essential.


Program Verification

Program verification techniques allow programmers to become confident that a specific program will actually act according to some specification. (It is, of course, still difficult to verify that the specification describes the intended behavior.) While MIRI’s work is not currently concerned with verifying real-world programs, it is quite useful to understand what modern program verification techniques can and cannot do.

Understanding the Mission

Why do this kind of research in the first place?

Superintelligence

This guide largely assumes that you’re already on board with MIRI’s mission, but if you’re wondering why so many people think this is an important and urgent area of research in the first place, Superintelligence provides a good overview.


Rationality: From AI to Zombies

This electronic tome compiles six volumes of essays that explain much of the philosophy and cognitive science behind MIRI’s perspective on AI.


Inadequate Equilibria

A discussion of microeconomics and epistemology as they bear on spotting societal failures and blind spots (including neglected research opportunities). It attempts to answer a basic question: “When can ambitious projects hope to succeed at achieving extraordinary goals?”