A Guide to MIRI’s Research

by Nate Soares



Update, March 2019: This research guide has only been lightly updated since 2015. Our new recommendations for people who want to work on this research are:

  • If you have a computer science or software engineering background: Apply to attend our new workshops on AI risk or to work as an engineer at MIRI. You don’t need prior familiarity with our research for this.

    • If you aren’t sure whether you’d be a good fit for an AI risk workshop, or for an engineer position, shoot us an email and we can talk about whether it makes sense.

    • You can find out more about our engineering program in our 2018 strategy update.

  • If you’d like to learn more about the problems we’re working on (regardless of your answer to the above): See “Embedded Agency” for an introduction to our Agent Foundations research, and see our Alignment Research Field Guide for general recommendations on how to get started in AI safety.

    • After checking out those two resources, you can use the links and references in “Embedded Agency” and on this page to learn more about the topics you want to drill down on. If you want a particular problem set to focus on, we suggest Scott Garrabrant’s “Fixed Point Exercises” (a small numerical illustration of the fixed-point idea follows this list). As Scott notes:

      Sometimes people ask me what math they should study in order to get into agent foundations. My first answer is that I have found the introductory class in every subfield to be helpful, but I have found the later classes to be much less helpful. My second answer is to learn enough math to understand all fixed point theorems.

      These two answers are actually very similar. Fixed point theorems span mathematics, and are central to (my way of) thinking about agent foundations.

    • If you’d like people to collaborate and discuss with, we recommend starting or joining a MIRIx group, posting on LessWrong, applying for our AI Risk for Computer Scientists workshops, or otherwise letting us know you’re out there.
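To make “fixed point” concrete before diving into the exercises, here is a minimal numerical illustration (a toy example of ours, not part of Scott’s sequence): iterating a contraction map converges to its unique fixed point, as Banach’s fixed point theorem guarantees.

import math

# Iterate x -> f(x) until it stops moving. For a contraction map such as
# cos on [0, 1], Banach's fixed point theorem guarantees convergence to
# the unique x satisfying f(x) = x.
def fixed_point(f, x0, tol=1e-12, max_iter=100000):
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx - x) < tol:
            return fx
        x = fx
    raise RuntimeError("did not converge")

print(fixed_point(math.cos, 1.0))  # ~0.7390851 (the Dottie number)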

If humans are to develop smarter-than-human artificial intelligence that has a positive impact, we must meet three formidable challenges. First, we must design smarter-than-human systems that are highly reliable, so that we can justify confidence that the system will fulfill the specified goals or preferences. Second, the designs must be error-tolerant, so that the systems are amenable to online modification and correction in the face of inevitable human error. Third, the system must actually learn beneficial goals or preferences.

MIRI’s current research program focuses on understanding how to meet these challenges in principle. Some aspects of reliable reasoning are not yet understood even in theory, and there are problems of bounded rationality that we cannot yet solve even in simplified settings. Our research focuses on finding solutions in simplified settings, as a first step. As a result, our current research looks more like pure mathematics than like software engineering or practical machine learning.

This guide briefly overviews our research priorities, and provides resources that can help you get to the cutting edge in each subject area. This guide is not intended to justify these research topics; for further motivation of our approach, see the article “MIRI’s Approach”, or our technical agenda and supporting papers.

Note (September 2016): This research guide covers our Agent Foundations agenda. As of 2016, we also have a machine learning focused agenda. See that document for more information about research directions we consider promising that are not covered in this guide.


How to Use This Guide

This guide is intended for aspiring researchers who are not yet well-versed in the relevant subject areas. If you are already an AI professional or a seasoned mathematician, consider skipping to our existing publications instead. (Our technical agenda is a fine starting point.) This guide is geared towards students who are wondering what to study if they want to become MIRI researchers in the future, and toward professionals in other fields who want to get up to speed on our work.

Researchers generally end up joining our team by one of two paths. The first is to attend a MIRI workshop and build a relationship with us in person. You can use this form to apply to attend a research workshop. Be warned that there is often quite a bit of time between workshops, and that they have limited capacity.

The second path is to make some progress on our research agenda independently and let us know about your results. You can use our online form to apply for assistance or input on your work, but the fastest way to start contributing is to read posts on the Intelligent Agent Foundations Forum (IAFF), note the open problems people are working on, and solve one. You can then post your results as a link on the forum.

(Update, March 2019: LessWrong and the AI Alignment Forum are now our go-to venues for public discussion of AI alignment problems, superseding IAFF. See the top of this post for other updates to the suggestions in this section.)

The main purpose of the research forum is to discuss unpolished partial results with researchers who are already on the same page; as a result, posts on the forum can be quite opaque. This research guide can help you get up to speed on the open problems under discussion on the IAFF. It can also help you develop the skills you’d need to qualify for a workshop, or find ways to tackle open problems in AI alignment at other institutions.

This guide begins with recommendations for basic subjects that it’s important to understand before attempting this style of research, such as probability theory. After that, it’s broken into a series of topic areas, with links to papers that will catch you up to the state of the art in that area.

This is not a linear guide: if you want to become a MIRI researcher, I recommend first making sure that you understand the basics, and then choosing one topic that interests you and going into depth in that area. Once you understand one topic well, you’ll be ready to try contributing in that topic area on the IAFF.

With all of the material in this guide, please do not grind away for the sake of grinding away. If you already know the material, skip ahead. If one of the active research areas fails to capture your interest, switch to a different one. If you don’t like one of the recommended textbooks, find a better one or skip it entirely. This guide should serve as a tool for figuring out where you can contribute, not as an obstacle to that goal.


The Basics

It’s important to have some fluency with elementary mathematical concepts before jumping directly into our active research topics. All of our research areas are well-served by a basic understanding of computation, logic, and probability theory. Below are some resources to get you started.

You don’t need to read the books in this section in the order listed. Pick up whichever one seems interesting, and feel free to jump back and forth between the research areas and the basics.

Set Theory

Most modern mathematics is formalized in set theory, and the textbooks and papers listed here are no exception. This makes set theory a good place to start.



chapters 1-18

Computability and Logic

Computability theory (and the limits imposed by diagonalization) is fundamental to understanding what machines can and cannot do.



chapters 1-4

Probability Theory

Probability theory is crucial to understanding rational agency. Familiarity with reasoning under uncertainty is essential in all of our active research areas.



chapters 1-5

Probabilistic Inference

This book will help flesh out an understanding of how inference can be done using probabilistic world-models.


Statistics

Fluency with statistical modeling will be helpful for our “Alignment for Advanced Machine Learning” research agenda. Some prior familiarity with probabilistic reasoning is a good idea here.


Machine Learning

To build practical familiarity with machine learning, we highly recommend Andrew Ng’s Coursera course (lecture notes here). For a more theoretical introduction to ML, try Understanding Machine Learning.


Artificial Intelligence

Though much of our work is theoretical in character, knowledge of the modern field of artificial intelligence is important to put this work in context.

It is also important to understand the concept of VNM rationality, which I recommend learning from the Wikipedia article, but which can also be picked up from the original book. Von Neumann and Morgenstern showed that any agent obeying a few simple consistency axioms acts with preferences characterizable by a utility function. While some expect that we may ultimately need to abandon VNM rationality in order to construct reliable intelligent agents, the VNM framework remains the most expressive framework we have for characterizing the behavior of arbitrarily powerful agents. (For example, see the orthogonality thesis and the instrumental convergence thesis from Bostrom’s “Superintelligence.”) The concept of VNM rationality is used in all of our active research areas.
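As a concrete handle on the VNM picture, here is a minimal sketch; the outcomes and numbers are purely illustrative assumptions. A VNM-rational agent ranks lotteries by the expected value of a single utility function.

# Toy illustration of VNM rationality: preferences over lotteries are
# represented by expected utility. All names and values are illustrative.
utility = {"apple": 1.0, "banana": 0.4, "nothing": 0.0}

def expected_utility(lottery):
    # lottery: dict mapping outcomes to probabilities that sum to 1
    return sum(p * utility[outcome] for outcome, p in lottery.items())

safe = {"banana": 1.0}
risky = {"apple": 0.5, "nothing": 0.5}
print(expected_utility(risky) > expected_utility(safe))  # True: 0.5 > 0.4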



Realistic World-Models

Formalizing beneficial goals does you no good if your smarter-than-human system is unreliable. There are aspects of good reasoning that we don’t yet understand, even in principle. It is likely possible to gain insight by building practical systems that use algorithms which seem to work, even if the reasons why they work are not yet well-understood: often, theoretical understanding follows in the wake of practical application. However, we consider this approach imprudent when designing systems that have the potential to become superintelligent: we will be safer if we have a theory of general intelligence on hand before attempting to create practical superintelligent systems.

For this reason, many of our active research topics focus on parts of general intelligence that we do not yet understand how to solve, even in principle. For example, consider the following problem:

I have a computer program, which I call a “universe.” One function in the universe is left undefined. Your job is to provide me with a computer program of the appropriate type that completes my universe program. I will then run my universe program. My goal is to score your agent according to how well it learns the original universe program.

How could I do this? Solomonoff’s theory of inductive inference sheds some light on a theoretical solution: it describes a method for making ideal predictions from observations, but only in the case where the predictor lives outside the environment. Solomonoff induction has led to many useful tools for thinking about inductive inference (including Kolmogorov complexity, the universal prior, and AIXI), but the problem becomes decidedly more difficult in the case where the agent is a subprocess of the universe, computed by the universe.

In the case where the agent is embedded inside the environment, the induction problem gets murky: what counts as “learning the universe program”? Against what distribution over environments should the agent be scored? What constitutes ideal induction in the case where the boundary between “agent” and “environment” becomes blurry? These are questions of “naturalized induction.”
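To make the setup concrete, here is a toy sketch of the game; every name and detail below is an illustrative assumption of ours, not a formalism from the papers listed next. The point to notice is that the agent is computed by the very universe it is supposed to learn.

# Toy rendering of the "complete my universe program" game (illustrative).
def universe(agent_step):
    # A fully deterministic universe program with one hole: the agent.
    state = 1
    history = []
    for _ in range(1000):
        observation = state % 17          # the agent sees only part of the state
        action = agent_step(observation)  # the agent runs as a subprocess
        state = (3 * state + action) % 10**9
        history.append(state)
    return history

def agent_step(observation):
    # Your job: fill this hole so the agent "learns the universe program."
    # What counts as learning it, and how to score this, is the open problem.
    return observation % 2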

  1. Soares’ “Formalizing two problems of realistic world-models” further motivates problems of naturalized induction as relevant to the construction of a theory of general intelligence.

  2. Altair’s “An intuitive explanation of Solomonoff induction” explains Solomonoff’s theory of inductive inference, which is important background knowledge when it comes to understanding open problems of naturalized induction.

  3. Bensinger’s “Naturalized induction” (a series) explores the problems of naturalized induction in more detail.

Solving naturalized induction will require a better understanding of realistic world-models: What is the set of “possible realities”? What prior over environments would an ideal agent use? Answers to these questions must not only allow for good reasoning, but must also allow for the specification of human goals in terms of those world-models.

For example, in Solomonoff induction (and in Hutter’s AIXI), Turing machines are used to model the environment. Pretend that the only thing we value is diamonds (carbon atoms covalently bound to four other carbon atoms). Now say I hand you a Turing machine. Can you tell me how many diamonds are inside it?

In order to design an agent that pursues goals specified in terms of its world models, the agent must have some way of identifying the ontology of our goals (carbon atoms) inside its world models (Turing machines). This “ontology identification” problem is discussed in “Formalizing Two Problems of Realistic World Models” (linked above), and was first introduced by De Blanc:

  1. de Blanc’s “Ontological crises in artificial agents’ value systems” asks how an agent’s goals can be made robust to changes in ontology. If the agent starts with an atomic model of physics (in which carbon atoms are ontologically basic), this may not be hard. But what happens when the agent constructs a nuclear model of physics (in which atoms are built from neutrons and protons)? If the “carbon recognizer” was hard-coded, then the agent may fail to identify any carbon in this new world-model, and could start acting strangely (searching for hidden “true carbon”). How do you design an agent that can successfully identify “atoms with six protons” as “carbon atoms,” in the face of this ontological crisis?


Legg and Hutter’s “Universal intelligence: A definition of machine intelligence” describes AIXI, a universally intelligent agent in settings where the agent is separate from the environment, and a “scoring metric” used to rate the intelligence of various agent programs in this setting. Hutter’s AIXI and Legg’s scoring metric are very similar in spirit to what we are looking for in response to problems of naturalized induction and ontology identification. The two differences are that AIXI lives in a universe where agent and environment are separated whereas naturalized induction requires a solution where the agent is embedded within the environment, and AIXI maximizes rewards specified in terms of observations whereas we desire a solution that optimizes rewards specified in terms of the outside world.

You can learn more about AIXI in Hutter’s book Universal Artificial Intelligence, though reading Legg’s paper (linked above) is probably sufficient for our purposes.


Decision Theory

Say I give you the following: (1) a computer program describing a universe; (2) a computer program describing an agent; (3) a set of actions available to the agent; (4) a set of preferences specified over the history of states that the universe has been in. I task you with identifying the best action available to the agent, with respect to those preferences. For example, your inputs might be:

outcomes = {Lo, Med, Hi}

def Universe():
    territory = {One: Lo, Two: Med, Three: Hi}
    return territory[Agent()]

def Agent():
    worldmodel = {Lo: One, Hi: Two, Med: Three}
    return worldmodel[Hi]

actions = {One, Two, Three}

Hi > Med > Lo

(Notice how the agent is embedded within the environment.) This is another question that we don’t know how to answer. It may seem easy: just iterate over each action, figure out which outcome the agent would obtain if it took that action, and then select the action leading to the best outcome. But in fact, in this thought experiment, the agent is a deterministic subprocess of a deterministic computer program: there is exactly one action that the agent is going to output, and asking what “would happen” if a deterministic part of a deterministic program did something that it doesn’t do is ill-defined.

In order to evaluate what “would happen” if the agent took a different action, a “counterfactual environment” (where the agent does something that it doesn’t) must be constructed. Satisfactory theories of counterfactual reasoning do not yet exist. We don’t yet understand how to identify the best action available to an agent embedded within its environment, even in theory, even given full knowledge of the universe and our preferences and given unlimited computing power.
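Here is a sketch of that naive evaluator, with the unsolved step left explicit; the names are illustrative assumptions, not an actual formalism.

# Naive "iterate over actions" evaluator (illustrative names throughout).
preference_rank = {"Lo": 0, "Med": 1, "Hi": 2}

def counterfactual(universe, agent, action):
    # The hard part: construct an environment in which the deterministic
    # agent "had output" `action`. No satisfactory theory of this exists.
    raise NotImplementedError("satisfactory counterfactuals are an open problem")

def naive_best_action(universe, agent, actions):
    best_action, best_outcome = None, None
    for action in actions:
        outcome = counterfactual(universe, agent, action)  # ill-defined step
        if best_outcome is None or preference_rank[outcome] > preference_rank[best_outcome]:
            best_action, best_outcome = action, outcome
    return best_action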

Solving this problem will require a better understanding of counterfactual reasoning; this is the domain of decision theory.

Decision Theory

Peterson’s textbook explains the field of normative decision theory in broad strokes. For a quicker survey, with a stronger focus on Newcomblike problems, see Muehlhauser’s “Decision Theory FAQ.”


Game Theory

Many open problems in decision theory involve multi-agent settings. I have heard good things about Tadelis’ textbook, but have not read it myself. You also may have luck with Scott Alexander’s “Introduction to game theory” on LessWrong.



chapters 1-5
(+6-9 if enthusiastic)

Provability Logic

Toy models of multi-agent settings can be studied in an environment where agents base their actions on the things that they can prove about other agents in the same environment. Our current toy models make heavy use of provability logic.

Existing methods of counterfactual reasoning turn out to be unsatisfactory both in the short term (in the sense that they systematically achieve poor outcomes on some problems where good outcomes are possible) and in the long term (in the sense that self-modifying agents reasoning using bad counterfactuals would, according to those broken counterfactuals, decide that they should not fix all of their flaws). My talk “Why ain’t you rich?” briefly touches upon both these points. To learn more, I suggest the following resources:

  1. Soares and Fallenstein’s “Toward idealized decision theory” serves as a general overview, and further motivates problems of decision theory as relevant to MIRI’s research program. The paper discusses the shortcomings of two modern decision theories, and discusses a few new insights in decision theory that point toward new methods for performing counterfactual reasoning.

If “Toward idealized decision theory” moves too quickly, this series of blog posts may be a better place to start:

  1. Yudkowsky’s “The true Prisoner’s Dilemma” explains why cooperation isn’t automatically the ‘right’ or ‘good’ option.

  2. Soares’ “Causal decision theory is unsatisfactory” uses the Prisoner’s Dilemma to illustrate the importance of non-causal connections between decision algorithms.

  3. Yudkowsky’s “Newcomb’s problem and regret of rationality” argues for focusing on decision theories that ‘win,’ not just on ones that seem intuitively reasonable. Soares’ “Introduction to Newcomblike problems” covers similar ground.

  4. Soares’ “Newcomblike problems are the norm” notes that human agents probabilistically model one another’s decision criteria on a routine basis.

MIRI’s research has led to the development of “updateless decision theory” (UDT), a new decision theory designed to address many of the shortcomings discussed above.

  1. Hintze’s “Problem class dominance in predictive dilemmas” summarizes UDT’s dominance over other known decision theories, including Timeless Decision Theory (TDT), another theory that dominates CDT and EDT.

  2. Fallenstein’s “A model of UDT with a concrete prior over logical statements” provides a probabilistic formalization.

However, UDT is by no means a solution, and has a number of shortcomings of its own, discussed in the following places:

  1. Slepnev’s “An example of self-fulfilling spurious proofs in UDT” explains how UDT can achieve sub-optimal results due to spurious proofs.

  2. Benson-Tilsen’s “UDT with known search order” describes an unsatisfactory solution: it contains a formalization of UDT with a known proof search order, and demonstrates the use of a technique called “playing chicken with the universe” to avoid spurious proofs.

To study multi-agent settings, Patrick LaVictoire has developed the modal agents framework, which also lets us use provability logic to make some new advances in the field of decision theory:

  1. Barasz et al.’s “Robust cooperation in the Prisoner’s Dilemma” allows us to consider agents which decide whether or not to cooperate with each other based only upon what they can prove about each other’s behavior. This prevents infinite regress; in fact, the behavior of two agents which act only according to what they can prove about the behavior of the other can be determined in quadratic time using results from provability logic. (A toy illustration follows below.)
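Below is a crude toy illustration of the modal-agents idea; it is our own sketch, not the paper’s evaluation algorithm, and it is only valid for simple negation-free agents like the two shown. Each agent’s action is a function of what it can “prove” about its opponent, evaluated by iterating up from the deepest Kripke world, where provability claims hold vacuously.

def run_modal(agent_a, agent_b, levels=10):
    # At the deepest Kripke world, provability statements are vacuously true.
    a = b = True
    for _ in range(levels):
        a, b = agent_a(b), agent_b(a)
    return a, b  # True = cooperate, False = defect

fairbot = lambda you_cooperate: you_cooperate  # cooperate iff provably cooperates
defectbot = lambda you_cooperate: False        # always defect

print(run_modal(fairbot, fairbot))    # (True, True): Löbian mutual cooperation
print(run_modal(fairbot, defectbot))  # (False, False): FairBot isn't exploited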


UDT was developed by Wei Dai, Vladimir Slepnev, and others. Dai’s “Towards a new decision theory” introduces the idea, and Slepnev’s “A model of UDT with a halting oracle” provides an early formalization. Slepnev has also described a strange problem with UDT, in which agents appear to be rewarded for having less intelligence, in “Agent simulates predictor”.

These blog posts are of historical interest, but nearly all of their content is in “Toward idealized decision theory”, above.


Logical Uncertainty

Imagine a black box, with one input chute and two output chutes. A ball can be put into the input chute, and it will come out of one of the two output chutes. Inside the black box is a Rube Goldberg machine which takes the ball from the input chute to one of the output chutes.

A perfect probabilistic reasoner who doesn’t know which Rube Goldberg machine is in the box doesn’t know how the box will behave, but if they could figure out which machine is inside the box, then they would know which chute would take the ball. This reasoner isenvironmentally uncertain

A realistic reasoner might know which machine is in the box, and might know exactly how the machine works, but may lack the deductive capability to figure out where the machine will drop the ball. This reasoner islogically uncertain.

Probability theory assumes logical omniscience: it assumes that reasoners know all the consequences of the things they know. In reality, bounded reasoners are not logically omniscient: we can know exactly how the box is built and how the machine works, and simply lack the time to deduce where the ball comes out. We reason under logical uncertainty.
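Here is a toy sketch of the distinction, with arbitrary assumed numbers: the reasoner below has the machine’s complete source code, so any remaining uncertainty is logical rather than environmental.

import hashlib

def machine(ball):
    # A fully known, deterministic "Rube Goldberg machine": chute 0 or 1.
    return int(hashlib.sha256(str(ball).encode()).hexdigest(), 16) % 2

def credence_ball_exits_chute_1(ball, compute_budget):
    COST_TO_SIMULATE = 10**6  # pretend running the machine costs this much
    if compute_budget >= COST_TO_SIMULATE:
        return float(machine(ball))  # enough budget: deduce the answer exactly
    return 0.5  # logically uncertain: fall back on a symmetric guess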

A formal theory of reasoning under logical uncertainty does not yet exist. Gaining this understanding is extremely important when it comes to constructing a highly reliable generally intelligent system: whenever an agent reasons about the behavior of complex systems, computer programs, or other agents, it must operate under at least a little logical uncertainty.

Understanding the state of the art requires a solid understanding of probability theory. Consider adding chapters 1, 5, 6, and 9 of Jaynes (above) to the earlier chapters, and then study the following papers:

  1. Soares and Fallenstein’s “Questions of reasoning under logical uncertainty” provides a general introduction, explaining the field of logical uncertainty and motivating its relevance to MIRI’s research program.

  2. Gaifman’s “Concerning measures in first-order calculi” looked at this problem many years ago. Gaifman has largely focused on a relevant subproblem, which is the assignment of probabilities to different models of a formal system (assuming that once the model is known, all consequences of that model are known). We are now attempting to expand this approach to a more complete notion of logical uncertainty (where a reasoner can know what the model is but not know the implications of that model), but work by Gaifman is still useful to gain a historical context and an understanding of the difficulties surrounding logical uncertainty.

  3. Hutter et al.’s “Probabilities on sentences in an expressive logic” largely looks at the problem of logical uncertainty assuming access to infinite computing power (and many levels of halting oracles). Understanding Hutter’s approach (and what can be done with infinite computing power) helps flesh out our understanding of where the difficult questions lie.

  4. Demski’s “Logical prior probability” provides a computably approximable logical prior. Following Demski, our work largely focuses on the creation of an approximable prior probability distribution over logical sentences, as the act of refining and approximating a logical prior is very similar to the act of reasoning under logical uncertainty in general.

  5. Christiano’s “Non-omniscience, probabilistic inference, and metamathematics” largely follows this approach. This paper provides some early practical considerations about the generation of logical priors, and highlights a few open problems.



Vingean Reflection

Much of what makes the AI problem unique is that a sufficiently advanced system will be able to do higher-quality science and engineering than its human programmers. Many of the possible hazards and benefits of an advanced system stem from its potential to bootstrap itself to higher levels of capability, possibly leading to an intelligence explosion.

If an agent attains superintelligence through recursive self-improvement, then the impact of the resulting system depends entirely upon the ability of the initial system to reason reliably about agents that are more intelligent than itself. What sort of reasoning methods could a system use in order to justify extremely high confidence in the behavior of a yet more intelligent system? We refer to this sort of reasoning as “Vingean reflection”, after Vernor Vinge (1993), who noted that it is not possible in general to precisely predict the behavior of agents which are more intelligent than the reasoner.

A reasoner performing Vingean reflection must necessarily reasonabstractlyabout the more intelligent agent. This will almost certainly require some form of high-confidence logically uncertain reasoning, but in lieu of a working theory of logical uncertainty, reasoning about proofs (using formal logic) is the best available formalism for studying abstract reasoning. As such, a modern study of Vingean reflection requires a background in formal logic:

First-Order Logic

MIRI’s existing toy models for the study of self-modifying agents are largely based on first-order logic. Understanding its nuances is essential to working with the tools we have developed for studying formal systems that can gain confidence in similar systems.

We study Vingean reflection by constructing toy models of agents that can gain some form of confidence in highly similar systems. To get to the cutting edge, read the following papers:

  1. Fallenstein and Soares’ “Vingean reflection: Reliable reasoning for self-improving agents” introduces the field of Vingean reflection, and motivates its connection to MIRI’s research program.

  2. Yudkowsky’s “The procrastination paradox” goes into more detail on the need for satisfactory solutions to walk a fine line between the Löbian obstacle (a problem stemming from too little “self-trust”) and the unsoundness that comes from too much self-trust. (Löb’s theorem, stated just after this list, is the source of the obstacle.)

  3. Christiano et al.’s “Definability of truth in probabilistic logic” describes an early attempt to create a formal system that can reason about itself while avoiding paradoxes of self-reference. It succeeds, but has ultimately been shown to be unsound. My walkthrough for this paper may help put it into a bit more context.

  4. Fallenstein and Soares’ “Problems of self-reference in self-improving space-time embedded intelligence” describes our simple suggester-verifier model for studying agents that produce slightly improved versions of themselves, or ’tile’ themselves. The paper demonstrates a toy scenario in which sound agents can successfully tile to (e.g., gain high confidence in) other similar agents.
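For reference, the Löbian obstacle mentioned in item 2 turns on Löb’s theorem: for any sentence P, if a consistent theory T extending Peano Arithmetic proves “if P is provable in T, then P”, then T proves P itself. Internalized, with □P standing for “P is provable in T”:

\Box(\Box P \rightarrow P) \rightarrow \Box P

So a sound agent cannot grant blanket trust to the proofs of a successor that reasons in the same (or a stronger) formal system: proving □P → P for every P would, by Löb’s theorem, mean proving every P outright.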


Yudkowsky and Herreshoff’s “Tiling agents for self-modifying AI” is an older, choppier introduction to Vingean reflection; it may be easier to work through with the help of my walkthrough.

If you’re excited about this research topic, there are a number of other relevant tech reports. Unfortunately, most of them don’t explain their motivations well, and have not yet been put into their greater context.

Fallenstein’s “Procrastination in probabilistic logic” illustrates how Christiano et al.’s probabilistic reasoning system is unsound and vulnerable to the procrastination paradox. Yudkowsky’s “Distributions allowing tiling…” takes some early steps towards probabilistic tiling settings.

Fallenstein’s “Decreasing mathematical strength…” describes an unsatisfying property of parametric polymorphism, a partial solution to the Löbian obstacle. Soares’ “Fallenstein’s monster” describes a hackish formal system which avoids the above problem. It also showcases a mechanism for restricting an agent’s goal predicate which can also be used by parametric polymorphism to create a less restrictive version of PP than the one explored in the tiling agents paper. Fallenstein’s “An infinitely descending sequence of sound theories…” describes a more elegant partial solution to the Löbian obstacle, and is one of our favorite partial solutions.

An understanding of recursive ordinals provides useful background for these results, and can be gained by reading Franzén’s “Transfinite progressions: A second look at completeness.”


Corrigibility

As artificially intelligent systems grow in intelligence and capability, some of their available options may allow them to resist intervention by their programmers. We call an AI system “corrigible” if it cooperates with what its creators regard as a corrective intervention, despite default incentives for rational agents to resist attempts to shut them down or modify their preferences.

This research area is largely brand-new, so getting up to speed is as simple as reading a paper or two:

  1. Soares et al.’s “Corrigibility” introduces the field at large, along with a few open problems.

  2. Armstrong’s “Proper value learning through indifference” discusses one potential approach for making agents indifferent between which utility function they maximize, which is a small step towards agents that allow themselves to be modified.

Our current work on corrigibility focuses primarily on a small subproblem known as the “shutdown problem”: how do you construct an agent that shuts down upon the press of a shutdown button, and which does not have incentives to cause or prevent the pressing of the button? Within that subproblem, we currently focus on the utility indifference problem: how could you construct an agent which allows you to switch which utility function it maximizes, without giving it incentives to affect whether the switch occurs? Even if we had a satisfactory solution to the utility indifference problem, this would not yield a satisfactory solution to the shutdown problem, as it still seems difficult to adequately specify “shutdown behavior” in a manner that is immune to perverse instantiation. Stuart Armstrong has written several blog posts about the specification of “reduced impact” AGIs:

  1. Domesticating reduced impact AIs
  2. Reduced impact AI: no back channels

These first attempts are still incomplete solutions, but they should bring you up to speed on our current understanding of the problem.
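As a concrete handle on the utility indifference idea discussed above, here is a minimal toy sketch; the decomposition and names are illustrative assumptions of ours, not the formalism from the papers.

# Toy sketch of utility indifference: the agent maximizes u_normal until
# the shutdown button is pressed and u_shutdown afterwards, plus a constant
# chosen so that, in expectation, pressing the button neither gains nor
# loses utility, removing incentives to cause or prevent the press.

def correction(expected_u_normal_if_unpressed, expected_u_shutdown_if_pressed):
    return expected_u_normal_if_unpressed - expected_u_shutdown_if_pressed

def combined_utility(outcome, button_pressed, u_normal, u_shutdown, c):
    if button_pressed:
        return u_shutdown(outcome) + c  # c makes the press a matter of indifference
    return u_normal(outcome)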


Early work in corrigibility can be found on the web forum Less Wrong. Most of the relevant results are captured in the papers above. One of the more interesting ones is “Cake or Death”, an example of the “motivated value selection” problem. In this example, an agent with uncertainty about its utility function benefits from avoiding information that reduces its uncertainty.

Armstrong’s “The mathematics of reduced impact: help needed” lays out initial ideas for specifying reduced impact, and his “Reduced impact in practice: randomly sampling the future” sketches a simple method for assessing whether a future has been impacted.

Armstrong’s “Utility indifference” outlines the original utility indifference idea, and is largely of historical interest. It is subsumed by the “Proper value learning through indifference” post linked above.


Value Learning

Since our own understanding of our values is fuzzy and incomplete, perhaps the most promising approach for loading values into a powerful AI is to specify a criterion for the agent tolearnour values incrementally. But this presents a number of interesting problems:

Say you construct a training set containing many outcomes filled with happy humans (labeled “good”) and other outcomes filled with sad humans (labeled “bad”). The simplest generalization from this data might be that humans really like human-shaped smiling things: the agent might then try to build many tiny animated happy-looking human figurines.

Value learning must be an online process: the system must be able to identify ambiguities and raise questions to the user about those ambiguities. It must not only identify cases it doesn’t know how to classify (e.g., cases where it can’t tell whether a face looks happy or sad), but also identify dimensions along which the training data gives no information (e.g., when your training data never shows outcomes filled with human-shaped automatons that look happy but are labeled worthless).

Of course, ambiguity identification alone isn’t enough: you don’t want a system that spends the first three weeks asking for clarification on whether humans are still worthwhile when they are at different elevations, or when the wind is blowing, before finally (after the operators have stopped paying attention) asking whether it’s important that the human-shaped things be acting of their own will.
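Here is a minimal active-learning sketch of ambiguity identification, under toy assumptions of our own; it captures only the easy half of the problem (querying where the current model is least confident) and deliberately leaves out the hard half.

def most_ambiguous(outcomes, predict_good_probability, budget=3):
    # Ask the operator about outcomes whose predicted "good" probability
    # is nearest 0.5, i.e., where the current model is least sure.
    return sorted(outcomes, key=lambda o: abs(predict_good_probability(o) - 0.5))[:budget]

# The genuinely hard (and unsolved) part is what this sketch ignores:
# noticing whole dimensions the training data never varied (happy humans
# vs. happy human-shaped automata) and prioritizing questions that matter.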

In order for the agent to reliably learn our intentions, the agent must be constructing and refining a model of its operator and using that model to inform its queries and alter its preferences. To learn more about these problems and others, see the following:

  1. Soares’ “The value learning problem” provides a general overview of some open problems related to value learning.

  2. Dewey’s “Learning what to value” further discusses the difficulty of value learning.

  3. orthogonality thesis认为默认情况下不会解决价值学习。

  4. MacAskill’s “Normative Uncertainty” provides a framework for discussing normative uncertainty. Be warned, the full work, while containing many insights, is very long. You can get away with skimming parts and/or skipping around some, especially if you’re more excited about other areas of active research.


One approach to normative uncertainty is Bostrom and Ord’s “parliamentary model”, which suggests that value learning is in some ways equivalent to a voter-aggregation problem, and that many value learning systems can be modeled as parliamentary voting systems (where the voters might be utility functions).

Owen Cotton-Barratt’s “Geometric reasons for normalising…” discusses the normalization of utility functions; this is relevant to toy models of reasoning under moral uncertainty.

Fallenstein & Stiennon’s “Loudness” discusses a concern with aggregating utility functions stemming from the fact that the preferences encoded by utility functions are preserved under positive affine transformation (e.g., as the utility function is scaled or shifted). This implies that special care is required in order to normalize the set of possible functions.
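As a small illustration of the normalization issue, here is a sketch of one common convention (range normalization), under the assumption of finitely many, not-all-equally-valued outcomes; since a·U + b (with a > 0) encodes the same preferences as U, any aggregation scheme must first pick a representative from each equivalence class.

def normalize(utility):
    # Rescale so the worst outcome has utility 0 and the best has 1.
    # Assumes at least two outcomes with different utilities.
    lo, hi = min(utility.values()), max(utility.values())
    return {o: (u - lo) / (hi - lo) for o, u in utility.items()}

print(normalize({"Lo": -2.0, "Med": 0.0, "Hi": 6.0}))
# {'Lo': 0.0, 'Med': 0.25, 'Hi': 1.0}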


Other Tools

Mastery of almost any topic can be a very powerful tool, especially in mathematics, where seemingly disjoint topics are often deeply connected. Many fields of mathematics have the property that if you know them very well, that understanding is useful no matter where you go. With that in mind, while the topics below are not necessary for understanding MIRI’s active research, an understanding of each of them constitutes an additional tool in the mathematical toolbox that will often prove quite useful when doing new research.

Discrete Math

Textbook available online. Most mathematical research concerns either continuous or discrete structures. Many people find discrete mathematics more intuitive, and a solid understanding of discrete mathematics will help you quickly pick up the discrete versions of many other mathematical tools, such as group theory, topology, and information theory.


Linear Algebra

Linear algebra is one of those tools that shows up almost everywhere in mathematics. A solid understanding of linear algebra will be helpful in many domains.


Type Theory

Set theory commonly serves as the foundation for modern mathematics, but it’s not the only available candidate. Type theory can also serve as a foundation for mathematics, and in many cases, type theory is a better fit for the problems at hand. Type theory also bridges much of the theoretical gap between computer programs and mathematical proofs, and is therefore often relevant to certain types of AI research.


Category Theory

Category theory studies many mathematical structures at a very high level of abstraction. This can help you notice patterns across different branches of mathematics, and makes it much easier to transfer your mathematical tools from one domain to another.


Topology

Topology is another one of those subjects that shows up pretty much everywhere in mathematics. A solid understanding of topology turns out to be helpful in many unexpected places.


Computability and Complexity

MIRI’s math research is working towards solutions that will eventually be relevant to computer programs. A good intuition for what computers are capable of is often essential.


Program Verification

Program verification techniques allow programmers to become confident that a specific program will actually act according to some specification. (It is, of course, still difficult to validate that the specification describes the intended behavior.) While MIRI’s work is not currently concerned with verifying real-world programs, it is quite useful to understand what modern program verification techniques can and cannot do.

Understanding the Mission

Why pursue this sort of research in the first place?

Superintelligence

This guide largely assumes that you’re already on board with MIRI’s mission. If you’re wondering why so many people consider this an important and urgent area of research, Superintelligence provides a nice overview.


Rationality: From AI to Zombies

This ebook compiles six volumes of essays explaining much of the philosophy and cognitive science behind MIRI’s view of AI.


Inadequate Equilibria

A discussion of microeconomics and epistemology as they bear on spotting societal missteps and blind spots, including neglected research opportunities. An attempt to answer the basic question, “When can ambitious projects to achieve unusual goals hope to succeed?”