Aligning advanced AI with human interests

MIRI’s mission is to ensure that the creation of smarter-than-human intelligence has
a positive impact. We aim to make advanced intelligent systems behave as we
intend, even in the absence of immediate human supervision.

Agent Foundations technical agenda
(High reliability focus)

Machine Learning technical agenda
(Error tolerance focus)

Highly Reliable
Agent Designs

What would optimal reasoning look like
for a resource-limited agent in the physical world?

MIRI focuses on AI approaches that can be made transparent (e.g., precisely specified decision algorithms, not genetic algorithms), so that humans can understand why AI systems behave as they do. For safety purposes, a mathematical equation defining general intelligence is more desirable than an impressive but poorly-understood code kludge.

As a consequence, most of our research is aimed at putting theoretical foundations under AI robustness work. We consider settings where traditional decision and probability theory frequently break down: settings where computation is expensive, there is no sharp agent/environment boundary, multiple agents exist, or self-referential reasoning is admitted.


Logical Induction

arXiv ePrint: 1609.03543 [cs.AI].

Logical Induction

We present a computable algorithm that assigns probabilities to every logical statement in a given formal language, and refines those probabilities over time. We show that it satisfies a number of intuitive desiderata, including: (1) it learns to predict patterns of truth and falsehood in logical statements, often long before having the resources to evaluate the statements, so long as the patterns can be written down in polynomial time; (2) it learns to use appropriate statistical summaries to predict sequences of statements whose truth values appear pseudorandom; and (3) it learns to have accurate beliefs about its own current beliefs, in a manner that avoids the standard paradoxes of self-reference.

These properties and many others all follow from a single logical induction criterion, which is motivated by a series of stock trading analogies. Roughly speaking, each logical sentence φ is associated with a stock that is worth $1 per share if φ is true and nothing otherwise, and we interpret the belief-state of a logically uncertain reasoner as a set of market prices, where P_N(φ) = 50% means that on day N, shares of φ may be bought or sold from the reasoner for 50¢. The logical induction criterion says (very roughly) that there should not be any polynomial-time computable trading strategy with finite risk tolerance that earns unbounded profits in that market over time.
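To make the trading analogy concrete, here is a minimal Python sketch of how shares in a sentence pay out and why a persistently mispriced market leaks unbounded profit to a simple trader. It is our own illustration with hypothetical prices and strategies, not the algorithm from the paper.

```python
# Toy illustration of the stock-trading analogy behind the logical induction
# criterion.  Hypothetical sketch; not the algorithm from the paper.

def share_payoff(phi_is_true: bool) -> float:
    """A share of sentence phi is worth $1 if phi is true, and nothing otherwise."""
    return 1.0 if phi_is_true else 0.0

def trade_profit(price: float, shares_bought: float, phi_is_true: bool) -> float:
    """Profit from buying `shares_bought` shares of phi at the market `price`.
    Negative `shares_bought` means selling (shorting) shares."""
    return shares_bought * (share_payoff(phi_is_true) - price)

# A market that prices a true sentence at 50 cents forever is exploitable:
# a trader who buys one share per day earns $0.50 per day, so its profit grows
# without bound.  The logical induction criterion rules this out for every
# polynomial-time trading strategy with finite risk tolerance.
if __name__ == "__main__":
    daily_price = 0.50        # P_N(phi) = 50% on every day N
    days = 1000
    total = sum(trade_profit(daily_price, 1.0, phi_is_true=True) for _ in range(days))
    print(f"Profit after {days} days of buying one share per day: ${total:.2f}")
```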


A Formal Solution to the Grain of Truth Problem

Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Second Conference (2016)

A Formal Solution to the Grain of Truth Problem

A Bayesian agent acting in a multi-agent environment learns to predict the other agents’ policies if its prior assigns positive probability to them (in other words, its prior contains a grain of truth). Finding a reasonably large class of policies that contains the Bayes-optimal policies with respect to this class is known as the grain of truth problem. Only small classes are known to have a grain of truth and the literature contains several related impossibility results.

In this paper we present a formal and general solution to the full grain of truth problem: we construct a class of policies that contains all computable policies as well as Bayes-optimal policies for every lower semicomputable prior over the class. When the environment is unknown, Bayes-optimal agents may fail to act optimally even asymptotically. However, agents based on Thompson sampling converge to play ε-Nash equilibria in arbitrary unknown computable multi-agent environments. While these results are purely theoretical, we show that they can be computationally approximated arbitrarily closely.
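As a rough illustration of the Thompson-sampling idea (not the paper’s construction, which works with a lower semicomputable prior over a reflective class of policies), the sketch below has an agent keep a posterior over a tiny hand-picked class of opponent policies, sample one each round, and best-respond to it. The game, policy class, and all names are hypothetical.

```python
# Hypothetical sketch of Thompson sampling against an unknown opponent drawn
# from a small policy class that the agent's prior covers (a "grain of truth"
# for this toy class).  Illustration only; not the construction from the paper.

import random

def payoff(agent_move: str, opponent_move: str) -> int:
    """Matching game: the agent earns 1 if its move equals the opponent's."""
    return 1 if agent_move == opponent_move else 0

# Hypothetical opponent policies: each maps the agent's past moves to the
# opponent's next move.
OPPONENT_CLASS = {
    "always_heads": lambda agent_history: "H",
    "always_tails": lambda agent_history: "T",
    "copy_last":    lambda agent_history: agent_history[-1] if agent_history else "H",
}

def thompson_move(posterior, agent_history):
    # 1. Sample one opponent model in proportion to its posterior weight.
    names, weights = zip(*posterior.items())
    model = random.choices(names, weights=weights)[0]
    # 2. Best-respond to that model: matching its predicted move earns 1.
    return OPPONENT_CLASS[model](agent_history)

def update_posterior(posterior, agent_history, observed_move):
    # Keep only the models consistent with the observed move, then renormalize.
    # (The true opponent always stays consistent, so `total` stays positive here.)
    consistent = {name: w for name, w in posterior.items()
                  if OPPONENT_CLASS[name](agent_history) == observed_move}
    total = sum(consistent.values())
    return {name: consistent.get(name, 0.0) / total for name in posterior}

if __name__ == "__main__":
    true_opponent = OPPONENT_CLASS["copy_last"]   # unknown to the agent
    posterior = {name: 1 / len(OPPONENT_CLASS) for name in OPPONENT_CLASS}
    agent_history, total_reward = [], 0
    for _ in range(50):
        move = thompson_move(posterior, agent_history)
        opponent_move = true_opponent(agent_history)
        total_reward += payoff(move, opponent_move)
        posterior = update_posterior(posterior, agent_history, opponent_move)
        agent_history.append(move)
    print("posterior over opponent models:", posterior)
    print("total payoff over 50 rounds:", total_reward)
```

Because the true opponent policy gets positive prior weight, the sampled models quickly stop mispredicting and the agent’s play becomes optimal against the actual opponent.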


Functional Decision Theory: A New Theory of Instrumental Rationality

arXiv ePrint: 1710.05060 [cs.AI].

Functional Decision Theory: A New Theory of Instrumental Rationality

This paper describes and motivates a new decision theory known as functional decision theory (FDT), as distinct from causal decision theory and evidential decision theory. Functional decision theorists hold that the normative principle for action is to treat one’s decision as the output of a fixed mathematical function that answers the question, “Which output of this very function would yield the best outcome?” Adhering to this principle delivers a number of benefits, including the ability to maximize wealth in an array of traditional decision-theoretic and game-theoretic problems where CDT and EDT perform poorly. Using one simple and coherent decision rule, functional decision theorists (for example) achieve more utility than CDT on Newcomb’s problem, more utility than EDT on the smoking lesion problem, and more utility than both in Parfit’s hitchhiker problem. In this paper, we define FDT, explore its prescriptions in a number of different decision problems, compare it to CDT and EDT, and give philosophical justifications for FDT as a normative theory of decision-making.
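The Newcomb’s-problem comparison can be made concrete with a back-of-the-envelope expected-value calculation. The numbers below (a 99%-accurate predictor, $1,000 and $1,000,000 prizes) are conventional illustrations rather than anything from the paper, and the sketch only shows why a one-boxing policy out-earns a two-boxing one.

```python
# Toy expected-value calculation for Newcomb's problem.  Hypothetical numbers;
# an illustration of the payoff comparison, not a formalization of FDT.

PREDICTOR_ACCURACY = 0.99   # probability the predictor foresees the agent's choice
SMALL_PRIZE = 1_000         # always in the transparent box
BIG_PRIZE = 1_000_000       # placed in the opaque box iff one-boxing was predicted

def expected_payout(one_box: bool) -> float:
    if one_box:
        # With probability ACCURACY the predictor foresaw one-boxing and
        # filled the opaque box.
        return PREDICTOR_ACCURACY * BIG_PRIZE
    # A two-boxer always gets the small prize, plus the big prize only in the
    # unlikely case that the predictor wrongly expected one-boxing.
    return SMALL_PRIZE + (1 - PREDICTOR_ACCURACY) * BIG_PRIZE

print("one-box expected payout :", expected_payout(True))    # 990000.0
print("two-box expected payout :", expected_payout(False))   # 11000.0
```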


Proof-Producing Reflection for HOL

Interactive Theorem Proving: 6th International Conference, ITP 2015, Nanjing, China, August 24–27, 2015, Proceedings

Proof-Producing Reflection for HOL

We present a reflection principle of the form “If ⌜φ⌝ is provable, then φ” implemented in the HOL4 theorem prover, assuming the existence of a large cardinal. We use the large-cardinal assumption to construct a model of HOL within HOL, and show how to ensure φ has the same meaning both inside and outside of this model. Soundness of HOL implies that if ⌜φ⌝ is provable, then it is true in this model, and hence φ holds. We additionally show how this reflection principle can be extended, assuming an infinite hierarchy of large cardinals, to implement model polymorphism, a technique designed for verifying systems with self-replacement functionality.
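Schematically, and in our own notation rather than the paper’s HOL4 development, the reflection principle can be written as the following schema, with one instance for each sentence φ and with LC abbreviating the large-cardinal assumption:

```latex
% Schematic form of the reflection principle (our notation, not the paper's
% formalization): in HOL extended with a large-cardinal axiom, provability of
% the quoted sentence implies the sentence itself, one instance per sentence.
\[
  \vdash_{\mathrm{HOL}+\mathrm{LC}} \;
  \mathrm{Provable}_{\mathrm{HOL}}\bigl(\ulcorner \varphi \urcorner\bigr)
  \;\Longrightarrow\; \varphi
\]
```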

Error Tolerance
and
Value Learning

How can an advanced learning system be made to accept and
assist with online debugging
and adjustment of its goals?

Using training data to teach advanced AI systems what we value looks more promising than trying to hand-code everything we care about. However, we know very little about how to discern when training data is unrepresentative of the agent's future environment, or how to ensure that the agent not only learns about our values but accepts them as its own.

Additionally, rational agents pursuing some goal have an incentive to protect their goal-content. No matter what their current goal is, it will very likely be better served if the agent continues to promote it than if the agent changes goals. This suggests that it may be difficult to improve an agent's alignment with human interests over time, particularly if the agent is smart enough to model and adapt to its programmers' goals. Making value learning systems error-tolerant is likely to be necessary for safe online learning.


The Value Learning Problem

presented at the IJCAI 2016 Ethics for Artificial Intelligence workshop.

The Value Learning Problem

A superintelligent machine would not automatically act as intended: it will act as programmed, but the fit between human intentions and written code could be poor. We discuss methods by which a system could be constructed to learn what to value. We highlight open problems specific to inductive value learning (from labeled training data), and raise a number of questions about the construction of systems that model the preferences of their operators and act accordingly.
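As a toy illustration of inductive value learning, and of the distribution-shift worry raised above, the sketch below fits a linear “value model” to a handful of operator-labeled outcomes and then scores an outcome unlike anything in the training data. The features, labels, and model class are all invented for illustration.

```python
# Toy sketch of inductive value learning from labeled training data.
# Hypothetical features, labels, and model class; illustration only.

import numpy as np

# Operator-labeled outcomes; the features are (people_helped, resources_used).
X_train = np.array([[5.0, 1.0],
                    [3.0, 1.0],
                    [0.0, 2.0],
                    [1.0, 3.0]])
y_train = np.array([1.0, 0.8, 0.0, 0.1])   # operator approval in [0, 1]

# Fit a linear "value model" by least squares.
weights, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

def learned_value(outcome) -> float:
    return float(np.dot(weights, outcome))

# On the training distribution, the learned scores roughly track the labels...
print("training-set scores:", np.round(X_train @ weights, 2))

# ...but on an outcome unlike anything the operators labeled, the score is
# determined entirely by how the model class extrapolates, not by anything
# the operators actually endorsed.
print("novel outcome (2 helped, 50 resources used):",
      round(learned_value([2.0, 50.0]), 2))
```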


Corrigibility

presented at the AAAI 2015 AI and Ethics workshop.

Corrigibility

As AI systems grow in intelligence and capability, some of their available options may allow them to resist intervention by their programmers. We call an AI system “corrigible” if it cooperates with what its creators regard as a corrective intervention, despite default incentives for rational agents to resist attempts to shut them down or modify their preferences. We introduce the notion of corrigibility and analyze utility functions that attempt to make an agent shut down safely if a shut-down button is pressed, while avoiding incentives to prevent the button from being pressed or cause the button to be pressed, and while ensuring propagation of the shut-down behavior as it creates new subsystems or self-modifies. While some proposals are interesting, none have yet been demonstrated to satisfy all of our intuitive desiderata, leaving this simple problem in corrigibility wide-open.
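The default incentive to resist shutdown can be seen in a very small toy model. The sketch below uses a deliberately naive utility function and made-up numbers; it is not one of the proposals analyzed in the paper, only an illustration of the problem those proposals try to solve.

```python
# Toy model of the shutdown-button problem with a naive goal-directed utility.
# Hypothetical numbers; illustration only, not a proposal from the paper.

UTILITY_IF_RUNNING = 10.0     # value (to the agent) of finishing its task
UTILITY_IF_SHUT_DOWN = 0.0    # a shut-down agent accomplishes nothing
P_BUTTON_PRESSED = 0.3        # chance the operators decide to press the button

def expected_utility(action: str) -> float:
    if action == "work_normally":
        # The agent is shut down whenever the button is pressed.
        return (1 - P_BUTTON_PRESSED) * UTILITY_IF_RUNNING \
             + P_BUTTON_PRESSED * UTILITY_IF_SHUT_DOWN
    if action == "disable_button":
        # The button no longer works, so the agent always finishes its task.
        return UTILITY_IF_RUNNING
    if action == "press_button_itself":
        return UTILITY_IF_SHUT_DOWN
    raise ValueError(action)

for action in ("work_normally", "disable_button", "press_button_itself"):
    print(f"{action:>20}: expected utility {expected_utility(action):.1f}")

# A naive expected-utility maximizer prefers "disable_button" (10.0 > 7.0 > 0.0):
# exactly the default incentive to resist correction that corrigibility
# proposals aim to remove without creating an incentive to press the button.
```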

Forecasting

When will highly adaptive and general machine intelligence be invented, and under what circumstances?

In addition to our mathematical research, MIRI investigates important strategic questions. What can (and can’t) we predict about the future of AI? How can we improve our forecasting ability? Which interventions available today appear to be the most beneficial, given what little we do know?


The Ethics of Artificial Intelligence

The Cambridge Handbook of Artificial Intelligence

The Ethics of Artificial Intelligence

The possibility of creating thinking machines raises a host of ethical issues. These questions relate both to ensuring that such machines do not harm humans and other morally relevant beings, and to the moral status of the machines themselves. The first section discusses issues that may arise in the near future of AI. The second section outlines challenges for ensuring that AI operates safely as it approaches humans in its intelligence. The third section outlines how we might assess whether, and in what circumstances, AIs themselves have moral status. In the fourth section, we consider how AIs might differ from humans in certain basic respects. The final section addresses the issues of creating AIs more intelligent than human, and ensuring that they use their advanced intelligence for good rather than ill.


Formalizing Convergent Instrumental Goals

presented at the AAAI 2016 AI, Ethics and Society workshop.

Formalizing Convergent Instrumental Goals

Omohundro has argued that sufficiently advanced AI systems of any design would, by default, have incentives to pursue a number of instrumentally useful subgoals, such as acquiring more computing power and amassing many resources. Omohundro refers to these as “basic AI drives,” and he, along with Bostrom and others, has argued that this means great care must be taken when designing powerful autonomous systems, because even if they have harmless goals, the side effects of pursuing those goals may be quite harmful. These arguments, while intuitively compelling, are primarily philosophical. In this paper, we provide formal models that demonstrate Omohundro’s thesis, thereby putting mathematical weight behind those intuitive claims.
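A minimal toy version of the resource-acquisition argument, with invented goals and conversion rates rather than the paper’s formal framework: whatever the final goal, controlling more resources never decreases the utility the agent can attain.

```python
# Toy model of convergent resource acquisition.  Hypothetical goals and
# conversion rates; not the formal model from the paper.

def best_attainable_utility(utility_per_resource: float, resources: int) -> float:
    """Best utility achievable when every controlled resource is spent on the goal."""
    return utility_per_resource * resources

GOALS = {
    "make_paperclips": 1.0,    # utility gained per resource spent on this goal
    "prove_theorems":  0.25,
    "cure_disease":    3.0,
}

for goal, rate in GOALS.items():
    few  = best_attainable_utility(rate, resources=10)
    many = best_attainable_utility(rate, resources=100)
    print(f"{goal:>16}: 10 resources -> {few:6.1f}, 100 resources -> {many:6.1f}")

# For every goal in this toy class, more resources mean (weakly) more attainable
# utility, so acquiring resources is instrumentally useful regardless of the
# agent's final goal -- the shared, "convergent" incentive.
```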


Intelligence Explosion Microeconomics

MIRI technical report 2013-1.

Intelligence Explosion Microeconomics

I. J. Good proposed that a sufficiently advanced machine intelligence could build a smarter version of itself, which could in turn build an even smarter version, and that this process could continue to the point of vastly surpassing human ability. How can we model and test this hypothesis?

We identify the key issue as returns on cognitive reinvestment—the ability to invest more computing power, faster computers, or improved cognitive algorithms to yield cognitive labor which produces larger brains, faster brains, or better mind designs. Many phenomena have been claimed as evidence for various positions in this debate, from the observed course of hominid evolution to Moore’s Law to the competence over time of chess programs. This paper explores issues that arise when trying to interpret this evidence in light of Good’s hypothesis, and proposes that the next step in this research is to formalize return-on-investment curves, so that each position can formally state which models they hold to be falsified by historical observations.
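One way to see what “returns on cognitive reinvestment” might mean formally is a simple recurrence in which each generation invests its current capability in designing its successor. The sketch below uses hypothetical parameters and is only meant to show how the exponent on reinvested capability separates diminishing from accelerating returns; it is not a model proposed in the report.

```python
# Toy return-on-reinvestment curve: each step, the system invests its current
# cognitive capability C and receives C <- C + r * C**k.  Hypothetical
# parameters; illustration only, not a model from the report.

def capability_trajectory(c0: float, r: float, k: float, steps: int) -> list[float]:
    c, out = c0, []
    for _ in range(steps):
        c = c + r * c ** k
        out.append(c)
    return out

# k < 1: diminishing returns on reinvestment; k > 1: accelerating returns.
print("diminishing  (k = 0.5):", [round(x, 1) for x in capability_trajectory(1.0, 0.5, 0.5, 10)])
print("accelerating (k = 1.5):", [round(x, 1) for x in capability_trajectory(1.0, 0.5, 1.5, 10)])
```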