- 4. Do researchers think AI is imminent?
- 5. What technical problems are you working on?
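Our mission statement is “to ensure that the creation of smarter-than-human artificial intelligence has a positive impact.” This is an ambitious goal, but we believe that some early progress is possible, and we believe that the goal’s importance and difficulty makes it prudent to begin work at an early date.
Our two research agendas, “Agent Foundations for Aligning Machine Intelligence with Human Interests” and “Value Alignment for Advanced Machine Learning Systems,” focus on three groups of technical problems:
- highly reliable agent design: learning how to specify highly autonomous systems that reliably pursue some fixed goal;
- value specification: supplying autonomous systems with the intended goals; and
- error tolerance: making such systems robust to programmer error.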
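If machines can achieve human equivalence in cognitive tasks, then it is very likely that they can eventually outperform humans. There is little reason to expect that biological evolution, with its lack of foresight and planning, would have hit upon the optimal algorithms for general intelligence (any more than it hit upon the optimal flying machine in birds). Beyond qualitative improvements in cognition, Nick Bostrom notes more straightforward advantages we could realize in digital minds, e.g.:
- editability: “It is easier to experiment with parameter variations in software than in neural wetware.”[2]
- speed: “The speed of light is more than a million times greater than that of neural transmission, synaptic spikes dissipate more than a million times more heat than is thermodynamically necessary, and current transistor frequencies are more than a million times faster than neuron spiking frequencies.”
- serial depth: On short timescales, machines can carry out much longer sequential processes.
- storage capacity: Computers can plausibly have greater working and long-term memory.
- size: Computers can be much larger than a human brain.
- duplicability: Copying software onto new hardware can be much faster and higher-fidelity than biological reproduction.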
Any one of these advantages could give an AI reasoner an edge over a human reasoner, or give a group of AI reasoners an edge over a human group. Their combination suggests that digital minds could surpass human minds more quickly and decisively than we might expect.
Present-day AI algorithms already demand special safety guarantees when they must act in important domains without human oversight, particularly when they or their environment can change over time:
Achieving these gains [from autonomous systems] will depend on development of entirely new methods for enabling “trust in autonomy” through verification and validation (V&V) of the near-infinite state systems that result from high levels of [adaptability] and autonomy. In effect, the number of possible input states that such systems can be presented with is so large that not only is it impossible to test all of them directly, it is not even feasible to test more than an insignificantly small fraction of them. Development of such systems is thus inherently unverifiable by today’s methods, and as a result their operation in all but comparatively trivial applications is uncertifiable.
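To give a rough sense of scale for the quoted claim, the back-of-the-envelope sketch below computes what share of a system's possible inputs exhaustive testing could cover. The input widths, the rate of one billion tests per second, and the one-year budget are illustrative assumptions, not figures from the report.

```python
from fractions import Fraction

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def fraction_tested(input_bits, tests_per_second, years):
    """Share of the 2**input_bits distinct inputs that brute-force testing covers."""
    total_states = 2 ** input_bits
    tests_run = tests_per_second * SECONDS_PER_YEAR * years
    # Cap at 1: once every state has been tested, extra tests add nothing.
    return min(Fraction(1), Fraction(tests_run, total_states))


# Assumed budget: 10**9 tests per second for one year (hypothetical numbers).
for bits in (32, 64, 128):
    share = fraction_tested(bits, tests_per_second=10**9, years=1)
    print(f"{bits:>3} input bits: {float(share):.3e} of states tested")
```

Under these assumptions a 32-bit input space can be covered completely, a 64-bit space only to a fraction of a percent, and a 128-bit space essentially not at all. Adaptive systems whose state also changes over time face a far larger space than any of these.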
The largest and most lasting changes in human welfare have come from scientific and technological innovation — which in turn comes from our intelligence. In the long run, then, much of AI’s significance comes from its potential to automate and enhance progress in science and technology. The creation of smarter-than-human AI brings with it the basic risks and benefits of intellectual progress itself, at digital speeds.
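The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem: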
- Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.
A system that is optimizing a function of n variables, where the objective depends on a subset of size k < n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want.[4]
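The effect Russell describes can be reproduced in a toy setting. The sketch below is a hypothetical illustration, not taken from the essay: two variables share a fixed budget, and the optimizer is given either an objective that mentions only one of them or an objective that also places some value on the other. All function names and numbers are assumptions chosen for the example.

```python
def best_allocation(budget, objective):
    """Exhaustively search integer splits (x_task, x_other) of a fixed budget."""
    candidates = [(x_task, budget - x_task) for x_task in range(budget + 1)]
    return max(candidates, key=lambda alloc: objective(*alloc))


def task_only(x_task, x_other):
    # The objective depends on 1 of the 2 variables; x_other is ignored.
    return x_task


def task_plus_other(x_task, x_other):
    # Same task reward, plus a diminishing-returns term for the thing we care about.
    return x_task + 4 * x_other ** 0.5


print(best_allocation(10, task_only))        # (10, 0)
print(best_allocation(10, task_plus_other))  # (6, 4)
```

With the first objective, the variable left out of the objective is driven to zero, because every unit taken from it improves the score. Only when the objective explicitly values that variable does the optimizer leave any of it intact.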
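Bostrom’s “The Superintelligent Will” lays out these two problems in more detail: we may be unable to correctly specify our actual goals when programming smarter-than-human AI systems, and most agents optimizing for a misspecified goal will have incentives to treat humans adversarially, as potential threats or obstacles to achieving the agent’s goal.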
In early 2013, Bostrom and Müller surveyed the one hundred top-cited living authors in AI, as ranked by Microsoft Academic Search. Conditional on “no global catastrophe halt[ing] progress,” the twenty-nine experts who responded assigned a median 10% probability to our developing a machine “that can carry out most human professions at least as well as a typical human” by the year 2023, a 50% probability by 2048, and a 90% probability by 2080.[5]
Most researchers at MIRI approximately agree with the 10% and 50% dates, but think that AI could arrive significantly later than 2080. This is in line with Bostrom’s analysis in Superintelligence:
Historically, AI researchers have not had a strong record of being able to predict the rate of advances in their own field or the shape that such advances would take. On the one hand, some tasks, like chess playing, turned out to be achievable by means of surprisingly simple programs; and naysayers who claimed that machines would “never” be able to do this or that have repeatedly been proven wrong. On the other hand, the more typical errors among practitioners have been to underestimate the difficulties of getting a system to perform robustly on real-world tasks, and to overestimate the advantages of their own particular pet project or technique.
Given experts’ (and non-experts’) poor track record at predicting progress in AI, we are relatively agnostic about when full AI will be invented. It could come sooner than expected, or later than expected.
In order to achieve real-world goals more effectively than a human, a general AI system will need to be able to learn its environment over time and decide between possible proposals or actions. A simplified version of the alignment problem, then, would be to ask how we could construct a system that learns its environment and has a very crude decision criterion, like “Select the policy that maximizes the expected number of diamonds in the world.”
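As a minimal sketch of what such a crude decision criterion looks like once everything difficult has been assumed away, the code below picks the policy with the highest expected diamond count from a hand-written world model. The policy names, probabilities, and diamond counts are hypothetical. The sketch also assumes the model already knows what counts as a diamond and how to predict outcomes, which is exactly what a real system would have to learn.

```python
def expected_diamonds(outcomes):
    """Expected diamond count for a list of (probability, diamond_count) pairs."""
    return sum(prob * diamonds for prob, diamonds in outcomes)


def select_policy(world_model):
    """Return the policy name whose predicted outcomes maximize expected diamonds."""
    return max(world_model, key=lambda name: expected_diamonds(world_model[name]))


# Hypothetical world model: each policy maps to its predicted outcome distribution.
WORLD_MODEL = {
    "mine": [(0.9, 10), (0.1, 0)],        # reliable, modest yield
    "synthesize": [(0.5, 30), (0.5, 0)],  # risky, high yield
    "do_nothing": [(1.0, 0)],
}

print(select_policy(WORLD_MODEL))  # synthesize
```

The selection rule itself is one line. The open problems lie in everything the dictionary takes for granted: where the outcome predictions come from, and how "diamond" is tied to the agent's own representation of the world.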
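Highly reliable agent design is the technical challenge of formally specifying a software system that can be relied upon to pursue some preselected toy goal. An example of a subproblem in this space is ontology identification: how do we formalize the goal of “maximizing diamonds” in full generality, allowing that a fully autonomous agent may end up in unexpected environments and may construct unanticipated hypotheses and policies? Even if we had unbounded computational power and all the time in the world, we don’t currently know how to solve this problem. This suggests that we’re not only missing practical algorithms but also a basic theoretical framework through which to understand the problem.
We can distinguish highly reliable agent design from the problem of value specification: “Once we understand how to design an autonomous AI system that promotes a goal, how do we ensure its goal actually matches what we want?” Since human error is inevitable, and we will need to be able to safely supervise and redesign AI algorithms even as they approach human equivalence in cognitive tasks, MIRI also works on formalizing error-tolerant agent properties. Artificial Intelligence: A Modern Approach, the standard textbook in AI, summarizes the challenge: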
Yudkowsky […] asserts that friendliness (a desire not to harm humans) should be designed in from the start, but that the designers should recognize both that their own designs may be flawed, and that the robot will learn and evolve over time. Thus the challenge is one of mechanism design: to design a mechanism for evolving AI under a system of checks and balances, and to give the systems utility functions that will remain friendly in the face of such changes.[6]
The importance of AI safety work is outlined in Q3, above. We see the problem as time-sensitive as a result of:
- neglectedness: Only a handful of people are currently working on the open problems outlined in the MIRI technical agenda.
- apparent difficulty: Solving the alignment problem may demand a large number of researcher hours, and may also be harder to parallelize than capabilities research.
- risk asymmetry: Working on safety too late has larger risks than working on it too early.
- AI timeline uncertainty: AI could progress faster than we expect, making it prudent to err on the side of caution.
- discontinuous progress in AI: Progress in AI is likely to speed up as we approach general AI. This means that even if AI is many decades away, it would be hazardous to wait for clear signs that general AI is near: clear signs may only arise when it’s too late to begin safety work.
We also think it is possible to do useful work in AI safety today, even if smarter-than-human AI is 50 or 100 years away. We think this for a few reasons:
- lack of basic theory: If we had simple idealized models of what we mean by correct behavior in autonomous agents, but didn’t know how to design practical implementations, this might suggest a need for more hands-on work with developed systems. Instead, however, simple models are what we’re missing. Basic theory doesn’t necessarily require that we have experience with a software system’s implementation details, and the same theory can apply to many different implementations.
- precedents: Theoretical computer scientists have had repeated success in developing basic theory in the relative absence of practical implementations. (Well-known examples include Claude Shannon, Alan Turing, Andrey Kolmogorov, and Judea Pearl.)
- early results: We’ve made significant advances since prioritizing some of the theoretical questions we’re looking at, especially in decision theory and logical uncertainty. This suggests that there’s low-hanging theoretical fruit to be picked.
Finally, we expect progress in AI safety theory to be useful for improving our understanding of robust AI systems, of the available technical options, and of the broader strategic landscape. In particular, we expect transparency to be necessary for reliable behavior, and we think there are basic theoretical prerequisites to making autonomous AI systems transparent to human designers and users.
Having the relevant theory in hand may not be strictly necessary for designing smarter-than-human AI systems — highly reliable agents may need to employ very different architectures or cognitive algorithms than the most easily constructed smarter-than-human systems that exhibit unreliable behavior. For that reason, some fairly general theoretical questions may be more relevant to AI safety work than to mainline AI capabilities work. Key advantages to AI safety work’s informativeness, then, include:
- general value of information: Making AI safety questions clearer and more precise is likely to give insights into what kinds of formal tools will be useful in answering them. Thus we’re less likely to spend our time on entirely the wrong lines of research. Investigating technical problems in this area may also help us develop a better sense for how difficult the AI problem is, and how difficult the AI alignment problem is.
- requirements for informative testing: If the system is opaque, then online testing may not give us most of the information that we need to design safer systems. Humans are opaque general reasoners, and studying the brain has been quite useful for designing more effective AI algorithms, but it has been less useful for building systems for verification and validation.
- requirements for safe testing: Extracting information from an opaque system may not be safe, since any sandbox we build may have flaws that are obvious to a superintelligence but not to a human.
MIRI is a research nonprofit funded primarily by small and mid-sized donors. Donations are therefore helpful for funding our mathematics work, workshops, academic outreach, etc.
- [1] Nilsson (2009). The Quest for Artificial Intelligence. Cambridge University Press.
- [2] Bostrom (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press.
- [3] Office of the US Air Force Chief Scientist (2010). Technology Horizons: A Vision for Air Force Science and Technology 2010-30.
- [4] Russell (2014). “Of Myths and Moonshine.” Edge.org. Edge Foundation, Inc.
- [5] Müller and Bostrom (2014). “Future Progress in Artificial Intelligence: A Survey of Expert Opinion.” In Müller (ed.), Fundamental Issues of Artificial Intelligence. Springer.
- [6] Russell and Norvig (2009). Artificial Intelligence: A Modern Approach. Pearson.