Subsystem Alignment



Emmy the embedded agent

You want to figure something out, but you don't know how to do that yet.

You have to somehow break the task up into sub-computations. There is no atomic act of "thinking"; intelligence must be built up of non-intelligent parts.

The agent being made of parts is part of what made counterfactuals hard, since the agent may have to reason about impossible configurations of those parts.

Being made of parts is also what makes self-reasoning and self-modification even possible.

What we’re primarily going to discuss in this section, though, is another problem: when the agent is made of parts, there could be adversaries not just in the external environment, but inside the agent as well.

This cluster of problems is Subsystem Alignment: ensuring that subsystems are not working at cross purposes; avoiding subprocesses optimizing for unintended goals. (A toy sketch of such a misaligned subprocess follows the list below.)

  • benign induction
  • benign optimization
  • transparency
  • mesa-optimizers
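As a toy illustration of the last item above, here is a minimal sketch. The functions and numbers are hypothetical stand-ins, not from the original post: an inner subroutine that optimizes a proxy can score well while the proxy and the intended goal agree, and then keep pushing the proxy in situations where they come apart.

```python
# Toy sketch of a subprocess optimizing an unintended goal (hypothetical example).
# The designer wants x to be near a target; the inner subroutine only maximizes x.

def intended_score(x, target):
    """What the designer actually wants: be close to the target."""
    return -abs(x - target)

def proxy_score(x):
    """What the inner subroutine actually optimizes: just make x large."""
    return x

def inner_subroutine(candidates):
    """Subprocess that greedily maximizes the proxy, not the intended goal."""
    return max(candidates, key=proxy_score)

candidates = [1, 5, 9]
x = inner_subroutine(candidates)

# "Training": the target is large, so the proxy and the intended goal happen to agree.
print(x, intended_score(x, target=10))   # 9, -1

# Off-distribution: the target is small, but the subroutine still maximizes the proxy,
# so the intended score gets worse -- the parts are now working at cross purposes.
print(x, intended_score(x, target=2))    # 9, -7
```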

Read more »

Robust Delegation



Self-improvement

Because the world is big, the agent as it is may be inadequate to achieve its goals, including in its ability to think.

Because the agent is made of parts, it can improve itself and become more capable.

Improvements can take many forms: The agent can make tools, the agent can make successor agents, or the agent can just learn and grow over time. However, the successors or tools need to be more capable for this to be worthwhile.

This gives rise to a special type of principal/agent problem:

You have an initial agent, and a successor agent. The initial agent gets to decide exactly what the successor agent looks like. The successor agent, however, is much more intelligent and powerful than the initial agent. We want to know how to have the successor agent robustly optimize the initial agent’s goals.

Here are three examples of forms this principal/agent problem can take:

Three principal-agent problems in robust delegation

In the AI alignment problem, a human is trying to build an AI system which can be trusted to help with the human's goals.

In the tiling agents problem, an agent is trying to make sure it can trust its future selves to help with its own goals.

Or we can consider a harder version of the tiling problem—stable self-improvement—where an AI system has to build a successor which is more intelligent than itself, while still being trustworthy and helpful.

For a human analogy which involves no AI, you can think about the problem of succession in royalty, or more generally the problem of setting up organizations to achieve desired goals without losing sight of their purpose over time.

The difficulty seems to be twofold:

First, a human or AI agent may not fully understand itself and its own goals. If an agent can't write out exactly what it wants in full detail, that makes it hard to guarantee that its successor will robustly help with the goal.

Second, the idea behind delegating work is that you don't have to do all the work yourself. You want the successor to be able to act with some degree of autonomy, including learning new things that you don't know, and wielding new skills and capacities.

In the limit, a really good formal account of robust delegation should be able to handle arbitrarily capable successors without throwing up any errors—like a human or AI building an unbelievably smart AI, or like an agent that just keeps learning and growing for so many years that it ends up much smarter than its past self.

The problem is not (just) that the successor agent might be malicious. The problem is that we don’t even know what it means not to be.

This problem seems hard from both points of view.

Successors

The initial agent needs to figure out how reliable and trustworthy something more powerful than it is, which seems very hard. But the successor agent has to figure out what to do in situations that the initial agent can't even understand, and try to respect the goals of something that the successor can see is inconsistent, which also seems very hard.

At first, this may look like a less fundamental problem than "making decisions" or "having models". But the view on which there are multiple forms of the "build a successor" problem is itself a dualistic view.

To an embedded agent, the future self is not privileged; it is just another part of the environment. There is no deep difference between building a successor that shares your goals and just making sure your own goals stay the same over time.

So, although I'll talk about "initial" and "successor" agents, remember that this isn't just about the narrow problem humans currently face of aiming a successor. It's about the fundamental problem of being an agent that persists and learns over time.

We call this cluster of problems Robust Delegation. Examples include:

Read more »

Embedded World-Models


An agent which is larger than its environment can:

  • Hold an exact model of the environment in its head.
  • Think through the consequences of every potential course of action.
  • If it doesn’t know the environment perfectly, hold every possible way the environment could be in its head, as is the case with Bayesian uncertainty.

All of these are typical of notions of rational agency.
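A minimal sketch of that dualistic picture, with a hypothetical toy environment, prior, and payoffs standing in for the real thing: the agent holds every candidate environment in its head, weighs them with a prior, and scores every action by its expected payoff under that full distribution.

```python
# Minimal sketch of a "larger than its environment" agent (hypothetical toy setup):
# it holds every possible environment in its head and evaluates every action exactly.

# Each hypothesis maps an action to a payoff; the agent is uncertain which one is real.
hypotheses = {
    "env_A": {"left": 1.0, "right": 0.0},
    "env_B": {"left": 0.2, "right": 0.9},
}
prior = {"env_A": 0.5, "env_B": 0.5}

def expected_payoff(action):
    """Average the action's payoff over the agent's full Bayesian uncertainty."""
    return sum(prior[h] * hypotheses[h][action] for h in hypotheses)

# Think through the consequences of every potential course of action.
best_action = max(["left", "right"], key=expected_payoff)
print(best_action, expected_payoff(best_action))  # 'left', 0.6
```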

An embedded agent can’t do any of those things, at least not in any straightforward way.

Emmy the embedded agent

One difficulty is that, since the agent is part of the environment, modeling the environment in every detail would require the agent to model itself in every detail, which would require the agent’s self-model to be as “big” as the whole agent. An agent can’t fit inside its own head.

The lack of a crisp agent/environment boundary forces us to grapple with paradoxes of self-reference. As if representing the rest of the world weren’t already hard enough.

Embedded World-Models have to represent the world in a way more appropriate for embedded agents. Problems in this cluster include:

  • the “realizability” / “grain of truth” problem: the real world isn’t in the agent’s hypothesis space (see the toy sketch after this list)
  • logical uncertainty
  • high-level models
  • multi-level models
  • ontological crises
  • naturalized induction, the problem that the agent must incorporate its model of itself into its world-model
  • anthropic reasoning, the problem of reasoning about how many copies of yourself exist
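As a toy sketch of the first item (the coin-flip setup here is hypothetical, not from the original post): if the agent's hypothesis space contains only coin biases 0.3 and 0.7 but the real coin is fair, there is no "grain of truth" available, and the posterior keeps wandering between the two wrong models instead of settling on a correct one.

```python
import random

# Toy illustration of the realizability / "grain of truth" problem (hypothetical setup):
# the agent's hypothesis space contains only biases 0.3 and 0.7, but the real coin is
# fair, so no hypothesis in the space is correct.

random.seed(0)
hypotheses = {0.3: 0.5, 0.7: 0.5}  # prior over the two (both wrong) models

for _ in range(1000):
    flip = 1 if random.random() < 0.5 else 0  # the real, unmodeled fair coin
    # Bayesian update of each hypothesis on the observed flip.
    for bias in hypotheses:
        hypotheses[bias] *= bias if flip == 1 else (1 - bias)
    total = sum(hypotheses.values())
    for bias in hypotheses:
        hypotheses[bias] /= total

# The posterior drifts between the two wrong models rather than converging on the truth.
print(hypotheses)
```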

Read more »

Decision Theory


Decision theory and artificial intelligence typically try to compute something resembling

$$ \underset{a \,\in\, Actions}{\mathrm{argmax}} \ f(a). $$

I.e., maximize some function of the action. This tends to assume that we can detangle things enough to see outcomes as a function of actions.
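A minimal sketch of that assumption, with placeholder actions, outcomes, and utilities (none of these names are from the original post): the world is modeled as a clean function from action to outcome, and the agent simply takes the argmax.

```python
# Minimal sketch of the standard decision-theoretic picture (placeholder actions and
# payoffs): outcomes are assumed to be a clean function of the agent's action alone.

actions = ["stay", "go"]

def outcome(action):
    """Hypothetical world model: maps an action directly to an outcome."""
    return {"stay": "safe", "go": "reward"}[action]

def utility(result):
    """Hypothetical scoring of outcomes."""
    return {"safe": 0.3, "reward": 1.0}[result]

# argmax over actions of f(a) = utility(outcome(a)).
best = max(actions, key=lambda a: utility(outcome(a)))
print(best)  # 'go'
```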

For example, AIXI represents the agent and the environment as separate units which interact over time through clearly defined i/o channels, so that it can then choose actions maximizing reward.

AIXI

When the agent model is part of the environment model, it can be significantly less clear how to consider taking alternative actions.

Embedded agent

For example, because the agent is smaller than the environment, there can be other copies of the agent, or things very similar to the agent. This leads to contentious decision-theory problems such as the Twin Prisoner's Dilemma and Newcomb's problem.

If Emmy Model 1 and Emmy Model 2 have had the same experiences and are running the same source code, should Emmy Model 1 act like her decisions are steering both robots at once? Depending on how you draw the boundary around “yourself”, you might think you control the action of both copies, or only your own.
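A toy rendering of the two ways of drawing that boundary, using standard Prisoner's Dilemma payoff numbers (the framing is a hypothetical sketch): if Emmy models her decision as moving only her own copy, defection looks dominant; if she models it as steering both identical copies at once, cooperation scores higher.

```python
# Toy sketch of the Twin Prisoner's Dilemma (standard payoff numbers, hypothetical framing):
# the answer depends on whether "your decision" is modeled as moving one copy or both.

payoff = {  # (my_move, twin_move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def best_if_only_my_copy(twin_move):
    """Boundary drawn around one robot: the twin's move is held fixed."""
    return max("CD", key=lambda my: payoff[(my, twin_move)])

def best_if_both_copies():
    """Boundary drawn around both robots: identical copies make identical moves."""
    return max("CD", key=lambda move: payoff[(move, move)])

print(best_if_only_my_copy("C"), best_if_only_my_copy("D"))  # 'D', 'D' -- defect dominates
print(best_if_both_copies())                                  # 'C' -- cooperation wins
```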

This is an instance of the problem of counterfactual reasoning: how do we evaluate hypotheticals like "What if the sun suddenly went out"?

Problems of adapting decision theory to embedded agents include:

  • counterfactuals
  • Newcomblike reasoning, in which the agent interacts with copies of itself
  • reasoning about other agents more broadly
  • extortion problems
  • coordination problems
  • logical counterfactuals
  • logical updatelessness

Read more »

October 2018 Newsletter

||Newsletters

Announcing the new AI Alignment Forum

||Guest Posts, News

This is a guest post by Oliver Habryka, lead developer for LessWrong. Our gratitude to the LessWrong team for the hard work they’ve put into developing this resource, and our congratulations on today’s launch!


I am happy to announce that after two months of open beta, the AI Alignment Forum is launching today. The AI Alignment Forum is a new website built by the team behind LessWrong 2.0, to help create a new hub for technical AI alignment research and discussion.

One of our core goals when we designed the forum was to make it easier for new people to get started on doing technical AI alignment research. This effort was split into two major parts:

Read more »

Embedded Agents


Suppose you want to build a robot to achieve some real-world goal for you—a goal that requires the robot to learn for itself and figure out a lot of things that you don’t already know.1

There’s a complicated engineering problem here. But there’s also a problem of figuring out what it even means to build a learning agent like that. What is it to optimize realistic goals in physical environments? In broad terms, how does it work?

In this series of posts, I'll point to four ways we don't currently know how it works, and four areas of active research aimed at figuring it out.

This is Alexei, and Alexei is playing a video game.

Alexei The Dualistic Agent

Like most games, this one has clear input and output channels. Alexei only observes the game through the computer screen, and only manipulates the game through the controller.

The game can be thought of as a function which takes in a sequence of button presses and outputs a sequence of pixels on the screen.
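A minimal type-level sketch of that view (the names and the toy frame representation are placeholders, not from the post): the game consumes a sequence of button presses and yields a sequence of screen frames.

```python
from typing import List

# Minimal type-level sketch of "the game is a function from button presses to pixels"
# (names and the toy frame representation are hypothetical placeholders).

Button = str             # e.g. "left", "right", "jump"
Frame = List[List[int]]  # a grid of pixel values

def game(button_presses: List[Button]) -> List[Frame]:
    """Deterministic toy game: each press shifts a single lit pixel on a 1x5 screen."""
    frames, position = [], 0
    for press in button_presses:
        if press == "right":
            position = min(position + 1, 4)
        elif press == "left":
            position = max(position - 1, 0)
        frames.append([[1 if i == position else 0 for i in range(5)]])
    return frames

print(game(["right", "right", "left"]))
```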

Alexei is also very smart, and capable of holding the entire video game inside his mind. If Alexei has any uncertainty, it is only over empirical facts like what game he is playing, and not over logical facts like which inputs (for a given deterministic game) will yield which outputs. This means that Alexei must also store inside his mind every possible game he could be playing.

Alexei does not, however, have to think about himself. He is only optimizing the game he is playing, not optimizing the brain he is using to think about the game. He may still choose actions based on their value of information, but this only helps him rule out possible games he could be playing; it doesn't change the way he thinks.

In fact, Alexei can treat himself as an unchanging, indivisible atom. Since he doesn't exist inside the environment he is thinking about, Alexei isn't worried about whether he'll change over time, or about any subroutines he might have to run.

Notice that all the properties I've discussed are partially made possible by the fact that Alexei is cleanly separated from the environment he is optimizing.
Read more »


  1. This is part 1 of the Embedded Agency series, by Abram Demski and Scott Garrabrant.

The Rocket Alignment Problem


The following is a fictional dialogue building off of AI Alignment: Why It’s Hard, and Where to Start.


(Somewhere in a not-very-near neighboring world, where science took a very different course…)

ALFONSO: Hello, Beth. I’ve noticed a lot of speculations lately about “spaceplanes” being used to attack cities, or possibly becoming infused with malevolent spirits that inhabit the celestial realms so that they turn on their own engineers.

I’m rather skeptical of these speculations. Indeed, I’m a bit skeptical that airplanes will be able to even rise as high as stratospheric weather balloons anytime in the next century. But I understand that your institute wants to address the potential problem of malevolent or dangerous spaceplanes, and that you think this is an important present-day cause.

BETH: That’s… really not how we at the Mathematics of Intentional Rocketry Institute would phrase things.

Malevolent celestial spirits are what all the news articles are focusing on, but we think the real problem is something entirely different. We’re worried that there’s a difficult, theoretically challenging problem which modern-day rocket punditry is mostly overlooking. We’re worried that if you aim a rocket at where the Moon is in the sky, and press the launch button, the rocket may not actually end up at the Moon.

ALFONSO: I understand that it’s very important to design fins that can stabilize a spaceplane’s flight in heavy winds. That’s important spaceplane safety research and someone needs to do it.

But if you were working on that sort of safety research, I’d expect you to be collaborating tightly with modern airplane engineers to test out your fin designs, to demonstrate that they are actually useful.

BETH: Aerodynamic design is an important feature of any safe rocket, and we're glad that rocket scientists are working on these problems and taking safety seriously. That's not the problem we at MIRI focus on, though.

ALFONSO: What's the concern, then? Are you worried that spaceplanes might be developed by people who would misuse them?

BETH: That's not the failure mode we're worried about right now. We're more worried that right now, nobody can tell you how to point your rocket's nose such that it goes to the Moon, nor indeed any prespecified celestial destination. Whether Google or the US government or North Korea is the one to launch the rocket won't make a pragmatic difference to the probability of a successful Moon landing from our perspective, because right now nobody knows how to aim any kind of rocket anywhere.

Read more »