Five theses, two lemmas, and a couple of strategic implications
MIRI’s primary concern about self-improving AI isn’t so much that it might be created by ‘bad’ actors rather than ‘good’ actors in the global sphere; rather, most of our concern is in remedying the situation in which no one knows at all how to create a self-modifying AI with known, stable preferences. (This is why we see the main problem in terms of doing research and encouraging others to perform relevant research, rather than trying to stop ‘bad’ actors from creating AI.)
This, and a number of other basic strategic views, can be summed up as a consequence of five theses about purely factual questions about AI, and two lemmas we think are implied by them, as follows:
Intelligence explosion thesis. A sufficiently smart AI will be able to realize large, reinvestable cognitive returns from things it can do on a short timescale, like improving its own cognitive algorithms or purchasing/stealing lots of server time. The intelligence explosion will hit very high levels of intelligence before it runs out of things it can do on a short timescale. See: Chalmers (2010); Muehlhauser & Salamon (2013); Yudkowsky (2013).
Orthogonality thesis. Mind design space is vast enough to contain agents with almost any set of preferences, and such agents can be instrumentally rational about achieving those preferences and have great computational power. For example, mind design space theoretically contains powerful, instrumentally rational agents which act as expected paperclip maximizers and always choose the option which leads to the greatest number of expected paperclips. See: Bostrom (2012); Armstrong (2013).
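The point that capability and preference are independent can be made concrete: an expected paperclip maximizer is just ordinary expected-utility maximization with an arbitrary utility function plugged in. A minimal Python sketch (the actions, outcomes, and probabilities below are made up for illustration, not taken from any of the cited papers):

```python
# Toy illustration of the Orthogonality thesis: the same expected-utility
# decision rule works for *any* utility function, including "count of
# paperclips". Swapping in a different utility changes the agent's goals
# without changing its reasoning machinery at all.

def expected_utility(action, outcome_probs, utility):
    """Expected utility of an action given P(outcome | action)."""
    return sum(p * utility(outcome) for outcome, p in outcome_probs[action].items())

def best_action(actions, outcome_probs, utility):
    """Pick the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, outcome_probs, utility))

# Hypothetical world model: P(outcome | action).
outcome_probs = {
    "build_factory": {"paperclips_1000": 0.5, "paperclips_0": 0.5},
    "hand_fold":     {"paperclips_10": 1.0},
}

# An arbitrary terminal preference: the number of paperclips in the outcome.
paperclip_utility = lambda outcome: int(outcome.split("_")[1])

print(best_action(list(outcome_probs), outcome_probs, paperclip_utility))
# -> "build_factory" (expected 500 paperclips vs. a certain 10)
```

Nothing in `best_action` knows or cares that the utility function counts paperclips rather than happy sentient beings; that is the orthogonality point in miniature.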
Convergent instrumental goals thesis. Most utility functions will generate a subset of instrumental goals which follow from most possible final goals. For example, if you want to build a galaxy full of happy sentient beings, you will need matter and energy, and the same is also true if you want to make paperclips. This thesis is why we’re worried about very powerful entities even if they have no explicit dislike of us: “The AI does not love you, nor does it hate you, but you are made of atoms it can use for something else.” Note though that by the Orthogonality Thesis you can always have an agent which explicitly, terminally prefers not to do any particular thing — an AI which does love you will not want to break you apart for spare atoms. See: Omohundro (2008); Bostrom (2012).
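The matter-and-energy example above can be sketched in a few lines (my toy model, not from the cited papers): two agents with unrelated final goals, where acquiring more resources raises both utilities, making resource acquisition instrumentally useful for both.

```python
# Toy illustration of convergent instrumental goals: very different final
# goals, yet under this simple world model, more matter and energy helps both.

def happy_beings(matter, energy):
    """Final goal: a galaxy of happy sentient beings (limited by both inputs)."""
    return min(matter, energy)

def paperclips(matter, energy):
    """Final goal: paperclips (matter-limited; needs some energy to run)."""
    return matter if energy > 0 else 0

poor = dict(matter=100, energy=50)
rich = dict(matter=200, energy=100)

for utility in (happy_beings, paperclips):
    print(utility.__name__, utility(**poor), "->", utility(**rich))
# Both utilities increase with more resources, despite unrelated final goals.
```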
Complexity of value thesis. It takes a large chunk of Kolmogorov complexity to describe even idealized human preferences. That is, what we ‘should’ do is a computationally complex mathematical object even after we take the limit of reflective equilibrium (judging your own thought processes) and other standard normative theories. A superintelligence with a randomly generated utility function would not do anything we see as worthwhile with the galaxy, because it is unlikely to accidentally hit on final preferences for having a diverse civilization of sentient beings leading interesting lives. See: Yudkowsky (2011); Muehlhauser & Helm (2013).
Fragility of value thesis. Getting a goal system 90% right does not give you 90% of the value, any more than correctly dialing 9 out of 10 digits of my phone number will connect you to somebody who’s 90% similar to Eliezer Yudkowsky. There are multiple dimensions for which eliminating that dimension of value would eliminate almost all value from the future. For example, an alien species which shared almost all of human value except that their parameter setting for “boredom” was much lower might devote most of their computational power to replaying a single peak, optimal experience over and over again with slightly different pixel colors (or the equivalent thereof). Friendly AI is more like a satisficing threshold than something where we’re trying to eke out successive 10% improvements. See: Yudkowsky (2009, 2011).
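One way to make the fragility point concrete (a sketch of mine, not a model from the cited papers): if value over the future factors into several necessary dimensions combined multiplicatively, then being “90% right” on every axis still scores well, while zeroing out any single dimension (say, the “boredom”/novelty parameter) destroys almost all value:

```python
import math

# Toy model: total value is the product of scores on several necessary
# dimensions, each in [0, 1]. The dimension names are illustrative only.
def total_value(dimensions):
    return math.prod(dimensions.values())

almost_right   = {"sentience": 1.0, "diversity": 0.9, "novelty": 0.95}
novelty_zeroed = {"sentience": 1.0, "diversity": 0.9, "novelty": 0.0}

print(total_value(almost_right))    # 0.855 — slightly-off settings still score high
print(total_value(novelty_zeroed))  # 0.0   — losing one dimension loses everything
```

The multiplicative form is what encodes “satisficing threshold”: no amount of excellence on the remaining dimensions compensates for a dimension driven to zero.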
These five theses seem to imply two important lemmas:
Indirect normativity. Programming a self-improving machine intelligence to implement a grab-bag of things that seem like good ideas may lead to a bad outcome, regardless of how apple-pie-and-motherhood the ideas sounded. For example, if you give the AI the final goal ‘make people happy’, it will just turn people’s pleasure centers up to maximum. “Indirectly normative” is Bostrom’s term for an AI that calculates the ‘right’ thing to do via, e.g., looking at human beings and modeling their decision processes and idealizing those decision processes (e.g. what you would-want if you knew everything the AI knew and understood your own decision processes, reflective equilibria, ideal advisor theories, and so on), rather than being told a direct set of ‘good ideas’ by the programmers. Indirect normativity is how you deal with Complexity and Fragility. If you can succeed at indirect normativity, then small variances in essentially good intentions may not matter much — that is, if two different projects do indirect normativity correctly, but one project has 20% nicer and kinder researchers, we could still hope that the end results would be of around equal expected value. See: Muehlhauser & Helm (2013).
The extra difficulty of Friendliness. You can build a Friendly AI (by the Orthogonality thesis), but it takes a lot of work and cleverness to get the goal system exactly right. Probably more importantly, the rest of the AI needs to meet a higher standard of cleanness in order for the goal system to remain invariant through a billion sequential self-modifications. Any AI sufficiently intelligent to do clean self-modification will tend to do so, but the problem is that the intelligence explosion might get started with AIs substantially less intelligent than that, e.g., AIs which rewrite themselves using genetic algorithms or other such means that don’t preserve a set of consequentialist preferences. In this case, building a Friendly AI could mean that our AI has to be smarter about self-modification than the minimal AI that could undergo an intelligence explosion. See: Yudkowsky (2008) and Yudkowsky (2013).
These lemmas in turn have two major strategic implications:
- We have a lot of work to do on things like indirect normativity and stable self-improvement. At this stage, much of this work looks really foundational: that is, we can’t yet describe how to do these things even using unlimited computing power, let alone limited computing power. We should get started on this work as early as possible, since basic research often takes a lot of time.
- There needs to be a Friendly AI project that has some sort of boost over competing projects which don’t live up to a (very) high standard of Friendly AI work — a project which can successfully build a stable-goal-system self-improving AI, before a less careful project hacks together a much sloppier self-improving AI. Giant supercomputers may be less important to this than being able to bring together the smartest researchers (see the open question posed in Yudkowsky 2013), but the required advantage cannot be left up to chance. Leaving things to default means that projects less careful about self-modification would have an advantage greater than casual altruism is likely to overcome.