This is the conclusion of the
Embedded Agency series. Previous posts:
Embedded Agents—Decision Theory—Embedded World-Models
Robust Delegation—Subsystem Alignment
A final word on curiosity, and intellectual puzzles:
I described an embedded agent, Emmy, and said that I don’t understand how she evaluates her options, models the world, models herself, or decomposes and solves problems.
In the past, when researchers have discussed motivations for working on problems like these, they've generally focused on the motivation from AI risk. AI researchers want to build machines that can solve problems in the general-purpose fashion of a human, and dualism is not a realistic framework for thinking about such systems. In particular, it's an approximation that is especially likely to break down as AI systems get smarter. When people figure out how to build general AI systems, we want those researchers to be in a better position to understand their systems, analyze their internal properties, and be confident in their future behavior.
This is the motivation for most researchers today who are working on things like updateless decision theory and subsystem alignment. We care about basic conceptual puzzles which we think we need to figure out in order to achieve confidence in future AI systems, and not have to rely quite so much on brute-force search or trial and error.
But the arguments for why we may or may not need particular conceptual insights in AI are long, and I haven't tried to wade into the details of that debate here. Instead, I've been discussing a particular set of research directions as an intellectual puzzle, and not as an instrumental strategy.
One downside of discussing these problems as instrumental strategies is that it can lead to some misunderstandings about why we think this kind of work is so important. With the "instrumental strategies" lens, it's tempting to draw a direct line from a given research problem to a given safety concern. But it's not that I'm imagining real-world embedded systems being "too Bayesian" and this somehow causing problems, if we don't figure out what's wrong with current models of rational agency. It's certainly not that I'm imagining future AI systems being written in second-order logic! In most cases, I'm not trying at all to draw direct lines between research problems and specific AI failure modes.
What I’m instead thinking about is this: We sure do seem to be working with the wrong basic concepts today when we try to think about what agency is, as seen by the fact that these concepts don’t transfer well to the more realistic embedded framework.
If AI developers in the future arestillworking with these confused and incomplete basic concepts as they try to actually build powerful real-world optimizers, that seems like a bad position to be in. And it seems like the research community is unlikely to figure most of this out by default in the course of just trying to develop more capable systems. Evolution certainly figured out how to build human brains without “understanding” any of this, via brute-force search.
Embedded agency is my way of trying to point at what I think is a very important and central place where I feel a sense of confusion, and where I think future researchers risk running into confusions too.
There’s also a lot of excellent AI alignment research that’s being done with an eye toward more direct applications; but I think of that safety research as having a different type signature than the puzzles I’ve talked about here.
Intellectual curiosity isn't the ultimate reason we privilege these research directions. But there are some practical advantages to orienting toward research questions from a place of curiosity at times, as opposed to only applying the "practical impact" lens to how we view the world.
When we apply the curiosity lens to the world, we orient toward the sources of confusion preventing us from seeing clearly; the blank spots in our map, the flaws in our lens. It encourages re-checking assumptions and attending to blind spots, which is helpful as a psychological counterpoint to our “instrumental strategy” lens—the latter being more vulnerable to the urge to lean on whatever shaky premises we have on hand so we can get to more solidity and closure in our early thinking.
Embedded agency is an organizing theme behind most, if not all, of our big curiosities. It seems like a central mystery underlying many concrete difficulties.