Transparency in Safety-Critical Systems

In this post, I aim to summarize one common view on AI transparency and AI reliability. It’s difficult to identify the field’s “consensus” on AI transparency and reliability, so instead I will present one common view, so that I can use it to introduce a number of complications and open questions that (I think) warrant further investigation.

Here’s the short version of the common view I summarize below:

Black-box testing can provide some confidence that a system will behave as intended, but if a system is built so that it is transparent to human inspection, then additional methods of reliability verification become available. Unfortunately, many of AI’s most useful methods are among its least transparent. Logic-based systems are typically more transparent than statistical methods, but statistical methods are more widely used. There are exceptions to this general rule, and some people are working to make statistical methods more transparent.

The value of transparency in system design

Nusser (2009) writes:

…safety-related application domains require transparent solutions that can be validated by domain experts. “Black box” approaches (such as artificial neural networks) are regarded with suspicion, even if they show a very high accuracy on the available data, because it is not feasible to prove that they will perform well on all possible input combinations.

Unfortunately, there is often a tension between AI capability and AI transparency. Many of AI’s most powerful methods are also among its least transparent:

methods that are known to achieve a high predictive performance, e.g. support vector machines (SVMs) or artificial neural networks (ANNs), are usually hard to interpret. On the other hand, methods that are known to be well-interpretable, for example (fuzzy) rule systems, decision trees, or linear models, are usually limited with respect to their predictive performance.1

But for safety-critical systems, and especially AGI, it is important to prioritize system reliability over capability. Again, here is Nusser (2009):

strict requirements [for system transparency] are necessary because a safety-related system is a system whose malfunction or failure can lead to serious consequences — for example environmental harm, loss or severe damage of equipment, harm or serious injury of people, or even death. Often, it is impossible to rectify a wrong decision within this domain.

The special need for transparency in AI has also been stressed by many others,2 including Boden (1977):

Members of the artificial intelligence community bear an ominous resemblance to… the Sorcerer’s Apprentice. The apprentice learnt just enough magic… to save himself the trouble of performing an onerous task, but not enough to stop the spellbound buckets and brooms from flooding the castle…

[One question I wish to ask is] whether there is any way of writing programs that would tend to keep control in human hands… [For one thing,] programs should be intelligible and explicit, so that “what is going on” is not buried in the code or implicitly embodied in procedures whose aims and effects are obscure.

The spectrum from black box to transparent

Non-transparent systems are sometimes called “black boxes”:

A black box is a device, system or object which can be viewed in terms of its input, output and transfer characteristics without any knowledge of its internal workings. Its implementation is “opaque” (black). Almost anything might be referred to as a black box: a transistor, an algorithm, or the human brain.

…[And] in practice some [technically transparent] systems are so complex that [they] might as well be [black boxes].3

The human brain is mostly a black box. We can observe its inputs (light, sound, etc.), its outputs (behavior), and some of its transfer characteristics (swinging a bat at someone’s eyes often results in ducking or blocking behavior), but we don’t know very much about how the brain works. We’ve begun to develop an algorithmic understanding of some of its functions (especially vision), but only barely.4

Many contemporary AI methods are effectively black box methods. As Whitby (1996) explains, the safety issues that arise in “GOFAI” (e.g. search-based problem solvers and knowledge-based systems) “are as nothing compared to the [safety] problems which must be faced by newer approaches to AI… Software that uses some sort of neural net or genetic algorithm must face the further problem that it seems, often almost by definition, to be ‘inscrutable’. By this, I mean that… we can know that it works and test it over a number of cases but we will not in the typical case ever be able to know exactly how.”

Other methods, however, are relatively transparent, as we shall see below.

This post cannot survey the transparency of all AI methods; there are too many. Instead, I will focus on three major “families” of AI methods.

Examining the transparency of three families of AI methods

Machine learning

Machine learning is perhaps the largest and most active subfield of AI, and it encompasses a wide variety of methods by which machines learn from data. For an overview of the field, see Flach (2012). For a quick video intro, see here.

Unfortunately, machine learning methods tend not to be among the most transparent methods.5 Nusser (2009) explains:

machine learning approaches are regarded with suspicion by domain experts in safety-related application fields because it is usually impossible to sufficiently interpret and validate the learned solutions.

For now, let’s consider one popular machine learning method in particular: artificial neural networks (ANNs). (For a concise video introduction, see here.) As Rodvold (1999) explains, ANNs are typically black boxes:

the intelligence of neural networks is contained in a collection of numeric synapse weights, connections, transfer functions, and other network-defining parameters. In general, inspection of these quantities yields little explicit information to enlighten a developer as to why a certain result is produced.

Similarly, Kurd (2005) writes:

it is common for typical ANNs to be treated as black-boxes… because ANN behaviour is scattered across its weights and links with little meaning to an observer. As a result of this unstructured and unorganised representation of behaviour, it is often not feasible to completely understand and predict their function and operation… The interpretation problems associated with ANNs impede their use in safety critical contexts…6
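
To make this opacity concrete, here’s a minimal sketch (assuming scikit-learn is installed; the dataset and network size are arbitrary illustrative choices) showing that a trained network can classify well while inspection of its parameters tells the developer almost nothing about why:

```python
# Minimal sketch: a small neural network is accurate but opaque.
# Assumes scikit-learn; dataset and hyperparameters are arbitrary choices.
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
net.fit(X, y)

# The network classifies well...
print("training accuracy:", net.score(X, y))

# ...but its "intelligence" is scattered across weight matrices whose
# entries bear no human-readable meaning.
for i, weights in enumerate(net.coefs_):
    print(f"layer {i} weight matrix, shape {weights.shape}:")
    print(weights.round(2))
```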

Deep learning is another popular machine learning technique. It, too, tends to be non-transparent — like ANNs, deep learning methods were inspired by how parts of the brain work, in particular the visual system.7

Some machine learning methods are more transparent than others. Bostrom & Yudkowsky (2013) explain:

If the machine learning algorithm is based on a complicated neural network… then it may prove nearly impossible to understand why, or even how, the algorithm [made its judgments]. On the other hand, a machine learner based on decision trees or Bayesian networks is much more transparent to programmer inspection (Hastie et al. 2001), which may enable an auditor to discover [why the algorithm made its judgments].8
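
To illustrate the contrast, here’s a companion sketch (again assuming scikit-learn, with the same illustrative dataset): a decision tree’s learned behavior can be dumped as explicit, human-readable threshold rules.

```python
# Minimal sketch: a decision tree's learned behavior prints as
# human-readable if/then rules. Assumes scikit-learn.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(data.data, data.target)

# Every prediction can be traced through explicit feature tests.
print(export_text(tree, feature_names=list(data.feature_names)))
```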

Additionally, some recent work attempts to make certain machine learning methods more transparent, and therefore perhaps more suitable for safety-critical applications. For example, Taylor (2005) proposes methods for extracting rules (which refer to human-understandable concepts) from neural networks, so that researchers can perform formal safety analysis on the extracted rules. These methods remain fairly primitive, and are not yet widely applicable or widely used, but further research could make them more useful and more popular.9
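
Taylor’s methods are more sophisticated than anything I can show here, but a crude “pedagogical” variant of the rule-extraction idea can be sketched as follows (assuming scikit-learn; all modeling choices are illustrative): train a transparent model to mimic the opaque network’s input/output behavior, then inspect the transparent mimic instead.

```python
# Crude sketch of pedagogical rule extraction: fit a transparent
# surrogate (a decision tree) to an opaque network's *predictions*,
# then inspect the surrogate's rules. Assumes scikit-learn.
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
net.fit(data.data, data.target)

# Train the surrogate on the network's outputs, not the true labels.
net_predictions = net.predict(data.data)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(data.data, net_predictions)

# "Fidelity" measures how faithfully the rules mimic the network.
print("surrogate fidelity to network:",
      surrogate.score(data.data, net_predictions))
print(export_text(surrogate, feature_names=list(data.feature_names)))
```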

Evolutionary algorithms

Evolutionary algorithms (EAs) are often classified as a machine learning method, but I will consider them separately here. EAs use methods inspired by evolution to produce candidate solutions to problems. For example, watch this video of software robots evolving to “walk” quickly.
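
To give a feel for how EAs work, here’s a minimal sketch of a genetic algorithm on the classic “OneMax” toy problem (all parameters are arbitrary illustrative choices): candidate solutions are bitstrings, fitness is the number of 1s, and new candidates are produced by semi-random recombination and mutation.

```python
# Minimal genetic algorithm sketch: evolve bitstrings toward all-1s.
import random

random.seed(0)
LENGTH, POP_SIZE, GENERATIONS, MUTATION_RATE = 32, 40, 60, 0.02

def fitness(bits):
    return sum(bits)  # OneMax: count the 1s

def crossover(a, b):
    point = random.randrange(1, LENGTH)  # single-point recombination
    return a[:point] + b[point:]

def mutate(bits):
    return [1 - b if random.random() < MUTATION_RATE else b for b in bits]

population = [[random.randint(0, 1) for _ in range(LENGTH)]
              for _ in range(POP_SIZE)]
for generation in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[: POP_SIZE // 2]  # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

print("best fitness:", fitness(max(population, key=fitness)), "of", LENGTH)
```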

Because evolutionary algorithms use a process of semi-random mutation and recombination to produce candidate solutions, complex candidate solutions tend not to be transparent — just like the evolutionarily produced brain. Mitchell (1998), p. 40, writes:

Understanding the results of [genetic algorithm] evolution is a general problem — typically the [genetic algorithm] is asked to find [candidate solutions] that achieve high fitness, but is not told how that high fitness is to be attained. One could say this is analogous to the difficulty biologists have in understanding the products of natural evolution (e.g., us)… In many cases… it is difficult to understand exactly how an evolved high-fitness [candidate solution] works. In genetic programming, for example, the evolved programs are often very long and complicated, with many irrelevant components attached to the core program performing the desired computation. It is usually a lot of work — and sometimes almost impossible — to figure out by hand what that core program is.

Fleming & Purshouse (2002) add:

Mission-critical and safety-critical applications do not appear, initially, to be favourable towards EA usage due to the stochastic nature of the evolutionary algorithm. No guarantee is provided that the results will be of sufficient quality for use on-line.

Logic-based methods

Logic-based methods in AI are implemented widely in safety-critical applications (e.g. medicine), but see far less application in general compared to machine learning methods.

In a logic-based AI, the AI’s knowledge and its systems for reasoning are written out in logical statements. These statements are typically hand-coded, and each statement has a precise meaning determined by the axioms of the logical system being used (e.g. first-order logic). Russell & Norvig (2009), the leading AI textbook, describes logical approaches to AI in chapter 7, and describes a popular application of logical AI, called “classical planning,” in chapter 10. See also Thomason (2012) and Minker (2000).
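
To give the flavor of such systems, here’s a minimal sketch of forward chaining over a hand-coded, made-up Horn-clause knowledge base; notice that every derived conclusion can be traced back to explicit, inspectable axioms:

```python
# Minimal sketch of a logic-based system: hand-coded rules plus
# forward chaining. The knowledge base below is a hypothetical toy.
RULES = [
    # (premises, conclusion)
    ({"battery_low"}, "must_recharge"),
    ({"must_recharge", "docked"}, "charging"),
    ({"obstacle_ahead"}, "must_stop"),
]

def forward_chain(initial_facts):
    """Repeatedly apply rules until no new facts can be derived."""
    facts = set(initial_facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

facts = forward_chain({"battery_low", "docked"})
print(sorted(facts))
# ['battery_low', 'charging', 'docked', 'must_recharge']
```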

Galliers (1988, pp. 88-89) explains the transparency advantages of logic-based methods in AI:

A theory expressed as a set of logical axioms is evident; it is open to examination. This assists the process of determining whether any parts of the theory are inconsistent, or do not behave as had been anticipated when they were expressed in English… Logics are languages with precise semantics [and therefore] there can be no ambiguities of interpretation… By expressing the properties of agents… as logical axioms and theorems… the theory is transparent; properties, interrelationships and inferences are open to examination… This contrasts with the use of computer code [where] it is frequently the case that computer systems concerned with… problem-solving are in fact designed such that properties of the interacting agents are implicit properties of the entire system, and it is impossible to investigate the role or effects of any individual aspect.10

Another transparency advantage of logic-based methods in AI stems from the ability of logical languages to represent a wide variety of machines, including machines that can reflect on themselves and reason about their own beliefs, for example by passing along the assumptions behind each datum, as in Fox & Das (2000).

Moreover, some logic-based approaches are amenable to formal methods, such as formal verification: mathematically proving that a system will perform correctly with respect to a formal specification.11 Formal methods complement empirical testing of software, e.g. by identifying “corner bugs” that are difficult to find when using empirical methods alone — see Mitra (2008).
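
As a small illustration of the style of reasoning involved, here’s a sketch using the Z3 SMT solver’s Python bindings (assuming the z3-solver package is installed): rather than empirically testing a handful of inputs, the solver checks a property against all 2^32 possible inputs at once, which is exactly the kind of exhaustiveness that catches corner bugs.

```python
# Minimal formal-verification sketch with the Z3 SMT solver.
# Spec: for every 32-bit x, the branch-free absolute-value trick
# equals the obvious definition.
from z3 import BitVec, If, prove

x = BitVec("x", 32)
mask = x >> 31                        # arithmetic shift: 0 or -1
branch_free_abs = (x + mask) ^ mask   # classic bit-twiddling abs
spec_abs = If(x >= 0, x, -x)          # the "obvious" specification

# prove() searches for a counterexample; "proved" means none exists
# over all 2**32 inputs (both sides agree even on the INT_MIN case).
prove(branch_free_abs == spec_abs)
```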

Formal verification is perhaps best known for its use in verifying hardware components (especially since the FDIV bug that cost Intel $500 million), but it is also used to verify a variety of software programs (in part or in whole), including flight control systems (Miller et al. 2005), railway control systems (Platzer & Quesel 2009), pacemakers (Tuan et al. 2010), compilers (Leroy 2009), operating system kernels (Andronick 2011), multi-agent systems (Raimondi 2006), outdoor robots (Proetzsch et al. 2007), and swarm robotics (Dixon et al. 2012).

Unfortunately, formal methods face severe limitations. Fox (1993) explains:

there are severe limitations on the capability of formal design techniques to completely prevent hazardous situations from arising. Current formal design methods are difficult to use and time-consuming, and may only be practical for relatively modest applications. Even if we reserve formal techniques for the safety-critical elements of the system we have seen that the soundness guaranteed by the techniques can only be as good as the specifier’s ability to anticipate the conditions and possible hazards that can hold at the time of use… These problems are difficult enough for ‘closed systems’ in which the designer can be confident, in principle, of knowing all the parameters which can affect system performance… Unfortunately all systems are to a greater or lesser extent ‘open’; they operate in an environment which cannot be exhaustively monitored and in which unpredictable events will occur. Furthermore, reliance on specification and verification methods assumes that the operational environment will not compromise the correct execution of software. In fact of course software errors can be caused by transient faults causing data loss or corruption; user errors; interfacing problems with external systems (such as databases and instruments); incompatibilities between software versions; and so on.12

Bowen & Hinchey (1995) concur:

There are many… areas where, although possible, formalization is just not practical from a resource, time, or financial aspect. Most successful formal methods projects involve the application of formal methods to critical portions of system development. Only rarely are formal methods, and formal methods alone, applied to all aspects of system development. Even within the CICS project, which is often cited as a major application of formal methods… only about a tenth of the entire system was actually subjected to formal techniques…

[We suggest] the following maxim: system development should be as formal as possible, but not more formal.

For more on the use of formal methods for AI safety, see Rushby & Whitehurst (1989); Bowen & Stavridou (1993); Harper (2000); Spears (2006); Fischer et al. (2013).13

Some complications and open questions

The common view of transparency and AI safety articulated above suggests an opportunity for differential technological development. To increase the odds that future AI systems will be safe and reliable, we could disproportionately invest in transparent AI methods, and in techniques for increasing the transparency of typically opaque AI methods.

However, this common view comes with some serious caveats and some difficult open questions. For example:

  1. How does the transparency of a method change with scale? A 200-rule logical AI might be more transparent than a 200-node Bayes net, but what if we’re comparing 100,000 rules vs. 100,000 nodes? At least we can query the Bayes net to ask “what it believes about X” (see the sketch after this list), whereas we can’t necessarily do so with the logic-based system.
  2. Do the categories above really “carve reality at its joints” with respect to transparency? Does a system’s status as a logic-based system or a Bayes net reliably predict its transparency, given that in principle we can use either one to express a probabilistic model of the world?
  3. How much of a system’s transparency is “intrinsic” to the system, and how much of it depends on the quality of the user interface used to inspect it? How big a “transparency boost” can different kinds of systems gain from a well-designed user interface?14
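
On the “query” point in item 1 above, here’s a minimal sketch of what querying a Bayes net’s beliefs looks like: exact inference by enumeration over a tiny, made-up rain/sprinkler/wet-grass network (all variable names and probabilities are illustrative).

```python
# Minimal sketch of querying "what a Bayes net believes about X":
# exact inference by enumeration over a toy Rain -> Sprinkler,
# (Rain, Sprinkler) -> GrassWet network. All numbers are made up.
from itertools import product

P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: {True: 0.01, False: 0.99},    # P(S | R=True)
               False: {True: 0.40, False: 0.60}}   # P(S | R=False)
P_WET = {(True, True): 0.99, (True, False): 0.90,  # P(W=True | R, S)
         (False, True): 0.80, (False, False): 0.01}

def joint(r, s, w):
    """P(R=r, S=s, W=w), factored along the network structure."""
    p_w = P_WET[(r, s)]
    return P_RAIN[r] * P_SPRINKLER[r][s] * (p_w if w else 1 - p_w)

# Query: what does the net believe about Rain, given the grass is wet?
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print(f"P(Rain=True | GrassWet=True) = {num / den:.3f}")
```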

Acknowledgements

My thanks to John Fox, Jacob Steinhardt, Paul Christiano, Carl Shulman, Eliezer Yudkowsky, and others for their helpful feedback.


  1. Quote from Nusser (2009). Emphasis added. The original text contains many citations, which have been removed in this post for readability. See also Schultz & Cronin (2003), which makes this point by graphing several AI methods along two axes: robustness and transparency. Their graph is available here. In their terminology, a method is “robust” to the degree that it is flexible and useful on a wide variety of problems and data sets. On the graph, GA means “genetic algorithms,” NN means “neural networks,” PCA means “principal components analysis,” PLS means “partial least squares,” and MLR means “multiple linear regression.” In this sample of AI methods, the trend is clear: the most robust methods tend to be the least transparent. Schultz & Cronin graphed only a tiny sample of AI methods, but the trend holds more broadly.
  2. I’ll share some additional quotes about the importance of transparency in intelligent systems. Kröske et al. (2009) write that, to trust a machine intelligence, “human operators need to be able to understand [its] reasoning process and the factors that precipitate certain actions.” Similarly, Fox (1993) writes: “Many branches of engineering have gone beyond purely empirical testing [for safety]… because they have developed strong design theories… The consequence is that designers can confidently predict failure modes, performance boundary conditions, and so on before the system is implemented… [A] promising route [to obtaining these benefits in AI] may be to use well-defined specification languages and verification procedures. Van Harmelen & Balder (1992) [list some] advantages of using formal languages… [including] the removal of ambiguity… [and] the ability to derive properties of the design in the absence of an implementation.” In their preface, Fox & Das (2000) write: “Our first obligation is to try to ensure that the designs of our systems are sound. We need to ask not only ‘do they work?’ but also ‘do they work for good reasons?’ Unfortunately, conventional software design is frequently ad hoc, and AI software design is little better and possibly worse… Consequently, we place great emphasis on clear design principles, strong mathematical foundations for these principles, and effective development tools that support and verify the integrity of the system… We are creating a powerful technology [AI], possibly more quickly than we think, that has unprecedented potential to create havoc as well as benefit. We urge the community to embark on a vigorous discussion of the issues and the creation of an explicit ‘safety culture’ in the field.”
  3. Emphasis added. The first paragraph is from Wikipedia’s black box page; the second paragraph is from its white box page. The term “gray box” is sometimes used to refer to methods intermediate in transparency between “fully black box” and “fully transparent” methods: see e.g. Sohlberg (2003).
  4. Hence, if we could build a whole brain emulation today, it would mostly be a black-box system, even though all of its information would be stored in a computer, searchable with database tools, and so on. However, much progress in cognitive neuroscience will probably be made before a WBE can actually be built, and rapid progress in cognitive neuroscience would also likely follow WBE, so the human brain would quickly become more transparent.
  5. For more discussion of how machine learning can be used for relatively “transparent” ends, for example to learn the structure of a Bayesian network, see Murphy (2012), ch. 26.
  6. Li & Peng (2006)make the same point: “conventional neural networks… lack transparency, as their activation functions (AFs) and their associated neural parameters bear very little physical meaning.” See alsoWoodman et al. (2012)‘在个人机器人技术的背景下对这个问题的评论:“在自动机器人的要求中……是一定程度的鲁棒性。这意味着能够处理错误并在异常条件下继续操作……在动态环境中,机器人经常会发现自己处于多种以前看不见的情况。迄今为止,该领域的大多数研究通过使用学习算法(通常亚博体育官网以人工神经网络(ANN)的实现)解决了这一问题……但是,作为Nehmzow等。(2004)identify, these implementations, although seemingly effective, are difficult to analyse due to the inherent opacity of connection-based algorithms. This means that it is difficult to produce an intelligible model of the system structure that could be used in safety analysis.”
  7. Murphy (2012), p. 995, writes: “When we look at the brain, we seem to see many levels of processing. It is believed that each level is learning features or representations at increasing levels of abstraction. For example, the standard model of the visual cortex… suggests that (roughly speaking) the brain first extracts edges, then patches, then surfaces, then objects, etc… This observation has inspired a recent trend in machine learning known as deep learning… which attempts to replicate this kind of architecture in a computer.”
  8. It is commonly thought that Bayesian networks are more transparent than ANNs, but this is only partially true. A Bayes net with hundreds of nodes that bear no relation to intuitive human concepts isn’t necessarily more transparent than a large ANN.
  9. For an overview of this work, see Nusser (2009), section 2.2.3. Also see Pulina & Tacchella (2011). Finally, Ng (2011), sec. 4, notes that we can get a sense of what function an ANN has learned by asking which inputs would maximize the activation of particular nodes. In his example, Ng uses this technique to visualize which visual features have been learned by a sparse autoencoder trained on image data.
  10. Wooldridge (2003) agrees, writing that transparency is another advantage of logic-based approaches.
  11. For an overview of formal methods in general, see Bozzano & Villafiorita (2010); Woodcock et al. (2009); Gogolla (2006); Bowen & Hinchey (2006). For more on the general application of safety engineering theory to AI, see Fox (1993); Yampolskiy & Fox (2013); Yampolskiy (2013).
  12. Another good point Fox makes is that normal AI safety engineering techniques rely on the design team’s ability to predict all circumstances that might hold in the future: “…one might conclude that using a basket of safety methods (hazard analysis, formal specification and verification, rigorous empirical testing, fault tolerant design) will significantly decrease the likelihood of hazards and disasters. However, there is at least one weakness common to all these methods. They rely on the design team being able to make long-range predictions about all the… circumstances that may hold when the system is in use. This is unrealistic, if only because of the countless interactions that can occur… [and] the scope for unforeseeable interactions is vast.”
  13. See also this program at the University of Bristol.
  14. As an aside, I’ll briefly remark that user interface confusion has contributed to many computer-related failures in the past. For example, Neumann (1994) reports on the case of Iran Air Flight 655, which was shot down by U.S. forces due (partly) to the unclear user interface of the USS Vincennes’ Aegis missile system. Changes to the interface were subsequently recommended. For other UI-related disasters, see Neumann’s extensive page on Illustrative Risks to the Public in the Use of Computer Systems and Related Technology.