Goodhart’s Law states that “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” However, this is not a single phenomenon. In Goodhart Taxonomy, I proposed (at least) four different mechanisms through which proxy measures break when you optimize for them: regressional, extremal, causal, and adversarial.
David Manheim has now helped write up my taxonomy in a paper that goes into more detail on these mechanisms: “Categorizing Variants of Goodhart’s Law.” From the conclusion:
This paper represents an attempt to categorize a class of simple statistical misalignments that occur both in any algorithmic system used for optimization, and in many human systems that rely on metrics for optimization. The dynamics highlighted are hopefully useful to explain many situations of interest in policy design, in machine learning, and in specific questions about AI alignment.
In policy, these dynamics are commonly encountered but too-rarely discussed clearly. In machine learning, these errors include extremal Goodhart effects due to using limited data and choosing overly parsimonious models, errors that occur due to myopic consideration of goals, and mistakes that occur when ignoring causality in a system. Finally, in AI alignment, these issues are fundamental to both aligning systems towards a goal, and assuring that the system’s metrics do not have perverse effects once the system begins optimizing for them.
Let V refer to the true goal, and let U refer to a proxy for that goal which was observed to correlate with V and which is being optimized in some way. The four subtypes of Goodhart’s Law are then as follows:
Regressional Goodhart — When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.
- Model: When U is equal to V+X, where X is some noise, a point with a large U value will likely have a large V value, but also a large X value. Thus, when U is large, you can expect V to be predictably smaller than U. (A simulation sketch follows the example below.)
- Example: Height is correlated with basketball ability, and does actually directly help, but the best player is only 6'3", and a random 7' person in their 20s would probably not be as good.
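To make the regression effect concrete, here is a minimal simulation sketch in Python (my own illustration, not code from the post or the paper), assuming V and X are independent standard normals:

```python
# Regressional Goodhart: U = V + X, then select on large U.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
V = rng.normal(size=n)   # true goal
X = rng.normal(size=n)   # independent noise
U = V + X                # observed proxy

# Select the top 1% of points by the proxy U.
top = U >= np.quantile(U, 0.99)

print(f"mean U among selected: {U[top].mean():.2f}")
print(f"mean V among selected: {V[top].mean():.2f}")
# With V and X independent standard normals, E[V | U] = U/2, so the
# selected points' V values are predictably about half their U values.
```

The gap between the two printed means is the “difference between the proxy and the goal” that selection on U optimizes for.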
Extremal Goodhart — Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between the proxy and the goal was observed.
- Model: Patterns tend to break at simple joints. One simple subset of worlds is those worlds in which U is very large. Thus, a strong correlation between U and V observed for naturally occurring U values may not transfer to worlds in which U is very large. Further, since there may be relatively few naturally occurring worlds in which U is very large, extremely large U values can coincide with small V values without breaking the statistical correlation. (A simulation sketch follows the example below.)
- Example: The tallest person on record, Robert Wadlow, was 8’11” (2.72m). He grew to that height because of a pituitary disorder; he would have struggled to play basketball because he “required leg braces to walk and had little feeling in his legs and feet.”
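The Wadlow example can be mimicked with a toy mixture model, again my own sketch rather than anything from the paper: most points follow the usual U-V relationship, while a rare second mechanism (the analogue of the pituitary disorder) produces the most extreme U values together with low V values.

```python
# Extremal Goodhart: the U-V correlation holds in the common regime,
# but the most extreme U values come from a different mechanism.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Common regime: the proxy tracks the goal.
V = rng.normal(size=n)
U = V + 0.5 * rng.normal(size=n)

# Rare regime (0.1% of points): extreme proxy values, low goal values.
rare = rng.random(n) < 0.001
U[rare] = rng.normal(loc=8.0, scale=1.0, size=rare.sum())
V[rare] = rng.normal(loc=-1.0, scale=1.0, size=rare.sum())

print(f"overall corr(U, V): {np.corrcoef(U, V)[0, 1]:.2f}")  # still strongly positive
top = U >= np.quantile(U, 0.9999)
print(f"mean V in the extreme-U tail: {V[top].mean():.2f}")  # near -1, not large
```

Because the rare regime contributes so few points, it leaves the overall correlation essentially intact while dominating the extreme tail.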
Causal Goodhart — When there is a non-causal correlation between the proxy and the goal, intervening on the proxy may fail to intervene on the goal.
- Model: If V causes U (or if V and U are both caused by some third thing), then a correlation between U and V may be observed. However, when you intervene to increase U through some mechanism that does not involve V, you will fail to also increase V. (A simulation sketch follows the example below.)
- Example: Someone who wishes to be taller might observe that height is correlated with basketball skill and decide to start practicing basketball.
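A sketch of the causal case (my illustration; the do(U = u) intervention notation is borrowed from the causal-inference literature): V causes U, so the two correlate observationally, but setting U directly severs that link and leaves V unchanged.

```python
# Causal Goodhart: V causes U; intervening on U does not move V.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def sample_world(intervene_on_U=None):
    V = rng.normal(size=n)                     # goal, generated first
    if intervene_on_U is None:
        U = V + 0.5 * rng.normal(size=n)       # proxy, caused by V
    else:
        U = np.full(n, float(intervene_on_U))  # do(U = u): severs the link from V
    return U, V

U, V = sample_world()
print(f"observational corr(U, V): {np.corrcoef(U, V)[0, 1]:.2f}")  # ~0.89

U, V = sample_world(intervene_on_U=3.0)
print(f"mean V after do(U = 3): {V.mean():.2f}")  # still ~0: V is unchanged
```

In the basketball example, practicing is exactly such an intervention on the proxy: it raises measured skill without touching height.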
Adversarial Goodhart — When you optimize for a proxy, you provide an incentive for adversaries to correlate their goal with your proxy, thus destroying the correlation with your goal.
- Model: Consider an agent A with some different goal W. Since they depend on common resources, W and V are naturally opposed. If you optimize U as a proxy for V, and A knows this, A is incentivized to make large U values coincide with large W values, thus stopping them from coinciding with large V values. (A simulation sketch follows the example below.)
- Example: Aspiring NBA players might just lie about their height.
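Finally, a toy version of the adversarial case (again my own illustration, with arbitrarily chosen numbers): an agent whose goal W is opposed to V controls what the proxy reports, so selecting on the gamed proxy selects for W rather than V.

```python
# Adversarial Goodhart: an agent with opposed goal W games the proxy U.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

V = rng.normal(size=n)             # your goal
W = -V + 0.5 * rng.normal(size=n)  # adversary's goal, opposed via shared resources

# Before: the proxy honestly tracks V.
U_honest = V + 0.5 * rng.normal(size=n)
top = U_honest >= np.quantile(U_honest, 0.99)
print(f"mean V among selected (honest proxy): {V[top].mean():.2f}")  # large

# After: the adversary reports U so that large U coincides with large W.
U_gamed = W + 0.1 * rng.normal(size=n)
top = U_gamed >= np.quantile(U_gamed, 0.99)
print(f"mean V among selected (gamed proxy):  {V[top].mean():.2f}")  # strongly negative
```

The proxy still “works” in the sense that someone scores highly on it; those high scores just no longer coincide with large V.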
For more on this topic, see Eliezer Yudkowsky’s write-up, Goodhart’s Curse.