December 2019 Newsletter
From now through the end of December, MIRI's 2019 Fundraiser is live! See our fundraiser post for updates on our past year and future plans.
One of our biggest updates, I'm happy to announce, is that we've hired five new research staff, with a sixth to join us in February. For details, see Workshops and Scaling Up in the fundraiser post.
Also: Facebook's Giving Tuesday matching opportunity is tomorrow at 5:00am PT! See Colm's post for details on how to get your donation matched.
Other updates
- Our most recent hire, “Risks from Learned Optimization” co-author Evan Hubinger, describes what he'll be doing at MIRI. See also Nate Soares' comment on how MIRI does nondisclosure-by-default.
- Buck Shlegeris discusses EA residencies as an outreach opportunity.
- OpenAI releases Safety Gym, a set of tools and environments for incorporating safety constraints into RL tasks.
- CHAI is seeking interns; the application deadline is December 15.
Thoughts from the research team
This month, I'm trying something new: quoting MIRI researchers' summaries and thoughts on recent AI safety write-ups.
I've left out names so that these can be read as a snapshot of people's impressions, rather than a definitive “Ah, researcher X believes Y!” Just keep in mind that these will be a small slice of thoughts from staff I've recently spoken to, not anything remotely like a consensus take.
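- Re Will transparency help catch deception? - “A good discussion of an important topic. Matthew Barnett suggests that any weakness in a transparency tool may turn it into a detrimental middle-man, and that directly training a supervisor to catch deception may be preferable.”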
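- Re Chris Olah's views on AGI safety - “I very much agree with Evan Hubinger's idea that collecting different perspectives (different ‘hats’) is a useful thing to do. Chris Olah's take on transparency is good to see. The concept of microscope AI seems like a useful one, and Olah's vision of how the ML field could be usefully shifted is quite interesting.”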
- Re Defining AI wireheading - “Stuart Armstrong takes a shot at making a principled distinction between wireheading and the rest of Goodhart.”
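- Re How common is it to have a 3+ year lead? - “This seems like a pretty interesting question for AI progress models. The expected lead time and the expected takeoff speed greatly influence how plausible winner-take-all dynamics are.”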
- Re Thoughts on Implementing Corrigible Robust Alignment - “Steve Byrnes provides a decent overview of some issues around getting ‘pointer’ type values.”