December 2019 Newsletter

December 5, 2019 | Rob Bensinger | Newsletters

From now through the end of December, MIRI's 2019 Fundraiser is live! See our fundraiser post for updates on our past year and future plans.

One of our biggest updates, I'm happy to announce, is that we've hired five new research staff, with a sixth to join us in February. For details, see Workshops and Scaling Up in the fundraiser post.

Also: Facebook's Giving Tuesday matching opportunity is tomorrow at 5:00am PT! See Colm's post for details on how to get your donation matched.

Other updates

Our most recent hire, “Risks from Learned Optimization” co-author Evan Hubinger, describes what he'll be doing at MIRI. See also Nate Soares' comment on how MIRI does nondisclosure-by-default.
Buck Shlegeris discusses EA residencies as an outreach opportunity.
OpenAI releases Safety Gym, a set of tools and environments for incorporating safety constraints into RL tasks.
CHAI is seeking interns; the application deadline is December 15.

Thoughts from the research team

This month, I'm trying something new: quoting MIRI researchers' summaries and thoughts on recent AI safety write-ups.

I've left out names so that these can be read as a snapshot of people's impressions, rather than a definitive “Ah, researcher X believes Y!” Just keep in mind that these will be a small slice of thoughts from staff I've recently spoken to, not anything remotely like a consensus take.

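Re Will transparency help catch deception? - “A good discussion of an important topic. Matthew Barnett suggests that any weakness in a transparency tool may turn it into a detrimental middle-man, and that directly training a supervisor to catch deception may be preferable.”
Re Chris Olah's views on AGI safety - “I very much agree with Evan Hubinger's idea that collecting different perspectives (different ‘hats’) is a useful thing to do. Chris Olah's take on transparency is good to see. The concept of microscope AI seems like a useful one, and Olah's vision of how the ML field could be usefully shifted is quite interesting.”
Re Defining AI wireheading - “Stuart Armstrong takes a shot at making a principled distinction between wireheading and the rest of Goodhart.”
Re How common is it to have a 3+ year lead? - “This seems like a pretty interesting question for AI progress models. The questions of expected lead time and expected takeoff speed greatly influence how plausible winner-take-all dynamics are.”
Re Thoughts on Implementing Corrigible Robust Alignment - “Steve Byrnes provides a decent overview of some issues around getting ‘pointer’ type values.”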

MIRI’s 2019 Fundraiser

December 2, 2019 | Malo Bourgon | News

MIRI’s 2019 fundraiser is concluded.

Over the past two years, huge donor support has helped us double the size of our AI alignment research team. Hitting our $1M fundraising goal this month will put us in a great position to continue our growth in 2020 and beyond, recruiting as many brilliant minds as possible to take on what appear to us to be the central technical obstacles to alignment.

Our fundraiser progress, updated in real time (including donations and matches made during the Facebook Giving Tuesday event):

Giving Tuesday 2019

November 28, 2019 | Colm Ó Riain | News

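Update January 25, 2020: $77,325 was donated to MIRI through Facebook on Giving Tuesday. $45,915 of this was donated within 13.5 seconds of the Facebook matching event starting at 5:00am PT and was matched by Facebook. Thank you to everyone who set their clocks early to support us so generously! Thanks also to the EA Giving Tuesday and Rethink Charity teams for their amazing efforts on behalf of the EA community.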

Update December 2, 2019: This page has been updated to reflect (a) observed changes in Facebook’s flow for donations of $500 or larger (b) additional information on securing matching for donations of $2,500 or larger during Facebook’s matching event and (c) a pointer to Paypal’s newly announced, though significantly smaller, matching event(s). Please check in here for more updates before the Facebook Matching event begins at 5am PT on December 3.

MIRI’s annual fundraiser begins this Monday, December 2, 2019, and Giving Tuesday takes place the next day; starting at 5:00:00am PT (8:00:00am ET) on December 3, Facebook will match donations made on fundraiser pages on their platform up to a total of $7,000,000. This post focuses on this Facebook matching event. (You can find information on Paypal’s significantly smaller matching events in the footnotes.¹)

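Donations made during Facebook’s Giving Tuesday event will be matched dollar for dollar on a first-come, first-served basis until the $7,000,000 in matching funds is used up. Based on trends in previous years, this may happen within 10 seconds.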
Any US-based 501(c)(3) nonprofit eligible to receive donations on Facebook, e.g. MIRI, can be matched.
Facebook will match up to a total of $100,000 per nonprofit organization.
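Each donor can have up to $20,000 in eligible donations matched on Giving Tuesday. There is a default limit of $2,499 per donation. Donors who wish to donate more than $2,499 have multiple strategies available (see below) to increase the chances of their donations being matched.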

In 2018, Facebook’s matching pool of $7M was exhausted within 16 seconds of the event starting and in that time, 66% of our lightning-fast donors got their donations to MIRI matched, securing a total of $40,072 in matching funds. This year, we’re aiming for the per-organization $100,000 maximum and since it’s highly plausible that this year’s matching event will end within 4-10 seconds, here are some tips to improve the chances of your donation to MIRI’s Fundraiser Page on Facebook being matched.

Pre-Event Preparation (before Giving Tuesday)

Confirm your FB account is operational.
Add your preferred credit card(s) as payment method(s) in your FB settings page. Credit cards are plausibly mildly preferable to Paypal as a payment option in terms of donation speed.
Test your payment method(s) ahead of time by donating small amount(s) to MIRI’s Fundraiser page.
If your credit card limit is lower than the amount you’re considering donating, it may be possible to (a) overpay the balance ahead of time and/or (b) call your credit card asking them to (even temporarily) increase your limit.
If you’re planning on donating more than $2,499, see below for instructions.
Sync whatever clock you’ll be using with time.is.
Consider pledging your donation to MIRI at EA Giving Tuesday.²

Donating on Giving Tuesday

On Tuesday, December 3, BEFORE 5:00:00am PT (it’s advisable to be alert and ready 10-20 minutes before the event), prepare your donation, so you can make your donation with a single click when the event begins at 5:00:00am PT.

Open an accurate clock at time.is.
In a different browser window, open MIRI’s Fundraiser Page on Facebook.
Click on the Donate button.
In the “Donate” popup that surfaces:

Enter your donation amount, between $5 and $2,499. See below for larger donations.
Choose whichever card you want to donate with.
Optionally enter a note and/or adjust the donation visibility.

At 05:00:00 PST, click on the Green Donate button. If your donation amount is $500 or larger, you may be presented with an additional “Confirm Your Donation” dialog. If so, click on its Donate button as quickly as possible.

Larger Donations

By default, Facebook places a limit of $2,499 per donation (in the US³), and will match up to $20,000 per donor. If you’re in a position to donate $2,500 or more to MIRI, you can:

Use multiple browser windows/tabs for each individual donation: open up MIRI’s Fundraiser Page on Facebook in as many tabs as you need in your browser and follow the instructions above in each window/tab so you have multiple Donate buttons ready to click, one in each tab. Then at 5:00:00am PT, channel your lightning and click as fast as you can; one of our star supporters last year made 8 donations within 21 seconds, 5 of which were matched.
and/or
Before the event (ideally not the morning of), follow EA Giving Tuesday’s instructions on how to increase your per-donation limit on Facebook above $2,499. Our friends at EA Giving Tuesday estimate that “you are likely to be able to successfully donate up to $9,999 per donation” after following these instructions. Their analysis also suggests that going higher than $10,000 for an individual donation plausibly significantly increases the probability of being declined, and they therefore advise not going beyond $9,999 per donation. It is possible that Facebook may put a cap of $2,499 on individual donations closer to the event.

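By combining the approaches above, it is possible, for example, to make two donations in separate browser windows within the first few seconds of the event.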
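As a rough illustration of the arithmetic behind the limits described in this section, the sketch below splits a planned gift into individual donations that each fit under a per-donation limit, and reports how much of it could be matched under the per-donor cap. The function name `plan_donations` and its parameters are hypothetical and exist only for this example; the defaults ($2,499 per donation, $20,000 matched per donor) are the figures quoted above. It ignores the $5 minimum donation, and actual matching also depends on the $100,000 per-nonprofit cap and on the matching pool not having run out.

```python
def plan_donations(total, per_donation_limit=2499, per_donor_match_cap=20000):
    """Split a planned gift (whole US dollars) into donations under the limit.

    Hypothetical helper, for illustration only. Returns a tuple of
    (list of individual donation amounts, maximum amount that could be matched).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if per_donation_limit <= 0:
        raise ValueError("per_donation_limit must be positive")
    # As many full-size donations as fit, plus one smaller remainder if needed.
    full_count, remainder = divmod(total, per_donation_limit)
    donations = [per_donation_limit] * full_count
    if remainder:
        donations.append(remainder)
    # Matching is capped per donor, however the gift is split.
    max_matchable = min(total, per_donor_match_cap)
    return donations, max_matchable


if __name__ == "__main__":
    # Default $2,499 limit: a $6,000 gift needs three separate donations.
    print(plan_donations(6000))
    # Assuming a raised $9,999 limit: only $20,000 of a $25,000 gift can be matched.
    print(plan_donations(25000, per_donation_limit=9999))
```

Each entry in the returned list corresponds to one browser tab prepared ahead of time, which is why a raised per-donation limit reduces the number of clicks needed in the first seconds of the event.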

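¹ Paypal is holding matching events in the US, Canada, and the UK.
² Thanks to Ari, William, Rethink Charity, and EA Giving Tuesday for all their work helping EA organizations maximize their share of Facebook’s matching funds.
³ For up-to-date information on Facebook’s donation limits outside the US, see EA Giving Tuesday’s doc.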

November 2019 Newsletter

November 25, 2019 | Rob Bensinger | Newsletters

I'm happy to announce that Nate Soares and Ben Levinstein's “Cheating Death in Damascus” has been accepted for publication in The Journal of Philosophy (previously voted the second-highest-quality journal in philosophy).

In other news, MIRI researcher Buck Shlegeris has written over 12,000 words on a variety of MIRI-relevant topics on the EA Forum. (Example topics: advice for software engineers; what alignment plans look like; and decision theory.)

Other updates

Abram Demski's The Parable of Predict-O-Matic is a great read: the predictor/optimizer issues it covers are deep, but I expect a fairly wide range of readers to enjoy it and get something out of it.
Evan Hubinger's Gradient Hacking describes an important failure mode that hadn't previously been articulated.
Vanessa Kosoy's LessWrong shortform has recently discussed some especially interesting topics related to her learning-theoretic agenda.
Stuart Armstrong's All I Know Is Goodhart constitutes nice conceptual progress on expected value maximizers that are aware of Goodhart's law and trying to avoid it.
Reddy, Dragan, and Levine's paper on modeling human intent cites (of all things) Harry Potter and the Methods of Rationality as inspiration.

News and links

Artificial Intelligence Research Needs Responsible Publication Norms: Crootof provides a good review of the issue on Lawfare.
Stuart Russell's new book is out: Human Compatible: Artificial Intelligence and the Problem of Control (excerpt). Rohin Shah's review does an excellent job of contextualizing Russell's views within the larger AI safety ecosystem, and Rohin highlights the quote:

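Fortunately, the task is not the following: given a machine that possesses a high degree of intelligence, work out how to control it. If that were the task, we would be toast. A machine viewed as a black box might as well have arrived from outer space. And our chances of controlling a superintelligent entity from outer space are roughly zero. Similar arguments apply to methods of creating AI systems that guarantee we won't understand how they work; these methods include whole-brain emulation - creating electronic copies of human brains - as well as methods based on simulated evolution of programs. I won't say more about these proposals because they are so obviously a bad idea.
Jacob Steinhardt releases an AI Alignment Research Overview.
Patrick LaVictoire's AlphaStar: Impressive for RL Progress, Not for AGI Progress raises some important questions about the capabilities of today's state-of-the-art systems.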

October 2019 Newsletter

October 25, 2019 | Rob Bensinger | Newsletters

September 2019 Newsletter

September 30, 2019 | Rob Bensinger | Newsletters

August 2019 Newsletter

August 6, 2019 | Rob Bensinger | Newsletters

July 2019 Newsletter

July 19, 2019 | Rob Bensinger | Newsletters

Hubinger et al.'s “Risks from Learned Optimization in Advanced Machine Learning Systems”, one of our new core resources on the alignment problem, is now available on arXiv, the AI Alignment Forum, and LessWrong.

In other news, we received an Ethereum donation worth $230,910 from Vitalik Buterin — the inventor and co-founder of Ethereum, and now our third-largest all-time supporter!

Also worth highlighting, from the Open Philanthropy Project's Claire Zabel and Luke Muehlhauser: there's a pressing need for security professionals in AI safety and biosecurity.

It’s more likely than not that within 10 years, there will be dozens of GCR-focused roles in information security, and some organizations are already looking for candidates that fit their needs (and would hire them now, if they found them).

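Some people who are focused on high-impact careers (as many effective altruists are) are well suited to helping meet this need by gaining infosec expertise and experience and then moving into work at the relevant organizations.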

Other updates

Mesa Optimization: What It Is, And Why We Should Care - Rohin Shah's consistently excellent Alignment Newsletter discusses “Risks from Learned Optimization…” and other recent AI safety work.
MIRI Research Associate Stuart Armstrong has released his Research Agenda v0.9: Synthesising a Human's Preferences into a Utility Function.
OpenAI and MIRI staff help talk Munich student Connor Leahy out of releasing an attempted replication of OpenAI's GPT-2 model. (LessWrong discussion.) Although Leahy's replication attempt wasn't successful, write-ups like his suggest that OpenAI's careful discussion surrounding GPT-2 is continuing to prompt good reassessments of publishing norms within ML. Quoting from Leahy's postmortem:

Sometime in the future we will have reached a point where the consequences of our research are beyond what we can discover in a one-week evaluation cycle. And given my recent experiences with GPT2, we might already be there. The more complex and powerful our technology becomes, the more time we should be willing to spend in evaluating its consequences. And if we have doubts about safety, we should default to caution.

We tend to live in an ever-accelerating world. Both the industrial and academic R&D cycles have grown only faster over the decades. Everyone wants “the next big thing” as fast as possible. And with the way our culture is now, it can be hard to resist the pressures to adapt to this accelerating pace. Your career can depend on being the first to publish a result, as can your market share.

As a community and as a society, we need to combat this trend and create a healthy cultural environment that allows researchers to take their time. They shouldn’t have to fear repercussions or ridicule for delaying release. Postponing a release because of added evaluation should be the norm rather than the exception. We need to make it commonly accepted that we as a community respect others’ safety concerns and don’t penalize them for having such concerns, even if they ultimately turn out to be wrong. If we don’t do this, it will be a race to the bottom in terms of safety precautions.
From Abram Demski: Selection vs. Control; Does Bayes Beat Goodhart?; and Conceptual Problems with Updateless Decision Theory and Policy Selection.
Vox's Future Perfect podcast interviews Jaan Tallinn and discusses MIRI's role in originating and spreading AI safety memes.
The AI Does Not Hate You, journalist Tom Chivers' well-researched book about the rationality community and AI risk, is out in the UK.

News and links

Other recent AI safety write-ups: David Krueger's Let's Talk About “Convergent Rationality”; Paul Christiano's Aligning a Toy Model of Optimization; and Owain Evans, William Saunders, and Andreas Stuhlmüller's Machine Learning Projects on Iterated Distillation and Amplification.
From DeepMind: Vishal Maini puts together an AI reading list, Victoria Krakovna recaps the ICLR Safe ML workshop, and Pushmeet Kohli discusses AI safety on the 80,000 Hours Podcast.
The EA Foundation is awarding grants for “efforts to reduce risks of astronomical suffering (s-risks) from advanced artificial intelligence”; apply by Aug. 11.
Additionally, if you're a young AI safety researcher (with a PhD) based at a European university or nonprofit, you may want to apply for ~$60,000 in funding from the Bosch Center for AI.
