E Hubinger. 2020. “An Overview of 11 Proposals for Building Safe Advanced AI.” arXiv:2012.07532 [cs.LG].
E Hubinger, C van Merwijk, V Mikulik, J Skalse, and S Garrabrant. 2019. “Risks from Learned Optimization in Advanced Machine Learning Systems.” arXiv:1906.01820 [cs.AI].
V Kosoy. 2019. “Delegative Reinforcement Learning: Learning to Avoid Traps with a Little Help.” Presented at the Safe Machine Learning workshop at ICLR.
A Demski and S Garrabrant. 2019. “Embedded Agency.” arXiv:1902.09469 [cs.AI].


S Armstrong and S Mindermann. 2018. “Occam’s Razor is Insufficient to Infer the Preferences of Irrational Agents.” In Advances in Neural Information Processing Systems 31.
D Manheim and S Garrabrant. 2018. “Categorizing Variants of Goodhart’s Law.” arXiv:1803.04585 [cs.AI].


R Carey. 2018. “Incorrigibility in the CIRL Framework.” arXiv:1709.06275 [cs.AI]. Paper presented at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society.
S Garrabrant, T Benson-Tilsen, A Critch, N Soares, and J Taylor. 2017. “A Formal Approach to the Problem of Logical Non-Omniscience.” Paper presented at the 16th conference on Theoretical Aspects of Rationality and Knowledge.
K Grace, J Salvatier, A Dafoe, B Zhang, and O Evans. 2017. “When Will AI Exceed Human Performance? Evidence from AI Experts.” arXiv:1705.08807 [cs.AI].
V Kosoy. 2017. “Forecasting Using Incomplete Models.” arXiv:1705.04630 [cs.LG].
N Soares and B Levinstein. 2020. “Cheating Death in Damascus.” The Journal of Philosophy 117 (5): 237–266. Previously presented at the 14th Annual Formal Epistemology Workshop.
E Yudkowsky and N Soares. 2017. “Functional Decision Theory: A New Theory of Instrumental Rationality.” arXiv:1710.05060 [cs.AI].


T Benson-Tilsen and N Soares. 2016. “Formalizing Convergent Instrumental Goals.” Paper presented at the AAAI 2016 AI, Ethics and Society Workshop.
A Critch. 2019. “A Parametric, Resource-Bounded Generalization of Löb’s Theorem, and a Robust Cooperation Criterion for Open-Source Game Theory.” arXiv:1602.04184 [cs.GT]. The Journal of Symbolic Logic 84 (4): 1368–1381. Previously published as “Parametric Bounded Löb’s Theorem and Robust Cooperation of Bounded Agents.”
S Garrabrant, T Benson-Tilsen, A Critch, N Soares, and J Taylor. 2016. “Logical Induction.” arXiv:1609.03543 [cs.AI].
S Garrabrant, T Benson-Tilsen, A Critch, N Soares, and J Taylor. 2016. “Logical Induction (Abridged).” MIRI technical report 2016–2.
S Garrabrant, B Fallenstein, A Demski, and N Soares. 2016. “Inductive Coherence.” arXiv:1604.05288 [cs.AI]. Previously published as “Uniform Coherence.”
S Garrabrant, N Soares, and J Taylor. 2016. “Asymptotic Convergence in Online Learning with Unbounded Delays.” arXiv:1604.05280 [cs.LG].
V Kosoy and A Appel. 2020. “Optimal Polynomial-Time Estimators: A Bayesian Notion of Approximation Algorithm.” arXiv:1608.04112 [cs.CC]. Forthcoming in Journal of Applied Logics.
J Leike, J Taylor, and B Fallenstein. 2016. “A Formal Solution to the Grain of Truth Problem.” Paper presented at the 32nd Conference on Uncertainty in Artificial Intelligence.
L Orseau and S Armstrong. 2016. “Safely Interruptible Agents.” Paper presented at the 32nd Conference on Uncertainty in Artificial Intelligence.
K Sotala. 2016. “Defining Human Values for Value Learners.” Paper presented at the AAAI 2016 AI, Ethics and Society Workshop.
J Taylor. 2016. “Quantilizers: A Safer Alternative to Maximizers for Limited Optimization.” Paper presented at the AAAI 2016 AI, Ethics and Society Workshop.
J Taylor, E Yudkowsky, P LaVictoire, and A Critch. 2016. “Alignment for Advanced Machine Learning Systems.” MIRI technical report 2016–1.


B Fallenstein and R Kumar. 2015. “Proof-Producing Reflection for HOL: With an Application to Model Polymorphism.” In Interactive Theorem Proving: 6th International Conference, ITP 2015, Nanjing, China, August 24–27, 2015, Proceedings. Springer.
B Fallenstein and N Soares. 2015. “Vingean Reflection: Reliable Reasoning for Self-Improving Agents.” MIRI technical report 2015–2.
B Fallenstein, N Soares, and J Taylor. 2015. “Reflective Variants of Solomonoff Induction and AIXI.” In Proceedings of AGI 2015. Springer. Previously published as MIRI technical report 2015–8.
B Fallenstein, J Taylor, and P Christiano. 2015. “Reflective Oracles: A Foundation for Classical Game Theory.” arXiv:1508.04145 [cs.AI]. Previously published as MIRI technical report 2015–7. Published in abridged form as “Reflective Oracles: A Foundation for Game Theory in Artificial Intelligence” in Proceedings of LORI 2015.
S Garrabrant, S Bhaskar, A Demski, J Garrabrant, G Koleszarik, and E Lloyd. 2016. “Asymptotic Logical Uncertainty and the Benford Test.” arXiv:1510.03370 [cs.LG]. Paper presented at the Ninth Conference on Artificial General Intelligence. Previously published as MIRI technical report 2015–11.
K Grace. 2015. “The Asilomar Conference: A Case Study in Risk Mitigation.” MIRI technical report 2015–9.
K Grace. 2015. “Leó Szilárd and the Danger of Nuclear Weapons: A Case Study in Risk Mitigation.” MIRI technical report 2015–10.
P LaVictoire. 2015. “An Introduction to Löb’s Theorem in MIRI Research.” MIRI technical report 2015–6.
N Soares. 2015. “Aligning Superintelligence with Human Interests: An Annotated Bibliography.” MIRI technical report 2015–5.
N Soares. 2015. “Formalizing Two Problems of Realistic World-Models.” MIRI technical report 2015–3.
N Soares. 2018. “The Value Learning Problem.” In Artificial Intelligence Safety and Security. Chapman and Hall. Previously presented at the IJCAI 2016 Ethics for Artificial Intelligence workshop, and published earlier as MIRI technical report 2015–4.
N Soares and B Fallenstein. 2015. “Questions of Reasoning under Logical Uncertainty.” MIRI technical report 2015–1.
N Soares and B Fallenstein. 2015. “Toward Idealized Decision Theory.” arXiv:1507.01986 [cs.AI]. Previously published as MIRI technical report 2014–7. Published in abridged form as “Two Attempts to Formalize Counterpossible Reasoning in Deterministic Settings” in Proceedings of AGI 2015.
K Sotala. 2015. “Concept Learning for Safe Autonomous AI.” Paper presented at the AAAI 2015 Ethics and Artificial Intelligence Workshop.


S Armstrong, K Sotala, and S Ó hÉigeartaigh. 2014. “The Errors, Insights and Lessons of Famous AI Predictions – and What They Mean for the Future.” Journal of Experimental & Theoretical Artificial Intelligence 26 (3): 317–342.
M Bárász, P Christiano, B Fallenstein, M Herreshoff, P LaVictoire, and E Yudkowsky. 2014. “Robust Cooperation on the Prisoner’s Dilemma: Program Equilibrium via Provability Logic.” arXiv:1401.5577 [cs.GT].
T Benson-Tilsen. 2014. “UDT with Known Search Order.” MIRI technical report 2014–4.
N Bostrom and E Yudkowsky. 2018. “The Ethics of Artificial Intelligence.” In Artificial Intelligence Safety and Security. Chapman and Hall. Previously published in The Cambridge Handbook of Artificial Intelligence (2014).
P Christiano. 2014. “Non-Omniscience, Probabilistic Inference, and Metamathematics.” MIRI technical report 2014–3.
B Fallenstein. 2014. “Procrastination in Probabilistic Logic.” Working paper.
B Fallenstein and N Soares. 2014. “Problems of Self-Reference in Self-Improving Space-Time Embedded Intelligence.” In Proceedings of AGI 2014. Springer.
B Fallenstein and N Stiennon. 2014. “‘Loudness’: On Priors over Preference Relations.” Brief technical note.
P LaVictoire, B Fallenstein, E Yudkowsky, M Bárász, P Christiano, and M Herreshoff. 2014. “Program Equilibrium in the Prisoner’s Dilemma via Löb’s Theorem.” Paper presented at the AAAI 2014 Multiagent Interaction without Prior Coordination Workshop.
L Muehlhauser and N Bostrom. 2014. “Why We Need Friendly AI.” Think 13 (36): 42–47.
L Muehlhauser and B Hibbard. 2014. “Exploratory Engineering in AI.” Communications of the ACM 57 (9): 32–34.
C Shulman and N Bostrom. 2014. “Embryo Selection for Cognitive Enhancement: Curiosity or Game-Changer?” Global Policy 5 (1): 85–92.
N Soares. 2014. “Tiling Agents in Causal Graphs.” MIRI technical report 2014–5.
N Soares and B Fallenstein. 2014. “Botworld 1.1.” MIRI technical report 2014–2.
N Soares and B Fallenstein. 2017. “Agent Foundations for Aligning Machine Intelligence with Human Interests: A Technical Research Agenda.” In The Technological Singularity: Managing the Journey. Springer. Previously published as MIRI technical report 2014–8 under the name “Aligning Superintelligence with Human Interests: A Technical Research Agenda.”
N Soares, B Fallenstein, E Yudkowsky, and S Armstrong. 2015. “Corrigibility.” Paper presented at the AAAI 2015 Ethics and Artificial Intelligence Workshop. Previously published as MIRI technical report 2014–6.
E Yudkowsky. 2014. “Distributions Allowing Tiling of Staged Subjective EU Maximizers.” MIRI technical report 2014–1.


A Altair. 2013. “A Comparison of Decision Algorithms on Newcomblike Problems.” Working paper. MIRI.
S Armstrong, N Bostrom, and C Shulman. 2015. “Racing to the Precipice: A Model of Artificial Intelligence Development.” AI & Society (DOI 10.1007/s00146-015-0590-7): 1–6. Previously published as Future of Humanity Institute technical report 2013–1.
P Christiano, E Yudkowsky, M Herreshoff, and M Bárász. 2013. “Definability of ‘Truth’ in Probabilistic Logic.” Draft. MIRI.
B Fallenstein. 2013. “The 5-and-10 Problem and the Tiling Agents Formalism.” MIRI technical report 2013–9.
B Fallenstein. 2013. “Decreasing Mathematical Strength in One Formalization of Parametric Polymorphism.” Brief technical note. MIRI.
B Fallenstein. 2013. “An Infinitely Descending Sequence of Sound Theories Each Proving the Next Consistent.” MIRI technical report 2013–6.
B Fallenstein and A Mennen. 2013. “Predicting AGI: What Can We Say When We Know So Little?” Working paper. MIRI.
K Grace. 2013. “Algorithmic Progress in Six Domains.” MIRI technical report 2013–3.
J Hahn. 2013. “Scientific Induction in Probabilistic Metamathematics.” MIRI technical report 2013–4.
L Muehlhauser. 2013. “Intelligence Explosion FAQ.” Working paper. MIRI.
L Muehlhauser and L Helm. 2013. “Intelligence Explosion and Machine Ethics.” In Singularity Hypotheses. Springer.
L Muehlhauser and A Salamon. 2013. “Intelligence Explosion: Evidence and Import.” In Singularity Hypotheses. Springer. (Translations: Spanish, French, Italian)
L Muehlhauser and C Williamson. 2013. “Ideal Advisor Theories and Personal CEV.” Working paper. MIRI.
N Soares. 2013. “Fallenstein’s Monster.” MIRI technical report 2013–7.
K Sotala and R Yampolskiy. 2014. “Responses to Catastrophic AGI Risk: A Survey.” Physica Scripta 90 (1): 1–33. Previously published as MIRI technical report 2013–2.
N Stiennon. 2013. “Recursively-Defined Logical Theories Are Well-Defined.” MIRI technical report 2013–8.
R Yampolskiy and J Fox. 2013. “Artificial General Intelligence and the Human Mental Model.” In Singularity Hypotheses. Springer.
R Yampolskiy and J Fox. 2013. “Safety Engineering for Artificial General Intelligence.” Topoi 32 (2): 217–226.
E Yudkowsky. 2013. “Intelligence Explosion Microeconomics.” MIRI technical report 2013–1.
E Yudkowsky. 2013. “The Procrastination Paradox.” Brief technical note. MIRI.
E Yudkowsky and M Herreshoff. 2013. “Tiling Agents for Self-Modifying AI, and the Löbian Obstacle.” Draft. MIRI.


S Armstrong and K Sotala. 2012. “How We’re Predicting AI – or Failing To.” In Beyond AI: Artificial Dreams. Pilsen: University of West Bohemia.
B Hibbard. 2012. “Avoiding Unintended AI Behaviors.” In Proceedings of AGI 2012. Springer.
B Hibbard. 2012. “Decision Support for Safe AI Design.” In Proceedings of AGI 2012. Springer.
L Muehlhauser. 2012. “AI Risk Bibliography 2012.” Working paper. MIRI.
A Salamon and L Muehlhauser. 2012. “Singularity Summit 2011 Workshop Report.” Working paper. MIRI.
C Shulman and N Bostrom. 2012. “How Hard Is Artificial Intelligence? Evolutionary Arguments and Selection Effects.” Journal of Consciousness Studies 19 (7–8): 103–130.
K Sotala. 2012. “Advantages of Artificial Intelligences, Uploads, and Digital Minds.” International Journal of Machine Consciousness 4 (1): 275–291.
K Sotala and H Valpola. 2012. “Coalescing Minds: Brain Uploading-Related Group Mind Scenarios.” International Journal of Machine Consciousness 4 (1): 293–312.


P de Blanc. 2011. “Ontological Crises in Artificial Agents’ Value Systems.” arXiv:1105.3821 [cs.AI].
D Dewey. 2011. “Learning What to Value.” In Proceedings of AGI 2011. Springer.
E Yudkowsky. 2011. “Complex Value Systems Are Required to Realize Valuable Futures.” In Proceedings of AGI 2011. Springer.


J Fox and C Shulman. 2010. “Superintelligence Does Not Imply Benevolence.” In Proceedings of ECAP 2010. Verlag Dr. Hut.
S Kaas, S Rayhawk, A Salamon, and P Salamon. 2010. “Economic Implications of Software Minds.” In Proceedings of ECAP 2010. Verlag Dr. Hut.
A Salamon, S Rayhawk, and J Kramár. 2010. “How Intelligible Is Intelligence?” In Proceedings of ECAP 2010. Verlag Dr. Hut.
C Shulman. 2010. “Omohundro’s ‘Basic AI Drives’ and Catastrophic Risks.” Working paper. MIRI.
C Shulman. 2010. “Whole Brain Emulation and the Evolution of Superorganisms.” Working paper. MIRI.
C Shulman and A Sandberg. 2010. “Implications of a Software-Limited Singularity.” In Proceedings of ECAP 2010. Verlag Dr. Hut.
K Sotala. 2010. “From Mostly Harmless to Civilization-Threatening.” In Proceedings of ECAP 2010. Verlag Dr. Hut.
E Yudkowsky. 2010. “Timeless Decision Theory.” Working paper. MIRI.
E Yudkowsky, C Shulman, A Salamon, R Nelson, S Kaas, S Rayhawk, and T McCabe. 2010. “Reducing Long-Term Catastrophic Risks from Artificial Intelligence.” Working paper. MIRI.


P de Blanc. 2009. “Convergence of Expected Utility for Universal Artificial Intelligence.” arXiv:0907.5598 [cs.AI].
S Rayhawk, A Salamon, M Anissimov, T McCabe, and R Nelson. 2009. “Changing the Frame of AI Futurism: From Storytelling to Heavy-Tailed, High-Dimensional Probability Distributions.” Paper presented at ECAP 2009.
C Shulman and S Armstrong. 2009. “Arms Control and Intelligence Explosions.” Paper presented at ECAP 2009.
C Shulman, H Jonsson, and N Tarleton. 2009. “Machine Ethics and Superintelligence.” In Proceedings of AP-CAP 2009. University of Tokyo.
C Shulman, N Tarleton, and H Jonsson. 2009. “Which Consequentialism? Machine Ethics and Moral Divergence.” In Proceedings of AP-CAP 2009. University of Tokyo.
E Yudkowsky. 2008. “Artificial Intelligence as a Positive and Negative Factor in Global Risk.” In Global Catastrophic Risks. Oxford University Press. Published in abridged form as “Friendly Artificial Intelligence” in Singularity Hypotheses. (Translations: Mandarin, Italian, Korean, Portuguese, Russian)
E Yudkowsky. 2008. “Cognitive Biases Potentially Affecting Judgement of Global Risks.” In Global Catastrophic Risks. Oxford University Press. (Translations: Italian, Russian, Portuguese)
E Yudkowsky. 2007. “Levels of Organization in General Intelligence.” In Artificial General Intelligence (Cognitive Technologies). Springer.
E Yudkowsky. 2004. “Coherent Extrapolated Volition.” Working paper. MIRI.

