Simulator theory is a framing that describes large language models as a category of system in their own right, distinct from older notions of sovereign, oracle, or genie AI. Domenic Denicola will walk us through the claims of simulator theory, starting with a close reading of Janus's original text but quickly branching off into a survey of other works in this field. Is it mainly philosophy? Or can it be used to usefully understand and align AI systems?
Simulator theory in the context of AI refers to an ontology, or frame, for understanding the workings of large generative models, such as the GPT series from OpenAI. Broadly, it views these models as simulating a learned distribution with varying degrees of fidelity; in the case of language models trained on a large corpus of text, that distribution reflects the mechanics underlying our world.
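As a loose illustration of the "simulating a learned distribution" framing (a toy sketch, not drawn from the talk or Janus's text), the code below treats a trivial hand-written bigram table as a stand-in for a learned conditional distribution and rolls a "simulation" forward by repeatedly sampling the next token:

```python
import random

# Toy stand-in for a learned conditional distribution P(next token | context).
# A real language model learns something like this, at vastly larger scale,
# from its training corpus; here the table is written by hand purely for
# illustration.
learned = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 1.0},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}

def simulate(prompt, max_steps=10, seed=0):
    """Run the 'simulation' forward: sample one token at a time from the
    learned distribution, conditioned (here, myopically) on the last token."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_steps):
        dist = learned.get(tokens[-1])
        if dist is None:
            break
        choices, weights = zip(*dist.items())
        nxt = rng.choices(choices, weights=weights)[0]
        if nxt == "<end>":
            break
        tokens.append(nxt)
    return tokens

print(simulate(["the"]))
```

The point of the sketch is that nothing in the sampling loop is agentic: the "simulator" just propagates whatever trajectory the prompt conditions it onto, which is the intuition the simulator frame asks us to carry over to much larger models.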
It can also refer to an alignment research agenda that deals with better understanding simulator conditionals, the effects of downstream training, and alignment-relevant properties such as myopia and agency in the context of language models, as well as using such models as alignment research accelerators.