Alignment scholars often make arguments from limits:
In the limit, AIXI perfectly maximises reward. Hence, it is impossible to design an agent that is indifferent to its off switch being pressed; it will either be rewarded when the switch is pressed (and so seek to press it) or punished (and so seek to prevent it from being pressed).
The limit of predicting text is predicting the underlying process that generated the text; the simplest way to predict a process is to simulate it directly, and since intelligent humans are the text-generating process, language models are destined to become intelligent.
As AI systems become more powerful, they get better at modelling human behaviour. In the limit, an AI can perfectly predict all human behaviour. This should worry us, as an adversary who can perfectly predict you can easily outwit you.
These arguments rely on an implicit assumption that the limit is in some way instantiable; that the limit represents a real thing that could exist in the world and that we fleshy humans could interact with. Often, the assumption is that the elements of the sequence grow to approximately resemble the limit: if we set a large enough N, then the Nth element of the sequence is basically equivalent to the limit for practical purposes.
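To be concrete about what the mathematics does and does not promise, here is the standard epsilon-N statement of a limit (a textbook definition, not anything specific to the talk):

$$\lim_{n\to\infty} x_n = L \quad\iff\quad \forall \varepsilon > 0\ \exists N\ \forall n > N:\ |x_n - L| < \varepsilon.$$

This only guarantees that, past some N, the elements are within any chosen tolerance of L in value; it does not by itself guarantee that any element of the sequence shares the qualitative properties of L.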
This assumption, that the limit is (approximately) instantiable, is not guaranteed by the mathematics. Further, it seems like most of the interesting properties of limits that worry alignment researchers are the very properties we would expect not to be approximately instantiable. For instance, an AI with uncertainty in its models has an incentive to explore and reduce its uncertainty, which it must balance against the incentive to exploit the knowledge it already has. A system with no uncertainty, on the other hand, has no explore incentive. It can monomaniacally pursue its terminal goals unhindered by doubt. The limit of reduced uncertainty is monomania, but one cannot reduce uncertainty to zero; therefore, no agent will monomaniacally pursue its terminal goals.
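A toy illustration of the gap (my own example, not drawn from the talk): let $u_n = 1/n$ stand in for an agent's residual uncertainty after n observations. Then

$$u_n > 0 \text{ for every } n, \qquad \lim_{n\to\infty} u_n = 0.$$

The property "uncertainty is exactly zero" belongs to the limit alone; no element of the sequence has it, however large n grows, so on this picture the explore incentive shrinks but never vanishes entirely.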
In this talk, Blaine Rogers will introduce sequences and limits, discuss the extent to which the alignment community relies on limit thinking, and examine how much of the Yudkowsky doom narrative falls apart under unlimited scrutiny.