Chat is the dominant way of interfacing with a large language model. You type some words into a chat box, and the model responds. Maybe you regenerate the response if you didn’t like the first one. Is that the best way to make use of a multimodal distribution over all of human language?
A recent “research agenda”, cyborgism, says we can do better. Using tools called “looms”, we can explore many continuations simultaneously. Humans find it difficult to separate themselves from their context and pursue multiple independent lines of thinking; large language models can trivially generate many qualitatively different completions from the same prompt. Cast in this frame, sensitivity to the exact wording of the prompt is a strength: if you get stuck on a problem, slightly rewording the question will cause the large language model to generate a totally different output.
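To make the “loom” idea concrete, here is a minimal sketch in Python of the underlying data structure: a tree whose nodes are alternative continuations of the same prompt, any of which can be extended further. It is not any particular loom implementation, and `sample_continuation` is a hypothetical placeholder for whatever model call you would actually use.

```python
# Minimal sketch of a "loom": a tree of branching continuations grown from a
# single prompt. The tree structure is the point; the model backend is not.
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str                        # text contributed by this node
    parent: "Node | None" = None
    children: list["Node"] = field(default_factory=list)

    def full_text(self) -> str:
        """Concatenate text from the root down to this node."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.text)
            node = node.parent
        return "".join(reversed(parts))


def sample_continuation(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; replace with any
    completions-style API. Returning a fixed placeholder keeps the sketch
    runnable without credentials."""
    return " ... [model continuation here]"


def branch(node: Node, n: int = 4) -> list[Node]:
    """Grow n sibling continuations from `node`, keeping every branch."""
    prompt = node.full_text()
    for _ in range(n):
        child = Node(text=sample_continuation(prompt), parent=node)
        node.children.append(child)
    return node.children


# Usage: start from a prompt, fan out, then keep branching whichever
# continuations look promising instead of regenerating and discarding.
root = Node(text="The key difficulty with this problem is")
for child in branch(root, n=4):
    print(repr(child.full_text()))
```

The design choice this sketch illustrates is that regeneration becomes branching: discarded completions stay in the tree, so you can return to them later or extend several of them in parallel.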
Cyborgism originated from conversations between independent safety researchers Nicholas Kees Dupois and the pseudonymous janus, and Conjecture’s Connor Leahy and Daniel Clothiaux, with some follow-up work by AI Safety Camp and SERI MATS students. It was originally a response to the popular trend of using large language models to automate alignment research. Cyborgists, if anyone identifies as such, see this move as wrong-headed: rather than using large language models to directly simulate agents, which is both dangerous and a poor use of the large language model’s strengths, we should leave the agentic part of the research process to humans.
This session will serve both to evaluate cyborgism as a safety research agenda and to ask whether cyborgists can teach us, as regular people with regular jobs, a better way to use large language models to get things done.
This post proposes a strategy for safely accelerating alignment research. The plan is to set up human-in-the-loop systems which empower human agency rather than outsource it, and to use those systems to differentially accelerate progress on alignment.
1. Introduction: An explanation of the context and motivation for this agenda.
2. Automated Research Assistants: A discussion of why the paradigm of training AI systems to behave as autonomous agents is both counterproductive and dangerous.
3. Becoming a Cyborg: A proposal for an alternative approach/frame, which focuses on a particular type of human-in-the-loop system I am calling a “cyborg”.
4. Failure Modes: An analysis of how this agenda could either fail to help or actively cause harm by accelerating AI research more broadly.
5. Testimony of a Cyborg: A personal account of how Janus uses GPT as a part of their workflow, and how it relates to the cyborgism approach to intelligence augmentation.
— Nicholas Kees Dupois and Janus, “Cyborgism”, AI Alignment Forum 2023
A cluster of conceptual frameworks and research programmes have coalesced around a 2022 post by janus, which introduced language models as ‘simulators’ (of other types of AIs such as agents, oracles, or genies). One such agenda, cyborgism, was coined in a post by janus and Nicholas Kees and is being researched as part of the 2023 editions of AI Safety Camp and SERI MATS. The objective of this document is to provide an on-ramp to the topic, one that is hopefully accessible to people not hugely familiar with simulator theory or language models.
— U. Kanad Chakrabarti, Arun Jose and Nicholas Kees Dupois, “The Compleat Cybornaut”, LessWrong 2023