LLMs can do a lot of different things: write code, critique poetry, translate data between formats, glue APIs together, and much more besides. How entangled are these capabilities? It’s plausible that the part of an LLM that generates code shares very few parameters with the part that critiques poetry—they could be perfectly distinct circuits. It’s also plausible that some very general capability like induction or recursive grammar enables many specific capabilities, such that most of the parameters are used for all capabilities. How can we tell which of these is true?
In this week’s session, Nicky Pochinkov will take us through recent research on LLM modularity, including a sneak peek at his upcoming paper on separating code generation from other capabilities on Meta’s OPT and Galactica.
One important aspect of modularity is that different components of the neural network perform distinct, separate tasks. I call this the “separability” of capabilities in a neural network, and I attempt to gain empirical insight into how separable current models are.
The main task I chose was to prune a Large Language Model (LLM) so that it retains all of its abilities except the ability to code (and vice versa). I have had some success in separating out these capabilities (up to approximately 65-75% separability), and I have some evidence suggesting that larger LLMs may be somewhat separable in their capabilities even with only basic pruning methods (a sketch of one such method follows).
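To make "basic pruning methods" concrete, here is a minimal sketch of one such approach: score each feed-forward neuron by how much more it activates on code than on general text, then zero out the most code-specific neurons. This is an illustrative assumption rather than the exact procedure from the paper; the model (`facebook/opt-125m`), the tiny example corpora, and the `RATIO` threshold are all placeholders.

```python
# Sketch of activation-based pruning of MLP neurons in an OPT model.
# Assumptions: small OPT for speed, toy text samples, and an arbitrary
# ratio threshold instead of a proper pruning budget.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"  # illustrative choice
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

layers = model.model.decoder.layers
ffn_dim = layers[0].fc1.out_features

def mean_ffn_activations(texts):
    """Mean post-ReLU activation of every MLP neuron over a list of texts."""
    sums = [torch.zeros(ffn_dim) for _ in layers]
    count = 0
    hooks = []

    def make_hook(i):
        def hook(module, inp, out):
            # OPT's fc1 output is the pre-activation; apply ReLU here and
            # accumulate a per-neuron activation total across all tokens.
            acts = torch.relu(out).float().reshape(-1, out.shape[-1])
            sums[i] += acts.sum(dim=0)
        return hook

    for i, layer in enumerate(layers):
        hooks.append(layer.fc1.register_forward_hook(make_hook(i)))
    with torch.no_grad():
        for text in texts:
            ids = tok(text, return_tensors="pt",
                      truncation=True, max_length=512).input_ids
            model(ids)
            count += ids.shape[1]
    for h in hooks:
        h.remove()
    return [s / count for s in sums]

# Hypothetical stand-ins for real evaluation corpora (code vs. general prose).
code_texts = ["def add(a, b):\n    return a + b",
              "for i in range(10):\n    print(i)"]
prose_texts = ["The quick brown fox jumps over the lazy dog.",
               "She walked to the market to buy some bread."]

code_act = mean_ffn_activations(code_texts)
prose_act = mean_ffn_activations(prose_texts)

# Zero the fc2 input columns of neurons that fire far more on code than prose.
RATIO = 2.0  # assumed threshold; a real run would sweep a pruning budget
with torch.no_grad():
    for i, layer in enumerate(layers):
        code_specific = code_act[i] > RATIO * (prose_act[i] + 1e-5)
        layer.fc2.weight[:, code_specific] = 0.0
        print(f"layer {i}: pruned {int(code_specific.sum())}/{ffn_dim} neurons")
```

In practice the scores would be computed over thousands of examples from real code and general-text corpora, and the threshold (or pruning budget) would be chosen by tracking performance on both a coding benchmark and a general-language benchmark after pruning.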
My current understanding from this work is that attention heads are more task-general, while feed-forward layers are more task-specific. There is, however, still room for better separation techniques and/or for training LLMs to be more separable in the first place.