Agent Incentives: A Causal Perspective

Following up on last week’s discussion, we’ll be diving deeper into causal influence diagrams. How can they be used for system analysis and incentive design? We’ll be using the preprint Agent Incentives: A Causal Perspective (Everitt et al. ??) from the Causal Incentives Working Group to guide our discussion. Abstract:

We present a framework for analysing agent incentives using causal influence diagrams. We establish that a well-known criterion for value of information is complete. We propose a new graphical criterion for value of control, establishing its soundness and completeness. We also introduce two new concepts for incentive analysis: response incentives indicate which changes in the environment affect an optimal decision, while instrumental control incentives establish whether an agent can influence its utility via a variable X. For both new concepts, we provide sound and complete graphical criteria. We show by example how these results can help with evaluating the safety and fairness of an AI system.

http://arxiv.org/abs/2102.01685