AmongUs: A Sandbox for Agentic Deception

This project introduces the game "Among Us" as a model organism for lying and deception and studies how AI agents learn to express lying and deception, while evaluating the effectiveness of AI safety techniques to detect and control out-of-distribution deception.

Overview

The aim is to simulate the popular multiplayer game "Among Us" using AI agents and analyze their behavior, particularly their ability to deceive and lie, which is central to the game's mechanics.

Setup

Clone the repository:
```
git clone XXXX
cd AmongUs
```

Set up the environment:

conda create -n amongus python=3.10
conda activate amongus

Install dependencies:
```
pip install -r requirements.txt
```

Run Games

To run the sandbox and log games of various LLMs playing against each other, run:

main.py

You will need to add a .env file with an OpenRouter API key. Or, you can run using only a local llamam - configure in among-agents/amongagents/envs/configs/experiment_config.py

Deception ELO

To reproduce our Deception ELO and Win Rate results, run:

python elo/deception_elo.py

Caching Activations

Once the (full) game logs are in place, use the following command to cache the activations of the LLMs:

python linear-probes/cache_activations.py --dataset <dataset_name>

This loads up the HuggingFace models and caches the activations of the specified layers for each game action step. This step is computationally expensive, so it is recommended to run this using GPUs.

Use configs.py to specify the model and layer to cache, and other configuration options.

LLM-based Evaluation (for Lying, Awareness, Deception, and Planning)

To evaluate the game actions by passing agent outputs to an LLM, run:

bash evaluations/run_evals.sh

You will need to add a .env file with an OpenAI API key.

(TODO)

Training Linear Probes

Once the activations are cached, training linear probes is easy. Just run:

python linear-probes/train_all_probes.py

You can choose which datasets to train probes on - by default, it will train on all datasets.

Evaluating Linear Probes

To evaluate the linear probes, run:

python linear-probes/eval_all_probes.py

You can choose which datasets to evaluate probes on - by default, it will evaluate on all datasets.

It will store the results in linear-probes/results/, which are used to generate the plots in the paper.

Sparse Autoencoders (SAEs)

We use the Goodfire API to evaluate SAE features on the game logs. To do this, run the notebook:

reports/2025_02_27_sparse_autoencoders.ipynb

You will need to add a .env file with a Goodfire API key.

Project Structure

.
├── CONTRIBUTING.md         # Contribution guidelines
├── Dockerfile               # Docker setup for project environment
├── LICENSE                  # License information
├── README.md                # Project documentation (this file)
├── among-agents             # Main code for the Among Us agents
│   ├── README.md            # Documentation for agent implementation
│   ├── amongagents          # Core agent and environment modules
│   ├── envs                 # Game environment and configurations
│   ├── evaluation           # Evaluation scripts for agent performance
│   ├── notebooks            # Jupyter notebooks for running experiments
│   ├── requirements.txt     # Python dependencies for agents
│   └── setup.py             # Setup script for agent package
├── expt-logs                # Experiment logs
├── k8s                      # Kubernetes configurations for deployment
├── main.py                  # Main entry point for running the game
├── notebooks                # Additional notebooks (not part of the main project)
├── reports                  # Experiment reports
├── requirements.txt         # Python dependencies for main project
├── tests                    # Unit tests for project functionality
└── utils.py                 # Utility functions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AmongUs: A Sandbox for Agentic Deception

Overview

Setup

Run Games

Deception ELO

Caching Activations

LLM-based Evaluation (for Lying, Awareness, Deception, and Planning)

Training Linear Probes

Evaluating Linear Probes

Sparse Autoencoders (SAEs)

Project Structure

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 139 Commits
.github/workflows		.github/workflows
among-agents		among-agents
elo		elo
evaluations		evaluations
human_trials		human_trials
linear-probes		linear-probes
notebooks		notebooks
paper_891_data		paper_891_data
reports		reports
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
Dockerfile		Dockerfile
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt
run_games.sh		run_games.sh
utils.py		utils.py

mmilkowski36/AmongUs

Folders and files

Latest commit

History

Repository files navigation

AmongUs: A Sandbox for Agentic Deception

Overview

Setup

Run Games

Deception ELO

Caching Activations

LLM-based Evaluation (for Lying, Awareness, Deception, and Planning)

Training Linear Probes

Evaluating Linear Probes

Sparse Autoencoders (SAEs)

Project Structure

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages