Episode Details

“Introducing LinuxArena” by Tyler Tracy, Ram Potham, Nick Kuhn, Myles H

Published 1 month, 2 weeks ago

Description

We are releasing LinuxArena, a new control setting comprising 20 software engineering environments. Each environment consists of a set of SWE tasks, a set of possible safety failures, and a set of covert sabotage trajectories that cause these safety failures. LinuxArena can be used to evaluate the risk of AI deployments by assessing AI's sabotage capabilities, assessing control mitigations such as monitoring, and developing novel control techniques. This post covers what LinuxArena is and the basics of its use cases.

I’m excited about LinuxArena because it is a much more complex control setting, with many moving parts, and it enables a lot of new control research. It was already used in the Claude Mythos Preview System Card to measure Mythos's sabotage ability. We are happy to help AI companies or other organizations set up LinuxArena for their own evaluations or research!

LinuxArena

AI companies are increasingly deploying coding agents on real codebases with running services, databases, and complex permissions. But the control settings we use to evaluate whether these deployments are safe are far simpler than the deployments themselves. The APPS backdoor setting is a single-turn Python coding setting, and while our recent BashBench and BashArena settings are [...]

---

Outline:

(01:07) LinuxArena

(03:25) LinuxArena as a sabotage risk evaluation

(03:49) Sabotage evals

(05:38) Monitor evals

(06:38) Control evals

(08:54) Conclusion

---

First published:
April 20th, 2026

Source:
https://www.lesswrong.com/posts/WgzwBi6DCagDuHPzP/introducing-linuxarena

---