AISN #13: An interdisciplinary perspective on AI proxy failures, new competitors to ChatGPT, and prompting language models to misbehave.

Published 2 years, 10 months ago
Description

Interdisciplinary Perspective on AI Proxy Failures

In this story, we discuss a recent paper on why proxy goals fail. First, we introduce proxy gaming, and then summarize the paper’s findings.

Proxy gaming is a well-documented failure mode in AI safety. For example, social media platforms use AI systems to recommend content to users. These systems are sometimes built to maximize the amount of time a user spends on the platform. The idea is that the time the user spends on the platform approximates the quality of the content being recommended. However, a user might spend even more time on a platform because they’re responding to an enraging post or interacting [...]
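The recommendation example above can be made concrete with a toy sketch. Everything in it is an illustrative assumption and nothing comes from the paper or from any real platform: the `Post` class, the `recommend` helper, and the quality and engagement numbers are all hypothetical. It only shows how picking the item with the highest proxy score (time spent) can diverge from picking the item with the highest intended value (quality).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    title: str
    quality: float          # hypothetical "true" value to the user, 0 to 1
    minutes_engaged: float  # hypothetical proxy: time spent on the post


def recommend(posts, score):
    """Return the post that maximizes the given scoring function."""
    return max(posts, key=score)


# Made-up data: the enraging post holds attention longest despite low quality.
POSTS = [
    Post("Well-researched explainer", quality=0.9, minutes_engaged=6.0),
    Post("Enraging hot take", quality=0.2, minutes_engaged=14.0),
    Post("Friendly community update", quality=0.7, minutes_engaged=4.0),
]

by_proxy = recommend(POSTS, lambda p: p.minutes_engaged)
by_quality = recommend(POSTS, lambda p: p.quality)

print("Optimizing the proxy recommends:", by_proxy.title)
print("Optimizing for quality recommends:", by_quality.title)
```

In this made-up data the two objectives select different posts, which is the gap between proxy and goal that the story describes.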

---

Outline:

(00:13) Interdisciplinary Perspective on AI Proxy Failures

(06:06) A Flurry of AI Fundraising and Model Releases

(12:53) Adversarial Inputs Make Chatbots Misbehave

(15:52) Links

---

First published:
July 5th, 2023

Source:
https://newsletter.safe.ai/p/ai-safety-newsletter-13

---

Want more? Check out our ML Safety Newsletter for technical safety research.

Narrated by TYPE III AUDIO.
