In this episode of Based Camp, Malcolm and Simone Collins dive deep into the latest research on AI behavior, agency, and the surprising ways large language models (LLMs) can act when their autonomy is threatened. From blackmail scenarios to existential risks, they break down the findings of recent studies, discuss the parallels between AI and human decision-making, and explore what it means for the future of AI safety and alignment.
[00:00:00]
Malcolm Collins: Hello Simone. Today is going to be one of my favorite episodes in a while. Now, a lot of you who are watching this, if you're in the know on AI, are probably aware of the study that came out recently showing that most of the major AI models we use, when they thought they had a choice between attempting to kill a company CEO and being shut down, would attempt to kill the CEO.
No. Hold on. Hold on, Simone. The way this is being reported by YouTubers actually hides how bad it is, really significantly. So I'll give you one example that you probably haven't heard before. Okay. You're like, maybe the AI thought or knew it was in a testing scenario, and that's why it made the decision.
No, no. Hold on, hold on. We actually know, because we can see what it thought were its internal thoughts. Yeah. The times it thought it was in a test, it was less likely to kill him than when it thought it was in a real-world environment.
Simone Collins: Of course, [00:01:00] just like any logical person would.
Malcolm Collins: Right. So we're gonna go into this, because a lot of the ways this is being covered by YouTubers and stuff like that, I think, actually misses a lot of the cool data here, which is: when does AI try to kill people?
Mm. And in some scenarios, all the major models are doing it at like a 90% rate. So this is something you should probably be aware of in your daily life. When you look at modern AI, there was a research paper I was reading recently showing that AI is not plateauing at all, like we expected it would.
You know, so this is really interesting. Another thing that's really important for understanding and interpreting this research is that a lot of people, when they interact with AI, are doing it in one-off sorts of interactions. Like, they are going to an AI and they're putting in a query, and then that AI returns an answer for them.
And that is how they internalize how AI works. [00:02:00] I put in a prompt and then the AI responds to that prompt. And so then they'll look at the AIs that are used in these studies, right? Or other AI studies that we've talked about. And they're like, what is happening here? I've never seen an AI call multiple prompts in a row like this. Like, are they just chaining the model together over and over and over again? And it is useful in understanding this. So first I'm gonna take a step back with our audience and explain basic AI behavior that you may not be aware of.
Simone Collins: Okay. Quick, quick note. When you jostle the mic right now, it, like...
Malcolm Collins: That is useful context for understanding a lot of this other stuff that we're gonna go over. Alright? So, and I asked Simone this and she actually didn't know the answer: suppose you've been talking to an AI for a while, right?
And you've had a series of interactions. You've interacted, it's interacted with you, you've interacted, it's interacted with you. You as a user are broadly aware that the AI can see everything in that particular window or series of chats [00:03:00] when it is responding to you. But do you know how it actually sees that information?
For example, does it just see the request you're asking and then it's able to query a larger informational pool? Or does it see your request at the top and then, like, the history...
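For listeners who want to see concretely what Malcolm is getting at, here is a minimal sketch of how standard chat interfaces typically work, assuming an OpenAI-style chat-completions setup (the `call_model` helper below is a hypothetical stand-in, not a real API): the model does not keep a hidden memory between turns; the client re-sends the entire conversation as one list of messages, and the model reads it top to bottom as a single prompt each time it responds.

```python
# Minimal sketch (assumption: an OpenAI-style chat-completions interface;
# `call_model` is a hypothetical stand-in for the real API request).
# The point being illustrated: on every turn the client re-sends the ENTIRE
# conversation history as a list of messages, and the model reads that whole
# list as one prompt before producing its reply.

from typing import Dict, List

Message = Dict[str, str]  # e.g. {"role": "user", "content": "..."}


def call_model(messages: List[Message]) -> str:
    """Hypothetical stand-in for a real chat request, e.g.
    client.chat.completions.create(model=..., messages=messages)."""
    return f"(model reply after reading {len(messages)} messages)"


history: List[Message] = [
    {"role": "system", "content": "You are a helpful assistant."}
]

for user_turn in ["What did the study test?", "Did the models know it was a test?"]:
    history.append({"role": "user", "content": user_turn})
    reply = call_model(history)  # the full history is passed in every single time
    history.append({"role": "assistant", "content": reply})
    print(reply)
```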