Episode Details

GPT 5.4 vs Gemini: Benchmarks, Codex, Excel

Episode 675
Description

Beth Lyons and Andy Halliday open the show with a focused breakdown of GPT-5.4, framing it less as a universal leap and more as a strong advance in white-collar knowledge work and real-world task performance. Much of the conversation compares GPT-5.4 with Gemini 3.1 Pro Preview, Claude models, Codex, and other systems across benchmarks like GPT-Val, coding, long-context reasoning, hallucination resistance, and visual reasoning, with repeated emphasis that users still need to pick models based on the actual job to be done. Beth also shares a practical complaint about Gemini hallucinating around silent screen recordings and uses that to argue for a more dependable “colleague layer” in agentic systems. Later, Karl Yeh joins to talk through hands-on experience with GPT-5.4 in Codex, comparisons with Claude in Excel and Gemini in Sheets, and where the new release feels genuinely useful in day-to-day work.


Key Points Discussed


00:00:18 Welcome and setup for a GPT-5.4-focused episode

00:02:47 GPT-Val and white-collar knowledge work framing

00:08:51 Benchmark comparison across GPT-5.4, Claude, Gemini, and others

00:16:26 Gemini strengths in video and visual reasoning

00:18:05 Beth’s Gemini transcription / hallucination workflow example

00:23:54 “Then we’ll move to more news” and handoff to Karl Yeh

00:24:24 Karl Yeh on real-world use cases over benchmarks

00:55:30 Closing recommendations: try GPT-5.4, use Codex, newsletter and community plug


The Daily AI Show Co-Hosts: Beth Lyons, Andy Halliday, Karl Yeh

