ThursdAI - Apr 3rd - OpenAI Goes Open?! Gemini Crushes Math, AI Actors Go Hollywood & MCP, Now with Observability?

Published 11 months, 4 weeks ago

Woo! Welcome back to ThursdAI, show number 99! Can you believe it? We are one show away from hitting the big 100, which is just wild to me. And speaking of milestones, we just crossed 100,000 downloads on Substack alone! [Insert celebratory sound effect here 🎉]. Honestly, knowing so many of you tune in every week genuinely fills me with joy, but also a real commitment to keep bringing you the high-signal, zero-fluff AI news you count on. Thank you for being part of this amazing community! 🙏

And what a week it's been! I started out busy at work, playing with the native image generation in ChatGPT like everyone else (all 130 million of us!), and then I looked at my notes for today… an absolute mountain of updates. Seriously, it was one of those weeks where open source just exploded, big companies dropped major news, and the vision/video space produced stuff that's crossing the uncanny valley.

We’ve got OpenAI teasing a big open source release (yes, OpenAI might actually be open again!), Gemini 2.5 showing superhuman math skills, Amazon stepping into the agent ring, truly mind-blowing AI character generation from Meta, and a personal update on making the Model Context Protocol (MCP) observable. Plus, we had some fantastic guests join us live!

So buckle up, grab your coffee (or whatever gets you through the AI whirlwind), because we have a lot to cover. Let's dive in! (as always, show notes and links in the end)

OpenAI Makes Waves: Open Source Tease, Tough Evals & Billions Raised

It feels like OpenAI was determined to dominate the headlines this week, hitting us from multiple angles.

First, the potentially massive news: OpenAI is planning to release a new open source model in the "coming months"! Kevin Weil tweeted that they're working on a "highly capable open language model" and are actively seeking developer feedback through dedicated sessions (sign up here if interested) to "get this right." Word on the street is that this could be a powerful reasoning model. Sam Altman also cheekily added that they won't slap on a Llama-style <700M user license limit. Seeing OpenAI re-embrace its "Open" roots with a potentially SOTA model would be huge. We'll be watching like hawks!

Second, they dropped PaperBench, a brutal new benchmark evaluating an AI's ability to replicate ICML 2024 research papers from scratch (read the paper, write the code, run the experiments, match the results - no peeking at the original code!). It's incredibly detailed (>8,300 tasks) and even includes meta-evaluation of the LLM judge they built (the Nano-Eval framework is also open sourced). The kicker? Claude 3.5 Sonnet (New) came out on top with just a 21.0% replication score (human PhDs got 41.4%). Props to OpenAI for releasing an eval where they don't even win - that's what real benchmarking integrity looks like. You can find the code on GitHub and read the full paper here.

Third, the casual