📆 ThursdAI - the week that changed the AI landscape forever - Gemini 3, GPT codex max, Grok 4.1 & fast, SAM3 and Nano Banana Pro
Description
Hey everyone, Alex here 👋
I’m writing this one from a noisy hallway at the AI Engineer conference in New York, still riding the high (and the sleep deprivation) from what might be the craziest week we’ve ever had in AI.
In the span of a few days:
* Google dropped Gemini 3 Pro, a new Deep Think mode, generative UIs, and a free agent-first IDE called Antigravity.
* xAI shipped Grok 4.1, then followed it up with Grok 4.1 Fast plus an Agent Tools API.
* OpenAI answered with GPT‑5.1‑Codex‑Max, a long‑horizon coding monster that can work for more than a day, and quietly upgraded ChatGPT Pro to GPT‑5.1 Pro.
* Meta looked at all of that and said “cool, we’ll just segment literally everything and turn photos into 3D objects” with SAM 3 and SAM 3D.
* Robotics folks dropped a home robot trained with almost no robot data.
* And Google, just to flex, capped Thursday with Nano Banana Pro, a 4K image model and a provenance system while we were already live!
For the first time in a while it doesn’t just feel like “new models came out.” It feels like the future actually clicked forward a notch.
This is why ThursdAI exists. Weeks like this are basically impossible to follow if you have a day job, so my co‑hosts and I do the no‑sleep version so you don’t have to. Plus, being at AI Engineer makes it easy to get super high-quality guests, so this week three folks joined us: Swyx from Cognition/Latent Space, Thor from DeepMind (on his 3rd day), and Dominik from OpenAI! Alright, deep breath. Let’s untangle the week.
TL;DR
If you only skim one section, make it this one (links at the end):
* Google
  * Gemini 3 Pro: 1M‑token multimodal model, huge reasoning gains - new LLM king
  * ARC‑AGI‑2: 31.11% (Pro), 45.14% (Deep Think) – enormous jumps
  * Antigravity IDE: free, Gemini‑powered VS Code fork with agents, plans, walkthroughs, and browser control
  * Nano Banana Pro: 4K image generation with perfect text + SynthID provenance; dynamic “generative UIs” in Gemini
* xAI
  * Grok 4.1: big post‑training upgrade – #1 on human‑preference leaderboards, much better EQ & creative writing, fewer hallucinations
  * Grok 4.1 Fast + Agent Tools API: 2M context, SOTA tool‑calling & agent benchmarks (Berkeley FC, T²‑Bench, research evals), aggressive pricing and tight X + web integration
* OpenAI
  * GPT‑5.1‑Codex‑Max: “frontier agentic coding” model built for 24h+ software tasks with native compaction for million‑token sessions; big gains on SWE‑Bench, SWE‑Lancer, TerminalBench 2
  * GPT‑5.1 Pro: new “research‑grade” ChatGPT mode that will happily think for minutes on a single query
* Meta
  * SAM 3: open‑vocabulary segmentation + tracking across images and video (with text & exemplar prompts)
  * SAM 3D: single‑image → 3D objects & human bodies; surprisingly high‑quality 3D from one photo
* Robotics
  * Sunday Robotics – ACT‑1 & Memo: home robot foundation model trained from a $200 skill glove instead of $20K teleop rigs; long‑horizon household tasks with solid zero‑shot generalization
* Developer Tools
  * Antigravity and Marimo’s VS Code / Cursor extension both push toward agentic, reactive dev workflows
Live from AI Engineer New York: Coding Agents Take Center Stage
We recorded this week’s show on location at the AI Engineer Summit in New York, inside a beautiful podcast studio