Episode Details

Back to Episodes
πŸ“† Feb 27, 2025 - GPT-4.5 Drops TODAY?!, Claude 3.7 Coding BEAST, Grok's Unhinged Voice, Humanlike AI voices & more AI news

πŸ“† Feb 27, 2025 - GPT-4.5 Drops TODAY?!, Claude 3.7 Coding BEAST, Grok's Unhinged Voice, Humanlike AI voices & more AI news

Published 1Β year, 1Β month ago
Description

Hey all, Alex here πŸ‘‹

What can I say, the weeks are getting busier , and this is one of those "crazy full" weeks in AI. As we were about to start recording, OpenAI teased GPT 4.5 live stream, and we already had a very busy show lined up (Claude 3.7 vibes are immaculate, Grok got an unhinged voice mode) and I had an interview with Kevin Hou from Windsurf scheduled! Let's dive in!

πŸ”₯ GPT 4.5 (ORION) is here - worlds largest LLM (10x GPT4o)

OpenAI has finally shipped their next .5 model, which is 10x scale from the previous model. We didn't cover this on the podcast but did watch the OpenAI live stream together after the podcast concluded.

A very interesting .5 release from OpenAI, where even Sam Altman says "this model won't crush on benchmarks" and is not the most frontier model, but is OpenAI's LARGEST model by far (folks are speculating 10+ Trillions of parameters)

After 2 years of smaller models and distillations, we finally got a new BIG model, that shows scaling laws proper, and while on some benchmarks it won't compete against reasoning models, this model will absolutely fuel a huge increase in capabilities even for reasoners, once o-series models will be trained on top of this.

Here's a summary of the announcement and quick vibes recap (from folks who had access to it before)

* OpenAI's largest, most knowledgeable model.

* Increased world knowledge: 62.5% on SimpleQA, 71.4% GPQA

* Better in creative writing, programming, problem-solving (no native step-by-step reasoning).

* Text and image input and text output

* Available in ChatGPT Pro and API access (API supports Function Calling, Structured Output)

* Knowledge Cutoff is October 2023.

* Context Window is 128,000 tokens.

* Max Output is 16,384 tokens.

* Pricing (per 1M tokens): Input: $75, Output: $150, Cached Input: $37.50.

* Foundation for future reasoning models

4.5 Vibes Recap

Tons of folks who had access are pointing to the same thing, while this model is not beating others on evals, it's much better at multiple other things, namely creative writing, recommending songs, improved vision capability and improved medical diagnosis.

Karpathy said "Everything is a little bit better and it's awesome, but also not exactly in ways that are trivial to point to" and posted a thread of pairwise comparisons of tone on his X thread

Though the reaction is bifurcated as many are upset with the high price of this model (10x more costly on outputs) and the fact that it's just marginally better at coding tasks. Compared to the newerSonnet (Sonnet 3.7) and DeepSeek, folks are looking at OpenAI and asking, why isn't this way better?

Anthropic's Claude 3.7 Sonnet: A Coding Powerhouse

Anthropic released Claude 3.7 Sonnet, and the immediate reaction from the community was overwhelmingly positive. With 8x more output capability (64K) and reasoning built in, this model is an absolute coding powerhouse.

Claude 3.7 Sonnet is the new king of coding models, achieving a remarkable 70% on the challenging SWE-Bench benchmark, and the initial user feedback is stellar, though vibes started to shift a bit towards Thursday.

Ranking #1 on WebDev arena, and seemingly trained on UX and websites, Claude Sonnet 3.7 (AKA NewerSonner) has been blowing our collective minds since it was released on Monday, especially due to introducing Thinking and reasoning in a combined model.

Now, since the start of the week, the community actually had time to play with it,

Listen Now

Love PodBriefly?

If you like Podbriefly.com, please consider donating to support the ongoing development.

Support Us