πŸ“† ThursdAI - July 17th - Kimi K2 πŸ‘‘, OpenAI Agents, Grok Waifus, Amazon Kiro, W&B Inference & more AI news!

πŸ“† ThursdAI - July 17th - Kimi K2 πŸ‘‘, OpenAI Agents, Grok Waifus, Amazon Kiro, W&B Inference & more AI news!

Published 8Β months, 2Β weeks ago

Hey everyone, Alex here πŸ‘‹ and WHAT a week to turn a year older! Not only did I get to celebrate my birthday with 30,000+ of you live during the OpenAI stream, but we also witnessed what might be the biggest open-source AI release since DeepSeek dropped. Buckle up, because we're diving into a trillion-parameter behemoth, agentic capabilities that'll make your head spin, and somehow Elon Musk decided Grok waifus are the solution to... something.

This was one of those weeks where I kept checking if I was dreaming. Remember when DeepSeek dropped and we all lost our minds? Moonshot's Kimi K2 just made that look like a warm-up act. And that's not even the wildest part of this week!

As always, all the show notes and links are at the bottom, here's our liveshow (which included the full OAI ChatGPT agents watch party) - Let's get into it!


πŸš€ Open Source LLMs: The Kimi K2 Revolution

The New Open Source King Has Arrived

Folks, I need you to understand something - just a little after we finished streaming last week celebrating Grok 4, a company called Moonshot decided to casually drop what might be the most significant open source release since... well, maybe ever?

Kimi K2 is a 1 trillion parameter model. Yes, you read that right - TRILLION. Not billion. And before you ask "but can my GPU run it?" - this is an MoE (Mixture of Experts) model with only 32B active parameters, which means it's actually usable while being absolutely massive.
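To make that "massive but usable" point concrete, here's a back-of-envelope sketch using the figures above; the 8-bit-per-weight assumption is mine (for illustration), not a published serving config:

```python
# Back-of-envelope MoE math, using the figures from this post:
# ~1T total parameters, but only ~32B routed/active per token.
TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion
ACTIVE_PARAMS = 32_000_000_000     # 32 billion active per token
BYTES_PER_PARAM = 1                # assuming 8-bit quantized weights

total_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
active_gb = ACTIVE_PARAMS * BYTES_PER_PARAM / 1e9

# You still need memory for all the experts, but each token's forward
# pass only computes through the active ~3% slice:
print(f"weights to store: ~{total_gb:,.0f} GB")
print(f"weights computed per token: ~{active_gb:,.0f} GB")
```

So the memory bill is for the full trillion, but the per-token compute bill is closer to a 32B dense model - that's the MoE trade.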

Let me give you the numbers that made my jaw drop:

* 65.8% on SWE-bench Verified - This non-reasoning model beats Claude Sonnet (and almost everything else)

* 384 experts in the mixture (the scale here is bonkers)

* 128K context window standard, with rumors of 2M+ capability

* Trained on 15.5 trillion tokens with the new Muon optimizer

The main thing about the SWE-bench score isn't even the incredible raw number: it's that a non-reasoning model gets there, and at that price!

The Muon Magic

Here's where it gets really interesting for the ML nerds among us. These folks didn't use AdamW - they used a new optimizer called Muon (with their own MuonClip variant). Why does this matter? They trained to 15.5 trillion tokens with ZERO loss spikes. That beautiful loss curve had everyone in our community Slack channels going absolutely wild.

As Yam explained during the show, claiming you have a better optimizer than AdamW is like saying you've cured cancer - everyone says it, nobody delivers. Well, Moonshot just delivered at 1 trillion parameter scale.
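For the curious, the core of Muon is surprisingly compact: instead of Adam-style per-coordinate scaling, it takes each weight matrix's SGD momentum and approximately orthogonalizes it with a few Newton-Schulz iterations. A minimal sketch below, with the quintic coefficients from Keller Jordan's open-source Muon reference implementation (Moonshot's MuonClip adds its own QK-clip stabilization on top, which isn't shown here):

```python
import numpy as np

def newton_schulz(G, steps=5):
    """Approximately orthogonalize matrix G (push its singular values
    toward 1) - the core step of the Muon optimizer."""
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic iteration coefficients
    X = G / (np.linalg.norm(G) + 1e-7)  # Frobenius-normalize first
    tall = G.shape[0] > G.shape[1]
    if tall:
        X = X.T  # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if tall else X

def muon_step(W, grad, momentum, lr=0.02, beta=0.95):
    """One Muon update for a 2D weight matrix: momentum, then
    orthogonalized descent direction instead of Adam's scaling."""
    momentum = beta * momentum + grad
    W = W - lr * newton_schulz(momentum)
    return W, momentum
```

Note the iteration is tuned for speed, not exact orthogonality - after five steps the singular values land in a band around 1, which turns out to be good enough in practice.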

Why This Changes Everything

This isn't just another model release. This is "Sonnet at home" if you have the hardware. But more importantly:

* Modified MIT license (actually open!)

* 5x cheaper than proprietary alternatives

* Base model released (the first time we get a base model this powerful)

* Already has Anthropic-compatible API (they knew what they were doing)
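That last bullet is worth dwelling on: Anthropic-compatible means existing Claude tooling should work with little more than a base-URL swap. A hedged sketch of the request shape (the endpoint URL and model id below are placeholders I made up for illustration - check Moonshot's platform docs for the real values):

```python
import json

# Sketch of hitting an Anthropic-compatible /v1/messages endpoint.
# BASE_URL and MODEL are hypothetical placeholders, not real values.
BASE_URL = "https://api.moonshot.example/anthropic"  # hypothetical
MODEL = "kimi-k2-instruct"                           # hypothetical id

def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    # Same request schema the Anthropic Messages API expects, which is
    # why Claude-compatible tooling works with only a base-URL swap.
    return {
        "model": MODEL,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_request("Summarize this week's open-source AI news.")
# POST this as JSON to f"{BASE_URL}/v1/messages" with your API key header.
print(json.dumps(body, indent=2))
```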

The vibes are OFF THE CHARTS. Every high-taste model tester I know is saying this is the best open source model they've ever used. It doesn't have that "open source smell" - it feels like a frontier model because it IS a frontier model.

Not Only a Math Genius

Importantly, this model is great at multiple things; folks called out its personality and writing style specifically! Our friend Sam Paech, creator of EQBench, has noted that this is maybe the first time an o
