How LLMs Actually Learn: Stages or Slurry?

Episode 3767 Published 3 days, 16 hours ago
Description
When you train a large language model from scratch, does it learn in stages (grammar first, then facts, then reasoning), or does everything bloom in parallel? The honest answer is both. This episode unpacks what loss curves actually reveal about training dynamics: how syntax and factual knowledge race each other from day one, why "grokking" (generalization that arrives long after the training loss has flattened) complicates the picture, and how the natural distribution of the training data creates an implicit curriculum. We explore why the model's internal representations crystallize over time, how emergent abilities appear at scale thresholds, and why the training process is best understood as a single undifferentiated slurry of next-token prediction from which structure emerges.