Episode Details

“Learning zero, and what SLT gets wrong about it” by Dmitry Vaintrob

Description

This is the first in a pair of posts I'm hoping to write about Singular Learning Theory (SLT) and singularities as a model of data degeneracy. If I get to it, the second post will be aimed at a more general audience; this one is more technical.

Introduction

To me, SLT is an important source of toy models that point at an interesting class of new statistical phenomena in learning. It is also a valuable correction to an older and (at this point) largely defunct story in which learning is fully controlled by Hessian eigenvalues and "nonsingular basins". Practitioners of SLT have been instrumental in developing and refining the practice of applying Bayesian sampling (used by physicists in papers like this one) to empirical models. And the theory's founder, Sumio Watanabe, is a once-in-a-generation genius who saw and mathematically justified crucial statistical and information-theoretic concepts in learning long before they appeared in "mainstream" ML theory.

However, there is a frequently repeated statement in SLT papers (one that doesn't affect empirical results) which I think is wrong in a load-bearing way. This is the statement that models that appear in machine learning are singular in the infinite-data limit, and that a measurement [...]

---

Outline:

(00:27) Introduction

(03:40) What doesn't need fixing

(04:45) What's wrong

(07:17) The theory

(08:05) Infinite data, and the parameters and .

(09:16) The SLT prediction

(10:31) Hermite modes and excitations

(13:50) Addendum: the actual lambda-hat scaling (Ansatz and experiment)

(15:28) The effective theory

(18:39) Is this example special?

(20:27) The upshot

The original text contained 10 footnotes which were omitted from this narration.

---

First published:
April 28th, 2026

Source:
https://www.lesswrong.com/posts/5hKgJy8rcqnM9ntp2/learning-zero-and-what-slt-gets-wrong-about-it

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Two log-scale graphs showing Hermite mode coefficients versus mode index k and square root k.
