Episode Details
“Data you could have observed but didn’t” by Gretta Duleba
Description
You're running a study that involves keeping records about humans. You have a spreadsheet with rows for each person and columns for height, weight, and eye color. You get pretty far in your study and then realize you sure could have used hair color too, but shoot, you didn't think of that in advance, so you don't have that data.
What kind of data is hair color in this example?
It's not an observable because you didn't observe it.
It's not a latent because you totally could have observed it if you'd thought to do so; it's right there. (You might have to check the roots specifically, though, or the people with purple hair dye are going to throw you off.)
I didn't know the vocab word. I was using "unobserved" for a while but wasn't happy with it.
I looked into it[1] and it turns out that different fields have different words for this.[2]
In econometrics, they do sometimes say "unobserved," but when they do, it's a fuzzy catch-all that might also mean a latent. They're more likely to say "omitted."
There's a whole subfield of statistics specializing in missing data. The subfield is called... Missing Data.
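One bookkeeping device common in missing-data work is a missingness (response) indicator: 1 where a value was recorded, 0 where it wasn't. The following is a minimal sketch of that idea applied to a toy version of the spreadsheet above. The column names (`height_cm`, `weight_kg`, `eye_color`, `hair_color`), the helper functions, and the row values are all hypothetical and made up for illustration. Hair color comes out missing in every row, while the other columns are fully observed.

```python
# Toy version of the study spreadsheet; all values are made up.
# hair_color was never collected, so it is None in every row.
rows = [
    {"height_cm": 170, "weight_kg": 65, "eye_color": "brown", "hair_color": None},
    {"height_cm": 182, "weight_kg": 80, "eye_color": "blue", "hair_color": None},
    {"height_cm": 158, "weight_kg": 52, "eye_color": "green", "hair_color": None},
]

COLUMNS = ["height_cm", "weight_kg", "eye_color", "hair_color"]


def missingness_mask(rows, columns):
    """For each row, map every column to 1 if a value was recorded, else 0."""
    return [{col: 0 if row.get(col) is None else 1 for col in columns} for row in rows]


def observed_fraction(mask, column):
    """Fraction of rows in which `column` was recorded."""
    return sum(r[column] for r in mask) / len(mask)


mask = missingness_mask(rows, COLUMNS)
for col in COLUMNS:
    print(col, observed_fraction(mask, col))
```

The indicator only records whether a value is present; it says nothing about whether the quantity could have been measured, which is exactly the distinction between "could have observed but didn't" and a latent.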
[...]
The original text contained 2 footnotes which were omitted from this narration.
---
First published:
May 29th, 2026
Source:
https://www.lesswrong.com/posts/jemRvoCop4z4y4zfT/data-you-could-have-observed-but-didn-t
---
Narrated by TYPE III AUDIO.