Engineering Around Extreme S3 Scale with R. Tyler Croy
Description
R. Tyler Croy, a principal engineer at Scribd, joins Corey Quinn to explain what happens when simple tasks cost $100,000. Checking if files are damaged? $100K. Using newer S3 tools? Way too expensive. Normal solutions don't work anymore. Tyler shares why, with this much data, you can't just throw money at the problem; you have to engineer your way out.
About R. Tyler:
R. Tyler Croy leads infrastructure architecture at Scribd and has been an open source developer for over 14 years. His work spans the FreeBSD, Python, Ruby, Puppet, Jenkins, and Delta Lake communities. Under his leadership, Scribd’s Infrastructure Engineering team built Delta Lake for Rust to support a wide variety of high-performance data processing systems. That experience led Tyler to develop the next big iteration of storage architecture to power the large-scale full-text compute challenges facing the organization.
Show Highlights:
01:48 Scribd's 18-Year History
04:00 One Document Becomes Billions of Files
05:47 When Normal Physics Stop Working
08:02 Why S3 Metadata Costs Too Much
10:50 How AI Made Old Documents Valuable
13:30 From 100 Billion to 100 Million Objects
15:05 The Curse of Retail Pricing
19:17 How Data Scientists Create Growth
21:18 De-Normalizing Data Problems
25:29 Evolving Old Systems
27:45 Billions Added Since Summer
29:29 Underused S3 Features
31:48 Where to Find Tyler
Links:
Scribd: https://tech.scribd.com
Mastodon: https://hacky.town/@rtyler
GitHub: https://github.com/rtyler
Sponsored by:
duckbillhq.com