Engineering Around Extreme S3 Scale with R. Tyler Croy

Episode 660 Published 10 hours ago
Description

R. Tyler Croy, a principal engineer at Scribd, joins Corey Quinn to explain what happens when simple tasks cost $100,000. Checking if files are damaged? $100K. Using newer S3 tools? Way too expensive. Normal solutions don't work anymore. Tyler shares how, with this much data, you can't just throw money at the problem; you have to engineer your way out.

About R. Tyler: 

R. Tyler Croy leads infrastructure architecture at Scribd and has been an open source developer for over 14 years. His work spans the FreeBSD, Python, Ruby, Puppet, Jenkins, and Delta Lake communities. Under his leadership, Scribd’s Infrastructure Engineering team built Delta Lake for Rust to support a wide variety of high-performance data processing systems. That experience led Tyler to develop the next big iteration of storage architecture to address the large-scale full-text compute challenges facing the organization.

Show Highlights:

01:48 Scribd's 18-Year History
04:00 One Document Becomes Billions of Files
05:47 When Normal Physics Stop Working
08:02 Why S3 Metadata Costs Too Much
10:50 How AI Made Old Documents Valuable
13:30 From 100 Billion to 100 Million Objects
15:05 The Curse of Retail Pricing
19:17 How Data Scientists Create Growth
21:18 De-Normalizing Data Problems
25:29 Evolving Old Systems
27:45 Billions Added Since Summer
29:29 Underused S3 Features
31:48 Where to Find Tyler

Links: 

Scribd: https://tech.scribd.com
Mastodon: https://hacky.town/@rtyler
GitHub: https://github.com/rtyler

Sponsored by:
duckbillhq.com
