Podcast Episodes
Snorkel: Extracting Value From Dark Data with Alex Ratner - Episode 15
Episode 15
Summary
The majority of the conversation around machine learning and big data pertains to well-structured and cleaned data sets. Unfortunately, that…
8 years, 4 months ago
CRDTs and Distributed Consensus with Christopher Meiklejohn - Episode 14
Episode 14
Summary
As we scale our systems to handle larger volumes of data, geographically distributed users, and varied data sources, the requirement to distr…
8 years, 4 months ago
Citus Data: Distributed PostgreSQL for Big Data with Ozgun Erdogan and Craig Kerstiens - Episode 13
Episode 13
Summary
PostgreSQL has become one of the most popular and widely used databases, and for good reason. The level of extensibility that it supports ha…
8 years, 5 months ago
Wallaroo with Sean T. Allen - Episode 12
Episode 12
Summary
Data oriented applications that need to operate on large, fast-moving streams of information can be difficult to build and scale due to the …
8 years, 5 months ago
SiriDB: Scalable Open Source Timeseries Database with Jeroen van der Heijden - Episode 11
Episode 11
Summary
Time series databases have long been the cornerstone of a robust metrics system, but the existing options are often difficult to manage in p…
8 years, 5 months ago
Confluent Schema Registry with Ewen Cheslack-Postava - Episode 10
Episode 10
Summary
To process your data you need to know what shape it has, which is why schemas are important. When you are processing that data in multiple s…
8 years, 5 months ago
data.world with Bryon Jacob - Episode 9
Episode 9
Summary
We have tools and platforms for collaborating on software projects and linking them together; wouldn’t it be nice to have the same capabilit…
8 years, 6 months ago
Data Serialization Formats with Doug Cutting and Julien Le Dem - Episode 8
Episode 8
Summary
With the wealth of formats for sending and storing data, it can be difficult to determine which one to use. In this episode Doug Cutting, crea…
8 years, 6 months ago
Buzzfeed Data Infrastructure with Walter Menendez - Episode 7
Episode 7
Summary
Buzzfeed needs to be able to understand how its users are interacting with the myriad articles, videos, etc. that they are posting. This let…
8 years, 6 months ago
Astronomer with Ry Walker - Episode 6
Episode 6
Summary
Building a data pipeline that is reliable and flexible is a difficult task, especially when you have a small team. Astronomer is a platform …
8 years, 10 months ago