Data Infrastructure Automation For Private SaaS At Snowplow

Episode 120 Published 6 years ago
Description

Summary

One of the biggest challenges in building reliable platforms for processing event pipelines is managing the underlying infrastructure. At Snowplow Analytics the complexity is compounded by the need to manage multiple instances of their platform across customer environments. In this episode Josh Beemster, the technical operations lead at Snowplow, explains how they manage automation, deployment, monitoring, scaling, and maintenance of their streaming analytics pipeline for event data. He also shares the challenges they face in supporting multiple cloud environments and the need to integrate with existing customer systems. If you are daunted by the needs of your data infrastructure then it’s worth listening to how Josh and his team are approaching the problem.

Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
  • Your host is Tobias Macey and today I’m interviewing Josh Beemster about how Snowplow manages deployment and maintenance of their managed service in their customers’ cloud accounts.

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by giving an overview of the components in your system architecture and the nature of your managed service?
  • What are some of the challenges that are inherent to the private SaaS nature of your managed service?
  • What elements of your system require the most attention and maintenance to keep them running properly?
  • Which components in the pipeline are most subject to variability in traffic or resource pressure and what do you do to ensure proper capacity?
  • How do you manage deployment of the full Snowplow pipeline for your customers?
    • How has your strategy for deployment evolved since you first began offering the managed service?
    • How has the architecture of the pipeline evolved to simplify operations?
  • How much customization do you allow for in the event that the customer has their own system that they want to use in place of one of your supported components?
    • What are some of the common difficulties that you encounter when working with customers who need customized components, topologies, or event flows?