Paul Zaich from Checkr tells us about a critical outage: what caused it, and how they tracked down and fixed the issue. The conversation ranges through troubleshooting complex systems, building team culture, blameless post-mortems, and monitoring the right things so that your applications don't fail, or at least alert you when they do.
Published on 1 year, 3 months ago