Episode Details

Sandboxing, Agent Harnesses, and Agent Teamwork

Published 1 week, 2 days ago

Description

Shahram Anver is the Co-Founder and CEO of Cleric, the autonomous AI SRE that investigates and root-causes production issues like an experienced teammate — often in under two minutes. Before Cleric, Shahram led MLOps, DevOps, and FinOps platform engineering at Gojek, Southeast Asia's super-app. In this conversation, he breaks down why production operations never kept pace with AI-accelerated development, and why the real unlock for an AI SRE isn't faster triage — it's an agent that *learns* and compounds operational memory across your whole org.

In this episode:

🔧 The on-call problem — Why one broken service still drags ten engineers onto a call, and how AI changes that

🤖 What an AI SRE actually is — How Cleric investigates across your existing observability stack instead of adding another tool

🧠 Learning over MTTR — Why Shahram argues the value isn't alert triage, it's an agent that gets better every investigation

🪜 Ramping like a new engineer — Explore the environment, learn from the work, talk to the team

🔁 The investigate–measure–learn loop — Turning what worked on one incident into context for the next

🕸️ Knowledge graphs & operational memory — Mapping teams, clusters, and dependencies so insight from one team helps another

⚡ Under two minutes to root cause — What "fast" really requires in a live production environment

🚀 The road to autonomy — From assisted investigation toward self-healing infrastructure

If you're an SRE, platform engineer, DevOps lead, or anyone building or buying AI agents for production, this one's for you.

🔗 Links & Resources

Cleric: https://cleric.ai

Shahram on LinkedIn: https://www.linkedin.com/in/shahramanver/

Willem Pienaar (Co-Founder/CTO): https://www.linkedin.com/in/willempienaar/

Cleric launches the first self-learning AI SRE: https://cleric.ai/blog/cleric-launches-the-first-self-learning-ai-sre

MLOps Community: https://mlops.community

Join the community: https://go.mlops.community/slack

⏱️ Timestamps

[00:00] Tech Jargon Confusion

[00:27] Harness vs Model

[08:48] Model Evolution in Cleric

[13:36] Sandboxing and Simulated Environments

[20:40] Shifting AI Perceptions

[24:10] Managing Humans vs Agents

[31:32] Steering Parallel Agents

[34:16] Human Decision Integration in Models

[43:28] 80/20 Data Split

[49:40] Becoming a Skill

[53:35] 2027 Agent Autonomy

[59:14] Agent Learning in Production

[1:04:31] Software as Personal Capabilities

[1:08:31] Vibe Coding vs Durability

[1:18:23] Wrap Up

#AISRE #SiteReliabilityEngineering #AIAgents
