Sandboxing, Agent Harnesses, and Agent Teamwork
Description
Shahram Anver is the Co-Founder and CEO of Cleric, the autonomous AI SRE that investigates and root-causes production issues like an experienced teammate — often in under two minutes. Before Cleric, Shahram led MLOps, DevOps, and FinOps platform engineering at Gojek, Southeast Asia's super-app. In this conversation, he breaks down why production operations never kept pace with AI-accelerated development, and why the real unlock for an AI SRE isn't faster triage — it's an agent that *learns* and compounds operational memory across your whole org.
In this episode:
🔧 The on-call problem — Why one broken service still drags ten engineers onto a call, and how AI changes that
🤖 What an AI SRE actually is — How Cleric investigates across your existing observability stack instead of adding another tool
🧠 Learning over MTTR — Why Shahram argues the value isn't alert triage, it's an agent that gets better every investigation
🪜 Ramping like a new engineer — Explore the environment, learn from the work, talk to the team
🔁 The investigate–measure–learn loop — Turning what worked on one incident into context for the next
🕸️ Knowledge graphs & operational memory — Mapping teams, clusters, and dependencies so insight from one team helps another
⚡ Under two minutes to root cause — What "fast" really requires in a live production environment
🚀 The road to autonomy — From assisted investigation toward self-healing infrastructure
If you're an SRE, platform engineer, DevOps lead, or anyone building or buying AI agents for production, this one's for you.
🔗 Links & Resources
Cleric: https://cleric.ai
Shahram on LinkedIn: https://www.linkedin.com/in/shahramanver/
Willem Pienaar (Co-Founder/CTO): https://www.linkedin.com/in/willempienaar/
Cleric launches the first self-learning AI SRE: https://cleric.ai/blog/cleric-launches-the-first-self-learning-ai-sre
MLOps Community: https://mlops.community
Join the community: https://go.mlops.community/slack
⏱️ Timestamps
[00:00] Tech Jargon Confusion
[00:27] Harness vs Model
[08:48] Model Evolution in Cleric
[13:36] Sandboxing and Simulated Environments
[20:40] Shifting AI Perceptions
[24:10] Managing Humans vs Agents
[31:32] Steering Parallel Agents
[34:16] Human Decision Integration in Models
[43:28] 80/20 Data Split
[49:40] Becoming a Skill
[53:35] 2027 Agent Autonomy
[59:14] Agent Learning in Production
[1:04:31] Software as Personal Capabilities
[1:08:31] Vibe Coding vs Durability
[1:18:23] Wrap up
#AISRE #SiteReliabilityEngineering #AIAgents