Episode Details
Stop Delegating AI Decisions: How Spec Kit Makes AI Agents Safe in Microsoft Entra and Microsoft 365
Season 1
Published 3 months, 3 weeks ago
Description
(00:00:00) The AI Governance Dilemma
(00:00:38) The Pitfalls of Unchecked AI-Powered Development
(00:03:16) The Spec Kit Solution: Binding Intent to Executable Rules
(00:05:38) The Mechanics of Privileged Creep
(00:17:42) Consent Sprawl: When Convenience Becomes a Threat
(00:23:00) Conditional Access Erosion: The Silent Threat
(00:28:44) Measuring and Improving Identity Governance
(00:34:13) Implementing Constitutional Governance with Spec Kit
(00:34:56) The Power of Executable Governance
(00:40:11) Identity Policies as Compilers
In this episode of m365.fm, Mirko Peters looks at what really happens when teams let AI agents make technical decisions in live Microsoft Entra and Microsoft 365 environments. AI agents are increasingly wired directly into internal APIs, developer workflows, and infrastructure, where they write code, call services, and change configurations at scale. The problem: agents optimize for task completion, not for long‑term safety, governance, or architectural intent. This episode explains why “letting the agent figure it out” quickly becomes a reliability and security risk once you leave the lab and enter production.
WHY AI AGENTS BEHAVE DIFFERENTLY IN REAL SYSTEMS
In theory, agentic systems sound efficient: describe the outcome, let the agent plan and execute. In practice, production reality is messy. Agents chain unexpected API calls, pick unsafe defaults, and generate changes that engineers struggle to reproduce or fully understand later. A small prompt can lead to a large system change, touching identity, permissions, and data paths you never intended to expose. Debugging this behavior is significantly harder than debugging human‑written code, especially when logs, prompts, and context windows interact in non‑obvious ways.
NON‑DETERMINISM IS AN ENGINEERING PROBLEM, NOT JUST A RESEARCH QUIRK
Many teams underestimate how non‑deterministic behavior impacts operations, audits, and incident response. The same agent prompt can produce different code, different API calls, or different side effects across runs. That makes root‑cause analysis, reproducible fixes, and compliance evidence difficult or impossible. This episode argues that determinism still matters deeply in modern systems: you need clear boundaries where behavior is predictable, testable, and reviewable—even if an LLM is involved somewhere in the pipeline.
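One practical way to draw such a boundary is a deterministic validation gate in front of the agent: whatever the agent proposes, the same proposal always produces the same verdict, so audits and incident replays are reproducible. The sketch below is illustrative only; the function name, action names, and Graph scope strings are assumptions, not part of any real toolkit.

```python
# Illustrative sketch of a deterministic validation boundary in front of
# an AI agent. ALLOWED_ACTIONS and MAX_SCOPES are hypothetical policy data.

ALLOWED_ACTIONS = {"add_user_to_group", "reset_password"}
MAX_SCOPES = {"User.Read", "Group.Read.All"}  # assumed allow-listed scopes

def validate_change(proposal: dict) -> tuple[bool, str]:
    """Deterministic gate: the same proposal always yields the same
    verdict, so root-cause analysis and compliance evidence stay
    reproducible even though the agent itself is non-deterministic."""
    action = proposal.get("action")
    if action not in ALLOWED_ACTIONS:
        return False, f"action not allow-listed: {action}"
    extra = set(proposal.get("scopes", [])) - MAX_SCOPES
    if extra:
        return False, f"requested scopes exceed policy: {sorted(extra)}"
    return True, "ok"

ok, reason = validate_change(
    {"action": "add_user_to_group", "scopes": ["User.Read"]}
)
print(ok, reason)
```

The agent can vary its plans freely behind this line; only proposals that pass the gate ever reach a live tenant, and a rejected proposal produces the same rejection reason on every replay.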
SECURITY, PERMISSIONS, AND ACCIDENTAL CHAOS
Security risk multiplies when AI agents are treated like “junior engineers” instead of untrusted automation. In practice, agents tend to request broader permissions than necessary, store secrets unsafely, or create undocumented endpoints and shortcuts. They may bypass established workflows, skip approvals, or write code that quietly weakens existing controls. The episode breaks down why traditional security assumptions break once agents can act, and why you must design your systems as if agents are external, untrusted callers—no matter how smart they appear.
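Treating the agent as an external, untrusted caller can be as simple as routing every call through a policy proxy: allow-listed read operations pass, and everything else requires an explicit, out-of-band human approval. The endpoint patterns below are illustrative assumptions, not real Microsoft Graph routes.

```python
# Sketch of an untrusted-caller gate: every agent request is matched
# against a read-only allowlist; writes need recorded human approval.
# Patterns and paths are hypothetical examples.
import fnmatch

READ_ONLY_PATTERNS = ["GET /users/*", "GET /groups/*"]

def is_permitted(method: str, path: str, approved: bool) -> bool:
    """Reads matching the allowlist pass automatically; any write,
    delete, or unknown path is denied unless a human approval was
    recorded out-of-band (the `approved` flag)."""
    request = f"{method.upper()} {path}"
    if any(fnmatch.fnmatch(request, pattern) for pattern in READ_ONLY_PATTERNS):
        return True
    return approved

assert is_permitted("GET", "/users/alice", approved=False)
assert not is_permitted("PATCH", "/users/alice", approved=False)
```

The design choice is that the default answer is "no": the agent never gains a permission by asking persuasively, only by a human recording approval outside the agent's own channel.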
WHAT SPEC KIT DOES: ENFORCING ARCHITECTURAL INTENT
Spec Kit is introduced as a way to make architectural intent explicit and enforceable before agents touch real systems. Instead of letting an agent “decide” how to integrate with Microsoft Graph or internal APIs, Spec Kit defines the rules up front, binding intent to executable specifications that every proposed change must satisfy before it ships.
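The underlying idea of executable governance can be sketched generically: intent is written down as data, and a check "compiles" it against what an agent actually requests. This is a generic illustration of the pattern, not Spec Kit's real file format or API; the spec structure and scope names are assumptions.

```python
# Generic sketch of executable governance: declared intent is data,
# and a gate compares every request against it. Not Spec Kit's actual
# format; SPEC and the scope names are hypothetical.

SPEC = {
    "service": "onboarding-bot",
    "graph_scopes": ["User.Read.All", "Group.ReadWrite.All"],
}

def enforce_spec(requested_scopes: list[str], spec: dict) -> list[str]:
    """Return the scope requests that violate the declared spec; an
    empty list means the change matches recorded intent."""
    return sorted(set(requested_scopes) - set(spec["graph_scopes"]))

violations = enforce_spec(
    ["User.Read.All", "Directory.ReadWrite.All"], SPEC
)
# Directory.ReadWrite.All was never declared, so the gate fails loudly
# instead of letting the extra permission creep in silently.
assert violations == ["Directory.ReadWrite.All"]
```

Run in CI, a check like this turns "Identity Policies as Compilers" from a metaphor into a build step: undeclared permissions break the build the same way a type error would.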