Episode Details
Dataverse pipelines: choose Synapse Link or Dataflow Gen2 based on refresh, storage ownership, and rollback safety—not hype
Season 1
Published 1 year, 3 months ago
Description
Dataverse pipelines are not failing because users are careless; they are failing because you picked the wrong extraction tool. In this episode of M365.fm, Mirko Peters puts Synapse Link and Dataflow Gen2 on the table side by side and shows how refresh frequency, storage ownership, and rollback safety—not hype—decide which one belongs in your architecture.
He starts with Synapse Link, the control freak’s dream. You choose exactly which Dataverse tables and columns to sync, define refresh cadence down to every 15 minutes, and land data directly in your own Azure Data Lake Storage Gen2 account in open Parquet format. That means you own the storage, satisfy governance and compliance people who care about where data physically lives, and have full flexibility to pipe those files into Fabric lakehouses, warehouses, or external platforms. The trade‑off: you are also responsible for Azure resources, permissions, Delta conversion, and cost discipline—Synapse Link is infrastructure, not a wizard.
Then he flips the scalpel for the Swiss Army knife: Dataflow Gen2. Built for speed and low‑code, it lets Power BI and Fabric users pull Dataverse tables into OneLake with a few clicks, apply simple transformations, and feed dashboards without touching the Azure portal. The price of that convenience shows up later: you are capped at 48 refreshes per day (every 30 minutes), stuck with append‑only or full overwrite behavior instead of row‑level delta, and consuming Fabric capacity units rather than explicit storage and compute bills. When multiple Dataflows point at the same table or Dev and Prod collide, you get silent overwrites and governance chaos at 2 a.m.
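The overwrite hazard is easy to see in a toy example (plain Python, not Fabric code): a full-overwrite refresh makes the target an exact copy of today's source, so rows that vanished upstream vanish from history too, while a row-level merge preserves them.

```python
# Yesterday's loaded table, keyed by row id (hypothetical data).
target = {1: "open", 2: "closed", 3: "open"}
# Today's source: row 3 has been deleted upstream, row 2 changed.
source = {1: "open", 2: "reopened"}

# Full overwrite (Dataflow Gen2-style refresh): target becomes a copy of source.
overwritten = dict(source)          # row 3 is silently gone

# Row-level merge (the kind of upsert a change feed lets you build yourself):
merged = {**target, **source}       # row 3 survives, row 2 is updated

print(overwritten)
print(merged)
```

Run nightly against a finance table, the first pattern is exactly how "years of transaction history" disappear while every refresh still reports success.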
Throughout the episode, Mirko uses real‑world stories: a finely tuned Synapse setup that devolved into duplicated exports and overlapping refreshes when multiple teams piled in without governance, and a finance dashboard that looked “successful” in Dataflow Gen2 while nightly overwrites quietly corrupted years of transaction history. His conclusion is blunt: Synapse Link is the right choice when you need near real‑time feeds, storage ownership, and engineered pipelines; Dataflow Gen2 is for quick analytics, prototypes, and low‑risk reporting where losing precise rollback is acceptable. The problem is not your users—it is pretending both tools solve the same problem.
WHAT YOU WILL LEARN
- Why Dataverse pipelines fail more from wrong tool choice than from user error.
- Where Synapse Link shines: near real‑time sync, selective tables, your own ADLS Gen2 storage.
- Where Dataflow Gen2 fits: low‑code, Fabric‑native refreshes with hard limits on frequency and rollback.
- How refresh caps, overwrite behavior, and capacity consumption can quietly break Dataflow‑based solutions.
- A simple rule of thumb to pick Synapse Link or Dataflow Gen2 based on refresh, ownership, and safety needs.
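The episode's rule of thumb can be sketched as a tiny decision helper (a hypothetical function, not a Microsoft API): any one hard requirement on refresh cadence, storage ownership, or rollback rules out Dataflow Gen2.

```python
def choose_dataverse_tool(needs_sub_30_min_refresh: bool,
                          must_own_storage: bool,
                          needs_row_level_rollback: bool) -> str:
    """Pick the extraction tool from three yes/no questions."""
    if needs_sub_30_min_refresh or must_own_storage or needs_row_level_rollback:
        # 15-minute cadence, your own ADLS Gen2, engineered pipelines.
        return "Synapse Link"
    # Low-code, Fabric-native, fine for prototypes and low-risk reporting.
    return "Dataflow Gen2"

print(choose_dataverse_tool(True, False, False))
print(choose_dataverse_tool(False, False, False))
```

Three questions, one answer; the failure mode the episode warns about is asking none of them and letting both tools coexist on the same table.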
Your Dataverse pipeline is only as good as the extraction tool you design it around. Treat Synapse Link as the surgical instrument for governed, near real‑time pipelines, and Dataflow Gen2 as the Swiss Army knife for quick, low‑risk analytics.