Episode Details
Microsoft Fabric DP-600 Analytics Engineer Training Step 2 of 4: Unlocking Advanced Analytics Power
Season 1
Published 11 months, 2 weeks ago
Description
(00:00:00) Introduction to data flows
(00:07:14) Understanding data pipelines
(00:23:20) Real-time data shortcuts
(00:31:39) Integrating tools for efficiency
(00:41:40) Managing dependencies with lineage
(00:49:15) Role of stored procedures
(00:58:37) Optimizing data transformations
(01:05:43) End-to-end automation overview
In this episode of M365.fm, Mirko Peters takes you from ingestion anxiety to a clear playbook for moving data into Microsoft Fabric—using a concrete scenario: pulling data from Amazon S3, transforming it with Python, and landing it in a Fabric data warehouse. He starts by demystifying “data ingestion” itself, explaining why it is more than just copying files: it is the foundation for timely insights, efficient workflows, and trustworthy data quality, and without it your data remains just numbers on a spreadsheet.
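To make that scenario concrete, here is a minimal Python sketch of the extraction step, assuming boto3 for S3 access and pandas for shaping; the bucket, key, and column names are purely illustrative, not from the episode.

import io

import boto3
import pandas as pd

# Hypothetical bucket, key, and column names for illustration only.
BUCKET = "example-marketing-data"
KEY = "exports/campaigns.csv"

s3 = boto3.client("s3")  # credentials resolved from the environment
obj = s3.get_object(Bucket=BUCKET, Key=KEY)
df = pd.read_csv(io.BytesIO(obj["Body"].read()))

# Light shaping before the data lands anywhere: normalize column
# names and drop rows missing the business key.
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
df = df.dropna(subset=["campaign_id"])

From here, the cleaned frame would be landed in the warehouse by whichever Fabric tool fits the workload, which is exactly the decision the rest of the episode unpacks.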
Mirko then breaks down the three core options Fabric gives you: Dataflows, Pipelines, and Notebooks. Dataflows are the no‑code, Power Query–based workhorse for small to moderate datasets, with 150+ connectors and fast wins for cleaning, merging, and shaping data when volumes stay manageable. Pipelines step in when scale and orchestration matter, acting as traffic controllers that coordinate multi‑source ingestion, retries, branching, and scheduling—perfect for production‑grade ETL where monitoring and resilience are non‑negotiable. Notebooks bring full Python flexibility for complex transformations and API‑driven ingestion, turning raw JSON and custom logic into structured data ready for warehousing.
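As a flavor of the Notebook pattern, a few lines of Python are enough to turn API JSON into a tabular frame; the endpoint and payload shape below are assumptions for the sketch, not anything cited in the episode.

import pandas as pd
import requests

# Hypothetical REST endpoint and payload shape, purely for illustration.
resp = requests.get("https://api.example.com/v1/orders", timeout=30)
resp.raise_for_status()

# json_normalize flattens nested JSON into tabular columns such as
# customer_name and customer_region, ready to land in a warehouse table.
orders = pd.json_normalize(resp.json()["orders"], sep="_")
print(orders.dtypes)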
The episode spends time on where Dataflows start to break. As datasets grow into millions of rows or duplicate checks get heavy, Mirko shows how no‑code comfort turns into performance pain, even with optimizations like Fast Copy. He uses practical examples—cleaning marketing data, merging CRM exports, prepping datasets for self‑service reports—to position Dataflows as the Swiss Army knife for hands‑on tasks, not the engine for petabyte‑scale ingestion.
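For the heavy duplicate checks Mirko calls out, the usual escape hatch is a Spark notebook. A sketch of that move, assuming a hypothetical lakehouse table of CRM exports with a contact_id business key and a modified_at timestamp:

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical lakehouse table of CRM exports with repeated contacts.
crm = spark.read.table("crm_exports")

# Keep only the most recent row per contact: a window function makes the
# "latest wins" rule deterministic, and Spark distributes the work that
# a Dataflow would otherwise grind through row by row.
w = Window.partitionBy("contact_id").orderBy(F.col("modified_at").desc())
deduped = (
    crm.withColumn("rn", F.row_number().over(w))
       .filter(F.col("rn") == 1)
       .drop("rn")
)

deduped.write.mode("overwrite").saveAsTable("crm_contacts_clean")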
From there, he makes the case for graduating to Pipelines when workloads get serious. You hear how pipelines handle multi‑source ingestion, automatic retries on failure, parameterized workflows, and complex scheduling without burying logic in a single fragile flow. Mirko pairs this with Notebooks for heavy transformation, explaining patterns where Pipelines run extraction and orchestration while Notebooks perform intricate validation and reshaping before data lands in the warehouse—combining robustness with the full power of Python.
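The episode stays off the keyboard, but the Pipeline-plus-Notebook split might look roughly like this: a sketch assuming the pipeline's Notebook activity passes a run_date parameter, with all table names invented for illustration.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Assume the pipeline passes run_date as a parameter; this default
# would be overridden at runtime. Table names are illustrative.
run_date = "2024-01-01"

raw = spark.read.table("staging_orders").filter(F.col("ingest_date") == run_date)

# Fail fast on bad data so the pipeline's retry and alerting logic,
# not a 2 a.m. page, is what reacts to the problem.
bad_rows = raw.filter(F.col("order_total") < 0).count()
if bad_rows > 0:
    raise ValueError(f"{bad_rows} rows failed validation for {run_date}")

raw.write.mode("append").saveAsTable("orders_validated")

The division of labor is the point: the pipeline owns extraction, scheduling, and failure handling, while the notebook owns validation and reshaping, so neither tool is stretched past what it is good at.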
By the end, you have a simple decision frame instead of guesswork. Use Dataflows for fast, no‑code ingestion of small to mid‑sized data, Notebooks for complex, code‑driven transformations, and Pipelines as the orchestration backbone that stitches everything together at scale. Mirko’s core message: the problem is rarely “Fabric can’t do this”—it is choosing the wrong tool for your workload and discovering the limits at 2 a.m. instead of at design time.
WHAT YOU WILL LEARN
- What data ingestion really is and why it underpins timely, high‑quality analytics.
- When to use Dataflows as a no‑code option—and where they fail on large datasets.
- How Pipelines provide orchestration, retries, and scheduling for scalable ETL.