From GPUs to Workloads: Flex AI’s Blueprint for Fast, Cost‑Efficient AI
Episode 62
Summary
In this episode of the AI Engineering Podcast, Brijesh Tripathi, CEO of Flex AI, talks about removing the DevOps burden from AI engineering through "workload as a service". Drawing on his experience leading AI/HPC architecture at Intel and deploying supercomputers such as Aurora, Brijesh explains how access friction and idle infrastructure slow progress. He discusses Flex AI's approach to simplifying heterogeneous compute: standardizing on a consistent Kubernetes layer and abstracting inference across different accelerators. This lets teams iterate faster without wrestling with drivers, libraries, or cloud-by-cloud differences. He also covers Flex AI's strategies for raising utilization, protecting real-time workloads, and spanning the full lifecycle from fine-tuning to autoscaled inference, all while keeping complexity in check.
Announcements
- Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
- When ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models: they needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App relies on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated. Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows. But Prefect didn't stop there. They just launched FastMCP, production-ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing-fast Python execution. Deploy your AI tools once, then connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline; FastMCP handles your AI tool infrastructure. See what Prefect and FastMCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.
- Your host is Tobias Macey and today I'm interviewing Brijesh Tripathi about FlexAI, a platform offering a service-oriented abstraction for AI workloads
Interview