AI PM Crash Course: Prototyping → Observability → Evals + Prompt Engineering vs RAG vs Fine-Tuning

Description

Every PM has to build AI features these days.

And with that comes a completely new skill set:

- AI prototyping

- Observability (Akin to Telemetry)

- AI Evals: The New PRD for AI PMs

- RAG vs Fine-Tuning vs Prompt Engineering

- Working with AI Engineers

So, in today’s episode, I bring you a 2-hour crash course into becoming a better AI PM.

I’ve teamed up with Aman Khan.

Among the people creating AI PM content, Aman Khan is one of the most insightful and informed. That's because he's been an AI PM since 2019:

- He worked at Cruise on self-driving cars.

- He worked with Spotify on their AI systems.

- And now he works at Arize, one of the leading observability and evals companies.

----

Brought to you by:

Miro: The innovation workspace is your team’s new canvas

Jira Product Discovery: Plan with purpose, ship with confidence

Maven: Get $100 off Aman’s course with my code ‘AAKASHxMAVEN’

Amplitude: Test out the #1 product analytics and replay tool in the market

----

Timestamps:

Can Anyone Become an AI PM? - 0:00

5 AI PM Skills Overview - 5:52

Skill 1: AI Prototyping - 6:31

Ad: Miro - 13:35

Ad: Atlassian - 14:50

Building Trip Planner Agent - 15:27

Ad: Maven - 29:46

Ad: Amplitude - 30:40

Skill 2: Observability - 50:34

Skill 3: Evals - 1:10:10

Skill 4: RAG vs Fine-Tuning vs Prompt Engineering - 1:29:54

Bolt Teardown - 1:30:32

Skill 5: Working With Engineers - 1:43:24

Don't Make These Mistakes - 1:48:33

2 Hours Weekly Plan - 1:53:55

AI PM Jobs Exist - 1:57:45

Aman's Resources - 2:00:48

Outro - 2:04:00

----

Key Takeaways:

1. Cursor beats Bolt for serious AI PMs. While Bolt is great for quick mockups, Cursor gives you the control you need to build real agent systems and understand what's happening under the hood.

2. Observability comes before evals. Just like regular products need telemetry for analytics, AI products need traces for evals. Point Cursor at the relevant documentation and it can add the instrumentation you need (a tracing sketch follows this list).

3. Vibe coding doesn't scale. Looking at outputs and deciding if they "feel good" works for prototypes, but not production. You need systematic evals to measure what "good" actually means (an eval-loop sketch follows this list).

4. Most PMs fine-tune too early. Aman showed a prompt outperforming a fine-tuned model. Start with prompting (it gets you ~95% of the results), add RAG when you need external data, and fine-tune only for cost or speed (a retrieval sketch follows this list).

5. Your evals need evals. When your LLM judge marks outputs as "friendly" while your human labels say "robotic," that mismatch tells you exactly where to improve your system (an agreement-check sketch follows this list).

6. Use text labels, not numbers. LLMs understand "friendly vs robotic" better than 1-5 scales. They're trained on language, not mathematics (a judge-prompt sketch follows this list).

7. AI engineers want data, not docs. Stop sending Google Docs with requirements. They want you labeling datasets and defining success through evals.

8. Bolt is just a really good prompt. Aman tore down Bolt's architecture - it's system prompts + tool calling + code generation. The "magic" isn't magic (a tool-calling sketch follows this list).

9. Side projects are your interview hack. When Aman asks "What are you building?" he can immediately gauge curiosity, initiative, and hands-on experience.

10. Don't automate yourself too early. Use AI as a second brain fo…
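
----

A few short sketches to make the technical takeaways concrete. First, "traces" (takeaway 2): a minimal sketch of logging each LLM call with its input, output, and latency. The log format and the call_llm client are illustrative assumptions, not Arize's (or any vendor's) API.

```python
import json
import time
import uuid

TRACE_LOG = "traces.jsonl"  # illustrative: one JSON record per LLM call

def traced_llm_call(call_llm, prompt: str) -> str:
    """Wrap any LLM call so its input, output, and latency get logged.

    call_llm is a hypothetical stand-in for whatever client you use.
    """
    start = time.time()
    output = call_llm(prompt)
    record = {
        "trace_id": str(uuid.uuid4()),
        "prompt": prompt,
        "output": output,
        "latency_s": round(time.time() - start, 3),
    }
    with open(TRACE_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")
    return output
```

Once calls are logged like this, evals can run over the same records instead of over screenshots and vibes.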
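
Next, what "systematic evals" (takeaway 3) means at its simplest: a labeled dataset, your app, and a grading function, producing one number you can track across prompt changes. Here run_agent and judge are hypothetical placeholders for your app and your grader (a human rubric or an LLM-as-judge).

```python
def evaluate(dataset, run_agent, judge) -> float:
    """Score the app over a labeled dataset instead of eyeballing outputs.

    dataset: list of {"input": ..., "expected": ...} examples.
    run_agent / judge: hypothetical stand-ins for your app and grader.
    """
    passes = 0
    for example in dataset:
        output = run_agent(example["input"])
        passes += judge(output, example["expected"])  # True counts as 1
    return passes / len(dataset)  # a pass rate you can track over time
```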
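
For the "add RAG for external data" step (takeaway 4), the core mechanic is retrieval plus prompt stuffing. A minimal sketch, assuming some embedding function embed (a placeholder, not a specific model):

```python
import numpy as np

def retrieve(query: str, docs: list[str], embed, k: int = 3) -> list[str]:
    """Return the k docs most similar to the query by cosine similarity."""
    doc_vecs = np.array([embed(d) for d in docs])
    q = np.array(embed(query))
    scores = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

def rag_prompt(query: str, docs: list[str], embed) -> str:
    """Stuff retrieved context into the prompt; no fine-tuning involved."""
    context = "\n".join(retrieve(query, docs, embed))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```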
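
"Evals for your evals" (takeaway 5) can start as a simple agreement check between your LLM judge and your human labels. A sketch with made-up labels:

```python
def judge_agreement(llm_labels: list[str], human_labels: list[str]) -> float:
    """Fraction of examples where the LLM judge matches the human label."""
    matches = sum(l == h for l, h in zip(llm_labels, human_labels))
    return matches / len(human_labels)

# Made-up labels for illustration: the judge keeps saying "friendly"
# where humans said "robotic", so the judge prompt needs work.
llm = ["friendly", "friendly", "robotic", "friendly"]
human = ["friendly", "robotic", "robotic", "robotic"]
print(judge_agreement(llm, human))  # 0.5
```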
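
The text-labels point (takeaway 6) shows up directly in how a judge prompt is written. The wording below is an illustrative assumption, not a prescribed template:

```python
# An LLM-as-judge prompt using categorical text labels instead of a
# 1-5 numeric scale (the phrasing is illustrative).
JUDGE_PROMPT = """You are grading a customer-support reply.

Reply:
{reply}

Classify the tone as exactly one of: "friendly" or "robotic".
Answer with the label only."""

def make_judge_prompt(reply: str) -> str:
    return JUDGE_PROMPT.format(reply=reply)
```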
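
Finally, the Bolt teardown's "system prompts + tool calling + code generation" pattern (takeaway 8) reduces to a skeleton like this. call_llm, the tool set, and the JSON protocol are all assumptions for illustration; real products layer planning, retries, and sandboxing on top.

```python
import json

SYSTEM_PROMPT = "You are a coding assistant. Reply with one tool call as JSON."

def write_file(path: str, content: str) -> None:
    with open(path, "w") as f:
        f.write(content)

TOOLS = {"write_file": write_file}  # illustrative single tool

def agent_step(call_llm, user_message: str):
    """One turn: the model picks a tool, we execute it.

    call_llm is a hypothetical client; the JSON shape, e.g.
    {"tool": "write_file", "args": {...}}, is assumed for this sketch.
    """
    response = call_llm(SYSTEM_PROMPT + "\n\n" + user_message)
    call = json.loads(response)
    return TOOLS[call["tool"]](**call["args"])
```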
