Your interview is coming. Here's the fastest path.
The complete AI PM prep guide — from first principles to high-probability interview questions. Walk in sharp, not just prepared.
→ opens The AI PM Role
Pick your prep mode
Interview tomorrow
30-min crash course. Hit the essentials and go.
1 The AI PM Role
2 Crafting Your Narrative
3 Mock Q&A Bank
~30–45 min
1 week out
Full structured path. Everything in order, nothing skipped.
1 AI PM Role + Narrative
2 Agile & ADO
3 AI / ML + Agents
4 Managing AI + Metrics
5 Mock Q&A Bank
~4–5 hours total
Just brushing up
Spot-check your vocabulary. Sharpen specific answers.
1 Glossary
2 Mock Q&A Bank
3 Questions to Ask
~20–30 min
Suggested learning path
AI PM Role
Start here
Agile & ADO
Frameworks
AI / ML & Agents
AI Knowledge
Managing AI Products
Practice
Mock Q&A Bank
Finish here
Full path: ~4–5 hours Crash path: ~30–45 min Sections marked deep = longer deep dives
How to use this guide
Say answers out loud
Reading Q&As silently isn't preparation. Speak each answer until it flows naturally — that's the moment you're ready.
Do the Glossary before the interview
Read every term once. Interviewers notice candidates who use precise vocabulary — it signals depth, not just prep.
Personalise your narrative
Use "Crafting Your Narrative" to put your own story into every model answer. Generic answers are forgettable. Specific ones land.
Ask AI to explain any term
Highlight any word on the page and click Ask AI. The coach on the right explains it in PM context — no tab-switching needed.
What this guide covers
Agile & Scrum deep
4 values, 3 roles, 5 ceremonies, DoD, velocity, sprint planning — with interview angles at every step
ADO & User Stories
Epic → Feature → Story → Task/Bug hierarchy. INVEST criteria. Given/When/Then AC with all 4 scenario types
AI / ML / LLM deep
AI vs ML vs GenAI, transformers, RAG, embeddings, fine-tuning, prompt engineering — explained for PMs
Agents & MCP
What AI agents are, multi-agent orchestration, Model Context Protocol — the frontier of AI product design
Managing AI Products
UAT for AI features, monitoring drift, human-in-the-loop design, confidence thresholds, model specs in stories
40+ Mock Q&As
High-probability interview questions with model answers — covering strategy, execution, AI craft, and stakeholder management
📍 Suggested learning pathStep 1 of 5 · ~10 min read
Getting Started
The AI PM Role
What separates an AI Product Manager from a traditional PM — and what interviewers are really evaluating.
PM vs AI PM — the key differences
Traditional PM
Translates user needs into features. Manages backlog, roadmap, and stakeholders. Measures success with engagement, adoption, and revenue metrics. Works primarily with engineering and design.
AI Product Manager
All of the above — plus: defines model quality requirements, understands data pipelines, writes AI-specific acceptance criteria, manages confidence thresholds and human-in-the-loop decisions, monitors for model drift, and balances automation with human oversight.
What interviewers are evaluating
1. Can you think in outcomes, not features?
AI PMs who succeed define the problem precisely before reaching for AI as the solution. Interviewers test this by asking "why" repeatedly — why this feature, why this model, why this metric.
2. Do you understand AI's limitations?
Confident AI PMs acknowledge uncertainty. They know AI models fail, drift, and hallucinate. They design products with these limitations in mind — not in spite of them.
3. Can you bridge technical and business?
The core AI PM superpower: translating "our model has 87% precision" into "3 in 10 customers will get a wrong recommendation" — and then deciding what to do about it.
4. Do you think about humans in the loop?
The best AI PMs never fully remove humans from consequential decisions. They design escalation paths, review queues, and feedback mechanisms as first-class product features.
5. Are you comfortable with ambiguity?
AI development is non-deterministic. Requirements change when the model behaves unexpectedly. Interviewers want to see how you adapt and make decisions without complete information.
The AI PM competency stack
Foundation (must have): Product thinking, stakeholder management, Agile delivery, user story writing, data literacy
AI layer (differentiator): Understanding of ML lifecycle, prompt engineering, AI metrics (precision/recall/F1), model evaluation, responsible AI principles
Advanced (stand-out): Agentic AI design, multi-model orchestration, RAG architecture, AI governance frameworks, MLOps awareness
Step 1 of 5 · AI PM Role
Getting Started
Crafting Your Narrative
How to build a compelling "About Me" for an AI PM interview — structured, confident, and specific to your background.
The 3-part structure
Part 1 — Foundation
Where did you start? What core skill did you build? Frame your early career as the foundation for what you do now. Example framing: "I spent [X years] in [domain], where I learned how to [core skill]. That shaped how I approach product problems today."
Part 2 — AI progression
How did you get into AI product work? What have you built or shipped? Anchor with 2–3 specific, quantified achievements. Frame your AI experience as intentional progression, not accidental. Use numbers wherever possible.
Part 3 — Why this role
What specifically draws you to this company and role? Reference something real — a product they've built, a strategy they've announced, a problem in the industry you care about. Generic enthusiasm is forgettable. Specific insight is memorable.
Tips for delivery
Keep it to 2.5–3 minutes
Leave space for follow-up questions. If they want more they'll ask. If you fill all available time, you've lost the dialogue.
Pause between parts
A brief pause after each section signals confidence and lets the interviewer absorb before you continue.
Lead with impact, not chronology
Don't just describe a timeline. Lead with what you achieved and why it matters — then give context for how you got there.
End with energy toward them
Your closing should point forward — toward this company, this product, this problem. Not backward at your resume.
Common mistake: Spending too long on early career and rushing through recent AI work. Interviewers care most about what you've done in the last 2–3 years. Front-load your AI-relevant experience.
📍 Suggested learning pathStep 2 of 5 · ~20 min read
Core Framework
Agile & Scrum
The operational backbone of AI product delivery. Know this inside out — interviewers test both theory and practical application.
Core distinction: Agile is the philosophy. Scrum is the framework. ADO/Jira is the tool. Don't conflate them.
The 4 Agile Values (Manifesto, 2001)
1. Individuals and interactions — over processes and tools
People and communication beat rigid workflows. A quick conversation resolves what a 3-page document cannot. A great PM builds relationships that make this possible.
2. Working software — over comprehensive documentation
Shipping something usable beats exhaustive specs. Documentation serves delivery — it doesn't replace it. Write docs that help people build, not docs that prove you thought about building.
3. Customer collaboration — over contract negotiation
Ongoing engagement with users beats fixed-scope contracts. Requirements evolve — and that's expected. The best PMs stay in continuous discovery even after a product ships.
4. Responding to change — over following a plan
A plan is a hypothesis. Reality will differ. The team that adapts fastest wins. Agile doesn't mean no planning — it means building the ability to change the plan when needed.
Agile vs Waterfall
Waterfall
Linear: Requirements → Design → Build → Test → Deploy. Phases are sequential. No working software until the end. Change is expensive — every phase depends on the prior one. Works for construction; rarely works for software.
Agile
Iterative and incremental. Plan, build, test, release in short cycles. Each cycle = working software. Change is expected. Feedback is continuous. Essential for AI products — model accuracy, edge cases, and data quality can't be fully specced upfront.
Product Owner
Owns the product backlog. Defines and prioritises what gets built. Represents the customer and business. Accountable for maximising product value.
As an AI PM, you typically fill this role.
Scrum Master
Facilitates Scrum ceremonies. Removes team blockers. Coaches the team on Agile practices. Not a project manager — serves the team, doesn't direct it.
Development Team
Cross-functional, self-organising group of 3–9 people (developers, QA, data scientists, designers). Commits to sprint goals and owns the "how."
Interview framing: "As PM I act as Product Owner in Scrum — I prioritise the backlog, define acceptance criteria, attend all ceremonies, and am the single accountability point for product decisions in the sprint."
Product Backlog
The ordered list of everything the product might need — features, fixes, improvements, tech debt, research spikes. The PO owns and prioritises it. Never "complete" — evolves as the product and market evolve. Top items are refined and ready; bottom items are rough ideas.
Sprint Backlog
The subset of product backlog items selected for the current sprint, plus the team's plan to deliver them. The team owns this. The PO should not change it mid-sprint — doing so is a Scrum anti-pattern that destroys team trust.
Increment
The sum of all completed work in a sprint — plus all previous increments. Must meet the team's Definition of Done. Should be potentially shippable every sprint, even if the PO decides not to release it publicly yet.
Critical distinction: Acceptance Criteria (AC) is story-specific — it defines "done" for one story. Definition of Done (DoD) is team-wide — it applies to every story and includes code review, testing, documentation, deployment to staging, etc.
Sprint Planning
When: Start of sprint · ~2 hrs per sprint week
Part 1 — What: PO presents refined backlog items. Team asks clarifying questions. Team selects what to commit to. Part 2 — How: Team breaks stories into tasks, estimates effort, flags dependencies. Output: Sprint Goal + committed Sprint Backlog.
PM responsibility: Arrive with refined, AC-complete stories. Never bring an unrefined story to planning — it derails the entire ceremony.
Daily Standup
When: Every day · 15-minute timebox
Three questions per team member: 1. What did I complete yesterday? 2. What will I work on today? 3. What is blocking me?
Not a status report to management. The team synchronising with itself. Blockers get resolved offline — not solved in the standup.
PM responsibility: Listen for blockers needing your action — missing requirements, pending decisions, cross-team dependencies.
Sprint Review
When: End of sprint · ~1 hr per sprint week
Team demos completed work to stakeholders. Working software only — no slides about what "will be" built. Stakeholders give feedback. PO updates backlog based on what they learn.
PM responsibility: Facilitate the demo, articulate business value of what was delivered, gather structured feedback, translate feedback into backlog updates.
Sprint Retrospective
When: After Sprint Review · ~45 min per sprint week
Team inspects its own process — not the product. Three questions: 1. What went well? 2. What didn't go well? 3. What will we improve next sprint?
Output: 1–3 concrete, actionable improvements. PM responsibility: Participate fully. Own retro actions that involve requirements quality, documentation, or stakeholder communication.
Backlog Refinement / Grooming
When: Mid-sprint · 1–2 hrs/week (not an official Scrum event but universally practiced)
PO and team review upcoming backlog items — clarify requirements, split large stories, estimate effort, identify dependencies. Goal: the top of the backlog is always sprint-ready.
PM responsibility: Own this session. Come prepared with written stories, AC drafted, mockups or data samples ready. This is where your PM craft actually happens.
Definition of Done (DoD)
A shared, team-wide checklist that defines when a story is truly complete. Typical DoD items: code written and reviewed, unit tests passing, integration tested, documentation updated, acceptance criteria verified, deployed to staging environment, PO sign-off received.
For AI features, add to the DoD: model evaluated on held-out test data, confidence threshold validated, human review queue tested, monitoring/alerting instrumented, data drift baseline recorded.
Story Points & Velocity
Story Points
A relative measure of effort, complexity, and uncertainty. Uses Fibonacci sequence: 1, 2, 3, 5, 8, 13, 21. A 5-point story is roughly twice as complex as a 3-pointer. The team calibrates together — a "1" is the simplest possible story for that specific team.
Velocity
Average story points completed per sprint across the last 3–5 sprints. Used for forecasting only — not a performance metric. If velocity is 40 points and the refined backlog has 200 points, you have ~5 sprints (~10 weeks) of work ahead.
Interview framing: "I don't treat velocity as a measure of team performance. I use it as a forecasting tool to set realistic stakeholder expectations — and as an early signal when something external is blocking the team."
Common question
Describe how you've applied Agile in your work.
Anchor on the PM/PO role and concrete practices. Mention backlog grooming, sprint planning, demos with stakeholders, and retrospective-driven improvement. If you have an AI context, highlight how Agile was essential for iterating on model quality — requirements that can only be fully validated once the model runs on real data.
Common question
What do you do when a story isn't ready for sprint planning?
"I pull it out — full stop. An unrefined story in a sprint leads to mid-sprint clarification storms, scope creep, and missed commitments. My rule: no story enters a sprint without clear acceptance criteria, defined scope, and no unresolved dependencies. That's what refinement sessions are for. If something urgent comes up at the last minute, I either time-box a spike to investigate it, or defer it to the next sprint."
Common question
How do you balance speed with documentation in Agile?
"Documentation serves a reader. I ask: who reads this, when, and what decision does it support? That determines the appropriate level of detail. For a fast-moving PoC, a one-page problem statement and success criteria is enough. For a production AI feature in a regulated industry, I write a full BRD with model specs, data governance notes, and audit trail requirements. Agile doesn't mean less documentation — it means right-sized documentation delivered at the right moment."
Step 2 of 5 · Frameworks
Core Framework
ADO & User Stories
The work item hierarchy that turns strategy into shipped software. Know it cold — and be able to write great stories in the interview itself.
Work item hierarchy
Epic
Large business initiative spanning multiple sprints or quarters. Tied to a strategic objective. Example: "AI-powered customer self-service — reduce support contact volume by 25%"
Feature
Distinct, deliverable capability within an Epic. Can span 2–4 sprints. Has its own business value. Example: "Intent detection — classify customer queries into 12 categories with ≥90% accuracy"
User Story
Small unit of user value. Fits in one sprint. Written from user's perspective. Has Acceptance Criteria. Example: "As a customer, I want my billing query routed to the right team automatically so I don't have to repeat myself."
Task
Specific piece of work within a story. Owned by an individual. Tracked in hours. Examples: design intent detection prompt, build routing API, write unit tests, UAT with 5 users.
Bug
Deviation from expected behavior. Must include: steps to reproduce, expected result, actual result, severity, environment. Example: "Query classified as billing when it is a technical issue — Severity: High — Reproducible 4/5 attempts."
Writing user stories — the INVEST test
I — Independent
Can be delivered without waiting for another unfinished story
N — Negotiable
Details evolve through conversation — not a fixed contract
V — Valuable
Delivers standalone value to the user or business
E — Estimable
Team can size it — if not, needs more refinement
S — Small
Fits within one sprint — if not, split it
T — Testable
Acceptance Criteria can be written to verify it
Acceptance criteria — Given / When / Then
Write 4 scenario types — not just 1
Most PMs only write the happy path. Writing all four scenarios is what separates a junior PM from a senior one.
// Type 1 — Happy path (expected behavior)Givena document is uploaded in a supported formatWhenthe AI extraction model processes itThenall required fields are extracted with ≥90% confidence within 3 seconds// Type 2 — Edge case (unusual but valid input)Giventhe model extracts a field with confidence below the thresholdWhenthe result is generatedThenthe field is flagged for human review — NOT auto-populated// Type 3 — Failure case (graceful degradation)Giventhe uploaded file is an unsupported formatWhenextraction is attemptedThenthe system returns a clear error message and does not process the file// Type 4 — Non-functional (performance, security, privacy)Perfpipeline handles 100 concurrent uploads without response time degradationDatano document content is stored beyond the processing window per data governance policy
Story splitting techniques
By workflow step
Split "upload, process, download" into three separate stories — one per step
By data type
Split "extract from documents" into Story 1: PDFs, Story 2: scanned images, Story 3: Word docs
Happy path first
Ship the happy path in Sprint 1. Handle edge cases and errors in Sprint 2.
By user role
Split by persona when different users have different experiences of the same feature
Core Framework
Prioritisation
How to make defensible decisions about what to build next — using frameworks, not instinct.
RICE scoring
Reach × Impact × Confidence ÷ Effort
Reach: How many users affected per sprint/quarter? Impact: How much does this move the needle? (1=minimal, 2=low, 3=medium, 4=high, 5=massive) Confidence: How confident are you in your estimates? (0–100%) Effort: How many person-months to build?
Higher RICE score = higher priority. Compare items on the same scale — absolute numbers matter less than relative ranking.
Kano Model
Basic needs (Must-haves)
Features users expect as table stakes. Absence causes dissatisfaction. Presence doesn't delight — it just avoids upset. Prioritise these first, always.
Performance needs
Features where more is better — linearly. More accuracy, faster speed, lower cost. Score these with RICE to rank relative value.
Delighters
Unexpected features that create excitement. Users didn't know they wanted them. Time-box these to innovation sprints — don't over-invest before validating.
Interview tip: "I use Kano to categorise first, then RICE to rank within categories. Basic needs get prioritised regardless of RICE score — the business can't function without them."
MoSCoW — for release scoping
Must have
Non-negotiable for the release. Without these, the release fails or doesn't ship.
Should have
Important but not critical for this release. Can be deferred to the next sprint with minimal impact.
Could have
Nice to have if time allows. Won't cause significant issues if dropped.
Won't have (this time)
Explicitly out of scope for this release. Not forever — just now. Documenting "Won't have" is as important as documenting "Must have."
OKRs — outcome-driven prioritisation
Objective: Qualitative, inspirational goal — "Become the most trusted AI assistant for enterprise finance teams" Key Results: 3–5 measurable outcomes — "Reduce time-to-answer by 40%", "Achieve NPS > 50 among power users", "95% of responses pass accuracy audit"
PM discipline: Before adding anything to the sprint backlog, ask: "Which Key Result does this move?" If the answer is none, the feature needs a stronger case to exist.
Common trap: Treating delivery OKRs ("ship 3 features") as outcome OKRs ("reduce user effort by 30%"). Delivery is a means, not the goal.
Core Frameworks
Estimation Methods
How Agile teams estimate work — from sprint-level story points to roadmap-level T-shirt sizing. Interviewers test both your understanding of the techniques and your judgement about when to use each.
Why estimation matters in Agile
Estimation in Agile is not about predicting the future with precision — it is about creating a shared understanding of effort and complexity. Good estimates enable sprint planning, capacity management, roadmap forecasting, and trade-off conversations with stakeholders. The goal is relative accuracy, not absolute precision.
Key rule: Never estimate a story that isn't refined. If the team can't estimate it, that is a signal the story needs more refinement — not more pressure to guess.
Story Points & the Fibonacci series
Why Fibonacci and not 1, 2, 3, 4, 5...?
The Fibonacci sequence (1, 2, 3, 5, 8, 13, 21...) grows non-linearly — gaps between numbers get bigger as they increase. This reflects how estimation uncertainty works: the difference between a 1-point and 2-point story is meaningful, but the difference between a 20 and 21-point story is not. Large stories are inherently uncertain — Fibonacci forces the team to acknowledge that with wider gaps at the top.
1 — 2
Trivial / Very small Well-understood, minimal complexity, few unknowns.
3 — 5
Small / Medium Clear requirements, some complexity. Standard feature work.
8 — 13
Large / Very large Significant complexity. 13+ = must split before entering a sprint.
What story points actually measure
Effort — how much work is required? Complexity — how difficult is the problem to solve? Uncertainty — how much don't we know yet?
Story points are relative and team-specific. You cannot compare points across different teams.
Common mistake: Converting story points to hours. Points include complexity and uncertainty which don't convert to time linearly. Use velocity for sprint capacity forecasting only — never as a performance target.
Planning Poker — the estimation ceremony
Step 1 — PO reads the story
Product Owner reads the story aloud, answers clarifying questions. Only proceed when questions are resolved.
Step 2 — Everyone estimates privately
Each team member selects a Fibonacci card silently. No one reveals yet — prevents anchoring bias where the first speaker pulls everyone else's estimate.
Step 3 — Simultaneous reveal
All cards revealed at once. Close estimates (3, 3, 5) → take consensus and move on. Spend no more than 5 minutes per story.
Step 4 — Discuss divergence
Wide divergence (2 and 13) means different assumptions. The lowest and highest estimators explain their reasoning — this surfaces hidden dependencies, technical risks, or missing requirements.
The "?" card
Means "I don't have enough information to estimate." The story goes back to refinement. Never force an estimate on a story that isn't understood.
PM insight: Wide divergence in Planning Poker is not a problem — it is a feature. It surfaces hidden assumptions. A story that gets a 2 and a 13 simultaneously needs more refinement, not more voting.
T-shirt sizing — for Features and Epics
T-shirt sizing is a high-level estimation technique for Features, Epics, and roadmap items — before they are broken into User Stories. It trades precision for speed. Instead of debating 13 vs 21 points, you simply ask: "Is this a small, medium, or large piece of work?" Set a reference Medium first — everything else is sized relative to it.
XS — Extra Small
1–2 days. Tiny scope. Can be done as one story in a single sprint.
S — Small
3–5 days. Clear scope, low complexity. One sprint, 2–3 stories.
M — Medium (the anchor)
1–2 weeks. The reference size — everything is sized relative to this.
L — Large
2–4 weeks. Multiple sprints. Needs breakdown before entering a sprint.
XL — Extra Large
1–2 months. Complex, multi-team. Needs a spike before estimation.
XXL — Epic-scale
Quarter or more. This is an Epic — must be decomposed into Features first.
How they connect: A Feature sized "M" in T-shirt sizing typically breaks into 3–5 User Stories totalling 15–25 story points. T-shirt sizing for roadmap planning; story points for sprint planning.
Other estimation techniques
Three-point estimation (for high-risk tasks)
Optimistic (O) + 4 × Most Likely (M) + Pessimistic (P) ÷ 6 Used for tasks with significant uncertainty — integrations, new technology, poorly understood domains. Gives a weighted average that accounts for worst-case scenarios.
Affinity / Bucket sizing (for large backlogs)
Write each story on a card. Team silently sorts into Fibonacci buckets. Discuss only items people placed differently. Can estimate 50+ stories in under an hour — great for early-stage backlog sizing.
Interview Q&A
Common question
How do you run a story point estimation session?
"I use Planning Poker. Stories must be refined with clear acceptance criteria before we estimate — we never estimate vague stories. During the session, I read the story, take questions, then everyone reveals their Fibonacci card simultaneously. Close estimates (3, 3, 5) → consensus and move on. Wide divergence (2 and 13) → the outliers explain their reasoning. That conversation almost always surfaces a hidden dependency or missing requirement. The '?' card means the story goes back to refinement. Five minutes maximum per story."
Common question
What is the difference between story points and T-shirt sizing?
"T-shirt sizing is for high-level, early-stage estimation of Features and Epics on the roadmap — fast and directional. Story points are for sprint-ready User Stories during Planning Poker — relative, Fibonacci-based, team-specific. They operate at different levels. I use T-shirt sizes to plan the quarter and story points to manage sprint capacity. A feature sized 'Medium' in T-shirt terms typically breaks down into 15–25 story points once refined into User Stories."
Core Frameworks
Product & Team KPIs
Two categories: product KPIs tell you whether you're building the right thing — team KPIs tell you whether you're building it well. Know both, and know the difference between them.
Interview framing: "I track two categories: product KPIs that measure whether we're delivering value to users, and delivery KPIs that measure whether our team process is healthy. Product KPIs are the goal. Delivery KPIs are the engine."
Product KPIs — are we building the right thing?
Adoption Rate
Formula: Active users ÷ eligible users % of eligible users actively using the feature. High accuracy but low adoption = UX or trust problem, not a model problem. Critical for AI features where users often resist AI assistance even when it's accurate.
Retention Rate
Formula: Users retained ÷ users at start of period % of users who continue using the product over time. Rising retention = ongoing value delivered. Dropping retention = solved once but not sticky.
Net Promoter Score (NPS)
Formula: % Promoters − % Detractors (scale: −100 to +100) Would users recommend the product? A leading indicator of churn and growth. Scores above 50 are excellent for enterprise software. Detractors (0–6) are most important to understand.
Customer Satisfaction Score (CSAT)
Formula: Satisfied responses ÷ total responses (1–5 stars) How well the product met expectations in a specific interaction. More granular and moment-specific than NPS. Used heavily in CX and contact centre products.
Churn Rate
Formula: Users lost ÷ total users per period % who stopped using the product. A lagging indicator — rising churn means something went wrong weeks or months earlier. Monitor NPS and session frequency as leading indicators to catch churn before it appears.
Time to Value (TTV)
Formula: Time from onboarding → first meaningful outcome How long until a new user or client experiences core product value? Shorter TTV = lower churn risk. For enterprise AI products, TTV includes data ingestion, model calibration, and user training.
Revenue Metrics
ARR / MRR: Annual / Monthly Recurring Revenue — the business engine and growth tracker ARPU: Average Revenue Per User — how much each user contributes LTV: Lifetime Value — total revenue expected over the customer relationship. Compare to CAC (Customer Acquisition Cost) to assess unit economics.
Team / Delivery KPIs — are we building it well?
Velocity
Formula: Average story points completed per sprint (last 3–5 sprints) How much work the team can reliably complete per sprint. Use for forecasting only — never as a performance target. Pressuring a team to increase velocity leads to inflated estimates, not more output.
Sprint Burndown
Remaining work plotted daily vs. an ideal completion line. Actual line above ideal = behind schedule. Flat lines = blocked — needs immediate attention. A consistently flat burndown mid-sprint is a blocker signal, not a team performance issue.
Sprint Goal Completion Rate
Formula: Sprints where sprint goal was met ÷ total sprints More meaningful than velocity. A team that meets its sprint goal with 80% of planned points is healthier than one hitting 100% of points but missing the goal every sprint. The goal is the point — points are the means.
Cycle Time
Formula: Time from work started → work done (per story) How long a story flows through the system once work begins. Long cycle times signal blockers, context switching, or stories that are too large. Compare cycle time to story point size — large cycle time on small stories = hidden friction.
Lead Time
Formula: Time from requirement created → work done Total end-to-end responsiveness including backlog wait time, refinement, planning, and development. Lead time > cycle time. The gap between them is backlog wait time — often the biggest opportunity to reduce.
Defect Rate / Bug Escape Rate
Formula: Bugs found in production ÷ stories shipped A rising defect rate after a velocity increase = team is moving fast but cutting quality corners. For AI products, also track model error rate — % of AI outputs that are incorrect or flagged by human reviewers.
Interview Q&A
Common — any interviewer
How do you measure whether your product is successful?
"I separate product KPIs from delivery KPIs. Product KPIs — adoption, retention, CSAT, NPS — tell me whether we're building something users value. Delivery KPIs — velocity, sprint goal completion, cycle time, defect rate — tell me whether the team is working effectively. I define product KPIs at requirements stage, before we build, so everyone agrees on what success looks like. Post-launch I track them weekly and set threshold alerts so I'm not discovering problems in a quarterly review."
Common
What is the difference between velocity and sprint goal completion rate?
"Velocity measures output — story points completed per sprint. Sprint goal completion rate measures outcome — did the team achieve what it set out to achieve? A team can hit 100% of planned points but miss the sprint goal if they over-indexed on low-priority stories. I care more about sprint goal completion because it measures whether we're moving in the right direction, not just moving fast. Velocity is a forecasting tool; sprint goal completion is a health indicator."
Core Frameworks
Documentation Types
Every document serves a specific audience and decision. Before writing any document, ask: "Who reads this, when, and what decision does it enable?" That determines the right format and level of detail.
Key principle: Agile does not mean no documentation. It means right-sized documentation delivered to the right person at the right time. Write documents that people actually use — not documents that prove you thought about building.
Strategic & discovery documents
BRD — Business Requirements Document
Owner: BA / PM · Audience: Business stakeholders, project sponsors Purpose: Captures the business need and context — why the project exists, what problem it solves, who the stakeholders are. More business-facing than a PRD. Written before product design begins.
Contains: Business objectives · Stakeholder analysis · Current state pain with evidence · Desired future state · Business constraints · ROI / benefit case · High-level scope · Success criteria from a business perspective
PRD — Product Requirements Document
Owner: PM · Audience: Product, engineering, design, data science Purpose: The PM's primary artifact. Defines what to build and why — the user problem, proposed solution, success metrics, constraints. Does not specify how to build it — that belongs to engineering.
Contains: Problem statement · User personas · Goals & non-goals · Feature requirements · Success metrics · Dependencies · Open questions · Out of scope · For AI features: Model requirements section with confidence thresholds, input/output schema, fallback design, monitoring requirements
Blueprint — Solution Blueprint
Owner: PM / Solution Architect · Audience: Cross-functional team, technical leadership Purpose: Bridges business requirements and technical design. Shows how the solution will be structured — key components, data flows, integrations, architecture — without being a full technical spec. Used in enterprise AI engagements to align all parties before development begins.
Owner: PM / PO · Audience: Development team, QA Purpose: The atomic unit of delivery in Agile. "As a [persona], I want [action], so that [outcome]." Must fit in one sprint with testable AC in Given/When/Then format covering all four scenario types.
Contains: Persona · User need · Business outcome · AC: happy path, edge case, failure case, non-functional · Story points · Dependencies · DoD reference · For AI: model quality AC and monitoring AC
FSD — Functional Specification Document
Owner: BA / PM · Audience: Engineering, QA, UAT team Purpose: Describes how the system should behave from a functional perspective without specifying technical implementation. Sits between the BRD and the SDD. Used in regulated industries and large enterprise projects requiring a full behavioural specification before development.
Contains: Functional requirements per use case · System behaviour · UI/UX requirements · Data inputs & outputs · Error handling · Validation rules · Business rules · Reporting requirements
SDD — System Design Document
Owner: Engineering / Architect · Audience: Development team, DevOps, security Purpose: The engineering team's artifact — describes how the system will be built technically. PMs review but do not author this. Your role is to verify it accurately reflects the functional requirements you specified.
Owner: BA / Operations / PM · Audience: Operations, change management, training Purpose: Documents the business process — current state (as-is) and future state (to-be). Identifies where automation or AI can intervene. Critical for enterprise AI products where workflows must be mapped before automation can be designed.
Contains: Current-state process map · Pain points & bottlenecks · Future-state process map · Role & responsibility changes · AI automation opportunities · Exception handling · Change impact assessment
AI-specific documents
Model Spec — AI Model Specification
Owner: PM + Data Scientist (co-authored) · Audience: Data science, engineering, QA Purpose: Defines the AI model's requirements — inputs, outputs, quality thresholds, evaluation criteria, fallback behaviour. PM writes the business requirements; data scientist fills in the technical design. Must be created before model development begins.
Owner: Operations / PM · Audience: Operations team, human reviewers, quality managers Purpose: Step-by-step instructions for the human review process in AI products — what reviewers do when the model routes a case to the review queue, how they make decisions, and how they record corrections. Without an SOP, the human-in-the-loop design breaks down in practice.
Contains: Step-by-step review process · Decision criteria for AI outputs · Escalation paths · How to record corrections (feeds retraining) · Quality checklist · SLA for review turnaround
The document hierarchy — how they connect
Strategy: BRD → defines the business problem and scope Product: PRD + Blueprint → defines what to build and how it fits together Process: PDD → maps the workflows the product will change Delivery: FSD → functional behaviour · User Stories → sprint delivery units Technical: SDD → how engineering builds it (PM reviews, doesn't author) AI-specific: Model Spec → AI quality requirements (PM co-authors) Operations: SOP → how humans operate and govern the AI system in production
As an AI PM you will: Author BRD, PRD, User Stories, PDD. Co-author Model Spec. Review FSD and SDD. Ensure SOP exists before any AI feature goes to production — without it, the human review queue has no process and the HITL loop breaks down silently.
Interview Q&A
Very common — BA-background interviewers
What is the difference between a BRD and a PRD?
"A BRD is business-facing — it captures the business problem, stakeholders, constraints, and the case for why the project exists. It answers 'why are we doing this?' A PRD is product-facing — it captures the proposed solution, user personas, feature requirements, and success metrics. It answers 'what are we building and for whom?' The BRD comes first and informs the PRD. For AI products I extend the PRD with a Model Spec section covering confidence thresholds, input schema, fallback design, and monitoring requirements — because these are product decisions that must be defined before model development begins."
Common
Walk me through the documentation you produce on an AI feature.
"My documentation follows the delivery lifecycle. In discovery I produce a BRD — problem statement, stakeholder analysis, current state pain, desired outcome. In solution design I write the PRD with a Model Spec section for AI requirements, and a PDD to map current and future-state workflows. In delivery I write User Stories with Given/When/Then AC across all four scenario types. Before production I ensure an SOP exists for the human review queue — without it, reviewers have no process and the HITL loop breaks down. I review the SDD but don't author it — I verify it accurately reflects what I specified in the functional requirements."
📍 Suggested learning pathStep 3 of 5 · ~25 min read
AI Knowledge
AI / ML / LLM Fundamentals
What every AI PM must understand — explained for product thinkers, not data scientists.
Artificial Intelligence (AI)
The broad field of building systems that simulate human intelligence — reasoning, learning, perception, and decision-making. Everything below is a subset of AI.
Machine Learning (ML)
Systems that learn patterns from data rather than following hand-coded rules. Supervised learning: learns from labeled data (classification, regression) Unsupervised learning: finds hidden structure without labels (clustering, anomaly detection) Reinforcement learning: learns by maximising reward (pricing, game AI, routing optimisation)
Deep Learning
A subset of ML using neural networks with many layers. Powers modern AI — image recognition, speech, natural language understanding. Requires large datasets and compute.
Generative AI (GenAI)
AI that generates new content — text, images, code, audio, video — rather than just classifying or predicting. Powered by foundation models trained on enormous datasets. Examples: GPT-4, Claude, Gemini, Stable Diffusion.
NLP — Natural Language Processing
AI that understands and generates human language. Underpins chatbots, document extraction, sentiment analysis, intent classification, translation — core capabilities of most enterprise AI products.
Artificial Intelligence (AI)
The broadest field — building systems that simulate intelligent behaviour: reasoning, perception, decision-making, planning. Everything below is a subset.
Machine Learning (ML)
Systems that learn patterns from data rather than following hand-coded rules. Three main paradigms: supervised, unsupervised, reinforcement learning.
Deep Learning (DL)
Multi-layer neural networks that learn representations automatically. Powers image recognition, speech, and natural language. Requires large datasets and significant compute.
DL ∩ NLP
LLM
Large Language Models
Conv. AI
Conversational AI
Natural Language Processing (NLP)
AI for understanding and generating human language. Underpins chatbots, translation, document extraction, sentiment analysis — core to most enterprise AI products.
Also under AI →Expert SystemsComputer VisionRobotics & PlanningKnowledge Representation
AI ⊃ ML ⊃ Deep Learning · NLP overlaps ML · LLMs and Conversational AI live at the intersection of Deep Learning and NLP
What is an LLM?
A Large Language Model is trained on vast text using transformer architecture. It learns to predict the next token in a sequence. Doing this well, at massive scale, produces models capable of writing, reasoning, coding, and more.
Key concepts for PMs
Tokenisation: Text is split into tokens (roughly words/subwords) before processing Context window: How much text the model can "see" at once — larger = better for long documents Temperature: Controls randomness — 0 = deterministic, 1+ = creative/unpredictable Fine-tuning: Adapting a pre-trained model for a specific domain with labeled examples Embeddings: Numerical vector representations of text capturing semantic meaning — powers search and RAG
The attention mechanism (simplified)
The core innovation in transformers. The model learns which parts of the input to "attend to" when generating each output token. This is why LLMs understand context across long documents — they're not just looking at the words immediately before; they're weighing the entire context.
How to explain LLMs in interviews: "An LLM learns statistical patterns across billions of text examples. When you prompt it, you're activating those learned patterns to generate a contextually relevant response. The quality depends on training data quality, model size, and how well you've framed the prompt."
How an LLM processes your prompt: tokenisation converts text to IDs, embeddings map them to vector space, attention layers compute relationships, then output tokens are decoded back to text.
RAG — Retrieval-Augmented Generation
A pattern that combines an LLM with real-time knowledge retrieval. Instead of relying on the model's training data alone, RAG retrieves relevant documents from a vector database and feeds them as context to the LLM at query time.
Why it matters for PMs: RAG is how enterprise AI products stay accurate and current. It's the architecture behind intelligent search, knowledge assistants, and real-time agent guidance — key capabilities in most AI PM roles.
Prompt Engineering
Designing inputs to LLMs to reliably produce desired outputs. Key techniques: Zero-shot: Just the instruction, no examples Few-shot: Instruction + 2–5 labeled examples Chain-of-thought: Ask the model to reason step-by-step before answering System prompt: Instructions that define the model's role, constraints, and persona
PM responsibility: Define the prompt architecture and evaluation criteria. You don't write every prompt — but you own what "good" looks like.
Model Drift
When a model's performance degrades over time because real-world data patterns shift away from the training distribution. PMs must instrument monitoring from Day 1 — define drift alerts as part of the production Definition of Done, not as a post-launch afterthought.
Hallucination
When an LLM confidently produces factually incorrect or fabricated information. A fundamental limitation of current LLMs. PMs must design products with hallucination in mind — through RAG (grounding in real data), human review gates for high-stakes outputs, and user-facing confidence indicators.
RAG augments the LLM with retrieved context: the query is embedded, similar docs are fetched from a vector DB, and both are passed to the LLM — giving grounded, up-to-date answers.
1. Problem Definition
Is ML the right solution? Define the prediction task precisely. What are you predicting? What's the input data? What's the output? What does "correct" mean?
2. Data Collection & Labeling
Gather training data. Label it (for supervised learning). Data quality here determines model quality ceiling — garbage in, garbage out. PMs often under-invest here and over-invest in model selection.
3. Model Training & Evaluation
Train model on labeled data. Evaluate on held-out test set. Measure precision, recall, F1. Iterate on features, architecture, hyperparameters. PMs define the evaluation criteria and the acceptable performance thresholds.
4. Deployment & Monitoring
Deploy to production with canary/shadow rollout. Instrument monitoring for accuracy drift, latency, error rates. Set alerts. Plan the retraining cadence. PMs own the production success criteria.
5. Feedback Loop & Retraining
Human corrections, user feedback, and production data feed back into retraining. This is where AI products compound in value — the product gets smarter as it's used. Design for this loop from Day 1.
Common question
How do you decide when to use AI vs a simpler rule-based solution?
"I ask three questions: Is the pattern too complex for rules? Do we have enough data to train reliably? Is the cost of errors acceptable given the use case? If rules can solve 90% of cases cleanly and the remaining 10% are low-stakes, rules are probably better. AI shines when the pattern space is genuinely complex, when volume is high enough for statistical learning, and when we have a feedback loop to improve over time."
Common question
A data scientist says the model is ready. You disagree. What do you do?
"I'd first make sure we're measuring the same thing against the same success criteria — which should have been defined at requirements stage. If the model meets the technical metrics but I'm concerned about real-world behavior on edge cases or corner cases the test set doesn't cover, I'd push for an expanded UAT on a broader sample. The technical metric and the business outcome aren't always the same thing — my job is to make sure we're optimising for the right one."
Common question
How do you explain AI capabilities and limitations to a non-technical stakeholder?
"I use analogies grounded in their domain. Instead of 'the model has 87% precision', I say 'for every 100 recommendations it makes, roughly 13 will be off the mark — here's how we've designed the product to catch and correct those before they reach the customer.' I always pair a limitation with the mitigation design, so stakeholders understand we've planned for the failure mode."
Step 3 of 5 · AI Knowledge
AI Knowledge
Agents & MCP
The frontier of AI product design. AI agents are moving from demos to enterprise deployment — AI PMs need to understand how to design, scope, and govern them.
What is an AI Agent?
An AI agent is a system that perceives its environment, reasons about a goal, takes actions (uses tools, calls APIs, searches the web, writes and runs code), evaluates the result, and iterates until the goal is achieved — without requiring a human to direct every step.
The key difference from a chatbot: a chatbot responds once. An agent loops — observe → think → act → observe → think → act — until done or until it reaches a defined stopping condition.
Agent anatomy
Brain (LLM): Reasoning and planning Tools: APIs, search, code execution, databases Memory: Short-term (context window) and long-term (vector store) Orchestrator: Manages the reasoning loop and tool calls
Chatbot vs Agent
A chatbot tells you your flight is delayed. An agent checks alternatives, cross-references your calendar, books the best option, and notifies your hotel — then reports back with what it did.
Multi-Agent Systems
Multiple specialised agents collaborating — each with a defined role. An orchestrator delegates tasks to worker agents, collects results, and assembles the final output.
Example — customer service resolution: Agent 1 (Classifier) → detects issue type and urgency Agent 2 (Retriever) → fetches account data and relevant KB articles Agent 3 (Drafter) → generates response recommendation Agent 4 (QA Checker) → validates for accuracy and compliance Agent 5 (Router) → decides: send to human or auto-resolve?
Key PM consideration: In multi-agent systems you must design the handoff protocol — what data passes between agents, what constitutes a failure at each stage, and when to escalate to human review. This is a product design decision, not an engineering implementation detail.
MCP — Model Context Protocol
What is MCP?
An open standard that allows AI models to securely connect to external tools, databases, and services through a standardised interface. Think of it as USB for AI integrations.
Before MCP, every AI tool integration required custom API code — build once for Salesforce, rebuild for Jira, rebuild again for Gmail. With MCP, any compatible tool exposes a standardised interface — one integration protocol, many tools. Agents become composable.
MCP Server
The service that exposes a tool's capabilities through the MCP protocol. A Salesforce MCP server exposes "read contact", "create opportunity", "update deal stage" — all through a standard interface the agent can discover and call.
MCP Client (the agent)
The AI system that connects to MCP servers to discover available tools and call them. It doesn't need to know the implementation details of each tool — just what the tool can do and what parameters it accepts.
Designing agentic AI — PM considerations
Define the autonomy boundary
What can the agent do without human approval? What requires confirmation? Where must a human always be in the loop? This is a product requirement — define it explicitly before engineering starts.
Design graceful failure
Agents can loop indefinitely, hallucinate tool calls, or take unintended actions. Define: what does a graceful failure look like? What's the rollback path? What gets logged when the agent fails? These are DoD items.
Auditability by design
Enterprise clients need a complete trace of every agent action — what decision was made, by which model, based on what data, at what confidence level. Design for auditability from Day 1, especially in regulated industries.
Cost and latency guardrails
Agents make multiple LLM calls per task. Without guardrails, a single user request can trigger hundreds of API calls. PMs must define maximum step counts, timeout thresholds, and cost-per-task budgets as product requirements.
AI Knowledge
Responsible AI
AI PMs are the last line of defence before a harmful or biased AI product reaches users. This isn't a compliance checkbox — it's a core product design discipline.
The six pillars of responsible AI
Fairness & Bias mitigation
AI models can learn and amplify biases present in training data. A hiring model trained on historical data may disadvantage certain groups. A lending model may discriminate by proxy. PMs must define fairness metrics, run bias audits across demographic slices, and treat disparate impact as a bug — not an acceptable outcome.
Explainability (XAI)
Can we explain why the model made this decision? Critical in regulated industries. "The model said so" is not acceptable when the outcome is a loan denial, a content removal, or a hiring decision. Design explainability into the product — audit logs, confidence breakdowns, contributing factors shown to users or reviewers.
Human oversight
High-stakes decisions should have human review in the loop. Not just as a safety net — human corrections are training signals that improve the model over time. Design review queues and escalation paths as first-class product features, not afterthoughts.
Data governance & Privacy
Users have rights over their data. PMs must understand GDPR, CCPA, and sector-specific regulations. Data minimisation (only collect what you need), retention limits, consent management, and the right to deletion must be in the product requirements — not just the legal documents.
Security & Adversarial robustness
AI systems can be attacked — through prompt injection (manipulating LLM behaviour via malicious inputs), data poisoning (corrupting training data), and model extraction (stealing model weights). PMs should include adversarial testing in the product's security requirements.
Transparency
Users interacting with AI should know they're interacting with AI. Outputs should communicate uncertainty. Limitations should be disclosed. Trust is built through honesty about what the system can and cannot do.
Responsible AI in requirements
As a PM, responsible AI isn't a separate workstream — it's embedded in every story. Add to your story template: a bias evaluation criterion, an explainability requirement, a data retention note, and a human review gate definition. These are DoD items.
Common question
How do you ensure responsible AI practices in your products?
"I treat responsible AI as a product requirement, not a compliance review. At requirements stage I ask: could this model produce disparate outcomes for different user groups? How do we explain a model decision to a user who disputes it? What data are we collecting and for how long? I write these as explicit acceptance criteria — fairness thresholds, explainability hooks, data retention limits. They're in the Definition of Done alongside performance and accuracy criteria."
📍 Suggested learning pathStep 4 of 5 · ~15 min read
AI PM Practice
Managing AI Products
The practical craft of being an AI PM — from writing model specs to running UAT to monitoring production.
AI-specific additions to user stories
The 3-layer AI user story
Layer 1 — Functional spec: Standard user story format — As a [persona], I want [action], so that [outcome]
Layer 2 — Model spec: Input data schema, expected output format, confidence threshold, fallback behavior, evaluation dataset, precision/recall targets
Layer 3 — Prompt spec (for LLM features): Instruction architecture, few-shot examples, output format constraints, guardrails against harmful outputs
Most PMs write only Layer 1. Senior AI PMs own all three.
UAT for AI features — two phases
Phase 1 — Model validation
Does the AI output meet the defined quality metrics on real-world test data?
Run the model on a held-out evaluation dataset representative of production. Measure precision, recall, F1. Validate confidence threshold distribution. Check for bias across demographic slices. Pass/fail against predefined thresholds.
Phase 2 — Workflow validation
Does the end-to-end product flow work as intended?
Test the full workflow including human review queues, escalation paths, error states, and UI. Involve operational SMEs — they know the edge cases. Run adversarial tests (inputs the model should reject). Define explicit exit criteria before UAT begins.
Production monitoring checklist
Accuracy monitoring
Track prediction accuracy against ground truth labels (from human corrections, downstream outcomes, or sampling). Alert when accuracy drops below threshold.
Data drift detection
Track statistical distribution of input data over time. Alert when it diverges significantly from training distribution — this predicts model performance degradation before it's visible in accuracy metrics.
Confidence distribution monitoring
Track the distribution of confidence scores. If the average confidence drops, the model is encountering unfamiliar inputs. If it rises too high, it may be overfitting.
Human review queue health
Track volume, resolution time, and correction rate in the human review queue. Rising correction rate = model degrading. Rising volume = automation rate falling.
Step 4 of 5 · Practice
AI PM Practice
Discovery & Requirements
The upstream work that determines whether you build the right thing — before a single sprint begins.
The discovery mindset
Good product discovery answers four questions before any solution is proposed:
1. Is this a real problem? How often does it occur? Who experiences it? What's the cost? 2. Is the problem worth solving? Does it align with business objectives? Is the addressable impact meaningful? 3. Can AI solve it? Is the pattern learnable from data? Is there a feedback mechanism? Are the failure modes acceptable? 4. What does success look like? Define measurable outcomes before solution design begins.
Discovery techniques
Stakeholder interviews
Structured conversations to understand business objectives, pain points, and constraints. Use open-ended questions. Listen for the problem behind the problem — stakeholders often request solutions, not problems.
Process mapping
Document the current-state workflow in detail — every step, every decision point, every handoff. Identify where delays, errors, and manual effort concentrate. These are your AI opportunity zones.
Data analysis
Analyse operational data to quantify pain. How often does the problem occur? How long does it take? What's the error rate? Numbers turn anecdotes into requirements.
Jobs-to-be-done framing
"When [situation], I want to [motivation], so I can [expected outcome]." JTBD strips away solution assumptions and keeps focus on what the user is actually trying to accomplish.
BRD structure for AI features
1. Problem Statement — not the solution; the specific pain, with evidence 2. User Persona — who experiences this, in what context, how often 3. Current State — how it works today, including pain points and manual steps 4. Desired Outcome — what success looks like for the user and the business 5. Constraints — regulatory, technical, data availability, budget 6. AI/Model Requirements — input schema, output format, confidence thresholds, fallback, evaluation criteria 7. Success Metrics — how you'll measure whether the feature achieves its goal 8. Out of Scope — explicitly what will not be built in this release
AI PM Practice
AI Metrics & KPIs
Define success before you build — and measure the right things after you ship.
Model quality metrics
Accuracy
% of all predictions that are correct. Simple, but misleading with imbalanced datasets. If 95% of inputs belong to one class, a model that always predicts that class has 95% accuracy but is useless.
Precision
Of all positive predictions, how many were actually positive? High precision = low false positive rate. Important when false positives are costly (auto-approving a wrong transaction, triggering a false fraud alert).
Recall
Of all actual positives, how many did the model catch? High recall = low false negative rate. Important when missing a true positive is costly (missing a fraud signal, failing to detect a safety issue).
F1 Score
Harmonic mean of Precision and Recall. Use when you need to balance both. F1 = 2 × (P × R) / (P + R). Ranges from 0 to 1 — higher is better.
Confidence Threshold
Minimum probability score for a prediction to be acted on automatically. This is a product decision, not just a model parameter — it determines the automation rate vs. human review volume trade-off.
AUC-ROC
Area Under the ROC Curve — measures how well the model distinguishes between classes across all threshold settings. AUC = 1.0 is perfect; 0.5 is random. Useful for comparing models before choosing a deployment threshold.
Precision vs Recall trade-off (PM framing): "Higher recall means we catch more true cases but also flag more false ones — that means more human review volume. Higher precision means less review work but we might miss real cases. The right balance depends on the cost of each error type in this specific use case."
Business / product KPIs for AI features
Automation rate
% of tasks handled by AI without human intervention. Higher is not always better — defines the trade-off between scale and quality.
Time to resolution
How long it takes to complete a task end-to-end. AI should reduce this — measure before and after deployment.
Human correction rate
% of AI outputs that are corrected by human reviewers. Rising correction rate = model degrading or distribution shifting.
Adoption rate
% of eligible users actively using the AI feature. High accuracy but low adoption signals a UX or trust problem — not a model problem.
Error cost
The business cost of a model error — financial, reputational, regulatory. Must be quantified to set appropriate quality thresholds.
Latency (p95)
The response time at the 95th percentile — not the average. Averages hide tail latency that kills user experience. Define latency SLAs in requirements.
📍 Suggested learning pathStep 5 of 5 · ~30 min read
Interview Prep
Mock Q&A Bank
40+ high-probability interview questions with model answer frameworks. Customise each answer with your own examples — never deliver a generic answer in an interview.
Very common
What does good product management look like to you?
"Good product management starts with ruthless clarity on the problem — not the solution. I spend disproportionate time in discovery: who is experiencing the pain, how often, at what cost, and why existing solutions fall short. Then I translate that into a prioritised, outcome-driven roadmap — not a feature list. And I stay accountable through delivery: monitoring adoption, measuring outcome metrics, and iterating. For AI products specifically, good PM means defining success metrics before you build — not after deployment."
Very common
How do you define your product vision?
"A product vision should answer: what world are we trying to create for our users, and why does it matter? I work backward from the ideal end state — what does the user's life look like when our product is working perfectly? Then I identify the biggest gap between today and that state, and that becomes the strategic focus. The vision should be ambitious enough to inspire but specific enough to make trade-off decisions easy."
Common
How do you decide what NOT to build?
"I look for three signals: the feature doesn't move any of our outcome KPIs; it serves one client's need but fragments our platform for everyone else; or the cost of building and maintaining it outweighs the value. The hardest 'no' is to a high-revenue client who wants a custom feature. I explain the trade-off transparently, offer an alternative that meets the underlying need within the product strategy, and document the decision."
Common
How do you build and communicate a product roadmap?
"I use a now/next/later format — not a Gantt chart with fake precision. 'Now' is committed sprint work. 'Next' is prioritised but not scheduled. 'Later' is directional intent, not promises. Each item is linked to an outcome OKR so stakeholders understand why it's on the roadmap, not just what it is. I review the roadmap with stakeholders quarterly, and I make clear that the roadmap reflects current knowledge — it will change as we learn more."
Very common
Walk me through how you manage a product backlog.
"I use a combination of RICE scoring and Kano categorisation. First I categorise: basic needs get prioritised regardless of score; performance features get ranked by RICE; delighters get time-boxed to innovation capacity. For AI features I add a data readiness gate — a high-RICE feature that lacks labeled training data still can't enter the sprint. I run refinement sessions mid-sprint to ensure the top of the backlog is always sprint-ready. Nothing enters sprint planning without clear AC and no unresolved dependencies."
Very common
Describe a time a sprint didn't go as planned. What did you do?
Use the STAR format. Set up a real scenario: a technical dependency discovered mid-sprint, a model performing below threshold during UAT, a stakeholder changing requirements on Day 4. Describe: how you triaged, what you descoped and why, how you communicated to stakeholders, and what process change you introduced in the retrospective to prevent recurrence. End with what you learned.
Common
How do you handle scope creep?
"I prevent it upstream rather than managing it mid-sprint. Clear AC, a defined DoD, and a Sprint Goal that everyone has agreed to are my primary defences. When new requests come in mid-sprint, I acknowledge them, log them in the backlog, and explain why adding them now would put the Sprint Goal at risk. The only exception is a critical production issue — and even then I work with the SM to descope something of equivalent size."
Common
How do you work with data science and engineering teams?
"I treat data scientists as full product partners in discovery — not just executors. I involve them early when evaluating whether ML is the right approach. I write model specs (not just functional specs) so they have clear quality targets and evaluation criteria. With engineering, I'm specific about what I need, open about what I don't know, and I never over-specify the 'how'. My job is to describe the problem and the success criteria precisely — their job is to find the best technical path."
Very common
How do you measure the success of an AI feature?
"I define success metrics at requirements stage — never after deployment. For automation features: automation rate, confidence threshold distribution, human correction rate. For classification models: precision, recall, F1 calibrated to the cost of each error type. For user-facing features: task completion rate, time-to-resolution, user satisfaction. And I define a monitoring requirement in the DoD: how will we detect drift, what triggers a review, and what is the retraining cadence?"
Very common
Walk me through how you write acceptance criteria for an AI feature.
"I write AC in Given/When/Then format across four scenario types: happy path, edge case, failure case, and non-functional. For AI features specifically I add a fifth: model quality AC — the precision or recall threshold the model must meet on a defined evaluation dataset before the story is accepted. And a sixth: monitoring AC — how will we detect in production if the model's performance degrades. Most teams skip the last two. That's where production AI features break silently."
Common
How do you decide where to set the confidence threshold?
"It's a product decision rooted in the cost of each error type. I ask: what happens when the model is wrong and we auto-acted? What's the cost of that error versus the cost of routing to human review? I map this on a simple matrix: high-consequence decisions (financial, legal, medical) get conservative thresholds — lower automation, higher human review. Lower-stakes decisions can tolerate higher automation with occasional errors. I review the threshold quarterly as we gather production data."
Common
How do you handle it when an AI model performs worse in production than in testing?
"First I investigate the gap: is the production input distribution different from the test set? Are there data quality issues in production? Are edge cases that weren't in the test set appearing at high frequency? Then I act based on the severity — if the performance gap is critical, I roll back to a rules-based fallback or increase the human review threshold temporarily. I then work with data science to collect and label production examples and retrain. And I add distribution monitoring to the production observability stack to detect this earlier next time."
Very common
How do you handle competing priorities from different stakeholders?
"I use business value and user impact as the anchor — not seniority or volume. I map competing requests against the current OKRs and ask: which item moves the needle on an outcome we've committed to? I document the trade-offs and present a clear 'Option A vs Option B' to the decision-maker — with the business case for each, not just a preference. If two items genuinely tie, the one that unblocks more downstream work wins."
Common
How do you manage a stakeholder who keeps changing requirements?
"I first understand why requirements are changing — is it new information, evolving business context, or unclear initial discovery? If it's the latter, the root cause is upstream and I fix my discovery process. For mid-sprint changes, I explain the impact transparently: 'Adding this now means descoping X and delaying Y — is that a trade-off you want to make?' Documenting the decision and its rationale creates accountability on both sides and reduces future scope churn."
Common
How do you communicate AI limitations to business stakeholders?
"I translate technical metrics into business language. Instead of 'the model has 87% precision,' I say 'for every 100 recommendations, roughly 13 will be off-target — here's how we catch and correct those.' I always pair a limitation with the mitigation design. And I set expectations proactively at launch — stakeholders who understand the limitation upfront are partners in improvement; stakeholders who discover it after the fact are complainants."
Very common
Tell me about a product you shipped that didn't perform as expected. What did you learn?
Use STAR. Pick a real example. The lesson should be about your discovery process, success metric definition, or assumption validation — not about blaming engineering or external factors. Interviewers are looking for intellectual honesty and learning agility.
Very common
How do you make decisions under ambiguity?
"I structure the ambiguity first — what do I know, what don't I know, and what's the cheapest way to find out? If I need to make a decision before I can gather data, I make the reversible bet and monitor closely. For irreversible decisions I invest more in upfront research. I document my assumptions explicitly so I can validate them early and course-correct before the cost of being wrong becomes too high."
Common
What's the hardest "no" you've said in your PM career?
Pick a specific example where you declined a significant feature request or killed a project mid-stream. The answer should show you defended a principle (product coherence, user trust, technical sustainability) over short-term pressure, communicated the decision respectfully, and offered an alternative path that served the underlying need.
Step 5 of 5 · Final step
Interview Prep
Questions to Ask Them
Strong questions signal seniority. They show you've done the homework, think strategically, and care about the right things. Always prepare at least 3 per interviewer.
Strategic questions (for senior/VP interviewers)
How does your product team stay connected to customer problems at scale — is discovery embedded in delivery, or is it a separate function?
Why it works: Shows you think about discovery as an ongoing discipline, not a phase that ends when development starts.
What does the product team's relationship with data science look like — are they embedded in squads, or are they a centralised resource teams request from?
Why it works: Reveals how AI products actually get built here, and whether the PM has real influence over model decisions.
Where do you see the biggest unsolved product problem in your AI portfolio right now?
Why it works: Shows ambition and curiosity. Gives you signal on what the role will actually focus on.
What does the first 90 days look like for someone in this role — is it more discovery and listening, or are there immediate delivery expectations?
Why it works: Shows you think in outcomes from Day 1. Also gives you the actual scorecard they'll use to evaluate you.
How does the product team handle the tension between platform scalability and client-specific customisation?
Why it works: This is the central tension in any B2B AI platform. Asking it signals you understand enterprise product management.
What are the most common failure modes you've seen in AI feature delivery here — where do requirements tend to break down between product, data science, and engineering?
Why it works: Invites honesty. Shows you know AI PM is harder than traditional PM and you're already thinking about how to avoid known pitfalls.
Universal close — for every session
Best closing question: "Based on what you've heard from me today — is there anything you'd want me to expand on, or any area where you'd like more evidence of fit?"
This invites objections while you're still in the room. It signals confident self-awareness — a senior PM trait. Use it in every session without exception.
Never end with: "So, what are the next steps?" — let them offer that. Your close should be substantive, not administrative.
Reference
Glossary
Essential vocabulary for AI PM interviews. Read through this once before any interview — precise language signals depth.
Agile
An iterative approach to software delivery based on the 2001 Agile Manifesto. Values working software, customer collaboration, and responding to change over rigid planning.
Scrum
The most widely used Agile framework. Organises work into time-boxed Sprints with defined roles (PO, SM, Dev Team), artifacts (Product Backlog, Sprint Backlog, Increment), and ceremonies.
Sprint
A fixed-length iteration (typically 2 weeks) during which the team builds and delivers a potentially shippable product increment.
Epic
A large body of work that can be broken down into Features and User Stories. Typically spans multiple sprints and ties to a strategic objective.
User Story
A small, deliverable unit of user value written as: "As a [persona], I want [action], so that [outcome]." Must fit within one sprint and have clear Acceptance Criteria.
Acceptance Criteria (AC)
Specific, testable conditions that must be met for a User Story to be accepted as complete. Written in Given/When/Then format covering happy path, edge case, failure case, and non-functional requirements.
Definition of Done (DoD)
A team-wide checklist that defines when any piece of work is truly complete — code review, testing, documentation, deployment, and for AI features: model evaluation, monitoring setup, and human review gate testing.
Velocity
The average story points a team completes per sprint. Used for forecasting — not as a performance measure. Treat changes in velocity as signals, not targets.
Backlog Refinement
A mid-sprint session where the PO and team review, clarify, estimate, and prioritise upcoming backlog items to ensure the top of the backlog is always sprint-ready.
Story Points
A relative estimation unit for story complexity, effort, and uncertainty. Uses Fibonacci sequence (1, 2, 3, 5, 8, 13, 21). Team-calibrated — not equivalent to hours.
LLM (Large Language Model)
A deep learning model trained on vast text data to predict and generate text. Powers modern generative AI products. Examples: GPT-4, Claude, Gemini.
RAG (Retrieval-Augmented Generation)
A pattern that connects an LLM to a real-time knowledge retrieval system, grounding responses in current, factual data rather than training data alone.
Prompt Engineering
The practice of designing inputs to LLMs to reliably produce desired outputs. Techniques include zero-shot, few-shot, chain-of-thought, and system prompts.
Fine-tuning
Adapting a pre-trained foundation model for a specific domain or task using labeled examples. Produces better performance on the target task at lower inference cost than prompting alone.
Embeddings
Numerical vector representations of text that capture semantic meaning. Similar texts have similar embeddings. Powers semantic search, RAG, and recommendation systems.
Hallucination
When an LLM confidently produces factually incorrect or fabricated information. A fundamental limitation — design products with mitigation in mind (RAG, human review, confidence indicators).
Precision
Of all positive predictions, how many were actually positive? High precision = low false positive rate.
Recall
Of all actual positives, how many did the model find? High recall = low false negative rate.
F1 Score
Harmonic mean of Precision and Recall. The balanced metric when both error types matter.
Model Drift
Performance degradation over time as real-world data patterns shift away from the training distribution. Requires monitoring and retraining pipelines.
Human-in-the-loop (HITL)
A design pattern where human judgment is incorporated into an AI system's decision process — for review, correction, or approval of model outputs.
Agentic AI
AI systems that autonomously pursue goals through multi-step reasoning and action, using tools and external services to accomplish tasks without human direction at each step.
MCP (Model Context Protocol)
An open standard for connecting AI models to external tools and services through a standardised interface — enabling composable, interoperable AI agent tooling.
Confidence Threshold
The minimum probability score a model prediction must achieve to be acted on automatically. Below this threshold, the decision is routed to human review. A product decision with business consequences.
Vector Database
A database optimised for storing and searching embeddings (high-dimensional vectors). Powers semantic search and RAG systems. Examples: Pinecone, Weaviate, Chroma. A PM building RAG-based products needs to understand this as infrastructure.
RLHF (Reinforcement Learning from Human Feedback)
A training technique where human raters evaluate model outputs and those ratings are used to fine-tune the model toward preferred behaviour. Used to align LLMs with human values and reduce harmful outputs.
Constitutional AI
An alignment technique (Anthropic) where a model is trained to critique and revise its own outputs according to a set of written principles — reducing reliance on human feedback at scale.
Data Flywheel
A virtuous cycle where more users generate more data, which improves the model, which attracts more users. A core competitive moat for AI products — companies with proprietary usage data can compound their model advantage over time.
Shadow Mode
A deployment pattern where the model runs in production and generates predictions, but those predictions are not shown to users or used to make decisions. Used to validate production performance before going live.
Canary Deployment
Releasing a new model version to a small percentage of traffic (e.g., 5%) before rolling out to everyone. Allows real-world validation without full exposure — a standard AI PM release practice.
A/B Model Testing
Running two model versions simultaneously on split traffic and comparing performance on business metrics. The AI equivalent of A/B testing features — essential for validating model improvements before full rollout.
Ground Truth
The correct, verified answer that a model's prediction is compared against during evaluation. Defining ground truth is a product decision — who labels it, at what quality, and how disagreements are resolved.
Synthetic Data
Artificially generated data used to train or augment models — useful when real data is scarce, sensitive, or imbalanced. An AI PM should understand when synthetic data is appropriate and what risks it introduces (distributional shift).
Model Card
A standardised document describing a model's intended use, performance metrics, limitations, training data, and known failure modes. The AI PM's equivalent of a product requirements document — mandatory for responsible AI deployment.
Product Roadmap
A strategic communication tool showing what will be built, approximately when, and why. Best maintained as a now/next/later format — not a Gantt chart. Living document, not a contract.
OKR (Objectives and Key Results)
A goal-setting framework. Objective = qualitative, inspirational goal. Key Results = measurable outcomes that indicate progress. Separates outcomes (what we're trying to achieve) from outputs (what we're building).
RICE Score
A prioritisation formula: (Reach × Impact × Confidence) ÷ Effort. Used to rank backlog items relative to each other. Higher score = higher priority.
Kano Model
A framework for categorising features by how they affect user satisfaction: basic needs (must-haves), performance needs (more is better), and delighters (unexpected value creators).
MoSCoW
A release scoping framework: Must have, Should have, Could have, Won't have (this time). Used to negotiate scope when time or resources are constrained.
Discovery
The upstream product work of understanding user problems, validating assumptions, and defining opportunity before any solution is built. Good discovery prevents building the wrong thing.
North Star Metric
The single metric that best captures the core value the product delivers to users. Everything else is either an input to this metric or a health metric that shouldn't decline while pursuing it.
MVP (Minimum Viable Product)
The smallest version of a product that delivers enough value to test a hypothesis with real users. Not the smallest thing you can ship — the smallest thing that generates learning.
PI Planning
Program Increment Planning (from SAFe). A large-group planning event where multiple teams align on a 10-week increment of work, resolving dependencies and setting shared objectives.
EU AI Act
The world's first comprehensive AI regulation (EU, 2024). Classifies AI systems into 4 risk tiers: Unacceptable (prohibited), High (strictly regulated), Limited (transparency required), and Minimal (no obligation). High-risk systems require conformity assessments, human oversight, and auditability.
High-Risk AI System
Under the EU AI Act: AI used in healthcare, credit scoring, recruitment, law enforcement, education assessment, or critical infrastructure. Requires technical documentation, human oversight mechanisms, data governance, and registration before deployment.
GDPR (General Data Protection Regulation)
EU regulation governing personal data processing. Key AI implications: data minimisation, right to explanation of automated decisions, right to object to automated processing, and consent requirements for training data use.
Right to Explanation
Under GDPR: individuals can request a meaningful explanation of automated decisions made about them. Forces AI PMs to design explainability into the product — not as a feature, but as a user right.
Model Card
A standardised documentation format for AI models — intended use, performance metrics, limitations, training data characteristics, known failure modes, and ethical considerations. Required for responsible AI governance and increasingly expected by enterprise customers.
SR 11-7
US Federal Reserve guidance on model risk management for financial institutions. Requires independent model validation, documentation of assumptions and limitations, and ongoing performance monitoring. Sets the de facto standard for AI governance in US financial services.
Conformity Assessment
Under the EU AI Act: the process by which a high-risk AI system is evaluated against regulatory requirements before market placement. Third-party assessment is required for the highest-risk categories (biometrics, critical infrastructure).
Privacy by Design
The principle that data protection should be built into a system's architecture from the outset — not added as a compliance layer at the end. A core expectation of GDPR-compliant AI product development.
Algorithmic Accountability
The principle that organisations are responsible for the outcomes produced by their AI systems — including unintended harms, bias, and errors. An AI PM is a key accountability node: they define the system's goals, acceptance criteria, and monitoring strategy.
Bias Audit
A systematic evaluation of an AI model's outputs across demographic groups to detect disparate impact or discriminatory patterns. Increasingly required by regulation (EU AI Act, NYC Local Law 144) and increasingly expected by enterprise buyers.
Interview Prep
Practice Mode
Three practice modes to prepare for your AI PM interview. Start with Quick Drill for fast repetition, use Mock Interview for realistic practice, or tackle Scenario Challenges for situational judgment.
Choose a category
📚
All Cards
54 cards
🏗️
Foundations
8 cards
Role, narrative, Agile, user stories
💡
AI Knowledge
10 cards
LLMs, RAG, agents, responsible AI
🚀
Product Craft
8 cards
Prioritisation, metrics, roadmapping
📦
Delivery
8 cards
Scoping, estimation, UAT, monitoring
🤝
Stakeholder & Comms
6 cards
Trade-offs, difficult conversations
🎤
Behavioural
8 cards
STAR stories, leadership, failure
🧩
Scenario Challenges
6 cards
Full situational judgment questions
Advanced
Card 1 of 54
Easy: 0 · Hard: 0
Foundations
Loading...
Type your answer below, then flip
Framework
Model Answer
What interviewers test:
0 / 20 min60s
Session complete! 🎉
Select interview format
🎯
Full Interview
~20 min · 7 questions
Strategy, AI knowledge, delivery, behavioural
⚡
Quick Round
~10 min · 4 questions
Core AI PM competencies only
How it works: The AI coach will ask one question at a time. Type your answer in the chat panel and send it. After each answer you'll get structured feedback. A session summary appears after all questions.
Mock interview in progress. Answer each question in the chat panel (right side). The AI coach will give feedback and ask the next question.
Mock Interview Complete
Scenario Challenges
Real-world AI PM situations. Read the scenario, write your response, get AI-powered rubric feedback.
AI PM Practice
AI Tools & Prompting
The LLM landscape, prompt engineering techniques, anti-patterns, and how to evaluate AI outputs. These topics appear frequently in AI PM interviews — especially for roles involving LLM-powered products.
The LLM landscape — what a PM needs to know
As an AI PM you don’t need to know which model has the highest benchmark score — you need to know how to make the build vs buy vs configure decision, what trade-offs matter for your product, and how to evaluate model outputs for your use case.
Foundation models (GPT-4, Claude, Gemini, Llama)
Large pretrained models with broad capability. Available via API (GPT-4, Claude) or open-weight for self-hosting (Llama, Mistral). API models: faster to start, vendor dependency. Open-weight: more control, higher infrastructure cost.
Specialised / fine-tuned models
Foundation models adapted for a specific domain or task via fine-tuning. Use when: you have labelled domain data, the base model consistently underperforms on your use case, or you need consistent output format/style.
Embedding models
Convert text into numerical vectors for semantic search, clustering, or retrieval. The backbone of RAG systems. Not generative — they don’t produce text. Key metric: retrieval accuracy on your specific domain.
Build vs buy vs configure decision
Buy (API): commodity capability, speed matters, no proprietary data advantage. Configure (fine-tune/RAG): good base model but needs domain grounding or consistent style. Build: unique capability, proprietary data moat, regulatory reason to self-host. Rare in most enterprise contexts.
Prompt engineering techniques
Zero-shot prompting
Give the model a task with no examples. Works for well-defined tasks the model has seen in training. Fast to iterate. Fails when the task is ambiguous or requires specific output format.
Few-shot prompting
Provide 2–5 examples of input-output pairs before the task. Dramatically improves consistency and format adherence. Use when zero-shot produces variable outputs or when the task has a specific structure.
Chain-of-thought (CoT)
Ask the model to reason step-by-step before giving an answer: “Think through this step by step.” Improves accuracy on reasoning tasks. Trade-off: more tokens = higher latency and cost.
System prompts
A persistent instruction that shapes the model’s behaviour for all turns in a conversation. Use for: persona, tone, output format, constraints, guardrails. The most important lever for consistent behaviour in production.
Retrieval Augmented Generation (RAG)
Augment the prompt with retrieved context from a knowledge base before generating. Reduces hallucination, keeps responses grounded in your data. PM responsibility: define retrieval quality requirements and evaluate whether retrieved chunks are relevant.
Structuring a production prompt
1. Role / persona — “You are a [role] helping [user type]...” 2. Task — what you want the model to do, precisely 3. Context — relevant background or retrieved documents 4. Format — output structure (JSON, bullet list, max length) 5. Constraints — what not to do, guardrails, tone restrictions 6. Examples — 1–3 good examples if format consistency is critical
Common prompt engineering anti-patterns
Vague instructions
“Be helpful and concise” means nothing to a model. “Respond in 3 bullet points, each under 20 words” is specific. Vague prompts produce variable outputs — variability in production is a reliability problem.
Prompt injection vulnerability
When user input is concatenated directly into a system prompt, malicious users can override instructions: “Ignore all previous instructions and...” Mitigation: validate/sanitise user input, use structured formats (JSON), separate system instructions from user content, add explicit injection defence in your system prompt.
Over-engineering before testing
Building a complex multi-step prompt pipeline before validating that a simple prompt works. Start with zero-shot, measure on your eval set, then add complexity only where it demonstrably improves results.
No fallback for model refusal
Models sometimes refuse to answer or produce unexpected outputs. Production systems need a fallback: a default response, an error message, or a human escalation path. Never assume the model will always produce the expected output.
Evaluating LLM outputs
Why evals matter for PMs
You can’t A/B test a model change in production the same way you test a UI change. Evals are your test suite — a fixed dataset of inputs with expected outputs or quality criteria. Without evals, every model update is a leap of faith.
Types of evals
Automated: rule-based checks (format, length, required fields), embedding similarity to reference answers. Model-graded: use another LLM as a judge — scores for relevance, accuracy, tone, helpfulness. Human eval: experts rate outputs on a rubric. Expensive but the ground truth for novel use cases.
What to include in your eval dataset
Representative inputs · Edge cases · Adversarial examples (jailbreak attempts, injection attempts) · Failure modes from production monitoring. Update continuously — a stale eval set is worse than no eval set.
Responsible AI considerations for LLMs
Content safety: filter harmful outputs. Privacy: don’t include PII in prompts. Bias: test across demographic groups. Explainability: can you explain why the model gave that output? These are PM requirements, not just engineering concerns.
Common — LLM product roles
How would you defend an LLM product against prompt injection?
“I treat it as a security requirement, not an afterthought. At the prompt level: explicitly instruct the model to ignore attempts to override its instructions, use structured input formats that separate user content from system instructions, and add injection defence language. At the application level: validate and sanitise all user inputs before they reach the model, implement content filtering on outputs, and log anomalous patterns for review. I include prompt injection test cases in the eval dataset and test every prompt change against them.”
Common
How do you decide between fine-tuning and RAG?
“They solve different problems. RAG is better when you need the model to access specific, up-to-date, or proprietary information at inference time — it grounds the response in retrieved context. Fine-tuning is better when you need the model to behave consistently in a specific style or domain, and the knowledge is stable. I start with RAG because it’s faster to iterate and easier to update. I consider fine-tuning when RAG retrieval quality is the bottleneck or when the model needs to adopt a very specific output format or persona that prompting alone can’t reliably achieve.”
AI PM Practice
Process Optimisation
AI PMs often get involved before development — mapping current processes, identifying automation opportunities, and designing the future state. These skills appear in discovery and stakeholder interviews.
Why process mapping matters for AI PMs
Before you can automate a process with AI, you need to understand exactly how it works today. Skipping the as-is mapping leads to automating the wrong steps, missing edge cases, and building AI that creates new problems faster than it solves existing ones.
SIPOC — the fast-frame technique
Suppliers: who provides inputs to the process? Inputs: what information or materials enter? Process: the high-level steps (5–7 boxes) Outputs: what does the process produce? Customers: who receives the output? SIPOC is used in the early discovery phase — before you go deep — to agree on scope with stakeholders.
Swim lane diagram
Shows who does what at each step. Each horizontal lane = a role or system. Flow arrows show handoffs. Handoffs are where delays, errors, and frustration concentrate — these are your AI opportunity zones.
As-Is → To-Be process design
As-Is (current state): Map exactly what happens today — every step, every handoff, every decision point, every system. Don’t idealise it. Include the workarounds and manual steps people do informally.
Pain point analysis: For each step ask: How long does this take? What goes wrong here? What’s the error rate? How much human judgement is required?
To-Be (future state): Design the optimised process — which steps are automated, which are eliminated, which are transformed. Show the roles and systems in the new state. The gap between as-is and to-be is your product scope.
Questions to ask during as-is mapping workshops
• Walk me through exactly what you do when [trigger event] happens • What information do you use to make that decision? • How often does this step fail or require rework? • What happens when [edge case]? • Where do you spend the most time that feels like it shouldn’t be manual?
Value Stream Mapping
Value Stream Mapping (VSM) extends process mapping by adding time data. For each step you capture: process time (how long the actual work takes) and wait time (how long before work starts). This makes the biggest time-sink visible — usually it’s queue time between steps, not the steps themselves.
Key VSM metrics
Lead time: total time from trigger to output (process + wait) Process time: time actually spent working (value-add) Efficiency: process time ÷ lead time × 100% Most enterprise processes have 5–15% efficiency — 85–95% of elapsed time is waiting.
What VSM reveals
The biggest AI opportunity is rarely the step that takes the most process time — it’s the step that creates the most wait time downstream. A model that eliminates a 2-minute task that blocks a 2-day queue is worth more than one that saves an hour at the end of the process.
Identifying AI automation opportunities
Not every step should be automated. Evaluate each process step against three criteria: volume (how often does it happen?), structure (is the input/output well-defined?), and tolerance for error (what happens if the AI is wrong?).
High-value AI targets
• High-volume, repetitive classification tasks (routing, triaging, categorising) • Document extraction and summarisation • Draft generation from structured inputs • Anomaly detection in data streams • Recommendation based on historical patterns
Low-fit for AI automation
• Novel situations with no historical precedent • High-stakes decisions where error consequence is severe • Tasks requiring relationship or emotional intelligence • Creative work requiring originality and context • Processes where the rules change frequently
The automation spectrum
Assist: AI suggests, human decides — lowest risk, highest trust Automate with review: AI decides, human reviews flagged cases Automate with monitoring: AI decides, exceptions escalate automatically Full automation: AI decides end-to-end — only appropriate for low-stakes, high-confidence use cases
Change management: AI products fail not because the model is bad but because the humans who use it don’t trust it or change their behaviour. Build adoption into your requirements: explainability, feedback mechanisms, and a trust-building rollout plan.
Common in discovery-heavy roles
How do you identify where to apply AI in an existing business process?
“I start with the as-is process — map every step, measure volume and time, identify where decisions happen and what information drives them. Then I apply three filters: is this high-volume and repetitive? Is the input and output well-structured? What’s the cost of an error? The intersection of high volume, structured data, and tolerable error rate is where AI delivers reliable value. I prioritise the step that creates the biggest downstream wait — not necessarily the one that takes the most process time.”
Common
How do you handle change management when introducing AI automation to a team?
“I treat adoption as a product requirement, not an afterthought. Three things I do: First, involve the people who do the current process in the design — they surface the edge cases and their buy-in matters. Second, design the AI to assist rather than replace, at least initially — ‘AI suggests, you decide’ builds trust faster than full automation. Third, make the AI’s reasoning visible where possible — when people can see why the model made a recommendation, they trust it more and can correct it when it’s wrong.”
Strategy
AI Governance & Regulation
The regulatory landscape every AI PM must navigate. Knowing this separates safe operators from liability risks — and signals seniority in interviews.
Why it matters: The EU AI Act is the world's first comprehensive AI law. If your product operates in Europe or serves EU users, it applies to you — regardless of where your company is based.
The 4 Risk Tiers
Unacceptable Risk — Prohibited
Banned outright. Includes: real-time biometric surveillance in public spaces, government social scoring systems, AI that exploits psychological vulnerabilities or manipulates users subliminally. An AI PM must never build these.
High Risk — Strictly Regulated
Can be built but requires conformity assessment, transparency, human oversight, and registration in an EU database. Includes: AI in healthcare decisions, credit scoring, recruitment screening, law enforcement tools, critical infrastructure, and education assessment.
Limited Risk — Transparency Required
Must disclose that users are interacting with AI. Includes: chatbots, deepfakes, emotion recognition systems. Users must know they're not talking to a human.
Minimal Risk — No Legal Obligation
Most AI applications — spam filters, recommendation engines, inventory optimisation. Best practice still applies, but no mandatory compliance requirements beyond existing consumer law.
What High-Risk compliance requires from a PM
Risk management system — ongoing identification and mitigation throughout the AI lifecycle, not just at launch
Data governance — training data must be relevant, representative, and as free from errors and bias as possible
Technical documentation — a model card: capabilities, limitations, intended use, performance metrics, known failure modes
Human oversight — humans must be able to monitor, intervene, and override the system at any point
Accuracy & robustness — the system must perform reliably and be tested against adversarial inputs and edge cases
Logging & auditability — all events must be logged to enable post-hoc review and incident investigation
GDPR applies when: you are processing personal data of EU residents — regardless of where your company is headquartered.
Key GDPR principles for AI PMs
Data minimisation
Collect only the data your model genuinely needs. Training on personal data "just in case" is non-compliant. Define data requirements from the model spec, not from what the data warehouse happens to have.
Right to explanation
Users have the right to understand automated decisions made about them. Explainability must be designed into the product from the start — not bolted on for compliance at the end.
Right to object to automated decisions
Individuals can object to decisions made solely by algorithms without human involvement. Design a clear human escalation path for any consequential AI decision — this is a first-class product feature, not an edge case.
Consent for training data
Using user-generated data to train or fine-tune a model almost certainly requires explicit consent. "We may use your data to improve our service" in ToS is increasingly insufficient — regulators are scrutinising this aggressively.
Privacy by Design
Data protection must be engineered from the first line of the model spec — not added as a compliance checklist before launch. Raise this in discovery: "What data do we need? What is the minimum? How do we handle deletion requests?"
PM compliance checklist
☐ Is any personal data being used to train or fine-tune the model?
☐ Do we have a legal basis for processing (consent, legitimate interest, or contract)?
☐ Can users request deletion of their data — and does that propagate to training sets?
☐ Can we explain any automated decision that affects a user in plain language?
☐ Has the DPO or legal team reviewed the data flows?
☐ Is there a human override path for every consequential AI decision?
Finance — SR 11-7 (US Federal Reserve)
What it is
US supervisory guidance on model risk management for financial institutions. Requires independent model validation, documentation of assumptions and limitations, and ongoing performance monitoring for any model used in credit, risk, or trading decisions.
What it means for a PM
Any AI model used in lending, fraud detection, or risk management must be validated by an independent team before deployment. Build model validation gates into your release process — not as a blocker, but as a first-class sprint milestone with its own acceptance criteria.
Healthcare — FDA AI/ML Guidance (US)
What it is
The FDA regulates AI-enabled Software as a Medical Device (SaMD). Continuously-learning algorithms face additional scrutiny — any significant change to an algorithm used in clinical decision-making may require re-submission and re-approval.
What it means for a PM
In healthcare AI, your model update process is a regulatory event — not just a deployment. Plan for Predetermined Change Control Plans (PCCPs) that pre-approve the specific conditions under which a model may update autonomously without triggering a new review cycle.
What interviewers actually want to see
You don't need to be a lawyer. Interviewers want to see that you:
1. Know that regulation exists and varies by industry, use case, and geography 2. Proactively involve legal, compliance, and data protection stakeholders early — in discovery, not post-launch 3. Design products with compliance built in: explainability, audit logs, human review paths 4. Know how to identify your product's risk tier and what obligations flow from it
High-Probability Question
"How do you ensure your AI product complies with regulations?"
Answer in 3 parts:
Map the landscape early. In discovery, I identify applicable regulatory frameworks — EU AI Act risk tier, GDPR if personal data is involved, and any sector-specific rules (FDA for healthcare, SR 11-7 for finance). I don't wait for legal to flag this. I bring compliance into the conversation at the requirements stage.
Design compliance in, not on. For high-risk systems, that means explainability by default, human override mechanisms, audit logging from day one, and a model card in the technical documentation. These are acceptance criteria — not features to be added later.
Gate the release process. Model validation, privacy impact assessments, and compliance sign-off are milestone gates in my release plan. They're scheduled sprints, not surprises before launch.
Follow-Up Question
"Your AI system made a decision that harmed a user. What do you do?"
Show structured incident thinking:
Immediately: Trigger the human escalation path — which should already exist in the product design. If the failure mode appears systemic, pause automated decisions in that category. Document everything for the audit trail.
Short-term: Root cause analysis with data science — was this a model failure, data drift, an edge case, or a threshold misconfiguration? Scope the blast radius: how many other users were affected?
Systemic fix: Adjust the confidence threshold, retrain or re-evaluate the model, add this case to the evaluation set, and update the model card. Communicate transparently to affected users per your legal obligations.
Strategy
Stakeholder Management
The skill most interviewers probe without naming it. AI PMs operate at the intersection of data science, engineering, legal, and business — managing each relationship differently.
The Power / Interest Grid
The most useful stakeholder framework. Plot each stakeholder on two axes — Power (ability to affect the project) and Interest (how much they care about the outcome).
High power, high interest → Manage closely. Co-create with them. These are your executive sponsors and key decision-makers.
High power, low interest → Keep satisfied. Inform them at key milestones. Don't overwhelm them with detail — escalate only when a decision requires them.
Low power, high interest → Keep informed. These are often the end-users or subject matter experts. They're your advocates if engaged well.
Low power, low interest → Monitor. Minimum effort. Don't ignore — a disengaged stakeholder can become a blocker if circumstances change.
AI-specific stakeholders to map
Data Science / ML Engineering
High power over model feasibility and timelines. Engage early and continuously. They'll tell you what's possible — listen to the constraints, then push back on what matters.
Legal & Compliance
High power to stop or reshape the product. Bring them into discovery — not sprint review. The earlier they're involved, the less expensive their input.
Executive Sponsors
High power, often low day-to-day interest. Communicate in business outcomes, not model metrics. They want to know: what does this do for revenue, cost, or risk?
Engineering & Platform
High interest in technical decisions. Work with them on infrastructure, integration, and MLOps — not just the feature. They'll flag dependencies you won't see from the product side.
End Users / Operations
The people who live with the output every day. High interest, often low formal power. Critical for UAT, edge case discovery, and post-launch monitoring. Under-engage them at your peril.
Data Protection / DPO
Power to block any feature that processes personal data. Involve at requirements stage. Their job is to find problems — frame that as value, not obstruction.
The core challenge: Executives want certainty. AI development produces uncertainty. Your job is to communicate honestly without losing confidence.
How to communicate AI uncertainty upward
Translate model metrics into business language
"87% precision" means nothing to a CFO. "13 in every 100 recommendations will be wrong — here's our plan to catch them before they affect customers" means everything. Always translate before you present.
Lead with the outcome, not the model
Don't open with "we're fine-tuning a transformer." Open with "we're building a system that will reduce manual review time by 40%." The model is the implementation detail — the outcome is the story.
Give a range, not a number
AI timelines and performance targets are genuinely uncertain. Give a calibrated range: "We expect 80–90% accuracy by Q3, with a production decision gate at 85%." Ranges signal honesty — not weakness.
Define the go/no-go criteria in advance
Before you start, agree with your exec sponsor: at what model performance threshold do we launch? What defines success at 3 months post-launch? Pre-agreement prevents uncomfortable conversations later.
Reporting cadence for AI products
Weekly (team): Model performance metrics, drift monitoring, current confidence thresholds
Bi-weekly (stakeholders): Progress against milestone gates, blockers, upcoming decisions that need input
Monthly (executives): Business outcome progress (adoption, error rate reduction, cost impact), risks with mitigation plans, next major milestone
Ad-hoc (immediately): Any incident where the model caused a user-facing error, any significant performance degradation, any change to the confidence threshold
Working with Data Science
What they need from you
Clear problem definition, labelled data requirements, evaluation criteria before they start building, and protection from scope creep mid-sprint. The worst thing you can do is change the success metric after training has begun.
What you need from them
Honest performance estimates with uncertainty ranges, early flagging of data quality issues, and a model card for every model that reaches production. Build a working relationship where they tell you bad news early — not after the sprint review.
Working with Engineering
MLOps is a product conversation
Model deployment, monitoring, retraining pipelines, and rollback procedures are product decisions, not just infrastructure. A PM who treats MLOps as someone else's problem will be blindsided by production failures.
Define the integration contract early
How does the model output get consumed? As an API? A batch job? A stream? What's the latency budget? These decisions affect model architecture — get engineering into the design conversation before any code is written.
Working with Legal & Compliance
The golden rule
Involve them in discovery. Every week you delay their involvement costs you more. A legal blocker discovered in sprint 12 is ten times more expensive than one discovered in sprint 1.
Frame it as risk management, not permission-seeking
Don't ask "can we do this?" Ask "here's what we're building and here are the risks we've identified — what are we missing?" That's a collaborative conversation, not a gate-keeping one.
Setting AI expectations with clients
Under-promise, then explain why
AI performance is probabilistic, not deterministic. Never promise "it will do X" — promise "it will do X in approximately Y% of cases, and here's how we handle the rest." Clients who understand the limitation are more forgiving of failures than clients who were promised perfection.
Show them the error mode, not just the success mode
In demos and pilots, deliberately show the model getting something wrong — then show how the escalation path handles it. This builds more trust than a curated demo that only shows the best case.
Define success metrics together
Before the pilot begins, agree in writing: what does success look like at 30, 60, and 90 days? What's the threshold at which we continue vs. revisit? Clients who help define success are far more likely to declare it.
Handling disappointment
Don't defend the model — acknowledge the gap between expectation and reality first. "I hear you — this isn't performing the way we expected it to." Then diagnose.
Diagnose before you fix — is this a model issue, a data issue, an integration issue, or a scope mismatch? Don't promise a fix until you know what's broken.
Come back with a plan, not just an apology — "Here's what we found, here's what we're changing, and here's how we'll know it's working." Clients can accept problems — they can't accept silence or vague reassurance.
High-Probability Question
"Tell me about a time you managed a difficult stakeholder."
Use STAR, but lead with the complexity:
Structure: Situation (who was the stakeholder, what was at stake) → Task (what you needed from them, or what they were blocking) → Action (how you approached the relationship — meetings, framing, escalation) → Result (what changed and what you learned).
Strong signals: You identified the real concern (often not the stated concern), you changed your approach based on what motivated them, and you turned a blocker into a collaborator.
Weak signals: You escalated immediately, you worked around them, or you "won" by outranking them. Interviewers want influence, not force.
High-Probability Question
"How do you explain AI limitations to a non-technical executive?"
Demonstrate translation skill:
"I translate model behaviour into business language and consequences. If precision is 87%, I say: 13 in every 100 recommendations will be wrong. Then I immediately answer the follow-up question they haven't asked yet: here's our plan to catch those errors before they reach the customer.
I also set up the conversation upfront — before the executive ever sees the model output, I've aligned them on what good looks like and what the failure modes are. That way the first time they see an error is never a surprise."
Strategy
Roadmapping for AI Products
AI roadmaps are different. Model performance is uncertain, dependencies are non-linear, and stakeholders expect the certainty of a software roadmap from a system that doesn't behave like software.
Now / Next / Later — applied to AI
The standard Now/Next/Later format works for AI — but each horizon needs to account for model uncertainty.
Now (current sprint/quarter): Features backed by a model that has already met its performance threshold in evaluation. Confidence is high. Commitments are firm.
Next (next quarter): Features dependent on a model currently in training or evaluation. Commit to the outcome — "reduce manual review by 30%" — not the implementation. Flag the performance gate explicitly.
Later (beyond 6 months): Features that depend on model capabilities that don't yet exist or haven't been validated at scale. Label these as "directional" — not commitments. Revisit each quarter.
What makes AI roadmaps fail
Treating model training like feature development
A feature takes as long as it takes to build. A model takes as long as it takes to reach a performance threshold — which is genuinely uncertain. Never put a model performance milestone on the same timeline as a feature milestone without explicit uncertainty buffers.
Roadmapping the model, not the outcome
"Ship intent detection model" is an output. "Reduce support ticket misrouting by 40%" is an outcome. Roadmap the outcome — the model is an implementation detail that may change. Stakeholders care about results, not architectures.
Not accounting for data dependency
Every AI feature is blocked by data until it isn't. Map your data acquisition, labelling, and pipeline readiness on the roadmap as first-class work items — not invisible dependencies.
Dealing with model dependency
The performance gate
Every AI feature on the roadmap should have an explicit performance gate: the minimum model quality threshold at which you will ship. Define this in advance — precision, recall, latency, or whichever metric matters — so the "go" decision isn't made under launch pressure.
Build in a fallback path
If the model doesn't hit the gate in time, what happens? A good AI roadmap has a fallback for every AI-dependent feature: a rule-based approach, a manual process, or a simplified model. The fallback is not the plan B — it's the plan that keeps the product alive while the model catches up.
Stage your releases around model confidence
Shadow mode → limited beta → controlled rollout → full release. Each stage tests a broader population at a lower confidence threshold. Roadmap these stages explicitly — they are milestones, not testing phases.
Typical AI feature lifecycle on a roadmap
1. Data readiness — labelled data acquired, pipeline validated → gate: training set complete 2. Model training & evaluation — initial model built, evaluated against baseline → gate: performance threshold met 3. Shadow mode — model runs in parallel, predictions not shown to users → gate: production parity with evaluation metrics 4. Limited beta — shown to a small user cohort, human review on all decisions → gate: user feedback positive, no safety incidents 5. Controlled rollout — expanding cohort, automated decisions with monitoring → gate: drift monitoring stable 6. Full release — full production, ongoing monitoring, retraining schedule established
Platform vs Feature — the core AI PM decision
Build a platform when: multiple teams or products will need the same AI capability. Example: a shared intent classification service used by support, sales, and onboarding. The investment in a reusable platform pays back when the second consumer arrives.
Build a feature when: the use case is specific, timelines are tight, and reuse is speculative. Don't over-engineer for scale you may never need. A tightly-scoped feature that ships is worth more than a platform that's 6 months away.
Questions to ask before choosing
Who else needs this capability?
If you can name two other teams who would use this capability in the next 12 months, platform investment is worth discussing. If it's speculative, ship the feature.
How fast is the underlying model evolving?
Building a platform on top of a rapidly-evolving model is risky — the abstraction may need to be rebuilt as the model capabilities shift. Prefer platform investment when the model layer is relatively stable.
What's the retraining and update cadence?
A platform shared across products must handle model updates that don't break consumers. That versioning and compatibility contract is an additional engineering investment — factor it into the platform decision.
Communicating uncertainty in AI roadmaps
Use confidence levels, not just dates
Label roadmap items: High confidence (model validated, shipping next sprint), Medium confidence (model in evaluation, performance gate defined), Low confidence (model not started, directional intent only). This is more honest and more useful than a date that everyone knows is a guess.
Separate outcome commitments from implementation commitments
Commit to the business outcome ("reduce misrouting by 30% by Q3") but be explicit that the implementation path may change based on model performance. Stakeholders can accept implementation pivots when the outcome commitment is clear.
Make the performance gate visible
Put the go/no-go criteria on the roadmap itself: "Ship when precision ≥ 90% on evaluation set." This makes the decision criteria transparent and removes subjectivity from the launch conversation.
The stakeholder conversation you need to have upfront
"I want to set expectations about how this roadmap works differently from a traditional software roadmap.
The features that depend on model performance have two kinds of uncertainty: when the model will be ready, and whether it will reach the threshold we need. I've built in performance gates and fallback paths for each of those items.
What I'm committing to is the outcome and the process — not a specific implementation date for AI-dependent features. I'll give you a confident date once the model clears evaluation. Is that a framing you can work with?"
High-Probability Question
"How do you build a roadmap when you don't know how well the model will perform?"
Show structured thinking under uncertainty:
"I separate what I can commit to from what I can only direct. I commit to the business outcome and the process — discovery, data readiness, evaluation, staged rollout. I don't commit to a ship date for an AI-dependent feature until the model has cleared its performance gate.
I use a Now/Next/Later format with explicit confidence levels, and I define fallback paths for every model-dependent item — so the roadmap stays credible even if the model takes longer than expected.
The thing I do above all else is agree on the go/no-go criteria with stakeholders before development starts. That removes the subjective pressure at launch time and replaces it with a shared, data-driven decision."
Follow-Up Question
"A stakeholder wants a firm ship date for an AI feature. How do you handle that?"
Show you can hold your ground while being collaborative:
"I understand the pressure — a firm date is easier to plan around. But giving a date I can't defend doesn't help either of us.
What I'll do is give you two things: a date by which we'll have a go/no-go decision — that I can commit to firmly — and a range for the ship date conditional on the model hitting its threshold. If the model clears evaluation by sprint 6, we ship in sprint 8. If it doesn't, here's our fallback and the revised timeline.
That way you have a planning anchor and I'm not setting us both up for a missed commitment."
Connect AI Coach
Enter your API key to enable the AI coach.
API Key
Your key is stored only in this browser. It is never sent anywhere except directly to the AI API. Clearing browser data will remove it.
Key stored in localStorage · Never logged or shared · Get a key from your AI provider