Designed an AI Copilot that turned raw call recordings into actionable coaching insights

Overview & Context

Sales and support teams run thousands of conversations every week, but very little of that knowledge actually turns into learning. While working on CallHippo, we discovered a major gap between call recording and real coaching. Teams had recordings, but almost no scalable way to understand what was happening inside those conversations.

Managers wanted to help their teams improve, but listening to hundreds of calls every week was impossible. This project explored how AI could transform raw call recordings into actionable coaching insights for managers and agents.

Role

Product Designer and UX Lead

Team

Product Manager, 2 Engineers, Data Science team

Tools

Figma, Mixpanel, Notion

Timeline

3 weeks design, 6 week rollout

CallHippo AI Copilot

Problem

CallHippo powers millions of calls every month across sales and support teams — but managers had no scalable way to evaluate conversation quality.

Most managers listened to only a few random recordings each week. In many teams, less than five percent of calls were ever reviewed. This created a fundamental coaching gap.

As a result:

Managers relied on intuition instead of data when coaching agents, with no measurable baseline for what made a call successful.

Feedback often arrived days after a call happened, making it difficult for agents to connect coaching to specific conversations.

Agents received vague advice such as "listen more" or "sound confident" without any measurable context to act on.

The core problem was not lack of effort. The problem was lack of visibility.

Design Goals

Baseline

No scalable way to review call quality.

Goal

AI-assisted coaching — understand call quality without listening.

Outcome

22+ customers adopted AI Copilot. Teams actively use AI Copilot to review and improve conversations.

Research

AI Copilot Research

To understand how teams currently reviewed calls, I interviewed and observed 14 users across sales and support organizations — sales managers, support team leads, and frontline agents.

🎯 Managers lacked reliable data

Most coaching decisions were based on quota numbers or customer complaints — not actual call quality.

📣 Agents wanted actionable feedback

Advice like "be more confident" felt subjective and impossible to act on. Agents wanted clear, specific signals they could improve.

📊 Talk-to-listen ratio was never measured

Experienced managers consistently mentioned that good reps listen more than they talk — yet this behavior was never tracked anywhere in the product.

Defining the Opportunity

After synthesizing research, I ran a prioritization session with the product manager and engineering lead. We mapped potential features based on user value and implementation complexity. Our guiding question was simple: How might we help managers understand call quality without listening to every recording?

Four capabilities emerged as the foundation of the first release

SA

Sentiment Analysis

Visualize emotional tone across a call so managers can instantly identify conversations that ended negatively — without listening to a single recording.

TE

Topic Extraction

Automatically surface what customers are actually talking about across hundreds of calls — without managers needing to listen to each recording.

Key Insights

The same patterns emerged consistently across team sizes and industries. Four insights shaped the core of the design.

Insight 01

Managers did not lack coaching skills — they lacked reliable data

Most decisions about who to coach were based on quota numbers or complaints, not actual call quality. Coaching happened reactively, not systematically.

Insight 02

Agents wanted feedback but struggled with vague guidance

Advice such as "be more confident" felt subjective and difficult to act on. Agents needed specific, measurable signals they could improve against.

Insight 03

Sentiment shifts inside calls were completely invisible

A conversation could begin positively and end with frustration, but there was no way to detect when or why the change happened. Teams had no signal for call quality beyond outcome.

Key User Needs

Sales Manager

Pain: Could only review a handful of calls per week — no way to evaluate team performance at scale.

Consequence: Coaching was reactive, often based on gut feel or customer complaints rather than conversation data.

Need: A system that automatically surfaces call quality signals across every conversation so coaching is proactive and consistent.

Frontline Agent

Pain: Received vague feedback like "sound more confident" with no measurable context to understand or improve.

Consequence: Agents had no way to self-reflect or improve between calls, leading to stagnant performance over time.

Need: Clear, specific post-call insights that help agents understand exactly what to improve in the next conversation.

Product Flow

Before designing the interface, I mapped the full coaching loop from the moment a call ends. Two requirements became clear: managers needed a fast triage layer to identify problematic calls, and agents needed feedback immediately after each call while the context was still fresh.

▷▷

Manager flow: Open dashboard → identify flagged call → review AI insights → provide coaching note

▷▷

Agent flow: Receive post-call feedback → review call insights → adjust behavior in next call

▷▷

Use sentiment analysis to let managers instantly see which calls ended negatively — without listening to a single recording.

▷▷

Surface talk-to-listen ratio as a measurable coaching signal so feedback is grounded in behavioral data, not opinion.

▷▷

Generate automatic call summaries so agents can review what happened, what was promised, and what the next steps are — without manual note taking.

Solution

The AI Copilot translated raw conversation data into clear signals that both managers and agents could understand quickly. Four features formed the core of the experience.

Copilot Overview

AI Copilot Overview
Sentiment Analysis

Managers needed a fast way to understand how conversations were ending without listening to recordings. The sentiment view visualized emotional tone across three states: negative, neutral, and positive — each shown as a simple progress bar supported by expressive icons. Instead of randomly sampling calls, managers could now focus immediately on conversations where sentiment dropped.

Sentiment Analysis
Talk-to-Listen Ratio

On any call, three things are happening: the agent is talking, the customer is talking, or nobody is. Talk-to-listen ratio tracks how that time splits across all three.

It's a fast read on call quality. Agent talks too much — they're pitching, not listening. Too much silence — they're stuck, not confident.

Talk-to-Listen Ratio
AI Call Summary

After each call, agents needed to document what happened, what was promised, and what the next steps were — but this was often skipped because the next call arrived immediately. AI Copilot automatically generated a structured summary including key discussion points, customer intent, and suggested follow-ups.

▷▷

Key discussion points — What was covered during the call, automatically extracted from the transcript.

▷▷

Customer intent — What the customer was trying to achieve or resolve during the conversation.

▷▷

Suggested follow-ups — Agents could review the summary instantly or push it directly into their CRM, removing the burden of manual note taking.

Results

100% call review coverage — up from ~5%

AI Copilot automatically analyzed every call, replacing random spot-checks with complete coverage across all 200+ teams in the rollout.

Scoring consistency improved from 3 to 8 out of 10

Standardized AI scoring reduced manager disagreement, creating a shared language for evaluating call quality across the organization.

Managers shifted from reactive to proactive coaching

Instead of waiting for performance issues, managers could identify patterns across their team in real time and act before problems escalated.

An unexpected benefit: product and marketing insights

Extracted conversation topics revealed how customers actually described the product — becoming valuable input for messaging and roadmap decisions beyond the sales team.

What I Learned

AI features require strong transparency

When users understand how a score is calculated, they trust the system far more. Exposing the reasoning behind AI outputs was critical to adoption.

Framing feedback as growth, not grades

Numbers presented as grades created defensiveness. The same numbers framed as growth indicators encouraged improvement and self-reflection.

Design for skeptical users first

Once the system proved reliable to experienced managers, adoption spread quickly across teams. Winning over the skeptics built trust that accelerated the broader rollout.