The metric that hides in the average

Why a 150+ seat contact center stopped trusting average hold time, what they tracked instead, and what it cost them to find out — measured in customers who waited forty minutes on hold without anyone noticing.

The default contact center dashboard reports averages because averages are easy to compute, easy to chart, and easy to put in a quarterly review deck. They are also, for the specific job a shift supervisor is trying to do, almost useless. This is a piece about how one of our clients figured that out, what they did about it, and the operational principle the experience left behind.

The client was CarOffer, the digital wholesale automotive marketplace headquartered in Addison, Texas — at the time, a fast-growing platform connecting dealer-to-dealer vehicle transactions across the United States, since acquired by CarGurus. Their contact center sat at the operational core of the business: 150-plus seats handling three distinct streams of inbound and outbound voice traffic, with supervisors making real-time staffing decisions on a floor where a single delayed call could mean a dealer choosing a competitor's platform for the next transaction.

We designed and operated the contact center reporting layer for them across multiple years and a contract renewal. The lessons below come from that work — specifically, from a moment a few months into operation when their lead supervisor walked over to the dashboard, pointed at a number that looked perfectly fine, and said: I don't trust this.

The split that shipped: queue, outgoing, incoming

Before we get to the metric that broke, it's worth describing the dashboard that worked — because it sets up why the supervisor was looking at hold time at all.

The default contact center reporting that ships with most platforms treats a "call" as a "call." Inbound, outbound, queued, abandoned — all bucketed together into volume metrics that don't distinguish between fundamentally different operational realities. For CarOffer, that bucketing was unworkable from day one. Their voice traffic came in three flavors that demanded different staffing, different supervision, and different success metrics:

Queue calls — inbound from dealers actively trying to buy or list vehicles. These are the revenue calls. Every minute on hold is a minute closer to the dealer abandoning the transaction.
Outgoing calls — sales outreach to dealer prospects. These are pipeline calls. Volume matters more than speed.
Incoming calls — general inbound, which mixed pre-sales questions, support requests, and administrative inquiries. These are "could-be-anything" calls and need a different handling profile entirely.

We built a custom dashboard view that split call volume across these three streams as the top-level KPI surface — three large numbers, real-time, refreshing through the day. That split alone changed how supervisors managed the floor: they could now see, at a glance, whether the morning was a queue-heavy day (deploy senior agents) or an incoming-heavy day (route to triage) or an outgoing-heavy day (push the dialer).

The split worked. The supervisor was happy with it. And then we added average hold time underneath it, which is where the trouble started.

The number that looked fine

Average hold time is the canonical contact center metric. Every platform reports it. Every vendor uses it in a sales pitch. Every executive understands it. For most of the deployment's first months, CarOffer's average hold time looked exactly like a healthy contact center should look: somewhere between three and five minutes, fluctuating mildly with daily volume, never alarming.

Then the supervisor started getting calls. Not from dashboards, from customers. Dealers calling their account managers afterward, saying things like: I waited thirty minutes on hold yesterday and almost gave up. Or worse: I gave up and called your competitor.

The supervisor pulled up the dashboard for the day in question. Average hold time: 3 minutes 47 seconds. Within the normal band. Nothing flagged. Nothing actionable. The dashboard had no idea anything had happened.

"The number on the screen says we're fine. The dealer on the phone says we lost him. One of these is wrong, and it isn't the dealer."

What was happening, of course, is the thing that happens any time you summarize a long-tailed distribution with a mean. Most calls were answered quickly. The bulk of the volume sat in the under-five-minute band, where it dragged the average down. But on the same days the average looked fine, individual callers were spending twenty, thirty, even forty minutes on hold — and those calls, statistically rare, were operationally catastrophic. They were the dealers who churned. They were the customers who wrote angry emails. They were the moments the contact center was failing at its actual job, and the dashboard was reporting "all clear."
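The arithmetic is easy to reproduce. The sketch below uses invented hold times, not CarOffer data: 97 callers answered after two and a half minutes, and three callers left waiting twenty, thirty, and forty minutes.

```python
from statistics import mean

# Hypothetical hold times in seconds (illustrative, not client data):
# 97 ordinary calls plus three long waits of 20, 30, and 40 minutes.
hold_seconds = [150] * 97 + [1200, 1800, 2400]

avg = mean(hold_seconds)      # the number a default dashboard shows
longest = max(hold_seconds)   # the number the dealer experienced

print(f"average hold: {avg / 60:.1f} min")      # average hold: 3.3 min
print(f"longest hold: {longest / 60:.1f} min")  # longest hold: 40.0 min
```

Three calls in a hundred moved the mean by less than a minute, which is well inside normal daily fluctuation.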

Surfacing max hold time from the CDR

The fix was conceptually simple and operationally non-trivial. Conceptually: in addition to average hold time, surface maximum hold time — the longest single hold any caller experienced within the reporting window. If the maximum is wildly out of line with the average, that's the signal that the average is hiding outliers.

Operationally, it required going into the call detail records, computing a per-period maximum across thousands of call legs, and exposing that as a first-class dashboard column alongside the existing averages. Most platforms don't ship this view. Ours didn't, by default — we built it.
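As a rough illustration of that per-period computation, here is a minimal sketch. The row layout (a call date and a hold time in seconds) and the function names are assumptions made for the example; real CDR schemas vary by platform, and this is not the production implementation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical CDR rows: (call date, hold time in seconds).
cdr_rows = [
    (date(2024, 1, 8), 45),
    (date(2024, 1, 8), 2900),
    (date(2024, 1, 8), 130),
    (date(2024, 1, 9), 20),
    (date(2024, 1, 9), 95),
]

def hold_summary_by_day(rows):
    """Return {day: (call_count, avg_hold, max_hold)}, holds in seconds."""
    holds = defaultdict(list)
    for day, hold in rows:
        holds[day].append(hold)
    return {
        day: (len(h), sum(h) / len(h), max(h))
        for day, h in sorted(holds.items())
    }

def fmt(seconds):
    """Format seconds as H:MM:SS, the style used in the dashboard columns."""
    seconds = int(round(seconds))
    return f"{seconds // 3600}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"

summary = hold_summary_by_day(cdr_rows)
for day, (count, avg, longest) in summary.items():
    print(day, count, fmt(avg), fmt(longest))
```

The maximum costs nothing extra once the rows are grouped by period; the work is in getting clean per-leg hold times out of the CDR in the first place.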

Here's a representative slice of the dashboard after the change. The columns that mattered are Avg Hold and Max Hold:

Figure 1: Dashboard view, post change

Total Calls	Queue Count	Avg Hold	Max Hold	Queue Avg Dur
682	660	0:03:00	0:48:20	0:04:41
524	397	0:02:58	0:50:04	0:05:21
491	402	0:03:12	0:41:54	0:06:29
415	183	0:00:05	0:02:16	0:01:29
282	146	0:00:34	0:09:22	0:03:05

Same dashboard. Same days. On the three busiest days, average hold time tells one story (everything's fine, about three minutes). Max hold time, one column over, tells the actual story: customers waited forty-eight minutes, fifty minutes, forty-one minutes. Any of those numbers is a different conversation than three minutes.

Look at the first row. Average hold time of three minutes; maximum hold time of forty-eight minutes and twenty seconds. Both numbers are true. Both are computed from the same underlying CDR. The average tells you the contact center is operating well. The maximum tells you that on that specific day, at least one caller was on hold for nearly fifty minutes — and that's the call the supervisor needed to know about.
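The comparison can be made mechanical. The sketch below takes the Avg Hold and Max Hold pairs from Figure 1 and flags the rows where the maximum crosses a cutoff. The twenty-minute cutoff and the names `to_seconds` and `SPIKE_THRESHOLD` are assumptions for illustration.

```python
def to_seconds(hms):
    """Convert an H:MM:SS string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

# (Avg Hold, Max Hold) pairs copied from Figure 1.
rows = [
    ("0:03:00", "0:48:20"),
    ("0:02:58", "0:50:04"),
    ("0:03:12", "0:41:54"),
    ("0:00:05", "0:02:16"),
    ("0:00:34", "0:09:22"),
]

SPIKE_THRESHOLD = 20 * 60  # assumed cutoff: any hold longer than 20 minutes

flagged = []
for avg_hms, max_hms in rows:
    avg_s, max_s = to_seconds(avg_hms), to_seconds(max_hms)
    ratio = max_s / avg_s  # how many times longer than the mean
    if max_s > SPIKE_THRESHOLD:
        flagged.append((avg_hms, max_hms, round(ratio, 1)))

for row in flagged:
    print(row)
```

The first three rows are flagged, with maxima thirteen to seventeen times the average. The fourth row has an even larger ratio (a 2:16 maximum against a five-second average) but is not a problem day, which is why this sketch tests the absolute maximum and reports the ratio only as context.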

What the supervisor actually did with it

A metric that gets added to a dashboard and then ignored is no better than a metric that was never added. The interesting part of this story isn't that we put max hold time on the screen — it's what the supervisor did with it once it was there.

The decision the supervisor used max hold time to justify was a staffing change. The argument went roughly like this: average hold time looks fine, but on the days when max hold time spikes above twenty minutes, we are losing customers in ways the average can't see. Map the spikes against the schedule. Find the time-of-day windows where coverage is thinnest. Adjust staffing to put more agents on the floor during those windows.
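Mapping spikes against the schedule amounts to bucketing long holds by time of day. A minimal sketch, with invented call records and the twenty-minute threshold from the argument above; the record layout and the name `spike_hours` are assumptions:

```python
from collections import Counter
from datetime import datetime

SPIKE_SECONDS = 20 * 60  # holds longer than twenty minutes count as spikes

# Hypothetical (queue entry time, hold seconds) pairs across two days.
calls = [
    (datetime(2024, 1, 8, 9, 15), 60),
    (datetime(2024, 1, 8, 11, 42), 1500),
    (datetime(2024, 1, 8, 11, 50), 2900),
    (datetime(2024, 1, 8, 14, 5), 90),
    (datetime(2024, 1, 9, 11, 20), 1320),
    (datetime(2024, 1, 9, 16, 30), 45),
]

def spike_hours(calls, threshold=SPIKE_SECONDS):
    """Count holds above the threshold per hour of day, busiest hour first."""
    counts = Counter(start.hour for start, hold in calls if hold > threshold)
    return counts.most_common()

print(spike_hours(calls))  # [(11, 3)]
```

In this invented data every spike lands in the 11:00 hour, which is the kind of window a supervisor would then compare against the shift roster.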

That's not a sophisticated analytical move; it's exactly the kind of decision a competent contact center supervisor should be making. But it could not have been made from the previous dashboard. The previous dashboard would have said "you're fine, your averages are healthy, don't restaff." The new dashboard surfaced a class of failure the old one was actively concealing, and gave the supervisor specific evidence to take to leadership when justifying additional shift coverage.

We're not in a position to take credit for what the floor team did with the data — they were good at their jobs, and they would have figured something out either way. What we can take credit for is the dashboard not lying to them anymore.

The principle the work left behind

CarOffer was a single deployment, and a single anecdote from a single deployment is not a methodology. But the underlying principle is one we've now carried into every contact center engagement we've designed since, and it generalizes well past hold times:

"Averages are for people reviewing performance. Outliers are for people running operations. A dashboard that only reports averages is a dashboard for the boardroom, not the floor."

What this looks like in practice is a small set of habits we now apply by default when designing supervisor-facing reporting:

Always show maxima alongside means.

For every average a dashboard reports, ask whether the maximum (or the 95th percentile) belongs next to it. Hold time, handle time, queue duration, after-call work — all of these have failure modes that live in the tail. If the supervisor only sees the mean, the supervisor cannot find the tail.
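A sketch of what that looks like for one metric, using Python's standard `statistics.quantiles` for the 95th percentile. The hold times are invented and `tail_summary` is a hypothetical helper name.

```python
from statistics import mean, quantiles

# Hypothetical hold times in seconds: 90 quick calls and 10 long waits.
hold_seconds = [120] * 90 + [600, 900, 1200, 1500, 1800,
                             2100, 2400, 2700, 2900, 3000]

def tail_summary(values):
    """Return the mean together with the 95th percentile and the maximum."""
    # quantiles(..., n=20) returns 19 cut points; the last one is p95.
    p95 = quantiles(values, n=20)[-1]
    return {"mean": mean(values), "p95": p95, "max": max(values)}

summary = tail_summary(hold_seconds)
print(summary)
```

One caution on the percentile: if fewer than five percent of calls are affected, the 95th percentile can sit in the healthy band just as the mean does. The maximum cannot, which is a reason to show it even when a percentile is already on the screen.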

Segment by call class before averaging.

An average hold time across queue, outgoing, and incoming calls is a number that describes nothing in particular. Each stream has different acceptable thresholds and different operational stakes. Compute and display the metrics inside each stream separately; resist the temptation to roll up.
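To make the difference concrete, here is a small sketch that computes the same two numbers per stream and then blended. The records are invented, and treating outgoing calls as having zero hold time is an assumption made only to keep the example short.

```python
from collections import defaultdict

# Hypothetical (stream, hold seconds) records.
calls = [
    ("queue", 180), ("queue", 2400), ("queue", 120),
    ("outgoing", 0), ("outgoing", 0), ("outgoing", 0),
    ("incoming", 60), ("incoming", 300),
]

def summarize(holds):
    """Average and maximum hold for one group of calls, in seconds."""
    return {"calls": len(holds), "avg": sum(holds) / len(holds), "max": max(holds)}

by_stream = defaultdict(list)
for stream, hold in calls:
    by_stream[stream].append(hold)

per_stream = {stream: summarize(holds) for stream, holds in by_stream.items()}
blended = summarize([hold for _, hold in calls])

for stream, stats in per_stream.items():
    print(stream, stats)
print("blended", stats if False else blended)
```

In this example the queue stream averages fifteen minutes on hold while the blended figure is under six and a half, because the other two streams dilute it.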

Build alerts on the tail, not the mean.

If you're going to wire up automated supervisor alerts, the most operationally valuable trigger is rarely "average hold time exceeded X." It's "maximum hold time exceeded Y" or "any single caller on hold longer than Z minutes." The alert that fires in the moment a customer is being failed is worth more than the alert that fires after the failure has been averaged across a thousand other healthy calls.
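A tail alert of the "any single caller on hold longer than Z minutes" kind can be sketched as a check over callers currently waiting. The ten-minute default, the call IDs, and the function name `callers_over_limit` are all assumptions for illustration; a real deployment would read live queue state from the platform.

```python
from datetime import datetime, timedelta

def callers_over_limit(waiting, now, limit=timedelta(minutes=10)):
    """Return (call_id, wait) for callers still on hold past the limit,
    longest wait first. `waiting` maps call_id to queue entry time."""
    over = [
        (call_id, now - entered)
        for call_id, entered in waiting.items()
        if now - entered > limit
    ]
    return sorted(over, key=lambda item: item[1], reverse=True)

# Hypothetical snapshot of the queue at 11:42.
now = datetime(2024, 1, 9, 11, 42)
waiting = {
    "call-101": datetime(2024, 1, 9, 11, 40),
    "call-102": datetime(2024, 1, 9, 11, 18),
    "call-103": datetime(2024, 1, 9, 11, 30),
}

alerts = callers_over_limit(waiting, now)
for call_id, wait in alerts:
    print(f"{call_id} has been on hold for {wait}")
```

The check runs against callers who are still waiting, so the alert can fire while there is still time to answer the call.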

Make the dashboard auditable to the call leg.

When max hold time spikes, the supervisor should be able to click through to the specific call legs that drove the spike. Not "show me a summary of long calls today" — "show me the four calls where the customer waited more than thirty minutes, with the timestamps, agent IDs, and disposition." Without that drill-through, the metric is interesting but not actionable.
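The drill-through itself is a filter over call legs rather than a summary. A minimal sketch, assuming a hypothetical `CallLeg` record with the fields named above; the field names and sample values are invented.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class CallLeg:
    call_id: str
    started_at: datetime
    agent_id: str
    disposition: str
    hold_seconds: int

def long_holds(legs, min_seconds=30 * 60):
    """Return the legs whose hold exceeded min_seconds, longest first."""
    over = [leg for leg in legs if leg.hold_seconds > min_seconds]
    return sorted(over, key=lambda leg: leg.hold_seconds, reverse=True)

# Hypothetical legs for one day.
legs = [
    CallLeg("c-1", datetime(2024, 1, 8, 9, 15), "agent-07", "answered", 60),
    CallLeg("c-2", datetime(2024, 1, 8, 11, 42), "agent-12", "answered", 2900),
    CallLeg("c-3", datetime(2024, 1, 8, 11, 50), "agent-03", "abandoned", 2514),
    CallLeg("c-4", datetime(2024, 1, 8, 14, 5), "agent-07", "answered", 90),
]

for leg in long_holds(legs):
    print(leg.call_id, leg.started_at, leg.agent_id, leg.disposition,
          f"{leg.hold_seconds // 60} min")
```

Returning whole records rather than counts is the point: the supervisor gets timestamps, agents, and dispositions to act on.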

Why this matters beyond hold time

The CarOffer story is about hold time because that's where the conversation started. But the deeper claim — that averages systematically conceal the operational events that actually cost a business its customers — applies to almost every metric a contact center reports. Average handle time hides agents who are going long on a few calls and burning out. Average abandonment hides time-of-day windows where the abandonment rate triples. Average first-call-resolution hides specific issue categories that are routinely escalating.
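The abandonment case follows the same pattern as hold time. The sketch below uses invented hourly counts to show how a single bad window disappears into the daily rate; the numbers and variable names are illustrative assumptions.

```python
# Hypothetical (hour of day, offered calls, abandoned calls) rows.
hourly = [
    (9, 100, 4),
    (10, 120, 5),
    (11, 80, 12),
    (12, 100, 4),
]

# The daily rate: total abandoned over total offered.
overall = sum(a for _, _, a in hourly) / sum(o for _, o, _ in hourly)

# The same metric computed per hour.
by_hour = {hour: abandoned / offered for hour, offered, abandoned in hourly}
worst_hour = max(by_hour, key=by_hour.get)

print(f"overall abandonment: {overall:.1%}")
print(f"worst hour: {worst_hour}:00 at {by_hour[worst_hour]:.1%}")
```

Here the daily rate is 6.25 percent while the 11:00 hour runs at 15 percent, more than double, and only the per-hour view shows it.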

None of this is news to a senior operations director who's run a contact center for fifteen years. But the dashboards most platforms ship are designed for executives reviewing summary statistics in a quarterly meeting, not for the supervisor on the floor at 11:42am on a Tuesday trying to figure out why the queue is climbing. The two audiences need fundamentally different views of the same data, and "default" reporting almost always optimizes for the first one — because that's who signs the renewal.

Designing reporting for the floor instead of the boardroom is one of the things that separates a contact center deployment that gets used from one that gets ignored. CarOffer's lead supervisor was good enough at her job to push back on the dashboard. Most supervisors don't, because they assume the dashboard is reporting the right things. If you're a CTO or operations director evaluating a contact center platform — yours, or a vendor's — the question worth asking is not "what does the default dashboard look like?" but "what does the dashboard look like after a working supervisor has spent six months telling you what's missing?" If those two are the same dashboard, the platform isn't being used the way it should be.

That's the work. The platform is the easy part. Getting the reporting layer to actually serve the people running operations — that's where the engagement earns its keep, and that's why CarOffer renewed for another five years.
