AI Call Summaries Are Hiding Your Real Intake Problems

My NPS in month two of Reeves Electric was a 4. Not 40. Not 4 out of 5. A 4 on a scale that runs from -100 to +100, which meant that for almost every customer recommending me to people they liked, another one was actively steering people away.

I didn't know why. I assumed it was growing pains — new shop, figuring out scheduling, techs still learning the service model. So I did what most owners do. I looked at my reviews, read the complaints, tried to fix the surface problems. Price transparency came up. Response time came up. I tweaked a few things.

Nothing moved.

What actually fixed it was simpler and more uncomfortable than any process change. I started answering every call personally for ninety days, and I started listening back to recordings of calls I hadn't answered. Not summaries. Not transcripts with the filler words cleaned up. The actual audio, start to finish, at 1.0x speed.

By month nine, NPS was 81. Everything between those two numbers is explained by what I found in those recordings — and AI call summaries would have hidden all of it.
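For anyone who hasn't run the math: NPS is the percentage of promoters (a 9 or 10 on the 0-to-10 "how likely are you to recommend us" question) minus the percentage of detractors (0 through 6). Passives (7 or 8) count toward the total but toward neither group. A minimal Python sketch follows. The 25-response batch is made up to show how a 4 happens; it is not my actual survey data.

```python
def nps(scores: list[int]) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6).

    The result runs from -100 to +100. Passives (7-8) count toward the
    total but toward neither group.
    """
    if not scores:
        raise ValueError("need at least one survey response")
    if any(s < 0 or s > 10 for s in scores):
        raise ValueError("scores must be on the 0-10 scale")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical batch of 25 responses: 9 promoters, 8 passives, 8 detractors.
batch = [10] * 5 + [9] * 4 + [8] * 4 + [7] * 4 + [6] * 4 + [3] * 4
print(nps(batch))  # 4.0
```

One more promoter than detractors out of 25 responses is all it takes to land on a 4.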

The Feedback Loop You're Accidentally Killing

The pitch for AI call summaries sells itself. Your CSR takes 40 calls a day, you can't listen to all of them, so the AI pulls the key details and returns a clean bullet list: customer name, issue description, address, job type, special notes.

Here's the problem. The summary starts from the assumption that the call worked. It captures output from a successful intake — the information the customer provided — and treats the interaction itself as noise to be filtered. That's fine if your intake is already dialed in. It's catastrophic if it isn't, because the problems live in the interaction, not the output.

When I went back and listened to my month-two recordings, I wasn't finding missing customer information. The job addresses were correct. Problem descriptions were accurate. What I found — what no summary would have surfaced — was that my CSR was ending every call without telling the customer what happened next.

Not dramatically. The calls weren't rude. They wrapped up politely. Customer described the issue, CSR said we could have someone out Thursday, customer said okay, CSR said great, we'll see you then.

What that call never included: what the customer should do before the tech arrived. Whether they needed to be home. Whether we'd call before showing up. What a diagnostic visit typically runs. The customer hung up with a booking confirmation and zero information about what the next 72 hours looked like.

That ambiguity was generating my 4 NPS. Not price. Not wait time. Customers were hanging up uncertain, and uncertain turned into a 4.

An AI summary of that call would have read: Customer reports tripping breaker in kitchen, requested service call, Thursday 9-11am window confirmed. Accurate. Completely useless for diagnosing what was broken in our intake.

What Summarization Actually Strips Out

Summarization compresses the details that matter for intake analysis — and what matters is almost never the information itself. It's the texture around the information. A customer saying "well, I guess Thursday works" versus "yes, Thursday is great" produces the same summary bullet. Those are completely different customer states.

My weekly ritual with my dispatcher — I've written about this before — is four calls. Two that booked, two that didn't. We listen to them raw, together, and tag what happened. It runs 45 minutes to an hour. That practice is what moved our booking rate in 2023, and it is completely incompatible with summary-only review.

What I'm actually listening for:

Whether the booking source changed how the call went. Calls from Google Local Services Ads (LSA) close at a lower rate than organic referrals. I know this because I've listened to hundreds of both, not because someone reported it to me. LSA callers are shopping. The hesitation pattern is different. You can hear it. A summary doesn't capture it.

Call duration as a diagnostic. When an unbooked call runs under two minutes, we usually failed to engage the customer in the problem before quoting. When a booked call runs over six minutes on a simple job, sometimes the customer had anxiety we didn't address. Duration is a signal. It's in the raw recording metadata.

The competitor price mention. Comes up maybe once every fifteen calls. It almost never makes it into a summary with enough context to be useful. A customer saying "I saw someone doing this for $399" in a curious tone versus a decided tone requires completely different responses. The audio tells you which one it was.

None of that surfaces cleanly in an AI-generated summary. The summary gives you what was said. The call gives you what happened.
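Of those three signals, call duration is the only one you can sort mechanically, and it's useful for deciding which recordings to pull first. Here's a rough sketch using the two thresholds above (unbooked under two minutes, booked simple job over six). The `Call` record and the `simple_job` tag are hypothetical; you'd set that tag yourself. It picks what to listen to. It doesn't replace the listening.

```python
from dataclasses import dataclass


@dataclass
class Call:
    call_id: str
    duration_sec: int
    booked: bool
    simple_job: bool  # hypothetical tag you set yourself, not a call-log field


def review_flags(call: Call) -> list[str]:
    """Return the duration-based reasons (if any) to pull this recording first."""
    flags = []
    # Unbooked and under two minutes: we likely quoted before engaging on the problem.
    if not call.booked and call.duration_sec < 120:
        flags.append("short unbooked: did we engage on the problem before quoting?")
    # Booked simple job over six minutes: the customer may have had anxiety we didn't address.
    if call.booked and call.simple_job and call.duration_sec > 360:
        flags.append("long booked simple job: unaddressed customer anxiety?")
    return flags


calls = [
    Call("a1", 95, booked=False, simple_job=True),
    Call("a2", 410, booked=True, simple_job=True),
    Call("a3", 240, booked=True, simple_job=False),
]
for c in calls:
    print(c.call_id, review_flags(c))
```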

Where AI Summaries Actually Belong

I'm not arguing they're useless.

They're genuinely good for job handoff. Once a call is booked, the relevant question shifts from "how did that interaction go" to "what does the tech need to know before they knock on the door." Customer has two dogs, gate code is 4521, they've replaced the outlet twice and it keeps tripping. That's exactly what a well-formatted summary or auto-generated job note handles well. It saves the CSR five minutes of typing and pushes clean notes into the dispatch board.

That's a real workflow improvement. My dispatcher uses it.

The confusion is using summaries as intake analysis when they're actually a field logistics tool. They serve the second half of the workflow, not the first. The CSR's job is to convert the call and capture the scope. The tech's job is to execute it. AI summaries are useful for tech prep. They're actively harmful as CSR feedback, because they skip exactly the part of the call that needs examination.

The first 90 seconds of the phone call is where most small-shop revenue leakage happens. That's the part the AI summary skips entirely — the booking decision hasn't been made yet, so there's nothing to capture.

If your shop is using call summaries to understand intake performance, you're monitoring the wrong end of the call. You're reviewing the paperwork from a meeting you weren't actually in.

Month Two, a 4 NPS, and What the Calls Said

I launched Reeves Electric in February 2022 with what I thought was a solid intake setup: scripts, pricing framework, software stack. By month two, the NPS came back at 4.

I ran what I now call the ninety-day audit. I answered every call I could personally. For the ones I missed, I listened to the recordings within 24 hours. Not transcripts. The audio.

What I expected to find: price objections, availability complaints, tech behavior issues.

What I found: the ambiguity problem, over and over. Customer would call, CSR would schedule, call would end without the customer having a clear picture of what came next. No price range. No confirmation of whether they needed to be home. No "we'll call 30 minutes before we arrive." Just a day and a window.

That's a specific, fixable problem, and none of my other feedback channels surfaced it. I had feedback forms. I had post-job survey data. I had Google reviews. None of that said "the CSR didn't tell me what came next." Customers don't know to complain about what they were never told. They just know they felt uncertain, and uncertain becomes a 4.

The information that fixed my business was in the recordings the whole time. I rebuilt the intake script from what I heard in those calls — specifically the moments where the conversation should have continued and stopped instead. Month nine, NPS was 81.

A summary would have told me every one of those calls went fine. Customer issue captured. Appointment confirmed. Nothing flagged.

What You're Actually Measuring When You Review Raw Calls

CallRail is where I'd start. Every marketing channel gets its own tracking number — Google LSA has a number, the truck wraps have a number, the direct mail piece has a number, organic referrals have a number. Every call gets recorded, transcribed, and attributed to the source automatically.

That gives you the inputs to start asking real questions. Booking rate by source, for instance. Average call duration by source. Revenue traced back to channel once you close the loop through your CRM. I've seen the gap in my own shop between what LSA calls book at versus what a referral call books at — the LSA caller is often still deciding, the referral caller usually isn't. Knowing that changes how I coach intake on those two call types.
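Here's roughly what booking rate and average duration by source look like once calls are tagged. This is a Python sketch over a handful of made-up call rows. The `(source, duration_sec, booked)` tuple shape is hypothetical, so map it to whatever your CallRail or CRM export actually contains.

```python
from collections import defaultdict

# Hypothetical rows from a call log export: (source, duration_sec, booked).
calls = [
    ("Google LSA", 180, False),
    ("Google LSA", 300, True),
    ("Google LSA", 95, False),
    ("Referral", 240, True),
    ("Referral", 420, True),
    ("Truck wrap", 200, True),
]


def by_source(rows):
    """Booking rate and average call duration (minutes) per marketing source."""
    totals = defaultdict(lambda: {"calls": 0, "booked": 0, "seconds": 0})
    for source, duration_sec, booked in rows:
        t = totals[source]
        t["calls"] += 1
        t["booked"] += int(booked)
        t["seconds"] += duration_sec
    return {
        source: {
            "booking_rate": t["booked"] / t["calls"],
            "avg_minutes": t["seconds"] / t["calls"] / 60,
        }
        for source, t in totals.items()
    }


for source, stats in by_source(calls).items():
    print(f"{source:12} booked {stats['booking_rate']:.0%}  avg {stats['avg_minutes']:.1f} min")
```

The numbers only tell you where to listen. Why an LSA call stalls is still in the recording.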

CallRail runs around $50 to $150 a month depending on the plan and number of tracking numbers. Dollar for dollar, it's the highest-return software spend I have.

One thing worth knowing: CallRail sells a Premium tier with AI call scoring — automated sentiment flags, quality scores, the works. I tried it for several months in 2023. My take is it doesn't work well enough for a shop at our scale. For a national franchise with dozens of CSRs and no capacity for manual review, maybe it makes sense. For a five-truck residential shop, you can review a meaningful sample yourself, and your judgment on a specific call beats the algorithm's. Stay on the base tier, get the recordings, do the listening. The recording is the product.

What to Do Monday Morning

Pull four calls from last week before you do anything else. Two that booked, two that didn't. Listen to them — audio, not summary, not transcript. Just the recording at 1.0x speed.
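If you'd rather not pick the four by hand, a random pick keeps you from only pulling the calls you already remember. Picking at random is my assumption here; the routine itself doesn't require it. Here's a short sketch with a hypothetical dict shape for each call:

```python
import random


def weekly_sample(calls, per_group=2, seed=None):
    """Pick `per_group` booked and `per_group` unbooked calls at random.

    `calls` is a list of dicts with at least "id" and "booked" keys
    (a hypothetical shape; adapt it to whatever your call log exports).
    """
    rng = random.Random(seed)
    booked = [c for c in calls if c["booked"]]
    unbooked = [c for c in calls if not c["booked"]]
    if len(booked) < per_group or len(unbooked) < per_group:
        raise ValueError("not enough booked/unbooked calls this week to sample")
    # Booked picks come first, then unbooked picks.
    return rng.sample(booked, per_group) + rng.sample(unbooked, per_group)


last_week = [{"id": f"c{i}", "booked": i % 3 != 0} for i in range(30)]
for call in weekly_sample(last_week, seed=7):
    print(call["id"], "booked" if call["booked"] else "unbooked")
```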

Write down one thing the summary missed. One thing that was in the audio that a bullet point wouldn't have captured — a hesitation, an off-script comment, the moment the customer's tone shifted. That gap between what the summary said and what you just heard is your baseline. It's the measurement of what the AI layer has been filtering out.

No recordings yet? That's Monday's task. Get CallRail set up with at minimum three tracking numbers: your primary Google presence, your truck wraps or physical presence, and a direct line for referrals. You can do this in an afternoon. Jobber and Service Fusion both have CallRail integrations.

Once you have recordings, build the weekly review into the calendar as a hard block. Forty-five minutes, you and your dispatcher or CSR, same time every week. Four calls, consistent rubric: what qualifying questions got asked, what got skipped, whether the next step was communicated clearly at the close. Don't bring the AI summary to this meeting. If you want to check its accuracy, read it after you've listened. Don't let it anchor your analysis first.
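One way to keep the rubric consistent from week to week is to record every review the same way. Here's a sketch using a hypothetical `CallReview` record. The close items are the ones missing from my month-two calls (what to do before the tech arrives, whether someone needs to be home, a call before arrival, what a diagnostic typically runs), plus the day and window those calls did cover. Swap in your own script's items.

```python
from dataclasses import dataclass, field

# Hypothetical close checklist; replace with the items your intake script requires.
CLOSE_ITEMS = (
    "day and arrival window",
    "what to do before the tech arrives",
    "whether someone needs to be home",
    "call before arrival",
    "typical diagnostic cost",
)


@dataclass
class CallReview:
    call_id: str
    questions_asked: list[str] = field(default_factory=list)
    questions_skipped: list[str] = field(default_factory=list)
    close_items_covered: set[str] = field(default_factory=set)

    def missing_close_items(self) -> list[str]:
        """Close items the CSR never got to, in checklist order."""
        return [item for item in CLOSE_ITEMS if item not in self.close_items_covered]

    def next_step_clear(self) -> bool:
        """True only if every close item was covered before hanging up."""
        return not self.missing_close_items()


review = CallReview(
    call_id="c12",
    questions_asked=["what's tripping", "how old is the panel"],
    questions_skipped=["prior repairs"],
    close_items_covered={"day and arrival window"},
)
print(review.next_step_clear())  # False
print(review.missing_close_items())
```

A call like this one books fine and still leaves the customer guessing about everything except the window.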

The intake script should update based on what you hear. If it hasn't changed in six months, you either stopped listening or you've already fixed everything — and I'd bet on the first one.

FAQ

Is there any situation where AI call summaries are worth using for intake review?

At a residential service shop with five trucks or fewer, no. Not yet. The automated scoring isn't accurate enough to replace listening. At a shop running hundreds of calls a day with a dedicated QA function, automated flagging helps prioritize which calls get human attention, because you simply can't review everything manually at that scale. That's not a five-truck-shop problem.

How do I get my CSR on board with call review when they see it as surveillance?

Tell them what you're looking at before you start. You're auditing the intake process — what the script asks for, what information we're giving customers before we hang up. Listen together in the same room. When you find something that went well, name it out loud. When you find a gap, make it about what the script should say next time. The CSR who resists this has usually had a manager use recordings to criticize them personally. The CSR who leans into it has usually had a manager use recordings to fix the process they're working inside. Which one you get depends almost entirely on how you run the first session.

What's the minimum CallRail setup a five-truck shop needs?

Three tracking numbers: one for your primary Google presence, one for physical or wrap traffic, one for referrals and direct calls. Make sure call recording is turned on — it's not on by default on every plan. Skip the Premium tier and the Conversation Intelligence add-on. Base plan, recordings enabled, three numbers. Expand once you're actually reviewing calls consistently, not before.

If I've been using AI summaries for six months, how do I know what I've missed?

Pull ten calls from three months ago, don't look at the summaries first, listen to the recordings, and write down anything the summary missed. If you find meaningful signal — tone, competitor mentions, ambiguity at close — in more than half of them, you've been making intake decisions with incomplete information. The recordings are still there. Run a two-week backward audit, see what patterns show up, let that inform what you change in the current script.
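The more-than-half test is simple arithmetic once you've tagged each call by ear. Here's a sketch with hypothetical hand-entered tags, one per call:

```python
def audit_verdict(missed_flags: list[bool]) -> str:
    """missed_flags[i] is True if listening to call i surfaced signal
    (tone, a competitor mention, ambiguity at the close) that its summary missed."""
    if not missed_flags:
        raise ValueError("listen to some calls first")
    share = sum(missed_flags) / len(missed_flags)
    if share > 0.5:
        return f"{share:.0%} of calls had missed signal: intake decisions have been made on incomplete information"
    return f"{share:.0%} of calls had missed signal: at or below the more-than-half line"


# Ten calls from three months ago, tagged by hand after listening (hypothetical tags).
tags = [True, False, True, True, False, True, True, False, True, False]
print(audit_verdict(tags))
```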

Should techs generate their own job notes, or is that the CSR's job?

Different jobs, different notes. The CSR's intake note covers what the customer described before the tech arrived — scope, access details, prior repairs, anything relevant to pre-job context. The tech's field note covers what they found, what they did, what they quoted or completed. If the tech is generating the intake note from memory after the job, you've already lost what the customer actually said. The intake call recording is the source of record for that. The field note is the source of record for what happened on site.

At what point does manual call review stop scaling?

Further than most shops think. At five trucks you're nowhere near the ceiling. A shop with a dedicated CSR team and someone whose job includes QA can run meaningful manual review on a sampled basis well past that. The point where it genuinely breaks is national-franchise scale: dozens of CSRs, hundreds of calls a day. If you're at five trucks and already thinking about scale limits on this, set that problem aside. Build the manual review process first, hire someone to own it as you grow, and revisit automation when you've actually outgrown the manual version.
