You're Not Bad at Estimating. You're Bad at Closing the Loop.

In 2019, I pulled the P&L on a 14-truck residential HVAC shop my firm was evaluating for acquisition. The owner had been in business eleven years. Good reputation, waitlist for installs. He had also been losing money on every install job for three years. The service department, which he thought of as the boring half of the business, made enough to cover the loss, and that was the only reason he hadn't noticed.

He wasn't a bad estimator. His quotes were detailed and his material takeoffs were careful. He'd been refining his flat-rate book since 2014. The problem was simpler and harder to fix: he had never once compared what a job cost him to what he'd quoted. Not systematically. Not in any form that fed back into the next quote. The quote lived in his estimating software. The actual costs sat in labor, truck expense, and materials in QuickBooks. The two sets of numbers never met.

Three years of broken assumptions, repriced faithfully every quarter.

The Quote Is a Guess. The Job Cost Is the Answer.

A quote is a hypothesis — you're predicting how long the job will take, what materials you'll need, and what delivering it will cost. That prediction might be based on long experience. It might be based on a flat-rate book calibrated to a different market. It might be based on what you charged last time for something that looked similar.

The job cost is the test. In most small shops, under 15 trucks and owner-operated, that test never runs.

The actual costs exist somewhere. They're in the timecard, the materials invoice, the dispatch record showing the job ran two hours past its slot. They're just not compared to the quote. So the quote stays on one side of the business, reality stays on the other, and the next job gets priced from the last quote instead of from what the last job actually cost.

That's a feedback problem. The fix is not complicated — it's three numbers, captured within 48 hours of close, reviewed once a month.

The Gap Nobody Measures: Quoted Hours vs. Actual Hours

When I was turning wrenches at Bayview Mechanical, we quoted most residential installs at four hours for a straight swap with accessible equipment on an existing line set. That was fine for maybe 60% of the jobs. For anything in an unconditioned attic in the South Bay summer, four hours became five and a half before the refrigerant was in the system. Nobody updated the template. The owner knew jobs ran long sometimes. He absorbed it.

At Caldera on the commercial side, the overruns were proportionally larger and the feedback loop was exactly the same: zero.

Here's what a labor overrun actually costs, and this is the part most owners miss. A job that runs 90 minutes over isn't only a labor loss. It's a truck-hour loss. If you've done the math on your own fleet — your actual insurance premiums, your actual fuel cost per mile, your actual maintenance and depreciation — you know a fully loaded truck-hour has a real cost in your market with your carrier. Most shops I've worked with are using a national average figure they pulled from somewhere. The national average is a fiction. It is not your cost.

To make this concrete: if your fully loaded truck cost is, say, $40 per hour — and when I've helped shops calculate their own number, it almost always lands above what they assumed — 90 minutes of overrun is $60 per job in absorbed cost that doesn't appear as a line item. It distributes into labor and vehicle expense and disappears. Multiply that across your install volume for the week and you can see why the service department's margin looks better than installs even though installs are supposed to be the growth side.
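The arithmetic above fits in a few lines of Python. This is a rough sketch, not a costing tool: the $40 truck-hour and the 90-minute overrun are the illustrative figures from the paragraph, and the eight installs per week is a made-up volume you would replace with your own.

```python
def absorbed_overrun_cost(overrun_minutes: float, truck_cost_per_hour: float) -> float:
    """Truck cost absorbed when a job runs past its quoted time."""
    return overrun_minutes / 60 * truck_cost_per_hour


# Illustrative figures from the text: 90 minutes over at $40 per truck-hour.
per_job = absorbed_overrun_cost(overrun_minutes=90, truck_cost_per_hour=40.0)

# Hypothetical volume: 8 installs a week, each running 90 minutes over.
installs_per_week = 8
weekly = per_job * installs_per_week

print(per_job)  # 60.0
print(weekly)   # 480.0
```

This counts only the truck. The tech's wages for the same 90 minutes sit on top of it.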

The Flat-Rate Book Problem

The flat-rate books and pricing software sold by the major industry vendors are built to produce predictable revenue per ticket. They are not built to produce predictable margin per ticket for your shop. That distinction matters.

A nationally calibrated price book runs on median assumptions — median labor rates, median truck costs, median overhead structures. If you're running a shop in the D.C. suburbs, where commercial auto premiums have been climbing and journeyman wages sit above the national median, those embedded assumptions are wrong for you. If you're in a secondary market in the Southeast with lower wage pressure and cheaper insurance, the book might be closer — but your distributor pricing and materials costs will still skew the margin math in ways the book can't see.

The result: you know what you charged. You don't know what you made.

This is also how the SEER2 transition hurt shops it shouldn't have hurt. When equipment costs went up, shops that understood their own cost model repriced. Shops that were pricing from a flat-rate book they didn't feel authorized to edit absorbed the increase. In my experience — and I watched this closely in the shops I was working with through 2022 and 2023 — more independents absorbed it than passed it through. The book said what it said. The margin compressed. Not because anyone made a bad decision, but because the tool didn't make the cost increase visible.

A flat-rate book is not a cost model. It is a pricing shortcut. Know which one you're using.

What Post-Job Costing Actually Looks Like for a 6-Truck Shop

I went back to George Mason for an MBA in 2016 because I kept watching shop owners make pricing decisions that didn't survive a spreadsheet. The spreadsheet I had in mind wasn't complicated. Three columns and a habit.

Here's what I walk shops through. You don't need dedicated software. Shops under 10 or 12 trucks frequently pay for reporting capability that sits unused because no one has the time or training to build the reports — and whoever sold them the platform knew that going in. A well-organized spreadsheet handles this.

Within 48 hours of job close, you capture three numbers for every install:

Quoted labor hours vs. actual labor hours. The tech's timecard or your dispatch records give you the actual. Your estimate gives you the quoted. The variance, tracked across 10 jobs, tells you whether your template is accurate or whether it has been consistently lying to you in one direction.

Quoted material cost vs. invoiced material cost. This catches distributor price changes you didn't account for and scope creep — the extra fitting, the refrigerant top-off, the capacitor pulled off the truck. Both erode margin. The price change is the distributor's problem becoming yours; the scope creep is a quoting problem. They're different, and knowing which one is driving your variance tells you which one to fix.

Quoted job duration vs. actual job duration. This is not the same as labor hours, because it picks up standby time and wait time. A job can have accurate labor hours and still blow a scheduling slot by three hours if the crane window slips, which brings me to Caldera.
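If you would rather script this than keep it in a spreadsheet, the three pairs of numbers map onto one small record per job. The sketch below is one possible layout. The class name, field names, job ID, and dollar figures are all placeholders invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class InstallJobRecord:
    job_id: str
    quoted_labor_hours: float
    actual_labor_hours: float
    quoted_material_cost: float
    invoiced_material_cost: float
    quoted_duration_hours: float
    actual_duration_hours: float

    def variances(self) -> dict[str, float]:
        """Actual minus quoted. A positive number means the job ran over the quote."""
        return {
            "labor_hours": self.actual_labor_hours - self.quoted_labor_hours,
            "material_cost": self.invoiced_material_cost - self.quoted_material_cost,
            "duration_hours": self.actual_duration_hours - self.quoted_duration_hours,
        }


# Hypothetical job: quoted at 4 labor hours, took 5.5; slot blown by 3 hours.
job = InstallJobRecord("J-1001", 4.0, 5.5, 1200.0, 1290.0, 5.0, 8.0)
print(job.variances())
# {'labor_hours': 1.5, 'material_cost': 90.0, 'duration_hours': 3.0}
```

Keeping labor hours and job duration as separate fields is the point: the second one is where standby and wait time show up.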

What One Bad Rooftop Install Actually Cost

At Caldera, we did a rooftop RTU replacement on a small commercial building. On paper it was straightforward: equipment quoted, labor quoted at six hours for a two-man crew, crane window pre-scheduled. Quoted margin was around 22%.

The existing line set was a size we couldn't reuse. The material we'd quoted was off by a meaningful amount in copper and fittings. The crane window slipped two hours because of a scheduling conflict with another contractor on the roof — two hours of a two-man crew and two trucks sitting, not billing. The equipment itself had been repriced by the distributor since the quote went out. Not a dramatic change. But the quote was built on last quarter's price sheet.

Not one of those three variances made it back into the template for the next rooftop job. The next quote used the same six-hour labor assumption, the same material methodology, and the same zero allowance for standby.

That job's actual margin was nowhere near 22%. It was closer to 8%, and that is before accounting for payment terms. The account paid net-45, so the cash to cover materials and wages went out in week one and came back in week seven. Carrying that receivable for 45 days has a real cost. Most small shops don't calculate it. It's invisible the same way the overrun is invisible: it shows up as tighter cash, not as a named line item.
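One simple way to put a number on that carrying cost is to treat the cash you fronted as borrowed at whatever your money actually costs you. The sketch below uses simple interest. The $10,000 outlay and the 9% annual rate are assumptions made up for illustration, not figures from the Caldera job.

```python
def receivable_carrying_cost(cash_outlay: float, annual_rate: float, days_outstanding: int) -> float:
    """Simple-interest cost of fronting cash until the customer pays."""
    return cash_outlay * annual_rate * days_outstanding / 365


# Hypothetical: $10,000 in materials and wages, 9% cost of money, paid at net-45.
cost = receivable_carrying_cost(cash_outlay=10_000.0, annual_rate=0.09, days_outstanding=45)
print(round(cost, 2))  # 110.96
```

If you run on a line of credit, the rate on that line is a reasonable stand-in for annual_rate.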

The three variances — material mismatch, standby time, equipment cost change — were each individually survivable. Together, unreported and unfixed, they became the template for the next job.

What You Do Monday Morning

Pull your last 10 completed install jobs. Get the quoted labor hours from whatever you used to price them. Get the actual hours from timecards or dispatch records — if you have nothing else, ask your techs. Compare them. That comparison is more actionable than anything in your QuickBooks P&L summary.

Build the three-column sheet. Quoted hours, actual hours, variance. Quoted materials, invoiced materials, variance. Quoted job duration, actual job duration, variance. Do this for every install going forward, within 48 hours of close. Twenty minutes per job.

Set a monthly template review. Once a month, look at the variance column. If installs are running 1.2 hours over on average, add 1.2 hours to the template. If material quotes are coming in 8% under actual, add 8% to the material line. That's the feedback loop — and it requires someone to close it deliberately, on a schedule.
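For the labor line, the monthly review comes down to one average. Here is a minimal sketch, assuming you have (quoted hours, actual hours) pairs for one job type. The function name and the sample jobs are hypothetical.

```python
from statistics import mean


def revised_template_hours(template_hours: float, jobs: list[tuple[float, float]]) -> float:
    """Add the average overrun (actual minus quoted hours) to the template."""
    average_overrun = mean([actual - quoted for quoted, actual in jobs])
    return template_hours + average_overrun


# Hypothetical month: four straight swaps, each quoted at 4 hours.
jobs = [(4.0, 5.5), (4.0, 5.0), (4.0, 5.0), (4.0, 5.5)]
print(revised_template_hours(4.0, jobs))  # 5.25
```

A negative average works the same way: if jobs are coming in under the template, the template comes down.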

Calculate your actual truck cost. Pull your commercial auto insurance premium, fuel spend, maintenance costs, and truck depreciation, all for the same trailing 12 months. If you only have last quarter's fuel spend, annualize it so every input covers the same period. Add them up and divide by the billable hours you actually logged over that period, not the hours you estimated. That number is your truck cost per billable hour. If it's different from what's embedded in your pricing, one of them needs to change.
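As a formula, the truck cost is a sum divided by hours. The sketch below assumes every input has been scaled to the same 12-month window (for example, last quarter's fuel spend multiplied by four). All of the dollar amounts and the billable-hour count are placeholders, not benchmarks.

```python
def truck_cost_per_billable_hour(
    insurance: float,
    fuel: float,
    maintenance: float,
    depreciation: float,
    billable_hours: float,
) -> float:
    """All inputs must cover the same period, e.g. the trailing 12 months."""
    if billable_hours <= 0:
        raise ValueError("billable_hours must be positive")
    return (insurance + fuel + maintenance + depreciation) / billable_hours


# Placeholder figures for one truck over 12 months.
quarterly_fuel = 2_500.0
rate = truck_cost_per_billable_hour(
    insurance=9_000.0,
    fuel=quarterly_fuel * 4,  # annualized so it matches the other inputs
    maintenance=7_000.0,
    depreciation=18_000.0,
    billable_hours=1_100.0,   # hours actually billed, not hours available
)
print(rate)  # 40.0
```

Dividing by hours actually billed instead of hours available is what pushes the number above what most owners assume.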

This is a math problem. The math has been producing a wrong answer because it has been running on inputs nobody checked. Start with the last 10 jobs.

FAQ

If my techs are the ones tracking hours, won't they just report what was quoted to avoid the conversation?

Some will, at first. The fix is to pull actual hours from a source that isn't self-reported — GPS dispatch records, job start and close timestamps in your scheduling software, or before-and-after photos with metadata. When techs know the comparison is happening regardless, the incentive to adjust the number disappears. Frame it correctly from the start: you're auditing the template, not the tech. If a job routinely runs longer than quoted, that's a quoting problem, not a performance problem.

How is post-job costing different from looking at QuickBooks after the fact?

QuickBooks tells you what you spent. Post-job costing tells you what you expected to spend and exactly where the gap opened. Your P&L shows that labor ran high last quarter. Post-job costing shows that attic installs consistently run 45 minutes over because the template doesn't account for the second trip to the truck at that ceiling height. One tells you something is wrong. The other tells you what to fix.

What's a reasonable labor hour variance before I revisit a job template?

My threshold — based on what I've seen in the shops I work with — is 10% sustained across multiple jobs. A single job running 15% over is usually explainable: old equipment, an unexpected refrigerant leak, a difficult access situation. Five jobs of the same type running 15% over is the template. When the average variance across your last 10 comparable jobs exceeds 10%, the assumption is wrong and you're pricing from it every time.
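That rule of thumb is easy to check mechanically. This is a sketch under the assumptions in the answer above: a 10% threshold, averaged over the last 10 comparable jobs. The function name and the sample jobs are hypothetical.

```python
from statistics import mean


def template_needs_review(jobs: list[tuple[float, float]], threshold: float = 0.10, window: int = 10) -> bool:
    """jobs holds (quoted_hours, actual_hours) for one job type, oldest first."""
    recent = jobs[-window:]
    overruns = [(actual - quoted) / quoted for quoted, actual in recent]
    return mean(overruns) > threshold


# One job 15% over among nine that hit the quote: explainable, leave the template alone.
one_bad_job = [(4.0, 4.0)] * 9 + [(4.0, 4.6)]
# Five jobs of the same type, all 15% over: the template is the problem.
same_type_running_over = [(4.0, 4.6)] * 5

print(template_needs_review(one_bad_job))             # False
print(template_needs_review(same_type_running_over))  # True
```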

My jobs are all different — custom work, older homes, weird equipment. How do I template something that doesn't repeat?

You build around categories, not individual jobs. Even highly variable work has clusters: attic vs. basement, straight swap vs. system reconfiguration, single-zone vs. multi-zone. Within each cluster, the variance will have a shape. You're not trying to predict every job perfectly. You're trying to stop systematically underestimating the same category of difficulty. Categorize your last 20 jobs and look at variance within each group — not across all of them combined.
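Grouping before averaging is the whole trick, and it takes only a few lines. A minimal sketch, with made-up category labels and hours:

```python
from collections import defaultdict
from statistics import mean


def variance_by_category(jobs: list[tuple[str, float, float]]) -> dict[str, float]:
    """jobs holds (category, quoted_hours, actual_hours). Returns average overrun per category."""
    grouped = defaultdict(list)
    for category, quoted, actual in jobs:
        grouped[category].append(actual - quoted)
    return {category: mean(deltas) for category, deltas in grouped.items()}


# Hypothetical jobs, all quoted from the same 4-hour template.
jobs = [
    ("attic", 4.0, 5.5),
    ("attic", 4.0, 5.0),
    ("basement", 4.0, 4.0),
    ("basement", 4.0, 3.5),
]
print(variance_by_category(jobs))
# {'attic': 1.25, 'basement': -0.25}
```

Averaged together, these four jobs run half an hour over, which hides the fact that attic jobs are the only ones running long.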

Should I be doing this for service calls too, or just installs?

Start with installs. The dollar variance per job is larger and the job types are more comparable to each other. Service calls have high inherent variability — a diagnostic that becomes a blower motor replacement looks nothing like a capacitor swap. That said, if your service call volume is high and average ticket margin is compressing, the same three-column approach applies. The most useful metric, once you have the data, is labor hours per closed ticket by call type. It shows you which call types your techs are efficient on and which ones are bleeding time without the revenue to support it.

At what point does this become a job for my office manager instead of me?

When the habit is solid and the sheet is working, hand off the data entry — pulling hours, entering material invoices, logging job duration. Anyone organized with access to your scheduling records can do that. Don't hand off the monthly template review. That's where your prices actually come from, and in a shop under 15 trucks, the person who understands the cost well enough to change the template is almost certainly you.
