The AI Implementation Buyer's Guide

Executive summary

Production AI implementation at the organizational layer costs $75K–$250K in year one, regardless of who you hire. The variable is who pockets the margin: an SI consultancy, a freelance prompt engineer, an in-house team, or a focused studio like ours. When installs fail, it usually isn't because of bad models or bad prompts. It's because outcome evaluation gets skipped and nobody can answer whether the install worked.

This guide walks operators through what to budget, what to demand from vendors, what to put in the contract, and, most usefully, when to walk away from an AI install entirely. It's vendor-agnostic. We wrote it because our own first sales calls keep surfacing the same evaluation gaps, and most of the time the right answer for the buyer is not "hire us."

1. The four cost buckets

Every AI install has four cost buckets. Most quotes name two and hope you forget about the other two.

Bucket 1 — Vendor subscriptions

The obvious one. Model providers (OpenAI, Anthropic, Google), embedding and vector-store services, an orchestration layer, and seats on downstream systems (CRM, calendar, email). For a 25-person organization running production AI on real workflows, plan for $1,500–$3,500 per month, or $18K–$42K annualized. The teams that end up with $200K subscription bills usually had an uncapped model plus a chatty integration loop and didn't notice for three quarters.
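One way to avoid the uncapped-model surprise is a hard monthly ceiling enforced in your own code, in addition to whatever usage limits your providers offer. The sketch below is a minimal, in-memory illustration: `MonthlySpendCap` and `BudgetExceeded` are hypothetical names, not part of any provider SDK, and it assumes you can estimate each call's cost before making it.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a call would push month-to-date spend past the cap."""


@dataclass
class MonthlySpendCap:
    # Hypothetical local guard; not any provider's API.
    cap_usd: float           # hard monthly ceiling for model spend
    spent_usd: float = 0.0   # month-to-date spend, tracked locally

    def charge(self, estimated_cost_usd: float) -> None:
        """Record a call's estimated cost, refusing it if the cap would be breached."""
        if estimated_cost_usd < 0:
            raise ValueError("cost must be non-negative")
        if self.spent_usd + estimated_cost_usd > self.cap_usd:
            raise BudgetExceeded(
                f"call of ${estimated_cost_usd:.2f} would exceed the "
                f"${self.cap_usd:.2f} cap (spent ${self.spent_usd:.2f})"
            )
        self.spent_usd += estimated_cost_usd

    def reset(self) -> None:
        """Call at the start of each billing month."""
        self.spent_usd = 0.0
```

Set `cap_usd` to the top of your planned range (for example 3500.0) and call `charge()` before each model request. A runaway integration loop then fails loudly with `BudgetExceeded` instead of quietly running up a bill. A production version would persist `spent_usd` and reconcile it against the provider's invoice.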

Bucket 2 — Engineering time

The biggest hidden cost. A two-week prototype doesn't ship to production. Real installs take 6–10 weeks for the first workflow, plus 2–4 weeks for each additional workflow. Budget $30K–$80K of internal senior engineering time (the opportunity cost of pulling those engineers off other work), or $50K–$150K for an external team that already knows the failure modes. If you're being quoted "two weeks, $5K" by a freelance prompt engineer, that's a Loom demo, not an install.
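The timeline figures compound per workflow. A small calculator, assuming workflows ship one after another (no parallel teams) and using the ranges from this section; `calendar_weeks` is a hypothetical helper:

```python
from typing import Tuple


def calendar_weeks(workflows: int) -> Tuple[int, int]:
    """Return (low, high) calendar weeks to ship `workflows` workflows in sequence.

    Uses the guide's ranges: 6-10 weeks for the first workflow,
    plus 2-4 weeks for each additional one.
    """
    if workflows < 1:
        raise ValueError("need at least one workflow")
    extra = workflows - 1
    return 6 + 2 * extra, 10 + 4 * extra
```

`calendar_weeks(3)` returns `(10, 18)`: ten to eighteen weeks for three workflows, which is why a "two weeks, $5K" quote should raise flags.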

Bucket 3 — Integration debt

Nobody quotes this. Your AI has to read from somewhere and write to somewhere: Gmail, Calendar, Slack, CRM, DMS (document management system), billing system, whatever. A meaningful install touches 5–10 systems. Doing each integration right, with audit logging, retries, idempotency, and error escalation, takes 1–3 days of engineering. That's another $20K–$60K nobody put in the proposal.
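What "done right" means for a single write can be sketched as a wrapper that records every attempt to an audit log, retries with exponential backoff, reuses one idempotency key across retries, and escalates to a human when retries run out. This is a minimal illustration under stated assumptions: `write_with_safeguards` is a hypothetical helper, the in-memory list stands in for a durable audit store, and it assumes the downstream system de-duplicates writes that carry the same idempotency key (not every API supports this, so check each one).

```python
import json
import time
import uuid
from typing import Callable, List, Optional


class IntegrationError(Exception):
    """Raised after a downstream write has failed every retry and been escalated."""


def write_with_safeguards(
    action: Callable[[str], dict],    # performs the write, given an idempotency key
    audit_log: List[str],             # stand-in for a durable, append-only audit store
    escalate: Callable[[str], None],  # e.g. pages on-call or opens a ticket
    idempotency_key: Optional[str] = None,
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
) -> dict:
    # Reuse one key across every attempt so a retried write is applied at most once,
    # assuming the downstream system de-duplicates on it.
    key = idempotency_key or str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            result = action(key)
        except Exception as exc:  # in practice, catch only the client's transient errors
            audit_log.append(json.dumps(
                {"key": key, "attempt": attempt, "status": "error", "error": str(exc)}
            ))
            if attempt < max_attempts:
                time.sleep(base_delay_s * 2 ** (attempt - 1))  # exponential backoff
            continue
        audit_log.append(json.dumps({"key": key, "attempt": attempt, "status": "ok"}))
        return result
    # Every attempt failed: hand off to a human instead of failing silently.
    escalate(f"write {key} failed after {max_attempts} attempts")
    raise IntegrationError(key)
```

Multiply this by 5–10 systems, each with its own error types and de-duplication semantics, and the $20K–$60K figure stops looking padded.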

Bucket 4 — Outcome eval + handoff

The most-skipped, most-load-bearing bucket. Measurement infrastructure, before/after windows, holdout groups, quarterly review cadence, written handoff for whoever inherits this in six months. $10K–$25K done right. $0 done wrong — and $0 is what most installs spend on it, which is why they get quietly turned off when a champion leaves.
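A holdout group is what separates the install's effect from whatever else changed during the measurement window. One common way to combine before/after windows with a holdout is a difference-in-differences estimate: the change in the group using the install minus the change in a comparable group that isn't. The guide doesn't prescribe a method; this is a minimal sketch of that one technique, with `holdout_adjusted_effect` as a hypothetical helper.

```python
from statistics import mean
from typing import Sequence


def holdout_adjusted_effect(
    treated_before: Sequence[float],
    treated_after: Sequence[float],
    holdout_before: Sequence[float],
    holdout_after: Sequence[float],
) -> float:
    """Difference-in-differences: treated group's change minus the holdout's change."""
    treated_change = mean(treated_after) - mean(treated_before)
    holdout_change = mean(holdout_after) - mean(holdout_before)
    return treated_change - holdout_change
```

With made-up minutes-per-case samples, `holdout_adjusted_effect([12, 12, 12], [4, 4, 4], [12, 12, 12], [10, 10, 10])` evaluates to -6: the treated group dropped 8 minutes, but the holdout also dropped 2, so only 6 minutes are credited to the install. Real use needs larger samples and a check that the two groups were comparable before the install.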

Honest year-one total

Add it up and the first meaningful AI install at a real organization costs $75K–$250K in total year-one ownership, regardless of who you hire. The second workflow is cheaper because integration debt and outcome-eval scaffolding carry over.
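The headline number is the four buckets summed. A minimal sketch that adds up the ranges quoted in section 1, choosing between the internal and external engineering figures; `year_one_range` is a hypothetical helper, and the ranges are the guide's own, not independent data:

```python
from typing import Dict, Tuple

# (low, high) USD ranges copied from the four buckets above.
BUCKETS: Dict[str, Tuple[int, int]] = {
    "vendor_subscriptions": (18_000, 42_000),
    "engineering_internal": (30_000, 80_000),
    "engineering_external": (50_000, 150_000),
    "integration_debt": (20_000, 60_000),
    "outcome_eval_handoff": (10_000, 25_000),
}


def year_one_range(external_team: bool) -> Tuple[int, int]:
    """Sum the low and high ends of the four buckets for one engineering route."""
    engineering = "engineering_external" if external_team else "engineering_internal"
    keys = ["vendor_subscriptions", engineering, "integration_debt", "outcome_eval_handoff"]
    low = sum(BUCKETS[k][0] for k in keys)
    high = sum(BUCKETS[k][1] for k in keys)
    return low, high
```

Run as written, the internal route sums to $78K–$207K and the external route to $98K–$277K. Read the $75K–$250K headline as a rounded typical range: an external team at the top of every bucket can run past it.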

2. The outcome-eval framework

If you remember nothing else from this guide, remember this. Before any code ships, write down the answers to these three questions and get them co-signed by the operator who will inherit the install:

What metric is this install supposed to move? Named, specific, measurable. "Improve productivity" doesn't count. "Reduce intake-form processing from 12 min to under 3 min per case" counts.
What's the baseline, and how was it measured? A reading taken before the install, in the same measurement frame you'll use after. Not "we think it was about an hour." A logged sample with a known n.
What's the recheck schedule and the decision rule? Measurement windows at 6, 12, and 24 weeks. A decision rule specifying at what number the install gets renewed, modified, or killed. Written down before code ships. Co-signed.
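The three answers fit in a small record both parties can sign, and the decision rule becomes something you can evaluate mechanically at each recheck. A minimal sketch, assuming a lower-is-better metric such as minutes per case, a median reading per window, and hypothetical renew and kill thresholds (the 3-minute target echoes the example above; the 8-minute kill line is made up for illustration). `OutcomeEvalScope` is a hypothetical name.

```python
from dataclasses import dataclass
from statistics import median
from typing import Sequence, Tuple


@dataclass(frozen=True)
class OutcomeEvalScope:
    """Hypothetical record of the three co-signed answers."""
    metric: str                          # e.g. "intake-form minutes per case"
    baseline_samples: Tuple[float, ...]  # logged readings taken before the install
    renew_below: float                   # median strictly below this: renew
    kill_above: float                    # median strictly above this: kill
    recheck_weeks: Tuple[int, ...] = (6, 12, 24)

    def __post_init__(self) -> None:
        if not self.baseline_samples:
            raise ValueError("baseline needs at least one logged sample")
        if self.renew_below >= self.kill_above:
            raise ValueError("renew threshold must sit below the kill threshold")

    @property
    def baseline_n(self) -> int:
        return len(self.baseline_samples)

    @property
    def baseline_median(self) -> float:
        return median(self.baseline_samples)

    def decide(self, week: int, samples: Sequence[float]) -> str:
        """Apply the pre-agreed rule to one recheck window (lower metric is better)."""
        if week not in self.recheck_weeks:
            raise ValueError(f"week {week} is not a scheduled recheck")
        reading = median(samples)
        if reading < self.renew_below:
            return "renew"
        if reading > self.kill_above:
            return "kill"
        return "modify"
```

With `renew_below=3.0` and `kill_above=8.0`, a recheck window whose median is 2.8 minutes returns `"renew"`, a median of 6 returns `"modify"`, and a median of 10 returns `"kill"`. The value is less in the code than in the fact that the thresholds are fixed before the install ships.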

That's it: three things. Most installs skip them because the vendor doesn't want to be measured and the buyer doesn't want the answer. Writing the outcome eval down is a test some vendors will fail, and it's better to find that out now, with a refund clause in the contract, than to write a $200K post-mortem in eight months.

3. Vendor evaluation rubric

Three asks that separate vendors who do this work seriously from vendors who are just selling motion.

Ask 1 — "Show me an outcome-eval scope you wrote for a comparable client."

Two-page written document. Operator named (or anonymized by sector). Metric, baseline methodology, recheck windows, decision rule. If they can't produce one — even redacted — they don't do it. Move on.

Ask 2 — "Define the metric you'd commit to moving for us, before we sign."

The vendor who names a specific number and risks being wrong is the vendor who does this work seriously. The vendor who pivots into "we'll see qualitative benefits" or "AI is a journey, not a destination" is selling motion, not outcomes.

Ask 3 — "What's your decision rule if the metric doesn't move?"

Trust the vendor who says: "We recommend killing the install and refunding the maintenance retainer." Walk away from the vendor who says: "We'll iterate." Iteration without a kill rule is rent.

Three smells worth flagging

The vendor only sells one model. Production workflows need different models for different tasks. A single-model vendor will hammer every nail with the same hammer for as long as you keep paying.
The vendor's proposal has no audit-log line item. You can't measure what you don't log. The audit log is a Day-One feature, not a Phase-Three roadmap item.
The vendor won't name a real client. Acceptable if your sector is sensitive (we don't name ours either). Not acceptable if the vendor is selling consumer AI into mid-market and still won't put a logo on the slide.

4. Sample contract terms

Eight clauses we'd put in any AI implementation contract, whether we're the buyer or the seller. None of this is legal advice; show these clauses to your counsel before signing anything.

Metric-bound milestones. Payment schedule tied to outcome-eval reading checkpoints, not just delivery.
Data ownership and portability. All embeddings, training datasets, audit logs, and configurations are owned by the buyer and exportable on 30 days' notice.
No-train clause. Vendor cannot train models on buyer's data, period. Includes downstream model providers the vendor uses.
Audit-log retention. Buyer-side audit log with 12-month minimum retention. Vendor cannot purge before that window.
Kill clause. Buyer can terminate the engagement and the running install with 30 days' notice; vendor refunds prorated retainer.
Off-boarding deliverable. Written documentation, training session, and credential handover when the engagement ends. Spelled out as a separate deliverable, not "best efforts."
SLA tied to engagement risk profile. Healthcare/finance gets a stricter SLA than internal productivity. Specify uptime, response time, and escalation path explicitly.
The Engine commitment line (if applicable). We line-item the 10–15% of revenue that flows to the Institute for Human Advancement on every invoice. If you're buying from a vendor that claims mission alignment, ask them to show it on the invoice.
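The kill clause's "prorated retainer" needs an agreed formula. A minimal sketch, assuming straight-line proration by calendar day and assuming the vendor is paid through the end of the 30-day notice period; `prorated_refund` is a hypothetical helper, and your contract and counsel define the real convention:

```python
def prorated_refund(retainer_usd: float, period_days: int, days_used: int) -> float:
    """Refund the unused share of a prepaid retainer period.

    Assumes straight-line proration by calendar day; `days_used` counts
    through the end of the notice period (a hypothetical convention).
    """
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    if not 0 <= days_used <= period_days:
        raise ValueError("days_used must be between 0 and period_days")
    unused_days = period_days - days_used
    return round(retainer_usd * unused_days / period_days, 2)
```

For a $9,000 retainer covering a 90-day quarter, where the notice period ends on day 60, `prorated_refund(9000, 90, 60)` returns `3000.0`.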

5. When not to install AI

Four signals that the right answer is "not yet" or "not at all."

The metric isn't broken. If the workflow already moves the number you care about, installing AI is a cost-add, not a value-add. Optimize what's broken first.
The data is genuinely confidential. Not "we'd rather not share." Genuinely classified, regulated, or under-NDA. Some workflows shouldn't touch an external model regardless of the vendor's privacy claims. Use those workflows as the LAST AI target, not the first.
The team is in a transition. A merger, an exec rotation, a layoff cycle. Installing operating-layer AI during org turbulence is how you get an abandoned install in nine months. Wait until the team is stable enough to inherit and own it.
You can't name an operator who will own the result. "We'll figure it out post-install" is the most reliable predictor of an install that gets quietly turned off. Name the owner before signing the SOW.

If any of these four applies, the honest move from your vendor is to defer or refer. We do this constantly. We'd rather refer you to a vendor whose timing fits than sell you something you'll regret.

6. References & further reading

From our own writing:

What an AI install actually costs — expanded version of section 1, with comparison against freelancer and SI routes.
Outcome-eval, the line item every AI install skips — expanded version of section 2 with the three vendor-asks checklist.
Atlas Discovery — our productized two-week audit ($5K–$15K) that produces the outcome-eval scope as a deliverable.
How we compare to ChatGPT Teams and Claude for Teams — honest feature matrix.
The Perpetual Engine — the structural commitment that 10–15% of every revenue dollar funds the Institute for Human Advancement.

This guide is free to read, share, and quote. If it saved you money or kept you from a bad install, the right return is to send it to the operator on your team who hasn't seen it yet.