
What We Learned Building an AI SDR with Claude

We launched Scout this week — an AI SDR that runs entirely on your machine. This post is the behind-the-scenes story of building it: the architecture decisions that worked, the ones that didn't, and the lessons we learned integrating Claude into a production sales pipeline.

If you're building with LLMs or thinking about AI-powered sales tools, hopefully some of this is useful.

Why We Built Scout

The AI SDR market is crowded. By our count, there are 50+ tools in the category. So why build another one?

The short answer: every existing tool is a SaaS. You upload your leads, your ICP, your messaging strategy to someone else's servers. Your sales intelligence — the thing that gives you an edge — gets pooled on a platform you don't control.

We thought the architecture was wrong. Not the AI part — the deployment model. What if the AI came to your data instead of your data going to the AI?

That's Scout: a desktop app where everything runs locally. Your leads, your scoring models, your outreach history — all on disk, on your machine. The only external calls are to public data sources for signal detection and to Anthropic's API for Claude inference.

The Signal-Based Approach

Before we talk about the technical stack, it's worth explaining the core product insight that shaped every architecture decision.

Traditional AI SDRs work like this: give the system a lead list, and it generates personalized emails for each contact. It's "cold outreach with better copy." The improvement is in the writing, not the targeting.

We flipped this. Scout starts with signals — events that indicate buying intent — and works backward to identify relevant leads.

What kind of signals? Things like a company posting its first SDR role, announcing a funding round, or publicly discussing a problem your product solves: the kinds of events that show up in job postings, press releases, Reddit threads, and company blogs.

The insight isn't novel — good human SDRs have always watched for buying signals. But they can monitor maybe 50-100 accounts manually. Scout can monitor thousands of signal sources simultaneously.

Early results validated the approach: ~12% reply rate on signal-based outreach vs. ~2% on our previous cold email strategy. Same product, same ICP, same messaging tone. The difference is timing and relevance.

Choosing Claude: The Model Evaluation

We evaluated three model families for the two core tasks in Scout's pipeline: lead scoring and outreach generation.

Lead Scoring

Lead scoring in Scout isn't the old firmographic "50 points for company size, 30 points for industry" approach. It's reasoning-based. The model sees a signal (e.g., "Company X posted a Senior SDR role on LinkedIn"), your ICP definition, and the lead's context. It has to reason about whether this signal indicates buying intent for your specific product.

This is where Claude stood out. The scoring rationale — the explanation of why a lead scored the way it did — was materially better. Claude would produce analysis like:

"This company posted an SDR role with 'experience with outbound automation tools' in the requirements. Combined with their Series A last quarter and current team size of 12, they're likely building out their sales motion for the first time. High fit for Scout's ICP of early-stage companies investing in outbound."

The competing models tended to produce more generic assessments that didn't tie the signal to the specific ICP as tightly. For a product where the user reviews every lead, the quality of the scoring rationale matters as much as the score itself — it's what helps the user make a fast approve/reject decision.

Outreach Generation

For outreach, the difference was more subtle but still meaningful. Claude-generated messages read less like templates. The signal reference felt more naturally woven in rather than bolted on.

An example of what we mean:

Template-fill feel: "I noticed your company recently posted an SDR position. We help companies automate their outbound sales..."

Signal-native feel: "Saw you're hiring your first SDR — exciting stage. When we were at that point, we found that automating the research and lead scoring gave our SDR 3x more time for actual conversations. Happy to share how if useful."

The second reads like something a thoughtful human would write after seeing the job posting. That's the bar. Claude consistently cleared it; other models did so less reliably.

Latency and Cost

One concern with Claude was latency. For Scout's use case, this turned out to be a non-issue. Lead scoring and outreach generation are asynchronous — they happen in the background while the user does other work. Whether a batch of 20 leads takes 30 seconds or 60 seconds to score doesn't affect the user experience because it all happens before the review queue.
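
To make that concrete, here's a minimal sketch of background batch scoring; the names (score_lead, review_queue) and the thread-based approach are illustrative assumptions, not Scout's actual internals:

```python
# Hypothetical sketch of background batch scoring; names like score_lead
# and review_queue are illustrative, not Scout's actual internals.
import queue
import threading

review_queue: "queue.Queue[dict]" = queue.Queue()

def score_lead(lead: dict) -> dict:
    # Placeholder for the Claude scoring call described in the prompting section.
    return {**lead, "score": 0, "rationale": "..."}

def scoring_worker(pending: list[dict]) -> None:
    # A batch of 20 leads might take 30 or 60 seconds here; the user only
    # sees finished results when they open the review queue.
    for lead in pending:
        review_queue.put(score_lead(lead))

pending = [{"company": "Example Co", "signal": "posted first SDR role"}]
threading.Thread(target=scoring_worker, args=(pending,), daemon=True).start()
```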

Cost-wise, Claude's pricing fit our model well. Scout processes leads in batches with well-structured prompts, so token usage is predictable, and at the free tier's 50 leads/month, per-user API costs stay modest.

Architecture: Local-First Isn't Free

Making Scout local-first created some real engineering challenges. Here's what we learned.

Data Storage

We use SQLite for everything — leads, scoring history, outreach drafts, signal logs, user configuration. SQLite's single-file database model is perfect for a local desktop app. No database server to install, no connection management, no migration headaches for users.
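
As a rough illustration, a schema along these lines covers the pieces described above; the table and column names are hypothetical, not Scout's actual schema:

```python
import sqlite3

# Hypothetical single-file schema; table and column names are illustrative.
conn = sqlite3.connect("scout.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS leads (
    id INTEGER PRIMARY KEY,
    company TEXT, contact TEXT, context TEXT
);
CREATE TABLE IF NOT EXISTS signals (
    id INTEGER PRIMARY KEY,
    lead_id INTEGER REFERENCES leads(id),
    source TEXT, category TEXT, detected_at TEXT
);
CREATE TABLE IF NOT EXISTS scores (
    id INTEGER PRIMARY KEY,
    lead_id INTEGER REFERENCES leads(id),
    score INTEGER, rationale TEXT, decision TEXT  -- approved / rejected / pending
);
CREATE TABLE IF NOT EXISTS outreach_drafts (
    id INTEGER PRIMARY KEY,
    lead_id INTEGER REFERENCES leads(id),
    body TEXT, status TEXT  -- draft / sent / skipped
);
""")
conn.commit()
```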

The tradeoff: no real-time sync between devices. If you use Scout on your laptop and your desktop, they're separate instances. We considered building sync but decided it was premature complexity. Most of our target users (solo founders, small teams) work from one machine.

Signal Detection

Signal detection is the only part of Scout that requires network access. We scrape public data sources — LinkedIn public profiles, public job postings, Reddit threads, press releases, company blogs — to identify buying signals.

The legal and ethical boundaries here are clear: public data only. No login scraping, no private API abuse, no scraping behind authentication walls. If you can see it in a browser without logging in, Scout can monitor it.

The engineering challenge is reliability. Public data sources change their markup constantly. LinkedIn updates their public profile HTML every few months. Job board layouts shift. We built a signal detection layer that's intentionally resilient to markup changes — we extract semantic content first and let Claude interpret the relevance, rather than relying on brittle CSS selectors for structured data extraction.
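
A minimal sketch of that idea, assuming requests and BeautifulSoup for fetching and text extraction and the Anthropic Python SDK for interpretation; the prompt wording and model id are placeholders:

```python
# Sketch: pull visible text rather than parsing specific CSS selectors,
# then let the model judge relevance. Assumes requests, beautifulsoup4,
# and the anthropic SDK are installed; prompt wording is illustrative.
import requests
from bs4 import BeautifulSoup
import anthropic

def page_text(url: str) -> str:
    html = requests.get(url, timeout=30).text
    # get_text() survives markup changes that would break CSS selectors
    return BeautifulSoup(html, "html.parser").get_text(separator="\n", strip=True)

def signal_relevance(url: str, icp: str) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    text = page_text(url)[:8000]  # keep the prompt bounded
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model id
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"ICP: {icp}\n\nPage content:\n{text}\n\n"
                       "Does this page contain a buying signal for the ICP? "
                       "Answer with the signal category and a one-sentence rationale.",
        }],
    )
    return message.content[0].text
```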

The Review Queue

The review queue is the most important UX decision we made, and it's the one we initially got wrong.

Version 1 of Scout was fully automated. Detect signal → score lead → send outreach. No human involvement. In testing, this produced a false positive rate of roughly 15-20% — meaning one in five or six emails went to someone who clearly wasn't a fit.

In sales, a bad email isn't just a wasted email. It burns a lead you can never re-approach. At the volume an automated system can send, bad emails compound into real reputation damage fast.

Version 2 introduced the review queue: Scout does all the research, scoring, and drafting, but the human makes the final send/skip decision. This added about 15 minutes of daily work for the user — reviewing 10-20 leads and their drafted messages.

The surprise was that the review queue actually improved performance beyond just reducing errors. Users who reviewed leads refined their ICP over time. They'd reject a lead and think "actually, companies under 10 people aren't a good fit for us" — and update their ICP definition in Scout. The queue became a feedback loop that made the scoring better with every batch.
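
One way to picture the feedback loop: every approve or reject decision lands in the local database, where it later seeds scoring examples (next section). This is a sketch against the hypothetical schema above, not Scout's actual code:

```python
# Sketch: the review decision is stored locally, so every approve/reject
# becomes material for future scoring prompts. Uses the hypothetical
# schema sketched earlier.
import sqlite3

def record_decision(db_path: str, lead_id: int, decision: str) -> None:
    """decision is 'approved' or 'rejected'; it later seeds few-shot examples."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "UPDATE scores SET decision = ? WHERE lead_id = ?",
            (decision, lead_id),
        )
```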

Prompting Lessons

A few specific prompting patterns that made a meaningful difference:

Structured Output for Scoring

Early prompts asked Claude to "score this lead from 1-10." The scores were inconsistent and poorly calibrated. A lead that scored 7 in one batch might score 5 in the next for similar reasons.

We switched to a structured approach: ask Claude to evaluate against specific criteria (signal strength, ICP match, timing, company stage, role fit) and produce a reasoning chain before the final score. The structured criteria gave Claude guardrails without removing its ability to reason about edge cases.
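
A sketch of what that structured prompt can look like; the criteria names, JSON shape, and model id are illustrative assumptions, not the exact prompt Scout ships:

```python
# Sketch: score against named criteria and require reasoning before the
# number. Criteria, JSON shape, and model id are illustrative.
import json
import anthropic

CRITERIA = ["signal_strength", "icp_match", "timing", "company_stage", "role_fit"]

def score_lead(signal: str, icp: str, lead_context: str) -> dict:
    client = anthropic.Anthropic()
    prompt = (
        f"Signal: {signal}\nICP: {icp}\nLead context: {lead_context}\n\n"
        f"Evaluate each criterion ({', '.join(CRITERIA)}) in a short reasoning "
        "section, then output JSON: "
        '{"criteria": {<criterion>: 1-10}, "rationale": str, "score": 1-10}. '
        "Return only the JSON on the final line."
    )
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model id
        max_tokens=800,
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(message.content[0].text.splitlines()[-1])
```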

Few-Shot Examples from the User's Own Data

Generic few-shot examples ("here's what a good lead looks like") produced generic scoring. We found that using the user's own approved and rejected leads as few-shot examples significantly improved scoring accuracy after the first week of use. Scout automatically selects the 3-5 most relevant approved and rejected leads from history as examples in each scoring prompt.

This is where local-first architecture creates a real advantage — your entire lead history is on disk, instantly available for prompt construction. No API call to retrieve examples from a cloud database.
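
Because the history is local, example selection is a single query. In this sketch, "most relevant" is simplified to "most recent", and the schema is the hypothetical one from earlier:

```python
# Sketch: pull a handful of the user's own approved and rejected leads to
# use as few-shot examples in the scoring prompt.
import sqlite3

def few_shot_examples(db_path: str, limit: int = 5) -> list[tuple]:
    with sqlite3.connect(db_path) as conn:
        return conn.execute(
            """
            SELECT l.company, l.context, s.decision, s.rationale
            FROM scores s JOIN leads l ON l.id = s.lead_id
            WHERE s.decision IN ('approved', 'rejected')
            ORDER BY s.id DESC
            LIMIT ?
            """,
            (limit,),
        ).fetchall()
```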

Outreach That References Signals Without Being Creepy

The hardest prompting challenge: generating outreach that references the buying signal naturally. "I noticed you posted a job for an SDR" is fine. "I saw your LinkedIn post from Tuesday at 3 PM about scaling challenges" crosses into surveillance territory.

We settled on a rule: reference the signal category, not the specific source. "Sounds like you're building out the sales team" (derived from the job posting signal) rather than "I saw your Indeed listing for a Senior SDR posted 3 days ago." The first is a reasonable observation; the second implies monitoring.
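
In practice this can be as simple as mapping each detected signal to a category-level framing before it ever reaches the outreach prompt. The mapping below is illustrative, not Scout's actual phrasing:

```python
# Sketch: the outreach prompt only ever sees the category-level framing,
# never the raw source detail. The mapping is illustrative.
SIGNAL_CATEGORY_FRAMING = {
    "job_posting_sdr": "sounds like you're building out the sales team",
    "funding_round": "congrats on the recent raise",
    "public_pain_point": "saw you're thinking through outbound at the moment",
}

def outreach_context(signal_category: str) -> str:
    # Fall back to a neutral framing rather than leaking the raw source.
    return SIGNAL_CATEGORY_FRAMING.get(
        signal_category, "noticed some recent changes on your side"
    )
```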

What's Next

Scout's free tier is live. We're using it for our own outbound while we build the Pro tier, which will include expanded signal sources (CRM integrations, intent data providers), higher volume limits, and team features.

The biggest lesson from building Scout: AI in sales isn't about replacing the human. It's about making 15 minutes of daily prospecting as effective as four hours of manual research. The AI does the research. The human makes the judgment calls.

If you want to try it, Scout is available at scout.pblvrt.com. Free tier, no credit card, no data uploads.

We're genuinely interested in feedback — what signals you'd want Scout to monitor, what's missing, what could be better. Reach out on X or reply to any of our community posts. We read everything.
