Atomic Note

People Analytics Borrowed Intelligence's Vocabulary. It Should Steal the Tradecraft Instead.

workforce intelligence, analytical discipline, calibration, organizational decision-making, AI limitations, bias mitigation

I spent eight years in intelligence before I built people analytics teams, so watching "people intelligence" become the field's favorite phrase is a strange experience. The language is right there now. Napper has a whole manifesto on it. People throw around HUMINT and SIGINT and "intelligence gathering" like the words alone confer something.

They mostly borrow the aesthetic. The taxonomy of collection sources, the spy-shop vocabulary, the gravitas. What they skip is the part that actually made intelligence work valuable, which was never the collection and never the analysis-as-production. It was the tradecraft. And tradecraft is exactly the thing people analytics is least equipped to copy, which is a problem, because it's also the only thing in the field that AI can't do for you.

Tradecraft isn't analysis

Start with the distinction that took me years to internalize and that the PA field mostly hasn't. There's the work of producing an answer, and there's the discipline of reasoning your way to one you can trust. They are not the same skill, and only one of them is scarce now.

Production is running the regression, cleaning the data, building the model, making the chart. That's gone. AI does it faster, and without a PhD. If your value is that you can build the dashboard, you're already competing with a chat box that builds it in a minute. Concede that ground. Defending it is how you lose.

Tradecraft is the rest: the discipline of reasoning under uncertainty when the data is partial, the stakes are real, and being confidently wrong is expensive. It is a learnable, teachable method, and it is almost entirely missing from how we hire and train in this field. See [[Tradecraft is the analytical edge, not human judgment or empathy]].

What it actually looks like, in our work

A few of the moves, translated out of the spy world and into ours.

Make the hypotheses compete. The instinct, in intelligence and in HR, is to pick the explanation that feels right and then go collect support for it. Attrition's up, so it's comp. Engagement dipped, so it's the new manager. The tradecraft move is to lay out every plausible explanation at once and make them fight for the same evidence, because the most likely answer is the one with the least evidence against it, not the one with the most evidence for it. Most of the "support" you'd gather for your favorite story is equally consistent with three others. That fact alone, that most evidence isn't diagnostic, would change half the readouts I've seen.
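A minimal sketch of that scoring rule, assuming a simple consistency matrix. The hypotheses, evidence items, and ratings are invented for illustration, not taken from a real case. The only point is the mechanics: rank explanations by the evidence against them, and flag evidence that fits every explanation equally as non-diagnostic.

```python
# Illustrative only: hypotheses, evidence items, and ratings are made up.
# Ratings: "C" = consistent, "I" = inconsistent, "N" = neutral.
HYPOTHESES = ["comp", "new_manager", "workload"]

MATRIX = {
    "exit surveys mention pay":           {"comp": "C", "new_manager": "C", "workload": "C"},
    "attrition concentrated in one org":  {"comp": "I", "new_manager": "C", "workload": "C"},
    "leavers took lateral-pay jobs":      {"comp": "I", "new_manager": "C", "workload": "C"},
    "overtime hours flat year over year": {"comp": "N", "new_manager": "N", "workload": "I"},
}


def inconsistency_counts(matrix, hypotheses):
    """Count the evidence items that cut against each hypothesis."""
    return {
        h: sum(1 for ratings in matrix.values() if ratings[h] == "I")
        for h in hypotheses
    }


def non_diagnostic(matrix):
    """Evidence rated identically under every hypothesis cannot separate them."""
    return [e for e, ratings in matrix.items() if len(set(ratings.values())) == 1]


def rank(matrix, hypotheses):
    """Order hypotheses from least to most evidence against."""
    counts = inconsistency_counts(matrix, hypotheses)
    return sorted(hypotheses, key=lambda h: counts[h])


print(rank(MATRIX, HYPOTHESES))
print(non_diagnostic(MATRIX))
```

In this toy matrix the pay comments in exit surveys look like support for the comp story, but they are consistent with all three explanations, so they do no work. The comp story ends up last because it is the one with the most evidence against it.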

Hunt for what would disconfirm you. People don't naturally look for the evidence that kills their theory. They look for the evidence that confirms it, and then discount whatever doesn't fit. In a field where the model is going to inform who gets retained, promoted, or managed out, that habit isn't a quirk, it's a liability.

Notice the dog that didn't bark. Intelligence taught me to weigh the evidence that should be there and isn't. In PA we obsess over the signal in front of us and almost never ask what we'd expect to see if our story were true, and whether we're actually seeing it.

Put confidence in numbers, not words. This is the one I'd tattoo on the field. My favorite example isn't even mine. In 1951, a US estimate called a Soviet move a "serious possibility." Sherman Kent, bothered, asked the board members who'd all signed that sentence what odds they'd actually meant. Their private answers ranged from 20 percent to 80 percent. They had unanimously agreed on a phrase while disagreeing by a factor of four about the odds. "Possible," "likely," "significant risk" are empty shells; every reader fills them with the number they already believed. Put a number on it and you can be measurably wrong, which is exactly why people avoid it, and exactly why it matters. See [[Kent's board agreed on "serious possibility" while privately meaning 20 to 80 percent]].

Start outside, then go inside. When a manager forecasts a project, a hire, or an attrition wave, the instinct is to stare at the specifics of this case. The disciplined move is to start with the base rate, the outside view, and only then adjust for what's unique here. It sounds crude and it works embarrassingly well. A film studio predicted box-office revenue off nothing but genre, cast, and storyline, a reference class of comparable films, and landed a 25% mean error from basically a poster and a paragraph. The UK now mandates this kind of reference-class forecasting for transport projects because the inside view was reliably optimistic and reliably wrong.

Say in advance what would change your mind. A judgment with no stated tripwire is unfalsifiable, which means it's unaccountable. Name the milestone that would flip your call before events arrive, and you can't quietly rationalize it away later.

None of this is exotic. It's the same family as a premortem, a red team, a calibrated forecast scored against what actually happened. Tetlock's forecasting tournaments found that what separates people who are genuinely good at prediction from people who just sound authoritative isn't intelligence or access. The single best marker was granularity: the good ones could tell a 55% from a 45%, and they treated probability as a skill worth grinding on. That's a learnable habit, not a gift.

Why it's worth more now, not less

I wrote a version of this in a comment last week and it's the part I'm most sure of. Large language models are satisficers. They grab the most plausible story, confirm it, ignore what's missing, and stay confident while they're wrong. That is the precise failure mode that gets trained out of you in intelligence analysis, and the machine has now industrialized it. It produces the confident, plausible, unchecked answer at a scale and speed no human ever could.

Here's the concrete version in our own backyard. Point an LLM at a resume and it carries an implicit prior that the candidate is roughly a coin flip. But in a real software pipeline, maybe 5 to 10% of applicants actually clear the bar. So a candidate whose chance of being qualified should read as about 15% comes back as 60%, a four-fold inflation, and now your funnel is stuffed with confident false positives that each cost real interviewer time. The model isn't lying. It's satisficing, and it has no idea what the base rate is. The person who catches that is doing tradecraft, not production.
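The arithmetic behind that inflation can be sketched with Bayes' rule in odds form. This assumes the model's 60% behaves like a posterior computed from a 50% prior, which is a simplification of whatever the model is really doing, and the function name `rebase` is mine, not a library call.

```python
def to_odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def rebase(score, assumed_prior, true_base_rate):
    """Re-express a score computed under one prior using a different prior.

    The evidence is summarized as a likelihood ratio: how far the score
    moved the odds away from the prior the scorer implicitly assumed.
    """
    likelihood_ratio = to_odds(score) / to_odds(assumed_prior)
    return to_prob(to_odds(true_base_rate) * likelihood_ratio)


# Hypothetical numbers: a raw 60% score, an implicit coin-flip prior,
# and a 10% base rate (the top of the 5 to 10% range).
corrected = rebase(score=0.60, assumed_prior=0.50, true_base_rate=0.10)
print(f"{corrected:.1%}")
```

Under these assumptions the same evidence that produced a 60% reading is worth roughly 14% at a 10% base rate, and about 7% at a 5% base rate, which is the neighborhood of the 15% figure above.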

So the value of the discipline that catches the error goes up. When anyone can generate a polished, wrong analysis in seconds, the rare and expensive skill is being the person who can tell that it's wrong and show why. AI didn't make tradecraft obsolete. It made tradecraft the job.

This also clears up the tired "human judgment is the edge" line. It's half right. The edge is human, but not the soft version people mean by it. Empathy and intuition are not the moat. Confident intuition is exactly what the machine mass-produces now. The moat is judgment with the method put back in.

The uncomfortable proof: let the machine combine, keep the judgment

There's a body of evidence here that every person in this field should have to sit with, because it stings. Grove and Meehl went through 136 studies pitting expert "clinical" judgment against a mechanical rule. The rule won or tied 94% of the time. Human intuition won outright in fewer than 6%. In hiring specifically, moving from holistic judgment to a mechanical rule lifts accuracy 10 to 13% and can raise predictive validity by more than half, and the unstructured interview we all trust has a near-zero correlation with how people actually perform.

Then the part that should change how we think about our own value: the model keeps its edge even when the humans are handed more data than the model uses. Read that twice. The problem was never that people lacked information. It's that human brains combine information inconsistently, and a dumb consistent rule beats a smart inconsistent one.

The instinct is to read that as an argument against us. It's the opposite, if you read it carefully. It says stop fighting for the part you're bad at. Combining the inputs into a score is the machine's job now; concede it the way we concede the regression. What's left for the human is the part those studies quietly depend on and never name: choosing what to predict, picking predictors that are actually valid, checking the rule for the bias it inherited, and knowing when this case sits outside the reference class the model was built on. That's not production. That's tradecraft, and it's where the human belongs.

The field doesn't select or train for any of this

Here's what worries me. We hire for statistics, SQL, storytelling, business partnership. I have never once seen a people analytics job description screen for calibration, for the instinct to disconfirm, for the discipline of competing hypotheses. We test whether someone can build the model. We don't test whether they know when to distrust it.

And the field's center of gravity, its relational, humanistic instinct, actively resists the cold parts of this. The ethnographic work on HR analytics describes people softening or quietly ignoring the rigorous output because it threatens how they see their own role. You can't import a discipline that the profession experiences as an attack on its identity. That's the real barrier, and it's deeper than a skills gap.

The frustrating thing is how cheap the fix is. In Tetlock's tournaments, a debiasing module that took under an hour improved people's real forecasting accuracy by 6 to 11% and held up across four years. McKinsey looked at more than a thousand big corporate investments and found that the organizations that actually worked to strip bias out of their decisions earned up to seven percentage points more in returns. This isn't a moonshot capability that takes a decade to build. It's an afternoon of training and a habit nobody's bothering to instill.

How you'd actually build it

The thing intelligence got right is that tradecraft isn't a personal talent you hope to hire. It's an institutional habit you can engineer. A few moves that travel directly:

Make competing hypotheses mandatory on the high-stakes calls. If an analysis is going to move real money or real careers, require that the rejected alternatives get named and the reasons written down. "Here's what I concluded" is half a deliverable. "Here's what I ruled out and why" is the other half. See [[Real knowledge means knowing why the alternatives were rejected]].

Review across the lines. Have someone outside the team read the work precisely because they don't share its assumptions. The people closest to a problem share the same blind spots; that's why intelligence built peer review by analysts from entirely different desks.

Calibrate people, literally. There's a known drill: have analysts give 90% confidence intervals on a pile of questions, score whether the truth actually lands inside the interval 90% of the time, and repeat with feedback until their stated confidence matches their real accuracy. Most people start wildly overconfident and tighten up fast. It's the cheapest quality intervention I know of, and almost no PA team runs it.
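Scoring the drill takes a few lines. A minimal sketch, assuming each answer is recorded as a (low, high, truth) triple; the ten responses below are made up to show a typical overconfident first round.

```python
def interval_hit_rate(answers):
    """Share of stated intervals that actually contain the true value.

    answers: list of (low, high, truth) tuples, one per question.
    """
    hits = sum(1 for low, high, truth in answers if low <= truth <= high)
    return hits / len(answers)


# Hypothetical first-round responses from one analyst, all stated as
# 90% confidence intervals.
responses = [
    (5, 10, 12),
    (100, 150, 120),
    (0.2, 0.3, 0.45),
    (30, 40, 38),
    (1, 3, 2),
    (50, 60, 71),
    (12, 18, 9),
    (200, 400, 350),
    (7, 9, 8),
    (20, 25, 31),
]

STATED_CONFIDENCE = 0.90
hit_rate = interval_hit_rate(responses)
print(f"stated {STATED_CONFIDENCE:.0%}, actual {hit_rate:.0%}")
```

Here the analyst claims 90% and lands 50%. The feedback loop is just showing people that gap and asking them to widen their intervals until the two numbers meet.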

Keep score, and score the outcome. Run post-mortems against what actually happened and track your forecasts with a real number. One wrinkle worth knowing: a long Tetlock experiment found that holding forecasters accountable for getting it right beat holding them accountable for following the right process, by about double. Process accountability quietly pushes people toward defensible conformity; outcome accountability pushes them to figure out what actually works. Grade the call, not just the method.
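One way to put a real number on a forecast record is the Brier score, the mean squared gap between the probability you stated and what happened. I'm assuming that metric here as one common choice; the forecasts in the example are invented.

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities and outcomes.

    forecasts: list of (probability, outcome) pairs, where outcome is 1 if
    the event happened and 0 if it did not. Lower is better; 0 is perfect.
    """
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)


# Hypothetical track record: four attrition calls and how they resolved.
record = [(0.8, 1), (0.3, 0), (0.6, 0), (0.9, 1)]

# Baseline: the same four outcomes, hedged at 50% every time.
hedged = [(0.5, outcome) for _, outcome in record]

print(f"analyst: {brier_score(record):.3f}")
print(f"always 50%: {brier_score(hedged):.3f}")
```

Always answering 50% scores 0.25 on yes-or-no questions, so that is the bar a forecaster has to beat to show their numbers carry information.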

Train the reasoning, not the tools. We assume that someone who knows the methods knows how to think. Intelligence learned, expensively, that this is false, and that the thinking has to be taught on its own.

Why this is the part that survives

I've argued elsewhere that the ownership fight for workforce intelligence is mostly already lost, that Finance and Engineering are taking the high-value mandate while the routine work diffuses into every HRBP's job. I still think that's true. See [[The default future for people analytics is quiet dismemberment]] and [[Cheap production barbells a field, it doesn't raise the bar evenly]].

But there's one seat that holds, and tradecraft is the thing that lets you sit in it. Finance will model capacity and miss the humans. Engineering will measure throughput and miss the tail. Their people-models will encode old bias and break on messy data the moment they touch a real hire or fire, and that failure is legally radioactive. The durable role isn't authoring workforce intelligence. It's being the one with the discipline to catch when everyone else's version of it is confidently wrong, before it becomes a headline or a lawsuit. See [[The defensible seat is auditing people-models, not authoring them]].

That seat is real. It also only goes to people who actually have the tradecraft, and right now we are neither selecting for it nor teaching it. We borrowed the word "intelligence." We should have stolen the method. There's still time to steal it, but the door is the same one that's closing on the rest of the field, so I wouldn't wait.