Humans should choose what to measure; machines should combine the scores
Grove and Meehl reviewed 136 studies comparing expert clinical judgment against a simple mechanical rule. The rule won or tied 94% of the time. Human intuition beat the rule outright in fewer than 6% of cases. In hiring specifically, shifting from holistic to mechanical scoring lifts accuracy 10-13% and can more than double predictive validity. The unstructured interview, which most practitioners trust, correlates near zero with actual job performance.
The instinct is to read this as an argument against human analysts. It isn't. It is an argument for conceding the combining step. A dumb, consistent rule beats a smart, inconsistent one. Humans weight the same inputs differently on different days, while a rule applies the same weights every time, and once the inputs are chosen that consistency is what matters most.
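As a minimal sketch of what "a dumb, consistent rule" can look like, the Python below standardizes each predictor and averages them with equal weights. The predictor names, the sample candidates, and the choice of equal weights are all assumptions made for illustration; the studies cited above used a variety of mechanical rules, and this is not a reconstruction of any one of them.

```python
from statistics import mean, pstdev

# Hypothetical predictors, chosen by a human upstream. Names are illustrative.
PREDICTORS = ("work_sample", "structured_interview", "cognitive_test")


def standardize(values):
    """Convert raw scores to z-scores so no predictor dominates by scale."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # Every candidate scored the same, so this predictor carries no signal.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def mechanical_scores(candidates):
    """Score every candidate with the same equal-weight rule (an assumption)."""
    columns = {p: standardize([c[p] for c in candidates]) for p in PREDICTORS}
    return [
        mean(columns[p][i] for p in PREDICTORS)
        for i in range(len(candidates))
    ]


# Made-up ratings, used only to exercise the rule.
candidates = [
    {"name": "A", "work_sample": 7, "structured_interview": 4, "cognitive_test": 110},
    {"name": "B", "work_sample": 5, "structured_interview": 3, "cognitive_test": 100},
    {"name": "C", "work_sample": 9, "structured_interview": 5, "cognitive_test": 120},
]

scores = mechanical_scores(candidates)
ranking = sorted(
    zip((c["name"] for c in candidates), scores),
    key=lambda pair: pair[1],
    reverse=True,
)
```

The rule has no mood and no memory of the previous candidate: the same inputs produce the same scores on every run. Which predictors go into `PREDICTORS` is still a human decision.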
The studies depend on something they never name: someone chose what to predict, picked predictors that are actually valid, and checked the rule for the bias it inherited from the training data. None of that is the repetitive scoring step. All of it requires judgment. That's where the human belongs, and AI doesn't change it.
Source claim: Mechanical rules match or beat human judgment at combining inputs into scores 94% of the time, but the human role is upstream: choosing what to predict, picking valid predictors, and ensuring the rule isn't encoding old bias.