The Compliance Trap: Agency, Surveillance, and the Metrics That Actually Matter
Why Most Corporate Training Is Designed to Protect the Organization, Not Develop the Employee
There’s a question worth sitting with before any discussion of learning metrics: Who is this training actually for?
Not who it claims to be for. Not who it should be for. Who it is actually, functionally, operationally designed to serve.
A lot of corporate learning doesn’t survive that question.
What Compliance Training Is Really Doing
We need to be honest about a category of workplace learning that the literature rarely names directly: training that exists not to change behavior but to document that the organization tried to change behavior. The distinction matters enormously, because these two goals require completely different designs—and produce completely different metrics.
Sexual harassment training completed by 94% of employees doesn’t mean 94% of employees won’t harass anyone. It means 94% of employees clicked through enough screens to generate a completion record that lives in a learning record store (LRS) until a plaintiff’s attorney asks for it. The training is not a learning intervention. It’s a legal artifact. And when we measure it with learning metrics—completion rates, post-test scores, knowledge retention at 30 days—we are applying a rigorous framework to a document that was never designed to produce the outcomes that framework measures.
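To make the artifact concrete, here is a minimal sketch of the completion record itself: the kind of xAPI statement a compliance module typically writes to an LRS. The learner, course URI, and score are hypothetical; the structure follows the ADL xAPI specification.

```python
# A minimal sketch of an xAPI "completed" statement, the record a compliance
# module writes to a learning record store. All identifiers are hypothetical.
completion_record = {
    "actor": {"mbox": "mailto:employee@example.com", "name": "Employee 4821"},
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/completed",
        "display": {"en-US": "completed"},
    },
    "object": {"id": "https://example.com/courses/harassment-prevention-2024"},
    "result": {"completion": True, "score": {"scaled": 0.94}, "duration": "PT42M"},
    "timestamp": "2024-03-15T14:07:00Z",
}
# Everything the record can attest to is here: presence, a quiz score, elapsed
# time. Nothing in it observes workplace behavior before or after the course.
```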
This is not cynicism. It’s diagnosis. The research in the document Robert is asking about makes the distinction visible, even if it doesn’t always name it: there is a “second digital divide,” and it runs not between those who have technology and those who don’t, but between those using technology for high-agency work and those using it for surveillance and drill. Compliance training almost always lands on the wrong side of that line.
The Surveillance Metrics and Why They Fail
The metrics Robert is asking about—completion rates, time-on-screen, post-quiz scores—are surveillance metrics. They answer a specific question: Was the employee present? They do not answer the question he’s actually asking: Did anything change?
This matters because the two questions have entirely different implications. Surveillance metrics protect the organization. Learning metrics serve the learner. When an organization treats the first as evidence of the second, it is committing a category error that no amount of data sophistication will correct.
Consider what “time spent” actually measures. High duration might indicate a confused learner clicking through mandatory content they can’t skip, a distracted employee who left a tab open, or a genuine deep engagement with difficult material. The number looks identical in all three cases. The OECD PISA research—which found that heavy digital device use actually decreased performance outcomes compared to moderate use—suggests that more screen time is frequently counterproductive. We are not just failing to measure learning. We are sometimes measuring the conditions that undermine it.
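A small sketch makes the ambiguity concrete. Assume, hypothetically, that each session can be reconstructed from page-event timestamps: naive time-on-screen is identical for all three behaviors, while even a crude idle-gap cap (an illustrative five-minute ceiling per gap, not a standard) separates them immediately.

```python
from datetime import timedelta

# Three behaviorally different sessions, as hypothetical page-event
# timestamps in seconds since session start.
sessions = {
    "confused_clicker": [0, 5, 11, 18, 26, 2700],         # rapid clicks, then a long forced wait
    "tab_left_open":    [0, 2700],                        # one event, 45 minutes idle
    "deep_engagement":  [0, 300, 640, 1100, 1800, 2700],  # steady, spaced activity
}

for label, events in sessions.items():
    # Naive time-on-screen: last event minus first. Identical in every case.
    naive = timedelta(seconds=events[-1] - events[0])
    # Engagement-adjusted variant: credit each gap at no more than 5 minutes,
    # so idle stretches stop inflating the metric.
    credited = sum(min(b - a, 300) for a, b in zip(events, events[1:]))
    print(f"{label:16s} time-on-screen={naive}  credited={timedelta(seconds=credited)}")
```

All three sessions report 45 minutes of time-on-screen; the credited figures diverge by a factor of five. The point is not that the cap is the right heuristic, it is that the raw number cannot distinguish the cases at all.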
The research is clear on what surveillance metrics do: they identify whether training occurred. They have essentially zero predictive validity for whether training transferred. And transfer is the only thing that matters.
What Agency Metrics Look Like
Robert’s question—what metrics actually show people applying what they learned—is asking for transfer metrics. Here’s what the evidence supports.
Manager-Rated Behavior Change is imperfect but honest. It requires supervisors to make specific observations about specific behaviors at 30, 60, and 90 days post-training. The research shows effective programs maintaining 70-80% knowledge retention at 90 days, versus the 20-30% that typically survives the post-training fade. If your organization doesn’t have baseline data because no one has ever measured retention beyond the immediate post-quiz, that’s your first finding. The absence of measurement is itself a measurement.
Error Rate and First Contact Resolution work because they measure consequences of behavior rather than behavior itself. If a customer service team completes communication training and their First Contact Resolution rate doesn’t move, the training either didn’t transfer or the behavior it targeted wasn’t the behavior causing poor FCR. Both conclusions are useful. Neither is visible in a completion report.
Internal Mobility Rate is a longer-range signal but a revealing one. Organizations that develop skills employees actually want and use see people grow into different roles. Organizations that run compliance cycles see people leave.
Application Transfer Rate—the percentage of participants who demonstrably apply learned skills to their jobs—requires the most work but produces the most honest data. It requires partnership between L&D and supervisors, job shadowing, and performance review integration. Most organizations don’t do it because it requires cooperation across departments that rarely share accountability. That structural resistance is itself diagnostic.
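To make the contrast with completion reporting concrete, here is a minimal sketch of how these four metrics might be computed for a single cohort. Every field name and number is hypothetical; the structural point is that each input is an observation made after the training ended, outside the LMS.

```python
from dataclasses import dataclass

@dataclass
class CohortOutcomes:
    """Hypothetical post-training observations for one training cohort."""
    immediate_quiz: float    # mean score immediately after training
    quiz_at_90_days: float   # mean score on the same assessment at 90 days
    fcr_before: float        # first contact resolution rate pre-training
    fcr_after: float         # FCR rate 90 days post-training
    participants: int
    observed_applying: int   # manager-verified on-the-job application
    internal_moves: int      # participants who changed roles within 12 months

def agency_metrics(c: CohortOutcomes) -> dict:
    return {
        # Retention: what fraction of measured knowledge survives to 90 days.
        "retention_90d": c.quiz_at_90_days / c.immediate_quiz,
        # Consequence metric: did the outcome the training targeted move?
        "fcr_delta": c.fcr_after - c.fcr_before,
        # Application transfer rate: demonstrable use on the job, not recall.
        "transfer_rate": c.observed_applying / c.participants,
        "internal_mobility": c.internal_moves / c.participants,
    }

cohort = CohortOutcomes(0.88, 0.66, 0.62, 0.71, 40, 23, 4)
print(agency_metrics(cohort))
# retention_90d=0.75, fcr_delta≈0.09, transfer_rate=0.575, internal_mobility=0.1
```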
The Deeper Problem: When Learning Is a Liability Management Tool
There’s a harder version of this conversation that the metrics literature tends to avoid. Some training is not designed to change behavior because the organization that commissioned it doesn’t actually want the behavior to change. It wants documentation that it tried to change the behavior.
This is not hypothetical. Annual harassment training exists in organizations where the C-suite tolerates behavior it trains against. Diversity and inclusion modules get completed by teams whose hiring practices remain unchanged. Safety compliance courses are taken by workers whose working conditions remain unsafe. The training is real. The organizational commitment to the behavior the training targets is not.
No measurement framework resolves this. The Success Case Method—Brinkerhoff’s approach of finding the best and worst performers and understanding what differentiated them—is honest enough to surface it: the method often reveals that the most successful participants had strong manager support, and the least successful faced environmental barriers. When those environmental barriers are structural and intentional, surfacing them is not a technical problem. It’s a political one.
This is where the “second digital divide” becomes genuinely clarifying. The divide between high-agency and surveillance environments isn’t primarily technological. It’s philosophical. Does the organization believe that employees are learners whose development serves mutual interests? Or does it believe they are liabilities whose completion records serve organizational interests? The training design reflects the answer. The metrics reflect the answer. The gap between the two—when an organization claims it believes the first thing and designs as if it believes the second—is the space where most corporate learning actually lives.
What Transfer Actually Requires
The PwC VR study, Boeing’s AR assembly work, Delta’s de-icing proficiency results—these are transfer stories, not completion stories. They share a common architecture: realistic simulation of the actual decision environment, immediate feedback on the consequences of choices, repetition with variation until skill becomes automatic. Boeing’s accuracy improvement from 50% to 90% on first-attempt assembly isn’t a metrics story. It’s a design story. The metrics only became meaningful because the training was designed for transfer rather than documentation.
The difference is expensive and organizational. You cannot achieve transfer metrics with surveillance-designed training by adding better analytics to it. You achieve transfer by designing for the moment a technician is standing in front of a real aircraft, or a manager is delivering real feedback, or a customer service rep is on a real call with a frustrated customer. The question the training must answer is not “Can the learner select the correct answer under controlled conditions?” but “Can the learner make the right decision when the conditions are nothing like the training?”
Will Thalheimer’s Learning-Transfer Evaluation Model (LTEM) makes this structural: Tiers 1-4—attendance, activity, perception, knowledge—are all inadequate to validate learning success. They are surveillance tiers. The model doesn’t say they’re worthless. It says they don’t tell you what you think they tell you. Tier 5, decision-making in realistic scenarios, is where validation begins. Tier 7, transfer to actual workplace behavior, is where organizational value is realized.
Most corporate training never gets past Tier 4. It’s not that organizations don’t have the data. It’s that Tier 4 data is the data that protects the organization from liability. Tier 7 data is the data that tells you whether anyone actually learned anything.
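One way to see how structural the claim is: encode the tiers and ask which tier each metric in this post can evidence. A sketch follows; the full model defines eight tiers (Tier 8 covers the effects of transfer), this illustration stops at the ones named above, and the metric-to-tier mapping is my own, not Thalheimer’s.

```python
from enum import IntEnum

class LTEMTier(IntEnum):
    # The first seven of LTEM's eight tiers; Tier 8 (effects of transfer) omitted.
    ATTENDANCE = 1
    ACTIVITY = 2
    LEARNER_PERCEPTIONS = 3
    KNOWLEDGE = 4
    DECISION_MAKING = 5
    TASK_COMPETENCE = 6
    TRANSFER = 7

# Hypothetical classification of common L&D metrics by the tier they evidence.
METRIC_TIER = {
    "completion_rate": LTEMTier.ATTENDANCE,
    "time_on_screen": LTEMTier.ACTIVITY,
    "smile_sheet_score": LTEMTier.LEARNER_PERCEPTIONS,
    "post_quiz_score": LTEMTier.KNOWLEDGE,
    "scenario_decision_accuracy": LTEMTier.DECISION_MAKING,
    "manager_verified_application": LTEMTier.TRANSFER,
}

def validates_learning(metric: str) -> bool:
    # Per the model's claim above: Tiers 1-4 cannot validate learning success.
    return METRIC_TIER[metric] >= LTEMTier.DECISION_MAKING
```

Run against the surveillance metrics from earlier, every one of them returns False. That is the category error, expressed in code.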
The Question Worth Asking Out Loud
Robert’s question—what metrics show application rather than seat time—is the right question. The harder version of it is: Why do so many organizations resist answering it?
The answer isn’t incompetence. Most L&D professionals know the difference between completion rates and transfer rates. The answer is that completion rates are defensible in court and transfer rates require organizational commitment to change that many institutions are not prepared to make.
Here’s what it means to take transfer metrics seriously: You are committing to measure whether the behavior changed. If it didn’t, you are committing to ask why. And if the answer is that the environment doesn’t support the behavior the training was designed to produce—that the manager doesn’t reinforce it, that the culture doesn’t reward it, that the incentive structure actively undermines it—then you are no longer having a conversation about learning metrics. You are having a conversation about organizational change.
That’s a harder conversation. It’s also the only conversation worth having.
Tags: learning transfer metrics, compliance training critique, agency vs surveillance learning, LTEM framework Thalheimer, corporate L&D ROI measurement
