The artifact — the essay, the examination, the project, the recorded performance — has served as the primary evidence of genuine learning for as long as formal education has existed. This evidential relationship rested on a causal assumption: only the process of genuine learning could produce the artifact. Generative AI has broken this assumption. The artifact can now be produced without the process. Any assessment infrastructure built solely on artifact analysis therefore has a finite and shrinking evidentiary lifespan as generation technology improves.
This paper argues that process friction traces — the behavioral signatures that genuine human cognitive engagement leaves in observable data — constitute an independent evidence stream that compensates for what the artifact can no longer reliably tell us alone. We develop the Genuine Learning Probability (GLP) framework, a probabilistic methodology specifying seven observable friction components: temporal engagement pattern, error trajectory coherence, cross-context transfer, uncertainty calibration, social knowledge texture, retrieval strength decay signature, and scaffolding response curve.
The paper makes four claims. First, genuine human learning at each cognitive tier leaves characteristic friction traces arising from the irreducible complexity of the neurological processes that constitute learning. Second, these traces are partially independent of artifact quality and provide information about genuine engagement that the artifact alone does not contain. Third, the composite GLP score is substantially more robust to gaming than any individual component because manufacturing all friction traces simultaneously without performing the underlying cognitive work is essentially equivalent in cost to performing that work. Fourth, the GLP framework is not a replacement for artifact-based assessment but a formally specified second evidence stream that instructors can combine with artifact evidence in whatever proportion their professional judgment warrants.
1. Introduction
The artifact used to be proof of the process. It no longer is.
For most of human educational history this statement would have been meaningless. The essay demonstrated thinking because only thinking could produce the essay. The proof demonstrated mathematical understanding because only mathematical understanding could produce the proof. The clinical note demonstrated clinical reasoning because only clinical reasoning could produce the clinical note. The artifact and the process that produced it were causally coupled tightly enough that measuring one was effectively measuring both.
Generative AI has severed this coupling. A well-structured essay can now be produced in seconds by a system that has performed none of the cognitive work the essay was designed to evidence. A correct proof can be generated by a system with no mathematical understanding in any sense that matters for the student's development. The artifact exists. The process that should have produced it did not occur.
This decoupling is not a temporary condition that will resolve as detection technology improves. It is a permanent structural change driven by the continuous improvement of generation technology. The forensic window — the period during which artifact analysis can reliably distinguish AI-generated from human-generated work — is closing sequentially across domains. In writing it is largely closed already. In code it is closing. In visual art it closed years ago.
Any detector trained on current AI outputs becomes obsolete as generation technology advances. The arms race between generation and detection has a predictable winner. This paper proposes measuring what the artifact used to be evidence of directly — the process of genuine learning itself.
2. The Decoupling Problem
2.1 The Causal Chain That Broke
Traditional assessment rests on an implicit causal model: cognitive process → artifact → inference about learning.
The artifact was never the thing we cared about. We cared about the cognitive process — the schema formation, the conceptual development, the capacity for transfer. The artifact was valuable as evidence because it was causally downstream of the process. Generative AI inserts a bypass: prompt → generative model → artifact, with no cognitive process upstream.
The artifact now has two causal pathways. One passes through genuine cognitive process. The other bypasses it entirely. The artifact is identical at the end of both pathways — or will be, as generation technology improves.
2.2 Why Detection Cannot Solve This
Artifact-based AI detection attempts to distinguish the two causal pathways by analyzing properties of the artifact. These approaches face three structural limitations:
- They are temporally bounded. Every detection methodology is trained on current generation technology. Generation technology improves continuously. Detection is always fighting the last war.
- They answer the wrong question. The educationally relevant question is not whether a human typed these words but whether a human developed this understanding.
- They create perverse incentives. Students learn to game the detector rather than engage with the material. The simulation gets better over time as students share strategies.
2.3 The Bjorkian Insight Applied
Robert and Elizabeth Bjork's foundational distinction between performance and learning is directly relevant. Performance is the observable, often temporary fluctuation in behavior during or immediately after instruction. Learning is the more permanent change in knowledge that supports subsequent access and transfer in novel contexts.
The artifact measures performance. AI assistance is the limiting case: it maximizes performance while minimizing — potentially eliminating — the learning process. What we need to measure is learning, not performance. Learning leaves traces that performance does not. These traces are not in the artifact. They are in the data that surrounds the artifact's production.
3. The Neurobiological Foundation of Friction Traces
3.1 Why Genuine Learning Leaves Traces
Friction traces are not metaphorical. They are behavioral consequences of neurobiological events that constitute genuine learning.
Dopamine neurons fire in response to prediction errors — discrepancies between what the learner expected and what they encountered. This phasic dopamine release facilitates NMDA receptor trafficking and initiates long-term potentiation. BDNF expression is upregulated as much as 2.8-fold during moderate cognitive challenge. Dendritic spine formation increases by 37% under moderate cognitive load compared to low-load conditions. An AI can produce the artifact without triggering any of these events. It cannot produce the behavioral traces those events leave, because the events did not occur.
3.2 The Storage-Retrieval Distinction
Bjork's New Theory of Disuse distinguishes storage strength — how thoroughly a memory is encoded and integrated — from retrieval strength — how accessible the memory currently is. High retrieval strength immediately after learning does not indicate high storage strength. A student who processed an AI explanation has high retrieval strength in the short term, but without the effortful encoding that builds storage strength, retrieval strength decays rapidly and the spacing effect — the benchmark of genuine learning — does not appear.
3.3 The Fluency Trap as a Measurement Problem
The brain confuses perceptual fluency — the ease with which information is processed — with understanding. This fluency trap has a counterintuitive implication: in the AI era, high artifact quality achieved through borrowed fluency may be mild negative evidence of genuine learning, because genuine struggle with difficult material characteristically produces roughness — places where the student lost the thread, found it again, approached the same concept from multiple angles before landing on a formulation.
The smooth, well-structured artifact may be evidence of borrowed certainty. The rough, searching artifact may be evidence of genuine engagement.
4. The Genuine Learning Probability Framework
4.1 Foundational Definitions
Let S denote a student, C a concept or skill being learned, and ℒ a learning episode spanning observation window Ω = [t₀, t₀ + τ]. Define the cognitive engagement state E as the set of neurological and behavioral processes activated during ℒ.
- GLP is a property of the engagement process, not the artifact.
- GLP is probabilistic and continuous — scored in [0,1] with an explicit credible interval.
- GLP is tier-sensitive — calibrated to the cognitive tier the activity is designed to develop.
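These definitions can be captured in a minimal data sketch. The class names and fields below are illustrative, not part of the formal specification:

```python
from dataclasses import dataclass

@dataclass
class LearningEpisode:
    """A learning episode spanning observation window [t0, t0 + tau]."""
    student_id: str
    concept: str
    t0: float       # window start
    tau: float      # window length
    tier: int       # cognitive tier the activity is designed to develop (1-7)

@dataclass
class GLPScore:
    """A probabilistic, continuous GLP score with an explicit credible interval."""
    point: float    # GLP in [0, 1]
    ci_low: float
    ci_high: float

    def __post_init__(self):
        # Enforce the framework's basic invariant on score and interval.
        if not (0.0 <= self.ci_low <= self.point <= self.ci_high <= 1.0):
            raise ValueError("require 0 <= ci_low <= point <= ci_high <= 1")
```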
4.2 The Seven Components
Y₁ · Temporal engagement pattern. Genuine engagement with material of varying complexity produces characteristic time-on-task distributions. Intrinsic cognitive load — the element interactivity of the material — predicts engagement time when processing is genuine. Borrowed certainty decouples time from difficulty: the student spends time proportional to explanation length, not conceptual challenge.
Y₂ · Error trajectory coherence. The reward prediction error mechanism produces coherent error evolution during genuine learning. Each error is a prediction violation that updates the mental model. Because updates are cumulative, the error trajectory follows a path reflecting the concept's structure — early errors reflect initial misconceptions, later errors reflect more sophisticated partial understandings.
Y₃ · Cross-context transfer. Transfer — applying knowledge in novel contexts — is the Bjorkian definition of learning. Schema formation through germane cognitive load produces representations that generalize across surface variations. Borrowed certainty produces surface representations tied to the specific context of the AI explanation. The transfer gap ρ_near − ρ_far is independently diagnostic: large positive values indicate surface representation without schema.
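The transfer gap can be computed directly from item-level outcomes. The scores below are hypothetical:

```python
def transfer_gap(near_scores, far_scores):
    """rho_near - rho_far: success rate on near-transfer items minus
    success rate on far-transfer items. A large positive gap suggests
    a surface representation without an underlying schema."""
    rho_near = sum(near_scores) / len(near_scores)
    rho_far = sum(far_scores) / len(far_scores)
    return rho_near - rho_far

# 1 = item solved, 0 = not solved (two hypothetical students):
schema_gap  = transfer_gap([1, 1, 1, 0], [1, 0, 1, 1])  # generalizes across surfaces
surface_gap = transfer_gap([1, 1, 1, 1], [0, 1, 0, 0])  # tied to the taught surface
```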
Y₄ · Uncertainty calibration. Genuine learning through effortful retrieval and prediction error produces calibrated uncertainty — the student learns not just what is correct but what they know and do not know. Borrowed certainty produces systematic overconfidence — the student inherits the AI's confidence distribution without the knowledge base that would justify it.
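A crude but serviceable summary of this signature is mean stated confidence minus observed accuracy. The values are hypothetical, and the framework's Y₄ component uses isotonic regression rather than this single number:

```python
def overconfidence(confidences, correct):
    """Mean stated confidence minus observed accuracy. Near zero indicates
    calibration; a large positive value is the borrowed-certainty signature."""
    return sum(confidences) / len(confidences) - sum(correct) / len(correct)

calibrated = overconfidence([0.9, 0.6, 0.8, 0.5], [1, 1, 1, 0])   # roughly zero
inherited  = overconfidence([0.95, 0.9, 0.95, 0.9], [1, 0, 1, 0])  # strongly positive
```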
Y₅ · Social knowledge texture. Genuine encounter with material leaves a characteristic texture in social and discursive contexts — specific confusions, particular connections to prior knowledge, questions that arose from genuine engagement. This texture cannot be manufactured without having had the experience of genuinely encountering the material.
Y₆ · Retrieval strength decay signature. The spacing effect is the benchmark of genuine learning. Borrowed certainty has no storage strength to retrieve. Performance decays monotonically and the spacing effect is absent. At the individual level the decay curve shape is diagnostic independently of the experimental design.
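The decay-curve shape can be summarized by a fitted exponential decay rate. The retention data below are invented to illustrate the contrast, not drawn from the labeled corpus:

```python
import math

def fitted_decay_rate(days, retention):
    """Least-squares slope of log(retention) against days, negated, assuming
    retention(t) ~ exp(-k * t). Larger k means faster forgetting."""
    logs = [math.log(r) for r in retention]
    n = len(days)
    md, ml = sum(days) / n, sum(logs) / n
    num = sum((d - md) * (l - ml) for d, l in zip(days, logs))
    den = sum((d - md) ** 2 for d in days)
    return -num / den

days = [0, 2, 7, 14]
genuine  = [0.95, 0.90, 0.82, 0.74]  # shallow decay: storage strength present
borrowed = [0.95, 0.60, 0.30, 0.12]  # steep monotone decay: retrieval strength only
```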
Y₇ · Scaffolding response curve. The Zone of Proximal Development is a structural property of a genuinely developing mental model. A student with genuine partial understanding has a ZPD — a region of near-competence that targeted scaffolding can activate. Borrowed certainty has no ZPD because there is no developing model for scaffolding to connect to.
5. The Ensemble Architecture
5.1 Why Ensemble Rather Than Single Model
The seven GLP components have different statistical structures, different data types, and different failure modes. More importantly, the components fail — can be gamed — in different ways. A student gaming all seven simultaneously is performing work that approaches the cost of genuine engagement — at which point the gaming has become indistinguishable from learning in the only sense that matters.
5.2 The Three-Layer Architecture
| Layer | Function | Algorithm | Output |
|---|---|---|---|
| Layer 1 — Component Models | One base model per friction component, using the algorithm suited to that component's data structure | Survival analysis (Y₁) · Graph model (Y₂) · Gradient boosting (Y₃) · Isotonic regression (Y₄) · NLP model (Y₅) · Mixed effects longitudinal (Y₆) · Causal inference (Y₇) | P(E genuine \| Yᵢ) |
| Layer 2 — Tier-Conditioned | Seven combination models, one per cognitive tier, that learn optimal weighting of Layer 1 outputs conditional on which tier the activity develops | Tier-specific ensemble weighting — learned from labeled data | P(E genuine \| Y, tier) |
| Layer 3 — Meta-Model | Final combination model; handles missing components gracefully by widening credible interval to reflect reduced information | Meta-ensemble | GLP ∈ [0,1] with credible interval |
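A minimal sketch of Layers 2 and 3 follows. The weights here are illustrative constants, not the learned tier-conditioned weights the architecture specifies; the point is the interval-widening behavior for missing components:

```python
def combine_glp(component_probs, weights, base_halfwidth=0.05):
    """Weighted combination of per-component posteriors P(E genuine | Y_i).
    Missing components (None) are dropped from the point estimate, and the
    credible interval widens to reflect the reduced information."""
    present = [(p, w) for p, w in zip(component_probs, weights) if p is not None]
    total_w = sum(w for _, w in present)
    point = sum(p * w for p, w in present) / total_w
    # Widen the interval in proportion to the weight of missing evidence.
    missing_frac = 1 - total_w / sum(weights)
    half = base_halfwidth + 0.5 * missing_frac
    return point, (max(0.0, point - half), min(1.0, point + half))
```

With all seven components observed at 0.8 the interval is narrow; drop two components and the point estimate stays put while the interval widens.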
5.3 The Instructor as Meta-Model
The ensemble architecture produces a formally specified GLP score. The instructor receives this score alongside the artifact quality score and combines them into an overall assessment judgment. The appropriate weighting depends on: the learning objectives; the cognitive tier being developed; the stakes; and the context. The paper provides the second evidence stream. The weighting belongs to the educator.
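The combination step itself can be as simple as a convex blend, with the weight left entirely to the instructor. The function and the example weights below are a hypothetical sketch, not a prescription:

```python
def overall_assessment(artifact_score, glp_score, process_weight):
    """Convex combination of artifact quality and GLP. process_weight in [0, 1]
    encodes the instructor's judgment about objectives, tier, stakes, and context."""
    if not 0.0 <= process_weight <= 1.0:
        raise ValueError("process_weight must lie in [0, 1]")
    return process_weight * glp_score + (1 - process_weight) * artifact_score
```

A high-stakes capstone targeting deep understanding might set process_weight near 0.5; a low-stakes drill might leave it near zero.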
6. Tier Calibration
Each tier of the Irreducibly Human framework engages distinct cognitive and neurological processes. GLP measurement without tier calibration conflates fundamentally different kinds of cognitive work.
| Tier | Primary Cognitive Process | Primary GLP Components | Assessment Note |
|---|---|---|---|
| 1 · Pattern | Statistical regularity detection | Y₁, Y₃ | Least diagnostically useful — pattern recognition is AI's home territory |
| 2 · Embodied | Sensorimotor schema formation | Adapted Y₇, performance variation | Standard components must be adapted to physical performance contexts |
| 3 · Social | Genuine intersubjective cognition | Y₅ (primary) | Social texture most resistant to manufacturing; requires genuine contact with another perspective |
| 4 · Metacognitive | Oversight of one's own cognitive processes | Y₄, Y₂ (primary) | Calibration that develops over the course is the characteristic trajectory |
| 5 · Causal | Counterfactual and interventionist reasoning | Y₃ (primary) | Transfer is the primary diagnostic — causal understanding enables cross-surface generalization |
| 6 · Collective | Emergent group intelligence | Group-level analysis required | Individual GLP measures are structurally inadequate; the diagnostic is contribution patterns showing genuine interdependence |
| 7 · Wisdom | Practical judgment under genuine stakes | Decision histories; expressed uncertainty specificity | Standard assessment almost entirely inappropriate — stakes define the tier |
7. Validation
7.1 Labeled Corpus Construction
The GLP framework requires a labeled corpus of confirmed genuine and confirmed borrowed engagement cases. Confirmed genuine engagement draws from students with documented engagement trajectories showing convergent evidence across multiple components. Confirmed borrowed certainty draws from documented cases, including students who submitted AI-generated work later acknowledged in academic integrity proceedings, and from experimental conditions in which students were explicitly instructed to use AI without engaging with the material.
7.2 The Information Gain Test
The central empirical claim is that process measurement adds independent information about genuine learning beyond what the artifact provides. This claim is directly testable: on held-out labeled cases, a predictor given both artifact quality and the GLP components should outperform a predictor given artifact quality alone.
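A minimal version of the information gain test, on entirely hypothetical held-out cases: compare the log loss of an artifact-only predictor against one that also sees GLP. The blend is deliberately naive; only the comparison matters.

```python
import math

def log_loss(probs, labels):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    eps = 1e-9
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / len(labels)

# Hypothetical held-out cases: (artifact quality, GLP, confirmed genuine?)
cases = [(0.9, 0.85, 1), (0.9, 0.20, 0), (0.6, 0.75, 1),
         (0.8, 0.15, 0), (0.7, 0.80, 1), (0.95, 0.25, 0)]

labels        = [y for _, _, y in cases]
artifact_only = [a for a, _, _ in cases]                # artifact quality read as P(genuine)
with_glp      = [(a + 2 * g) / 3 for a, g, _ in cases]  # naive blend, for illustration

# Information gain shows up as lower held-out loss for the blended predictor.
```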
8. The Arms Race Problem and Why This Framework Is Different
For artifact-based detection the arms race objection is decisive. Generation technology improves continuously; detection technology is always calibrated to past outputs. The window closes.
For process-based measurement the objection is substantially weaker for two reasons. First, manufacturing convincing friction traces across all seven components simultaneously without performing the underlying cognitive work is essentially equivalent in cost to performing the underlying cognitive work. Second, the framework is not trying to detect AI use. It is trying to measure genuine learning directly.
A student who manufactures all seven friction traces has, in the process, learned the material. At that point the gaming has become indistinguishable from learning in the only sense that matters. The framework has not been defeated. It has been satisfied.
9. Discussion
9.1 What This Paper Is Not Claiming
- The paper is not claiming that artifacts are worthless. The artifact has not become zero evidence. It has become insufficient as the sole evidence.
- The paper is not claiming that all instructors must adopt the full GLP framework. The framework specifies what is possible.
- The paper is not claiming that process measurement is always more informative than artifact measurement. The claim is that process measurement adds independent information, not that it always adds more.
- The paper is not claiming that the GLP framework can replace the instructor's judgment. The instructor is the meta-model.
9.2 The Institutional Design Implication
If process friction traces are independent evidence of genuine learning, then institutional assessment infrastructure should be designed to make those traces observable. This means: longitudinal process documentation as primary rather than supplementary evidence; embedded formative assessment as the primary data source; developmental trajectory as credential.
Portfolio assessment, formative evaluation, and developmental credentialing have been advocated for decades. The argument of this paper is that the AI decoupling makes them urgent rather than merely desirable — that the institutional cost of not building process-observable assessment infrastructure is now the progressive obsolescence of artifact-based credentialing.
9.3 The Ethics of Process Observation
Process observation for the purpose of supporting learning is categorically different from process observation for the purpose of surveillance. The distinction is in what the data is used for and who controls it. The GLP framework is designed to support the first use. Institutional implementation must actively guard against the second.
10. Conclusion
The artifact has been decoupled from the process that used to produce it. The decoupling is irreversible, accelerating, and domain-general. Any assessment infrastructure built solely on artifact analysis has a shrinking evidentiary lifespan.
Process friction traces — the behavioral signatures that genuine human cognitive engagement leaves in observable data — are an independent evidence stream that compensates for what the artifact can no longer reliably tell us alone. They exist because genuine learning is a biological event that produces behavioral consequences. They are partially independent of artifact quality. They provide information about genuine engagement that the artifact does not contain.
The Genuine Learning Probability framework formalizes this evidence stream as a probabilistic, tier-calibrated, ensemble-based measurement methodology. It does not replace artifact assessment. It gives artifact assessment a partner. The instructor determines the weighting.
The crisis of evidence facing educational institutions is not a technical problem requiring a better AI detector. It is an epistemological problem requiring a new evidence infrastructure — one built on the process of learning rather than its products. The artifact used to be proof of the process. It no longer is. Now we must measure the struggle itself.
References
- Bjork, R.A., and Bjork, E.L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. Healy, S. Kosslyn, and R. Shiffrin (Eds.), From Learning Processes to Cognitive Processes: Essays in Honor of William K. Estes (Vol. 2, pp. 35–67). Erlbaum.
- Bjork, R.A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe and A. Shimamura (Eds.), Metacognition: Knowing about Knowing (pp. 185–205). MIT Press.
- Brown, N.B. (2026). Measuring the Friction: A Probabilistic Framework for Detecting Graph Contamination in Music Streaming Platforms. Musinique Research Trilogy, Paper III.
- Craik, F.I.M., and Lockhart, R.S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11(6), 671–684.
- Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285.
- Vygotsky, L.S. (1978). Mind in Society: The Development of Higher Psychological Processes. Harvard University Press.
The GLP framework implementation, labeled corpus, and assessment design templates are published openly at irreduciblyhuman.com. The methodology is not a secret.