The Room Where Respect Lives: On the FUSE Intervention and the Hidden Architecture of Math Classrooms
Why the Room Matters More Than the Curriculum
Consider what a mathematics classroom actually is. Not what the curriculum says it should be, not what the teacher intends it to be—but what it is, as experienced by a thirteen-year-old who already suspects they’re not good at this. It is a room where your confusion is visible. Where the wrong answer happens in front of people who matter to you. Where the teacher’s pause before responding tells you something you didn’t want to know about yourself.
This is the problem the Fellowship Using the Science of Engagement—FUSE—set out to solve. Not by rewriting the curriculum. Not by buying better technology. By changing what the room communicates.
This research began with a question that felt almost too simple: can you teach a teacher to change the culture of a classroom, and if so, does it actually move the math? The answer, emerging from a randomized controlled trial across 80 Texas public schools, 152 teachers, and 12,432 students, is yes. And the magnitude matters. Students in FUSE classrooms showed the equivalent of four additional months of learning, compared to students in classrooms that received a different, credible form of professional development. The Black-white achievement gap narrowed by 35 percent. Teacher burnout dropped by roughly half, from 34 percent in the control group to 16 percent in the treatment group.
These are not small numbers. They are the kind of numbers that should make you ask what, exactly, was being unlocked.
What the Room Was Already Doing
The FUSE intervention rests on a deceptively straightforward premise: individual beliefs about learning (growth mindset versus fixed mindset) don’t operate in isolation. They respond to environmental cues. A student may theoretically believe that intelligence can be developed, but if their classroom consistently communicates that mistakes are evidence of deficiency, that theory will never make contact with behavior. The research team calls this the “Mindset × Context” framework, and it reframes what teachers are actually doing, moment to moment, whether they know it or not.
William Schmidt’s decades of work at Michigan State University established the structural conditions for this crisis. U.S. middle school mathematics teachers, he documented, face four compounding disadvantages: instructional time misallocated toward arithmetic at the expense of algebraic foundations; preparation programs that leave many teachers under-equipped for demanding content; curricula that are “a mile wide and an inch deep”; and inequitable opportunity to learn, in which a student’s race and socioeconomic status reliably predict the rigor of instruction they receive. Schmidt died in 2025, but the conceptual scaffolding he built, including the measurement of “opportunity to learn” as a meaningful research variable, runs through the FUSE evaluation design.
What Schmidt’s framework named at the structural level, FUSE addresses at the interpersonal level. Both are necessary. You cannot fix a leaking roof by training the residents to be more comfortable with water.
The Biology of the Classroom
Here is the detail that makes FUSE feel less like an educational intervention and more like applied developmental biology: adolescents are, in a measurable physiological sense, more sensitive to respect from adults than younger children are. Pubertal maturation heightens this sensitivity. A fourteen-year-old in a math class isn’t just learning algebra—they are continuously processing signals about where they stand, whether the adult in the room sees them as capable, whether the room is safe enough for intellectual risk. A wrong answer isn’t just incorrect. It’s social information.
The FUSE program calls this the “culture of judgment and evaluation”—the default setting of most classrooms, in which the teacher functions as an evaluator and the student as a subject of evaluation. This dynamic triggers what the researchers describe as a disengagement cycle: students, to avoid the visible shame of public failure, withdraw effort. Lower effort produces lower performance. Lower performance invites more evaluative pressure. The cycle compounds.
The intervention trains teachers in five specific practices: mining mistakes for what they reveal about student thinking rather than treating them as failures to move past; surfacing student reasoning through open-ended questions; connecting behaviors like asking for help to values students already hold; normalizing confusion explicitly; and pairing high expectations with the explicit assurance of support.
None of these practices require a new textbook. They require a different orientation toward what the room is for.
The Assessment Decision and What It Reveals
There is a methodological choice embedded in the FUSE evaluation that deserves attention, because it illuminates something larger about what we’re willing to measure in education.
The researchers did not use the Texas state test, the STAAR (State of Texas Assessments of Academic Readiness), as their primary outcome measure for the one-year paper. They used a researcher-administered assessment developed by Schmidt’s center at Michigan State, aligned to international benchmarks rather than state procedural standards. The STAAR measures procedural fluency. The MSU assessment measures reasoning: whether students can interrogate their own understanding, apply principles to novel problems, and transfer knowledge across contexts.
The decision is defensible, and the researchers defend it carefully. State test data takes approximately a year to reach the Texas Education Research Center and be merged with participant records—results from the current FUSE cohorts won’t be available until 2027. More importantly, a test designed to measure procedural fluency may simply not be sensitive to the shift FUSE is trying to produce. If you’re teaching students to be comfortable with difficulty, to choose harder problems, to view confusion as a productive state, then a test that rewards algorithmic recall is testing something adjacent to your intervention, not the thing itself.
The researchers address the “over-alignment” concern directly: teachers had full instructional autonomy; the assessment’s content coverage was set by Texas state standards, which all teachers followed regardless of condition; the Schmidt lab developed items without knowing the content of the FUSE curriculum; and tests were scored blind to treatment assignment. The safeguards are real.
What makes this choice worth examining is what it implies about standard educational accountability. The STAAR exists, in part, to ensure that students are acquiring specific content knowledge aligned to state standards. It was not designed to measure whether a classroom has become a place where adolescents feel safe enough to be wrong. The FUSE team is, implicitly, arguing that the second thing matters for the first—that a culture capable of producing intellectual risk-taking will eventually produce better procedural fluency too. The longitudinal data, when it arrives, will test that argument.
I find myself taking it seriously. The mechanism is plausible. The four-month learning gain on a rigorous conceptual assessment, in a study that controlled for the presence of professional development itself, suggests something real is being moved.
What We Are Waiting For
The Texas Education Research Center is one of the largest longitudinal data systems in the country, connecting K-12 records through postsecondary enrollment to workforce outcomes and earnings. When the FUSE cohorts’ state test data is eventually merged—sometime in 2027—researchers will be able to ask whether the culture-of-learning effect persists on the official accountability measure, and whether it translates into course-taking patterns, college enrollment, STEM persistence, and eventually wages.
This is the right question, asked at the right scale, with the right infrastructure. Most educational interventions are evaluated at the point of maximum proximity: immediately after the program ends, on a measure designed to be sensitive to the program’s content. FUSE is designed to be tracked through time, which means we will eventually know whether what it changed in a seventh-grade mathematics classroom in 2024 shows up in a hiring decision a decade later.
That is a bet on mechanism, not just effect. It assumes that the capacity to engage with difficulty—to choose the harder problem, to ask for help without shame, to persist through confusion—compounds. That it is a skill, not just a mood.
The evidence, so far, suggests it might be.
What Respect Actually Costs
At approximately $25 per student per year, FUSE is not expensive. This fact sits uncomfortably next to how much we have spent, over decades, on educational technology, curriculum reform, and standardized testing infrastructure, none of which has closed the gaps Schmidt documented in the 1990s.
The uncomfortable implication is not that those investments were worthless. It is that we have been willing to spend money on things that don’t require us to change how teachers relate to students, and reluctant to invest in the things that do. Changing a curriculum is, in a certain sense, easier than changing the culture of a room—it doesn’t require a teacher to examine their own assumptions about who is capable of learning.
FUSE asks teachers to do exactly that. And the 50 percent reduction in burnout suggests that when they do—when they move from enforcer to mentor, from evaluator to collaborator—they find the work more sustainable. Which is its own finding about what we’ve been asking teachers to be.
What the room communicates is not a soft variable. It is the variable. Four months of learning, a 35 percent narrowing of the Black-white achievement gap, and half the burnout rate are not the results of better worksheets. They are what happens when a room decides that confusion is not evidence of failure.
That decision, it turns out, is available. We have just rarely made it.
Tags: FUSE intervention mathematics, classroom culture randomized controlled trial, growth mindset adolescent development, teacher burnout reduction, Texas Education Research Center longitudinal data
