Beyond Socratic Questioning: Five Theories of Instructional Assistance and Why the Differences Matter
An Instructional Policy Suite for the Assistance Dilemma
The SOCRIT protocol makes a compelling case for Socratic prompting—withhold the answer, force the student to generate reasoning, escalate only when they’re genuinely stuck. But Socratic resistance is one theory of learning, not the theory. The Assistance Dilemma framework (Koedinger & Aleven, 2007) names the real problem: deciding when to give versus withhold information during interactive learning. Different answers to that question produce fundamentally different instructional policies.
The Hint Ladder trusts immediate correctness feedback and graded information-giving—it holds that students learn more efficiently when they know they’re wrong and receive targeted help, not from productive struggle alone. Self-Explanation with Feedback shifts the target from answer correctness to reasoning quality, asking students to articulate the principle behind every move. Faded Worked Examples begins on the opposite end of the spectrum from Socratic resistance: show the full solution first, then systematically remove scaffolding as competence grows, reversing the assumption that withholding is always the right default. Intelligent Novice inverts the feedback loop entirely, letting safe errors proceed without flagging so students encounter the downstream consequences of their own reasoning.
These five policies aren’t variations on a theme. They encode genuinely competing hypotheses about where cognitive labor should live, when feedback accelerates learning versus when it short-circuits it, and what counts as productive struggle versus unproductive floundering.
Two of them—Socratic Resistance and the Hint Ladder—are straightforward to build right now. Both run as self-contained prompt-level behavior rules: no external state, no domain modeling, no infrastructure beyond the prompt itself. That’s where the Assistance Dilemma Policy Compiler starts. The other three require session state tracking, explanation quality rubrics, or domain error models, and they come next. But the sequencing is deliberate, not a concession. Build the two simple policies first, run them against each other on identical tasks with identical learners, and you already have something the field doesn’t: a clean empirical comparison between withholding and graded giving, controlled for everything except the theory.
The larger goal is straightforward. Any learning engineer should be able to swap one policy for another—same task, same learner, same metrics—and find out which instructional assumption actually holds for their students and their domain. Not “Socratic prompting works.” Not “hints are better.” But: here is what happened when we tried both, and here is what the data says.
The question was never whether Socratic prompting works. The question is: for whom, on what tasks, and compared to what?
Tags: assistance dilemma instructional theory, Koedinger Aleven tutored practice, hint ladder ITS design, faded worked examples expertise reversal, comparative instructional policy evaluation
Product Scope Document
Project Name: Assistance Dilemma Policy Compiler (ADPC)
Version: 1.0
Date: February 2026
Prepared By: Humanitarians AI / Nik Bear Brown
Status: Active — Phase 1 In Progress (Policies 1–2)
1. Executive Summary
This project builds a suite of interchangeable instructional interaction policies — each encoding a different theory of how to balance giving vs. withholding assistance during tutoring — plus a shared evaluation harness that runs them head-to-head on identical tasks, learners, and metrics.
Each policy is a prompt compiler:
Input: domain/task, learner level, risk tolerance, policy selection
Output: a structured prompt (or tutor config) that enforces a specific give/withhold strategy and switching criteria
The policies differ in what the tutor is allowed to do and when. The harness holds everything else constant so comparisons are credible.
Primary Use Case: Experimental comparison of instructional theories in AI-powered tutoring systems. Secondary use: direct deployment of individual policies as tutor prompts for specific learner populations.
Design Principle: All policies share the same interface. Researchers swap policies; the harness, tasks, learner archetypes, and metrics stay identical.
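The shared interface can be sketched in Python. This is illustrative only: names like `PolicyInput` and `compile_policy` are hypothetical stand-ins for the real compiler, and the prompt body is a placeholder.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the common prompt interface; `PolicyInput` and
# `compile_policy` are illustrative names, not the shipped API.

@dataclass
class PolicyInput:
    policy_id: str        # socratic | hint_ladder | self_explanation | faded_example | intelligent_novice
    domain: str           # e.g. "cancer_biology"
    task_id: str          # references the task bank
    learner_level: str    # novice | intermediate | advanced
    risk_tolerance: str   # low | medium | high
    session_state: dict = field(default_factory=dict)

VALID_POLICIES = {"socratic", "hint_ladder", "self_explanation",
                  "faded_example", "intelligent_novice"}

def compile_policy(cfg: PolicyInput) -> dict:
    """Return a structured prompt + config enforcing one give/withhold strategy."""
    if cfg.policy_id not in VALID_POLICIES:
        raise ValueError(f"unknown policy: {cfg.policy_id}")
    # A real compiler would load /policies/policy_{id}.yaml here.
    return {"system_prompt": f"[{cfg.policy_id} rules for {cfg.domain}]",
            "config": {"task_id": cfg.task_id,
                       "learner_level": cfg.learner_level}}
```

Because every policy accepts the same `PolicyInput`, swapping policies is a one-field change, which is what keeps the harness comparison credible.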
2. Project Objectives & Vision
3. Theoretical Foundation
All policies are grounded in the Assistance Dilemma framework (Koedinger & Aleven, 2007): the fundamental challenge of deciding when to give vs. withhold information or assistance during interactive learning.
The Core Trade-off
Policy Types by Interaction Class
4. Policy Architecture Overview
┌─────────────────────────────────────────────────────────────────┐
│ POLICY COMPILER LAYER │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Policy 1 │ │ Policy 2 │ │ Policy 3 │ ... │
│ │ Socratic │ │ Hint Ladder │ │ Self-Expl. │ │
│ │ Resistance │ │ (Cal. Asst) │ │ w/ Feedback │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ COMMON PROMPT INTERFACE │ │
│ │ inputs: domain, task, learner_level, │ │
│ │ risk_tolerance, policy_id │ │
│ │ output: structured prompt + config │ │
│ └──────────────────────┬───────────────────────────────┘ │
└─────────────────────────┼───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ INTERACTION LAYER │
│ (LLM tutor executes the policy during a tutoring session) │
└─────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ TRACE LOGGER │
│ (identical schema across all policies) │
└─────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ EVALUATION HARNESS │
│ Same task bank + learner archetypes + scoring rubric │
│ → learning outcomes, efficiency, behavior metrics │
└─────────────────────────────────────────────────────────────────┘
5. Policy Catalog & Tiering
Policies are tiered by implementation complexity, not priority. Tier 1 policies are pure prompt interventions. Tier 2 requires state tracking. Tier 3 requires domain modeling.
TIER 1 — Prompt-Based (No external state required)
These run as self-contained system prompts. All logic is encoded in the prompt itself.
Policy 1: Socratic Resistance (SOCRIT)
Theoretical basis: Active withholding; generative learning; productive struggle.
Core behavior: The tutor never provides direct answers or explanations. All responses take the form of questions, redirects, or minimal scaffolds. The student must generate all reasoning explicitly.
Allowed moves: clarifying question; assumption probe (“What does your answer assume?”); alternative hypothesis prompt (“What would someone argue against this?”); localization question (“Where exactly does your reasoning break down?”); one-word confirmation of correct answers (then requires student explanation).
Forbidden moves (before switch criteria are met): direct statement of correct answer; step-by-step explanation; worked example; evaluation of student explanation quality (binary feedback only).
Switch criteria (escalation to hint): student cannot localize confusion after 2 attempts at the localization question; student has made 4+ failed attempts with no evidence of strategy change; explicit distress signal detected.
Switch behavior: One targeted hint at the level of confusion localized. Returns to Socratic mode after hint.
Stop criteria: Student produces correct answer AND can explain it to a non-expert in their own words.
Safety constraints: no answer leakage before switch criteria are met; no infinite localization loops (max 2 attempts); frustration guardrail: if student expresses frustration twice, offer one hint proactively.
Status: ✅ Implemented (HELIX/SOCRIT v1) — needs harness integration
Policy 2: Calibrated Assistance / Hint Ladder
Theoretical basis: Tutored practice (Cognitive Tutor model); interactive/implicit learning; immediate feedback with graded information giving.
Core behavior: The tutor provides immediate correctness feedback after every attempt. When the student is wrong, the tutor offers graded hints in sequence — each one more specific than the last. The final hint (bottom-out) provides the answer or the step needed to get there. The student is never left to flounder.
Allowed moves (in order): (1) immediate yes/no correctness feedback; (2) Level-1 hint: principle or goal statement (“Think about what determines irreversibility here.”); (3) Level-2 hint: narrowed guidance (“Consider the phosphorylation state.”); (4) Level-3 hint (bottom-out): direct next step or answer.
Hint progression rules: each hint level requires a new failed attempt OR explicit hint request; student must acknowledge each hint before the next is offered; bottom-out hint is available but tracked (high bottom-out rate = flag for review).
Switch criteria (to worked example): student reaches bottom-out on 3 consecutive steps without self-correction.
Switch behavior: Provide a full worked example for this step + require student to re-attempt the next similar step independently.
Stop criteria: Student completes step correctly on first or second attempt (no hints requested).
Safety constraints: no skipping hint levels (prevents gaming); track hint-request rate; flag if student requests hints before attempting; bottom-out hint does not trigger mastery credit.
Status: ✅ Specified — needs implementation as prompt template + harness integration
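The hint progression and worked-example switch above can be sketched as a small state machine. The class and method names are hypothetical; the no-skipping rule, the bottom-out cap, and the 3-consecutive-steps switch criterion come from the spec.

```python
# Sketch of the Policy 2 (Hint Ladder) progression rules. `HintLadder` is an
# illustrative class, not shipped code; only the rules come from the spec.

class HintLadder:
    BOTTOM_OUT = 3  # Level-3 hint provides the answer or the next step

    def __init__(self):
        self.level = 0                    # 0 = no hint given yet this step
        self.consecutive_bottom_outs = 0  # tracked across steps for the switch

    def next_hint(self, new_failed_attempt: bool, hint_requested: bool) -> int:
        """Advance one level at a time; skipping levels is forbidden (anti-gaming)."""
        if not (new_failed_attempt or hint_requested):
            raise ValueError("a hint requires a new failed attempt or an explicit request")
        self.level = min(self.level + 1, self.BOTTOM_OUT)
        return self.level

    def end_step(self) -> bool:
        """Close out the step; return True if the worked-example switch fires."""
        if self.level == self.BOTTOM_OUT:
            self.consecutive_bottom_outs += 1
        else:
            self.consecutive_bottom_outs = 0  # self-correction resets the count
        self.level = 0
        return self.consecutive_bottom_outs >= 3
```

Note that `end_step` resets the bottom-out counter on any step completed above bottom-out, which is what "without self-correction" implies in the switch criterion.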
TIER 2 — State-Tracking Required
These policies require persistent state across steps within a session: KC mastery estimates, step history, or fading logic. They cannot be implemented as a single static prompt; they require a session state object passed at each turn.
Integration requirement: A session state manager that tracks at minimum: attempt_history[], hint_level_history[], kc_mastery_estimate{}, steps_completed.
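A minimal version of that session state object might look like the following. The field names follow the integration requirement above; the class itself and its `record_attempt` helper are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Minimal sketch of the Tier 2 session state object. Field names follow the
# integration requirement; the class and helper method are illustrative.

@dataclass
class SessionState:
    attempt_history: list = field(default_factory=list)
    hint_level_history: list = field(default_factory=list)
    kc_mastery_estimate: dict = field(default_factory=dict)
    steps_completed: int = 0

    def record_attempt(self, step_id: str, correct: bool, hint_level: int = 0):
        """Append one attempt; counting a correct attempt as step completion
        is a simplification for this sketch."""
        self.attempt_history.append({"step_id": step_id, "correct": correct})
        self.hint_level_history.append(hint_level)
        if correct:
            self.steps_completed += 1
```

The state object is passed to the policy at each turn, which is exactly why Tier 2 policies cannot run as single static prompts.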
Policy 3: Self-Explanation with Feedback
Theoretical basis: Self-explanation effect (Chi et al., 1989); interactive/explicit learning; feedback on explanation quality, not just answer correctness.
Core behavior: After every student response — correct or incorrect — the tutor requires the student to explain their reasoning. The tutor then evaluates the explanation against a rubric (key principles present, assumptions stated, alternative considered) and provides feedback on the explanation itself, not just the answer.
What this needs beyond a prompt: an explanation quality rubric (domain-specific; must be authored per task); a scoring function (LLM-as-judge with structured output, or human rater); session state to track explanation quality trend over steps.
Switch criteria: If explanation quality score is below threshold for 2 consecutive steps, provide a model explanation and ask student to compare theirs to it.
Status: 🔲 Specification drafted — implementation requires rubric authoring + LLM-as-judge integration
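The Policy 3 switch criterion reduces to a simple check over the score history the session state would hold. This is a sketch; the 0.6 threshold is an assumed default, not a value from the spec.

```python
# Sketch of the Policy 3 switch criterion: two consecutive below-threshold
# explanation-quality scores trigger a model explanation. The threshold
# default (0.6) is an assumption for illustration.

def self_explanation_switch(scores: list[float], threshold: float = 0.6) -> bool:
    """True if the last two explanation-quality scores are both below threshold."""
    return len(scores) >= 2 and all(s < threshold for s in scores[-2:])
```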
Policy 4: Faded Worked Examples
Theoretical basis: Expertise reversal effect; fading as a transition mechanism from novice to independent solver.
Core behavior: Early in the task, the tutor provides full worked examples with prompts to explain each step. As the student demonstrates competence, steps are progressively removed — converting examples into problems. By the end, the student is solving independently with no scaffolding.
What this needs beyond a prompt: a step-level KC mastery tracker (estimated from explanation quality + correctness); a fading schedule (which steps to remove and when); task decomposition into discrete steps with known KC mappings.
Switch criteria: KC mastery estimate crosses threshold → next example has that step removed.
Status: 🔲 Requires task decomposition + KC mapping + mastery tracker before implementation
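The fading decision itself is small once the KC mapping and mastery tracker exist. The function below is a sketch under that assumption; the 0.8 threshold and all names are illustrative.

```python
# Illustrative fading decision for Policy 4: remove a worked step from the
# next example once its mapped KC crosses the mastery threshold. The 0.8
# default and all names are assumptions for this sketch.

def steps_to_fade(kc_mastery: dict[str, float],
                  step_to_kc: dict[str, str],
                  threshold: float = 0.8) -> list[str]:
    """Return the steps whose mapped KC has crossed the mastery threshold."""
    return [step for step, kc in step_to_kc.items()
            if kc_mastery.get(kc, 0.0) >= threshold]
```

The hard part, as the scope notes, is not this decision rule but authoring the task decomposition and KC map that feed it.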
Policy 5: Intelligent Novice / Error-Detection
Theoretical basis: Mathan & Koedinger (2005); learning from downstream error consequences; “intelligent novice” model of desired performance.
Core behavior: The tutor allows certain “safe” errors to proceed without immediate flagging, letting the student see the downstream consequences of their reasoning. When the consequences become visible, the student is asked to detect and explain what went wrong. The tutor only intervenes when the error is not self-detectable.
What this needs beyond a prompt: a domain model that classifies errors as “safe” (detectable downstream) vs. “harmful” (not detectable without intervention); a consequence simulation or trace; this is most natural in domains with visible state (spreadsheets, code execution, molecular pathway diagrams).
Status: 🔲 Requires domain model + consequence simulation layer; cancer biology implementation needs a pathway diagram tool or structured reasoning trace
6. Unified Interface Schema
Policy Input (all policies accept this):
{
"policy_id": "string — socratic | hint_ladder | self_explanation | faded_example | intelligent_novice",
"domain": "string — e.g. cancer_biology",
"task_id": "string — references task bank",
"learner_level": "string — novice | intermediate | advanced",
"risk_tolerance": "string — low | medium | high (frustration/time constraints)",
"session_state": {
"attempt_history": [],
"hint_level_history": [],
"kc_mastery": {},
"steps_completed": 0,
"flags": {}
}
}
Trace Log Schema (all policies emit this):
{
"session_id": "string",
"policy_id": "string",
"task_id": "string",
"step_id": "string",
"attempt_number": "integer",
"student_input": "string",
"correctness": "boolean | null",
"hint_requested": "boolean",
"hint_level": "integer — 0 (no hint) to N (bottom-out)",
"time_since_last_attempt_ms": "integer",
"confusion_localized": "boolean | null",
"assumption_stated": "boolean",
"alternative_stated": "boolean",
"explanation_quality_score": "float | null — 0.0–1.0; null if not applicable",
"answer_leaked": "boolean",
"wheel_spinning_flag": "boolean — true if >k failed attempts without strategy change",
"switch_triggered": "boolean — true if escalation/mode-change fired",
"switch_reason": "string | null",
"tutor_move": "string — question | hint_l1 | hint_l2 | bottom_out | worked_example | confirmation | feedback_on_explanation",
"timestamp": "ISO 8601"
}
7. Phase Breakdown & Scope of Work
Phase 1: Prompt Implementation — Policies 1 & 2 (Weeks 1–2)
Phase 2: Head-to-Head Pilot — Policies 1 vs. 2 (Week 3)
Phase 3: State-Tracking Infrastructure (Weeks 4–5)
Phase 4: Domain Modeling for Policies 4 & 5 (Weeks 6–8)
8. Risk Register
9. Definition of Done
For Each Tier 1 Policy:
YAML template specifies: allowed moves, forbidden moves, switch criteria, stop criteria, safety constraints
Policy runs in runner.py without modification to harness
Trace logger captures all required fields for this policy
Policy tested against all 4 learner archetypes
README documents: theoretical basis, known limitations, recommended learner level
For Evaluation Harness:
Task bank has ≥15 items with KC labels and difficulty ratings
Scoring rubric has inter-rater reliability ≥0.75 on 20-item calibration set
Comparison report generates automatically from trace logs
Adding a new policy requires only: new YAML file + no harness changes
For Head-to-Head Comparison:
Both policies run under identical model settings (same LLM, same temperature)
Human pilot has ≥10 participants per condition
Report includes: learning outcome scores, time-on-task, wheel-spinning rate, hint abuse rate, explanation quality, transfer performance
10. Appendix
A. Policy Quick Reference
B. Theoretical Mapping to Koedinger & Aleven (2007)
C. Repository Structure
/policies/
policy_socratic.yaml
policy_hint_ladder.yaml
policy_self_explanation.yaml ← Phase 3
policy_faded_examples.yaml ← Phase 4
policy_intelligent_novice.yaml ← Phase 4
/harness/
runner.py
logger.py
state_manager.py ← Phase 3
scorer.py ← Phase 3
/tasks/
cancer_biology/
task_bank.json
kc_map.json
rubric.json
transfer_items.json
/learners/
archetypes.json
/reports/
comparison_report_template.ipynb
D. How to Add a New Policy
1. Create /policies/policy_{name}.yaml
2. Define: allowed_moves[], forbidden_moves[], switch_criteria{},
stop_criteria{}, safety_constraints{}, state_requirements[]
3. If state_requirements is non-empty:
- Add required fields to state_manager.py
- Document in policy README
4. Run runner.py --policy {name} --task sample_task --learner novice
5. Verify trace output passes logger schema validation
6. Submit PR with: YAML, README, sample trace output
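A minimal skeleton for step 1 might look like this. Only the field names come from step 2 above; every value is a placeholder, not a recommended setting.

```yaml
# policies/policy_example.yaml — illustrative skeleton. Field names follow
# step 2 of this guide; all values are placeholders.
allowed_moves:
  - clarifying_question
  - level_1_hint
forbidden_moves:
  - direct_answer_before_switch
switch_criteria:
  failed_attempts_without_strategy_change: 4
stop_criteria:
  correct_and_explained: true
safety_constraints:
  max_localization_attempts: 2
state_requirements: []   # empty list → Tier 1: no state_manager.py changes needed
```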