Learning Engineering Toolkit: Evidence-Based Practices from the Learning Sciences, Instructional Design, and Beyond


Part 1: Chapter-by-Chapter Logical Mapping

Introduction: What is Learning Engineering?

Core Claim: Learning engineering is a distinct professional practice combining learning sciences, human-centered design, and data-informed decision-making to create scalable solutions for learning—different from both learning science research and traditional instructional design.

Supporting Evidence:

  • Historical examples (penicillin production scaling, Duolingo’s 500M users, edX’s 33M students)

  • IEEE ICICLE consensus definition developed 2018-2019

  • Documented outcomes from Carnegie Learning, CMU, Kaplan

Logical Method: Analogy-driven reasoning (penicillin: scientific discovery requires engineering to scale, therefore learning science requires engineering to scale). The chapter builds from concrete examples (Fleming’s discovery → Rousseau’s production engineering) to abstract principle (science ≠ engineering, both needed for impact).

Gaps:

  • The penicillin analogy proves necessity of some scaling mechanism, not specifically “engineering” as defined

  • Circular definition risk: Learning engineering is what teams doing learning engineering do

  • No falsification criteria provided—what would prove learning engineering isn’t needed?


Chapter 1: Learning Engineering is a Process

Core Claim: Learning engineering follows a systematic, repeatable process: Challenge → Creation → Implementation → Investigation (iteratively, with context awareness).

Supporting Evidence:

  • Electrostatic Playground VR case study at MIT (23 students, documented aha! moments through spatial recording)

  • Process derived from 2020 ICICLE Design for Learning group consensus

Logical Method: Descriptive modeling. The chapter observes existing practices, abstracts a general process model, then validates it through a single case study. The model is prescriptive (“should include these steps”), not just descriptive.

Gaps:

  • Single case study (N=1) to validate general process model—insufficient evidence

  • No comparative data: Does this process produce better outcomes than alternatives?

  • “Context circle” is vague—what specific contextual variables must always be considered?

  • No failure modes identified: When does this process not work?

Methodological Soundness: Weak. The process model is asserted based on consensus + one example. The Electrostatic Playground case proves that this process worked once, not that the process generalizes.


Chapter 2: Learning Engineering Applies the Learning Sciences

Core Claim: Learning sciences (neuroscience, cognitive psychology, developmental psychology) provide the theoretical foundation that learning engineering applies at scale.

Supporting Evidence:

  • Established research: Ebbinghaus forgetting curves (1885), Vygotsky’s ZPD, de Groot’s chess master studies, cognitive load theory

  • KLI Framework (Koedinger, Corbett, Perfetti)

  • Brain plasticity research (post-2000s fMRI era)

Logical Method: Literature synthesis. The chapter surveys established findings, organizes them into categories (memory types, expertise development, metacognition), then asserts their relevance to learning engineering without demonstrating that applying them improves outcomes.
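
To make the survey concrete, consider how one classic finding, Ebbinghaus-style exponential forgetting, gets operationalized into a review scheduler. The sketch below is my illustration, not the book’s or Duolingo’s formula; the half-life model and threshold are assumptions.

```typescript
// Exponential forgetting: predicted recall decays with time since last review.
// p = 2^(-elapsedDays / halfLifeDays); the functional form is an assumption.
function recallProbability(elapsedDays: number, halfLifeDays: number): number {
  return Math.pow(2, -elapsedDays / halfLifeDays);
}

// Schedule the next review for when predicted recall hits a threshold.
// Solving p = 2^(-t/h) for t gives t = -h * log2(p).
function daysUntilReview(halfLifeDays: number, threshold = 0.9): number {
  return -halfLifeDays * Math.log2(threshold);
}

// Example: an item with a 4-day half-life needs review in about 0.6 days
// to keep predicted recall above 90%.
console.log(daysUntilReview(4).toFixed(2)); // "0.61"
```

The measurement-action gap discussed below lives precisely in these choices: the science says spacing helps, but the functional form, half-life estimates, and threshold are engineering decisions the science does not fix.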

Gaps:

  • The measurement-action gap: Knowing that “spaced repetition works” doesn’t automatically translate to “Duolingo’s implementation of spaced repetition works better than alternatives”

  • No discussion of which learning science findings are robust enough for engineering application vs. which require more research

  • The “science provides blueprint, engineering executes” metaphor breaks down: Penicillin’s chemical structure was known; optimal learning conditions are still contested

  • Missing: How to adjudicate between competing learning theories when designing

Methodological Soundness: Moderate. The chapter accurately summarizes established research but doesn’t address:

  • Effect size variations across contexts

  • Failure to replicate in some learning science studies

  • The “file drawer problem” (unpublished negative results)


Chapter 3: Learning Engineering is Human-Centered

Core Claim: Effective learning engineering requires participatory design methods that deeply understand learner variability, engage stakeholders throughout development, and iteratively test with representative users.

Supporting Evidence:

  • Age of Learning’s Math Readiness: 9 test sessions, 50+ children ages 2-3, documented design iterations based on motor skills data

  • Leti Arts KNO game: Participatory design with MSM population in Ghana, documented trust-building through iteration

  • Medic Mobile: Design cards enabled cross-cultural, low-literacy participation

Logical Method: Case-study demonstration. Three detailed examples show how human-centered methods were applied and what design decisions resulted. The claim is that these methods led to better products, evidenced by stakeholder satisfaction and adoption.

Gaps:

  • No control groups: We don’t know if these products would have succeeded without human-centered design

  • Selection bias: All three cases are success stories—where are the failures?

  • Medic Mobile cards: Brilliant workaround for literacy/language barriers, but no data on whether resulting designs were pedagogically superior

  • Age of Learning: Changed from playground theme to simplified foreground after testing, but did the final design actually improve learning outcomes vs. just engagement?

Methodological Soundness: Moderate-High for process documentation, Low for outcome validation. The chapter proves human-centered methods are implementable, not that they’re necessary for success.


Chapter 4: Learning Engineering is Engineering

Core Claim: Learning engineering shares core principles with other engineering disciplines: systematic problem-solving, modular design, control theory, trade-off analysis, and iterative testing against specifications.

Supporting Evidence:

  • Bror Saxberg’s background (MIT EE, Harvard MD, applied at Kaplan/K12/CZI)

  • Piotr Mitros’s edX platform development (3-month prototype → 3000 courses, 33M students)

  • Control theory applied to learning: feedback loops, transfer functions, sensitivity

Logical Method: Analogy + deductive application. The chapter identifies general engineering principles, then shows how they map to learning contexts (cruise control → adaptive instruction, transfer functions → learning theories).
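
To see what the cruise-control analogy actually claims, here is a minimal sketch, assuming a proportional controller that nudges item difficulty toward a target success rate. This is my illustration of the mapping, not code from the chapter, and every parameter is an assumption.

```typescript
// Cruise-control analogy: observed success rate is the "speed" signal,
// item difficulty is the "throttle". A proportional controller corrects
// the error between observed and target success (parameters are assumed).
const TARGET_SUCCESS = 0.75; // desired share of correct responses
const GAIN = 0.5;            // how aggressively to correct the error

function nextDifficulty(current: number, observedSuccess: number): number {
  const error = observedSuccess - TARGET_SUCCESS; // positive => items too easy
  const updated = current + GAIN * error;         // raise difficulty if too easy
  return Math.min(1, Math.max(0, updated));       // clamp to [0, 1]
}

// Example: a learner succeeding 95% of the time gets harder items.
console.log(nextDifficulty(0.4, 0.95)); // 0.5
```

The sketch also exposes the weak joint the Gaps below press on: it presumes “success rate” is the right measurable output and that learning responds predictably to difficulty changes.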

Gaps:

  • False precision: Control theory assumes measurable inputs/outputs with known transfer functions. Learning has no such precision—we can’t reliably predict learning outcomes from instructional “inputs”

  • Missing: Engineering typically has failure tolerances (this bridge must support X tons ± 5%). What are acceptable failure rates in learning engineering?

  • Modular design works for software (LEGO bricks) but learning isn’t modular—skills interdepend in complex, non-linear ways

  • The “faster feedback compensates for poor learning theory” claim is dangerous: It could mask fundamental design flaws

Methodological Soundness: Low. The analogies are instructive but the chapter doesn’t prove learning systems behave like control systems. For example:

  • Cruise control has a single, measurable objective (maintain 65 mph)

  • Learning has multiple, contested objectives (understanding? transfer? retention?)

  • You can’t “test drive” a curriculum the way you test an alternator


Chapter 5: Learning Engineering Uses Data (Part 1): Instrumentation

Core Claim: Effective learning engineering requires purposeful instrumentation—designing data collection while designing learning experiences, not as an afterthought.

Supporting Evidence:

  • xAPI/Caliper standards ecosystem

  • Age of Learning’s iterative data collection lessons (SUCCEED vs. CANCEL confusion, target answer naming debate)

  • Military sensor examples (heart rate monitors for sniper training, communication encoding)

Logical Method: Cautionary tales + standards advocacy. The chapter uses Age of Learning’s mistakes to argue for:

  1. Planning data collection in advance

  2. Maintaining data dictionaries (see the sketch after this list)

  3. Using standards (xAPI/Caliper) for interoperability
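
A data dictionary entry need not be elaborate. Here is a minimal sketch with hypothetical fields and event names, prompted by the SUCCEED vs. CANCEL confusion above; the book prescribes the practice, not this format.

```typescript
// Hypothetical data dictionary entry, versioned alongside the logging code.
// A shared definition is what prevents SUCCEED/CANCEL-style ambiguity.
interface EventDefinition {
  name: string;         // canonical event name as it appears in logs
  meaning: string;      // what the event actually signifies
  firedWhen: string;    // the precise trigger condition
  notFiredWhen: string; // the misreading to rule out
}

const dictionary: EventDefinition[] = [
  {
    name: "activity_succeed",
    meaning: "Learner met the activity's success criteria",
    firedWhen: "Final target answered correctly",
    notFiredWhen: "Learner exits early (log activity_cancel instead)",
  },
];
```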

Gaps:

  • Circular reasoning: “Data are essential” → “therefore instrument everything” → “but not everything” → “so plan carefully” → “but you won’t know what you need until later” → back to “instrument everything”

  • Age of Learning case: The team did eventually get useful data despite initial mistakes—this demonstrates the resilience of trial-and-error, not the necessity of upfront planning

  • Missing: Cost-benefit analysis of instrumentation. How much data is enough? When does additional instrumentation have diminishing returns?

  • No discussion of data gravity: Once you start collecting, organizational inertia makes it hard to stop even if data proves useless

Methodological Soundness: Moderate. The chapter proves:

  • Standards exist and enable interoperability (true)

  • Poor data labeling causes confusion (true)

  • Teams should maintain data dictionaries (good practice)

But doesn’t prove:

  • Comprehensive upfront planning prevents more failures than it causes delays

  • Standards-based instrumentation produces better learning outcomes than ad-hoc approaches


Chapter 6: Learning Engineering Uses Data (Part 2): Analytics

Core Claim: Data analytics enable evidence-based decision-making that challenges intuitions, reveals unexpected patterns, and drives iterative improvement.

Supporting Evidence:

  • Kaplan LSAT study: 8 worked examples (9 min study) > 15 worked examples (15 min) > 90-minute video

  • Carnegie Learning “data jams”: Multidisciplinary teams interpret analytics collaboratively

  • Ryan Baker’s research: Gaming detection, Scooter the Tutor (effective by data, rejected by users)

  • CMU Discrete Math Primer: Learning curve analysis revealed instruction failures, led to measurable improvements (2016 → 2018 post-test gains)

Logical Method: Experimental validation. Multiple controlled studies with quantitative outcomes. The Kaplan study uses A/B testing (4 conditions, N not specified but implied > 100). The CMU case uses pre/post with learning curve analysis across multiple cohorts.
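
The Gaps below fault the Kaplan study for reporting no effect sizes. For readers unfamiliar with what that omission means, here is the standard calculation, Cohen’s d with a pooled standard deviation; the numbers are invented for illustration, not Kaplan’s data.

```typescript
// Cohen's d: standardized difference between two condition means,
// using the pooled standard deviation. All numbers below are invented.
function cohensD(
  mean1: number, sd1: number, n1: number,
  mean2: number, sd2: number, n2: number,
): number {
  const pooledSd = Math.sqrt(
    ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2),
  );
  return (mean1 - mean2) / pooledSd;
}

// Example: hypothetical 8-worked-examples condition vs. video condition.
console.log(cohensD(72, 10, 60, 67, 11, 60).toFixed(2)); // "0.48", a medium effect
```

Without the sample sizes, means, and standard deviations, readers cannot run this arithmetic, which is exactly why the unreported N matters.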

Gaps:

  • Kaplan study: No sample size reported, no effect size calculations, no discussion of statistical power

  • Scooter the Tutor paradox: Data showed it worked (reduced gaming by 50%, learning gains significant). Users hated it. It failed to scale. This undermines the “data-driven decisions” claim—qualitative data (user satisfaction) trumped quantitative data (learning gains)

  • CMU Discrete Math: Post-test improvements shown, but no control group. Were gains due to learning curve-informed revisions or just natural maturation of the course over 3 years?

  • Missing: Discussion of p-hacking or multiple comparisons problem when running many A/B tests

  • The “data jams” are great for interpretation but the chapter doesn’t prove they lead to better decisions than individual expert analysis

Methodological Soundness: Moderate-High for demonstrating analytics methods exist and can be applied. Low-Moderate for proving analytics-driven decisions produce superior outcomes to alternative decision methods.

Critical Question: If Scooter worked by the data but failed in practice, what does that say about “data-informed decision-making” as the cornerstone of learning engineering?


Chapter 7: Learning Engineering is Ethical

Core Claim: Learning engineering is inherently value-laden; ethical considerations must be integrated throughout the process, not added as afterthought compliance.

Supporting Evidence:

  • Multiple professional codes (APA, IEEE, ACM, AECT)

  • GDPR, UNESCO AI ethics recommendations

  • Medical education examples (body donation protocols, contribution recognition failures)

  • Value-sensitive design, reflective design, design justice frameworks

Logical Method: Normative argument. The chapter catalogs existing ethical frameworks, identifies gaps in current practice (technicians not acknowledged as contributors, learner research consent overlooked), then prescribes the SEEM-ED matrix as a solution.

Gaps:

  • Principle overload: 5 APA principles × 4 learning engineering stages = 20 cells in SEEM-ED matrix. Each cell has 3-7 questions. That’s 60-140 questions per project. No prioritization guidance.

  • Conflicting values: Design justice emphasizes community control; professional codes emphasize expert responsibility. When these conflict, which wins? No adjudication framework.

  • The medical education examples (body donation, technician credit) are failures of ethical practice, not evidence that SEEM-ED would prevent them

  • Missing: Power dynamics. Who decides what’s ethical? The SEEM-ED assumes consensus is achievable, ignores that ethics are contested

Methodological Soundness: Low. This is normative prescription (here’s what you should do) not empirical validation (here’s evidence this approach works). The chapter catalogs frameworks but doesn’t prove:

  • SEEM-ED catches more ethical issues than ad-hoc reflection

  • Projects using SEEM-ED have fewer ethical violations

  • The time cost of SEEM-ED is justified by benefits


Chapter 8: Tools for Understanding the Challenge

Core Claim: Systematic analysis tools (task analysis, five whys, fishbone diagrams, FMEA) help learning engineering teams understand problems before jumping to solutions.

Supporting Evidence:

  • Task analysis template (observable behaviors, cognitive requirements, context)

  • Five whys example: Students failing multiplication word problems → don’t know “product” means multiply

  • Fishbone example: Learners don’t know ¼ > ⅛ → multiple root causes (not introduced to fractions, imperfect mental model, needs practice)

  • FMEA from Prince of Songkla University: Identified communication gaps as top issue for internship failures

Logical Method: Tool demonstration. Each section explains a tool, provides a template, shows an example. The logic is: “These tools work in other engineering domains → here’s how to apply them to learning.”

Gaps:

  • No evidence the tools work in learning contexts. The fishbone example is illustrative, not validated.

  • Five whys risk: Can lead to wrong root cause if you ask why in the wrong direction. Example given (students don’t know “product”) is plausible but unverified—maybe they know the word but can’t apply it under test pressure?

  • FMEA example: The university used the tool and identified communication as the issue. Did fixing communication actually solve the internship problem? No follow-up data.

  • Missing: Guidance on when to use which tool. Task analysis vs. five whys vs. fishbone?

Methodological Soundness: Low. The chapter is a practitioner’s toolkit, not a research validation. It proves the tools exist and can be applied to learning, not that they produce better problem diagnoses than informal methods.


Chapter 9: Tools from the Learning Sciences

Core Claim: Learning sciences concepts can be operationalized as design patterns—reusable solutions to common learning challenges.

Supporting Evidence:

  • Catalog of 20+ concepts (scaffolding, metacognition, spaced learning, etc.)

  • Metacognitive prompting design pattern with detailed implementation steps

  • Cross-references to foundational research (Vygotsky, Kahneman, deliberate practice literature)

Logical Method: Knowledge organization. The chapter functions as a reference guide, organizing learning sciences findings into practitioner-friendly categories with application guidance.

Gaps:

  • Design pattern format inconsistency: Only one full example (metacognitive prompting). Others are brief definitions, not actionable patterns.

  • No prioritization: 20+ concepts, all presented as important. Which matter most? Which have largest effect sizes?

  • Context dependence ignored: “Scaffolding” works differently for 2-year-olds (Math Readiness) vs. college physics (Electrostatic Playground). The chapter doesn’t address this.

  • Missing: Guidance on combining concepts. Does scaffolding + spaced learning produce additive effects? Multiplicative? Interference?

Methodological Soundness: Moderate. This is knowledge synthesis, not empirical validation. The concepts summarized are well-established in learning sciences, but:

  • Effect sizes vary by implementation

  • Some concepts (learning styles) are disputed

  • The chapter doesn’t distinguish robust findings from speculative ones


Chapter 10: Tools for Teaming

Core Claim: Learning engineering requires multidisciplinary teams; effective teaming requires explicit roles, communication norms, and dysfunction mitigation strategies.

Supporting Evidence:

  • Tuckman’s team development stages (forming → storming → norming → performing)

  • DISC communication styles framework

  • Team dysfunction checklist (lack of trust, inequality, poor communication)

Logical Method: Framework application. The chapter imports team science research into learning engineering context through checklists and worksheets.

Gaps:

  • Generic team advice: Nothing specific to learning engineering. This chapter could apply to any project team.

  • No learning engineering team case studies: Unlike other chapters, no concrete examples of teams working through dysfunctions

  • DISC assessment: Popular in corporate training, but personality assessments have mixed empirical support

  • Missing: Power and hierarchy. The chapter assumes teams can just “develop ground rules” and “build trust.” What about teams with power imbalances (professor/grad student, vendor/client)?

Methodological Soundness: Low-Moderate. The frameworks cited (Tuckman, DISC) have research backing in organizational psychology, but:

  • Limited evidence they improve learning engineering team performance

  • Tuckman’s stages are descriptive, not prescriptive (teams go through these phases, but forcing them doesn’t help)

  • The dysfunction checklist is practitioner wisdom, not empirically validated


Chapter 11: Lean-Agile Development Tools

Core Claim: Agile methodologies (Scrum, Kanban) and Lean principles (eliminate waste, continuous improvement) can be adapted to learning engineering for iterative development and stakeholder collaboration.

Supporting Evidence:

  • Duolingo example: Started with 2-week sprints, now adapts Scrum based on project phase

  • Agile Manifesto principles mapped to learning engineering

  • Kanban board adapted to learning engineering process stages

Logical Method: Framework translation. The chapter takes software engineering practices (Scrum, Kanban) and shows how to apply them to learning engineering through examples and templates.

Gaps:

  • Duolingo anecdote is weak evidence: Burr Settles says they use Agile, but no data on whether Agile improved development speed, product quality, or learning outcomes

  • Cargo cult risk: Adopting Agile rituals (standups, sprints, retrospectives) without understanding why they work in software development

  • Lean’s “eliminate waste” assumes you can identify waste. In learning, is a “failed” experiment waste or valuable learning?

  • Missing: When Agile doesn’t fit. Research-heavy projects (like the knowledge state modeling Burr mentioned) don’t fit 2-week sprint model. The chapter acknowledges this but doesn’t provide alternative frameworks.

Methodological Soundness: Low. This is framework advocacy, not empirical validation. The chapter proves:

  • Learning engineering teams can use Scrum/Kanban (existence proof)

But doesn’t prove:

  • Agile teams produce better learning products than Waterfall teams

  • Lean principles reduce development costs or time-to-market

  • Retrospectives lead to meaningful process improvements


Chapter 12: Human-Centered Design Tools

Core Claim: Specific tools (personas, stakeholder maps, card sorting, wireframes, conjecture mapping, SUS surveys) make human-centered design concrete and actionable.

Supporting Evidence:

  • Nielsen’s “test with 5 users” guideline (diminishing returns after 5)

  • System Usability Scale (SUS): 10-item questionnaire, score >68 = above average, >80.3 = top 10%

  • Logic models from IES toolkit

  • Age of Learning Lily persona example (Chapter 3 callback)

Logical Method: Toolkit compilation. Each section introduces a tool, explains its purpose, provides a template or example. The implicit argument is: “These tools work in UX design → they should work in learning engineering.”
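
Two of the figures cited above can be made concrete. SUS scoring follows a fixed published rule (Brooke, 1996), and Nielsen’s five-user guideline derives from a simple discovery model, P(found) = 1 - (1 - λ)^n with λ ≈ 0.31 estimated by Nielsen and Landauer. A minimal sketch, with invented survey responses:

```typescript
// SUS scoring: odd items contribute (response - 1), even items (5 - response);
// the sum is scaled by 2.5 onto a 0-100 scale.
function susScore(responses: number[]): number {
  if (responses.length !== 10) throw new Error("SUS has exactly 10 items");
  const sum = responses.reduce(
    (acc, r, i) => acc + (i % 2 === 0 ? r - 1 : 5 - r),
    0,
  );
  return sum * 2.5;
}

// Nielsen-Landauer model: share of usability problems found by n testers.
function problemsFound(n: number, lambda = 0.31): number {
  return 1 - Math.pow(1 - lambda, n);
}

console.log(susScore([4, 2, 5, 1, 4, 2, 4, 2, 5, 1])); // 85
console.log(problemsFound(5).toFixed(2)); // "0.84": ~84% of problems at n = 5
```

Note what the arithmetic cannot do: an SUS of 85 says nothing about whether anyone learned, which is precisely the limitation flagged in the Gaps below.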

Gaps:

  • Tool proliferation: 15+ tools presented. No guidance on which to use when, or minimum viable toolkit.

  • Personas risk: Can reinforce stereotypes or designer biases if not based on real user research. The Lily persona (Chapter 3) was data-driven, but what about less rigorous personas?

  • SUS limitations: Measures perceived usability, not learning effectiveness. A system can score 90 on SUS but produce zero learning.

  • Conjecture mapping: Elegant in theory, but the example (Figure 12.7) is a toy model. Real projects would have 50+ arrows. How do you manage that complexity?

  • Missing: Negative cases. When do these tools fail or mislead?

Methodological Soundness: Moderate. These are established UX/design thinking tools with research backing in their original domains. Evidence for their effectiveness in learning engineering contexts is limited:

  • Nielsen’s 5-user rule is for usability testing, not learning evaluation

  • SUS correlates with user satisfaction, not learning outcomes

  • No studies showing conjecture mapping improves learning product quality


Chapter 13: Data Instrumentation Tools

Core Claim: Learning engineering requires intentional instrumentation design; use standards (xAPI, Caliper) to enable interoperability and reduce development costs.

Supporting Evidence:

  • xAPI example: 4 lines of JavaScript to send learning statement to LRS

  • Standards landscape (xAPI, Caliper, SCORM, IMS standards)

  • InnovateEDU Google Classroom Connector (free, open-source)

  • ASSISTments + E-TRIALS platform for education research

Logical Method: Demonstration + advocacy. The chapter shows how to instrument (code examples) and argues why (interoperability, reduced costs, data quality).
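
The chapter’s four-line example is not reproduced here, but a sketch in the same spirit follows: posting a single xAPI statement to a Learning Record Store. The endpoint, credentials, and IDs are placeholders; a real deployment also needs a running LRS, an auth scheme, and a vocabulary profile, which is the hidden complexity the Gaps below describe.

```typescript
// Minimal xAPI statement: actor, verb, object. IDs and endpoint are placeholders.
const statement = {
  actor: { mbox: "mailto:learner@example.com", name: "Example Learner" },
  verb: {
    id: "http://adlnet.gov/expapi/verbs/completed",
    display: { "en-US": "completed" },
  },
  object: { id: "https://example.com/activities/lesson-1" },
};

// POST to the LRS statements endpoint per the xAPI specification.
await fetch("https://lrs.example.com/xapi/statements", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-Experience-API-Version": "1.0.3",
    Authorization: "Basic " + btoa("key:secret"), // placeholder credentials
  },
  body: JSON.stringify(statement),
});
```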

Gaps:

  • Standards complexity understated: The 4-line xAPI example hides enormous complexity in authentication, LRS setup, profile selection

  • Interoperability promise vs. reality: Standards enable technical interoperability, not semantic interoperability. Two systems can both use xAPI but define “mastery” differently.

  • Vendor lock-in: Some LMS vendors implement standards partially or with proprietary extensions, defeating interoperability

  • Missing: Cost-benefit of standards adoption. Learning a standard, staying current with updates, maintaining compliance—what’s the ROI?

Methodological Soundness: Low-Moderate. The chapter proves:

  • Standards exist and are implementable (true)

  • Simple instrumentation is achievable with low technical skill (the 4-line example proves this)

But doesn’t prove:

  • Standards-based systems produce better learning analytics than proprietary approaches

  • Interoperability delivers value in practice (many organizations don’t actually exchange data even when technically possible)


Chapter 14: Software and Technology Standards as Tools

Core Claim: Learning engineering teams must understand modern software architectures (microservices, REST APIs, cloud computing, IoT) and standards (JSON-LD, linked data) to build scalable, interoperable solutions.

Supporting Evidence:

  • Technology landscape overview (26 standards listed in Figure 14.3)

  • Modular architecture advocacy (microservices > monoliths)

  • Open edX example (Chapter 4 callback): Modular design enabled 3000 courses, 33M students

  • Autodex 3.0 fictional case (Chapter 19 callback): Demonstrates linked data for 3E records

Logical Method: Architectural guidance. The chapter is a technology roadmap, explaining concepts and advocating best practices (use standards, embrace modularity, leverage cloud).
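
As one concrete instance of the linked-data idea, here is a minimal JSON-LD sketch expressed as a TypeScript object. The @context and fields use schema.org terms; the 3E-record shape from the Autodex case is the book’s fiction and is not reproduced here.

```typescript
// JSON-LD attaches shared semantics to plain JSON via @context: "name" and
// "hasCredential" resolve to schema.org terms any consuming system can look up.
const learnerRecord = {
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://example.com/learners/123",
  name: "Example Learner",
  hasCredential: {
    "@type": "EducationalOccupationalCredential",
    name: "Discrete Math Primer Certificate",
    credentialCategory: "certificate",
  },
};

console.log(JSON.stringify(learnerRecord, null, 2));
```

Shared vocabulary buys technical interoperability only; two systems can exchange this record and still disagree about what earning the credential means.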

Gaps:

  • Overwhelming breadth: 26 standards, 10+ technology concepts, no prioritization. Where does a learning engineering team start?

  • Outdated by publication: Technology chapters age poorly. Some cited standards (SCORM, LOM) are effectively deprecated. No guidance on how to evaluate emerging vs. mature standards.

  • Fictional evidence: The Autodex 3.0 case (Chapter 19) is speculative fiction. It shows what could be, not what is.

  • Missing: Real-world messiness. The chapter assumes standards are implemented correctly, APIs are documented, cloud providers are reliable. In practice: breaking changes, vendor outages, incomplete documentation.

Methodological Soundness: N/A. This is an engineering reference guide, not a research claim. It’s useful as a technology survey but doesn’t argue for specific approaches based on evidence.


Chapter 15: Tools for Learner Motivation

Core Claim: Learner motivation is essential to learning engineering success; teams should systematically apply motivating operations across multiple frameworks (autonomy, competence, value, meaning, avoidance, unpredictability, scarcity).

Supporting Evidence:

  • Four motivation frameworks crosswalked (Friman, Chou, Pink, Yuhas)

  • Gamification principles (from Morford et al., 2014)

  • Multi-level motivation loops (activity → lesson → credential → career)

Logical Method: Framework synthesis. The chapter integrates multiple motivation theories into a unified toolkit, providing a worksheet (Figure 15.1) for applying motivating operations.

Gaps:

  • Framework proliferation without adjudication: Pink says autonomy/mastery/purpose. Chou says 8 core drives. Friman says 6 sources. Which is right? The chapter says “they overlap” but doesn’t reconcile differences.

  • Motivating operations are underspecified: “Allow learners to select topic” (autonomy) vs. “Use leaderboards” (social value) vs. “Create scarcity” (limited dress-down days). These could conflict—no guidance on trade-offs.

  • Gamification evidence is weak: Morford et al. (2014) is a theoretical paper, not empirical validation. The chapter cites principles from that paper but doesn’t cite studies proving gamification improves learning.

  • Missing: Individual differences. Some learners are motivated by autonomy, others by structure. Some like leaderboards, others find them demotivating. How do you personalize?

Methodological Soundness: Low. This chapter is a practitioner’s guide to motivation theory, but:

  • The crosswalked frameworks (Figure 15.5) prove the frameworks overlap, not that they work

  • No effect sizes for different motivating operations

  • The “data-driven motivating operations” section (end of chapter) is 2 paragraphs of speculation, not a validated method


Chapter 16: Implementation Tools

Core Claim: Implementation planning is a distinct phase requiring systematic consideration of 11 domains (policies, budget, resources, leadership, team, technology, operationalization, instrumentation, investigation, ethics, scale-up).

Supporting Evidence:

  • Zambia e-learning case study: Health worker training via tablets, blended learning, 10,000+ workers by 2021

  • Implementation checklist (Figure 16.1) with 11 domains, ~40 questions total

Logical Method: Checklist-driven planning. The Zambia case demonstrates how implementation considerations were addressed across the 11 domains.

Gaps:

  • Checklist fallacy: Having a checklist doesn’t guarantee good implementation. TSA uses checklists, still misses 95% of threats in testing.

  • Zambia case lacks counterfactual: The e-learning program worked, but we don’t know if addressing all 11 domains was necessary. Maybe 6 domains would have sufficed?

  • No prioritization: All 11 domains presented as equally important. In reality, budget constraints force trade-offs.

  • Missing: Failure modes. What happens when you can’t address all 11 domains? Which are must-haves vs. nice-to-haves?

Methodological Soundness: Low-Moderate. The checklist is comprehensive (probably too comprehensive), and the Zambia case proves it’s possible to address all 11 domains. But:

  • No comparison data (projects using checklist vs. projects without)

  • No evidence that more thorough implementation planning correlates with better learning outcomes

  • The “investigation during implementation” domain creates a recursive problem: You need data to know if implementation is working, but to know what data to collect, you need to understand the implementation...


Chapter 17: Ethical Decision-Making Tools

Core Claim: The SEEM-ED (Sense-making Ethical Evaluation Matrix for Ethical Design) matrix provides a systematic framework for integrating ethics into learning engineering.

Supporting Evidence:

  • APA’s 5 ethical principles applied to learning engineering

  • 25-question matrix (5 principles × 5 question types)

  • Emphasis on stakeholder ratings, qualitative responses, iterative refinement

Logical Method: Principled tool design. The chapter takes established ethical principles (APA, not learning-engineering-specific) and operationalizes them into a questionnaire.

Gaps:

  • Same problems as Chapter 7: 25 questions per project evaluation, no prioritization, no conflict resolution guidance

  • Self-assessment bias: The tool asks design teams to rate themselves. Research shows self-assessments are unreliable (Dunning-Kruger effect).

  • “Yes/Maybe/No” scale is too coarse: Ethical questions are rarely binary. A 5-point scale is suggested but not required.

  • No validation: Has SEEM-ED been used on real projects? Did it catch ethical issues that would have been missed otherwise?

  • Missing: Red lines. The tool is all about trade-offs and nuance. Are there any non-negotiable ethical requirements?

Methodological Soundness: Low. SEEM-ED is a proposed tool, not a validated instrument. The chapter provides:

  • Face validity (the questions seem relevant)

  • Content validity (they’re derived from established ethical principles)

But lacks:

  • Criterion validity (do SEEM-ED scores predict ethical outcomes?)

  • Inter-rater reliability (do different evaluators agree on ratings?)

  • Empirical evidence the tool works


Chapter 18: Data Analysis Tools

Core Claim: Learning analytics follows a systematic process: Question → Data Check → Analysis Type (Predict/Infer/Mine) → Model → Validate → Visualize.

Supporting Evidence:

  • Learning Analytics Process Model (Figure 18.1) integrating Ryan Baker’s MOOT concepts

  • OpenSimon Toolkit (DataShop, LearnSphere) with 3000+ data sets

  • Learning curve analysis example from OLI: Identified problematic assessment item, team added “identify mediator” as separate skill, curves improved in next iteration

Logical Method: Process modeling + case demonstration. The process model organizes analytics methods hierarchically (predict vs. infer vs. mine, then sub-methods). The OLI case shows the process in action.
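
A minimal sketch of the kind of check the OLI case describes: fit a line to error rate against practice opportunity for one skill and flag skills whose curve is not falling. The data and threshold below are invented; real analyses (for example, in DataShop) use more careful statistical models.

```typescript
// Least-squares slope of error rate vs. practice opportunity for one skill.
// A healthy learning curve slopes downward; a flat or rising slope suggests
// a mislabeled skill or instruction that is not working.
function slope(xs: number[], ys: number[]): number {
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) ** 2;
  }
  return num / den;
}

// Invented data: average error rate at each practice opportunity.
const opportunities = [1, 2, 3, 4, 5, 6];
const errorRate = [0.62, 0.58, 0.61, 0.59, 0.6, 0.57]; // barely improving

const s = slope(opportunities, errorRate);
if (s > -0.02) {
  console.log(`Flat curve (slope ${s.toFixed(3)}): revisit this skill's model`);
}
```

This is the move the OLI team made when a flat curve prompted them to add “identify mediator” as its own skill.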

Gaps:

  • OLI case lacks control: Adding “identify mediator” as separate skill improved learning curves, but was that the only change? No control group, no confound analysis.

  • Model validation guidance is thin: Lists validity types (generalizability, ecological, construct) but doesn’t explain how to test them

  • “Feature mining informs modeling” loop: Figure 18.1 shows circular dependency between mining and modeling, but no guidance on how to break into that loop

  • Missing: Negative results. The chapter shows analytics finding problems (OLI curves, Kaplan study). Where are cases where analytics didn’t help or led to wrong conclusions?

Methodological Soundness: Moderate. The process model is logically coherent and maps to established analytics methods. The OLI case demonstrates feasibility. But:

  • No comparison: Does this process produce better analytics than ad-hoc approaches?

  • No failure analysis: When does the process break down?

  • The model assumes linear flow (question → analysis → conclusion) but real analytics are messier (iteration, dead ends, conflicting signals)


Part 2: Bridge Section

Argumentative Architecture

The Learning Engineering Toolkit constructs its argument through:

  1. Definition by consensus (Introduction, Ch. 1): IEEE ICICLE group developed definition through multidisciplinary collaboration

  2. Historical analogy (Introduction): Penicillin required both science (Fleming) and engineering (Rousseau) to scale

  3. Case study accumulation (Chs. 1-6): Math Readiness, KNO game, Electrostatic Playground, edX, Kaplan LSAT, CMU Discrete Math, etc.

  4. Framework synthesis (Chs. 7-18): Tools chapters integrate existing frameworks (UX design, Agile, ethics codes, analytics methods) into learning engineering practice

The structure is:

  • Foundations (Chs. 1-7): Define learning engineering, establish legitimacy

  • Tools (Chs. 8-18): Provide practitioner’s toolkit

  • Vision (Ch. 19): Speculative future scenario

Patterns in Argumentation Across Chapters

Recurring moves:

  1. Assert principle (“Learning engineering requires X”)

  2. Cite existing research/framework from origin discipline

  3. Provide learning engineering case study

  4. Extract lessons/tools for practitioners

  5. (Sometimes) Acknowledge limitations

Strengths:

  • Rich case studies ground abstract principles in concrete practice

  • Multidisciplinary approach (cognitive science + HCI + software engineering + data science + ethics)

  • Practitioner focus (templates, checklists, code examples)

Weaknesses:

  • Survivorship bias: All case studies are successes. Where are the failures?

  • Evidence quality varies wildly:

    • Strong: Kaplan LSAT experiment, CMU learning curves with data

    • Moderate: Duolingo/edX examples with usage stats but no control groups

    • Weak: Electrostatic Playground (N=23, no control), many chapters cite single cases

  • Circular validation: Learning engineering is defined by what learning engineering teams do. The case studies validate that these teams exist and do these things, not that the approach produces superior outcomes to alternatives.

Contradictions and Tensions

Data-driven decision-making paradox (Chs. 5-6):

  • Chapter 6: Scooter the Tutor worked by quantitative data (reduced gaming 50%, learning gains significant) but failed by qualitative data (users hated it, teachers rejected it)

  • Implication: “Data-informed” isn’t sufficient. The chapter resolves this by saying “consider multiple data types” but doesn’t provide a framework for when qualitative trumps quantitative.

Upfront planning vs. iteration (Chs. 5, 11, 13):

  • Chapter 5: “Plan data collection in advance, use standards, maintain data dictionaries”

  • Chapter 11: “Agile = embrace change, iterate rapidly, don’t over-plan”

  • Tension: How much upfront design is enough? Age of Learning case shows team made data collection mistakes but eventually succeeded through iteration. This suggests resilience matters more than perfect planning.

Frameworks proliferation (Chs. 7, 9, 10, 12, 15):

  • Chapter 7: 5 APA principles + SEEM-ED matrix (60-140 questions)

  • Chapter 9: 20+ learning sciences concepts

  • Chapter 10: DISC, Tuckman, dysfunction checklist

  • Chapter 12: 15+ design tools

  • Chapter 15: 4 motivation frameworks, 7 motivational variables

  • Implication: Cognitive overload for practitioners. No guidance on which frameworks to prioritize or how to integrate them. Risk of “checklist compliance” without understanding.

Standards advocacy tension (Chs. 13-14):

  • Chapters advocate xAPI, Caliper, JSON-LD, linked data

  • But also acknowledge: standards evolve, some are deprecated, vendor implementations vary

  • Unresolved: How do you balance standards adoption (interoperability benefits) with flexibility (don’t get locked into dying standard)?


Part 3: Full Rigorous Literary Review

Opening: The Central Claim and Its Logical Foundation

The Learning Engineering Toolkit argues for the emergence of a distinct profession—learning engineering—defined as “a process and practice that applies the learning sciences using human-centered engineering design methodologies and data-informed decision-making to support learners and their development.” The case rests on three premises:

  1. Disciplinary necessity: Learning science discoveries remain unapplied at scale (true but unquantified)

  2. Engineering as scaling mechanism: Just as penicillin required engineering to reach mass production, learning requires engineering to reach mass impact (analogy, not proof)

  3. Emergence proof: Learning engineering teams exist and achieve measurable results (demonstrated through case studies)

The logical chain: Science alone ≠ impact. Engineering transforms science into scalable solutions. Therefore, learning needs engineering.

The problem: This syllogism proves that some scaling mechanism is needed, not that “learning engineering” (as distinctly defined in this book) is that mechanism. Instructional design, educational technology development, and school reform movements have been scaling learning interventions for decades. The book must prove learning engineering is different and better, not just that it’s possible.

Methodology Examination: Can These Methods Bear the Weight of the Claims?

The Toolkit employs four evidentiary strategies:

1. Case Study Accumulation

  • Volume: 15+ detailed cases across 400 pages

  • Diversity: Geographic (Finland, Ghana, Rwanda, US, Singapore), scale (23 students to 33M), domain (K-12 to workforce), technology (VR to donkey carts)

  • Quality: Ranges from rigorous (Kaplan LSAT with experimental controls) to anecdotal (Duolingo uses Agile, no outcome data)

Assessment: Case studies prove existence (learning engineering teams exist and complete projects) but not superiority (these approaches produce better outcomes than alternatives). The missing counterfactual is fatal. We need: “Team A used learning engineering → outcome X. Team B used traditional approach → outcome X - 20%.”

2. Framework Integration The book synthesizes frameworks from:

  • Learning sciences (KLI, cognitive load, spaced repetition)

  • HCI/UX (personas, wireframes, user testing)

  • Software engineering (Agile, modular design, control theory)

  • Ethics (APA principles, design justice, value-sensitive design)

  • Analytics (prediction, inference, mining)

Assessment: This synthesis is the book’s major contribution—no other resource integrates this breadth. However:

  • Integration is additive (here are tools from each field) not generative (here’s a new framework that transcends the component parts)

  • Many frameworks contradict: Design justice (community control) vs. engineering (expert-driven), Agile (embrace change) vs. instrumentation (plan data collection)

  • No guidance on resolving framework conflicts

3. Definitional Consensus The IEEE ICICLE definition emerged through multidisciplinary dialogue (learning scientists, instructional designers, software engineers, psychometricians). This lends legitimacy.

Assessment: Consensus definitions establish what learning engineering is (scope, key concepts) but not whether it works (efficacy). The definition is circular: Learning engineering teams are those who “apply learning sciences using engineering methodologies and data-informed decisions.” But we can only identify such teams by observing their practices. There’s no independent validation criterion.

4. Tool Provision Chapters 8-18 provide 100+ pages of templates, checklists, code examples, worksheets.

Assessment: Highly practical, immediately useful. This is the book’s greatest strength for practitioners. However, tools aren’t evidence. Providing a fishbone diagram template (Ch. 8) doesn’t prove fishbone analysis improves learning engineering outcomes.

The Central Paradox: Scooter the Tutor

Chapter 6 presents the most important case in the book—not for what it proves about learning engineering, but for what it reveals about its limitations.

The data:

  • Intervention: Scooter the Tutor (puppy character) monitors students, provides encouragement/penalties based on behavior

  • Quantitative outcome: Gaming reduced 50%, learning gains significant (students using system improperly caught up to non-gaming students)

  • Qualitative outcome: Students who gamed hated Scooter, said it hurt their learning. Teachers rejected it. Deployment failed.

Baker’s conclusion: “It failed because we didn’t consider the other dimensions of our solution and its adoption.”

The problem: This case undermines the book’s core claim. Scooter was developed using learning engineering principles:

  1. ✓ Applied learning sciences (behavioral interventions, feedback timing)

  2. ✓ Used data-informed decisions (log analysis to detect gaming)

  3. ✓ Showed measurable learning improvement

Yet it failed to scale. Why? The quantitative data optimization (reduce gaming → improve learning) conflicted with qualitative factors (user satisfaction, teacher buy-in, school culture).

What this reveals: Data-driven decision-making is necessary but not sufficient. The book acknowledges this (Baker says “we didn’t consider other dimensions”) but doesn’t revise its definition or process model to account for it. If learning engineering can follow its own principles and still fail, what does that say about the principles?

What the Data Show vs. What the Authors Claim

Data show:

  • Learning engineering teams exist (proven by case studies)

  • Some use systematic processes (edX, Duolingo, CMU documented)

  • Some achieve measurable outcomes (Kaplan LSAT effect, CMU post-test gains)

  • Multiple existing frameworks can be integrated (the book does this synthesis)

Data do NOT show:

  • Learning engineering produces better outcomes than alternatives (no control groups)

  • The IEEE ICICLE definition identifies the right scope (vs. too narrow or too broad)

  • The Chapter 1 process model is optimal (vs. other possible processes)

  • The tools in Chs. 8-18 improve practice (tools are provided, effectiveness not validated)

Authors claim:

  • “Learning engineering is an emerging field that will become standard” (Introduction)

  • “Great advancements in learning can come from multidisciplinary approach” (repeated throughout)

  • “Data-informed decision-making is essential” (Chs. 5-6)

  • Tools chapters implicitly claim the tools work (by providing them without caveats)

The gap: Authors present learning engineering as established best practice backed by evidence. The evidence actually shows it’s an implementable approach with some successes, not a validated methodology with proven superiority.

Where the Reasoning Is Strong

1. The need for scaling mechanisms (Introduction, Ch. 4) The penicillin analogy is imperfect but directionally correct: Scientific discoveries don’t automatically reach populations who need them. Some translation mechanism is required. Whether that’s “engineering” specifically or some other approach (implementation science, diffusion of innovations, knowledge mobilization) is debatable, but the core need is real.

2. Data instrumentation practicality (Ch. 5, 13) The 4-line xAPI example proves that basic instrumentation is achievable without deep technical expertise. The Age of Learning cautionary tales (SUCCEED vs. CANCEL, naming debates) are valuable practitioner wisdom. The lesson—maintain data dictionaries, use standards, plan but accept iteration—is sound even if not empirically validated.

3. Human-centered design necessity (Ch. 3) The Math Readiness case (50+ children tested, 9 iterations, persona-driven design) demonstrates rigorous participatory design. The argument isn’t that human-centered design is sufficient (it’s not—you also need valid pedagogy, working technology, implementation support). The argument is it’s necessary. This is supported by:

  • User testing caught problems (playground background too distracting, helping hand confusing)

  • Persona prevented design-for-self bias

  • The iterative approach revealed edge cases (2-year-old vs. 3-year-old differences)

4. Learning sciences as foundation (Ch. 2, 9) The chapter accurately summarizes established findings (working vs. long-term memory, expertise development, spacing effects, metacognition). The KLI framework provides a coherent integration. The claim isn’t that learning engineering discovered these concepts; it’s that learning engineering applies them. This is defensible.

Where the Reasoning Fails

1. Circular definition problem

  • Learning engineering is defined by its process (Ch. 1)

  • The process is defined by what learning engineering teams do (case studies)

  • But learning engineering teams are identified by whether they follow the process

  • Result: Definitional loop with no external validation

2. The missing control group Almost no case study includes a control condition. Examples:

  • Math Readiness: Tested design iterations with children, but no comparison to Math Readiness developed without human-centered design

  • CMU Discrete Math: Post-test scores improved 2016 → 2018 after learning curve-informed revisions, but no control group taking the original course in 2018

  • edX: 33M students is impressive scale, but no comparison to alternative platforms (Coursera, Khan Academy) using different design principles

Why this matters: The book argues learning engineering is better than alternatives. Without controls, we only know it’s possible, not superior.

3. The standards interoperability promise Chapters 13-14 advocate for xAPI, Caliper, JSON-LD, linked data as enabling interoperability. The claim: “Standards enable plug-and-play systems, reduce costs, accelerate innovation.”

The reality:

  • Technical interoperability ≠ semantic interoperability. Two systems can both use xAPI but define “mastery” differently.

  • Standards proliferation: Figure 14.3 lists 26+ standards. This increases complexity rather than reducing it.

  • Vendor lock-in persists: Commercial LMS vendors implement standards partially or with proprietary extensions

  • No ROI data: The chapters don’t provide evidence that standards-based systems cost less or perform better than proprietary ones

4. Tool validation gap Chapters 8-18 (the Tools section) provide checklists, templates, worksheets, code examples. These are useful, but:

  • Fishbone diagrams (Ch. 8): Borrowed from manufacturing, no evidence they work better than informal problem analysis in learning contexts

  • SEEM-ED matrix (Ch. 17): 60-140 questions, no validation data, no prioritization

  • Implementation checklist (Ch. 16): 11 domains, 40+ questions, based on single case study (Zambia)

  • Motivation worksheet (Ch. 15): Crosswalks 4 frameworks but doesn’t adjudicate between them

The missing research: “Teams using Tool X produced outcomes Y% better than teams without, p < 0.05.”

Complications: Where Theory Meets Practice

The Scooter Paradox (detailed above): Quantitative success + qualitative failure = deployment failure. This exposes a fault line in “data-informed decision-making.” The book’s response is weak: “Consider multiple data types.” But how? When qual and quant conflict, which wins?

The Agile-Planning Tension (Chs. 5, 11, 13):

  • Instrumentation chapters: Plan data collection, use standards, maintain dictionaries

  • Agile chapter: Embrace change, iterate rapidly, don’t over-document

  • These aren’t compatible without trade-offs, but book doesn’t specify how to balance

The Framework Proliferation Problem (Chs. 7-18): Counting conservatively:

  • 5 human-centered design approaches (Ch. 3)

  • 20+ learning sciences concepts (Ch. 9)

  • 6+ team formation frameworks (Ch. 10)

  • 15+ design tools (Ch. 12)

  • 4 motivation frameworks (Ch. 15)

  • 11 implementation domains (Ch. 16)

  • 25 ethics questions (Ch. 17)

Total: ~80 frameworks/tools/checklists. Even if each is individually valid, the combination is overwhelming. The book provides no:

  • Prioritization: Which frameworks matter most?

  • Integration: How do frameworks interact?

  • Simplification: What’s the minimum viable toolkit?

Practitioner response: Likely to cherry-pick favorite tools, defeating the systematic approach the book advocates.

Broader Implications: What This Work Reveals About Its Problem Space

1. Legitimacy seeking The Toolkit reads like a bid for professional recognition. Evidence:

  • Extensive citations of engineering codes, IEEE standards, professional associations

  • Comparison to established professions (civil engineering, medicine, software engineering)

  • Emphasis on “learning engineer” as distinct job title

  • Closing quote (Ch. 4): 1961 definition of professional engineer

Why this matters: The book conflates existence with necessity. Learning engineering exists (proven). Is it necessary (unproven)? Could excellent learning products be developed without formal learning engineering (probably yes—see Montessori, Reggio Emilia, and successful MOOCs predating the term “learning engineering”)?

2. The interdisciplinarity challenge Learning engineering’s strength (integrates multiple disciplines) is also its weakness:

  • Breadth: No single person can master all components (acknowledged in Ch. 6 Educause quote)

  • Depth: Generalist learning engineers risk being “jack of all trades, master of none”

  • Communication: Each discipline has its own language. The book provides translation (good) but doesn’t address power dynamics when disciplines conflict

Example: Data scientist says “the model predicts 87% accuracy.” Learning scientist says “but the pedagogy violates spacing effect principles.” Software engineer says “implementing that would delay launch 6 months.” Who decides? The book doesn’t say.

3. The measurement problem Learning engineering assumes learning is measurable with sufficient precision to drive optimization. But:

  • Construct validity: Does test performance = learning? (Contested)

  • Transfer: Does performance on training tasks = performance on real-world tasks? (Often no)

  • Time scale: Immediate learning ≠ retention ≠ transfer ≠ expert development

The book acknowledges these issues (Ch. 2: multiple memory types, Ch. 9: transfer is hard) but doesn’t address how to engineer for unmeasurable or poorly measurable outcomes.

Example: Duolingo optimizes for “probability you remember this word tomorrow.” But language fluency (the real goal) isn’t just word recall—it’s syntax, pragmatics, cultural context, real-time generation. The book doesn’t address the gap between optimizable proxies and ultimate goals.

Assessment of Contribution and Limitations

What this book accomplishes:

  1. Definitional clarity: Establishes working definition of learning engineering through IEEE ICICLE consensus, distinguishes it from adjacent fields (instructional design, learning sciences, EdTech development)

  2. Framework synthesis: Integrates learning sciences, HCI, software engineering, ethics, data science into coherent (if overwhelming) toolkit. No other resource does this.

  3. Practitioner focus: Templates, checklists, code examples make abstract principles actionable. The tool chapters (8-18) are immediately useful.

  4. Existence proof: Case studies demonstrate learning engineering can be done across diverse contexts, scales, technologies.

  5. Advocacy: Makes compelling case that learning science findings are underutilized, that systematic approaches could improve learning at scale, that data-driven iteration is valuable.

What this book does NOT accomplish:

  1. Efficacy proof: Doesn’t demonstrate learning engineering produces superior outcomes to alternatives. No controlled comparisons, no meta-analyses of learning engineering vs. non-learning engineering approaches.

  2. Falsification: Doesn’t specify what would disprove learning engineering’s value. What outcomes would make us reject the approach?

  3. Boundary conditions: Doesn’t identify when learning engineering is inappropriate. Are there learning contexts where systematic engineering is overkill? The donkey cart example (Introduction) suggests low-tech solutions can work—but that’s framed as “still learning engineering” (circular).

  4. Cost-benefit: Doesn’t quantify the costs of learning engineering (team coordination overhead, tool learning curves, framework compliance) vs. benefits.

  5. Conflict resolution: Provides many frameworks but no meta-framework for choosing between them when they conflict.

  6. Negative cases: Almost no discussion of failures, limitations, or contexts where learning engineering has been tried and didn’t work.

Closing: What We’ve Proven and What Remains Open

Proven:

  • Multidisciplinary teams addressing learning challenges exist and self-identify as learning engineers

  • These teams use systematic processes combining learning sciences, design methods, and data analytics

  • Some produce measurable outcomes (edX scale, Kaplan effect sizes, CMU learning gains)

  • Practitioners need integrated resources spanning learning sciences + engineering + design + ethics + analytics

Plausible but unproven:

  • Learning engineering produces better outcomes than traditional instructional design

  • The IEEE ICICLE definition captures the right scope and boundaries

  • The Chapter 1 process model is optimal for all (or most) learning engineering challenges

  • The tools in Chapters 8-18 improve practice vs. ad-hoc methods

  • Data-driven decision-making consistently leads to better learning products

Open questions:

  • What’s the minimum viable learning engineering practice? (Can you do 20% of the process and get 80% of the value?)

  • When does learning engineering create more overhead than value?

  • How do we adjudicate between competing frameworks when they conflict?

  • What outcomes would make us reject learning engineering as an approach?

  • Is “learning engineer” a distinct profession or a mindset any educator/designer can adopt?

The tension this book doesn’t resolve:

Learning engineering is presented as both:

  • Descriptive (this is what learning engineering teams do)

  • Prescriptive (this is what you should do to succeed)

But the book doesn’t prove the teams doing “learning engineering” succeed because of their systematic approach vs. despite it, or orthogonal to it. Duolingo could be successful because:

  1. They use learning engineering (the book’s claim)

  2. They have talented employees, good funding, and a compelling product (confounds)

  3. The mobile language learning market was underserved (timing)

The book doesn’t isolate learning engineering as the causal factor.


Final Verdict

Strongest Sections:

  • Chapter 3 (Math Readiness, KNO game): Detailed participatory design case studies with documented iterations

  • Chapter 5 (Age of Learning data lessons): Honest autopsy of instrumentation mistakes, practical wisdom

  • Chapter 2 (learning sciences foundations): Accurate synthesis of cognitive science research

  • Chapters 13-14 (instrumentation and software tools): Concrete technical guidance with code examples

Weakest Sections:

  • Chapter 1 (process model): Asserted based on consensus + one case study

  • Chapter 7/17 (ethics): Framework proliferation without prioritization or validation

  • Chapter 10 (teaming): Generic team advice, not learning engineering-specific

  • Chapter 15 (motivation): Four frameworks crosswalked but not integrated or tested

The Book’s Unstated Assumption: Systematic > ad-hoc. The entire edifice assumes that formalized, tool-heavy, framework-driven approaches produce better outcomes than talented practitioners using informal methods. This is plausible (aviation safety improved through checklists, software quality improved through Agile) but unproven for learning engineering.

What Would Strengthen the Argument:

  1. Controlled studies: Learning engineering team vs. traditional instructional design team addressing same challenge

  2. Failure analysis: Cases where learning engineering was tried and failed, with lessons learned

  3. Longitudinal data: Do learning engineering products maintain advantages over time?

  4. Cost-effectiveness: ROI calculations for learning engineering investment

  5. Minimal viable practice: What’s the 80/20 of learning engineering?

Verdict: The Learning Engineering Toolkit succeeds as a practitioner’s handbook and field-defining document. It establishes what learning engineering is, who does it, and what tools they use. It fails as an evidence-based argument for learning engineering’s superiority. The book proves possibility, not necessity. It shows existence, not optimality. It provides tools, not validation.

The field needs what Bror Saxberg called for in the Preface: “expect to iterate... celebrate success, but instrument for failure!” The next edition should instrument its own failures.


Tags: learning engineering methodology, IEEE ICICLE standards, educational technology framework, data-informed instructional design, evidence-based pedagogy

Nik Bear Brown, Poet and Songwriter