The Problem
AI just generated a learning objective. Or a scenario. Or an entire module.
Question: Is it good?
You have 60 seconds to decide: ship it, fix it, or scrap it.
This matters. Accept bad output and you waste time polishing garbage. Reject good output and you waste time regenerating what was fine.
You need a fast, repeatable process for evaluation.
The Five Quality Criteria
Every AI output gets evaluated on these five dimensions:
| Criterion | Question | Priority |
|---|---|---|
| Accuracy | Is the information factually correct? | Critical |
| Relevance | Does it fit your specific context/audience? | Critical |
| Completeness | Does it cover what's needed without gaps? | Important |
| Clarity | Is it clear and understandable? | Important |
| Tone | Does it match your organizational voice? | Nice-to-have |
Decision logic:
- Accuracy or Relevance fails → Reject immediately
- Completeness or Clarity fails → Refine with follow-up prompts
- Tone fails → Quick manual edit
Critical issues get rejected. Fixable issues get refined.
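The decision logic above can be sketched as a small triage function. This is an illustrative sketch, not part of any tool; the criterion and action names are assumptions chosen to mirror the table:

```python
def triage(failed):
    """Map a set of failed criteria to an action: reject, refine, edit, or ship.

    `failed` is a set of lowercase criterion names, e.g. {"clarity", "tone"}.
    Illustrative only; names follow the five-criteria table above.
    """
    critical = {"accuracy", "relevance"}   # fail -> reject immediately
    fixable = {"completeness", "clarity"}  # fail -> follow-up prompt
    if failed & critical:
        return "reject: regenerate from scratch"
    if failed & fixable:
        return "refine: follow-up prompt"
    if "tone" in failed:
        return "edit: quick manual fix"
    return "ship"
```

For example, `triage({"clarity", "tone"})` refines (the worse problem wins), while `triage(set())` ships.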
Red Flags: Reject Immediately
Some output isn't worth fixing. Recognize these and start over.
| Red flag | Example | Why reject |
|---|---|---|
| Factual errors | Compliance training misrepresents a regulation | Can't build on a wrong foundation |
| Wrong audience level | Asked for beginner-level, got expert-level technical content | Easier to regenerate than simplify |
| Generic and vague | "Employees should follow proper procedures" | No value—could apply to anyone |
| Misunderstood request | Asked for scenario, got case study | Didn't do what you asked |
| Wrong tone | Casual for compliance; jargon for frontline | Tone mismatch undermines credibility |
💡 The rule
If the foundation is wrong, rebuild. Don't polish garbage.
Green Flags: Refine and Use
These outputs are worth improving.
| Green flag | Example | Fix |
|---|---|---|
| Accurate but incomplete | Right action verb, missing context | "Add context about the manufacturing floor" |
| Relevant but unclear | Realistic scenario, confusing structure | "Simplify. Use shorter sentences." |
| Complete but generic | Correct questions, could apply anywhere | "Make specific to our retail environment" |
| Clear but wrong tone | Understandable, too formal | Quick edit or "Rewrite conversationally" |
💡 The rule
Good bones, needs refinement. That's what AI is for.
The 60-Second Process
Run this workflow for every AI output:
| Step | Check | Time | If fail |
|---|---|---|---|
| 1 | Accuracy – Factually correct? | 10 sec | Reject, regenerate |
| 2 | Relevance – Fits our context? | 10 sec | Reject, regenerate |
| 3 | Completeness – Any gaps? | 15 sec | Note what's missing |
| 4 | Clarity – Understandable? | 15 sec | Note what's confusing |
| 5 | Tone – Sounds like us? | 10 sec | Quick edit or prompt |
60 seconds total. Decision: ship, refine, or reject.
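The ordered workflow, including its early exit on a critical fail, can be sketched as follows. Criterion names and time budgets come from the table above; the function itself is a hypothetical illustration, not a real tool:

```python
# (criterion, seconds budgeted, action on failure) in checklist order
CHECKS = [
    ("accuracy", 10, "reject"),
    ("relevance", 10, "reject"),
    ("completeness", 15, "refine"),
    ("clarity", 15, "refine"),
    ("tone", 10, "edit"),
]

# verdict severity: a worse verdict replaces a milder one
ORDER = {"ship": 0, "edit": 1, "refine": 2}

def sixty_second_review(passes):
    """Run the checks in order; `passes` maps criterion -> bool.

    Returns (verdict, seconds elapsed). A critical fail (reject)
    stops the review immediately; fixable fails accumulate.
    """
    elapsed, verdict = 0, "ship"
    for criterion, seconds, on_fail in CHECKS:
        elapsed += seconds
        if passes.get(criterion, True):
            continue
        if on_fail == "reject":
            return "reject", elapsed  # don't waste the remaining seconds
        if ORDER[on_fail] > ORDER[verdict]:
            verdict = on_fail
    return verdict, elapsed
```

An accuracy fail exits after 10 seconds; an output that only stumbles on clarity and tone takes the full 60 and comes back as "refine".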
What "Good Enough" Looks Like
You're not aiming for perfection. You're aiming for "better than starting from scratch."
✅ Good Enough
- Saves you time overall (even with refinement)
- Gets you 70-80% there
- Gives you something to react to
- Requires expertise to finish, not to create
❌ Not Good Enough
- Takes longer to fix than to write yourself
- Requires complete reconstruction
- Misses the mark so badly you're starting over
💡 The test
If you're spending more time fighting with AI than you would have spent writing it yourself, stop. Write it yourself or try a completely different prompt.
Example 1: Learning Objective
Suppose the AI returned this objective: "Learners will understand good customer service." The 60-second evaluation:
- ✅ Accuracy – Yes, nothing factually wrong
- ✅ Relevance – Yes, matches our customer service training topic
- ❌ Completeness – "Understand" isn't measurable
- ✅ Clarity – Clear, just not specific
- ✅ Tone – Fine
Verdict: refine. One follow-up prompt swaps "understand" for a measurable verb. Total time, including the fix: about 90 seconds. Still faster than writing from scratch.
Key Takeaways
- 60-second evaluation saves hours. Fast triage prevents wasting time polishing bad output.
- Critical fails = reject. Wrong facts or context? Start over immediately.
- Fixable fails = refine. Incomplete or unclear? That's what follow-up prompts are for.
- Good enough = 70-80% there. You're not aiming for perfection from AI—just a strong starting point.
Try It Now
🎯 Your task:
Generate a learning objective for your current project. Run the 60-second evaluation. Does it pass? If not, what failed? Refine or reject accordingly.
The test: Can you make the decision in under 60 seconds?
📥 Download: Quality evaluation checklist (PDF)
One-page checklist for evaluating AI output in 60 seconds.