# Skill: Design Thinking - Test

## Description

Validate prototypes with real users to gather feedback, measure success, and decide next steps.

## Input

- **prototype**: Prototype to test (required)
- **test_type**: usability|ab_test|interview|analytics (optional, default: usability)
- **user_segments**: Which users to test with (required)
- **success_criteria**: What defines success (required)

## Testing Frameworks

### 1. Usability Testing

**5-User Rule:** Test with 5 users to find 85% of usability issues.

**Test Script Template:**
```
INTRO (2 mins)
- Thanks for participating
- Testing the product, not you
- Think aloud encouraged
- No wrong answers

CONTEXT (1 min)
- Scenario: "You need to create a new campaign..."
- Goal: "Complete the task as you normally would"

TASKS (10-15 mins)
Task 1: [Specific scenario]
- Observe: Where do they pause?
- Note: What do they say?
- Ask: "What are you thinking?"

Task 2: [Next scenario]
- Same observation process

DEBRIEF (5 mins)
- What was easy?
- What was confusing?
- What would you change?
- Would you use this?
```

**What to Observe:**
- Time to complete task
- Number of errors/retries
- Hesitation points
- Verbal confusion
- Emotional reactions
- Workarounds attempted

### 2. Feedback Collection Templates

**Quick Rating Scale:**
```
After each task: "How easy was that? (1-5)"
1 = Very difficult
2 = Difficult
3 = Okay
4 = Easy
5 = Very easy
```

**System Usability Scale (SUS):**
```
Rate 1-5 (1=Strongly Disagree, 5=Strongly Agree):
1. I would use this frequently
2. I found it unnecessarily complex
3. I found it easy to use
4. I would need support to use this
5. Features were well integrated
6. There was too much inconsistency
7. Most people would learn quickly
8. I found it cumbersome
9. I felt confident using it
10. I needed to learn a lot first

Score: ((Sum odd items - 5) + (25 - sum even items)) * 2.5
> 68 = Above average
< 68 = Below average
```

(A scoring sketch follows the A/B testing patterns below.)

**Post-Task Interview:**
```
1. What did you like most?
2. What frustrated you?
3. What would you change?
4. Would you use this over current solution?
5. What's missing?
```

### 3. A/B Testing Patterns

**Simple A/B Test:**
```json
{
  "test_name": "Campaign Creation Flow",
  "hypothesis": "1-step flow will increase completion by 30%",
  "variant_a": {
    "name": "Control (3-step)",
    "users": 50,
    "metric": "completion_rate"
  },
  "variant_b": {
    "name": "Treatment (1-step)",
    "users": 50,
    "metric": "completion_rate"
  },
  "duration": "7 days",
  "success_threshold": "30% improvement"
}
```

**Metrics to Track:**
- Completion rate
- Time to complete
- Error rate
- Drop-off points
- User satisfaction
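The SUS formula above is easy to mis-apply because odd and even items are scored differently, so a small helper keeps it consistent across participants. A minimal sketch in Python; the function name and example ratings are illustrative, not part of this skill's interface:

```python
def sus_score(responses):
    """System Usability Scale score from ten 1-5 ratings, in questionnaire order."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten ratings between 1 and 5")
    odd = sum(responses[0::2])   # items 1, 3, 5, 7, 9 (positive statements)
    even = sum(responses[1::2])  # items 2, 4, 6, 8, 10 (negative statements)
    return ((odd - 5) + (25 - even)) * 2.5


print(sus_score([4, 2, 4, 1, 4, 2, 5, 2, 4, 2]))  # 80.0 - above the 68 benchmark
```

Score each participant separately, then average the per-participant scores; don't pool the raw ratings.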
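The A/B pattern above states its success threshold as "30% improvement"; the sketch below reads that as relative lift in completion rate, which is an assumption (use absolute percentage points if that is what your team means). Names are illustrative:

```python
def relative_lift(control_completed, control_users, treatment_completed, treatment_users):
    """Relative improvement of the treatment completion rate over control."""
    control_rate = control_completed / control_users
    treatment_rate = treatment_completed / treatment_users
    return (treatment_rate - control_rate) / control_rate


# 50 users per variant, as in the config above
lift = relative_lift(30, 50, 41, 50)
print(f"lift: {lift:.0%}, meets 30% threshold: {lift >= 0.30}")  # lift: 37%, meets 30% threshold: True
```

With only 50 users per variant, a gap this size can still be noise, so sanity-check significance (for example with a two-proportion test) before acting on it.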
### 4. Iteration Decision Criteria

**Go/No-Go Framework:**
```
SHIP IT if:
- [ ] 80%+ users complete task successfully
- [ ] Average task time meets target
- [ ] SUS score > 68
- [ ] No critical usability issues
- [ ] Users prefer it to current solution

ITERATE if:
- [ ] 50-80% success rate
- [ ] Task time 2x target
- [ ] SUS score 50-68
- [ ] 2+ moderate issues
- [ ] Mixed user preference

PIVOT if:
- [ ] <50% success rate
- [ ] Task time >3x target
- [ ] SUS score <50
- [ ] Critical blocking issues
- [ ] Users prefer current solution
```

A small code sketch of these thresholds appears in the appendix at the end of this skill.

## Output Format

```json
{
  "status": "success",
  "test_summary": {
    "type": "usability_test",
    "users_tested": 5,
    "date": "2024-12-14",
    "duration": "45 minutes total"
  },
  "results": {
    "completion_rate": "80%",
    "avg_time": "8 seconds",
    "sus_score": 72,
    "user_satisfaction": "4.2/5"
  },
  "key_findings": [
    {
      "issue": "Users confused by template dropdown",
      "severity": "moderate",
      "users_affected": 3,
      "evidence": "Hesitated 5+ seconds, said 'not sure what to pick'"
    },
    {
      "issue": "Success message not clear",
      "severity": "low",
      "users_affected": 2,
      "evidence": "Asked 'did it work?'"
    }
  ],
  "positive_feedback": [
    "Much faster than current process",
    "Templates are helpful",
    "Clean and simple interface"
  ],
  "improvement_suggestions": [
    "Add template preview on hover",
    "Show success confirmation clearly",
    "Save last-used template as default"
  ],
  "decision": {
    "verdict": "iterate",
    "reasoning": "80% success rate is good but template confusion needs fix",
    "next_actions": [
      "Add template previews",
      "Improve success feedback",
      "Test again with 3 users"
    ]
  },
  "next_step": "Implement improvements and /dt loop back to testing"
}
```

## Quality Gates

- [ ] Tested with 5+ users per segment
- [ ] Clear success/failure criteria defined
- [ ] Quantitative data collected (time, completion, SUS)
- [ ] Qualitative feedback captured (quotes, observations)
- [ ] Decision made (ship/iterate/pivot)
- [ ] Next actions defined

## Token Budget

- Max input: 800 tokens
- Max output: 1800 tokens

## Model

- Recommended: sonnet (pattern analysis)

## Philosophy

> "In God we trust. All others must bring data."
> Test with users, not assumptions.

**Keep it simple:**
- 5 users find 85% of issues
- Watch what they do, not what they say
- Quantify where possible
- Act on feedback fast
- Test early, test often
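## Appendix: Go/No-Go Sketch

The thresholds in the Iteration Decision Criteria section are mechanical enough to script. A minimal sketch, assuming illustrative names and treating anything that neither ships nor pivots as an iterate:

```python
def verdict(success_rate, task_time_ratio, sus, critical_issues, users_prefer):
    """Map test results onto the ship/iterate/pivot thresholds.

    task_time_ratio = measured task time / target task time
    critical_issues = count of critical (blocking) usability issues
    users_prefer    = "new", "mixed", or "current"
    """
    ship = (success_rate >= 0.80 and task_time_ratio <= 1.0 and sus > 68
            and critical_issues == 0 and users_prefer == "new")
    pivot = (success_rate < 0.50 or task_time_ratio > 3.0 or sus < 50
             or critical_issues > 0 or users_prefer == "current")
    if ship:
        return "ship"
    if pivot:
        return "pivot"
    return "iterate"


# Example: strong completion and SUS, but mixed preference -> keep iterating
print(verdict(success_rate=0.80, task_time_ratio=1.0, sus=72,
              critical_issues=0, users_prefer="mixed"))  # iterate
```

Treat the thresholds as defaults to tune per project, not hard rules.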