diff --git a/skills/design-thinking/test.md b/skills/design-thinking/test.md
new file mode 100644
index 0000000..c692c57
--- /dev/null
+++ b/skills/design-thinking/test.md
@@ -0,0 +1,228 @@
+# Skill: Design Thinking - Test
+
+## Description
+Validate prototypes with real users to gather feedback, measure success, and decide next steps.
+
+## Input
+- **prototype**: Prototype to test (required)
+- **test_type**: usability|ab_test|interview|analytics (optional, default: usability)
+- **user_segments**: Which users to test with (required)
+- **success_criteria**: What defines success (required)
+
+## Testing Frameworks
+
+### 1. Usability Testing
+
+**5-User Rule:**
+Testing with 5 users typically uncovers about 85% of usability issues.
+
+**Test Script Template:**
+```
+INTRO (2 mins)
+- Thanks for participating
+- We're testing the product, not you
+- Please think aloud as you go
+- There are no wrong answers
+
+CONTEXT (1 min)
+- Scenario: "You need to create a new campaign..."
+- Goal: "Complete the task as you normally would"
+
+TASKS (10-15 mins)
+Task 1: [Specific scenario]
+- Observe: Where do they pause?
+- Note: What do they say?
+- Ask: "What are you thinking?"
+
+Task 2: [Next scenario]
+- Same observation process
+
+DEBRIEF (5 mins)
+- What was easy?
+- What was confusing?
+- What would you change?
+- Would you use this?
+```
+
+**What to Observe:**
+- Time to complete task
+- Number of errors/retries
+- Hesitation points
+- Verbal confusion
+- Emotional reactions
+- Workarounds attempted
+
+### 2. Feedback Collection Templates
+
+**Quick Rating Scale:**
+```
+After each task:
+"How easy was that? (1-5)"
+1 = Very difficult
+2 = Difficult
+3 = Okay
+4 = Easy
+5 = Very easy
+```
+
+**System Usability Scale (SUS):**
+```
+Rate 1-5 (1=Strongly Disagree, 5=Strongly Agree):
+1. I would use this frequently
+2. I found it unnecessarily complex
+3. I found it easy to use
+4. I would need support to use this
+5. Features were well integrated
+6. There was too much inconsistency
+7. Most people would learn quickly
+8. I found it cumbersome
+9. I felt confident using it
+10. I needed to learn a lot first
+
+Score (0-100): ((sum of odd items - 5) + (25 - sum of even items)) * 2.5
+> 68 = Above average
+< 68 = Below average
+```
+
+**Post-Task Interview:**
+```
+1. What did you like most?
+2. What frustrated you?
+3. What would you change?
+4. Would you use this over your current solution?
+5. What's missing?
+```
+
+### 3. A/B Testing Patterns
+
+**Simple A/B Test:**
+```json
+{
+  "test_name": "Campaign Creation Flow",
+  "hypothesis": "1-step flow will increase completion by 30%",
+  "variant_a": {
+    "name": "Control (3-step)",
+    "users": 50,
+    "metric": "completion_rate"
+  },
+  "variant_b": {
+    "name": "Treatment (1-step)",
+    "users": 50,
+    "metric": "completion_rate"
+  },
+  "duration": "7 days",
+  "success_threshold": "30% improvement"
+}
+```
+
+**Metrics to Track:**
+- Completion rate
+- Time to complete
+- Error rate
+- Drop-off points
+- User satisfaction
+
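+**Scoring Sketch (illustrative):**
+A minimal sketch of how the SUS formula from section 2 and the A/B completion-rate comparison above could be computed. It is not part of this skill's interface; the function names and the numbers in the usage example are hypothetical.
+
+```python
+from math import sqrt
+
+
+def sus_score(responses: list[int]) -> float:
+    """Apply the SUS formula: ((sum of odd items - 5) + (25 - sum of even items)) * 2.5."""
+    if len(responses) != 10:
+        raise ValueError("SUS needs exactly 10 responses (1-5 each)")
+    odd = sum(responses[0::2])   # items 1, 3, 5, 7, 9 (positively worded)
+    even = sum(responses[1::2])  # items 2, 4, 6, 8, 10 (negatively worded)
+    return ((odd - 5) + (25 - even)) * 2.5
+
+
+def completion_lift(completed_a: int, users_a: int, completed_b: int, users_b: int) -> dict:
+    """Compare A/B completion rates; report relative lift and a two-proportion z-score."""
+    rate_a = completed_a / users_a
+    rate_b = completed_b / users_b
+    pooled = (completed_a + completed_b) / (users_a + users_b)
+    se = sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
+    return {
+        "rate_a": rate_a,
+        "rate_b": rate_b,
+        "relative_lift": (rate_b - rate_a) / rate_a,
+        "z_score": (rate_b - rate_a) / se,  # |z| > 1.96 roughly means p < 0.05
+    }
+
+
+# Hypothetical numbers, matching the 50-user variants above:
+print(sus_score([5, 2, 4, 2, 4, 1, 5, 2, 4, 2]))                     # 82.5
+print(completion_lift(completed_a=30, users_a=50, completed_b=42, users_b=50))
+```
+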
+### 4. Iteration Decision Criteria
+
+**Go/No-Go Framework:**
+```
+SHIP IT if:
+- [ ] 80%+ of users complete the task successfully
+- [ ] Average task time meets target
+- [ ] SUS score > 68
+- [ ] No critical usability issues
+- [ ] Users prefer it to the current solution
+
+ITERATE if:
+- [ ] 50-80% success rate
+- [ ] Task time around 2x target
+- [ ] SUS score 50-68
+- [ ] 2+ moderate issues
+- [ ] Mixed user preference
+
+PIVOT if:
+- [ ] <50% success rate
+- [ ] Task time >3x target
+- [ ] SUS score <50
+- [ ] Critical blocking issues
+- [ ] Users prefer the current solution
+```
+
+A minimal scripted sketch of these thresholds appears at the end of this file.
+
+## Output Format
+```json
+{
+  "status": "success",
+  "test_summary": {
+    "type": "usability_test",
+    "users_tested": 5,
+    "date": "2024-12-14",
+    "duration": "45 minutes total"
+  },
+  "results": {
+    "completion_rate": "80%",
+    "avg_time": "8 seconds",
+    "sus_score": 72,
+    "user_satisfaction": "4.2/5"
+  },
+  "key_findings": [
+    {
+      "issue": "Users confused by template dropdown",
+      "severity": "moderate",
+      "users_affected": 3,
+      "evidence": "Hesitated 5+ seconds, said 'not sure what to pick'"
+    },
+    {
+      "issue": "Success message not clear",
+      "severity": "low",
+      "users_affected": 2,
+      "evidence": "Asked 'did it work?'"
+    }
+  ],
+  "positive_feedback": [
+    "Much faster than current process",
+    "Templates are helpful",
+    "Clean and simple interface"
+  ],
+  "improvement_suggestions": [
+    "Add template preview on hover",
+    "Show success confirmation clearly",
+    "Save last-used template as default"
+  ],
+  "decision": {
+    "verdict": "iterate",
+    "reasoning": "80% success rate is good, but template confusion needs a fix",
+    "next_actions": [
+      "Add template previews",
+      "Improve success feedback",
+      "Test again with 3 users"
+    ]
+  },
+  "next_step": "Implement improvements and loop back to testing via /dt"
+}
+```
+
+## Quality Gates
+- [ ] Tested with 5+ users per segment
+- [ ] Clear success/failure criteria defined
+- [ ] Quantitative data collected (time, completion, SUS)
+- [ ] Qualitative feedback captured (quotes, observations)
+- [ ] Decision made (ship/iterate/pivot)
+- [ ] Next actions defined
+
+## Token Budget
+- Max input: 800 tokens
+- Max output: 1800 tokens
+
+## Model
+- Recommended: sonnet (pattern analysis)
+
+## Philosophy
+> "In God we trust. All others must bring data."
+> Test with users, not assumptions.
+
+**Keep it simple:**
+- 5 users find 85% of issues
+- Watch what they do, not what they say
+- Quantify where possible
+- Act on feedback fast
+- Test early, test often
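+
+## Appendix: Go/No-Go Sketch
+The snippet below is a minimal, illustrative sketch of the Go/No-Go thresholds from section 4, applied to figures like those in the `results` block above. It is not part of this skill's interface; the function name, parameters, and usage numbers are hypothetical, and it covers only the quantitative checks (qualitative items such as mixed user preference still need human judgment).
+
+```python
+def go_no_go(success_rate: float, time_vs_target: float, sus: float,
+             critical_issues: int) -> str:
+    """Return 'ship', 'iterate', or 'pivot' using the section 4 thresholds."""
+    ship = (success_rate >= 0.8 and time_vs_target <= 1.0
+            and sus > 68 and critical_issues == 0)
+    pivot = (success_rate < 0.5 or time_vs_target > 3.0
+             or sus < 50 or critical_issues > 0)
+    if ship:
+        return "ship"
+    if pivot:
+        return "pivot"
+    return "iterate"
+
+
+# Hypothetical numbers loosely mirroring the example output above:
+print(go_no_go(success_rate=0.80, time_vs_target=1.5, sus=72, critical_issues=0))  # iterate
+```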