Add comprehensive Test phase skill
skills/design-thinking/test.md
@@ -0,0 +1,228 @@
# Skill: Design Thinking - Test

## Description
Validate prototypes with real users to gather feedback, measure success, and decide next steps.

## Input
- **prototype**: Prototype to test (required)
- **test_type**: usability|ab_test|interview|analytics (optional, default: usability)
- **user_segments**: Which users to test with (required)
- **success_criteria**: What defines success (required)

## Testing Frameworks

### 1. Usability Testing

**5-User Rule:**
Testing with 5 users typically uncovers about 85% of usability issues, based on Nielsen's estimate that a single user surfaces roughly 31% of them.

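The arithmetic behind the rule can be sketched as follows. This is a minimal illustration, assuming every issue is equally likely to be seen by each user; the function name and the default rate of 0.31 are assumptions for the example, not part of this skill.

```python
def share_of_issues_found(n_users: int, per_user_rate: float = 0.31) -> float:
    """Expected share of usability issues seen at least once across n_users.

    per_user_rate is an assumed average chance that one user hits a given
    issue; real products vary, so treat the result as a rough guide.
    """
    if n_users < 0 or not 0.0 <= per_user_rate <= 1.0:
        raise ValueError("n_users must be >= 0 and per_user_rate within [0, 1]")
    # An issue is missed only if every single user misses it.
    return 1.0 - (1.0 - per_user_rate) ** n_users


for n in (1, 3, 5, 10):
    print(f"{n:>2} users -> {share_of_issues_found(n):.0%} of issues")
```

The curve flattens quickly after 5 users, which is why several small rounds tend to be more useful than one large one.
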
**Test Script Template:**
```
INTRO (2 mins)
- Thanks for participating
- Testing product, not you
- Think aloud encouraged
- No wrong answers

CONTEXT (1 min)
- Scenario: "You need to create a new campaign..."
- Goal: "Complete the task as you normally would"

TASKS (10-15 mins)
Task 1: [Specific scenario]
- Observe: Where do they pause?
- Note: What do they say?
- Ask: "What are you thinking?"

Task 2: [Next scenario]
- Same observation process

DEBRIEF (5 mins)
- What was easy?
- What was confusing?
- What would you change?
- Would you use this?
```

**What to Observe:**
- Time to complete task
- Number of errors/retries
- Hesitation points
- Verbal confusion
- Emotional reactions
- Workarounds attempted

### 2. Feedback Collection Templates

**Quick Rating Scale:**
```
After each task:
"How easy was that? (1-5)"
1 = Very difficult
2 = Difficult
3 = Okay
4 = Easy
5 = Very easy
```

**System Usability Scale (SUS):**
```
Rate 1-5 (1=Strongly Disagree, 5=Strongly Agree):
1. I would use this frequently
2. I found it unnecessarily complex
3. I found it easy to use
4. I would need support to use this
5. Features were well integrated
6. There was too much inconsistency
7. Most people would learn quickly
8. I found it cumbersome
9. I felt confident using it
10. I needed to learn a lot first

Score: ((Sum odd items - 5) + (25 - sum even items)) * 2.5
> 68 = Above average
< 68 = Below average
```

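The scoring formula above can be sketched in code. This is an illustrative helper, not part of the skill; the function name `sus_score` and the input shape (a list of ten 1-5 ratings in questionnaire order) are assumptions.

```python
def sus_score(responses: list[int]) -> float:
    """Compute a SUS score (0-100) from ten 1-5 ratings in question order."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("expected exactly ten ratings, each between 1 and 5")
    odd_sum = sum(responses[0::2])   # items 1, 3, 5, 7, 9 (positively worded)
    even_sum = sum(responses[1::2])  # items 2, 4, 6, 8, 10 (negatively worded)
    return ((odd_sum - 5) + (25 - even_sum)) * 2.5


# Hypothetical participant: agrees with positive items, disagrees with negative ones.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```

Score each participant separately and average the results; averaging raw ratings first loses the per-participant view.
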
**Post-Task Interview:**
```
1. What did you like most?
2. What frustrated you?
3. What would you change?
4. Would you use this over your current solution?
5. What's missing?
```

### 3. A/B Testing Patterns

**Simple A/B Test:**
```json
{
  "test_name": "Campaign Creation Flow",
  "hypothesis": "1-step flow will increase completion by 30%",
  "variant_a": {
    "name": "Control (3-step)",
    "users": 50,
    "metric": "completion_rate"
  },
  "variant_b": {
    "name": "Treatment (1-step)",
    "users": 50,
    "metric": "completion_rate"
  },
  "duration": "7 days",
  "success_threshold": "30% improvement"
}
```

**Metrics to Track:**
- Completion rate
- Time to complete
- Error rate
- Drop-off points
- User satisfaction

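To judge whether a difference in completion rate is more than noise, one common option is a two-proportion z-test. The sketch below uses only the standard library; the function name and the sample counts are hypothetical, and the skill itself does not prescribe a particular statistical test.

```python
import math


def two_proportion_z_test(success_a: int, n_a: int,
                          success_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in completion rates."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both variants need at least one user")
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both variants perform the same.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # all users succeeded or all failed: no evidence either way
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Hypothetical counts: control 30/50 (60%), treatment 39/50 (78%, a 30% relative lift).
z, p = two_proportion_z_test(30, 50, 39, 50)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these illustrative counts the p-value lands just above 0.05, so 50 users per variant may be too few to confirm even a 30% relative lift; a larger sample or a longer run may be needed.
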
### 4. Iteration Decision Criteria

**Go/No-Go Framework:**
```
SHIP IT if:
- [ ] 80%+ users complete task successfully
- [ ] Average task time meets target
- [ ] SUS score > 68
- [ ] No critical usability issues
- [ ] Users prefer it to current solution

ITERATE if:
- [ ] 50-80% success rate
- [ ] Task time 2x target
- [ ] SUS score 50-68
- [ ] 2+ moderate issues
- [ ] Mixed user preference

PIVOT if:
- [ ] <50% success rate
- [ ] Task time >3x target
- [ ] SUS score <50
- [ ] Critical blocking issues
- [ ] Users prefer current solution
```

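One way to apply the checklist mechanically is sketched below. It makes assumptions the framework does not state: any single PIVOT criterion forces a pivot, SHIP requires every SHIP criterion, and everything in between (including task times between target and 3x) falls to ITERATE. The function and parameter names are hypothetical.

```python
from typing import Optional


def decide(success_rate: float, time_ratio: float, sus: float,
           critical_issues: int, moderate_issues: int,
           prefers_new: Optional[bool]) -> str:
    """Return "ship", "iterate", or "pivot".

    time_ratio is average task time divided by target time.
    prefers_new is True, False, or None for mixed preference.
    """
    # Assumption: one pivot-level signal is enough to pivot.
    if (success_rate < 0.50 or time_ratio > 3.0 or sus < 50
            or critical_issues > 0 or prefers_new is False):
        return "pivot"
    # Assumption: shipping needs all ship-level signals at once.
    if (success_rate >= 0.80 and time_ratio <= 1.0 and sus > 68
            and moderate_issues < 2 and prefers_new is True):
        return "ship"
    return "iterate"


print(decide(0.70, 1.5, 60, critical_issues=0, moderate_issues=2, prefers_new=None))
```

Treat the result as a starting point for discussion; a single moderate issue that affects most users can still justify another iteration.
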
## Output Format
```json
{
  "status": "success",
  "test_summary": {
    "type": "usability_test",
    "users_tested": 5,
    "date": "2024-12-14",
    "duration": "45 minutes total"
  },
  "results": {
    "completion_rate": "80%",
    "avg_time": "8 seconds",
    "sus_score": 72,
    "user_satisfaction": "4.2/5"
  },
  "key_findings": [
    {
      "issue": "Users confused by template dropdown",
      "severity": "moderate",
      "users_affected": 3,
      "evidence": "Hesitated 5+ seconds, said 'not sure what to pick'"
    },
    {
      "issue": "Success message not clear",
      "severity": "low",
      "users_affected": 2,
      "evidence": "Asked 'did it work?'"
    }
  ],
  "positive_feedback": [
    "Much faster than current process",
    "Templates are helpful",
    "Clean and simple interface"
  ],
  "improvement_suggestions": [
    "Add template preview on hover",
    "Show success confirmation clearly",
    "Save last-used template as default"
  ],
  "decision": {
    "verdict": "iterate",
    "reasoning": "80% success rate is good but template confusion needs fix",
    "next_actions": [
      "Add template previews",
      "Improve success feedback",
      "Test again with 3 users"
    ]
  },
  "next_step": "Implement improvements and /dt loop back to testing"
}
```

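A report in this shape can be sanity-checked before it is passed on. The sketch below is illustrative only: the required keys are inferred from the example above, and the allowed severity values (`critical`, `moderate`, `low`) are an assumption, since the skill does not define a formal schema.

```python
import json

# Inferred from the example report; not a formal schema.
REQUIRED_KEYS = {
    "status", "test_summary", "results", "key_findings",
    "positive_feedback", "improvement_suggestions", "decision", "next_step",
}
VERDICTS = {"ship", "iterate", "pivot"}
SEVERITIES = {"critical", "moderate", "low"}  # assumed vocabulary


def validate_report(raw: str) -> list[str]:
    """Return a list of problems found in a JSON test report (empty if none)."""
    report = json.loads(raw)
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(report))]
    verdict = report.get("decision", {}).get("verdict")
    if verdict not in VERDICTS:
        problems.append(f"unexpected verdict: {verdict!r}")
    for finding in report.get("key_findings", []):
        if finding.get("severity") not in SEVERITIES:
            problems.append(f"unexpected severity: {finding.get('severity')!r}")
    return problems


print(validate_report('{"status": "success", "decision": {"verdict": "iterate"}}'))
```
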
## Quality Gates
- [ ] Tested with 5+ users per segment
- [ ] Clear success/failure criteria defined
- [ ] Quantitative data collected (time, completion, SUS)
- [ ] Qualitative feedback captured (quotes, observations)
- [ ] Decision made (ship/iterate/pivot)
- [ ] Next actions defined

## Token Budget
- Max input: 800 tokens
- Max output: 1800 tokens

## Model
- Recommended: sonnet (pattern analysis)

## Philosophy
> "In God we trust. All others must bring data."
> Test with users, not assumptions.

**Keep it simple:**
- 5 users find about 85% of issues
- Watch what they do, not just what they say
- Quantify where possible
- Act on feedback fast
- Test early, test often