diff --git a/skills/design-thinking/test.md b/skills/design-thinking/test.md
new file mode 100644
index 0000000..c692c57
--- /dev/null
+++ b/skills/design-thinking/test.md
@@ -0,0 +1,228 @@
+# Skill: Design Thinking - Test
+
+## Description
+Validate prototypes with real users to gather feedback, measure success, and decide next steps.
+
+## Input
+- **prototype**: Prototype to test (required)
+- **test_type**: usability|ab_test|interview|analytics (optional, default: usability)
+- **user_segments**: Which users to test with (required)
+- **success_criteria**: What defines success (required)
+
+## Testing Frameworks
+
+### 1. Usability Testing
+
+**5-User Rule:**
+Testing with 5 users typically uncovers about 85% of usability issues.
+
+**Test Script Template:**
+```
+INTRO (2 mins)
+- Thanks for participating
+- We're testing the product, not you
+- Please think aloud as you go
+- There are no wrong answers
+
+CONTEXT (1 min)
+- Scenario: "You need to create a new campaign..."
+- Goal: "Complete the task as you normally would"
+
+TASKS (10-15 mins)
+Task 1: [Specific scenario]
+- Observe: Where do they pause?
+- Note: What do they say?
+- Ask: "What are you thinking?"
+
+Task 2: [Next scenario]
+- Same observation process
+
+DEBRIEF (5 mins)
+- What was easy?
+- What was confusing?
+- What would you change?
+- Would you use this?
+```
+
+**What to Observe:**
+- Time to complete task
+- Number of errors/retries
+- Hesitation points
+- Verbal confusion
+- Emotional reactions
+- Workarounds attempted
+
+### 2. Feedback Collection Templates
+
+**Quick Rating Scale:**
+```
+After each task:
+"How easy was that? (1-5)"
+1 = Very difficult
+2 = Difficult
+3 = Okay
+4 = Easy
+5 = Very easy
+```
+
+**System Usability Scale (SUS):**
+```
+Rate 1-5 (1=Strongly Disagree, 5=Strongly Agree):
+1. I would use this frequently
+2. I found it unnecessarily complex
+3. I found it easy to use
+4. I would need support to use this
+5. Features were well integrated
+6. There was too much inconsistency
+7. Most people would learn quickly
+8. I found it cumbersome
+9. I felt confident using it
+10. I needed to learn a lot first
+
+Score (0-100): ((sum of odd items - 5) + (25 - sum of even items)) * 2.5
+> 68 = Above average
+< 68 = Below average
+```
+
+**Post-Task Interview:**
+```
+1. What did you like most?
+2. What frustrated you?
+3. What would you change?
+4. Would you use this over your current solution?
+5. What's missing?
+```
+
+### 3. A/B Testing Patterns
+
+**Simple A/B Test:**
+```json
+{
+  "test_name": "Campaign Creation Flow",
+  "hypothesis": "1-step flow will increase completion by 30%",
+  "variant_a": {
+    "name": "Control (3-step)",
+    "users": 50,
+    "metric": "completion_rate"
+  },
+  "variant_b": {
+    "name": "Treatment (1-step)",
+    "users": 50,
+    "metric": "completion_rate"
+  },
+  "duration": "7 days",
+  "success_threshold": "30% improvement"
+}
+```
+
+**Metrics to Track:**
+- Completion rate
+- Time to complete
+- Error rate
+- Drop-off points
+- User satisfaction
+
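+**Scoring Sketch (illustrative):**
+A minimal sketch of how the SUS formula from section 2 and the A/B completion-rate comparison above could be computed. It is not part of this skill's interface; the function names and the numbers in the usage example are hypothetical.
+
+```python
+from math import sqrt
+
+
+def sus_score(responses: list[int]) -> float:
+    """Apply the SUS formula: ((sum of odd items - 5) + (25 - sum of even items)) * 2.5."""
+    if len(responses) != 10:
+        raise ValueError("SUS needs exactly 10 responses (1-5 each)")
+    odd = sum(responses[0::2])   # items 1, 3, 5, 7, 9 (positively worded)
+    even = sum(responses[1::2])  # items 2, 4, 6, 8, 10 (negatively worded)
+    return ((odd - 5) + (25 - even)) * 2.5
+
+
+def completion_lift(completed_a: int, users_a: int, completed_b: int, users_b: int) -> dict:
+    """Compare A/B completion rates; report relative lift and a two-proportion z-score."""
+    rate_a = completed_a / users_a
+    rate_b = completed_b / users_b
+    pooled = (completed_a + completed_b) / (users_a + users_b)
+    se = sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
+    return {
+        "rate_a": rate_a,
+        "rate_b": rate_b,
+        "relative_lift": (rate_b - rate_a) / rate_a,
+        "z_score": (rate_b - rate_a) / se,  # |z| > 1.96 roughly means p < 0.05
+    }
+
+
+# Hypothetical numbers, matching the 50-user variants above:
+print(sus_score([5, 2, 4, 2, 4, 1, 5, 2, 4, 2]))                     # 82.5
+print(completion_lift(completed_a=30, users_a=50, completed_b=42, users_b=50))
+```
+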
+### 4. Iteration Decision Criteria
+
+**Go/No-Go Framework:**
+```
+SHIP IT if:
+- [ ] 80%+ of users complete the task successfully
+- [ ] Average task time meets target
+- [ ] SUS score > 68
+- [ ] No critical usability issues
+- [ ] Users prefer it to the current solution
+
+ITERATE if:
+- [ ] 50-80% success rate
+- [ ] Task time around 2x target
+- [ ] SUS score 50-68
+- [ ] 2+ moderate issues
+- [ ] Mixed user preference
+
+PIVOT if:
+- [ ] <50% success rate
+- [ ] Task time >3x target
+- [ ] SUS score <50
+- [ ] Critical blocking issues
+- [ ] Users prefer the current solution
+```
+
+A minimal scripted sketch of these thresholds appears at the end of this file.
+
+## Output Format
+```json
+{
+  "status": "success",
+  "test_summary": {
+    "type": "usability_test",
+    "users_tested": 5,
+    "date": "2024-12-14",
+    "duration": "45 minutes total"
+  },
+  "results": {
+    "completion_rate": "80%",
+    "avg_time": "8 seconds",
+    "sus_score": 72,
+    "user_satisfaction": "4.2/5"
+  },
+  "key_findings": [
+    {
+      "issue": "Users confused by template dropdown",
+      "severity": "moderate",
+      "users_affected": 3,
+      "evidence": "Hesitated 5+ seconds, said 'not sure what to pick'"
+    },
+    {
+      "issue": "Success message not clear",
+      "severity": "low",
+      "users_affected": 2,
+      "evidence": "Asked 'did it work?'"
+    }
+  ],
+  "positive_feedback": [
+    "Much faster than current process",
+    "Templates are helpful",
+    "Clean and simple interface"
+  ],
+  "improvement_suggestions": [
+    "Add template preview on hover",
+    "Show success confirmation clearly",
+    "Save last-used template as default"
+  ],
+  "decision": {
+    "verdict": "iterate",
+    "reasoning": "80% success rate is good, but template confusion needs a fix",
+    "next_actions": [
+      "Add template previews",
+      "Improve success feedback",
+      "Test again with 3 users"
+    ]
+  },
+  "next_step": "Implement improvements and loop back to testing via /dt"
+}
+```
+
+## Quality Gates
+- [ ] Tested with 5+ users per segment
+- [ ] Clear success/failure criteria defined
+- [ ] Quantitative data collected (time, completion, SUS)
+- [ ] Qualitative feedback captured (quotes, observations)
+- [ ] Decision made (ship/iterate/pivot)
+- [ ] Next actions defined
+
+## Token Budget
+- Max input: 800 tokens
+- Max output: 1800 tokens
+
+## Model
+- Recommended: sonnet (pattern analysis)
+
+## Philosophy
+> "In God we trust. All others must bring data."
+> Test with users, not assumptions.
+
+**Keep it simple:**
+- 5 users find 85% of issues
+- Watch what they do, not what they say
+- Quantify where possible
+- Act on feedback fast
+- Test early, test often
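+
+## Appendix: Go/No-Go Sketch
+The snippet below is a minimal, illustrative sketch of the Go/No-Go thresholds from section 4, applied to figures like those in the `results` block above. It is not part of this skill's interface; the function name, parameters, and usage numbers are hypothetical, and it covers only the quantitative checks (qualitative items such as mixed user preference still need human judgment).
+
+```python
+def go_no_go(success_rate: float, time_vs_target: float, sus: float,
+             critical_issues: int) -> str:
+    """Return 'ship', 'iterate', or 'pivot' using the section 4 thresholds."""
+    ship = (success_rate >= 0.8 and time_vs_target <= 1.0
+            and sus > 68 and critical_issues == 0)
+    pivot = (success_rate < 0.5 or time_vs_target > 3.0
+             or sus < 50 or critical_issues > 0)
+    if ship:
+        return "ship"
+    if pivot:
+        return "pivot"
+    return "iterate"
+
+
+# Hypothetical numbers loosely mirroring the example output above:
+print(go_no_go(success_rate=0.80, time_vs_target=1.5, sus=72, critical_issues=0))  # iterate
+```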