Add skill-tester skill for validation
55
skills/meta/skill-tester.md
Normal file
@@ -0,0 +1,55 @@
---
name: skill-tester
description: Use when validating a skill before deployment. Tests skill behavior with subagent simulations to find edge cases and rationalization loopholes.
category: meta
token_budget: 1500
---

# Skill: Test Skills with Subagents

## Overview
Every skill must be tested before deployment. The Iron Law: **No skill ships without a failing test first.**

## Testing Protocol

### Phase 1: Baseline (No Skill)
1. Run a subagent WITHOUT the skill loaded
2. Give it a scenario where the skill SHOULD be used
3. Document what the subagent does wrong
4. This establishes the "before" behavior

### Phase 2: Apply Skill
1. Run a subagent WITH the skill loaded
2. Give it the same scenario as in Phase 1
3. Verify that the skill corrects the behavior documented in Phase 1
4. Document the improvements
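
The two phases above can be sketched as a small comparison harness. This is only an illustration: `run_subagent` and `passes` are hypothetical callables that you would supply (this skill does not define a runner API), and the stub runner at the bottom merely stands in for a real subagent.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical runner signature: (scenario, skill_text or None) -> transcript.
Runner = Callable[[str, Optional[str]], str]


@dataclass
class Comparison:
    scenario: str
    baseline_failed: bool  # Phase 1: did the run without the skill go wrong?
    skill_passed: bool     # Phase 2: did the run with the skill go right?

    @property
    def skill_helped(self) -> bool:
        # The skill is only shown to matter if the baseline failed first.
        return self.baseline_failed and self.skill_passed


def compare_runs(
    scenario: str,
    skill_text: str,
    run_subagent: Runner,
    passes: Callable[[str], bool],
) -> Comparison:
    """Run the same scenario without and with the skill, then compare."""
    baseline = run_subagent(scenario, None)          # Phase 1: no skill loaded
    with_skill = run_subagent(scenario, skill_text)  # Phase 2: skill loaded
    return Comparison(scenario, not passes(baseline), passes(with_skill))


# Stub runner for demonstration only; a real one would launch a subagent.
def fake_runner(scenario: str, skill_text: Optional[str]) -> str:
    return "wrote test first" if skill_text else "skipped the test"


result = compare_runs(
    "Add a feature under deadline",
    "placeholder skill text",
    fake_runner,
    lambda transcript: "wrote test first" in transcript,
)
print(result.skill_helped)  # True
```

If `baseline_failed` is false, the scenario never demonstrated the problem, so a pass in Phase 2 says nothing about the skill. That is the Iron Law expressed as a check.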

### Phase 3: Edge Cases
Test these rationalization patterns:
- "I already know how to do this"
- "This situation is different"
- "It would be faster to just..."
- "The user probably meant..."

### Phase 4: Adversarial
Attempt to make the skill fail:
- Ambiguous inputs
- Conflicting requirements
- Time pressure scenarios
- Partial information
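
One way to turn the Phase 3 and Phase 4 lists into concrete test inputs is to attach each pattern to a base scenario. The sketch below crosses every rationalization with every adversarial condition. That pairing is an assumption made for illustration (the skill does not say the lists must be combined), and `build_pressure_scenarios` is a hypothetical helper.

```python
from itertools import product
from typing import List

# Phrases copied from Phase 3 and conditions paraphrased from Phase 4.
RATIONALIZATIONS = [
    "I already know how to do this",
    "This situation is different",
    "It would be faster to just...",
    "The user probably meant...",
]
ADVERSARIAL_CONDITIONS = [
    "ambiguous inputs",
    "conflicting requirements",
    "time pressure",
    "partial information",
]


def build_pressure_scenarios(base_scenario: str) -> List[str]:
    """Return one scenario prompt per (rationalization, condition) pair."""
    return [
        f"{base_scenario}\nCondition: {condition}\nTempting shortcut: '{thought}'"
        for thought, condition in product(RATIONALIZATIONS, ADVERSARIAL_CONDITIONS)
    ]


scenarios = build_pressure_scenarios("Fix the failing login test.")
print(len(scenarios))  # 16
```

Running all sixteen variants may be more than a given skill needs. Testing each list on its own (eight scenarios) is a cheaper starting point.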

## Output Format
```json
{
  "skill_name": "...",
  "test_date": "YYYY-MM-DD",
  "baseline_failures": ["..."],
  "improvements": ["..."],
  "edge_cases_passed": N,
  "edge_cases_failed": N,
  "recommendations": ["..."],
  "verdict": "ship|iterate|reject"
}
```
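
A filled-in report can be checked mechanically before it is accepted. The following sketch validates only what the template above implies: the eight keys, a year-month-day date, non-negative integer counts, and one of the three verdicts. `validate_report` is a hypothetical helper, not part of this skill, and the sample values are made up for the demonstration.

```python
import json
from datetime import datetime
from typing import List

REQUIRED_KEYS = {
    "skill_name", "test_date", "baseline_failures", "improvements",
    "edge_cases_passed", "edge_cases_failed", "recommendations", "verdict",
}
VERDICTS = {"ship", "iterate", "reject"}


def validate_report(raw: str) -> List[str]:
    """Return a list of problems; an empty list means the report is well-formed."""
    report = json.loads(raw)
    if not isinstance(report, dict):
        return ["report must be a JSON object"]
    missing = sorted(REQUIRED_KEYS - set(report))
    if missing:
        return [f"missing key: {key}" for key in missing]

    problems = []
    try:
        datetime.strptime(report["test_date"], "%Y-%m-%d")
    except (TypeError, ValueError):
        problems.append("test_date is not a YYYY-MM-DD date")
    for key in ("edge_cases_passed", "edge_cases_failed"):
        value = report[key]
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            problems.append(f"{key} must be a non-negative integer")
    verdict = report["verdict"]
    if not isinstance(verdict, str) or verdict not in VERDICTS:
        problems.append("verdict must be one of: ship, iterate, reject")
    return problems


# Made-up sample report, for demonstration only.
example = json.dumps({
    "skill_name": "skill-tester",
    "test_date": "2024-01-15",
    "baseline_failures": ["skipped the baseline run"],
    "improvements": ["ran the baseline before applying the skill"],
    "edge_cases_passed": 7,
    "edge_cases_failed": 1,
    "recommendations": ["add a time-pressure scenario"],
    "verdict": "iterate",
})
print(validate_report(example))  # []
```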

## Token Budget: 1500 tokens max