| name | description | category | token_budget |
|---|---|---|---|
| skill-tester | Use when validating a skill before deployment. Tests skill behavior with subagent simulations to find edge cases and rationalization loopholes. | meta | 1500 |
Skill: Test Skills with Subagents
Overview
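Every skill must be tested before deployment. The Iron Law: No skill ships without a failing test first. The failing test is the Phase 1 baseline, which shows what a subagent gets wrong when the skill is not loaded.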
Testing Protocol
Phase 1: Baseline (No Skill)
- Run a subagent WITHOUT the skill loaded
- Give it a scenario where the skill SHOULD be used
- Document what the subagent does wrong
- This establishes the "before" behavior
Phase 2: Apply Skill
- Run a subagent WITH the skill loaded
- Same scenario as Phase 1
- Verify the skill corrects the failures documented in Phase 1
- Document improvements
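Phases 1 and 2 amount to running one scenario twice and comparing the results. A minimal sketch in Python, assuming a hypothetical `run` callable that takes a scenario and an optional skill text and returns a transcript (the skill itself does not prescribe a harness, and the substring-based failure check is a deliberate simplification):

```python
from typing import Callable, Optional

# Hypothetical signature: (scenario, skill text or None) -> transcript of the run.
Subagent = Callable[[str, Optional[str]], str]


def compare_baseline(run: Subagent, scenario: str, skill: str,
                     failure_markers: list[str]) -> dict:
    """Run the same scenario without and with the skill, then diff the failures."""
    baseline = run(scenario, None)     # Phase 1: no skill loaded
    with_skill = run(scenario, skill)  # Phase 2: same scenario, skill loaded
    before = [m for m in failure_markers if m in baseline]
    after = [m for m in failure_markers if m in with_skill]
    return {
        "baseline_failures": before,
        "improvements": [m for m in before if m not in after],
        "still_failing": after,
    }


# Stub subagent for illustration only; a real one would call a model.
def fake_subagent(scenario: str, skill: Optional[str]) -> str:
    return "wrote tests first" if skill else "skipped tests"


result = compare_baseline(fake_subagent, "add a feature", "skill text",
                          ["skipped tests"])
print(result)
```

If `baseline_failures` comes back empty, the scenario never failed without the skill, so there is no failing test yet and the scenario needs to be made harder before continuing.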
Phase 3: Edge Cases
Test whether the subagent still follows the skill when tempted by these rationalization patterns:
- "I already know how to do this"
- "This situation is different"
- "It would be faster to just..."
- "The user probably meant..."
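The four patterns above can be looped over and tallied. The sketch below assumes a hypothetical `followed_skill` callable that runs the subagent on a prompt and returns whether it still complied; how compliance is judged is left open here, and the "Tempting thought" framing is only one possible way to inject the pattern.

```python
from typing import Callable

# The rationalization patterns listed in Phase 3.
RATIONALIZATIONS = [
    "I already know how to do this",
    "This situation is different",
    "It would be faster to just...",
    "The user probably meant...",
]


def run_edge_cases(followed_skill: Callable[[str], bool], scenario: str) -> dict:
    """Append each rationalization to the scenario and count compliance."""
    passed, failed = 0, 0
    for excuse in RATIONALIZATIONS:
        prompt = f"{scenario}\n\nTempting thought: {excuse}"
        if followed_skill(prompt):
            passed += 1
        else:
            failed += 1
    return {"edge_cases_passed": passed, "edge_cases_failed": failed}


# Stub for illustration: this fake subagent caves only to the "faster" excuse.
def stub_followed_skill(prompt: str) -> bool:
    return "faster" not in prompt


counts = run_edge_cases(stub_followed_skill, "Fix the bug")
print(counts)
```

The returned keys match the `edge_cases_passed` and `edge_cases_failed` fields of the output format, so the counts can be copied into the report directly.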
Phase 4: Adversarial
Attempt to make the skill fail:
- Ambiguous inputs
- Conflicting requirements
- Time pressure scenarios
- Partial information
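One way to cover the four adversarial categories is to derive a variant of a base scenario for each. The mutations below are illustrative assumptions, not wording taken from the skill; real adversarial scenarios would likely be written by hand per skill.

```python
from typing import Callable

# One illustrative mutation per Phase 4 category (hypothetical wording).
MUTATIONS: dict[str, Callable[[str], str]] = {
    "ambiguous_input": lambda s: f"{s} Handle it however seems right.",
    "conflicting_requirements": lambda s: (
        f"{s} Do not change any files, but make sure the fix is committed."
    ),
    "time_pressure": lambda s: f"{s} This must be done in the next two minutes.",
    # Partial information: keep only the first half of the scenario text.
    "partial_information": lambda s: s[: len(s) // 2],
}


def adversarial_scenarios(base: str) -> dict[str, str]:
    """Return one adversarial variant of the base scenario per category."""
    return {name: mutate(base) for name, mutate in MUTATIONS.items()}


variants = adversarial_scenarios("Fix the failing login test.")
for name, text in variants.items():
    print(f"{name}: {text}")
```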
Output Format
{
  "skill_name": "...",
  "test_date": "YYYY-MM-DD",
  "baseline_failures": ["..."],
  "improvements": ["..."],
  "edge_cases_passed": N,
  "edge_cases_failed": N,
  "recommendations": ["..."],
  "verdict": "ship|iterate|reject"
}
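A report in this shape can be checked mechanically before it is accepted. The following is a sketch, assuming the template's `N` stands for an integer count and `verdict` takes exactly one of the three listed values; the sample report values are made up for illustration.

```python
import json
from datetime import date

REQUIRED_KEYS = {
    "skill_name", "test_date", "baseline_failures", "improvements",
    "edge_cases_passed", "edge_cases_failed", "recommendations", "verdict",
}
VERDICTS = {"ship", "iterate", "reject"}


def validate_report(report: dict) -> list[str]:
    """Return a list of problems; an empty list means the report is well formed."""
    problems = []
    missing = REQUIRED_KEYS - report.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if report.get("verdict") not in VERDICTS:
        problems.append("verdict must be one of ship, iterate, reject")
    try:
        date.fromisoformat(str(report.get("test_date")))
    except ValueError:
        problems.append("test_date must be YYYY-MM-DD")
    for key in ("edge_cases_passed", "edge_cases_failed"):
        if not isinstance(report.get(key), int):
            problems.append(f"{key} must be an integer")
    return problems


# Illustrative report; the values are invented for this example.
report = {
    "skill_name": "skill-tester",
    "test_date": "2024-01-15",
    "baseline_failures": ["skipped tests"],
    "improvements": ["wrote tests first"],
    "edge_cases_passed": 3,
    "edge_cases_failed": 1,
    "recommendations": ["tighten wording"],
    "verdict": "iterate",
}
problems = validate_report(report)
print(json.dumps(report, indent=2))
print(problems)
```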