56 lines
1.4 KiB
Markdown
56 lines
1.4 KiB
Markdown
---
|
|
name: skill-tester
|
|
description: Use when validating a skill before deployment. Tests skill behavior with subagent simulations to find edge cases and rationalization loopholes.
|
|
category: meta
|
|
token_budget: 1500
|
|
---
|
|
|
|
# Skill: Test Skills with Subagents
|
|
|
|
## Overview
|
|
Every skill must be tested before deployment. The Iron Law: **No skill ships without a failing test first.**
|
|
|
|
## Testing Protocol
|
|
|
|
### Phase 1: Baseline (No Skill)
|
|
1. Run a subagent WITHOUT the skill loaded
|
|
2. Give it a scenario where the skill SHOULD be used
|
|
3. Document what the subagent does wrong
|
|
4. This establishes the "before" behavior
|
|
|
|
### Phase 2: Apply Skill
|
|
1. Run subagent WITH the skill loaded
|
|
2. Same scenario as Phase 1
|
|
3. Verify the skill changes behavior correctly
|
|
4. Document improvements
|
|
|
|
### Phase 3: Edge Cases
|
|
Test these rationalization patterns:
|
|
- "I already know how to do this"
|
|
- "This situation is different"
|
|
- "It would be faster to just..."
|
|
- "The user probably meant..."
|
|
|
|
### Phase 4: Adversarial
|
|
Attempt to make the skill fail:
|
|
- Ambiguous inputs
|
|
- Conflicting requirements
|
|
- Time pressure scenarios
|
|
- Partial information
|
|
|
|
## Output Format
|
|
```json
|
|
{
|
|
"skill_name": "...",
|
|
"test_date": "YYYY-MM-DD",
|
|
"baseline_failures": ["..."],
|
|
"improvements": ["..."],
|
|
"edge_cases_passed": N,
|
|
"edge_cases_failed": N,
|
|
"recommendations": ["..."],
|
|
"verdict": "ship|iterate|reject"
|
|
}
|
|
```
|
|
|
|
## Token Budget: 1500 tokens max
|