From 6715edd8447c0c093bcbdd0aeaa84744f9336b88 Mon Sep 17 00:00:00 2001 From: admin Date: Thu, 11 Dec 2025 20:24:37 +0100 Subject: [PATCH] Add skill-tester skill for validation --- skills/meta/skill-tester.md | 55 +++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) create mode 100644 skills/meta/skill-tester.md diff --git a/skills/meta/skill-tester.md b/skills/meta/skill-tester.md new file mode 100644 index 0000000..b8a2291 --- /dev/null +++ b/skills/meta/skill-tester.md @@ -0,0 +1,55 @@ +--- +name: skill-tester +description: Use when validating a skill before deployment. Tests skill behavior with subagent simulations to find edge cases and rationalization loopholes. +category: meta +token_budget: 1500 +--- + +# Skill: Test Skills with Subagents + +## Overview +Every skill must be tested before deployment. The Iron Law: **No skill ships without a failing test first.** + +## Testing Protocol + +### Phase 1: Baseline (No Skill) +1. Run a subagent WITHOUT the skill loaded +2. Give it a scenario where the skill SHOULD be used +3. Document what the subagent does wrong +4. This establishes the "before" behavior + +### Phase 2: Apply Skill +1. Run subagent WITH the skill loaded +2. Same scenario as Phase 1 +3. Verify the skill changes behavior correctly +4. Document improvements + +### Phase 3: Edge Cases +Test these rationalization patterns: +- "I already know how to do this" +- "This situation is different" +- "It would be faster to just..." +- "The user probably meant..." + +### Phase 4: Adversarial +Attempt to make the skill fail: +- Ambiguous inputs +- Conflicting requirements +- Time pressure scenarios +- Partial information + +## Output Format +```json +{ + "skill_name": "...", + "test_date": "YYYY-MM-DD", + "baseline_failures": ["..."], + "improvements": ["..."], + "edge_cases_passed": N, + "edge_cases_failed": N, + "recommendations": ["..."], + "verdict": "ship|iterate|reject" +} +``` + +## Token Budget: 1500 tokens max