After a ~10-minute read, you will be able to decide whether to install agent-skills, know where to wire it in, and start using every skill immediately.
Addy Osmani, engineering lead at Google Chrome, open-sourced agent-skills on February 15, 2026.
The repo passed 39K+ stars in under three months, with a reported daily gain of 1K+ stars.
The bet is that agent reliability comes from the harness around the model, not from a smarter model.
What’s different is that this is not another prompt library. It is a three-layer architecture (skill, persona, command), with anti-rationalization tables in 20 of 22 skills, a parallel fan-out command, and three hook systems that give the pack real enforcement weight on Claude Code.
It does not make the model smarter. It makes skipping specs, tests, reviews, and security checks harder.
The single most screenshot-able thing in the repo is the table used across most lifecycle skills that names the agent’s shortcut excuses and pairs each with a counter-argument:
“I’ll write tests after the code works” → “You won’t. And tests written after the fact test implementation, not behavior.”
“It’s just a prototype” → “Prototypes become production code. Tests from day one prevent the test-debt crisis.”
“I tested it manually” → “Manual testing doesn’t persist. Tomorrow’s change might break it with no way to know.”
That table sits inside test-driven-development/SKILL.md, and most lifecycle skills ship one of their own. It is the structural move that most clearly separates agent-skills from other skills repos.
Osmani, who also wrote Learning JavaScript Design Patterns, summarized the motivation in a May 3 research post:
“A senior engineer’s job is mostly the parts that don’t show up in the diff.”
The repo was created on February 15, 2026 and has shipped 170+ commits from 20+ contributors since. The latest release is v0.6.0 on April 28. Growth went from 27K+ stars on May 4 to 39K+ on May 11.
If you’ve been tracking the skills space, the question isn’t “what is this.” It’s what this does that obra/superpowers and anthropics/skills don’t.
On raw star count, the comparison cuts the other way: obra/superpowers is bigger, and anthropics/skills is the official spec. The unique contribution of agent-skills is structural: 20 anti-rationalization tables, a parallel fan-out command, and an enforcement hook layer.
The architecture’s governing rule is short and load-bearing. The user, or a slash command on the user’s behalf, is the orchestrator. Personas do not invoke other personas. Skills are mandatory hops inside a persona’s workflow.
This rule was formalized in the v0.6.0 release notes after a stretch of issues where contributors tried to write “meta-orchestrator” personas that routed work to other personas. The repo names this an anti-pattern and rejects it on two grounds. It loses information through paraphrasing hops, and on Claude Code it is impossible by platform constraint anyway, since subagents cannot spawn other subagents.
The only multi-persona pattern the repo endorses is parallel fan-out, used by /ship. More on that below.
If you only need setup, skip ahead to the “How to get started” section below.
Inside the repo. Skills sit in skills/<name>/SKILL.md. Personas live in agents/ as plain Markdown files. Slash commands ship twice: .claude/commands/*.md for Claude Code and .gemini/commands/*.toml for Gemini CLI. Hooks are bash scripts in hooks/. Four reference checklists (testing, security, performance, accessibility) sit in references/, separate from any skill so they load on demand.
The SKILL.md format. Every skill file follows the same anatomy:
YAML frontmatter (name + description). Only these two fields are loaded at session start, so the agent can scan all 22 skills cheaply. Full file content loads only when the skill is invoked. This is the progressive-disclosure mechanic that keeps the pack under context budget even with all skills installed.
Overview and When to Use. Two short blocks. The first says what the skill does. The second lists the exact triggers that should activate it (for example, “implementing any new logic,” “fixing a bug”). The agent matches the current task against the triggers.
Process. The numbered step-by-step workflow. The largest section in every file, and the part the agent actually executes. Steps include inline code examples, decision flowcharts, and templates the agent fills in.
Common Rationalizations. A two-column table of agent excuses paired with counter-arguments. The agent has to read its own most-likely shortcut before it can take it. 20 of the 22 skills ship one of these.
Red Flags. A bullet list of observable signs the skill is being violated. The agent self-monitors against the list. The reviewer in /review also checks against it.
Verification. A checklist of exit criteria the agent must satisfy before marking work done. Every checkbox is evidence-based: tests passing, build output, runtime data, screenshot. The repo’s rule: “Seems right is never sufficient.”
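The progressive-disclosure split described above can be sketched in a few lines of Python. This is a minimal illustration, not the harness’s actual loader: `SkillStub`, `parse_frontmatter`, `index_skills`, and `load_skill` are hypothetical names, and the parser handles only flat `key: value` frontmatter rather than full YAML.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


@dataclass
class SkillStub:
    """What a harness keeps per skill at session start (hypothetical type)."""
    name: str
    description: str
    path: Path


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Parse a flat `---`-delimited block of `key: value` lines (not full YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' delimiter")
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            # Split on the first colon only, so descriptions may contain colons.
            fields[key.strip()] = value.strip().strip('"')
    raise ValueError("missing closing '---' delimiter")


def index_skills(skills_dir: Path) -> List[SkillStub]:
    """Session start: keep only name + description for each skills/<name>/SKILL.md."""
    stubs = []
    for skill_file in sorted(skills_dir.glob("*/SKILL.md")):
        meta = parse_frontmatter(skill_file.read_text(encoding="utf-8"))
        stubs.append(SkillStub(meta["name"], meta["description"], skill_file))
    return stubs


def load_skill(stub: SkillStub) -> str:
    """Invocation: load the full file, from Overview through Verification."""
    return stub.path.read_text(encoding="utf-8")
```

In this sketch, only the stubs’ names and descriptions would go into context at session start; `load_skill` runs only when a skill is actually invoked.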
How a skill activates. Two paths. The agent auto-activates a skill by matching the current task against each skill’s frontmatter name and description, the only fields loaded at session start; the full When to Use triggers arrive once the skill loads. The meta-skill using-agent-skills holds the routing flowchart that does this matching, and the session-start hook injects it into every Claude Code session. The second path is explicit: the user invokes a skill via a slash command (/spec, /test, /review, etc.). Either path loads the full SKILL.md into context.
Length and structure rules. Every SKILL.md stays under 500 lines. Reference material that would push it over lives in references/, loaded only when a skill needs it. The longest skill file in the repo is ci-cd-and-automation at 390 lines. Verified from the local clone at SHA 3ff4b518: 22 of 22 skill files have valid YAML frontmatter, 21 of 22 have a ## Verification block, 20 of 22 have a ## Common Rationalizations table.
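The 22/21/20 tally above can be reproduced on your own clone with a short audit. This is a minimal sketch, assuming the skills/<name>/SKILL.md layout and exact `## Verification` and `## Common Rationalizations` headings. The frontmatter check only confirms a `---`-delimited block with `name` and `description` keys, which is weaker than full YAML validation, and the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, List, Set

MAX_LINES = 500  # the repo's stated ceiling for a SKILL.md


def frontmatter_keys(lines: List[str]) -> Set[str]:
    """Return the keys of a `---`-delimited block, or an empty set if absent."""
    if not lines or lines[0].strip() != "---":
        return set()
    keys: Set[str] = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        key, sep, _ = line.partition(":")
        if sep:
            keys.add(key.strip())
    return set()  # an unterminated block counts as missing


def audit_skill(text: str) -> Dict[str, bool]:
    """Check one SKILL.md against the structural rules described above."""
    lines = text.splitlines()
    return {
        "frontmatter": {"name", "description"} <= frontmatter_keys(lines),
        "verification": re.search(r"^## Verification\b", text, re.M) is not None,
        "rationalizations": re.search(r"^## Common Rationalizations\b", text, re.M) is not None,
        "under_limit": len(lines) < MAX_LINES,
    }


def audit_repo(skills_dir: Path) -> Dict[str, int]:
    """Count how many skill files pass each check."""
    totals = {"skills": 0, "frontmatter": 0, "verification": 0,
              "rationalizations": 0, "under_limit": 0}
    for skill_file in sorted(skills_dir.glob("*/SKILL.md")):
        totals["skills"] += 1
        results = audit_skill(skill_file.read_text(encoding="utf-8"))
        for check, passed in results.items():
            totals[check] += int(passed)
    return totals
```

Calling `audit_repo(Path("agent-skills/skills"))` on a clone returns per-check counts you can compare against the figures quoted for SHA 3ff4b518.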
Start here. For most teams, the first three skills to load are spec-driven-development, test-driven-development, and code-review-and-quality. They cover the highest-risk agent failure loop: unclear task, untested change, and unreviewed diff.
The full six-phase lifecycle map is near the end of this article for readers who want every skill by phase.
Osmani’s frame is short: “Agents skip those steps for the same reason any junior would. They’re invisible.” The repo makes those steps visible in four places.
Anti-rationalization tables. test-driven-development is one of 20 skills with a table that names the shortcut and rejects it before the agent can take it.
The /ship fan-out. /ship spawns code-reviewer, security-auditor, and test-engineer concurrently, then merges their reports into a GO or NO-GO decision with a rollback plan. It skips fan-out only when the change touches two files or fewer, stays under 50 lines, and avoids auth, payments, data access, and config.
Three hook systems. session-start.sh injects using-agent-skills into new Claude Code sessions, builds its JSON payload with jq (falling back to INFO when jq is missing), and passes its test script, bash hooks/session-start-test.sh. sdd-cache-{pre,post}.sh caches source docs keyed by sha256(url) and serves a cached body only after the server answers 304 Not Modified to an If-None-Match or If-Modified-Since request. simplify-ignore.sh swaps /* simplify-ignore-start: reason */ blocks for BLOCK_<hash> placeholders so simplification cannot touch them, and its test run reports 21 passed, 0 failed.
The newest skill, doubt-driven-development. Added in May 2026, it runs a fresh-context reviewer on non-trivial decisions using only the artifact plus the contract, not the original agent’s reasoning, so wrong-direction work gets caught mid-flight, before /review. Optional cross-model review through Codex CLI or Gemini CLI requires explicit per-call authorization.
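The /ship decision logic reduces to a skip predicate plus a concurrent fan-out and merge. The sketch below is illustrative Python, not the command’s implementation (which is a Markdown command file driving Claude Code subagents). `Change`, `Report`, `Decision`, `needs_fan_out`, `fan_out_review`, and the `areas` tags are hypothetical, and the `review` callable stands in for spawning a persona subagent. What happens on the skip path is not modeled.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List, Set

PERSONAS = ("code-reviewer", "security-auditor", "test-engineer")
SENSITIVE_AREAS = {"auth", "payments", "data-access", "config"}


@dataclass
class Change:
    files: List[str]
    lines_changed: int
    areas: Set[str] = field(default_factory=set)  # hypothetical tags, e.g. {"auth"}


@dataclass
class Report:
    persona: str
    blockers: List[str]


@dataclass
class Decision:
    verdict: str  # "GO" or "NO-GO"
    blockers: List[str]
    rollback_plan: str


def needs_fan_out(change: Change) -> bool:
    """Fan out unless the change is small AND narrow AND avoids sensitive areas."""
    small_and_safe = (
        len(change.files) <= 2
        and change.lines_changed < 50
        and not (change.areas & SENSITIVE_AREAS)
    )
    return not small_and_safe


async def fan_out_review(
    change: Change,
    review: Callable[[str, Change], Awaitable[Report]],
    rollback_plan: str,
) -> Decision:
    """Run the three personas concurrently, then merge into GO / NO-GO."""
    if not rollback_plan.strip():
        raise ValueError("a rollback plan is mandatory")
    # One gather call: the personas run side by side and never call each other.
    reports = await asyncio.gather(*(review(p, change) for p in PERSONAS))
    blockers = [f"{r.persona}: {b}" for r in reports for b in r.blockers]
    return Decision("NO-GO" if blockers else "GO", blockers, rollback_plan)
```

The three reviews only run side by side; none of them invokes another persona, which is the orchestration rule above in miniature.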
Claude Code (the canonical path). Marketplace install:
For teams without SSH keys on GitHub, force HTTPS (workaround for PR #108):
For local development against an in-flight skill, clone and point Claude Code at the working copy:
At runtime, the session-start.sh hook injects the using-agent-skills meta-skill into every new session, and that meta-skill routes the first message to the matching skill. The 7 slash commands add explicit lifecycle entry points on top. At the verified SHA 3ff4b518, CI on main is green for the Test Plugin Installation workflow.
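The session-start injection amounts to wrapping a Markdown file in JSON. Here is a rough Python equivalent, assuming the hook emits the `hookSpecificOutput.additionalContext` shape that Claude Code documents for SessionStart hooks. The article does not show session-start.sh’s exact output, and `build_session_start_payload` and `session_start` are hypothetical names.

```python
import json
from pathlib import Path


def build_session_start_payload(skill_text: str) -> str:
    """Wrap a skill body in a SessionStart hook payload (assumed shape).

    json.dumps performs the escaping that jq does in the shell version, so
    quotes, backslashes, and newlines in the Markdown survive intact.
    """
    payload = {
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            "additionalContext": skill_text,
        }
    }
    return json.dumps(payload)


def session_start(repo_root: Path) -> str:
    """Read the meta-skill from a checkout and return the payload for stdout."""
    skill = repo_root / "skills" / "using-agent-skills" / "SKILL.md"
    return build_session_start_payload(skill.read_text(encoding="utf-8"))
```

Building JSON with a real serializer, rather than string concatenation, is the point of the jq dependency: a stray quote in the skill text cannot corrupt the payload.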
OpenCode. No slash commands and no plugin system. The integration runs through AGENTS.md and the built-in skill tool. The repo ships .opencode/skills as a symlink to ../skills/ so OpenCode resolves the same skill set. The execution rule in AGENTS.md maps user intent (new feature triggers spec-driven-development, bug triggers debugging-and-error-recovery, code review triggers code-review-and-quality) to skills on every turn. Honest tradeoff per the repo’s own docs/opencode-setup.md: skill invocation depends on model compliance with the AGENTS.md contract, with no hook layer.
Cursor. Copy selected SKILL.md files into .cursor/rules/. Start with the two or three essential skills.
Gemini CLI. Native install: gemini skills install https://github.com/addyosmani/agent-skills.git --path skills. The repo also ships .gemini/commands/*.toml with the same 7 commands, except that /plan is renamed /planning because /plan collides with a built-in Gemini command.
Windsurf, Copilot, Kiro. Add skill content to .windsurfrules, .github/skills/, or .kiro/skills/ respectively. All three are plain-Markdown integrations with no hook layer.
Skills auto-activate by context, and the slash commands are explicit user entry points on top. Here is what each of the 22 skills does, so you can pick:
using-agent-skills: Routes incoming work to the right skill via a flowchart. Auto-loaded by the session-start hook on every Claude Code session. Defines the shared operating behaviors (surface assumptions, manage confusion, push back, enforce simplicity, scope discipline, verify don’t assume).
idea-refine: Turns vague ideas into concrete proposals through structured divergent and convergent thinking. Output is a one-page markdown spec with problem statement, recommended direction, MVP scope, and a “Not Doing” list.
spec-driven-development: Writes a PRD before code: objective, commands, project structure, code style, testing strategy, boundaries (always/ask-first/never). Four-phase gated workflow (specify, plan, tasks, implement) with human review at each gate.
incremental-implementation: Builds thin vertical slices: implement, test, verify, commit. Caps at ~100 lines of unverified code. Feature flags, safe defaults, rollback-friendly changes.
test-driven-development: Red-Green-Refactor, the test pyramid (80% unit / 15% integration / 5% E2E), DAMP over DRY in tests, the Beyonce Rule, and the Prove-It pattern for bug fixes (failing reproduction test before the fix).
context-engineering: Loads the right context at the right time. Rules files, context packing, MCP integrations.
source-driven-development: Grounds framework decisions in official documentation with required citations. Detects stack and versions, fetches the relevant docs, flags conflicts with existing code. Paired with the sdd-cache hook for cross-session HTTP caching.
doubt-driven-development: Adversarial fresh-context review on every non-trivial in-flight decision. Five-step cycle: CLAIM, EXTRACT, DOUBT, RECONCILE, STOP. Optional cross-model escalation to Codex CLI or Gemini CLI with explicit per-call authorization.
frontend-ui-engineering: Component architecture, design systems, state management, responsive design, WCAG 2.1 AA accessibility.
api-and-interface-design: Contract-first design, Hyrum’s Law, the One-Version Rule, error semantics, boundary validation.
browser-testing-with-devtools: Chrome DevTools MCP for live runtime data. DOM inspection, console errors, network traces, performance profiling, screenshots.
debugging-and-error-recovery: Five-step triage: reproduce, localize, reduce, fix, guard. Stop-the-line rule for failing tests, safe fallbacks.
code-review-and-quality: Five-axis review (correctness, readability, architecture, security, performance), change sizing ~100 lines, Critical/Important/Suggestion severity labels.
code-simplification: Reduce complexity while preserving exact behavior. Chesterton’s Fence, the Rule of 500. Paired with the simplify-ignore hook for protected code blocks.
security-and-hardening: OWASP Top 10 prevention, auth patterns, secrets management, dependency auditing, three-tier boundary validation.
performance-optimization: Measure-first approach. Core Web Vitals targets, profiling workflows, bundle analysis, anti-pattern detection.
git-workflow-and-versioning: Trunk-based development, atomic commits, change sizing ~100 lines, the commit-as-save-point pattern.
ci-cd-and-automation: Shift Left, Faster is Safer, feature flags, quality gate pipelines, failure feedback loops.
deprecation-and-migration: Code-as-liability mindset, compulsory vs advisory deprecation, migration patterns, zombie-code removal.
documentation-and-adrs: Architecture Decision Records, API docs, inline documentation standards. Document the why, not the what.
shipping-and-launch: Pre-launch checklists, feature flag lifecycle, staged rollouts, rollback procedures, monitoring setup.
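The sdd-cache hook that pairs with source-driven-development follows standard HTTP conditional-request rules. Below is a minimal Python sketch of that revalidation logic. It assumes an in-memory dict in place of the hook’s on-disk cache and an injected `fetch` callable in place of a real HTTP client. `CachedDoc`, `cache_key`, and `get_doc` are hypothetical names, and the real hook is a pair of bash scripts.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class CachedDoc:
    body: str
    etag: Optional[str] = None
    last_modified: Optional[str] = None


# fetch(url, request_headers) -> (status, body, response_headers).
# A real implementation would wrap an HTTP client such as urllib.request.
Fetch = Callable[[str, Dict[str, str]], Tuple[int, str, Dict[str, str]]]


def cache_key(url: str) -> str:
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def get_doc(url: str, cache: Dict[str, CachedDoc], fetch: Fetch) -> str:
    key = cache_key(url)
    entry = cache.get(key)
    headers: Dict[str, str] = {}
    if entry is not None:
        # Ask the server whether our copy is still current.
        if entry.etag:
            headers["If-None-Match"] = entry.etag
        if entry.last_modified:
            headers["If-Modified-Since"] = entry.last_modified
    status, body, resp_headers = fetch(url, headers)
    if status == 304 and entry is not None:
        return entry.body  # server confirmed the cached copy; serve it
    if status == 200:
        cache[key] = CachedDoc(body, resp_headers.get("ETag"), resp_headers.get("Last-Modified"))
        return body
    raise RuntimeError(f"unexpected HTTP {status} for {url}")
```

The key property matches the hook’s description: a cached body is returned only when the server answers 304, never on a cache hit alone, so stale framework docs cannot silently ground a decision.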
The minimum viable set the community cites: spec-driven-development, test-driven-development, and code-review-and-quality. Add incremental-implementation and security-and-hardening for production work. Load others by phase as the task requires.
Opt-in scaffolding, not enforcement. The anti-rationalization tables live in the SKILL.md files, but nothing physically prevents an agent from generating code that ignores them. Adoption rests on the model honoring the contract.
Compliance-dependent on most harnesses. Only Claude Code’s plugin manifest, session-start hook, and /ship fan-out give the pack hard teeth. Cursor, Windsurf, OpenCode, and Copilot fall back to rules files the model may or may not honor.
Plugin version drift. .claude-plugin/plugin.json declares plugin version 1.0.0, while the latest GitHub release is v0.6.0. Open issue #145 and PR #155 track the mismatch.
No empirical effectiveness benchmark. The repo provides workflows and verification checklists but no controlled experiment showing that agents using these skills produce fewer bugs or higher-quality reviews than the same agents without them.
Shell hooks need review. The plugin includes shell hooks in hooks/ for session start, source-doc caching, and simplify-ignore protection. Teams should review those scripts before enabling the plugin inside production workspaces.
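Reviewing simplify-ignore.sh is easier with the placeholder technique in mind. Here is a rough Python sketch, not the hook itself. The start marker matches the one quoted earlier, but the `/* simplify-ignore-end */` end marker, the 12-character hash prefix, and the error on dropped placeholders are assumptions made for illustration. `mask_protected` and `restore_protected` are hypothetical names.

```python
import hashlib
import re
from typing import Dict, Tuple

# Start marker as quoted in the article; the end marker is an assumption.
PROTECTED = re.compile(
    r"/\* simplify-ignore-start:[^*]*\*/.*?/\* simplify-ignore-end \*/",
    re.DOTALL,
)


def mask_protected(source: str) -> Tuple[str, Dict[str, str]]:
    """Replace each protected block with a BLOCK_<hash> placeholder."""
    saved: Dict[str, str] = {}

    def swap(match) -> str:
        block = match.group(0)
        token = "BLOCK_" + hashlib.sha256(block.encode("utf-8")).hexdigest()[:12]
        saved[token] = block
        return token

    return PROTECTED.sub(swap, source), saved


def restore_protected(masked: str, saved: Dict[str, str]) -> str:
    """Put the original blocks back; fail loudly if a placeholder vanished."""
    missing = [token for token in saved if token not in masked]
    if missing:
        raise ValueError(f"placeholders dropped during simplification: {missing}")
    for token, block in saved.items():
        masked = masked.replace(token, block)
    return masked
```

The simplifier only ever sees opaque tokens, so it cannot rewrite the protected code. Failing when a token disappears is one way to keep a careless edit from silently deleting a protected block.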
So the best recommendation is to adopt on Claude Code, with one caveat: treat it as scaffolding that needs the agent’s cooperation, not a guarantee. On other harnesses, pilot the skills first and verify that the agent actually follows them.
Verdict: Production Ready for Claude Code teams, Worth Watching elsewhere. The skills do what the README claims, maintenance health is strong (170+ commits, 20+ contributors, daily PR cadence, CI green on main), and the marketplace install path is one command on Claude Code.
On Cursor, Windsurf, OpenCode, Copilot, and rules-file setups, the value depends on whether the agent actually honors the loaded instructions. Looking ahead, v0.7 is the version to watch: open PRs and issues suggest it will formalize Kiro and Codex setup docs and resolve the plugin-version mismatch. Osmani’s own line best captures the design choice:
“If you put a 2,000-word essay on testing best practices into the agent’s context, the agent reads it, generates plausible-looking text, and skips the actual testing. If you put a workflow there, the agent has something to do, and you have something to verify.”
It fits engineering teams running coding agents on production work, solo developers who want fewer agent fires by cherry-picking the minimum viable set (spec-driven-development + test-driven-development + code-review-and-quality), and platform teams designing internal agent frameworks (the three-layer model and parallel fan-out are reusable patterns).
It does not fit legacy codebases without specs or test infrastructure, teams whose primary harness is OpenCode without a strong AGENTS.md discipline, or anyone looking for a model upgrade rather than a workflow layer.
For teams already running a skills layer, the upgrade case is the three-layer model plus parallel fan-out. Neither exists in obra/superpowers or anthropics/skills.
Follow @AlphaSignalAI for more content like this, and check out our Harness Engineering workshop on May 14th.
Subscribe at AlphaSignal.ai for daily AI signals. Read by 280,000+ developers.
Q: What does agent-skills do that obra/superpowers doesn’t? A: Three things obra/superpowers does not document at the same structural depth. First, 20 of 22 skills include a Common Rationalizations table that names the excuses agents use to skip steps. Second, /ship is a parallel fan-out that runs code-reviewer, security-auditor, and test-engineer concurrently and merges their reports. Third, the repo ships three hook systems that give the pack enforcement weight on Claude Code.
Q: Which skill was added most recently, and why does it matter? A: doubt-driven-development, added in May 2026. It runs an adversarial fresh-context reviewer on every non-trivial in-flight decision using a five-step cycle (CLAIM, EXTRACT, DOUBT, RECONCILE, STOP), with optional cross-model escalation to Codex or Gemini. The point is to catch wrong-direction work mid-flight, while course correction is still cheap, not at /review time when the diff is already written.
Q: Which AI coding agents support agent-skills today? A: Claude Code (recommended path via plugin marketplace), Cursor, Gemini CLI (native skill install), Windsurf, OpenCode, GitHub Copilot, and Kiro IDE. The skills are plain Markdown and work with any agent that accepts system prompts or instruction files.
Q: What is the minimum set of skills to install first? A: The community-cited minimum is spec-driven-development, test-driven-development, and code-review-and-quality. Adding incremental-implementation and security-and-hardening covers most production-relevant workflows without saturating the context window.
Q: How does the /ship command work, and when does it skip the parallel review? A: /ship spawns three subagents in one turn: code-reviewer, security-auditor, and test-engineer. The main agent merges their reports into a GO or NO-GO decision with a mandatory rollback plan. It skips the fan-out only when the change touches two files or fewer, the diff is under 50 lines, and it does not touch auth, payments, data access, or config.