Workshop · 60 minutes · hands-on

Authoring agent skills: a hands-on workshop

How to write a SKILL.md that an agent actually triggers, follows, and can verify — using the htmlify repo's own skills as the worked example.

Who this is for

Developers who use Claude Code, Codex, or any agentskills.io-compatible client and want to package a repeatable workflow as a skill.

What you leave with

A drafted frontmatter description, three rules rewritten to be testable, and a validator lab you ran yourself. Bring a laptop with Node 20+ and a clone of the repo.

3 acts · 3 checkpoints · printable handout under the Guide button (G)

Cold open

You have explained this workflow to your agent fourteen times

“Remember to inline the CSS… no external fonts… check it opens standalone… oh, and add print styles again.”

Every session starts from zero. The workflow lives in your head and leaks into chat one correction at a time.

A skill is that workflow written down once — with a trigger, rules, and a way to check the result.

The agent reads it when the request matches, follows the rules, and validates its own output. No re-explaining.

Today's worked example: the htmlify and deckify skills that produced the deck you are looking at.

Orientation

Three acts, three checkpoints

Fig 1 · The hour at a glance — checkpoints in orange are yours, not mine

Act 1: what a skill file is. Act 2: rules an agent will actually follow. Act 3: making the output check itself. Each act ends with you doing the thing.

Act 1 · Anatomy

A skill is three layers in one file

Fig 2 · Anatomy of a SKILL.md · Source: zakelfassi/htmlify, skills/htmlify/SKILL.md

Act 1 · Anatomy

The description is the trigger surface

yaml · skills/htmlify/SKILL.md (frontmatter, abridged)

---
name: htmlify
description: Create self-contained HTML artifacts from agent or repo
  context, including operator briefs, build plans, implementation maps,
  PR/release packets, incident timelines, … Use when the user asks
  to turn dense text, code evidence, plans, reviews, or status into
  browser-ready HTML instead of a markdown wall.
license: Apache-2.0
metadata:
  version: "0.3.1"
  source: "https://github.com/zakelfassi/htmlify"
---

Specific verbs — “create self-contained HTML artifacts”, not “helps with HTML”.
Artifact nouns — the things users actually ask for, by name.
“Use when…” — describes the request, from the user's side of the conversation.
Versioned and licensed — metadata.version is bumped by release automation, not by hand.

Act 1 · Anatomy

SKILL.md stays short; references/ carry the depth

Fig 3 · Progressive disclosure in the htmlify skill family · Source: zakelfassi/htmlify, skills/

Every token in SKILL.md is paid on every triggered request. Reference files are paid only when the task needs them — so the skill says exactly when to load each one.

Checkpoint 1 · 7 min

Write the frontmatter description for your skill idea

Worksheet · also in the handout

name:

description: (3–5 sentences)

Scoring criteria

Specific verbs: what it produces or does, concretely.
Artifact nouns: the outputs named the way users name them.
“Use when…” clause: request shapes, quoted in user language.
Bounded: says what it is not for if a sibling skill exists.

Test: would a stranger reading nothing but your description know exactly which requests should fire it — and which should not?

Act 2 · Rules

Rules an agent will actually follow

markdown · skills/htmlify/SKILL.md (operating rules, abridged)

## Operating Rules

1. Gather evidence first. Read the repo, docs, git state, PR/CI/deploy
   state, logs, … Mark uncertain claims as `needs verification`.
2. Pick the smallest artifact mode that fits the request:
   - `operator-brief`: what happened, what is next, blockers, risks…
   - `build-plan`: problem, target shape, phases, owners, validation…
4. Keep it self-contained. Inline CSS and JS; no external fonts,
   CDNs, assets, analytics, or build step unless the user explicitly asks.
11. Validate before final response: doctype, standalone
    <html>/<body>, no missing local assets, expected sections…

Numbered, imperative, and ordered as a procedure: evidence → mode choice → constraints → validation. An agent can cite “rule 4” back to you; it cannot cite a vibe.

Act 2 · Rules

A mode menu is a decision procedure

deckify mode	Choose when the user wants…
talk-deck	a YouTube/live presentation with speaker notes and run-of-show
workshop-deck	a talk plus exercises, labs, checkpoints, and handouts — this deck
essay-deck	a presentation plus a downloadable long-form guide
demo-deck	a session centered on live demos with fallback screenshots
launch-deck	product narrative, proof, risks, roadmap, and a CTA
teaching-guide	a PDF-first guide with an optional slide mode

“Smallest mode that fits” converts a fuzzy judgment call into a lookup. The agent stops guessing scope; the user gets predictable shapes.

Act 2 · Rules

Anti-pattern: rules you cannot test

Vague vibe	Testable rule (from this repo's skills)
“Make it look professional”	Default to the Hardcopy tokens: one accent color, hairlines over shadows, border-radius ≤ 2px, serif display headings.
“Should work offline”	Inline all CSS/JS; no external fonts, CDNs, or remote assets — the validator errors on any remote reference.
“Help the presenter”	Every substantive slide contains <aside class="notes">; the validator reports each missing one by slide number.
“Keep it reasonable in size”	Warn above 512 KiB; hard error above 2 MiB.

Litmus test: could a script check it? If yes, it is a rule. If no, it is a hope. Unbounded scope (“and anything else useful”) fails the same test.

Checkpoint 2 · 7 min

Convert three vague instructions into testable rules

Rewrite each so a script — or a strict reviewer — could verify compliance

“Make the output look nice.”
“Don't make the file too big.”
“Add some interactivity if it helps.”

A good rewrite has

a named standard (a token set, a template, a contract file) instead of an adjective,
a number or enumerable condition where one exists,
a consequence: what happens, or what check fails, when it is violated.

Act 3 · Verify

A skill that can check its own output beats one that hopes

Fig 4 · The validation loop the deckify skill mandates before any final response

Act 3 · Verify

Case study: what --profile deck enforces

Contract item	Check	Severity
Standalone document	doctype, one <html>/<body>, non-empty title, viewport meta	error
Slides	≥ 2 <section class="slide"> elements	error
Keyboard nav	a script registering a keydown listener	error
Speaker notes	every substantive slide has <aside class="notes">, reported per slide	error
Script safety	inline script only; no handler attributes; no javascript: URLs	error
Self-contained	no remote assets, fonts, or stylesheet links	error
Print mode	@media print rules for the handout	warning
Size	> 512 KiB warns; > 2 MiB errors	both

The contract lives in references/deck-template.md; the code lives in src/validate.js. Doc and check ship together, in the same repo, versioned together.

Checkpoint 3 · Lab · 8 min

Break it, validate it, fix it

shell · from the repo root

node bin/htmlify-answer.js --validate \
  lab/broken-deck.html --profile deck

expected validator output

lab/broken-deck.html: INVALID — 4 errors
  [no-viewport]      missing viewport meta tag
  [no-slides]        found 1, need ≥ 2
  [no-keyboard-nav]  no keydown listener
  [missing-notes]    slide 1 ("Demo")

Steps

Copy the broken snippet from the handout (Exercise 3) into lab/broken-deck.html.
Run the validator. Read each error code.
Fix one error at a time; re-run after each fix.
Stop at: valid — 0 errors.

Finished early? Run it against this deck file and read what a passing report looks like.

Field guide

Three ways skills die

Failure	Symptom	Fix
Never triggers	Description written from the implementation's side (“parses AST…”); users' actual words appear nowhere.	Rewrite from the requester's side: verbs, artifact nouns, quoted “use when” phrases.
Triggers too much	Grabby verbs and unbounded nouns (“helps with any web content”); fires on requests it handles badly.	Narrow the nouns; name what it is not for; point sideways at sibling skills.
Monolithic	2,000-line SKILL.md taxing every invocation; agents skim and miss the rules that matter.	Keep SKILL.md to rules and pointers; move depth to references/ with load-when conditions.

htmlify and deckify dodge the second failure by pointing at each other: htmlify's rule 8 sends full presentation requests to deckify.

Logistics

Run of show

Clock	Chapter	Beat
00:00	Opening	Promise, audience, setup check (Node 20+, repo clone)
05:00	Act 1 · Anatomy	Three layers; frontmatter as trigger surface; references/
15:00	Checkpoint 1	Write a frontmatter description · debrief two aloud
22:00	Act 2 · Rules	Numbered rules; smallest mode; vague-vs-testable
32:00	Checkpoint 2	Three vague instructions → testable rules · debrief
39:00	Act 3 · Verify	Validation loop; the deck-profile case study
47:00	Checkpoint 3	Lab: validate and fix the broken snippet
55:00	Close	Failure modes recap; checklist; print the handout
60:00	End	—

Ship your skill against this checklist

Description has verbs, artifact nouns, and a “use when” clause.
license and metadata.version are set.
Rules are numbered, imperative, and ordered as a procedure.
Evidence-first is rule 1; a mode menu replaces “use judgment”.
Every rule passes the litmus test: a script could check it.
Depth lives in references/ with load-when pointers.
A validation command exists — and the skill orders the agent to run it.
CI re-validates whatever examples you commit.

Sources

github.com/zakelfassi/htmlify — the worked example
agentskills.io — the skill format
skills/htmlify/SKILL.md · skills/deckify/SKILL.md
skills/deckify/references/deck-template.md
skills/deckify/references/hardcopy.md

Press G for the handout; print it from there for the PDF.

Workshop handout · keep after the session

Authoring Agent Skills — Workshop Guide

Modeworkshop-deck

Duration60 minutes

Source repozakelfassi/htmlify

Formatagentskills.io

This is not a transcript. It is the part of the workshop you will want at your desk next week: the three act summaries, the three exercises with hints and solution sketches, the skill-authoring checklist, and references.

1.0Act 1 — Anatomy of a skill

A skill is a single SKILL.md with three layers that fail in three different ways:

Frontmatter (name, description, license, metadata.version, metadata.source). The description is the trigger surface: the harness matches incoming requests against this text alone. Write it from the requester's side of the conversation — specific verbs, artifact nouns, and a literal “Use when…” clause quoting how users actually ask.
Body: numbered operating rules, a mode menu, validation orders, and a final-response contract. This is the behavior contract once the skill is loaded.
Pointers into references/: depth that is loaded on demand. Every token in SKILL.md is paid on every triggered request; reference files are paid only when needed. Each pointer should name the file and the condition: “Load references/hardcopy.md … before styling any artifact.”

2.0Act 2 — Operating rules agents follow

Rules that survive contact with an agent share three properties:

Numbered and imperative. “1. Gather evidence first.” An agent can cite rule numbers; it cannot cite vibes. The ordering encodes the workflow: evidence → mode → constraints → validation.
Smallest mode that fits. Replace “use judgment” with an enumerated menu plus a selection rule. htmlify names ten artifact modes; deckify names six deck modes. The menu turns scope guessing into a lookup.
Testable, not vague. The litmus test: could a script check it? “Look professional” is a hope; “one accent color, hairlines over shadows, radius ≤ 2px” is a rule. Unbounded scope (“and anything else useful”) fails the same test.

3.0Act 3 — Making output verifiable

A skill that can check its own output beats one that hopes. The htmlify repo demonstrates the full pattern:

The skill orders the loop: “Before the final response, run the bundled validator … Fix every reported error before responding; report remaining warnings.”
The contract is documented next to the code: references/deck-template.md describes the DOM contract; src/validate.js enforces it; both version together.
Two enforcement points: the agent validates before responding (skill rule), and CI re-validates every committed example (repo rule). The skill can be ignored; CI cannot.
Error messages are agent UX: “Substantive slide 7 has no speaker notes” is fixable in one step; “invalid deck” is not.

4.0Exercise 1 — Write a trigger description

Exercise 1 · 7 min

Task. Pick a workflow you repeat with your agent. Write the frontmatter name and a 3–5 sentence description.

Hints. Start with the verb and the artifact (“Generate release notes…”). List the output nouns the way users say them. End with a “Use when the user asks…” clause quoting 2–3 real request phrasings. If a sibling skill exists, say what this one is not for.

Solution sketch (for a release-notes skill):

yaml · sample answer

name: release-notes
description: Generate release notes and changelogs from merged PRs
  and commit history, producing version-grouped markdown and a
  paste-ready announcement. Use when the user asks to "write release
  notes", "summarize what shipped", or turn git history into an
  announcement. Not for live deploy status — use a status skill.

5.0Exercise 2 — Vague to testable

Exercise 2 · 7 min

Task. Rewrite each instruction so a script — or a strict reviewer — could verify compliance.

“Make the output look nice.”
“Don't make the file too big.”
“Add some interactivity if it helps.”

Hints. Name a standard instead of an adjective; attach a number where one exists; state which check fails on violation.

Solution sketches (each maps to a shipped rule in this repo):

“Default to the Hardcopy design tokens: serif display headings, mono-uppercase metadata, at most one accent color, hairlines instead of shadows, border-radius ≤ 2px. No gradients, no stock imagery.”
“Keep the artifact under 512 KiB; the validator warns above that and errors above 2 MiB.”
“For deck-style artifacts, add keyboard navigation (arrows, Home/End) registered with addEventListener in one inline script; inline handler attributes and external scripts are validator errors.”

6.0Exercise 3 — Validator lab

Exercise 3 · Lab · 8 min

Task. From a clone of github.com/zakelfassi/htmlify, save this deliberately broken snippet as lab/broken-deck.html:

html · lab/broken-deck.html (broken on purpose)

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <title>Lab deck</title>
</head>
<body>
  <main class="deck-shell">
    <section class="slide active" data-title="Demo">
      <h2>One idea, no notes</h2>
      <table><tr><td>evidence</td></tr></table>
    </section>
  </main>
</body>
</html>

Run.

shell · repo root

node bin/htmlify-answer.js --validate lab/broken-deck.html --profile deck

Expect four errors, then fix one at a time, re-running after each:

no-viewport — add the viewport <meta> to <head>.
no-slides — a deck needs at least two <section class="slide"> elements; add a second slide.
no-keyboard-nav — add one inline <script> that registers a keydown listener with addEventListener and toggles the active class on arrow keys.
missing-notes — each substantive slide (200+ chars of text, or containing h2/h3/table/figure) needs <aside class="notes"> with real speaker notes.

Done when the report reads valid — 0 errors. Then run the same command against examples/deckify/workshop-deck.html to see a passing report for a full deck.

7.0Skill-authoring checklist

Frontmatter has name and a description written from the requester's side: specific verbs, artifact nouns, “use when” phrasings.
license, metadata.version, and metadata.source are set; version bumping is automated where possible.
Operating rules are numbered, imperative, and ordered as the actual workflow.
Rule 1 is evidence-first: read the source material before producing anything.
A mode menu (“smallest mode that fits”) replaces open-ended judgment.
Every rule passes the litmus test: a script could check compliance.
Output constraints are explicit: self-contained, size-bounded, format-contracted.
Depth lives in references/; each pointer names the file and the load-when condition; SKILL.md stays short.
A validation command exists, the skill orders the agent to run it before the final response, and errors must reach zero.
CI re-validates committed examples, so the contract holds even when nobody is watching.
The skill defines its final-response contract: what to report (paths, mode, validation results, gaps).
Sibling skills point at each other to resolve boundary disputes (htmlify rule 8 routes full decks to deckify).

8.0References

github.com/zakelfassi/htmlify — the worked-example repo: skills, validator, CI, and this deck
agentskills.io — the SKILL.md format and compatible clients
skills/htmlify/SKILL.md — trigger description and operating rules studied in Act 1–2
skills/deckify/SKILL.md — the skill that produced this deck
skills/deckify/references/deck-template.md — the DOM contract the deck validator enforces
skills/deckify/references/hardcopy.md — the visual identity used by this handout
bin/htmlify-answer.js · src/validate.js — the validation CLI from checkpoint 3

Generated with the deckify skill · validated with --profile deck · 0 errors