ClosedLoop.AI

Claude Code Expert Training Textbook

Claude Code One-Day Intensive

A full-text, O’Reilly-style course reader for the condensed five-hour live training. Each live training block is expanded into a self-contained module with concepts, examples, artifacts, exercises, review checkpoints, and a Claude Code changelog accuracy pass.

Format: Five-hour live intensive

Audience: Engineering teams adopting Claude Code

Changelog check: Claude Code 2.1.176

How to Use This Book

This textbook version turns the one-day agenda and instructor presentation guide into a readable course companion. It is designed for engineers who want more than slide bullets: they need a practical operating model, concrete artifact templates, and enough explanation to apply the ideas after the live session ends.

The live course assumes mandatory setup pre-work. Participants should arrive with Claude Code working, the repository cloned, demo commands verified, editor and terminal ready, and baseline tool-permission posture understood. Live time should not be spent repairing local setup; it should be spent practicing routing, planning, investigation, review, and workflow design.

Current troubleshooting shortcut: If a participant’s environment behaves unexpectedly, use Claude Code safe mode to distinguish core behavior from local customizations such as CLAUDE.md files, plugins, skills, hooks, and MCP servers. This keeps the readiness gate from turning into a debugging workshop.

Course outcomes: By the end of the intensive, participants should have a primitive and routing board, compact execution brief, short investigation memo, evidence bundle, review and verification packet, compact workflow sketch, and one next-run improvement note.

Course Map

Modules

Current Claude Code Release Cross-Check
Operating Surface and Routing · 70 min
Planning and Context Management · 55 min
Intent Recovery and Dynamic Evidence · 60 min
Review, Test, and Verify · 60 min
Workflow Design and Minimal Improvement Loop · 45 min

The recurring critique pattern

Every module uses the same critique pattern because the course is artifact-first. Do not ask only whether an artifact looks polished. Ask whether it can serve as a downstream input.

What would you improve about this artifact? What would make it better as a downstream input? Could another operator use this without reopening the whole problem?

This pattern is intentionally simple. It works for routing boards, execution briefs, investigation memos, feedback bundles, review packets, and workflow specs. The goal is to train participants to judge artifacts by operational usefulness rather than by surface completeness.

Current Claude Code Release Cross-Check

This reader has been cross-checked against the public Claude Code changelog through the latest visible entry, version 2.1.176. The course remains accurate as a principles-first training artifact, with release-sensitive updates called out below.

Instructor note: Treat exact model names, entitlement-specific model rows, marketplace behavior, and managed-setting behavior as live product details. The durable lesson is the routing pattern: choose the smallest reliable primitive, isolate broad context, verify evidence, and preserve reusable artifacts.

What changed in the current changelog window

Changelog area	Training impact	Where to emphasize it
Model governance	`availableModels` enforcement and the managed `enforceAvailableModels` setting make model routing an administrator-controlled policy topic, not just an individual preference.	Module 1, model selection and enterprise routing boards
Model picker behavior	The `/model` picker and aliases may vary by plan, provider, and allowlist. Do not teach one hardcoded model menu as universal.	Module 1, model selection
Nested subagents	Subagents can now spawn their own subagents up to five levels deep. This expands workflow design, but the course should still teach bounded delegation and explicit stop conditions.	Modules 1 and 5
Plugin and skill management	Plugin-listing, marketplace browsing, bundled-skill controls, and skill hot-reload improvements reinforce teaching skills and plugins as governed reusable assets.	Module 1 and Appendix
Troubleshooting mode	`--safe-mode` and `CLAUDE_CODE_SAFE_MODE` can start Claude Code with customizations disabled. This is useful when teaching setup checks and plugin/skill debugging.	Pre-work gate and Appendix
Working directory mobility	`/cd` allows a session to move working directories without breaking prompt cache. This improves multi-repo and worktree workflows.	Module 1 and Module 5
Usage attribution	Recent usage reporting breaks down cache misses, long context, subagents, skills, agents, plugins, and MCP usage. This makes cost and context hygiene more measurable.	Modules 2 and 5 scorecards

Accuracy adjustments applied

Model names are now described as release- and entitlement-sensitive instead of fixed universal tiers.
Subagent guidance now explicitly allows nested subagent workflows while warning against uncontrolled recursion.
Pre-work troubleshooting now includes safe mode as the fastest way to separate Claude Code core behavior from local customizations.
The workflow scorecard now includes a release-aware cost and usage review prompt.
The appendix now includes a short update checklist for future changelog refreshes.

Keep this current: Before each live delivery, skim the top of the Claude Code changelog for model, managed-setting, permission, plugin, skill, and background-agent changes. Update examples only where the product surface changed; do not rewrite the core operating model unless the workflow primitive itself changed.

Module 1 · Block 1 · 70 minutes

Operating Surface and Routing

Claude Code becomes useful at team scale when engineers stop treating it as a single chat box and start treating it as an operating surface: a collection of tools, commands, skills, agents, subagents, plugins, execution modes, and model choices. This module teaches the routing judgment that keeps work small, explicit, and repeatable.

Module outcome: Participants produce a primitive and routing board that classifies common engineering tasks by the right Claude Code primitive, execution mode, and model choice.

Why routing is the first skill

Most failed AI-assisted engineering sessions do not fail because the model is incapable. They fail because the work is routed to the wrong primitive. A developer asks for a broad implementation when the real need is investigation. A team writes a permanent rule into a one-off prompt. A repeated checklist stays trapped in someone’s memory instead of becoming an executable skill. A heavyweight model is used for cheap file discovery, while a complex design decision is handed to a fast model with insufficient reasoning depth.

The first day of a Claude Code operating model therefore begins with vocabulary. Not vocabulary for its own sake, but vocabulary that gives the team a shared way to decide where work should live. Once the team can say “this is a command,” “this is a skill,” “this belongs in memory,” “this should be a subagent,” or “this needs a headless run,” the tool stops being mysterious and starts becoming an engineering system.

The operating surface

Think of Claude Code as a workbench with several surfaces. The interactive session is where a human and model negotiate a task. Tools are the model’s hands: reading files, running commands, searching repositories, editing code, and interacting with configured integrations. Commands are named entry points that standardize common activities. Skills package reusable procedures. Agents and subagents isolate specialized work. Plugins and marketplace content extend what is available across projects or organizations. Headless execution turns the same operating model into automation.

The practical question is not “which feature is coolest?” The practical question is: where should this work be represented so that another engineer can reuse it without rediscovering it?

Primitive	Best use	Team-scale signal	Common mistake
Prompt	One-off instruction, exploration, or clarification	Useful when the work is new, ambiguous, or conversational	Using prompts repeatedly for stable procedures
Command	A named local action or workflow entry point	Useful when the same operation should start the same way every time	Packing too much reasoning policy into a command name
Skill	A reusable procedure, checklist, transformation, or review pattern	Useful when a workflow should be invoked on demand and updated centrally	Putting always-needed facts in a skill instead of memory or project instructions
Agent	A specialized role with its own instructions and tool boundaries	Useful when a class of work needs a consistent expert lens	Creating broad agents with vague responsibilities
Subagent	Isolated investigation or parallel work unit	Useful when exploration would pollute the main context window	Letting broad file reads accumulate in the main conversation
Plugin / marketplace package	Reusable capability distributed beyond one repo	Useful when teams need a shared extension point	Packaging before the workflow has stabilized
Headless run	Non-interactive execution in CI, scripts, or automation	Useful when the work has clear inputs, outputs, and stop conditions	Automating work that still requires human judgment

Interactive versus headless work

Interactive mode is for discovery, negotiation, and judgment. A human can interrupt, correct assumptions, ask for alternatives, and decide whether the model’s next step is safe. Headless mode is for work that has already been bounded. It needs clear inputs, allowed tools, expected outputs, and stop conditions. If the task still requires a human to decide what the task is, it is not ready for headless execution.

A good rule is this: interactive sessions produce artifacts; headless sessions consume artifacts. During a live session you might create an execution brief, investigation memo, or verification packet. Once those artifacts are stable, a headless run can implement a bounded change, run a review, or generate a report from known inputs.

Checkpoint: Before automating a Claude Code workflow, ask whether another operator could execute it from the artifact alone. If the answer is no, the workflow is still too implicit.
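
The readiness test can be sketched as a small check over the four things a headless run needs. This is a minimal illustration, not a Claude Code schema: `HeadlessSpec` and its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HeadlessSpec:
    """Hypothetical description of a workflow proposed for headless execution."""
    inputs: list[str] = field(default_factory=list)
    allowed_tools: list[str] = field(default_factory=list)
    expected_outputs: list[str] = field(default_factory=list)
    stop_conditions: list[str] = field(default_factory=list)


def missing_for_headless(spec: HeadlessSpec) -> list[str]:
    """Return the names of required sections that are still empty."""
    required = {
        "inputs": spec.inputs,
        "allowed_tools": spec.allowed_tools,
        "expected_outputs": spec.expected_outputs,
        "stop_conditions": spec.stop_conditions,
    }
    return [name for name, values in required.items() if not values]


# A draft that names its inputs and outputs but not its tools or stop conditions.
draft = HeadlessSpec(inputs=["execution brief"], expected_outputs=["review report"])
print(missing_for_headless(draft))
```

An empty result does not prove the workflow is safe to automate; it only shows that nothing obviously required is still implicit.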

Goal mode and debate mode

Two conversational modes matter for the one-day course. Goal mode is useful when the team wants the model to drive toward a defined outcome. Debate mode is useful when the team wants the model to challenge a plan before code is written. Goal mode helps with forward motion; debate mode helps with error prevention. Both are most useful when paired with artifacts.

For example, a developer might ask Claude to draft a compact execution brief in goal mode. Once the brief exists, the developer can switch into a debate posture: “Challenge this brief. Identify hidden assumptions, underspecified acceptance criteria, and likely regression risks.” The output of the debate should not be a wandering conversation. It should be a better brief.

Model selection for the job

Model routing is part of work routing. Cheap, fast models are appropriate for bounded lookup, file discovery, and summarizing known material. Balanced models are appropriate for routine implementation and review. The strongest reasoning models are appropriate for architecture, tricky debugging, multi-file refactors, and decisions where a bad plan is more expensive than slower planning. Exact model names and picker rows are release-, plan-, provider-, and allowlist-sensitive; teach the routing principle, then have participants verify the current choices with `/model`, `/status`, and any managed `availableModels` policy in their environment.

Job	Preferred routing	Why
Find relevant files	Explore subagent on a fast model	Broad search stays out of the main context window
Design a refactor	Strong reasoning model for planning, then balanced model for execution	The plan is the expensive part to get wrong
Apply a known checklist	Skill or command	The procedure should be stable and repeatable
Review a security-sensitive change	Specialized review agent with high effort	The lens and depth matter more than speed
Generate a one-time explanation	Interactive prompt	The work is conversational and may not need persistence
Run a recurring report	Headless workflow over a stable spec	The inputs, outputs, and schedule are known

Release-aware model governance: Recent Claude Code releases tightened `availableModels` enforcement and added managed enforcement controls. For enterprise training, the routing board should include the organization’s approved model set and a note that `/fast`, aliases, subagent model overrides, advisor models, and background agents must stay inside the allowlist.
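
The routing principle and the allowlist constraint can be combined in a few lines. The sketch below is illustrative only: the job categories, tier names, and model identifiers are placeholders, and it does not read any real Claude Code setting.

```python
# Hypothetical tiers and job categories for illustration only.
# Real model names and aliases vary by release, plan, provider, and allowlist.
JOB_TO_TIER = {
    "find_relevant_files": "fast",
    "routine_implementation": "balanced",
    "design_refactor": "strong",
}


def route_model(job: str, tier_to_model: dict[str, str], allowlist: set[str]) -> str:
    """Return the model configured for the job's tier, refusing anything off the allowlist."""
    tier = JOB_TO_TIER[job]  # raises KeyError for jobs the board does not cover
    model = tier_to_model[tier]
    if model not in allowlist:
        raise PermissionError(f"{model!r} is not in the approved model set")
    return model


# Placeholder identifiers, not real model names.
tiers = {"fast": "model-fast", "balanced": "model-balanced", "strong": "model-strong"}
approved = {"model-fast", "model-balanced"}
print(route_model("find_relevant_files", tiers, approved))
```

Failing loudly when a tier maps to an unapproved model mirrors the governance point: the routing board proposes, and the organization's policy decides.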

Building the primitive and routing board

The routing board is the first durable artifact of the day. It lists common team tasks and routes each one to the smallest reliable primitive. The board should be compact enough to live in a repo, onboarding guide, or team operating doc. It should not be aspirational. It should describe the actual next version of the team’s workflow.

Task: Review a PR for auth regressions
Primitive: security-reviewer agent + /code-review command
Mode: interactive for local development; headless only after the rule set is stable
Model: balanced model for normal review, stronger reasoning for high-risk auth changes
Inputs: diff, execution brief, REVIEW.md, relevant auth rules
Output: findings-first review with severity, evidence, and recommended fix
Stop condition: no important findings remain, or a human has explicitly accepted the residual risk
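
A team that keeps its routing board in a repo may want a lightweight lint for new rows. The following is a sketch under assumptions: `RoutingRow` and the `PRIMITIVES` labels are hypothetical names that mirror the primitive table and the example row, not a format Claude Code defines.

```python
from dataclasses import dataclass

# Labels taken from the primitive table in this module.
PRIMITIVES = {"prompt", "command", "skill", "agent", "subagent", "plugin", "headless"}


@dataclass
class RoutingRow:
    task: str
    primitives: list[str]
    mode: str
    model: str
    inputs: str
    output: str
    stop_condition: str

    def problems(self) -> list[str]:
        """Return reasons the row is not yet specific and bounded."""
        issues = [f"unknown primitive: {p}" for p in self.primitives if p not in PRIMITIVES]
        if not self.primitives:
            issues.append("no primitive chosen")
        if not self.output.strip():
            issues.append("missing output artifact")
        if not self.stop_condition.strip():
            issues.append("missing stop condition")
        return issues


row = RoutingRow(
    task="Review a PR for auth regressions",
    primitives=["agent", "command"],
    mode="interactive",
    model="balanced",
    inputs="diff, execution brief, REVIEW.md, relevant auth rules",
    output="findings-first review",
    stop_condition="",
)
print(row.problems())
```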

What good looks like

A good routing board has three qualities. First, it is specific. “Use Claude for coding” is not useful; “use an explorer subagent to identify files before reading them into the main session” is useful. Second, it is bounded. Each row says what the primitive should and should not do. Third, it is teachable. A new engineer should be able to read the board and make the same routing decision as a senior engineer most of the time.

Exercise: Pick five recurring engineering activities from your team. For each, decide whether it belongs as a prompt, command, skill, agent, subagent, plugin, or headless workflow. Then add the model choice, required input artifact, expected output artifact, and stop condition.

Anti-patterns

The mega-prompt: A long prompt that mixes stable policy, one-time instructions, codebase facts, and a workflow checklist. Split it into memory, project instructions, skill, and task prompt.
The everything-agent: A custom agent named “senior engineer” that can do anything. Specialized agents should have a narrow lens, clear tools, and a predictable output shape.
Premature plugin packaging: A workflow is packaged before the team has run it enough times to know its inputs, failure modes, and stop conditions.
Headless ambiguity: A non-interactive run is launched with vague goals and no acceptance criteria. Headless work should consume a brief, not invent one.

Module recap

The foundation of effective Claude Code use is not prompt cleverness. It is routing discipline. Teams that classify work correctly can preserve knowledge, reduce repeated prompting, avoid context-window waste, and turn successful sessions into reusable operating assets.

Module 2 · Block 2 · 55 minutes

Planning and Context Management

Planning is not a ceremonial step before coding. In Claude Code, planning is the act of shaping context so the next operator—human, model, agent, or headless workflow—can act without reopening the entire problem.

Module outcome: Participants create a compact execution brief with facts, assumptions, open questions, context map, bounded work packages, and acceptance criteria.

The plan is a compression artifact

Claude Code sessions can accumulate enormous context: pasted requirements, file contents, terminal output, attempted fixes, test failures, screenshots, and corrections from the human. Without deliberate compression, the session becomes expensive and fragile. The model is forced to infer what still matters from a long transcript. Humans are forced to remember why earlier decisions were made. Downstream operators inherit noise instead of a plan.

A useful plan is not a transcript. It is a lossily compressed representation of the work. It preserves the facts, decisions, constraints, risks, and next actions that matter. It discards the conversational path that produced them. That is why the one-day course treats planning as context management.

Separate facts, assumptions, and open questions

The simplest improvement to most AI coding sessions is to stop blending known facts with guesses. Models are very good at continuing a confident narrative. If the prompt says “the auth middleware probably owns refresh-token invalidation,” the model may proceed as if that is true. A disciplined brief separates evidence from inference.

Category	Definition	Example	How to handle it
Fact	A statement backed by direct evidence	`src/auth/middleware.ts` validates JWTs before route handlers run	Can be used directly in the plan
Assumption	A plausible statement not yet proven	Refresh token invalidation is probably handled in the session store	Must be tested or called out as risk
Open question	A decision or unknown that blocks confident execution	Should expired refresh tokens be deleted or retained for audit?	Resolve before implementation or explicitly defer
Constraint	A boundary the solution must respect	Do not change public API response shape	Use as acceptance criteria and review rule

The compact execution brief

The execution brief is the central planning artifact. It should fit on one or two pages, but it should be complete enough for another operator to execute. The brief is not just a summary. It is an instruction-bearing artifact with a clear contract: here is the problem, here is the known context, here are the boundaries, here is the proposed path, and here is how we will know whether the work is done.

# Compact Execution Brief

## Goal
Implement refresh token rotation without changing the existing login response contract.

## Known facts
- JWT validation happens in src/auth/middleware.ts.
- Session persistence is implemented in src/auth/session-store.ts.
- Existing tests cover login success and expired access tokens.

## Assumptions to verify
- Old refresh tokens are not currently invalidated after rotation.
- Token reuse should be treated as suspicious but should not immediately lock the account.

## Open questions
- Should reuse detection emit an audit event?
- Is token family tracking already present in the database schema?

## Context map
- Auth middleware: request validation and session lookup
- Session store: token persistence and expiration
- Test suite: integration tests under tests/auth/

## Work packages
1. Verify current token rotation behavior.
2. Add invalidation logic or token-family tracking.
3. Extend integration tests for old-token reuse.
4. Produce review and verification packet.

## Acceptance criteria
- New refresh token is issued on rotation.
- Previous refresh token cannot be reused.
- Existing login response shape is unchanged.
- Tests demonstrate success, expiration, and reuse behavior.
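
The handling rules for assumptions and open questions can be expressed as a pre-execution gate. This sketch assumes a hypothetical in-memory representation of the brief; the class and field names are illustrative, and the brief itself would normally stay a readable document.

```python
from dataclasses import dataclass, field


@dataclass
class Assumption:
    text: str
    verified: bool = False
    accepted_as_risk: bool = False


@dataclass
class OpenQuestion:
    text: str
    resolved: bool = False
    deferred: bool = False


@dataclass
class Brief:
    goal: str
    facts: list[str] = field(default_factory=list)
    assumptions: list[Assumption] = field(default_factory=list)
    open_questions: list[OpenQuestion] = field(default_factory=list)


def blockers(brief: Brief) -> list[str]:
    """List assumptions and questions that have been neither settled nor explicitly set aside."""
    items = [
        f"assumption: {a.text}"
        for a in brief.assumptions
        if not (a.verified or a.accepted_as_risk)
    ]
    items += [
        f"question: {q.text}"
        for q in brief.open_questions
        if not (q.resolved or q.deferred)
    ]
    return items


brief = Brief(
    goal="Implement refresh token rotation without changing the login response contract.",
    assumptions=[Assumption("Old refresh tokens are not invalidated after rotation", verified=True)],
    open_questions=[OpenQuestion("Should reuse detection emit an audit event?")],
)
print(blockers(brief))
```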

Context maps

A context map tells the model where to look and why. It is not a full dump of file contents. It is a pointer layer: directories, files, functions, commands, external systems, and documents that are likely relevant. Good context maps reduce token use because Claude can read the right files in the right order instead of scanning the repository blindly.

A context map should include both primary and secondary context. Primary context is required to make the change. Secondary context helps review risk, verify behavior, or understand why the system is shaped the way it is.

## Context map
Primary:
- src/auth/middleware.ts — request authentication boundary
- src/auth/session-store.ts — refresh token persistence
- db/schema.sql — session and token tables
- tests/auth/refresh-token.test.ts — integration behavior

Secondary:
- docs/security/auth-model.md — intended auth posture
- .claude/rules/api-security.md — project-specific security rules
- recent PRs touching auth middleware — intent and regression context
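
Because a context map is a pointer layer, stale pointers are worth catching before a session starts. The sketch below parses a map written in the layout shown above and reports path-like pointers that do not exist. The parsing rules and the "no spaces means it is a path" heuristic are assumptions made for illustration.

```python
from pathlib import Path

# The dash separator between pointer and purpose, as used in the map above.
SEPARATOR = " \u2014 "


def parse_context_map(text: str) -> list[tuple[str, str, str]]:
    """Return (tier, pointer, purpose) for each bullet in the map."""
    entries = []
    tier = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith(":") and not line.startswith("-"):
            tier = line[:-1].lower()  # e.g. "Primary:" becomes "primary"
        elif line.startswith("- ") and SEPARATOR in line:
            pointer, purpose = line[2:].split(SEPARATOR, 1)
            entries.append((tier, pointer.strip(), purpose.strip()))
    return entries


def missing_paths(entries: list[tuple[str, str, str]], repo_root: str) -> list[str]:
    """List pointers that look like file paths but do not exist under repo_root."""
    root = Path(repo_root)
    return [
        pointer
        for _, pointer, _ in entries
        if " " not in pointer and not (root / pointer).exists()
    ]
```

Pointers such as "recent PRs touching auth middleware" contain spaces, so the heuristic treats them as descriptions rather than paths and leaves them for a human to check.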

Debate review before coding

Before code is written, ask Claude to attack the plan. The goal is not to win the debate; the goal is to improve the artifact. A debate review should look for ambiguous goals, missing constraints, unsupported assumptions, hidden coupling, risky files, weak acceptance criteria, and likely regression paths.

Review this execution brief as a skeptical senior engineer. Identify unsupported assumptions, missing context, and acceptance criteria that would fail to catch a regression. Do not implement. Return a revised brief outline and a list of questions that must be answered before coding.

The output of debate review should be folded back into the brief. If the debate produces useful insights that remain trapped in conversation history, the next operator still cannot use them. Artifact-first planning means the artifact is the durable memory.

What makes a plan reusable downstream?

A reusable plan has clear boundaries. It names the exact goal, the non-goals, the files likely involved, the evidence already gathered, the assumptions still open, and the stop condition. It also includes enough review criteria to prevent the model from declaring success too early.

Goal: What outcome should exist after the work is complete?
Non-goals: What tempting adjacent work should not be done?
Facts: What has been directly observed?
Assumptions: What might be true, but needs verification?
Context map: Where should the next operator look first?
Work packages: What are the smallest implementation units?
Acceptance criteria: What evidence will prove the work is complete?
Review focus: What risks should review emphasize?

Exercise: Take a messy intake request from your team and compress it into an execution brief. Then ask another participant whether they could execute it without reopening the original discussion. Any question they ask is either an open question or a missing fact.

Module recap

Planning in Claude Code is not about slowing down. It is about preserving momentum by producing a compact artifact that can survive compaction, handoff, review, and automation. The better the brief, the less the model has to infer and the easier it is for humans to hold the work accountable.

Module 3 · Block 3 · 60 minutes

Intent Recovery and Dynamic Evidence

When a codebase is old enough, the current code is rarely the whole story. This module teaches participants to recover intent from Git history, issues, PRs, logs, tests, command traces, screenshots, and other dynamic sources before asking Claude to change behavior.

Module outcome: Participants create a short investigation memo and feedback bundle that distinguish evidence from inference and make the next model turn more accurate.

Find the why before the what

Claude can usually explain what code does from the current files. The harder question is why it does that. Was a strange branch added for a customer-specific edge case? Did a test encode a production incident? Was a confusing abstraction introduced to support a migration that has since finished? Current code often hides the reason for its own shape.

Intent recovery is the discipline of gathering enough historical and runtime evidence to avoid undoing deliberate behavior. It is especially important when the requested change appears simple. Simple changes are dangerous when they cut across hidden intent.

Static code is not enough

Reading the current file gives one kind of evidence. Git history gives another. Tests reveal expected behavior. Issues and PRs reveal tradeoffs. Logs reveal runtime reality. Screenshots reveal UI states that code alone may not make obvious. CLI traces reveal exact failure modes. Documentation and MCP-backed systems can provide external context that is not stored in the repository.

Evidence source	Question it answers	Risk if omitted
Current code	What does the system do now?	The model may miss hidden coupling outside the local file
Git blame and commits	Why was this line introduced or changed?	The model may remove a deliberate workaround
PR discussion	What tradeoffs were accepted?	The model may re-litigate settled decisions
Issues / tickets	What user or incident motivated the behavior?	The model may solve the wrong problem
Tests	What behavior is currently protected?	The model may pass local reasoning but break expected behavior
Logs and traces	What happens in real executions?	The model may optimize for imagined behavior
Screenshots	What does the user actually see?	The model may miss visual or state-machine issues
Docs and runbooks	What standards should govern the change?	The model may violate team conventions

Evidence versus inference

A strong investigation memo labels evidence. It does not say “the bug is caused by stale cache” unless there is direct evidence. It says “the failure appears after the cache read path; logs show cache hit with outdated value; no write-through event appears in the trace; inference: stale cache is likely.” This distinction matters because the next model turn will use the memo as context. If guesses are written like facts, the model will build on them.

# Investigation Memo

## Question
Why does the account summary show stale plan data after billing upgrade?

## Evidence gathered
- UI reproduces after upgrade from Basic to Pro without page refresh.
- Network trace shows GET /account returns plan=Basic immediately after upgrade.
- Server logs show billing webhook processed successfully.
- Git history shows account summary cache added in PR #184 to reduce billing API calls.

## Inferences
- The account summary endpoint likely reads a cached account projection.
- The billing webhook may update billing state without invalidating account summary cache.

## Open questions
- Where is account summary cache invalidated?
- Is delayed consistency acceptable for this UI?

## Recommended next step
Trace account summary cache writes and invalidation paths before changing UI code.
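
The evidence and inference labels can also be checked mechanically. In this sketch, which uses hypothetical names throughout, every evidence claim must say where it was observed, and every inference must point at one or more evidence claims by index.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    kind: str  # "evidence" or "inference"
    source: str = ""  # where an evidence claim was observed
    supports: list[int] = field(default_factory=list)  # for inferences: indexes of backing evidence


def memo_problems(claims: list[Claim]) -> list[str]:
    """Flag evidence with no source and inferences with no backing evidence."""
    problems = []
    for i, claim in enumerate(claims):
        if claim.kind == "evidence" and not claim.source:
            problems.append(f"claim {i}: evidence without a source")
        elif claim.kind == "inference":
            backed = [
                j for j in claim.supports
                if 0 <= j < len(claims) and claims[j].kind == "evidence"
            ]
            if not backed:
                problems.append(f"claim {i}: inference with no supporting evidence")
    return problems


memo = [
    Claim("GET /account returns plan=Basic immediately after upgrade", "evidence", source="network trace"),
    Claim("The endpoint likely reads a cached account projection", "inference", supports=[0]),
    Claim("The webhook does not invalidate the cache", "inference"),
]
print(memo_problems(memo))
```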

Dynamic evidence sources

Dynamic evidence is information that changes with execution: logs, test runs, database state, browser behavior, CLI output, screenshots, observability traces, and external tool responses. It is powerful because it grounds the model in reality. It is also noisy. A good operator does not paste raw dynamic output indiscriminately. They capture the relevant excerpt, label how it was produced, and explain why it matters.

When collecting dynamic evidence, include command provenance. The model should know not only the output, but the command, environment, timestamp or branch, and whether the result was reproducible.

Command: pnpm test tests/auth/refresh-token.test.ts --runInBand
Branch: refresh-token-rotation
Result: failed, 1 test
Relevant output:
  expected old refresh token reuse to return 401
  received 200
Interpretation:
  Existing implementation issues a new refresh token but does not invalidate the previous token.
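
Provenance is easier to keep when it is captured at the moment the command runs. The helper below is a minimal sketch using only the Python standard library; the function name, the record keys, and the choice to keep the last few output lines are illustrative assumptions. The branch is passed in by the caller rather than detected.

```python
import shlex
import subprocess
import sys
from datetime import datetime, timezone


def run_with_provenance(command: list[str], branch: str = "unknown", max_lines: int = 20) -> dict:
    """Run a command and return its output together with how it was produced."""
    completed = subprocess.run(command, capture_output=True, text=True)
    output = (completed.stdout + completed.stderr).splitlines()
    return {
        "command": shlex.join(command),
        "branch": branch,
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "exit_code": completed.returncode,
        "result": "passed" if completed.returncode == 0 else "failed",
        "relevant_output": output[-max_lines:],  # keep the tail, not the full log
    }


# Stand-in for a failing test run: prints one line and exits with status 1.
record = run_with_provenance(
    [sys.executable, "-c", "print('expected 401, received 200'); raise SystemExit(1)"],
    branch="refresh-token-rotation",
)
print(record["result"], record["relevant_output"])
```

The interpretation line is deliberately left out of the record: the command can capture what happened, but the operator still has to say why it matters.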

Why “that didn’t work” fails as feedback

The phrase “that didn’t work” is almost useless to the model. It omits what was attempted, what was expected, what actually happened, what evidence was observed, and what changed between attempts. Good feedback is a bundle: action, expectation, observation, evidence, hypothesis, and next constraint.

Weak feedback	Better feedback
That did not work.	After applying the patch, `pnpm test tests/auth/refresh-token.test.ts` still fails. Expected old token reuse to return 401; actual response is 200. Relevant log shows session lookup succeeds for the old token. Focus next on invalidation in session-store, not middleware.
The UI is broken.	Clicking Save leaves the modal open. Browser console shows no error. Network tab shows PATCH /settings returns 204. The likely issue is local modal state not closing after success.
Try again.	Revise the approach without changing the public API response shape. Preserve existing success tests and add one regression test for duplicate submission.

The feedback bundle pattern

A feedback bundle is a structured correction that improves the next model turn. It should be short, but it must contain enough evidence for the model to update its plan. Use it whenever Claude’s first implementation fails, when a test result contradicts an assumption, or when a human reviewer spots a gap.

# Feedback Bundle

## Attempted change
Added token rotation logic in src/auth/middleware.ts.

## Expected result
Old refresh token reuse should return 401.

## Actual result
Old refresh token reuse returns 200.

## Evidence
Test: pnpm test tests/auth/refresh-token.test.ts
Failure: expected 401, received 200
Trace: old token still resolves in session-store lookup.

## Updated hypothesis
Middleware is not the right layer for invalidation. The session store accepts both token records.

## Next instruction
Inspect session-store token persistence and invalidation. Do not change response shape.
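
Teams that want the bundle to come out the same way every time can render it from structured fields. This is a sketch with hypothetical names; it reproduces the section headings of the template above and reports which sections are still empty.

```python
from dataclasses import dataclass


@dataclass
class FeedbackBundle:
    attempted_change: str
    expected_result: str
    actual_result: str
    evidence: list[str]
    updated_hypothesis: str
    next_instruction: str

    def sections(self) -> list[tuple[str, str]]:
        return [
            ("Attempted change", self.attempted_change),
            ("Expected result", self.expected_result),
            ("Actual result", self.actual_result),
            ("Evidence", "\n".join(self.evidence)),
            ("Updated hypothesis", self.updated_hypothesis),
            ("Next instruction", self.next_instruction),
        ]

    def gaps(self) -> list[str]:
        """Names of sections that are still empty."""
        return [title for title, body in self.sections() if not body.strip()]

    def render(self) -> str:
        lines = ["# Feedback Bundle"]
        for title, body in self.sections():
            lines += ["", f"## {title}", body]
        return "\n".join(lines)


bundle = FeedbackBundle(
    attempted_change="Added token rotation logic in src/auth/middleware.ts.",
    expected_result="Old refresh token reuse should return 401.",
    actual_result="Old refresh token reuse returns 200.",
    evidence=["Failure: expected 401, received 200"],
    updated_hypothesis="",
    next_instruction="Inspect session-store token persistence and invalidation.",
)
print(bundle.gaps())
```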

Module recap

Intent recovery prevents well-intentioned regressions. Dynamic evidence prevents hallucinated debugging. Feedback bundles convert failure into useful context. Together, they turn Claude Code from a code generator into an evidence-driven collaborator.

Exercise: Choose a bug or confusing behavior from the demo app. Produce a one-page investigation memo with at least three evidence sources and a feedback bundle that would help Claude recover from a failed first attempt.

Module 4 · Block 4 · 60 minutes

Review, Test, and Verify

The work is not done when code changes. It is done when the diff has been reviewed against the brief, the verification evidence is explicit, and the residual risk is clear enough for a human to accept or reject.

Module outcome: Participants produce a review and verification packet that makes findings, evidence, scope drift, regression risk, and PR-readiness explicit.

Review the diff against the brief

A Claude-assisted review should not ask “does this code look good?” That question is too broad and too subjective. The stronger question is: “does this diff satisfy the execution brief without violating constraints or introducing unacceptable risk?” The brief becomes the review contract.

Findings-first review means the reviewer leads with issues, not narrative. Each finding should state severity, evidence, affected file or behavior, why it matters, and a recommended fix. If there are no blocking findings, the review should still describe what was checked and what residual risk remains.

# Review Finding

Severity: Important
Area: Refresh token invalidation
Evidence: tests/auth/refresh-token.test.ts covers successful rotation but not reuse of the old token.
Why it matters: The acceptance criteria require old refresh tokens to be rejected.
Recommended fix: Add a regression test that attempts reuse of the previous token after rotation and expects 401.
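
Findings-first ordering can be sketched as a sort plus a header that still reports what was checked when nothing is found. The severity scale here is an assumption; only "Important" appears in the example finding, and the other labels are placeholders.

```python
from dataclasses import dataclass

# Hypothetical severity scale; the example finding above uses "Important".
SEVERITY_RANK = {"Blocking": 0, "Important": 1, "Minor": 2}


@dataclass
class Finding:
    severity: str
    area: str
    evidence: str
    why_it_matters: str
    recommended_fix: str


def order_findings(findings: list[Finding]) -> list[Finding]:
    """Most severe findings first; ties keep their original order."""
    return sorted(findings, key=lambda f: SEVERITY_RANK[f.severity])


def review_header(findings: list[Finding], checked: list[str]) -> str:
    """Lead with findings; with none, still say what was checked."""
    if findings:
        top = order_findings(findings)[0]
        return f"{len(findings)} finding(s); most severe: {top.severity} in {top.area}"
    return "No findings. Checked: " + ", ".join(checked)


findings = [
    Finding("Minor", "Naming", "helper name is unclear", "readability", "rename helper"),
    Finding("Important", "Refresh token invalidation", "no reuse test", "acceptance criteria", "add regression test"),
]
print(review_header(findings, checked=["diff", "tests"]))
```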

Scope drift

Scope drift is any change that is not required by the brief. Some drift is harmless cleanup. Some drift is dangerous because it changes behavior the team did not intend to change. Claude can drift when it sees adjacent improvements, especially if the prompt rewards broad helpfulness. Review must therefore compare the diff against explicit goals and non-goals.

Drift type	Example	Review response
Benign cleanup	Renaming a local variable for clarity	Accept if low risk and local
Adjacent refactor	Changing session-store interfaces while adding one token behavior	Challenge unless required by the brief
Behavior expansion	Adding account lockout on token reuse when not requested	Reject or move to follow-up
Contract change	Changing login response shape while implementing rotation	Block
Test-only expansion	Adding regression tests for directly related edge cases	Usually accept
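
A first-pass drift check can compare the files a diff touched against the paths the brief put in scope. The sketch below assumes the brief's scope is available as glob patterns, which is a convention invented for this example. It catches file-level drift only; behavior expansion inside an in-scope file still needs human review.

```python
from fnmatch import fnmatchcase


def find_scope_drift(changed_files: list[str], in_scope_patterns: list[str]) -> list[str]:
    """Return changed files that match none of the brief's in-scope patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in in_scope_patterns)
    ]


changed = [
    "src/auth/session-store.ts",
    "tests/auth/refresh-token.test.ts",
    "src/billing/invoice.ts",
]
scope = ["src/auth/*", "tests/auth/*"]
print(find_scope_drift(changed, scope))
```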

Testing is one gate, not the only gate

Passing tests are necessary evidence, but they are not proof of correctness. Tests only cover what they assert. A verification packet should include test results, manual checks when relevant, static review, diff review, command outputs, and residual risk. The point is not to create paperwork. The point is to prevent the phrase “tests pass” from hiding an unreviewed assumption.

For LLM-assisted work, verification should also include provenance: what files changed, what commands were run, what evidence was observed, and what the model did not check. This gives the human reviewer a clear map of confidence and uncertainty.

The verification packet

# Verification Packet

## Brief alignment
Goal: Refresh token rotation rejects old token reuse.
Status: Implemented and tested.

## Changed files
- src/auth/session-store.ts — invalidates previous refresh token on rotation
- tests/auth/refresh-token.test.ts — adds old-token reuse regression test

## Evidence
- pnpm test tests/auth/refresh-token.test.ts — passed
- pnpm test tests/auth/login.test.ts — passed
- Manual API check: old refresh token returns 401 after rotation

## Scope review
No public response contract changes observed.
No unrelated auth routes modified.

## Residual risk
Database cleanup of invalidated token records is not addressed. Existing retention behavior remains unchanged.

## PR handoff note
Reviewer should focus on token-store concurrency and whether invalidated token retention meets audit expectations.

Regression risk

Regression risk is not just the probability that something breaks. It is the product of likelihood, blast radius, and how hard the failure is to detect. A small likelihood with a huge blast radius still deserves attention. A likely bug with easy rollback may be acceptable if the release path is safe. Claude can help enumerate risks, but the team must decide what risk is acceptable.

Risk question	Why it matters
What user-visible behavior changed?	Identifies blast radius
What existing tests protect this path?	Identifies current safety net
What did we not test?	Prevents false confidence
What external systems depend on this behavior?	Finds hidden contracts
How would we detect failure in production?	Separates known risk from invisible risk
How would we roll back?	Determines operational readiness
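
One way to make the likelihood, blast radius, and detection framing concrete is to rate each factor from 1 to 5 and multiply. The scale, the field names, and the convention that a higher detection rating means the failure is harder to notice are all assumptions for illustration, not a standard the course prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskRating:
    likelihood: int            # 1 = very unlikely, 5 = very likely
    blast_radius: int          # 1 = one local path, 5 = most users or systems
    detection_difficulty: int  # 1 = caught immediately, 5 = likely to go unnoticed

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-5 scale.
        for name in ("likelihood", "blast_radius", "detection_difficulty"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    def score(self) -> int:
        """Multiply the three ratings into a single comparison score."""
        return self.likelihood * self.blast_radius * self.detection_difficulty


rare_but_wide = RiskRating(likelihood=1, blast_radius=5, detection_difficulty=4)
likely_but_contained = RiskRating(likelihood=4, blast_radius=1, detection_difficulty=1)

print(rare_but_wide.score())         # 20
print(likely_but_contained.score())  # 4
```

The numbers are only useful for comparing risks within one change. The unlikely but wide and hard-to-detect failure outranks the likely but contained one, which matches the point that a small likelihood with a huge blast radius still deserves attention.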

What makes a handoff PR-ready?

A PR-ready handoff gives the reviewer the shortest path to an informed decision. It should contain the problem statement, brief link or summary, changed files, review focus, verification evidence, known non-goals, and residual risk. The reviewer should not need to reconstruct the story from chat history.

PR-ready handoff formula

Problem → Approach → Changed files → Verification → Review focus → Residual risk. If any of those pieces is missing, the PR is not fully handoff-ready.
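
The formula can be checked mechanically before a PR is opened. The sketch below assumes the PR description uses one `## ` heading per piece of the formula, which is a formatting convention chosen here for illustration. The function name is hypothetical.

```python
REQUIRED_SECTIONS = (
    "Problem",
    "Approach",
    "Changed files",
    "Verification",
    "Review focus",
    "Residual risk",
)


def missing_sections(pr_description: str) -> list[str]:
    """Return required sections with no matching '## ' heading, in formula order."""
    headings = {
        line[3:].strip().lower()
        for line in pr_description.splitlines()
        if line.startswith("## ")
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


draft = """## Problem
Old refresh tokens are accepted after rotation.

## Approach
Invalidate the previous token on rotation.

## Changed files
- src/auth/session-store.ts

## Verification
- pnpm test tests/auth/refresh-token.test.ts
"""

print(missing_sections(draft))  # ['Review focus', 'Residual risk']
```

The check confirms that each piece is present, not that it is adequate. An empty "Residual risk" heading would pass, so the reviewer still reads the content.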

Module recap

Review and verification discipline turns model output into engineering evidence. The goal is not to make Claude “sound confident.” The goal is to make the work auditable: what changed, why it changed, how it was checked, and what remains uncertain.

Exercise: Review a provided diff or demo artifact against its execution brief. Produce up to three findings if problems exist; otherwise produce a verification packet and residual-risk note that would be acceptable in a PR description.

Module 5 · Block 5 · 45 minutes

Workflow Design and Minimal Improvement Loop

The one-day session closes by turning isolated practices into a named workflow. Participants define handoffs, gates, stop conditions, and one lightweight scorecard so the next run can improve without expanding into a full operating-system redesign.

Module outcome: Participants leave with a compact workflow spec and one credible next-run improvement checklist or metric.

From good sessions to repeatable workflows

A strong Claude Code session can still be a dead end if the team cannot repeat it. Workflow design captures the sequence that made the session successful: how the work was framed, what artifacts were produced, which agents or skills were used, where human review occurred, and what evidence counted as done. The workflow does not need to be elaborate. It needs to be named, bounded, and reusable.

The one-day course intentionally keeps this module compact. The goal is not to design a full engineering operating system. The goal is to leave with one workflow that can be tried next week and improved after one run.

A compact multi-agent workflow

Multi-agent workflow design should begin with work boundaries, not agent names. Each agent or subagent should own a distinct lens or phase. Current Claude Code releases support nested subagents, which means a delegated agent can fan out further for bounded research or review. Use that power carefully: nested delegation should have named inputs, maximum depth, output limits, and explicit stop conditions. If two agents need the same broad context and produce overlapping output, the workflow is probably not decomposed well.
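
The four limits named above (named inputs, maximum depth, output limits, and explicit stop conditions) can be written down as a pre-delegation checklist. The sketch below is plain Python for planning purposes only. It does not call or describe any Claude Code API, and every class, field, and function name is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelegationLimits:
    max_depth: int         # how many levels of nested delegation are allowed
    max_output_chars: int  # upper bound on what a delegate may hand back


@dataclass(frozen=True)
class DelegationRequest:
    task: str
    inputs: tuple[str, ...]  # named artifacts the delegate may read
    depth: int               # 1 = delegated directly from the main session
    stop_condition: str


def check_delegation(request: DelegationRequest, limits: DelegationLimits) -> list[str]:
    """Return the reasons a delegation should not start; an empty list means it may."""
    problems = []
    if not request.inputs:
        problems.append("no named inputs")
    if request.depth > limits.max_depth:
        problems.append(f"depth {request.depth} exceeds maximum {limits.max_depth}")
    if not request.stop_condition.strip():
        problems.append("no explicit stop condition")
    return problems


def output_within_limit(output: str, limits: DelegationLimits) -> bool:
    """Check a delegate's returned text against the output limit."""
    return len(output) <= limits.max_output_chars


limits = DelegationLimits(max_depth=2, max_output_chars=4000)
request = DelegationRequest(
    task="Find callers of the session store",
    inputs=("investigation-memo.md",),
    depth=3,
    stop_condition="Stop after listing call sites; do not propose changes.",
)
print(check_delegation(request, limits))  # ['depth 3 exceeds maximum 2']
```

Writing the limits down before fanning out makes it easier to notice when a nested delegation is drifting toward open-ended exploration.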

# Compact Workflow Spec

Name: Evidence-first bug fix

Trigger:
A bug report has enough detail to reproduce or investigate.

Artifacts:
1. Investigation memo
2. Compact execution brief
3. Implementation diff
4. Review and verification packet

Roles:
- Explorer subagent: identify relevant files, history, and evidence sources
- Planner: compress evidence into execution brief
- Implementer: make bounded code changes from the brief
- Reviewer: compare diff against brief and produce findings

Gates:
- Do not implement until facts, assumptions, and open questions are separated.
- Do not review until acceptance criteria are explicit.
- Do not hand off until verification evidence and residual risk are written.

Stop condition:
The change is PR-ready or blocked by a named open question.

Handoffs

A handoff is where one operator’s output becomes another operator’s input. In Claude Code workflows, handoffs should be artifact-based. The explorer hands off an investigation memo. The planner hands off an execution brief. The implementer hands off a diff plus notes. The reviewer hands off findings and verification evidence. If a handoff requires the next operator to read the entire chat transcript, the handoff failed.

Multi-repo note: Claude Code now supports moving a live session to a new working directory with /cd. That makes worktree and multi-repo workflows easier, but the same rule applies: preserve the handoff as an artifact, not as implicit terminal state.

Handoff	Input	Output	Quality bar
Investigation → Planning	Evidence, traces, history, open questions	Execution brief	Facts and assumptions are separated
Planning → Implementation	Execution brief and context map	Bounded diff	Non-goals and acceptance criteria are respected
Implementation → Review	Diff and brief	Findings or approval with residual risk	Review is evidence-based
Review → Handoff	Findings, fixes, verification commands	PR-ready packet	Reviewer can decide without chat history

Gates and stop conditions

Gates prevent premature motion. Stop conditions prevent infinite motion. A gate says what must be true before the workflow can advance. A stop condition says when the workflow is complete, blocked, or unsafe to continue. Claude workflows need both because models tend to continue helping unless told what “done” means.

Good gates are observable. “Make sure the plan is good” is not a gate. “The brief includes goal, non-goals, facts, assumptions, open questions, context map, work packages, and acceptance criteria” is a gate. Good stop conditions are explicit. “Continue until fixed” is vague. “Stop when the verification packet shows the acceptance criteria pass, or when an open question blocks safe implementation” is actionable.
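
An observable gate and an explicit stop condition are both simple enough to express as checks. The sketch below assumes the brief is held as a mapping from section name to text and that acceptance criteria are recorded as pass or fail. Both representations, and the function names, are assumptions for illustration.

```python
BRIEF_SECTIONS = (
    "goal",
    "non-goals",
    "facts",
    "assumptions",
    "open questions",
    "context map",
    "work packages",
    "acceptance criteria",
)


def brief_gate(brief: dict[str, str]) -> list[str]:
    """Return the brief sections that are missing or empty; an empty list passes the gate."""
    return [name for name in BRIEF_SECTIONS if not brief.get(name, "").strip()]


def stop_decision(criteria_results: dict[str, bool], blocking_questions: list[str]) -> str:
    """Decide whether the workflow stops, and why, or continues."""
    if blocking_questions:
        return "stop: blocked by open question"
    if criteria_results and all(criteria_results.values()):
        return "stop: acceptance criteria pass"
    return "continue"


brief = {name: "filled in" for name in BRIEF_SECTIONS}
brief["non-goals"] = ""  # an omitted section fails the gate

print(brief_gate(brief))  # ['non-goals']
print(stop_decision({"old token returns 401": True}, []))  # stop: acceptance criteria pass
print(stop_decision({"old token returns 401": False}, []))  # continue
```

The gate answers "may we advance?" and the stop decision answers "are we finished or blocked?". Keeping them as separate checks mirrors the distinction between premature motion and infinite motion.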

One lightweight scorecard

The improvement loop should be small enough that the team actually uses it. Pick one scorecard that can be completed after a workflow run in five minutes. The scorecard should measure the workflow, not the model’s personality. It should ask whether artifacts were reusable, whether evidence was sufficient, whether context was controlled, whether review caught issues, and what should change next run.

# Next-run Improvement Scorecard

Workflow name:
Date:
Task:

1. Was the execution brief usable without reopening the original conversation? 0 / 1 / 2
2. Did the investigation memo separate evidence from inference? 0 / 1 / 2
3. Did the implementation stay inside the stated scope? 0 / 1 / 2
4. Did verification include more than passing tests? 0 / 1 / 2
5. Was residual risk explicit? 0 / 1 / 2
6. Did usage evidence show that long-context, subagent, skill, plugin, and MCP cost stayed free of avoidable waste? 0 / 1 / 2

One thing to keep:
One thing to change next run:
One artifact or rule to update:
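
A completed scorecard can point directly at the one thing to change. The sketch below totals the scores and picks the lowest-scoring question. It assumes every question is scored so that 2 is the best outcome, including the usage-cost question, and the short question keys are hypothetical labels chosen for illustration.

```python
def summarize_scorecard(scores: dict[str, int]) -> tuple[int, str]:
    """Return the total score and the first lowest-scoring question."""
    if not scores:
        raise ValueError("scorecard has no answers")
    for question, value in scores.items():
        if value not in (0, 1, 2):
            raise ValueError(f"{question}: score must be 0, 1, or 2, got {value}")
    total = sum(scores.values())
    weakest = min(scores, key=lambda question: scores[question])
    return total, weakest


# Assumed scores from a single workflow run; 2 is the best outcome for each question.
run_scores = {
    "brief_usable": 2,
    "evidence_separated": 1,
    "scope_respected": 2,
    "verification_beyond_tests": 0,
    "residual_risk_explicit": 2,
    "usage_cost_controlled": 1,
}

total, weakest = summarize_scorecard(run_scores)
print(total)    # 8
print(weakest)  # verification_beyond_tests
```

The weakest question is a candidate for "One thing to change next run", not an automatic answer. When two questions tie, the function returns the earlier one, so the team still chooses which change is most worth making.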

What to improve on the next run

Do not try to improve everything after the first run. Choose one improvement. Maybe the context map was too vague. Maybe the review agent needs a narrower rubric. Maybe the execution brief omitted non-goals. Maybe verification evidence was too thin. The scorecard turns that observation into a small change: update a template, add a rule, refine a skill, or adjust a gate.

Instructor note: The one-day version deliberately reduces deep observability design, full operating-system specification work, extended metric design, and multi-day capstone build-out. The right close is one workflow and one next-run improvement, not a grand transformation program.

Closing synthesis

The five modules form one loop. Route the work correctly. Compress the plan. Recover intent with evidence. Review against the brief. Preserve the workflow that worked. That loop is small enough to teach in a day and strong enough to become the foundation for team-scale Claude Code adoption.

Exercise: Name one workflow your team will run in the next week. Fill out the workflow spec, define at least three gates, and choose one scorecard question that will determine what you improve after the first run.

Appendix: Live Delivery Notes

Recommended pacing

Segment	Duration	Purpose
Course frame and operating surface	35 minutes	Establish the mental model and readiness gate
Primitives, marketplace, plugins, skills, commands, agents, and model selection	55 minutes	Build routing judgment
Planning and context-window management	55 minutes	Create compact execution briefs
Intent recovery and dynamic data sources	50 minutes	Ground model turns in evidence
Review, verification, and PR-ready evidence	55 minutes	Make correctness and residual risk explicit
Multi-agent workflow design	35 minutes	Turn practices into a named workflow
Lightweight operating review and final synthesis	15 minutes	Choose one next-run improvement

What to cut if time compresses

If time compresses, cut extended plugin packaging discussion first, then deeper model-routing nuance, then longer workflow-library discussion, and finally most of the improvement-loop discussion. Do not cut planning and context management, intent recovery, dynamic evidence quality, or review and verification discipline. Those are the load-bearing parts of the one-day version.

Deliberately reduced content

The one-day format reduces deep observability design, full operating-system specification work, extended metric design and telemetry planning, and larger capstone build-out across multiple days. Those topics belong in the longer program. The one-day intensive should close with one workable delivery workflow and one clear improvement metric or checklist item.

Before each delivery: changelog refresh checklist

Confirm the latest Claude Code version at the top of the changelog.
Check whether model names, model picker behavior, or available-model enforcement changed.
Check whether skills, plugins, marketplace browsing, commands, hooks, MCP policy, background agents, or safe-mode behavior changed.
Update examples that mention exact commands or settings; leave principles-first sections intact unless the primitive changed.

Instructor caution: Avoid presenting release-specific model names, plan entitlements, or provider-specific rows as universal. Enterprise participants may see a governed subset controlled by managed settings.

Prepared as a generated course-reader artifact from the one-day live agenda, the one-day intensive presentation guide, the uploaded ClosedLoop HTML module example, and a Claude Code changelog cross-check through version 2.1.176.