Claude Code One-Day Intensive
A full-text, O’Reilly-style course reader for the condensed five-hour live training. Each live training block is expanded into a self-contained module with concepts, examples, artifacts, exercises, review checkpoints, and a Claude Code changelog accuracy pass.
How to Use This Book
This textbook version turns the one-day agenda and instructor presentation guide into a readable course companion. It is designed for engineers who want more than slide bullets: they need a practical operating model, concrete artifact templates, and enough explanation to apply the ideas after the live session ends.
The live course assumes mandatory setup pre-work. Participants should arrive with Claude Code working, the repository cloned, demo commands verified, editor and terminal ready, and baseline tool-permission posture understood. Live time should not be spent repairing local setup; it should be spent practicing routing, planning, investigation, review, and workflow design.
Course Map
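Modules
| Module | Time | Topic | Core artifacts |
|---|---|---|---|
| 1 | 70 minutes | Operating Surface and Routing | Primitive and routing board |
| 2 | 55 minutes | Planning and Context Management | Compact execution brief, context map |
| 3 | 60 minutes | Intent Recovery and Dynamic Evidence | Investigation memo, feedback bundle |
| 4 | 60 minutes | Review, Test, and Verify | Review findings, verification packet |
| 5 | 45 minutes | Workflow Design and Minimal Improvement Loop | Compact workflow spec, scorecard |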
The recurring critique pattern
Every module uses the same critique pattern because the course is artifact-first. Do not ask only whether an artifact looks polished. Ask whether it can serve as a downstream input.
What would you improve about this artifact? What would make it better as a downstream input? Could another operator use this without reopening the whole problem?
This pattern is intentionally simple. It works for routing boards, execution briefs, investigation memos, feedback bundles, review packets, and workflow specs. The goal is to train participants to judge artifacts by operational usefulness rather than by surface completeness.
Current Claude Code Release Cross-Check
This reader has been cross-checked against the public Claude Code changelog through the latest visible entry, version 2.1.176. The course remains accurate as a principles-first training artifact, with release-sensitive updates called out below.
What changed in the current changelog window
| Changelog area | Training impact | Where to emphasize it |
|---|---|---|
| Model governance | availableModels enforcement and the managed enforceAvailableModels setting make model routing an administrator-controlled policy topic, not just an individual preference. | Module 1, model selection and enterprise routing boards |
| Model picker behavior | The /model picker and aliases may vary by plan, provider, and allowlist. Do not teach one hardcoded model menu as universal. | Module 1, model selection |
| Nested subagents | Subagents can now spawn their own subagents up to five levels deep. This expands workflow design, but the course should still teach bounded delegation and explicit stop conditions. | Modules 1 and 5 |
| Plugin and skill management | Plugin-listing, marketplace browsing, bundled-skill controls, and skill hot-reload improvements reinforce teaching skills and plugins as governed reusable assets. | Module 1 and Appendix |
| Troubleshooting mode | --safe-mode and CLAUDE_CODE_SAFE_MODE can start Claude Code with customizations disabled. This is useful when teaching setup checks and plugin/skill debugging. | Pre-work gate and Appendix |
| Working directory mobility | /cd allows a session to move working directories without breaking prompt cache. This improves multi-repo and worktree workflows. | Module 1 and Module 5 |
| Usage attribution | Recent usage reporting breaks down cache misses, long context, subagents, skills, agents, plugins, and MCP usage. This makes cost and context hygiene more measurable. | Modules 2 and 5 scorecards |
Accuracy adjustments applied
- Model names are now described as release- and entitlement-sensitive instead of fixed universal tiers.
- Subagent guidance now explicitly allows nested subagent workflows while warning against uncontrolled recursion.
- Pre-work troubleshooting now includes safe mode as the fastest way to separate Claude Code core behavior from local customizations.
- The workflow scorecard now includes a release-aware cost and usage review prompt.
- The appendix now includes a short update checklist for future changelog refreshes.
Module 1 · Block 1 · 70 minutes
Operating Surface and Routing
Claude Code becomes useful at team scale when engineers stop treating it as a single chat box and start treating it as an operating surface: a collection of tools, commands, skills, agents, subagents, plugins, execution modes, and model choices. This module teaches the routing judgment that keeps work small, explicit, and repeatable.
Why routing is the first skill
Most failed AI-assisted engineering sessions do not fail because the model is incapable. They fail because the work is routed to the wrong primitive. A developer asks for a broad implementation when the real need is investigation. A team writes a permanent rule into a one-off prompt. A repeated checklist stays trapped in someone’s memory instead of becoming an executable skill. A heavyweight model is used for cheap file discovery, while a complex design decision is handed to a fast model with insufficient reasoning depth.
The first day of a Claude Code operating model therefore begins with vocabulary. Not vocabulary for its own sake, but vocabulary that gives the team a shared way to decide where work should live. Once the team can say “this is a command,” “this is a skill,” “this belongs in memory,” “this should be a subagent,” or “this needs a headless run,” the tool stops being mysterious and starts becoming an engineering system.
The operating surface
Think of Claude Code as a workbench with several surfaces. The interactive session is where a human and model negotiate a task. Tools are the model’s hands: reading files, running commands, searching repositories, editing code, and interacting with configured integrations. Commands are named entry points that standardize common activities. Skills package reusable procedures. Agents and subagents isolate specialized work. Plugins and marketplace content extend what is available across projects or organizations. Headless execution turns the same operating model into automation.
The practical question is not “which feature is coolest?” The practical question is: where should this work be represented so that another engineer can reuse it without rediscovering it?
| Primitive | Best use | Team-scale signal | Common mistake |
|---|---|---|---|
| Prompt | One-off instruction, exploration, or clarification | Useful when the work is new, ambiguous, or conversational | Using prompts repeatedly for stable procedures |
| Command | A named local action or workflow entry point | Useful when the same operation should start the same way every time | Packing too much reasoning policy into a command name |
| Skill | A reusable procedure, checklist, transformation, or review pattern | Useful when a workflow should be invoked on demand and updated centrally | Putting always-needed facts in a skill instead of memory or project instructions |
| Agent | A specialized role with its own instructions and tool boundaries | Useful when a class of work needs a consistent expert lens | Creating broad agents with vague responsibilities |
| Subagent | Isolated investigation or parallel work unit | Useful when exploration would pollute the main context window | Letting broad file reads accumulate in the main conversation |
| Plugin / marketplace package | Reusable capability distributed beyond one repo | Useful when teams need a shared extension point | Packaging before the workflow has stabilized |
| Headless run | Non-interactive execution in CI, scripts, or automation | Useful when the work has clear inputs, outputs, and stop conditions | Automating work that still requires human judgment |
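The table can also be read as a decision procedure that tries the smallest primitive first. The TypeScript sketch below encodes one possible reading of it. The `TaskTraits` fields, the `choosePrimitive` helper, and the order of the checks are all hypothetical illustrations, not a Claude Code API.

```typescript
// Hypothetical routing helper: trait names and check order are illustrative,
// not part of Claude Code itself.
type Primitive =
  | "prompt"
  | "command"
  | "skill"
  | "agent"
  | "subagent"
  | "plugin"
  | "headless";

interface TaskTraits {
  repeated: boolean;           // Will the team do this again?
  stableProcedure: boolean;    // Are the steps known and settled?
  multiStep: boolean;          // Is it a checklist or procedure, not one action?
  needsExpertLens: boolean;    // Does this class of work need a consistent role?
  broadExploration: boolean;   // Would the work flood the main context window?
  sharedAcrossRepos: boolean;  // Must other repos or teams reuse it?
  needsHumanJudgment: boolean; // Must a human still decide what the task is?
  hasStopCondition: boolean;   // Are inputs, outputs, and stop conditions known?
}

// Returns the smallest primitive that fits the traits.
function choosePrimitive(t: TaskTraits): Primitive {
  // Broad search belongs in an isolated context, whatever else is true.
  if (t.broadExploration) return "subagent";
  // One-off or still-changing work is not ready to be codified.
  if (!t.repeated || !t.stableProcedure) return "prompt";
  // Only stable workflows are worth packaging for other repos.
  if (t.sharedAcrossRepos) return "plugin";
  if (t.needsExpertLens) return "agent";
  // Bounded work with no open judgment calls can run non-interactively.
  if (!t.needsHumanJudgment && t.hasStopCondition) return "headless";
  return t.multiStep ? "skill" : "command";
}

// Usage: a stable, repeated review checklist that still needs a human.
const reviewChecklist: TaskTraits = {
  repeated: true,
  stableProcedure: true,
  multiStep: true,
  needsExpertLens: false,
  broadExploration: false,
  sharedAcrossRepos: false,
  needsHumanJudgment: true,
  hasStopCondition: true,
};
console.log(choosePrimitive(reviewChecklist)); // "skill"
```

Checking stability before the plugin branch mirrors the "premature plugin packaging" anti-pattern: an unstable workflow falls back to a prompt no matter how widely it might be shared.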
Interactive versus headless work
Interactive mode is for discovery, negotiation, and judgment. A human can interrupt, correct assumptions, ask for alternatives, and decide whether the model’s next step is safe. Headless mode is for work that has already been bounded. It needs clear inputs, allowed tools, expected outputs, and stop conditions. If the task still requires a human to decide what the task is, it is not ready for headless execution.
A good rule is this: interactive sessions produce artifacts; headless sessions consume artifacts. During a live session you might create an execution brief, investigation memo, or verification packet. Once those artifacts are stable, a headless run can implement a bounded change, run a review, or generate a report from known inputs.
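One way to make "not ready for headless execution" checkable is a readiness gate that refuses a run until every required field is present. This is a minimal sketch under stated assumptions: the `HeadlessSpec` shape, the `headlessBlockers` function, and the tool names in the example are hypothetical, not Claude Code configuration.

```typescript
// Hypothetical shape for a headless run request; field names are illustrative.
interface HeadlessSpec {
  goal: string;
  inputs: string[];          // artifacts the run consumes, e.g. an execution brief
  allowedTools: string[];    // tool allowlist for the run
  expectedOutputs: string[]; // what the run must produce
  stopConditions: string[];  // when the run must halt
  openQuestions: string[];   // decisions that still need a human
}

// Returns the reasons a spec is not ready; an empty array means "ready".
function headlessBlockers(spec: HeadlessSpec): string[] {
  const blockers: string[] = [];
  if (spec.goal.trim() === "") blockers.push("missing goal");
  if (spec.inputs.length === 0) blockers.push("no input artifacts");
  if (spec.allowedTools.length === 0) blockers.push("no allowed tools");
  if (spec.expectedOutputs.length === 0) blockers.push("no expected outputs");
  if (spec.stopConditions.length === 0) blockers.push("no stop conditions");
  // Any open question means a human still has to decide what the task is.
  if (spec.openQuestions.length > 0) {
    blockers.push(`unresolved: ${spec.openQuestions.join("; ")}`);
  }
  return blockers;
}

// Usage: a draft that is almost ready but still contains a human decision.
const draft: HeadlessSpec = {
  goal: "Generate the weekly auth test report",
  inputs: ["execution-brief.md"],
  allowedTools: ["read-files", "run-tests"], // placeholder tool names
  expectedOutputs: ["report.md"],
  stopConditions: ["every listed test has run once"],
  openQuestions: ["Which branch is the baseline?"],
};
const blockers = headlessBlockers(draft);
// blockers is ["unresolved: Which branch is the baseline?"]
```

The gate consumes an artifact produced interactively; resolving the open question in an interactive session, then clearing it from the spec, is what makes the run headless-ready.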
Goal mode and debate mode
Two conversational modes matter for the one-day course. Goal mode is useful when the team wants the model to drive toward a defined outcome. Debate mode is useful when the team wants the model to challenge a plan before code is written. Goal mode helps with forward motion; debate mode helps with error prevention. Both are most useful when paired with artifacts.
For example, a developer might ask Claude to draft a compact execution brief in goal mode. Once the brief exists, the developer can switch into a debate posture: “Challenge this brief. Identify hidden assumptions, underspecified acceptance criteria, and likely regression risks.” The output of the debate should not be a wandering conversation. It should be a better brief.
Model selection for the job
Model routing is part of work routing. Cheap, fast models are appropriate for bounded lookup, file discovery, and summarizing known material. Balanced models are appropriate for routine implementation and review. The strongest reasoning models are appropriate for architecture, tricky debugging, multi-file refactors, and decisions where a bad plan is more expensive than slower planning. Exact model names and picker rows are release-, plan-, provider-, and allowlist-sensitive; teach the routing principle, then have participants verify the current choices with /model, /status, and any managed availableModels policy in their environment.
| Job | Preferred routing | Why |
|---|---|---|
| Find relevant files | Explore subagent on a fast model | Broad search stays out of the main context window |
| Design a refactor | Strong reasoning model for planning, then balanced model for execution | The plan is the expensive part to get wrong |
| Apply a known checklist | Skill or command | The procedure should be stable and repeatable |
| Review a security-sensitive change | Specialized review agent with high effort | The lens and depth matter more than speed |
| Generate a one-time explanation | Interactive prompt | The work is conversational and may not need persistence |
| Run a recurring report | Headless workflow over a stable spec | The inputs, outputs, and schedule are known |
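Model routing interacts with any managed allowlist: the preferred tier is only a starting point. The sketch below uses abstract tiers rather than real model names, because names and aliases vary by release, plan, and provider. The `Tier` and `Job` types, the `routeModel` function, and the fallback order (stronger first, then weaker) are assumptions for illustration, not Claude Code behavior.

```typescript
// Illustrative tiers only: real model names and aliases vary by release,
// plan, provider, and allowlist, so check /model and /status locally.
type Tier = "fast" | "balanced" | "strong";

type Job =
  | "find-files"
  | "plan-refactor"
  | "implement-refactor"
  | "security-review"
  | "routine-review";

// Preferred tier per job, loosely following the routing table.
const preferredTier: Record<Job, Tier> = {
  "find-files": "fast",
  "plan-refactor": "strong",
  "implement-refactor": "balanced",
  "security-review": "strong",
  "routine-review": "balanced",
};

const order: readonly Tier[] = ["fast", "balanced", "strong"];

// Returns the preferred tier if the allowlist permits it; otherwise the
// nearest stronger allowed tier, then the nearest weaker one. Returns
// undefined when the allowlist permits nothing.
function routeModel(job: Job, allowed: ReadonlySet<Tier>): Tier | undefined {
  const start = order.indexOf(preferredTier[job]);
  const candidates = [...order.slice(start), ...order.slice(0, start).reverse()];
  return candidates.find((t) => allowed.has(t));
}

// Usage: an organization that has not approved the strongest tier.
const approved = new Set<Tier>(["fast", "balanced"]);
console.log(routeModel("plan-refactor", approved)); // "balanced"
console.log(routeModel("find-files", approved));    // "fast"
```

Falling back toward more reasoning first reflects the point that a bad plan costs more than slower planning; a team that cares more about cost could reverse the two slices.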
Release note: the current changelog window covers availableModels enforcement and adds managed enforcement controls (the enforceAvailableModels setting). For enterprise training, the routing board should include the organization’s approved model set and a note that /fast, aliases, subagent model overrides, advisor models, and background agents must stay inside the allowlist.
Building the primitive and routing board
The routing board is the first durable artifact of the day. It lists common team tasks and routes each one to the smallest reliable primitive. The board should be compact enough to live in a repo, onboarding guide, or team operating doc. It should not be aspirational. It should describe the actual next version of the team’s workflow.
Task: Review a PR for auth regressions
Primitive: security-reviewer agent + /code-review command
Mode: interactive for local development; headless only after the rule set is stable
Model: balanced model for normal review, stronger reasoning for high-risk auth changes
Inputs: diff, execution brief, REVIEW.md, relevant auth rules
Output: findings-first review with severity, evidence, and recommended fix
Stop condition: no important findings or explicit residual risk accepted by human
What good looks like
A good routing board has three qualities. First, it is specific. “Use Claude for coding” is not useful; “use an explorer subagent to identify files before reading them into the main session” is useful. Second, it is bounded. Each row says what the primitive should and should not do. Third, it is teachable. A new engineer should be able to read the board and make the same routing decision as a senior engineer most of the time.
Anti-patterns
- The mega-prompt: A long prompt that mixes stable policy, one-time instructions, codebase facts, and a workflow checklist. Split it into memory, project instructions, skill, and task prompt.
- The everything-agent: A custom agent named “senior engineer” that can do anything. Specialized agents should have a narrow lens, clear tools, and a predictable output shape.
- Premature plugin packaging: A workflow is packaged before the team has run it enough times to know its inputs, failure modes, and stop conditions.
- Headless ambiguity: A non-interactive run is launched with vague goals and no acceptance criteria. Headless work should consume a brief, not invent one.
Module recap
The foundation of effective Claude Code use is not prompt cleverness. It is routing discipline. Teams that classify work correctly can preserve knowledge, reduce repeated prompting, avoid context-window waste, and turn successful sessions into reusable operating assets.
Module 2 · Block 2 · 55 minutes
Planning and Context Management
Planning is not a ceremonial step before coding. In Claude Code, planning is the act of shaping context so the next operator—human, model, agent, or headless workflow—can act without reopening the entire problem.
The plan is a compression artifact
Claude Code sessions can accumulate enormous context: pasted requirements, file contents, terminal output, attempted fixes, test failures, screenshots, and corrections from the human. Without deliberate compression, the session becomes expensive and fragile. The model is forced to infer what still matters from a long transcript. Humans are forced to remember why earlier decisions were made. Downstream operators inherit noise instead of a plan.
A useful plan is not a transcript. It is a lossily compressed representation of the work. It preserves the facts, decisions, constraints, risks, and next actions that matter. It discards the conversational path that produced them. That is why the one-day course treats planning as context management.
Separate facts, assumptions, and open questions
The simplest improvement to most AI coding sessions is to stop blending known facts with guesses. Models are very good at continuing a confident narrative. If the prompt says “the auth middleware probably owns refresh-token invalidation,” the model may proceed as if that is true. A disciplined brief separates evidence from inference.
| Category | Definition | Example | How to handle it |
|---|---|---|---|
| Fact | A statement backed by direct evidence | `src/auth/middleware.ts` validates JWTs before route handlers run | Can be used directly in the plan |
| Assumption | A plausible statement not yet proven | Refresh token invalidation is probably handled in the session store | Must be tested or called out as risk |
| Open question | A decision or unknown that blocks confident execution | Should expired refresh tokens be deleted or retained for audit? | Resolve before implementation or explicitly defer |
| Constraint | A boundary the solution must respect | Do not change public API response shape | Use as acceptance criteria and review rule |
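The four categories can be kept apart mechanically by tagging each brief entry with its kind. The sketch below assumes a hypothetical `BriefItem` union and `executionBlockers` function; it reports what still stands between the brief and confident execution, following the "How to handle it" column.

```typescript
// Hypothetical representation of brief entries; the categories mirror the
// table, but the field names are illustrative.
type BriefItem =
  | { kind: "fact"; text: string; evidence: string }
  | { kind: "assumption"; text: string; status: "untested" | "verified" | "acceptedRisk" }
  | { kind: "openQuestion"; text: string; deferred: boolean }
  | { kind: "constraint"; text: string };

// Lists what still blocks confident execution.
function executionBlockers(items: BriefItem[]): string[] {
  const blockers: string[] = [];
  for (const item of items) {
    switch (item.kind) {
      case "fact":
        // A "fact" with no evidence is really an assumption in disguise.
        if (item.evidence.trim() === "") blockers.push(`fact without evidence: ${item.text}`);
        break;
      case "assumption":
        if (item.status === "untested") blockers.push(`test or flag as risk: ${item.text}`);
        break;
      case "openQuestion":
        if (!item.deferred) blockers.push(`resolve or defer: ${item.text}`);
        break;
      case "constraint":
        break; // constraints feed acceptance criteria; they never block alone
    }
  }
  return blockers;
}

// Usage with the examples from the table.
const items: BriefItem[] = [
  { kind: "fact", text: "src/auth/middleware.ts validates JWTs before route handlers run", evidence: "read src/auth/middleware.ts" },
  { kind: "assumption", text: "Refresh token invalidation is probably handled in the session store", status: "untested" },
  { kind: "openQuestion", text: "Should expired refresh tokens be deleted or retained for audit?", deferred: false },
  { kind: "constraint", text: "Do not change public API response shape" },
];
const open = executionBlockers(items);
// open has two entries: the untested assumption and the unresolved question.
```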
The compact execution brief
The execution brief is the central planning artifact. It should fit on one or two pages, but it should be complete enough for another operator to execute. The brief is not just a summary. It is an instruction-bearing artifact with a clear contract: here is the problem, here is the known context, here are the boundaries, here is the proposed path, and here is how we will know whether the work is done.
# Compact Execution Brief
## Goal
Implement refresh token rotation without changing the existing login response contract.
## Known facts
- JWT validation happens in src/auth/middleware.ts.
- Session persistence is implemented in src/auth/session-store.ts.
- Existing tests cover login success and expired access tokens.
## Assumptions to verify
- Old refresh tokens are not currently invalidated after rotation.
- Token reuse should be treated as suspicious but not immediately lock the account.
## Open questions
- Should reuse detection emit an audit event?
- Is token family tracking already present in the database schema?
## Context map
- Auth middleware: request validation and session lookup
- Session store: token persistence and expiration
- Test suite: integration tests under tests/auth/
## Work packages
1. Verify current token rotation behavior.
2. Add invalidation logic or token-family tracking.
3. Extend integration tests for old-token reuse.
4. Produce review and verification packet.
## Acceptance criteria
- New refresh token is issued on rotation.
- Previous refresh token cannot be reused.
- Existing login response shape is unchanged.
- Tests demonstrate success, expiration, and reuse behavior.
Context maps
A context map tells the model where to look and why. It is not a full dump of file contents. It is a pointer layer: directories, files, functions, commands, external systems, and documents that are likely relevant. Good context maps reduce token use because Claude can read the right files in the right order instead of scanning the repository blindly.
A context map should include both primary and secondary context. Primary context is required to make the change. Secondary context helps review risk, verify behavior, or understand why the system is shaped the way it is.
## Context map
Primary:
- src/auth/middleware.ts — request authentication boundary
- src/auth/session-store.ts — refresh token persistence
- db/schema.sql — session and token tables
- tests/auth/refresh-token.test.ts — integration behavior
Secondary:
- docs/security/auth-model.md — intended auth posture
- .claude/rules/api-security.md — project-specific security rules
- recent PRs touching auth middleware — intent and regression context
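Because a context map is a pointer layer, it can be stored as data and turned into a reading order: primary context first, then secondary. This is a minimal sketch; the `ContextEntry` interface and `readingOrder` function are hypothetical names chosen for illustration.

```typescript
// Hypothetical context-map entry: a pointer plus the reason to read it.
interface ContextEntry {
  path: string;   // file, directory, or description of a source
  reason: string; // why the next operator should look here
  tier: "primary" | "secondary";
}

// Produces a numbered reading order: primary context (needed to make the
// change) before secondary context (needed to review risk). Order within a
// tier is preserved.
function readingOrder(map: ContextEntry[]): string[] {
  const primary = map.filter((e) => e.tier === "primary");
  const secondary = map.filter((e) => e.tier === "secondary");
  return [...primary, ...secondary].map((e, i) => `${i + 1}. ${e.path}: ${e.reason}`);
}

// Usage: entries can be recorded in any order.
const authMap: ContextEntry[] = [
  { path: "docs/security/auth-model.md", reason: "intended auth posture", tier: "secondary" },
  { path: "src/auth/middleware.ts", reason: "request authentication boundary", tier: "primary" },
  { path: "src/auth/session-store.ts", reason: "refresh token persistence", tier: "primary" },
];
const order = readingOrder(authMap);
// order[0] is "1. src/auth/middleware.ts: request authentication boundary"
// order[2] is "3. docs/security/auth-model.md: intended auth posture"
```

Keeping the reason next to each path is what distinguishes a context map from a file list: the model learns why to read a file, not just that it exists.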
Debate review before coding
Before code is written, ask Claude to attack the plan. The goal is not to win the debate; the goal is to improve the artifact. A debate review should look for ambiguous goals, missing constraints, unsupported assumptions, hidden coupling, risky files, weak acceptance criteria, and likely regression paths.
Review this execution brief as a skeptical senior engineer. Identify unsupported assumptions, missing context, and acceptance criteria that would fail to catch a regression. Do not implement. Return a revised brief outline and a list of questions that must be answered before coding.
The output of debate review should be folded back into the brief. If the debate produces useful insights that remain trapped in conversation history, the next operator still cannot use them. Artifact-first planning means the artifact is the durable memory.
What makes a plan reusable downstream?
A reusable plan has clear boundaries. It names the exact goal, the non-goals, the files likely involved, the evidence already gathered, the assumptions still open, and the stop condition. It also includes enough review criteria to prevent the model from declaring success too early.
- Goal: What outcome should exist after the work is complete?
- Non-goals: What tempting adjacent work should not be done?
- Facts: What has been directly observed?
- Assumptions: What might be true, but needs verification?
- Context map: Where should the next operator look first?
- Work packages: What are the smallest implementation units?
- Acceptance criteria: What evidence will prove the work is complete?
- Review focus: What risks should review emphasize?
Module recap
Planning in Claude Code is not about slowing down. It is about preserving momentum by producing a compact artifact that can survive compaction, handoff, review, and automation. The better the brief, the less the model has to infer and the easier it is for humans to hold the work accountable.
Module 3 · Block 3 · 60 minutes
Intent Recovery and Dynamic Evidence
When a codebase is old enough, the current code is rarely the whole story. This module teaches participants to recover intent from Git history, issues, PRs, logs, tests, command traces, screenshots, and other dynamic sources before asking Claude to change behavior.
Find the why before the what
Claude can usually explain what code does from the current files. The harder question is why it does that. Was a strange branch added for a customer-specific edge case? Did a test encode a production incident? Was a confusing abstraction introduced to support a migration that has since finished? Current code often hides the reason for its own shape.
Intent recovery is the discipline of gathering enough historical and runtime evidence to avoid undoing deliberate behavior. It is especially important when the requested change appears simple. Simple changes are dangerous when they cut across hidden intent.
Static code is not enough
Reading the current file gives one kind of evidence. Git history gives another. Tests reveal expected behavior. Issues and PRs reveal tradeoffs. Logs reveal runtime reality. Screenshots reveal UI states that code alone may not make obvious. CLI traces reveal exact failure modes. Documentation and MCP-backed systems can provide external context that is not stored in the repository.
| Evidence source | Question it answers | Risk if omitted |
|---|---|---|
| Current code | What does the system do now? | The model may miss hidden coupling outside the local file |
| Git blame and commits | Why was this line introduced or changed? | The model may remove a deliberate workaround |
| PR discussion | What tradeoffs were accepted? | The model may re-litigate settled decisions |
| Issues / tickets | What user or incident motivated the behavior? | The model may solve the wrong problem |
| Tests | What behavior is currently protected? | The model may pass local reasoning but break expected behavior |
| Logs and traces | What happens in real executions? | The model may optimize for imagined behavior |
| Screenshots | What does the user actually see? | The model may miss visual or state-machine issues |
| Docs and runbooks | What standards should govern the change? | The model may violate team conventions |
Evidence versus inference
A strong investigation memo labels evidence. It does not say “the bug is caused by stale cache” unless there is direct evidence. It says “the failure appears after the cache read path; logs show cache hit with outdated value; no write-through event appears in the trace; inference: stale cache is likely.” This distinction matters because the next model turn will use the memo as context. If guesses are written like facts, the model will build on them.
# Investigation Memo
## Question
Why does the account summary show stale plan data after billing upgrade?
## Evidence gathered
- UI reproduces after upgrade from Basic to Pro without page refresh.
- Network trace shows GET /account returns plan=Basic immediately after upgrade.
- Server logs show billing webhook processed successfully.
- Git history shows account summary cache added in PR #184 to reduce billing API calls.
## Inferences
- The account summary endpoint likely reads a cached account projection.
- The billing webhook may update billing state without invalidating account summary cache.
## Open questions
- Where is account summary cache invalidated?
- Is delayed consistency acceptable for this UI?
## Recommended next step
Trace account summary cache writes and invalidation paths before changing UI code.
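The evidence/inference split can be enforced by requiring every inference to cite the evidence it rests on. The sketch below assumes hypothetical evidence IDs (`E1` to `E4`) and a hypothetical `unsupportedInferences` check; the memo content restates the example above.

```typescript
// Hypothetical memo model: every inference must point at evidence IDs.
interface Evidence { id: string; text: string }
interface Inference { text: string; basedOn: string[] }
interface InvestigationMemo {
  question: string;
  evidence: Evidence[];
  inferences: Inference[];
}

// Flags inferences that cite no evidence, or cite IDs that do not exist.
function unsupportedInferences(memo: InvestigationMemo): string[] {
  const known = new Set(memo.evidence.map((e) => e.id));
  return memo.inferences
    .filter((inf) => inf.basedOn.length === 0 || inf.basedOn.some((id) => !known.has(id)))
    .map((inf) => inf.text);
}

// Usage: the stale-plan memo, with IDs added for illustration.
const memo: InvestigationMemo = {
  question: "Why does the account summary show stale plan data after billing upgrade?",
  evidence: [
    { id: "E1", text: "UI reproduces after upgrade from Basic to Pro without page refresh" },
    { id: "E2", text: "GET /account returns plan=Basic immediately after upgrade" },
    { id: "E3", text: "Billing webhook processed successfully" },
    { id: "E4", text: "Account summary cache added in PR #184" },
  ],
  inferences: [
    { text: "The account summary endpoint likely reads a cached account projection", basedOn: ["E2", "E4"] },
    { text: "The billing webhook may not invalidate the account summary cache", basedOn: ["E2", "E3"] },
  ],
};
const flagged = unsupportedInferences(memo); // [] because every inference cites evidence
```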
Dynamic evidence sources
Dynamic evidence is information that changes with execution: logs, test runs, database state, browser behavior, CLI output, screenshots, observability traces, and external tool responses. It is powerful because it grounds the model in reality. It is also noisy. A good operator does not paste raw dynamic output indiscriminately. They capture the relevant excerpt, label how it was produced, and explain why it matters.
When collecting dynamic evidence, include command provenance. The model should know not only the output, but the command, environment, timestamp or branch, and whether the result was reproducible.
Command: pnpm test tests/auth/refresh-token.test.ts --runInBand
Branch: refresh-token-rotation
Result: failed, 1 test
Relevant output:
expected old refresh token reuse to return 401
received 200
Interpretation:
Existing implementation issues a new refresh token but does not invalidate the previous token.
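Provenance is easier to capture consistently when it is a record rather than free text. The sketch below formats a hypothetical `CommandEvidence` record in the same layout as the example; the optional `environment` and `reproducible` fields are assumptions that cover the remaining provenance items named in the text.

```typescript
// Hypothetical provenance record for one piece of dynamic evidence.
interface CommandEvidence {
  command: string;
  branch: string;
  environment?: string;   // e.g. "local" or "CI"; optional in this sketch
  reproducible?: boolean; // undefined means "not yet re-run"
  result: string;
  excerpt: string[];      // only the relevant lines, not the full log
  interpretation: string;
}

// Renders the record as labeled plain text, omitting unknown optional fields.
function formatEvidence(e: CommandEvidence): string {
  const lines = [`Command: ${e.command}`, `Branch: ${e.branch}`];
  if (e.environment !== undefined) lines.push(`Environment: ${e.environment}`);
  if (e.reproducible !== undefined) lines.push(`Reproducible: ${e.reproducible ? "yes" : "no"}`);
  lines.push(`Result: ${e.result}`, "Relevant output:", ...e.excerpt, "Interpretation:", e.interpretation);
  return lines.join("\n");
}

// Usage: the refresh-token failure from the example.
const run: CommandEvidence = {
  command: "pnpm test tests/auth/refresh-token.test.ts --runInBand",
  branch: "refresh-token-rotation",
  result: "failed, 1 test",
  excerpt: ["expected old refresh token reuse to return 401", "received 200"],
  interpretation: "Existing implementation issues a new refresh token but does not invalidate the previous token.",
};
const report = formatEvidence(run);
```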
Why “that didn’t work” fails as feedback
The phrase “that didn’t work” is almost useless to the model. It omits what was attempted, what was expected, what actually happened, what evidence was observed, and what changed between attempts. Good feedback is a bundle: action, expectation, observation, evidence, hypothesis, and next constraint.
| Weak feedback | Better feedback |
|---|---|
| That did not work. | After applying the patch, `pnpm test tests/auth/refresh-token.test.ts` still fails. Expected old token reuse to return 401; actual response is 200. Relevant log shows session lookup succeeds for the old token. Focus next on invalidation in session-store, not middleware. |
| The UI is broken. | Clicking Save leaves the modal open. Browser console shows no error. Network tab shows PATCH /settings returns 204. The likely issue is local modal state not closing after success. |
| Try again. | Revise the approach without changing the public API response shape. Preserve existing success tests and add one regression test for duplicate submission. |
The feedback bundle pattern
A feedback bundle is a structured correction that improves the next model turn. It should be short, but it must contain enough evidence for the model to update its plan. Use it whenever Claude’s first implementation fails, when a test result contradicts an assumption, or when a human reviewer spots a gap.
# Feedback Bundle
## Attempted change
Added token rotation logic in src/auth/middleware.ts.
## Expected result
Old refresh token reuse should return 401.
## Actual result
Old refresh token reuse returns 200.
## Evidence
Test: pnpm test tests/auth/refresh-token.test.ts
Failure: expected 401, received 200
Trace: old token still resolves in session-store lookup.
## Updated hypothesis
Middleware is not the right layer for invalidation. The session store accepts both token records.
## Next instruction
Inspect session-store token persistence and invalidation. Do not change response shape.
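A bundle builder can make "that didn't work" impossible to send by refusing to render until every part is filled in. This is a minimal sketch; the `FeedbackBundle` interface and `renderBundle` function are hypothetical, and the output mirrors the template above.

```typescript
// Hypothetical feedback-bundle builder: it throws unless every part is present.
interface FeedbackBundle {
  attemptedChange: string;
  expectedResult: string;
  actualResult: string;
  evidence: string[];
  updatedHypothesis: string;
  nextInstruction: string;
}

function renderBundle(b: FeedbackBundle): string {
  // Collect every empty field so the author sees all gaps at once.
  const missing = (Object.keys(b) as (keyof FeedbackBundle)[]).filter((k) => {
    const v = b[k];
    return Array.isArray(v) ? v.length === 0 : v.trim() === "";
  });
  if (missing.length > 0) {
    throw new Error(`incomplete feedback bundle, missing: ${missing.join(", ")}`);
  }
  return [
    "# Feedback Bundle",
    "## Attempted change", b.attemptedChange,
    "## Expected result", b.expectedResult,
    "## Actual result", b.actualResult,
    "## Evidence", ...b.evidence,
    "## Updated hypothesis", b.updatedHypothesis,
    "## Next instruction", b.nextInstruction,
  ].join("\n");
}

// Usage: the bundle from the template.
const bundle: FeedbackBundle = {
  attemptedChange: "Added token rotation logic in src/auth/middleware.ts.",
  expectedResult: "Old refresh token reuse should return 401.",
  actualResult: "Old refresh token reuse returns 200.",
  evidence: [
    "Test: pnpm test tests/auth/refresh-token.test.ts",
    "Failure: expected 401, received 200",
    "Trace: old token still resolves in session-store lookup.",
  ],
  updatedHypothesis: "Middleware is not the right layer for invalidation. The session store accepts both token records.",
  nextInstruction: "Inspect session-store token persistence and invalidation. Do not change response shape.",
};
const rendered = renderBundle(bundle);
```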
Module recap
Intent recovery prevents well-intentioned regressions. Dynamic evidence prevents hallucinated debugging. Feedback bundles convert failure into useful context. Together, they turn Claude Code from a code generator into an evidence-driven collaborator.
Module 4 · Block 4 · 60 minutes
Review, Test, and Verify
The work is not done when code changes. It is done when the diff has been reviewed against the brief, the verification evidence is explicit, and the residual risk is clear enough for a human to accept or reject.
Review the diff against the brief
A Claude-assisted review should not ask “does this code look good?” That question is too broad and too subjective. The stronger question is: “does this diff satisfy the execution brief without violating constraints or introducing unacceptable risk?” The brief becomes the review contract.
Findings-first review means the reviewer leads with issues, not narrative. Each finding should state severity, evidence, affected file or behavior, why it matters, and a recommended fix. If there are no blocking findings, the review should still describe what was checked and what residual risk remains.
# Review Finding
Severity: Important
Area: Refresh token invalidation
Evidence: tests/auth/refresh-token.test.ts covers successful rotation but not reuse of the old token.
Why it matters: The acceptance criteria require old refresh tokens to be rejected.
Recommended fix: Add a regression test that attempts reuse of the previous token after rotation and expects 401.
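Findings-first ordering and the "still describe what was checked" rule can both be built into how a review is rendered. In the sketch below, only "Important" comes from the text; the "Blocking" and "Minor" levels, the `Finding` shape, and `renderReview` are assumptions for illustration.

```typescript
// Hypothetical severity scale; only "Important" appears in the source example.
type Severity = "Blocking" | "Important" | "Minor";
const rank: Record<Severity, number> = { Blocking: 0, Important: 1, Minor: 2 };

interface Finding {
  severity: Severity;
  area: string;
  evidence: string;
  whyItMatters: string;
  recommendedFix: string;
}

// Most severe findings first; the checked scope and residual risk are always
// appended, so an empty review still says what it covered.
function renderReview(findings: Finding[], checked: string[], residualRisk: string): string {
  const lines = [...findings]
    .sort((a, b) => rank[a.severity] - rank[b.severity])
    .map((f) => `[${f.severity}] ${f.area}: ${f.evidence} Why it matters: ${f.whyItMatters} Fix: ${f.recommendedFix}`);
  if (lines.length === 0) lines.push("No findings.");
  lines.push(`Checked: ${checked.join("; ")}`, `Residual risk: ${residualRisk}`);
  return lines.join("\n");
}

// Usage: the Minor finding is a made-up example to show the ordering.
const findings: Finding[] = [
  { severity: "Minor", area: "Naming", evidence: "A local variable name is unclear.", whyItMatters: "Readability.", recommendedFix: "Rename it." },
  {
    severity: "Important",
    area: "Refresh token invalidation",
    evidence: "tests/auth/refresh-token.test.ts covers successful rotation but not reuse of the old token.",
    whyItMatters: "The acceptance criteria require old refresh tokens to be rejected.",
    recommendedFix: "Add a regression test that attempts reuse of the previous token after rotation and expects 401.",
  },
];
const review = renderReview(findings, ["diff against execution brief"], "token-store concurrency not reviewed");
```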
Scope drift
Scope drift is any change that is not required by the brief. Some drift is harmless cleanup. Some drift is dangerous because it changes behavior the team did not intend to change. Claude can drift when it sees adjacent improvements, especially if the prompt rewards broad helpfulness. Review must therefore compare the diff against explicit goals and non-goals.
| Drift type | Example | Review response |
|---|---|---|
| Benign cleanup | Renaming a local variable for clarity | Accept if low risk and local |
| Adjacent refactor | Changing session-store interfaces while adding one token behavior | Challenge unless required by the brief |
| Behavior expansion | Adding account lockout on token reuse when not requested | Reject or move to follow-up |
| Contract change | Changing login response shape while implementing rotation | Block |
| Test-only expansion | Adding regression tests for directly related edge cases | Usually accept |
Testing is one gate, not the only gate
Passing tests are necessary evidence, but they are not proof of correctness. Tests only cover what they assert. A verification packet should include test results, manual checks when relevant, static review, diff review, command outputs, and residual risk. The point is not to create paperwork. The point is to prevent the phrase “tests pass” from hiding an unreviewed assumption.
For LLM-assisted work, verification should also include provenance: what files changed, what commands were run, what evidence was observed, and what the model did not check. This gives the human reviewer a clear map of confidence and uncertainty.
The verification packet
# Verification Packet
## Brief alignment
Goal: Refresh token rotation rejects old token reuse.
Status: Implemented and tested.
## Changed files
- src/auth/session-store.ts — invalidates previous refresh token on rotation
- tests/auth/refresh-token.test.ts — adds old-token reuse regression test
## Evidence
- pnpm test tests/auth/refresh-token.test.ts — passed
- pnpm test tests/auth/login.test.ts — passed
- Manual API check: old refresh token returns 401 after rotation
## Scope review
No public response contract changes observed.
No unrelated auth routes modified.
## Residual risk
Database cleanup of invalidated token records is not addressed. Existing retention behavior remains unchanged.
## PR handoff note
Reviewer should focus on token-store concurrency and whether invalidated token retention meets audit expectations.
Regression risk
Regression risk is not just the probability that something breaks. It is a function of three factors: likelihood, blast radius, and how hard a failure would be to detect. A failure that tests or alerts catch immediately is less risky than one that stays invisible until users report it. A small likelihood with a huge blast radius still deserves attention. A likely bug with easy rollback may be acceptable if the release path is safe. Claude can help enumerate risks, but the team must decide what risk is acceptable.
| Risk question | Why it matters |
|---|---|
| What user-visible behavior changed? | Identifies blast radius |
| What existing tests protect this path? | Identifies current safety net |
| What did we not test? | Prevents false confidence |
| What external systems depend on this behavior? | Finds hidden contracts |
| How would we detect failure in production? | Separates known risk from invisible risk |
| How would we roll back? | Determines operational readiness |
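The likelihood, blast radius, and detectability factors can be combined the way FMEA-style risk priority numbers are: rate each factor, rating detection so that a higher number means harder to detect, and multiply. The 1 to 5 scales and the `riskScore` function below are assumptions for illustration, not a team standard; the score only ranks risks, it does not decide what is acceptable.

```typescript
// FMEA-style sketch: each factor is rated 1 (best) to 5 (worst). The scales
// are assumptions for illustration, not a standard policy.
interface RiskRating {
  likelihood: number;          // 1 = very unlikely to break, 5 = very likely
  blastRadius: number;         // 1 = one internal path, 5 = all users or external contracts
  detectionDifficulty: number; // 1 = caught at once by tests or alerts, 5 = invisible until reported
}

function riskScore(r: RiskRating): number {
  for (const v of [r.likelihood, r.blastRadius, r.detectionDifficulty]) {
    if (!Number.isInteger(v) || v < 1 || v > 5) {
      throw new RangeError(`rating out of range: ${v}`);
    }
  }
  return r.likelihood * r.blastRadius * r.detectionDifficulty;
}

// Usage: a rare but severe, hard-to-see failure outranks a likely but
// contained, obvious one.
const rareButSevere = riskScore({ likelihood: 1, blastRadius: 5, detectionDifficulty: 5 }); // 25
const likelyButContained = riskScore({ likelihood: 4, blastRadius: 1, detectionDifficulty: 1 }); // 4
```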
What makes a handoff PR-ready?
A PR-ready handoff gives the reviewer the shortest path to an informed decision. It should contain the problem statement, brief link or summary, changed files, review focus, verification evidence, known non-goals, and residual risk. The reviewer should not need to reconstruct the story from chat history.
Problem → Approach → Changed files → Verification → Review focus → Residual risk. If any of those pieces is missing, the PR is not fully handoff-ready.
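The six-part handoff sequence works as a checklist. The sketch below assumes a hypothetical `missingHandoffSections` helper that takes a handoff as a map from section name to text and returns the sections that are absent or empty.

```typescript
// The six handoff sections, in order.
const handoffSections = [
  "Problem",
  "Approach",
  "Changed files",
  "Verification",
  "Review focus",
  "Residual risk",
] as const;

type Section = (typeof handoffSections)[number];

// Returns missing or empty sections. An empty result means the handoff passes
// this checklist; it says nothing about the quality of the code itself.
function missingHandoffSections(handoff: Partial<Record<Section, string>>): Section[] {
  return handoffSections.filter((s) => (handoff[s] ?? "").trim() === "");
}

// Usage: a draft PR description that stops after the approach.
const draftHandoff: Partial<Record<Section, string>> = {
  Problem: "Old refresh tokens remain valid after rotation.",
  Approach: "Invalidate the previous token in the session store on rotation.",
};
const gaps = missingHandoffSections(draftHandoff);
// gaps is ["Changed files", "Verification", "Review focus", "Residual risk"]
```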
Module recap
Review and verification discipline turns model output into engineering evidence. The goal is not to make Claude “sound confident.” The goal is to make the work auditable: what changed, why it changed, how it was checked, and what remains uncertain.
Module 5 · Block 5 · 45 minutes
Workflow Design and Minimal Improvement Loop
The one-day session closes by turning isolated practices into a named workflow. Participants define handoffs, gates, stop conditions, and one lightweight scorecard so the next run can improve without expanding into a full operating-system redesign.
From good sessions to repeatable workflows
A strong Claude Code session can still be a dead end if the team cannot repeat it. Workflow design captures the sequence that made the session successful: how the work was framed, what artifacts were produced, which agents or skills were used, where human review occurred, and what evidence counted as done. The workflow does not need to be elaborate. It needs to be named, bounded, and reusable.
The one-day course intentionally keeps this module compact. The goal is not to design a full engineering operating system. The goal is to leave with one workflow that can be tried next week and improved after one run.
A compact multi-agent workflow
Multi-agent workflow design should begin with work boundaries, not agent names. Each agent or subagent should own a distinct lens or phase. Current Claude Code releases support nested subagents, which means a delegated agent can fan out further for bounded research or review. Use that power carefully: nested delegation should have named inputs, maximum depth, output limits, and explicit stop conditions. If two agents need the same broad context and produce overlapping output, the workflow is probably not decomposed well.
# Compact Workflow Spec
Name: Evidence-first bug fix
Trigger:
A bug report has enough detail to reproduce or investigate.
Artifacts:
1. Investigation memo
2. Compact execution brief
3. Implementation diff
4. Review and verification packet
Roles:
- Explorer subagent: identify relevant files, history, and evidence sources
- Planner: compress evidence into execution brief
- Implementer: make bounded code changes from the brief
- Reviewer: compare diff against brief and produce findings
Gates:
- Do not implement until facts, assumptions, and open questions are separated.
- Do not review until acceptance criteria are explicit.
- Do not hand off until verification evidence and residual risk are written.
Stop condition:
The change is PR-ready or blocked by a named open question.
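The nested-delegation rule (named inputs, maximum depth, output limits, explicit stop conditions) can be sketched as a budget that travels with every delegated task. This is a plain-Python model of the policy, not a Claude Code API: `Task`, `DelegationBudget`, `run_task`, the task names, and the limits are hypothetical, and a real workflow would enforce the same rules through prompts, agent definitions, or review.

```python
from dataclasses import dataclass, field


@dataclass
class DelegationBudget:
    max_depth: int = 2            # assumed: root task plus two levels of delegation
    max_output_chars: int = 4000  # assumed cap on what any delegate hands back


@dataclass
class Task:
    name: str
    inputs: list[str]  # named inputs, e.g. ["bug report"]
    subtasks: list["Task"] = field(default_factory=list)


def run_task(task: Task, budget: DelegationBudget, depth: int = 0) -> str:
    """Walk a delegation tree, enforcing named inputs, depth, and output limits."""
    if not task.inputs:
        return f"[blocked] {task.name}: no named inputs"
    if depth > budget.max_depth:
        return f"[stopped] {task.name}: depth {depth} exceeds {budget.max_depth}"
    lines = [f"{task.name} <- {', '.join(task.inputs)}"]
    for child in task.subtasks:
        lines.append(run_task(child, budget, depth + 1))
    # Each delegate returns a bounded summary rather than its full context.
    return "\n".join(lines)[: budget.max_output_chars]


# Hypothetical tree: "too-deep" sits at depth 3, past the assumed limit of 2.
tree = Task("explorer", ["bug report"], [
    Task("history-scan", ["git log for auth/"], [
        Task("blame-dig", ["suspect commit"], [
            Task("too-deep", ["anything"]),
        ]),
    ]),
])
print(run_task(tree, DelegationBudget()))
```

Rejecting a task without named inputs plays the role of a gate, and truncating output is a crude stand-in for asking each delegate for a fixed-size summary.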
Handoffs
A handoff is where one operator’s output becomes another operator’s input. In Claude Code workflows, handoffs should be artifact-based. The explorer hands off an investigation memo. The planner hands off an execution brief. The implementer hands off a diff plus notes. The reviewer hands off findings and verification evidence. If a handoff requires the next operator to read the entire chat transcript, the handoff failed.
/cd. That makes worktree and multi-repo workflows easier, but the same rule applies: preserve the handoff as an artifact, not as implicit terminal state.
| Handoff | Input | Output | Quality bar |
|---|---|---|---|
| Investigation → Planning | Evidence, traces, history, open questions | Execution brief | Facts and assumptions are separated |
| Planning → Implementation | Execution brief and context map | Bounded diff | Non-goals and acceptance criteria are respected |
| Implementation → Review | Diff and brief | Findings or approval with residual risk | Review is evidence-based |
| Review → Handoff | Findings, fixes, verification commands | PR-ready packet | Reviewer can decide without chat history |
Gates and stop conditions
Gates prevent premature motion. Stop conditions prevent infinite motion. A gate says what must be true before the workflow can advance. A stop condition says when the workflow is complete, blocked, or unsafe to continue. Claude workflows need both because models tend to continue helping unless told what “done” means.
Good gates are observable. “Make sure the plan is good” is not a gate. “The brief includes goal, non-goals, facts, assumptions, open questions, context map, work packages, and acceptance criteria” is a gate. Good stop conditions are explicit. “Continue until fixed” is vague. “Stop when the verification packet shows the acceptance criteria pass, or when an open question blocks safe implementation” is actionable.
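The two example rules in this paragraph can be written as observable checks. The sketch assumes the execution brief is available as a dictionary keyed by section; `brief_gate`, `stop_condition`, and the `Outcome` values are hypothetical names, and a real workflow might run equivalent checks by hand or in a review step.

```python
from enum import Enum

# Section keys mirror the gate wording above; the dict-of-lists shape is an assumption.
BRIEF_SECTIONS = (
    "goal", "non_goals", "facts", "assumptions",
    "open_questions", "context_map", "work_packages", "acceptance_criteria",
)


class Outcome(Enum):
    CONTINUE = "continue"
    DONE = "done"
    BLOCKED = "blocked"


def brief_gate(brief: dict[str, list[str]]) -> list[str]:
    """Gate: return sections that are missing or empty; an empty list means advance."""
    return [name for name in BRIEF_SECTIONS if not brief.get(name)]


def stop_condition(criteria: dict[str, bool], blocking_questions: list[str]) -> Outcome:
    """Stop as BLOCKED on any blocking question, DONE when every criterion passes."""
    if blocking_questions:
        # Checked first so a green run cannot hide an unresolved question.
        return Outcome.BLOCKED
    if criteria and all(criteria.values()):
        return Outcome.DONE
    return Outcome.CONTINUE  # no evidence yet, or some criterion still failing


print(brief_gate({"goal": ["Reject old refresh tokens"], "facts": ["Rotation issues a new token"]}))
print(stop_condition({"old token returns 401": True}, []).value)  # done
```

An empty `criteria` dictionary deliberately returns `CONTINUE`: with no verification evidence, the workflow is neither done nor blocked.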
One lightweight scorecard
The improvement loop should be small enough that the team actually uses it. Pick one scorecard that can be completed in five minutes after a workflow run. The scorecard should measure the workflow, not the model’s personality. It should ask whether artifacts were reusable, whether evidence was sufficient, whether context was controlled, whether review caught issues, and what should change next run.
# Next-run Improvement Scorecard
Workflow name:
Date:
Task:
1. Was the execution brief usable without reopening the original conversation? 0 / 1 / 2
2. Did the investigation memo separate evidence from inference? 0 / 1 / 2
3. Did the implementation stay inside the stated scope? 0 / 1 / 2
4. Did verification include more than passing tests? 0 / 1 / 2
5. Was residual risk explicit? 0 / 1 / 2
6. Did usage evidence show that long-context, subagent, skill, plugin, and MCP cost was justified? 0 / 1 / 2
One thing to keep:
One thing to change next run:
One artifact or rule to update:
What to improve on the next run
Do not try to improve everything after the first run. Choose one improvement. Maybe the context map was too vague. Maybe the review agent needs a narrower rubric. Maybe the execution brief omitted non-goals. Maybe verification evidence was too thin. The scorecard turns that observation into a small change: update a template, add a rule, refine a skill, or adjust a gate.
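Choosing the one improvement can start from the scorecard numbers. The sketch below totals the six items (maximum 12) and returns the lowest-scoring one as the candidate to change. It assumes every item is phrased so that 2 is the best score, which means reading item 6 as "was the usage cost justified"; the short item labels and the sample scores are hypothetical.

```python
# Item labels abbreviate the six scorecard questions; each score is 0, 1, or 2.
ITEMS = (
    "brief usable without original conversation",
    "memo separated evidence from inference",
    "implementation stayed in scope",
    "verification beyond passing tests",
    "residual risk explicit",
    "usage cost justified",
)


def summarize(scores: dict[str, int]) -> tuple[int, str]:
    """Return the total (max 12) and the single lowest-scoring item to improve."""
    for item in ITEMS:
        if scores.get(item) not in (0, 1, 2):
            raise ValueError(f"score for {item!r} must be 0, 1, or 2")
    total = sum(scores[item] for item in ITEMS)
    # min() returns the first lowest item in ITEMS order, so ties resolve deterministically.
    focus = min(ITEMS, key=lambda item: scores[item])
    return total, focus


run = dict(zip(ITEMS, [2, 1, 2, 0, 1, 2]))  # hypothetical first run
print(summarize(run))  # (8, 'verification beyond passing tests')
```

The total is useful for comparing runs over time, but the returned focus item is what feeds the "one thing to change next run" line.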
Closing synthesis
The five modules form one loop. Route the work correctly. Compress the plan. Recover intent with evidence. Review against the brief. Preserve the workflow that worked. That loop is small enough to teach in a day and strong enough to become the foundation for team-scale Claude Code adoption.
Appendix: Live Delivery Notes
Recommended pacing
| Segment | Duration | Purpose |
|---|---|---|
| Course frame and operating surface | 35 minutes | Establish the mental model and readiness gate |
| Primitives, marketplace, plugins, skills, commands, agents, and model selection | 55 minutes | Build routing judgment |
| Planning and context-window management | 55 minutes | Create compact execution briefs |
| Intent recovery and dynamic data sources | 50 minutes | Ground model turns in evidence |
| Review, verification, and PR-ready evidence | 55 minutes | Make correctness and residual risk explicit |
| Multi-agent workflow design | 35 minutes | Turn practices into a named workflow |
| Lightweight operating review and final synthesis | 15 minutes | Choose one next-run improvement |
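A quick arithmetic check confirms that the segments fill the five-hour session exactly, which is worth rerunning after any cut. The labels below are abbreviations of the table rows; the minutes are copied unchanged.

```python
# Minutes per segment, copied from the pacing table above (labels abbreviated).
PACING = [
    ("Course frame and operating surface", 35),
    ("Primitives and model selection", 55),
    ("Planning and context-window management", 55),
    ("Intent recovery and dynamic data sources", 50),
    ("Review, verification, and PR-ready evidence", 55),
    ("Multi-agent workflow design", 35),
    ("Operating review and final synthesis", 15),
]

total_minutes = sum(minutes for _, minutes in PACING)
print(total_minutes, total_minutes / 60)  # 300 5.0
```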
What to cut if time compresses
If time compresses, cut extended plugin packaging discussion first, then deeper model-routing nuance, then longer workflow-library discussion, and finally most of the improvement-loop discussion. Do not cut planning and context management, intent recovery, dynamic evidence quality, or review and verification discipline. Those are the load-bearing parts of the one-day version.
Deliberately reduced content
The one-day format reduces deep observability design, full operating-system specification work, extended metric design and telemetry planning, and larger capstone build-out across multiple days. Those topics belong in the longer program. The one-day intensive should close with one workable delivery workflow and one clear improvement metric or checklist item.
Before each delivery: changelog refresh checklist
- Confirm the latest Claude Code version at the top of the changelog.
- Check whether model names, model picker behavior, or available-model enforcement changed.
- Check whether skills, plugins, marketplace browsing, commands, hooks, MCP policy, background agents, or safe-mode behavior changed.
- Update examples that mention exact commands or settings; leave principles-first sections intact unless the primitive changed.
Prepared as a generated course-reader artifact from the one-day live agenda, the one-day intensive presentation guide, the uploaded ClosedLoop HTML module example, and a Claude Code changelog cross-check through version 2.1.176.