Claude Code One-Day Intensive Expanded Student Workbook
A hands-on follow-along workbook with repo-backed examples, file locations, commands, exercises, and desk-reference patterns for the full five-hour Claude Code intensive.
How to Use This Workbook
This textbook version turns the one-day agenda and instructor presentation guide into a readable course companion. It is designed for engineers who want more than slide bullets: they need a practical operating model, concrete artifact templates, and enough explanation to apply the ideas after the live session ends.
The live course assumes mandatory setup pre-work. Participants should arrive with Claude Code working, the repository cloned, demo commands verified, editor and terminal ready, and baseline tool-permission posture understood. Live time should not be spent repairing local setup; it should be spent practicing routing, planning, investigation, review, and workflow design.
Course Map
Modules
- Start Here: Student Follow-Along Guide
- How to Use This Workbook
- Module 1: Operating Surface and Routing
- Module 2: Planning and Context Management
- Module 3: Intent Recovery and Dynamic Evidence
- Module 4: Review, Test, and Verify
- Module 5: Workflow Design and Minimal Improvement Loop
- Desk Reference and Repo Links
- Concrete Examples, File Locations, and Repo Links
The recurring critique pattern
Every module uses the same critique pattern because the course is artifact-first. Do not ask only whether an artifact looks polished. Ask whether it can serve as a downstream input.
What would you improve about this artifact? What would make it better as a downstream input? Could another operator use this without reopening the whole problem?
This pattern is intentionally simple. It works for routing boards, execution briefs, investigation memos, feedback bundles, review packets, and workflow specs. The goal is to train participants to judge artifacts by operational usefulness rather than by surface completeness.
Start Here: Student Follow-Along Guide
This workbook is the student version of the one-day intensive. It removes instructor-only delivery notes and turns the course into a practical follow-along reference. Keep the GitHub repository open while you read, because the examples, anti-patterns, pre-work, and commands are part of the exercise flow.
Pre-work checklist: docs/ONE-DAY-PREWORK-CHECKLIST.md
Setup lab: labs/00-no-speed-limits-setup-local-stack-bootstrap/LAB.md
Tool permissions examples: docs/TOOL-PERMISSIONS-EXAMPLES.md
Demo artifacts index: slides/demo-artifacts/
How to use each module
- Read the framing and vocabulary before the live block starts.
- Open the linked example and anti-pattern artifacts from the repo.
- Build the named artifact for that block.
- Use the critique prompt: “What would make this better as a downstream input?”
- Keep your final artifacts short enough for another operator to use without reopening the whole conversation.
Module 1 · Block 1 · 70 minutes
Operating Surface and Routing
Claude Code becomes useful at team scale when engineers stop treating it as a single chat box and start treating it as an operating surface: a collection of tools, commands, skills, agents, subagents, plugins, execution modes, and model choices. This module teaches the routing judgment that keeps work small, explicit, and repeatable.
Hands-on: build the routing board
- Open the Day 1 routing board example and anti-pattern.
- Add three work items from your team.
- For each, choose primitive, mode, model, tools, permissions, output artifact, and stop condition.
- Revise one row after peer critique.
Why routing is the first skill
Most failed AI-assisted engineering sessions do not fail because the model is incapable. They fail because the work is routed to the wrong primitive. A developer asks for a broad implementation when the real need is investigation. A team writes a permanent rule into a one-off prompt. A repeated checklist stays trapped in someone’s memory instead of becoming an executable skill. A heavyweight model is used for cheap file discovery, while a complex design decision is handed to a fast model with insufficient reasoning depth.
The first day of a Claude Code operating model therefore begins with vocabulary. Not vocabulary for its own sake, but vocabulary that gives the team a shared way to decide where work should live. Once the team can say “this is a command,” “this is a skill,” “this belongs in memory,” “this should be a subagent,” or “this needs a headless run,” the tool stops being mysterious and starts becoming an engineering system.
The operating surface
Think of Claude Code as a workbench with several surfaces. The interactive session is where a human and model negotiate a task. Tools are the model’s hands: reading files, running commands, searching repositories, editing code, and interacting with configured integrations. Commands are named entry points that standardize common activities. Skills package reusable procedures. Agents and subagents isolate specialized work. Plugins and marketplace content extend what is available across projects or organizations. Headless execution turns the same operating model into automation.
The practical question is not “which feature is coolest?” The practical question is: where should this work be represented so that another engineer can reuse it without rediscovering it?
| Primitive | Best use | Team-scale signal | Common mistake |
|---|---|---|---|
| Prompt | One-off instruction, exploration, or clarification | Useful when the work is new, ambiguous, or conversational | Using prompts repeatedly for stable procedures |
| Command | A named local action or workflow entry point | Useful when the same operation should start the same way every time | Packing too much reasoning policy into a command name |
| Skill | A reusable procedure, checklist, transformation, or review pattern | Useful when a workflow should be invoked on demand and updated centrally | Putting always-needed facts in a skill instead of memory or project instructions |
| Agent | A specialized role with its own instructions and tool boundaries | Useful when a class of work needs a consistent expert lens | Creating broad agents with vague responsibilities |
| Subagent | Isolated investigation or parallel work unit | Useful when exploration would pollute the main context window | Letting broad file reads accumulate in the main conversation |
| Plugin / marketplace package | Reusable capability distributed beyond one repo | Useful when teams need a shared extension point | Packaging before the workflow has stabilized |
| Headless run | Non-interactive execution in CI, scripts, or automation | Useful when the work has clear inputs, outputs, and stop conditions | Automating work that still requires human judgment |
Interactive versus headless work
Interactive mode is for discovery, negotiation, and judgment. A human can interrupt, correct assumptions, ask for alternatives, and decide whether the model’s next step is safe. Headless mode is for work that has already been bounded. It needs clear inputs, allowed tools, expected outputs, and stop conditions. If the task still requires a human to decide what the task is, it is not ready for headless execution.
A good rule is this: interactive sessions produce artifacts; headless sessions consume artifacts. During a live session you might create an execution brief, investigation memo, or verification packet. Once those artifacts are stable, a headless run can implement a bounded change, run a review, or generate a report from known inputs.
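The readiness rule above can be written as a small check. This is a minimal sketch, not part of Claude Code. The `TaskSpec` shape and `headlessBlockers` function are hypothetical names, and the tool names `Read` and `Grep` are assumed examples. The check only confirms that the fields a headless run depends on are filled in. Any remaining open question counts as a blocker because it means a human still has to decide what the task is.

```typescript
// Hypothetical task shape; field names are illustrative, not a Claude Code schema.
interface TaskSpec {
  goal: string;
  inputs: string[]; // artifacts the run consumes, e.g. a brief or a diff
  allowedTools: string[]; // tools the run may use without asking
  expectedOutputs: string[]; // artifacts the run must produce
  stopCondition: string; // when the run is done or must halt
  openQuestions: string[]; // decisions that still need a human
}

// Returns every reason the task is not ready for a non-interactive run.
// An empty array means the task has been bounded well enough to automate.
function headlessBlockers(task: TaskSpec): string[] {
  const blockers: string[] = [];
  if (task.goal.trim() === "") blockers.push("goal is empty");
  if (task.inputs.length === 0) blockers.push("no input artifacts");
  if (task.allowedTools.length === 0) blockers.push("no allowed tools declared");
  if (task.expectedOutputs.length === 0) blockers.push("no expected outputs");
  if (task.stopCondition.trim() === "") blockers.push("no stop condition");
  // An open question means a human still has to decide what the task is.
  for (const question of task.openQuestions) blockers.push(`open question: ${question}`);
  return blockers;
}

const reviewRun: TaskSpec = {
  goal: "Review the refresh-token diff against the execution brief",
  inputs: ["execution brief", "diff"],
  allowedTools: ["Read", "Grep"], // assumed tool names for illustration
  expectedOutputs: ["findings-first review"],
  stopCondition: "every finding reported with severity and evidence",
  openQuestions: ["Should reuse detection emit an audit event?"],
};

console.log(headlessBlockers(reviewRun)); // still interactive work
console.log(headlessBlockers({ ...reviewRun, openQuestions: [] })); // [] once the question is answered
```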
Goal mode and debate mode
Two conversational modes matter for the one-day course. Goal mode is useful when the team wants the model to drive toward a defined outcome. Debate mode is useful when the team wants the model to challenge a plan before code is written. Goal mode helps with forward motion; debate mode helps with error prevention. Both are most useful when paired with artifacts.
For example, a developer might ask Claude to draft a compact execution brief in goal mode. Once the brief exists, the developer can switch into a debate posture: “Challenge this brief. Identify hidden assumptions, underspecified acceptance criteria, and likely regression risks.” The output of the debate should not be a wandering conversation. It should be a better brief.
Model selection for the job
Model routing is part of work routing. Cheap, fast models are appropriate for bounded lookup, file discovery, and summarizing known material. Balanced models are appropriate for routine implementation and review. The strongest reasoning models are appropriate for architecture, tricky debugging, multi-file refactors, and decisions where a bad plan is more expensive than slower planning.
| Job | Preferred routing | Why |
|---|---|---|
| Find relevant files | Explore subagent on a fast model | Broad search stays out of the main context window |
| Design a refactor | Strong reasoning model for planning, then balanced model for execution | The plan is the expensive part to get wrong |
| Apply a known checklist | Skill or command | The procedure should be stable and repeatable |
| Review a security-sensitive change | Specialized review agent with high effort | The lens and depth matter more than speed |
| Generate a one-time explanation | Interactive prompt | The work is conversational and may not need persistence |
| Run a recurring report | Headless workflow over a stable spec | The inputs, outputs, and schedule are known |
Building the primitive and routing board
The routing board is the first durable artifact of the day. It lists common team tasks and routes each one to the smallest reliable primitive. The board should be compact enough to live in a repo, onboarding guide, or team operating doc. It should not be aspirational. It should describe the actual next version of the team’s workflow.
Task: Review a PR for auth regressions
Primitive: security-reviewer agent + /code-review command
Mode: interactive for local development; headless only after the rule set is stable
Model: balanced model for normal review, stronger reasoning for high-risk auth changes
Inputs: diff, execution brief, REVIEW.md, relevant auth rules
Output: findings-first review with severity, evidence, and recommended fix
Stop condition: no important findings or explicit residual risk accepted by human
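A routing board can also live in the repo as typed data, so a row cannot silently drop a field. The sketch below uses a hypothetical `RoutingRow` shape that mirrors the example row. The primitive names come from the table earlier in this module; the validation rules are one illustrative choice, not a prescribed format. The second row is an intentionally incomplete hypothetical entry.

```typescript
type Primitive = "prompt" | "command" | "skill" | "agent" | "subagent" | "plugin" | "headless";
type Mode = "interactive" | "headless";

// Hypothetical row shape mirroring the fields of the example routing entry.
interface RoutingRow {
  task: string;
  primitives: Primitive[];
  mode: Mode;
  model: string;
  inputs: string[];
  output: string;
  stopCondition: string;
}

// Flags the gaps that would stop another engineer from reusing the row.
function rowProblems(row: RoutingRow): string[] {
  const problems: string[] = [];
  if (row.primitives.length === 0) problems.push("no primitive chosen");
  if (row.output.trim() === "") problems.push("no output artifact");
  if (row.stopCondition.trim() === "") problems.push("no stop condition");
  // Headless rows must consume artifacts rather than invent them.
  if (row.mode === "headless" && row.inputs.length === 0) {
    problems.push("headless row has no input artifacts");
  }
  return problems;
}

const board: RoutingRow[] = [
  {
    task: "Review a PR for auth regressions",
    primitives: ["agent", "command"],
    mode: "interactive",
    model: "balanced; stronger reasoning for high-risk auth changes",
    inputs: ["diff", "execution brief", "REVIEW.md", "relevant auth rules"],
    output: "findings-first review with severity, evidence, and recommended fix",
    stopCondition: "no important findings, or residual risk accepted by a human",
  },
  {
    // Hypothetical, deliberately incomplete row.
    task: "Nightly dependency report",
    primitives: ["headless"],
    mode: "headless",
    model: "fast",
    inputs: [],
    output: "report",
    stopCondition: "",
  },
];

for (const row of board) {
  const problems = rowProblems(row);
  console.log(`${row.task}: ${problems.length === 0 ? "OK" : problems.join("; ")}`);
}
```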
What good looks like
A good routing board has three qualities. First, it is specific. “Use Claude for coding” is not useful; “use an explorer subagent to identify files before reading them into the main session” is useful. Second, it is bounded. Each row says what the primitive should and should not do. Third, it is teachable. A new engineer should be able to read the board and make the same routing decision as a senior engineer most of the time.
Anti-patterns
- The mega-prompt: A long prompt that mixes stable policy, one-time instructions, codebase facts, and a workflow checklist. Split it into memory, project instructions, skill, and task prompt.
- The everything-agent: A custom agent named “senior engineer” that can do anything. Specialized agents should have a narrow lens, clear tools, and a predictable output shape.
- Premature plugin packaging: A workflow is packaged before the team has run it enough times to know its inputs, failure modes, and stop conditions.
- Headless ambiguity: A non-interactive run is launched with vague goals and no acceptance criteria. Headless work should consume a brief, not invent one.
Module recap
The foundation of effective Claude Code use is not prompt cleverness. It is routing discipline. Teams that classify work correctly can preserve knowledge, reduce repeated prompting, avoid context-window waste, and turn successful sessions into reusable operating assets.
Module 2 · Block 2 · 55 minutes
Planning and Context Management
Planning is not a ceremonial step before coding. In Claude Code, planning is the act of shaping context so the next operator—human, model, agent, or headless workflow—can act without reopening the entire problem.
Hands-on: write a compact execution brief
- Choose a task.
- Separate facts, assumptions, open questions, constraints, non-goals, and acceptance criteria.
- Add a context map with file pointers and evidence commands.
- Write a guided `/compact focusing on...` prompt.
The plan is a compression artifact
Claude Code sessions can accumulate enormous context: pasted requirements, file contents, terminal output, attempted fixes, test failures, screenshots, and corrections from the human. Without deliberate compression, the session becomes expensive and fragile. The model is forced to infer what still matters from a long transcript. Humans are forced to remember why earlier decisions were made. Downstream operators inherit noise instead of a plan.
A useful plan is not a transcript. It is a lossily compressed representation of the work. It preserves the facts, decisions, constraints, risks, and next actions that matter. It discards the conversational path that produced them. That is why the one-day course treats planning as context management.
Separate facts, assumptions, and open questions
The simplest improvement to most AI coding sessions is to stop blending known facts with guesses. Models are very good at continuing a confident narrative. If the prompt says “the auth middleware probably owns refresh-token invalidation,” the model may proceed as if that is true. A disciplined brief separates evidence from inference.
| Category | Definition | Example | How to handle it |
|---|---|---|---|
| Fact | A statement backed by direct evidence | `src/auth/middleware.ts` validates JWTs before route handlers run | Can be used directly in the plan |
| Assumption | A plausible statement not yet proven | Refresh token invalidation is probably handled in the session store | Must be tested or called out as risk |
| Open question | A decision or unknown that blocks confident execution | Should expired refresh tokens be deleted or retained for audit? | Resolve before implementation or explicitly defer |
| Constraint | A boundary the solution must respect | Do not change public API response shape | Use as acceptance criteria and review rule |
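The “How to handle it” column can be turned into a pre-implementation gate. This is a minimal sketch built on a hypothetical `BriefItem` type. Facts pass only if they cite evidence. Assumptions pass once they are verified or explicitly accepted as risk. Open questions pass once they are resolved or deferred. Constraints never block; they carry forward as acceptance criteria.

```typescript
// Hypothetical item shape for the four categories in the table.
type BriefItem =
  | { kind: "fact"; text: string; evidence: string }
  | { kind: "constraint"; text: string }
  | { kind: "assumption"; text: string; status: "unverified" | "verified" | "accepted-risk" }
  | { kind: "question"; text: string; status: "open" | "resolved" | "deferred" };

// True when the item still prevents confident implementation.
function blocksImplementation(item: BriefItem): boolean {
  // A "fact" with no evidence behind it is really an assumption.
  if (item.kind === "fact") return item.evidence.trim() === "";
  if (item.kind === "assumption") return item.status === "unverified";
  if (item.kind === "question") return item.status === "open";
  return false; // constraints never block; they become acceptance criteria
}

const items: BriefItem[] = [
  {
    kind: "fact",
    text: "src/auth/middleware.ts validates JWTs before route handlers run",
    evidence: "read src/auth/middleware.ts",
  },
  {
    kind: "assumption",
    text: "Refresh token invalidation is probably handled in the session store",
    status: "unverified",
  },
  {
    kind: "question",
    text: "Should expired refresh tokens be deleted or retained for audit?",
    status: "deferred",
  },
  { kind: "constraint", text: "Do not change public API response shape" },
];

const blocking = items.filter(blocksImplementation);
console.log(blocking.map((item) => `${item.kind}: ${item.text}`));
```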
The compact execution brief
The execution brief is the central planning artifact. It should fit on one or two pages, but it should be complete enough for another operator to execute. The brief is not just a summary. It is an instruction-bearing artifact with a clear contract: here is the problem, here is the known context, here are the boundaries, here is the proposed path, and here is how we will know whether the work is done.
# Compact Execution Brief
## Goal
Implement refresh token rotation without changing the existing login response contract.
## Known facts
- JWT validation happens in src/auth/middleware.ts.
- Session persistence is implemented in src/auth/session-store.ts.
- Existing tests cover login success and expired access tokens.
## Assumptions to verify
- Old refresh tokens are not currently invalidated after rotation.
- Token reuse should be treated as suspicious but not immediately lock the account.
## Open questions
- Should reuse detection emit an audit event?
- Is token family tracking already present in the database schema?
## Context map
- Auth middleware: request validation and session lookup
- Session store: token persistence and expiration
- Test suite: integration tests under tests/auth/
## Work packages
1. Verify current token rotation behavior.
2. Add invalidation logic or token-family tracking.
3. Extend integration tests for old-token reuse.
4. Produce review and verification packet.
## Acceptance criteria
- New refresh token is issued on rotation.
- Previous refresh token cannot be reused.
- Existing login response shape is unchanged.
- Tests demonstrate success, expiration, and reuse behavior.
Context maps
A context map tells the model where to look and why. It is not a full dump of file contents. It is a pointer layer: directories, files, functions, commands, external systems, and documents that are likely relevant. Good context maps reduce token use because Claude can read the right files in the right order instead of scanning the repository blindly.
A context map should include both primary and secondary context. Primary context is required to make the change. Secondary context helps review risk, verify behavior, or understand why the system is shaped the way it is.
## Context map
Primary:
- src/auth/middleware.ts — request authentication boundary
- src/auth/session-store.ts — refresh token persistence
- db/schema.sql — session and token tables
- tests/auth/refresh-token.test.ts — integration behavior
Secondary:
- docs/security/auth-model.md — intended auth posture
- .claude/rules/api-security.md — project-specific security rules
- recent PRs touching auth middleware — intent and regression context
Debate review before coding
Before code is written, ask Claude to attack the plan. The goal is not to win the debate; the goal is to improve the artifact. A debate review should look for ambiguous goals, missing constraints, unsupported assumptions, hidden coupling, risky files, weak acceptance criteria, and likely regression paths.
Review this execution brief as a skeptical senior engineer. Identify unsupported assumptions, missing context, and acceptance criteria that would fail to catch a regression. Do not implement. Return a revised brief outline and a list of questions that must be answered before coding.
The output of debate review should be folded back into the brief. If the debate produces useful insights that remain trapped in conversation history, the next operator still cannot use them. Artifact-first planning means the artifact is the durable memory.
What makes a plan reusable downstream?
A reusable plan has clear boundaries. It names the exact goal, the non-goals, the files likely involved, the evidence already gathered, the assumptions still open, and the stop condition. It also includes enough review criteria to prevent the model from declaring success too early.
- Goal: What outcome should exist after the work is complete?
- Non-goals: What tempting adjacent work should not be done?
- Facts: What has been directly observed?
- Assumptions: What might be true but still needs verification?
- Context map: Where should the next operator look first?
- Work packages: What are the smallest implementation units?
- Acceptance criteria: What evidence will prove the work is complete?
- Review focus: What risks should review emphasize?
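Because the brief is plain markdown, a missing section is cheap to detect before handoff. This is a minimal sketch that assumes the brief uses `## ` headings named after the eight elements above; the `missingSections` helper is hypothetical. Run against the example brief earlier in this module, it reports Non-goals and Review focus as missing. That is exactly the kind of gap a debate review should catch.

```typescript
const REQUIRED_SECTIONS = [
  "Goal",
  "Non-goals",
  "Facts",
  "Assumptions",
  "Context map",
  "Work packages",
  "Acceptance criteria",
  "Review focus",
];

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns required sections that have no matching "## " heading.
// A heading matches when the section name starts a word in it, so
// "## Known facts" satisfies "Facts" but "## Non-goals" does not satisfy "Goal".
function missingSections(brief: string, required: string[] = REQUIRED_SECTIONS): string[] {
  const headings = brief
    .split("\n")
    .filter((line) => line.startsWith("## "))
    .map((line) => line.slice(3).trim());
  return required.filter((name) => {
    const pattern = new RegExp(`(^|\\s)${escapeRegExp(name)}`, "i");
    return !headings.some((heading) => pattern.test(heading));
  });
}

// Headings from the example Compact Execution Brief.
const exampleBrief = [
  "# Compact Execution Brief",
  "## Goal",
  "## Known facts",
  "## Assumptions to verify",
  "## Open questions",
  "## Context map",
  "## Work packages",
  "## Acceptance criteria",
].join("\n");

console.log(missingSections(exampleBrief)); // [ 'Non-goals', 'Review focus' ]
```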
Module recap
Planning in Claude Code is not about slowing down. It is about preserving momentum by producing a compact artifact that can survive compaction, handoff, review, and automation. The better the brief, the less the model has to infer and the easier it is for humans to hold the work accountable.
Module 3 · Block 3 · 60 minutes
Intent Recovery and Dynamic Evidence
When a codebase is old enough, the current code is rarely the whole story. This module teaches participants to recover intent from Git history, issues, PRs, logs, tests, command traces, screenshots, and other dynamic sources before asking Claude to change behavior.
Hands-on: produce an investigation memo and feedback bundle
- Use at least two evidence types.
- Label evidence, inference, and unknowns.
- Rewrite weak feedback into expected vs actual, command output, file pointer, and next ask.
Find the why before the what
Claude can usually explain what code does from the current files. The harder question is why it does that. Was a strange branch added for a customer-specific edge case? Did a test encode a production incident? Was a confusing abstraction introduced to support a migration that has since finished? Current code often hides the reason for its own shape.
Intent recovery is the discipline of gathering enough historical and runtime evidence to avoid undoing deliberate behavior. It is especially important when the requested change appears simple. Simple changes are dangerous when they cut across hidden intent.
Static code is not enough
Reading the current file gives one kind of evidence. Git history gives another. Tests reveal expected behavior. Issues and PRs reveal tradeoffs. Logs reveal runtime reality. Screenshots reveal UI states that code alone may not make obvious. CLI traces reveal exact failure modes. Documentation and MCP-backed systems can provide external context that is not stored in the repository.
| Evidence source | Question it answers | Risk if omitted |
|---|---|---|
| Current code | What does the system do now? | The model may miss hidden coupling outside the local file |
| Git blame and commits | Why was this line introduced or changed? | The model may remove a deliberate workaround |
| PR discussion | What tradeoffs were accepted? | The model may re-litigate settled decisions |
| Issues / tickets | What user or incident motivated the behavior? | The model may solve the wrong problem |
| Tests | What behavior is currently protected? | The model may pass local reasoning but break expected behavior |
| Logs and traces | What happens in real executions? | The model may optimize for imagined behavior |
| Screenshots | What does the user actually see? | The model may miss visual or state-machine issues |
| Docs and runbooks | What standards should govern the change? | The model may violate team conventions |
Evidence versus inference
A strong investigation memo labels evidence. It does not say “the bug is caused by stale cache” unless there is direct evidence. It says “the failure appears after the cache read path; logs show cache hit with outdated value; no write-through event appears in the trace; inference: stale cache is likely.” This distinction matters because the next model turn will use the memo as context. If guesses are written like facts, the model will build on them.
# Investigation Memo
## Question
Why does the account summary show stale plan data after billing upgrade?
## Evidence gathered
- UI reproduces after upgrade from Basic to Pro without page refresh.
- Network trace shows GET /account returns plan=Basic immediately after upgrade.
- Server logs show billing webhook processed successfully.
- Git history shows account summary cache added in PR #184 to reduce billing API calls.
## Inferences
- The account summary endpoint likely reads a cached account projection.
- The billing webhook may update billing state without invalidating account summary cache.
## Open questions
- Where is account summary cache invalidated?
- Is delayed consistency acceptable for this UI?
## Recommended next step
Trace account summary cache writes and invalidation paths before changing UI code.
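One way to keep guesses from being written as facts is to store memo claims with an explicit label. This is a minimal sketch with hypothetical types. Each inference must cite at least one evidence item by id, and the renderer groups claims under the memo headings above. The “stale CDN caching” claim is an invented, unsupported inference included only to show what the check rejects.

```typescript
// Hypothetical claim shapes: every statement is labeled evidence, inference, or unknown.
type Claim =
  | { kind: "evidence"; id: string; text: string; source: string }
  | { kind: "inference"; text: string; basedOn: string[] }
  | { kind: "unknown"; text: string };

// Returns inferences that cite no evidence, or cite evidence ids the memo lacks.
function unsupportedInferences(claims: Claim[]): string[] {
  const ids = new Set<string>();
  for (const c of claims) if (c.kind === "evidence") ids.add(c.id);
  const unsupported: string[] = [];
  for (const c of claims) {
    if (c.kind !== "inference") continue;
    if (c.basedOn.length === 0 || c.basedOn.some((id) => !ids.has(id))) unsupported.push(c.text);
  }
  return unsupported;
}

// Renders the labeled claims under the same headings as the memo template.
function renderMemo(question: string, claims: Claim[]): string {
  const lines = ["## Question", question, "## Evidence gathered"];
  for (const c of claims) if (c.kind === "evidence") lines.push(`- ${c.text} (source: ${c.source})`);
  lines.push("## Inferences");
  for (const c of claims) if (c.kind === "inference") lines.push(`- ${c.text} [based on ${c.basedOn.join(", ")}]`);
  lines.push("## Open questions");
  for (const c of claims) if (c.kind === "unknown") lines.push(`- ${c.text}`);
  return lines.join("\n");
}

const claims: Claim[] = [
  { kind: "evidence", id: "E1", text: "GET /account returns plan=Basic immediately after upgrade", source: "network trace" },
  { kind: "evidence", id: "E2", text: "Account summary cache added in PR #184", source: "git history" },
  { kind: "inference", text: "Billing webhook may not invalidate the account summary cache", basedOn: ["E1", "E2"] },
  { kind: "inference", text: "The bug is caused by stale CDN caching", basedOn: [] }, // hypothetical guess
  { kind: "unknown", text: "Where is account summary cache invalidated?" },
];

console.log(unsupportedInferences(claims));
console.log(renderMemo("Why does the account summary show stale plan data after billing upgrade?", claims));
```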
Dynamic evidence sources
Dynamic evidence is information that changes with execution: logs, test runs, database state, browser behavior, CLI output, screenshots, observability traces, and external tool responses. It is powerful because it grounds the model in reality. It is also noisy. A good operator does not paste raw dynamic output indiscriminately. They capture the relevant excerpt, label how it was produced, and explain why it matters.
When collecting dynamic evidence, include command provenance. The model should know not only the output, but the command, environment, timestamp or branch, and whether the result was reproducible.
Command: pnpm test tests/auth/refresh-token.test.ts --runInBand
Branch: refresh-token-rotation
Result: failed, 1 test
Relevant output:
expected old refresh token reuse to return 401
received 200
Interpretation:
Existing implementation issues a new refresh token but does not invalidate the previous token.
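Provenance is easiest to keep when it is captured at the moment the command runs. The sketch below uses Node's built-in `child_process.spawnSync`. The `captureEvidence` helper, its fields, and the choice to rerun the command as a reproducibility check are all assumptions for illustration. It records the command, branch, timestamp, and exit status, plus only the output lines that match a filter rather than the full raw output.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical record shape matching the provenance block above.
interface EvidenceRecord {
  command: string;
  branch: string;
  timestamp: string;
  exitStatus: number | null;
  reproducible: boolean; // same exit status on every run
  excerpt: string[]; // only the output lines that matter
}

function currentBranch(): string {
  const r = spawnSync("git", ["rev-parse", "--abbrev-ref", "HEAD"], { encoding: "utf8" });
  return r.status === 0 ? r.stdout.trim() : "unknown";
}

// Runs the command `runs` times and keeps a filtered excerpt of the first run.
// Pass a regex without the "g" flag so repeated .test() calls stay stateless.
function captureEvidence(cmd: string, args: string[], relevant: RegExp, runs = 2): EvidenceRecord {
  if (runs < 1) throw new RangeError("runs must be at least 1");
  const results = Array.from({ length: runs }, () => spawnSync(cmd, args, { encoding: "utf8" }));
  const first = results[0];
  const output = `${first.stdout ?? ""}\n${first.stderr ?? ""}`;
  return {
    command: [cmd, ...args].join(" "),
    branch: currentBranch(),
    timestamp: new Date().toISOString(),
    exitStatus: first.status,
    reproducible: results.every((r) => r.status === first.status),
    excerpt: output.split("\n").filter((line) => relevant.test(line)),
  };
}

// Self-contained demo: a Node one-liner stands in for the real test command.
const record = captureEvidence(
  process.execPath,
  ["-e", "console.log('expected 401'); console.log('received 200'); console.log('noise')"],
  /expected|received/,
);
console.log(record);
```

In the refresh-token example, the call would look something like `captureEvidence("pnpm", ["test", "tests/auth/refresh-token.test.ts", "--runInBand"], /expected|received/)`. Rerunning a slow suite costs time, so `runs` can be set to 1 when reproducibility is already established.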
Why “that didn’t work” fails as feedback
The phrase “that didn’t work” is almost useless to the model. It omits what was attempted, what was expected, what actually happened, what evidence was observed, and what changed between attempts. Good feedback is a bundle: action, expectation, observation, evidence, hypothesis, and next constraint.
| Weak feedback | Better feedback |
|---|---|
| That did not work. | After applying the patch, `pnpm test tests/auth/refresh-token.test.ts` still fails. Expected old token reuse to return 401; actual response is 200. Relevant log shows session lookup succeeds for the old token. Focus next on invalidation in session-store, not middleware. |
| The UI is broken. | Clicking Save leaves the modal open. Browser console shows no error. Network tab shows PATCH /settings returns 204. The likely issue is local modal state not closing after success. |
| Try again. | Revise the approach without changing the public API response shape. Preserve existing success tests and add one regression test for duplicate submission. |
The feedback bundle pattern
A feedback bundle is a structured correction that improves the next model turn. It should be short, but it must contain enough evidence for the model to update its plan. Use it whenever Claude’s first implementation fails, when a test result contradicts an assumption, or when a human reviewer spots a gap.
# Feedback Bundle
## Attempted change
Added token rotation logic in src/auth/middleware.ts.
## Expected result
Old refresh token reuse should return 401.
## Actual result
Old refresh token reuse returns 200.
## Evidence
Test: pnpm test tests/auth/refresh-token.test.ts
Failure: expected 401, received 200
Trace: old token still resolves in session-store lookup.
## Updated hypothesis
Middleware is not the right layer for invalidation. The session store accepts both token records.
## Next instruction
Inspect session-store token persistence and invalidation. Do not change response shape.
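Because a bundle has fixed parts, it can be assembled from typed fields so that an empty “that didn't work” cannot slip through. This is a minimal sketch with a hypothetical `FeedbackBundle` type whose headings match the template above. The rejection rule, which requires an expected result, an actual result, and at least one evidence line, is one reasonable choice rather than a standard.

```typescript
// Hypothetical shape mirroring the Feedback Bundle template.
interface FeedbackBundle {
  attempted: string;
  expected: string;
  actual: string;
  evidence: string[];
  hypothesis: string;
  nextInstruction: string;
}

// Renders the bundle as markdown, refusing bundles that lack the core comparison.
function renderFeedback(b: FeedbackBundle): string {
  const missing: string[] = [];
  if (b.expected.trim() === "") missing.push("expected result");
  if (b.actual.trim() === "") missing.push("actual result");
  if (b.evidence.length === 0) missing.push("evidence");
  if (missing.length > 0) throw new Error(`Feedback bundle is missing: ${missing.join(", ")}`);
  return [
    "# Feedback Bundle",
    "## Attempted change", b.attempted,
    "## Expected result", b.expected,
    "## Actual result", b.actual,
    "## Evidence", ...b.evidence,
    "## Updated hypothesis", b.hypothesis,
    "## Next instruction", b.nextInstruction,
  ].join("\n");
}

const bundle: FeedbackBundle = {
  attempted: "Added token rotation logic in src/auth/middleware.ts.",
  expected: "Old refresh token reuse should return 401.",
  actual: "Old refresh token reuse returns 200.",
  evidence: [
    "Test: pnpm test tests/auth/refresh-token.test.ts",
    "Failure: expected 401, received 200",
    "Trace: old token still resolves in session-store lookup.",
  ],
  hypothesis: "Middleware is not the right layer for invalidation. The session store accepts both token records.",
  nextInstruction: "Inspect session-store token persistence and invalidation. Do not change response shape.",
};
console.log(renderFeedback(bundle));

// Weak feedback is rejected instead of being sent to the model.
try {
  renderFeedback({ attempted: "", expected: "", actual: "that didn't work", evidence: [], hypothesis: "", nextInstruction: "Try again." });
} catch (err) {
  console.log((err as Error).message); // Feedback bundle is missing: expected result, evidence
}
```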
Module recap
Intent recovery prevents well-intentioned regressions. Dynamic evidence prevents hallucinated debugging. Feedback bundles convert failure into useful context. Together, they turn Claude Code from a code generator into an evidence-driven collaborator.
Module 4 · Block 4 · 60 minutes
Review, Test, and Verify
The work is not done when code changes. It is done when the diff has been reviewed against the brief, the verification evidence is explicit, and the residual risk is clear enough for a human to accept or reject.
Hands-on: build a review and verification packet
- Review a diff against the brief.
- Name scope drift and regression risk.
- List verification evidence and what was not verified.
- End with residual risk and PR handoff.
Review the diff against the brief
A Claude-assisted review should not ask “does this code look good?” That question is too broad and too subjective. The stronger question is: “does this diff satisfy the execution brief without violating constraints or introducing unacceptable risk?” The brief becomes the review contract.
Findings-first review means the reviewer leads with issues, not narrative. Each finding should state severity, evidence, affected file or behavior, why it matters, and a recommended fix. If there are no blocking findings, the review should still describe what was checked and what residual risk remains.
# Review Finding
Severity: Important
Area: Refresh token invalidation
Evidence: tests/auth/refresh-token.test.ts covers successful rotation but not reuse of the old token.
Why it matters: The acceptance criteria require old refresh tokens to be rejected.
Recommended fix: Add a regression test that attempts reuse of the previous token after rotation and expects 401.
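Findings-first output is easy to enforce if findings are collected as data and sorted by severity before rendering. This is a minimal sketch. The `Critical` and `Minor` levels are assumptions, since only `Important` appears above, and the Minor naming finding is hypothetical. The renderer always ends with what was checked and what residual risk remains, so a clean review still reports coverage.

```typescript
// Severity levels other than "Important" are assumed for illustration.
type Severity = "Critical" | "Important" | "Minor";
const SEVERITY_RANK: Record<Severity, number> = { Critical: 0, Important: 1, Minor: 2 };

interface Finding {
  severity: Severity;
  area: string;
  evidence: string;
  whyItMatters: string;
  recommendedFix: string;
}

// Renders findings most severe first, then states coverage and residual risk.
function renderReview(findings: Finding[], checked: string[], residualRisk: string): string {
  // Copy before sorting so the caller's array keeps its order.
  const ordered = [...findings].sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
  const body =
    ordered.length === 0
      ? ["No blocking findings."]
      : ordered.map((f) =>
          [
            `Severity: ${f.severity}`,
            `Area: ${f.area}`,
            `Evidence: ${f.evidence}`,
            `Why it matters: ${f.whyItMatters}`,
            `Recommended fix: ${f.recommendedFix}`,
          ].join("\n"),
        );
  return [...body, `Checked: ${checked.join("; ")}`, `Residual risk: ${residualRisk}`].join("\n\n");
}

const review = renderReview(
  [
    {
      severity: "Minor", // hypothetical finding
      area: "Naming",
      evidence: "session-store.ts uses two names for the same token value",
      whyItMatters: "Readers may assume two different tokens",
      recommendedFix: "Use one name consistently",
    },
    {
      severity: "Important",
      area: "Refresh token invalidation",
      evidence: "tests/auth/refresh-token.test.ts covers successful rotation but not reuse of the old token.",
      whyItMatters: "The acceptance criteria require old refresh tokens to be rejected.",
      recommendedFix: "Add a regression test that reuses the previous token after rotation and expects 401.",
    },
  ],
  ["diff against brief", "acceptance criteria", "response shape"],
  "Token-store concurrency is not exercised by tests.",
);
console.log(review);
```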
Scope drift
Scope drift is any change that is not required by the brief. Some drift is harmless cleanup. Some drift is dangerous because it changes behavior the team did not intend to change. Claude can drift when it sees adjacent improvements, especially if the prompt rewards broad helpfulness. Review must therefore compare the diff against explicit goals and non-goals.
| Drift type | Example | Review response |
|---|---|---|
| Benign cleanup | Renaming a local variable for clarity | Accept if low risk and local |
| Adjacent refactor | Changing session-store interfaces while adding one token behavior | Challenge unless required by the brief |
| Behavior expansion | Adding account lockout on token reuse when not requested | Reject or move to follow-up |
| Contract change | Changing login response shape while implementing rotation | Block |
| Test-only expansion | Adding regression tests for directly related edge cases | Usually accept |
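A first-pass drift check can compare the files a diff touches against the files the brief's context map names. This is a minimal sketch under stated assumptions. The changed-file list might come from a command such as `git diff --name-only main...HEAD`, where the branch name is an assumption. The brief paths are taken from the example context map, and `src/api/login-response.ts` is a hypothetical out-of-scope file. The check only flags candidates; a human still classifies each one using the table above.

```typescript
// Splits the files a diff touches into those the brief anticipated and possible drift.
// A brief path ending in "/" covers every file beneath that directory.
function partitionByScope(
  changedFiles: string[],
  briefPaths: string[],
): { inScope: string[]; possibleDrift: string[] } {
  const anticipated = (file: string): boolean =>
    briefPaths.some((p) => (p.endsWith("/") ? file.startsWith(p) : file === p));
  return {
    inScope: changedFiles.filter(anticipated),
    possibleDrift: changedFiles.filter((file) => !anticipated(file)),
  };
}

// Paths from the example context map; the auth tests are widened to their directory.
const briefPaths = ["src/auth/middleware.ts", "src/auth/session-store.ts", "db/schema.sql", "tests/auth/"];
const changed = [
  "src/auth/session-store.ts",
  "tests/auth/refresh-token.test.ts",
  "src/api/login-response.ts", // hypothetical file outside the brief
];

const scope = partitionByScope(changed, briefPaths);
console.log(scope.possibleDrift); // [ 'src/api/login-response.ts' ]
```

A file can be in scope and still carry drift, for example a response-shape change inside `session-store.ts`. The check narrows where to look; it does not replace reading the diff.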
Testing is one gate, not the only gate
Passing tests are necessary evidence, but they are not proof of correctness. Tests only cover what they assert. A verification packet should include test results, manual checks when relevant, static review, diff review, command outputs, and residual risk. The point is not to create paperwork. The point is to prevent the phrase “tests pass” from hiding an unreviewed assumption.
For LLM-assisted work, verification should also include provenance: what files changed, what commands were run, what evidence was observed, and what the model did not check. This gives the human reviewer a clear map of confidence and uncertainty.
The verification packet
# Verification Packet
## Brief alignment
Goal: Refresh token rotation rejects old token reuse.
Status: Implemented and tested.
## Changed files
- src/auth/session-store.ts — invalidates previous refresh token on rotation
- tests/auth/refresh-token.test.ts — adds old-token reuse regression test
## Evidence
- pnpm test tests/auth/refresh-token.test.ts — passed
- pnpm test tests/auth/login.test.ts — passed
- Manual API check: old refresh token returns 401 after rotation
## Scope review
No public response contract changes observed.
No unrelated auth routes modified.
## Residual risk
Database cleanup of invalidated token records is not addressed. Existing retention behavior remains unchanged.
## PR handoff note
Reviewer should focus on token-store concurrency and whether invalidated token retention meets audit expectations.
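The packet's honesty depends on tying each acceptance criterion to the evidence that covers it. This is a minimal sketch built on a hypothetical `CriterionCheck` shape. Any criterion without evidence is reported as unverified and carried into residual risk instead of being hidden behind “tests pass.” For illustration, the last criterion from the example brief is left without evidence.

```typescript
// Hypothetical mapping from an acceptance criterion to the checks that cover it.
interface CriterionCheck {
  criterion: string;
  evidence: string[]; // commands run or manual checks that cover this criterion
}

function summarizeVerification(checks: CriterionCheck[]): { status: string; unverified: string[] } {
  const unverified = checks.filter((c) => c.evidence.length === 0).map((c) => c.criterion);
  return {
    status: unverified.length === 0 ? "Implemented and verified" : "Implemented; partially verified",
    unverified,
  };
}

const checks: CriterionCheck[] = [
  { criterion: "New refresh token is issued on rotation", evidence: ["pnpm test tests/auth/refresh-token.test.ts"] },
  {
    criterion: "Previous refresh token cannot be reused",
    evidence: ["pnpm test tests/auth/refresh-token.test.ts", "Manual API check: old refresh token returns 401 after rotation"],
  },
  { criterion: "Existing login response shape is unchanged", evidence: ["pnpm test tests/auth/login.test.ts"] },
  { criterion: "Tests demonstrate success, expiration, and reuse behavior", evidence: [] }, // left open for illustration
];

const summary = summarizeVerification(checks);
console.log(`Status: ${summary.status}`);
for (const criterion of summary.unverified) console.log(`Not verified: ${criterion}`);
```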
Regression risk
Regression risk is not just the probability that something breaks. It compounds three factors: likelihood (how probable a break is), blast radius (how much breaks), and detectability (how quickly anyone would notice). Risk grows as failures become harder to detect, so a silent failure mode outweighs a noisy one. A small likelihood with a huge blast radius still deserves attention. A likely bug with easy rollback may be acceptable if the release path is safe. Claude can help enumerate risks, but the team must decide what risk is acceptable.
| Risk question | Why it matters |
|---|---|
| What user-visible behavior changed? | Identifies blast radius |
| What existing tests protect this path? | Identifies current safety net |
| What did we not test? | Prevents false confidence |
| What external systems depend on this behavior? | Finds hidden contracts |
| How would we detect failure in production? | Separates known risk from invisible risk |
| How would we roll back? | Determines operational readiness |
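The likelihood, blast radius, and detectability factors at the start of this section can be made concrete with a score borrowed from failure mode and effects analysis (FMEA). That borrowing is an assumption, not a team standard. Each factor is rated from 1 to 5, and detectability is scored inversely: a failure that would go unnoticed scores 5 and one that alerts immediately scores 1. That keeps “higher means riskier” true for all three factors, so their product ranks risks sensibly.

```typescript
// FMEA-style 1-5 scales (an assumption for illustration).
interface RiskFactors {
  likelihood: number; // 1 = very unlikely to break, 5 = very likely
  blastRadius: number; // 1 = one internal path, 5 = every user or an external contract
  detectionDifficulty: number; // 1 = caught immediately, 5 = invisible in production
}

function riskScore(r: RiskFactors): number {
  const factors: Array<[string, number]> = [
    ["likelihood", r.likelihood],
    ["blastRadius", r.blastRadius],
    ["detectionDifficulty", r.detectionDifficulty],
  ];
  for (const [name, value] of factors) {
    if (!Number.isInteger(value) || value < 1 || value > 5) {
      throw new RangeError(`${name} must be an integer from 1 to 5, got ${value}`);
    }
  }
  return r.likelihood * r.blastRadius * r.detectionDifficulty;
}

// Rare, but wide and silent: still deserves attention.
const silentContractBreak = riskScore({ likelihood: 1, blastRadius: 5, detectionDifficulty: 5 }); // 25
// Likely, but local and caught at once.
const noisyLocalBug = riskScore({ likelihood: 4, blastRadius: 2, detectionDifficulty: 1 }); // 8
console.log(silentContractBreak, noisyLocalBug);
```

The score only orders risks for discussion. Rollback ease is not captured by these three numbers, and the team still decides what is acceptable.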
What makes a handoff PR-ready?
A PR-ready handoff gives the reviewer the shortest path to an informed decision. It should contain the problem statement, brief link or summary, changed files, review focus, verification evidence, known non-goals, and residual risk. The reviewer should not need to reconstruct the story from chat history.
Problem → Approach → Changed files → Verification → Review focus → Residual risk. If any of those pieces is missing, the PR is not fully handoff-ready.
Module recap
Review and verification discipline turns model output into engineering evidence. The goal is not to make Claude “sound confident.” The goal is to make the work auditable: what changed, why it changed, how it was checked, and what remains uncertain.
Module 5 · Block 5 · 45 minutes
Workflow Design and Minimal Improvement Loop
The one-day session closes by turning isolated practices into a named workflow. Participants define handoffs, gates, stop conditions, and one lightweight scorecard so the next run can improve without expanding into a full operating-system redesign.
Hands-on: design one workflow
- Name the workflow trigger and roles.
- Define artifact handoffs, gates, and stop conditions.
- Choose which parts are commands, skills, agents, or human review.
- Add one next-run improvement.
From good sessions to repeatable workflows
A strong Claude Code session can still be a dead end if the team cannot repeat it. Workflow design captures the sequence that made the session successful: how the work was framed, what artifacts were produced, which agents or skills were used, where human review occurred, and what evidence counted as done. The workflow does not need to be elaborate. It needs to be named, bounded, and reusable.
The one-day course intentionally keeps this module compact. The goal is not to design a full engineering operating system. The goal is to leave with one workflow that can be tried next week and improved after one run.
A compact multi-agent workflow
Multi-agent workflow design should begin with work boundaries, not agent names. Each agent or subagent should own a distinct lens or phase. If two agents need the same broad context and produce overlapping output, the workflow is probably not decomposed well.
# Compact Workflow Spec
Name: Evidence-first bug fix
Trigger:
A bug report has enough detail to reproduce or investigate.
Artifacts:
1. Investigation memo
2. Compact execution brief
3. Implementation diff
4. Review and verification packet
Roles:
- Explorer subagent: identify relevant files, history, and evidence sources
- Planner: compress evidence into execution brief
- Implementer: make bounded code changes from the brief
- Reviewer: compare diff against brief and produce findings
Gates:
- Do not implement until facts, assumptions, and open questions are separated.
- Do not review until acceptance criteria are explicit.
- Do not hand off until verification evidence and residual risk are written.
Stop condition:
The change is PR-ready or blocked by a named open question.
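The same spec can be held as data, so gates are checked in order rather than remembered. This is a minimal sketch: the `Phase` structure, the artifact keys, and the gate predicates are hypothetical stand-ins for whatever your team actually inspects. The Planner gate ("a memo exists") is an added assumption, because the spec does not state one. The runner stops at the first unmet gate and names it, which is a simplified version of the spec's "PR-ready or blocked" stop condition.

```python
from dataclasses import dataclass
from typing import Callable

# Artifacts produced so far, keyed by a hypothetical artifact name.
Artifacts = dict[str, object]


@dataclass(frozen=True)
class Phase:
    name: str
    requirement: str                   # human-readable gate, as written in the spec
    gate: Callable[[Artifacts], bool]  # must return True before the phase may start


def memo_is_separated(artifacts: Artifacts) -> bool:
    memo = artifacts.get("memo")
    return isinstance(memo, dict) and {"facts", "assumptions", "open_questions"} <= set(memo)


# Gates translated from the "Evidence-first bug fix" spec.
PHASES = [
    Phase("Explorer", "bug report has enough detail to investigate",
          lambda a: bool(a.get("bug_report"))),
    Phase("Planner", "investigation memo exists",
          lambda a: bool(a.get("memo"))),
    Phase("Implementer", "facts, assumptions, and open questions are separated",
          memo_is_separated),
    Phase("Reviewer", "acceptance criteria are explicit",
          lambda a: bool(a.get("acceptance_criteria"))),
    Phase("Handoff", "verification evidence and residual risk are written",
          lambda a: bool(a.get("verification")) and bool(a.get("residual_risk"))),
]


def run_gates(artifacts: Artifacts) -> str:
    """Walk the phases in order and stop at the first unmet gate."""
    for phase in PHASES:
        if not phase.gate(artifacts):
            return f"blocked before {phase.name} (gate: {phase.requirement})"
    return "PR-ready"
```

Because each gate is a named predicate, a blocked run reports which requirement failed instead of silently stalling.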
Handoffs
A handoff is where one operator’s output becomes another operator’s input. In Claude Code workflows, handoffs should be artifact-based. The explorer hands off an investigation memo. The planner hands off an execution brief. The implementer hands off a diff plus notes. The reviewer hands off findings and verification evidence. If a handoff requires the next operator to read the entire chat transcript, the handoff failed.
| Handoff | Input | Output | Quality bar |
|---|---|---|---|
| Investigation → Planning | Evidence, traces, history, open questions | Execution brief | Facts and assumptions are separated |
| Planning → Implementation | Execution brief and context map | Bounded diff | Non-goals and acceptance criteria are respected |
| Implementation → Review | Diff and brief | Findings or approval with residual risk | Review is evidence-based |
| Review → Handoff | Findings, fixes, verification commands | PR-ready packet | Reviewer can decide without chat history |
Gates and stop conditions
Gates prevent premature motion. Stop conditions prevent infinite motion. A gate says what must be true before the workflow can advance. A stop condition says when the workflow is complete, blocked, or unsafe to continue. Claude workflows need both because models tend to continue helping unless told what “done” means.
Good gates are observable. “Make sure the plan is good” is not a gate. “The brief includes goal, non-goals, facts, assumptions, open questions, context map, work packages, and acceptance criteria” is a gate. Good stop conditions are explicit. “Continue until fixed” is vague. “Stop when the verification packet shows the acceptance criteria pass, or when an open question blocks safe implementation” is actionable.
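The actionable stop condition above maps naturally onto a function with three outcomes. This is a minimal sketch in which the `Outcome` names and the inputs are hypothetical: a mapping of acceptance criteria to pass or fail results, plus a list of blocking open questions. The `attempts_left` retry budget is an added assumption. It keeps "continue" from becoming infinite motion.

```python
from enum import Enum


class Outcome(Enum):
    DONE = "done"
    BLOCKED = "blocked"
    CONTINUE = "continue"


def stop_condition(criteria: dict[str, bool],
                   blocking_questions: list[str],
                   attempts_left: int) -> tuple[Outcome, str]:
    """Decide whether a run is done, blocked, or may take another pass."""
    # An open question that blocks safe implementation halts the run first.
    if blocking_questions:
        return Outcome.BLOCKED, "open question: " + blocking_questions[0]
    # Without explicit acceptance criteria, "done" is undefined, so stop.
    if not criteria:
        return Outcome.BLOCKED, "no acceptance criteria defined"
    failing = [name for name, passed in criteria.items() if not passed]
    if not failing:
        return Outcome.DONE, "all acceptance criteria pass"
    # A retry budget keeps "continue until fixed" from running forever.
    if attempts_left <= 0:
        return Outcome.BLOCKED, "retry budget exhausted; failing: " + ", ".join(failing)
    return Outcome.CONTINUE, "failing: " + ", ".join(failing)
```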
One lightweight scorecard
The improvement loop should be small enough that the team actually uses it. Pick one scorecard that can be completed in five minutes after a workflow run. The scorecard should measure the workflow, not the model’s personality. It should ask whether artifacts were reusable, whether evidence was sufficient, whether context was controlled, whether review caught issues, and what should change next run.
# Next-run Improvement Scorecard
Workflow name:
Date:
Task:
1. Was the execution brief usable without reopening the original conversation? 0 / 1 / 2
2. Did the investigation memo separate evidence from inference? 0 / 1 / 2
3. Did the implementation stay inside the stated scope? 0 / 1 / 2
4. Did verification include more than passing tests? 0 / 1 / 2
5. Was residual risk explicit? 0 / 1 / 2
One thing to keep:
One thing to change next run:
One artifact or rule to update:
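If the team records scorecards in a file or tracker, a few lines of code can total them and point at the weakest item. That item is a natural candidate for the single next-run change. This is a minimal sketch: the short item keys are hypothetical abbreviations of the five questions above, and "pick the lowest score" is one heuristic, not a rule from the course.

```python
# Hypothetical short keys for the five scorecard questions, in order.
ITEMS = [
    "brief_reusable",
    "evidence_vs_inference",
    "stayed_in_scope",
    "verification_beyond_tests",
    "residual_risk_explicit",
]


def summarize(scores: dict[str, int]) -> tuple[int, str]:
    """Validate 0/1/2 scores and return (total out of 10, weakest item)."""
    for item in ITEMS:
        if scores.get(item) not in (0, 1, 2):
            raise ValueError(f"{item} must be scored 0, 1, or 2")
    total = sum(scores[item] for item in ITEMS)
    # min() returns the first lowest item, so ties go to the earlier question.
    weakest = min(ITEMS, key=lambda item: scores[item])
    return total, weakest


if __name__ == "__main__":
    run = {"brief_reusable": 2, "evidence_vs_inference": 1, "stayed_in_scope": 2,
           "verification_beyond_tests": 0, "residual_risk_explicit": 1}
    print(summarize(run))  # (6, 'verification_beyond_tests')
```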
What to improve on the next run
Do not try to improve everything after the first run. Choose one improvement. Maybe the context map was too vague. Maybe the review agent needs a narrower rubric. Maybe the execution brief omitted non-goals. Maybe verification evidence was too thin. The scorecard turns that observation into a small change: update a template, add a rule, refine a skill, or adjust a gate.
Closing synthesis
The five modules form one loop. Route the work correctly. Compress the plan. Recover intent with evidence. Review against the brief. Preserve the workflow that worked. That loop is small enough to teach in a day and strong enough to become the foundation for team-scale Claude Code adoption.
Appendix: Student Desk Reference and Repo Links
Always-on project facts go in CLAUDE.md; repeatable procedures become skills or workflows; broad exploration goes to subagents; PR handoffs require evidence.
Core commands and settings
| Area | Use this | When it matters |
|---|---|---|
| Setup and health | /doctor, claude --safe-mode, CLAUDE_CODE_SAFE_MODE=1 | Validate install health or troubleshoot by disabling customizations. |
| Memory and context | CLAUDE.md, @docs/file.md, .claude/rules/*.md, /memory, /compact focusing on ..., /clear | Make important context durable, modular, scoped, inspectable, and cheap to carry forward. |
| Model routing | /model, /model opusplan, /effort low\|medium\|high\|xhigh\|max, /fast, fallbackModel, --fallback-model | Use deeper reasoning where mistakes are expensive; use faster/cheaper paths for bounded work. |
| Delegation | /agents, claude agents, project .claude/agents/, background subagents | Keep broad exploration isolated and return concise findings to the main session. |
| Review and cleanup | /code-review high, /code-review --fix, /simplify, REVIEW.md | Review against the brief, make risk explicit, and clean up before handoff. |
| Governance | availableModels, enforceAvailableModels, requiredMinimumVersion, Tool(param:value) permission rules, disableBundledSkills | Keep teams on approved models, versions, tools, and extension surfaces. |
Context pointers
- Search before reading: use grep/file search to find paths and line numbers first.
- Read narrowly: use offsets and limits instead of loading entire files.
- Label evidence: command run, source, timestamp, relevant excerpt, and why it matters.
- Separate facts, assumptions, open questions, and inferences.
- Compact between tasks and tell compaction what to preserve.
Primary repo links
Concrete Examples, File Locations, and Repo Links
Tools and permissions
Tools are the model action surface: reading, searching, editing, running commands, fetching context, and calling integrations. The course examples emphasize matching tool access to task risk rather than allowing everything by default.
Course backing doc
docs/TOOL-PERMISSIONS-EXAMPLES.md
Safe exploration, controlled implementation, shared repo guardrails, and workflow-specific permission posture.
Where this lives
settings.json, managed settings, permission dialogs, MCP/plugin policies, and task-specific approval choices.
Newer Claude Code versions also support Tool(param:value) permission matching.
Permission prompt pattern:
What must Claude read?
What may Claude write?
What requires approval?
What would be dangerous if Claude guessed?
What evidence is required before widening permissions?
Try it
- Choose one live task.
- Write allowed reads, allowed writes, and approval-required actions.
- Compare your posture to the safe exploration and controlled implementation examples.
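The answers to the permission prompt pattern can be turned into a settings fragment. This is a minimal sketch that assumes the `allow`, `ask`, and `deny` lists and the `Tool(specifier)` rule strings used in Claude Code's settings.json. Check the exact rule syntax for your installed version. The task, the paths, and the mapping of "dangerous if guessed" to `deny` are hypothetical choices for illustration.

```python
import json

# Hypothetical posture for a "controlled implementation" task. Rule strings
# follow the Tool(specifier) shape; verify syntax against your version's docs.
POSTURE = {
    "must_read": ["Read(./src/**)", "Read(./tests/**)"],
    "may_write": ["Edit(./src/billing/**)"],
    "requires_approval": ["Bash(git push:*)", "Edit(./migrations/**)"],
    "dangerous_if_guessed": ["Read(./.env)", "Read(./secrets/**)"],
}


def to_settings(posture: dict[str, list[str]]) -> dict:
    """Map the posture onto allow / ask / deny permission lists."""
    allow = posture["must_read"] + posture["may_write"]
    deny = posture["dangerous_if_guessed"]
    # A rule that is both allowed and denied is ambiguous; fail loudly instead.
    overlap = set(allow) & set(deny)
    if overlap:
        raise ValueError(f"rules both allowed and denied: {sorted(overlap)}")
    return {"permissions": {"allow": allow, "ask": posture["requires_approval"], "deny": deny}}


if __name__ == "__main__":
    print(json.dumps(to_settings(POSTURE), indent=2))
```

Keeping the posture as named questions, rather than editing the settings file directly, makes it easy to compare against the safe exploration and controlled implementation examples.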
Commands
Commands are for short, prompt-shaped, directly invoked operations. Use them when the behavior repeats but does not need a full method, bundled assets, or a specialist role.
Course backing doc
docs/PLUGINS-SKILLS-COMMANDS-AND-MODELS.md
Includes command examples such as /review-pr-risk, /summarize-failing-test, and /draft-pr-body.
Where this lives
Common project pattern: .claude/commands/<name>.md.
Built-ins appear in the slash menu, such as /model, /agents, /mcp, /permissions, and /compact.
# .claude/commands/summarize-failing-test.md
Summarize the failing test evidence in this shape:
1. Command run
2. First failing assertion or error line
3. Relevant file and line pointer
4. Likely failure category
5. Next narrow read or command
Do not propose a fix until the failure category is grounded in evidence.
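Step 2 of that command is mechanical enough to script, which keeps the summary grounded in the actual output. This is a minimal sketch that assumes pytest-style output, where error detail lines start with `E ` and summary lines start with `FAILED`. The `first_failure` helper and its patterns are hypothetical and will need adjusting for other test runners.

```python
import re
from typing import Optional

# Hypothetical patterns for pytest-style output; adjust for your runner.
FAILURE_PATTERNS = [
    re.compile(r"^E\s+(.*)$"),        # assertion or error detail lines
    re.compile(r"^FAILED\s+(.*)$"),   # short test summary lines
    re.compile(r"^(\w+Error: .*)$"),  # last line of a bare traceback
]


def first_failure(output: str) -> Optional[str]:
    """Return the first line that looks like a failing assertion or error."""
    for line in output.splitlines():
        for pattern in FAILURE_PATTERNS:
            match = pattern.match(line)
            if match:
                return match.group(1).strip()
    return None
```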
Skills
Skills are for repeatable methods with structure: required inputs, context gathering, workflow, output artifact, verification checklist, and safety rules.
Skill starter template
The template defines the minimum sections students should fill in for a first-pass skill.
Where this lives
Common project pattern: .claude/skills/<skill-name>/SKILL.md.
Use skills for procedures and reusable artifact production, not always-on project facts.
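# .claude/skills/flaky-test-investigation/SKILL.md
---
name: flaky-test-investigation
description: Investigate a flaky test using evidence before proposing a fix.
---
# Skill: flaky-test-investigation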
## Purpose
Investigate a flaky test using evidence before proposing a fix.
## Required inputs
- failing command
- test name or file
- relevant CI/local output
## Workflow
1. Capture exact command and failure excerpt
2. Classify the failure mode
3. Identify dynamic evidence needed
4. Produce an investigation memo
5. Propose the next narrow action
## Outputs
A short investigation memo with evidence, inference, and next step.
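A quick structural check catches missing frontmatter or sections before a skill is shared. This is a minimal sketch. It assumes SKILL.md begins with a `---`-delimited frontmatter block containing `name` and `description`, and that the starter template's sections are `##` headings. The `check_skill` helper is hypothetical and does a naive line-based frontmatter parse, not full YAML.

```python
# Sections from the skill starter template; frontmatter fields assumed required.
SKILL_SECTIONS = ["Purpose", "Required inputs", "Workflow", "Outputs"]
REQUIRED_FIELDS = ["name", "description"]


def check_skill(text: str) -> list[str]:
    """Return a list of structural problems found in a SKILL.md document."""
    problems = []
    lines = text.splitlines()
    fields = {}
    if lines and lines[0].strip() == "---":
        try:
            end = lines.index("---", 1)  # naive: closing line must be exactly ---
        except ValueError:
            return ["frontmatter is not closed with ---"]
        for line in lines[1:end]:
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        body = lines[end + 1:]
    else:
        problems.append("missing frontmatter")
        body = lines
    for field in REQUIRED_FIELDS:
        if not fields.get(field):
            problems.append(f"missing frontmatter field: {field}")
    headings = {line[3:].strip() for line in body if line.startswith("## ")}
    for section in SKILL_SECTIONS:
        if section not in headings:
            problems.append(f"missing section: {section}")
    return problems
```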
Agents and subagents
Agents and subagents are bounded workers with a mission, explicit tools, output shape, and stop condition. Use them when you need separate context or specialist review.
Course backing doc
docs/PLUGINS-SKILLS-COMMANDS-AND-MODELS.md
Defines agent and subagent usage, including context isolation and bounded missions.
Where this lives
Common project pattern: .claude/agents/<agent-name>.md.
Use /agents or claude agents to manage sessions where supported.
# .claude/agents/security-reviewer.md
---
name: security-reviewer
description: Review code changes for security vulnerabilities. Use proactively.
tools: Read, Grep, Glob, Bash
model: sonnet
maxTurns: 10
---
You are a security specialist. For every code change:
1. Check for injection vulnerabilities
2. Verify input validation at system boundaries
3. Check for exposed secrets or API keys
4. Verify authentication and authorization checks
Report findings by severity. Do not edit files unless explicitly asked.
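Because the reviewer's instructions say not to edit files, check that its `tools` line matches that intent. This is a minimal sketch with a naive frontmatter parse. The `EDITING_TOOLS` set is an assumption about which tool names modify files. So is the treatment of a missing `tools` line as "inherits every tool". `Bash` can also write files, so a read-only intent combined with `Bash` still depends on the prompt and permission rules, not on the tool list alone.

```python
from typing import Optional

# Assumption: built-in tools that modify files directly.
EDITING_TOOLS = {"Edit", "Write", "MultiEdit", "NotebookEdit"}


def agent_tools(text: str) -> Optional[set[str]]:
    """Return the agent's tools, or None if the frontmatter has no tools line."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("agent file has no frontmatter")
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, _, value = line.partition(":")
        if key.strip() == "tools":
            return {tool.strip() for tool in value.split(",") if tool.strip()}
    return None


def posture_warnings(text: str) -> list[str]:
    """Flag tool access that conflicts with a read-only reviewer role."""
    tools = agent_tools(text)
    if tools is None:
        # Assumed behavior: an agent without a tools line inherits all tools.
        return ["no tools line: agent may inherit every tool, including editors"]
    warnings = [f"can edit files via {tool}" for tool in sorted(tools & EDITING_TOOLS)]
    if "Bash" in tools:
        warnings.append("Bash can modify files; read-only intent relies on the prompt or permission rules")
    return warnings
```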
Plugins and marketplace evaluation
Plugins are a distribution abstraction. A plugin can package multiple reusable units such as commands, subagents, MCP servers, hooks, skills, or workflow assets. Evaluate a plugin as you would a new dependency, by the surface area it adds, not as a shortcut.
Course backing doc
Marketplace evaluation checklist
Use the checklist before installing or promoting a package.
Where this appears
Use /plugin flows and /plugin list where available.
Prefer local commands or skills when the behavior is still small or unstable.
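Treating a plugin as dependency surface area starts with knowing what it contains. This is a minimal sketch that inventories a locally checked-out plugin directory before installation. The directory and file names (`commands/`, `agents/`, `skills/<name>/SKILL.md`, `hooks/`, `.mcp.json`) are assumptions based on a common plugin layout, so confirm them against the plugin documentation for your version. Hooks and MCP server configuration are listed separately because they can run code or reach external systems, not just add prompts.

```python
from pathlib import Path

# Assumed layout; adjust names to match the plugin format you are evaluating.
PROMPT_COMPONENTS = {"commands": "*.md", "agents": "*.md", "skills": "*/SKILL.md"}
EXECUTABLE_COMPONENTS = ["hooks", ".mcp.json"]


def surface_area(plugin_dir: str) -> dict[str, object]:
    """Count prompt-level components and list code-executing ones."""
    root = Path(plugin_dir)
    counts = {
        name: len(list((root / name).glob(pattern))) if (root / name).is_dir() else 0
        for name, pattern in PROMPT_COMPONENTS.items()
    }
    executable = [name for name in EXECUTABLE_COMPONENTS if (root / name).exists()]
    return {"counts": counts, "executes_code_or_connects": executable}


if __name__ == "__main__":
    import sys
    print(surface_area(sys.argv[1] if len(sys.argv) > 1 else "."))
```

A non-empty `executes_code_or_connects` list is a signal to read those files before installing, even when the prompt-level components look small.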
Demo artifacts and anti-patterns
| Module | Good example | Anti-pattern |
|---|---|---|
| Operating surface | Routing board | Routing board anti-pattern |
| Investigation | Investigation memo | Investigation memo anti-pattern |
| Feedback | Feedback bundle | Feedback bundle anti-pattern |
| Review | Review packet | Review packet anti-pattern |
| Workflow | Operating scorecard | Scorecard anti-pattern |