Claude Security: Operator's Field Manual

§ 01

TL;DR

Claude Security is an agentic SAST layer that complements your existing scanners rather than replacing them. It reads code the way a security researcher does, traces data flow across files, self-verifies, and produces patch-ready findings inside Claude Code on the Web. It is good at logic and multi-file vulnerabilities; it does not cover runtime, infrastructure, container, or dependency CVEs. Treat it as one of three complementary layers next to AWS Security Agent (DAST / pentest) and Wiz (cloud-to-code posture, IaC, SCA).

Where it wins: business-logic flaws, IDOR, auth-bypass chains, deserialisation, multi-file taint paths, anything pattern matchers miss.
Where it does not play: dependency CVE feeds, IaC misconfig, container CVE, cloud posture, runtime threats, secret rotation. None of those are in scope.
Cost shape: consumption-billed at standard API rates on top of seat fees. There is no fixed price per scan. Plan a budget cap before opening it up to a team.
Determinism: stochastic by design. Two scans of the same SHA produce overlapping but not identical findings. Build process around that, not against it.

§ 02

What it actually is

Claude Security is a defensive code-scanning capability built into claude.ai, accessed at claude.ai/security. It launched as Claude Code Security in a limited preview in February 2026, then was renamed and reissued as a public beta on 30 April 2026 for all Max, Team, and Enterprise tenants. The underlying model is Opus 4.7, not the more capable but restricted Mythos. Confidence: High.

The agent scans a GitHub repository (full repo, scoped directory, or specific branch), reasons over the source, runs an adversarial verification pass on its own findings to suppress false positives, then produces structured output: title, location, impact, reproduction steps, recommended fix, severity, status, category. From any finding you can launch a Claude Code on the Web session pre-loaded with the context to draft and review the patch.

Mental model. Read it as: "an Opus 4.7 agent with self-critique, scoped to one GitHub repo, scheduled or on-demand, exporting CSV / Markdown / webhooks." If you treat it as a Snyk replacement you will be disappointed. If you treat it as an SCA replacement you will ship a CVE.

Finding categories Anthropic calls out

Per the official help centre: injection (SQLi, command, code, XSS, XXE, ReDoS), path and network (traversal, SSRF, open redirect), auth and access (authn bypass, privesc, IDOR/BOLA, CSRF, race), memory safety (overflow, UAF, unsafe), cryptography (timing, alg confusion, weak primitives), deserialisation (arbitrary type instantiation), protocol and encoding (cache safety, encoding confusion, length-prefix trust). Severity is assigned per finding based on exploitability in your codebase, not category, so the same class can land High in one repo and Low in another. Confidence: High.

§ 03

Use cases

PRIMARY

Logic and multi-file vulnerabilities

The strongest pitch. Cross-file taint flow, business-logic IDOR, missing auth checks across handler chains, race windows. The places Semgrep and Snyk Code routinely miss.

PRIMARY

Pre-release deep review of a critical service

Schedule an Extended-effort scan against the service handling payments / PII before a release boundary. Treat the output as input to a human review, not a release gate.

PRIMARY

Targeted directory deep dive

Scope to auth/, billing/, session/ in a monorepo. Anthropic's own guidance is that scoping increases scan reliability and signal density.

SECONDARY

Continuous weekly hygiene scan

Weekly cadence tied to a triage ritual (Monday review, sprint boundary). Findings export to Jira via webhook; dismissals carry documented reasons so they build an audit trail.

SECONDARY

M&A or codebase due diligence

Useful for "what is the security shape of this codebase" before integration. Pair with Wiz on the cloud side. Note Anthropic's licensing rule: only scan code you own or hold rights to scan.

SECONDARY

Augmenting a thin AppSec team

If you have 1-2 AppSec engineers across 30+ repos, this is force multiplication. It is not a replacement for human review on critical changes; it is a way to widen coverage across the long tail.

§ 04

Non-use cases

Just as important. If you reach for Claude Security for any of these, you are using the wrong tool.

DO NOT

Dependency & container CVE scanning

Not its job. Stay on Snyk Open Source, Trivy, or Wiz SCA. Claude Security has no Trivy database, no NVD feed, no transitive dependency graph.

DO NOT

IaC misconfiguration

Terraform, Helm, k8s manifests, CFN. The agent will read code but is not built for cloud-config posture. Use Wiz IaC, Checkov, or AWS Security Agent's design review.

DO NOT

Runtime, container, or cloud posture

Use Wiz Defend, Falco, GuardDuty, or your CSPM. Claude Security only sees source.

DO NOT

Secret scanning

It is not a Gitleaks or TruffleHog. Keep your existing pre-commit and push-protection in place. The agent may flag obvious hard-coded creds incidentally, but do not rely on it.

DO NOT

Production penetration testing

It does not run anything. No active exploitation, no traffic, no auth fuzzing against live systems. AWS Security Agent or a human pentest covers this.

DO NOT

Compliance gating in CI/CD

Stochastic output and variable scan length make it a poor blocking gate. Run it as a sidecar that posts to Jira, not a check that fails the merge.

DO NOT

Third-party / OSS code you do not own

Anthropic's terms restrict use to code your company owns or holds rights to scan. Scanning OSS upstream "to find a 0-day" is a policy violation.

DO NOT

Non-GitHub repos

GitLab, Bitbucket, self-hosted Gitea, Azure DevOps Repos: not supported in beta. Anthropic has flagged GitHub-only as a current constraint.

§ 05

Prerequisites

Anthropic's own getting-started guide enumerates these, so this section reflects the official requirements rather than my interpretation. Confidence: High.

Eligible plan. Claude Enterprise, Team, or Max. (Enterprise was first, Team and Max were added in the public beta release.)
Claude Code on the Web enabled. The remediation flow opens a Claude Code session, so this has to be on for the org. Check at claude.ai/code.
Extra Usage enabled. Claude Security is consumption-billed. If Extra Usage is off you cannot run scans.
Anthropic GitHub App installed. Same app as Claude Code on the Web. Granted access to the repos you want to scan, at the GitHub org level.
Premium seats for scan operators. Standard seats do not include Claude Code on the Web. Each engineer who runs scans needs a premium seat.
Network allowlisting (optional). If your GitHub Enterprise has IP allowlisting, add Anthropic's published egress ranges.

§ 06

Step-by-step setup

The set-up itself is short. The hard work is policy: who can scan what, what gets exported where, how dismissals are governed.

Confirm Extra Usage and set a spend cap. Go to Organization Billing settings. Enable Extra Usage if it is off. Set a separate spend limit specifically for Claude Security once the feature toggle exposes it. Treat the first month as a budget calibration period, not a steady state.
Verify the Anthropic GitHub App is installed. In your GitHub org settings, check Installed GitHub Apps. The app should be present and granted access to the relevant repositories. Scope by repository selection, not by "all repos", unless the org is small enough that the blast radius is acceptable.
Enable Claude Security in the admin console. Visit claude.ai/admin-settings/claude-code. Toggle Claude Security on. Once enabled, the Security entry appears in the left sidebar of claude.ai for users with premium seats.
Provision RBAC roles. Anthropic exposes custom roles via Claude Enterprise RBAC. Create at least two: a Security Operator who can run scans and triage findings, and a Security Reviewer who can read findings and dismissals but not run scans. Default-deny scan permission to engineering generally. You do not want every engineer launching repo-wide Opus 4.7 sessions on a whim, both for cost and noise.
Pick the first three repositories. Do not start with the monolith. Pick: one critical service (small, high-stakes, you already understand it), one repo recently audited by humans (so you can sanity-check signal vs noise), and one greenfield service (so you see what zero-history scanning produces).
Wire integrations before findings start landing. Set up the Slack and Jira webhooks per project before the first scheduled scan. Findings without an inbox accumulate as a backlog people stop opening.
Document dismissal policy. Decide before, not after, what counts as a valid dismissal reason. Suggested taxonomy: not-exploitable-in-context, compensating-control, accepted-risk, false-positive, duplicate. Reject a generic "not applicable" with no reason.
Run a calibration scan. Pick the small critical service. Run a Regular-effort scan first, then an Extended-effort scan. Compare findings. The delta is your real signal about how much depth Extended buys you on this codebase.

§ 07

Running scans

Open Claude Security. Use Sidebar > Security, or go to claude.ai/security.
Pick the repo. For anything larger than a single service, pick a directory or branch. Anthropic explicitly recommends scoping for larger repositories.
Effort: Regular vs Extended. Regular is the default. Extended runs deeper analysis at materially higher token cost. Use Extended on the first scan of a repo or after material changes (rewrite, framework upgrade, new auth).
Start the scan. Time varies based on repo size and what the agent decides to investigate. Minutes to hours, not seconds.
Run multiple in parallel. Useful for triaging several services at once, or comparing a hardened branch against main without serialising.

Determinism warning. Scans are stochastic by design. Anthropic states this explicitly: the agent adapts its analysis per run rather than applying fixed pattern matches. Two scans on the same commit will overlap but not match. This is the right trade for catching logic bugs but the wrong trade for a CI gate. Plan accordingly.
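One way to plan for this is to measure how much two runs overlap instead of expecting them to match. The sketch below is a minimal illustration, assuming each finding has been loaded from an export as a dict with `category` and `path` keys; those field names and the coarse fingerprint are assumptions, not the documented export schema.

```python
from typing import Iterable


def fingerprint(finding: dict) -> tuple[str, str]:
    """Coarse identity for a finding: category plus file path (assumed field names)."""
    return (finding["category"].strip().lower(), finding["path"].strip())


def compare_runs(run_a: Iterable[dict], run_b: Iterable[dict]) -> dict:
    """Summarise the overlap between two scans of the same commit."""
    a = {fingerprint(f) for f in run_a}
    b = {fingerprint(f) for f in run_b}
    union = a | b
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Jaccard similarity: 1.0 means identical sets, 0.0 means disjoint
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }


# Two hypothetical runs against the same SHA
run_1 = [
    {"category": "IDOR", "path": "orders/api.py"},
    {"category": "SSRF", "path": "media/fetch.py"},
]
run_2 = [
    {"category": "idor", "path": "orders/api.py"},
    {"category": "CSRF", "path": "account/views.py"},
]
result = compare_runs(run_1, run_2)
print(result["only_a"], result["only_b"], round(result["jaccard"], 2))
```

Findings in `only_a` or `only_b` are candidates for a look at the code path, not evidence of a false positive.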

§ 08

Sample configurations

Claude Security is configured in-product, not via YAML. There is no .claude-security.yml. What you can configure: schedule cadence, scan effort, scope, webhook endpoints, and dismissal taxonomy. The samples below show what the integrations on the receiving end should look like.

Webhook payload → Slack triage channel (Block Kit)

Claude Security pushes scan-completion and new-finding events. Forward through a small Lambda or n8n flow that maps severity to channel, formats with Block Kit, and posts a review button linking back to claude.ai/security.

# n8n / Lambda handler shape (illustrative)
on: webhook.claude-security.finding.created
filter:
  severity: [HIGH, MEDIUM]
  repository: hellofresh/*
route:
  HIGH:   "#sec-findings-high"
  MEDIUM: "#sec-findings-triage"
payload:
  blocks:
    - type: header
      text: "[{{severity}}] {{title}}"
    - type: section
      fields:
        - "*Repo:*  {{repository}}"
        - "*File:*  {{location.path}}:{{location.line}}"
        - "*Category:*  {{category}}"
    - type: actions
      elements:
        - type: button
          text: "Open in Claude Security"
          url:  "https://claude.ai/security/findings/{{id}}"
        - type: button
          text: "Triage in Jira"
          url:  "https://hellofresh.atlassian.net/secure/CreateIssue.jspa?summary={{title}}"
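If the forwarding step is a Lambda rather than an n8n flow, the mapping above could look roughly like the following Python sketch. The incoming event shape (`severity`, `title`, `repository`, `location`, `category`, `id`) mirrors the illustrative template and is an assumption, not a documented webhook schema; the findings URL pattern is likewise taken from the template. The Block Kit structure (header, section with fields, actions with a button) follows Slack's block format.

```python
from typing import Optional
from urllib.parse import quote

# Routing table copied from the illustrative config above.
CHANNEL_BY_SEVERITY = {
    "HIGH": "#sec-findings-high",
    "MEDIUM": "#sec-findings-triage",
}


def build_slack_message(finding: dict) -> Optional[dict]:
    """Turn a finding event (assumed shape) into a Slack message body, or None to skip."""
    channel = CHANNEL_BY_SEVERITY.get(finding["severity"])
    if channel is None:
        return None  # severities without a route (e.g. LOW) are not forwarded
    location = finding["location"]
    headline = f"[{finding['severity']}] {finding['title']}"
    return {
        "channel": channel,
        "text": headline,  # plain-text fallback for notifications
        "blocks": [
            {
                "type": "header",
                # Slack caps header text at 150 characters
                "text": {"type": "plain_text", "text": headline[:150]},
            },
            {
                "type": "section",
                "fields": [
                    {"type": "mrkdwn", "text": f"*Repo:* {finding['repository']}"},
                    {"type": "mrkdwn", "text": f"*File:* {location['path']}:{location['line']}"},
                    {"type": "mrkdwn", "text": f"*Category:* {finding['category']}"},
                ],
            },
            {
                "type": "actions",
                "elements": [
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Open in Claude Security"},
                        # URL pattern from the illustrative config, not a documented route
                        "url": "https://claude.ai/security/findings/"
                        + quote(str(finding["id"]), safe=""),
                    }
                ],
            },
        ],
    }


# Hypothetical event for illustration
sample = {
    "id": "f-123",
    "severity": "HIGH",
    "title": "IDOR in order lookup",
    "repository": "hellofresh/orders",
    "location": {"path": "orders/api.py", "line": 42},
    "category": "IDOR/BOLA",
}
message = build_slack_message(sample)
```

The returned dict is meant to be serialised as JSON and sent to Slack's `chat.postMessage` method by whatever HTTP client the handler already uses.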

Jira auto-creation policy

# Recommended policy at the receiving end
create_issue_when:              # an issue is created if ANY rule matches
  - severity: [HIGH]
    repository_tier: tier-1     # PII, payments, auth
  - severity: [HIGH, MEDIUM]
    scan_type: scheduled-weekly

project:        APPSEC
issuetype:      Vulnerability
priority_map:
  HIGH:   Highest
  MEDIUM: High
  LOW:    Medium
labels:         ["claude-security", "auto-created", "{{category}}"]
assignee:       service_owner_lookup({{repository}})
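Expressed as code on the receiving end, the policy above might reduce to a predicate plus a field mapping. This is a sketch under assumptions: `repository_tier` and `scan_type` are enrichment fields you would attach yourself (for example from the service catalogue), and the assignee lookup is left out because `service_owner_lookup` is a placeholder. The `fields` layout follows the usual shape of a Jira REST create-issue request.

```python
PRIORITY_MAP = {"HIGH": "Highest", "MEDIUM": "High", "LOW": "Medium"}


def should_create_issue(finding: dict) -> bool:
    """Apply the two creation rules; either one matching is enough."""
    severity = finding["severity"]
    tier1_high = severity == "HIGH" and finding.get("repository_tier") == "tier-1"
    weekly = (
        severity in {"HIGH", "MEDIUM"}
        and finding.get("scan_type") == "scheduled-weekly"
    )
    return tier1_high or weekly


def jira_issue_fields(finding: dict) -> dict:
    """Build the request body for issue creation (assignee resolved elsewhere)."""
    # Jira labels cannot contain spaces, so normalise the category first
    category_label = finding["category"].replace(" ", "-")
    return {
        "fields": {
            "project": {"key": "APPSEC"},
            "issuetype": {"name": "Vulnerability"},
            "priority": {"name": PRIORITY_MAP[finding["severity"]]},
            "summary": f"[{finding['severity']}] {finding['title']}",
            "labels": ["claude-security", "auto-created", category_label],
        }
    }


# Hypothetical enriched finding
finding = {
    "severity": "MEDIUM",
    "title": "Missing CSRF token on profile update",
    "category": "auth and access",
    "repository_tier": "tier-2",
    "scan_type": "scheduled-weekly",
}
if should_create_issue(finding):
    body = jira_issue_fields(finding)
```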

Schedule policy by repo tier

Tier	Examples	Cadence	Effort	Scope
T1	payments, auth, account, PII	Weekly	Extended	Service root
T2	order, fulfilment, internal APIs	Bi-weekly	Regular	Service root
T3	internal tools, low-risk back-office	Monthly	Regular	Service root
T0 monorepo	large platform repos	Weekly	Regular	One subdirectory per scan, rotated
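For the T0 row, "one subdirectory per scan, rotated" needs some rule for choosing this week's scope. Since scans are configured in-product, a helper like the one below would only tell the operator which directory to select; the function name and the directory list are hypothetical. It counts whole weeks from a fixed Monday-aligned origin so the rotation does not reset at year boundaries.

```python
from datetime import date


def subdirectory_for_week(subdirs: list[str], day: date) -> str:
    """Pick the scan scope for the week containing `day`, rotating in sorted order."""
    if not subdirs:
        raise ValueError("need at least one subdirectory")
    ordered = sorted(subdirs)
    # date.toordinal() is 1 for 0001-01-01, a Monday, so this index
    # increments every Monday and never resets.
    week_index = (day.toordinal() - 1) // 7
    return ordered[week_index % len(ordered)]


# Hypothetical monorepo modules
scopes = ["auth", "billing", "session"]
this_week = subdirectory_for_week(scopes, date.today())
print(this_week)
```

With three modules, each one is scanned every third week; a longer list stretches the cycle accordingly, which is worth checking against the tier's intended cadence.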

Dismissal taxonomy

not-exploitable-in-context   # real bug, not reachable here
compensating-control         # WAF rule, network policy, etc.
accepted-risk                # risk owner has signed off; link decision doc
false-positive               # model wrong; link evidence
duplicate                    # covered by existing Jira ticket; link it
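If dismissals pass through your own tooling (a Jira automation, an export check), the taxonomy can be enforced mechanically. The sketch below is one possible validator, not a product feature: it rejects reasons outside the five above, requires a note, and, following the comments in the taxonomy, asks for a link on the three reasons that say "link". The link test is a deliberately crude assumption.

```python
from enum import Enum


class DismissalReason(str, Enum):
    NOT_EXPLOITABLE_IN_CONTEXT = "not-exploitable-in-context"
    COMPENSATING_CONTROL = "compensating-control"
    ACCEPTED_RISK = "accepted-risk"
    FALSE_POSITIVE = "false-positive"
    DUPLICATE = "duplicate"


# Reasons whose taxonomy comment asks for a linked document, evidence, or ticket
REQUIRES_LINK = {
    DismissalReason.ACCEPTED_RISK,
    DismissalReason.FALSE_POSITIVE,
    DismissalReason.DUPLICATE,
}


def validate_dismissal(reason: str, note: str) -> list[str]:
    """Return a list of problems; an empty list means the dismissal is acceptable."""
    try:
        parsed = DismissalReason(reason)
    except ValueError:
        return [f"unknown dismissal reason: {reason!r}"]
    problems = []
    if not note.strip():
        problems.append("a note is required")
    elif parsed in REQUIRES_LINK and "://" not in note:
        # Crude check: any URL-looking text counts as a link
        problems.append(f"{parsed.value} needs a link to supporting evidence")
    return problems


print(validate_dismissal("not applicable", "n/a"))
print(validate_dismissal("duplicate", "see ticket"))
```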

§ 09

Reviewing findings

Each finding ships with: title, details, location (file + line, linked), impact, reproduction steps, recommended fix, severity, status, category, repo, branch, date. Dismissed findings carry a reason and optional note that travel forward into future scans.

Recommended review flow:

Triage by severity, then exploitability. Anthropic's severity already accounts for exploitability per repo, but you still have local context the agent does not.
Read the reproduction steps before the recommended fix. If the repro is hand-wavy, the finding is suspect. Real ones describe a concrete payload or call sequence.
Open a remediation session. Click into Claude Code on the Web with the finding pre-loaded. Review the proposed patch as you would any PR. Do not auto-merge.
Dismiss with reason if not acting. The dismissal reason becomes the audit trail. Future reviewers (and future you) will read these.
Export per scan. Pull a CSV or Markdown after each significant scan. Keep the exports as the paper trail for SOC2 / ISO 27001 evidence. Do not rely on the in-product UI being your sole record.

§ 10

Operationalising

Ownership

Tag every project with an owning team or on-call rota, not an individual. Findings without a routing rule become a backlog. Tie ownership to the same service catalogue you already use for incident routing.

Scoped scanning beats whole-repo scanning

For monorepos and anything over roughly 200k lines, run scoped scans against modules in rotation rather than full-repo scans. This is Anthropic's own guidance: narrower scope makes scans more reliable and focuses the agent. It is also cheaper.

Audit trail discipline

Three artefacts per scan should be retained: the CSV / Markdown export, the Jira tickets created, and the dismissal log. These are what an auditor will ask for. Capture them in your existing GRC system, not just in the Claude Security UI.

Human gate on patches

Anthropic's own product copy says it: Claude can make mistakes, so you should always review proposed patches before applying them, especially for critical systems. The remediation session generates a candidate, not a merge-ready commit. Treat it as a PR draft.

Cyber Verification Program

If your team's legitimate security work (red team, exploit research, pentest) trips Opus 4.7's built-in cyber safeguards, Anthropic's Cyber Verification Program is the route to keep operating without interruption. Apply early if you anticipate it. Confidence: Medium.

§ 11

Token consumption & cost

This is the section vendors are most evasive on, so here is the honest shape of it.

Billing model

Claude Security is consumption-billed at standard Anthropic API rates on top of your seat fees. There is no fixed price per scan and no included monthly scan budget. The seat fee covers access; every token the agent spends scanning, reasoning, self-verifying, and producing the report is billed as Extra Usage. Confidence: High.

What drives cost

Codebase size. The agent reads source. Lines of code multiplied by re-reads as it traces flows is the dominant cost driver.
Effort setting. Extended runs deeper and costs materially more than Regular. Anthropic has not published a multiplier. Plan for 2x to 5x as a working assumption until you measure.
Scope. Full repo vs scoped directory differs by orders of magnitude on a monorepo.
Self-verification. The agent's own adversarial pass is part of the cost. It is what reduces false positives, but it is not free.
Findings volume. Each verified finding adds output tokens for the structured report.

What it does not do for cost

No prompt caching across scans is documented. You should not assume re-scanning the same repo next week is cheap. Confidence: Medium.
No Batch API discount is exposed for this product surface. Confidence: Medium.

Working numbers

Anthropic publishes Opus 4.7 API pricing (input / output / cache) on the platform docs. As of the launch window, expect a Regular scan of a small service (sub-50k LOC) to land in the low-single-digit USD range, an Extended scan of the same repo to cost a small multiple of that (2x to 5x as a working assumption), and a full-monorepo Extended scan to reach the low-three-digit USD range. These are working estimates, not Anthropic-published figures. Run a calibration scan on a known repo and divide the invoice line item by the number of scans run to get your actual unit cost. Confidence: Low (order of magnitude only). Verify by measuring.
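The arithmetic behind those estimates is simple enough to keep in a script next to the invoice. The sketch below assumes you can obtain token counts per scan, which may not be exposed; if not, only `unit_cost_usd` applies. The per-million-token rates in the example are placeholders, not published prices: substitute the current figures from the pricing page. Cache pricing is not modelled.

```python
def scan_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_mtok_in: float,
    usd_per_mtok_out: float,
) -> float:
    """Token-based estimate: tokens (in millions) times the per-million rate."""
    return (
        input_tokens / 1_000_000 * usd_per_mtok_in
        + output_tokens / 1_000_000 * usd_per_mtok_out
    )


def unit_cost_usd(invoice_line_item_usd: float, scans_run: int) -> float:
    """Measured cost per scan: the invoice line item divided by the number of scans."""
    if scans_run <= 0:
        raise ValueError("scans_run must be positive")
    return invoice_line_item_usd / scans_run


# Placeholder rates and volumes for illustration only
estimate = scan_cost_usd(
    input_tokens=2_000_000,
    output_tokens=100_000,
    usd_per_mtok_in=10.0,   # placeholder, not a published price
    usd_per_mtok_out=50.0,  # placeholder, not a published price
)
measured = unit_cost_usd(invoice_line_item_usd=84.0, scans_run=12)
print(f"estimated ${estimate:.2f} per scan, measured ${measured:.2f} per scan")
```

When the estimate and the measured figure disagree, trust the measured one; the estimate ignores re-reads and the self-verification pass unless they are already in the token counts.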

Cost-control playbook

Set a feature-level Claude Security spend cap separate from the org cap.
Default to Regular effort. Reserve Extended for tier-1 services and post-incident deep dives.
Scope to directories on monorepos. Rotate weekly, do not full-scan weekly.
Limit who can run scans via RBAC. Premium seat plus a custom role.
Review the first invoice line by line before unlocking the rest of the org.

§ 12

Anti-patterns

ANTI-PATTERN

Wiring it as a blocking CI gate

Stochastic output and variable runtime mean you will both miss findings and block clean PRs at random. Run async, post to Jira, do not fail builds.

ANTI-PATTERN

Treating the patch as merge-ready

The remediation patch is a draft. Always review. The tool itself surfaces this, but cultural norms drift fast under deadline pressure.

ANTI-PATTERN

Letting every engineer run scans

Cost blowout and signal blowout in the same move. Default-deny via RBAC, opt-in by team.

ANTI-PATTERN

Repeating the same scan to "confirm" a finding

Stochastic. A finding that disappears on rerun is not necessarily a false positive. Investigate the code path, not the second scan result.

ANTI-PATTERN

Replacing SCA / IaC / DAST tooling

It does none of those. If you sunset Snyk or Wiz on the back of a successful Claude Security pilot, you have created CVE blindness.

ANTI-PATTERN

Ignoring the licensing scope

Scan only code you own or hold rights to scan. Pointing it at upstream OSS, vendor SDKs, or competitor repos is a usage policy violation, regardless of intent.

ANTI-PATTERN

Skipping dismissal reasons

"Dismissed: not applicable" is worthless to the next reviewer. Dismissal taxonomy plus a one-line note is the audit trail you will be glad you built.

ANTI-PATTERN

Not setting a spend cap

Consumption billing plus an unfamiliar tool plus excited engineers equals an invoice you did not plan for. Cap before launch, not after.

§ 13

Versus AWS Security Agent & Wiz

The three tools occupy different layers of the same problem. They are complementary, not competitive, despite the marketing surface. Confidence: High.

What each one actually is

Claude Security (Anthropic, Apr 2026): Agentic SAST. Source-only. Logic and multi-file taint. Self-verifying. GitHub repo input, claude.ai surface, Claude Code remediation.
AWS Security Agent (Mar 2026 GA): Agentic DAST + design review + on-demand penetration testing. Reads source, architecture docs, and runs validated multi-step attacks against running apps. Output includes CVSS scores and reproducible exploit paths. Across AWS, Azure, GCP, on-prem.
Wiz Code (and the broader Wiz Cloud Security Platform): Cloud-to-code unified posture. SCA, secrets, IaC, container, CSPM, runtime. Code-to-cloud graph that ties a vulnerable line to a running workload to a public-facing cloud asset. Recently shipped Wiz Skills for agent-driven remediation in IDEs and PRs.

Side-by-side

Capability	Claude Security	AWS Security Agent	Wiz Code
Logic / multi-file SAST	Strong	Partial via code review	Partial
Active exploit validation (DAST)	No	Strong	No in Code; runtime via Defend
Design / architecture review	No	Strong	Partial via IaC
Dependency CVE / SCA	No	Partial	Strong
IaC misconfiguration	No	Strong	Strong
Container image CVE	No	Partial	Strong
Secrets in code	Incidental	Partial	Strong
Cloud posture (CSPM)	No	Partial	Strong
Runtime threat detection	No	No	Wiz Defend
Code-to-cloud graph	No	AWS-centric	Strong
Repo support	GitHub only	GitHub + others (via integration)	GitHub, GitLab, Bitbucket, Azure DevOps
Pricing model	Seat + token consumption	2-month free trial; AWS-billed	Per-asset SaaS subscription
Determinism	Stochastic	Stochastic (agent)	Deterministic (rule + graph)
Patch generation	Strong (Claude Code)	Strong (with code fixes)	Wiz Skills (agent-driven)

What this means in practice

For HelloFresh's stack (GitHub Actions → k8s on AWS), the rational division of labour:

Claude Security on tier-1 service repos for logic-class bugs that bypass pattern-based SAST. Weekly scoped scans.
AWS Security Agent for design-time review of new services and on-demand pentest of staging endpoints before release. Replaces the long lead-time human pentest for the long tail of services.
Wiz Code as the everyday floor: SCA, IaC, container, secrets, CSPM, runtime. The existing investment continues to do the heavy lifting on the cloud and dependency surface.

None of the three replaces the other two. The cost trap is convincing leadership any one of them is "the AI security tool" and dropping the others.

§ 14

HelloFresh comparison experiment

Blocker, with options. I cannot run the actual three-way scan on a HelloFresh repo from this session: no access to the repos, the AWS account, or your Wiz tenant. What I can give you is the experiment design that produces a defensible comparison. Pick one of the three options below.

Option A: single-repo deep dive (1 week, low effort)

Pick one tier-1 service. Mid-sized, real history of findings. Avoid the absolute crown jewels for the first run.
Pin a specific commit SHA. All three tools scan the same SHA.
Run Claude Security (Extended), AWS Security Agent (code review + design review + pentest of the staging endpoint), Wiz Code (full).
Export each tool's findings to a single CSV. Normalise to: category, severity, file, line, summary, tool.
Score by hand: true positive, false positive, duplicate of another tool, unique to this tool.
Score the patches: safe to merge as-is, safe with edits, unsafe / wrong.

Output: a Venn diagram of unique-vs-overlapping findings, a precision number per tool, a patch-quality score per tool, and a per-finding cost on the Claude side.
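Once the hand-scored CSV exists, the precision number and the unique-vs-overlapping split can be computed mechanically. The sketch below assumes the normalised columns listed above plus a `verdict` column holding `true_positive` or `false_positive`; that column name and the (category, file) matching key are assumptions to adapt to your own sheet.

```python
from collections import defaultdict


def precision_by_tool(rows: list[dict]) -> dict[str, float]:
    """True positives divided by all judged findings, per tool."""
    true_pos: dict[str, int] = defaultdict(int)
    judged: dict[str, int] = defaultdict(int)
    for row in rows:
        if row["verdict"] in {"true_positive", "false_positive"}:
            judged[row["tool"]] += 1
            if row["verdict"] == "true_positive":
                true_pos[row["tool"]] += 1
    return {tool: true_pos[tool] / judged[tool] for tool in judged}


def unique_true_positives(rows: list[dict]) -> dict[str, list[tuple[str, str]]]:
    """True positives that exactly one tool reported, keyed by tool."""
    tools_by_key: dict[tuple[str, str], set[str]] = defaultdict(set)
    for row in rows:
        if row["verdict"] == "true_positive":
            tools_by_key[(row["category"], row["file"])].add(row["tool"])
    unique: dict[str, list[tuple[str, str]]] = {row["tool"]: [] for row in rows}
    for key, tools in tools_by_key.items():
        if len(tools) == 1:
            unique[next(iter(tools))].append(key)
    return {tool: sorted(keys) for tool, keys in unique.items()}


# Hypothetical scored rows
rows = [
    {"tool": "claude", "category": "IDOR", "file": "orders/api.py", "verdict": "true_positive"},
    {"tool": "claude", "category": "SSRF", "file": "media/fetch.py", "verdict": "false_positive"},
    {"tool": "wiz", "category": "IDOR", "file": "orders/api.py", "verdict": "true_positive"},
    {"tool": "wiz", "category": "SCA", "file": "requirements.txt", "verdict": "true_positive"},
]
precision = precision_by_tool(rows)
unique = unique_true_positives(rows)
print(precision, unique)
```

Matching on category and file is coarse; two tools naming the same bug differently will show up as two unique findings, so spot-check the unique lists by hand before drawing the diagram.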

Option B: representative cohort (3 weeks, medium effort)

Pick five repos across tiers: one tier-1 critical, two tier-2, two tier-3.
Same SHA-pinning and normalisation as Option A.
Score the same way, then aggregate.

Output: a tier-weighted view of where each tool earns its keep. Stronger basis for a procurement decision than a single-repo data point.

Option C: continuous parallel run (90 days, high effort)

Run all three in parallel across the top 20 repos for a quarter.
Track time-to-fix, real exploits caught, false-positive rate, and total tool-cost per real exploit caught.
Use that as the basis for a renewal / consolidation decision.

This is the only mode that produces a real ROI number. The first two produce signal-quality numbers, which is different.

Recommendation. Start with Option A this month, escalate to Option B if the signal looks promising, and only commit to Option C if you are heading into a renewal decision on Wiz or budgeting AWS Security Agent across the org. Calling a winner from Option A alone is the most common procurement mistake here.

Where each tool likely wins on HelloFresh-shaped code

Claude Security: auth and session logic across the customer-account services. IDOR in order / subscription endpoints. Multi-file taint in the recipe-rendering pipeline.
AWS Security Agent: design review on new services before they hit prod. Validated pentest output that the eng team trusts more than a pure SAST finding. A shorter "we need a pentest before launch" lead time.
Wiz: the long tail of CVEs in container images, IaC drift in EKS clusters, exposed S3, IAM blast radius. The stuff that does not show up in source.

§ 15

Sources

Anthropic, Claude Security product page: claude.com/product/claude-security
Anthropic, Use Claude Security (help centre): support.claude.com/en/articles/14661296-use-claude-security
Anthropic, Getting started with Claude Security: claude.com/resources/tutorials/getting-started-with-claude-security
SecurityWeek, Anthropic Unveils Claude Security to Counter AI-Powered Exploit Surge, 30 Apr 2026.
SiliconANGLE, Anthropic announces Claude Security public beta to find and fix software vulnerabilities, 30 Apr 2026.
The New Stack, Anthropic's Claude Security emerges from closed preview, 14 Mar 2026 (updated 30 Apr).
AWS, New AWS Security Agent secures applications proactively from design to deployment, GA 31 Mar 2026.
AWS Security Blog, Building AI defenses at scale, Apr 2026 (Project Glasswing context).
Google Cloud / Wiz, Next '26: Redefining security for the AI era, Apr 2026 (Wiz Skills, Wiz Code).
Anthropic platform docs, Pricing: platform.claude.com/docs/en/about-claude/pricing

§ 16

Assumptions

"AWS security agent" in your prompt refers to AWS Security Agent (the frontier-agent product GA'd 31 Mar 2026), not Amazon Inspector or Security Hub. If you meant a different product (Inspector, GuardDuty, Security Hub), the comparison framework changes shape and I will redo §13 accordingly.
"Wiz code scanning" refers to the Wiz Code module of the Wiz Cloud Security Platform, including SCA, IaC, secrets, container, and the Wiz Code-to-Cloud graph. If you specifically mean Wiz Defend (runtime) or a different module, that also reshapes §13.
HelloFresh is on Claude Enterprise and already has Claude Code on the Web in flight. If the org is still on Team or seat-based Enterprise, the prerequisite path in §05 differs slightly.
The cost figures in §11 are working estimates based on Opus 4.7 published API pricing, not a Claude Security price sheet. Anthropic has not published a per-scan price. Calibrate by measuring your first-month invoice.
The §13 capability table reflects vendor-published positioning as of 30 Apr 2026, plus my interpretation of practical strength. The "partial / strong" labels are a working assessment, not a benchmark, and would shift on real measurement.
This document does not include Mythos / Project Glasswing in scope. Mythos is research-restricted, not generally available, and does not factor into a current procurement decision.