A direct guide for security engineers shipping code through GitHub Actions to Kubernetes on AWS, with a head-to-head framework against AWS Security Agent and Wiz Code Security.
Status: Public beta, 30 Apr 2026 | Backbone: Claude Opus 4.7 | Tier: Max / Team / Enterprise | Scope: GitHub repos only
§ 01
TL;DR
Claude Security is an agentic SAST replacement, not an additional one. It reads code the way a security researcher does, traces data flow across files, self-verifies, and produces patch-ready findings inside Claude Code on the Web. It is good at logic and multi-file vulnerabilities and does not cover runtime, infrastructure, container, or dependency CVEs. Treat it as one of three complementary layers alongside AWS Security Agent (DAST / pentest) and Wiz (cloud-to-code posture, IaC, SCA).
Where it wins: business-logic flaws, IDOR, auth-bypass chains, deserialisation, multi-file taint paths, anything pattern matchers miss.
Where it does not play: dependency CVE feeds, IaC misconfig, container CVE, cloud posture, runtime threats, secret rotation. None of those are in scope.
Cost shape: consumption-billed at standard API rates on top of seat fees. There is no fixed price per scan. Plan a budget cap before opening it up to a team.
Determinism: stochastic by design. Two scans of the same SHA produce overlapping but not identical findings. Build process around that, not against it.
§ 02
What it actually is
Claude Security is a defensive code-scanning capability built into claude.ai, accessed at claude.ai/security. It launched as Claude Code Security in a limited preview in February 2026, then was renamed and reissued as a public beta to all Max, Team, and Enterprise tenants on 30 April 2026. The model under it is Opus 4.7, not the more capable but restricted Mythos. (Confidence: high)
The agent scans a GitHub repository (full repo, scoped directory, or specific branch), reasons over the source, runs an adversarial verification pass on its own findings to suppress false positives, then produces structured output: title, location, impact, reproduction steps, recommended fix, severity, status, category. From any finding you can launch a Claude Code on the Web session pre-loaded with the context to draft and review the patch.
Mental model: read it as "an Opus 4.7 agent with self-critique, scoped to one GitHub repo, run on a schedule or on demand, exporting CSV / Markdown and pushing webhooks." If you treat it as a Snyk replacement you will be disappointed. If you treat it as an SCA replacement you will ship a CVE.
Finding categories Anthropic calls out
Per the official help centre: injection (SQLi, command, code, XSS, XXE, ReDoS), path and network (traversal, SSRF, open redirect), auth and access (authn bypass, privesc, IDOR/BOLA, CSRF, race), memory safety (overflow, UAF, unsafe), cryptography (timing, alg confusion, weak primitives), deserialisation (arbitrary type instantiation), protocol and encoding (cache safety, encoding confusion, length-prefix trust). Severity is assigned per finding based on exploitability in your codebase, not category, so the same class can land High in one repo and Low in another. (Confidence: high)
§ 03
Use cases
PRIMARY
Logic and multi-file vulnerabilities
The strongest pitch. Cross-file taint flow, business-logic IDOR, missing auth checks across handler chains, race windows. The places Semgrep and Snyk Code routinely miss.
PRIMARY
Pre-release deep review of a critical service
Schedule an Extended-effort scan against the service handling payments / PII before a release boundary. Treat the output as input to a human review, not a release gate.
PRIMARY
Targeted directory deep dive
Scope to auth/, billing/, session/ in a monorepo. Anthropic's own guidance is that scoping increases scan reliability and signal density.
SECONDARY
Continuous weekly hygiene scan
Weekly cadence tied to a triage ritual (Monday review, sprint boundary). Findings go to Jira via webhook; dismissals carry documented reasons, which builds an audit trail.
SECONDARY
M&A or codebase due diligence
Useful for "what is the security shape of this codebase" before integration. Pair with Wiz on the cloud side. Note Anthropic's licensing rule: only scan code you own or hold rights to scan.
SECONDARY
Augmenting a thin AppSec team
If you have 1-2 AppSec engineers across 30+ repos, this is a force multiplier. It is not a replacement for human review of critical changes; it is a way to widen coverage across the long tail.
§ 04
Non-use cases
Just as important. If you reach for Claude Security for any of these, you are using the wrong tool.
DO NOT
Dependency & container CVE scanning
Not its job. Stay on Snyk Open Source, Trivy, or Wiz SCA. Claude Security has no Trivy database, no NVD feed, no transitive dependency graph.
DO NOT
IaC misconfiguration
Terraform, Helm, k8s manifests, CFN. The agent will read code but is not built for cloud-config posture. Use Wiz IaC, Checkov, or AWS Security Agent's design review.
DO NOT
Runtime, container, or cloud posture
Use Wiz Defend, Falco, GuardDuty, or your CSPM. Claude Security only sees source.
DO NOT
Secret scanning
It is not Gitleaks or TruffleHog. Keep your existing pre-commit hooks and push protection in place. The agent may incidentally flag obvious hard-coded credentials; do not rely on it.
DO NOT
Production penetration testing
It does not run anything. No active exploitation, no traffic, no auth fuzzing against live systems. AWS Security Agent or a human pentest covers this.
DO NOT
Compliance gating in CI/CD
Stochastic output and variable scan length make it a poor blocking gate. Run it as a sidecar that posts to Jira, not a check that fails the merge.
DO NOT
Third-party / OSS code you do not own
Anthropic's terms restrict use to code your company owns or holds rights to scan. Scanning OSS upstream "to find a 0-day" is a policy violation.
DO NOT
Non-GitHub repos
GitLab, Bitbucket, self-hosted Gitea, Azure DevOps Repos: not supported in beta. Anthropic has flagged GitHub-only as a current constraint.
§ 05
Prerequisites
Anthropic's own getting-started guide enumerates these, so this section reflects the official requirements rather than my interpretation. (Confidence: high)
Eligible plan. Claude Enterprise, Team, or Max. (Enterprise was first, Team and Max were added in the public beta release.)
Claude Code on the Web enabled. The remediation flow opens a Claude Code session, so this has to be on for the org. Check at claude.ai/code.
Extra Usage enabled. Claude Security is consumption-billed. If Extra Usage is off you cannot run scans.
Anthropic GitHub App installed. This is the same app Claude Code on the Web uses, and it must be granted access, at the GitHub org level, to the repos you want to scan.
Premium seats for scan operators. Standard seats do not include Claude Code on the Web. Each engineer who runs scans needs a premium seat.
Network allowlisting (optional). If your GitHub Enterprise has IP allowlisting, add Anthropic's published egress ranges.
§ 06
Step-by-step setup
The set-up itself is short. The hard work is policy: who can scan what, what gets exported where, how dismissals are governed.
Confirm Extra Usage and set a spend cap
Go to Organization Billing settings. Enable Extra Usage if it is off. Set a separate spend limit specifically for Claude Security once the feature toggle exposes it. Treat the first month as a budget calibration period, not a steady state.
Verify the Anthropic GitHub App is installed
In your GitHub org settings, check Installed GitHub Apps. The app should be present and granted access to the relevant repositories. Scope by repository selection, not by "all repos" unless the org is small enough that the blast radius is acceptable.
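A quick way to audit that scope is to check the installation record and the granted repository list against an allowlist. A minimal sketch, assuming you have already fetched your org's installations from GitHub's `GET /orgs/{org}/installations` REST endpoint (each installation object carries `app_slug` and `repository_selection`) and exported the granted repository names separately. The slug `anthropic-claude` and the repo names are placeholders; check the real slug on your Installed GitHub Apps page.

```python
def audit_app_scope(installations, app_slug, granted_repos, allowlist):
    """Return a list of problems with a GitHub App's repository grant.

    installations: parsed JSON from GET /orgs/{org}/installations
    app_slug:      the App's slug (placeholder value in the example below)
    granted_repos: repo full names the App can reach, exported separately
    allowlist:     repo full names you intend to scan
    """
    matching = [
        inst for inst in installations.get("installations", [])
        if inst.get("app_slug") == app_slug
    ]
    if not matching:
        return [f"app {app_slug!r} is not installed on this org"]

    problems = []
    # "all" means every current and future repo is exposed to the App.
    if any(inst.get("repository_selection") == "all" for inst in matching):
        problems.append("app is granted ALL repositories; switch to selected repositories")
    # Anything granted but not on the allowlist widens the blast radius.
    for repo in sorted(set(granted_repos) - set(allowlist)):
        problems.append(f"repository not on allowlist: {repo}")
    return problems


example = {"installations": [{"app_slug": "anthropic-claude", "repository_selection": "selected"}]}
print(audit_app_scope(example, "anthropic-claude",
                      granted_repos=["acme/payments", "acme/legacy-cms"],
                      allowlist=["acme/payments"]))
# ['repository not on allowlist: acme/legacy-cms']
```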
Enable Claude Security in the admin console
Visit claude.ai/admin-settings/claude-code. Toggle Claude Security on. Once enabled, the Security entry appears in the left sidebar of claude.ai for users with premium seats.
Provision RBAC roles
Anthropic exposes custom roles via Claude Enterprise RBAC. Create at least two: a Security Operator who can run scans and triage findings, and a Security Reviewer who can read findings and dismissals but not run scans. Default-deny scan permission for engineering in general. You do not want every engineer launching repo-wide Opus 4.7 sessions on a whim, for reasons of both cost and noise.
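Writing the role model down as data makes the default-deny rule reviewable. A minimal sketch of the two roles above; the role names and action set are illustrative assumptions and do not correspond to a documented Claude Enterprise RBAC API.

```python
from enum import Enum, auto


class Action(Enum):
    RUN_SCAN = auto()
    TRIAGE_FINDING = auto()   # change status, dismiss with reason
    READ_FINDINGS = auto()
    READ_DISMISSALS = auto()


# Hypothetical role names mirroring the two custom roles described above.
ROLE_PERMISSIONS = {
    "security-operator": {Action.RUN_SCAN, Action.TRIAGE_FINDING,
                          Action.READ_FINDINGS, Action.READ_DISMISSALS},
    "security-reviewer": {Action.READ_FINDINGS, Action.READ_DISMISSALS},
}


def is_allowed(user_roles, action):
    """Default deny: an action is allowed only if some role grants it explicitly."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)


assert is_allowed({"security-operator"}, Action.RUN_SCAN)
assert not is_allowed({"security-reviewer"}, Action.RUN_SCAN)
assert not is_allowed({"engineer"}, Action.READ_FINDINGS)  # unknown role: denied
```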
Pick the first three repositories
Do not start with the monolith. Pick: one critical service (small, high-stakes, you already understand it), one repo recently audited by humans (so you can sanity-check signal vs noise), and one greenfield service (so you see what zero-history scanning produces).
Wire integrations before findings start landing
Set up the Slack and Jira webhooks per project before the first scheduled scan. Findings without an inbox accumulate as a backlog people stop opening.
Document dismissal policy
Decide before, not after, what counts as a valid dismissal reason. Suggested taxonomy: not-exploitable-in-context, compensating-control, accepted-risk, false-positive, duplicate. Reject a generic "not applicable" with no reason.
Run a calibration scan
Pick the small critical service. Run a Regular-effort scan first, then an Extended-effort scan. Compare findings. The delta is your real signal about how much depth Extended buys you on this codebase.
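Once both exports are in hand, the delta is a set difference with some tolerance, because two runs may point at slightly different lines for the same issue. A minimal sketch, assuming findings have been reduced to category, file, and line from the CSV export; the five-line tolerance is an arbitrary starting point, not an Anthropic recommendation.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    category: str
    file: str
    line: int


def same_issue(a: Finding, b: Finding, tolerance: int = 5) -> bool:
    """Treat findings as the same if category and file match and lines are close."""
    return a.category == b.category and a.file == b.file and abs(a.line - b.line) <= tolerance


def extended_only(regular: list[Finding], extended: list[Finding], tolerance: int = 5) -> list[Finding]:
    """Findings the Extended scan produced that no Regular finding matches."""
    return [e for e in extended if not any(same_issue(e, r, tolerance) for r in regular)]


regular = [Finding("idor", "orders/api.py", 41)]
extended = [Finding("idor", "orders/api.py", 43), Finding("race", "billing/charge.py", 88)]
print(extended_only(regular, extended))
# [Finding(category='race', file='billing/charge.py', line=88)]
```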
§ 07
Running scans
Open Security from the sidebar, or go to claude.ai/security.
Pick the repo. For anything larger than a single service, pick a directory or branch. Anthropic explicitly recommends scoping for larger repositories.
Effort: Regular vs Extended. Regular is the default. Extended runs deeper analysis at materially higher token cost. Use Extended on the first scan of a repo or after material changes (rewrite, framework upgrade, new auth).
Start the scan. Time varies based on repo size and what the agent decides to investigate. Minutes to hours, not seconds.
Run multiple in parallel. Useful for triaging several services at once, or comparing a hardened branch against main without serialising.
Determinism warning
Scans are stochastic by design. Anthropic states this explicitly: the agent adapts its analysis per run rather than applying fixed pattern matches. Two scans on the same commit will overlap but not match. This is the right trade for catching logic bugs but the wrong trade for a CI gate. Plan accordingly.
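One way to put a number on that overlap is to scan the same SHA several times and compare the sets. A minimal sketch, assuming each run is reduced to exact (category, file, line) keys; findings outside the stable core are candidates for investigation, not automatic false positives.

```python
def run_overlap(runs):
    """Compare repeated scans of one commit.

    runs: list of iterables of (category, file, line) tuples, one per scan.
    Returns (core, variable, jaccard): findings seen in every run, findings seen
    in only some runs, and |core| / |union| as a rough stability score.
    """
    if not runs:
        raise ValueError("need at least one run")
    sets = [set(run) for run in runs]
    union = set().union(*sets)
    core = set.intersection(*sets)
    jaccard = len(core) / len(union) if union else 1.0
    return core, union - core, jaccard


run_a = {("idor", "orders/api.py", 41), ("sqli", "reports/query.py", 12)}
run_b = {("idor", "orders/api.py", 41), ("ssrf", "webhooks/fetch.py", 30)}
core, variable, score = run_overlap([run_a, run_b])
print(sorted(core), round(score, 2))  # [('idor', 'orders/api.py', 41)] 0.33
```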
§ 08
Sample configurations
Claude Security is configured in-product, not via YAML. There is no .claude-security.yml. What you can configure: schedule cadence, scan effort, scope, webhook endpoints, and dismissal taxonomy. The samples below show what the integrations on the receiving end should look like.
Claude Security pushes scan-completion and new-finding events. Forward through a small Lambda or n8n flow that maps severity to channel, formats with Block Kit, and posts a review button linking back to claude.ai/security.
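A minimal sketch of such a forwarder as a Python Lambda handler. The inbound payload shape (`finding` with `severity`, `title`, `repository`, `url`) and the channel names are assumptions, since Claude Security's webhook schema is not reproduced here; the outbound dict follows the shape of Slack's `chat.postMessage` arguments with Block Kit `section` and `actions` blocks. The actual HTTP post is left out.

```python
import json

# Hypothetical channel routing by severity.
SEVERITY_CHANNEL = {"HIGH": "#appsec-high", "MEDIUM": "#appsec-triage", "LOW": "#appsec-digest"}
DEFAULT_REVIEW_URL = "https://claude.ai/security"


def build_slack_message(finding):
    """Map one finding to chat.postMessage arguments with a Review button."""
    severity = str(finding.get("severity", "LOW")).upper()
    channel = SEVERITY_CHANNEL.get(severity, SEVERITY_CHANNEL["LOW"])
    title = finding.get("title", "Untitled finding")
    repo = finding.get("repository", "unknown repo")
    return {
        "channel": channel,
        "text": f"[{severity}] {title} in {repo}",  # fallback text for notifications
        "blocks": [
            {"type": "section",
             "text": {"type": "mrkdwn", "text": f"*{severity}*: {title}\n`{repo}`"}},
            {"type": "actions",
             "elements": [{"type": "button",
                           "text": {"type": "plain_text", "text": "Review"},
                           "url": finding.get("url", DEFAULT_REVIEW_URL)}]},
        ],
    }


def handler(event, context):
    """Lambda entry point; accepts a function-URL/API Gateway event or a raw payload."""
    body = json.loads(event["body"]) if isinstance(event.get("body"), str) else event
    message = build_slack_message(body.get("finding", {}))
    # Send `message` to Slack's chat.postMessage with your bot token here.
    return {"statusCode": 200, "body": json.dumps(message)}
```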
# Recommended policy at the receiving end (Jira issue creation).
# An issue is created if ANY rule matches; keys inside one rule are ANDed.
create_issue_when:
  - severity: HIGH
    repository_tier: tier-1        # PII, payments, auth
  - severity: [HIGH, MEDIUM]
    scan_type: scheduled-weekly
project: APPSEC
issuetype: Vulnerability
priority_map:
  HIGH: Highest
  MEDIUM: High
  LOW: Medium
labels: ["claude-security", "auto-created", "{{category}}"]
assignee: "service_owner_lookup({{repository}})"
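The same policy can be expressed in code at the receiving end. A minimal sketch, assuming the finding payload carries `severity`, `title`, `category`, and `repository`, plus two fields your forwarder adds itself (`repository_tier`, `scan_type`). The dict it builds follows the `fields` object of Jira's create-issue REST call; `SERVICE_OWNERS` is a hypothetical stand-in for `service_owner_lookup`, and routing is returned separately because Jira's `assignee` expects a user account, not a team.

```python
PRIORITY_MAP = {"HIGH": "Highest", "MEDIUM": "High", "LOW": "Medium"}

# Hypothetical stand-in for service_owner_lookup(); query your service catalogue instead.
SERVICE_OWNERS = {"payments-api": "team-payments", "account-service": "team-identity"}


def should_create_issue(finding):
    """OR across rules; conditions inside a rule are ANDed (mirrors the policy above)."""
    severity = finding["severity"].upper()
    if severity == "HIGH" and finding.get("repository_tier") == "tier-1":
        return True
    if severity in {"HIGH", "MEDIUM"} and finding.get("scan_type") == "scheduled-weekly":
        return True
    return False


def jira_fields(finding):
    """Build the `fields` object for a Jira create-issue request."""
    return {
        "project": {"key": "APPSEC"},
        "issuetype": {"name": "Vulnerability"},  # must exist in your Jira project
        "summary": finding["title"],
        "priority": {"name": PRIORITY_MAP[finding["severity"].upper()]},
        # Jira labels cannot contain spaces.
        "labels": ["claude-security", "auto-created", finding["category"].replace(" ", "-")],
    }


def owner_for(repository):
    """Route to the owning team; unowned repos fall back to the AppSec queue."""
    return SERVICE_OWNERS.get(repository, "appsec-triage")
```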
Schedule policy by repo tier
Tier | Examples | Cadence | Effort | Scope
T1 | payments, auth, account, PII | Weekly | Extended | Service root
T2 | order, fulfilment, internal APIs | Bi-weekly | Regular | Service root
T3 | internal tools, low-risk back-office | Monthly | Regular | Service root
T0 monorepo | large platform repos | Weekly | Regular | One subdirectory per scan, rotated
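The tier table translates directly into a scheduling policy. A minimal sketch that also folds in the rule from the scan-running guidance (Extended on a first scan or after material change); treating "Monthly" as 30 days and rotating the T0 monorepo through subdirectories by calendar week are simplifying assumptions.

```python
import datetime as dt

# Mirrors the tier table; "Monthly" is approximated as 30 days.
TIER_POLICY = {
    "T1": {"cadence_days": 7, "effort": "Extended"},
    "T2": {"cadence_days": 14, "effort": "Regular"},
    "T3": {"cadence_days": 30, "effort": "Regular"},
    "T0": {"cadence_days": 7, "effort": "Regular"},
}


def effort_for(tier, first_scan=False, material_change=False):
    """Extended for first scans and after rewrites, otherwise the tier default."""
    if first_scan or material_change:
        return "Extended"
    return TIER_POLICY[tier]["effort"]


def is_due(tier, last_scan, today):
    """True once the tier's cadence has elapsed since the last scan."""
    return (today - last_scan).days >= TIER_POLICY[tier]["cadence_days"]


def t0_scope(subdirs, today):
    """Pick one monorepo subdirectory per Monday-to-Sunday week, cycling in sorted order."""
    week_index = (today.toordinal() - 1) // 7  # date(1, 1, 1) is a Monday
    ordered = sorted(subdirs)
    return ordered[week_index % len(ordered)]


today = dt.date(2026, 5, 4)
print(effort_for("T2", first_scan=True), is_due("T2", dt.date(2026, 4, 20), today))  # Extended True
```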
Dismissal taxonomy
not-exploitable-in-context  # real bug, not reachable here
compensating-control        # WAF rule, network policy, etc.
accepted-risk               # risk owner has signed off; link decision doc
false-positive              # model wrong; link evidence
duplicate                   # covered by existing Jira ticket; link it
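Enforcing the taxonomy where dismissals are logged (for example on the Jira or GRC side of the workflow) keeps "not applicable" out of the record. A minimal sketch; treating "link" as any note containing an http(s) URL is a deliberately crude assumption.

```python
import re
from enum import Enum


class DismissalReason(Enum):
    NOT_EXPLOITABLE_IN_CONTEXT = "not-exploitable-in-context"
    COMPENSATING_CONTROL = "compensating-control"
    ACCEPTED_RISK = "accepted-risk"
    FALSE_POSITIVE = "false-positive"
    DUPLICATE = "duplicate"


# Taxonomy entries whose guidance says "link ..." must carry a URL in the note.
NEEDS_LINK = {DismissalReason.ACCEPTED_RISK, DismissalReason.FALSE_POSITIVE, DismissalReason.DUPLICATE}
URL = re.compile(r"https?://\S+")


def validate_dismissal(reason: str, note: str) -> list[str]:
    """Return problems with a dismissal; an empty list means it is acceptable."""
    try:
        parsed = DismissalReason(reason)
    except ValueError:
        return [f"unknown reason {reason!r}; use a taxonomy value"]
    if not note.strip():
        return ["a one-line note is required"]
    if parsed in NEEDS_LINK and not URL.search(note):
        return [f"{parsed.value} requires a link in the note"]
    return []
```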
§ 09
Reviewing findings
Each finding ships with: title, details, location (file + line, linked), impact, reproduction steps, recommended fix, severity, status, category, repo, branch, date. Dismissed findings carry a reason and optional note that travel forward into future scans.
Recommended review flow:
Triage by severity, then exploitability. Anthropic's severity already accounts for exploitability per repo, but you still have local context they do not.
Read the reproduction steps before the recommended fix. If the repro is hand-wavy, the finding is suspect. Real ones describe a concrete payload or call sequence.
Open a remediation session. Click into Claude Code on the Web with the finding pre-loaded. Review the proposed patch as you would any PR. Do not auto-merge.
Dismiss with reason if not acting. The dismissal reason becomes the audit trail. Future reviewers (and future you) will read these.
Export per scan. Pull a CSV or Markdown after each significant scan. Keep the exports as the paper trail for SOC2 / ISO 27001 evidence. Do not rely on the in-product UI being your sole record.
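The first step, severity then local exploitability, is easy to make repeatable on the exported rows. A minimal sketch, assuming you annotate each exported finding with your own `internet_facing` flag, which Claude Security does not provide.

```python
SEVERITY_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


def triage_order(findings):
    """Sort by severity, then internet-facing before internal, then title for stability."""
    return sorted(
        findings,
        key=lambda f: (
            SEVERITY_RANK.get(f["severity"].upper(), len(SEVERITY_RANK)),
            not f.get("internet_facing", False),  # False sorts first, so exposed code leads
            f["title"],
        ),
    )


queue = triage_order([
    {"title": "Weak hash", "severity": "LOW"},
    {"title": "IDOR in /orders", "severity": "HIGH", "internet_facing": True},
    {"title": "SSRF in admin tool", "severity": "HIGH", "internet_facing": False},
])
print([f["title"] for f in queue])  # ['IDOR in /orders', 'SSRF in admin tool', 'Weak hash']
```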
§ 10
Operationalising
Ownership
Tag every project with a named owner (an engineering team or on-call rota), not a person. Findings without a routing rule become a backlog. Tie ownership to the same service catalogue you already use for incident routing.
Scoped scanning beats whole-repo scanning
For monorepos and anything over roughly 200k lines, run scoped scans against modules in rotation rather than full-repo scans. This is Anthropic's own guidance: narrower scope increases determinism and focuses the agent. It is also cheaper.
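Rotation needs a plan so every module gets covered without any single week blowing the budget. A minimal sketch that packs per-directory scoped scans into weekly batches under a line-count budget using first-fit decreasing; each directory is still its own scan (parallel runs are supported), and the budget value is an assumption you would calibrate against cost.

```python
def plan_rotation(module_loc, weekly_loc_budget):
    """Group directory-scoped scans into weekly batches.

    module_loc: {directory: lines_of_code}
    Returns a list of batches (one per week), each a list of directories.
    A directory larger than the budget gets a week to itself.
    """
    batches = []  # each entry: [total_loc, [directories]]
    for directory, loc in sorted(module_loc.items(), key=lambda kv: kv[1], reverse=True):
        for batch in batches:
            if batch[0] + loc <= weekly_loc_budget:
                batch[0] += loc
                batch[1].append(directory)
                break
        else:
            batches.append([loc, [directory]])
    return [dirs for _, dirs in batches]


print(plan_rotation({"auth": 120_000, "billing": 90_000, "catalog": 60_000, "search": 30_000}, 150_000))
# [['auth', 'search'], ['billing', 'catalog']]
```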
Audit trail discipline
Three artefacts per scan should be retained: the CSV / Markdown export, the Jira tickets created, and the dismissal log. These are what an auditor will ask for. Capture them in your existing GRC system, not just in the Claude Security UI.
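Hashing the artefacts when you file them makes later substitution detectable. A minimal sketch of a per-scan manifest; the field names and the idea of storing it alongside the files in your GRC system are assumptions, not a Claude Security feature.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def scan_manifest(scan_id, export_path, jira_keys, dismissal_log_path):
    """Bundle the three per-scan artefacts into one JSON-serialisable record."""
    return {
        "scan_id": scan_id,
        "export": {"path": str(export_path), "sha256": sha256_of(export_path)},
        "jira_issues": sorted(jira_keys),
        "dismissal_log": {"path": str(dismissal_log_path), "sha256": sha256_of(dismissal_log_path)},
    }


def write_manifest(manifest, out_path):
    """Write the manifest as stable, diff-friendly JSON."""
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
```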
Human gate on patches
Anthropic's own product copy says it: Claude can make mistakes, so you should always review proposed patches before applying them, especially for critical systems. The remediation session generates a candidate, not a merge-ready commit. Treat it as a PR draft.
Cyber Verification Program
If your team's legitimate security work (red team, exploit research, pentest) trips Opus 4.7's built-in cyber safeguards, Anthropic's Cyber Verification Program is the route to keep operating without interruption. Apply early if you anticipate it. (Confidence: medium)
§ 11
Token consumption & cost
This is the section vendors are most evasive on, so here is the honest shape of it.
Billing model
Claude Security is consumption-billed at standard Anthropic API rates on top of your seat fees. There is no fixed price per scan, no included monthly scan budget. The seat fee covers access; every token the agent spends scanning, reasoning, self-verifying, and producing the report is billed as Extra Usage. (Confidence: high)
What drives cost
Codebase size. The agent reads source. Lines of code multiplied by re-reads as it traces flows is the dominant cost driver.
Effort setting. Extended runs deeper and costs materially more than Regular. Anthropic has not published a multiplier; plan for 2x to 5x as a working assumption until you measure.
Scope. Full repo vs scoped directory differs by orders of magnitude on a monorepo.
Self-verification. The agent's own adversarial pass is part of the cost. It is what reduces false positives, but it is not free.
Findings volume. Each verified finding adds output tokens for the structured report.
What it does not do for cost
No documented prompt caching across scans. Do not assume that re-scanning the same repo next week will be cheap. (Confidence: medium)
No Batch API discount is exposed for this product surface. (Confidence: medium)
Working numbers
Anthropic publishes Opus 4.7 API pricing (input / output / cache) in the platform docs. As of the launch window, expect a Regular scan of a small service (under 50k LOC) to cost in the low single-digit USD range, an Extended scan of the same repo to cost several times that, and a full-monorepo Extended scan to reach the low hundreds of USD. These are working estimates, not Anthropic-published figures. Run a calibration scan on a known repo and divide the invoice line item by the number of runs to get your actual unit cost. (Confidence: low; order of magnitude only. Verify by measuring.)
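The calibration arithmetic is simple enough to keep in a script next to the invoice. A minimal sketch; the per-million-token rates are parameters you fill in from Anthropic's published pricing (no real figures are assumed here), prompt-cache pricing is ignored, and the example inputs are illustrative only.

```python
def token_cost_usd(input_tokens, output_tokens, input_usd_per_mtok, output_usd_per_mtok):
    """Cost of one run from token counts and per-million-token rates (no caching)."""
    return (input_tokens / 1_000_000) * input_usd_per_mtok + (output_tokens / 1_000_000) * output_usd_per_mtok


def unit_costs(invoice_usd, runs, loc):
    """Per-scan and per-thousand-lines cost from a calibration invoice line item."""
    per_scan = invoice_usd / runs
    return {"per_scan": per_scan, "per_kloc": per_scan / (loc / 1000)}


def extended_multiplier(regular_per_scan, extended_per_scan):
    """Measured replacement for the 2x-5x working assumption."""
    return extended_per_scan / regular_per_scan


# Illustrative inputs, not real prices: 3 Regular runs on a 40k-LOC repo billed 12 USD.
print(unit_costs(12.0, 3, 40_000))
```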
Cost-control playbook
Set a feature-level Claude Security spend cap separate from the org cap.
Default to Regular effort. Reserve Extended for tier-1 services and post-incident deep dives.
Scope to directories on monorepos. Rotate weekly, do not full-scan weekly.
Limit who can run scans via RBAC. Premium seat plus a custom role.
Review the first invoice line by line before unlocking the rest of the org.
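Before unlocking the rest of the org, a projection from measured per-scan costs shows whether the planned schedule fits under the cap. A minimal sketch; the cadence factors assume 52 weeks per year, and the 2x and 5x bounds are this guide's working assumption for Extended, to be replaced by your measured multiplier.

```python
# Average scans per month for each cadence in the tier table.
SCANS_PER_MONTH = {"weekly": 52 / 12, "bi-weekly": 26 / 12, "monthly": 1.0}


def projected_spend(portfolio, regular_cost_usd, multiplier):
    """Monthly spend given measured Regular-scan costs and an Extended multiplier.

    portfolio: list of {"repo", "cadence", "effort"} dicts
    regular_cost_usd: {repo: measured cost of one Regular scan}
    """
    total = 0.0
    for entry in portfolio:
        per_scan = regular_cost_usd[entry["repo"]]
        if entry["effort"] == "Extended":
            per_scan *= multiplier
        total += per_scan * SCANS_PER_MONTH[entry["cadence"]]
    return total


def fits_under_cap(portfolio, regular_cost_usd, cap_usd, multipliers=(2.0, 5.0)):
    """True only if the worst case across the multiplier range stays under the cap."""
    return max(projected_spend(portfolio, regular_cost_usd, m) for m in multipliers) <= cap_usd
```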
§ 12
Anti-patterns
ANTI-PATTERN
Wiring it as a blocking CI gate
Stochastic output and variable runtime mean you will both miss findings and block clean PRs at random. Run async, post to Jira, do not fail builds.
ANTI-PATTERN
Treating the patch as merge-ready
The remediation patch is a draft. Always review. The tool itself surfaces this, but cultural norms drift fast under deadline pressure.
ANTI-PATTERN
Letting every engineer run scans
Cost blowout and signal blowout in the same move. Default-deny via RBAC, opt-in by team.
ANTI-PATTERN
Repeating the same scan to "confirm" a finding
Stochastic. A finding that disappears on rerun is not necessarily a false positive. Investigate the code path, not the second scan result.
ANTI-PATTERN
Replacing SCA / IaC / DAST tooling
It does none of those. If you sunset Snyk or Wiz on the back of a successful Claude Security pilot, you have created CVE blindness.
ANTI-PATTERN
Ignoring the licensing scope
Scan only code you own. Pointing it at upstream OSS, vendor SDKs, or competitor repos is a usage policy violation, regardless of intent.
ANTI-PATTERN
Skipping dismissal reasons
"Dismissed: not applicable" is worthless to the next reviewer. Dismissal taxonomy plus a one-line note is the audit trail you will be glad you built.
ANTI-PATTERN
Not setting a spend cap
Consumption billing plus an unfamiliar tool plus excited engineers equals an invoice you did not plan for. Cap before launch, not after.
§ 13
Versus AWS Security Agent & Wiz
The three tools occupy different layers of the same problem. They are complementary, not competitive, despite the marketing surface. (Confidence: high)
What each one actually is
Claude Security (Anthropic, Apr 2026)
Agentic SAST. Source-only. Logic and multi-file taint. Self-verifying. GitHub repo input, claude.ai surface, Claude Code remediation.
AWS Security Agent (Mar 2026 GA)
Agentic DAST + design review + on-demand penetration testing. Reads source, architecture docs, and runs validated multi-step attacks against running apps. Output includes CVSS scores and reproducible exploit paths. Across AWS, Azure, GCP, on-prem.
Wiz Code (and the broader Wiz Cloud Security Platform)
Cloud-to-code unified posture. SCA, secrets, IaC, container, CSPM, runtime. Code-to-cloud graph that ties a vulnerable line to a running workload to a public-facing cloud asset. Recently shipped Wiz Skills for agent-driven remediation in IDEs and PRs.
Side-by-side
Capability | Claude Security | AWS Security Agent | Wiz Code
Logic / multi-file SAST | Strong | Partial via code review | Partial
Active exploit validation (DAST) | No | Strong | No in Wiz Code; runtime via Wiz Defend
Design / architecture review | No | Strong | Partial via IaC
Dependency CVE / SCA | No | Partial | Strong
IaC misconfiguration | No | Strong | Strong
Container image CVE | No | Partial | Strong
Secrets in code | Incidental | Partial | Strong
Cloud posture (CSPM) | No | Partial | Strong
Runtime threat detection | No | No | Wiz Defend
Code-to-cloud graph | No | AWS-centric | Strong
Repo support | GitHub only | GitHub + others (via integration) | GitHub, GitLab, Bitbucket, Azure DevOps
Pricing model | Seat + token consumption | 2-month free trial; AWS-billed | Per-asset SaaS subscription
Determinism | Stochastic | Stochastic (agent) | Deterministic (rule + graph)
Patch generation | Strong (Claude Code) | Strong (with code fixes) | Wiz Skills (agent-driven)
What this means in practice
For HelloFresh's stack (GitHub Actions → k8s on AWS), the rational division of labour:
Claude Security on tier-1 service repos for logic-class bugs that bypass pattern-based SAST. Weekly scoped scans.
AWS Security Agent for design-time review of new services and on-demand pentest of staging endpoints before release. Replaces the long lead-time human pentest for the long tail of services.
Wiz Code as the everyday floor: SCA, IaC, container, secrets, CSPM, runtime. The existing investment continues to do the heavy lifting on the cloud and dependency surface.
None of the three replaces the other two. The cost trap is convincing leadership any one of them is "the AI security tool" and dropping the others.
§ 14
HelloFresh comparison experiment
Blocker, with options
I cannot run the actual three-way scan on a HelloFresh repo from this session: no access to the repos, the AWS account, or your Wiz tenant. What I can give you is the experiment design that produces a defensible comparison. Pick one of the three options below.
Option A: single-repo deep dive (1 week, low effort)
Pick one tier-1 service. Mid-sized, real history of findings. Avoid the absolute crown jewels for the first run.
Pin a specific commit SHA. All three tools scan the same SHA.
Run Claude Security (Extended), AWS Security Agent (code review + design review + pentest of the staging endpoint), Wiz Code (full).
Export each tool's findings to a single CSV. Normalise to: category, severity, file, line, summary, tool.
Score by hand: true positive, false positive, duplicate of another tool, unique to this tool.
Score the patches: safe to merge as-is, safe with edits, unsafe / wrong.
Output: a Venn diagram of unique-vs-overlapping findings, a precision number per tool, a patch-quality score per tool, and a per-finding cost on the Claude side.
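Once the findings are normalised and hand-labelled, the precision and overlap numbers fall out mechanically. A minimal sketch, assuming two extra columns you add during manual scoring: `verdict` (`tp` or `fp`) and `match_id`, a label you assign so that the same underlying issue reported by different tools shares one ID. Patch-quality scores stay manual.

```python
import csv
import io
from collections import defaultdict


def score(csv_text):
    """Per-tool precision and unique true positives, plus cross-tool overlap.

    Expects columns: tool, category, severity, file, line, summary, verdict, match_id.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0})
    issue_tools = defaultdict(set)  # match_id -> tools that reported it as a true positive
    for row in csv.DictReader(io.StringIO(csv_text)):
        verdict = row["verdict"].strip().lower()
        if verdict not in ("tp", "fp"):
            raise ValueError(f"unexpected verdict {row['verdict']!r}")
        counts[row["tool"]][verdict] += 1
        if verdict == "tp":
            issue_tools[row["match_id"]].add(row["tool"])

    report = {}
    for tool, c in counts.items():
        judged = c["tp"] + c["fp"]  # at least 1, since the tool appeared in a row
        report[tool] = {
            "precision": c["tp"] / judged,
            "unique_tp": sum(1 for tools in issue_tools.values() if tools == {tool}),
        }
    overlapping = sum(1 for tools in issue_tools.values() if len(tools) > 1)
    return report, overlapping
```

Feeding the three tools' normalised rows through one function keeps the scoring identical across vendors, which matters more than the exact matching rule.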
Option B: representative cohort (3 weeks, medium effort)
Pick five repos across tiers: one tier-1 critical, two tier-2, two tier-3.
Same SHA-pinning and normalisation as Option A.
Score the same way, then aggregate.
Output: a tier-weighted view of where each tool earns its keep. Stronger basis for a procurement decision than a single-repo data point.
Option C: continuous parallel run (90 days, high effort)
Run all three in parallel across the top 20 repos for a quarter.
Track time-to-fix, real exploits caught, false-positive rate, and total tool-cost per real exploit caught.
Use that as the basis for a renewal / consolidation decision.
This is the only mode that produces a real ROI number. The first two produce signal-quality numbers, which is different.
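The Option C metrics reduce to a few ratios per tool over the quarter. A minimal sketch; what counts as a "real exploit caught" and how time-to-fix is measured are definitions you would have to fix up front, and the inputs below are illustrative.

```python
from statistics import median


def option_c_metrics(total_cost_usd, real_exploits, false_positives, total_findings, fix_days):
    """Quarterly ROI-style metrics for one tool."""
    return {
        "cost_per_real_exploit": total_cost_usd / real_exploits if real_exploits else float("inf"),
        "false_positive_rate": false_positives / total_findings if total_findings else 0.0,
        "median_time_to_fix_days": median(fix_days) if fix_days else None,
    }


print(option_c_metrics(9_000, 3, 20, 100, [2, 5, 9]))
# {'cost_per_real_exploit': 3000.0, 'false_positive_rate': 0.2, 'median_time_to_fix_days': 5}
```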
Recommendation
Start with Option A this month. Escalate to Option B if the signal looks promising, and commit to Option C only if you are heading into a Wiz renewal decision or budgeting AWS Security Agent across the org. Calling a winner from Option A alone is the most common procurement mistake here.
Where each tool likely wins on HelloFresh-shaped code
Claude Security: auth and session logic across the customer-account services. IDOR in order / subscription endpoints. Multi-file taint in the recipe-rendering pipeline.
AWS Security Agent: design review of new services before they hit prod. Validated pentest output that the engineering team trusts more than a pure SAST finding. A shorter lead time for the "we need a pentest before launch" step.
Wiz: the long tail of CVE in container images, IaC drift in EKS clusters, exposed S3, IAM blast radius. The stuff that does not show up in source.
§ 15
Sources
Anthropic, Claude Security product page: claude.com/product/claude-security
Anthropic, Use Claude Security (help centre): support.claude.com/en/articles/14661296-use-claude-security
Anthropic, Getting started with Claude Security: claude.com/resources/tutorials/getting-started-with-claude-security
SecurityWeek, Anthropic Unveils Claude Security to Counter AI-Powered Exploit Surge, 30 Apr 2026.
SiliconANGLE, Anthropic announces Claude Security public beta to find and fix software vulnerabilities, 30 Apr 2026.
The New Stack, Anthropic's Claude Security emerges from closed preview, 14 Mar 2026 (updated 30 Apr).
AWS, New AWS Security Agent secures applications proactively from design to deployment, GA 31 Mar 2026.
AWS Security Blog, Building AI defenses at scale, Apr 2026 (Project Glasswing context).
Google Cloud / Wiz, Next '26: Redefining security for the AI era, Apr 2026 (Wiz Skills, Wiz Code).
"AWS security agent" in your prompt refers to AWS Security Agent (the frontier-agent product GA'd 31 Mar 2026), not Amazon Inspector or Security Hub. If you meant a different product (Inspector, GuardDuty, Security Hub), the comparison framework changes shape and I will redo §13 accordingly.
"Wiz code scanning" refers to the Wiz Code module of the Wiz Cloud Security Platform, including SCA, IaC, secrets, container, and the Wiz Code-to-Cloud graph. If you specifically mean Wiz Defend (runtime) or a different module, that also reshapes §13.
HelloFresh is on Claude Enterprise and already has Claude Code on the Web in flight. If the org is still on Team or seat-based Enterprise, the prerequisite path in §05 differs slightly.
The cost figures in §11 are working estimates based on Opus 4.7 published API pricing, not a Claude Security price sheet. Anthropic has not published a per-scan price. Calibrate by measuring your first-month invoice.
The §13 capability table reflects vendor-published positioning as of 30 Apr 2026, plus my interpretation of practical strength. The "partial / strong" labels are a working assessment, not a benchmark, and would shift on real measurement.
This document does not include Mythos / Project Glasswing in scope. Mythos is research-restricted, not generally available, and does not factor into a current procurement decision.