HelloFresh SE / Global Security Classification: Internal — Restricted Doc Ref: SEC-AI-TRANS-2026-Q2

Securing the AI Transformation at HelloFresh.

A stack-ranked program of ten practical controls for safely adopting third-party SaaS AI, Claude, Gemini, Bedrock, MCP servers, OSS LLMs, RAG, and agentic workflows across tech and non-tech functions — calibrated to the 02 Aug 2026 EU AI Act enforcement date.

Author: Syed Ishaq B., Global Head of Security
Audience: SLT, Tribe Leads, GRC, Privacy, Platform
Frameworks Cited: EU AI Act · ISO/IEC 42001 · NIST AI RMF 1.0 · OWASP LLM Top 10 (2025) · OWASP MCP Top 10 · NIS2 · GDPR
Effective: 06 May 2026 (review Q3 2026)
Brands in Scope: HelloFresh · Chef's Plate · EveryPlate · Green Chef · Factor75
Status: Draft v0.9, for SLT review
§ 01

Executive summary

why now, where the line of fire sits

HelloFresh is moving AI from pilots into the operating fabric of the company: code copilots in engineering, RAG-grounded knowledge bots for CX and Ops, agentic flows touching planning, fulfilment, and supplier comms. Every one of those use cases adds new attack surface that the existing AppSec, IAM, and DLP stack does not natively cover.

The risk concentration is not the model itself. It is the data flowing into prompts, the actions agents are authorised to take, and the identity, supply-chain, and trust assumptions baked into MCP tools and third-party SaaS connectors. The recent in-house Bedrock prompt-logging incident, the public CVE-2025-6514 affecting mcp-remote across 437k+ installs, and the live tool-poisoning research against MCPTox-tested clients all point to the same gap: AI traffic and agent identity are governance blind spots.

The program consists of ten controls, ranked by prerequisite ordering and by risk reduction per FTE-week. Tier 1 (#1–#3) is foundational and non-negotiable before scaling. Tier 2 (#4–#6) builds the technical chokepoints. Tier 3 (#7–#10) covers data architecture, supply-chain assurance, detection, and validation.
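
As an illustration of how a ranking by prerequisite ordering and risk reduction per FTE-week could be computed, the Python sketch below sorts controls so that prerequisites always come first and, among controls that are unblocked at the same time, the higher score wins. It encodes only the two dependencies stated in the assumptions (everything depends on #1; #10 validates #1–#9). The function name `stack_rank` and every score value are hypothetical placeholders, not the program's actual estimates, so the printed order is not the program's ranking.

```python
import heapq

def stack_rank(prereqs, score):
    """Return controls in an order that respects prerequisites.

    Among controls whose prerequisites are all satisfied, the one with the
    highest score (risk reduction per FTE-week) is taken first.
    """
    remaining = {c: set(deps) for c, deps in prereqs.items()}
    dependents = {c: [] for c in prereqs}
    for control, deps in prereqs.items():
        for dep in deps:
            dependents[dep].append(control)

    # Max-heap via negated score; seeded with controls that have no prerequisites.
    ready = [(-score[c], c) for c, deps in remaining.items() if not deps]
    heapq.heapify(ready)

    order = []
    while ready:
        _, control = heapq.heappop(ready)
        order.append(control)
        for nxt in dependents[control]:
            remaining[nxt].discard(control)
            if not remaining[nxt]:
                heapq.heappush(ready, (-score[nxt], nxt))

    if len(order) != len(prereqs):
        raise ValueError("prerequisites contain a cycle")
    return order

# Dependencies taken from the stated assumptions:
# every control depends on #1, and #10 depends on #1 through #9.
prereqs = {c: set() for c in range(1, 11)}
for c in range(2, 11):
    prereqs[c].add(1)
prereqs[10].update(range(1, 10))

# HYPOTHETICAL scores (risk reduction per FTE-week), for illustration only.
score = {1: 1.0, 2: 2.5, 3: 4.0, 4: 3.0, 5: 3.5,
         6: 1.5, 7: 2.0, 8: 2.8, 9: 1.2, 10: 5.0}

order = stack_rank(prereqs, score)
print(order)  # [1, 3, 5, 4, 8, 2, 7, 6, 9, 10]
```

Note that #1 comes first despite the lowest placeholder score and #10 comes last despite the highest: the prerequisite constraint dominates, and the score only breaks ties among unblocked controls.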

§ 02

Risk heatmap — top 10 controls plotted

probability × business impact, exploit complexity overlay
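[Heatmap: inherent risk landscape, gap-state (pre-mitigation). Vertical axis: business impact (Negligible, Minor, Moderate, Major, Catastrophic). Horizontal axis: probability of exploit or occurrence within 12 months (Rare, Unlikely, Possible, Likely, Almost certain). Controls by impact row: Catastrophic: #3, #8, #1. Major: #10, #5, #6, #2, #4. Moderate: #9, #7. Minor and Negligible: none.]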
§ 03

Stack-ranked controls — at a glance

# Control Risk Effort Residual Owner Primary mapping
§ 04

Detailed controls

§ 05

Compliance crosswalk

control × framework coverage matrix
Control EU AI Act ISO 42001 ISO 27001 NIST AI RMF OWASP LLM10 OWASP MCP10 NIS2 GDPR PCI DSS
§ 06

Assumptions

stated explicitly so they can be challenged

Stated assumptions backing this stack-rank

  1. HelloFresh's AI exposure is currently dominated by SaaS LLM consumption (Claude, Gemini, ChatGPT) and Bedrock-hosted models, not in-house model training. The control set is weighted accordingly toward deployer obligations, not provider obligations under the EU AI Act.
  2. HelloFresh, as a B2C company processing customer PII across DACH, NA, UK, and AU, treats GDPR and the EU AI Act as binding; PCI DSS is in scope only for payment-adjacent flows. NIS2 applies through the food production, processing, and distribution sector (NIS2 Annex II).
  3. Effort estimates assume that the existing AppSec/IAM/DLP/SIEM stack and a security tribe of ~34 engineers are the baseline; estimates are FTE-weeks of net new work, not absolute hours.
  4. "Residual risk" assumes the listed mitigation set is fully implemented and operating; it does not assume every commercial alternative is purchased — the named tools are reference architectures, not endorsements.
  5. The stack-rank is a prerequisite ordering, not a strict severity ordering. #1 (governance) is highest priority because everything else depends on it; #10 (red-teaming) is critical-severity but can only validate controls #1–#9.
  6. Hodor.ai is positioned as the agent identity, policy, and audit reference for #5, based on its own product page (identity, policy, guardrails, audit trail for AI agents). Other named vendors are positioned according to their public capability claims as of May 2026; confirm these via PoC, not marketing.
  7. Risk ratings use a 5×5 probability × impact model with a complexity-of-exploitation modifier; complexity is treated as a probability discount, not a separate axis, to keep the heatmap two-dimensional.
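
To make assumption 7 concrete, the Python sketch below shows one way a complexity-of-exploitation modifier could be applied as a probability discount before plotting a control on the 5×5 grid. The band labels are the heatmap's own. The discount sizes (0, 1, or 2 probability bands), the `low`/`medium`/`high` complexity labels, the function name `heatmap_cell`, and the use of a simple product as the score are illustrative assumptions, not the calibration used for this document.

```python
# Band labels as they appear on the heatmap axes, lowest to highest.
PROBABILITY = ["Rare", "Unlikely", "Possible", "Likely", "Almost certain"]
IMPACT = ["Negligible", "Minor", "Moderate", "Major", "Catastrophic"]

# ASSUMPTION: how many probability bands each complexity level removes.
COMPLEXITY_DISCOUNT = {"low": 0, "medium": 1, "high": 2}

def heatmap_cell(probability, impact, complexity):
    """Place a risk on the 5x5 grid after discounting probability for
    exploit complexity. Impact is never adjusted, so the result stays
    two-dimensional."""
    p = PROBABILITY.index(probability) + 1   # 1..5
    i = IMPACT.index(impact) + 1             # 1..5
    # Harder exploits shift the risk left on the grid, floored at "Rare".
    p_adjusted = max(1, p - COMPLEXITY_DISCOUNT[complexity])
    return {
        "probability": PROBABILITY[p_adjusted - 1],
        "impact": impact,
        "score": p_adjusted * i,             # ASSUMPTION: score = probability x impact
    }

# A "Likely" / "Major" risk that is hard to exploit plots as "Unlikely" / "Major".
example = heatmap_cell("Likely", "Major", "high")
print(example)  # {'probability': 'Unlikely', 'impact': 'Major', 'score': 8}

# The discount cannot push a risk below the lowest band.
floor_case = heatmap_cell("Rare", "Catastrophic", "high")
print(floor_case)  # {'probability': 'Rare', 'impact': 'Catastrophic', 'score': 5}
```

Folding complexity into probability means two risks with the same raw likelihood can land in different columns, which is the trade-off accepted to avoid a third axis.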