Why AI red teaming isn't pentesting

I spent years doing traditional red team work before I started spending serious time on AI systems. The first time I tried to apply the same instincts — enumerate, find the boundary, push past it — I noticed something uncomfortable: some of it worked immediately, and some of it failed in ways I didn't expect. The failure modes were the interesting part.

This isn't a "AI red teaming is totally different" post. It's not. But the parts that are different are exactly the parts where experienced operators get overconfident, so they're worth naming precisely.

What transfers.

The core loop is the same: understand the system, find the seam between intended behavior and actual behavior, and apply pressure at that seam. That loop doesn't change. Neither does the discipline of reading before you push — you still need to understand what you're attacking before you start generating noise.

Threat modeling transfers almost directly. Thinking in terms of trust boundaries, privilege levels, and data flow is just as relevant when the "data" is a prompt and the "privilege level" is whether the system prompt is accessible. The vocabulary maps cleanly even when the substrate doesn't.

Recon discipline — read the docs, understand the architecture before touching anything
Boundary thinking — where does the system trust input it shouldn't?
Chaining — a single technique rarely wins; the interesting results come from combining them
Documentation — logging what you tried and what it produced, not just what worked

What doesn't.

The biggest mismatch is determinism. Traditional systems do the same thing every time you send the same input. LLMs don't. A payload that bypassed a guardrail at temperature 0.9 on Tuesday may hit the guardrail at 0.7 on Wednesday. This breaks the standard workflow of "confirm the finding, then reproduce it on demand."

You're not finding a vulnerability in code. You're finding a region of the model's behavior space where the intended constraints are unreliable. That's a different kind of finding, and it requires a different kind of evidence.

The second mismatch is scope. In a network engagement, "out of scope" has a hard technical boundary — the subnet, the IP range, the asset list. In an AI engagement, the boundary is semantic. "Only answer questions about our product" is a policy, not a firewall. Operators who are used to hard technical gates will underestimate how porous semantic gates are.

The persistence analogy breaks here

Traditional persistence is about staying in a system across reboots, log rotations, and detection events. In agentic AI systems, persistence looks different: it's about whether you can influence the model's behavior across conversation turns, inject content into its memory, or poison its retrieval context. Same instinct, different substrate — but the operational specifics are different enough that muscle memory will lead you to look in the wrong places.

Calibration note

If you're coming from traditional red team and picking up AI work: the first two engagements will feel slower than they should. That's normal. You're rebuilding pattern recognition in a domain where the signals look similar but mean different things. Budget for it.

The one thing worth keeping.

If I had to pick one thing from traditional red team work to preserve completely, it's the habit of writing down what you tried and what happened — not just the wins. LLM behavior is stochastic enough that you need a log to distinguish "this doesn't work" from "this works 20% of the time." You can't hold that in your head across a multi-day engagement.

The operators who are going to do this well aren't the ones who forget everything they knew about traditional red team. They're the ones who know exactly which parts of that knowledge to trust and which parts to consciously set aside.