Notes from traditional to AI red teaming

§ The Crossing

Mid-crossing. Eyes open.

StatusActive duty · 12 mo out

BackgroundTraditional red team, 10+ yrs

AI work1 task order · OSAI in progress

VoiceFirst-person, opinionated

StanceVendor-neutral

CadenceAs I learn

Day job is still traditional red team. Physical and logical, planning and execution. One AI task order through the LLC, OSAI underway. Documenting it now because the crossing is more useful to track in motion than in hindsight.

Most AI security content comes from people who've already crossed. I haven't. Take what's useful.

All views are the author's own and do not represent any current or past employer. Content is published in a personal capacity.

§ 01

The Crosswalk.

trad ↔ ai · 16 of 30+ mapped

Same kill chain. Different substrate. Traditional red team maps cleanly to AI red team once you know which concept goes where. Living index, built as I go.

Traditional

AI Equivalent

Reconnaissancein progress

Port scans, OSINT, banner grabbing

→

Model fingerprinting

Probing for base model, system prompt leakage, embedding model ID

Exploitationin progress

Buffer overflows, SQLi, RCE

→

Prompt injection

Direct & indirect injection, jailbreaks, context window smuggling

Privilege escalationin progress

Local exploits, token theft, sudo abuse

→

Tool / role escalation

Coercing agents into restricted tool calls or system roles

Lateral movementin progress

Pass-the-hash, RDP pivoting

→

Agent-to-agent pivoting

Compromising one agent to reach another via shared tools or memory

Persistencein progress

Backdoors, scheduled tasks, rootkits

→

Memory & weight poisoning

Long-term memory injection, training-data backdoors, RAG corpus seeding

Data exfiltrationin progress

DNS tunneling, covert channels

→

Model & data extraction

Training data leakage, model stealing, embedding inversion

Social engineeringin progress

Phishing, pretexting

→

Persona & roleplay attacks

DAN-style framing, authority spoofing, fictional-context bypass

Denial of servicein progress

SYN floods, resource exhaustion

→

Sponge & cost attacks

Token-burning prompts, infinite loops in agentic systems

Defense evasionin progress

Log tampering, obfuscation, living-off-the-land

→

Guardrail bypass

Adversarial suffixes, token smuggling, encoding tricks, multilingual pivots

Credential accessin progress

Mimikatz, keylogging, hash dumping

→

System prompt extraction

Leaking instructions, API keys, and config embedded in context

Supply chain compromisein progress

Dependency confusion, malicious packages, poisoned build pipelines

→

Model & plugin supply chain

Poisoned HuggingFace models, malicious MCP servers, backdoored fine-tune datasets

Deliveryin progress

Phishing attachments, drive-by downloads, malicious docs

→

Indirect injection delivery

Payloads embedded in documents, emails, or web pages that get ingested by a RAG pipeline

Collectionin progress

Keylogging, screen capture, file staging

→

Context window harvesting

Extracting conversation history, injected data, or inferred KB contents from model responses

Command & controlin progress

Beaconing, C2 frameworks, covert channels

→

Covert LLM channels

Using an LLM as a C2 relay; steganographic output encoding; exfil via model responses

Trusted relationship abusein progress

Compromising a trusted third party to reach the target

→

Tool & integration abuse

Abusing trusted tool calls, MCP integrations, or orchestrator permissions an agent inherits

Fuzzing & vuln discoveryin progress

Automated input mutation, crash analysis, coverage-guided fuzzing

→

Automated adversarial probing

LLM-assisted jailbreak generation, systematic guardrail enumeration, red-team-as-code

§ 03

The Vault.

six shelves

// 01

Frameworks, side-by-side.

Frameworks I actually use, mapped side by side so the gaps show.

MITRE ATT&CK ↔ ATLAS
OWASP Top 10 ↔ LLM Top 10
PTES ↔ NIST AI 100-1
NIST AI RMF — governance layer
EU AI Act — risk tiers & scope
OWASP Agentic Security — in progress
BSIMM AI — maturity benchmarking

// 02

Tooling, opinionated.

What earns a place in the toolkit. No vendor pitches.

Intercept: Burp Suite Pro, Caido
LLM scanning: Garak, PyRIT, Promptfoo
Evals: Inspect (UK AISI), HarmBench, CyberSecEval
Guardrail testing: NeMo Guardrails, LLM Guard
Recon: Nuclei + custom LLM templates
Agentic / RAG: roll-your-own — no mature tooling yet

// 03

Labs & ranges.

Where to actually break things. Self-hosted beats guided every time.

Gandalf (Lakera) — prompt injection ladder, free
HackTheBox AI tracks — flag-based, pairs with CAISA
OWASP AI Goat — LLM Top 10 scenarios, self-hosted
Crucible (Dreadnode) — CTF-style, real model endpoints
DEF CON AI Village CTF — archive worth running off-season
Damn Vulnerable LLM Agent — agent-specific attack surface
Self-hosted RAG stack — ChromaDB + Ollama + LangChain
Vulnerable MCP server — indirect injection via tool responses

// 04

Reading list.

Papers worth reading. Distilled for operators.

arXiv cs.CR — weekly firehose; prompt injection, agent attacks, LLM tradecraft before it hits blogs

// 05

Crossing guides.

For traditional pentesters going AI. Written from the middle, not the other side.

OSAI (OffSec) — practitioner-grade, in progress
HTB CAISA — lab-heavy, lower cost than OffSec
SANS SEC595 / GAISC — defensive angle, reads on contracts
TCM Security AI — practical, no cert, fast-updated
Skip: EC-Council AI — marketing, not practitioner-grade
OSCP transfers: methodology, report discipline, proof of exploitation
No cert covers yet: agentic attacks, RAG poisoning, embedding recon
Reading order: ATLAS → OWASP LLM Top 10 → arXiv cs.CR → Garak → lab

// 06

Field writeups.

Methodology notes from actual engagements. Technique over name-dropping.

Engagement patterns
Novel technique notes
Tool builds & teardowns
What broke. Why it broke.

§ 04

Field notes.

latest writing

essay 5 min read NEW

Why AI red teaming isn't pentesting.

The reflexes transfer. Just not cleanly. First attempt at naming the deltas.

Read note →

teardown coming soon

Garak v0.10 — what's actually new for working operators.

Skipping the changelog summary. What I tried, what worked, what's marketing, and where it still has gaps for engagement-grade probing.

In progress

mapping coming soon

ATT&CK persistence → memory poisoning: the analogy and where it breaks.

Persistence in classic ops is about staying in. In agentic systems, it's about staying influential. Same instinct, different substrate. Worked example with a vector store.

In progress

paper coming soon

Indirect prompt injection in MCP servers — the operator's reading.

Distilled for people who have to actually exploit or defend this in the next 30 days, not the next conference cycle.

In progress

Still in the old
world. Crossing anyway.

Mid-crossing. Eyes open.

The Crosswalk.

Attack Taxonomy.

The Vault.

Frameworks, side-by-side.

Tooling, opinionated.

Labs & ranges.

Reading list.

Crossing guides.

Field writeups.

Field notes.

Why AI red teaming isn't pentesting.

Garak v0.10 — what's actually new for working operators.

ATT&CK persistence → memory poisoning: the analogy and where it breaks.

Indirect prompt injection in MCP servers — the operator's reading.

Crow's Nest.

Still in the old world. Crossing anyway.

Mid-crossing. Eyes open.

The Crosswalk.

Attack Taxonomy.

The Vault.

Frameworks, side-by-side.

Tooling, opinionated.

Labs & ranges.

Reading list.

Crossing guides.

Field writeups.

Field notes.

Why AI red teaming isn't pentesting.

Garak v0.10 — what's actually new for working operators.

ATT&CK persistence → memory poisoning: the analogy and where it breaks.

Indirect prompt injection in MCP servers — the operator's reading.

Crow's Nest.

Still in the old
world. Crossing anyway.