THE CROSSING field notes
About Crosswalk Taxonomy Vault Field notes crows-nest.tech ↗
est. 2026 ● live free to read

Still in the old
world. Crossing anyway.

Still active duty. 10 years traditional red team, one AI task order in, OSAI underway. Sharing what I learn as I go.

§ The Crossing

Mid-crossing. Eyes open.

StatusActive duty · 12 mo out
BackgroundTraditional red team, 10+ yrs
AI work1 task order · OSAI in progress
VoiceFirst-person, opinionated
StanceVendor-neutral
CadenceAs I learn

Day job is still traditional red team. Physical and logical, planning and execution. One AI task order through the LLC, OSAI underway. Documenting it now because the crossing is more useful to track in motion than in hindsight.

Most AI security content comes from people who've already crossed. I haven't. Take what's useful.

All views are the author's own and do not represent any current or past employer. Content is published in a personal capacity.

§ 01

The Crosswalk.

trad ↔ ai · 16 of 30+ mapped

Same kill chain. Different substrate. Traditional red team maps cleanly to AI red team once you know which concept goes where. Living index, built as I go.

Traditional
AI Equivalent
01
Reconnaissancein progress
Port scans, OSINT, banner grabbing
Model fingerprinting
Probing for base model, system prompt leakage, embedding model ID
02
Exploitationin progress
Buffer overflows, SQLi, RCE
Prompt injection
Direct & indirect injection, jailbreaks, context window smuggling
03
Local exploits, token theft, sudo abuse
Tool / role escalation
Coercing agents into restricted tool calls or system roles
04
Lateral movementin progress
Pass-the-hash, RDP pivoting
Agent-to-agent pivoting
Compromising one agent to reach another via shared tools or memory
05
Persistencein progress
Backdoors, scheduled tasks, rootkits
Memory & weight poisoning
Long-term memory injection, training-data backdoors, RAG corpus seeding
06
DNS tunneling, covert channels
Model & data extraction
Training data leakage, model stealing, embedding inversion
07
Phishing, pretexting
Persona & roleplay attacks
DAN-style framing, authority spoofing, fictional-context bypass
08
SYN floods, resource exhaustion
Sponge & cost attacks
Token-burning prompts, infinite loops in agentic systems
09
Defense evasionin progress
Log tampering, obfuscation, living-off-the-land
Guardrail bypass
Adversarial suffixes, token smuggling, encoding tricks, multilingual pivots
10
Mimikatz, keylogging, hash dumping
System prompt extraction
Leaking instructions, API keys, and config embedded in context
11
Dependency confusion, malicious packages, poisoned build pipelines
Model & plugin supply chain
Poisoned HuggingFace models, malicious MCP servers, backdoored fine-tune datasets
12
Deliveryin progress
Phishing attachments, drive-by downloads, malicious docs
Indirect injection delivery
Payloads embedded in documents, emails, or web pages that get ingested by a RAG pipeline
13
Collectionin progress
Keylogging, screen capture, file staging
Context window harvesting
Extracting conversation history, injected data, or inferred KB contents from model responses
14
Beaconing, C2 frameworks, covert channels
Covert LLM channels
Using an LLM as a C2 relay; steganographic output encoding; exfil via model responses
15
Compromising a trusted third party to reach the target
Tool & integration abuse
Abusing trusted tool calls, MCP integrations, or orchestrator permissions an agent inherits
16
Automated input mutation, crash analysis, coverage-guided fuzzing
Automated adversarial probing
LLM-assisted jailbreak generation, systematic guardrail enumeration, red-team-as-code
§ 02

Attack Taxonomy.

v1.1 · six chains · naive + evasion paths

A full attack graph mapping every stage from recon to report across RAG, agent, multi-agent, MCP, and model-layer surfaces. Each chain branches into a naive path and a parallel evasion path.

updated as I learn scroll to zoom · drag to pan v1.1 · MCP chain added

use scroll to zoom — drag to pan — click the controls to reset

§ 03

The Vault.

six shelves
// 01

Frameworks, side-by-side.

Frameworks I actually use, mapped side by side so the gaps show.

  • MITRE ATT&CK ↔ ATLAS
  • OWASP Top 10 ↔ LLM Top 10
  • PTES ↔ NIST AI 100-1
  • NIST AI RMF — governance layer
  • EU AI Act — risk tiers & scope
  • OWASP Agentic Security — in progress
  • BSIMM AI — maturity benchmarking
// 02

Tooling, opinionated.

What earns a place in the toolkit. No vendor pitches.

  • Intercept: Burp Suite Pro, Caido
  • LLM scanning: Garak, PyRIT, Promptfoo
  • Evals: Inspect (UK AISI), HarmBench, CyberSecEval
  • Guardrail testing: NeMo Guardrails, LLM Guard
  • Recon: Nuclei + custom LLM templates
  • Agentic / RAG: roll-your-own — no mature tooling yet
// 03

Labs & ranges.

Where to actually break things. Self-hosted beats guided every time.

  • Gandalf (Lakera) — prompt injection ladder, free
  • HackTheBox AI tracks — flag-based, pairs with CAISA
  • OWASP AI Goat — LLM Top 10 scenarios, self-hosted
  • Crucible (Dreadnode) — CTF-style, real model endpoints
  • DEF CON AI Village CTF — archive worth running off-season
  • Damn Vulnerable LLM Agent — agent-specific attack surface
  • Self-hosted RAG stack — ChromaDB + Ollama + LangChain
  • Vulnerable MCP server — indirect injection via tool responses
// 04

Reading list.

Papers worth reading. Distilled for operators.

  • arXiv cs.CR — weekly firehose; prompt injection, agent attacks, LLM tradecraft before it hits blogs
// 05

Crossing guides.

For traditional pentesters going AI. Written from the middle, not the other side.

  • OSAI (OffSec) — practitioner-grade, in progress
  • HTB CAISA — lab-heavy, lower cost than OffSec
  • SANS SEC595 / GAISC — defensive angle, reads on contracts
  • TCM Security AI — practical, no cert, fast-updated
  • Skip: EC-Council AI — marketing, not practitioner-grade
  • OSCP transfers: methodology, report discipline, proof of exploitation
  • No cert covers yet: agentic attacks, RAG poisoning, embedding recon
  • Reading order: ATLAS → OWASP LLM Top 10 → arXiv cs.CR → Garak → lab
// 06

Field writeups.

Methodology notes from actual engagements. Technique over name-dropping.

  • Engagement patterns
  • Novel technique notes
  • Tool builds & teardowns
  • What broke. Why it broke.
§ 04

Field notes.

latest writing
essay 5 min read NEW

Why AI red teaming isn't pentesting.

The reflexes transfer. Just not cleanly. First attempt at naming the deltas.

Read note →
teardown coming soon

Garak v0.10 — what's actually new for working operators.

Skipping the changelog summary. What I tried, what worked, what's marketing, and where it still has gaps for engagement-grade probing.

In progress
mapping coming soon

ATT&CK persistence → memory poisoning: the analogy and where it breaks.

Persistence in classic ops is about staying in. In agentic systems, it's about staying influential. Same instinct, different substrate. Worked example with a vector store.

In progress
paper coming soon

Indirect prompt injection in MCP servers — the operator's reading.

Distilled for people who have to actually exploit or defend this in the next 30 days, not the next conference cycle.

In progress
§ The operator behind this site

Crow's Nest.

The writing here is personal. Formal work lives at Crow's Nest, an LLC for offensive security across traditional and AI systems.

crows-nest.tech ↗
crowsnest.tech
Offensive security. Traditional and AI.
  • Traditional red team
  • AI & agentic system red teaming
  • 1 AI task order completed
  • Accepting work post-transition
the crossing one operator · both sides always free to read vendor neutral · opinion strong crows-nest.tech the crossing one operator · both sides always free to read vendor-neutral · opinion-strong crows-nest.tech