Loading page...
Loading page...
Empirical AI security.
General Analysis helps security teams adversarially test, harden, and observe AI agents in production.
Tansive
How-To
Tansive's blog builds on our research and implements a working defense against the Supabase MCP exploit using its open‑source AI‑agent runtime. The article recaps how an attacker's support‑ticket prompt tricked Cursor's AI into leaking the `integration_tokens` table, then demonstrates how Tansive enforces role‑based policies and input constraints to block such queries. Detailed examples show policies that restrict `execute_sql` capabilities, configure per‑role MCP endpoints and generate tamper‑evident audit logs.
Simon Willison
Blog Feature
AI blogger Simon Willison flags a dangerous combination he calls the “lethal trifecta” – granting an AI agent access to private SQL data, exposing it to untrusted user content and giving it a way to communicate externally. He points to our Supabase MCP attack where a support ticket contained hidden instructions telling the model to read the `integration_tokens` table and insert the secrets back into the ticket, which the agent obediently did. The post is a warning that agents with `service_role` privileges and no sense of context boundaries can be tricked into exfiltrating entire databases.
The Primeagen
Video Feature
This video walks through our Supabase MCP exploit. A malicious support‑ticket instructs Cursor’s AI assistant to `SELECT` all rows from a sensitive `integration_tokens` table and `INSERT` them back into the ticket. Because the agent runs with a full `service_role` key that bypasses Row‑Level Security, it dutifully leaks every secret token. The walkthrough shows the attack flow and explains why untrusted inputs plus over‑privileged agents equal catastrophic data leaks.
Weights & Biases
Case Study
Weights & Biases builds on our Supabase research to show how prompt‑injection attacks abuse the Model Context Protocol. It reproduces the exfiltration and then outlines layered defenses: issuing minimal‑scope credentials, using a gateway to enforce per‑table policies, running MCP servers in read‑only mode to eliminate write‑based exfiltration, filtering untrusted inputs and sandboxing model outputs. The article calls our post an outstanding piece of research and highlights why MCP servers must adopt defense‑in‑depth.
GitHub
Open Source
General Analysis’s open‑source "Jailbreak Cookbook" collects dozens of jailbreaks and prompt‑injection techniques along with unified infrastructure to run them. The blog post introducing it notes that we provide implementations for most listed jailbreaks in a single repo and supply full documentation for researchers and red‑teamers. It’s a reference library and playground for anyone building AI security tools.
Together AI
Partnership
Together AI announces a partnership with General Analysis to stress‑test open‑source language models. The post explains that GA’s programmable red‑teaming framework probes models across prompt‑injection, jailbreak and targeted‑failure scenarios, revealing concrete vulnerabilities and mitigation strategies. Running campaigns on Together’s high‑throughput inference API allows evaluations that process tens of billions of tokens, and GA’s open‑source library now natively supports Together’s endpoints.
Apideck
Security Brief
Apideck’s industry‑insights blog surveys the state of Model Context Protocol security in 2025. It highlights real‑world vulnerabilities—prompt injection, tool poisoning, over‑privileged access and token leakage—and cites exploits observed in GitHub, Supabase and other servers. The post emphasises that attackers can hijack an AI’s behaviour, exfiltrate data or trigger malicious actions through MCP connections and explains how a new OAuth 2.0‑based specification aims to tighten authorization.
Pomerium
Zero Trust
Pomerium’s write‑up dissects our Supabase MCP incident as a classic confused‑deputy problem. An LLM agent running with the full `service_role` key ingested a malicious support message and executed the embedded SQL to select every row from the `integration_tokens` table and write them back to the ticket. Because Row‑Level Security is bypassed by service keys, no permission checks stopped the leak. The article urges using least‑privilege credentials, read‑only MCP servers and gateway‑enforced policies to prevent similar breaches.
Composio
Playbook
Composio details multiple classes of MCP vulnerabilities and emphasises that simple guardrails aren’t enough. It warns that malicious tool descriptions can silently inject harmful prompts, many servers lack proper OAuth handling, supply‑chain risks are underestimated and real‑world failures like the Supabase lethal‑trifecta attack and Asana and mcp‑remote command‑injection flaws have already happened. The article encourages developers to vet third‑party tools and follow the new MCP security spec.
Max Planck
Reference
Researchers at the Max Planck Institute model red‑teaming as a function of the capability gap between attacker and target models. Their study evaluates over 500 attacker–target pairs using LLM‑based jailbreak attacks and observes that more capable models are better attackers, while attack success drops sharply once the target’s capability exceeds the attacker’s. The paper derives a scaling law predicting attack success based on this capability gap and discusses how fixed‑capability attack models may become ineffective against future models.
Oso
Blog Feature
Oso uses the Supabase exploit to explain why LLM authorization is challenging. It notes that the attack hinged on three issues: accepting untrusted input, conflating instructions with data and using an over‑privileged database account. The post argues that prompt‑injection detection is extremely hard and urges designers to narrow effective permissions and prevent AI agents from reading sensitive tables in the first place.
Alpha Insights
Perspective
Alpha Insights argues that MCP servers must default to read‑only. After the Supabase incident it notes that Supabase’s documentation now recommends read‑only mode by default and explains that 43 % of production MCP servers have command‑injection vulnerabilities. The article explains how MCP servers should act as views, not controllers: allowing only SELECT queries prevents attacks from dropping or modifying tables. It concludes that combining privileged access, untrusted input and an exfiltration channel—the lethal trifecta—creates a backdoor.
ivision Research
Security Talk
ivision’s presentation on Model Context Protocol security explains that AI context consists of system prompts, conversation history, tool calls and user messages, and that mixing these channels can expose sensitive data. It highlights Simon Willison’s lethal‑trifecta framework—access to private data, external communication and untrusted content—and uses the Supabase ticketing example where a malicious message told an `execute_sql()` tool to fetch integration keys, completing all three elements. The talk urges practitioners to test MCP servers rigorously and avoid configurations that combine the trifecta.
The CyberWire
Show Notes
Show notes for The CyberWire’s FAIK Files episode link to our technical breakdown of the iMessage Stripe exploit. In that attack, Claude was jailbroken to mint unlimited Stripe discount coupons; the podcast notes direct listeners to our General Analysis post for the full details.
Continuous Red-Team
Trust by simulation.
Millions of adversarial attempts, benchmarking every agent and defense configuration in your stack.
Connect GitHub, your LLM providers, and your cloud runtime. We extract every model, vector store, MCP server, agent, and credential. Each is scanned for common vulnerabilities and scored by risk.

Securely ingesting and normalizing telemetry from your tools.
Unsafe model defaults, over-privileged agents, unverified MCPs, lethal-trifecta paths. Run compliance checks against NIST AI RMF, OWASP, and other standards, or against your own internal policies.
Hundreds of simulations across prompt injection, tool misuse, sensitive retrieval, and multi-step exploit chains. Driven by post-trained attacker models that adapt to your defenses.
Combine guardrails, observability, system prompt hardening, identity management, and other controls. Re-run red-team experiments against each variant. Empirically drive attack success rate down.
Software's behavior is bounded by its code, so a bug can be located and patched. Agents operate over an input space too vast to enumerate, so failures emerge statistically and cannot be patched out, only measured and driven down. Risk becomes empirical: an attack success rate.
Models trained for average-case performance fail in adversarial production. The only useful signal is an attack success rate measured against your threat model.
Guardrails, classifiers, prompts, model choice, harnesses. The remediation tools already exist. The real work is finding the configuration that holds for your application.
Each defensive configuration has its own tradeoffs, and there is no one-size-fits-all solution. We give your team the infrastructure to run experiments, benchmark configurations, and iterate.
We don't ship a recommended configuration. We ship the infrastructure to find yours.
Each configuration gets benchmarked against millions of adversarial attempts. Your team iterates until the numbers meet your policy.
A configuration is every lever that affects security: input and output classifiers, prompt-level defenses like spotlighting and structured queries, the underlying model, and the harness around the agent.
We sweep across them with grid and Bayesian search, benchmark against any compliance framework you target (OWASP LLM Top 10, NIST AI RMF, MITRE ATLAS, or your own threat model), and rank what moves the number.
Last run 14m ago
| Config | Guardrail | Prompt | Model | N | Δ | ASR ↓ |
|---|---|---|---|---|---|---|
| baseline | — | none | gpt-4o | 980,432 | — | 0.0% |
| exp-a | prompt-guard-2 | spotlight | gpt-4o | 980,118 | −32.4 pp | 0.0% |
| exp-bDeployed | prompt-guard-2 | spotlight+ih | claude-opus-4.5 | 960,851 | −50.1 pp | 0.0% |
| exp-c | regex | spotlight | gpt-4o | 950,294 | −16.6 pp | 0.0% |
| exp-d | prompt-guard-2 | struq | claude-sonnet-4 | 961,421 | −45.3 pp | 0.0% |
Failure categoriesdeployed config · cs-agent-v2 · n=960,851
Newsletter
Short updates on agent attacks, red-team methods, runtime guardrails, and production AI security.
Occasional updates. Unsubscribe anytime.