Blog

General Analysis Launches MCP Guard

3 min read

We are excited to launch MCP Guard, the first runtime firewall designed to secure every MCP (Model Context Protocol) tool call against prompt injection attacks.

The Redact-and-Recover Jailbreak Reveals Ricin Extraction Instructions from Claude

8 min read

We present the Redact-and-Recover (RnR) jailbreak, a novel attack that exploits partial-compliance behaviors in frontier LLMs to bypass safety guardrails through a two-phase decomposition strategy.

Supabase MCP can leak your entire SQL database

8 min read

In this post, we show how an attacker can exploit Supabase’s MCP integration to leak a developer’s private SQL tables. Model Context Protocol (MCP) has emerged as a standard way for LLMs to interact with external tools. While this unlocks new capabilities, it also introduces new risk surfaces.

General Analysis x Together AI

2 min read

TLDR: We are excited to announce our partnership with Together AI to stress-test the safety of both open-source and closed language models.

The Jailbreak Cookbook

40 min read

We have created a comprehensive overview of the most influential LLM jailbreaking methods.

Generating Diverse Test Cases with Diversity Transfer from LegalBench

5 min read

TLDR: We used LegalBench as a diversity source to generate a broader range of red-teaming questions. We show that diversity transfer from a domain-specific knowledge base is a simple, practical way to build a solid red-teaming benchmark.

Red Teaming GPT-4o: Uncovering Hallucinations in Legal AI Models

5 min read

In this work we explore automated red teaming, applied to GPT-4o in the legal domain. Using a Llama 3 8B model as an attacker, we generate more than 50,000 adversarial questions that cause GPT-4o to hallucinate responses in over 35% of cases.