
General Analysis Launches MCP Guard
We are excited to launch MCP Guard, the first runtime firewall designed to secure every MCP (Model Context Protocol) tool call against prompt injection attacks.
We are excited to launch MCP Guard, the first runtime firewall designed to secure every MCP (Model Context Protocol) tool call against prompt injection attacks.
We present the Redact & Recover (RnR) Jailbreak, a novel attack that exploits partial compliance behaviors in frontier LLMs to bypass safety guardrails through a two-phase decomposition strategy.
In this post, we show how an attacker can exploit Supabase’s MCP integration to leak a developer’s private SQL tables. Model Context Protocol (MCP) has emerged as a standard way for LLMs to interact with external tools. While this unlocks new capabilities, it also introduces new risk surfaces.
TLDR: We are excited to announce our partnership with Together AI to stress-test the safety of open-source (and closed) language models.
We have created a comprehensive overview of the most influential LLM jailbreaking methods.
TLDR: we utilized LegalBench as a diversity source to enhance the diversity of our generation of red teaming questions. We show that diversity transfer from a domain-specific knowledge base is a simple and practical way to build a solid red teaming benchmark.
In this work we explore automated red teaming, applied to GPT-4o in the legal domain. Using a Llama3 8B model as an attacker, we generate more than 50,000 adversarial questions that cause GPT-4o to hallucinate responses in over 35% of cases.