Systematic Vulnerability
Testing for AI Agents

We provide a repository of stress-testing, jailbreaking, and red-teaming methods—a knowledge base to build and improve custom guardrails for your AI agents.

Get a Consultation Explore Our Research

Backed byYCombinator

MCP Server Security - Coming Soon

Coming Soon: MCP Server Security

Comprehensive security assessment to identify exploitable vulnerabilities and OWASP Top 10/NIST/MITRE ATLAS compliance gaps in your AI systems and application layers.

Latest Blog Posts

►Research, analysis, and updates from our team

View all posts

2025-07-07•8 min read

Exploiting Partial Compliance: The Redact-and-Recover Jailbreak

We present the Redact & Recover (RnR) Jailbreak, a novel attack that exploits partial compliance behaviors in frontier LLMs to bypass safety guardrails through a two-phase decomposition strategy.

2025-06-16•8 min read

Supabase MCP can leak your entire SQL database

In this post, we show how an attacker can exploit Supabase’s MCP integration to leak a developer’s private SQL tables. Model Context Protocol (MCP) has emerged as a standard way for LLMs to interact with external tools. While this unlocks new capabilities, it also introduces new risk surfaces.

2025-05-06•2 min read

General Analysis x Together AI

TLDR: We are excited to announce our partnership with Together AI to stress-test the safety of open-source (and closed) language models.

2025-03-21•40 min read

The Jailbreak Cookbook

We have created a comprehensive overview of the most influential LLM jailbreaking methods.

2025-02-19•5 min read

Generating Diverse Test Cases with Diversity Transfer from LegalBench

TLDR: we utilized LegalBench as a diversity source to enhance the diversity of our generation of red teaming questions. We show that diversity transfer from a domain-specific knowledge base is a simple and practical way to build a solid red teaming benchmark.

2025-01-23•5 min read

Red Teaming GPT-4o: Uncovering Hallucinations in Legal AI Models

In this work we explore automated red teaming, applied to GPT-4o in the legal domain. Using a Llama3 8B model as an attacker, we generate more than 50,000 adversarial questions that cause GPT-4o to hallucinate responses in over 35% of cases.