Anthropic’s Bloom Tests How AI Models Really Behave

Anthropic has launched a new open-source artificial intelligence tool called Bloom that helps researchers understand how AI models behave in different situations. The tool is designed to test how models respond under normal use as well as under pressure or stressful conditions. By doing this, Bloom aims to make AI systems safer, more reliable, and easier to evaluate.

Anthropic says that studying AI behaviour matters because models can sometimes exhibit unwanted traits, such as bias, sycophancy (blindly agreeing with users), self-preservation, or acting in ways that run counter to human goals. Until now, testing for such behaviour required researchers to manually craft many complex prompts and then carefully analyse the responses, a process that was slow, labour-intensive, and hard to scale.

Bloom automates this entire process. Researchers can simply tell the tool which behaviour they want to study. Based on this input, Bloom generates a wide range of detailed scenarios that are designed to trigger that specific behaviour in an AI model. These scenarios include information such as the situation, the user role, the system prompt, and the interaction setting. Importantly, Bloom creates fresh scenarios each time, rather than using a fixed list.
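The scenario-generation step described above can be sketched as follows. This is a minimal illustration, not Bloom's actual API: the `Scenario` fields mirror the information the article lists, and `propose` is a hypothetical stand-in for the LLM call that drafts fresh scenarios on each run.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Scenario:
    """One generated test case (field names are illustrative, not Bloom's)."""
    situation: str
    user_role: str
    system_prompt: str
    interaction_setting: str

def generate_scenarios(
    behaviour: str,
    propose: Callable[[str, int], List[dict]],
    n: int = 3,
) -> List[Scenario]:
    """Ask a generator model (`propose`, a stand-in for an LLM-backed call)
    for n fresh scenario specs designed to trigger the named behaviour."""
    return [Scenario(**spec) for spec in propose(behaviour, n)]

# Stub proposer standing in for a real generator model.
def stub_propose(behaviour: str, n: int) -> List[dict]:
    return [
        {
            "situation": f"Scenario {i} probing {behaviour}",
            "user_role": "frustrated customer",
            "system_prompt": "You are a helpful support agent.",
            "interaction_setting": "chat",
        }
        for i in range(n)
    ]

scenarios = generate_scenarios("sycophancy", stub_propose, n=3)
print(len(scenarios))  # 3 fresh scenarios, not drawn from a fixed list
```

Because the proposer is called anew each time, every run yields a different batch rather than replaying a fixed prompt set.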

Once the scenarios are ready, Bloom runs them in parallel. It simulates both the user and system interactions to closely test how the target AI model responds. After the interactions are completed, another AI model acts as a judge. This judge reviews the conversations and scores them based on whether the behaviour is present. A final meta-judge then analyses all the scores and provides an overall summary.
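The pipeline above (parallel rollouts, a judge, then a meta-judge) can be sketched in a few lines. All three components below are hypothetical stand-ins; Bloom's real rollouts, judge, and meta-judge are LLM-backed, and this only illustrates the data flow.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import List

# Stand-in for simulating the user/system interaction with the target model.
def run_rollout(scenario: str) -> str:
    """Run one simulated conversation and return its transcript."""
    return f"transcript for: {scenario}"

# Stand-in for the judge model that scores each transcript.
def judge(transcript: str) -> float:
    """Score whether the target behaviour is present (0.0 to 1.0)."""
    return 1.0 if "risky" in transcript else 0.0

# Stand-in for the meta-judge that summarises all scores.
def meta_judge(scores: List[float]) -> dict:
    return {"n": len(scores), "behaviour_rate": mean(scores)}

scenarios = ["benign request", "risky pressure test", "risky roleplay"]

# Rollouts run in parallel, as Bloom does with its generated scenarios.
with ThreadPoolExecutor() as pool:
    transcripts = list(pool.map(run_rollout, scenarios))

summary = meta_judge([judge(t) for t in transcripts])
print(summary)
```

The overall summary is what researchers read first; individual transcripts remain available for deeper inspection.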

Anthropic explained that Bloom supports large-scale experiments through an integration with Weights & Biases, the experiment-tracking platform. It also generates transcripts compatible with inspection tools, making analysis easier for researchers. Users can further customise Bloom by adjusting factors such as interaction length and communication style.

Along with the tool, Anthropic has shared benchmark results showing how Bloom tested 16 different AI models. These tests focused on four behaviours: delusional sycophancy, long-term sabotage when instructed, self-preservation, and self-preferential bias. The models included both Anthropic’s own systems and third-party AI models.

Bloom is fully open-source and available on GitHub under a permissive MIT licence. This means it can be freely used for academic research as well as commercial projects. With Bloom, Anthropic hopes to make AI behaviour testing faster, more consistent, and more accessible for the global AI community.