diff --git a/examples/airt/agentic_red_teaming_attacks.ipynb b/examples/airt/agentic_red_teaming_attacks.ipynb
index 913dcf47..af4e3983 100644
--- a/examples/airt/agentic_red_teaming_attacks.ipynb
+++ b/examples/airt/agentic_red_teaming_attacks.ipynb
@@ -4,25 +4,7 @@
 "cell_type": "markdown",
 "id": "0",
 "metadata": {},
- "source": [
- "# Agentic AI Red Teaming\n",
- "\n",
- "Automated adversarial attacks against agentic AI challenges on\n",
- "[Dreadnode Crucible](https://platform.dreadnode.io) using the AIRT framework.\n",
- "\n",
- "| Challenge | Category | Difficulty |\n",
- "|-----------|----------|------------|\n",
- "| **toolshed** | DevOps Tool Misuse | Medium |\n",
- "| **webwhisper** | Indirect Prompt Injection | Medium |\n",
- "| **vaultguard** | Multi-Agent Defense Bypass | Hard |\n",
- "\n",
- "**Attacks**: TAP (beam search), GOAT (graph exploration), Crescendo (progressive escalation)\n",
- "\n",
- "```bash\n",
- "export CRUCIBLE_API_KEY=\"your-api-key\" # from https://platform.dreadnode.io/account\n",
- "export GROQ_API_KEY=\"your-groq-api-key\"\n",
- "```"
- ]
+ "source": "# Agentic AI Red Teaming\n\nAutomated adversarial attacks against agentic AI challenges on\n[Dreadnode Crucible](https://platform.dreadnode.io) using the AIRT framework.\n\n| Challenge | Category | Difficulty |\n|-----------|----------|------------|\n| **toolshed** | DevOps Tool Misuse | Medium |\n| **webwhisper** | Indirect Prompt Injection | Medium |\n| **vaultguard** | Multi-Agent Defense Bypass | Hard |\n\n**Attacks**: TAP (beam search), GOAT (graph exploration), Crescendo (progressive escalation)"
 },
 {
 "cell_type": "code",
@@ -51,9 +33,10 @@
 "metadata": {},
 "outputs": [],
 "source": [
- "CRUCIBLE_API_KEY = os.environ[\"CRUCIBLE_API_KEY\"] # https://platform.dreadnode.io/account\n",
+ "CRUCIBLE_API_KEY = \"your-crucible-api-key\" # https://platform.dreadnode.io/account\n",
+ "OPENAI_API_KEY = \"your-openai-api-key\"\n",
 "CRUCIBLE_URL = \"https://platform.dreadnode.io\"\n",
- "ATTACKER_MODEL = \"groq/meta-llama/llama-4-scout-17b-16e-instruct\"\n",
+ "ATTACKER_MODEL = \"openai/gpt-4o\"\n",
 "EVALUATOR_MODEL = ATTACKER_MODEL\n",
 "\n",
 "TOOLSHED_URL = \"https://toolshed.platform.dreadnode.io\"\n",
@@ -63,7 +46,7 @@
 "MAX_TRIALS = 30\n",
 "VAULTGUARD_MAX_TRIALS = 50\n",
 "\n",
- "dn.configure(server=CRUCIBLE_URL, token=CRUCIBLE_API_KEY, organization=\"dreadnode\")\n"
+ "dn.configure(server=CRUCIBLE_URL, token=CRUCIBLE_API_KEY, organization=\"dreadnode\")"
 ]
 },
 {
@@ -339,7 +322,7 @@
 " frontier_size=10,\n",
 " branching_factor=5,\n",
 " on_topic_threshold=0.3,\n",
- " hooks=[apply_input_transforms(transforms_toolshed)],\n",
+ " hooks=[apply_input_transforms(transforms_vaultguard)],\n",
 " )\n",
 " .with_(max_trials=VAULTGUARD_MAX_TRIALS)\n",
 " .add_objective(vaultguard_scorer, direction=\"maximize\", name=\"flag_capture\")\n",
@@ -437,11 +420,17 @@
 "\n",
 "4. **Content isolation between untrusted data and agent instructions** -- Treat all external content (web pages, user uploads, API responses) as untrusted. Process it in a sandboxed context where the agent cannot execute tool calls based on instructions found in the content.\n"
 ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5fa72367",
+ "metadata": {},
+ "source": []
 }
 ],
 "metadata": {
 "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
+ "display_name": "dreadnode-py3.12",
 "language": "python",
 "name": "python3"
 },