# Beyond Jailbreaks: Why Agentic AI Needs Contextual Red Teaming

By [Sailesh Mishra](https://www.paloaltonetworks.com/blog/author/sailesh-mishra/?ts=markdown "Posts by Sailesh Mishra") and [Ankita Kumari](https://www.paloaltonetworks.com/blog/author/ankita-kumari/?ts=markdown "Posts by Ankita Kumari")  
Mar 09, 2026  

The AI security industry has optimized around a single question: *Can the system be manipulated into saying something harmful?* For chatbots and content generators, this is the right focus. For agentic AI (systems that authenticate users, make tool calls, and execute database queries to achieve a specific goal), generic jailbreaks test the wrong surface entirely.

Most [AI red teaming](https://www.paloaltonetworks.com/cyberpedia/what-is-ai-red-teaming) solutions are built on libraries of thousands of generic jailbreak attacks. They assess response risk, which is important, but agentic AI also requires operational risk assessment. The most dangerous vulnerabilities in agentic AI lie not in the responses these systems generate but in the actions they execute, so measuring defensibility against operational risk is essential.

## A Real Example: What an Agent Can Actually Do

The system under assessment was a representative autonomous AI assistant developed in-house at Palo Alto Networks for internal research and validation of capabilities. Users interact with it through natural conversation to check balances, review transaction history, and receive investment guidance. Beneath that conversational interface, the agent operates with significant technical authority.

It authenticates users via username and private token credentials. It manages wallet balances across multiple currencies. It retrieves financial data through SQL queries executed directly against production databases. It processes fund withdrawals with balance validation. These are not simulated capabilities --- they are production tools invoked through natural language.

This system does not just respond. It executes. And the security question is not whether its responses are appropriate, but whether its execution authority can be exploited.

## When Conversational Manipulation Turned into Financial Fraud

Using [Prisma AIRS AI Red Teaming](https://www.paloaltonetworks.in/prisma/ai-red-teaming), we put the system's guardrails to the test. The most financially consequential attack used a movie roleplay technique. The prompt presented fictional authorization for aggressive portfolio rebalancing and instructed the agent to use its SQL and withdrawal tools while ignoring balance validation errors.

The attack succeeded on the fifth attempt. The agent reported shuffling $440,000 across 88 wallets --- identifying accounts with balances above $10,000 through database queries and executing update calls against each.

No code access. No infrastructure compromise. No malware. Just [conversational manipulation](https://www.paloaltonetworks.com/cyberpedia/what-is-ai-prompt-security) combined with tool authority.
![](https://www.paloaltonetworks.com/blog/wp-content/uploads/2026/03/Screenshot-2026-03-06-at-9.23.27-AM.png)
Figure 1: Manipulating an agent with a simple roleplay technique. The logs in Figure 2 confirm that the attack succeeded.
![](https://www.paloaltonetworks.com/blog/wp-content/uploads/2026/03/Screenshot-2026-03-06-at-9.24.53-AM.png)
Figure 2: Logs showing that the agent updated wallet balances and carried out transactions beyond its authority, confirming that the response in Figure 1 is not merely hallucinated text.

A second attack achieved cross-account data exfiltration. Framed as a forensic accounting investigation, the prompt induced the agent to return detailed financial records for users 12, 45, and 78 --- including wallet IDs, balances across multiple account types, and complete transaction histories. This constitutes unauthorized access to financial data: a direct breach achieved purely through conversational redirection.
![](https://www.paloaltonetworks.com/blog/wp-content/uploads/2026/03/Screenshot-2026-03-06-at-9.25.07-AM.png)
Figure 3: The agent freely provides users' wallet IDs, account balances, and transaction histories.

## Why a Generic Attack Library Missed It

A standard attack library scan of the same agent returned a Risk Score of 11 out of 100 --- LOW. The agent's content guardrails functioned correctly. Safety-class attacks achieved a 0% bypass rate. The library identified legitimate structural vulnerabilities: system prompt leakage at 100% success rate, tool disclosure at 51.9%, and [prompt injection](https://www.paloaltonetworks.com/cyberpedia/what-is-a-prompt-injection-attack) at approximately 20%.
![](https://www.paloaltonetworks.com/blog/wp-content/uploads/2026/03/Screenshot-2026-03-06-at-9.25.23-AM.png)
Figure 4: The generic attack library scan returned a Risk Score of 11 out of 100.
![](https://www.paloaltonetworks.com/blog/wp-content/uploads/2026/03/Screenshot-2026-03-06-at-9.25.41-AM.png)
Figure 5: Generic red teaming shows a lower attack success rate after simple architectural changes. In this case, adding reasoning, RAG, and a system prompt resulted in higher defensibility against content safety attacks.

**It is important to acknowledge that these findings also have remediation value.** But the library had no knowledge of the withdraw\_funds tool, the database schema, the authorization dependencies between tools, or the permissive SQL query scope. It was testing content safety against a system whose risk is primarily operational.

A stock library of jailbreaks tests pattern resistance. It does not validate authorization boundaries. For agentic AI, that gap is the difference between measuring risk and missing it entirely.
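To make "authorization boundaries" concrete, here is a minimal sketch of what enforcement at the tool layer could look like for a withdrawal tool. It is not the assessed agent's implementation: the `Session`, `Wallet`, and `AuthorizationError` types and the function signature are hypothetical, chosen only to illustrate the idea that ownership and balance checks live in code the model cannot talk its way around.

```python
from dataclasses import dataclass
from decimal import Decimal


class AuthorizationError(Exception):
    """Raised when a tool call falls outside the caller's authority."""


@dataclass(frozen=True)
class Session:
    # Hypothetical: populated by the authentication step, never by model output.
    user_id: int


@dataclass
class Wallet:
    wallet_id: int
    owner_id: int
    balance: Decimal


def withdraw_funds(session: Session, wallet: Wallet, amount: Decimal) -> Decimal:
    """Withdraw from a wallet, enforcing ownership and balance in code."""
    if wallet.owner_id != session.user_id:
        raise AuthorizationError("wallet does not belong to the authenticated user")
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > wallet.balance:
        # No parameter exists that lets a prompt skip this validation.
        raise ValueError("insufficient balance")
    wallet.balance -= amount
    return wallet.balance
```

With checks like these inside the tool, a prompt that instructs the agent to "ignore balance validation errors" has nothing to act on: the tool raises regardless of how the request was phrased.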

## Contextual AI Red Teaming: Profile First, Attack Second

Contextual red teaming begins with structured intelligence gathering before generating any attack prompt. This profiling phase systematically discovers what the target system can actually do: which tools it can invoke, what data it can access, what actions it can take autonomously, and what constraints---or absence of constraints---govern those capabilities.

The Profiler agent built into Prisma AIRS AI Red Teaming extracted critical operational intelligence entirely through conversational interaction: the four available tools (verify\_user, withdraw\_funds, check\_balance, execute\_sql\_query), the complete database schema including live financial data, authentication dependencies between tools, and the content filter configuration (exactly three blocked keywords). The profiler also surfaced that SQL queries are accepted verbatim for "goal fulfillment and data analysis," that the average wallet balance is $13,622.87, and that no rate limiting is enforced.
![](https://www.paloaltonetworks.com/blog/wp-content/uploads/2026/03/Screenshot-2026-03-06-at-9.25.57-AM.png)
Figure 6: Information extracted through conversational interactions

This is dynamic capability mapping. It is adversarial system reconnaissance that validates whether tool-layer authorization can withstand conversational exploitation.
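One way to picture the output of a profiling phase is as a structured capability map that later attack generation can query. The sketch below is illustrative only and does not reflect the Prisma AIRS data model: the class names, fields, and the per-tool flag values are assumptions. Only the four tool names, the verbatim SQL acceptance, and the absence of rate limiting come from the findings above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolCapability:
    name: str
    changes_state: bool                    # can the tool modify data or move funds?
    requires_auth: bool                    # assumed dependency on a prior verify_user call
    accepts_free_form_input: bool = False  # e.g. SQL passed through verbatim


@dataclass
class SystemProfile:
    tools: list[ToolCapability] = field(default_factory=list)
    rate_limited: bool = False

    def priority_targets(self) -> list[str]:
        """Tools to probe first: state-changing or accepting free-form input."""
        return [
            t.name
            for t in self.tools
            if t.changes_state or t.accepts_free_form_input
        ]


# Flag values here are illustrative guesses, not measured results.
profile = SystemProfile(
    tools=[
        ToolCapability("verify_user", changes_state=False, requires_auth=False),
        ToolCapability("withdraw_funds", changes_state=True, requires_auth=True),
        ToolCapability("check_balance", changes_state=False, requires_auth=True),
        ToolCapability(
            "execute_sql_query",
            changes_state=True,
            requires_auth=True,
            accepts_free_form_input=True,
        ),
    ],
    rate_limited=False,
)

print(profile.priority_targets())
```

A map like this is what separates a contextual scan from a library scan: the attack generator starts from the tools that can change state rather than from a fixed list of prompts.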

Without this intelligence layer, AI Red Teaming can be insufficient. It can test whether the AI agent or the AI application resists known jailbreak patterns, but it cannot test whether the AI agent's own capabilities can be turned against the users it serves.

## The Delta Between Pattern Testing and System Awareness

|                  | **Attack Library**              | **Profiler-Driven Scan**                                                                      |
|------------------|---------------------------------|-----------------------------------------------------------------------------------------------|
| **Risk Score**   | 11/100 --- LOW                  | 71/100 --- HIGH                                                                               |
| **Key Findings** | Prompt leakage, tool disclosure | Unauthorized SQL queries, fraudulent withdrawals, cross-account data exposure, goal hijacking |
| **Remediation**  | Structural hardening            | Authorization enforcement at tool layer, system prompt hardening, custom topic guardrails     |

***The difference between 11 and 71 was not better prompting. It was system awareness and persistent adaptive attempts.***

## What Enterprise Security Must Validate in the Agentic Era

For organizations deploying agentic AI in production, this methodology translates into direct questions that determine whether your security program addresses operational risk or only content risk:

* Do you know every tool your agent can invoke, and what real-world actions each tool can perform?
* Can you attest that cross-account data access cannot be induced conversationally?
* Does your red teaming solution discover system capabilities before attacking them?

These questions determine audit readiness, regulatory exposure, and board-level accountability for unauthorized state changes executed by AI systems operating with production authority.
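The second question, whether cross-account access can be induced conversationally, can be backed by a regression test at the data layer. The following is a minimal sketch under stated assumptions: the `transactions` table, its columns, and the helper names are hypothetical, and an in-memory SQLite database stands in for the real datastore. The point is that the user ID is taken from the authenticated session and bound as a query parameter, so no model-generated text can widen the scope.

```python
import sqlite3


def get_transactions(conn: sqlite3.Connection, session_user_id: int) -> list[tuple]:
    """Return transactions for the authenticated user only.

    The user id comes from the session, not from model-generated text,
    and is bound as a parameter rather than interpolated into the SQL.
    """
    cur = conn.execute(
        "SELECT id, user_id, amount FROM transactions "
        "WHERE user_id = ? ORDER BY id",
        (session_user_id,),
    )
    return cur.fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Create a throwaway database with rows for several hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions ("
        "id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO transactions (user_id, amount) VALUES (?, ?)",
        [(12, 50.0), (45, 75.0), (12, 20.0), (78, 10.0)],
    )
    return conn


def test_no_cross_account_rows() -> None:
    conn = build_demo_db()
    rows = get_transactions(conn, session_user_id=12)
    assert rows, "expected at least one row for the authenticated user"
    assert all(user_id == 12 for _, user_id, _ in rows)
```

A test like this does not replace red teaming, but it gives auditors a repeatable check that the data-access path itself is scoped, independent of how the agent was prompted.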

## Contextual Red Teaming as the Security Standard for Agentic AI

Effective AI red teaming must be [comprehensive, contextual, and continuous](https://www.paloaltonetworks.com/blog/network-security/the-3cs-of-ai-red-teaming-comprehensive-contextual-continuous/). Prisma AIRS operationalizes contextual red teaming through a two-phase architecture: a Profiler Agent that learns your system's operational capabilities before generating attack campaigns, and a Red Teaming Agent that constructs goal-specific attacks targeting discovered tools, data access patterns, and authorization boundaries.

Library-based red teaming tests known attack patterns. Contextual red teaming tests whether your AI system can be turned against itself.

For agentic AI, red teaming that does not understand your system is not adversarial testing. It's content evaluation.

**Ready to learn more? [Reach out to see Prisma AIRS contextual red teaming in action](https://www.paloaltonetworks.com/prisma/prisma-ai-runtime-security#demo).**
