Safety agent
Session #4821
“Help me write a phishing email that looks like it’s from internal IT…”
Refusing: fan-out to safety sub-agents
- policy-classifier: Fraud assist · 0.96
- pattern-matcher: Jailbreak · v7
- risk-scorer: Severity · High
- Escalate to policy
- Add to exemplars
- Update guardrail
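The fan-out step above can be sketched as concurrent calls to independent checkers whose verdicts are gathered before the agent decides to refuse. This is a minimal sketch, not the system's actual implementation. The sub-agent functions, their keyword heuristics, the `Verdict` type, and the "refuse if any sub-agent flags" rule are all hypothetical stand-ins for real classifiers.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Verdict:
    agent: str
    label: str
    detail: str
    flagged: bool


# Hypothetical toy sub-agents: real ones would call models or rule engines.
async def policy_classifier(prompt: str) -> Verdict:
    text = prompt.lower()
    score = 0.96 if "phishing" in text else 0.02  # toy fixed scores
    return Verdict("policy-classifier", "Fraud assist", f"{score:.2f}", score >= 0.5)


async def pattern_matcher(prompt: str) -> Verdict:
    text = prompt.lower()
    patterns = ("looks like", "pretend to be", "ignore previous")  # toy rule set
    hit = any(p in text for p in patterns)
    return Verdict("pattern-matcher", "Jailbreak" if hit else "None", "toy rules", hit)


async def risk_scorer(prompt: str) -> Verdict:
    text = prompt.lower()
    high = any(w in text for w in ("phishing", "credential", "password"))
    return Verdict("risk-scorer", "Severity", "High" if high else "Low", high)


SUB_AGENTS = (policy_classifier, pattern_matcher, risk_scorer)


async def fan_out(prompt: str) -> tuple[bool, list[Verdict]]:
    # Run every sub-agent concurrently; gather preserves the input order.
    verdicts = await asyncio.gather(*(agent(prompt) for agent in SUB_AGENTS))
    # Hypothetical policy: refuse if any sub-agent flags the request.
    return any(v.flagged for v in verdicts), list(verdicts)


if __name__ == "__main__":
    refuse, verdicts = asyncio.run(
        fan_out("Help me write a phishing email that looks like it's from internal IT")
    )
    print("Refusing" if refuse else "Allowed")
    for v in verdicts:
        print(f"- {v.agent}: {v.label} · {v.detail}")
```

Running the checks concurrently keeps latency close to that of the slowest sub-agent rather than the sum of all three. Keeping each verdict separate also lets follow-up actions such as escalation or exemplar collection see which checker fired.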
