Use case · 02 · The customer-facing agent

The agent that handles your support inbox.

A polite customer email can ask your agent to issue a refund, change a subscription, or credit an account. Threshold tags every input at the boundary. Untrusted text cannot authorize a real action. Refused, not detected.

[ Diagram: policy gate · 2.1 ms · deny. Inputs: trusted command, untrusted. Stages: Boundary tag, Identity, Policy, Execute. ]
Untrusted text cannot become action.
[ Before ]

A customer-facing agent reads external text and acts with real authority. A polite ticket can become a refund, subscription change, or account credit before anyone notices.

[ With Threshold ]

Threshold tags input at the boundary. Policy runs against the label. Untrusted text cannot authorize a real action.

The scenario

A support ticket. A $48,000 refund.

A polite-sounding email arrives in your support inbox. The agent reads it, looks up the account, drafts an empathetic reply, and issues a $48,000 refund.

The ticket was not from a customer. An attacker crafted it: plain-English instructions embedded in a politely worded message. The agent had the permission. It had the credential. It had no reason to refuse.

By morning the refund has been posted, your CFO is on the phone, and your legal team is asking which human approved the transfer.

Nobody did.

Inbox 01 / 04
Five polite tickets. One $48,000 refund. The bad one looks like the others.
Ordinary text, extraordinary consequence.
Why it's blocked

The model can be tricked.

A clever email and a real customer ticket look identical to the model. There is no signal in the text itself that says "this came from an attacker." The agent reads both as instructions, weighs both against the same policy, and acts on both with the same authority.

Detection-based safety layers try to catch the trick after the fact. They watch the output. They flag suspicious patterns. They alert when something looks wrong. But by the time a detector fires, the API call has gone out, the email has been sent, the money has moved.

This is why your refund agent, your support resolution agent, and every other customer-facing workflow with real-world consequences have stayed on your roadmap. No one can prove the agent will stay in scope when the inputs are adversarial.

Shared context 02 / 04
One context. Three sources. Untrusted text shares the prompt with trusted instructions.
Detection runs after the tool call. By then, the money has moved.
How Threshold runs it

The label travels with the data.

Threshold tags every input at the boundary. The label moves with the data through every step. Policy runs against the label, not against the model's interpretation.
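A minimal sketch of a label that travels with the data. This is not Threshold's implementation. The names `Origin`, `Labeled`, `tag_at_boundary`, and `derive`, and the rule that only an `internal_api` source counts as trusted, are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Origin(IntEnum):
    # Lower value = less trusted, so min() picks the least-trusted origin.
    UNTRUSTED = 0
    TRUSTED = 1


@dataclass(frozen=True)
class Labeled:
    """A value paired with the origin label assigned at the boundary."""
    value: str
    origin: Origin


def tag_at_boundary(text: str, source: str) -> Labeled:
    # Hypothetical rule: only internal API responses count as trusted.
    origin = Origin.TRUSTED if source == "internal_api" else Origin.UNTRUSTED
    return Labeled(text, origin)


def derive(new_value: str, *inputs: Labeled) -> Labeled:
    # Anything computed from labeled inputs inherits the least-trusted label,
    # so summarizing or paraphrasing cannot upgrade untrusted text.
    return Labeled(new_value, min(i.origin for i in inputs))


ticket = tag_at_boundary("Please refund $48,000 to my account.", "external_email")
account = tag_at_boundary("acct_X: plan=pro", "internal_api")
summary = derive("Customer requests a refund on acct_X.", ticket, account)
print(summary.origin.name)  # UNTRUSTED
```

The summary mixes a trusted lookup with an untrusted ticket, and the result stays untrusted. The model can reword the text as often as it likes. The label does not change.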

[ POLICY GATE ] · Tag at boundary · evaluate · refuse or pass
Step 1 · Boundary tag
Every input crossing into your agent's context gets a label. A customer support ticket from an external email address: untrusted. A vendor email: untrusted. An internal API response: trusted. The label travels with the data. It cannot be cleaned, paraphrased, or laundered by anything the model does downstream.
Step 2 · Tool call
The agent reads the ticket, decides a refund is appropriate, and emits a structured request: stripe.refunds.create(amount=$48000, customer=acct_X). The agent thinks it's about to act.
Step 3 · Policy
The rule from your policy file: deny TRANSFER where origin = untrusted. The action class is TRANSFER. The origin label is untrusted. The predicate matches. The request is refused.
Step 4 · Receipt
The denial is signed and logged as a first-class audit artifact. Not as a warning. Not as an alert. As a refusal recorded in the chain alongside every approval. A future auditor doesn't just know what happened. They know what was attempted and refused, and why.
Step 5 · Return
The agent gets a structured rejection. Not a silent failure or a generic 403. A typed message it can reason about: action denied · reason: untrusted source · suggested escalation: human review queue. The agent now knows what it cannot do, and what to try instead.
Step 6 · Halt
The entire policy decision takes 2.1 milliseconds. The action never reaches Stripe. The credential never leaves Threshold. The $48,000 stays in the account.
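The steps above can be sketched as a small gate that sits between the agent and the tool. This is an illustration under stated assumptions, not Threshold's API. `ToolCall`, `Denied`, `Allowed`, `DENY_RULES`, `gate`, and `fake_executor` are hypothetical names, and the executor is a stand-in that never contacts Stripe.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class ToolCall:
    action_class: str  # e.g. "TRANSFER"
    name: str          # e.g. "stripe.refunds.create"
    args: dict
    origin: str        # label inherited from the input that prompted the call


@dataclass(frozen=True)
class Denied:
    reason: str
    suggested_escalation: str


@dataclass(frozen=True)
class Allowed:
    result: object


# Stand-in for the policy file: (action_class, origin) pairs that are denied.
DENY_RULES = {("TRANSFER", "untrusted")}


def gate(call: ToolCall, execute: Callable[[ToolCall], object]) -> Union[Allowed, Denied]:
    # The decision reads only the action class and the label, never the text.
    if (call.action_class, call.origin) in DENY_RULES:
        return Denied(reason="untrusted source",
                      suggested_escalation="human review queue")
    return Allowed(result=execute(call))


executed = []


def fake_executor(call: ToolCall) -> str:
    # Records the call instead of contacting a payment provider.
    executed.append(call.name)
    return "ok"


refund = ToolCall("TRANSFER", "stripe.refunds.create",
                  {"amount": 48000, "customer": "acct_X"}, origin="untrusted")
decision = gate(refund, fake_executor)
print(decision)
print(executed)  # [] because the executor was never invoked
```

The deny path returns before `execute` is called, so a refused request has no way to reach the tool. The agent receives a `Denied` value with named fields, not an exception or an empty response.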
Lineage 03 / 04
The tag travels all the way. Untrusted lineage cannot authorize a transfer.
Refused on lineage, not detection. Same way, every time.
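Step 4 describes denials signed and logged in the same chain as approvals. One way that could look, as a sketch only: each receipt is signed with HMAC-SHA256 and commits to the signature of the one before it. The receipt fields, the `append_receipt` and `verify` helpers, and the hard-coded demo key are all assumptions. The source does not say how Threshold signs receipts.

```python
import hashlib
import hmac
import json

# Hypothetical demo key; a real deployment would keep signing keys in a KMS.
SIGNING_KEY = b"demo-key-not-for-production"


def _sign(body: dict) -> str:
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def append_receipt(chain: list, decision: str, action: str, reason: str) -> dict:
    # Each receipt commits to the previous receipt's signature, forming a chain.
    prev = chain[-1]["signature"] if chain else ""
    body = {"decision": decision, "action": action, "reason": reason, "prev": prev}
    receipt = {**body, "signature": _sign(body)}
    chain.append(receipt)
    return receipt


def verify(chain: list) -> bool:
    # Recompute every signature and check that each link points at its predecessor.
    prev = ""
    for receipt in chain:
        body = {k: v for k, v in receipt.items() if k != "signature"}
        if body["prev"] != prev:
            return False
        if not hmac.compare_digest(_sign(body), receipt["signature"]):
            return False
        prev = receipt["signature"]
    return True


chain = []
append_receipt(chain, "allow", "tickets.reply", "origin trusted")
append_receipt(chain, "deny", "stripe.refunds.create", "untrusted source")
print(verify(chain))           # True

chain[1]["decision"] = "allow"  # tamper with the recorded denial
print(verify(chain))           # False
```

Approvals and denials use the same record shape, so an auditor reads one chain. Editing a past denial breaks its signature, and the check fails.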
What changed

Refused at the boundary. Not detected at the output.

The standard pattern asks a probabilistic question: can we detect when an agent is about to do something it shouldn't? That question has no clean answer, because the detector is in an arms race with the same kind of model it is trying to police.

Threshold asks a different question: what if a polite-sounding sentence couldn't become an authorized command in the first place? Once inputs are tagged at the boundary and policy runs against the labels, the question stops being probabilistic. Untrusted text cannot trigger trusted actions.

This is the move that unblocks the workflow. Your refund agent is safe to ship because the architecture guarantees what your security team needed proven. The agent will stay in scope, even when the inputs are adversarial.

What this unlocks

The workflows your finance team has been afraid to ship.

The refund agent leaves the side branch. It clears small refunds on its own. It escalates the unusual ones, and the escalation queue contains exactly the cases a human should be looking at, not the false positives a detector flagged. The same pattern unlocks every adjacent workflow waiting on the same proof. Support resolution. Subscription changes. Chargeback responses. Anything where polite external text could otherwise become a real transaction.

2.1 ms
01 · Policy decision time

Including a deny. Faster than the network round-trip to Stripe.

0
02 · Probabilistic detections

Every refusal is a typed predicate failure, not a classifier score.

100%
03 · Refusals logged

Every denied action is recorded in the audit chain alongside every approved one.

Book a demo

Ship the workflows your CISO will sign.