Every operation an agent performs carries a confidence score between 0 and 1. The threshold is the cut-off you set: at or above it the agent acts autonomously, below it the operation escalates to human review.
Choosing a starting point
| Risk profile | Suggested start | Example workloads |
|---|---|---|
| Low — reversible actions | 0.75 | Data enrichment, ticket tagging |
| Medium — visible to customers | 0.85 | Response drafting, invoice coding |
| High — financial or regulatory impact | 0.95 | Payment release, compliance attestation |
Tuning safely
- Hold the threshold for at least a week of representative volume.
- Measure the false-autonomy rate: autonomous operations that later needed correction.
- If the rate is near zero and the escalation queue is busy, lower the threshold by 0.05 and repeat.
- If corrections appear, raise it back and investigate the input pattern causing them instead of chasing the number.
Anti-patterns
- Threshold 0 — turns off human review entirely; never appropriate for financial or regulated workflows.
- Per-incident tuning — moving the threshold after every single mistake produces oscillation, not improvement.