
Confidence thresholds

Every operation an agent performs carries a confidence score between 0 and 1. The threshold is the cut-off you set: at or above it the agent acts autonomously, below it the operation escalates to human review.
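The rule above can be sketched in a few lines of Python. This is only an illustration: `Route` and `route_operation` are hypothetical names, not part of any documented API, and the range check is an added assumption.

```python
from enum import Enum


class Route(Enum):
    """Where an operation goes after its confidence score is checked."""
    AUTONOMOUS = "autonomous"
    HUMAN_REVIEW = "human_review"


def route_operation(confidence: float, threshold: float) -> Route:
    """Hypothetical helper: apply the at-or-above rule to one operation."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {confidence}")
    # At or above the threshold the agent acts on its own;
    # anything below escalates to human review.
    if confidence >= threshold:
        return Route.AUTONOMOUS
    return Route.HUMAN_REVIEW
```

Note that a score exactly equal to the threshold counts as autonomous, which matches the "at or above" wording.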

Choosing a starting point

Risk profile	Suggested start	Example workloads
Low — reversible actions	0.75	Data enrichment, ticket tagging
Medium — visible to customers	0.85	Response drafting, invoice coding
High — financial or regulatory impact	0.95	Payment release, compliance attestation
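One way to keep these starting points in code is a small lookup keyed by risk profile. The keys `"low"`, `"medium"`, and `"high"` and the function name are illustrative assumptions; only the three values come from the table.

```python
# Suggested starting thresholds from the table above.
# The profile keys are hypothetical labels chosen for this sketch.
STARTING_THRESHOLDS = {
    "low": 0.75,     # reversible actions
    "medium": 0.85,  # visible to customers
    "high": 0.95,    # financial or regulatory impact
}


def starting_threshold(risk_profile: str) -> float:
    """Return the suggested starting threshold for a risk profile."""
    try:
        return STARTING_THRESHOLDS[risk_profile]
    except KeyError:
        known = ", ".join(sorted(STARTING_THRESHOLDS))
        raise ValueError(
            f"unknown risk profile {risk_profile!r}; expected one of: {known}"
        ) from None
```

Failing loudly on an unknown profile avoids silently falling back to a threshold that is too permissive for the workload.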

Tuning safely

Hold the threshold for at least a week of representative volume.
Measure the false-autonomy rate: the share of autonomous operations that later needed correction.
If the rate is near zero and the escalation queue is busy, lower the threshold by 0.05 and repeat.
If corrections appear, raise it back and investigate the input pattern causing them instead of chasing the number.
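The tuning loop above might be sketched as follows. The text does not define "near zero" or "busy", so `near_zero_rate` and `busy_queue_size` are placeholder assumptions to replace with your own limits; `ReviewWindow` and both function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReviewWindow:
    """Counts collected while the threshold was held steady (e.g. one week)."""
    autonomous_total: int      # operations the agent completed on its own
    autonomous_corrected: int  # of those, how many later needed correction
    escalated_total: int       # operations sent to human review


def false_autonomy_rate(window: ReviewWindow) -> float:
    """Share of autonomous operations that later needed correction."""
    if window.autonomous_total == 0:
        return 0.0
    return window.autonomous_corrected / window.autonomous_total


def next_threshold(
    current: float,
    previous: float,
    window: ReviewWindow,
    near_zero_rate: float = 0.001,  # assumption: what counts as "near zero"
    busy_queue_size: int = 100,     # assumption: what counts as a "busy" queue
    step: float = 0.05,
) -> float:
    """Suggest the threshold for the next observation window."""
    if false_autonomy_rate(window) > near_zero_rate:
        # Corrections appeared: go back to the last value that worked.
        # Investigating the input pattern remains a manual task.
        return previous
    if window.escalated_total >= busy_queue_size:
        # Clean window and a busy queue: lower by one step.
        return round(current - step, 2)
    # Clean window, quiet queue: nothing to gain from lowering.
    return current
```

Calling `next_threshold` once per observation window, rather than once per mistake, keeps the adjustment tied to a representative sample.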

Anti-patterns

Threshold 0 — turns off human review entirely; never appropriate for financial or regulated workflows.
Per-incident tuning — moving the threshold after every single mistake produces oscillation, not improvement.
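The first anti-pattern can be caught before a configuration is saved. The sketch below assumes a hypothetical `regulated` flag that marks financial or regulated workflows; it is not a documented setting.

```python
def validate_threshold(threshold: float, regulated: bool) -> float:
    """Reject threshold values that the guidance above rules out.

    `regulated` is a hypothetical flag for financial or regulated workflows.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be between 0 and 1, got {threshold}")
    if regulated and threshold == 0.0:
        # A threshold of 0 lets every operation through without review.
        raise ValueError(
            "threshold 0 turns off human review; "
            "not allowed for financial or regulated workflows"
        )
    return threshold
```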
