A CI agent returns PASS or FAIL, but under a binary verdict a 51% risk score and a 99% risk score look identical. Binary verdicts hide the uncertainty that matters most near the decision boundary.
The uncertainty quantification pipeline extracts 6 features, estimates aleatoric uncertainty with conformal prediction, and routes the decision through a tiered gate.
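The conformal step can be illustrated with a minimal split-conformal sketch. It assumes the risk score lies in [0, 1] and that a held-out calibration set of absolute residuals is available; the function names and the residual-based nonconformity score are illustrative assumptions, not the pipeline's actual implementation. The default α of 0.10 matches the 90% coverage target quoted on this page.

```python
import math

def conformal_quantile(residuals, alpha=0.10):
    """Split-conformal quantile of absolute calibration residuals.

    residuals: |predicted risk - observed risk| on a held-out calibration set
    (hypothetical input; the source does not specify the nonconformity score).
    """
    n = len(residuals)
    if n == 0:
        raise ValueError("need at least one calibration residual")
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)), 1-indexed.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to guarantee coverage at this alpha.
        return float("inf")
    return sorted(residuals)[rank - 1]

def predict_interval(score, q):
    """Symmetric interval around a risk score, clipped to [0, 1]."""
    return max(0.0, score - q), min(1.0, score + q)

# Example: 19 calibration residuals from 0.01 to 0.19.
calib = [i / 100 for i in range(1, 20)]
q = conformal_quantile(calib, alpha=0.10)
near_boundary = predict_interval(0.51, q)   # interval straddles 0.5
far_from_boundary = predict_interval(0.99, q)
```

With these illustrative numbers, the interval around a 0.51 score crosses the 0.5 decision boundary while the interval around 0.99 does not, which is the distinction a bare PASS/FAIL verdict discards.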
A reliability diagram shows calibration quality. Bars should align with the diagonal — model confidence should match actual accuracy. Conformal calibration corrects systematic overconfidence near the decision boundary.
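The gap between the bars and the diagonal is usually summarized as expected calibration error (ECE). The following is a minimal sketch using equal-width confidence bins; the bin count and the input format (a confidence per prediction, plus 1 if the prediction was correct and 0 otherwise) are assumptions for illustration.

```python
def expected_calibration_error(confidences, outcomes, n_bins=10):
    """ECE: weighted mean of |mean confidence - accuracy| over equal-width bins."""
    n = len(confidences)
    if n == 0 or n != len(outcomes):
        raise ValueError("confidences and outcomes must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, outcomes):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece

# Overconfident example: 90% stated confidence, 75% observed accuracy.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

In this example the single occupied bin sits 0.15 below the diagonal, so the ECE is 0.15; a perfectly calibrated model would score 0.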
The uncertainty score (0–1) determines routing. Low uncertainty → auto-decision. High uncertainty → human review. Watch four PRs land in their zones.
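The routing rule can be sketched as a simple threshold gate. The source names only the two outcomes (auto-decision and human review); the threshold value of 0.5 and the names `Route` and `route_pr` are hypothetical, and a gate with more tiers would add further thresholds in the same way.

```python
from enum import Enum

class Route(Enum):
    AUTO = "auto-decision"
    HITL = "human review"

def route_pr(uncertainty, hitl_threshold=0.5):
    """Route a PR by its uncertainty score in [0, 1].

    hitl_threshold is an illustrative value, not one taken from the source.
    """
    if not 0.0 <= uncertainty <= 1.0:
        raise ValueError("uncertainty must lie in [0, 1]")
    # Scores at or above the threshold are escalated to a human reviewer.
    return Route.HITL if uncertainty >= hitl_threshold else Route.AUTO

# Four hypothetical PRs with made-up uncertainty scores.
decisions = [route_pr(u) for u in (0.05, 0.30, 0.55, 0.90)]
```

Lowering the threshold sends more PRs to reviewers and raises the escalation rate; raising it increases the auto-decision rate at the cost of more unreviewed borderline cases.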
Calibration quality
ECE (pre-calibration)—
ECE (post-calibration)—
Coverage @ 90% target—
Brier score reduction—
Gate performance
Auto-decision rate—
HITL escalation rate—
False HITL escalations—
Reviewer agreement—
Uncertainty features
Features used—
Top feature: LLM conf. σ (weight 0.31)
Conformal α: 0.10 (90% cov.)
Avg uncertainty bound: ± 0.142
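Two of the calibration metrics listed above, coverage against the 90% target and the Brier score, are straightforward to compute. The sketch below assumes intervals are given as (low, high) pairs alongside realized values, and that the Brier score is taken over binary outcomes; both input formats are assumptions, since the page does not describe how its metrics are computed.

```python
def empirical_coverage(intervals, realized):
    """Fraction of realized values that fall inside their predicted interval."""
    if not realized or len(intervals) != len(realized):
        raise ValueError("inputs must be non-empty and equal length")
    covered = sum(1 for (lo, hi), y in zip(intervals, realized) if lo <= y <= hi)
    return covered / len(realized)

def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not outcomes or len(probs) != len(outcomes):
        raise ValueError("inputs must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)

# Illustrative values only.
coverage = empirical_coverage(
    [(0.0, 0.5), (0.2, 0.4), (0.6, 1.0), (0.1, 0.3)],
    [0.3, 0.5, 0.9, 0.2],
)
brier = brier_score([0.9, 0.2], [1, 0])
```

With α = 0.10, empirical coverage on fresh data should sit close to 0.90; a value well below that suggests the calibration set no longer matches the incoming PRs.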
K11tech Agentic AI QA series
1. Single-repo QA pipeline (LangGraph · 14 agents)
2. Detect–Fix–Learn loop (Consensus gate · Auto-remediation)
3. Cross-repo impact analysis (Contract registry · Graph traversal)
4. Uncertainty quantification (This paper · Conformal prediction · HITL gate)
5. Uncertainty source classification (DATA vs SCOPE · Type-stratified thresholds)
Open source · Apache License Version 2.0 · builds on the Microservice QA System