Beyond binary verdicts: aleatoric uncertainty quantification
in agentic CI/CD quality pipelines

K11tech Uncertainty QA System · K11 Software Solutions LLC, Texas, United States · Paper 4

Conformal prediction · 6 uncertainty-aware features · 90% coverage guarantee · HITL confidence gating

A CI agent returns PASS or FAIL, but a 51% risk score and a 99% risk score produce the same verdict. Binary verdicts hide the uncertainty that matters most near the decision boundary.

The uncertainty quantification pipeline extracts 6 features, estimates aleatoric uncertainty with conformal prediction, and routes the decision through a tiered gate.
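The conformal step can be illustrated with a minimal split-conformal sketch. It is not the K11tech implementation: the function names `conformal_halfwidth` and `risk_interval`, the use of absolute residuals as the nonconformity score, and the calibration data are all assumptions made for illustration. Only α = 0.10 (90% target coverage) comes from this page.

```python
import math
import numpy as np

def conformal_halfwidth(cal_scores, cal_labels, alpha=0.10):
    """Split-conformal half-width q (hypothetical helper).

    Under exchangeability, [score - q, score + q] covers the true label
    with probability at least 1 - alpha.
    """
    scores = np.asarray(cal_scores, dtype=float)
    labels = np.asarray(cal_labels, dtype=float)
    # Nonconformity score (assumed here): absolute residual on held-out data.
    residuals = np.abs(labels - scores)
    n = residuals.size
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to certify this coverage level.
        return float("inf")
    return float(np.sort(residuals)[rank - 1])

def risk_interval(score, q):
    """Clip the conformal interval to the valid risk range [0, 1]."""
    return max(0.0, score - q), min(1.0, score + q)
```

With a half-width in hand, a 51% risk score and a 99% risk score no longer look alike: the first interval straddles the decision boundary, while the second stays well above it.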

k11tech-uncertainty-qa — PR #107 · payment-service

$ k11tech-qa run --pr 107 --uncertainty-mode

A reliability diagram shows calibration quality. Bars should align with the diagonal: in a well-calibrated model, confidence matches observed accuracy. Conformal calibration corrects systematic overconfidence near the decision boundary.
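The gap between the bars and the diagonal is commonly summarized as expected calibration error (ECE). The sketch below uses the standard equal-width binned estimate for a binary risk model. The function name, the choice of 10 bins, and the inputs are assumptions, not details taken from the K11tech pipeline.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: weighted mean of |observed frequency - mean confidence|."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges map each probability to a bin index in 0..n_bins-1;
    # the clip is a safeguard for values at the extremes.
    bin_ids = np.clip(np.digitize(probs, edges[1:-1]), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        # Height of the reliability bar versus the diagonal for this bin.
        gap = abs(labels[mask].mean() - probs[mask].mean())
        # Weight the gap by the fraction of predictions in the bin.
        ece += mask.mean() * gap
    return float(ece)
```

A model that predicts 0.9 on ten PRs of which only five are actually risky has an ECE of 0.4, whereas one that predicts 0.8 with eight of ten risky is close to 0.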

The uncertainty score (0–1) determines routing. Low uncertainty → auto-decision. High uncertainty → human review. Watch four PRs land in their zones.
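The routing rule can be sketched as a small gate function. The names `GateDecision` and `route_pr`, the uncertainty threshold of 0.30, and the 0.50 risk cutoff are hypothetical values chosen for illustration. The page does not state the thresholds the real gate uses.

```python
from dataclasses import dataclass
from typing import Optional

UNCERTAINTY_THRESHOLD = 0.30  # hypothetical: at or above this, escalate to a human
RISK_CUTOFF = 0.50            # hypothetical: PASS/FAIL boundary for auto-decisions

@dataclass(frozen=True)
class GateDecision:
    route: str               # "AUTO" or "HITL"
    verdict: Optional[str]   # "PASS"/"FAIL" for AUTO, None when a human decides

def route_pr(risk: float, uncertainty: float) -> GateDecision:
    """Route a PR by uncertainty first, then decide by risk if confident."""
    if not (0.0 <= risk <= 1.0 and 0.0 <= uncertainty <= 1.0):
        raise ValueError("risk and uncertainty must both lie in [0, 1]")
    if uncertainty >= UNCERTAINTY_THRESHOLD:
        # High uncertainty: no automatic verdict, send to human review.
        return GateDecision(route="HITL", verdict=None)
    # Low uncertainty: the agent decides on its own.
    verdict = "FAIL" if risk >= RISK_CUTOFF else "PASS"
    return GateDecision(route="AUTO", verdict=verdict)
```

Checking uncertainty before risk means a borderline score with a wide bound is never auto-decided, which is the case the binary verdict hides.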

Calibration quality

ECE (pre-calibration)—

ECE (post-calibration)—

Coverage @ 90% target—

Brier score reduction—

Gate performance

Auto-decision rate—

HITL escalation rate—

False HITL escalations—

Reviewer agreement—

Uncertainty features

Features used—

Top feature: LLM conf. σ (weight 0.31)

Conformal α: 0.10 (90% cov.)

Avg uncertainty bound: ± 0.142

K11tech Agentic AI QA series

Single-repo QA pipeline

LangGraph · 14 agents

Detect–Fix–Learn loop

Consensus gate · Auto-remediation

Cross-repo impact analysis

Contract registry · Graph traversal

Uncertainty quantification

This paper · Conformal prediction · HITL gate

Uncertainty source classification

DATA vs SCOPE · Type-stratified thresholds

Open source · Apache License Version 2.0 · builds on the Microservice QA System

GitHub repo · DOI: doi.org/10.5281/zenodo.20685323
