Research and Development
1
Autonomous CI/CD Quality Assurance Using LangGraph Multi-Agent Orchestration and Risk-Proportionate Human-in-the-Loop Control
The base system combines 14 agents, 7 MCP servers, parallel dispatch, a human-in-the-loop gate, DeepEval, and RAGAS to coordinate software delivery checks with measurable risk control and production-readiness signals.
System Profile
- 14 specialized agents
- 7 MCP servers
- Parallel dispatch execution
- Risk-proportionate HITL control
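As a rough illustration of how parallel dispatch and a risk-proportionate gate can fit together, the sketch below runs stand-in agents concurrently and asks for human review only when the highest reported risk crosses a threshold. It uses plain `asyncio` rather than LangGraph. The agent names, the canned risk scores, the max-risk aggregation, and the `hitl_threshold` value are all assumptions made for illustration, not details from the paper.

```python
import asyncio


async def run_agent(name, risk):
    # Hypothetical stand-in for a real agent call; `risk` is a canned score in [0, 1].
    await asyncio.sleep(0)
    return name, risk


async def dispatch(agent_risks, hitl_threshold=0.6):
    # Launch every agent concurrently and wait for all of them.
    results = await asyncio.gather(
        *(run_agent(name, risk) for name, risk in agent_risks.items())
    )
    # Assumed aggregation rule: the riskiest agent drives the gate decision.
    top_agent, top_risk = max(results, key=lambda item: item[1])
    return {
        "risk": top_risk,
        "driver": top_agent,
        "needs_human": top_risk >= hitl_threshold,
    }


report = asyncio.run(dispatch({"lint": 0.1, "security": 0.8, "tests": 0.3}))
```

Here `report["needs_human"]` is true because the hypothetical security agent reports a risk above the assumed threshold; a low-risk change would pass without a human gate.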
Evaluation Stack
- DeepEval for automated quality checks
- RAGAS for retrieval and answer assessment
- Controls tuned to delivery risk
- Traceable demo workflow
Demo
K11 Tech Lab Agentic AI QA Demo
Open the live demo to see the LangGraph orchestration flow, quality gates, and human-in-the-loop control in action.
Open Demo↗
2
Beyond Static Gates: Closing the Detect-Fix-Learn Loop in Agentic CI/CD Quality Assurance
Consensus Risk Scoring, Automated Remediation, and Adaptive HITL Threshold Learning
This work extends Paper 1 by introducing adaptive decision controls that quantify uncertainty, learn from reviewer outcomes, and close the loop through autonomous fix generation.
Three Innovations
- Multi-LLM Consensus Gate (epistemic uncertainty signal)
- Adaptive Risk Threshold (online learning from reviewer decisions)
- Auto-Remediation Agent (detect-to-fix loop closure)
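The first two innovations can be pictured with a small sketch. It treats the spread of risk scores across several models as the uncertainty signal, and nudges the escalation threshold after each reviewer decision. The function names, the use of standard deviation as the disagreement measure, the fixed step size, and the clamping bounds are assumptions chosen for illustration; the paper's actual scoring and learning rules may differ.

```python
from statistics import mean, pstdev


def consensus_gate(scores, threshold, max_disagreement=0.15):
    # Mean score is the consensus risk; spread across models is the uncertainty signal.
    risk = mean(scores)
    disagreement = pstdev(scores)
    needs_review = risk >= threshold or disagreement > max_disagreement
    return risk, disagreement, needs_review


def update_threshold(threshold, reviewer_approved, step=0.05, low=0.1, high=0.9):
    # Assumed online rule: an approved escalation suggests the gate was too strict,
    # so raise the threshold; a rejection suggests it should be stricter, so lower it.
    adjusted = threshold + step if reviewer_approved else threshold - step
    return min(high, max(low, adjusted))


# Three hypothetical model scores that disagree strongly about the same change.
risk, disagreement, needs_review = consensus_gate([0.2, 0.8, 0.5], threshold=0.6)
```

In this example the mean risk (0.5) is below the threshold, yet the change is still escalated because the models disagree, which is the situation a single-model gate would miss.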
Demo
Beyond Static Gates: Detect-Fix-Learn Demo
Open the interactive demo to explore consensus risk scoring, automated remediation, and adaptive HITL threshold learning.
Open Demo↗
3
System-Level Impact Analysis for Microservice CI/CD via Cross-Repository Dependency Graphs
Extends the knowledge store to capture inter-service API contracts, enabling downstream impact analysis from a single PR trigger.
Workflow Highlights
- Automatically extracts versioned API contracts (OpenAPI 3.x, gRPC, GraphQL) from pull requests and stores them in a persistent Contract Registry.
- Traverses a directed service dependency graph to identify every downstream consumer of a changed endpoint, both direct and transitive.
- Runs a parallel ContractComplianceAgent for each affected consumer to validate whether the proposed change breaks actual usage patterns.
- Triggers a cross-repository human-in-the-loop gate when impact exceeds a set threshold or when multiple consumers are confirmed to break.
- Files GitHub issues in provider and consumer repositories and notifies affected team channels via Slack.
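The traversal and gating steps above can be sketched as a breadth-first search over a provider-to-consumer map. The graph shape, the service names, the `impact_threshold` value, and the reading of "multiple" as two or more are assumptions for illustration only.

```python
from collections import deque


def downstream_consumers(consumers_of, changed_service):
    # Breadth-first walk from the changed service; depth 1 means a direct consumer,
    # larger depths mean transitive consumers.
    depth = {}
    queue = deque([(changed_service, 0)])
    while queue:
        service, level = queue.popleft()
        for consumer in consumers_of.get(service, ()):
            # Skip already-visited services so cycles in the graph terminate.
            if consumer != changed_service and consumer not in depth:
                depth[consumer] = level + 1
                queue.append((consumer, level + 1))
    return depth


def needs_cross_repo_gate(impacted, breaking, impact_threshold=3):
    # Assumed rule: gate on wide impact, or on two or more confirmed breaking consumers.
    return len(impacted) > impact_threshold or len(breaking) >= 2


# Hypothetical service graph: provider -> services that call it.
graph = {
    "billing": ["checkout", "invoicing"],
    "checkout": ["web"],
    "web": ["billing"],  # cycle back to the provider
}
impacted = downstream_consumers(graph, "billing")
```

Each service in `impacted` would then receive its own compliance check, and the confirmed breaking consumers feed `needs_cross_repo_gate`.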
Demo
K11 Tech Lab Microservice QA Demo
Open the interactive demo to see cross-repository dependency graph traversal, contract compliance checks, and impact-proportionate human-in-the-loop gates in action.
Open Demo↗
4
Beyond Binary Verdicts: Aleatoric Uncertainty Quantification in Agentic CI/CD Quality Pipelines
Jadhav, Kavita (Researcher)
Continuous integration pipelines that rely on LLM-based quality agents produce binary pass/fail verdicts that discard the probabilistic uncertainty inherent in model inference. This paper extends the K11tech Agentic AI QA System with six uncertainty-aware features (F1–F6) that propagate per-agent confidence through the pipeline and expose it to human reviewers in a principled way.
Six Uncertainty-Aware Features
- F1–F6: Per-agent confidence propagation through the quality pipeline
- Aleatoric uncertainty quantification replacing binary pass/fail verdicts
- Principled exposure of model uncertainty to human reviewers
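The abstract does not spell out F1 to F6, so the following is only a minimal sketch of the general idea: each agent returns a verdict together with a confidence value, and the pipeline surfaces the weakest confidence instead of collapsing everything into pass or fail. The `AgentVerdict` type, the min-confidence rule, and the `review_floor` value are assumptions, not the paper's features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentVerdict:
    agent: str
    passed: bool
    confidence: float  # assumed to lie in [0, 1]


def summarize(verdicts, review_floor=0.7):
    # Carry the least confident agent forward so reviewers can see where doubt lies.
    weakest = min(verdicts, key=lambda v: v.confidence)
    if not all(v.passed for v in verdicts):
        status = "fail"
    elif weakest.confidence < review_floor:
        status = "review"  # passed, but not confidently enough to skip a human
    else:
        status = "pass"
    return {
        "status": status,
        "confidence": weakest.confidence,
        "least_confident_agent": weakest.agent,
    }


summary = summarize([
    AgentVerdict("unit-tests", True, 0.95),
    AgentVerdict("contract-check", True, 0.55),
])
```

A binary pipeline would report this run as a pass; the sketch instead marks it for review and names the agent whose confidence was low.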
Demo
K11 Tech Lab Agentic QA Uncertainty Demo
Open the interactive demo to explore aleatoric uncertainty quantification, per-agent confidence propagation, and principled human reviewer exposure.
Open Demo↗
5
Beyond a Single Threshold: Uncertainty Source Classification and Type-Stratified Conformal Prediction for Agentic CI/CD
Jadhav, Kavita (Researcher)
Agentic CI/CD pipelines that produce confidence-gated verdicts tell reviewers that an agent is uncertain, but not why. This paper introduces Uncertainty Source Classification: a two-category taxonomy (DATA_UNCERTAINTY, SCOPE_UNCERTAINTY) and a secondary LLM classification prompt that identifies which category applies to each flagged consumer verdict in the K11tech Agentic AI QA System.
Uncertainty Taxonomy
- DATA_UNCERTAINTY — aleatoric: evidence is genuinely ambiguous; calls for closer diff review.
- SCOPE_UNCERTAINTY — epistemic: agent lacks domain knowledge; calls for domain specialist escalation.
- Classification runs concurrently, costs O(n) LLM calls (one per flagged verdict), and adds zero pipeline latency.
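A minimal sketch of the routing described above, assuming one classification call per flagged verdict issued concurrently. The `classify` coroutine is a hypothetical stand-in for the secondary LLM prompt, and the `unknown_domain` field is an invented input used only to make the stub deterministic.

```python
import asyncio

# Routing taken from the taxonomy above: each category maps to a reviewer action.
ROUTES = {
    "DATA_UNCERTAINTY": "closer diff review",
    "SCOPE_UNCERTAINTY": "domain specialist escalation",
}


async def classify(verdict):
    # Hypothetical stub for the secondary LLM classification prompt.
    await asyncio.sleep(0)
    return "SCOPE_UNCERTAINTY" if verdict["unknown_domain"] else "DATA_UNCERTAINTY"


async def classify_all(flagged):
    # One call per flagged verdict (O(n) calls), all awaited concurrently.
    labels = await asyncio.gather(*(classify(v) for v in flagged))
    return [(v["consumer"], label, ROUTES[label]) for v, label in zip(flagged, labels)]


flagged = [
    {"consumer": "checkout", "unknown_domain": False},
    {"consumer": "ledger", "unknown_domain": True},
]
routed = asyncio.run(classify_all(flagged))
```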
Evaluation Results
- Type-stratified conformal prediction thresholds reduce unnecessary HITL escalation by 23%.
- Coverage guarantees preserved (FNR ≤ 0.10).
- 120-PR controlled dataset, 42 classified verdicts.
- Inter-rater reliability κ = 0.81, indicating that the taxonomy is operationally stable.
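To make "type-stratified thresholds" concrete, the sketch below calibrates a separate escalation threshold per uncertainty type using a split-conformal-style quantile over the scores of known breaking changes. It is an illustration under stated assumptions, not the paper's procedure: a verdict is escalated when its risk score is at or above the threshold, a false negative is a breaking change scored below it, calibration and future scores are assumed exchangeable, and the calibration data are synthetic. It does not reproduce the reported results.

```python
import math


def conformal_threshold(positive_scores, alpha=0.10):
    # positive_scores: risk scores of calibration verdicts known to be truly breaking.
    # Taking the k-th smallest with k = floor(alpha * (n + 1)) bounds the chance that
    # a new breaking change scores below the threshold by k / (n + 1) <= alpha.
    n = len(positive_scores)
    k = math.floor(alpha * (n + 1))
    if k == 0:
        return 0.0  # too little calibration data: escalate everything
    return sorted(positive_scores)[k - 1]


def stratified_thresholds(calibration, alpha=0.10):
    # calibration: iterable of (uncertainty_type, risk_score) for breaking changes.
    by_type = {}
    for uncertainty_type, score in calibration:
        by_type.setdefault(uncertainty_type, []).append(score)
    return {t: conformal_threshold(s, alpha) for t, s in by_type.items()}


# Synthetic calibration set: 19 DATA scores and 9 SCOPE scores.
calibration = [("DATA_UNCERTAINTY", i / 20) for i in range(1, 20)]
calibration += [("SCOPE_UNCERTAINTY", 0.4 + i / 20) for i in range(9)]
thresholds = stratified_thresholds(calibration, alpha=0.10)
```

Calibrating per type lets a category whose breaking changes score consistently high use a higher threshold than a pooled one would allow, which is one way stratification can cut escalations while keeping the same target false-negative rate.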
Demo
K11 Tech Lab QA Uncertainty Source Demo
Open the interactive demo to explore uncertainty source classification, type-stratified conformal prediction thresholds, and DATA vs SCOPE uncertainty routing in action.
Open Demo↗