K11 Intelligent QA Platform — Claude-native AI QA Automation

Platform Architecture

The CI/CD Orchestrator coordinates 14 specialised agents across 3 pipeline phases. All agents share state through the knowledge-store-mcp (RAG layer). External services — GitHub, Jira, Playwright, k6, PostgreSQL, and Slack — are each a first-class MCP server.

CI/CD Orchestrator Agent

cicd_orchestrator_agent.py · Coordinates phases · Aggregates results · Triggers on PR

1 Planning — sequential

📋 Test Blueprint Agent → ⚡ Test Gen Agent

2 Execution — parallel (asyncio.gather)

🤖 E2E Runner

🔌 API Probe

⚡ Perf Monitor

👁️ Visual AI

♿ Accessibility

🔒 Security Scanner

🛡️ Data Guard

All 7 agents run concurrently · Each reads & writes the shared knowledge store · Results persist even if one agent fails

3 Analysis & Reporting — sequential

🐛 Defect Analyst

📡 CI Monitor

📊 Coverage

💬 Chatbot Eval

📄 Report Gen

knowledge-store-mcp · :8090 · RAG Layer

Shared Knowledge Store

FastMCP · Python · SQLite · 28 tools · 7 domains · Retrieve-before-act pattern

📋 Test Cases

📊 Execution Results

🐛 Defect Patterns

⚡ Perf Baselines

💡 AI Insights

👁️ Visual Baselines

📈 Suite Health

6 External MCP Servers — each a standalone FastMCP service

🎭

playwright-mcp

:8091

Browser automation · 7 tools

⚡

k6-mcp

:8092

Load testing · 5 tools

🐙

github-mcp

:8093

PRs · Checks · Issues · 7 tools

📋

jira-mcp

:8094

Tickets · JQL · Transitions · 6 tools

🗄️

postgres-mcp

:8095

Schema · Integrity · Migrations · 6 tools

💬

slack-mcp

:8096

Alerts · Reports · Threads · 5 tools

Diagram legend: orchestration / planning flow · external MCP integration · RAG read / write (all agents)

How It Works — Core Concepts

This platform is built on six interlocking concepts. Each has a detailed doc in the docs/ folder of the repository.

🔌

Model Context Protocol

The integration layer

MCP is an open standard from Anthropic that lets LLM agents call external tools through a typed, documented contract. Every integration in this platform is an MCP server — not hard-coded function calls.

7 MCP servers expose 64 tools total across the platform
Agents receive tool schemas at runtime — no glue code
Supports stdio (Claude Desktop) and HTTP (agent pipelines)
Tool annotations (readOnlyHint, destructiveHint) signal safety

📄 docs/mcp.md
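
The safety annotations mentioned above can drive a simple client-side guard. The following is a minimal sketch, assuming tool descriptors shaped like the entries of an MCP tool listing (a name plus an annotations object). The `needs_confirmation` helper and the sample descriptors are hypothetical and are not taken from this repository.

```python
from typing import Any

# Hypothetical tool descriptors, shaped like entries of an MCP tool listing.
TOOLS: list[dict[str, Any]] = [
    {"name": "ks_get_test_case", "annotations": {"readOnlyHint": True}},
    {"name": "ks_update_test_case",
     "annotations": {"readOnlyHint": False, "destructiveHint": False}},
    {"name": "ks_delete_test_case",
     "annotations": {"readOnlyHint": False, "destructiveHint": True}},
]


def needs_confirmation(tool: dict[str, Any]) -> bool:
    """Return True when a tool is not read-only and is marked destructive."""
    hints = tool.get("annotations", {})
    if hints.get("readOnlyHint", False):
        return False
    # Assumption in this sketch: a missing destructiveHint counts as destructive.
    return hints.get("destructiveHint", True)


guarded = [tool["name"] for tool in TOOLS if needs_confirmation(tool)]
print(guarded)
```

A pipeline could use a check like this to require an explicit approval step before an agent is allowed to call tools such as ks_delete_test_case.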

🧠

Retrieval-Augmented Generation

The memory layer

Agents don't reason in a vacuum. Before every decision they retrieve relevant history from the knowledge store — past failures, baselines, insights — and augment their reasoning with it.

Task arrives → Retrieve history → Augment prompt → Generate decision → Write results

knowledge-store-mcp is the RAG store — structured retrieval, not vector search
7 domains persist QA knowledge across every CI run
Agents write insights that the next agent reads as context

📄 docs/rag.md
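
The retrieve, augment, write cycle described above can be sketched with plain SQLite. This is an illustration only: the `insights` table, its columns, and the helper names are assumptions made for the example, and the model call itself is left out.

```python
import sqlite3


def retrieve_history(db: sqlite3.Connection, module: str, limit: int = 5) -> list[str]:
    """Structured retrieval: the most recent notes recorded for a module."""
    rows = db.execute(
        "SELECT note FROM insights WHERE module = ? ORDER BY id DESC LIMIT ?",
        (module, limit),
    ).fetchall()
    return [row[0] for row in rows]


def augment_prompt(task: str, history: list[str]) -> str:
    """Prepend retrieved history so the model reasons with prior context."""
    context = "\n".join(f"- {note}" for note in history) or "- (no history)"
    return f"Known history:\n{context}\n\nTask: {task}"


def write_result(db: sqlite3.Connection, module: str, note: str) -> None:
    """Persist a finding so the next agent can retrieve it."""
    db.execute("INSERT INTO insights (module, note) VALUES (?, ?)", (module, note))
    db.commit()


# Hypothetical schema, created in memory for the demo.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE insights (id INTEGER PRIMARY KEY, module TEXT, note TEXT)")
write_result(db, "checkout", "Payment form flaky on slow networks")

prompt = augment_prompt("Plan tests for checkout", retrieve_history(db, "checkout"))
print(prompt)
```

Because retrieval is a filtered SQL query rather than a similarity search, the same task always sees the same history, which keeps runs auditable.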

🤖

Agentic AI

The reasoning layer

Each of the 14 workers is an autonomous agent — not a function call. They follow the ReAct loop: Observe → Reason → Act → Observe, calling tools until the task is done.

Observe → Reason → Call tool → Observe result → Loop / done

claude-opus-4-6 for planning/analysis · claude-sonnet-4-6 for execution
Agents handle errors autonomously — actionable ToolError messages guide retries
Each agent has a narrow, focused system prompt for reliable tool selection

📄 docs/agentic-ai.md

🎯

Multi-Agent Orchestration

The coordination layer

The orchestrator is itself an LLM agent that reasons about which sub-agents to run and in what order. Agents communicate through the knowledge store, not through direct calls.

Phase 1 (Planning) — sequential: Blueprint → Test Gen
Phase 2 (Execution) — 7 agents run in parallel via asyncio.gather
Phase 3 (Analysis) — sequential: Defect → Monitor → Coverage → Chatbot Eval → Report
Knowledge store is the communication bus — decoupled, auditable, replayable

📄 docs/multi-agent-orchestration.md
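
The three-phase flow above can be sketched with asyncio. This is a simplified illustration under stated assumptions: `run_agent` is a hypothetical stand-in for a real agent run, the short agent names are placeholders, and one agent is made to fail on purpose to show that the other results are still collected.

```python
import asyncio


async def run_agent(name: str, fail: bool = False) -> str:
    """Stand-in for one agent run; a real agent would call Claude and MCP tools."""
    await asyncio.sleep(0)
    if fail:
        raise RuntimeError(f"{name} crashed")
    return f"{name}: ok"


async def run_pipeline() -> dict[str, list[str]]:
    report: dict[str, list[str]] = {"planning": [], "execution": [], "analysis": []}

    # Phase 1: sequential, each step waits for the previous one.
    for name in ("blueprint", "test_gen"):
        report["planning"].append(await run_agent(name))

    # Phase 2: concurrent. return_exceptions=True returns failures as values,
    # so one crashed agent does not discard the other six results.
    names = ("e2e", "api_probe", "perf", "visual", "a11y", "security", "data_guard")
    outcomes = await asyncio.gather(
        *(run_agent(n, fail=(n == "visual")) for n in names),
        return_exceptions=True,
    )
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, Exception):
            report["execution"].append(f"{name}: failed ({outcome})")
        else:
            report["execution"].append(outcome)

    # Phase 3: sequential again.
    for name in ("defect", "ci_monitor", "coverage", "chatbot_eval", "report"):
        report["analysis"].append(await run_agent(name))
    return report


report = asyncio.run(run_pipeline())
print(report["execution"])
```

In the platform itself the agents exchange data through the knowledge store rather than through return values; the return values here only keep the sketch short.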

🛠️

LLM Tool Design

The reliability layer

A tool is only as good as its description. Every tool in this platform follows strict design principles so agents can discover, call, and recover from them without human intervention.

{server}_{verb}_{noun} naming — agents narrow candidates by prefix
Docstrings with Args + allowed values + Returns field names
Actionable ToolError messages — agents recover without human help
Returns include pre-computed fields and URLs agents will need

📄 docs/tool-design.md
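
The design rules above can be illustrated with a single tool function. This is a sketch rather than the repository's implementation: the `ToolError` class is a local stand-in for whatever error type the MCP framework provides, the limit range of 1 to 200 is an assumption, and the store query is stubbed out.

```python
class ToolError(Exception):
    """Local stand-in for the error type an MCP framework surfaces to the agent."""


ALLOWED_TEST_TYPES = ("e2e", "api", "perf", "visual")


def ks_list_test_cases(test_type: str, limit: int = 50) -> dict:
    """List test cases of one type.

    Args:
        test_type: One of "e2e", "api", "perf", "visual".
        limit: Maximum number of rows to return (1 to 200, assumed range).

    Returns:
        dict with fields: test_type, limit, count, items.
    """
    if test_type not in ALLOWED_TEST_TYPES:
        # Actionable: say what was wrong and exactly what to send instead.
        raise ToolError(
            f"Unknown test_type '{test_type}'. "
            f"Retry with one of: {', '.join(ALLOWED_TEST_TYPES)}."
        )
    if not 1 <= limit <= 200:
        raise ToolError(f"limit={limit} is out of range. Retry with a value from 1 to 200.")
    items: list[dict] = []  # A real tool would query the store here.
    # count is pre-computed so the agent does not have to derive it.
    return {"test_type": test_type, "limit": limit, "count": len(items), "items": items}


try:
    ks_list_test_cases("ui")
except ToolError as err:
    message = str(err)
    print(message)
```

The error text names the bad argument and lists the valid values, which is what lets an agent correct the call on its next turn without human help.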

🚀

CI/CD Integration

The delivery layer

The platform is a drop-in QA layer for GitHub Actions. A PR opens, a webhook fires, all 7 MCP servers start, the orchestrator runs, and results flow back automatically.

Triggered by pull_request or workflow_dispatch GitHub events
Results posted as GitHub Check Runs, Jira tickets, and Slack messages
Exit code 0/1 drives the PR gate — blocks merge on critical failures
All credentials injected as GitHub Actions secrets — zero hard-coding

📄 docs/cicd-integration.md
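
The exit-code gate described above can be sketched as a small function. The shape of the `findings` records and the rule "fail status plus critical severity blocks the merge" are assumptions made for this example; the orchestrator's real criteria may differ.

```python
import sys


def gate_exit_code(findings: list[dict]) -> int:
    """Return 1 when any finding is a critical failure, else 0."""
    blocking = [
        f for f in findings
        if f["status"] == "fail" and f["severity"] == "critical"
    ]
    for f in blocking:
        print(f"BLOCKING: {f['agent']}: {f['summary']}", file=sys.stderr)
    return 1 if blocking else 0


# Hypothetical findings, invented for the demo.
findings = [
    {"agent": "e2e_runner", "status": "pass", "severity": "critical", "summary": "login flow"},
    {"agent": "visual_ai", "status": "fail", "severity": "minor", "summary": "footer shift"},
    {"agent": "security_scanner", "status": "fail", "severity": "critical", "summary": "token in logs"},
]

code = gate_exit_code(findings)
print(code)
# An orchestrator entry point would finish with: sys.exit(code)
```

GitHub Actions marks a step as failed on any non-zero exit code, so returning 1 is enough to fail the check that a branch protection rule can require.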

Knowledge Store — 7 Domains, 28 MCP Tools

A FastMCP Python server backed by SQLite. The platform's RAG layer — every agent retrieves context before reasoning and writes findings after. Runs in stdio (Claude Desktop) or HTTP (agent pipelines) mode.

knowledge-store-mcp · claude_desktop_config.json
"knowledge-store": {
  "command": "python",
  "args": ["knowledge-store-mcp/server.py", "--stdio"],
  "env": { "KS_DB_PATH": "knowledge_store.db" }
}

📋

5 tools

Test Cases

Test case definitions — name, module, type (e2e/api/perf/visual), priority, tags. Written by Test Gen Agent, read by all execution agents before running.

ks_insert_test_case ks_get_test_case ks_list_test_cases ks_update_test_case ks_delete_test_case

📊

4 tools

Test Execution Results

Every CI run's pass/fail/skip/flaky outcomes — duration, error message, screenshot path, run ID, PR number. Flaky tests detected automatically across runs.

ks_insert_test_result ks_get_run_summary ks_list_test_results ks_list_flaky_tests
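
One possible way to detect flaky tests across runs is a grouped SQL query. The sketch below assumes a hypothetical `test_results` table and treats a test as flaky when it has both passed and failed in the stored runs; the actual rule behind ks_list_flaky_tests is not shown in this document.

```python
import sqlite3

# Hypothetical schema and sample rows, created in memory for the demo.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE test_results (run_id TEXT, test_case_id TEXT, status TEXT)")
db.executemany(
    "INSERT INTO test_results VALUES (?, ?, ?)",
    [
        ("run-1", "TC-1", "pass"), ("run-2", "TC-1", "fail"), ("run-3", "TC-1", "pass"),
        ("run-1", "TC-2", "pass"), ("run-2", "TC-2", "pass"), ("run-3", "TC-2", "pass"),
        ("run-1", "TC-3", "fail"), ("run-2", "TC-3", "fail"), ("run-3", "TC-3", "fail"),
    ],
)


def list_flaky_tests(conn: sqlite3.Connection) -> list[str]:
    """Return tests that both passed and failed across the stored runs."""
    rows = conn.execute(
        """
        SELECT test_case_id
        FROM test_results
        GROUP BY test_case_id
        HAVING SUM(status = 'pass') > 0 AND SUM(status = 'fail') > 0
        ORDER BY test_case_id
        """
    ).fetchall()
    return [row[0] for row in rows]


flaky = list_flaky_tests(db)
print(flaky)
```

TC-2 (always passing) and TC-3 (always failing) are excluded: a consistently failing test is a defect, not a flaky test.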

🐛

4 tools

Defect Patterns

Recurring failure signatures — module, root cause, frequency count, suggested fix. Frequency-ranked across runs, marked resolved when the fix lands.

ks_insert_defect_pattern ks_list_defect_patterns ks_increment_defect_frequency ks_resolve_defect_pattern

⚡

5 tools

Performance Baselines

Approved baselines per endpoint — median latency, p95, p99, error rate. Observations captured each run to detect regressions against stored thresholds.

ks_upsert_perf_baseline ks_get_perf_baseline ks_list_perf_baselines ks_insert_perf_observation ks_get_perf_trend
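
Regression detection against a stored baseline can be sketched as a per-metric comparison. The `LatencyStats` type, the 10% tolerance, and the sample numbers are assumptions chosen for illustration, not values from the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyStats:
    p95_ms: float
    p99_ms: float
    error_rate: float


def find_regressions(
    baseline: LatencyStats, observed: LatencyStats, tolerance: float = 0.10
) -> list[str]:
    """Return the metrics that exceed the baseline by more than `tolerance`."""
    regressions = []
    for metric in ("p95_ms", "p99_ms", "error_rate"):
        base = getattr(baseline, metric)
        seen = getattr(observed, metric)
        if seen > base * (1 + tolerance):
            regressions.append(metric)
    return regressions


# Made-up numbers: p95 rises by 30%, p99 by 2.5%, error rate is unchanged.
baseline = LatencyStats(p95_ms=200.0, p99_ms=400.0, error_rate=0.01)
observed = LatencyStats(p95_ms=260.0, p99_ms=410.0, error_rate=0.01)

regressed = find_regressions(baseline, observed)
print(regressed)
```

A relative tolerance avoids alerting on normal run-to-run noise; an agent could write each flagged metric as an insight and alert only on the metrics returned here.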

💡

3 tools

AI Insights

Claude-generated cross-run observations — anomalies, trends, recommendations, severity. Platform's "working memory": agents leave notes for each other to read.

ks_insert_insight ks_list_insights ks_acknowledge_insight

👁️

4 tools

Visual Baselines

Approved screenshot baselines per page, viewport, and browser. Pixel-diff percentages stored per CI run with pass/warn/fail status.

ks_upsert_visual_baseline ks_get_visual_baseline ks_insert_visual_diff ks_list_visual_diffs
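
The pixel-diff percentage and its pass/warn/fail status can be sketched in a few lines. Images are reduced here to flat lists of pixel values, and the thresholds (warn at 0.5%, fail at 2%) are assumptions for the example rather than the platform's settings.

```python
def diff_percentage(baseline: list[int], current: list[int]) -> float:
    """Percentage of pixel positions whose value differs from the baseline."""
    if len(baseline) != len(current) or not baseline:
        raise ValueError("images must be non-empty and the same size")
    changed = sum(1 for a, b in zip(baseline, current) if a != b)
    return 100.0 * changed / len(baseline)


def classify_visual_diff(diff_pct: float, warn_at: float = 0.5, fail_at: float = 2.0) -> str:
    """Map a pixel-diff percentage to pass, warn, or fail."""
    if diff_pct >= fail_at:
        return "fail"
    if diff_pct >= warn_at:
        return "warn"
    return "pass"


# A 200-pixel "image" in which three pixels changed.
baseline_pixels = [0] * 200
current_pixels = list(baseline_pixels)
current_pixels[10] = current_pixels[11] = current_pixels[12] = 255

pct = diff_percentage(baseline_pixels, current_pixels)
status = classify_visual_diff(pct)
print(pct, status)
```

Real screenshots would first be decoded to pixel arrays with an imaging library; the comparison and classification steps stay the same.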

📈

3 tools

Suite Health

Composite health scores per suite: pass rate, flakiness rate, coverage %, open defects. Platform-wide health summary across all 14 agents and every run.

ks_upsert_suite_health ks_list_suite_health ks_get_platform_health_summary

14 AI QA Agents

Each agent is a specialised Claude-powered worker with its own system prompt, MCP tool scope, and knowledge store domains. Colour tags show which knowledge domains each agent uses. Grey text shows which external MCP servers it connects to.

Test Cases · Results · Defect Patterns · Perf Baselines · AI Insights · Visual · Suite Health

Phase 1 — Planning

📋

Test Blueprint Agent

test_blueprint_agent.py

Analyses a PR diff and produces a structured test plan — which modules are affected, what test types are needed, and priority levels.

MCPs: knowledge-store · github

Test Cases · AI Insights

⚡

Test Generation Agent

test_gen_agent.py

Takes the blueprint, checks for duplicate test cases in the store, generates new test case definitions, and writes them with full metadata.

MCPs: knowledge-store

Test Cases · Results

Phase 2 — Execution (parallel)

🤖

E2E Runner Agent

e2e_runner_agent.py

Retrieves test cases, runs Playwright E2E tests against staging, stores pass/fail results, posts a GitHub Check Run with the outcome.

MCPs: knowledge-store · playwright · github

Results · Defect Patterns · Suite Health

🔌

API Probe Agent

api_probe_agent.py

Validates REST/GraphQL endpoints — contracts, status codes, schemas, and response times. Compares against stored performance baselines.

MCPs: knowledge-store · github

Results · Perf Baselines · Defect Patterns

⚡

Performance Monitor Agent

perf_monitor_agent.py

Runs k6 load tests, compares p95/p99 latency against stored baselines, records observations, flags regressions with Slack alerts.

MCPs: knowledge-store · k6 · slack

Perf Baselines · Results · AI Insights

👁️

Visual AI Agent

visual_ai_agent.py

Captures screenshots with Playwright, compares pixel-by-pixel to stored baselines, records diff percentages, reports visual regressions.

MCPs: knowledge-store · playwright

Visual · Results · AI Insights

♿

Accessibility Agent

accessibility_agent.py

Runs axe-core WCAG 2.1 checks via Playwright, creates Jira tickets for violations above severity threshold, writes remediation insights.

MCPs: knowledge-store · playwright · jira

Results · Defect Patterns · AI Insights

🔒

Security Scanner Agent

security_scanner_agent.py

Scans for OWASP vulnerabilities — exposed headers, injection surface, auth flaws, secrets. Creates Jira and GitHub issues for critical findings.

MCPs: knowledge-store · github · jira

Defect Patterns · AI Insights · Results

🛡️

Data Guard Agent

data_guard_agent.py

Validates PostgreSQL integrity — foreign key violations, schema drift, null constraints, missing PKs. Runs a full migration health check after every deploy.

MCPs: knowledge-store · postgres

Test Cases · Results · Defect Patterns

Phase 3 — Analysis & Reporting

🐛

Defect Analyst Agent

defect_analyst_agent.py

Reads all new failures, correlates with stored defect patterns, estimates root cause, increments frequency counts, creates Jira tickets for new patterns.

MCPs: knowledge-store · jira · github

Defect Patterns · AI Insights · Results

📡

CI Monitor Agent

ci_monitor_agent.py

Computes suite health, detects anomalies in pass rates, posts Slack alerts for critical failures and SLA breaches. Updates platform health snapshot.

MCPs: knowledge-store · github · slack

Suite Health · AI Insights · Results

📊

Coverage Agent

coverage_agent.py

Measures code and requirements coverage per module. Flags untested areas, writes coverage-gap insights, updates suite health records.

MCPs: knowledge-store · github

Suite Health · AI Insights · Test Cases

💬

Chatbot Eval Agent

chatbot_eval_agent.py

Evaluates LLM-powered features for accuracy, tone, safety, and hallucination. Posts structured eval results to Slack and stores quality trend insights.

MCPs: knowledge-store · slack

Results · AI Insights · Perf Baselines

📄

Report Gen Agent

report_gen_agent.py

Reads all 7 knowledge domains after each CI run and generates the executive QA report — posted to Slack and Jira, with GitHub check run details.

MCPs: knowledge-store · jira · slack · github

Test Cases · Results · Defect Patterns · Perf Baselines · AI Insights · Suite Health

Integration Guide

All 7 MCP servers are configured for Claude Desktop via a single JSON paste. For CI/CD pipelines, a GitHub Actions workflow starts all servers and runs the orchestrator automatically on every PR.

All 7 MCP Servers — Claude Desktop Config

Paste into %APPDATA%\Claude\claude_desktop_config.json (Windows) or ~/Library/Application Support/Claude/claude_desktop_config.json (macOS), then restart Claude Desktop.

🧠

knowledge-store

stdio · 28 tools

🎭

playwright

stdio · 7 tools

⚡

k6

stdio · 5 tools

🐙

github

stdio · 7 tools

📋

jira

stdio · 6 tools

🗄️

postgres

stdio · 6 tools

💬

slack

stdio · 5 tools

external-mcps/claude_desktop_config.json (excerpt)
{
  "mcpServers": {
    "knowledge-store": { "command": "python", "args": ["../knowledge-store-mcp/server.py", "--stdio"] },
    "playwright": { "command": "python", "args": ["playwright-mcp/server.py", "--stdio"] },
    "github": { "command": "python", "args": ["github-mcp/server.py", "--stdio"] },
    "jira": { "command": "python", "args": ["jira-mcp/server.py", "--stdio"], "env": { "JIRA_EMAIL": "your@email.com", "JIRA_API_TOKEN": "...", "JIRA_BASE_URL": "https://yourorg.atlassian.net" } },
    "postgres": { "command": "python", "args": ["postgres-mcp/server.py", "--stdio"], "env": { "PG_DSN": "postgresql://user:pass@localhost/mydb" } },
    "slack": { "command": "python", "args": ["slack-mcp/server.py", "--stdio"], "env": { "SLACK_BOT_TOKEN": "xoxb-..." } }
  }
}

CI/CD: GitHub Actions Workflow

Add .github/workflows/ai-qa.yml to your repo. The platform triggers automatically on every pull request to main or develop.

.github/workflows/ai-qa.yml
on:
  pull_request:
    branches: [main, develop]
jobs:
  ai-qa:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Start MCP servers
        run: python knowledge-store-mcp/server.py --port 8090 &
      - name: Run AI QA Orchestrator
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: python agents/cicd_orchestrator_agent.py --pr ${{ github.event.pull_request.number }} --staging-url ${{ secrets.STAGING_URL }}

Calling an Agent — Python Pattern

Every agent is built around client.beta.messages.create(). A while loop drives the ReAct cycle: it executes the tools the model requested, sends the results back, and repeats until stop_reason is no longer "tool_use" (typically "end_turn").

agents/e2e_runner_agent.py (simplified)
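import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# MCP servers this agent may call
tools = [
    { "type": "mcp", "source": { "type": "url", "url": "http://localhost:8090/mcp", "name": "knowledge-store" } },
    { "type": "mcp", "source": { "type": "url", "url": "http://localhost:8091/mcp", "name": "playwright" } },
    { "type": "mcp", "source": { "type": "url", "url": "http://localhost:8093/mcp", "name": "github" } },
]
messages = [{"role": "user", "content": "Run E2E suite for PR #42"}]

def ask():
    # One model turn over the full conversation so far (execution agents use Sonnet)
    return client.beta.messages.create(
        model="claude-sonnet-4-6", max_tokens=4096, tools=tools, messages=messages)

response = ask()
# ReAct loop: run the requested tools, feed the results back, repeat
while response.stop_reason == "tool_use":
    results = execute_tools(response.content)  # project helper (not shown): returns tool_result blocks
    messages += [{"role": "assistant", "content": response.content},
                 {"role": "user", "content": results}]
    response = ask()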
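
The execute_tools helper used in the loop above is not shown in this document. A minimal sketch of what such a helper could look like follows; the `HANDLERS` registry, its stubbed handler, and the simulated content blocks are hypothetical. It returns tool_result blocks keyed by tool_use_id and reports failures back to the model instead of raising.

```python
from types import SimpleNamespace
from typing import Any, Callable

# Hypothetical local handlers keyed by tool name; the returned numbers are made up.
HANDLERS: dict[str, Callable[..., Any]] = {
    "ks_get_run_summary": lambda run_id: {"run_id": run_id, "passed": 41, "failed": 1},
}


def execute_tools(content: list[Any]) -> list[dict[str, Any]]:
    """Run every tool_use block and return the matching tool_result blocks."""
    results = []
    for block in content:
        if getattr(block, "type", None) != "tool_use":
            continue  # Skip text blocks.
        handler = HANDLERS.get(block.name)
        try:
            if handler is None:
                raise LookupError(
                    f"Unknown tool '{block.name}'. Available: {sorted(HANDLERS)}"
                )
            output, is_error = handler(**block.input), False
        except Exception as exc:  # Report the failure to the model instead of crashing.
            output, is_error = str(exc), True
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": str(output),
            "is_error": is_error,
        })
    return results


# Simulated assistant content: one text block and one tool_use block.
content = [
    SimpleNamespace(type="text", text="Checking the run."),
    SimpleNamespace(type="tool_use", id="toolu_01",
                    name="ks_get_run_summary", input={"run_id": "run-7"}),
]
results = execute_tools(content)
print(results)
```

Returning errors as tool_result blocks with is_error set is what allows the model to read the message and retry with corrected arguments.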

Quick Start — Local Development

Run the full platform locally against a staging environment in 3 steps.

Set credentials

environment
export ANTHROPIC_API_KEY=sk-ant-...
export GITHUB_TOKEN=ghp_...
export JIRA_EMAIL=your@email.com
export JIRA_API_TOKEN=...
export JIRA_BASE_URL=https://yourorg.atlassian.net
export SLACK_BOT_TOKEN=xoxb-...
export PG_DSN=postgresql://user:pass@localhost/mydb

Start MCP servers

bash
python knowledge-store-mcp/server.py --port 8090 &
python external-mcps/playwright-mcp/server.py --port 8091 &
python external-mcps/github-mcp/server.py --port 8093 &
sleep 5
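
A fixed sleep can be too short on a slow machine and wastes time on a fast one. As an alternative, a readiness check can poll each server port until it accepts connections. The `wait_for_port` helper below is a hypothetical sketch, not part of the repository; the demo uses a throwaway local listener so it runs without the MCP servers.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Demo against a throwaway listener instead of a real MCP server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # Port 0 lets the OS pick a free port.
listener.listen()
port = listener.getsockname()[1]

ready = wait_for_port("127.0.0.1", port, timeout=2.0)
listener.close()
print(ready)
```

With the servers started as shown above, the same function could be called for ports 8090, 8091, and 8093 before launching the orchestrator.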

Run the orchestrator

bash
python agents/cicd_orchestrator_agent.py \
--pr 42 \
--staging-url https://staging.example.com

Tools Reference

28 tools across 7 knowledge domains — the full knowledge-store-mcp surface area

Tool	Domain	R/W	Description
`ks_insert_test_case`	Test Cases	Write	Insert a new test case definition into the knowledge store. Requires test_case_id, module, name, test_type, and steps.
`ks_get_test_case`	Test Cases	Read	Fetch a single test case by ID. Returns full definition including steps, priority, and module.
`ks_list_test_cases`	Test Cases	Read	List and filter test cases by module, test_type, or status. Default limit 50.
`ks_update_test_case`	Test Cases	Write	Update an existing test case — steps, priority, status, or expected outcome.
`ks_delete_test_case`	Test Cases	Destructive	Delete a test case by ID. Irreversible — destructiveHint: true.
`ks_insert_test_result`	Test Results	Write	Record a test execution result (pass/fail/flaky/skipped) with duration and optional error message.
`ks_get_run_summary`	Test Results	Read	Get aggregated pass/fail/flaky counts and pre-computed pass_rate for a run_id.
`ks_list_test_results`	Test Results	Read	List individual test results for a run, filterable by status. Includes error messages.
`ks_list_flaky_tests`	Test Results	Read	Return test cases with flaky status across recent runs — used by defect_analyst_agent for pattern detection.
`ks_insert_defect_pattern`	Defect Patterns	Write	Store a newly detected defect pattern with module, root_cause, and suggested_fix.
`ks_list_defect_patterns`	Defect Patterns	Read	List known defect patterns filterable by status (active/resolved) and module. Returns root_cause and suggested_fix.
`ks_increment_defect_frequency`	Defect Patterns	Write	Increment the recurrence count for an existing defect pattern. Called when a known pattern is seen again.
`ks_resolve_defect_pattern`	Defect Patterns	Write	Mark a defect pattern as resolved. Triggers Jira auto-transition via defect_analyst_agent.
`ks_upsert_perf_baseline`	Perf Baselines	Write	Create or update a performance baseline (p50/p95/p99 latency, error rate) for an endpoint.
`ks_get_perf_baseline`	Perf Baselines	Read	Retrieve the current performance baseline for a specific endpoint — used by perf_monitor_agent for regression detection.
`ks_list_perf_baselines`	Perf Baselines	Read	List all performance baselines, optionally filtered by module or endpoint prefix.
`ks_insert_perf_observation`	Perf Baselines	Write	Record a single k6 load test observation (latency percentiles, throughput, error rate).
`ks_get_perf_trend`	Perf Baselines	Read	Return the last N observations for an endpoint to show performance trajectory over time.
`ks_insert_insight`	AI Insights	Write	Persist an AI-generated insight (anomaly, recommendation, risk flag) with severity and source agent.
`ks_list_insights`	AI Insights	Read	List insights filterable by severity, acknowledged status, or source agent. Used by report_gen_agent.
`ks_acknowledge_insight`	AI Insights	Write	Mark an insight as acknowledged so it is excluded from future unreviewed lists.
`ks_upsert_visual_baseline`	Visual Baselines	Write	Store or update the reference screenshot hash and metadata for a page/viewport combination.
`ks_get_visual_baseline`	Visual Baselines	Read	Retrieve the current visual baseline for a page — used by visual_ai_agent before taking new screenshots.
`ks_insert_visual_diff`	Visual Baselines	Write	Record a detected visual difference with diff percentage, affected region, and run_id.
`ks_list_visual_diffs`	Visual Baselines	Read	List visual diffs for a run or page, sorted by diff percentage. Used in reports and PR checks.
`ks_upsert_suite_health`	Suite Health	Write	Update the suite health record for a module — pass rate, flaky count, coverage, last run timestamp.
`ks_list_suite_health`	Suite Health	Read	List suite health records for all modules — used by coverage_agent and report_gen_agent.
`ks_get_platform_health_summary`	Suite Health	Read	Return a single-call platform health overview — aggregate pass rate, active defects, regressions, and critical insights.
