Platform Architecture
The CI/CD Orchestrator coordinates 14 specialised agents across 3 pipeline phases. All agents share state through the knowledge-store-mcp (RAG layer). External services — GitHub, Jira, Playwright, k6, PostgreSQL, and Slack — are each a first-class MCP server.
CI/CD Orchestrator Agent
cicd_orchestrator_agent.py · Coordinates phases · Aggregates results · Triggers on PR
Phase 1: Planning — sequential
📋 Test Blueprint Agent → ⚡ Test Gen Agent
Phase 2: Execution — parallel (asyncio.gather)
🤖 E2E Runner
🔌 API Probe
⚡ Perf Monitor
👁️ Visual AI
♿ Accessibility
🔒 Security Scanner
🛡️ Data Guard
All 7 agents run concurrently · Each reads & writes the shared knowledge store · Results persist even if one agent fails
Phase 3: Analysis & Reporting — sequential
🐛 Defect Analyst
📡 CI Monitor
📊 Coverage
💬 Chatbot Eval
📄 Report Gen
knowledge-store-mcp · :8090
RAG Layer
Shared Knowledge Store
FastMCP · Python · SQLite · 28 tools · 7 domains · Retrieve-before-act pattern
📋 Test Cases
📊 Execution Results
🐛 Defect Patterns
⚡ Perf Baselines
💡 AI Insights
👁️ Visual Baselines
📈 Suite Health
6 External MCP Servers — each a standalone FastMCP service
🎭 playwright-mcp · :8091 · Browser automation · 7 tools
⚡ k6-mcp · :8092 · Load testing · 5 tools
🐙 github-mcp · :8093 · PRs · Checks · Issues · 7 tools
📋 jira-mcp · :8094 · Tickets · JQL · Transitions · 6 tools
🗄️ postgres-mcp · :8095 · Schema · Integrity · Migrations · 6 tools
💬 slack-mcp · :8096 · Alerts · Reports · Threads · 5 tools
Diagram legend: Orchestration / planning flow · External MCP integration · RAG read / write (all agents)
How It Works — Core Concepts
This platform is built on six interlocking concepts. Each has a detailed doc in the docs/ folder of the repository.
🔌
Model Context Protocol
The integration layer
MCP is an open standard from Anthropic that lets LLM agents call external tools through a typed, documented contract. Every integration in this platform is an MCP server — not hard-coded function calls.
- 7 MCP servers expose 64 tools total across the platform
- Agents receive tool schemas at runtime — no glue code
- Supports stdio (Claude Desktop) and HTTP (agent pipelines)
- Tool annotations (readOnlyHint, destructiveHint) signal safety
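A client can use the annotation hints to decide which tools deserve extra care before they are called. The following is a minimal sketch, assuming tool descriptors arrive as plain dicts shaped like MCP tool listings; the `partition_by_safety` helper and the sample descriptors are illustrative and not taken from the repository.

```python
from typing import Any

# Sample descriptors shaped like MCP tool listings (contents are illustrative).
TOOLS: list[dict[str, Any]] = [
    {"name": "ks_get_test_case", "annotations": {"readOnlyHint": True}},
    {"name": "ks_insert_test_case",
     "annotations": {"readOnlyHint": False, "destructiveHint": False}},
    {"name": "ks_delete_test_case",
     "annotations": {"readOnlyHint": False, "destructiveHint": True}},
]


def partition_by_safety(tools: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Split tool names into read-only, write, and destructive groups."""
    groups: dict[str, list[str]] = {"read_only": [], "write": [], "destructive": []}
    for tool in tools:
        hints = tool.get("annotations", {})
        if hints.get("readOnlyHint", False):
            groups["read_only"].append(tool["name"])
        elif hints.get("destructiveHint", True):
            # Assumption: a tool that is not read-only counts as destructive
            # unless it explicitly says otherwise.
            groups["destructive"].append(tool["name"])
        else:
            groups["write"].append(tool["name"])
    return groups


groups = partition_by_safety(TOOLS)
```

An orchestrator could, for example, require confirmation before calling anything in the `destructive` group while letting read-only tools run freely.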
🧠
Retrieval-Augmented Generation
The memory layer
Agents don't reason in a vacuum. Before every decision they retrieve relevant history from the knowledge store — past failures, baselines, insights — and augment their reasoning with it.
Task arrives → Retrieve history → Augment prompt → Generate decision → Write results
- knowledge-store-mcp is the RAG store — structured retrieval, not vector search
- 7 domains persist QA knowledge across every CI run
- Agents write insights that the next agent reads as context
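The retrieve-before-act cycle can be sketched with nothing more than `sqlite3`. This is an illustration under stated assumptions: the `insights` table, its columns, and the helper names are hypothetical stand-ins, not the real knowledge-store schema or tool implementations.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the knowledge store (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE insights (module TEXT, severity TEXT, note TEXT)")
    return db


def retrieve_history(db: sqlite3.Connection, module: str, limit: int = 5) -> list[str]:
    """Structured retrieval: a filtered SQL query, with no embeddings involved."""
    rows = db.execute(
        "SELECT severity, note FROM insights "
        "WHERE module = ? ORDER BY rowid DESC LIMIT ?",
        (module, limit),
    ).fetchall()
    return [f"[{severity}] {note}" for severity, note in rows]


def build_prompt(task: str, history: list[str]) -> str:
    """Augment the task with whatever history was retrieved."""
    context = "\n".join(history) if history else "(no prior history)"
    return f"Known history:\n{context}\n\nTask: {task}"


def write_insight(db: sqlite3.Connection, module: str, severity: str, note: str) -> None:
    """Persist a finding so the next agent can read it as context."""
    db.execute("INSERT INTO insights VALUES (?, ?, ?)", (module, severity, note))
    db.commit()


db = open_store()
write_insight(db, "checkout", "high", "Payment form times out under load")
prompt = build_prompt("Plan tests for the checkout PR", retrieve_history(db, "checkout"))
```

The prompt built here would be sent to the model in place of the bare task, and whatever the agent concludes would go back through `write_insight`.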
🤖
Agentic AI
The reasoning layer
Each of the 14 workers is an autonomous agent — not a function call. They follow the ReAct loop: Observe → Reason → Act → Observe, calling tools until the task is done.
Observe → Reason → Call tool → Observe result → Loop / done
- claude-opus-4-6 for planning/analysis · claude-sonnet-4-6 for execution
- Agents handle errors autonomously — actionable ToolError messages guide retries
- Each agent has a narrow, focused system prompt for reliable tool selection
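The loop itself is small once the model is abstracted away. Below is a sketch in which a scripted policy function stands in for Claude; `react_loop`, `scripted_policy`, the `run_suite` tool, and its made-up output are all hypothetical names used only for illustration.

```python
from typing import Any, Callable


def react_loop(
    decide: Callable[[list[Any]], dict],
    tools: dict[str, Callable[..., Any]],
    task: str,
    max_steps: int = 10,
) -> tuple[str, list[Any]]:
    """Observe, reason, act, and repeat until the policy reports it is done."""
    observations: list[Any] = [task]
    for _ in range(max_steps):
        action = decide(observations)  # Reason: pick the next step
        if action["type"] == "done":
            return action["answer"], observations
        result = tools[action["tool"]](**action["args"])  # Act: call the tool
        observations.append(result)  # Observe: feed the result back
    raise RuntimeError("step budget exhausted before the task finished")


def scripted_policy(observations: list[Any]) -> dict:
    """Stand-in for the model: run the suite once, then report."""
    if len(observations) == 1:
        return {"type": "call", "tool": "run_suite", "args": {"pr": 42}}
    return {"type": "done", "answer": f"Finished: {observations[-1]}"}


# The tool output below is invented sample data.
tools = {"run_suite": lambda pr: f"PR {pr}: suite completed"}
answer, trace = react_loop(scripted_policy, tools, "Run E2E suite for PR #42")
```

The `max_steps` budget is a design choice in this sketch: it keeps a confused policy from looping forever.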
🎯
Multi-Agent Orchestration
The coordination layer
The orchestrator is itself an LLM agent that reasons about which sub-agents to run and in what order. Agents communicate through the knowledge store, not through direct calls.
- Phase 1 (Planning) — sequential: Blueprint → Test Gen
- Phase 2 (Execution) — 7 agents run in parallel via asyncio.gather
- Phase 3 (Analysis) — sequential: Defect → Monitor → Coverage → Chatbot Eval → Report
- Knowledge store is the communication bus — decoupled, auditable, replayable
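The three phases map naturally onto `asyncio`. The sketch below uses a stub `run_agent` coroutine in place of real agents, and deliberately makes one Phase 2 agent fail to show how the other results survive; agent names and the failure are illustrative assumptions.

```python
import asyncio


async def run_agent(name: str, fail: bool = False) -> str:
    """Stand-in for one agent run; a real agent would call Claude and MCP tools."""
    await asyncio.sleep(0)
    if fail:
        raise RuntimeError(f"{name} crashed")
    return f"{name}: ok"


async def run_pipeline() -> dict:
    outcome: dict = {}
    # Phase 1: sequential, because Test Gen needs the blueprint first.
    outcome["planning"] = [await run_agent("blueprint"), await run_agent("test_gen")]

    # Phase 2: concurrent. return_exceptions=True collects failures as values
    # instead of raising on the first one, so the other results are kept.
    names = ["e2e", "api_probe", "perf", "visual", "a11y", "security", "data_guard"]
    results = await asyncio.gather(
        *(run_agent(n, fail=(n == "perf")) for n in names),
        return_exceptions=True,
    )
    outcome["execution"] = {
        n: (f"failed: {r}" if isinstance(r, Exception) else r)
        for n, r in zip(names, results)
    }

    # Phase 3: sequential again, each step reading what earlier ones wrote.
    outcome["analysis"] = [
        await run_agent(n)
        for n in ["defect", "ci_monitor", "coverage", "chatbot_eval", "report"]
    ]
    return outcome


outcome = asyncio.run(run_pipeline())
```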
🛠️
LLM Tool Design
The reliability layer
A tool is only as good as its description. Every tool in this platform follows strict design principles so agents can discover, call, and recover from them without human intervention.
- {server}_{verb}_{noun} naming — agents narrow candidates by prefix
- Docstrings with Args + allowed values + Returns field names
- Actionable ToolError messages — agents recover without human help
- Returns include pre-computed fields and URLs agents will need
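Two of these principles are easy to show in code: the naming convention and actionable errors. The sketch below is an assumption-laden illustration; the local `ToolError` class, the regular expression, and `github_create_issue` are hypothetical, while the `ks_` names and the allowed test types come from the knowledge-store section of this page.

```python
import re


class ToolError(Exception):
    """Error whose message tells the caller how to fix the request."""


# Assumed reading of {server}_{verb}_{noun}: the noun may itself contain underscores.
TOOL_NAME = re.compile(r"^(?P<server>[a-z0-9]+)_(?P<verb>[a-z]+)_(?P<noun>[a-z_]+)$")
ALLOWED_TEST_TYPES = ("e2e", "api", "perf", "visual")


def parse_tool_name(name: str) -> dict[str, str]:
    """Split a tool name into its server, verb, and noun parts."""
    match = TOOL_NAME.match(name)
    if match is None:
        raise ToolError(
            f"'{name}' does not follow server_verb_noun, e.g. 'ks_get_test_case'"
        )
    return match.groupdict()


def narrow_by_prefix(names: list[str], server: str) -> list[str]:
    """Keep only the tools that belong to one server."""
    return [n for n in names if n.startswith(f"{server}_")]


def validate_test_type(value: str) -> str:
    """Reject unknown values with a message that lists the valid ones."""
    if value not in ALLOWED_TEST_TYPES:
        allowed = ", ".join(ALLOWED_TEST_TYPES)
        raise ToolError(f"test_type must be one of {allowed}; got '{value}'")
    return value


parts = parse_tool_name("ks_get_platform_health_summary")
ks_only = narrow_by_prefix(["ks_get_test_case", "github_create_issue"], "ks")
```

An error such as `test_type must be one of e2e, api, perf, visual; got 'unit'` gives the agent enough information to correct the call on its next attempt.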
🚀
CI/CD Integration
The delivery layer
The platform is a drop-in QA layer for GitHub Actions. A PR opens, a webhook fires, all 7 MCP servers start, the orchestrator runs, and results flow back automatically.
- Triggered by pull_request or workflow_dispatch GitHub events
- Results posted as GitHub Check Runs, Jira tickets, and Slack messages
- Exit code 0/1 drives the PR gate — blocks merge on critical failures
- All credentials injected as GitHub Actions secrets — zero hard-coding
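The PR gate reduces to mapping aggregated results to a process exit code. A minimal sketch, assuming each result is a dict with hypothetical `status` and `severity` fields; the real orchestrator's result shape is not shown on this page.

```python
def gate_exit_code(results: list[dict]) -> int:
    """Return 1 if any result is a failed check marked critical, else 0."""
    blocking = [
        r for r in results
        if r.get("status") == "fail" and r.get("severity") == "critical"
    ]
    return 1 if blocking else 0


# Made-up sample results for illustration.
sample = [
    {"agent": "e2e_runner", "status": "pass", "severity": "critical"},
    {"agent": "visual_ai", "status": "fail", "severity": "minor"},
    {"agent": "security_scanner", "status": "fail", "severity": "critical"},
]
exit_code = gate_exit_code(sample)  # a CI entry point would pass this to sys.exit()
```

A non-zero exit fails the workflow step, which is what lets a required status check block the merge.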
Knowledge Store — 7 Domains, 28 MCP Tools
A FastMCP Python server backed by SQLite. The platform's RAG layer — every agent retrieves context before reasoning and writes findings after. Runs in stdio (Claude Desktop) or HTTP (agent pipelines) mode.
28 MCP Tools · 7 Knowledge Domains · 9 SQLite Tables · 14 Agents Connected
knowledge-store-mcp · claude_desktop_config.json
"knowledge-store": {
  "command": "python",
  "args": ["knowledge-store-mcp/server.py", "--stdio"],
  "env": { "KS_DB_PATH": "knowledge_store.db" }
}
📋
Test Cases · 5 tools
Test case definitions — name, module, type (e2e/api/perf/visual), priority, tags. Written by Test Gen Agent, read by all execution agents before running.
ks_insert_test_case
ks_get_test_case
ks_list_test_cases
ks_update_test_case
ks_delete_test_case
📊
Test Execution Results · 4 tools
Every CI run's pass/fail/skip/flaky outcomes — duration, error message, screenshot path, run ID, PR number. Flaky tests detected automatically across runs.
ks_insert_test_result
ks_get_run_summary
ks_list_test_results
ks_list_flaky_tests
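Cross-run flakiness detection can be expressed as a single grouped query. The sketch below assumes one possible definition (a test is flaky when it has both passes and failures on record) and a hypothetical `test_results` table; the real schema and the rule used by `ks_list_flaky_tests` may differ.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE test_results (run_id TEXT, test_case_id TEXT, status TEXT)")
# Invented sample rows: 'login' alternates, 'cart' always passes, 'search' always fails.
db.executemany(
    "INSERT INTO test_results VALUES (?, ?, ?)",
    [
        ("run-1", "login", "pass"), ("run-2", "login", "fail"), ("run-3", "login", "pass"),
        ("run-1", "cart", "pass"), ("run-2", "cart", "pass"),
        ("run-1", "search", "fail"), ("run-2", "search", "fail"),
    ],
)


def list_flaky_tests(conn: sqlite3.Connection) -> list[str]:
    """Return test cases that have both passed and failed across recorded runs."""
    rows = conn.execute(
        """
        SELECT test_case_id
        FROM test_results
        WHERE status IN ('pass', 'fail')
        GROUP BY test_case_id
        HAVING COUNT(DISTINCT status) = 2
        ORDER BY test_case_id
        """
    ).fetchall()
    return [row[0] for row in rows]


flaky = list_flaky_tests(db)
```

A consistently failing test such as `search` is excluded on purpose: it points at a real defect, not at flakiness.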
🐛
Defect Patterns · 4 tools
Recurring failure signatures — module, root cause, frequency count, suggested fix. Frequency-ranked across runs, marked resolved when the fix lands.
ks_insert_defect_pattern
ks_list_defect_patterns
ks_increment_defect_frequency
ks_resolve_defect_pattern
⚡
Performance Baselines · 5 tools
Approved baselines per endpoint — median latency, p95, p99, error rate. Observations captured each run to detect regressions against stored thresholds.
ks_upsert_perf_baseline
ks_get_perf_baseline
ks_list_perf_baselines
ks_insert_perf_observation
ks_get_perf_trend
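Regression detection against a stored baseline comes down to a per-metric comparison. This sketch assumes a fractional tolerance of 10% and a hypothetical `LatencyStats` record; the actual thresholds and field names used by the platform are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyStats:
    p95_ms: float
    p99_ms: float
    error_rate: float


def find_regressions(
    baseline: LatencyStats, observed: LatencyStats, tolerance: float = 0.10
) -> dict[str, tuple[float, float]]:
    """Return metrics whose observed value exceeds baseline * (1 + tolerance)."""
    regressions: dict[str, tuple[float, float]] = {}
    for metric in ("p95_ms", "p99_ms", "error_rate"):
        base = getattr(baseline, metric)
        now = getattr(observed, metric)
        if now > base * (1 + tolerance):
            regressions[metric] = (base, now)
    return regressions


# Invented numbers: p95 is within tolerance, p99 is 20% over baseline.
baseline = LatencyStats(p95_ms=200.0, p99_ms=400.0, error_rate=0.01)
observed = LatencyStats(p95_ms=210.0, p99_ms=480.0, error_rate=0.01)
regressions = find_regressions(baseline, observed)
```

A relative tolerance keeps normal run-to-run noise from raising alerts, at the cost of being strict for baselines close to zero.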
💡
AI Insights · 3 tools
Claude-generated cross-run observations — anomalies, trends, recommendations, severity. Platform's "working memory": agents leave notes for each other to read.
ks_insert_insight
ks_list_insights
ks_acknowledge_insight
👁️
Visual Baselines · 4 tools
Approved screenshot baselines per page, viewport, and browser. Pixel-diff percentages stored per CI run with pass/warn/fail status.
ks_upsert_visual_baseline
ks_get_visual_baseline
ks_insert_visual_diff
ks_list_visual_diffs
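The pass/warn/fail status can be derived from a diff percentage with two thresholds. The values below (0.5% and 2.0%) are placeholders chosen for the example, and comparing raw byte buffers is a simplification of real image comparison; both helper names are hypothetical.

```python
def pixel_diff_pct(baseline: bytes, candidate: bytes) -> float:
    """Percentage of differing bytes between two equally sized raw buffers."""
    if len(baseline) != len(candidate):
        raise ValueError("buffers must have the same size to be compared")
    if not baseline:
        return 0.0
    changed = sum(a != b for a, b in zip(baseline, candidate))
    return 100.0 * changed / len(baseline)


def classify_diff(diff_pct: float, warn_at: float = 0.5, fail_at: float = 2.0) -> str:
    """Map a diff percentage to pass, warn, or fail (thresholds are assumptions)."""
    if not 0.0 <= diff_pct <= 100.0:
        raise ValueError("diff_pct must be between 0 and 100")
    if diff_pct >= fail_at:
        return "fail"
    if diff_pct >= warn_at:
        return "warn"
    return "pass"


# One changed byte out of 100 in this toy buffer.
diff = pixel_diff_pct(bytes(100), bytes(99) + b"\x01")
status = classify_diff(diff)
```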
📈
Suite Health · 3 tools
Composite health scores per suite: pass rate, flakiness rate, coverage %, open defects. Platform-wide health summary across all 14 agents and every run.
ks_upsert_suite_health
ks_list_suite_health
ks_get_platform_health_summary
14 AI QA Agents
Each agent is a specialised Claude-powered worker with its own system prompt, MCP tool scope, and knowledge store domains. Colour tags show which knowledge domains each agent uses. Grey text shows which external MCP servers it connects to.
Test Cases · Results · Defect Patterns · Perf Baselines · AI Insights · Visual · Suite Health
Phase 1 — Planning
📋
Test Blueprint Agent
test_blueprint_agent.py
Analyses a PR diff and produces a structured test plan — which modules are affected, what test types are needed, and priority levels.
MCPs: knowledge-store · github
Test Cases · AI Insights
⚡
Test Generation Agent
test_gen_agent.py
Takes the blueprint, checks for duplicate test cases in the store, generates new test case definitions, and writes them with full metadata.
MCPs: knowledge-store
Test Cases · Results
Phase 2 — Execution (parallel)
🤖
E2E Runner Agent
e2e_runner_agent.py
Retrieves test cases, runs Playwright E2E tests against staging, stores pass/fail results, posts a GitHub Check Run with the outcome.
MCPs: knowledge-store · playwright · github
Results · Defect Patterns · Suite Health
🔌
API Probe Agent
api_probe_agent.py
Validates REST/GraphQL endpoints — contracts, status codes, schemas, and response times. Compares against stored performance baselines.
MCPs: knowledge-store · github
Results · Perf Baselines · Defect Patterns
⚡
Performance Monitor Agent
perf_monitor_agent.py
Runs k6 load tests, compares p95/p99 latency against stored baselines, records observations, flags regressions with Slack alerts.
MCPs: knowledge-store · k6 · slack
Perf Baselines · Results · AI Insights
👁️
Visual AI Agent
visual_ai_agent.py
Captures screenshots with Playwright, compares pixel-by-pixel to stored baselines, records diff percentages, reports visual regressions.
MCPs: knowledge-store · playwright
Visual · Results · AI Insights
♿
Accessibility Agent
accessibility_agent.py
Runs axe-core WCAG 2.1 checks via Playwright, creates Jira tickets for violations above severity threshold, writes remediation insights.
MCPs: knowledge-store · playwright · jira
Results · Defect Patterns · AI Insights
🔒
Security Scanner Agent
security_scanner_agent.py
Scans for OWASP vulnerabilities — exposed headers, injection surface, auth flaws, secrets. Creates Jira and GitHub issues for critical findings.
MCPs: knowledge-store · github · jira
Defect Patterns · AI Insights · Results
🛡️
Data Guard Agent
data_guard_agent.py
Validates PostgreSQL integrity — foreign key violations, schema drift, null constraints, missing PKs. Runs a full migration health check after every deploy.
MCPs: knowledge-store · postgres
Test Cases · Results · Defect Patterns
Phase 3 — Analysis & Reporting
🐛
Defect Analyst Agent
defect_analyst_agent.py
Reads all new failures, correlates with stored defect patterns, estimates root cause, increments frequency counts, creates Jira tickets for new patterns.
MCPs: knowledge-store · jira · github
Defect Patterns · AI Insights · Results
📡
CI Monitor Agent
ci_monitor_agent.py
Computes suite health, detects anomalies in pass rates, posts Slack alerts for critical failures and SLA breaches. Updates platform health snapshot.
MCPs: knowledge-store · github · slack
Suite Health · AI Insights · Results
📊
Coverage Agent
coverage_agent.py
Measures code and requirements coverage per module. Flags untested areas, writes coverage-gap insights, updates suite health records.
MCPs: knowledge-store · github
Suite Health · AI Insights · Test Cases
💬
Chatbot Eval Agent
chatbot_eval_agent.py
Evaluates LLM-powered features for accuracy, tone, safety, and hallucination. Posts structured eval results to Slack and stores quality trend insights.
MCPs: knowledge-store · slack
Results · AI Insights · Perf Baselines
📄
Report Gen Agent
report_gen_agent.py
Reads all 7 knowledge domains after each CI run and generates the executive QA report — posted to Slack and Jira, with GitHub check run details.
MCPs: knowledge-store · jira · slack · github
Test Cases · Results · Defect Patterns · Perf Baselines · AI Insights · Suite Health
Integration Guide
All 7 MCP servers are configured for Claude Desktop via a single JSON paste. For CI/CD pipelines, a GitHub Actions workflow starts all servers and runs the orchestrator automatically on every PR.
All 7 MCP Servers — Claude Desktop Config
Paste into %APPDATA%\Claude\claude_desktop_config.json (Windows) or ~/Library/Application Support/Claude/claude_desktop_config.json (macOS), then restart Claude Desktop.
🧠 knowledge-store · stdio · 28 tools
🎭 playwright · stdio · 7 tools
⚡ k6 · stdio · 5 tools
🐙 github · stdio · 7 tools
📋 jira · stdio · 6 tools
🗄️ postgres · stdio · 6 tools
💬 slack · stdio · 5 tools
external-mcps/claude_desktop_config.json (excerpt)
{
  "mcpServers": {
    "knowledge-store": { "command": "python", "args": ["../knowledge-store-mcp/server.py", "--stdio"] },
    "playwright": { "command": "python", "args": ["playwright-mcp/server.py", "--stdio"] },
    "github": { "command": "python", "args": ["github-mcp/server.py", "--stdio"] },
    "jira": { "command": "python", "args": ["jira-mcp/server.py", "--stdio"], "env": { "JIRA_EMAIL": "your@email.com", "JIRA_API_TOKEN": "...", "JIRA_BASE_URL": "https://yourorg.atlassian.net" } },
    "postgres": { "command": "python", "args": ["postgres-mcp/server.py", "--stdio"], "env": { "PG_DSN": "postgresql://user:pass@localhost/mydb" } },
    "slack": { "command": "python", "args": ["slack-mcp/server.py", "--stdio"], "env": { "SLACK_BOT_TOKEN": "xoxb-..." } }
  }
}
CI/CD: GitHub Actions Workflow
Add .github/workflows/ai-qa.yml to your repo. The platform triggers automatically on every pull request to main or develop.
.github/workflows/ai-qa.yml
on:
  pull_request:
    branches: [main, develop]
jobs:
  ai-qa:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Start MCP servers
        run: python knowledge-store-mcp/server.py --port 8090 &
      - name: Run AI QA Orchestrator
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: python agents/cicd_orchestrator_agent.py --pr ${{ github.event.pull_request.number }} --staging-url ${{ secrets.STAGING_URL }}
Calling an Agent — Python Pattern
Every agent is built around a client.beta.messages.create() call. The while loop drives the ReAct cycle: it keeps calling tools while stop_reason == "tool_use" and exits once the model returns "end_turn".
agents/e2e_runner_agent.py (simplified)
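import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-sonnet-4-6"  # execution-phase agents use the Sonnet model
tools = [
    { "type": "mcp", "source": { "type": "url", "url": "http://localhost:8090/mcp", "name": "knowledge-store" } },
    { "type": "mcp", "source": { "type": "url", "url": "http://localhost:8091/mcp", "name": "playwright" } },
    { "type": "mcp", "source": { "type": "url", "url": "http://localhost:8093/mcp", "name": "github" } },
]
messages = [{"role": "user", "content": "Run E2E suite for PR #42"}]

# max_tokens is required by the Messages API; 4096 is an arbitrary choice here
response = client.beta.messages.create(model=MODEL, max_tokens=4096, tools=tools, messages=messages)
# ReAct loop: run the requested tools and feed the results back until the model stops asking
while response.stop_reason == "tool_use":
    results = execute_tools(response.content)  # project helper, not shown here
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": results})
    response = client.beta.messages.create(model=MODEL, max_tokens=4096, tools=tools, messages=messages)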
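The `execute_tools` helper is not shown in the simplified listing. One possible shape, offered as a sketch and not as the repository's implementation, is to walk the response content, run every `tool_use` block through a local registry, and return `tool_result` blocks for the next user turn. `TOOL_REGISTRY` and its single stub entry are hypothetical.

```python
from types import SimpleNamespace
from typing import Any, Callable

# Hypothetical registry mapping tool names to local callables (stubbed output).
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {
    "ks_get_run_summary": lambda run_id: {"run_id": run_id, "pass_rate": 1.0},
}


def execute_tools(content_blocks: list[Any]) -> list[dict]:
    """Run every tool_use block and return tool_result blocks in the same order."""
    results: list[dict] = []
    for block in content_blocks:
        if getattr(block, "type", None) != "tool_use":
            continue  # text blocks carry no tool call
        try:
            output = TOOL_REGISTRY[block.name](**block.input)
            results.append(
                {"type": "tool_result", "tool_use_id": block.id, "content": str(output)}
            )
        except Exception as exc:
            # Report the failure to the model so it can decide how to retry.
            results.append(
                {
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"{type(exc).__name__}: {exc}",
                    "is_error": True,
                }
            )
    return results


# Stand-in content blocks; real ones come from the SDK response object.
blocks = [
    SimpleNamespace(type="text", text="Checking the last run."),
    SimpleNamespace(type="tool_use", id="t1", name="ks_get_run_summary", input={"run_id": "r1"}),
    SimpleNamespace(type="tool_use", id="t2", name="ks_unknown_tool", input={}),
]
tool_results = execute_tools(blocks)
```

Returning errors as `tool_result` blocks with `is_error` set, instead of raising, is what lets the agent see the message and adjust its next call.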
Quick Start — Local Development
Run the full platform locally against a staging environment in 3 steps.
1. Set credentials
environment
export ANTHROPIC_API_KEY=sk-ant-...
export GITHUB_TOKEN=ghp_...
export JIRA_EMAIL=your@email.com
export JIRA_API_TOKEN=...
export JIRA_BASE_URL=https://yourorg.atlassian.net
export SLACK_BOT_TOKEN=xoxb-...
export PG_DSN=postgresql://user:pass@localhost/mydb
2. Start MCP servers
bash
python knowledge-store-mcp/server.py --port 8090 &
python external-mcps/playwright-mcp/server.py --port 8091 &
python external-mcps/github-mcp/server.py --port 8093 &
sleep 5
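The fixed `sleep 5` assumes every server is listening within five seconds. A port poll is one alternative; the `wait_for_port` helper below is a hypothetical sketch using only the standard library, and is not part of the repository.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.25) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not listening yet; try again shortly
    return False
```

Called once per server port (8090, 8091, 8093 in the commands above), it lets the orchestrator start as soon as the servers accept connections and fail fast if one never comes up.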
3. Run the orchestrator
bash
python agents/cicd_orchestrator_agent.py \
--pr 42 \
--staging-url https://staging.example.com
Tools Reference
28 tools across 7 knowledge domains — the full knowledge-store-mcp surface area
| Tool | Domain | R/W | Description |
|---|---|---|---|
| ks_insert_test_case | Test Cases | Write | Insert a new test case definition into the knowledge store. Requires test_case_id, module, name, test_type, and steps. |
| ks_get_test_case | Test Cases | Read | Fetch a single test case by ID. Returns full definition including steps, priority, and module. |
| ks_list_test_cases | Test Cases | Read | List and filter test cases by module, test_type, or status. Default limit 50. |
| ks_update_test_case | Test Cases | Write | Update an existing test case — steps, priority, status, or expected outcome. |
| ks_delete_test_case | Test Cases | Destructive | Delete a test case by ID. Irreversible — destructiveHint: true. |
| ks_insert_test_result | Test Results | Write | Record a test execution result (pass/fail/flaky/skipped) with duration and optional error message. |
| ks_get_run_summary | Test Results | Read | Get aggregated pass/fail/flaky counts and pre-computed pass_rate for a run_id. |
| ks_list_test_results | Test Results | Read | List individual test results for a run, filterable by status. Includes error messages. |
| ks_list_flaky_tests | Test Results | Read | Return test cases with flaky status across recent runs — used by defect_analyst_agent for pattern detection. |
| ks_insert_defect_pattern | Defect Patterns | Write | Store a newly detected defect pattern with module, root_cause, and suggested_fix. |
| ks_list_defect_patterns | Defect Patterns | Read | List known defect patterns filterable by status (active/resolved) and module. Returns root_cause and suggested_fix. |
| ks_increment_defect_frequency | Defect Patterns | Write | Increment the recurrence count for an existing defect pattern. Called when a known pattern is seen again. |
| ks_resolve_defect_pattern | Defect Patterns | Write | Mark a defect pattern as resolved. Triggers Jira auto-transition via defect_analyst_agent. |
| ks_upsert_perf_baseline | Perf Baselines | Write | Create or update a performance baseline (p50/p95/p99 latency, error rate) for an endpoint. |
| ks_get_perf_baseline | Perf Baselines | Read | Retrieve the current performance baseline for a specific endpoint — used by perf_monitor_agent for regression detection. |
| ks_list_perf_baselines | Perf Baselines | Read | List all performance baselines, optionally filtered by module or endpoint prefix. |
| ks_insert_perf_observation | Perf Baselines | Write | Record a single k6 load test observation (latency percentiles, throughput, error rate). |
| ks_get_perf_trend | Perf Baselines | Read | Return the last N observations for an endpoint to show performance trajectory over time. |
| ks_insert_insight | AI Insights | Write | Persist an AI-generated insight (anomaly, recommendation, risk flag) with severity and source agent. |
| ks_list_insights | AI Insights | Read | List insights filterable by severity, acknowledged status, or source agent. Used by report_gen_agent. |
| ks_acknowledge_insight | AI Insights | Write | Mark an insight as acknowledged so it is excluded from future unreviewed lists. |
| ks_upsert_visual_baseline | Visual Baselines | Write | Store or update the reference screenshot hash and metadata for a page/viewport combination. |
| ks_get_visual_baseline | Visual Baselines | Read | Retrieve the current visual baseline for a page — used by visual_ai_agent before taking new screenshots. |
| ks_insert_visual_diff | Visual Baselines | Write | Record a detected visual difference with diff percentage, affected region, and run_id. |
| ks_list_visual_diffs | Visual Baselines | Read | List visual diffs for a run or page, sorted by diff percentage. Used in reports and PR checks. |
| ks_upsert_suite_health | Suite Health | Write | Update the suite health record for a module — pass rate, flaky count, coverage, last run timestamp. |
| ks_list_suite_health | Suite Health | Read | List suite health records for all modules — used by coverage_agent and report_gen_agent. |
| ks_get_platform_health_summary | Suite Health | Read | Return a single-call platform health overview — aggregate pass rate, active defects, regressions, and critical insights. |