Claude-native · Model Context Protocol · Agentic AI · RAG

Claude-native AI QA Automation System

14 specialised agents · 7 MCP servers · a shared RAG knowledge store that learns from every CI run

14 AI QA Agents · 7 MCP Servers · 64 MCP Tools · 7 Knowledge Domains · 3 CI Phases
Platform Architecture
The CI/CD Orchestrator coordinates 14 specialised agents across 3 pipeline phases. All agents share state through the knowledge-store-mcp (RAG layer). External services — GitHub, Jira, Playwright, k6, PostgreSQL, and Slack — are each a first-class MCP server.
CI/CD Orchestrator Agent
cicd_orchestrator_agent.py · Coordinates phases · Aggregates results · Triggers on PR
1 Planning — sequential
📋 Test Blueprint Agent
Test Gen Agent
2 Execution — parallel (asyncio.gather)
🤖 E2E Runner
🔌 API Probe
Perf Monitor
👁️ Visual AI
Accessibility
🔒 Security Scanner
🛡️ Data Guard
All 7 agents run concurrently · Each reads & writes the shared knowledge store · Results persist even if one agent fails
3 Analysis & Reporting — sequential
🐛 Defect Analyst
📡 CI Monitor
📊 Coverage
💬 Chatbot Eval
📄 Report Gen
knowledge-store-mcp · :8090 RAG Layer
Shared Knowledge Store
FastMCP · Python · SQLite · 28 tools · 7 domains · Retrieve-before-act pattern
📋 Test Cases
📊 Execution Results
🐛 Defect Patterns
⚡ Perf Baselines
💡 AI Insights
👁️ Visual Baselines
📈 Suite Health
6 External MCP Servers — each a standalone FastMCP service
🎭 playwright-mcp · :8091 · Browser automation · 7 tools
k6-mcp · :8092 · Load testing · 5 tools
🐙 github-mcp · :8093 · PRs, Checks, Issues · 7 tools
📋 jira-mcp · :8094 · Tickets, JQL, Transitions · 6 tools
🗄️ postgres-mcp · :8095 · Schema, Integrity, Migrations · 6 tools
💬 slack-mcp · :8096 · Alerts, Reports, Threads · 5 tools
Diagram legend: Orchestration / planning flow · External MCP integration · RAG read / write (all agents)
How It Works — Core Concepts
This platform is built on six interlocking concepts. Each has a detailed doc in the docs/ folder of the repository.
🔌
Model Context Protocol
The integration layer
MCP is an open standard from Anthropic that lets LLM agents call external tools through a typed, documented contract. Every integration in this platform is an MCP server — not hard-coded function calls.
  • 7 MCP servers expose 64 tools total across the platform
  • Agents receive tool schemas at runtime — no glue code
  • Supports stdio (Claude Desktop) and HTTP (agent pipelines)
  • Tool annotations (readOnlyHint, destructiveHint) signal safety
📄 docs/mcp.md
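As an illustration of the last bullet, the sketch below shows one way an agent-side filter could use tool annotations to hide destructive tools unless they are explicitly allowed. The tool dictionaries only mimic the shape of an MCP tool listing (a name plus annotations). The `allowed_tools` helper and the sample entries are hypothetical and not taken from this repository. The defaults follow the MCP specification as I understand it: a tool that is not read-only is assumed destructive unless it says otherwise.

```python
from typing import Any

# Sample tool listings shaped like MCP tool metadata (illustrative values).
TOOLS: list[dict[str, Any]] = [
    {"name": "ks_get_test_case", "annotations": {"readOnlyHint": True}},
    {"name": "ks_insert_test_case", "annotations": {"readOnlyHint": False, "destructiveHint": False}},
    {"name": "ks_delete_test_case", "annotations": {"readOnlyHint": False, "destructiveHint": True}},
]


def allowed_tools(tools: list[dict[str, Any]], allow_destructive: bool = False) -> list[str]:
    """Return the tool names an agent may call, dropping destructive ones by default."""
    names = []
    for tool in tools:
        hints = tool.get("annotations", {})
        read_only = hints.get("readOnlyHint", False)
        # A non-read-only tool counts as destructive unless it declares otherwise.
        destructive = not read_only and hints.get("destructiveHint", True)
        if destructive and not allow_destructive:
            continue  # hide irreversible tools unless explicitly enabled
        names.append(tool["name"])
    return names


print(allowed_tools(TOOLS))
```

Hints are advisory, so a filter like this is a convenience for the agent loop rather than a security boundary.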
🧠
Retrieval-Augmented Generation
The memory layer
Agents don't reason in a vacuum. Before every decision they retrieve relevant history from the knowledge store — past failures, baselines, insights — and augment their reasoning with it.
Task arrives → Retrieve history → Augment prompt → Generate decision → Write results
  • knowledge-store-mcp is the RAG store — structured retrieval, not vector search
  • 7 domains persist QA knowledge across every CI run
  • Agents write insights that the next agent reads as context
📄 docs/rag.md
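The retrieve, augment, generate, write cycle can be sketched with nothing more than SQLite. This is a minimal illustration under stated assumptions: the `insights` table, its columns, and the three helper functions are invented for the example and are not the schema or API of knowledge-store-mcp. The generate step (the Claude call) is left out.

```python
import sqlite3

# In-memory stand-in for the knowledge store; the schema is illustrative only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE insights (module TEXT, severity TEXT, note TEXT)")
db.execute("INSERT INTO insights VALUES ('checkout', 'high', 'Payment form times out under load')")


def retrieve_history(module: str) -> list[str]:
    """Structured retrieval: an exact-match query, no embeddings involved."""
    rows = db.execute("SELECT severity, note FROM insights WHERE module = ?", (module,))
    return [f"[{severity}] {note}" for severity, note in rows]


def augment_prompt(task: str, history: list[str]) -> str:
    """Prepend retrieved history so the model reasons with prior context."""
    context = "\n".join(history) or "(no prior history)"
    return f"Known history:\n{context}\n\nTask: {task}"


def write_result(module: str, severity: str, note: str) -> None:
    """Persist a new finding so the next agent can retrieve it."""
    db.execute("INSERT INTO insights VALUES (?, ?, ?)", (module, severity, note))


prompt = augment_prompt("Test the checkout flow", retrieve_history("checkout"))
print(prompt)  # the generate step would send this prompt to the model
write_result("checkout", "low", "Checkout E2E suite passed on retry")
```

After `write_result` runs, the next call to `retrieve_history("checkout")` returns both notes, which is how one agent's finding becomes the next agent's context.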
🤖
Agentic AI
The reasoning layer
Each of the 14 workers is an autonomous agent — not a function call. They follow the ReAct loop: Observe → Reason → Act → Observe, calling tools until the task is done.
Observe → Reason → Call tool → Observe result → Loop / done
  • claude-opus-4-6 for planning/analysis · claude-sonnet-4-6 for execution
  • Agents handle errors autonomously — actionable ToolError messages guide retries
  • Each agent has a narrow, focused system prompt for reliable tool selection
📄 docs/agentic-ai.md
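To make the loop concrete without calling any API, here is a small sketch in which a scripted function stands in for the model. Everything in it is hypothetical (the two tools, the decision format, the step budget); it only shows the control flow of observing, reasoning, acting, and stopping.

```python
from typing import Callable

# Tool registry: plain functions standing in for MCP tools (hypothetical).
TOOLS: dict[str, Callable[[str], str]] = {
    "run_test": lambda name: f"{name}: failed (timeout)",
    "get_history": lambda name: f"{name}: flaky in 3 of the last 10 runs",
}


def scripted_model(observations: list[str]) -> dict:
    """Stand-in for the LLM: picks the next action from what it has seen so far."""
    if len(observations) == 1:  # only the task so far
        return {"action": "run_test", "input": "login_e2e"}
    if len(observations) == 2:  # saw a failure, so look up history
        return {"action": "get_history", "input": "login_e2e"}
    return {"action": "done", "input": "login_e2e looks flaky, not broken"}


def react_loop(task: str, max_steps: int = 5) -> str:
    """Observe, reason, act, and repeat until the model signals it is done."""
    observations = [task]
    for _ in range(max_steps):
        decision = scripted_model(observations)  # reason
        if decision["action"] == "done":
            return decision["input"]
        result = TOOLS[decision["action"]](decision["input"])  # act
        observations.append(result)  # observe
    return "stopped: step budget exhausted"


print(react_loop("Check the login E2E test"))
```

The step budget is a design choice worth keeping in a real agent as well, since it bounds cost when the model keeps requesting tools.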
🎯
Multi-Agent Orchestration
The coordination layer
The orchestrator is itself an LLM agent that reasons about which sub-agents to run and in what order. Agents communicate through the knowledge store, not through direct calls.
  • Phase 1 (Planning, sequential): Blueprint → Test Gen
  • Phase 2 (Execution, parallel): 7 agents run concurrently via asyncio.gather
  • Phase 3 (Analysis, sequential): Defect → Monitor → Coverage → Chatbot Eval → Report
  • Knowledge store is the communication bus — decoupled, auditable, replayable
📄 docs/multi-agent-orchestration.md
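The three phases can be sketched with asyncio alone. The `run_agent` coroutine below is a hypothetical placeholder for a real agent, and the agent names are shortened for the example. The point of the sketch is `return_exceptions=True`, which is one way to keep the other agents' results when a single execution agent fails.

```python
import asyncio


async def run_agent(name: str, fail: bool = False) -> str:
    """Hypothetical agent runner; a real one would drive a Claude tool loop."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    if fail:
        raise RuntimeError(f"{name} crashed")
    return f"{name}: ok"


async def run_pipeline() -> dict[str, list]:
    # Phase 1: sequential, because Test Gen needs the blueprint first.
    planning = [await run_agent("blueprint"), await run_agent("test_gen")]

    # Phase 2: parallel. With return_exceptions=True a failing agent's exception
    # is returned as a value, so the other six results are still collected.
    names = ["e2e", "api_probe", "perf", "visual", "a11y", "security", "data_guard"]
    execution = await asyncio.gather(
        *(run_agent(n, fail=(n == "perf")) for n in names),
        return_exceptions=True,
    )

    # Phase 3: sequential analysis over whatever Phase 2 produced.
    analysis = [await run_agent(n) for n in ["defect", "monitor", "coverage", "chatbot_eval", "report"]]
    return {"planning": planning, "execution": list(execution), "analysis": analysis}


results = asyncio.run(run_pipeline())
failed = [r for r in results["execution"] if isinstance(r, Exception)]
print(f"{len(failed)} of {len(results['execution'])} execution agents failed")
```

In the platform itself the agents exchange data through the knowledge store rather than through return values; the return values here only make the example observable.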
🛠️
LLM Tool Design
The reliability layer
A tool is only as good as its description. Every tool in this platform follows strict design principles so agents can discover, call, and recover from them without human intervention.
  • {server}_{verb}_{noun} naming — agents narrow candidates by prefix
  • Docstrings with Args + allowed values + Returns field names
  • Actionable ToolError messages — agents recover without human help
  • Returns include pre-computed fields and URLs agents will need
📄 docs/tool-design.md
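A short sketch of these principles applied to one tool follows. It borrows the name ks_list_test_cases from the knowledge store, but the body is a stub, and the `ToolError` class, the name-checking regex, and the 1 to 200 limit range are assumptions made for illustration.

```python
import re

# {server}_{verb}_{noun}: lowercase words joined by underscores, at least three parts.
TOOL_NAME = re.compile(r"^[a-z0-9]+_[a-z]+_[a-z_]+$")

ALLOWED_TYPES = ("e2e", "api", "perf", "visual")


class ToolError(Exception):
    """Error whose message tells the agent how to fix the call (illustrative class)."""


def ks_list_test_cases(test_type: str, limit: int = 50) -> dict:
    """List test cases of one type.

    Args:
        test_type: One of "e2e", "api", "perf", "visual".
        limit: Maximum rows to return (1 to 200, an assumed range).

    Returns:
        dict with fields: count, test_cases.
    """
    if test_type not in ALLOWED_TYPES:
        raise ToolError(
            f"Unknown test_type '{test_type}'. Retry with one of: {', '.join(ALLOWED_TYPES)}."
        )
    if not 1 <= limit <= 200:
        raise ToolError(f"limit={limit} is out of range. Retry with a value from 1 to 200.")
    return {"count": 0, "test_cases": []}  # stub result; a real tool would query the store


try:
    ks_list_test_cases("ui")
except ToolError as err:
    print(err)
```

The error text names the bad value and the valid alternatives, which is what lets an agent correct the call on its next turn.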
🚀
CI/CD Integration
The delivery layer
The platform is a drop-in QA layer for GitHub Actions. A PR opens, a webhook fires, all 7 MCP servers start, the orchestrator runs, and results flow back automatically.
  • Triggered by pull_request or workflow_dispatch GitHub events
  • Results posted as GitHub Check Runs, Jira tickets, and Slack messages
  • Exit code 0/1 drives the PR gate — blocks merge on critical failures
  • All credentials injected as GitHub Actions secrets — zero hard-coding
📄 docs/cicd-integration.md
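The exit-code gate from the third bullet can be sketched as a single function. The shape of a finding and the severity name "critical" are assumptions for this example; the orchestrator's real result format is not shown in this document.

```python
def gate_exit_code(findings: list[dict]) -> int:
    """Return 1 when any finding is critical, else 0 (severity names are assumed)."""
    return 1 if any(f["severity"] == "critical" for f in findings) else 0


findings = [
    {"agent": "e2e_runner", "severity": "low"},
    {"agent": "security_scanner", "severity": "critical"},
]
exit_code = gate_exit_code(findings)
print(f"exit code: {exit_code}")
# In the orchestrator's entry point this value would be passed to sys.exit(),
# so a critical finding makes the workflow job fail.
```

A failed job only blocks the merge if the check is marked as required in the repository's branch protection settings.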
Knowledge Store — 7 Domains, 28 MCP Tools
A FastMCP Python server backed by SQLite. The platform's RAG layer — every agent retrieves context before reasoning and writes findings after. Runs in stdio (Claude Desktop) or HTTP (agent pipelines) mode.
28
MCP Tools
7
Knowledge Domains
9
SQLite Tables
14
Agents Connected
knowledge-store-mcp · claude_desktop_config.json
"knowledge-store": {
"command": "python",
"args": ["knowledge-store-mcp/server.py", "--stdio"],
"env": { "KS_DB_PATH": "knowledge_store.db" }
}
📋
5 tools
Test Cases
Test case definitions — name, module, type (e2e/api/perf/visual), priority, tags. Written by Test Gen Agent, read by all execution agents before running.
ks_insert_test_case ks_get_test_case ks_list_test_cases ks_update_test_case ks_delete_test_case
📊
4 tools
Test Execution Results
Every CI run's pass/fail/skip/flaky outcomes — duration, error message, screenshot path, run ID, PR number. Flaky tests detected automatically across runs.
ks_insert_test_result ks_get_run_summary ks_list_test_results ks_list_flaky_tests
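The document does not say how flakiness is detected. One common definition, used here purely as an assumption, is that a test is flaky when it has both passed and failed across recent runs. The `find_flaky` helper and the sample data are hypothetical.

```python
from collections import defaultdict


def find_flaky(results: list[tuple[str, str, str]]) -> list[str]:
    """Return test IDs that both passed and failed across the given runs.

    Each result is a (run_id, test_case_id, status) tuple.
    """
    statuses: dict[str, set[str]] = defaultdict(set)
    for _run_id, test_id, status in results:
        statuses[test_id].add(status)
    # Flaky means both outcomes were seen; a test that always fails is just broken.
    return sorted(t for t, seen in statuses.items() if {"pass", "fail"} <= seen)


history = [
    ("run-1", "login_e2e", "pass"),
    ("run-2", "login_e2e", "fail"),
    ("run-3", "login_e2e", "pass"),
    ("run-1", "cart_api", "fail"),
    ("run-2", "cart_api", "fail"),
]
print(find_flaky(history))
```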
🐛
4 tools
Defect Patterns
Recurring failure signatures — module, root cause, frequency count, suggested fix. Frequency-ranked across runs, marked resolved when the fix lands.
ks_insert_defect_pattern ks_list_defect_patterns ks_increment_defect_frequency ks_resolve_defect_pattern
5 tools
Performance Baselines
Approved baselines per endpoint — median latency, p95, p99, error rate. Observations captured each run to detect regressions against stored thresholds.
ks_upsert_perf_baseline ks_get_perf_baseline ks_list_perf_baselines ks_insert_perf_observation ks_get_perf_trend
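Regression detection against a stored baseline can be as simple as a relative-increase check. The sketch below assumes a 10% tolerance and metric names such as `p95_ms`; both are illustrative choices, not values taken from the platform.

```python
def find_regressions(baseline: dict[str, float], observed: dict[str, float],
                     tolerance: float = 0.10) -> dict[str, float]:
    """Return metrics whose observed value exceeds the baseline by more than `tolerance`.

    Values in the result are the relative increase (0.25 means 25% slower).
    """
    regressions = {}
    for metric, base in baseline.items():
        value = observed.get(metric)
        if value is None or base <= 0:
            continue  # nothing to compare against
        increase = (value - base) / base
        if increase > tolerance:
            regressions[metric] = round(increase, 3)
    return regressions


baseline = {"p95_ms": 200.0, "p99_ms": 400.0}
observed = {"p95_ms": 250.0, "p99_ms": 410.0}
print(find_regressions(baseline, observed))
```

Here p95 rose by 25% and is flagged, while the 2.5% rise in p99 stays within tolerance.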
💡
3 tools
AI Insights
Claude-generated cross-run observations — anomalies, trends, recommendations, severity. Platform's "working memory": agents leave notes for each other to read.
ks_insert_insight ks_list_insights ks_acknowledge_insight
👁️
4 tools
Visual Baselines
Approved screenshot baselines per page, viewport, and browser. Pixel-diff percentages stored per CI run with pass/warn/fail status.
ks_upsert_visual_baseline ks_get_visual_baseline ks_insert_visual_diff ks_list_visual_diffs
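A pixel-diff percentage and its pass/warn/fail status could be computed along these lines. Images are represented as nested lists of grayscale values to keep the example dependency-free, and the 0.5% and 2% thresholds are assumptions; the platform's real thresholds are not stated here.

```python
def diff_percent(baseline: list[list[int]], current: list[list[int]]) -> float:
    """Percentage of pixels that differ between two same-sized grayscale images."""
    total = changed = 0
    for base_row, cur_row in zip(baseline, current):
        for b, c in zip(base_row, cur_row):
            total += 1
            changed += b != c  # True counts as 1
    return 100.0 * changed / total if total else 0.0


def classify(percent: float, warn_at: float = 0.5, fail_at: float = 2.0) -> str:
    """Map a diff percentage to pass/warn/fail (threshold values are assumptions)."""
    if percent >= fail_at:
        return "fail"
    return "warn" if percent >= warn_at else "pass"


baseline_img = [[0, 0], [0, 0]]
current_img = [[0, 0], [0, 255]]
pct = diff_percent(baseline_img, current_img)
print(pct, classify(pct))
```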
📈
3 tools
Suite Health
Composite health scores per suite: pass rate, flakiness rate, coverage %, open defects. Platform-wide health summary across all 14 agents and every run.
ks_upsert_suite_health ks_list_suite_health ks_get_platform_health_summary
14 AI QA Agents
Each agent is a specialised Claude-powered worker with its own system prompt, MCP tool scope, and knowledge store domains. Each entry below lists the MCP servers the agent connects to, followed by the knowledge domains it uses.
Knowledge domains: Test Cases · Results · Defect Patterns · Perf Baselines · AI Insights · Visual · Suite Health
Phase 1 — Planning
📋
Test Blueprint Agent
test_blueprint_agent.py
Analyses a PR diff and produces a structured test plan — which modules are affected, what test types are needed, and priority levels.
MCPs: knowledge-store · github
Test Cases · AI Insights
Test Generation Agent
test_gen_agent.py
Takes the blueprint, checks for duplicate test cases in the store, generates new test case definitions, and writes them with full metadata.
MCPs: knowledge-store
Test Cases · Results
Phase 2 — Execution (parallel)
🤖
E2E Runner Agent
e2e_runner_agent.py
Retrieves test cases, runs Playwright E2E tests against staging, stores pass/fail results, posts a GitHub Check Run with the outcome.
MCPs: knowledge-store · playwright · github
Results · Defect Patterns · Suite Health
🔌
API Probe Agent
api_probe_agent.py
Validates REST/GraphQL endpoints — contracts, status codes, schemas, and response times. Compares against stored performance baselines.
MCPs: knowledge-store · github
Results · Perf Baselines · Defect Patterns
Performance Monitor Agent
perf_monitor_agent.py
Runs k6 load tests, compares p95/p99 latency against stored baselines, records observations, flags regressions with Slack alerts.
MCPs: knowledge-store · k6 · slack
Perf Baselines · Results · AI Insights
👁️
Visual AI Agent
visual_ai_agent.py
Captures screenshots with Playwright, compares pixel-by-pixel to stored baselines, records diff percentages, reports visual regressions.
MCPs: knowledge-store · playwright
Visual · Results · AI Insights
Accessibility Agent
accessibility_agent.py
Runs axe-core WCAG 2.1 checks via Playwright, creates Jira tickets for violations above severity threshold, writes remediation insights.
MCPs: knowledge-store · playwright · jira
Results · Defect Patterns · AI Insights
🔒
Security Scanner Agent
security_scanner_agent.py
Scans for OWASP vulnerabilities — exposed headers, injection surface, auth flaws, secrets. Creates Jira and GitHub issues for critical findings.
MCPs: knowledge-store · github · jira
Defect Patterns · AI Insights · Results
🛡️
Data Guard Agent
data_guard_agent.py
Validates PostgreSQL integrity — foreign key violations, schema drift, null constraints, missing PKs. Runs a full migration health check after every deploy.
MCPs: knowledge-store · postgres
Test Cases · Results · Defect Patterns
Phase 3 — Analysis & Reporting
🐛
Defect Analyst Agent
defect_analyst_agent.py
Reads all new failures, correlates with stored defect patterns, estimates root cause, increments frequency counts, creates Jira tickets for new patterns.
MCPs: knowledge-store · jira · github
Defect Patterns · AI Insights · Results
📡
CI Monitor Agent
ci_monitor_agent.py
Computes suite health, detects anomalies in pass rates, posts Slack alerts for critical failures and SLA breaches. Updates platform health snapshot.
MCPs: knowledge-store · github · slack
Suite Health · AI Insights · Results
📊
Coverage Agent
coverage_agent.py
Measures code and requirements coverage per module. Flags untested areas, writes coverage-gap insights, updates suite health records.
MCPs: knowledge-store · github
Suite Health · AI Insights · Test Cases
💬
Chatbot Eval Agent
chatbot_eval_agent.py
Evaluates LLM-powered features for accuracy, tone, safety, and hallucination. Posts structured eval results to Slack and stores quality trend insights.
MCPs: knowledge-store · slack
Results · AI Insights · Perf Baselines
📄
Report Gen Agent
report_gen_agent.py
Reads all 7 knowledge domains after each CI run and generates the executive QA report — posted to Slack and Jira, with GitHub check run details.
MCPs: knowledge-store · jira · slack · github
Test Cases · Results · Defect Patterns · Perf Baselines · AI Insights · Suite Health
Integration Guide
All 7 MCP servers are configured for Claude Desktop via a single JSON paste. For CI/CD pipelines, a GitHub Actions workflow starts all servers and runs the orchestrator automatically on every PR.
All 7 MCP Servers — Claude Desktop Config
Paste into %APPDATA%\Claude\claude_desktop_config.json (Windows) or ~/Library/Application Support/Claude/claude_desktop_config.json (macOS), then restart Claude Desktop.
🧠 knowledge-store · stdio · 28 tools
🎭 playwright · stdio · 7 tools
k6 · stdio · 5 tools
🐙 github · stdio · 7 tools
📋 jira · stdio · 6 tools
🗄️ postgres · stdio · 6 tools
💬 slack · stdio · 5 tools
external-mcps/claude_desktop_config.json (excerpt)
{
  "mcpServers": {
    "knowledge-store": { "command": "python", "args": ["../knowledge-store-mcp/server.py", "--stdio"] },
    "playwright": { "command": "python", "args": ["playwright-mcp/server.py", "--stdio"] },
    "github": { "command": "python", "args": ["github-mcp/server.py", "--stdio"], "env": { "GITHUB_TOKEN": "ghp_..." } },
    "jira": { "command": "python", "args": ["jira-mcp/server.py", "--stdio"], "env": { "JIRA_EMAIL": "your@email.com", "JIRA_API_TOKEN": "...", "JIRA_BASE_URL": "https://yourorg.atlassian.net" } },
    "postgres": { "command": "python", "args": ["postgres-mcp/server.py", "--stdio"], "env": { "PG_DSN": "postgresql://user:pass@localhost/mydb" } },
    "slack": { "command": "python", "args": ["slack-mcp/server.py", "--stdio"], "env": { "SLACK_BOT_TOKEN": "xoxb-..." } }
  }
}
CI/CD: GitHub Actions Workflow
Add .github/workflows/ai-qa.yml to your repo. The platform triggers automatically on every pull request to main or develop.
.github/workflows/ai-qa.yml
on:
  pull_request:
    branches: [main, develop]
jobs:
  ai-qa:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Start MCP servers
        run: python knowledge-store-mcp/server.py --port 8090 &
      - name: Run AI QA Orchestrator
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: python agents/cicd_orchestrator_agent.py --pr ${{ github.event.pull_request.number }} --staging-url ${{ secrets.STAGING_URL }}
Calling an Agent — Python Pattern
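Every agent is built around client.beta.messages.create(). The first call sends the task and the MCP tool list; a while loop then drives the ReAct cycle, executing the requested tools and calling the API again for as long as stop_reason == "tool_use". The loop ends once Claude returns stop_reason == "end_turn".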
agents/e2e_runner_agent.py (simplified)
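import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-sonnet-4-6"  # execution-phase agents use Sonnet
# MCP entries as this project declares them; check the shape against the MCP beta your SDK version supports.
TOOLS = [
    {"type": "mcp", "source": {"type": "url", "url": "http://localhost:8090/mcp", "name": "knowledge-store"}},
    {"type": "mcp", "source": {"type": "url", "url": "http://localhost:8091/mcp", "name": "playwright"}},
    {"type": "mcp", "source": {"type": "url", "url": "http://localhost:8093/mcp", "name": "github"}},
]
messages = [{"role": "user", "content": "Run E2E suite for PR #42"}]
response = client.beta.messages.create(model=MODEL, max_tokens=4096, tools=TOOLS, messages=messages)
# ReAct loop: keep executing requested tools until Claude stops asking for them.
while response.stop_reason == "tool_use":
    results = execute_tools(response.content)  # project helper: returns a list of tool_result blocks
    messages += [
        {"role": "assistant", "content": response.content},
        {"role": "user", "content": results},
    ]
    response = client.beta.messages.create(model=MODEL, max_tokens=4096, tools=TOOLS, messages=messages)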
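The snippet calls execute_tools without showing it. One possible shape is sketched below, assuming tools are executed client-side: each tool_use block in the response is dispatched to a local handler and answered with a tool_result block that carries the same ID. The `HANDLERS` table and its single entry are hypothetical, and `SimpleNamespace` stands in for the SDK's content block objects so the example runs without an API key.

```python
from types import SimpleNamespace
from typing import Any, Callable

# Hypothetical local handlers keyed by tool name.
HANDLERS: dict[str, Callable[..., Any]] = {
    "ks_get_run_summary": lambda run_id: {"run_id": run_id, "pass_rate": 0.97},
}


def execute_tools(content: list[Any]) -> list[dict]:
    """Run every tool_use block and return the matching tool_result blocks."""
    results = []
    for block in content:
        if getattr(block, "type", None) != "tool_use":
            continue  # skip text blocks
        try:
            output, is_error = str(HANDLERS[block.name](**block.input)), False
        except Exception as exc:  # report the failure back so the model can retry
            output, is_error = f"{type(exc).__name__}: {exc}", True
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,  # must match the ID of the request
            "content": output,
            "is_error": is_error,
        })
    return results


# Stand-in for a response.content list holding one tool request.
fake_content = [SimpleNamespace(type="tool_use", id="toolu_01", name="ks_get_run_summary",
                                input={"run_id": "run-7"})]
print(execute_tools(fake_content))
```

Returning errors as tool_result blocks with is_error set, instead of raising, is what gives the model a chance to correct the call on its next turn.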
Quick Start — Local Development
Run the full platform locally against a staging environment in 3 steps.
1
Set credentials
environment
export ANTHROPIC_API_KEY=sk-ant-...
export GITHUB_TOKEN=ghp_...
export JIRA_EMAIL=your@email.com
export SLACK_BOT_TOKEN=xoxb-...
export PG_DSN=postgresql://user:pass@localhost/mydb
2
Start MCP servers
bash
python knowledge-store-mcp/server.py --port 8090 &
python external-mcps/playwright-mcp/server.py --port 8091 &
python external-mcps/github-mcp/server.py --port 8093 &
sleep 5
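A fixed sleep can be too short on a slow machine and wastes time on a fast one. As an alternative, the hypothetical helper below polls the server ports until they accept TCP connections. It uses only the standard library, and its name and defaults are illustrative.

```python
import socket
import time


def wait_for_ports(ports: list[int], host: str = "127.0.0.1", timeout: float = 30.0) -> list[int]:
    """Block until every port accepts TCP connections; return the ports still closed at timeout."""
    deadline = time.monotonic() + timeout
    pending = list(ports)
    while pending and time.monotonic() < deadline:
        for port in list(pending):
            try:
                with socket.create_connection((host, port), timeout=1.0):
                    pending.remove(port)  # server is listening
            except OSError:
                pass  # not up yet
        if pending:
            time.sleep(0.5)
    return pending
```

Calling `wait_for_ports([8090, 8091, 8093])` before starting the orchestrator returns an empty list once all three servers are listening, or the list of ports that never came up. An open port shows that the process is listening, not that the MCP endpoint is fully initialised.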
3
Run the orchestrator
bash
python agents/cicd_orchestrator_agent.py \
--pr 42 \
--staging-url https://staging.example.com
Tools Reference
28 tools across 7 knowledge domains — the full knowledge-store-mcp surface area
Tool · Domain · R/W · Description
ks_insert_test_case · Test Cases · Write · Insert a new test case definition into the knowledge store. Requires test_case_id, module, name, test_type, and steps.
ks_get_test_case · Test Cases · Read · Fetch a single test case by ID. Returns full definition including steps, priority, and module.
ks_list_test_cases · Test Cases · Read · List and filter test cases by module, test_type, or status. Default limit 50.
ks_update_test_case · Test Cases · Write · Update an existing test case: steps, priority, status, or expected outcome.
ks_delete_test_case · Test Cases · Destructive · Delete a test case by ID. Irreversible (destructiveHint: true).
ks_insert_test_result · Test Results · Write · Record a test execution result (pass/fail/flaky/skipped) with duration and optional error message.
ks_get_run_summary · Test Results · Read · Get aggregated pass/fail/flaky counts and pre-computed pass_rate for a run_id.
ks_list_test_results · Test Results · Read · List individual test results for a run, filterable by status. Includes error messages.
ks_list_flaky_tests · Test Results · Read · Return test cases with flaky status across recent runs; used by defect_analyst_agent for pattern detection.
ks_insert_defect_pattern · Defect Patterns · Write · Store a newly detected defect pattern with module, root_cause, and suggested_fix.
ks_list_defect_patterns · Defect Patterns · Read · List known defect patterns filterable by status (active/resolved) and module. Returns root_cause and suggested_fix.
ks_increment_defect_frequency · Defect Patterns · Write · Increment the recurrence count for an existing defect pattern. Called when a known pattern is seen again.
ks_resolve_defect_pattern · Defect Patterns · Write · Mark a defect pattern as resolved. Triggers Jira auto-transition via defect_analyst_agent.
ks_upsert_perf_baseline · Perf Baselines · Write · Create or update a performance baseline (p50/p95/p99 latency, error rate) for an endpoint.
ks_get_perf_baseline · Perf Baselines · Read · Retrieve the current performance baseline for a specific endpoint; used by perf_monitor_agent for regression detection.
ks_list_perf_baselines · Perf Baselines · Read · List all performance baselines, optionally filtered by module or endpoint prefix.
ks_insert_perf_observation · Perf Baselines · Write · Record a single k6 load test observation (latency percentiles, throughput, error rate).
ks_get_perf_trend · Perf Baselines · Read · Return the last N observations for an endpoint to show performance trajectory over time.
ks_insert_insight · AI Insights · Write · Persist an AI-generated insight (anomaly, recommendation, risk flag) with severity and source agent.
ks_list_insights · AI Insights · Read · List insights filterable by severity, acknowledged status, or source agent. Used by report_gen_agent.
ks_acknowledge_insight · AI Insights · Write · Mark an insight as acknowledged so it is excluded from future unreviewed lists.
ks_upsert_visual_baseline · Visual Baselines · Write · Store or update the reference screenshot hash and metadata for a page/viewport combination.
ks_get_visual_baseline · Visual Baselines · Read · Retrieve the current visual baseline for a page; used by visual_ai_agent before taking new screenshots.
ks_insert_visual_diff · Visual Baselines · Write · Record a detected visual difference with diff percentage, affected region, and run_id.
ks_list_visual_diffs · Visual Baselines · Read · List visual diffs for a run or page, sorted by diff percentage. Used in reports and PR checks.
ks_upsert_suite_health · Suite Health · Write · Update the suite health record for a module: pass rate, flaky count, coverage, last run timestamp.
ks_list_suite_health · Suite Health · Read · List suite health records for all modules; used by coverage_agent and report_gen_agent.
ks_get_platform_health_summary · Suite Health · Read · Return a single-call platform health overview: aggregate pass rate, active defects, regressions, and critical insights.