AI Testing Playground

Master the complete AI quality engineering stack, from prompt design and model evaluation to safety red-teaming, RAG pipelines, and agent testing.

🧪 Core Testing
🔬

LLM Evaluation

Prompt engineering, model metrics, performance benchmarking & data quality testing

4 topics →
🔍

RAG & Pipeline Testing

Retrieval quality, answer faithfulness, context relevance, chunk quality & the RAGAS framework

4 topics →
💬

Conversational AI Testing

Dialogue flows, intent recognition, multi-turn coherence, persona consistency & fallback handling

4 topics →
🖼️

Multimodal AI Testing

Vision model testing, speech-to-text, image-text alignment, multimodal hallucination & OCR evaluation

4 topics →
🛡️ Safety & Governance
🛡️

AI Safety & Red Teaming

Jailbreak testing, prompt injection attacks, data poisoning, model inversion & adversarial examples

4 topics →
⚖️

Responsible AI

Explainability, fairness auditing, privacy governance, transparency & regulatory compliance

4 topics →
🤖 Agents & Systems
🤖

Agent & Tool Testing

Tool call validation, multi-agent systems, loop detection, MCP server testing & orchestration

4 topics →
📡

Observability & Monitoring

LLM tracing, production logging, quality dashboards, drift alerting, cost monitoring & incident response

4 topics →
🛠️ Practical Labs
🛠️

Practical Sessions

Hands-on code labs for the Promptfoo test harness and LangTest: assertions, red-teaming, bias testing & CI/CD integration

2 labs →