Practical Sessions – AI Testing Playground

🛠️ Practical Sessions

Promptfoo Test Harness

⚙️

Promptfoo Setup & YAML Config

Install promptfoo, configure providers and write your first test suite in YAML

✅

Promptfoo Assertions & Scoring

Master built-in assertions, custom scorers, and threshold-based pass/fail logic

🔴

Promptfoo Red-Teaming

Automated adversarial testing — scan for jailbreaks, PII leaks and prompt injections

🚀

Promptfoo CI/CD Integration

Gate deployments on LLM quality checks using GitHub Actions, pass-rate thresholds and diff reports

LangTest

🔧

LangTest Setup & Harness Config

Install LangTest, configure the Harness, and connect Hugging Face or OpenAI models

💪

LangTest Robustness & NLP Tests

Test model stability under typos, case changes, contractions and entity swaps

⚖️

LangTest Bias & Fairness Tests

Detect demographic, gender, religious and nationality biases in NLP model predictions

📊

Custom Tests & HTML Reports

Write custom test types, set per-category thresholds, and export shareable HTML reports

DeepEval

🔬

DeepEval Setup & Core Metrics

Install DeepEval, write your first test case, and run built-in metrics like answer relevancy and hallucination

🗂️

DeepEval RAG Evaluation

Evaluate RAG pipelines with faithfulness, context recall, context precision and answer relevancy

⚗️

DeepEval Custom Metrics & G-Eval

Build domain-specific metrics with G-Eval and write fully custom scorer classes

RAGAS

📐

RAGAS Setup & Core Metrics

Install RAGAS, build a Dataset, and compute faithfulness, answer relevancy and context metrics

🧬

RAGAS Advanced: Testset Generation & Custom Metrics

Auto-generate evaluation testsets from your documents and write custom RAGAS metrics

📈

RAGAS Production Monitoring

Track RAG quality in production with continuous RAGAS scoring and drift detection

LangSmith

🔭

LangSmith Tracing & Observability

Instrument LangChain apps with LangSmith tracing — capture runs, inspect spans and debug failures

📋

LangSmith Datasets & Evaluation

Create evaluation datasets, run automated evaluators and track quality metrics over time