🛠️ Practical Sessions
Promptfoo Test Harness
Promptfoo Setup & YAML Config
Install promptfoo, configure providers and write your first test suite in YAML
Promptfoo Assertions & Scoring
Master built-in assertions, custom scorers, and threshold-based pass/fail logic
Promptfoo Red-Teaming
Run automated adversarial tests that scan for jailbreaks, PII leaks and prompt injections
Promptfoo CI/CD Integration
Gate deployments on LLM quality checks in GitHub Actions, using pass-rate thresholds and diff reports
🛠️ Practical Sessions
LangTest
LangTest Setup & Harness Config
Install LangTest, configure the Harness, and connect HuggingFace or OpenAI models
LangTest Robustness & NLP Tests
Test model stability under typos, case changes, contractions and entity swaps
LangTest Bias & Fairness Tests
Detect demographic, gender, religion and nationality biases in NLP model predictions
Custom Tests & HTML Reports
Write custom test types, set per-category thresholds, and export shareable HTML reports
🛠️ Practical Sessions
DeepEval
DeepEval Setup & Core Metrics
Install DeepEval, write your first test case, and run built-in metrics like answer relevancy and hallucination
DeepEval RAG Evaluation
Evaluate RAG pipelines with faithfulness, context recall, context precision and answer relevancy
DeepEval Custom Metrics & G-Eval
Build domain-specific metrics with G-Eval and write fully custom scorer classes
🛠️ Practical Sessions
RAGAS
RAGAS Setup & Core Metrics
Install RAGAS, build a Dataset, and compute faithfulness, answer relevancy and context metrics
RAGAS Advanced: Testset Generation & Custom Metrics
Auto-generate evaluation testsets from your documents and write custom RAGAS metrics
RAGAS Production Monitoring
Track RAG quality in production with continuous RAGAS scoring and drift detection
🛠️ Practical Sessions
LangSmith
LangSmith Tracing & Observability
Instrument LangChain apps with LangSmith tracing to capture runs, inspect spans and debug failures
LangSmith Datasets & Evaluation
Create evaluation datasets, run automated evaluators and track quality metrics over time