Testing, Monitoring, and Evaluating OpenAI Models

Test, monitor, and evaluate GPT-5 apps using the latest OpenAI APIs. Build typed, tool-using services, run CI-grade evals, add observability, and enforce compliance for production-ready LLM deployments.

Brian Letort

What you'll learn

As AI applications move into high-stakes domains, ensuring their accuracy, reliability, and safety is more important than ever. In this course, Testing, Monitoring, and Evaluating OpenAI Models, you’ll gain the ability to build, harden, and monitor GPT-5-powered applications using the latest OpenAI APIs.

First, you’ll explore how to create typed, tool-using services with the Responses API—enforcing structured JSON Schema outputs, calling function tools, streaming partial responses, and enabling built-in web and file search. Along the way, you’ll log key metrics like latency and cost, and enable prompt caching to measure performance improvements.
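As a rough illustration of the latency and cost logging mentioned above, here is a minimal Python sketch. It does not call the OpenAI SDK; it assumes you have already read token counts off a response. The price table, the `CallMetrics` class, and the `timed_call` helper are hypothetical names invented for this example, and the prices are placeholders rather than real rates.

```python
import time
from dataclasses import dataclass

# Hypothetical USD prices per million tokens; replace with current published rates.
PRICE_PER_MTOK = {"input": 1.25, "cached_input": 0.125, "output": 10.00}


@dataclass
class CallMetrics:
    """Metrics for one model call (field names are assumptions for this sketch)."""
    latency_s: float
    input_tokens: int
    cached_tokens: int  # portion of input_tokens served from the prompt cache
    output_tokens: int

    @property
    def cost_usd(self) -> float:
        # Cached input tokens are billed at the (assumed) discounted rate.
        uncached = self.input_tokens - self.cached_tokens
        total = (
            uncached * PRICE_PER_MTOK["input"]
            + self.cached_tokens * PRICE_PER_MTOK["cached_input"]
            + self.output_tokens * PRICE_PER_MTOK["output"]
        )
        return total / 1_000_000


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, wall-clock latency in seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Comparing `cost_usd` for the same request with and without cached tokens gives one simple way to quantify what prompt caching saves under the assumed prices.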

Next, you’ll discover how to implement CI-grade evaluation gates using the Evals and Batch APIs. You’ll score model outputs for correctness, tone, and relevance; integrate webhooks to receive completions; and generate JUnit-style reports that feed directly into your CI/CD pipelines—blocking poor-quality releases before they reach production.
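The JUnit-style gate described above could look roughly like the following Python sketch. It assumes eval scores have already been collected as `(case_name, score)` pairs in the range 0 to 1. The `junit_report` function, the 0.8 threshold, and the suite name are illustrative assumptions, not part of the Evals or Batch APIs.

```python
import xml.etree.ElementTree as ET


def junit_report(results, threshold=0.8, suite_name="llm-evals"):
    """Build a JUnit-style XML string from (case_name, score) pairs.

    Returns (xml_string, failure_count). A case fails when its score
    falls below the threshold (an assumed pass/fail rule).
    """
    failure_count = sum(1 for _, score in results if score < threshold)
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(results)),
        failures=str(failure_count),
    )
    for name, score in results:
        case = ET.SubElement(suite, "testcase", name=name)
        if score < threshold:
            ET.SubElement(
                case,
                "failure",
                message=f"score {score:.2f} below threshold {threshold:.2f}",
            )
    return ET.tostring(suite, encoding="unicode"), failure_count
```

A CI step can write the XML to a file for the test-report viewer and exit with a nonzero status when the failure count is greater than zero, which is what blocks the release.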

Finally, you’ll learn how to monitor AI services in production. Using the Agents SDK, you’ll capture detailed traces, export telemetry via OpenTelemetry, and add moderation layers for multimodal content. You’ll also build a React interface to visualize GPT-5 responses, automated quality scores, PII detection, and compliance flags—leveraging GPT-4o as a judge.
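To make the idea of PII and compliance flags concrete, here is a deliberately simple Python sketch. It uses regular expressions as a cheap pre-filter, which is a swapped-in technique for illustration and not the model-as-judge approach the course describes. The pattern set and the `pii_flags` name are assumptions, and the patterns cover only email addresses and US Social Security number formats.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def pii_flags(text):
    """Return a sorted list of the PII categories whose pattern matches text."""
    return sorted(
        name for name, pattern in PII_PATTERNS.items() if pattern.search(text)
    )
```

Flags like these could be attached to each response record so a dashboard can show them alongside quality scores.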

When you’re finished with this course, you’ll have the skills and knowledge to build production-grade GPT-5 applications that are testable, observable, and compliant—ready for deployment in real-world environments.

About the author

Brian Letort

Dr. Daniel “Brian” Letort is a 22+ year veteran of Information Technology. During a 21-year tenure at Northrop Grumman, Brian held roles in software engineering and systems engineering and served as Chief Applications Architect, Chief Data Scientist, and Chief Enterprise Architect. He held the NG Fellow title for six years and the Technical Fellow title for the four years before that. In 2022, Brian joined Digital Realty as the Chief Architect - Product and Artificial Intelligence. Alongside his work at Digital Realty, Brian has 12+ years of experience teaching Data Science and Computer Science classes as an adjunct professor. Brian has authored two books and holds two patents.
