Evals: A Framework for Evaluating OpenAI Models and a Registry of Benchmarks
Evals is a framework for evaluating OpenAI models and an open-source registry of benchmarks. You can use Evals to create and run evaluations that use datasets to generate prompts, measure the quality of completions provided by an OpenAI model, and compare performance across different datasets and models. With Evals, we aim to make it as…
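The workflow described above (a dataset of prompts, scoring each completion, comparing models) can be illustrated with a minimal, self-contained sketch. This is not the Evals library's API. The JSONL-style sample shape with `input` and `ideal` fields, the exact-match metric, and the stub functions `model_a` and `model_b` are all assumptions made for illustration. A real run would call an OpenAI model where the stubs return canned answers.

```python
import json
from typing import Callable

# Hypothetical dataset in a JSONL-like shape: one JSON object per line,
# each with a prompt ("input") and the expected answer ("ideal").
SAMPLES_JSONL = """\
{"input": "What is 2 + 2?", "ideal": "4"}
{"input": "Capital of France?", "ideal": "Paris"}
{"input": "Opposite of hot?", "ideal": "cold"}
"""


def load_samples(text: str) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def exact_match_accuracy(samples: list[dict], complete: Callable[[str], str]) -> float:
    """Fraction of samples whose completion exactly matches the ideal answer."""
    correct = sum(complete(s["input"]).strip() == s["ideal"] for s in samples)
    return correct / len(samples)


# Hypothetical stand-in "models": in practice these would query an OpenAI model.
def model_a(prompt: str) -> str:
    canned = {"What is 2 + 2?": "4", "Capital of France?": "Paris", "Opposite of hot?": "warm"}
    return canned.get(prompt, "")


def model_b(prompt: str) -> str:
    return "4"  # always answers "4"


samples = load_samples(SAMPLES_JSONL)

# Compare models on the same dataset using the same metric.
results = {
    name: exact_match_accuracy(samples, fn)
    for name, fn in [("model_a", model_a), ("model_b", model_b)]
}
print(results)
```

Keeping the dataset, the completion function, and the metric as separate pieces is what makes comparison straightforward: swapping in a different model or a different dataset changes only one argument to `exact_match_accuracy`.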