A Quality Standard tracks a standardized process for how a team evaluates a model's performance on a dataset.
Users may define and manage quality standards for a dataset in the Kolena web application from that dataset's
Quality Standards tab. Once defined, a quality standard provides a well-defined framework for easily understanding and
comparing future model results.
Test cases allow users to evaluate their datasets at various levels of division. They provide visibility into how models perform on different subsets of the full dataset and help mitigate failures caused by hidden stratifications.
Kolena supports easy test case creation by dividing a dataset along categorical or numeric datapoint properties.
For example, if you have a dataset with images of faces of individuals, you may wish to create a set of test cases that
divides your dataset by a categorical property such as datapoint.race, or by a numeric property.
The datasets quickstart provides a more hands-on example of defining test cases.
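As a rough illustration of dividing a dataset along categorical or numeric datapoint properties, the following Python sketch groups a handful of in-memory datapoints. It is not Kolena's API: the helper names split_categorical and split_numeric, the sample values, and the numeric field age are all hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical datapoints. "race" mirrors the categorical example above;
# "age" is an assumed numeric field used only for illustration.
datapoints = [
    {"id": 1, "race": "group_a", "age": 23},
    {"id": 2, "race": "group_b", "age": 41},
    {"id": 3, "race": "group_a", "age": 67},
    {"id": 4, "race": "group_c", "age": 35},
]


def split_categorical(rows, field):
    """Group rows by the distinct values of a categorical field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row)
    return dict(groups)


def split_numeric(rows, field, edges):
    """Bucket rows into half-open bins [edges[i], edges[i + 1])."""
    groups = {}
    for lo, hi in zip(edges, edges[1:]):
        groups[f"{lo}-{hi}"] = [r for r in rows if lo <= r[field] < hi]
    return groups


by_race = split_categorical(datapoints, "race")
by_age = split_numeric(datapoints, "age", [0, 30, 60, 120])

for name, rows in {**by_race, **by_age}.items():
    print(name, [r["id"] for r in rows])
```

Each resulting group plays the role of one test case: a named subset of the full dataset over which metrics can later be computed.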
Metrics describe the criteria used to evaluate the performance of a model and compare it with other models over a given dataset and its test cases.
Kolena supports defining metrics by applying standard aggregations over datapoint-level results or by leveraging
common machine learning aggregations, such as Precision or F1 Score. Once defined, users may also specify
highlighting for metrics, indicating whether Higher is better or Lower is better.
The datasets quickstart provides a more hands-on example of defining metrics.
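To make the two kinds of metric concrete, here is a minimal Python sketch that computes a standard aggregation (a mean) and the Precision, Recall, and F1 Score aggregations from datapoint-level results. It does not use Kolena's API; the result fields label, prediction, and latency_ms and the function precision_recall_f1 are assumptions made for this example.

```python
from statistics import mean

# Hypothetical datapoint-level results for a binary classifier.
results = [
    {"label": 1, "prediction": 1, "latency_ms": 12.0},
    {"label": 1, "prediction": 0, "latency_ms": 15.0},
    {"label": 0, "prediction": 1, "latency_ms": 11.0},
    {"label": 0, "prediction": 0, "latency_ms": 14.0},
    {"label": 1, "prediction": 1, "latency_ms": 13.0},
]


def precision_recall_f1(rows):
    """Aggregate per-datapoint outcomes into precision, recall, and F1."""
    tp = sum(1 for r in rows if r["prediction"] == 1 and r["label"] == 1)
    fp = sum(1 for r in rows if r["prediction"] == 1 and r["label"] == 0)
    fn = sum(1 for r in rows if r["prediction"] == 0 and r["label"] == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Standard aggregation: mean of a numeric datapoint-level field.
mean_latency = mean(r["latency_ms"] for r in results)

# Common machine learning aggregations.
precision, recall, f1 = precision_recall_f1(results)

print(f"mean latency: {mean_latency:.1f} ms")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this sketch, F1 Score would be a Higher is better metric, while mean latency would be a Lower is better metric.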
Once you've defined your test cases and metrics, you can view and compare model results in the Quality Standards tab,
which provides a quick, standardized, high-level overview of which models perform best over your different test cases.
For step-by-step instructions, take a look at the quickstart for model comparison.
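The idea behind such a comparison can be sketched in a few lines of Python: given one metric value per model and test case, pick the best model for each test case according to whether higher or lower is better. The model names, test case names, metric values, and the helper best_per_test_case are all hypothetical, and this does not reflect how Kolena computes or renders the comparison.

```python
# Hypothetical metric values (e.g. F1 Score) per model and test case.
f1_by_model = {
    "model_a": {"complete dataset": 0.81, "subset_1": 0.90, "subset_2": 0.62},
    "model_b": {"complete dataset": 0.84, "subset_1": 0.70, "subset_2": 0.75},
}


def best_per_test_case(table, higher_is_better=True):
    """Return the best-performing model name for every test case."""
    pick = max if higher_is_better else min
    test_cases = next(iter(table.values())).keys()
    return {
        test_case: pick(table, key=lambda model: table[model][test_case])
        for test_case in test_cases
    }


best = best_per_test_case(f1_by_model, higher_is_better=True)
for test_case, model in best.items():
    print(f"{test_case}: {model}")
```

Note that a model that wins on the complete dataset can still lose on an individual subset, which is the kind of hidden stratification that test cases are meant to expose.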
The Debugger tab of a dataset allows users to experiment with test cases and metrics without saving them to the
team-level quality standards. Users can search for meaningful test cases and try out different metrics,
confident that they can save the updated values to their quality standards when they are ready,
without the risk of accidentally replacing what the team has previously defined. The Debugger tab also provides a view for
visualizing results and their relationships in plots.
For step-by-step instructions, take a look at the quickstart for results exploration.