Test Cases

What is a Test Case?

At its core, a test case in Kolena is a benchmark dataset against which metrics are computed.
Test cases can be as small or as large as necessary — in some circumstances, a test case containing hundreds of thousands of examples is desirable. In other cases, a test case with a handful of laser-focused examples is best.
Types of test cases include:
  • Unit Tests: examples focusing on a particular scenario or subclass, e.g. "cars from the rear in low lighting"
  • Regression Tests: examples that a particular model training push improved upon (a squashed bug), or that cover one segment of the "long tail"
  • Integration Tests: examples mirroring deployment conditions in terms of composition, distribution, preprocessing, etc.
    • Integration tests may also be subsets of normal tests, run both pre-packaging and post-packaging, to ensure that operations such as quantization do not unacceptably impact performance.
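The pre-/post-packaging check described above amounts to a simple metric-delta gate. The sketch below is illustrative only; the function name and tolerance are assumptions, not part of any SDK.

```python
# Hypothetical integration-test gate: compare a metric computed on the same
# test case before and after packaging (e.g. quantization) and fail if the
# drop exceeds a tolerance. All names here are illustrative.

def passes_packaging_gate(f1_pre: float, f1_post: float, max_drop: float = 0.01) -> bool:
    """Return True if packaging degraded F1 by no more than max_drop."""
    return (f1_pre - f1_post) <= max_drop

# A 0.004 F1 drop passes a 0.01 tolerance; a 0.03 drop does not.
print(passes_packaging_gate(0.980, 0.976))  # True
print(passes_packaging_gate(0.980, 0.950))  # False
```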
How a test case is defined varies by workflow:
When building your own workflow, a test case may contain any number (>0) of test samples of any arbitrary data type, such as images with annotations or documents, each with ground truth(s) specific to your workflow.
In the Object Detection and Instance Segmentation workflows, a test case contains any number (>0) of test images, each with zero or more ground truth bounding boxes/segmentation masks defining the extents and the label of the object being bounded.
In the Classification workflow, a test case contains any number (>0) of test images, each with zero or more ground truth labels corresponding to the class(es) the image belongs to.
In the Face Recognition 1:1 workflow, a test case contains any number (>0) of image pairs, each with a ground truth label indicating whether the two images do or do not contain the same person.
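To make the shapes above concrete, here is a minimal sketch of a test case as a named collection of samples with workflow-specific ground truths. This uses plain dataclasses, not the Kolena SDK; every type and field name here is an assumption for illustration.

```python
# Illustrative data-structure sketch (NOT the Kolena SDK) of the test-case
# shapes described above: a named benchmark holding any number of samples,
# each carrying ground truth(s) specific to the workflow.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BoundingBox:
    """Ground truth for object detection: a labeled box extent."""
    label: str
    top_left: Tuple[float, float]
    bottom_right: Tuple[float, float]


@dataclass
class DetectionSample:
    """One test image with zero or more ground truth bounding boxes."""
    locator: str
    ground_truths: List[BoundingBox] = field(default_factory=list)


@dataclass
class FacePairSample:
    """One image pair with a same-person / different-person label (FR 1:1)."""
    locator_a: str
    locator_b: str
    is_same_person: bool


@dataclass
class TestCase:
    """A benchmark dataset: any number (>0) of samples of one sample type."""
    name: str
    samples: list


# A small unit-test-style test case for one scenario.
case = TestCase(
    name="cars from the rear in low lighting",
    samples=[
        DetectionSample(
            locator="s3://bucket/img1.png",
            ground_truths=[BoundingBox("car", (10, 20), (110, 180))],
        ),
    ],
)
print(len(case.samples))  # 1
```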

Why Test Cases?

Metrics computed against aggregate benchmarks don't tell the full story.
If you are an engineer, data scientist, product manager, team lead, or anyone else who builds or sells ML products, you've likely asked (or been asked):
  • What are my model's failure modes (bugs)?
  • If I've trained a new model, what in particular has improved or regressed from my previous model?
  • If my new model has improved from 96% F1 score to 98%, has it improved across all scenarios?
  • Which model should we deploy?
  • How do I know if my model is ready to be deployed?
Test cases help you provide direct, repeatable, and methodical answers to these questions.

Test Case Best Practices & FAQ

How large should my test case be?
How many test cases should I have?
What should I include in a test case?
Can I include the same example in multiple test cases?
Should I modify my test cases or keep them frozen?