Quickstart#
Install Kolena to set up rigorous and repeatable model testing in minutes.
In this quickstart guide, we'll use the
age_estimation
example integration to
demonstrate the how to curate test data and test models in Kolena.
Install kolena
#
Install the kolena
Python package to programmatically interact with Kolena:
Clone the Examples#
The kolenaIO/kolena repository contains a number of example integrations to clone and run directly:
-
Age Estimation using the Labeled Faces in the Wild (LFW) dataset
-
Facial Keypoint Detection using the 300 Faces in the Wild (300-W) dataset
-
Text Summarization using OpenAI GPT-family models and the CNN-DailyMail dataset
-
Example: Object Detection (2D) ↗
2D Object Detection using the COCO dataset
-
Example: Object Detection (3D) ↗
3D Object Detection using the KITTI dataset
-
Example: Binary Classification ↗
Binary Classification of class "Dog" using the Dogs vs. Cats dataset
-
Example: Multiclass Classification ↗
Multiclass Classification using the CIFAR-10 dataset
-
Example: Semantic Textual Similarity ↗
Semantic Textual Similarity using the STS benchmark dataset
-
Question Answering using the Conversational Question Answering (CoQA) dataset
-
Example: Semantic Segmentation ↗
Semantic Segmentation on class
Person
using the COCO-Stuff 10K dataset
To get started, clone the kolena
repository:
With the repository cloned, let's set up the
age_estimation
example:
Now we're up and running and can start creating test suites and testing models.
Create Test Suites#
Each of the example integrations comes with scripts for two flows:
seed_test_suite.py
: Create test cases and test suite(s) from a source datasetseed_test_run.py
: Test model(s) on the created test suites
Before running seed_test_suite.py
,
let's first configure our environment by populating the KOLENA_TOKEN
environment variable. Visit the
Developer page to
generate an API token and copy and paste the code snippet into your environment:
We can now create test suites using the provided seeding script:
After this script has completed, we can visit the Test Suites page to view our newly created test suites.
In this age_estimation
example, we've created test suites stratifying the LFW dataset (which is stored as a CSV in
S3) into test cases by age, estimated race, and estimated gender.
Test a Model#
After we've created test suites, the final step is to test models on these test suites. The age_estimation
example
provides the ssrnet
model for this step:
poetry run python3 age_estimation/seed_test_run.py \
"ssrnet" \
"age :: labeled-faces-in-the-wild [age estimation]" \
"race :: labeled-faces-in-the-wild [age estimation]" \
"gender :: labeled-faces-in-the-wild [age estimation]"
Note: Testing additional models
In this example, model results have already been extracted and are stored in CSV files in S3. To run a new model,
plug it into the infer
method in seed_test_run.py
.
Once this script has completed, click the results link in your console or visit Results to view the test results for this newly tested model.
Conclusion#
In this quickstart, we used an example integration from kolenaIO/kolena to create
test suites from the Labeled Faces in the Wild (LFW) dataset and test the
open-source ssrnet
model on these test suites.
This example shows us how to define an ML problem as a workflow for testing in Kolena, and can be arbitrarily extended with additional metrics, plots, visualizations, and data.