Testing Models
Testing your models on Kolena is a simple process: load the images in your dataset, perform inference, and upload the inferences to the platform.
In Kolena, a model is a deterministic transformation from inputs to outputs. During testing, each image being tested is surfaced to your model exactly once. Regardless of how many test cases or test suites a given image belongs to, you only have to perform inference and upload results a single time.
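kolena-client handles this deduplication for you. The plain-Python sketch below only illustrates the idea. It assumes test suites can be modeled as a dict that maps a suite name to a list of image locators, which is an illustrative simplification and not kolena's data model. An image that appears in several suites is still passed to `infer` only once:

```python
from typing import Callable, Dict, List, TypeVar

Result = TypeVar("Result")


def infer_unique_images(
    test_suites: Dict[str, List[str]],
    infer: Callable[[str], Result],
) -> Dict[str, Result]:
    """Run `infer` once per unique image locator across all test suites."""
    results: Dict[str, Result] = {}
    for locators in test_suites.values():
        for locator in locators:
            # skip images already processed via another suite or test case
            if locator not in results:
                results[locator] = infer(locator)
    return results
```

With suites `{"A": ["a.jpg", "b.jpg"], "B": ["b.jpg", "c.jpg"]}`, `infer` runs three times, not four.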
kolena-client provides two main routes for testing: the simple way and the detailed way.

The Simple Way

kolena-client provides a test method for each workflow to handle the entire testing process.
The test method requires an InferenceModel instance with method(s) implementing that model's deterministic transformation from inputs to outputs.
Once your model implements the necessary method(s) to perform inference, testing is as simple as the examples below, one per workflow:
  • Custom
  • Object Detection
  • Instance Segmentation
  • Classification
  • Face Recognition (1:1)

Custom
For this example, let's assume our model's inference is abstracted away behind the my_model.infer method.
import my_model
from my_workflow import Model

model = Model("example-model", infer=my_model.infer, metadata=dict(
    description="any free-form metadata can be included in this dictionary",
))
Testing also requires an Evaluator that implements the metrics computation specific to your workflow.
from my_workflow import MyEvaluator, MyEvaluatorConfiguration

evaluator = MyEvaluator(configurations=[
    MyEvaluatorConfiguration(example="a"),
    MyEvaluatorConfiguration(example="b"),
])
With our model and evaluator defined, testing is performed by creating a TestRun:
from my_workflow import TestSuite, TestRun
# test our model on previously-created test suite 'A'
TestRun.test(model, TestSuite("A"), evaluator)

Object Detection
from typing import List

from kolena.detection import InferenceModel, test, TestImage, TestSuite
from kolena.detection.inference import BoundingBox

def infer(test_image: TestImage) -> List[BoundingBox]:
    """Transform a TestImage into a list of BoundingBox inferences"""
    # Step 1: load image at `test_image.locator`
    # Step 2: perform inference
    # Step 3: transform inferences into BoundingBox objects and return

model = InferenceModel("example-detection-model", infer=infer)
test_suites = [TestSuite("A"), TestSuite("B")]
test(model, *test_suites)
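Step 3 usually means turning a detector's parallel arrays of boxes, scores, and labels into one record per detection. The exact BoundingBox constructor is not shown on this page, so the sketch below uses a hypothetical `Detection` dataclass as a stand-in; you would map each record onto `kolena.detection.inference.BoundingBox` according to its API reference. The `(x1, y1, x2, y2)` box format and the `min_confidence` filter are also assumptions:

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical stand-in for one detector output; not a kolena type."""
    label: str
    confidence: float
    top_left: Tuple[float, float]
    bottom_right: Tuple[float, float]


def to_detections(
    boxes: Sequence[Tuple[float, float, float, float]],
    scores: Sequence[float],
    labels: Sequence[str],
    min_confidence: float = 0.0,
) -> List[Detection]:
    """Convert parallel (x1, y1, x2, y2) / score / label outputs into records."""
    if not (len(boxes) == len(scores) == len(labels)):
        raise ValueError("boxes, scores, and labels must have the same length")
    detections = []
    for (x1, y1, x2, y2), score, label in zip(boxes, scores, labels):
        # drop detections below the (assumed) confidence cutoff
        if score >= min_confidence:
            detections.append(Detection(label, float(score), (x1, y1), (x2, y2)))
    return detections
```

With the default `min_confidence=0.0`, every detection the model produces is kept.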

Instance Segmentation
from typing import List

from kolena.detection import InferenceModel, test, TestImage, TestSuite
from kolena.detection.inference import SegmentationMask

def infer(test_image: TestImage) -> List[SegmentationMask]:
    """Transform a TestImage into a list of SegmentationMask inferences"""
    # Step 1: load image at `test_image.locator`
    # Step 2: perform inference
    # Step 3: transform inferences into SegmentationMask objects and return

model = InferenceModel("example-segmentation-model", infer=infer)
test_suites = [TestSuite("A"), TestSuite("B")]
test(model, *test_suites)

Classification
from typing import List, Tuple

from kolena.classification import InferenceModel, test, TestImage, TestSuite

def infer(test_image: TestImage) -> List[Tuple[str, float]]:
    """Transform a TestImage into a list of (label, confidence) inferences"""
    # Step 1: load image at `test_image.locator`
    # Step 2: perform inference and return

model = InferenceModel("example-classification-model", infer=infer)
test_suites = [TestSuite("A"), TestSuite("B")]
test(model, *test_suites)
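If your classifier emits raw per-class scores (logits), one common way to produce the (label, confidence) pairs for Step 2 is a softmax. This is a sketch under that assumption; the label list is hypothetical, and if your model already outputs probabilities you can pair them with labels directly:

```python
import math
from typing import List, Sequence, Tuple


def to_label_confidences(
    logits: Sequence[float], labels: Sequence[str]
) -> List[Tuple[str, float]]:
    """Convert raw class scores (logits) to (label, confidence) pairs via softmax."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    # subtract the max logit for numerical stability before exponentiating
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    pairs = [(label, e / total) for label, e in zip(labels, exps)]
    # sort by descending confidence (a presentational choice, not a requirement)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

The returned confidences sum to 1, and the highest-scoring label comes first.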

Face Recognition (1:1)
from typing import Optional

import numpy as np
from kolena.fr import InferenceModel, test, TestSuite

def extract(locator: str) -> Optional[np.ndarray]:
    """Extract an embedding representing the face in the image"""
    # Step 1: load image at `locator`
    # Step 2: run model pipeline -- detect, align, and extract
    # Step 3: return extracted embedding, or None if no face was detected

def compare(embedding_a: np.ndarray, embedding_b: np.ndarray) -> float:
    """Compare two embeddings and generate a similarity score"""

model = InferenceModel("example-fr11-model", extract=extract, compare=compare)
test_suites = [TestSuite.load_by_name("A")]
test(model, *test_suites)
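For `compare`, a common choice for face embeddings is cosine similarity, where higher means more similar. The sketch below assumes that metric; your model may prescribe a different one. Returning 0.0 for a zero-length embedding is an arbitrary choice made here to avoid dividing by zero:

```python
import numpy as np


def compare(embedding_a: np.ndarray, embedding_b: np.ndarray) -> float:
    """Cosine similarity between two embeddings, in [-1, 1]."""
    a = np.asarray(embedding_a, dtype=np.float64).ravel()
    b = np.asarray(embedding_b, dtype=np.float64).ravel()
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    if norm == 0.0:
        # a zero vector has no direction; treat it as unrelated (arbitrary choice)
        return 0.0
    return float(np.dot(a, b) / norm)
```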
We recommend using this simplified test interface to start, and moving to the detailed TestRun interface later on if necessary.

The Detailed Way

Each workflow exports a TestRun object to provide more control over the flow of data during the testing process.
When testing with TestRun, a normal Model object can be created without any infer implementation.
Examples for each workflow follow:
  • Object Detection
  • Instance Segmentation
  • Classification
  • Face Recognition (1:1)

Object Detection
from kolena.detection import Model

model = Model("example-detection-model", metadata=dict(
    description="simple model descriptor (note no `infer` method is necessary)",
))
For this example, let's assume our model's inference is abstracted away behind the my_code.infer function, which transforms a TestImage into a list of BoundingBox inferences.
from kolena.detection import TestRun, TestSuite
from my_code import infer  # model implementation

with TestRun(model, TestSuite("A"), TestSuite("B")) as test_run:
    # perform any batching, parallelization, etc. desired here
    for test_image in test_run.iter_images():
        test_run.add_inferences(test_image, infer(test_image))
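The loop above runs inference one image at a time. If your model is faster on batches, you can group images before inference and still hand each result back individually. In this sketch, `infer_batch` is a hypothetical function that takes a list of images and returns one inference list per image, and the batch size of 32 is an arbitrary default:

```python
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # emit the final, possibly smaller, batch
        yield batch


def run_batched(
    images: Iterable[T],
    infer_batch: Callable[[List[T]], Sequence[R]],
    add_inferences: Callable[[T, R], None],
    batch_size: int = 32,
) -> None:
    """Run batched inference and hand each result back one image at a time."""
    for batch in batched(images, batch_size):
        results = infer_batch(batch)
        if len(results) != len(batch):
            raise ValueError("infer_batch must return one result per input")
        for image, result in zip(batch, results):
            add_inferences(image, result)
```

Inside the `with TestRun(...) as test_run:` block, this would be called as `run_batched(test_run.iter_images(), infer_batch, test_run.add_inferences)`.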

Instance Segmentation
from kolena.detection import Model

model = Model("example-segmentation-model", metadata=dict(
    description="simple model descriptor (note no `infer` method is necessary)",
))
For this example, let's assume our model's inference is abstracted away behind the my_code.infer function, which transforms a TestImage into a list of SegmentationMask inferences.
from kolena.detection import TestRun, TestSuite
from my_code import infer  # model implementation

with TestRun(model, TestSuite("A"), TestSuite("B")) as test_run:
    # perform any batching, parallelization, etc. desired here
    for test_image in test_run.iter_images():
        test_run.add_inferences(test_image, infer(test_image))

Classification
from kolena.classification import Model

model = Model("example-classification-model", metadata=dict(
    description="simple model descriptor (note no `infer` method is necessary)",
))
For this example, let's assume our model's inference is abstracted away behind the my_code.infer function, which transforms a TestImage into a list of (label, confidence) inferences.
from kolena.classification import TestRun, TestSuite
from my_code import infer  # model implementation

with TestRun(model, TestSuite("A"), TestSuite("B")) as test_run:
    # perform any batching, parallelization, etc. desired here
    for test_image in test_run.iter_images():
        test_run.add_inferences(test_image, infer(test_image))

Face Recognition (1:1)
Testing for the face recognition 1:1 workflow is divided into two phases:
  1. Extracting embeddings: each image in the test suite is surfaced to your model. During this phase your model detects bounding boxes surrounding faces in each image, estimates landmarks corresponding to the eyes, nose, and mouth of these faces, and extracts embedding vectors used in the next phase to compute similarity scores between images.
  2. Computing similarities: the embeddings extracted in the previous phase are used to compute float similarity scores for each image pair defined in the test suite. These similarity scores are used to compute the metrics reported in the platform.
from kolena.fr import Model

model = Model("example-fr11-model", metadata=dict(
    description="simple model descriptor (note no `extract` and `compare` methods are necessary)",
))
Next, let's start testing our model by creating a TestRun object:
from kolena.fr import TestRun, TestSuite
test_run = TestRun(model, TestSuite.load_by_name("A"))
If we have a model module my_model with an extract function that takes a locator and produces an embedding, the embedding extraction phase goes as follows:
import pandas as pd
from my_model import extract

# load a dataframe of all images in the test suite(s)
# can also load batches by providing a `batch_size`
df_image = test_run.load_remaining_images()

# compute an embedding for each image
df_result = pd.DataFrame([
    (record.image_id, extract(record.locator))
    for record in df_image.itertuples()
], columns=["image_id", "embedding"])

# additional columns that may be populated as desired to provide extra
# debugging information that is visible in the web platform gallery
df_result[["bounding_box", "landmarks_input_image", "landmarks",
           "quality_input_image", "quality", "acceptability",
           "fr_input_image", "failure_reason"]] = None

test_run.upload_image_results(df_result)
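In the simple-way example, extract returns None when no face is detected. One way to surface those misses is to record a reason next to the missing embedding. This sketch assumes the failure_reason column accepts a free-form string; the message text "no face detected" is hypothetical, so check kolena.fr.datatypes for the actual schema before relying on it. The other debugging columns can then be added to the returned frame as before:

```python
from typing import Callable, Optional

import numpy as np
import pandas as pd


def extract_all(
    df_image: pd.DataFrame,
    extract: Callable[[str], Optional[np.ndarray]],
) -> pd.DataFrame:
    """Extract one embedding per image, noting a reason when extraction fails."""
    records = []
    for record in df_image.itertuples():
        embedding = extract(record.locator)
        # hypothetical failure message; the expected format is an assumption
        reason = None if embedding is not None else "no face detected"
        records.append((record.image_id, embedding, reason))
    return pd.DataFrame(records, columns=["image_id", "embedding", "failure_reason"])
```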
The schema for each pd.DataFrame used during testing is documented in kolena.fr.datatypes.
Once we have extracted and uploaded embeddings for all images, we can complete our test run by computing similarities for all image pairs in this test suite:
import my_model  # the same model module used for extraction above

# load the embeddings extracted in the previous step and the image pairs
# defined in this test suite
df_embedding, df_pair = test_run.load_remaining_pairs()

# reindex embeddings in a dictionary for easier access
embs = {r.image_id: r.embedding for r in df_embedding.itertuples()}

# compute similarity scores for all pairs
df_pair["similarity"] = [
    my_model.similarity(embs[r.image_a_id], embs[r.image_b_id])
    for r in df_pair.itertuples()
]
test_run.upload_pair_results(df_pair[["image_pair_id", "similarity"]])
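The list comprehension above calls the similarity function once per pair. If your similarity is cosine similarity (an assumption; use whatever metric your model defines), all pairs can be scored in one vectorized NumPy pass. This sketch also assumes every image ID referenced by a pair has a non-None embedding:

```python
from typing import Dict, Hashable, Sequence, Tuple

import numpy as np


def pairwise_cosine(
    embeddings: Dict[Hashable, np.ndarray],
    pairs: Sequence[Tuple[Hashable, Hashable]],
) -> np.ndarray:
    """Cosine similarity for each (image_a_id, image_b_id) pair, computed in one batch."""
    if len(pairs) == 0:
        return np.empty(0)
    mat_a = np.stack([np.asarray(embeddings[a], dtype=np.float64).ravel() for a, _ in pairs])
    mat_b = np.stack([np.asarray(embeddings[b], dtype=np.float64).ravel() for _, b in pairs])
    # normalize rows; zero vectors are left as zeros to avoid division by zero
    norm_a = np.linalg.norm(mat_a, axis=1, keepdims=True)
    norm_b = np.linalg.norm(mat_b, axis=1, keepdims=True)
    mat_a = mat_a / np.where(norm_a == 0, 1.0, norm_a)
    mat_b = mat_b / np.where(norm_b == 0, 1.0, norm_b)
    # row-wise dot product of the normalized embeddings
    return np.einsum("ij,ij->i", mat_a, mat_b)
```

Used with the variables above, this becomes `df_pair["similarity"] = pairwise_cosine(embs, list(zip(df_pair["image_a_id"], df_pair["image_b_id"])))`.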
Reasons to use TestRun instead of test include (but are not limited to):
  • Parallelization: loading all test examples upfront and dividing the inference tasks between multiple workers operating in parallel
  • Batching: performing inference on multiple images at once instead of processing them one at a time
  • Uploading additional information: certain models produce output metadata in addition to inferences. TestRun provides various hooks to upload this additional data to Kolena during testing
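For parallelization, one approach is to load all test images upfront, fan inference out to a worker pool, and collect results in input order. This sketch uses a thread pool, which suits I/O-bound work such as downloading images from a bucket; for CPU-bound Python code, `ProcessPoolExecutor` is a drop-in alternative as long as `infer` is picklable. The worker count of 8 is an arbitrary default:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallel_infer(
    images: Sequence[T],
    infer: Callable[[T], R],
    max_workers: int = 8,
) -> List[R]:
    """Run `infer` over all images with a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs
        return list(pool.map(infer, images))
```

A possible use is `images = list(test_run.iter_images())`, then `results = parallel_infer(images, infer)`, then calling `test_run.add_inferences(image, result)` for each pair from the main thread. Uploading from a single thread avoids assuming that TestRun is thread-safe.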

Testing Best Practices & FAQ

Loading my dataset from my bucket makes testing slow. How can I speed things up?
My model performs inference more efficiently in batches. How can I leverage this during testing?
If I'm using a built-in workflow, can I use custom evaluation criteria to compute metrics for my model?