Quickstart: Object Detection

In this quickstart tutorial we'll use the COCO 2014 Validation dataset and a stubbed-out example model to create and run tests for the Object Detection workflow.

Getting Started

With the kolena-client Python client installed, first let's initialize a client session:
import os
import kolena
kolena.initialize(os.environ["KOLENA_TOKEN"], verbose=True)
The data used in this tutorial is publicly available in the kolena-public-datasets S3 bucket in a metadata.csv file:
import pandas as pd
DATASET = "coco-2014-val"
BUCKET = "s3://kolena-public-datasets"
df = pd.read_csv(f"{BUCKET}/{DATASET}/meta/metadata.csv")
To load CSVs directly from S3, make sure to install the s3fs Python module:
pip3 install s3fs[boto3]
This metadata.csv file describes an object detection dataset with the following columns:
  • locator: location of the image in S3
  • label: label corresponding to the object described by this record's bounding box
  • min_x: x coordinate for top left corner of bounding box
  • min_y: y coordinate for top left corner of bounding box
  • max_x: x coordinate for bottom right corner of bounding box
  • max_y: y coordinate for bottom right corner of bounding box
There is one record in this table for each ground truth bounding box in the dataset, meaning a given locator may be present multiple times.
For brevity, the COCO dataset has been pared down to only 14 classes.
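To get a feel for the one-record-per-bounding-box layout without touching S3, the sketch below parses a tiny in-memory CSV with the same columns and groups its records by locator, much as df.groupby("locator") does later in this tutorial. The locators and coordinates are made up for illustration and are not taken from the real metadata.csv.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the metadata.csv schema (made-up locators and boxes).
SAMPLE_CSV = """locator,label,min_x,min_y,max_x,max_y
s3://example-bucket/img/0001.jpg,person,10,20,110,220
s3://example-bucket/img/0001.jpg,dog,50,60,90,120
s3://example-bucket/img/0002.jpg,person,5,5,45,95
"""

# Group records by locator: one record per box means a locator can repeat.
boxes_by_locator = defaultdict(list)
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    top_left = (float(row["min_x"]), float(row["min_y"]))
    bottom_right = (float(row["max_x"]), float(row["max_y"]))
    boxes_by_locator[row["locator"]].append((row["label"], top_left, bottom_right))

# Report how many ground truth boxes each image carries.
for locator, boxes in boxes_by_locator.items():
    print(locator, len(boxes))
```

Each image ends up with a list of (label, top-left, bottom-right) tuples, which is the same shape of information the test case construction in Step 1 passes along for every image.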

Step 1: Creating Tests

With our data already in an S3 bucket and metadata loaded into memory, we can start creating test cases!
Let's create a simple test case containing the entire dataset:
from kolena.detection import TestCase, TestImage
from kolena.detection.ground_truth import BoundingBox
# one TestImage per locator, carrying every ground truth box recorded for that image
complete_test_case = TestCase(f"complete {DATASET}", images=[
    TestImage(locator, dataset=DATASET, ground_truths=[
        BoundingBox(record.label, (record.min_x, record.min_y), (record.max_x, record.max_y))
        for record in df_locator.itertuples()
    ]) for locator, df_locator in df.groupby("locator")
])
This dataset-sized test case is a good place to start, but let's drill a little deeper to create test cases for each class in the dataset:
complete_test_case_images = complete_test_case.load_images()
# one test case per class: every image is kept, but only ground truths of that class remain
class_test_cases = [
    TestCase(f"{label} ({DATASET})", images=[
        image.filter(lambda gt: gt.label == label)
        for image in complete_test_case_images
    ]) for label, df_label in df.groupby("label")
]
Note that we're including every image in each class's test case so that each test case contains a sizable number of true negative images.
See test case best practices for more information on balancing positive and negative examples in your test cases.
In this tutorial we created only a few simple test cases, but more advanced test cases can be generated in a variety of fast and scalable ways. See Creating Test Cases for details.
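To check how balanced a per-class test case is before creating it, it can help to count how many images contain each class and how many do not. The sketch below does this in plain Python over a hypothetical mapping from image to labels; the image names and the class_balance helper are illustrative and are not part of kolena-client.

```python
from collections import Counter

# Hypothetical mapping of image -> labels of the ground truth boxes in that image.
labels_by_image = {
    "img-1": ["person", "dog"],
    "img-2": ["person"],
    "img-3": [],
    "img-4": ["dog", "dog"],
    "img-5": ["person"],
}


def class_balance(labels_by_image):
    """Return {label: (positive_images, negative_images)} over all images."""
    total = len(labels_by_image)
    positives = Counter()
    for labels in labels_by_image.values():
        # set() so an image with two boxes of a class counts as one positive image
        positives.update(set(labels))
    return {label: (count, total - count) for label, count in positives.items()}


balance = class_balance(labels_by_image)
for label, (pos, neg) in sorted(balance.items()):
    print(f"{label}: {pos} positive images, {neg} negative images")
```

With the real dataset, the same counts could be derived from the metadata table by collecting the labels for each locator first.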
Now that we have basic test cases for our entire dataset and for each class within the dataset, let's create a test suite to group them together:
from kolena.detection import TestSuite
test_suite = TestSuite(f"complete {DATASET}", test_cases=[
    complete_test_case, *class_test_cases
])
This test suite represents a basic starting point for testing on Kolena.

Step 2: Running Tests

With basic tests defined for the COCO dataset, we can start testing our models.
To start testing, we create an InferenceModel object describing the model being tested:
from typing import List
from kolena.detection import InferenceModel, TestImage
from kolena.detection.inference import BoundingBox
def infer(test_image: TestImage) -> List[BoundingBox]:
    # stub: run your model on test_image here and return its predicted bounding boxes
    return []
model = InferenceModel("example-model", infer=infer, metadata=dict(
    description="Example model from quickstart tutorial",
))
Finally, let's test:
from kolena.detection import test
test(model, test_suite)
That's it! We can now visit the web platform to analyze and debug our model's performance on this test suite.

Advanced: Custom Metrics

Kolena provides a standard set of metrics that we can use to evaluate models. However, there will be application-specific metrics that are not covered by the standard set. For these scenarios, we can provide a callback function when invoking test to compute custom metrics for each test case.
For example, we can implement a "Mean IOU" metric and pass it to test like so:
from typing import List, Tuple, Optional
import numpy as np
from kolena.detection import CustomMetrics, TestImage
from kolena.detection.inference import BoundingBox
# compute mean IOU between ground truths and inferences
def custom_metrics(
    inferences: List[Tuple[TestImage, Optional[List[BoundingBox]]]]
) -> CustomMetrics:
    # inferences contains all images of a test case and corresponding predictions
    # placeholder IOU computation that always returns 0; replace with a real implementation
    compute_iou = lambda img, bboxes: 0
    return {"Mean IOU": np.mean([compute_iou(i, b) for i, b in inferences])}
test(model, test_suite, custom_metrics_callback=custom_metrics)
The new "Mean IOU" metric is then available in the web platform alongside the standard set of metrics.
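The compute_iou placeholder above always returns 0. As a rough sketch of what a real implementation might compute, the snippet below calculates intersection over union for axis-aligned boxes and averages the best match for each ground truth. It works on plain (min_x, min_y, max_x, max_y) tuples instead of kolena BoundingBox objects, so the Box alias and both helper names are assumptions made for illustration, and best-match averaging is only one of several reasonable definitions of mean IOU.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (min_x, min_y, max_x, max_y)
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def mean_best_iou(ground_truths: List[Box], inferences: List[Box]) -> float:
    """Average, over ground truths, of the best IOU with any inference."""
    if not ground_truths:
        return 0.0
    best = [
        max((box_iou(gt, inf) for inf in inferences), default=0.0)
        for gt in ground_truths
    ]
    return sum(best) / len(best)


# One ground truth matched exactly, one missed entirely.
example = mean_best_iou([(0, 0, 10, 10), (20, 20, 30, 30)], [(0, 0, 10, 10)])
print(example)
```

To use something like this inside custom_metrics, the ground truth and inference boxes for each image would first need to be converted into this tuple form.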


In this quickstart tutorial we learned how to create new tests for object detection datasets and how to test object detection models on Kolena.
What we learned here only scratches the surface of what's possible with Kolena and covers a fraction of the kolena-client API. Now that we're up and running, we can think about ways to create more detailed tests, improve existing tests, and dive deep into model behaviors.