Keypoints Detection: detecting a series of keypoints (landmarks), e.g. on a face or a body.
Building a Workflow
Any model can be tested on Kolena by building a workflow that meets its specific requirements. If your use case meets any of the following conditions, we recommend building your own workflow:
Your use case doesn't fit one of the built-in workflows (e.g. object detection)
You use in-house or non-standard evaluation criteria
You use exotic data types
You want to plug in your own data visualizations
You want full control over evaluation and metrics
When building a workflow, you have control over every piece of the problem:
Test Sample: the input to your model. If your model is a standard single-image computer vision model, this might be the locator of an image in an S3 bucket, e.g. s3://my-bucket/image.png.
Any additional inputs required by your model, such as annotations produced by upstream models in your pipeline, are also included as part of the test sample.
Ground Truth: the target against which your model is evaluated.
Inference: the prediction produced by your model. A model is a deterministic transformation from your test sample type into your inference type. Inferences are typically compared against ground truths to produce metrics.
Metrics: measures that describe your model's performance:
Single-test-sample metrics, such as the loss calculated between a ground truth and an inference, the number of false positive predictions, etc.
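To make these pieces concrete, here is a minimal Python sketch for a toy image classifier. The names TestSample, GroundTruth, Inference, SampleMetrics, model, and compute_metrics are hypothetical illustrations of the concepts above, not Kolena SDK classes, and the zero-one loss is an assumed stand-in for whatever per-sample metric your workflow needs.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical types mirroring the workflow concepts; NOT Kolena SDK classes.
@dataclass(frozen=True)
class TestSample:
    locator: str  # e.g. "s3://my-bucket/image.png"
    upstream_annotations: Tuple[str, ...] = ()  # extra inputs from upstream models


@dataclass(frozen=True)
class GroundTruth:
    label: str


@dataclass(frozen=True)
class Inference:
    label: str
    confidence: float


@dataclass(frozen=True)
class SampleMetrics:
    is_correct: bool
    zero_one_loss: float  # assumed per-sample metric: 0.0 if correct, else 1.0


def model(sample: TestSample) -> Inference:
    # Toy stand-in for a real model: a deterministic function of the test sample.
    label = "dog" if "dog" in sample.locator else "cat"
    return Inference(label=label, confidence=0.9)


def compute_metrics(gt: GroundTruth, inf: Inference) -> SampleMetrics:
    # Compare the inference against the ground truth to produce per-sample metrics.
    correct = gt.label == inf.label
    return SampleMetrics(is_correct=correct, zero_one_loss=0.0 if correct else 1.0)


sample = TestSample(locator="s3://my-bucket/dog_001.png")
metrics = compute_metrics(GroundTruth(label="dog"), model(sample))
print(metrics)  # SampleMetrics(is_correct=True, zero_one_loss=0.0)
```

The model function only reads the test sample, which matches the description of a model as a deterministic transformation from test sample to inference.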
Built-in Workflow: Object Detection
Object detection example from the Common Objects in Context (COCO) dataset.
Object detection models attempt to localize and classify objects in an image. Kolena supports single-class and multi-class object detection models that identify objects with either rectangular (object detection) or arbitrary (instance segmentation) geometry.
Instance Segmentation and Object Detection are functionally equivalent, differing only in the geometry of the detected object. For brevity, this documentation discusses object detection only.
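Because the two tasks differ only in geometry, one way to model them is a single detection type whose geometry is either a rectangle or a polygon. The sketch below is a hypothetical illustration (BoundingBox, Polygon, and Detection are not Kolena classes); it assumes polygons are simple, non-self-intersecting point sequences and computes their area with the shoelace formula.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Point = Tuple[float, float]


@dataclass(frozen=True)
class BoundingBox:
    # Rectangular geometry (object detection).
    top_left: Point
    bottom_right: Point

    def area(self) -> float:
        (x1, y1), (x2, y2) = self.top_left, self.bottom_right
        return max(0.0, x2 - x1) * max(0.0, y2 - y1)


@dataclass(frozen=True)
class Polygon:
    # Arbitrary geometry (instance segmentation), assumed simple (non-self-intersecting).
    points: Tuple[Point, ...]

    def area(self) -> float:
        # Shoelace formula.
        n = len(self.points)
        twice_area = 0.0
        for i in range(n):
            x1, y1 = self.points[i]
            x2, y2 = self.points[(i + 1) % n]
            twice_area += x1 * y2 - x2 * y1
        return abs(twice_area) / 2.0


@dataclass(frozen=True)
class Detection:
    # Same label/score fields for both tasks; only the geometry type differs.
    label: str
    score: float
    geometry: Union[BoundingBox, Polygon]


box_det = Detection(label="cat", score=0.8, geometry=BoundingBox((0.0, 0.0), (4.0, 3.0)))
poly_det = Detection(
    label="cat",
    score=0.8,
    geometry=Polygon(((0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0))),
)
print(box_det.geometry.area(), poly_det.geometry.area())  # 12.0 12.0
```

Evaluation code written against Detection can treat both tasks alike and dispatch only on the geometry type, for example when computing overlap between a prediction and a ground truth.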
Built-in Workflow: Classification
"Dog" classification example from the Dogs vs. Cats dataset.
Kolena supports the following types of classification models:
Classification model predicts a single class, using a threshold on prediction confidence to bisect the test set
Classification model predicts multiple classes, where only the class with the highest confidence is treated as a positive prediction
Classification model predicts multiple classes, with each prediction over a threshold considered positive (i.e. ensemble of binary classifiers)
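The three modes differ only in how confidence scores become positive predictions. A minimal sketch, assuming scores are plain floats keyed by class name and that a score exactly equal to the threshold counts as positive (the tie rule is an assumption); the function names are hypothetical.

```python
from typing import Dict, List


def single_class_prediction(confidence: float, threshold: float) -> bool:
    # Single class: the threshold bisects samples into positive and negative.
    # Ties are treated as positive (an assumption).
    return confidence >= threshold


def top1_prediction(scores: Dict[str, float]) -> str:
    # Multiple classes: only the highest-confidence class is positive.
    return max(scores, key=lambda label: scores[label])


def multilabel_predictions(scores: Dict[str, float], threshold: float) -> List[str]:
    # Multiple classes, each thresholded independently (an ensemble of
    # binary classifiers). Ties are treated as positive (an assumption).
    return sorted(label for label, score in scores.items() if score >= threshold)


scores = {"dog": 0.7, "cat": 0.6, "bird": 0.1}
print(single_class_prediction(scores["dog"], threshold=0.5))  # True
print(top1_prediction(scores))                                # dog
print(multilabel_predictions(scores, threshold=0.5))          # ['cat', 'dog']
```

With the same scores, the top-1 mode yields one positive ("dog") while the multi-label mode yields two, which is why the choice of mode changes the resulting metrics.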
Built-in Workflow: Face Recognition (1:1)
Example Face Recognition (1:1) image pair from the Labeled Faces in the Wild dataset.
The Face Recognition (1:1) workflow is built to test models that answer the question: do these two images depict the same person? The terminology and methodology are adapted from the NIST FRVT 1:1 challenge.
The Face Recognition (1:1) workflow is also referred to as Face Verification.
Embedding: the embedding extracted by your model representing the face depicted in an image
Similarity Score: the similarity score computed from two embeddings extracted by your model, where higher = more similar, lower = less similar
Threshold: a threshold applied to computed similarity scores, above which two faces are considered the same
Genuine Pair: two images depicting the same person
Imposter Pair: two images depicting different people
False Match (FM): an incorrect model classification of an imposter pair as a genuine pair
False Match Rate (FMR): the percentage of imposter pairs that are incorrectly classified as genuine pairs (i.e. similarity is above the threshold)
False Non-Match (FNM): an incorrect model classification of a genuine pair as an imposter pair
False Non-Match Rate (FNMR): the percentage of genuine pairs that are incorrectly classified as imposter pairs (i.e. similarity is at or below the threshold)
Baseline: the test case(s) against which similarity score thresholds are computed (see below)
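These terms fit together as follows. This sketch assumes cosine similarity as the similarity function (your model may use another; the workflow only requires that higher means more similar) and follows the definitions above: a pair counts as a match only when its similarity is strictly above the threshold. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed similarity function between two embeddings; higher = more similar.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def fmr_fnmr(
    genuine: Sequence[float], imposter: Sequence[float], threshold: float
) -> Tuple[float, float]:
    # False match: an imposter pair whose similarity is above the threshold.
    fmr = sum(1 for s in imposter if s > threshold) / len(imposter)
    # False non-match: a genuine pair whose similarity is not above the threshold.
    fnmr = sum(1 for s in genuine if s <= threshold) / len(genuine)
    return fmr, fnmr


print(cosine_similarity([3.0, 4.0], [4.0, 3.0]))  # 0.96
genuine = [0.9, 0.8, 0.4, 0.75]         # made-up similarity scores of genuine pairs
imposter = [0.1, 0.3, 0.65, 0.2, 0.05]  # made-up similarity scores of imposter pairs
print(fmr_fnmr(genuine, imposter, threshold=0.6))  # (0.2, 0.25)
```

Raising the threshold trades false matches for false non-matches: fewer imposter pairs clear it, but more genuine pairs fall below it.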
Test Suite Baseline
In the Face Recognition (1:1) workflow, similarity score thresholds are computed based on target false match rates over a special section of your test suite known as the baseline.
In the simplest case, the baseline is the entire test suite. However, having control over the baseline allows you to define test suites that answer questions like:
How does my model perform on a certain population when the thresholds are computed using a different population (i.e. FMR/FNMR shift)?
How does my model perform in different deployment conditions (e.g. low lighting, infrared) when using thresholds computed from standard conditions? (i.e. can my model generalize to unseen scenarios?)
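Computing a threshold from a target FMR on the baseline, then reusing it on another test case, can be sketched as below. This is one plausible approach, not necessarily Kolena's: it assumes the threshold is the smallest observed baseline imposter score for which the fraction of baseline imposter scores strictly above it is at most the target FMR, with no interpolation. The function names, the scores, and the "low-light" test case are made-up illustrations.

```python
import bisect
from typing import Sequence


def threshold_at_fmr(baseline_imposter: Sequence[float], target_fmr: float) -> float:
    # Smallest observed baseline imposter score t such that the fraction of
    # baseline imposter scores strictly above t is <= target_fmr.
    scores = sorted(baseline_imposter)
    if not scores:
        raise ValueError("baseline must contain at least one imposter pair")
    n = len(scores)
    for t in scores:
        above = n - bisect.bisect_right(scores, t)
        if above / n <= target_fmr:
            return t
    return scores[-1]  # only reached if target_fmr < 0


def fmr_at(imposter: Sequence[float], threshold: float) -> float:
    # Fraction of imposter pairs falsely matched (similarity above threshold).
    return sum(1 for s in imposter if s > threshold) / len(imposter)


# Illustrative scores: baseline in standard conditions vs. a low-light test case.
baseline_imposter = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 0.7]
low_light_imposter = [0.2, 0.55, 0.6, 0.3, 0.52]

t = threshold_at_fmr(baseline_imposter, target_fmr=0.1)
print(t)                              # 0.5
print(fmr_at(baseline_imposter, t))   # 0.1
print(fmr_at(low_light_imposter, t))  # 0.6
```

In this made-up data, the threshold meets the 0.1 target on the baseline but produces an FMR of 0.6 on the low-light case: the kind of FMR shift the questions above are meant to surface.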
The Face Recognition (1:1) workflow is built to accommodate standard face recognition model pipelines with the following steps:
Built-in Workflow: Keypoints Detection
Keypoints Detection models attempt to identify a series of keypoints, or landmarks, such as parts of a face or body. Kolena supports Keypoints Detection for an ordered set of keypoints on single-instance images.
The Keypoints Detection workflow is also referred to as Landmark Detection.
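Because the keypoints are ordered and each image contains a single instance, the i-th predicted keypoint can be compared directly with the i-th ground-truth keypoint, with no matching step. The sketch below assumes a common landmark metric, normalized mean error (NME), with the inter-ocular distance as the normalizer; neither is confirmed here as what Kolena computes, and the function names and coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def keypoint_errors(gt: Sequence[Point], pred: Sequence[Point]) -> List[float]:
    # Ordered keypoints: the i-th prediction is paired with the i-th ground truth.
    if len(gt) != len(pred):
        raise ValueError("ground truth and prediction must have the same number of keypoints")
    return [math.dist(g, p) for g, p in zip(gt, pred)]


def normalized_mean_error(
    gt: Sequence[Point], pred: Sequence[Point], norm_distance: float
) -> float:
    # Mean Euclidean error divided by a normalizing distance (assumed: inter-ocular).
    errors = keypoint_errors(gt, pred)
    return sum(errors) / (len(errors) * norm_distance)


gt = [(30.0, 40.0), (70.0, 40.0), (50.0, 60.0)]    # left eye, right eye, nose tip
pred = [(33.0, 44.0), (70.0, 40.0), (50.0, 62.0)]
print(keypoint_errors(gt, pred))                    # [5.0, 0.0, 2.0]
inter_ocular = math.dist(gt[0], gt[1])              # 40.0
print(round(normalized_mean_error(gt, pred, inter_ocular), 4))  # 0.0583
```

Normalizing by a face-relative distance makes errors comparable across images where faces appear at different scales.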