# Metrics Glossary
This section contains guides for different metrics used to measure model performance.
Each ML use case requires different metrics. Using the right metrics is critical for understanding and meaningfully comparing model performance. In each metrics guide, you can learn about the metric with examples, its limitations and biases, and its intended uses.
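To make the definitions concrete, a few minimal Python sketches after the list below illustrate how several of these metrics can be computed.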
- Accuracy measures the proportion of predictions a model gets correct. It's a good metric for assessing model performance in simple cases with balanced data.
- Average precision summarizes a precision-recall (PR) curve into a single threshold-independent value representing a model's performance across all thresholds.
- Averaging methods (macro, micro, weighted) are different ways of aggregating metrics for multiclass workflows, such as classification and object detection.
- A confusion matrix describes classification model performance as a table of counts, with predicted classes as columns and actual classes as rows, indicating how confused a model is.
- F1-score combines two competing metrics, precision and recall, with equal weight, representing both symmetrically in a single metric.
- Geometry matching is the process of matching inferences to ground truths for computer vision workflows with a localization component. It is a core building block for metrics such as TP, FP, and FN, and any metrics built on top of these, like precision, recall, and F1-score.
- IoU measures the overlap between two geometries, segmentation masks, sets of labels, or time-series snippets. It is also known as the Jaccard index in classification workflows.
- Precision measures the proportion of positive inferences from a model that are correct. It is useful when the objective is to measure and reduce false positive inferences.
- A precision-recall (PR) curve is a plot that gauges machine learning model performance using precision and recall, with precision on the y-axis and recall on the x-axis computed across many thresholds.
- Recall, also known as true positive rate (TPR) or sensitivity, measures the proportion of all positive ground truths that a model correctly predicts. It is useful when the objective is to measure and reduce false negative ground truths, i.e., model misses.
- The counts of TP, FP, FN, and TN ground truths and inferences are essential for summarizing model performance. They are the building blocks of many other metrics, including accuracy, precision, and recall.
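As a concrete starting point, here is a minimal sketch, using hypothetical `y_true` and `y_pred` labels, of how the TP, FP, FN, and TN counts are tallied for a binary problem and how accuracy, precision, recall, and F1-score are derived from them:

```python
# Building-block counts and derived metrics, assuming binary labels
# where 1 is the positive class. Variable names are illustrative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # ground truths
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model inferences

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```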
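A confusion matrix can be built the same way by counting (actual, predicted) pairs; the class labels below are hypothetical:

```python
# Build a multiclass confusion matrix with actual classes as rows
# and predicted classes as columns.
labels = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "dog", "bird", "dog", "cat"]
y_pred = ["cat", "dog", "dog", "bird", "cat", "cat"]

index = {label: i for i, label in enumerate(labels)}
matrix = [[0] * len(labels) for _ in labels]
for t, p in zip(y_true, y_pred):
    matrix[index[t]][index[p]] += 1  # row = actual, column = predicted

for label, row in zip(labels, matrix):
    print(f"{label:>5}: {row}")
```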
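For multiclass workflows, the averaging methods differ only in where the aggregation happens. This sketch, with made-up per-class counts, contrasts macro, micro, and weighted averaging of per-class precision:

```python
# Per-class (TP, FP, support) counts, where support is the number of
# ground truths in that class. The numbers are hypothetical.
counts = {"cat": (40, 10, 50), "dog": (30, 30, 40), "bird": (5, 5, 10)}

per_class = {c: tp / (tp + fp) for c, (tp, fp, _) in counts.items()}

# macro: unweighted mean of per-class scores -- every class counts equally
macro = sum(per_class.values()) / len(per_class)

# micro: pool the counts, then compute the metric once -- every sample counts equally
total_tp = sum(tp for tp, _, _ in counts.values())
total_fp = sum(fp for _, fp, _ in counts.values())
micro = total_tp / (total_tp + total_fp)

# weighted: mean of per-class scores, weighted by class support
total_support = sum(s for _, _, s in counts.values())
weighted = sum(per_class[c] * s for c, (_, _, s) in counts.items()) / total_support

print(f"macro={macro:.3f} micro={micro:.3f} weighted={weighted:.3f}")
```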
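For localization workflows, IoU and geometry matching can be sketched as follows, assuming axis-aligned boxes in (x1, y1, x2, y2) form, a hypothetical 0.5 IoU threshold, and a simple greedy matching strategy (real matchers vary in strategy and tie-breaking):

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    intersection = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0

def greedy_match(inferences, ground_truths, threshold=0.5):
    """Match inferences (sorted by confidence, highest first) to ground truths."""
    matched_gt = set()
    tp, fp = 0, 0
    for inf in inferences:
        best_iou, best_gt = 0.0, None
        for i, gt in enumerate(ground_truths):
            if i in matched_gt:
                continue
            iou = box_iou(inf, gt)
            if iou > best_iou:
                best_iou, best_gt = iou, i
        if best_gt is not None and best_iou >= threshold:
            matched_gt.add(best_gt)
            tp += 1  # matched inference
        else:
            fp += 1  # unmatched inference
    fn = len(ground_truths) - len(matched_gt)  # unmatched ground truths
    return tp, fp, fn

ground_truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
inferences = [(1, 1, 11, 11), (50, 50, 60, 60)]  # already sorted by confidence
print(greedy_match(inferences, ground_truths))   # -> (1, 1, 1)
```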
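Finally, a PR curve and average precision can be sketched by sweeping the confidence threshold over the sorted scores; the scores and labels below are illustrative, and real AP implementations differ in interpolation details:

```python
# Trace the PR curve by lowering the threshold one inference at a time,
# accumulating the area under the curve as average precision.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]  # confidences, sorted descending
labels = [1, 1, 0, 1, 0, 1]              # 1 = positive ground truth
total_positives = sum(labels)

tp = fp = 0
prev_recall, average_precision = 0.0, 0.0
curve = []
for label in labels:  # each step admits one more inference as positive
    tp += label
    fp += 1 - label
    precision = tp / (tp + fp)
    recall = tp / total_positives
    curve.append((recall, precision))
    average_precision += (recall - prev_recall) * precision
    prev_recall = recall

print(curve)
print(f"AP = {average_precision:.3f}")
```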