ROC calculation task¶
ROC (or Receiver Operating Characteristic) is a performance measurement for classification problems at various threshold settings. The ROC-curve plots TPR (True Positive Rate) against FPR (False Positive Rate). TPR is the count of true positive match pairs divided by the count of total expected positive match pairs, and FPR is the count of false positive match pairs divided by the count of total expected negative match pairs. Each point (FPR, TPR) of the ROC-curve corresponds to a certain similarity threshold. See Wikipedia for more details.
Using ROC, the model performance is determined by looking at:
the area under the ROC-curve (AUC);
the point where the type I and type II error rates are equal, i.e. the intersection point of the ROC-curve and the secondary main diagonal.
The model performance is also measured by the top-N hit probability, i.e. the probability that a positive match pair falls into the top-N of a match result group sorted by similarity.
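As a minimal illustration of the top-N hit probability, consider the Python sketch below. The function name and the (similarity, is_positive) pair layout are assumptions made for this sketch, not the service implementation.

def hit_top_n_probability(groups, n, threshold=0.0):
    """Share of match result groups whose top-N results (at or above the
    similarity threshold) contain at least one positive match pair."""
    hits = 0
    for group in groups:
        # rank the group's match results by similarity, descending
        ranked = sorted(group, key=lambda pair: pair[0], reverse=True)
        # a hit: a positive pair among the first N results above the threshold
        hits += any(is_positive for similarity, is_positive in ranked[:n]
                    if similarity >= threshold)
    return hits / len(groups) if groups else 0.0

# usage: the positive pair is ranked first in one group and second in the other
groups = [
    [(0.97, True), (0.40, False), (0.35, False)],
    [(0.80, False), (0.78, True), (0.30, False)],
]
print(hit_top_n_probability(groups, n=1, threshold=0.5))  # 0.5
print(hit_top_n_probability(groups, n=2, threshold=0.5))  # 1.0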
Creating a ROC task requires markup. One can optionally specify threshold_hit_top (default 0) to calculate the top-N hit probability, the match limit (default 5), key_FPRs - a list of key FPR values at which to calculate ROC-curve key points, and filters with account_id. An account_id is also required for task creation.
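Purely for orientation, the parameters named above might be combined as follows. The overall structure and nesting are assumptions of this sketch (only the parameter names come from the text above), and all values are placeholders or the stated defaults:

{
    'account_id': '<account_id>',
    'markup': [{'face_id': '<face_id>', 'label': 0}],
    'threshold_hit_top': 0,
    'limit': 5,
    'key_FPRs': [0.001, 0.01, 0.1],
    'filters': {'account_id': '<account_id>'}
}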
ROC calculation process¶
ROC calculation is done in several steps (a sketch of the arithmetic follows the list):
match faces or attributes with each other and get match result groups; match results are cropped (the limit is applied).
prepare match pairs sorted by similarity in descending order.
calculate the top-1, top-2, top-3, top-4 and top-5 hit probabilities (the similarity threshold is applied).
moving along the array of match pairs sorted by similarity in descending order, calculate the ROC-curve points:
calculate the cumulative counts of true positive match pairs (TP) and false positive match pairs (FP).
calculate FPR and TPR:
\[TPR = \frac{TP}{P},\]
\[FPR = \frac{FP}{T - P},\]
where
\(P\) - the count of total expected positive match pairs:
\[P = \sum_{i=0}^{N_{labels}} N_{faces}^{label_i} \left( N_{faces}^{label_i} - 1 \right),\]
\(T\) - the count of total expected match pairs:
\[T = N_{faces} \left( N_{faces} - 1 \right),\]
\(N_{faces}\) - the count of faces in the markup,
\(N_{labels}\) - the count of labels (or group ids) in the markup,
\(N_{faces}^{label_i}\) - the count of faces with the i-th label.
calculate the intersection point of the ROC-curve and the secondary main diagonal, i.e. the point where the type I error rate (1 - TPR) and the type II error rate (FPR) are equal.
calculate AUC.
calculate ROC-curve key points at the specified key FPR values.
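The sketch below restates the arithmetic of the steps above in Python. It is an illustration of the formulas, not the service code: the (similarity, is_positive) pair layout, the trapezoidal AUC, the nearest-point treatment of the equal error rate and the key-point selection rule are all assumptions of this sketch.

def roc_points(pairs, p, t):
    """ROC-curve points (FPR, TPR) from match pairs.

    pairs -- (similarity, is_positive) match pairs,
    p     -- count of total expected positive match pairs,
    t     -- count of total expected match pairs (see the formulas above).
    """
    tp = fp = 0
    points = [(0.0, 0.0)]                      # start the curve at the origin
    for similarity, is_positive in sorted(pairs, key=lambda x: x[0], reverse=True):
        if is_positive:
            tp += 1                            # cumulative true positives
        else:
            fp += 1                            # cumulative false positives
        points.append((fp / (t - p), tp / p))  # (FPR, TPR) at this threshold
    return points

def auc(points):
    """Area under the ROC-curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def equal_error_point(points):
    """Point closest to the secondary main diagonal, where 1 - TPR = FPR."""
    return min(points, key=lambda pt: abs((1.0 - pt[1]) - pt[0]))

def key_points(points, key_fprs):
    """For each key FPR value, the curve point with the largest FPR not exceeding it."""
    return [max((pt for pt in points if pt[0] <= key_fpr), key=lambda pt: pt[0])
            for key_fpr in key_fprs]

# usage: two positive and two negative pairs, perfectly separated by similarity
pairs = [(0.95, True), (0.90, True), (0.60, False), (0.40, False)]
points = roc_points(pairs, p=2, t=4)
print(auc(points))                # 1.0
print(equal_error_point(points))  # (0.0, 1.0)
print(key_points(points, [0.5]))  # [(0.5, 1.0)]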
Markup¶
Markup is expected in the following format:
[{'face_id': <face_id>, 'label': <label>}]
or
[{'attribute_id': <attribute_id>, 'label': <label>}]
Label (or group id) can be a number or any string.
Example:
[{'face_id': '94ae2c69-277a-4e46-817d-543f7d3446e2', 'label': 0},
{'face_id': 'cd6b52be-cdc1-40a8-938b-a97a1f77d196', 'label': 1},
{'face_id': 'cb9bda07-8e95-4d71-98ee-5905a36ec74a', 'label': 2},
{'face_id': '4e5e32bb-113d-4c22-ac7f-8f6b48736378', 'label': 3},
{'face_id': 'c43c0c0f-1368-41c0-b51c-f78a96672900', 'label': 2}]
or
[{'attribute_id': 'd156be44-4196-4124-b1a8-4c6254cb6389', 'label': 0},
{'attribute_id': 'e14635f8-d680-4225-bbfc-c6da8aea916c', 'label': 1},
{'attribute_id': '650cdab4-7e5a-48c4-9f1d-a25d6a4d9837', 'label': 2},
{'attribute_id': '35556ae5-ae26-49ee-9beb-553ada7049bb', 'label': 3},
{'attribute_id': '10053eea-58b3-4613-bd76-192c66ecc036', 'label': 2}]
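As a cross-check against the formulas in the previous section, the expected pair counts P and T can be derived from such a markup. The sketch below is just a restatement of those formulas; the function name is made up for the example:

from collections import Counter

def expected_pair_counts(markup):
    """P (positive) and T (total) expected ordered match pair counts."""
    label_sizes = Counter(item['label'] for item in markup)
    n_faces = len(markup)
    p = sum(n * (n - 1) for n in label_sizes.values())  # pairs sharing a label
    t = n_faces * (n_faces - 1)                         # all ordered pairs
    return p, t

# usage with the labels from the examples above: 0, 1, 2, 3, 2
markup = [{'face_id': '<face_id>', 'label': label} for label in (0, 1, 2, 3, 2)]
print(expected_pair_counts(markup))  # (2, 20)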
Filters¶
There is an account id filter for getting faces from the service:
account_id: account id filter
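For example, assuming the filters are passed as a dictionary in the same style as the markup above (the value is a placeholder):

{'account_id': '<account_id>'}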