Descriptor Processing Facility#
Overview#
This section describes descriptors and the processes and objects related to them.
A descriptor is a set of specially encoded object parameters. Descriptors are typically more or less invariant to affine transformations of the object and to slight color variations. This property makes such sets efficient for identifying, looking up, and comparing images of real-world objects.
To obtain a descriptor, you perform a special operation called descriptor extraction.
The typical use case is to compare two descriptors and compute their similarity score. This way you can identify people by comparing their descriptors with the descriptors in your database.
All descriptor comparison operations are called matching. The result of matching two descriptors is a distance between the components of the corresponding parameter sets. From the magnitude of this distance, we can tell whether two objects are presumably the same.
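As a rough illustration of how a distance turns into a "same object" decision, the sketch below treats descriptors as plain float vectors and compares their Euclidean distance to a threshold. The vector representation, the names `ToyDescriptor`, `descriptorDistance`, and `presumablySame`, and the threshold are assumptions made for this example; they are not the SDK's descriptor format or matching API.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a descriptor: a plain vector of floats.
using ToyDescriptor = std::vector<float>;

// Euclidean (L2) distance between two toy descriptors of equal length.
float descriptorDistance(const ToyDescriptor& a, const ToyDescriptor& b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("descriptors must have the same length");
    }
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Two objects are presumed the same when the distance is small enough.
// The threshold is application-specific and is chosen by the caller.
bool presumablySame(const ToyDescriptor& a, const ToyDescriptor& b, float maxDistance) {
    return descriptorDistance(a, b) <= maxDistance;
}
```

In practice the threshold would be tuned on real data for the descriptor version in use.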
There are two different tasks solved using descriptors: person identification and person reidentification.
Person Identification Task#
Facial recognition is the task of identifying a face in a photo or video image against a pre-existing database of faces. It begins with detection, that is, distinguishing human faces from other objects in the image, and then proceeds to identify the detected faces. To solve this problem, we use a face descriptor, which is extracted from an image of a person's face. A person's face is assumed to remain invariable throughout their life.
In the case of the face descriptor, extraction is performed from image areas around previously detected facial landmarks, so the quality of the descriptor depends heavily on the landmarks and on the image it was obtained from.
The process of face recognition consists of 4 main stages:
- face detection in an image;
- warping of the detected face: compensation of affine angles and centering of the face;
- descriptor extraction;
- comparison of extracted descriptors (matching).
Additionally, you can extract face features (gender, age, emotions, etc.) or image attributes (light, dark, blur, specularity, illumination, etc.).
Person Reidentification Task#
Note! This functionality is experimental.
Person reidentification enables you to detect a person who appears on different cameras. For example, it is used when you need to track a person who appears on different cameras in a supermarket. Reidentification can be used, among other things, for:
- building heat maps of human traffic;
- analyzing visitor movement across a camera network;
- tracking visitors across a camera network;
- searching for a person across a camera network when the face was not captured (e.g. across CCTV cameras in a city).
For reidentification purposes, we use so-called human descriptors. The extraction of the human descriptor is performed using the detected area with a person's body on an image or video frame. The descriptor is a unique data set formed based on a person's appearance. Descriptors extracted for the same person in different clothes will be significantly different.
The face descriptor and the human descriptor are almost the same from the technical point of view, but they solve fundamentally different tasks.
The process of reidentification consists of the following stages:
- human detection in an image;
- warping of the detected human: centering and cropping of the human body;
- descriptor extraction;
- comparison of extracted descriptors (matching).
The human descriptor does not support the descriptor score at all. The returned value of the descriptor score is always equal to 1.0.
The human descriptor is based on the following criteria:
- clothes (type and color);
- shoes;
- accessories;
- hairstyle;
- body type;
- anthropometric parameters of the body.
Note. The human reidentification algorithm is trained to work with input data that meets the following requirements:
- input images should be in R8G8B8 format (the algorithm works worse in night mode);
- the smaller side of the input crop should be greater than 60 px;
- within a single crop, one person should occupy more than 80% of the area (sometimes several persons fit into the same frame).
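The size and occupancy requirements above can be checked before extraction. The following is a minimal sketch; the `CropInfo` structure and the `meetsReidRequirements` function are hypothetical names introduced for this example, and the caller is assumed to know what share of the crop one person occupies. The pixel format requirement is not checked here.

```cpp
#include <algorithm>

// Hypothetical description of a body crop; not an SDK type.
struct CropInfo {
    int width;           // crop width in pixels
    int height;          // crop height in pixels
    double personShare;  // fraction of the crop occupied by one person, 0..1
};

// Returns true when the smaller side exceeds 60 px and one person
// occupies more than 80% of the crop.
bool meetsReidRequirements(const CropInfo& crop) {
    const int smallerSide = std::min(crop.width, crop.height);
    return smallerSide > 60 && crop.personShare > 0.8;
}
```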
Descriptor#
A descriptor object stores a compact set of packed properties as well as some helper parameters that were used to extract these properties from the source image. Together these parameters determine descriptor compatibility. Not all descriptors are compatible with each other. It is impossible to batch and match incompatible descriptors, so pay attention to the settings you use when extracting them. Refer to section "Descriptor extraction" for more information on descriptor extraction.
Descriptor Versions#
The face descriptor algorithm evolves over time, so newer FaceEngine versions contain improved models of the algorithm.
Descriptors of different versions are incompatible! This means that you cannot match descriptors with different versions. This does not apply to base and mobilenet versions of the same model: they are compatible.
See chapter "Appendix A. Specifications" for details about performance and precision of different descriptor versions.
Descriptor version 62 is the most precise. It also works well with personal protective equipment on the face, such as a medical mask.
The descriptor version may be specified in the configuration file (see section "Configuration data" in chapter "Core facility").
Face descriptor#
Currently the following versions are available: 58, 59, 60, 62. Descriptors have backend and mobilenet implementations. Versions 58 and 62 support only the backend implementation. Backend versions are more precise, but mobilenet versions are faster and have smaller model files. See Appendix A.1 and A.2 for details about performance and precision of different descriptor versions.
Human descriptor#
The following human descriptor versions are available: 102, 103, 104, 105, 106, 107, 108, 109, 110, 112, 113, 115, 116.
Versions 102, 103, 104 are deprecated.
To create a human descriptor, human batch, human descriptor extractor, or human descriptor matcher, you must pass the human descriptor version:
- DV_MIN_HUMAN_DESCRIPTOR_VERSION = 102 or
- HDV_TRACKER_HUMAN_DESCRIPTOR_VERSION = 102, //!< Deprecated. human descriptor for tracking of people on one camera, light and fast version
- HDV_PRECISE_HUMAN_DESCRIPTOR_VERSION = 103, //!< Deprecated. precise human descriptor, heavy and slow
- HDV_REGULAR_HUMAN_DESCRIPTOR_VERSION = 104, //!< Deprecated. regular human descriptor, use it by default for multi-cameras tracking
- HDV_TRACKER_V2 = 105, //!< human descriptor for tracking of people, light and fast version.
- HDV_PRECISE_V2 = 106, //!< precise human descriptor, heavy and slow.
- HDV_REGULAR_V2 = 107, //!< regular human descriptor.
- HDV_TRACKER_V3 = 108, //!< human descriptor for tracking of people, light and fast version.
- HDV_PRECISE_V3 = 109, //!< precise human descriptor, heavy and slow.
- HDV_REGULAR_V3 = 110, //!< regular human descriptor.
- HDV_PRECISE_V4 = 112, //!< precise human descriptor, heavy and slow.
- HDV_REGULAR_V4 = 113, //!< regular human descriptor.
- HDV_PRECISE_V5 = 115, //!< precise human descriptor, heavy and slow.
- HDV_REGULAR_V5 = 116 //!< regular human descriptor.
Descriptor Batch#
When matching significant amounts of descriptors, it is desirable that they reside contiguously in memory for performance reasons (think cache-friendly data locality and coherence). This is where descriptor batches come into play. While descriptors are optimized for faster creation and destruction, batches are optimized for long life and better descriptor data representation for the hardware.
A batch is created by the factory like any other object. Aside from the type, the size of the batch must be specified. The size is the memory reservation this batch makes for its data. It is impossible to add more data than specified by this reservation.
Next, the batch must be populated with data. You have the following options:
- add an existing descriptor to the batch;
- load batch contents from an archive.
The following notes should be kept in mind:
- When adding an existing descriptor, its data is copied into the batch. This means that the descriptor object may be safely released.
- When adding the first descriptor to an empty batch, initial memory allocation occurs. Before that moment the batch does not allocate. At the same moment, internal descriptor helper parameters are copied into the batch (if there are any). This effectively determines which descriptors the batch is compatible with. Once the batch is initialized, it does not accept incompatible descriptors.
After initialization, a batch may be matched pretty much the same way as a simple descriptor.
Like any other data storage object, a descriptor batch implements the ::clear() method. This method returns the batch to a non-initialized state but does not deallocate memory. In other words, the batch capacity stays the same, and no memory is reallocated. However, the actual number of descriptors in the batch and their parameters are reset. This allows re-populating the batch.
Memory deallocation takes place when a batch is released.
Care should be taken when serializing and deserializing batches. When a batch is created, it is assigned a fixed-size memory buffer. The size of the buffer is embedded into the batch BLOB when it is saved. So, when allocating a batch object to read the BLOB into, make sure its size is at least the size of the batch that was saved to the BLOB (even if that batch was not full at the moment). Otherwise, loading fails. Naturally, it is okay to deserialize a smaller batch into a larger one this way.
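The batch life cycle described in this section can be modeled with a small toy class. This is only a sketch of the described behavior, not the SDK batch interface: `VersionedDescriptor`, `ToyBatch`, and all of their members are hypothetical, and a single integer version stands in for the helper parameters that determine compatibility.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical descriptor: a version tag plus payload. Not an SDK type.
struct VersionedDescriptor {
    int version;
    std::vector<float> data;
};

// Toy model of a fixed-capacity batch. Not the SDK batch interface.
class ToyBatch {
public:
    explicit ToyBatch(std::size_t capacity) : capacity_(capacity) {}

    // Copies the descriptor into the batch; the first add fixes the version.
    bool add(const VersionedDescriptor& d) {
        if (items_.size() >= capacity_) return false;                // reservation exhausted
        if (!items_.empty() && d.version != version_) return false;  // incompatible descriptor
        if (items_.empty()) {
            items_.reserve(capacity_);  // initial allocation happens on the first add
            version_ = d.version;
        }
        items_.push_back(d);
        return true;
    }

    // Resets the count and parameters but keeps the reserved memory.
    void clear() {
        items_.clear();
        version_ = 0;
    }

    // Loads saved contents; fails if the saved batch had a larger capacity.
    bool loadFrom(const ToyBatch& saved) {
        if (saved.capacity_ > capacity_) return false;
        items_ = saved.items_;
        version_ = saved.version_;
        return true;
    }

    std::size_t count() const { return items_.size(); }
    std::size_t capacity() const { return capacity_; }

private:
    std::size_t capacity_;
    int version_ = 0;
    std::vector<VersionedDescriptor> items_;
};
```

Note that `loadFrom` compares capacities, not the number of stored descriptors, which mirrors the rule that the receiving batch must be at least as large as the saved one even if the saved one was not full.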
Descriptor Extraction#
The descriptor extractor is the entity responsible for descriptor extraction. Like any other object, it is created by the factory. To extract a descriptor, aside from the source image, you need:
- a face detection area inside the image (see chapter "Detection facility");
- a pre-allocated descriptor (see section "Descriptor");
- pre-computed landmarks (see chapter "Image warping").
The extractor is represented by the straightforward IDescriptorExtractor interface and its extract() method. Note that the descriptor object must be created by calling an appropriate factory method before extract() is called.
Landmarks are a set of coordinates of object points of interest, which in turn determine the source image areas that the descriptor is extracted from. This allows extracting only the data that matters most for a particular type of object. For example, for a human face we would want to know at least the definitive properties of the eyes, nose, and mouth to be able to compare it to another face. Thus, we should first invoke a feature extractor to locate the eyes, nose, and mouth and put their coordinates into landmarks. Then the descriptor extractor takes those coordinates and builds a descriptor around them.
Descriptor extraction is one of the most computation-heavy operations. For this reason, threading might be considered. Be aware that descriptor extraction is not thread-safe, so you have to create a separate extractor object for each worker thread.
Note that the face detection area and the landmarks are required only for image warping, the preparation stage for descriptor extraction (see chapter "Image warping"). If the source image is already warped, these parameters can be skipped. For that purpose, the IDescriptorExtractor interface provides a special extractFromWarpedImage() method.
Descriptor extraction implementation supports execution on GPUs.
The IDescriptorExtractor interface provides the extractFromWarpedImageBatch() method, which allows you to extract a batch of descriptors from an image array in one call. This method achieves higher GPU utilization and better performance (see the "GPU mode performance" table in appendix A chapter "Specifications").
IDescriptorExtractor also returns a descriptor score for each extracted descriptor. The descriptor score is a normalized value in the range [0,1], where 1 means there is a face in the warp and 0 means there is no face in the warp. This value allows you to filter out descriptors extracted from false positive detections.
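Filtering by descriptor score can be sketched as follows. The `ScoredDescriptor` structure, the `filterByScore` function, and the cut-off value are hypothetical and serve only to illustrate the idea; how scores are actually returned depends on the extractor method used.

```cpp
#include <vector>

// Hypothetical pairing of an extraction result with its score; not an SDK type.
struct ScoredDescriptor {
    int id;       // caller-side identifier of the source warp
    float score;  // descriptor score in [0, 1]
};

// Keeps only the results whose score reaches minScore.
std::vector<ScoredDescriptor> filterByScore(const std::vector<ScoredDescriptor>& in,
                                            float minScore) {
    std::vector<ScoredDescriptor> out;
    for (const ScoredDescriptor& d : in) {
        if (d.score >= minScore) out.push_back(d);
    }
    return out;
}
```

A suitable `minScore` is an application decision: a higher value rejects more false positive detections at the cost of discarding some valid faces.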
The IDescriptorExtractor interface provides the extractFromWarpedImageBatchAsync() method, which allows you to extract a batch of descriptors from an image array asynchronously in one call. This method achieves higher GPU utilization and better performance (see the "GPU mode performance" table in appendix A chapter "Specifications").
Note: Method extractFromWarpedImageBatchAsync() is experimental, and its interface may change in the future.
Note: Method extractFromWarpedImageBatchAsync() is not marked as noexcept and may throw an exception.
Descriptor Matching#
It is possible to match a pair (or more) of previously extracted descriptors to find out their similarity. With this information, it is possible to implement face search and other analysis applications.
Using the match function defined by the IDescriptorMatcher interface, you can match a pair of descriptors with each other or a single descriptor with a descriptor batch (see section "Descriptor batch" for details on batches).
A simple rule to help you decide which storage to opt for:
- when searching among fewer than a hundred descriptors, use separate IDescriptor objects;
- when searching among a larger number of descriptors, use a batch.
When working with big data, a common practice is to organize descriptors in several batches, keeping a batch per worker thread for processing.
Be aware that descriptor matching is not thread-safe, so you have to create a separate matcher object for each worker thread.
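To illustrate why a batch suits 1:N search, the sketch below scans descriptors stored back to back in one contiguous array and returns the closest one. It is a brute-force toy, not the IDescriptorMatcher implementation: the flat `std::vector<float>` layout, the `BestMatch` structure, and the `findClosest` function are assumptions made for this example.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical result of a 1:N search; not an SDK type.
struct BestMatch {
    std::size_t index;  // position of the closest descriptor in the batch
    float distance2;    // squared Euclidean distance to it
};

// Scans a batch stored contiguously as count * dim floats and returns the
// entry closest to the query. Assumes dim > 0, query.size() == dim, and
// batch.size() is a multiple of dim.
BestMatch findClosest(const std::vector<float>& batch, std::size_t dim,
                      const std::vector<float>& query) {
    BestMatch best{0, std::numeric_limits<float>::max()};
    const std::size_t count = batch.size() / dim;
    for (std::size_t i = 0; i < count; ++i) {
        const float* row = batch.data() + i * dim;  // rows are adjacent in memory
        float sum = 0.0f;
        for (std::size_t k = 0; k < dim; ++k) {
            const float d = row[k] - query[k];
            sum += d * d;
        }
        if (sum < best.distance2) best = BestMatch{i, sum};
    }
    return best;
}
```

With several worker threads, each thread could run such a scan over its own batch and the per-thread results could then be merged.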
Descriptor Indexing#
Using HNSW#
To accelerate a descriptor matching process, you can create a special index for a descriptor batch. With the index, matching becomes a two-stage process:
First stage: build an indexed data structure (an index) by using IIndexBuilder. This is quite a slow process, so it is not supposed to be done frequently.
To build it, you can:
- Append the `IDescriptor` or `IDescriptorBatch` objects
- Use the `IIndexBuilder::buildIndex` build method
Second stage: use the index to quickly search the nearest neighbors for passed descriptors.
There are two types of indexes:
- IDenseIndex: read-only. It loads faster than IDynamicIndex. Once loaded, there are no differences in search performance between the two index types.
- IDynamicIndex: editable. It allows you to append and remove descriptors. If you remove descriptors, they are removed from the graph for searching.
To save an IDynamicIndex with removed descriptors, first call eraseRemovedDescriptors on the IDynamicIndex structure. The state of the stored dynamic search index is not guaranteed for implementation reasons. If the descriptors are successfully erased, the remaining IDs move up. The shift depends on the number of removed handles. If the index state after erasing is valid, you can continue to use it for searching; otherwise, you have to rebuild it.
> Important: We recommend avoiding operations that remove descriptors. Instead, rebuild the index by calling IIndexBuilder::buildIndex from a new set of descriptors and save the result as the dynamic index one more time.
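The ID shift after erasing can be pictured with the toy function below. It assumes that erasing compacts the ID space while preserving order, so each remaining ID decreases by the number of removed IDs that preceded it. This compaction rule is an assumption made for the sketch, and `remapIdsAfterErase` is a hypothetical helper; check the actual behavior against the SDK before relying on it.

```cpp
#include <cstddef>
#include <vector>

// For each original ID, returns its new ID after compaction, or -1 if the
// descriptor was removed. removed[i] == true means original ID i was removed.
std::vector<long> remapIdsAfterErase(const std::vector<bool>& removed) {
    std::vector<long> newIds(removed.size(), -1);
    long next = 0;
    for (std::size_t i = 0; i < removed.size(); ++i) {
        if (!removed[i]) newIds[i] = next++;  // shift equals removed IDs seen so far
    }
    return newIds;
}
```

If your application stores descriptor IDs externally, a mapping like this would have to be applied to those stored IDs after erasing.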
You can only build a dynamic index. To get a dense index, you need to obtain it through deserialization. If you have several processes that might need to search in the index, do one of the following:
- Build an index for every process separately.
> Warning: Building an index is a slow process.
- Build an index once and serialize it to a file.
Index serialization#
To serialize an index, use the IDynamicIndex::saveToDenseIndex or IDynamicIndex::saveToDynamicIndex methods.
To deserialize an index, use the IFaceEngine::loadDenseIndex or IFaceEngine::loadDynamicIndex methods.
Important notes:
- Index files are not cross-platform. If you serialize an index on some platform, it is only usable on that exact platform. A different operating system or a different CPU architecture may break compatibility.
- Embedded and 32-bit desktop platforms do not support the HNSW index.
- After large index files are loaded into RAM, the first lookup may take additional time due to process allocations. We recommend that you perform an idle search of descriptors to warm up.
Dynamic index evaluation scheme. This feature is experimental. Backward compatibility is not guaranteed.#
In LUNA SDK v.5.17.0 and later, you can remove descriptors from a dynamic index in amounts of up to 80-90% of the total count. Deleting descriptors affects the internal structure of the index, and the effect accumulates as the number of removed descriptors increases. For this reason, you must assess the index state.
Simple rules#
- Call isValidForSearch after every 10% of deletions, counted from the original number of descriptors.
- Call evaluate after removing 60% of the descriptors and after every 10% of deletions from then on.
- Rebuilding the index is mandatory if you get DIS_INVALID.
- Rebuilding the index is recommended if your index coefficient values are less than the ones in the table below (searchForEvaluation = 20):
Index size | Value |
---|---|
10M | 0.5 |
20M | 0.4 |
30M | 0.4 |
40M | 0.35 |
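The rules and the table above can be turned into a small decision helper. The sketch below is one possible reading of them: `DueChecks`, `dueChecks`, `recommendedMinCoefficient`, and `rebuildRecommended` are hypothetical names, progress is counted in completed 10% steps of the original descriptor count, and index sizes that the table does not list are given no recommendation.

```cpp
#include <cstdint>

// Which checks are due after a removal; hypothetical helper type, not part of the SDK.
struct DueChecks {
    bool isValidForSearch;  // cheap state check
    bool evaluate;          // slower quality evaluation
};

// original: descriptor count when the index was built (assumed > 0).
// removedBefore / removedNow: total removed before and after the latest removal.
DueChecks dueChecks(std::uint64_t original, std::uint64_t removedBefore,
                    std::uint64_t removedNow) {
    // Number of completed 10% steps of the original count.
    const std::uint64_t stepsBefore = removedBefore * 10 / original;
    const std::uint64_t stepsNow = removedNow * 10 / original;
    const bool crossedStep = stepsNow > stepsBefore;
    // evaluate is due only once at least 60% has been removed.
    return DueChecks{crossedStep, crossedStep && stepsNow >= 6};
}

// Minimum recommended coefficient for searchForEvaluation = 20, taken from the
// table. Returns 0 for sizes the table does not list (no recommendation).
float recommendedMinCoefficient(std::uint64_t indexSizeMillions) {
    switch (indexSizeMillions) {
        case 10: return 0.5f;
        case 20: return 0.4f;
        case 30: return 0.4f;
        case 40: return 0.35f;
        default: return 0.0f;
    }
}

// Rebuild is recommended when the coefficient falls below the table value.
bool rebuildRecommended(std::uint64_t indexSizeMillions, float coefficient) {
    return coefficient < recommendedMinCoefficient(indexSizeMillions);
}
```

A DIS_INVALID state makes rebuilding mandatory regardless of what `rebuildRecommended` returns.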
isValidForSearch method#
Call the isValidForSearch method after every removal of 10% of the original descriptor count. This method returns an index state. If the received state differs from DIS_VALID, you must rebuild the index to avoid unpredictable behavior.
The method specification is presented below:
virtual ResultValue<FSDKError, DynamicIndexState> isValidForSearch() const noexcept = 0;
The possible values of DynamicIndexState are:
enum DynamicIndexState : uint8_t {
DIS_INVALID = 0, //!< DIS_INVALID - index is invalid for search.
DIS_VALID, //!< DIS_VALID - index is valid for search.
DIS_UNKNOWN, //!< DIS_UNKNOWN - index state is unknown.
DIS_COUNT
};
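A caller-side reaction to the returned state might look like the sketch below. The enum is copied from the listing above so that the example compiles on its own; `mustRebuild` is a hypothetical helper, and obtaining the state from the ResultValue returned by isValidForSearch is left out because it depends on SDK types not shown here.

```cpp
#include <cstdint>

// Copied from the listing above so that the sketch is self-contained.
enum DynamicIndexState : std::uint8_t {
    DIS_INVALID = 0,  // index is invalid for search
    DIS_VALID,        // index is valid for search
    DIS_UNKNOWN,      // index state is unknown
    DIS_COUNT
};

// Any state other than DIS_VALID calls for a rebuild.
bool mustRebuild(DynamicIndexState state) {
    return state != DIS_VALID;
}
```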
evaluate method#
Call the evaluate method after removing 60% of the original descriptor count.
The evaluate method takes significantly longer to run than isValidForSearch.
To tune it, you can specify searchForEvaluation and numThreads in the IndexBuilder::Settings section of faceengine.conf.
The numThreads value must be no less than 0 and no greater than the number of cores in the system. The default value is 0, which corresponds to the number of available cores.
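The numThreads rule can be expressed as a small validation helper. This is a sketch only: `resolveThreadCount` is a hypothetical function, and returning 0 for out-of-range requests is a choice made for this example, since the text does not say how the SDK treats such values.

```cpp
// Resolves the effective thread count for a requested numThreads value.
// availableCores is passed in so that the logic can be checked in isolation.
// Returns 0 when the request is out of range (negative or above the core count).
unsigned resolveThreadCount(int requested, unsigned availableCores) {
    if (requested < 0) return 0;                // below the allowed range
    if (requested == 0) return availableCores;  // default: use all available cores
    const unsigned r = static_cast<unsigned>(requested);
    return r <= availableCores ? r : 0;         // reject values above the core count
}
```

In a real program, `availableCores` could come from `std::thread::hardware_concurrency()`, which may itself return 0 when the core count cannot be determined.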
The larger the searchForEvaluation value, the more precise the evaluation and the longer the evaluate() method runs.
The method specification is presented below:
virtual ResultValue<FSDKError, float> evaluate() const noexcept = 0;
The table below shows the estimated execution time in minutes. In the column headers, LengthSearch is the searchForEvaluation value.
Index size | LengthSearch 20 | LengthSearch 50 | LengthSearch 100 | LengthSearch 200 |
---|---|---|---|---|
1.6M | 1.65 | 2.44 | 2.73 | 3.19 |
10M | 5.60 | 8.61 | 16.56 | 28.43 |
30M | 22.10 | 32.03 | 39.60 | 58.63 |
Processor: Intel Xeon Skylake (IBRS)
Number of CPU cores: 32
CPU clock speed: 2.1 GHz
RAM capacity: 113 GB
You must rebuild the index after receiving the DIS_INVALID state, regardless of the value returned by evaluate.
We recommend rebuilding the index in the DIS_VALID state when the value is below the threshold.
Even if the index state is DIS_INVALID, you can save the index to a file and load it later. To get a descriptor by its identifier, use the following method:
virtual Result<FSDKError> descriptorByIndex(const DescriptorId index, IDescriptor* descriptor) const noexcept = 0;