Descriptor Processing Facility#
Overview#
This section describes descriptors and the processes and objects related to them.
A descriptor is a set of specially encoded object parameters. Descriptors are typically more or less invariant to affine object transformations and slight color variations. This property allows such sets to be used efficiently to identify, look up, and compare images of real-world objects.
To obtain a descriptor, you perform a special operation called descriptor extraction.
The typical use case for descriptors is to compare two of them and obtain a similarity score. This way you can identify persons by comparing their descriptors against your descriptor database.
All descriptor comparison operations are called matching. The result of matching two descriptors is a distance between the components of the corresponding sets mentioned above. From the magnitude of this distance, we can tell whether two objects are presumably the same.
There are two different tasks solved using descriptors: person identification and person reidentification.
Person Identification Task#
Facial recognition is the task of identifying a face in a photo or video image against a pre-existing database of faces. It begins with detection - distinguishing human faces from other objects in the image - and then proceeds to identify the detected faces. To solve this problem, we use a face descriptor, which is extracted from an image of a person's face. A person's face remains relatively invariant throughout their life.
In a case of the face descriptor, the extraction is performed from object image areas around some previously discovered facial landmarks, so the quality of the descriptor highly depends on them and the image it was obtained from.
The process of face recognition consists of 4 main stages:
- face detection in an image;
- warping of the face detection area – compensation of affine angles and centering of the face;
- descriptor extraction;
- comparison of the extracted descriptors (matching).
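The following is a minimal sketch of these four stages in C++. The factory method names (createExtractor, createDescriptor, createMatcher) and exact signatures are assumptions made for illustration only; consult the corresponding chapters for the actual interfaces.

```cpp
// A minimal sketch of the four-stage pipeline, assuming FSDK-style interfaces.
// Factory method names and exact signatures are illustrative assumptions.
#include <fsdk/FaceEngine.h>

void recognizePair(fsdk::IFaceEngine* engine,
                   const fsdk::Image& warpA,
                   const fsdk::Image& warpB) {
    // Stages 1-2 (detection and warping) are assumed to have produced the
    // warped face images warpA and warpB; see the corresponding chapters.

    // Stage 3: descriptor extraction from the warped images.
    auto extractor   = engine->createExtractor();   // assumed factory method
    auto descriptorA = engine->createDescriptor();  // assumed factory method
    auto descriptorB = engine->createDescriptor();
    extractor->extractFromWarpedImage(warpA, descriptorA);
    extractor->extractFromWarpedImage(warpB, descriptorB);

    // Stage 4: matching of the extracted descriptors.
    auto matcher = engine->createMatcher();         // assumed factory method
    auto result  = matcher->match(descriptorA, descriptorB);
    // The result carries a distance/similarity; compare it against an
    // application-specific threshold to decide whether both images show the
    // same person.
    (void)result;
}
```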
Additionally, you can extract face features (gender, age, emotions, etc.) or image attributes (light, dark, blur, specularity, illumination, etc.).
Person Reidentification Task#
Note! This functionality is experimental.
Person reidentification enables you to detect the same person appearing on different cameras. For example, it is used when you need to track a person across multiple supermarket cameras. Reidentification can be used for:
- building heat maps of human traffic;
- analyzing visitor movement across a camera network;
- tracking visitors across a camera network;
- searching for a person across a camera network when the face was not captured (e.g., across CCTV cameras in a city);
- etc.
For reidentification purposes, we use so-called human descriptors. The human descriptor is extracted from the detected area containing a person's body in an image or video frame. The descriptor is a unique data set formed based on a person's appearance. Note that descriptors extracted for the same person in different clothes will differ significantly.
The face descriptor and the human descriptor are almost the same from a technical point of view, but they solve fundamentally different tasks.
The process of reidentification consists of the following stages:
- human detection in an image;
- warping of the human detection area – centering and cropping of the human body;
- descriptor extraction;
- comparison of the extracted descriptors (matching).
The human descriptor does not support the descriptor score at all. The returned value of the descriptor score is always equal to 1.0.
The human descriptor is based on the following criteria:
- clothes (type and color);
- shoes;
- accessories;
- hairstyle;
- body type;
- anthropometric parameters of the body.
Note: The human reidentification algorithm is trained to work with input data that meets the following requirements:
- input images should be in R8G8B8 format (the algorithm performs worse in night mode);
- the smaller side of the input crop should be greater than 60 px;
- within a single crop, one person should occupy more than 80% of the area (sometimes several persons fit into the same frame).
Descriptor#
The descriptor object stores a compact set of packed properties as well as some helper parameters that were used to extract these properties from the source image. Together these parameters determine descriptor compatibility. Not all descriptors are compatible with each other. It is impossible to batch and match incompatible descriptors, so pay attention to which settings you use when extracting them. Refer to the section "Descriptor Extraction" for more information.
Descriptor Versions#
The face descriptor algorithm evolves over time, so newer FaceEngine versions contain improved models of the algorithm.
Descriptors of different versions are incompatible! This means that you cannot match descriptors with different versions. This does not apply to backend and mobilenet versions of the same model: they are compatible.
See chapter "Appendix A. Specifications" for details about performance and precision of different descriptor versions.
Descriptor version 59 provides the best precision. It also works well with personal protective equipment on the face, such as medical masks.
Descriptor version may be specified in the configuration file (see section "Configuration data" in chapter "Core facility").
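For illustration, the fragment below shows how such a setting might look in the configuration file. The section and parameter names are assumptions; consult the "Configuration data" section for the exact keys used by your SDK version.

```xml
<!-- Illustrative fragment only: the section and parameter names are
     assumptions; see "Configuration data" in the "Core facility" chapter
     for the exact keys. -->
<section name="DescriptorFactory::Settings">
    <!-- Face descriptor version used for extraction and matching. -->
    <param name="model" type="Value::Int1" x="59" />
</section>
```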
Face descriptor#
Currently, the following versions are available: 54, 56, 57, 58, and 59. Descriptors have backend and mobilenet implementations.
Versions 57, 58, and 59 support only the backend implementation.
Backend versions are more precise, while mobilenet versions are faster and have smaller model files (see Appendix A).
Version 59 is the most precise.
See Appendix A.1 and A.2 for details about the performance and precision of different descriptor versions.
Human descriptor#
Currently, only three versions of human descriptors are available: 102, 103, and 104.
To create a human descriptor, human descriptor batch, human descriptor extractor, or human descriptor matcher, you must pass one of the human descriptor versions (a usage sketch follows this list):
- DV_MIN_HUMAN_DESCRIPTOR_VERSION = 102, or
- HDV_TRACKER_HUMAN_DESCRIPTOR_VERSION = 102 – human descriptor for tracking people on a single camera; a light and fast version;
- HDV_PRECISE_HUMAN_DESCRIPTOR_VERSION = 103 – precise human descriptor; heavy and slow;
- HDV_REGULAR_HUMAN_DESCRIPTOR_VERSION = 104 – regular human descriptor; use it by default for multi-camera tracking.
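The following is a minimal sketch of passing a human descriptor version to the factory. The factory method names and their version overloads are assumptions made for illustration; only the version constants above come from the SDK.

```cpp
// Sketch: creating version-consistent human descriptor objects.
// Factory method names and their version overloads are assumptions;
// only the HDV_* constants are taken from the list above.
#include <fsdk/FaceEngine.h>

void createHumanDescriptorObjects(fsdk::IFaceEngine* engine) {
    const uint32_t version = fsdk::HDV_REGULAR_HUMAN_DESCRIPTOR_VERSION; // 104

    auto descriptor = engine->createDescriptor(version);          // assumed overload
    auto batch      = engine->createDescriptorBatch(64, version); // assumed overload
    auto extractor  = engine->createExtractor(version);           // assumed overload
    auto matcher    = engine->createMatcher(version);             // assumed overload

    // All of these objects now refer to the same human descriptor version and
    // are therefore compatible with each other.
    (void)descriptor; (void)batch; (void)extractor; (void)matcher;
}
```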
Descriptor Batch#
When matching a significant number of descriptors, it is desirable that they reside contiguously in memory for performance reasons (think cache-friendly data locality and coherence). This is where descriptor batches come into play. While descriptors are optimized for faster creation and destruction, batches are optimized for long life and better descriptor data representation for the hardware.
A batch is created by the factory like any other object. Aside from the type, the size of the batch should be specified. The size is the memory reservation this batch makes for its data. It is impossible to add more data than specified by this reservation.
Next, the batch must be populated with data. You have the following options:
- add an existing descriptor to the batch;
- load batch contents from an archive.
The following notes should be kept in mind:
- When adding an existing descriptor, its data is copied into the batch. This means that the descriptor object may be safely released.
- When adding the first descriptor to an empty batch, the initial memory allocation occurs. Before that moment, the batch does not allocate memory. At the same moment, internal descriptor helper parameters are copied into the batch (if there are any). This effectively determines the compatibility of the batch. Once the batch is initialized, it does not accept incompatible descriptors.
After initialization, a batch may be matched pretty much the same way as a simple descriptor.
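Below is a short sketch of creating and populating a batch. The createDescriptorBatch factory method and the add method are assumed names used for illustration.

```cpp
// Sketch: creating a batch with a fixed reservation and copying previously
// extracted descriptors into it. createDescriptorBatch and add are assumed
// method names; see the IDescriptorBatch reference for the exact interface.
#include <vector>

void populateBatch(fsdk::IFaceEngine* engine,
                   const std::vector<fsdk::IDescriptor*>& descriptors) {
    // Reservation: at most 1000 descriptors can ever be added to this batch.
    auto batch = engine->createDescriptorBatch(1000);

    for (fsdk::IDescriptor* descriptor : descriptors) {
        // The descriptor data is copied into the batch, so the descriptor
        // object may be safely released afterwards. The very first add also
        // triggers the initial allocation and fixes the batch compatibility
        // parameters.
        batch->add(descriptor);
    }
}
```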
Like any other data storage object, a descriptor batch implements the ::clear() method. Calling this method returns the batch to a non-initialized state without deallocating memory. In other words, the batch capacity stays the same, and no memory is reallocated. However, the actual number of descriptors in the batch and their parameters are reset. This allows re-populating the batch.
Memory deallocation takes place when a batch is released.
Care should be taken when serializing and deserializing batches. When a batch is created, it is assigned a fixed-size memory buffer. The size of the buffer is embedded into the batch BLOB when it is saved. So, when allocating a batch object to read the BLOB into, make sure its size is at least the same as that of the batch saved to the BLOB (even if it was not full at the moment of saving). Otherwise, loading fails. Naturally, it is okay to deserialize a smaller batch into a larger one this way.
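The reservation rule can be illustrated as follows; the actual save and load calls are omitted, and createDescriptorBatch is an assumed factory method name.

```cpp
// Sketch of the reservation rule for batch (de)serialization. The actual
// save/load calls are omitted; createDescriptorBatch is an assumed name.
void reservationRule(fsdk::IFaceEngine* engine) {
    auto saved = engine->createDescriptorBatch(1000); // reservation 1000
    // ... populate `saved` and write it to a BLOB; the BLOB records the
    // reservation size (1000), not just the number of stored descriptors ...

    auto tooSmall = engine->createDescriptorBatch(500);
    // Loading the BLOB into `tooSmall` fails: its reservation (500) is
    // smaller than the one recorded in the BLOB (1000).

    auto largeEnough = engine->createDescriptorBatch(2000);
    // Loading the BLOB into `largeEnough` succeeds: 2000 >= 1000.
}
```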
Descriptor Extraction#
The descriptor extractor is the entity responsible for descriptor extraction. Like any other object, it is created by the factory. To extract a descriptor, aside from the source image, you need:
- a face detection area inside the image (see chapter "Detection facility")
- a pre-allocated descriptor (see section "Descriptor")
- pre-computed landmarks (see chapter "Image warping")
The descriptor extractor is represented by the IDescriptorExtractor interface, whose main method is extract(). Note that the descriptor object must be created prior to calling extract() by calling an appropriate factory method.
Landmarks are a set of coordinates of object points of interest, which in turn determine the source image areas the descriptor is extracted from. This allows extracting only the data that matters most for a particular type of object. For example, for a human face we want to know at least the definitive properties of the eyes, nose, and mouth to be able to compare it to another face. Thus, we first invoke a feature extractor to locate the eyes, nose, and mouth and put these coordinates into landmarks. Then the descriptor extractor takes those coordinates and builds a descriptor around them.
Descriptor extraction is one of the most computation-heavy operations. For this reason, threading might be considered. Be aware that descriptor extraction is not thread-safe, so you have to create an extractor object per worker thread.
Note that the face detection area and the landmarks are required only for image warping, the preparation stage for descriptor extraction (see section "Image warping"). If the source image is already warped, it is possible to skip these parameters. For that purpose, the IDescriptorExtractor interface provides a special extractFromWarpedImage() method.
Descriptor extraction implementation supports execution on GPUs.
The IDescriptorExtractor interface provides the extractFromWarpedImageBatch() method, which allows you to extract a batch of descriptors from an image array in one call. This method achieves higher GPU utilization and better performance (see the "GPU mode performance" table in appendix A, chapter "Specifications").
IDescriptorExtractor also returns a descriptor score for each extracted descriptor. The descriptor score is a normalized value in the range [0, 1], where 1 means there is a face in the warp and 0 means there is no face in the warp. This value allows you to filter out descriptors extracted from false positive detections.
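A sketch of batch extraction with score-based filtering is shown below. The argument order of extractFromWarpedImageBatch and the way scores are returned are assumptions; only the method name comes from this section.

```cpp
// Sketch: extracting descriptors for an array of warped images and filtering
// out likely false positives by descriptor score. The argument order of
// extractFromWarpedImageBatch and the score output format are assumptions.
#include <cstddef>
#include <cstdint>
#include <vector>

void extractAndFilter(fsdk::IDescriptorExtractor* extractor,
                      fsdk::IDescriptorBatch* batch,
                      const std::vector<fsdk::Image>& warps) {
    std::vector<float> scores(warps.size());

    extractor->extractFromWarpedImageBatch(
        warps.data(), batch, scores.data(),
        static_cast<uint32_t>(warps.size()));

    // Descriptor score is in [0, 1]: values near 1 mean a face is present in
    // the warp, values near 0 indicate a probable false positive detection.
    const float threshold = 0.5f; // application-specific value
    for (std::size_t i = 0; i < scores.size(); ++i) {
        if (scores[i] < threshold) {
            // Exclude this descriptor from further matching.
        }
    }
}
```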
The IDescriptorExtractor interface also provides the extractFromWarpedImageBatchAsync() method, which allows you to extract a batch of descriptors from an image array asynchronously in one call. This method achieves higher GPU utilization and better performance (see the "GPU mode performance" table in appendix A, chapter "Specifications").
Note: The extractFromWarpedImageBatchAsync() method is experimental, and its interface may change in the future.
Note: The extractFromWarpedImageBatchAsync() method is not marked as noexcept and may throw an exception.
Descriptor Matching#
It is possible to match a pair (or more) of previously extracted descriptors to find out their similarity. With this information, it is possible to implement face search and other analysis applications.
By means of the match function defined by the IDescriptorMatcher interface, it is possible to match a pair of descriptors with each other, or a single descriptor with a descriptor batch (see the section "Descriptor Batch" for details on batches).
A simple rule to help you decide which storage to opt for:
- when searching among fewer than a hundred descriptors, use separate IDescriptor objects;
- when searching among a larger number of descriptors, use a batch.
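The sketch below shows both matching modes. The result type of match and its fields (distance, similarity) are assumptions; only the IDescriptorMatcher interface and the match method name come from this section.

```cpp
// Sketch: 1:1 matching of two descriptors and 1:N matching against a batch.
// The result type of match and its fields are assumptions; only the
// IDescriptorMatcher interface and the match method name come from the text.
void matchExamples(fsdk::IDescriptorMatcher* matcher,
                   fsdk::IDescriptor* probe,
                   fsdk::IDescriptor* candidate,
                   fsdk::IDescriptorBatch* gallery) {
    // 1:1 matching: compare two descriptors directly.
    auto pairResult = matcher->match(probe, candidate);
    // pairResult carries a distance (and a similarity); a smaller distance
    // means the two descriptors more likely belong to the same person.
    (void)pairResult;

    // 1:N matching: compare one descriptor against every descriptor in a
    // batch. The exact overload and the per-element result layout are
    // assumptions; see the IDescriptorMatcher reference.
    // auto batchResults = matcher->match(probe, gallery);
    (void)gallery;
}
```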
When working with big data, a common practice is to organize descriptors in several batches keeping a batch per worker thread for processing.
Be aware that descriptor matching is not thread-safe, so you have to create a matcher object per worker thread.
Descriptor Indexing#
Using HNSW#
In order to accelerate the descriptor matching process, a special index may be created for a descriptor batch. With the index, matching becomes a two-stage process:
- First, you need to build the indexed data structure (the index) using IIndexBuilder. This is quite a slow process, so it is not supposed to be done frequently. You build it by appending IDescriptor or IDescriptorBatch objects and finally calling the build method, IIndexBuilder::buildIndex;
- Once you have the index, you can use it to search for the nearest neighbors of a given descriptor very quickly.
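A sketch of the two-stage process is shown below. Apart from IIndexBuilder::buildIndex, which is mentioned above, the factory, append, and search method names are assumptions made for illustration.

```cpp
// Sketch: building an HNSW index from a gallery batch and searching it.
// createIndexBuilder, appendBatch, and the search call are assumed names;
// only IIndexBuilder::buildIndex is mentioned in the text above.
void buildAndSearch(fsdk::IFaceEngine* engine,
                    fsdk::IDescriptorBatch* gallery,
                    fsdk::IDescriptor* probe) {
    auto builder = engine->createIndexBuilder();  // assumed factory method
    builder->appendBatch(gallery);                // assumed append method

    // Building is slow; do it rarely (for example, offline or at startup).
    auto index = builder->buildIndex();           // produces a dynamic index

    // Searching is fast: find the k nearest descriptors to the probe.
    const int k = 10;
    // index->search(probe, k, results);          // assumed search signature
    (void)index; (void)probe; (void)k;
}
```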
There are two types of indexes: IDenseIndex and IDynamicIndex. The interface difference is simple: the dense index is read-only, while the dynamic index is editable: you can append or remove descriptors.
You can only build a dynamic index. So how can you get a dense index? The answer is through deserialization. Imagine you have several processes that need to search in the index. One option is for each of them to build the index separately, but as mentioned before, building an index is very slow, and you probably do not want to do it more often than necessary. The second option is to build the index once and serialize it to a file. This is where the dense and dynamic difference arises: the formats used to store these two types of index are different. From the user's point of view, the difference is that the dense index loads faster, but it is read-only. Once loaded, there is no performance difference in searching between these two types of indexes.
To serialize an index, use the IDynamicIndex::saveToDenseIndex or IDynamicIndex::saveToDynamicIndex methods. To deserialize, use IFaceEngine::loadDenseIndex or IFaceEngine::loadDynamicIndex.
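For illustration, a minimal save/load sketch follows; the method names are those listed above, while the file-path arguments are assumptions.

```cpp
// Sketch: saving a built dynamic index and loading it back, possibly in
// another process on the same platform. Method names are taken from the text
// above; the file-path arguments are assumptions.
void saveAndLoadIndex(fsdk::IFaceEngine* engine,
                      fsdk::IDynamicIndex* index) {
    // Dense format: loads faster, but the loaded index is read-only.
    index->saveToDenseIndex("gallery.dense.index");

    // Dynamic format: slower to load, but the loaded index remains editable.
    index->saveToDynamicIndex("gallery.dynamic.index");

    // Later, load whichever format fits the use case.
    auto denseIndex   = engine->loadDenseIndex("gallery.dense.index");
    auto dynamicIndex = engine->loadDynamicIndex("gallery.dynamic.index");
    (void)denseIndex; (void)dynamicIndex;
}
```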
Index files are not cross-platform. If you serialize an index on some platform, it is only usable on that exact platform. Not only does a different operating system break compatibility; a different CPU architecture might break it as well.
The HNSW index is not supported on embedded and 32-bit desktop platforms.