Glossary#

  • Track - Information on the face position of a single person across a frame sequence.
  • Tracking - A function that follows an object (a face) through a frame sequence.
  • Best shot - An image suitable for facial recognition.

Working with TrackEngine#

TrackEngine is based on the face detection and analysis methods provided by the FaceEngine library. This document does not cover FaceEngine usage in detail; for more information, please see FaceEngine_Handbook.pdf.

To create a TrackEngine instance, use one of the following global factory functions:

  • ITrackEngine* tsdk::createTrackEngine(fsdk::IFaceEngine* engine, const char* configPath, vsdk::IVehicleEngine* vehicleEngine = nullptr, const fsdk::LaunchOptions *launchOptions = nullptr)

    • engine - pointer to the FaceEngine instance (must already be initialized)
    • configPath - path to the TrackEngine config file
    • vehicleEngine - pointer to the VehicleEngine object (required only if vehicle logic is used)
    • launchOptions - launch options for SDK functions
    • return value - pointer to ITrackEngine
  • ITrackEngine* tsdk::createTrackEngine(fsdk::IFaceEngine* engine, const fsdk::ISettingsProviderPtr& provider, vsdk::IVehicleEngine* vehicleEngine = nullptr, const fsdk::LaunchOptions* launchOptions = nullptr)

    • engine - pointer to the FaceEngine instance (must already be initialized)
    • provider - settings provider with the TrackEngine configuration
    • vehicleEngine - pointer to the VehicleEngine object (required only if vehicle logic is used)
    • launchOptions - launch options for SDK functions
    • return value - pointer to ITrackEngine

It is not recommended to create multiple TrackEngine instances in one application.

At the end of processing, the user must call the ITrackEngine::stop method.

  • void ITrackEngine::stop() Stops processing.
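
A minimal creation sketch under stated assumptions: the FaceEngine instance is already initialized, and the header path and config file name are placeholders.

// Hedged sketch: create a TrackEngine, process, then stop it.
#include <trackEngine/ITrackEngine.h> // header path is an assumption

void runTracking(fsdk::IFaceEngine* faceEngine) {
    // own the returned raw pointer with fsdk::Ref (see the stream note below)
    auto trackEngine = fsdk::acquire(
        tsdk::createTrackEngine(faceEngine, "trackengine.conf" /* placeholder path */));

    // ... create streams and submit frames here ...

    trackEngine->stop(); // must be called at the end of processing
}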

The main interface to TrackEngine is Stream - an entity to which you submit video frames. To create a stream, use the following TrackEngine method:

  • IStream* ITrackEngine::createStream(StreamParams* params = nullptr)

    • params - pointer to stream-specific parameters. This parameter is optional; if valid, it overrides the config parameters for the Stream. See StreamParams for details.
    • return value - pointer to IStream

Note: The user must own this raw pointer via fsdk::Ref, e.g. with fsdk::acquire, and reset all refs to all streams before the TrackEngine object's destruction; otherwise a memory leak and/or undefined behavior are guaranteed. This is especially important in languages where the order of object destruction is not guaranteed (e.g. Python), so users should manage object lifetimes manually. See examples.
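
A short sketch of this ownership rule, assuming fsdk::Ref exposes reset() as elsewhere in the SDK; the shutdown order shown is one reasonable choice.

// Hedged sketch: release stream refs strictly before the engine ref.
auto trackEngine = fsdk::acquire(tsdk::createTrackEngine(faceEngine, "trackengine.conf"));
auto stream = fsdk::acquire(trackEngine->createStream());

// ... push frames / track ...

stream->join();      // callback mode: wait for pending frames and callbacks
trackEngine->stop(); // stop processing
stream.reset();      // drop every stream ref...
trackEngine.reset(); // ...before the engine ref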

Users can create multiple streams working concurrently (e.g. when faces must be tracked from multiple cameras). In each stream the engine detects faces and builds their tracks. Each face track has its own unique identifier, so face images belonging to the same person can be grouped by their track ids. Please note that tracks may break from time to time, either because people leave the visible area or due to challenging detection conditions (poor image quality, occlusions, extreme head poses, etc.).

There are two ways to work with TrackEngine. The first is the async pushFrame/callbacks method (see IStream::pushFrame): a simple API where frames are pushed asynchronously to each Stream and all tracking result events/data are received in callbacks on another thread. The second is more complex but more flexible for developers: the estimator API (see ITrackEngine::track), which works like a detector in the SDK, so users pass a batch of streams/frames to the function and get tracking results for all input streams.

Each method has its pros and cons. The main advantage of the async method is the simplicity of the client-side code: users mostly don't have to deal with exception handling, multithreading issues, or creating queues for gathering multi-stream batches and deferred processing of results. All that logic is implemented in TrackEngine for callback-mode = 1. The common solution is to create a stream per frame source, set up callback observers, and submit frames to each stream one by one. Frames can be pushed to each stream from different threads, depending on the application architecture, but the simple case implies one separate thread per stream and frame source. Tracking results are obtained in the callbacks (batched or single) on another thread, so that is where users should put their result-processing logic.

If users want maximum control over the tracking logic, they should use the estimator tracking API. One of its key advantages is minimal memory consumption by TrackEngine (and thus potentially better performance), because in this case it doesn't keep images in any queues (images are still kept in the tracks data, though). When working with the estimator tracking API, users don't have to deal with many of the config parameters regulating buffer sizes or batching settings, e.g. "frames-buffer-size", "callback-buffer-size", "min-frames-batch-size", "max-frames-batch-gather-timeout". Also, streams shouldn't be joined at the end. In this case a Stream serves only as a state object for source tracking.

Note: To use the estimator API, set the config parameter callback-mode to 0; for the async API, it must be set to 1 (the default value is 1).

Estimator API:

  • fsdk::Ref<ITrackingResultBatch> ITrackEngine::track(fsdk::Span<tsdk::StreamId> streams, fsdk::Span<tsdk::Frame> frames) Updates stream tracks with one new frame per stream and returns the tracking result batch as callback-compatible data.

    • streams - stream identifiers; must contain only unique ids, otherwise behavior is undefined. See IStream::getId.
    • frames - input frames, one per stream. See also IStream::pushFrame and Frame.
    • return value - Ref to the tracking result with callback argument data. See ITrackingResultBatch.

The function is not thread-safe for now, so concurrent calls are not allowed; otherwise behavior is undefined. Unlike pushFrame, the function is not exception-safe.

Note: regulating the batch size for track is the critical step to achieving high tracking performance. Higher values lead to better utilization/throughput (especially on GPU) but increase system latency.

For pre-validation of track inputs, the non-throwing validate function is useful.

  • fsdk::Result ITrackEngine::validate(fsdk::Span<tsdk::StreamId> streams, fsdk::Span<tsdk::Frame> frames, fsdk::Span<…> outErrors) Validates the input of multiple streams/frames in a single function call.

    • streams - stream identifier array.
    • frames - input frames, one per stream.
    • outErrors - output span of errors, one per stream/frame.
    • return value - Result with the last error code.
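
The following is a hedged usage sketch of the estimator API. Construction of tsdk::Frame objects and the outErrors element type are not specified above and are therefore omitted; Span construction from a pointer and size is assumed.

// Hedged sketch: batched tracking of two streams via the estimator API
// (requires callback-mode = 0 in the TrackEngine config).
std::vector<tsdk::StreamId> streamIds = { streamA->getId(), streamB->getId() };
std::vector<tsdk::Frame> frames = { frameA, frameB }; // one frame per stream; construction omitted

// Optionally pre-validate the inputs with the non-throwing validate() first.

auto result = trackEngine->track(
    fsdk::Span<tsdk::StreamId>(streamIds.data(), streamIds.size()),
    fsdk::Span<tsdk::Frame>(frames.data(), frames.size()));

// 'result' holds callback-compatible data: iterate the per-stream events
// of the ITrackingResultBatch here instead of receiving them in callbacks.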

Async API:

  • bool IStream::pushFrame(const fsdk::Image &frame, uint32_t frameId, tsdk::AdditionalFrameData *data = nullptr) Pushes a single frame to the stream buffer.

    • frame - input frame image. Format must be R8G8B8 or R8G8B8X8.
    • frameId - unique identifier within the frame sequence.
    • data - any additional data that the developer wants to receive in the callback implementations. It must be allocated only with new or be equal to nullptr. Do not use the delete operator: TrackEngine releases this data internally.
    • return value - true if the frame was appended to the processing queue, false otherwise (the frame was skipped because the queue is full).

There are also some variations of this method: pushCustomFrame, pushFrameWaitFor, and pushCustomFrameWaitFor.

TrackEngine emits various events to inform you what is happening. The events occur on a per-stream basis.

When a Stream has to be finished, in callback mode the user must call the IStream::join method before stream destruction. A Stream shouldn't be used for processing after join; only 'getter' functions remain available.

  • void IStream::join() Blocks the current thread until all frames in this Stream have been handled and all callbacks have been executed.

Note: Ignoring this step can lead to unexpected behavior (TrackEngine writes a warning to the log in this case).
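
An end-to-end hedged sketch of the async workflow; frameSource is a hypothetical image sequence.

// Hedged sketch: push frames to one stream, then join before release.
auto stream = fsdk::acquire(trackEngine->createStream());

uint32_t frameId = 0;
for (const fsdk::Image& image : frameSource) { // frameSource is hypothetical
    if (!stream->pushFrame(image, frameId++)) {
        // queue is full, the frame was skipped - drop it or retry later
    }
}

stream->join(); // wait until all frames are handled and callbacks executed
stream.reset(); // the stream may be destroyed only after join()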

You can set up an observer to receive and react to events. There are two types of observers: a per-stream single observer and a batched observer for all streams. Per-stream observers are now deprecated and remain only for compatibility with old versions.

Note: It's highly recommended to use the new batched observers API instead of the old per-stream one.

Batched observers have some advantages over per-stream observers:

  • they reduce the number of threads created by TrackEngine itself and make it fixed (see section Threading for details).
  • they eliminate the performance overhead of multiple concurrently working threads used for per-stream callbacks.
  • they make it easy to use the batched SDK API without additional aggregation of data from single callbacks. The batched SDK API improves performance on both GPU and CPU (for GPU the effect is much more significant).
  • they give more information in the output (per-stream callback function signatures remain unchanged for compatibility with old versions).

Note: you have to set up either single per-stream observers or a batched one for all streams, but not both at the same time.

Stream observer interfaces:

Per-stream observers:

  • IBestShotObserver

  • IVisualObserver

  • IDebugObserver

Batched observers:

  • IBatchBestShotObserver

  • IBatchVisualObserver

  • IBatchDebugObserver

By implementing one or several observer interfaces it is possible to define custom processing logic in your application.

The IBestShotPredicate type defines recognition suitability criteria for face detections. By implementing a custom predicate, one may alter the best shot selection logic and thereby specify which images make it to the recognition phase.

Example of setting a per-stream observer:

  • void IStream::setBestShotObserver(tsdk::IBestShotObserver* observer) Sets a best shot observer for the Stream.

    • observer - pointer to the observer object, see IBestShotObserver. Do not set it to nullptr; to disable the observer, use IStream::setObserverEnabled with false.

Example of setting a batched observer:

  • void ITrackEngine::setBatchBestShotObserver(tsdk::IBatchBestShotObserver *observer) Sets a best shot observer for all streams.

    • observer - pointer to the batched observer object, see IBatchBestShotObserver. Do not set it to nullptr; to disable the observer, use IStream::setObserverEnabled with false.

IBestShotObserver#

  • void bestShot(const tsdk::DetectionDescr& descr) Called for each emerging best shot. It provides information on the best shot, including the frame number, detection coordinates, cropped still image, and other data (see the DetectionDescr structure definition below for details). The default implementation does nothing.

    • descr - best shot detection description
struct TRACK_ENGINE_API DetectionDescr {
    //! Index of the frame
    tsdk::FrameId frameIndex;

    //! Index of the track
    tsdk::TrackId trackId;

    //! Source image
    fsdk::Image image;

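    //! Source custom frame (see pushCustomFrame)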
    fsdk::Ref<ICustomFrame> customFrame;

    //! Face landmarks
    fsdk::Landmarks5 landmarks;

#ifndef MOBILE_BUILD
    //! Human landmarks
    fsdk::HumanLandmarks17 humanLandmarks;

    //! NOTE: for internal usage only; don't use this field, it isn't a valid ptr
    fsdk::IDescriptorPtr descriptor;
#endif

    //! Is it full detection or redetect step
    bool isFullDetect;

    //! Detections flags
    // needed to determine what detections are valid in extraDetections 
    // see EDetectionFlags
    uint32_t detectionsFlags;

    //! Detection
    // always valid, even when detectionsFlags is a combination type
    // useful for one detector case
    // see detectionObject
    fsdk::Detection detection;

    //! extra detections
    // needed when detectionsFlags has combination type,
    // e.g. for EDetection_Body_Face extraDetections[EDetection_Face], extraDetections[EDetection_Body] are valid
    // note: for simple detection type extra detection with corresponding index is valid too
    fsdk::Detection extraDetections[EDetectionObject::EDetection_Simple_Count];

    bool hasDetectionFlag(EDetectionObject obj) {
        return (detectionsFlags & (1 << obj)) ? true : false;
    }

    void setDetectionFlag(EDetectionObject obj, bool enable) {
        if (enable) {
            detectionsFlags |= (1 << obj);
        }
        else {
            detectionsFlags &= ~(1 << obj);
        }
    }

    void setExtraDetection(EDetectionObject obj, const fsdk::Detection &detection) {
        extraDetections[obj] = detection;
    }
};
  • void trackEnd(const tsdk::TrackId& trackId) Tells that the track with trackId has ended and no more best shots should be expected from it. The default implementation does nothing.

    • trackId - id of the track
  • void trackStatusUpdate(tsdk::FrameId frameId, tsdk::TrackId trackId, tsdk::TrackStatus status) Tells that the track status was updated.

    • frameId - id of the frame
    • trackId - id of the track
    • status - track new status
/** @brief Track status enum. (see human tracking algorithm section in docs for details)
*/
enum class TrackStatus : uint8_t {
    ACTIVE = 0,
    NONACTIVE
};
  • void trackReIdentificate(tsdk::FrameId frameId, tsdk::TrackId trackId, tsdk::TrackId reidTrackId) Tells that the track with id = trackId was matched to one of the old non-active tracks with id = reidTrackId. See section ReIdentification for details.

    • frameId - id of the frame
    • trackId - id of the track that was matched to one of the old non-active tracks
    • reidTrackId - id of the non-active track that was successfully matched to the track with id = trackId
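
A hedged implementation sketch of this (deprecated) per-stream observer; overriding only the callbacks of interest is assumed to be enough, since the default implementations do nothing, and the recognition hand-off is a placeholder.

// Hedged sketch: a minimal per-stream best shot observer.
struct MyBestShotObserver : public tsdk::IBestShotObserver {
    void bestShot(const tsdk::DetectionDescr& descr) override {
        // descr.detection is always valid; extraDetections matter only for
        // combined detection types - hand the shot over to recognition here
        (void)descr;
    }

    void trackEnd(const tsdk::TrackId& trackId) override {
        // flush everything accumulated for this track (placeholder)
        (void)trackId;
    }
};

MyBestShotObserver observer;            // must outlive the stream
stream->setBestShotObserver(&observer);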

IVisualObserver#

  • void visual(const tsdk::FrameId &frameId, const fsdk::Image &image, const tsdk::TrackInfo * trackInfo, const int nTrack) Allows visualizing the current stream state. It is intended mainly for debugging purposes. The function must be overridden.

    • frameId - current frame id
    • image - frame image
    • trackInfo - array of currently active tracks
struct TRACK_ENGINE_API TrackInfo {
    //! Face landmarks
    fsdk::Landmarks5 landmarks;

#if !TE_MOBILE_BUILD
    //! Human landmarks
    fsdk::HumanLandmarks17 humanLandmarks;
#endif

    //! Last detection for track
    fsdk::Rect rect;

    //! Id of track
    TrackId trackId;

    //! Score for last detection in track
    float lastDetectionScore;

    //! Detector id
    TDetectorID m_sourceDetectorId;

    //! Number of detections for track (count of frames when track was updated with detect/redetect)
    size_t detectionsCount;

    //! Id of frame when track was created
    tsdk::FrameId firstFrameId;

    //! Is it a (re)detected or tracked bounding box
    bool isDetector;
};
  • nTrack - number of tracks
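
As a debugging aid, a hedged sketch of a visual observer that only logs the active tracks; the printf formatting and casts are illustrative.

// Hedged sketch: log active tracks from the visual callback.
#include <cstdio>

struct LoggingVisualObserver : public tsdk::IVisualObserver {
    void visual(const tsdk::FrameId& frameId, const fsdk::Image& image,
                const tsdk::TrackInfo* trackInfo, const int nTrack) override {
        (void)image; // the frame image itself is not used here
        for (int i = 0; i < nTrack; ++i) {
            const tsdk::TrackInfo& t = trackInfo[i];
            std::printf("frame %u: track %u rect=[%d %d %d %d] score=%.2f\n",
                        static_cast<unsigned>(frameId), static_cast<unsigned>(t.trackId),
                        t.rect.x, t.rect.y, t.rect.width, t.rect.height,
                        t.lastDetectionScore);
        }
    }
};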

IDebugObserver#

  • void debugDetection(const tsdk::DetectionDebugInfo& descr) Detector debug callback. The default implementation does nothing.

    • descr - detection debugging description
struct DetectionDebugInfo {
    //! Detection description
    DetectionDescr descr;

    //! Is it detected or tracked bounding box
    bool isDetector;

    //! Filtered by user bestShotPredicate or not.
    bool isFiltered;

    //! Best detection for current moment or not
    bool isBestDetection;
};
  • void debugForegroundSubtraction(const tsdk::FrameId& frameId, const fsdk::Image& firstMask, const fsdk::Image& secondMask, fsdk::Rect * regions, int nRegions) Background subtraction debug callback. The default implementation does nothing.

    • frameId - frame id of foreground
    • firstMask - result of background subtraction operation
    • secondMask - result of background subtraction operation after procedures of erosion and dilation
    • regions - regions obtained after background subtraction operation
    • nRegions - number of returned regions

BestShotPredicate#

  • bool checkBestShot(const tsdk::DetectionDescr& descr) Predicate for best shot detection. This is the place to perform any required quality checks (by means of, e.g., FaceEngine's estimators). This function must be overridden.

    • descr - detection description
    • return value - true if descr has passed the check, false otherwise
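
A hedged implementation sketch follows; the interface name tsdk::IBestShotPredicate is taken from the text above, and the estimator call is left as a placeholder.

// Hedged sketch: a permissive best shot predicate.
struct MyBestShotPredicate : public tsdk::IBestShotPredicate {
    bool checkBestShot(const tsdk::DetectionDescr& descr) override {
        // run required quality checks here, e.g. FaceEngine estimators on
        // descr.image / descr.detection; this sketch accepts every detection
        (void)descr;
        return true;
    }
};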

VisualPredicate#

  • bool needRGBImage(const tsdk::FrameId frameId, const tsdk::AdditionalFrameData *data) Predicate for the visual callback. It decides whether the original image should be output in the visual callback. This function can be overridden. The default implementation returns true.

    • frameId - id of the frame
    • data - frame additional data, passed by user
    • return value - true if the original image (or the RGB image for a custom frame) is needed in the visual callback output, false otherwise
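
A hedged sketch of a sparse visual predicate; the interface name tsdk::IVisualPredicate is an assumption based on the section title.

// Hedged sketch: request the RGB image only for every 10th frame.
struct SparseVisualPredicate : public tsdk::IVisualPredicate { // name assumed
    bool needRGBImage(const tsdk::FrameId frameId, const tsdk::AdditionalFrameData* data) override {
        (void)data; // user data is not needed for this decision
        return frameId % 10 == 0;
    }
};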

IBatchBestShotObserver#

  • void bestShot(const fsdk::Span<tsdk::StreamId> &streamIDs, const fsdk::Span<tsdk::BestShotCallbackData> &data) Batched version of the bestShot callback.

    • streamIDs - array of stream ids
    • data - array of callback data for each stream
struct TRACK_ENGINE_API BestShotCallbackData {
    //! detection description. see 'DetectionDescr' for details
    tsdk::DetectionDescr descr;

    //! additional frame data, passed by user in 'pushFrame'. see 'AdditionalFrameData' for details
    tsdk::AdditionalFrameData *frameData;
};
  • void trackEnd(const fsdk::Span<tsdk::StreamId> &streamIDs, const fsdk::Span<tsdk::TrackEndCallbackData> &data) Batched version of the trackEnd callback.

    • streamIDs - array of stream ids
    • data - array of callback data for each stream
/**
* @brief Track end reason. See 'TrackEndCallbackData' for details.
*/
enum class TrackEndReason : uint8_t {
    //! not used anymore, deprecated value (may be removed in future releases)
    DEFAULT,
    //! some unknown reason (shouldn't occur in a normal workflow)
    UNKNOWN,
    //! intersection with another track (see "kill-intersected-detections")
    INTERSECTION,
    //! tracker is disabled or failed to update track
    TRACKER_FAIL,
    //! track's gone out of frame
    OUT_OF_FRAME,
    //! `skip-frames` parameter logic (see docs or config comments for details)
    SKIP_FRAMES,
    //! finished by user (see `IStream::finishTracks` for details)
    USER,
    //! non-active track ends because of lifetime expired
    NONACTIVE_TIMEOUT,
    //! active track ends because of reidentification with old non-active track
    ACTIVE_REID,
    //! non-active track ends because of reidentification with older non-active track
    //  (meaning the current track couldn't be updated and was matched to an old non-active track at the same time)
    NONACTIVE_REID,
    //! all stream tracks end on stream finishing (IStream::join called)
    STREAM_END
};

struct TRACK_ENGINE_API TrackEndCallbackData {
    //! frame id
    tsdk::FrameId frameId;

    //! track id
    tsdk::TrackId trackId;

    //! parameter implies the reason of track ending
    // NOTE: currently used only for human tracking, don't use this for other detectors
    TrackEndReason reason;
};
  • void trackStatusUpdate(const fsdk::Span<tsdk::StreamId> &streamIDs, const fsdk::Span<tsdk::TrackStatusUpdateCallbackData> &data) Batched version of the trackStatusUpdate callback.

    • streamIDs - array of stream ids
    • data - array of callback data for each stream
struct TRACK_ENGINE_API TrackStatusUpdateCallbackData {
    //! frame id
    tsdk::FrameId frameId;

    //! track id
    tsdk::TrackId trackId;

    //! new track status
    tsdk::TrackStatus status;
};
  • void trackReIdentificate(const fsdk::Span<tsdk::StreamId> &streamIDs, const fsdk::Span<tsdk::TrackReIdentificateCallbackData> &data) Batched version of the trackReIdentificate callback. See section ReIdentification for details.

    • streamIDs - array of stream ids
    • data - array of callback data for each stream
struct TRACK_ENGINE_API TrackReIdentificateCallbackData {
    //! id of frame
    tsdk::FrameId frameId;

    //! id of track, that was matched to one of the old non-active tracks
    tsdk::TrackId trackId;

    //! id of the non-active track that was successfully matched to the track with id = 'trackId'
    // see human tracking algorithm section in docs for details
    tsdk::TrackId reidTrackId;

    //! similarity from matching of tracks descriptors
    float similarity;
};
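
A hedged sketch of a batched observer implementation; the Span element types follow the reconstructions used above and should be verified against the actual headers.

// Hedged sketch: batched best shot observer for all streams.
struct MyBatchBestShotObserver : public tsdk::IBatchBestShotObserver {
    void bestShot(const fsdk::Span<tsdk::StreamId>& streamIDs,
                  const fsdk::Span<tsdk::BestShotCallbackData>& data) override {
        for (size_t i = 0; i < streamIDs.size(); ++i) {
            const tsdk::BestShotCallbackData& cb = data[i];
            // cb.descr holds the detection, cb.frameData the user data;
            // a natural place to feed the batched SDK API
            (void)cb;
        }
    }

    void trackEnd(const fsdk::Span<tsdk::StreamId>& streamIDs,
                  const fsdk::Span<tsdk::TrackEndCallbackData>& data) override {
        for (size_t i = 0; i < streamIDs.size(); ++i) {
            // data[i].reason explains why the track ended (human tracking only)
        }
    }
    // trackStatusUpdate / trackReIdentificate omitted in this sketch;
    // add overrides if the interface declares them pure virtual
};

MyBatchBestShotObserver batchObserver;                 // must outlive processing
trackEngine->setBatchBestShotObserver(&batchObserver);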

IBatchVisualObserver#

  • void visual(const fsdk::Span<tsdk::StreamId> &streamIDs, const fsdk::Span<tsdk::VisualCallbackData> &data) Batched version of the visual callback.

    • streamIDs - array of stream ids
    • data - array of callback data for each stream
struct TRACK_ENGINE_API VisualCallbackData {
    //! frame id
    tsdk::FrameId frameId;

    //! either the original image (if 'pushFrame' was used) or the RGB image obtained from custom frame conversion (if 'pushCustomFrame' was used)
    fsdk::Image image;

    //! tracks array raw ptr
    tsdk::TrackInfo *trackInfo;

    //! number of tracks
    int nTrack;

    //! additional frame data, passed by user in 'pushFrame'. See 'AdditionalFrameData' for details.
    tsdk::AdditionalFrameData *frameData;
};

IBatchDebugObserver#

  • void debugForegroundSubtraction(const fsdk::Span<tsdk::StreamId> &streamIDs, const fsdk::Span<tsdk::DebugForegroundSubtractionCallbackData> &data) Batched version of the debugForegroundSubtraction callback.

    • streamIDs - array of stream ids
    • data - array of callback data for each stream
  • void debugDetection(const fsdk::Span<tsdk::StreamId> &streamIDs, const fsdk::Span<tsdk::DebugDetectionCallbackData> &data) Batched version of the debugDetection callback.

    • streamIDs - array of stream ids
    • data - array of callback data for each stream
struct TRACK_ENGINE_API DebugForegroundSubtractionCallbackData {
    //! frame id
    tsdk::FrameId frameId;

    //! first mask of the foreground subtraction
    fsdk::Image firstMask;

    //! second mask of the foreground subtraction
    fsdk::Image secondMask;

    //! regions array raw ptr
    fsdk::Rect *regions;

    //! number of regions
    int nRegions;
};

/** @brief Detection data for debug callback.
    */
struct TRACK_ENGINE_API DebugDetectionCallbackData {
    //! Detection description
    DetectionDescr descr;

    //! Is it detected or tracked bounding box
    bool isDetector;

    //! Filtered by user bestShotPredicate or not.
    bool isFiltered;

    //! Best detection for current moment or not
    bool isBestDetection;
};
  • void IStream::setObserverEnabled(tsdk::StreamObserverType type, bool enabled) Enables or disables observer.

    • type - type of observer
    • enabled - flag to enable/disable observer

For full Stream API see class IStream from IStream.h header file.

Tracks lifetime#

All tracks live until they meet the specific conditions of the tracking algorithm (e.g. moving far enough out of the frame bounds, or the skip-frames logic). The human tracking algorithm has its own rules for track lifetime (see section Human tracking algorithm). Users can finish tracks manually with the IStream function finishTracks. TrackEndReason indicates why a track was finished.

Human tracking algorithm#

The human tracking algorithm differs from the face one. The tracker feature isn't used at all anymore; only detect/redetect are used. For matching tracks with new detections, the IOU metric is used; the parameter human:iou-connection-threshold sets the threshold. For better tracking accuracy, the ReIdentification feature is used to merge different tracks of one human (for ReIdentification details, see the next section).

In the face tracking algorithm, when detect/redetect fails, the track is updated with the tracker; in human tracking, in that case (or under some other conditions) the track moves to the non-active group of tracks. The trackStatusUpdate callback with status = TrackStatus::NONACTIVE is invoked to indicate this. Tracks from that group are invisible to all observers and don't participate in common tracking processing (detect/redetect).

Note that the parameter skip-frames doesn't affect the human tracking algorithm; human tracks are finished according to their own logic. There are several cases when trackEnd is called for a human track (see the TrackEndCallbackData reason field):

  1. A non-active track is finished by the timeout set by the config parameter "human":"non-active-tracks-lifetime" (reason = NONACTIVE_TIMEOUT).
  2. An active track is finished because of reidentification with another old track from the non-active group. Note that the old track id becomes active again: the active track's id is simply replaced with the older one, trackStatusUpdate is called with status = TrackStatus::ACTIVE for the old track id to indicate that it's active again, and trackEnd is called for the current active track id to indicate it no longer exists (reason = ACTIVE_REID). For this case, the config parameter "human":"reid-matching-detections-number" sets the lifetime of the active track (in number of frames) needed for matching to old non-active tracks.
  3. An active track is finished if it is about to be moved to the non-active group (e.g. detect/redetect fails) but is successfully matched (reidentification) to an old non-active track at the same time (reason = NONACTIVE_REID). In this case the lifetime counter of the non-active track is also reset.

Some notes on the algorithm and related parameters. After detect/redetect, all found detections are filtered by the following conditions:

  • Overlapped detections may be removed. The IOU metric is used for overlap estimation. If the IOU is higher than the threshold parameter other:kill-intersection-value, then neither, both, or the detection with the lower detection score is removed from further processing, depending on the parameter remove-overlapped-strategy.
  • Detections considered horizontal are removed. remove-horizontal-ratio sets the detection width-to-height ratio threshold used for removing horizontal detections.

ReIdentification#

ReIdentification is a feature that improves tracking accuracy. It is intended to solve the problem described in section Human tracking algorithm. It matches two tracks with different ids and merges them into one track with the id of the older one. The trackReIdentificate callback signals successful matching and merging of the two tracks into one. The feature's behavior is regulated by the config parameters "human":"reid-matching-threshold" and "reid-matching-detections-number". Two tracks will be matched only if the similarity between them is higher than "reid-matching-threshold". If you don't want the ReIdentification feature at all, just set this parameter to a value higher than 1.

Note: the current version of TrackEngine supports the ReIdentification feature only for human tracking.
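
If the feature must be disabled from code rather than from the config file, a hedged sketch using the settings-provider overload of createTrackEngine follows; the ISettingsProvider::setValue call follows the usual FaceEngine settings pattern and is an assumption.

// Hedged sketch: effectively disable ReIdentification by raising the
// matching threshold above 1 (see the note above); setValue usage is assumed.
fsdk::ISettingsProviderPtr provider = /* provider loaded with the TrackEngine config */;
provider->setValue("human", "reid-matching-threshold", 1.1f);

auto trackEngine = fsdk::acquire(
    tsdk::createTrackEngine(faceEngine, provider));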

Memory consumption#

TrackEngine itself doesn't allocate much memory for internal calculations, but in callback-mode = 1 it keeps frames/images in the frame and callback queues as well as in the current tracks data. The main tips to reduce memory consumption are to set frames-buffer-size, callback-buffer-size, and skip-frames low enough. To achieve minimal memory consumption, users should use the estimator API ITrackEngine::track and avoid keeping images in any queues, or keep as few as possible.

Threading#

TrackEngine is multi-threaded. The number of threads is configurable and depends on the currently bound FaceEngine settings and the type of observers being used (batched or single). TrackEngine calls observer functions in separate threads. If batched observers are used, only one additional thread is created and used for all batched callbacks and all streams. If per-stream single observers are used, each stream gets its own separate callback thread used for its callback invocations; in this case all callbacks for a stream are invoked from that one per-stream thread.

Whatever callback type is used, it is recommended to avoid long-running tasks in these functions, because pushing to the callback buffer blocks the main processing thread: it always waits until there is a free slot in that buffer to push a callback (the buffer's size is set by the parameter callback-buffer-size, see below). The checkBestShot and needRGBImage functions are called in the main frame processing thread. It is also recommended to avoid expensive computations in these functions; ideally, these predicates should have zero performance cost.

Thread count guarantees (excluding the SDK's computation threads):

  • If batched observers are used, TrackEngine is guaranteed to use only 2-3 threads itself.
  • If per-stream single observers are used, TrackEngine is guaranteed to use only 1-2 + (number of created streams) threads itself.

Tracker#

TrackEngine uses a tracker to update the current detections when detect/redetect fails. TrackEngine supports several trackers (see the tracker-type parameter in the config, section Settings). Some platforms don't support all trackers. vlTracker is a tracker based on neural networks. It's the only tracker that can be used for GPU/NPU processing (other trackers, except none, don't support GPU/NPU) and for processing multiple concurrently running streams (it has a batching implementation, so it provides better CPU utilization). KCF/opencv trackers are simple CPU trackers that should be used only when there are few tracks in total across all streams at any moment. Choosing the none tracker disables the tracking feature entirely, which leads to better performance but degraded tracking quality.