
Glossary#

  • Track - information on the face position of a single person across a frame sequence.
  • Tracking - a function that follows an object (a face) through a frame sequence.
  • Best shot - an image suitable for facial recognition.

Working with TrackEngine#

TrackEngine is based on the face detection and analysis methods provided by the FaceEngine library. This document does not cover FaceEngine usage in detail; for more information, please see FaceEngine_Handbook.pdf.

To create a TrackEngine instance, use one of the following global factory functions:

  • ITrackEngine* tsdk::createTrackEngine(fsdk::IFaceEngine* engine, const char* configPath, vsdk::IVehicleEngine* vehicleEngine = nullptr, const fsdk::LaunchOptions *launchOptions = nullptr)

    • engine - pointer to the FaceEngine instance (must already be initialized)
    • configPath - path to the TrackEngine config file
    • vehicleEngine - pointer to the VehicleEngine object (if vehicle logic is used)
    • launchOptions - launch options for SDK functions
    • return value - pointer to ITrackEngine
  • ITrackEngine* tsdk::createTrackEngine(fsdk::IFaceEngine* engine, const fsdk::ISettingsProviderPtr& provider, vsdk::IVehicleEngine* vehicleEngine = nullptr, const fsdk::LaunchOptions *launchOptions = nullptr)

    • engine - pointer to the FaceEngine instance (must already be initialized)
    • provider - settings provider with the TrackEngine configuration
    • vehicleEngine - pointer to the VehicleEngine object (if vehicle logic is used)
    • launchOptions - launch options for SDK functions
    • return value - pointer to ITrackEngine

It is not recommended to create multiple TrackEngine instances in one application.

At the end of processing, the user must call the ITrackEngine::stop method.

  • void ITrackEngine::stop() Stops processing.

The main interface to TrackEngine is the Stream - an entity to which you submit video frames. To create a stream, use the following TrackEngine method:

  • IStream* ITrackEngine::createStream()

    • return value - pointer to IStream

You can create multiple streams at once if required (e.g., when you would like to track faces from multiple cameras). In each stream, the engine detects faces and builds their tracks. Each face track has its own unique identifier, so face images belonging to the same person can be grouped by their track ids. Please note that tracks may break from time to time, either due to people leaving the visible area or due to challenging detection conditions (poor image quality, occlusions, extreme head poses, etc.).

Frames are submitted one by one, and each frame has its own unique id.

  • bool IStream::pushFrame(const fsdk::Image &frame, uint32_t frameId, tsdk::AdditionalFrameData *data) Pushes a single frame to the stream buffer.

    • frame - input frame image. The format must be R8G8B8 or R8G8B8X8.
    • frameId - unique identifier within the frame sequence.
    • data - any additional data the developer wants to receive back in callback implementations. Do not use the delete operator on it; TrackEngine manages this parameter's lifetime internally.
    • return value - true if the frame was appended to the processing queue, false otherwise (the frame was skipped because the queue is full).

There are also several variations of this method: pushCustomFrame, pushFrameWaitFor, and pushCustomFrameWaitFor.

TrackEngine emits various events to inform you what is happening. The events occur on a per-stream basis.

When a Stream has to be finished, the user must call the IStream join method before the stream is destroyed.

  • void IStream::join() Blocks the current thread until all frames in this Stream have been handled and all callbacks have been executed (the Stream cannot be used after join).

Note: Ignoring this step can lead to unexpected behavior (TrackEngine writes a warning to the log in this case).
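
Putting these calls together, a minimal lifecycle sketch is shown below. The header paths, the config file name, and the grabNextFrame frame source are assumptions for illustration, not part of the documented API.

#include <trackEngine/ITrackEngine.h> // assumed header locations
#include <trackEngine/IStream.h>

// Hypothetical frame source delivering R8G8B8 images; returns false when
// the video ends.
bool grabNextFrame(fsdk::Image& frame);

// Runs tracking on a single stream. 'faceEngine' must already be initialized.
void runTracking(fsdk::IFaceEngine* faceEngine) {
    tsdk::ITrackEngine* trackEngine =
        tsdk::createTrackEngine(faceEngine, "data/trackengine.conf");

    tsdk::IStream* stream = trackEngine->createStream();

    uint32_t frameId = 0;
    fsdk::Image frame;
    while (grabNextFrame(frame)) {
        // pushFrame returns false when the internal queue is full;
        // a real application could retry later or drop the frame.
        stream->pushFrame(frame, frameId, nullptr);
        ++frameId;
    }

    stream->join();      // wait until all frames are handled and callbacks executed
    trackEngine->stop(); // stop processing
}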

You can set up an observer to receive and react to events. There are two types of observers: per-stream single observers and batched observers for all streams. Per-stream observers are now deprecated and remain only for compatibility with older versions.

Note: It's highly recommended to use the new batched observer API instead of the old per-stream one.

Batched observers have several advantages over per-stream observers:

  • they reduce the number of threads created by TrackEngine itself and make it fixed (see section Threading for details).
  • they eliminate the performance overhead of multiple concurrently working threads used for per-stream callbacks.
  • they allow easy use of the batched SDK API without additional aggregation of data from individual callbacks. The batched SDK API improves performance on both CPU and GPU (the effect is much more significant on GPU).
  • they provide more information in the output (per-stream callback function signatures remain the same for compatibility with older versions).

Note: you have to set up either single per-stream observers or a batched one for all streams, but not both at the same time.

Stream observer interfaces:

Per-stream observers:

  • IBestShotObserver

  • IVisualObserver

  • IDebugObserver

Batched observers:

  • IBatchBestShotObserver

  • IBatchVisualObserver

  • IBatchDebugObserver

By implementing one or several observer interfaces it is possible to define custom processing logic in your application.

The IBestShotPredicate type defines recognition suitability criteria for face detections. By implementing a custom predicate, one may alter the best shot selection logic and, therefore, specify which images make it to the recognition phase.

Example of setting a per-stream observer:

  • void IStream::setBestShotObserver(tsdk::IBestShotObserver* observer) Sets a best shot observer for the Stream.

    • observer - pointer to the observer object, see IBestShotObserver. Do not set it to nullptr; to disable the observer, use IStream::setObserverEnabled with false.

Example of setting a batched observer:

  • void ITrackEngine::setBatchBestShotObserver(tsdk::IBatchBestShotObserver *observer) Sets a best shot observer for all streams.

    • observer - pointer to the batched observer object, see IBatchBestShotObserver. Do not set it to nullptr; to disable the observer, use IStream::setObserverEnabled with false.

IBestShotObserver#

  • void bestShot(const tsdk::DetectionDescr& descr) called for each emerging best shot. It provides information on the best shot, including the frame number, detection coordinates, cropped still image, and other data (see the DetectionDescr structure definition below for details). The default implementation does nothing.

    • descr - best shot detection description
struct TRACK_ENGINE_API DetectionDescr {
    //! Index of the frame
    tsdk::FrameId frameIndex;

    //! Index of the track
    tsdk::TrackId trackId;

    //! Source image
    fsdk::Image image;

    //! Source custom frame
    fsdk::Ref<ICustomFrame> customFrame;

    //! Face landmarks
    fsdk::Landmarks5 landmarks;

#ifndef MOBILE_BUILD
    //! Human landmarks
    fsdk::HumanLandmarks17 humanLandmarks;

    //! NOTE: only for internal usage, don't use this field, it isn't valid ptr
    fsdk::IDescriptorPtr descriptor;
#endif

    //! Is it full detection or redetect step
    bool isFullDetect;

    //! Detections flags
    // needed to determine which detections are valid in extraDetections
    // see EDetectionFlags
    uint32_t detectionsFlags;

    //! Detection
    // always valid, even when detectionsFlags is a combination type
    // useful for the single-detector case
    // see detectionObject
    fsdk::Detection detection;

    //! Extra detections
    // needed when detectionsFlags has a combination type,
    // e.g. for EDetection_Body_Face both extraDetections[EDetection_Face] and extraDetections[EDetection_Body] are valid
    // note: for a simple detection type, the extra detection with the corresponding index is valid too
    fsdk::Detection extraDetections[EDetectionObject::EDetection_Simple_Count];

    bool hasDetectionFlag(EDetectionObject obj) {
        return (detectionsFlags & (1 << obj)) ? true : false;
    }

    void setDetectionFlag(EDetectionObject obj, bool enable) {
        if (enable) {
            detectionsFlags |= (1 << obj);
        }
        else {
            detectionsFlags &= ~(1 << obj);
        }
    }

    void setExtraDetection(EDetectionObject obj, const fsdk::Detection &detection) {
        extraDetections[obj] = detection;
    }
};
  • void trackEnd(const tsdk::TrackId& trackId) tells that the track with trackId has ended and no more best shots should be expected from it. The default implementation does nothing.

    • trackId - id of the track
  • void trackStatusUpdate(tsdk::FrameId frameId, tsdk::TrackId trackId, tsdk::TrackStatus status) tells that the track status has been updated.

    • frameId - id of the frame
    • trackId - id of the track
    • status - the track's new status
/** @brief Track status enum. (see human tracking algorithm section in docs for details)
*/
enum class TrackStatus : uint8_t {
    ACTIVE = 0,
    NONACTIVE
};
  • void trackReIdentificate(tsdk::FrameId frameId, tsdk::TrackId trackId, tsdk::TrackId reidTrackId) tells that the track with id = trackId was matched to one of the old non-active tracks with id = reidTrackId. See section ReIdentification for details.

    • frameId - id of the frame
    • trackId - id of the track that was matched to one of the old non-active tracks
    • reidTrackId - id of the non-active track that was successfully matched to the track with id = trackId
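
For illustration, a minimal per-stream observer implementation might look like the following sketch. The printf logging is a placeholder for real application logic, and the empty overrides are included only to keep the sketch self-contained.

#include <cstdio>

// Minimal IBestShotObserver sketch: logs best shots and track ends.
// Registered via IStream::setBestShotObserver (see above).
struct LoggingBestShotObserver : public tsdk::IBestShotObserver {
    void bestShot(const tsdk::DetectionDescr& descr) override {
        // 'descr.image' holds the source frame; 'descr.detection' is always valid.
        std::printf("best shot: frame %u, track %u\n",
                    static_cast<unsigned>(descr.frameIndex),
                    static_cast<unsigned>(descr.trackId));
    }

    void trackEnd(const tsdk::TrackId& trackId) override {
        // No more best shots will arrive for this track.
        std::printf("track %u ended\n", static_cast<unsigned>(trackId));
    }

    // Not needed for this example; left empty.
    void trackStatusUpdate(tsdk::FrameId, tsdk::TrackId, tsdk::TrackStatus) override {}
    void trackReIdentificate(tsdk::FrameId, tsdk::TrackId, tsdk::TrackId) override {}
};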

IVisualObserver#

  • void visual(const tsdk::FrameId &frameId, const fsdk::Image &image, const tsdk::TrackInfo * trackInfo, const int nTrack) allows visualizing the current stream state. It is intended mainly for debugging purposes. The function must be overridden.

    • frameId - current frame id
    • image - frame image
    • trackInfo - array of currently active tracks
struct TRACK_ENGINE_API TrackInfo {
    //! Face landmarks
    fsdk::Landmarks5 landmarks;

    //! Last detection for track
    fsdk::Rect rect;

    //! Id of track
    TrackId trackId;

    //! Score for last detection in track
    float lastDetectionScore;

    //! Is it detected or tracked bounding box
    bool isDetector;
};
    • nTrack - number of tracks
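
As an illustration, a visual observer could simply log the active tracks; a real debugging implementation would typically draw the rectangles over the frame image instead. This is a sketch, not a documented reference implementation.

#include <cstdio>

// Minimal IVisualObserver sketch: logs the bounding box of every active track.
struct LoggingVisualObserver : public tsdk::IVisualObserver {
    void visual(const tsdk::FrameId& frameId,
                const fsdk::Image& image,
                const tsdk::TrackInfo* trackInfo,
                const int nTrack) override {
        (void)image; // a real implementation would draw on this image
        for (int i = 0; i < nTrack; ++i) {
            const tsdk::TrackInfo& t = trackInfo[i];
            std::printf("frame %u, track %u: rect [%d, %d, %d, %d], score %.2f (%s)\n",
                        static_cast<unsigned>(frameId),
                        static_cast<unsigned>(t.trackId),
                        t.rect.x, t.rect.y, t.rect.width, t.rect.height,
                        t.lastDetectionScore,
                        t.isDetector ? "detected" : "tracked");
        }
    }
};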

IDebugObserver#

  • void debugDetection(const tsdk::DetectionDebugInfo& descr) detector debug callback. The default implementation does nothing.

    • descr - detection debugging description
struct DetectionDebugInfo {
    //! Detection description
    DetectionDescr descr;

    //! Is it detected or tracked bounding box
    bool isDetector;

    //! Filtered by user bestShotPredicate or not.
    bool isFiltered;

    //! Best detection for current moment or not
    bool isBestDetection;
};
  • void debugForegroundSubtraction(const tsdk::FrameId& frameId, const fsdk::Image& firstMask, const fsdk::Image& secondMask, fsdk::Rect * regions, int nRegions) background subtraction debug callback. The default implementation does nothing.

    • frameId - frame id of the foreground
    • firstMask - result of the background subtraction operation
    • secondMask - result of the background subtraction operation after erosion and dilation
    • regions - regions obtained after the background subtraction operation
    • nRegions - number of returned regions

BestShotPredicate#

  • bool checkBestShot(const tsdk::DetectionDescr& descr) Predicate for best shot detection. This is the place to perform any required quality checks (by means of, e.g., FaceEngine's estimators). This function must be overridden.

    • descr - detection description
    • return value - true if descr has passed the check, false otherwise
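
For example, a predicate could reject detections that are too small for reliable recognition. The sketch below is only an illustration: the minimum-size threshold is hypothetical, the detection rectangle is assumed to be accessible via fsdk::Detection::getRect(), and a production predicate would typically also run FaceEngine estimators (head pose, image quality, etc.).

// Minimal best shot predicate sketch: accepts a detection only if its
// bounding box is large enough. 'kMinFaceSize' is a hypothetical threshold.
struct MinSizeBestShotPredicate : public tsdk::IBestShotPredicate {
    bool checkBestShot(const tsdk::DetectionDescr& descr) override {
        const int kMinFaceSize = 80; // pixels, tune for your use case
        const fsdk::Rect rect = descr.detection.getRect();
        return rect.width >= kMinFaceSize && rect.height >= kMinFaceSize;
    }
};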

VisualPredicate#

  • bool needRGBImage(const tsdk::FrameId frameId, const tsdk::AdditionalFrameData *data) Predicate for the visual callback. It decides whether the original image should be output in the visual callback. This function can be overridden; the default implementation returns true.

    • frameId - id of the frame
    • data - frame additional data, passed by the user
    • return value - true if the original image (or the RGB image for a custom frame) is needed in the visual callback output, false otherwise
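
For instance, to reduce conversion overhead, a predicate could request the RGB image only for a fraction of frames. This is a sketch; the interface name tsdk::IVisualPredicate and the every-tenth-frame policy are assumptions.

// Minimal visual predicate sketch: requests the RGB image for every
// 10th frame only, skipping the conversion cost for the rest.
struct SamplingVisualPredicate : public tsdk::IVisualPredicate {
    bool needRGBImage(const tsdk::FrameId frameId,
                      const tsdk::AdditionalFrameData* data) override {
        (void)data; // unused in this sketch
        return frameId % 10 == 0; // hypothetical sampling policy
    }
};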

IBatchBestShotObserver#

  • void bestShot(const fsdk::Span<tsdk::StreamId> &streamIDs, const fsdk::Span<tsdk::BestShotCallbackData> &data) Batched version of the bestShot callback.

    • streamIDs - array of stream ids
    • data - array of callback data for each stream
struct TRACK_ENGINE_API BestShotCallbackData {
    //! detection description. see 'DetectionDescr' for details
    tsdk::DetectionDescr descr;

    //! additional frame data, passed by user in 'pushFrame'. see 'AdditionalFrameData' for details
    tsdk::AdditionalFrameData *frameData;
};
  • void trackEnd(const fsdk::Span<tsdk::StreamId> &streamIDs, const fsdk::Span<tsdk::TrackEndCallbackData> &data) Batched version of the trackEnd callback.

    • streamIDs - array of stream ids
    • data - array of callback data for each stream
/**
* @brief Track end reason. See 'TrackEndCallbackData' for details.
*/
enum class TrackEndReason : uint8_t {
    //! non-active track ends because of lifetime expired
    NONACTIVE_TIMEOUT,
    //! active track ends because of reidentification with old non-active track
    ACTIVE_REID,
    //! non-active track ends because of reidentification with an older non-active track
    //  (that means the current track couldn't be updated and was matched to an old non-active track at the same time)
    NONACTIVE_REID,
    //! all alive tracks are finished on TrackEngine stop call
    TRACKENGINE_STOP
};

struct TRACK_ENGINE_API TrackEndCallbackData {
    //! frame id
    tsdk::FrameId frameId;

    //! track id
    tsdk::TrackId trackId;

    //! parameter implies the reason for the track ending
    // NOTE: currently used only for human tracking; don't use this for other detectors
    TrackEndReason reason;
};
  • void trackStatusUpdate(const fsdk::Span<tsdk::StreamId> &streamIDs, const fsdk::Span<tsdk::TrackStatusUpdateCallbackData> &data) Batched version of the trackStatusUpdate callback.

    • streamIDs - array of stream ids
    • data - array of callback data for each stream
struct TRACK_ENGINE_API TrackStatusUpdateCallbackData {
    //! frame id
    tsdk::FrameId frameId;

    //! track id
    tsdk::TrackId trackId;

    //! new track status
    tsdk::TrackStatus status;
};
  • void trackReIdentificate(const fsdk::Span<tsdk::StreamId> &streamIDs, const fsdk::Span<tsdk::TrackReIdentificateCallbackData> &data) Batched version of the trackReIdentificate callback. See section ReIdentification for details.

    • streamIDs - array of stream ids
    • data - array of callback data for each stream
struct TRACK_ENGINE_API TrackReIdentificateCallbackData {
    //! id of frame
    tsdk::FrameId frameId;

    //! id of the track that was matched to one of the old non-active tracks
    tsdk::TrackId trackId;

    //! id of the non-active track that was successfully matched to the track with id = 'trackId'
    // see the human tracking algorithm section in docs for details
    tsdk::TrackId reidTrackId;

    //! similarity from matching of tracks descriptors
    float similarity;
};
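
A batched observer receives data for several streams in one call. The sketch below assumes that both spans have the same length, that element i of data corresponds to element i of streamIDs, and that fsdk::Span follows std::span conventions (size() and operator[]).

#include <cstdio>

// Minimal IBatchBestShotObserver sketch: iterates per-stream callback data.
// Registered via ITrackEngine::setBatchBestShotObserver (see above).
struct LoggingBatchBestShotObserver : public tsdk::IBatchBestShotObserver {
    void bestShot(const fsdk::Span<tsdk::StreamId>& streamIDs,
                  const fsdk::Span<tsdk::BestShotCallbackData>& data) override {
        for (size_t i = 0; i < streamIDs.size(); ++i) {
            const tsdk::BestShotCallbackData& item = data[i];
            std::printf("stream %u: best shot on frame %u, track %u\n",
                        static_cast<unsigned>(streamIDs[i]),
                        static_cast<unsigned>(item.descr.frameIndex),
                        static_cast<unsigned>(item.descr.trackId));
        }
    }

    // The remaining callbacks would iterate their spans the same way;
    // left empty in this sketch.
    void trackEnd(const fsdk::Span<tsdk::StreamId>&,
                  const fsdk::Span<tsdk::TrackEndCallbackData>&) override {}
    void trackStatusUpdate(const fsdk::Span<tsdk::StreamId>&,
                           const fsdk::Span<tsdk::TrackStatusUpdateCallbackData>&) override {}
    void trackReIdentificate(const fsdk::Span<tsdk::StreamId>&,
                             const fsdk::Span<tsdk::TrackReIdentificateCallbackData>&) override {}
};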

IBatchVisualObserver#

  • void visual(const fsdk::Span<tsdk::StreamId> &streamIDs, const fsdk::Span<tsdk::VisualCallbackData> &data) Batched version of the visual callback.

    • streamIDs - array of stream ids
    • data - array of callback data for each stream
struct TRACK_ENGINE_API VisualCallbackData {
    //! frame id
    tsdk::FrameId frameId;

    //! this is either the original image (if 'pushFrame' was used) or the RGB image obtained by converting the custom frame (if 'pushCustomFrame' was used)
    fsdk::Image image;

    //! tracks array raw ptr
    tsdk::TrackInfo *trackInfo;

    //! number of tracks
    int nTrack;

    //! additional frame data, passed by user in 'pushFrame'. See 'AdditionalFrameData' for details.
    tsdk::AdditionalFrameData *frameData;
};

IBatchDebugObserver#

  • void debugForegroundSubtraction(const fsdk::Span<tsdk::StreamId> &streamIDs, const fsdk::Span<tsdk::DebugForegroundSubtractionCallbackData> &data) Batched version of the debugForegroundSubtraction callback.

    • streamIDs - array of stream ids
    • data - array of callback data for each stream
  • void debugDetection(const fsdk::Span<tsdk::StreamId> &streamIDs, const fsdk::Span<tsdk::DebugDetectionCallbackData> &data) Batched version of the debugDetection callback.

    • streamIDs - array of stream ids
    • data - array of callback data for each stream
struct TRACK_ENGINE_API DebugForegroundSubtractionCallbackData {
    //! frame id
    tsdk::FrameId frameId;

    //! first mask of the foreground subtraction
    fsdk::Image firstMask;

    //! second mask of the foreground subtraction
    fsdk::Image secondMask;

    //! regions array raw ptr
    fsdk::Rect *regions;

    //! number of regions
    int nRegions;
};

/** @brief Detection data for debug callback.
    */
struct TRACK_ENGINE_API DebugDetectionCallbackData {
    //! Detection description
    DetectionDescr descr;

    //! Is it detected or tracked bounding box
    bool isDetector;

    //! Filtered by user bestShotPredicate or not.
    bool isFiltered;

    //! Best detection for current moment or not
    bool isBestDetection;
};
  • void IStream::setObserverEnabled(tsdk::StreamObserverType type, bool enabled) Enables or disables an observer.

    • type - type of the observer
    • enabled - flag to enable/disable the observer

For the full Stream API, see the IStream class in the IStream.h header file.

Human tracking algorithm#

The human tracking algorithm differs from the face tracking one. The tracker feature isn't used at all; only detect/redetect steps are used. The IOU metric is used for matching tracks with new detections; the parameter human:iou-connection-threshold sets the matching threshold. For better tracking accuracy, the ReIdentification feature is used to merge different tracks of one human (for ReIdentification details, see the next section).

In the face tracking algorithm, when detect/redetect fails, the track is updated with the tracker; in human tracking, in that case (or under some other conditions) the track moves to the non-active group of tracks. The trackStatusUpdate callback with status = TrackStatus::NONACTIVE is invoked to indicate this. Tracks from that group are invisible to all observers and don't participate in common tracking processing (detect/redetect).

Note that the parameter "skip-frames" doesn't affect the human tracking algorithm. Human tracks are finished according to their own logic. There are several cases when trackEnd is called for a human track (see the TrackEndCallbackData reason field):

  1. a non-active track is finished by the timeout set by the config parameter "human":"non-active-tracks-lifetime" (reason = NONACTIVE_TIMEOUT).
  2. an active track is finished because of reidentification with another old track from the non-active group (reason = ACTIVE_REID). Note that the old track id becomes active again: the first active track's id is simply replaced with the older one, trackStatusUpdate is called with status = TrackStatus::ACTIVE for the old track id to indicate that it's active again, and trackEnd is called for the current active track id to indicate that it no longer exists. For this case, the config parameter "human":"reid-matching-detections-number" sets the lifetime of the active track (in number of frames) needed for matching against the old non-active tracks.
  3. an active track is finished if it is about to be moved to the non-active group (e.g., detect/redetect fails) but is successfully matched (reidentification) to an old non-active track at the same time (reason = NONACTIVE_REID). In this case, the lifetime counter of the non-active track is also reset.

Some algorithm notes and parameter relations. After detect/redetect, all found detections are filtered by the following conditions:

  • Overlapped detections may be removed. The IOU metric is used for overlap estimation. If the IOU is higher than the threshold parameter other:kill-intersection-value, then neither, both, or the detection with the lower detection score is removed from further processing, depending on the parameter remove-overlapped-strategy.
  • Detections considered to be horizontal are removed. remove-horizontal-ratio sets the detection width-to-height ratio threshold used for removing horizontal detections.
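
For reference, the IOU (intersection over union) metric used both for track matching and for overlap filtering can be sketched as follows. This is a generic illustration of the metric, not TrackEngine's internal code.

#include <algorithm>

// IOU of two rectangles: intersection area divided by union area.
// 1.0 means identical boxes, 0.0 means no overlap.
float iou(const fsdk::Rect& a, const fsdk::Rect& b) {
    const int x1 = std::max(a.x, b.x);
    const int y1 = std::max(a.y, b.y);
    const int x2 = std::min(a.x + a.width, b.x + b.width);
    const int y2 = std::min(a.y + a.height, b.y + b.height);
    const int inter = std::max(0, x2 - x1) * std::max(0, y2 - y1);
    const int uni = a.width * a.height + b.width * b.height - inter;
    return uni > 0 ? static_cast<float>(inter) / uni : 0.f;
}
// A detection extends a track when iou(trackRect, detectionRect) exceeds
// human:iou-connection-threshold; two detections are considered overlapped
// when their IOU exceeds other:kill-intersection-value.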

ReIdentification#

ReIdentification is a feature that improves tracking accuracy. It is intended to solve the problem described in section Human tracking algorithm. It matches two tracks with different ids and merges them into one track with the id of the older one. The trackReIdentificate callback signals successful matching and merging of the two tracks into one. The feature's behavior is regulated by the config parameters "human":"reid-matching-threshold" and "reid-matching-detections-number". Two tracks are matched only if the similarity between them is higher than "reid-matching-threshold". If you don't want the ReIdentification feature at all, just set this parameter to a value higher than 1.

Note: the current version of TrackEngine supports the ReIdentification feature only for human tracking.

Threading#

TrackEngine is multi-threaded. The number of threads is configurable and depends on the currently bound FaceEngine settings and the type of observers being used (batched or per-stream). TrackEngine calls observer functions in separate threads. If batched observers are used, only one additional thread is created and used for all batched callbacks across all streams. If per-stream single observers are used, each stream gets its own separate callback thread used for its callback invocations; in this case, all callbacks of a stream are invoked from that one thread.

Whatever callback type is used, it is recommended to avoid long-running tasks in these functions: pushing to the callback buffer blocks the main processing thread, which always waits until there is a free slot in that buffer (the buffer's size is set by the parameter callback-buffer-size, see below). The checkBestShot and needRGBImage functions are called in the main frame processing thread, so it is also recommended to avoid expensive computations in them; ideally, these predicates should have zero performance cost.

Thread count guarantees (excluding the SDK's computation threads):

  • If batched observers are used, users have a guarantee that TrackEngine itself uses only 2-3 threads.
  • If per-stream single observers are used, users have a guarantee that TrackEngine itself uses only 1-2 threads plus one thread per created stream.

Tracker#

TrackEngine uses a tracker to update the current detections when detect/redetect fails. TrackEngine supports several trackers (see the tracker-type parameter in the config, section Settings). Some platforms don't support all trackers. vlTracker is a tracker based on neural networks. It is the only tracker that can be used for GPU/NPU processing (the other trackers, except none, don't support GPU/NPU) and for processing multiple concurrently running streams (it has a batching implementation, so it provides better CPU utilization). The KCF/opencv trackers are simple CPU trackers that should be used only when there are few tracks in total across all streams at any moment. Choosing the none tracker disables the tracking feature entirely, which leads to better performance but degraded tracking quality.