Garbage Collecting pipeline

Garbage Collecting (GC) module

In LUNA Platform 3 terms, garbage is any descriptor that is linked neither to a person nor to a list.

For normal system operation, garbage must be deleted from the databases regularly. To do this, run the system cleaning script remove_not_linked_descriptors.py from this folder.

In the Backport 3 architecture, this script removes from the Faces service those faces that are not linked to any list and not linked to any person in the Luna Backport 3 database.

Script execution pipeline

The script execution pipeline consists of several stages:

  1. A temporary table is created in the Faces database. For details, see the temporary table documentation for Oracle or PostgreSQL.

  2. Ids of faces that are not linked to lists are obtained. The ids are stored in the temporary table.

  3. While the temporary table is not empty, the following operations are performed:

    • A batch of ids is read from the temporary table: the first 10k (or fewer) face ids.

    • Filtered ids are obtained. Filtered ids are ids that do not exist in the person_face table of the Backport 3 database.

    • Filtered ids are removed from the Faces database. If some of the faces cannot be removed, the script stops.

    • As a foolproofing check, filtered ids are also removed from the Backport 3 database. A warning is printed.

    • Ids are removed from the temporary table.
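
The stages above can be sketched as a short batch loop. This is only an illustration under stated assumptions, not the code of remove_not_linked_descriptors.py: it uses two in-memory SQLite databases in place of the real Oracle or PostgreSQL ones, and the table names face, list_face and tmp_unlinked, the face_id column and the helper names are hypothetical. Only person_face comes from the description above. The foolproofing step is assumed to target person_face, which the description does not state.

```python
import sqlite3
import warnings

BATCH_SIZE = 10_000  # "first 10k (or fewer)" face ids per iteration


def delete_ids(db, table, ids):
    """Delete rows by face_id and return how many rows were actually deleted."""
    before = db.total_changes
    db.executemany(f"DELETE FROM {table} WHERE face_id = ?", [(i,) for i in ids])
    return db.total_changes - before


def remove_not_linked(faces_db, backport_db, batch_size=BATCH_SIZE):
    """Remove faces linked to neither a list nor a person; return the count."""
    # Stage 1: temporary table in the Faces database (hypothetical name).
    faces_db.execute("CREATE TEMP TABLE tmp_unlinked (face_id TEXT PRIMARY KEY)")
    # Stage 2: store ids of faces that are not linked to any list.
    faces_db.execute(
        "INSERT INTO tmp_unlinked SELECT face_id FROM face "
        "WHERE face_id NOT IN (SELECT face_id FROM list_face)"
    )
    removed = 0
    while True:  # Stage 3: repeat until the temporary table is empty
        batch = [row[0] for row in faces_db.execute(
            "SELECT face_id FROM tmp_unlinked LIMIT ?", (batch_size,))]
        if not batch:
            break
        # Filtered ids: ids that do not exist in person_face (Backport 3).
        filtered = [
            fid for fid in batch
            if backport_db.execute(
                "SELECT 1 FROM person_face WHERE face_id = ?", (fid,)
            ).fetchone() is None
        ]
        # Remove filtered ids from the Faces database; stop on any failure.
        deleted = delete_ids(faces_db, "face", filtered)
        if deleted != len(filtered):
            raise RuntimeError("some faces could not be removed; stopping")
        # Foolproofing step (assumption: it targets person_face).
        stale = delete_ids(backport_db, "person_face", filtered)
        if stale:
            warnings.warn(f"{stale} unexpected person_face rows removed")
        # Remove the whole processed batch from the temporary table.
        delete_ids(faces_db, "tmp_unlinked", batch)
        removed += deleted
    faces_db.commit()
    backport_db.commit()
    return removed


# Demo data: face 'a' is in a list, face 'b' belongs to a person,
# faces 'c' and 'd' are garbage.
faces = sqlite3.connect(":memory:")
backport = sqlite3.connect(":memory:")
faces.executescript(
    "CREATE TABLE face (face_id TEXT PRIMARY KEY);"
    "CREATE TABLE list_face (list_id TEXT, face_id TEXT);"
    "INSERT INTO face VALUES ('a'), ('b'), ('c'), ('d');"
    "INSERT INTO list_face VALUES ('list-1', 'a');"
)
backport.executescript(
    "CREATE TABLE person_face (person_id TEXT, face_id TEXT);"
    "INSERT INTO person_face VALUES ('person-1', 'b');"
)
removed = remove_not_linked(faces, backport, batch_size=2)
print(removed)  # 2
```

The small batch size in the demo only forces more than one loop iteration. In the sketch, every processed id leaves the temporary table whether or not its face was deleted, which is what guarantees that the loop terminates.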

Script launching

Python 3.9 is required to launch the script.

Activate the virtual environment for the Backport 3 service before launching the script. Virtual environment activation and requirements installation are described in the install.html document.

  • Set the current connection settings for the Faces database and the Backport 3 database in the ./config.conf file.

  • Launch the script:

    python remove_not_linked_descriptors.py

The output will include information about the number of removed faces and the number of persons with faces.