Garbage Collecting pipeline
Garbage Collecting (GC) module
According to LUNA Platform 3 logic, garbage consists of descriptors that are linked neither to a person nor to a list.
For normal system operation, garbage must be deleted from the databases regularly. To do this, run the system cleaning script remove_not_linked_descriptors.py from this folder.
According to the Backport 3 architecture, this script removes from the Faces service the faces that are not linked to any lists or to any persons in the Luna Backport 3 database.
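The selection rule above can be expressed as a set difference. This is only an illustrative sketch: `select_garbage_faces` and its inputs are hypothetical and stand in for queries against the Faces service and the Backport 3 database.

```python
def select_garbage_faces(face_ids, list_linked_ids, person_linked_ids):
    """Return ids of faces that are linked neither to a list nor to a person."""
    # Faces attached to at least one list are not garbage.
    not_in_lists = set(face_ids) - set(list_linked_ids)
    # Faces attached to a person (person_face in Backport 3) are not garbage either.
    return not_in_lists - set(person_linked_ids)


all_faces = {"f1", "f2", "f3", "f4"}
faces_in_lists = {"f1"}
faces_of_persons = {"f2"}
print(sorted(select_garbage_faces(all_faces, faces_in_lists, faces_of_persons)))
# ['f3', 'f4']
```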
Script execution pipeline
The script execution pipeline consists of several stages:
1. A temporary table is created in the Faces database. See the documentation on temporary tables for Oracle or PostgreSQL.
2. Ids of faces that are not linked to lists are obtained and stored in the temporary table.
3. While the temporary table is not empty, the following operations are performed:
   - A batch of ids is obtained from the temporary table: the first 10k (or fewer) face ids.
   - Filtered ids are obtained. Filtered ids are the ids that do not exist in the person_face table of the Backport 3 database.
   - Filtered ids are removed from the Faces database. If some of the faces cannot be removed, the script stops.
   - Filtered ids are removed from the Backport 3 database as a fool-proof check, and a warning is printed.
   - The processed ids are removed from the temporary table.
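The stages above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the actual remove_not_linked_descriptors.py: it uses two in-memory SQLite databases instead of Oracle or PostgreSQL, and the `face` and `list_face` tables as well as the `collect_garbage`, `is_linked_to_person`, and `build_demo_databases` names are hypothetical. Only the `person_face` table and the 10k batch size come from the description above. The fool-proof removal from the Backport 3 database is omitted because the description does not name the table it touches.

```python
import sqlite3

BATCH_SIZE = 10_000  # up to 10k face ids per batch, as described above


def is_linked_to_person(backport_db: sqlite3.Connection, face_id: str) -> bool:
    """True if the face id is present in the person_face table."""
    row = backport_db.execute(
        "SELECT 1 FROM person_face WHERE face_id = ? LIMIT 1", (face_id,)
    ).fetchone()
    return row is not None


def collect_garbage(faces_db: sqlite3.Connection,
                    backport_db: sqlite3.Connection,
                    batch_size: int = BATCH_SIZE) -> int:
    """Remove faces linked neither to a list nor to a person; return their count."""
    # Stage 1: temporary table in the Faces database (SQLite drops it on close).
    faces_db.execute("CREATE TEMP TABLE IF NOT EXISTS gc_candidates (face_id TEXT PRIMARY KEY)")
    # Stage 2: store ids of faces that are not linked to any list (hypothetical schema).
    faces_db.execute(
        "INSERT OR IGNORE INTO gc_candidates (face_id) SELECT face_id FROM face "
        "WHERE face_id NOT IN (SELECT face_id FROM list_face)"
    )
    removed = 0
    # Stage 3: process batches while the temporary table is not empty.
    while True:
        batch = [row[0] for row in faces_db.execute(
            "SELECT face_id FROM gc_candidates LIMIT ?", (batch_size,))]
        if not batch:
            break
        # Filtered ids: ids absent from person_face in the Backport 3 database.
        filtered = [fid for fid in batch if not is_linked_to_person(backport_db, fid)]
        if filtered:
            cur = faces_db.executemany(
                "DELETE FROM face WHERE face_id = ?", [(fid,) for fid in filtered])
            if cur.rowcount != len(filtered):
                # Stop if some of the faces could not be removed.
                faces_db.rollback()
                raise RuntimeError("some faces could not be removed; stopping")
            removed += cur.rowcount
        # Remove the processed ids from the temporary table.
        faces_db.executemany(
            "DELETE FROM gc_candidates WHERE face_id = ?", [(fid,) for fid in batch])
        faces_db.commit()
    return removed


def build_demo_databases():
    """Hypothetical toy schemas: face/list_face (Faces) and person_face (Backport 3)."""
    faces_db = sqlite3.connect(":memory:")
    faces_db.executescript("""
        CREATE TABLE face (face_id TEXT PRIMARY KEY);
        CREATE TABLE list_face (list_id TEXT, face_id TEXT);
        INSERT INTO face VALUES ('f1'), ('f2'), ('f3'), ('f4');
        INSERT INTO list_face VALUES ('l1', 'f1');
    """)
    backport_db = sqlite3.connect(":memory:")
    backport_db.executescript("""
        CREATE TABLE person_face (person_id TEXT, face_id TEXT);
        INSERT INTO person_face VALUES ('p1', 'f2');
    """)
    return faces_db, backport_db


faces_db, backport_db = build_demo_databases()
print(collect_garbage(faces_db, backport_db))  # 2: f3 and f4 are removed
```

Processing the candidates in fixed-size batches keeps each round of lookups and deletions bounded, and committing after every batch means an interrupted run keeps the batches already processed.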
Script launching
Python 3.9 is required to launch the script.
Activate the virtual environment of the Backport 3 service before launching the script. For details on activating the virtual environment and installing the requirements, see the install.html document.
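Both launch requirements can be checked from Python before running the script. This sketch is an assumption, not part of the Backport 3 distribution: `check_launch_environment` is a hypothetical helper that treats 3.9 as an exact major.minor match. It detects a virtual environment created with the standard venv module by comparing `sys.prefix` with `sys.base_prefix`, and it does not verify that the requirements are installed.

```python
import platform
import sys

REQUIRED_VERSION = (3, 9)  # the Python version this document requires


def check_launch_environment() -> list:
    """Hypothetical pre-launch check; returns a list of detected problems."""
    problems = []
    if sys.version_info[:2] != REQUIRED_VERSION:
        problems.append(
            f"Python {REQUIRED_VERSION[0]}.{REQUIRED_VERSION[1]} is required, "
            f"found {platform.python_version()}"
        )
    # Inside a venv, sys.prefix points to the environment and differs from sys.base_prefix.
    if sys.prefix == sys.base_prefix:
        problems.append("no virtual environment appears to be active")
    return problems


for problem in check_launch_environment():
    print("warning:", problem)
```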
Set up the actual configurations for the Faces database and the Backport 3 database in the ./config.conf file.
Launch the script:
python remove_not_linked_descriptors.py
The output includes the number of removed faces and the number of persons with faces.