Useful tools#

Performance testing#

Performance testing is crucial for ensuring the reliability, accuracy, and efficiency of software systems. It helps in optimizing resource usage, reducing latency, and providing consistent results across different environments. Below are key concepts, metrics, parameters, and practical recommendations for conducting effective performance tests.

Key concepts in performance testing#

  • Warm-up Phase
    Initial iterations often include delays due to memory allocation, lazy data initialization, thread creation, and caching. These effects diminish after a few iterations. Warm-up iterations are excluded from final results to ensure accuracy.
  • Noise Compensation
    Noise in performance tests arises from factors like OS multitasking, resource contention, and memory management. To mitigate noise:
    • Increase the number of iterations to average out high-frequency noise.
    • Use statistical methods such as averaging or filtering to stabilize results.
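As a rough illustration of both ideas, the sketch below times a workload over many iterations, discards a fixed number of warm-up iterations, and averages the rest. The workload, the warm-up length, and the iteration count are placeholders, not values used by the actual tool.

```python
import time

def measure(workload, iters=200, warmup=20):
    """Time a callable over many iterations, excluding warm-up from the result.

    The warm-up length and iteration count are illustrative; real tests pick
    them based on convergence behaviour rather than fixed constants.
    """
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    usable = samples[warmup:]          # warm-up iterations are excluded from final results
    return sum(usable) / len(usable)   # averaging damps high-frequency noise

# Example: a placeholder workload that just burns a little CPU time.
print(measure(lambda: sum(i * i for i in range(100_000))))
```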

Metrics for performance analysis#

Common metrics#

| Metric | Description |
| --- | --- |
| min | The smallest measured time across all iterations. It is bounded from below by hardware limitations, is less sensitive to anomalies than max, avg, or median, and does not reflect worst-case scenarios. |
| max | The largest measured time across all iterations. It reflects extreme cases (for example, system delays), is highly variable between runs, has no upper boundary, and is sensitive to OS delays. |
| avg | The arithmetic mean of all measured times. It is simple to calculate but sensitive to outliers; a single large value can significantly increase the average. |
| median | The middle value in the sorted list of measured times. It is more robust than avg but less reliable than min, and it resists moderate anomalies. It can shift upward if multiple anomalies fall into the upper half of the sorted list. |
| mode | The most frequently occurring value in the measured times and the most reliable metric for analysis: it is unaffected by rare anomalies and works well with asymmetric distributions. It requires careful histogram construction to avoid instability. |

Practical use#

  • Use min for determining convergence because it approaches a hardware-determined lower bound as iterations increase.
  • Combine metrics for comprehensive analysis (for example, min for stability, max for outliers).
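To make the metrics concrete, here is a small sketch that computes them for a list of per-iteration times. The bin width used for the mode estimate is an arbitrary choice, which is exactly the "careful histogram construction" caveat mentioned above.

```python
import statistics
from collections import Counter

def metrics(times, bin_width=0.1):
    """Compute the metrics from the table above for per-iteration times (in ms)."""
    # Mode is estimated from a histogram: times are bucketed into bins of
    # bin_width, and the most populated bin wins. The bin width is arbitrary.
    bins = Counter(round(t / bin_width) for t in times)
    mode_bin, _ = bins.most_common(1)[0]
    return {
        "min": min(times),
        "max": max(times),
        "avg": statistics.fmean(times),
        "median": statistics.median(times),
        "mode": mode_bin * bin_width,
    }

print(metrics([12.4, 12.1, 12.0, 12.1, 12.2, 15.7, 12.1]))
```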

Performance test parameters#

Below are additional command line parameters that allow you to customize performance test operation.

Test-specific parameters#

| Parameter | Description |
| --- | --- |
| -t, --test | Specifies the type of test being performed (mandatory named parameter). |
| -i, --image | Specifies the input image for tests (named parameter). |
| -o, --out | Specifies the output CSV file for final statistics (mandatory named parameter). |
| --raw-out | Specifies the CSV file for recording operational statistics after each iteration. Includes all iterations, even those during warm-up. |

Batch and sensor parameters#

| Parameter | Description |
| --- | --- |
| -b, --batch | Sets the batch size (named parameter). |
| -s, --sensor | Sets the sensor type, for example, for the EyesBatch test. |
| -y, --yuv | Sets the YUV image for YUV12toRGB and YUV21toRGB tests. |

iOS-specific parameters#

| Parameter | Description |
| --- | --- |
| --data | Path to the data directory (used only in non-standard iOS mode). |
| --threads | Number of threads used for testing (used only in non-standard iOS mode). |
| --descriptor-model | Specifies the descriptor model used in tests. |
| --detector-type | Specifies the detector type used in tests. |

Stopping condition parameters#

| Parameter | Description |
| --- | --- |
| --max-rel-height | Threshold for the relative height of the last step. If exceeded, stopping conditions are not met. |
| --min-step-width | Minimum width of the last step. If the step is narrower, stopping conditions are not met. |
| --max-rel-slope | Threshold for the relative slope of the last step. Combines the effects of --max-rel-height and --min-step-width. |
| --min-steps | Minimum number of steps required before stopping conditions can be evaluated. |
| --min-iters | Minimum number of iterations required before stopping. |
| --max-iters | Maximum number of iterations allowed (emergency stop condition). |
| --max-time | Maximum total execution time allowed (emergency stop condition). |
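To make the parameters above concrete, here is a minimal sketch of launching the tool from Python. The executable name, the test name, the file paths, and the assumption that --max-time is given in seconds are hypothetical; only the flags themselves come from the tables above.

```python
import subprocess

# Hypothetical executable, test name, and paths; the flags mirror the tables above.
cmd = [
    "./perf_test",                          # placeholder binary name (assumption)
    "-t", "FaceDetect",                     # mandatory: test type (hypothetical test name)
    "-i", "data/sample.png",                # input image (hypothetical path)
    "-o", "results/final_stats.csv",        # mandatory: final statistics CSV
    "--raw-out", "results/raw_stats.csv",   # per-iteration statistics, including warm-up
    "-b", "8",                              # batch size
    "--min-iters", "100",                   # do not stop before 100 iterations
    "--max-iters", "10000",                 # emergency stop on iteration count
    "--max-time", "3600",                   # emergency stop on total run time (assumed seconds)
]

# Run the test and fail loudly if it returns a non-zero exit code.
subprocess.run(cmd, check=True)
```

In practice you would substitute the real binary name, test identifier, and paths from your own build and data layout.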

Recommendations for parameter selection#

  • Start with default parameters.
    Avoid overriding these settings unless necessary, as doing so may unnecessarily extend the execution time of most tests.
  • Optimize runtime.
    If the test runtime is excessively long, consider relaxing the thresholds for the following parameters:

    • --min-step-width
    • --max-rel-height
    • --max-rel-slope
    • --min-steps
    • --min-iters

    Adjusting these parameters will cause the convergence-based stopping conditions to trigger more quickly, thereby reducing the overall test duration. However, this approach may compromise the reliability and stability of the results.
  • Balance runtime and result quality.
    Striking a balance between runtime efficiency and result quality is one of the key trade-offs when configuring a performance test. While loosening thresholds can expedite the test, it is essential to ensure that the resulting data remains sufficiently accurate and stable for meaningful analysis.

Stopping conditions#

Normal stopping conditions#

  • Convergence analysis:

    The test analyzes the convergence of min values over iterations.

    Key metrics:

    • step_height - Absolute change in min between two consecutive iterations.
    • rel_step_height - Relative change in min as a percentage of the previous value.
    • step_width - Number of consecutive iterations where min does not improve.
    • rel_slope - Rate of change in min per iteration.

    Conditions:

    • If rel_step_height is below a threshold (--max-rel-height), convergence is assumed.
    • If step_width exceeds a threshold (--min-step-width), it indicates that further improvements would require too many additional iterations.
    • If rel_slope is below a threshold (--max-rel-slope), it confirms slow changes in min.
  • Minimum steps/iterations:
    • At least --min-steps steps must be generated to ensure stability.
    • At least --min-iters iterations must be completed to ensure sufficient data collection.
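As an illustration of how these conditions might be evaluated, here is a rough Python sketch. It derives steps from the running min values; the variable names mirror the metrics above, but the exact formulas and the threshold defaults are assumptions, not the tool's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Step:
    width: int         # iterations since the previous improvement of min
    rel_height: float  # improvement of min as a percentage of the previous min
    rel_slope: float   # rel_height spread over the step width

def build_steps(times):
    """Derive convergence steps from per-iteration times (a simplified model)."""
    steps = []
    current_min = times[0]
    width = 0
    for t in times[1:]:
        width += 1
        if t < current_min:  # min improved: close the current step
            height = current_min - t
            rel_height = 100.0 * height / current_min
            steps.append(Step(width, rel_height, rel_height / width))
            current_min = t
            width = 0
    return steps

def converged(steps, iterations,
              max_rel_height=1.0, min_step_width=20,
              max_rel_slope=0.05, min_steps=5, min_iters=100):
    """Check the normal stopping conditions (threshold values here are made up)."""
    if len(steps) < min_steps or iterations < min_iters:
        return False
    last = steps[-1]
    return (last.rel_height <= max_rel_height
            and last.width >= min_step_width
            and last.rel_slope <= max_rel_slope)
```

A caller would typically append the newest measurement after each iteration, rebuild the steps, and stop once converged(...) returns True or one of the emergency limits described below is hit.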

Emergency stopping conditions#

  • Exceeding iteration limit:
    If the number of iterations reaches --max-iters, the test stops regardless of convergence.
  • Exceeding time limit:
    If the total execution time exceeds --max-time, the test stops regardless of convergence.
  • Insufficient warm-up:
    If an emergency stop occurs before the warm-up phase completes, results may be unreliable because the initial delays have not yet stabilized.
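A complementary sketch of the emergency limits, again with hypothetical threshold values and warm-up length; only the parameter names come from the tables above.

```python
import time

def emergency_stop(iterations, start_time, warmup_iters,
                   max_iters=10_000, max_time=3600.0):
    """Return a reason string if an emergency limit is hit, otherwise None.

    max_iters / max_time mirror --max-iters / --max-time; the values and the
    assumption that --max-time is measured in seconds are illustrative only.
    """
    elapsed = time.monotonic() - start_time
    if iterations >= max_iters:
        reason = "emergency stop: --max-iters reached"
    elif elapsed >= max_time:
        reason = "emergency stop: --max-time exceeded"
    else:
        return None
    if iterations < warmup_iters:
        # The warm-up phase did not finish, so the collected data may be unreliable.
        reason += " (warm-up incomplete, results may be unreliable)"
    return reason
```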

Configuration of emergency stop conditions#

The --max-iters and --max-time parameters are designed to trigger an emergency stop of the test. These safeguards prevent the performance test from running indefinitely in cases where convergence issues arise.

A normal stop should occur before these emergency thresholds are reached. Ideally, the test will meet its convergence criteria and terminate well before approaching the emergency limits. To ensure this, we recommend that you set --max-iters and --max-time with a generous margin, so they significantly exceed the expected duration for a successful, routine stop.

By doing so, you can avoid premature terminations due to overly restrictive settings and allow the test sufficient time to achieve stable results under normal conditions.

Special cases#

  • Local minima:
    If small steps are formed early in the test, the --min-steps parameter ensures enough steps are generated to confirm global convergence.
  • Last iteration uncertainty:
    For the final iteration, future behavior is unknown, so no step parameters are defined.

Example console report#

During a performance test execution, the console displays operational statistics that help you track the current test results. Operational statistics show all iterations, including warm-up iterations. Here is how it is organized:

Performance test console report

By analyzing the console report, you can assess the stability of results, identify potential issues, and ensure the test converges correctly before relying on the final output.

Structure of the first table#

Each row in the table represents one "step" or "staircase" of the min value over time.

Between steps, there may be idle iterations that do not improve the min value; these are marked with dots (.), one for each iteration.

Column contents#

  • First column: Step width (number of consecutive iterations). The sum of all step widths equals the total number of iterations, including the warm-up phase.
  • Second column: Relative height of the step.
  • Third column: Relative slope of the step (percentage change per iteration).

Additional metrics#

For every generated step, three metrics are displayed:

| Metric | Description |
| --- | --- |
| min | Current minimum time after the current iteration. |
| max | Maximum time across all completed iterations. |
| avg | Average time across all completed iterations. |
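For intuition, here is a toy sketch of how such a report row could be assembled from the Step model sketched in the stopping-conditions section together with the metrics above. The column layout and formatting of the real console output may differ.

```python
def print_report(times, steps):
    """Print a rough imitation of the first console table (format is approximate)."""
    seen = 0
    for step in steps:
        if step.width > 1:
            # One dot per idle iteration inside the step (no improvement of min).
            print("." * (step.width - 1))
        seen += step.width
        done = times[: seen + 1]  # iterations completed so far, including iteration 0
        print(f"{step.width:5d}  {step.rel_height:6.2f}%  {step.rel_slope:6.3f}%/iter"
              f"   min={min(done):.3f}  max={max(done):.3f}  avg={sum(done)/len(done):.3f}")
```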

Zero and last iterations#

  • Zero iteration: Parameters for the first step are undefined because no prior min values exist.
  • Last iteration: Parameters for the final step are also undefined since it is unknown whether further iterations would have reduced min.

Color coding#

Changes in relative height or slope compared to the previous step are color-coded:

  • Green indicates an increase in relative height or slope.
  • Red indicates a decrease in relative height or slope.

Reasons for stopping#

After the table, the console displays the reason for stopping the test:

  • Normal stop: Conditions for convergence were met.
  • Emergency stop: Exceeded --max-time or --max-iters.

Operational vs. final statistics#

| Operational statistics | Final statistics |
| --- | --- |
| Include all iterations, even those during the warm-up phase. | Exclude warm-up iterations, focus on post-warm-up data, and add the calculation of mode (most frequent value) for better accuracy. |
| Values like max and avg tend to be higher than in final statistics due to the inclusion of warm-up data. | Warm-up iterations account for initial delays (d(t)), which skew early results but are excluded from final reports. |
| Operational max includes warm-up delays, so it appears higher. | Final max excludes warm-up, providing a more accurate representation of steady-state performance. |
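The distinction above can be reproduced from the per-iteration CSV written via --raw-out. The sketch below assumes a hypothetical column named time_ms and a known warm-up iteration count; the real file's column names and layout may differ.

```python
import csv
import statistics

def summarize(values):
    """min / max / avg over a list of per-iteration times."""
    return {"min": min(values), "max": max(values), "avg": statistics.fmean(values)}

def operational_vs_final(raw_csv_path, warmup_iters):
    """Contrast operational statistics (all iterations) with final ones (post-warm-up).

    Assumes the --raw-out CSV has one row per iteration with a 'time_ms'
    column; the column name and the warm-up length are hypothetical.
    """
    with open(raw_csv_path, newline="") as f:
        times = [float(row["time_ms"]) for row in csv.DictReader(f)]
    return {
        "operational": summarize(times),           # includes warm-up iterations
        "final": summarize(times[warmup_iters:]),  # warm-up excluded
    }
```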

Performance test challenges#

Measurement range limitations#

Performance tests are unsuitable for measuring time intervals in the range of a few nanoseconds to several hundred microseconds. For such cases, use microbenchmark frameworks. However, performance tests excel at measuring time in the required range — from milliseconds and above.

High-frequency noise#

Performance tests effectively filter out high-frequency noise, such as random delays with periods much shorter than the total execution time of the test across all iterations of one type.

Low-frequency noise#

Performance tests cannot efficiently handle low-frequency noise if its characteristic duration is comparable to or exceeds the test execution time. For example:

  • Delays during the warm-up phase (d(t)), which are predictable and easily compensated.
  • Service processes (updates, defragmentation, backups) running for several hours. If the test runtime overlaps with these processes, results will be distorted.

Low-frequency noise affects all types of measurements. Collecting long-term statistics to detect and filter such noise is often impractical or too resource-intensive, so minimizing its impact relies on user intervention, such as proper server configuration.
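A tiny synthetic simulation of the difference: high-frequency noise (independent per-iteration jitter) barely moves min after enough iterations, while a low-frequency disturbance that overlaps the whole run shifts every sample, including min. All numbers are made up.

```python
import random

random.seed(0)
BASE = 10.0  # "true" time of the measured operation, in arbitrary units

# High-frequency noise: independent random jitter added to each iteration.
high_freq = [BASE + random.expovariate(1.0) * 0.5 for _ in range(1000)]

# Low-frequency noise: a background service overlaps the whole run, so every
# iteration (including the fastest ones) is slowed down and min is distorted too.
low_freq = [BASE + random.expovariate(1.0) * 0.5 + 2.0 for _ in range(1000)]

print(f"high-frequency noise: min={min(high_freq):.2f}")  # close to BASE
print(f"low-frequency noise:  min={min(low_freq):.2f}")   # shifted by the background load
```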

Test execution duration#

The primary challenge for performance tests is the significant amount of time required to gather a sufficient sample of data.

Artificial constraints efficiency#

Artificial constraints via --max-iters or --max-time reduce the test's effectiveness by limiting the dataset size, potentially compromising reliability.

Launch recommendations#

  • Run tests overnight when system load is minimal for optimal results.
  • Daytime runs can be conducted with reduced execution time for quick analysis but should be treated as preliminary, as they may lack accuracy.

Potential improvements#

These improvements aim to streamline the testing process, provide deeper insights, and reduce manual intervention, ultimately resulting in more efficient and accurate performance evaluations:

  • Automatic chart generation
    Utilize libraries like Plotly to create visually appealing and interactive web-based charts. This enhances clarity, simplifies analysis, and improves usability.
  • Continuous function approximation
    Instead of discrete histograms, approximate measurement distribution using continuous functions. This eliminates issues related to bin size and count, improving accuracy.
  • Enhanced warm-up logic
    Dynamically calculate the required number of warm-up iterations based on specific test needs, improving both accuracy and efficiency.
  • Automated result comparison
    Implement an automated system to compare current results with previous runs, generating reports on performance improvements or regressions. Visualizing changes through graphs and detecting abnormal performance drops would enhance responsiveness to issues.
  • Advancements in convergence analysis
    Refine algorithms for detecting stabilized metrics, incorporate advanced statistical methods to handle noise and outliers, and improve heuristics for identifying global versus local minima during the test.

Practical recommendations#

  • Always include a warm-up phase to eliminate initialization delays from results.
  • Use a sufficient number of iterations to reduce noise and achieve stable metrics.
  • Focus on the min metric for determining convergence due to its stability and predictable behavior.
  • Visualize results using tools like CSV exports and graphs for better interpretation of trends and anomalies.
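For the last point, here is a minimal sketch of plotting the running min from the per-iteration CSV with matplotlib. It assumes the same hypothetical time_ms column and file path as in the earlier sketches and that matplotlib is installed.

```python
import csv
import itertools
import matplotlib.pyplot as plt

# Read per-iteration times from the --raw-out CSV (path and column name are hypothetical).
with open("results/raw_stats.csv", newline="") as f:
    times = [float(row["time_ms"]) for row in csv.DictReader(f)]

running_min = list(itertools.accumulate(times, min))  # convergence of min over iterations

plt.plot(times, label="per-iteration time")
plt.plot(running_min, label="running min")
plt.xlabel("iteration")
plt.ylabel("time, ms")
plt.legend()
plt.show()
```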