.. _benchmarks:

Benchmarks
==========

We currently use `pytest-benchmark <https://pytest-benchmark.readthedocs.io/>`_
to write tests that assess the time and resources taken by various tasks.

Results of benchmarking can be seen at https://benchmarks.leap.se/.
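
As a minimal sketch of what such a test looks like (the serialization
workload below is an illustrative stand-in, not Soledad's actual API), a
function is passed to the ``benchmark`` fixture, which calls it repeatedly
and records timing statistics:

.. code-block:: python

    import json

    def test_serialize_doc(benchmark):
        # Hypothetical workload standing in for a real Soledad operation.
        payload = {"content": "x" * 1000}

        # pytest-benchmark calls ``json.dumps`` repeatedly, records timing
        # statistics, and passes the benchmarked call's return value through.
        result = benchmark(json.dumps, payload)
        assert result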

Test repetition
---------------

`pytest-benchmark` runs tests multiple times so it can provide meaningful
statistics for the time taken by a typical run of a test function. The number
of times that a test is run can be configured manually or automatically. When
configured automatically, the number of runs is decided by taking multiple
`pytest-benchmark` configuration parameters into account. See the
`corresponding documentation
<https://pytest-benchmark.readthedocs.io/en/stable/calibration.html>`_ for more
details on how automatic calibration works.

The actual number of times a test is run depends on several parameters: the
time taken by a sample run, the configured minimum number of rounds, and the
maximum time allowed for a benchmark. For a snapshot of the number of rounds
for each test function, see `the soledad benchmarks wiki page
<https://0xacab.org/leap/soledad/wikis/benchmarks>`_.
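
When exact control over repetition is needed, automatic calibration can be
bypassed with pedantic mode. A minimal sketch (the serialization workload is
again an illustrative stand-in):

.. code-block:: python

    import json

    def test_serialize_doc_fixed_rounds(benchmark):
        payload = {"content": "x" * 1000}

        # Pedantic mode disables automatic calibration: exactly 50 rounds
        # of 10 iterations each are timed, regardless of how fast a sample
        # run turns out to be.
        benchmark.pedantic(json.dumps, args=(payload,),
                           rounds=50, iterations=10)

The same knobs are also exposed as command-line options such as
``--benchmark-min-rounds`` and ``--benchmark-max-time``.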

Sync size statistics
--------------------

Currently, the main use of Soledad is to synchronize client-encrypted email
data. Because of that, it makes sense to measure the time and resources taken
to synchronize an amount of data that is realistically comparable to a user's
mailbox.

To determine a representative dataset for synchronization tests, we used the
sizes of messages from one week of incoming and outgoing email flow of a
friendly provider. The resulting statistics are (all sizes in KB):

+--------+-----------+-----------+
|        | outgoing  | incoming  |
+========+===========+===========+
| min    | 0.675     | 0.461     |
+--------+-----------+-----------+
| max    | 25531.361 | 25571.748 |
+--------+-----------+-----------+
| mean   | 252.411   | 110.626   |
+--------+-----------+-----------+
| median | 5.320     | 14.974    |
+--------+-----------+-----------+
| mode   | 1.404     | 1.411     |
+--------+-----------+-----------+
| stddev | 1376.930  | 732.933   |
+--------+-----------+-----------+
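
These figures could be reproduced from a raw list of message sizes with
Python's standard ``statistics`` module. A minimal sketch (the sample values
below are made up for illustration, not the provider's actual data):

.. code-block:: python

    from statistics import mean, median, mode, stdev

    # Hypothetical sample of message sizes in KB.
    sizes_kb = [0.675, 1.404, 1.404, 5.320, 252.411, 25531.361]

    print("min   ", min(sizes_kb))
    print("max   ", max(sizes_kb))
    print("mean  ", round(mean(sizes_kb), 3))
    print("median", median(sizes_kb))
    print("mode  ", mode(sizes_kb))
    print("stddev", round(stdev(sizes_kb), 3))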

Test scenarios
--------------

Ideally, we would run tests on a large data set, but that may be infeasible
given time and resource limitations. Because of that, we choose a smaller data
set and assume that behaviour scales roughly linearly, to get an idea of what
to expect from larger sets.

Assuming a data set size of 10MB, some possible combinations of document count
and document size for testing download and upload are:

* 10 x 1M
* 20 x 500K
* 100 x 100K
* 200 x 50K
* 1000 x 10K

The above scenarios all use documents of the same size. If we want to account
for some variability in document sizes, it is sufficient to come up with a
simple scenario whose average, minimum and maximum sizes are roughly
consistent with the above statistics, like the following one (a sketch for
generating such payload sets follows the list):

* 60 x 15KB + 1 x 1MB
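
As a rough sketch of how such payload sets could be generated (the helper
below and the use of plain strings as document bodies are assumptions for
illustration, not Soledad's actual document format):

.. code-block:: python

    # Hypothetical helper: build a (count, size) scenario as a list of
    # payloads, each standing in for a client-encrypted document body.
    def make_payloads(count, size_bytes):
        return ["x" * size_bytes for _ in range(count)]

    # A uniform scenario from the first list: 10MB in 20 equal documents.
    uniform = make_payloads(20, 500 * 1024)

    # The mixed scenario: many small documents plus one large outlier.
    mixed = make_payloads(60, 15 * 1024) + make_payloads(1, 1024 * 1024)

    assert sum(len(p) for p in mixed) == 60 * 15 * 1024 + 1024 * 1024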