.. _benchmarks:

Benchmarks
==========

We currently use `pytest-benchmark <https://pytest-benchmark.readthedocs.io/>`_
to write tests that assess the time and resources taken by various tasks.

To run the benchmark tests, once inside a cloned Soledad repository, do the
following::

    tox -e benchmark

Results of automated benchmarking for each commit in the repository can be seen
at https://benchmarks.leap.se/.

Benchmark tests also depend on `tox` and `CouchDB`. See the :ref:`tests` page
for more information on how to set up the test environment.

Test repetition
---------------

``pytest-benchmark`` runs each test multiple times so it can provide meaningful
statistics for the time taken by a typical run of a test function. The number
of times a test is run can be configured manually or automatically.

When configured automatically, the number of runs is decided by taking into
account multiple ``pytest-benchmark`` configuration parameters. See the
`corresponding documentation
<https://pytest-benchmark.readthedocs.io/en/stable/calibration.html>`_ for more
details on how automatic calibration works.

To balance a reasonable number of repetitions against a reasonable total
running time, we let ``pytest-benchmark`` choose the number of repetitions for
faster tests and manually limit the number of repetitions for slower ones.

Currently, the tests for `synchronization` and `sqlcipher asynchronous document
creation` are fixed to run 4 times each, while ``pytest-benchmark`` decides how
many times to run each of the other tests. With this setup, the benchmark suite
takes approximately 7 minutes to run on our CI server. As the suite is run
twice (once for time and CPU stats and a second time for memory stats), the
whole benchmark run takes around 15 minutes.

The actual number of times a test is run when calibration is done automatically
by ``pytest-benchmark`` depends on many parameters: the time taken by a sample
run and the configured minimum number of rounds and maximum amount of time
allowed for a benchmark. For a snapshot of the number of rounds for each test
function, see `the Soledad benchmarks wiki page
<https://0xacab.org/leap/soledad/wikis/benchmarks>`_.
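For illustration, the two approaches look like this with the
``pytest-benchmark`` fixture API (a minimal sketch: ``create_doc`` and ``sync``
are hypothetical stand-ins for the real tasks, not the actual Soledad tests)::

    import time

    def create_doc():
        time.sleep(0.001)  # hypothetical stand-in for a fast task

    def sync():
        time.sleep(0.1)    # hypothetical stand-in for a slow task

    def test_create_doc(benchmark):
        # Fast test: let pytest-benchmark calibrate the number of runs.
        benchmark(create_doc)

    def test_sync(benchmark):
        # Slow test: pin the benchmark to exactly 4 rounds of 1 iteration.
        benchmark.pedantic(sync, rounds=4, iterations=1)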
Sync size statistics
--------------------

Currently, the main use of Soledad is to synchronize client-encrypted email
data. Because of that, it makes sense to measure the time and resources taken
to synchronize an amount of data that is realistically comparable to a user's
mailbox.

To determine a good example dataset for synchronization tests, we used the
sizes of the messages in one week of incoming and outgoing email flow of a
friendly provider. The resulting statistics are (all sizes in KB):

+--------+-----------+-----------+
|        | outgoing  | incoming  |
+========+===========+===========+
| min    | 0.675     | 0.461     |
+--------+-----------+-----------+
| max    | 25531.361 | 25571.748 |
+--------+-----------+-----------+
| mean   | 252.411   | 110.626   |
+--------+-----------+-----------+
| median | 5.320     | 14.974    |
+--------+-----------+-----------+
| mode   | 1.404     | 1.411     |
+--------+-----------+-----------+
| stddev | 1376.930  | 732.933   |
+--------+-----------+-----------+

Sync test scenarios
-------------------

Ideally, we would run tests on a big dataset (i.e. a high number of documents
and a big payload size), but that may be infeasible given time and resource
limitations. Because of that, we chose smaller datasets and assume that the
behaviour is roughly linear in order to extrapolate to larger ones.

Assuming a total dataset size of 10MB, some possibilities for the number of
documents and document sizes for testing download and upload are listed below
(a sketch of how these scenarios could be parametrized appears at the end of
this section). Scenarios marked in bold are the ones that are actually run in
the current sync benchmark tests, and you can see the current graphs for each
one by following the corresponding links:

* 10 x 1M
* **20 x 500K** (`upload <https://benchmarks.leap.se/test-dashboard_test_upload_20_500k.html>`_, `download <https://benchmarks.leap.se/test-dashboard_test_download_20_500k.html>`_)
* **100 x 100K** (`upload <https://benchmarks.leap.se/test-dashboard_test_upload_100_100k.html>`_, `download <https://benchmarks.leap.se/test-dashboard_test_download_100_100k.html>`_)
* 200 x 50K
* **1000 x 10K** (`upload <https://benchmarks.leap.se/test-dashboard_test_upload_1000_10k.html>`_, `download <https://benchmarks.leap.se/test-dashboard_test_download_1000_10k.html>`_)

In each of the above scenarios, all documents have the same size. If we want to
account for some variability in document sizes, it is sufficient to come up
with a simple scenario whose average, minimum and maximum sizes are roughly
consistent with the statistics above, like the following one:

* 60 x 15KB + 1 x 1MB
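As a rough sketch of how such scenarios could be parametrized (the
``SCENARIOS`` table, the ``upload`` stand-in and the test name are hypothetical
illustrations, not the actual Soledad suite)::

    import pytest

    def upload(payloads):
        # Hypothetical stand-in for the sync operation being measured.
        for _ in payloads:
            pass

    # (number of documents, document size in bytes); each totals ~10MB.
    SCENARIOS = [
        (20, 500 * 1000),
        (100, 100 * 1000),
        (1000, 10 * 1000),
    ]

    @pytest.mark.parametrize('docs,size', SCENARIOS)
    def test_upload(benchmark, docs, size):
        payloads = [b'x' * size for _ in range(docs)]
        benchmark(upload, payloads)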