[doc] add doc on benchmark sync tests sizes

Closes: #8884
author: drebs <drebs@leap.se> 2017-07-05 17:26:31 -0300
committer: Kali Kaneko <kali@leap.se> 2017-07-07 21:28:07 +0200
commit: 0c5e91c1fa5644e4274e7763ab26138f2cdfb225 (patch)
tree: a56ffeb284f6571780b7d6f2fa707dbc16ab1210 /docs/benchmarks.rst
parent: 7502bf9928780648b8c1f79ab24eb53e3f1d620d (diff)
1 files changed, 60 insertions, 0 deletions
diff --git a/docs/benchmarks.rst b/docs/benchmarks.rst
new file mode 100644
index 00000000..0439f7d4
--- /dev/null
+++ b/docs/benchmarks.rst
@@ -0,0 +1,60 @@
+Benchmarks
+==========
+
+Soledad has a set of benchmark tests to assess the time and resources taken by
+various tasks.
+
+Benchmark results can be seen at https://benchmarks.leap.se/.
+
+Sync size statistics
+--------------------
+
+Currently, the main use of Soledad is to synchronize client-encrypted email
+data. Because of that, it makes sense to measure the time and resources taken
+to synchronize an amount of data that is realistically comparable to a user's
+email box.
+
+To determine a realistic data set for synchronization tests, we measured the
+message sizes in one week of incoming and outgoing email traffic at a
+friendly provider. The resulting statistics are (all sizes are in KB):
+
++--------+-----------+-----------+
+|        | outgoing  | incoming  |
++========+===========+===========+
+| min    | 0.675     | 0.461     |
++--------+-----------+-----------+
+| max    | 25531.361 | 25571.748 |
++--------+-----------+-----------+
+| mean   | 252.411   | 110.626   |
++--------+-----------+-----------+
+| median | 5.320     | 14.974    |
++--------+-----------+-----------+
+| mode   | 1.404     | 1.411     |
++--------+-----------+-----------+
+| stddev | 1376.930  | 732.933   |
++--------+-----------+-----------+
+
+Test scenarios
+--------------
+
+Ideally, we would want to run tests for a big data set, but that may be
+infeasible given time and resource limitations. Because of that, we choose a
+smaller data set and assume that the behaviour scales roughly linearly, so
+that the results give an idea of what to expect for larger sets.
+
+Assuming a data set size of 10MB, some possible combinations of number of
+documents and document size for testing download and upload are:
+
+* 10 x 1MB
+* 20 x 500KB
+* 100 x 100KB
+* 200 x 50KB
+* 1000 x 10KB
+
+The above scenarios all have documents of the same size. If we want to account
+for some variability in document sizes, it is sufficient to come up with a
+simple scenario where the average, minimum and maximum sizes are roughly
+consistent with the statistics above, like the following one:
+
+* 60 x 15KB + 1 x 1MB
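
The arithmetic behind the scenarios above can be checked with a short Python sketch. It is not part of this commit or of Soledad's benchmark suite: the names `UNIFORM_SCENARIOS`, `MIXED_SCENARIO`, `expand` and `summarize` are hypothetical, and the sketch assumes decimal units (1 MB = 1000 KB), which the document does not state.

```python
from statistics import mean, median

# Assumption: decimal units, 1 MB = 1000 KB (the document does not say which).
KB_PER_MB = 1000

# Each scenario is a list of (number of documents, size of each document in KB).
UNIFORM_SCENARIOS = [
    [(10, 1 * KB_PER_MB)],
    [(20, 500)],
    [(100, 100)],
    [(200, 50)],
    [(1000, 10)],
]
MIXED_SCENARIO = [(60, 15), (1, 1 * KB_PER_MB)]


def expand(scenario):
    """Return the flat list of document sizes (in KB) for a scenario."""
    return [size for count, size in scenario for _ in range(count)]


def summarize(scenario):
    """Compute the size statistics of a scenario, mirroring the table rows."""
    sizes = expand(scenario)
    return {
        "count": len(sizes),
        "total_kb": sum(sizes),
        "min_kb": min(sizes),
        "max_kb": max(sizes),
        "mean_kb": mean(sizes),
        "median_kb": median(sizes),
    }


if __name__ == "__main__":
    for scenario in UNIFORM_SCENARIOS + [MIXED_SCENARIO]:
        print(scenario, summarize(scenario))
```

Under these assumptions every uniform scenario totals 10000 KB, while the mixed scenario has 61 documents totalling 1900 KB, with a median of 15 KB (close to the incoming median in the table) and a mean of about 31 KB.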