1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
|
.. _document-sync:
Document synchronization
========================
Soledad follows `the U1DB synchronization protocol
<https://pythonhosted.org/u1db/conflicts.html>`_ with some modifications:
* A synchronization always happens between the Soledad Server and one Soledad
Client. Many clients can synchronize with the same server.
* Soledad Client :ref:`always encrypts <client-encryption>` before sending
data to the server.
* Soledad Client refuses to receive a document if it is encrypted and the MAC
is incorrect.
* Soledad Server doesn't try to decide about document convergence based on the
document's content, because the content is client-encrypted.
Synchronization protocol
------------------------
Synchronization between the Soledad Server and one Soledad Client consists of
the following steps:
1. The client asks the server for the information it has stored about the last
time they have synchronized (if ever).
2. The client validates that its information regarding the last synchronization
is consistent with the server's information, and raises an error if not.
(This could happen for instance if one of the replicas was lost and restored
from backup, or if a user inadvertently tries to synchronize a copied
database.)
3. The client generates a list of changes since the last change the server
knows of.
4. The client checks what the last change is it knows about on the server.
5. If there have been no changes on either side that the other side has not
seen, the synchronization stops here.
6. The client encrypts and sends the changed documents to the server, along
with what the latest change is that it knows about on the server.
7. The server processes the changed documents, and records the client's latest
change.
8. The server responds with the documents that have changes that the client
does not yet know about.
9. The client decrypts and processes the changed documents, and records the
server's latest change.
10. If the client has seen no changes unrelated to the synchronization during
this whole process, it now sends the server what its latest change is, so
that the next synchronization does not have to consider changes that were
the result of this one.
Synchronization metadata
------------------------
The synchronization information stored on each database replica consists of:
* The replica id of the other replica. (Which should be globally unique
identifier to distinguish database replicas from one another.)
* The last known generation and transaction id of the other replica.
* The generation and transaction id of this replica at the time of the most
recent succesfully completed synchronization with the other replica.
Transactions
------------
Any change to any document in a database constitutes a transaction. Each
transaction increases the database generation by 1, and is assigned
a transaction id, which is meant to be a unique random string paired with each
generation.
The transaction id can be used to detect the case where replica A and replica
B have previously synchronized at generation N, and subsequently replica B is
somehow reverted to an earlier generation (say, a restore from backup, or
somebody made a copy of the database file of replica B at generation < N, and
tries to synchronize that), and then new changes are made to it. It could end
up at generation N again, but with completely different data.
Having random unique transaction ids will allow replica A to detect this
situation, and refuse to synchronize to prevent data loss. (Lesson to be
learned from this: do not copy databases around, that is what synchronization
is for.)
|