summaryrefslogtreecommitdiff
path: root/docs/design/soledad.md
diff options
context:
space:
mode:
Diffstat (limited to 'docs/design/soledad.md')
-rw-r--r--docs/design/soledad.md94
1 files changed, 55 insertions, 39 deletions
diff --git a/docs/design/soledad.md b/docs/design/soledad.md
index d200859..081690e 100644
--- a/docs/design/soledad.md
+++ b/docs/design/soledad.md
@@ -5,7 +5,7 @@
Introduction
=====================
-Soledad is a system for to allow client applications the ability to securely share synchronized document databases. Soledad is based on Ubuntu's U1DB, "a cross-platform, cross-device, syncable database API", but with the addition of client-side encryption of documents stored on the server, and encryption of the local database replica.
+Soledad allows client applications to securely share synchronized document databases. Soledad is based on Ubuntu's U1DB, "a cross-platform, cross-device, syncable database API", but with the addition of client-side encryption of database replicas and documents stored on the server.
Key aspects of Soledad include:
@@ -23,8 +23,8 @@ Goals
**Security goals**
-* *Client-side encryption:* Before any data is synced to the cloud, it should be encrypted/decrypted on the client device.
-* *Encrypted local storage:* Any data cached or stored on the client should be stored in an encrypted format.
+* *Client-side encryption:* Before any data is synced to the cloud, it should be encrypted on the client device.
+* *Encrypted local storage:* Any data cached in the client should be stored in an encrypted format.
* *Resistant to offline attacks:* Data stored on the server should be highly resistant to offline attacks (i.e. an attacker with a static copy of data stored on the server would have a very hard time discerning much from the data).
* *Resistant to online attacks:* Analysis of storing and retrieving data should not leak potentially sensitive information.
* *Resistance to data tampering:* The server should not be able to provide the client with old or bogus data for a document.
@@ -66,22 +66,28 @@ Soledad protocol
Storage secret
-----------------------------------
-When a client application first wants to use Soledad, it must provide the user's password to unlock the `storage_secret`. The `storage_secret` is a long, randomly generated symmetric key used to encrypt both the documents stored on the server and the local replica of these documents.
+When a client application first wants to use Soledad, it must provide the user's password to unlock the `storage_secret`. The `storage_secret` is a long, randomly generated symmetric key used to generate encryption keys for both the documents stored on the server and the local replica of these documents.
-TO ADD: example code
+ from leap.soledad.client import Soledad
+ sol = Soledad('<user_uid>', '<user_passphrase>',
+ secrets_path='~/.config/leap/soledad/<user_uid>.secret',
+ local_db_path='~/.config/leap/soledad/<user_uid>.db',
+ server_url='https://<soledad_server_url>',
+ cert_file='~/.config/leap/providers/<provider>/keys/ca/cacert.pem',
+ secret_id='<storage_secret_id>') # optional argument
-The `storage_secret` is saved locally on disk in the file `soledad.json`, block encrypted using a derived key. The derived key is obtained from the user's password.
+The `storage_secret` is saved locally on disk in the file `<user-uid>.secret`, block encrypted using a derived key. The derived key is obtained from the user's password.
-The file `soledad.json` has a field `storage_secrets` that looks like so:
+The file `<user-uid>.secret` has a field `storage_secrets` that looks like so:
{
"storage_secrets": {
"<secret_id>": {
"kdf": "scrypt",
"kdf_salt": "400$8$5fb$61b499fe3366d947",
- "kdf_length": 128,
- "cipher": "aes128",
- "length": 512,
+ "kdf_length": 256,
+ "cipher": "aes256",
+ "length": 1024,
"secret": "<encrypted storage_secret 1>",
}
}
@@ -89,21 +95,19 @@ The file `soledad.json` has a field `storage_secrets` that looks like so:
The `storage_secrets` entry is a map that stores information about each storage key, indexed by the id of each key. For each storage key, the following fields are stored:
-* `kdf`: the key derivation function to use. Only scrypt is currently supported (so for now, this value is ignored).
+* `secret_id`: a handle used to refer to a particular storage_secret and equal to `sha256(storage_secret)`.
+* `kdf`: the key derivation function to use. Only scrypt is currently supported.
* `kdf_salt`: the salt used in the kdf. The salt for scrypt is not random, but encodes important parameters like the limits for time and memory.
* `kdf_length`: the length of the derived key resulting from the kdf.
-* `secret`: the encrypted `storage_secret`, created by `sym_encrypt(cipher, storage_secret, derived_key)` (base64 encoded).
-* `length`: the length of `storage_secret`, when not encrypted.
* `cipher`: what cipher to use to encrypt `storage_secret`. It must match kdf_length (i.e. the length of the derived_key).
-* `secret_id`: a handle used to refer to a particular storage_secret and equal to `md5(storage_secret)`.
+* `length`: the length of `storage_secret`, when not encrypted.
+* `secret`: the encrypted `storage_secret`, created by `sym_encrypt(cipher, storage_secret, derived_key)` (base64 encoded).
Other variables:
* `derived_key` is equal to `kdf(user_password, kdf_salt, kdf_length)`.
* `storage_secret` is equal to `sym_decrypt(cipher, secret, derived_key)`.
-In the current version, only one `storage_secret` is supported.
-
The `storage_secret` is shared among all devices with access to a particular user's Soledad database. See [Recovery and bootstrap](#Recovery.and.bootstrap) for how the storage_secret is initially installed on a device.
We don't use the derived_key as the storage_secret because we want the user to be able to change their password without needing to re-key.
@@ -120,7 +124,12 @@ This is unchanged and identical to the [API used in U1DB](http://pythonhosted.or
* Document indexing and searching: `create_index()`, `list_indexes()`, `get_from_index()`, `delete_index()`.
* Document conflict resolution: `get_doc_conflicts()`, `resolve_doc()`.
-TO ADD: code examples
+ # create document, change it and sync
+ sol.create_doc({'my': 'doc'}, doc_id='mydoc')
+ doc = sol.get_doc('mydoc')
+ doc.content = {'new': 'content'}
+ sol.put_doc(doc)
+ sol.sync()
Document encryption
------------------------
@@ -128,39 +137,44 @@ Document encryption
Before a JSON document is synced with the server, it is transformed into a document that looks like so:
{
- "scheme": "aes128",
- "secret_id": "1",
- "ciphertext": "xxxxxxxxx",
- "mac": "xxxxxxx"
+ "_enc_json": "<encrypted_doc_content>",
+ "_enc_scheme": "symkey",
+ "_enc_method": "aes256ctr",
+ "_enc_iv": "<initialization_vector>",
+ "_mac": "<auth_mac>",
+ "_mac_method": "hmac"
}
About these fields:
-* `ciphertext`: The original JSON document, encrypted and base64 encoded. `ciphertext` is equal to `sym_encrypt(cipher, content, document_secret)`.
-* `scheme`: Information about the block cipher that is used to encrypt this document.
-* `secret_id`: The id of the storage_secret that was used to generate the `document_key`.
-* `mac`: Defined as `HMAC(doc_id|rev|ciphertext, document_secret)`. The purpose of this field is to prevent the server from tampering with the stored documents.
+* `_enc_json`: The original JSON document, encrypted and base64 encoded. `ciphertext` is equal to `sym_encrypt(cipher, content, document_secret)`.
+* `_enc_scheme`: Information about the encryption scheme used to encrypt this document (i.e.'pubkey', 'symkey' or 'none').
+* `_enc_method`: Information about the block cipher that is used to encrypt this document.
+* `_mac`: Defined as `mac(doc_id|rev|ciphertext, document_secret)`. The purpose of this field is to prevent the server from tampering with the stored documents.
+* `_mac_method`: The method used to calculate the mac above (currently hmac).
Other variables:
-* `document_secret`: equal to `HMAC(doc_id, storage_secret)`. This value is unique for every document and only kept in memory. We use document_secret instead of simply storage_secret in order to hinder possible derivation of storage_secret by the server. Every `doc_id` is unique.
+* `document_secret`: equal to `mac(doc_id, storage_secret)`. This value is unique for every document and only kept in memory. We use document_secret instead of simply storage_secret in order to hinder possible derivation of storage_secret by the server. Every `doc_id` is unique.
* `content`: equal to `sym_decrypt(cipher, ciphertext, document_secret)`.
-When receiving a document with the above structure from the server, Soledad client will first verify that `mac` is correct, then decrypt the `ciphertext` to find `content`, which it saves as a cleartext document in the local database replica.
+When receiving a document with the above structure from the server, Soledad client will first verify that `_mac` is correct, then decrypt the `_enc_json` to find `content`, which it saves as a cleartext document in the local database replica.
-TO DO: specify supported ciphers
-
-TO DO: specify supported HMAC
+Currently supported encryption ciphers are AES256 (CTR mode) and XSalsa20;
+currently supported MAC method is HMAC.
Document synchronization
-----------------------------------
-Soledad follows the U1DB synchronization protocol, with two changes:
+Soledad follows the U1DB synchronization protocol, with some changes:
-* Soledad adds the ability to flag some documents so they are not synchronized by default.
-* Soledad will refuse to synchronize a document if it is encrypted and the MAC is incorrect.
+* Add the ability to flag some documents so they are not synchronized by default.
+* Refuse to synchronize a document if it is encrypted and the MAC is incorrect.
+* Always use `https://<soledad_server_url>/user-<user_uid>` as the synchronization URL.
-TO ADD: code examples
+ doc = sol.create_doc({'some': 'data'})
+ doc.syncable = False
+ sol.sync() # will not send the above document to the server!
Document IDs
--------------------
@@ -172,14 +186,12 @@ Re-keying
Sometimes there is a need to change the `storage_secret`. Rather then re-encrypt every document, Soledad implements a system called "lazy revocation" where a new storage_secret is generated and used for all subsequent encryption. The old storage_secret is still retained and used when decrypting older documents that have not yet been re-encrypted with the new storage_secret.
-Implementation status: not yet.
-
-TO DO: code example
+ sol.change_passphrase('<new_passphrase>')
Authentication
-----------------------
-Unlike U1DB, Soledad only supports token authentication and does not support not support OAuth. Soledad itself does not handle authentication. Instead, this job is handled by a thin HTTP middleware layer running in front of the Soledad server daemon. How the session token is obtained is beyond the scope of Soledad.
+Unlike U1DB, Soledad only supports token authentication and does not support OAuth. Soledad itself does not handle authentication. Instead, this job is handled by a thin HTTP WSGI middleware layer running in front of the Soledad server daemon, which retrieves valid tokens from a certain shared database and compares with the user-provided token. How the session token is obtained is beyond the scope of Soledad.
Recovery and bootstrap
------------------------------------------
@@ -258,6 +270,8 @@ Dependencies:
* [U1DB](https://launchpad.net/u1db) provides an API and protocol for synchronized databases of JSON documents.
* [SQLCipher](http://sqlcipher.net/) provides a block-encrypted SQLite database used for local storage.
* python-gnupg
+* scrypt
+* pycryptopp
Local storage
--------------------------
@@ -325,8 +339,10 @@ https://github.com/leapcode/soledad
Dependencies:
* [CouchDB](https://couchdb.apache.org/] for server storage, via [python client library](https://pypi.python.org/pypi/CouchDB/0.8).
-* WSGI middleware for authentication.
* [Twisted](http://twistedmatrix.com/trac/) to run the WSGI application.
+* scrypt
+* pycryptopp
+* PyOpenSSL
CouchDB backend
-------------------------------