author elijah <elijah@riseup.net> 2013-05-04 14:36:29 -0700
committer elijah <elijah@riseup.net> 2013-05-04 14:36:29 -0700
commit 914ee88a009565b9c12bcff4bf554acbc9fe3f3c
tree 2fd63a5d8712a1f1c92c48cb0f1cf92e73eed0db /docs/design
parent 566cd71f4d43d9bd69da1599d93e161f3c785eb4
added design docs for nicknym and soledad
Diffstat (limited to 'docs/design')
-rw-r--r-- docs/design/cuttlefish.md 7
-rw-r--r-- docs/design/en.haml 8
-rw-r--r-- docs/design/nicknym.md 445
-rw-r--r-- docs/design/overview.md 385
-rw-r--r-- docs/design/soledad.md 367
5 files changed, 1212 insertions, 0 deletions
diff --git a/docs/design/cuttlefish.md b/docs/design/cuttlefish.md
new file mode 100644
index 0000000..6b2c0f5
--- /dev/null
+++ b/docs/design/cuttlefish.md
@@ -0,0 +1,7 @@
+@title = 'Cuttlefish'
+@toc = true
+@summary = "Federated events and callback notifications."
+
+Not yet written.
+
+About the name: Cuttlefish are able to communicate by creating [different patterns on their skin](http://www.newscientist.com/article/dn3728-mathematics-reveals-the-cuttlefishs-wink.html) and communicate secretly with each other by [changing the polarization of their skin](http://www.ncbi.nlm.nih.gov/pubmed/9319987). Also, cuttlefish are [freakishly smart](http://www.pbs.org/wgbh/nova/nature/spineless-smarts.html).
diff --git a/docs/design/en.haml b/docs/design/en.haml
new file mode 100644
index 0000000..b5f1bc3
--- /dev/null
+++ b/docs/design/en.haml
@@ -0,0 +1,8 @@
+- @title = "Design Docs"
+- @summary = "Design documents and specifications for various LEAP components and protocols."
+
+%h1.first Design Documents
+
+Design documents and specifications for various LEAP components and protocols.
+
+= child_summaries
\ No newline at end of file
diff --git a/docs/design/nicknym.md b/docs/design/nicknym.md
new file mode 100644
index 0000000..4041d56
--- /dev/null
+++ b/docs/design/nicknym.md
@@ -0,0 +1,445 @@
+@title = 'Nicknym'
+@toc = true
+@summary = "Automatic discovery and validation of public keys."
+
+Introduction
+==========================================
+
+Nicknym is a system to map user nicknames to public keys. With Nicknym, the user is able to think solely in terms of nicknames, while still being able to communicate with a high degree of security (confidentiality, integrity, and authenticity). Essentially, Nicknym is a system for binding human-memorable nicknames to a cryptographic key via automatic discovery and automatic validation.
+
+Nicknym is a federated protocol: a Nicknym address is in the form `username@domain`, just like an email address, and Nicknym includes both a client and a server component. Although the client can fall back to legacy methods of key discovery when needed, domains that run the Nicknym server component enjoy much stronger identity guarantees.
+
+Nicknym is key agnostic, and supports whatever public key information is available for an address (OpenPGP, OTR, X.509, RSA, etc). However, Nicknym enforces a strict one-to-one mapping of address to public key.
+
+Existing forms of secure identity are deeply flawed. These systems rely on either a single trusted entity (e.g. Skype), a vulnerable Certificate Authority system (e.g. S/MIME), or keys that cannot be made human memorable (e.g. OpenPGP, OTR). When an identity system is hard to use, it is effectively compromised because too few people take the time to use it properly.
+
+The broken nature of existing identity systems (either in security or in usability) is especially troubling because identity remains a bedrock precondition for any message security: you cannot ensure confidentiality or integrity without confirming the authenticity of the other party. Nicknym is a protocol to solve this problem in a way that is backward compatible, easy for the user, and includes very strong authenticity when possible.
+
+Goals
+==========================================
+
+**High level goals**
+
+* Pseudo-anonymous and human friendly addresses in the form `username@domain`.
+* Automatic discovery and validation of public keys associated with an address.
+* The user should be able to use Nicknym without understanding anything about public/private keys or signatures.
+
+**Technical goals**
+
+* Wide utility: Nicknym should be a general purpose protocol that can be used in a wide variety of contexts.
+* No revocation: instead of key revocation, support short lived keys that frequently and automatically refresh.
+* Prevent dangerous actions: Nicknym should fail hard when there is a possibility of an attack.
+* Minimize false positives: because Nicknym fails hard, we should minimize false positives where it fails incorrectly.
+* Resistant to malicious actors: Nicknym should be externally auditable in order to assure service providers are not compromised or advertising bogus keys.
+* Resistant to association analysis: Nicknym should not reveal to any actor or network observer a map of a user's associations.
+
+**Non-goals**
+
+* Nicknym does not try to create a decentralized or peer-to-peer identity system.
+
+The binding problem
+=============================================
+
+Nicknym attempts to solve the problem of binding a human memorable identifier to a cryptographic key. If you have the identifier, you should be able to get the key with a high level of confidence, and vice versa. The goal is to have decentralized, human memorable, globally unique public keys. In other words, to violate [Zooko's triangle](https://en.wikipedia.org/wiki/Zooko's_triangle) by making a few concessions.
+
+There are a number of established methods for binding identifier to key:
+
+* [Web of Trust (WOT)](http://en.wikipedia.org/wiki/Web_of_trust)
+* Trust on First Use (TOFU)
+* [X.509 Certificate Authority System](https://en.wikipedia.org/wiki/X.509)
+* [DNSSEC](https://en.wikipedia.org/wiki/Dnssec)
+* [Shared Secret](https://en.wikipedia.org/wiki/Socialist_millionaire)
+* Mail-back Verification
+* [Network Perspective](http://convergence.io/)
+* Global Append-only Log
+* Nonverbal Feedback (a la ZRTP)
+
+The methods differ widely, but they all try to solve the same general problem of proving that a person or organization is in control of a particular key.
+
+Nicknym uses a combination of these methods, utilizing TOFU, X.509, Network Perspective, and additional methods we call "Provider Keys" and "Federated Web of Trust" (FWOT).
+
+1. Nicknym starts with TOFU of user keys, because it is easy to do and backward compatible with legacy providers. In TOFU, your client naively accepts the key of another user when it first encounters it. When you TOFU a user key, you are making a bet that possible attackers against you did not have the foresight to specifically target you with a false key during discovery.
+2. Next, we add X.509. For those providers that publish the public keys of their users, we require that these keys be fetched over validated TLS. This makes third party attacks against TOFU more difficult, but also places a lot of trust in the providers (and the Certificate Authorities).
+3. Next, we add a simple form of Network Perspective where the client can ask one provider what key another provider is distributing. This allows a user's client to be able to audit their provider and keep them honest in an automated manner. If a service provider distributes bogus keys, their users and other providers will be quickly alerted to the problem.
+4. Next, we add Provider Keys. If a service provider has a provider key, the public keys of its users are additionally signed by the provider with the "provider key". If your client has the correct provider key, you no longer need to TOFU the keys of the provider's users. This has the benefit of making it possible for a user to issue new keys, and of adding support for very short-lived keys rather than trying to use key revocation. A service provider is much less likely to lose their private key or have it compromised, a significant problem with TOFU of user keys.
+5. Finally, we add a Federated Web of Trust. The system works like this: each service provider is responsible for the due diligence of properly signing the keys of a few other providers, akin to the distributed web of trust model of OpenPGP, but with all the hard work of proper signature validation placed upon the service provider. When a user communicates with another party who happens to use a service provider that participates in the FWOT, the user’s software will automatically trace a chain of signatures from the other party’s key, to their service provider, to the user’s own service provider (with some possible intermediary signatures). This allows for identity that is verified through an end-to-end trust path from any user to any other user in a way that can be automated and is human memorable. Support for a FWOT allows us to bypass X.509 Certificate Authorities entirely, to gracefully handle short lived provider keys, and to handle emergency re-key events if a provider's key is lost.
+
+As we move down this list, each measure taken gets more complicated, requires more provider cooperation, and provides less additional benefit than the one before it. Nevertheless, each measure contributes some important benefit toward the goal of automatic binding of user identity to public key.
+
+**Questions**
+
+*Why not use WOT?* Most users are empirically unable to properly maintain a web of trust. The concepts are hard and it is easy to mess up the signing practice.
+
+*Why not use DNSSEC?* Many reasons. DNS records are slow to update. RSA public keys will soon be too big for UDP packets (though this is not true of ECC), so putting keys in DNS will mean putting a URL to a key in DNS, so you might as well just use TLS. DNSSEC could still be of added benefit if you put the fingerprint in the DNS record. Mostly, however, a simple HTTP GET request is a lot easier to deal with than DNS, both for the client and the server.
+
+*Why not use Shared Secret?* Shared secrets, like with the Socialist Millionaire protocol, are cool in theory but prone to user error and frustration in practice. Was the secret "Invisible Zebra" or "invisibleZebra"?
+
+*Why not use Mail-back Verification?* If the provider distributes user keys, there is not any benefit to mail-back verification. However, it would be good to add support for mail-back verification for non-cooperating legacy providers.
+
+*Why not use Global Append-only Log?* Maybe we should, they are neat. However, current implementations are resource intensive and experimental (e.g. namecoin).
+
+*Why not use Nonverbal Feedback?* ZRTP can use non-verbal cues to establish secure identity because of the nature of a live phone call. This doesn't work for text only messaging.
+
+
+Related work
+===================================
+
+**WebID and BrowserID**
+
+What about WebID or BrowserID? These are both interesting cryptographic identity standards that are gaining support and implementations. So why do we need something new?
+
+These protocols, and the poorly conceived OpenID Connect, are designed to address a fundamentally different problem: authenticating a user to a website. The problem of authenticating users to one another requires a different architecture entirely. There are some similarities, however, and in the long run Nicknym could be combined with something like BrowserID.
+
+**STEED**
+
+[STEED](http://g10code.com/steed.html) is a proposal with very similar goals to Nicknym. In a nutshell, Nicknym basically looks very similar to STEED when the domain owner does not support Nicknym. STEED includes four main ideas:
+
+* trust upon first contact: Nicknym uses this as well, although this is the fallback mechanism when others fail.
+* automatic key distribution and retrieval: Nicknym uses this as well, although we used HTTP for this instead of DNS.
+* automatic key generation: Nicknym is designed specifically to support automatic key generation, but this is outside the scope of the Nicknym protocol and it is not required.
+* opportunistic encryption: Again, Nicknym is designed to support opportunistic encryption, but does not require it.
+
+Additional differences include:
+
+* Nicknym is key agnostic: Nicknym does not make an assumption about what types of public keys a user wants to associate with their address.
+* Nicknym is protocol agnostic: Nicknym can be used with SMTP, XMPP, SIP, etc.
+* Nicknym relies on service provider adoption: With Nicknym, the strength of verification of public keys rests on the degree to which a service provider adopts Nicknym. If a service provider does not support Nicknym, then effectively Nicknym operates like STEED for that domain.
+
+
+Nicknym protocol
+==============================
+
+Definitions
+-------------------------
+
+* **address**: A globally unique handle in the form user@domain (i.e. an email, SIP, or XMPP address).
+* **provider**: A service provider that offers end-user services on a particular domain.
+* **user key**: A public/private key pair associated with a user address. If not specified, "user key" refers to the public key.
+* **provider key**: A public/private key pair owned by the provider. The address associated with this key is just the domain of the service provider.
+* **validated key**: A key is "validated" if the nickagent has bound the user address to a public key.
+* **nickagent**: Client side program that manages a user's contact list, the public keys they have encountered and validated, and the user's own key pairs.
+* **nickserver**: Server side daemon run by providers who support Nicknym.
+
+Nickserver requests
+-----------------------
+
+A nickagent will attempt to discover the public key for a particular user address by contacting a nickserver. The nickserver returns JSON encoded key information in response to a simple HTTP request with a user's address. For example:
+
+    curl -X POST -d address=alice@domain.org https://nicknym.domain.org:6425
+
+* The port is always 6425.
+* The HTTP verb may be POST or GET.
+* The request must use TLS (see [Query security](#Query.security)).
+* The query data should have a single field 'address'.
+* For POST requests to nicknym.domain.org, the query data may be encrypted to the public OpenPGP key nicknym@domain.org (see [Query security](#Query.security)).
+
+Requests may be local or foreign, and for user keys or for provider keys.
+
+* **local** requests are for information for which the nickserver is authoritative. In other words, when the requested address is for the same domain that the nickserver is running on.
+* **foreign** requests are for information about other domains.
+* **user key** requests are for addresses in the form "username@domain".
+* **provider key** requests are for addresses in the form "domain".
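The four request types above follow from the shape of the `address` field alone. A minimal Python sketch of that classification, assuming a hypothetical helper name `classify_request` and a `local_domain` argument naming the domain the nickserver runs on (neither appears in the design doc):

```python
from typing import Tuple


def classify_request(address: str, local_domain: str) -> Tuple[str, str]:
    """Return (scope, kind) for a query address.

    scope is "local" or "foreign"; kind is "user" or "provider".
    Hypothetical helper: the design doc does not define this function.
    """
    address = address.strip().lower()
    if "@" in address:
        # User key request: address has the form "username@domain".
        username, _, domain = address.rpartition("@")
        if not username or not domain:
            raise ValueError(f"malformed address: {address!r}")
        kind = "user"
    else:
        # Provider key request: address is just the domain.
        domain = address
        if not domain:
            raise ValueError("empty address")
        kind = "provider"
    scope = "local" if domain == local_domain.lower() else "foreign"
    return scope, kind
```

For a nickserver running on `domain.org`, the address `bob@otherdomain.org` classifies as a foreign user key request.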
+
+**Local, Provider Key request**
+
+For example:
+
+    https://nicknym.domain.org:6425/?address=domain.org
+
+The response is the authoritative provider key for that domain.
+
+**Local, User Key request**
+
+For example:
+
+    https://nicknym.domain.org:6425/?address=alice@domain.org
+
+The nickserver returns authoritative key information from the provider's own user database. Every public key returned for local requests must be signed by the provider's key.
+
+**Foreign, Provider Key request**
+
+For example:
+
+    https://nicknym.domain.org:6425/?address=otherdomain.org
+
+1. First, check the nickserver's cache database of discovered keys. If the cache is not old, return this key.
+2. Otherwise, fetch provider key from the provider's nickserver, cache the result, and return it.
+
+**Foreign, User Key request**
+
+For example:
+
+    https://nicknym.domain.org:6425/?address=bob@otherdomain.org
+
+* First, check the nickserver's database cache of nicknyms. If the cache is not old, return the key information found in the cache.
+* Otherwise, attempt to contact a nickserver run by the provider of the requested address. If the nickserver exists, query that nickserver, cache the result, and return it in the response.
+* Otherwise, fall back to querying existing SKS keyservers, cache the result and return it.
+* Otherwise, return a 404 error.
+
+If the key returned for a foreign request contains multiple user addresses, they are all ignored by nicknym except for the user address specified in the request.
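The fallback order for a foreign user key request (cache, then the remote nickserver, then SKS keyservers, then 404) can be sketched as follows. This is an illustration only: `lookup_foreign_user_key`, `KeyNotFound`, the injected `query_nickserver` and `query_sks` callables, and the one-hour `CACHE_TTL` are all assumptions, since the design doc does not say when a cache entry counts as "old".

```python
import time
from typing import Callable, Dict, Optional, Tuple

CACHE_TTL = 3600  # seconds; assumed value, the doc leaves "old" undefined


class KeyNotFound(Exception):
    """Corresponds to the 404 response in the list above."""


def lookup_foreign_user_key(
    address: str,
    cache: Dict[str, Tuple[float, dict]],
    query_nickserver: Callable[[str], Optional[dict]],
    query_sks: Callable[[str], Optional[dict]],
    now: Callable[[], float] = time.time,
) -> dict:
    """Resolve a foreign user key using the documented fallback order."""
    # 1. Serve from the cache if the entry is still fresh.
    cached = cache.get(address)
    if cached is not None and now() - cached[0] < CACHE_TTL:
        return cached[1]
    # 2. Ask the remote provider's nickserver, then 3. the SKS keyservers.
    #    Each callable returns None when it has no answer.
    for source in (query_nickserver, query_sks):
        result = source(address)
        if result is not None:
            cache[address] = (now(), result)
            return result
    # 4. Nothing found anywhere.
    raise KeyNotFound(address)
```

Passing the two lookups in as callables keeps the sketch free of any real network or keyserver API.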
+
+Nickserver response
+---------------------------------
+
+A nickserver response is a JSON encoded map with a field "address" plus one or more of the following fields: "openpgp", "otr", "rsa", "ecc", "x509-client", "x509-server", "x509-ca".
+
+A nickserver response is always signed with the OpenPGP signing key associated with the address nicknym@domain.org. The signature is ASCII armored and appended to the JSON.
+
+For example:
+
+    {
+      "address": "alice@example.org",
+      "openpgp": "6VtcDgEKaHF64uk1c/crFhRHuFW9kTvgxAWAK01rXXjrxEa/aMOyXnVQuQINBEof...."
+    }
+    -----BEGIN PGP SIGNATURE-----
+    iQIcBAEBCgAGBQJRhWO+AAoJEIaItIgARAAl2IwP/24z9CjKjD0fd27pQs+r+e3h
+    p8KAYDbVac3+c3vm30DjHO/RKF4Zq6+sTAIkrFvXOwYJl9KgjMpQVV/voInjxATz
+    -----END PGP SIGNATURE-----
+
+If the data in the request was encrypted to the public key nicknym@domain.org, then the JSON response and signature are additionally encrypted with the symmetric key found in the request and returned base64 encoded.
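A client has to separate the JSON map from the appended armored signature before it can use either part. The sketch below shows one way to do that for an unencrypted response. `split_response` is a hypothetical name, and signature verification is deliberately left out: it would need an OpenPGP library and should be run over the exact signed text returned here.

```python
import json
from typing import Tuple

SIG_MARKER = "-----BEGIN PGP SIGNATURE-----"


def split_response(body: str) -> Tuple[dict, str, str]:
    """Split a nickserver response into (data, signed_text, signature).

    Hypothetical helper. signed_text is the raw JSON text that precedes
    the signature; signature is the ASCII-armored block.
    """
    signed_text, marker, rest = body.partition(SIG_MARKER)
    if not marker:
        # The doc says responses are always signed, so treat this as an error.
        raise ValueError("response carries no signature")
    data = json.loads(signed_text)
    if "address" not in data:
        raise ValueError("response has no 'address' field")
    return data, signed_text, marker + rest
```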
+
+Query balancing
+------------------------
+
+A nickagent must choose what IP address to query by selecting randomly from among hosts that resolve from `nicknym.domain.org` (where `domain.org` is the domain name of the provider).
+
+If a host does not respond, a nickagent must skip over it and attempt to contact another host in the pool.
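A rough Python sketch of this selection rule: resolve the pool, shuffle it, and move on to the next host when one fails. `resolve_pool`, `query_pool`, and the `try_host` callable are hypothetical names; `socket.getaddrinfo` is the standard library resolver call.

```python
import random
import socket
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
NICKNYM_PORT = 6425  # fixed port, per the request rules above


def resolve_pool(domain: str) -> List[str]:
    """Return the distinct IP addresses behind nicknym.<domain>."""
    infos = socket.getaddrinfo(
        "nicknym." + domain, NICKNYM_PORT, proto=socket.IPPROTO_TCP
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def query_pool(
    hosts: Iterable[str],
    try_host: Callable[[str], T],
    rng: Optional[random.Random] = None,
) -> T:
    """Try hosts in random order; skip any host that raises OSError."""
    candidates = list(hosts)
    (rng or random).shuffle(candidates)
    for host in candidates:
        try:
            return try_host(host)
        except OSError:
            continue  # host did not respond, try the next one
    raise ConnectionError("no nickserver in the pool responded")
```

Here `try_host` stands for whatever performs the actual TLS request against one IP address.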
+
+Query security
+--------------------------
+
+TLS is required for all nickserver queries.
+
+When querying https://nicknym.domain.org, nickagent must validate the TLS connection in one of three ways:
+
+1. Using a commercial CA certificate distributed with the host operating system.
+2. Using a seeded CA certificate (see [Discovering nickservers](#Discovering.nickservers)).
+3. Using a custom self-signed CA certificate discovered for the domain, so long as the CA certificate was discovered via #1 or #2. Custom CA certificates may be discovered for a domain by making a provider request of a nickserver (e.g. https://nicknym.known-domain.org/?address=new-domain.org).
+
+Optionally, a nickagent may make an encrypted query like so:
+
+0. Suppose the nickagent wants to make an encrypted query regarding the address alice@x.org.
+1. Nickagent discovers the public key for nicknym@domain.org
+2. Nickagent uses the OpenPGP key for nicknym@domain.org to encrypt the body of the request (using POST). The request body should consist of two lines: the address being queried, followed by a randomly generated 128 bit symmetric key. The request can be foreign or local.
+3. The body of the nickserver's response is encrypted with AES128 using the symmetric key.
+
+Comment: although it may seem excessive to encrypt both the request via TLS and the request body via OpenPGP, the reason for this is that many requests will not use OpenPGP.
+
+Automatic key validation
+----------------------------------
+
+A key is "validated" if the nickagent has bound the user address to a public key.
+
+Nicknym supports three different levels of key validation:
+
+* Level 3 - path trusted: A path of cryptographic signatures can be traced from a trusted key to the key under evaluation. By default, only the provider key from the user's provider is a "trusted key".
+* Level 2 - provider signed: The key has been signed by a provider key for the same domain, but the provider key is not validated using a trust path (i.e. it is only registered)
+* Level 1 - registered: The key has been encountered and saved, it has no signatures (that are meaningful to the nickagent).
+
+nickagent will try to validate using the highest level possible.
+
+Automatic renewal
+-----------------------------
+
+A validated public key is replaced with a new key when:
+
+* The new key is path trusted
+* The new key is provider signed, but the old key is only registered.
+* The new key has a later expiration, and the old key is only registered and will expire "soon" (exact time TBD).
+* The agent discovers a new subkey, but the master signing key is unchanged.
+
+In all other cases, the new key is rejected.
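The renewal rules translate fairly directly into a predicate. The following Python sketch assumes a hypothetical `KeyRecord` structure and picks seven days for "soon", which the design doc explicitly leaves TBD.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import IntEnum
from typing import FrozenSet


class Validation(IntEnum):
    REGISTERED = 1       # Level 1
    PROVIDER_SIGNED = 2  # Level 2
    PATH_TRUSTED = 3     # Level 3


@dataclass(frozen=True)
class KeyRecord:
    """Hypothetical record; field names are illustrative only."""
    master_fingerprint: str
    subkey_fingerprints: FrozenSet[str]
    validation: Validation
    expires_on: datetime


SOON = timedelta(days=7)  # assumption: the doc says "exact time TBD"


def should_replace(old: KeyRecord, new: KeyRecord, now: datetime) -> bool:
    """Return True if `new` may replace the validated key `old`."""
    # Rule 1: the new key is path trusted.
    if new.validation == Validation.PATH_TRUSTED:
        return True
    # Rule 2: the new key is provider signed, the old one only registered.
    if (new.validation == Validation.PROVIDER_SIGNED
            and old.validation == Validation.REGISTERED):
        return True
    # Rule 3: later expiration, old key only registered and expiring soon.
    if (old.validation == Validation.REGISTERED
            and new.expires_on > old.expires_on
            and old.expires_on - now <= SOON):
        return True
    # Rule 4: a new subkey under an unchanged master signing key.
    if (new.master_fingerprint == old.master_fingerprint
            and bool(new.subkey_fingerprints - old.subkey_fingerprints)):
        return True
    # In all other cases the new key is rejected.
    return False
```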
+
+The nickagent will attempt to refresh a key by making a request to a nickserver of its choice when a key is past 3/4 of its lifespan and again when it is about to expire.
+
+Nicknym encourages, but does not require, the use of short lived public keys, in the range of X to Y days. It is recommended that short lived keys are not uploaded to OpenPGP keyservers.
+
+Automatic invalidation
+----------------------------
+
+A key is invalidated if:
+
+* The old key has expired, and no new key can be discovered with equal or greater validation level.
+
+This means validation is a one way street: once a certain level of validation is established for a user address, no client should accept any future keys for that address with a lower level of validation.
+
+Discovering nickservers
+--------------------------------
+
+It is entirely up to the nickagent to decide what nickservers to query. If it wanted to, a nickagent could send all its requests to a single nickserver.
+
+However, nickagents should discover new nickservers and balance their queries to these nickservers for the purposes of availability, load balancing, network perspective, and hiding the user's association map.
+
+Whenever the nickagent is asked by a locally running application for a public key corresponding to an address on the domain `domain.org`, it may check to see if the host `nicknym.domain.org` exists. If the domain resolves, then the nickagent may add it to the pool of known nickservers.
+
+Additionally, a nickagent may be distributed with an initial list of "seed" nickservers. In this case, the nickagent is distributed with a copy of the CA certificate used to validate the TLS connection with each respective seed nickserver.
+
+Cross-provider signatures
+----------------------------------
+
+To be written.
+
+Auditing
+----------------------------
+
+In order to keep the user's provider from handing out bogus public keys, a nickagent should occasionally make foreign queries of the user's own address against nickservers run by third parties.
+
+In order to prevent a nickserver from handing out bogus provider keys, a nickagent should query multiple nickservers before a provider key is registered or path trusted.
+
+Possible attacks:
+
+**Attack 1 - Intercept Outgoing:**
+
+* Attack: provider `A` signs an impostor key for provider `B` and distributes it to users of `A` (in order to intercept outgoing messages sent to `B`).
+* Countermeasure: By querying multiple nickservers for the provider key of `B`, the nickagent can detect if provider `A` is attempting to distribute impostor keys.
+
+**Attack 2 - Intercept Incoming:**
+
+* Attack: provider `A` signs an impostor key for one of its own users, and distributes to users of provider `B` (in order to intercept incoming messages).
+* Countermeasure: By querying for its own keys, a nickagent can detect if a provider is giving out bogus keys for their addresses.
+
+**Attack 3 - Association Mapping:**
+
+* Attack: A provider tracks all the requests for key discovery in order to build a map of association.
+* Countermeasure: By performing foreign key queries via third party nickservers, an agent can prevent any particular entity from tracking their queries.
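The countermeasures for attacks 1 and 2 both come down to comparing what several nickservers report for the same address. A minimal sketch of that comparison, assuming the caller has already collected one fingerprint (or `None`) per nickserver; the function names and the `min_answers` threshold are hypothetical and not part of the design:

```python
from collections import defaultdict
from typing import Dict, List, Optional


def group_answers(answers: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group nickserver hosts by the key fingerprint they reported.

    `answers` maps host name to fingerprint, or None if the host gave
    no answer. Hosts without an answer are left out of the result.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for host, fingerprint in sorted(answers.items()):
        if fingerprint is not None:
            groups[fingerprint].append(host)
    return dict(groups)


def keys_agree(answers: Dict[str, Optional[str]], min_answers: int = 2) -> bool:
    """True only if enough hosts answered and all reported the same key."""
    groups = group_answers(answers)
    answered = sum(len(hosts) for hosts in groups.values())
    return answered >= min_answers and len(groups) == 1
```

When `keys_agree` returns false, the grouping shows which nickservers are handing out which key, which is the information an agent would need in order to fail hard and alert the user.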
+
+
+Future enhancements
+---------------------
+
+Should we support additional discovery mechanisms:
+
+* Webfinger includes a standard mechanism for distributing a user's public key via a simple HTTP request. This is very easy to implement on the server, and very easy to consume on the client side.
+* There are multiple competing standards for key discovery via DNS. When and if one of these emerges as predominant, Nicknym should attempt to use this method when available. DNS discovery, however, has some problems. DNS discovery of keys is much harder to implement, because the service provider must run their own customized authoritative nameserver. Also, since (RSA) keys can be too big for DNS UDP packets, any future-proof DNS method relies on an HTTP request, thus undermining the potential benefit of decentralization you might get from using DNS rather than webfinger.
+
+
+
+Reference nickagent implementation
+====================================================
+
+There is a reference nickagent implementation called "key manager" written in Python and integrated into the LEAP client. It uses Soledad to store its data.
+
+Public API
+----------------------------
+
+**refresh_keys()**
+
+updates the keys with fresh ones, as needed.
+
+**get_key(address, type)**
+
+returns a single public key for address. type is one of 'openpgp', 'otr', 'x509', or 'rsa'.
+
+**send_key(address, public_key, type)**
+
+authenticates with the appropriate provider and saves the public_key in the user database.
+
+Storage
+--------------------------
+
+Key manager uses Soledad for storage. GPGME, however, requires keys to be stored in keyrings, which are read from disk.
+
+For now, Key Manager deals with this by storing each key in its own keyring. In other words, every key is in a keyring with exactly 1 key, and this keyring is stored in a Soledad document. To keep from confusing this keyring with a normal keyring, I will call it a 'unitary keyring'.
+
+Suppose Alice needs to communicate with Bob:
+
+1. Alice's Key Manager copies her private key and Bob's public key to disk. The Key Manager gets these from Soledad, in the form of unitary keyrings.
+2. Client code uses GPGME, feeding it these temporary keyring files.
+3. The keyrings are destroyed.
+
+TBD: how best to ensure destruction of the keyring files.
+
+An example Soledad document for an address:
+
+    {
+      "address": "alice@example.org",
+      "keys": [
+        {
+          "type": "openpgp",
+          "key": "binary blob",
+          "keyring": "binary blob",
+          "expires_on": "2014-01-01",
+          "validation": "provider_signed",
+          "first_seen_at": "2013-04-01 00:11:00",
+          "last_audited_at": "2013-04-02 12:00:00"
+        },
+        {
+          "type": "otr",
+          "key": "binary blob",
+          "expires_on": "2014-01-01",
+          "validation": "registered",
+          "first_seen_at": "2013-04-01 00:11:00",
+          "last_audited_at": "2013-04-02 12:00:00"
+        }
+      ]
+    }
+
+Pseudocode
+---------------------------
+
+get_key
+
+    #
+    # return a key for an address
+    #
+    function get_key(address, type)
+      if key for address exists in soledad database
+        return key
+      else
+        fetch key from nickserver
+        save it in soledad
+        return key
+      end
+    end
+
+send_key
+
+    #
+    # send the user's provider the user's key. this key will get signed by the provider, and replace any prior keys
+    #
+    function send_key(type)
+      if not authenticated
+        error!
+      end
+      key_data = get_key(self.address, type)
+      send (key_data, type) to the provider
+    end
+
+refresh_keys
+
+    #
+    # update the user's db of validated keys to see if there are changes.
+    #
+    function refresh_keys()
+      for each key in the soledad database (that should be checked?):
+        newkey = fetch_key_from_nickserver(key)
+        if key is about to expire and newkey complies with the renewal parameters:
+          replace key with newkey
+        else if fingerprint(key) != fingerprint(newkey):
+          freak out, something wrong is happening? :)
+          maybe handle revocation, or try to get some voting for a given key and save that one (retrieve it through tor/vpn/etc and see what's the most found key or something like that).
+        else:
+          everything's cool for this key, continue
+        end
+      end
+    end
+
+private fetch_key_from_nickserver
+
+    function fetch_key_from_nickserver(key)
+      randomly pick a subset of the available nickservers we know about
+      send a tcp request to each in this subset in parallel
+      first one that opens a successful socket is used, all the others are terminated immediately
+      make http request
+      parse json for the keys
+      return keys
+    end
+
+
+Reference nickserver implementation
+=====================================================
+
+The reference nickserver is written in Ruby 1.9 and licensed GPLv3. It is lightweight and scalable (supporting high concurrency and reasonable latency), and uses EventMachine for asynchronous network IO. Data is stored in CouchDB.
+
+For more information, see https://github.com/leapcode/nickserver
+
diff --git a/docs/design/overview.md b/docs/design/overview.md
new file mode 100644
index 0000000..2d257c7
--- /dev/null
+++ b/docs/design/overview.md
@@ -0,0 +1,385 @@
+@nav_title = "Overview"
+@title = "Overview of LEAP architecture"
+@summary = "Bird's eye view of how all the pieces fit together."
+
+The LEAP Platform allows an organization to deploy and manage a complete infrastructure for providing user communication services.
+
+This document gives a brief overview of how the pieces fit together.
+
+LEAP Client
+===================
+
+The LEAP Client is an application that runs on a user's own device and is responsible for all encryption of user data. The client must be installed on a user's device before they can access any LEAP services (except for user support via the web application).
+
+Desktop Client
+--------------------------
+
+LEAP Client for Linux, Windows, and Mac.
+
+Written in: Python
+
+Libraries used: QT, PyQT, OpenVPN, Sqlite, Sqlcipher, U1DB, OpenSSL, GPG.
+
+User interface:
+
+* First run wizard: walks the user through the bootstrap process when the client is first run (either registering a new user or authenticating as an existing user)
+* Preferences panel: A mac system-preferences-like place to edit all the LEAP client settings (does not exist yet).
+* Task bar: Show the status of LEAP services (connected? syncing?), and lets the user open the preferences panel.
+* Update wizard: a dialog that shows the code update progress.
+
+Android Client
+------------------------------
+
+LEAP Client for Android.
+
+Written in: Java (possibly with some Python in the future)
+
+Libraries used: sqlcipher, sqlite, bouncycastle, U1DB, OpenVPN.
+
+User interface:
+
+* Single button to connect or disconnect encrypted internet
+* A notification drawer item indicating status of VPN
+* A first run wizard
+
+Features (planned):
+
+* a sync provider to allow contacts and calendar data to be sync'ed via Soledad.
+* eventually, match the desktop client in features.
+
+
+LEAP Admin Tools
+====================================
+
+Platform Recipes
+------------------------------
+
+The LEAP platform recipes define an abstract service provider. It consists of puppet modules designed to work together to provide a system administrator everything they need to manage a service provider infrastructure that provides secure communication services.
+
+Typically, a system administrator will not need to modify the LEAP platform recipes, although they are free to fork and merge as desired. Most service providers using the LEAP platform will use the same platform recipes.
+
+The recipes are abstract. In order to configure settings for a particular service provider, a system administrator creates a provider instance. The platform recipes also include a base provider that provider instances inherit from.
+
+Provider Instance
+----------------------------------
+
+A "provider instance" is a directory tree (typically tracked in git) containing all the configurations for a service provider's infrastructure. A provider instance primarily consists of:
+
+* A configuration file for each server (node) in the provider's infrastructure (e.g. nodes/vpn1.json)
+* A global configuration file for the provider (e.g. provider.json).
+* Additional files, such as certificates and keys (e.g. files/nodes/vpn1/vpn1_ssh.pub).
+* A pointer to the platform recipes (as defined in "Leapfile")
+
+A minimal provider instance directory looks like this:
+
+
+    └── bitmask                 # provider instance directory.
+        ├── common.json         # settings common to all nodes.
+        ├── Leapfile            # various settings for this instance.
+        ├── provider.json       # global settings of the provider.
+        ├── files/              # keys, certificates, and other files.
+        ├── nodes/              # a directory for node configurations.
+        └── users/              # public key information for privileged sysadmins.
+
+A provider instance directory contains everything needed to manage all the servers that compose a provider's infrastructure. Because of this, you can use normal git development work-flow to manage your provider instance.
+
+Command line program
+-------------------------------
+
+The command line program `leap` is used by sysadmins to manage everything about a service provider's infrastructure. Except when creating a new provider instance, `leap` is run from within the directory tree of a provider instance.
+
+The `leap` command line has many capabilities, including:
+
+* create an initial provider instance
+* create, initialize, and deploy nodes (e.g. servers)
+* manage keys and certificates
+* query information about the node configurations
+
+Traditional system configuration automation systems, like puppet or chef, deploy changes to servers using a pull method. Each server pulls a manifest from a central master server and uses this to alter the state of the server.
+
+Instead, LEAP uses a masterless push method: The user runs 'leap deploy' from the provider instance directory on their desktop machine to push the changes out to every server (or a subset of servers). LEAP still uses puppet, but there is no central master server that each node must pull from.
+
+One other significant difference between LEAP and typical system automation is how interactions among servers are handled. Rather than store a central database of information about each server that can be queried when a recipe is applied, the `leap` command compiles a static representation of all the information a particular server will need in order to apply the recipes. In compiling this static representation, `leap` can use arbitrary programming logic to query and manipulate information about other servers.
+
+These two approaches, masterless push and pre-compiled static configuration, allow the sysadmin to manage a set of LEAP servers using traditional software development techniques of branching and merging, to more easily create local testing environments using virtual servers, and to deploy without the added complexity and failure potential of a master server.
+
+Server-side Components
+=======================================
+
+These are components where most of the code and logic runs on a server (as opposed to client-side components, where most of the code runs on the client).
+
+Databases
+------------------------------------
+
+All user data is stored using BigCouch, a decentralized and high-availability version of CouchDB.
+
+There are three "main" databases:
+
+* users -- stores basic information about each user, such as their username, a SRP password verifier, and any email aliases or forwards.
+* tickets -- database of help desk tickets.
+* client_certificates -- a pool of short-lived client x.509 certificates that are distributed to authenticated clients when their client certificate has expired.
+
+Additionally, each user may have multiple databases for storing client-encrypted data, such as email messages.
+
+Like many NoSQL databases, BigCouch is inspired by [Amazon's Dynamo paper](http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf) and works by sharding each database among many servers using a circular ring hash. The number of shards might be greater than the number of servers, in which case each server would have multiple shards of the same database. Each server in the BigCouch cluster appears to contain the entire database, but actually it will just proxy the request to the actual database that has the content (if it does not have the document itself).
+
+Important BigCouch constants:
+
+* Q -- The number of shards over which a database will spread.
+* N -- The number of redundant copies of each document. Default is 3.
+* W -- The number of document copies that must be saved before document is 'written'. Default is 2.
+* R -- The number of document copies that must be found before document is 'read'. Default is 2.
+* Z -- The number of zones in the cluster. Each zone will have a complete copy of all the data. Default is 1.
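+
+A minimal sketch of how these constants interact, assuming the usual Dynamo-style rule that a read is guaranteed to overlap the latest successful write when R + W > N. The helper names and the hash-to-shard mapping are illustrative assumptions, not BigCouch's actual implementation.
+
```python
import hashlib


def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """True when every read quorum shares at least one copy with every write quorum."""
    return r + w > n


def shard_for(doc_id: str, q: int) -> int:
    """Map a document id onto one of q shards (illustrative hashing only)."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % q


# The defaults listed above: N=3, W=2, R=2.
print(quorum_overlaps(n=3, w=2, r=2))
# Which of Q=8 shards a hypothetical document id would land on.
print(shard_for("user-1234", q=8))
```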
+
+In LEAP, every service that needs to interact with the database runs a local HTTP load balancer that distributes database requests randomly to the BigCouch cluster. If a BigCouch node dies, the load balancer detects this and takes it out of rotation (this usage is typical of BigCouch installations).
+
+Web App
+------------------------------
+
+The LEAP Web App provides the following functions:
+
+* User registration and management
+* Help tickets
+* Client certificate renewal
+* Webfinger access to user's public keys
+* Email alias and forwarding
+
+Written in: Ruby, Rails.
+
+The Web App communicates with:
+
+* CouchDB is used for all data storage.
+* Web browsers of users accessing the user interface in order to edit their settings or fill out help tickets. Additionally, admins may delete users.
+* LEAP Clients access the web app's REST API in order to register new users, authenticate existing ones, and renew client certificates.
+
+Nickserver
+------------------------------
+
+Written in: Ruby
+Libraries: EventMachine, GPG
+
+Nickserver is the opposite of a key server. A key server allows you to look up keys, and the UIDs associated with a particular key. A nickserver allows you to query a particular 'nick' (e.g. username@example.org) and get back relevant public key information for that nick.
+
+Nickserver has the following properties:
+
+* Written in Ruby, licensed GPLv3
+* Lightweight and scalable (high concurrency, reasonable latency)
+* Uses asynchronous network IO for both server and client connections (via EventMachine)
+* Attempts to reply to queries using four different methods:
+ * Cached key in CouchDB
+ * Webfinger
+ * DNS
+ * HKP keyserver pool (https://hkps.pool.sks-keyservers.net)
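+
+The four lookup methods above can be pictured as a chain of resolvers. The sketch below assumes they are tried in the order listed and that the first hit wins; the source does not state the order. Nickserver itself is written in Ruby, so this Python version is purely illustrative, and every name in it (`lookup_key`, `make_resolver`, the sample tables) is hypothetical.
+
```python
from typing import Callable, Dict, Iterable, Optional

# A resolver takes a nick and returns key material, or None if it has nothing.
Resolver = Callable[[str], Optional[str]]


def lookup_key(nick: str, resolvers: Iterable[Resolver]) -> Optional[str]:
    """Return the first key any resolver finds for nick, trying them in order."""
    for resolve in resolvers:
        key = resolve(nick)
        if key is not None:
            return key
    return None


def make_resolver(table: Dict[str, str]) -> Resolver:
    """Build a fake resolver backed by a dict (stand-in for a real lookup)."""
    return table.get


# Hypothetical stand-ins for: CouchDB cache, Webfinger, DNS, HKP pool.
cached = make_resolver({"alice@example.org": "alice-key-from-cache"})
webfinger = make_resolver({"bob@example.org": "bob-key-from-webfinger"})
dns = make_resolver({})
hkp_pool = make_resolver({"carol@example.org": "carol-key-from-hkp"})
chain = [cached, webfinger, dns, hkp_pool]

print(lookup_key("bob@example.org", chain))
```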
+
+Why bother writing Nickserver instead of just using the existing HKP keyservers?
+
+* Keyservers are fundamentally different: Nickserver is a registry of 1:1 mapping from nick (uid) to public key. Keyservers are directories of public keys, which happen to have some uid information in the subkeys, but there is no way to query for an exact uid.
+* Support clients: the goal is to provide clients with a cloud-based method of rapidly and easily converting nicks to keys. Client code can stay simple by pushing more of the work to the server.
+* Enhancements over keyservers: the goal with Nickserver is to support future enhancements like webfinger, DNS key lookup, mail-back verification, network perspective, and fast distribution of short lived keys.
+* Scalable: the goal is for a service that can handle many simultaneous requests very quickly with low memory consumption.
+
+Miscellaneous
+------------------------------
+
+A LEAP service provider might also run servers with the following services:
+
+* git -- private git repository hosting.
+* Domain Name Server -- Authoritative name server for the provider's domain.
+* CA Daemon -- headless daemon that generates x.509 certificates and puts them in the distributed database.
+
+Client-side Components
+======================================
+
+Most of the code and processing for these components happens on the client-side, although they all include some interaction with cloud services.
+
+Soledad
+------------------------------
+
+Soledad stands for "Synchronization Of Locally Encrypted Data Among Devices". On the client side, Soledad is responsible for client-encrypting user data, keeping it in sync with the copy in the cloud, and for providing local applications with a simple API for data storage. This "client-side Soledad" is essentially a local database that is kept in sync with the cloud. The "Soledad Server" is the cloud-based component that the client syncs with.
+
+Written in: Python (on desktops and servers), possibly Java (on android, not yet written).
+
+Libraries used:
+
+* Client-side: U1DB, Sqlite, Sqlcipher, GPG.
+* Server: U1DB (forked), CouchDB.
+
+Client-side Soledad communicates with:
+
+* Other client application code, providing a storage API.
+* Soledad Server via the U1DB synchronization protocol.
+
+Soledad Server communicates with:
+
+* LEAP Client via the U1DB synchronization protocol
+* CouchDB or OpenStack Object Storage for backend storage.
+
+Client-side Soledad Notes:
+
+* Soledad is a modification of the U1DB Python reference implementation with changes to support client-side encryption and to replace sqlite with sqlcipher.
+* Local data is stored on disk as an SQLite DB file(s) that is block-encrypted with sqlcipher (AES128).
+* Before being synced to the server, a document is block-encrypted using a symmetric key composed from HMAC of the document id and a long secret (Soledad secret).
+* Soledad secret is stored on-disk encrypted to the user's OpenPGP key. A copy is stored on the server as well. The same secret is shared among all the clients a user has activated.
+* Soledad inherits these traits from U1DB:
+ * The storage API used by client code is similar to couchdb (schema-less document storage with indexes).
+ * Application code using Soledad is responsible for resolving sync conflicts.
+
+Secrets Manager
+------------------------------
+
+Not yet written.
+
+Written in: Python
+Libraries used: GPG, GnuTLS
+
+Communicates with: Nickserver (cloud), Soledad (local).
+
+The Secrets Manager is a library that exposes to local client code an API for managing cryptographic material. It is responsible for:
+
+* private secrets: the user's private and public keys and certificates.
+* public keys: discovering, registering, and trusting the public keys of other people.
+* creation: creating keys as needed.
+* renewal: fetch a new client certificate when the current one is about to expire.
+* recovery: allow the user to recover their data if they lose everything except for a recovery code.
+* crypto hardware: allow the user to unlock secrets via an OpenPGP smart card like cryptostick.
+
+**Example secrets**
+
+* A user's OpenPGP keypair
+* The symmetric key used to encrypt local data (used by sqlcipher)
+* The client certificate used to auth with OpenVPN gateway.
+* The client certificate used to auth with the SMTP gateway.
+
+**Public key management**
+
+Some functionality of public key management:
+
+* Discover the public keys of recipients and senders via a Nickserver.
+* "Register" the discovered keys, either using a federated path through the provider, directly, or via trust on first use (TOFU). For now, we will start initially with TOFU.
+* Allow the user to choose between two competing keys when a recipient has multiple candidate keys.
+* Allow the user to specify keys that should not be used.
+* Allow the user to manually specify a user's public key.
+
+**Recovery**
+
+* Allow the user to generate and print out a recovery code. This creates a record on the server, in an anonymized way, that can be used to restore all the secrets stored by the key/secret manager and thus recover all your data. The provider should not know what recovery information maps to which user.
+* Eventually, perhaps allow the user to specify other users who have the power to recover their lost secrets in the event that the user forgets their password.
+* Allow the user to enter this recovery code when they have lost their username and password. If this is enabled, the user's private keys are stored in the cloud, albeit encrypted and anonymized.
+* Give some users the option of full recovery via email reset by storing the user's password on the server. This would be a very low security option, but one that some users may wish to opt-in for.
+
+**Notes**
+
+* All secrets are stored in Soledad, except the secret to unlock Soledad storage. This way, all clients will have access to the same secrets. For some things, like validated public keys, this is exactly what we want. For other things, this could be a problem, and should be refined in future revisions.
+* The current scheme is to store the user's private keys and private secrets in their Soledad storage. This allows a user to login with a different device and be all set up. There are, however, certainly problems with this approach.
+
+
+Bootstrap
+------------------------------
+
+Parts of this are written.
+
+Written in: Python
+
+* Register new accounts or authenticate via the REST API, using SRP.
+* Download the provider's definition file, and various service definition files.
+* Validate the CA certificate of the service provider.
+* If using an existing account on a new device, fetch user's secrets from the cloud (not yet written).
+* If creating a new account, generate a key pair and store in the cloud (not yet written).
+
+Update Manager
+------------------------------
+
+Not yet written.
+
+Handles upgrading the client by downloading and installing signed code.
+
+Three goals:
+
+* Frequent Updates: we want to be able to push out small and frequent updates should the need arise.
+* Secure Updates: we want to ensure that the update mechanism cannot be used as an attack vector.
+* Third Party Updates: we want a third party to be responsible for updates, NOT the service provider itself.
+
+End User Services
+=========================================
+
+Email
+------------------------------
+
+Not yet working, some of the parts are written.
+
+Written in: Python
+
+Email in the client consists of three parts:
+
+* SMTP Proxy: for outgoing mail.
+ * Communicates with user's MUA (local), Key Manager (local), Nickserver (cloud), and SMTP relay (cloud).
+* Message Receiver: for incoming mail.
+ * Communicates with Soledad (local), Key Manager (local).
+* IMAP Server: for reading and writing to user's mailbox.
+ * Communicates with Soledad (local), user's MUA (local).
+
+Outgoing mail workflow:
+
+* LEAP client runs a thin SMTP proxy on the user's device, bound to localhost.
+* User's MUA is configured to use localhost for outgoing SMTP.
+* When SMTP proxy receives an email from MUA
+ * SMTP proxy queries Key Manager for the user's private key and public keys of all recipients
+ * Message is signed by sender and encrypted to recipients.
+ * If recipient's key is missing, email goes out in cleartext (unless user has configured option to send only encrypted email)
+ * Finally, message is relayed to provider's SMTP relay
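+
+The encrypt-or-cleartext decision in the outgoing workflow can be sketched as follows. This is only an illustration: the function name and return values are hypothetical, and the source does not say what happens to a message when a key is missing and the user has opted for encrypted-only mail, so "refuse" is a placeholder assumption.
+
```python
from typing import Dict, List


def plan_outgoing(recipients: List[str],
                  public_keys: Dict[str, str],
                  encrypted_only: bool = False) -> str:
    """Decide how a message leaves the SMTP proxy.

    public_keys maps recipient address -> key material already known locally.
    """
    missing = [r for r in recipients if r not in public_keys]
    if not missing:
        return "sign+encrypt"   # every recipient has a key
    if encrypted_only:
        return "refuse"         # assumption: behaviour is not specified in the source
    return "cleartext"          # a key is missing and cleartext is allowed


keys = {"alice@example.org": "alice-public-key"}
print(plan_outgoing(["alice@example.org"], keys))
print(plan_outgoing(["alice@example.org", "bob@example.org"], keys))
```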
+
+Incoming email workflow:
+
+* Incoming message is received by provider's MX servers.
+* Message is encrypted to the user's public key (if not already so), and stored in the user's incoming message queue.
+* Message queue is synced to client device via Soledad.
+* "Message Receiver" in the LEAP Client empties message queue, decrypting each message and saving it in the user's inbox, stored in local Soledad database.
+* Local database gets client-encrypted and sync'ed to cloud and other devices owned by the user via Soledad.
+
+Mail storage workflow:
+
+* LEAP client runs a thin IMAP server on the user's device, bound to localhost.
+* User's MUA is configured to use localhost for the mail account.
+* Local IMAP server runs against a local database of the user's email data (accessed via Soledad).
+* Soledad will sync changes made to mailboxes with the cloud and other clients.
+
+Encrypted Internet
+------------------------------
+
+The goal behind the encrypted internet service is to provide an automatic, always on, trouble free way to encrypt a user's network traffic. For now, we use OpenVPN for the transport (OpenVPN uses TLS for session negotiation and IPSec for data).
+
+Written in: C (OpenVPN binary), Python (desktop controlling code), Java (android controlling code)
+Libraries: QT
+Uses: OpenVPN
+
+Communicates with:
+
+* All traffic is routed through one of the provider's OpenVPN gateways
+* OpenVPN binary and LEAP client communicate via a telnet administration interface to OpenVPN.
+* Client discovers gateways and fetches client certificate from the provider's HTTP API.
+
+User Interface:
+
+* Initial connection attempt takes place in the first run wizard, displaying any errors along the way.
+* After first run, the client will display the status of the encrypted internet in the task tray (windows, linux), menu bar (mac), or notification drawer (android).
+* The three main UI functions of the encrypted internet will be: connect/disconnect, choose gateway, view errors.
+
+Notes:
+
+* OpenVPN must be started with superuser privileges (or have the ability to execute network changes as superuser). Afterwards, it can drop the privileges.
+* OpenVPN authentication with the gateway uses an x.509 client certificate. This certificate is short lived, and is acquired by the client from the provider's HTTP API as needed.
+
+Workflow:
+
+* user installs client
+* on first run
+ * client downloads and validates service provider's definition file, CA cert, and encrypted internet service definition file.
+ * user registers new account or authenticates with provider's webapp REST API
+ * SRP is used, server never sees the password and does not store a hash of the password.
+ * if registering, new record is created for user in distributed users db.
+* client gets a new client certificate from webapp, if missing or expired
+ * authenticate via SRP with webapp
+ * webapp retrieves client cert from a pool of pre-generated certificates.
+ * cert pool is filled as needed by background CA daemon.
+* client connects to openvpn gateway, picked from among those listed in service definition file, authenticating with client certificate.
+* by default, when user starts computer the next time, client autoconnects. \ No newline at end of file
diff --git a/docs/design/soledad.md b/docs/design/soledad.md
new file mode 100644
index 0000000..0e21016
--- /dev/null
+++ b/docs/design/soledad.md
@@ -0,0 +1,367 @@
+@title = 'Soledad'
+@summary = 'A server daemon and client library to provide client-encrypted application data that is kept synchronized among multiple client devices.'
+@toc = true
+
+Introduction
+=====================
+
+Soledad is a system that allows client applications to securely share synchronized document databases. Soledad is based on Ubuntu's U1DB, "a cross-platform, cross-device, syncable database API", but with the addition of client-side encryption of documents stored on the server, and encryption of the local database replica. Soledad is an acronym of "Synchronization of Locally Encrypted Documents Among Devices" and means "solitude" in Spanish.
+
+Key aspects of Soledad include:
+
+* **Client and server:** Soledad includes a server daemon and client application library.
+* **Client-side encryption:** Soledad puts very little trust in the server by encrypting all data before it is synchronized to the server and by limiting ways in which the server can modify the user's data.
+* **Local storage:** All data cached locally is stored in an encrypted database.
+* **Document database:** An application using the Soledad client library is presented with a document-centric database API for storage and sync. Documents may be indexed, searched, and versioned.
+
+The current reference implementation of Soledad is written in Python and distributed under a GPLv3 license.
+
+Goals
+======================
+
+Security goals
+--------------------------------------
+
+* *Client-side encryption:* Before any data is synced to the cloud, it should be encrypted/decrypted on the client device.
+* *Encrypted local storage:* Any data cached or stored on the client should be stored in an encrypted format.
+* *Resistant to offline attacks:* Data stored on the server should be highly resistant to offline attacks (i.e. an attacker with a static copy of data stored on the server would have a very hard time discerning much from the data).
+* *Resistant to online attacks:* Analysis of storing and retrieving data should not leak potentially sensitive information.
+* *Resistance to data tampering:* The server should not be able to provide the client with old or bogus data for a document.
+
+Synchronization goals
+-------------------------------------
+
+* *Consistency:* multiple clients should all get sync'ed with the same data.
+* *Sync flag:* the ability to partially sync data. For example, so a mobile device doesn't need to sync all email attachments.
+* *Multi-platform:* supports both desktop and mobile clients.
+* *Quota:* the ability to identify how much storage space a user is taking up.
+* *Scalable cloud:* distributed master-less storage on the cloud side, with no single point of failure.
+* *Conflict resolution:* conflicts are flagged and handed off to the application logic to resolve.
+
+Usability goals
+---------------------------------
+
+* *Availability*: the user should always be able to access their data.
+* *Recovery*: there should be a mechanism for a user to recover their data should they forget their password.
+
+Known limitations
+------------------------------
+
+* Currently, the server knows when the contents of a document have changed.
+* Currently, there is no facility for sharing documents among multiple users.
+
+Non-goals
+---------------------------
+
+* Soledad is not for filesystem synchronization, storage or backup. It provides an API for application code to synchronize and store arbitrary schema-less JSON documents in one big flat document database. One could model a filesystem on top of Soledad, but it would be a bad fit.
+* Soledad is not intended for decentralized peer-to-peer synchronization, although the underlying synchronization protocol does not require a server. Soledad takes a cloud approach in order to ensure that a client has quick access to an available copy of the data.
+
+Related software
+==================================
+
+[Crypton](https://crypton.io/) - Similar goals to Soledad, but in javascript for HTML5 applications.
+
+[U1DB](http://pythonhosted.org/u1db/) - Similar API as Soledad, without encryption.
+
+Protocol
+===================================
+
+Storage secret
+-----------------------------------
+
+When a client application first wants to use Soledad, it must provide the user's password to unlock the `storage_secret`. The `storage_secret` is a long, randomly generated symmetric key used to encrypt both the documents stored on the server and the local replica of these documents.
+
+TO ADD: example code
+
+The `storage_secret` is saved locally on disk in the file `soledad.json`, block encrypted using a derived key. The derived key is obtained from the user's password.
+
+The file `soledad.json` has a field `storage_secrets` that looks like so:
+
+    {
+      "storage_secrets": {
+        "<secret_id>": {
+          "kdf": "scrypt",
+          "kdf_salt": "400$8$5fb$61b499fe3366d947",
+          "kdf_length": 128,
+          "cipher": "aes128",
+          "length": 512,
+          "secret": "<encrypted storage_secret 1>"
+        }
+      }
+    }
+
+The `storage_secrets` entry is a map that stores information about each storage key, indexed by the id of each key. For each storage key, the following fields are stored:
+
+* `kdf`: the key derivation function to use. Only scrypt is currently supported (so for now, this value is ignored).
+* `kdf_salt`: the salt used in the kdf. The salt for scrypt is not random, but encodes important parameters like the limits for time and memory.
+* `kdf_length`: the length of the derived key resulting from the kdf.
+* `secret`: the encrypted `storage_secret`, created by `sym_encrypt(cipher, storage_secret, derived_key)` (base64 encoded).
+* `length`: the length of `storage_secret`, when not encrypted.
+* `cipher`: what cipher to use to encrypt `storage_secret`. It must match kdf_length (i.e. the length of the derived_key).
+* `secret_id`: a handle used to refer to a particular storage_secret and equal to `md5(storage_secret)`.
+
+Other variables:
+
+* `derived_key` is equal to `kdf(user_password, kdf_salt, kdf_length)`.
+* `storage_secret` is equal to `sym_decrypt(cipher, secret, derived_key)`.
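+
+A minimal sketch of the `derived_key` and `secret_id` computations, using only the Python standard library. Several details are assumptions rather than part of this specification: `kdf_length` is read as bits (so 128 becomes a 16-byte key), the scrypt cost parameters `n`, `r`, and `p` are common defaults, and the salt is passed as raw bytes instead of the parameter-encoding string format shown above. The `sym_encrypt`/`sym_decrypt` step is left out because the block cipher is not yet settled.
+
```python
import hashlib
import os


def derive_key(user_password: str, kdf_salt: bytes, kdf_length_bits: int = 128) -> bytes:
    """derived_key = kdf(user_password, kdf_salt, kdf_length), with scrypt as the kdf."""
    return hashlib.scrypt(
        user_password.encode("utf-8"),
        salt=kdf_salt,
        n=2 ** 14, r=8, p=1,            # assumed cost parameters
        dklen=kdf_length_bits // 8,     # assumption: kdf_length is in bits
    )


def secret_id(storage_secret: bytes) -> str:
    """secret_id = md5(storage_secret), rendered as hex."""
    return hashlib.md5(storage_secret).hexdigest()


storage_secret = os.urandom(512 // 8)          # "length": 512, assumed to be bits
kdf_salt = bytes.fromhex("61b499fe3366d947")   # arbitrary example bytes
derived_key = derive_key("example password", kdf_salt)

print(len(derived_key), secret_id(storage_secret))
```
+
+Because only the `derived_key` depends on the password, changing the password means re-encrypting the one `secret` field, not the documents themselves.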
+
+In the current version, only one `storage_secret` is supported.
+
+The `storage_secret` is shared among all devices with access to a particular user's Soledad database. See [Recovery and bootstrap](#Recovery.and.bootstrap) for how the storage_secret is initially installed on a device.
+
+We don't use the derived_key as the storage_secret because we want the user to be able to change their password without needing to re-key.
+
+TO DO: settle on a block cipher.
+
+Unresolved:
+
+* How do devices receive updates if the storage_secret changes?
+
+Document API
+-----------------------------------
+
+This is unchanged and identical to the [API used in U1DB](http://pythonhosted.org/u1db/reference-implementation.html).
+
+* Document storage: `create_doc()`, `put_doc()`, `get_doc()`.
+* Synchronization between database replicas: `sync()`.
+* Document indexing and searching: `create_index()`, `list_indexes()`, `get_from_index()`, `delete_index()`.
+* Document conflict resolution: `get_doc_conflicts()`, `resolve_doc()`.
+
+TO ADD: code examples
+
+Document encryption
+------------------------
+
+Before a JSON document is synced with the server, it is transformed into a document that looks like so:
+
+    {
+      "scheme": "aes128",
+      "secret_id": "1",
+      "ciphertext": "xxxxxxxxx",
+      "mac": "xxxxxxx"
+    }
+
+About these fields:
+
+* `ciphertext`: The original JSON document, encrypted and base64 encoded. `ciphertext` is equal to `sym_encrypt(cipher, content, document_secret)`.
+* `scheme`: Information about the block cipher that is used to encrypt this document.
+* `secret_id`: The id of the storage_secret that was used to generate the `document_secret`.
+* `mac`: Defined as `HMAC(doc_id|rev|ciphertext, document_secret)`. The purpose of this field is to prevent the server from tampering with the stored documents.
+
+Other variables:
+
+* `document_secret`: equal to `HMAC(doc_id, storage_secret)`. This value is unique for every document and only kept in memory. We use document_secret instead of simply storage_secret in order to hinder possible derivation of storage_secret by the server. Every `doc_id` is unique.
+* `content`: equal to `sym_decrypt(cipher, ciphertext, document_secret)`.
+
+When receiving a document with the above structure from the server, Soledad client will verify that the mac is correct, decrypt the `ciphertext` to find `content`, and then store `content` as a cleartext document in the local database replica.
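+
+A minimal sketch of the `document_secret` and `mac` definitions above, using the standard library. The hash function is not specified in this document, so SHA-256 is an assumption, as is treating `|` as a literal separator character between the fields. The helper names are hypothetical.
+
```python
import hashlib
import hmac


def document_secret(doc_id: str, storage_secret: bytes) -> bytes:
    """document_secret = HMAC(doc_id, storage_secret), keyed by storage_secret."""
    return hmac.new(storage_secret, doc_id.encode("utf-8"), hashlib.sha256).digest()


def document_mac(doc_id: str, rev: str, ciphertext: str, doc_secret: bytes) -> str:
    """mac = HMAC(doc_id|rev|ciphertext, document_secret), as a hex string."""
    message = "|".join([doc_id, rev, ciphertext]).encode("utf-8")
    return hmac.new(doc_secret, message, hashlib.sha256).hexdigest()


def verify_mac(doc_id: str, rev: str, ciphertext: str, mac: str, doc_secret: bytes) -> bool:
    """Constant-time check performed before any decryption is attempted."""
    expected = document_mac(doc_id, rev, ciphertext, doc_secret)
    return hmac.compare_digest(expected, mac)


secret = document_secret("doc-1", b"example storage secret")
mac = document_mac("doc-1", "rev-1", "base64-ciphertext", secret)
print(verify_mac("doc-1", "rev-1", "base64-ciphertext", mac, secret))
```
+
+Including `rev` in the MAC input is what lets the client notice when the server returns an older revision's ciphertext under a newer revision number.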
+
+Document synchronization
+-----------------------------------
+
+Soledad follows the U1DB synchronization protocol, with two changes:
+
+* Soledad adds the ability to flag some documents so they are not synchronized by default.
+* Soledad will refuse to synchronize a document if it is encrypted and the MAC is incorrect.
+
+TO ADD: code examples
+
+Document IDs
+--------------------
+
+Like U1DB, Soledad allows the programmer to use whatever ID they choose for each document. However, it is best practice to let the library choose random IDs for each document so as to ensure you don't leak information. In other words, leave the second argument to `create_doc()` empty.
+
+UNRESOLVED: perhaps Soledad should forbid custom document IDs.
+chiiph: I don't think we should forbid this, it's handy for certain cases and the downside isn't too problematic.
+
+Re-keying
+-----------
+
+Sometimes there is a need to change the `storage_secret`. Rather than re-encrypt every document, Soledad implements a system called "lazy revocation" where a new storage_secret is generated and used for all subsequent encryption. The old storage_secret is still retained and used when decrypting older documents that have not yet been re-encrypted with the new storage_secret.
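+
+One way to picture lazy revocation is a small ring of secrets indexed by `secret_id`. The sketch below is hypothetical (the class and method names are not part of Soledad) and only shows secret selection, not the encryption itself.
+
```python
import hashlib
import os
from typing import Dict, Optional, Tuple


class SecretRing:
    """Keeps every storage_secret, but encrypts only with the newest one."""

    def __init__(self) -> None:
        self._secrets: Dict[str, bytes] = {}
        self.current_id: Optional[str] = None

    def rotate(self) -> str:
        """Generate a new storage_secret and make it the one used for encryption."""
        secret = os.urandom(64)
        sid = hashlib.md5(secret).hexdigest()   # secret_id = md5(storage_secret)
        self._secrets[sid] = secret
        self.current_id = sid
        return sid

    def for_encryption(self) -> Tuple[str, bytes]:
        """New and re-encrypted documents always use the current secret."""
        if self.current_id is None:
            raise RuntimeError("no storage_secret generated yet")
        return self.current_id, self._secrets[self.current_id]

    def for_decryption(self, secret_id: str) -> bytes:
        """Older documents name the secret they were encrypted with."""
        return self._secrets[secret_id]


ring = SecretRing()
old_id = ring.rotate()
new_id = ring.rotate()
print(ring.for_encryption()[0] == new_id, len(ring.for_decryption(old_id)))
```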
+
+Implementation status: not yet.
+
+TO DO: code example
+
+Authentication
+-----------------------
+
+Unlike U1DB, Soledad only supports token authentication and does not support OAuth. Soledad itself does not handle authentication. Instead, this job is handled by a thin middleware layer running in front of the Soledad server daemon.
+
+Recovery and bootstrap
+------------------------------------------
+
+In order to bootstrap Soledad on a new device, the user only needs their login name and password. Everything else is downloaded from the server.
+
+**Recovery database**
+
+In order to support this functionality, the Soledad client stores a recovery document in a special recovery database. This database is shared among all users.
+
+The recovery database supports two functions:
+
+* `get_doc(doc_id)`
+* `put_doc(doc_id, recovery_document_content)`
+
+**Recovery document**
+
+An example recovery document:
+
+    {
+      "doc_id": "xxxxx",
+      "kdf": "scrypt",
+      "kdf_salt": "400$8$5fb$61b499fe3366d947",
+      "kdf_length": 128,
+      "cipher": "aes128",
+      "soledad": "xxxxx"
+    }
+
+About these fields:
+
+* `doc_id` is determined by the client and computed from `hmac(username@domain, user_password)`.
+* `soledad`: the encrypted `soledad.json`, created by `sym_encrypt(cipher, contents(soledad.json), derived_key)` (base64 encoded).
+* `kdf`: the key derivation function to use. Only scrypt is currently supported (so for now, this value is ignored).
+* `kdf_salt`: the salt used in the kdf. The salt for scrypt is not random, but encodes important parameters like the limits for time and memory.
+* `kdf_length`: the length of the derived key resulting from the kdf.
+* `cipher`: what cipher to use to encrypt `soledad`. It must match kdf_length (i.e. the length of the derived_key).
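+
+A minimal sketch of how a client could compute the recovery `doc_id`. It follows the argument order used elsewhere in this document, `HMAC(message, key)`, so the password is the HMAC key. SHA-256 and the hex encoding are assumptions, and the function name is hypothetical.
+
```python
import hashlib
import hmac


def recovery_doc_id(address: str, user_password: str) -> str:
    """doc_id = hmac(username@domain, user_password), keyed by the password."""
    return hmac.new(
        user_password.encode("utf-8"),
        address.encode("utf-8"),
        hashlib.sha256,          # assumed hash function
    ).hexdigest()


print(recovery_doc_id("username@example.org", "example password"))
```
+
+Since the `doc_id` can be recomputed from the login name and password alone, a new device can locate the recovery document without the server knowing which user it belongs to.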
+
+**Authentication**
+
+Like other Soledad functions, access to the recovery database requires token authentication. However, the recovery database is shared among all users. Any user can query for any `doc_id`. The purpose of this is to allow the server to not know which user corresponds to which recovery document.
+
+To mitigate the vulnerability created by this design, responses to queries of the recovery database have a very long delay.
+
+TODO: come up with a better authentication scheme.
+TODO: determine the response delay.
+
+
+Client Reference Implementation
+===================================
+
+Dependencies:
+
+* [U1DB](https://launchpad.net/u1db) provides an API and protocol for synchronised databases of JSON documents.
+* [SQLCipher](http://sqlcipher.net/) provides a block-encrypted SQLite database used for local storage.
+* python-gnupg
+
+Local storage
+--------------------------
+
+The U1DB reference implementation in Python has an SQLite backend that implements the object store API over a common SQLite database residing in a local file. To allow for encrypted local storage, Soledad adds a SQLCipher backend, built on top of U1DB's SQLite backend, which adds the [SQLCipher API](http://sqlcipher.net/sqlcipher-api/) to U1DB.
+
+**Responsibilities**
+
+The SQLCipher backend is responsible for:
+
+* Providing the SQLCipher API for U1DB (`PRAGMA` statements that control encryption parameters).
+* Guaranteeing that the local database used for storage is indeed encrypted.
+* Guaranteeing secure synchronization:
+ * All data being sent to a remote replica is encrypted with a symmetric key before being sent.
+ * Ensure that data received from remote replica is indeed encrypted to a symmetric key when it arrives, and then that it is decrypted before being included in the local database replica.
+* Correctly representing and handling new Document properties, such as the sync flag.
+
+The Soledad `storage_key` is used directly as the key for the SQLCipher encryption layer. SQLCipher supports the use of raw 256-bit keys if provided as a 64 character hex string. This will skip the key derivation step (PBKDF2), which is redundant in our case. For example:
+
+    sqlite> PRAGMA key = "x'2DD29CA851E7B56E4697B0E1F08507293D761A05CE4D1B628663F411A8086D99'";
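+
+A small sketch of building that statement from a raw key in Python. The helper name is hypothetical; it only formats the string and does not open a database.
+
```python
def sqlcipher_key_pragma(raw_key: bytes) -> str:
    """Format a 32-byte key as SQLCipher's raw-key PRAGMA (64 hex characters)."""
    if len(raw_key) != 32:
        raise ValueError("SQLCipher raw keys must be exactly 256 bits (32 bytes)")
    return "PRAGMA key = \"x'%s'\";" % raw_key.hex().upper()


example_key = bytes.fromhex(
    "2DD29CA851E7B56E4697B0E1F08507293D761A05CE4D1B628663F411A8086D99"
)
print(sqlcipher_key_pragma(example_key))
```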
+
+**Classes**
+
+SQLCipher backend classes:
+
+* `SQLCipherDatabase`: An extension of `SQLitePartialExpandDatabase` used by Soledad Client to store data locally using SQLCipher. It implements the following:
+ * Requires a password to instantiate the database.
+ * Verifies that the database instance is indeed encrypted.
+ * Uses a `LEAPSyncTarget` to encrypt content before synchronizing over HTTP.
+ * Provides a "syncable" option for documents (users can mark documents as not syncable, so they do not propagate to the server).
+
+Encrypted synchronization target
+--------------------------------------------------
+
+To allow for database synchronization among devices, Soledad uses the following conventions:
+
+* Centralized synchronization scheme: Soledad clients always sync with a server, and never between themselves.
+* The server stores its database in a CouchDB database using a REST API over HTTP.
+* All data sent to the server is encrypted with a symmetric secret before being sent. Note that this ensures all data received by the server and stored in the CouchDB database has been encrypted by the client.
+* All data received from the server is validated as being an encrypted blob, and is then decrypted before being stored in the local database. Note that the local database adds its own encryption layer for the data through SQLCipher.
+
+**Responsibilities**
+
+Provide sync between local and remote replicas:
+
+* Encrypt outgoing content.
+* Decrypt incoming content.
+
+**Classes**
+
+Synchronization-related classes:
+
+* `LEAPDocument`: an extension of `u1db.Document` with methods to:
+ * Return a symmetrically encrypted version of the document's JSON representation.
+ * Set the document's content by symmetrically decrypting an encrypted JSON representation.
+* `LEAPSyncTarget`: an extension of `HTTPSyncTarget` with the following modified methods:
+ * `sync_exchange`: requests the encrypted version of the document's content before sending it over the network.
+ * `_parse_sync_stream`: sets the document's content from the encrypted version right after it arrives as a response from the network.
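+
+The encrypt-on-send and validate-then-decrypt-on-receive convention can be sketched as follows. Everything named here is an assumption for illustration: the functions `encrypt_json` and `decrypt_json`, the wrapper field `_encrypted_json`, and the cipher itself. The document does not specify the cipher, so the sketch uses a toy SHA-256 keystream with an HMAC tag, built from the standard library only; it is not suitable for production and is not Soledad's actual scheme.
+

```python
import hashlib
import hmac
import json
import os

ENC_FIELD = "_encrypted_json"  # hypothetical wrapper field name


def _subkey(key: bytes, label: bytes) -> bytes:
    # Derive separate keys for encryption and authentication.
    return hashlib.sha256(label + key).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stand-in cipher: SHA-256 in counter mode. Illustration only.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt_json(content: dict, key: bytes) -> str:
    """Return the JSON wire form sent to the server: one encrypted blob."""
    plaintext = json.dumps(content, sort_keys=True).encode("utf-8")
    nonce = os.urandom(16)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ciphertext = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    return json.dumps({ENC_FIELD: (nonce + ciphertext + tag).hex()})


def decrypt_json(wire: str, key: bytes) -> dict:
    """Validate that incoming data is an encrypted blob, then decrypt it."""
    doc = json.loads(wire)
    if (not isinstance(doc, dict) or set(doc) != {ENC_FIELD}
            or not isinstance(doc[ENC_FIELD], str)):
        raise ValueError("incoming document is not an encrypted blob")
    raw = bytes.fromhex(doc[ENC_FIELD])
    if len(raw) < 16 + 32:
        raise ValueError("encrypted blob is too short")
    nonce, ciphertext, tag = raw[:16], raw[16:-32], raw[-32:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("encrypted blob failed authentication")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ciphertext))
    plaintext = bytes(a ^ b for a, b in zip(ciphertext, stream))
    return json.loads(plaintext.decode("utf-8"))
```

+
+In this sketch, a plaintext document arriving from the server is rejected before any content is stored, which mirrors the validation step described for incoming data.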
+
+Server Reference Implementation
+======================================================
+
+Dependencies:
+
+* [CouchDB](https://couchdb.apache.org/) for server storage, via the [Python client library](https://pypi.python.org/pypi/CouchDB/0.8).
+* WSGI middleware for authentication.
+* [Twisted](http://twistedmatrix.com/trac/) to run the WSGI application.
+
+CouchDB backend
+-------------------------------
+
+On the server side, Soledad stores its database replicas in CouchDB servers. Soledad's CouchDB backend implementation is built on top of the reference `InMemory` implementation, but forces U1DB data to be stored on and fetched from a remote CouchDB server for every write and read operation, respectively.
+
+The CouchDB backend is responsible for:
+
+* Initializing and maintaining the following U1DB replica data in the database:
+ * Transaction log.
+ * Conflict log.
+ * Synchronization log.
+ * Indexes.
+* Mapping the U1DB API to the CouchDB API.
+
+**Classes**
+
+* `CouchDatabase`: A backend used by Soledad Server to store data in CouchDB.
+* `CouchSyncTarget`: A target for syncing with a CouchDB database.
+* `CouchServerState`: Interface between the WSGI server and the CouchDB backend.
+
+WSGI Server
+-----------------------------------------
+
+The U1DB server reference implementation provides an HTTP API backed by SQLite databases (of minimal usefulness in a production environment!). Soledad extends this with token-authenticated HTTP access to CouchDB databases.
+
+* Soledad uses `twistd` from Twisted to serve its WSGI application.
+* Authentication is done by means of a token.
+* Soledad implements a server-side WSGI middleware that:
+ * Uses the provided token to verify read and write access to each user's private databases and write access to the shared recovery database.
+ * Allows reading from the shared remote recovery database.
+ * Uses CouchDB as its backend.
+
+**Classes**
+
+* `SoledadAuthMiddleware`: Implements the WSGI middleware with token-based authentication as described above.
+* `SoledadApp`: The WSGI application. For now, not different from `u1db.remote.http_app.HTTPApp`.
+
+**Authentication**
+
+The Soledad Server authentication middleware controls access to users' private databases and to the shared recovery database. The Soledad client provides a token to the Soledad server, which checks the validity of the token for the user's session by querying a certain database.
+
+A valid token for this user's session is required for:
+
+* Read and write access to this user's database.
+* Read and write access to the shared recovery database.
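+
+A token check of this kind can be sketched as a small WSGI middleware. This is not the actual `SoledadAuthMiddleware`: the class name `TokenAuthMiddleware`, the `Authorization: Token <user>:<token>` header format, the `soledad.user` environ key, and the `lookup_token` callable (standing in for the database query) are all assumptions made for illustration. Per-database authorization is omitted.
+

```python
import hmac


class TokenAuthMiddleware:
    """Hypothetical WSGI middleware that rejects requests without a valid token."""

    def __init__(self, app, lookup_token):
        self._app = app
        # lookup_token: callable mapping a user id to the token stored for
        # that user's session, or None if there is no valid session.
        self._lookup_token = lookup_token

    def __call__(self, environ, start_response):
        # Assumed header format: "Authorization: Token <user>:<token>".
        header = environ.get("HTTP_AUTHORIZATION", "")
        scheme, _, credentials = header.partition(" ")
        user, _, token = credentials.partition(":")
        expected = self._lookup_token(user) if user else None
        if (
            scheme != "Token"
            or not token
            or expected is None
            # Constant-time comparison avoids leaking token contents via timing.
            or not hmac.compare_digest(token.encode("utf-8"), expected.encode("utf-8"))
        ):
            start_response("401 Unauthorized", [("Content-Type", "text/plain")])
            return [b"invalid or missing token\n"]
        # Pass the authenticated user id on to the wrapped application.
        environ["soledad.user"] = user
        return self._app(environ, start_response)
```

+
+Wrapping the application as `TokenAuthMiddleware(app, lookup_token)` keeps the token check separate from the application itself, in the same spirit as the split between `SoledadAuthMiddleware` and `SoledadApp`.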
+
+Tests
+===================
+
+To be sure the newly implemented backends work correctly, we included in Soledad the U1DB tests that are relevant to the new pieces of code (backends, document, HTTP(S), and sync tests). We also added specific tests for the new functionality we are building.
+
+salt = SCrypt::Engine.generate_salt(:max_time => 10)
+SCrypt::Engine.hash_secret "my grand secret", salt
+