author | elijah <elijah@riseup.net> | 2015-02-18 23:44:14 -0800 |
---|---|---|
committer | elijah <elijah@riseup.net> | 2015-02-18 23:44:14 -0800 |
commit | e53e113dcde3e3686095c3661307efccc5c7e64e (patch) | |
tree | 2d5219d73587750ec478811c65499325a95a04db /pages/docs/design |
initial conversation from leap_doc and leap_website
Diffstat (limited to 'pages/docs/design')
-rw-r--r-- | pages/docs/design/bonafide.text | 290 | ||||
-rw-r--r-- | pages/docs/design/cuttlefish.md | 7 | ||||
-rw-r--r-- | pages/docs/design/en.haml | 5 | ||||
-rw-r--r-- | pages/docs/design/nicknym-draft.md | 578 | ||||
-rw-r--r-- | pages/docs/design/nicknym.md | 498 | ||||
-rw-r--r-- | pages/docs/design/overview.md | 403 | ||||
-rw-r--r-- | pages/docs/design/soledad.md | 423 |
7 files changed, 2204 insertions, 0 deletions
diff --git a/pages/docs/design/bonafide.text b/pages/docs/design/bonafide.text new file mode 100644 index 0000000..1db2311 --- /dev/null +++ b/pages/docs/design/bonafide.text @@ -0,0 +1,290 @@ +@title = 'Bonafide' +@summary = 'Secure user registration, authentication, and provider discovery.' +@toc = true + +h1. Introduction + +Bonafide is a protocol that allows a user agent to communicate with a service provider. It includes the following capabilities: + +* Discover basic information about a provider. +* Register a new account with a provider. +* Discover information about all the services offered by a provider. +* Authenticate with a provider. +* Destroy a user account. + +Bonafide uses SRP (Secure Remote Password) for password-based authentication. + +h1. Configuration Files + +h2. JSON files + +h3. GET /provider.json + +The @provider.json@ file includes basic information about a provider. The URL for provider.json is always the same for all providers (`http://DOMAIN/provider.json`). This is the basic 'bootstrap' file that informs the user agent what URLs to use for the other actions. + +JSON files are always in UTF-8. When loaded in the browser, they are not displayed as UTF-8, so non-ASCII characters look off, but the files are correct. + +Here is an example `provider.json` (from https://demo.bitmask.net/provider.json): + +bc.. { + "api_uri": "https://api.demo.bitmask.net:4430", + "api_version": "1", + "ca_cert_fingerprint": "SHA256: 0f17c033115f6b76ff67871872303ff65034efe7dd1b910062ca323eb4da5c7e", + "ca_cert_uri": "https://demo.bitmask.net/ca.crt", + "default_language": "en", + "description": { + "en": "A demonstration provider." + }, + "domain": "demo.bitmask.net", + "enrollment_policy": "open", + "languages": [ + "en" + ], + "name": { + "en": "Bitmask" + }, + "services": [ + "openvpn" + ] +} + +p. In this document, `API_BASE` consists of `api_uri/api_version` +TODO: define a schema for this file. + +h3. 
GET API_BASE/configs.json + +For each supported service code, `configs.json` lists the available configuration files (there might be more than one for a particular service if there are different formats available). The service codes are listed in "services" in `provider.json`. A provider can use whatever service codes they want, but the user agent will only respond to the ones that it understands. + +For example: + +bc.. { + "openvpn": { + "formats": ["1", "2"], + "1": "eip-service.json", + "2": "eip-service-2.json" + }, + "soledad": { + "formats": ["1"], + "1": "soledad-service.json" + }, + "mx": { + "formats": ["1"], + "1": "smtp-service.json" + } +} + +h3. GET API_BASE/config/eip-service.json + +e.g. https://api.bitmask.net:4430/1/config/eip-service.json + +This file defines the "encrypted internet proxy" capabilities and gateways. + +h2. Keys + +h3. GET /ca.crt + +e.g. https://bitmask.net/ca.crt + +This is the CA certificate for the provider. It is used to validate servers when not using the web browser, in particular for OpenVPN. The URL for this is the same for all providers. The fingerprint for this CA cert should be distributed with the client whenever possible. + + +h1. REST API + +h2. Version + +The API_BASE for the webapp API is constructed from 'api_uri' and 'api_version' from provider.json. + +For example, given this in provider.json: + +<code> +{ + "api_uri": "https://api.bitmask.net:4430", + "api_version": "1" +} +</code> + +The API_BASE would be https://api.bitmask.net:4430/1 + +The API_VERSION will increment if breaking changes to the API are made. The API might be enhanced without incrementing the version. For Version 1 this may include sending additional data in JSON responses. + +h2. Session + +h3. Handshake + +Starts authentication process (values A and B are part of the two step SRP authentication process). 
+ +<table class="table table-bordered table-striped"> +<thead> + <tr> + <th colspan="2">POST API / sessions(.json)</th> + </tr> +</thead> +<tr> + <td>Query params:</td> + <td>@{"A": "12…345", "login": "swq055"}@</td> +</tr> +<tr> + <td>Response:</td> + <td>200 @{"B": "17…651", "salt": "A13CDE"}@</td> +</tr> +</table> + +If the query params leave out the @A@, then no @B@ will be included and only the salt for the given login is sent: + +<table class="table table-bordered table-striped"> +<thead> + <tr> + <th colspan="2">POST API / sessions(.json)</th> + </tr> +</thead> +<tr> + <td>Query params:</td> + <td>@{"login": "swq055"}@</td> +</tr> +<tr> + <td>Response:</td> + <td>200 @{"salt": "A13CDE"}@</td> +</tr> +</table> + +h3. Authenticate + +Finishes authentication handshake, after which the user is successfully authenticated (assuming no errors). This needs to be run after the Handshake. + +<table class="table table-bordered table-striped"> +<thead> + <tr> + <th colspan="2">PUT API / sessions/:login(.json)</th> + </tr> +</thead> +<tr> + <td>Query params:</td> + <td>@{"client_auth": "123…45", "A": "12…345"}@</td> +</tr> +<tr> + <td>Response:</td> + <td>200 @{"M2": "A123BC", "id": "234863", "token": "Aenfw893-zh"}@</td> +</tr> +<tr> + <td>Error Response:</td> + <td>500 @{"field":"password","error":"wrong password"}@</td> +</tr> +</table> + +Variables: + +* *A*: same as A param from the first Handshake request (POST). +* *client_auth*: SRP authentication value M, calculated by client. +* *M2*: Server response for SRP. +* *id*: User id for updating the user record. +* *token*: Unique identifier used to authenticate the user (until the session expires). + +h3. Token Authentication + +Tokens returned by the authentication request are used to authenticate further requests to the API and stored as a hash in the couch database. Soledad directly queries the couch database to ensure the authentication of a user. It compares a hash of the token to the one stored in the database. 
Hashing prevents timing attacks. + +h3. Logout + +Destroy the current session and invalidate the token. Requires authentication. + +<table class="table table-bordered table-striped"> +<thead> + <tr> + <th colspan="2">DELETE API / logout(.json)</th> + </tr> +</thead> +<tr> + <td>Query params:</td> + <td>@{"login": "swq055"}@</td> +</tr> +<tr> + <td>Response:</td> + <td>204 NO CONTENT</td> +</tr> +</table> + +h2. Certificates + +h3. Get a VPN client certificate + +The client certificate will be a "free" cert unless the client is authenticated. + +<table class="table table-bordered table-striped"> +<thead> + <tr> + <th colspan="2">POST API / cert</th> + </tr> +</thead> +<tr> + <td>Response:</td> + <td>200 @PEM ENCODED CERT@</td> +</tr> +</table> + +The response also includes the corresponding private key. + +h3. Get an SMTP client certificate + +The client certificate will include the user's email address, and the fingerprint will be stored with the user's identity and the date it was created. Authentication is required. + +<table class="table table-bordered table-striped"> +<thead> + <tr> + <th colspan="2">POST API / smtp_cert</th> + </tr> +</thead> +<tr> + <td>Response:</td> + <td>200 @PEM ENCODED CERT@</td> +</tr> +</table> + +The response also includes the corresponding private key. + +h2. Users + +h3. Signup + +Create a new user. + +<table class="table table-bordered table-striped"> +<thead> + <tr> + <th colspan="2">POST API / users(.json)</th> + </tr> +</thead> +<tr> + <td>Query params:</td> + <td>@{"user[password_salt]": "5A...21", "user[password_verifier]": "12...45", "user[login]": "that_s_me"}@</td> +</tr> +<tr> + <td>Response:</td> + <td>200 @{"password_salt":"5A...21","login":"that_s_me"}@</td> +</tr> +</table> + +h3. Update user record + +Update information about the user. Requires Authentication. 
+ +<table class="table table-bordered table-striped"> +<thead> + <tr> + <th colspan="2">PUT API /users/:uid(.json)</th> + </tr> +</thead> +<tr> + <td>Query params:</td> + <td>@{"user[param1]": "value1", "user[param2]": "value2" }@</td> +</tr> +<tr> + <td>Response:</td> + <td>204 @NO CONTENT@</td> +</tr> +</table> + +Possible parameters to update: + +* @login@ (requires @password_verifier@) +* @password_verifier@ combined with @salt@ +* @public_key@ diff --git a/pages/docs/design/cuttlefish.md b/pages/docs/design/cuttlefish.md new file mode 100644 index 0000000..6b2c0f5 --- /dev/null +++ b/pages/docs/design/cuttlefish.md @@ -0,0 +1,7 @@ +@title = 'Cuttlefish' +@toc = true +@summary = "Federated events and callback notifications." + +Not yet written. + +About the name: Cuttlefish are able to communicate by creating [different patterns on their skin](http://www.newscientist.com/article/dn3728-mathematics-reveals-the-cuttlefishs-wink.html) and communicate secretly with each other by [changing the polarization of their skin](http://www.ncbi.nlm.nih.gov/pubmed/9319987). Also, cuttlefish are [freakishly smart](http://www.pbs.org/wgbh/nova/nature/spineless-smarts.html). diff --git a/pages/docs/design/en.haml b/pages/docs/design/en.haml new file mode 100644 index 0000000..427dbb6 --- /dev/null +++ b/pages/docs/design/en.haml @@ -0,0 +1,5 @@ +- @nav_title = "Design Docs" +- @title = "Design Documents" +- @summary = "Design documents and specifications for various LEAP components and protocols." + += child_summaries
\ No newline at end of file diff --git a/pages/docs/design/nicknym-draft.md b/pages/docs/design/nicknym-draft.md new file mode 100644 index 0000000..9398a9f --- /dev/null +++ b/pages/docs/design/nicknym-draft.md @@ -0,0 +1,578 @@ +@title = 'Nicknym' +@nav_title = 'Nicknym' +@toc = true +@summary = "Automatic discovery and validation of public keys." + +Introduction +========================================== + +Although many interesting key validation infrastructure schemes have been recently proposed, it is not at all clear what someone writing secure email software today should do. + +1. **Automatic Management Of Keys (Amok)**: concrete rules for software agents that automatically manage keys, with forward support for new validation protocols as they are developed. +1. **X-Key-Validation Email Header**: a simple, in-line method of advertising support for different key validation schemes. +1. **Super Basic Provider Endorsement Protocol**: + +super +basic +easy +simple +provider +endorsement +public keys +protocol +http +web + +**What is Nicknym?** + +Nicknym is a protocol to map user nicknames to public keys. With Nicknym, the user is able to think solely in terms of nickname, while still being able to communicate with a high degree of security (confidentiality, integrity, and authenticity). Essentially, Nicknym is a system for binding human-memorable nicknames to a cryptographic key via automatic discovery and automatic validation. + +Nicknym is a federated protocol: a Nicknym address is in the form `username@domain`, just like an email address, and Nicknym includes both a client and a server component. Although the client can fall back to legacy methods of key discovery when needed, domains that run the Nicknym server component enjoy much stronger identity guarantees. + +Nicknym is key agnostic, and supports whatever public key information is available for an address (OpenPGP, OTR, X.509, RSA, etc). 
+ +**Why is Nicknym needed?** + +Existing forms of secure identity are deeply flawed. These systems rely on either a single trusted entity (e.g. Skype), a vulnerable Certificate Authority system (e.g. S/MIME), or key identifiers that are not human memorable (e.g. fingerprints used in OpenPGP, OTR, etc). When an identity system is hard to use, it is effectively compromised because too few people take the time to use it properly. + +The broken nature of existing identity systems (either in security or in usability) is especially troubling because identity remains a bedrock precondition for any message security: you cannot ensure confidentiality or integrity without confirming the authenticity of the other party. Nicknym is a protocol to solve this problem in a way that is backward compatible, easy for the user, and includes very strong authenticity. + +Goals +========================================== + +**High level goals** + +* Pseudo-anonymous and human friendly addresses in the form `username@domain`. +* Automatic discovery and validation of public keys associated with an address. +* The user should be able to use Nicknym without understanding anything about public/private keys or signatures. + +**Technical goals** + +* Wide utility: Nicknym should be a general purpose protocol that can be used in a wide variety of contexts. +* Prevent dangerous actions: Nicknym should fail hard when there is a possibility of an attack. +* Minimize false positives: because Nicknym fails hard, we should minimize false positives where it fails incorrectly. +* Resistant to malicious actors: Nicknym should be externally auditable in order to assure service providers are not compromised or advertising bogus keys. +* Resistant to association analysis: Nicknym should not reveal to any actor or network observer a map of a user's associations. + +**Non-goals** + +* Nicknym does not try to create a decentralized peer-to-peer identity system. 
Nicknym is federated, akin to the way email is federated. + +Nicknym Overview +============================================= + +1. Nicknym Key Management Rules (NickKMR) +1. Nicknym Key Discovery Protocol (NickKDP) +1. Nicknym Key Endorsement Protocol (NickKEP) +1. Nicknym Key Auditing Protocol () + + +Nicknym attempts to solve the binding problem using several strategies: + +1. **TOFU**: +1. **Provider Endorsement**: +1. **Network Perspective**: + +Related work +=================================== + +**The Binding Problem** + +Nicknym attempts to solve the problem of binding a human memorable identifier to a cryptographic key. If you have the identifier, you should be able to get the key with a high level of confidence, and vice versa. The goal is to have federated, human memorable, globally unique public keys. + +There are a number of established methods for binding identifier to key: + +* [X.509 Certificate Authority System](https://en.wikipedia.org/wiki/X.509) +* Trust on First Use (TOFU) +* Mail-back Verification +* [Web of Trust (WOT)](http://en.wikipedia.org/wiki/Web_of_trust) +* [DNSSEC](https://en.wikipedia.org/wiki/Dnssec) +* [Shared Secret](https://en.wikipedia.org/wiki/Socialist_millionaire) +* [Network Perspective](http://convergence.io/) +* Nonverbal Feedback (a la ZRTP) +* Global Append-only Log +* Key fingerprint as unique identifiers + +The methods differ widely, but they all try to solve the same general problem of proving that a person or organization is in control of a particular key. + +**Nyms** + +http://nyms.io + +**DANE** + +[DANE](https://datatracker.ietf.org/wg/dane/), and the specific proposal for [OpenPGP user keys using DANE](https://datatracker.ietf.org/doc/draft-wouters-dane-openpgp/), offer a standardized method for securely publishing and locating OpenPGP public keys in DNS. 
+ +As noted above, DANE will be very cool if ever adopted widely, but user keys are probably not a good fit for DNSSEC, because of issues of observability of DNS queries and complexity on the server and client end. + +By relying on the central authority of the root DNS zone, and the authority of TLDs (many of which are of doubtful trustworthiness), DANE potentially suffers from problems of compromised or nefarious authorities. Because DNS queries are not secure, a single user is particularly vulnerable to MiTM attacks that rewrite all their DNS queries. Adopting an alternate DNS query system, like [DNSCurve](http://dnscurve.org/), [DNSCrypt](https://www.opendns.com/technology/dnscrypt/), an alternate HTTPS based API, or restricting DNS queries to a VPN, would go a long way to fix this problem, and would effectively turn any supporting DNS server into a network perspectives notary. Regardless, the other problems with using DANE for user keys remain. + +**DIME** + +DIME, formerly DarkMail, uses DNSSEC for provider endorsement, in a manner similar to DANE. Each key endorsement includes the fingerprint of the previously endorsed key, allowing for some limited form of eventual consistency auditing. + +**End-To-End** + +https://code.google.com/p/end-to-end/wiki/KeyDistribution + +Certificate Transparency, but applied to email addresses. + +**Prism Proof Email** + +http://prismproof.org/ + +* S/MIME +* TOFU for legacy clients. Most mail user agents already support S/MIME, and will TOFU the key when they get a new message. + +**STEED** + +[STEED](http://g10code.com/steed.html) is a proposal with very similar goals to Nicknym. In a nutshell, Nicknym basically looks very similar to STEED when the domain owner does not support Nicknym. STEED includes four main ideas: + +* trust upon first contact: Nicknym uses this as well, although this is the fallback mechanism when others fail. 
+* automatic key distribution and retrieval: Nicknym uses this as well, although we use HTTP for this instead of DNS. +* automatic key generation: Nicknym is designed specifically to support automatic key generation, but this is outside the scope of the Nicknym protocol and it is not required. +* opportunistic encryption: Again, Nicknym is designed to support opportunistic encryption, but does not require it. + +Additional differences include: + +* Nicknym is key agnostic: Nicknym does not make an assumption about what types of public keys a user wants to associate with their address. +* Nicknym is protocol agnostic: Nicknym can be used with SMTP, XMPP, SIP, etc. +* Nicknym relies on service provider adoption: With Nicknym, the strength of verification of public keys rests on the degree to which a service provider adopts Nicknym. If a service provider does not support Nicknym, then effectively Nicknym operates like STEED for that domain. + +**The Simple Thing** + +"The Simple Thing" (TST) is not really a protocol, but it could be. The idea is to just do the simple thing, which is to ignore any type of key endorsement and just TOFU all keys and allow people who care to manually verify fingerprints of the keys they hold. + +In all the other proposals, the burden of key validation is on the person who owns the key. TST works in the opposite way: all the burden for key validation is placed on the person using the public key, not on the key's owner. + +If written as a rule, TST might look like this: + +1. The client should use whatever latest key is advertised inline via headers in email it receives. Ideally, this would be validated by the +provider via a very simple mechanism (such as grab user Bob's key from this well-known https URL or DNSSEC/DANE). +2. To cold start, sender can grab recipient's key via this well-known method. +3. Sender should confirm before sending a message that they have the most up to date key. 
Messages received that are encrypted to unsupported keys should be bounced. + +For a long discussion of the simple thing, see [messaging list](https://moderncrypto.org/mail-archive/messaging/2014/000855.html) + +**WebID and Mozilla Persona** + +What about [WebID](http://www.w3.org/wiki/WebID) or [Mozilla Persona](https://www.mozilla.org/en-US/persona/)? These are both interesting standards for cryptographically proving identity, so why do we need something new? + +These protocols, and the poorly conceived OpenID Connect, are designed to address a fundamentally different problem: authenticating a user to a website. The problem of authenticating users to one another requires a different architecture entirely. There are some similarities, however, and in the long run a Nicknym provider could also be a WebID and Mozilla Persona provider. + + +Nicknym protocol +============================== + +Definitions +------------------------- + +General terms: + +* **address**: A globally unique handle in the form username@domain (i.e. an email, SIP, or XMPP address) that we attempt to bind to a particular key. + +Actors: + +* **user**: the person with an email account through a service provider. +* **provider**: A service provider that offers end-user services on a particular domain. +* **key manager**: The key manager is a trusted user agent that is responsible for storing a database of all the keys for the user, updating these keys, and auditing the endorsements of the user's own keys. Typically, the key manager will run on the user's device, but might be running on any device the user chooses to trust. +* **key directory**: An online service that stores public keys and allows clients to search for keys by address or fingerprint. A key directory does not make any assertions regarding the validity of an address + key binding. 
Existing OpenPGP keyservers are a type of key directory in this context, but several of the key validation proposals include new protocols for key directories. +* **key endorser**: A key endorser is an organization that makes assertions regarding the binding of username@domain address to public key, typically by signing public keys. When supported, all such endorsement signatures must apply only to the uid corresponding to the address being endorsed. +* **nickagent**: A key manager that supports nicknym. +* **nickserver**: A daemon that acts as a key directory and key endorser for nicknym. + +Keys: + +* **user key**: A public/private key pair associated with a user address. If not specified, "user key" refers to the public key. +* **endorsement key**: The public/private key pair that a service provider or third party endorser uses to sign user keys. +* **provider key**: A public/private key pair owned by the provider used as an endorsement key. +* **validated key**: A key is "validated" if the nickagent has bound the user address to a public key. + +Key actions: + +* **key discovery**: The act of encountering a new key, either inline in the message, via URL, or via a key directory. +* **verified key transition**: A process where a key owner generates a new public/private key pair and signs the new key with a prior key. Someone verifying this new key then must check to see if there is a signature on the new key from a key previously validated for that particular email address. In effect, "verified key transition" is a process where verifiers treat all keys as name-constrained signing authorities, with the ability to sign any new key matching the same email address. In the case of a system that supports signing particular uids, like OpenPGP, the signatures for key transition must apply only to the relevant uid. +* **key registration**: the key has been stored by the key manager, and assigned a validation level. The user agent always uses registered keys. 
This is analogous to adding a key to a user's keyring, although implementations may differ. + +Key information: + +* **binding information**: evidence that the key manager uses to make an educated guess regarding what key to associate with what email address. This information could come from the headers in an email, a DNS lookup, a key endorser, etc. +* **key validation level**: the level of confidence the key manager has that we have the right key for a particular address. For automatic key management, we don't say that a key is ever "trusted" unless the user has manually verified the fingerprint. + + +Nickserver requests +----------------------- + +A nickagent will attempt to discover the public key for a particular user address by contacting a nickserver. The nickserver returns JSON encoded key information in response to a simple HTTP request with a user's address. For example: + + curl -X POST -d address=alice@domain.org https://nicknym.domain.org:6425 + +* The port is always 6425. +* The HTTP verb may be POST or GET. +* The request must use TLS (see [Query security](#Query.security)). +* The query data should have a single field 'address'. +* For POST requests to nicknym.domain.org, the query data may be encrypted to the public OpenPGP key nicknym@domain.org (see [Query security](#Query.security)). +* The request may include an "If-Modified-Since" header. In this case, the response might be "304 Not Modified". + +Requests may be local or foreign, and for user keys or for provider keys. + +* **local** requests are for information for which the nickserver is authoritative. In other words, when the requested address is for the same domain that the nickserver is running on. +* **foreign** requests are for information about other domains. +* **user key** requests are for addresses in the form "username@domain". +* **provider key** requests are for addresses in the form "domain". 
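
The four request kinds above can be told apart from the address alone. The following is a minimal Python sketch, not part of the Nicknym draft: the function names `classify_request` and `build_query_url` are hypothetical, and the URL shape simply mirrors the GET examples in this document (host `nicknym.DOMAIN`, port 6425, a single `address` field).

```python
from typing import Tuple
from urllib.parse import quote

NICKNYM_PORT = 6425  # the draft states the port is always 6425


def classify_request(address: str, nickserver_domain: str) -> Tuple[str, str]:
    """Return (scope, kind) for a query sent to the nickserver of nickserver_domain.

    scope is "local" or "foreign"; kind is "user" or "provider".
    """
    if "@" in address:
        kind = "user"  # addresses of the form username@domain
        domain = address.rsplit("@", 1)[1]
    else:
        kind = "provider"  # addresses of the form domain
        domain = address
    # Assumption: domains are compared case-insensitively.
    scope = "local" if domain.lower() == nickserver_domain.lower() else "foreign"
    return scope, kind


def build_query_url(nickserver_domain: str, address: str) -> str:
    """Build a GET query URL in the shape used by the examples in this document."""
    # '@' is left unescaped to match the example URLs; this is a presentation choice.
    return "https://nicknym.{}:{}/?address={}".format(
        nickserver_domain, NICKNYM_PORT, quote(address, safe="@")
    )
```

For instance, `classify_request("bob@otherdomain.org", "domain.org")` yields a foreign user key request, which the nickserver would answer from its cache or by contacting the other provider.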
+ +**Local, Provider Key request** + +For example: + + https://nicknym.domain.org:6425/?address=domain.org + +The response is the authoritative provider key for that domain. + +**Local, User Key request** + +For example: + + https://nicknym.domain.org:6425/?address=alice@domain.org + +The nickserver returns authoritative key information from the provider's own user database. Every public key returned for local requests must be signed by the provider's key. + +**Foreign, Provider Key request** + +For example: + + https://nicknym.domain.org:6425/?address=otherdomain.org + +1. First, check the nickserver's cache database of discovered keys. If the cache is not old, return this key. This step is skipped if the request is encrypted to the foreign provider's key. +2. Otherwise, fetch provider key from the provider's nickserver, cache the result, and return it. + +**Foreign, User Key request** + +For example: + + https://nicknym.domain.org:6425/?address=bob@otherdomain.org + +* First, check the nickserver's database cache of nicknyms. If the cache is not old, return the key information found in the cache. This step is skipped if the request is encrypted to a foreign provider key. +* Otherwise, attempt to contact a nickserver run by the provider of the requested address. If the nickserver exists, query that nickserver, cache the result, and return it in the response. +* Otherwise, fall back to querying existing SKS keyservers, cache the result and return it. +* Otherwise, return a 404 error. + +If the key returned for a foreign request contains multiple user addresses, they are all ignored by nicknym except for the user address specified in the request. + +Nickserver response +--------------------------------- + +The nickserver will respond with one of the following status codes: + +* "200 Success": found keys for this address, and the result is in the body of the response encoded as JSON. 
+* "304 Not Modified": if the request included an "If-Modified-Since" header and the keys in question have not been modified, the response will have status code 304. +* "404 Not Found": no keys were found for this address. +* "500 Error": An unknown error occurred. Details may be included in the body. + +Responses with status code 200 include a body text that is a JSON encoded map with a field "address" plus one or more of the following fields: "openpgp", "otr", "rsa", "ecc", "x509-client", "x509-server", "x509-ca". For example: + + { + "address": "alice@example.org", + "openpgp": "6VtcDgEKaHF64uk1c/crFhRHuFW9kTvgxAWAK01rXXjrxEa/aMOyXnVQuQINBEof...." + } + +Responses with status codes other than 200 include a body text that is a JSON encoded map with the following fields: "address", "status", and "message". For example: + + { + "address": "bob@otherdomain.org", + "status": 404, + "message": "Not Found" + } + +A nickserver response is always signed with a OpenPGP public signing key associated with the provider. Both successful AND unsuccessful responses are signed. Responses to successful local requests must be signed by the key associated with the address "nicknym@domain.org". Foreign requests and non-200 responses may alternately be signed with a key associated with the address nickserver@domain.org. This allows for user keys to be signed off-site and in advance, if they so choose. The signature is ASCII armored and appended to the JSON. + + { + "address": "alice@example.org", + "openpgp": "6VtcDgEKaHF64uk1c/crFhRHuFW9kTvgxAWAK01rXXjrxEa/aMOyXnVQuQINBEof...." 
+ } + -----BEGIN PGP SIGNATURE----- + iQIcBAEBCgAGBQJRhWO+AAoJEIaItIgARAAl2IwP/24z9CjKjD0fd27pQs+r+e3h + p8KAYDbVac3+c3vm30DjHO/RKF4Zq6+sTAIkrFvXOwYJl9KgjMpQVV/voInjxATz + -----END PGP SIGNATURE----- + +If the data in the request was encrypted to the public key nicknym@domain.org, then the JSON response and signature are additionally encrypted to the symmetric key found in the request and returned base64 encoded. + +TBD: maybe we should just switch to a raw RSA or ECC signature. + +Query balancing +------------------------ + +A nickagent must choose what IP address to query by selecting randomly from among hosts that resolve from `nicknym.domain.org` (where `domain.org` is the domain name of the provider). + +If a host does not respond, a nickagent must skip over it and attempt to contact another host in the pool. + +Query security +-------------------------- + +TLS is required for all nickserver queries. + +When querying https://nicknym.domain.org, nickagent must validate the TLS connection in one of four possible ways: + +1. Using a commercial CA certificate distributed with the host operating system. +2. Using DANE TLSA record to discover and validate the server certificate. +3. Using a seeded CA certificate (see [Discovering nickservers](#Discovering.nickservers)). +4. Using a custom self-signed CA certificate discovered for the domain, so long as the CA certificate was discovered via #1 or #2 or #3. Custom CA certificates may be discovered for a domain by making a provider key request of a nickserver (e.g. https://nicknym.known-domain.org/?address=new-domain.org). + +Optionally, a nickagent may make an encrypted query like so: + +0. Suppose the nickagent wants to make an encrypted query regarding the address alice@domain-x.org. +1. Nickagent discovers the public key for nicknym@domain-y.org. +2. The nickagent makes a POST request to a nickserver with two fields: address and ciphertext. +3. 
The address only contains the domain part of the address (unlike an unencrypted request). +4. The ciphertext field is encrypted to the public key for nicknym@domain-y.org. The corresponding cleartext contains the full address on the first line followed by a randomly generated symmetric key on the second line. +5. If the request was local, the nickserver handles the request. If the request is foreign, the nickserver proxies the request to the domain specified in the address field. +6. When the request gets to the right nickserver, the body of the nickserver response is encrypted using the symmetric key. The first line of the response specifies the cipher and mode used (allowed ciphers TBD). + +Comment: although it may seem excessive to encrypt both the request via TLS and the request body via OpenPGP, the reason for this is that many requests will not use OpenPGP. + +Automatic key validation +---------------------------------- + +A key is "validated" if the nickagent has bound the user address to a public key. + +Nicknym supports three different levels of key validation: + +* Level 3 - **path trusted**: A path of cryptographic signatures can be traced from a trusted key to the key under evaluation. By default, only the provider key from the user's provider is a "trusted key". +* Level 2 - **provider signed**: The key has been signed by a provider key for the same domain, but the provider key is not validated using a trust path (i.e. it is only registered). +* Level 1 - **registered**: The key has been encountered and saved; it has no signatures (that are meaningful to the nickagent). + +A nickagent will try to validate using the highest level possible. + +Automatic renewal +----------------------------- + +A validated public key is replaced with a new key when: + +* The new key is **path trusted** +* The new key is **provider signed**, but the old key is only **registered**. 
+* The new key has a later expiration, and the old key is only **registered** and will expire "soon" (exact time TBD). +* The agent discovers a new subkey, but the master signing key is unchanged. + +In all other cases, the new key is rejected. + +The nickagent will attempt to refresh a key by making a request to a nickserver of its choice when a key is past 3/4 of its lifespan and again when it is about to expire. + +Nicknym encourages, but does not require, the use of short lived public keys, in the range of X to Y days. It is recommended that short lived keys are not uploaded to OpenPGP keyservers. + +Automatic invalidation +---------------------------- + +A key is invalidated if: + +* The old key has expired, and no new key can be discovered with equal or greater validation level. + +This means validation is a one way street: once a certain level of validation is established for a user address, no client should accept any future keys for that address with a lower level of validation. + +Discovering nickservers +-------------------------------- + +It is entirely up to the nickagent to decide what nickservers to query. If it wanted to, a nickagent could send all its requests to a single nickserver. + +However, nickagents should discover new nickservers and balance their queries to these nickservers for the purposes of availability, load balancing, network perspective, and hiding the user's association map. + +Whenever the nickagent is asked by a locally running application for a public key corresponding to an address on the domain `domain.org`, it may check to see if the host `nicknym.domain.org` exists. If the domain resolves, then the nickagent may add it to the pool of known nickservers. A nickagent should only perform this DNS check if it is able to do so over an encrypted tunnel. + +Additionally, a nickagent may be distributed with an initial list of "seed" nickservers. 
In this case, the nickagent is distributed with a copy of the CA certificate used to validate the TLS connection with each respective seed nickserver. + +Cross-provider signatures +---------------------------------- + +Nicknym does not support user signatures on user keys. There is no trust path from user to user. However, a service provider may sign the provider key of another provider. + +To be written. + +Auditing +---------------------------- + +In order to keep the user's provider from handing out bogus public keys, a nickagent should occasionally make foreign queries of the user's own address against nickservers run by third parties. The recommended frequency of these queries is once per day, at a random time during local waking hours. + +In order to prevent a nickserver from handing out bogus provider keys, a nickagent should query multiple nickservers before a provider key is registered or path trusted. + +Possible attacks: + +**Attack 1 - Intercept Outgoing:** + +* Attack: provider `A` signs an impostor key for provider `B` and distributes it to users of `A` (in order to intercept outgoing messages sent to `B`). +* Countermeasure: By querying multiple nickservers for the provider key of `B`, the nickagent can detect if provider `A` is attempting to distribute impostor keys. + +**Attack 2 - Intercept Incoming:** + +* Attack: provider `A` signs an impostor key for one of its own users, and distributes it to users of provider `B` (in order to intercept incoming messages). +* Countermeasure: By querying for its own keys, a nickagent can detect if a provider is giving out bogus keys for their addresses. + +**Attack 3 - Association Mapping:** + +* Attack: A provider tracks all the requests for key discovery in order to build a map of association. +* Countermeasure: By performing foreign key queries via third party nickservers, an agent can prevent any particular entity from tracking their queries. 
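The countermeasures for Attacks 1 and 2 both come down to asking several nickservers about the same address and comparing the answers. The following is a minimal sketch of that comparison step only, not part of the reference key manager. The function names `group_by_fingerprint` and `is_consistent`, and the idea of comparing fingerprint strings, are assumptions made for illustration; fetching and signature checking are left out.

```python
from typing import Dict, List, Optional


def group_by_fingerprint(responses: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group nickserver hostnames by the key fingerprint each one reported.

    `responses` maps a nickserver hostname to the fingerprint it returned for
    one address, or None if that nickserver did not answer (hypothetical shape).
    """
    groups: Dict[str, List[str]] = {}
    for host in sorted(responses):
        fingerprint = responses[host]
        if fingerprint is None:
            # Unresponsive hosts carry no evidence either way, so skip them.
            continue
        groups.setdefault(fingerprint, []).append(host)
    return groups


def is_consistent(responses: Dict[str, Optional[str]]) -> bool:
    """Return True when every nickserver that answered reported the same key.

    With zero answers there is nothing to compare, so this also returns True;
    a caller would likely want to treat "no data" as a separate case.
    """
    return len(group_by_fingerprint(responses)) <= 1
```

If `is_consistent` returns False, the groups show which nickservers disagree, which is the signal the Auditing section relies on. What the nickagent does next (fail hard, re-query, alert the user) is not decided here.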
+ +Known vulnerabilities +------------------------------------------ + +The nicknym protocol does not yet have a good solution for dealing with the following problems: + +* Enumeration attack: an attacker can enumerate the list of all users for a provider by simply querying every possible username combination. We have no defense against this, although it would surely take a while. +* DDoS attack: by their very nature, nickservers perform a bit of work for every request. Because of this, they are vulnerable to being overloaded by a flood of bogus requests. +* Besmirch attack: a MitM attacker can sully the reputation of a provider by generating many bad responses (responses signed with the wrong key), thus leading other nickservers and nicknym agents to consider the provider compromised. + +Future enhancements +--------------------- + +**Additional discovery mechanisms** + +In addition to nickservers and SKS keyservers, there are two other potential methods for discovering public keys: + +* **Webfinger** includes a standard mechanism for distributing a user's public key via a simple HTTP request. This is very easy to implement on the server, and very easy to consume on the client side, but there are not many webfinger servers with public keys in the wild. +* **DNS** is used by multiple competing standards for key discovery. When and if one of these emerges as predominant, Nicknym should attempt to use this method when available. + +Discussion +---------------------- + +*Why not use WoT?* Most users are empirically unable to properly maintain a web of trust. The concepts are hard, it is easy to mess up the signing practice, most people default to TOFU anyway, and very few users use revocation properly. Most importantly, the WOT exposes a user's social network, potentially highly sensitive information in its own right. When first proposed, WOT was a clever innovation, but contemporary threats have greatly reduced its usefulness. 
+ +*Why not use DANE/DNSSEC?* DANE is great for discovery and validation of server keys, but there are many reasons why it is not so good for user keys: DNS records are slow to update; DNS queries are observable, unlike HTTP over TLS; it is difficult for a provider to publish thousands of keys in DNS; it is much easier for a client to do a simple HTTP fetch (and more possible for HTML5 clients). Also, RSA Public keys will soon be too big for UDP packets (though this is not true of ECC), so putting keys in DNS will mean putting a URL to a key in DNS, so you might as well just use HTTP anyway. + +*Why not use Shared Secret?* Shared secrets, like with the Socialist Millionaire protocol, are cool in theory but prone to user error and frustration in practice. A typical user is not in a position to have established a prior secret with most of the people they need to make first contact with. Shared secrets also cannot be scaled to a group setting. Finally, shared secrets are often typed incorrectly (e.g. was the secret "Invisible Zebra" or "invisibleZebra"? This could be fixed with rules for secret normalization, but this is tricky and language specific). For the special case of advanced users with special security needs, however, a shared secret provides a much stronger validation than other methods of key binding (so long as the validation window is small). + +*Why not use Mail-back Verification?* If the provider distributes user keys, there is not any benefit to mail-back verification. The nicknym protocol could potentially benefit from a future enhancement to support mail-back for users on a non-cooperating legacy provider. However, at its best, mail-back is a very weak form of key validation. + +*Why not use Global Append-only Log?* Maybe we should, they are neat. However, current implementations are slow, resource intensive, and experimental (e.g. namecoin). 
+ +*Why not use Nonverbal Feedback?* ZRTP can use non-verbal clues to establish secure identity because of the nature of a live phone call. This doesn't work for text only messaging. + +*Why not use the key fingerprint as the unique identifier?* This is the strategy taken by all systems for peer-to-peer messaging (e.g. retroshare, bitmessage, etc). Depending on the length of the fingerprint, this method is very secure. Essentially, this approach neatly solves the binding problem by collapsing the key and the identifier together as one. The problem, of course, is that this is not very user friendly. Users must either have pre-arranged some way to exchange fingerprints, or they must fall back to one of the other methods for verification (shared secret, WoT, etc). The friction associated with pre-arranged sharing of fingerprints can be reduced with technology, using QR-codes and hand held devices, for example. In the best case scenario, however, fingerprints as identifiers will always be much less user friendly than simple username@domain.org addresses. The motivating premise behind Nicknym is that when an identity system is hard to use, it is effectively compromised because too few people take the time to use it properly. + +Reference nickagent implementation +==================================================== + +https://github.com/leapcode/keymanager + +There is a reference nickagent implementation called "key manager" written in Python and integrated into the LEAP client. It uses Soledad to store its data. + +Public API +---------------------------- + +**refresh_keys()** + +updates the keys with fresh ones, as needed. + +**get_key(address, type)** + +returns a single public key for address. type is one of 'openpgp', 'otr', 'x509', or 'rsa'. + +**send_key(address, public_key, type)** + +authenticates with the appropriate provider and saves the public_key in the user database. + +Storage +-------------------------- + +Key manager uses Soledad for storage. 
GPGME, however, requires keys to be stored in keyrings, which are read from disk. + +For now, Key Manager deals with this by storing each key in its own keyring. In other words, every key is in a keyring with exactly 1 key, and this keyring is stored in a Soledad document. To avoid confusing this keyring with a normal keyring, I will call it a 'unitary keyring'. + +Suppose Alice needs to communicate with Bob: + +1. Alice's Key Manager copies to disk her private key and Bob's public key. The key manager gets these from Soledad, in the form of unitary keyrings. +2. Client code uses GPGME, feeding it these temporary keyring files. +3. The keyrings are destroyed. + +TBD: how best to ensure destruction of the keyring files. + +An example Soledad document for an address: + + { + "address":"alice@example.org", + "keys": [ + { + "type": "openpgp", + "key": "binary blob", + "keyring": "binary blob", + "expires_on": "2014-01-01", + "validation": "provider_signed", + "first_seen_at": "2013-04-01 00:11:00", + "last_audited_at": "2013-04-02 12:00:00" + }, + { + "type": "otr", + "key": "binary blob", + "expires_on": "2014-01-01", + "validation": "registered", + "first_seen_at": "2013-04-01 00:11:00", + "last_audited_at": "2013-04-02 12:00:00" + } + ] + } + +Pseudocode +--------------------------- + +get_key + + # + # return a key for an address + # + function get_key(address, type) + if key for address exists in soledad database? + return key + else + fetch key from nickserver + save it in soledad + return key + end + end + +send_key + + # + # send the user's provider the user's key. this key will get signed by the provider, and replace any prior keys + # + function send_key(type) + if not authenticated: + error! + end + get (self.address, type) + send (key_data, type) to the provider + end + +refresh_keys + + # + # update the user's db of validated keys to see if there are changes. 
+ # + function refresh_keys() + for each key in the soledad database (that should be checked?): + newkey = fetch_key_from_nickserver() + if key is about to expire and newkey complies with the renewal parameters: + replace key with newkey + else if fingerprint(key) != fingerprint(newkey): + freak out, something wrong is happening? :) + maybe handle revocation, or try to get some voting for a given key and save that one (retrieve it through tor/vpn/etc and see what's the most found key or something like that). + else: + everything's cool for this key, continue + end + end + end + +private fetch_key_from_nickserver + + function fetch_key_from_nickserver(key) + randomly pick a subset of the available nickservers we know about + send a tcp request to each in this subset in parallel + first one that opens a successful socket is used, all the others are terminated immediately + make http request + parse json for the keys + return keys + end + + +Reference nickserver implementation +===================================================== + +https://github.com/leapcode/nickserver + +The reference nickserver is written in Ruby 1.9 and licensed GPLv3. It is lightweight and scalable (supporting high concurrency, and reasonable latency). Data is stored in CouchDB. diff --git a/pages/docs/design/nicknym.md b/pages/docs/design/nicknym.md new file mode 100644 index 0000000..3f94875 --- /dev/null +++ b/pages/docs/design/nicknym.md @@ -0,0 +1,498 @@ +@title = 'Nicknym' +@toc = true +@summary = "Automatic discovery and validation of public keys." + +Introduction +========================================== + +**What is Nicknym?** + +Nicknym is a protocol to map user nicknames to public keys. With Nicknym, the user is able to think solely in terms of nickname, while still being able to communicate with a high degree of security (confidentiality, integrity, and authenticity). 
Essentially, Nicknym is a system for binding human-memorable nicknames to a cryptographic key via automatic discovery and automatic validation. + +Nicknym is a federated protocol: a Nicknym address is in the form `username@domain` just like an email address, and Nicknym includes both a client and a server component. Although the client can fall back to legacy methods of key discovery when needed, domains that run the Nicknym server component enjoy much stronger identity guarantees. + +Nicknym is key agnostic, and supports whatever public key information is available for an address (OpenPGP, OTR, X.509, RSA, etc). However, Nicknym enforces a strict one-to-one mapping of address to public key. + +**Why is Nicknym needed?** + +Existing forms of secure identity are deeply flawed. These systems rely on either a single trusted entity (e.g. Skype), a vulnerable Certificate Authority system (e.g. S/MIME), or key identifiers that are not human memorable (e.g. fingerprints used in OpenPGP, OTR, etc). When an identity system is hard to use, it is effectively compromised because too few people take the time to use it properly. + +The broken nature of existing identity systems (either in security or in usability) is especially troubling because identity remains a bedrock precondition for any message security: you cannot ensure confidentiality or integrity without confirming the authenticity of the other party. Nicknym is a protocol to solve this problem in a way that is backward compatible, easy for the user, and includes very strong authenticity. + +Goals +========================================== + +**High level goals** + +* Pseudo-anonymous and human friendly addresses in the form `username@domain`. +* Automatic discovery and validation of public keys associated with an address. +* The user should be able to use Nicknym without understanding anything about public/private keys or signatures. 
+ +**Technical goals** + +* Wide utility: Nicknym should be a general purpose protocol that can be used in a wide variety of contexts. +* No revocation: instead of key revocation, support short lived keys that frequently and automatically refresh. +* Prevent dangerous actions: Nicknym should fail hard when there is a possibility of an attack. +* Minimize false positives: because Nicknym fails hard, we should minimize false positives where it fails incorrectly. +* Resistant to malicious actors: Nicknym should be externally auditable in order to assure service providers are not compromised or advertising bogus keys. +* Resistant to association analysis: Nicknym should not reveal to any actor or network observer a map of a user's associations. + +**Non-goals** + +* Nicknym does not try to create a decentralized peer-to-peer identity system. Nicknym is federated, akin to the way email is federated. + +The binding problem +============================================= + +Nicknym attempts to solve the problem of binding a human memorable identifier to a cryptographic key. If you have the identifier, you should be able to get the key with a high level of confidence, and vice versa. The goal is to have federated, human memorable, globally unique public keys. + +There are a number of established methods for binding identifier to key: + +* [X.509 Certificate Authority System](https://en.wikipedia.org/wiki/X.509) +* Trust on First Use (TOFU) +* Mail-back Verification +* [Web of Trust (WOT)](http://en.wikipedia.org/wiki/Web_of_trust) +* [DNSSEC](https://en.wikipedia.org/wiki/Dnssec) +* [Shared Secret](https://en.wikipedia.org/wiki/Socialist_millionaire) +* [Network Perspective](http://convergence.io/) +* Nonverbal Feedback (a la ZRTP) +* Global Append-only Log +* Key fingerprint as unique identifiers + +The methods differ widely, but they all try to solve the same general problem of proving that a person or organization is in control of a particular key. 
+ +**Nicknym overview** + +Nicknym solves the binding problem by using a combination of methods, utilizing TOFU, X.509, Network Perspective, and additional methods we call "Provider Keys" and "Federated Web of Trust" (FWOT). + +1. Nicknym starts with TOFU of user keys, because it is easy to do and backward compatible with legacy providers. In TOFU, your client naively accepts the key of another user when it first encounters it. When you accept a key via TOFU, you are making a bet that possible attackers against you did not have the foresight to specifically target you with a false key during discovery. +2. Next, we add X.509. For those providers that publish the public keys of their users, we require that these keys be fetched over validated TLS. This makes third party attacks against TOFU more difficult, but also places a lot of trust in the providers (and the Certificate Authorities). +3. Next, we add a simple form of Network Perspective where the client can ask one provider what key another provider is distributing. This allows a user's client to be able to audit their provider and keep them honest in an automated manner. If a service provider distributes bogus keys, their users and other providers will be quickly alerted to the problem. +4. Next, we add Provider Keys. If a service provider supports nicknym, the public keys of its users are additionally signed by a "provider key". If your client has the correct provider key, you no longer need to TOFU the keys of the provider's users. This has the benefit of making it possible for a user to issue new keys, and to add support for very short-lived keys rather than trying to use key revocation. A service provider is much less likely to lose their private key or have it compromised, a significant problem with TOFU of user keys. +5. Finally, we add a Federated Web of Trust. 
The system works like this: each service provider is responsible for the due diligence of properly signing the keys of a few other providers, akin to the distributed web of trust model of OpenPGP, but with all the hard work of proper signature validation placed upon the service provider. When a user communicates with another party who happens to use a service provider that participates in the FWOT, the user's software will automatically trace a chain of signatures from the other party's key, to their service provider, to the user's own service provider (with some possible intermediary signatures). This allows for identity that is verified through an end-to-end trust path from any user to any other user in a way that can be automated and is human memorable. Support for a FWOT allows us to bypass entirely X.509 Certificate Authorities, to gracefully handle short lived provider keys, and to handle emergency re-key events if a provider's key is lost. + +As we move down this list, each measure taken gets more complicated, requires more provider cooperation, and provides less additional benefit than the one before it. Nevertheless, each measure contributes some important benefit toward the goal of automatic binding of user identity to public key. + +**Questions** + +*Why not use WoT?* Most users are empirically unable to properly maintain a web of trust. The concepts are hard, it is easy to mess up the signing practice, most people default to TOFU anyway, and very few users use revocation properly. Most importantly, the WOT exposes a user's social network, potentially highly sensitive information in its own right. When first proposed, WOT was a clever innovation, but contemporary threats have greatly reduced its usefulness. 
+ +*Why not use DANE/DNSSEC?* DANE is great for discovery and validation of server keys, but there are many reasons why it is not so good for user keys: DNS records are slow to update; DNS queries are observable, unlike HTTP over TLS; it is difficult for a provider to publish thousands of keys in DNS; it is much easier for a client to do a simple HTTP fetch (and more possible for HTML5 clients). Also, RSA Public keys will soon be too big for UDP packets (though this is not true of ECC), so putting keys in DNS will mean putting a URL to a key in DNS, so you might as well just use HTTP anyway. + +*Why not use Shared Secret?* Shared secrets, like with the Socialist Millionaire protocol, are cool in theory but prone to user error and frustration in practice. A typical user is not in a position to have established a prior secret with most of the people they need to make first contact with. Shared secrets also cannot be scaled to a group setting. Finally, shared secrets are often typed incorrectly (e.g. was the secret "Invisible Zebra" or "invisibleZebra"? This could be fixed with rules for secret normalization, but this is tricky and language specific). For the special case of advanced users with special security needs, however, a shared secret provides a much stronger validation than other methods of key binding (so long as the validation window is small). + +*Why not use Mail-back Verification?* If the provider distributes user keys, there is not any benefit to mail-back verification. The nicknym protocol could potentially benefit from a future enhancement to support mail-back for users on a non-cooperating legacy provider. However, at its best, mail-back is a very weak form of key validation. + +*Why not use Global Append-only Log?* Maybe we should, they are neat. However, current implementations are slow, resource intensive, and experimental (e.g. namecoin). 
+ +*Why not use Nonverbal Feedback?* ZRTP can use non-verbal clues to establish secure identity because of the nature of a live phone call. This doesn't work for text only messaging. + +*Why not use the key fingerprint as the unique identifier?* This is the strategy taken by all systems for peer-to-peer messaging (e.g. retroshare, bitmessage, etc). Depending on the length of the fingerprint, this method is very secure. Essentially, this approach neatly solves the binding problem by collapsing the key and the identifier together as one. The problem, of course, is that this is not very user friendly. Users must either have pre-arranged some way to exchange fingerprints, or they must fall back to one of the other methods for verification (shared secret, WoT, etc). The friction associated with pre-arranged sharing of fingerprints can be reduced with technology, using QR-codes and hand held devices, for example. In the best case scenario, however, fingerprints as identifiers will always be much less user friendly than simple username@domain.org addresses. The motivating premise behind Nicknym is that when an identity system is hard to use, it is effectively compromised because too few people take the time to use it properly. + +Related work +=================================== + +**WebID and Mozilla Persona** + +What about [WebID](http://www.w3.org/wiki/WebID) or [Mozilla Persona](https://www.mozilla.org/en-US/persona/)? These are both interesting standards for cryptographically proving identity, so why do we need something new? + +These protocols, and the poorly conceived OpenID Connect, are designed to address a fundamentally different problem: authenticating a user to a website. The problem of authenticating users to one another requires a different architecture entirely. There are some similarities, however, and in the long run a Nicknym provider could also be a WebID and Mozilla Persona provider. 
+ +**STEED** + +[STEED](http://g10code.com/steed.html) is a proposal with very similar goals to Nicknym. In a nutshell, Nicknym basically looks very similar to STEED when the domain owner does not support Nicknym. STEED includes four main ideas: + +* trust upon first contact: Nicknym uses this as well, although this is the fallback mechanism when others fail. +* automatic key distribution and retrieval: Nicknym uses this as well, although we use HTTP for this instead of DNS. +* automatic key generation: Nicknym is designed specifically to support automatic key generation, but this is outside the scope of the Nicknym protocol and it is not required. +* opportunistic encryption: Again, Nicknym is designed to support opportunistic encryption, but does not require it. + +Additional differences include: + +* Nicknym is key agnostic: Nicknym does not make an assumption about what types of public keys a user wants to associate with their address. +* Nicknym is protocol agnostic: Nicknym can be used with SMTP, XMPP, SIP, etc. +* Nicknym relies on service provider adoption: With Nicknym, the strength of verification of public keys rests on the degree to which a service provider adopts Nicknym. If a service provider does not support Nicknym, then effectively Nicknym operates like STEED for that domain. + +**DANE** + +[DANE](https://datatracker.ietf.org/wg/dane/), and the specific proposal for [OpenPGP user keys using DANE](https://datatracker.ietf.org/doc/draft-wouters-dane-openpgp/), offer a standardized method for securely publishing and locating OpenPGP public keys in DNS. + +As noted above, DANE will be very cool if ever adopted widely, but user keys are probably not a good fit for DNSSEC, because of issues of observability of DNS queries and complexity on the server and client end. 
+ +By relying on the central authority of the root DNS zone, and the authority of TLDs (many of which are of doubtful trustworthiness), DANE potentially suffers from problems of compromised or nefarious authorities. Because DNS queries are not secure, a single user is particularly vulnerable to MiTM attacks that rewrite all their DNS queries. Adopting an alternate DNS query system, like [DNSCurve](http://dnscurve.org/), [DNSCrypt](https://www.opendns.com/technology/dnscrypt/), an alternate HTTPS based API, or restricting DNS queries to a VPN, would go a long way to fix this problem, and would effectively turn any supporting DNS server into a network perspectives notary. Regardless, the other problems with using DANE for user keys remain. + +Nicknym protocol +============================== + +Definitions +------------------------- + +* **address**: A globally unique handle in the form user@domain (i.e. an email, SIP, or XMPP address). +* **provider**: A service provider that offers end-user services on a particular domain. +* **user key**: A public/private key pair associated with a user address. If not specified, "user key" refers to the public key. +* **provider key**: A public/private key pair owned by the provider. The address associated with this key is just the domain of the service provider. +* **validated key**: A key is "validated" if the nickagent has bound the user address to a public key. +* **nickagent**: Client side program that manages a user's contact list, the public keys they have encountered and validated, and the user's own key pairs. The nickagent may also expose an API for other local applications to query for a public key. +* **nickserver**: Server side daemon run by providers who support Nicknym. A nickserver is responsible for answering the question "what public key do you see for this address"? 
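The definitions above imply that the shape of an address tells a nickagent which kind of key it refers to: `user@domain` for a user key, a bare domain for a provider key. As a rough illustration only, here is a sketch of that split. The function name `classify_address` and its return shape are hypothetical and do not come from the Nicknym specification or the reference implementations.

```python
from typing import Optional, Tuple


def classify_address(address: str) -> Tuple[str, Optional[str], str]:
    """Split a Nicknym address into (kind, username, domain).

    kind is "user" for addresses of the form username@domain and
    "provider" for a bare domain. Raises ValueError on malformed input.
    """
    # rpartition splits on the last '@'; with no '@' the separator is empty
    # and the whole string ends up in `domain`.
    username, separator, domain = address.rpartition("@")
    if not separator:
        if not address:
            raise ValueError("empty address")
        # No '@' at all: the address associated with a provider key
        # is just the domain of the service provider.
        return ("provider", None, address)
    if not username or not domain:
        raise ValueError("malformed address: %r" % address)
    return ("user", username, domain)
```

A nickagent could use a helper like this to decide which kind of key an address refers to before building a query. Any further validation of the domain part is left out of this sketch.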
+ +Nickserver requests +----------------------- + +A nickagent will attempt to discover the public key for a particular user address by contacting a nickserver. The nickserver returns JSON encoded key information in response to a simple HTTP request with a user's address. For example: + + curl -X POST -d address=alice@domain.org https://nicknym.domain.org:6425 + +* The port is always 6425. +* The HTTP verb may be POST or GET. +* The request must use TLS (see [Query security](#Query.security)). +* The query data should have a single field 'address'. +* For POST requests to nicknym.domain.org, the query data may be encrypted to the public OpenPGP key nicknym@domain.org (see [Query security](#Query.security)). +* The request may include an "If-Modified-Since" header. In this case, the response might be "304 Not Modified". + +Requests may be local or foreign, and for user keys or for provider keys. + +* **local** requests are for information for which the nickserver is authoritative. In other words, when the requested address is for the same domain that the nickserver is running on. +* **foreign** requests are for information about other domains. +* **user key** requests are for addresses in the form "username@domain". +* **provider key** requests are for addresses in the form "domain". + +**Local, Provider Key request** + +For example: + + https://nicknym.domain.org:6425/?address=domain.org + +The response is the authoritative provider key for that domain. + +**Local, User Key request** + +For example: + + https://nicknym.domain.org:6425/?address=alice@domain.org + +The nickserver returns authoritative key information from the provider's own user database. Every public key returned for local requests must be signed by the provider's key. + +**Foreign, Provider Key request** + +For example: + + https://nicknym.domain.org:6425/?address=otherdomain.org + +1. First, check the nickserver's cache database of discovered keys. If the cache is not old, return this key. 
This step is skipped if the request is encrypted to the foreign provider's key. +2. Otherwise, fetch provider key from the provider's nickserver, cache the result, and return it. + +**Foreign, User Key request** + +For example: + + https://nicknym.domain.org:6425/?address=bob@otherdomain.org + +* First, check the nickserver's database cache of nicknyms. If the cache is not old, return the key information found in the cache. This step is skipped if the request is encrypted to a foreign provider key. +* Otherwise, attempt to contact a nickserver run by the provider of the requested address. If the nickserver exists, query that nickserver, cache the result, and return it in the response. +* Otherwise, fall back to querying existing SKS keyservers, cache the result and return it. +* Otherwise, return a 404 error. + +If the key returned for a foreign request contains multiple user addresses, they are all ignored by nicknym except for the user address specified in the request. + +Nickserver response +--------------------------------- + +The nickserver will respond with one of the following status codes: + +* "200 Success": found keys for this address, and the result is in the body of the response encoded as JSON. +* "304 Not Modified": if the request included an "If-Modified-Since" header and the keys in question have not been modified, the response will have status code 304. +* "404 Not Found": no keys were found for this address. +* "500 Error": An unknown error occurred. Details may be included in the body. + +Responses with status code 200 include a body text that is a JSON encoded map with a field "address" plus one or more of the following fields: "openpgp", "otr", "rsa", "ecc", "x509-client", "x509-server", "x509-ca". For example: + + { + "address": "alice@example.org", + "openpgp": "6VtcDgEKaHF64uk1c/crFhRHuFW9kTvgxAWAK01rXXjrxEa/aMOyXnVQuQINBEof...." 
+ } + +Responses with status codes other than 200 include a body text that is a JSON encoded map with the following fields: "address", "status", and "message". For example: + + { + "address": "bob@otherdomain.org", + "status": 404, + "message": "Not Found" + } + +A nickserver response is always signed with an OpenPGP public signing key associated with the provider. Both successful AND unsuccessful responses are signed. Responses to successful local requests must be signed by the key associated with the address "nicknym@domain.org". Foreign requests and non-200 responses may alternatively be signed with a key associated with the address nickserver@domain.org. This allows for user keys to be signed off-site and in advance, if they so choose. The signature is ASCII armored and appended to the JSON. + + { + "address": "alice@example.org", + "openpgp": "6VtcDgEKaHF64uk1c/crFhRHuFW9kTvgxAWAK01rXXjrxEa/aMOyXnVQuQINBEof...." + } + -----BEGIN PGP SIGNATURE----- + iQIcBAEBCgAGBQJRhWO+AAoJEIaItIgARAAl2IwP/24z9CjKjD0fd27pQs+r+e3h + p8KAYDbVac3+c3vm30DjHO/RKF4Zq6+sTAIkrFvXOwYJl9KgjMpQVV/voInjxATz + -----END PGP SIGNATURE----- + +If the data in the request was encrypted to the public key nicknym@domain.org, then the JSON response and signature are additionally encrypted to the symmetric key found in the request and returned base64 encoded. + +TBD: maybe we should just switch to a raw RSA or ECC signature. + +Query balancing +------------------------ + +A nickagent must choose what IP address to query by selecting randomly from among hosts that resolve from `nicknym.domain.org` (where `domain.org` is the domain name of the provider). + +If a host does not respond, a nickagent must skip over it and attempt to contact another host in the pool. + +Query security +-------------------------- + +TLS is required for all nickserver queries. + +When querying https://nicknym.domain.org, nickagent must validate the TLS connection in one of four possible ways: + +1.
Using a commercial CA certificate distributed with the host operating system. +2. Using DANE TLSA record to discover and validate the server certificate. +3. Using a seeded CA certificate (see [Discovering nickservers](#Discovering.nickservers)). +4. Using a custom self-signed CA certificate discovered for the domain, so long as the CA certificate was discovered via #1 or #2 or #3. Custom CA certificates may be discovered for a domain by making a provider key request of a nickserver (e.g. https://nicknym.known-domain.org/?address=new-domain.org). + +Optionally, a nickagent may make an encrypted query like so: + +0. Suppose the nickagent wants to make an encrypted query regarding the address alice@domain-x.org. +1. Nickagent discovers the public key for nicknym@domain-y.org. +2. The nickagent makes a POST request to a nickserver with two fields: address and ciphertext. +3. The address only contains the domain part of the address (unlike an unencrypted request). +4. The ciphertext field is encrypted to the public key for nicknym@domain-y.org. The corresponding cleartext contains the full address on the first line followed by a randomly generated symmetric key on the second line. +5. If the request was local, the nickserver handles the request. If the request is foreign, the nickserver proxies the request to the domain specified in the address field. +6. When the request gets to the right nickserver, the body of the nickserver response is encrypted using the symmetric key. The first line of the response specifies the cipher and mode used (allowed ciphers TBD). + +Comment: although it may seem excessive to encrypt both the request via TLS and the request body via OpenPGP, the reason for this is that many requests will not use OpenPGP. + +Automatic key validation +---------------------------------- + +A key is "validated" if the nickagent has bound the user address to a public key.
+ +Nicknym supports three different levels of key validation: + +* Level 3 - **path trusted**: A path of cryptographic signatures can be traced from a trusted key to the key under evaluation. By default, only the provider key from the user's provider is a "trusted key". +* Level 2 - **provider signed**: The key has been signed by a provider key for the same domain, but the provider key is not validated using a trust path (i.e. it is only registered). +* Level 1 - **registered**: The key has been encountered and saved, but it has no signatures (that are meaningful to the nickagent). + +A nickagent will try to validate using the highest level possible. + +Automatic renewal +----------------------------- + +A validated public key is replaced with a new key when: + +* The new key is **path trusted**. +* The new key is **provider signed**, but the old key is only **registered**. +* The new key has a later expiration, and the old key is only **registered** and will expire "soon" (exact time TBD). +* The agent discovers a new subkey, but the master signing key is unchanged. + +In all other cases, the new key is rejected. + +The nickagent will attempt to refresh a key by making a request to a nickserver of its choice when a key is past 3/4 of its lifespan and again when it is about to expire. + +Nicknym encourages, but does not require, the use of short lived public keys, in the range of X to Y days. It is recommended that short lived keys are not uploaded to OpenPGP keyservers. + +Automatic invalidation +---------------------------- + +A key is invalidated if: + +* The old key has expired, and no new key can be discovered with equal or greater validation level. + +This means validation is a one way street: once a certain level of validation is established for a user address, no client should accept any future keys for that address with a lower level of validation.
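The renewal rules listed under "Automatic renewal" can be sketched as a single decision function. This is a minimal illustration and not the key manager's actual code: the names `Validation`, `KeyRecord`, and `should_replace` are hypothetical, and the 30-day value standing in for "soon" is a placeholder, since the text leaves the exact time TBD.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import IntEnum


class Validation(IntEnum):
    """The three validation levels; a higher number means stronger validation."""
    REGISTERED = 1
    PROVIDER_SIGNED = 2
    PATH_TRUSTED = 3


@dataclass
class KeyRecord:
    """Hypothetical record of what a nickagent knows about one key."""
    fingerprint: str                 # fingerprint of the master signing key
    validation: Validation
    expires_on: datetime
    subkeys: frozenset = frozenset()


# Assumption: the spec leaves "soon" undefined; 30 days is only a placeholder.
EXPIRES_SOON = timedelta(days=30)


def should_replace(old: KeyRecord, new: KeyRecord, now: datetime) -> bool:
    """Apply the four renewal rules in order; reject the new key otherwise."""
    # Rule 1: the new key is path trusted.
    if new.validation == Validation.PATH_TRUSTED:
        return True
    # Rule 2: the new key is provider signed and the old key is only registered.
    if (new.validation == Validation.PROVIDER_SIGNED
            and old.validation == Validation.REGISTERED):
        return True
    # Rule 3: later expiration, old key only registered and expiring soon.
    if (old.validation == Validation.REGISTERED
            and new.expires_on > old.expires_on
            and old.expires_on - now <= EXPIRES_SOON):
        return True
    # Rule 4: same master signing key, but a subkey we have not seen before.
    if new.fingerprint == old.fingerprint and new.subkeys - old.subkeys:
        return True
    return False
```

Encoding the levels as ordered integers keeps the "one way street" property easy to check: a caller can compare `new.validation >= old.validation` before accepting a replacement for an expired key.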
+ +Discovering nickservers +-------------------------------- + +It is entirely up to the nickagent to decide what nickservers to query. If it wanted to, a nickagent could send all its requests to a single nickserver. + +However, nickagents should discover new nickservers and balance their queries to these nickservers for the purposes of availability, load balancing, network perspective, and hiding the user's association map. + +Whenever the nickagent is asked by a locally running application for a public key corresponding to an address on the domain `domain.org`, it may check to see if the host `nicknym.domain.org` exists. If the domain resolves, then the nickagent may add it to the pool of known nickservers. A nickagent should only perform this DNS check if it is able to do so over an encrypted tunnel. + +Additionally, a nickagent may be distributed with an initial list of "seed" nickservers. In this case, the nickagent is distributed with a copy of the CA certificate used to validate the TLS connection with each respective seed nickserver. + +Cross-provider signatures +---------------------------------- + +Nicknym does not support user signatures on user keys. There is no trust path from user to user. However, a service provider may sign the provider key of another provider. + +To be written. + +Auditing +---------------------------- + +In order to keep the user's provider from handing out bogus public keys, a nickagent should occasionally make foreign queries of the user's own address against nickservers run by third parties. The recommended frequency of these queries is once per day, at a random time during local waking hours. + +In order to prevent a nickserver from handing out bogus provider keys, a nickagent should query multiple nickservers before a provider key is registered or path trusted. 
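As a rough sketch of the audit described above, the helpers below build query URLs in the form used by the examples in this document, pick third-party nickservers at random, and check whether their answers agree. The function names are hypothetical, and comparing key fingerprints is an assumption about how a nickagent might compare answers. Fetching the responses and verifying their signatures is left out.

```python
import random
from urllib.parse import quote

NICKSERVER_PORT = 6425  # the document fixes the nickserver port at 6425


def query_url(nickserver_host, address):
    """Build a GET query URL in the form used by the examples in this document."""
    return "https://%s:%d/?address=%s" % (
        nickserver_host, NICKSERVER_PORT, quote(address, safe="@"))


def pick_auditors(pool, own_host, count, rng=random):
    """Choose up to `count` nickservers at random, skipping our own provider's."""
    candidates = [host for host in pool if host != own_host]
    return rng.sample(candidates, min(count, len(candidates)))


def answers_agree(fingerprints_by_host):
    """True if at least one nickserver answered and all reported the same fingerprint."""
    return len(set(fingerprints_by_host.values())) == 1
```

A nickagent could call `pick_auditors` once per audit, query each returned host with `query_url(host, own_address)`, and treat any disagreement reported by `answers_agree` as a sign that some nickserver is handing out a bogus key.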
+ +Possible attacks: + +**Attack 1 - Intercept Outgoing:** + +* Attack: provider `A` signs an impostor key for provider `B` and distributes it to users of `A` (in order to intercept outgoing messages sent to `B`). +* Countermeasure: By querying multiple nickservers for the provider key of `B`, the nickagent can detect if provider `A` is attempting to distribute impostor keys. + +**Attack 2 - Intercept Incoming:** + +* Attack: provider `A` signs an impostor key for one of its own users, and distributes it to users of provider `B` (in order to intercept incoming messages). +* Countermeasure: By querying for its own keys, a nickagent can detect if a provider is giving out bogus keys for their addresses. + +**Attack 3 - Association Mapping:** + +* Attack: A provider tracks all the requests for key discovery in order to build a map of association. +* Countermeasure: By performing foreign key queries via third party nickservers, an agent can prevent any particular entity from tracking their queries. + +Known vulnerabilities +------------------------------------------ + +The nicknym protocol does not yet have a good solution for dealing with the following problems: + +* Enumeration attack: an attacker can enumerate the list of all users for a provider by simply querying every possible username combination. We have no defense against this, although it would surely take a while. +* DDoS attack: by their very nature, nickservers perform a bit of work for every request. Because of this, they are vulnerable to being overloaded by a flood of bogus requests. +* Besmirch attack: a MitM attacker can sully the reputation of a provider by generating many bad responses (responses signed with the wrong key), thus leading other nickservers and nicknym agents to consider the provider compromised.
+ +Future enhancements +--------------------- + +**Additional discovery mechanisms** + +In addition to nickservers and SKS keyservers, there are two other potential methods for discovering public keys: + +* **Webfinger** includes a standard mechanism for distributing a user's public key via a simple HTTP request. This is very easy to implement on the server, and very easy to consume on the client side, but there are not many webfinger servers with public keys in the wild. +* **DNS** is used by multiple competing standards for key discovery. When and if one of these emerges as predominant, Nicknym should attempt to use this method when available. + + +Reference nickagent implementation +==================================================== + +https://github.com/leapcode/keymanager + +There is a reference nickagent implementation called "key manager" written in Python and integrated into the LEAP client. It uses Soledad to store its data. + +Public API +---------------------------- + +**refresh_keys()** + +updates the keys with fresh ones, as needed. + +**get_key(address, type)** + +returns a single public key for address. type is one of 'openpgp', 'otr', 'x509', or 'rsa'. + +**send_key(address, public_key, type)** + +authenticates with the appropriate provider and saves the public_key in the user database. + +Storage +-------------------------- + +Key manager uses Soledad for storage. GPGME, however, requires keys to be stored in keyrings, which are read from disk. + +For now, Key Manager deals with this by storing each key in its own keyring. In other words, every key is in a keyring with exactly 1 key, and this keyring is stored in a Soledad document. To keep from confusing this keyring with a normal keyring, I will call it a 'unitary keyring'. + +Suppose Alice needs to communicate with Bob: + +1. Alice's Key Manager copies to disk her private key and Bob's public key. The key manager gets these from Soledad, in the form of unitary keyrings. +2.
Client code uses GPGME, feeding it these temporary keyring files. +3. The keyrings are destroyed. + +TBD: how best to ensure destruction of the keyring files. + +An example Soledad document for an address: + + { + "address": "alice@example.org", + "keys": [ + { + "type": "openpgp", + "key": "binary blob", + "keyring": "binary blob", + "expires_on": "2014-01-01", + "validation": "provider_signed", + "first_seen_at": "2013-04-01 00:11:00", + "last_audited_at": "2013-04-02 12:00:00" + }, + { + "type": "otr", + "key": "binary blob", + "expires_on": "2014-01-01", + "validation": "registered", + "first_seen_at": "2013-04-01 00:11:00", + "last_audited_at": "2013-04-02 12:00:00" + } + ] + } + +Pseudocode +--------------------------- + +get_key + + # + # return a key for an address + # + function get_key(address, type) + if key for address exists in soledad database? + return key + else + fetch key from nickserver + save it in soledad + return key + end + end + +send_key + + # + # send the user's provider the user's key. this key will get signed by the provider, and replace any prior keys + # + function send_key(type) + if not authenticated: + error! + end + get (self.address, type) + send (key_data, type) to the provider + end + +refresh_keys + + # + # update the user's db of validated keys to see if there are changes. + # + function refresh_keys() + for each key in the soledad database (that should be checked?): + newkey = fetch_key_from_nickserver() + if key is about to expire and newkey complies with the renewal parameters: + replace key with newkey + else if fingerprint(key) != fingerprint(newkey): + freak out, something wrong is happening? :) + maybe handle revocation, or try to get some voting for a given key and save that one (retrieve it through tor/vpn/etc and see what's the most found key or something like that.
+ else: + everything's cool for this key, continue + end + end + end + +private fetch_key_from_nickserver + + function fetch_key_from_nickserver(key) + randomly pick a subset of the available nickservers we know about + send a tcp request to each in this subset in parallel + first one that opens a successful socket is used, all the others are terminated immediately + make http request + parse json for the keys + return keys + end + + +Reference nickserver implementation +===================================================== + +https://github.com/leapcode/nickserver + +The reference nickserver is written in Ruby 1.9 and licensed GPLv3. It is lightweight and scalable (supporting high concurrency, and reasonable latency), and uses EventMachine for asynchronous network IO. Data is stored in CouchDB. + diff --git a/pages/docs/design/overview.md b/pages/docs/design/overview.md new file mode 100644 index 0000000..e477806 --- /dev/null +++ b/pages/docs/design/overview.md @@ -0,0 +1,403 @@ +@nav_title = "Overview" +@title = "Overview of LEAP architecture" +@summary = "Bird's eye view of how all the pieces fit together." + +The LEAP Platform allows an organization to deploy and manage a complete infrastructure for providing user communication services. + +This document gives a brief overview of how the pieces fit together. + +LEAP Client +=================== + +The LEAP Client is an application that runs on a user's own device and is responsible for all encryption of user data. The client must be installed on a user's device before they can access any LEAP services (except for user support via the web application). + +Desktop Client +-------------------------- + +LEAP Client for Linux, Windows, and Mac. + +Written in: Python + +Libraries used: QT, PyQT, OpenVPN, Sqlite, Sqlcipher, U1DB, OpenSSL, GPG.
+ +User interface: + +* First run wizard: walks the user through the bootstrap process when the client is first run (either registering a new user or authenticating as an existing user) +* Preferences panel: A mac system-preferences-like place to edit all the LEAP client settings (does not exist yet). +* Task bar: Show the status of LEAP services (connected? syncing?), and lets the user open the preferences panel. +* Update wizard: a dialog that shows the code update progress. + +Android Client +------------------------------ + +LEAP Client for Android. + +Written in: Java (possibly with some Python in the future) + +Libraries used: sqlcipher, sqlite, bouncycastle, U1DB, OpenVPN. + +User interface: + +* Single button to connect or disconnect encrypted internet +* A notification drawer item indicating status of VPN +* A first run wizard + +Features (planned): + +* a sync provider to allow contacts and calendar data to be sync'ed via Soledad. +* eventually, match the desktop client in features. + + +LEAP Admin Tools +==================================== + +Platform Recipes +------------------------------ + +The LEAP platform recipes define an abstract service provider. It consists of puppet modules designed to work together to provide a system administrator everything they need to manage a service provider infrastructure that provides secure communication services. + +Typically, a system administrator will not need to modify the LEAP platform recipes, although they are free to fork and merge as desired. Most service providers using the LEAP platform will use the same platform recipes. + +The recipes are abstract. In order to configure settings for a particular service provider, a system administrator creates a provider instance. The platform recipes also include a base provider that provider instances inherit from.
+ +Provider Instance +---------------------------------- + +A "provider instance" is a directory tree (typically tracked in git) containing all the configurations for a service provider's infrastructure. A provider instance primarily consists of: + +* A configuration file for each server (node) in the provider's infrastructure (e.g. nodes/vpn1.json) +* A global configuration file for the provider (e.g. provider.json). +* Additional files, such as certificates and keys (e.g. files/nodes/vpn1/vpn1_ssh.pub). +* A pointer to the platform recipes (as defined in "Leapfile") + +A minimal provider instance directory looks like this: + + + └── bitmask # provider instance directory. + ├── common.json # settings common to all nodes. + ├── Leapfile # various settings for this instance. + ├── provider.json # global settings of the provider. + ├── files/ # keys, certificates, and other files. + ├── nodes/ # a directory for node configurations. + └── users/ # public key information for privileged sysadmins. + +A provider instance directory contains everything needed to manage all the servers that compose a provider's infrastructure. Because of this, you can use normal git development work-flow to manage your provider instance. + +Command line program +------------------------------- + +The command line program `leap` is used by sysadmins to manage everything about a service provider's infrastructure. Except when creating a new provider instance, `leap` is run from within the directory tree of a provider instance. + +The `leap` command line has many capabilities, including: + +* create an initial provider instance +* create, initialize, and deploy nodes (e.g. servers) +* manage keys and certificates +* query information about the node configurations + +Traditional system configuration automation systems, like puppet or chef, deploy changes to servers using a pull method. Each server pulls a manifest from a central master server and uses this to alter the state of the server.
+ +Instead, LEAP uses a masterless push method: The user runs 'leap deploy' from the provider instance directory on their desktop machine to push the changes out to every server (or a subset of servers). LEAP still uses puppet, but there is no central master server that each node must pull from. + +One other significant difference between LEAP and typical system automation is how interactions among servers are handled. Rather than store a central database of information about each server that can be queried when a recipe is applied, the `leap` command compiles a static representation of all the information a particular server will need in order to apply the recipes. In compiling this static representation, `leap` can use arbitrary programming logic to query and manipulate information about other servers. + +These two approaches, masterless push and pre-compiled static configuration, allow the sysadmin to manage a set of LEAP servers using traditional software development techniques of branching and merging, to more easily create local testing environments using virtual servers, and to deploy without the added complexity and failure potential of a master server. + +Server-side Components +======================================= + +These are components where most of the code and logic runs on a server (as opposed to client-side components, where most of the code runs on the client). + +Databases +------------------------------------ + +All user data is stored using BigCouch, a decentralized and high-availability version of CouchDB. + +The databases are used by the different services and sometimes work as communication channels between the services.
+ +These are the databases we currently use: + +* customers -- payment information for the webapp +* identities -- alias information, written by the webapp, read by leap_mx and nickserver +* keycache -- used by the nickserver +* sessions -- web session persistence for the webapp +* shared -- used by soledad +* tickets -- help tickets issued in the webapp +* tokens -- created by the webapp on login, used by soledad to authenticate +* users -- user records used by the webapp including the authentication data +* user-...id... -- client-encrypted user data accessed from the client via soledad + +### Database Setup + +The main couch databases are initially created, seeded and updated when deploying the platform. + +The site_couchdb module contains the database description and security settings in `manifests/create_dbs.pp`. The design docs are seeded from the files in `files/designs/:db_name`. If these files change, the next puppet deploy will update the databases accordingly. Both the webapp and soledad have scripts that will dump the required design docs so they can be included here. + +The per-user databases are created upon user registration by [Tapicero](https://leap.se/docs/design/tapicero). Tapicero also adds security and design documents. The design documents for per-user databases are stored in the [tapicero repository](https://github.com/leapcode/tapicero) in `designs`. Tapicero can be used to update existing user databases with new security settings and design documents. + +### BigCouch + +Like many NoSQL databases, BigCouch is inspired by [Amazon's Dynamo paper](http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf) and works by sharding each database among many servers using a circular ring hash. The number of shards might be greater than the number of servers, in which case each server would have multiple shards of the same database.
Each server in the BigCouch cluster appears to contain the entire database, but actually it will just proxy the request to the actual database that has the content (if it does not have the document itself). + +Important BigCouch constants: + +* Q -- The number of shards over which a database will spread. +* N -- The number of redundant copies of each document. Default is 3. +* W -- The number of document copies that must be saved before document is 'written'. Default is 2. +* R -- The number of document copies that must be found before document is 'read'. Default is 2. +* Z -- The number of zones in the cluster. Each zone will have a complete copy of all the data. Default is 1. + +In LEAP, every service that needs to interact with the database runs a local HTTP load balancer that distributes database requests randomly to the BigCouch cluster. If a BigCouch node dies, the load balancer detects this and takes it out of rotation (this usage is typical of BigCouch installations). + +Web App +------------------------------ + +The LEAP Web App provides the following functions: + +* User registration and management +* Help tickets +* Client certificate renewal +* Webfinger access to user's public keys +* Email aliases and forwarding +* Localized and Customizable documentation + +Written in: Ruby, Rails. + +The Web App communicates with: + +* CouchDB is used for all data storage. +* Web browsers of users accessing the user interface in order to edit their settings or fill out help tickets. Additionally, admins may delete users. +* LEAP Clients access the web app's REST API in order to register new users, authenticate existing ones, and renew client certificates. +* tokens are stored upon successful authentication to allow the client to authenticate against other services + +Nickserver +------------------------------ + +Written in: Ruby +Libraries: EventMachine, GPG + +Nickserver is the opposite of a key server.
A key server allows you to lookup keys, and the UIDs associated with a particular key. A nickserver allows you to query a particular 'nick' (e.g. username@example.org) and get back relevant public key information for that nick. + +Nickserver has the following properties: + +* Written in Ruby, licensed GPLv3 +* Lightweight and scalable (high concurrency, reasonable latency) +* Uses asynchronous network IO for both server and client connections (via EventMachine) +* Attempts to reply to queries using four different methods: + * Cached key in CouchDB + * Webfinger + * DNS + * HKP keyserver pool (https://hkps.pool.sks-keyservers.net) + +Why bother writing Nickserver instead of just using the existing HKP keyservers? + +* Keyservers are fundamentally different: Nickserver is a registry of 1:1 mapping from nick (uid) to public key. Keyservers are directories of public keys, which happen to have some uid information in the subkeys, but there is no way to query for an exact uid. +* Support clients: the goal is to provide clients with a cloud-based method of rapidly and easily converting nicks to keys. Client code can stay simple by pushing more of the work to the server. +* Enhancements over keyservers: the goal with Nickserver is to support future enhancements like webfinger, DNS key lookup, mail-back verification, network perspective, and fast distribution of short lived keys. +* Scalable: the goal is for a service that can handle many simultaneous requests very quickly with low memory consumption. + +Miscellaneous +------------------------------ + +A LEAP service provider might also run servers with the following services: + +* git -- private git repository hosting. +* Domain Name Server -- Authoritative name server for the provider's domain. 
+* Tapicero -- headless daemon that watches couch changes for new users and creates their databases + +Client-side Components +====================================== + +Most of the code and processing for these components happens on the client-side, although they all include some interaction with cloud services. + +Soledad +------------------------------ + +Soledad stands for "Synchronization Of Locally Encrypted Data Among Devices". On the client side, Soledad is responsible for client-encrypting user data, keeping it in sync with the copy in the cloud, and for providing local applications with a simple API for data storage. This "client-side Soledad" is essentially a local database that is kept in sync with the cloud. The "Soledad Server" is the cloud-based component that the client syncs with. + +Written in: Python (on desktops and servers), possibly Java (on android, not yet written). + +Libraries used: + +* Client-side: U1DB, Sqlite, Sqlcipher, GPG. +* Server: U1DB (forked), CouchDB. + +Client-side Soledad communicates with: + +* Other client application code, providing a storage API. +* Soledad Server via the U1DB synchronization protocol. + +Soledad Server communicates with: + +* LEAP Client via the U1DB synchronization protocol +* CouchDB or OpenStack Object Storage for backend storage. + +Client-side Soledad Notes: + +* Soledad is a modification of the U1DB Python reference implementation with changes to support client-side encryption and to replace sqlite with sqlcipher. +* Local data is stored on disk as an SQLite DB file(s) that is block-encrypted with sqlcipher (AES128). +* Before being synced to the server, a document is block-encrypted using a symmetric key composed from HMAC of the document id and a long secret (Soledad secret). +* Soledad secret is stored on-disk encrypted to the user's OpenPGP key. A copy is stored on the server as well. The same secret is shared among all the clients a user has activated.
+* Soledad inherits these traits from U1DB: + * The storage API used by client code is similar to couchdb (schema-less document storage with indexes). + * Application code using Soledad is responsible for resolving sync conflicts. + +Secrets Manager +------------------------------ + +Not yet written. + +Written in: Python +Libraries used: GPG, GnuTLS + +Communicates with: Nickserver (cloud), Soledad (local). + +The Secrets Manager is a library that exposes to local client code an API for managing cryptographic material. It is responsible for: + +* private secrets: the user's private and public keys and certificates. +* public keys: discovering, registering, and trusting the public keys of other people. +* creation: creating keys as needed. +* renewal: fetch a new client certificate when the current one is about to expire. +* recovery: allow the user to recover their data if they lose everything except for a recovery code. +* crypto hardware: allow the user to unlock secrets via an OpenPGP smart card like cryptostick. + +**Example secrets** + +* A user's OpenPGP keypair +* The symmetric key used to encrypt local data (used by sqlcipher) +* The client certificate used to auth with OpenVPN gateway. +* The client certificate used to auth with the SMTP gateway. + +**Public key management** + +Some functionality of public key management: + +* Discover the public keys of recipients and senders via a Nickserver. +* "Register" the discovered keys, either using a federated path through the provider, directly, or via trust on first use (TOFU). For now, we will start initially with TOFU. +* Allow the user to choose between two competing keys when a recipient has multiple candidate keys. +* Allow the user to specify keys that should be not used. +* Allow the user to manually specify a user's public key. + +**Recovery** + +* Allow the user to generate and print out a recovery code. 
This creates a record on the server, in an anonymized way, that can be used to restore all the secrets stored by the key/secret manager and thus recover all your data. The provider should not know what recovery information maps to which user. +* Eventually, perhaps allow the user to specify other users who have the power to recover their lost secrets in the event that the user forgets their password. +* Allow the user to enter this recovery code when they have lost their username and password. If this is enabled, the user's private keys are stored in the cloud, albeit encrypted and anonymized. +* Give some users the option of full recovery via email reset by storing the user's password on the server. This would be a very low security option, but one that some users may wish to opt-in for. + +**Notes** + +* All secrets are stored in Soledad, except the secret to unlock Soledad storage. This way, all clients will have access to the same secrets. For some things, like validated public keys, this is exactly what we want. For other things, this could be a problem, and should be refined in future revisions. +* The current scheme is to store the user's private keys and private secrets in their Soledad storage. This allows a user to login with a different device and be all set up. There are, however, certainly problems with this approach. + + +Bootstrap +------------------------------ + +Parts of this are written. + +Written in: Python + +* Register new accounts or authenticate via the REST API, using SRP. +* Download the providers definition file, and various service definition files. +* Validate the CA certificate of the service provider. +* If using an existing account on a new device, fetch user's secrets from the cloud (not yet written). +* If creating a new account, generate a key pair and store in the cloud (not yet written). + +Update Manager +------------------------------ + +Not yet written. 
+ +Handles upgrading the client by downloading and installing signed code. + +Three goals: + +* Frequent Updates: we want to be able to push out small and frequent updates should the need arise. +* Secure Updates: we want to ensure that the update mechanism cannot be used as an attack vector. +* Third Party Updates: we want a third party to be responsible for updates, NOT the service provider itself. + +End User Services +========================================= + +Email +------------------------------ + +Not yet working, some of the parts are written. + +Written in: Python + +Email in the client consists of three parts: + +* SMTP Proxy: for outgoing mail. + * Communicates with user's MUA (local), Key Manager (local), Nickserver (cloud), and SMTP relay (cloud). +* Message Receiver: for incoming mail. + * Communicates with Soledad (local), Key Manager (local). +* IMAP Server: for reading and writing to user's mailbox. + * Communicates with Soledad (local), user's MUA (local). + +Outgoing mail workflow: + +* LEAP client runs a thin SMTP proxy on the user's device, bound to localhost. +* User's MUA is configured outgoing SMTP to localhost +* When SMTP proxy receives an email from MUA + * SMTP proxy queries Key Manager for the user's private key and public keys of all recipients + * Message is signed by sender and encrypted to recipients. + * If recipient's key is missing, email goes out in cleartext (unless user has configured option to send only encrypted email) + * Finally, message is relayed to provider's SMTP relay + +Incoming email workflow: + +* Incoming message is received by provider's MX servers. +* Message is encrypted to the user's public key (if not already so), and stored in the user's incoming message queue. +* Message queue is synced to client device via Soledad. +* "Message Receiver" in the LEAP Client empties message queue, unencrypting each message and saving it in the user's inbox, stored in local Soledad database. 
+* Local database gets client-encrypted and sync'ed to cloud and other devices owned by the user via Soledad. + +Mail storage workflow: + +* LEAP client runs a thin IMAP server on the user's device, bound to localhost. +* User's MUA is configured to use localhost for the mail account. +* Local IMAP server runs against a local database of the user's email data (accessed via Soledad). +* Soledad will sync changes made to mailboxes with the cloud and other clients. + +Encrypted Internet +------------------------------ + +The goal behind the encrypted internet service is to provide an automatic, always-on, trouble-free way to encrypt a user's network traffic. For now, we use OpenVPN for the transport (OpenVPN uses TLS for session negotiation and IPSec for data). + +Written in: C (OpenVPN binary), Python (desktop controlling code), Java (Android controlling code) +Libraries: Qt +Uses: OpenVPN + +Communicates with: + +* All traffic is routed through one of the provider's OpenVPN gateways. +* OpenVPN binary and LEAP client communicate via a telnet administration interface to OpenVPN. +* Client discovers gateways and fetches client certificate from the provider's HTTP API. + +User Interface: + +* Initial connection attempt takes place in the first run wizard, displaying any errors along the way. +* After first run, the client will display the status of the encrypted internet in the task tray (Windows, Linux), menu bar (Mac), or notification drawer (Android). +* The three main UI functions of the encrypted internet will be: connect/disconnect, choose gateway, view errors. + +Notes: + +* OpenVPN must be started with superuser privileges (or have the ability to execute network changes as superuser). Afterwards, it can drop the privileges. +* OpenVPN authentication with the gateway uses an X.509 client certificate. This certificate is short-lived, and is acquired by the client from the provider's HTTP API as needed.
+ +Workflow: + +* user installs client +* on first run + * client downloads and validates service provider's definition file, CA cert, and encrypted internet service definition file. + * user registers new account or authenticates with provider's webapp REST API + * SRP is used, so the server never sees the password and does not store a hash of the password. + * if registering, new record is created for user in distributed users db. +* client gets a new client certificate from webapp, if missing or expired + * authenticate via SRP with webapp + * webapp retrieves client cert from a pool of pre-generated certificates. + * cert pool is filled as needed by background CA daemon. +* client connects to OpenVPN gateway, picked from among those listed in service definition file, authenticating with client certificate. +* by default, when user starts computer the next time, client autoconnects. diff --git a/pages/docs/design/soledad.md b/pages/docs/design/soledad.md new file mode 100644 index 0000000..a0eeed4 --- /dev/null +++ b/pages/docs/design/soledad.md @@ -0,0 +1,423 @@ +@title = 'Soledad' +@summary = 'A server daemon and client library to provide client-encrypted application data that is kept synchronized among multiple client devices.' +@toc = true + +Introduction +===================== + +Soledad allows client applications to securely share synchronized document databases. Soledad aims to provide a cross-platform, cross-device, syncable document storage API, with the addition of client-side encryption of database replicas and document contents stored on the server. + +Key aspects of Soledad include: + +* **Client and server:** Soledad includes a server daemon and client application library. +* **Client-side encryption:** Soledad puts very little trust in the server by encrypting all data before it is synchronized to the server and by limiting ways in which the server can modify the user's data.
+* **Local storage:** All data cached locally is stored in an encrypted database. +* **Document database:** An application using the Soledad client library is presented with a document-centric database API for storage and sync. Documents may be indexed, searched, and versioned. + +The current reference implementation of Soledad is written in Python and distributed under a GPLv3 license. + +Soledad is an acronym of "Synchronization of Locally Encrypted Documents Among Devices" and means "solitude" in Spanish. + +Goals +====================== + +**Security goals** + +* *Client-side encryption:* Before any data is synced to the cloud, it should be encrypted on the client device. +* *Encrypted local storage:* Any data cached in the client should be stored in an encrypted format. +* *Resistant to offline attacks:* Data stored on the server should be highly resistant to offline attacks (i.e. an attacker with a static copy of data stored on the server would have a very hard time discerning much from the data). +* *Resistant to online attacks:* Analysis of storing and retrieving data should not leak potentially sensitive information. +* *Resistance to data tampering:* The server should not be able to provide the client with old or bogus data for a document. + +**Synchronization goals** + +* *Consistency:* multiple clients should all get sync'ed with the same data. +* *Sync flag:* the ability to partially sync data. For example, so a mobile device doesn't need to sync all email attachments. +* *Multi-platform:* supports both desktop and mobile clients. +* *Quota:* the ability to identify how much storage space a user is taking up. +* *Scalable cloud:* distributed master-less storage on the cloud side, with no single point of failure. +* *Conflict resolution:* conflicts are flagged and handed off to the application logic to resolve. + +**Usability goals** + +* *Availability*: the user should always be able to access their data. 
+* *Recovery*: there should be a mechanism for a user to recover their data should they forget their password. + +**Known limitations** + +These are currently known limitations: + +* The server knows when the contents of a document have changed. +* There is no facility for sharing documents among multiple users. +* Soledad is not able to prevent the server from withholding new documents or new revisions of a document. +* Deleted documents are never deleted, just emptied. Useful for security reasons, but could lead to DB bloat. + +**Non-goals** + +* Soledad is not for filesystem synchronization, storage or backup. It provides an API for application code to synchronize and store arbitrary schema-less JSON documents in one big flat document database. One could model a filesystem on top of Soledad, but it would be a bad fit. +* Soledad is not intended for decentralized peer-to-peer synchronization, although the underlying synchronization protocol does not require a server. Soledad takes a cloud approach in order to ensure that a client has quick access to an available copy of the data. + +Related software +================================== + +[Crypton](https://crypton.io/) - Similar goals to Soledad, but in JavaScript for HTML5 applications. + +[Mylar](https://github.com/strikeout/mylar) - Like Crypton, Mylar can be used to write secure HTML5 applications in JavaScript. Uniquely, it includes support for homomorphic encryption to allow server-side searches. + +[Firefox Sync](https://wiki.mozilla.org/Services/Sync) - A client-encrypted data sync from Mozilla, designed to securely synchronize bookmarks and other browser settings. + +[U1DB](http://pythonhosted.org/u1db/) - Similar API to Soledad, without encryption. + +Soledad protocol +=================================== + +Document API +----------------------------------- + +Soledad's document API is similar to the [API used in U1DB](http://pythonhosted.org/u1db/reference-implementation.html).
+ +* Document storage: `create_doc()`, `put_doc()`, `get_doc()`. +* Synchronization with the server replica: `sync()`. +* Document indexing and searching: `create_index()`, `list_indexes()`, `get_from_index()`, `delete_index()`. +* Document conflict resolution: `get_doc_conflicts()`, `resolve_doc()`. + +For example, create a document, modify it and sync: + + sol.create_doc({'my': 'doc'}, doc_id='mydoc') + doc = sol.get_doc('mydoc') + doc.content = {'new': 'content'} + sol.put_doc(doc) + sol.sync() + +Storage secret +----------------------------------- + +The `storage_secret` is a long, randomly generated key used to derive encryption keys for both the documents stored on the server and the local replica of these documents. The `storage_secret` is block encrypted using a key derived from the user's password and saved locally on disk in a file called `<user_uid>.secret`, which contains a JSON structure that looks like this: + + { + "storage_secrets": { + "<secret_id>": { + "kdf": "scrypt", + "kdf_salt": "<b64 repr of salt>", + "kdf_length": <key_length>, + "cipher": "aes256", + "length": <secret_length>, + "secret": "<encrypted storage_secret>" + } + }, + "kdf": "scrypt", + "kdf_salt": "<b64 repr of salt>", + "kdf_length": <key_length> + } + +The `storage_secrets` entry is a map that stores information about available storage keys. Currently, Soledad uses only one storage key per provider, but this may change in the future. + +The following fields are stored for one storage key: + +* `secret_id`: a handle used to refer to a particular `storage_secret` and equal to `sha256(storage_secret)`. +* `kdf`: the key derivation function to use. Only scrypt is currently supported. +* `kdf_salt`: the salt used in the kdf. The salt for scrypt is not random, but encodes important parameters like the limits for time and memory. +* `kdf_length`: the length of the derived key resulting from the kdf. +* `cipher`: what cipher to use to encrypt `storage_secret`.
It must match `kdf_length` (i.e. the length of the derived_key). +* `length`: the length of `storage_secret`, when not encrypted. +* `secret`: the encrypted `storage_secret`, created by `sym_encrypt(cipher, storage_secret, derived_key)` (base64 encoded). + +Other variables: + +* `derived_key` is equal to `kdf(user_password, kdf_salt, kdf_length)`. +* `storage_secret` is equal to `sym_decrypt(cipher, secret, derived_key)`. + +When a client application first wants to use Soledad, it must provide the user's password to unlock the `storage_secret`: + + from leap.soledad.client import Soledad + sol = Soledad( + uuid='<user_uid>', + passphrase='<user_passphrase>', + secrets_path='~/.config/leap/soledad/<user_uid>.secret', + local_db_path='~/.config/leap/soledad/<user_uid>.db', + server_url='https://<soledad_server_url>', + cert_file='~/.config/leap/providers/<provider>/keys/ca/cacert.pem', + auth_token='<auth_token>', + secret_id='<secret_id>') # optional argument + + +Currently, the `storage_secret` is shared among all devices with access to a particular user's Soledad database. See [Recovery and bootstrap](#Recovery.and.bootstrap) for how the `storage_secret` is initially installed on a device. + +We don't use the `derived_key` as the `storage_secret` because we want the user to be able to change their password without needing to re-key. + +Document encryption +------------------------ + +Before a JSON document is synced with the server, it is transformed into a document that looks like this: + + { + "_enc_json": "<ciphertext>", + "_enc_scheme": "symkey", + "_enc_method": "aes256ctr", + "_enc_iv": "<initialization_vector>", + "_mac": "<auth_mac>", + "_mac_method": "hmac" + } + +About these fields: + +* `_enc_json`: The original JSON document, encrypted and hex encoded. 
Calculated as: + * `doc_key = hmac(storage_secret[MAC_KEY_LENGTH:], doc_id)` + * `ciphertext = hex(sym_encrypt(cipher, content, doc_key))` +* `_enc_scheme`: Information about the encryption scheme used to encrypt this document (i.e. `pubkey`, `symkey` or `none`). +* `_enc_method`: Information about the block cipher that is used to encrypt this document. +* `_enc_iv`: The initialization vector used when encrypting this document. +* `_mac`: A MAC to prevent the server from tampering with stored documents. Calculated as: + * `mac_key = hmac(storage_secret[:MAC_KEY_LENGTH], doc_id)` + * `_mac = hmac(doc_id|rev|ciphertext|_enc_scheme|_enc_method|_enc_iv, mac_key)` +* `_mac_method`: The method used to calculate the mac above (currently hmac). + +Other variables: + +* `doc_key`: This value is unique for every document and only kept in memory. We use `doc_key` instead of simply `storage_secret` in order to hinder possible derivation of `storage_secret` by the server. Every `doc_id` is unique. +* `content`: equal to `sym_decrypt(cipher, ciphertext, doc_key)`. + +When receiving a document with the above structure from the server, Soledad client will first verify that `_mac` is correct, then decrypt the `_enc_json` to find `content`, which it saves as a cleartext document in the local encrypted database replica. + +The document MAC includes the document revision and the client will refuse to download a new document if the document does not include a higher revision. In this way, the server cannot roll back a document to an older revision. The server also cannot delete a document, since document deletion is handled by removing the document contents, marking it as deleted, and incrementing the revision. However, a server can withhold from the client new documents and new revisions of a document (including withholding document deletion). + +The currently supported encryption ciphers are AES256 (CTR mode) and XSalsa20. The currently supported MAC method is HMAC with SHA256.
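The key and MAC derivation described in this section can be sketched with Python's standard library. This is only an illustration, not the reference implementation: the value of `MAC_KEY_LENGTH`, the helper names, the use of HMAC-SHA256 for the key derivation step, and the reading of `|` as plain concatenation are all assumptions that the text above does not confirm.

```python
import hashlib
import hmac

# Assumption: the document does not state MAC_KEY_LENGTH; 32 bytes is illustrative.
MAC_KEY_LENGTH = 32


def _hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC with SHA256, the MAC method named in the text."""
    return hmac.new(key, message, hashlib.sha256).digest()


def derive_doc_key(storage_secret: bytes, doc_id: str) -> bytes:
    """doc_key = hmac(storage_secret[MAC_KEY_LENGTH:], doc_id)"""
    return _hmac_sha256(storage_secret[MAC_KEY_LENGTH:], doc_id.encode("utf-8"))


def derive_mac_key(storage_secret: bytes, doc_id: str) -> bytes:
    """mac_key = hmac(storage_secret[:MAC_KEY_LENGTH], doc_id)"""
    return _hmac_sha256(storage_secret[:MAC_KEY_LENGTH], doc_id.encode("utf-8"))


def compute_doc_mac(storage_secret, doc_id, rev, ciphertext,
                    enc_scheme, enc_method, enc_iv) -> str:
    """MAC over doc_id|rev|ciphertext|_enc_scheme|_enc_method|_enc_iv.

    Assumption: '|' is treated as simple string concatenation.
    """
    message = "".join([doc_id, rev, ciphertext, enc_scheme, enc_method, enc_iv])
    mac_key = derive_mac_key(storage_secret, doc_id)
    return _hmac_sha256(mac_key, message.encode("utf-8")).hex()


def verify_doc_mac(storage_secret, doc_id, rev, ciphertext,
                   enc_scheme, enc_method, enc_iv, received_mac: str) -> bool:
    """Recompute the MAC and compare it in constant time."""
    expected = compute_doc_mac(storage_secret, doc_id, rev, ciphertext,
                               enc_scheme, enc_method, enc_iv)
    return hmac.compare_digest(expected, received_mac)
```

Because the revision is part of the MAC input, a document replayed under a different revision string fails verification, which is the property the rollback protection relies on.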
+ +Document synchronization +----------------------------------- + +Soledad follows the U1DB synchronization protocol, with some changes: + +* Add the ability to flag some documents so they are not synchronized by default (not fully supported yet). +* Refuse to synchronize a document if it is encrypted and the MAC is incorrect. +* Always use `https://<soledad_server_url>/user-<user_uid>` as the synchronization URL. + + + doc = sol.create_doc({'some': 'data'}) + doc.syncable = False + sol.sync() # will not send the above document to the server! + +Document IDs +-------------------- + +Like U1DB, Soledad allows the programmer to use whatever ID they choose for each document. However, it is best practice to let the library choose random IDs for each document so as to ensure you don't leak information. In other words, leave the second argument to `create_doc()` empty. + +Re-keying +----------- + +Sometimes there is a need to change the `storage_secret`. Rather than re-encrypt every document, Soledad implements a system called "lazy revocation" where a new `storage_secret` is generated and used for all subsequent encryption. The old `storage_secret` is still retained and used when decrypting older documents that have not yet been re-encrypted with the new `storage_secret`. + +Authentication +----------------------- + +Unlike U1DB, Soledad only supports token authentication and does not support OAuth. Soledad itself does not handle authentication. Instead, this job is handled by a thin HTTP WSGI middleware layer running in front of the Soledad server daemon, which retrieves valid tokens from a certain shared database and compares them with the user-provided token. How the session token is obtained is beyond the scope of Soledad.
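To make the middleware idea concrete, here is a minimal WSGI sketch of token checking. It is not Soledad's actual middleware: the class name `TokenAuthMiddleware`, the `Authorization: Token <user_id>:<token>` header layout, and the `token_lookup` callable standing in for the shared token database are all hypothetical.

```python
import hmac


class TokenAuthMiddleware:
    """Hypothetical WSGI middleware that rejects requests without a valid token."""

    def __init__(self, app, token_lookup):
        self.app = app
        # token_lookup(user_id) returns the expected token string or None.
        # It stands in for the shared token database mentioned in the text.
        self.token_lookup = token_lookup

    def __call__(self, environ, start_response):
        # Assumed header layout: "Token <user_id>:<token>".
        header = environ.get("HTTP_AUTHORIZATION", "")
        scheme, _, credentials = header.partition(" ")
        user_id, _, token = credentials.partition(":")

        expected = self.token_lookup(user_id) if scheme == "Token" else None
        # compare_digest avoids leaking the match position through timing.
        if expected is None or not hmac.compare_digest(
            expected.encode("utf-8"), token.encode("utf-8")
        ):
            start_response("401 Unauthorized", [("Content-Type", "text/plain")])
            return [b"unauthorized"]

        return self.app(environ, start_response)
```

Wrapping an application is then a matter of `TokenAuthMiddleware(app, lookup)`, where `lookup` could be as simple as a dictionary's `get` method during testing.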
+ +Bootstrap and recovery +------------------------------------------ + +Because documents stored on the server's database replica have their contents encrypted with keys based on the `storage_secret`, initial synchronizations of newly configured provider accounts are only possible if the secret is transferred from one device to another. Thus, installation of Soledad in a new device or account recovery after data loss is only possible if specific recovery data has previously been exported and either stored on the provider or imported on a new device. + +Soledad may export a recovery document containing recovery data, which may be password-encrypted and stored on the server, or stored in a safe environment in order to be later imported into a new Soledad installation. + +**Recovery document** + +An example recovery document: + + { + 'storage_secrets': { + '<secret_id>': { + 'kdf': 'scrypt', + 'kdf_salt': '<b64 repr of salt>', + 'kdf_length': <key length>, + 'cipher': 'aes256', + 'length': <secret length>, + 'secret': '<encrypted storage_secret>', + }, + }, + 'kdf': 'scrypt', + 'kdf_salt': '<b64 repr of salt>', + 'kdf_length': <key length>, + '_mac_method': 'hmac', + '_mac': '<mac>' + } + +About these fields: + +* `secret_id`: a handle used to refer to a particular `storage_secret` and equal to `sha256(storage_secret)`. +* `kdf`: the key derivation function to use. Only scrypt is currently supported. +* `kdf_salt`: the salt used in the kdf. The salt for scrypt is not random, but encodes important parameters like the limits for time and memory. +* `kdf_length`: the length of the derived key resulting from the kdf. +* `length`: the length of the secret. +* `secret`: the encrypted `storage_secret`. +* `cipher`: what cipher to use to encrypt `secret`. It must match `kdf_length` (i.e. the length of the `derived_key`). +* `_mac_method`: The method used to calculate the mac above (currently hmac). +* `_mac`: Defined as `hmac(doc_id|rev|ciphertext, doc_key)`.
The purpose of this field is to prevent the server from tampering with the stored documents. + +Currently, scrypt parameters are: + + N (CPU/memory cost parameter) = 2^14 = 16384 + p (parallelization parameter) = 1 + r (length of block mixed by SMix()) = 8 + dkLen (length of derived key) = 32 bytes = 256 bits + +Other fields we might want to include in the future: + +* `expires_on`: the month in which this recovery document should be purged from the database. The server may choose to purge documents before their expiration, but it should not let them linger after it. +* `soledad`: the encrypted `soledad.json`, created by `sym_encrypt(cipher, contents(soledad.json), derived_key)` (base64 encoded). +* `reset_token`: an optional encrypted password reset token, if supported by the server, created by `sym_encrypt(cipher, password_reset_token, derived_key)` (base64 encoded). The purpose of the reset token is to allow recovery using the recovery code even if the user has forgotten their password. It is only applicable if using recovery code method. + +**Recovery database** + +In order to support easy recovery, the Soledad client stores a recovery document in a special recovery database. This database is shared among all users. + +The recovery database supports two functions: + +* `get_doc(doc_id)` +* `put_doc(doc_id, recovery_document_content)` + +Anyone may perform an unauthenticated `get_doc` request. To mitigate the potential attacks, the response to queries of the recovery database must have a long delay of X seconds. Also, the `doc_id` is very long (see below). + +Although the database is shared, the user must authenticate via the normal means before they are allowed to put a recovery document. Because of this, a nefarious server might potentially record which user corresponds to which recovery documents. A well behaved server, however, will not retain this information. If the server supports authentication via blind signatures, then this will not be an issue.
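As an aside on the scrypt parameters listed for the recovery document, the derivation `derived_key = kdf(user_password, kdf_salt, kdf_length)` can be tried out with `hashlib.scrypt` from Python's standard library (available when Python is built against OpenSSL 1.1 or later). This is a sketch only: the reference implementation lists a separate `scrypt` package as a dependency, and the function name `derive_key` and the UTF-8 encoding of the passphrase are assumptions.

```python
import hashlib

# Parameters as listed in the text.
SCRYPT_N = 2 ** 14   # CPU/memory cost parameter
SCRYPT_R = 8         # block size parameter
SCRYPT_P = 1         # parallelization parameter
DERIVED_KEY_LENGTH = 32  # bytes (256 bits)


def derive_key(passphrase: str, kdf_salt: bytes) -> bytes:
    """Derive a 256-bit key from the user's passphrase with scrypt.

    Assumption: the passphrase is encoded as UTF-8 before derivation.
    """
    return hashlib.scrypt(
        passphrase.encode("utf-8"),
        salt=kdf_salt,
        n=SCRYPT_N,
        r=SCRYPT_R,
        p=SCRYPT_P,
        dklen=DERIVED_KEY_LENGTH,
    )
```

With these parameters scrypt needs roughly 16 MiB of memory per derivation, which is what makes large-scale password guessing expensive.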
+ + +**Recovery code (yet to be implemented)** + +We intend to offer data recovery by specifying username and a recovery code. The choice of type of recovery (using password or a recovery code) must be made in advance of attempting recovery (e.g. at some point after the user has Soledad successfully running on a device). + +About the optional recovery code: + +* The recovery code should be randomly generated, at least 16 characters in length, and contain only lowercase letters (to make it sane to type into mobile devices). +* The recovery code is not stored by Soledad. When the user needs to bootstrap a new device, a new code is generated. To be used for actual recovery, a user will need to record their recovery code by printing it out or writing it down. +* The recovery code is independent of the password. In other words, if a recovery code is generated and the user then changes their password, the recovery code is still sufficient to restore the user's account even if the user has lost the password. This feature is dependent on the server supporting a password reset token. Also, generating a new recovery code does not affect the password. +* When a new recovery code is created, a new recovery document must be pushed to the recovery database. A code should not be shown to the user before this happens. +* The recovery code expires when the recovery database record expires (see below). + +The purpose of the recovery code is to prevent a compromised or nefarious Soledad service provider from decrypting a user's storage. The benefit of a recovery code over the user password is that the password has a greater opportunity to be compromised by the server. Even if authentication is performed via Secure Remote Password, the server may still perform a brute force attack to derive the password.
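Since the recovery code is not yet implemented, the following is only a sketch of how a code meeting the stated constraints (random, at least 16 characters, lowercase letters only) might be generated. The function name `generate_recovery_code` is hypothetical.

```python
import secrets
import string

MIN_RECOVERY_CODE_LENGTH = 16  # minimum length stated in the text


def generate_recovery_code(length: int = MIN_RECOVERY_CODE_LENGTH) -> str:
    """Return a random recovery code made only of lowercase ASCII letters."""
    if length < MIN_RECOVERY_CODE_LENGTH:
        raise ValueError("recovery code must be at least 16 characters long")
    # secrets.choice draws from the operating system's CSPRNG.
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))
```

A 16-letter code drawn uniformly from 26 letters carries about 75 bits of entropy (16 × log2 26 ≈ 75.2), so longer codes may be worth considering if the code is the only thing protecting the recovery document.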
+ +Reference implementation of client +=================================== + +https://github.com/leapcode/soledad + +Dependencies: + +* [U1DB](https://launchpad.net/u1db) provides an API and protocol for synchronized databases of JSON documents. +* [SQLCipher](http://sqlcipher.net/) provides a block-encrypted SQLite database used for local storage. +* python-gnupg +* scrypt +* pycryptopp + +Local storage +-------------------------- + +The U1DB reference implementation in Python has an SQLite backend that implements the object store API over a common SQLite database residing in a local file. To allow for encrypted local storage, Soledad adds a SQLCipher backend, built on top of U1DB's SQLite backend, which adds the [SQLCipher API](http://sqlcipher.net/sqlcipher-api/) to U1DB. + +**Responsibilities** + +The SQLCipher backend is responsible for: + +* Providing the SQLCipher API for U1DB (`PRAGMA` statements that control encryption parameters). +* Guaranteeing that the local database used for storage is indeed encrypted. +* Guaranteeing secure synchronization: + * All data being sent to a remote replica is encrypted with a symmetric key before being sent. + * Ensure that data received from remote replica is indeed encrypted to a symmetric key when it arrives, and then that it is decrypted before being included in the local database replica. +* Correctly representing and handling new Document properties (e.g. the `sync` flag). + +Part of the Soledad `storage_key` is used directly as the key for the SQLCipher encryption layer. SQLCipher supports the use of a raw 256-bit key if provided as a 64 character hex string. This will skip the key derivation step (PBKDF2), which is redundant in our case. For example: + + sqlite> PRAGMA key = "x'2DD29CA851E7B56E4697B0E1F08507293D761A05CE4D1B628663F411A8086D99'"; + +**Classes** + +SQLCipher backend classes: + +* `SQLCipherDatabase`: An extension of `SQLitePartialExpandDatabase` used by Soledad Client to store data locally using SQLCipher.
It implements the following: + * Requires a password to instantiate the db. + * Verifies that the db instance is indeed encrypted. + * Uses a LeapSyncTarget for encrypting content before synchronizing over HTTP. + * "Syncable" option for documents (users can mark documents as not syncable, so they do not propagate to the server). + +Encrypted synchronization target +-------------------------------------------------- + +To allow for database synchronization among devices, Soledad uses the following conventions: + +* Centralized synchronization scheme: Soledad clients always sync with a server, and never between themselves. +* The server stores its database in a CouchDB database using a REST API over HTTP. +* All data sent to the server is encrypted with a symmetric secret before being sent. Note that this ensures all data received by the server and stored in the CouchDB database has been encrypted by the client. +* All data received from the server is validated as being an encrypted blob, and then is decrypted before being stored in local database. Note that the local database provides a new encryption layer for the data through SQLCipher. + +**Responsibilities** + +Provide sync between local and remote replicas: + +* Encrypt outgoing content. +* Decrypt incoming content. + +**Classes** + +Synchronization-related classes: + +* `SoledadSyncTarget`: an extension of `HTTPSyncTarget` modified to encrypt documents' content before sending them to the network and to have more control of the syncing process. + +Reference implementation of server +====================================================== + +https://github.com/leapcode/soledad + +Dependencies: + +* [CouchDB](https://couchdb.apache.org/) for server storage, via [python client library](https://pypi.python.org/pypi/CouchDB/0.8). +* [Twisted](http://twistedmatrix.com/trac/) to run the WSGI application.
+* scrypt +* pycryptopp +* PyOpenSSL + +CouchDB backend +------------------------------- + +On the server side, Soledad stores its database replicas in CouchDB servers. Soledad's CouchDB backend implementation is built on top of U1DB's `CommonBackend`, and stores and fetches data using a remote CouchDB server. It lacks indexing, first because we don't need that functionality on the server side, and also because, if not done very carefully, indexing could leak sensitive information about a document's contents. + +CouchDB backend is responsible for: + +* Initializing and maintaining the following U1DB replica data in the database: + * Transaction log. + * Conflict log. + * Synchronization log. +* Mapping the U1DB API to CouchDB API. + +**Classes** + +* `CouchDatabase`: A backend used by Soledad Server to store data in CouchDB. +* `CouchSyncTarget`: Just a target for syncing with Couch database. +* `CouchServerState`: Interface of the WSGI server with the CouchDB backend. + +WSGI Server +----------------------------------------- + +The U1DB server reference implementation provides for an HTTP API backed by SQLite databases. Soledad extends this with token-based auth HTTP access to CouchDB databases. + +* Soledad makes use of `twistd` from Twisted API to serve its WSGI application. +* Authentication is done by means of a token. +* Soledad implements a WSGI middleware on the server side that: + * Uses the provided token to verify read and write access to each user's private databases and write access to the shared recovery database. + * Allows reading from the shared remote recovery database. + * Uses CouchDB as its backend. + +**Classes** + +* `SoledadAuthMiddleware`: implements the WSGI middleware with token based auth as described before. +* `SoledadApp`: The WSGI application. For now, not different from `u1db.remote.http_app.HTTPApp`. + +**Authentication** + +Soledad Server authentication middleware controls access to user's private databases and to the shared recovery database.
The Soledad client provides a token to the Soledad server, which checks the validity of this token for the user's session by querying a certain database. + +A valid token for this user's session is required for: + +* Read and write access to this user's database. +* Read and write access to the shared recovery database. + +Tests +=================== + +To be sure the newly implemented backends work correctly, we included in Soledad the U1DB tests that are relevant for the new pieces of code (backends, document, http(s) and sync tests). We also added specific tests for the new functionality we are building. |