diff options
Diffstat (limited to 'specs/thandy-spec.txt')
-rw-r--r-- | specs/thandy-spec.txt | 713 |
1 files changed, 713 insertions, 0 deletions
diff --git a/specs/thandy-spec.txt b/specs/thandy-spec.txt new file mode 100644 index 0000000..3c21466 --- /dev/null +++ b/specs/thandy-spec.txt @@ -0,0 +1,713 @@ + + Thandy: Automatic updates for Tor bundles + +0. Preliminaries + +0.0. Scope + + This document describes a system for distributing Tor binary bundle + updates. + +0.1. Proposed code name + + Since "auto-update" is so generic, I had been thinking about going with + "glider", based on the sugar glider you get when you search for "handy + pocket creature". Based on conversations, it seems that "glider" + is taken by a well-known WoW bot, so I'm rechristening this thing + as "Thandy" (which could stand for Tor's Handy pocket creature if + you want it to, or which could also be a person's first name). + + Some of this document still refers to "Glider", and needs to be updated. + +0.2. Non-goals + + This is not meant to replace any existing download mechanism for + users who prefer that mechanism. For example, just downloading + source will still work fine. + + Similarly, we're not trying to force users who do not want to use + downloaded binaries to use them, or to force users who do not want + automatic updates to get them. {This should be obvious, but enough + people have asked that I'm putting it in the document.} + + This is not a general-purpose package manager like yum or apt: it + assumes that users will want to have one or more of a set of + "bundles", not an arbitrary selection of packages dependant on one + another. (Rationale: these systems do what they do pretty well.) + + This is also not a general-purpose package format. It assumes the + existence of an external package format that can handle install, + update, remove, and version query. + +0.3. Goals + + Once Tor was a single executable that you could just run. Then it + required Privoxy. Now, thanks to the Tor Browser Bundle and related + projects, a full installation can contain Tor, Privoxy, Torbutton, + Firefox, and more. + + We need to keep this software updated. When we make security fixes, + quick uptake helps narrow the window in which attackers can exploit + them. + + We need updates to be easy. Each additional step a user must take to + get updated means that more users will stay with older insecure + versions. + + We need updates to be secure. We're supposed to be good at crypto; + let's act like it. There is no good reason in this day and age to + subject users to rollback attacks or unsigned packages or whatever. + + We need administration to be simple. Tor doesn't have a release + engineering team, so we can't add too many hard steps to putting out + a new release. + + The system should be easy to implement; we may need to do multiple + implementations on the client side at least. + +0.3.1. Goals for package formats and PKIs + + It should be possible to mirror a repository using only rsync and + cron. + + Separate keys should be used for different people and different + roles. + + Only a minimal set of keys should have to be kept online to keep + the system running. + + The system should handle any single computer or system or person + being unavailable. + + The formats and protocols should be pretty future-proof. + +1. System overview + + The basic unit of updatability is a "bundle". A bundle is a set of + software components, or "packages", plus some rules about installing + them. Example bundles could be "Tor Browser, stable series" or + "Basic Tor, development series". + + When Glider has responsibility for keeping a bundle up to date, we + say that a user has "subscribed" to that bundle. + + Conceptually, there are four parts to keeping a bundle up to date: + + Polling: + - Periodically, Glider asks a mirror whether there is a newer + version of some bundle that a user has subscribed to. If so, + Glider determines what's in the bundle. + + Fetching: + - If the bundle contains packages that Glider hasn't installed + or hasn't cached, it needs to download them from a mirror. + This can happen over any protocol; v1 should support at least + http and https-over-Tor. V1 should also support resuming + partial downloads, since many users have unreliable + connections. + + Later versions could support Bittorrent, or whatever. + + Validation: + - Throughout the process, Glider must ensure that all the + bundles are signed correctly, all the packages are signed + correctly, and everything is up-to-date. + + We want to specify this so that users can't be tricked about + the contents of a bundle, can't install a malicious package, + and can't be fooled into believing that an old bundle is + actually the latest. + + Installation: + - Now Glider has a set of packages to install. The format of + these packages will be platform-dependent: they could be pkg + files on OSX, MSI files on Win32, RPMs or DEBs on Linux, and + so on. Glider should query the user for permission to start + installing packages, then install the packages. All other + steps should generally happen automatically, in the + background, without needing user intervention. This part + needs user intervention because (A) it isn't nice to install + updates without permission, and (B) in some configurations, + it needs administrator privileges. + + (NO OTHER PART of this design needs administrator privileges.) + +1.1. The repository + + Each Glider instance knows about one or more "repositories". A + repository is a filesystem somewhere that contains the packages in a + set of bundles, and some associated metadata. A repository must + exist at one or more canonical hosts, and may have a number of full + or partial mirrors. + + In v1, each Glider instance will know about only one repository. + +1.2. The PKI + + The trust root for the whole system is, necessarily, whatever users + download when they first download a copy of Glider. We need to make + sure that the first download happens from a site we trust, using + HTTPS. + + Glider ships with root keys, which in turn are used to verify the + keys for all the other roles. There are a few root keys, operated by + trusted admins for the system. If root keys ever need to be changed, + we can just ship an update of Glider: it's supposed to be + self-updating anyway. + + The root keys are only used to sign a 'key list' of all the other + keys and their roles. A key list is valid if it has been signed by a + threshold of root keys. + + Each package is signed with the key of its authorized builder. For + example, one volunteer may be authorized to build the mac versions of + several packages, and another may be authorized to build the windows + version of just one. + + Each bundle is signed with the key of its maintainer. It's assumed + that the bundle maintainer might be the package maintainer for some + but not all of the packages. + + The list of mirrors is also signed. If the mirror list is + automatically updated, this key must be kept online; otherwise, it + can be offline. + + To prevent an adversary from replaying an out-of-date signed + document, an automated process periodically signs a timestamped + statement containing the hashes of the mirror list, the latest + bundles, and the key list, using yet another special-purpose key. + This key must be kept online. + +1.3. Threat Model And Analysis + + We assume an adversary who can operate compromised mirrors, and who + can possibly compromise the main repository. At worst, such an + adversary can DOS users in a way that they can detect. + + We're assuming for the moment an OSX/Win32-like execution model, + where all packages will run equal privilege, but occasionally + installation will require higher privilege. This means that once a + hostile package is installed, it can basically do whatever it + wants. As rootkit writers demonstrate, compromise is really + tenuous: any attacker who can induce a user to install a hostile + piece of code has, in effect, permanently compromised that user + until they reinstall. + + Thus, if an adversary compromises enough keys to sign a compromised + package, or tricks a packager into signing a compromised package, + and manages to get that package into a signed bundle, the best we + can do is to limit the number of users who are affected. We do + this by compartmentalizing signing keys so that only the package + and bundle in question are at risk. + + (If we had replicated build processes and a bit-by-bit reliable + build process, we could have multiple packagers test that a binary + was built properly, and multiply sign it. This would be effective + against an adversary compromising a single packaging key, but not + against one compromising a source repository.) + +2. The repository layout + + The filesystem layout in the repository is used for two purposes: + - To give mirrors an easy way to mirror only some of the repository. + - To specify which parts of the repository a given key has the + authority to sign. + + The following files exist in all repositories and mirrors: + + /meta/keys.txt + + Signed by the root keys; indicates keys and roles. + [???? I'm using the txt extension here. Is that smart?] + + /meta/mirrors.txt + + Signed by the mirror key; indicates which parts of the + repository are mirrored at what mirrors. + + /meta/timestamp.txt + + Signed by the timestamp key; indicates hashes and timestamps + for the latest versions of keys.txt and mirrors.txt. Also + indicates the latest version of each bundle for each os/arch. + + This is the only file that needs to be downloaded for polling. + + /bundleinfo/bundlename/os-arch/bundlename-os-arch-bundleversion.txt + + Signed by the appropriate bundle key. Describes what + packages make up a bundle, and what order to install, + uninstall, and upgrade them in. + + /pkginfo/packagename/os-arch/version/packagename-os-arch-packageversion.txt + + Signed by the appropriate package key. Tells the name of the + file that makes up a package, its hash, and what procedure + is used to install it. + + /packages/packagename/os-arch/version/(some filename) + + The actual package file. Its naming convention will depend + on the underlying packaging system. + +3. Document formats + +3.1. Metaformat + + All documents use Rivest's SEXP meta-format as documented at + http://people.csail.mit.edu/rivest/sexp.html + with the restriction that no "display hint" fields are to be used, + and the base64 transit encoding isn't used either. + + (We use SEXP because it's really easy to parse, really portable, + and unlike most other tagged data formats, has a + trivially-specified canonical format suitable for hashing.) + + In descriptions of syntax below, we use regex-style qualifiers, so + that in + (sofa slipcover? occupant* leg+) + the sofa will have an optional slipcover, zero or more occupants, + and one or more legs. This pattern matches (sofa leg) and (sofa + slipcover occupant occupant leg leg leg leg) but not (sofa leg + slipcover). + + We also use a braces notation to indicate elements that can occur + in any order. For example, + (bread {flour+ eggs? yeast}) + matches a list starting with "bread", and then containing one or + more of flours, zero or one occurrences of eggs, and one + occurrence of yeast, in any order. This pattern matches (bread eggs + yeast flour) but not (bread yeast) or (bread flour eggs yeast + macadamias). + +3.2. File formats: general principles + + We use tagged lists (lists whose first element is a string) to + indicate typed objects. Tags are generally lower-case, with + hyphens used for separation. Think Lispy. + + We use attrlists [lists of (key value) lists] to indicate a + multimap from keys to values. Clients MUST accept unrecognized + keys in these attrlists. The syntax for an attrlist with two + recognized and required keys is typically given as ({(key1 val1) + (key2 val2) (ATTR VAL)*}), indicating that the keys can occur in + any order, intermixed with other attributes. + + Timestamp files will be downloaded very frequently; all other files + will be much smaller in size than package files. Thus, + size-optimization for timestamp files makes sense and most other + other space optimizations don't. + + Versions are represented as lists of the form (v I1 I2 I3 I4 ...) + where each item is a number or alphanumeric version component. For + example, the version "0.2.1.5-alpha" is represented as (v 0 2 1 5 + alpha). + + All signed files are of the format: + + (signed + X + (signature ({(keyid K) (method M) (ATTR VAL)*}) SIG)+ + ) + + { "_type" : "Signed", + "signed" : X, + "sigatures" : [ + { "keyid" : K, + "method" : M, + ... + "sig" : S } ] + + where: X is a list whose first element describes the signed object. + K is the identifier of a key signing the document + M is the method to be used to make the signature + (ATTR VAL) is an arbitrary list whose first element is a + string. + SIG is a signature of the canonical encoding of X using the + identified key. + + We define two signing methods at present: + sha256-oaep : A signature of the SHA256 hash of the canonical + encoding of X, using OAEP+ padding. [XXXX say more about mgf] + + All times are given as strings of the format "YYYY-MM-DD HH:MM:SS", + in UTC. + + All keys are of the format: + (pubkey ({(type TYPE) (ATTR VAL)*}) KEYVAL) + + { "_keytype" : TYPE, + ... + "keyval" : KEYVAL } + + where TYPE is a string describing the type of the key and how it's + used to sign documents. The type determines the interpretation of + KEYVAL. + + The ID of a key is a two-element list of the type and the SHA-256 + hash of the canonical encoding of the KEYVAL field. + + We define one keytype at present: 'rsa'. The KEYVAL in this case + is a 2-element list of (e n), with both values given in big-endian + binary format. [This makes keys 45-60% more compact than using + decimal integers.] + + {Values given as integers.} + + {'e' : e, 'n' : n, big-endian hex. } + + All RSA keys must be at least 2048 bits long. + + + Every role in the system is associated with a key. Replacing + anything but a root key is supposed to be relatively easy. + + Root-keys sign other keys, and certify them as belonging to roles. + Clients are configured to know the root keys. + + Bundle keys certify the contents of a bundle. + + Package keys certify packages for a given program or set of + programs. + + Mirror keys certify a list of mirrors. We expect this to be an + automated process. + + Timestamp keys certify that given versions of other metadata + documents are up-to-date. They are the only keys that absolutely + need to be kept online. (If they are not, timestamps won't be + generated.) + +3.3. File formats: key list + + The key list file is signed by multiple root keys. It indicates + which keys are authorized to sign which parts of the repository. + + (keylist + (ts TIME) + (keys + ((key ({(roles (ROLE PATH)+) (ATTR VAL)*}) KEY)*) + ... + ) + + { "_type" : "Keylist", + "ts" : TIME, + "keys" : [ + { "roles" : [ [ ROLE, PATH ], ... ], + ... + "key" : KEY }, ... ] } + + The "ts" line describes when the keys file was updated. Clients + MUST NOT replace a file with an older one, and SHOULD NOT accept a + file too far in the future. + + A ROLE is one of "timestamp" "mirrors" "bundle" or "package". + + PATH is a path relative to the top of the directory hierarchy. It + may contain "*" elements to indicate "any file", and may end with a + "/**" element to indicate all files under a given point. + +3.4. File formats: mirror list + + The mirror list is signed by a mirror key. It indicates which + mirrors are active and believed to be mirroring which parts of the + repository. + + (mirrorlist + (ts TIME) + (mirrors + ( (mirror ({(name N) (urlbase U) (contents PATH+) (weight W) + (official)? (ATTR VAL)})) * ) + ... + ) + + { "_type" : "Mirrorlist", + "mirrors" : [ + { "name" : N, + "urlbase" : U, + "contents" : [PATH ... ] , + "weight" : W, + "official" : BOOL, + ... + }, ... ] + } + + Every mirror is a copy of some or all of the directory hierarchy + containing at least the /meta, /bundles/, and /pkginfo directories. + + N is a descriptive name for the mirror; U is the URL of the mirror's + base (i.e., the parent of the "meta" directory); and the PATH + elements are the components describing how much of the packages + directory is mirrored. Their format is as in the keylist file. + + W is an integer used to weight mirrors when picking at random; + mirrors with more bandwidth should have higher weigths. The + "official" element should only be present if the mirror is (one of + the) official repositories operated by the Tor Project. + +3.5. File formats: timestamp files + + The timestamp file is signed by a timestamp key. It indicates the + latest versions of other files, and contains a regularly updated + timestamp to prevent rollback attacks. + + (ts + ({(at TIME) + (m TIME MIRRORLISTHASH) + (k TIME KEYLISTHASH) + (b NAME VERSION PATH TIME HASH)*}) + ) + + { "_type" : Timestamp, + "at" : TIME, + "m" : [ TIME, HASH ], + "k" : [ TIME, HASH ], + "b" : { NAME : + [ [ Version, Path, Time, Hash ] ] } + } + + TIME is when the timestamp was signed. MIRRORLISTHASH is the digest + of the mirror-list file; KEYLISTHASH is the digest of the key list + file; and the 'b' entries are a list of the latest version of all + bundles and their locations and hashes. + +3.6. File formats: bundle files + + (bundle + (at TIME) + (os OS) + [(arch ARCH)] + (version V) + (packages + (NAME VERSION PATH HASH ({(order INST UPDATE REMOVE) + (optional)? + (gloss LANG TEXT)* + (longloss LANG TEXT)* + (ATTR VAL)*})? )* ) + ) + + { "_type" : "Bundle", + "name" : NAME, + "at" : TIME, + "os" : OS, + [ "arch" : ARCH, ] + "version" : V + "packages" : + [ { "name" : NAME, + "version" : VERSION, + "path" : PATH, + "hash" : HASH, + "order" : [ INST, UPDATE, REMOVE ], + [ "optional : BOOL, ] + "gloss" : { LANG : TEXT }, + "longgloss" : { LANG : TEXT }, + } ] } + + Most elements are self-explanatory; the INST, UPDATE, and REMOVE + elements of the order element are numbers defining the order in + which the packages are installed, updated, and removed respectively. + The "optional" element is present if the package is optional. + "Gloss" is a short utf-8 human-readable string explaining what the + package provides for the bundle; "longloss" is a longer such + utf-8 string. + + (Note that the gloss strings are meant, not to describe the package, + but to describe what the package provides for the bundle. For + example, "The Anonymous Email Bundle needs the Python Runtime to run + Mixminion.") + + Multiple gloss strings are allowed; each should have a different + language. The UI should display the must appropriate language to the + user. + +3.7. File formats: package files + + (package + ({(name NAME) + (version VERSION) + (format FMT ((ATTR VAL)*)? ) + (path PATH) + (ts TIME) + (digest HASH) + (shortdesc LANG TEXT)* + (longdesc LANG TEXT)* + (ATTR VAL)* }) + ) + + Most elements are self-explanatory. The "FMT" element describes the + file format of the package, which should give enough information + about how to install it. + + No two package files in the same repository should have the same + name and version. If a package needs to be changed, the version + MUST be incremented. + + Descriptions are tagged with languages in the same way as glosses. + +4. Detailed Workflows + +4.1. The client application + + Periodically, the client updater fetches a timestamp file from a + mirror. If the timestamp in the file is up-to-date, the client + first checks to see whether the keys file listed is one that the + client has. If not, the client fetches it, makes sure the hash of + the keys file matches the hash in the timestamp file, makes sure its + date is more recent than any keys file they have but not too far in + the future, and that it is signed by enough root keys that the + client recognizes. + + [If the timestamp file is not up-to-date, the client tries a + few mirrors until it finds one with a good timestamp.] + + [If the keys file from a mirror does not match the timestamp + file, the client tries a new mirror for both.] + + [If the keys file is not signed by enough root keys, the client + warns the user and tries another mirror for both the timestamp + file and the keys file.] + + Once the client has an up-to-date keys file, the client checks the + signature on the timestamp file. Assuming it checks out, the client + refreshes the mirror list as needed, and refreshes any bundle files + to which the user is subscribed if the client does not have + the latest version of those files. The client checks signatures on + these files, and fetches package metadata for any packages listed in + the bundle file that the client does not have, checks signatures on + these, and fetches binaries for packages that might need to be + installed or updated. As the packages arrive, clients check their + hashes. + + Once the client has gotten enough packages, it informs the user that + new packages have arrived, and asks them if they want to update. + + Clients SHOULD cache at least the latest versions they have received + of all files. + +4.1.1. Download preferences + + Users should be able to specify that packages must be only + downloaded over Tor, or must only be downloaded over encrypted + protocols, or both. Users should also be able to express preference + for Tor vs non-Tor and encrypted vs non-encrypted, even if they + allow both. + +4.2. Mirrors + + Periodically, mirrors do an rsync or equivalent to fetch the latest + version of whatever parts of the repository have changed since the + version they currently hold. Mirrors SHOULD replace older versions + of the repository idempotently, so that clients are less likely to + see inconsistent state. Mirrors SHOULD validate the information + they receive, and not serve partial or inconsistent files. + +4.3. Workflow: Packagers + + When a new binary package is done, the person making the package + runs a tool to generate and sign a package file, and sends both the + package and the package file to a repository admin. Typically, the + base package file will be generated by inserting a version into a + template. + + Packages MAY have as part of their build process a script to + generate the appropriately versioned package file. This script + should at a minimum demand a build version, or use a timestamp in + place of a build version, to prevent two packages with the same + version from being created. + +4.4. Workflow: bundlers + + When the packages in a bundle are done, the bundler runs a tool on + the package files to generate and sign a bundle file. Typically, + this tool uses a template bundle file. + +4.5. Workflow: repository administrators + + Repository administrators use a tool to validate signed files into the + repository. The repository should not be altered manually. + + This tool acts as follows: + - Package files may be added, but never replaced. + - Bundle files may be added, but never replaced. + - No file may be added unless it is syntactically valid and + signed by a key in the keys file authorized to sign files of + this type in this file's location location. + + - A package file may not be added unless all of its binary + packages match their hashes. + + - A bundle file may not be added unless all of its package files + are present and match their hashes. + + - When adding a new keylist, bundle, or mirrors list, the + timestamp file must be regenerated immediately. + +5. Parameter setting and corner cases. + +5.1. Timing + + The timestamp file SHOULD be regenerated every 15 minutes. Mirrors + SHOULD attempt to update every hour. Clients SHOULD accept a + timestamp file up to 6 hours old. + +5.2. Format versioning and forward-compatibility: + + All of the above formats include the ability to add more + attribute-value fields for backwards-compatible format changes. If + we need to make a backwards incompatible format change, we create a + new filename for the new format. + +5.3. Key management and migration: + + Root keys should be kept offline. All keys except timestamp and + mirror keys should be stored encrypted. + + All the formats above allow for multiple keys to sign a single + document. To replace a compromised root key, it suffices to sign + keylist documents with both the compromised key and its replacement + until all clients have updated to a new version of the autoupdater. + + To replace another key, it suffices to authorize the new key in the + keylist. Note that a new package or bundle key must re-sign and + issue new versions of all packages or bundles it has generated. + + + +F. Future directions and open questions + +F.1. Package decomposition + + It would be neat to decouple existing packages. Right now, we'd + never want a windows user to have to fetch an openssl dll and Tor + separately. But if they're using an auto-update tool, it'd be + pretty keen to have them not need to fetch a new openssl every time + Tor has a bugfix. + +F.2. Caching at Tor servers. + + See Tor Proposal number 127. + +F.3. Support for more download methods + + Ozymandns, chunked downloads, and bittorrent would all be neat + ideas. + +F.4. Support for bogus clocks. + + Glider should have a user configurable "no, my clock is _supposed_ + to be wrong" mode, since lots of users seem to _like_ having their + clocks in 1970 forever. + +R. Ideas I'm rejecting for the moment + +R.1. Considering recommended versions from Tor consensus directory documents + + This requires a working Tor to update Tor; that's not necessarily a + great idea. + +R.2. Integration with existing GPG signatures + + The OpenPGP signature and key format is so complicated that you'd + have to be mad to touch it. + + |