Thandy: Automatic updates for Tor bundles 0. Preliminaries 0.0. Scope This document describes a system for distributing Tor binary bundle updates. 0.1. Proposed code name Since "auto-update" is so generic, I had been thinking about going with "glider", based on the sugar glider you get when you search for "handy pocket creature". Based on conversations, it seems that "glider" is taken by a well-known WoW bot, so I'm rechristening this thing as "Thandy" (which could stand for Tor's Handy pocket creature if you want it to, or which could also be a person's first name). 0.2. Non-goals This is not meant to replace any existing download mechanism for users who prefer that mechanism. For example, just downloading source will still work fine. Similarly, we're not trying to force users who do not want to use downloaded binaries to use them, or to force users who do not want automatic updates to get them. {This should be obvious, but enough people have asked that I'm putting it in the document.} This is not a general-purpose package manager like yum or apt: it assumes that users will want to have one or more of a set of "bundles", not an arbitrary selection of packages dependant on one another. (Rationale: these systems do what they do pretty well.) This is also not a general-purpose package format. It assumes the existence of an external package format that can handle install, update, remove, and version query. Though see section XX. 0.3. Goals Once, Tor was a single executable that you could just run. Then it required Privoxy. Now, thanks to the Tor Browser Bundle and related projects, a full installation can contain Tor, Privoxy, Torbutton, Firefox, and more. We need to keep this software updated. When we make security fixes, quick uptake helps narrow the window in which attackers can exploit them. We need updates to be easy. Each additional step a user must take to get updated means that more users will stay with older insecure versions. We need updates to be secure. We're supposed to be good at crypto; let's act like it. There is no good reason in this day and age to subject users to rollback attacks or unsigned packages or whatever. We need administration to be simple. Tor doesn't have a release engineering team, so we can't add too many hard steps to putting out a new release. The system should be easy to implement; we may need to do multiple implementations on the client side at least. 0.3.1. Goals for package formats and PKIs It should be possible to mirror a repository using only rsync and cron. Separate keys should be used for different people and different roles. Only a minimal set of keys should have to be kept online to keep the system running. The system should handle any single computer or system or person being unavailable. The formats and protocols should be pretty future-proof. 1. System overview The basic unit of updatability is a "bundle". A bundle is a set of software components, or "packages", plus some rules about installing them. Example bundles could be "Tor Browser, stable series" or "Basic Tor, development series". When Thandy has responsibility for keeping a bundle up to date, we say that a user has "subscribed" to that bundle. Conceptually, there are four parts to keeping a bundle up to date: Polling: - Periodically, Thandy asks a mirror whether there is a newer version of some bundle that a user has subscribed to. If so, Thandy determines what's in the bundle. Fetching: - If the bundle contains packages that Thandy hasn't installed or hasn't cached, it needs to download them from a mirror. This can happen over any protocol; v1 should support at least http and https-over-Tor. V1 should also support resuming partial downloads, since many users have unreliable connections. Later versions could support Bittorrent, or whatever. Validation: - Throughout the process, Thandy must ensure that all the bundles are signed correctly, all the packages are signed correctly, and everything is up-to-date. We want to specify this so that users can't be tricked about the contents of a bundle, can't install a malicious package, and can't be fooled into believing that an old bundle is actually the latest. Installation: - Now Thandy has a set of packages to install. The format of these packages will be platform-dependent: they could be pkg files on OSX, MSI files on Win32, RPMs or DEBs on Linux, and so on. Thandy should query the user for permission to start installing packages, then install the packages. All other steps should generally happen automatically, in the background, without needing user intervention. This part needs user intervention because (A) it isn't nice to install updates without permission, and (B) in some configurations, it needs administrator privileges. (NO OTHER PART of this design needs administrator privileges.) 1.1. The repository Each Thandy instance knows about one or more "repositories". A repository is a filesystem somewhere that contains the packages in a set of bundles, and some associated metadata. A repository must exist at one or more canonical hosts, and may have a number of full or partial mirrors. In v1, each Thandy instance will know about only one repository. 1.2. The PKI The trust root for the whole system is, necessarily, whatever users download when they first download a copy of Thandy. We need to make sure that the first download happens from a site we trust, using HTTPS. Thandy ships with root keys, which in turn are used to verify the keys for all the other roles. There are a few root keys, operated by trusted admins for the system. If root keys ever need to be changed, we can just ship an update of Thandy: it's supposed to be self-updating anyway. The root keys are only used to sign a 'key list' of all the other keys and their roles. A key list is valid if it has been signed by a threshold of root keys. Each package is signed with the key of its authorized builder. For example, one volunteer may be authorized to build the mac versions of several packages, and another may be authorized to build the windows version of just one. Each bundle is signed with the key of its maintainer. It's assumed that the bundle maintainer might be the package maintainer for some but not all of the packages. The list of mirrors is also signed. If the mirror list is automatically updated, this key must be kept online; otherwise, it can be offline. To prevent an adversary from replaying an out-of-date signed document, an automated process periodically signs a timestamped statement containing the hashes of the mirror list, the latest bundles, and the key list, using yet another special-purpose key. This key must be kept online. 1.3. Threat Model And Analysis We assume an adversary who can operate compromised mirrors, and who can possibly compromise the main repository. At worst, such an adversary can DOS users in a way that they can detect. We're assuming for the moment an OSX/Win32-like execution model, where all packages will run equal privilege, but occasionally installation will require higher privilege. This means that once a hostile package is installed, it can basically do whatever it wants. As rootkit writers demonstrate, compromise is really tenacious: any attacker who can induce a user to install a hostile piece of code has, in effect, permanently compromised that user until they reinstall. Thus, if an adversary compromises enough keys to sign a compromised package, or tricks a packager into signing a compromised package, and manages to get that package into a signed bundle, the best we can do is to limit the number of users who are affected. We do this by compartmentalizing signing keys so that only the package and bundle in question are at risk. (If we had replicated build processes and a bit-by-bit reliable build process, we could have multiple packagers test that a binary was built properly, and multiply sign it. This would be effective against an adversary compromising a single packaging key, but not against one compromising a source repository.) 2. The repository layout The filesystem layout in the repository is used for two purposes: - To give mirrors an easy way to mirror only some of the repository. - To specify which parts of the repository a given key has the authority to sign. The following files exist in all repositories and mirrors: /meta/keys.txt Signed by the root keys; indicates keys and roles. /meta/mirrors.txt Signed by the mirror key; indicates which parts of the repository are mirrored at what mirrors. /meta/timestamp.txt Signed by the timestamp key; indicates hashes and timestamps for the latest versions of keys.txt and mirrors.txt. Also indicates the latest version of each bundle for each os/arch. This is the only file that needs to be downloaded for polling. /bundleinfo/bundlename/os-arch/bundlename-os-arch-bundleversion.txt Signed by the appropriate bundle key. Describes what packages make up a bundle, and what order to install, uninstall, and upgrade them in. Each leaf directory under bundleinfo should have only a single bundle's files. Users subscribe to such a directory, receiving (for example) "the most recent bundle in /bundleinfo/tor-browser-stable/win32/" /pkginfo/packagename/os-arch/version/packagename-os-arch-packageversion.txt Signed by the appropriate package key. Tells the name of the file that makes up a package, its hash, and what procedure is used to install it. /pkginfo/packagename/os-arch/version/(some filename).torrent The .torrent metadata file used to download that file. The file name is exactly the same as specified in the package file. /packages/packagename/os-arch/version/(some filename) The actual package file. Its naming convention will depend on the underlying packaging system. 3. Document formats 3.1. Metaformat All documents use a subset of the JSON object format, with floating-point numbers omitted. When calculating the digest of an object, we use the "canonical JSON" subdialect as described at http://wiki.laptop.org/go/Canonical_JSON 3.2. File formats: general principles Timestamp files will be downloaded very frequently; all other files will be much smaller in size than package files. Thus, size-optimization for timestamp files makes sense and most other other space optimizations don't. Versions are represented as lists of the form [I1, I2, I3, I4 ...] where each item is a number or alphanumeric version component. For example, the version "0.2.1.5-alpha" is represented as [0, 2, 1, 5, "alpha"). All signed files are of the format: { "signed" : X, "sigatures" : [ { "keyid" : K, "method" : M, ... "sig" : S } ..., ] } where: X is a list whose first element describes the signed object. K is the identifier of a key signing the document M is the method to be used to make the signature S is a signature of the canonical encoding of X using the identified key. We define one signing method at present: sha256-pkcs1 : A signature of the SHA256 hash of the canonical encoding of X, using PKCS-1 padding. All times are given as strings of the format "YYYY-MM-DD HH:MM:SS", in UTC. All keys are of the format: { "_keytype" : TYPE, ... } where TYPE is a string describing the type of the key and how it's used to sign documents. The type determines the interpretation of KEYVAL. The ID of a key is the SHA-256 hash of the canonical encoding of the key. We define one keytype at present: 'rsa'. Its format is: { "_keytype" : "rsa", "e" : E, "n" : N }, where E and N are the binary representations of the exponent and modulus, encoded as big-endian numbers in base 64. All keys must be at least 2048 bits long. Every role in the system is associated with a key. Replacing anything but a root key is supposed to be relatively easy. Master keys sign other keys, and certify them as belonging to roles. Clients are configured to know the master keys. Bundle keys certify the contents of a bundle. Package keys certify packages for a given program or set of programs. Mirror keys certify a list of mirrors. We expect this to be an automated process. Timestamp keys certify that given versions of other metadata documents are up-to-date. They are the only keys that absolutely need to be kept online. (If they are not, timestamps won't be generated.) 3.3. File formats: key list The key list file is signed by multiple root keys. It indicates which keys are authorized to sign which parts of the repository. { "_type" : "Keylist", "ts" : TIME, "keys" : [ { "roles" : [ [ ROLE, PATH ], ... ], ... "key" : KEY }, ... ] } The "ts" line describes when the keys file was updated. Clients MUST NOT replace a file with an older one, and SHOULD NOT accept a file too far in the future. A ROLE is one of "timestamp" "mirrors" "bundle" or "package". PATH is a path relative to the top of the directory hierarchy. It may contain "*" elements to indicate "any file", and may end with a "/**" element to indicate all files under a given point. 3.4. File formats: mirror list The mirror list is signed by a mirror key. It indicates which mirrors are active and believed to be mirroring which parts of the repository. { "_type" : "Mirrorlist", "ts" : TIME, "mirrors" : [ { "name" : N, "urlbase" : U, "contents" : [PATH ... ] , "weight" : W, ("official" : BOOL,) ... }, ... ] } Every mirror is a copy of some or all of the directory hierarchy containing at least the /meta, /bundles/, and /pkginfo directories. N is a descriptive name for the mirror; U is the URL of the mirror's base (i.e., the parent of the "meta" directory); and the PATH elements are the components describing how much of the packages directory is mirrored. Their format is as in the keylist file. W is an integer used to weight mirrors when picking at random; mirrors with more bandwidth should have higher weights. The "official" element should only be present if the mirror is (one of the) official repositories operated by the Tor Project. 3.5. File formats: timestamp files The timestamp file is signed by a timestamp key. It indicates the latest versions of other files, and contains a regularly updated timestamp to prevent rollback attacks. { "_type" : Timestamp, "at" : TIME, "m" : [ TIME, HASH, LENGTH ], "k" : [ TIME, HASH, LENGTH ], "b" : { NAME : [ [ Version, Path, Time, Hash, Length ] ] } } TIME is when the timestamp was signed. MIRRORLISTHASH is the digest of the mirror-list file; KEYLISTHASH is the digest of the key list file; and the 'b' entries are a list of the latest version of all bundles and their locations and hashes. The "name" of a bundle (in this context) is the directory component of the bundle's path. 3.6. File formats: bundle files { "_type" : "Bundle", "name" : NAME, "at" : TIME, "os" : OS, "location" : LOCATION, ("arch" : ARCH,) "version" : V, "packages" : [ { "name" : NAME, "version" : VERSION, "path" : PATH, "hash" : HASH, "length" : LENGTH, "order" : [ INST, UPDATE, REMOVE ], ("optional : BOOL, ) "gloss" : { LANG : TEXT }, "longgloss" : { LANG : TEXT }, } ] } Most elements are self-explanatory; the INST, UPDATE, and REMOVE elements of the order element are numbers defining the order in which the packages are installed, updated, and removed respectively. The "optional" element is present if the package is optional. "Gloss" is a short utf-8 human-readable string explaining what the package provides for the bundle; "longloss" is a longer such utf-8 string. (Note that the gloss strings are meant, not to describe the package, but to describe what the package provides for the bundle. For example, "The Anonymous Email Bundle needs the Python Runtime to run Mixminion.") Multiple gloss strings are allowed; each should have a different language. The UI should display the must appropriate language to the user. 3.7. File formats: package files { "_type" : "Package", "name" : NAME, "location" : LOCATION, "version" : VERSION, "format" : FMT, "ts" : TIME, "files" : [ [ PATH, HASH, INFO, LENGTH ], ... ], "shortdesc" : { LANG : DESC, ... }, "longdesc" : { LANG : DESC, ... }, } Most elements are self-explanatory. To interpret the 'INFO' entry for each installable file, see section 6. No two package files in the same repository should have the same name and version. If a package needs to be changed, the version MUST be incremented. Descriptions are tagged with languages in the same way as glosses. 4. Detailed Workflows 4.1. The client application Periodically, the client updater fetches a timestamp file from a mirror. If the timestamp in the file is up-to-date, the client first checks to see whether the keys file listed is one that the client has. If not, the client fetches it, makes sure the hash of the keys file matches the hash in the timestamp file, makes sure its date is more recent than any keys file they have but not too far in the future, and that it is signed by enough root keys that the client recognizes. [If the timestamp file is not up-to-date, the client tries a few mirrors until it finds one with a good timestamp.] [If the keys file from a mirror does not match the timestamp file, the client tries a new mirror for both.] [If the keys file is not signed by enough root keys, the client warns the user and tries another mirror for both the timestamp file and the keys file.] Once the client has an up-to-date keys file, the client checks the signature on the timestamp file. Assuming it checks out, the client refreshes the mirror list as needed, and refreshes any bundle files to which the user is subscribed if the client does not have the latest version of those files. The client checks signatures on these files, and fetches package metadata for any packages listed in the bundle file that the client does not have, checks signatures on these, and fetches binaries for packages that might need to be installed or updated. As the packages arrive, clients check their hashes. Once the client has gotten enough packages, it informs the user that new packages have arrived, and asks them if they want to update. Clients SHOULD cache at least the latest versions they have received of all files. When dowloading a file, if the client knows what that file's length should be, it SHOULD NOT accept a longer file, and SHOULD NOT continue the download past the file length. 4.1.1. Download preferences Users should be able to specify that packages must be only downloaded over Tor, or must only be downloaded over encrypted protocols, or both. Users should also be able to express preference for Tor vs non-Tor and encrypted vs non-encrypted, even if they allow both. 4.2. Mirrors Periodically, mirrors do an rsync or equivalent to fetch the latest version of whatever parts of the repository have changed since the version they currently hold. Mirrors SHOULD replace older versions of the repository idempotently, so that clients are less likely to see inconsistent state. Mirrors SHOULD validate the information they receive, and not serve partial or inconsistent files. 4.3. Workflow: Packagers When a new binary package is done, the person making the package runs a tool to generate and sign a package file, and sends both the package and the package file to a repository admin. Typically, the base package file will be generated by inserting a version into a template. Packages MAY have as part of their build process a script to generate the appropriately versioned package file. This script should at a minimum demand a build version, or use a timestamp in place of a build version, to prevent two packages with the same version from being created. 4.4. Workflow: bundlers When the packages in a bundle are done, the bundler runs a tool on the package files to generate and sign a bundle file. Typically, this tool uses a template bundle file. 4.5. Workflow: repository administrators Repository administrators use a tool to validate signed files into the repository. The repository should not be altered manually. This tool acts as follows: - Package files may be added, but never replaced. - Bundle files may be added, but never replaced. - No file may be added unless it is syntactically valid and signed by a key in the keys file authorized to sign files of this type in this file's location. - A package file may not be added unless all of its binary packages match their hashes. - A bundle file may not be added unless all of its package files are present and match their hashes. - When adding a new keylist, bundle, or mirrors list, the timestamp file must be regenerated immediately. 5. Parameter setting and corner cases 5.1. Timing The timestamp file SHOULD be regenerated every 15 minutes. Mirrors SHOULD attempt to update every hour. Clients SHOULD accept a timestamp file up to 6 hours old. 5.2. Format versioning and forward-compatibility All of the above formats include the ability to add more attribute-value fields for backwards-compatible format changes. If we need to make a backwards incompatible format change, we create a new filename for the new format. 5.3. Key management and migration Root keys should be kept offline. All keys except timestamp and mirror keys should be stored encrypted. All the formats above allow for multiple keys to sign a single document. To replace a compromised root key, it suffices to sign keylist documents with both the compromised key and its replacement until all clients have updated to a new version of the autoupdater. To replace another key, it suffices to authorize the new key in the keylist. Note that a new package or bundle key must re-sign and issue new versions of all packages or bundles it has generated. 6. Installing files from packages. Any single package file in Thandy can refer to one or more installable files in its 'files' list. These files are called "installable items" below to distinguish them from Thandy's package files as described in 3.7. Note that these installable items may in turn be called "package files" by other package systems. Try not to get confused. 6.1. Thandy's demands Thandy needs at minimum two things to fully support a kind of installable item. - Thandy needs to be able to see whether it has already been installed, _before_ downloading the item. We call this operation "checking" the item. - Thandy needs to be able to install it. We call this operation "installing" the item, for obscure reasons. Methods for checking and installing an item are sometimes linked, but can be orthogonal. For example, an RPM item is checked and installed with the RPM utility. An EXE-based installer, on the other hand, might be checked either by looking for a registry entry, by looking to see if a given installed file has a hash as expected, or by looking in Thandy's internal package database. To see how to check or install an item, look at the third element of that item's [ PATH, HASH, INFO, ... ] tuple. We'll call this the item's "INFO" below. 6.2. Checking installable items Thandy checks installable items in a package to see whether they're already installed before it tries to download them to the cache. This saves bandwidth. To check an item, see whether the check_type field in its INFO is a recognized value. Recognized values are: "registry" "rpm" "db" If the field's value is "registry", the INFO must also have these fields: "registry_ent" : [ KEY, VAL ] The installable item is installed if the win32 registry is present, and it has an entry KEY whose value is VAL. If the field's value is "rpm", the INFO must also have these fields: "rpm_version" : VERSION_STRING The installable item is installed if the version listed as installed in the RPM database is exactly VERSION_STRING. If the field's value is "db", the INFO must also have these fields: "item_name" : KEY, "item_version" : VAL KEY must be unique across the thandy repository. It identifies this kind of installable item uniquely. VAL is the version of this item. When Thandy installs the item, it writes a persistent mapping from KEY to VAL to a local database. The item is installed if such a mapping is found to exist. When Thandy decides to update an installable item that has a missing check_type field, or a check_type field with an unrecognized value, Thandy must download the item whether it is installed or not. 6.3. Installing installable items To install an item, see whether the install_type field in its INFO is a recognized value. Recognized values are: "rpm" "command" If the field's value is "rpm", the installable item is an RPM package. It gets installed and removed as normal for an RPM. If the field's value is "command", the following field must be present: "cmd_install" : COMMAND The following field is optional: "cmd_remove" : COMMAND Each command is a list of strings executed to install or remove the installable item. Strings and substrings of the form ${xyz} are special: they trigger variable expansion. Recognized variables are: ${FILE} : The absolute path to the file to install. When Thandy decides to install an installable item that has a missing install_type field, or a install_type field with an unrecognized value, Thandy alerts the user to the presence of the file, but can't install it itself. F. Future directions and open questions F.1. Package decomposition It would be neat to decouple existing packages. Right now, we'd never want a windows user to have to fetch an openssl dll and Tor separately. But if they're using an auto-update tool, it'd be pretty keen to have them not need to fetch a new openssl every time Tor has a bugfix. F.2. Caching at Tor servers. See Tor Proposal number 127. F.3. Support for more download methods Ozymandns, chunked downloads, and bittorrent would all be neat ideas. F.4. Support for bogus clocks. Thandy should have a user configurable "no, my clock is _supposed_ to be wrong" mode, since lots of users seem to _like_ having their clocks in 1970 forever. R. Ideas I'm rejecting for the moment R.1. Considering recommended versions from Tor consensus directory documents This requires a working Tor to update Tor; that's not necessarily a great idea. R.2. Integration with existing GPG signatures The OpenPGP signature and key format is so complicated that you'd have to be mad to try to read it yourself. (Check out RFC2440 for information about how bad it is in theory; in practice, it's worse.) Therefore, if we wanted to check OpenPGP signatures, we would basically have to bundle GPG.