summaryrefslogtreecommitdiff
path: root/specs/thandy-spec.txt
diff options
context:
space:
mode:
Diffstat (limited to 'specs/thandy-spec.txt')
-rw-r--r--specs/thandy-spec.txt713
1 files changed, 713 insertions, 0 deletions
diff --git a/specs/thandy-spec.txt b/specs/thandy-spec.txt
new file mode 100644
index 0000000..3c21466
--- /dev/null
+++ b/specs/thandy-spec.txt
@@ -0,0 +1,713 @@
+
+ Thandy: Automatic updates for Tor bundles
+
+0. Preliminaries
+
+0.0. Scope
+
+ This document describes a system for distributing Tor binary bundle
+ updates.
+
+0.1. Proposed code name
+
+ Since "auto-update" is so generic, I had been thinking about going with
+ "glider", based on the sugar glider you get when you search for "handy
+ pocket creature". Based on conversations, it seems that "glider"
+ is taken by a well-known WoW bot, so I'm rechristening this thing
+ as "Thandy" (which could stand for Tor's Handy pocket creature if
+ you want it to, or which could also be a person's first name).
+
+ Some of this document still refers to "Glider", and needs to be updated.
+
+0.2. Non-goals
+
+ This is not meant to replace any existing download mechanism for
+ users who prefer that mechanism. For example, just downloading
+ source will still work fine.
+
+ Similarly, we're not trying to force users who do not want to use
+ downloaded binaries to use them, or to force users who do not want
+ automatic updates to get them. {This should be obvious, but enough
+ people have asked that I'm putting it in the document.}
+
+ This is not a general-purpose package manager like yum or apt: it
+ assumes that users will want to have one or more of a set of
+ "bundles", not an arbitrary selection of packages dependant on one
+ another. (Rationale: these systems do what they do pretty well.)
+
+ This is also not a general-purpose package format. It assumes the
+ existence of an external package format that can handle install,
+ update, remove, and version query.
+
+0.3. Goals
+
+ Once Tor was a single executable that you could just run. Then it
+ required Privoxy. Now, thanks to the Tor Browser Bundle and related
+ projects, a full installation can contain Tor, Privoxy, Torbutton,
+ Firefox, and more.
+
+ We need to keep this software updated. When we make security fixes,
+ quick uptake helps narrow the window in which attackers can exploit
+ them.
+
+ We need updates to be easy. Each additional step a user must take to
+ get updated means that more users will stay with older insecure
+ versions.
+
+ We need updates to be secure. We're supposed to be good at crypto;
+ let's act like it. There is no good reason in this day and age to
+ subject users to rollback attacks or unsigned packages or whatever.
+
+ We need administration to be simple. Tor doesn't have a release
+ engineering team, so we can't add too many hard steps to putting out
+ a new release.
+
+ The system should be easy to implement; we may need to do multiple
+ implementations on the client side at least.
+
+0.3.1. Goals for package formats and PKIs
+
+ It should be possible to mirror a repository using only rsync and
+ cron.
+
+ Separate keys should be used for different people and different
+ roles.
+
+ Only a minimal set of keys should have to be kept online to keep
+ the system running.
+
+ The system should handle any single computer or system or person
+ being unavailable.
+
+ The formats and protocols should be pretty future-proof.
+
+1. System overview
+
+ The basic unit of updatability is a "bundle". A bundle is a set of
+ software components, or "packages", plus some rules about installing
+ them. Example bundles could be "Tor Browser, stable series" or
+ "Basic Tor, development series".
+
+ When Glider has responsibility for keeping a bundle up to date, we
+ say that a user has "subscribed" to that bundle.
+
+ Conceptually, there are four parts to keeping a bundle up to date:
+
+ Polling:
+ - Periodically, Glider asks a mirror whether there is a newer
+ version of some bundle that a user has subscribed to. If so,
+ Glider determines what's in the bundle.
+
+ Fetching:
+ - If the bundle contains packages that Glider hasn't installed
+ or hasn't cached, it needs to download them from a mirror.
+ This can happen over any protocol; v1 should support at least
+ http and https-over-Tor. V1 should also support resuming
+ partial downloads, since many users have unreliable
+ connections.
+
+ Later versions could support Bittorrent, or whatever.
+
+ Validation:
+ - Throughout the process, Glider must ensure that all the
+ bundles are signed correctly, all the packages are signed
+ correctly, and everything is up-to-date.
+
+ We want to specify this so that users can't be tricked about
+ the contents of a bundle, can't install a malicious package,
+ and can't be fooled into believing that an old bundle is
+ actually the latest.
+
+ Installation:
+ - Now Glider has a set of packages to install. The format of
+ these packages will be platform-dependent: they could be pkg
+ files on OSX, MSI files on Win32, RPMs or DEBs on Linux, and
+ so on. Glider should query the user for permission to start
+ installing packages, then install the packages. All other
+ steps should generally happen automatically, in the
+ background, without needing user intervention. This part
+ needs user intervention because (A) it isn't nice to install
+ updates without permission, and (B) in some configurations,
+ it needs administrator privileges.
+
+ (NO OTHER PART of this design needs administrator privileges.)
+
+1.1. The repository
+
+ Each Glider instance knows about one or more "repositories". A
+ repository is a filesystem somewhere that contains the packages in a
+ set of bundles, and some associated metadata. A repository must
+ exist at one or more canonical hosts, and may have a number of full
+ or partial mirrors.
+
+ In v1, each Glider instance will know about only one repository.
+
+1.2. The PKI
+
+ The trust root for the whole system is, necessarily, whatever users
+ download when they first download a copy of Glider. We need to make
+ sure that the first download happens from a site we trust, using
+ HTTPS.
+
+ Glider ships with root keys, which in turn are used to verify the
+ keys for all the other roles. There are a few root keys, operated by
+ trusted admins for the system. If root keys ever need to be changed,
+ we can just ship an update of Glider: it's supposed to be
+ self-updating anyway.
+
+ The root keys are only used to sign a 'key list' of all the other
+ keys and their roles. A key list is valid if it has been signed by a
+ threshold of root keys.
+
+ Each package is signed with the key of its authorized builder. For
+ example, one volunteer may be authorized to build the mac versions of
+ several packages, and another may be authorized to build the windows
+ version of just one.
+
+ Each bundle is signed with the key of its maintainer. It's assumed
+ that the bundle maintainer might be the package maintainer for some
+ but not all of the packages.
+
+ The list of mirrors is also signed. If the mirror list is
+ automatically updated, this key must be kept online; otherwise, it
+ can be offline.
+
+ To prevent an adversary from replaying an out-of-date signed
+ document, an automated process periodically signs a timestamped
+ statement containing the hashes of the mirror list, the latest
+ bundles, and the key list, using yet another special-purpose key.
+ This key must be kept online.
+
+1.3. Threat Model And Analysis
+
+ We assume an adversary who can operate compromised mirrors, and who
+ can possibly compromise the main repository. At worst, such an
+ adversary can DOS users in a way that they can detect.
+
+ We're assuming for the moment an OSX/Win32-like execution model,
+ where all packages will run equal privilege, but occasionally
+ installation will require higher privilege. This means that once a
+ hostile package is installed, it can basically do whatever it
+ wants. As rootkit writers demonstrate, compromise is really
+ tenuous: any attacker who can induce a user to install a hostile
+ piece of code has, in effect, permanently compromised that user
+ until they reinstall.
+
+ Thus, if an adversary compromises enough keys to sign a compromised
+ package, or tricks a packager into signing a compromised package,
+ and manages to get that package into a signed bundle, the best we
+ can do is to limit the number of users who are affected. We do
+ this by compartmentalizing signing keys so that only the package
+ and bundle in question are at risk.
+
+ (If we had replicated build processes and a bit-by-bit reliable
+ build process, we could have multiple packagers test that a binary
+ was built properly, and multiply sign it. This would be effective
+ against an adversary compromising a single packaging key, but not
+ against one compromising a source repository.)
+
+2. The repository layout
+
+ The filesystem layout in the repository is used for two purposes:
+ - To give mirrors an easy way to mirror only some of the repository.
+ - To specify which parts of the repository a given key has the
+ authority to sign.
+
+ The following files exist in all repositories and mirrors:
+
+ /meta/keys.txt
+
+ Signed by the root keys; indicates keys and roles.
+ [???? I'm using the txt extension here. Is that smart?]
+
+ /meta/mirrors.txt
+
+ Signed by the mirror key; indicates which parts of the
+ repository are mirrored at what mirrors.
+
+ /meta/timestamp.txt
+
+ Signed by the timestamp key; indicates hashes and timestamps
+ for the latest versions of keys.txt and mirrors.txt. Also
+ indicates the latest version of each bundle for each os/arch.
+
+ This is the only file that needs to be downloaded for polling.
+
+ /bundleinfo/bundlename/os-arch/bundlename-os-arch-bundleversion.txt
+
+ Signed by the appropriate bundle key. Describes what
+ packages make up a bundle, and what order to install,
+ uninstall, and upgrade them in.
+
+ /pkginfo/packagename/os-arch/version/packagename-os-arch-packageversion.txt
+
+ Signed by the appropriate package key. Tells the name of the
+ file that makes up a package, its hash, and what procedure
+ is used to install it.
+
+ /packages/packagename/os-arch/version/(some filename)
+
+ The actual package file. Its naming convention will depend
+ on the underlying packaging system.
+
+3. Document formats
+
+3.1. Metaformat
+
+ All documents use Rivest's SEXP meta-format as documented at
+ http://people.csail.mit.edu/rivest/sexp.html
+ with the restriction that no "display hint" fields are to be used,
+ and the base64 transit encoding isn't used either.
+
+ (We use SEXP because it's really easy to parse, really portable,
+ and unlike most other tagged data formats, has a
+ trivially-specified canonical format suitable for hashing.)
+
+ In descriptions of syntax below, we use regex-style qualifiers, so
+ that in
+ (sofa slipcover? occupant* leg+)
+ the sofa will have an optional slipcover, zero or more occupants,
+ and one or more legs. This pattern matches (sofa leg) and (sofa
+ slipcover occupant occupant leg leg leg leg) but not (sofa leg
+ slipcover).
+
+ We also use a braces notation to indicate elements that can occur
+ in any order. For example,
+ (bread {flour+ eggs? yeast})
+ matches a list starting with "bread", and then containing one or
+ more of flours, zero or one occurrences of eggs, and one
+ occurrence of yeast, in any order. This pattern matches (bread eggs
+ yeast flour) but not (bread yeast) or (bread flour eggs yeast
+ macadamias).
+
+3.2. File formats: general principles
+
+ We use tagged lists (lists whose first element is a string) to
+ indicate typed objects. Tags are generally lower-case, with
+ hyphens used for separation. Think Lispy.
+
+ We use attrlists [lists of (key value) lists] to indicate a
+ multimap from keys to values. Clients MUST accept unrecognized
+ keys in these attrlists. The syntax for an attrlist with two
+ recognized and required keys is typically given as ({(key1 val1)
+ (key2 val2) (ATTR VAL)*}), indicating that the keys can occur in
+ any order, intermixed with other attributes.
+
+ Timestamp files will be downloaded very frequently; all other files
+ will be much smaller in size than package files. Thus,
+ size-optimization for timestamp files makes sense and most other
+ other space optimizations don't.
+
+ Versions are represented as lists of the form (v I1 I2 I3 I4 ...)
+ where each item is a number or alphanumeric version component. For
+ example, the version "0.2.1.5-alpha" is represented as (v 0 2 1 5
+ alpha).
+
+ All signed files are of the format:
+
+ (signed
+ X
+ (signature ({(keyid K) (method M) (ATTR VAL)*}) SIG)+
+ )
+
+ { "_type" : "Signed",
+ "signed" : X,
+ "sigatures" : [
+ { "keyid" : K,
+ "method" : M,
+ ...
+ "sig" : S } ]
+
+ where: X is a list whose first element describes the signed object.
+ K is the identifier of a key signing the document
+ M is the method to be used to make the signature
+ (ATTR VAL) is an arbitrary list whose first element is a
+ string.
+ SIG is a signature of the canonical encoding of X using the
+ identified key.
+
+ We define two signing methods at present:
+ sha256-oaep : A signature of the SHA256 hash of the canonical
+ encoding of X, using OAEP+ padding. [XXXX say more about mgf]
+
+ All times are given as strings of the format "YYYY-MM-DD HH:MM:SS",
+ in UTC.
+
+ All keys are of the format:
+ (pubkey ({(type TYPE) (ATTR VAL)*}) KEYVAL)
+
+ { "_keytype" : TYPE,
+ ...
+ "keyval" : KEYVAL }
+
+ where TYPE is a string describing the type of the key and how it's
+ used to sign documents. The type determines the interpretation of
+ KEYVAL.
+
+ The ID of a key is a two-element list of the type and the SHA-256
+ hash of the canonical encoding of the KEYVAL field.
+
+ We define one keytype at present: 'rsa'. The KEYVAL in this case
+ is a 2-element list of (e n), with both values given in big-endian
+ binary format. [This makes keys 45-60% more compact than using
+ decimal integers.]
+
+ {Values given as integers.}
+
+ {'e' : e, 'n' : n, big-endian hex. }
+
+ All RSA keys must be at least 2048 bits long.
+
+
+ Every role in the system is associated with a key. Replacing
+ anything but a root key is supposed to be relatively easy.
+
+ Root-keys sign other keys, and certify them as belonging to roles.
+ Clients are configured to know the root keys.
+
+ Bundle keys certify the contents of a bundle.
+
+ Package keys certify packages for a given program or set of
+ programs.
+
+ Mirror keys certify a list of mirrors. We expect this to be an
+ automated process.
+
+ Timestamp keys certify that given versions of other metadata
+ documents are up-to-date. They are the only keys that absolutely
+ need to be kept online. (If they are not, timestamps won't be
+ generated.)
+
+3.3. File formats: key list
+
+ The key list file is signed by multiple root keys. It indicates
+ which keys are authorized to sign which parts of the repository.
+
+ (keylist
+ (ts TIME)
+ (keys
+ ((key ({(roles (ROLE PATH)+) (ATTR VAL)*}) KEY)*)
+ ...
+ )
+
+ { "_type" : "Keylist",
+ "ts" : TIME,
+ "keys" : [
+ { "roles" : [ [ ROLE, PATH ], ... ],
+ ...
+ "key" : KEY }, ... ] }
+
+ The "ts" line describes when the keys file was updated. Clients
+ MUST NOT replace a file with an older one, and SHOULD NOT accept a
+ file too far in the future.
+
+ A ROLE is one of "timestamp" "mirrors" "bundle" or "package".
+
+ PATH is a path relative to the top of the directory hierarchy. It
+ may contain "*" elements to indicate "any file", and may end with a
+ "/**" element to indicate all files under a given point.
+
+3.4. File formats: mirror list
+
+ The mirror list is signed by a mirror key. It indicates which
+ mirrors are active and believed to be mirroring which parts of the
+ repository.
+
+ (mirrorlist
+ (ts TIME)
+ (mirrors
+ ( (mirror ({(name N) (urlbase U) (contents PATH+) (weight W)
+ (official)? (ATTR VAL)})) * )
+ ...
+ )
+
+ { "_type" : "Mirrorlist",
+ "mirrors" : [
+ { "name" : N,
+ "urlbase" : U,
+ "contents" : [PATH ... ] ,
+ "weight" : W,
+ "official" : BOOL,
+ ...
+ }, ... ]
+ }
+
+ Every mirror is a copy of some or all of the directory hierarchy
+ containing at least the /meta, /bundles/, and /pkginfo directories.
+
+ N is a descriptive name for the mirror; U is the URL of the mirror's
+ base (i.e., the parent of the "meta" directory); and the PATH
+ elements are the components describing how much of the packages
+ directory is mirrored. Their format is as in the keylist file.
+
+ W is an integer used to weight mirrors when picking at random;
+ mirrors with more bandwidth should have higher weigths. The
+ "official" element should only be present if the mirror is (one of
+ the) official repositories operated by the Tor Project.
+
+3.5. File formats: timestamp files
+
+ The timestamp file is signed by a timestamp key. It indicates the
+ latest versions of other files, and contains a regularly updated
+ timestamp to prevent rollback attacks.
+
+ (ts
+ ({(at TIME)
+ (m TIME MIRRORLISTHASH)
+ (k TIME KEYLISTHASH)
+ (b NAME VERSION PATH TIME HASH)*})
+ )
+
+ { "_type" : Timestamp,
+ "at" : TIME,
+ "m" : [ TIME, HASH ],
+ "k" : [ TIME, HASH ],
+ "b" : { NAME :
+ [ [ Version, Path, Time, Hash ] ] }
+ }
+
+ TIME is when the timestamp was signed. MIRRORLISTHASH is the digest
+ of the mirror-list file; KEYLISTHASH is the digest of the key list
+ file; and the 'b' entries are a list of the latest version of all
+ bundles and their locations and hashes.
+
+3.6. File formats: bundle files
+
+ (bundle
+ (at TIME)
+ (os OS)
+ [(arch ARCH)]
+ (version V)
+ (packages
+ (NAME VERSION PATH HASH ({(order INST UPDATE REMOVE)
+ (optional)?
+ (gloss LANG TEXT)*
+ (longloss LANG TEXT)*
+ (ATTR VAL)*})? )* )
+ )
+
+ { "_type" : "Bundle",
+ "name" : NAME,
+ "at" : TIME,
+ "os" : OS,
+ [ "arch" : ARCH, ]
+ "version" : V
+ "packages" :
+ [ { "name" : NAME,
+ "version" : VERSION,
+ "path" : PATH,
+ "hash" : HASH,
+ "order" : [ INST, UPDATE, REMOVE ],
+ [ "optional : BOOL, ]
+ "gloss" : { LANG : TEXT },
+ "longgloss" : { LANG : TEXT },
+ } ] }
+
+ Most elements are self-explanatory; the INST, UPDATE, and REMOVE
+ elements of the order element are numbers defining the order in
+ which the packages are installed, updated, and removed respectively.
+ The "optional" element is present if the package is optional.
+ "Gloss" is a short utf-8 human-readable string explaining what the
+ package provides for the bundle; "longloss" is a longer such
+ utf-8 string.
+
+ (Note that the gloss strings are meant, not to describe the package,
+ but to describe what the package provides for the bundle. For
+ example, "The Anonymous Email Bundle needs the Python Runtime to run
+ Mixminion.")
+
+ Multiple gloss strings are allowed; each should have a different
+ language. The UI should display the must appropriate language to the
+ user.
+
+3.7. File formats: package files
+
+ (package
+ ({(name NAME)
+ (version VERSION)
+ (format FMT ((ATTR VAL)*)? )
+ (path PATH)
+ (ts TIME)
+ (digest HASH)
+ (shortdesc LANG TEXT)*
+ (longdesc LANG TEXT)*
+ (ATTR VAL)* })
+ )
+
+ Most elements are self-explanatory. The "FMT" element describes the
+ file format of the package, which should give enough information
+ about how to install it.
+
+ No two package files in the same repository should have the same
+ name and version. If a package needs to be changed, the version
+ MUST be incremented.
+
+ Descriptions are tagged with languages in the same way as glosses.
+
+4. Detailed Workflows
+
+4.1. The client application
+
+ Periodically, the client updater fetches a timestamp file from a
+ mirror. If the timestamp in the file is up-to-date, the client
+ first checks to see whether the keys file listed is one that the
+ client has. If not, the client fetches it, makes sure the hash of
+ the keys file matches the hash in the timestamp file, makes sure its
+ date is more recent than any keys file they have but not too far in
+ the future, and that it is signed by enough root keys that the
+ client recognizes.
+
+ [If the timestamp file is not up-to-date, the client tries a
+ few mirrors until it finds one with a good timestamp.]
+
+ [If the keys file from a mirror does not match the timestamp
+ file, the client tries a new mirror for both.]
+
+ [If the keys file is not signed by enough root keys, the client
+ warns the user and tries another mirror for both the timestamp
+ file and the keys file.]
+
+ Once the client has an up-to-date keys file, the client checks the
+ signature on the timestamp file. Assuming it checks out, the client
+ refreshes the mirror list as needed, and refreshes any bundle files
+ to which the user is subscribed if the client does not have
+ the latest version of those files. The client checks signatures on
+ these files, and fetches package metadata for any packages listed in
+ the bundle file that the client does not have, checks signatures on
+ these, and fetches binaries for packages that might need to be
+ installed or updated. As the packages arrive, clients check their
+ hashes.
+
+ Once the client has gotten enough packages, it informs the user that
+ new packages have arrived, and asks them if they want to update.
+
+ Clients SHOULD cache at least the latest versions they have received
+ of all files.
+
+4.1.1. Download preferences
+
+ Users should be able to specify that packages must be only
+ downloaded over Tor, or must only be downloaded over encrypted
+ protocols, or both. Users should also be able to express preference
+ for Tor vs non-Tor and encrypted vs non-encrypted, even if they
+ allow both.
+
+4.2. Mirrors
+
+ Periodically, mirrors do an rsync or equivalent to fetch the latest
+ version of whatever parts of the repository have changed since the
+ version they currently hold. Mirrors SHOULD replace older versions
+ of the repository idempotently, so that clients are less likely to
+ see inconsistent state. Mirrors SHOULD validate the information
+ they receive, and not serve partial or inconsistent files.
+
+4.3. Workflow: Packagers
+
+ When a new binary package is done, the person making the package
+ runs a tool to generate and sign a package file, and sends both the
+ package and the package file to a repository admin. Typically, the
+ base package file will be generated by inserting a version into a
+ template.
+
+ Packages MAY have as part of their build process a script to
+ generate the appropriately versioned package file. This script
+ should at a minimum demand a build version, or use a timestamp in
+ place of a build version, to prevent two packages with the same
+ version from being created.
+
+4.4. Workflow: bundlers
+
+ When the packages in a bundle are done, the bundler runs a tool on
+ the package files to generate and sign a bundle file. Typically,
+ this tool uses a template bundle file.
+
+4.5. Workflow: repository administrators
+
+ Repository administrators use a tool to validate signed files into the
+ repository. The repository should not be altered manually.
+
+ This tool acts as follows:
+ - Package files may be added, but never replaced.
+ - Bundle files may be added, but never replaced.
+ - No file may be added unless it is syntactically valid and
+ signed by a key in the keys file authorized to sign files of
+ this type in this file's location location.
+
+ - A package file may not be added unless all of its binary
+ packages match their hashes.
+
+ - A bundle file may not be added unless all of its package files
+ are present and match their hashes.
+
+ - When adding a new keylist, bundle, or mirrors list, the
+ timestamp file must be regenerated immediately.
+
+5. Parameter setting and corner cases.
+
+5.1. Timing
+
+ The timestamp file SHOULD be regenerated every 15 minutes. Mirrors
+ SHOULD attempt to update every hour. Clients SHOULD accept a
+ timestamp file up to 6 hours old.
+
+5.2. Format versioning and forward-compatibility:
+
+ All of the above formats include the ability to add more
+ attribute-value fields for backwards-compatible format changes. If
+ we need to make a backwards incompatible format change, we create a
+ new filename for the new format.
+
+5.3. Key management and migration:
+
+ Root keys should be kept offline. All keys except timestamp and
+ mirror keys should be stored encrypted.
+
+ All the formats above allow for multiple keys to sign a single
+ document. To replace a compromised root key, it suffices to sign
+ keylist documents with both the compromised key and its replacement
+ until all clients have updated to a new version of the autoupdater.
+
+ To replace another key, it suffices to authorize the new key in the
+ keylist. Note that a new package or bundle key must re-sign and
+ issue new versions of all packages or bundles it has generated.
+
+
+
+F. Future directions and open questions
+
+F.1. Package decomposition
+
+ It would be neat to decouple existing packages. Right now, we'd
+ never want a windows user to have to fetch an openssl dll and Tor
+ separately. But if they're using an auto-update tool, it'd be
+ pretty keen to have them not need to fetch a new openssl every time
+ Tor has a bugfix.
+
+F.2. Caching at Tor servers.
+
+ See Tor Proposal number 127.
+
+F.3. Support for more download methods
+
+ Ozymandns, chunked downloads, and bittorrent would all be neat
+ ideas.
+
+F.4. Support for bogus clocks.
+
+ Glider should have a user configurable "no, my clock is _supposed_
+ to be wrong" mode, since lots of users seem to _like_ having their
+ clocks in 1970 forever.
+
+R. Ideas I'm rejecting for the moment
+
+R.1. Considering recommended versions from Tor consensus directory documents
+
+ This requires a working Tor to update Tor; that's not necessarily a
+ great idea.
+
+R.2. Integration with existing GPG signatures
+
+ The OpenPGP signature and key format is so complicated that you'd
+ have to be mad to touch it.
+
+