author Nick Mathewson <nickm@torproject.org> 2008-09-04 16:28:44 +0000
committer Nick Mathewson <nickm@torproject.org> 2008-09-04 16:28:44 +0000
commit fcd98068a15eabbd3ed9264993c15058fd5c4815
tree 7136f62883724beb8a9216c15370ca5ad52cb0b6 /specs
parent 081c3323dfb0121ad059a71c620ba0911d64cafa
Clarify and add intro to updater spec. Needs a rename and more merging.
git-svn-id: file:///home/or/svnrepo/updater/trunk@16756 55e972cd-5a19-0410-ae62-a4d7a52db4cd
Diffstat (limited to 'specs')
-rw-r--r-- specs/U2-formats.txt | 370
1 files changed, 257 insertions, 113 deletions
diff --git a/specs/U2-formats.txt b/specs/U2-formats.txt
index 16bc1d6..64add21 100644
--- a/specs/U2-formats.txt
+++ b/specs/U2-formats.txt
@@ -1,48 +1,48 @@
+0. Preliminaries
-Scope
+0.0. Scope
- This document describes a repository and document format for use in
- distributing Tor bundle updates. It is meant to be a component of
- an overall automatic update system.
+ This document describes a system for distributing Tor bundle updates.
- Not described in this document is the design of the packages or their
- install process, though some requirements are listed.
-
-Proposed code name
+0.1. Proposed code name
Since "auto-update" is so generic, I've been thinking about going with
- with "hapoc" or "glider" or "petaurus", all based on the sugar
- glider you get when you search for "handy pocket creature".
+ "glider", based on the sugar glider you get when you search for "handy
+ pocket creature". I haven't yet done a search to find out whether
+ somebody else is using the name, so we shouldn't get too attached to it
+ before we see if it's taken.
-Metaformat
+0.2. Goals
- All documents use Rivest's SEXP meta-format as documented at
- http://people.csail.mit.edu/rivest/sexp.html
- with the restriction that no "display hint" fields are to be used,
- and the base64 transit encoding isn't used either.
+ Once Tor was a single executable that you could just run. Then it
+ required Privoxy. Now, thanks to the Tor Browser Bundle and related
+ projects, a full installation can contain Tor, Privoxy, Torbutton,
+ Firefox, and more.
- In descriptions of syntax below, we use regex-style qualifiers, so
- that in
- (sofa slipcover? occupant* leg+)
- the sofa will have an optional slipcover, zero or more occupants,
- and one or more legs. This pattern matches (sofa leg) and (sofa
- slipcover occupant occupant leg leg leg leg) but not (sofa leg
- slipcover).
+ We need to keep this software updated. When we make security fixes,
+ quick uptake helps narrow the window in which attackers can exploit
+ them.
- We also use a braces notation to indicate elements that can occur
- in any order. For example,
- (bread {flour+ eggs? yeast})
- matches a list starting with "bread", and then containing one or
- more of flours, zero or one occurrences of eggs, and one
- occurrence of yeast, in any order. This pattern matches (bread eggs
- yeast flour) but not (bread yeast) or (bread flour eggs yeast
- macadamias).
+ We need updates to be easy. Each additional step a user must take to
+ get updated means that more users will stay with older insecure
+ versions.
+
+ We need updates to be secure. We're supposed to be good at crypto;
+ let's act like it. There is no good reason in this day and age to
+ subject users to rollback attacks or unsigned packages or whatever.
+ We need administration to be simple. Tor doesn't have a release
+ engineering team, so we can't add too many hard steps to putting out
+ a new release.
-Goals
+ The system should be easy to implement; we may need to do multiple
+ implementations on the client side at least.
- It should be possible to mirror a repository using only rsync and cron.
+0.2.1. Goals for package formats and PKIs
+
+ It should be possible to mirror a repository using only rsync and
+ cron.
Separate keys should be used for different people and different
roles.
@@ -53,87 +53,155 @@ Goals
The system should handle any single computer or system or person
being unavailable.
- The system should be pretty future-proof.
-
- The client-side of the architecture should be really easy to implement.
-
-Non-goals:
+ The formats and protocols should be pretty future-proof.
- This is not a package format. Instead, we reuse existing package
- formats for each platform.
+0.3. Non-goals
This is not a general-purpose package manager like yum or apt: it
assumes that users will want to have one or more of a set of
"bundles", not an arbitrary selection of packages dependent on one
- another.
+ another. (Rationale: these systems do what they do pretty well.)
This is also not a general-purpose package format. It assumes the
existence of an external package format that can handle install,
- update, remove, and version query.
-
-Architecture: Repository
-
- A "repository" is a directory hierarchy containing packages,
- bundles, and metadata, all signed.
-
- A "package" is a single independent downloadable, installable
- binary archive. It could be an MSI, an RPM, a DMG, or so on.
- Every package is a compiled instance of some piece of
- software (an 'application') for some (os, architecture,
- version) combinations. Some packages are "glue" that make other
- packages work well together or get configured properly.
-
- A "bundle" is a list of of packages to be installed together.
- Examples might be "Tor Browser Bundle" or "Tor plus controller". A
- bundle is versioned, and every bundle is for a particular (os,
- architecture) combination. Bundles specify which order to install
- or update their components.
-
- Metadata is used to:
- - Find mirrors
- - Validate packages, bundles, and metadata
- - Make sure information is up-to-date
- - Determine which packages are in a bundle
+ update, remove, and version query. (Rationale:
+
+1. System overview
+
+ The basic unit of updatability is a "bundle". A bundle is a set of
+ software components, or "packages", plus some rules about installing
+ them. Example bundles could be "Tor Browser, stable series" or
+ "Basic Tor, development series".
+
+ When Glider has responsibility for keeping a bundle up to date, we
+ say that a user has "subscribed" to that bundle.
+
+ Conceptually, there are four parts to keeping a bundle up to date:
+
+ Polling:
+ - Periodically, Glider asks a mirror whether there is a newer
+ version of some bundle that a user has subscribed to. If so,
+ Glider determines what's in the bundle.
+
+ Fetching:
+ - If the bundle contains packages that Glider hasn't installed
+ or hasn't cached, it needs to download them from a mirror.
+ This can happen over any protocol; v1 should support at least
+ http and https-over-Tor. V1 should also support resuming
+ partial downloads, since many users have unreliable
+ connections.
+
+ Later versions could support BitTorrent, or whatever.
+
+ Validation:
+ - Throughout the process, Glider must ensure that all the
+ bundles are signed correctly, all the packages are signed
+ correctly, and everything is up-to-date.
+
+ We want to specify this so that users can't be tricked about
+ the contents of a bundle, can't install a malicious package,
+ and can't be fooled into believing that an old bundle is
+ actually the latest.
+
+ Installation:
+ - Now Glider has a set of packages to install. The format of
+ these packages will be platform-dependent: they could be pkg
+ files on OSX, MSI files on Win32, RPMs or DEBs on Linux, and
+ so on. Glider should query the user for permission to start,
+ then install the packages.
+
+1.1. The repository
+
+ Each Glider instance knows about one or more "repositories". A
+ repository is a filesystem somewhere that contains the packages in a
+ set of bundles, and some associated metadata. A repository must
+ exist at one or more canonical hosts, and may have a number of full
+ or partial mirrors.
+
+ In v1, each Glider instance will know about only one repository.
+
+1.2. The PKI
+
+ The trust root for the whole system is, necessarily, whatever users
+ download when they first download a copy of Glider. We need to make
+ sure that the first download happens from a site we trust, using
+ HTTPS.
+
+ Glider ships with root keys, which in turn are used to verify the
+ keys for all the other roles. There are a few root keys, operated by
+ trusted admins for the system. If root keys ever need to be changed,
+ we can just ship an update of Glider: it's supposed to be
+ self-updating anyway.
+
+ The root keys are only used to sign a 'key list' of all the other
+ keys and their roles. A key list is valid if it has been signed by a
+ threshold of root keys.
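
The threshold rule for key lists can be sketched in a few lines. This is
only an illustration, not part of the spec: the function name, the
(key id, signature) pair layout, and the `verify` callback are all
hypothetical, and the actual signature check is left to whatever crypto
library an implementation uses. The one point it tries to make is that
each root key counts at most once toward the threshold.

```python
from typing import Callable, Iterable, Set, Tuple


def keylist_is_valid(
    document: bytes,
    signatures: Iterable[Tuple[str, bytes]],
    root_key_ids: Set[str],
    threshold: int,
    verify: Callable[[str, bytes, bytes], bool],
) -> bool:
    """Return True if at least `threshold` distinct root keys signed `document`.

    `verify(key_id, document, signature)` is a hypothetical callback that
    performs the real signature check.
    """
    good = set()
    for key_id, signature in signatures:
        if key_id not in root_key_ids:
            continue  # signatures from non-root keys do not count
        if key_id in good:
            continue  # a root key is counted at most once
        if verify(key_id, document, signature):
            good.add(key_id)
    return len(good) >= threshold
```

With three root keys and a threshold of two, a key list carrying two
copies of the same root key's signature would still be rejected.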
+
+ Each package is signed with the key of its authorized builder. For
+ example, one volunteer may be authorized to build the mac versions of
+ several packages, and another may be authorized to build the windows
+ version of just one.
+
+ Each bundle is signed with the key of its maintainer. It's assumed
+ that the bundle maintainer might be the package maintainer for some
+ but not all of the packages.
+
+ The list of mirrors is also signed. If the mirror list is
+ automatically updated, this key must be kept online; otherwise, it
+ can be offline.
+
+ To prevent an adversary from replaying an out-of-date signed
+ document, an automated process periodically signs a timestamped
+ statement containing the hashes of the mirror list, the latest
+ bundles, and the key list, using yet another special-purpose key.
+ This key must be kept online.
+
+1.3. Threat Model And Analysis
+
+ We assume an adversary who can operate compromised mirrors, and who
+ can possibly compromise the main repository. At worst, such an
+ adversary can DOS users in a way that they can detect.
+
+ We're assuming for the moment an OSX/Win32-like execution model,
+ where all packages will run with equal privilege, but occasionally
+ installation will require higher privilege. This means that once a
+ hostile package is installed, it can basically do whatever it
+ wants. As rootkit writers demonstrate, compromise is really
+ tenuous: any attacker who can induce a user to install a hostile
+ piece of code has, in effect, permanently compromised that user
+ until they reinstall.
+
+ Thus, if an adversary compromises enough keys to sign a compromised
+ package, or tricks a packager into signing a compromised package,
+ and manages to get that package into a signed bundle, the best we
+ can do is to limit the number of users who are affected. We do
+ this by compartmentalizing signing keys so that only the package
+ and bundle in question are at risk.
+
+ (If we had replicated build processes and a bit-by-bit reliable
+ build process, we could have multiple packagers test that a binary
+ was built properly, and multiply sign it. This would be effective
+ against an adversary compromising a single packaging key, but not
+ against one compromising a source repository.)
+
+2. The repository layout
The filesystem layout in the repository is used for two purposes:
- To give mirrors an easy way to mirror only some of the repository.
- To specify which parts of the repository a given key has the
authority to sign.
-Architecture: Roles
-
- Every role in the system is associated with a key. Replacing
- anything but a root key is supposed to be relatively easy.
-
- Root-keys sign other keys, and certify them as belonging to roles.
- Clients are configured to know the root keys.
-
- Bundle keys certify the contents of a bundle.
-
- Package keys certify packages for a given program or set of
- programs.
-
- Mirror keys certify a list of mirrors. We expect this to be an
- automated process.
-
- Timestamp keys certify that given versions of other metadata
- documents are up-to-date. They are the only keys that absolutely
- need to be kept online. (If they are not, timestamps won't be
- generated.)
-
-Directory layout
-
The following files exist in all repositories and mirrors:
/meta/keys.txt
Signed by the root keys; indicates keys and roles.
- [XXXX I'm using the txt extension here. Is that smart?]
+ [???? I'm using the txt extension here. Is that smart?]
/meta/mirrors.txt
- Signed by the mirror key; indicates which parts of the repo
- are mirrored where.
+ Signed by the mirror key; indicates which parts of the
+ repository are mirrored at what mirrors.
/meta/timestamp.txt
@@ -141,6 +209,8 @@ Directory layout
for the latest versions of keys.txt and mirrors.txt. Also
indicates the latest version of each bundle for each os/arch.
+ This is the only file that needs to be downloaded for polling.
+
/bundleinfo/bundlename/os-arch/bundlename-os-arch-bundleversion.txt
Signed by the appropriate bundle key. Describes what
@@ -150,14 +220,45 @@ Directory layout
/pkginfo/packagename/os-arch/version/packagename-os-arch-packageversion.txt
Signed by the appropriate package key. Tells the name of the
- file that makes up the bundle, its hash, and what procedure
+ file that makes up a package, its hash, and what procedure
is used to install it.
/packages/packagename/os-arch/version/(some filename)
- The actual files [XXX finish sentence]
+ The actual package file. Its naming convention will depend
+ on the underlying packaging system.
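
As a rough illustration of the layout above, the helpers below build the
bundleinfo and pkginfo paths from their components. The function names
and the example bundle, package, and os-arch values are made up for this
sketch; only the path templates come from the listing.

```python
def bundleinfo_path(bundle: str, os_arch: str, version: str) -> str:
    """Path of the signed bundle description for one bundle version."""
    return f"/bundleinfo/{bundle}/{os_arch}/{bundle}-{os_arch}-{version}.txt"


def pkginfo_path(package: str, os_arch: str, version: str) -> str:
    """Path of the signed package description for one package version."""
    return (
        f"/pkginfo/{package}/{os_arch}/{version}/"
        f"{package}-{os_arch}-{version}.txt"
    )


def package_dir(package: str, os_arch: str, version: str) -> str:
    """Directory holding the package file; its filename is format-specific."""
    return f"/packages/{package}/{os_arch}/{version}/"


# Hypothetical names, for illustration only.
print(bundleinfo_path("tor-browser", "win32-i386", "0.1"))
print(pkginfo_path("tor", "win32-i386", "0.2.0.30"))
```

Because every path starts with the kind of document and then the name,
a mirror or a key can be scoped to a subtree by prefix alone.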
+
+3. Document formats
-File formats: general principles
+3.1. Metaformat
+
+ All documents use Rivest's SEXP meta-format as documented at
+ http://people.csail.mit.edu/rivest/sexp.html
+ with the restriction that no "display hint" fields are to be used,
+ and the base64 transit encoding isn't used either.
+
+ (We use SEXP because it's really easy to parse, really portable,
+ and unlike most other tagged data formats, has a
+ trivially-specified canonical format suitable for hashing.)
+
+ In descriptions of syntax below, we use regex-style qualifiers, so
+ that in
+ (sofa slipcover? occupant* leg+)
+ the sofa will have an optional slipcover, zero or more occupants,
+ and one or more legs. This pattern matches (sofa leg) and (sofa
+ slipcover occupant occupant leg leg leg leg) but not (sofa leg
+ slipcover).
+
+ We also use a braces notation to indicate elements that can occur
+ in any order. For example,
+ (bread {flour+ eggs? yeast})
+ matches a list starting with "bread", and then containing one or
+ more of flours, zero or one occurrences of eggs, and one
+ occurrence of yeast, in any order. This pattern matches (bread eggs
+ yeast flour) but not (bread yeast) or (bread flour eggs yeast
+ macadamias).
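
The two notations can be checked mechanically. The sketch below is an
illustration under simplifying assumptions: it works on flat lists of
element names rather than real s-expressions, and the function names are
invented here. Ordered patterns are translated to a regular expression;
brace patterns are checked by counting occurrences.

```python
import re
from collections import Counter


def _split(item):
    """Split 'leg+' into ('leg', '+'); a bare name has qualifier ''."""
    if item[-1] in "?*+":
        return item[:-1], item[-1]
    return item, ""


def matches_sequence(pattern, tokens):
    """Match tokens, in order, against e.g. 'sofa slipcover? occupant* leg+'."""
    regex = ""
    for item in pattern.split():
        name, qualifier = _split(item)
        regex += "(?:" + re.escape(name) + " )" + qualifier
    text = "".join(token + " " for token in tokens)
    return re.fullmatch(regex, text) is not None


def matches_unordered(head, body, tokens):
    """Match tokens against 'head {body}', where body items may be in any order."""
    if not tokens or tokens[0] != head:
        return False
    limits = {"": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}
    bounds = {}
    for item in body.split():
        name, qualifier = _split(item)
        bounds[name] = limits[qualifier]
    counts = Counter(tokens[1:])
    if any(name not in bounds for name in counts):
        return False  # an element the pattern does not mention
    for name, (low, high) in bounds.items():
        if counts[name] < low or (high is not None and counts[name] > high):
            return False
    return True


print(matches_sequence("sofa slipcover? occupant* leg+", ["sofa", "leg"]))
print(matches_unordered("bread", "flour+ eggs? yeast",
                        ["bread", "eggs", "yeast", "flour"]))
```

Both calls print True, in line with the examples given in the text.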
+
+3.2. File formats: general principles
We use tagged lists (lists whose first element is a string) to
indicate typed objects. Tags are generally lower-case, with
@@ -211,11 +312,37 @@ File formats: general principles
The ID of a key is the type field concatenated with the SHA-256
hash of the canonical encoding of the KEYVAL field.
- We define one keytype at present: 'rsa'. The KEYVAL in this case is a
- 2-element list of (e p), with both values given in big-endian
- binary format. [This makes keys 45-60% more compact.]
+ We define one keytype at present: 'rsa'. The KEYVAL in this case
+ is a 2-element list of (e n), with both values given in big-endian
+ binary format. [This makes keys 45-60% more compact than using
+ decimal integers.]
+
+ All RSA keys must be at least 2048 bits long.
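
A minimal sketch of the key ID computation for an 'rsa' key follows. It
assumes Rivest's canonical form (each string written as its decimal
length, a colon, and the raw bytes; lists wrapped in parentheses). The
text does not say whether the hash is appended as raw bytes or in some
printable encoding; raw digest bytes are assumed here, and the function
names are hypothetical.

```python
import hashlib


def canonical(sexp):
    """Canonical SEXP encoding: bytes become b'<len>:<data>', lists nest in ()."""
    if isinstance(sexp, bytes):
        return str(len(sexp)).encode("ascii") + b":" + sexp
    return b"(" + b"".join(canonical(item) for item in sexp) + b")"


def int_to_bytes(value):
    """Big-endian binary form of a non-negative integer, without padding."""
    return value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")


def rsa_keyval(e, n):
    """KEYVAL for an 'rsa' key: the 2-element list (e n)."""
    return [int_to_bytes(e), int_to_bytes(n)]


def key_id(keytype, keyval):
    """Type field concatenated with SHA-256 of the canonical KEYVAL.

    Assumption: the digest is appended as raw bytes.
    """
    return keytype + hashlib.sha256(canonical(keyval)).digest()


def rsa_key_long_enough(n):
    """Enforce the 2048-bit minimum on the modulus."""
    return n.bit_length() >= 2048
```

Hashing the canonical form means two parties who hold the same key
always derive the same ID, regardless of how the key list was
whitespace-formatted on disk.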
+
+
+ Every role in the system is associated with a key. Replacing
+ anything but a root key is supposed to be relatively easy.
+
+ Root-keys sign other keys, and certify them as belonging to roles.
+ Clients are configured to know the root keys.
+
+ Bundle keys certify the contents of a bundle.
+
+ Package keys certify packages for a given program or set of
+ programs.
+
+ Mirror keys certify a list of mirrors. We expect this to be an
+ automated process.
-File formats: key list
+ Timestamp keys certify that given versions of other metadata
+ documents are up-to-date. They are the only keys that absolutely
+ need to be kept online. (If they are not, timestamps won't be
+ generated.)
+
+3.3. File formats: key list
+
+ The key list file is signed by multiple root keys. It indicates
+ which keys are authorized to sign which parts of the repository.
(keylist
(ts TIME)
@@ -228,13 +355,17 @@ File formats: key list
MUST NOT replace a file with an older one, and SHOULD NOT accept a
file too far in the future.
- A ROLE is one of "timestamp" "mirrors" "bundle" or "package"
+ A ROLE is one of "timestamp" "mirrors" "bundle" or "package".
PATH is a path relative to the top of the directory hierarchy. It
may contain "*" elements to indicate "any file", and may end with a
"/**" element to indicate all files under a given point.
-File formats: mirror list
+3.4. File formats: mirror list
+
+ The mirror list is signed by a mirror key. It indicates which
+ mirrors are active and believed to be mirroring which parts of the
+ repository.
(mirrorlist
(ts TIME)
@@ -251,7 +382,12 @@ File formats: mirror list
elements are the components describing how much of the packages
directory is mirrored. Their format is as in the keylist file.
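
The path patterns shared by the key list and mirror list could be
matched along these lines. This is a sketch under two assumptions the
text does not spell out: a "*" element matches exactly one path
component, and a trailing "/**" matches one or more components below
the given point. The function name and the example paths are
hypothetical.

```python
def path_matches(pattern, path):
    """Return True if `path` is covered by a key list / mirror list PATH pattern."""
    pat = pattern.strip("/").split("/")
    parts = path.strip("/").split("/")
    if pat[-1] == "**":
        prefix = pat[:-1]
        # "/**" must cover at least one component under the prefix.
        if len(parts) <= len(prefix):
            return False
        parts = parts[:len(prefix)]
        pat = prefix
    if len(pat) != len(parts):
        return False
    return all(p == "*" or p == q for p, q in zip(pat, parts))


# Hypothetical package and os-arch names, for illustration only.
print(path_matches("/pkginfo/*/win32/**", "/pkginfo/tor/win32/1.0/tor.txt"))
print(path_matches("/pkginfo/*/win32/**", "/pkginfo/tor/osx/1.0/tor.txt"))
```

The first call prints True and the second False: a key scoped to the
win32 subtree of every package has no authority over the osx subtree.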
-File formats: timestamp files
+3.5. File formats: timestamp files
+
+ The timestamp file is signed by a timestamp key. It indicates the
+ latest versions of other files, and contains a regularly updated
+ timestamp to prevent rollback attacks.
+
(ts
({(at TIME)
(m TIME MIRRORLISTHASH)
@@ -264,7 +400,7 @@ File formats: timestamp files
file; and the 'b' entries are a list of the latest version of each
bundles and their locations and hashes.
-File formats: bundle files
+3.6. File formats: bundle files
(bundle
(at TIME)
@@ -292,10 +428,12 @@ File formats: bundle files
example, "The Anonymous Email Bundle needs the Python Runtime to run
Mixminion.")
- [XXX consider translated strings here, if the gloss strings are ever
- meant to be shown to users. -RD]
+ Multiple gloss strings are allowed; each should have a different
+ language. The UI should display the most appropriate language to the
+ user.
+
+3.7. File formats: package files
-File formats: package files
(package
({(name NAME)
(version VERSION)
@@ -316,7 +454,11 @@ File formats: package files
name and version. If a package needs to be changed, the version
MUST be incremented.
-Workflows: The client application
+ Descriptions are tagged with languages in the same way as glosses.
+
+4. Detailed Workflows
+
+4.1. The client application
Periodically, the client updater fetches a timestamp file from a
mirror. If the timestamp in the file is up-to-date, the client
@@ -354,7 +496,7 @@ Workflows: The client application
Clients SHOULD cache at least the latest versions they have received
of all files.
-Workflow: Mirrors
+4.2. Mirrors
Periodically, mirrors do an rsync or equivalent to fetch the latest
version of whatever parts of the repository have changed since the
@@ -363,7 +505,7 @@ Workflow: Mirrors
see inconsistent state. Mirrors SHOULD validate the information
they receive, and not serve partial or inconsistent files.
-Workflow: Packagers
+4.3. Workflow: Packagers
When a new binary package is done, the person making the package
runs a tool to generate and sign a package file, and sends both the
@@ -377,13 +519,13 @@ Workflow: Packagers
place of a build version, to prevent two packages with the same
version from being created.
-Workflow: bundlers
+4.4. Workflow: bundlers
When the packages in a bundle are done, the bundler runs a tool on
the package files to generate and sign a bundle file. Typically,
this tool uses a template bundle file.
-Workflow: repository administrators
+4.5. Workflow: repository administrators
Repository administrators use a tool to validate signed files into the
repository. The repository should not be altered manually.
@@ -404,20 +546,22 @@ Workflow: repository administrators
- When adding a new keylist, bundle, or mirrors list, the
timestamp file must be regenerated immediately.
-Timing:
+5. Parameter setting and corner cases.
+
+5.1. Timing:
The timestamp file SHOULD be regenerated every 15 minutes. Mirrors
SHOULD attempt to update every hour. Clients SHOULD accept a
timestamp file up to 6 hours old.
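
A client-side freshness check based on these numbers might look like the
sketch below. The 6-hour limit comes from the text; the allowed clock
skew is not specified anywhere, so the 5-minute default is purely an
assumption, as are the function and parameter names. The check also
refuses to replace a newer timestamp file with an older one.

```python
MAX_AGE_SECONDS = 6 * 60 * 60      # clients accept files up to 6 hours old
DEFAULT_MAX_SKEW_SECONDS = 5 * 60  # assumption: tolerated clock skew


def timestamp_acceptable(file_time, now, last_seen=None,
                         max_skew=DEFAULT_MAX_SKEW_SECONDS):
    """Decide whether a timestamp file dated `file_time` may be accepted.

    All times are seconds since the epoch. `last_seen` is the time in
    the newest timestamp file the client has already accepted, if any.
    """
    if file_time > now + max_skew:
        return False  # too far in the future
    if now - file_time > MAX_AGE_SECONDS:
        return False  # stale: possibly a replayed old file
    if last_seen is not None and file_time < last_seen:
        return False  # never roll back to an older file
    return True
```

With regeneration every 15 minutes and hourly mirror updates, a
well-behaved mirror should normally serve a file far younger than the
6-hour limit, so a rejection here points to a stale or hostile mirror.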
-Format versioning and forward-compatibility:
+5.2. Format versioning and forward-compatibility:
All of the above formats include the ability to add more
attribute-value fields for backwards-compatible format changes. If
we need to make a backwards incompatible format change, we create a
new filename for the new format.
-Key management and migration:
+5.3. Key management and migration:
Root keys should be kept offline. All keys except timestamp and
mirror keys should be stored encrypted.