-rw-r--r-- | README.md | 14
-rw-r--r-- | doc/commands.md | 285
-rw-r--r-- | doc/config.md | 229
-rw-r--r-- | doc/en.md | 77
-rw-r--r-- | doc/guide.md | 257
-rw-r--r-- | doc/known-issues.md | 64
-rw-r--r-- | doc/quick-start.md | 245
-rw-r--r-- | doc/service-diagram.odg | bin | 0 -> 12131 bytes
-rw-r--r-- | doc/service-diagram.png | bin | 0 -> 25988 bytes
-rw-r--r-- | doc/under-the-hood.md | 37
10 files changed, 1203 insertions, 5 deletions
@@ -30,14 +30,18 @@ While you can deploy all services on one server, we stronly recommend to use sep Troubleshooting =============== -If you have a problem, we are interested in fixing it! The best way for us to solve your problem is if you provide to us the complete log of what you did, and the output that was produced. Please don't cut out what appears to be useless information and only include the error that you received, instead copy and paste the complete log so that we can better determine the overall situation. +If you have a problem, we are interested in fixing it! -Visit https://leap.se/en/development for contact possibilities. +If you have a problem, be sure to have a look at the Known Issues section of the documentation to see if your issue is detailed there. -Known bugs ----------- +If not, the best way for us to solve your problem is if you provide to us the complete log of what you did, and the output that was produced. Please don't cut out what appears to be useless information and only include the error that you received, instead copy and paste the complete log so that we can better determine the overall situation. If you can run the same command that produced the error with a raised verbosity level (such as -v2), that provides us with more useful debugging information. -* Please read the section in the documentation about Known Issues (https://leap.se/docs/known-issues) +Visit https://leap.se/development for contact possibilities. + +Known Issues +------------ + +* Please read the section in the documentation about Known Issues (https://leap.se/docs/platform/known-issues) More Information diff --git a/doc/commands.md b/doc/commands.md new file mode 100644 index 00000000..b176541f --- /dev/null +++ b/doc/commands.md @@ -0,0 +1,285 @@ +@title = 'Command Line Reference' + +The command "leap" can be used to manage a bevy of servers running the LEAP platform from the comfort of your own home. 
+ + +# Global Options + +* `--log FILE` +Override default log file +Default Value: None + +* `-v|--verbose LEVEL` +Verbosity level 0..2 +Default Value: 1 + +* `--help` +Show this message + +* `--version` +Display version number and exit + +* `--yes` +Skip prompts and assume "yes" + + +# leap add-user USERNAME + +Adds a new trusted sysadmin + + + +**Options** + +* `--pgp-pub-key arg` +OpenPGP public key file for this new user +Default Value: None + +* `--ssh-pub-key arg` +SSH public key file for this new user +Default Value: None + +* `--self` +lets you choose among your public keys + + +# leap cert + +Manage X.509 certificates + + + +## leap cert ca + +Creates two Certificate Authorities (one for validating servers and one for validating clients). + +To see what values are used in the generation of the certificates (like name and key size), run `leap inspect provider` and look for the "ca" property. To see the details of the created certs, run `leap inspect <file>`. + +## leap cert csr + +Creates a CSR for use in buying a commercial X.509 certificate. + +The CSR created is for the provider's primary domain. The properties used for this CSR come from `provider.ca.server_certificates`. + +## leap cert dh + +Creates a Diffie-Hellman parameter file. + + + +## leap cert update <node-filter> + +Creates or renews an X.509 certificate/key pair for a single node or all nodes, but only if needed. + +This command will generate a new certificate for a node if some value in the node has changed that is included in the certificate (like hostname or IP address), or if the old certificate will be expiring soon. Sometimes, you might want to force the generation of a new certificate, such as in the cases where you have changed a CA parameter for server certificates, like bit size or digest hash. In this case, use --force. If <node-filter> is empty, this command will apply to all nodes.
+ +**Options** + +* `--force` +Always generate new certificates + + +# leap clean + +Removes all files generated with the "compile" command. + + + +# leap compile + +Compiles node configuration files into hiera files used for deployment. + + + +# leap deploy FILTER + +Apply recipes to a node or set of nodes. + +The FILTER can be the name of a node, service, or tag. + +**Options** + +* `--tags TAG[,TAG]` +Specify tags to pass through to puppet (overriding the default). +Default Value: leap_base,leap_service + +* `--fast` +Makes the deploy command faster by skipping some slow steps. A "fast" deploy can be used safely if you recently completed a normal deploy. + + +# leap help command + +Shows a list of commands or help for one command + +Gets help for the application or its commands. Can also list the commands in a way helpful to creating a bash-style completion function + +**Options** + +* `-c` +List commands one per line, to assist with shell completion + + +# leap inspect FILE + +Prints details about a file. Alternately, the argument FILE can be the name of a node, service or tag. + + + +# leap list [FILTER] + +List nodes and their classifications + +Prints out a listing of nodes, services, or tags. If present, the FILTER can be a list of names of nodes, services, or tags. If the name is prefixed with +, this acts like an AND condition. For example: + +`leap list node1 node2` matches all nodes named "node1" OR "node2" + +`leap list openvpn +local` matches all nodes with service "openvpn" AND tag "local" + +**Options** + +* `--print arg` +What attributes to print (optional) +Default Value: None + + +# leap local + +Manage local virtual machines. + +This command provides a convenient way to manage Vagrant-based virtual machines. If the FILTER argument is missing, the command runs on all local virtual machines. The Vagrantfile is automatically generated in 'test/Vagrantfile'. If you want to run vagrant commands manually, cd to 'test'.
+ +## leap local destroy [FILTER] + +Destroys the virtual machine(s), reclaiming the disk space + + + +## leap local reset [FILTER] + +Resets virtual machine(s) to the last saved snapshot + + + +## leap local save [FILTER] + +Saves the current state of the virtual machine as a new snapshot + + + +## leap local start [FILTER] + +Starts up the virtual machine(s) + + + +## leap local status [FILTER] + +Print the status of local virtual machine(s) + + + +## leap local stop [FILTER] + +Shuts down the virtual machine(s) + + + +# leap new DIRECTORY + +Creates a new provider instance in the specified directory, creating it if necessary. + + + +**Options** + +* `--contacts arg` +Default email address contacts. +Default Value: None + +* `--domain arg` +The primary domain of the provider. +Default Value: None + +* `--name arg` +The name of the provider. +Default Value: None + +* `--platform arg` +File path of the leap_platform directory. +Default Value: None + + +# leap node + +Node management + + + +## leap node add NAME [SEED] + +Create a new configuration file for a node named NAME. + +If specified, the optional argument SEED can be used to seed values in the node configuration file. + +The format is property_name:value. + +For example: `leap node add web1 ip_address:1.2.3.4 services:webapp`. + +To set nested properties, property name can contain '.', like so: `leap node add web1 ssh.port:44` + +Separate multiple values for a single property with a comma, like so: `leap node add mynode services:webapp,dns` + +**Options** + +* `--local` +Make a local testing node (by automatically assigning the next available local IP address). Local nodes are run as virtual machines on your computer.
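The SEED format described above for `leap node add` (`property_name:value`, `.` for nested properties, `,` between multiple values) can be sketched in a few lines of Ruby. This is only an illustration of the format, not the parser that `leap` itself uses: the helper name `parse_seed` is hypothetical, and keeping every value as a string (so `ssh.port:44` yields `"44"`) is an assumption.

```ruby
# Hypothetical helper, NOT leap_cli's actual parser: turns SEED arguments
# such as "ssh.port:44" into a nested hash to illustrate the format.
def parse_seed(args)
  args.each_with_object({}) do |arg, config|
    # Split on the first ':' only, so a value may itself contain ':'.
    name, value = arg.split(':', 2)
    raise ArgumentError, "expected property_name:value, got #{arg.inspect}" if value.nil?

    # A comma separates multiple values for a single property.
    value = value.split(',') if value.include?(',')

    # A '.' in the property name descends into nested hashes.
    *parents, leaf = name.split('.')
    target = parents.inject(config) { |hash, key| hash[key] ||= {} }
    target[leaf] = value # assumption: values are kept as strings
  end
end

seed = parse_seed(%w[ip_address:1.2.3.4 services:webapp,dns ssh.port:44])
p seed
```

Whether `leap` converts numeric seeds such as `44` into JSON numbers is not stated here, so the sketch leaves them untouched.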
+ + +## leap node init FILTER + +Bootstraps a node or nodes, setting up SSH keys and installing prerequisite packages + +This command prepares a server to be used with the LEAP Platform by saving the server's SSH host key, copying the authorized_keys file, and installing packages that are required for deploying. Node init must be run before deploying to a server, and the server must be running and available via the network. This command only needs to be run once, but there is no harm in running it multiple times. + +**Options** + +* `--echo` +If set, passwords are visible as you type them (default is hidden) + + +## leap node mv OLD_NAME NEW_NAME + +Renames a node file, and all its related files. + + + +## leap node rm NAME + +Removes all the files related to the node named NAME. + + + +# leap ssh NAME + +Log in to the specified node with an interactive shell. + + + +# leap test + +Run tests. + + + +## leap test init + +Creates files needed to run tests. + + + +## leap test run + +Run tests. + + +Default Command: run diff --git a/doc/config.md b/doc/config.md new file mode 100644 index 00000000..d0b1f6a7 --- /dev/null +++ b/doc/config.md @@ -0,0 +1,229 @@ +@title = "Configuration Files" + +Leapfile +------------------------------------------- + +A `Leapfile` defines options for the `leap` command and lives at the root of your provider directory. `Leapfile` is evaluated as ruby, so you can include whatever weird logic you want in this file. In particular, there are several variables you can set that modify the behavior of leap. For example: + + @platform_directory_path = '../leap_platform' + @log = '/var/log/leap.log' + +Additionally, you can create a `~/.leaprc` file that is loaded after `Leapfile` and is evaluated the same way. + +Platform options: + +* `@platform_directory_path` (required). This must be set to the path where `leap_platform` lives. The path may be relative. +* `@platform_branch`. 
If set, a check is performed before running any command to ensure that the currently checked out branch of `leap_platform` matches the value set for `@platform_branch`. This is useful if you have a stable branch of your provider that you want to ensure runs off the master branch of `leap_platform`. +* `@allow_production_deploy`. By default, you can only deploy to production nodes if the current branch is 'master' or if the provider directory is not a git repository. This option allows you to override this behavior. + +Vagrant options: + +* `@vagrant_network`. Allows you to override the default network used for local nodes. It should include a netmask like `@vagrant_network = '10.0.0.0/24'`. +* `@custom_vagrant_vm_line`. Insert arbitrary text into the auto-generated Vagrantfile. For example, `@custom_vagrant_vm_line = "config.vm.boot_mode = :gui"`. + +Logging options: + +* `@log`. If set, all command invocation and results are logged to the specified file. This is the same as the switch `--log FILE`, except that the command line switch will override the value in the Leapfile. + + +Configuration files +------------------------------------------- + +All configuration files, other than `Leapfile`, are in the JSON format. For example: + + { + "key1": "value1", + "key2": "value2" + } + +Keys should match `/[a-z0-9_]/` + +Unlike traditional JSON, comments are allowed. If the first non-whitespace characters are `//` then the line is treated as a comment. + + // this is a comment + { + // this is a comment + "key": "value" // this is an error + } + +Options in the configuration files might be nested hashes, arrays, numbers, strings, or boolean. Numbers and boolean values should **not** be quoted. For example: + + { + "openvpn": { + "ip_address": "1.1.1.1", + "protocols": ["tcp", "udp"], + "ports": [80, 53], + "options": { + "public_ip": false, + "adblock": true + } + } + } + +If the value string is prefixed with an '=' character, the result is evaluated as ruby.
For example: + + { + "domain": { + "public": "domain.org" + }, + "api_domain": "= 'api.' + domain.public" + } + +In this case, the property "api_domain" will be set to "api.domain.org". So long as you do not create unresolvable circular dependencies, you can reference other properties in evaluated ruby that are themselves evaluated ruby. + +See "Macros" below for information on the special macros available to the evaluated ruby. + +TIP: In rare cases, you might want to force the evaluation of a value to happen in a later pass after most of the other properties have been evaluated. To do this, prefix the value string with "=>" instead of "=". + +Node inheritance +---------------------------------------- + +Every node inherits from common.json and also any of the services or tags attached to the node. Additionally, the `leap_platform` contains a directory `provider_base` that defines the default values for tags, services and common.json. + +Suppose you have a node configuration for `bitmask/nodes/willamette.json` like so: + + { + "services": "webapp", + "tags": ["production", "northwest-us"], + "ip_address": "1.1.1.1" + } + +This node will have hostname "willamette" and it will inherit from the following files (in this order): + +1. common.json + - load defaults: `provider_base/common.json` + - load provider: `bitmask/common.json` +2. service "webapp" + - load defaults: `provider_base/services/webapp.json` + - load provider: `bitmask/services/webapp.json` +3. tag "production" + - load defaults: `provider_base/tags/production.json` + - load provider: `bitmask/tags/production.json` +4. tag "northwest-us" + - load: `bitmask/tags/northwest-us.json` +5. finally, load node "willamette" + - load: `bitmask/nodes/willamette.json` + +The `provider_base` directory is under the `leap_platform` specified in the file `Leapfile`. + +To see all the variables a node has inherited, you could run `leap inspect willamette`.
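One way to picture the load order listed above is as a series of recursive hash merges, where each file loaded later overrides values from the files loaded before it. The Ruby below is a minimal sketch under that assumption; it is not taken from `leap_cli`, the `deep_merge` helper and the layer contents are hypothetical, and how the real tool combines arrays is not covered here.

```ruby
# Recursively merge `override` into `base`: nested hashes are merged key by
# key, and any other value in `override` replaces the one in `base`.
# Assumption: this mirrors the inheritance order only in spirit.
def deep_merge(base, override)
  base.merge(override) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

# Hypothetical file contents, listed in the order they would be loaded.
layers = [
  { 'ssh' => { 'port' => 22 }, 'mosh' => { 'enabled' => false } }, # common.json
  { 'ssh' => { 'port' => 2200 } },                                 # tags/production.json
  { 'ip_address' => '1.1.1.1' }                                    # nodes/willamette.json
]

node = layers.reduce({}) { |config, layer| deep_merge(config, layer) }
p node
```

In this sketch the node keeps `mosh.enabled` from the first layer while `ssh.port` takes the value from the tag, because the tag is merged later.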
+ +Common configuration options +---------------------------------------- + +You can use the command `leap inspect` to see what options are available for a provider, node, service, or tag configuration. For example: + +* `leap inspect common` -- show the options inherited by all nodes. +* `leap inspect --base common` -- show the common.json from `provider_base` without the local `common.json` inheritance applied. +* `leap inspect webapp` -- show all the options available for the service `webapp`. + +Here are some of the more important options you should be aware of: + +* `ip_address` -- Required for all nodes, no default. +* `ssh.port` -- The SSH port you want the node's OpenSSH server to bind to. This is also the default when trying to connect to a node, but if the node currently has OpenSSH running on a different port then run deploy with `--port` to override the `ssh.port` configuration value. +* `mosh.enabled` -- If set to `true`, then mosh will be installed on the server. The default is `false`. + +Macros +---------------------------------------- + +When using evaluated ruby in a JSON configuration file, there are several special macros that are available. These are evaluated in the context of a node (available as the variable `self`). + +The following methods are available to the evaluated ruby: + +`variable.variable` + + > Any variable defined or inherited by a particular node configuration is available by just referencing it using either hash notation or object field notation (e.g. `['domain']['public']` or `domain.public`). Circular references are not allowed, but otherwise it is OK to nest evaluated values in other evaluated values. If a value has not been defined, the hash notation will return nil but the field notation will raise an exception. Properties of services, tags, and the global provider can all be referenced the same way. For example, `global.services['openvpn'].x509.dh`. + +`nodes` + + > A hash of all nodes. This list can be filtered. 
+ +`nodes_like_me` + + > A hash of nodes that have the same deployment tags as the current node (e.g. 'production' or 'local'). + +`global.services` + + > A hash of all services, e.g. `global.services['openvpn']` would return the "openvpn" service. + +`global.tags` + + > A hash of all tags, e.g. `global.tags['production']` would return the "production" tag. + + `global.provider` + + > Can be used to access variables defined in `provider.json`, e.g. `global.provider.contacts.default`. + +`file(filename)` + + > Inserts the full contents of the file. If the file is an erb template, it is rendered. The filename can either be one of the pre-defined file symbols, or it can be a path relative to the "files" directory in your provider instance. E.g. `file :ca_cert` or `file 'ca/ca.crt'`. + +`file_path(filename)` + + > Ensures that the file will get rsynced to the node as an individual file. The value returned by `file_path` is the full path where this file will ultimately live when deployed to the node. E.g. `file_path :ca_cert` or `file_path 'branding/images/logo.png'`. + +`secret(:symbol)` + + > Returns the value of a secret in secrets.json (or creates it if necessary). E.g. `secret :couch_admin_password` + +`hosts_file` + + > Returns a data structure that puppet will use to generate /etc/hosts. Care is taken to use the local IP of other hosts when needed. + +`known_hosts_file` + + > Returns the lines needed in a SSH `known_hosts` file. + +`stunnel_client(node_list, port, options={})` + + > Returns a stunnel configuration data structure for the client side. Argument `node_list` is an `ObjectList` of nodes running stunnel servers. Argument `port` is the real port of the ultimate service running on the servers that the client wants to connect to. + +`stunnel_server(port)` + + > Generates a stunnel server entry. The `port` is the real port of the targeted service.
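To make the evaluated-ruby mechanism described above more concrete, here is a small Ruby sketch of how a value prefixed with `=` could be evaluated in the context of a configuration object, with hash notation returning nil and field notation raising for undefined properties. The class name `ConfigSketch` and everything inside it are hypothetical; this is not how `leap_cli` is implemented, and it makes no attempt to detect circular references or to support the macros.

```ruby
# Hypothetical stand-in for a node configuration, for illustration only.
class ConfigSketch
  def initialize(data, root = nil)
    @data = data
    @root = root || self
  end

  # Hash notation: returns nil when the property is not defined.
  def [](key)
    resolve(@data[key.to_s])
  end

  # Field notation: raises NoMethodError when the property is not defined.
  def method_missing(name, *args)
    key = name.to_s
    return super unless args.empty? && @data.key?(key)

    resolve(@data[key])
  end

  def respond_to_missing?(name, include_private = false)
    @data.key?(name.to_s) || super
  end

  private

  # Strings starting with '=' are evaluated as ruby against the root
  # configuration; nested hashes are wrapped so field notation keeps working.
  def resolve(value)
    case value
    when Hash then ConfigSketch.new(value, @root)
    when /\A=\s*(.*)\z/m then @root.instance_eval(Regexp.last_match(1))
    else value
    end
  end
end

config = ConfigSketch.new({
  'domain'     => { 'public' => 'domain.org' },
  'api_domain' => "= 'api.' + domain.public"
})
puts config.api_domain
```

Because the expression is evaluated lazily on access, one evaluated value can refer to another, which matches the nesting behaviour described for `variable.variable`.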
+ +Hash tables +----------------------------------------- + +The macros `nodes`, `nodes_like_me`, `global.services`, and `global.tags` all return a hash table of configuration objects (either nodes, services, or tags). There are several ways to filter and process these hash tables: + +Access an element by name: + + nodes['vpn1'] # returns node named 'vpn1' + global.services['openvpn'] # returns service named 'openvpn' + +Create a new hash table by applying filters: + + nodes[:public_dns => true] # all nodes where public_dns == true + nodes[:services => 'openvpn', :services => 'tor'] # openvpn OR tor + nodes[:services => 'openvpn'][:tags => 'production'] # openvpn AND production + nodes[:name => "!bob"] # all nodes that are NOT named "bob" + +Create an array of values by selecting a single field: + + nodes.field('location.name') + ==> ['seattle', 'istanbul'] + +Create an array of hashes by selecting multiple fields: + + nodes.fields('domain.full', 'ip_address') + ==> [ + {'domain_full' => 'red.bitmask.net', 'ip_address' => '1.1.1.1'}, + {'domain_full' => 'blue.bitmask.net', 'ip_address' => '1.1.1.2'}, + ] + +Create a new hash table of hashes, with only certain fields: + + nodes.pick_fields('domain.full', 'ip_address') + ==> { + "red" => {'domain_full' => 'red.bitmask.net', 'ip_address' => '1.1.1.1'}, + "blue" => {'domain_full' => 'blue.bitmask.net', 'ip_address' => '1.1.1.2'}, + } + +With `pick_fields`, if there is only one field, it will generate a simple hash table: + + nodes.pick_fields('ip_address') + ==> { + "red" => '1.1.1.1', + "blue" => '1.1.1.2', + } diff --git a/doc/en.md b/doc/en.md new file mode 100644 index 00000000..bdae4630 --- /dev/null +++ b/doc/en.md @@ -0,0 +1,77 @@ +@title = 'LEAP Platform for Service Providers' +@nav_title = 'Provider Platform' +@summary = 'Software platform to automate the process of running a communication service provider.'
+@toc = true + +The *LEAP Platform* is a set of complementary packages and server recipes to automate the maintenance of LEAP services in a hardened Debian environment. Its goal is to make it as painless as possible for sysadmins to deploy and maintain a service provider's infrastructure for secure communication. + +The LEAP Platform consists of three parts, detailed below: + +1. The platform recipes. +2. The provider instance. +3. The `leap` command line tool. + +The platform recipes +-------------------- + +The LEAP platform recipes define an abstract service provider. It is a set of [Puppet](https://puppetlabs.com/puppet/puppet-open-source/) modules designed to work together to provide to sysadmins everything they need to manage a service provider infrastructure that provides secure communication services. + +LEAP maintains a repository of platform recipes, which typically do not need to be modified, although it can be forked and merged as desired. Most service providers using the LEAP platform can use the same set of platform recipes. + +Because these recipes are abstract definitions, a system administrator has to create a provider instance (see below) in order to configure settings for a particular service provider. + +LEAP's platform recipes are distributed as a git repository: `git://leap.se/leap_platform.git` + +The provider instance +--------------------- + +A provider instance is a directory tree (typically tracked in git) containing all the configurations for a service provider's infrastructure. A provider instance primarily consists of: + +* A pointer to the platform recipes. +* A global configuration file for the provider. +* A configuration file for each server (node) in the provider's infrastructure. +* Additional files, such as certificates and keys. + +A minimal provider instance directory looks like this: + + └── bitmask # provider instance directory. + ├── Leapfile # settings for the `leap` command line tool.
+ ├── provider.json # global settings of the provider. + ├── common.json # settings common to all nodes. + ├── nodes/ # a directory for node configurations. + ├── files/ # keys, certificates, and other files. + └── users/ # public key information for privileged sysadmins. + + +A provider instance directory contains everything needed to manage all the servers that compose a provider's infrastructure. Because of this, any versioning tool and development work-flow can be used to manage your provider instance. + +The `leap` command line tool +---------------------------- + +The `leap` [command line tool](commands) is used by sysadmins to manage everything about a service provider's infrastructure. Except when creating a new provider instance, `leap` is run from within the directory tree of a provider instance. + +The `leap` command line has many capabilities, including: + +* Create, initialize, and deploy nodes. +* Manage keys and certificates. +* Query information about the node configurations. + +Traditional system configuration automation systems, like [Puppet](https://puppetlabs.com/puppet/puppet-open-source/) or [Chef](http://www.opscode.com/chef/), deploy changes to servers using a pull method. Each server pulls a manifest from a central master server and uses this to alter the state of the server. + +Instead, the `leap` tool uses a masterless push method: The sysadmin runs `leap deploy` from the provider instance directory on their desktop machine to push the changes out to every server (or a subset of servers). LEAP still uses Puppet, but there is no central master server that each node must pull from. + +One other significant difference between LEAP and typical system automation is how interactions among servers are handled.
Rather than store a central database of information about each server that can be queried when a recipe is applied, the `leap` command compiles a static representation of all the information a particular server will need in order to apply the recipes. In compiling this static representation, `leap` can use arbitrary programming logic to query and manipulate information about other servers. + +These two approaches, masterless push and pre-compiled static configuration, allow the sysadmin to manage a set of LEAP servers using traditional software development techniques of branching and merging, to more easily create local testing environments using virtual servers, and to deploy without the added complexity and failure potential of a master server. + +The `leap` command line tool is distributed as a git repository: `git://leap.se/leap_cli`. It can be installed with `sudo gem install leap_cli`. + +Getting started +---------------------------------- + +We recommend reading the platform documentation in the following order: + +1. [Quick start tutorial](platform/quick-start). +2. [Platform Guide](platform/guide). +3. [Configuration format](platform/config). +4. The `leap` [command reference](platform/commands). diff --git a/doc/guide.md b/doc/guide.md new file mode 100644 index 00000000..dae392e5 --- /dev/null +++ b/doc/guide.md @@ -0,0 +1,257 @@ +@title = "LEAP Platform Guide" +@nav_title = "Guide" + +Services +================================ + +Every node has one or more services that determine the node's function within your provider's infrastructure. + +When adding a new node to your provider, you should ask yourself four questions: + +* **many or few?** Some services benefit from having many nodes, while some services are best run on only one or two nodes. +* **required or optional?** Some services are required, while others can be left out. +* **who does the node communicate with?** Some services communicate very heavily with other particular services.
Nodes running these services should be close together. +* **public or private?** Some services communicate with the public internet, while others only need to communicate with other nodes in the infrastructure. + +Brief overview of the services: + +![services diagram](service-diagram.png) + +* **webapp**: The web application. Runs both the webapp control panel for users and admins and the REST API that the client uses. Needs to communicate heavily with `couchdb` nodes. You need at least one, and it is good to have two for redundancy. The webapp does not get a lot of traffic, so you will not need many. +* **couchdb**: The database for users and user data. You can get away with just one, but for proper redundancy you should have at least three. Communicates heavily with `webapp` and `mx` nodes. +* **soledad**: Handles the data syncing with clients. Typically combined with `couchdb` service, since it communicates heavily with couchdb. (not currently in stable release) +* **mx**: Incoming and outgoing MX servers. Communicates with the public internet, clients, and `couchdb` nodes. (not currently in stable release) +* **openvpn**: OpenVPN gateway for clients. You need at least one, but want as many as needed to support the bandwidth your users consume. The `openvpn` nodes are autonomous and don't need to communicate with any other nodes. Often combined with `tor` service. + +Not pictured: + +* **monitor**: Internal service to monitor all the other nodes. Currently, you can have zero or one `monitor` nodes. +* **tor**: Sets up a tor exit node, unconnected to any other service. +* **dns**: Not yet implemented. + +Locations +================================ + +All nodes should have a `location.name` specified, and optionally additional information about the location, like the time zone. This location information is used for two things: + +* Determine which nodes can, or must, communicate with one another via a local network.
The way some virtualization environments work, like OpenStack, requires that nodes communicate via the local network if they are on the same network. +* Allows the client to prefer connections to nodes that are closer in physical proximity to the user. This is particularly important for OpenVPN nodes. + +The location stanza in a node's config file looks like this: + + { + "location": { + "id": "ankara", + "name": "Ankara", + "country_code": "TR", + "timezone": "+2", + "hemisphere": "N" + } + } + +The fields: + +* `id`: An internal handle to use for this location. If two nodes have matching `location.id` values, then they are treated as being on a local network with one another. This value defaults to downcase and underscore of `location.name`. +* `name`: Can be anything, might be displayed to the user in the client if they choose to manually select a gateway. +* `country_code`: The [ISO 3166-1](https://en.wikipedia.org/wiki/ISO_3166-1) two letter country code. +* `timezone`: The timezone expressed as an offset from UTC (in standard time, not daylight savings). You can look up the timezone using this [handy map](http://www.timeanddate.com/time/map/). +* `hemisphere`: This should be "S" for all servers in South America, Africa, or Australia. Otherwise, this should be "N". + +These location options are very imprecise, but good enough for most usage. The client often does not know its own location precisely either. Instead, the client makes an educated guess at location based on the OS's timezone and locale. + +If you have multiple nodes in a single location, it is best to use a tag for the location.
For example: + +`tags/ankara.json`: + + { + "location": { + "name": "Ankara", + "country_code": "TR", + "timezone": "+2", + "hemisphere": "N" + } + } + +`nodes/vpngateway.json`: + + { + "services": "openvpn", + "tags": ["production", "ankara"], + "ip_address": "1.1.1.1", + "openvpn": { + "gateway_address": "1.1.1.2" + } + } + +Unless you are using OpenStack or AWS, setting `location` for nodes is not required. It is, however, highly recommended. + +Working with SSH +================================ + +Whenever the `leap` command needs to push changes to a node or gather information from a node, it tunnels this command over SSH. Another way to put this: the security of your servers rests entirely on SSH. Because of this, it is important that you understand how `leap` uses SSH. + +SSH related files +------------------------------- + +Assuming your provider directory is called 'provider': + +* `provider/nodes/crow/crow_ssh.pub` -- The public SSH host key for node 'crow'. +* `provider/users/alice/alice_ssh.pub` -- The public SSH user key for user 'alice'. Anyone with the private key that corresponds to this public key will have root access to all nodes. +* `provider/files/ssh/known_hosts` -- An autogenerated known_hosts, built from combining `provider/nodes/*/*_ssh.pub`. You must not edit this file directly. If you need to change it, remove or change one of the files that is used to generate `known_hosts` and then run `leap compile`. +* `provider/files/ssh/authorized_keys` -- An autogenerated list of all the user SSH keys with root access to the nodes. It is created from `provider/users/*/*_ssh.pub`. You must not edit this file directly. If you need to change it, remove or change one of the files that is used to generate `authorized_keys` and then run `leap compile`. + +All of these files should be committed to source control. + +If you rename, remove, or add a node with `leap node [mv|add|rm]` the SSH key files and the `known_hosts` file will get properly updated.
+ +SSH and local nodes +----------------------------- + +Local nodes are run as Vagrant virtual machines. The `leap` command handles SSH slightly differently for these nodes. + +Basically, all the SSH security is turned off for local nodes. Since local nodes only exist for a short time on your computer and can't be reached from the internet, this is not a problem. + +Specifically, for local nodes: + +1. `known_hosts` is never updated with local node keys, since the SSH public key of a local node is different for each user. +2. `leap` entirely skips the checking of host keys when connecting with a local node. +3. `leap` adds the public Vagrant SSH key to the list of SSH keys for a user. The public Vagrant SSH key is a shared and insecure key that has root access to most Vagrant virtual machines. + +When SSH host key changes +------------------------------- + +If the host key for a node has changed, you will get an error "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED". + +To fix this, you need to remove the file `files/nodes/stompy/stompy_ssh.pub` and run `leap node init stompy`, where the node's name is 'stompy'. **Only do this if you are ABSOLUTELY CERTAIN that the node's SSH host key has changed**. + +Changing the SSH port +-------------------------------- + +Suppose you have a node `blinky` that has SSH listening on port 22 and you want to make it port 2200. + +First, modify the configuration for `blinky` to specify the variable `ssh.port` as 2200. Usually, this is done in `common.json` or in a tag file. + +For example, you could put this in `tags/production.json`: + + { + "ssh": { + "port": 2200 + } + } + +Run `leap compile` and open `hiera/blinky.yaml` to confirm that `ssh.port` is set to 2200. The port number must be specified as a number, not a string (no quotes). + +Then, you need to deploy this change so that SSH will bind to 2200. 
You cannot simply run `leap deploy blinky` because this command will default to using the variable `ssh.port` which is now `2200` but SSH on the node is still bound to 22. + +So, you manually override the port in the deploy command, using the old port: + + leap deploy --port 22 blinky + +Afterwards, SSH on `blinky` should be listening on port 2200 and you can just run `leap deploy blinky` from then on. + +X.509 Certificates +================================ + +Configuration options +------------------------------------------- + +The `ca` option in provider.json provides settings used when generating CAs and certificates. The defaults are as follows: + + "ca": { + "name": "= global.provider.ca.organization + ' Root CA'", + "organization": "= global.provider.name", + "organizational_unit": "= 'https://' + global.provider.name", + "bit_size": 4096, + "digest": "SHA256", + "life_span": "10y", + "server_certificates": { + "bit_size": 2024, + "digest": "SHA256", + "life_span": "1y" + }, + "client_certificates": { + "bit_size": 2024, + "digest": "SHA256", + "life_span": "2m", + "limited_prefix": "LIMITED", + "unlimited_prefix": "UNLIMITED" + } + } + +To see what values are used for your provider, run `leap inspect provider.json`. You can modify the defaults as you wish by adding the values to provider.json. + +NOTE: A certificate `bit_size` greater than 2024 will probably not be recognized by most commercial CAs. + +Certificate Authorities +----------------------------------------- + +There are three x.509 certificate authorities (CA) associated with your provider: + +1. **Commercial CA:** It is strongly recommended that you purchase a commercial cert for your primary domain. The goal of platform is to not depend on the commercial CA system, but it does increase security and usability if you purchase a certificate. The cert for the commercial CA must live at `files/cert/commercial_ca.crt`. +2. 
**Server CA:** This is a self-signed CA responsible for signing all the **server** certificates. The private key lives at `files/ca/ca.key` and the public cert lives at `files/ca/ca.crt`. The key is very sensitive information and must be kept private. The public cert is distributed publicly. +3. **Client CA:** This is a self-signed CA responsible for signing all the **client** certificates. The private key lives at `files/ca/client_ca.key` and the public cert lives at `files/ca/client_ca.crt`. Neither file is distributed publicly. It is not a big deal if the private key for the client CA is compromised; you can just generate a new one and re-deploy. + +To generate both the Server CA and the Client CA, run the command: + + leap cert ca + +Server certificates +----------------------------------- + +Almost every server in your service provider will have an x.509 certificate, generated by the `leap` command using the Server CA. Whenever you modify any settings of a node that might affect its certificate (like changing the IP address, hostname, or settings in provider.json), you can magically regenerate all the certs that need to be regenerated with this command: + + leap cert update + +Run `leap help cert update` for notes on usage options. + +Because the server certificates are generated locally on your personal machine, the private key for the Server CA need never be put on any server. It is up to you to keep this file secure. + +Client certificates +-------------------------------- + +Every leap client gets its own time-limited client certificate. This cert is used to connect to the OpenVPN gateway (and probably other things in the future). It is generated on the fly by the webapp using the Client CA. + +To make this work, the private key of the Client CA is made available to the webapp. This might seem bad, but compromise of the Client CA simply allows the attacker to use the OpenVPN gateways without paying.
In the future, we plan to add a command to automatically regenerate the Client CA periodically. + +There are two types of client certificates: limited and unlimited. A client using a limited cert will have its bandwidth limited to the rate specified by `provider.service.bandwidth_limit` (in Bytes per second). An unlimited cert is given to the user if they authenticate and the user's service level matches one configured in `provider.service.levels` without bandwidth limits. Otherwise, the user is given a limited client cert. + +Commercial certificates +----------------------------------- + +We strongly recommend that you use a commercial signed server certificate for your primary domain (in other words, a certificate with a common name matching whatever you have configured for `provider.domain`). This provides several benefits: + +1. When users visit your website, they don't get a scary notice that something is wrong. +2. When a user runs the LEAP client, selecting your service provider will not cause a warning message. +3. When other providers first discover your provider, they are more likely to trust your provider key if it is fetched over a commercially verified link. + +The LEAP platform is designed so that it assumes you are using a commercial cert for the primary domain of your provider, but all other servers are assumed to use non-commercial certs signed by the Server CA you create. + +To generate a CSR, run: + + leap cert csr + +This command will generate the CSR and private key matching `provider.domain` (you can change the domain with `--domain=DOMAIN` switch). It also generates a server certificate signed with the Server CA. You should delete this certificate and replace it with a real one once it is created by your commercial CA. + +The related commercial cert files are: + + files/ + certs/ + domain.org.crt # Server certificate for domain.org, obtained by commercial CA. 
+ domain.org.csr # Certificate signing request + domain.org.key # Private key for your certificate + commercial_ca.crt # The CA cert obtained from the commercial CA. + +The private key file is extremely sensitive and care should be taken with its provenance. + +If your commercial CA has a chained CA cert, you should be OK if you just put the **last** cert in the chain into the `commercial_ca.crt` file. This only works if the other CAs in the chain have certs in the Debian package `ca-certificates`, which is the case for almost all CAs. + +Facts +============================== + +There are a few cases when we must gather internal data from a node before we can successfully deploy to other nodes. This is what `facts.json` is for. It stores a snapshot of certain facts about each node, as needed. Entries in `facts.json` are updated automatically when you initialize, rename, or remove a node. To manually force a full update of `facts.json`, run: + + leap facts update FILTER + +Run `leap help facts update` for more information. + +The file `facts.json` should be committed to source control. You might not have a `facts.json` if one is not required for your provider. diff --git a/doc/known-issues.md b/doc/known-issues.md new file mode 100644 index 00000000..abd28084 --- /dev/null +++ b/doc/known-issues.md @@ -0,0 +1,64 @@ +@title = 'Leap Platform Release Notes' +@nav_title = 'Known issues' +@summary = 'Known issues in the Leap Platform.' +@toc = true + +Here you can find documentation about known issues and potential work-arounds in the current Leap Platform release. + +0.2.2 +===== + +In this release the following issues are known; work-arounds are noted when available. + +General Issues +-------------- + +. This release does *not* anonymize your logs (see: https://leap.se/code/issues/1897) + +. This release does *not* set up email relaying, so admins will not receive important email notifications.
Email service will be part of the next release (see: https://leap.se/code/issues/1683 https://leap.se/code/issues/1905) + +. Your openvpn gateway address will be added on the /24 network, and is not configurable in this release (see: https://leap.se/code/issues/1863) + +. You must not add a node with an underscore in the name, you also cannot use a hyphen for a vagrant node (see: https://leap.se/code/issues/3087) + +. The nagios website check reports success when the webapp is not functioning but apache is up (see: https://leap.se/code/issues/1629) + +User setup and ssh +------------------ + +. if you aren't using a single ssh key, but have different ones, you will need to define the following at the top of your ~/.ssh/config: + HostName <ip address> + IdentityFile <path to identity file> + + (see: https://leap.se/code/issues/2946 and https://leap.se/code/issues/3002) + +. If the ssh host key changes, you need to run node init again (see: https://leap.se/en/docs/platform/guide#Working.with.SSH) + +. At the moment, only ECDSA ssh host keys are supported. If you get the following error: `= FAILED ssh-keyscan: no hostkey alg (must be missing an ecdsa public host key)` then you should confirm that you have the following line defined in your server's /etc/ssh/sshd_config: +HostKey /etc/ssh/ssh_host_ecdsa_key and that file exists. If you made a change to your sshd_config, then you need to run `/etc/init.d/ssh restart` (see: https://leap.se/code/issues/2373) + +. To remove an admin's access to your servers, please remove the directory for that user under the `users/` subdirectory in your provider directory and then remove that user's ssh keys from files/ssh/authorized_keys. When finished you *must* run a `leap deploy` to update that information on the servers (see: https://leap.se/code/issues/1863) + +. At the moment, it is only possible to add an admin who will have access to all LEAP servers (see: https://leap.se/code/issues/2280) + +. 
leap add-user --self allows only one key - if you run that command twice with different keys, you will just replace the key with the second key. To add a second key, add it manually to files/ssh/authorized_keys (see: https://leap.se/code/issues/866) + +Deploying +--------- + +. If you have any errors during a run, please try to deploy again as this often solves non-deterministic issues that were not uncovered in our testing. Please re-deploy with `leap -v2 deploy` to get more verbose logs and capture the complete output to provide to us for debugging. + +. If your Debian mirror fails during a deploy (because of a network anomaly, or because the mirror itself is out of date), then platform deployment will not succeed. Check that the mirror is up and try to deploy again when it is resolved (see: https://leap.se/code/issues/1091) + +. Deployment gives 'error: in `%`: too few arguments (ArgumentError)' - this is because you attempted to do a deploy before initializing a node. Please initialize the node first and then do a deploy afterwards (see: https://leap.se/code/issues/2550) + +. This release has no ability to custom configure apt sources or proxies (see: https://leap.se/code/issues/1971) + +. When running a deploy at a verbosity level of 2 and above, you will notice puppet deprecation warnings; these are known and we are working on fixing them + +Special Environments +-------------------- + +. When deploying to OpenStack release "nova" or newer, you will need to do an initial deploy, then when it has finished run `leap facts update` and then deploy again (see: https://leap.se/code/issues/3020) + +. 
It is not possible to actually use the EIP openvpn server on vagrant nodes (see: https://leap.se/code/issues/2401) diff --git a/doc/quick-start.md b/doc/quick-start.md new file mode 100644 index 00000000..5ba28f8d --- /dev/null +++ b/doc/quick-start.md @@ -0,0 +1,245 @@ +@title = 'LEAP Platform Quick Start' +@nav_title = 'Quick Start' + +This tutorial walks you through the initial process of creating and deploying a service provider running the [LEAP platform](platform). The first examples build a provider in a virtual environment; the final section covers running on real hardware. + +First, a few definitions: + +* **node:** A server that is part of the service provider's infrastructure. All nodes are running the Debian GNU/Linux operating system. +* **sysadmin:** This is you. +* **sysadmin machine:** Your desktop or laptop computer that you use to control the nodes. This machine can be running any variant of Unix, Linux, or Mac OS (however, only Debian derivatives are supported at the moment). + +All the commands in this tutorial are run on your sysadmin machine. In order to complete the tutorial, the sysadmin machine must: + +* Be a real machine with virtualization support in the CPU (VT-x or AMD-V). In other words, not a virtual machine. +* Have at least 4 GB of RAM. +* Have a fast internet connection (because you will be downloading a lot of big files, like virtual machine images). + +Install prerequisites +-------------------------------- + +*Debian & Ubuntu* + +Install core prerequisites: + + sudo apt-get install git ruby ruby-dev rsync openssh-client openssl rake make + +Install Vagrant in order to be able to test with local virtual machines (typically optional, but required for this tutorial): + + sudo apt-get install vagrant virtualbox + +<!-- +*Mac OS* + +1. Install rubygems from https://rubygems.org/pages/download (unless the `gem` command is already installed). +2. 
Install Vagrant.dmg from http://downloads.vagrantup.com/ +--> + +Install leap +--------------------- + +<!--Install the `leap` command as a gem: + + sudo gem install leap_cli + +Alternately, you can install `leap` from source: + + git clone git://leap.se/leap_cli.git + cd leap_cli + rake build +--> + +Install the `leap` command from source: + + git clone git://leap.se/leap_cli.git + cd leap_cli + rake build + +Then, install as root user (recommended): + + sudo rake install + +Or, install as unprivileged user: + + rake install + # watch out for the directory leap is installed to, then, e.g.: + sudo ln -s ~/.gem/ruby/1.9.1/bin/leap /usr/local/bin/leap + +With both methods, you can now use /usr/local/bin/leap, which in most cases will be in your $PATH. + + +Create a provider instance +--------------------------------------- + +A provider instance is a directory tree, usually stored in git, that contains everything you need to manage an infrastructure for a service provider. In this case, we create one for bitmask.net and call the instance directory 'bitmask'. + + mkdir -p ~/leap/bitmask + +Now, we will initialize this directory to make it a provider instance. Your provider instance will need to know where it can find a local copy of the git repository leap_platform, which holds the puppet recipes you will need to manage your servers. Typically, you will not need to modify leap_platform. + + cd ~/leap/bitmask + leap new . + +The `leap new` command will ask you for several required values: + +* domain: The primary domain name of your service provider. In this tutorial, we will be using "bitmask.net". +* name: The name of your service provider. +* contact emails: A comma separated list of email addresses that should be used for important service provider contacts (for things like postmaster aliases, Tor contact emails, etc). +* platform: The directory where you have a copy of the `leap_platform` git repository checked out. If it doesn't exist, it will be downloaded for you.
+ +You may want to poke around and see what is in the files we just created. For example: + + cat provider.json + +Optionally, commit your provider directory using the version control software you fancy. For example: + + git init + git add . + git commit -m "initial commit" + +Now add yourself as a privileged sysadmin who will have access to deploy to servers: + + leap add-user --self + +NOTE: in most cases, `leap` must be run from within a provider instance directory tree (e.g. ~/leap/bitmask). + +Now generate required X509 certificates and keys: + + leap cert ca + leap cert csr + +To see details about the keys and certs that the prior two commands created, you can use `leap inspect` like so: + + leap inspect files/ca/ca.crt + + +Edit provider.json configuration +-------------------------------------- + +There are a few required settings in provider.json. At a minimum, you must have: + + { + "domain": "bitmask.net", + "name": "Bitmask", + "contacts": { + "default": "email1@domain.org, email2@domain.org" + } + } + +For a full list of possible settings, you can use `leap inspect` to see how provider.json is evaluated after including the inherited defaults: + + leap inspect provider.json + +Create nodes +--------------------- + +A "node" is a server that is part of your infrastructure. Every node can have one or more services associated with it. Some nodes are "local" and used only for testing. These local nodes exist only as virtual machines on your computer and cannot be accessed from outside (see `leap help local` for more information). + +Create a local node, with the service "webapp": + + leap node add --local web1 services:webapp + +This created a node configuration file in `nodes/web1.json`, but it did not create the virtual machine. In order to test our node "web1", we need to first spin up a virtual machine. The next command will probably take a very long time, because it will need to download a VM image (about 700mb). 
+ + leap local start + +Now that the virtual machine for web1 is running, you need to initialize it and then deploy the recipes to it. You only need to initialize a node once, but there is no harm in doing it multiple times. These commands will take a while to run the first time, as it needs to update the package cache on the new virtual machine. + + leap node init web1 + leap deploy web1 + +That is it, you should now have your first running node. However, the LEAP web application requires a database to run, so let's add a "couchdb" node: + + leap node add --local db1 services:couchdb + leap local start + leap node init db1 + leap deploy db1 + +Access the web application +-------------------------------------------- + +You should now have two local virtual machines running, one for the web application and one for the database. In order to connect to the web application in your browser, you need to point your domain at the IP address of the web application node (named web1 in this example). + +There are a lot of different ways to do this, but one easy way is to modify your `/etc/hosts` file. First, find the IP address of the webapp node: + + leap list webapp --print ip_address + +Then modify `/etc/hosts` like so: + + 10.5.5.47 DOMAIN + +Replacing 'DOMAIN' with whatever you specified as the `domain` in the `leap new` command. + +Next, you can connect to the web application either using a web browser or via the API using the LEAP client. To use a browser, connect to https://DOMAIN. Your browser will complain about an untrusted cert, but for now just bypass this. From there, you should be able to register a new user and login. + +What is going on here? +-------------------------------------------- + +First, some background terminology: + +* **puppet**: Puppet is a system for automating deployment and management of servers (called nodes). +* **hiera files**: In puppet, you can use something called a 'hiera file' to seed a node with a few configuration values. 
In LEAP, we go all out and put *every* configuration value needed for a node in the hiera file, and automatically compile a custom hiera file for each node. + +When you run `leap deploy`, a bunch of things happen, in this order: + +1. **Compile hiera files**: The hiera configuration file for each node is compiled in YAML format and saved in the directory `hiera`. The source material for this hiera file consists of all the JSON configuration files imported or inherited by the node's JSON config file. +* **Copy required files to node**: All the files needed for puppet to run are rsync'ed to each node. This includes the entire leap_platform directory, as well as the node's hiera file and other files needed by puppet to set up the node (keys, binary files, etc). +* **Puppet is run**: Once the node is ready, leap connects to the node via ssh and runs `puppet apply`. Puppet is applied locally on the node, without a daemon or puppetmaster. + +You can run `leap -v2 deploy` to see exactly what commands are being executed. + +<!-- See [under the hood](under-the-hood) for more details. --> + +Additional commands +------------------------------------------- + +Here are a few useful commands you can run on your new local nodes: + +* `leap ssh web1` -- SSH into node web1 (requires `leap node init web1` first). +* `leap list` -- list all nodes. +* `leap list --print ip_address` -- list a particular attribute of all nodes. +* `leap local reset web1` -- return web1 to a pristine state. +* `leap local stop` -- stop all local virtual machines. +* `leap local status` -- get the running state of all the local virtual machines. +* `leap cert update` -- generate new certificates if needed. + +See the full command reference for more information. + +Node filters +------------------------------------------- + +Many of the `leap` commands take a "node filter". You can use a node filter to target a command at one or more nodes. 
+ +A node filter consists of one or more keywords, with an optional "+" before each keyword. + +* keywords can be a node name, a service type, or a tag. +* the "+" before the keyword constructs an AND condition +* otherwise, multiple keywords together construct an OR condition + +Examples: + +* `leap list openvpn` -- list all nodes with service openvpn. +* `leap list openvpn +production` -- only nodes of service type openvpn AND tag production. +* `leap deploy webapp openvpn` -- deploy to all webapp OR openvpn nodes. +* `leap node init vpn1` -- just init the node named vpn1. + +Running on real hardware +----------------------------------- + +The steps required to initialize and deploy to nodes on the public internet are basically the same as we have seen so far for local testing nodes. There are a few key differences: + +* Obviously, you will need to acquire a real or virtual machine that you can SSH into remotely. +* When creating the node configuration, you should give it the tag "production" if the node is to be used in your production infrastructure. +* When creating the node configuration, you need to specify the IP address of the node. + +For example: + + leap node add db1 tags:production services:couchdb ip_address:4.4.4.4 + +Also, running `leap node init NODE_NAME` on a real server will prompt you to verify the fingerprint of the SSH host key and to provide the root password of the server NODE_NAME. You should only need to do this once. + +What's next +----------------------------------- + +Read the [LEAP platform guide](guide) to learn about planning and securing your infrastructure. 
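The node filter rules from the "Node filters" section can be sketched in a few lines of Python. This is only an illustration of the AND/OR semantics described there, not the `leap` implementation: the function name `filter_nodes`, the in-memory node layout (services and tags as lists), and the choice to return every node for an empty filter are all assumptions made for the example.

```python
def filter_nodes(nodes, keywords):
    """Return the sorted names of nodes selected by a node filter.

    nodes:    dict mapping node name -> {"services": [...], "tags": [...]}
              (hypothetical layout, chosen for this sketch)
    keywords: list of filter keywords; a leading "+" marks an AND condition.
    """
    def matches(name, node, keyword):
        # A keyword may be a node name, a service type, or a tag.
        return (keyword == name
                or keyword in node["services"]
                or keyword in node["tags"])

    or_keys = [k for k in keywords if not k.startswith("+")]
    and_keys = [k[1:] for k in keywords if k.startswith("+")]

    selected = []
    for name, node in nodes.items():
        # Plain keywords combine with OR; no plain keywords means "any node".
        in_or = any(matches(name, node, k) for k in or_keys) if or_keys else True
        # Every "+keyword" must also match (AND).
        in_and = all(matches(name, node, k) for k in and_keys)
        if in_or and in_and:
            selected.append(name)
    return sorted(selected)


example_nodes = {
    "vpn1": {"services": ["openvpn"], "tags": ["production"]},
    "vpn2": {"services": ["openvpn"], "tags": []},
    "web1": {"services": ["webapp"], "tags": ["production"]},
}
print(filter_nodes(example_nodes, ["openvpn", "+production"]))  # ['vpn1']
print(filter_nodes(example_nodes, ["webapp", "openvpn"]))       # ['vpn1', 'vpn2', 'web1']
```

The two calls mirror `leap list openvpn +production` and `leap deploy webapp openvpn` from the examples in that section.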
+ diff --git a/doc/service-diagram.odg b/doc/service-diagram.odg Binary files differ new file mode 100644 index 00000000..09265c2d --- /dev/null +++ b/doc/service-diagram.odg diff --git a/doc/service-diagram.png b/doc/service-diagram.png Binary files differ new file mode 100644 index 00000000..85e62436 --- /dev/null +++ b/doc/service-diagram.png diff --git a/doc/under-the-hood.md b/doc/under-the-hood.md new file mode 100644 index 00000000..080a153e --- /dev/null +++ b/doc/under-the-hood.md @@ -0,0 +1,37 @@ +@title = "Under the hood" + +This page contains various details on how the platform is implemented. You can safely ignore this page, although it may be useful if you plan to make modifications to the platform. + +Puppet Details +====================================== + +Run stages +---------- + +We use two run stages for resource ordering: + +* initial: configure hostname, apt-get update + apt-get dist-upgrade +* main: everything else + +Stage initial is run before stage main. + +See http://docs.puppetlabs.com/puppet/2.7/reference/lang_run_stages.html for run stage documentation. + +Tags +---- + +Tags are being used to deploy different classes. + +* leap_base: site_config::default (configure hostname + resolver, sshd) +* leap_slow: site_config::slow (slow: apt-get update, apt-get dist-upgrade) +* leap_service: configure platform service (openvpn, couchdb, etc.) + +You can pass any combination of tags, e.g. use + +* "--tags leap_base,leap_slow,leap_service" (DEFAULT): Deploy all +* "--tags leap_service": Only deploy service(s) (useful for debugging/development) +* "--tags leap_base": Only deploy basic configuration (again, useful for debugging/development) + +See http://docs.puppetlabs.com/puppet/2.7/reference/lang_tags.html for puppet tag usage. + +