docs/platform/guide.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354

@title = "LEAP Platform Guide"
@nav_title = "Guide"

Node types
================================

Every node has one or more services that determines the node's function within your provider's infrastructure.

When adding a new node to your provider, you should ask yourself four questions:

* **many or few?** Some services benefit from having many nodes, while some services are best run on only one or two nodes.
* **required or optional?** Some services are required, while others can be left out.
* **who does the node communicate with?** Some services communicate very heavily with other particular services. Nodes running these services should be close together.
* **public or private?** Some services communicate with the public internet, while others only need to communicate with other nodes in the infrastructure.

Brief overview of the services:

* **webapp**: The web application. Runs both webapp control panel for users and admins as well as the REST API that the client uses. Needs to communicate heavily with `couchdb` nodes. You need at least one, good to have two for redundancy. The webapp does not get a lot of traffic, so you will not need many.
* **couchdb**: The database for users and user data. You can get away with just one, but for proper redundancy you should have at least three. Communicates heavily with `webapp`, `mx`, and `soledad` nodes.
* **soledad**: Handles the data syncing with clients. Typically combined with `couchdb` service, since it communicates heavily with couchdb.
* **mx**: Incoming and outgoing MX servers. Communicates with the public internet, clients, and `couchdb` nodes.
* **openvpn**: OpenVPN gateway for clients. You need at least one, but want as many as needed to support the bandwidth your users are doing. The `openvpn` nodes are autonomous and don't need to communicate with any other nodes. Often combined with `tor` service.
* **monitor**: Internal service to monitor all the other nodes. Currently, you can have zero or one `monitor` nodes.
* **tor**: Sets up a tor exit node, unconnected to any other service.
* **dns**: Not yet implemented.

Webapp
-----------------------------------

The webapp node is responsible for both the user face web application and the API that the client interacts with.

Some users can be "admins" with special powers to answer tickets and close accounts. To make an account into an administrator, you need to configure the `webapp.admins` property with an array of user names.

For example, to make users `alice` and `bob` into admins, create a file `services/webapp.json` with the following content:

    {
      "webapp": {
        "admins": ["bob", "alice"]
      }
    }

And then redeploy to all webapp nodes:

    leap deploy webapp

By putting this in `services/webapp.json`, you will ensure that all webapp nodes inherit the value for `webapp.admins`.

Services
================================

What nodes do you need for a provider that offers particular services?

<table class="table table-striped">
<tr>
<th>Node Type</th>
<th>VPN Service</th>
<th>Email Service</th>
</tr>
<tr>
<td>webapp</td>
<td>required</td>
<td>required</td>
</tr>
<tr>
<td>couchdb</td>
<td>required</td>
<td>required</td>
</tr>
<tr>
<td>soledad</td>
<td>not used</td>
<td>required</td>
</tr>
<tr>
<td>mx</td>
<td>not used</td>
<td>required</td>
</tr>
<tr>
<td>openvpn</td>
<td>required</td>
<td>not used</td>
</tr>
<tr>
<td>monitor</td>
<td>optional</td>
<td>optional</td>
</tr>
<tr>
<td>tor</td>
<td>optional</td>
<td>optional</td>
</tr>
<table>

Locations
================================

All nodes should have a `location.name` specified, and optionally additional information about the location, like the time zone. This location information is used for two things:

* Determine which nodes can, or must, communicate with one another via a local network. The way some virtualization environments work, like OpenStack, requires that nodes communicate via the local network if they are on the same network.
* Allows the client to prefer connections to nodes that are closer in physical proximity to the user. This is particularly important for OpenVPN nodes.

The location stanza in a node's config file looks like this:

    {
      "location": {
        "id": "ankara",
        "name": "Ankara",
        "country_code": "TR",
        "timezone": "+2",
        "hemisphere": "N"
      }
    }

The fields:

* `id`: An internal handle to use for this location. If two nodes have match `location.id`, then they are treated as being on a local network with one another. This value defaults to downcase and underscore of `location.name`.
* `name`: Can be anything, might be displayed to the user in the client if they choose to manually select a gateway.
* `country_code`: The [ISO 3166-1](https://en.wikipedia.org/wiki/ISO_3166-1) two letter country code.
* `timezone`: The timezone expressed as an offset from UTC (in standard time, not daylight savings). You can look up the timezone using this [handy map](http://www.timeanddate.com/time/map/).
* `hemisphere`: This should be "S" for all servers in South America, Africa, or Australia. Otherwise, this should be "N".

These location options are very imprecise, but good enough for most usage. The client often does not know its own location precisely either. Instead, the client makes an educated guess at location based on the OS's timezone and locale.

If you have multiple nodes in a single location, it is best to use a tag for the location. For example:

`tags/ankara.json`:

    {
      "location": {
        "name": "Ankara",
        "country_code": "TR",
        "timezone": "+2",
        "hemisphere": "N"
      }
    }

`nodes/vpngateway.json`:

    {
      "services": "openvpn",
      "tags": ["production", "ankara"],
      "ip_address": "1.1.1.1",
      "openvpn": {
        "gateway_address": "1.1.1.2"
      }
    }

Unless you are using OpenStack or AWS, setting `location` for nodes is not required. It is, however, highly recommended.

Working with SSH
================================

Whenever the `leap` command nees to push changes to a node or gather information from a node, it tunnels this command over SSH. Another way to put this: the security of your servers rests entirely on SSH. Because of this, it is important that you understand how `leap` uses SSH.

SSH related files
-------------------------------

Assuming your provider directory is called 'provider':

* `provider/nodes/crow/crow_ssh.pub` -- The public SSH host key for node 'crow'.
* `provider/users/alice/alice_ssh.pub` -- The public SSH user key for user 'alice'. Anyone with the private key that corresponds to this public key will have root access to all nodes.
* `provider/files/ssh/known_hosts` -- An autogenerated known_hosts, built from combining `provider/nodes/*/*_ssh.pub`. You must not edit this file directly. If you need to change it, remove or change one of the files that is used to generate `known_hosts` and then run `leap compile`.
* `provider/files/ssh/authorized_keys` -- An autogenerated list of all the user SSH keys with root access to the notes. It is created from `provider/users/*/*_ssh.pub`. You must not edit this file directly. If you need to change it, remove or change one of the files that is used to generate `authorized_keys` and then run `leap compile`.

All of these files should be committed to source control.

If you rename, remove, or add a node with `leap node [mv|add|rm]` the SSH key files and the `known_hosts` file will get properly updated.

SSH and local nodes
-----------------------------

Local nodes are run as Vagrant virtual machines. The `leap` command handles SSH slightly differently for these nodes.

Basically, all the SSH security is turned off for local nodes. Since local nodes only exist for a short time on your computer and can't be reached from the internet, this is not a problem.

Specifically, for local nodes:

1. `known_hosts` is never updated with local node keys, since the SSH public key of a local node is different for each user.
2. `leap` entirely skips the checking of host keys when connecting with a local node.
3. `leap` adds the public Vagrant SSH key to the list of SSH keys for a user. The public Vagrant SSH key is a shared and insecure key that has root access to most Vagrant virtual machines.

When SSH host key changes
-------------------------------

If the host key for a node has changed, you will get an error "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED".

To fix this, you need to remove the file `files/nodes/stompy/stompy_ssh.pub` and run `leap node init stompy`, where the node's name is 'stompy'. **Only do this if you are ABSOLUTELY CERTAIN that the node's SSH host key has changed**.

Changing the SSH port
--------------------------------

Suppose you have a node `blinky` that has SSH listening on port 22 and you want to make it port 2200.

First, modify the configuration for `blinky` to specify the variable `ssh.port` as 2200. Usually, this is done in `common.json` or in a tag file.

For example, you could put this in `tags/production.json`:

    {
      "ssh": {
        "port": 2200
      }
    }

Run `leap compile` and open `hiera/blinky.yaml` to confirm that `ssh.port` is set to 2200. The port number must be specified as a number, not a string (no quotes).

Then, you need to deploy this change so that SSH will bind to 2200. You cannot simply run `leap deploy blinky` because this command will default to using the variable `ssh.port` which is now `2200` but SSH on the node is still bound to 22.

So, you manually override the port in the deploy command, using the old port:

    leap deploy --port 22 blinky

Afterwards, SSH on `blinky` should be listening on port 2200 and you can just run `leap deploy blinky` from then on.

X.509 Certificates
================================

Configuration options
-------------------------------------------

The `ca` option in provider.json provides settings used when generating CAs and certificates. The defaults are as follows:

    {
      "ca": {
        "name": "= global.provider.ca.organization + ' Root CA'",
        "organization": "= global.provider.name[global.provider.default_language]",
        "organizational_unit": "= 'https://' + global.provider.domain",
        "bit_size": 4096,
        "digest": "SHA256",
        "life_span": "10y",
        "server_certificates": {
          "bit_size": 2048,
          "digest": "SHA256",
          "life_span": "1y"
        },
        "client_certificates": {
          "bit_size": 2048,
          "digest": "SHA256",
          "life_span": "2m",
          "limited_prefix": "LIMITED",
          "unlimited_prefix": "UNLIMITED"
        }
      }
    }

You should not need to override these defaults in your own provider.json, but you can if you want to. To see what values are used for your provider, run `leap inspect provider.json`.

NOTE: A certificate `bit_size` greater than 2048 will probably not be recognized by most commercial CAs.

Certificate Authorities
-----------------------------------------

There are three x.509 certificate authorities (CA) associated with your provider:

1. **Commercial CA:** It is strongly recommended that you purchase a commercial cert for your primary domain. The goal of platform is to not depend on the commercial CA system, but it does increase security and usability if you purchase a certificate. The cert for the commercial CA must live at `files/cert/commercial_ca.crt`.
2. **Server CA:** This is a self-signed CA responsible for signing all the **server** certificates. The private key lives at `files/ca/ca.key` and the public cert lives at `files/ca/ca.crt`. The key is very sensitive information and must be kept private. The public cert is distributed publicly.
3. **Client CA:** This is a self-signed CA responsible for signing all the **client** certificates. The private key lives at `files/ca/client_ca.key` and the public cert lives at `files/ca/client_ca.crt`. Neither file is distribute publicly. It is not a big deal if the private key for the client CA is compromised, you can just generate a new one and re-deploy.

To generate both the Server CA and the Client CA, run the command:

    leap cert ca

Server certificates
-----------------------------------

Most every server in your service provider will have a x.509 certificate, generated by the `leap` command using the Server CA. Whenever you modify any settings of a node that might affect it's certificate (like changing the IP address, hostname, or settings in provider.json), you can magically regenerate all the certs that need to be regenerated with this command:

    leap cert update

Run `leap help cert update` for notes on usage options.

Because the server certificates are generated locally on your personal machine, the private key for the Server CA need never be put on any server. It is up to you to keep this file secure.

Client certificates
--------------------------------

Every leap client gets its own time-limited client certificate. This cert is use to connect to the OpenVPN gateway (and probably other things in the future). It is generated on the fly by the webapp using the Client CA.

To make this work, the private key of the Client CA is made available to the webapp. This might seem bad, but compromise of the Client CA simply allows the attacker to use the OpenVPN gateways without paying. In the future, we plan to add a command to automatically regenerate the Client CA periodically.

There are two types of client certificates: limited and unlimited. A client using a limited cert will have its bandwidth limited to the rate specified by `provider.service.bandwidth_limit` (in Bytes per second). An unlimited cert is given to the user if they authenticate and the user's service level matches one configured in `provider.service.levels` without bandwidth limits. Otherwise, the user is given a limited client cert.

Commercial certificates
-----------------------------------

We strongly recommend that you use a commercial signed server certificate for your primary domain (in other words, a certificate with a common name matching whatever you have configured for `provider.domain`). This provides several benefits:

1. When users visit your website, they don't get a scary notice that something is wrong.
2. When a user runs the LEAP client, selecting your service provider will not cause a warning message.
3. When other providers first discover your provider, they are more likely to trust your provider key if it is fetched over a commercially verified link.

The LEAP platform is designed so that it assumes you are using a commercial cert for the primary domain of your provider, but all other servers are assumed to use non-commercial certs signed by the Server CA you create.

To generate a CSR, run:

    leap cert csr

This command will generate the CSR and private key matching `provider.domain` (you can change the domain with `--domain=DOMAIN` switch). It also generates a server certificate signed with the Server CA. You should delete this certificate and replace it with a real one once it is created by your commercial CA.

The related commercial cert files are:

    files/
      certs/
        domain.org.crt    # Server certificate for domain.org, obtained by commercial CA.
        domain.org.csr    # Certificate signing request
        domain.org.key    # Private key for you certificate
        commercial_ca.crt # The CA cert obtained from the commercial CA.

The private key file is extremely sensitive and care should be taken with its provenance.

If your commercial CA has a chained CA cert, you should be OK if you just put the **last** cert in the chain into the `commercial_ca.crt` file. This only works if the other CAs in the chain have certs in the debian package `ca-certificates`, which is the case for almost all CAs.

If you want to add additional fields to the CSR, like country, city, or locality, you can configure these values in provider.json like so:

      "ca": {
        "server_certificates": {
          "country": "US",
          "state": "Washington",
          "locality": "Seattle"
        }
      }

If they are not present, the CSR will be created without them.

Facts
==============================

There are a few cases when we must gather internal data from a node before we can successfully deploy to other nodes. This is what `facts.json` is for. It stores a snapshot of certain facts about each node, as needed. Entries in `facts.json` are updated automatically when you initialize, rename, or remove a node. To manually force a full update of `facts.json`, run:

    leap facts update FILTER

Run `leap help facts update` for more information.

The file `facts.json` should be committed to source control. You might not have a `facts.json` if one is not required for your provider.

Disabling Nodes
=====================================

There are two ways to temporarily disable a node:

**Option 1: enabled == false**

If a node has a property `enabled` set to false, then the `leap` command will skip over the node and pretend that it does not exist. For example:

    {
      "ip_address": "1.1.1.1",
      "services": ["openvpn"],
      "enabled": false
    }

**Options 2: no-deploy**

If the file `/etc/leap/no-deploy` exists on a node, then when you run the commmand `leap deploy` it will halt and prevent a deploy from going through (if the node was going to be included in the deploy).