From 49513b828f019a0eb7c6f5082f6e9d817136904a Mon Sep 17 00:00:00 2001
From: Micah Anderson
Date: Thu, 11 Jun 2015 10:36:16 -0400
Subject: update /doc dir with latest from leap docs/platform

Change-Id: If4bcf7e2139b672c3e38f55e54d1f121a5601860
---
 doc/troubleshooting/en.haml         |  2 -
 doc/troubleshooting/known-issues.md | 88 ++++++++++++++++++++++++++-----------
 doc/troubleshooting/tests.md        | 39 +++++++++++++++-
 doc/troubleshooting/vagrant.md      | 45 +++++++++++++++++++
 4 files changed, 146 insertions(+), 28 deletions(-)
 create mode 100644 doc/troubleshooting/vagrant.md

(limited to 'doc/troubleshooting')

diff --git a/doc/troubleshooting/en.haml b/doc/troubleshooting/en.haml
index a4b44939..f0f1359c 100644
--- a/doc/troubleshooting/en.haml
+++ b/doc/troubleshooting/en.haml
@@ -1,5 +1,3 @@
 - @title = "Troubleshooting"
 
-%h1.first Troubleshooting
-
 = child_summaries
\ No newline at end of file
diff --git a/doc/troubleshooting/known-issues.md b/doc/troubleshooting/known-issues.md
index b924fa4b..4defc886 100644
--- a/doc/troubleshooting/known-issues.md
+++ b/doc/troubleshooting/known-issues.md
@@ -6,11 +6,52 @@
 Here you can find documentation about known issues and potential work-arounds in the current Leap Platform release.
 
 0.6.0
-=====
+==============
 
-openvpn
--------
-. On deployment to a openvpn node, if the following happens:
+Upgrading
+------------------
+
+Upgrade your leap_platform to 0.6 and make sure you have the latest leap_cli.
+
+**Update leap_platform:**
+
+    cd leap_platform
+    git pull
+    git checkout -b 0.6.0 0.6.0
+
+**Update leap_cli:**
+
+If it is installed as a gem from rubygems:
+
+    sudo gem update leap_cli
+
+If it is installed as a gem from source:
+
+    cd leap_cli
+    git pull
+    git checkout master
+    rake build
+    sudo rake install
+
+If it is run directly from source:
+
+    cd leap_cli
+    git pull
+    git checkout master
+
+To upgrade:
+
+    leap --version # must be at least 1.6.2
+    leap cert update
+    leap deploy
+    leap test
+
+If the tests fail, try deploying again. If a test fails because there are two tapicero daemons running, you need to ssh into the server, kill all the tapicero daemons manually, and then try deploying again (sometimes the daemon from platform 0.5 would put its PID file in an odd place).
+
+OpenVPN
+------------------
+
+On deployment to a openvpn node, if the following happens:
 
     - err: /Stage[main]/Site_openvpn/Service[openvpn]/ensure: change from stopped to running failed: Could not start Service[openvpn]: Execution of '/etc/init.d/openvpn start' returned 1: at /srv/leap/puppet/modules/site_openvpn/manifests/init.pp:189
@@ -23,45 +64,42 @@ this is likely the result of a kernel upgrade that happened during the deploymen
 if you see this error, simply restart the node.
 
 CouchDB
-------
-. You can't deploy new couchdb nodes after one or more have been deployed. Make *sure* that you configure and deploy all your couchdb nodes when starting the provider. The problem is that we dont not have a clean way of adding couch nodes after initial creation of the databases, so any nodes added after result in improperly synchronized data. See Bug [#5601](https://leap.se/code/issues/5601) for more information.
+---------------------
 
-. In some scenarios, such as when certain components are unavailable, the couchdb syncing will be broken.
When things are brought back to normal, shortly after restart, the nodes will attempt to resync all their data, and can fail to complete this process because they run out of file descriptors. A symptom of this is the webapp wont allow you to register or login, the /opt/bigcouch/var/log/bigcouch.log is huge with a lot of errors that include (over multiple lines): {error, emfile}}. We have raised the limits for available file descriptors to bigcouch to try and accommodate for this situation, but if you still experience it, you may need to increase your /etc/sv/bigcouch/run ulimit values and restart bigcouch while monitoring the open file descriptors. We hope that in the next platform release, a newer couchdb will be better at handling these resources.
+At the moment, we strongly advise only have one bigcouch server for stability purposes.
+
+With multiple couch nodes (not recommended at this time), in some scenarios, such as when certain components are unavailable, the couchdb syncing will be broken. When things are brought back to normal, shortly after restart, the nodes will attempt to resync all their data, and can fail to complete this process because they run out of file descriptors. A symptom of this is the webapp wont allow you to register or login, the /opt/bigcouch/var/log/bigcouch.log is huge with a lot of errors that include (over multiple lines): {error, emfile}}. We have raised the limits for available file descriptors to bigcouch to try and accommodate for this situation, but if you still experience it, you may need to increase your /etc/sv/bigcouch/run ulimit values and restart bigcouch while monitoring the open file descriptors. We hope that in the next platform release, a newer couchdb will be better at handling these resources.
 
 You can also see the number of file descriptors in use by doing:
 
     # watch -n1 -d lsof -p `pidof beam`|wc -l
 
+The command `leap db destroy` will not automatically recreate new databases.
You must run `leap deploy` afterwards for this.
+
 User setup and ssh
 ------------------
-. if you aren't using a single ssh key, but have different ones, you will need to define the following at the top of your ~/.ssh/config:
- HostName
- IdentityFile
-
- (see: https://leap.se/code/issues/2946 and https://leap.se/code/issues/3002)
-
-. If the ssh host key changes, you need to run node init again (see: https://leap.se/en/docs/platform/guide#Working.with.SSH)
-
-. To remove an admin's access to your servers, please remove the directory for that user under the `users/` subdirectory in your provider directory and then remove that user's ssh keys from files/ssh/authorized_keys. When finished you *must* run a `leap deploy` to update that information on the servers.
+At the moment, it is only possible to add an admin who will have access to all LEAP servers (see: https://leap.se/code/issues/2280)
 
-. At the moment, it is only possible to add an admin who will have access to all LEAP servers (see: https://leap.se/code/issues/2280)
+The command `leap add-user --self` allows only one SSH key. If you want to specify more than one key for a user, you can do it manually:
 
-. leap add-user --self allows only one key - if you run that command twice with different keys, you will just replace the key with the second key. To add a second key, add it manually to files/ssh/authorized_keys (see: https://leap.se/code/issues/866)
+    users/userx/userx_ssh.pub
+    users/userx/otherkey_ssh.pub
 
+All keys matching 'userx/*_ssh.pub' will be used for that user.
 
 Deploying
 ---------
-. If you have any errors during a run, please try to deploy again as this often solves non-deterministic issues that were not uncovered in our testing. Please re-deploy with `leap -v2 deploy` to get more verbose logs and capture the complete output to provide to us for debugging.
+If you have any errors during a run, please try to deploy again as this often solves non-deterministic issues that were not uncovered in our testing. Please re-deploy with `leap -v2 deploy` to get more verbose logs and capture the complete output to provide to us for debugging.
 
-. If when deploying your debian mirror fails for some reason, network anomoly or the mirror itself is out of date, then platform deployment will not succeed properly. Check the mirror is up and try to deploy again when it is resolved (see: https://leap.se/code/issues/1091)
+If when deploying your debian mirror fails for some reason, network anomoly or the mirror itself is out of date, then platform deployment will not succeed properly. Check the mirror is up and try to deploy again when it is resolved (see: https://leap.se/code/issues/1091)
 
-. Deployment gives 'error: in `%`: too few arguments (ArgumentError)' - this is because you attempted to do a deploy before initializing a node, please initialize the node first and then do a deploy afterwards (see: https://leap.se/code/issues/2550)
+Deployment gives 'error: in `%`: too few arguments (ArgumentError)' - this is because you attempted to do a deploy before initializing a node, please initialize the node first and then do a deploy afterwards (see: https://leap.se/code/issues/2550)
 
-. This release has no ability to custom configure apt sources or proxies (see: https://leap.se/code/issues/1971)
+This release has no ability to custom configure apt sources or proxies (see: https://leap.se/code/issues/1971)
 
-. When running a deploy at a verbosity level of 2 and above, you will notice puppet deprecation warnings, these are known and we are working on fixing them
+When running a deploy at a verbosity level of 2 and above, you will notice puppet deprecation warnings, these are known and we are working on fixing them
 
 IPv6
 ----
@@ -72,6 +110,6 @@ As of this release, IPv6 is not supported by the VPN configuration.
If IPv6 is d
 
 Special Environments
 --------------------
-. When deploying to OpenStack release "nova" or newer, you will need to do an initial deploy, then when it has finished run `leap facts update` and then deploy again (see: https://leap.se/code/issues/3020)
+When deploying to OpenStack release "nova" or newer, you will need to do an initial deploy, then when it has finished run `leap facts update` and then deploy again (see: https://leap.se/code/issues/3020)
 
-. It is not possible to actually use the EIP openvpn server on vagrant nodes (see: https://leap.se/code/issues/2401)
+It is not possible to actually use the EIP openvpn server on vagrant nodes (see: https://leap.se/code/issues/2401)
diff --git a/doc/troubleshooting/tests.md b/doc/troubleshooting/tests.md
index 84064043..b85c19d2 100644
--- a/doc/troubleshooting/tests.md
+++ b/doc/troubleshooting/tests.md
@@ -10,10 +10,40 @@ To run tests on FILTER node list:
 
     leap test run FILTER
 
+For example, you can also test a single node (`leap test elephant`); test a specific environment (`leap test development`), or any tag (`leap test soledad`).
+
 Alternately, you can run test on all nodes (probably only useful if you have pinned the environment):
 
     leap test
 
+The tests that are performed are located in the platform under the tests directory.
+
+## Testing with the bitmask client
+
+Download the provider ca:
+
+    wget --no-check-certificate https://example.org/ca.crt -O /tmp/ca.crt
+
+Start bitmask:
+
+    bitmask --ca-cert-file /tmp/ca.crt
+
+## Testing Recieving Mail
+
+Use i.e. swaks to send a testmail
+
+    swaks -f noone@example.org -t testuser@example.org -s example.org
+
+and use your favorite mail client to examine your inbox.
+
+You can also use [offlineimap](http://offlineimap.org/) to fetch mails:
+
+    offlineimap -c vagrant/.offlineimaprc.example.org
+
+WARNING: Use offlineimap *only* for testing/debugging,
+because it will save the mails *decrypted* locally to
+your disk !
+
 ## Monitoring
 
 In order to set up a monitoring node, you simply add a `monitor` service tag to the node configuration file. It could be combined with any other service, but we propose that you add it to the webapp node, as this already is public accessible via HTTPS.
@@ -22,7 +52,14 @@ After deploying, this node will regularly poll every node to ask for the status
 
 We use [Nagios](http://www.nagios.org/) together with [Check MK agent](https://en.wikipedia.org/wiki/Check_MK) for running checks on remote hosts.
 
-You can log into the monitoring web interface via [https://MONITORNODE/nagios3/](https://MONITORNODE/nagios3/). The username is `nagiosadmin` and the password is found in the secrets.json file in your provider directory.
+One nagios installation will monitor all nodes in all your environments. You can log into the monitoring web interface via [https://DOMAIN/nagios3/](https://DOMAIN/nagios3/). The username is `nagiosadmin` and the password is found in the secrets.json file in your provider directory.
+Nagios will send out mails to the `contacts` address provided in `provider.json`.
+
+
+## Nagios Frontents
+
+There are other ways to check and get notified by Nagios besides regularly checking the Nagios webinterface or reading email notifications. Check out the [Frontends (GUIs and CLIs)](http://exchange.nagios.org/directory/Addons/Frontends-%28GUIs-and-CLIs%29) on the Nagios project website.
+A recommended status tray application is [Nagstamon](https://nagstamon.ifw-dresden.de/), which is available for Linux, MacOS X and Windows. It can not only notify you of hosts/services failures, you can also acknoledge or recheck these with it.
 
 ### Log Monitoring
 
diff --git a/doc/troubleshooting/vagrant.md b/doc/troubleshooting/vagrant.md
new file mode 100644
index 00000000..ad284161
--- /dev/null
+++ b/doc/troubleshooting/vagrant.md
@@ -0,0 +1,45 @@
+@title = 'LEAP Platform Vagrant testing'
+@nav_title = 'Vagrant Integration'
+@summary = 'Testing your provider with Vagrant'
+
+Setting up Vagrant for a testing the platform
+=============================================
+
+There are two ways you can setup leap platform using vagrant.
+
+Using the Vagrantfile provided by Leap Platform
+-----------------------------------------------
+
+This is by far the easiest way. It will install a single node mail server in the default
+configuration with one single command.
+
+Clone the platform with
+
+    git clone https://github.com/leapcode/leap_platform.git
+
+Start the vagrant box with
+
+    cd leap_platform
+    vagrant up
+
+Follow the instructions how to configure your `/etc/hosts`
+in order to use the provider!
+
+You can login via ssh with the systemuser `vagrant` and the same password.
+
+There are 2 users preconfigured:
+
+. `testuser` with pw `hallo123`
+. `testadmin` with pw `hallo123`
+
+
+Use the leap_cli vagrant integration
+------------------------------------
+
+Install leap_cli and leap_platform on your host, configure a provider from scratch and use the `leap local` commands to manage your vagrant node(s).
+
+See https://leap.se/en/docs/platform/development how to use the leap_cli vagrant
+integration and https://leap.se/en/docs/platform/tutorials/single-node-email how
+to setup a single node mail server.
+
+
-- 
cgit v1.2.3
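
Note on the file-descriptor check in known-issues.md: as written in the patch, the trailing `|wc -l` appears to count the output of `watch` itself rather than the `lsof` listing on each refresh. The following is a minimal sketch of an alternative, not part of the commit. It assumes a Linux host with `/proc` mounted and that the bigcouch Erlang VM shows up under the process name `beam`, as the patch's own command implies. The helper name `count_fds` is hypothetical.

```shell
#!/bin/sh
# Sketch only: count open file descriptors of a process via /proc (Linux).
# count_fds is a hypothetical helper, not part of leap_cli or bigcouch.
count_fds() {
    pid="$1"
    # Each entry under /proc/PID/fd is one open descriptor.
    # A missing or unreadable PID yields 0.
    ls "/proc/$pid/fd" 2>/dev/null | wc -l
}

# Report the count for every running "beam" process
# (process name taken from the lsof command in the patch).
for pid in $(pidof beam 2>/dev/null); do
    echo "beam pid $pid: $(count_fds "$pid") open file descriptors"
done
```

To keep the original `lsof` approach and still refresh every second, quoting the pipeline so that `watch` runs all of it should work: `watch -n1 -d 'lsof -p "$(pidof beam)" | wc -l'`. Reading another user's `/proc/PID/fd` usually requires root.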