path: root/puppet/modules/site_check_mk
Age Commit message Author
2017-11-08webapp: alert on 409 responsesAzul
They might be meaningful response codes for some scenarios. But so far we are not consciously sending them out. If they occur, that is because we handed them down from couch. So we might want to fix the underlying issue. Couch 409s should be caught by the webapp and handled there.
2017-06-22Delay hard state of the nagios APT checkVarac
Delay a hard state of the APT check for 1 day so unattended_upgrades has time to upgrade packages. Resolves: #8748
2017-03-15[8144] Remove Haproxyvarac
We used haproxy because we had multiple bigcouch nodes but now with a single couchdb node this is not needed anymore. - Resolves: #8144
2016-09-06[feat] Add check_mk config values, don't set themvarac
Setting a value like ignored_services = [...] overrides any `ignored_services` entries that were parsed earlier. Instead, we use `+=` so that multiple files can each add something to this config value.
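The difference between setting and adding can be sketched in plain Python, since check_mk config files are evaluated as Python. This is only an illustration: the `load_snippets` helper, the shared-namespace model, and the service names are assumptions, not code from the module.

```python
# Hypothetical sketch: config snippets evaluated in order into one shared
# namespace, roughly how several config files contribute to a single setting.
# The helper and the service names below are illustrative assumptions.

def load_snippets(snippets):
    """Evaluate each snippet in order and return the final ignored_services."""
    namespace = {"ignored_services": []}
    for source in snippets:
        exec(source, namespace)
    return namespace["ignored_services"]

# Assignment: the later snippet silently discards what the earlier one set.
overriding = load_snippets([
    'ignored_services = ["NTP Time"]',
    'ignored_services = ["Postfix Queue"]',
])

# Extension: each snippet adds its entries to whatever is already there.
accumulating = load_snippets([
    'ignored_services += ["NTP Time"]',
    'ignored_services += ["Postfix Queue"]',
])

print(overriding)
print(accumulating)
```

With `=`, only the entries from the last file survive; with `+=`, the order of the files no longer decides which entries are kept.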
2016-08-31[bug] Remove Nagios soledad procs checkvarac
leap_cli already checks for running procs - Resolves: #8380
2016-08-31Document site_check_mk::agent::soledadvarac
2016-08-20ignore noisy 401 errors from soledad log.Micah
Change-Id: Ia1764cb28e263353856523c11f351a39774bf3b4
2016-06-30Remove bigcouch (#8056)Micah
Change-Id: I0c6e27298c63bd37de1410985d054799818c22a4
2016-06-07Merge remote-tracking branch 'origin/0.8.x' into developvarac
2016-05-31Reduce check_mk timeouts (#7807).Micah
check_mk operations can take a long time (such as when doing a re-inventory using "check_mk -II") when multiple hosts are down. This decreases the connect timeout to 5 seconds. Change-Id: I1eac5f14bad2afc2ffc4cbf8c950c24b052a0d6e
2016-05-12[feat] catch abnormal proc termination in syslogvarac
Sometimes a floating point exception or segfault of a process results in systemd restarting it. We want to recognize this from the syslog, e.g.:
systemd[1]: pixelated-server.service: main process exited, code=killed, status=8/FPE
systemd[1]: Unit pixelated-server.service entered failed state.
- Related: https://github.com/pixelated/pixelated-user-agent/issues/683
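As a rough illustration of what such a syslog match could look like, here is a minimal Python sketch. The regular expressions are modeled only on the two example lines quoted in this commit message; the actual logwatch patterns used by the module are not shown here, and the `dumped` alternative and the function name are assumptions.

```python
import re

# Patterns modeled on the two example syslog lines from the commit message.
# "dumped" is an assumed extra value for crashes that produce a core dump.
ABNORMAL_EXIT = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+\.service): main process exited, "
    r"code=(?P<code>killed|dumped), status=(?P<status>\S+)"
)
FAILED_STATE = re.compile(
    r"systemd\[\d+\]: Unit (?P<unit>\S+\.service) entered failed state"
)

def find_abnormal_terminations(lines):
    """Return (unit, reason) tuples for lines that indicate a crashed service."""
    hits = []
    for line in lines:
        match = ABNORMAL_EXIT.search(line)
        if match:
            reason = f"{match.group('code')} {match.group('status')}"
            hits.append((match.group("unit"), reason))
            continue
        match = FAILED_STATE.search(line)
        if match:
            hits.append((match.group("unit"), "entered failed state"))
    return hits

sample = [
    "systemd[1]: pixelated-server.service: main process exited, code=killed, status=8/FPE",
    "systemd[1]: Unit pixelated-server.service entered failed state.",
    "systemd[1]: Started Session 42 of user root.",  # ordinary line, ignored
]
hits = find_abnormal_terminations(sample)
print(hits)
```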
2016-05-03[bug] Run check_mk-refresh-inventory-daily after check_mk-refreshvarac
Otherwise, the nagios config will get regenerated and nagios gets reloaded before all checks are registered by a check_mk inventory. - Related: #6873
2016-05-03[bug] run check_mk inventory on every puppetrunvarac
After upgrading the platform, there might be old check_mk checks registered on the monitor hosts. We now run a check_mk inventory on every run, which also purges old, non-existent checks. - Resolves: #6873
2016-04-03check_mk: monitor webapp log for response code 500Azul
2016-03-31[bug] Fix couch_stats scriptvarac
It failed to calculate the sessions and tokens db names. - Resolves: #7658
2016-03-09[bug] Adopt new parameters from nagios and check_mk modulevarac
2016-02-25fix typo in last commitvarac
2016-02-25check-mk's mk_job depends on the time packagevarac
2016-02-23default to plain couchdb, unless otherwise specified.elijah
# Conflicts: # puppet/modules/site_couchdb/manifests/plain.pp
2015-12-01Switch from 'vmail' to leap-mx's user/group (#6936, #7639)Micah
This change will make sure that the user/group for leap-mx exist, and it changes the mail location from /var/mail/vmail to the more helpful name /var/mail/leap-mx. This change requires: https://github.com/leapcode/leap_mx/pull/78 and it would replace merge request: https://github.com/leapcode/leap_mx/pull/65 and fix https://leap.se/code/issues/6936 and https://leap.se/code/issues/7635 Change-Id: Idbe678dc999e394232c2eeef2b2018d39ab7cc3b
2015-11-17[bug] fix check_mk on jessievarac
- Related: #6920
2015-11-16[feat] Remove redundant nagios check for mx procsvarac
leap_cli integrates a check for running mx procs already, which is also integrated into nagios (called "Mx/Are_MX_daemons_running")
2015-10-31[bug] Add bigcouch syslog snippet for logwatchvarac
2015-10-30[bug] Remove duplicate declarationvarac
Duplicate declaration: File[/srv/leap/nagios/plugins/check_unix_open_fds.pl] is already declared in file /srv/leap/puppet/modules/site_check_mk/manifests/agent/couchdb/bigcouch.pp at line 44; cannot redeclare at /srv/leap/puppet/modules/site_check_mk/manifests/agent/couchdb.pp:23 on node rewdevcouch1.rewire.org
2015-10-30[feat] Remove bigcouch nagios leftoversvarac
When migrating from bigcouch to couchdb, we need to remove leftover nagios tests for bigcouch. - Added new classes: site_check_mk::agent::couchdb::bigcouch and site_check_mk::agent::couchdb::master - Tested: unstable.pixelated-project.org - Resolves: https://github.com/pixelated/pixelated-platform/issues/126
2015-10-06[feat] remove tapicero leftoversvarac
Soledad now creates user-dbs, which was done by tapicero in the past. We need to remove any leftovers from tapicero.
2015-08-27Merge branch '6847_improve_nagios_mail_subject' into developvarac
2015-08-13Increase readability of nagios notification mail subjects (#6847)varac
Change-Id: Ic9af9ef3602abbb51edf1c9d71d4d264b4ace714
2015-08-12Don't use check_mk logwatch to watch bigcouch logs anymore (#7375)varac
The rationale here is:
- bigcouch and its included erlang version are incredibly noisy and spit out warnings/error msgs all the time
- it uses the worst logging format I ever saw, multiple lines directly to a file (couch 2.0 uses lager as logging backend, which can log to syslog)
- trying to sort out the false positives will take too much time, and who knows which of them will be resolved in couch 1.6/2.0
Change-Id: Idbe6b37a19cd65ce31a50d4c28eedb4cf15ba3b5
2015-07-21Increase tapicero heartbeat nagios checks (#7275)Micah Anderson
Increase warning/critical thresholds for time between tapicero heartbeat checks so it will emit fewer false positives. Change-Id: I0f97373d88658b7f17b2c4e8c1963198dc3f66ed
2015-07-07check_mk should not falsely report multiple instances running (#6866)varac
Change-Id: Ie7943c9a541c3cd2feac7686ed1092aadc5a7c7a
2015-07-07Ignore openvpn logwatch warnings (#6867)varac
These are warnings that might have different origins, none of which we want to alarm the admin about:
- A bitmask client bug (the user will poke the client devs if things break, and they will go after it)
- A simple network failure, where packets might get cut off
- A malicious user trying to tamper with TLS handshakes. This gets more interesting, but still (like ssh bruteforce attacks) an admin would not want to get annoyed by this by default, and they still have the option to use log analysers of their choice if they want to investigate this.
Change-Id: I23ca3b700e41f22f34ad3346ed4e647b86000bb2
2015-07-07moved removal of leap_couch_stats.sh TMPFILE to end of script (#7217)varac
Change-Id: If844b95c44e697f480df8ee2ae6607709b9942f7
2015-07-07remove leap_couch_stats.sh TMPFILE so /tmp/ won't fill with tmp files (#7217)varac
Change-Id: I7b778e1e1af2784bd79840f20453ca8718927e25
2015-07-06Don't monitor disabled nodes (#7235)varac
Change-Id: I51ce8a9e8773d267c270a1725a497f9a43f2e9ff Sidenote: $nagios_hosts was never used
2015-05-27leap_couch_stats.sh handles rotated dbs (#6987)varac
Change-Id: I115ebdefd7365bf15a30c4a3ce7a4543ad757cec
2015-04-26run check_mk_agent every 4 instead of 10 minutes, useful for better graphsvarac
Change-Id: Ibefc6ce08cf714cf79a460a8b6eb32e2851ce22c
2015-04-26Tapicero changed its error message when uploading design doc fails in race condition with another tapicero instance #6534varac
Change-Id: Ie194a2983210601bd24aef5e74f8b7fa2b7c433f
2015-04-16restore tapicero heartbeat.elijah
2015-04-16clean up logging mess: add 'logfile' define, mv openvpn and stunnel logs to their own files, fix mx logwatch path.elijah
2015-04-15fix tapicero & webapp logs: remove heartbeat log check, move to /var/log/tapicero, fix webapp logwatch location.elijah
2015-04-07Merge branch '6749_leap_couch_stats' into developvarac
2015-04-07added local check_mk couchdb script (#6749)varac
leap_couch_stats.sh is a local check_mk agent script which provides per-db stats as well as global stats. Change-Id: I1eba19a3a0210d3127acbad119dfd2918414ff4a
2015-04-01run check_mk tests every 10 minuteselijah
2015-03-12require file for augeas resources in site_check_mk::agent::*varac
Change-Id: Ia5ac6f50e023d7d358d17c661b71c6a5880ec445
2015-03-11Change nagios to be aware of soledad user change (Bug #6612)varac
Change-Id: Id53d6432a58006653f4d9ddd6355ae505a5273eb
2015-03-11Use augeas instead of file_line to configure entries in /etc/check_mk/mrpe.cfg (Bug #6788)varac
We used file_line before, but when some check parameters change, a new line would be added, leaving the old line there, resulting in two checks with the same name but with different parameters. Augeas can handle this better, but it is important to use 'rm' to remove all old lines with different parameters before adding the new line. Change-Id: Iad69dfd20f487a16d372a4f4a4bc53299f9e4a66
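The duplicate-line problem, and the "remove first, then add" fix, can be sketched in Python. This is not the Puppet or Augeas code from the commit: both helper functions, the check name, and the plugin command are hypothetical, and the sketch only assumes that each mrpe.cfg entry starts with the check name followed by the command.

```python
def append_if_missing(lines, name, command):
    """Naive variant: append the exact line unless it is already present."""
    entry = f"{name} {command}"
    return list(lines) if entry in lines else list(lines) + [entry]

def set_mrpe_check(lines, name, command):
    """Drop every existing entry for `name`, then add the new one."""
    # An entry belongs to `name` when its first whitespace-separated word
    # equals the check name; blank lines and comments are kept untouched.
    kept = [line for line in lines if line.split(maxsplit=1)[:1] != [name]]
    kept.append(f"{name} {command}")
    return kept

# Hypothetical check whose thresholds change from one run to the next.
existing = ["Example_check /usr/local/bin/check_example -w 10 -c 20"]
new_command = "/usr/local/bin/check_example -w 30 -c 60"

naive = append_if_missing(existing, "Example_check", new_command)
clean = set_mrpe_check(existing, "Example_check", new_command)

print(naive)
print(clean)
```

The naive variant ends up with two `Example_check` entries carrying different parameters, which is the situation described above; removing all entries with the same name first leaves exactly one.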
2015-03-04temporarily increase the delay between soledad / web api tests to 60 minutes, until we are able to fix the issue with the test users creating db bloat.elijah
2015-01-22Provide a base-level set of quality entropy by installing haveged on systems by default (#6664)Micah Anderson
Change-Id: Ic2d4416b7c55f00f01d4b2ade78339d653bc8993
2014-12-18update tapicero logwatch messages to remove extra space0.6.0rc3Micah Anderson
Change-Id: I0149ac2e767531d9724b57b9e3bdae7943f954ff