path: root/puppet/modules/site_check_mk
Age | Commit message | Author
2015-08-27 | Merge branch '6847_improve_nagios_mail_subject' into develop | varac
2015-08-13 | Increase readability of nagios notification mail subjects (#6847) | varac
    Change-Id: Ic9af9ef3602abbb51edf1c9d71d4d264b4ace714
2015-08-12 | Don't use check_mk logwatch to watch bigcouch logs anymore (#7375) | varac
    The rationale here is:
    - bigcouch/its included erlang version is incredibly noisy and spits out warnings/error msgs all the time
    - it uses the worst logging format I ever saw, multiple lines directly to a file (couch 2.0 uses lager as logging backend which can log to syslog)
    - trying to sort out the false positives will take too much time, and who knows which of them will be resolved in couch 1.6/2.0
    Change-Id: Idbe6b37a19cd65ce31a50d4c28eedb4cf15ba3b5
2015-07-21 | Increase tapicero heartbeat nagios checks (#7275) | Micah Anderson
    Increase warning/critical thresholds for time between tapicero heartbeat checks so it will emit fewer false positives
    Change-Id: I0f97373d88658b7f17b2c4e8c1963198dc3f66ed
2015-07-07 | check_mk should not falsely report multiple instances running (#6866) | varac
    Change-Id: Ie7943c9a541c3cd2feac7686ed1092aadc5a7c7a
2015-07-07 | Ignore openvpn logwatch warnings (#6867) | varac
    These warnings might have different origins, none of which should alarm the admin:
    - A bitmask client bug (user will poke the client devs if things break, and they will go after it)
    - A simple network failure, packets might get cut off
    - A malicious user tries to tamper with TLS handshakes. This gets more interesting, but still (like ssh bruteforce attacks) an admin would not want to get annoyed by this by default; they still have the option to use log analysers of their choice if they want to investigate this.
    Change-Id: I23ca3b700e41f22f34ad3346ed4e647b86000bb2
2015-07-07 | moved removal of leap_couch_stats.sh TMPFILE to end of script (#7217) | varac
    Change-Id: If844b95c44e697f480df8ee2ae6607709b9942f7
2015-07-07 | remove leap_couch_stats.sh TMPFILE so /tmp/ won't fill with tmp files (#7217) | varac
    Change-Id: I7b778e1e1af2784bd79840f20453ca8718927e25
2015-07-06 | Don't monitor disabled nodes (#7235) | varac
    Change-Id: I51ce8a9e8773d267c270a1725a497f9a43f2e9ff
    Sidenote: $nagios_hosts was never used
2015-05-27 | leap_couch_stats.sh handles rotated dbs (#6987) | varac
    Change-Id: I115ebdefd7365bf15a30c4a3ce7a4543ad757cec
2015-04-26 | run check_mk_agent every 4 instead of 10 minutes, useful for better graphs | varac
    Change-Id: Ibefc6ce08cf714cf79a460a8b6eb32e2851ce22c
2015-04-26 | Tapicero changed its error message when uploading design doc fails in race condition with another tapicero instance #6534 | varac
    Change-Id: Ie194a2983210601bd24aef5e74f8b7fa2b7c433f
2015-04-16 | restore tapicero heartbeat. | elijah
2015-04-16 | clean up logging mess: add 'logfile' define, mv openvpn and stunnel logs to their own files, fix mx logwatch path. | elijah
2015-04-15 | fix tapicero & webapp logs: remove heartbeat log check, move to /var/log/tapicero, fix webapp logwatch location. | elijah
2015-04-07 | Merge branch '6749_leap_couch_stats' into develop | varac
2015-04-07 | added local check_mk couchdb script (#6749) | varac
    leap_couch_stats.sh is a local check_mk agent script which provides per-db stats as well as global stats.
    Change-Id: I1eba19a3a0210d3127acbad119dfd2918414ff4a
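
For readers unfamiliar with check_mk local checks: a local check is a script whose output has one line per service, made up of a status code, a service name without spaces, performance data (or `-`), and free text. The sketch below only illustrates that line format in Python. It is not the contents of leap_couch_stats.sh, and the service name and metric are hypothetical.

```python
def format_local_check(status, name, perfdata, text):
    """Build one output line in the check_mk local-check format.

    status:   0 = OK, 1 = WARN, 2 = CRIT, 3 = UNKNOWN
    name:     service name (must not contain spaces)
    perfdata: dict of metric name -> value; empty dict means "no perfdata"
    text:     human-readable status text
    """
    # Multiple metrics are joined with "|"; "-" stands for no perfdata.
    perf = "|".join(f"{key}={value}" for key, value in perfdata.items()) or "-"
    return f"{status} {name} {perf} {text}"


# Hypothetical global statistic; the real script's names may differ.
print(format_local_check(0, "couchdb_global_stats", {"doc_count": 42}, "OK - 42 documents"))
```
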
2015-04-01 | run check_mk tests every 10 minutes | elijah
2015-03-12 | require file for augeas resources in site_check_mk::agent::* | varac
    Change-Id: Ia5ac6f50e023d7d358d17c661b71c6a5880ec445
2015-03-11 | Change nagios to be aware of soledad user change (Bug #6612) | varac
    Change-Id: Id53d6432a58006653f4d9ddd6355ae505a5273eb
2015-03-11 | Use augeas instead of file_line to configure entries in /etc/check_mk/mrpe.cfg (Bug #6788) | varac
    We used file_line before, but when some check parameters changed, a new line would be added while the old line stayed in place, resulting in two checks with the same name but different parameters. Augeas can handle this better, but it is important to use 'rm' to remove all old lines with different parameters before adding the new line.
    Change-Id: Iad69dfd20f487a16d372a4f4a4bc53299f9e4a66
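
The remove-then-add logic described in this commit can be sketched in a few lines of Python. This is only an illustration of the idea, not the Puppet/Augeas code from the commit: the function name and the sample checks are hypothetical, and the only assumption about mrpe.cfg is that each line starts with the check name followed by the command.

```python
def set_mrpe_check(lines, name, command):
    """Return mrpe.cfg lines with exactly one entry for `name`.

    Every existing line whose first field is `name` is dropped first
    (the 'rm' step), then the new line is appended (the 'set' step).
    """
    kept = [line for line in lines if line.split()[:1] != [name]]
    kept.append(f"{name} {command}")
    return kept


# Hypothetical starting state with one old entry for "Soledad_Procs".
config = ["Soledad_Procs /usr/lib/nagios/plugins/check_procs -w 1:1", "Other_Check /bin/true"]

# Appending blindly (the file_line behaviour described above) would leave
# two "Soledad_Procs" lines; removing first keeps the name unique.
config = set_mrpe_check(config, "Soledad_Procs", "/usr/lib/nagios/plugins/check_procs -w 1:2")
print("\n".join(config))
```

Applying the function a second time with the same arguments leaves the list unchanged, which is the idempotent behaviour a configuration-management resource needs.
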
2015-03-04 | temporarily increase the delay between soledad / web api tests to 60 minutes, until we are able to fix the issue with the test users creating db bloat. | elijah
2015-01-22 | Provide a base-level set of quality entropy by installing haveged on systems by default (#6664) | Micah Anderson
    Change-Id: Ic2d4416b7c55f00f01d4b2ade78339d653bc8993
2014-12-18 | update tapicero logwatch messages to remove extra space (0.6.0rc3) | Micah Anderson
    Change-Id: I0149ac2e767531d9724b57b9e3bdae7943f954ff
2014-12-17 | Merge branch 'bug/6566' into 'develop' | varac
    Bug/6566
    See merge request !19
2014-12-17 | Check_mk logwatch: ignore openvpn warnings (Feature #6568) | varac
    Change-Id: I0d30afbcc6dcb90c6716f7c6bb0bca3e6ae0964a
2014-12-17 | Update to logwatch ignore for tapicero | Micah Anderson
    Change-Id: I1d8cedfeb1153312c13f7f182c7ac3b031647dd4
2014-12-17 | Ignore Soledad "Timing out client" warning (Bug #6566) | Micah Anderson
    Change-Id: I6d3fa5028ba6eaca7b21a7e850136ef980f6e782
2014-12-17 | Check tapicero heartbeat (Bug #6556) | varac
    To ensure tapicero is still working, we need to monitor /var/log/syslog for the last tapicero log msg, which should not be older than the last check_mk_agent run (every 2 mins atm).
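
The heartbeat idea described here, finding the newest tapicero line and comparing its age to the agent interval, could look roughly like the Python sketch below. It is not the check that was committed. The function names, the sample log lines, and the choice of classic syslog timestamps ("Dec 17 10:01:30", no year) are assumptions for illustration.

```python
from datetime import datetime, timedelta


def last_heartbeat(lines, now, tag="tapicero"):
    """Return the timestamp of the newest log line mentioning `tag`, or None."""
    for line in reversed(lines):
        if tag in line:
            # Classic syslog timestamps omit the year, so borrow it from `now`.
            stamp = datetime.strptime(line[:15], "%b %d %H:%M:%S")
            return stamp.replace(year=now.year)
    return None


def heartbeat_state(lines, now, max_age=timedelta(minutes=2)):
    """Return 0 (OK) if a recent heartbeat exists, else 2 (CRITICAL)."""
    seen = last_heartbeat(lines, now)
    if seen is None or now - seen > max_age:
        return 2
    return 0


# Hypothetical syslog excerpt.
log = [
    "Dec 17 10:01:30 host tapicero[921]: heartbeat",
    "Dec 17 10:01:45 host sshd[1]: unrelated message",
]
print(heartbeat_state(log, datetime(2014, 12, 17, 10, 2, 0)))
```
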
2014-12-17 | Merge branch 'micah/platform-feature/6544' into develop | varac
    Conflicts: puppet/modules/site_check_mk/files/agent/logwatch/bigcouch.cfg
    Change-Id: I1646e49ffa5437a861b402b755bc15943c42ec4f
2014-12-16 | Ignore "Generic server terminating" bigcouch message (Feature #6544) | Micah Anderson
    Change-Id: I73defd7964501e4eabe7dd05c02887e7aeb2f063
2014-12-16 | Merge branch 'bug/6545' into 'develop' | varac
    Bug/6545
    See merge request !16
2014-12-16 | Ignore postfix "too many errors after DATA" logwatch msg (Bug #6545) | Micah Anderson
    Change-Id: I0abeb88f7b6548e5742bd3d99b2f4e5d9c6cf421
2014-12-16 | ignore additional bigcouch error messages (#6512) | Micah Anderson
    Change-Id: Ie51fb485bcae9a9467c465bdd1b4a5785023db04
2014-12-16 | Move kernel ipv6 log message up before the 'C error' line so it is caught (#6540) | Micah Anderson
    Change-Id: I1fe8d4cf60532dfe01cfb3a014c4cbeb4acdc479
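
The ordering fix in this commit makes sense because logwatch patterns are evaluated top to bottom and the first match decides the classification. The Python sketch below models that first-match-wins behaviour. The rule lists and the sample log line are hypothetical and are not taken from the repository's logwatch configuration.

```python
import re


def classify(line, rules):
    """Return the level of the first rule whose pattern matches `line`.

    `rules` is an ordered list of (level, regex) pairs, e.g. "C" for
    critical and "I" for ignore. Returns None when no rule matches.
    """
    for level, pattern in rules:
        if re.search(pattern, line):
            return level
    return None


line = "kernel: ipv6 icmp error"  # hypothetical log line

# A broad critical rule placed first shadows the more specific ignore rule.
print(classify(line, [("C", r"error"), ("I", r"ipv6")]))

# Moving the ignore rule above it lets the specific pattern win.
print(classify(line, [("I", r"ipv6"), ("C", r"error")]))
```
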
2014-12-11 | Ignore additional tapicero message (#6542): | Micah Anderson
    tapicero[921]: Checking security of user-1b3b1fb78db851190fa72dac01207b8d failed (trying again soon): RestClient::ResourceNotFound: 404 Resource Not Found: {"error":"not_found","reason":"Database does not exist."}")
    tapicero recovers from this error
    Change-Id: Ic105823ddc282512000e6d7445539428581eb997
2014-12-11 | Increase max_check_attempts for hosts checks (Bug #6535) | varac
    Change-Id: I10ec569821f329e3bd10ac87242db102e9c82246
2014-12-11 | Merge branch '6539_increase_time_between_check_mk_agent_runs' into 'develop' | Micah
    6539 increase time between check mk agent runs
    https://leap.se/code/issues/6539
    See merge request !11
2014-12-11 | Increase time between two check_mk_agent runs (Bug #6539) | varac
    Right now, check_mk_agent is run every minute on each host. The soledad sync test depends on tapicero, and only 13s remain between finishing the soledad test (and removing the testuser db) and the start of the next test.
    Change-Id: I5b22ba02470cce799a12043d21091c0c9b8e0b5f
2014-12-11 | logwatch: ignore ipv6 icmp errors (Bug #6540) | Micah Anderson
    Change-Id: I198c5245c7e73d6dd7a7d9725fac1eb9a8f425a5
2014-12-10 | update ffa53ef321bbfd771afff1ccb230d1b5e4f9ab00 to fix ordering requirement in logwatch, remove extended regexp character class and also ignore "Writing security" lines | Micah Anderson
    Change-Id: I7d33725db06a40361a3b04f9591adeb6a025bf77
2014-12-10 | Merge branch 'bug/6512' into 'develop' | varac
    Bug/6512
    See merge request !5
2014-12-10 | ignore transient Tapicero errors when creating a db (Bug #6511) | varac
    Change-Id: I0939070482fad4f99f03e41094a3df42ff5063e4
2014-12-09 | Ignore rexi_EXIT bigcouch error (Bug #6512) | Micah Anderson
    Change-Id: I03842b65329aabb012cc2c7514007e174cbd8fc0
2014-12-09 | logwatch: ignore postfix errors on lost connection (Bug #6476) | varac
    Change-Id: I0b1eec11a3b3da39d65572b6bee8b3ce892e08ac
2014-12-04 | remove webapp python tests, because they are integrated into the platform now (Bug #6489) | varac
    Change-Id: Iaec748a173b6e11bb3ab3c11ca152809817644f9
2014-12-04 | Merge "Change nagios mail To: Header to contain the actual platform environment's contact email (Bug #6466)" into develop | Varac
2014-12-02 | Change nagios mail To: Header to contain the actual platform environment's contact email (Bug #6466) | Micah Anderson
    Change-Id: Ib86ae771e0ac3b6f329a517a8a31c9ec54d33a05
2014-12-02 | Ignore bigcouch conflict errors, mainly coming from tapicero creating new users (Feature #6481) | varac
    There are potentially many tapicero daemons running, and they all try to do the same thing at the same time. It is basically designed to create race conditions. All tapicero daemons try to create the user db at the same time. Only one of them wins the race and actually creates it. We need to fix this later (see https://leap.se/code/issues/6480) but for now, we ignore them because conflict errors should be handled by the application anyway.
    Change-Id: I91095b1901d238e3d199954ba3716023d3fd49c1
2014-12-01 | Increase the nagios alert thresholds for bigcouch open file descriptors (#6473) | Micah Anderson
    Change-Id: I2549d781427fffc865c2bdcd1e950d60dad509fd