We now have Monit monitoring crucial services on each node.
How about we set up a service status page and have its switches flipped automatically by Monit?
Estimated work items
- Find out how to catch service recovery in Monit, so that we can send a "recovered" call (i.e. service tests)
- Adjust Cachet to have a 1:1 mapping between components and Monit checks
- Map Monit statuses 1:1 to Cachet statuses
- Find a way to make Monit pass variables to the update script
- Create the Cachet API update script that Monit will call
- Create an API account that can only make updates
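For the work item about passing variables: Monit exports a set of MONIT_* environment variables (such as MONIT_SERVICE, MONIT_EVENT and MONIT_DESCRIPTION) to programs started through an exec action. A minimal sketch of a throwaway handler that records what a given check actually provides could look like this; the function names and the log path argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical debugging helpers: record the MONIT_* variables that Monit
# exports to an exec'd program, to see what the update script can rely on.

# Print all MONIT_* environment variables, one per line, sorted.
dump_monit_env() {
    env | grep '^MONIT_' | sort
}

# Append a dated snapshot of those variables to the log file given as $1.
log_monit_env() {
    {
        echo "--- $(date)"
        dump_monit_env
    } >> "$1"
}
```

Calling log_monit_env with a writable path at the end of such a script, and pointing a test's exec action at it, should show which variables are set for that type of check.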
Proposal
Cachet's documentation is not very complete, but we could use a Monit event handler (see how they'd do it with a 3rd party provider)
Configure Monit to fire a trigger
# An example of Salt stack managed Monit template
# refer to salt-states/mysql/files/monit.conf.jinja
check process mysql
  matching "mysql"
  group database
  start = "/usr/sbin/service mysql start"
  stop = "/usr/sbin/service mysql stop"
  if failed host {{ ip4_interfaces[0]|default('127.0.0.1') }} port 3306
    protocol MYSQL then restart
  if not exist for 3 cycles then restart
  if 3 restarts within 5 cycles then exec /path/to/monit_update_cachet_db.sh
Set up an update script
#!/bin/sh
# /path/to/monit_update_cachet_db.sh
# Send an update to the Cachet API
# -u carries the credentials of the pre-created, update-only Cachet user
# components/2 is the component id
# we still have to figure out how Monit reports status, and make sure status=3 is the right value to send
# 10.10.10.2:8000 is the internal upstream service that receives our update requests
/usr/bin/curl -u user:pass -XPUT \
  -d status=3 \
  10.10.10.2:8000/api/components/2
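To avoid one script per service, the same call could be driven by the MONIT_SERVICE variable that Monit exports to exec'd programs. The following is only a sketch under assumptions: the script name, the CACHET_URL and CACHET_AUTH variables, the status keywords and the service-to-component table are all hypothetical. Only component 2 and status 3 come from the example above; the other status ids follow Cachet's component status list and should be checked against the installed version.

```shell
#!/bin/sh
# Sketch of a generic handler, e.g. monit_update_cachet.sh <status-keyword>
# All names below are illustrative, not part of Monit or Cachet.

CACHET_URL="${CACHET_URL:-10.10.10.2:8000}"   # internal upstream from the script above
CACHET_AUTH="${CACHET_AUTH:-user:pass}"       # update-only account

# Monit check name -> Cachet component id (only "mysql" -> 2 is from the example).
component_id_for() {
    case "$1" in
        mysql) echo 2 ;;
        *)     return 1 ;;
    esac
}

# Status keyword -> Cachet component status id (3 is "Partial Outage").
status_id_for() {
    case "$1" in
        operational) echo 1 ;;
        performance) echo 2 ;;
        partial)     echo 3 ;;
        major)       echo 4 ;;
        *)           return 1 ;;
    esac
}

# Print the component URL for a service, or fail if the service is unmapped.
component_url_for() {
    id=$(component_id_for "$1") || return 1
    echo "$CACHET_URL/api/components/$id"
}

# Send the update for the service named in MONIT_SERVICE.
update_component() {
    url=$(component_url_for "${MONIT_SERVICE:-}") || return 1
    status=$(status_id_for "$1") || return 1
    /usr/bin/curl -u "$CACHET_AUTH" -XPUT -d "status=$status" "$url"
}

# Only act when called with a status keyword, e.g. from a Monit exec action.
if [ $# -gt 0 ]; then
    update_component "$1"
fi
```

A Monit test could then call something like exec "/path/to/monit_update_cachet.sh partial" on failure and the same script with operational on recovery, assuming the check name matches an entry in the mapping.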
Example of how to update a component status
Using curl, an update that puts the database component into partial outage would look like this:
API call
It's using component status 3, which means "partial outage". See also the post-parameters section.
curl -u user:pass -XPUT -d status=3 10.10.10.2:8000/api/components/2
{
  "data": {
    "created_at": 1427482793,
    "description": "MariaDB database cluster nodes",
    "id": 2,
    "incident_count": 0,
    "name": "db cluster",
    "status": "Partial Outage",
    "status_id": 3,
    "updated_at": 1430332325
  }
}
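Since the API echoes the component back, the update script could check that the status it sent was actually applied. A minimal sketch, assuming the response keeps a numeric "status_id" field as shown above; the function names are hypothetical and sed is used so that no JSON tool needs to be installed on the nodes.

```shell
#!/bin/sh
# Sketch: read a Cachet component response on stdin and check its status_id.

# Print the first numeric "status_id" value found in the input.
extract_status_id() {
    sed -n 's/.*"status_id"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed only if the response on stdin reports the expected status id ($1).
status_applied() {
    [ "$(extract_status_id)" = "$1" ]
}
```

The curl call above could then be piped into status_applied 3, with a log entry written when the check fails.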
How the status is displayed
