Re: Services don't warn on stop/waiting

Posted: Fri Jun 19, 2015 12:13 pm
by lmiltchev
You forgot to run:

Code:

echo $?

after each command. I wanted to see the exit code from each one. Otherwise, the output looks fine.
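
As a minimal sketch of that step: `$?` holds the exit status of the most recent command only, so it has to be read immediately after the command of interest. The `run_and_report` helper below is a hypothetical name used purely for illustration; it is not part of Nagios.

```shell
# Run a command, then print its exit status.
# run_and_report is a hypothetical helper, not a Nagios tool.
run_and_report() {
    rc=0
    "$@" || rc=$?      # capture right away; any later command overwrites $?
    echo "exit code: $rc"
}

run_and_report true              # a command that succeeds
run_and_report sh -c 'exit 2'    # a command that exits with status 2
```

When running a plugin by hand, the same idea applies: run the plugin, then `echo $?` on the very next line.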

Can you also run the following command on the remote machine and show us the output?

Code:

grep check_init_service /etc/sudoers

If you are missing the line below:

Code:

nagios ALL=NOPASSWD: /usr/local/nagios/libexec/check_init_service

you may add it to sudoers to see if that fixes the issue.

Re: Services don't warn on stop/waiting

Posted: Mon Jun 22, 2015 9:30 am
by courhar
Done both of those. Should the exit code be a 1 rather than 0 then?

Re: Services don't warn on stop/waiting

Posted: Mon Jun 22, 2015 11:55 am
by jolson
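You are correct. The plugin return codes map to states as follows:

Return code   Service state   Host state
0             OK              UP
1             WARNING         UP or DOWN/UNREACHABLE*
2             CRITICAL        DOWN/UNREACHABLE
3             UNKNOWN         DOWN/UNREACHABLE

* With use_aggressive_host_checking enabled, a return code of 1 gives a host state of DOWN or UNREACHABLE; otherwise it gives UP.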
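
To make the return codes listed above concrete, here is a small sketch that translates a plugin's exit code into the matching service state name. The function name `state_name` is made up for this example.

```shell
# Translate a plugin exit code into the service state name.
# state_name is a hypothetical helper for illustration only.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "not a defined plugin return code" ;;
    esac
}

state_name 0
state_name 2
```

So a check that exits 0 while the service is stopped will always show as OK, no matter what text it prints.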

Re: Services don't warn on stop/waiting

Posted: Tue Jun 23, 2015 2:33 am
by courhar
So how can I change the check_init_service script so that it returns a 1 when it finds a service in stop/waiting? Or even a 2, as the services are critical for what we are trying to monitor.

Re: Services don't warn on stop/waiting

Posted: Tue Jun 23, 2015 12:17 pm
by lmiltchev
Is the service actually running? Can you run the following command on the remote machine and show us the output?

Code:

ps aux | grep heat-api

If the "check_init_service" is not working with this service, perhaps you can find a different plugin or set up a custom command in order to accomplish your goal.
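
A custom command along those lines could look roughly like the sketch below. It assumes the service reports Upstart-style status text ("start/running" or "stop/waiting", as in the thread title); the function name `classify_status` and the sample status line are made up for illustration, and this is not how check_init_service itself works.

```shell
#!/bin/sh
# Hypothetical custom check: map a status line to a Nagios exit code.
# Assumes Upstart-style wording; adjust the patterns for your init system.
classify_status() {
    case "$1" in
        *start/running*) echo "OK - $1";       return 0 ;;
        *stop/waiting*)  echo "CRITICAL - $1"; return 2 ;;
        *)               echo "UNKNOWN - $1";  return 3 ;;
    esac
}

# Made-up sample line, for demonstration only.
classify_status "heat-api start/running, process 1234"
```

In a real plugin the argument would come from the status command itself, for example `classify_status "$(service heat-api status 2>&1)"`, followed by `exit $?` so that Nagios sees the code. Change the `return 2` to `return 1` if a WARNING is preferred over a CRITICAL.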

Re: Services don't warn on stop/waiting

Posted: Wed Jun 24, 2015 4:30 am
by courhar
Ran the command you asked for. Is there another plugin you could recommend? I tried check_services, which didn't alarm either.

Thanks

Courhar

Re: Services don't warn on stop/waiting

Posted: Wed Jun 24, 2015 11:56 am
by lmiltchev
You may want to try using SNMP. Please review our documentation on the topic here:

https://assets.nagios.com/downloads/nag ... g_SNMP.pdf

An example of a command for checking the sshd process would be:

Code:

/usr/local/nagios/libexec/check_snmp_process_wizard.pl -H <remote machine> -C <community string> --v2c -n 'sshd' -w '1,30' -c '1,50'
2 process matching sshd (> 1) (<= 30):OK

I am not sure if using "heat-api" as the name of the process (regexp) will produce the expected results. You may need to run snmpwalk to see what is available.
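
One way to check what is available, as a rough sketch: walk the running-process name table (`hrSWRunName` in HOST-RESOURCES-MIB, which is assumed here to be what the remote agent exposes) and count the entries that match your pattern. The helper name `count_matching` and the sample lines are made up for illustration; real output will differ.

```shell
# Count process-name entries in snmpwalk output (read from stdin)
# whose name contains the given regex. count_matching is a hypothetical helper.
count_matching() {
    grep -c -E "STRING: \"[^\"]*$1" || true   # grep -c exits 1 on zero matches
}

# Made-up sample of snmpwalk output, for demonstration only.
sample='HOST-RESOURCES-MIB::hrSWRunName.1 = STRING: "init"
HOST-RESOURCES-MIB::hrSWRunName.1021 = STRING: "sshd"
HOST-RESOURCES-MIB::hrSWRunName.2044 = STRING: "sshd"'

printf '%s\n' "$sample" | count_matching sshd
printf '%s\n' "$sample" | count_matching heat-api
```

Against a live host, the input would come from something like `snmpwalk -v2c -c <community string> <remote machine> HOST-RESOURCES-MIB::hrSWRunName` piped into `count_matching heat-api`. A count of zero suggests the process is listed under a different name.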