Passive checks incorrectly/partially processed and PENDING

Posted: Fri Feb 02, 2018 11:42 am
by JoostICM
Hello,

I hope someone can help me with the following problem I am experiencing with passive checks on NAGIOS Core.

We've started only recently to deploy passive checks so I am not at all experienced in implementing these. It might be I am making a stupid mistake, but we've looked for problems for a while without finding any. Therefore I hope to find someone here that can help me get this up and running.

I have a cron-scheduled script on the NAGIOS host itself that retrieves the state of a number of internet MPEG streams and checks whether they are up or down. The script runs every 10 minutes and, after some processing, writes PROCESS_SERVICE_CHECK_RESULT records to the command file. When I tail the command file (standard location and name) while the script runs, I see it submitting the commands correctly, so everything from running the script, retrieving the info and processing it to writing output into the cmd file appears to be OK:

example of submitted records in the cmd file:

Code: Select all

[1517587415] PROCESS_SERVICE_CHECK_RESULT;Streams;Stream1;2;Stream is OFFLINE
[1517587415] PROCESS_SERVICE_CHECK_RESULT;Streams;Stream2;2;Stream is OFFLINE
[1517587416] PROCESS_SERVICE_CHECK_RESULT;Streams;Stream3;0;Stream is ONLINE
[1517587416] PROCESS_SERVICE_CHECK_RESULT;Streams;Stream4;0;Stream is ONLINE
Host declaration:

Code: Select all

define host {
        use                             linux-server
        host_name                       Streams
        alias                           Check Streams
        address                         127.0.0.1
        check_period                    24x7
        check_command                   check-host-alive
        notification_period             24x7
}
Service declaration:

Code: Select all

define service {
        use                             generic-service
        host_name                       Streams
        service_description             Stream1
        active_checks_enabled           0                       ; Active service checks are disabled
        passive_checks_enabled          1
        parallelize_check               1
        is_volatile                     1
        obsess_over_service             0
        check_freshness                 1                       ; Check service 'freshness'
        freshness_threshold             4800                    ; How fresh must the check be?
        check_command                   gone_stale              ; Report staleness
}
gone_stale as declared in the command file:

Code: Select all

define command {
        command_name gone_stale
        command_line /usr/lib/nagios/plugins/check_dummy 2 "There hasn't been an update recently!"
}
I also noticed that this command works: after the stale timeout it switches all services to critical. This is processed completely, so all of them go critical, even the 3 of 7 that had already been processed. But I submit results with return code 0 (or 2) for more than half of them every 10 minutes, so this should not happen. Why these go stale is beyond me.

switches in nagios.cfg enabled:

Code: Select all

accept_passive_service_checks=1
check_service_freshness=1
What happens is the following:
- I receive 7 OFFLINE critical submits and around 11 ONLINE.
- Only 3 of those OFFLINE services go critical, the other 4 remain PENDING
- ALL return code 0 services do not change state in the GUI and stay PENDING
- Although the script deposits close to 20 results in the command file the nagios.log only shows 3 out of 7 critical records processed and no Ok records processed.
- Submitting passive check results via the GUI does not seem to update the service state to the result entered, and the check stays on PENDING. The record can be seen in the cmd file but nagios.log doesn't mention any processing of the record. This happens both when submitting an Ok and when submitting a Critical.
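
To compare what was submitted with what Nagios actually processed, the log can be tallied per service. This is only a sketch: it assumes log_passive_checks=1 in nagios.cfg, so that every processed result appears as a "PASSIVE SERVICE CHECK:" line, and count_passive_results is a hypothetical helper name, not part of Nagios.

```shell
#!/bin/sh
# Hypothetical helper: tally the passive results Nagios logged for one host.
# Assumes log lines of the form
#   [epoch] PASSIVE SERVICE CHECK: host;service;state;output
count_passive_results() {
    # $1 = path to nagios.log, $2 = host name
    awk -v host="$2" '
        / PASSIVE SERVICE CHECK: / {
            # Strip everything up to and including the marker, then split on ";"
            sub(/^.* PASSIVE SERVICE CHECK: /, "")
            split($0, f, ";")
            if (f[1] == host) seen[f[2] ";" f[3]]++
        }
        # One output line per service/state pair: service;state;count
        END { for (k in seen) print k ";" seen[k] }
    ' "$1" | sort
}
```

Called as `count_passive_results /usr/local/nagios/var/nagios.log Streams` (the log path is an assumption for a source install), it prints one `service;state;count` line per combination, which can be held against the roughly 20 records the script writes per run.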

I would like to see NAGIOS set all online (return code 0) services to green/Ok and the offline ones (return code 2) to red/critical. If the state of the service changes NAGIOS should adapt the service color/state. Unfortunately I get neither of those to work.

My thanks in advance for any response! If you need any additional data please let me know!

Best regards, Joost

Re: Passive checks incorrectly/partially processed and PENDI

Posted: Fri Feb 02, 2018 3:49 pm
by tgriep
What version of Nagios are you running?
If you are sending the Offline alerts with the same state but a different message, they will not be logged unless State Stalking is set up.
https://assets.nagios.com/downloads/nag ... lking.html

A couple of things: if there are more than 2 Nagios services running, that could cause the issue you are having.
Stop the nagios service and verify that all of them are stopped. Kill them off if there are any left.
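
A minimal sketch of that verification, assuming the daemon's process name is `nagios` and a `ps` that accepts the POSIX `-e -o comm=` options; count_named_processes is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count running processes whose command name matches exactly.
count_named_processes() {
    # $1 = process name to look for, e.g. nagios
    # "ps -e -o comm=" prints one command name per line without a header.
    ps -e -o comm= | awk -v name="$1" '$1 == name { n++ } END { print n + 0 }'
}
```

After stopping the service, `count_named_processes nagios` would be expected to print 0; any other value means processes are still running and need to be killed before restarting.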

You may want to disable parallelize_check for those services.
What that does is allow more than one of the service checks to run at the same time and if the same check runs, it could cause the issue you are having.
Try that and see if that fixes the issue.

Re: Passive checks incorrectly/partially processed and PENDI

Posted: Mon Feb 05, 2018 3:54 am
by JoostICM
hello tgriep,

Thanks for your reply!

1. Having read the source you referred to, I don't think stalking is the answer. It says:
"If the plugin always returns the same text output for a particular state, there is no reason to enable stalking for that state."
As long as the status of the stream doesn't change, the log messages are the same and the text output doesn't change. I have only two possible values in that field: "Stream is ONLINE" and "Stream is OFFLINE". Or do you suggest this for a reason other than the log message?

2. I have verified the number of daemons running. When I shut down the nagios process, no nagios entry is left in the ps output, so I guess this isn't the case.

3. I will disable those and see what effect it has. Thanks for pointing that out!

Thanks!
Joost

Re: Passive checks incorrectly/partially processed and PENDI

Posted: Mon Feb 05, 2018 5:52 am
by JoostICM
I've checked everything you pointed at.

When I disabled parallel checking I saw more negatives come in (crits) but still no state change on Ok. The checks that are ok stay on PENDING.

Also, after the stale timeout has passed, all go to critical yet again. NAGIOS seems not to notice that it has received multiple records for all affected services, as even the ones set to offline (critical state) go stale while it receives check results every 10 minutes and the stale timeout is roughly 1.5 hours (freshness_threshold 4800, i.e. 80 minutes).
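
The arithmetic behind that expectation can be sketched as follows, using the freshness_threshold of 4800 seconds from the service declaration and the 10-minute (600-second) cron interval; results_per_window is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: how many passive results should arrive within one
# freshness window if submissions are processed as expected?
results_per_window() {
    # $1 = freshness_threshold in seconds, $2 = submission interval in seconds
    echo $(( $1 / $2 ))
}
```

`results_per_window 4800 600` prints 8, so a service going stale means none of about eight consecutive submissions was accepted for it.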

I tried disabling volatile services just to try something different but this didn't affect the problem at all.

I'm running NAGIOS Core version 4.3.4.

Re: Passive checks incorrectly/partially processed and PENDI

Posted: Mon Feb 05, 2018 1:39 pm
by npolovenko
@JoostICM, we do have a recommended script for submitting passive service check results:
https://old.nagios.org/developerinfo/ex ... and_id=114
Can you run it against your service and let us know whether it works?
Also, please upload the script that you previously used to populate the cmd file.
Thank you.

Re: Passive checks incorrectly/partially processed and PENDI

Posted: Tue Feb 06, 2018 9:17 am
by JoostICM
Hello,

My submission script, found somewhere online:

Code: Select all

#!/bin/sh

# Write a command to the Nagios command file to cause
# it to process a service check result

echocmd="/bin/echo"

CommandFile="/usr/local/nagios/var/rw/nagios.cmd"

# get the current date/time in seconds since UNIX epoch
datetime=`date +%s`

# create the command line to add to the command file
cmdline="[$datetime] PROCESS_SERVICE_CHECK_RESULT;$1;$2;$3;$4"

# append the command to the end of the command file
`$echocmd $cmdline >> $CommandFile`
The script that calls this one does that as follows:

Code: Select all

/usr/local/nagios/libexec/eventhandlers/submit_check_result Streams $var1 $var2 $var3
With:
var1: Service Name
var2: Alarm State (0 or 2)
var3: Additional info text (plugin output, "Stream is ONLINE/OFFLINE" text)
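
For comparison, a slightly more defensive variant of such a submission helper could look like the sketch below. Nagios creates its external command file as a named pipe, so the sketch refuses to write to anything else at that path; it also quotes every argument so that spaces in the plugin output are preserved. The function name submit_passive_result and its argument order are assumptions for illustration, not part of Nagios.

```shell
#!/bin/sh
# Hypothetical helper: submit one passive service check result.
submit_passive_result() {
    # $1 = command file, $2 = host, $3 = service, $4 = return code, $5 = plugin output
    if [ ! -p "$1" ]; then
        # A regular file or a missing path would never be read by the daemon.
        echo "not a named pipe: $1" >&2
        return 1
    fi
    # Build the record with the current epoch time and write it to the pipe.
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$(date +%s)" "$2" "$3" "$4" "$5" > "$1"
}
```

A call such as `submit_passive_result /usr/local/nagios/var/rw/nagios.cmd Streams Stream1 0 "Stream is ONLINE"` returns a non-zero status if the command file is not a pipe, which makes that failure visible in the calling script.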

As the dispatch into the cmd file is correctly formatted I doubt changing this script will do anything, but I have seen weirder things in my work space ;-)

Thanks again!

Greetings Joost

Re: Passive checks incorrectly/partially processed and PENDI

Posted: Tue Feb 06, 2018 9:35 am
by JoostICM
Update: I tried the passive result script you provided.

Adapted script (running on Ubuntu):
#!/bin/sh
# This is a sample shell script showing how you can submit the PROCESS_SERVICE_CHECK_RESULT command
# to Nagios. Adjust variables to fit your environment as necessary.

now=`date +%s`
commandfile='/usr/local/nagios/var/rw/nagios.cmd'

/usr/bin/printf "[%lu] PROCESS_SERVICE_CHECK_RESULT;Streams;Stream1;0;OK- Everything Looks Great\n" $now > $commandfile
Output in CMD file:

Code: Select all

 [1517927427] PROCESS_SERVICE_CHECK_RESULT;Streams;Stream1;0;OK- Everything Looks Great 
Check stays on Pending. No result visible in NAGIOS Core GUI.

Re: Passive checks incorrectly/partially processed and PENDI

Posted: Tue Feb 06, 2018 11:42 am
by npolovenko
@JoostICM, Let's check whether you have any problems with the time on your system.
Please run this command to check the local time:

Code: Select all

date 
And to check the PHP time:

Code: Select all

php -a
echo (new \DateTime())->format('Y-m-d H:i:s');
Next, please make sure that SELinux is disabled:

Code: Select all

sestatus
Also, to be able to execute external commands:
Please navigate to the /usr/local/nagios/etc/ folder, open nagios.cfg file and make sure that:

Code: Select all

check_external_commands=1
Please run this command to rule out cron issues caused by an expired account (it disables password and account expiry for the nagios user):

Code: Select all

chage -I -1 -m 0 -M 99999 -E -1 nagios
And please show us the contents of this file:

Code: Select all

nano /etc/security/access.conf

Does the script to submit external commands that I provided to you work with any other service checks?

Re: Passive checks incorrectly/partially processed and PENDI

Posted: Fri Feb 09, 2018 10:53 am
by JoostICM
Hi npolovenko,

Thanks for your response.

Here are the outputs of the suggested commands:

Code: Select all

user@nagios-server:~$ php -a
Interactive mode enabled

php > echo (new \DateTime())->format('Y-m-d H:i:s');
2018-02-09 11:25:07
php > quit
user@nagios-server:~$ date
Fri Feb  9 11:25:11 CET 2018
user@nagios-server:~$ sestatus
-bash: sestatus: command not found
user@nagios-server:~$ cat /usr/local/nagios/etc/nagios.cfg | grep check_external_commands
check_external_commands=1
root@nagios-server:~# chage -I -1 -m 0 -M 99999 -E -1 nagios
root@nagios-server:~#
My /etc/security/access.conf has no directives set. It contains only commented lines (the standard text with instructions on how to use the file), so I assume providing the contents is not needed.

I haven't tried the submission script on another service yet as I have no other passives configured. I will create a test and try submitting results via the script.

Re: Passive checks incorrectly/partially processed and PENDI

Posted: Fri Feb 09, 2018 3:26 pm
by npolovenko
@JoostICM, I would like to see the following three files: status.dat, objects.cache and nagios.log.

Code: Select all

/usr/local/nagios/var/status.dat
/usr/local/nagios/var/objects.cache
/usr/local/nagios/var/nagios.log
"I haven't tried the submission script on another service yet as I have no other passives configured. I will create a test and try submitting results via the script."
Yeah, that would help.
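
Before uploading the whole status.dat, the entry for a single service can be pulled out to see what Nagios currently holds for it. The sketch below assumes the usual status.dat layout of `servicestatus { ... }` blocks containing `key=value` lines; service_status is a hypothetical helper name, and the selection of fields shown is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: print a few status fields for one host/service pair.
service_status() {
    # $1 = path to status.dat, $2 = host name, $3 = service description
    awk -v host="$2" -v svc="$3" '
        /^servicestatus \{/ { inblock = 1; block = ""; h = ""; s = ""; next }
        inblock && /^[ \t]*\}/ {
            # End of block: emit it only if it belongs to the requested service.
            if (h == host && s == svc) printf "%s", block
            inblock = 0
            next
        }
        inblock {
            line = $0
            sub(/^[ \t]+/, "", line)
            if (line ~ /^host_name=/) h = substr(line, 11)
            if (line ~ /^service_description=/) s = substr(line, 21)
            if (line ~ /^(has_been_checked|current_state|last_check|plugin_output)=/)
                block = block line "\n"
        }
    ' "$1"
}
```

For a service stuck on PENDING, `service_status /usr/local/nagios/var/status.dat Streams Stream1` would be expected to show `has_been_checked=0`, which indicates that no result has been applied to it yet.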