Passive checks incorrectly/partially processed and PENDING

An open discussion forum for obtaining help with Nagios Core. Nagios Core users of all experience levels are welcome here. Subforums have been created for the discussion of Nagios Core and Nagios Plugin development.

NOTE: The SourceForge.net mailing lists have been deprecated in favor of this forum in order to expedite support and provide additional features not available on the old mailing list.


Postby JoostICM » Fri Feb 02, 2018 11:42 am

Hello,

I hope someone can help me with the following problem I am experiencing with passive checks on NAGIOS Core.

We only recently started deploying passive checks, so I am not at all experienced in implementing them. It may be that I am making a stupid mistake, but we have searched for the problem for a while without finding anything. I hope someone here can help me get this up and running.

I have a cron job script on the NAGIOS host itself that retrieves the state of a number of internet MPEG streams and checks whether they are up or down. The script runs every 10 minutes and, after some processing, writes PROCESS_SERVICE_CHECK_RESULT records to the command file. When I tail the command file (standard location and name) while the script runs, I see the commands being submitted correctly, so everything from running the script and retrieving the info to processing it and writing the output to the cmd file apparently works:

Example of records submitted to the cmd file:
Code: Select all
[1517587415] PROCESS_SERVICE_CHECK_RESULT;Streams;Stream1;2;Stream is OFFLINE
[1517587415] PROCESS_SERVICE_CHECK_RESULT;Streams;Stream2;2;Stream is OFFLINE
[1517587416] PROCESS_SERVICE_CHECK_RESULT;Streams;Stream3;0;Stream is ONLINE
[1517587416] PROCESS_SERVICE_CHECK_RESULT;Streams;Stream4;0;Stream is ONLINE
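To compare each submitted record with the object definitions, it can help to split a line into its fields. The following is a minimal sketch in POSIX sh, assuming the `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output` layout shown in the records above; `describe_check_result` is a hypothetical helper, not part of Nagios. The host and service fields should match `host_name` and `service_description` in the object definitions exactly.

```shell
#!/bin/sh
# Hypothetical helper: split one external-command line into its fields.
# Assumed layout (as in the records above):
#   [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output
describe_check_result() {
    line=$1
    # timestamp: text between the leading "[" and the first "]"
    ts=${line#\[}
    ts=${ts%%\]*}
    # everything after "] " is the semicolon-separated command
    rest=${line#*\] }
    # peel off the first four fields; the output keeps any remaining text
    cmd=${rest%%;*};     rest=${rest#*;}
    host=${rest%%;*};    rest=${rest#*;}
    service=${rest%%;*}; rest=${rest#*;}
    code=${rest%%;*};    output=${rest#*;}
    printf 'timestamp=%s\ncommand=%s\nhost=%s\nservice=%s\nreturn_code=%s\noutput=%s\n' \
        "$ts" "$cmd" "$host" "$service" "$code" "$output"
}
```

For example, `describe_check_result "[1517587415] PROCESS_SERVICE_CHECK_RESULT;Streams;Stream1;2;Stream is OFFLINE"` prints one `name=value` line per field.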


Host declaration:
Code: Select all
define host {
        use                             linux-server
        host_name                       Streams
        alias                           Check Streams
        address                         127.0.0.1
        check_period                    24x7
        check_command                   check-host-alive
        notification_period             24x7
}


Service declaration:
Code: Select all
define service {
        use                             generic-service
        host_name                       Streams
        service_description             Stream1
        active_checks_enabled           0                       ; Active service checks are disabled
        passive_checks_enabled          1
        parallelize_check               1
        is_volatile                     1
        obsess_over_service             0
        check_freshness                 1                       ; Check service 'freshness'
        freshness_threshold             4800                    ; How fresh must the check be?
        check_command                   gone_stale              ; Report staleness
}


gone_stale as declared in the command definitions:
Code: Select all
define command {
        command_name gone_stale
        command_line /usr/lib/nagios/plugins/check_dummy 2 "There hasn't been an update recently!"
}


I also noticed that this command works: once the freshness threshold has passed, it switches all services to critical. This is processed completely, so all of them go critical, even the 3 of 7 that had already been processed. But I have submitted numerous results every 10 minutes, with return code 0 for more than half of them and 2 for the others, so this should not happen. Why these go stale is beyond me.
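The freshness arithmetic can be sketched as follows. This is a simplified model (an assumption: it ignores the extra latency allowance Nagios adds on top of freshness_threshold), and `freshness_state` is a hypothetical helper. With results every 600 seconds and a threshold of 4800 seconds, a service should only go stale if its last check time stops advancing, i.e. if the submitted results are not actually being accepted.

```shell
#!/bin/sh
# Hypothetical helper: simplified freshness test.
# A result counts as stale once more than <threshold> seconds have passed
# since <last_check>. Assumption: the extra latency allowance that Nagios
# adds is ignored here.
# Usage: freshness_state <last_check_epoch> <threshold_seconds> [now_epoch]
freshness_state() {
    last_check=$1
    threshold=$2
    now=${3:-$(date +%s)}
    age=$((now - last_check))
    if [ "$age" -gt "$threshold" ]; then
        echo "stale (age ${age}s > ${threshold}s)"
    else
        echo "fresh (age ${age}s <= ${threshold}s)"
    fi
}
```

For example, `freshness_state 1517587415 4800 1517588015` prints `fresh (age 600s <= 4800s)`, which is the expected situation with a 10-minute submission interval.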

Switches enabled in nagios.cfg:
Code: Select all
accept_passive_service_checks=1
check_service_freshness=1


What happens is the following:
- The script submits 7 OFFLINE (critical) results and around 11 ONLINE (OK) results.
- Only 3 of those OFFLINE services go critical; the other 4 remain PENDING.
- None of the return code 0 services change state in the GUI; they all stay PENDING.
- Although the script writes close to 20 results to the command file, nagios.log shows only 3 of the 7 critical results being processed and none of the OK results.
- Submitting passive check results via the GUI does not change the service state to the result entered either; the check stays PENDING. The record can be seen in the cmd file, but nagios.log doesn't mention any processing of it. This happens both when submitting an OK and a Critical.
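One way to quantify the mismatch is to count, per service, how many submitted and how many processed results nagios.log records. A minimal sketch, assuming `log_external_commands=1` and `log_passive_checks=1` in nagios.cfg and log lines of the form `EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;...` and `PASSIVE SERVICE CHECK: ...`; the exact wording may differ between versions, and `passive_summary` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: count log lines for one service.
# Assumed nagios.log formats (log_external_commands=1, log_passive_checks=1):
#   [ts] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
#   [ts] PASSIVE SERVICE CHECK: host;service;code;output
# Usage: passive_summary <nagios.log> <host> <service>
passive_summary() {
    logfile=$1
    host=$2
    service=$3
    # the trailing ";" keeps Stream1 from also matching Stream10
    ext=$(grep -F -c "EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;${host};${service};" "$logfile")
    pas=$(grep -F -c "PASSIVE SERVICE CHECK: ${host};${service};" "$logfile")
    printf '%s;%s external_commands=%s passive_checks=%s\n' "$host" "$service" "$ext" "$pas"
}
```

Running it for each stream, e.g. `passive_summary /usr/local/nagios/var/nagios.log Streams Stream1`, would show whether a result was read from the pipe but not processed, or never read at all.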

I would like NAGIOS to set all online (return code 0) services to green/OK and the offline ones (return code 2) to red/critical, and to update the service color/state whenever the state of a stream changes. Unfortunately, I can't get either of these to work.

My thanks in advance for any response! If you need any additional data please let me know!

Best regards, Joost
JoostICM
 
Posts: 8
Joined: Fri Feb 02, 2018 10:33 am

Re: Passive checks incorrectly/partially processed and PENDI

Postby tgriep » Fri Feb 02, 2018 3:49 pm

What version of Nagios are you running?
If you are sending the Offline alerts with the same state but a different message, they will not be logged unless State Stalking is set up.
https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/stalking.html

A couple of things: if there are more than 2 Nagios services running, that could cause the issue you are having.
Stop the nagios service and verify that all of them are stopped. Kill them off if there are any left.
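A quick way to count running daemons, as a minimal sketch: Nagios Core 4 also starts worker processes (shown by ps as `nagios --worker ...`), so those are filtered out here. `count_nagios_daemons` is a hypothetical helper, and it assumes the binary appears in ps as a path ending in `nagios`, as in a source install under /usr/local/nagios; a packaged binary with a different name would not be counted.

```shell
#!/bin/sh
# Hypothetical helper: count Nagios main daemons from "ps -eo pid=,args="
# output on stdin. Worker processes ("nagios --worker ...") are excluded.
# Assumption: the executable path ends in "nagios".
count_nagios_daemons() {
    awk '
        # field 1 is the PID, field 2 the executable path
        $2 ~ /(^|\/)nagios$/ && $0 !~ /--worker/ { n++ }
        END { print n + 0 }
    '
}

# Usage: ps -eo pid=,args= | count_nagios_daemons
```

A result above 1 would point to several main daemons running at the same time.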

You may want to disable parallelize_check for those services.
That setting allows more than one service check to run at the same time, and if the same check runs twice concurrently, it could cause the issue you are having.
Try that and see if that fixes the issue.
Be sure to check out our Knowledgebase for helpful articles and solutions!
tgriep
Madmin
 
Posts: 6435
Joined: Thu Oct 30, 2014 9:02 am

Re: Passive checks incorrectly/partially processed and PENDI

Postby JoostICM » Mon Feb 05, 2018 3:54 am

hello tgriep,

Thanks for your reply!

1. Reading the page you referred to, I don't think stalking is the answer:

If the plugin always returns the same text output for a particular state, there is no reason to enable stalking for that state.


As long as the status of the stream doesn't change, the log messages are the same and the text output doesn't change. There are only two possible values in that field: "Stream is ONLINE" and "Stream is OFFLINE". Or are you suggesting this for a reason other than the log message?

2. I have verified the number of daemons running. When I shut down the nagios process, no nagios entries are left in the ps output, so I guess this isn't the case.

3. I will disable those and see what effect it has. Thanks for pointing that out!

Thanks!
Joost

Re: Passive checks incorrectly/partially processed and PENDI

Postby JoostICM » Mon Feb 05, 2018 5:52 am

I've checked everything you pointed at.

When I disabled parallel checking I saw more negatives (criticals) come in, but still no state change for the OK results: those checks stay PENDING.

Also, once the freshness threshold has passed, all services go critical yet again. NAGIOS does not seem to notice that it has received multiple results for every affected service: even the ones already set to offline (critical state) go stale, although it receives check results every 10 minutes and the freshness threshold is 4800 seconds (80 minutes).

I also tried disabling is_volatile, just to try something different, but this didn't affect the problem at all.

I'm running NAGIOS Core version 4.3.4.

Re: Passive checks incorrectly/partially processed and PENDI

Postby npolovenko » Mon Feb 05, 2018 1:39 pm

@JoostICM, we have a recommended script for submitting passive service check results:
https://old.nagios.org/developerinfo/ex ... and_id=114
Can you run it against your service and let us know whether it works?
Also, please upload the script that you previously used to populate the cmd file.
Thank you.
npolovenko
Support Tech
 
Posts: 1315
Joined: Mon May 15, 2017 5:00 pm

Re: Passive checks incorrectly/partially processed and PENDI

Postby JoostICM » Tue Feb 06, 2018 9:17 am

Hello,

My submission script, found somewhere online:

Code: Select all
#!/bin/sh

# Write a command to the Nagios command file to cause
# it to process a service check result

CommandFile="/usr/local/nagios/var/rw/nagios.cmd"

if [ "$#" -lt 4 ]; then
    echo "Usage: $0 <host> <service> <return_code> <plugin_output>" >&2
    exit 1
fi

# get the current date/time in seconds since UNIX epoch
datetime=$(date +%s)

# create the command line to add to the command file
cmdline="[$datetime] PROCESS_SERVICE_CHECK_RESULT;$1;$2;$3;$4"

# append the command to the end of the command file; quoting keeps the
# line intact (no word splitting, no glob expansion of the "[timestamp]")
printf '%s\n' "$cmdline" >> "$CommandFile"


The script that calls this one does that as follows:

Code: Select all
/usr/local/nagios/libexec/eventhandlers/submit_check_result Streams "$var1" "$var2" "$var3"


With:
var1: Service Name
var2: Alarm State (0 or 2)
var3: Additional info text (plugin output, "Stream is ONLINE/OFFLINE" text)
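A sketch of the calling loop with every argument quoted, so that output text containing spaces, such as "Stream is ONLINE", reaches submit_check_result as a single fourth argument instead of being split into three. The `service|state|text` input format, `submit_all`, and the overridable `SUBMIT` variable are hypothetical, since the thread does not show how the real script collects the stream states.

```shell
#!/bin/sh
# Path of the submission script; overridable (hypothetical) for testing.
SUBMIT=${SUBMIT:-/usr/local/nagios/libexec/eventhandlers/submit_check_result}

# Hypothetical input: one "service|state|text" record per line on stdin,
# e.g. "Stream1|2|Stream is OFFLINE".
submit_all() {
    while IFS='|' read -r var1 var2 var3; do
        # skip blank lines
        [ -n "$var1" ] || continue
        # quote every argument so the output text stays one argument ($4)
        "$SUBMIT" Streams "$var1" "$var2" "$var3"
    done
}
```

For example: `printf 'Stream1|2|Stream is OFFLINE\n' | submit_all`.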

Since the lines written to the cmd file are correctly formatted, I doubt changing this script will make a difference, but I have seen weirder things at work ;-)

Thanks again!

Greetings Joost
Last edited by JoostICM on Tue Feb 06, 2018 9:36 am, edited 1 time in total.

Re: Passive checks incorrectly/partially processed and PENDI

Postby JoostICM » Tue Feb 06, 2018 9:35 am

Update: I tried the passive result script you provided.

Adapted script (running on Ubuntu):
Code: Select all
#!/bin/sh
# This is a sample shell script showing how you can submit the PROCESS_SERVICE_CHECK_RESULT command
# to Nagios. Adjust variables to fit your environment as necessary.

now=`date +%s`
commandfile='/usr/local/nagios/var/rw/nagios.cmd'

/usr/bin/printf "[%lu] PROCESS_SERVICE_CHECK_RESULT;Streams;Stream1;0;OK- Everything Looks Great\n" $now > $commandfile


Output in CMD file:
Code: Select all
[1517927427] PROCESS_SERVICE_CHECK_RESULT;Streams;Stream1;0;OK- Everything Looks Great


The check stays PENDING, and no result is visible in the NAGIOS Core GUI.
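Nagios creates the external command file as a named pipe (FIFO) when it starts, so one thing worth ruling out is that something else has left a regular file at that path, in which case the lines written to it would not reach the daemon. A minimal sketch of that check; `check_command_file` is a hypothetical helper, and the default path is the one used in the scripts above.

```shell
#!/bin/sh
# Hypothetical helper: verify the external command file is a named pipe.
# Usage: check_command_file [path]
check_command_file() {
    f=${1:-/usr/local/nagios/var/rw/nagios.cmd}
    if [ -p "$f" ]; then
        echo "ok: $f is a named pipe"
    elif [ -e "$f" ]; then
        echo "problem: $f exists but is not a named pipe"
        return 1
    else
        echo "problem: $f does not exist (is Nagios running?)"
        return 1
    fi
}
```

The `ls -l` output for a named pipe also starts with `p`, which gives the same information by hand.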

Re: Passive checks incorrectly/partially processed and PENDI

Postby npolovenko » Tue Feb 06, 2018 11:42 am

@JoostICM, Let's check whether you have any problems with the time on your system.
Please run this command to check the local time:
Code: Select all
date

And to check the PHP time:
Code: Select all
php -a
echo (new \DateTime())->format('Y-m-d H:i:s');
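Since every external command carries an epoch timestamp, a clock problem can also be checked by comparing that timestamp with `date +%s`, which, unlike plain `date`, does not depend on the timezone. A minimal sketch; `command_age` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: seconds between the "[timestamp]" at the start of an
# external-command line and the current clock (positive = line is in the past).
# Usage: command_age "<command line>" [now_epoch]
command_age() {
    line=$1
    now=${2:-$(date +%s)}
    ts=${line#\[}
    ts=${ts%%\]*}
    echo $((now - ts))
}
```

For example, `command_age "[1517927427] PROCESS_SERVICE_CHECK_RESULT;Streams;Stream1;0;OK- Everything Looks Great" 1517927430` prints `3`.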


Next, please make sure that SELinux is disabled:
Code: Select all
sestatus


Also, to allow external commands to be executed, open the nagios.cfg file in the /usr/local/nagios/etc/ folder and make sure that it contains:
Code: Select all
check_external_commands=1
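To review all the passive-check related settings at once rather than one grep at a time, here is a minimal sketch. The directive names are standard nagios.cfg options; `show_passive_settings` is a hypothetical helper that skips commented lines and prints the last uncommented occurrence of each directive.

```shell
#!/bin/sh
# Hypothetical helper: print the nagios.cfg directives passive checks rely on.
# Commented lines (starting with #) are ignored; "<not set>" means no
# uncommented line was found.
# Usage: show_passive_settings [path/to/nagios.cfg]
show_passive_settings() {
    cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
    for name in check_external_commands accept_passive_service_checks \
                check_service_freshness command_file log_passive_checks; do
        value=$(sed -n "s/^[[:space:]]*${name}=//p" "$cfg" | tail -n 1)
        printf '%s=%s\n' "$name" "${value:-<not set>}"
    done
}
```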


Please run this command to rule out cron issues caused by password or account expiry of the nagios user:
Code: Select all
chage -I -1 -m 0 -M 99999 -E -1 nagios


And please show us the contents of this file:
Code: Select all
nano /etc/security/access.conf



Does the script to submit external commands that I provided to you work with any other service checks?

Re: Passive checks incorrectly/partially processed and PENDI

Postby JoostICM » Fri Feb 09, 2018 10:53 am

Hi npolovenko,

Thanks for your response.

Here are the outputs of the suggested commands:

Code: Select all
user@nagios-server:~$ php -a
Interactive mode enabled

php > echo (new \DateTime())->format('Y-m-d H:i:s');
2018-02-09 11:25:07
php > quit
user@nagios-server:~$ date
Fri Feb  9 11:25:11 CET 2018
user@nagios-server:~$ sestatus
-bash: sestatus: command not found
user@nagios-server:~$ cat /usr/local/nagios/etc/nagios.cfg | grep check_external_commands
check_external_commands=1
root@nagios-server:~# chage -I -1 -m 0 -M 99999 -E -1 nagios
root@nagios-server:~#


My /etc/security/access.conf has no directives set; it contains only commented lines (the standard text explaining how to use the file), so I assume its contents aren't needed.

I haven't tried the submission script on another service yet as I have no other passives configured. I will create a test and try submitting results via the script.

Re: Passive checks incorrectly/partially processed and PENDI

Postby npolovenko » Fri Feb 09, 2018 3:26 pm

@JoostICM , I would like to see the following three files: status.dat, objects.cache and nagios.log.
Code: Select all
/usr/local/nagios/var/status.dat
/usr/local/nagios/var/objects.cache
/usr/local/nagios/var/nagios.log
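To see what Nagios itself has recorded for a single service, the relevant block can be pulled out of status.dat. A minimal sketch assuming the usual `servicestatus { key=value ... }` layout; `service_status` is a hypothetical helper. A service that never accepted a result would presumably still show `has_been_checked=0`.

```shell
#!/bin/sh
# Hypothetical helper: print selected fields of one service's status block.
# Assumed status.dat layout:
#   servicestatus {
#       host_name=Streams
#       service_description=Stream1
#       ...
#       }
# Usage: service_status <status.dat> <host> <service>
service_status() {
    statusfile=$1; host=$2; service=$3
    awk -v h="$host" -v s="$service" '
        $1 == "servicestatus" && $2 == "{" { inblock = 1; hn = ""; sd = ""; buf = ""; next }
        inblock && $1 == "}" {
            # end of block: print the collected fields if it is our service
            if (hn == h && sd == s) printf "%s", buf
            inblock = 0; next
        }
        inblock {
            line = $0
            sub(/^[ \t]+/, "", line)
            eq = index(line, "=")
            if (eq == 0) next
            key = substr(line, 1, eq - 1)
            val = substr(line, eq + 1)
            if (key == "host_name") hn = val
            if (key == "service_description") sd = val
            if (key == "has_been_checked" || key == "check_type" || key == "current_state" ||
                key == "last_check" || key == "plugin_output")
                buf = buf key "=" val "\n"
        }
    ' "$statusfile"
}
```

For example: `service_status /usr/local/nagios/var/status.dat Streams Stream1`.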


JoostICM wrote: "I haven't tried the submission script on another service yet as I have no other passives configured. I will create a test and try submitting results via the script."

Yeah, that would help.
