Passive checks out of sync with log

Posted: Wed Oct 02, 2019 9:56 am
by invade
Hi.

We have various systems sending passive checks to Nagios via Gearman.

While investigating a problem I noticed a strange set of events.

I'm not sure whether this is a problem or "working as expected", but I don't understand what's happening, so I wondered if someone could explain.

Below is a series of checks being sent from a client, including a timestamp of when they were run:

Code:

[root@client ~]# date "+%d-%m-%Y @ %T %Z" ; /usr/bin/send_gearman --server=gearman --encryption=yes --key=${KEY} --host=${HOSTNAME} --service=Test --returncode=0 --message="Message 0 - $(date "+%d-%m-%Y @ %T %Z")"
02-10-2019 @ 15:26:19 BST
[root@client ~]# date "+%d-%m-%Y @ %T %Z" ; /usr/bin/send_gearman --server=gearman --encryption=yes --key=${KEY} --host=${HOSTNAME} --service=Test --returncode=0 --message="Message 1 - $(date "+%d-%m-%Y @ %T %Z")"
02-10-2019 @ 15:26:40 BST
[root@client ~]# date "+%d-%m-%Y @ %T %Z" ; /usr/bin/send_gearman --server=gearman --encryption=yes --key=${KEY} --host=${HOSTNAME} --service=Test --returncode=0 --message="Message 2 - $(date "+%d-%m-%Y @ %T %Z")"
02-10-2019 @ 15:27:40 BST
[root@client ~]# date "+%d-%m-%Y @ %T %Z" ; /usr/bin/send_gearman --server=gearman --encryption=yes --key=${KEY} --host=${HOSTNAME} --service=Test --returncode=0 --message="Message 3 - $(date "+%d-%m-%Y @ %T %Z")"
02-10-2019 @ 15:28:09 BST
[root@client ~]# date "+%d-%m-%Y @ %T %Z" ; /usr/bin/send_gearman --server=gearman --encryption=yes --key=${KEY} --host=${HOSTNAME} --service=Test --returncode=0 --message="Message 4 - $(date "+%d-%m-%Y @ %T %Z")"
02-10-2019 @ 15:29:09 BST
[root@client ~]# date "+%d-%m-%Y @ %T %Z" ; /usr/bin/send_gearman --server=gearman --encryption=yes --key=${KEY} --host=${HOSTNAME} --service=Test --returncode=2 --message="Message 5 - $(date "+%d-%m-%Y @ %T %Z")"
02-10-2019 @ 15:29:40 BST
[root@client ~]# date "+%d-%m-%Y @ %T %Z" ; /usr/bin/send_gearman --server=gearman --encryption=yes --key=${KEY} --host=${HOSTNAME} --service=Test --returncode=0 --message="Message 0 - $(date "+%d-%m-%Y @ %T %Z")"
02-10-2019 @ 15:30:20 BST
[root@client ~]# date "+%d-%m-%Y @ %T %Z" ; /usr/bin/send_gearman --server=gearman --encryption=yes --key=${KEY} --host=${HOSTNAME} --service=Test --returncode=0 --message="Message 1 - $(date "+%d-%m-%Y @ %T %Z")"
02-10-2019 @ 15:31:10 BST
[root@client ~]# date "+%d-%m-%Y @ %T %Z" ; /usr/bin/send_gearman --server=gearman --encryption=yes --key=${KEY} --host=${HOSTNAME} --service=Test --returncode=0 --message="Message 2 - $(date "+%d-%m-%Y @ %T %Z")"
02-10-2019 @ 15:31:40 BST
[root@client ~]# date "+%d-%m-%Y @ %T %Z" ; /usr/bin/send_gearman --server=gearman --encryption=yes --key=${KEY} --host=${HOSTNAME} --service=Test --returncode=0 --message="Message 3 - $(date "+%d-%m-%Y @ %T %Z")"
02-10-2019 @ 15:32:10 BST
[root@client ~]# date "+%d-%m-%Y @ %T %Z" ; /usr/bin/send_gearman --server=gearman --encryption=yes --key=${KEY} --host=${HOSTNAME} --service=Test --returncode=0 --message="Message 4 - $(date "+%d-%m-%Y @ %T %Z")"
02-10-2019 @ 15:32:50 BST
[root@client ~]# date "+%d-%m-%Y @ %T %Z" ; /usr/bin/send_gearman --server=gearman --encryption=yes --key=${KEY} --host=${HOSTNAME} --service=Test --returncode=2 --message="Message 5 - $(date "+%d-%m-%Y @ %T %Z")"
02-10-2019 @ 15:33:50 BST
Below are the corresponding entries in the Nagios log:

Code:

[Wed Oct  2 15:26:41 2019] PASSIVE SERVICE CHECK: client;Test;0;Message 0 - 02-10-2019 @ 15:26:19 BST
[Wed Oct  2 15:27:42 2019] PASSIVE SERVICE CHECK: client;Test;0;Message 1 - 02-10-2019 @ 15:26:40 BST
[Wed Oct  2 15:28:12 2019] PASSIVE SERVICE CHECK: client;Test;0;Message 2 - 02-10-2019 @ 15:27:40 BST
[Wed Oct  2 15:29:12 2019] PASSIVE SERVICE CHECK: client;Test;0;Message 3 - 02-10-2019 @ 15:28:09 BST
[Wed Oct  2 15:29:42 2019] PASSIVE SERVICE CHECK: client;Test;0;Message 4 - 02-10-2019 @ 15:29:09 BST
[Wed Oct  2 15:29:42 2019] SERVICE NOTIFICATION: client;client;Test;CRITICAL;notifyservice-client;Message 5 - 02-10-2019 @ 15:29:40 BST
[Wed Oct  2 15:29:42 2019] SERVICE ALERT: client;Test;CRITICAL;HARD;1;Message 5 - 02-10-2019 @ 15:29:40 BST
[Wed Oct  2 15:30:22 2019] PASSIVE SERVICE CHECK: client;Test;2;Message 5 - 02-10-2019 @ 15:29:40 BST
[Wed Oct  2 15:31:11 2019] PASSIVE SERVICE CHECK: client;Test;0;Message 0 - 02-10-2019 @ 15:30:20 BST
[Wed Oct  2 15:31:42 2019] PASSIVE SERVICE CHECK: client;Test;0;Message 1 - 02-10-2019 @ 15:31:10 BST
[Wed Oct  2 15:32:11 2019] PASSIVE SERVICE CHECK: client;Test;0;Message 2 - 02-10-2019 @ 15:31:40 BST
[Wed Oct  2 15:32:51 2019] PASSIVE SERVICE CHECK: client;Test;0;Message 3 - 02-10-2019 @ 15:32:10 BST
[Wed Oct  2 15:33:52 2019] PASSIVE SERVICE CHECK: client;Test;0;Message 4 - 02-10-2019 @ 15:32:50 BST
[Wed Oct  2 15:33:52 2019] SERVICE NOTIFICATION: client;client;Test;CRITICAL;notifyservice-client;Message 5 - 02-10-2019 @ 15:33:50 BST
[Wed Oct  2 15:33:52 2019] SERVICE ALERT: client;Test;CRITICAL;HARD;1;Message 5 - 02-10-2019 @ 15:33:50 BST
What I noticed is that the first check, run at 15:26:19, is not logged by Nagios until the next check is run at 15:26:40. That check in turn is not logged until the following one is run at 15:27:40, and so on.

This continues until I send a check at 15:29:40 with a different return code. This prompts the previous check, run at 15:29:09, to be logged, followed immediately by the alert and notification.
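
To make the lag easier to see, a small helper along these lines could pair each log timestamp with the time embedded in the check message. This is only a sketch: `passive_lag` is a hypothetical name, it assumes the log lines are in the human-readable form pasted above (not raw epoch timestamps), and it assumes both times fall on the same day.

```shell
# Hypothetical helper: report how long after it was sent each passive
# result was logged. Reads log lines (in the format shown above) on stdin.
passive_lag() {
  awk '/PASSIVE SERVICE CHECK:/ {
    split($4, l, ":")        # time of the Nagios log entry (HH:MM:SS)
    split($(NF-1), s, ":")   # time embedded in the check message (HH:MM:SS)
    lag = (l[1]*3600 + l[2]*60 + l[3]) - (s[1]*3600 + s[2]*60 + s[3])
    printf "sent %s logged %s lag %ds\n", $(NF-1), $4, lag
  }'
}
```

Feeding it the excerpt above (for example `passive_lag < excerpt.txt`, where the file name is made up) reports a lag of 22s for the first entry and 62s for the second, which matches the gap until the next check was submitted.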

Hopefully I've explained the situation clearly.

Software versions in use are the following packages for CentOS:
mod_gearman : 3.1.0
gearmand : 0.33-7
nagios : 4.4.3

If any additional information is required, just ask.

Thanks in advance.

Re: Passive checks out of sync with log

Posted: Thu Oct 03, 2019 3:24 pm
by eloyd
Nagios doesn't always log the result of every status check, but it does log the result of every status check where the result is different from the last time it ran. You might want to look into state stalking.
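
For reference, state stalking is enabled per service with the `stalking_options` directive, which makes Nagios log a check result whenever the plugin output changes, even if the state does not. A minimal sketch, assuming the service from the first post is defined roughly like this (the `generic-service` template and the other values are assumptions, not taken from the poster's configuration):

```
# Hypothetical service definition; only stalking_options is the point here.
define service {
    use                     generic-service   ; assumed template name
    host_name               client
    service_description     Test
    active_checks_enabled   0                 ; results arrive passively
    passive_checks_enabled  1
    stalking_options        o,w,u,c           ; stalk OK, WARNING, UNKNOWN and CRITICAL states
}
```

Whether this changes the timing seen in the log above would need to be tested; it only affects which results get logged.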

Re: Passive checks out of sync with log

Posted: Thu Oct 03, 2019 4:54 pm
by scottwilkerson
Thanks @eloyd

Re: Passive checks out of sync with log

Posted: Fri Oct 04, 2019 9:07 am
by invade
Many thanks for the explanation and suggestion.

Re: Passive checks out of sync with log

Posted: Fri Oct 04, 2019 1:47 pm
by benjaminsmith
invade wrote:Many thanks for the explanation and suggestion.
You're welcome. May we close this thread, or did you have any other questions?

Re: Passive checks out of sync with log

Posted: Mon Oct 07, 2019 3:50 am
by invade
Please close. Thank you.

Re: Passive checks out of sync with log

Posted: Mon Oct 07, 2019 6:49 am
by scottwilkerson
invade wrote:Please close. Thank you.
Great!

Locking