Passive checks seem to miss a beat

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Djarner
Posts: 2
Joined: Tue Jan 24, 2017 8:25 am


Post by Djarner »

Hi

We have a question regarding passive checks via NRDP.

Our Nagios XI 5.7.2 is running on Red Hat 6 (64-bit) in VMware.

We have created a host with 10 checks in Nagios XI, all passive.

They are updated from a client using a custom script.

If we run the script to update one service once, everything works as expected,
and if we run the script to update all 10 services once, everything works as well.
But if we update the same service with a new status or status information several times in quick succession,
it seems random which update is displayed.

If we space updates of the same service 10 seconds apart, the last update is shown almost all the time.

Our question is:
How fast can we update the same service and still expect the last update to be shown every time?

We have even tried using send_nrdp.sh as a test, to rule out any error in our custom code.

% cat test10.sh
PAUSE=10
NRDP_TOKEN='<very secret token>'   # placeholder, replace with the real token
for t in 1 2 3 4 5 6 7 8 9 10; do
  ./send_nrdp.sh -u https://nagiosserver/nrdp/ -t "$NRDP_TOKEN" -H testserver -s "batch job 1" -S 1 -o "Job run warning $t"
  sleep "$PAUSE"
  ./send_nrdp.sh -u https://nagiosserver/nrdp/ -t "$NRDP_TOKEN" -H testserver -s "batch job 1" -S 0 -o "Job run ok $t"
  sleep "$PAUSE"
done


This runs with similar results: with PAUSE under 10 the displayed result seems random,
while with 10 and above it works as expected and the last update is shown.

I have checked that send_nrdp.sh reports one check sent every time:

Sent 1 checks to https://nagiosserver/nrdp/
Sent 1 checks to https://nagiosserver/nrdp/
Sent 1 checks to https://nagiosserver/nrdp/
Sent 1 checks to https://nagiosserver/nrdp/
......

Please advise on the inner workings of Nagios, so we can figure out reasonable timings.

Regards.

Henrik
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises


Post by scottwilkerson »

This is governed by how frequently Nagios processes passive check results. The default in nagios.cfg is every 10 seconds:


check_result_reaper_frequency=10
https://assets.nagios.com/downloads/nag ... _frequency
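A minimal sketch for checking which value is actually set before tuning the sending interval. It assumes Bash; the helper name `get_reaper_frequency` is hypothetical, and it falls back to 10, the default quoted above, when the directive is absent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the check_result_reaper_frequency value from a
# nagios.cfg file. Commented-out lines (starting with #) are ignored; if the
# directive appears more than once, this sketch simply takes the last match.
# Falls back to 10, the default quoted in the reply above.
get_reaper_frequency() {
    local cfg="$1"
    local value
    value=$(grep -E '^[[:space:]]*check_result_reaper_frequency[[:space:]]*=' "$cfg" \
        | tail -n 1 | cut -d= -f2 | tr -d '[:space:]')
    echo "${value:-10}"
}
```

On a Nagios XI system the main configuration file is usually `/usr/local/nagios/etc/nagios.cfg` (an assumption; adjust for your install). A test loop like `test10.sh` could then derive its pause from it, e.g. `PAUSE=$(( $(get_reaper_frequency /usr/local/nagios/etc/nagios.cfg) + 1 ))`, where the extra second is an arbitrary safety margin rather than a documented requirement.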
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
mrmit
Posts: 11
Joined: Mon May 25, 2020 3:07 am


Post by mrmit »

Yes, that sounds correct, but why do we not get all the statuses? It misses 40% of the status updates if we update every second. Status updates within 10 seconds are inserted in random order, which we can live with, as long as they are all inserted correctly.
Djarner


Post by Djarner »

mrmit wrote: Yes, that sounds correct, but why do we not get all the statuses? It misses 40% of the status updates if we update every second. Status updates within 10 seconds are inserted in random order, which we can live with, as long as they are all inserted correctly.
As mrmit describes: if we send 10 updates in rapid succession, Nagios first shows one of them, seemingly at random.
I guess the inner workings of Nagios are responsible for this, since Nagios polls for status internally.

One would expect Nagios to show the last of the 10 updates the next time the web interface refreshes, but it does not.
It seems as if the other updates are forgotten or deleted after the first one is shown.

Regards

Djarner
scottwilkerson


Post by scottwilkerson »

You can lower this to 1:


check_result_reaper_frequency=1
Then restart Nagios.

This will process check results every second. Sending results more often than that is subject to the behavior you are seeing: the check results are placed in a queue directory, which is emptied at the interval set above, and they are not processed in any specific order.
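To make the ordering effect concrete, here is a toy model in Bash. It is not Nagios source code: it only mimics the behaviour described in this reply, where results are dropped into a queue directory and each periodic pass processes whatever files are present, in an order unrelated to arrival. The function names `queue_result` and `reap`, the `c` file prefix and the global variables are all hypothetical.

```shell
#!/usr/bin/env bash
# Toy model only, NOT how Nagios is implemented internally: it mimics the
# behaviour described above, where results wait in a queue directory and a
# periodic pass processes whatever is there in no particular order.

current_status=""   # what the "web interface" would show for the service
processed=0         # how many results the last reap pass handled

# Queue one check result as a file with a random name (via mktemp), so the
# file name carries no information about arrival order.
queue_result() {
    local spool="$1" output="$2"
    printf '%s\n' "$output" > "$(mktemp "$spool/cXXXXXX")"
}

# Process every queued file in glob (name) order. Because the names are random,
# this order is unrelated to the order in which results were queued. Each
# result overwrites current_status, so the last file processed "wins".
reap() {
    local spool="$1" f
    processed=0
    for f in "$spool"/c*; do
        [ -e "$f" ] || continue   # glob matched nothing
        current_status=$(cat "$f")
        processed=$((processed + 1))
        rm -f "$f"
    done
}
```

Queuing the ten outputs `Job run ok 1` to `Job run ok 10` and then calling `reap` once sets `processed` to 10, so no result is dropped, yet `current_status` ends up as `Job run ok 10` only if that file happens to sort last (roughly one time in ten). In this model every result is handled, but the one left on screen looks random.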
mrmit


Post by mrmit »

We understand the asynchronous queue and the random order in which check results are inserted; that is not the issue.

The issue is that it loses 40% of the status updates. That is a consistency problem: why does it lose data when we push one update per second?
scottwilkerson


Post by scottwilkerson »

mrmit wrote: We understand the asynchronous queue and the random order in which check results are inserted; that is not the issue.

The issue is that it loses 40% of the status updates. That is a consistency problem: why does it lose data when we push one update per second?
Are you sure it is losing them instead of just not processing them in the correct order?

I would set the following in your nagios.cfg:


log_passive_checks=1
Then restart Nagios.

You should then be able to see Nagios process each check in its log.
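With `log_passive_checks=1`, each processed result should appear in the main Nagios log as a `PASSIVE SERVICE CHECK:` line, which gives a way to test whether results are really lost or only processed out of order. The sketch below assumes Bash and log lines of the form `[epoch] PASSIVE SERVICE CHECK: host;service;state;output`; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for auditing passive results in a Nagios log file.
# Assumes entries shaped like:
#   [1590000000] PASSIVE SERVICE CHECK: <host>;<service>;<state>;<output>

# Count logged passive results for one host/service pair (prints 0 if none).
count_passive_checks() {
    local log="$1" host="$2" service="$3"
    grep -cF "PASSIVE SERVICE CHECK: ${host};${service};" "$log"
}

# Print the plugin output of each logged result, in log order.
# Fields 4 onward are kept, in case the output itself contains ';'.
list_passive_outputs() {
    local log="$1" host="$2" service="$3"
    grep -F "PASSIVE SERVICE CHECK: ${host};${service};" "$log" | cut -d';' -f4-
}
```

For example, after a burst from `test10.sh` with a short `PAUSE`, compare the count for `testserver` / `batch job 1` before and after the run (the log usually lives at `/usr/local/nagios/var/nagios.log` on Nagios XI, an assumption here). If the difference equals the number of results sent, none were dropped; `list_passive_outputs` then shows the order in which they were processed.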
mrmit


Post by mrmit »

OK, I think I understand now. Of course, when the check results are processed in random order, it will not show the ones where the state does not change (from OK to Warning, for example). That is why it seems to lose some check statuses.

I would think that enabling volatility on the service would fix that, so we could see all the updates. The description says exactly that, but it doesn't seem to work.

When we are using a template, options set on the service itself still take effect, right?
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm


Post by ssax »

You likely want to use State Stalking for the logging as well:

https://assets.nagios.com/downloads/nag ... lking.html

Correct, anytime you set something directly on the host/service it will override anything defined in a template.
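For reference, the object-configuration directives behind the two options discussed here are `is_volatile` and `stalking_options`. The fragment below is a sketch showing only those directives as overrides on the service from this thread; in Nagios XI they would normally be set through the Core Configuration Manager (CCM) rather than by editing files, and all other directives are assumed to come from the service's template. As we read the Nagios Core documentation, volatility only changes how repeated non-OK results are handled, while stalking logs a result whenever its output differs from the previous one.

```
define service {
    host_name            testserver
    service_description  batch job 1
    is_volatile          1          ; log/notify on every non-OK result, not only on state changes
    stalking_options     o,w,u,c    ; log output changes in OK, WARNING, UNKNOWN, CRITICAL
}
```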
mrmit


Post by mrmit »

I don't intentionally want to drag this out, but it still doesn't seem to work.
We have enabled volatility, stalking and obsessing, yet when I push OK status checks to the service, nothing appears in the service history. Shouldn't it show up there?