Passive checks reams to miss a beat
Hi
We have a question regarding passive checks submitted via NRDP.
Our Nagios XI 5.7.2 is running on Red Hat 6 (64-bit) in VMware.
We have created a host with 10 checks in Nagios XI, all passive.
They are updated from a client using a custom script.
If we run the script to update one service once, everything works as expected,
and if we run the script to update all 10 services once, everything works as well.
But if we update the same service with a new status or status information more than once in quick succession,
it seems random which update is displayed.
If we space updates of the same service 10 seconds apart, the last update is shown, almost all the time.
The question is:
How fast can we update the same service and still expect the last of the updates to be shown, every time?
We have even tried using send_nrdp.sh as a test, to rule out any error in our custom code.
% cat test10.sh
PAUSE=10
for t in 1 2 3 4 5 6 7 8 9 10; do
./send_nrdp.sh -u https://nagiosserver/nrdp/ -t <very secret token> -H testserver -s "batch job 1" -S 1 -o "Job run warning $t"
sleep $PAUSE
./send_nrdp.sh -u https://nagiosserver/nrdp/ -t <very secret token> -H testserver -s "batch job 1" -S 0 -o "Job run ok $t"
sleep $PAUSE
done
This runs with similar results: a PAUSE under 10 seems to give random results.
With 10 and above it seems to work as expected, and the last update is shown.
I have verified that send_nrdp.sh reports 1 check sent every time:
Sent 1 checks to https://nagiosserver/nrdp/
Sent 1 checks to https://nagiosserver/nrdp/
Sent 1 checks to https://nagiosserver/nrdp/
Sent 1 checks to https://nagiosserver/nrdp/
......
Please advise on the inner workings of Nagios, so we can figure out reasonable timings.
Regards.
Henrik
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Passive checks reams to miss a beat
This is governed by how frequently Nagios processes the passive check results. The default in nagios.cfg is every 10 seconds:
https://assets.nagios.com/downloads/nag ... _frequency
check_result_reaper_frequency=10
Re: Passive checks reams to miss a beat
Yes, that sounds correct - but why do we not get all the statuses? It misses 40% of the status updates if we update every 1s - and status updates within 10s are inserted in a random order, which we can live with, if they are just inserted correctly.
Re: Passive checks reams to miss a beat
mrmit wrote: Yes, that sounds correct - but why do we not get all the statuses? It misses 40% of the status updates if we update every 1s - and status updates within 10s are inserted in a random order, which we can live with, if they are just inserted correctly.
As mrmit describes, if we send 10 updates in rapid succession, Nagios at first shows only one of them, and which one seems random.
I guess the inner workings of Nagios are responsible for this: Nagios polls for status internally.
One would expect Nagios to show the last of the 10 updates the next time the web interface refreshes, but it does not.
It seems as if the rest of the 10 updates are forgotten or deleted after the first update.
Regards
Djarner
-
scottwilkerson
Re: Passive checks reams to miss a beat
You can drop this to 1
And restart nagios.
This will process checks every second, more frequently than that is subject to the behavior you are seeing as the checks are placed in a queue directory and not processed in a specific order as this is emptied at the interval in the setting above
Code: Select all
check_result_reaper_frequency=1This will process checks every second, more frequently than that is subject to the behavior you are seeing as the checks are placed in a queue directory and not processed in a specific order as this is emptied at the interval in the setting above
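The change described above could be scripted roughly as follows. This is only a sketch: the function name `set_reaper_frequency` is made up for illustration, and the config path and restart command in the usage note are assumptions about a typical Nagios XI install on Red Hat 6, not something confirmed in this thread.

```shell
#!/bin/bash
# Hypothetical helper: set check_result_reaper_frequency in a nagios.cfg-style file.
# Arguments: path to the config file, interval in whole seconds.
set_reaper_frequency() {
    local cfg="$1" seconds="$2" tmp
    # Reject anything that is not a positive integer.
    case "$seconds" in
        ''|*[!0-9]*|0) echo "seconds must be a positive integer" >&2; return 1 ;;
    esac
    tmp="$(mktemp)" || return 1
    if grep -q '^check_result_reaper_frequency=' "$cfg"; then
        # Directive already present: rewrite its value.
        sed "s/^check_result_reaper_frequency=.*/check_result_reaper_frequency=${seconds}/" "$cfg" > "$tmp"
    else
        # Directive missing: append it to the end of the file.
        { cat "$cfg"; echo "check_result_reaper_frequency=${seconds}"; } > "$tmp"
    fi
    # Write back through the original file so its ownership and mode are kept.
    cat "$tmp" > "$cfg" && rm -f "$tmp"
}
```

Assuming the main config lives at /usr/local/nagios/etc/nagios.cfg, this would be called as `set_reaper_frequency /usr/local/nagios/etc/nagios.cfg 1`, followed by a restart of Nagios (for example `service nagios restart` on Red Hat 6).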
Re: Passive checks reams to miss a beat
We understand the async queue and the random order in which check results are inserted; that is not the issue.
The issue is that it loses 40% of the status updates - that is a consistency problem. Why does it lose data when pushing 1 update per second?
-
scottwilkerson
Re: Passive checks reams to miss a beat
mrmit wrote: We understand the async queue and the random order in which check results are inserted; that is not the issue.
The issue is that it loses 40% of the status updates - that is a consistency problem. Why does it lose data when pushing 1 update per second?
Are you sure it is losing them instead of just not processing them in the correct order?
I would set the following in your nagios.cfg
log_passive_checks=1
You should be able to see it process each check
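With log_passive_checks=1 enabled, each processed result should appear in the Nagios log as a `PASSIVE SERVICE CHECK: host;service;state;output` line. A minimal sketch for counting those entries follows; the function names are made up for illustration, and the log path in the usage note is an assumption about a default install.

```shell
#!/bin/bash
# Hypothetical helper: count the passive results logged for one host/service pair.
# Arguments: log file, host name, service description.
count_passive_results() {
    local log="$1" host="$2" service="$3"
    # -F treats the pattern as a fixed string, -c prints the number of matching lines.
    grep -cF "PASSIVE SERVICE CHECK: ${host};${service};" "$log"
}

# Hypothetical helper: print "state;output" for each logged result, in log order.
list_passive_results() {
    local log="$1" host="$2" service="$3"
    grep -F "PASSIVE SERVICE CHECK: ${host};${service};" "$log" | cut -d';' -f3-
}
```

For the test script earlier in the thread this could be run as `count_passive_results /usr/local/nagios/var/nagios.log testserver "batch job 1"` (path assumed) and compared with the number of "Sent 1 checks" lines printed by send_nrdp.sh. If the two numbers match, nothing was lost and only the processing order differs.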
Re: Passive checks reams to miss a beat
OK, I think I understand now. Of course, when all the check statuses are processed in random order, it will not show the ones where the status does not change (from ok to warning, for example). That is of course why it seems to lose some check statuses.
I would think that enabling volatile on the service would fix that, so we could see all the updates. The description says exactly that, but it doesn't seem to work.
When we are using a template, options set on the service do take effect, right?
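The effect described in this post can be illustrated with a small sketch. It is a simplified model, not how Nagios is implemented: it ignores soft/hard states and retries, and only counts how many entries in a sequence of states differ from the one before. The function name is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: count state changes in a sequence of states given as arguments.
count_state_changes() {
    local prev="" changes=0 s
    for s in "$@"; do
        # A change is counted only when the state differs from the previous one.
        if [ -n "$prev" ] && [ "$s" != "$prev" ]; then
            changes=$((changes + 1))
        fi
        prev="$s"
    done
    echo "$changes"
}

# Order in which the test script sends the states (warning, ok, warning, ...):
#   count_state_changes 1 0 1 0 1 0    prints 5
# One possible order after unordered processing of the same six results:
#   count_state_changes 1 1 0 0 1 0    prints 3
```

In this model the same six results produce fewer visible state changes when two equal states happen to be processed back to back, which matches the impression that updates are being lost.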
Re: Passive checks reams to miss a beat
You likely want to use State Stalking for the logging as well:
https://assets.nagios.com/downloads/nag ... lking.html
Correct, anytime you set something directly on the host/service it will override anything defined in a template.
Re: Passive checks reams to miss a beat
I don't intentionally want to drag this out, but it still doesn't seem to work.
We have enabled volatile, stalking and obsessing; still, when I push OK status checks to the service, nothing appears in the service history. Won't it show up there?