Inconsistent NRDP performance

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Inconsistent NRDP performance

Post by cdienger

Sounds like real-time data isn't getting updated. This could be due to multiple Nagios processes or caching. Are there multiple Nagios processes running on any of the machines? "ps -ef | grep nagios.cfg" should show only two processes, similar to:

nagios 11232 1 3 12:27 ? 00:07:33 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 11276 11232 0 12:27 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg


Any Nagios processes beyond those two should be killed. The caching option can be found under Admin > System Config > Performance Settings > Backend Cache.
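The process check above can be scripted so it is easy to repeat on every server. This is a minimal Python sketch, assuming the standard `ps -ef` column layout (UID PID PPID C STIME TTY TIME CMD). The function names are hypothetical, and grouping by parent PID is an illustrative heuristic, not something Nagios provides: a nagios.cfg process whose parent is not another nagios.cfg process is counted as a separate top-level daemon. In the listing above, 11276 is a child of 11232, so there is one instance. The grep line in the sample is made up, and the script only reports; it does not kill anything.

```python
"""Count Nagios daemons from `ps -ef` output (illustrative sketch)."""
from __future__ import annotations

import subprocess
from dataclasses import dataclass


@dataclass
class PsRow:
    user: str
    pid: int
    ppid: int
    cmd: str


def parse_ps_ef(text: str) -> list[PsRow]:
    """Parse `ps -ef` lines: UID PID PPID C STIME TTY TIME CMD."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 7)  # CMD is the 8th field and may contain spaces
        if len(parts) < 8 or not parts[1].isdigit():
            continue  # skip the header row and blank lines
        rows.append(PsRow(parts[0], int(parts[1]), int(parts[2]), parts[7]))
    return rows


def nagios_instances(rows: list[PsRow], cfg: str = "nagios.cfg"):
    """Return (all nagios.cfg processes, top-level ones whose parent is not one of them)."""
    procs = [r for r in rows if cfg in r.cmd and not r.cmd.startswith("grep")]
    pids = {r.pid for r in procs}
    roots = [r for r in procs if r.ppid not in pids]
    return procs, roots


def check_live() -> None:
    """Run `ps -ef` on this machine and report; not called automatically."""
    out = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True).stdout
    procs, roots = nagios_instances(parse_ps_ef(out))
    print(f"{len(procs)} nagios.cfg process(es), {len(roots)} top-level instance(s)")
    if len(roots) > 1:
        print("More than one Nagios daemon appears to be running:")
        for r in roots:
            print(f"  PID {r.pid}: {r.cmd}")


# The two nagios lines are from the post above; the grep line is a made-up example.
SAMPLE = """\
UID        PID  PPID  C STIME TTY          TIME CMD
nagios   11232     1  3 12:27 ?        00:07:33 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   11276 11232  0 12:27 ?        00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     12001 11900  0 12:30 pts/0    00:00:00 grep --color=auto nagios.cfg
"""

if __name__ == "__main__":
    procs, roots = nagios_instances(parse_ps_ef(SAMPLE))
    print(len(procs), [r.pid for r in roots])  # prints: 2 [11232]
```

On a real server, calling `check_live()` runs the same logic against live `ps -ef` output; a report of more than one top-level instance is the situation the reply says should be cleaned up.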
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
krutaw
Posts: 60
Joined: Wed Jul 31, 2013 6:30 pm

Re: Inconsistent NRDP performance

Post by krutaw

I checked all 3 of the active Nagios servers, and each has only 2 processes with nagios.cfg in the command line; the same was true of the passive server.

Also, I checked the caching settings, and they are not enabled on any of the servers, so that's not what is driving this. Where else should I be looking at this point?
krutaw
Posts: 60
Joined: Wed Jul 31, 2013 6:30 pm

Re: Inconsistent NRDP performance

Post by krutaw

I think I found something, but I'm not sure what the heck to make of it. I spotted lines like this one in my nagios.log:


[1521057945] Warning: The results of service 'Datastore - Usage' on host 'some_host_name' are stale by 0d 0h 0m 32s (threshold=0d 0h 25m 0s).  I'm forcing an immediate check of the service.
What's interesting is that the passive Nagios server knows I've set the freshness threshold to 25 minutes, yet it considers the checks stale after mere seconds. It's not just that one check; the log is littered with checks going stale after as little as 1 second. As the log output shows, the freshness threshold is set, but it appears to be ignored. Thoughts?
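Since the log is full of these warnings, tallying them per service shows how many there are and how small the stale-by values get. This is a minimal Python sketch, assuming every warning follows the exact format of the service line quoted above; host freshness warnings and any other wording are simply not matched. The function names are hypothetical.

```python
"""Tally service freshness warnings in nagios.log (illustrative sketch)."""
import re
from collections import defaultdict

# Matches the service freshness warning format quoted in this thread.
STALE_RE = re.compile(
    r"^\[(?P<ts>\d+)\] Warning: The results of service '(?P<svc>[^']*)' "
    r"on host '(?P<host>[^']*)' are stale by (?P<by>\d+d \d+h \d+m \d+s) "
    r"\(threshold=(?P<thr>\d+d \d+h \d+m \d+s)\)"
)
DUR_RE = re.compile(r"(\d+)d (\d+)h (\d+)m (\d+)s")


def dur_to_seconds(text: str) -> int:
    """Convert a log duration such as '0d 0h 25m 0s' to seconds."""
    d, h, m, s = map(int, DUR_RE.fullmatch(text).groups())
    return ((d * 24 + h) * 60 + m) * 60 + s


def summarize(lines):
    """Per (host, service): warning count, smallest stale-by value, threshold (seconds)."""
    stats = defaultdict(lambda: {"count": 0, "min_stale_by": None, "threshold": None})
    for line in lines:
        m = STALE_RE.match(line)
        if not m:
            continue  # not a service freshness warning
        entry = stats[(m["host"], m["svc"])]
        by = dur_to_seconds(m["by"])
        entry["count"] += 1
        entry["threshold"] = dur_to_seconds(m["thr"])
        if entry["min_stale_by"] is None or by < entry["min_stale_by"]:
            entry["min_stale_by"] = by
    return dict(stats)
```

Usage, assuming the log is at the usual Nagios XI location: `with open("/usr/local/nagios/var/nagios.log") as f: print(summarize(f))`. Note that the reported values are stale-by amounts, measured past the threshold, not the total age of the last result.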

Oh, and in case it helps, I looked at the settings for one of the failing checks in objects.cache and it looks like this:


define service {
        host_name	some_host_name
        service_description     Datastore - Usage
        check_period    xi_timeperiod_24x7
        check_command   check_dummy!2!"Data not received from $_HOSTNAGHOST$"!!!!!!
        contacts        nagiosadmin
        notification_period     xi_timeperiod_24x7
        initial_state   o
        importance	0
        check_interval  10.000000
        retry_interval  1.000000
        max_check_attempts	5
        is_volatile     0
        parallelize_check	1
        active_checks_enabled   1
        passive_checks_enabled  1
        obsess  1
        event_handler_enabled   1
        low_flap_threshold	0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  1
        flap_detection_options  a
        freshness_threshold     1500
        check_freshness 1
        notification_options    a
        notifications_enabled   0
        notification_interval   60.000000
        first_notification_delay        0.000000
        stalking_options        n
        process_perf_data	0
        retain_status_information	1
        retain_nonstatus_information    1
        }
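Reading objects.cache by hand works for one service; comparing the effective freshness settings across many services, or between the active and passive servers, is easier with a small parser. This is a minimal Python sketch, assuming the `define service { ... }` layout shown above with one directive per line, read from wherever `object_cache_file` in nagios.cfg points. `check_interval` is multiplied by `interval_length`, assumed here to be the Nagios default of 60 seconds. The function names and the `example_group` servicegroup in the sample are hypothetical.

```python
"""Look up freshness settings for a service in objects.cache (illustrative sketch)."""
from __future__ import annotations

import re

# "define service {" but not "define servicegroup {" or "define servicedependency {"
SERVICE_START = re.compile(r"^define\s+service\s*\{")


def parse_services(text: str) -> list[dict[str, str]]:
    """Return one dict of directive -> value per `define service` block."""
    services, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if SERVICE_START.match(line):
            current = {}
        elif line == "}" and current is not None:
            services.append(current)
            current = None
        elif current is not None and line:
            parts = line.split(None, 1)  # directive name, then the rest of the line
            current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return services


def find_service(services, host: str, desc: str):
    for svc in services:
        if svc.get("host_name") == host and svc.get("service_description") == desc:
            return svc
    return None


def freshness_summary(svc: dict[str, str], interval_length: int = 60) -> dict:
    """Freshness-related settings in seconds (interval_length is an assumption)."""
    return {
        "check_freshness": svc.get("check_freshness") == "1",
        "freshness_threshold_s": int(float(svc.get("freshness_threshold", "0"))),
        "check_interval_s": int(float(svc.get("check_interval", "0")) * interval_length),
        "active_checks_enabled": svc.get("active_checks_enabled") == "1",
    }


# Trimmed copy of the definition above, plus a hypothetical servicegroup block.
SAMPLE = """\
define service {
        host_name       some_host_name
        service_description     Datastore - Usage
        check_command   check_dummy!2!"Data not received from $_HOSTNAGHOST$"!!!!!!
        check_interval  10.000000
        active_checks_enabled   1
        freshness_threshold     1500
        check_freshness 1
        }
define servicegroup {
        servicegroup_name       example_group
        }
"""

if __name__ == "__main__":
    svc = find_service(parse_services(SAMPLE), "some_host_name", "Datastore - Usage")
    print(freshness_summary(svc))
```

With `active_checks_enabled 1`, a failed freshness check makes Nagios run `check_dummy!2!...`, which reports CRITICAL with the "Data not received" text; that is the forced check mentioned in the log line.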
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Inconsistent NRDP performance

Post by cdienger

I'm curious about how you're reading the log. The message indicates that no result has come in for the threshold plus the stale-by value, or 25 minutes 32 seconds in this case. That would be in line with the behavior we've been seeing. On the other hand, if the check is going stale BEFORE the 25-minute threshold is reached, that would be a problem.
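The arithmetic in the reply can be written out. This is a minimal Python sketch of a simplified freshness model, assuming a result is flagged once its age exceeds the threshold. Real Nagios may add extra slack to the threshold (for example the `additional_freshness_latency` option in nagios.cfg); that is represented here by a hypothetical `slack` parameter defaulting to 0. The `last_result` input to `is_premature` is also hypothetical: it is something you would take from the sending side, such as the time NRDP submitted the result.

```python
"""Simplified model of a Nagios freshness warning (illustrative sketch)."""


def fmt(seconds: int) -> str:
    """Format seconds the way nagios.log does, e.g. '0d 0h 25m 32s'."""
    d, rem = divmod(seconds, 86400)
    h, rem = divmod(rem, 3600)
    m, s = divmod(rem, 60)
    return f"{d}d {h}h {m}m {s}s"


def implied_age(threshold: int, stale_by: int, slack: int = 0) -> int:
    """Age of the last result when the warning was logged: threshold + slack + stale-by."""
    return threshold + slack + stale_by


def is_premature(last_result: int, warned_at: int, threshold: int) -> bool:
    """True if the warning fired before the threshold had elapsed (the real problem case)."""
    return warned_at - last_result < threshold


if __name__ == "__main__":
    age = implied_age(1500, 32)   # values from the log line in this thread
    print(fmt(age))               # prints: 0d 0h 25m 32s
    print(1521057945 - age)       # prints: 1521056413 (approximate time of the last result)
```

If the sender's records show a result arriving shortly before a warning's timestamp, `is_premature` would return True, which is the case the reply calls a problem; otherwise the warnings are consistent with results simply not arriving for over 25 minutes.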

If the above doesn't help you find the problem, I'd like to take a look at the systems over a remote session; in that case, please open a ticket at http://support.nagios.com/tickets/.