Problems with NRDP processing some Passive checks

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
sgh
Posts: 2
Joined: Thu Jul 01, 2021 12:05 pm

Problems with NRDP processing some Passive checks

Post by sgh »

I'm new here, but have been using Nagios for many, many years.

I'm having a strange problem that I haven't seen before. I have two identical remote Nagios installations on the same remote network, each monitoring only local devices (one monitors RTDs, the other monitors CANBus). The hardware and software are identical (they were built at the same time), except that one has RTD interfaces and the other has CANx. On the RTD one, passive checks are sent back to XI reliably, without any hiccups. On the CANx one, some passive checks arrive reliably, some intermittently, and some not at all. Worryingly, this happens even for the preconfigured localhost checks. On the remote Nagios itself, all active checks appear completely fine. I tried adding a freshness check, but it has not resolved the problem.

Example:
Nagios XI shows this:

Code:

Service                            Status  Duration     Attempt  Last Check           Status Information
Memory Usage (Passive Only Check)  Ok      14h 33m 50s  1/1      2022-02-23 08:32:05  OK: No data received yet.
While the remote Nagios shows this:

Code:

Service       Status  Duration     Attempt  Last Check           Status Information
Memory Usage  Ok      15h 39m 30s  1/5      2022-02-23 08:33:21  OK - 7282 / 7812 MB (93%) Free Memory, Used: 497 MB, Shared: 26 MB, Buffers + Cached: 343 MB
Other checks are usually fine, but periodically they go bad as well and show the same "OK: No data received yet." as the Memory Usage check. For example, I've seen it happen with PING and Service Status - httpd (fine on the remote, but showing "OK: No data received yet." on Nagios XI).

Another oddity: while the "Service Status - mysqld" check is running happily and successfully on the remote Nagios, it has never shown up on Nagios XI (it has never appeared in Unconfigured Objects).

These checks all run fine on the other remote Nagios install and show up on Nagios XI without a problem.

Any thoughts? I was considering deleting the host and all its checks from Nagios XI and letting them come in again as Unconfigured Objects. I should also note that the checkresults folder is empty: files appear there for a few seconds and then disappear almost immediately. It's acting as if the remote Nagios is simply not sending some of the NRDP data to Nagios XI, but I haven't found a place to watch the sends to see whether any are missing.

Thank you so much in advance for any suggestions!
sgh
Posts: 2
Joined: Thu Jul 01, 2021 12:05 pm

Re: Problems with NRDP processing some Passive checks

Post by sgh »

This has been sort of resolved.

I watched perfdataproc.log and saw that NRDP uploads were occasionally failing with the following:

Code:

STDOUT: ERROR: The NRDP Server said BAD XML
A lot of uploads went fine.
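To see how often the rejections happen without watching the log live, the failures can be tallied after the fact. This is a minimal sketch that assumes the rejection is logged with the exact wording quoted above; the log path is passed as an argument because its location isn't stated here, and `count_bad_xml` is a hypothetical helper name.

```python
import sys

# Error text as quoted from perfdataproc.log in this thread; any other
# wording will not be matched.
BAD_XML_MARKER = "The NRDP Server said BAD XML"


def count_bad_xml(lines):
    """Return (bad_xml_lines, total_lines) for an iterable of log lines."""
    bad = total = 0
    for line in lines:
        total += 1
        if BAD_XML_MARKER in line:
            bad += 1
    return bad, total


# Usage: python count_bad_xml.py /path/to/perfdataproc.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        bad, total = count_bad_xml(log)
    print(f"{bad} BAD XML rejections in {total} log lines")
```

Running it before and after disabling a suspect check gives a simple before/after comparison.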

I think something in the Service Status - mysqld output makes the generated XML invalid. Whenever that output was packaged into the same upload as results from other checks, the whole upload was rejected and Nagios XI was not updated. That's why the other checks sometimes updated and sometimes didn't.
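The all-or-nothing effect described above can be reproduced locally. This minimal sketch batches several results into one document, wraps each output in CDATA, and checks whether the batch parses. The element names are modelled on NRDP's checkresults format and should be treated as an approximation; `build_checkresults`, `is_well_formed`, the host name, and the "bad" output (an ANSI escape byte) are all hypothetical, since the actual offending content in the mysqld output was never identified.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_checkresults(results):
    """Batch (host, service, state, output) tuples into one XML document.

    Hypothetical helper: it imitates the CDATA-wrapped layout discussed in
    this thread; it is not the send_nrdp.sh implementation.
    """
    parts = ["<?xml version='1.0'?>", "<checkresults>"]
    for host, service, state, output in results:
        parts.append(
            "<checkresult type='service' checktype='1'>"
            f"<hostname>{escape(host)}</hostname>"
            f"<servicename>{escape(service)}</servicename>"
            f"<state>{int(state)}</state>"
            # Output is placed in CDATA unmodified.
            f"<output><![CDATA[{output}]]></output>"
            "</checkresult>"
        )
    parts.append("</checkresults>")
    return "\n".join(parts)


def is_well_formed(xml_text):
    """True if the whole batch parses; a single bad result makes it False."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


healthy = [
    ("canx-remote", "Memory Usage", 0, "OK - 7282 / 7812 MB (93%) Free Memory"),
    ("canx-remote", "PING", 0, "PING OK - Packet loss = 0%"),
]
# Hypothetical bad output: byte 0x1B is illegal in XML 1.0, even inside CDATA.
poisoned = healthy + [("canx-remote", "Service Status - mysqld", 0, "\x1b[32mrunning\x1b[0m")]

print(is_well_formed(build_checkresults(healthy)))   # True
print(is_well_formed(build_checkresults(poisoned)))  # False
```

Because the whole document fails to parse, the healthy Memory Usage and PING results are lost along with the bad one, which would explain why unrelated checks updated only some of the time.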

Oddly enough, the other Nagios remote is happily updating Service Status - mysqld without issues. I looked way back in its logs and found a bunch of BAD XML errors, but none have occurred for a very long time. It must be something in that specific output on just the one Nagios remote.

I turned off the Service Status - mysqld check and I haven't seen another error and everything seems to be updating perfectly now.

Incidentally, I tried both newer and older send_nrdp.sh scripts (up to 0.6.1), and they all had the issue. The other Nagios remote runs the same send_nrdp.sh version as the one having problems.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Problems with NRDP processing some Passive checks

Post by ssax »

Hmm, I thought wrapping the service output in CDATA in the send_nrdp.sh script would have taken care of any bad-XML issues caused by the output.
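CDATA stops `<` and `&` in plugin output from being read as markup, but it cannot make characters that XML 1.0 forbids legal, and a literal `]]>` in the output ends the section early. This thread doesn't establish which of these (if either) affected the mysqld output. The sketch below shows one way to guard against both; `cdata_wrap` is a hypothetical helper, not code from send_nrdp.sh, and it only handles C0 control characters, not every illegal code point.

```python
import re
import xml.etree.ElementTree as ET

# C0 control characters that XML 1.0 forbids even inside CDATA
# (tab, LF and CR are allowed). Other illegal code points are not handled.
_XML_ILLEGAL_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def cdata_wrap(output):
    """Wrap plugin output in CDATA, removing what CDATA cannot protect against.

    - illegal control characters are replaced with '?' so the change is visible;
    - each ']]>' is split across two CDATA sections so it cannot close the
      section early.
    """
    cleaned = _XML_ILLEGAL_CONTROL.sub("?", output)
    return "<![CDATA[" + cleaned.replace("]]>", "]]]]><![CDATA[>") + "]]>"


raw = "mysqld \x1b[32mactive\x1b[0m, note: a]]>b"
doc = "<output>" + cdata_wrap(raw) + "</output>"
print(ET.fromstring(doc).text)  # mysqld ?[32mactive?[0m, note: a]]>b
```

Replacing bad characters with a visible placeholder rather than silently dropping them keeps it obvious on the Nagios XI side that the output was altered.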

Usually, I recommend that you edit this file:

Code:

/usr/local/nrdp/server/config.inc.php
And make sure these are set at the bottom:

Code:

// Enable debug logging
$cfg["debug"] = true;

// Where should the logs go?
$cfg["debug_log"] = "/usr/local/nrdp/server/debug.log";
Then I run this tail command while watching the submissions, to try to see why the XML is being flagged as bad:

Code:

tail -Fn0 /usr/local/nrdp/server/debug.log
After you're done testing, disable the debug logging so that it doesn't fill up your disk!
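If the debug log captures a rejected submission, a local parser can point at the exact position that makes it invalid. This is a minimal sketch that assumes you have copied the offending XML out of debug.log into a file by hand (whether the log records the full payload is an assumption, not something confirmed here); `find_xml_error` is a hypothetical helper. The file is read as bytes so that invalid UTF-8 is reported too.

```python
import sys
import xml.etree.ElementTree as ET


def find_xml_error(payload):
    """Return None if payload (bytes) parses, else (line, column, message)."""
    try:
        ET.fromstring(payload)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


# Usage: python find_xml_error.py rejected_payload.xml
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        payload = f.read()
    error = find_xml_error(payload)
    if error is None:
        print("payload is well-formed")
    else:
        line, column, message = error
        print(f"line {line}, column {column}: {message}")
        lines = payload.splitlines()
        if 0 < line <= len(lines):
            # Show the offending line, with undecodable bytes made visible.
            print(lines[line - 1].decode("utf-8", "replace"))
```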