5.5.0 upgrades resulting in more 'unknowns'

Posted: Mon Jul 16, 2018 6:02 am
by JGCG
Hi,

We upgraded to 5.5.0 last week and have noticed we now get significantly more 'Unknown' issues than before.
If we manually re-check these, they eventually go to an OK state.

Some of the typical errors we receive are:
*Alarm signal (Nagios time-out)
*CHECK_NRPE: Receive header underflow - only -1 bytes received (4 expected).
*NRPE: Unable to read output
*No answer from host
*could not fetch information from server
*Free disk space : Invalid drive

Has something changed with the upgrade that is causing this, and how can we fix it?

One last question, we are using the Virtual Machine from: https://www.nagios.com/downloads/nagios-xi/vmware/
I ran a yum update and can see there are a few updates waiting to apply. If we update Nagios through the web interface, will this also include the system updates reported by yum?
If not, is it safe to run a yum update?

Thanks.

Re: 5.5.0 upgrades resulting in more 'unknowns'

Posted: Mon Jul 16, 2018 12:55 pm
by jomann
It is safe to run a yum update. The upgrade does not update your system, only the components used by XI.

What version of NRPE are these checking? Are they NRPE agents installed with the Linux agent tarball we provide? If so, we are now running NRPE 3.2.x on XI 5.5, so you may need to add a -2 to the check_nrpe command line for systems that are running 2.1.5 and have problems. This forces check_nrpe to use the old NRPE v2 packet type.
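
The -2 suggestion above can be tried by hand from the XI terminal before any command definitions are changed. This is only a minimal sketch: it assumes check_nrpe is in the usual /usr/local/nagios/libexec directory, and nrpe_packet_probe is a hypothetical helper name, not something shipped with XI. It relies on check_nrpe returning 0 when the agent answers a plain version query (no -c given).

```shell
# Assumed default plugin location; override CHECK_NRPE if yours differs.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

# Hypothetical helper: query the agent with the default packet type first,
# then again with -2 (forced NRPE v2 packets) if that fails.
nrpe_packet_probe() {
    host="$1"
    if "$CHECK_NRPE" -H "$host" >/dev/null 2>&1; then
        echo "default packets OK"
    elif "$CHECK_NRPE" -2 -H "$host" >/dev/null 2>&1; then
        echo "v2 packets only, add -2"
    else
        echo "no NRPE answer"
    fi
}
```

Calling nrpe_packet_probe with the address of an affected host should show whether that agent only answers when -2 is given. If so, adding -2 to the check_nrpe command used for that host is worth trying.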

Re: 5.5.0 upgrades resulting in more 'unknowns'

Posted: Wed Jul 18, 2018 3:25 am
by JGCG
It seems this issue has cleared itself, although we haven't changed anything locally or remotely.
The soft 'Unknowns' have dropped from around 150 per day to just a handful.

Re: [Solved] 5.5.0 upgrades resulting in more 'unknowns'

Posted: Wed Jul 18, 2018 9:18 am
by jomann
That is interesting, let us know if you have any other issues with it.

Re: [Solved] 5.5.0 upgrades resulting in more 'unknowns'

Posted: Wed Jul 18, 2018 9:47 am
by JGCG
Scrap my last comment, they are now back.
I upgraded to 5.5.0, and as of Monday (7 days after), the number of unknowns dropped drastically.

I upgraded to 5.5.1 this morning, and immediately afterwards the 'Unknown' alerts were back.
The same thing happened last week: the 'Unknowns' started occurring immediately after the upgrade.

All the ones up at the moment are SNMP checks on different hosts.
If they get re-checked, they clear, but shortly after, when Nagios does its check, they go 'Unknown' again.

Was there a change in >= 5.5.0 that affected SNMP?
Or are there any DB maintenance/cache-clearing scripts that kick in after 7 days, which would explain why the original issue cleared?

If you check the log screenshot below, you'll see we had one Unknown last night at 1AM, but after upgrading around midday today they started occurring every few minutes.

Re: 5.5.0 upgrades resulting in more 'unknowns'

Posted: Wed Jul 18, 2018 10:09 am
by swolf
We did add some SNMP-related features (specifically, the SNMP Trap Interface in the admin menu), but this looks like an active check (regular SNMP) setup.
The only change that would have occurred is that we upgraded the nagios-plugins package, but for this plugin we only changed it by enabling IPv6 (this commit).

Did you have any other upgrades in your environment that coincided with the Nagios XI upgrade, like possibly your DNS server? Did you have any hardcoded hostnames in your XI machine's /etc/hosts file? If so, are they still there?

Can you still use a command like snmpwalk from your Nagios XI terminal to the host that's giving UNKNOWN status?
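
A quick manual walk of the standard system subtree is usually enough for this test. The sketch below assumes SNMP v2c and a community string you supply yourself. snmp_system_walk is a hypothetical helper name, and the 5 second timeout and single retry are arbitrary illustration values.

```shell
# Assumes the net-snmp snmpwalk binary is on the PATH; override SNMPWALK if not.
SNMPWALK="${SNMPWALK:-snmpwalk}"

# Hypothetical helper: walk the system subtree (1.3.6.1.2.1.1) of one host.
# $1 = host name or address, $2 = SNMP community string
snmp_system_walk() {
    "$SNMPWALK" -v 2c -c "$2" -t 5 -r 1 "$1" 1.3.6.1.2.1.1
}
```

Running it once with the host name and once with the IP address of a host that is showing UNKNOWN helps separate name-resolution problems from SNMP problems.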

Re: 5.5.0 upgrades resulting in more 'unknowns'

Posted: Wed Jul 18, 2018 10:46 am
by JGCG
swolf wrote: Did you have any other upgrades in your environment that coincided with the Nagios XI upgrade, like possibly your DNS server? Did you have any hardcoded hostnames in your XI machine's /etc/hosts file? If so, are they still there?

Can you still use a command like snmpwalk from your Nagios XI terminal to the host that's giving UNKNOWN status?
No changes have occurred in the environment in the last week since doing the two upgrades, and we have no hardcoded entries in the hosts file (besides the usual local hostnames).
We can SNMP walk fine and run the check manually from the command line fine, and forcing a manual re-check from the interface is also fine and clears the alert.
The issue happens sporadically when Nagios invokes the check.

I've had a check through the ones that seem to be occurring most frequently, and they are using the check_snmp_load.pl, check_snmp_process.pl, and check_snmp_storage.pl scripts.
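
One way to see which checks go UNKNOWN most often is to count the soft UNKNOWN alerts per host and service in the Nagios Core log. This is a rough sketch that assumes the log is at /usr/local/nagios/var/nagios.log and uses the standard "SERVICE ALERT: host;service;state;state type;attempt;output" line format. count_soft_unknowns is a hypothetical helper name.

```shell
# Assumed log location; override NAGIOS_LOG to point at an archived log instead.
NAGIOS_LOG="${NAGIOS_LOG:-/usr/local/nagios/var/nagios.log}"

# Hypothetical helper: print "<count> <host>;<service>" for every service that
# logged at least one SOFT UNKNOWN alert, highest count first.
count_soft_unknowns() {
    awk -F';' '
        /\] SERVICE ALERT: / && $3 == "UNKNOWN" && $4 == "SOFT" {
            host = $1
            sub(/^.*SERVICE ALERT: /, "", host)   # strip "[timestamp] SERVICE ALERT: "
            count[host ";" $2]++
        }
        END { for (k in count) print count[k], k }
    ' "$NAGIOS_LOG" | sort -rn
}
```

Comparing the output for a day right after an upgrade with a quiet day would show whether the same few services account for most of the spike.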

Re: 5.5.0 upgrades resulting in more 'unknowns'

Posted: Thu Jul 19, 2018 8:54 am
by jomann
Have you tried running the commands on the command line as the nagios user? That is how Nagios Core runs the checks. It looks like some of them are timeouts on the Nagios Core side, and possibly timeouts on the host side. You could also raise the Nagios Core timeouts in /usr/local/nagios/etc/nagios.cfg with the host_check_timeout and service_check_timeout options.
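
Both suggestions above can be sketched roughly as follows. The nagios.cfg path is the one given above. show_check_timeouts and count_unknown_runs are hypothetical helper names, and the second one relies on the usual plugin convention that exit code 3 means UNKNOWN.

```shell
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# Hypothetical helper: show the active (uncommented) Core check timeouts.
show_check_timeouts() {
    grep -E '^(host|service)_check_timeout=' "$NAGIOS_CFG"
}

# Hypothetical helper: run a check command N times and count how many runs
# exit with 3, the plugin return code for UNKNOWN.
# Usage: count_unknown_runs <N> <command> [args...]
count_unknown_runs() {
    runs="$1"
    shift
    unknown=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        rc=0
        "$@" >/dev/null 2>&1 || rc=$?
        if [ "$rc" -eq 3 ]; then
            unknown=$((unknown + 1))
        fi
        i=$((i + 1))
    done
    echo "$unknown of $runs runs returned UNKNOWN"
}
```

To mimic how Core runs the check, put sudo -u nagios in front of the plugin command, for example count_unknown_runs 20 sudo -u nagios followed by the full plugin command line copied from the service's command definition. If you do change either timeout, verifying the config and restarting Nagios Core afterwards is needed for the change to take effect.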

Re: 5.5.0 upgrades resulting in more 'unknowns'

Posted: Thu Jul 19, 2018 9:32 am
by JGCG
jomann wrote:Have you tried running the commands on the command line as the nagios user? That is how Nagios Core runs the checks, it looks like some of them are timeouts on the Nagios Core side, and possibly time outs on the host side. You could also up the Nagios Core timeout if you wanted in /usr/local/nagios/etc/nagios.cfg with host_check_timeout and service_check_timeout options.
I've tried executing the commands manually as the nagios user and they work fine.

I've been monitoring today, and once again, the unknowns have now stopped occurring as frequently, so feel free to close this thread.
It's strange, however, that the issue is rampant immediately after an upgrade, lasts for a few days, and then stops.

Edit: it looks as if another user is experiencing the same issue: https://support.nagios.com/forum/viewto ... 16&t=49512

Re: 5.5.0 upgrades resulting in more 'unknowns'

Posted: Thu Jul 19, 2018 9:52 am
by scottwilkerson
Out of curiosity, do you know what version you upgraded from?

Also, do you know if you have ever needed to edit this plugin in previous versions?