5.5.0 upgrades resulting in more 'unknowns'

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
JGCG
Posts: 45
Joined: Fri Sep 29, 2017 6:31 am

Post by JGCG »

Hi,

We upgraded to 5.5.0 last week and have noticed we now get significantly more 'Unknown' results than before.
If we manually re-check these, they eventually go to an OK state.

Some of the typical errors we receive are:
*Alarm signal (Nagios time-out)
*CHECK_NRPE: Receive header underflow - only -1 bytes received (4 expected).
*NRPE: Unable to read output
*No answer from host
*could not fetch information from server
*Free disk space : Invalid drive

Has something changed with the upgrade that is causing this, and how can we fix it?

One last question: we are using the Virtual Machine from https://www.nagios.com/downloads/nagios-xi/vmware/
I ran a yum update and can see there are a few updates waiting to apply. If we update Nagios through the web interface, will this also include these system updates reported by yum?
If not, is it safe to run a yum update?

Thanks.
Last edited by JGCG on Wed Jul 18, 2018 9:47 am, edited 2 times in total.
jomann
Development Lead
Posts: 611
Joined: Mon Apr 22, 2013 10:06 am
Location: Nagios Enterprises

Re: 5.5.0 upgrades resulting in more 'unknowns'

Post by jomann »

It is safe to run a yum update. The upgrade does not update your system, only the components used by XI.

What version of NRPE are these checks running against? Are they NRPE agents installed with the Linux agent tarball we provide? If so, we are now running NRPE 3.2.x on XI 5.5, so you may need to add a -2 to the check_nrpe command line for systems that are running 2.15 and have problems. This forces check_nrpe to use the old packet type for NRPE v2.
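To show where the -2 flag sits on the command line, here is a minimal sketch that builds the check_nrpe argument list. The plugin path, the host address, and the remote command name check_load are assumptions for illustration; adjust them to match your own command definitions.

```python
import shlex

# Assumed plugin location; adjust if your install keeps plugins elsewhere.
CHECK_NRPE = "/usr/local/nagios/libexec/check_nrpe"


def build_check_nrpe(host, command, force_v2=False):
    """Return the argv list for a check_nrpe call.

    force_v2 adds -2, which makes check_nrpe send NRPE v2 packets
    to agents that do not understand the newer packet format.
    """
    argv = [CHECK_NRPE, "-H", host]
    if force_v2:
        argv.append("-2")
    argv += ["-c", command]
    return argv


if __name__ == "__main__":
    # Hypothetical host and remote command name.
    argv = build_check_nrpe("192.0.2.10", "check_load", force_v2=True)
    print(" ".join(shlex.quote(a) for a in argv))
```

Running the printed command by hand against one of the affected hosts, with and without -2, is a quick way to see whether the packet version is the cause.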
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Post by JGCG »

It seems this issue has cleared itself, although we haven't changed anything locally or remotely.
The soft 'Unknowns' have dropped from around 150 per day to just a handful.
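For anyone wanting to put numbers on this, the following is a rough sketch that counts soft UNKNOWN service alerts per day from Nagios Core log lines. It assumes the standard "[epoch] SERVICE ALERT: host;service;STATE;TYPE;attempt;output" log format; the host names, service names, and sample lines are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Matches only service alerts that are both UNKNOWN and SOFT.
ALERT_RE = re.compile(r"^\[(\d+)\] SERVICE ALERT: [^;]+;[^;]+;UNKNOWN;SOFT;")


def soft_unknowns_per_day(lines):
    """Count soft UNKNOWN service alerts per UTC calendar day."""
    counts = Counter()
    for line in lines:
        match = ALERT_RE.match(line)
        if match:
            ts = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
            counts[ts.strftime("%Y-%m-%d")] += 1
    return dict(counts)


# Made-up sample lines; in practice, read them from your nagios.log.
SAMPLE = [
    "[1531875600] SERVICE ALERT: host-a;CPU Load;UNKNOWN;SOFT;1;No answer from host",
    "[1531915200] SERVICE ALERT: host-b;Disk;UNKNOWN;SOFT;1;Alarm signal (Nagios time-out)",
    "[1531915260] SERVICE ALERT: host-b;Disk;OK;SOFT;2;Disk OK",
    "[1531962000] SERVICE ALERT: host-a;CPU Load;UNKNOWN;SOFT;1;No answer from host",
    "[1531962060] SERVICE ALERT: host-a;CPU Load;UNKNOWN;HARD;5;No answer from host",
]

if __name__ == "__main__":
    for day, count in sorted(soft_unknowns_per_day(SAMPLE).items()):
        print(day, count)
```

Comparing the daily counts before and after an upgrade would show whether the spike really lines up with the upgrade time.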

Re: [Solved] 5.5.0 upgrades resulting in more 'unknowns'

Post by jomann »

That is interesting. Let us know if you have any other issues with it.

Post by JGCG »

Scrap my last comment, they are now back.
I upgraded to 5.5.0, and as of Monday (7 days after), the number of unknowns dropped drastically.

I upgraded to 5.5.1 this morning, and immediately afterwards the 'Unknown' alerts came back.
The same thing happened last week: the 'Unknown' alerts started occurring immediately after the upgrade.

All the ones up at the moment are SNMP checks on different hosts.
If they get re-checked, they clear, but shortly after, when Nagios does its scheduled check, they go 'Unknown' again.

Was there a change in >= 5.5.0 that affected SNMP?
Or are there any DB maintenance/cache-clearing scripts that kick in after 7 days, which would explain why the original issue cleared?

If you check the below log screenshot, you'll see we had one Unknown last night at 1 AM, but after upgrading around midday today they started occurring every few minutes.
swolf

Post by swolf »

We did add some SNMP-related features (specifically, the SNMP Trap Interface in the admin menu), but this looks like an active check (regular SNMP) setup.
The only change that would have occurred is that we upgraded the nagios-plugins package, but for this plugin we only changed it by enabling IPv6 (this commit).

Did you have any other upgrades in your environment that coincided with the Nagios XI upgrade, like possibly your DNS server? Did you have any hardcoded hostnames in your XI machine's /etc/hosts file? If so, are they still there?

Can you still use a command like snmpwalk from your Nagios XI terminal to the host that's giving UNKNOWN status?

Post by JGCG »

swolf wrote: Did you have any other upgrades in your environment that coincided with the Nagios XI upgrade, like possibly your DNS server? Did you have any hardcoded hostnames in your XI machine's /etc/hosts file? If so, are they still there?

Can you still use a command like snmpwalk from your Nagios XI terminal to the host that's giving UNKNOWN status?

No changes have occurred in the environment in the last week since doing the two upgrades, and we have no hardcoded entries in the hosts file (besides the usual local hostnames).
We can run snmpwalk fine and run the check manually from the command line fine. Forcing a manual re-check from the interface also works and clears the alert.
The issue happens sporadically when Nagios invokes the check.

I've had a check through the ones that seem to be occurring most frequently, and they are using the check_snmp_load.pl, check_snmp_process.pl, and check_snmp_storage.pl scripts.

Post by jomann »

Have you tried running the commands on the command line as the nagios user? That is how Nagios Core runs the checks. It looks like some of them are timeouts on the Nagios Core side, and possibly timeouts on the host side. You could also raise the Nagios Core timeout in /usr/local/nagios/etc/nagios.cfg with the host_check_timeout and service_check_timeout options.
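Before raising the timeouts it helps to know what they are currently set to. Below is a minimal sketch that pulls the two options out of nagios.cfg-style text. The sample values (30 and 60 seconds) are placeholders, not necessarily what your install uses; in practice you would pass in the contents of /usr/local/nagios/etc/nagios.cfg.

```python
TIMEOUT_KEYS = ("host_check_timeout", "service_check_timeout")


def read_check_timeouts(cfg_text):
    """Return the check timeout options found in nagios.cfg-style text."""
    found = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in TIMEOUT_KEYS:
            found[key] = int(value.strip())
    return found


# Placeholder config text for illustration only.
SAMPLE_CFG = """
# timeouts are in seconds
service_check_timeout=60
host_check_timeout=30
#service_check_timeout=120
log_file=/usr/local/nagios/var/nagios.log
"""

if __name__ == "__main__":
    print(read_check_timeouts(SAMPLE_CFG))
```

If the plugin's own timeout is longer than service_check_timeout, Nagios Core will kill the check first, which would be consistent with the "Alarm signal (Nagios time-out)" output.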

Post by JGCG »

jomann wrote: Have you tried running the commands on the command line as the nagios user? That is how Nagios Core runs the checks. It looks like some of them are timeouts on the Nagios Core side, and possibly timeouts on the host side. You could also raise the Nagios Core timeout in /usr/local/nagios/etc/nagios.cfg with the host_check_timeout and service_check_timeout options.

I've tried executing the commands manually as the nagios user and they work fine.

I've been monitoring today, and once again the unknowns have stopped occurring as frequently, so feel free to close this thread.
It's strange, however, that the issue is rampant immediately after an upgrade, lasts for a few days, and then stops.

Edit: it looks as if another user is experiencing the same issue: https://support.nagios.com/forum/viewto ... 16&t=49512
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Post by scottwilkerson »

Out of curiosity, do you know what version you upgraded from?

Also, do you know if you have ever needed to edit this plugin in previous versions?