Hi,
We upgraded to 5.5.0 last week and have noticed we now get significantly more 'Unknown' issues than before.
If we manually re-check these, they eventually go to an OK state.
Some of the typical errors we receive are:
*Alarm signal (Nagios time-out)
*CHECK_NRPE: Receive header underflow - only -1 bytes received (4 expected).
*NRPE: Unable to read output
*No answer from host
*could not fetch information from server
*Free disk space : Invalid drive
Has something changed with the upgrade that is causing this, and how can we fix it?
One last question: we are using the virtual machine from https://www.nagios.com/downloads/nagios-xi/vmware/
I ran yum and can see there are a few updates waiting to be applied. If we update Nagios through the web interface, will this also include the system updates reported by yum?
If not, is it safe to run a yum update?
Thanks.
5.5.0 upgrades resulting in more 'unknowns'
Last edited by JGCG on Wed Jul 18, 2018 9:47 am, edited 2 times in total.
Re: 5.5.0 upgrades resulting in more 'unknowns'
It is safe to run a yum update. The upgrade does not update your system, only the components used by XI.
What version of NRPE are these checking? Are they NRPE agents installed with the Linux agent tarball we provide? If so, we are now running NRPE 3.2.x on XI 5.5, so you may need to add a -2 to the command line for check_nrpe on systems that are running 2.1.5 and have problems. This forces check_nrpe to use the old packet type for NRPE v2.
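As a rough sketch of the -2 suggestion above, a small shell wrapper for testing an older agent by hand could look like the following. The install path, the NRPE_BIN variable, the function name, and the check_load command name are assumptions for illustration only; substitute whatever your own check command definition uses.

```shell
# Path to check_nrpe; this location is an assumption, override NRPE_BIN if yours differs.
NRPE_BIN="${NRPE_BIN:-/usr/local/nagios/libexec/check_nrpe}"

# Hypothetical helper: query an agent using the older NRPE v2 packet format.
#   $1 = agent address, $2 = NRPE command name defined on the agent
check_nrpe_v2() {
    "$NRPE_BIN" -2 -H "$1" -c "$2"
}

# Example (address and command name are placeholders):
#   check_nrpe_v2 192.0.2.10 check_load
```

If this returns a normal result for an agent that currently shows the header underflow error, adding -2 to that host's check command in XI is probably worth trying.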
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: 5.5.0 upgrades resulting in more 'unknowns'
It seems this issue has cleared itself, although we haven't changed anything locally or remotely.
The soft 'Unknowns' have dropped from around 150 per day to just a handful.
Re: [Solved] 5.5.0 upgrades resulting in more 'unknowns'
That is interesting. Let us know if you have any other issues with it.
Re: [Solved] 5.5.0 upgrades resulting in more 'unknowns'
Scrap my last comment; they are now back.
I upgraded to 5.5.0, and as of Monday (7 days after), the number of unknowns dropped drastically.
I upgraded to 5.5.1 this morning, and immediately afterwards the 'Unknown' alerts came back.
The same thing happened last week: the 'Unknown' alerts started occurring immediately after the upgrade.
All the ones showing at the moment are SNMP checks on different hosts.
If they get re-checked, they clear, but shortly afterwards, when Nagios does its own check, they go 'Unknown' again.
Was there a change in >= 5.5.0 that affected SNMP?
Or are there any DB maintenance/cache clearing scripts that kick in after 7 days, which would explain why the original issue cleared?
If you check the log screenshot below, you'll see we had one Unknown last night at 1AM, but after upgrading around midday today they started occurring every few minutes.
[Attachment: log screenshot, not available]
swolf
Re: 5.5.0 upgrades resulting in more 'unknowns'
We did add some SNMP-related features (specifically, the SNMP Trap Interface in the admin menu), but this looks like an active check (regular SNMP) setup.
The only change that would have occurred is that we upgraded the nagios-plugins package, but for this plugin we only changed it by enabling IPv6 (this commit).
Did you have any other upgrades in your environment that coincided with the Nagios XI upgrade, like possibly your DNS server? Did you have any hardcoded hostnames in your XI machine's /etc/hosts file? If so, are they still there?
Can you still use a command like snmpwalk from your Nagios XI terminal to the host that's giving UNKNOWN status?
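For anyone unsure what such a test looks like, here is a minimal sketch of a manual SNMP query from the XI terminal. It assumes SNMP v2c and the Net-SNMP snmpwalk tool; the SNMPWALK_BIN variable, the function name, the address, and the community string are placeholders rather than anything taken from this thread.

```shell
# snmpwalk binary; assumed to be on PATH, override SNMPWALK_BIN if it is not.
SNMPWALK_BIN="${SNMPWALK_BIN:-snmpwalk}"

# Hypothetical helper: walk the "system" subtree of a host as a quick reachability test.
#   $1 = host address, $2 = SNMP community string (SNMP v2c is assumed)
snmp_smoke_test() {
    "$SNMPWALK_BIN" -v 2c -c "$2" "$1" system
}

# Example (address and community string are placeholders):
#   snmp_smoke_test 192.0.2.10 public
```

Running it a few times in a row against a host that keeps going UNKNOWN may show whether the failures are intermittent at the SNMP level or only appear when Nagios schedules the check.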
Re: 5.5.0 upgrades resulting in more 'unknowns'
swolf wrote: Did you have any other upgrades in your environment that coincided with the Nagios XI upgrade, like possibly your DNS server? Did you have any hardcoded hostnames in your XI machine's /etc/hosts file? If so, are they still there?
No changes have occurred in the environment in the last week since doing the two upgrades, and we have no hardcoded entries in the hosts file (besides the usual local hostnames).
swolf wrote: Can you still use a command like snmpwalk from your Nagios XI terminal to the host that's giving UNKNOWN status?
We can snmpwalk the hosts and run the check manually from the command line without problems, and forcing a manual re-check from the interface also works and clears the alert.
The issue happens sporadically when Nagios invokes the check.
I've had a look through the ones that seem to be occurring most frequently, and they are using the check_snmp_load.pl, check_snmp_process.pl, and check_snmp_storage.pl scripts.
Re: 5.5.0 upgrades resulting in more 'unknowns'
Have you tried running the commands on the command line as the nagios user? That is how Nagios Core runs the checks. It looks like some of them are timeouts on the Nagios Core side, and possibly timeouts on the host side. You could also raise the Nagios Core timeouts in /usr/local/nagios/etc/nagios.cfg with the host_check_timeout and service_check_timeout options.
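Before raising the timeouts mentioned above, it may help to see what they are currently set to. The following is a minimal sketch, assuming the config path quoted above; the NAGIOS_CFG variable and the function name are made up for this example.

```shell
# Main Nagios Core config file; path taken from the post above, override NAGIOS_CFG if needed.
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# Hypothetical helper: print the active (uncommented) host and service check timeout lines.
show_check_timeouts() {
    grep -E '^(host|service)_check_timeout=' "$NAGIOS_CFG"
}

# Example:
#   show_check_timeouts
```

Any change to these values only takes effect after Nagios Core is restarted, and a higher timeout hides slow checks rather than explaining them, so it is probably best treated as a diagnostic step.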
Re: 5.5.0 upgrades resulting in more 'unknowns'
jomann wrote: Have you tried running the commands on the command line as the nagios user? That is how Nagios Core runs the checks. It looks like some of them are timeouts on the Nagios Core side, and possibly timeouts on the host side. You could also raise the Nagios Core timeouts in /usr/local/nagios/etc/nagios.cfg with the host_check_timeout and service_check_timeout options.
I've tried executing the commands manually as the nagios user and they work fine.
I've been monitoring today, and once again the unknowns have stopped occurring as frequently, so feel free to close this thread.
It's strange, however, that the issue is rampant immediately after an upgrade, lasts for a few days, and then stops.
Edit: it looks as if another user is experiencing the same issue: https://support.nagios.com/forum/viewto ... 16&t=49512
scottwilkerson
Re: 5.5.0 upgrades resulting in more 'unknowns'
Out of curiosity, do you know what version you upgraded from?
Also, do you know if you have ever needed to edit this plugin in previous versions?