NagiosXI keeps sending notification "CRITICAL - x.x.x.x: rt
Posted: Wed Feb 14, 2018 11:47 am
by support.lta
I would like some help with Nagios XI.
We keep receiving a notification that says "CRITICAL - 10.20.20.58: rta nan, lost 100%".
We investigated the server in question and it appears to be up all the time: it is pingable (no packet loss) and we can SSH to the box.
We checked the server's uptime and there have been no reboots or shutdowns.
I also checked the switch port that this LAN interface is connected to, and it shows no downtime either.
Please note that no system updates have been applied to this server and it is not connected to the internet.
Has anyone encountered this in their environment, and what was your solution?
Thank you very much.
Re: NagiosXI keeps sending notification "CRITICAL - x.x.x.x:
Posted: Wed Feb 14, 2018 3:01 pm
by npolovenko
@support.lta, Is the host check OK or in Critical right now? If this happens irregularly, I'd add a timeout value. Perhaps when the server is busy it responds slower and the check times out. What's the name of the server?
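For reference, here is a minimal sketch of what running the check by hand with a longer timeout could look like. It assumes the host check uses the check_icmp plugin at /usr/local/nagios/libexec/check_icmp with the usual check-host-alive thresholds; the path, the thresholds, and the run_host_check helper name are assumptions for illustration, so compare them with your own command definition first.

```shell
#!/usr/bin/env bash
# Assumed plugin location; override PLUGIN if your install differs.
PLUGIN="${PLUGIN:-/usr/local/nagios/libexec/check_icmp}"

# run_host_check HOST [TIMEOUT_SECONDS]
# Hypothetical helper: runs the ICMP check once against HOST.
# -w / -c are warning / critical thresholds (rta in ms, packet loss in %),
# -p is the number of packets to send, -t is the plugin timeout in seconds.
run_host_check() {
    local host="$1"
    local timeout="${2:-10}"
    "$PLUGIN" -H "$host" -w 3000.0,80% -c 5000.0,100% -p 5 -t "$timeout"
}
```

Running `run_host_check 10.20.20.58 30` a few times while the alert is firing should show whether the plugin itself reports packet loss. If a longer timeout helps, the same `-t` value could be appended to the host check command definition.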
Re: NagiosXI keeps sending notification "CRITICAL - x.x.x.x:
Posted: Wed Feb 14, 2018 3:23 pm
by lmiltchev
I see the following entry in the mariadb log:
Version: '5.5.52-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
171212 10:24:47 [Warning] IP address '172.30.61.50' could not be resolved: Name or service not known
Does this IP exist on your server? Have you changed the IP of your Nagios XI server lately?
Also, you have multiple nagios processes running, which can cause the issue:
nagios 5064 1 0 2017 ? 03:06:27 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 5420 1 0 2017 ? 02:58:58 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 8652 1 0 Jan19 ? 00:19:29 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 12187 1 0 2017 ? 03:03:52 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 12219 1 0 2017 ? 03:03:02 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 12714 1 0 2017 ? 02:56:43 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19675 1 0 2017 ? 02:59:47 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 28688 1 0 2017 ? 03:05:13 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 31439 1 0 2017 ? 02:43:15 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 31711 1 0 2017 ? 02:57:23 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
In order to resolve the issue, run the following commands from the command line:
service nagios stop
killall nagios
service nagios start
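To confirm that the restart cleared the stray instances, something like the following sketch could be used. It counts nagios daemons whose parent PID is 1, assuming `ps -ef` style columns (UID PID PPID C STIME TTY TIME CMD) and the binary path shown in the listing above; the count_top_level_nagios function name is made up for this example.

```shell
#!/usr/bin/env bash
# count_top_level_nagios: reads `ps -ef` style lines on stdin and prints how
# many nagios daemons have PPID 1 (column 3). Column 8 is the command and
# column 9 its first argument, as in the process listing above.
count_top_level_nagios() {
    awk '$3 == 1 && $8 == "/usr/local/nagios/bin/nagios" && $9 == "-d" { n++ }
         END { print n + 0 }'
}
```

Usage would be `ps -ef | count_top_level_nagios`. After a clean restart I would expect a single top-level daemon; a count like the ten shown above suggests leftover instances.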
Note: This problem was more common with some of the older versions of Nagios XI and is rarely seen in newer versions. I would recommend that you upgrade to the latest version of Nagios XI.
Let us know if this helped. Thank you!
Re: NagiosXI keeps sending notification "CRITICAL - x.x.x.x:
Posted: Wed Feb 14, 2018 8:18 pm
by support.lta
npolovenko wrote:@support.lta, Is the host check OK or in Critical right now? If this happens irregularly, I'd add a timeout value. Perhaps when the server is busy it responds slower and the check times out. What's the name of the server?
It is not critical at the moment because it recovers quickly, within seconds, which is frustrating because the false notification is sent across the whole team.
This server is our application server, and the interface in question goes to the backup.
We are quite new to Nagios XI. How do I add or change the timeout value?
Re: NagiosXI keeps sending notification "CRITICAL - x.x.x.x:
Posted: Wed Feb 14, 2018 8:25 pm
by support.lta
Thanks for looking into this.
The IP in question is 10.20.20.58, and we have not changed any IP on our Nagios XI server.
We will look into these processes as well.
Note, however, that we have not updated our Nagios XI for a year.
We did not encounter any of these issues until 2-3 months ago. Is there any permanent solution for this?
This is a production server, and everyone gets alarmed whenever a notification comes in, especially a CRITICAL one.
I will propose this upgrade to our team. Is it something we just need to run, or does it involve several steps and configuration changes?
Again, thank you for the response.
Re: NagiosXI keeps sending notification "CRITICAL - x.x.x.x:
Posted: Thu Feb 15, 2018 1:12 pm
by lmiltchev
support.lta wrote:We did not encounter any of these issues until 2-3 months ago. Is there any permanent solution for this?
It is possible that you added more checks, some of which may take a very long time to execute. If nagios doesn't exit "cleanly", and you have some timeouts, you may end up with multiple nagios processes (instances of nagios) running on the same server. It is hard to say what caused the issue after the fact. If you start experiencing the issue again, open a new ticket here:
https://support.nagios.com/tickets and upload your profile (Admin > System Profile > Download Profile)
support.lta wrote:I will propose this upgrade to our team. Is it something we just need to run, or does it involve several steps and configuration changes?
I would recommend fixing the issue with the multiple nagios instances first.
service nagios stop
killall nagios
service nagios start
As far as upgrading - make sure you don't have any config errors and make all of the necessary backups prior to running the upgrade. Here's our official documentation on upgrading Nagios XI:
https://assets.nagios.com/downloads/nag ... ctions.pdf
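As a rough sketch of those two pre-upgrade steps: the snippet below verifies the configuration with `nagios -v` and only then runs a backup. The backup script path /usr/local/nagiosxi/scripts/backup_xi.sh and the pre_upgrade_check function name are assumptions here, so check the upgrade documentation for your version before relying on them.

```shell
#!/usr/bin/env bash
# Assumed locations; override these variables if your install differs.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
BACKUP_SCRIPT="${BACKUP_SCRIPT:-/usr/local/nagiosxi/scripts/backup_xi.sh}"

# pre_upgrade_check: hypothetical helper that stops before the backup
# if the configuration does not verify cleanly.
pre_upgrade_check() {
    # -v verifies the configuration without starting the daemon
    if ! "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        echo "config verification failed; fix the errors before upgrading" >&2
        return 1
    fi
    # Only reached when verification succeeded
    "$BACKUP_SCRIPT"
}
```

Run `pre_upgrade_check` as root on the XI server. A non-zero exit status means either the verification or the backup failed, and the upgrade should wait until that is sorted out.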