Hi all
We have nearly 400 network devices monitored via SNMP v3. However, the router host is checked using the check_host_alive command. In the 3 months since installation, we have hit the scenario below about 2 times.
Sometimes 4-5 hosts, and at other times 80-90 hosts, show as down with 100% packet loss in the host check command, yet their respective services show as up. Pinging these hosts gives a very low RTA, well below the threshold. When we schedule an immediate check for such a host, it does not correct itself. It can stay in this state for hours.
We have to restart the nagios & httpd service to bring them back to normal.
Any ideas welcome.
Info which might be helpful for your analysis: We have installed Nagios XI on a 4 CPU / 16GB RAM server. CPU usage is below 10%, but RAM utilisation is around 15GB, of which 5GB is cached. There is a 12GB swap which is not used at all.
Regards
SBA-Viswa
False notification
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: False notification
If you run a standard ping to these hosts from the CLI do you have packet loss there?
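A minimal sketch of one way to run that comparison across a batch of the affected hosts at once. It assumes a Linux `ping` (iputils) that prints the usual "N% packet loss" summary line and accepts `-c` and `-W`; the function names and the host list file are hypothetical, not part of Nagios XI.

```shell
#!/bin/sh
# Hypothetical helper: extract the packet-loss percentage from ping's
# summary line, read on stdin. Prints e.g. "0" or "100".
parse_loss() {
    grep -oE '[0-9.]+% packet loss' | head -n 1 | cut -d'%' -f1
}

# Hypothetical helper: ping every host listed in the file given as $1
# (one address per line) and print "<host> <loss>%".
# Assumes iputils ping: -c is the packet count, -W the reply timeout in seconds.
ping_hosts() {
    while read -r host; do
        [ -z "$host" ] && continue
        loss=$(ping -c 5 -W 2 "$host" 2>/dev/null </dev/null | parse_loss)
        echo "$host ${loss:-unknown}%"
    done < "$1"
}
```

For example, `ping_hosts down_hosts.txt` run on the Nagios server while the hosts are showing as down would give a per-host loss figure to set against what the host check reports.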
Re: False notification
During the period when the hosts show as down and the services show as up, the hosts are actually up. We tested this by pinging with the "ping this host" option in the monitoring GUI as well as directly from outside the Nagios environment.
We also went through the following document --> http://assets.nagios.com/downloads/nagi ... _In_XI.pdf
It mentions that there might be firewalls in between blocking the ICMP traffic, but the issue only affects certain devices, and it goes back to normal if we restart the nagios and httpd services.
SBA-Viswa
Re: False notification
Verify that there are not 2 separate nagios processes:
Code:
service nagios stop
killall nagios
ps -aef | grep nagios
service nagios start
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
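A rough sketch of how the duplicate-process check above could be done without stopping Nagios, for use while the issue is actually occurring. It assumes the main daemon was started as `nagios -d <config file>` and has been reparented to PID 1, so that worker processes (whose parent is the main daemon) are not counted; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "ps -eo pid,ppid,args" output on stdin and count
# top-level Nagios daemons, i.e. lines whose parent PID is 1 and whose
# command line contains "nagios -d ".
# Assumption: children of the main daemon are workers, not extra instances.
count_nagios_daemons() {
    awk '$2 == 1 && /nagios -d / { n++ } END { print n + 0 }'
}
```

Usage would be `ps -eo pid,ppid,args | count_nagios_daemons`. Anything other than 1 while Nagios is running would be worth a closer look.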
Re: False notification
The issue cropped up again today, with around 51 hosts showing as down when they are actually up.
Our client checked the processes while the issue was occurring. There were no multiple nagios instances which could have triggered this scenario.
Please shed some light on what to check next. Our client is embarrassed and is starting to doubt the credibility of the status shown for other hosts.
Regards
SBA-Viswa
Re: False notification
If you are a paying customer, could you please send an email to [email protected] to open up a ticket. We may need to look at your configuration snapshot tarball and that is best done through more secure means.
Re: False notification
Yes, our customer has support privileges. Ultimately I will need to ask the customer to email the Nagios support team.
But can you guide me on how and when to take the configuration snapshot tarball?
Re: False notification
From the Nagios XI web interface, click the "Admin" menu, then click "Config Snapshots" under the "Monitoring Config" menu on the left. Click both the "Download" and "View Output" action buttons, save both files (*.tar.gz and *.txt), and email them to [email protected].