False notification

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
sbaviswa
Posts: 17
Joined: Tue Jan 22, 2013 9:04 am

False notification

Post by sbaviswa »

Hi all

We have nearly 400 network devices monitored via SNMP v3. However, the router host is checked using the check_host_alive command. In the three months since installation, we have hit the scenario below about twice.

Sometimes 4-5 hosts, and at other times 80-90 hosts, show as down with 100% packet loss in the host check, while their services still show as up. Pinging those hosts gives a very low RTA, well below the threshold. Re-scheduling an immediate check for an affected host does not correct it; the host can stay in this state for hours.

We have to restart the nagios and httpd services to bring the hosts back to normal.

Any ideas welcome.

Info that might help your analysis: we installed Nagios XI on a 4-CPU / 16 GB RAM server. CPU usage stays below 10%, but RAM utilisation is around 15 GB, of which 5 GB is cached. There is 12 GB of swap, which is not used at all.
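For reference, cached memory is reclaimable, so 15 GB used with 5 GB cached means roughly 10 GB is actually held by applications. A minimal sketch of that arithmetic, assuming a Linux server that exposes the usual MemTotal, MemFree, Buffers and Cached fields in /proc/meminfo (the function name app_used_mb is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: reads /proc/meminfo-style text on stdin and prints
# the memory held by applications in MB (total minus free, buffers, cache).
app_used_mb() {
    awk '$1 == "MemTotal:" { t = $2 }
         $1 == "MemFree:"  { f = $2 }
         $1 == "Buffers:"  { b = $2 }
         $1 == "Cached:"   { c = $2 }
         END { printf "%d\n", (t - f - b - c) / 1024 }'
}

# Live usage on the monitoring server:
#   app_used_mb < /proc/meminfo
```

If this figure stays well below the installed RAM, memory pressure is unlikely to be the cause, which would fit the unused swap.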

Regards
SBA-Viswa
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: False notification

Post by scottwilkerson »

If you run a standard ping to these hosts from the CLI, do you see packet loss there?
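A rough sketch of that comparison, assuming a Linux iputils-style ping whose summary line contains "N% packet loss". The function names and the address 192.0.2.10 are placeholders, not anything from your setup:

```shell
#!/bin/sh
# Hypothetical helper: pulls the loss percentage out of ping's summary line,
# e.g. "5 packets transmitted, 5 received, 0% packet loss, time 4005ms".
parse_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Ping a host five times and print only its packet-loss percentage.
cli_loss() {
    ping -c 5 -q "$1" 2>/dev/null | parse_loss
}

# Usage while a host is shown as down (placeholder address):
#   cli_loss 192.0.2.10
```

It is also worth running the plugin itself as the nagios user at the same moment, for example /usr/local/nagios/libexec/check_icmp -H 192.0.2.10 -w 3000.0,80% -c 5000.0,100% -p 5. That path and those thresholds are the usual defaults for check_host_alive, but treat them as assumptions and check your own command definition. If the CLI ping is clean while the plugin reports 100% loss, the problem is on the monitoring side rather than in the network.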
Former Nagios employee
sbaviswa
Posts: 17
Joined: Tue Jan 22, 2013 9:04 am

Re: False notification

Post by sbaviswa »

While the hosts show as down and the services show as up, the hosts are actually up. We tested this by pinging from the "ping this host" option in the monitoring GUI as well as directly from outside the Nagios environment.

We also evaluated the following document --> http://assets.nagios.com/downloads/nagi ... _In_XI.pdf

It mentions that firewalls in between might block the ICMP traffic, but the problem still affects only certain devices, and it returns to normal when we restart the nagios and httpd services.

SBA-Viswa
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: False notification

Post by abrist »

Verify that there are not 2 separate nagios processes:

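service nagios stop        # stop the managed daemon
killall nagios             # kill any stray nagios processes left behind
ps -aef | grep nagios      # confirm no nagios daemon is still running
service nagios start       # start a single clean instance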
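To check for a duplicate without stopping anything, something like the sketch below may help. It assumes each independent nagios daemon has been re-parented to PID 1, which may not hold on every init system, and the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -eo pid,ppid,comm` output on stdin and
# counts top-level processes named exactly "nagios" (parent PID 1).
# Children forked by the main daemon have a different parent and are skipped.
count_nagios_daemons() {
    awk '$3 == "nagios" && $2 == 1 { n++ } END { print n + 0 }'
}

# Live usage:
#   ps -eo pid,ppid,comm | count_nagios_daemons
```

A result greater than 1 would suggest that two separate daemons are running and writing conflicting check results.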
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
sbaviswa
Posts: 17
Joined: Tue Jan 22, 2013 9:04 am

Re: False notification

Post by sbaviswa »

The issue cropped up again today, with around 51 hosts showing as down when they were actually up.

Our client checked the processes while the issue was occurring. There were no multiple nagios instances that could have triggered this scenario.

Please shed some light on what to check next. Our client is embarrassed and is starting to doubt the credibility of the status shown for the other hosts.

Regards
SBA-Viswa
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: False notification

Post by abrist »

If you are a paying customer, could you please send an email to [email protected] to open up a ticket. We may need to look at your configuration snapshot tarball and that is best done through more secure means.
sbaviswa
Posts: 17
Joined: Tue Jan 22, 2013 9:04 am

Re: False notification

Post by sbaviswa »

Yes, our customer has support privileges. Ultimately I will need to ask the customer to mail the Nagios support team.
But can you guide me on how and when to take the configuration snapshot tarball?
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: False notification

Post by lmiltchev »

From the Nagios XI web interface, click on the "Admin" menu, then click on "Config Snapshots" under the "Monitoring Config" menu on the left. Click on both the "Download" and "View Output" action buttons, save both files (*.tar.gz and *.txt), and email them to [email protected].