False notification
Posted: Sat Apr 13, 2013 12:49 am
Hi all
We have nearly 400 network devices monitored via SNMP v3. However, the router host is checked using the check_host_alive command. In the three months since installation, we have run into the scenario below about twice.
Sometimes 4-5 hosts, and at other times 80-90 hosts, show as down with 100% packet loss in the host check command, yet their respective services show as up. Pinging the affected host also gives a very low RTA, well below the threshold. Scheduling an immediate re-check for that host does not correct the state, and it can stay this way for hours.
We have to restart the nagios and httpd services to bring the hosts back to normal.
Any ideas welcome.
Info that might help your analysis: Nagios XI is installed on a server with 4 CPUs and 16GB RAM. CPU usage is below 10%, but RAM utilisation is around 15GB, with 5GB cached. There is 12GB of swap, which is not used at all.
Regards
SBA-Viswa