Intermittent alerts from Unix server

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
pbsindian
Posts: 14
Joined: Tue Aug 14, 2018 3:29 pm

Intermittent alerts from Unix server

Post by pbsindian »

Hi Team,
We have configured our Unix monitoring using SNMP. We are in the POC phase and are seeing frequent alerts from different servers at different times. We haven't found any pattern in the alerts.

For example, we received the following alerts all at the same time:

For host :
tg-pxoct is DOWN CRITICAL - Plugin timed out while executing system call
And for each service :
tg-pxoct : Disk Usage is CRITICAL ERROR: Description/Type table : No response from remote host 10.XX.XX.XX.

The host was not down during that time, and we didn't find any issues with the host either. Given the intervals below, why did we receive alerts right from the start and then clear alerts within a few minutes? Why didn't Nagios wait to complete the polling cycle before sending us alerts? Is it because the check timed out instead of returning fail/success? How do we avoid these issues?

Check Interval : 5
Retry Interval : 1
Max check attempts : 19
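
For reference, here is a minimal sketch of the delay we expected from these settings. It assumes the default interval_length of 60 seconds (so intervals are in minutes) and that a notification is sent only once a check reaches a HARD state after the last attempt. The function name is our own and is not part of Nagios.

```python
def minutes_until_hard_state(max_check_attempts, retry_interval, interval_length_s=60):
    """Estimate minutes between the first failed check and the HARD state.

    Assumption: the first failure counts as attempt 1, and each further
    attempt is scheduled one retry interval after the previous one.
    """
    retries = max_check_attempts - 1
    return retries * retry_interval * interval_length_s / 60


# Settings from this post: Max check attempts 19, Retry Interval 1
expected_delay = minutes_until_hard_state(19, 1)
print(f"Expected delay before a HARD state: {expected_delay} minutes")
```

By that reckoning, the first alert should have arrived roughly 18 minutes after the first failed check, not immediately.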

We are using the same SNMP community string for Nagios and for our other monitoring tools. Will that be an issue?


Thanks,
Bhargava
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Intermittent alerts from Unix server

Post by tgriep »

One thing to try is to increase the timeout for your SNMP check.
Some plugins have to retrieve a lot of data when they poll a server, and if a plugin does not get all of the data in time, the check times out.
Most of the SNMP plugins have a 5-second timeout.
Try editing your command and increasing the timeout to 59 seconds by adding the following to the command line:
-t 59

Another thing to look into: SNMP uses the UDP protocol, and if there is any network congestion, that data could be dropped.
Make sure your network devices are set to not drop that data.

Let us know if this helps.
Be sure to check out our Knowledgebase for helpful articles and solutions!
pbsindian
Posts: 14
Joined: Tue Aug 14, 2018 3:29 pm

Re: Intermittent alerts from Unix server

Post by pbsindian »

Thank you. We have applied the timeout on the servers that were alerting and will monitor for the next couple of days.

We have been seeing different kinds of timeout errors, like the ones below, from various servers intermittently.

ERROR: General time-out (Alarm signal)
ERROR: Description/Type table : No response from remote host
No answer from host
service check timed out
no response from host

What is the best way to handle these timeouts?

Should we add -t 59 across all of the 1000+ services we have onboarded so far?

Thanks,
Bhargava
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Intermittent alerts from Unix server

Post by tgriep »

If the timeout alerts are generated from SNMP checks, then I would increase the timeout value for the command that you are using for the checks.
That way, you would only have to edit a few commands in the Core Config Manager instead of editing the service checks individually to fix the timeout issue.
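
To get a feel for how few commands are involved, a rough Python sketch along these lines counts how many services share each check command. It is only an illustration: the sample config text, the command names (check_snmp_storage, check_snmp_load), the community string and the second host name are made up, and your own object definitions will differ.

```python
import re
from collections import Counter

# Hypothetical sample of Nagios service definitions, for illustration only.
SAMPLE_CONFIG = """
define service {
    host_name            tg-pxoct
    service_description  Disk Usage
    check_command        check_snmp_storage!public!/!80!90
}
define service {
    host_name            tg-pxoct
    service_description  CPU Load
    check_command        check_snmp_load!public!4,3,2!8,6,4
}
define service {
    host_name            other-host
    service_description  Disk Usage
    check_command        check_snmp_storage!public!/var!80!90
}
"""


def count_check_commands(config_text):
    """Count services per command name (the part before the first '!')."""
    counts = Counter()
    pattern = re.compile(r"^\s*check_command\s+([^!\s]+)", re.MULTILINE)
    for match in pattern.finditer(config_text):
        counts[match.group(1)] += 1
    return counts


counts = count_check_commands(SAMPLE_CONFIG)
for name, total in counts.most_common():
    print(f"{name}: {total} service(s)")
```

Each command name it lists is one place where the timeout would need to be raised, however many services use it.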