ERROR: General time-out (Alarm signal) ?
There are a lot of monitors on our Nagios XI, covering both local and overseas sites. Sometimes the "Status" shows "Unknown" and the "Status Information" shows "General time-out (Alarm signal)". What could be the problem, and how can we fix it?
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: ERROR: General time-out (Alarm signal) ?
Hi @xpertech,
Since this is an intermittent issue, it's likely that the problem is related to network connectivity. The Nagios XI server is losing its connection to the remote host, and those checks are generating a timeout error.
When this is happening, can you ping one of those hosts from the Nagios server to verify the connection?
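Because the problem is intermittent, a single manual ping is easy to mistime. Here is a minimal sketch (not an official Nagios tool) of logging packet loss over time from the Nagios server. It assumes a Linux `ping` that accepts `-c` and prints an "N% packet loss" summary line. The function names and the 60-second interval in the usage comment are made up for illustration.

```python
import re
import subprocess
from datetime import datetime

# Matches the summary line printed by Linux ping, for example:
# "5 packets transmitted, 4 received, 20% packet loss, time 4005ms"
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_packet_loss(ping_output):
    """Return the packet-loss percentage found in ping output, or None."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def ping_once(host, count=5):
    """Send one burst of pings and return (timestamp, loss percent or None)."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
    )
    stamp = datetime.now().isoformat(timespec="seconds")
    return stamp, parse_packet_loss(result.stdout)


# Hypothetical usage: log one line per minute and compare the timestamps
# with the times at which the checks went Unknown.
#
#   import time
#   while True:
#       print(*ping_once("X.X.X.X"))
#       time.sleep(60)
```

If the logged loss stays at 0% while checks still time out, that would suggest looking beyond basic reachability.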
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: ERROR: General time-out (Alarm signal) ?
But when the error happens, ping is normal?!
benjaminsmith
Re: ERROR: General time-out (Alarm signal) ?
Hi @xpertech,
Ok. What check command are you using for those services? I ask because, if you are using SNMP to check those services, we've seen this type of behavior when there is UDP packet loss over the network.
Can you post the check command or PM your system profile?
You can try increasing the timeout value for the plugin; however, if you're experiencing UDP packet loss due to network issues, you'll most likely still get the same error message.
Edit the SNMP commands and add the following to the command line:
Code: Select all
-t 60
Nagios XI - How To Test Check Commands From The Command-line
https://support.nagios.com/kb/article/n ... e-167.html
Re: ERROR: General time-out (Alarm signal) ?
Increasing the timeout value doesn't seem to change much, but when we changed the SNMP version from v2 to v1, the number of "Unknown" statuses dropped, although some still remain. Why would the SNMP version affect the number?!
benjaminsmith
Re: ERROR: General time-out (Alarm signal) ?
Hi @xpertech,
Try running the checks for those hosts from the command line while increasing the timeout settings to see how long they take. It looks like increasing the timeouts may resolve the issue.
Once you determine how long they take, you can increase the default timeout settings (below) in /usr/local/nagios/etc/nagios.cfg:
Code: Select all
host_check_timeout=30
service_check_timeout=60
What's with the different SNMP versions? v1, v2c, v3
https://www.logicmonitor.com/blog/whats ... s1-v2c-v3/
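To see how long a check takes when run by hand, something like the following sketch may help. It only wraps whatever command line you pass in and measures wall-clock time. The helper name `time_command` is made up for illustration. The plugin path, community string and OID in the usage comment are placeholders, so substitute your real check command.

```python
import subprocess
import time


def time_command(argv, timeout_s):
    """Run argv and return (elapsed_seconds, exit_code).

    exit_code is None if the command was still running after timeout_s.
    """
    start = time.monotonic()
    try:
        completed = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
        code = completed.returncode
    except subprocess.TimeoutExpired:
        code = None
    return time.monotonic() - start, code


# Hypothetical usage (path, community and OID are placeholders):
#
#   elapsed, code = time_command(
#       ["/usr/local/nagios/libexec/check_snmp", "-H", "X.X.X.X",
#        "-C", "public", "-o", ".1.3.6.1.2.1.1.3.0", "-t", "60"],
#       timeout_s=90,
#   )
#   print(f"{elapsed:.1f}s, exit code {code}")
```

If the measured time is regularly close to or above the configured timeout, that is a hint that the timeout value itself is too low for those hosts.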
Re: ERROR: General time-out (Alarm signal) ?
There are still some Unknown statuses after increasing the timeout value.
Is there a way to test whether the Unknown statuses are caused by a network issue?
Re: ERROR: General time-out (Alarm signal) ?
One thing I've noticed is that the SNMP daemon on devices seems to run at the lowest process priority, so if the system is overloaded it may take a lot longer to poll the SNMP data (or it might not respond at all because it's too overloaded).
Increasing the timeout fixed some of them, which suggests that increasing it further may fix even more.
This could be because of the load on the remote device, or it could be a network issue. More than likely, the remote machine is simply not responding before the check timeout expires, which is what produces the timeout error.
Are you monitoring other metrics on those systems, such as CPU and memory? Is resource usage high on those machines when this occurs?
When I'm troubleshooting this, I generally run the check commands manually from the command line to see how long they take, or run an snmpwalk against the device during that time. Most of the time the remote device turns out to be performing poorly while the commands are running. Do you see anything in the logs on the remote devices that indicates an issue?
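One manual run only shows a single moment, so here is a rough sketch of repeating a probe and summarizing the results. The `probe` and `summarize` helpers are hypothetical names, not part of Nagios or Net-SNMP, and the run count in the usage comment is arbitrary.

```python
import statistics
import subprocess
import time


def probe(argv, timeout_s):
    """Return elapsed seconds if argv exits 0 within timeout_s, else None."""
    start = time.monotonic()
    try:
        done = subprocess.run(argv, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return time.monotonic() - start if done.returncode == 0 else None


def summarize(samples):
    """Summarize probe results; None entries count as failed runs."""
    ok = [s for s in samples if s is not None]
    return {
        "runs": len(samples),
        "failed": len(samples) - len(ok),
        "median_s": statistics.median(ok) if ok else None,
        "max_s": max(ok) if ok else None,
    }


# Hypothetical usage, run while the checks are showing Unknown:
#
#   cmd = ["snmpwalk", "-v", "2c", "-c", "public", "X.X.X.X:161"]
#   print(summarize([probe(cmd, timeout_s=60) for _ in range(20)]))
```

As a rule of thumb only: occasional failed runs mixed with otherwise fast ones would lean toward packet loss, while consistently slow runs would lean toward a busy device. Treat that as a starting point, not a diagnosis.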
Code: Select all
snmpwalk -v 2c -c public X.X.X.X:161
Re: ERROR: General time-out (Alarm signal) ?
None of the local hosts report Unknown status; only the overseas (American and European) hosts do, so it does not seem to be a Nagios XI load or performance issue.
When the GUI showed Unknown status, we ran snmpwalk and everything was normal. But at the same time, the GUI disk and RAM checks showed host-A as normal and host-B as Unknown, while the CPU check showed host-A as Unknown and host-B as normal, so it does not seem to be a network issue?! We also checked the load on host-A and host-B, and neither seems to be under heavy load.
Re: ERROR: General time-out (Alarm signal) ?
attachment