Linux SNMP Checks Returning Intermittent Errors
Posted: Wed Apr 09, 2025 10:29 am
Hello,
We recently discovered that our Linux SNMP checks return intermittent errors throughout the day. The errors appear to be transient, as they clear almost immediately.
We are only running this type of check on a few servers, but we have noticed the errors on all of them. One server in particular is alerting 15 or more times per day per check.
On this particular server, we are running 1 CPU check, 1 MEM check, and 10 DISK checks.
For the CPU check, we are getting UNKNOWN - No answer from host. This is the check command:
$USER1$/check_snmp_load_wizard.pl -H $HOSTADDRESS$ -C <community string> --v2c -w 95 -c 98 -f
For the MEM check, we are getting UNKNOWN - ERROR: netsnmp : No response from remote host "<hostname>". This is the check command:
$USER1$/check_snmp_mem.pl -H $HOSTADDRESS$ -C <community string> -2 -w 90,70 -c 95,75 -f
For the DISK checks, we are getting CRITICAL - ERROR: Description/Type table : No response from remote host "<hostname>". This is the check command:
$USER1$/check_snmp_storage_wizard.pl -H $HOSTADDRESS$ -C <community string> --v2c -m "^/var$" -w 95 -c 98 -f
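To put a number on how often each check fails, one option is to run the expanded check command by hand in a loop and tally the plugin exit codes. The following is only a minimal sketch: `tally_check` is a hypothetical helper name, the run count and pause are arbitrary, and it relies on the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess
import time
from collections import Counter

# Standard Nagios plugin exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def tally_check(argv, runs=20, pause=0.0, timeout=90):
    """Run a check command `runs` times and count the resulting states.

    argv    -- the fully expanded command as a list (macros such as
               $USER1$ and $HOSTADDRESS$ must already be substituted)
    pause   -- seconds to sleep between runs
    timeout -- hard limit per run, in seconds (an assumption, not a
               value taken from the plugins)
    """
    counts = Counter()
    for _ in range(runs):
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            # Any exit code outside 0-3 is treated as UNKNOWN here.
            state = NAGIOS_STATES.get(proc.returncode, "UNKNOWN")
        except subprocess.TimeoutExpired:
            # The command never returned within the hard limit.
            state = "UNKNOWN"
        counts[state] += 1
        time.sleep(pause)
    return counts
```

For example, `tally_check(["/path/to/check_snmp_mem.pl", "-H", "<hostname>", "-C", "<community string>", "-2", "-w", "90,70", "-c", "95,75", "-f"], runs=100, pause=5)` would return something like a `Counter` of states, where the path and placeholders must be replaced with real values. Running it from the monitoring server and from another host on the same network segment may help separate agent-side slowness from network loss.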
This particular server is running Red Hat Enterprise Linux v7.9.0 STANDARD.
We are in the process of setting up a new QA environment and added the same checks there, and we are getting similar results.
I then made a couple of changes:
1) In check_snmp_load_wizard.pl specifically, I enabled my $TIMEOUT = 30; (it was previously using the default of 15).
2) I added -t 60 to the checks.
The result was fewer alerts, and all alerts now come in as UNKNOWN - ERROR: General time-out (Alarm signal).
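One possible reading of the change in error text is that two timers are racing: a global alarm in the plugin and the SNMP session's own timeout. The sketch below only illustrates that arithmetic. It assumes that $TIMEOUT drives the plugin's alarm, that -t is passed to the SNMP session as a per-request timeout, and that the session retries once before giving up; none of this is confirmed here and it should be checked against the plugin source. `first_to_fire` is a hypothetical name.

```python
def first_to_fire(alarm_seconds, snmp_timeout, snmp_retries=1):
    """Return which timer expires first and after how many seconds.

    alarm_seconds -- the plugin's global alarm (assumed to come from $TIMEOUT)
    snmp_timeout  -- per-request SNMP timeout (assumed to come from -t)
    snmp_retries  -- retries after the first request (assumed value)
    """
    # Worst case for an unresponsive agent: the initial request plus
    # every retry each wait out the full per-request timeout.
    snmp_budget = snmp_timeout * (snmp_retries + 1)
    if alarm_seconds <= snmp_budget:
        return ("alarm", alarm_seconds)
    return ("snmp", snmp_budget)


# Values from the post: $TIMEOUT = 30 and -t 60.
print(first_to_fire(30, 60))
# A shorter SNMP timeout that fits inside the same alarm window.
print(first_to_fire(30, 5))
```

Under these assumptions, a 60-second SNMP timeout can never expire before a 30-second alarm, which would be consistent with every failure now reporting the alarm time-out instead of "No response from remote host". If that holds, keeping the SNMP timeout multiplied by the number of attempts below the alarm value should bring back the more specific SNMP error.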
I have also tried combining the disk checks into a single check, thinking the issue is the frequency of the SNMP calls to the server. This produced similar results: fewer alerts, but with the same General time-out error. We also lose the performance graphs.
Any assistance in resolving this is greatly appreciated. The issue is not the alerts, but the noise within the UI and in the state history, making it difficult for our application owners to be aware of any legitimate issues.
Thanks in advance.