Linux SNMP: Process name table No response from remote host
Posted: Fri Jun 21, 2013 9:28 am
I am having trouble running SNMP v3 checks against some RedHat Linux hosts.
The check plugin being run is check_snmp_process_wizard.pl with the -f flag (the process I need to find can only be matched using the full-path option).
In the Nagios GUI, the error message I am getting back from the check is ERROR: Process name table : No response from remote host
Running the check from the command line of the Nagios server (the date commands are there for checking the timing of the command):
date; /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H <omitted> --login=<omitted> --passwd=<omitted> --privpass=<omitted> --protocols=sha,aes -n 'ora_pmon_*' -f -w '0,1' -c '0,1'; date
Fri Jun 21 08:49:57 CDT 2013
ERROR: Process name table : No response from remote host '<omitted>'.
Fri Jun 21 08:50:07 CDT 2013
However, if I omit the -f flag I get a response. I still need -f to find the process I am looking for; this run is only to demonstrate that SNMP itself works:
date; /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H <omitted> --login=<omitted> --passwd=<omitted> --privpass=<omitted> --protocols=sha,aes -n 'ora_pmon_*' -w '0,1' -c '0,1'; date
Fri Jun 21 09:07:56 CDT 2013
No process matching ora_pmon_* found : CRITICAL
Fri Jun 21 09:07:57 CDT 2013
I know this is not an SNMP user permission problem, because I can run an snmpwalk against the host and find the process I am looking for without any trouble:
snmpwalk -v 3 -l authPriv -a sha -A <omitted> -x aes -X <omitted> -u <omitted> "<omitted>" | grep ora_pmon_*
HOST-RESOURCES-MIB::hrSWRunPath.4562 = STRING: "ora_pmon_GRID"
Thinking this may be a timeout problem trying to check the path table rather than the process name table, I tried adding the --timeout flag to the command:
date; /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H <omitted> --login=<omitted> --passwd=<omitted> --privpass=<omitted> --protocols=sha,aes -n 'ora_pmon_*' -f --timeout=10 -w '0,1' -c '0,1'; date
Fri Jun 21 09:12:50 CDT 2013
ERROR: Alarm signal (Nagios time-out)
Fri Jun 21 09:13:06 CDT 2013
With the --timeout flag I get a different error, but judging by the date output the command appears to ignore my custom timeout value: Fri Jun 21 09:12:50 CDT 2013 -> Fri Jun 21 09:13:06 CDT 2013 = 16 seconds by the timestamps, or roughly 15 seconds, although the timeout was set to 10.
I tried again with a smaller timeout value. The error message changed back, but the check still ran longer than the timeout:
date; /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H <omitted> --login=<omitted> --passwd=<omitted> --privpass=<omitted> --protocols=sha,aes -n 'ora_pmon_*' -f --timeout=5 -w '0,1' -c '0,1'; date
Fri Jun 21 09:14:43 CDT 2013
ERROR: Process name table : No response from remote host '<omitted>'.
Fri Jun 21 09:14:54 CDT 2013
Check duration: Fri Jun 21 09:14:43 CDT 2013 -> Fri Jun 21 09:14:54 CDT 2013 = 11 seconds instead of 5 seconds.
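The elapsed time could also be measured directly instead of being read off the two date outputs. This is only a minimal sketch, assuming a POSIX shell; the function name timed is hypothetical and is not part of the plugin or of Nagios:

```shell
# Hypothetical helper: run a command, then report whole seconds elapsed
# and the command's exit status on stderr.
timed() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start))s (exit status $status)" >&2
    return "$status"
}

# Usage (same arguments as the check above):
#   timed /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H <omitted> ... -f --timeout=5 -w '0,1' -c '0,1'
```

Because the report goes to stderr, the plugin's own output on stdout stays unchanged, and the exit status is passed through so the OK/WARNING/CRITICAL result is still visible.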
Thinking the check might be using one of the configured Nagios timeouts, I checked all of the .cfg files:
grep timeout= /usr/local/nagios/etc/*
/usr/local/nagios/etc/nagios.cfg:event_handler_timeout=30
/usr/local/nagios/etc/nagios.cfg:host_check_timeout=30
/usr/local/nagios/etc/nagios.cfg:notification_timeout=30
/usr/local/nagios/etc/nagios.cfg:ocsp_timeout=5
/usr/local/nagios/etc/nagios.cfg:perfdata_timeout=5
/usr/local/nagios/etc/nagios.cfg:service_check_timeout=60
/usr/local/nagios/etc/ndomod.cfg:file_rotation_timeout=60
/usr/local/nagios/etc/nrpe.cfg:command_timeout=60
/usr/local/nagios/etc/nrpe.cfg:connection_timeout=300
None of the timeout values are set to 15 seconds. So I have no idea why:
1. The -f flag is causing these commands to fail
2. The command is not obeying the --timeout flag
Any help would be appreciated.