I'm running Nagios XI 5.11.3, and I seem to have some commands stuck in the External Command File.
I noticed that the messages file in /var/log is growing and eating up disk space, and when I look in it, it's full of the same messages repeated over and over.
I think they're related to SNMP traps being written to the external command file, but we're not using SNMP traps anywhere. I probably tried to set this up at some point and then changed my mind.
One of the hosts in question is actually shut down, and I've checked all my UPS devices and removed any SNMP trap entries from them, so I think Nagios is just reprocessing these old messages. How can I confirm this is the case, and how can I make it stop?
Many Thanks,
Peter.
Entries in /var/log/messages:
Nov 20 16:41:24 nagios nagios: Warning: Passive check result was received for service 'SNMP Traps' on host 'UNKNOWN', but the host could not be found!
Nov 20 16:41:24 nagios nagios: Error: External command failed -> PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;0;APC UPS: Passed self-test: The UPS passed internal self-test. / enterprises.318.2.3.3.0 ():UPS: Self-Test passed.
Nov 20 16:41:24 nagios nagios: External command [1694786177] PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;0;APC UPS: Passed self-test: The UPS passed internal self-test. / enterprises.318.2.3.3.0 ():UPS: Self-Test passed. returned error Command failed
Nov 20 16:41:24 nagios nagios: Warning: Passive check result was received for service 'SNMP Traps' on host 'UNKNOWN', but the host could not be found!
Nov 20 16:41:24 nagios nagios: Error: External command failed -> PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;1;APC UPS: On battery: The UPS has switched to battery backup power. / enterprises.318.2.3.3.0 ():UPS: On battery power in response to distorted input.
Nov 20 16:41:24 nagios nagios: External command [1695001305] PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;1;APC UPS: On battery: The UPS has switched to battery backup power. / enterprises.318.2.3.3.0 ():UPS: On battery power in response to distorted input. returned error Command failed
Nov 20 16:41:24 nagios nagios: Warning: Passive check result was received for service 'SNMP Traps' on host 'UNKNOWN', but the host could not be found!
Nov 20 16:41:24 nagios nagios: Error: External command failed -> PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;0;APC UPS: Utility power restored: Returned from battery backup power; utility power restored. / enterprises.318.2.3.3.0 ():UPS: No longer on battery power.
Nov 20 16:41:24 nagios nagios: External command [1695001306] PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;0;APC UPS: Utility power restored: Returned from battery backup power; utility power restored. / enterprises.318.2.3.3.0 ():UPS: No longer on battery power. returned error Command failed
If I run cat on /usr/local/nagios/var/rw/nagios.cmd, I see the following lines appear every few seconds:
[1689222801] PROCESS_SERVICE_CHECK_RESULT;storage01;SNMP Traps;0;Health Status Array Change occurred (11020): A change in the health status of the server has occurred, the status is now 01 01 02 02 02 02 02 02 02 02 01 01 01 02 / sysName.0 (OCTETSTR):STORAGE01 enterprises.232.11.2.11.1.0 ():0 enterprises.232.11.2.10.7.0 ():01 01 02 02 02 02 02 02 02 02 01 01 01 02
[1690376072] PROCESS_SERVICE_CHECK_RESULT;storage01;SNMP Traps;0;Health Status Array Change occurred (11020): A change in the health status of the server has occurred, the status is now 01 01 02 02 02 01 01 04 02 02 01 01 01 02 / sysName.0 (OCTETSTR):STORAGE01 enterprises.232.11.2.11.1.0 ():0 enterprises.232.11.2.10.7.0 ():01 01 02 02 02 01 01 04 02 02 01 01 01 02
[1691157436] PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;0;APC UPS: Passed self-test: The UPS passed internal self-test. / enterprises.318.2.3.3.0 ():UPS: Self-Test passed.
[1691634065] PROCESS_SERVICE_CHECK_RESULT;storage01;SNMP Traps;0;Health Status Array Change occurred (11020): A change in the health status of the server has occurred, the status is now 01 01 02 02 02 02 02 02 02 02 01 01 01 02 / sysName.0 (OCTETSTR):STORAGE01 enterprises.232.11.2.11.1.0 ():0 enterprises.232.11.2.10.7.0 ():01 01 02 02 02 02 02 02 02 02 01 01 01 02
[1691634125] PROCESS_SERVICE_CHECK_RESULT;storage01;SNMP Traps;0;Health Status Array Change occurred (11020): A change in the health status of the server has occurred, the status is now 02 02 02 02 02 02 02 02 02 02 01 01 01 02 02 02 / sysName.0 (OCTETSTR):STORAGE01 enterprises.232.11.2.11.1.0 ():0 enterprises.232.11.2.10.7.0 ():02 02 02 02 02 02 02 02 02 02 01 01 01 02 02 02
[1689222801] PROCESS_SERVICE_CHECK_RESULT;storage01;SNMP Traps;0;Health Status Array Change occurred (11020): A change in the health status of the server has occurred, the status is now 01 01 02 02 02 02 02 02 02 02 01 01 01 02 / sysName.0 (OCTETSTR):STORAGE01 enterprises.232.11.2.11.1.0 ():0 enterprises.232.11.2.10.7.0 ():01 01 02 02 02 02 02 02 02 02 01 01 01 02
[1689222860] PROCESS_SERVICE_CHECK_RESULT;storage01;SNMP Traps;0;Health Status Array Change occurred (11020): A change in the health status of the server has occurred, the status is now 02 02 02 02 02 02 02 02 02 02 01 01 01 02 02 02 / sysName.0 (OCTETSTR):STORAGE01 enterprises.232.11.2.11.1.0 ():0 enterprises.232.11.2.10.7.0 ():02 02 02 02 02 02 02 02 02 02 01 01 01 02 02 02
[1692367017] PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;0;APC UPS: Passed self-test: The UPS passed internal self-test. / enterprises.318.2.3.3.0 ():UPS: Self-Test passed.
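One way I could check whether these are old entries being replayed is to decode the bracketed number at the start of each command, which is a Unix epoch timestamp, and compare it with the Nov 20 date on the log lines. Below is a rough sketch of that idea, not anything that ships with Nagios: the function name parse_command and the regular expression are my own, and it assumes the lines look like the ones pasted above.

```python
import re
import sys
from datetime import datetime, timezone

# Matches "[<epoch>] PROCESS_SERVICE_CHECK_RESULT;<host>;<service>;<code>;"
# as seen in the pasted nagios.cmd and /var/log/messages lines.
CMD_RE = re.compile(
    r"\[(\d+)\] PROCESS_SERVICE_CHECK_RESULT;([^;]*);([^;]*);(\d+);"
)

def parse_command(line):
    """Return (submitted_at_utc, host, service, return_code), or None if no match."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    epoch, host, service, code = m.groups()
    submitted = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return submitted, host, service, int(code)

if __name__ == "__main__":
    # Read log or command lines from standard input and print the decoded time.
    for raw in sys.stdin:
        parsed = parse_command(raw)
        if parsed is not None:
            submitted, host, service, code = parsed
            print(f"{submitted.isoformat()}  host={host}  service={service}  code={code}")
```

Saved under a hypothetical name such as decode_cmds.py, it could be fed the log with something like `grep 'External command \[' /var/log/messages | python3 decode_cmds.py`. If the decoded times are far older than the log timestamps, that would support the idea that stale commands are being resubmitted rather than new traps arriving.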