Clear External Command File?

proddan
Posts: 16
Joined: Mon Feb 13, 2017 8:38 am

Clear External Command File?

Post by proddan »

Hi Everyone,

I'm running Nagios XI 5.11.3, and I seem to have some commands stuck in the External Command File.

I noticed that the messages file in /var/log is growing and eating up disk space, and when I look in it, it's full of the same messages.

I think they're related to SNMP traps being processed to the external command file, but we're not using SNMP traps anywhere. I've likely tried to set this up before, but then changed my mind.

One of the hosts in question is actually shut down, and I've checked all my UPS devices and removed any SNMP trap entries from them, so I think Nagios is just reprocessing these old messages. How can I confirm this is the case, and how can I make it stop?


Many Thanks,


Peter.

Entries in /var/log/messages

Nov 20 16:41:24 nagios nagios: Warning:  Passive check result was received for service 'SNMP Traps' on host 'UNKNOWN', but the host could not be found!
Nov 20 16:41:24 nagios nagios: Error: External command failed -> PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;0;APC UPS: Passed self-test: The UPS passed internal self-test. / enterprises.318.2.3.3.0 ():UPS: Self-Test passed.
Nov 20 16:41:24 nagios nagios: External command [1694786177] PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;0;APC UPS: Passed self-test: The UPS passed internal self-test. / enterprises.318.2.3.3.0 ():UPS: Self-Test passed. returned error Command failed
Nov 20 16:41:24 nagios nagios: Warning:  Passive check result was received for service 'SNMP Traps' on host 'UNKNOWN', but the host could not be found!
Nov 20 16:41:24 nagios nagios: Error: External command failed -> PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;1;APC UPS: On battery: The UPS has switched to battery backup power. / enterprises.318.2.3.3.0 ():UPS: On battery power in response to distorted input.
Nov 20 16:41:24 nagios nagios: External command [1695001305] PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;1;APC UPS: On battery: The UPS has switched to battery backup power. / enterprises.318.2.3.3.0 ():UPS: On battery power in response to distorted input. returned error Command failed
Nov 20 16:41:24 nagios nagios: Warning:  Passive check result was received for service 'SNMP Traps' on host 'UNKNOWN', but the host could not be found!
Nov 20 16:41:24 nagios nagios: Error: External command failed -> PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;0;APC UPS: Utility power restored: Returned from battery backup power; utility power restored. / enterprises.318.2.3.3.0 ():UPS: No longer on battery power.
Nov 20 16:41:24 nagios nagios: External command [1695001306] PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;0;APC UPS: Utility power restored: Returned from battery backup power; utility power restored. / enterprises.318.2.3.3.0 ():UPS: No longer on battery power. returned error Command failed

If I run cat on /usr/local/nagios/var/rw/nagios.cmd, I see the following lines appear every few seconds:

[1689222801] PROCESS_SERVICE_CHECK_RESULT;storage01;SNMP Traps;0;Health Status Array Change occurred (11020): A change in the health status of the server has occurred, the status is now 01 01 02 02 02 02 02 02 02 02 01 01 01 02  / sysName.0 (OCTETSTR):STORAGE01 enterprises.232.11.2.11.1.0 ():0 enterprises.232.11.2.10.7.0 ():01 01 02 02 02 02 02 02 02 02 01 01 01 02
[1690376072] PROCESS_SERVICE_CHECK_RESULT;storage01;SNMP Traps;0;Health Status Array Change occurred (11020): A change in the health status of the server has occurred, the status is now 01 01 02 02 02 01 01 04 02 02 01 01 01 02  / sysName.0 (OCTETSTR):STORAGE01 enterprises.232.11.2.11.1.0 ():0 enterprises.232.11.2.10.7.0 ():01 01 02 02 02 01 01 04 02 02 01 01 01 02
[1691157436] PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;0;APC UPS: Passed self-test: The UPS passed internal self-test. / enterprises.318.2.3.3.0 ():UPS: Self-Test passed.
[1691634065] PROCESS_SERVICE_CHECK_RESULT;storage01;SNMP Traps;0;Health Status Array Change occurred (11020): A change in the health status of the server has occurred, the status is now 01 01 02 02 02 02 02 02 02 02 01 01 01 02  / sysName.0 (OCTETSTR):STORAGE01 enterprises.232.11.2.11.1.0 ():0 enterprises.232.11.2.10.7.0 ():01 01 02 02 02 02 02 02 02 02 01 01 01 02
[1691634125] PROCESS_SERVICE_CHECK_RESULT;storage01;SNMP Traps;0;Health Status Array Change occurred (11020): A change in the health status of the server has occurred, the status is now 02 02 02 02 02 02 02 02 02 02 01 01 01 02 02 02  / sysName.0 (OCTETSTR):STORAGE01 enterprises.232.11.2.11.1.0 ():0 enterprises.232.11.2.10.7.0 ():02 02 02 02 02 02 02 02 02 02 01 01 01 02 02 02
[1689222801] PROCESS_SERVICE_CHECK_RESULT;storage01;SNMP Traps;0;Health Status Array Change occurred (11020): A change in the health status of the server has occurred, the status is now 01 01 02 02 02 02 02 02 02 02 01 01 01 02  / sysName.0 (OCTETSTR):STORAGE01 enterprises.232.11.2.11.1.0 ():0 enterprises.232.11.2.10.7.0 ():01 01 02 02 02 02 02 02 02 02 01 01 01 02
[1689222860] PROCESS_SERVICE_CHECK_RESULT;storage01;SNMP Traps;0;Health Status Array Change occurred (11020): A change in the health status of the server has occurred, the status is now 02 02 02 02 02 02 02 02 02 02 01 01 01 02 02 02  / sysName.0 (OCTETSTR):STORAGE01 enterprises.232.11.2.11.1.0 ():0 enterprises.232.11.2.10.7.0 ():02 02 02 02 02 02 02 02 02 02 01 01 01 02 02 02
[1692367017] PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;0;APC UPS: Passed self-test: The UPS passed internal self-test. / enterprises.318.2.3.3.0 ():UPS: Self-Test passed.
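
One way to check whether these are old messages being replayed is to decode the bracketed epoch timestamp at the start of each line. The following is only a minimal sketch, assuming the line format shown in the listing above; the function name `parse_passive_result` is made up for illustration and is not part of Nagios.

```python
from datetime import datetime, timezone

def parse_passive_result(line):
    """Split one PROCESS_SERVICE_CHECK_RESULT line into its fields.

    Assumes the layout seen in the listing:
    [epoch] COMMAND;host;service;return_code;plugin_output
    """
    stamp, _, rest = line.partition("] ")
    when = datetime.fromtimestamp(int(stamp.lstrip("[")), tz=timezone.utc)
    # Limit the split so semicolons inside the plugin output are preserved.
    command, host, service, code, output = rest.split(";", 4)
    return when, command, host, service, int(code), output

# Shortened copy of the first line in the listing above.
sample = ("[1689222801] PROCESS_SERVICE_CHECK_RESULT;storage01;SNMP Traps;0;"
          "Health Status Array Change occurred (11020)")
when, command, host, service, code, output = parse_passive_result(sample)
print(when.isoformat(), command, host, service, code)
```

The first timestamp in the listing decodes to 13 July 2023 (UTC). If the decoded dates are well before the date on which the lines are appearing, that would be consistent with old results being resubmitted rather than new traps arriving.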
gwesterman
Posts: 258
Joined: Wed Aug 23, 2023 11:29 am

Re: Clear External Command File?

Post by gwesterman »

Hi @proddan,

Based on your command file (nagios.cmd), some external application is regularly writing PROCESS_SERVICE_CHECK_RESULT commands to it. Nagios processes each command, the command fails because the host cannot be found, and an error message is written to your log file. I can think of a few ways to fix this:
1) The cleanest way to fix this would be to locate the external application writing this command to nagios.cmd and disable it. However, if you cannot find it, there are two other things you could try.
2) Not ideal, but you could disable external commands entirely. This is done by setting

check_external_commands=0
in nagios.cfg. This is simple, but you would not be able to use any other external commands while the setting is in place.
3) Perhaps the best option in your scenario would be to redirect your command file. You can specify a new file location for Nagios to use when processing external commands. That way you can still use external commands, and your misconfigured SNMP external application will be writing to a file that Nagios no longer reads (or that no longer exists, if you just delete it). You can do this by specifying the file location of your command file with

command_file=/usr/local/nagios/var/rw/new_command_file.cmd
in nagios.cfg.
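
Before changing either directive, it may help to confirm what is currently set. This is a rough sketch that pulls the two directives mentioned above out of nagios.cfg-style text; the helper name `read_settings` and the sample excerpt are illustrative assumptions, not taken from your system.

```python
def read_settings(cfg_text, keys=("check_external_commands", "command_file")):
    """Return the requested key=value directives from nagios.cfg-style text."""
    found = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() in keys:
            found[key.strip()] = value.strip()
    return found

# Hypothetical excerpt; on a real system, read the contents of nagios.cfg instead.
sample_cfg = """
# excerpt in the style of nagios.cfg
check_external_commands=1
command_file=/usr/local/nagios/var/rw/nagios.cmd
"""
settings = read_settings(sample_cfg)
print(settings)
```

If the same directive appears more than once, this sketch keeps the last value it sees, which may not match how Nagios itself resolves duplicates.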

This article might be of assistance for reference.

Let us know if this works and if you have any other questions.
proddan
Posts: 16
Joined: Mon Feb 13, 2017 8:38 am

Re: Clear External Command File?

Post by proddan »

Hi gwesterman,

Thanks, I'll give that a try.

I don't use External Commands, so it may be easier to just disable them.

Do you know if there's any way I can see what's writing to the external command file?
I've looked for a log, but can't find one.

Thanks,


Peter
proddan
Posts: 16
Joined: Mon Feb 13, 2017 8:38 am

Re: Clear External Command File?

Post by proddan »

I'm kind of making some progress, but I still have an issue.

I looked in "Unconfigured Objects" and saw two entries in there, which match the errors I'm seeing in the messages file and the entries in the command file.

I tried to remove these, and it allowed me to, but after a few minutes the two unconfigured hosts came back.

I then decided to configure these as hosts, which has cleared the errors from the command logs, but I'm now getting spammed with emails alerting me about these services, then telling me they have recovered, then telling me about them again.

I know for a fact that the storage array in the first message is no longer sending them, as it's physically been decommissioned.
I'm also pretty certain the UPS device isn't sending any messages out, as I found the device with the misconfigured SNMP trap and removed that. However, I'm still getting several messages per minute.

Is there an SNMP queue I can see on Nagios? It seems like it's submitting the same message to the command file over and over again.


Thanks,


Peter
gwesterman
Posts: 258
Joined: Wed Aug 23, 2023 11:29 am

Re: Clear External Command File?

Post by gwesterman »

Hi @proddan,

If you just need to turn off the emails from these now-configured hosts, you can disable their notifications in XI (or modify them to your liking). External commands are logged in the main log file, /usr/local/nagios/var/nagios.log, which is rotated daily into /usr/local/nagios/var/archives. I would also check the Auto Configure Settings to see whether Enable Auto Import is checked, as this may be why the unconfigured hosts keep reappearing.
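
To get a feel for how often each host/service pair is being submitted, something like the sketch below could be run over the lines of nagios.log. It assumes external commands appear in the log as "[epoch] EXTERNAL COMMAND: ..." entries; the function name and the sample lines are hypothetical, so adjust the marker if your log looks different.

```python
from collections import Counter

def count_passive_results(log_lines):
    """Count PROCESS_SERVICE_CHECK_RESULT submissions per (host, service)."""
    counts = Counter()
    for line in log_lines:
        # Assumed marker for logged external commands.
        _, sep, rest = line.partition("EXTERNAL COMMAND: ")
        if not sep:
            continue
        fields = rest.split(";", 4)
        if fields[0] == "PROCESS_SERVICE_CHECK_RESULT" and len(fields) >= 3:
            counts[(fields[1], fields[2])] += 1
    return counts

# Hypothetical log lines, modelled on the entries posted earlier in this thread.
sample_log = [
    "[1700498484] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;0;APC UPS: Passed self-test",
    "[1700498484] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;storage01;SNMP Traps;0;Health Status Array Change occurred (11020)",
    "[1700498490] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;UNKNOWN;SNMP Traps;1;APC UPS: On battery",
    "[1700498495] SERVICE ALERT: storage01;SNMP Traps;OK;HARD;1;unrelated entry",
]
counts = count_passive_results(sample_log)
for (host, service), n in counts.most_common():
    print(host, service, n)
```

Note that this only shows what was submitted and how often. The log does not record which process wrote the command to the command file.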

Thanks!