External command error: Malformed command
Posted: Tue Jan 23, 2018 2:51 am
by paolo974
Hello,
For the past 5 days, my Nagios has not been working properly.
I have 1 master and 4 slaves. The slaves send passive checks to the master.
The problem is that sometimes the slaves send:
Code: Select all
8 PROCESS_SERVICE_CHECK_RESULT;<host>;<service_description>;<service_state>;<plugin_output>
Note the 8 at the beginning; it should be [<timestamp>].
And in the nagios.log file of the master, I have:
Code: Select all
[<timestamp>] External command error: Malformed command
The slaves send passive checks by writing to the var/rw/nagios.cmd file.
Master and slaves are on the same machine.
Does anyone have an idea about what is going on?
Thank you
Re: External command error: Malformed command
Posted: Tue Jan 23, 2018 2:27 pm
by mcapra
paolo974 wrote:
Slaves are sending passive checks by writing in var/rw/nagios.cmd file
How specifically is this being done? Is there an event handler, script, cron job, NRDP, wrapper scripts, etc?
Re: External command error: Malformed command
Posted: Tue Jan 23, 2018 4:44 pm
by dwhitfield
mcapra wrote: Is there an event handler, script, cron job, NRDP, wrapper scripts, etc?
Whatever it is, can you send us the relevant configs/scripts?
Re: External command error: Malformed command
Posted: Wed Jan 24, 2018 1:29 am
by paolo974
It is an event handler.
The slaves use an event handler, and the code is:
Code: Select all
#!/bin/sh
# write a command to the Nagios command file to cause
# it to process a service check result
echocmd="/bin/echo"
CommandFile="/app/monitor/MASTER1/var/rw/nagios.cmd"
# get the current date/time in seconds since UNIX epoch
datetime=`date +%s`
# create the command line to add to the command file
cmdline="[$datetime] PROCESS_SERVICE_CHECK_RESULT;$1;$2;$3;$4"
# append the command to the end of the command file
`$echocmd $cmdline >> $CommandFile`
Example of a service definition, defined on slaves:
Code: Select all
define service {
host_name myhost
service_description CPU Usage
check_period 24x7
check_command nrpe_check_cpu
contact_groups admins,operators,viewers
notification_period 24x7
initial_state o
importance 0
check_interval 10.000000
retry_interval 2.000000
max_check_attempts 3
is_volatile 0
parallelize_check 1
active_checks_enabled 1
passive_checks_enabled 1
obsess 1
event_handler_enabled 1
low_flap_threshold 0.000000
high_flap_threshold 0.000000
flap_detection_enabled 1
flap_detection_options a
freshness_threshold 0
check_freshness 0
notification_options r,w,u,c
notifications_enabled 0
notification_interval 120.000000
first_notification_delay 0.000000
stalking_options n
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
}
Re: External command error: Malformed command
Posted: Wed Jan 24, 2018 5:49 pm
by npolovenko
@paolo974, have you taken a look inside the /app/monitor/MASTER1/var/rw/nagios.cmd file to see if the number 8 gets written out instead of the epoch stamp? The script looks fine at first glance.
Re: External command error: Malformed command
Posted: Thu Jan 25, 2018 1:20 am
by paolo974
@npolovenko, yes, the number 8 gets written out instead of the epoch stamp in the /app/monitor/MASTER1/var/rw/nagios.cmd file.
In /app/monitor/MASTER1/var/nagios.log, the number 8 gets written out too, and if I put echo $cmdline >> /tmp/debug before `$echocmd $cmdline >> $CommandFile` in the event handler script, the number 8 gets written out to the /tmp/debug file as well.
As a workaround, I replaced the line `$echocmd $cmdline >> $CommandFile` with `echo $cmdline | sed -e 's/^8 /\['$(date +%s)'\] /' >> $CommandFile`. It works, but I still don't know why the number 8 gets written.
Re: External command error: Malformed command
Posted: Thu Jan 25, 2018 10:39 am
by mcapra
It might depend on the OS and shell being used.
As a troubleshooting step, you might try doing this as a one-liner on the off chance some strange environment/shell related things are happening:
Code: Select all
/bin/echo "[$(date +%s)] PROCESS_SERVICE_CHECK_RESULT;$1;$2;$3;$4" >> /app/monitor/MASTER1/var/rw/nagios.cmd
Re: External command error: Malformed command
Posted: Thu Jan 25, 2018 2:51 pm
by tmcdonald
I'm with @mcapra on this one; I couldn't reproduce the issue on Debian or CentOS. I did notice that /bin/sh --help did not work on Debian, so running ls -l /bin/sh might be needed to tell us where it is symlinked to.
Re: External command error: Malformed command
Posted: Fri Jan 26, 2018 1:56 am
by paolo974
# /bin/sh --help
Code: Select all
GNU bash, version 4.1.2(1)-release-(x86_64-redhat-linux-gnu)
Usage: /bin/sh [GNU long option] [option] ...
/bin/sh [GNU long option] [option] script-file ...
GNU long options:
--debug
--debugger
--dump-po-strings
--dump-strings
--help
--init-file
--login
--noediting
--noprofile
--norc
--posix
--protected
--rcfile
--rpm-requires
--restricted
--verbose
--version
Shell options:
-irsD or -c command or -O shopt_option (invocation only)
-abefhkmnptuvxBCHP or -o option
Type `/bin/sh -c "help set"' for more information about shell options.
Type `/bin/sh -c help' for more information about shell builtin commands.
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.9 (Santiago)
# ls -l /bin/sh
lrwxrwxrwx 1 root root 4 Nov 23 06:36 /bin/sh -> bash
Using the following one-liner works:
/bin/echo "[$(date +%s)] PROCESS_SERVICE_CHECK_RESULT;$1;$2;$3;$4" >> /app/monitor/MASTER1/var/rw/nagios.cmd
If I use the following code for my event handler, it works, but from time to time I still see the number 8 getting written to /tmp/debug.
Event handler code:
Code: Select all
#!/bin/sh
/bin/echo "[$(date +%s)] PROCESS_SERVICE_CHECK_RESULT;$1;$2;$3;$4" >> /app/monitor/MASTER1/var/rw/nagios.cmd
## for debug purpose
echocmd="/bin/echo"
CommandFile="/app/monitor/MASTER1/var/rw/nagios.cmd"
datetime=`date +%s`
cmdline="[$datetime] PROCESS_SERVICE_CHECK_RESULT;$1;$2;$3;$4"
`$echocmd $cmdline >> /tmp/debug`
# cat /tmp/debug
Code: Select all
<ellipsed>
[1516947367] PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
[1516947367] PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
[1516947367] PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
[1516947367] PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
[1516947367] PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
[1516947367] PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
8 PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
[1516947369] PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
[1516947369] PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
[1516947369] PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
[1516947369] PROCESS_SERVICE_CHECK_RESULT; <ellipsed>
<ellipsed>
I noticed that the number 8 gets written only when the timestamp ends with 8.
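One possible explanation, which nobody in the thread confirms: because $cmdline is expanded without quotes, the shell applies pathname expansion to the word [<timestamp>] and treats it as a bracket pattern that matches any single character in the set. If a file literally named 8 exists in the handler's working directory (an assumption; the thread does not show that directory), the pattern would be replaced by 8 whenever the timestamp contains the digit 8. A minimal sketch in a scratch directory, with made-up host and service values:

```shell
#!/bin/sh
# Sketch only: the scratch directory, host name and service values are
# hypothetical and are not taken from the poster's system.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumption: a file literally named "8" sits in the working directory.
touch 8

cmdline="[1516947368] PROCESS_SERVICE_CHECK_RESULT;myhost;CPU Usage;0;OK"

# Unquoted expansion: [1516947368] is read as a bracket pattern (one character
# out of the listed digits). It matches the file "8" and is replaced by it.
unquoted=$(/bin/echo $cmdline)

# Quoted expansion: no pathname expansion, the text is passed through as-is.
quoted=$(/bin/echo "$cmdline")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

Under that assumption, the unquoted line starts with 8 while the quoted line keeps [1516947368], which would also be consistent with the quoted one-liner working. Running ls in the directory the event handler runs from would show whether such a file exists.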
Re: External command error: Malformed command
Posted: Fri Jan 26, 2018 4:47 pm
by dwhitfield
What version of Core is this? We can't really come up with any reason why this might be happening, but at least if we have a version on which we can test we can see if we can reproduce.