Mail of check_proc does not contain full output

Posted: Sat Jan 20, 2018 11:29 am
by hvdbrand
Dear all,

I am using Nagios Core 4.3.4 on CentOS 7.
I have set up check_procs to report processes that use a lot of virtual memory.
The command that I use is $USER1$/check_procs -w 20000000 -c 10000000 -m VSZ -vv
I added the verbose option (-vv) so that all of this information ends up in the e-mail message.

The output that I see in the Nagios web interface looks like this:
CMD: /usr/bin/ps -eo 's uid pid ppid vsz rss pcpu etime comm args'
Matched: uid=0 vsz=194016 rss=7132 pid=1 ppid=0 pcpu=0.00 stat=S etime=4-14:14:19 prog=systemd args=/usr/lib/systemd/systemd --switched-root --system --deserialize 21
Matched: uid=0 vsz=0 rss=0 pid=2 ppid=0 pcpu=0.00 stat=S etime=4-14:14:19 prog=kthreadd args=[kthreadd]
Matched: uid=0 vsz=0 rss=0 pid=3 ppid=2 pcpu=0.00 stat=S etime=4-14:14:19 prog=0 args=[ksoftirqd/0]
Matched: uid=0 vsz=0 rss=0 pid=5 ppid=2 pcpu=0.00 stat=S etime=4-14:14:19 prog=0:0H args=[kworker/0:0H]
Matched: uid=0 vsz=0 rss=0 pid=7 ppid=2 pcpu=0.00 stat=S etime=4-14:14:19 prog=0 args=[migration/0]
Matched: uid=0 vsz=0 rss=0 pid=8 ppid=2 pcpu=0.00 stat=S etime=4-14:14:19 prog=rcu_bh args=[rcu_bh]
Matched: uid=0 vsz=0 rss=0 pid=9 ppid=2 pcpu=0.00 stat=S etime=4-14:14:19 prog=rcu_sched args=[rcu_sched]
Matched: uid=0 vsz=0 rss=0 pid=10 ppid=2 pcpu=0.00 stat=S etime=4-14:14:19 prog=0 args=[watchdog/0]
Matched: uid=0 vsz=0 rss=0 pid=11 ppid=2 pcpu=0.00 stat=S etime=4-14:14:19 prog=1 args=[watchdog/1]
...
Truncated
...
Matched: uid=0 vsz=187676 rss=2432 pid=28604 ppid=1 pcpu=0.00 stat=S etime=05:51:53 prog=su args=su -s /bin/sh -c umask $0; exec "$1" "$@" emby -- 002 env MAGICK_HOME=/usr/lib/emby-server MAGICK_CODER_FILTER_PATH=/usr/lib/emby-server/lib64/EmbyMagick-6.9.6/modules-Q8/filters MAGICK_CODER_MODULE_PATH=/usr/lib/emby-server/lib64/EmbyMagick-6.9.6/modules-Q8/coders MONO_THREADS_PER_CPU=500 max-heap-size=96m,soft-heap-limit=64m,nursery-size=4m LD_LIBRARY_PATH=/usr/lib/emby-server/lib64 /usr/bin/mono-sgen --optimize=all /usr/lib/emby-server/bin/MediaBrowser.Server.Mono.exe -programdata /var/lib/emby-server -restartpath /usr/lib/emby-server/restart.sh
Matched: uid=997 vsz=1782576 rss=132164 pid=28623 ppid=28604 pcpu=0.00 stat=S etime=05:51:53 prog=mono-sgen args=/usr/bin/mono-sgen --optimize=all /usr/lib/emby-server/bin/MediaBrowser.Server.Mono.exe -programdata /var/lib/emby-server -restartpath /usr/lib/emby-server/restart.sh
Matched: uid=99 vsz=2453252 rss=178500 pid=30516 ppid=1 pcpu=0.70 stat=S etime=05:04:36 prog=ntopng args=/usr/local/bin/ntopng /run/ntopng.conf
Matched: uid=0 vsz=0 rss=0 pid=31658 ppid=2 pcpu=0.00 stat=S etime=04:37:00 prog=3:1 args=[kworker/3:1]
VSZ OK: 215 processes
However, the e-mail notification that I get contains the following:
***** Nagios *****

Notification Type: RECOVERY

Service: Proc Virt
Host: Linux and Nagios Server
Address: 127.0.0.1
State: OK

Date/Time: Tue Jan 16 21:59:47 CET 2018

Additional Info:

CMD: /usr/bin/ps -eo s uid pid ppid vsz rss pcpu etime comm args
So a large part of the output is missing from the e-mail.
I have verified that I do get the full output when I mail it myself using the mail command-line program.

So it seems to me that Nagios does not send the full information.
Does anyone have a clue what is happening, or how I can debug this issue?

Best wishes,
Hugo

Re: Mail of check_proc does not contain full output

Posted: Mon Jan 22, 2018 10:55 am
by mcapra
TL;DR -- the default Nagios config objects use $SERVICEOUTPUT$ for notifications, which only contains the first line of the plugin's STDOUT.

There are two Nagios macros responsible for service status information: $SERVICEOUTPUT$ (the first line of plugin output) and $LONGSERVICEOUTPUT$ (the remaining lines).

I'm betting that if you check the command definition for the contact's service/host_notification_commands, it's using $SERVICEOUTPUT$.

Here's the generic-contact template which uses notify-service-by-email as the service_notification_commands:

define contact{
        name                            generic-contact         ; The name of this contact template
        service_notification_period     24x7                    ; service notifications can be sent anytime
        host_notification_period        24x7                    ; host notifications can be sent anytime
        service_notification_options    w,u,c,r,f,s             ; send notifications for all service states, flapping events, and scheduled downtime events
        host_notification_options       d,u,r,f,s               ; send notifications for all host states, flapping events, and scheduled downtime events
        service_notification_commands   notify-service-by-email ; send service notifications via email
        host_notification_commands      notify-host-by-email    ; send host notifications via email
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
        }
And here's the corresponding notify-service-by-email command definition:

define command{
        command_name    notify-service-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
        }
Which indeed uses $SERVICEOUTPUT$.
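A minimal sketch of a fix, assuming the stock commands.cfg shipped with Nagios Core 4.x: append $LONGSERVICEOUTPUT$ to the message body so the lines after the first one are included as well. The existing printf "%b" format interprets backslash escapes, which should turn the escaped newlines Nagios keeps in the long output back into real line breaks. The command name notify-service-by-email-long is hypothetical; you could just as well edit the existing notify-service-by-email in place.

```
# Hypothetical variant of notify-service-by-email that also includes
# the remaining lines of plugin output via $LONGSERVICEOUTPUT$.
define command{
        command_name    notify-service-by-email-long
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n$LONGSERVICEOUTPUT$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
        }
```

To use it, set service_notification_commands to notify-service-by-email-long in the contact (or in a copy of the generic-contact template) and restart Nagios. Also note that the characters listed in illegal_macro_output_chars in nagios.cfg (single and double quotes among them in the sample config) are stripped from output macros, which appears to be why the quotes around the ps format string are missing from the e-mail above.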

Re: Mail of check_proc does not contain full output

Posted: Mon Jan 22, 2018 11:30 am
by dwhitfield
Thanks @mcapra!

OP, let us know if you need any additional information!