custom check_cpu_stats Long Output

Posted: Tue Apr 29, 2014 3:55 pm
by brandon.pal
Hi,

I am looking to customize check_cpu_stats to also include a full printout of top when I'm viewing the service check in XI.

I've added the output to check_cpu_stats, and it works when I run it from the command line. In Nagios, however, the top output is truncated because it's too long. So, my questions:

1) How can I extend the allowed output?
2) Is there a better way to do this?
3) How can I ensure the formatting looks right?

I've uploaded one attachment showing what I'm seeing now in Nagios and another showing what we'd like to see. The second one is from our current system, xymon/hobbit.

Re: custom check_cpu_stats Long Output

Posted: Tue Apr 29, 2014 4:48 pm
by abrist
How are you running the check? With nrpe or another agent?
If you can run it through nrpe from the cli and it displays in full, then we might have to look at the XI details php files. If it is truncated by nrpe, then we may be able to recompile nrpe for larger outputs.
Can you post the top plugin script as well? I am curious to try this on one of my test boxes.

Re: custom check_cpu_stats Long Output

Posted: Tue Apr 29, 2014 5:19 pm
by Box293
From the Service Status Detail page, on the Advanced tab, click on the link "See this service in Nagios Core".

Is the Status Information and Performance data shown here truncated?

Re: custom check_cpu_stats Long Output

Posted: Tue Apr 29, 2014 10:23 pm
by brandon.pal
Box293 wrote:From the Service Status Detail page, on the Advanced tab, click on the link "See this service in Nagios Core".
Is the Status Information and Performance data shown here truncated?
- Yes it is truncated here.
abrist wrote:How are you running the check? With nrpe or another agent?
NRPE
abrist wrote:If you can run it through nrpe from the cli and it displays in full, then we might have to look at the XI details php files. If it is truncated by nrpe, then we may be able to recompile nrpe for larger outputs.
When I run it from the Nagios server, I get:

Code:

/usr/local/nagios/libexec/check_nrpe -H 10.91.101.25 -c check_cpu_stats
CPU STATISTICS OK: user=0.40% system=0.00% iowait=0.00% idle=99.60% | user=0.40% system=0.00% iowait=0.00%;30;100 idle=99.60%
top - 23:23:38 up 8 days, 12:06,  1 user,  load average: 0.00, 0.00, 0.00
Tasks: 108 total,   1 running, 107 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  24608492k total,  1355024k used, 23253468k free,   177276k buffers
Swap:  8191992k total,        0k used,  8191992k free,   787404k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    1 root      20   0 19364 1552 1232 S  0.0  0.0   0:01.36 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.40 ksoftirqd/0
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    6 root      RT   0     0    0    0 S  0
When I run the script on the server itself:

Code:

CPU STATISTICS OK: user=0.00% system=0.00% iowait=0.00% idle=100.00% | user=0.00% system=0.00% iowait=0.00%;30;100 idle=100.00%
top - 23:20:55 up 8 days, 12:04,  1 user,  load average: 0.00, 0.00, 0.00
Tasks: 105 total,   1 running, 104 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  24608492k total,  1352528k used, 23255964k free,   177220k buffers
Swap:  8191992k total,        0k used,  8191992k free,   787376k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    1 root      20   0 19364 1552 1232 S  0.0  0.0   0:01.36 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.40 ksoftirqd/0
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    6 root      RT   0     0    0    0 S  0.0  0.0   0:00.89 watchdog/0
    7 root      20   0     0    0    0 S  0.0  0.0   0:29.83 events/0
    8 root      20   0     0    0    0 S  0.0  0.0   0:00.00 cgroup
    9 root      20   0     0    0    0 S  0.0  0.0   0:00.00 khelper
   10 root      20   0     0    0    0 S  0.0  0.0   0:00.00 netns
   11 root      20   0     0    0    0 S  0.0  0.0   0:00.00 async/mgr
   12 root      20   0     0    0    0 S  0.0  0.0   0:00.00 pm
   13 root      20   0     0    0    0 S  0.0  0.0   0:02.30 sync_supers
   14 root      20   0     0    0    0 S  0.0  0.0   0:02.98 bdi-default
   15 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kintegrityd/0
   16 root      20   0     0    0    0 S  0.0  0.0   0:02.05 kblockd/0
   17 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kacpid
   18 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kacpi_notify
   19 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kacpi_hotplug
   20 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ata_aux
   21 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ata_sff/0
   22 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ksuspend_usbd
   23 root      20   0     0    0    0 S  0.0  0.0   0:00.00 khubd
   24 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kseriod
   25 root      20   0     0    0    0 S  0.0  0.0   0:00.00 md/0
   26 root      20   0     0    0    0 S  0.0  0.0   0:00.00 md_misc/0
   27 root      20   0     0    0    0 S  0.0  0.0   0:00.00 linkwatch
   28 root      20   0     0    0    0 S  0.0  0.0   0:00.15 khungtaskd
   29 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kswapd0
   30 root      25   5     0    0    0 S  0.0  0.0   0:00.00 ksmd
   31 root      39  19     0    0    0 S  0.0  0.0   0:05.05 khugepaged
   32 root      20   0     0    0    0 S  0.0  0.0   0:00.00 aio/0
   33 root      20   0     0    0    0 S  0.0  0.0   0:00.00 crypto/0
   38 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthrotld/0
   39 root      20   0     0    0    0 S  0.0  0.0   0:00.00 pciehpd
   41 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kpsmoused
   42 root      20   0     0    0    0 S  0.0  0.0   0:00.00 usbhid_resumer
   73 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kstriped
  144 root      20   0     0    0    0 S  0.0  0.0   0:14.46 mpt_poll_0
  145 root      20   0     0    0    0 S  0.0  0.0   0:00.00 mpt/0
  146 root      20   0     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_0
  150 root      20   0     0    0    0 S  0.0  0.0   0:00.01 scsi_eh_1
  151 root      20   0     0    0    0 S  0.0  0.0   0:00.02 scsi_eh_2
  297 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kdmflush
  299 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kdmflush
  316 root      20   0     0    0    0 S  0.0  0.0   0:03.68 jbd2/dm-0-8
  317 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ext4-dio-unwrit
  400 root      16  -4 10712  796  320 S  0.0  0.0   0:00.46 udevd
  585 root      20   0     0    0    0 S  0.0  0.0   0:09.26 vmmemctl
  691 root      18  -2 10712  836  348 S  0.0  0.0   0:00.00 udevd
  696 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kdmflush
  735 root      20   0     0    0    0 S  0.0  0.0   0:00.00 jbd2/sda1-8
  736 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ext4-dio-unwrit
  737 root      20   0     0    0    0 S  0.0  0.0   0:00.09 jbd2/dm-2-8
  738 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ext4-dio-unwrit
  779 root      20   0     0    0    0 S  0.0  0.0   0:00.13 kauditd
 1159 root      20   0  184m 4400 3556 S  0.0  0.0   5:48.19 vmtoolsd
 1269 root      16  -4 27640  836  564 S  0.0  0.0   0:01.18 auditd
 1336 rpc       20   0 18976  900  652 S  0.0  0.0   0:00.76 rpcbind
 1354 rpcuser   20   0 23348 1336  888 S  0.0  0.0   0:00.03 rpc.statd
 1464 dbus      20   0 21404  920  636 S  0.0  0.0   0:00.04 dbus-daemon
 1480 root      20   0  184m 3332 2448 S  0.0  0.0   0:00.00 cupsd
 1497 root      20   0     0    0    0 S  0.0  0.0   0:03.21 flush-253:0
 1507 root      20   0     0    0    0 S  0.0  0.0   0:00.00 rpciod/0
 1509 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kslowd000
 1510 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kslowd001
 1511 root      20   0     0    0    0 S  0.0  0.0   0:00.00 nfsiod
 1512 root      20   0     0    0    0 S  0.0  0.0   0:00.00 nfsv4.0-svc
 1540 root      20   0  4080  632  524 S  0.0  0.0   0:00.00 acpid
 1549 haldaemo  20   0 37968 3836 2828 S  0.0  0.0   0:03.34 hald
 1550 root      20   0 20328 1172  976 S  0.0  0.0   0:00.00 hald-runner
 1580 root      20   0 22448 1092  932 S  0.0  0.0   0:00.00 hald-addon-inpu
 1594 haldaemo  20   0 17936 1036  892 S  0.0  0.0   0:00.00 hald-addon-acpi
 1612 root      20   0  376m 1776 1288 S  0.0  0.0   0:07.23 automount
 1628 root      20   0  6280  304  180 S  0.0  0.0   0:00.00 mcelog
 1640 root      20   0 66608 1236  520 S  0.0  0.0   0:01.28 sshd
 1648 ntp       20   0 26496 1936 1360 S  0.0  0.0   0:00.72 ntpd
 1724 root      20   0 81280 3420 2516 S  0.0  0.0   0:02.02 master
 1733 postfix   20   0 81532 3428 2544 S  0.0  0.0   0:00.33 qmgr
 1748 root      20   0  107m  900  768 S  0.0  0.0   0:00.00 abrtd
 1756 root      20   0  176m 9264 2228 S  0.0  0.0   0:20.36 httpd
 1772 root      20   0  114m 1248  644 S  0.0  0.0   0:02.41 crond
 1830 root      20   0 21540  476  296 S  0.0  0.0   0:00.00 atd
 1878 root      20   0 62340  596  240 S  0.0  0.0   0:00.67 certmonger
 1908 root      20   0  4064  536  464 S  0.0  0.0   0:00.00 mingetty
 1912 root      20   0  4064  532  464 S  0.0  0.0   0:00.00 mingetty
 1916 root      20   0  4064  532  464 S  0.0  0.0   0:00.00 mingetty
 1920 root      20   0  4064  536  464 S  0.0  0.0   0:00.00 mingetty
 1924 root      20   0  4064  536  464 S  0.0  0.0   0:00.00 mingetty
 1928 root      20   0  4064  536  464 S  0.0  0.0   0:00.00 mingetty
 2294 apache    20   0  176m 7684  624 S  0.0  0.0   0:00.00 httpd
 2296 apache    20   0  176m 7684  624 S  0.0  0.0   0:00.00 httpd
 2297 apache    20   0  176m 7684  624 S  0.0  0.0   0:00.00 httpd
 2298 apache    20   0  176m 7684  624 S  0.0  0.0   0:00.00 httpd
 2299 apache    20   0  176m 7684  624 S  0.0  0.0   0:00.00 httpd
 2300 apache    20   0  176m 7684  624 S  0.0  0.0   0:00.00 httpd
 2301 apache    20   0  176m 7684  624 S  0.0  0.0   0:00.00 httpd
 2302 apache    20   0  176m 7684  624 S  0.0  0.0   0:00.00 httpd
 3637 postfix   20   0 81360 3380 2508 S  0.0  0.0   0:00.00 pickup
 4344 root      20   0 98.0m 4044 3064 S  0.0  0.0   0:00.12 sshd
 4346 root      20   0  105m 1928 1456 S  0.0  0.0   0:00.04 bash
 4767 root      20   0  103m 1224 1060 S  0.0  0.0   0:00.00 check_cpu_stats
 4768 root      20   0 15032 1084  824 R  0.0  0.0   0:00.00 top
19737 root      20   0  185m 1784 1056 S  0.0  0.0   0:00.56 rsyslogd
27242 nrpe      20   0 41328 1244  860 S  0.0  0.0   0:01.88 nrpe
abrist wrote:Can you post the top plugin script as well? I am curious to try this on one of my test boxes.
To check_cpu_stats I've added the following variable declaration at the top:

Code:

top=`top -b -n 1`
Also I've changed the output to the following:

Code:

echo "$label user=${CPU_USER}% system=${CPU_SYSTEM}% iowait=${CPU_IOWAIT}% idle=${CPU_IDLE}% | user=${CPU_USER2}% system=${CPU_SYSTEM2}% iowait=${CPU_IOWAIT2}%;$WARNING_THRESHOLD;$CRITICAL_THRESHOLD idle=${CPU_IDLE2}%"
echo "$top"
exit $result
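
For anyone following along, here is a minimal sketch of how the change described above could be structured so that the status line (with perfdata) always comes first and the long output is capped at a fixed number of lines. The helper name emit_check_output and the 20-line default are assumptions for illustration; they are not part of the original check_cpu_stats.

```shell
#!/bin/bash
# Hypothetical helper (not in the original plugin): print the one-line
# status with perfdata first, then at most max_lines lines of long
# output read from stdin.
emit_check_output() {
    local status_line=$1
    local max_lines=${2:-20}   # assumed default, tune to taste

    # Nagios treats the first line as the status text (plus perfdata after "|").
    printf '%s\n' "$status_line"

    # Everything after the first line is treated as long output.
    head -n "$max_lines"
}

# Possible use inside the plugin, in place of the two echo lines:
#   top -b -n 1 | emit_check_output "$label user=${CPU_USER}% ..." 20
#   exit "$result"
```

Capping the number of lines will not lift any transport limit, but it keeps the most useful part of top (the summary and the first few processes) within whatever does get through.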

Re: custom check_cpu_stats Long Output

Posted: Wed Apr 30, 2014 11:25 am
by sreinhardt
I'm not sure about your system, but top output on any of mine is around 15 KB or more. NRPE has a hard limit of 4096 bytes for any service check, covering the standard and long service output combined. While your first image did not seem near that limit, is the page that Box293 mentioned longer than the main service details output, and possibly closer to that limit?
Box293 wrote:From the Service Status Detail page, on the Advanced tab, click on the link "See this service in Nagios Core".

Is the Status Information and Performance data shown here truncated?
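
To see how close a check actually is to a given limit, it may help to measure the output size directly. The sketch below is only an illustration: the function name fits_in_limit is made up here, and the limit you pass in is whatever applies to your build (4096 is the figure quoted above; the effective value may differ between NRPE versions).

```shell
#!/bin/bash
# Hypothetical helper: report the size of the data on stdin and succeed
# only if it is at most the given number of bytes.
fits_in_limit() {
    local limit=$1
    local size

    # wc -c may pad its output with spaces on some systems, so strip them.
    size=$(wc -c | tr -d '[:space:]')

    printf '%s bytes (limit %s)\n' "$size" "$limit"
    [ "$size" -le "$limit" ]
}

# Possible use on the monitored host and on the Nagios server:
#   top -b -n 1 | fits_in_limit 4096
#   /usr/local/nagios/libexec/check_nrpe -H 10.91.101.25 -c check_cpu_stats | fits_in_limit 4096
```

Comparing the byte count of the local run with the byte count received through check_nrpe shows where the output is being cut.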

Re: custom check_cpu_stats Long Output

Posted: Wed Apr 30, 2014 4:28 pm
by Box293
sreinhardt wrote:nrpe has a hard limit of 4096 bytes passed for any service check including total standard and long service output.
I came across this issue while developing a plugin recently and it was because of the nrpe limit.

I decided to stop using nrpe and instead I used check_by_ssh to perform the remote checks.

check_by_ssh does not have a limit on how much data it will receive back and hence it works great for plugins with a lot of output.

As a bonus, I found that check_by_ssh was less complicated and quicker to get up and running, as you don't need to install an agent on the remote servers.
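
A rough sketch of what the equivalent check_by_ssh call might look like is below. It only builds and prints the command line so it can be reviewed before running. The function name build_ssh_check, the nagios login name, the 30-second timeout and the remote plugin path are all assumptions; check_by_ssh also needs key-based (passwordless) SSH from the Nagios server to the remote host.

```shell
#!/bin/bash
# Hypothetical wrapper: print a check_by_ssh command line for review.
# -H = remote host, -l = SSH login name, -t = timeout in seconds,
# -C = command to run on the remote host.
build_ssh_check() {
    local host=$1
    local user=$2
    local remote_cmd=$3

    printf '%s -H %s -l %s -t 30 -C "%s"\n' \
        /usr/local/nagios/libexec/check_by_ssh "$host" "$user" "$remote_cmd"
}

# Example (the remote plugin path is assumed to match the server's layout):
#   build_ssh_check 10.91.101.25 nagios /usr/local/nagios/libexec/check_cpu_stats
```

Once the printed command returns the full top output when run by hand as the nagios user, the same arguments can go into a command definition.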

Re: custom check_cpu_stats Long Output

Posted: Thu May 01, 2014 10:59 am
by brandon.pal
Box293 wrote:From the Service Status Detail page, on the Advanced tab, click on the link "See this service in Nagios Core".
Is the Status Information and Performance data shown here truncated?

I've attached the image. Looks the same to me.


Box293 wrote:I decided to stop using nrpe and instead I used check_by_ssh to perform the remote checks.
I will have a look at this.

Re: custom check_cpu_stats Long Output

Posted: Thu May 01, 2014 12:50 pm
by scottwilkerson
Box293 wrote: I decided to stop using nrpe and instead I used check_by_ssh to perform the remote checks.
I would concur, with one caveat: in XI you would need to do some extra massaging to get the output you are looking for.
1. Enable HTML plugin output at Admin -> Manage System Config -> Allow HTML Tags in Host/Service Status
2. Wrap your long_output in <pre></pre> tags
3. Modify the nagios database to accept much longer long_output

Code:

echo "ALTER TABLE nagios_servicestatus MODIFY long_output VARCHAR(65536);"|mysql -pnagiosxi nagios
echo "ALTER TABLE nagios_servicechecks MODIFY long_output VARCHAR(65536);"|mysql -pnagiosxi nagios
echo "ALTER TABLE nagios_hoststatus MODIFY long_output VARCHAR(65536);"|mysql -pnagiosxi nagios
echo "ALTER TABLE nagios_hostchecks MODIFY long_output VARCHAR(65536);"|mysql -pnagiosxi nagios
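
For step 2, one way the wrapping could be done inside the plugin is sketched below. The helper name wrap_pre is an assumption, and the escaping of &, < and > is an extra precaution added here (not something the steps above require) so that stray characters in the long output are not interpreted as HTML once HTML tags are allowed.

```shell
#!/bin/bash
# Hypothetical helper: wrap the long output read from stdin in <pre> tags,
# escaping HTML special characters so they display literally.
wrap_pre() {
    printf '<pre>\n'
    # Escape & first so the entities produced below are not double-escaped.
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
    printf '</pre>\n'
}

# Possible use in the plugin, in place of: echo "$top"
#   echo "$top" | wrap_pre
```

Only the long output should be wrapped; the first line with the status text and perfdata is best left as plain text.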

Re: custom check_cpu_stats Long Output

Posted: Fri May 02, 2014 10:52 am
by brandon.pal
OK, I activated the <pre> tag and that's working; the text formatting has improved.

I altered the tables but am still only getting the same amount via NRPE. I've yet to set up SSH. Can I alter NRPE to accept more?

Re: custom check_cpu_stats Long Output

Posted: Fri May 02, 2014 10:57 am
by abrist
brandon.pal wrote: Can I alter NRPE to accept more?
Not easily. Both the agent and the check_nrpe plugin would need to be recompiled with a rather old (and possibly unstable) patch from years ago. There has been some discussion internally concerning adding this functionality to nrpe in an official way, but no action has been taken as of yet. I would suggest looking at check_by_ssh in the meantime.