custom check_cpu_stats Long Output
Posted: Tue Apr 29, 2014 3:55 pm
by brandon.pal
Hi,
I am looking to customize check_cpu_stats to also include a full printout of top when I'm viewing the service check in XI.
I've added the output to check_cpu_stats, and when I run it from the command line it works. In Nagios, however, the top output is truncated because it's too long. So, my questions:
1) How can I extend the allowed output?
2) Is there a better way to do this?
3) How can I ensure the formatting looks right?
I've uploaded two attachments: one showing what I'm seeing now in Nagios, and one showing what we'd like to see. The second is taken from our current system, xymon/hobbit.
Re: custom check_cpu_stats Long Output
Posted: Tue Apr 29, 2014 4:48 pm
by abrist
How are you running the check? With nrpe or another agent?
If you can run it through nrpe from the cli and it displays in full, then we might have to look at the XI details php files. If it is truncated by nrpe, then we may be able to recompile nrpe for larger outputs.
Can you post the top plugin script as well? I am curious to try this on one of my test boxes.
Re: custom check_cpu_stats Long Output
Posted: Tue Apr 29, 2014 5:19 pm
by Box293
From the Service Status Detail page, on the Advanced tab, click on the link "See this service in Nagios Core".
Is the Status Information and Performance data shown here truncated?
Re: custom check_cpu_stats Long Output
Posted: Tue Apr 29, 2014 10:23 pm
by brandon.pal
Box293 wrote:From the Service Status Detail page, on the Advanced tab, click on the link "See this service in Nagios Core".
Is the Status Information and Performance data shown here truncated?
- Yes it is truncated here.
abrist wrote:How are you running the check? With nrpe or another agent?
NRPE
abrist wrote:If you can run it through nrpe from the cli and it displays in full, then we might have to look at the XI details php files. If it is truncated by nrpe, then we may be able to recompile nrpe for larger outputs.
When I run it from the Nagios server I get:
Code: Select all
/usr/local/nagios/libexec/check_nrpe -H 10.91.101.25 -c check_cpu_stats
CPU STATISTICS OK: user=0.40% system=0.00% iowait=0.00% idle=99.60% | user=0.40% system=0.00% iowait=0.00%;30;100 idle=99.60%
top - 23:23:38 up 8 days, 12:06, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 108 total, 1 running, 107 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 24608492k total, 1355024k used, 23253468k free, 177276k buffers
Swap: 8191992k total, 0k used, 8191992k free, 787404k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 19364 1552 1232 S 0.0 0.0 0:01.36 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 0:00.40 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
6 root RT 0 0 0 0 S 0
When I run the script on the server itself:
Code: Select all
CPU STATISTICS OK: user=0.00% system=0.00% iowait=0.00% idle=100.00% | user=0.00% system=0.00% iowait=0.00%;30;100 idle=100.00%
top - 23:20:55 up 8 days, 12:04, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 105 total, 1 running, 104 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 24608492k total, 1352528k used, 23255964k free, 177220k buffers
Swap: 8191992k total, 0k used, 8191992k free, 787376k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 19364 1552 1232 S 0.0 0.0 0:01.36 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 0:00.40 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
6 root RT 0 0 0 0 S 0.0 0.0 0:00.89 watchdog/0
7 root 20 0 0 0 0 S 0.0 0.0 0:29.83 events/0
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cgroup
9 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khelper
10 root 20 0 0 0 0 S 0.0 0.0 0:00.00 netns
11 root 20 0 0 0 0 S 0.0 0.0 0:00.00 async/mgr
12 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pm
13 root 20 0 0 0 0 S 0.0 0.0 0:02.30 sync_supers
14 root 20 0 0 0 0 S 0.0 0.0 0:02.98 bdi-default
15 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kintegrityd/0
16 root 20 0 0 0 0 S 0.0 0.0 0:02.05 kblockd/0
17 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kacpid
18 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kacpi_notify
19 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kacpi_hotplug
20 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ata_aux
21 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ata_sff/0
22 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksuspend_usbd
23 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khubd
24 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kseriod
25 root 20 0 0 0 0 S 0.0 0.0 0:00.00 md/0
26 root 20 0 0 0 0 S 0.0 0.0 0:00.00 md_misc/0
27 root 20 0 0 0 0 S 0.0 0.0 0:00.00 linkwatch
28 root 20 0 0 0 0 S 0.0 0.0 0:00.15 khungtaskd
29 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kswapd0
30 root 25 5 0 0 0 S 0.0 0.0 0:00.00 ksmd
31 root 39 19 0 0 0 S 0.0 0.0 0:05.05 khugepaged
32 root 20 0 0 0 0 S 0.0 0.0 0:00.00 aio/0
33 root 20 0 0 0 0 S 0.0 0.0 0:00.00 crypto/0
38 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthrotld/0
39 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pciehpd
41 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kpsmoused
42 root 20 0 0 0 0 S 0.0 0.0 0:00.00 usbhid_resumer
73 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kstriped
144 root 20 0 0 0 0 S 0.0 0.0 0:14.46 mpt_poll_0
145 root 20 0 0 0 0 S 0.0 0.0 0:00.00 mpt/0
146 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_0
150 root 20 0 0 0 0 S 0.0 0.0 0:00.01 scsi_eh_1
151 root 20 0 0 0 0 S 0.0 0.0 0:00.02 scsi_eh_2
297 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kdmflush
299 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kdmflush
316 root 20 0 0 0 0 S 0.0 0.0 0:03.68 jbd2/dm-0-8
317 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ext4-dio-unwrit
400 root 16 -4 10712 796 320 S 0.0 0.0 0:00.46 udevd
585 root 20 0 0 0 0 S 0.0 0.0 0:09.26 vmmemctl
691 root 18 -2 10712 836 348 S 0.0 0.0 0:00.00 udevd
696 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kdmflush
735 root 20 0 0 0 0 S 0.0 0.0 0:00.00 jbd2/sda1-8
736 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ext4-dio-unwrit
737 root 20 0 0 0 0 S 0.0 0.0 0:00.09 jbd2/dm-2-8
738 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ext4-dio-unwrit
779 root 20 0 0 0 0 S 0.0 0.0 0:00.13 kauditd
1159 root 20 0 184m 4400 3556 S 0.0 0.0 5:48.19 vmtoolsd
1269 root 16 -4 27640 836 564 S 0.0 0.0 0:01.18 auditd
1336 rpc 20 0 18976 900 652 S 0.0 0.0 0:00.76 rpcbind
1354 rpcuser 20 0 23348 1336 888 S 0.0 0.0 0:00.03 rpc.statd
1464 dbus 20 0 21404 920 636 S 0.0 0.0 0:00.04 dbus-daemon
1480 root 20 0 184m 3332 2448 S 0.0 0.0 0:00.00 cupsd
1497 root 20 0 0 0 0 S 0.0 0.0 0:03.21 flush-253:0
1507 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rpciod/0
1509 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kslowd000
1510 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kslowd001
1511 root 20 0 0 0 0 S 0.0 0.0 0:00.00 nfsiod
1512 root 20 0 0 0 0 S 0.0 0.0 0:00.00 nfsv4.0-svc
1540 root 20 0 4080 632 524 S 0.0 0.0 0:00.00 acpid
1549 haldaemo 20 0 37968 3836 2828 S 0.0 0.0 0:03.34 hald
1550 root 20 0 20328 1172 976 S 0.0 0.0 0:00.00 hald-runner
1580 root 20 0 22448 1092 932 S 0.0 0.0 0:00.00 hald-addon-inpu
1594 haldaemo 20 0 17936 1036 892 S 0.0 0.0 0:00.00 hald-addon-acpi
1612 root 20 0 376m 1776 1288 S 0.0 0.0 0:07.23 automount
1628 root 20 0 6280 304 180 S 0.0 0.0 0:00.00 mcelog
1640 root 20 0 66608 1236 520 S 0.0 0.0 0:01.28 sshd
1648 ntp 20 0 26496 1936 1360 S 0.0 0.0 0:00.72 ntpd
1724 root 20 0 81280 3420 2516 S 0.0 0.0 0:02.02 master
1733 postfix 20 0 81532 3428 2544 S 0.0 0.0 0:00.33 qmgr
1748 root 20 0 107m 900 768 S 0.0 0.0 0:00.00 abrtd
1756 root 20 0 176m 9264 2228 S 0.0 0.0 0:20.36 httpd
1772 root 20 0 114m 1248 644 S 0.0 0.0 0:02.41 crond
1830 root 20 0 21540 476 296 S 0.0 0.0 0:00.00 atd
1878 root 20 0 62340 596 240 S 0.0 0.0 0:00.67 certmonger
1908 root 20 0 4064 536 464 S 0.0 0.0 0:00.00 mingetty
1912 root 20 0 4064 532 464 S 0.0 0.0 0:00.00 mingetty
1916 root 20 0 4064 532 464 S 0.0 0.0 0:00.00 mingetty
1920 root 20 0 4064 536 464 S 0.0 0.0 0:00.00 mingetty
1924 root 20 0 4064 536 464 S 0.0 0.0 0:00.00 mingetty
1928 root 20 0 4064 536 464 S 0.0 0.0 0:00.00 mingetty
2294 apache 20 0 176m 7684 624 S 0.0 0.0 0:00.00 httpd
2296 apache 20 0 176m 7684 624 S 0.0 0.0 0:00.00 httpd
2297 apache 20 0 176m 7684 624 S 0.0 0.0 0:00.00 httpd
2298 apache 20 0 176m 7684 624 S 0.0 0.0 0:00.00 httpd
2299 apache 20 0 176m 7684 624 S 0.0 0.0 0:00.00 httpd
2300 apache 20 0 176m 7684 624 S 0.0 0.0 0:00.00 httpd
2301 apache 20 0 176m 7684 624 S 0.0 0.0 0:00.00 httpd
2302 apache 20 0 176m 7684 624 S 0.0 0.0 0:00.00 httpd
3637 postfix 20 0 81360 3380 2508 S 0.0 0.0 0:00.00 pickup
4344 root 20 0 98.0m 4044 3064 S 0.0 0.0 0:00.12 sshd
4346 root 20 0 105m 1928 1456 S 0.0 0.0 0:00.04 bash
4767 root 20 0 103m 1224 1060 S 0.0 0.0 0:00.00 check_cpu_stats
4768 root 20 0 15032 1084 824 R 0.0 0.0 0:00.00 top
19737 root 20 0 185m 1784 1056 S 0.0 0.0 0:00.56 rsyslogd
27242 nrpe 20 0 41328 1244 860 S 0.0 0.0 0:01.88 nrpe
abrist wrote:Can you post the top plugin script as well? I am curious to try this on one of my test boxes.
To check_cpu_stats I've added the following variable declaration at the top:
Also I've changed the output to the following:
Code: Select all
echo "$label user=${CPU_USER}% system=${CPU_SYSTEM}% iowait=${CPU_IOWAIT}% idle=${CPU_IDLE}% | user=${CPU_USER2}% system=${CPU_SYSTEM2}% iowait=${CPU_IOWAIT2}%;$WARNING_THRESHOLD;$CRITICAL_THRESHOLD idle=${CPU_IDLE2}%"
echo "$top"
exit $result
Re: custom check_cpu_stats Long Output
Posted: Wed Apr 30, 2014 11:25 am
by sreinhardt
I'm not sure about your system, but top output on any of mine is around 15 KB or more, and nrpe has a hard limit of 4096 bytes passed for any service check, including the combined standard and long service output. While your first image did not seem near that limit, is the page that Box293 mentioned longer than the main service details output, and possibly closer to that limit?
From the Service Status Detail page, on the Advanced tab, click on the link "See this service in Nagios Core".
Is the Status Information and Performance data shown here truncated?
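One way to see how close a plugin's output comes to a transport limit is to count its bytes before involving NRPE at all. The following is only a sketch: check_output_size is a hypothetical helper, and the 4096-byte default is simply the figure quoted in this post. The real limit depends on how nrpe and check_nrpe were compiled, so pass a different value as the second argument if yours differs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios or NRPE): report whether a block
# of plugin output fits within a byte limit.
# $1 = the plugin output, $2 = limit in bytes.
# The 4096 default is the figure quoted in this thread and is an assumption;
# the actual limit depends on how nrpe was built.
check_output_size() {
    limit="${2:-4096}"
    # Count bytes without adding a trailing newline; strip padding that
    # some wc implementations print.
    bytes=$(printf '%s' "$1" | wc -c | tr -d ' ')
    if [ "$bytes" -gt "$limit" ]; then
        echo "TRUNCATED: $bytes bytes exceeds limit of $limit"
        return 1
    fi
    echo "OK: $bytes bytes fits within limit of $limit"
    return 0
}
```

On the monitored host this could be called as check_output_size "$(./check_cpu_stats)", assuming the plugin is in the current directory.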
Re: custom check_cpu_stats Long Output
Posted: Wed Apr 30, 2014 4:28 pm
by Box293
sreinhardt wrote:nrpe has a hard limit of 4096 bytes passed for any service check including total standard and long service output.
I came across this issue while developing a plugin recently and it was because of the nrpe limit.
I decided to stop using nrpe and instead I used check_by_ssh to perform the remote checks.
check_by_ssh does not have a limit on how much data it will receive back and hence it works great for plugins with a lot of output.
As a bonus, I found that configuring and using check_by_ssh was less complicated and quicker to get it up and running as you don't need to install an agent on the remote servers.
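For reference, a check_by_ssh call for the plugin in this thread might look roughly like the command printed by the sketch below. build_ssh_check is a hypothetical helper that only prints the command line so it can be reviewed first. The "nagios" login name, the 30-second timeout, and the remote plugin path are all assumptions, and key-based SSH login from the Nagios server to the remote host is assumed to be set up already.

```shell
#!/bin/sh
# Sketch only: print a check_by_ssh command line for a remote plugin.
# build_ssh_check is a hypothetical helper. The login name "nagios", the
# 30-second timeout, and both plugin paths are assumptions to adjust.
# $1 = remote host, $2 = command to run on that host.
build_ssh_check() {
    host="$1"
    remote_cmd="$2"
    # -H host, -l login name, -t timeout in seconds, -C remote command
    printf '%s -H %s -l %s -t %s -C "%s"\n' \
        "/usr/local/nagios/libexec/check_by_ssh" \
        "$host" "nagios" "30" "$remote_cmd"
}

# Print the command for the host used earlier in this thread.
build_ssh_check 10.91.101.25 /usr/local/nagios/libexec/check_cpu_stats
```

Running the printed command by hand from the Nagios server is a quick way to confirm that the full top output comes back before wiring it into a service definition.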
Re: custom check_cpu_stats Long Output
Posted: Thu May 01, 2014 10:59 am
by brandon.pal
From the Service Status Detail page, on the Advanced tab, click on the link "See this service in Nagios Core".
Is the Status Information and Performance data shown here truncated?
I've attached the image. Looks the same to me.
Box293 wrote:I decided to stop using nrpe and instead I used check_by_ssh to perform the remote checks.
I will have a look at this.
Re: custom check_cpu_stats Long Output
Posted: Thu May 01, 2014 12:50 pm
by scottwilkerson
Box293 wrote:
I decided to stop using nrpe and instead I used check_by_ssh to perform the remote checks.
I would concur, with one caveat: in XI you would need to do some extra massaging to get the output you are looking for.
1. Enable HTML plugin output at Admin -> Manage System Config -> Allow HTML Tags in Host/Service Status
2. Wrap your long_output in <pre> tags
3. Modify the nagios database to accept much longer long_output
Code: Select all
echo "ALTER TABLE nagios_servicestatus MODIFY long_output VARCHAR(65536);"|mysql -pnagiosxi nagios
echo "ALTER TABLE nagios_servicechecks MODIFY long_output VARCHAR(65536);"|mysql -pnagiosxi nagios
echo "ALTER TABLE nagios_hoststatus MODIFY long_output VARCHAR(65536);"|mysql -pnagiosxi nagios
echo "ALTER TABLE nagios_hostchecks MODIFY long_output VARCHAR(65536);"|mysql -pnagiosxi nagios
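Step 2 could be sketched in the plugin roughly as follows. emit_plugin_output is a hypothetical helper, not part of check_cpu_stats. It leaves the first line (status text and performance data) untouched and wraps only the long output in <pre> tags, which relies on the HTML option from step 1 being enabled.

```shell
#!/bin/sh
# Hypothetical helper: print the status line as-is, then the long output
# wrapped in <pre> tags so the web interface keeps its column alignment.
# $1 = first line (status text plus perfdata), $2 = long output (e.g. top).
emit_plugin_output() {
    printf '%s\n' "$1"
    printf '%s\n' "<pre>"
    printf '%s\n' "$2"
    printf '%s\n' "</pre>"
}
```

In the modified plugin from earlier in the thread, this would stand in for the two echo lines before exit $result, with the existing status string as the first argument and "$top" as the second.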
Re: custom check_cpu_stats Long Output
Posted: Fri May 02, 2014 10:52 am
by brandon.pal
OK, I activated the <pre> tag and that's working: the text formatting has improved.
I altered the tables but am still only getting the same amount via NRPE. I've yet to set up SSH. Can I alter NRPE to accept more?
Re: custom check_cpu_stats Long Output
Posted: Fri May 02, 2014 10:57 am
by abrist
brandon.pal wrote: Can I alter NRPE to accept more?
Not easily. Both the agent and the check_nrpe plugin would need to be recompiled with a rather old (and possibly unstable) patch from years ago. There has been some discussion internally concerning adding this functionality to nrpe in an official way, but no action has been taken as of yet. I would suggest looking at check_by_ssh in the meantime.