Perf Data stopped working

CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Perf Data stopped working

Post by CFT6Server »

Code:

# grep -v ^# /etc/mod_gearman/mod_gearman_neb.conf
debug=1
logfile=/var/log/mod_gearman/mod_gearman_neb.log
server=localhost:4730
eventhandler=yes
services=yes
hosts=yes
hostgroups=Network_ALL
servicegroups=ALL_Network_Bandwidth,WMI_CPU_Checks,WMI_IO_Checks,WMI_NETWORK_Checks
do_hostchecks=yes
route_eventhandler_like_checks=no
encryption=yes
key=somethinghere
use_uniq_jobs=on
localhostgroups=localhost
localservicegroups=
result_workers=1
perfdata=no
perfdata_mode=1
orphan_host_checks=yes
orphan_service_checks=yes
accept_clear_results=no

Code:

# grep gearman /usr/local/nagios/etc/nagios.cfg
broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf eventhandler=no

Code:

# grep -v ^# /etc/mod_gearman/mod_gearman_worker.conf
debug=1
logfile=/var/log/mod_gearman/mod_gearman_worker.log
server=<server>:4730
eventhandler=yes
services=no
hosts=no
hostgroups=Network_ALL
servicegroups=ALL_Network_Bandwidth
encryption=yes
key=<somethinghere>
job_timeout=120
min-worker=100
max-worker=500
idle-timeout=30
max-jobs=1000
spawn-rate=1
fork_on_exec=no
load_limit1=10
load_limit5=10
load_limit15=10
show_error_output=yes
enable_embedded_perl=on
use_embedded_perl_implicitly=off
use_perl_cache=on
p1_file=/usr/share/mod_gearman/mod_gearman_p1.pl
bheden
Product Development Manager
Posts: 179
Joined: Thu Feb 13, 2014 9:50 am
Location: Nagios Enterprises

Re: Perf Data stopped working

Post by bheden »

This may be helpful in eliminating your memory leak.

Please ensure that you have a backup of your server before you attempt the upgrade listed in the instructions below.

Code:

cd /tmp
yum remove libgearman-devel libgearman gearmand mod_gearman
mkdir gearman_install
cd gearman_install/
wget http://mod-gearman.org/download/v2.1.1/rhel6/x86_64/gearmand-0.33-2.rhel6.x86_64.rpm
wget http://mod-gearman.org/download/v2.1.1/rhel6/x86_64/gearmand-devel-0.33-2.rhel6.x86_64.rpm
wget http://mod-gearman.org/download/v2.1.1/rhel6/x86_64/gearmand-server-0.33-2.rhel6.x86_64.rpm
wget http://mod-gearman.org/download/v2.1.1/rhel6/x86_64/mod_gearman2-2.1.1-1.rhel6.x86_64.rpm
yum --nogpgcheck localinstall *
sed -i 's/\(^broker_module=.*mod_gearman.*\)/#\1/' /usr/local/nagios/etc/nagios.cfg
echo "broker_module=/usr/lib64/mod_gearman2/mod_gearman2.o config=/etc/mod_gearman/mod_gearman_neb.conf eventhandler=no" >> /usr/local/nagios/etc/nagios.cfg
service nagios stop
service mod_gearman_worker stop
service gearmand stop
service gearmand start
service mod_gearman_worker start
service nagios start
Please inform us if this resolves your issue. Thank you.
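
After the sed and echo steps above, it may be worth confirming that exactly one uncommented mod_gearman broker_module line is left in nagios.cfg before restarting the services. The following is only a sketch of such a check; the function name count_active_gearman_brokers is made up for this example and is not part of Nagios XI or Mod-Gearman.

```shell
# Count uncommented broker_module lines that reference mod_gearman.
# Reads a nagios.cfg-style file on standard input and prints the count.
count_active_gearman_brokers() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c '^broker_module=.*mod_gearman' || true
}

# Usage (config path as used in the steps above):
#   count_active_gearman_brokers < /usr/local/nagios/etc/nagios.cfg
```

A result of 1 is what the steps above should produce; 0 or more than 1 suggests the file needs a manual look.
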
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Nagios Enterprises
Senior Developer
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Perf Data stopped working

Post by CFT6Server »

I have not applied this upgrade yet, but while I was away the performance graphs stopped working again. I rebooted this morning, but the graphs are still missing and I can't get them to show up.

Code:

# tail -25  /usr/local/nagios/var/npcd.log
[01-28-2016 10:04:25] NPCD: ThreadCounter 4/5 File is 1454003856.perfdata.host
[01-28-2016 10:04:25] NPCD: Regular File: 1454003856.perfdata.host
[01-28-2016 10:04:25] NPCD: A thread was started on thread_counter = 4
[01-28-2016 10:04:25] NPCD: DEBUG: load 11.510000/20.000000
[01-28-2016 10:04:25] NPCD: ThreadCounter 5/5 File is 1454003856.perfdata.service
[01-28-2016 10:04:25] NPCD: Regular File: 1454003856.perfdata.service
[01-28-2016 10:04:25] NPCD: WARN: MAX Thread reached: 1454003856.perfdata.service comes later with ThreadCounter: 5
[01-28-2016 10:04:25] NPCD: DEBUG: Will wait for th['4']
[01-28-2016 10:04:25] NPCD: Processing file 1454003856.perfdata.host with ID 140518939485952 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1454003856.perfdata.host
[01-28-2016 10:04:25] NPCD: Processing file '1454003856.perfdata.host'
[01-28-2016 10:04:43] NPCD: DEBUG: Will wait for th['3']
[01-28-2016 10:05:10] NPCD: ERROR: Executed command exits with return code '7'
[01-28-2016 10:05:10] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1454003826.perfdata.service'
[01-28-2016 10:05:14] NPCD: ERROR: Executed command exits with return code '7'
[01-28-2016 10:05:14] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1454003842.perfdata.service'
[01-28-2016 10:05:16] NPCD: DEBUG: Will wait for th['2']
[01-28-2016 10:05:16] NPCD: DEBUG: Will wait for th['1']
[01-28-2016 10:05:16] NPCD: DEBUG: Will wait for th['0']
[01-28-2016 10:05:16] NPCD: DEBUG: load 7.470000/20.000000
[01-28-2016 10:05:16] NPCD: ThreadCounter 0/5 File is 1454003856.perfdata.service
[01-28-2016 10:05:16] NPCD: Regular File: 1454003856.perfdata.service
[01-28-2016 10:05:16] NPCD: A thread was started on thread_counter = 0
[01-28-2016 10:05:16] NPCD: Have to wait: Filecounter = 46 - thread_counter = 1
[01-28-2016 10:05:16] NPCD: Processing file 1454003856.perfdata.service with ID 140518970955520 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1454003856.perfdata.service
[01-28-2016 10:05:16] NPCD: Processing file '1454003856.perfdata.service'
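
One way to gauge how often process_perfdata.pl is failing in a log like the one above is to tally the "return code" errors by code. This is only a sketch: the function name npcd_error_summary is made up for this example, and it assumes the error lines keep the exact wording shown in this log.

```shell
# Summarize NPCD "return code" errors read from standard input.
# Prints one line per return code in the form "<count> <code>".
npcd_error_summary() {
    # Pull the numeric code out of each matching error line,
    # then count how many times each code occurs.
    sed -n "s/.*ERROR: Executed command exits with return code '\([0-9][0-9]*\)'.*/\1/p" \
        | sort -n | uniq -c | awk '{ print $1, $2 }'
}

# Usage (log path as shown in the command above):
#   npcd_error_summary < /usr/local/nagios/var/npcd.log
```

Run against the 25-line excerpt above, this would report two errors with return code 7.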

Code:

# tail -25  /usr/local/nagios/var/perfdata.log
2016-01-28 10:05:47 [6831] [1] Found Performance Data for L2E-LAN-B02 / CPU_Busy_5_Sec ('CPU Busy 5 Sec'=1;80;90;)
2016-01-28 10:05:47 [6831] [2] No Custom Template found for check_snmp (/usr/local/nagios/etc/pnp/check_commands/check_snmp.cfg)
2016-01-28 10:05:47 [6831] [2] Template is check_snmp.php
2016-01-28 10:05:47 [6831] [2] data2rrd called
2016-01-28 10:05:47 [6831] [2] RRDs::update /usr/local/nagios/share/perfdata/L2E-LAN-B02/CPU_Busy_5_Sec.rrd 1454003842:1
2016-01-28 10:05:47 [6831] [2] /usr/local/nagios/share/perfdata/L2E-LAN-B02/CPU_Busy_5_Sec.rrd updated
2016-01-28 10:05:47 [6831] [2] Processing Line 77
2016-01-28 10:05:47 [6831] [2] Datatype set to 'SERVICEPERFDATA'
2016-01-28 10:05:47 [6831] [1] Found Performance Data for L2E-LAN-B01 / VPC-3_to_KDCNBUFLT-SW-01_Po3_Port_Channel_Bandwidth (in=.000005Gb/s;7;8 out=.000767Gb/s;7;8)
2016-01-28 10:05:47 [6831] [2] No Custom Template found for check_xi_service_mrtgtraf (/usr/local/nagios/etc/pnp/check_commands/check_xi_service_mrtgtraf.cfg)
2016-01-28 10:05:47 [6831] [2] Template is check_xi_service_mrtgtraf.php
2016-01-28 10:05:47 [6831] [2] No Custom Template found for check_xi_service_mrtgtraf (/usr/local/nagios/etc/pnp/check_commands/check_xi_service_mrtgtraf.cfg)
2016-01-28 10:05:47 [6831] [2] Template is check_xi_service_mrtgtraf.php
2016-01-28 10:05:47 [6831] [2] data2rrd called
2016-01-28 10:05:47 [6831] [2] RRDs::update /usr/local/nagios/share/perfdata/L2E-LAN-B01/VPC-3_to_KDCNBUFLT-SW-01_Po3_Port_Channel_Bandwidth.rrd 1454003842:.000005:.000767
2016-01-28 10:05:47 [6831] [2] /usr/local/nagios/share/perfdata/L2E-LAN-B01/VPC-3_to_KDCNBUFLT-SW-01_Po3_Port_Channel_Bandwidth.rrd updated
2016-01-28 10:05:47 [6831] [2] Processing Line 78
2016-01-28 10:05:47 [6831] [2] Datatype set to 'SERVICEPERFDATA'
2016-01-28 10:05:47 [6831] [1] Found Performance Data for L2E-LAN-B01 / e2_2_UCS_Domain__1_Interconnect_A_1_18_Bandwidth (in=.032351Gb/s;7;8 out=.027169Gb/s;7;8)
2016-01-28 10:05:47 [6831] [2] No Custom Template found for check_xi_service_mrtgtraf (/usr/local/nagios/etc/pnp/check_commands/check_xi_service_mrtgtraf.cfg)
2016-01-28 10:05:47 [6831] [2] Template is check_xi_service_mrtgtraf.php
2016-01-28 10:05:47 [6831] [2] No Custom Template found for check_xi_service_mrtgtraf (/usr/local/nagios/etc/pnp/check_commands/check_xi_service_mrtgtraf.cfg)
2016-01-28 10:05:47 [6831] [2] Template is check_xi_service_mrtgtraf.php
2016-01-28 10:05:47 [6831] [2] data2rrd called
2016-01-28 10:05:47 [6831] [2] RRDs::update /usr/local/nagios/share/perfdata/L2E-LAN-B01/e2_2_UCS_Domain__1_Interconnect_A_1_18_Bandwidth.rrd 1454003842:.032351:.027169

Code:

# free -m
             total       used       free     shared    buffers     cached
Mem:         15950       6375       9574         27        104       4565
-/+ buffers/cache:       1705      14244
Swap:         2015          0       2015

Code:

top - 10:09:48 up  1:09,  1 user,  load average: 5.21, 7.13, 6.31
Tasks: 338 total,   1 running, 337 sleeping,   0 stopped,   0 zombie
Cpu(s): 11.5%us,  4.2%sy,  0.0%ni, 70.9%id, 12.3%wa,  0.1%hi,  1.0%si,  0.0%st
Mem:  16333268k total,  6565848k used,  9767420k free,   106868k buffers
Swap:  2064380k total,        0k used,  2064380k free,  4717636k cached
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Perf Data stopped working

Post by CFT6Server »

Some graphs started to show up after an hour or so, but so far they are spotty.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Perf Data stopped working

Post by scottwilkerson »

Glad to hear they are coming back, albeit spotty.

I noticed in your previous post that I/O wait is fairly high (12.3%wa in the top output). Do you have a RAM disk set up on this server?
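
To keep an eye on that figure over time, the I/O wait value can be pulled out of top's CPU summary line. The snippet below is a rough sketch rather than an official tool: top_iowait is a hypothetical helper name, and the pattern was written against the "12.3%wa" layout shown in the earlier post, so other top versions may need adjusting.

```shell
# Extract the I/O wait percentage from a top "Cpu(s):" summary line on stdin.
# Prints just the number (for example 12.3) for the first matching line.
top_iowait() {
    # Capture the number that sits directly before "%wa" (or " wa").
    sed -n 's/.*[ ,]\([0-9.][0-9.]*\)%\{0,1\} *wa.*/\1/p' | head -n 1
}

# Usage (batch mode, one iteration):
#   top -b -n 1 | grep 'Cpu(s)' | top_iowait
```
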
bheden
Product Development Manager
Posts: 179
Joined: Thu Feb 13, 2014 9:50 am
Location: Nagios Enterprises

Re: Perf Data stopped working

Post by bheden »

Since you haven't upgraded yet:

I rewrote the ModGearman install script to install/upgrade automatically and copy the necessary configuration files over.

This is still in testing, but has passed all of our internal tests so far.

If you'd like to give it a go (with the caveat of supplying feedback, of course), the URL is:

http://assets.nagios.com/downloads/nagi ... Install.sh

On your server:

Code:

wget http://assets.nagios.com/downloads/nagiosxi/scripts/ModGearmanInstall.sh
chmod +x ModGearmanInstall.sh
./ModGearmanInstall.sh --server --upgrade

Then follow the prompts. Make sure you have a good backup before you start. :)
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Perf Data stopped working

Post by CFT6Server »

I'll give this a try. We have multiple instances, and I am also seeing this issue on a much smaller implementation, where performance graphs just stop working. Service checks are all still running fine.

In case this provides some clues: I was able to catch this early enough today on this server, and the data stopped just before 8. After I rebooted the server, the performance graphs look like the ones below. So perhaps that data is stuck?

I also used Box293's tool to review the data and confirm that it stopped before 8 am.
[Attachments not available]
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Perf Data stopped working

Post by rkennedy »

Are you using a ramdisk on this implementation?

Data can stop when memory fills up and the perfdata can no longer be processed. After each reboot, does the data always come back over time?

Let us know how the updated gearman goes.
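
On the ramdisk question, one way to check is to look up which filesystem type the perfdata spool directory sits on: tmpfs would indicate a RAM disk. The helper below is only a sketch; mount_fstype is a hypothetical name, and it assumes input in the /proc/mounts format (device, mount point, type, ...).

```shell
# Read /proc/mounts-style lines on stdin and print the filesystem type of the
# longest mount point that contains the path given as the first argument.
mount_fstype() {
    awk -v p="$1" '
        BEGIN { best = 0 }
        {
            mp = $2
            # A mount point matches if it is "/", equals the path,
            # or is a parent directory of the path.
            if (mp == "/" || p == mp || index(p, mp "/") == 1) {
                # Prefer the longest match; on ties, the later mount wins.
                if (length(mp) >= best) { best = length(mp); fstype = $3 }
            }
        }
        END { if (fstype != "") print fstype }
    '
}

# Usage (spool path as it appears in the npcd.log earlier in this thread):
#   mount_fstype /usr/local/nagios/var/spool/perfdata < /proc/mounts
```

If this prints tmpfs, the spool is on a RAM disk; a type such as ext4 means it is on regular disk.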