Performance Data Not Working

GreatWolfResorts
Posts: 48
Joined: Tue Mar 15, 2011 11:12 am
Location: Madison, WI

Performance Data Not Working

Post by GreatWolfResorts »

I've sifted through earlier posts on performance data issues without any real luck. Our problem appeared after a two-phase change to our Nagios XI environment.

1. We needed to move our system off a physical server that requires decommissioning.
2. We wanted to upgrade to 2012 R2.9 to stay current.

The new server is a VM on ESXi. The previous software version was 2012 R1.6, so this is quite a jump. The first step was to build the VM on R1.6 and restore our latest backup to it. Once that completed, I logged in and verified that things looked good. The second step was to download R2.9 and run the upgrade script, which completed without error. One thing that stood out right away was that the performance graphs weren't working. This was apparent because many of our networking checks were complaining that the MRTG .rrd files were not present. I copied the files over from the old server and those errors cleared, but we still aren't getting any new performance data.

Hopefully this information will prove helpful:

Code:

[root@gwr-noc /]# tail -15 /usr/local/nagios/var/npcd.log
[05-02-2014 12:02:45] NPCD: npcd Daemon (0.4.14) started with PID=18461
[05-02-2014 12:02:45] NPCD: Please have a look at 'npcd -V' to get license information
[05-02-2014 12:02:45] NPCD: HINT: load_threshold is enabled - ('20.000000')
[05-02-2014 12:08:11] NPCD: Caught Termination Signal - Hasta la vista... baby
[05-02-2014 12:08:11] NPCD: npcd Daemon (0.4.14) started with PID=27269
[05-02-2014 12:08:11] NPCD: Please have a look at 'npcd -V' to get license information
[05-02-2014 12:08:11] NPCD: HINT: load_threshold is enabled - ('20.000000')
[05-02-2014 12:19:07] NPCD: Caught Termination Signal - Hasta la vista... baby
[05-02-2014 12:19:07] NPCD: npcd Daemon (0.4.14) started with PID=25768
[05-02-2014 12:19:07] NPCD: Please have a look at 'npcd -V' to get license information
[05-02-2014 12:19:07] NPCD: HINT: load_threshold is enabled - ('20.000000')
[05-02-2014 12:27:23] NPCD: Caught Termination Signal - Hasta la vista... baby
[05-02-2014 12:28:06] NPCD: npcd Daemon (0.4.14) started with PID=3144
[05-02-2014 12:28:06] NPCD: Please have a look at 'npcd -V' to get license information
[05-02-2014 12:28:06] NPCD: HINT: load_threshold is enabled - ('20.000000')

Code:

[root@gwr-noc /]# tail -15 /usr/local/nagios/var/perfdata.log
2014-04-29 17:00:35 [29373] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1398808808.perfdata.host-PID-29373 deleted
2014-04-29 17:00:35 [29373] [0] *** Timeout while processing Host: "WB8000_DELLS_3" Service: "_HOST_"
2014-04-29 17:00:35 [29373] [0] *** process_perfdata.pl terminated on signal ALRM
2014-04-29 17:40:35 [26051] [0] *** TIMEOUT: Timeout after 12 secs. ***
2014-04-29 17:40:35 [26051] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-04-29 17:40:35 [26051] [0] *** TIMEOUT: Please check your npcd.cfg
2014-04-29 17:40:35 [26051] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1398811208.perfdata.host-PID-26051 deleted
2014-04-29 17:40:35 [26051] [0] *** Timeout while processing Host: "WI-FILER-IMM" Service: "_HOST_"
2014-04-29 17:40:35 [26051] [0] *** process_perfdata.pl terminated on signal ALRM
2014-04-30 13:05:31 [27739] [0] *** TIMEOUT: Timeout after 12 secs. ***
2014-04-30 13:05:31 [27739] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-04-30 13:05:31 [27739] [0] *** TIMEOUT: Please check your npcd.cfg
2014-04-30 13:05:31 [27739] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1398881110.perfdata.host-PID-27739 deleted
2014-04-30 13:05:31 [27739] [0] *** Timeout while processing Host: "WI-SW5" Service: "_HOST_"
2014-04-30 13:05:31 [27739] [0] *** process_perfdata.pl terminated on signal ALRM
Something interesting here: the latest timestamp in this log is from the day before yesterday. Nothing further has been written to this log since the migration.
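To put a number on how stale that log is, something like the following could be used. This is only a rough sketch: the function name log_age_hours is made up for illustration, and it assumes GNU stat and date as shipped with CentOS.

```shell
#!/bin/sh
# Print how many whole hours have passed since a file was last modified.
# Assumes GNU stat (the -c %Y format prints the mtime as epoch seconds).
log_age_hours() {
    local file="$1"
    local now mtime
    now=$(date +%s)
    mtime=$(stat -c %Y "$file") || return 1
    echo $(( (now - mtime) / 3600 ))
}

# Example, using the path from the post above:
# log_age_hours /usr/local/nagios/var/perfdata.log
```

A large value on a system that is actively running checks would back up the observation that nothing has been logged since the migration.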

Code:

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      246G  6.7G  227G   3% /
tmpfs                 2.9G     0  2.9G   0% /dev/shm
/dev/sda1              97M   28M   65M  31% /boot

Code:

[root@gwr-noc /]# ls /usr/local/nagios/var/spool/xidpe | wc -l
0
At this point I'm just grasping at straws, so any and all help is greatly appreciated. Thanks!

Dan
Nagios XI 5.2.5 | CentOS6.3 x86_64 | Virtual Instance on VMware vSphere 6
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Performance Data Not Working

Post by abrist »

1. Was there an architecture change in this process (32bit to 64 bit)?
2. Is npcd running?

Code:

service npcd status
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
GreatWolfResorts
Posts: 48
Joined: Tue Mar 15, 2011 11:12 am
Location: Madison, WI

Re: Performance Data Not Working

Post by GreatWolfResorts »

Both the old server and the new VM are 64bit, so we should be okay there.

Code:

[root@gwr-noc /]# service npcd status
NPCD running (pid 3144).
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Performance Data Not Working

Post by lmiltchev »

Can you also run the following commands, and show us the output?

Code:

ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/checkresults | wc -l
Have you tried using a RAM disk on either of these two machines?
Be sure to check out our Knowledgebase for helpful articles and solutions!
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Performance Data Not Working

Post by slansing »

Hmm, what is the output of all three of these? I realize you already posted one above:

Code:

ls /usr/local/nagios/var/spool/xidpe | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/checkresults | wc -l
Is this only happening with your MRTG based checks? Or everything?
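If it helps, the three counts can be gathered in one go. The following is just a convenience sketch: spool_counts is a hypothetical helper name, the default base path is the one used in the commands above, and it uses ls -A so hidden entries are counted too (the plain ls above skips them).

```shell
#!/bin/sh
# Print the number of entries in each spool directory under a base path.
# The three directory names are the ones asked about in this thread.
spool_counts() {
    local base="${1:-/usr/local/nagios/var/spool}"
    local dir count
    for dir in xidpe perfdata checkresults; do
        if [ -d "$base/$dir" ]; then
            # tr strips any padding that some wc implementations add
            count=$(ls -A "$base/$dir" | wc -l | tr -d '[:space:]')
            echo "$dir: $count"
        else
            echo "$dir: missing"
        fi
    done
}

# Example:
# spool_counts
```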
GreatWolfResorts
Posts: 48
Joined: Tue Mar 15, 2011 11:12 am
Location: Madison, WI

Re: Performance Data Not Working

Post by GreatWolfResorts »

Code:

[root@gwr-noc /]# ls /usr/local/nagios/var/spool/perfdata | wc -l
0
[root@gwr-noc /]# ls /usr/local/nagios/var/spool/checkresults | wc -l
6
[root@gwr-noc /]# ls /usr/local/nagios/var/spool/xidpe | wc -l
0
I haven't used a RAM disk with either system.

This is happening with all performance data, not just MRTG. The MRTG-based checks were simply what first drew my attention to the problem, because their check failures were so obvious on the tactical screen.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Performance Data Not Working

Post by lmiltchev »

Are the perf graphs blank? What are the permissions on the "perfdata" directory and the items in it?

Code:

ll -d /usr/local/nagios/share/perfdata/
ll /usr/local/nagios/share/perfdata/
Have you tried restarting npcd?

Code:

service npcd restart
Can you run the following command and watch the output for a while to see whether there is any activity?

Code:

watch 'ls /usr/local/nagios/var/spool/xidpe | wc -l'
GreatWolfResorts
Posts: 48
Joined: Tue Mar 15, 2011 11:12 am
Location: Madison, WI

Re: Performance Data Not Working

Post by GreatWolfResorts »

The performance graphs display historical data up to the point when the backup was made, and nothing after that. Image is attached.

Code:

[root@gwr-noc init.d]# ll -d /usr/local/nagios/share/perfdata/
drwxrwxr-x 595 nagios nagios 20480 Apr 10 13:26 /usr/local/nagios/share/perfdata/

[root@gwr-noc init.d]# ll /usr/local/nagios/share/perfdata/
total 741032
drwxrwxr-x 2 nagios nagios      4096 Jul 20  2012 5N-ASA
drwxrwxr-x 2 nagios nagios      4096 Feb 18 15:59 5N-BACKUP
drwxrwxr-x 2 nagios nagios      4096 Feb 18 16:01 5N-BACKUP-RSA2
drwxrwxr-x 2 nagios nagios      4096 Jan 16  2013 5N-CCI
drwxrwxr-x 2 nagios nagios      4096 Jul 20  2012 5N-DS3400A
drwxrwxr-x 2 nagios nagios      4096 Jul 20  2012 5N-DS3400B
drwxrwxr-x 2 nagios nagios      4096 Jul 20  2012 5N-FSW1
drwxrwxr-x 2 nagios nagios      4096 Jul 20  2012 5N-FSW2
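
The listing above only covers the top-level host directories. To scan the whole tree for anything with different ownership, a sketch along these lines might help. The helper name find_wrong_owner is made up, and the nagios:nagios defaults are simply taken from the listing, so treat them as assumptions about what the ownership should be.

```shell
#!/bin/sh
# List everything under a directory tree that is NOT owned by the expected
# user and group. Defaults match the nagios:nagios ownership shown above.
find_wrong_owner() {
    local dir="$1"
    local user="${2:-nagios}"
    local group="${3:-nagios}"
    find "$dir" \( ! -user "$user" -o ! -group "$group" \) -print
}

# Example:
# find_wrong_owner /usr/local/nagios/share/perfdata
```

No output would mean every file and directory under the path has the expected owner and group.
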
npcd has already been restarted a few times without success.

The watch on xidpe shows the value periodically jumping from 0 to 2 and back again, so there does appear to be some activity in the folder.
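
Since watch only shows the current value, a small sampling loop could keep a timestamped record of those jumps. This is a sketch only: sample_dir_count is a hypothetical name, and the default sample count and interval are arbitrary.

```shell
#!/bin/sh
# Sample the entry count of a directory several times and print one
# timestamped line per sample, so short bursts of activity leave a record.
sample_dir_count() {
    local dir="$1"
    local samples="${2:-10}"
    local interval="${3:-2}"
    local i=0 count
    while [ "$i" -lt "$samples" ]; do
        count=$(ls -A "$dir" | wc -l | tr -d '[:space:]')
        echo "$(date '+%Y-%m-%d %H:%M:%S') $count"
        sleep "$interval"
        i=$((i + 1))
    done
}

# Example: 30 samples, 2 seconds apart
# sample_dir_count /usr/local/nagios/var/spool/xidpe 30 2
```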
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Performance Data Not Working

Post by lmiltchev »

Can you try adding a new "test" host, wait 15-20 minutes, and check whether the perf graphs work for the new host?
GreatWolfResorts
Posts: 48
Joined: Tue Mar 15, 2011 11:12 am
Location: Madison, WI

Re: Performance Data Not Working

Post by GreatWolfResorts »

I added a new host and disk space check to the server. Performance data is trending properly and displaying on the graphs.
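
Since the new host graphs correctly, one way to see which existing hosts are affected might be to list the RRD files that have not been written to recently. A rough sketch, assuming GNU find; stale_rrds is a made-up helper name and the 60-minute default is arbitrary.

```shell
#!/bin/sh
# List .rrd files under a directory whose last modification is more than
# N minutes ago (default 60). Files that are still being updated are skipped.
stale_rrds() {
    local dir="$1"
    local minutes="${2:-60}"
    find "$dir" -type f -name '*.rrd' -mmin +"$minutes" -print
}

# Example:
# stale_rrds /usr/local/nagios/share/perfdata 60
```

Comparing that list against the new test host's directory should show whether only the restored files are being left untouched.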