Gaps in Performance graphs

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
benhank
Posts: 1264
Joined: Tue Apr 12, 2011 12:29 pm

Gaps in Performance graphs

Post by benhank »

We have an isolated case where one device's performance graph has gaps. We have checked multiple other devices and their graphs are fine. I have attached a document showing the issue.
Proudly running:
NagiosXI 5.4.12 2 node Prod Env 2500 hosts, 13,000 services
Nagiosxi 5.5.7(test env) 2500 hosts, 13,000 services
Nagios Logserver 2 node Prod Env 500 objects sending
Nagios Network Analyser
Nagios Fusion
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Gaps in Performance graphs

Post by abrist »

Let's make sure that graphing isn't hitting a timeout or load limit:

grep WKENCHP03.Healthone.org /usr/local/nagios/var/perfdata.log
grep WKENCHP03.Healthone.org /usr/local/nagios/var/npcd.log
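One refinement to the greps above: grep reads the dots in a hostname as "match any character", so the pattern can also hit a differently named host. A minimal sketch of the same check, assuming the default Nagios XI log locations shown above, searches both logs for the literal hostname and prints a note when a log has no entries (perfdata_log_check is a hypothetical helper, not something shipped with Nagios XI):

```shell
# Hypothetical helper: search the PNP perfdata log and the NPCD log for one host.
# Usage: perfdata_log_check HOST [LOGDIR]   (LOGDIR defaults to /usr/local/nagios/var)
perfdata_log_check() {
    host="$1"
    logdir="${2:-/usr/local/nagios/var}"
    for log in "$logdir/perfdata.log" "$logdir/npcd.log"; do
        if [ ! -r "$log" ]; then
            echo "cannot read $log" >&2
            continue
        fi
        # -F treats the hostname as a fixed string, so "." is not a wildcard.
        # Matching lines are printed; if there are none, say so explicitly.
        if ! grep -F -- "$host" "$log"; then
            echo "no entries for $host in $log"
        fi
    done
}
```

For example: perfdata_log_check WKENCHP03.Healthone.org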
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
benhank
Posts: 1264
Joined: Tue Apr 12, 2011 12:29 pm

Re: Gaps in Performance graphs

Post by benhank »

Here is the output from the greps:

Using username "root".
Last login: Mon Jul  1 11:35:09 2013 from 172.22.161.161
[root@LkennagiosP01 ~]# grep WKENCHP03.Healthone.org /usr/local/nagios/var/perfdata.log
[root@LkennagiosP01 ~]# grep WKENCHP03.Healthone.org /usr/local/nagios/var/npcd.log
[root@LkennagiosP01 ~]#
No data was returned.
I did run a tail on both logs and got this:

[root@LkennagiosP01 ~]# tail /usr/local/nagios/var/perfdata.log
2013-07-01 11:44:25 [15088] [0] *** TIMEOUT: Please check your npcd.cfg
2013-07-01 11:44:25 [15088] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1372693442.perfdata.service-PID-15088 deleted
2013-07-01 11:44:25 [15088] [0] *** Timeout while processing Host: "PBY_F1_2960-S12" Service: "If_Vlan100"
2013-07-01 11:44:25 [15088] [0] *** process_perfdata.pl terminated on signal ALRM
2013-07-01 11:45:34 [19357] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-07-01 11:45:34 [19357] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-07-01 11:45:34 [19357] [0] *** TIMEOUT: Please check your npcd.cfg
2013-07-01 11:45:34 [19357] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1372693517.perfdata.service-PID-19357 deleted
2013-07-01 11:45:34 [19357] [0] *** Timeout while processing Host: "SOM-UPS-IDF-3-1" Service: "Connectivity"
2013-07-01 11:45:34 [19357] [0] *** process_perfdata.pl terminated on signal ALRM
[root@LkennagiosP01 ~]#
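When the tail shows timeouts for several different hosts, counting them per host/service pair helps tell general load apart from one slow check. A sketch, assuming the perfdata.log line format shown above (timeout_summary is a hypothetical helper; grep -o is a GNU/BSD extension rather than POSIX):

```shell
# Hypothetical helper: count process_perfdata.pl timeouts per host/service pair,
# most frequent first. Defaults to the Nagios XI perfdata.log path used in this thread.
timeout_summary() {
    log="${1:-/usr/local/nagios/var/perfdata.log}"
    # Extract just the 'Host: "..." Service: "..."' part of each timeout line,
    # then count identical pairs and sort by count, highest first.
    grep -o 'Timeout while processing Host: "[^"]*" Service: "[^"]*"' "$log" |
        sed 's/^Timeout while processing //' |
        sort | uniq -c | sort -rn
}
```

If one service dominates the list, its check output is worth a look; if the timeouts are spread evenly, a too-short TIMEOUT or system load is the more likely cause.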

[root@LkennagiosP01 ~]# tail /usr/local/nagios/var/npcd.log
[07-01-2013 11:35:31] NPCD: ERROR: Executed command exits with return code '7'
[07-01-2013 11:35:31] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1372692917.perfdata.service'
[07-01-2013 11:39:29] NPCD: ERROR: Executed command exits with return code '7'
[07-01-2013 11:39:29] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1372693142.perfdata.service'
[07-01-2013 11:40:41] NPCD: ERROR: Executed command exits with return code '7'
[07-01-2013 11:40:41] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1372693217.perfdata.service'
[07-01-2013 11:44:25] NPCD: ERROR: Executed command exits with return code '7'
[07-01-2013 11:44:25] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1372693442.perfdata.service'
[07-01-2013 11:45:34] NPCD: ERROR: Executed command exits with return code '7'
[07-01-2013 11:45:34] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1372693517.perfdata.service'
[root@LkennagiosP01 ~]#
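The numeric prefix of the spool file names (for example 1372693442) appears to be a Unix timestamp; that is an assumption, but it lines up with the log times. Converting it shows which part of the graph lost its data. A sketch assuming GNU date for the -d @SECONDS syntax (spool_file_time is a hypothetical helper):

```shell
# Hypothetical helper: print the time encoded in a spool file name such as
# 1372693442.perfdata.service-PID-15088, assuming the leading digits are Unix epoch seconds.
spool_file_time() {
    name=${1##*/}                   # strip the directory part of the path
    epoch=${name%%[!0-9]*}          # keep only the leading run of digits
    if [ -z "$epoch" ]; then
        echo "no timestamp in $1" >&2
        return 1
    fi
    # -u prints UTC; drop it to see the server's local time instead
    date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S UTC'
}
```

For example, spool_file_time /usr/local/nagios/var/spool/perfdata//1372693442.perfdata.service prints 2013-07-01 15:44:02 UTC, which is 11:44:02 in UTC-4 (EDT), shortly before the 11:44:25 timeout entries above.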
Last edited by benhank on Mon Jul 01, 2013 10:48 am, edited 1 time in total.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Gaps in Performance graphs

Post by abrist »

Could you post those services' configs?
benhank
Posts: 1264
Joined: Tue Apr 12, 2011 12:29 pm

Re: Gaps in Performance graphs

Post by benhank »

Sent to you via PM.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Gaps in Performance graphs

Post by abrist »

Looks like you are hitting the timeout limit.
Edit:

/usr/local/nagios/etc/pnp/process_perfdata.cfg
Change:

TIMEOUT = 5
To:

TIMEOUT = 20
Restart npcd:

service npcd restart
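The same edit can be scripted, which helps when it has to be repeated on more than one node. A minimal sketch with sed, assuming the setting appears as a line of the form TIMEOUT = 5 as shown above (set_perfdata_timeout is a hypothetical helper; it keeps a .bak copy of the original file):

```shell
# Hypothetical helper: set TIMEOUT in process_perfdata.cfg, keeping CFG.bak as a backup.
# Usage: set_perfdata_timeout SECONDS [CFG]
set_perfdata_timeout() {
    secs="$1"
    cfg="${2:-/usr/local/nagios/etc/pnp/process_perfdata.cfg}"
    # Reject anything that is not a whole number of seconds.
    case "$secs" in
        ''|*[!0-9]*) echo "timeout must be a whole number of seconds" >&2; return 1 ;;
    esac
    # Replace only the number on an uncommented "TIMEOUT = N" line; other lines are untouched.
    sed -i.bak -E "s/^([[:space:]]*TIMEOUT[[:space:]]*=[[:space:]]*)[0-9]+/\1$secs/" "$cfg" || return 1
    # Show the resulting line so the change can be confirmed.
    grep -E '^[[:space:]]*TIMEOUT[[:space:]]*=' "$cfg"
}
```

For example, set_perfdata_timeout 20, then restart npcd as described above.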
benhank
Posts: 1264
Joined: Tue Apr 12, 2011 12:29 pm

Re: Gaps in Performance graphs

Post by benhank »

This system has a recent gap in its CPU usage graph. We found this in perfdata.log; there was nothing in npcd.log:

[root@LkennagiosP01 nagiosxi]# grep WKENAHPRESP236.Healthone.org /usr/local/nagios/var/perfdata.log
2013-06-29 22:37:59 [23517] [0] *** Timeout while processing Host: "WKENAHPRESP236.Healthone.org" Service: "NRPE__Event_ID_1008_Status"
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Gaps in Performance graphs

Post by abrist »

Yes. It probably stopped processing perfdata because of the timeouts. Increasing the timeout should prevent this from happening in the future. Have you done so yet?
benhank
Posts: 1264
Joined: Tue Apr 12, 2011 12:29 pm

Re: Gaps in Performance graphs

Post by benhank »

No, it is currently set to 5 secs. Should we go to 10?
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Gaps in Performance graphs

Post by slansing »

10 may not be enough; you could try 15 or 20 to start with.
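To check whether a new value is high enough, one approach is to count timeout entries logged after npcd was restarted. A sketch, assuming the perfdata.log timestamp format shown earlier in this thread (timeouts_since is a hypothetical helper):

```shell
# Hypothetical helper: count "Timeout while processing" entries logged at or after a
# given time, e.g. the moment npcd was restarted with the new TIMEOUT.
# Timestamps of the form "YYYY-MM-DD HH:MM:SS" sort lexically, so a string compare works.
timeouts_since() {
    since="$1"
    log="${2:-/usr/local/nagios/var/perfdata.log}"
    awk -v since="$since" '
        /Timeout while processing/ && ($1 " " $2) >= since { n++ }
        END { print n + 0 }
    ' "$log"
}
```

If the count stays at 0 over several check intervals, the timeout is probably sufficient; if it keeps climbing, the value can be raised further.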