Gaps in Performance graphs
We have an isolated case where one device’s performance graph has gaps. We have checked multiple other devices and their graphs are fine. I have attached a document showing the issue.
Proudly running:
NagiosXI 5.4.12 2 node Prod Env 2500 hosts, 13,000 services
Nagiosxi 5.5.7(test env) 2500 hosts, 13,000 services
Nagios Logserver 2 node Prod Env 500 objects sending
Nagios Network Analyser
Nagios Fusion
Re: Gaps in Performance graphs
Let's make sure that graphing wasn't hitting a timeout or load limit:
Code: Select all
grep WKENCHP03.Healthone.org /usr/local/nagios/var/perfdata.log
grep WKENCHP03.Healthone.org /usr/local/nagios/var/npcd.log
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Gaps in Performance graphs
Here are the responses from the grep:
Code: Select all
Using username "root".
Last login: Mon Jul 1 11:35:09 2013 from 172.22.161.161
[root@LkennagiosP01 ~]# grep WKENCHP03.Healthone.org /usr/local/nagios/var/perfdata.log
[root@LkennagiosP01 ~]# grep WKENCHP03.Healthone.org /usr/local/nagios/var/npcd.log
[root@LkennagiosP01 ~]#
No data was returned. I did do a tail and got this:
Code: Select all
[root@LkennagiosP01 ~]# tail /usr/local/nagios/var/perfdata.log
2013-07-01 11:44:25 [15088] [0] *** TIMEOUT: Please check your npcd.cfg
2013-07-01 11:44:25 [15088] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1372693442.perfdata.service-PID-15088 deleted
2013-07-01 11:44:25 [15088] [0] *** Timeout while processing Host: "PBY_F1_2960-S12" Service: "If_Vlan100"
2013-07-01 11:44:25 [15088] [0] *** process_perfdata.pl terminated on signal ALRM
2013-07-01 11:45:34 [19357] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-07-01 11:45:34 [19357] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-07-01 11:45:34 [19357] [0] *** TIMEOUT: Please check your npcd.cfg
2013-07-01 11:45:34 [19357] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1372693517.perfdata.service-PID-19357 deleted
2013-07-01 11:45:34 [19357] [0] *** Timeout while processing Host: "SOM-UPS-IDF-3-1" Service: "Connectivity"
2013-07-01 11:45:34 [19357] [0] *** process_perfdata.pl terminated on signal ALRM
[root@LkennagiosP01 ~]#
Code: Select all
[root@LkennagiosP01 ~]# tail /usr/local/nagios/var/npcd.log
[07-01-2013 11:35:31] NPCD: ERROR: Executed command exits with return code '7'
[07-01-2013 11:35:31] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1372692917.perfdata.service'
[07-01-2013 11:39:29] NPCD: ERROR: Executed command exits with return code '7'
[07-01-2013 11:39:29] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1372693142.perfdata.service'
[07-01-2013 11:40:41] NPCD: ERROR: Executed command exits with return code '7'
[07-01-2013 11:40:41] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1372693217.perfdata.service'
[07-01-2013 11:44:25] NPCD: ERROR: Executed command exits with return code '7'
[07-01-2013 11:44:25] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1372693442.perfdata.service'
[07-01-2013 11:45:34] NPCD: ERROR: Executed command exits with return code '7'
[07-01-2013 11:45:34] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1372693517.perfdata.service'
[root@LkennagiosP01 ~]#
Last edited by benhank on Mon Jul 01, 2013 10:48 am, edited 1 time in total.
Re: Gaps in Performance graphs
Could you post those services' configs?
Re: Gaps in Performance graphs
Sent to you via PM.
Re: Gaps in Performance graphs
Looks like you are hitting the timeout limit.
Edit:
Code: Select all
/usr/local/nagios/etc/pnp/process_perfdata.cfg
Change:
Code: Select all
TIMEOUT = 5
To:
Code: Select all
TIMEOUT = 20
Restart npcd:
Code: Select all
service npcd restart
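If you would rather script the edit than open the file by hand, something along these lines should work. This is only a sketch: `set_pnp_timeout` is a made-up helper name, not part of Nagios XI or PNP, and it assumes the setting sits on its own `TIMEOUT = N` line as quoted above. sed leaves a `.bak` copy next to the file so the change is easy to undo.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Nagios XI or PNP):
# set the TIMEOUT value in a process_perfdata.cfg-style file.
set_pnp_timeout() {
    file="$1"
    seconds="$2"

    # Accept only a whole number of seconds.
    case "$seconds" in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of seconds" >&2
            return 1
            ;;
    esac

    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }

    # Rewrite only the uncommented TIMEOUT line; the original is kept as "$file.bak".
    sed -i.bak -E "s/^[[:space:]]*TIMEOUT[[:space:]]*=.*/TIMEOUT = ${seconds}/" "$file"
}

# Example (run as root on the Nagios XI server, then restart npcd):
# set_pnp_timeout /usr/local/nagios/etc/pnp/process_perfdata.cfg 20
# service npcd restart
```

The function does not restart npcd itself, so the restart step above is still needed afterwards.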
Re: Gaps in Performance graphs
This system has a recent gap in its CPU usage graph. We found this in perfdata.log; there is nothing in npcd.log.
[root@LkennagiosP01 nagiosxi]# grep WKENAHPRESP236.Healthone.org /usr/local/nagios/var/perfdata.log
2013-06-29 22:37:59 [23517] [0] *** Timeout while processing Host: "WKENAHPRESP236.Healthone.org" Service: "NRPE__Event_ID_1008_Status"
Re: Gaps in Performance graphs
Yeah. It probably dropped that perfdata because of the timeouts. Increasing the timeout should keep this from happening in the future. Have you done so yet?
Re: Gaps in Performance graphs
No, it is currently set to 5 secs. Should we go to 10?
slansing
Re: Gaps in Performance graphs
10 may not be enough; you could try 15 or 20 to start with.
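One way to judge whether a given value is enough is to tally the timeout lines in perfdata.log before and after the change. The sketch below assumes the line format shown in the tail output earlier in this thread (`*** Timeout while processing Host: "..." Service: "..."`); `count_timeouts` is just an illustrative name, and the default log path is the one used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: count "Timeout while processing" lines per host/service,
# most frequent first. Assumes the perfdata.log line format quoted in this thread.
count_timeouts() {
    log="${1:-/usr/local/nagios/var/perfdata.log}"

    grep 'Timeout while processing Host:' "$log" \
        | sed -E 's/.*Host: "([^"]*)" Service: "([^"]*)".*/\1 \/ \2/' \
        | sort \
        | uniq -c \
        | sort -rn
}

# Example:
# count_timeouts /usr/local/nagios/var/perfdata.log
```

If the same hosts and services keep showing up after the restart, the timeout is probably still too low or those checks are returning unusually large perfdata.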