pnp4nagios graphs randomly stop updating

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
Locked
agentdavidson
Posts: 9
Joined: Mon Mar 13, 2017 9:50 pm


Post by agentdavidson »

Hello -
I have a Nagios Core (4.3.4) + pnp4nagios (0.6.25) setup running on a RHEL 7.3 VM, and every now and then ALL the pnp4nagios graphs stop updating. It's pretty random, say once every two weeks, on a random day/time. Restarting Nagios [service nagios stop | start] fixes it.

All other Nagios functions seem to be running fine when this happens.

I've set up a Nagios check to detect when it happens, using a file age check on /usr/local/pnp4nagios/var/perfdata/localhost/_HOST_.xml, which should get updated every check cycle (60 sec in this case). When this file hasn't been updated for 5 mins the check raises a CRITICAL alert, and I'll find the pnp4nagios graphs aren't updating.
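The staleness test described above can be sketched as a tiny plugin-style script. This is a minimal illustration, not the check actually in use: the names XML_PATH, check_file_age and main are hypothetical, and the 300-second default simply mirrors the 5-minute limit mentioned in the post. It uses the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical default: the file named in the post above.
XML_PATH = "/usr/local/pnp4nagios/var/perfdata/localhost/_HOST_.xml"


def check_file_age(path, max_age=300, now=None):
    """Return (exit_code, message) for a simple file-age check.

    CRITICAL if `path` was last modified more than `max_age` seconds ago,
    UNKNOWN if it cannot be stat'ed, OK otherwise.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN - cannot stat {path}: {exc.strerror}"
    age = int(now - mtime)
    if age > max_age:
        return CRITICAL, f"CRITICAL - {path} is {age}s old (limit {max_age}s)"
    return OK, f"OK - {path} is {age}s old"


def main(path=XML_PATH, max_age=300):
    """Print the one-line status and return the exit code, plugin-style."""
    code, message = check_file_age(path, max_age)
    print(message)
    return code
```

As a plugin entry point you would end the script with `raise SystemExit(main())`, so Nagios sees the exit code.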

Any suggestions on where to go from here? Is it Nagios or pnp4nagios that updates the .xml files?

Regards
Matt
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: pnp4nagios graphs randomly stop updating

Post by mcapra »

Here's the official documentation regarding performance data:
https://assets.nagios.com/downloads/nag ... fdata.html
agentdavidson wrote: Is it Nagios or php4nagios that updates the .xml files?
Nagios writes performance data to the file specified by the service_perfdata_file and host_perfdata_file directives in your main nagios.cfg file. The format in which Nagios writes data to those files is specified by the host_perfdata_file_template and service_perfdata_file_template directives in that same file.
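With the templates the pnp4nagios documentation suggests for bulk mode, each line Nagios appends is a set of tab-separated KEY::VALUE fields. The exact keys (DATATYPE, TIMET, HOSTNAME, ...) are an assumption here; confirm them against the host_perfdata_file_template and service_perfdata_file_template in your own nagios.cfg. A minimal reader for one such line might look like this:

```python
def parse_bulk_line(line):
    """Split one bulk-mode perfdata line into a {KEY: value} dict.

    Assumes the tab-separated KEY::VALUE layout of the pnp4nagios bulk-mode
    templates (e.g. "DATATYPE::HOSTPERFDATA<TAB>TIMET::..."). Fields without
    "::" are ignored; only the first "::" in a field splits key from value.
    """
    record = {}
    for field in line.rstrip("\n").split("\t"):
        key, sep, value = field.partition("::")
        if sep:
            record[key] = value
    return record
```

Only the first "::" is used as the separator, so perfdata values containing "=" or ";" pass through untouched.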

pnp4nagios consumes that file and stuffs the data into RRDs (which are tied to the XML files you mentioned). Those RRDs are what pnp4nagios reads from when it generates graphs.
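To make the perfdata-to-RRD step concrete: the perfdata.log excerpt later in this thread shows the host perfdata "rta=21.936001ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0" ending up as the RRD update "1519654605:21.936001:0". The sketch below is a simplified Python re-creation of that mapping, not pnp4nagios's actual Perl code, which handles many more cases (quoted labels with spaces, for one). The function names are hypothetical.

```python
import re

# One perfdata item: label=value[UOM][;warn[;crit[;min[;max]]]].
# Simplified: quoted labels containing spaces are not handled.
PERF_ITEM = re.compile(r"([^=\s]+)=(-?[0-9.]+)([^;\s]*)(?:;[^;\s]*){0,4}")


def parse_perfdata(perfdata):
    """Return (label, value_text, uom) tuples; value text is kept verbatim."""
    return [m.groups() for m in PERF_ITEM.finditer(perfdata)]


def rrd_update_arg(timet, perfdata):
    """Build an RRD update argument "TIMET:v1:v2:..." in perfdata order."""
    values = [value for _label, value, _uom in parse_perfdata(perfdata)]
    return ":".join([str(timet)] + values)
```

Keeping the value as the original text (rather than reformatting a float) is what lets the output match the logged update string exactly.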

You might bump up the logging level for pnp4nagios's process_perfdata.pl script and keep an eye on its logfile. Both the log level and the logfile location are set in process_perfdata.cfg; where that config file lives depends on how you set up pnp4nagios.

If you're using NPCD, you can bump up some logging for that as well.
Former Nagios employee
https://www.mcapra.com/
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: pnp4nagios graphs randomly stop updating

Post by tmcdonald »

Thanks for the assist, @mcapra!
Former Nagios employee
agentdavidson

Re: pnp4nagios graphs randomly stop updating

Post by agentdavidson »

Thanks for the info. I've found that when the graphs stop updating, /usr/local/pnp4nagios/var/perfdata.log stops updating too.

For example, this morning the graphs stopped updating around 3:23am and the log file stops at the same time.

[root@zpredvmnet1:/usr/local/pnp4nagios/var] tail perfdata.log
2018-02-27 03:23:33 [30855] [2] /usr/local/pnp4nagios/var/perfdata/r0020031/_HOST_.rrd updated
2018-02-27 03:23:33 [30855] [2] Processing Line 4437
2018-02-27 03:23:33 [30855] [2] Datatype set to 'HOSTPERFDATA'
2018-02-27 03:23:33 [30855] [1] Found Performance Data for r0012001 / _HOST_ (rta=21.936001ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0)
2018-02-27 03:23:33 [30855] [2] data2rrd called
2018-02-27 03:23:33 [30855] [2] RRDs::update /usr/local/pnp4nagios/var/perfdata/r0012001/_HOST_.rrd 1519654605:21.936001:0
2018-02-27 03:23:33 [30855] [2] /usr/local/pnp4nagios/var/perfdata/r0012001/_HOST_.rrd updated
2018-02-27 03:23:33 [30855] [2] Processing Line 4438
2018-02-27 03:23:33 [30855] [2] Datatype set to 'HOSTPERFDATA'
2018-02-27 03:23:33 [30855] [1] Found Performance Data for r0052501 / _HOST_ (rta=16.988001ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0)

It's like Nagios just stops calling the pnp4nagios script.

I'll bump up the logging level as suggested and see what falls out. [EDIT: it seems I'm already at level 2 (debug)]
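Since perfdata.log halts at the same moment the graphs do, the log itself can serve as a second staleness signal alongside the _HOST_.xml check. A minimal sketch, assuming every line starts with the "YYYY-MM-DD HH:MM:SS" timestamp seen in the excerpt above and that the log level is high enough for lines to be written on every processing run; the function names and the 300-second default are hypothetical.

```python
from datetime import datetime

# perfdata.log lines begin with "YYYY-MM-DD HH:MM:SS" (19 characters).
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def last_log_time(lines):
    """Return the timestamp of the last line that starts with one, or None."""
    for line in reversed(lines):
        try:
            return datetime.strptime(line[:19], LOG_TS_FORMAT)
        except ValueError:
            continue  # blank or non-timestamped line
    return None


def log_is_stale(lines, now, max_gap_seconds=300):
    """True if no timestamped line is within `max_gap_seconds` of `now`."""
    last = last_log_time(lines)
    return last is None or (now - last).total_seconds() > max_gap_seconds
```

At debug level the log grows quickly, so it is cheaper to feed this only the last few lines of the file rather than reading it whole.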
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: pnp4nagios graphs randomly stop updating

Post by scottwilkerson »

I am not familiar with how that version of pnp works. Does it have config files like npcd.cfg? Does it run the npcd daemon? If so, there should be logs for that as well.

Can you share any of the configs?

I did some searching on the web and found some similar reports, but no solution. Did you ask the pnp4nagios creators?
Former Nagios employee
Creator:
ahumandesign.com
enneagrams.com
agentdavidson

Re: pnp4nagios graphs randomly stop updating

Post by agentdavidson »

I'm running PNP4Nagios in Bulk Mode (without NPCD or npcdmod), and the doco on the PNP4Nagios site (http://docs.pnp4nagios.org/_detail/bulk ... 6%3Aconfig) suggests that it is the Nagios process that calls process_perfdata.pl.

The behaviour I'm seeing looks a lot like Nagios just stops calling process_perfdata.pl.

The fact that a restart of Nagios kicks things back into life also seems to point to a Nagios issue, even though all other Nagios functionality (polling, alerting, perfdata, etc.) is working fine at the time.
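In bulk mode, as noted above, it is Nagios itself that hands the spooled files to process_perfdata.pl, via the *_processing_command it runs every *_processing_interval seconds. A quick way to confirm how that is wired on a given box is to pull just those directives out of nagios.cfg. A minimal sketch; the key list is assumed from the Nagios Core performance data documentation, so check it against your version, and the function name is hypothetical.

```python
# nagios.cfg directives that drive pnp4nagios bulk mode (assumed list).
PERFDATA_KEYS = {
    "process_performance_data",
    "host_perfdata_file",
    "host_perfdata_file_template",
    "host_perfdata_file_mode",
    "host_perfdata_file_processing_interval",
    "host_perfdata_file_processing_command",
    "service_perfdata_file",
    "service_perfdata_file_template",
    "service_perfdata_file_mode",
    "service_perfdata_file_processing_interval",
    "service_perfdata_file_processing_command",
}


def perfdata_settings(cfg_text):
    """Return the perfdata-related key=value pairs from nagios.cfg text.

    Lines starting with '#' are treated as comments; a later duplicate
    directive overrides an earlier one.
    """
    settings = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in PERFDATA_KEYS:
            settings[key] = value.strip()
    return settings
```

Passing it the contents of nagios.cfg shows at a glance which command Nagios runs and how often.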

I am still looking into this as time permits and will share any progress or findings.

Cheers.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: pnp4nagios graphs randomly stop updating

Post by tgriep »

Can you post the following files so we can check the settings?


process_perfdata.cfg
nagios.cfg
When the graphs stop, do you see any errors in the nagios.log file?
agentdavidson

Re: pnp4nagios graphs randomly stop updating

Post by agentdavidson »

I have some further information to hand. In /var/log/messages, the issue seems to be preceded by a growing number of the following errors:


Jun  5 19:19:37 myserver nagios: Warning: fork() in my_system_r() failed for command "/usr/local/pnp4nagios/libexec/process_perfdata.pl --bulk=/usr/local/pnp4nagios/var/host-perfdata" - errno: Cannot allocate memory
Jun  5 19:19:37 myserver nagios: Warning: fork() in my_system_r() failed for command "/usr/local/pnp4nagios/libexec/process_perfdata.pl --bulk=/usr/local/pnp4nagios/var/service-perfdata" - errno: Cannot allocate memory
They start popping up at irregular intervals of 5-15 mins, but by the time the graphs stop updating they are happening every 15 secs, which aligns with the service_perfdata_file_processing_interval and host_perfdata_file_processing_interval settings in nagios.cfg.

In other words, it degrades to the point where the fork fails on every processing run.
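The degradation described above (failures every 5-15 mins growing to every 15 secs) can be measured directly from /var/log/messages by timing the gaps between warnings. A minimal sketch with hypothetical function names; classic syslog timestamps like "Jun  5 19:19:37" carry no year, so the caller has to supply one.

```python
from datetime import datetime

# Substring taken from the Nagios warning quoted above.
FORK_FAILURE = "fork() in my_system_r() failed"


def fork_failure_times(lines, year):
    """Timestamps of Nagios fork() failure warnings in syslog-format lines."""
    times = []
    for line in lines:
        if FORK_FAILURE in line:
            stamp = " ".join(line.split()[:3])  # e.g. "Jun 5 19:19:37"
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def gaps_seconds(times):
    """Seconds between consecutive timestamps."""
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
```

Host and service failures tend to arrive in pairs with a zero gap; filtering on "host-perfdata" or "service-perfdata" first gives one series per processing command.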

The other interesting aspect is that, once the graphs have stopped updating, if I run the process_perfdata.pl commands manually the graphs get updated to the current time. It takes a good few minutes, but they eventually catch up. I can then stop | start Nagios and I'm not left with gaps in the graphs.
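The manual catch-up described above amounts to running, in sequence, the two commands Nagios failed to fork in the syslog warnings. A minimal sketch that defaults to a dry run; the paths are copied from those warnings, the function names are hypothetical, and running it as the same user Nagios runs as (so file ownership stays consistent) is an assumption worth checking on your system.

```python
import shlex
import subprocess

# Paths copied from the fork() warnings quoted above.
SCRIPT = "/usr/local/pnp4nagios/libexec/process_perfdata.pl"
BULK_FILES = (
    "/usr/local/pnp4nagios/var/host-perfdata",
    "/usr/local/pnp4nagios/var/service-perfdata",
)


def catch_up_commands(script=SCRIPT, bulk_files=BULK_FILES):
    """Build the argv lists Nagios would normally fork, one per bulk file."""
    return [[script, f"--bulk={path}"] for path in bulk_files]


def run_catch_up(dry_run=True):
    """Run each bulk command in turn, or just print it when dry_run is set."""
    for cmd in catch_up_commands():
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failure
```

Following the order in the post, you would run this with dry_run=False before the Nagios stop | start.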

I can post process_perfdata.cfg and nagios.cfg if that is required but they are pretty much straight out of the box.
scottwilkerson

Re: pnp4nagios graphs randomly stop updating

Post by scottwilkerson »

This looks like the system is either out of memory or the pnp4nagios you are using cannot allocate it properly.
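One way to test the out-of-memory explanation is to capture memory figures as soon as the fork() warnings start appearing. A minimal Linux-only sketch reading /proc/meminfo; the function names are hypothetical. MemAvailable, SwapFree, CommitLimit and Committed_AS are standard kernel fields, but note that CommitLimit is only enforced as a hard limit when vm.overcommit_memory is set to 2.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: first numeric value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])  # most fields are in kB
    return info


def memory_summary(info):
    """Pick out the figures most relevant to fork() ENOMEM failures (kB)."""
    headroom = None
    if "CommitLimit" in info and "Committed_AS" in info:
        headroom = info["CommitLimit"] - info["Committed_AS"]
    return {
        "MemAvailable": info.get("MemAvailable"),
        "SwapFree": info.get("SwapFree"),
        "commit_headroom": headroom,
    }


def snapshot(path="/proc/meminfo"):
    """Read the live file and summarise it."""
    with open(path) as f:
        return memory_summary(parse_meminfo(f.read()))
```

Logging snapshot() alongside the fork() warnings would show whether available memory or commit headroom is actually shrinking as the failures become more frequent.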

This would be a question for the pnp4nagios developers. We are not the developers for pnp4nagios.