Re: performance data not kept for some services
Posted: Thu Apr 07, 2011 2:44 pm
by rdedon
I stand corrected on the spacing; it can be an issue in other cases, but not here.
Re: performance data not kept for some services
Posted: Thu Apr 07, 2011 2:48 pm
by tonyyarusso
How did you confirm that RRD files weren't being generated? ls?
Re: performance data not kept for some services
Posted: Thu Apr 07, 2011 4:31 pm
by lyle
I've always wondered how this all goes together, so I'm doing some PNP reading while I wait to see if your Team has any ideas.
Looking at the pnp config file, it looks like logging is set just below debug, so I'm going to look through the log file for any clues. And I might write a script to keep an eye on the spool directory to see if there's anything coming in from these services.
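A minimal sketch of such a watcher script, in Python. It assumes the spool directory is /usr/local/nagios/var/spool/perfdata (adjust for your install) and that polling once per second is fast enough to catch files before the spool processor removes them. The helper names `new_entries` and `watch` are made up for illustration.

```python
import os
import time

# Assumed spool location; change this to match your installation.
SPOOL_DIR = "/usr/local/nagios/var/spool/perfdata"


def new_entries(spool_dir, seen):
    """Return (sorted names not seen on the previous poll, current name set)."""
    current = set(os.listdir(spool_dir))
    fresh = sorted(current - seen)
    return fresh, current


def watch(spool_dir=SPOOL_DIR, interval=1.0):
    """Poll the spool directory forever and print each new file's contents."""
    seen = set()
    while True:
        fresh, seen = new_entries(spool_dir, seen)
        for name in fresh:
            path = os.path.join(spool_dir, name)
            try:
                with open(path, errors="replace") as fh:
                    print(f"--- {name}")
                    print(fh.read())
            except OSError:
                # The file can disappear between listing and opening.
                print(f"--- {name} (gone before it could be read)")
        time.sleep(interval)
```

Calling `watch()` from a shell on the monitoring server and leaving it running through a check interval should show whether anything for the affected services ever lands in the spool.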
Maybe the data isn't even making it to PNP, but I'm not sure what to look for in the Core area.
Thanks again for any advice....Lyle
Re: performance data not kept for some services
Posted: Thu Apr 07, 2011 4:47 pm
by lyle
I've never understood how Core, PNP, and RRD all work together, so I'm doing a little reading on PNP while I wait to see if your Team has any ideas.
Waiting for something to show up in PNP's spool directory, I stumbled across this:
[sysops@asb-sac-ngs-002 perfdata]$ ls -l
total 0
[sysops@asb-sac-ngs-002 perfdata]$ ls -l
total 12
-rw-r--r-- 1 nagios users 0 Apr 7 14:38 host-perfdata.1302212333
-rw-r--r-- 1 nagios users 373 Apr 7 14:38 service-perfdata.1302212333
[sysops@asb-sac-ngs-002 perfdata]$ cat serv*
DATATYPE::SERVICEPERFDATA TIMET::1302212328 HOSTNAME::pat-sac-cpa-001 SERVICEDESC::Memory/Windows SERVICEPERFDATA::physical memory %=25%;80;90; physical memory=1.03G;3.;3.;0;4; SERVICECHECKCOMMAND::check_nrpe_mem HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD SERVICEOUTPUT::OK: physical memory: Total: 4G - Used: 1.03G (25%) - Free: 2.97G (75%)
Looks to me like the performance data is making it to the spool directory. The quotes don't seem to survive, and maybe that's a problem for PNP.
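To make a spool line like that easier to read, here is a rough Python sketch that splits it into its KEY::value fields. It assumes every key is an uppercase word followed by "::" and that no value contains such a pattern; the function name `parse_spool_line` is hypothetical and not part of PNP.

```python
import re

# Sample line copied from the spool file shown above.
SAMPLE = (
    "DATATYPE::SERVICEPERFDATA TIMET::1302212328 HOSTNAME::pat-sac-cpa-001 "
    "SERVICEDESC::Memory/Windows SERVICEPERFDATA::physical memory %=25%;80;90; "
    "physical memory=1.03G;3.;3.;0;4; SERVICECHECKCOMMAND::check_nrpe_mem "
    "HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD "
    "SERVICEOUTPUT::OK: physical memory: Total: 4G - Used: 1.03G (25%) - Free: 2.97G (75%)"
)


def parse_spool_line(line):
    """Split a spool line into a dict keyed by the uppercase field names."""
    # re.split with a capture group yields: [prefix, key1, value1, key2, value2, ...]
    parts = re.split(r"\s*\b([A-Z]+)::", line.strip())
    return dict(zip(parts[1::2], parts[2::2]))


record = parse_spool_line(SAMPLE)
print(record["HOSTNAME"], "/", record["SERVICEDESC"])
print(record["SERVICEPERFDATA"])
```

Printing only the SERVICEPERFDATA field makes it easier to see exactly what string is handed on for processing, including the unquoted labels.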
The logfile for PNP doesn't give any info, just that the files were processed. I guess I could increase the logging level to DEBUG:
[04-07-2011 14:39:02] NPCD: ThreadCounter 0/5 File is host-perfdata.1302212333
[04-07-2011 14:39:02] NPCD: Regular File: host-perfdata.1302212333
[04-07-2011 14:39:02] NPCD: A thread was started on thread_counter = 0
[04-07-2011 14:39:02] NPCD: DEBUG: load 0.370000/10.000000
[04-07-2011 14:39:02] NPCD: ThreadCounter 1/5 File is service-perfdata.1302212333
[04-07-2011 14:39:02] NPCD: Regular File: service-perfdata.1302212333
[04-07-2011 14:39:02] NPCD: A thread was started on thread_counter = 1
[04-07-2011 14:39:02] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[04-07-2011 14:39:02] NPCD: Processing file host-perfdata.1302212333 with ID 1087129920 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//host-perfdata.1302212333
[04-07-2011 14:39:02] NPCD: Processing file 'host-perfdata.1302212333'
[04-07-2011 14:39:02] NPCD: Processing file service-perfdata.1302212333 with ID 1097619776 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1302212333
[04-07-2011 14:39:02] NPCD: Processing file 'service-perfdata.1302212333'
[04-07-2011 14:39:03] NPCD: No more files to process... waiting for 15 seconds
Re: performance data not kept for some services
Posted: Thu Apr 07, 2011 7:07 pm
by lyle
Hi, Tony:
Sorry, I didn't see your reply asking how I know there are no RRD files. Yes, I just ran "ls -l" on the directory. Here's an example:
# pwd
/usr/local/nagios/share/perfdata/pat-sac-cpa-001
# ls -l
total 1928
-rw-rw-rw- 1 nagios nagios 384960 Apr 7 17:01 cpu_Windows.rrd
-rw-rw-rw- 1 nagios nagios 1573 Apr 7 17:01 cpu_Windows.xml
-rw-rw-rw- 1 nagios nagios 768232 Apr 7 17:02 _HOST_.rrd
-rw-rw-rw- 1 nagios nagios 1934 Apr 7 17:02 _HOST_.xml
-rw-rw-rw- 1 nagios nagios 768232 Apr 6 13:34 Ping.rrd
-rw-rw-rw- 1 nagios nagios 2121 Apr 6 13:34 Ping.xml
BTW, I set the npcd log level to "-1" and got no more information than when it was at "2"; just that the files had been processed.
Thanks....Lyle
Re: performance data not kept for some services
Posted: Fri Apr 08, 2011 10:20 am
by rdedon
Just a heads up that we are still working on this.

Re: performance data not kept for some services
Posted: Fri Apr 08, 2011 11:18 am
by rdedon
Another heads up: we do have it working, but we are putting it through a few tests first.

Re: performance data not kept for some services
Posted: Fri Apr 08, 2011 11:35 am
by mguthrie
Ok, I think I got it. You may have found a PNP bug, because the perf data that you have "should" be working according to their documentation. I ran some syntax tests on your performance data, and I think I found what was causing the problem.
I removed the [min][max] values from the performance data, and the RRD files were generated.
Fixed Perfdata:
'physical memory %'=25%;80;90;;; 'physical memory'=1.0G;3.;3.;
Broken Perfdata:
'physical memory %'=25%;80;90;;; 'physical memory'=1.0G;3.;3.;0;4;
I tried several different possibilities to get those min/max values to work, but for whatever reason, they seem to be preventing the rrd files from being generated.
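For anyone who wants to reproduce the test, here is a rough Python sketch that drops the min/max fields from a perfdata string. It assumes the usual plugin layout of label=value;warn;crit;min;max and that labels containing spaces are single-quoted. The helper `strip_min_max` is only an illustration, not something shipped with PNP or XI, and it drops the trailing empty semicolons rather than keeping them as in the hand-edited string above.

```python
import re

# One perfdata item: a quoted or unquoted label, "=", then the
# semicolon-separated fields up to the next whitespace.
PERF_ITEM = re.compile(r"('(?:[^']|'')*'|[^\s=']+)=(\S*)")


def strip_min_max(perfdata):
    """Keep only value, warn and crit for every item in a perfdata string."""
    items = []
    for label, rest in PERF_ITEM.findall(perfdata):
        fields = rest.split(";")
        kept = fields[:3]  # value, warn, crit; min and max are discarded
        items.append(label + "=" + ";".join(kept))
    return " ".join(items)


broken = "'physical memory %'=25%;80;90;;; 'physical memory'=1.0G;3.;3.;0;4;"
print(strip_min_max(broken))
```

This only helps to confirm that min/max is the trigger; it does not explain why those values stop the RRD files from being created.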
Re: performance data not kept for some services
Posted: Fri Apr 08, 2011 4:07 pm
by lyle
Lucky me. 8-|
I don't have any control over those min/max parameters being returned by NSClient++, right?
I see you're distributing XI with version 0.4.14 of PNP. I wonder if 0.6 would fix this? I'm guessing that bug reports for 0.4 won't get much attention.
I need to get this working. I've posted the question on the NSClient++ forum, and I guess I should post on the PNP forum too.
Thanks for taking a look at this, and please let me know if the Team has any other thoughts..Lyle
Re: performance data not kept for some services
Posted: Fri Apr 08, 2011 4:57 pm
by mguthrie
Actually the min/max specifications are probably being passed as parameters somewhere in the command definition, either the XI server's command definition, or on NSClient. If you could send the check command definition, the service definition, and the NSClient's command definition we can probably help you find it.