service-perfdata.out got increasing huge size

Locked
grayloglearn
Posts: 222
Joined: Thu Jul 06, 2017 8:55 am

service-perfdata.out got increasing huge size

Post by grayloglearn »

Hi team,

We have an issue with the service-perfdata.out file: it is using too much disk space. Could you please suggest how to reduce it?
PNP4Nagios is also configured on our Nagios Core server.

[root@1 nagios]# cd var/
[root@ var]# ls
archives nagios.configtest nagios.log objects.precache rw spool
host-perfdata.out nagios.lock objects.cache retention.dat service-perfdata.out status.dat
[root@1 var]# du -sh *
0 archives
35G host-perfdata.out
4.0K nagios.configtest
4.0K nagios.lock
421M nagios.log
2.7M objects.cache
2.7M objects.precache
4.2M retention.dat
40K rw
598G service-perfdata.out
8.0K spool
4.2M status.dat
[root@1 var]# pwd
/usr/local/nagios/var
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: service-perfdata.out got increasing huge size

Post by benjaminsmith »

Hello @grayloglearn,

How long has this been happening and what is the status of npcd?

Code:

systemctl status npcd.service
# or
service npcd status

If it's not running, you can restart it, but with that many files backed up you'll need to remove them; otherwise the server will not be able to process them and keep up with incoming files.

It's also possible that the load on the server is too high, hitting the max threshold settings causing the files to spool up.
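
To get a feel for how far behind npcd is, one could count the files waiting in its spool directory. This is only a minimal sketch: the helper name count_spool_files is hypothetical, and the spool path in the comment is an assumption that should be checked against the local PNP4Nagios install.

```shell
#!/bin/sh
# count_spool_files DIR: print how many regular files are waiting in DIR.
# Hypothetical helper; DIR is assumed to be the NPCD spool directory.
count_spool_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Example (path is an assumption, adjust to your install):
# count_spool_files /usr/local/pnp4nagios/var/spool
```

A count that keeps climbing between runs would suggest npcd is not keeping up.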
grayloglearn
Posts: 222
Joined: Thu Jul 06, 2017 8:55 am

Re: service-perfdata.out got increasing huge size

Post by grayloglearn »

Hi team,
We have restarted the npcd service, but we still see the same usage.

[root@XXXX var]# tail +n 50 service-perfdata.out
tail: cannot open ‘+n’ for reading: No such file or directory
tail: cannot open ‘50’ for reading: No such file or directory
==> service-perfdata.out <==

service-perfdata.out is now 599G. We would like to know how to trim the data in this file. When we look at the first 50 lines of service-perfdata.out we see only today's entries, so we do not understand how the file became so large.
1568080719 app_fraepkdb fs_/oracle/stage OK 1 HARD 0.000 0.740 OK - 0.0% used (0.00 of 9.8 GB), (levels at 80.0/90.0%), trend: 0.00B / 24 hours /oracle/stage=1.8515625MB;8000;9000;0;10000.0
1568080719 app_fraepkdb lparstat OK 1 HARD 0.000 0.742 OK: AIX lparstat, user=2.6% sys=2.2% wait=0.1% idle=95.1% physc=0.02 app=1246867123 user=2.6%;;;; sys=2.2%;;;; wait=0.1%;;;; idle=95.1%;;;; physc=0.02;;;; entc=10.5%;;;; lbusy=0.4;;;; app=1246867123;;;;
1568080719 de-ffm-iproxy02 Number of threads OK 1 HARD 0.000 0.727 OK - 224 threads threads=224;2000.0;4000.0;0;
1568080719 app_fraepkdb fs_/tmp OK 1 HARD 0.000 0.740 OK - 25.3% used (2.03 of 8.0 GB), (levels at 80.0/90.0%), trend: +907.81B / 24 hours /tmp=2074.5703125MB;6553;7372;0;8192.0
1568080719 app_fraepkdb fs_/var OK 1 HARD 0.000 0.742 OK - 70.2% used (0.88 of 1.2 GB), (levels at 80.0/90.0%), trend: +115.89KB / 24 hours /var=898.44921875MB;1024;1152;0;1280.0
1568080719 app_fraepkdb Check_MK OK 1 HARD 0.093 0.000 OK - Agent version 1.1.10, execution time 0.1 sec execution_time=0.064
1568080719 htpmsdbtest session-usage OK 1 HARD 0.136 0.000 OK - 11.69% of session resources usedsession_usage=11.69%;80;100
1568080719 app_fraepkdb fs_/usr OK 1 HARD 0.000 0.742 OK - 79.7% used (5.53 of 6.9 GB), (levels at 80.0/90.0%), trend: +544.53KB / 24 hours /usr=5661.95703125MB;5683;6393;0;7104.0
1568080719 de-ffm-iproxy02 fs_/usr OK 1 HARD 0.000 0.732 OK - 25.9% used (2.52 of 9.7 GB), (levels at 80.0/90.0%), trend: +5.08B / 24 hours /usr=2581.40625MB;7961;8956;0;9951.3046875
1568080719 PROVISDBPROD process-usage OK 1 HARD 0.140 0.000 OK - 49.33% of process resources usedprocess_usage=49.33%;80;100


Please suggest how we can resolve this.
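
Judging by the error messages, tail treated "+n" and "50" as file names, so after the two errors it printed only its default last 10 lines of service-perfdata.out. That would explain why only today's entries show up. A small sketch for looking at both ends of the file instead; the helper name show_ends is hypothetical.

```shell
#!/bin/sh
# show_ends FILE [N]: print the first N and the last N lines of FILE.
# N defaults to 5. Hypothetical helper name.
show_ends() {
    file=$1
    n=${2:-5}
    echo "== first $n lines =="
    head -n "$n" "$file"
    echo "== last $n lines =="
    tail -n "$n" "$file"
}

# Example: show_ends /usr/local/nagios/var/service-perfdata.out 50
```

The first field of the first line is the oldest timestamp in the file.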
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: service-perfdata.out got increasing huge size

Post by scottwilkerson »

You can truncate it by running

Code:

cat /dev/null > service-perfdata.out

But finding out why it isn't getting reaped by your PNP installation is a different matter.
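
The redirect empties the file in place, so its name, owner and permissions stay the same and later appends still land in it. A minimal sketch that can be tried on a scratch file first; truncate_in_place is a hypothetical helper name.

```shell
#!/bin/sh
# truncate_in_place FILE: empty FILE without removing it and report
# its size before and after. Hypothetical helper name.
truncate_in_place() {
    before=$(wc -c < "$1" | tr -d ' ')
    : > "$1"            # same effect as: cat /dev/null > "$1"
    after=$(wc -c < "$1" | tr -d ' ')
    echo "$before -> $after bytes"
}

# Example: truncate_in_place /usr/local/nagios/var/service-perfdata.out
```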

Can you show the output of

Code:

grep perfdata /usr/local/nagios/etc/nagios.cfg
ls -al /usr/local/nagios/var|grep perfdata

Do you get graphs in PNP4Nagios? Is it set up correctly to read these files?

Have you considered Nagios XI, which has graphing already set up properly?
https://www.nagios.com/products/nagios-xi/
Former Nagios employee
Creator:
ahumandesign.com
enneagrams.com
grayloglearn
Posts: 222
Joined: Thu Jul 06, 2017 8:55 am

Re: service-perfdata.out got increasing huge size

Post by grayloglearn »

Hi Team,

Thanks for the reply. We need some help: we have observed that the file contains data from August 2016. Could you please suggest a command to remove the data from August 2016 to December 2016?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: service-perfdata.out got increasing huge size

Post by scottwilkerson »

I don't even know if this file is being used by any system. With it being 598G I doubt it, but if you want to keep just the last xxxx lines you could run something like this:

Code:

tail -n xxxx service-perfdata.out > service-perfdata.out_new
mv service-perfdata.out_new service-perfdata.out
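
One thing to watch with this approach: if it is run as root, the replacement file would be owned by root, which might stop the nagios user from appending to it. A hedged sketch of a variant that copies the kept lines back into the original file so owner and permissions are untouched; keep_last_lines is a hypothetical helper name.

```shell
#!/bin/sh
# keep_last_lines FILE N: keep only the last N lines of FILE.
# Hypothetical helper. The kept lines go to a temp file next to FILE,
# then are copied back over FILE so its owner and mode do not change.
keep_last_lines() {
    file=$1
    n=$2
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    tail -n "$n" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example: keep_last_lines service-perfdata.out 100000
```

Lines appended while this runs can be lost, and the temp file needs enough free space for the kept lines, so it is probably best run with Nagios stopped.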
grayloglearn
Posts: 222
Joined: Thu Jul 06, 2017 8:55 am

Re: service-perfdata.out got increasing huge size

Post by grayloglearn »

I just want to give some details; I hope they will help you figure out the problem.

# 'process-host-perfdata' command definition
define command{
command_name process-host-perfdata
command_line /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /usr/local/nagios/var/host-perfdata.out
}


# 'process-service-perfdata' command definition
define command{
command_name process-service-perfdata
command_line /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /usr/local/nagios/var/service-perfdata.out
}

#
# Bulk with NPCD mode
#
define command {
command_name process-service-perfdata-file
command_line /bin/mv /usr/local/pnp4nagios/var/service-perfdata /usr/local/pnp4nagios/var/spool/service-perfdata.$TIMET$
}
define command {
command_name process-host-perfdata-file
command_line /bin/mv /usr/local/pnp4nagios/var/host-perfdata /usr/local/pnp4nagios/var/spool/host-perfdata.$TIMET$
}


The file highlighted in red/bold (service-perfdata.out) is now 600GB. We don't know how to resolve this.
We plan to delete the 2016 data from service-perfdata.out and then check how it behaves. For this I need a command to remove the data from January 2016 to December 2016. Could you please help with such a command? We have tried but could not find the right one.

You asked for some details that we had not provided; we are providing them now, please check.


[root@XXXXX ~]# grep perfdata /usr/local/nagios/etc/nagios.cfg
perfdata_timeout=5
# host_perfdata_command (defined below) and service performance
# data will be processed using the service_perfdata_command (also
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
#host_perfdata_file=/usr/local/nagios/var/host-perfdata
#service_perfdata_file=/usr/local/nagios/var/service-perfdata
#host_perfdata_file_template=[HOSTPERFDATA]\t$TIMET$\t$HOSTNAME$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$
#service_perfdata_file_template=[SERVICEPERFDATA]\t$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$
#host_perfdata_file_mode=a
#service_perfdata_file_mode=a
#host_perfdata_file_processing_interval=0
#service_perfdata_file_processing_interval=0
#host_perfdata_file_processing_command=process-host-perfdata-file
#service_perfdata_file_processing_command=process-service-perfdata-file
# These options determine wether the core will process empty perfdata
# If you don't require empty perfdata - saving some cpu cycles
#host_perfdata_process_empty_results=1
#service_perfdata_process_empty_results=1
service_perfdata_file=/usr/local/pnp4nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file
host_perfdata_file=/usr/local/pnp4nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file


[root@XXXX~]# ls -al /usr/local/nagios/var|grep perfdata
-rw-r--r-- 1 nagios nagios 37084081895 Sep 18 09:12 host-perfdata.out
-rw-r--r-- 1 nagios nagios 647629058601 Sep 18 09:12 service-perfdata.out
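
Going by the output above, nagios.cfg has both the command-based settings (host_perfdata_command and service_perfdata_command, whose definitions append to the .out files with >>) and the file-based bulk settings for PNP4Nagios enabled at the same time. A quick sketch for listing only the active, uncommented command-based lines; the helper name active_perfdata_commands is hypothetical.

```shell
#!/bin/sh
# active_perfdata_commands CFG: list uncommented host_perfdata_command
# and service_perfdata_command lines in CFG. Hypothetical helper name.
active_perfdata_commands() {
    grep -E '^[[:space:]]*(host|service)_perfdata_command[[:space:]]*=' "$1"
}

# Example: active_perfdata_commands /usr/local/nagios/etc/nagios.cfg
```

If nothing else reads the .out files, commenting out those two lines and restarting Nagios might be one way to stop them growing, but that is an assumption to verify before changing anything.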
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: service-perfdata.out got increasing huge size

Post by scottwilkerson »

Do you have anything on your system that uses the files you are appending to? They are at:
/usr/local/nagios/var/host-perfdata.out
/usr/local/nagios/var/service-perfdata.out
grayloglearn
Posts: 222
Joined: Thu Jul 06, 2017 8:55 am

Re: service-perfdata.out got increasing huge size

Post by grayloglearn »

Actually, we were also surprised when we saw the service-perfdata.out file. We are really not sure why it keeps growing.
We were not aware that data was being appended to it, either.
But when we looked at the timestamps in the file, we could see that it contains data from 2016, so I decided to remove the 2016 data to free up some space.

Is there any chance you could give us a command to remove only the January 2016 to December 2016 data from service-perfdata.out?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: service-perfdata.out got increasing huge size

Post by scottwilkerson »

I don't have a command to do that, but you would somehow need to process the file and remove the lines where the first field is less than 1483189199.
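
One possible way to do that filtering is with awk, comparing the first field of each line against the cutoff. This is a sketch only: drop_older_than is a hypothetical helper name, it assumes every line starts with an epoch timestamp as in the sample lines earlier in the thread, and lines whose first field is not a number are dropped as well.

```shell
#!/bin/sh
# drop_older_than FILE CUTOFF: print only the lines of FILE whose first
# field (the epoch timestamp) is greater than or equal to CUTOFF.
# Hypothetical helper name; FILE itself is not modified.
drop_older_than() {
    awk -v cutoff="$2" '($1 + 0) >= (cutoff + 0)' "$1"
}

# Example, using the cutoff quoted above. It writes a new file, so the
# disk needs enough free space for everything that is kept:
# drop_older_than service-perfdata.out 1483189199 > service-perfdata.out_new
```

The result can be inspected before it replaces the original file.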