Gap in perf data : fork my_system_r failed

brdr · Post by **brdr** » Wed Dec 09, 2015 12:31 pm

Hi guys,

We have XI 2014R.7.

Last night and into this morning we had a gap in all our performance data in our PRODUCTION environment.

Nagios.log (indicating issue)
[Tue Dec 8 17:20:18 2015] Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/1449613218.perfdata.service"

NPCD.log (indicates gap in perf data)
[12-08-2015 22:48:25] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1449632884.perfdata.service'
[12-09-2015 11:08:15] NPCD: ERROR: Executed command exits with return code '1'

There is no system error in /var/log/messages. Our ulimit is below. Any idea why this issue would occur? And why it would suddenly clear up at that time? Thx.

# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 95125
max locked memory (kbytes, -l) 128
max memory size (kbytes, -m) unlimited
open files (-n) 4096
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 20480
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
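
fork() generally fails either because memory is short or because the per-user process limit ("max user processes" above) has been reached. As a rough sketch for ruling out the second case, the snippet below sums the thread counts of every process owned by a given UID on Linux, so the result can be compared against that limit. The account name `nagios` and the helper name `count_user_tasks` are assumptions for illustration; the limit that matters is the one in effect for the Nagios daemon itself, which may differ from the interactive shell's `ulimit -a`.

```shell
# Sum the thread counts of all processes whose real UID equals $1.
# Reads /proc/<pid>/status, so this is Linux-specific.
count_user_tasks() {
  cat /proc/[0-9]*/status 2>/dev/null | awk -v uid="$1" '
    /^Uid:/     { mine = ($2 == uid) }      # second field is the real UID
    /^Threads:/ { if (mine) total += $2 }   # threads count toward the limit
    END         { print total + 0 }'
}

# "nagios" is an assumed account name; adjust to the user running the daemon.
if uid=$(id -u nagios 2>/dev/null); then
  echo "nagios tasks: $(count_user_tasks "$uid")"
fi
```

A value close to 4096 around the time of the warning would point at the process limit rather than at the filesystem.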

rkennedy · Post by **rkennedy** » Wed Dec 09, 2015 12:47 pm

How many hosts / services are you currently checking?

Additionally, what is the result of top|head -5?

brdr · Post by **brdr** » Wed Dec 09, 2015 12:56 pm

948 hosts, 6679 services. XI server + 2 mod_gearman servers in this environment.

top - 12:54:37 up 6 days, 13:52, 2 users, load average: 4.40, 4.11, 3.72
Tasks: 329 total, 2 running, 326 sleeping, 0 stopped, 1 zombie
Cpu(s): 27.5%us, 4.9%sy, 0.0%ni, 67.0%id, 0.3%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 12197752k total, 8215544k used, 3982208k free, 106536k buffers
Swap: 2064380k total, 45088k used, 2019292k free, 2584568k cached
[root@bed-600-124 var]#

jolson · Post by **jolson** » Wed Dec 09, 2015 2:46 pm

Is there any stacked up perfdata? It seems like your 'mv' command failed. This could be due to an abundance of stacked up performance data.

ls -l /usr/local/nagios/var/spool/perfdata | wc -l
ls -l /usr/local/nagios/var/spool/xidpe | wc -l
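
Note that `ls -l | wc -l` also counts the leading "total" line, so the result is one higher than the number of entries. A minimal alternative sketch using `find` is shown here; the helper name `count_spool_files` is hypothetical, and the two paths are the spool directories from the commands above.

```shell
# Count regular files directly inside a directory (no recursion).
count_spool_files() {
  find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Report the backlog in each spool directory, skipping any that do not exist.
for dir in /usr/local/nagios/var/spool/perfdata /usr/local/nagios/var/spool/xidpe; do
  if [ -d "$dir" ]; then
    echo "$dir: $(count_spool_files "$dir") files"
  fi
done
```

A count that keeps growing between runs suggests the perfdata files are not being processed as fast as they arrive.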

How many CPUs are in your system? Free memory? Disk space?

lscpu
free -m
df -h

brdr · Post by **brdr** » Wed Dec 09, 2015 3:21 pm

I may have seen the xidpe directory fill up at the inode level but still indicate plenty of disk space. Could that be an issue?

lscpu (excerpt):
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4

free -m:
total used free shared buffers cached
Mem: 11911 9200 2711 13 146 2646
-/+ buffers/cache: 6407 5503
Swap: 2015 44 1971

df -h:
Filesystem                       Size  Used  Avail  Use%  Mounted on
/dev/mapper/VolGroup00-LogVol00  3.9G  2.5G  1.2G   69%   /
tmpfs                            5.9G  0     5.9G   0%    /dev/shm
/dev/sda1                        92M   72M   16M    83%   /boot
/dev/mapper/VolGroup00-LogVol05  2.9G  616M  2.2G   22%   /home
/dev/mapper/VolGroup00-LogVol02  129G  81G   42G    67%   /usr
/dev/mapper/VolGroup00-LogVol03  5.8G  4.0G  1.6G   73%   /var

Post by **Box293** » Wed Dec 09, 2015 5:59 pm

brdr wrote: I may have seen the xidpe directory fill up at the inode level but still indicate plenty of disk space. Could that be an issue?

Yes, running out of inodes is a big problem.

You will need to add more space to that disk, which will increase the number of available inodes.

Brings back memories of MS-DOS 6 and partitioning the 250MB hard disk into smaller partitions to prevent wasted space
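
Since `df -h` reports only block usage, inode exhaustion does not show up there; `df -i` reports inodes instead. The following is a small sketch, assuming the POSIX-style output of `df -Pi` (one line per filesystem, with IUse% in the fifth column). The helper name `parse_inode_pct` is hypothetical, and the path is the xidpe spool directory mentioned in this thread.

```shell
# Read `df -Pi` output on stdin and print the IUse% value of the first
# filesystem line, without the trailing percent sign.
parse_inode_pct() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Inode usage of the filesystem that holds the spool directory.
spool=/usr/local/nagios/var/spool/xidpe
if [ -d "$spool" ]; then
  echo "inode usage: $(df -Pi "$spool" | parse_inode_pct)%"
fi
```

A value at or near 100 means no new files can be created on that filesystem even though free space remains.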

brdr · Post by **brdr** » Thu Dec 10, 2015 8:44 am

I know, right!

Thanks for the info. Before adding space to the partition I want to see whether maxing out the inodes is the real problem. We had some strange behaviour last week too (Nagios XI hung), so I hope this is related.

I found a good plugin to check inodes on the XI server and added a check for the partition.

Here is the plugin in case anyone is interested:
https://exchange.nagios.org/directory/P ... es/details
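
For readers who cannot reach the link, the general shape of such a check can be sketched as follows. This is not the plugin linked above, only a minimal illustration that maps an inode usage percentage to the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name `inode_status` and the thresholds used in the usage note are assumptions.

```shell
# Map an inode usage percentage to a Nagios plugin state.
# Usage: inode_status PCT WARN CRIT
inode_status() {
  pct=$1 warn=$2 crit=$3
  # Anything that is not a plain non-negative integer is reported as UNKNOWN.
  case $pct in
    ''|*[!0-9]*) echo "UNKNOWN - could not read inode usage"; return 3 ;;
  esac
  if [ "$pct" -ge "$crit" ]; then
    echo "CRITICAL - inode usage ${pct}%"; return 2
  elif [ "$pct" -ge "$warn" ]; then
    echo "WARNING - inode usage ${pct}%"; return 1
  fi
  echo "OK - inode usage ${pct}%"; return 0
}
```

A wrapper script could feed it the fifth column of `df -Pi` for the partition in question, for example `inode_status "$pct" 80 90`, and exit with the function's return code.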

Thanks. Please close.

rkennedy · Post by **rkennedy** » Thu Dec 10, 2015 11:08 am

I will now close this thread out. Feel free to open a new one if you need more assistance in the future.

Nagios Support Forum
