Re: CRITICAL - popen timeout received, but no child process
Posted: Tue Mar 31, 2015 2:31 pm
by bosecorp
Done.
I still see the same error. It's getting better, but it's still there.
Re: CRITICAL - popen timeout received, but no child process
Posted: Tue Mar 31, 2015 3:03 pm
by jdalrymple
Code:
[jdalrymple@localhost ~]$ for file in {1..300000}; do touch $file; done
[jdalrymple@localhost ~]$ ls -l | wc -l
300001
[jdalrymple@localhost ~]$ find ./ -type f -exec rm {} \;
[jdalrymple@localhost ~]$ ls -l | wc -l
1
It will take a while, and it may be hard on your CPU, so you may want to do it off-hours. It should work fine, though.
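The demonstration above starts one `rm` process per file, which is largely why it is slow and CPU-heavy. A minimal sketch of a batched alternative, assuming bash and a throwaway directory created with `mktemp -d`; the file count and the helper names `count_files` and `bulk_delete` are illustrative, not from the thread. It also scopes `find` to an explicit directory, since `find ./ -type f` removes every regular file below wherever it is run, dotfiles included.

```shell
#!/usr/bin/env bash
# Minimal sketch: bulk-delete many files without starting one rm per file.
# Works in a throwaway directory from mktemp, never in $HOME or ./
set -eu

# Count regular files under a directory (tr strips padding some wc versions add).
count_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

# '-exec rm -f {} +' passes many paths to each rm invocation,
# instead of starting one rm process per file like '-exec rm {} \;'.
bulk_delete() {
  find "$1" -type f -exec rm -f {} +
}

demo_dir=$(mktemp -d)
for i in {1..3000}; do    # 3000 files rather than 300000 to keep the demo short
  : > "$demo_dir/$i"
done
echo "before: $(count_files "$demo_dir")"
bulk_delete "$demo_dir"
echo "after: $(count_files "$demo_dir")"
rmdir "$demo_dir"
```

Run as-is, it should print `before: 3000` and `after: 0`. The `{} +` form is specified by POSIX; GNU find also offers `-delete`, which avoids starting `rm` at all.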
Re: CRITICAL - popen timeout received, but no child process
Posted: Tue Mar 31, 2015 3:08 pm
by bosecorp
Done.
It got better, but the errors are still there.
Re: CRITICAL - popen timeout received, but no child process
Posted: Tue Mar 31, 2015 3:52 pm
by jdalrymple
Please post your uptime and df output:
Code:
[jdalrymple@localhost ~]$ uptime
15:51:57 up 6 days, 7:03, 1 user, load average: 0.00, 0.00, 0.02
[jdalrymple@localhost ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-lv_root
18G 3.1G 14G 19% /
tmpfs 491M 0 491M 0% /dev/shm
/dev/sda1 477M 28M 425M 6% /boot
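uptime and df answer two questions: is the box overloaded, and is any filesystem full? A minimal sketch that reports both, assuming a Linux server (it reads `/proc/loadavg`) with `getconf _NPROCESSORS_ONLN` available; the function names and the 90% default are illustrative choices, not from the thread. The load average is divided by the CPU count because a given load only means something relative to how many CPUs can run tasks. `df -P` is used because its POSIX output keeps each filesystem on one line, unlike the wrapped `/dev/mapper/centos-lv_root` line above.

```shell
#!/usr/bin/env bash
# Minimal sketch answering "is the box overloaded / is a disk full?".
# Assumes Linux (/proc/loadavg) and a getconf that knows _NPROCESSORS_ONLN.

# 1-minute load average divided by the number of online CPUs.
load_per_cpu() {
  local cpus load1
  cpus=$(getconf _NPROCESSORS_ONLN)
  read -r load1 _ < /proc/loadavg
  awk -v l="$load1" -v c="$cpus" 'BEGIN { printf "%.2f\n", l / c }'
}

# Mount points whose usage is at or above a threshold (default 90%).
# df -P keeps each filesystem on one line, even with long LVM device names.
full_filesystems() {
  local threshold=${1:-90}
  df -P | awk -v t="$threshold" '
    NR > 1 {
      use = $5; sub(/%/, "", use)
      if (use + 0 >= t) print $6, $5
    }'
}

echo "load per CPU: $(load_per_cpu)"
full_filesystems 90
```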
Re: CRITICAL - popen timeout received, but no child process
Posted: Tue Mar 31, 2015 4:09 pm
by bosecorp
Code:
root@nagmonus1:(03-31 17:09): /root
# uptime
17:09:09 up 4 days, 22:25, 3 users, load average: 3.45, 3.76, 3.88
root@nagmonus1:(03-31 17:09): /root
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rootvg-lvroot 2.0G 820M 1.1G 43% /
tmpfs 9.9G 0 9.9G 0% /dev/shm
/dev/sda1 243M 49M 181M 22% /boot
/dev/mapper/rootvg-lvopt 2.0G 92M 1.8G 5% /opt
/dev/mapper/rootvg-lvtmp 6.9G 75M 6.5G 2% /tmp
/dev/mapper/rootvg-lvusers 4.0G 137M 3.7G 4% /users
/dev/mapper/rootvg-lvusr 7.9G 4.1G 3.4G 55% /usr
/dev/mapper/rootvg-lvvar 15G 5.9G 8.2G 42% /var
/dev/mapper/vgapp-lvapp 49G 4.1G 42G 9% /app
/dev/mapper/vgapp-lvstore 69G 6.6G 59G 11% /store
/dev/mapper/vgapp-lvlocalnagios 128G 28G 95G 23% /usr/local/nagios
/dev/mapper/vgapp-lvmysql 69G 2.6G 63G 4% /var/lib/mysql
Re: CRITICAL - popen timeout received, but no child process
Posted: Wed Apr 01, 2015 1:38 pm
by ssax
For the hosts experiencing the problem, are you using check_icmp, check_ping, or something else?
Re: CRITICAL - popen timeout received, but no child process
Posted: Wed Apr 01, 2015 1:43 pm
by bosecorp
I think it's check_icmp.
Remember that I am using NRDS, NRDP, or whatever it's called, so the actual checks are done by the host.
But correct me if I'm wrong: the host check itself is done by Nagios, right?
These hosts are members of the xiwizard_passive_host template. When I look at that host template, it's associated with another host template called xiwizard_generic_host, which has the following command configured: "$USER1$/check_icmp -H $HOSTADDRESS$ -w $ARG1$,$ARG2$ -c $ARG3$,$ARG4$ -p 5 -t 30"
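Since the question is who actually executes that command, it may help to see how a template like this becomes a concrete command line. A simplified sketch, assuming only the macros that appear above: `$USER1$` (normally defined in resource.cfg), `$HOSTADDRESS$`, and `$ARG1$`..`$ARGn$`, which Nagios fills from the `!`-separated fields after the command name in a host's check_command. Real Nagios handles many more macros and escaping rules; the function name `expand_command` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of Nagios-style macro substitution for a host check command.
# Arguments in check_command are separated by '!': field 0 is the command
# name, fields 1..n replace $ARG1$..$ARGn$ in the command_line template.
expand_command() {
  local template=$1 check_command=$2 user1=$3 hostaddress=$4
  local result i
  local -a fields
  IFS='!' read -r -a fields <<< "$check_command"

  # Quoted patterns are matched literally, so '$' needs no escaping here.
  result=${template//'$USER1$'/"$user1"}
  result=${result//'$HOSTADDRESS$'/"$hostaddress"}
  for ((i = 1; i < ${#fields[@]}; i++)); do
    result=${result//"\$ARG$i\$"/"${fields[i]}"}
  done
  printf '%s\n' "$result"
}
```

For example, with the command line quoted above in `$template`, `expand_command "$template" 'example_host_ping!3000.0!80%!5000.0!100%' /usr/local/nagios/libexec 192.0.2.10` should print `/usr/local/nagios/libexec/check_icmp -H 192.0.2.10 -w 3000.0,80% -c 5000.0,100% -p 5 -t 30`. The command name, thresholds, plugin path and address are placeholders, not values read from this configuration.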
Re: CRITICAL - popen timeout received, but no child process
Posted: Wed Apr 01, 2015 5:10 pm
by jdalrymple
bosecorp,
Sorry, I missed that you mentioned these are passive checks.
Is NRDS the tool you're using?
Can you run check_icmp from the host's command line and verify the output?
Is there any chance that we now have odd situations where Gearman checks are getting mixed in on hosts with NRDS checks? Your environment must be very complicated. Do you have any sketches of how it's laid out? How do you decide what is monitored by Gearman and what sends NRDS checks back in?
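A small wrapper for running the check by hand and reading its result, as a sketch under stated assumptions: the plugin directory defaults to `/usr/local/nagios/libexec` (check the `$USER1$` value in resource.cfg), `192.0.2.10` is a placeholder address, and the thresholds are example values, not this site's. It maps the exit code using the standard plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Because check_icmp opens raw ICMP sockets it normally needs root privileges (it is usually installed setuid root), so it is worth running it as the same user Nagios uses.

```shell
#!/usr/bin/env bash
# Run the host check by hand and translate the exit code into a Nagios state.
# USER1 and HOSTADDRESS defaults are placeholders; override them for your site.
USER1=${USER1:-/usr/local/nagios/libexec}
HOSTADDRESS=${HOSTADDRESS:-192.0.2.10}

# Standard plugin return codes; anything else is shown as UNKNOWN here.
state_name() {
  case $1 in
    0) echo OK ;;
    1) echo WARNING ;;
    2) echo CRITICAL ;;
    *) echo UNKNOWN ;;
  esac
}

run_host_check() {
  local plugin="$USER1/check_icmp" rc=0
  if [ ! -x "$plugin" ]; then
    echo "plugin not found or not executable: $plugin" >&2
    return 3
  fi
  # Same options as the template above; thresholds are example values only.
  "$plugin" -H "$HOSTADDRESS" -w 3000.0,80% -c 5000.0,100% -p 5 -t 30 || rc=$?
  echo "exit code $rc ($(state_name "$rc"))"
  return "$rc"
}
```

Usage: source the file, optionally set `USER1` and `HOSTADDRESS`, then call `run_host_check`; the plugin's own output line is printed before the decoded exit code.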
Re: CRITICAL - popen timeout received, but no child process
Posted: Wed Apr 01, 2015 5:30 pm
by bosecorp
These checks are being done by the JOB server.
Yes, I am using NRDS.
This is happening with several AIX/Linux clients.
Re: CRITICAL - popen timeout received, but no child process
Posted: Thu Apr 02, 2015 9:42 am
by jdalrymple
I'm certain we need a better description of your environment. Do you have a Visio diagram of your monitoring infrastructure so we can better understand how it works?
I'm having a hard time wrapping my mind around what you just said. The "Job server" is by definition the Nagios server, or the server with the NEB module installed and running. Passive checks on remote hosts cannot be submitted by the Nagios server.
If your environment is configured and running exactly as described above, then I can understand why there would be issues.
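To make the active/passive distinction concrete: an active check is a command the monitoring side executes, while a passive result is data reported from elsewhere. One standard way Nagios Core accepts a passive service result is the `PROCESS_SERVICE_CHECK_RESULT` external command. The sketch below only formats such a line; it does not show how NRDS/NRDP transports results, and the function name, host, service and output values are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: format a passive service result as a Nagios external command.
# Host, service and output values used with it are hypothetical.
passive_service_result() {
  local host=$1 service=$2 code=$3 output=$4
  local ts=${5:-$(date +%s)}   # optional fixed timestamp, otherwise "now"
  printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
    "$ts" "$host" "$service" "$code" "$output"
}

passive_service_result aixclient01 "Disk Usage" 0 "DISK OK"
```

On the Nagios server, lines of this form are read from the external command file named by `command_file` in nagios.cfg.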
