Re: CRITICAL - popen timeout received, but no child process

Posted: Tue Mar 31, 2015 2:31 pm
by bosecorp
Done.

I still see the same thing. It's getting better, but it's still happening.

Re: CRITICAL - popen timeout received, but no child process

Posted: Tue Mar 31, 2015 3:03 pm
by jdalrymple

Code:

[jdalrymple@localhost ~]$ for file in {1..300000}; do touch $file; done
[jdalrymple@localhost ~]$ ls -l | wc -l
300001
[jdalrymple@localhost ~]$ find ./ -type f -exec rm {} \;
[jdalrymple@localhost ~]$ ls -l | wc -l
1
It will take a while, and it may beat up on your CPU, so you may want to do it off-hours. It should work fine though.
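A smaller-scale sketch of the same cleanup, for anyone who wants to rehearse it somewhere safe first. It is not the exact command from the post: it swaps `-exec rm {} \;` for the POSIX `-exec rm {} +` form, and the scratch directory, the 1000-file count, and the variable names are all placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch only: rehearse the bulk delete in a throwaway directory.
# Assumptions: mktemp -d is available; 1000 files stand in for the
# 300000 used in the post above.
scratch=$(mktemp -d) || exit 1

# Create the test files (empty, named 1..1000).
i=1
while [ "$i" -le 1000 ]; do
    : > "$scratch/$i"
    i=$((i + 1))
done
before=$(find "$scratch" -type f | wc -l | tr -d ' ')

# "-exec rm {} +" passes many paths to each rm invocation, so it forks
# far fewer processes than "-exec rm {} \;", which runs one rm per file.
find "$scratch" -type f -exec rm {} +

after=$(find "$scratch" -type f | wc -l | tr -d ' ')
echo "before=$before after=$after"

# Remove the now-empty scratch directory.
rmdir "$scratch"
```

On a directory with hundreds of thousands of files, the batched form should put noticeably less load on the CPU, though it will still take some time.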

Re: CRITICAL - popen timeout received, but no child process

Posted: Tue Mar 31, 2015 3:08 pm
by bosecorp
Done.

It got better, but it's still happening.

Re: CRITICAL - popen timeout received, but no child process

Posted: Tue Mar 31, 2015 3:52 pm
by jdalrymple
uptime and df please:

Code:

[jdalrymple@localhost ~]$ uptime
 15:51:57 up 6 days,  7:03,  1 user,  load average: 0.00, 0.00, 0.02
[jdalrymple@localhost ~]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/centos-lv_root
                       18G  3.1G   14G  19% /
tmpfs                 491M     0  491M   0% /dev/shm
/dev/sda1             477M   28M  425M   6% /boot

Re: CRITICAL - popen timeout received, but no child process

Posted: Tue Mar 31, 2015 4:09 pm
by bosecorp

Code:

root@nagmonus1:(03-31 17:09): /root
# uptime
 17:09:09 up 4 days, 22:25,  3 users,  load average: 3.45, 3.76, 3.88
root@nagmonus1:(03-31 17:09): /root
# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/rootvg-lvroot        2.0G  820M  1.1G  43% /
tmpfs                            9.9G     0  9.9G   0% /dev/shm
/dev/sda1                        243M   49M  181M  22% /boot
/dev/mapper/rootvg-lvopt         2.0G   92M  1.8G   5% /opt
/dev/mapper/rootvg-lvtmp         6.9G   75M  6.5G   2% /tmp
/dev/mapper/rootvg-lvusers       4.0G  137M  3.7G   4% /users
/dev/mapper/rootvg-lvusr         7.9G  4.1G  3.4G  55% /usr
/dev/mapper/rootvg-lvvar          15G  5.9G  8.2G  42% /var
/dev/mapper/vgapp-lvapp           49G  4.1G   42G   9% /app
/dev/mapper/vgapp-lvstore         69G  6.6G   59G  11% /store
/dev/mapper/vgapp-lvlocalnagios  128G   28G   95G  23% /usr/local/nagios
/dev/mapper/vgapp-lvmysql         69G  2.6G   63G   4% /var/lib/mysql

Re: CRITICAL - popen timeout received, but no child process

Posted: Wed Apr 01, 2015 1:38 pm
by ssax
For the hosts that are experiencing the problems, are you using check_icmp, check_ping, or something else?

Re: CRITICAL - popen timeout received, but no child process

Posted: Wed Apr 01, 2015 1:43 pm
by bosecorp
I think check_icmp.

Remember that I am using NRDS (or NRDP, or whatever it is called), so the actual checks are done by the host.

But correct me if I am wrong: the actual host check is done by Nagios, right?

These hosts are members of the template xiwizard_passive_host, and when I check that host template, it's associated with another host template called xiwizard_generic_host, which has the following command configured: "$USER1$/check_icmp -H $HOSTADDRESS$ -w $ARG1$,$ARG2$ -c $ARG3$,$ARG4$ -p 5 -t 30"

Re: CRITICAL - popen timeout received, but no child process

Posted: Wed Apr 01, 2015 5:10 pm
by jdalrymple
bosecorp,

Sorry, I missed you mentioning that these are passive checks.

NRDS is the tool you're using?

Can you run the check_icmp from the host command line and verify the output?
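For reference, a sketch of what that manual run could look like. The plugin directory, target address, and threshold values are assumptions, since the thread does not say what $USER1$, $HOSTADDRESS$, or $ARG1$ through $ARG4$ expand to; only `-p 5 -t 30` comes from the command definition quoted earlier. The script prints the command line and runs it only if the plugin is actually present.

```shell
#!/bin/sh
# Sketch only: build the check_icmp command line by hand.
# Assumed values, not taken from the thread:
plugin_dir="/usr/local/nagios/libexec"   # a common value of $USER1$
host_address="192.0.2.10"                # placeholder for $HOSTADDRESS$
warn="3000.0,80%"                        # placeholder for $ARG1$,$ARG2$
crit="5000.0,100%"                       # placeholder for $ARG3$,$ARG4$

cmd="$plugin_dir/check_icmp -H $host_address -w $warn -c $crit -p 5 -t 30"
echo "$cmd"

if [ -x "$plugin_dir/check_icmp" ]; then
    # Note: this runs as the current user, which may not be the user
    # that Nagios itself runs checks as.
    "$plugin_dir/check_icmp" -H "$host_address" -w "$warn" -c "$crit" -p 5 -t 30
    # Plugin exit codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
    echo "exit status: $?"
else
    echo "check_icmp not found in $plugin_dir; adjust plugin_dir" >&2
fi
```

Comparing the printed output and exit status with what the web interface shows for the same host should indicate whether the plugin itself is timing out or the problem lies elsewhere.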

There is no chance that we now have weird situations where gearman checks are getting mixed in on hosts with NRDS checks, is there? Your environment must be very complicated. Do you guys have any sketches of how it's laid out? How do you determine what is monitored by gearman and what is sending NRDS checks back in?

Re: CRITICAL - popen timeout received, but no child process

Posted: Wed Apr 01, 2015 5:30 pm
by bosecorp
These checks are being done by the job server.

Yes, I am using NRDS.

This is happening with several AIX/Linux clients.

Re: CRITICAL - popen timeout received, but no child process

Posted: Thu Apr 02, 2015 9:42 am
by jdalrymple
I'm certain we need a better description of your environment. Do you have a Visio diagram of your monitoring infrastructure so we can better understand how it works?

I'm having a hard time wrapping my mind around what you just said. The "Job server" is by definition the Nagios server, or the server with the NEB module installed and running. Passive checks on remote hosts cannot be submitted by the Nagios server.

If your environment is configured and running just as described above then I can understand why there would be issues :)