CRITICAL - popen timeout received, but no child process

bosecorp · Post by **bosecorp** » Tue Mar 31, 2015 2:31 pm

done

I still see the same error. It's getting better, but it's still happening.

jdalrymple · Post by **jdalrymple** » Tue Mar 31, 2015 3:03 pm

[jdalrymple@localhost ~]$ for file in {1..300000}; do touch $file; done
[jdalrymple@localhost ~]$ ls -l | wc -l
300001
[jdalrymple@localhost ~]$ find ./ -type f -exec rm {} \;
[jdalrymple@localhost ~]$ ls -l | wc -l
1

It will take a while, and it may beat up on your CPU, so you may want to do it off-hours. It should work fine, though.
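If the one-`rm`-per-file approach above turns out to be too slow, a possible alternative is to let `find` do the deletion itself. This is only a sketch, assuming GNU find (which provides `-delete` and `-maxdepth`); the scratch directory and the file count of 1000 are illustrative and not taken from this thread:

```shell
#!/bin/sh
# Sketch only: works in a throwaway directory, not in a real spool directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Create some test files (1000 here, just to keep the demo quick).
i=1
while [ "$i" -le 1000 ]; do
    : > "file_$i"
    i=$((i + 1))
done

before=$(find . -maxdepth 1 -type f | wc -l | tr -d ' ')

# -delete removes each match inside find, so no rm process is forked per file.
# -maxdepth 1 keeps find from descending into subdirectories.
find . -maxdepth 1 -type f -delete

after=$(find . -maxdepth 1 -type f | wc -l | tr -d ' ')
echo "before=$before after=$after"

# Clean up the scratch directory.
cd / && rmdir "$scratch"
```

On a busy monitoring server, prefixing the `find` command with `nice -n 19` is one way to lower its CPU priority while it runs.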

bosecorp · Post by **bosecorp** » Tue Mar 31, 2015 3:08 pm

done

It got better, but the error still occurs.

jdalrymple · Post by **jdalrymple** » Tue Mar 31, 2015 3:52 pm

uptime and df please:

[jdalrymple@localhost ~]$ uptime
 15:51:57 up 6 days,  7:03,  1 user,  load average: 0.00, 0.00, 0.02
[jdalrymple@localhost ~]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/centos-lv_root
                       18G  3.1G   14G  19% /
tmpfs                 491M     0  491M   0% /dev/shm
/dev/sda1             477M   28M  425M   6% /boot

bosecorp · Post by **bosecorp** » Tue Mar 31, 2015 4:09 pm

root@nagmonus1:(03-31 17:09): /root
# uptime
 17:09:09 up 4 days, 22:25,  3 users,  load average: 3.45, 3.76, 3.88
root@nagmonus1:(03-31 17:09): /root
# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/rootvg-lvroot        2.0G  820M  1.1G  43% /
tmpfs                            9.9G     0  9.9G   0% /dev/shm
/dev/sda1                        243M   49M  181M  22% /boot
/dev/mapper/rootvg-lvopt         2.0G   92M  1.8G   5% /opt
/dev/mapper/rootvg-lvtmp         6.9G   75M  6.5G   2% /tmp
/dev/mapper/rootvg-lvusers       4.0G  137M  3.7G   4% /users
/dev/mapper/rootvg-lvusr         7.9G  4.1G  3.4G  55% /usr
/dev/mapper/rootvg-lvvar          15G  5.9G  8.2G  42% /var
/dev/mapper/vgapp-lvapp           49G  4.1G   42G   9% /app
/dev/mapper/vgapp-lvstore         69G  6.6G   59G  11% /store
/dev/mapper/vgapp-lvlocalnagios  128G   28G   95G  23% /usr/local/nagios
/dev/mapper/vgapp-lvmysql         69G  2.6G   63G   4% /var/lib/mysql

ssax · Post by **ssax** » Wed Apr 01, 2015 1:38 pm

For the hosts that are experiencing the problem, are you using check_icmp, check_ping, or something else?

bosecorp · Post by **bosecorp** » Wed Apr 01, 2015 1:43 pm

I think check_icmp.

Remember that I am using NRDS (or NRDP, whichever it is called), so the actual checks are done by the host.

Correct me if I am wrong, but the actual host check is done by Nagios, right?

This host is a member of the template xiwizard_passive_host. When I look at that host template, it is associated with another host template called xiwizard_generic_host, which has the following command configured: "$USER1$/check_icmp -H $HOSTADDRESS$ -w $ARG1$,$ARG2$ -c $ARG3$,$ARG4$ -p 5 -t 30"

jdalrymple · Post by **jdalrymple** » Wed Apr 01, 2015 5:10 pm

bosecorp,

Sorry, I missed you mentioning that these are passive checks.

NRDS is the tool you're using?

Can you run the check_icmp from the host command line and verify the output?
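As a rough sketch of what that manual run could look like: the flags below mirror the command definition quoted earlier in the thread, but the plugin directory, the target address, and the warning/critical thresholds are assumptions that need to be replaced with the values from your own configuration.

```shell
#!/bin/sh
# Assumed values: none of these come from the thread, adjust before use.
PLUGIN_DIR=/usr/local/nagios/libexec   # common value of $USER1$; check resource.cfg
HOST_ADDR=127.0.0.1                    # stands in for $HOSTADDRESS$
WARN="3000.0,80%"                      # stands in for $ARG1$,$ARG2$ (rta in ms, packet loss)
CRIT="5000.0,100%"                     # stands in for $ARG3$,$ARG4$

# Same flags as the quoted command: 5 packets (-p 5), 30-second timeout (-t 30).
cmd="$PLUGIN_DIR/check_icmp -H $HOST_ADDR -w $WARN -c $CRIT -p 5 -t 30"
echo "command: $cmd"

if [ -x "$PLUGIN_DIR/check_icmp" ]; then
    # Run the plugin and report its exit code, which Nagios maps to a state.
    $cmd
    rc=$?
    echo "exit code: $rc (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN)"
else
    echo "check_icmp not found in $PLUGIN_DIR; adjust PLUGIN_DIR"
fi
```

Running it as the nagios user rather than root gives a result closer to what the daemon sees, since permission problems on the plugin would otherwise be hidden.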

Is there any chance that we now have a situation where gearman checks are getting mixed in on hosts that also send NRDS checks? Your environment must be very complicated. Do you have any sketches of how it's laid out? How do you determine what is monitored by gearman and what is sending NRDS checks back in?

bosecorp · Post by **bosecorp** » Wed Apr 01, 2015 5:30 pm

These checks are being done by the job server.

Yes, I am using NRDS.

This is happening with several AIX and Linux clients.

jdalrymple · Post by **jdalrymple** » Thu Apr 02, 2015 9:42 am

We need a better description of your environment I'm certain. Do you have a Visio diagram of your monitoring infrastructure so we can better understand how it works?

I'm having a hard time wrapping my mind around what you just said. The "Job server" is by definition the Nagios server, or the server with the NEB module installed and running. Passive checks on remote hosts cannot be submitted by the Nagios server.

If your environment is configured and running just as described above, then I can understand why there would be issues.

Nagios Support Forum
