CRITICAL - popen timeout received, but no child process
Re: CRITICAL - popen timeout received, but no child process
I did that yesterday and I am still seeing the same problem.
Re: CRITICAL - popen timeout received, but no child process
I agree with jdalrymple in that we need a bit more info on how things are being laid out. Running a check_icmp or check_ping as a passive check via NRDS does not make a lot of sense. What specifically do you mean by the "JOB server"? Do you mean the Nagios server, the remote server that you are monitoring passively, or something else?
Former Nagios employee
Re: CRITICAL - popen timeout received, but no child process
What I mean is the Nagios server.
I am assuming it is doing check_icmp because I am getting alerts reporting that the device is down. I get an email from Nagios saying "Info: CRITICAL - popen timeout received, but no child process". I also get this email alert from Nagios: "Info: CRITICAL - Plugin timed out after 10 seconds". The majority of the alerts have the first error message.
I am getting false alerts saying that many Linux and AIX servers are down. All of them are running NRDS, meaning they are all passive checks. I get alerts from Nagios that these devices are down, and then minutes later they are UP in Nagios.
Now in terms of my environment:
My Nagios server is not behind any firewall.
I have two workers that are behind the firewall, one in each of two different geographic locations.
The configuration of these clients is done by the wizard. Typically we go to Admin > Unconfigured Objects, and from there we follow the wizard. So essentially it's configured as a passive check, but I don't know why I get these alerts. Like I said before, these hosts are members of the template xiwizard_passive_host, and when I check that host template, it's associated with another host template called xiwizard_generic_host. The template "xiwizard_generic_host" is the one doing the check_icmp, but this all gets configured as part of the wizard.
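To make the template chain described above easier to follow, here is a minimal Python sketch that walks a host's "use" chain and reports which template a directive is actually inherited from. The template bodies and the command name check_icmp_placeholder are hypothetical stand-ins, not the real XI definitions; only the template names and the host name come from this thread.

```python
import re

# Hypothetical definitions: only the template names and the host name come
# from the thread. The directives and the command name are placeholders.
CONFIG = """
define host {
    name            xiwizard_generic_host
    check_command   check_icmp_placeholder
    register        0
}
define host {
    name            xiwizard_passive_host
    use             xiwizard_generic_host
    register        0
}
define host {
    host_name       dbamon.bose.com
    use             xiwizard_passive_host
    register        1
}
"""

def parse_hosts(text):
    """Return one dict of directives per 'define host { ... }' block."""
    hosts = []
    for body in re.findall(r"define\s+host\s*\{(.*?)\}", text, re.S):
        directives = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop ';' inline comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        hosts.append(directives)
    return hosts

def effective(directive, host, templates, seen=None):
    """Return (value, defining object) for the first match along the 'use' chain."""
    if seen is None:
        seen = set()
    if directive in host:
        return host[directive], host.get("host_name") or host.get("name")
    for parent in (p.strip() for p in host.get("use", "").split(",")):
        if not parent or parent in seen or parent not in templates:
            continue
        seen.add(parent)
        found = effective(directive, templates[parent], templates, seen)
        if found is not None:
            return found
    return None

hosts = parse_hosts(CONFIG)
templates = {h["name"]: h for h in hosts if "name" in h}
target = next(h for h in hosts if h.get("host_name") == "dbamon.bose.com")
result = effective("check_command", target, templates)
print(result)
```

Pointing something like this at the real files under /usr/local/nagios/etc would show whether the host's check_command really does come from xiwizard_generic_host, which is the assumption being made in this post.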
-
jdalrymple
- Skynet Drone
- Posts: 2620
- Joined: Wed Feb 11, 2015 1:56 pm
Re: CRITICAL - popen timeout received, but no child process
If you're using the NRDS wizard on the XI server and following the XI install instructions then the workers *shouldn't* be involved. It might be useful for YOU to look at your XI interface and identify which hosts are configured passive and which are not (see attachment). Based upon our long-winded gearman thread a few weeks ago I was under the impression that most of your checks were run by the gearman server, not passively submitted to XI.
Pick one of your broken hosts and share with us the config in /usr/local/nagios/etc/hosts/myhostname.cfg:
Code:
[jdalrymple@localhost ~]$ cat /usr/local/nagios/etc/hosts/localhost_passive.cfg
###############################################################################
#
# Host configuration file
#
# Created by: Nagios Core Config Manager 2.3.2
# Date: 2015-04-02 14:22:53
# Version: Nagios 3.x config file
#
# --- DO NOT EDIT THIS FILE BY HAND ---
# Nagios CCM will overwrite all manual settings during the next update if you
# would like to edit files manually, place them in the 'static' directory or
# import your configs into the CCM by placing them in the 'import' directory.
#
###############################################################################
define host {
host_name localhost_passive
use xiwizard_passive_host
address localhost_passive
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
contacts nagiosadmin
notification_interval 60
notification_period xi_timeperiod_24x7
stalking_options n
icon_image passiveobject.png
statusmap_image passiveobject.png
_xiwizard passiveobject
register 1
}
###############################################################################
#
# Host configuration file
#
# END OF FILE
#
###############################################################################
Re: CRITICAL - popen timeout received, but no child process
Most of the devices are not passive.
Only the AIX and Linux hosts are passive.
define host {
host_name dbamon.bose.com
use xiwizard_passive_host
address dbamon
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
contact_groups CIS-DCOPS,CIS-ISCG
notification_interval 60
notification_period xi_timeperiod_24x7
stalking_options n
icon_image passiveobject.png
statusmap_image passiveobject.png
_xiwizard passiveobject
register 1
}
jdalrymple
Re: CRITICAL - popen timeout received, but no child process
If you're using XI defaults this is the command that is run for AIX host checks:
Code:
/opt/nagios/libexec/check_ping -H localhost -w 200.0,40% -c 400.0,80% -p 1
Can you run that from one of your AIX boxes giving you a problem and see what the result is?
Re: CRITICAL - popen timeout received, but no child process
# ./check_ping -H localhost -w 200.0,40% -c 400.0,80% -p 1
PING OK - Packet loss = 0%, RTA = 0.03 ms|rta=0.031000ms;200.000000;400.000000;0.000000 pl=0%;40;80;0
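For anyone reading along, the part after the "|" in that output is performance data in the usual plugin layout of label=value[unit];warn;crit;min. A rough Python sketch of pulling it apart is below. The function name is made up for illustration, and it assumes plain numeric thresholds like the ones in this output (range-style thresholds and quoted labels are not handled).

```python
import re

# Output line copied from the check_ping run above.
SAMPLE = ("PING OK - Packet loss = 0%, RTA = 0.03 ms|"
          "rta=0.031000ms;200.000000;400.000000;0.000000 pl=0%;40;80;0")

def parse_plugin_output(line):
    """Split plugin output into (status text, {label: metric fields})."""
    text, _, perf = line.partition("|")
    metrics = {}
    for item in perf.split():
        label, _, rest = item.partition("=")
        fields = rest.split(";")
        # Separate the numeric value from its unit, e.g. "0.031000ms".
        match = re.match(r"(-?\d+(?:\.\d+)?)(.*)", fields[0])
        if not match:
            continue
        # Only warn and crit are read here; assumed to be plain numbers.
        thresholds = [float(f) if f else None for f in fields[1:3]]
        thresholds += [None] * (2 - len(thresholds))
        metrics[label] = {
            "value": float(match.group(1)),
            "unit": match.group(2),
            "warn": thresholds[0],
            "crit": thresholds[1],
        }
    return text.strip(), metrics

text, metrics = parse_plugin_output(SAMPLE)
print(text)
print(metrics["rta"], metrics["pl"])
```

Here the measured round-trip time (0.031 ms) and packet loss (0%) are both far below the warning thresholds of 200 ms and 40%, which matches the PING OK status.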
jdalrymple
Re: CRITICAL - popen timeout received, but no child process
Your configuration looks proper and obviously that check result looks good.
bosecorp wrote:
I am getting this error in many of my AIX and Linux clients
CRITICAL - popen timeout received, but no child process
Can you show us a screenshot of the error? I feel like I'm missing something.
Re: CRITICAL - popen timeout received, but no child process
Another question I have: how come I am getting these alerts about this device being down if the checks are passive? The host check is not being done by the Nagios server, right?
From: Nagios XI Production [mailto:xxxxxxxxxxxxxxxxxxxxxxxx]
Sent: Friday, April 3, 2015 10:46 AM
Subject: PROBLEM Host Alert - dbamon.bose.com is DOWN
***** Nagios XI Alert *****
Nagios has detected a problem with this host.
Notification Type: PROBLEM
Host: dbamon.bose.com
State: DOWN
Address: dbamon
Info: CRITICAL - Plugin timed out after 10 seconds
Date/Time: 2015-04-03 10:45:54
Respond: http://xx.xx.xx.xx/nagiosxi/rr.php?uid= ... 308a1061fb
Nagios URL: http://xx.xx.xx.xx/nagiosxi/
jdalrymple
Re: CRITICAL - popen timeout received, but no child process
Simple answer - if a passive check sends back a result saying "I'm down", Nagios doesn't interpret that; it reports what it's being told.
Make sense?
Can we get that screenshot?
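To illustrate the point about Nagios taking a passive result at face value, here is a small Python sketch that builds a passive host check result in the form Nagios Core's external command interface accepts (PROCESS_HOST_CHECK_RESULT, with 0 = UP, 1 = DOWN, 2 = UNREACHABLE). The helper name and the timestamp are made up for illustration, and this does not show how NRDS actually delivers its results to the XI server.

```python
import time

# Host state codes used by passive host check results in Nagios Core.
HOST_STATES = {"UP": 0, "DOWN": 1, "UNREACHABLE": 2}

def host_check_result(host_name, state, output, timestamp=None):
    """Build one PROCESS_HOST_CHECK_RESULT external command line."""
    if ";" in host_name or "\n" in host_name or "\n" in output:
        raise ValueError("host name and output must fit in a single command line")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"[{ts}] PROCESS_HOST_CHECK_RESULT;{host_name};{HOST_STATES[state]};{output}"

# Arbitrary example timestamp; the plugin output text is the one from the alert email.
line = host_check_result(
    "dbamon.bose.com",
    "DOWN",
    "CRITICAL - Plugin timed out after 10 seconds",
    timestamp=1428072354,
)
print(line)
```

Whatever state code and text arrive in a line like this are what Nagios records for the host, so a client-side check that times out and reports DOWN will produce a DOWN alert even though the server never ran the check itself.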