Waiting for configuration verification took about 40 minutes

nagio60
Posts: 9
Joined: Tue Feb 28, 2017 7:35 pm

Post by nagio60 »

After setting up the monitor and applying the configuration, I get "Waiting for configuration verification", which took about 40 minutes. Looking at the nagios.log file, I saw tons of the following:

[1490657671] wproc: Core Worker 27743: job 1199 (pid=20620): Dormant child reaped
[1490657671] wproc: Core Worker 27743: tv.tv_sec is currently 1490657671
[1490657671] wproc: Core Worker 27743: Failed to reap child with pid 21367. Next attempt @ 1490657676.385565
[1490657671] wproc: Core Worker 27743: tv.tv_sec is currently 1490657671
[1490657671] wproc: Core Worker 27743: Failed to reap child with pid 21369. Next attempt @ 1490657676.386493
[1490657681] wproc: Core Worker 27743: tv.tv_sec is currently 1490657681
[1490657681] wproc: Core Worker 27743: Failed to reap child with pid 21367. Next attempt @ 1490657686.421934
[1490657681] wproc: Core Worker 27743: tv.tv_sec is currently 1490657681
[1490657681] wproc: Core Worker 27743: Failed to reap child with pid 21369. Next attempt @ 1490657686.431027
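The bracketed numbers at the start of each nagios.log line are Unix epoch seconds, so converting them makes it easier to line the log up with other output. A minimal sketch, assuming GNU `date` (for `-d @N`); the function names `epoch_to_utc` and `humanize_nagios_log` are made up for illustration:

```shell
# Convert one epoch value to a UTC timestamp (assumes GNU date).
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Read nagios.log lines on stdin and rewrite a leading "[epoch] " stamp
# as a readable UTC time; lines without such a stamp pass through unchanged.
humanize_nagios_log() {
    while IFS= read -r line; do
        case $line in
            \[[0-9]*\]\ *)
                epoch=${line#"["}        # drop the opening bracket
                epoch=${epoch%%"]"*}     # keep only the digits before "]"
                rest=${line#*"] "}       # everything after "] "
                printf '[%s] %s\n' "$(epoch_to_utc "$epoch")" "$rest"
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done
}
```

For example, `epoch_to_utc 1490657671` prints `2017-03-27 23:34:31`, and a whole log can be piped through with `humanize_nagios_log < nagios.log`.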
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Post by avandemore »

What versions are you running?
nagio60
Posts: 9
Joined: Tue Feb 28, 2017 7:35 pm

Post by nagio60 »

Hi, it's Nagios XI 5.4.2

One thing I noticed is that the httpd load is extremely high while the Core Config Manager is applying an update. Once that's done, the load goes back to normal.

top - 21:34:58 up 27 days, 1:00, 1 user, load average: 13.57, 6.11, 4.30
Tasks: 194 total, 23 running, 169 sleeping, 0 stopped, 2 zombie
Cpu(s): 94.4%us, 5.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 8050724k total, 4059320k used, 3991404k free, 986440k buffers
Swap: 10485752k total, 124k used, 10485628k free, 1378584k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12021 apache 16 0 427m 56m 5260 S 40.5 0.7 18:52.25 httpd
1730 apache 16 0 424m 53m 4332 S 24.2 0.7 8:16.60 httpd
4343 apache 16 0 427m 56m 5140 S 23.6 0.7 9:04.17 httpd
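To see how much of the CPU the httpd workers take in total, the process rows can be summed per command. A rough sketch, assuming the 12-column `top` layout shown above (PID through COMMAND); `cpu_by_command` is a hypothetical helper name:

```shell
# Sum the %CPU column (field 9) per command name (field 12) from
# top-style process rows on stdin. Header and summary lines are skipped
# because their first field is not a numeric PID.
cpu_by_command() {
    awk 'NF >= 12 && $1 ~ /^[0-9]+$/ { cpu[$12] += $9 }
         END { for (c in cpu) printf "%s %.1f\n", c, cpu[c] }' |
        sort -k2,2nr
}
```

Fed the three httpd rows above, it prints `httpd 88.3`. On a live system the input could come from `top -b -n 1 | cpu_by_command`.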
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Post by tgriep »

Can you log in to the Nagios server in a shell, run the following, and post the output?

cd /usr/local/nagiosxi/scripts
su nagios
./reconfigure_nagios.sh
Running the reconfigure_nagios.sh script is almost the same as running Apply Configuration, so we should be able to see the error and go from there.
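Since the complaint is about how long the apply takes, it may also help to record the duration of the script run. A small sketch; `time_cmd` is a hypothetical wrapper, and the shell's built-in `time` would do the same job:

```shell
# Run a command, then report its wall-clock duration and exit status
# on stderr so the command's own output stays untouched.
time_cmd() {
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    echo "elapsed=$((end - start))s rc=$rc" >&2
    return "$rc"
}
```

Usage would look like `cd /usr/local/nagiosxi/scripts && time_cmd ./reconfigure_nagios.sh`, run as the nagios user as above.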
nagio60
Posts: 9
Joined: Tue Feb 28, 2017 7:35 pm

Post by nagio60 »

Hi, here's the output

URL: http://localhost/nagiosxi/includes/components/ccm/
CMDLINE
/usr/bin/wget --save-cookies nagiosql.cookies --keep-session-cookies http://localhost/nagiosxi/includes/components/ccm/ --no-check-certificate --post-data 'submit=Login&hidelog=true&loginSubmitted=true&backend=1&username=nagiosxi&password=68gITJ' -O nagiosql.login
--2017-03-28 12:50:42-- http://localhost/nagiosxi/includes/components/ccm/
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `nagiosql.login'

[ <=> ] 36,685 --.-K/s in 0s

2017-03-28 12:50:46 (224 MB/s) - `nagiosql.login' saved [36685]

LOGIN SUCCESSFUL!
IMPORTING CONFIG FILES...URL: http://localhost/nagiosxi/includes/components/ccm/
Array
(
)
RESETTING PERMS
URL: http://localhost/nagiosxi/includes/components/ccm/
CMDLINE
/usr/bin/wget --save-cookies nagiosql.cookies --keep-session-cookies http://localhost/nagiosxi/includes/components/ccm/ --no-check-certificate --post-data 'submit=Login&hidelog=true&loginSubmitted=true&backend=1&username=nagiosxi&password=68gITJ' -O nagiosql.login
--2017-03-28 12:50:48-- http://localhost/nagiosxi/includes/components/ccm/
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `nagiosql.login'

[ <=> ] 36,685 --.-K/s in 0s

2017-03-28 12:50:49 (265 MB/s) - `nagiosql.login' saved [36685]

LOGIN SUCCESSFUL!
URL: http://localhost/nagiosxi/includes/components/ccm/
CMDLINE:
/usr/bin/wget --load-cookies=nagiosql.cookies http://localhost/nagiosxi/includes/components/ccm/ --no-check-certificate --post-data 'backend=1&cmd=apply&type=writeConfig' -O nagiosql.export.monitoring
--2017-03-28 12:50:50-- http://localhost/nagiosxi/includes/components/ccm/
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `nagiosql.export.monitoring'

[ <=> ] 22,130 --.-K/s in 0s

2017-03-28 12:50:51 (261 MB/s) - `nagiosql.export.monitoring' saved [22130]

WRITE CONFIGS SUCCESSFUL!
OUTPUT:
Nagios Core 4.2.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 12-07-2016
License: GPL

Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
Checked 98 services.
Warning: Host '10.97.139.1' has no default contacts or contactgroups defined!
Checked 19 hosts.
Checked 5 host groups.
Checked 0 service groups.
Warning: Service recovery notification option for contact 'nagiosadmin' doesn't make any sense - specify critical and/or warning options as well
Checked 2 contacts.
Checked 2 contact groups.
Checked 124 commands.
Checked 8 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 19 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 8 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 2
Total Errors: 0

Things look okay - No serious problems were detected during the pre-flight check
RET: 0
Running configuration check...
Stopping nagios: done.
Starting nagios: done.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Post by scottwilkerson »

That appears to have taken only 5 seconds. Can you try applying the configuration through the UI again?
nagio60
Posts: 9
Joined: Tue Feb 28, 2017 7:35 pm

Post by nagio60 »

I tried applying the config again from the UI; same deal. In fact, the node was in an almost frozen state while this was taking place. Again, httpd was the culprit for the high load.

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12032 apache 16 0 426m 56m 4876 R 50.1 0.7 26:06.20 httpd
30582 apache 16 0 425m 55m 4300 S 23.5 0.7 27:10.17 httpd
30580 apache 16 0 426m 55m 4344 S 23.2 0.7 30:08.58 httpd


[1490741800] SERVICE ALERT: localhost;Current Load;CRITICAL;SOFT;2;CRITICAL - load average: 31.09, 13.04, 5.89
[1490741804] wproc: Core Worker 18730: job 1218 (pid=5261) timed out. Killing it
[1490741804] wproc: Core Worker 18730: kill(-5261, SIGKILL) failed: Operation not permitted
[1490741811] wproc: CHECK job 1218 from worker Core Worker 18730 timed out after 126.59s
[1490741811] wproc: host=localhost; service=Service Status - ndo2db;
[1490741811] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1490741811] Warning: Check of service 'Service Status - ndo2db' on host 'localhost' timed out after 126.593s!
[1490741811] SERVICE ALERT: localhost;Service Status - ndo2db;CRITICAL;SOFT;1;(Service check timed out after 126.59 seconds)
[1490741811] wproc: Core Worker 18730: kill(-5261, SIGKILL) failed: Operation not permitted
[1490741811] wproc: Core Worker 18730: job 1218 (pid=5261): Dormant child reaped
[1490741851] Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1490741842.perfdata.host' timed out after 5 seconds
[1490741864] Warning: Service performance data file processing command '/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/1490741856.perfdata.service' timed out after 5 seconds
[1490741864] SERVICE ALERT: localhost;Current Load;CRITICAL;SOFT;3;CRITICAL - load average: 39.05, 18.83, 8.37
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Post by avandemore »

Please run:

# tail -F /usr/local/nagiosxi/var/cmdsubsys.log
Then do an apply config from the GUI. Cancel out of the tail, then please send the output generated.

Also please include the files /var/log/httpd/*error_log.
nagio60
Posts: 9
Joined: Tue Feb 28, 2017 7:35 pm

Post by nagio60 »

The /var/log/httpd/*error_log is empty

> ls -lrt *error_log
-rw-rw-rw- 1 root root 0 Feb 28 20:33 ssl_error_log
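The listing shows that the only file matching the glob, ssl_error_log, is zero bytes. To check a whole directory for error logs that do contain something, a sketch along these lines could be used (`nonempty_error_logs` is a hypothetical name; the directory comes from the request above):

```shell
# Print every *error_log file under the given directory that is larger
# than zero bytes. Prints nothing when all matching logs are empty.
nonempty_error_logs() {
    find "$1" -type f -name '*error_log' -size +0c
}
```

For example: `nonempty_error_logs /var/log/httpd`.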

I applied the config and got the following:

....
PROCESSED 0 COMMANDS
YING NAGIOSCORE CONFIG...
CMDLINE=cd /usr/local/nagiosxi/scripts && ./reconfigure_nagios.sh
URL: http://localhost/nagiosxi/includes/components/ccm/
CMDLINE
--2017-03-30 13:20:30-- http://localhost/nagiosxi/includes/components/ccm/
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `nagiosql.login'

0K .......... .......... .......... ..... 150M=0s

2017-03-30 13:20:44 (150 MB/s) - `nagiosql.login' saved [36724]

/usr/bin/wget --save-cookies nagiosql.cookies --keep-session-cookies http://localhost/nagiosxi/includes/components/ccm/ --no-check-certificate --post-data 'submit=Login&hidelog=true&loginSubmitted=true&backend=1&username=nagiosxi&password=68gITJ' -O nagiosql.login
LOGIN SUCCESSFUL!
tail: /usr/local/nagiosxi/var/cmdsubsys.log: file truncated
IMPORTING CONFIG FILES...URL: http://localhost/nagiosxi/includes/components/ccm/
Array
(
[0] => 9DY5NVED.tmp.cfg
[1] => n001.testing.com.cfg
)
IMPORTING /usr/local/nagios/etc/import/9DY5NVED.tmp.cfg
CMDLINE:
/usr/bin/wget --load-cookies=nagiosql.cookies http://localhost/nagiosxi/includes/components/ccm/ --no-check-certificate --post-data 'backend=1&cmd=admin&type=import&importsubmitted=true&chbOverwrite=1&subForm=Import&selImportFile[]=/usr/local/nagios/etc/import/9DY5NVED.tmp.cfg' -O nagiosql.import.monitoring
--2017-03-30 13:23:22-- http://localhost/nagiosxi/includes/components/ccm/
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `nagiosql.import.monitoring'

0K .......... .......... .......... 159M=0s

2017-03-30 13:23:40 (159 MB/s) - `nagiosql.import.monitoring' saved [30921]

IMPORTING /usr/local/nagios/etc/import/n001.testing.com.cfg
CMDLINE:
/usr/bin/wget --load-cookies=nagiosql.cookies http://localhost/nagiosxi/includes/components/ccm/ --no-check-certificate --post-data 'backend=1&cmd=admin&type=import&importsubmitted=true&chbOverwrite=1&subForm=Import&selImportFile[]=/usr/local/nagios/etc/import/n001.testing.com.cfg' -O nagiosql.import.monitoring
--2017-03-30 13:23:40-- http://localhost/nagiosxi/includes/components/ccm/
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `nagiosql.import.monitoring'

0K .......... .......... .......... 189M=0s

2017-03-30 13:23:50 (189 MB/s) - `nagiosql.import.monitoring' saved [30787]

...........RESETTING PERMS
....
PROCESSED 0 COMMANDS
.........................................
PROCESSED 0 COMMANDS
URL: http://localhost/nagiosxi/includes/components/ccm/
CMDLINE
tail: /usr/local/nagiosxi/var/cmdsubsys.log: file truncated
.--2017-03-30 13:34:05-- http://localhost/nagiosxi/includes/components/ccm/
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `nagiosql.login'

0K .......... .......... .......... ..... 168M=0s

2017-03-30 13:34:36 (168 MB/s) - `nagiosql.login' saved [36799]

/usr/bin/wget --save-cookies nagiosql.cookies --keep-session-cookies http://localhost/nagiosxi/includes/components/ccm/ --no-check-certificate --post-data 'submit=Login&hidelog=true&loginSubmitted=true&backend=1&username=nagiosxi&password=68gITJ' -O nagiosql.login
LOGIN SUCCESSFUL!
........
PROCESSED 0 COMMANDS
.....................
PROCESSED 0 COMMANDS
server at 'reading authorization packet', system error: 0 in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-mysqli.inc.php on line 117
...........................................................
PROCESSED 0 COMMANDS
URL: http://localhost/nagiosxi/includes/components/ccm/
CMDLINE:
/usr/bin/wget --load-cookies=nagiosql.cookies http://localhost/nagiosxi/includes/components/ccm/ --no-check-certificate --post-data 'backend=1&cmd=apply&type=writeConfig' -O nagiosql.export.monitoring
--2017-03-30 13:42:45-- http://localhost/nagiosxi/includes/components/ccm/
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `nagiosql.export.monitoring'

0K .......... .......... . 160M=0s

2017-03-30 13:42:48 (160 MB/s) - `nagiosql.export.monitoring' saved [22314]

WRITE CONFIGS SUCCESSFUL!
OUTPUT:
Nagios Core 4.2.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 12-07-2016
License: GPL

Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
Checked 100 services.
Warning: Host '10.97.139.1' has no default contacts or contactgroups defined!
Checked 21 hosts.
Checked 5 host groups.
Checked 0 service groups.
Warning: Service recovery notification option for contact 'nagiosadmin' doesn't make any sense - specify critical and/or warning options as well
Checked 2 contacts.
Checked 2 contact groups.
Checked 124 commands.
Checked 8 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 21 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 8 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 2
Total Errors: 0

Things look okay - No serious problems were detected during the pre-flight check
RET: 0
.....................................................
PROCESSED 0 COMMANDS

Running configuration check...
.Stopping nagios: done.
Starting nagios: done.
OUTPUT=Starting nagios: done.
RETURNCODE=0

PROCESSED 1 COMMANDS
......................................................
PROCESSED 0 COMMANDS
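The wget timestamps in this output show where the time goes: the first login request started at 13:20:30 and the write-config response was saved at 13:42:48. A sketch for pulling those stamps out and measuring the gaps, assuming GNU `date` and the wget line format seen above; both function names are hypothetical:

```shell
# Print the start timestamp of every wget request found on stdin,
# i.e. lines of the form "--YYYY-MM-DD HH:MM:SS-- URL" (optionally
# preceded by progress dots from the command subsystem).
wget_start_stamps() {
    sed -n 's/^\.*--\([0-9-]* [0-9:]*\)--.*/\1/p'
}

# Seconds elapsed between two "YYYY-MM-DD HH:MM:SS" stamps.
# Both are parsed as UTC so daylight-saving shifts cannot skew the result.
seconds_between() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $((end - start))
}
```

`seconds_between "2017-03-30 13:20:30" "2017-03-30 13:42:48"` prints `1338`, about 22 minutes, while the first login alone (13:20:30 to 13:20:44) comes out at `14` seconds for a 36 KB response from localhost.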
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Post by dwhitfield »

Is this a new install? If so, you ***could*** run reset_defaults.sh (I've intentionally not given you the full path so you don't accidentally run it). Now, why did I highlight "could"? Well, it's a giant red reset button (neither literally a button nor red). It's not something you want to do lightly, but it might get you back on track more quickly than the forum.

Another option would be to try to upgrade to 5.4.3: https://assets.nagios.com/downloads/nag ... nstall.pdf . Again, it's a bit of a sledgehammer approach, but we are over three days since your original post, so maybe it is time for the sledgehammer. The upgrade makes a lot more sense to me than the reset, but people have different rules about upgrading in their environment, so I've given both options.

Lastly, can you PM me your Profile? You can download it by going to Admin > System Config > System Profile and clicking the Download Profile button towards the top. If for whatever reason you *cannot* download the profile, please put the output of Show Profile in the thread (that will at least get us some info). This will give us a lot of logs all at once, which I hope will speed up a resolution for you.

After you PM the profile, please update this thread. Updating this thread is the only way for it to show back up on our dashboard.

EDIT: profile received
Last edited by dwhitfield on Thu Apr 06, 2017 3:13 pm, edited 1 time in total.
Reason: profile received