Monitoring engine won't start (in Prod AND Test)

Post by snapon_admin »

The monitoring engine shows as stopped and won't start when I click start. Here's what I'm seeing:
[Attached screenshot: monitoring engine not running.png]
When I click start I get:
Your request was not processed in a timely manner, It may still execute, as the server may be temporarily busy.
I've verified the config and all looks good. I've written the config files, restarted the Nagios service, etc. Checks appear to still be running, but we're still getting this issue. Thoughts?

Edit: This is now affecting my test server as well. What do?
Last edited by snapon_admin on Thu May 15, 2014 2:41 pm, edited 2 times in total.

Post by slansing »

I just solved something like this on my end. Let's check a few things:

Code: Select all

service nagios restart
service nagios status

service crond restart
service crond status

service ndo2db restart
service ndo2db status

Code: Select all

tail -100 /usr/local/nagios/var/nagios.log
Are you currently running a mod_gearman version on this server that was carried over from a 2012 XI install?
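
The restart-and-status checks above can be rolled into a single pass. This is only a sketch: it assumes the SysV-style `service` wrapper used in this thread and that each init script's `status` action exits non-zero when the service is down (the LSB convention). The function name `restart_and_check` is made up for illustration.

```shell
#!/bin/sh
# Restart each named service, then report whether its status check passes.
# Assumes the SysV-style `service` wrapper; run as root.
restart_and_check() {
    failed=0
    for svc in "$@"; do
        service "$svc" restart >/dev/null 2>&1
        if service "$svc" status >/dev/null 2>&1; then
            echo "$svc: running"
        else
            echo "$svc: NOT running"
            failed=1
        fi
    done
    # Non-zero return if any service failed its status check.
    return "$failed"
}

# Example: restart_and_check nagios crond ndo2db
```

The return code makes it easy to tell at a glance whether anything in the list is down before digging through nagios.log.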

Post by snapon_admin »

Code: Select all

[root@lisl-ngos-01-pv conf]# service nagios restart
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.
[root@lisl-ngos-01-pv conf]# service nagios status
nagios (pid 11371) is running...
[root@lisl-ngos-01-pv conf]# service crond restart
Stopping crond:                                            [  OK  ]
Starting crond:                                            [  OK  ]
[root@lisl-ngos-01-pv conf]# service crond status
crond (pid  13208) is running...
[root@lisl-ngos-01-pv conf]# service ndo2db restart
Stopping ndo2db: head: cannot open `/usr/local/nagios/var/ndo2db.lock' for reading: No such file or directory
done.
Starting ndo2db: done.
[root@lisl-ngos-01-pv conf]# service ndo2db status
ndo2db (pid 15870) is running...
[root@lisl-ngos-01-pv conf]# tail -100 /usr/local/nagios/var/nagios.log
[1400178542] Warning: Duplicate definition found for service 'Load' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 380)
[1400178542] Warning: Duplicate definition found for service 'ldap process' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 364)
[1400178542] Warning: Duplicate definition found for service 'ldap client service' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 348)
[1400178542] Warning: Duplicate definition found for service 'inetd service' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 332)
[1400178542] Warning: Duplicate definition found for service 'inetd process' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 316)
[1400178542] Warning: Duplicate definition found for service 'gss service' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 300)
[1400178542] Warning: Duplicate definition found for service 'fmd service' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 284)
[1400178542] Warning: Duplicate definition found for service 'fmd process' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 268)
[1400178542] Warning: Duplicate definition found for service 'Faults' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 252)
[1400178542] Warning: Duplicate definition found for service 'ctmagent7 service' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 236)
[1400178542] Warning: Duplicate definition found for service 'ctmagent7 process' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 220)
[1400178542] Warning: Duplicate definition found for service 'CPU Stats' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 204)
[1400178542] Warning: Duplicate definition found for service 'bind service' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 188)
[1400178542] Warning: Duplicate definition found for service 'bind process' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 172)
[1400178542] Warning: Duplicate definition found for service 'autofs service' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 156)
[1400178542] Warning: Duplicate definition found for service 'autofs process' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 140)
[1400178542] Warning: Duplicate definition found for service '/workspace Disk Usage' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 124)
[1400178542] Warning: Duplicate definition found for service '/vendor/quest Disk Usage' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 108)
[1400178542] Warning: Duplicate definition found for service '/vendor/nbadmin Disk Usage' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 93)
[1400178542] Warning: Duplicate definition found for service '/vendor/nagios Disk Usage' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 77)
[1400178542] Warning: Duplicate definition found for service '/vendor/ctmlogs Disk Usage' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 61)
[1400178542] Warning: Duplicate definition found for service '/vendor/ctmagent7 Disk Usage' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 45)
[1400178542] Warning: Duplicate definition found for service '/export/home Disk Usage' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 30)
[1400178542] Warning: Duplicate definition found for service '/ Disk Usage' on host 'kenapps04g' (config file '/usr/local/nagios/etc/services/kenapps04g.cfg', starting on line 14)
[1400178542] Warning: Service 'Interface list' on host 'AlgonaIA-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'ArndellParkNSW-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'CalgaryAB-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'CarsonCityNV-RC-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'CarsonCityNV-VC-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'CityofIndustryCA-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'ColumbusGA-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'ConwayAR-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'CorkIE-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'CrystalLakeIL-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'ElizabethtonTN-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'ElkmontAL-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'HarrisburgPA-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'KenoshaWI-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'KingsLynnGB-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'LibertyvilleIL-Core-A'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'LibertyvilleIL-Core-B'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'LincolnshireIL-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'MilwaukeeWI-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'MississaugaON-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'MurphyNC-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'NewmarketON-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'OliveBranchMS-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'RobesoniaPA-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'RochesterHillsMI-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'SanJoseCA-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Client Connections' on host 'Solsrp07' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'Interface list' on host 'ThroopPA-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'TlalnepantlaMX-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service 'Interface list' on host 'UnterneukirchenDE-Core'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1400178542] Warning: Service '/ Disk Usage' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service '/vendor/ctmagent7 Disk Usage' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service '/vendor/ctmlogs Disk Usage' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service '/vendor/nagios Disk Usage' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service '/vendor/quest Disk Usage' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service '/workspace Disk Usage' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'Ping' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'Swap Usage' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'Total Processes' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'Users' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'autofs process' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'autofs service' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'bind process' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'bind service' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'ctmagent process' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'ctmagent7 service' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'gss service' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'inetd process' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'inetd service' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'ldap client service' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'ldap process' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'local filesystem service' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'name-service-cache service' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'nfs client service' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'nfs status service' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'sendmail process' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'sendmail-client service' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'ssh process' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'ssh service' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'system-log service' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'tcp service' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Service 'utmp service' on host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Host 'kendbms12t on kenapps47g' has no default contacts or contactgroups defined!
[1400178542] Warning: Host 'kengrid01p' has no default contacts or contactgroups defined!
[1400178546] Successfully launched command file worker with pid 11415
[1400178560] HOST ALERT: KingsLynnGB-Sentry;UP;HARD;1;OK - 10.160.250.1: rta 118.520ms, lost 0%
[1400178560] HOST ALERT: KingsLynnGB-Core;UP;HARD;1;OK - 10.160.19.2: rta 117.450ms, lost 0%
[1400178561] HOST ALERT: LidkopingSE-MPLS;UP;HARD;1;OK - 10.173.2.1: rta 146.847ms, lost 0%
[1400178568] ndomod: Error writing to data sink!  Some output may get lost...
[1400178568] ndomod: Please check remote ndo2db log, database connection or SSL Parameters
[1400178571] SERVICE ALERT: MississaugaON-MPLS;Outside Bandwidth;CRITICAL;SOFT;3;CRITICAL - Current BW in: 3484.81Kbps Out: 412.68Kbps
[1400178571] HOST ALERT: MississaugaON-MPLS;UP;HARD;1;OK - 10.94.19.1: rta 189.440ms, lost 0%
[1400178584] ndomod: Successfully reconnected to data sink!  0 items lost, 2156 queued items to flush.
[1400178584] ndomod: Successfully flushed 2156 queued items to data sink.
[1400178584] HOST ALERT: SelangorMY-Sentry;UP;HARD;1;OK - 10.145.213.1: rta 256.611ms, lost 0%
[1400178595] HOST ALERT: olive-branch-ip;UP;HARD;1;OK - 10.39.253.100: rta 45.906ms, lost 0%
[root@lisl-ngos-01-pv conf]#
We don't use mod_gearman.
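
Most of the tail above is configuration warnings logged at startup, which makes the runtime entries (alerts and the ndomod messages) hard to spot. A small sketch of a filter that drops those warnings; the function name `runtime_lines` is hypothetical, and the pattern assumes the `[epoch] Warning:` prefix seen in the log output above.

```shell
#!/bin/sh
# Drop "[<epoch>] Warning: ..." lines from a Nagios log read on stdin,
# leaving alerts, ndomod messages, and other runtime entries.
runtime_lines() {
    grep -v '^\[[0-9]*] Warning: '
}

# Example: tail -100 /usr/local/nagios/var/nagios.log | runtime_lines
```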

Post by slansing »

It looks like ndo2db may not have started. What is the output of:

Code: Select all

ll /usr/local/nagios/var/ndo2db.lock
Are you seeing a running engine and running graph in the web interface now?
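
The lock file existing does not by itself show that ndo2db is alive. The init script reads a PID from it with `head` (see the error in the restart output earlier), so it may also be worth checking that the PID still belongs to a running process. A sketch, assuming the PID is on the file's first line; `lock_pid_alive` is a made-up helper name.

```shell
#!/bin/sh
# Succeeds if FILE's first line is the PID of a live process.
# `kill -0` sends no signal; it only tests whether the PID can be signalled,
# so run this as root (as in the thread) or as the process owner.
lock_pid_alive() {
    file=$1
    [ -r "$file" ] || { echo "$file: missing or unreadable"; return 2; }
    pid=$(head -n 1 "$file" | tr -d '[:space:]')
    case "$pid" in
        ''|*[!0-9]*) echo "$file: no PID on first line"; return 2 ;;
    esac
    if kill -0 "$pid" 2>/dev/null; then
        echo "pid $pid is alive"
    else
        echo "pid $pid is not running (stale lock?)"
        return 1
    fi
}

# Example: lock_pid_alive /usr/local/nagios/var/ndo2db.lock
```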

Post by snapon_admin »

Code: Select all

[root@lisl-ngos-01-pv conf]# ll /usr/local/nagios/var/ndo2db.lock
-rw-r--r--. 1 nagios nagios 6 May 15 13:29 /usr/local/nagios/var/ndo2db.lock
You have new mail in /var/spool/mail/root
[root@lisl-ngos-01-pv conf]#
Nope, still not running.

Post by snapon_admin »

Ummm, I just noticed the same thing happened on our test server: the monitoring engine is stopped and can't be started there either. Our test server has a grand total of about 2 servers being monitored on it at the moment, so I'm not sure what the deal is here. I saw another post about the config just sitting there taking forever, which mine does as well. I wonder if his monitoring engine is stopped too?

Post by tmcdonald »

I think you'll like what I have to say here: http://support.nagios.com/forum/viewtop ... 335#p98335
tmcdonald wrote: And adding on to what snapon said, I am guessing you can't manually force an immediate check of a service/host?

Good news: abrist and swilkerson are working on this right now and things are looking promising. I had a ticket earlier today that prompted this, so we can confirm that this is a known issue with a possible fix on the way.

I wasn't able to hear the full conversation since I was on the phone, but it is related to how you have SSL configured. I am guessing you both force SSL?

Post by snapon_admin »

Hahaha, I just replied to that, and you are correct, sir. I will keep an eye out for the fix, and thanks for the help as always!

Post by tmcdonald »

Unzip the attached file and place the extracted file in /usr/local/nagiosxi/html/includes/

Post by tylergates_ats »

I'm running Nagios XI 2014r1.3. I have tried the utils-backend file posted here, but the 'Process Info' page still shows errors, and I still get the PHP division-by-zero warnings after forcing HTTPS in Apache.

Does anyone else have a solution to force HTTPS in Nagios XI without encountering these errors?

To reproduce, I modify nagiosxi.conf to include these directives:

Code: Select all

RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule (.*) https://%{SERVER_NAME}%{REQUEST_URI}
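
One way to confirm what these directives do on a given server is to request the plain-HTTP URL and see whether, and where, Apache redirects it. A sketch, assuming `curl` is installed; the helper name `redirect_target` and the `/nagiosxi/` path in the example are illustrative, not taken from this thread.

```shell
#!/bin/sh
# Print the Location header (if any) returned for a URL, without following it.
# Assumes curl is available; -s silences progress, -I sends a HEAD request.
redirect_target() {
    curl -sI "$1" | tr -d '\r' | awk 'tolower($1) == "location:" { print $2 }'
}

# Example: redirect_target http://localhost/nagiosxi/
```

Empty output means the server returned no redirect for that URL.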