Monitoring Engine cannot running

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
hhnagiosxi
Posts: 11
Joined: Sun May 20, 2018 10:28 pm

Monitoring Engine cannot running

Post by hhnagiosxi »

The monitoring engine will not run.


[root@nagiosxi libexec]# /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.4.5
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2019-08-20
License: GPL

Website: https://www.nagios.org
Nagios 4.4.5 starting... (PID=5619)
Local time is Sat Jul 25 10:53:03 HKT 2020
wproc: Successfully registered manager as @wproc with query handler
wproc: Registry request: name=Core Worker 5620;pid=5620
wproc: Registry request: name=Core Worker 5624;pid=5624
wproc: Registry request: name=Core Worker 5622;pid=5622
wproc: Registry request: name=Core Worker 5623;pid=5623
wproc: Registry request: name=Core Worker 5621;pid=5621
wproc: Registry request: name=Core Worker 5625;pid=5625
Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
Warning: Duplicate definition found for service 'HTTP' on host 'hhcwf02.hiphing.com.hk' (config file '/usr/local/nagios/etc/services/hhcwf02.hiphing.nws.cfg', starting on line 90)
Warning: Service '/dev/mapper/efilingstr1_vg_data-lv_home' on host 'efilingstr.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive C: Disk Usage' on host 'hhcwf01.hiphing.com.hk' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive D: Disk Usage' on host 'hhcwf01.hiphing.com.hk' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive C: Disk Usage' on host 'hhcwf02.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive D: Disk Usage' on host 'hhcwf02.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive E: Disk Usage' on host 'hhcwf02.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive C: Disk Usage' on host 'hhcwf03.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Unique Temp Serial Number' on host 'hhdb.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Valid Employee Master' on host 'hhdb.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'MSSQL Query - Sync Agent' on host 'hhdb03.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'MSSQL Query - Sync Jobs transaction' on host 'hhdb03.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Check win scheduled jobs' on host 'hhdmscom01.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive C: Disk Usage' on host 'hhdmscom01.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive D: Disk Usage' on host 'hhdmscom01.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive C: Disk Usage' on host 'hhdmsdb.hiphing.com.hk' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive D: Disk Usage' on host 'hhdmsdb.hiphing.com.hk' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive E: Disk Usage' on host 'hhdmsdb.hiphing.com.hk' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Check win scheduled jobs' on host 'hhdmsdb01.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive C: Disk Usage' on host 'hhdmsdb01.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive D: Disk Usage' on host 'hhdmsdb01.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive E: Disk Usage' on host 'hhdmsdb01.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive C: Disk Usage' on host 'hhdmsdb02.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive D: Disk Usage' on host 'hhdmsdb02.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive E: Disk Usage' on host 'hhdmsdb02.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive C: Disk Usage' on host 'hhdmsweb05.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive C: Disk Usage' on host 'hhdmsweb06.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive C: Disk Usage' on host 'hhdmsweb07.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive C: Disk Usage' on host 'hhtender.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive D: Disk Usage' on host 'hhtender.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive C: Disk Usage' on host 'intranetpreprod.hiphing.com.hk' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'Drive C: Disk Usage' on host 'peoplebatchprd.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Successfully launched command file worker with pid 563
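The "notification interval less than its check interval" warnings above are not fatal, but there are many of them. As a rough sketch (the function name `interval_warning_hosts` is hypothetical, not part of Nagios), the startup output can be piped through a small filter that counts these warnings per host, which shows where the service definitions need attention:

```shell
# Count "notification interval less than its check interval" warnings per host.
# Reads Nagios startup output (or nagios.log lines) on stdin and prints
# "<host> <count>" sorted by host name.
# Assumes the message format shown above: ... on host '<host>' has a ...
interval_warning_hosts() {
    awk -F"'" '
        /has a notification interval less than its check interval/ {
            # Fields split on single quotes: $2 = service name, $4 = host name
            count[$4]++
        }
        END { for (h in count) print h, count[h] }
    ' | sort
}
```

For example, `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg 2>&1 | interval_warning_hosts` would run the filter over the output of a configuration check instead of a live start.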
hhnagiosxi
Posts: 11
Joined: Sun May 20, 2018 10:28 pm

Re: Monitoring Engine cannot running

Post by hhnagiosxi »

One of the major errors I see in nagios.log is "Error writing to data sink", and there are also some crond.service errors.

[root@nagiosxi /]# tail -f /usr/local/nagios/var/nagios.log
[1595837016] Warning: Service 'Drive D: Disk Usage' on host 'hhdmsdb02.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1595837016] Warning: Service 'Drive E: Disk Usage' on host 'hhdmsdb02.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1595837016] Warning: Service 'Drive C: Disk Usage' on host 'hhdmsweb05.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1595837016] Warning: Service 'Drive C: Disk Usage' on host 'hhdmsweb06.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1595837016] Warning: Service 'Drive C: Disk Usage' on host 'hhdmsweb07.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1595837016] Warning: Service 'Drive C: Disk Usage' on host 'hhtender.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1595837016] Warning: Service 'Drive D: Disk Usage' on host 'hhtender.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1595837016] Warning: Service 'Drive C: Disk Usage' on host 'intranetpreprod.hiphing.com.hk' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1595837016] Warning: Service 'Drive C: Disk Usage' on host 'peoplebatchprd.hiphing.nws' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
[1595837016] Successfully launched command file worker with pid 29804
[1595837033] ndomod: Successfully reconnected to data sink! 2107 items lost, 5000 queued items to flush.
[1595837033] ndomod: Error writing to data sink! Some output may get lost. 5000 queued items to flush.
[1595837050] ndomod: Successfully reconnected to data sink! 74 items lost, 5000 queued items to flush.
[1595837050] ndomod: Error writing to data sink! Some output may get lost. 5000 queued items to flush.
[1595837066] ndomod: Successfully reconnected to data sink! 38 items lost, 5000 queued items to flush.
[1595837066] ndomod: Error writing to data sink! Some output may get lost. 5000 queued items to flush.
[1595837082] ndomod: Successfully reconnected to data sink! 63 items lost, 5000 queued items to flush.
[1595837082] ndomod: Error writing to data sink! Some output may get lost. 5000 queued items to flush.
[1595837098] ndomod: Successfully reconnected to data sink! 39 items lost, 5000 queued items to flush.
[1595837098] ndomod: Error writing to data sink! Some output may get lost. 5000 queued items to flush.
[1595837114] ndomod: Successfully reconnected to data sink! 63 items lost, 5000 queued items to flush.
[1595837114] ndomod: Error writing to data sink! Some output may get lost. 5000 queued items to flush.
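The reconnect/error pairs above repeat roughly every 16 seconds, each time reporting lost items. A minimal sketch for quantifying that loop, assuming the ndomod message format shown in this log (the function name `ndomod_lost_summary` is made up for illustration):

```shell
# Summarize ndomod reconnect messages from nagios.log.
# Reads log lines on stdin and prints "<reconnect count> <total items lost>".
ndomod_lost_summary() {
    awk '
        /ndomod: Successfully reconnected to data sink!/ {
            # Find the number that directly precedes the words "items lost,"
            for (i = 1; i < NF; i++) {
                if ($(i + 1) == "items" && $(i + 2) == "lost,") {
                    reconnects++
                    lost += $i
                }
            }
        }
        END { printf "%d %d\n", reconnects, lost }
    '
}
```

It could be run as `ndomod_lost_summary < /usr/local/nagios/var/nagios.log`. A steadily growing count suggests ndomod keeps reconnecting to ndo2db without ever flushing its queue.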

[root@nagiosxi /]# service crond status
Redirecting to /bin/systemctl status crond.service
● crond.service - Command Scheduler
Loaded: loaded (/usr/lib/systemd/system/crond.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2020-07-17 22:00:55 HKT; 1 weeks 2 days ago
Main PID: 800 (crond)
CGroup: /system.slice/crond.service
└─800 /usr/sbin/crond -n

Jul 27 16:19:01 nagiosxi crond[5090]: PAM unable to dlopen(/usr/lib64/security/pam_oddjob_mkhomedir.so): /usr...ctory
Jul 27 16:19:01 nagiosxi crond[5092]: PAM unable to dlopen(/usr/lib64/security/pam_oddjob_mkhomedir.so): /usr...ctory
Jul 27 16:19:01 nagiosxi crond[5092]: PAM adding faulty module: /usr/lib64/security/pam_oddjob_mkhomedir.so
Jul 27 16:19:01 nagiosxi crond[5090]: PAM adding faulty module: /usr/lib64/security/pam_oddjob_mkhomedir.so
Jul 27 16:19:01 nagiosxi crond[5085]: PAM unable to dlopen(/usr/lib64/security/pam_oddjob_mkhomedir.so): /usr...ctory
Jul 27 16:19:01 nagiosxi crond[5085]: PAM adding faulty module: /usr/lib64/security/pam_oddjob_mkhomedir.so
Jul 27 16:19:01 nagiosxi crond[5088]: PAM unable to dlopen(/usr/lib64/security/pam_oddjob_mkhomedir.so): /usr...ctory
Jul 27 16:19:01 nagiosxi crond[5088]: PAM adding faulty module: /usr/lib64/security/pam_oddjob_mkhomedir.so
Jul 27 16:19:01 nagiosxi crond[5082]: PAM unable to dlopen(/usr/lib64/security/pam_oddjob_mkhomedir.so): /usr...ctory
Jul 27 16:19:01 nagiosxi crond[5082]: PAM adding faulty module: /usr/lib64/security/pam_oddjob_mkhomedir.so
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Monitoring Engine cannot running

Post by benjaminsmith »

Hi,

Let's get a system profile so we can review all of the logs for you. Please follow the instructions below.

To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and share it in a private message or upload it to the post/ticket, then reply to this post to bring it up in the queue.

Then try restarting the whole stack and clearing the kernel message queues, and let me know if you notice any improvement. The following commands are for CentOS/RHEL 7+ and may have to be adjusted for other distributions.

Code:

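# Stop cron and the Nagios XI services
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
# Kill any processes still running as the nagios user
pkill -9 -u nagios
# Remove leftover kernel message queues owned by nagios
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q "$i"; done
# Remove stale lock files
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
# Restart the database first, then bring the stack back up
systemctl restart mariadb
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond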
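
After running the sequence above, it can help to confirm that each service actually came back. The following is a sketch only: the function name `check_stack` and the `SYSTEMCTL` override variable are assumptions added for illustration, and the service list simply mirrors the commands above.

```shell
# Report the state of each service from the restart sequence.
# Prints "<service>: <state>" per line and returns non-zero if any
# service is not active. SYSTEMCTL is a hypothetical override that lets
# the function be exercised without a real systemd.
check_stack() {
    rc=0
    for svc in mariadb ndo2db nagios npcd crond; do
        # "systemctl is-active" prints the state and exits non-zero
        # when the unit is not active
        state=$("${SYSTEMCTL:-systemctl}" is-active "$svc" 2>/dev/null) || true
        [ -n "$state" ] || state=unknown
        printf '%s: %s\n' "$svc" "$state"
        [ "$state" = "active" ] || rc=1
    done
    return "$rc"
}
```

On a system where the database has been offloaded to another machine, the local mariadb entry may legitimately be inactive.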

Lastly, how long has this been happening, and have you made any changes lately?

Thanks,
Benjamin
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
hhnagiosxi
Posts: 11
Joined: Sun May 20, 2018 10:28 pm

Re: Monitoring Engine cannot running

Post by hhnagiosxi »

Thanks for your reply. I have uploaded the profile for your review. In the meantime I will try the commands above and report the results back to you.

Moderator's Note: The profile has been shared with the support team but has been removed from the public forum.
hhnagiosxi
Posts: 11
Joined: Sun May 20, 2018 10:28 pm

Re: Monitoring Engine cannot running

Post by hhnagiosxi »

This issue has been occurring since a yum update. I followed your colleague's instructions in the thread below to reinstall, which fixed my missing ndo2db issue.

https://support.nagios.com/forum/viewto ... 16&t=59428

I ran your commands and the Nagios monitoring engine is running now. However, there are no notifications (see the attachment), and when I create a new host and service in the Nagios Core Config Manager, it does not show up in BBmap.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Monitoring Engine cannot running

Post by benjaminsmith »

Hi,

Thanks for the profile. The issue is database connectivity.

Code:

Jul 28 18:12:11 nagiosxi ndo2db: Error: Could not connect to MySQL database: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
Jul 28 18:12:11 nagiosxi ndo2db: Error: Could not connect to MySQL database: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)

Since the database is offloaded, the database log is not in the profile. Can you fetch it from the remote server and post it to the thread? Also, verify that the database service is running on the remote machine.
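
The `(2)` at the end of the ndo2db error is the operating system error number, which on Linux means "No such file or directory": ndo2db is looking for a local socket file that does not exist. A small sketch for checking this from the log, assuming the error format shown above (the function name `check_mysql_sockets` is hypothetical):

```shell
# Extract MySQL socket paths from ndo2db error lines on stdin and report
# whether each socket file exists on this machine.
# Prints "<path>: present" or "<path>: missing" once per distinct path.
check_mysql_sockets() {
    sed -n "s/.*through socket '\([^']*\)'.*/\1/p" | sort -u |
    while IFS= read -r sock; do
        if [ -S "$sock" ]; then
            echo "$sock: present"
        else
            echo "$sock: missing"
        fi
    done
}
```

For example, `grep ndo2db /var/log/messages | check_mysql_sockets`. A "missing" result on a server whose database lives on another machine suggests ndo2db is still configured for a local connection.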

Lastly, for a smaller system like this, I would recommend keeping the database local as network bottlenecks are probably greater than the server resources required from the database service.

Benjamin
hhnagiosxi
Posts: 11
Joined: Sun May 20, 2018 10:28 pm

Re: Monitoring Engine cannot running

Post by hhnagiosxi »

Hi BenJaminsmith,

Thanks for your great help.
The monitoring engine seems to be OK now after I edited /usr/local/nagios/etc/ndo2db.cfg and pointed db_host at my remote database server. I have uploaded the latest profile for your review in case it still shows any major errors.
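
For anyone hitting the same symptom, a quick way to see where ndo2db is pointed is to read db_host back out of the config file. This is only a sketch: the function names are invented for illustration, and it relies on the general MySQL client behavior that a host of "localhost" means a local socket connection while any other value means TCP.

```shell
# Print the last db_host value set in the given ndo2db.cfg,
# skipping commented-out lines and trimming trailing whitespace.
ndo2db_db_host() {
    sed -n 's/^[[:space:]]*db_host[[:space:]]*=[[:space:]]*//p' "$1" |
        sed 's/[[:space:]]*$//' | tail -n 1
}

# Classify the connection type implied by db_host:
#   "unset"         no db_host line found
#   "local-socket"  db_host=localhost (client uses the Unix socket)
#   "tcp:<host>"    anything else (client connects over the network)
ndo2db_db_kind() {
    host=$(ndo2db_db_host "$1")
    case "$host" in
        "")        echo "unset" ;;
        localhost) echo "local-socket" ;;
        *)         echo "tcp:$host" ;;
    esac
}
```

Running `ndo2db_db_kind /usr/local/nagios/etc/ndo2db.cfg` on a server with an offloaded database should print `tcp:` followed by the remote host; `local-socket` would match the socket error seen earlier in the thread.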

As for your recommendation, is it difficult to move the database back from the remote MySQL server to the local server?
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Monitoring Engine cannot running

Post by benjaminsmith »

Hi,

Glad to be of service. That looks better. I did notice this error in the messages log; however, it may just be a setting with the display hardware.

Code:

Jul 29 18:53:11 nagiosxi kernel: IN-world:IN=enp3s0f0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:50:56:bd:f1:49:08:00 SRC=10.13.0.161 DST=10.13.0.255 LEN=229 TOS=0x00 PREC=0x00 TTL=128 ID=63394 PROTO=UDP SPT=138 DPT=138 LEN=209 
Jul 29 18:53:11 nagiosxi kernel: [drm:radeon_vga_detect [radeon]] *ERROR* VGA-1: probed a monitor but no|invalid EDID

As for moving the database back to the local server, it's just the reverse of the offloading process. See the steps in the guide below:
Offloading MySQL to Remote Server

If you want to run through the steps on a test server first, that would be recommended. With your Nagios XI license, you're entitled to 3 activations: production, test, and backup.

Ben