Nagios Was down

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
vinayvemula
Posts: 18
Joined: Mon Jun 03, 2019 2:09 am

Nagios Was down

Post by vinayvemula »

Hi Team,
Our Nagios XI engine is down and we are unable to open the Nagios XI GUI.
The output of the pcs status command is below:

[root@******** services]# pcs status
Cluster name: nagioscluster
Stack: corosync
Current DC: ************ (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Fri May 29 15:18:16 2020
Last change: Fri May 29 12:18:43 2020 by root via cibadmin on *************

2 nodes configured
9 resources configured (2 DISABLED)

Online: [ ********.rwest.local ********.rwest.local ]

Full list of resources:

Master/Slave Set: DrbdDataClone [DrbdData]
Masters: [ *********** ]
Slaves: [ *********** ]
Resource Group: g_nagios
p_fs_drbd (ocf::heartbeat:Filesystem): Started **********
p_mysql (ocf::heartbeat:mysql): Stopped (disabled)
p_nagios (systemd:nagios): Stopped
p_ndo2db (systemd:ndo2db): Stopped
p_npcd (lsb:npcd): Stopped
p_httpd (ocf::heartbeat:apache): Stopped
p_crond (systemd:crond): Stopped

Failed Resource Actions:
* p_nagios_start_0 on ******** 'not running' (7): call=44, status=complete, exitreason='',
last-rc-change='Fri May 29 12:15:18 2020', queued=0ms, exec=2034ms
* p_fs_drbd_start_0 on ************* 'unknown error' (1): call=40, status=complete, exitreason='Couldn't mount device [/dev/drbd0] as /drbd',
last-rc-change='Fri May 29 12:15:25 2020', queued=0ms, exec=48ms

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

------------------------------------------------------------------------------------------------------------------------------------------------------------
All of the Nagios services are down:
mariadb, crond, ndo2db, nagios, npcd, httpd
When we tried starting the services one by one, httpd and npcd started, but
we were unable to start the nagios service and got the error below:

[root@****** services]# systemctl start nagios
Job for nagios.service failed because the control process exited with error code. See "systemctl status nagios.service" and "journalctl -xe" for details.

Could you please help us? This is our new Nagios instance, to which we are migrating our old monitoring.
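
For anyone triaging similar output, here is a minimal sketch of how the stopped resources and failed actions could be pulled out of `pcs status` text. The function names `stopped_resources` and `failed_actions` are hypothetical helpers, not part of pcs. They assume the line layout shown in the output above and may need adjusting for other pcs versions.

```shell
#!/bin/sh
# Hypothetical helpers (not part of pcs): both read `pcs status` text on stdin.

# Print the name of every resource reported as "Stopped",
# including those marked "Stopped (disabled)".
stopped_resources() {
    awk '/: Stopped( |$)/ { print $1 }'
}

# Print the name of every entry under "Failed Resource Actions",
# i.e. lines of the form "* <action> on <node> ...".
failed_actions() {
    awk '$1 == "*" && $3 == "on" { print $2 }'
}

# Example usage on a cluster node (run as root):
#   pcs status | stopped_resources
#   pcs status | failed_actions
```

Against the output posted above, this would list p_mysql, p_nagios, p_ndo2db, p_npcd, p_httpd and p_crond as stopped, and p_nagios_start_0 and p_fs_drbd_start_0 as failed actions. The failed mount of /dev/drbd0 is probably worth checking first, since the rest of the g_nagios group is ordered after p_fs_drbd.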
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Nagios Was down

Post by benjaminsmith »

Hi @vinayvemula,

Is this the production server that is down right now or are you currently setting up a test server for migration? After you restarted Apache, are you able to pull up the GUI? If you're able to get in, it would help to get a system profile for troubleshooting and reviewing the logs.

To download a profile from the GUI, go to Admin > System Profile > Download Profile. Otherwise, you can generate this from the command line by running the following:

Code: Select all

/usr/local/nagiosxi/scripts/components/getprofile.sh support
# The profile will be in the following directory:
/usr/local/nagiosxi/var/components/profile.zip 
Lastly, can you tell us more about your environment? It looks like you have Nagios XI set up as a cluster?

Thanks.
Benjamin
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
vinayvemula
Posts: 18
Joined: Mon Jun 03, 2019 2:09 am

Re: Nagios Was down

Post by vinayvemula »

Hi Team,

This is a production server. We have completed the migration and are about to go live next week. It was running fine until the 28th of this month, when the issue started.
Yes, the setup is a DRBD cluster. After the issue started, we tried to bring it into standalone mode and tried restarting it, but with no luck.

Please find attached profile zip.

Moderator's Note: The profile has been shared with the support team but has been removed from the public forum.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Nagios Was down

Post by benjaminsmith »

Hi @vinayvemula,

Thanks for sending over the system profile. Let's try to get the Nagios service started first. Please bring it into standalone operation (without DRBD). Log in to the shell as root and run the following:

Code: Select all

/usr/local/nagiosxi/scripts/reset_config_perms.sh
Next, run the following command to check the configuration files for errors, and post the output.

Code: Select all

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If it clears without errors, try to start the nagios service again and post any error messages along with journalctl -xe output. Lastly, what operating system are you using?
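
The steps above could be chained roughly as follows. This is a sketch only: the paths are the ones quoted in this post, while the wrapper name `start_nagios_checked`, the overridable variables, and the return codes are assumptions made for illustration. It also assumes that reset_config_perms.sh exits non-zero on failure.

```shell
#!/bin/sh
# Paths as quoted in the post; each can be overridden from the environment.
RESET_PERMS="${RESET_PERMS:-/usr/local/nagiosxi/scripts/reset_config_perms.sh}"
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# Hypothetical wrapper: run the steps in order and stop at the first failure.
# Return codes (illustrative): 1 = permission reset failed,
# 2 = config verification failed, 3 = service failed to start.
start_nagios_checked() {
    "$RESET_PERMS" || { echo "reset_config_perms failed" >&2; return 1; }

    # Verify the configuration before attempting a start.
    if ! "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        echo "config check failed; not starting nagios" >&2
        return 2
    fi

    # Start the service; on failure, show recent journal entries to post.
    if ! systemctl start nagios; then
        journalctl -xe --no-pager | tail -n 50
        return 3
    fi
}

# Example usage (as root): start_nagios_checked
```

Stopping at the config check keeps a broken configuration from producing a second, less informative systemd failure.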

Thanks.
Benjamin