Services down

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
vmwareguy
Posts: 69
Joined: Wed Mar 16, 2016 9:41 am

Services down

Post by vmwareguy »

I came into work this morning and found the following services down:

Monitoring Engine
Performance Grapher
Database Backend

When I run service nagios status I get this:

Redirecting to /bin/systemctl status nagios.service
● nagios.service - Nagios Core 4.4.2
Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2019-11-14 03:36:04 EST; 8min ago
Docs: https://www.nagios.org/documentation
Process: 32110 ExecStopPost=/usr/bin/rm -f /usr/local/nagios/var/rw/nagios.cmd (code=exited, status=0/SUCCESS)
Process: 32107 ExecStop=/usr/bin/kill -s TERM ${MAINPID} (code=exited, status=0/SUCCESS)
Process: 301 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
Process: 300 ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
Main PID: 305 (nagios)
CGroup: /system.slice/nagios.service
├─ 305 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
├─ 306 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─ 307 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─ 308 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─ 309 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─ 340 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
├─15812 /usr/local/nagios/libexec/check_nrpe -H 214.16.218.7 -t 30 -c check_cpu_stats -a -w 85 -c 95
└─15854 /usr/local/nagios/libexec/check_nrpe -H 160.107.103.151 -t 30 -c check_cpu_stats -a -w 85 -c 95


Nov 14 03:43:35 ServerName sudo[14470]: pam_unix(sudo:auth): conversation failed
Nov 14 03:43:35 ServerName sudo[14470]: pam_unix(sudo:auth): auth could not identify password for [nagios]
Nov 14 03:45:10 ServerName sudo[17615]: pam_unix(sudo:auth): conversation failed
Nov 14 03:45:10 ServerName sudo[17615]: pam_unix(sudo:auth): auth could not identify password for [nagios]


I'm not sure what's going on with these services. Any help would be great.
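The repeating pam_unix(sudo:auth) lines in the status output above are worth counting before digging further. As a rough sketch only (the function name count_sudo_auth_failures is made up for illustration, and it assumes log lines shaped like the ones pasted in this post), the following tallies the password failures per account from log text on stdin:

```shell
# Hypothetical helper: tally sudo password failures per account.
# Reads syslog/journal text on stdin and prints "account: count" lines.
count_sudo_auth_failures() {
    # Keep only the account name from lines such as:
    #   pam_unix(sudo:auth): auth could not identify password for [nagios]
    sed -n 's/.*pam_unix(sudo:auth): auth could not identify password for \[\([^]]*\)\].*/\1/p' \
        | sort | uniq -c | awk '{print $2 ": " $1}'
}
```

It could be fed from something like `journalctl -u nagios --since today --no-pager | count_sudo_auth_failures`. For the four log lines pasted above it prints `nagios: 2`, since only the "could not identify password" lines are counted.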
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Services down

Post by scottwilkerson »

Can you post the output of the following commands?

Code:

chage -l nagios
service crond status
tail -20 /var/log/cron
df -h
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
vmwareguy
Posts: 69
Joined: Wed Mar 16, 2016 9:41 am

Re: Services down

Post by vmwareguy »

scottwilkerson wrote: Can you show the output of the following

Code:

chage -l nagios
service crond status
tail -20 /var/log/cron
df -h
chage -l nagios
Last password change : Oct 17, 2018
Password expires : never
Password inactive : never
Account expires : never
Minimum number of days between password change : 0
Maximum number of days between password change : 99999
Number of days of warning before password expires : 7

service crond status
Redirecting to /bin/systemctl status crond.service
● crond.service - Command Scheduler
Loaded: loaded (/usr/lib/systemd/system/crond.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2019-11-14 04:16:09 EST; 40min ago
Main PID: 1433 (crond)
CGroup: /system.slice/crond.service
└─1433 /usr/sbin/crond -n

Nov 14 04:16:09 Servername systemd[1]: Started Command Scheduler.
Nov 14 04:16:09 Servername crond[1433]: (CRON) INFO (RANDOM_DELAY will be scaled with factor 88% i...ed.)
Nov 14 04:16:10 Servername crond[1433]: (root) Unauthorized SELinux context=unconfined_u:unconfine...oot)
Nov 14 04:16:10 Servername crond[1433]: (root) FAILED (loading cron table)
Nov 14 04:16:10 Servername crond[1433]: (CRON) INFO (running with inotify support)
Hint: Some lines were ellipsized, use -l to show in full.

tail -20 /var/log/cron
Nov 14 04:56:01 servername CROND[28580]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/deadpool.php >> /usr/local/nagiosxi/var/deadpool.log 2>&1)
Nov 14 04:56:01 servername CROND[28582]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php >> /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)
Nov 14 04:56:01 servername CROND[28588]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php >> /usr/local/nagiosxi/var/nom.log 2>&1)
Nov 14 04:56:01 servername CROND[28585]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/event_handler.php >> /usr/local/nagiosxi/var/event_handler.log 2>&1)
Nov 14 04:56:01 servername CROND[28586]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php >> /usr/local/nagiosxi/var/reportengine.log 2>&1)
Nov 14 04:56:01 servername CROND[28590]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php >> /usr/local/nagiosxi/var/cleaner.log 2>&1)
Nov 14 04:56:01 servername CROND[28591]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php >> /usr/local/nagiosxi/var/feedproc.log 2>&1)
Nov 14 04:56:01 servername CROND[28592]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php >> /usr/local/nagiosxi/var/sysstat.log 2>&1)
Nov 14 04:56:01 servername CROND[28587]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php >> /usr/local/nagiosxi/var/perfdataproc.log 2>&1)
Nov 14 04:56:01 servername CROND[28589]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php >> /usr/local/nagiosxi/var/eventman.log 2>&1)
Nov 14 04:57:01 servername CROND[29774]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php >> /usr/local/nagiosxi/var/cleaner.log 2>&1)
Nov 14 04:57:01 servername CROND[29778]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php >> /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)
Nov 14 04:57:01 servername CROND[29779]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php >> /usr/local/nagiosxi/var/perfdataproc.log 2>&1)
Nov 14 04:57:01 servername CROND[29782]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php >> /usr/local/nagiosxi/var/eventman.log 2>&1)
Nov 14 04:57:01 servername CROND[29784]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/event_handler.php >> /usr/local/nagiosxi/var/event_handler.log 2>&1)
Nov 14 04:57:01 servername CROND[29780]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/deadpool.php >> /usr/local/nagiosxi/var/deadpool.log 2>&1)
Nov 14 04:57:01 servername CROND[29776]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php >> /usr/local/nagiosxi/var/sysstat.log 2>&1)
Nov 14 04:57:01 servername CROND[29777]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php >> /usr/local/nagiosxi/var/feedproc.log 2>&1)
Nov 14 04:57:01 servername CROND[29781]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php >> /usr/local/nagiosxi/var/reportengine.log 2>&1)
Nov 14 04:57:01 servername CROND[29785]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php >> /usr/local/nagiosxi/var/nom.log 2>&1)


df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 16K 3.9G 1% /dev/shm
tmpfs 3.9G 9.2M 3.9G 1% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/mapper/vg-lv_root 28G 7.0G 20G 27% /
/dev/sda1 488M 189M 264M 42% /boot
/dev/mapper/vg-lv_home 14G 41M 13G 1% /home
/dev/mapper/vg-lv_tmp 1.2G 120M 954M 12% /tmp
/dev/mapper/vg-lv_var 3.1G 626M 2.3G 22% /var
/dev/mapper/vg-lv_opt 5.7G 957M 4.4G 18% /opt
/dev/mapper/vg-lv_log 3.1G 1.2G 1.7G 41% /var/log
/dev/mapper/vg-lv_audit 1.9G 42M 1.8G 3% /var/log/audit
tmpfs 783M 0 783M 0% /run/user/996
tmpfs 783M 0 783M 0% /run/user/1001
tmpfs 783M 0 783M 0% /run/user/1000
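None of the filesystems above is close to full, which rules out disk space as the cause. For anyone repeating this check on a larger host, here is a minimal sketch that flags mount points at or above a usage threshold. The function name filesystems_over and the 90% default are assumptions for illustration, and the input is expected to look like `df -P` (or the `df -h` listing above):

```shell
# Sketch: print mount points whose Use% is at or above a threshold.
# Reads df-style output on stdin; the first argument is the threshold (default 90).
filesystems_over() {
    limit="${1:-90}"
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                 # "27%" -> "27"
        if (use + 0 >= limit) print $6 " " use "%"
    }'
}
```

Typical use would be `df -P | filesystems_over 90`. With the listing above and the default threshold it prints nothing.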
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Services down

Post by scottwilkerson »

A few more:

Code:

getenforce
grep NAGIOSXI /etc/sudoers
grep nag /etc/group
su nagios
/usr/local/nagiosxi/scripts/manage_services.sh status nagios
tail -30 /usr/local/nagiosxi/var/sysstat.log
vmwareguy
Posts: 69
Joined: Wed Mar 16, 2016 9:41 am

Re: Services down

Post by vmwareguy »

scottwilkerson wrote: a few more

Code:

getenforce
grep NAGIOSXI /etc/sudoers
su nagios
/usr/local/nagiosxi/scripts/manage_services.sh status nagios

getenforce
Enforcing

grep NAGIOSXI /etc/sudoers
User_Alias NAGIOSXI=nagios
User_Alias NAGIOSXIWEB=apache
NAGIOSXI ALL = PASSWD:/etc/init.d/nagios start
NAGIOSXI ALL = PASSWD:/etc/init.d/nagios stop
NAGIOSXI ALL = PASSWD:/etc/init.d/nagios restart
NAGIOSXI ALL = PASSWD:/etc/init.d/nagios reload
NAGIOSXI ALL = PASSWD:/etc/init.d/nagios status
NAGIOSXI ALL = PASSWD:/etc/init.d/nagios checkconfig
NAGIOSXI ALL = PASSWD:/etc/init.d/ndo2db start
NAGIOSXI ALL = PASSWD:/etc/init.d/ndo2db stop
NAGIOSXI ALL = PASSWD:/etc/init.d/ndo2db restart
NAGIOSXI ALL = PASSWD:/etc/init.d/ndo2db reload
NAGIOSXI ALL = PASSWD:/etc/init.d/ndo2db status
NAGIOSXI ALL = PASSWD:/etc/init.d/npcd start
NAGIOSXI ALL = PASSWD:/etc/init.d/npcd stop
NAGIOSXI ALL = PASSWD:/etc/init.d/npcd restart
NAGIOSXI ALL = PASSWD:/etc/init.d/npcd reload
NAGIOSXI ALL = PASSWD:/etc/init.d/npcd status
NAGIOSXI ALL = PASSWD:/usr/bin/php /usr/local/nagiosxi/html/includes/components/autodiscovery/scripts/autodiscover_new.php *
NAGIOSXI ALL = PASSWD:/usr/local/nagiosxi/html/includes/components/profile/getprofile.sh
NAGIOSXI ALL = PASSWD:/usr/local/nagiosxi/scripts/upgrade_to_latest.sh
NAGIOSXI ALL = PASSWD:/usr/local/nagiosxi/scripts/change_timezone.sh
NAGIOSXI ALL = PASSWD:/usr/local/nagiosxi/scripts/manage_services.sh *
NAGIOSXI ALL = PASSWD:/usr/local/nagiosxi/scripts/reset_config_perms.sh
NAGIOSXI ALL = PASSWD:/usr/local/nagiosxi/scripts/manage_ssl_config.sh *
NAGIOSXI ALL = PASSWD:/usr/local/nagiosxi/scripts/backup_xi.sh *
NAGIOSXIWEB ALL = PASSWD:/usr/bin/tail -100 /var/log/messages
NAGIOSXIWEB ALL = PASSWD:/usr/bin/tail -100 /var/log/httpd/error_log
NAGIOSXIWEB ALL = PASSWD:/usr/bin/tail -100 /var/log/mysqld.log
NAGIOSXIWEB ALL = PASSWD:/usr/bin/php /usr/local/nagiosxi/html/includes/components/autodiscovery/scripts/autodiscover_new.php *
NAGIOSXIWEB ALL = PASSWD:/usr/local/nagiosxi/html/includes/components/profile/getprofile.sh
NAGIOSXIWEB ALL = PASSWD:/etc/init.d/snmptt restart
NAGIOSXIWEB ALL = PASSWD:/usr/local/nagiosxi/scripts/repair_databases.sh
NAGIOSXIWEB ALL = PASSWD:/usr/local/nagiosxi/scripts/manage_services.sh *
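
One detail in this listing stands out: every rule carries the PASSWD tag, which tells sudo to ask for the invoking user's password. A job that calls sudo without a terminal has no way to answer that prompt, which would be consistent with the "auth could not identify password for [nagios]" messages earlier in the thread, although nobody in the thread confirms this as the cause. As a hedged sketch (the function name rules_requiring_password is illustrative, and it only looks at lines for the NAGIOSXI and NAGIOSXIWEB aliases shown above), the rules that still demand a password can be listed like this:

```shell
# Sketch: list Nagios XI sudoers rules that still require a password.
# Reads sudoers text on stdin and prints the matching rule lines.
rules_requiring_password() {
    # "= PASSWD:" matches; "= NOPASSWD:" does not, because PASSWD must
    # come directly after the "=" (optionally separated by whitespace).
    awk '/^NAGIOSXI(WEB)?[ \t]+ALL[ \t]*=[ \t]*PASSWD:/'
}
```

It could be run as root with `cat /etc/sudoers | rules_requiring_password`. A quick way to reproduce the failure without waiting for cron should be `sudo -u nagios sudo -n /usr/local/nagiosxi/scripts/manage_services.sh status nagios`, where `-n` makes sudo fail instead of prompting. Any change to the file itself is best made through `visudo`.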


su nagios
no data to give



/usr/local/nagiosxi/scripts/manage_services.sh status nagios
nagios.service - Nagios Core 4.4.2
Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2019-11-14 04:22:25 EST; 43min ago
Docs: https://www.nagios.org/documentation
Process: 5761 ExecStopPost=/usr/bin/rm -f /usr/local/nagios/var/rw/nagios.cmd (code=exited, status=0/SUCCESS)
Process: 5759 ExecStop=/usr/bin/kill -s TERM ${MAINPID} (code=exited, status=0/SUCCESS)
Process: 5764 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
Process: 5762 ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
Main PID: 5765 (nagios)
CGroup: /system.slice/nagios.service
├─3769 /usr/local/nagios/libexec/check_nrpe -H 214.16.218.36 -t 30 -c check_init_service -a sshd
├─5765 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
├─5768 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─5769 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─5770 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─5771 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
└─5778 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Services down

Post by scottwilkerson »

vmwareguy wrote: getenforce
Enforcing

This is a problem: SELinux needs to be disabled for Nagios XI to function properly.

Run the following:

Code:

setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
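
Two things are worth keeping in mind with these commands. `setenforce 0` only switches the running system to permissive mode, and the `disabled` setting in the config file takes effect at the next boot. The `sed` expression also only rewrites a line that reads exactly `SELINUX=enforcing`, so a file already set to `permissive` would be left unchanged. A small sketch for checking what the file actually says afterwards (the function name selinux_configured_mode is an assumption for illustration):

```shell
# Sketch: report the persistent SELinux mode recorded in a config file.
# The first argument is the file to read (default /etc/selinux/config).
selinux_configured_mode() {
    config="${1:-/etc/selinux/config}"
    # Use the last uncommented SELINUX= line; SELINUXTYPE= does not match.
    sed -n 's/^SELINUX=\([a-z]*\).*$/\1/p' "$config" | tail -n 1
}
```

Running `selinux_configured_mode` next to `getenforce` shows whether the persistent setting and the running mode agree.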
vmwareguy
Posts: 69
Joined: Wed Mar 16, 2016 9:41 am

Re: Services down

Post by vmwareguy »

scottwilkerson wrote:
vmwareguy wrote: getenforce
Enforcing

This is a problem, SELinux needs to be disabled for Nagios XI to function properly

Run the following

Code:

setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
getenforce now reports Disabled, but the services still aren't starting.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Services down

Post by scottwilkerson »

You will likely need to reboot the XI server after doing this:

Code:

reboot
vmwareguy
Posts: 69
Joined: Wed Mar 16, 2016 9:41 am

Re: Services down

Post by vmwareguy »

scottwilkerson wrote: You likely will need to reboot the XI server after doing this

Code:

reboot

I have, twice :D
vmwareguy
Posts: 69
Joined: Wed Mar 16, 2016 9:41 am

Re: Services down

Post by vmwareguy »

run service nagios
bash: run: command not found
# service nagios status
Redirecting to /bin/systemctl status nagios.service
● nagios.service - Nagios Core 4.4.2
Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2019-11-14 05:22:23 EST; 8min ago
Docs: https://www.nagios.org/documentation
Process: 2547 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
Process: 2541 ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
Main PID: 2548 (nagios)
CGroup: /system.slice/nagios.service
├─2548 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
├─2551 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─2552 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─2553 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─2554 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─2561 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
├─6056 sudo /usr/local/nagiosxi/scripts/manage_services.sh status mysqld
└─6066 /usr/local/nagios/libexec/check_nt -H 160.107.103.27 -s asdf1234ASDF1234 -p 12489 -v USEDDISKSPACE -l C...

Nov 14 05:27:52 servername sudo[4923]: pam_unix(sudo:auth): conversation failed
Nov 14 05:27:52 servername sudo[4923]: pam_unix(sudo:auth): auth could not identify password for [nagios]
Nov 14 05:27:57 servername nagios[2548]: SERVICE ALERT: servername;Memory Usage;CR...eout
Nov 14 05:27:59 servername nagios[2548]: HOST ALERT: servername;UP;SOFT;1;OK - 160...t 0%
Nov 14 05:28:38 servername sudo[5284]: pam_unix(sudo:auth): conversation failed
Nov 14 05:28:38 servername sudo[5284]: pam_unix(sudo:auth): auth could not identify password for [nagios]
Nov 14 05:30:02 servername sudo[5842]: pam_unix(sudo:auth): conversation failed
Nov 14 05:30:02 servername sudo[5842]: pam_unix(sudo:auth): auth could not identify password for [nagios]
Nov 14 05:30:23 servername sudo[6056]: pam_unix(sudo:auth): conversation failed
Nov 14 05:30:23 servername sudo[6056]: pam_unix(sudo:auth): auth could not identify password for [nagios]
Hint: Some lines were ellipsized, use -l to show in full.
Locked