My server got the DEVIL in it!

benhank
Posts: 1264
Joined: Tue Apr 12, 2011 12:29 pm

Post by benhank »

As the output below shows:
1. I killed and restarted the nagios service, but it isn't running.
2. While it was not running, I was able to do an apply config.
3. The Nagios GUI says everything is running fine.
4. On a side note, the apply config log shows a number of warnings, but the warning count is 0.
Behold!:

[root@lkennagiost01 etc]# service nagios stop
Stopping nagios: kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
done.
[root@lkennagiost01 etc]# killall -9 nagios
[root@lkennagiost01 etc]# service nagios start
Starting nagios: done.
[root@lkennagiost01 etc]# service nagios status
nagios is not running
[root@lkennagiost01 etc]#  tail -f /usr/local/nagiosxi/var/cmdsubsys.log

PROCESSED 0 COMMANDS

PROCESSED 0 COMMANDS

PROCESSED 0 COMMANDS

PROCESSED 0 COMMANDS

PROCESSED 0 COMMANDS
APPLYING NAGIOSCORE CONFIG...
CMDLINE=cd /usr/local/nagiosxi/scripts && ./reconfigure_nagios.sh
No entry for terminal type "unknown";
using dumb terminal settings.

--- reset_config_perms.sh ------------
> Setting CCM script permissions
> Setting script permissions
> Setting special component script permissions
> Setting configuration file/directory permissions
> Setting perfdata directory and RRD permissions
> Setting NOM checkpoint user:group permissions
> + Setting CCM configuration file user:group permissions
> + Setting Recurring Downtime file user:group permissions
> + Setting BPI configuration file user:group permissions
--------------------------------------

--- ccm_import.php -------------------
> Setting import directory: /usr/local/nagios/etc/import/
> Importing config files into the CCM
  No files to import
--------------------------------------

--- ccm_export.php -------------------
> Writing CCM configuration to Nagios files
  Finished writing out configuraton
--------------------------------------

--------------------------------------
> Verifying configuration with Nagios Core
> Output:
Nagios Core 4.4.2
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2018-08-16
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
Warning: Duplicate definition found for service 'App - EPIC EPS Printing - Load Balancer - EPS_ATR' on host 'eps_atr.xxx.org' (config file '/usr/local/nagios/etc/services/xxx_epic_printing_server_ping_wkenatrepsp04.xxx.org.cfg', starting on line 16)
Warning: Duplicate definition found for service 'App - EPIC EPS Printing - Load Balancer - EPS_ATR' on host 'eps_atr.xxx.org' (config file '/usr/local/nagios/etc/services/xxx_epic_printing_server_ping_wkenatrepsp05.xxx.org.cfg', starting on line 16)
Warning: Duplicate definition found for service 'App - EPIC EPS Printing - Load Balancer - EPS_ATR' on host 'eps_atr.xxx.org' (config file '/usr/local/nagios/etc/services/xxx_epic_printing_server_ping_wkenatrepsp02.xxx.org.cfg', starting on line 16)
Warning: Duplicate definition found for service 'App - EPIC EPS Printing - Load Balancer - EPS_ATR' on host 'eps_atr.xxx.org' (config file '/usr/local/nagios/etc/services/xxx_epic_printing_server_ping_wkenatrepsp06.xxx.org.cfg', starting on line 16)
Warning: Duplicate definition found for service 'NSClient: NT: Memory Usage' on host 'WKENPHAPOSP01.xxx.org' (config file '/usr/local/nagios/etc/services/xxx_windows_memory_usage.cfg', starting on line 16)
Warning: Duplicate definition found for service 'NSClient: NRPE: Disk Usage: All local drives' on host 'WVKENEXCHP04.xxx.net' (config file '/usr/local/nagios/etc/services/xxx_windows_all_disk_usage.cfg', starting on line 16)
Warning: Duplicate definition found for service 'NSClient: NRPE: Disk Usage: All local drives' on host 'WVKENEXCHP03.xxx.net' (config file '/usr/local/nagios/etc/services/xxx_windows_all_disk_usage.cfg', starting on line 16)
Warning: Duplicate definition found for service 'NSClient: NRPE: Disk Usage: All local drives' on host 'WVKENEXCHP02.xxx.net' (config file '/usr/local/nagios/etc/services/xxx_windows_all_disk_usage.cfg', starting on line 16)
Warning: Duplicate definition found for service 'NSClient: NRPE: Disk Usage: All local drives' on host 'WVKENEXCHP01.xxx.net' (config file '/usr/local/nagios/etc/services/xxx_windows_all_disk_usage.cfg', starting on line 16)
Warning: Duplicate definition found for service 'NSClient: NRPE: Disk Usage: All local drives' on host 'WVKENBIDBT01.xxx.org' (config file '/usr/local/nagios/etc/services/xxx_windows_all_disk_usage.cfg', starting on line 16)
Warning: Duplicate definition found for service 'NSClient: NRPE: Disk Usage: All local drives' on host 'WVKENBIDBP01.xxx.org' (config file '/usr/local/nagios/etc/services/xxx_windows_all_disk_usage.cfg', starting on line 16)
Warning: Duplicate definition found for service 'NSClient: NRPE: Disk Usage: All local drives' on host 'WKENMUSEP01.xxx.org' (config file '/usr/local/nagios/etc/services/xxx_windows_all_disk_usage.cfg', starting on line 16)
Warning: Duplicate definition found for service 'NSClient: NRPE: Disk Usage: All local drives - VNA' on host 'SVR-SQL03.vnacarenetwork.net' (config file '/usr/local/nagios/etc/services/xxx_windows_all_disk_usage_VNA.cfg', starting on line 16)
Warning: Duplicate definition found for service 'NSClient: NRPE: Disk Usage: All local drives - VNA' on host 'SVR-ARCHIVEONE.vnacarenetwork.net' (config file '/usr/local/nagios/etc/services/xxx_windows_all_disk_usage_VNA.cfg', starting on line 16)
Warning: Duplicate definition found for service 'Interface Table Status - core network devices' on host 'KEN-GLBXINET1' (config file '/usr/local/nagios/etc/services/xxx_interface_table_core.cfg', starting on line 16)
Warning: Duplicate definition found for service 'Linux Current Load' on host 'LKENPESP4.xxx.org' (config file '/usr/local/nagios/etc/services/xxx_linux_current_load.cfg', starting on line 16)


   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
        Checked 12143 services.
        Checked 2295 hosts.
        Checked 276 host groups.
        Checked 1 service groups.
        Checked 129 contacts.
        Checked 17 contact groups.
        Checked 339 commands.
        Checked 143 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 2295 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 143 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
> Return Code: 0
--------------------------------------
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
Stopping nagios: done.
Starting nagios: done.
OUTPUT=Starting nagios: done.
RETURNCODE=0

PROCESSED 1 COMMANDS
CMDLINE=php /usr/local/nagiosxi/html/includes/components/nagiosbpi/api_tool.php --cmd=syncall
CMD: syncall
PHP Notice:  Undefined variable: err in /usr/local/nagiosxi/html/includes/components/nagiosbpi/api_tool.php on line 146
MSG: Could not get data for objects. NDO or Core may not be running.
OUTPUT=MSG: Could not get data for objects. NDO or Core may not be running.
RETURNCODE=0

^C
[root@lkennagiost01 etc]# service nagios status
nagios is not running
[root@lkennagiost01 etc]#
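
On point 4: in the log above, the "Duplicate definition" warnings are printed while the object files are being read, before the pre-flight check whose totals are reported, which may be why "Total Warnings" still says 0. A minimal sketch for comparing the two numbers yourself; the function names are made up for illustration and both functions simply read the verification output on stdin:

```shell
#!/bin/sh
# Count lines that start with "Warning:" in verification output read from stdin.
# grep -c prints 0 and exits non-zero when nothing matches, hence the "|| true".
count_warning_lines() {
    grep -c '^Warning:' || true
}

# Print the number reported on the "Total Warnings:" line (last one wins).
reported_warning_total() {
    sed -n 's/^Total Warnings:[[:space:]]*//p' | tail -n 1
}
```

Usage, assuming the binary and config paths from the init script snippet later in this thread: save the output with `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg > /tmp/verify.out`, then run `count_warning_lines < /tmp/verify.out` and `reported_warning_total < /tmp/verify.out` and compare.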
Proudly running:
NagiosXI 5.4.12 2 node Prod Env 2500 hosts, 13,000 services
Nagiosxi 5.5.7(test env) 2500 hosts, 13,000 services
Nagios Logserver 2 node Prod Env 500 objects sending
Nagios Network Analyser
Nagios Fusion
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: My server got the DEVIL in it!

Post by npolovenko »

Hi, @benhank. Have you upgraded this XI instance recently? What version are you currently running? The devil in your server might've changed the path to the nagios.lock file in the /etc/init.d/nagios script.
Please run the following command first:
cat /usr/local/nagios/etc/nagios.cfg | grep lock_file
And make sure that this path corresponds to the path in the init file:
/etc/init.d/nagios
If it's different, change the path in the init script, kill all Nagios processes, and start Nagios again.
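
The comparison above can be sketched as a small script. This is only an illustration: the function names are invented, and it assumes the init script sets the path on a `NagiosRunFile=` line (as in the stock init script) with a literal value rather than one built from `${prefix}`.

```shell
#!/bin/sh
# Print the value of lock_file= from a nagios.cfg (last definition wins).
cfg_lock_path() {
    sed -n 's/^[[:space:]]*lock_file[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Print the value of NagiosRunFile= from an init script.
# Assumption: the value is a literal path; variables are not expanded here.
init_lock_path() {
    sed -n 's/^[[:space:]]*NagiosRunFile[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Compare the two; returns non-zero on a mismatch.
compare_lock_paths() {
    cfg=$(cfg_lock_path "$1")
    init=$(init_lock_path "$2")
    if [ "$cfg" = "$init" ]; then
        echo "MATCH: $cfg"
    else
        echo "MISMATCH: nagios.cfg=$cfg init=$init"
        return 1
    fi
}
```

Usage: `compare_lock_paths /usr/local/nagios/etc/nagios.cfg /etc/init.d/nagios`. A MISMATCH line means the init script is looking for the lock file somewhere other than where Nagios writes it.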
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
benhank
Posts: 1264
Joined: Tue Apr 12, 2011 12:29 pm

Re: My server got the DEVIL in it!

Post by benhank »

Yeah, I upgraded from 5.4.12 to 5.5.7.
Here's the result of that command:

cat /usr/local/nagios/etc/nagios.cfg | grep lock_file
lock_file=/usr/local/nagios/var/nagios.lock
Here is the relevant snippet from the nagios init file:

# Our install-time configuration.
prefix=/usr/local/nagios
exec_prefix=${prefix}
NagiosBin=${exec_prefix}/bin/nagios
NagiosCfgFile=${prefix}/etc/nagios.cfg
NagiosCfgtestFile=${prefix}/var/nagios.configtest
NagiosStatusFile=${prefix}/var/status.dat
NagiosRetentionFile=${prefix}/var/retention.dat
NagiosCommandFile=${prefix}/var/rw/nagios.cmd
NagiosVarDir=${prefix}/var
NagiosRunFile=/var/run/nagios.lock
NagiosCGIDir=${exec_prefix}/sbin
NagiosUser=nagios
NagiosGroup=nagios
checkconfig="true"
I don't touch init files much. Which line do I change?
benhank
Posts: 1264
Joined: Tue Apr 12, 2011 12:29 pm

Re: My server got the DEVIL in it!

Post by benhank »

Scratch that, we made the change and it's working now. My final question is: why or how did the lock file location change?
sigmainformatique
Posts: 74
Joined: Mon Apr 23, 2018 8:11 am

Re: My server got the DEVIL in it!

Post by sigmainformatique »

Hi,

Same issue here. Twice now, my lock file has changed from /usr/local/nagios/... to /var/lock/... (without the necessary permissions). I was disappointed because XI said I had an issue in my configuration even though my config was OK (verified by running a nagios checkconfig).
I think this issue is related to the mechanism that can restore old configurations.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: My server got the DEVIL in it!

Post by npolovenko »

@benhank, the lock file location was changed in XI 5.5. Normally the upgrade script takes care of the init file and updates it with the correct path. In rare instances it doesn't, and the init script needs to be modified by hand. Let me know if everything is working as it should now.
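
For anyone who lands here with the same problem, a hand edit of the init script might look like the sketch below. The function name is hypothetical, it assumes the path lives on the `NagiosRunFile=` line shown earlier in the thread, and it assumes the new path contains no `|` or `&` characters (they would confuse sed). It keeps a `.bak` copy of the original next to the init script.

```shell
#!/bin/sh
# Rewrite the NagiosRunFile= line of an init script to point at a new lock file.
# $1 = init script, $2 = new lock file path. A backup is kept at "$1.bak".
set_init_lock_path() {
    init_file=$1
    new_path=$2
    cp -p "$init_file" "$init_file.bak" || return 1
    # "|" is used as the sed delimiter because the path contains slashes.
    # Writing back with ">" keeps the original file's mode and ownership.
    sed "s|^\([[:space:]]*NagiosRunFile[[:space:]]*=\).*|\1$new_path|" \
        "$init_file.bak" > "$init_file"
}
```

Usage, taking the path reported by `grep lock_file` earlier in this thread: `set_init_lock_path /etc/init.d/nagios /usr/local/nagios/var/nagios.lock`, then `killall -9 nagios` and `service nagios start`.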

@sigmainformatique, please read my response above. Are you saying that this has happened to you a few times? If so, we may need to open a separate thread for your issue and investigate further.
sigmainformatique
Posts: 74
Joined: Mon Apr 23, 2018 8:11 am

Re: My server got the DEVIL in it!

Post by sigmainformatique »

In some cases (wrong configurations...) the lock file reverts to its old path. Very disturbing...
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: My server got the DEVIL in it!

Post by ssax »

There was a bugfix in XI 5.5.7 that impacted people with a downgraded Core: when an apply config fails or you revert to a config snapshot on XI 5.5.7, it replaces the lockfile path in /usr/local/nagios/etc/nagios.cfg.

Please edit these files and make sure the lockfile line matches your lock file path:

/usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh
/usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh
Specifically, this line:

lockfile="/var/run/nagios.lock"
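
A quick way to check both restore scripts against nagios.cfg is sketched below. The function name is invented for illustration, and it assumes each script sets the path on a single `lockfile=` line like the one above (quotes are stripped before comparing).

```shell
#!/bin/sh
# $1 = nagios.cfg, remaining arguments = scripts containing a lockfile= line.
# Prints OK/DIFF per script and returns non-zero if any script differs.
check_restore_scripts() {
    cfg=$1
    shift
    want=$(sed -n 's/^[[:space:]]*lock_file[[:space:]]*=[[:space:]]*//p' "$cfg" | tail -n 1)
    rc=0
    for script in "$@"; do
        # Take the first lockfile= assignment and drop surrounding quotes.
        have=$(sed -n 's/^[[:space:]]*lockfile=//p' "$script" | head -n 1 | tr -d "\"'")
        if [ "$have" = "$want" ]; then
            echo "OK   $script"
        else
            echo "DIFF $script (lockfile=$have, nagios.cfg lock_file=$want)"
            rc=1
        fi
    done
    return $rc
}
```

Usage: `check_restore_scripts /usr/local/nagios/etc/nagios.cfg /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh`. Any DIFF line points at a script that would put the old path back on the next restore.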
benhank
Posts: 1264
Joined: Tue Apr 12, 2011 12:29 pm

Re: My server got the DEVIL in it!

Post by benhank »

Thanks for the fixes and info, fellas. You can lock it up, I'm all set.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: My server got the DEVIL in it!

Post by npolovenko »

@benhank, Sounds good! Closing the thread.