
Issue applying configuration, system is down

Posted: Thu Dec 10, 2020 11:49 am
by dshearon
Good morning,

One of my team members contacted me this morning and informed me that there was an error applying the configuration and that system monitoring was down. I tried reverting to an old snapshot but was met with the same error when trying to apply again. Below is the text of the error message as well as the .cfg file it is complaining about. The host names are a single line in the file but get broken up in the paste below due to word wrap. Any guidance you can provide would be appreciated.

Error: Could not find any host matching 'OPK-1A-03-S1' (config file '/usr/local/nagios/etc/services/Juniper - CPU RE0.cfg', starting on line 16)
Error: Failed to expand host list 'ar01.atl' for service 'CPU Usage RE0' (/usr/local/nagios/etc/services/Juniper - CPU RE0.cfg:16)


CFG file contents:
###############################################################################
#
# Services configuration file
#
# Created by: Nagios CCM 3.0.6
# Date: 2020-09-30 18:16:16
# Version: Nagios Core 4.x
#
# --- DO NOT EDIT THIS FILE BY HAND ---
# Nagios CCM will overwrite all manual settings during the next update if you
# would like to edit files manually, place them in the 'static' directory or
# import your configs into the CCM by placing them in the 'import' directory.
#
###############################################################################

define service {
host_name ar01.atl,ar01.clt,br01.clt,br01.lko,cor01.atl,cor01.clt,cor01.lhr,cor01.lko,cor01.opk,cor01.rxe3,opk-1a-03-s1,vdr01.atl,vdr01.clt,vpn01.lhr,vpn01.lko,wr01.atl,wr01.clt,wr01.lhr,wr01.lko,wr01.opk,wr02.atl,wr02.clt,wr02.lko,wr02.rxe
service_description CPU Usage RE0
use Juniper - CPU RE0
display_name CPU Usage RE0
register 1
}

###############################################################################
#
# Services configuration file
#
# END OF FILE
#
###############################################################################

Re: Issue applying configuration, system is down

Posted: Thu Dec 10, 2020 2:47 pm
by benjaminsmith
Hi,
Error: Could not find any host matching 'OPK-1A-03-S1' (config file '/usr/local/nagios/etc/services/Juniper - CPU RE0.cfg', starting on line 16)
Error: Failed to expand host list 'ar01.atl' for service 'CPU Usage RE0' (/usr/local/nagios/etc/services/Juniper - CPU RE0.cfg:16)
It looks like there are a couple of hosts that it's unable to find. I would suggest removing those hosts from the service definitions, then running through the following steps in Configure > CCM > Tools > Config File Management:

1. Delete Files
2. Write Files
3. Verify Files (if you see any errors, follow the message to correct them and repeat steps 1-3 until verification passes)
4. Restart Nagios Core
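
Before removing anything, it may help to confirm which names in the host_name list have no matching host definition. Below is a minimal Python sketch of that check, not an official Nagios tool. The helper names (service_host_names, find_unmatched) are made up for illustration, the service list is shortened from the post above, and the list of defined hosts is hypothetical. It also flags names that differ from a defined host only by letter case, on the assumption that host names are matched case-sensitively.

```python
import re


def service_host_names(cfg_text):
    """Collect the names listed in host_name directives of an object file."""
    names = []
    for line in cfg_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # skip comment lines such as the CCM header
        match = re.match(r"host_name\s+(.+)$", line)
        if match:
            names.extend(n.strip() for n in match.group(1).split(",") if n.strip())
    return names


def find_unmatched(service_hosts, defined_hosts):
    """Map each name with no exact match to defined hosts differing only by case."""
    defined = set(defined_hosts)
    by_lower = {}
    for host in defined:
        by_lower.setdefault(host.lower(), []).append(host)
    return {
        name: sorted(by_lower.get(name.lower(), []))
        for name in service_hosts
        if name not in defined
    }


# Shortened service definition from the post; the defined hosts are hypothetical.
sample_cfg = """
define service {
host_name ar01.atl,ar01.clt,opk-1a-03-s1
service_description CPU Usage RE0
}
"""
defined = ["ar01.atl", "ar01.clt", "OPK-1A-03-S1"]

print(find_unmatched(service_host_names(sample_cfg), defined))
```

An empty list next to a name would mean no host with that name exists in any letter case, so the name would need to be removed from the service or the host re-created. A non-empty list suggests the host exists but under a different capitalization.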

If you're unable to resolve the error, you can restore a known good snapshot to get back up and running. Go to Configure > Core Configuration Manager > Quick Tools > Configuration Snapshot, and select the latest 'OK' snapshot to restore.
Otherwise, send me the system profile and I can try to test out the configurations here.

To send us your system profile:
1. Log in to the Nagios XI GUI using a web browser.
2. Click the "Admin" > "System Profile" menu.
3. Click the "Download Profile" button.

Best Regards,
Benjamin