Problem during upgrade.

benhank
Posts: 1264
Joined: Tue Apr 12, 2011 12:29 pm

Post by benhank »

During the upgrade from version 5.4.12 to the latest version, the upgrade failed at this point:

Code:

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
Error: Template '' specified in host definition could not be not found (config file '/usr/local/nagios/etc/hosttemplates.cfg', starting on line 1812)
Error: Host escalation host name is NULL
Error: Could not register host escalation (config file '/usr/local/nagios/etc/hostescalations.cfg', starting on line 27)
   Error processing object config files!


***> One or more problems was encountered while processing the config files...

     Check your configuration file(s) to ensure that they contain valid
     directives and data definitions.  If you are upgrading from a previous
     version of Nagios, you should be aware that some variables/definitions
     may have been removed or modified in this version.  Make sure to read
     the HTML documentation regarding the config files, as well as the
     'Whats New' section to find out what has changed.
> Return Code: 1
--------------------------------------
/usr/local/nagiosxi/nom/checkpoints/nagioscore/errors /usr/local/nagiosxi/scripts
tar: Removing leading `/' from member names
/usr/local/nagiosxi/scripts
LATEST NOM SNAPSHOT: /usr/local/nagiosxi/nom/checkpoints/nagioscore/1581358928.tar.gz
/ /usr/local/nagiosxi/scripts
RESTORING NOM SNAPSHOT : /usr/local/nagiosxi/nom/checkpoints/nagioscore/1581358928.tar.gz
/usr/local/nagiosxi/scripts

--- reset_config_perms.sh ------------
> Setting script permissions
> Setting CCM script permissions
> Setting special script permissions
> Setting special component script permissions
> Setting configuration file/directory permissions
> Setting perfdata directory and RRD permissions
> Setting libexec directory permissions
> Setting Nagios XI config permissions
> Setting NOM checkpoint user:group permissions
> + Setting CCM configuration file user:group permissions
> + Setting Recurring Downtime file user:group permissions
> + Setting BPI configuration file user:group permissions
--------------------------------------
[root@lkennagiost01 nagiosxi]#
I could not figure out how to fix it, and after a while I just restored from a working snapshot and was able to complete the upgrade.
However, today I did an apply config, which ran perfectly:

Code:

[root@lkennagiost01 ~]# tail -f /usr/local/nagiosxi/var/cmdsubsys.log

PROCESSED 0 COMMANDS

PROCESSED 0 COMMANDS

PROCESSED 0 COMMANDS

PROCESSED 0 COMMANDS

PROCESSED 0 COMMANDS
APPLYING NAGIOSCORE CONFIG...
CMDLINE=cd /usr/local/nagiosxi/scripts && ./reconfigure_nagios.sh
No entry for terminal type "unknown";
using dumb terminal settings.

--- reset_config_perms.sh ------------
> Setting script permissions
> Setting CCM script permissions
> Setting special script permissions
> Setting special component script permissions
> Setting configuration file/directory permissions
> Setting perfdata directory and RRD permissions
/bin/chmod: cannot access `/usr/local/nagios/share/perfdata/BTR-111GROSSM-SW1/If_GigabitEthernet2_0_32.xml.10666': No such file or directory
> Setting libexec directory permissions
> Setting Nagios XI config permissions
> Setting NOM checkpoint user:group permissions
> + Setting CCM configuration file user:group permissions
> + Setting Recurring Downtime file user:group permissions
> + Setting BPI configuration file user:group permissions
--------------------------------------

--- ccm_import.php -------------------
> Setting import directory: /usr/local/nagios/etc/import/
> Importing config files into the CCM
  No files to import
--------------------------------------

--- ccm_export.php -------------------
> Writing CCM configuration to Nagios files
  Finished writing out configuraton
--------------------------------------

--------------------------------------
> Verifying configuration with Nagios Core
> Output:
Nagios Core 4.4.5
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2019-08-20
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...

   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
        Checked 15182 services.
        Checked 2451 hosts.
        Checked 269 host groups.
        Checked 1 service groups.
        Checked 156 contacts.
        Checked 20 contact groups.
        Checked 352 commands.
        Checked 175 time periods.
        Checked 281 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 2451 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 175 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
> Return Code: 0
--------------------------------------
Stopping nagios: ....done.
Starting nagios: done.
OUTPUT=Starting nagios: done.
RETURNCODE=0

PROCESSED 1 COMMANDS
CMDLINE=php /usr/local/nagiosxi/html/includes/components/nagiosbpi/api_tool.php --cmd=syncall
I then reran the upgrade, and the issue listed above returned right after I had done a successful apply config.
I need to fix this, as I plan on running the upgrade on my two production servers and I don't want this to crop up if possible. To make matters worse, after I got the error I restored a snapshot and successfully did an apply config, but when I reran the upgrade it still generated the error, and now I can't do an apply config without restoring from a snapshot.
Proudly running:
NagiosXI 5.4.12 2 node Prod Env 2500 hosts, 13,000 services
Nagiosxi 5.5.7(test env) 2500 hosts, 13,000 services
Nagios Logserver 2 node Prod Env 500 objects sending
Nagios Network Analyser
Nagios Fusion
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Post by scottwilkerson »

It seems there is a host template with a blank use field around line 1812. To see it, go to CCM -> host templates and click "view config" next to any item.

Additionally, go to CCM -> host escalations, click "view config", and look at the config around line 27, as it is erroring there as well.

This needs to be done while the system is in the state where you are getting the errors you describe above.
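
As a rough illustration of reading those error lines, the sketch below pulls the file path and starting line out of the verification output and then prints the definition that begins at that line. It is not part of Nagios XI; the function names and sample strings are made up for this example. It assumes the output has not been wrapped by the terminal and that you run it against a copy of the config saved while the error is occurring.

```python
import re

# Matches the location suffix Nagios Core prints in errors like the ones quoted above.
LOCATION = re.compile(r"config file '([^']+)', starting on line (\d+)")


def error_locations(output):
    """Return (path, line) pairs for every config location named in the output."""
    return [(path, int(line)) for path, line in LOCATION.findall(output)]


def definition_at(config_text, start_line):
    """Return the lines of the definition that begins at start_line (1-indexed)."""
    block = []
    for line in config_text.splitlines()[start_line - 1:]:
        block.append(line)
        if line.strip() == "}":  # closing brace ends the definition
            break
    return block


# Hypothetical sample data, shortened from the output quoted in this thread.
sample_output = (
    "Error: Could not register host escalation "
    "(config file '/usr/local/nagios/etc/hostescalations.cfg', starting on line 27)"
)
sample_config = "define host {\n    name    IBM-i\n    use\n    register    0\n}\n"

print(error_locations(sample_output))
print("\n".join(definition_at(sample_config, 1)))
```

In practice you would read the failing file's text from disk and pass the line number from the error (1812 or 27 here) instead of the sample strings.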
Former Nagios employee

Post by benhank »

I think there are two issues here.
Issue 1.
The apply config worked properly before the upgrade, but the problem cropped up during the upgrade. Also, even after restoring a working snapshot and successfully doing an apply config afterwards, the problem crops up again if I attempt the upgrade.

Issue 2.
I've looked at the hosttemplates.cfg file, and this is the entry that begins at line 1812:

Code:

define host {
    name                            Linux-Server
    alias                           Linux Servers
    check_command                   check-host-alive!!!!!!!!
    max_check_attempts              5
    check_interval                  5
    retry_interval                  1
    check_period                    24x7
    event_handler_enabled           1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    contact_groups                  Data Center Team,Midrange Team,Monitoring Team
    notification_interval           0
    notification_period             24x7
    notification_options            d,u,r,
    register                        0
}
I am also including the entire hosttemplate cfg file just in case I missed something.

Post by scottwilkerson »

This file (hosttemplates.cfg) would be the one that is running successfully. I would need to see the output from
CCM -> host templates -> click "view config" next to any item

when it won't apply. This will give me the file that is trying to be applied, vs. the one that is known to be working (hosttemplates.cfg).

Post by benhank »

OK, I've gotten both versions of the file =D
I tailed the logfile during the snapshot restore as well as the successful apply config:

Code:

--- reset_config_perms.sh ------------
> Setting script permissions
> Setting CCM script permissions
> Setting special script permissions
> Setting special component script permissions
> Setting configuration file/directory permissions
> Setting perfdata directory and RRD permissions
/bin/chmod: cannot access `/usr/local/nagios/share/perfdata/MBO-34STMARTIN-CSW1/If_port-channel1.xml.12707': No such file or directory
> Setting libexec directory permissions
> Setting Nagios XI config permissions
> Setting NOM checkpoint user:group permissions
> + Setting CCM configuration file user:group permissions
> + Setting Recurring Downtime file user:group permissions
> + Setting BPI configuration file user:group permissions
--------------------------------------
Restoring CCM databases...

PROCESSED 0 COMMANDS

PROCESSED 0 COMMANDS

PROCESSED 0 COMMANDS

PROCESSED 0 COMMANDS
Restore Complete.
OUTPUT=Restore Complete.
RETURNCODE=0

PROCESSED 4 COMMANDS
APPLYING NAGIOSCORE CONFIG...
CMDLINE=cd /usr/local/nagiosxi/scripts && ./reconfigure_nagios.sh
No entry for terminal type "unknown";
using dumb terminal settings.

--- reset_config_perms.sh ------------
> Setting script permissions
> Setting CCM script permissions
> Setting special script permissions
> Setting special component script permissions
> Setting configuration file/directory permissions
> Setting perfdata directory and RRD permissions
> Setting libexec directory permissions
> Setting Nagios XI config permissions
> Setting NOM checkpoint user:group permissions
> + Setting CCM configuration file user:group permissions
> + Setting Recurring Downtime file user:group permissions
> + Setting BPI configuration file user:group permissions
--------------------------------------

--- ccm_import.php -------------------
> Setting import directory: /usr/local/nagios/etc/import/
> Importing config files into the CCM
  No files to import
--------------------------------------

--- ccm_export.php -------------------
> Writing CCM configuration to Nagios files
  Finished writing out configuraton
--------------------------------------

--------------------------------------
> Verifying configuration with Nagios Core
> Output:
Nagios Core 4.4.5
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2019-08-20
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
        Checked 15182 services.
        Checked 2451 hosts.
        Checked 269 host groups.
        Checked 1 service groups.
        Checked 156 contacts.
        Checked 20 contact groups.
        Checked 352 commands.
        Checked 175 time periods.
        Checked 281 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 2451 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 175 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
> Return Code: 0
--------------------------------------
Stopping nagios: .done.
Starting nagios: done.
OUTPUT=Starting nagios: done.
RETURNCODE=0
CMDLINE=php /usr/local/nagiosxi/html/includes/components/nagiosbpi/api_tool.php --cmd=syncall

PROCESSED 1 COMMANDS
OUTPUT=
RETURNCODE=255

PROCESSED 1 COMMANDS


Post by scottwilkerson »

Can you do the following: create a file /usr/local/nagios/etc/import/template.cfg with the following content:

Code:

define host {
    name                            generic-host
    event_handler_enabled           1
    flap_detection_enabled          1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    contact_groups                  admins
    notification_period             24x7
    notifications_enabled           1
    register                        0
}
define host {
    name                            IBM-i
    check_command                   check-host-alive
    use                             generic-host
    max_check_attempts              10
    check_interval                  5
    retry_interval                  1
    check_period                    24x7
    contact_groups                  admins
    notification_interval           120
    notification_period             24x7
    notification_options            d,u,r
    register                        0
}
Then apply configuration again.

After this it should work.

It looks like a new template is being added for the new IBM wizards, and it relies on a generic-host template that is missing on your system.
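
To illustrate what "relies on a template that is missing" means, here is a minimal sketch that lists template names referenced by a use directive but never defined by a name directive in the same text. It is not how Nagios XI or the CCM performs this check. The parser is deliberately simplified, the function names are invented for this example, and it only looks at the text you pass in, so templates defined in other files would be reported as missing.

```python
def parse_definitions(config_text):
    """Parse 'define <type> { ... }' blocks into (type, directives) pairs."""
    definitions = []
    current = None
    for raw in config_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop trailing ';' comments
        if not line or line.startswith("#"):
            continue
        if line.startswith("define ") and line.endswith("{"):
            current = (line[len("define "):-1].strip(), {})
        elif line == "}":
            if current is not None:
                definitions.append(current)
            current = None
        elif current is not None:
            parts = line.split(None, 1)
            # A directive with no value (such as a blank 'use') maps to "".
            current[1][parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return definitions


def missing_templates(config_text, object_type="host"):
    """Return names referenced by 'use' but never defined by 'name'."""
    objects = [d for t, d in parse_definitions(config_text) if t == object_type]
    defined = {d["name"] for d in objects if d.get("name")}
    referenced = set()
    for directives in objects:
        for parent in directives.get("use", "").split(","):
            if parent.strip():
                referenced.add(parent.strip())
    return sorted(referenced - defined)


# Hypothetical sample: IBM-i uses generic-host, which is not defined here.
sample = """
define host {
    name        IBM-i
    use         generic-host
    register    0
}
"""
print(missing_templates(sample))
```

Running something like this against the host template config before an upgrade on other servers could flag a missing generic-host in advance, under the assumptions stated above.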

Post by benhank »

That worked! The apply config was successful!
Now, if it isn't too complicated to explain, how did you know that the IBM wizard was causing this? I want to be able to track down solutions like this in the future.

If your answer is "you're not a Jedi yet", I'll take that too, lol.

The next step is to rerun the upgrade and see if it works. I need to do this because the server we are talking about is a test machine; I'm testing the upgrade to see whether it will go smoothly on my two production systems.

Or is there a way to exclude the IBM wizard from installing during the upgrade, as we don't use it?

Post by scottwilkerson »

benhank wrote: now if its going to be too complicated, how did you know that the ibm wizard was causing this? I want to be able to track down solutions like this in the future.
Starting on line 1812 of your "not working" file there was this:

Code:

define host {
    name                            IBM-i
    check_command                   check-host-alive
    use                             
    max_check_attempts              10
    check_interval                  5
    retry_interval                  1
    check_period                    24x7
    contact_groups                  admins
    notification_interval           120
    notification_period             24x7
    notification_options            d,u,r
    register                        0
}
Because I am Jedi level, and because, seeing this, I knew that we had recently added those wizards. I then looked at the template code in the wizards and found that the newly added IBM-i template was supposed to use generic-host as its template, but in your config the use value was blank. I looked through your templates, saw that you were missing the generic-host template, and hence wrote the code above to create it.
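
For tracking down this kind of problem yourself, a small scan for use directives that have no value can point straight at the offending line. The sketch below is only an illustration with an invented function name. It assumes one directive per line and treats anything after a ';' as a comment.

```python
def blank_use_lines(config_text):
    """Return 1-indexed line numbers where a 'use' directive has no value."""
    hits = []
    for number, raw in enumerate(config_text.splitlines(), start=1):
        fields = raw.split(";", 1)[0].split()  # ignore ';' comments
        if fields == ["use"]:
            hits.append(number)
    return hits


# Hypothetical sample modelled on the IBM-i definition quoted above.
sample = (
    "define host {\n"
    "    name                            IBM-i\n"
    "    use                             \n"
    "    register                        0\n"
    "}\n"
)
print(blank_use_lines(sample))
```

Each reported line number can then be compared with the "starting on line" value in the Nagios Core error to find the definition that contains it.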
benhank wrote: Or is there a way to exclude the IBM wizard from installing during the upgrade as we dont use it.
You can run this just before running your ./upgrade command on any other servers:

Code:

rm -f /tmp/nagiosxi/subcomponents/xiwizards/wizards/ibmi*

Post by benhank »

Gotcha, and thank you Obi-wan-wilkerson! The upgrade went smoothly, so you can lock this up!

Post by scottwilkerson »

benhank wrote: gotcha and thank you Obi-wan-wilkerson! And the upgrade went smoothly you can lock this up!
Great!

Locking thread