Restoring config snapshot doesn't bring back services

Posted: Mon Dec 06, 2021 6:44 pm
by mcombs77
I archived a config snapshot before using the bulk modification tool, then messed up one of the service checks on all of our Windows hosts. I restored the configuration snapshot to start over, except the service doesn't come back in CCM, even though it is still in the configuration files on disk. I've tried several different snapshots. I also tried deleting the config files and writing them from the DB using the config file management tools, which leaves the config files without the service. When I restore the snapshot again, the same problem returns: the services exist in the config files, but not in CCM. And the problem service that I needed to reconfigure in the first place still shows up as down in services. Any ideas on how to get the config snapshot restored properly?

Re: Restoring config snapshot doesn't bring back services

Posted: Tue Dec 07, 2021 11:56 am
by mcombs77
More details I found in cmdsubsys.log:

.....................................................PROCESSING COMMAND ID 78051...
PROCESS COMMAND: CMD=1113, DATA=1638822320 restore archives
CMDLINE=/usr/local/nagiosxi/scripts/ccm_snapshot.sh 1638822320 restore archives
Restoring CCMl snapshot
Removing old files from /usr/local/nagios/etc
/ ~
RESTORING NOM SNAPSHOT : /usr/local/nagiosxi/nom/checkpoints/nagioscore/archives/1638822320.tar.gz
tar: usr/local/nagios/etc/hosts: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/pnp/pages: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/pnp/check_commands: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/pnp: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/import: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/services: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/static: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/nrpe/.svn/props: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/nrpe/.svn/text-base: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/nrpe/.svn/tmp/props: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/nrpe/.svn/tmp/text-base: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/nrpe/.svn/tmp/prop-base: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/nrpe/.svn/tmp: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/nrpe/.svn/prop-base: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/nrpe/.svn: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/nrpe: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc/objects: Cannot utime: Operation not permitted
tar: usr/local/nagios/etc: Cannot utime: Operation not permitted
tar: Exiting with failure status due to previous errors
~
No entry for terminal type "unknown";
using dumb terminal settings.

--- reset_config_perms.sh ------------
> Setting script permissions
> Setting CCM script permissions
> Setting special script permissions
> Setting special component script permissions
> Setting migrate permissions
> Setting configuration file/directory permissions
> Setting perfdata directory and RRD permissions
> Setting libexec directory permissions
> Setting Nagios XI config permissions
> Setting NOM checkpoint user:group permissions
> + Setting Nagios Core corelog.newobjects user:group permissions
> + Setting CCM configuration file user:group permissions
> + Setting Recurring Downtime file user:group permissions
> + Setting BPI configuration file user:group permissions
--------------------------------------
Restoring CCM databases...
ERROR 2026 (HY000): SSL connection error: self signed certificate in certificate chain
OUTPUT=Restoring CCM databases...
RETURNCODE=1
.....
PROCESSED 1 COMMANDS

Re: Restoring config snapshot doesn't bring back services

Posted: Tue Dec 07, 2021 12:03 pm
by mcombs77
NagiosXI and MariaDB run on separate RHEL 7.9 x64 servers.
MariaDB-server-10.4.21-1.el7.centos.x86_64
Manual install of Nagios XI, version 5.8.6

Re: Restoring config snapshot doesn't bring back services

Posted: Tue Dec 07, 2021 1:09 pm
by kfanselow
Hi mcombs77,

Could you run the following commands as root and provide the output:

Code:

find /usr/local/nagios -ls 
find /usr/local/nagiosxi -ls 
df -h 
df -i 
sestatus


Also, could you generate a system profile and send it to me via private message?

Log in to the Nagios XI GUI using a web browser.
Click the "Admin" (Top) -> "System Profile" Menu (Left)
Click the "Download Profile" button

Thanks and Best Regards,
Keith

Re: Restoring config snapshot doesn't bring back services

Posted: Tue Dec 07, 2021 2:15 pm
by mcombs77
Output attached.

Re: Restoring config snapshot doesn't bring back services

Posted: Wed Dec 08, 2021 11:48 am
by kfanselow
Hi mcombs77,

Which service, or services, were you manipulating with the bulk modification tool? Also, could you run through the following steps for us:

1) In the UI, go to Configure (top) -> Core Config Manager -> Tools (left frame, bottom half) -> Config File Management. Click on "Delete Files", then the Write Configs button to manually write the configuration data to file. Then use the verify button to check that the files are free of errors. If that succeeds without errors, go ahead and click on Restart Nagios Core. After this, please create a tar ball of the config files.

Code:

tar -czvf config-from-db.12082021.tgz  /usr/local/nagios/etc /usr/local/nagios/var/objects.cache 

2) Could you restore the archived snapshot you created prior to the bulk change? Then click on "Apply Configuration". If it's not at the top of the frame, you can get to the button via Configure (top) -> Core Config Manager -> Quick Tools (left frame) -> Apply Configuration. After applying the config, please create a second tar ball:

Code:

tar -czvf config-from-restore.12082021.tgz /usr/local/nagios/etc /usr/local/nagios/var/nagios.log /usr/local/nagios/var/objects.cache /usr/local/nagiosxi/var/cmdsubsys.log

Then send those two tar balls to me via PM and we'll take a look.
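
Before sending the two tar balls, it can help to compare them locally to see which config files the restore actually changed. The sketch below is one way to do that, assuming GNU tar and diff are available; the helper name `diff_config_tarballs` is made up for illustration and is not part of Nagios XI.

```shell
# Hypothetical helper: unpack two config tar balls into temporary
# directories and report which files differ between them.
# Exit status: 0 = identical, 1 = differences found, other = tar/diff error.
diff_config_tarballs() {
    local a b rc
    a=$(mktemp -d) || return 2
    b=$(mktemp -d) || return 2
    # GNU tar stores members without the leading "/", so both trees
    # unpack safely under the temporary directories.
    tar -xzf "$1" -C "$a" && tar -xzf "$2" -C "$b" && diff -rq "$a" "$b"
    rc=$?
    rm -rf "$a" "$b"
    return "$rc"
}

# Example call, using the file names from the steps above:
#   diff_config_tarballs config-from-db.12082021.tgz config-from-restore.12082021.tgz
```

Since the second tar ball also contains nagios.log and cmdsubsys.log, those will always be reported as present on one side only; the interesting lines are the ones under usr/local/nagios/etc.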

Thanks and Best Regards,
Keith

Re: Restoring config snapshot doesn't bring back services

Posted: Wed Dec 08, 2021 6:25 pm
by mcombs77
I performed the delete and write config files prior to starting this post. The config files wrote properly. The missing services in CCM were in fact in the services config files and Nagios was alerting on them since the service on the monitored servers had changed names with an upgrade.

This same scenario occurred when I attempted a config snapshot restore from many different points in time over the prior few days.

I manually recreated the services already in CCM, so I can't reproduce the original problem anymore.

I can duplicate the behavior by changing a service on a test node to not active. When I restore, the config file has the service active, but CCM still shows not active. The restore doesn't seem to be restoring to the NagiosXI DB.

I am working on the output you requested from the duplication of the problem above.
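
One quick way to confirm that the database half of a restore failed is to look for restore commands in cmdsubsys.log that ended with a non-zero return code. The following is a minimal sketch based only on the log excerpt earlier in this thread: it assumes each command logs a `CMDLINE=` line followed later by a `RETURNCODE=` line, and the function name `failed_restores` is hypothetical.

```shell
# Hypothetical helper: print "<return code><TAB><command line>" for every
# ccm_snapshot.sh restore in the given log that did not return 0.
failed_restores() {
    awk '
        # Remember the most recent restore command line (text after "CMDLINE=").
        /^CMDLINE=.*ccm_snapshot\.sh .* restore/ { cmd = substr($0, 9) }
        # On the matching return code, report it if it is non-zero.
        /^RETURNCODE=/ {
            rc = substr($0, 12)
            if (cmd != "" && rc != "0") print rc "\t" cmd
            cmd = ""
        }
    ' "$1"
}

# Example call:
#   failed_restores /usr/local/nagiosxi/var/cmdsubsys.log
```

Any line printed here points to a restore where the files on disk may have been replaced but the script still exited with an error, which would match the mismatch between the config files and CCM.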

Re: Restoring config snapshot doesn't bring back services

Posted: Thu Dec 09, 2021 5:41 pm
by kfanselow
Thank you for sending that information over; the repetition of steps provided some additional data points for review. There appear to be a couple of things going on. The first is the apparent permission problem. The file permissions and the user and group names appear to be OK, but tar is erroring when it tries to update the times on the directories. We are able to replicate that on another system and we're looking into whether or not it is actually a problem. Would you be willing to take a look at the UID/GIDs in your archive file and compare them to what you have defined on the system?

Code:

tar --numeric-owner -tzvf  /usr/local/nagiosxi/nom/checkpoints/nagioscore/archives/1638822320.tar.gz| awk '{print $2}' | sort -u
 tar -tzvf  /usr/local/nagiosxi/nom/checkpoints/nagioscore/archives/1638822320.tar.gz| awk '{print $2}' | sort -u  

grep apache  /etc/passwd 
grep nagios /etc/group 

mount 
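
The comparison above can also be scripted. This is a rough sketch assuming GNU tar (whose verbose listing prints `uid/gid` in the second column when `--numeric-owner` is given); the function names are hypothetical, and `nagios` in the example call is only a placeholder for whichever account should own the files.

```shell
# List the distinct numeric uid/gid pairs stored in a gzipped tar archive.
archive_owners() {
    tar --numeric-owner -tzvf "$1" | awk '{print $2}' | sort -u
}

# Print the uid/gid pairs in the archive ($1) that differ from the
# expected pair ($2, written as "uid/gid"). No output means all match.
unexpected_owners() {
    archive_owners "$1" | awk -v want="$2" '$0 != want'
}

# Example call (account name is an assumption):
#   unexpected_owners /usr/local/nagiosxi/nom/checkpoints/nagioscore/archives/1638822320.tar.gz \
#       "$(id -u nagios)/$(id -g nagios)"
```

Several pairs are normal if the archive mixes owners; the point is only to spot numeric IDs that do not exist on the system.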


The second problem, as you noticed, is the MySQL SSL error message: "ERROR 2026 (HY000): SSL connection error: self signed certificate in certificate chain"

The ccm_snapshot.sh script that handles the backup and restores calls mysql directly on the command line and appears to be having a certificate issue. For reference here is the relevant line from the script:

Code:

 gunzip < $NOMDIR/${ts}_nagiosql.sql.gz | mysql -h "$nagiosql_dbserver" --port="$nagiosql_dbport" -u $cfg__db_info__nagiosql__user --password="$cfg__db_info__nagiosql__pwd" $cfg__db_info__nagiosql__db 

Said differently, the mysql client on the Nagios server is not able to connect to your mysql server because of the certificate issue. This is going to require some system-level troubleshooting. On your XI server, could you run the following command as the user nagios, replacing the placeholder values with the correct ones from /usr/local/nagiosxi/html/config.inc.php (lines 36-39)?

Code:

mysql -h SERVER --port=PORT -u USER --password=PASSWD nagiosxi -e "select * from xi_events;"

The third issue is configuration warning messages, but we can address those after the first two issues are resolved.
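
For the certificate error specifically, a useful follow-up test is to repeat the connection while pointing the client at the CA certificate that signed the database server's certificate. The sketch below assumes the MariaDB command-line client (`--ssl-ca` and `--ssl-verify-server-cert` are its option names; other client builds may use different ones), and the function name and CA path are hypothetical.

```shell
# Hypothetical helper: run a trivial query over TLS, verifying the server
# certificate against a given CA file. The client prompts for the password.
# Arguments: $1 host, $2 port, $3 user, $4 database, $5 CA certificate (PEM)
check_db_tls() {
    mysql -h "$1" --port="$2" -u "$3" -p \
        --ssl-ca="$5" --ssl-verify-server-cert \
        "$4" -e "SHOW SESSION STATUS LIKE 'Ssl_cipher';"
}

# Example call (all values are placeholders):
#   check_db_tls SERVER PORT USER nagiosql /path/to/ca-cert.pem
```

If this connects and reports a non-empty Ssl_cipher value while the plain command fails with ERROR 2026, the client on the XI server is most likely missing the CA in its configuration.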

Thanks and Best Regards,
Keith

Re: Restoring config snapshot doesn't bring back services

Posted: Tue Dec 14, 2021 11:41 am
by mcombs77
I am trying to get back to collecting the requested data; I've just been hammered by the log4j v2 problem the last couple of days.

Re: Restoring config snapshot doesn't bring back services

Posted: Tue Dec 14, 2021 12:39 pm
by kfanselow
No hurry or worries - the log4j issue is affecting a lot of people.

When you do have time to follow up it might save us a step if you PM'd the archive to me as well.

Thanks and Best Regards,
Keith