Nagios XI 2014R2.5 Backup failing

Post by **JakeHatMacys** » Wed Aug 19, 2015 8:01 am

I have a hunch it may be due to a sudo change. (When we upgraded a box to 2.7, the nagios user didn't have permissions.) This box, however, is still running 2.5, and I'm wondering if that change may have negatively affected it. I'll paste the change below the error. Then again, we have another box running 2.5 that is working fine :/

Error:

Code:

 /usr/local/nagiosxi/scripts/backup_xi.sh
Backing up Core Config Manager (NagiosQL)...
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
Backing up Nagios Core...
tar: Removing leading `/' from member names
tar: /usr/local/nagios/share/perfdata/esu1l384: file changed as we read it
tar: /usr/local/nagios/share/perfdata/esu2v775: file changed as we read it
tar: /usr/local/nagios/var/ndo.sock: socket ignored
tar: /usr/local/nagios/var/rw/nagios.qh: socket ignored
tar: /usr/local/nagios/var: file changed as we read it
Backing up Nagios XI...
tar: Removing leading `/' from member names
Backing up MRTG...
tar: Removing leading `/' from member names
Backing up NRDP...
tar: Removing leading `/' from member names
Backing up MySQL databases...
mysqldump: Got error: 144: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed                    when using LOCK TABLES
Error backing up MySQL database 'nagios' - check the password in this script!

Sudo file change: per my Unix admin, they manage sudo from a master file. The entry on the Nagios box doesn't really do anything:

Code:


Original:

Defaults:nagios !requiretty
nagios ALL=NOPASSWD: /usr/local/nagios/libexec/check_init_service
User_Alias      NAGIOSXI=nagios
User_Alias              NAGIOSXIWEB=apache
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios start
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios stop
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios restart
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios reload
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios status
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios checkconfig
NAGIOSXI ALL = NOPASSWD:/etc/init.d/ndo2db start
NAGIOSXI ALL = NOPASSWD:/etc/init.d/ndo2db stop
NAGIOSXI ALL = NOPASSWD:/etc/init.d/ndo2db restart
NAGIOSXI ALL = NOPASSWD:/etc/init.d/ndo2db reload
NAGIOSXI ALL = NOPASSWD:/etc/init.d/ndo2db status
NAGIOSXI ALL = NOPASSWD:/etc/init.d/npcd start
NAGIOSXI ALL = NOPASSWD:/etc/init.d/npcd stop
NAGIOSXI ALL = NOPASSWD:/etc/init.d/npcd restart
NAGIOSXI ALL = NOPASSWD:/etc/init.d/npcd reload
NAGIOSXI ALL = NOPASSWD:/etc/init.d/npcd status
NAGIOSXI ALL = NOPASSWD:/usr/bin/nmap *
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/upgrade_to_latest.sh
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/change_timezone.sh
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/manage_services.sh *
NAGIOSXIWEB ALL = NOPASSWD:/usr/bin/tail -100 /var/log/messages
NAGIOSXIWEB ALL = NOPASSWD:/usr/bin/tail -100 /var/log/httpd/error_log
NAGIOSXIWEB ALL = NOPASSWD:/usr/bin/tail -100 /var/log/mysqld.log
NAGIOSXIWEB ALL = NOPASSWD:/usr/bin/nmap *
NAGIOSXIWEB ALL = NOPASSWD:/etc/init.d/snmptt restart
NAGIOSXIWEB ALL = NOPASSWD:/usr/local/nagiosxi/scripts/repair_databases.sh
NAGIOSXIWEB ALL = NOPASSWD:/usr/local/nagiosxi/scripts/manage_services.sh *


These were converted down to:

Defaults:nagios !requiretty

Host_Alias     NAGIOS = ***list of hosts servers deleted****

Cmnd_Alias     NAGIOS=/usr/local/nagios/libexec/check_init_service, /etc/init.d/nagios *, /etc/init.d/ndo2db *, /etc/init.d/npcd *, /usr/bin/nmap *, /usr/local/nagiosxi/scripts/*
Cmnd_Alias     NAGIOSWEB = /usr/bin/tail -100 /var/log/messages, /usr/bin/tail -100 /var/log/httpd/error_log, /usr/bin/tail -100 /var/log/mysqld.log, /usr/bin/nmap *, /etc/init.d/snmptt restart, /usr/local/nagiosxi/scripts/*

nagios     NAGIOS = (root) NOPASSWD:NAGIOS
apache     NAGIOSWEB = (root) NOPASSWD:NAGIOSWEB

Post by **tgriep** » Wed Aug 19, 2015 9:14 am

It looks like the MySQL database needs to be repaired. Here are the instructions to do that:
https://assets.nagios.com/downloads/nag ... tabase.pdf
After the repair, see if the backup finishes.
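Before running the repair, it can help to confirm which table MySQL is complaining about. The following is a minimal sketch, not part of the Nagios XI tooling: it filters backup or `mysqldump` output for the "marked as crashed" message shown in the first post. The function name `crashed_tables` is hypothetical.

```shell
# Print the table path from every "is marked as crashed" line read on stdin.
# Hypothetical helper; it only parses text and does not touch the database.
crashed_tables() {
    sed -n "s/.*Table '\([^']*\)' is marked as crashed.*/\1/p"
}

# Usage (assumes the backup script path from the first post):
# /usr/local/nagiosxi/scripts/backup_xi.sh 2>&1 | crashed_tables
```

Fed the error from the first post, this prints `./nagios/nagios_logentries`, which is the table the repair needs to fix.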

Post by **JakeHatMacys** » Wed Aug 19, 2015 2:59 pm

Ran the repair, which worked. But we're now seeing some errors under /var:

Code:

Aug 19 15:53:00 esu1l268 nagios: wproc: CHECK job 876 from worker Core Worker 3152 timed out after 60.01s
Aug 19 15:53:00 esu1l268 nagios: wproc:   host=s96d3z0; service=check_FS_space_Solaris_by_sshpass;
Aug 19 15:53:00 esu1l268 nagios: wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Aug 19 15:53:00 esu1l268 ndo2db: Error: Could not connect to MySQL database: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
Aug 19 15:53:00 esu1l268 ndo2db: Error: Could not connect to MySQL database: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
Aug 19 15:53:00 esu1l268 nagios: Warning: Check of service 'check_FS_space_Solaris_by_sshpass' on host 's96d3z0' timed out after 60.006s!
Aug 19 15:53:00 esu1l268 ndo2db: Error: Could not connect to MySQL database: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
Aug 19 15:53:00 esu1l268 ndo2db: Error: Could not connect to MySQL database: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
Aug 19 15:53:00 esu1l268 nagios: wproc: Core Worker 3152: job 876 (pid=17007): Dormant child reaped
Aug 19 15:53:00 esu1l268 rsyslogd-2177: imuxsock begins to drop messages from pid 3169 due to rate-limiting
Aug 19 15:53:02 esu1l268 snmpd[2649]: Connection from UDP: [11.48.116.70]:47248->[11.48.4.85]

Specifically showing in: /var/log/messages

Post by **tgriep** » Wed Aug 19, 2015 3:11 pm

Is the MySQL database offloaded to another server?
If it is running locally, let's restart the processes by following these steps.

Code:

service nagios stop
service ndo2db stop
killall -9 nagios
service mysqld restart
service ndo2db start
service nagios start

Try that and see if the errors are gone.
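The `(2)` in the ndo2db errors above is errno 2 (no such file or directory), meaning the socket file was not there when ndo2db tried to connect. As a rough sketch, assuming the socket path from the log and a hypothetical helper named `wait_for_socket`, the restart could wait for the socket to appear before starting ndo2db:

```shell
# Wait until a Unix socket exists at the given path, up to a timeout in seconds.
# Returns 0 once the socket is present, 1 if the timeout is reached first.
# Hypothetical helper; the default timeout of 30 seconds is an arbitrary choice.
wait_for_socket() {
    sock="$1"
    timeout="${2:-30}"
    waited=0
    while [ ! -S "$sock" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage (as root), with the socket path taken from the log above:
# service mysqld restart \
#   && wait_for_socket /var/lib/mysql/mysql.sock 30 \
#   && service ndo2db start
```

If the function returns 1, MySQL did not come up within the timeout, and `/var/log/mysqld.log` is the place to look next.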

Post by **JakeHatMacys** » Mon Aug 24, 2015 1:57 pm

So when I run that MySQL repair, things look good for a bit. Not sure how long, but after what seemed like 30 minutes none of my services are listed in Nagios XI or Core.

Things seem to be running in the background okay though as we're still alerting.

What it's saying when we try to pull up the services page in XI:

SQL: SQL Error [ndoutils] : Got error 28 from storage engine

Showing 1-500 of 9,885 total records

We are currently running 2,500 host checks and close to 10,000 services on the box (it's a massive box; the hardware isn't being taxed). But I'm wondering if the engine just isn't keeping up?

Any ideas for diagnosis?

For the record, I don't care so much about the backup running at the moment as I do about getting the database error fixed!

Post by **JakeHatMacys** » Mon Aug 24, 2015 2:00 pm

tgriep wrote: Is the MySQL database offloaded to another server?
If it is running locally, let's restart the processes by following these steps.
Code:
service nagios stop
service ndo2db stop
killall -9 nagios
service mysqld restart
service ndo2db start
service nagios start
Try that and see if the errors are gone.

To answer this: it is running locally, sorry.

And this is what I see when I tail /var/log/messages:

Aug 24 15:00:41 esu1l268 ndo2db: Warning: queue send error, retrying...
Aug 24 15:00:57 esu1l268 sshd[32543]: Did not receive identification string from 11.48.23.75
Aug 24 15:01:01 esu1l268 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Aug 24 15:01:01 esu1l268 ndo2db: Warning: queue send error, retrying...
Aug 24 15:01:21 esu1l268 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Aug 24 15:01:21 esu1l268 ndo2db: Warning: queue send error, retrying...
Aug 24 15:01:41 esu1l268 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Aug 24 15:01:41 esu1l268 ndo2db: Warning: queue send error, retrying...
Aug 24 15:02:01 esu1l268 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Aug 24 15:02:01 esu1l268 ndo2db: Warning: queue send error, retrying...
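The "Kernel queue parameters" in that message refer to the System V message queue limits that ndo2db relies on. The thread goes on to trace the immediate problem to a full filesystem, so the following is only a sketch for seeing the current values, assuming a Linux host where they are exposed under `/proc/sys/kernel`. The function name is hypothetical, and the sketch does not suggest values to set.

```shell
# Print the current System V message queue limits.
# The directory argument exists only so the function can be pointed at a
# different location; by default it reads the live kernel values.
show_msg_queue_limits() {
    dir="${1:-/proc/sys/kernel}"
    for name in msgmax msgmnb msgmni; do
        if [ -r "$dir/$name" ]; then
            printf 'kernel.%s = %s\n' "$name" "$(cat "$dir/$name")"
        fi
    done
}

# Usage:
# show_msg_queue_limits
```

Comparing the output against the recommendations in the ndoutils README mentioned by the log message would show whether tuning is actually needed.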

Post by **hsmith** » Mon Aug 24, 2015 2:02 pm

JakeHatMacys wrote: So when I run that MySQL repair, things look good for a bit. Not sure how long, but after what seemed like 30 minutes none of my services are listed in Nagios XI or Core.

Things seem to be running in the background okay though as we're still alerting.

What it's saying when we try to pull up the services page in XI:
SQL: SQL Error [ndoutils] : Got error 28 from storage engine

Showing 1-500 of 9,885 total records
We are currently running 2,500 host checks and close to 10,000 services on the box (it's a massive box; the hardware isn't being taxed). But I'm wondering if the engine just isn't keeping up?

Any ideas for diagnosis?

For the record, I don't care so much about the backup running at the moment as I do about getting the database error fixed!

Error 28 is related to storage space: it is the operating system's "No space left on device" error.

What does df -h show?
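On a box with many mounts, the full ones are easy to miss by eye. A minimal sketch of that check, assuming POSIX-format output from `df -P` (one line per filesystem) and a hypothetical function name and threshold:

```shell
# Print mount points whose usage is at or above a threshold percentage.
# Reads `df -P` output on stdin. Mount points containing spaces are not
# handled; this is a sketch, not a monitoring check.
full_filesystems() {
    threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)          # strip the trailing percent sign
        if (use + 0 >= t) print $6, $5
    }'
}

# Usage:
# df -P | full_filesystems 90
```

Since MySQL writes temporary tables to its temp directory during repairs and large queries, a full /tmp or /var is a common way to hit error 28.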

Post by **JakeHatMacys** » Mon Aug 24, 2015 2:04 pm

Whoa, yeah, that DB maintenance sure filled things up:

Filesystem                            Size  Used  Avail  Use%  Mounted on
/dev/mapper/localvg00-lv_slash         17G  6.7G   8.8G   44%  /
tmpfs                                  48G  1.0M    48G    1%  /dev/shm
/dev/mapper/localvg00-lv_archive      4.0G  235M   3.6G    7%  /archive
/dev/sda1                             248M   38M   198M   17%  /boot
/dev/mapper/localvg00-lv_home         4.0G  1.4G   2.4G   37%  /home
/dev/mapper/localvg00-lv_opt           16G  431M    15G    3%  /opt
/dev/mapper/localvg00-lv_tmp          3.2G  3.0G      0  100%  /tmp
/dev/mapper/localvg00-lv_usr          9.9G  2.4G   7.1G   25%  /usr
/dev/mapper/localvg00-lv_usr_local     56G   25G    29G   47%  /usr/local
/dev/mapper/localvg00-lv_var           10G  8.9G   644M   94%  /var
xxxxxxxxxxxx:/export/nagios-esu1l268   60G  6.7G    54G   12%  /nagios_backups
xxxxxxxxxxxx:/export/nagios-esu2      100M     0   100M    0%  /esu1l268/Nagios

Post by **JakeHatMacys** » Mon Aug 24, 2015 2:07 pm

I think this happened last time: a deleted file was still holding onto filesystem space.

Gonna reboot the box.
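A deleted file keeps its disk space until every process that has it open closes it, so restarting the process holding it can free the space without a reboot. The following is a rough sketch for finding such files, assuming a Linux `/proc` filesystem and root privileges to read other processes' descriptors. The function name is hypothetical, and `lsof +L1` is an alternative where lsof is installed.

```shell
# List open file descriptors whose target has been deleted.
# The directory argument defaults to /proc and exists only so the function
# can be pointed at another tree.
deleted_open_files() {
    proc_root="${1:-/proc}"
    for fd in "$proc_root"/[0-9]*/fd/*; do
        # Skip descriptors that vanish or cannot be read.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)") printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}

# Usage (as root):
# deleted_open_files | grep -e /tmp -e /var
```

The number after `/proc/` in each output line is the PID of the process still holding the file.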

Post by **JakeHatMacys** » Mon Aug 24, 2015 2:32 pm

Reboot resolved things for now. Thanks for the heads up. I didn't expect that maintenance to fill up /tmp and nearly fill /var.

We're going to be migrating off this box soon so we shouldn't have space issues going forward.

Nagios Support Forum
