server was down, now UP but many gzip processes

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
zaji_nms
Posts: 616
Joined: Tue Oct 16, 2012 12:28 am

Post by zaji_nms »

Dear Expert

Our server was down for 4 days. Now it is up, but the CPU load is spiking because of gzip. (We worked around it with killall -9 gzip, but had to repeat this command many times.) I think Nagios XI is trying to run all of the backups from the last 4 days that were pending while the server was down. Imagine the worst case, where the server is down for more than a week or a month.

I think you should fix this issue. Nagios XI should only show a WARNING message in the web (HTTP) interface, the same way it does when the database has crashed. It should not try to run all of the backups from the last 4 days (that is meaningless).

Please run the same lab test on version XI 5.x.x.

Regards
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: server was down, now UP but many gzip processes

Post by mbellerue »

That does sound pretty bad. What exact version of XI are you running that had this problem?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
zaji_nms
Posts: 616
Joined: Tue Oct 16, 2012 12:28 am

Re: server was down, now UP but many gzip processes

Post by zaji_nms »

Happy New Year to Nagios Team and readers

We experienced the same issue on Nagios XI 5.2.3 and Nagios XI 5.2.7.
It would be better if you ran a lab test on your latest XI 5.6.9.

Regards
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: server was down, now UP but many gzip processes

Post by lmiltchev »

I was not able to recreate the issue in-house. After powering on a server with a scheduled daily backup, only one backup (tarball) was created, not several. I didn't want to wait for 4 days, so I just changed the date/time of the server to 4 days in the future. :) Also, I only had a "local" backup set up.

Are you sure that your Nagios XI server tried to run backups for the last 4 days? Is it possible that you have different backup types that were run, e.g. Local, SSH, and FTP?

It is hard to imagine that gzip would crash the server or cause such a high CPU load. We haven't experienced this during our tests, and haven't heard of any other users reporting a similar (or the same) issue. I will do some more digging into this, but so far, I haven't had any luck reproducing the problem.
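
One way to check whether several backup jobs really were started at once, rather than repeatedly running killall -9 gzip, is to group the running gzip processes by their parent process. The following is only a rough sketch, assuming a Linux server with the procps version of ps; the function names count_by_parent and gzip_pid_ppid are made up for this example and are not part of Nagios XI.

```shell
#!/bin/sh
# Sketch only: find out which parent processes are spawning gzip.
# Function names are hypothetical, not Nagios XI tooling.

# Reads "PID PPID" pairs on stdin and prints "COUNT PPID",
# with the parent that owns the most gzip children first.
count_by_parent() {
    awk '{ n[$2]++ } END { for (p in n) print n[p], p }' | sort -rn
}

# Lists running gzip processes as "PID PPID" (Linux procps ps).
# ps exits non-zero when nothing matches, so that case is ignored.
gzip_pid_ppid() {
    ps -C gzip -o pid=,ppid= 2>/dev/null || true
}

# Usage:
#   gzip_pid_ppid | count_by_parent
#   ps -o pid,etime,args -p <PPID>    # then inspect a parent by hand
```

If most of the gzip processes share one parent, that parent's command line should show which backup job (Local or SSH) launched them.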
zaji_nms
Posts: 616
Joined: Tue Oct 16, 2012 12:28 am

Re: server was down, now UP but many gzip processes

Post by zaji_nms »

Attn : lmiltchev

Yes, we have Local & SSH backups. (SSH on remote server).

Please note that we had this bad experience twice. I posted to benefit any other user who may face this problem in the future.

Try to simulate it in your lab (a real environment with a big data file, 3GB+). If the backup file is very small, you may not notice the problem.

If your digging does not find any issue, you can close this post.
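
For anyone who wants to get a feel for how long a single gzip run takes on a file of this size, a rough sketch follows. It assumes a Linux shell with head, gzip and date available; make_test_file and time_gzip are hypothetical helper names, and random data is used only because it does not compress well, which makes it a pessimistic stand-in for a real backup.

```shell
#!/bin/sh
# Sketch only: time gzip on a large throwaway file.
# Helper names are hypothetical; this does not touch Nagios XI itself.

# make_test_file PATH SIZE_MB: writes SIZE_MB mebibytes of random data.
make_test_file() {
    head -c "$(( $2 * 1048576 ))" /dev/urandom > "$1"
}

# time_gzip PATH: compresses PATH to /dev/null and prints whole seconds taken.
time_gzip() {
    start=$(date +%s)
    gzip -c "$1" > /dev/null
    end=$(date +%s)
    echo $(( end - start ))
}

# Usage (3072 MB is roughly the 3GB mentioned above; needs that much free disk):
#   make_test_file /tmp/gzip-test.bin 3072
#   time_gzip /tmp/gzip-test.bin
#   rm -f /tmp/gzip-test.bin
```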

Regards
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: server was down, now UP but many gzip processes

Post by lmiltchev »

zaji_nms wrote: "Try to simulate in you lab (real environment with big data file 3GB+), if its very short backup file possible you will not notice."

I would like to test this, and file a bug report if it is a bug. This could be an "edge case", and sometimes it is difficult to reproduce the issue when this happens. It would help if we had more information, which can help us lab this in-house.

1. When you say: "server was down", do you mean your server (or VM) was powered off? The reason I am asking is that it is possible that apache was not running, and you were not able to log in to the web interface, but in this case, nagios would still be running in the background. I just want to find out exactly what happened.

2. What is the "Backup Limit" that you are currently using?

3. Do you have any "partial" backups (folders) in your backup location(s)? Can you list your backups so that we can see the "normal" size of the tarballs?

Example:

ls -la /store/backups/nagiosxi/

4. Do you have anything unusual in the "/usr/local/nagiosxi/var/components/scheduledbackups.log" that can point us in the right direction?

5. Have you enabled debugging for scheduled backups in order to troubleshoot the issue?

https://support.nagios.com/kb/article/n ... l-578.html

6. Were the cron jobs running when you were having this issue?

 ps -ef | grep cron | grep -v grep
Once I have this info, I will try testing the scheduled backups again in a large environment. Thanks!
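
To save a few round trips, the checks from items 3, 4 and 6 can be gathered in one go. This is just a convenience sketch: the function name collect_backup_info is made up, and the default paths are the ones mentioned above, so pass your own backup location and log path as arguments if they differ.

```shell
#!/bin/sh
# Sketch only: collect the backup listing, the tail of the scheduled
# backups log, and the cron process list into one report.
# collect_backup_info is a hypothetical helper, not a Nagios XI command.

collect_backup_info() {
    backup_dir="${1:-/store/backups/nagiosxi}"
    log_file="${2:-/usr/local/nagiosxi/var/components/scheduledbackups.log}"

    echo "== Backups in $backup_dir =="
    ls -la "$backup_dir" 2>&1 || true

    echo "== Last 50 lines of $log_file =="
    tail -n 50 "$log_file" 2>&1 || true

    echo "== cron processes =="
    ps -ef | grep cron | grep -v grep || echo "(no cron processes found)"
}

# Usage:
#   collect_backup_info > /tmp/backup-info.txt
#   collect_backup_info /path/to/backups /path/to/scheduledbackups.log
```

Review the output for anything sensitive before attaching it to a post.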