Dear Expert
Our server was down for 4 days. Now it is up, but CPU load is shooting up because of gzip. (We worked around it with killall -9 gzip, but had to repeat this command many times.) I think Nagios XI is trying to run all of the backups from the last 4 days that were pending while the server was down. Just think of the worst case, where the server is down for more than a week or a month.
I think you should fix this issue. Nagios XI should only give a WARNING message on the web interface (HTTP), the same way it does when the database has crashed. It should not try to run all of the last 4 days' backups (that is meaningless).
Please do the same lab test, version XI 5.x.x.
Regards
server was down, now UP but many gzip processes
Re: server was down, now UP but many gzip processes
That does sound pretty bad. What exact version of XI are you running that had this problem?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: server was down, now UP but many gzip processes
Happy New Year to Nagios Team and readers
We experienced the same issue on Nagios XI 5.2.3 and Nagios XI 5.2.7.
It would be better if you ran a lab test on your latest XI 5.6.9.
Regards
Re: server was down, now UP but many gzip processes
I was not able to recreate the issue in-house. After powering on a server with a scheduled daily backup, only one backup (tarball) was created, not several. I didn't want to wait for 4 days, so I just changed the date/time of the server to 4 days in the future.
Also, I only had a "local" backup set up.
Are you sure that your Nagios XI server tried to run backups for the last 4 days? Is it possible that you have different backup types that were run, e.g. Local, SSH, and FTP?
It is hard to imagine that gzip would crash the server or cause such a high CPU load. We haven't experienced this during our tests, and haven't heard of any other users reporting a similar (or the same) issue. I will do some more digging into this, but so far I haven't had any luck reproducing the problem.
Re: server was down, now UP but many gzip processes
Attn: lmiltchev
Yes, we have Local and SSH backups (SSH to a remote server).
Please note that we have had this bad experience twice. I posted for the benefit of any other user who may face this problem in the future.
Try to simulate this in your lab (a real environment with a large data file, 3 GB+). If the backup file is very small, you possibly will not notice it.
If your digging does not find any issue, you can close this post.
Regards
Re: server was down, now UP but many gzip processes
I would like to test this, and file a bug report if it is a bug. This could be an "edge case", and those are sometimes difficult to reproduce. It would help if we had more information, so that we can lab this in-house.
1. When you say "server was down", do you mean your server (or VM) was powered off? I am asking because it is possible that Apache was not running and you were not able to log in to the web interface, in which case Nagios would still have been running in the background. I just want to find out exactly what happened.
2. What is the "Backup Limit" that you are currently using?
3. Do you have any "partial" backups (folders) in your backup location(s)? Can you list your backups to see what the "normal" size of the tarballs is?
Example:
Code:
ls -la /store/backups/nagiosxi/
5. Have you enabled debugging for scheduled backups in order to troubleshoot the issue?
https://support.nagios.com/kb/article/n ... l-578.html
6. Were the cron jobs running when you were having this issue?
Code:
ps -ef | grep cron | grep -v grep
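The checks above can be extended with a quick look at the gzip processes themselves. The following is a minimal sketch, assuming a Linux host where ps accepts the POSIX -o output options; the helper names count_procs and list_procs are made up for illustration and are not part of Nagios XI. It counts the running gzip processes and lists each one with its parent PID and elapsed time, which may help show whether several backup runs are overlapping.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not shipped with Nagios XI.

# Print how many running processes have a command name exactly equal to "$1".
# grep -c exits non-zero when the count is 0, so "|| true" keeps the status clean.
count_procs() {
    ps -e -o comm= | grep -cx -- "$1" || true
}

# Print PID, parent PID, elapsed time, command name and full command line
# for every process whose command name equals "$1".
list_procs() {
    ps -e -o pid= -o ppid= -o etime= -o comm= -o args= | awk -v name="$1" '$4 == name'
}

echo "gzip processes running: $(count_procs gzip)"
list_procs gzip
```

If the listed parent PIDs or elapsed times differ widely, that would suggest more than one backup run was started, which is worth including in the reply to the questions above.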