Nagios Worker 100 CPU - Nagios hangs
Posted: Wed Oct 04, 2017 2:52 am
by Mike-sbg
Hello!
I'm using Nagios 4.3.2 on CentOS 7 and I'm seeing strange behaviour that, from what I've read, should already be fixed:
One of the worker threads uses 100% CPU and the whole Nagios process stops working.
Only restarting Nagios fixes the problem, and only for a few hours.
The strange thing is that this server was in test mode (behind a Nagios 3 server, connected via OCSP) and everything worked fine for some months.
I experimented a little with the workers= setting in nagios.cfg and found that when I reduce the workers to 4, the crash appears every few minutes.
With workers=9 it happens once or twice a day.
At the moment I'm testing with more workers.
Can anybody help me?
Re: Nagios Worker 100 CPU - Nagios hangs
Posted: Wed Oct 04, 2017 4:40 pm
by tgriep
Can you post your nagios.cfg file so we can see the settings?
Also, when the system hangs, can you post the nagios.log file and, if you have syslog enabled in nagios.cfg, any entries in /var/log/messages?
Re: Nagios Worker 100 CPU - Nagios hangs
Posted: Thu Oct 05, 2017 1:04 am
by Mike-sbg
nagios.log shows nothing special when the problem occurs; it just freezes.
I looked at the abrt log and the only thing I saw was that check_dns crashes with a segmentation fault and dumps core:
Oct 5 07:52:55 atnagiossr03 abrt-hook-ccpp: Process 16150 (check_dns) of user 993 killed by SIGSEGV - dumping core
Oct 5 07:52:55 atnagiossr03 abrt-server: Executable '/usr/lib64/nagios/plugins/check_dns' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Oct 5 07:52:55 atnagiossr03 abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2017-10-05-07:52:55-16150' exited with 1
Oct 5 07:52:55 atnagiossr03 abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2017-10-05-07:52:55-16150'
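The abrt-hook-ccpp lines above follow a fixed pattern, so instead of spotting crashes by eye you can scan syslog for every plugin that was killed by a signal. This is a minimal sketch, assuming the message format shown in these lines; the function names `find_crashes` and `crashes_per_plugin` are hypothetical.

```python
import re
from collections import Counter

# Matches abrt-hook-ccpp lines such as:
# "... abrt-hook-ccpp: Process 16150 (check_dns) of user 993 killed by SIGSEGV - dumping core"
CRASH_RE = re.compile(
    r"abrt-hook-ccpp: Process (?P<pid>\d+) \((?P<exe>[^)]+)\) "
    r"of user (?P<uid>\d+) killed by (?P<signal>SIG[A-Z]+)"
)


def find_crashes(lines):
    """Return (pid, executable, uid, signal) tuples for every crash line."""
    crashes = []
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            crashes.append(
                (int(match["pid"]), match["exe"], int(match["uid"]), match["signal"])
            )
    return crashes


def crashes_per_plugin(lines):
    """Count crashes per executable name, e.g. to spot a misbehaving plugin."""
    return Counter(exe for _, exe, _, _ in find_crashes(lines))


if __name__ == "__main__":
    sample = [
        "Oct 5 07:52:55 atnagiossr03 abrt-hook-ccpp: Process 16150 (check_dns) "
        "of user 993 killed by SIGSEGV - dumping core",
        "Oct 5 07:52:55 atnagiossr03 abrt-server: Deleting problem directory "
        "'/var/spool/abrt/ccpp-2017-10-05-07:52:55-16150'",
    ]
    print(find_crashes(sample))
    print(crashes_per_plugin(sample))
```

In practice you would pass it the lines of /var/log/messages, which is usually readable only by root.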
I disabled the check for testing, because I don't know whether it is part of the problem.
My nagios.cfg is attached to this thread.
Re: Nagios Worker 100 CPU - Nagios hangs
Posted: Thu Oct 05, 2017 9:47 am
by tgriep
The check_dns core dump could be what is causing the hang: the system may not be able to save the dump file, which leaves the worker hanging.
If disabling that check keeps the system from hanging, you may need to enable debugging in Nagios to get more details on the issue in the nagios.debug file.
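Debug logging is switched on through directives in nagios.cfg. The sketch below edits the config text to set `debug_level` and `debug_verbosity`, which are real nagios.cfg directives; the chosen values (-1 to log everything, 2 for the most detailed output) and the helper name `enable_debugging` are assumptions for illustration, and the sample `log_file` line is only an example.

```python
# Hypothetical helper: enable Nagios debug logging by editing nagios.cfg text.
# debug_level and debug_verbosity are nagios.cfg directives; the values below
# (-1 = log everything, 2 = most detailed) are assumptions for troubleshooting.
DEBUG_SETTINGS = {"debug_level": "-1", "debug_verbosity": "2"}


def enable_debugging(cfg_text, settings=None):
    """Return nagios.cfg text with the debug directives set.

    Existing (uncommented) directive lines are replaced in place; any
    directive that is missing is appended at the end of the file.
    """
    remaining = dict(DEBUG_SETTINGS if settings is None else settings)
    out = []
    for line in cfg_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_comment = line.lstrip().startswith("#")
        if not is_comment and "=" in line and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append directives that were not present in the original file.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "log_file=/var/log/nagios/nagios.log\ndebug_level=0\n"
    print(enable_debugging(sample), end="")
```

After writing the result back to nagios.cfg (its location depends on how Nagios was installed), restart Nagios; the debug output goes to the file named by `debug_file`. At level -1 that file grows quickly, so `max_debug_file_size` is worth checking too.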
Also, make sure the ulimit settings on the server are set to unlimited so the core dump file can be created if it happens again.
That setting is found in /etc/security/limits.conf.
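To confirm which core file size limit is actually in effect, you can query it directly. This is a minimal sketch using Python's standard `resource` module, which reports the soft and hard limits (in bytes) of the process running it; reading `/proc/<pid>/limits` shows the limits of another process, such as a running Nagios worker, and may require root. Linux is assumed, and the function names are hypothetical.

```python
import os
import resource


def describe(limit):
    """Render an rlimit value as text."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


def core_dump_limits():
    """Return the (soft, hard) core file size limits, in bytes, of this process."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def raise_core_limit_to_hard():
    """Raise the soft core limit up to the hard limit (allowed without root)."""
    soft, hard = core_dump_limits()
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return core_dump_limits()


def process_core_limit(pid):
    """Read the soft/hard 'Max core file size' of a process from /proc (Linux)."""
    prefix = "Max core file size"
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith(prefix):
                soft, hard = line[len(prefix):].split()[:2]
                return soft, hard
    return None


if __name__ == "__main__":
    soft, hard = core_dump_limits()
    print(f"this process: soft={describe(soft)}, hard={describe(hard)}")
    # Replace os.getpid() with the PID of a Nagios worker to inspect it instead.
    print("via /proc:", process_core_limit(os.getpid()))
```

The soft/hard split mirrors the `soft` and `hard` entries in limits.conf: a process may raise its own soft limit up to the hard limit, but only root can raise the hard limit.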
Re: Nagios Worker 100 CPU - Nagios hangs
Posted: Fri Oct 06, 2017 12:25 am
by Mike-sbg
I disabled check_dns completely and there are no more abrt crashes.
But Nagios still hung twice in the last 12 hours.
Thank you for the advice about limits.conf; I set the limits to unlimited and I hope this fixes the problem.
I'll report back on Monday.
Re: Nagios Worker 100 CPU - Nagios hangs
Posted: Fri Oct 06, 2017 8:48 am
by tgriep
OK, let us know your findings.
Re: Nagios Worker 100 CPU - Nagios hangs
Posted: Mon Oct 09, 2017 12:15 am
by Mike-sbg
Nagios stayed stable the whole weekend.
The advice about the ulimits fixed my problem.
I'm very happy!
Thank you for helping!
Re: Nagios Worker 100 CPU - Nagios hangs
Posted: Mon Oct 09, 2017 8:59 am
by tgriep
You're very welcome. I'll close this thread and mark it as solved. If you have any issues in the future, feel free to open a new post.