
Number of Nagios workers causing interruptions

Posted: Sun Jun 04, 2017 3:54 am
by reincarne
Hi,
I've noticed that at least once a week my Nagios XI stops functioning, and the only way I can resolve it is to kill the Nagios workers.

On a normal working day there are 12 workers. Over the course of the week, however, the count keeps growing: each time, 12 new workers are created with new PIDs.
What is causing this, and how can I solve it?

nagios 5787 5785 0 08:46 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 5788 5785 0 08:46 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 5789 5785 0 08:46 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 5791 5785 0 08:46 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 5792 5785 0 08:46 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 5793 5785 0 08:46 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 5794 5785 0 08:46 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 5795 5785 0 08:46 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 5796 5785 0 08:46 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 5797 5785 0 08:46 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 5798 5785 0 08:46 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 5799 5785 0 08:46 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
alexle 20679 3054 0 08:52 pts/2 00:00:00 grep worker
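One way to watch for this without scanning the raw `ps` output by eye is to group the worker processes by their start time, so a second generation of workers (same count, new PIDs) stands out immediately. This is just a sketch against the default `ps -ef` column layout, where column 5 is STIME:

```shell
# Count Nagios worker processes, grouped by start time. The [n] in the grep
# pattern keeps the grep process itself out of the results.
ps -ef \
  | grep '[n]agios --worker' \
  | awk '{count[$5]++} END {for (t in count) printf "%s: %d workers\n", t, count[t]}'
```

Seeing two start times, each with 12 workers, would confirm that a whole new worker generation was spawned rather than individual workers being replaced.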

Re: Number of Nagios workers causing interruptions

Posted: Mon Jun 05, 2017 9:53 am
by avandemore
Killing the workers isn't the right approach; they are likely needed.

Can you describe in more detail what you mean by "my Nagios XI stops functioning"?

A profile generated during one of these degraded states would be useful as well, if possible.

XI > Admin > System Profile > Download Profile

Please include the zip file in your response. You can also PM it to me or another member of the support staff.

Re: Number of Nagios workers causing interruptions

Posted: Wed Jun 21, 2017 1:27 pm
by tmcdonald
Just checking in since we have not heard from you in a while. Did @avandemore's post clear things up or has the issue otherwise been resolved?

Re: Number of Nagios workers causing interruptions

Posted: Sun Nov 26, 2017 7:57 am
by reincarne
Hi,
We are still experiencing this issue. To whom should I send the profile?

Re: Number of Nagios workers causing interruptions

Posted: Mon Nov 27, 2017 10:55 am
by tmcdonald
Please send it to me and make sure to reply back to this thread once you have done so.

Update: Profile received and shared with team.

Re: Number of Nagios workers causing interruptions

Posted: Tue Nov 28, 2017 2:04 am
by reincarne
tmcdonald wrote: Please send it to me and make sure to reply back to this thread once you have done so.
Hi,
Sent you a PM.

Re: Number of Nagios workers causing interruptions

Posted: Tue Nov 28, 2017 11:47 am
by kyang
Looking at your profile, the logs only date back to June 20th. Could you send us an updated one?

From your top output, mysqld is consuming 100% of a CPU.

How many hosts/services do you have?

Did you offload your database?

The best thing would be to send us an updated profile, so we can see how things are looking now.
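In the meantime, one way to spot-check the mysqld CPU pressure between profiles is a one-shot batch run of top, filtered for the process. This is only a sketch: the 90% threshold is an arbitrary value chosen here, and it assumes top's default batch column layout (column 9 is %CPU, last column is the command name):

```shell
# One batch iteration of top; print mysqld's PID and CPU usage if it is
# above an (arbitrary) 90% threshold.
top -b -n 1 \
  | awk '$NF == "mysqld" && $9 + 0 > 90 { printf "mysqld at %s%% CPU (pid %s)\n", $9, $1 }'
```

Running this from cron every few minutes and logging the output would show whether the mysqld spikes line up with the times the worker count jumps.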

Re: Number of Nagios workers causing interruptions

Posted: Sun Dec 10, 2017 7:27 am
by reincarne
Sent you the profile file.

Re: Number of Nagios workers causing interruptions

Posted: Mon Dec 11, 2017 10:12 am
by kyang
How many hosts/services do you have?

Is this an offloaded DB?

UPDATE: Profile received!

Put in teamshare.

Re: Number of Nagios workers causing interruptions

Posted: Sun Dec 17, 2017 4:37 am
by reincarne
We have about 1700 hosts and 31k service checks.
Offloading the DB caused more issues.
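For a sense of the load those numbers imply, a back-of-envelope estimate (assuming a typical 5-minute check interval, which is not stated in the thread):

```shell
# Rough check rate: 31,000 service checks spread over an assumed 5-minute
# (300 s) interval. Host checks would add to this.
services=31000
interval_s=300
echo "approx $((services / interval_s)) service checks/second"
```

Roughly 100 checks per second shared across 12 workers is a substantial sustained load, which is consistent with the scheduler struggling when the database slows down.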