
Re: Number of Nagios workers causing interruptions

Posted: Sun Dec 17, 2017 10:38 pm
by tacolover101
A few questions for you:
- Do you have certain checks that fail or keep threads open? SNMP checks can cause this, as their timeout is quite long.
- What type of disks is the server running on? Performance-wise, you're getting close to what normal hardware can handle before it needs to be tuned.
- Is this a physical machine or a VM?

Posted: Mon Dec 18, 2017 11:11 am
by kyang
Thanks for the help @tacolover101!

reincarne, please respond to tacolover's questions and let us know the answers.

Also, is the DB still offloaded, or is it back on the XI server?

Posted: Sun Jan 14, 2018 9:40 am
by reincarne
tacolover101 wrote: A few questions for you:
- Do you have certain checks that fail or keep threads open? SNMP checks can cause this, as their timeout is quite long.
- What type of disks is the server running on? Performance-wise, you're getting close to what normal hardware can handle before it needs to be tuned.
- Is this a physical machine or a VM?

Well, there are some checks that fail: some because of a real issue, others possibly because of security issues, etc.
Still, why does Nagios have to create new workers? The new workers seem to create zombies, which then keep some old data mixed with updated data and cause load on the server.

We fixed it by creating a cron job that monitors the number of workers and kills zombies :)
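The cron script itself wasn't posted, so what follows is only a minimal Python sketch of the monitoring half of such a job, under assumptions that are not from the thread: a Linux host with `/proc`, and Nagios worker processes whose command name starts with `nagios`. One caveat worth knowing: a zombie process has already exited, so sending it a signal does nothing. It disappears only when its parent reaps it with `wait()` or when the parent itself exits. For that reason the sketch reports zombies grouped by parent PID instead of trying to kill them. The helper names (`parse_stat`, `read_procs`, `summarize`) are hypothetical.

```python
"""Hypothetical cron-job helper: count Nagios processes and report zombies.

Assumptions (not from the thread): Linux /proc layout, and Nagios workers
having a command name (comm) that starts with "nagios".
"""
from __future__ import annotations

import os
from typing import Iterable, NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    state: str  # one-letter state from /proc/<pid>/stat, e.g. "S", "R", "Z"
    name: str   # command name (comm), truncated by the kernel to 15 chars


def parse_stat(line: str) -> Proc:
    """Parse one /proc/<pid>/stat line.

    The comm field is wrapped in parentheses and may itself contain spaces
    or ')' characters, so split on the LAST ')' instead of on whitespace.
    """
    left, right = line.rsplit(")", 1)
    pid_str, name = left.split(" (", 1)
    fields = right.split()  # fields[0] is the state, fields[1] the parent PID
    return Proc(pid=int(pid_str), ppid=int(fields[1]), state=fields[0], name=name)


def read_procs(proc_root: str = "/proc") -> list[Proc]:
    """Read every numeric /proc entry; returns [] where /proc does not exist."""
    if not os.path.isdir(proc_root):
        return []
    procs = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                procs.append(parse_stat(f.read()))
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited (or is hidden) while we were scanning
    return procs


def summarize(procs: Iterable[Proc], prefix: str = "nagios") -> tuple[int, dict[int, list[int]]]:
    """Return (count of non-zombie matching processes, {parent_pid: [zombie_pids]})."""
    live = 0
    zombies: dict[int, list[int]] = {}
    for p in procs:
        if not p.name.startswith(prefix):
            continue
        if p.state == "Z":
            zombies.setdefault(p.ppid, []).append(p.pid)
        else:
            live += 1
    return live, zombies


if __name__ == "__main__":
    live, zombies = summarize(read_procs())
    print(f"live nagios processes: {live}")
    for parent, pids in sorted(zombies.items()):
        # Signalling these PIDs would have no effect: they are already dead
        # and remain only until `parent` reaps them (or exits itself).
        print(f"parent {parent} has unreaped zombies: {pids}")
```

A script like this could be scheduled from cron like any other; what to do when the zombie count grows (alerting, or restarting the parent process) is a policy decision the sketch deliberately leaves out.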

Posted: Mon Jan 15, 2018 12:57 pm
by dwhitfield
reincarne wrote: We fixed it by creating a cron job that monitors the number of workers and kills zombies :)

The workers shouldn't be zombies, but it sounds like you've got a resolution. Are we ready to lock this up?