Re: Number of Nagios workers causing interruptions
Posted: Sun Dec 17, 2017 10:38 pm
by tacolover101
a few questions for you:
- do you have certain checks that fail, or keep threads open? SNMP can cause this as the timeout is quite long.
- what type of disks is the server running on? performance wise, you're getting close to what normal hardware can handle before it needs to be tweaked.
- is this a physical machine or a VM?
Re: Number of Nagios workers causing interruptions
Posted: Mon Dec 18, 2017 11:11 am
by kyang
Thanks for the help, @tacolover101!
reincarne, please respond to tacolover's questions and let us know the answers.
Also, is the DB still offloaded, or is it back on the XI server?
Re: Number of Nagios workers causing interruptions
Posted: Sun Jan 14, 2018 9:40 am
by reincarne
tacolover101 wrote:a few questions for you:
- do you have certain checks that fail, or keep threads open? SNMP can cause this as the timeout is quite long.
- what type of disks is the server running on? performance wise, you're getting close to what normal hardware can handle before it needs to be tweaked.
- is this a physical machine or a VM?
Well, there are some checks that fail. Some of them fail as a result of a real issue, and some may be caused by security issues etc.
Still, why does Nagios have to create new workers? The new workers seem to turn into zombies, which then keep some old data mixed with updated data and cause load on the server.
We fixed it by creating a crontab job that will monitor the number of workers and kill zombies.
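
The cron job itself was not posted in this thread. For readers who want to try something similar, here is a minimal sketch of how such a monitor might count workers and spot zombies. It assumes a Linux host with procps `ps` and that the processes show up under the command name `nagios`. The function names, the `limit` threshold, and the choice to only report rather than kill anything are all illustrative assumptions, not the poster's actual script.

```python
import subprocess


def parse_ps(output, name="nagios"):
    """Split matching processes into (live_pids, zombie_pids).

    Expects header-less lines of the form: PID PPID STAT COMMAND.
    Note that the main Nagios daemon is counted along with its workers.
    """
    live, zombies = [], []
    for line in output.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        pid, _ppid, stat, comm = parts
        # Defunct processes are listed as "nagios <defunct>", so compare
        # only the first word of the command column.
        if comm.split()[0] != name:
            continue
        # A process state beginning with "Z" marks a zombie (defunct) process.
        (zombies if stat.startswith("Z") else live).append(int(pid))
    return live, zombies


def main(limit=20):
    # "limit" is a hypothetical threshold; pick one that fits your setup.
    out = subprocess.run(
        ["ps", "-eo", "pid=,ppid=,stat=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    live, zombies = parse_ps(out)
    if zombies or len(live) > limit:
        print(f"nagios processes: {len(live)} live, {len(zombies)} zombie {zombies}")


if __name__ == "__main__":
    main()
```

A script like this could be scheduled from crontab at whatever interval suits the environment. It only reports: a zombie process has already exited and cannot itself be killed, so any cleanup would have to act on the parent process, which this sketch leaves to the administrator.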

Re: Number of Nagios workers causing interruptions
Posted: Mon Jan 15, 2018 12:57 pm
by dwhitfield
reincarne wrote:
We fixed it by creating a crontab job that will monitor number of workers and kill zombies

The workers shouldn't be zombies, but it sounds like you've got a resolution. Are we ready to lock this up?