Check this out (the number of processes on localhost started growing midday Wednesday):
localhost-total_processes.jpg
That correlates exactly with when I updated the plugins on about 850 of our servers. We use check_by_ssh to run the CPU/disk/memory/process/etc. checks, and those are the ones that are not ending. When it hits the peaks I get this (ndo2db is offloaded; ignore the third one):
system status.JPG
When this happens, top shows that Nagios is still running: checks are executing and perfdata is being logged. But XI stops seeing any update information until I kill all the ssh connections with
kill -9 `ps -ef | grep /usr/bin/ssh | grep -v grep | awk '{print $2}'`
and restart the Nagios process.
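The one-liner above kills every /usr/bin/ssh process, including healthy checks that are still in flight. A narrower cleanup might look like the sketch below, which only picks out ssh clients older than a given age. This is a rough sketch, not something I have run in production: the function name stale_ssh_pids and the 300-second threshold are made up for illustration, and it assumes a procps-ng ps that supports the etimes (elapsed seconds) column.

```shell
#!/bin/sh
# stale_ssh_pids MAX_AGE_SECONDS  (hypothetical helper, for illustration only)
# Reads "pid elapsed_seconds command..." lines on stdin and prints the PIDs
# whose command is exactly /usr/bin/ssh and whose age exceeds MAX_AGE_SECONDS.
stale_ssh_pids() {
    awk -v max="$1" '($2 + 0) > (max + 0) && $3 == "/usr/bin/ssh" { print $1 }'
}

# Example use (left commented out so nothing is killed by accident).
# Assumes ps supports etimes and xargs supports -r (GNU):
#   ps -eo pid=,etimes=,args= | stale_ssh_pids 300 | xargs -r kill
```

Sending a plain kill (SIGTERM) first and keeping -9 as a fallback gives ssh a chance to exit cleanly.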
Two questions:
1.) Any idea why updating plugins would make check_by_ssh stop closing out processes cleanly?
2.) Any idea why XI behaves this way when the process count gets that high? The server isn't under any noticeable load. Is it because I have max concurrent checks set to 4000?