Nagios stalls unless I run it under strace
Posted: Mon Mar 23, 2020 3:40 pm
I am setting up Nagios on a new server (RHEL 7). It worked fine for a couple of weeks, then suddenly stalled. The Nagios process is still running, but nothing happens: status.dat stops updating, nothing is written to nagios.log or nagios.debug, and the data in spool/checkresults just piles up. We saw a few zombie processes, so we restarted Nagios, but after the restart it runs checks for less than a minute and then stalls again.
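A minimal sketch for watching the two symptoms above (zombie processes and a growing check result spool) without attaching strace to Nagios might look like the following. It assumes Linux's /proc layout, and the spool path `/usr/local/nagios/var/spool/checkresults` is only an assumed default for a source install; the real location is whatever `check_result_path` is set to in nagios.cfg. The function names are hypothetical.

```python
import os
import sys

# Assumption: default spool path for a source-built Nagios Core install.
# Check the check_result_path setting in nagios.cfg for the real location.
DEFAULT_CHECKRESULTS = "/usr/local/nagios/var/spool/checkresults"


def count_zombies(proc_root="/proc"):
    """Count processes whose state in /proc/<pid>/stat is 'Z' (zombie)."""
    zombies = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat"), errors="replace") as f:
                data = f.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        # The command name is in parentheses and may contain spaces, so the
        # state is the first field after the LAST closing parenthesis.
        parts = data.rsplit(")", 1)
        if len(parts) == 2 and parts[1].split()[:1] == ["Z"]:
            zombies += 1
    return zombies


def count_checkresults(path=DEFAULT_CHECKRESULTS):
    """Count regular files waiting in the check result spool directory."""
    return sum(
        1 for name in os.listdir(path)
        if os.path.isfile(os.path.join(path, name))
    )


def main():
    path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_CHECKRESULTS
    try:
        print("zombie processes:", count_zombies())
    except OSError as exc:
        print("could not read /proc:", exc)
    try:
        print("files in", path + ":", count_checkresults(path))
    except OSError as exc:
        print("could not read", path + ":", exc)


if __name__ == "__main__":
    main()
```

Running this every few seconds (for example from a shell loop) while Nagios is stalled would show whether the spool count keeps climbing and whether zombies accumulate, without changing Nagios's timing the way strace does.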
We can't think of anything that changed on the system that would explain this.
Today I tried running Nagios under strace to find out what was causing the stall, but under strace Nagios works flawlessly. Our theory is that strace slows Nagios down enough to avoid whatever the problem is, but we're not sure where to go from here.
Has anyone else seen anything like this before?