Nagios process crashing, looking for debugging suggestions
Posted: Tue Feb 16, 2021 9:13 pm
Hiya, as per the subject, the Nagios process appears to be crashing / bombing out for no obvious reason (to me).
I'm looking for suggestions / help in finding why this is bombing out:
Our Setup:
Nagios process running on a physical CentOS Linux 7.9.2009 (Core) host
We distribute jobs to 4 workers using mod_gearman
Nagios XI 5.8.1
nagios --version
Nagios Core 4.4.6
gearmand --version
gearmand 1.1.19.1 - https://github.com/gearman/gearmand/issues
mod_gearman_worker --version
mod_gearman_worker: version 3.3.0 running on libgearman 1.1.19.1
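For reference when comparing setups, the version output above can be checked programmatically. This is a minimal Python sketch that assumes the exact output formats shown above. It only confirms that the worker's libgearman matches the gearmand release. It does not detect a Nagios header mismatch in the mod_gearman NEB module.

```python
import re


def parse_versions(gearmand_out: str, worker_out: str) -> dict:
    """Extract version numbers from `gearmand --version` and
    `mod_gearman_worker --version` output (formats as shown above)."""
    gearmand = re.search(r"gearmand\s+([\d.]+)", gearmand_out)
    worker = re.search(
        r"version\s+([\d.]+)\s+running on libgearman\s+([\d.]+)", worker_out
    )
    if not gearmand or not worker:
        raise ValueError("unrecognised version output")
    return {
        "gearmand": gearmand.group(1),
        "mod_gearman_worker": worker.group(1),
        "libgearman": worker.group(2),
    }


versions = parse_versions(
    "gearmand 1.1.19.1 - https://github.com/gearman/gearmand/issues",
    "mod_gearman_worker: version 3.3.0 running on libgearman 1.1.19.1",
)
print(versions)
# The worker's client library should match the gearmand release it talks to.
print("libgearman matches gearmand:", versions["libgearman"] == versions["gearmand"])
```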
nagios.log excerpt:
[1613526089] NDO-3: Ended flapping thread
[1613526089] NDO-3: Ended acknowledgement thread
[1613526089] NDO-3: Ended statechange thread
[1613526089] NDO-3: Ended event_handler thread
[1613526089] NDO-3: Ended notification thread
...
[1613526089] Caught SIGSEGV, shutting down...
...
...
...
[1613526323] NDO-3: Ended acknowledgement thread
[1613526323] NDO-3: Ended flapping thread
[1613526323] NDO-3: Ended statechange thread
[1613526323] NDO-3: Ended event_handler thread
[1613526323] Caught SIGSEGV, shutting down...
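The timestamps in nagios.log are Unix epoch seconds, so the two crashes above are 234 seconds (just under four minutes) apart. This is a small Python sketch, assuming the `[epoch] message` line format shown above. It pulls out every SIGSEGV line, converts the time to UTC and reports the gap between crashes, which makes it easier to line them up against other logs or scheduled jobs.

```python
import re
from datetime import datetime, timezone

# nagios.log lines start with a Unix epoch timestamp in square brackets.
LINE_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def find_crashes(lines):
    """Return (epoch, UTC datetime) for every 'Caught SIGSEGV' line."""
    crashes = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and "Caught SIGSEGV" in m.group(2):
            epoch = int(m.group(1))
            crashes.append((epoch, datetime.fromtimestamp(epoch, tz=timezone.utc)))
    return crashes


excerpt = """\
[1613526089] NDO-3: Ended notification thread
[1613526089] Caught SIGSEGV, shutting down...
[1613526323] NDO-3: Ended event_handler thread
[1613526323] Caught SIGSEGV, shutting down...
""".splitlines()

crashes = find_crashes(excerpt)
for epoch, when in crashes:
    print(epoch, when.isoformat())

# Seconds between consecutive crashes.
gaps = [b[0] - a[0] for a, b in zip(crashes, crashes[1:])]
print("gaps (s):", gaps)
```

In practice the same function can be fed the real file, e.g. `find_crashes(open(path))`, where `path` is wherever your nagios.log lives.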
I'm happy to poke around and try things out to see if we can debug the fault. For now I've enabled a process-watching script that restarts Nagios if it detects that it is down, and that keeps it running for the moment. But this is of course not ideal, as the crash is happening far too frequently.
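The watchdog script itself isn't shown here, so the following is only a hypothetical Python sketch of the "is Nagios still up?" check such a script might perform. It assumes Nagios writes its PID to the file named by `lock_file=` in nagios.cfg; the path `/var/run/nagios.lock` below is an assumption, so check your own nagios.cfg. The restart step is left as a placeholder.

```python
import os


def pid_is_running(pid: int) -> bool:
    """True if a process with this PID exists (signal 0 only checks, it does not kill)."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def nagios_is_running(lock_file: str) -> bool:
    """Read the PID from the Nagios lock file and check that the process is alive."""
    try:
        with open(lock_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable or malformed lock file: treat Nagios as down.
        return False
    return pid_is_running(pid)


if __name__ == "__main__":
    # Hypothetical path: use the lock_file= value from your nagios.cfg.
    LOCK_FILE = "/var/run/nagios.lock"
    if not nagios_is_running(LOCK_FILE):
        # Placeholder: a real watchdog would restart the service here,
        # e.g. with `systemctl restart nagios` on CentOS 7.
        print("nagios is down; a restart would be triggered here")
```

Checking the PID rather than just grepping the process list avoids false positives from other processes with "nagios" in their name, though a stale lock file whose PID has been reused could still fool it.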
I've read in the past that this can be related to mod_gearman and a header mismatch, so perhaps I need to rebuild mod_gearman. At present I am using the Nagios install scripts for the workers / server to set up both, and they usually work well enough for me.
cheers
--Aaron