I'm working on an upgrade (well, re-install) from 3.5 to 4.0.8. Everything seems to be fine, but all of a sudden Nagios is crashing or being killed.
From nagios.log:
[1432740780] Nagios 4.0.8 starting... (PID=21873)
[1432740780] Local time is Wed May 27 15:33:00 GMT 2015
[1432740780] LOG VERSION: 2.0
[1432740780] qh: Socket '/var/nagios/rw/nagios.qh' successfully initialized
[1432740780] qh: core query handler registered
[1432740780] nerd: Channel hostchecks registered successfully
[1432740780] nerd: Channel servicechecks registered successfully
[1432740780] nerd: Channel opathchecks registered successfully
[1432740780] nerd: Fully initialized and ready to rock!
[1432740780] wproc: Successfully registered manager as @wproc with query handler
[1432740780] wproc: Registry request: name=Core Worker 21878;pid=21878
[1432740780] wproc: Registry request: name=Core Worker 21876;pid=21876
[1432740780] wproc: Registry request: name=Core Worker 21877;pid=21877
[1432740780] wproc: Registry request: name=Core Worker 21875;pid=21875
[1432740780] Successfully launched command file worker with pid 21879
[1432741202] Caught SIGTERM, shutting down...
[1432741202] Caught SIGTERM, shutting down...
[1432741202] Successfully shutdown... (PID=21873)
[1432741202] Event broker module 'NERD' deinitialized successfully.
I enabled debug mode, and didn't see much more:
[1432741201.662290] [016.1] [pid=21873] Host is not flapping (0.00% state change).
[1432741201.662304] [12288.1] [pid=21873] ## Polling 811ms; sockets=6; events=388; iobs=0x90d110
[1432741202.015112] [064.1] [pid=21873] Making callbacks (type 2)...
[1432741202.015150] [001.0] [pid=21873] event_execution_loop() end
[1432741202.015240] [064.1] [pid=21873] Making callbacks (type 0)...
[1432741202.015254] [064.1] [pid=21873] Making callbacks (type 0)...
[1432741202.015262] [064.1] [pid=21873] Making callbacks (type 19)...
[1432741202.015270] [001.0] [pid=21873] xrddefault_save_state_information()
[1432741202.110019] [064.1] [pid=21873] Making callbacks (type 19)...
[1432741202.111746] [064.1] [pid=21873] Making callbacks (type 2)...
[1432741202.111762] [064.0] [pid=21873] Attempting to unload module 'NERD': flags=1, reason=1
[1432741202.111780] [064.0] [pid=21873] Module 'NERD' unloaded successfully.
[1432741202.112611] [001.0] [pid=21873] clear_volatile_macros_r()
Any thoughts?
Caught SIGTERM, shutting down
Re: Caught SIGTERM, shutting down
Do you have any brokers configured? (like mod_gearman, livestatus, ndo, etc)
I ask because most brokers for 3.x will not work for 4.x. You will need to rebuild them for core 4.x.
grep broker /usr/local/nagios/etc/nagios.cfg
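The plain grep above also matches commented-out lines and the event_broker_options setting. A narrower variant, as a rough sketch (the function name active_brokers is made up for illustration, and the path in the usage line is the same one assumed above), lists only uncommented broker_module lines:

```shell
# Print only active (uncommented) broker_module lines from a Nagios main config.
# Usage: active_brokers /path/to/nagios.cfg
active_brokers() {
    # Anchoring at the start of the line skips lines that begin with '#'.
    grep -E '^[[:space:]]*broker_module[[:space:]]*=' "$1"
}
```

Running active_brokers /usr/local/nagios/etc/nagios.cfg and getting no output would suggest no broker modules are being loaded from that file.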
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Caught SIGTERM, shutting down
No brokers. I think I figured it out: the init.d script was faulty. It didn't match our nagios.cfg file (pid location, directory structure for retention, cmd, etc.), and therefore it wasn't properly stopping the server. I found a link that specified changing the "kill" for a stop to a "killall nagios". Some combination of that, I think, led to an incorrect startup and who knows what else. I just went through the definitions and corrected the init.d script, and it now starts/stops/restarts successfully. After crashing every time after about 5 minutes, it has now been running for almost 2 hours. I'll update the post if it dies again.
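For anyone hitting the same thing, here is a rough sketch of the kind of cross-check described above. It reads lock_file, command_file and state_retention_file from nagios.cfg and looks for each path as a literal string in the init script. The function names cfg_value and check_paths are made up for illustration, and the paths in the usage line are assumptions for a typical install. An init script that builds its paths from variables will show up as a mismatch even when it is correct, so treat the output as a hint only.

```shell
# Print the value of a "directive=value" line from a Nagios-style config file.
# Usage: cfg_value <directive> <config-file>
cfg_value() {
    # Strip everything through '=' and keep the last occurrence if repeated.
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | tail -n 1
}

# Report whether each path from nagios.cfg appears literally in the init script.
# Usage: check_paths <nagios.cfg> <init-script>
check_paths() {
    for directive in lock_file command_file state_retention_file; do
        value=$(cfg_value "$directive" "$1")
        if [ -z "$value" ]; then
            echo "UNSET    $directive"
        elif grep -qF -- "$value" "$2"; then
            echo "OK       $directive=$value"
        else
            echo "MISMATCH $directive=$value"
        fi
    done
}
```

Example call, with both paths assumed: check_paths /usr/local/nagios/etc/nagios.cfg /etc/init.d/nagios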
Re: Caught SIGTERM, shutting down
Thanks for the update.