Nagios service stops after a few seconds

Posted: Tue Jan 14, 2020 8:34 am
by IPOInS
Good Morning,

We have been experiencing an issue where "nagios.service" will fail a few seconds after being started.

"systemctl status nagios.service" shows:

Code:

● nagios.service - Nagios Core 4.4.2
   Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2020-01-14 13:11:16 GMT; 3s ago
     Docs: https://www.nagios.org/documentation
  Process: 11524 ExecStopPost=/usr/bin/rm -f /usr/local/nagios/var/rw/nagios.cmd (code=exited, status=0/SUCCESS)
  Process: 11521 ExecStop=/usr/bin/kill -s TERM ${MAINPID} (code=exited, status=1/FAILURE)
  Process: 11494 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
  Process: 11492 ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
 Main PID: 11496 (code=killed, signal=ABRT)

Jan 14 13:11:10 nag-man-pr-1.[DOMAIN] nagios[11496]: ndomod registered for state change data'
Jan 14 13:11:10 nag-man-pr-1.[DOMAIN] nagios[11496]: ndomod registered for contact status data'
Jan 14 13:11:10 nag-man-pr-1.[DOMAIN] nagios[11496]: ndomod registered for adaptive contact data'
Jan 14 13:11:10 nag-man-pr-1.[DOMAIN] nagios[11496]: Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
Jan 14 13:11:16 nag-man-pr-1.[DOMAIN] systemd[1]: nagios.service: main process exited, code=killed, status=6/ABRT
Jan 14 13:11:16 nag-man-pr-1.[DOMAIN] kill[11521]: kill: cannot find process ""
Jan 14 13:11:16 nag-man-pr-1.[DOMAIN] nagios[11519]: Caught SIGTERM, shutting down...
Jan 14 13:11:16 nag-man-pr-1.[DOMAIN] systemd[1]: nagios.service: control process exited, code=exited status=1
Jan 14 13:11:16 nag-man-pr-1.[DOMAIN] systemd[1]: Unit nagios.service entered failed state.
Jan 14 13:11:16 nag-man-pr-1.[DOMAIN] systemd[1]: nagios.service failed.
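
The line worth isolating in the journal output above is the one where systemd reports the main process as killed with status=6/ABRT. That means the daemon aborted on its own rather than being stopped cleanly. The later ExecStop failure appears to be a side effect, since ${MAINPID} was already empty by then. A minimal sketch for pulling that line out of a longer journal, assuming the CentOS 7 systemd message format shown above (the function name main_exit_reason is illustrative, not part of any tool):

```shell
# Reads journal text on stdin and prints "<code> <status>" for each
# "main process exited" line. Assumes the systemd message wording seen
# in the output above; other systemd versions may phrase it differently.
main_exit_reason() {
    sed -n 's/.*main process exited, code=\([a-z]*\), status=\([^ ]*\).*/\1 \2/p'
}

# Hypothetical usage on the affected host:
#   journalctl -u nagios.service --no-pager | main_exit_reason
```

An ABRT result usually points at the daemon or a loaded broker module calling abort(), so the system profile requested later in the thread is still needed to say more.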


/usr/local/nagios/var/nagios.log shows the following:

Code:

[1579007966] Nagios 4.2.4 starting... (PID=14301)
[1579007966] Local time is Tue Jan 14 13:19:26 GMT 2020
[1579007966] LOG VERSION: 2.0
[1579007966] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1579007966] qh: core query handler registered
[1579007966] nerd: Channel hostchecks registered successfully
[1579007966] nerd: Channel servicechecks registered successfully
[1579007966] nerd: Channel opathchecks registered successfully
[1579007966] nerd: Fully initialized and ready to rock!
[1579007966] wproc: Successfully registered manager as @wproc with query handler
[1579007966] wproc: Registry request: name=Core Worker 14303;pid=14303
[1579007966] wproc: Registry request: name=Core Worker 14304;pid=14304
[1579007966] wproc: Registry request: name=Core Worker 14302;pid=14302
[1579007966] wproc: Registry request: name=Core Worker 14305;pid=14305
[1579007966] mod_gearman: initialized version 2.1.1 (libgearman 0.33)
[1579007966] Event broker module '/usr/lib64/mod_gearman2/mod_gearman2.o' initialized successfully.
[1579007966] ndomod: NDOMOD 2.1.3 (2017-04-13) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1579007966] ndomod: Successfully connected to data sink.  0 queued items to flush.
[1579007966] ndomod registered for process data
[1579007966] ndomod registered for log data'
[1579007966] ndomod registered for system command data'
[1579007966] ndomod registered for event handler data'
[1579007966] ndomod registered for notification data'
[1579007966] ndomod registered for comment data'
[1579007966] ndomod registered for downtime data'
[1579007966] ndomod registered for flapping data'
[1579007966] ndomod registered for program status data'
[1579007966] ndomod registered for host status data'
[1579007966] ndomod registered for service status data'
[1579007966] ndomod registered for adaptive program data'
[1579007966] ndomod registered for adaptive host data'
[1579007966] ndomod registered for adaptive service data'
[1579007966] ndomod registered for external command data'
[1579007966] ndomod registered for aggregated status data'
[1579007966] ndomod registered for retention data'
[1579007966] ndomod registered for contact data'
[1579007966] ndomod registered for contact notification data'
[1579007966] ndomod registered for acknowledgement data'
[1579007966] ndomod registered for state change data'
[1579007966] ndomod registered for contact status data'
[1579007966] ndomod registered for adaptive contact data'
[1579007966] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1579007972] Successfully launched command file worker with pid 14319
[1579007972] Caught SIGTERM, shutting down...

Running a pre-flight check shows no config errors. The server is running the latest version (5.6.9) of Nagios XI on CentOS 7.
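
The bracketed numbers in nagios.log are Unix epoch seconds, which makes it awkward to line the log up against the journal timestamps. A minimal sketch for converting them, assuming GNU date as shipped on CentOS 7 (the function name nagios_log_humanize is illustrative only):

```shell
# Rewrites the leading [epoch] stamp of each nagios.log line as UTC
# wall-clock time. Lines without a leading [digits...] stamp pass through
# unchanged. Requires GNU date for the "-d @epoch" syntax.
nagios_log_humanize() {
    while IFS= read -r line; do
        case "$line" in
            \[[0-9]*\]\ *)
                epoch=${line#\[}        # drop the opening bracket
                epoch=${epoch%%\]*}     # keep only the digits before "]"
                rest=${line#*\] }       # message text after "] "
                printf '[%s] %s\n' \
                    "$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S')" "$rest"
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done
}

# Hypothetical usage on the affected host:
#   nagios_log_humanize < /usr/local/nagios/var/nagios.log | tail -n 20
```

With the log above, the final "Caught SIGTERM" entry (1579007972) comes six seconds after startup (1579007966), which matches the gap seen in the systemctl output.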

Please let me know if you need anything else.

Thanks!

Re: Nagios service stops after a few seconds

Posted: Tue Jan 14, 2020 12:20 pm
by mbellerue
Hi @IPOInS, welcome to the forums!

Can you send in a system profile? You should still be able to get that with the monitoring engine down. Just go to Admin -> System Profile -> Download Profile. You can PM that to me if you don't want to post it here.

Re: Nagios service stops after a few seconds

Posted: Tue Jan 14, 2020 12:26 pm
by benjaminsmith
Hello,

Welcome to the Nagios Support Forums! If the monitoring engine is not starting, I would recommend moving this over to a support ticket for faster resolution. To open a support ticket, go to:
https://support.nagios.com/tickets/

A few questions to help us troubleshoot the error:

1. When did this start happening, and did it coincide with any system changes (e.g. upgrades)?
2. Verify that you do not have multiple Nagios processes running.

Code:

ps -ef | head -1 && ps -ef | grep bin/nagios

3. Upload the nagios.cfg to the post or ticket.

Code:

cat /usr/local/nagios/etc/nagios.cfg

4. Post the unit file for the service. Thanks.

Code:

cat /usr/lib/systemd/system/nagios.service
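
For the duplicate-process check in step 2, the plain grep also matches the grep command itself, and a healthy Nagios Core 4 instance normally shows more than one line anyway (workers, plus a forked child of the main daemon). A rough sketch that counts only top-level daemons, assuming standard `ps -ef` columns and that a daemonized Nagios is reparented to PID 1 (this may not hold in containers; the function name is illustrative):

```shell
# Reads `ps -ef` output on stdin and counts "nagios -d" processes whose
# parent PID (third column) is 1. More than one suggests that several
# Nagios instances are running side by side.
count_top_level_nagios() {
    awk '$3 == 1 && $0 ~ /bin\/nagios -d/ { n++ } END { print n + 0 }'
}

# Hypothetical usage on the affected host:
#   ps -ef | count_top_level_nagios
```

A result of 1 is what a single running instance should produce under these assumptions; 0 means the daemon is down, as in the original report.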

Re: Nagios service stops after a few seconds

Posted: Wed Jan 15, 2020 7:55 am
by IPOInS
Thanks for the advice! I'll move this over to a support ticket.

Re: Nagios service stops after a few seconds

Posted: Wed Jan 15, 2020 10:07 am
by benjaminsmith
Hello,

"Thanks for the advice! I'll move this over to a support ticket."

Sounds good. Please reference this thread and provide the system profile and other logs when opening the ticket. Thanks.