Nagios service stops after a few seconds

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
IPOInS
Posts: 25
Joined: Tue Jan 14, 2020 6:08 am

Post by IPOInS »

Good Morning,

We have been experiencing an issue where "nagios.service" will fail a few seconds after being started.

"systemctl status nagios.service" shows:

Code:

● nagios.service - Nagios Core 4.4.2
   Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2020-01-14 13:11:16 GMT; 3s ago
     Docs: https://www.nagios.org/documentation
  Process: 11524 ExecStopPost=/usr/bin/rm -f /usr/local/nagios/var/rw/nagios.cmd (code=exited, status=0/SUCCESS)
  Process: 11521 ExecStop=/usr/bin/kill -s TERM ${MAINPID} (code=exited, status=1/FAILURE)
  Process: 11494 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
  Process: 11492 ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
 Main PID: 11496 (code=killed, signal=ABRT)

Jan 14 13:11:10 nag-man-pr-1.[DOMAIN] nagios[11496]: ndomod registered for state change data'
Jan 14 13:11:10 nag-man-pr-1.[DOMAIN] nagios[11496]: ndomod registered for contact status data'
Jan 14 13:11:10 nag-man-pr-1.[DOMAIN] nagios[11496]: ndomod registered for adaptive contact data'
Jan 14 13:11:10 nag-man-pr-1.[DOMAIN] nagios[11496]: Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
Jan 14 13:11:16 nag-man-pr-1.[DOMAIN] systemd[1]: nagios.service: main process exited, code=killed, status=6/ABRT
Jan 14 13:11:16 nag-man-pr-1.[DOMAIN] kill[11521]: kill: cannot find process ""
Jan 14 13:11:16 nag-man-pr-1.[DOMAIN] nagios[11519]: Caught SIGTERM, shutting down...
Jan 14 13:11:16 nag-man-pr-1.[DOMAIN] systemd[1]: nagios.service: control process exited, code=exited status=1
Jan 14 13:11:16 nag-man-pr-1.[DOMAIN] systemd[1]: Unit nagios.service entered failed state.
Jan 14 13:11:16 nag-man-pr-1.[DOMAIN] systemd[1]: nagios.service failed.


/usr/local/nagios/var/nagios.log shows the following:

Code:

[1579007966] Nagios 4.2.4 starting... (PID=14301)
[1579007966] Local time is Tue Jan 14 13:19:26 GMT 2020
[1579007966] LOG VERSION: 2.0
[1579007966] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1579007966] qh: core query handler registered
[1579007966] nerd: Channel hostchecks registered successfully
[1579007966] nerd: Channel servicechecks registered successfully
[1579007966] nerd: Channel opathchecks registered successfully
[1579007966] nerd: Fully initialized and ready to rock!
[1579007966] wproc: Successfully registered manager as @wproc with query handler
[1579007966] wproc: Registry request: name=Core Worker 14303;pid=14303
[1579007966] wproc: Registry request: name=Core Worker 14304;pid=14304
[1579007966] wproc: Registry request: name=Core Worker 14302;pid=14302
[1579007966] wproc: Registry request: name=Core Worker 14305;pid=14305
[1579007966] mod_gearman: initialized version 2.1.1 (libgearman 0.33)
[1579007966] Event broker module '/usr/lib64/mod_gearman2/mod_gearman2.o' initialized successfully.
[1579007966] ndomod: NDOMOD 2.1.3 (2017-04-13) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1579007966] ndomod: Successfully connected to data sink.  0 queued items to flush.
[1579007966] ndomod registered for process data
[1579007966] ndomod registered for log data'
[1579007966] ndomod registered for system command data'
[1579007966] ndomod registered for event handler data'
[1579007966] ndomod registered for notification data'
[1579007966] ndomod registered for comment data'
[1579007966] ndomod registered for downtime data'
[1579007966] ndomod registered for flapping data'
[1579007966] ndomod registered for program status data'
[1579007966] ndomod registered for host status data'
[1579007966] ndomod registered for service status data'
[1579007966] ndomod registered for adaptive program data'
[1579007966] ndomod registered for adaptive host data'
[1579007966] ndomod registered for adaptive service data'
[1579007966] ndomod registered for external command data'
[1579007966] ndomod registered for aggregated status data'
[1579007966] ndomod registered for retention data'
[1579007966] ndomod registered for contact data'
[1579007966] ndomod registered for contact notification data'
[1579007966] ndomod registered for acknowledgement data'
[1579007966] ndomod registered for state change data'
[1579007966] ndomod registered for contact status data'
[1579007966] ndomod registered for adaptive contact data'
[1579007966] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1579007972] Successfully launched command file worker with pid 14319
[1579007972] Caught SIGTERM, shutting down...

Running a pre-flight check shows no config errors. The server is running the latest version of Nagios XI (5.6.9) on CentOS 7.
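
For anyone comparing the two outputs above: the bracketed values in nagios.log are Unix epoch seconds, while the systemd journal prints wall-clock times. The following is a minimal sketch, not part of Nagios itself, that rewrites the leading epoch of each log line as a UTC timestamp so the two can be lined up. The function name `humanize_nagios_log` is hypothetical, and the sketch assumes GNU `date` (for `-d @epoch`), as shipped with CentOS 7.

```shell
# Hypothetical helper: convert the leading "[epoch]" of each nagios.log line
# to a readable UTC timestamp. Assumes GNU date (the "-d @SECONDS" form).
humanize_nagios_log() {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            \[[0-9]*\]\ *)
                epoch=${line#"["}       # drop the leading "["
                epoch=${epoch%%"]"*}    # keep everything before the first "]"
                rest=${line#*"] "}      # message text after "] "
                case $epoch in
                    *[!0-9]*)
                        # Not purely numeric: pass the line through unchanged.
                        printf '%s\n' "$line" ;;
                    *)
                        printf '[%s] %s\n' \
                            "$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S UTC')" \
                            "$rest" ;;
                esac ;;
            *)
                # Lines without a leading [epoch] are passed through unchanged.
                printf '%s\n' "$line" ;;
        esac
    done
}
```

Example usage: `tail -n 50 /usr/local/nagios/var/nagios.log | humanize_nagios_log`. With the last log line above, `[1579007972]` becomes `[2020-01-14 13:19:32 UTC]`.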

Please let me know if you need anything else.

Thanks!
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: Nagios service stops after a few seconds

Post by mbellerue »

Hi @IPOInS, welcome to the forums!

Can you send in a system profile? You should still be able to get that with the monitoring engine down. Just go to Admin -> System Profile -> Download Profile. You can PM that to me if you don't want to post it here.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Nagios service stops after a few seconds

Post by benjaminsmith »

Hello,

Welcome to the Nagios Support Forums! If the monitoring engine is not starting, I would recommend moving this over to a support ticket for faster resolution. To open a support ticket, go to:
https://support.nagios.com/tickets/

A few questions to help us troubleshoot the error:

1. When did this start happening, and did it coincide with any system changes (e.g., upgrades)?
2. Verify that you do not have multiple Nagios processes running.

Code:

ps -ef | head -1 && ps -ef | grep bin/nagios

3. Upload the nagios.cfg to the post or ticket.

Code:

cat /usr/local/nagios/etc/nagios.cfg

4. Post the unit file for the service. Thanks.

Code:

cat /usr/lib/systemd/system/nagios.service
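
If it is easier to attach everything at once, the commands from items 2 to 4 can be wrapped in one function and redirected to a file. This is only a sketch: the function name `collect_nagios_diagnostics` is hypothetical, the default paths are the ones used in this thread, and the `[b]in/nagios` pattern is a small variation on the grep above that keeps the grep process itself out of the listing.

```shell
# Hypothetical helper: gather the process list, nagios.cfg and the unit file
# in one output. Both paths default to the locations mentioned in this thread
# and can be overridden as arguments.
collect_nagios_diagnostics() {
    cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
    unit=${2:-/usr/lib/systemd/system/nagios.service}

    echo "== processes matching bin/nagios =="
    ps -ef | head -1
    # The bracket keeps grep from matching its own command line.
    ps -ef | grep '[b]in/nagios' || echo "(none running)"

    for f in "$cfg" "$unit"; do
        echo "== $f =="
        if [ -r "$f" ]; then
            cat "$f"
        else
            echo "(not readable)"
        fi
    done
}
```

Example usage: `collect_nagios_diagnostics > /tmp/nagios-diagnostics.txt`, then attach the resulting file to the post or ticket. Review the file first, since nagios.cfg may contain site-specific details.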
IPOInS
Posts: 25
Joined: Tue Jan 14, 2020 6:08 am

Re: Nagios service stops after a few seconds

Post by IPOInS »

Thanks for the advice! I'll move this over to a support ticket.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Nagios service stops after a few seconds

Post by benjaminsmith »

Hello,

"Thanks for the advice! I'll move this over to a support ticket."

Sounds good. Please reference this thread and provide the system profile and other logs when opening the ticket. Thanks.