Unable to start Nagios services

Amit_Alone
Posts: 89
Joined: Fri May 08, 2020 11:47 am

Post by Amit_Alone »

I have gone through many forum threads on starting the Nagios service, but even after following all the steps, the Nagios service still does not start.

Below is the output I observed while starting Nagios.

Code: Select all

[root@VORINFNAGIOST01 system]# service nagios restart
Redirecting to /bin/systemctl restart nagios.service
Job for nagios.service failed because the control process exited with error code. See "systemctl status nagios.service" and "journalctl -xe" for details.
[root@VORINFNAGIOST01 system]# journalctl -xe
Feb 08 12:36:01 VORINFNAGIOST01 sudo[8028]: pam_unix(sudo:session): session closed for user root
Feb 08 12:36:01 VORINFNAGIOST01 sudo[8050]:   nagios : TTY=unknown ; PWD=/home/nagios ; USER=root ; COMMAND=/usr/local/nagiosxi/scripts/manage_services.sh st
Feb 08 12:36:01 VORINFNAGIOST01 sudo[8050]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 08 12:36:01 VORINFNAGIOST01 sudo[8050]: pam_unix(sudo:session): session closed for user root
Feb 08 12:36:04 VORINFNAGIOST01 sssd[ldap_child[8144]][8144]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Preauthentication fail
Feb 08 12:36:07 VORINFNAGIOST01 polkitd[8122]: Registered Authentication Agent for unix-process:8146:43583090 (system bus name :1.9123 [/usr/bin/pkttyagent -
Feb 08 12:36:07 VORINFNAGIOST01 systemd[1]: Starting Nagios Core 4.4.6...
-- Subject: Unit nagios.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit nagios.service has begun starting up.
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: Nagios Core 4.4.6
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: Copyright (c) 1999-2009 Ethan Galstad
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: Last Modified: 2020-04-28
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: License: GPL
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: Website: https://www.nagios.org
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: Reading configuration data...
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: Read main config file okay...
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: Error: Template 'xiwizard_websitefis_host' specified in host definition could not be not found (config file '/u
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: Error: Invalid max_check_attempts value for host 'localhost'
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: Error: Could not register host (config file '/usr/local/nagios/etc/hosts/localhost.cfg', starting on line 16)
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: Error processing object config files!
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: ***> One or more problems was encountered while processing the config files...
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: Check your configuration file(s) to ensure that they contain valid
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: directives and data definitions.  If you are upgrading from a previous
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: version of Nagios, you should be aware that some variables/definitions
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: may have been removed or modified in this version.  Make sure to read
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: the HTML documentation regarding the config files, as well as the
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: 'Whats New' section to find out what has changed.
Feb 08 12:36:07 VORINFNAGIOST01 systemd[1]: nagios.service: control process exited, code=exited status=1
Feb 08 12:36:07 VORINFNAGIOST01 systemd[1]: Failed to start Nagios Core 4.4.6.
-- Subject: Unit nagios.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit nagios.service has failed.
--
-- The result is failed.
Feb 08 12:36:07 VORINFNAGIOST01 systemd[1]: Unit nagios.service entered failed state.
Feb 08 12:36:07 VORINFNAGIOST01 systemd[1]: nagios.service failed.
Feb 08 12:36:07 VORINFNAGIOST01 polkitd[8122]: Unregistered Authentication Agent for unix-process:8146:43583090 (system bus name :1.9123, object path /org/fr

Code: Select all

[root@VORINFNAGIOST01 system]# cat nagios.service
[Unit]
Description=Nagios Core 4.4.6
Documentation=https://www.nagios.org/documentation
After=network.target local-fs.target mariadb.service

[Service]
Type=forking
User=nagios
Group=nagios
PIDFile=/var/run/nagios.pid
ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
ExecStop=/usr/bin/kill -s TERM ${MAINPID}
ExecStopPost=/usr/bin/rm -f /usr/local/nagios/var/rw/nagios.cmd
ExecReload=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
ExecReload=/usr/bin/kill -s HUP ${MAINPID}

[Install]
WantedBy=multi-user.target
[root@VORINFNAGIOST01 system]#

Code: Select all

[root@VORINFNAGIOST01 system]# cat /usr/local/nagios/etc/nagios.cfg | grep lock
lock_file=/var/run/nagios.lock
[root@VORINFNAGIOST01 system]#
Please help me get the Nagios service started.
vtrac
Posts: 903
Joined: Tue Oct 27, 2020 1:35 pm

Re: Unable to start Nagios services

Post by vtrac »

Hi Amit_Alone,
It looks like your Nagios configuration has a host or service definition with an invalid value.

Code: Select all

Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: Error: Template 'xiwizard_websitefis_host' specified in host definition could not be not found (config file '/u
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: Error: Invalid max_check_attempts value for host 'localhost'
Feb 08 12:36:07 VORINFNAGIOST01 nagios[8167]: Error: Could not register host (config file '/usr/local/nagios/etc/hosts/localhost.cfg', starting on line 16)
Based on the errors above, it looks like host "localhost" has an invalid "max_check_attempts" value.
Please use "Core Config Manager" to fix this by entering a valid "max_check_attempts" value for "localhost":
"Nagios XI" GUI > Configure > Core Config Manager > Hosts > click "localhost" > "Check Settings"

Please run the command below at the command prompt on your Nagios XI server:

Code: Select all

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
The above command will give you more clues about what is happening. Fix ALL the issues reported in the output until it is clean.
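When the verification output is long, it can help to show only the lines that report a problem. The following is a minimal sketch, not part of Nagios XI: the function name `nagios_verify_problems` is made up for illustration, and it assumes problem lines start with `Error:` or `Warning:`, as they do in the output quoted in this thread.

```shell
# Hypothetical helper (not shipped with Nagios): keep only the lines of
# `nagios -v` output that report a problem. Reads the output on stdin.
# Intended use (assumes the paths quoted in this thread):
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | nagios_verify_problems
nagios_verify_problems() {
    # "|| true" keeps the exit status at 0 when no problem lines are found.
    grep -E '^[[:space:]]*(Error|Warning):' || true
}
```

An empty result suggests there is nothing left to fix, but the full output of the verification command remains the authority.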

Also, please check the database log "/var/log/mariadb/mariadb.log" and see whether there have been any database crashes.
If so, please run the script below to repair the database:

Code: Select all

/usr/local/nagiosxi/scripts/repair_databases.sh
Now, restart Nagios:

Code: Select all

systemctl restart nagios
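The database log check mentioned above can be sketched as follows. This is only an illustration: `crashed_tables` is a hypothetical helper name, and it assumes the log reports crashes with the `Table '...' is marked as crashed` wording.

```shell
# Hypothetical helper: list the tables that a MariaDB error log reports as crashed.
# Argument: path to the log file, e.g. /var/log/mariadb/mariadb.log
crashed_tables() {
    grep -F 'is marked as crashed' "$1" \
        | sed -E "s/.*Table '([^']+)' is marked as crashed.*/\1/" \
        | sort -u
}
```

If this prints nothing, the log contains no crashed-table messages in that wording. Old entries stay in the log, so a table listed here may already have been repaired.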

Regards,
Vinh
Amit_Alone
Posts: 89
Joined: Fri May 08, 2020 11:47 am

Re: Unable to start Nagios services

Post by Amit_Alone »

Thanks for the reply, Vinh. I have corrected max_check_attempts for localhost, but when I then try to restart Nagios from the GUI, I see the error below.

Code: Select all

Error: Template 'xiwizard_websitefis_host' specified in host definition could not be not found (config file '/usr/local/nagios/etc/hosts/1832.PTACONNECT.COM.cfg', starting on line 16)
I have removed the host 1832.PTACONNECT.COM from monitoring, but the same error is still shown. I also tried adding a define host block for xiwizard_websitefis_host to the hosttemplate.cfg file, but every time it gets wiped out and the above error shows up in the GUI again.

Code: Select all

[root@VORINFNAGIOST01 etc]# service nagios restart
Redirecting to /bin/systemctl restart nagios.service
Job for nagios.service failed because the control process exited with error code. See "systemctl status nagios.service" and "journalctl -xe" for details.
[root@VORINFNAGIOST01 etc]# journalctl -xe
--
-- The start-up result is done.
Feb 09 09:13:01 VORINFNAGIOST01 systemd[1]: Started Session 18315 of user nagios.
-- Subject: Unit session-18315.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit session-18315.scope has finished starting up.
--
-- The start-up result is done.
Feb 09 09:13:01 VORINFNAGIOST01 systemd[1]: Started Session 18316 of user nagios.
-- Subject: Unit session-18316.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit session-18316.scope has finished starting up.
--
-- The start-up result is done.
Feb 09 09:13:01 VORINFNAGIOST01 systemd[1]: Started Session 18317 of user nagios.
-- Subject: Unit session-18317.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit session-18317.scope has finished starting up.
--
-- The start-up result is done.
Feb 09 09:13:01 VORINFNAGIOST01 CROND[12346]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/snmptt_service_results.php >> /usr/local/nagiosxi/var/sn
Feb 09 09:13:01 VORINFNAGIOST01 CROND[12348]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php >> /usr/local/nagiosxi/var/perfdataproc
Feb 09 09:13:01 VORINFNAGIOST01 CROND[12347]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/deadpool.php >> /usr/local/nagiosxi/var/deadpool.log 2>&
Feb 09 09:13:01 VORINFNAGIOST01 CROND[12349]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php >> /usr/local/nagiosxi/var/reportengine
Feb 09 09:13:01 VORINFNAGIOST01 CROND[12350]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php >> /usr/local/nagiosxi/var/cmdsubsys.log 2
Feb 09 09:13:01 VORINFNAGIOST01 CROND[12352]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php >> /usr/local/nagiosxi/var/eventman.log 2>&
Feb 09 09:13:01 VORINFNAGIOST01 CROND[12353]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php >> /usr/local/nagiosxi/var/cleaner.log 2>&1)
Feb 09 09:13:01 VORINFNAGIOST01 CROND[12356]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php >> /usr/local/nagiosxi/var/sysstat.log 2>&1)
Feb 09 09:13:01 VORINFNAGIOST01 CROND[12351]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/event_handler.php >> /usr/local/nagiosxi/var/event_handl
Feb 09 09:13:01 VORINFNAGIOST01 CROND[12357]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php >> /usr/local/nagiosxi/var/nom.log 2>&1)
Feb 09 09:13:01 VORINFNAGIOST01 CROND[12359]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php >> /usr/local/nagiosxi/var/feedproc.log 2>&
Feb 09 09:13:02 VORINFNAGIOST01 sudo[12392]:   nagios : TTY=unknown ; PWD=/home/nagios ; USER=root ; COMMAND=/usr/local/nagiosxi/scripts/manage_services.sh s
Feb 09 09:13:02 VORINFNAGIOST01 sudo[12392]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 09 09:13:02 VORINFNAGIOST01 sudo[12392]: pam_unix(sudo:session): session closed for user root
Feb 09 09:13:02 VORINFNAGIOST01 sudo[12420]:   nagios : TTY=unknown ; PWD=/home/nagios ; USER=root ; COMMAND=/usr/local/nagiosxi/scripts/manage_services.sh s
Feb 09 09:13:02 VORINFNAGIOST01 sudo[12420]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 09 09:13:02 VORINFNAGIOST01 sudo[12420]: pam_unix(sudo:session): session closed for user root

[root@VORINFNAGIOST01 etc]#
[root@VORINFNAGIOST01 etc]# systemctl status nagios.service
● nagios.service - Nagios Core 4.4.6
   Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2021-02-09 09:13:40 EST; 11s ago
     Docs: https://www.nagios.org/documentation
  Process: 12550 ExecStopPost=/usr/bin/rm -f /usr/local/nagios/var/rw/nagios.cmd (code=exited, status=0/SUCCESS)
  Process: 12547 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=254)
  Process: 12545 ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)

Feb 09 09:13:40 VORINFNAGIOST01 systemd[1]: Starting Nagios Core 4.4.6...
Feb 09 09:13:40 VORINFNAGIOST01 systemd[1]: nagios.service: control process exited, code=exited status=254
Feb 09 09:13:40 VORINFNAGIOST01 systemd[1]: Failed to start Nagios Core 4.4.6.
Feb 09 09:13:40 VORINFNAGIOST01 systemd[1]: Unit nagios.service entered failed state.
Feb 09 09:13:40 VORINFNAGIOST01 systemd[1]: nagios.service failed.
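`journalctl -xe` shows the tail of the whole journal, so unrelated cron and sudo entries can crowd out the Nagios messages. On a live system, `journalctl -u nagios.service` limits the output to that unit. For journal output that has already been pasted somewhere, a filter along these lines may help. `nagios_journal_lines` is a hypothetical name, and the pattern is an assumption based on the log lines in this thread.

```shell
# Hypothetical helper: from pasted journal output on stdin, keep lines logged by
# the nagios process, plus systemd lines that mention the nagios unit.
nagios_journal_lines() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E ' nagios\[[0-9]+\]: | systemd\[1\]: .*(nagios\.service|Nagios Core)' || true
}
```

The filter misses lines from helper processes started by the unit (for example `kill`), which `journalctl -u nagios.service` would include.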
vtrac
Posts: 903
Joined: Tue Oct 27, 2020 1:35 pm

Re: Unable to start Nagios services

Post by vtrac »

Hi Amit_Alone,
Please run the command below and see what the output says.
If there are errors, please use CCM to correct them. Do not edit the files directly.

Code: Select all

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
To fix the issues reported by the above command, please use CCM:
Nagios XI GUI > Configure > Core Config Manager > use "Hosts" or "Services" depending on your error output above.

Regards,
Vinh
Amit_Alone
Posts: 89
Joined: Fri May 08, 2020 11:47 am

Re: Unable to start Nagios services

Post by Amit_Alone »

Hi Vinh,

As suggested, I used the GUI to correct the errors that were displayed when starting the Nagios service. After some troubleshooting, the errors no longer appear when starting the Nagios service, but now the Monitoring Engine status does not show as active.

I have attached a screenshot of this. The pre-flight check command also reports no errors or warnings.

Code: Select all

[root@VORINFNAGIOST01 ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.4.6
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2020-04-28
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
        Checked 29 services.
        Checked 4 hosts.
        Checked 7 host groups.
        Checked 0 service groups.
        Checked 9 contacts.
        Checked 2 contact groups.
        Checked 138 commands.
        Checked 14 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 4 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 14 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
[root@VORINFNAGIOST01 ~]#
Below is the journalctl output displayed after also restarting from the command line.

Code: Select all

[root@VORINFNAGIOST01 ~]# service nagios restart
Redirecting to /bin/systemctl restart nagios.service
[root@VORINFNAGIOST01 ~]# journalctl -xe
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1409]: Checked 4 hosts
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1409]: Checked 0 service dependencies
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1409]: Checked 0 host dependencies
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1409]: Checked 14 timeperiods
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1409]: Checking global event handlers...
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1409]: Checking obsessive compulsive processor commands...
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1409]: Checking misc settings...
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1409]: Total Warnings: 0
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1409]: Total Errors:   0
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1409]: Things look okay - No serious problems were detected during the pre-flight check
Feb 10 12:31:28 VORINFNAGIOST01 systemd[1]: Started Nagios Core 4.4.6.
-- Subject: Unit nagios.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit nagios.service has finished starting up.
--
-- The start-up result is done.
Feb 10 12:31:28 VORINFNAGIOST01 polkitd[8122]: Unregistered Authentication Agent for unix-process:1393:60835207 (system bus name :1.92861, object path /org/f
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: Nagios 4.4.6 starting... (PID=1414)
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: Local time is Wed Feb 10 12:31:28 EST 2021
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: LOG VERSION: 2.0
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: qh: core query handler registered
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: qh: echo service query handler registered
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: qh: help for the query handler registered
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: wproc: Successfully registered manager as @wproc with query handler
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: wproc: Registry request: name=Core Worker 1416;pid=1416
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: wproc: Registry request: name=Core Worker 1417;pid=1417
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: wproc: Registry request: name=Core Worker 1421;pid=1421
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: wproc: Registry request: name=Core Worker 1420;pid=1420
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: wproc: Registry request: name=Core Worker 1418;pid=1418
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: wproc: Registry request: name=Core Worker 1423;pid=1423
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: wproc: Registry request: name=Core Worker 1424;pid=1424
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: wproc: Registry request: name=Core Worker 1422;pid=1422
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: wproc: Registry request: name=Core Worker 1419;pid=1419
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: wproc: Registry request: name=Core Worker 1426;pid=1426
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: wproc: Registry request: name=Core Worker 1427;pid=1427
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: wproc: Registry request: name=Core Worker 1425;pid=1425
Feb 10 12:31:28 VORINFNAGIOST01 nagios[1414]: ndomod: NDOMOD 2.1.3 (2017-04-13) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Feb 10 12:31:28 VORINFNAGIOST01 kill[1415]: kill: cannot find process ""
Feb 10 12:31:28 VORINFNAGIOST01 systemd[1]: nagios.service: control process exited, code=exited status=1
Feb 10 12:31:30 VORINFNAGIOST01 sssd[ldap_child[1429]][1429]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Preauthentication fail

Below is the nagios.log I captured while starting Nagios from the GUI.

Code: Select all

[1612978925] Nagios 4.4.6 starting... (PID=3978)
[1612978925] Local time is Wed Feb 10 12:42:05 EST 2021
[1612978925] LOG VERSION: 2.0
[1612978925] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1612978925] qh: core query handler registered
[1612978925] qh: echo service query handler registered
[1612978925] qh: help for the query handler registered
[1612978925] wproc: Successfully registered manager as @wproc with query handler
[1612978925] wproc: Registry request: name=Core Worker 3980;pid=3980
[1612978925] wproc: Registry request: name=Core Worker 3981;pid=3981
[1612978925] wproc: Registry request: name=Core Worker 3982;pid=3982
[1612978925] wproc: Registry request: name=Core Worker 3983;pid=3983
[1612978925] wproc: Registry request: name=Core Worker 3984;pid=3984
[1612978925] wproc: Registry request: name=Core Worker 3985;pid=3985
[1612978925] wproc: Registry request: name=Core Worker 3986;pid=3986
[1612978925] wproc: Registry request: name=Core Worker 3987;pid=3987
[1612978925] wproc: Registry request: name=Core Worker 3988;pid=3988
[1612978925] wproc: Registry request: name=Core Worker 3989;pid=3989
[1612978925] wproc: Registry request: name=Core Worker 3990;pid=3990
[1612978925] wproc: Registry request: name=Core Worker 3991;pid=3991
[1612978925] Caught SIGTERM, shutting down...
[1612978925] ndomod: NDOMOD 2.1.3 (2017-04-13) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1612978925] ndomod: Could not open data sink!  I'll keep trying, but some output may get lost...
[1612978925] ndomod registered for process data
[1612978925] ndomod registered for log data'
[1612978925] ndomod registered for system command data'
[1612978925] ndomod registered for event handler data'
[1612978925] ndomod registered for notification data'
[1612978925] ndomod registered for comment data'
[1612978925] ndomod registered for downtime data'
[1612978925] ndomod registered for flapping data'
[1612978925] ndomod registered for program status data'
[1612978925] ndomod registered for host status data'
[1612978925] ndomod registered for service status data'
[1612978925] ndomod registered for adaptive program data'
[1612978925] ndomod registered for adaptive host data'
[1612978925] ndomod registered for adaptive service data'
[1612978925] ndomod registered for external command data'
[1612978925] ndomod registered for aggregated status data'
[1612978925] ndomod registered for retention data'
[1612978925] ndomod registered for contact data'
[1612978925] ndomod registered for contact notification data'
[1612978925] ndomod registered for acknowledgement data'
[1612978925] ndomod registered for state change data'
[1612978925] ndomod registered for contact status data'
[1612978925] ndomod registered for adaptive contact data'
[1612978925] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1612978931] Successfully launched command file worker with pid 3994
[1612978931] Successfully shutdown... (PID=3978)
[1612978931] ndomod: Shutdown complete.
[1612978931] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
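The bracketed numbers at the start of each nagios.log line are Unix epoch seconds, which makes it hard to line them up with journal timestamps. The sketch below rewrites them as UTC date strings. It assumes GNU `date` (for `-d @<epoch>`), and `humanize_nagios_log` is a hypothetical name.

```shell
# Hypothetical helper: read nagios.log-style lines ("[<epoch>] message") on stdin
# and print them with the epoch replaced by a UTC timestamp.
# Lines that do not match the expected shape are passed through unchanged.
humanize_nagios_log() {
    while IFS= read -r line; do
        epoch=${line#\[}        # drop the leading "["
        epoch=${epoch%%\]*}     # keep everything before the first "]"
        case "$line" in
            \[*\]\ *)
                case "$epoch" in
                    ''|*[!0-9]*) printf '%s\n' "$line" ;;
                    *) printf '%s %s\n' \
                           "$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S')" \
                           "${line#*\] }" ;;
                esac
                ;;
            *) printf '%s\n' "$line" ;;
        esac
    done
}
```

For example, `humanize_nagios_log < /usr/local/nagios/var/nagios.log` (the log path is an assumption based on the other paths in this thread). The output is in UTC, so it is five hours ahead of the EST times shown elsewhere in the thread.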
vtrac
Posts: 903
Joined: Tue Oct 27, 2020 1:35 pm

Re: Unable to start Nagios services

Post by vtrac »

Hi,
Please make sure "localhost" is defined on both the "127.0.0.1" and "::1" lines in your "/etc/hosts".

cat /etc/hosts

Code: Select all

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 vt-nagiosxi-62
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6 vt-nagiosxi-62
Restart Nagios:

Code: Select all

systemctl restart nagios
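The "/etc/hosts" check described above can be sketched as a small script. This is an illustration only; `has_localhost_entries` is a hypothetical name, and the check looks for the exact name `localhost` on both loopback lines.

```shell
# Hypothetical helper: succeed only if both the IPv4 and the IPv6 loopback lines
# of a hosts file list the name "localhost".
# Argument: path to the hosts file (defaults to /etc/hosts).
has_localhost_entries() {
    hosts_file=${1:-/etc/hosts}
    for addr in 127.0.0.1 ::1; do
        awk -v addr="$addr" '
            { sub(/#.*/, "") }                      # ignore comments
            $1 == addr {
                for (i = 2; i <= NF; i++) if ($i == "localhost") found = 1
            }
            END { exit(found ? 0 : 1) }
        ' "$hosts_file" || return 1
    done
}
```

For example, `has_localhost_entries && echo "localhost entries look fine"`.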
If you are still having issues after restarting Nagios, please send me the "profile.zip".

To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and share it in a private message or upload it to the post/ticket.

Regards,
Vinh
Amit_Alone
Posts: 89
Joined: Fri May 08, 2020 11:47 am

Re: Unable to start Nagios services

Post by Amit_Alone »

As suggested, I have updated the /etc/hosts file, but the Nagios service still does not come up when I try to start it.

Code: Select all

[root@VORINFNAGIOST01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 vt-nagiosxi-62
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6 vt-nagiosxi-62
I will PM the profile as requested.
vtrac
Posts: 903
Joined: Tue Oct 27, 2020 1:35 pm

Re: Unable to start Nagios services

Post by vtrac »

Hi,
Many people think they have already changed the configuration of a "host" or "service" because they edited the config file directly with a command-line editor.

You cannot make changes by manually editing any of the cfg files, since they will be overwritten by the data coming from the database.

I looked at your profile.zip and noticed this error:

Code: Select all

Error: Template 'xiwizard_websitefis_host' specified in host definition could not be not found (config file '/usr/local/nagios/etc/hosts/1832.PTACONNECT.COM.cfg', starting on line 16)
Since you DO NOT have the template 'xiwizard_websitefis_host' defined any more, you must remove all hosts and services that use this template.

Disabling "1832.PTACONNECT.COM" is not enough; it must be removed or reconfigured to use a different template, since 'xiwizard_websitefis_host' is NOT available.

Please use CCM ("Core Config Manager") to remove host "1832.PTACONNECT.COM" with the steps below:

Code: Select all

Nagios XI GUI > Configure > Core Config Manager > Hosts
Now, click the red "X" next to "1832.PTACONNECT.COM" to delete that host.
Now click "Apply Configuration".

You must also remove the two services that belong to "1832.PTACONNECT.COM":

Nagios XI GUI > Configure > Core Config Manager > Services
Now, click the red "X" next to each "1832.PTACONNECT.COM" service to delete those two services.
Now click "Apply Configuration".
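To see which generated config files still mention the missing template before and after applying the configuration, a read-only search like the one below may help. It is a sketch under assumptions: `files_using_template` is a hypothetical name, and it relies on GNU `grep` options (`-r`, `--include`). It only lists files; any change should still be made through CCM as described above.

```shell
# Hypothetical helper: list .cfg files under a directory that mention a template
# name as a whole word.
# Arguments: template name, config directory (e.g. /usr/local/nagios/etc).
files_using_template() {
    grep -rlwF --include='*.cfg' -e "$1" "$2" | sort
}
```

For example, `files_using_template xiwizard_websitefis_host /usr/local/nagios/etc`. No output means no file under that directory references the template any more.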
Regards,
Vinh
vtrac
Posts: 903
Joined: Tue Oct 27, 2020 1:35 pm

Re: Unable to start Nagios services

Post by vtrac »

Hi,
One very important thing I also noticed is that your "/var" filesystem might be full, or was full at some point.
Something running on the system, such as a backup, might have used up all the disk space under "/var".

Here's the error I noticed in the database log:

Version: '5.5.65-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
200808 22:01:56 [Warning] IP address '10.252.0.254' could not be resolved: Name or service not known
201022 4:24:56 [Warning] mysqld: Disk is full writing './nagios/nagios_systemcommands.MYD' (Errcode: 28). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)
201022 4:24:56 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs
201022 5:20:02 [Warning] mysqld: Disk is full writing './nagios/nagios_servicestatus.TMD' (Errcode: 28). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)
201022 5:20:02 [Warning] mysqld: Retry in 60 secs. Message reprinte201022 10:58:56 [ERROR] mysqld: Table './nagios/nagios_logentries' is marked as crashed and should be repaired
201022 10:58:56 [Warning] Checking table: './nagios/nagios_logentries'
201022 10:58:56 [ERROR] mysqld: Table './nagios/nagios_statehistory' is marked as crashed and should be repaired
201022 10:58:56 [Warning] Checking table: './nagios/nagios_statehistory'
201022 11:00:02 InnoDB: Error: Write to file ./nagiosxi/#sql-3b9c_125.ibd failed at offset 0 229376.
InnoDB: 16384 bytes should have been written, only 4096 were written.
InnoDB: Operating system error number 28.
InnoDB: Check that your OS and file system support files of this size.
InnoDB: Check also that the disk is not full or a disk quota exceeded.
InnoDB: Error number 28 means 'No space left on device'.


Please talk to your system administrator(s) about getting more disk space for "/var".
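A quick way to check how full "/var" currently is can be sketched as below. `fs_use_percent` is a hypothetical helper that parses `df -P` output; it assumes the usual column layout, with the use percentage in the second-to-last column and the mount point in the last.

```shell
# Hypothetical helper: print the Use% value (without the "%" sign) for a mount point.
# Reads `df -P` output on stdin. Argument: mount point, e.g. /var
fs_use_percent() {
    awk -v mp="$1" 'NR > 1 && $NF == mp { sub(/%$/, "", $(NF-1)); print $(NF-1) }'
}
```

For example, `df -P | fs_use_percent /var` prints a number between 0 and 100. This shows only the current state; a disk that filled up earlier and was later cleaned will look healthy here even though the database log still contains the old errors.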


Regards,
Vinh
Amit_Alone
Posts: 89
Joined: Fri May 08, 2020 11:47 am

Re: Unable to start Nagios services

Post by Amit_Alone »

Hi Vinh,

The template 'xiwizard_websitefis_host' and the host '1832.PTACONNECT.COM' had already been removed, which is why I no longer see the highlighted error when starting the Nagios service. I mentioned that error only in my initial post.

Also, the highlighted logs are from October of last year. We currently have enough disk space for the system to come up.

Please refer to the disk space details below.

Code: Select all

[root@VORINFNAGIOST01 ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 7.8G     0  7.8G   0% /dev
tmpfs                    7.8G     0  7.8G   0% /dev/shm
tmpfs                    7.8G  505M  7.3G   7% /run
tmpfs                    7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/centos-root   40G  7.0G   31G  19% /
/dev/sda1                477M  266M  182M  60% /boot
/dev/mapper/centos-tmp   4.8G  451M  4.1G  10% /tmp
/dev/mapper/centos-home  4.8G   71M  4.5G   2% /home
/dev/mapper/centos-var   5.3G  2.3G  2.8G  45% /var
tmpfs                    1.6G     0  1.6G   0% /run/user/1002
tmpfs                    1.6G     0  1.6G   0% /run/user/1000

Code: Select all

[root@VORINFNAGIOST01 system]# cat nagios.service
[Unit]
Description=Nagios Core 4.4.6
Documentation=https://www.nagios.org/documentation
After=network.target local-fs.target mariadb.service

[Service]
ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
ExecStop=/usr/bin/kill -s TERM ${MAINPID}
ExecStopPost=/usr/bin/rm -f /usr/local/nagios/var/rw/nagios.cmd
ExecReload=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
ExecReload=/usr/bin/kill -s HUP ${MAINPID}

[Install]
WantedBy=multi-user.target
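Compared with the nagios.service listing in the first post, this listing has no `Type=forking`, `User=nagios`, `Group=nagios`, or `PIDFile=` lines in its `[Service]` section. Whether that is related to the `kill: cannot find process ""` message is not confirmed anywhere in this thread. It may still be worth checking, because systemd treats a service without `Type=` as `Type=simple`, while `nagios -d` starts Nagios in daemon mode. The sketch below only reports which directives are absent; `missing_unit_directives` is a hypothetical name.

```shell
# Hypothetical helper: print the directive names that do not appear in a unit file.
# Arguments: unit file path, then one or more directive names.
missing_unit_directives() {
    unit_file=$1
    shift
    for key in "$@"; do
        grep -Eq "^[[:space:]]*${key}=" "$unit_file" || printf '%s\n' "$key"
    done
}
```

For example, `missing_unit_directives /usr/lib/systemd/system/nagios.service Type User Group PIDFile`. If the unit file is changed, systemd needs a `systemctl daemon-reload` before the change takes effect.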
Below are today's database logs, which show no disk-full errors.

Code: Select all

210209  4:35:30 [Note] /usr/libexec/mysqld: Normal shutdown
210209  4:35:30 [Note] Event Scheduler: Purging the queue. 0 events
210209  4:35:30  InnoDB: Starting shutdown...
210209  4:35:31  InnoDB: Waiting for 31 pages to be flushed
210209  4:35:34  InnoDB: Shutdown completed; log sequence number 6461579419
210209  4:35:34 [Note] /usr/libexec/mysqld: Shutdown complete

210209 04:35:35 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended
210209 04:35:35 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
210209  4:35:35 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 4289 ...
210209  4:35:35 InnoDB: The InnoDB memory heap is disabled
210209  4:35:35 InnoDB: Mutexes and rw_locks use GCC atomic builtins
210209  4:35:35 InnoDB: Compressed tables use zlib 1.2.7
210209  4:35:35 InnoDB: Using Linux native AIO
210209  4:35:35 InnoDB: Initializing buffer pool, size = 128.0M
210209  4:35:35 InnoDB: Completed initialization of buffer pool
210209  4:35:35 InnoDB: highest supported file format is Barracuda.
210209  4:35:35  InnoDB: Waiting for the background threads to start
210209  4:35:36 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 6461579419
210209  4:35:36 [Note] Plugin 'FEEDBACK' is disabled.
210209  4:35:36 [Note] Server socket created on IP: '0.0.0.0'.
210209  4:35:36 [Note] Event Scheduler: Loaded 0 events
210209  4:35:36 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.68-MariaDB'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server