wneville
Posts: 65 Joined: Wed Mar 31, 2021 3:35 pm
by wneville » Thu Mar 07, 2024 3:15 pm
Hello,
I have a Nagios Core installation that is accepting passive checks from a few remote servers. I am now attempting to create a service which will monitor the nagios process on the localhost. Here is my service definition in /usr/local/nagios/etc/objects/localhost.cfg:
Code:
    define service {
        host_name               localhost
        service_description     nagios_service_status
        check_command           check_local_procs!1:!nagios!
        max_check_attempts      2
        check_interval          5
        retry_interval          5
        check_period            24x7
        active_checks_enabled   1
        notification_interval   60
        notification_options    w,c,r
        contact_groups          moogsoftadmins
        notifications_enabled   1
        notification_period     24x7
    }
Here is the command being called by the service:
Code:
    define command {
        command_name    check_local_procs
        command_line    $USER1$/check_procs -c $ARG1$ -C $ARG2$ $ARG3$
    }
When we run systemctl restart nagios.service, the service titled 'nagios_service_status' never shows up. Can anyone help me troubleshoot why this might be happening (or not happening)?
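When a new definition never appears, one thing worth checking is whether it reached the object cache file that Nagios writes on startup (/usr/local/nagios/var/objects.cache on this install, going by the log further down the thread). A minimal sketch, assuming a POSIX shell with grep; the function name service_in_cache is made up for illustration, and NAME is treated as a regular expression, so it only suits plain names like the one here:

```shell
# Succeed if CACHE_FILE contains a service whose description is exactly NAME.
# Usage: service_in_cache CACHE_FILE NAME
service_in_cache() {
    cache="$1"
    name="$2"
    grep -qE "^[[:space:]]*service_description[[:space:]]+${name}[[:space:]]*$" "$cache"
}

# Example call with the path and name from this thread:
#   service_in_cache /usr/local/nagios/var/objects.cache nagios_service_status \
#       && echo "present in cache" || echo "not in cache"
```

If the service is defined in localhost.cfg but missing from the cache, the cache file itself is a likely place to look next.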
danderson
Posts: 127 Joined: Wed Aug 09, 2023 10:05 am
by danderson » Fri Mar 08, 2024 11:56 am
Thanks for reaching out, @wneville.
Are there any errors in the output of journalctl -xeu nagios.service when you restart nagios?
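The restart output can be long, and a single warning is easy to miss among the pre-flight lines. A minimal sketch of a filter, assuming a POSIX shell with grep; the function name nagios_problems is made up for illustration:

```shell
# Read log lines on stdin and keep only those that mention a warning or an
# error, skipping the pre-flight summary counters when they are zero
# ("Total Warnings: 0", "Total Errors: 0").
nagios_problems() {
    grep -iE 'warning|error' | grep -vE 'Total (Warnings|Errors): 0'
}

# Example call:
#   journalctl -u nagios.service --no-pager | nagios_problems
```

The function exits non-zero when nothing matches, so it can also be used in a shell conditional.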
by wneville » Mon Mar 11, 2024 12:51 pm
Thanks so much for your help. Here is the output:
Code:
Mar 11 13:45:31 [servername_redacted] nagios[4747]: PASSIVE SERVICE CHECK: [remoteserver1];Swap Usage;0;OK: Swap usage was 0.00 % (Used: 0.0
Mar 11 13:45:31 [servername_redacted] nagios[4747]: PASSIVE SERVICE CHECK: [remoteserver1];Memory Usage;0;OK: Memory usage was 57.40 % (Avai
Mar 11 13:45:31 [servername_redacted] nagios[4747]: PASSIVE SERVICE CHECK: [remoteserver1];Process Count;0;OK: Process count was 115
Mar 11 13:45:51 [servername_redacted] nagios[4747]: PASSIVE SERVICE CHECK: [remoteserver2];Disk Usage;0;OK: Free was 178.90 GiB
Mar 11 13:45:51 [servername_redacted] nagios[4747]: PASSIVE SERVICE CHECK: [remoteserver2];CPU Usage;0;OK: Percent was 0.00 %
Mar 11 13:45:51 [servername_redacted] nagios[4747]: PASSIVE SERVICE CHECK: [remoteserver2];Swap Usage;0;OK: Swap usage was 0.00 % (Used: 0
Mar 11 13:45:51 [servername_redacted] nagios[4747]: PASSIVE SERVICE CHECK: [remoteserver2];Memory Usage;0;OK: Memory usage was 24.40 % (Av
Mar 11 13:45:51 [servername_redacted] nagios[4747]: PASSIVE SERVICE CHECK: [remoteserver2];Process Count;0;OK: Process count was 129
Mar 11 13:45:58 [servername_redacted] systemd[1]: Stopping Nagios Core 4.4.14...
-- Subject: Unit nagios.service has begun shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit nagios.service has begun shutting down.
Mar 11 13:45:58 [servername_redacted] nagios[4747]: Caught SIGTERM, shutting down...
Mar 11 13:45:58 [servername_redacted] nagios[4747]: Successfully shutdown... (PID=4747)
Mar 11 13:45:58 [servername_redacted] systemd[1]: Stopped Nagios Core 4.4.14.
-- Subject: Unit nagios.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit nagios.service has finished shutting down.
Mar 11 13:45:58 [servername_redacted] systemd[1]: Starting Nagios Core 4.4.14...
-- Subject: Unit nagios.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit nagios.service has begun starting up.
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Nagios Core 4.4.14
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Copyright (c) 1999-2009 Ethan Galstad
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Last Modified: 2023-08-01
Mar 11 13:45:58 [servername_redacted] nagios[29223]: License: GPL
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Website: https://www.nagios.org
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Reading configuration data...
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Read main config file okay...
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Read object config files okay...
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Running pre-flight check on configuration data...
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checking objects...
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checked 90 services.
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checked 17 hosts.
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checked 3 host groups.
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checked 0 service groups.
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checked 2 contacts.
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checked 2 contact groups.
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checked 29 commands.
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checked 5 time periods.
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checked 0 host escalations.
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checked 0 service escalations.
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checking for circular paths...
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checked 17 hosts
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checked 0 service dependencies
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checked 0 host dependencies
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checked 5 timeperiods
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checking global event handlers...
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checking obsessive compulsive processor commands...
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Checking misc settings...
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Total Warnings: 0
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Total Errors: 0
Mar 11 13:45:58 [servername_redacted] nagios[29223]: Things look okay - No serious problems were detected during the pre-flight check
Mar 11 13:45:58 [servername_redacted] nagios[29231]: Nagios 4.4.14 starting... (PID=29231)
Mar 11 13:45:58 [servername_redacted] systemd[1]: Started Nagios Core 4.4.14.
-- Subject: Unit nagios.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit nagios.service has finished starting up.
--
-- The start-up result is done.
Mar 11 13:45:58 [servername_redacted] nagios[29231]: Local time is Mon Mar 11 13:45:58 EDT 2024
Mar 11 13:45:58 [servername_redacted] nagios[29231]: LOG VERSION: 2.0
Mar 11 13:45:58 [servername_redacted] nagios[29231]: qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
Mar 11 13:45:58 [servername_redacted] nagios[29231]: qh: core query handler registered
Mar 11 13:45:58 [servername_redacted] nagios[29231]: qh: echo service query handler registered
Mar 11 13:45:58 [servername_redacted] nagios[29231]: qh: help for the query handler registered
Mar 11 13:45:58 [servername_redacted] nagios[29231]: wproc: Successfully registered manager as @wproc with query handler
Mar 11 13:45:58 [servername_redacted] nagios[29231]: wproc: Registry request: name=Core Worker 29232;pid=29232
Mar 11 13:45:58 [servername_redacted] nagios[29231]: wproc: Registry request: name=Core Worker 29234;pid=29234
Mar 11 13:45:58 [servername_redacted] nagios[29231]: wproc: Registry request: name=Core Worker 29235;pid=29235
Mar 11 13:45:58 [servername_redacted] nagios[29231]: wproc: Registry request: name=Core Worker 29233;pid=29233
Mar 11 13:45:58 [servername_redacted] nagios[29231]: Warning: Could not open object cache file '/usr/local/nagios/var/objects.cache' for writing!
Mar 11 13:45:58 [servername_redacted] nagios[29231]: Successfully launched command file worker with pid 29236
by wneville » Tue Mar 19, 2024 7:09 am
Good morning @danderson, any ideas here? There were no errors in the journalctl output, as noted in my last post.
lgute
Posts: 126 Joined: Mon Apr 06, 2020 2:49 pm
by lgute » Tue Mar 19, 2024 11:17 am
Hi @wneville, thanks for reaching out.
Core should have been installed with a check command called "check_local_procs", used like this:
Code:
    check_command check_local_procs!250!400!RSZDT

    define command {
        command_name    check_local_procs
        command_line    $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
    }
Unless you overwrote that original command, you probably need to change the name of your new command to something else, for example check_local_procs_service_status.
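To see whether a command name is already taken, the object config files can be searched for matching command_name lines. A minimal sketch, assuming a grep that supports -r (GNU and BSD grep both do); the function name find_command_defs is made up for illustration, and the directory in the example call is taken from the path in the first post:

```shell
# List every line under DIR that defines a command called exactly NAME,
# printed as file:line:text. NAME is treated as a regular expression.
# Usage: find_command_defs DIR NAME
find_command_defs() {
    dir="$1"
    name="$2"
    grep -rnE "^[[:space:]]*command_name[[:space:]]+${name}[[:space:]]*$" "$dir"
}

# Example call:
#   find_command_defs /usr/local/nagios/etc/objects check_local_procs
```

More than one hit for the same name suggests that one of the definitions needs renaming.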
Please let us know if you have any other questions or concerns.
-Laura
by wneville » Wed Mar 27, 2024 10:40 am
You can close this thread. The issue was that someone or something had edited the objects.cache file, which changed the file's owner to a user that didn't exist. After that, the application was trying to write the new objects to the objects.cache file but failing:
Code:
    Mar 11 13:45:58 [servername_redacted] nagios[29231]: Warning: Could not open object cache file '/usr/local/nagios/var/objects.cache' for writing!
Changing the objects.cache owner allowed the new services to be picked up when the nagios service was restarted.
Thank you so much for your time and effort, I really really appreciate it!
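For anyone hitting the same warning, the ownership check described above can be sketched as follows. This assumes a POSIX shell with ls and awk; the function name check_owner is made up for illustration, and the chown line assumes the daemon runs as user and group "nagios", which this thread does not confirm:

```shell
# Print the owner of FILE and flag an owner with no account behind it.
# `ls -ld` shows a numeric UID in the owner column when the UID has no
# user database entry, which is the situation described in this thread.
# Returns 1 if FILE is missing, 2 if the owner is a bare numeric UID.
check_owner() {
    file="$1"
    if [ ! -e "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    owner=$(ls -ld "$file" | awk '{print $3}')
    case "$owner" in
        *[!0-9]*)
            echo "owner: $owner"
            ;;
        *)
            echo "owner: $owner (numeric UID, no matching user account)"
            return 2
            ;;
    esac
}

# Example call with the path from the warning:
#   check_owner /usr/local/nagios/var/objects.cache
# Possible fix, ASSUMING the daemon user and group are both "nagios":
#   chown nagios:nagios /usr/local/nagios/var/objects.cache
```

After changing the owner, a restart of nagios.service should no longer log the "Could not open object cache file" warning.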
by lgute » Thu Mar 28, 2024 8:27 am
Hi @wneville,
Thank you for the update.
Please let us know if you have any other questions or concerns.
-Laura