Nagios stopping after applyconfig

paul.jobb
Posts: 167
Joined: Tue Aug 02, 2011 4:37 pm

Post by paul.jobb »

Hi

I have some daily automation that updates Nagios and applies the configuration using the Nagios XI REST API.

Periodically, but not always, the Nagios daemon stops shortly after that job runs. I'm looking for where to start troubleshooting this. It's possible this started happening after my last upgrade...
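To help narrow down when the daemon dies relative to the apply, the daily job could check the service a little while after each apply. The following is a minimal sketch, not the poster's actual automation: the base URL and API key are placeholders, the `system/applyconfig` path is an assumption to verify against the API documentation in your own XI instance, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: base URL and API key are placeholders, and the
# system/applyconfig path is an assumption to check against your XI API docs.

# Build an API URL from a base URL ($1), an endpoint ($2) and an API key ($3).
xi_api_url() {
    printf '%s/nagiosxi/api/v1/%s?apikey=%s\n' "${1%/}" "$2" "$3"
}

# Ask Nagios XI to apply the configuration (needs curl and network access).
# $1 = base URL, $2 = API key
apply_config() {
    curl -sS -X POST "$(xi_api_url "$1" "system/applyconfig" "$2")"
}

# Wait $1 seconds, then report whether the nagios unit is still active.
# In the log above, the SIGSEGV came about a minute after the restart.
check_after_apply() {
    sleep "$1"
    if systemctl is-active --quiet nagios; then
        echo "nagios active"
    else
        echo "nagios NOT active"
        # Show recent journal lines for the unit to capture the failure context.
        journalctl -u nagios -n 50 --no-pager
        return 1
    fi
}
```

A cron job could call `apply_config` and then `check_after_apply 120`, so that a crash is recorded together with the time of the apply that preceded it.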

Nagios XI version 5.7.2, with 3 mod_gearman workers
Red Hat Enterprise Linux Server release 7.6
The server seems to have more than enough resources.

/var/log/messages

Aug 26 08:24:50 vnl1654 nagios: SERVICE ALERT: VNL1351;System Load;WARNING;HARD;3;Load : 1.72 3.49 3.88 : 3.88 > 3.0 : WARNING
Aug 26 08:24:50 vnl1654 nagios: SERVICE ALERT: VNL978;CPU Usage;WARNING;SOFT;1;CPU used 89.0% (>80) : WARNING
Aug 26 08:24:51 vnl1654 systemd[1]: Stopping Nagios Core 4.4.6...
Aug 26 08:24:51 vnl1654 nagios: Caught SIGTERM, shutting down...
Aug 26 08:24:51 vnl1654 nagios: Caught SIGTERM, shutting down...
Aug 26 08:24:51 vnl1654 nagios: Caught SIGTERM, shutting down...
Aug 26 08:24:51 vnl1654 nagios: Successfully shutdown... (PID=12362)
Aug 26 08:24:56 vnl1654 nagios: Event broker module '/usr/local/nagios/bin/ndo.so' deinitialized successfully.
Aug 26 08:24:56 vnl1654 nagios: Event broker module '/usr/lib64/mod_gearman/mod_gearman_nagios4.o' deinitialized successfully.
Aug 26 08:24:56 vnl1654 systemd[1]: Stopped Nagios Core 4.4.6.
Aug 26 08:24:56 vnl1654 systemd[1]: Starting Nagios Core 4.4.6...
Aug 26 08:24:56 vnl1654 nagios[25906]: Nagios Core 4.4.6
Aug 26 08:24:56 vnl1654 nagios[25906]: Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Aug 26 08:24:56 vnl1654 nagios[25906]: Copyright (c) 1999-2009 Ethan Galstad
Aug 26 08:24:56 vnl1654 nagios[25906]: Last Modified: 2020-04-28
Aug 26 08:24:56 vnl1654 nagios[25906]: License: GPL
Aug 26 08:24:56 vnl1654 nagios[25906]: Website: https://www.nagios.org
Aug 26 08:24:56 vnl1654 nagios[25906]: Reading configuration data...
Aug 26 08:24:56 vnl1654 nagios[25906]: Read main config file okay...
Aug 26 08:24:56 vnl1654 nagios[25906]: Read object config files okay...
Aug 26 08:24:56 vnl1654 nagios[25906]: Running pre-flight check on configuration data...
Aug 26 08:24:56 vnl1654 nagios[25906]: Checking objects...
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 13677 services.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 1523 hosts.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 7 host groups.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 2 service groups.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 128 contacts.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 19 contact groups.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 206 commands.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 8 time periods.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 0 host escalations.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 0 service escalations.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checking for circular paths...
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 1523 hosts
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 0 service dependencies
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 0 host dependencies
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 8 timeperiods
Aug 26 08:24:56 vnl1654 nagios[25906]: Checking global event handlers...
Aug 26 08:24:56 vnl1654 nagios[25906]: Checking obsessive compulsive processor commands...
Aug 26 08:24:56 vnl1654 nagios[25906]: Checking misc settings...
Aug 26 08:24:56 vnl1654 nagios[25906]: Total Warnings: 0
Aug 26 08:24:56 vnl1654 nagios[25906]: Total Errors: 0
Aug 26 08:24:56 vnl1654 nagios[25906]: Things look okay - No serious problems were detected during the pre-flight check
Aug 26 08:24:56 vnl1654 systemd[1]: Started Nagios Core 4.4.6.
Aug 26 08:24:56 vnl1654 nagios: Nagios 4.4.6 starting... (PID=25910)
Aug 26 08:24:56 vnl1654 nagios: Local time is Wed Aug 26 08:24:56 MDT 2020
Aug 26 08:24:56 vnl1654 nagios: LOG VERSION: 2.0
Aug 26 08:24:56 vnl1654 nagios: qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
Aug 26 08:24:56 vnl1654 nagios: qh: core query handler registered
Aug 26 08:24:56 vnl1654 nagios: qh: echo service query handler registered
Aug 26 08:24:56 vnl1654 nagios: qh: help for the query handler registered
Aug 26 08:24:56 vnl1654 nagios: wproc: Successfully registered manager as @wproc with query handler
Aug 26 08:24:56 vnl1654 nagios: wproc: Registry request: name=Core Worker 25912;pid=25912
Aug 26 08:24:56 vnl1654 nagios: wproc: Registry request: name=Core Worker 25914;pid=25914
Aug 26 08:24:56 vnl1654 nagios: wproc: Registry request: name=Core Worker 25911;pid=25911
Aug 26 08:24:57 vnl1654 nagios: wproc: Registry request: name=Core Worker 25913;pid=25913
Aug 26 08:24:57 vnl1654 nagios: Event broker module '/usr/local/nagios/bin/ndo.so' initialized successfully.
Aug 26 08:24:57 vnl1654 nagios: mod_gearman: initialized version 3.0.7 (libgearman 0.33)
Aug 26 08:24:57 vnl1654 nagios: Event broker module '/usr/lib64/mod_gearman/mod_gearman_nagios4.o' initialized successfully.
Aug 26 08:24:57 vnl1654 nagios: WARNING: RLIMIT_NPROC is 63450, total max estimated processes is 73496! You should increase your limits (ulimit -u, or limits.conf)
Aug 26 08:25:01 vnl1654 systemd[1]: Created slice User Slice of root.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260615 of user root.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260616 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Created slice User Slice of pcp.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260614 of user pcp.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260619 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260617 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260618 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260620 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260621 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260622 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260623 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260624 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260625 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260626 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Created slice User Slice of questusr.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260628 of user questusr.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260627 of user nagios.
Aug 26 08:25:01 vnl1654 chgfmon[26049]: Failed to stat() file /etc/opt/quest/vas/users.deny, skipping
Aug 26 08:25:01 vnl1654 chgfmon[26049]: Failed to build file records for class user-deny
Aug 26 08:25:01 vnl1654 chgfmon[26049]: Failed to build file records for class user-maps
Aug 26 08:25:02 vnl1654 systemd[1]: Removed slice User Slice of questusr.
Aug 26 08:25:02 vnl1654 systemd[1]: Removed slice User Slice of root.
Aug 26 08:25:03 vnl1654 systemd[1]: Removed slice User Slice of pcp.
Aug 26 08:25:05 vnl1654 nagios: Successfully launched command file worker with pid 26277
Aug 26 08:25:07 vnl1654 snmpd[1715]: Connection from UDP: [199.215.83.194]:47757->[10.40.0.16]:161
Aug 26 08:25:19 vnl1654 nagios: SERVICE ALERT: VNL1285;System Load;OK;HARD;3;Load : 0.00 0.08 2.55 : OK
Aug 26 08:25:31 vnl1654 dbus[1443]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Aug 26 08:25:31 vnl1654 dbus[1443]: [system] Successfully activated service 'org.freedesktop.problems'
Aug 26 08:25:33 vnl1654 systemd[1]: Created slice User Slice of root.
Aug 26 08:25:33 vnl1654 systemd[1]: Started Session c17177 of user root.
Aug 26 08:25:33 vnl1654 systemd[1]: Removed slice User Slice of root.
Aug 26 08:25:45 vnl1654 systemd[1]: Created slice User Slice of root.
Aug 26 08:25:45 vnl1654 systemd[1]: Started Session c17178 of user root.
Aug 26 08:25:45 vnl1654 systemd[1]: Removed slice User Slice of root.
Aug 26 08:25:59 vnl1654 nagios: SERVICE ALERT: VNL978;CPU Usage;WARNING;SOFT;2;CPU used 86.0% (>80) : WARNING
Aug 26 08:25:59 vnl1654 nagios: SERVICE ALERT: VNL539;netsnmp Memory Usage;OK;HARD;5;Memory Utilization OK - %used_real is 97.93%, total_real is 7821.5 MB, avail_real is 162.1 MB, cached is 6537.7 MB, buffer is 0.0 MB, %user_real is 19.82%, %cached_real is 83.59%, %buffer_real is 0.00%
Aug 26 08:25:59 vnl1654 nagios: Caught SIGSEGV, shutting down...
Aug 26 08:25:59 vnl1654 systemd[1]: nagios.service: main process exited, code=exited, status=254/n/a
Aug 26 08:25:59 vnl1654 kill[26568]: kill: cannot find process ""
Aug 26 08:25:59 vnl1654 systemd[1]: nagios.service: control process exited, code=exited status=1
Aug 26 08:25:59 vnl1654 nagios: Caught SIGTERM, shutting down...
Aug 26 08:25:59 vnl1654 systemd[1]: Unit nagios.service entered failed state.
Aug 26 08:25:59 vnl1654 systemd[1]: nagios.service failed.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Nagios stopping after applyconfig

Post by benjaminsmith »

Hi,

How often does this occur, and what method are you using to restart the services (e.g. a system reboot)? I did notice this message in the log:
nagios: WARNING: RLIMIT_NPROC is 63450, total max estimated processes is 73496! You should increase your limits (ulimit -u, or limits.conf)
I would go ahead and increase those limits. To do so, edit the /etc/security/limits.conf file and add the following to the bottom of the file:

*          soft    nproc          262144
*          hard    nproc          262144
Save the change and reboot the server for the change to take effect. Then PM your system profile and we'll review the other logs to troubleshoot this for you.
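After the reboot, it may be worth confirming which limit the running daemon actually has, since the value in limits.conf and the value applied to the process can differ. A small sketch, assuming a Linux `/proc` filesystem; the function name is hypothetical:

```shell
#!/bin/sh
# Print the soft "Max processes" value from /proc/<pid>/limits text on stdin.
# Each row has the form: "Max processes  <soft>  <hard>  processes".
soft_nproc_from_limits() {
    awk '/^Max processes/ { print $3 }'
}

# Usage on the XI server, assuming the main daemon can be found with pgrep:
#   soft_nproc_from_limits < "/proc/$(pgrep -o -x nagios)/limits"
# For comparison, the limit of the current login shell:
#   ulimit -u
```

If the value printed for the daemon still matches the number in the RLIMIT_NPROC warning, the new limits.conf entries are not reaching the Nagios process.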

To send us your system profile:
1. Log in to the Nagios XI GUI using a web browser.
2. Click the "Admin" > "System Profile" menu.
3. Click the "Download Profile" button.
4. Save the profile.zip file and share it in a private message or upload it to the post/ticket, then reply to this post to bring it up in the queue.

Reference:
How to set nproc (Hard and Soft) Values in CentOS / RHEL 5,6,7
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
paul.jobb
Posts: 167
Joined: Tue Aug 02, 2011 4:37 pm

Re: Nagios stopping after applyconfig

Post by paul.jobb »

Thanks,

I added those settings and rebooted the server.

To restart the service I just run "systemctl start nagios", and it seems to be fine afterwards, with no problems.

I sent you the profile.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Nagios stopping after applyconfig

Post by benjaminsmith »

Hi Paul,
To restart the service I just run "systemctl start nagios", and it seems to be fine afterwards, with no problems.
It sounds like things are working much better now. I looked over the profile and nothing really jumps out in the logs.

Let me know if you're still seeing the issue or if it's OK to close this out.

Benjamin
paul.jobb
Posts: 167
Joined: Tue Aug 02, 2011 4:37 pm

Re: Nagios stopping after applyconfig

Post by paul.jobb »

I scheduled the automation scripts that update Nagios and re-apply the config through cron, and I haven't seen any problems since. So I'm thinking it may have just been an anomaly; I don't know.

I did add those two entries to limits.conf, but I still get the same warning: "WARNING: RLIMIT_NPROC is 63450, total max estimated processes is 73496! You should increase your limits (ulimit -u, or limits.conf)". I thought I saw in another post that this may be an erroneous message, however.
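One possible reason the warning persists: limits.conf is applied by pam_limits to login sessions, while a service started by systemd (as nagios.service is in the log above) generally takes its limits from the unit, for example via `LimitNPROC=`. The sketch below writes a drop-in for that setting. The drop-in file name and the helper are hypothetical, and whether this clears the warning on this particular system is an assumption to test, not a confirmed fix.

```shell
#!/bin/sh
# Write a systemd drop-in that sets LimitNPROC for a unit.
# $1 = drop-in directory, e.g. /etc/systemd/system/nagios.service.d
# $2 = process limit to apply
write_nproc_dropin() {
    mkdir -p "$1" || return 1
    printf '[Service]\nLimitNPROC=%s\n' "$2" > "$1/limits.conf"
}

# On the server (as root), something like:
#   write_nproc_dropin /etc/systemd/system/nagios.service.d 262144
#   systemctl daemon-reload
#   systemctl restart nagios
```

After the restart, the RLIMIT_NPROC value reported in the startup log should show whether the new limit reached the daemon.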

I have a bit of a request. I am using check_nagiosxiserver.php to monitor this server from another Nagios instance using the API key. Is it possible to add a way to start the Nagios daemon through the API? It already has the capability to apply config remotely. Then I could easily add an event handler to restart Nagios if it happens to stop.
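The event handler described here could follow the usual Nagios pattern of acting only on a HARD CRITICAL state. A minimal sketch, assuming the handler is passed the `$SERVICESTATE$` and `$SERVICESTATETYPE$` macros as its first two arguments; the function names are hypothetical and the restart action is left as a placeholder:

```shell
#!/bin/sh
# Decide whether the event handler should attempt a restart.
# $1 = $SERVICESTATE$ (OK, WARNING, CRITICAL, UNKNOWN)
# $2 = $SERVICESTATETYPE$ (SOFT or HARD)
should_restart() {
    [ "$1" = "CRITICAL" ] && [ "$2" = "HARD" ]
}

# Print the action taken; "restart" is a placeholder for the real remote
# start or restart call against the monitored XI server.
handle_event() {
    if should_restart "$1" "$2"; then
        echo "restart"
    else
        echo "ignore"
    fi
}
```

Waiting for the HARD state avoids restarting on a single failed check of the remote server.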

Thanks for your help, you can close this off.

Thanks,
Paul
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Nagios stopping after applyconfig

Post by benjaminsmith »

Hi Paul,

Glad it's working better.

Regarding the API question, you can do this using the system/command option in the System API endpoint. This will allow you to send a Nagios Core external command via the API. The following Core command would restart the Nagios service:

Restart Program
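As a rough illustration, a Core external command is a line of the form `[<epoch time>] <COMMAND>`, and the restart command is named `RESTART_PROGRAM`. The sketch below formats that line and shows two hypothetical ways to submit it. The command file path is the common default and should be checked against `command_file` in nagios.cfg; the API endpoint path and the `cmd` parameter name are assumptions to verify in the XI API documentation.

```shell
#!/bin/sh
# Format a Nagios Core external command line: "[<epoch>] <COMMAND>".
# $1 = epoch timestamp, $2 = command (e.g. RESTART_PROGRAM)
format_external_command() {
    printf '[%s] %s\n' "$1" "$2"
}

# Local submission through the command file.
# $1 = command, $2 = command file path (default is an assumed location)
submit_local() {
    format_external_command "$(date +%s)" "$1" \
        > "${2:-/usr/local/nagios/var/rw/nagios.cmd}"
}

# Remote submission through the XI API (endpoint and "cmd" are assumptions).
# $1 = base URL, $2 = endpoint path, $3 = API key, $4 = command
submit_via_api() {
    curl -sS -X POST "${1%/}/nagiosxi/api/v1/$2?apikey=$3" \
        --data-urlencode "cmd=$4"
}
```

External commands are read by a running Nagios daemon, so this approach restarts a daemon that is up. If the daemon has already exited, as in the SIGSEGV case above, starting it would likely still require `systemctl start nagios` on the server itself.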

Benjamin