Re: Nagios service constantly exited

Posted: Mon Nov 08, 2021 6:32 am
by safuanmansor
Hi benjaminsmith,

As suggested, we have increased the nproc value in /etc/security/limits.conf and in /etc/security/limits.d/20-nproc.conf, based on the post at https://www.thegeekdiary.com/how-to-set ... -rhel-567/
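
For anyone following along, here is a minimal sketch of how the nproc entries in those two files could be listed to confirm the change. The function name list_nproc_limits is made up for illustration; the file paths are the ones mentioned above, and the actual values will differ per system.

```shell
# Print every active (non-comment) nproc entry from the given limits files.
# limits.conf lines have the form: <domain> <type> <item> <value>
# list_nproc_limits is a hypothetical helper name, not part of any package.
list_nproc_limits() {
    awk '!/^[ \t]*#/ && $3 == "nproc" { print FILENAME ": " $0 }' "$@"
}

# Example usage (paths as mentioned in the post above):
# list_nproc_limits /etc/security/limits.conf /etc/security/limits.d/20-nproc.conf
```

Running `ulimit -u` in a shell owned by the nagios user should then show the effective limit for that user.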

Currently the Nagios service is no longer exiting, and we will continue to monitor it.

Re: Nagios service constantly exited

Posted: Mon Nov 08, 2021 10:21 am
by benjaminsmith
Hi Safuan,

That's good to hear. We'll keep this open for now; if you have any further issues, please provide a fresh system profile for us to review.

Thanks,
Benjamin

Re: Nagios service constantly exited

Posted: Mon Nov 08, 2021 10:33 am
by safuanmansor
Hi Benjamin,
Over the last hour, graphs on Nagios have been failing at random.

Nagios has not restarted.
There are no nproc errors.
Yet the graphs failed.

We can see a long flat line in the perfdata graph, stuck at the same value, where it is supposed to be curved.
dr-db-dr-zone1-rib-rac1-online_banking_concurrent_users3 (1).jpg
We would appreciate your advice on this. The latest profile has been sent via PM.

Thanks
Safuan

Re: Nagios service constantly exited

Posted: Mon Nov 08, 2021 12:15 pm
by safuanmansor
We also see behaviour where the check events appear to restart, based on the active check statistics:
IMG-20211109-WA0000.jpg
It climbs to 38k over a period of time, then slowly drops to below 200 and climbs again. The graphs usually do not update while this is happening.

Re: Nagios service constantly exited

Posted: Mon Nov 08, 2021 4:13 pm
by benjaminsmith
Hi,

This looks like a load issue. I noticed the service check timeout has been increased significantly over the default. Open /usr/local/nagios/etc/nagios.cfg and change it back to 60 seconds. Service checks that cannot complete within a reasonable time need to be stopped to avoid too many simultaneous processes. Your file currently has:
#service_check_timeout=60
service_check_timeout=600
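
The change described above could be scripted roughly as follows. This is only a sketch: the function name set_service_check_timeout is hypothetical, and it assumes the active setting sits on its own line starting with service_check_timeout=, as in the excerpt above. Commented-out lines are left untouched.

```shell
# Set service_check_timeout in a Nagios main config file.
# A backup of the original file is kept alongside it as <file>.bak.
# set_service_check_timeout is a hypothetical helper name.
set_service_check_timeout() {
    cfg_file=$1
    seconds=$2
    sed -i.bak -E "s/^service_check_timeout=.*/service_check_timeout=${seconds}/" "$cfg_file"
}

# Example usage (path as given above):
# set_service_check_timeout /usr/local/nagios/etc/nagios.cfg 60
```

Editing the file by hand works just as well; the backup copy simply makes it easy to revert.
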
The performance graphing defaults have also been increased, but I would recommend reducing the max load threshold to 75. Open /usr/local/nagios/etc/pnp/npcd.cfg and change the following setting to:

load_threshold = 75.0
Then restart Nagios Core and NPCD:

systemctl restart nagios
systemctl restart npcd
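
To double-check that both settings are what you expect, a small helper along these lines could read them back from the files. The function name get_cfg_value is made up for this sketch; it assumes simple key=value or key = value lines and ignores commented-out ones.

```shell
# Print the value of an active "key=value" or "key = value" setting.
# Lines starting with "#" never match, because "#key" differs from "key".
# get_cfg_value is a hypothetical helper name.
get_cfg_value() {
    awk -v key="$2" '
    {
        pos = index($0, "=")
        if (pos == 0) next                 # not a key=value line
        k = substr($0, 1, pos - 1)
        v = substr($0, pos + 1)
        gsub(/[ \t]/, "", k)               # strip whitespace from the key
        gsub(/^[ \t]+|[ \t]+$/, "", v)     # trim the value
        if (k == key) print v
    }' "$1"
}

# Example usage (paths as given above):
# get_cfg_value /usr/local/nagios/etc/nagios.cfg service_check_timeout
# get_cfg_value /usr/local/nagios/etc/pnp/npcd.cfg load_threshold
```
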
Let me know if that helps. I would recommend that your company start planning for an additional XI server and breaking this system up into multiple servers to help decrease the load.

Regards,
--Benjamin

Re: Nagios service constantly exited

Posted: Wed Nov 10, 2021 2:07 am
by safuanmansor
Hi benjaminsmith,

The suggestion of splitting checks onto another XI server is being considered, but it will take some time as it needs to go through a long internal process.

We also hit another error that we first saw today:

[1636517282] NDO-3: ndo_return = 1 (Statement not prepared)
[1636517282] NDO-3: ndo_handle_comment(ndo-handlers.c:618): Unable to bind parameters
[1636517282] NDO-3: Query failed in ndo_empty_queue_comment
[1636517282] NDO-3: ndo_return = 1 (Statement not prepared)
[1636517282] NDO-3: ndo_handle_comment(ndo-handlers.c:634): Unable to bind parameters
[1636517282] NDO-3: Query failed in ndo_empty_queue_comment
[1636517282] NDO-3: ndo_return = 1 (Statement not prepared)
[1636517282] NDO-3: ndo_handle_comment(ndo-handlers.c:618): Unable to bind parameters
[1636517282] NDO-3: Query failed in ndo_empty_queue_comment
[1636517282] NDO-3: Ended event_handler thread
[1636517282] NDO-3: ndo_return = 1 (Statement not prepared)
[1636517282] NDO-3: ndo_handle_comment(ndo-handlers.c:634): Unable to bind parameters
[1636517282] NDO-3: Query failed in ndo_empty_queue_comment
[1636517282] NDO-3: ndo_return = 1 (Statement not prepared)
[1636517282] NDO-3: ndo_handle_contact_notification(ndo-handlers.c:1320): Unable to bind parameters
[1636517282] NDO-3: Query failed in ndo_empty_queue_notification (handle_contact_notification)
[1636517282] NDO-3: ndo_return = 1 (Statement not prepared)
[1636517282] NDO-3: ndo_handle_comment(ndo-handlers.c:618): Unable to bind parameters
[1636517282] NDO-3: Query failed in ndo_empty_queue_comment
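
A quick way to see which NDO-3 messages dominate a log like the one above might be the following sketch. The function name summarize_ndo_errors is hypothetical, and the log path in the usage comment is the usual Nagios XI default, which is an assumption about this system.

```shell
# Count NDO-3 log messages by text, most frequent first.
# The leading "[timestamp] NDO-3: " prefix is stripped before counting.
# summarize_ndo_errors is a hypothetical helper name.
summarize_ndo_errors() {
    grep 'NDO-3:' "$1" \
        | sed -E 's/^\[[0-9]+\] NDO-3: //' \
        | sort \
        | uniq -c \
        | sort -rn
}

# Example usage (assumed default log location):
# summarize_ndo_errors /usr/local/nagios/var/nagios.log
```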

The only solution I have seen on the forum is downgrading from ndo3 to ndo2db.
What is the downside of this downgrade? Do we lose any features of the latest version of Nagios?

Based on the article below, there are cases where we can upgrade from ndo2db to ndo3:
https://support.nagios.com/kb/article/u ... i-885.html
So could reinstalling ndo3 be a solution instead of downgrading?

Thanks,
Safuan

Re: Nagios service constantly exited

Posted: Wed Nov 10, 2021 2:07 pm
by benjaminsmith
Hi Safuan,

Let's try stopping everything, doing a database repair, and restarting. Please log in as root and run the following:

# Stop the monitoring services and cron so nothing writes to the databases
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
systemctl stop crond
# Kill any processes still running as the nagios user
pkill -9 -u nagios
# Empty the XI event tables, then force a repair of all MySQL/MariaDB tables
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -u root -pnagiosxi nagiosxi
mysqlcheck -f -r -u root -pnagiosxi --all-databases --use-frm
# Stop PostgreSQL only if this XI install is configured to use it
if grep --quiet pgsql /usr/local/nagiosxi/html/config.inc.php; then systemctl stop postgresql; fi
systemctl restart mariadb
# Remove the stale command file, socket and lock files
rm -f /usr/local/nagios/var/rw/nagios.cmd
rm -f /usr/local/nagios/var/nagios.lock
rm -f /var/run/nagios.lock
rm -f /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagios/var/ndo2db.lock
rm -f /var/lib/mrtg/mrtg_l
rm -f /usr/local/nagiosxi/var/*.lock
rm -f /usr/local/nagiosxi/tmp/*.lock
# Remove leftover IPC message queues owned by nagios
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q "$i"; done
pkill python
# Start PostgreSQL again if it is in use, then bring the services back up
if grep --quiet pgsql /usr/local/nagiosxi/html/config.inc.php; then systemctl start postgresql; fi
systemctl restart httpd
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
Then check the nagios logs again. If that doesn't work we can try downgrading to ndo2db (it's relatively easy to downgrade and upgrade again at a later date).

# STANDARD DOWNGRADE OF NDO3

systemctl stop nagios
cd /tmp
rm -rf /tmp/nagiosxi
wget https://assets.nagios.com/downloads/nagiosxi/5/xi-5.6.14.tar.gz
tar zxf xi-5.6.14.tar.gz
cd /tmp/nagiosxi/subcomponents/ndoutils
./install
systemctl enable ndo2db
Then edit your /usr/local/nagios/etc/nagios.cfg and make sure this line is uncommented:

broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
Make sure this line is commented out:

#broker_module=/usr/local/nagios/bin/ndo.so /usr/local/nagios/etc/ndo.cfg
Then start the nagios and ndo2db services:

systemctl start nagios
systemctl start ndo2db
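
As a sanity check on the nagios.cfg edit, something like this sketch could list which broker_module lines are actually active (not commented out). The function name active_broker_modules is made up for illustration. After the downgrade, only the ndomod.o line should be printed.

```shell
# List the uncommented broker_module lines in a Nagios main config file.
# active_broker_modules is a hypothetical helper name.
active_broker_modules() {
    grep -E '^[[:space:]]*broker_module=' "$1"
}

# Example usage (path as given above):
# active_broker_modules /usr/local/nagios/etc/nagios.cfg
```
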
--Benjamin

Re: Nagios service constantly exited

Posted: Tue Nov 23, 2021 10:25 pm
by safuanmansor
Hi benjaminsmith,

After applying the suggested setting to reduce service_check_timeout, Nagios seems to be stable, with no sudden exits, although NDO is still crashing according to the log file. The suggested downgrade to ndo2db is a proven workaround, as tested on our test server.

Is this the only solution for it at the moment?

Thanks,
Safuan

Re: Nagios service constantly exited

Posted: Wed Nov 24, 2021 11:17 am
by benjaminsmith
Hi Safuan,
"Is this the only solution for it at the moment?"
For now, I would recommend staying on ndo2db. We will be making some more updates to ndo3 in a coming release that should help resolve table migration errors, and we can try upgrading again at that time. It's not very difficult to upgrade or downgrade the backend database application.

Let me know if that sounds alright for you.

Regards,
Benjamin

Re: Nagios service constantly exited

Posted: Mon Nov 29, 2021 12:02 am
by safuanmansor
Hi Ben,

Yeah, sounds right to me. Thanks for the support. You may close this thread.

Regards,
Safuan