
Re: NPCD: WARN: Max load reached

Posted: Mon Jun 26, 2017 9:55 am
by tgriep
The log from the remote host suggests that you are using NSClient++ in passive mode to send check results to the Nagios server, and that it is not successfully sending its data, which is probably leaving connections open on the Nagios server.
Over time, this will cause the hung processes you are seeing.

Check the settings in the nsclient.ini file on the remote server and verify that the encryption and the NSCA password match the settings on the Nagios server.
Restart the NSClient++ service after editing the file.
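As a rough aid for this check, here is a minimal Python sketch that compares the NSCA password and encryption settings in an nsclient.ini file with the `password` and `decryption_method` lines of the server's nsca.cfg. The section name, the key names and the name-to-number mapping are assumptions about typical NSClient++ and NSCA layouts, not values taken from this thread. Verify them against your own files and the product documentation before relying on the result.

```python
import configparser

# ASSUMPTION: section where NSClient++ keeps its default NSCA target
# (address, encryption, password); adjust it to match your nsclient.ini.
NSCLIENT_SECTION = "/settings/NSCA/client/targets/default"

# ASSUMPTION: a small, hypothetical mapping from NSClient++ encryption names to
# NSCA decryption_method numbers; confirm it in both products' documentation.
ENCRYPTION_TO_NSCA = {"none": "0", "xor": "1", "des": "2", "3des": "3"}


def parse_nsca_cfg(text):
    """Parse nsca.cfg-style key=value lines, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def find_nsca_mismatches(nsclient_ini_text, nsca_cfg_text):
    """List the differences between the client target and the server config."""
    client = configparser.ConfigParser(interpolation=None, strict=False)
    client.read_string(nsclient_ini_text)
    if client.has_section(NSCLIENT_SECTION):
        target = client[NSCLIENT_SECTION]
    else:
        target = {}
    server = parse_nsca_cfg(nsca_cfg_text)

    problems = []
    if target.get("password", "") != server.get("password", ""):
        problems.append("password differs")

    name = target.get("encryption", "none").strip().lower()
    expected = ENCRYPTION_TO_NSCA.get(name)
    actual = server.get("decryption_method", "0")
    if expected is None:
        problems.append(f"encryption {name!r} is not in the mapping; compare it by hand")
    elif expected != actual:
        problems.append(f"encryption {name!r} does not match decryption_method={actual}")
    return problems
```

An empty list means the two files agree on the settings the sketch knows about; anything else points at the line to fix before restarting the NSClient++ service.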

Until you have gone through all of the servers that are leaving connections hanging, you can limit them on the Nagios server.
To do that, edit the /etc/xinetd.d/nsca file and change the UNLIMITED lines from

```
#        per_source      = UNLIMITED
#        instances       = UNLIMITED
```
to a smaller value, removing the leading # so the settings take effect. You will have to adjust the values to what works in your environment.

```
per_source      = 10
instances       = 10
```
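If many servers need the same change, the edit can also be scripted. The following minimal Python sketch rewrites the per_source and instances lines, with or without the leading #, to numeric limits and leaves every other line alone. The function name is hypothetical and the default of 10 simply mirrors the example values; pick limits that suit your environment.

```python
import re

# per_source / instances lines set to UNLIMITED, optionally commented out with '#'.
# [ \t] is used instead of \s so a match never spills onto a neighbouring line.
_UNLIMITED_LINE = re.compile(
    r"^([ \t]*)#?([ \t]*)(per_source|instances)([ \t]*)=[ \t]*UNLIMITED[ \t]*$",
    re.MULTILINE,
)


def limit_xinetd_service(text, per_source=10, instances=10):
    """Return xinetd service text with the UNLIMITED limits replaced by numbers."""
    limits = {"per_source": per_source, "instances": instances}

    def replace(match):
        lead, after_hash, key, gap = match.groups()
        # Drop the '#' (if any) but keep the original indentation and alignment.
        return f"{lead}{after_hash}{key}{gap}= {limits[key]}"

    return _UNLIMITED_LINE.sub(replace, text)
```

Reading /etc/xinetd.d/nsca, passing its contents through `limit_xinetd_service`, and writing the result back (after keeping a backup copy) produces the same lines as the manual edit.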
Save the file and restart xinetd by running

```
service xinetd restart
```
Try that and post any questions you have.

Posted: Tue Jun 27, 2017 3:27 am
by michal.nastaly
The monitoring agent hasn't been working due to a missing nagios.lock file.

Could this be related?

Posted: Tue Jun 27, 2017 8:31 am
by tgriep
Yes, that could be it. If the Nagios process is not running, it is not processing the inbound traffic, so those connections will hang waiting to be processed.
Did you get the Monitoring Agent running?
Normally you would remove the lock file and then start the process.
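Before removing the lock file, it can help to check whether it is stale, meaning the PID it records no longer belongs to a running process. A minimal Python sketch, assuming the lock file simply contains the Nagios PID and lives at /usr/local/nagios/var/nagios.lock (the `lock_file` setting in nagios.cfg is the authoritative path on your server):

```python
import os

# ASSUMPTION: default Nagios Core/XI location; nagios.cfg's lock_file setting
# is the authoritative path on a given server.
LOCK_FILE = "/usr/local/nagios/var/nagios.lock"


def lock_file_status(path=LOCK_FILE):
    """Return 'missing', 'stale', or 'running' for a Nagios lock file."""
    try:
        with open(path) as handle:
            pid = int(handle.read().strip())
    except FileNotFoundError:
        return "missing"
    except ValueError:
        return "stale"  # empty or garbled file: no usable PID to check

    if pid <= 0:
        return "stale"  # 0 or negative values would address process groups
    try:
        os.kill(pid, 0)  # signal 0 only tests whether the PID exists
    except ProcessLookupError:
        return "stale"
    except PermissionError:
        return "running"  # the PID exists but is owned by another user
    # PIDs can be reused, so "running" does not prove the process is Nagios.
    return "running"
```

A "stale" result is the case where the lock file can be removed before starting Nagios again; "missing" just means there is no lock file to remove.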

Posted: Tue Jun 27, 2017 9:42 am
by michal.nastaly
I can't get the monitoring agent to run as the .lock file is missing.

Moreover, the only components running in the Admin tab of the Web GUI are Cleaner and Nonstop Operations Manager.

All the other components are offline.

Posted: Tue Jun 27, 2017 9:48 am
by tgriep
Can you send me a Private Message or post your System Profile so we can view the settings and log files from your server?
To get your system profile, log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and either PM it or post it here.

Posted: Wed Jun 28, 2017 2:19 am
by michal.nastaly
I have PM'd you.

Posted: Wed Jun 28, 2017 8:05 am
by michal.nastaly
We have rolled back our config to the latest working version, and it seems to be working again for now. We'll keep monitoring it.

It would be nice to know what caused the monitoring agent to go offline and the nagios.lock file to go missing, but at least we have it up and running again.

Thanks for your support tgriep!

Posted: Wed Jun 28, 2017 8:59 am
by tgriep
I took a look at the profile and couldn't find why the Nagios process first went down, but I suspect that all of the NSCA connections to the server were hitting one of the ulimits, and that is why it could not be started until those connections were closed.
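To get a feel for how many NSCA connections are piling up, the established TCP connections on the NSCA port can be counted from /proc/net/tcp on a Linux server and compared with resource limits. This is a minimal Python sketch; port 5667 (the usual NSCA default) and the choice of RLIMIT_NOFILE and RLIMIT_NPROC as the relevant ulimits are assumptions, and the limits reported are those of the Python process itself, which may differ from those of xinetd or the nagios user.

```python
import resource

NSCA_PORT = 5667  # ASSUMPTION: default NSCA port; check your xinetd service file
TCP_ESTABLISHED = "01"  # state code for ESTABLISHED in /proc/net/tcp


def count_established(proc_net_tcp_text, port=NSCA_PORT):
    """Count ESTABLISHED connections whose local port equals `port`."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)  # ports are hex
        if local_port == port and fields[3] == TCP_ESTABLISHED:
            count += 1
    return count


def nsca_load_report(path="/proc/net/tcp"):
    """Collect the connection count and this process's (soft, hard) limits."""
    with open(path) as handle:
        connections = count_established(handle.read())
    return {
        "nsca_connections": connections,
        "open_files": resource.getrlimit(resource.RLIMIT_NOFILE),
        "processes": resource.getrlimit(resource.RLIMIT_NPROC),
    }
```

/proc/net/tcp only lists IPv4 sockets; /proc/net/tcp6 uses the same layout for IPv6, so the same counting function can be pointed at it.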

I did see this error in the Apache log files.

```
zend_mm_heap corrupted
```
To fix that error, add the following line to the bottom of the /etc/php.ini file.

```
opcache.fast_shutdown=0
```
Save the file and restart the Apache process by running

```
service httpd restart
```
It probably didn't cause the issue, but this will help with accessing the XI GUI.
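If you manage several XI servers, the php.ini change can be made repeatable with a small script. This minimal Python sketch appends `opcache.fast_shutdown=0` at the bottom of the file, as suggested in the post, unless the last active setting for that directive already has the desired value. The function name is hypothetical; the sketch relies on a later line overriding an earlier one for the same directive within php.ini, ignores lines commented out with `;`, and only looks at php.ini itself, not at any extra .ini files your PHP build may load.

```python
import re


def ensure_php_ini_setting(text, key, value):
    """Return php.ini text in which the last active `key` line equals `value`.

    The setting is appended at the bottom of the file, and existing lines are
    left in place because a later line for the same directive overrides them.
    """
    active = re.compile(
        rf"^[ \t]*{re.escape(key)}[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE
    )
    current = [match.group(1) for match in active.finditer(text)]
    if current and current[-1] == value:
        return text  # already set to the desired value; nothing to add
    if text and not text.endswith("\n"):
        text += "\n"
    return f"{text}{key}={value}\n"
```

Running it twice on the same file leaves the second result unchanged, so it is safe to include in a routine maintenance script before restarting httpd.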

Posted: Thu Jun 29, 2017 3:07 am
by michal.nastaly
Thanks for your support! :D

Posted: Thu Jun 29, 2017 9:12 am
by tmcdonald
Did you have any further (related) questions or are we okay to close this thread?