Various Messages spawning alerts on NCPA/Plugin execution
Posted: Tue Feb 01, 2022 10:55 am
In need of some feedback and understanding...
I'm running Nagios XI (5.8.6) on RHEL 7.9 and I'm seeing several messages around a service check that uses the NCPA agent on a Windows 2008 R2 server to execute a Windows PowerShell script. The script checks whether a UNC path exists and triggers an alert if it doesn't. Since 01/20/22, we have seen these messages around 5-6 times over the last few days. I ran a report, and until now this exact service check had not failed, alerted, or notified since 10/01/21. I've asked the Windows admin to check the health of the server and apparently all is well.
What I'm looking for here is whether anyone has encountered these types of messages and what was done about them. I would like to understand what they mean and to have concrete evidence that this is not directly a Nagios XI problem. The messages are listed below. I've added --verbose to the service check, and not much more is displayed in the output.
Lastly, I'm wondering if this is caused by resource contention on the XI system, since I have recently seen the following message as well. I understand what the message states, but my impression is that if we were hitting the limit, many more checks would be having execution issues.
Jan 31 13:33:23 ch-nagios-p02 nagios: WARNING: RLIMIT_NPROC is 31192, total max estimated processes is 33770! You should increase your limits (ulimit -u, or limits.conf)
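To put the two numbers in that warning side by side, here is a minimal Python sketch. The function name nproc_headroom is my own, not anything from Nagios, and how Nagios arrives at its "total max estimated processes" figure is not something I have verified. The script only does the subtraction and, when run directly on the XI server, prints the limits for whichever user runs it (run it as the nagios user for a meaningful comparison).

```python
import resource


def nproc_headroom(soft_limit, estimated_max):
    """Return how many processes are left under the limit.

    A negative result means the estimate exceeds the limit.
    Both arguments are plain integers taken from the log message.
    """
    return soft_limit - estimated_max


if __name__ == "__main__":
    # Values copied from the nagios WARNING line in /var/log/messages.
    print("headroom from log:", nproc_headroom(31192, 33770))

    # Limits for the current user (Linux only); -1 means unlimited.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print("current RLIMIT_NPROC soft/hard:", soft, hard)
```

With the values from the log, the estimate is 2578 above the limit. That is a worst-case estimate versus a ceiling, so on its own it does not show the limit was actually reached.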
1) Thread failed to start
2) Return code of 66
3) Return code of 82
4) Out of bounds
5) The shell cannot be started. A failure occurred during initialization:
/var/log/messages:
Jan 28 21:11:21 ch-nagios-p02 nagios: SERVICE ALERT: CH-SHPNT-D50;Witness Share - Cluster Name: CH-SQL-D02;WARNING;SOFT;1;Thread failed to start.
Jan 28 21:12:30 ch-nagios-p02 nagios: SERVICE ALERT: CH-SHPNT-D50;Witness Share - Cluster Name: CH-SQL-D02;WARNING;SOFT;2;Thread failed to start.
Jan 28 21:13:36 ch-nagios-p02 nagios: SERVICE ALERT: CH-SHPNT-D50;Witness Share - Cluster Name: CH-SQL-D02;WARNING;HARD;3;Thread failed to start.
Jan 28 21:15:44 ch-nagios-p02 nagios: SERVICE ALERT: CH-SHPNT-D50;Witness Share - Cluster Name: CH-SQL-D02;OK;HARD;3;OK: UNC Share is accessible
Jan 29 02:13:09 ch-nagios-p02 nagios: Warning: Return code of 66 for check of service 'Witness Share - Cluster Name: CH-SQL-D02' on host 'CH-SHPNT-D50' was out of bounds.
Jan 29 02:13:09 ch-nagios-p02 nagios: SERVICE ALERT: CH-SHPNT-D50;Witness Share - Cluster Name: CH-SQL-D02;CRITICAL;SOFT;1;(Return code of 66 for service 'Witness Share - Cluster Name: CH-SQL-D02' on host 'CH-SHPNT-D50' was out of bounds)
Jan 31 01:25:20 ch-nagios-p02 nagios: SERVICE ALERT: EG-RPTDB-P03;CPU Usage;CRITICAL;SOFT;1;CRITICAL: Percent was 97.53 %
Jan 31 01:25:34 ch-nagios-p02 nagios: Warning: Return code of 82 for check of service 'Witness Share - Cluster Name: CH-SQL-D02' on host 'CH-SHPNT-D50' was out of bounds.
Jan 31 01:25:34 ch-nagios-p02 nagios: SERVICE ALERT: CH-SHPNT-D50;Witness Share - Cluster Name: CH-SQL-D02;CRITICAL;SOFT;1;(Return code of 82 for service 'Witness Share - Cluster Name: CH-SQL-D02' on host 'CH-SHPNT-D50' was out of bounds)
Jan 31 12:02:03 ch-nagios-p02 nagios: SERVICE ALERT: CH-SHPNT-D50;Witness Share - Cluster Name: CH-SQL-D02;OK;SOFT;2;The shell cannot be started. A failure occurred during initialization
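To get a count of each message type for this one service rather than eyeballing the log, something like the Python sketch below could be used. It is only a rough sketch: the regular expressions are my assumption about the SERVICE ALERT line layout based on the lines above, and the names summarize and SAMPLE are made up for illustration. As far as I know, Nagios expects plugin exit codes 0 to 3, which is why 66 and 82 are reported as out of bounds.

```python
import re
from collections import Counter

# Assumed layout: SERVICE ALERT: host;service;state;type;attempt;output
ALERT_RE = re.compile(
    r"SERVICE ALERT: (?P<host>[^;]+);(?P<service>[^;]+);"
    r"(?P<state>[^;]+);(?P<type>[^;]+);(?P<attempt>\d+);(?P<output>.*)$"
)
RC_RE = re.compile(r"Return code of (\d+)")


def summarize(lines, host, service):
    """Count alert outputs for one host/service pair.

    Out-of-bounds alerts are grouped by return code; everything else
    is grouped by the plugin output text.
    """
    counts = Counter()
    for line in lines:
        m = ALERT_RE.search(line)
        if not m or m["host"] != host or m["service"] != service:
            continue
        rc = RC_RE.search(m["output"])
        key = f"return code {rc.group(1)}" if rc else m["output"].rstrip(". ")
        counts[key] += 1
    return counts


# A few lines copied from the log excerpt above.
SAMPLE = [
    "Jan 28 21:11:21 ch-nagios-p02 nagios: SERVICE ALERT: CH-SHPNT-D50;Witness Share - Cluster Name: CH-SQL-D02;WARNING;SOFT;1;Thread failed to start.",
    "Jan 28 21:12:30 ch-nagios-p02 nagios: SERVICE ALERT: CH-SHPNT-D50;Witness Share - Cluster Name: CH-SQL-D02;WARNING;SOFT;2;Thread failed to start.",
    "Jan 29 02:13:09 ch-nagios-p02 nagios: SERVICE ALERT: CH-SHPNT-D50;Witness Share - Cluster Name: CH-SQL-D02;CRITICAL;SOFT;1;(Return code of 66 for service 'Witness Share - Cluster Name: CH-SQL-D02' on host 'CH-SHPNT-D50' was out of bounds)",
    "Jan 31 01:25:20 ch-nagios-p02 nagios: SERVICE ALERT: EG-RPTDB-P03;CPU Usage;CRITICAL;SOFT;1;CRITICAL: Percent was 97.53 %",
]

if __name__ == "__main__":
    result = summarize(SAMPLE, "CH-SHPNT-D50", "Witness Share - Cluster Name: CH-SQL-D02")
    for message, count in result.most_common():
        print(count, message)
```

To run it against the real log, pass the lines of /var/log/messages to summarize instead of SAMPLE.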