In need of some feedback and understanding...
I'm running Nagios XI (5.8.6) on RHEL 7.9 and I'm seeing several messages from a service check that uses the NCPA agent on a Windows 2008 R2 server to execute a Windows PowerShell script. The script checks whether a UNC share is inaccessible and triggers an alert if so. Since 01/20/22, we have seen these messages 5-6 times. I ran a report, and before that this exact service check had not failed, notified, or alerted since 10/01/21. I've asked the Windows admin to check the health of the server and apparently all is well.
What I'm looking for here is whether anyone has encountered these types of messages and what was done about them. I would like to understand what they mean and to have concrete evidence that this is not directly a Nagios XI problem. The messages are listed below. I've added --verbose to the service check, but it doesn't display much more output.
Lastly, I'm wondering whether this is caused by resource contention on the XI system, since I have recently seen the following message as well. I understand what the message states, but my impression is that if we were hitting the limit, many more checks would be having execution issues.
Jan 31 13:33:23 ch-nagios-p02 nagios: WARNING: RLIMIT_NPROC is 31192, total max estimated processes is 33770! You should increase your limits (ulimit -u, or limits.conf)
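One way to sanity-check that warning is to compare the process limit with the number of processes and threads actually in use. Below is a rough Python sketch for Linux. It assumes it is run as the nagios user, so that the limit it reads is the one that applies to the Nagios daemon. The helper names are made up for illustration and are not part of Nagios XI.

```python
import os
import resource

def nproc_limit():
    """Return the (soft, hard) RLIMIT_NPROC values for the current process."""
    return resource.getrlimit(resource.RLIMIT_NPROC)

def count_tasks(uid):
    """Count threads owned by uid by walking /proc (Linux only)."""
    if not os.path.isdir("/proc"):
        return 0
    total = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            if os.stat(f"/proc/{entry}").st_uid == uid:
                # Each entry under task/ is one thread of that process.
                total += len(os.listdir(f"/proc/{entry}/task"))
        except OSError:
            # The process exited (or is unreadable) while we were scanning.
            continue
    return total

soft, hard = nproc_limit()
used = count_tasks(os.getuid())
print(f"soft limit={soft} hard limit={hard} tasks in use={used}")
```

If the count in use stays far below the soft limit while the checks fail, the limit is probably not what is causing the failures.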
1) Thread failed to start
2) Return code of 66
3) Return code of 82
4) Out of bounds
5) The shell cannot be started. A failure occurred during initialization:
/var/log/messages:
Jan 28 21:11:21 ch-nagios-p02 nagios: SERVICE ALERT: CH-SHPNT-D50;Witness Share - Cluster Name: CH-SQL-D02;WARNING;SOFT;1;Thread failed to start.
Jan 28 21:12:30 ch-nagios-p02 nagios: SERVICE ALERT: CH-SHPNT-D50;Witness Share - Cluster Name: CH-SQL-D02;WARNING;SOFT;2;Thread failed to start.
Jan 28 21:13:36 ch-nagios-p02 nagios: SERVICE ALERT: CH-SHPNT-D50;Witness Share - Cluster Name: CH-SQL-D02;WARNING;HARD;3;Thread failed to start.
Jan 28 21:15:44 ch-nagios-p02 nagios: SERVICE ALERT: CH-SHPNT-D50;Witness Share - Cluster Name: CH-SQL-D02;OK;HARD;3;OK: UNC Share is accessible
Jan 29 02:13:09 ch-nagios-p02 nagios: Warning: Return code of 66 for check of service 'Witness Share - Cluster Name: CH-SQL-D02' on host 'CH-SHPNT-D50' was out of bounds.
Jan 29 02:13:09 ch-nagios-p02 nagios: SERVICE ALERT: CH-SHPNT-D50;Witness Share - Cluster Name: CH-SQL-D02;CRITICAL;SOFT;1;(Return code of 66 for service 'Witness Share - Cluster Name: CH-SQL-D02' on host 'CH-SHPNT-D50' was out of bounds)
Jan 31 01:25:20 ch-nagios-p02 nagios: SERVICE ALERT: EG-RPTDB-P03;CPU Usage;CRITICAL;SOFT;1;CRITICAL: Percent was 97.53 %
Jan 31 01:25:34 ch-nagios-p02 nagios: Warning: Return code of 82 for check of service 'Witness Share - Cluster Name: CH-SQL-D02' on host 'CH-SHPNT-D50' was out of bounds.
Jan 31 01:25:34 ch-nagios-p02 nagios: SERVICE ALERT: CH-SHPNT-D50;Witness Share - Cluster Name: CH-SQL-D02;CRITICAL;SOFT;1;(Return code of 82 for service 'Witness Share - Cluster Name: CH-SQL-D02' on host 'CH-SHPNT-D50' was out of bounds)
Jan 31 12:02:03 ch-nagios-p02 nagios: SERVICE ALERT: CH-SHPNT-D50;Witness Share - Cluster Name: CH-SQL-D02;OK;SOFT;2;The shell cannot be started. A failure occurred during initialization
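For context on the "out of bounds" warnings: Nagios only accepts plugin exit codes 0 through 3 (OK, WARNING, CRITICAL, UNKNOWN), so a code such as 66 or 82 is flagged as out of bounds. Below is a minimal Python sketch that pulls those codes out of syslog, assuming the lines follow the format shown above. The function name and regex are illustrative only.

```python
import re

# Standard Nagios plugin exit codes; anything else is "out of bounds".
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# Matches the warning format seen in the log excerpt above.
OUT_OF_BOUNDS = re.compile(
    r"Return code of (?P<code>\d+) for check of service '(?P<service>[^']+)' "
    r"on host '(?P<host>[^']+)' was out of bounds"
)

def find_out_of_bounds(lines):
    """Return (host, service, code) for each out-of-bounds warning line."""
    hits = []
    for line in lines:
        match = OUT_OF_BOUNDS.search(line)
        if match:
            hits.append((match["host"], match["service"], int(match["code"])))
    return hits

sample = [
    "Jan 29 02:13:09 ch-nagios-p02 nagios: Warning: Return code of 66 for check "
    "of service 'Witness Share - Cluster Name: CH-SQL-D02' on host 'CH-SHPNT-D50' "
    "was out of bounds.",
    "Jan 31 01:25:20 ch-nagios-p02 nagios: SERVICE ALERT: EG-RPTDB-P03;CPU Usage;"
    "CRITICAL;SOFT;1;CRITICAL: Percent was 97.53 %",
]
for host, service, code in find_out_of_bounds(sample):
    print(host, service, code, NAGIOS_STATES.get(code, "out of bounds"))
```

Running this against /var/log/messages would show whether the unexpected codes are limited to this one service or appear for other NCPA checks as well.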
Various Messages spawning alerts on NCPA/Plugin execution
-
jstormshak
- Posts: 27
- Joined: Mon May 04, 2020 11:41 am
Re: Various Messages spawning alerts on NCPA/Plugin executio
I found this here:
https://appuals.com/powershell-failure- ... alization/
I'm wondering whether switching NCPA to use the 64-bit PowerShell in the ncpa.cfg would help:
Code: Select all
# Since windows NCPA is 32-bit, if you need to use 64-bit powershell, try the following for
# the powershell plugin definition:
# .ps1 = c:\windows\sysnative\windowspowershell\v1.0\powershell.exe -ExecutionPolicy Bypass -File $plugin_name $plugin_args
#
# Linux / Mac OS X
.sh = /bin/sh $plugin_name $plugin_args
.py = python $plugin_name $plugin_args
# Windows
.ps1 = powershell -ExecutionPolicy Bypass -File $plugin_name $plugin_args
What is yours currently set to?
What plugin are you running? Please attach or link us to it so I can review it.
You can try following the Running a .NET Framework Repair Tool section on that page and see if that helps.
-
jstormshak
- Posts: 27
- Joined: Mon May 04, 2020 11:41 am
Re: Various Messages spawning alerts on NCPA/Plugin executio
Our NCPA configuration is currently exactly the same as what you posted in your response. The NCPA version is 2.3.1.
Re: Various Messages spawning alerts on NCPA/Plugin executio
You can try editing the ncpa.cfg file on the remote system and changing this:
Code: Select all
.ps1 = powershell -ExecutionPolicy Bypass -File $plugin_name $plugin_args
To this:
Code: Select all
#.ps1 = powershell -ExecutionPolicy Bypass -File $plugin_name $plugin_args
.ps1 = c:\windows\sysnative\windowspowershell\v1.0\powershell.exe -ExecutionPolicy Bypass -File $plugin_name $plugin_args
Then restart the NCPA services and test it again.
Or you can try repairing .NET via the link I sent.
Another option would be to upgrade NCPA to the latest version (2.4.0) to see if that helps.