Max concurrent service checks (50) has been reached
Posted: Tue Oct 06, 2020 2:11 am
I am seeing high load and CPU usage on my Nagios host, although no changes have been made recently.
When I checked /var/log/messages, I found the following entries:
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadxenbf205-idrac:idrac_snmp_power_supply by 13 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadxenbf211-idrac:idrac_snmp_power_unit by 10 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging localhost:Service Status - crond by 5 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging csitstrh7:Memory Usage by 9 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadxenbf208-idrac:idrac_snmp_sensor by 15 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadvcbf001:Datastore Usage - malesx_vol24 by 11 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadvcbf001:Datastore Usage - malesx_vol02 by 7 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging gvubxe03_prod_core_2019:Ping by 5 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadxenbf209-idrac:idrac_snmp_vdisk by 7 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadxenbf206-idrac:idrac_snmp_sensor by 12 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging gvujss01_prod_core_2019:Ping by 14 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadxenbf208-idrac:idrac_snmp_fan by 8 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging gvubxe01_prod_core_2019:CPU Stats by 11 seconds...
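To see which hosts are postponed most often, the "Nudging" lines can be tallied. Below is a minimal Python sketch that assumes the syslog format shown above. `summarize_nudges` is a hypothetical helper, not part of Nagios. The "#011" in each line is rsyslog's escaped tab character, and the regex ignores it.

```python
import re
import sys
from collections import Counter

# Matches the syslog lines Nagios writes when it postpones ("nudges") a check.
# The "#011" prefix in /var/log/messages is rsyslog's escaped tab; because
# re.search() is used, anything before "Max concurrent" is simply ignored.
NUDGE_RE = re.compile(
    r"Max concurrent service checks \((?P<limit>\d+)\) has been reached\. "
    r"Nudging (?P<host>[^:]+):(?P<service>.+?) by (?P<delay>\d+) seconds"
)


def summarize_nudges(lines):
    """Return (limit, nudges per host, total seconds of delay added)."""
    per_host = Counter()
    total_delay = 0
    limit = None
    for line in lines:
        match = NUDGE_RE.search(line)
        if match is None:
            continue  # not a "Nudging" line
        limit = int(match.group("limit"))
        per_host[match.group("host")] += 1
        total_delay += int(match.group("delay"))
    return limit, per_host, total_delay


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 nudges.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        limit, per_host, total = summarize_nudges(log)
    print(f"limit={limit} nudged={sum(per_host.values())} delay={total}s")
    for host, count in per_host.most_common(10):
        print(f"{count:6d}  {host}")
```

If a handful of hosts dominate the tally, that would suggest where the 50 check slots are going.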
Looking at npcd.log, I found the following errors:
[10-06-2020 02:58:31] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1601967260.perfdata.service'
[10-06-2020 02:58:46] NPCD: WARN: MAX load reached: load 53.670000/20.000000 at i=0
[10-06-2020 02:59:01] NPCD: WARN: MAX load reached: load 42.670000/20.000000 at i=1
[10-06-2020 02:59:16] NPCD: WARN: MAX load reached: load 34.920000/20.000000 at i=1
[10-06-2020 02:59:31] NPCD: WARN: MAX load reached: load 27.780000/20.000000 at i=1
[10-06-2020 02:59:46] NPCD: WARN: MAX load reached: load 22.290000/20.000000 at i=1
[10-06-2020 03:00:46] NPCD: ERROR: Executed command exits with return code '7'
[10-06-2020 03:00:46] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1601967618.perfdata.service'
[10-06-2020 03:01:21] NPCD: ERROR: Executed command exits with return code '7'
[10-06-2020 03:01:21] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1601967646.perfdata.service'
[10-06-2020 03:01:27] NPCD: ERROR: Executed command exits with return code '7'
[10-06-2020 03:01:27] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1601967641.perfdata.service'
[10-06-2020 03:04:49] NPCD: ERROR: Executed command exits with return code '7'
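The NPCD warnings can be pulled out in a similar way and lined up against the Nagios timestamps. Below is a minimal Python sketch that assumes the line formats in this excerpt. `parse_npcd` is a hypothetical helper. Reading the number after the slash as NPCD's configured load ceiling is an assumption based on these log lines.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Formats copied from the npcd.log excerpt. The first number after "load" is
# the current load; the number after "/" is ASSUMED to be NPCD's configured
# ceiling (20.0 in this excerpt).
LOAD_RE = re.compile(
    r"^\[(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] NPCD: WARN: "
    r"MAX load reached: load (?P<load>[\d.]+)/(?P<ceiling>[\d.]+)"
)
RC_RE = re.compile(r"Executed command exits with return code '(?P<rc>\d+)'")


def parse_npcd(lines):
    """Return ([(timestamp, load, ceiling), ...], Counter of exit codes)."""
    loads = []
    exit_codes = Counter()
    for line in lines:
        match = LOAD_RE.search(line)
        if match:
            # npcd.log dates are MM-DD-YYYY (10-06-2020 is 6 Oct 2020).
            ts = datetime.strptime(match.group("ts"), "%m-%d-%Y %H:%M:%S")
            loads.append((ts, float(match.group("load")), float(match.group("ceiling"))))
            continue
        match = RC_RE.search(line)
        if match:
            exit_codes[int(match.group("rc"))] += 1
    return loads, exit_codes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name and path): python3 npcd_scan.py /path/to/npcd.log
    with open(sys.argv[1], errors="replace") as log:
        loads, exit_codes = parse_npcd(log)
    for ts, load, ceiling in loads:
        print(f"{ts:%Y-%m-%d %H:%M:%S}  load {load:6.2f}  ceiling {ceiling:.1f}")
    print("process_perfdata.pl exit codes:", dict(exit_codes))
```

In this excerpt the load drops from 53.67 at 02:58:46 to 22.29 at 02:59:46 but stays above 20.0 throughout. After that, the return code '7' failures begin.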