I am seeing high load and CPU usage on my Nagios host, even though no changes have been made recently.
When I checked /var/log/messages, I found the following entries:
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadxenbf205-idrac:idrac_snmp_power_supply by 13 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadxenbf211-idrac:idrac_snmp_power_unit by 10 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging localhost:Service Status - crond by 5 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging csitstrh7:Memory Usage by 9 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadxenbf208-idrac:idrac_snmp_sensor by 15 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadvcbf001:Datastore Usage - malesx_vol24 by 11 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadvcbf001:Datastore Usage - malesx_vol02 by 7 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging gvubxe03_prod_core_2019:Ping by 5 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadxenbf209-idrac:idrac_snmp_vdisk by 7 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadxenbf206-idrac:idrac_snmp_sensor by 12 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging gvujss01_prod_core_2019:Ping by 14 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadxenbf208-idrac:idrac_snmp_fan by 8 seconds...
Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging gvubxe01_prod_core_2019:CPU Stats by 11 seconds...
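These messages mean Nagios already has its maximum number of service checks running (the limit of 50, presumably set by the `max_concurrent_checks` directive in nagios.cfg), so it postpones ("nudges") each additional check by a few seconds instead of starting it. The leading `#011` is rsyslog's escape sequence for a tab character. To see which hosts are affected most, the entries can be tallied. The Python sketch below is a hypothetical helper, not a Nagios tool, and its regular expression assumes exactly the message format quoted above.

```python
import re
from collections import Counter

# Hypothetical helper (not part of Nagios): tallies "Nudging host:service by N seconds"
# messages. The "#011" prefix (an escaped tab) is simply skipped because re.search
# looks for the pattern anywhere in the line.
NUDGE_RE = re.compile(
    r"Max concurrent service checks \((?P<limit>\d+)\) has been reached\. "
    r"Nudging (?P<host>[^:]+):(?P<service>.+) by (?P<delay>\d+) seconds"
)

def summarize_nudges(lines):
    """Return the reported limit, nudge counts per host, and the nudge delays (seconds)."""
    per_host = Counter()
    delays = []
    limit = None
    for line in lines:
        match = NUDGE_RE.search(line)
        if match is None:
            continue  # ignore unrelated syslog entries
        limit = int(match.group("limit"))
        per_host[match.group("host")] += 1
        delays.append(int(match.group("delay")))
    return {"limit": limit, "per_host": per_host, "delays": delays}

# A few of the entries quoted above, used as sample input.
NUDGE_SAMPLE = [
    "Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging localhost:Service Status - crond by 5 seconds...",
    "Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadvcbf001:Datastore Usage - malesx_vol24 by 11 seconds...",
    "Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging iadvcbf001:Datastore Usage - malesx_vol02 by 7 seconds...",
    "Oct 6 03:04:17 cvrmnagiosxi001 nagios: #011Max concurrent service checks (50) has been reached. Nudging gvubxe03_prod_core_2019:Ping by 5 seconds...",
]

if __name__ == "__main__":
    summary = summarize_nudges(NUDGE_SAMPLE)
    print(summary["limit"])                    # 50
    print(summary["per_host"].most_common(1))  # [('iadvcbf001', 2)]
    print(summary["delays"])                   # [5, 11, 7, 5]
```

To run it against the full log, pass an open file instead of the sample list, for example inside `with open("/var/log/messages") as f: summarize_nudges(f)`.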
In npcd.log I can see the following errors:
[10-06-2020 02:58:31] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1601967260.perfdata.service'
[10-06-2020 02:58:46] NPCD: WARN: MAX load reached: load 53.670000/20.000000 at i=0
[10-06-2020 02:59:01] NPCD: WARN: MAX load reached: load 42.670000/20.000000 at i=1
[10-06-2020 02:59:16] NPCD: WARN: MAX load reached: load 34.920000/20.000000 at i=1
[10-06-2020 02:59:31] NPCD: WARN: MAX load reached: load 27.780000/20.000000 at i=1
[10-06-2020 02:59:46] NPCD: WARN: MAX load reached: load 22.290000/20.000000 at i=1
[10-06-2020 03:00:46] NPCD: ERROR: Executed command exits with return code '7'
[10-06-2020 03:00:46] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1601967618.perfdata.service'
[10-06-2020 03:01:21] NPCD: ERROR: Executed command exits with return code '7'
[10-06-2020 03:01:21] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1601967646.perfdata.service'
[10-06-2020 03:01:27] NPCD: ERROR: Executed command exits with return code '7'
[10-06-2020 03:01:27] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1601967641.perfdata.service'
[10-06-2020 03:04:49] NPCD: ERROR: Executed command exits with return code '7'
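The WARN lines appear to show NPCD holding off on perfdata processing because the system load (53.67 at first, falling to 22.29) is above its configured ceiling of 20, presumably the `load_threshold` setting in npcd.cfg, while the ERROR lines show process_perfdata.pl exiting with return code 7. The Python sketch below is another hypothetical helper that assumes exactly the log format shown; it pulls both kinds of entries out of npcd.log so the load trend and the failure count can be lined up against the Nagios nudge messages.

```python
import re
from collections import Counter

# Hypothetical helper (not part of NPCD/PNP4Nagios): summarizes the two kinds of
# npcd.log entries quoted above.
LOAD_RE = re.compile(r"MAX load reached: load (?P<load>[\d.]+)/(?P<threshold>[\d.]+)")
RC_RE = re.compile(r"Executed command exits with return code '(?P<rc>\d+)'")

def summarize_npcd(lines):
    """Return the load threshold, the loads reported above it, and exit-code counts."""
    threshold = None
    loads = []
    return_codes = Counter()
    for line in lines:
        match = LOAD_RE.search(line)
        if match:
            loads.append(float(match.group("load")))
            threshold = float(match.group("threshold"))
            continue
        match = RC_RE.search(line)
        if match:
            return_codes[int(match.group("rc"))] += 1
    return {"threshold": threshold, "loads": loads, "return_codes": return_codes}

# Some of the entries quoted above; "Command line was" lines match neither pattern.
NPCD_SAMPLE = [
    "[10-06-2020 02:58:46] NPCD: WARN: MAX load reached: load 53.670000/20.000000 at i=0",
    "[10-06-2020 02:59:46] NPCD: WARN: MAX load reached: load 22.290000/20.000000 at i=1",
    "[10-06-2020 03:00:46] NPCD: ERROR: Executed command exits with return code '7'",
    "[10-06-2020 03:00:46] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1601967618.perfdata.service'",
    "[10-06-2020 03:01:21] NPCD: ERROR: Executed command exits with return code '7'",
]

if __name__ == "__main__":
    summary = summarize_npcd(NPCD_SAMPLE)
    print(summary["threshold"])           # 20.0
    print(max(summary["loads"]))          # 53.67
    print(dict(summary["return_codes"]))  # {7: 2}
```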
Max concurrent service checks (50) has been reached
IT-OPS-SYS (Posts: 184, Joined: Sun Jan 07, 2018 12:56 pm)
Re: Max concurrent service checks (50) has been reached
This is being addressed in a support ticket, so we'll continue troubleshooting there and lock this thread.