Total number of concurrent checks exceeds 15, aborting!

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
bsivavani
Posts: 339
Joined: Tue Oct 06, 2015 9:17 am

Total number of concurrent checks exceeds 15, aborting!

Post by bsivavani »

Hi Support,

We are receiving the error below in production, and it is impacting monitoring: we get a flood of alerts within a minute.

Total number of concurrent checks exceeds 15, aborting!

We followed the technote below and increased the concurrent checks limit to 20, but no luck.

https://support.nagios.com/forum/viewto ... 3&p=108791

Please suggest how to proceed on this.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Total number of concurrent checks exceeds 15, aborting!

Post by rkennedy »

I know you mentioned increasing it, but what is the result once you increase the checks to 20? --concurrent_checks 20
Former Nagios Employee
bsivavani
Posts: 339
Joined: Tue Oct 06, 2015 9:17 am

Re: Total number of concurrent checks exceeds 15, aborting!

Post by bsivavani »

After the change, we receive the error below:

Total number of concurrent checks exceeds 20, aborting!
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Total number of concurrent checks exceeds 15, aborting!

Post by rkennedy »

How many checks do you have in your environment running against VMware? What happens if concurrent_checks is increased to, say, 50?
Former Nagios Employee
bsivavani
Posts: 339
Joined: Tue Oct 06, 2015 9:17 am

Re: Total number of concurrent checks exceeds 15, aborting!

Post by bsivavani »

rkennedy wrote: How many checks do you have in your environment running against VMware? What happens if concurrent_checks is increased to, say, 50?

Is there any way to find the total number of checks running against VMware?
bsivavani
Posts: 339
Joined: Tue Oct 06, 2015 9:17 am

Re: Total number of concurrent checks exceeds 15, aborting!

Post by bsivavani »

Below is the vMA server information.

Please find the top command output attached.

------------------------------------------------------------
Free memory
vi-admin@aushosvmaprd00:~> free -m
             total       used       free     shared    buffers     cached
Mem:         12041        705      11336          0         12         81
-/+ buffers/cache:        610      11430
Swap:          133         89         44
vi-admin@aushosvmaprd00:~>
---------------------------------------------------------

---------------------------
Disk usage
vi-admin@aushosvmaprd00:~> df -kh
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       2.7G  1.5G  1.1G  59% /
udev            5.9G   88K  5.9G   1% /dev
tmpfs           5.9G     0  5.9G   0% /dev/shm
/dev/sda1       128M   37M   85M  31% /boot
vi-admin@aushosvmaprd00:~>
------------------------


This is impacting production on all servers. Please suggest next steps.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Total number of concurrent checks exceeds 15, aborting!

Post by rkennedy »

I believe you may have quite a few checks running against your VMware environment, and that is the underlying issue. The script is aborting because the number of checks running concurrently exceeds the configured limit. As I said in my previous post, you'll need to try increasing the limit to a higher number.

What is the result of -

ps -ef | grep '[b]ox293_check_vm' | wc -l
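
A single count is only a snapshot and can miss short bursts of checks. As a rough sketch (not part of the plugin or of Nagios XI), the helpers below sample the process list several times and print the highest count seen. The function names count_checks and peak_checks are made up for illustration, and the sketch assumes the plugin processes carry the string box293_check_vm in their command line, as in the command above.

```shell
#!/bin/sh
# Hypothetical helpers; the names count_checks and peak_checks are illustrative only.

count_checks() {
    # $1: substring to look for. Reads `ps -ef` style lines on stdin and prints
    # how many contain it, skipping lines that mention grep or awk so the
    # helper processes themselves are not counted.
    awk -v pat="$1" 'index($0, pat) && !/grep|awk/ { n++ } END { print n + 0 }'
}

peak_checks() {
    # $1: substring, $2: number of samples, $3: seconds between samples.
    # Prints the highest count seen across all samples.
    peak=0
    i=0
    while [ "$i" -lt "$2" ]; do
        now=$(ps -ef | count_checks "$1")
        if [ "$now" -gt "$peak" ]; then peak=$now; fi
        i=$((i + 1))
        if [ "$i" -lt "$2" ]; then sleep "$3"; fi
    done
    echo "$peak"
}
```

Calling peak_checks box293_check_vm 12 5 would sample for roughly a minute; comparing the printed peak with the configured limit gives a better idea of how far the limit needs to be raised.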
Former Nagios Employee
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Total number of concurrent checks exceeds 15, aborting!

Post by WillemDH »

OK, just a few thoughts that could help you in the long run:

Do you monitor your vMA servers? How many vMA servers do you have, and how many CPUs does each one have? If you are not yet monitoring your vMA servers, please start doing so: install the NRPE agent on them. You might consider using this plugin: https://exchange.nagios.org/directory/P ... ss/details to monitor the process count, average CPU usage and memory usage of the box293_check_vm process.
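
For a quick manual look on the vMA itself, before any agent is installed, something like the following sketch could give the same three figures. It is only an illustration under assumptions: summarize_procs is a made-up helper name, and the match is on the string box293_check_vm appearing in the process command line.

```shell
#!/bin/sh
# Hypothetical helper, not the Exchange plugin mentioned above.

summarize_procs() {
    # $1: substring to look for in the command line.
    # stdin: lines of "<%cpu> <rss in KB> <command line>", as printed by
    #        ps -e -o pcpu= -o rss= -o args=
    # Prints the process count, summed %CPU and summed resident memory in KB.
    # Lines mentioning awk are skipped so this helper does not count itself.
    awk -v pat="$1" '
        index($0, pat) && !/awk/ { n++; cpu += $1; rss += $2 }
        END { printf "count=%d cpu_total=%.1f rss_kb=%d\n", n, cpu, rss }'
}
```

A typical call would be: ps -e -o pcpu= -o rss= -o args= | summarize_procs box293_check_vm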

You might consider deactivating a bunch of the services that make use of your vMA. Start by deactivating 50% of them. Then it might be a good idea to reboot the vMA and watch whether the load calms down. If everything seems to work OK, you can reactivate another 25% of your services and check again. It might be necessary to add more CPUs or even set up another vMA server to split the load (that's what I had to do).

Also, if you are making use of the overall datastore performance checks, deactivate these first, as they generate a lot of load.

EDIT: Did you add 'nice' in your commands?
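
To illustrate what is meant by 'nice': the standard nice utility starts a command at a lower CPU priority. A minimal sketch, assuming a wrapper function called run_niced and a tunable called NICE_LEVEL (both names are made up here, they are not Nagios or plugin options):

```shell
#!/bin/sh
# Hypothetical wrapper: run any check command at reduced CPU priority.
# NICE_LEVEL is an assumed tunable for this sketch; 10 is an arbitrary default.
NICE_LEVEL="${NICE_LEVEL:-10}"

run_niced() {
    # nice -n adds NICE_LEVEL to the current niceness and then runs the command.
    # The exit status of the command (the Nagios state code) is passed through.
    nice -n "$NICE_LEVEL" "$@"
}
```

In a Nagios command definition the same effect comes from putting nice -n 10 in front of the existing command line, leaving the plugin path and its arguments exactly as they are.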

I hope this helps! Grtz
Nagios XI 5.8.1
https://outsideit.net
bsivavani
Posts: 339
Joined: Tue Oct 06, 2015 9:17 am

Re: Total number of concurrent checks exceeds 15, aborting!

Post by bsivavani »

Thank you. We have the concerns below.

1. How many checks can one vMA server handle?

2. We have more than 100 ESXi servers still to be added to Nagios, with datastore and storage adapter performance monitoring as well, all through one vMA.

3. We are using one vMA server for on-boarding 100 servers (1000 services, if we take an average of 10 per server). Please suggest whether we can proceed with this.

4. Please give us an example command with nice.

Please find the CPU information from the vMA server below:
vi-admin@aushosvmaprd00:~> lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Thread(s) per core: 1
Core(s) per socket: 3
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 26
Stepping: 4
CPU MHz: 2793.000
BogoMIPS: 5586.00
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0-5
vi-admin@aushosvmaprd00:~>
bsivavani
Posts: 339
Joined: Tue Oct 06, 2015 9:17 am

Re: Total number of concurrent checks exceeds 15, aborting!

Post by bsivavani »

We are receiving a flood of alerts for CPU Usage and Memory as well.

Shall we increase concurrent checks to 50 for those services?