nagios child process takes a long time to start

Locked
corkyman
Posts: 120
Joined: Wed Jul 13, 2016 12:58 pm

nagios child process takes a long time to start

Post by corkyman »

We are in the process of migrating from a locally running MySQL 5.5 to a remote enterprise MySQL 5.7 server. During testing we observed that the second nagios.cfg process takes a very long time to start, about 12-14 minutes. This happens when no nagios service processes are running and the command 'service nagios start' is issued. The command completes quickly and the first nagios.cfg process starts running. The second nagios.cfg process starts 12-14 minutes later. After that all states are green as expected, and Nagios starts processing checks and operating normally. When we fell back to our local MySQL and timed the startup of the second process, it took about 2 minutes.
What information can I collect to debug this issue besides looking at the db logs? What is happening when the second process starts?
The duration of this delay is comparable to the time it normally takes to run Apply Config or a database repair.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: nagios child process takes a long time to start

Post by rkennedy »

A few questions -
1. When you say enterprise MYSQL, what exactly are you referring to? I was able to find https://www.mysql.com/products/enterprise/ - but, this points to Oracle ultimately.
2. Is the database server hosted locally, or is it remote now? Just trying to grasp if there is any latency.
3. What sort of disks is the machine running?
4. Is it a physical machine, or a virtual machine? I've seen VMs come to a halt due to a poor hypervisor in the past / "rate limiting".
5. How many hosts / services do you have configured on the system? Generally a longer startup is due to a large number of objects. I've seen it take a few minutes for the child to spawn, but never over 10.
Former Nagios Employee
corkyman
Posts: 120
Joined: Wed Jul 13, 2016 12:58 pm

Re: nagios child process takes a long time to start

Post by corkyman »

1. By enterprise MySQL I mean an enterprise-grade MySQL system. It runs on a cluster of two physical servers, each having 32 cores.
2. The database is hosted remotely. We are testing this new remote configuration. Currently, the Nagios system uses a local MySQL db which is on the same virtual machine as Nagios. This VM has 6 cores.
3. The return from 'hdparm -I /dev/sda' command

Local db machine:

ATA device, with non-removable media
Standards:
Likely used: 1
Configuration:
Logical max current
cylinders 0 0
heads 0 0
sectors/track 0 0
--
Logical/Physical Sector size: 512 bytes
device size with M = 1024*1024: 0 MBytes
device size with M = 1000*1000: 0 MBytes
cache/buffer size = unknown
Capabilities:
IORDY not likely
Cannot perform double-word IO
R/W multiple sector transfer: not supported
DMA: not supported
PIO: pio0

Remote db machine:

ATA device, with non-removable media
Standards:
Likely used: 1
Configuration:
Logical max current
cylinders 0 0
heads 0 0
sectors/track 0 0
--
Logical/Physical Sector size: 512 bytes
device size with M = 1024*1024: 0 MBytes
device size with M = 1000*1000: 0 MBytes
cache/buffer size = unknown
Capabilities:
IORDY not likely
Cannot perform double-word IO
R/W multiple sector transfer: not supported
DMA: not supported
PIO: pio0

4. The remote db is a physical machine (nagios start up 12-14 min). The local db is a virtual machine (start up 2 min)
5. There are 7091 hosts and 80829 services defined.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: nagios child process takes a long time to start

Post by tgriep »

When you say the database is hosted remotely, can you describe the details on that?
Is it on the same network and what is the speed between the Nagios server and the MYSQL server?
Be sure to check out our Knowledgebase for helpful articles and solutions!
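
One rough way to put a number on the link between the Nagios server and the database is to time plain TCP connects to the MySQL port from the Nagios host. The following is a minimal sketch, not an official Nagios or MySQL tool: the host name `db.example.internal` is a placeholder, port 3306 is only the MySQL default and is assumed here, and the helper names are made up for illustration. It measures connection setup only, not query time or throughput. Keep the sample count small, since MySQL may count connections that close without a handshake as connection errors for the client host.

```python
import socket
import statistics
import time


def connect_times_ms(host, port, samples=10, timeout=5.0):
    """Return TCP connect times in milliseconds for `samples` attempts."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open and immediately close a TCP connection; no MySQL login is attempted.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        results.append((time.perf_counter() - start) * 1000.0)
    return results


def summarize(times_ms):
    """Return the min, median, and max of the collected samples."""
    return {
        "min": min(times_ms),
        "median": statistics.median(times_ms),
        "max": max(times_ms),
    }


if __name__ == "__main__":
    # Hypothetical target: replace with the real database host and port.
    stats = summarize(connect_times_ms("db.example.internal", 3306))
    print("connect ms: min={min:.2f} median={median:.2f} max={max:.2f}".format(**stats))
```

Running it once against the local database and once against the remote one gives two comparable sets of figures to hand to the network team.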
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: nagios child process takes a long time to start

Post by tmcdonald »

In addition, with 88k objects defined, the startup is going to be a bit slow as Nagios needs to update the remote DB with the retained status info. Normally I don't recommend more than about 30k checks on a single XI server at once without some performance tweaks. 88k is definitely beyond what we would normally recommend for an offloaded DB, since at a certain point (as @tgriep alluded to) the offloading of a DB actually does more harm than good if your network speeds are not up to par.
Former Nagios employee
corkyman
Posts: 120
Joined: Wed Jul 13, 2016 12:58 pm

Re: nagios child process takes a long time to start

Post by corkyman »

The remote db server is installed on a two-node cluster. Two real, 16 core servers. The connection to the cluster is set up through a VIP. It is on a different network, though. Here is the traceroute from my Nagios server to the VIP.
[c601018@vhlgnngxi001 ~]$ traceroute 10.7.65.130
traceroute to 10.7.65.130 (10.7.65.130), 30 hops max, 60 byte packets
1 MG-6004-HD3-1.tvlport.net (10.5.194.3) 0.399 ms MG-6004-HD3-2.tvlport.net (10.5.194.2) 0.493 ms 0.603 ms
2 MG-FW-VRRP.tvlport.net (10.5.192.193) 1.209 ms 1.467 ms 1.457 ms
3 AC-6509-HD3-1.tvlport.net (10.5.192.3) 1.305 ms 1.295 ms 1.415 ms
4 TF-FW-HSRP.tvlport.net (10.7.8.46) 1.266 ms 1.388 ms 1.380 ms
5 * TF-7018-HD3-1.tvlport.net (10.7.8.77) 4.530 ms 1.732 ms
6 10.7.8.203 (10.7.8.203) 1.554 ms 2.273 ms 2.620 ms
7 TB-7018-HD3-1.tvlport.net (10.7.8.157) 3.137 ms 2.355 ms 3.241 ms
8 mynagiosxidb.prod.tvlport.com (10.7.65.130) 2.482 ms 2.473 ms 2.447 ms
[c601018@vhlgnngxi001 ~]$

I am reaching out to my network guys for details and validation. Anything specific you need me to ask them?
I'd still like to understand what Nagios is doing between the start of the first nagios process and the start of the second, and how network speed affects that step.
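
A back-of-envelope estimate can help frame the network question. The sketch below assumes, and nothing in this thread confirms it, that the extra startup time is dominated by sequential database round trips. It uses only figures posted above: 7091 hosts, 80829 services, roughly 2.5 ms to the VIP on the last traceroute hop, and a startup of about 12 minutes remote (the low end of the range) against about 2 minutes local.

```python
HOSTS = 7091
SERVICES = 80829
RTT_MS = 2.5               # approximate last-hop time from the traceroute above
REMOTE_START_S = 12 * 60   # low end of the 12-14 minute range
LOCAL_START_S = 2 * 60     # startup time with the local database


def implied_round_trips(extra_seconds, rtt_ms):
    """Number of sequential round trips that would add up to extra_seconds."""
    return extra_seconds / (rtt_ms / 1000.0)


objects = HOSTS + SERVICES
extra_s = REMOTE_START_S - LOCAL_START_S
trips = implied_round_trips(extra_s, RTT_MS)

print("objects:", objects)
print("extra startup seconds:", extra_s)
print("implied round trips:", round(trips))
print("round trips per object:", round(trips / objects, 2))
```

Under that assumption, about 240,000 round trips, or roughly 2.7 per object, would account for the extra 10 minutes. This is an order-of-magnitude check, not a diagnosis; database logs or a packet capture during startup would be needed to confirm what is actually being sent.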
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: nagios child process takes a long time to start

Post by tgriep »

The MySQL server should be on the same network as the XI server, both to get the highest speed between the two and to minimize any connection outages that could cause MySQL table corruption.