Slowness troubleshooting --> 5.4.11 to 5.4.13.

emartine
Posts: 660
Joined: Thu Dec 29, 2011 10:47 am

Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.

Post by emartine »

On the production server, I've received multiple complaints that changes take about 2 minutes to apply and that "while those changes are taking effect Nagios isn't checking anything." I'm not sure whether the second statement is true.


At this time I see:

ps -ef | grep nagios.cfg
root 34212 32642 0 01:31 pts/0 00:00:00 grep nagios.cfg
nagios 51312 1 5 May20 ? 00:33:59 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 51457 51312 0 May20 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg


At this time we are monitoring 2,099 hosts and 17,410 services.



                               Min        Max         Avg
Host Check Latency             0.00 sec   1.13 sec    0.01 sec
Host Check Execution Time      0.00 sec   10.00 sec   0.06 sec
Service Check Latency          0.00 sec   1.17 sec    0.02 sec
Service Check Execution Time   0.00 sec   60.01 sec   0.14 sec



                         1-min    5-min    15-min
Active Host Checks       574      1,922    1,923
Passive Host Checks      0        0        0
Active Service Checks    5,183    15,468   15,470
Passive Service Checks   0        10       10

Metric      Value
Load        1-min 1.15, 5-min 1.11, 15-min 1.14
CPU Stats   User 2.82%, Nice 0.00%, System 1.18%, I/O Wait 0.01%, Steal 0.00%, Idle 95.99%
Memory      Total 129,022 MB, Used 115,618 MB, Free 13,404 MB, Shared 33 MB, Buffers 298 MB, Cached 85,341 MB
Swap        Total 4,094 MB, Used 95 MB, Free 3,999 MB



My top CPU resource users:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26939 mysql 20 0 9.8g 301m 3820 S 69.4 0.2 118658:49 mysqld
33326 apache 20 0 491m 68m 8736 R 69.4 0.1 14:37.10 httpd
9662 nagios 20 0 55988 7420 1060 S 17.8 0.0 5:22.99 ndo2db
9607 nagios 20 0 106m 90m 1336 S 8.6 0.1 0:54.38 nagios
62167 apache 20 0 492m 68m 8744 S 5.9 0.1 11:05.51 httpd
35899 apache 20 0 475m 52m 8332 S 5.6 0.0 0:23.72 httpd


I know there are two performance tuning documents:

https://assets.nagios.com/downloads/nag ... ios-XI.pdf
https://assets.nagios.com/downloads/nag ... zation.pdf

After reading these I'm still trying to figure out what the best option is.
Any advice?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.

Post by scottwilkerson »

Previously we had increased the time following this doc:
https://support.nagios.com/kb/article.php?id=172

For faster restarts, it would be better to set this back down to about 20 seconds:

for i in {1..20} ; do

In the Maximizing Performance doc you listed, there are two things that will boost performance the most:
https://assets.nagios.com/downloads/nag ... giosXI.pdf
and
https://assets.nagios.com/downloads/nag ... Server.pdf
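
For readers unfamiliar with the `for i in {1..20}` line quoted above: it reads like the head of a bounded wait loop, where the upper bound caps how long the script waits. The thread does not show the loop body, so the following is only a sketch of that general pattern, assuming one attempt per second. The helper name `wait_for` and the `WAIT_STEP` variable are hypothetical and are not taken from any Nagios XI script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from Nagios XI): retry a command up to max_tries
# times, sleeping WAIT_STEP seconds (default 1) after each failed attempt.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_for() {
    local max_tries=$1
    shift
    local i
    for ((i = 1; i <= max_tries; i++)); do
        if "$@"; then
            return 0
        fi
        sleep "${WAIT_STEP:-1}"
    done
    return 1
}

# Example: wait up to roughly 20 seconds for a (hypothetical) marker file.
# wait_for 20 test -e /tmp/example.ready
```

With a pattern like this, lowering the upper bound from a larger value back to 20 shortens the worst-case wait, which would be consistent with the faster restarts described above.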
Former Nagios employee
emartine
Posts: 660
Joined: Thu Dec 29, 2011 10:47 am

Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.

Post by emartine »

It is running on a 1TB Fusion-io card. Before I start messing with disk and offloading MySQL, I first want to try:

1) Increasing the check interval for test servers from 3 min to 7 min.
2) I will also make the time change to 20 seconds as suggested.
3) Set check_result_reaper_frequency=3 and max_check_result_reaper_time=10.
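
Before changing the two reaper settings in step 3, it can help to record what they are currently set to. A minimal sketch, assuming the main config file is the one shown in the ps output earlier in this thread (/usr/local/nagios/etc/nagios.cfg); the helper name `cfg_value` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of an uncommented key=value directive
# in a Nagios-style config file. Prints nothing if the directive is not set.
# Simplification: if the directive appears more than once, only the last
# matching line is shown.
cfg_value() {
    local key=$1 file=$2
    grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2-
}

# Example usage (path taken from the ps output in this thread):
# cfg_value check_result_reaper_frequency /usr/local/nagios/etc/nagios.cfg
# cfg_value max_check_result_reaper_time /usr/local/nagios/etc/nagios.cfg
```

Commented-out lines (starting with `#`) are ignored because the pattern is anchored to the start of the line.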
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Slowness troubleshooting --> 5.4.11 to 5.4.13.

Post by scottwilkerson »

emartine wrote:
It is running on a 1TB Fusion-io card. Before I start messing with disk and offloading MySQL, I first want to try:

1) Increasing the check interval for test servers from 3 min to 7 min.
2) I will also make the time change to 20 seconds as suggested.
3) Set check_result_reaper_frequency=3 and max_check_result_reaper_time=10.

These all sound like logical steps. #1 will make a big difference system-wide.
#2 will make Apply Configuration happen much faster.
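
To put a rough number on why #1 matters: the average check rate for a group of services is the service count divided by the check interval, so moving from 3 to 7 minutes cuts that group's load to 3/7 of what it was. The sketch below works this through for a made-up figure of 6000 services on test servers; the thread does not say how many services the test servers actually have, and `checks_per_min` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Average checks per minute for a group of services that all share one
# check interval (in minutes). Ignores retries and scheduling jitter.
checks_per_min() {
    local services=$1 interval_min=$2
    awk -v s="$services" -v i="$interval_min" 'BEGIN { printf "%.0f\n", s / i }'
}

# Hypothetical example: 6000 services on test servers.
# checks_per_min 6000 3   # prints 2000
# checks_per_min 6000 7   # prints 857
```

For comparison, the stats posted earlier show about 15,468 active service checks per 5 minutes, or roughly 3,094 per minute across the whole system.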