
NagiosXI-5.4.2 performance Issue

Posted: Thu Feb 09, 2017 6:08 am
by vinish098
Dear Team,

We are monitoring 630 hosts with 6500 services.

Server configuration:
RAM: 24 GB
CPU cores: 12
HDD size: 550 GB

Whenever I make any configuration change to an existing host, "Apply Configuration" takes 3-4 hours to complete.
Attached screenshots: monitoring-performance.JPG, server-stats.JPG
Please help us resolve this issue.

Re: NagiosXI-5.4.2 performance Issue

Posted: Thu Feb 09, 2017 1:00 pm
by rkennedy
Do you have a lot of failing checks on your system? I ask because, in the past, I've seen failing SNMP checks increase the load substantially.

A few other questions:
- What sort of disks is the XI machine running on?
- Does it always take hours to finish?
- Before you apply the configuration, run tail -f /usr/local/nagiosxi/var/cmdsubsys.log, then apply it and post the output for us to review. It should show which step is taking so long to complete.
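One way to see which step of the apply is slow is to timestamp each log line as it arrives, so long gaps between steps stand out. A minimal sketch, assuming bash and a `date` that supports `+%H:%M:%S`; `stamp_lines` and the /tmp output path are hypothetical, not part of Nagios XI:

```shell
#!/usr/bin/env bash
# stamp_lines: hypothetical helper (not part of Nagios XI) that prefixes
# every line read from stdin with the current wall-clock time (HH:MM:SS).
stamp_lines() {
  local line
  while IFS= read -r line; do
    printf '%s %s\n' "$(date +%H:%M:%S)" "$line"
  done
}

# Usage: start this before clicking "Apply Configuration", stop with Ctrl+C.
# The /tmp file name is only an example.
#   tail -f /usr/local/nagiosxi/var/cmdsubsys.log | stamp_lines | tee /tmp/applyconfig-timed.log
```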

Re: NagiosXI-5.4.2 performance Issue

Posted: Fri Feb 10, 2017 4:04 am
by vinish098
Please find attached a screenshot of the host and service status (status.JPG).
- The disks are local, not shared storage.
- I'm not sure whether it finished; the page hung after 3 hours.
- /usr/local/nagiosxi/var/cmdsubsys.log output: see the attached screenshot applyconfig-log.JPG.

Re: NagiosXI-5.4.2 performance Issue

Posted: Fri Feb 10, 2017 1:51 pm
by rkennedy
There appear to be quite a lot of failing checks. Is this normal in your environment?
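To put a number on the failing checks, the service states can be tallied from the Nagios Core status file. A minimal sketch, assuming the file is at /usr/local/nagios/var/status.dat (verify the path on your install) and uses `servicestatus { ... current_state=N ... }` blocks; `count_service_states` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# count_service_states: hypothetical helper that tallies services by
# current_state (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN) from a Nagios
# status.dat file. The default path is an assumption; pass another as $1.
count_service_states() {
  local status_file=${1:-/usr/local/nagios/var/status.dat}
  awk '
    # Entering a service block: start watching for current_state.
    /^[ \t]*servicestatus[ \t]*[{]/ { in_svc = 1; next }
    # A closing brace ends the current block (host blocks are ignored).
    /^[ \t]*[}]/ { in_svc = 0; next }
    # Inside a service block, count its state.
    in_svc && /^[ \t]*current_state=/ {
      split($0, kv, "=")
      count[kv[2] + 0]++
    }
    END {
      name[0] = "OK"; name[1] = "WARNING"; name[2] = "CRITICAL"; name[3] = "UNKNOWN"
      for (s = 0; s <= 3; s++) printf "%s\t%d\n", name[s], count[s] + 0
    }
  ' "$status_file"
}

# Usage:
#   count_service_states
#   count_service_states /path/to/status.dat
```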

Please post the full output of these commands:

top -b -n 1 | head -n 19
free -m
Then please post the full output of these two as well:

ps -eo pcpu,args --sort=-%cpu

ps axo rss,comm,pid \
| awk '{ proc_list[$2]++; proc_list[$2 "," 1] += $1; } \
END { for (proc in proc_list) { printf("%d\t%s\n", \
proc_list[proc "," 1],proc); }}' | sort -n | tail -n 10 | sort -rn \
| awk '{$1/=1024;printf "%.0fMB\t",$1}{print $2}'
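The memory one-liner above sums resident memory (RSS, which ps reports in KiB) per command name and prints the ten largest totals in MB. A commented sketch of the same aggregation, split so the summing step can be fed any `RSS COMMAND` input; `sum_rss_by_command` is a hypothetical name, and `ps axo rss=,comm=` (the `=` suppresses the header line) assumes procps `ps`:

```shell
#!/usr/bin/env bash
# sum_rss_by_command: hypothetical helper, a commented rewrite of the
# one-liner above. Reads "RSS_KiB COMMAND" lines on stdin, sums RSS per
# command name, and prints the 10 largest totals in MB, largest first.
sum_rss_by_command() {
  awk '
    # $1 = RSS in KiB, $2 = command name (names containing spaces keep
    # only their first word, as in the original one-liner).
    { rss[$2] += $1 }
    END { for (cmd in rss) printf "%d\t%s\n", rss[cmd], cmd }
  ' | sort -rn | head -n 10 | awk '{ printf "%.0fMB\t%s\n", $1 / 1024, $2 }'
}

# Usage:
#   ps axo rss=,comm= | sum_rss_by_command
```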

Re: NagiosXI-5.4.2 performance Issue

Posted: Wed Mar 01, 2017 2:25 pm
by tmcdonald
Just checking in since we have not heard from you in a while. Did @rkennedy's post clear things up or has the issue otherwise been resolved?