Help calculating load

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Post by jbennett »

We are now using NRDS to check a number of things on a number of machines.

We currently have 394 Linux boxes, and we need to check roughly 47 services on EACH (some fewer, some more).
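For a rough sense of scale, the figures above can be turned into a results-per-second estimate. This is only a sketch: the 5-minute reporting interval is an assumption (it is not stated in the post), and the variable names are illustrative.

```shell
#!/bin/sh
# Rough sizing arithmetic using the figures quoted in the post.
hosts=394             # Linux boxes mentioned in the post
services_per_host=47  # approximate services per host
interval=300          # ASSUMPTION: every service reports once per 5 minutes

total_services=$((hosts * services_per_host))

# Shell arithmetic is integer-only, so scale by 10 to keep one decimal place.
results_per_sec_x10=$((total_services * 10 / interval))

echo "total services: $total_services"
echo "passive results per second: $((results_per_sec_x10 / 10)).$((results_per_sec_x10 % 10))"
```

Under that assumption this works out to 18518 services and roughly 61.7 passive results arriving per second. A shorter or longer NRDS interval scales the rate proportionally.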

Am I reaching the limits of a single box even if these are all being run via NRDS checks?

My current box has the following:

Code:

# Active Host / Service Checks:	1635 / 2917
# Passive Host / Service Checks:	120 / 8694
I am looking to add roughly 275 more passive hosts and their associated NRDS service checks.

Currently, top shows the following:

Code:

top - 11:21:57 up 3 days, 13 min,  4 users,  load average: 18.49, 21.19, 21.66
Tasks: 443 total,  13 running, 430 sleeping,   0 stopped,   0 zombie
Cpu(s): 92.8%us,  6.8%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  12309660k total,  6694676k used,  5614984k free,   596440k buffers
Swap: 18972656k total,        0k used, 18972656k free,  3304544k cached
What are my best options for making this work as far as system capability?

How much would it benefit me to offload the db to a remote dedicated db server?
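One way to read the load average in the top output is to divide it by the number of CPU cores. The sketch below does that; the core count of 8 is hypothetical, since the post does not say how many cores the box has, and load_per_core is just an illustrative helper name.

```shell
#!/bin/sh
# Print the load average divided by the number of CPU cores.
# Usage: load_per_core LOAD CORES
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Example: the 1-minute load from the top output and a HYPOTHETICAL 8 cores.
load_per_core 18.49 8

# On a live Linux host, the real inputs could be read like this:
#   load1=$(cut -d' ' -f1 /proc/loadavg)
#   cores=$(getconf _NPROCESSORS_ONLN)
#   load_per_core "$load1" "$cores"
```

With those example inputs the result is 2.31. A value that stays well above 1.0 per core, together with the 0.0%id figure in the top output, would suggest the CPUs are saturated rather than the box waiting on disk.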
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Help calculating load

Post by lmiltchev »

Offloading the database would definitely help reduce the load. It's hard to say how much it would benefit you, but the bigger the database, the greater the benefit. You may also want to look at some other options for boosting Nagios XI performance here:

http://assets.nagios.com/downloads/nagi ... p#boosting

These are our general guidelines on the hardware requirements needed to run Nagios XI:

http://assets.nagios.com/downloads/nagi ... ements.pdf
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: Help calculating load

Post by jbennett »

I had not seen the rrdcached option.

I just went through the documentation found here: http://assets.nagios.com/downloads/nagi ... ios_XI.pdf

But I'm a little lost on the verifying step. Per that step:
"Now to verify that the daemon is working correctly, check the location of the directory that was specified with the -j option in the /etc/sysconfig/rrdcached file. In the example above, the journaling directory is /tmp. There should be an rrd.journal file there with a recent timestamp matching the last time the rrdcached service was restarted."
When I check the /etc/sysconfig/rrdcached file, I only see the following:

Code:

tmp]# vi /etc/sysconfig/rrdcached

# Settings for rrdcached
OPTIONS="-l unix:/var/rrdtool/rrdcached/rrdcached.sock -s rrdcached -m 664 -b /var/rrdtool/rrdcached"
RRDC_USER=rrdcached
There is no '-j' option indicated. Also, later in the verification step it says:
"The PNP changes can be verified by looking at a performance graph in the interface after the number of seconds specified by the -w directive in the /etc/sysconfig/rrdcached file."
There is no '-w' directive here.

Have I missed a step somewhere?

*feeling dense*
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Help calculating load

Post by slansing »

I believe by default the journal is placed in /tmp. You can set a custom directory by adding the -j flag in the configuration file:

Code:

-j dir
Write updates to a journal in dir. In the event of a program or system crash, this will allow the daemon to write any updates that were pending at the time of the crash.

On startup, the daemon will check for journal files in this directory. If found, all updates therein will be read into memory before the daemon starts accepting new connections.

The journal will be rotated with the same frequency as the flush timer given by -f.

When journaling is enabled, the daemon will use a fast shutdown procedure. Rather than flushing all files to disk, it will make sure the journal is properly written and exit immediately. Although the RRD data files are not fully up-to-date, no information is lost; all pending updates will be replayed from the journal next time the daemon starts up.

To disable fast shutdown, use the -F option.
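As a sketch of what adding those flags might look like, here is the OPTIONS line from the earlier post with -j and -w appended. The journal path and the 1800-second write timeout are assumptions chosen for illustration, not values from the Nagios documentation. As far as I can tell from the man page text above, a journal is only written when -j is actually present.

```shell
# /etc/sysconfig/rrdcached (sketch; the -j path and the -w value are ASSUMPTIONS)
# -j DIR      write a journal of pending updates into DIR
# -w SECONDS  write cached values out to the RRD files after this many seconds
OPTIONS="-l unix:/var/rrdtool/rrdcached/rrdcached.sock -s rrdcached -m 664 -b /var/rrdtool/rrdcached -j /var/rrdtool/rrdcached/journal -w 1800"
RRDC_USER=rrdcached
```

The journal directory would need to exist and be writable by the rrdcached user before the service is restarted. After the restart, an rrd.journal file with a fresh timestamp should appear in that directory.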
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: Help calculating load

Post by jbennett »

slansing wrote:I believe by default the journal is placed in /tmp, you can add a custom directory by adding the -j flag in the configuration file:
*confused*

There is a 'journal' file in /tmp but it's not doing anything:

Code:

tmp]# ls -lh *journal*
-rw-r--r-- 1 root root 0 Jul 11 14:19 journal
Last edited by jbennett on Fri Jul 12, 2013 7:13 am, edited 1 time in total.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Help calculating load

Post by lmiltchev »

What do you see when you cat it?

Code:

cat /tmp/rrd.journal.xxx.xxx
Are your rrds updating?
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: Help calculating load

Post by jbennett »

That's what I'm saying: I don't have the rrd.journal.xx.xx file in the first place, even after running the install script:

Code:

tmp]# cat /tmp/rrd.journal.*
cat: /tmp/rrd.journal.*: No such file or directory
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Help calculating load

Post by slansing »

What Ludmil is asking is whether you are getting perfdata right now and whether your actual .rrd files are updating. You may not need this file at all, as all it shows is a log of which RRD was updated at what time.
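A quick way to check whether the RRD files themselves are being updated is to count the ones modified recently. The sketch below assumes the perfdata directory is /usr/local/nagios/share/perfdata, which I believe is the usual Nagios XI location but may differ on your install; count_recent_rrds is an illustrative helper name.

```shell
#!/bin/sh
# Count .rrd files under DIR modified within the last MINUTES minutes.
# Usage: count_recent_rrds DIR MINUTES
count_recent_rrds() {
    find "$1" -type f -name '*.rrd' -mmin "-$2" | wc -l | tr -d ' '
}

# ASSUMPTION: default Nagios XI perfdata location; adjust if yours differs.
perfdata_dir=/usr/local/nagios/share/perfdata
if [ -d "$perfdata_dir" ]; then
    echo "RRDs updated in the last 30 minutes: $(count_recent_rrds "$perfdata_dir" 30)"
fi
```

Keep in mind that with rrdcached in front of the files, modification times can lag behind by up to the -w write timeout, so a generous window is safer than a tight one.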
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: Help calculating load

Post by jbennett »

Data is showing up correctly in Nagios, and when I check the RAM disk, it appears that the host-perfdata and service-perfdata files are updating:

Code:

-rw-rw-r-- 1 nagios users  2.1K Jul 12 10:01 host-perfdata
-rw-r--r-- 1 nagios nagios  14M Jul 11 15:04 objects.cache
-rw-rw-r-- 1 nagios users   24K Jul 12 10:01 service-perfdata
Where would my rrd files be located?

In checking my graphs, I don't see much actual data ON them, except on a couple. Also, there are graphs showing for hosts which I disabled (and in some cases deleted) yesterday, and I have applied configs since then. EDIT: This last part appears to be related to hosts having been added to particular host groups that are still active. I'm trying to delete those host groups as they are no longer necessary, but I'm not able to successfully apply the configs once that's been done.
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Help calculating load

Post by sreinhardt »

Most likely you are running into fun dependency issues. The hosts contain the hostgroups within their own configs, as opposed to the host group config defining which hosts belong to the group. The errors should show which hostname depends on which hostgroup, so you can go to that host, remove the hostgroup from its config, and attempt to apply configuration again. Otherwise, feel free to post the errors you are seeing and we can take a look!
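If the errors are hard to read, something like the following could help locate host configs that still reference a hostgroup. It is a sketch under assumptions: /usr/local/nagios/etc/hosts is taken to be the host config directory, "old-linux-servers" is a made-up group name, and hosts_referencing_group is an illustrative helper. The group name is treated as a regular expression and matched as a substring, so similarly named groups will also show up.

```shell
#!/bin/sh
# List config files under DIR whose "hostgroups" directive mentions GROUP.
# Usage: hosts_referencing_group DIR GROUP
hosts_referencing_group() {
    grep -rlE "^[[:space:]]*hostgroups[[:space:]]+.*$2" "$1"
}

# ASSUMPTIONS: default Nagios XI host config directory, made-up group name.
cfg_dir=/usr/local/nagios/etc/hosts
if [ -d "$cfg_dir" ]; then
    hosts_referencing_group "$cfg_dir" "old-linux-servers"
fi
```

Each file it prints is a host definition to edit in the interface before deleting the hostgroup and applying configuration again.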