tuning nagios

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
hanya.radwan
Posts: 194
Joined: Tue Feb 25, 2014 6:12 am
Location: palestine

tuning nagios

Post by hanya.radwan »

Hi,

I noticed that RAM usage on the Nagios server reaches 98% (critical) and I/O wait also reaches critical values, and the last check time lags behind the current time. Can I simply increase the RAM and CPU, or are there tuning options that would improve performance?
The server currently has 4 GB of RAM. Please check the attached images.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: tuning nagios

Post by scottwilkerson »

The RAM isn't the problem as there is plenty of cached memory.
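As a rough way to check this, the sketch below computes how much RAM is in use once buffers and page cache are excluded. It assumes a Linux host with the usual `/proc/meminfo` fields; the function name `mem_used_excl_cache_pct` is made up for illustration. Keep in mind that tmpfs (RAM disk) contents are also counted under `Cached`, so the figure can look better than it is once a RAM disk is in use.

```shell
#!/bin/sh
# Percentage of RAM in use once buffers and page cache are excluded.
# Reads /proc/meminfo-formatted text on stdin (values are in kB).
mem_used_excl_cache_pct() {
    awk '
        $1 == "MemTotal:" { total   = $2 }
        $1 == "MemFree:"  { free    = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached  = $2 }
        END {
            if (total <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (total - free - buffers - cached) / total
        }'
}

# Live usage on Linux.
if [ -r /proc/meminfo ]; then
    echo "used excluding buffers/cache: $(mem_used_excl_cache_pct < /proc/meminfo)%"
fi
```

If this number is low while the dashboard shows 98%, most of the "used" memory is cache that the kernel can give back on demand.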

The real problem is the I/O Wait, which means Nagios is waiting for the disk. Have a look at
http://assets.nagios.com/downloads/nagi ... p#boosting

Specifically, "Utilizing A RAM Disk In NagiosXI" and "Offloading MySQL To A Remote Server" are known to reduce disk I/O; however, nothing will help more than fast dedicated disks (not shared with other VM infrastructure).
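To put a number on I/O wait without extra tools, here is a minimal sketch that samples the aggregate `cpu` line of `/proc/stat` twice and reports the share of CPU time spent in iowait in between. It assumes a Linux host; the function name `iowait_pct` and the one-second interval are arbitrary choices for illustration.

```shell
#!/bin/sh
# Share of CPU time spent in iowait between two samples of the aggregate
# "cpu" line from /proc/stat.
# Fields after the "cpu" label: user nice system idle iowait irq softirq steal
iowait_pct() {
    # $1 = first "cpu ..." line, $2 = second "cpu ..." line
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= NF && i <= 9; i++) t1 += $i; w1 = $6 }
        NR == 2 { for (i = 2; i <= NF && i <= 9; i++) t2 += $i; w2 = $6 }
        END {
            dt = t2 - t1
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w2 - w1) / dt
        }'
}

# Live usage on Linux: sample twice, one second apart.
if [ -r /proc/stat ]; then
    first=$(grep '^cpu ' /proc/stat)
    sleep 1
    second=$(grep '^cpu ' /proc/stat)
    echo "iowait over the last second: $(iowait_pct "$first" "$second")%"
fi
```

Running it in a loop around the time the problem shows up should make it clear whether the spikes line up with the delayed checks.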
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
krobertson71
Posts: 444
Joined: Tue Feb 11, 2014 10:16 pm

Re: tuning nagios

Post by krobertson71 »

We use SSDs for our server. I/O wait on mine is 0.00 90% of the time.

If this is a VM using VMFS on a storage network, your disk latency will not be good once you start stacking up checks.
cmerchant
Posts: 546
Joined: Wed Sep 24, 2014 11:19 am

Re: tuning nagios

Post by cmerchant »

It might help to get some context on what the problem might be. Your service event queue peaked at ~1200; was this after a restart? How many hosts and service checks do you have? What is your check interval?
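One quick way to answer the host and service count question from the command line is to count the status blocks in the Nagios status file. This is only a sketch: it assumes the usual `status.dat` layout, where each object starts with a line such as `hoststatus {` or `servicestatus {`, and it assumes the default path below, which will differ if `status_file` points elsewhere (for example, at a RAM disk). The function name `count_status_blocks` is made up for illustration.

```shell
#!/bin/sh
# Count host and service status blocks in a Nagios status file.
count_status_blocks() {
    # $1 = path to status.dat
    awk '
        $1 == "hoststatus"    && $2 == "{" { hosts++ }
        $1 == "servicestatus" && $2 == "{" { services++ }
        END { printf "hosts=%d services=%d\n", hosts, services }' "$1"
}

# Assumed default location; override with STATUS_FILE=/path/to/status.dat
STATUS_FILE=${STATUS_FILE:-/usr/local/nagios/var/status.dat}
if [ -r "$STATUS_FILE" ]; then
    count_status_blocks "$STATUS_FILE"
fi
```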

You might try to do this:

Code:

service nagios stop
service ndo2db stop
service ndo2db start
service nagios start
to reset the scheduler.

I would also suggest updating Nagios XI to the latest version.
hanya.radwan
Posts: 194
Joined: Tue Feb 25, 2014 6:12 am
Location: palestine

Re: tuning nagios

Post by hanya.radwan »

I applied the RAM disk solution suggested above, but the problem remains.
I have 1212 services and 194 hosts. Nagios release: Nagios XI 2014R1.4. Environment: VMware.
The problems shown in the screenshots above occur every hour.
I also have one host that sends a trap every second, and each trap is about 700 characters long.
The trap:
15:24:05.677351 IP 10.200.29.196.57426 > nms.jawwal.ps.snmptrap: V2Trap(659) system.sysUpTime.0=354886300 S:1.1.4.1.0=E:193.126.3.16.0.1 E:193.126.3.16.1.1.0=07_df_01_0b_0f_17_1a_05_2b_03_00 E:193.126.3.16.1.2.0=E:193.126.3.17.1.1.1.1.18.1 E:193.126.3.16.1.3.0=3 E:193.126.3.16.1.4.0=4 E:193.126.3.16.1.5.0=504 E:193.126.3.16.1.6.0="Alarm: The connection to the SMS-C could not be established or has been broken ResourceId: module=mpeSmppInterfacePmModule, group=smsC, object=smsCNoConnectToSmsc, instance=1" E:193.126.3.16.1.7.0="The SMPP Gateway process has terminated the connection to an SMS centre due to an I/O problem. ^JThe SMPP gateway will automatically try to re-establish the connection.^JSession ID = SMS-C 1 - Receiver^M^JFailed operation = Receive message header^M^JIoExceptio"
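For a sense of scale, the trap rate above can be turned into a daily volume with simple arithmetic. The sketch below treats one character as one byte, which is an approximation, and the function name `trap_volume_per_day` is made up for illustration.

```shell
#!/bin/sh
# Rough trap volume: traps per second times approximate bytes per trap,
# scaled to one day (86400 seconds).
trap_volume_per_day() {
    # $1 = traps per second, $2 = approximate bytes per trap
    echo $(( $1 * $2 * 86400 ))
}

bytes=$(trap_volume_per_day 1 700)
echo "traps per day: $(( 1 * 86400 ))"
echo "approx. bytes per day: $bytes (about $(( bytes / 1000000 )) MB)"
```

The raw volume is small, but every trap that is processed and logged adds its own disk writes, so a once-per-second source is worth keeping in mind when chasing I/O wait.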
hanya.radwan
Posts: 194
Joined: Tue Feb 25, 2014 6:12 am
Location: palestine

Re: tuning nagios

Post by hanya.radwan »

[Attachment: problem.png]
After applying the solutions above, I get the attached error when I try to open reports.
From the event log:
[01-12-2015 13:21:32] Error: Unable to update status data file '/var/nagiosramdisk/status.dat': No space left on device
[01-12-2015 13:21:32] Error: Unable to rename file '/usr/local/nagios/var/nagios.tmp2fi6GJ' to '/var/nagiosramdisk/status.dat': No space left on device
[01-12-2015 13:21:32] Error: my_fcopy() failed to write to '/var/nagiosramdisk/status.dat': No space left on device
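"No space left on device" on `/var/nagiosramdisk` points at the RAM disk itself being full. A minimal sketch for checking how full it is, using the portable `df -P` output format (the function name `fs_used_pct` and the 90% threshold are arbitrary choices for illustration):

```shell
#!/bin/sh
# Print the percentage of space used on the filesystem holding a path.
# "df -P" prints a header line, then one line per filesystem where the
# fifth column is the capacity used, e.g. "42%".
fs_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Mount point taken from the log lines above.
RAMDISK=${RAMDISK:-/var/nagiosramdisk}
if [ -d "$RAMDISK" ]; then
    pct=$(fs_used_pct "$RAMDISK")
    echo "$RAMDISK is ${pct}% full"
    if [ "$pct" -ge 90 ]; then
        echo "WARNING: $RAMDISK is nearly full"
    fi
fi
```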
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: tuning nagios

Post by slansing »

Well, it appears you only have roughly 4 GB of memory allotted to the XI server itself. That, combined with your use of a RAM disk, is going to cause issues, even at only 1212 services and 194 hosts. You will want to give your VM more memory than that, and likely increase your RAM disk size as well, since it appears you just ran out of room on it. This may be because you are using almost all of your memory. What size did you mount the RAM disk with? One option that may work is to stop Nagios, move all of the contents of the RAM disk to another directory on your local filesystem, unmount the RAM disk, and remount it with a larger size once you have provided more memory to the system. Then you can move those files back into the RAM disk and start Nagios.
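The resize steps above can be sketched as a dry-run script. Several things here are assumptions rather than anything confirmed in this thread: that the RAM disk is a tmpfs mount, that `/tmp/nagiosramdisk.bak` is an acceptable holding directory (a hypothetical path), and that 500m is the target size. The sketch copies with `cp -a` instead of moving, since unmounting a tmpfs discards its contents anyway. It only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry-run sketch of the RAM disk resize steps described above.
RAMDISK=${RAMDISK:-/var/nagiosramdisk}     # mount point from the event log
BACKUP=${BACKUP:-/tmp/nagiosramdisk.bak}   # hypothetical holding directory
NEW_SIZE=${NEW_SIZE:-500m}                 # assumed target size
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Each step runs only if the previous one succeeded.
resize_ramdisk() {
    run service nagios stop &&
    run mkdir -p "$BACKUP" &&
    run cp -a "$RAMDISK/." "$BACKUP/" &&
    run umount "$RAMDISK" &&
    run mount -t tmpfs -o "size=$NEW_SIZE" tmpfs "$RAMDISK" &&
    run cp -a "$BACKUP/." "$RAMDISK/" &&
    run service nagios start
}

resize_ramdisk
```

If the RAM disk is also defined in `/etc/fstab`, the size there would need the same change, or the old size will come back after a reboot.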

http://assets.nagios.com/downloads/nagi ... giosXI.pdf

Of course, you have to do a bit of math here. If you only have 4 GB of memory, as you currently do, and the system itself is using almost all of it before the RAM disk is even counted, you are going to have issues. You need to leave some wiggle room for the system as well.
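The math in question is a simple subtraction. The figures in the sketch below are hypothetical, picked only to show the shape of the calculation; substitute the real totals from the server. The function name `headroom_mb` is made up for illustration.

```shell
#!/bin/sh
# Memory left over after the system's own usage and the RAM disk, in MB.
headroom_mb() {
    # $1 = total RAM, $2 = RAM used by the system, $3 = RAM disk size (all MB)
    echo $(( $1 - $2 - $3 ))
}

# Hypothetical figures: 4096 MB total, 3900 MB in use, 500 MB RAM disk.
# A negative result means the RAM disk cannot fit without swapping.
headroom_mb 4096 3900 500

# The same hypothetical workload with 6144 MB total.
headroom_mb 6144 3900 500
```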
hanya.radwan
Posts: 194
Joined: Tue Feb 25, 2014 6:12 am
Location: palestine

Re: tuning nagios

Post by hanya.radwan »

I upgraded to 2014R2.5, increased the RAM to 6 GB, and applied the RAM disk solution. After that, I noticed that the behavior improved once I set the RAM disk mount size to 500M.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: tuning nagios

Post by scottwilkerson »

hanya.radwan wrote:I upgraded to 2014R2.5, increased the RAM to 6 GB, and applied the RAM disk solution. After that, I noticed that the behavior improved once I set the RAM disk mount size to 500M.
Excellent.

Also, I just wanted to mention that because this is a VMware environment, if other systems use the same disk, you could still experience I/O wait if those systems have a spike in I/O usage once per hour...
hanya.radwan
Posts: 194
Joined: Tue Feb 25, 2014 6:12 am
Location: palestine

Re: tuning nagios

Post by hanya.radwan »

thanks