Disk checks

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Disk checks

Post by BanditBBS »

Ok, so currently we do one disk check per server and use the -e option to only show errors. Sometimes it can take weeks for a space issue to get fixed, and the check stays in a warning or critical state that entire time. If I turn on volatile it would log all non-OK states but also send a notification for every one. I swear there was an option to only send an alert if the text of the plugin output changed as well (i.e. a new drive passed a threshold).

So, I'm being dumb right now, but for which reason? lol
1.) I don't know what option to enable
2.) No such option exists
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Disk checks

Post by ssax »

I believe you may be referring to state stalking:

https://assets.nagios.com/downloads/nag ... lking.html

Let me know if that isn't what you're looking for.

Re: Disk checks

Post by BanditBBS »

What I'm looking for is a mixture of Stalking and Volatile. I ideally want Volatile, but notifications only on output change.

i.e. this:

1.) 9:00 - Warning - /usr 90% (Send notification and log it)
2.) 9:05 - Warning - /usr 90% (nothing logged and no notifications)
3.) 9:10 - Warning - /usr 90% - Warning - /var 94% (Send notification and log it!)

Stalking would have just logged 1 and 3 and sent notification for 1, right?
Volatile would log all 3 and send out notifications for all three, right?

So neither is fully what I want, at least from what I read on them. If there is no option to do what I want, my only other option is to convert 1000 disk checks into 11,000 disk checks and well...ugh, lol
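To make the requested rule concrete, here is a minimal sketch of "notify only when the non-OK output changes". It is not a Nagios option or API; the function name `should_notify` and the tuple layout are made up for illustration, and the sample data is the 9:00 / 9:05 / 9:10 sequence above.

```python
from typing import Optional


def should_notify(state: str, output: str, last_notified: Optional[str]) -> bool:
    """Hypothetical rule: alert on a non-OK result whose text differs from the last alerted text."""
    if state == "OK":
        return False
    return output != last_notified


# The three results from the example above: (time, state, plugin output).
results = [
    ("9:00", "WARNING", "/usr 90%"),
    ("9:05", "WARNING", "/usr 90%"),
    ("9:10", "WARNING", "/usr 90% - /var 94%"),
]

last_notified = None
decisions = []
for when, state, output in results:
    notify = should_notify(state, output, last_notified)
    decisions.append(notify)
    if notify:
        # Remember what was last alerted on so an identical repeat stays quiet.
        last_notified = output
    elif state == "OK":
        # A recovery clears the memory, so the next problem alerts again.
        last_notified = None
```

With this rule the 9:00 and 9:10 results alert and the 9:05 repeat does not. Note that a plain text comparison would also fire when only the percentage moves (90% to 91%), so a real version would probably compare the set of mount points instead.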
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Disk checks

Post by WillemDH »

I'm having similar issues with multiple volumes on our NetApp, though slightly different, as we acknowledge a problem once it shows up in open service problems. When "1.) 9:00 - Warning - /usr 90% (Send notification and log it)" happens, someone will acknowledge the problem. Am I understanding correctly that you don't ack the problem? Needing to acknowledge the problem (and create a support ticket) adds an extra layer of complexity. Making the ack non-sticky only helps to send an email, but as the problem is still acknowledged, a new support ticket is not made, which potentially causes issues.
For Windows and Linux servers I'm creating a service for each disk, as I need to monitor disk load too, so I'm not having this issue there. This setup of course needs a lot more services (as you said), and it also requires some mechanism that checks at a regular interval (e.g. once a week) whether new disks have been added to a server and adds the service to Nagios if any are found.
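The weekly "did a new disk appear?" check boils down to a set difference between what is found on the servers and what already has a service. A rough sketch follows; the two dictionaries are assumed inputs (how you collect them, e.g. via an agent or the XI API, is not shown), and the host and mount names are invented.

```python
def missing_disk_services(discovered, monitored):
    """Return (host, mount) pairs that exist on a server but have no service yet."""
    missing = []
    for host, mounts in sorted(discovered.items()):
        known = monitored.get(host, set())
        for mount in sorted(set(mounts) - known):
            missing.append((host, mount))
    return missing


# Assumed input: mounts found on each server this week.
discovered = {"web01": ["/", "/usr", "/var"], "db01": ["/", "/data"]}
# Assumed input: mounts that already have a per-disk service in Nagios.
monitored = {"web01": {"/", "/usr"}, "db01": {"/", "/data"}}

to_add = missing_disk_services(discovered, monitored)
```

Here `to_add` contains only `("web01", "/var")`, which is the one service the weekly job would need to create.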
Nagios XI 5.8.1
https://outsideit.net

Re: Disk checks

Post by BanditBBS »

Willem,

We ACK also, and our notification interval is set to 0 so only the one alert is sent. We open a task in our helpdesk app for every notification. My end goal here is to open a task when /usr crosses a threshold and then another when /var does, even if /usr is still in a bad state. It is looking more and more like I'll be adding 11,000 new checks :(

EDIT: Now to figure out the best way to add 11,000 services...hmmmm...suggestions? Can I just use bulk cloning and add the services to already existing hosts?
Edit #2: DUH, the new API!!!

Re: Disk checks

Post by WillemDH »

I'm still using bulk host cloning for most of my bulk edits. It's proven and very fast. You just need to make a weekly inventory of all your Nagios hosts and export it to Excel with a column for each type of hostgroup (os, location, infrastructure and role), then filter the Excel sheet based on whatever you need. Paste the result into Bulk Host Cloning, select the service, next, next, finish and done. I hope your systems are up for the extra load! Grtz

Re: Disk checks

Post by BanditBBS »

Willem - So how do you go about filling in those other fields every time you do your weekly export?

Edit: Never mind, it wouldn't work for me, as every 10 or so would be using a different service template and belong to a different servicegroup. That's what makes the API such a nice thing for me. I'm building a complex-as-ever spreadsheet where I just have to paste in the hostnames and bam...it'll spit out the 11,000 curl statements, then I just have to cut-n-paste.
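The same "rows in, curl statements out" idea can be sketched in a few lines of Python instead of spreadsheet formulas. Everything specific here is an assumption to check against the API documentation in your own XI install: the endpoint path, the field names, the `check_disk_mount` command, and the template and servicegroup names are placeholders, not confirmed values.

```python
import shlex
from urllib.parse import urlencode

# Assumed values: replace with your own server, endpoint and API key.
API_URL = "https://nagiosxi.example.com/nagiosxi/api/v1/config/service"
API_KEY = "YOUR_API_KEY"


def curl_for_service(host, mount, template, servicegroup):
    """Build one curl command that would create a per-mount disk service."""
    url = f"{API_URL}?{urlencode({'apikey': API_KEY})}"
    fields = {
        "host_name": host,
        "service_description": f"Disk {mount}",
        "use": template,
        "servicegroups": servicegroup,
        "check_command": f"check_disk_mount!{mount}",  # hypothetical command name
    }
    # shlex.quote keeps '&', '?' and '!' from being interpreted by the shell.
    return f"curl -XPOST {shlex.quote(url)} -d {shlex.quote(urlencode(fields))}"


# One row per new service: (host, mount, service template, servicegroup).
rows = [
    ("web01", "/usr", "tmpl_disk_linux", "sg_disks_linux"),
    ("web01", "/var", "tmpl_disk_linux", "sg_disks_linux"),
]
commands = [curl_for_service(*row) for row in rows]
```

Printing `commands` one per line gives a script to review before running, which seems safer for 11,000 entries than pasting straight into a shell.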

And yeah, I pray it can handle the load too....going from ~19000 services to ~30000 with this change.

Re: Disk checks

Post by WillemDH »

Haa, a little PowerShell script I wrote that makes use of the PSExcel module (http://ramblingcookiemonster.github.io/PSExcel-Intro/). Just get all hosts in a hostgroup with an Invoke-WebRequest JSON query. Then loop through all hosts and retrieve the hostgroups for each host. Put everything in a custom PowerShell object, e.g. $DemoData, and then:

Code:

$DemoData | Export-XLSX -Path C:\temp\Demo.xlsx


Sry, I hope the above makes sense to you. I can't publish this on the Exchange, as it would only work when your hostgroups are named exactly the same etc... It also contains some sensitive information and needs a large cleanup.
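Since the script itself is not published, here is only a rough outline of the steps described (hosts, then hostgroups per host, then one column per hostgroup type), written in Python with CSV output rather than PowerShell and PSExcel. The `type-value` hostgroup naming (e.g. `os-linux`) and the sample hosts are assumptions for illustration; fetching the data from Nagios is left out.

```python
import csv
import io

CATEGORIES = ("os", "location", "infrastructure", "role")


def inventory_rows(host_hostgroups):
    """Turn {host: [hostgroup, ...]} into one row per host with a column per category."""
    rows = []
    for host, groups in sorted(host_hostgroups.items()):
        row = {"host": host, **{category: "" for category in CATEGORIES}}
        for group in groups:
            # Assumed naming convention: "<category>-<value>".
            prefix, _, value = group.partition("-")
            if prefix in CATEGORIES and value:
                row[prefix] = value
        rows.append(row)
    return rows


def to_csv(rows):
    """Render the rows as CSV text, which Excel can open and filter."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=("host",) + CATEGORIES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Assumed sample data standing in for the JSON query results.
sample = {
    "web01": ["os-linux", "location-ghent", "role-web"],
    "db01": ["os-windows", "infrastructure-vmware"],
}
report = to_csv(inventory_rows(sample))
```

The resulting sheet has one row per host and can be filtered by any of the four columns before pasting hostnames into a bulk tool.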

But I can highly advise creating something similar. If you (or a colleague) have a bit of PowerShell knowledge you should be able to get it working. Let me know where you are stuck if you ever give it a try. Every week I try to expand the script a bit. Nowadays it will compare the host's hostgroups with the actual os, location, infrastructure and role and update the hostgroups with Reactor if needed (https://github.com/willemdh/naf_nagios_ ... hostgroups)
And I'm halfway to also retrieve all kinds of info of the server, eg contact and address, serial number etc and put it in the free variables with https://github.com/willemdh/naf_nagios_ ... st_freevar

Grtz

Re: Disk checks

Post by BanditBBS »

One of these days I'll have to do a remote join.me session with you and show you my setup (and maybe pass presenter to you so you can do the same, if you can). That way we both might get good ideas :)

Re: Disk checks

Post by WillemDH »

Well, I think that's a good idea. Let's try to do this somewhere mid-January, as I have a two-week holiday starting Friday.