Ok, so currently we do one disk check per server and use the -e option to only show errors. Sometimes it can take weeks for a space issue to get fixed, and the service stays in a warning or critical state that entire time. If I turn on volatile, it would log all non-OK states but also send a notification for every one. I swear there was an option to only send an alert if the text of the plugin output changed as well (i.e. a new drive passed a threshold).
So, I'm being dumb right now, but for which reason? lol
1.) I don't know what option to enable
2.) No such option exists
Disk checks
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
Re: Disk checks
I believe you may be referring to state stalking:
https://assets.nagios.com/downloads/nag ... lking.html
Let me know if that isn't what you're looking for.
Re: Disk checks
What I'm looking for is a mixture of stalking and volatile. Ideally I want volatile, but with notifications only when the output changes.
i.e. this:
1.) 9:00 - Warning - /usr 90% (Send notification and log it)
2.) 9:05 - Warning - /usr 90% (nothing logged and no notifications)
3.) 9:10 - Warning - /usr 90% - Warning - /var 94% (Send notification and log it!)
Stalking would have just logged 1 and 3 and sent a notification for 1, right?
Volatile would log all three and send out notifications for all three, right?
So neither is fully what I want, at least from what I read about them. If there's no option to do what I want, my only other option is to convert 1,000 disk checks into 11,000 disk checks and, well... ugh, lol
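The thread doesn't turn up a built-in option for this exact combination. To pin down the logic being asked for, here is a minimal Python sketch of a filter that remembers the last set of problem filesystems and only signals a notification when a filesystem newly enters a problem state (or moves between Warning and Critical). It assumes plugin output shaped like the examples above ("Warning - /usr 90%"); real check_disk output is formatted differently, so the regular expression is a placeholder. The function names are hypothetical, and hooking this into Nagios (for example from a notification script with persisted state) is not shown.

```python
import re

# Placeholder pattern for one problem entry such as "Warning - /usr 90%".
# Real check_disk output is formatted differently; adapt the regex to it.
ENTRY = re.compile(r"(Warning|Critical) - (\S+) \d+%")


def problem_mounts(output: str) -> frozenset:
    """Return the set of (state, mount point) pairs in one check output."""
    return frozenset(ENTRY.findall(output))


def should_notify(previous: frozenset, current: frozenset) -> bool:
    """True if a mount point newly entered a problem state or changed state."""
    return bool(current - previous)


def replay(outputs):
    """Replay a series of check outputs; return a notify decision for each."""
    last = frozenset()
    decisions = []
    for out in outputs:
        current = problem_mounts(out)
        decisions.append(should_notify(last, current))
        last = current
    return decisions


if __name__ == "__main__":
    history = [
        "Warning - /usr 90%",                        # 9:00
        "Warning - /usr 90%",                        # 9:05
        "Warning - /usr 90% - Warning - /var 94%",  # 9:10
    ]
    print(replay(history))  # [True, False, True]
```

Comparing sets of (state, mount) pairs instead of the raw text means a usage figure creeping from 90% to 91% on /usr does not trigger a new alert, while /var crossing a threshold does.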
Re: Disk checks
I'm having similar issues with multiple volumes on our NetApp, but slightly different, as we acknowledge a problem once it shows up in open service problems. When "1.) 9:00 - Warning - /usr 90% (Send notification and log it)" happens, someone will acknowledge the problem. Am I understanding right that you don't ack the problem? Needing to acknowledge the problem (and create a support ticket) adds an extra layer of complexity to solving this. Making the ack non-sticky only helps to send an email; as the problem is still acknowledged, no new support ticket is created, which can cause issues.
For Windows and Linux servers I'm creating a service for each disk, as I need to monitor disk load too, so I'm not having this issue there. Of course this setup needs a lot more services (as you said), and it also requires some mechanism that checks at a regular interval (e.g. once a week) whether new disks were added to a server, and adds a service to Nagios for each one found.
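The weekly "were new disks added?" step boils down to a set comparison between the disks found on a host and the disk services already configured for it. A minimal Python sketch, assuming a hypothetical naming convention of one service per disk called "Disk Usage on <mount>", and assuming the list of discovered disks comes from some inventory query that is not shown here:

```python
from typing import Iterable, List, Tuple

# Hypothetical naming convention: one service per mount point / drive letter.
SERVICE_PREFIX = "Disk Usage on "


def plan_disk_services(discovered: Iterable[str],
                       existing: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Compare disks found on a host with its existing disk services.

    Returns (to_add, to_review): services to create for new disks, and disk
    services whose disk was not found (candidates for removal).
    """
    wanted = {SERVICE_PREFIX + d for d in discovered}
    have = {s for s in existing if s.startswith(SERVICE_PREFIX)}
    return sorted(wanted - have), sorted(have - wanted)


if __name__ == "__main__":
    found = ["C:", "D:", "E:"]  # e.g. from an inventory query
    configured = ["Disk Usage on C:", "Disk Usage on D:",
                  "Disk Usage on F:", "CPU Load"]
    print(plan_disk_services(found, configured))
    # (['Disk Usage on E:'], ['Disk Usage on F:'])
```

Only services matching the prefix are considered, so unrelated services such as "CPU Load" are never flagged for removal.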
Nagios XI 5.8.1
https://outsideit.net
Re: Disk checks
Willem,
We ACK too, and our notification interval is set to 0 so only the one alert is sent. We open a task in our helpdesk app for every notification. My end goal here is to open a task when /usr crosses a threshold, and then another when /var does, even if /usr is still in a bad state. It is looking more and more like I'll be adding 11,000 new checks.
EDIT: Now to figure out the best way to add 11,000 services...hmmmm...suggestions? Can I just use bulk cloning and add the services to already existing hosts?
Edit #2: DUH, the new API!!!
Re: Disk checks
I'm still using Bulk Host Cloning for most of my bulk edits. It's proven and very fast; you just need to make a weekly inventory of all your Nagios hosts and export it to Excel with a column for each type of hostgroup (OS, location, infrastructure and role), then filter the sheet on whatever you need. Paste the hosts into Bulk Host Cloning, select the services, then Next, Next, Finish and done. I hope your systems are up for the extra load! Grtz
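The filtering step of that workflow can also be done outside Excel. A minimal Python sketch, assuming the weekly inventory is exported as CSV with hypothetical column names host_name, os, location, infrastructure and role; it returns the matching host names, which can be printed one per line for pasting:

```python
import csv
import io
from typing import List

# Hypothetical weekly inventory export; the column names are assumptions.
INVENTORY = """host_name,os,location,infrastructure,role
web01,linux,dc1,vmware,web
web02,linux,dc2,vmware,web
sql01,windows,dc1,physical,database
"""


def select_hosts(inventory_csv: str, **criteria: str) -> List[str]:
    """Return host names whose columns match every given criterion."""
    reader = csv.DictReader(io.StringIO(inventory_csv))
    return [row["host_name"] for row in reader
            if all(row.get(col) == val for col, val in criteria.items())]


if __name__ == "__main__":
    # One host per line, ready to paste.
    print("\n".join(select_hosts(INVENTORY, os="linux", location="dc1")))
```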
Re: Disk checks
Willem - So how do you go about filling in those other fields every time you do your weekly export?
Edit: Never mind, it wouldn't work for me, as every 10 or so would use a different service template and belong to a different servicegroup. That's what makes the API such a nice thing for me: I'm building a spreadsheet as complex as ever, where I just have to paste in the hostnames and bam... it'll spit out the 11,000 curl statements, then I just have to cut and paste.
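The same "host list in, curl statements out" idea as a short Python sketch. The endpoint path (/nagiosxi/api/v1/config/service), the apikey query parameter and the form field names are assumptions based on the Nagios XI REST API and standard Nagios service directives; check them against the API documentation of your XI version. The server URL, key, template and servicegroup names are placeholders.

```python
import shlex
from urllib.parse import urlencode

# Assumed endpoint and key: replace with your own XI server and API key.
XI_SERVICE_URL = "https://nagiosxi.example.com/nagiosxi/api/v1/config/service"
API_KEY = "YOUR_API_KEY"


def curl_for_disk_service(host: str, mount: str, template: str,
                          servicegroup: str) -> str:
    """Build one curl command that would create a per-disk service.

    The field names mirror standard Nagios service directives; whether the
    XI API accepts each of them as-is is an assumption to verify.
    """
    fields = {
        "host_name": host,
        "service_description": f"Disk Usage on {mount}",  # hypothetical naming
        "use": template,
        "servicegroups": servicegroup,
    }
    url = f"{XI_SERVICE_URL}?{urlencode({'apikey': API_KEY})}"
    # shlex.quote keeps '&', '?' and spaces from being interpreted by the shell.
    return f"curl -XPOST {shlex.quote(url)} -d {shlex.quote(urlencode(fields))}"


if __name__ == "__main__":
    # (host, mount, template, servicegroup) rows, e.g. exported from a sheet.
    rows = [
        ("srv01", "/usr", "disk-linux-template", "linux-disks"),
        ("srv01", "/var", "disk-linux-template", "linux-disks"),
    ]
    for row in rows:
        print(curl_for_disk_service(*row))
```

The sketch deliberately leaves out any apply-configuration step; applying the configuration once after all services are created is likely far cheaper than applying it 11,000 times.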
And yeah, I pray it can handle the load too....going from ~19000 services to ~30000 with this change.
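For a rough sense of that load: assuming every service runs at a 5-minute check interval (an assumption; use your real check_interval values), the average active check rate goes from about 63 to 100 checks per second. A tiny Python helper for redoing the arithmetic with other numbers:

```python
def checks_per_second(services: int, interval_minutes: float) -> float:
    """Average active check rate if every service runs at the same interval."""
    return services / (interval_minutes * 60)


if __name__ == "__main__":
    for total in (19_000, 30_000):
        # The 5-minute interval is an assumption, not a value from the thread.
        print(total, round(checks_per_second(total, 5), 1))
```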
Re: Disk checks
Ha, a little PowerShell script I wrote that uses the PSExcel module (http://ramblingcookiemonster.github.io/PSExcel-Intro/). It just gets all hosts in a hostgroup with an Invoke-WebRequest JSON query, then loops through all the hosts and retrieves the hostgroups for each one. Put everything in a custom PowerShell object, e.g. $DemoData, and then:

```powershell
$DemoData | Export-XLSX -Path C:\temp\Demo.xlsx
```

Sorry, I hope the above makes sense to you. I can't publish this on the Exchange, as it would only work if your hostgroups are named exactly the same, etc. It also contains some sensitive information and needs a large cleanup.
But I can highly recommend creating something similar. If you (or a colleague) have a bit of PowerShell knowledge, you should be able to get it working. Let me know where you get stuck if you ever give it a try. Every week I try to expand the script a bit. Nowadays it compares each host's hostgroups with the actual OS, location, infrastructure and role, and updates the hostgroups with Reactor if needed (https://github.com/willemdh/naf_nagios_ ... hostgroups).
And I'm halfway to also retrieving all kinds of info about each server, e.g. contact and address, serial number etc., and putting it in the free variables with https://github.com/willemdh/naf_nagios_ ... st_freevar
Grtz
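The "compare each host's hostgroups with its actual attributes" step can be sketched independently of PowerShell and Reactor. The following Python sketch is not the script described above; it assumes a hypothetical naming convention where each attribute maps to a hostgroup named "<attribute>_<value>" (e.g. os_linux), and only computes which groups to add and remove for one host:

```python
from typing import Dict, List, Set, Tuple

# Hypothetical convention: one hostgroup per attribute value, e.g. "os_linux".
ATTRIBUTES = ("os", "location", "infrastructure", "role")


def expected_hostgroups(inventory_row: Dict[str, str]) -> Set[str]:
    """Derive the hostgroups a host should be in from its inventory data."""
    return {f"{attr}_{inventory_row[attr]}" for attr in ATTRIBUTES
            if inventory_row.get(attr)}


def hostgroup_changes(inventory_row: Dict[str, str],
                      current: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (groups_to_add, groups_to_remove) for one host.

    Only groups that follow the attribute naming convention are candidates
    for removal, so unrelated hostgroups are left alone.
    """
    wanted = expected_hostgroups(inventory_row)
    managed = {g for g in current if g.split("_", 1)[0] in ATTRIBUTES}
    return sorted(wanted - current), sorted(managed - wanted)


if __name__ == "__main__":
    row = {"os": "linux", "location": "dc2",
           "infrastructure": "vmware", "role": "web"}
    now = {"os_linux", "location_dc1", "infrastructure_vmware", "web-servers"}
    print(hostgroup_changes(row, now))
    # (['location_dc2', 'role_web'], ['location_dc1'])
```

Applying the resulting changes (with Reactor, the API or otherwise) is left out on purpose.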
Re: Disk checks
One of these days I'll have to do a remote join.me session for you and show you my setup (and maybe pass presenter to you so you can do the same, if you can); that way we both might get good ideas.
Re: Disk checks
Well, I think that's a good idea. Let's try to do this somewhere mid-January, as I have a two-week holiday starting Friday.