Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
I'm having some difficulty configuring the way hosts and services are checked. The problem is this: to keep it simple, say I have one Cisco router and two services, PING and ifStatus on one of its ports, FastEthernet 0/0. My browser is open on my Nagios page and the router is in front of me. I unplug the cable, and after 1-2 minutes I see that the router has a problem, i.e. PING is in a down state and the interface is down. I plug the cable back in, and after 3-5 minutes I get an OK status on both services. So my problem is the minutes. For this router, when I unplug the cable I would like to get status information immediately: that FastEthernet 0/0 is down, that PING is down and, as expected, that the host is down.
What I have done so far is:
I enabled active_checks_enabled on the services, set check_interval to 0 (on demand), also active_checks_enabled, and I think that's it... and I still have no results.
Any idea whether this is possible, or does there have to be some delay for this type of check or notification? Also, I have to receive an email immediately when the host is down, and when the host comes back up I will give it 2-3 minutes to make sure it is really up and then send an email.
For host down there will have to be some kind of delay; otherwise your Nagios would have to be perpetually checking that one device and it would never get any other work done. Not only that, you would most likely get sooooo many false positives that it would drive people nuts.
Unless your device happens to be so incredibly critical (to the point where it being down for minutes is costing your company absurd amounts of money), anywhere from 5 to 10 minutes is a pretty safe interval for most devices. As far as I know the minimum check interval is 1 minute. There may be an obscure main config option to change it to seconds, but I don't remember ever seeing one... and if there is, I still don't recommend it.
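For reference, Nagios measures check_interval and retry_interval in "time units", and the length of one unit is set by interval_length in nagios.cfg (60 seconds by default). That is most likely the main config option hinted at above. The sketch below only does the unit conversion; the helper name interval_to_seconds is hypothetical.

```python
# Hypothetical helper: converts Nagios interval values to seconds.
# Assumption: interval_length is the nagios.cfg setting (default 60 seconds).
DEFAULT_INTERVAL_LENGTH = 60


def interval_to_seconds(units: float, interval_length: int = DEFAULT_INTERVAL_LENGTH) -> float:
    """Return the wall-clock length of `units` Nagios time units."""
    if units < 0:
        raise ValueError("intervals cannot be negative")
    return units * interval_length


if __name__ == "__main__":
    # With the default unit, check_interval 5 and 10 mean 300 s and 600 s.
    for value in (1, 5, 10):
        print(f"check_interval {value} -> {interval_to_seconds(value):.0f} s")
    # Lowering interval_length to 1 would make check_interval 10 mean 10 seconds.
    print(interval_to_seconds(10, interval_length=1))
```

Keep in mind that interval_length is global: changing it rescales every interval in the whole configuration, not just the intervals of the few critical routers.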
The minutes are very important. I will focus on 3-4 devices that need this kind of monitoring. I would like to know the moment the interfaces of that device are down, or the ping is down, or the host itself (in this situation the host is the router) is down. 5-10 minutes is not good for these 4 devices. I need to know within 5 seconds, or 10 seconds at most.
And now I have another question that came to mind. What about the log files? If, as you say, the router goes down at 13:00 and 5-10 minutes pass before I get a notification (the status goes from OK to DOWN, soft to hard state), the notification happens at 13:05, while in reality the host went down at 13:00. Will the log file show that the host was down at 13:00 or at 13:05?
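On the log question: each nagios.log entry starts with a Unix timestamp in square brackets, and a host alert carries host name, state, state type, attempt number and plugin output. That timestamp records when Nagios processed the check result, not when the outage actually started. A minimal parser, assuming that line format; the sample line and host name router10 are made up:

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional

# Matches lines such as (made-up example):
# [1700000000] HOST ALERT: router10;DOWN;SOFT;1;CRITICAL - Host Unreachable
ALERT_RE = re.compile(
    r"^\[(?P<ts>\d+)\] HOST ALERT: "
    r"(?P<host>[^;]+);(?P<state>[^;]+);(?P<type>SOFT|HARD);(?P<attempt>\d+);(?P<output>.*)$"
)


class HostAlert(NamedTuple):
    when: datetime   # when Nagios logged the result (UTC here)
    host: str
    state: str       # UP, DOWN or UNREACHABLE
    state_type: str  # SOFT or HARD
    attempt: int
    output: str


def parse_host_alert(line: str) -> Optional[HostAlert]:
    """Return a HostAlert for HOST ALERT lines, None for anything else."""
    m = ALERT_RE.match(line.strip())
    if not m:
        return None
    return HostAlert(
        when=datetime.fromtimestamp(int(m["ts"]), tz=timezone.utc),
        host=m["host"],
        state=m["state"],
        state_type=m["type"],
        attempt=int(m["attempt"]),
        output=m["output"],
    )
```

Assuming log_host_retries is enabled in nagios.cfg, SOFT entries are logged too, so the first failed check shows up in the log; the gap between that entry and the real start of the outage is still up to one check interval.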
Here is the real situation. We have 700 routers with leased lines and VPN connections to them, and it's really expensive. We notice that on some routers or lines we lose connectivity for long periods, or we don't have a stable connection... the line goes down for about 10-20 minutes, then comes back up, and so on. So we need log files as proof that we don't have the 24/7 connectivity we should. That's why I need this for some routers: to record, down to the second, how long the host is down.
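To turn the log into outage periods for this kind of proof, one approach is to pair each problem state with the next recovery. The sketch below assumes you have already extracted (timestamp, state, state_type) tuples for one host from its HOST ALERT lines; the function name outages and the count_soft option are hypothetical. The recorded start is only as precise as your check_interval and retries allow, and Nagios Core's built-in availability report may already give you similar numbers from the same log data.

```python
from typing import Iterable, List, Optional, Tuple

# (unix_timestamp, state, state_type) for ONE host, in log order.
Event = Tuple[int, str, str]


def outages(events: Iterable[Event], count_soft: bool = False) -> List[Tuple[int, int]]:
    """Pair each DOWN/UNREACHABLE state with the following UP.

    By default an outage starts at the first HARD problem state; with
    count_soft=True it starts at the first SOFT one (closer to the real
    start). An outage still open at the end is returned with end = -1.
    """
    result: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for ts, state, state_type in events:
        is_problem = state in ("DOWN", "UNREACHABLE")
        if is_problem and start is None and (count_soft or state_type == "HARD"):
            start = ts
        elif state == "UP" and start is not None:
            result.append((start, ts))
            start = None
    if start is not None:
        result.append((start, -1))
    return result
```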
On a slightly different note, could you please use just one account? It gets very confusing when you post as separate users, even though we generally know who you are.
As for monitoring this quickly, to reiterate what jsmurphy said, this is not a good idea. You would be MUCH better off configuring this device to send SNMP traps for link states and such. This way your device can send an alert to Nagios as soon as the cable is plugged in or unplugged, and Nagios can process it and send an alert. However, 5-10 seconds is unreasonable in any respect. The information has to be processed by the router/switch; if you use SNMP traps, it then has to be processed by snmptrapd and snmptt and sent as a passive result to Nagios; Nagios then has to register the issue and generate an email/SMS alert for you; and your mail server then needs to receive the message and transfer it to the recipients and their devices. 1-2 minutes is quite doable, but there are many factors: load on the various systems, network speeds and, especially with regard to email, how often the client checks for new mail.
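The "sent as a passive result to Nagios" step means writing a PROCESS_SERVICE_CHECK_RESULT external command into Nagios's command file. In a typical snmptrapd plus snmptt setup, an EXEC line in snmptt.conf calls a script that does this; the Python below is a hypothetical stand-in for such a script. The command file path is an assumption (the usual location for a source install), so check command_file in your nagios.cfg.

```python
import time
from typing import Optional

# Assumption: default command file of a source install of Nagios Core.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

RETURN_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def format_passive_result(host: str, service: str, status: str, output: str,
                          timestamp: Optional[int] = None) -> str:
    """Build a PROCESS_SERVICE_CHECK_RESULT external command line."""
    if timestamp is None:
        timestamp = int(time.time())
    code = RETURN_CODES[status]
    # Fields are separated by semicolons, so to be safe keep them
    # (and newlines) out of the free-text output.
    clean = output.replace(";", ",").replace("\n", " ")
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{clean}\n"


def submit(line: str, command_file: str = COMMAND_FILE) -> None:
    """Write the command to the Nagios command pipe (Nagios must be running)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

For example, format_passive_result("router10", "FastEthernet0/0", "CRITICAL", "Link down") yields a line that marks that (hypothetical) service CRITICAL as soon as Nagios reads the command file.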
As for your second question, the state history should record both soft and hard states and be accessible, so the first time Nagios notices an issue, it would be shown there. I am unsure whether the logs would show both as well, or just hard states.
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
Sorry that I use two separate users; one of them, IvanAK, is private (love this Nagios), and this one is for business. I will try to stick to one user.
So if I configure this device to send traps as you say, what else should I configure in Nagios? And what should I do to get the fastest possible response through the Nagios host or service configuration options? You say that 5-10 seconds is unreasonable; if 1 minute is the limit, what options should I change to get that result?
I'm just checking whether I'm right...
So if the router goes down at 13:00 and Nagios sees this at 13:03... the log file should show the router as down at 13:03, right?
And the final question: what about the ping service? When we ping a host and the cable is unplugged, we know the host is down after the second ping packet. Can I get the Nagios service to behave like that?
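A single Nagios ping check can indeed be made short by sending only a couple of packets with the check_ping plugin's -p option; what still adds delay is how often Nagios schedules that check. A sketch that builds and runs such a check, assuming the plugin lives in the usual libexec directory of a source install; the thresholds are illustrative, not recommendations:

```python
import subprocess
from typing import List, Tuple

# Assumption: plugin path of a source install; adjust to your libexec directory.
CHECK_PING = "/usr/local/nagios/libexec/check_ping"
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def ping_command(host: str, packets: int = 2, timeout: int = 5) -> List[str]:
    """Build a check_ping call that sends few packets so one check is quick."""
    return [
        CHECK_PING, "-H", host,
        "-w", "100.0,20%",   # illustrative: warn at 100 ms RTA or 20% loss
        "-c", "500.0,60%",   # illustrative: critical at 500 ms RTA or 60% loss
        "-p", str(packets),
        "-t", str(timeout),
    ]


def run_ping(host: str) -> Tuple[str, str]:
    """Run the plugin and map its exit code to a Nagios state name."""
    proc = subprocess.run(ping_command(host), capture_output=True, text=True)
    return STATES.get(proc.returncode, "UNKNOWN"), proc.stdout.strip()
```

With 2 packets and these thresholds, losing both packets (100% loss) is CRITICAL and losing one (50%) is WARNING.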
Thanks for trying to stay with one account. I don't mean to pick on you, just like I said, a little confusing on our end.
Well, since this is a passive check with SNMP traps, you would most likely want freshness checking set, so that an active check with check_ping or something similar runs if nothing has been submitted. I should clarify: if you are losing ping, as stated later, SNMP traps will be no better, as the remote device still needs to communicate with Nagios. The idea was that you would be receiving traps from something at the same location as the Nagios server, so that when an issue is noticed, the internal connection is still able to alert Nagios. If this is not the case, you would want to use something more like check_ping. Set your interval to 1 minute or so, and do 1-2 retries at a 0.5 interval. Honestly, this is the fastest I would recommend checking, due to potential false alarms and overloading Nagios, which also has to run your other checks.
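To make the freshness idea concrete, here is a small Python helper that renders a service definition for a trap-fed service. The directive names (passive_checks_enabled, check_freshness, freshness_threshold in seconds, check_command) are standard Nagios object directives; the host name, service name and threshold are made-up example values, and the check_ping arguments assume the check_ping command from the sample commands.cfg, which takes the warning and critical thresholds as $ARG1$ and $ARG2$.

```python
from typing import Dict


def render_service(directives: Dict[str, object]) -> str:
    """Render a Nagios 'define service' block from directive/value pairs."""
    width = max(len(k) for k in directives)
    lines = ["define service {"]
    for key, value in directives.items():
        lines.append(f"    {key.ljust(width)}  {value}")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical trap-fed service: passive results from snmptt, with a
# freshness check that runs check_ping if no result arrives in time.
TRAP_SERVICE = {
    "use": "generic-service",
    "host_name": "router10",
    "service_description": "FastEthernet0/0",
    "passive_checks_enabled": 1,
    "check_freshness": 1,
    "freshness_threshold": 300,   # seconds without a result before it is stale
    "check_command": "check_ping!100.0,20%!500.0,60%",
}

if __name__ == "__main__":
    print(render_service(TRAP_SERVICE), end="")
```

In practice you would simply write this block by hand in your object configuration; the generator only shows which directives belong together.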
And you are saying that if I monitor Router 1 with SNMP traps and the link to Router 2 goes down, I will notice, because Router 1 can send this notification to Nagios. But if I monitor Router 10 and the interface on Router 10 goes down (the interface connected to the ISP), I will not get the information immediately; I have to wait for Nagios to perform the check. Right?
As I understand it now, there is no big difference between active and passive checks in my situation, because the topology of these 5 routers that I want to check actively is like that of Router 10: they have just one connection, to the ISP and through it to Nagios. So what do I have to do to get this as fast as possible? Configure freshness checking in Nagios, set the interval to 1 minute, and use 1-2 retries? Right?
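It helps to work out what those settings mean in seconds. Note that max_check_attempts counts the first check, so "1-2 retries" means max_check_attempts of 2 or 3. The estimate below is a rough upper bound, assuming the outage begins just after a successful check; it ignores result processing, notification and email delivery time. The function name is hypothetical.

```python
def worst_case_hard_seconds(check_interval: float, retry_interval: float,
                            max_check_attempts: int, check_duration: float = 0.0,
                            interval_length: int = 60) -> float:
    """Rough upper bound on seconds from outage start to a HARD state.

    Nagios waits up to a full check_interval for the first failing check,
    then (max_check_attempts - 1) retries spaced by retry_interval.
    check_duration adds each check's own run time (for an unreachable
    host this is roughly the plugin timeout).
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be >= 1")
    waits = check_interval + (max_check_attempts - 1) * retry_interval
    return waits * interval_length + max_check_attempts * check_duration


if __name__ == "__main__":
    # check_interval 1, retry_interval 0.5, 1 or 2 retries:
    print(worst_case_hard_seconds(1, 0.5, 2))   # 90.0
    print(worst_case_hard_seconds(1, 0.5, 3))   # 120.0
```

So with the suggested settings, a HARD state is reached within roughly 1.5 to 2 minutes plus check run time, which matches the "1-2 minutes is pretty doable" estimate earlier in the thread.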
If your Nagios server is able to contact Router 10 and none of the routes leading to this router go down, you will be notified as soon as Nagios gets the result of its active check, or a passive check result is received. If something on the ISP's end goes down, or Router 1 or Switch 2, before you get the check result back from Router 10, it will appear that Router 10 is down as well. This is why we recommend using parent hosts for devices that sit higher up in your network or lie between Nagios and one of your hosts. That way, if the parent goes down, the child goes into an UNREACHABLE state, since you do not really know whether the child is down or just unreachable.
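The parent logic can be modelled roughly like this: a host that fails its check is DOWN if at least one of its parents is UP, and UNREACHABLE if none of them is. This is a simplified sketch of Nagios's reachability logic (real Nagios also re-checks parents when a child fails); the topology and host names are hypothetical, mirroring the parents directive you would set in host definitions.

```python
from typing import Dict, List, Set

# Hypothetical topology: host -> parent hosts (empty = no parent).
PARENTS: Dict[str, List[str]] = {
    "isp-gw": [],
    "router1": ["isp-gw"],
    "router10": ["router1"],
}


def classify(failing: Set[str], parents: Dict[str, List[str]]) -> Dict[str, str]:
    """Return UP, DOWN or UNREACHABLE for every host in `parents`.

    A failing host is UNREACHABLE when it has parents and none of them
    is UP; otherwise it is DOWN. Assumes there are no parent cycles.
    """
    states: Dict[str, str] = {}

    def resolve(host: str) -> str:
        if host in states:
            return states[host]
        if host not in failing:
            states[host] = "UP"
        else:
            ps = parents.get(host, [])
            if ps and all(resolve(p) != "UP" for p in ps):
                states[host] = "UNREACHABLE"
            else:
                states[host] = "DOWN"
        return states[host]

    for host in parents:
        resolve(host)
    return states
```

If router1 and router10 both stop answering, router1 is DOWN and router10 is UNREACHABLE; if only router10 stops answering, it is DOWN.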
OK, now I've got it. So I can take the next step. Because all 5 of those critical routers terminate their VPN tunnels on Router 1, I can configure SNMP traps on Router 1 to tell Nagios when those routers are down. So I guess that with SNMP traps I can get the information as fast as possible: Router 1 will tell Nagios, via traps, if Router 10 is down. If I'm right, I just have to configure Router 1 to send traps to Nagios and configure Nagios to receive them? Right?
I'd say Router 10 would tell Nagios it is down, not Router 1 telling Nagios that Router 10 is down, unless you have a way to collect traps from other routers on your network with one router and forward them to Nagios. Nagios will not automatically receive traps unless you have set up Router 10 to send them.
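To illustrate why the trap comes from Router 10 itself: the translation layer (snmptt or a script it calls) typically identifies the Nagios host from the address of the device that sent the trap. A hypothetical sketch of that mapping for the standard linkDown/linkUp notifications; the address table, host name and service naming are made up.

```python
from typing import Dict, Optional, Tuple

# Standard linkDown / linkUp notification OIDs (IF-MIB).
LINK_DOWN = "1.3.6.1.6.3.1.1.5.3"
LINK_UP = "1.3.6.1.6.3.1.1.5.4"

# Hypothetical map from the sending device's address to its Nagios host name.
AGENTS: Dict[str, str] = {"10.0.0.10": "router10"}


def trap_to_result(agent_addr: str, trap_oid: str,
                   if_descr: str) -> Optional[Tuple[str, str, int, str]]:
    """Translate a link trap into (host, service, return_code, output).

    Returns None for unknown agents or traps this sketch does not handle.
    The host is looked up from the address of the device that sent the
    trap, so a trap from Router 10 describes Router 10's own interfaces.
    """
    host = AGENTS.get(agent_addr)
    if host is None:
        return None
    if trap_oid == LINK_DOWN:
        return host, if_descr, 2, f"CRITICAL - {if_descr} link down"
    if trap_oid == LINK_UP:
        return host, if_descr, 0, f"OK - {if_descr} link up"
    return None
```

The resulting tuple is what would then be submitted to Nagios as a passive service check result.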