if HOST not reachable, SERVICE Status should not change

Posted: Sun May 24, 2020 4:30 am
by zaji_nms
Dear Expert

We previously asked for a feature so that when a HOST is not reachable, none of its SERVICE statuses change to WARNING.

Please check the history below. The SERVICE has been down for more than a year, but whenever the HOST becomes unreachable, the service status changes from CRITICAL to WARNING and generates an alert/notification. When the HOST becomes reachable again, the service status goes back to CRITICAL, which generates another alert with a new timestamp and sends another notification. Why?

Date / Time | Host | Service | State | State Type | Attempt | Information
2020-05-22 17:07:39 | my-router-1 | my-link-1 Status | CRITICAL | HARD | 5 of 5 | CRITICAL: Interface 1-1-3-1 (index 180) is down.
2020-04-28 07:31:06 | my-router-1 | my-link-1 Status | CRITICAL | HARD | 5 of 5 | CRITICAL: Interface 1-1-3-1 (index 180) is down.
2020-04-28 07:21:16 | my-router-1 | my-link-1 Status | WARNING | HARD | 5 of 5 | WARNING: SNMP error: No response from remote host 'my-router-1'
2019-12-26 14:43:38 | my-router-1 | my-link-1 Status | CRITICAL | HARD | 5 of 5 | CRITICAL: Interface 1-1-3-1 (index 180) is down.
2019-12-26 14:38:45 | my-router-1 | my-link-1 Status | WARNING | HARD | 5 of 5 | WARNING: SNMP error: No response from remote host 'my-router-1'
2019-12-26 14:23:45 | my-router-1 | my-link-1 Status | CRITICAL | HARD | 5 of 5 | CRITICAL: Interface 1-1-3-1 (index 180) is down.
2019-12-26 14:18:51 | my-router-1 | my-link-1 Status | WARNING | HARD | 5 of 5 | WARNING: SNMP error: No response from remote host 'my-router-1'
2018-10-13 02:49:12 | my-router-1 | my-link-1 Status | CRITICAL | HARD | 5 of 5 | CRITICAL: Interface 1-1-3-1 (index 180) is down.
2018-10-13 02:44:25 | my-router-1 | my-link-1 Status | WARNING | HARD | 5 of 5 | WARNING: SNMP error: No response from remote host 'my-router-1'
2018-07-18 11:58:34 | my-router-1 | my-link-1 Status | CRITICAL | HARD | 5 of 5 | CRITICAL: Interface 1-1-3-1 (index 180) is down.
2018-07-18 11:27:55 | my-router-1 | my-link-1 Status | OK | HARD | 5 of 5 | OK: Interface 1-1-3-1 (index 180) is up.
2018-07-18 10:28:14 | my-router-1 | my-link-1 Status | CRITICAL | HARD | 5 of 5 | CRITICAL: Interface 1-1-3-1 (index 180) is down.
2018-07-04 10:38:09 | my-router-1 | my-link-1 Status | OK | HARD | 5 of 5 | OK: Interface 1-1-3-1 (index 180) is up.

regards

Re: if HOST not reachable, SERVICE Status should not change

Posted: Tue May 26, 2020 4:31 pm
by ssax
It's because the plugin is returning a WARNING when it can't connect:

Code:

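# get_request returns undef when the SNMP query fails, e.g. when the host does not respond
if (!defined($response = $session->get_request(@snmpoids))) {
        $answer=$session->error;
        $session->close;
        # the plugin reports this failure as WARNING, which is the state change seen in the history
        $state = 'WARNING';
        print ("$state: SNMP error: $answer\n");
        exit $ERRORS{$state};
}
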
You can set host_down_disable_service_checks=1 in your /usr/local/nagios/etc/nagios.cfg and restart the nagios service so that the service checks don't even try to run if the host is in a problem state (hard or soft).

Otherwise you could modify the plugin to return CRITICAL instead of WARNING. NOTE: If you do this and you upgrade XI, it will likely need to be re-edited after every upgrade.
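
For the first option, a quick way to confirm that the directive is set on an uncommented line is sketched below. This is only an illustration: `get_directive` is a hypothetical helper name, and the demo runs against a throwaway file rather than the real /usr/local/nagios/etc/nagios.cfg.

```shell
#!/bin/sh
# get_directive is a hypothetical helper, not part of Nagios.
# It prints the value of every uncommented "name=value" line for the
# given directive; lines starting with "#" never match.
get_directive() {
    # $1 = config file, $2 = directive name
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1"
}

# Demo on a temporary file so the sketch runs anywhere.
demo_cfg=$(mktemp)
printf '%s\n' \
    '# host_down_disable_service_checks=0' \
    'host_down_disable_service_checks=1' > "$demo_cfg"

get_directive "$demo_cfg" host_down_disable_service_checks   # prints: 1

rm -f "$demo_cfg"
```

Pointing the helper at /usr/local/nagios/etc/nagios.cfg shows whether the setting is present in that file. A changed value only takes effect after the nagios service has been restarted.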

Re: if HOST not reachable, SERVICE Status should not change

Posted: Wed May 27, 2020 1:14 am
by zaji_nms
Dear ssax,

I think we are the ones who asked for this feature a long time ago, but there seems to be a bug: it still does not work as it should, even though we set it in our config file long ago, as shown below. I think it only suppresses the alert in the Home Operations Center (I am not sure about that either), while the email notification is still triggered. Please do some lab testing at your end.

more /usr/local/nagios/etc/nagios.cfg | grep host_down_disable_service_checks
host_down_disable_service_checks=1

Can you please provide the full path and name of the script to edit? Even better, please post the script with the edited code (script name with full path).

regards

Re: if HOST not reachable, SERVICE Status should not change

Posted: Wed May 27, 2020 3:58 pm
by ssax
Please go to Reports > State History:
- Adjust the Period to include the time this occurred
- Select the host from the Limit To dropdown
--- Don't limit on the Services, we want to see host and service states
- For Type, select Both
- For State Type, select Both
- For State, select Any
- Click Run

I want to see both the host and the service states to determine whether there was a bug. If the report doesn't include a host state, go back further in time so that it includes one along with the service states already listed.

Please send me the report; you can download it as either a PDF or a CSV.

This is the plugin:

Code:

/usr/local/nagios/libexec/check_ifoperstatus

Code:

# lines 158-164 of the plugin
if (!defined($response = $session->get_request(@snmpoids))) {
        $answer=$session->error;
        $session->close;
        $state = 'WARNING';
        print ("$state: SNMP error: $answer\n");
        exit $ERRORS{$state};
}
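
The change suggested earlier in the thread (returning CRITICAL instead of WARNING when the SNMP request gets no response) could be applied with something like the sketch below. It is an illustration under assumptions, not a tested patch for any particular XI version: `patch_snmp_error_state` is a hypothetical helper name, and the demo works on a temporary copy of the snippet rather than on the real plugin.

```shell
#!/bin/sh
# patch_snmp_error_state is a hypothetical helper, not shipped with Nagios.
# It changes 'WARNING' to 'CRITICAL' only between the line containing
# get_request(@snmpoids) and the next line that starts with "}", so other
# WARNING assignments in the file are left alone.
patch_snmp_error_state() {
    # $1 = plugin file; the original is kept as "$1.bak"
    cp -p "$1" "$1.bak" &&
    sed "/get_request(@snmpoids)/,/^}/ s/'WARNING'/'CRITICAL'/" "$1.bak" > "$1"
}

# Demo on a temporary file that mimics the plugin excerpt.
demo=$(mktemp)
cat > "$demo" <<'EOF'
$state = 'WARNING';
if (!defined($response = $session->get_request(@snmpoids))) {
        $answer=$session->error;
        $session->close;
        $state = 'WARNING';
        print ("$state: SNMP error: $answer\n");
        exit $ERRORS{$state};
}
EOF

patch_snmp_error_state "$demo"
grep -n "state = " "$demo"   # the first assignment stays WARNING, the one in the block becomes CRITICAL

rm -f "$demo" "$demo.bak"
```

To use it for real, the helper would be run against /usr/local/nagios/libexec/check_ifoperstatus with sufficient privileges. As noted earlier in the thread, an XI upgrade will likely overwrite the plugin, so the edit may need to be reapplied afterwards.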