
Re: Latency Uptime Warnings/Alerts and Juniper Switches

Posted: Mon Dec 04, 2017 2:51 pm
by emssa
From the command line I just get "server port 80 open", etc., and nothing about latency. So yes, I guess this is what you meant, if I were willing to sweep network latency issues under the rug and go with more of an up-or-down approach.

As for /etc/host.conf, I just have "multi on" in it, so I'm not sure where the statement you provided is kept.

Is that a Nagios-specific thing?

Re: Latency Uptime Warnings/Alerts and Juniper Switches

Posted: Mon Dec 04, 2017 3:55 pm
by kyang
> from the command line I just get server port 80 open etc. nothing with latency so yes i guess this is what you meant if i was willing to sweep network latency issues under the rug and go for a more it's up or down approach
What command did you try when you got this? Could you show us?

Or you could open a separate thread or ticket so we can handle that issue.
> so not sure where the statement you provided is kept
As for this, I'm not sure what you mean. Was this in reference to what @dwhitfield said in his previous post?

Host and service configs are usually kept in these locations. Is that what you mean?

Code:

/usr/local/nagios/etc/hosts
/usr/local/nagios/etc/services
Could you also run a tcpdump of the traffic to and from the switch to see what is actually happening?

Code:

yum -y install tcpdump

tcpdump -s 0 -i any host <IPofSwitch>

Re: Latency Uptime Warnings/Alerts and Juniper Switches

Posted: Tue Dec 05, 2017 8:13 am
by emssa
The command was a simple tcping <server> <port>, as per the man page.

Yes, he mentioned what was in my host config, but now that I know he meant

/usr/local/nagios/etc/hosts
/usr/local/nagios/etc/services

I will look there.
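Since the CCM writes one or more .cfg files into those two directories, a quick way to find every file that mentions a given host is to scan for its host_name directive. The sketch below is a minimal, hypothetical helper (find_host_files is not part of Nagios); it assumes the files use the plain "host_name value" layout shown later in this thread and that the paths above are correct for your install.

```python
import re
import sys
from pathlib import Path

# Directories named in the thread; adjust if your install keeps configs elsewhere.
CONFIG_DIRS = ("/usr/local/nagios/etc/hosts", "/usr/local/nagios/etc/services")

# Matches a "host_name <value>" directive; the value may be a comma-separated list.
HOST_NAME_RE = re.compile(r"^\s*host_name\s+(.+?)\s*$")


def find_host_files(host, dirs=CONFIG_DIRS):
    """Return the sorted paths of .cfg files whose host_name includes `host`."""
    matches = []
    for d in dirs:
        base = Path(d)
        if not base.is_dir():
            continue  # skip directories that do not exist on this machine
        for cfg in base.rglob("*.cfg"):
            with cfg.open(encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    m = HOST_NAME_RE.match(line)
                    if m and host in [h.strip() for h in m.group(1).split(",")]:
                        matches.append(str(cfg))
                        break  # one hit per file is enough
    return sorted(matches)


if __name__ == "__main__":
    # Usage: python find_host.py <host_name>
    for path in find_host_files(sys.argv[1]):
        print(path)
```

Remember the CCM header in those files: edits belong in the CCM (or the 'static' directory), not in the generated files themselves.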

We do not have access to our switches, routers, and firewalls, which is why it is so important that we monitor everything we can from the network with tools like Nagios.

Mainly I wanted to see whether this was a known issue with a simple fix, but it sounds like it points to the need to turn off CoS/QoS on the network devices rather than trying to force Nagios to work around it. As stated, this is also affecting HA and SAN balancing (HIT Kit).

Re: Latency Uptime Warnings/Alerts and Juniper Switches

Posted: Tue Dec 05, 2017 8:36 am
by emssa
Here is an example of the ping results we are seeing from one source host:

64 bytes from servera: icmp_seq=1 ttl=64 time=6.38 ms
64 bytes from serverb: icmp_seq=1 ttl=64 time=0.120 ms
64 bytes from serverc: icmp_seq=1 ttl=64 time=0.117 ms
64 bytes from serverd: icmp_seq=1 ttl=64 time=0.087 ms
64 bytes from servere: icmp_seq=1 ttl=64 time=0.121 ms
64 bytes from serverf: icmp_seq=1 ttl=64 time=0.089 ms
64 bytes from serverg: icmp_seq=1 ttl=64 time=54.6 ms
64 bytes from serverh: icmp_seq=1 ttl=64 time=171 ms
64 bytes from serveri: icmp_seq=1 ttl=64 time=219 ms
64 bytes from serverj: icmp_seq=1 ttl=64 time=265 ms
64 bytes from serverk: icmp_seq=1 ttl=64 time=354 ms
64 bytes from serverl: icmp_seq=1 ttl=64 time=435 ms
64 bytes from serverm: icmp_seq=1 ttl=64 time=949 ms

These hosts are all on the same /24 and behind the same gateway, yet the delays are not just the 10 off that the Juniper article explains. They are large enough to cause problems with thresholds in things such as Corosync/Pacemaker, with false alerts from Nagios (though to be fair, Nagios is just reporting what it is presented), and with the dynamic load balancing of the HIT Kit our SANs use, including iscsi_tcp dropping connections on timeouts.
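To see at a glance which hosts cross a given round-trip threshold, the ping output can be filtered line by line. This is a minimal sketch assuming Linux iputils-style reply lines like the ones above; slow_replies is a hypothetical helper, and the 50 ms threshold is only an example, not a recommended value for Corosync, iSCSI, or Nagios.

```python
import re

# Matches Linux ping reply lines such as
# "64 bytes from servera: icmp_seq=1 ttl=64 time=6.38 ms"
REPLY_RE = re.compile(r"from (?P<host>[^:\s]+).*?\btime=(?P<ms>[\d.]+) ms")


def slow_replies(lines, threshold_ms):
    """Return (host, rtt_ms) pairs for replies slower than threshold_ms."""
    slow = []
    for line in lines:
        m = REPLY_RE.search(line)
        if m is None:
            continue  # skip headers, summaries and blank lines
        rtt = float(m.group("ms"))
        if rtt > threshold_ms:
            slow.append((m.group("host"), rtt))
    return slow


# A few of the reply lines posted above.
sample = """\
64 bytes from servera: icmp_seq=1 ttl=64 time=6.38 ms
64 bytes from serverb: icmp_seq=1 ttl=64 time=0.120 ms
64 bytes from serverg: icmp_seq=1 ttl=64 time=54.6 ms
64 bytes from serverm: icmp_seq=1 ttl=64 time=949 ms"""

# 50 ms is an arbitrary example threshold.
print(slow_replies(sample.splitlines(), 50.0))
# [('serverg', 54.6), ('serverm', 949.0)]
```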


Here is the host config from our Nagios server:

###############################################################################
#
# Host configuration file
#
# Created by: Nagios Core Config Manager 2.6.8
# Date: 2017-08-18 18:45:24
# Version: Nagios 3.x config file
#
# --- DO NOT EDIT THIS FILE BY HAND ---
# Nagios CCM will overwrite all manual settings during the next update if you
# would like to edit files manually, place them in the 'static' directory or
# import your configs into the CCM by placing them in the 'import' directory.
#
###############################################################################

define host {
host_name host
use xiwizard_generic_host
address host
parents gateway
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
contacts nagiosadmin
contact_groups admins
notification_interval 60
notification_period xi_timeperiod_24x7
_xiwizard autodiscovery
register 1
}

###############################################################################
#
# Host configuration file
#
# END OF FILE
#
###############################################################################


And the service config for our Nagios server:

###############################################################################
#
# Service configuration file
#
# Created by: Nagios Core Config Manager 2.6.7
# Date: 2017-07-25 19:27:31
# Version: Nagios 3.x config file
#
# --- DO NOT EDIT THIS FILE BY HAND ---
# Nagios CCM will overwrite all manual settings during the next update if you
# would like to edit files manually, place them in the 'static' directory or
# import your configs into the CCM by placing them in the 'import' directory.
#
###############################################################################

define service {
host_name host
service_description HTTP
use xiwizard_website_http_service
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts user
contact_groups admins
_xiwizard autodiscovery
register 1
}

define service {
host_name host
service_description HTTPS
use xiwizard_website_http_service
check_command check_xi_service_http!-S
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts user
contact_groups admins
_xiwizard autodiscovery
register 1
}

define service {
host_name host
service_description Ping
use xiwizard_genericnetdevice_ping_service
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts nagiosadmin
contact_groups admins
_xiwizard autodiscovery
register 1
}

define service {
host_name host
service_description SSH
use xiwizard_ssh_service
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts user
contact_groups admins
_xiwizard autodiscovery
register 1
}

define service {
host_name host
service_description TCP Port 3306 - mysql
use xiwizard_tcp_service
check_command check_xi_service_tcp!-p 3306
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts user
contact_groups admins
_xiwizard autodiscovery
register 1
}

###############################################################################
#
# Service configuration file
#
# END OF FILE
#
###############################################################################
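With max_check_attempts 5 and retry_interval 1, a problem has to persist through several re-checks before it becomes a HARD state, which is when notifications go out. The arithmetic is sketched below, assuming the default interval_length of 60 seconds in nagios.cfg and ignoring check execution time and scheduler jitter; seconds_to_hard_state is a hypothetical helper, not a Nagios function.

```python
def seconds_to_hard_state(max_check_attempts, retry_interval, interval_length=60):
    """Approximate seconds from the first failed check until the state is HARD.

    The first failure is attempt 1 (SOFT). Nagios then re-checks every
    retry_interval * interval_length seconds; when the attempt number reaches
    max_check_attempts the state becomes HARD and notifications can be sent.
    Check execution time and scheduler jitter are ignored.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    return (max_check_attempts - 1) * retry_interval * interval_length


# Values used in the host and service definitions above.
print(seconds_to_hard_state(max_check_attempts=5, retry_interval=1))  # 240
```

So with these settings, a latency spike that clears within about four minutes should only produce SOFT states.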

Re: Latency Uptime Warnings/Alerts and Juniper Switches

Posted: Tue Dec 05, 2017 11:22 am
by dwhitfield
I was saying you could use tcping instead of ICMP ping to get around the ICMP ping issues with the Juniper routers. Please see https://assets.nagios.com/downloads/nag ... _In_XI.pdf
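The tcping build used earlier only reports whether the port is open, so it gives no latency figure. As a rough sketch of getting one, the hypothetical helper tcp_connect_ms below times a single TCP handshake with Python's standard socket module. It is not the tcping tool itself; inside Nagios, the check_tcp plugin (used above for port 3306) already reports a similar response time.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=5.0):
    """Return how long a TCP handshake to host:port takes, in milliseconds.

    Only the connect() is timed, so the result reflects TCP rather than ICMP
    handling along the path. If `host` is a name, DNS lookup time is included;
    pass an IP address to exclude it. Raises OSError (e.g. ConnectionRefusedError
    or a timeout) when the port cannot be reached.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0


if __name__ == "__main__":
    import sys
    # Usage: python tcp_latency.py <host> <port>
    print(f"{tcp_connect_ms(sys.argv[1], int(sys.argv[2])):.2f} ms")
```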

I'm not sure I follow the 302 issue you mentioned. A 302 on HTTP is not an error; it is a redirect telling the client that the page was found, but temporarily at a different URL. Can you post a screenshot of the 302 message in Nagios so we can get a better sense of what is going on?

Re: Latency Uptime Warnings/Alerts and Juniper Switches

Posted: Tue Dec 05, 2017 11:41 am
by emssa
Thanks for the link on tcping.

As for the 302s, I usually catch those in the GUI as alerts on a service, but they are also being logged:

grep 302 /usr/local/nagios/var/nagios.log

[1512473344] SERVICE ALERT: hosta;HTTPS;OK;SOFT;2;HTTP OK: HTTP/1.1 302 Moved Temporarily - 493 bytes in 2.887 second response time
[1512476989] SERVICE ALERT: hostb;HTTPS;OK;SOFT;2;HTTP OK: HTTP/1.1 302 Moved Temporarily - 493 bytes in 2.544 second response time
[1512478242] SERVICE ALERT: hostc;HTTPS;OK;SOFT;2;HTTP OK: HTTP/1.1 302 Moved Temporarily - 493 bytes in 3.363 second response time
[1512481888] SERVICE ALERT: hostd;HTTPS;OK;SOFT;2;HTTP OK: HTTP/1.1 302 Moved Temporarily - 493 bytes in 4.014 second response time
[1512487172] SERVICE ALERT: hoste;HTTPS;OK;SOFT;2;HTTP OK: HTTP/1.1 302 Moved Temporarily - 493 bytes in 2.904 second response time
[1512487710] SERVICE ALERT: hostf;HTTPS;OK;SOFT;2;HTTP OK: HTTP/1.1 302 Moved Temporarily - 565 bytes in 3.425 second response time
[1512489022] SERVICE ALERT: hostg;HTTPS;OK;SOFT;2;HTTP OK: HTTP/1.1 302 Moved Temporarily - 493 bytes in 2.590 second response time
[1512490154] SERVICE ALERT: hosth;HTTPS;OK;SOFT;2;HTTP OK: HTTP/1.1 302 Moved Temporarily - 565 bytes in 2.614 second response time
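Each SERVICE ALERT line has the layout [epoch] SERVICE ALERT: host;service;state;state_type;attempt;plugin output, so splitting out the state type, attempt number, and check_http response time makes it easier to see whether a problem ever reached a HARD state and how slow the checks were. A minimal sketch, assuming that log layout; parse_service_alerts is hypothetical, and the 2.5 s cut-off in the usage block is only an example.

```python
import re
from datetime import datetime, timezone

# nagios.log layout:
# [epoch] SERVICE ALERT: host;service;state;state_type;attempt;plugin output
ALERT_RE = re.compile(
    r"^\[(?P<ts>\d+)\] SERVICE ALERT: "
    r"(?P<host>[^;]*);(?P<service>[^;]*);(?P<state>[^;]*);"
    r"(?P<type>[^;]*);(?P<attempt>\d+);(?P<output>.*)$"
)
# check_http output ends with e.g. "2.887 second response time"
RESP_RE = re.compile(r"([\d.]+) second response time")


def parse_service_alerts(lines):
    """Yield one dict per SERVICE ALERT line; all other lines are skipped."""
    for line in lines:
        m = ALERT_RE.match(line.strip())
        if m is None:
            continue  # e.g. CURRENT SERVICE STATE or HOST ALERT lines
        rec = m.groupdict()
        rec["time"] = datetime.fromtimestamp(int(rec.pop("ts")), tz=timezone.utc)
        rec["attempt"] = int(rec["attempt"])
        r = RESP_RE.search(rec["output"])
        rec["response_s"] = float(r.group(1)) if r else None
        yield rec


if __name__ == "__main__":
    # Usage: python slow_alerts.py < /usr/local/nagios/var/nagios.log
    import sys
    for rec in parse_service_alerts(sys.stdin):
        if rec["response_s"] is not None and rec["response_s"] > 2.5:
            print(rec["time"].isoformat(), rec["host"], rec["service"],
                  rec["state"], rec["type"], rec["attempt"], rec["response_s"])
```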

Re: Latency Uptime Warnings/Alerts and Juniper Switches

Posted: Tue Dec 05, 2017 11:44 am
by emssa
Also seeing variants of those for HTTPS, and:

[1512432000] CURRENT HOST STATE:

[1512432000] CURRENT SERVICE STATE:

besides the SERVICE ALERT lines.

Re: Latency Uptime Warnings/Alerts and Juniper Switches

Posted: Tue Dec 05, 2017 12:13 pm
by dwhitfield
Each one of those says the service is OK. What exactly is the problem with the HTTP check?

Re: Latency Uptime Warnings/Alerts and Juniper Switches

Posted: Wed Dec 06, 2017 11:19 am
by emssa
They start out not OK: first we receive alerts saying the service is CRITICAL.

The 302 is the follow-up when the error clears itself, and it really shouldn't be a 302 to begin with, because the service does not redirect.

First we get a CRITICAL alert, as I posted at the beginning of the thread, and then we see this:
[Screenshot from 2017-12-06 11-02-42.png]

Re: Latency Uptime Warnings/Alerts and Juniper Switches

Posted: Wed Dec 06, 2017 2:44 pm
by kyang
Is the web address you are checking for the HTTPS service on this host the actual URL?

When you type the URL into the address bar, does the website redirect to a different one or not?

You could try following the redirect by using -f:

Code:

 -f, --onredirect=<ok|warning|critical|follow|sticky|stickyport>
    How to handle redirected pages. sticky is like follow but stick to the
    specified IP address. stickyport also ensures port stays the same.
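Before changing -f, it may help to confirm whether the server really answers with a redirect, since the log lines above show check_http receiving a 302 even though the service is said not to redirect. The sketch below sends one GET without following redirects and returns the status code and Location header. first_response is a hypothetical helper and the URL you pass is a placeholder; note that Python verifies HTTPS certificates by default, so a host with a self-signed certificate may fail here even when check_http succeeds.

```python
import http.client
from urllib.parse import urlsplit


def first_response(url, timeout=10.0):
    """Fetch url once WITHOUT following redirects.

    Returns (status, location); location is the Location header value,
    or None when the server did not send one.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


if __name__ == "__main__":
    import sys
    # Usage: python first_response.py https://hosta/
    print(first_response(sys.argv[1]))
```

A (302, '/some/path') result means the server itself is redirecting and -f follow would let check_http test the final page; a 200 with no Location would point back at something intermittent on the path.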