Host shows down but services are ok?
Posted: Fri Apr 19, 2019 8:45 am
by bmallett
I have a group of wireless APs that are assigned IP addresses via DHCP. To check them, I wrote a plugin that takes an AP's MAC address and its parent's hostname (as a DNS name), telnets to the parent switch, runs 'show arp', parses the ARP table for the line matching that MAC address, returns the assigned IP address, and then pings that IP. The service works and shows 'OK', but the host shows 'DOWN' because of 'check_ping'. I have not assigned 'check_ping' to this host or hostgroup. Is there a way to have Nagios derive the host status from the result of the service? Also, why is Nagios still using 'check_ping' for something where it isn't 'assigned'?
HOST STATUS:
Host Status: DOWN (for 0d 20h 3m 38s)
Status Information: check_ping: Invalid hostname/address - AD-02-rm02-storage
Usage:
check_ping -H <host_address> -w <wrta>,<wpl>% -c <crta>,<cpl>%
[-p packets] [-t timeout] [-4
Performance Data: -6]
SERVICE STATUS:
Service State Information
Current Status: OK (for 0d 16h 58m 29s)
Status Information: (No output on stdout) stderr:
Re: Host shows down but services are ok?
Posted: Fri Apr 19, 2019 1:52 pm
by cdienger
A service must be assigned to a host definition, and a host definition must define a check_command (which points to a plugin). The check_command doesn't have to be directly assigned - it can be assigned through a template. If you look at the host definition you should see a "use" line. This points to a template where the check_command is likely assigned.
The host's check_command doesn't really differ from a service's, meaning you could get rid of the current service definition and modify the host definition to use the plugin you created.
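As a rough sketch of that second option, the host definition could call the custom plugin directly as its check_command. The command name check_mac_host is hypothetical, the host, template, and plugin names are only illustrative, and the sketch assumes the custom variables _MACADDRESS and _PARENT_DNS are set on the host:

```
# Hypothetical command that runs the custom plugin as a host check.
define command {
    command_name    check_mac_host
    command_line    $USER1$/check_mac $_HOSTMACADDRESS$ $_HOSTPARENT_DNS$
}

define host {
    use             generic-access-point   ; template assumed to exist
    host_name       PS-122-102
    check_command   check_mac_host         ; overrides the check_command inherited from the template
    _MACADDRESS     A4:93:4C:43:5D:AD
    _PARENT_DNS     PS-122-3750X-01-111
}
```

With something like this in place, the host state would follow the plugin's exit code (0 is treated as UP, 2 as DOWN), so the plugin needs to exit non-zero when the AP does not respond.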
Re: Host shows down but services are ok?
Posted: Fri Apr 19, 2019 2:19 pm
by bmallett
Got that part working now. Thanks.
The issue I am having now is making sure I am properly passing the variables from nagios to the plugin. Every single one is returning 'OK' when it shouldn't...
I am currently doing this in the PHP:
Code: Select all
$mac = ( isset( $argv[1] ) ? $argv[1] : null );
$host = ( isset( $argv[2] ) ? $argv[2] : null );
When I call the plugin with nagios, I am calling the command like this:
Code: Select all
define command{
    command_name    check_mac
    command_line    $USER1$/check_mac $_HOSTMACADDRESS$ $_HOSTPARENT_DNS$
}
In the host definitions, I am adding the custom needed variables like this:
Code: Select all
_MACADDRESS A4:93:4C:43:5A:DA
_PARENT_DNS PS-122-3750X-01-111
Technically I could use the 'parents' field in the host definition, but I am unsure how to grab/use it.
Finally, in my service definition, I am using this:
Code: Select all
define service {
    use                     generic-service
    hostgroup_name          access-points
    servicegroups           ap-status
    service_description     Get IP from MAC ADDRESS and Ping for AP Status
    check_command           check_mac!$_HOSTMAC_ADDRESS$!$_HOSTPARENT_DNS$
}
What do I need to change in order to pass the parent field, if possible, or just the two custom fields to the php plugin?
Code: Select all
define host {
    use                     generic-access-point
    host_name               PS-122-102
    alias                   Access Point
    display_name            AP RM xxx
    # address
    parents                 PS-122-3750X-01-111
    hostgroups              access-points
    _MACADDRESS             A4:93:4C:43:5D:AD
    _PARENT_DNS             PS-122-3750X-01-111
    process_perf_data       1
    icon_image              Access-Point.png
    icon_image_alt          Access Point
    vrml_image              Access-Point.gd2
}
This is my first time building something for Nagios. Thanks.
Re: Host shows down but services are ok?
Posted: Fri Apr 19, 2019 2:54 pm
by cdienger
Try this for the command:
Code: Select all
define command{
    command_name    check_mac
    command_line    $USER1$/check_mac $ARG1$ $ARG2$
}
and this for the service:
Code: Select all
define service {
    use                     generic-service
    hostgroup_name          access-points
    servicegroups           ap-status
    service_description     Get IP from MAC ADDRESS and Ping for AP Status
    check_command           check_mac!$_MACADDRESS$!$_PARENT_DNS$
}
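For reference, the Nagios Core documentation on custom object variables describes host-level custom variables as being exposed through macros named $_HOST<VARNAME>$, so _MACADDRESS would correspond to $_HOSTMACADDRESS$ and _PARENT_DNS to $_HOSTPARENT_DNS$. Under that naming, the earlier $_HOSTMAC_ADDRESS$ (with an extra underscore) would not match any variable on the host. If the macros in the service come through empty or unexpanded, the documented form may be worth trying. A sketch only, to be adapted as needed:

```
define service {
    use                     generic-service
    hostgroup_name          access-points
    service_description     Get IP from MAC ADDRESS and Ping for AP Status
    # Arguments after each "!" become $ARG1$ and $ARG2$ in the command_line.
    check_command           check_mac!$_HOSTMACADDRESS$!$_HOSTPARENT_DNS$
}
```

Because the macros are resolved per host, each access point in the hostgroup would pass its own MAC address and parent name to the plugin.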
Re: Host shows down but services are ok?
Posted: Fri Apr 19, 2019 3:05 pm
by bmallett
PERFECT!
Thanks a ton!
Re: Host shows down but services are ok?
Posted: Fri Apr 19, 2019 3:11 pm
by npolovenko
@bmallett, Would you have any other questions for us before I lock this thread?
Re: Host shows down but services are ok?
Posted: Fri Apr 19, 2019 3:25 pm
by bmallett
Well... Not in regards to this, but if you want to answer it here, I will oblige...
Regarding communication, specifically email notifications, I have been "hit or miss" in getting them configured in the way that is easiest to manage. This may not be the best approach, but I am a firm believer in making jobs as easy as they can be.
That said, in order to maintain which things are triggering notifications and who receives those notifications, I attempted to have them in hostgroups. This didn't work for obvious reasons. (The docs say it doesn't.)
I am assuming I need to specify the flags in each individual host for them to trigger properly. If that assumption is correct, what all flags need to be added to each individual host?
I was using the following in the 'templates', but since I had the same hosts in various hostgroups, they would spit out multiple emails for each occurrence. Is that the expected functionality or did I have something else askew?
Code: Select all
contact_groups          admins      ; Notifications get sent out to everyone in the 'admins' group
notification_options    w,u,c,r     ; Send notifications about warning, unknown, critical, and recovery events
notification_interval   60          ; Re-notify about service problems every hour
Is there a better way to manage notification emails, or is configuring individual hosts the best way?
Thanks again for the help.
Re: Host shows down but services are ok?
Posted: Mon Apr 22, 2019 5:02 pm
by npolovenko
@bmallett, It's not possible to add notification options to contact groups directly. You'd need to define these settings per host or per service.
Here is the list of options you need to add to each host and service:
Host:
notification_interval 60
notification_options d,u,r,f,s
notification_period 24x7
contact_groups admins
Service:
notification_interval 60
notification_options w,u,c,r,f,s
notification_period 24x7
contact_groups admins
But there is a possible shortcut. You can add these options to your templates and have all other hosts and services use these templates.
To have your service use a template you can add this line to each service definition:
use myTemplate
Here is an example of service and host templates with notification options:
define service {
    name                    local-service
    use                     generic-service
    max_check_attempts      4
    check_interval          5
    retry_interval          1
    notification_interval   60
    notification_options    w,u,c,r,f,s
    notification_period     24x7
    contact_groups          admins
    register                0
}
define host {
    name                    generic-host
    notification_options    d,u,r,f,s
    notification_period     24x7
    notification_interval   60
    notifications_enabled   1
    contact_groups          admins
    register                0
}
Re: Host shows down but services are ok?
Posted: Wed Apr 24, 2019 8:05 am
by bmallett
@npolovenko
That's what I thought. I was just hoping for something different. I guess I could just make a template above the main "generic" for each "sub-group" and do it at that level.
Thanks again for all your help.
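One way the per-sub-group idea could look, as a sketch only: an intermediate template inherits from generic-host and overrides just the contact group, and each host then uses the intermediate template. The names ap-host-template, ap-admins, and example-ap-01 are hypothetical, as is the address:

```
# Hypothetical intermediate template for one sub-group of hosts.
define host {
    name                    ap-host-template
    use                     generic-host      ; inherits the notification_* settings
    contact_groups          ap-admins         ; hypothetical contact group; replaces the inherited value
    register                0                 ; template only, not a real host
}

define host {
    use                     ap-host-template
    host_name               example-ap-01     ; hypothetical host
    address                 192.0.2.10        ; documentation-range address
}
```

A value set locally in a template or host takes precedence over the inherited one, so only the settings that differ per sub-group would need to be repeated at this level.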
Re: Host shows down but services are ok?
Posted: Wed Apr 24, 2019 3:03 pm
by lmiltchev
Sounds good. I am closing this topic now. If you have any further questions, please start a new thread.