
Indirect Host and Service Checks

Posted: Sun Aug 17, 2014 3:27 pm
by DanielB
Hi,

I am trying to configure Nagios to check internal hosts on networks in different locations. From what I have found, this can be accomplished through a remote NRPE daemon at each location. This has the advantage that only a single port, 5666, needs to be opened:

http://nagios.sourceforge.net/docs/3_0/ ... hecks.html

Unfortunately, I did not find any configuration examples for this scenario. The only source from which I could get something to test was the following:

http://books.google.com.ar/books?id=Iuj ... ks&f=false

I did some testing on my local network, but the results were not satisfactory. Following the book's example for ping checks, I did the following:

Configuration of Nagios Core installation:
===================================

I use "check-host-alive-external" in the template:

Code:

define host{
        name                            linux-server    ; The name of this host template
        use                             generic-host    ; This template inherits other values from the generic-host template
        check_period                    24x7            ; By default, Linux hosts are checked round the clock
        check_interval                  5               ; Actively check the host every 5 minutes
        retry_interval                  1               ; Schedule host check retries at 1 minute intervals
        max_check_attempts              10              ; Check each Linux host 10 times (max)
        check_command                   check-host-alive-external ; Default command to check Linux hosts
        notification_period             workhours       ; Linux admins hate to be woken up, so we only notify during the day
                                                        ; Note that the notification_period variable is being overridden from
                                                        ; the value that is inherited from the generic-host template!
        notification_interval           120             ; Resend notifications every 2 hours
        notification_options            d,u,r           ; Only send notifications for specific host states
        contact_groups                  admins          ; Notifications get sent to the admins by default
        hostgroups                      linux-servers   ; DGB - 20090719
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }
The definitions of the commands are:

Code:

define command {
        command_name check-host-alive-external
        command_line check_nrpe_external!check_ping
       }

define command {
        command_name check_nrpe_external
        command_line /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS:192.168.2.245$ -c $ARG1$ -a $HOSTADDRESS$
       }
Here, 192.168.2.245 is the IP address of the intermediary NRPE daemon. On this host I used the following configuration for ping checks:

Configuration on intermediary NRPE daemon:
======================================

Code:

allowed_hosts=192.168.2.210

[...]

dont_blame_nrpe=1

[...]

command[check_ping]=/usr/local/nagios/libexec/check_ping -H $ARG1$ -w 3000.0,80% -c 5000.0,100% -p 5
When reviewing the log in /var/log/nagios3/nagios.log on Nagios Core, I find something like the following:

Code:

[1408304938] Warning: Return code of 127 for check of host 'ws1' was out of bounds. Make sure the plugin you're trying to run actually exists. 
Given the hint "Make sure the plugin you're trying to run actually exists", I get the feeling that one of my definitions is incorrect.

I would appreciate it if you could tell me what might be misconfigured. If you also have an example for this scenario, I would appreciate it if you could share it.

Thank you in advance for your replies.

Best regards,
Daniel

Re: Indirect Host and Service Checks

Posted: Mon Aug 18, 2014 4:42 pm
by abrist
Can you give us a listing of the libexec directory on the intermediary host?

Code:

ls -la /usr/lib/nagios/plugins/

Re: Indirect Host and Service Checks

Posted: Wed Aug 20, 2014 6:57 am
by DanielB
abrist wrote:Can you give us a listing of the libexec directory on the intermediary host?

Code:

ls -la /usr/lib/nagios/plugins/
Hello, abrist. Thanks for your reply.

Code:

root@nagios:~# ls -la /usr/lib/nagios/plugins/
total 3596
drwxr-xr-x 2 root root   4096 ago 17 15:54 .
drwxr-xr-x 4 root root   4096 ago  8 15:18 ..
-rwxr-xr-x 1 root root 108904 jun 27  2012 check_apt
-rwxr-xr-x 1 root root   7262 jul  2  2012 check_backuppc
-rwxr-xr-x 1 root root   2242 jun 27  2012 check_breeze
-rwxr-xr-x 1 root root  51984 jun 27  2012 check_by_ssh
-rwxr-xr-x 1 root root   1723 jul  2  2012 check_cert_expire
lrwxrwxrwx 1 root root      9 jun 27  2012 check_clamd -> check_tcp
-rwxr-xr-x 1 root root  35112 jun 27  2012 check_cluster
-rwxr-xr-x 1 root root  51536 jun 27  2012 check_dhcp
-rwxr-xr-x 1 root root  47568 jun 27  2012 check_dig
-rwxr-xr-x 1 root root 126000 jun 27  2012 check_disk
-rwxr-xr-x 1 root root   9145 jun 27  2012 check_disk_smb
-rwxr-xr-x 1 root root  47472 jun 27  2012 check_dns
-rwxr-xr-x 1 root root   6924 jul  2  2012 check_dnssec_delegation
-rwxr-xr-x 1 root root  34776 jun 27  2012 check_dummy
-rwxr-xr-x 1 root root  40034 jul  2  2012 check_email_delivery
-rwxr-xr-x 1 root root  21161 jul  2  2012 check_email_delivery_epn
-rwxr-xr-x 1 root root   2781 jul  2  2012 check_entropy
-rwxr-xr-x 1 root root   3053 jun 27  2012 check_file_age
-rwxr-xr-x 1 root root   6315 jun 27  2012 check_flexlm
-rwxr-xr-x 1 root root  47472 jun 27  2012 check_fping
lrwxrwxrwx 1 root root      9 jun 27  2012 check_ftp -> check_tcp
-rwxr-xr-x 1 root root  39336 jun 27  2012 check_game
-rwxr-xr-x 1 root root   8359 jul  2  2012 check_haproxy
lrwxrwxrwx 1 root root     10 jun 27  2012 check_host -> check_icmp
-rwxr-xr-x 1 root root 330316 jul  2  2012 check_hpasm
-rwxr-xr-x 1 root root  47248 jun 27  2012 check_hpjd
-rwxr-xr-x 1 root root 150736 jun 27  2012 check_http
-rwxr-xr-x 1 root root  10839 jul  2  2012 check_httpd_status
-rwxr-xr-x 1 root root  55312 jun 27  2012 check_icmp
-rwxr-xr-x 1 root root  39336 jun 27  2012 check_ide_smart
-rwxr-xr-x 1 root root  15134 jun 27  2012 check_ifoperstatus
-rwxr-xr-x 1 root root  12598 jun 27  2012 check_ifstatus
lrwxrwxrwx 1 root root      9 jun 27  2012 check_imap -> check_tcp
-rwxr-xr-x 1 root root  12402 jul  2  2012 check_imap_quota
-rwxr-xr-x 1 root root   6328 jul  2  2012 check_imap_quota_epn
-rwxr-xr-x 1 root root  37614 jul  2  2012 check_imap_receive
-rwxr-xr-x 1 root root  15745 jul  2  2012 check_imap_receive_epn
-rwxr-xr-x 1 root root  17654 jul  2  2012 check_ipmi_sensor
-rwxr-xr-x 1 root root   6887 jun 27  2012 check_ircd
lrwxrwxrwx 1 root root      9 jun 27  2012 check_jabber -> check_tcp
-rwxr-xr-x 1 root root  43640 jun 27  2012 check_ldap
lrwxrwxrwx 1 root root     10 jun 27  2012 check_ldaps -> check_ldap
-rwxr-xr-x 1 root root   5843 jul  2  2012 check_libs
-rwxr-xr-x 1 root root  16744 jul  2  2012 check_lm_sensors
-rwxr-xr-x 1 root root  39080 jun 27  2012 check_load
-rwxr-xr-x 1 root root   6026 jun 27  2012 check_log
-rwxr-xr-x 1 root root  20284 jun 27  2012 check_mailq
-rwxr-xr-x 1 root root  26944 jul  2  2012 check_memcached
-rwxr-xr-x 1 root root  39272 jun 27  2012 check_mrtg
-rwxr-xr-x 1 root root  39144 jun 27  2012 check_mrtgtraf
-rwxr-xr-x 1 root root  29883 jul  2  2012 check_multipath
-rwxr-xr-x 1 root root  43440 jun 27  2012 check_mysql
-rwxr-xr-x 1 root root 120781 jul  2  2012 check_mysql_health
-rwxr-xr-x 1 root root  43440 jun 27  2012 check_mysql_query
-rwxr-xr-x 1 root root  39112 jun 27  2012 check_nagios
lrwxrwxrwx 1 root root      9 jun 27  2012 check_nntp -> check_tcp
lrwxrwxrwx 1 root root      9 jun 27  2012 check_nntps -> check_tcp
-rwxr-xr-x 1 root root  51632 jun 27  2012 check_nt
-rwxr-xr-x 1 root root  51600 jun 27  2012 check_ntp
-rwxr-xr-x 1 root root  51856 jun 27  2012 check_ntp_peer
-rwxr-xr-x 1 root root  47488 jun 27  2012 check_ntp_time
-rwxr-xr-x 1 root root  63824 jun 27  2012 check_nwstat
-rwxr-xr-x 1 root root   8326 jun 27  2012 check_oracle
-rwxr-xr-x 1 root root  43312 jun 27  2012 check_overcr
-rwxr-xr-x 1 root root   9294 jul  2  2012 check_packages
-rwxr-xr-x 1 root root  43536 jun 27  2012 check_pgsql
-rwxr-xr-x 1 root root  51632 jun 27  2012 check_ping
lrwxrwxrwx 1 root root      9 jun 27  2012 check_pop -> check_tcp
-rwxr-xr-x 1 root root   7129 jul  2  2012 check_printer
-rwxr-xr-x 1 root root 121424 jun 27  2012 check_procs
-rwxr-xr-x 1 root root  43472 jun 27  2012 check_radius
-rwxr-xr-x 1 root root  45083 jul  2  2012 check_raid
-rwxr-xr-x 1 root root  13531 jul  2  2012 check_rbl
-rwxr-xr-x 1 root root  43408 jun 27  2012 check_real
-rwxr-xr-x 1 root root   9581 jun 27  2012 check_rpc
lrwxrwxrwx 1 root root     10 jun 27  2012 check_rta_multi -> check_icmp
-rwxr-xr-x 1 root root   7410 jul  2  2012 check_running_kernel
-rwxr-xr-x 1 root root   1414 jun 27  2012 check_sensors
lrwxrwxrwx 1 root root      9 jun 27  2012 check_simap -> check_tcp
-rwxr-xr-x 1 root root 125648 jun 27  2012 check_smtp
-rwxr-xr-x 1 root root  24545 jul  2  2012 check_smtp_send
-rwxr-xr-x 1 root root  13382 jul  2  2012 check_smtp_send_epn
-rwxr-xr-x 1 root root 134280 jun 27  2012 check_snmp
-rwxr-xr-x 1 root root  97676 jul  2  2012 check_snmp_environment
-rwxr-xr-x 1 root root   4926 jul  2  2012 check_soas
lrwxrwxrwx 1 root root      9 jun 27  2012 check_spop -> check_tcp
-rwxr-xr-x 1 root root  39280 jun 27  2012 check_ssh
-rwxr-xr-x 1 root root  20610 jul  2  2012 check_ssl_cert
lrwxrwxrwx 1 root root      9 jun 27  2012 check_ssmtp -> check_tcp
-rwxr-xr-x 1 root root   2836 jul  2  2012 check_statusfile
-rwxr-xr-x 1 root root  43208 jun 27  2012 check_swap
-rwxr-xr-x 1 root root  56144 jun 27  2012 check_tcp
-rwxr-xr-x 1 root root  43376 jun 27  2012 check_time
lrwxrwxrwx 1 root root      9 jun 27  2012 check_udp -> check_tcp
-rwxr-xr-x 1 root root  47472 jun 27  2012 check_ups
-rwxr-xr-x 1 root root  34952 jun 27  2012 check_users
-rwxr-xr-x 1 root root   2936 jun 27  2012 check_wave
-rwxr-xr-x 1 root root    879 jul  2  2012 check_webinject
-rwxr-xr-x 1 root root   6871 jul  2  2012 check_whois
-rwxr-xr-x 1 root root   8762 jul  2  2012 check_zone_auth
-rwxr-xr-x 1 root root   7731 jul  2  2012 check_zone_rrsig_expiration
-rwxr-xr-x 1 root root   6397 jul  2  2012 imap_ssl_cert
-rwxr-xr-x 1 root root   3480 jul  2  2012 imap_ssl_cert_epn
-rwxr-xr-x 1 root root  43336 jun 27  2012 negate
-rwxr-xr-x 1 root root  39016 jun 27  2012 urlize
-rw-r--r-- 1 root root   1938 jun 27  2012 utils.pm
-rwxr-xr-x 1 root root   2728 jun 27  2012 utils.sh
I have noticed that if I redefine the check_command as follows, the checks work.

Code:

define host{
        name                            linux-server-external    ; The name of this host template
        use                             generic-host    ; This template inherits other values from the generic-host template
        check_period                    24x7            ; By default, Linux hosts are checked round the clock
        check_interval                  5               ; Actively check the host every 5 minutes
        retry_interval                  1               ; Schedule host check retries at 1 minute intervals
        max_check_attempts              10              ; Check each Linux host 10 times (max)
        check_command                   check_nrpe_external!check_ping ; Default command to check Linux hosts
        notification_period             workhours       ; Linux admins hate to be woken up, so we only notify during the day
                                                        ; Note that the notification_period variable is being overridden from
                                                        ; the value that is inherited from the generic-host template!
        notification_interval           120             ; Resend notifications every 2 hours
        notification_options            d,u,r           ; Only send notifications for specific host states
        contact_groups                  admins          ; Notifications get sent to the admins by default
        hostgroups                      linux-servers-external   ; DGB - 20140818
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }
What could explain this difference?

Best regards,
Daniel

Re: Indirect Host and Service Checks

Posted: Thu Aug 21, 2014 4:46 pm
by slansing
Let me ask you this: why are you trying to run an NRPE command that runs ping against a host? Why not just set up a simple ping check on that host, and then set up NRPE checks for the services? I'd highly suggest looking at:

http://nagios.sourceforge.net/docs/3_0/toc.html

Take a look at the "Getting started" sections. They should help you with your basic host/service definitions and explain what should go where, and why.

You would want to set up your indirect checks through your services. I would reserve your host check to verify you can actually talk to the head server, then do the following with a service:


check_nrpe > calls a command on the remote host > that command, as defined in the nrpe.cfg on the remote host, runs a subsequent command against the indirect system.
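
The chain above can be tried by hand before any service is defined. The following is a minimal sketch, not a definitive setup: the plugin path and the relay address 192.168.2.245 are taken from the first post, while the function name `indirect_check_cmd` and the internal address 192.168.2.10 are hypothetical. It only prints the check_nrpe command line; passing arguments with `-a` assumes the relay's NRPE daemon accepts them (`dont_blame_nrpe=1`, as in the first post).

```shell
#!/bin/sh
# Values taken from the thread; adjust them for your own site.
CHECK_NRPE=/usr/local/nagios/libexec/check_nrpe
RELAY=192.168.2.245    # intermediary NRPE host

# Print the check_nrpe command line for an indirect check.
#   $1 = NRPE command name defined in the relay's nrpe.cfg
#   $2 = address of the internal host to check (hypothetical below)
indirect_check_cmd() {
    printf '%s -H %s -c %s -a %s\n' "$CHECK_NRPE" "$RELAY" "$1" "$2"
}

# Show the command for a ping check of one internal host.
indirect_check_cmd check_ping 192.168.2.10
```

Copying the printed line into a shell on the Nagios server runs the check once. If it fails there, it will also fail when Nagios schedules it.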

Re: Indirect Host and Service Checks

Posted: Thu Aug 21, 2014 4:49 pm
by abrist
Ah, I missed something in one of your earlier posts:
DanielB wrote:check_command check-host-alive-external
DanielB wrote:define command {
command_name check-host-alive-external
command_line check_nrpe_external!check_ping
}
The command line is malformed as you are referencing another command with an ARG delimiter (!) from the check-host-alive-external command. This is out of spec. Your best bet would be to copy the command and rename it if you want them to be unique:

Code:

define command {
        command_name check-host-alive-external
        command_line /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS:192.168.2.245$ -c $ARG1$ -a $HOSTADDRESS$
       }

Re: Indirect Host and Service Checks

Posted: Sat Aug 23, 2014 6:02 pm
by DanielB
slansing wrote:Let me ask you this: why are you trying to run an NRPE command that runs ping against a host? Why not just set up a simple ping check on that host, and then set up NRPE checks for the services? I'd highly suggest looking at:

http://nagios.sourceforge.net/docs/3_0/toc.html

Take a look at the "Getting started" sections. They should help you with your basic host/service definitions and explain what should go where, and why.

You would want to set up your indirect checks through your services. I would reserve your host check to verify you can actually talk to the head server, then do the following with a service:


check_nrpe > calls a command on the remote host > that command, as defined in the nrpe.cfg on the remote host, runs a subsequent command against the indirect system.
Hello, slansing.

The reason I am doing the ping and service checks this way is that the hosts are behind a firewall and are not directly accessible from the outside. I also wanted to minimize the number of open ports at the remote office, so I thought of this approach with an intermediary NRPE server.

The idea was that a remote Nagios server could check the internal hosts of various offices. However, the firewalls at some of these offices are not managed by the people I'm working with, so this solution will not be valid for those scenarios. In those cases I have to think of another model, where the results of the checks are sent from the inside to the outside. I think this could be done with passive checks for hosts and services, but I have no experience in this area. I have been reading some documentation, but it is still not entirely clear to me how I could implement these checks. I would appreciate it if anyone has examples of this model for host and service checks, which I could build on to create the SNMP checks and the checks of Windows and GNU/Linux hosts and their services.

Thanks for your reply.

Best regards,
Daniel

Re: Indirect Host and Service Checks

Posted: Sat Aug 23, 2014 6:35 pm
by DanielB
abrist wrote:Ah, I missed something in one of your earlier posts:
DanielB wrote:check_command check-host-alive-external
DanielB wrote:define command {
command_name check-host-alive-external
command_line check_nrpe_external!check_ping
}
The command line is malformed as you are referencing another command with an ARG delimiter (!) from the check-host-alive-external command. This is out of spec. Your best bet would be to copy the command and rename it if you want them to be unique:

Code:

define command {
        command_name check-host-alive-external
        command_line /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS:192.168.2.245$ -c $ARG1$ -a $HOSTADDRESS$
       }
Hello, Abrist.

What do you mean by "this is out of spec"? Is that combination not possible? And yes, what you propose is the simplest alternative to "check_nrpe_external!check_ping" in the check_command.

Thanks for your reply.

Best regards,
Daniel

Re: Indirect Host and Service Checks

Posted: Mon Aug 25, 2014 11:10 am
by abrist
DanielB wrote:Is that combination not possible?
Indeed, it is not. Defined "command_lines" are executed through a shell, with $ARGn$s populated from the service/host check definition that calls the command. You cannot reference another defined command name (check_nrpe_external) from within the command_line, as it will most likely lead to a syntax error when Nagios tries to run the line through a shell.
This is why I suggest formatting your command as such:

Code:

define command {
        command_name check-host-alive-external
        command_line /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS:192.168.2.245$ -c $ARG1$ -a $HOSTADDRESS$
       }
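
To illustrate the point about the shell, here is a minimal sketch, assuming a POSIX `sh` is available. It hands the original command_line string to a shell unchanged. Since no program literally named `check_nrpe_external!check_ping` exists on the PATH, the shell cannot find it and exits with status 127, the same return code that appears in the Nagios log in the first post. This only imitates how such a line gets run; it is not a description of Nagios internals.

```shell
#!/bin/sh
# The command_line from the first command definition, passed on as-is.
# The shell knows nothing about Nagios command names or the "!" argument
# delimiter, so it searches PATH for a program with this literal name.
cmd='check_nrpe_external!check_ping'

# Run the string through a shell and capture its exit status.
sh -c "$cmd" 2>/dev/null
rc=$?

# POSIX shells use exit status 127 for "command not found".
echo "exit status: $rc"
```

Because Nagios only accepts return codes 0 through 3 from a plugin, a value such as 127 is reported as "out of bounds".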

Re: Indirect Host and Service Checks

Posted: Mon Aug 25, 2014 5:37 pm
by DanielB
abrist wrote:
DanielB wrote:Is that combination not possible?
Indeed, it is not. Defined "command_lines" are executed through a shell, with $ARGn$s populated from the service/host check definition that calls the command. You cannot reference another defined command name (check_nrpe_external) from within the command_line, as it will most likely lead to a syntax error when Nagios tries to run the line through a shell.
This is why I suggest formatting your command as such:

Code:

define command {
        command_name check-host-alive-external
        command_line /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS:192.168.2.245$ -c $ARG1$ -a $HOSTADDRESS$
       }
Hello, abrist.

Ah, ok! Now I've seen the light :D That explains why you first asked me for a list of the files in the plugins directory and why Nagios said "Make sure the plugin you're trying to run actually exists". Thanks for clarifying this!

I have also done some testing with passive checks using NSCA for hosts and services, and I'm happy that it worked, as I had no experience in this area and it allowed me to learn something new. For this I developed two bash scripts, check_service_passive.sh and check_host_passive.sh, which I put on the host that performs the checks on the other hosts of the internal network and then sends the results to the Nagios server via the NSCA client.
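
For readers following along, this is a minimal sketch of what a passive submission can look like. It is not the two scripts mentioned above. It assumes the default tab-delimited input format of send_nsca; the function names, the host name `nagios.example.org`, and the send_nsca paths are hypothetical, while `ws1` and "CPU Load" are borrowed from elsewhere in the thread.

```shell
#!/bin/sh
# Hypothetical helpers that format passive check results for send_nsca.

# Service result: host<TAB>service description<TAB>return code<TAB>output
format_service_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Host result: host<TAB>return code<TAB>output
format_host_result() {
    printf '%s\t%s\t%s\n' "$1" "$2" "$3"
}

# Print two example result lines.
format_service_result ws1 "CPU Load" 0 "OK - load average: 0.10, 0.05, 0.01"
format_host_result ws1 0 "PING OK"

# To submit a result for real, pipe a line into send_nsca
# (binary path, server name and config path are assumptions):
# format_host_result ws1 0 "PING OK" |
#     /usr/sbin/send_nsca -H nagios.example.org -c /etc/send_nsca.cfg
```

The host name and service description have to match the object definitions on the Nagios server exactly, otherwise the result is discarded.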

I wanted to ask about two aspects of NSCA:

* In terms of security, which encryption mechanism do you recommend? For now I have configured 3DES.

* As for the definition of services: I have services with the same name (e.g. "CPU Load") running in active mode, but I also defined a service object with this name for passive checks for this specific group of servers, which performs the same check in passive mode. Nagios does not report any message about duplicate service descriptions. Is this practice recommended?

Thanks for your reply.

Best regards,
Daniel

Re: Indirect Host and Service Checks

Posted: Tue Aug 26, 2014 4:53 pm
by sreinhardt
In terms of security, which encryption mechanism do you recommend? For now I have configured 3DES.
3DES is probably the best that NSCA supports. It doesn't have full OpenSSL support at the moment, but this should be more than fine for an internal system.
* As for the definition of services: I have services with the same name (e.g. "CPU Load") running in active mode, but I also defined a service object with this name for passive checks for this specific group of servers, which performs the same check in passive mode. Nagios does not report any message about duplicate service descriptions. Is this practice recommended?
I can't say I really recommend this: if you are using different checks, it will cause issues with perfdata being represented properly. However, many people do it, so it's really up to you. It will work fine; just be sure to enable passive checks on that service.