NSCA and Distributed Nagios

Post by petronagios »

Hi, I’m having problems setting up a distributed monitoring environment. Can you help?

Service checks run on the distributed server and are forwarded to the master, but the service check isn’t updated on the master. In /var/log/messages on the master I can see the following:

Sep 11 09:40:48 ablxpn02 xinetd[1552]: START: nsca pid=12214 from=10.4.24.227
Sep 11 09:40:48 ablxpn02 nsca[12214]: Handling the connection...
Sep 11 09:40:49 ablxpn02 nsca[12214]: End of connection...
Sep 11 09:40:49 ablxpn02 xinetd[1552]: EXIT: nsca status=0 pid=12214 duration=1(sec)

Distributed server config
enable_notifications=0
obsess_over_services=1
ocsp_command=submit_check_result
nsca is running under xinetd

define service{
    name                            generic-service
    active_checks_enabled           1
    passive_checks_enabled          1
    parallelize_check               1
    obsess_over_service             1
    check_freshness                 0
    notifications_enabled           1
    event_handler_enabled           1
    flap_detection_enabled          1
    failure_prediction_enabled      1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    is_volatile                     0
    check_period                    24x7
    max_check_attempts              3
    normal_check_interval           2
    retry_check_interval            2
    contact_groups                  admins
    notification_options            w,u,c,r
    notification_interval           60
    notification_period             24x7
    register                        0
}

# Local service definition template - This is NOT a real service, just a template!

define service{
    name                            local-service
    use                             generic-service
    max_check_attempts              4
    normal_check_interval           5
    retry_check_interval            1
    register                        0
}

define service{
    use                             local-service
    host_name                       tvpl0682
    service_description             Root Partition
    check_command                   check_local_disk!20%!10%!/
}



Master server Config
execute_service_checks=1
check_external_commands=1
accept_passive_service_checks=1
nsca is running under xinetd

# Define a passive check template
define service{
    #use                            generic-service
    name                            passive_service
    active_checks_enabled           0
    passive_checks_enabled          1
    parallelize_check               1
    flap_detection_enabled          0
    register                        0
    is_volatile                     0
    check_period                    24x7
    max_check_attempts              1
    normal_check_interval           1
    retry_check_interval            1
    check_freshness                 0
    contact_groups                  admins
    check_command                   check_dummy!0
    notification_interval           45
    notification_period             24x7
    notification_options            w,u,c,r
    stalking_options                w,c,u
    process_perf_data               1
}

define service{
    use                             passive_service
    host_name                       tvpl0682
    service_description             Root Partition
    active_checks_enabled           0
    check_command                   check_dummy!0
}


Many thanks
Steve.

Post by mguthrie »

Try turning on:

log_external_commands=1
in the main nagios.cfg. That way you can just tail the nagios.log file and you should see the reason why it's failing.
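
A minimal sketch of watching the log once that option is on. The log path in the usage comment is an assumption based on the /usr/local/nagios prefix used elsewhere in this thread, and `filter_passive_activity` is a hypothetical helper name, not part of Nagios.

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines that show what arrived through
# the external command file and how Nagios processed it as a passive check.
filter_passive_activity() {
    grep -E 'EXTERNAL COMMAND:|PASSIVE (HOST|SERVICE) CHECK:'
}

# Usage (log path is an assumption; adjust it to your install):
#   tail -f /usr/local/nagios/var/nagios.log | filter_passive_activity
```

If nothing shows up while send_nsca is running, the result is probably not reaching the external command file at all.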

Post by petronagios »

Hi, thanks for your reply. I can now see what’s happening, but I’m not sure how to fix it!

The following is being sent from the distributed host via send_nsca

tvpl0682 Root Partition DISK CRITICAL - free space: / 953 MB (6% inode=92%):

And the following two lines appear in the Nagios log on the Master

[1347531830] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;tvpl0682;0;DISK CRITICAL - free space: / 953 MB (6% inode=92%):
[1347531839] PASSIVE HOST CHECK: tvpl0682;0;DISK CRITICAL - free space: / 953 MB (6% inode=92%):


The problem is that the check is being submitted as a PROCESS_HOST_CHECK_RESULT, so it doesn’t update the passive service check on the master. What do I change to make it submit a PROCESS_SERVICE_CHECK_RESULT?

My event handler is just a copy and paste from the Nagios website:

cat submit_check_result
#!/bin/sh

# Arguments:
#  $1 = host_name (Short name of host that the service is
#       associated with)
#  $2 = svc_description (Description of the service)
#  $3 = state_string (A string representing the status of
#       the given service - "OK", "WARNING", "CRITICAL"
#       or "UNKNOWN")
#  $4 = plugin_output (A text string that should be used
#       as the plugin output for the service checks)
#

# Convert the state string to the corresponding return code
Return_Code=-1

case "$3" in
    OK)
        Return_Code=0
        ;;
    WARNING)
        Return_Code=1
        ;;
    CRITICAL)
        Return_Code=2
        ;;
    UNKNOWN)
        Return_Code=-1
        ;;
esac

# pipe the service check info into the send_nsca program, which
# in turn transmits the data to the nsca daemon on the central
# monitoring server

/usr/bin/printf "%s\t%s\t%s\t%s\n" "$1" "$2" "$Return_Code" "$4" | /usr/local/nagios/bin/send_nsca sblxppns01 -c /usr/local/nagios/etc/send_nsca.cfg >> /tmp/output




Many thanks
Steve

Post by mguthrie »

Here's the format for a passive service result:
[<timestamp>] PROCESS_SERVICE_CHECK_RESULT;<host_name>;<svc_description>;<return_code>;<plugin_output>

Passive host result:
[<timestamp>] PROCESS_HOST_CHECK_RESULT;<host_name>;<host_status>;<plugin_output>

(Pulled from the following Core doc)
http://nagios.sourceforge.net/docs/3_0/ ... hecks.html

You need both the host name and the service description for the service result. I'm guessing that's the issue.
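
The same distinction applies to what gets piped into send_nsca: a service result is four tab-separated fields (host, service description, return code, output), while a line with only three fields is read as a host result. Here is a rough sketch of a guard against sending a short line by accident. `nsca_service_line` is a hypothetical helper, and it maps UNKNOWN to 3, the standard plugin return code, rather than the -1 in the script posted above.

```shell
#!/bin/sh
# Hypothetical helper: print one send_nsca input line for a service result.
# It refuses to print anything if a field is missing, so a short line
# cannot end up being submitted as a host result.
#
# Expected call from the ocsp command, as in the Nagios distributed
# monitoring docs:
#   submit_check_result $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATE$ '$SERVICEOUTPUT$'
nsca_service_line() {
    if [ "$#" -ne 4 ] || [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
        echo "usage: nsca_service_line host svc_description state output" >&2
        return 1
    fi
    case "$3" in
        OK)       code=0 ;;
        WARNING)  code=1 ;;
        CRITICAL) code=2 ;;
        *)        code=3 ;;   # UNKNOWN or anything unrecognised
    esac
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$code" "$4"
}

# Usage inside submit_check_result (paths taken from this thread):
#   nsca_service_line "$1" "$2" "$3" "$4" |
#       /usr/local/nagios/bin/send_nsca sblxppns01 -c /usr/local/nagios/etc/send_nsca.cfg
```

If the command definition passes only three arguments (for example, the state macro is missing), the helper fails loudly instead of quietly sending a three-field line.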

Post by petronagios »

Great, thanks for your reply. I re-created the event handler on the distributed server and it now submits the remote command as PROCESS_SERVICE_CHECK_RESULT. Great! But as soon as the service check turns red/critical in the Nagios GUI, it goes back to green/OK, even though the file system is still full. See the log entries below from the master server.

[1347628821] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;tvpl0682;Root Partition;2;DISK CRITICAL - free space: / 949 MB (6% inode=92%):
[1347628827] PASSIVE SERVICE CHECK: tvpl0682;Root Partition;2;DISK CRITICAL - free space: / 949 MB (6% inode=92%):
[1347628827] SERVICE ALERT: tvpl0682;Root Partition;CRITICAL;HARD;1;DISK CRITICAL - free space: / 949 MB (6% inode=92%):
[1347628827] SERVICE ALERT: tvpl0682;Root Partition;OK;HARD;1;OK

I don’t know where the HARD OK is coming from as the file system is still full!

My passive service check template/service definition is

define service{
    name                            passive-check
    use                             generic-service,srv-pnp
    max_check_attempts              1
    is_volatile                     1
    normal_check_interval           2
    active_checks_enabled           0
    passive_checks_enabled          1
    retry_check_interval            1
    flap_detection_enabled          0
    check_period                    24x7
    notification_interval           0
    notification_period             workhours
    notification_options            w,u,c,r
    register                        0
}

define service{
    use                             passive-check
    host_name                       tvpl0682
    service_description             Root Partition
    active_checks_enabled           0
    check_command                   check_dummy!0
}


How do I get the service to stay critical until a HARD OK is sent from the distributed server?

Thanks
Steve.

Post by mguthrie »

That seems a little odd. I notice in the log output you posted that there's no plugin output with that OK result either. Is it possible that the sending script has a bug and is returning false for the return code? "false" would evaluate to 0 and show up as OK for the service status.

Post by petronagios »

Thanks for your help, it's all working now. The check didn't stay critical because I had defined the passive check_command as follows:

define service{
    use                             passive-service ; Name of service template to use
    host_name                       tvpl0682
    service_description             Root Partition
    active_checks_enabled           0
    check_command                   check_dummy!0
}

instead of check_dummy!$ARG1$
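
For anyone reading later, here is a rough illustration of why a hard-coded 0 matters. It uses a small stand-in function, not the real check_dummy binary, and assumes the plugin simply reports whatever state number it is handed. `dummy_check` is a hypothetical name.

```shell
#!/bin/sh
# Stand-in for the check_dummy plugin (an emulation, not the real binary):
# it reports the state number it is given and exits with that state.
dummy_check() {
    state=$1
    text=$2
    case "$state" in
        0) label=OK ;;
        1) label=WARNING ;;
        2) label=CRITICAL ;;
        3) label=UNKNOWN ;;
        *) echo "UNKNOWN: unsupported state '$state'"; return 3 ;;
    esac
    if [ -n "$text" ]; then
        echo "$label: $text"
    else
        echo "$label"
    fi
    return "$state"
}

# With a fixed argument of 0 the result is OK no matter what the disk is doing.
dummy_check 0
echo "exit status: $?"
```

A bare "OK" with no further text is also what the unexpected HARD OK alert in the master's log carried as its output.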

Many thanks for your help
Steve

Post by mguthrie »

Good deal, glad it's working for you!