Event handler execution without calling corresponding script
Posted: Wed Mar 19, 2014 2:00 pm
Hello,
I have a Nagios server with ~2500 defined services and approximately 550 hosts. For the most part the server runs well, with a single working event handler for a specific set of checks that need an open SSH connection. If those checks fail, the event handler checks the status of the SSH connection and reconnects if required.
I am adding a new event handler that will execute when a host reaches a pre-defined threshold of memory and swap utilization. The event handler will execute an NRPE call to the target server and pass the command adjust_swap, which evaluates a few variables, adjusts them if required, and clears memory and swap space.
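For readers who want a concrete picture of the client-side command, here is a minimal sketch of what a script like adjust_swap could look like. The real script is not shown in this post, so everything here is an assumption: the function names, the dry-run switch, and the choice of sysctl, swapoff and swapon as the way to clear memory and swap. Only the final status line and the swappiness value of 10 come from the output quoted in this post.

```shell
#!/bin/sh
# Hypothetical stand-in for adjust_swap; the real script is not shown in this post.
TARGET_SWAPPINESS=10   # assumed target, taken from the plugin output quoted in this post
DRY_RUN=1              # 1 = only print the commands; set to 0 to run them (needs root)

# Run a command, or just print it when DRY_RUN is 1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

adjust_swap() {
    # Evaluate the current setting and adjust it only if it differs from the target.
    current=$(cat /proc/sys/vm/swappiness 2>/dev/null || echo unknown)
    if [ "$current" != "$TARGET_SWAPPINESS" ]; then
        run sysctl -w "vm.swappiness=$TARGET_SWAPPINESS"
    fi

    # Flush dirty pages, drop the page cache, then cycle swap to empty it.
    run sync
    run sysctl -w vm.drop_caches=3
    run swapoff -a
    run swapon -a

    # NRPE reports the first line of output and the exit code back to Nagios.
    echo "OK - Memory and Swap cleared, swappiness is set to $TARGET_SWAPPINESS."
    return 0
}

adjust_swap
```

With DRY_RUN left at 1 the script only prints what it would do, which makes it safe to try as an unprivileged user before wiring it into NRPE.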
The command executes perfectly when called from the Nagios server to the NRPE client as the nagios user: it does its work, echoes into a file, and shows up in the client-side logs. Because the check_nrpe call below succeeds, I do not believe this is a permissions, network, owner/group, or configuration issue on the NRPE client side.
While the client is in a normal state, I execute a memory stress test to generate swap usage and throw an alert on the Nagios server. The Nagios server then calls the event handler (according to nagios.log), but nothing ever gets executed.
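When nagios.log says the handler was called but nothing seems to run, one way to narrow it down is to have the handler leave its own trace on the Nagios server. The sketch below is a hypothetical logging wrapper, not something from this setup: the function name run_logged and the log path are assumptions. It records the command it was asked to run, then the exit status and output.

```shell
#!/bin/sh
# Hypothetical debugging wrapper; run_logged and the log path are assumptions.
HANDLER_LOG="${HANDLER_LOG:-/tmp/event_handler_debug.log}"

run_logged() {
    # Record the request before running it, so even a failed launch leaves a trace.
    printf '%s START %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$HANDLER_LOG"

    # Run the command, capturing stdout and stderr together and keeping its exit status.
    status=0
    output=$("$@" 2>&1) || status=$?

    printf '%s END status=%s output=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$status" "$output" >> "$HANDLER_LOG"
    return "$status"
}
```

If the script that command_line points at calls `run_logged` with the check_nrpe command as its arguments, a status of 127 in the log would mean the shell could not find the program at that path, and 126 that the file exists but is not executable. No START line at all would suggest the wrapper itself was never launched.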
Here are some of the setup parameters I am currently using, with IP addresses masked and irrelevant data removed. Note that event handlers are enabled in nagios.cfg and already work for another set of checks in the server configuration.
Any advice? I have looked through a number of support threads and search results and applied the suggestions I found (checking permissions, echoing to a file), but to no avail.
Sincerely,
Jesse
Code:
[nagios@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H <ADDRESS> -p 5666 -c adjust_swap
OK - Memory and Swap cleared, swappiness is set to 10.
[nagios@nagios ~]$
Code:
[root@nagios ~]# tail -n 0 -f nagios/var/nagios.log | grep sgr9-test
[1395251790] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;sgr9-test;Memory and Swap Use - With Automatic Cleanup;1395251748
[1395251790] SERVICE ALERT: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;SOFT;1;Ram : 3%, Swap : 6% : > 98, 5 : CRITICAL
[1395251790] SERVICE EVENT HANDLER: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;SOFT;1;adjust_swap_viaNRPE
[1395251851] SERVICE ALERT: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;SOFT;2;Ram : 3%, Swap : 6% : > 98, 5 : CRITICAL
[1395251851] SERVICE EVENT HANDLER: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;SOFT;2;adjust_swap_viaNRPE
[1395251910] SERVICE ALERT: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;HARD;3;Ram : 3%, Swap : 6% : > 98, 5 : CRITICAL
[1395251910] SERVICE EVENT HANDLER: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;HARD;3;adjust_swap_viaNRPE
NAGIOS SERVER
Code:
define host {
    use                     linux-virt-mach
    host_name               sgr9-test
    hostgroups              memoryclear
    alias                   sgr9-test
    address                 <ADDRESS>
    event_handler_enabled   1
}
define service {
    use                     generic-service,service-pnp
    service_description     Memory and Swap Use - With Automatic Cleanup
    check_command           check_memory_swap
    event_handler           adjust_swap_viaNRPE
    event_handler_enabled   1
    is_volatile             1
}
define command {
    command_name            adjust_swap_viaNRPE
    command_line            $USER1$/usr/local/nagios/libexec/check_nrpe -H <ADDRESS> -p 5666 -c adjust_swap
}
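One thing that may be worth checking in the command definition above: command_line starts with $USER1$ and then repeats the full plugin path. If $USER1$ in resource.cfg still holds the stock value /usr/local/nagios/libexec (an assumption, since that file is not shown here), the expanded path would not point at check_nrpe. A rough sketch of that expansion in shell, with sed standing in for the macro substitution Nagios performs:

```shell
#!/bin/sh
# Assumption: resource.cfg still has the stock value for $USER1$.
USER1="/usr/local/nagios/libexec"

# command_line exactly as defined in adjust_swap_viaNRPE above.
command_line='$USER1$/usr/local/nagios/libexec/check_nrpe -H <ADDRESS> -p 5666 -c adjust_swap'

# Substitute the macro roughly the way Nagios would before spawning the command.
expanded=$(printf '%s\n' "$command_line" | sed "s|\\\$USER1\\\$|$USER1|g")
echo "$expanded"

# The first word is the program that would be executed.
program=${expanded%% *}
if [ -x "$program" ]; then
    echo "executable: $program"
else
    echo "not executable: $program"
fi
```

If the doubled path turns out to be the problem, a command_line of `$USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c adjust_swap` would resolve to the same binary used in the manual test; $HOSTADDRESS$ is the standard Nagios macro for the address of the host the service belongs to.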