Hello T,
I ran tcpdump on port 5666 on the NRPE server and saw traffic both when I ran the command manually and when any of several other NRPE-based checks ran. To isolate the event handler, I halted all other NRPE checks against the target server and re-ran tcpdump.
Here are the results of running the event handler manually. The timestamps pause while the script executes, then resume when the results are returned.
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
10:55:26.538907 IP unknown.servercentral.net.44902 > unknown.servercentral.net.nrpe: Flags [S], seq 3931271594, win 14600, options [mss 1460,sackOK,TS val 1743830829 ecr 0,nop,wscale 9], length 0
10:55:26.538939 IP unknown.servercentral.net.nrpe > unknown.servercentral.net.44902: Flags [S.], seq 3340450746, ack 3931271595, win 14480, options [mss 1460,sackOK,TS val 81587480 ecr 1743830829,nop,wscale 7], length 0
10:55:26.539056 IP unknown.servercentral.net.44902 > unknown.servercentral.net.nrpe: Flags [.], ack 1, win 29, options [nop,nop,TS val 1743830829 ecr 81587480], length 0
10:55:26.564788 IP unknown.servercentral.net.44902 > unknown.servercentral.net.nrpe: Flags [P.], seq 1:128, ack 1, win 29, options [nop,nop,TS val 1743830854 ecr 81587480], length 127
10:55:26.564811 IP unknown.servercentral.net.nrpe > unknown.servercentral.net.44902: Flags [.], ack 128, win 114, options [nop,nop,TS val 81587506 ecr 1743830854], length 0
10:55:26.656450 IP unknown.servercentral.net.nrpe > unknown.servercentral.net.44902: Flags [P.], seq 1:212, ack 128, win 114, options [nop,nop,TS val 81587597 ecr 1743830854], length 211
10:55:26.656646 IP unknown.servercentral.net.44902 > unknown.servercentral.net.nrpe: Flags [.], ack 212, win 31, options [nop,nop,TS val 1743830946 ecr 81587597], length 0
10:55:26.657325 IP unknown.servercentral.net.44902 > unknown.servercentral.net.nrpe: Flags [P.], seq 128:262, ack 212, win 31, options [nop,nop,TS val 1743830947 ecr 81587597], length 134
10:55:26.657351 IP unknown.servercentral.net.nrpe > unknown.servercentral.net.44902: Flags [.], ack 262, win 122, options [nop,nop,TS val 81587598 ecr 1743830947], length 0
10:55:26.657779 IP unknown.servercentral.net.nrpe > unknown.servercentral.net.44902: Flags [P.], seq 212:446, ack 262, win 122, options [nop,nop,TS val 81587599 ecr 1743830947], length 234
10:55:26.658729 IP unknown.servercentral.net.44902 > unknown.servercentral.net.nrpe: Flags [P.], seq 262:1376, ack 446, win 33, options [nop,nop,TS val 1743830948 ecr 81587599], length 1114
10:55:26.698465 IP unknown.servercentral.net.nrpe > unknown.servercentral.net.44902: Flags [.], ack 1376, win 139, options [nop,nop,TS val 81587640 ecr 1743830948], length 0
10:55:34.556957 IP unknown.servercentral.net.nrpe > unknown.servercentral.net.44902: Flags [P.], seq 446:1560, ack 1376, win 139, options [nop,nop,TS val 81595498 ecr 1743830948], length 1114
10:55:34.557275 IP unknown.servercentral.net.44902 > unknown.servercentral.net.nrpe: Flags [P.], seq 1376:1413, ack 1560, win 38, options [nop,nop,TS val 1743838847 ecr 81595498], length 37
10:55:34.557315 IP unknown.servercentral.net.nrpe > unknown.servercentral.net.44902: Flags [P.], seq 1560:1597, ack 1413, win 139, options [nop,nop,TS val 81595498 ecr 1743838847], length 37
10:55:34.557319 IP unknown.servercentral.net.44902 > unknown.servercentral.net.nrpe: Flags [F.], seq 1413, ack 1560, win 38, options [nop,nop,TS val 1743838847 ecr 81595498], length 0
10:55:34.560353 IP unknown.servercentral.net.nrpe > unknown.servercentral.net.44902: Flags [F.], seq 1597, ack 1414, win 139, options [nop,nop,TS val 81595501 ecr 1743838847], length 0
10:55:34.560485 IP unknown.servercentral.net.44902 > unknown.servercentral.net.nrpe: Flags [.], ack 1598, win 38, options [nop,nop,TS val 1743838850 ecr 81595498], length 0
These look A-OK to me.
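For anyone who wants to measure the pause rather than eyeball it, here is a minimal Python sketch that finds the largest gap between consecutive tcpdump timestamps. The function name `largest_gap` is my own, and the sample lines are shortened copies of the capture above (hostnames abbreviated, TCP options dropped). It assumes the capture does not cross midnight.

```python
from datetime import datetime


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the longest pause."""
    stamps = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        token = line.split()[0]
        try:
            stamps.append((datetime.strptime(token, "%H:%M:%S.%f"), line))
        except ValueError:
            continue  # tcpdump banner lines do not start with a timestamp
    best = None
    for (t0, l0), (t1, l1) in zip(stamps, stamps[1:]):
        gap = (t1 - t0).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best


# Shortened copies of lines from the capture above (hostnames abbreviated).
sample = [
    "tcpdump: verbose output suppressed, use -v or -vv for full protocol decode",
    "10:55:26.658729 IP client.44902 > server.nrpe: Flags [P.], length 1114",
    "10:55:26.698465 IP server.nrpe > client.44902: Flags [.], length 0",
    "10:55:34.556957 IP server.nrpe > client.44902: Flags [P.], length 1114",
    "10:55:34.557275 IP client.44902 > server.nrpe: Flags [P.], length 37",
]

gap, before, after = largest_gap(sample)
print(f"{gap:.3f}s pause between:\n  {before}\n  {after}")
```

On this capture the pause is just under eight seconds, which matches the time the script takes to run on the client.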
Now, when I initiate the memory stress test and restart tcpdump, I get the following.
From Nagios Log:
[1395331169] SERVICE ALERT: sgr9-test;Memory and Swap Use - With Automatic Cleanup;OK;HARD;3;Ram : 4%, Swap : 0% : : OK
[1395331169] SERVICE EVENT HANDLER: sgr9-test;Memory and Swap Use - With Automatic Cleanup;OK;HARD;3;adjust_swap_viaNRPE
[1395331469] SERVICE ALERT: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;SOFT;1;Ram : 3%, Swap : 6% : > 98, 5 : CRITICAL
[1395331469] SERVICE EVENT HANDLER: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;SOFT;1;adjust_swap_viaNRPE
[1395331525] SERVICE ALERT: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;SOFT;2;Ram : 3%, Swap : 6% : > 98, 5 : CRITICAL
[1395331525] SERVICE EVENT HANDLER: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;SOFT;2;adjust_swap_viaNRPE
[1395331588] SERVICE ALERT: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;HARD;3;Ram : 3%, Swap : 6% : > 98, 5 : CRITICAL
[1395331588] SERVICE EVENT HANDLER: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;HARD;3;adjust_swap_viaNRPE
[1395331886] SERVICE ALERT: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;HARD;3;Ram : 3%, Swap : 6% : > 98, 5 : CRITICAL
[1395331886] SERVICE EVENT HANDLER: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;HARD;3;adjust_swap_viaNRPE
[1395332187] SERVICE ALERT: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;HARD;3;Ram : 3%, Swap : 6% : > 98, 5 : CRITICAL
[1395332187] SERVICE EVENT HANDLER: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;HARD;3;adjust_swap_viaNRPE
[1395332487] SERVICE ALERT: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;HARD;3;Ram : 3%, Swap : 6% : > 98, 5 : CRITICAL
[1395332487] SERVICE EVENT HANDLER: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;HARD;3;adjust_swap_viaNRPE
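The bracketed values in the log above are Unix epoch seconds. As a rough sketch (the regular expression and the function name `alert_intervals` are my own, not anything Nagios ships), the SERVICE ALERT lines can be parsed to show the state, state type, attempt number, and the seconds elapsed since the previous alert:

```python
import re
from datetime import datetime, timezone

# [epoch] SERVICE ALERT: host;service;state;state_type;attempt;output
ALERT = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]*);([^;]*);([^;]*);([^;]*);(\d+);"
)


def alert_intervals(lines):
    """Return (epoch, state, state_type, attempt, delta_seconds) per alert."""
    rows = []
    prev = None
    for line in lines:
        m = ALERT.match(line)
        if not m:
            continue  # skip event handler lines and anything else
        ts = int(m.group(1))
        delta = None if prev is None else ts - prev
        rows.append((ts, m.group(4), m.group(5), int(m.group(6)), delta))
        prev = ts
    return rows


# The first four SERVICE ALERT lines, copied from the log above.
log = [
    "[1395331169] SERVICE ALERT: sgr9-test;Memory and Swap Use - With Automatic Cleanup;OK;HARD;3;Ram : 4%, Swap : 0% : : OK",
    "[1395331469] SERVICE ALERT: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;SOFT;1;Ram : 3%, Swap : 6% : > 98, 5 : CRITICAL",
    "[1395331525] SERVICE ALERT: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;SOFT;2;Ram : 3%, Swap : 6% : > 98, 5 : CRITICAL",
    "[1395331588] SERVICE ALERT: sgr9-test;Memory and Swap Use - With Automatic Cleanup;CRITICAL;HARD;3;Ram : 3%, Swap : 6% : > 98, 5 : CRITICAL",
]

rows = alert_intervals(log)
for ts, state, state_type, attempt, delta in rows:
    when = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%H:%M:%S UTC")
    print(when, state, state_type, attempt, delta)
```

This makes it easier to line up each logged event handler call with the (empty) tcpdump window on the target.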
From TCP Dump on NRPE Target:
sgr9:~# tcpdump port 5666
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
So it looks like Nagios is not sending any packets, despite logging the event handler action in the Nagios logs. I followed the configuration link on the Nagios site through to HOSTS > SGR9-TEST > MEMORY CHECK > EVENT HANDLER and got this information:
Command Name Command Line
To expand: adjust_swap_viaNRPE
adjust_swap_viaNRPE $USER1$/usr/local/nagios/libexec/check_nrpe -H <IP ADDRESS OF NRPE CLIENT HARD CODED INTO COMMAND> -p 5666 -c adjust_swap
-> $USER1$/usr/local/nagios/libexec/check_nrpe -H <IP ADDRESS OF NRPE CLIENT HARD CODED INTO COMMAND> -p 5666 -c adjust_swap
Enter the command_check definition from a host or service definition and press Go to see the expansion of the command
The command is identical to the one I run manually, which successfully clears the memory on the NRPE client and returns results (and which generated the tcpdump output above).
I double-checked the IP addresses and they look correct, which leaves me at a loss.
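To compare what Nagios would actually execute with what I type by hand, here is a small sketch of the macro substitution. It rests on an assumption I have not verified on this box: that `$USER1$` is defined in resource.cfg as `/usr/local/nagios/libexec`, the usual default for a source install. The helper `expand_macros` is hypothetical, and `192.0.2.10` is a placeholder for the hard-coded client IP.

```python
import shlex


def expand_macros(command_line, macros):
    """Replace each $NAME$ token with its value (simplified stand-in)."""
    for name, value in macros.items():
        command_line = command_line.replace(f"${name}$", value)
    return command_line


# ASSUMPTION: the default resource.cfg value; check the real file.
macros = {"USER1": "/usr/local/nagios/libexec"}

# Command line as shown by the configuration page, with a placeholder IP.
configured = (
    "$USER1$/usr/local/nagios/libexec/check_nrpe "
    "-H 192.0.2.10 -p 5666 -c adjust_swap"
)

expanded = expand_macros(configured, macros)
argv = shlex.split(expanded)
print(argv[0])
```

If that assumption holds, the program path Nagios resolves would differ from the bare `/usr/local/nagios/libexec/check_nrpe` I run manually, so the value of `$USER1$` seems worth confirming.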
Sincerely,
Jesse