Strange state change issue
Posted: Wed Mar 25, 2015 11:49 pm
Hi everyone,
I have a Nagios server using a mix of Gearman and NSCA passive checks. I'm having a strange problem and am wondering if anyone has an idea as to why. Here's the sequence:
1. A service goes into a WARNING or CRITICAL state
2. A host check is received with status OK (Gearman checks only)
3. Ten seconds after the service failure, the host is reported as DOWN (CRITICAL), although it never actually goes down
4. 50 seconds after the host is reported DOWN, it checks in and is reported OK again
Log (gearman passive check):
[1427343690] PASSIVE SERVICE CHECK: vm1-testvm;Service-Asterisk;2;NOK - Asterisk Service Down!!
[1427343690] SERVICE ALERT: vm1-testvm;Service-Asterisk;CRITICAL;HARD;1;NOK - Asterisk Service Down!!
[1427343690] PASSIVE HOST CHECK: vm1-testvm;0;OK
[1427343700] HOST ALERT: vm1-testvm;DOWN;HARD;1;CRITICAL: Host not reported in - probably down
[1427343750] PASSIVE HOST CHECK: vm1-testvm;0;OK
[1427343750] HOST ALERT: vm1-testvm;UP;HARD;1;OK
Log (NSCA check):
[1427344850] SERVICE ALERT: vm2-testvm;Memory;WARNING;HARD;1;WARNING: There have been no recent passive updates!
[1427344860] HOST ALERT: vm2-testvm;DOWN;HARD;1;CRITICAL: Host not reported in - probably down
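To make the timing in the logs above easier to read, here is a minimal Python sketch that parses entries in this format and prints the number of seconds elapsed since the previous entry for the same host. The names `parse_entry` and `gaps_per_host` are hypothetical helpers written for this post, not part of Nagios, and the regular expression only assumes the `[epoch] TYPE: host;...` layout seen in the lines pasted here.

```python
import re

# Matches "[<epoch>] <ENTRY TYPE>: <semicolon-separated detail>"
LOG_LINE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")

# Gearman log lines copied from the post above
SAMPLE = """\
[1427343690] PASSIVE SERVICE CHECK: vm1-testvm;Service-Asterisk;2;NOK - Asterisk Service Down!!
[1427343690] SERVICE ALERT: vm1-testvm;Service-Asterisk;CRITICAL;HARD;1;NOK - Asterisk Service Down!!
[1427343690] PASSIVE HOST CHECK: vm1-testvm;0;OK
[1427343700] HOST ALERT: vm1-testvm;DOWN;HARD;1;CRITICAL: Host not reported in - probably down
[1427343750] PASSIVE HOST CHECK: vm1-testvm;0;OK
[1427343750] HOST ALERT: vm1-testvm;UP;HARD;1;OK
""".splitlines()


def parse_entry(line):
    """Return (timestamp, entry_type, host, detail) or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    timestamp, entry_type, detail = match.groups()
    host = detail.split(";", 1)[0]  # the host name is the first field
    return int(timestamp), entry_type, host, detail


def gaps_per_host(lines):
    """Return a list of (host, seconds_since_previous_entry, entry_type, detail)."""
    last_seen = {}
    result = []
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue  # skip anything that is not a log entry
        timestamp, entry_type, host, detail = entry
        delta = timestamp - last_seen.get(host, timestamp)  # 0 for the first entry
        last_seen[host] = timestamp
        result.append((host, delta, entry_type, detail))
    return result


if __name__ == "__main__":
    for host, delta, entry_type, detail in gaps_per_host(SAMPLE):
        print(f"+{delta:>3}s  {host}  {entry_type}: {detail}")
```

For the Gearman log this gives gaps of 0, 0, 0, 10, 50 and 0 seconds, which matches steps 3 and 4 above: the DOWN alert arrives 10 seconds after an OK passive host check, and the recovery follows 50 seconds later.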
This is happening for all hosts, and it is becoming a pain: I get 4 emails per host whenever a service changes state. Any clues would be greatly appreciated.
Regards,
sspaise