Two simultaneous statuses for an active check
Posted: Thu Jul 23, 2015 9:46 am
Hello there,
we are experiencing an issue where some active checks report two service statuses at the same time.
These are the logs related to the last occurrences of the issue:
[1437329784] SERVICE ALERT: ROUTER.lan;Traffic_ge-0/0/5;UNKNOWN;HARD;1;INTERFACE_TRAFFIC UNKNOWN - Error:Time duration between plugin calls is invalid
[1437329784] GLOBAL SERVICE EVENT HANDLER: ROUTER.lan;Traffic_ge-0/0/5;UNKNOWN;HARD;1;xi_service_event_handler
[1437329784] SERVICE ALERT: ROUTER.lan;Traffic_ge-0/0/5;OK;HARD;1;INTERFACE_TRAFFIC OK - (in=819.87Mbps/out=140.09Mbps)
[1437329784] GLOBAL SERVICE EVENT HANDLER: ROUTER.lan;Traffic_ge-0/0/5;OK;HARD;1;xi_service_event_handler
[1437158769] SERVICE ALERT: ROUTER.lan;Traffic_ge-0/0/5;CRITICAL;HARD;1;INTERFACE_TRAFFIC CRITICAL - (in=0.00Mbps/out=0.00Mbps)
[1437158769] GLOBAL SERVICE EVENT HANDLER: ROUTER.lan;Traffic_ge-0/0/5;CRITICAL;HARD;1;xi_service_event_handler
[1437158769] SERVICE ALERT: ROUTER.lan;Traffic_ge-0/0/5;OK;HARD;1;INTERFACE_TRAFFIC OK - (in=822.27Mbps/out=139.35Mbps)
[1437158769] GLOBAL SERVICE EVENT HANDLER: ROUTER.lan;Traffic_ge-0/0/5;OK;HARD;1;xi_service_event_handler
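In case it helps anyone look for the same pattern in their own logs, here is a minimal sketch of how such pairs can be picked out of nagios.log. It only assumes the SERVICE ALERT line layout shown above (timestamp, host, service, state, state type, attempt, output); the function name find_conflicting_alerts is just an illustrative name, not part of Nagios or mod_gearman.

```python
import re
from collections import defaultdict

# Layout assumed from the log lines above:
# [epoch] SERVICE ALERT: host;service;state;state_type;attempt;plugin output
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);([^;]+);([^;]+);(\d+);(.*)$"
)


def find_conflicting_alerts(lines):
    """Return {(timestamp, host, service): [states]} for every service that
    logged more than one distinct state within the same second."""
    seen = defaultdict(list)
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if match is None:
            continue  # not a SERVICE ALERT line (e.g. event handler entries)
        timestamp = int(match.group(1))
        host, service, state = match.group(2), match.group(3), match.group(4)
        seen[(timestamp, host, service)].append(state)
    # Keep only the keys where the states logged in that second disagree.
    return {key: states for key, states in seen.items() if len(set(states)) > 1}
```

It can be fed an open file handle on nagios.log, or any list of log lines; the event handler lines are simply skipped.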
These are the service, service template, and command configurations:
define service {
host_name ROUTER.lan
service_description Traffic_ge-0/0/5
use service-traffic
servicegroups +Services_for_SD,wrk1servicegroups_grp
check_command check_interface_traffic_rate! -C $USER16$ -n ge-0/0/5 -u Mbps -w 30:,0: -c 30:,0:!!!!!!!
max_check_attempts 1
check_interval 5
passive_checks_enabled 0
contacts nagios-tech,netsupport-mail,network-sms
register 1
}
define service {
name service-traffic
use generic-service
is_volatile 0
max_check_attempts 2
check_interval 5
retry_interval 1
check_period 24x7
notification_interval 1440
notification_period 24x7
notification_options w,c,u,r
register 0
}
define command {
command_name check_interface_traffic_rate
command_line $USER1$/check_interface_traffic.pl -H $HOSTADDRESS$ $ARG1$
}
The issue has been occurring since we upgraded Nagios XI to 2014R2.7 and mod_gearman to version 1.5.
We have 3 mod_gearman workers running, but the "Traffic_ge-0/0/5" service is configured to run only on one specific worker.
Attached you can find the following files:
- check_interface_traffic.pl, the Perl script used by the command
- mod_gearman_neb.conf
- mod_gearman_worker.conf, from the worker where that service runs
- mod_gearman_worker2.conf, from a worker where other services run
Can anyone help us in investigating the problem?
Thanks.