Nagios logs bad service-check data
https://support.nagios.com/forum/viewto ... =7&t=61914
For reasons I do not understand, that posting is now locked, and it appears that no action has been taken on it.
At the time, all I had in hand was the symptom, not the cause. Recently, I ran into the same problem again, and found myself sufficiently annoyed that I dug into the code to see what was happening.
First of all, the same problem also affects logging of passive host checks. So both of these types of messages show incorrect data, namely the state and plugin output that were in place before the new passive-check results were processed, not the state and plugin output of the passive check which is purportedly being logged at this time.
Code: Select all
PASSIVE HOST CHECK
PASSIVE SERVICE CHECK
In Nagios 4.3.4, we had these blocks of code:
- Within process_host_check_result():
Code: Select all
2489 /* log passive checks - we need to do this here, as some my bypass external commands by getting dropped in checkresults dir */ 2490 if(hst->check_type == CHECK_TYPE_PASSIVE) { 2491 if(log_passive_checks == TRUE) 2492 logit(NSLOG_PASSIVE_CHECK, FALSE, "PASSIVE HOST CHECK: %s;%d;%s\n", hst->name, new_state, hst->plugin_output); 2493 }
- Within handle_async_service_check_result():
Code: Select all
529 /* log passive checks - we need to do this here, as some my bypass external commands by getting dropped in checkresults dir */ 530 if(temp_service->check_type == CHECK_TYPE_PASSIVE) { 531 if(log_passive_checks == TRUE) 532 logit(NSLOG_PASSIVE_CHECK, FALSE, "PASSIVE SERVICE CHECK: %s;%s;%d;%s\n", temp_service->host_name, temp_service->description, temp_service->current_state, temp_service->plugin_output); 533 }
- new_state was passed in to process_host_check_result(), and left unmodified since the start of that call. This value was produced by a call to get_host_check_return_code() within the handle_async_host_check_result() routine before process_host_check_result() was called.
- hst->plugin_output is also unchanged within process_host_check_result(), having been updated to the new plugin output by the call to parse_check_output() from within the handle_async_host_check_result() routine before that routine called process_host_check_result(). (That value may be overridden in the middle by the call to get_host_check_return_code(), before process_host_check_result() is called.)
- temp_service->current_state was produced by a call to get_service_check_return_code() from within the handle_async_service_check_result() routine, just a bit before the PASSIVE SERVICE CHECK logging within that same routine.
- temp_service->plugin_output was updated to the new plugin output by the call to parse_check_output() from within the handle_async_service_check_result() routine, just a bit before the PASSIVE SERVICE CHECK logging within that same routine. (That value may be overridden in the middle by the call to get_service_check_return_code(), before the logging occurs.)
- Within host_is_passive():
That comment on line 662 turns out to be prescient.
Code: Select all
661 /* log passive checks - we need to do this here as some may bypass external commands by getting dropped in checkresults dir */ 662 /* todo - check if current_state is right - i don't think it is! */ 663 if (log_passive_checks == TRUE) { 664 logit(NSLOG_PASSIVE_CHECK, FALSE, "PASSIVE HOST CHECK: %s;%d;%s\n", 665 hst->name, 666 hst->current_state, 667 hst->plugin_output); 668 }
- Within service_is_passive():
Code: Select all
633 /* log passive checks - we need to do this here, as some may bypass external commands by getting dropped in checkresults dir */ 634 if (log_passive_checks == TRUE) { 635 logit(NSLOG_PASSIVE_CHECK, FALSE, "PASSIVE SERVICE CHECK: %s;%s;%d;%s\n", 636 svc->host_name, 637 svc->description, 638 svc->current_state, 639 svc->plugin_output); 640 }
- The call to host_is_passive() within handle_async_host_check_result() is made before any of the check-result data is applied, so:
- hst->current_state represents the existing state, which has not yet been updated as it will be within the call to host_initial_handling() that has not yet happened. That routine will set it from analysis within the get_host_check_return_code() routine.
- hst->plugin_output represents the existing plugin output, which has not yet been updated as it will be within the call to host_initial_handling() that has not yet happened. That routine will set it by calling parse_check_output() to perform most of the detail work, with some cleanup afterward.
- The call to service_is_passive() within handle_async_service_check_result() is made before any of the check-result data is applied, so:
- svc->current_state represents the existing state, which has not yet been updated as it will be within the call to service_initial_handling() that has not yet happened. That routine will set it from analysis within the get_service_check_return_code() routine.
- svc->plugin_output represents the existing plugin output, which has not yet been updated as it will be within the call to service_initial_handling() that has not yet happened. That routine will set it by calling parse_check_output() to perform most of the detail work, with some cleanup afterward.
- Move lines 661-668 to immediately after the call to host_initial_handling() on line 2268 (in the untouched Nagios 4.4.6 code), deleting the "i don't think it is!" comment in the process, and enclosing the logging in an additional conditional that reflects the necessary constraint for this output:
Code: Select all
host_initial_handling(hst, cr, &old_plugin_output); /* log passive checks - we need to do this here, as some may bypass external commands by getting dropped in checkresults dir */ if (hst->check_type == CHECK_TYPE_PASSIVE) { if (log_passive_checks == TRUE) { logit(NSLOG_PASSIVE_CHECK, FALSE, "PASSIVE HOST CHECK: %s;%d;%s\n", hst->name, hst->current_state, hst->plugin_output); } }
- Move lines 633-640 to immediately after the call to service_initial_handling() on line 1237 (in the untouched Nagios 4.4.6 code), enclosing the logging in an additional conditional that reflects the necessary constraint for this output:
Code: Select all
service_initial_handling(svc, cr, &old_plugin_output); /* log passive checks - we need to do this here, as some may bypass external commands by getting dropped in checkresults dir */ if (svc->check_type == CHECK_TYPE_PASSIVE) { if (log_passive_checks == TRUE) { logit(NSLOG_PASSIVE_CHECK, FALSE, "PASSIVE SERVICE CHECK: %s;%s;%d;%s\n", svc->host_name, svc->description, svc->current_state, svc->plugin_output); } }