Page 1 of 2

Warning: Return code of 127 for check of service...

Posted: Tue Jan 24, 2017 11:22 am
by vext
Getting an error like this:
[1485243023] Warning: Return code of 127 for check of service 'Puppet Status' on host 'web1.mdlocal' was out of bounds. Make sure the plugin you're trying to run actually exists.

This is happening on only one of our Nagios hosts. On this one host There are some checks that return this consistently. Every host that the affected services are checked on return the same error. I haven't found any pattern to what hosts are affected. I have checked obvious things such as resources.cfg and file and directory permissions. I was hoping there would be some insight someone here could help impart on what else to check.

As an example of what is stumping me, here are two checks. The first is working normally, and the second is giving this error consistently. Both are on the same host and both are using the same check_command. Config taken from objects.cache to show all options.

Code: Select all

define service {
  host_name web1.mdlocal
  service_description NRPE
  check_period  24x7
  check_command check_nrpe_slow!check_nrpe_alive
  contact_groups  ops-alert,ops-oncall-primary,monitor-xmpp
  notification_period 24x7
  initial_state o
  check_interval  2.000000
  retry_interval  2.000000
  max_check_attempts  5
  is_volatile 0
  parallelize_check 1
  active_checks_enabled 1
  passive_checks_enabled  1
  obsess_over_service 1
  event_handler_enabled 1
  low_flap_threshold  0.000000
  high_flap_threshold 0.000000
  flap_detection_enabled  1
  flap_detection_options  o,w,u,c
  freshness_threshold 21600
  check_freshness 1
  notification_options  w,c,r
  notifications_enabled 1
  notification_interval 60.000000
  first_notification_delay  0.000000
  stalking_options  n
  process_perf_data 1
  failure_prediction_enabled  1
  notes_url https://wiki.runbook.url/obviously_altered
  retain_status_information 1
  retain_nonstatus_information  1
  }
Non-working:

Code: Select all

define service {
  host_name web1.mdlocal
  service_description Puppet Status
  check_period  24x7
  check_command check_nrpe_slow!check_puppet
  contact_groups  ops-alert,ops-oncall-primary,monitor-xmpp
  notification_period none
  initial_state o
  check_interval  720.000000
  retry_interval  15.000000
  max_check_attempts  3
  is_volatile 0
  parallelize_check 1
  active_checks_enabled 1
  passive_checks_enabled  1
  obsess_over_service 1
  event_handler_enabled 1
  low_flap_threshold  0.000000
  high_flap_threshold 0.000000
  flap_detection_enabled  1
  flap_detection_options  o,w,u,c
  freshness_threshold 5400
  check_freshness 1
  notification_options  w,c,r
  notifications_enabled 1
  notification_interval 60.000000
  first_notification_delay  0.000000
  stalking_options  n
  process_perf_data 1
  failure_prediction_enabled  1
  retain_status_information 1
  retain_nonstatus_information  1
  }

Re: Warning: Return code of 127 for check of service...

Posted: Tue Jan 24, 2017 11:47 am
by vext
Just to test I changed the failing check to match. (This from objects.cache again for validation)

Working:

Code: Select all

define service {
  host_name web1.mdlocal
  service_description NRPE
  check_period  24x7
  check_command check_nrpe_slow!check_nrpe_alive
  contact_groups  ops-alert,ops-oncall-primary,monitor-xmpp
  notification_period 24x7
  initial_state o
  check_interval  2.000000
  retry_interval  2.000000
  max_check_attempts  5
  is_volatile 0
  parallelize_check 1
  active_checks_enabled 1
  passive_checks_enabled  1
  obsess_over_service 1
  event_handler_enabled 1
  low_flap_threshold  0.000000
  high_flap_threshold 0.000000
  flap_detection_enabled  1
  flap_detection_options  o,w,u,c
  freshness_threshold 21600
  check_freshness 1
  notification_options  w,c,r
  notifications_enabled 1
  notification_interval 60.000000
  first_notification_delay  0.000000
  stalking_options  n
  process_perf_data 1
  failure_prediction_enabled  1
  notes_url https://wiki.runbook.url/obviously_altered
  retain_status_information 1
  retain_nonstatus_information  1
  }
Not-working:

Code: Select all

define service {
  host_name web1.mdlocal
  service_description Puppet Status
  check_period  24x7
  check_command check_nrpe_slow!check_nrpe_alive
  contact_groups  ops-alert,ops-oncall-primary,monitor-xmpp
  notification_period 24x7
  initial_state o
  check_interval  2.000000
  retry_interval  2.000000
  max_check_attempts  5
  is_volatile 0
  parallelize_check 1
  active_checks_enabled 1
  passive_checks_enabled  1
  obsess_over_service 1
  event_handler_enabled 1
  low_flap_threshold  0.000000
  high_flap_threshold 0.000000
  flap_detection_enabled  1
  flap_detection_options  o,w,u,c
  freshness_threshold 21600
  check_freshness 1
  notification_options  w,c,r
  notifications_enabled 1
  notification_interval 60.000000
  first_notification_delay  0.000000
  stalking_options  n
  process_perf_data 1
  failure_prediction_enabled  1
  notes_url https://wiki.runbook.url/obviously_altered
  retain_status_information 1
  retain_nonstatus_information  1
  }

Some debug logging of the failure (with mask of -1 for debug output) if it helps:

Code: Select all

[1485275693.620781] [001.0] [pid=17635] event_execution_loop() start
[1485275693.620788] [008.1] [pid=17635] ** Event Check Loop
[1485275693.620794] [008.1] [pid=17635] Next High Priority Event Time: Tue Jan 24 08:34:37 2017
[1485275693.620797] [008.1] [pid=17635] Next Low Priority Event Time:  Tue Jan 24 08:34:36 2017
[1485275693.620798] [008.1] [pid=17635] Current/Max Service Checks: 0/0
[1485275693.620803] [001.0] [pid=17635] handle_timed_event() start
[1485275693.620805] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275693.620807] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275693.620809] [008.0] [pid=17635] ** Timed Event ** Type: EVENT_CHECK_REAPER, Run Time: Tue Jan 24 08:34:37 2017
[1485275693.620811] [008.0] [pid=17635] ** Check Result Reaper
[1485275693.620815] [001.0] [pid=17635] reap_check_results() start
[1485275693.620817] [016.0] [pid=17635] Starting to reap check results.
[1485275693.620829] [016.1] [pid=17635] Starting to read check result queue '/usr/local/nagios/var/spool/checkresults'...
[1485275693.620835] [016.0] [pid=17635] Finished reaping 0 check results
[1485275693.620837] [001.0] [pid=17635] reap_check_results() end
[1485275693.620839] [001.0] [pid=17635] handle_timed_event() end
[1485275693.620841] [001.0] [pid=17635] reschedule_event()
[1485275693.620843] [001.0] [pid=17635] add_event()
[1485275693.620845] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275693.620847] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275693.620849] [064.1] [pid=17635] Making callbacks (type 18)...
[1485275693.620851] [008.1] [pid=17635] ** Event Check Loop
[1485275693.620854] [008.1] [pid=17635] Next High Priority Event Time: Tue Jan 24 08:34:40 2017
[1485275693.620857] [008.1] [pid=17635] Next Low Priority Event Time:  Tue Jan 24 08:34:36 2017
[1485275693.620859] [008.1] [pid=17635] Current/Max Service Checks: 0/0
[1485275693.620861] [001.0] [pid=17635] handle_timed_event() start
[1485275693.620863] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275693.620866] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275693.620868] [008.0] [pid=17635] ** Timed Event ** Type: EVENT_COMMAND_CHECK, Run Time: Tue Jan 24 08:34:40 2017
[1485275693.620870] [008.0] [pid=17635] ** External Command Check Event
[1485275693.620873] [001.0] [pid=17635] check_for_external_commands()
[1485275693.620875] [064.1] [pid=17635] Making callbacks (type 18)...
[1485275693.620878] [001.0] [pid=17635] process_external_command1()
[1485275693.620881] [128.2] [pid=17635] Raw command entry: [1485275627] SCHEDULE_FORCED_SVC_CHECK;web1.mdlocal;Puppet Status;1485275627
[1485275693.620900] [064.1] [pid=17635] Making callbacks (type 9)...
[1485275693.620903] [064.2] [pid=17635] Callback #1 (type 9) return code = 0
[1485275693.620905] [064.1] [pid=17635] Making callbacks (type 24)...
[1485275693.620907] [064.2] [pid=17635] Callback #1 (type 24) return code = 0
[1485275693.620909] [001.0] [pid=17635] process_external_command2()
[1485275693.620910] [128.1] [pid=17635] External Command Type: 54
[1485275693.620912] [128.1] [pid=17635] Command Entry Time: 1485275627
[1485275693.620914] [128.1] [pid=17635] Command Arguments: web1.mdlocal;Puppet Status;1485275627
[1485275693.620922] [001.0] [pid=17635] schedule_service_check()
[1485275693.620925] [016.0] [pid=17635] Scheduling a forced, active check of service 'Puppet Status' on host 'web1.mdlocal' @ Tue Jan 24 08:33:47 2017
[1485275693.620928] [016.2] [pid=17635] Scheduling new service check event.
[1485275693.620930] [001.0] [pid=17635] reschedule_event()
[1485275693.620932] [001.0] [pid=17635] add_event()
[1485275693.620933] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275693.620935] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275693.620937] [064.1] [pid=17635] Making callbacks (type 20)...
[1485275693.620939] [064.1] [pid=17635] Making callbacks (type 24)...
[1485275693.620940] [064.2] [pid=17635] Callback #1 (type 24) return code = 0
[1485275693.620943] [001.0] [pid=17635] process_external_command1()
[1485275693.620945] [128.2] [pid=17635] Raw command entry: [1485275582] SCHEDULE_FORCED_SVC_CHECK;web1.mdlocal;Puppet Status;1485275582
[1485275693.620963] [064.1] [pid=17635] Making callbacks (type 9)...
[1485275693.620966] [064.2] [pid=17635] Callback #1 (type 9) return code = 0
[1485275693.620968] [064.1] [pid=17635] Making callbacks (type 24)...
[1485275693.620969] [064.2] [pid=17635] Callback #1 (type 24) return code = 0
[1485275693.620971] [001.0] [pid=17635] process_external_command2()
[1485275693.620972] [128.1] [pid=17635] External Command Type: 54
[1485275693.620974] [128.1] [pid=17635] Command Entry Time: 1485275582
[1485275693.620976] [128.1] [pid=17635] Command Arguments: web1.mdlocal;Puppet Status;1485275582
[1485275693.620979] [001.0] [pid=17635] schedule_service_check()
[1485275693.620982] [016.0] [pid=17635] Scheduling a forced, active check of service 'Puppet Status' on host 'web1.mdlocal' @ Tue Jan 24 08:33:02 2017
[1485275693.620984] [016.2] [pid=17635] Found another service check event for this service @ Tue Jan 24 08:33:47 2017
[1485275693.620986] [016.2] [pid=17635] New service check event is forced and occurs before the existing event, so the new event will be used instead.
[1485275693.620987] [001.0] [pid=17635] remove_event()
[1485275693.620989] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275693.620991] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275693.620993] [016.2] [pid=17635] Scheduling new service check event.
[1485275693.620994] [001.0] [pid=17635] reschedule_event()
[1485275693.620996] [001.0] [pid=17635] add_event()
[1485275693.620997] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275693.620999] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275693.621001] [064.1] [pid=17635] Making callbacks (type 20)...
[1485275693.621002] [064.1] [pid=17635] Making callbacks (type 24)...
[1485275693.621004] [064.2] [pid=17635] Callback #1 (type 24) return code = 0
[1485275693.621006] [001.0] [pid=17635] handle_timed_event() end
[1485275693.621007] [001.0] [pid=17635] reschedule_event()
[1485275693.621009] [001.0] [pid=17635] add_event()
[1485275693.621011] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275693.621012] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275693.621014] [008.1] [pid=17635] ** Event Check Loop
[1485275693.621016] [008.1] [pid=17635] Next High Priority Event Time: Tue Jan 24 08:34:45 2017
[1485275693.621018] [008.1] [pid=17635] Next Low Priority Event Time:  Tue Jan 24 08:33:02 2017
[1485275693.621020] [008.1] [pid=17635] Current/Max Service Checks: 0/0
[1485275693.621022] [001.0] [pid=17635] handle_timed_event() start
[1485275693.621024] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275693.621025] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275693.621028] [008.0] [pid=17635] ** Timed Event ** Type: EVENT_STATUS_SAVE, Run Time: Tue Jan 24 08:34:45 2017
[1485275693.621029] [008.0] [pid=17635] ** Status Data Save Event
[1485275693.621031] [064.1] [pid=17635] Making callbacks (type 25)...
[1485275693.621033] [001.0] [pid=17635] save_status_data()
[1485275693.621035] [004.2] [pid=17635] Writing status data to temp file '/usr/local/nagios/var/nagios.tmpXXXXXX'
[1485275694.053108] [064.1] [pid=17635] Making callbacks (type 25)...
[1485275694.053123] [001.0] [pid=17635] handle_timed_event() end
[1485275694.053126] [001.0] [pid=17635] reschedule_event()
[1485275694.053128] [001.0] [pid=17635] add_event()
[1485275694.053131] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275694.053135] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275694.053137] [008.1] [pid=17635] ** Event Check Loop
[1485275694.053145] [008.1] [pid=17635] Next High Priority Event Time: Tue Jan 24 08:34:53 2017
[1485275694.053148] [008.1] [pid=17635] Next Low Priority Event Time:  Tue Jan 24 08:33:02 2017
[1485275694.053150] [008.1] [pid=17635] Current/Max Service Checks: 0/0
[1485275694.053155] [001.0] [pid=17635] handle_timed_event() start
[1485275694.053157] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275694.053167] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275694.053170] [008.0] [pid=17635] ** Timed Event ** Type: EVENT_CHECK_REAPER, Run Time: Tue Jan 24 08:34:53 2017
[1485275694.053172] [008.0] [pid=17635] ** Check Result Reaper
[1485275694.053174] [001.0] [pid=17635] reap_check_results() start
[1485275694.053175] [016.0] [pid=17635] Starting to reap check results.
[1485275694.053189] [016.1] [pid=17635] Starting to read check result queue '/usr/local/nagios/var/spool/checkresults'...
[1485275694.053195] [016.0] [pid=17635] Finished reaping 0 check results
[1485275694.053197] [001.0] [pid=17635] reap_check_results() end
[1485275694.053198] [001.0] [pid=17635] handle_timed_event() end
[1485275694.053200] [001.0] [pid=17635] reschedule_event()
[1485275694.053202] [001.0] [pid=17635] add_event()
[1485275694.053203] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275694.053205] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275694.053207] [008.1] [pid=17635] ** Event Check Loop
[1485275694.053209] [008.1] [pid=17635] Next High Priority Event Time: Tue Jan 24 08:34:53 2017
[1485275694.053211] [008.1] [pid=17635] Next Low Priority Event Time:  Tue Jan 24 08:33:02 2017
[1485275694.053213] [008.1] [pid=17635] Current/Max Service Checks: 0/0
[1485275694.053215] [001.0] [pid=17635] handle_timed_event() start
[1485275694.053216] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275694.053218] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275694.053220] [008.0] [pid=17635] ** Timed Event ** Type: EVENT_COMMAND_CHECK, Run Time: Tue Jan 24 08:34:53 2017
[1485275694.053222] [008.0] [pid=17635] ** External Command Check Event
[1485275694.053224] [001.0] [pid=17635] check_for_external_commands()
[1485275694.053226] [001.0] [pid=17635] handle_timed_event() end
[1485275694.053228] [001.0] [pid=17635] reschedule_event()
[1485275694.053230] [001.0] [pid=17635] add_event()
[1485275694.053231] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275694.053233] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275694.053235] [008.1] [pid=17635] ** Event Check Loop
[1485275694.053237] [008.1] [pid=17635] Next High Priority Event Time: Tue Jan 24 08:34:55 2017
[1485275694.053239] [008.1] [pid=17635] Next Low Priority Event Time:  Tue Jan 24 08:33:02 2017
[1485275694.053240] [008.1] [pid=17635] Current/Max Service Checks: 0/0
[1485275694.053242] [024.1] [pid=17635] We're not executing service checks right now, so we'll skip this event.
[1485275694.053245] [008.1] [pid=17635] Running event...
[1485275694.053246] [001.0] [pid=17635] handle_timed_event() start
[1485275694.053248] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275694.053249] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275694.053252] [008.0] [pid=17635] ** Timed Event ** Type: EVENT_SERVICE_CHECK, Run Time: Tue Jan 24 08:33:02 2017
[1485275694.053254] [008.0] [pid=17635] ** Service Check Event ==> Host: 'web1.mdlocal', Service: 'Puppet Status', Options: 1, Latency: 112.053000 sec
[1485275694.053262] [001.0] [pid=17635] run_scheduled_service_check() start
[1485275694.053263] [016.0] [pid=17635] Attempting to run scheduled check of service 'Puppet Status' on host 'web1.mdlocal': check options=1, latency=112.053000
[1485275694.053269] [001.0] [pid=17635] run_async_service_check()
[1485275694.053275] [001.0] [pid=17635] check_service_check_viability()
[1485275694.053278] [064.1] [pid=17635] Making callbacks (type 13)...
[1485275694.053281] [064.2] [pid=17635] Callback #1 (type 13) return code = 0
[1485275694.053283] [016.0] [pid=17635] Checking service 'Puppet Status' on host 'web1.mdlocal'...
[1485275694.053290] [001.0] [pid=17635] get_raw_command_line_r()
[1485275694.053294] [2320.2] [pid=17635] Raw Command Input: $USER1$/check_nrpe -t 60 -H $HOSTADDRESS$ -c $ARG1$
[1485275694.053297] [001.0] [pid=17635] process_macros_r()
[1485275694.053300] [2048.1] [pid=17635] **** BEGIN MACRO PROCESSING ***********
[1485275694.053301] [2048.1] [pid=17635] Processing: 'check_nrpe_alive'
[1485275694.053307] [2048.2] [pid=17635]   Processing part: 'check_nrpe_alive'
[1485275694.053313] [2048.2] [pid=17635]   Not currently in macro.  Running output (16): 'check_nrpe_alive'
[1485275694.053315] [2048.1] [pid=17635]   Done.  Final output: 'check_nrpe_alive'
[1485275694.053317] [2048.1] [pid=17635] **** END MACRO PROCESSING *************
[1485275694.053318] [2320.2] [pid=17635] Expanded Command Output: $USER1$/check_nrpe -t 60 -H $HOSTADDRESS$ -c $ARG1$
[1485275694.053320] [001.0] [pid=17635] process_macros_r()
[1485275694.053322] [2048.1] [pid=17635] **** BEGIN MACRO PROCESSING ***********
[1485275694.053323] [2048.1] [pid=17635] Processing: '$USER1$/check_nrpe -t 60 -H $HOSTADDRESS$ -c $ARG1$'
[1485275694.053325] [2048.2] [pid=17635]   Processing part: ''
[1485275694.053327] [2048.2] [pid=17635]   Not currently in macro.  Running output (0): ''
[1485275694.053328] [2048.2] [pid=17635]   Processing part: 'USER1'
[1485275694.053332] [2048.2] [pid=17635]   Processed 'USER1', Clean Options: 0, Free: 0
[1485275694.053334] [2048.2] [pid=17635]   Processed 'USER1', Clean Options: 0, Free: 0
[1485275694.053336] [2048.2] [pid=17635]   Cleaning options: global=0, local=0, effective=0
[1485275694.053338] [2048.2] [pid=17635]   Uncleaned macro.  Running output (25): '/usr/local/nagios/libexec'
[1485275694.053340] [2048.2] [pid=17635]   Just finished macro.  Running output (25): '/usr/local/nagios/libexec'
[1485275694.053342] [2048.2] [pid=17635]   Processing part: '/check_nrpe -t 60 -H '
[1485275694.053344] [2048.2] [pid=17635]   Not currently in macro.  Running output (46): '/usr/local/nagios/libexec/check_nrpe -t 60 -H '
[1485275694.053345] [2048.2] [pid=17635]   Processing part: 'HOSTADDRESS'
[1485275694.053347] [2048.2] [pid=17635]   Processed 'HOSTADDRESS', Clean Options: 0, Free: 0
[1485275694.053349] [2048.2] [pid=17635]   Processed 'HOSTADDRESS', Clean Options: 0, Free: 0
[1485275694.053351] [2048.2] [pid=17635]   Cleaning options: global=0, local=0, effective=0
[1485275694.053353] [2048.2] [pid=17635]   Uncleaned macro.  Running output (56): '/usr/local/nagios/libexec/check_nrpe -t 60 -H 10.7.64.29'
[1485275694.053355] [2048.2] [pid=17635]   Just finished macro.  Running output (56): '/usr/local/nagios/libexec/check_nrpe -t 60 -H 10.7.64.29'
[1485275694.053356] [2048.2] [pid=17635]   Processing part: ' -c '
[1485275694.053358] [2048.2] [pid=17635]   Not currently in macro.  Running output (60): '/usr/local/nagios/libexec/check_nrpe -t 60 -H 10.7.64.29 -c '
[1485275694.053360] [2048.2] [pid=17635]   Processing part: 'ARG1'
[1485275694.053361] [2048.2] [pid=17635]   Processed 'ARG1', Clean Options: 0, Free: 0
[1485275694.053363] [2048.2] [pid=17635]   Processed 'ARG1', Clean Options: 0, Free: 0
[1485275694.053365] [2048.2] [pid=17635]   Cleaning options: global=0, local=0, effective=0
[1485275694.053367] [2048.2] [pid=17635]   Uncleaned macro.  Running output (76): '/usr/local/nagios/libexec/check_nrpe -t 60 -H 10.7.64.29 -c check_nrpe_alive'
[1485275694.053368] [2048.2] [pid=17635]   Just finished macro.  Running output (76): '/usr/local/nagios/libexec/check_nrpe -t 60 -H 10.7.64.29 -c check_nrpe_alive'
[1485275694.053370] [2048.2] [pid=17635]   Processing part: ''
[1485275694.053372] [2048.2] [pid=17635]   Not currently in macro.  Running output (76): '/usr/local/nagios/libexec/check_nrpe -t 60 -H 10.7.64.29 -c check_nrpe_alive'
[1485275694.053373] [2048.1] [pid=17635]   Done.  Final output: '/usr/local/nagios/libexec/check_nrpe -t 60 -H 10.7.64.29 -c check_nrpe_alive'
[1485275694.053375] [2048.1] [pid=17635] **** END MACRO PROCESSING *************
[1485275694.053377] [064.1] [pid=17635] Making callbacks (type 13)...
[1485275694.053379] [064.2] [pid=17635] Callback #1 (type 13) return code = 0
[1485275694.053396] [016.1] [pid=17635] Check result output will be written to '/usr/local/nagios/var/spool/checkresults/check1UkFKt' (fd=13)
[1485275694.058053] [001.0] [pid=17707] process_macros_r()
[1485275694.058070] [001.0] [pid=17707] process_macros_r()
[1485275694.058075] [001.0] [pid=17707] process_macros_r()
[1485275694.058081] [2048.1] [pid=17707] **** BEGIN MACRO PROCESSING ***********
[1485275694.058083] [2048.1] [pid=17707] Processing: 'essentials::admin'
[1485275694.058085] [2048.2] [pid=17707]   Processing part: 'essentials::admin'
[1485275694.058087] [2048.2] [pid=17707]   Not currently in macro.  Running output (17): 'essentials::admin'
[1485275694.058089] [2048.1] [pid=17707]   Done.  Final output: 'essentials::admin'
[1485275694.058090] [2048.1] [pid=17707] **** END MACRO PROCESSING *************
[1485275694.058094] [001.0] [pid=17707] process_macros_r()
[1485275694.058097] [001.0] [pid=17707] process_macros_r()
[1485275694.058099] [2048.1] [pid=17707] **** BEGIN MACRO PROCESSING ***********
[1485275694.058100] [2048.1] [pid=17707] Processing: 'https://wiki.runbook.url/obviously_altered'
[1485275694.058102] [2048.2] [pid=17707]   Processing part: 'https://wiki.runbook.url/obviously_altered'
[1485275694.058104] [2048.2] [pid=17707]   Not currently in macro.  Running output (58): 'https://wiki.runbook.url/obviously_altered'
[1485275694.058106] [2048.1] [pid=17707]   Done.  Final output: 'https://wiki.runbook.url/obviously_altered'
[1485275694.058107] [2048.1] [pid=17707] **** END MACRO PROCESSING *************
[1485275694.058110] [001.0] [pid=17707] process_macros_r()
[1485275694.058202] [001.0] [pid=17707] process_macros_r()
[1485275694.058206] [001.0] [pid=17707] process_macros_r()
[1485275694.058210] [001.0] [pid=17707] process_macros_r()
[1485275694.058214] [001.0] [pid=17707] process_macros_r()
[1485275694.058217] [001.0] [pid=17707] process_macros_r()
[1485275694.058221] [001.0] [pid=17707] process_macros_r()
[1485275694.058742] [016.2] [pid=17635] Service check is executing in child process (pid=17707)
[1485275694.058766] [001.0] [pid=17635] handle_timed_event() end
[1485275694.058769] [008.1] [pid=17635] ** Event Check Loop
[1485275694.058776] [008.1] [pid=17635] Next High Priority Event Time: Tue Jan 24 08:34:55 2017
[1485275694.058779] [008.1] [pid=17635] Next Low Priority Event Time:  Tue Jan 24 08:34:36 2017
[1485275694.058781] [008.1] [pid=17635] Current/Max Service Checks: 1/0
[1485275694.058784] [024.1] [pid=17635] We're not executing service checks right now, so we'll skip this event.
[1485275694.058786] [001.0] [pid=17635] remove_event()
[1485275694.058788] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275694.058795] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275694.058805] [001.0] [pid=17635] reschedule_event()
[1485275694.058807] [001.0] [pid=17635] add_event()
[1485275694.061264] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275694.061268] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275694.061271] [064.1] [pid=17635] Making callbacks (type 20)...
... ABOVE 14 lines repeated many times ...

[1485275695.173412] [008.0] [pid=17635] ** Timed Event ** Type: EVENT_STATUS_SAVE, Run Time: Tue Jan 24 08:34:55 2017
[1485275695.173415] [008.0] [pid=17635] ** Status Data Save Event
[1485275695.173417] [064.1] [pid=17635] Making callbacks (type 25)...
[1485275695.173418] [001.0] [pid=17635] save_status_data()
[1485275695.173423] [004.2] [pid=17635] Writing status data to temp file '/usr/local/nagios/var/nagios.tmpXXXXXX'
[1485275695.611147] [064.1] [pid=17635] Making callbacks (type 25)...
[1485275695.611162] [001.0] [pid=17635] handle_timed_event() end
[1485275695.611164] [001.0] [pid=17635] reschedule_event()
[1485275695.611166] [001.0] [pid=17635] add_event()
[1485275695.611172] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275695.611177] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275695.611179] [008.1] [pid=17635] ** Event Check Loop
[1485275695.611187] [008.1] [pid=17635] Next High Priority Event Time: Tue Jan 24 08:34:55 2017
[1485275695.611191] [008.1] [pid=17635] Next Low Priority Event Time:  Tue Jan 24 08:34:55 2017
[1485275695.611192] [008.1] [pid=17635] Current/Max Service Checks: 1/0
[1485275695.611197] [001.0] [pid=17635] handle_timed_event() start
[1485275695.611199] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275695.611201] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275695.611203] [008.0] [pid=17635] ** Timed Event ** Type: EVENT_CHECK_REAPER, Run Time: Tue Jan 24 08:34:55 2017
[1485275695.611205] [008.0] [pid=17635] ** Check Result Reaper
[1485275695.611212] [001.0] [pid=17635] reap_check_results() start
[1485275695.611214] [016.0] [pid=17635] Starting to reap check results.
[1485275695.611227] [016.1] [pid=17635] Starting to read check result queue '/usr/local/nagios/var/spool/checkresults'...
[1485275695.611238] [016.1] [pid=17635] Processing check result file: '/usr/local/nagios/var/spool/checkresults/cFuSPEa'
[1485275695.611278] [016.2] [pid=17635] Found a check result (#1) to handle...
[1485275695.611286] [016.1] [pid=17635] Handling check result for service 'Puppet Status' on host 'web1.mdlocal'...
[1485275695.611288] [001.0] [pid=17635] handle_async_service_check_result()
[1485275695.611290] [016.0] [pid=17635] ** Handling check result for service 'Puppet Status' on host 'web1.mdlocal'...
[1485275695.611292] [016.1] [pid=17635] HOST: web1.mdlocal, SERVICE: Puppet Status, CHECK TYPE: Active, OPTIONS: 1, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 127, OUTPUT: (null)
[1485275695.611319] [064.1] [pid=17635] Making callbacks (type 9)...
[1485275695.611322] [064.2] [pid=17635] Callback #1 (type 9) return code = 0
[1485275695.611325] [016.2] [pid=17635] ST: HARD  CA: 5  MA: 5  CS: 2  LS: 2  LHS: 2
[1485275695.611328] [016.1] [pid=17635] Service is in a non-OK state!
[1485275695.611330] [016.1] [pid=17635] Host is currently UP, so we'll recheck its state to make sure...
[1485275695.611342] [016.1] [pid=17635] * Using last known host state: 0
[1485275695.611344] [016.1] [pid=17635] Current/Max Attempt(s): 5/5
[1485275695.611346] [016.1] [pid=17635] Service has reached max number of rechecks, so we'll handle the error...
[1485275695.611348] [001.0] [pid=17635] check_for_service_flapping()
[1485275695.611350] [016.1] [pid=17635] Checking service 'Puppet Status' on host 'web1.mdlocal' for flapping...
[1485275695.611352] [016.2] [pid=17635] LFT=5.00, HFT=20.00, CPC=0.00, PSC=0.00%
[1485275695.611358] [016.1] [pid=17635] Service is not flapping (0.00% state change).
[1485275695.611361] [001.0] [pid=17635] check_for_host_flapping()
[1485275695.611362] [016.1] [pid=17635] Checking host 'web1.mdlocal' for flapping...
[1485275695.611366] [016.2] [pid=17635] LFT=5.00, HFT=20.00, CPC=0.00, PSC=0.00%
[1485275695.611369] [016.1] [pid=17635] Host is not flapping (0.00% state change).
[1485275695.611372] [001.0] [pid=17635] service_notification()
[1485275695.611376] [032.0] [pid=17635] ** Service Notification Attempt ** Host: 'web1.mdlocal', Service: 'Puppet Status', Type: 0, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1485275695.611378] [001.0] [pid=17635] check_service_notification_viability()
[1485275695.611381] [032.1] [pid=17635] Notifications are disabled, so service notifications will not be sent out.
[1485275695.611383] [032.0] [pid=17635] Notification viability test failed.  No notification will be sent out.
[1485275695.611386] [016.1] [pid=17635] Rescheduling next check of service at Tue Jan 24 08:36:54 2017
[1485275695.611388] [001.0] [pid=17635] get_next_valid_time()
[1485275695.611390] [001.0] [pid=17635] check_time_against_period()
[1485275695.611394] [001.0] [pid=17635] schedule_service_check()
[1485275695.611397] [016.0] [pid=17635] Scheduling a non-forced, active check of service 'Puppet Status' on host 'web1.mdlocal' @ Tue Jan 24 08:36:54 2017
[1485275695.611400] [016.2] [pid=17635] Scheduling new service check event.
[1485275695.611401] [001.0] [pid=17635] reschedule_event()
[1485275695.611403] [001.0] [pid=17635] add_event()
[1485275695.613764] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275695.613768] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275695.613770] [064.1] [pid=17635] Making callbacks (type 20)...
[1485275695.613773] [064.1] [pid=17635] Making callbacks (type 13)...
[1485275695.613775] [064.2] [pid=17635] Callback #1 (type 13) return code = 0
[1485275695.613777] [064.1] [pid=17635] Making callbacks (type 20)...
[1485275695.613779] [016.1] [pid=17635] Deleted check result file '/usr/local/nagios/var/spool/checkresults/cFuSPEa'
[1485275695.613781] [016.0] [pid=17635] Finished reaping 1 check results
[1485275695.613783] [001.0] [pid=17635] reap_check_results() end
[1485275695.613785] [001.0] [pid=17635] handle_timed_event() end
[1485275695.613786] [001.0] [pid=17635] reschedule_event()
[1485275695.613788] [001.0] [pid=17635] add_event()
[1485275695.613790] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275695.613791] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275695.613793] [008.1] [pid=17635] ** Event Check Loop
[1485275695.613796] [008.1] [pid=17635] Next High Priority Event Time: Tue Jan 24 08:34:57 2017
[1485275695.613798] [008.1] [pid=17635] Next Low Priority Event Time:  Tue Jan 24 08:34:55 2017
[1485275695.613799] [008.1] [pid=17635] Current/Max Service Checks: 0/0
[1485275695.613802] [024.1] [pid=17635] We're not executing service checks right now, so we'll skip this event.
[1485275695.613804] [001.0] [pid=17635] remove_event()
[1485275695.613806] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275695.613807] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275695.613810] [001.0] [pid=17635] reschedule_event()
[1485275695.613813] [001.0] [pid=17635] add_event()
[1485275695.614168] [064.1] [pid=17635] Making callbacks (type 8)...
[1485275695.614172] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485275695.614178] [064.1] [pid=17635] Making callbacks (type 20)...

Re: Warning: Return code of 127 for check of service...

Posted: Tue Jan 24, 2017 11:54 am
by vext
One more debug log just to show the working check output in debug

Code: Select all

[1485276474.456291] [008.1] [pid=17635] ** Event Check Loop
[1485276474.456298] [008.1] [pid=17635] Next High Priority Event Time: Tue Jan 24 08:47:55 2017
[1485276474.456300] [008.1] [pid=17635] Next Low Priority Event Time:  Tue Jan 24 08:47:55 2017
[1485276474.456302] [008.1] [pid=17635] Current/Max Service Checks: 0/0
[1485276474.456304] [008.2] [pid=17635] No events to execute at the moment.  Idling for a bit...
[1485276474.456306] [001.0] [pid=17635] check_for_external_commands()
[1485276474.456308] [064.1] [pid=17635] Making callbacks (type 8)...
[1485276474.456311] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485276474.707697] [008.1] [pid=17635] ** Event Check Loop
[1485276474.707707] [008.1] [pid=17635] Next High Priority Event Time: Tue Jan 24 08:47:55 2017
[1485276474.707710] [008.1] [pid=17635] Next Low Priority Event Time:  Tue Jan 24 08:47:55 2017
[1485276474.707711] [008.1] [pid=17635] Current/Max Service Checks: 0/0
[1485276474.707715] [008.2] [pid=17635] No events to execute at the moment.  Idling for a bit...
[1485276474.707716] [001.0] [pid=17635] check_for_external_commands()
[1485276474.707719] [001.0] [pid=17635] process_external_command1()
[1485276474.707721] [128.2] [pid=17635] Raw command entry: [1485276474] SCHEDULE_FORCED_SVC_CHECK;web1.mdlocal;NRPE;1485276474
[1485276474.707756] [064.1] [pid=17635] Making callbacks (type 9)...
[1485276474.707760] [064.2] [pid=17635] Callback #1 (type 9) return code = 0
[1485276474.707762] [064.1] [pid=17635] Making callbacks (type 24)...
[1485276474.707764] [064.2] [pid=17635] Callback #1 (type 24) return code = 0
[1485276474.707766] [001.0] [pid=17635] process_external_command2()
[1485276474.707768] [128.1] [pid=17635] External Command Type: 54
[1485276474.707769] [128.1] [pid=17635] Command Entry Time: 1485276474
[1485276474.707771] [128.1] [pid=17635] Command Arguments: web1.mdlocal;NRPE;1485276474
[1485276474.707779] [001.0] [pid=17635] schedule_service_check()
[1485276474.707790] [016.0] [pid=17635] Scheduling a forced, active check of service 'NRPE' on host 'web1.mdlocal' @ Tue Jan 24 08:47:54 2017
[1485276474.707792] [016.2] [pid=17635] Scheduling new service check event.
[1485276474.707794] [001.0] [pid=17635] reschedule_event()
[1485276474.707796] [001.0] [pid=17635] add_event()
[1485276474.707798] [064.1] [pid=17635] Making callbacks (type 8)...
[1485276474.707800] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485276474.707802] [064.1] [pid=17635] Making callbacks (type 20)...
[1485276474.707804] [064.1] [pid=17635] Making callbacks (type 24)...
[1485276474.707805] [064.2] [pid=17635] Callback #1 (type 24) return code = 0
[1485276474.707808] [064.1] [pid=17635] Making callbacks (type 8)...
[1485276474.707810] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485276474.960106] [008.1] [pid=17635] ** Event Check Loop
[1485276474.960115] [008.1] [pid=17635] Next High Priority Event Time: Tue Jan 24 08:47:55 2017
[1485276474.960118] [008.1] [pid=17635] Next Low Priority Event Time:  Tue Jan 24 08:47:54 2017
[1485276474.960120] [008.1] [pid=17635] Current/Max Service Checks: 0/0
[1485276474.960123] [024.1] [pid=17635] We're not executing service checks right now, so we'll skip this event.
[1485276474.960124] [008.1] [pid=17635] Running event...
[1485276474.960126] [001.0] [pid=17635] handle_timed_event() start
[1485276474.960128] [064.1] [pid=17635] Making callbacks (type 8)...
[1485276474.960130] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485276474.960132] [008.0] [pid=17635] ** Timed Event ** Type: EVENT_SERVICE_CHECK, Run Time: Tue Jan 24 08:47:54 2017
[1485276474.960135] [008.0] [pid=17635] ** Service Check Event ==> Host: 'web1.mdlocal', Service: 'NRPE', Options: 1, Latency: 0.960000 sec
[1485276474.960139] [001.0] [pid=17635] run_scheduled_service_check() start
[1485276474.960140] [016.0] [pid=17635] Attempting to run scheduled check of service 'NRPE' on host 'web1.mdlocal': check options=1, latency=0.960000
[1485276474.960143] [001.0] [pid=17635] run_async_service_check()
[1485276474.960146] [001.0] [pid=17635] check_service_check_viability()
[1485276474.960149] [064.1] [pid=17635] Making callbacks (type 13)...
[1485276474.960151] [064.2] [pid=17635] Callback #1 (type 13) return code = 0
[1485276474.960153] [016.0] [pid=17635] Checking service 'NRPE' on host 'web1.mdlocal'...
[1485276474.960161] [001.0] [pid=17635] get_raw_command_line_r()
[1485276474.960166] [2320.2] [pid=17635] Raw Command Input: $USER1$/check_nrpe -t 60 -H $HOSTADDRESS$ -c $ARG1$
[1485276474.960169] [001.0] [pid=17635] process_macros_r()
[1485276474.960171] [2048.1] [pid=17635] **** BEGIN MACRO PROCESSING ***********
[1485276474.960173] [2048.1] [pid=17635] Processing: 'check_nrpe_alive'
[1485276474.960175] [2048.2] [pid=17635]   Processing part: 'check_nrpe_alive'
[1485276474.960177] [2048.2] [pid=17635]   Not currently in macro.  Running output (16): 'check_nrpe_alive'
[1485276474.960179] [2048.1] [pid=17635]   Done.  Final output: 'check_nrpe_alive'
[1485276474.960181] [2048.1] [pid=17635] **** END MACRO PROCESSING *************
[1485276474.960182] [2320.2] [pid=17635] Expanded Command Output: $USER1$/check_nrpe -t 60 -H $HOSTADDRESS$ -c $ARG1$
[1485276474.960184] [001.0] [pid=17635] process_macros_r()
[1485276474.960186] [2048.1] [pid=17635] **** BEGIN MACRO PROCESSING ***********
[1485276474.960187] [2048.1] [pid=17635] Processing: '$USER1$/check_nrpe -t 60 -H $HOSTADDRESS$ -c $ARG1$'
[1485276474.960189] [2048.2] [pid=17635]   Processing part: ''
[1485276474.960191] [2048.2] [pid=17635]   Not currently in macro.  Running output (0): ''
[1485276474.960192] [2048.2] [pid=17635]   Processing part: 'USER1'
[1485276474.960195] [2048.2] [pid=17635]   Processed 'USER1', Clean Options: 0, Free: 0
[1485276474.960196] [2048.2] [pid=17635]   Processed 'USER1', Clean Options: 0, Free: 0
[1485276474.960198] [2048.2] [pid=17635]   Cleaning options: global=0, local=0, effective=0
[1485276474.960200] [2048.2] [pid=17635]   Uncleaned macro.  Running output (25): '/usr/local/nagios/libexec'
[1485276474.960212] [2048.2] [pid=17635]   Just finished macro.  Running output (25): '/usr/local/nagios/libexec'
[1485276474.960214] [2048.2] [pid=17635]   Processing part: '/check_nrpe -t 60 -H '
[1485276474.960216] [2048.2] [pid=17635]   Not currently in macro.  Running output (46): '/usr/local/nagios/libexec/check_nrpe -t 60 -H '
[1485276474.960217] [2048.2] [pid=17635]   Processing part: 'HOSTADDRESS'
[1485276474.960219] [2048.2] [pid=17635]   Processed 'HOSTADDRESS', Clean Options: 0, Free: 0
[1485276474.960222] [2048.2] [pid=17635]   Processed 'HOSTADDRESS', Clean Options: 0, Free: 0
[1485276474.960224] [2048.2] [pid=17635]   Cleaning options: global=0, local=0, effective=0
[1485276474.960226] [2048.2] [pid=17635]   Uncleaned macro.  Running output (56): '/usr/local/nagios/libexec/check_nrpe -t 60 -H 10.7.64.29'
[1485276474.960228] [2048.2] [pid=17635]   Just finished macro.  Running output (56): '/usr/local/nagios/libexec/check_nrpe -t 60 -H 10.7.64.29'
[1485276474.960229] [2048.2] [pid=17635]   Processing part: ' -c '
[1485276474.960231] [2048.2] [pid=17635]   Not currently in macro.  Running output (60): '/usr/local/nagios/libexec/check_nrpe -t 60 -H 10.7.64.29 -c '
[1485276474.960233] [2048.2] [pid=17635]   Processing part: 'ARG1'
[1485276474.960235] [2048.2] [pid=17635]   Processed 'ARG1', Clean Options: 0, Free: 0
[1485276474.960236] [2048.2] [pid=17635]   Processed 'ARG1', Clean Options: 0, Free: 0
[1485276474.960238] [2048.2] [pid=17635]   Cleaning options: global=0, local=0, effective=0
[1485276474.960240] [2048.2] [pid=17635]   Uncleaned macro.  Running output (76): '/usr/local/nagios/libexec/check_nrpe -t 60 -H 10.7.64.29 -c check_nrpe_alive'
[1485276474.960241] [2048.2] [pid=17635]   Just finished macro.  Running output (76): '/usr/local/nagios/libexec/check_nrpe -t 60 -H 10.7.64.29 -c check_nrpe_alive'
[1485276474.960243] [2048.2] [pid=17635]   Processing part: ''
[1485276474.960245] [2048.2] [pid=17635]   Not currently in macro.  Running output (76): '/usr/local/nagios/libexec/check_nrpe -t 60 -H 10.7.64.29 -c check_nrpe_alive'
[1485276474.960246] [2048.1] [pid=17635]   Done.  Final output: '/usr/local/nagios/libexec/check_nrpe -t 60 -H 10.7.64.29 -c check_nrpe_alive'
[1485276474.960248] [2048.1] [pid=17635] **** END MACRO PROCESSING *************
[1485276474.960250] [064.1] [pid=17635] Making callbacks (type 13)...
[1485276474.960252] [064.2] [pid=17635] Callback #1 (type 13) return code = 0
[1485276474.960270] [016.1] [pid=17635] Check result output will be written to '/usr/local/nagios/var/spool/checkresults/checkXcg5aY' (fd=13)
[1485276474.964934] [001.0] [pid=19558] process_macros_r()
[1485276474.964949] [001.0] [pid=19558] process_macros_r()
[1485276474.964956] [001.0] [pid=19558] process_macros_r()
[1485276474.964958] [2048.1] [pid=19558] **** BEGIN MACRO PROCESSING ***********
[1485276474.964960] [2048.1] [pid=19558] Processing: 'essentials::admin'
[1485276474.964962] [2048.2] [pid=19558]   Processing part: 'essentials::admin'
[1485276474.964964] [2048.2] [pid=19558]   Not currently in macro.  Running output (17): 'essentials::admin'
[1485276474.964966] [2048.1] [pid=19558]   Done.  Final output: 'essentials::admin'
[1485276474.964967] [2048.1] [pid=19558] **** END MACRO PROCESSING *************
[1485276474.964970] [001.0] [pid=19558] process_macros_r()
[1485276474.964974] [001.0] [pid=19558] process_macros_r()
[1485276474.964975] [2048.1] [pid=19558] **** BEGIN MACRO PROCESSING ***********
[1485276474.964977] [2048.1] [pid=19558] Processing: 'https://wiki.runbook.url/obviously_altered'
[1485276474.964979] [2048.2] [pid=19558]   Processing part: 'https://wiki.runbook.url/obviously_altered'
[1485276474.964981] [2048.2] [pid=19558]   Not currently in macro.  Running output (58): 'https://wiki.runbook.url/obviously_altered'
[1485276474.964982] [2048.1] [pid=19558]   Done.  Final output: 'https://wiki.runbook.url/obviously_altered'
[1485276474.964984] [2048.1] [pid=19558] **** END MACRO PROCESSING *************
[1485276474.964992] [001.0] [pid=19558] process_macros_r()
[1485276474.965172] [001.0] [pid=19558] process_macros_r()
[1485276474.965178] [001.0] [pid=19558] process_macros_r()
[1485276474.965182] [001.0] [pid=19558] process_macros_r()
[1485276474.966185] [016.2] [pid=17635] Service check is executing in child process (pid=19558)
[1485276474.966217] [001.0] [pid=17635] handle_timed_event() end
[1485276474.966221] [008.1] [pid=17635] ** Event Check Loop
[1485276474.966230] [008.1] [pid=17635] Next High Priority Event Time: Tue Jan 24 08:47:55 2017
[1485276474.966234] [008.1] [pid=17635] Next Low Priority Event Time:  Tue Jan 24 08:47:55 2017
[1485276474.966236] [008.1] [pid=17635] Current/Max Service Checks: 1/0
[1485276474.966240] [008.2] [pid=17635] No events to execute at the moment.  Idling for a bit...
[1485276474.966242] [001.0] [pid=17635] check_for_external_commands()
[1485276474.966245] [064.1] [pid=17635] Making callbacks (type 8)...
[1485276474.966251] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485276474.977754] [016.2] [pid=19558] Moving temp check result file '/usr/local/nagios/var/spool/checkresults/checkXcg5aY' to queue file '/usr/local/nagios/var/spool/checkresults/crdAufp'...
[1485276475.217568] [008.1] [pid=17635] ** Event Check Loop
[1485276475.217582] [008.1] [pid=17635] Next High Priority Event Time: Tue Jan 24 08:47:55 2017
[1485276475.217585] [008.1] [pid=17635] Next Low Priority Event Time:  Tue Jan 24 08:47:55 2017
[1485276475.217587] [008.1] [pid=17635] Current/Max Service Checks: 1/0
[1485276475.217596] [001.0] [pid=17635] handle_timed_event() start
[1485276475.217599] [064.1] [pid=17635] Making callbacks (type 8)...
[1485276475.217602] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485276475.217605] [008.0] [pid=17635] ** Timed Event ** Type: EVENT_STATUS_SAVE, Run Time: Tue Jan 24 08:47:55 2017
[1485276475.217607] [008.0] [pid=17635] ** Status Data Save Event
[1485276475.217609] [064.1] [pid=17635] Making callbacks (type 25)...
[1485276475.217610] [001.0] [pid=17635] save_status_data()
[1485276475.217615] [004.2] [pid=17635] Writing status data to temp file '/usr/local/nagios/var/nagios.tmpXXXXXX'
[1485276475.657366] [064.1] [pid=17635] Making callbacks (type 25)...
[1485276475.657381] [001.0] [pid=17635] handle_timed_event() end
[1485276475.657384] [001.0] [pid=17635] reschedule_event()
[1485276475.657386] [001.0] [pid=17635] add_event()
[1485276475.657391] [064.1] [pid=17635] Making callbacks (type 8)...
[1485276475.657396] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485276475.657399] [008.1] [pid=17635] ** Event Check Loop
[1485276475.657407] [008.1] [pid=17635] Next High Priority Event Time: Tue Jan 24 08:47:55 2017
[1485276475.657410] [008.1] [pid=17635] Next Low Priority Event Time:  Tue Jan 24 08:47:55 2017
[1485276475.657411] [008.1] [pid=17635] Current/Max Service Checks: 1/0
[1485276475.657416] [001.0] [pid=17635] handle_timed_event() start
[1485276475.657418] [064.1] [pid=17635] Making callbacks (type 8)...
[1485276475.657420] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485276475.657422] [008.0] [pid=17635] ** Timed Event ** Type: EVENT_CHECK_REAPER, Run Time: Tue Jan 24 08:47:55 2017
[1485276475.657424] [008.0] [pid=17635] ** Check Result Reaper
[1485276475.657426] [001.0] [pid=17635] reap_check_results() start
[1485276475.657428] [016.0] [pid=17635] Starting to reap check results.
[1485276475.657440] [016.1] [pid=17635] Starting to read check result queue '/usr/local/nagios/var/spool/checkresults'...
[1485276475.657451] [016.1] [pid=17635] Processing check result file: '/usr/local/nagios/var/spool/checkresults/crdAufp'
[1485276475.657490] [016.2] [pid=17635] Found a check result (#1) to handle...
[1485276475.657498] [016.1] [pid=17635] Handling check result for service 'NRPE' on host 'web1.mdlocal'...
[1485276475.657500] [001.0] [pid=17635] handle_async_service_check_result()
[1485276475.657502] [016.0] [pid=17635] ** Handling check result for service 'NRPE' on host 'web1.mdlocal'...
[1485276475.657504] [016.1] [pid=17635] HOST: web1.mdlocal, SERVICE: NRPE, CHECK TYPE: Active, OPTIONS: 1, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 0, OUTPUT: NRPE OK: \n
[1485276475.657532] [016.2] [pid=17635] Parsing check output...
[1485276475.657535] [016.2] [pid=17635] Short Output: NRPE OK:
[1485276475.657537] [016.2] [pid=17635] Long Output:  NULL
[1485276475.657539] [016.2] [pid=17635] Perf Data:    NULL
[1485276475.657541] [016.2] [pid=17635] ST: HARD  CA: 1  MA: 5  CS: 0  LS: 0  LHS: 0
[1485276475.657543] [016.1] [pid=17635] Service is OK.
[1485276475.657545] [016.1] [pid=17635] Service did not change state.
[1485276475.657548] [016.1] [pid=17635] Rescheduling next check of service at Tue Jan 24 08:49:54 2017
[1485276475.657550] [001.0] [pid=17635] get_next_valid_time()
[1485276475.657552] [001.0] [pid=17635] check_time_against_period()
[1485276475.657557] [001.0] [pid=17635] schedule_service_check()
[1485276475.657560] [016.0] [pid=17635] Scheduling a non-forced, active check of service 'NRPE' on host 'web1.mdlocal' @ Tue Jan 24 08:49:54 2017
[1485276475.657563] [016.2] [pid=17635] Scheduling new service check event.
[1485276475.657564] [001.0] [pid=17635] reschedule_event()
[1485276475.657566] [001.0] [pid=17635] add_event()
[1485276475.660060] [064.1] [pid=17635] Making callbacks (type 8)...
[1485276475.660065] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485276475.660067] [064.1] [pid=17635] Making callbacks (type 20)...
[1485276475.660070] [064.1] [pid=17635] Making callbacks (type 13)...
[1485276475.660072] [064.2] [pid=17635] Callback #1 (type 13) return code = 0
[1485276475.660074] [064.1] [pid=17635] Making callbacks (type 20)...
[1485276475.660076] [001.0] [pid=17635] check_for_service_flapping()
[1485276475.660078] [016.1] [pid=17635] Checking service 'NRPE' on host 'web1.mdlocal' for flapping...
[1485276475.660080] [016.2] [pid=17635] LFT=5.00, HFT=20.00, CPC=0.00, PSC=0.00%
[1485276475.660086] [016.1] [pid=17635] Service is not flapping (0.00% state change).
[1485276475.660088] [001.0] [pid=17635] check_for_host_flapping()
[1485276475.660090] [016.1] [pid=17635] Checking host 'web1.mdlocal' for flapping...
[1485276475.660093] [016.2] [pid=17635] LFT=5.00, HFT=20.00, CPC=0.00, PSC=0.00%
[1485276475.660097] [016.1] [pid=17635] Host is not flapping (0.00% state change).
[1485276475.660099] [016.1] [pid=17635] Deleted check result file '/usr/local/nagios/var/spool/checkresults/crdAufp'
[1485276475.660102] [016.0] [pid=17635] Finished reaping 1 check results
[1485276475.660103] [001.0] [pid=17635] reap_check_results() end
[1485276475.660105] [001.0] [pid=17635] handle_timed_event() end
[1485276475.660106] [001.0] [pid=17635] reschedule_event()
[1485276475.660108] [001.0] [pid=17635] add_event()
[1485276475.660110] [064.1] [pid=17635] Making callbacks (type 8)...
[1485276475.660111] [064.2] [pid=17635] Callback #1 (type 8) return code = 0
[1485276475.660113] [008.1] [pid=17635] ** Event Check Loop
[1485276475.660116] [008.1] [pid=17635] Next High Priority Event Time: Tue Jan 24 08:47:57 2017
[1485276475.660118] [008.1] [pid=17635] Next Low Priority Event Time:  Tue Jan 24 08:47:55 2017
[1485276475.660120] [008.1] [pid=17635] Current/Max Service Checks: 0/0
[1485276475.660123] [024.1] [pid=17635] We're not executing service checks right now, so we'll skip this event.

Re: Warning: Return code of 127 for check of service...

Posted: Tue Jan 24, 2017 3:42 pm
by dwhitfield
What version of Core are you using? Was it compiled from source or installed from distro repos? On what OS/version is nagios running? cat /etc/*-release may be of use.

Also, please post the output of ls -la /usr/local/nagios/libexec from both a server returning errors and one that isn't. Thanks!

Re: Warning: Return code of 127 for check of service...

Posted: Tue Jan 24, 2017 4:25 pm
by vext
OK, please don't laugh. I inherited this, and the modern nagios host I have built are CentOS 7 with 4.1.1 on them.

However the one causing me problems is:

Code: Select all

~$ cat /etc/*-release
CentOS release 5.6 (Final)
~$
We are running 3.5.0 on the older generation. We have ~10 nagios hosts due to inherited architecture, and only one is exhibiting this behavior.
Note that all testing and snippets above are from one box. It isn't that some Nagios instances work and others do not. From one box some services work and some do not.

requested ls -la /usr/local/nagios/libexec:

Code: Select all

~$ ls -la /usr/local/nagios/libexec
total 8732
drwxr-xr-x  7 nagios nagios    4096 Jan 23 22:03 .
drwxrwxr-x 14 nagios nagios    4096 Jan 24 20:00 ..
-rwxr-xr-x  1 nagios nagios  368020 Jan 23 14:08 check_apt
-rwxr-xr-x  1 nagios nagios    3934 Jan 23 14:08 checkBlacklist
-rwxr-xr-x  1 nagios nagios    2245 Jan 23 14:08 check_breeze
-rwxr-xr-x  1 nagios nagios  138043 Jan 23 14:08 check_by_ssh
-rwxr-xr-x  1 nagios nagios   92608 Jan 23 14:08 check_cluster
-rwxr-xr-x  1 nagios nagios    4102 Jan 23 14:08 check_db_query
-rwxr-xr-x  1 nagios nagios   13244 Jan 23 14:08 check_dell.pl
-rwxr-xr-x  1 nagios nagios  130129 Jan 23 14:08 check_dhcp
-rwxr-xr-x  1 nagios nagios  132358 Jan 23 14:08 check_dig
-rwxr-xr-x  1 nagios nagios  420611 Jan 23 14:08 check_disk
-rwxr-xr-x  1 nagios nagios    8080 Jan 23 14:08 check_disk_smb
-rwxr-xr-x  1 nagios nagios  140007 Jan 23 14:08 check_dns
-rwxr-xr-x  1 nagios nagios   58669 Jan 23 14:08 check_dummy
lrwxrwxrwx  1 nagios nagios      24 Jan 23 14:08 check_email_loop -> check_email_loop_snat.pl
-rwxr-xr-x  1 nagios nagios   21387 Jan 23 14:08 check_email_loop_snat.pl
-rwxr-xr-x  1 nagios nagios   20067 Jan 23 14:08 check_email_loop_v1.3.2.pl
-rwxr-xr-x  1 nagios nagios   20655 Jan 23 14:08 check_email_loop_v1.5.pl
-rwxr-xr-x  1 nagios nagios   24098 Jan 23 14:08 check_email_roundtrip
-rwxr-xr-x  1 nagios nagios    9456 Jan 23 14:08 check_esxi_wbem.py
-rwxr-xr-x  1 nagios nagios    3418 Jan 23 14:08 check_esx_wbem.py
-rwxr-xr-x  1 nagios nagios    3056 Jan 23 14:08 check_file_age
-rwxr-xr-x  1 nagios nagios    2767 Jan 23 14:08 check_file_age.sh
-rwxr-xr-x  1 nagios nagios    6318 Jan 23 14:08 check_flexlm
-rwxr-xr-x  1 nagios nagios   45460 Jan 23 14:08 check_fping
-rwxr-xr-x  1 nagios nagios    1593 Jan 23 14:09 check_hdfsfileserver.sh
-rwxr-xr-x  1 nagios nagios    1043 Jan 23 14:08 check_hdfs_remaining.pl
-rwxr-xr-x  1 nagios nagios     775 Jan 23 14:09 check_hdfs_totalcapacity.pl
-rwxr-xr-x  1 nagios nagios     737 Jan 23 14:09 check_heapusage_jmx.sh
-rwxr-xr-x  1 nagios nagios  363663 Jan 23 14:09 check_http
-rwxr-xr-x  1 nagios nagios  137199 Jan 23 14:09 check_icmp
-rwxr-xr-x  1 nagios nagios   99433 Jan 23 14:09 check_ide_smart
-rwxr-xr-x  1 nagios nagios   15137 Jan 23 14:09 check_ifoperstatus
-rwxr-xr-x  1 nagios nagios    2567 Jan 23 14:09 check_ifstats.pl
-rwxr-xr-x  1 nagios nagios   12523 Jan 23 14:09 check_ifstatus
-rwxr-xr-x  1 nagios nagios    7355 Jan 23 14:09 check_ircd
-rwxr-xr-x  1 nagios nagios    3028 Jan 23 14:09 check_jabber_login.rb
-rwxr-xr-x  1 nagios nagios     282 Jan 23 14:09 check_jmx
-rwxr-xr-x  1 nagios nagios    8470 Jan 23 14:09 check_jmx.jar
-rwxr-xr-x  1 nagios nagios    1842 Jan 23 14:09 check_jmx.pl
-rwxr-xr-x  1 nagios nagios    8444 Jan 23 14:08 check-juniper-ssl-vpn.pl
-rwxr-xr-x  1 nagios nagios  112688 Jan 23 14:09 check_ldap
-rwxr-xr-x  1 nagios nagios  125318 Jan 23 14:09 check_load
-rwxr-xr-x  1 nagios nagios    6020 Jan 23 14:09 check_log
-rwxr-xr-x  1 nagios nagios   20287 Jan 23 14:09 check_mailq
-rwxr-xr-x  1 nagios nagios    1168 Jan 23 14:09 check_mailstore.pl
-rwxr-xr-x  1 nagios nagios   99378 Jan 23 14:09 check_mrtg
-rwxr-xr-x  1 nagios nagios   98883 Jan 23 14:09 check_mrtgtraf
-rwxr-xr-x  1 nagios nagios  129707 Jan 23 14:09 check_mysql
-rwxr-xr-x  1 nagios nagios  124985 Jan 23 14:09 check_mysql_query
-rwxr-xr-x  1 nagios nagios 1510034 Jan 23 14:09 check_mysql_query_new
-rwxr-xr-x  1 nagios nagios    9171 Jan 23 14:09 check_mysql_slavestatus.sh
-rwxr-xr-x  1 nagios nagios  117600 Jan 23 14:09 check_nagios
-rwxr-xr-x  1 nagios nagios   31330 Jan 23 14:09 check_netappfiler.py
-rwxr-xr-x  1 nagios nagios    4194 Jan 23 14:09 check_netapp_inodes
-rwxr-xr-x  1 nagios nagios   22770 Jan 23 14:09 check_netapp.pl
-rwxr-xr-x  1 nagios nagios   24744 Jan 23 14:09 check_nrpe
-rwxr-xr-x  1 nagios nagios  128964 Jan 23 14:09 check_nt
-rwxr-xr-x  1 nagios nagios  135325 Jan 23 14:09 check_ntp
-rwxr-xr-x  1 nagios nagios  123177 Jan 23 14:09 check_ntp_peer
-rwxr-xr-x  1 nagios nagios  124323 Jan 23 14:09 check_ntp_time
-rwxr-xr-x  1 nagios nagios  153410 Jan 23 14:09 check_nwstat
-rwxr-xr-x  1 nagios nagios   14900 Jan 23 14:09 check_om_storage.pl
-rwxr-xr-x  1 nagios nagios  114256 Jan 23 14:09 check_overcr
-rwxr-xr-x  1 nagios nagios  153532 Jan 23 14:09 check_ping
-rwxr-xr-x  1 nagios nagios  384203 Jan 23 14:09 check_procs
-rwxr-xr-x  1 nagios nagios   11411 Jan 23 14:09 check_rbl
-rwxr-xr-x  1 nagios nagios  111164 Jan 23 14:09 check_real
-rwxr-xr-x  1 nagios nagios    9584 Jan 23 14:09 check_rpc
-rwxr-xr-x  1 nagios nagios    1135 Jan 23 14:09 check_sensors
-rwxr-xr-x  1 nagios nagios    1867 Jan 23 14:09 check_slave_expire.sh
-rwxr-xr-x  1 nagios nagios  387810 Jan 23 14:09 check_smtp
-rwxr-xr-x  1 nagios nagios  419344 Jan 23 14:09 check_snmp
-rwxr-xr-x  1 nagios nagios   33646 Jan 23 14:09 check_snmp_int.pl
-rwxr-xr-x  1 nagios nagios   20609 Jan 23 14:09 check_snmp_mem.pl
-rwxr-xr-x  1 nagios nagios   20498 Jan 23 14:09 check_snmp_mem.pl.orig
-rwxr-xr-x  1 nagios nagios  111036 Jan 23 14:09 check_ssh
-rwxr-xr-x  1 nagios nagios   16370 Jan 23 14:09 check_ssl_cert
-rwxr-xr-x  1 nagios nagios    5564 Jan 23 14:09 check_ssl_certificate
-rwxr-xr-x  1 nagios nagios     555 Jan 23 14:09 check_ssl_handshake_timer
-rwxr-xr-x  1 nagios nagios  128666 Jan 23 14:09 check_swap
-rwxr-xr-x  1 nagios nagios    2421 Jan 23 14:09 check_tacacs_plus.pl
-rwxr-xr-x  1 nagios nagios  154752 Jan 23 14:09 check_tcp
-rwxr-xr-x  1 nagios nagios  113661 Jan 23 14:09 check_time
lrwxrwxrwx  1 nagios nagios       9 Jan 23 14:09 check_udp -> check_tcp
-rwxr-xr-x  1 nagios nagios  120780 Jan 23 14:09 check_ups
-rwxr-xr-x  1 nagios nagios  124420 Jan 23 14:09 check_users
-rwxr-xr-x  1 nagios nagios    3927 Jan 23 14:09 check_vmware_cpumem.pl
-rwxr-xr-x  1 nagios nagios    2939 Jan 23 14:09 check_wave
-rwxr-xr-x  1 nagios nagios    1915 Jan 23 14:09 check_webservice.py
-rwxr-xr-x  1 nagios nagios     584 Jan 23 14:09 check_zerohour.sh
drwxr-xr-x  2 nagios nagios    4096 Jan 23 14:09 eventhandlers
-rwxr-xr-x  1 nagios nagios    8486 Jan 23 14:09 haproxy_query.pl
drwxr-xr-x  2 nagios nagios    4096 Jan 23 14:09 jmx_properties
-rwxr-xr-x  1 nagios nagios   90754 Jan 23 14:09 negate
drwxr-xr-x  2 nagios nagios    4096 Jan 23 14:09 netapp
-rwxr-xr-x  1 nagios nagios   93951 Jan 23 14:09 urlize
-rwxr-xr-x  1 nagios nagios    1939 Jan 23 14:09 utils.pm
-rwxr-xr-x  1 nagios nagios     862 Jan 23 14:09 utils.sh
It was compiled from source with 2 patches applied:
nagios-hostgroups-segfault-fix.patch

Code: Select all

diff -ur nagios-3.5.0.orig/xdata/xodtemplate.c nagios-3.5.0/xdata/xodtemplate.c
--- nagios-3.5.0.orig/xdata/xodtemplate.c	2013-11-11 17:30:42.000000000 +0000
+++ nagios-3.5.0/xdata/xodtemplate.c	2013-11-11 17:34:10.000000000 +0000
@@ -7628,7 +7628,11 @@
 				logit(NSLOG_CONFIG_ERROR, TRUE, "Error: Could not find member group '%s' specified in contactgroup (config file '%s', starting on line %d)\n", buf, xodtemplate_config_file_name(temp_contactgroup->_config_file), temp_contactgroup->_start_line);
 				return ERROR;
 				}
-			xodtemplate_recombobulate_contactgroup_subgroups(sub_group, &newmembers);
+
+			if (sub_group == temp_contactgroup)
+				newmembers = NULL;
+			else
+				xodtemplate_recombobulate_contactgroup_subgroups(sub_group, &newmembers);

 			/* add new (sub) members */
 			if(newmembers != NULL) {
@@ -7839,7 +7843,11 @@
 				logit(NSLOG_CONFIG_ERROR, TRUE, "Error: Could not find member group '%s' specified in hostgroup (config file '%s', starting on line %d)\n", buf, xodtemplate_config_file_name(temp_hostgroup->_config_file), temp_hostgroup->_start_line);
 				return ERROR;
 				}
-			xodtemplate_recombobulate_hostgroup_subgroups(sub_group, &newmembers);
+
+			if (sub_group == temp_hostgroup)
+				newmembers = NULL;
+			else
+				xodtemplate_recombobulate_hostgroup_subgroups(sub_group, &newmembers);

 			/* add new (sub) members */
 			if(newmembers != NULL) {
@@ -8074,7 +8082,11 @@
 				logit(NSLOG_CONFIG_ERROR, TRUE, "Error: Could not find member group '%s' specified in servicegroup (config file '%s', starting on line %d)\n", buf, xodtemplate_config_file_name(temp_servicegroup->_config_file), temp_servicegroup->_start_line);
 				return ERROR;
 				}
-			xodtemplate_recombobulate_servicegroup_subgroups(sub_group, &newmembers);
+
+			if (sub_group == temp_servicegroup)
+				newmembers = NULL;
+			else
+				xodtemplate_recombobulate_servicegroup_subgroups(sub_group, &newmembers);

 			/* add new (sub) members */
 			if(newmembers != NULL) {
nagios-ipv6-fields.patch

Code: Select all

diff -ru nagios/base/commands.c nagios-new/base/commands.c
--- nagios/base/commands.c	2011-07-26 00:16:12.000000000 +0000
+++ nagios-new/base/commands.c	2013-06-09 18:35:54.000000000 +0000
@@ -2078,6 +2078,10 @@
 				real_host_name = temp_host->name;
 				break;
 				}
+			if(!strcmp(host_name, temp_host->address6)) {
+				real_host_name = temp_host->name;
+				break;
+				}
 			}
 		}

@@ -2219,6 +2223,10 @@
 				real_host_name = temp_host->name;
 				break;
 				}
+			if(!strcmp(host_name, temp_host->address6)) {
+				real_host_name = temp_host->name;
+				break;
+				}
 			}
 		}

diff -ru nagios/cgi/config.c nagios-new/cgi/config.c
--- nagios/cgi/config.c	2013-02-27 01:37:32.000000000 +0000
+++ nagios-new/cgi/config.c	2013-06-09 18:35:54.000000000 +0000
@@ -485,6 +485,7 @@
 	printf("<TH CLASS='data'>Host Name</TH>");
 	printf("<TH CLASS='data'>Alias/Description</TH>");
 	printf("<TH CLASS='data'>Address</TH>");
+	printf("<TH CLASS='data'>Address6</TH>");
 	printf("<TH CLASS='data'>Parent Hosts</TH>");
 	printf("<TH CLASS='data'>Max. Check Attempts</TH>");
 	printf("<TH CLASS='data'>Check Interval</TH>\n");
@@ -544,6 +545,7 @@
 			       url_encode(temp_host->name), CONFIG_CGI, url_encode(temp_host->name), html_encode(temp_host->name, FALSE));
 			printf("<TD CLASS='%s'>%s</TD>\n", bg_class, html_encode(temp_host->alias, FALSE));
 			printf("<TD CLASS='%s'>%s</TD>\n", bg_class, html_encode(temp_host->address, FALSE));
+			printf("<TD CLASS='%s'>%s</TD>\n", bg_class, html_encode(temp_host->address6, FALSE));

 			printf("<TD CLASS='%s'>", bg_class);
 			for(temp_hostsmember = temp_host->parent_hosts; temp_hostsmember != NULL; temp_hostsmember = temp_hostsmember->next) {
diff -ru nagios/cgi/extinfo.c nagios-new/cgi/extinfo.c
--- nagios/cgi/extinfo.c	2011-08-17 07:36:27.000000000 +0000
+++ nagios-new/cgi/extinfo.c	2013-06-09 18:35:54.000000000 +0000
@@ -247,6 +247,9 @@
 				printf("function nagios_get_host_address()\n{\n");
 				printf("return \"%s\";\n", temp_host->address);
 				printf("}\n");
+				printf("function nagios_get_host_address6()\n{\n");
+				printf("return \"%s\";\n", temp_host->address6);
+				printf("}\n");
 				if(temp_service != NULL) {
 					printf("function nagios_get_service_description()\n{\n");
 					printf("return \"%s\";\n", temp_service->description);
@@ -351,7 +354,7 @@
 				if(found == FALSE)
 					printf("No hostgroups");
 				printf("</DIV><BR>\n");
-				printf("<DIV CLASS='data'>%s</DIV>\n", temp_host->address);
+				printf("<DIV CLASS='data'>%s, %s</DIV>\n", temp_host->address, temp_host->address6);
 				}
 			if(display_type == DISPLAY_SERVICE_INFO) {
 				printf("<DIV CLASS='data'>Service</DIV><DIV CLASS='dataTitle'>%s</DIV><DIV CLASS='data'>On Host</DIV>\n", service_desc);
@@ -371,7 +374,7 @@
 					printf("No servicegroups.");
 				printf("</DIV><BR>\n");

-				printf("<DIV CLASS='data'>%s</DIV>\n", temp_host->address);
+				printf("<DIV CLASS='data'>%s, %s</DIV>\n", temp_host->address, temp_host->address6);
 				}
 			if(display_type == DISPLAY_HOSTGROUP_INFO) {
 				printf("<DIV CLASS='data'>Hostgroup</DIV>\n");
diff -ru nagios/cgi/status.c nagios-new/cgi/status.c
--- nagios/cgi/status.c	2013-03-09 21:46:35.000000000 +0000
+++ nagios-new/cgi/status.c	2013-06-09 18:53:24.000000000 +0000
@@ -285,6 +285,11 @@
 						host_name = strdup(temp_host->name);
 						break;
 						}
+					if(!strcmp(host_name,temp_host->address6)){
+						free(host_name);
+						host_name=strdup(temp_host->name);
+						break;
+						}
 					}
 				if(temp_host == NULL) {
 					for(temp_host = host_list; temp_host != NULL; temp_host = temp_host->next) {
@@ -1707,7 +1712,7 @@
 				printf("<td align='left'>\n");
 				printf("<table border=0 cellpadding=0 cellspacing=0>\n");
 				printf("<tr>\n");
-				printf("<td align=left valign=center class='status%s'><a href='%s?type=%d&host=%s' title='%s'>%s</a></td>\n", host_status_bg_class, EXTINFO_CGI, DISPLAY_HOST_INFO, url_encode(temp_status->host_name), temp_host->address, temp_status->host_name);
+				printf("<td align=left valign=center class='status%s'><a href='%s?type=%d&host=%s' title='%s %s'>%s</a></td>\n", host_status_bg_class, EXTINFO_CGI, DISPLAY_HOST_INFO, url_encode(temp_status->host_name), temp_host->address, temp_host->address6, temp_status->host_name);
 				printf("</tr>\n");
 				printf("</table>\n");
 				printf("</td>\n");
@@ -2240,7 +2245,7 @@
 			printf("<td align='left'>\n");
 			printf("<table border=0 cellpadding=0 cellspacing=0>\n");
 			printf("<tr>\n");
-			printf("<td align=left valign=center class='status%s'><a href='%s?type=%d&host=%s' title='%s'>%s</a> </td>\n", status_class, EXTINFO_CGI, DISPLAY_HOST_INFO, url_encode(temp_status->host_name), temp_host->address, temp_status->host_name);
+			printf("<td align=left valign=center class='status%s'><a href='%s?type=%d&host=%s' title='%s %s'>%s</a> </td>\n", status_class, EXTINFO_CGI, DISPLAY_HOST_INFO, url_encode(temp_status->host_name), temp_host->address, temp_host->address6, temp_status->host_name);
 			printf("</tr>\n");
 			printf("</table>\n");
 			printf("</td>\n");
@@ -3678,7 +3683,7 @@

 	printf("<table border=0 WIDTH=100%% cellpadding=0 cellspacing=0>\n");
 	printf("<tr class='status%s'>\n", status_bg_class);
-	printf("<td class='status%s'><a href='%s?host=%s&style=detail' title='%s'>%s</a></td>\n", status_bg_class, STATUS_CGI, url_encode(hststatus->host_name), temp_host->address, hststatus->host_name);
+	printf("<td class='status%s'><a href='%s?host=%s&style=detail' title='%s %s'>%s</a></td>\n", status_bg_class, STATUS_CGI, url_encode(hststatus->host_name), temp_host->address, temp_host->address6, hststatus->host_name);

 	if(temp_host->icon_image != NULL) {
 		printf("<td class='status%s' WIDTH=5></td>\n", status_bg_class);
diff -ru nagios/cgi/statusmap.c nagios-new/cgi/statusmap.c
--- nagios/cgi/statusmap.c	2011-08-17 07:36:27.000000000 +0000
+++ nagios-new/cgi/statusmap.c	2013-06-09 18:35:54.000000000 +0000
@@ -1955,6 +1955,7 @@
 	printf("<tr><td class=\\\"popupText\\\">Name:</td><td class=\\\"popupText\\\"><b>%s</b></td></tr>", escape_string(hst->name));
 	printf("<tr><td class=\\\"popupText\\\">Alias:</td><td class=\\\"popupText\\\"><b>%s</b></td></tr>", escape_string(hst->alias));
 	printf("<tr><td class=\\\"popupText\\\">Address:</td><td class=\\\"popupText\\\"><b>%s</b></td></tr>", html_encode(hst->address, TRUE));
+	printf("<tr><td class=\\\"popupText\\\">Address6:</td><td class=\\\"popupText\\\"><b>%s</b></td></tr>", html_encode(hst->address6, TRUE));
 	printf("<tr><td class=\\\"popupText\\\">State:</td><td class=\\\"popupText\\\"><b>");

 	/* get the status of the host (pending, up, down, or unreachable) */
diff -ru nagios/common/macros.c nagios-new/common/macros.c
--- nagios/common/macros.c	2013-03-09 21:46:35.000000000 +0000
+++ nagios-new/common/macros.c	2013-06-09 18:42:33.000000000 +0000
@@ -679,6 +679,7 @@
 		case MACRO_HOSTNAME:
 		case MACRO_HOSTALIAS:
 		case MACRO_HOSTADDRESS:
+		case MACRO_HOSTADDRESS6:
 		case MACRO_LASTHOSTCHECK:
 		case MACRO_LASTHOSTSTATECHANGE:
 		case MACRO_HOSTOUTPUT:
@@ -1616,6 +1617,9 @@
 		case MACRO_HOSTADDRESS:
 			*output = (char *)strdup(temp_host->address);
 			break;
+		case MACRO_HOSTADDRESS6:
+			*output = (char *)strdup(temp_host->address6);
+			break;
 #ifdef NSCORE
 		case MACRO_HOSTSTATE:
 			if(temp_host->current_state == HOST_DOWN)
@@ -2616,6 +2620,7 @@
 	add_macrox_name(HOSTNAME);
 	add_macrox_name(HOSTALIAS);
 	add_macrox_name(HOSTADDRESS);
+	add_macrox_name(HOSTADDRESS6);
 	add_macrox_name(SERVICEDESC);
 	add_macrox_name(SERVICESTATE);
 	add_macrox_name(SERVICESTATEID);
@@ -2988,6 +2993,7 @@
 	my_free(mac->x[MACRO_HOSTDISPLAYNAME]);
 	my_free(mac->x[MACRO_HOSTALIAS]);
 	my_free(mac->x[MACRO_HOSTADDRESS]);
+	my_free(mac->x[MACRO_HOSTADDRESS6]);
 	my_free(mac->x[MACRO_HOSTOUTPUT]);
 	my_free(mac->x[MACRO_LONGHOSTOUTPUT]);
 	my_free(mac->x[MACRO_HOSTPERFDATA]);
diff -ru nagios/common/objects.c nagios-new/common/objects.c
--- nagios/common/objects.c	2011-08-24 12:15:21.000000000 +0000
+++ nagios-new/common/objects.c	2013-06-09 18:35:54.000000000 +0000
@@ -593,7 +593,7 @@


 /* add a new host definition */
-host *add_host(char *name, char *display_name, char *alias, char *address, char *check_period, int initial_state, double check_interval, double retry_interval, int max_attempts, int notify_up, int notify_down, int notify_unreachable, int notify_flapping, int notify_downtime, double notification_interval, double first_notification_delay, char *notification_period, int notifications_enabled, char *check_command, int checks_enabled, int accept_passive_checks, char *event_handler, int event_handler_enabled, int flap_detection_enabled, double low_flap_threshold, double high_flap_threshold, int flap_detection_on_up, int flap_detection_on_down, int flap_detection_on_unreachable, int stalk_on_up, int stalk_on_down, int stalk_on_unreachable, int process_perfdata, int failure_prediction_enabled, char *failure_prediction_options, int check_freshness, int freshness_threshold, char *notes, char *notes_url, char *action_url, char *icon_image, char *icon_image_alt, char *vrml_image, char *statusmap_image, int x_2d, int y_2d, int have_2d_coords, double x_3d, double y_3d, double z_3d, int have_3d_coords, int should_be_drawn, int retain_status_information, int retain_nonstatus_information, int obsess_over_host) {
+host *add_host(char *name, char *display_name, char *alias, char *address, char *address6, char *check_period, int initial_state, double check_interval, double retry_interval, int max_attempts, int notify_up, int notify_down, int notify_unreachable, int notify_flapping, int notify_downtime, double notification_interval, double first_notification_delay, char *notification_period, int notifications_enabled, char *check_command, int checks_enabled, int accept_passive_checks, char *event_handler, int event_handler_enabled, int flap_detection_enabled, double low_flap_threshold, double high_flap_threshold, int flap_detection_on_up, int flap_detection_on_down, int flap_detection_on_unreachable, int stalk_on_up, int stalk_on_down, int stalk_on_unreachable, int process_perfdata, int failure_prediction_enabled, char *failure_prediction_options, int check_freshness, int freshness_threshold, char *notes, char *notes_url, char *action_url, char *icon_image, char *icon_image_alt, char *vrml_image, char *statusmap_image, int x_2d, int y_2d, int have_2d_coords, double x_3d, double y_3d, double z_3d, int have_3d_coords, int should_be_drawn, int retain_status_information, int retain_nonstatus_information, int obsess_over_host) {
 	host *new_host = NULL;
 	int result = OK;
 #ifdef NSCORE
@@ -641,6 +641,8 @@
 		result = ERROR;
 	if((new_host->address = (char *)strdup(address)) == NULL)
 		result = ERROR;
+	if((new_host->address6 = (char *)strdup(address6)) == NULL)
+		result = ERROR;
 	if(check_period) {
 		if((new_host->check_period = (char *)strdup(check_period)) == NULL)
 			result = ERROR;
@@ -819,6 +821,7 @@
 		my_free(new_host->notification_period);
 		my_free(new_host->check_period);
 		my_free(new_host->address);
+		my_free(new_host->address6);
 		my_free(new_host->alias);
 		my_free(new_host->display_name);
 		my_free(new_host->name);
@@ -3266,6 +3269,7 @@
 		my_free(this_host->display_name);
 		my_free(this_host->alias);
 		my_free(this_host->address);
+		my_free(this_host->address6);
 #ifdef NSCORE
 		my_free(this_host->plugin_output);
 		my_free(this_host->long_plugin_output);
diff -ru nagios/include/macros.h nagios-new/include/macros.h
--- nagios/include/macros.h	2012-11-12 18:27:43.000000000 +0000
+++ nagios-new/include/macros.h	2013-06-09 18:35:54.000000000 +0000
@@ -40,7 +40,7 @@

 #define MAX_USER_MACROS				256	/* maximum number of $USERx$ macros */

-#define MACRO_X_COUNT				153	/* size of macro_x[] array */
+#define MACRO_X_COUNT				154	/* size of macro_x[] array */

 NAGIOS_BEGIN_DECL

@@ -216,6 +216,7 @@
 #define MACRO_LASTHOSTSTATEID                   150
 #define MACRO_LASTSERVICESTATE                  151
 #define MACRO_LASTSERVICESTATEID                152
+#define MACRO_HOSTADDRESS6                      153



diff -ru nagios/include/objects.h nagios-new/include/objects.h
--- nagios/include/objects.h	2012-11-12 18:27:43.000000000 +0000
+++ nagios-new/include/objects.h	2013-06-09 18:46:29.000000000 +0000
@@ -266,6 +266,7 @@
 	char    *display_name;
 	char	*alias;
 	char    *address;
+	char	*address6;
 	hostsmember *parent_hosts;
 	hostsmember *child_hosts;
 	servicesmember *services;
@@ -634,7 +635,7 @@
 commandsmember *add_service_notification_command_to_contact(contact *, char *);				/* adds a service notification command to a contact definition */
 commandsmember *add_host_notification_command_to_contact(contact *, char *);				/* adds a host notification command to a contact definition */
 customvariablesmember *add_custom_variable_to_contact(contact *, char *, char *);                       /* adds a custom variable to a service definition */
-host *add_host(char *, char *, char *, char *, char *, int, double, double, int, int, int, int, int, int, double, double, char *, int, char *, int, int, char *, int, int, double, double, int, int, int, int, int, int, int, int, char *, int, int, char *, char *, char *, char *, char *, char *, char *, int, int, int, double, double, double, int, int, int, int, int);	/* adds a host definition */
+host *add_host(char *, char *, char *, char *, char *, char *, int, double, double, int, int, int, int, int, int, double, double, char *, int, char *, int, int, char *, int, int, double, double, int, int, int, int, int, int, int, int, char *, int, int, char *, char *, char *, char *, char *, char *, char *, int, int, int, double, double, double, int, int, int, int, int);	/* adds a host definition */
 hostsmember *add_parent_host_to_host(host *, char *);							/* adds a parent host to a host definition */
 hostsmember *add_child_link_to_host(host *, host *);						       /* adds a child host to a host definition */
 contactgroupsmember *add_contactgroup_to_host(host *, char *);					       /* adds a contactgroup to a host definition */
diff -ru nagios/t-tap/test_checks.c nagios-new/t-tap/test_checks.c
--- nagios/t-tap/test_checks.c	2011-07-26 00:16:12.000000000 +0000
+++ nagios-new/t-tap/test_checks.c	2013-06-09 18:35:54.000000000 +0000
@@ -102,6 +102,7 @@
 	host1 = (host *)calloc(1, sizeof(host));
 	host1->name = strdup("Host1");
 	host1->address = strdup("127.0.0.1");
+	host1->address6 = strdup("::1");
 	host1->retry_interval = 1;
 	host1->check_interval = 5;
 	host1->check_options = 0;
diff -ru nagios/t-tap/test_events.c nagios-new/t-tap/test_events.c
--- nagios/t-tap/test_events.c	2011-07-26 00:16:12.000000000 +0000
+++ nagios-new/t-tap/test_events.c	2013-06-09 18:35:54.000000000 +0000
@@ -231,6 +231,7 @@
 		host1 = (host *)malloc(sizeof(host));
 	host1->name = strdup("Host1");
 	host1->address = strdup("127.0.0.1");
+	host1->address6 = strdup("::1");
 	host1->retry_interval = 1;
 	host1->check_interval = 5;
 	host1->check_options = 0;
diff -ru nagios/xdata/xodtemplate.c nagios-new/xdata/xodtemplate.c
--- nagios/xdata/xodtemplate.c	2013-03-09 21:46:35.000000000 +0000
+++ nagios-new/xdata/xodtemplate.c	2013-06-09 18:35:54.000000000 +0000
@@ -2369,6 +2369,10 @@
 				if((temp_host->address = (char *)strdup(value)) == NULL)
 					result = ERROR;
 				}
+			else if(!strcmp(variable, "address6")) {
+				if((temp_host->address6 = (char *)strdup(value)) == NULL)
+					result = ERROR;
+				}
 			else if(!strcmp(variable, "parents")) {
 				if(strcmp(value, XODTEMPLATE_NULL)) {
 					if((temp_host->parents = (char *)strdup(value)) == NULL)
@@ -6703,6 +6707,8 @@
 			this_host->alias = (char *)strdup(template_host->alias);
 		if(this_host->address == NULL && template_host->address != NULL)
 			this_host->address = (char *)strdup(template_host->address);
+		if(this_host->address6 == NULL && template_host->address6 != NULL)
+			this_host->address6 = (char *)strdup(template_host->address6);

 		xodtemplate_get_inherited_string(&template_host->have_parents, &template_host->parents, &this_host->have_parents, &this_host->parents);
 		xodtemplate_get_inherited_string(&template_host->have_host_groups, &template_host->host_groups, &this_host->have_host_groups, &this_host->host_groups);
@@ -8976,9 +8982,11 @@
 		this_host->alias = (char *)strdup(this_host->host_name);
 	if(this_host->address == NULL && this_host->host_name != NULL)
 		this_host->address = (char *)strdup(this_host->host_name);
+	if(this_host->address6 == NULL && this_host->host_name != NULL)
+		this_host->address6 = (char *)strdup(this_host->host_name);

 	/* add the host definition */
-	new_host = add_host(this_host->host_name, this_host->display_name, this_host->alias, (this_host->address == NULL) ? this_host->host_name : this_host->address, this_host->check_period, this_host->initial_state, this_host->check_interval, this_host->retry_interval, this_host->max_check_attempts, this_host->notify_on_recovery, this_host->notify_on_down, this_host->notify_on_unreachable, this_host->notify_on_flapping, this_host->notify_on_downtime, this_host->notification_interval, this_host->first_notification_delay, this_host->notification_period, this_host->notifications_enabled, this_host->check_command, this_host->active_checks_enabled, this_host->passive_checks_enabled, this_host->event_handler, this_host->event_handler_enabled, this_host->flap_detection_enabled, this_host->low_flap_threshold, this_host->high_flap_threshold, this_host->flap_detection_on_up, this_host->flap_detection_on_down, this_host->flap_detection_on_unreachable, this_host->stalk_on_up, this_host->stalk_on_down, this_host->stalk_on_unreachable, this_host->process_perf_data, this_host->failure_prediction_enabled, this_host->failure_prediction_options, this_host->check_freshness, this_host->freshness_threshold, this_host->notes, this_host->notes_url, this_host->action_url, this_host->icon_image, this_host->icon_image_alt, this_host->vrml_image, this_host->statusmap_image, this_host->x_2d, this_host->y_2d, this_host->have_2d_coords, this_host->x_3d, this_host->y_3d, this_host->z_3d, this_host->have_3d_coords, TRUE, this_host->retain_status_information, this_host->retain_nonstatus_information, this_host->obsess_over_host);
+	new_host = add_host(this_host->host_name, this_host->display_name, this_host->alias, (this_host->address == NULL) ? this_host->host_name : this_host->address, (this_host->address6 == NULL) ? this_host->host_name : this_host->address6, this_host->check_period, this_host->initial_state, this_host->check_interval, this_host->retry_interval, this_host->max_check_attempts, this_host->notify_on_recovery, this_host->notify_on_down, this_host->notify_on_unreachable, this_host->notify_on_flapping, this_host->notify_on_downtime, this_host->notification_interval, this_host->first_notification_delay, this_host->notification_period, this_host->notifications_enabled, this_host->check_command, this_host->active_checks_enabled, this_host->passive_checks_enabled, this_host->event_handler, this_host->event_handler_enabled, this_host->flap_detection_enabled, this_host->low_flap_threshold, this_host->high_flap_threshold, this_host->flap_detection_on_up, this_host->flap_detection_on_down, this_host->flap_detection_on_unreachable, this_host->stalk_on_up, this_host->stalk_on_down, this_host->stalk_on_unreachable, this_host->process_perf_data, this_host->failure_prediction_enabled, this_host->failure_prediction_options, this_host->check_freshness, this_host->freshness_threshold, this_host->notes, this_host->notes_url, this_host->action_url, this_host->icon_image, this_host->icon_image_alt, this_host->vrml_image, this_host->statusmap_image, this_host->x_2d, this_host->y_2d, this_host->have_2d_coords, this_host->x_3d, this_host->y_3d, this_host->z_3d, this_host->have_3d_coords, TRUE, this_host->retain_status_information, this_host->retain_nonstatus_information, this_host->obsess_over_host);


 	/* return with an error if we couldn't add the host */
@@ -10370,6 +10378,8 @@
 			fprintf(fp, "\talias\t%s\n", temp_host->alias);
 		if(temp_host->address)
 			fprintf(fp, "\taddress\t%s\n", temp_host->address);
+		if(temp_host->address6)
+			fprintf(fp, "\taddress6\t%s\n", temp_host->address6);
 		if(temp_host->parents)
 			fprintf(fp, "\tparents\t%s\n", temp_host->parents);
 		if(temp_host->check_period)
@@ -11574,6 +11584,7 @@
 		my_free(this_host->alias);
 		my_free(this_host->display_name);
 		my_free(this_host->address);
+		my_free(this_host->address6);
 		my_free(this_host->parents);
 		my_free(this_host->host_groups);
 		my_free(this_host->check_command);
diff -ru nagios/xdata/xodtemplate.h nagios-new/xdata/xodtemplate.h
--- nagios/xdata/xodtemplate.h	2011-07-26 00:16:12.000000000 +0000
+++ nagios-new/xdata/xodtemplate.h	2013-06-09 18:35:54.000000000 +0000
@@ -229,6 +229,7 @@
     char      *display_name;
     char      *alias;
     char      *address;
+    char      *address6;
     char      *parents;
     char      *host_groups;
     char      *check_command;

Re: Warning: Return code of 127 for check of service...

Posted: Tue Jan 24, 2017 4:30 pm
by vext
The thing that is baffling is that from the debug logs in previous posts the Nagios host seems to be doing the EXACT same thing for both checks.
Both debug logs were from the same Nagios host, just forcing a check of two different services against the same client in succession, no restarts or anything.

So:
Same Nagios host
Same Client host
Same configuration of service exactly, with the only difference the name (service_description)

Fearing some corruption I couldn't determine, and since we build all of our hosts with Puppet, I even went drastic measures and created a new host. Entirely new hardware but running the same configuration, and the issue persisted.

Re: Warning: Return code of 127 for check of service...

Posted: Tue Jan 24, 2017 5:45 pm
by dwhitfield
Section 1 of https://assets.nagios.com/downloads/nag ... utions.pdf deals with your issue, so you'll want to look through that. Some of the stuff is XI-related, so you can just ignore those parts. XI is Core + NDOUtils plus some glue to make it easy, but all the basic check stuff is the same.

Please post your nrpe.cfg from both the working and the non-working, probably located in /usr/local/nagios/etc/nrpe.cfg.

Re: Warning: Return code of 127 for check of service...

Posted: Tue Jan 24, 2017 5:52 pm
by vext
Here is the nrpe config you requested

Code: Select all

command[check_nrpe_alive]=/usr/lib/nagios/plugins/check_nrpe_alive.sh
Note that both the working and non-working service I configured to use the same check.

(Looking at that section of docs now)

Re: Warning: Return code of 127 for check of service...

Posted: Wed Jan 25, 2017 1:23 pm
by dwhitfield
If the NRPE document didn't help, you could run the following:

1. find / -xdev | sort > workingserver.txt on the working server.
2. find / -xdev | sort > failingserver.txt on the failing server.
3. move things appropriately (or change step four appropriately)
4. diff -daU 0 workingserver.txt failingserver.txt | grep -vE '^(@@|\+\+\+|---)'


There's got to be a difference, or there's some piece of malfunctioning hardware.

Re: Warning: Return code of 127 for check of service...

Posted: Wed Jan 25, 2017 3:57 pm
by vext
I feel like I'm not being clear, and I think it is due to overload of logging.
My working and failing services are on the same server.
I will narrow the scope.
Given exactly 1 Nagios instance.
On exactly 1 client host.
There are 2 services.
They are exactly the same except service_description

From the debug logging, here is the final output of Nagios' macro generating what it will run (on the nagios host) to execute this active check for the service that succeeds:
[1485276474.960246] [2048.1] [pid=17635] Done. Final output: '/usr/local/nagios/libexec/check_nrpe -t 60 -H 10.7.64.29 -c check_nrpe_alive'

Same from the service that fails:
[1485275694.053373] [2048.1] [pid=17635] Done. Final output: '/usr/local/nagios/libexec/check_nrpe -t 60 -H 10.7.64.29 -c check_nrpe_alive'

This is from the same Nagios instance, same PID of the master (wasn't even restarted).

From the objects.cache you can see that they have no setting differences (I just took the define service statements for each):

Code: Select all

mlariz@C02MH0Q8FD59 ~/tmp$ diff NRPE PUPPET_STATUS
3c3
<   service_description NRPE
---
>   service_description Puppet Status