Production server wproc errors returned
Posted: Tue Jun 11, 2019 11:30 am
Yesterday, our production server started reporting high load averages, climbing as high as 88. I had run Apply Configuration after deleting a device, and it kept running without ever completing. The Nagios XI web interface was also slow to respond.
The first entry in the event log is:
Runtime Error 2019-06-10 14:57:11 wproc: GLOBAL SERVICE EVENTHANDLER job 19748 from worker Core Worker 15277 is a non-check helper but exited with return code 1
Service Critical 2019-06-10 14:57:11 SERVICE ALERT: KEN-INTERNET-SW1;Check Network latency and packet loss;CRITICAL;HARD;5;CRITICAL - 64.212.90.4: rta nan, lost 100%
The event log showed these other errors:
Runtime Error 2019-06-10 15:23:41 wproc: stdout line 01: UNABLE TO CONNECT TO DB - EXITING!
Runtime Error 2019-06-10 15:23:41 wproc: stderr line 02: using dumb terminal settings.
Runtime Error 2019-06-10 15:23:41 wproc: stderr line 01: No entry for terminal type "unknown";
Runtime Error 2019-06-10 15:23:41 wproc: early_timeout=0; exited_ok=1; wait_status=256; error_code=0;
Runtime Error 2019-06-10 15:23:41 wproc: GLOBAL SERVICE EVENTHANDLER job 22510 from worker Core Worker 15257 is a non-check helper but exited with return code 1
Service Recovery 2019-06-10 15:23:41 SERVICE ALERT: WEY-97LIBBEY-RTR1;Check Network latency and packet loss;OK;SOFT;2;OK - 172.20.254.150: rta 5.736ms, lost 0%
Runtime Error 2019-06-10 15:23:30 wproc: stdout line 01: UNABLE TO CONNECT TO DB - EXITING!
Runtime Error 2019-06-10 15:23:30 wproc: stderr line 02: using dumb terminal settings.
Runtime Error 2019-06-10 15:23:30 wproc: stderr line 01: No entry for terminal type "unknown";
Runtime Error 2019-06-10 15:23:30 wproc: early_timeout=0; exited_ok=1; wait_status=256; error_code=0;
Runtime Error 2019-06-10 15:23:30 wproc: GLOBAL SERVICE EVENTHANDLER job 22491 from worker Core Worker 15253 is a non-check helper but exited with return code 1
Service Recovery 2019-06-10 15:23:29 SERVICE ALERT: CHE-228BILLER-SW31;Check Network latency and packet loss;OK;SOFT;2;OK - 172.22.75.31: rta 6.563ms, lost 0%
Runtime Error 2019-06-10 15:23:20 wproc: stdout line 01: UNABLE TO CONNECT TO DB - EXITING!
Runtime Error 2019-06-10 15:23:20 wproc: stderr line 02: using dumb terminal settings.
Runtime Error 2019-06-10 15:23:20 wproc: stderr line 01: No entry for terminal type "unknown";
Runtime Error 2019-06-10 15:23:20 wproc: early_timeout=0; exited_ok=1; wait_status=256; error_code=0;
Runtime Error 2019-06-10 15:23:20 wproc: GLOBAL SERVICE EVENTHANDLER job 22473 from worker Core Worker 15268 is a non-check helper but exited with return code 1
Service Unknown 2019-06-10 15:23:20 SERVICE ALERT: NOR-1177PROVI-SW31;Interface Table Status - edge network devices;UNKNOWN;SOFT;2;UNKNOWN - Plugin timed out (15s).
Runtime Error 2019-06-10 15:23:16 wproc: stdout line 01: UNABLE TO CONNECT TO DB - EXITING!
Runtime Error 2019-06-10 15:23:16 wproc: stderr line 02: using dumb terminal settings.
Runtime Error 2019-06-10 15:23:16 wproc: stderr line 01: No entry for terminal type "unknown";
Runtime Error 2019-06-10 15:23:16 wproc: early_timeout=0; exited_ok=1; wait_status=256; error_code=0;
Runtime Error 2019-06-10 15:23:16 wproc: GLOBAL SERVICE EVENTHANDLER job 22465 from worker Core Worker 15279 is a non-check helper but exited with return code 1
Service Recovery 2019-06-10 15:23:16 SERVICE ALERT: BTR-111GROSSM-SW26;Check Network latency and packet loss;OK;SOFT;2;OK - 172.22.60.26: rta 3.246ms, lost 0%
Runtime Error 2019-06-10 15:23:14 wproc: stdout line 01: UNABLE TO CONNECT TO DB - EXITING!
Service Notification 2019-06-10 15:16:01 SERVICE NOTIFICATION: pkarr;localhost;Nagios XI - Jobs;CRITICAL;xi_service_notification_handler;Error: Could not parse XML from http://localhost/nagiosxi ()
Service Notification 2019-06-10 15:16:01 SERVICE NOTIFICATION: gwhite;localhost;Nagios XI - Jobs;CRITICAL;xi_service_notification_handler;Error: Could not parse XML from http://localhost/nagiosxi ()
Service Notification 2019-06-10 15:16:01 SERVICE NOTIFICATION: bhankers;localhost;Nagios XI - Jobs;CRITICAL;xi_service_notification_handler;Error: Could not parse XML from http://localhost/nagiosxi ()
Runtime Error 2019-06-10 15:05:56 wproc: host=localhost; service=Nagios XI - Jobs; contact=pkarr
Runtime Error 2019-06-10 15:05:56 wproc: NOTIFY job 20661 from worker Core Worker 15272 is a non-check helper but exited with return code 1
Runtime Error 2019-06-10 15:05:56 wproc: stdout line 01: UNABLE TO CONNECT TO DB - EXITING!
Runtime Error 2019-06-10 15:05:56 wproc: stderr line 02: using dumb terminal settings.
Runtime Error 2019-06-10 15:05:56 wproc: stderr line 01: No entry for terminal type "unknown";
Runtime Error 2019-06-10 15:05:56 wproc: early_timeout=0; exited_ok=1; wait_status=256; error_code=0;
Runtime Error 2019-06-10 15:05:56 wproc: host=localhost; service=Nagios XI - Jobs; contact=bhankers
Runtime Error 2019-06-10 15:05:56 wproc: NOTIFY job 20661 from worker Core Worker 15265 is a non-check helper but exited with return code 1
Service Notification 2019-06-10 15:05:56 SERVICE NOTIFICATION: pkarr;localhost;Nagios XI - Jobs;WARNING;xi_service_notification_handler;Nonstop Operations Manager (nom) stale (5035 seconds old), Nonstop Operations Manager (nom) stale (5035 seconds old), Cleaner (cleaner) stale (474 seconds old)
Service Notification 2019-06-10 15:05:56 SERVICE NOTIFICATION: gwhite;localhost;Nagios XI - Jobs;WARNING;xi_service_notification_handler;Nonstop Operations Manager (nom) stale (5035 seconds old), Nonstop Operations Manager (nom) stale (5035 seconds old), Cleaner (cleaner) stale (474 seconds old)
Service Notification 2019-06-10 15:05:56 SERVICE NOTIFICATION: bhankers;localhost;Nagios XI - Jobs;WARNING;xi_service_notification_handler;Nonstop Operations Manager (nom) stale (5035 seconds old), Nonstop Operations Manager (nom) stale (5035 seconds old), Cleaner (cleaner) stale (474 seconds ol
Then this morning we had several more errors, and it has been quiet since 8:40.
Information 2019-06-11 08:40:48 wproc: Core Worker 1324: job 98998 (pid=14148): Dormant child reaped
Runtime Error 2019-06-11 08:40:48 wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Runtime Error 2019-06-11 08:40:48 wproc: host=DRBOTTOMsw2; service=Check fan status on a cisco router or switch;
Runtime Error 2019-06-11 08:40:48 wproc: CHECK job 98998 from worker Core Worker 1324 timed out after 60.01s
Information 2019-06-11 08:40:48 wproc: Core Worker 1324: job 98998 (pid=14148) timed out. Killing it
Runtime Error 2019-06-11 03:15:58 wproc: stdout line 01: OK - No valid historical dataset... [details]
Runtime Error 2019-06-11 03:15:58 wproc: early_timeout=0; exited_ok=1; wait_status=14; error_code=0;
Runtime Error 2019-06-11 03:15:58 wproc: host=NOR-1177PROVI-SW41; service=Interface Table Status - edge network devices;
Runtime Error 2019-06-11 03:15:58 wproc: CHECK job 64635 from worker Core Worker 1318 died by signal 14 after 15.17 seconds
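A note on reading the wait_status values in the wproc lines above. This is a minimal sketch, assuming wproc logs the raw status returned by the kernel's wait call and that the conventional Linux layout applies (low 7 bits hold the terminating signal, the next byte holds the exit code). The function name decode_wait_status is hypothetical and is not part of Nagios. It ignores stopped processes and the core-dump flag.

```python
import signal


def decode_wait_status(status: int) -> str:
    """Decode a raw wait status, assuming the conventional Linux bit layout."""
    sig = status & 0x7F  # low 7 bits: terminating signal, 0 if the process exited
    if sig == 0:
        code = (status >> 8) & 0xFF  # next byte: the process exit code
        return f"exited with return code {code}"
    try:
        name = signal.Signals(sig).name  # signal name on this platform
    except ValueError:
        name = "unknown signal"
    return f"killed by signal {sig} ({name})"


# The two wait_status values that appear in the event log above.
for raw in (256, 14):
    print(raw, "->", decode_wait_status(raw))
```

Under those assumptions, wait_status=256 agrees with the "exited with return code 1" lines, and wait_status=14 agrees with the "died by signal 14" line; on Linux, signal 14 is SIGALRM, which would fit a plugin hitting its own 15 second timeout.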