We also got a page last night with the alerts below, which suggests the issue might be getting worse. Here's the flow of alerts and the timeline. We get warning emails pretty often, but this is the critical that caused a page, meaning it was critical for >20 minutes.
In checking my emails, I see ~100 alerts on Nagios_Remote_Jobs since 7/7, if that gives any perspective on how often it alerts. The service has 4 retries and a 15 minute notification delay.
2021-07-14 07:15:07 SERVICE ALERT: txslm2mlnag001;Nagios_Remote_Jobs;OK;HARD;4;All jobs are running okay.
Service Critical 2021-07-14 07:13:44 SERVICE ALERT: txslm2mlnag001;Nagios_Remote_Jobs;CRITICAL;HARD;4;Error: Could not parse JSON from https://10.133.134.84/nagiosxi/ (false
Service Notification 2021-07-14 07:13:44 SERVICE NOTIFICATION: 1vzw.net.cdsp-sms;txslm2mlnag001;Nagios_Remote_Jobs;CRITICAL;xi_service_notification_handler;Error: Could not parse JSON from https://10.133.134.84/nagiosxi/ (false
Service Notification 2021-07-14 07:13:44 SERVICE NOTIFICATION: 1vzw.net.cdsp-mail;txslm2mlnag001;Nagios_Remote_Jobs;CRITICAL;xi_service_notification_handler;Error: Could not parse JSON from https://10.133.134.84/nagiosxi/ (false
Service Notification 2021-07-14 07:03:50 SERVICE NOTIFICATION: 1vzw.net.cdsp-mail;txslm2mlnag001;Nagios_Remote_Jobs;WARNING;xi_service_notification_handler;Database Maintenance (dbmaint) stale (528 seconds old)
Service Notification 2021-07-14 06:58:49 SERVICE NOTIFICATION: 1vzw.net.cdsp-mail;txslm2mlnag002;Nagios_Remote_Jobs;WARNING;xi_service_notification_handler;Database Maintenance (dbmaint) stale (523 seconds old)
Service Warning 2021-07-14 06:44:05 SERVICE ALERT: txslm2mlnag001;Nagios_Remote_Jobs;WARNING;HARD;4;Database Maintenance (dbmaint) stale (543 seconds old)
Service Warning 2021-07-14 06:43:05 SERVICE ALERT: txslm2mlnag001;Nagios_Remote_Jobs;WARNING;SOFT;3;Database Maintenance (dbmaint) stale (483 seconds old)
Service Warning 2021-07-14 06:42:04 SERVICE ALERT: txslm2mlnag001;Nagios_Remote_Jobs;WARNING;SOFT;2;Database Maintenance (dbmaint) stale (422 seconds old)
Service Warning 2021-07-14 06:41:03 SERVICE ALERT: txslm2mlnag001;Nagios_Remote_Jobs;WARNING;SOFT;1;Database Maintenance (dbmaint) stale (361 seconds old)
Service Recovery 2021-07-14 06:40:03 SERVICE ALERT: txslm2mlnag001;Nagios_Remote_Jobs;OK;SOFT;2;All jobs are running okay.
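To make the timeline easier to read, here is a minimal Python sketch that parses a few of the event log lines above and measures the gaps between them. The regular expression, the `parse` and `first` helpers, and the choice of which lines to include are my own assumptions for illustration; this is not how Nagios itself computes notification timing.

```python
import re
from datetime import datetime

# A subset of the event log lines above for txslm2mlnag001, oldest first.
LOG_LINES = [
    "2021-07-14 06:41:03 SERVICE ALERT: txslm2mlnag001;Nagios_Remote_Jobs;WARNING;SOFT;1;Database Maintenance (dbmaint) stale (361 seconds old)",
    "2021-07-14 06:44:05 SERVICE ALERT: txslm2mlnag001;Nagios_Remote_Jobs;WARNING;HARD;4;Database Maintenance (dbmaint) stale (543 seconds old)",
    "2021-07-14 07:03:50 SERVICE NOTIFICATION: 1vzw.net.cdsp-mail;txslm2mlnag001;Nagios_Remote_Jobs;WARNING;xi_service_notification_handler;Database Maintenance (dbmaint) stale (528 seconds old)",
    "2021-07-14 07:13:44 SERVICE NOTIFICATION: 1vzw.net.cdsp-sms;txslm2mlnag001;Nagios_Remote_Jobs;CRITICAL;xi_service_notification_handler;Error: Could not parse JSON from https://10.133.134.84/nagiosxi/ (false",
    "2021-07-14 07:15:07 SERVICE ALERT: txslm2mlnag001;Nagios_Remote_Jobs;OK;HARD;4;All jobs are running okay.",
]

# search() is used so a leading label such as "Service Warning " is tolerated.
LINE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) SERVICE (ALERT|NOTIFICATION): (.*)"
)


def parse(line):
    """Return a dict for one log line, or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    kind = match.group(2)
    fields = match.group(3).split(";")
    if kind == "ALERT":
        # host;service;state;state_type;attempt;output
        host, _service, state = fields[0], fields[1], fields[2]
    else:
        # contact;host;service;state;command;output
        host, _service, state = fields[1], fields[2], fields[3]
    return {"time": when, "kind": kind, "host": host, "state": state}


def first(events, kind, state):
    """Timestamp of the earliest event with the given kind and state."""
    return next(e["time"] for e in events if e["kind"] == kind and e["state"] == state)


events = sorted(filter(None, map(parse, LOG_LINES)), key=lambda e: e["time"])

first_warning = first(events, "ALERT", "WARNING")
critical_page = first(events, "NOTIFICATION", "CRITICAL")
recovery = first(events, "ALERT", "OK")

warning_to_page = int((critical_page - first_warning).total_seconds())
page_to_recovery = int((recovery - critical_page).total_seconds())

print(f"first WARNING to CRITICAL page: {warning_to_page} s")
print(f"CRITICAL page to recovery:      {page_to_recovery} s")
```

On these lines, the first WARNING (06:41:03) to the CRITICAL page (07:13:44) is 1961 seconds, and the page to the OK at 07:15:07 is 83 seconds.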
Here's the output from the remote 5.8.4 host.
ACCESSING URL: https://10.133.134.84/nagiosxi/api/v1/system/statusdetail?apikey=J8eTtlRoIYHGJdqtWU260TZsTG8N7GM6NZgAEnfjlVfk8J74D9pT2JFcl4fLJ07M
RESULT:
Array
(
[headers] => Array
(
[Date] => Wed, 14 Jul 2021 17:11:32 GMT
[Server] => Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips PHP/5.4.16
[X-Powered-By] => PHP/5.4.16
[Access-Control-Allow-Origin] => *
[Access-Control-Allow-Methods] => POST, GET, OPTIONS, DELETE, PUT
[Content-Length] => 2298
[Content-Type] => application/json
)
[body] => {"nom":{"last_check":"1626282661"},"cleaner":{"last_check":"1626282662"},"deadpool_reaper":{"last_check":"1626282662"},"iostat":{"updated":"2021-07-14 17:11:27","user":"10.35","nice":"0.00","system":"2.21","iowait":"0.00","steal":"0.00","idle":"87.44"},"sysstat":{"last_check":"1626282682"},"eventman":{"last_check":"1626282692"},"cmdsubsys":{"last_check":"1626282692"},"dbmaint":{"last_check":"1626282603"},"perfdataprocessor":{"last_check":"1626282691"},"dbbackend":{"last_checkin":"2020-09-24 07:53:43","bytes_processed":"14702208","entries_processed":"24861","connect_time":"2020-09-24 07:33:17","disconnect_time":"0000-00-00 00:00:00"},"load":{"updated":"2021-07-14 17:11:22","load1":"1.45","load5":"0.99","load15":"0.80"},"memory":{"updated":"2021-07-14 17:11:22","total":"64266","used":"1691","free":"493","shared":"3212","buffers":"62080","cached":"58842"},"swap":{"updated":"2021-07-14 17:11:22","total":"32767","used":"8","free":"32759"},"feedprocessor":{"last_check":"1626282682"},"reportengine":{"last_check":"1626282661"},"daemons":{"updated":"2021-07-14 17:11:22","daemon":[{"@attributes":{"id":"nagioscore"},"name":"nagios","output":" ??32405 \/usr\/bin\/perl -w \/usr\/local\/nagios\/libexec\/check_hp -H 2001:4888:a03:311f:c0:a:0:413 --timeout=45 --community=sp1der --exclude=cpqFcaHostCntlrStatus","return_code":"0","status":"0"},{"@attributes":{"id":"pnp"},"name":"npcd","output":" ??13922 \/usr\/local\/nagios\/bin\/npcd -d -f \/usr\/local\/nagios\/etc\/pnp\/npcd.cfg","return_code":"0","status":"0"}]},"nagioscore":{"updated":"2021-07-14 
17:11:22","activehostchecks":{"val1":"52","val5":"308","val15":"308"},"passivehostchecks":{"val1":"0","val5":"0","val15":"0"},"activeservicechecks":{"val1":"419","val5":"2096","val15":"2098"},"passiveservicechecks":{"val1":"1","val5":"1","val15":"1"},"activehostcheckperf":{"min_latency":"0","max_latency":"1.4704060554504395","avg_latency":"0.02912612356584495","min_execution_time":"0.0035369999999999998","max_execution_time":"4.120271","avg_execution_time":"3.960959032467532"},"activeservicecheckperf":{"min_latency":"0","max_latency":"1.472885012626648","avg_latency":"0.03401063694047463","min_execution_time":"0.0028250000000000003","max_execution_time":"13.892618","avg_execution_time":"0.6476608702471491"}}}
[info] => Array
(
[url] => https://10.133.134.84/nagiosxi/api/v1/system/statusdetail?apikey=J8eTtlRoIYHGJdqtWU260TZsTG8N7GM6NZgAEnfjlVfk8J74D9pT2JFcl4fLJ07M
[content_type] => application/json
[http_code] => 200
[header_size] => 311
[request_size] => 189
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.47397
[namelookup_time] => 6.8E-5
[connect_time] => 0.025447
[pretransfer_time] => 0.253035
[size_upload] => 0
[size_download] => 2298
[speed_download] => 4848
[speed_upload] => 0
[download_content_length] => 2298
[upload_content_length] => 0
[starttransfer_time] => 0.473957
[redirect_time] => 0
[certinfo] => Array
(
)
[primary_ip] => 10.133.134.84
[primary_port] => 443
[local_ip] => 10.136.243.84
[local_port] => 52232
[redirect_url] =>
)
)
XML DATA LOOKS OK
CHECKING JOB reportengine (Report Engine)
CHECKING JOB sysstat (System Statistics)
CHECKING JOB eventman (Event Manager)
CHECKING JOB feedprocessor (Feed Processor)
CHECKING JOB cmdsubsys (Command Subsystem)
CHECKING JOB nom (Nonstop Operations Manager)
CHECKING JOB dbmaint (Database Maintenance)
CHECKING JOB cleaner (Cleaner)
All jobs are running okay.
Here's the output from the local 5.7.3 host.
ACCESSING URL: https://10.133.134.84/nagiosxi/api/v1/system/statusdetail?apikey=J8eTtlRoIYHGJdqtWU260TZsTG8N7GM6NZgAEnfjlVfk8J74D9pT2JFcl4fLJ07M
RESULT:
Array
(
[headers] => Array
(
[Date] => Wed, 14 Jul 2021 17:10:49 GMT
[Server] => Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips PHP/5.4.16
[X-Powered-By] => PHP/5.4.16
[Access-Control-Allow-Origin] => *
[Access-Control-Allow-Methods] => POST, GET, OPTIONS, DELETE, PUT
[Content-Length] => 2261
[Content-Type] => application/json
)
[body] => {"nom":{"last_check":"1626282602"},"cleaner":{"last_check":"1626282603"},"deadpool_reaper":{"last_check":"1626282602"},"iostat":{"updated":"2021-07-14 17:10:48","user":"11.13","nice":"0.00","system":"3.26","iowait":"0.00","steal":"0.00","idle":"85.61"},"sysstat":{"last_check":"1626282643"},"eventman":{"last_check":"1626282648"},"cmdsubsys":{"last_check":"1626282649"},"dbmaint":{"last_check":"1626282603"},"perfdataprocessor":{"last_check":"1626282642"},"dbbackend":{"last_checkin":"2020-09-24 07:53:43","bytes_processed":"14702208","entries_processed":"24861","connect_time":"2020-09-24 07:33:17","disconnect_time":"0000-00-00 00:00:00"},"load":{"updated":"2021-07-14 17:10:43","load1":"0.61","load5":"0.79","load15":"0.73"},"memory":{"updated":"2021-07-14 17:10:43","total":"64266","used":"1630","free":"556","shared":"3212","buffers":"62079","cached":"58903"},"swap":{"updated":"2021-07-14 17:10:43","total":"32767","used":"8","free":"32759"},"feedprocessor":{"last_check":"1626282643"},"reportengine":{"last_check":"1626282602"},"daemons":{"updated":"2021-07-14 17:10:43","daemon":[{"@attributes":{"id":"nagioscore"},"name":"nagios","output":" ??31381 \/usr\/local\/nagios\/libexec\/check_nrpe -H 10.133.31.237 --v2-packets-only --unknown-timeout -t 59 3 -c check_init_service -a vasd","return_code":"0","status":"0"},{"@attributes":{"id":"pnp"},"name":"npcd","output":" ??13922 \/usr\/local\/nagios\/bin\/npcd -d -f \/usr\/local\/nagios\/etc\/pnp\/npcd.cfg","return_code":"0","status":"0"}]},"nagioscore":{"updated":"2021-07-14 
17:10:43","activehostchecks":{"val1":"55","val5":"308","val15":"308"},"passivehostchecks":{"val1":"0","val5":"0","val15":"0"},"activeservicechecks":{"val1":"427","val5":"2098","val15":"2098"},"passiveservicechecks":{"val1":"1","val5":"1","val15":"1"},"activehostcheckperf":{"min_latency":"0","max_latency":"1.4704060554504395","avg_latency":"0.0297475943374015","min_execution_time":"0.003146","max_execution_time":"4.120271","avg_execution_time":"3.9612676363636363"},"activeservicecheckperf":{"min_latency":"0","max_latency":"1.472885012626648","avg_latency":"0.029339193409710605","min_execution_time":"0.0028250000000000003","max_execution_time":"13.892618","avg_execution_time":"0.648569172528517"}}}
[info] => Array
(
[url] => https://10.133.134.84/nagiosxi/api/v1/system/statusdetail?apikey=J8eTtlRoIYHGJdqtWU260TZsTG8N7GM6NZgAEnfjlVfk8J74D9pT2JFcl4fLJ07M
[content_type] => application/json
[http_code] => 200
[header_size] => 311
[request_size] => 189
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.318283
[namelookup_time] => 5.4E-5
[connect_time] => 0.000247
[pretransfer_time] => 0.116056
[size_upload] => 0
[size_download] => 2261
[speed_download] => 7103
[speed_upload] => 0
[download_content_length] => 2261
[upload_content_length] => 0
[starttransfer_time] => 0.318274
[redirect_time] => 0
[certinfo] => Array
(
)
[primary_ip] => 10.133.134.84
[primary_port] => 443
[local_ip] => 10.133.134.85
[local_port] => 33306
[redirect_url] =>
)
)
XML DATA LOOKS OK
CHECKING JOB reportengine (Report Engine)
CHECKING JOB sysstat (System Statistics)
CHECKING JOB eventman (Event Manager)
CHECKING JOB feedprocessor (Feed Processor)
CHECKING JOB cmdsubsys (Command Subsystem)
CHECKING JOB nom (Nonstop Operations Manager)
CHECKING JOB dbmaint (Database Maintenance)
CHECKING JOB cleaner (Cleaner)
All jobs are running okay.
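For reference, here is a rough Python sketch of how the age of each job could be worked out from a statusdetail response like the ones above, by comparing each `last_check` epoch value against the response's `Date` header. The `job_ages` helper and the `STALE_AFTER` threshold are hypothetical and only for illustration; I don't know the thresholds or logic the real check uses.

```python
import json
from email.utils import parsedate_to_datetime

# Trimmed copy of the body returned to the 5.8.4 host above
# (only the eight jobs named in the CHECKING JOB lines are kept).
BODY = (
    '{"nom":{"last_check":"1626282661"},'
    '"cleaner":{"last_check":"1626282662"},'
    '"sysstat":{"last_check":"1626282682"},'
    '"eventman":{"last_check":"1626282692"},'
    '"cmdsubsys":{"last_check":"1626282692"},'
    '"dbmaint":{"last_check":"1626282603"},'
    '"feedprocessor":{"last_check":"1626282682"},'
    '"reportengine":{"last_check":"1626282661"}}'
)
DATE_HEADER = "Wed, 14 Jul 2021 17:11:32 GMT"

# Hypothetical threshold in seconds; NOT taken from the real check.
STALE_AFTER = 300


def job_ages(body, now):
    """Map job name -> seconds since its last_check, relative to `now` (epoch)."""
    data = json.loads(body)
    return {
        name: now - int(info["last_check"])
        for name, info in data.items()
        if isinstance(info, dict) and "last_check" in info
    }


# The Date header is in GMT, so this yields a timezone-aware UTC datetime.
now = int(parsedate_to_datetime(DATE_HEADER).timestamp())
ages = job_ages(BODY, now)
stale = {name: age for name, age in ages.items() if age > STALE_AFTER}

for name, age in sorted(ages.items(), key=lambda item: -item[1]):
    print(f"{name:15s} {age:4d} s old")
print("stale jobs:", stale or "none")
```

With the response captured above, dbmaint comes out 89 seconds old at the time of the request, compared with the 361 to 543 seconds reported in the warnings.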