Nagios XI crashed
Posted: Wed Mar 20, 2024 10:46 am
Hello,
Yesterday my Nagios XI crashed. The logs show a failed logrotate job, kernel out-of-memory kills (php, nagios and apache2 processes), hung-task warnings, and a systemd-journald watchdog timeout. I installed the pending updates and rebooted, and everything went back to normal.
The server runs Ubuntu 22.04; MySQL is on a separate, dedicated server.
I'll continue to monitor how the server works, but if you can let me know what else I need to do or check, I'd appreciate it.
//----------------------------------------------------------------------------------
Mar 19 00:00:06 wls-nxi01 systemd[1]: Failed to start Rotate log files.
░░ Subject: A start job for unit logrotate.service has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit logrotate.service has finished with a failure.
░░
░░ The job identifier is 165026 and the job result is failed.
Mar 19 15:03:45 wls-nxi01 kernel: Out of memory: Killed process 3003271 (php) total-vm:147828kB, anon-rss:13464kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:240kB oom_score_adj:0
Mar 19 15:03:56 wls-nxi01 kernel: Out of memory: Killed process 3003282 (php) total-vm:145780kB, anon-rss:6560kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:232kB oom_score_adj:0
Mar 19 15:03:56 wls-nxi01 kernel: Out of memory: Killed process 3003275 (php) total-vm:145780kB, anon-rss:8204kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:232kB oom_score_adj:0
Mar 19 15:04:53 wls-nxi01 kernel: Out of memory: Killed process 3003278 (php) total-vm:145780kB, anon-rss:1116kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:232kB oom_score_adj:0
Mar 19 15:04:53 wls-nxi01 kernel: Out of memory: Killed process 3003273 (php) total-vm:145780kB, anon-rss:164kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:236kB oom_score_adj:0
Mar 19 15:04:53 wls-nxi01 kernel: Out of memory: Killed process 2999825 (nagios) total-vm:1197332kB, anon-rss:676kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:492kB oom_score_adj:0
Mar 19 15:04:53 wls-nxi01 kernel: Out of memory: Killed process 2999953 (nagios) total-vm:1223300kB, anon-rss:88kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:524kB oom_score_adj:0
Mar 19 15:04:53 wls-nxi01 kernel: Out of memory: Killed process 2997504 (apache2) total-vm:273660kB, anon-rss:108kB, file-rss:0kB, shmem-rss:0kB, UID:33 pgtables:276kB oom_score_adj:0
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task php:3056306 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task jbd2/sda2-8:347 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task journal-offline:3056470 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task journal-offline:3056471 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task kworker/u4:2:3049984 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task kworker/u4:0:3051805 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task kworker/u4:1:3055591 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task php:3056303 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task php:3056306 blocked for more than 241 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task php:3056465 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:20:36 wls-nxi01 systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)!
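To make the out-of-memory part of the log easier to read, here is a minimal Python sketch that tallies the kernel's "Killed process" lines by process name and adds up the reported anon-rss. It only assumes the line format shown in the log above; the function name `summarize_oom` and the two sample lines are just for illustration and are not part of Nagios XI or any existing tool.

```python
import re
from collections import Counter

# Matches kernel OOM-killer lines in the format that appears in the log above.
OOM_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def summarize_oom(lines):
    """Return (kills per process name, summed anon-rss in kB per process name)."""
    kills = Counter()
    rss_kb = Counter()
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            kills[match["name"]] += 1
            rss_kb[match["name"]] += int(match["anon_rss"])
    return kills, rss_kb


if __name__ == "__main__":
    # Two lines copied from the log above, used as sample input.
    sample = [
        "Mar 19 15:03:45 wls-nxi01 kernel: Out of memory: Killed process 3003271 (php) "
        "total-vm:147828kB, anon-rss:13464kB, file-rss:0kB, shmem-rss:0kB, "
        "UID:1001 pgtables:240kB oom_score_adj:0",
        "Mar 19 15:04:53 wls-nxi01 kernel: Out of memory: Killed process 2999825 (nagios) "
        "total-vm:1197332kB, anon-rss:676kB, file-rss:0kB, shmem-rss:0kB, "
        "UID:1001 pgtables:492kB oom_score_adj:0",
    ]
    kills, rss_kb = summarize_oom(sample)
    for name, count in kills.most_common():
        print(f"{name}: {count} kill(s), {rss_kb[name]} kB anon-rss in total")
```

The same function could be fed the full kernel log line by line. The summary only counts what the OOM killer reported, so it does not by itself show which process was actually using the memory.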