Nagios XI crashed

ereut
Posts: 9
Joined: Mon Sep 13, 2021 3:26 pm
Location: Louisville, CO


Post by ereut

Hello,
Yesterday my Nagios XI server crashed. The logs show logrotate errors, kernel errors, out-of-memory kills, and so on. I updated and upgraded the server, rebooted it, and everything returned to normal.
The server runs Ubuntu 22.04; MySQL is hosted on a separate, dedicated server.

I'll keep monitoring the server, but I'd appreciate any advice on what else I should do or check.


//----------------------------------------------------------------------------------
Mar 19 00:00:06 wls-nxi01 systemd[1]: Failed to start Rotate log files.
░░ Subject: A start job for unit logrotate.service has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit logrotate.service has finished with a failure.
░░
░░ The job identifier is 165026 and the job result is failed.
Mar 19 15:03:45 wls-nxi01 kernel: Out of memory: Killed process 3003271 (php) total-vm:147828kB, anon-rss:13464kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:240kB oom_score_adj:0
Mar 19 15:03:56 wls-nxi01 kernel: Out of memory: Killed process 3003282 (php) total-vm:145780kB, anon-rss:6560kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:232kB oom_score_adj:0
Mar 19 15:03:56 wls-nxi01 kernel: Out of memory: Killed process 3003275 (php) total-vm:145780kB, anon-rss:8204kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:232kB oom_score_adj:0
Mar 19 15:04:53 wls-nxi01 kernel: Out of memory: Killed process 3003278 (php) total-vm:145780kB, anon-rss:1116kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:232kB oom_score_adj:0
Mar 19 15:04:53 wls-nxi01 kernel: Out of memory: Killed process 3003273 (php) total-vm:145780kB, anon-rss:164kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:236kB oom_score_adj:0
Mar 19 15:04:53 wls-nxi01 kernel: Out of memory: Killed process 2999825 (nagios) total-vm:1197332kB, anon-rss:676kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:492kB oom_score_adj:0
Mar 19 15:04:53 wls-nxi01 kernel: Out of memory: Killed process 2999953 (nagios) total-vm:1223300kB, anon-rss:88kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:524kB oom_score_adj:0
Mar 19 15:04:53 wls-nxi01 kernel: Out of memory: Killed process 2997504 (apache2) total-vm:273660kB, anon-rss:108kB, file-rss:0kB, shmem-rss:0kB, UID:33 pgtables:276kB oom_score_adj:0
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task php:3056306 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task jbd2/sda2-8:347 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task journal-offline:3056470 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task journal-offline:3056471 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task kworker/u4:2:3049984 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task kworker/u4:0:3051805 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task kworker/u4:1:3055591 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task php:3056303 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task php:3056306 blocked for more than 241 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:21:15 wls-nxi01 kernel: INFO: task php:3056465 blocked for more than 120 seconds.
Mar 19 19:21:15 wls-nxi01 kernel: Not tainted 5.15.0-91-generic #101-Ubuntu
Mar 19 19:21:15 wls-nxi01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 19:20:36 wls-nxi01 systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)!
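A useful first check is which processes the OOM killer actually terminated. Below is a minimal Python sketch, assuming the kernel lines look exactly like the excerpt above; the `summarize_oom_kills` name and the regular expression are illustrative and not part of Nagios XI. It counts kills per process name and sums the `anon-rss` each process held when it was killed, skipping lines that do not match (such as the hung-task messages).

```python
import re
from collections import Counter, defaultdict

# Matches kernel OOM-killer lines in the format shown in the log above, e.g.
# "Out of memory: Killed process 3003271 (php) total-vm:147828kB, anon-rss:13464kB, ..."
OOM_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<vm>\d+)kB, anon-rss:(?P<rss>\d+)kB"
)


def summarize_oom_kills(lines):
    """Return {process_name: (kill_count, total_anon_rss_kb)} for OOM-killer lines."""
    counts = Counter()
    rss_kb = defaultdict(int)
    for line in lines:
        match = OOM_RE.search(line)
        if match is None:
            continue  # not an OOM-killer line (e.g. hung-task or systemd messages)
        name = match.group("name")
        counts[name] += 1
        rss_kb[name] += int(match.group("rss"))
    return {name: (counts[name], rss_kb[name]) for name in counts}


if __name__ == "__main__":
    # Two lines copied from the excerpt above, plus one non-matching line.
    sample = [
        "Mar 19 15:03:45 wls-nxi01 kernel: Out of memory: Killed process 3003271 (php) total-vm:147828kB, anon-rss:13464kB, file-rss:0kB",
        "Mar 19 15:04:53 wls-nxi01 kernel: Out of memory: Killed process 2999825 (nagios) total-vm:1197332kB, anon-rss:676kB, file-rss:0kB",
        "Mar 19 19:21:15 wls-nxi01 kernel: INFO: task php:3056306 blocked for more than 120 seconds.",
    ]
    for name, (kills, rss) in sorted(summarize_oom_kills(sample).items()):
        print(f"{name}: {kills} kill(s), {rss} kB anon-rss at kill time")
```

To run it against the full history rather than a pasted sample, the lines could come from the output of `journalctl -k` (kernel messages only).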
lgute
Posts: 117
Joined: Mon Apr 06, 2020 2:49 pm

Re: Nagios XI crashed

Post by lgute

Hi @ereut, thanks for reaching out.

Did you receive notifications for high memory usage on your XI server?

You may want to lower the Warning and Critical thresholds for memory usage so that you have more time to react.
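To illustrate why lower thresholds buy time, here is a minimal Python sketch of how a memory check compares usage against Warning and Critical thresholds. It assumes usage is computed as `(MemTotal - MemAvailable) / MemTotal` from `/proc/meminfo`-style text; the actual XI memory check may calculate usage differently, and the threshold values and sample figures are hypothetical. It returns the standard Nagios plugin status codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value} (values in kB)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def memory_status(meminfo_text, warn_pct, crit_pct):
    """Return (status_code, used_pct) for memory usage against warn/crit thresholds."""
    info = parse_meminfo(meminfo_text)
    total = info.get("MemTotal")
    available = info.get("MemAvailable")
    if not total or available is None:
        return UNKNOWN, None  # cannot compute usage without both fields
    used_pct = 100.0 * (total - available) / total
    if used_pct >= crit_pct:
        return CRITICAL, used_pct
    if used_pct >= warn_pct:
        return WARNING, used_pct
    return OK, used_pct


if __name__ == "__main__":
    # Hypothetical /proc/meminfo excerpt: 78% of memory in use.
    sample = (
        "MemTotal:       4000000 kB\n"
        "MemFree:         200000 kB\n"
        "MemAvailable:    880000 kB\n"
    )
    print(memory_status(sample, warn_pct=80, crit_pct=90))  # (0, 78.0): still OK
    print(memory_status(sample, warn_pct=70, crit_pct=80))  # (1, 78.0): WARNING fires earlier
```

On a Linux host, the same function could be fed the contents of `/proc/meminfo` directly. With the lower thresholds, the same 78% usage already raises a WARNING instead of staying OK.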

It may also be worthwhile to set up an event handler that runs when memory usage gets too high.
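A minimal Python sketch of such a handler follows, modeled on the service event handler example in the Nagios Core documentation, where the handler receives `$SERVICESTATE$`, `$SERVICESTATETYPE$` and `$SERVICEATTEMPT$` as arguments. It acts on the third SOFT CRITICAL attempt (which assumes the memory service's max check attempts is greater than 3) or on a HARD CRITICAL state. The `should_remediate` and `remediate` names are illustrative, and `remediate()` is a hypothetical placeholder: what it should actually do depends on what is consuming memory on your server.

```python
import sys


def should_remediate(state, state_type, attempt):
    """Decide whether to act, following the SOFT/HARD pattern of the Nagios event handler example.

    Act on the 3rd SOFT CRITICAL attempt (before the state turns HARD) or on a HARD CRITICAL state.
    """
    if state != "CRITICAL":
        return False
    if state_type == "SOFT":
        return attempt == 3
    return state_type == "HARD"


def remediate():
    # Hypothetical placeholder: replace with a site-specific action, for example
    # capturing the top memory consumers or restarting a known-leaky service.
    print("Memory CRITICAL: running remediation step")


if __name__ == "__main__":
    # Assumed invocation: handler.py $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
    if len(sys.argv) == 4 and should_remediate(sys.argv[1], sys.argv[2], int(sys.argv[3])):
        remediate()
```

Acting during a SOFT state gives the handler a chance to fix the problem before it becomes HARD, which is when notifications are sent.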
Please let us know if you have any other questions or concerns.

-Laura