Getting a random crash, some log files attached

Locked
theo481
Posts: 9
Joined: Tue Sep 01, 2020 10:26 am

Getting a random crash, some log files attached

Post by theo481 »

We're getting what seems to be a random crash on our Nagios XI server. The log excerpt below appears to be where the problem starts.

Code:

Aug 31 20:14:18 nagios-prod nagios: wproc: Core Worker 19749: job 13714 (pid=65141) timed out. Killing it
Aug 31 20:14:18 nagios-prod nagios: wproc: CHECK job 13714 from worker Core Worker 19749 timed out after 124.73s
Aug 31 20:14:21 nagios-prod systemd: Started Session c80451 of user root.
Aug 31 20:14:52 nagios-prod journal: Suppressed 334 messages from /system.slice/nagios.service
Aug 31 20:15:07 nagios-prod nagios: job 13755 (pid=748): read() returned error 11
Aug 31 20:15:13 nagios-prod nagios: job 13785 (pid=512): read() returned error 11
Aug 31 20:15:14 nagios-prod nagios: job 13756 (pid=749): read() returned error 11
Aug 31 20:15:14 nagios-prod nagios: job 13885 (pid=852): read() returned error 11
Aug 31 20:15:14 nagios-prod nagios: job 13757 (pid=750): read() returned error 11
Aug 31 20:15:14 nagios-prod nagios: job 13758 (pid=751): read() returned error 11
Aug 31 20:15:14 nagios-prod nagios: job 13758 (pid=751): read() returned error 11
Aug 31 20:15:14 nagios-prod nagios: job 13758 (pid=751): read() returned error 11
Aug 31 20:15:14 nagios-prod nagios: job 13758 (pid=751): read() returned error 11
Aug 31 20:15:14 nagios-prod nagios: job 13759 (pid=752): read() returned error 11
Aug 31 20:15:14 nagios-prod nagios: job 13755 (pid=973): read() returned error 11
Aug 31 20:15:15 nagios-prod nagios: job 13760 (pid=753): read() returned error 11
Aug 31 20:15:15 nagios-prod nagios: job 13761 (pid=754): read() returned error 11
Aug 31 20:15:16 nagios-prod nagios: job 13762 (pid=755): read() returned error 11
Aug 31 20:15:16 nagios-prod nagios: job 13763 (pid=756): read() returned error 11
Aug 31 20:15:16 nagios-prod nagios: job 13764 (pid=757): read() returned error 11
Aug 31 20:15:16 nagios-prod nagios: job 13765 (pid=758): read() returned error 11
Aug 31 20:15:16 nagios-prod nagios: job 13766 (pid=759): read() returned error 11
Aug 31 20:15:16 nagios-prod nagios: job 13767 (pid=760): read() returned error 11
Aug 31 20:15:16 nagios-prod nagios: job 13768 (pid=761): read() returned error 11
Aug 31 20:15:16 nagios-prod nagios: job 13769 (pid=762): read() returned error 11
Aug 31 20:15:16 nagios-prod nagios: job 13770 (pid=763): read() returned error 11
Aug 31 20:15:16 nagios-prod nagios: job 13771 (pid=764): read() returned error 11
Aug 31 20:15:16 nagios-prod nagios: job 13772 (pid=765): read() returned error 11
Aug 31 20:15:17 nagios-prod nagios: job 13773 (pid=766): read() returned error 11
Aug 31 20:15:17 nagios-prod nagios: job 13774 (pid=767): read() returned error 11
Aug 31 20:15:17 nagios-prod nagios: job 13775 (pid=768): read() returned error 11
Aug 31 20:15:17 nagios-prod nagios: job 13776 (pid=769): read() returned error 11
Aug 31 20:15:17 nagios-prod nagios: job 13777 (pid=770): read() returned error 11
Aug 31 20:15:17 nagios-prod nagios: job 13778 (pid=771): read() returned error 11
Aug 31 20:15:17 nagios-prod nagios: job 13779 (pid=772): read() returned error 11
Aug 31 20:15:17 nagios-prod nagios: job 13780 (pid=773): read() returned error 11
Aug 31 20:15:17 nagios-prod nagios: job 13756 (pid=975): read() returned error 11
Aug 31 20:15:17 nagios-prod nagios: job 13757 (pid=976): read() returned error 11
Aug 31 20:15:17 nagios-prod nagios: job 13758 (pid=977): read() returned error 11
Aug 31 20:15:17 nagios-prod nagios: job 13759 (pid=978): read() returned error 11
Aug 31 20:15:17 nagios-prod nagios: job 13760 (pid=979): read() returned error 11
Aug 31 20:15:17 nagios-prod nagios: job 13761 (pid=980): read() returned error 11
Aug 31 20:15:17 nagios-prod nagios: job 13762 (pid=981): read() returned error 11
Aug 31 20:15:17 nagios-prod nagios: job 13763 (pid=982): read() returned error 11
Aug 31 20:15:17 nagios-prod nagios: job 13764 (pid=983): read() returned error 11
Aug 31 20:15:18 nagios-prod nagios: job 13765 (pid=987): read() returned error 11
Aug 31 20:15:18 nagios-prod nagios: job 13766 (pid=988): read() returned error 11
Aug 31 20:15:18 nagios-prod nagios: job 13767 (pid=989): read() returned error 11
Aug 31 20:15:18 nagios-prod nagios: job 13768 (pid=991): read() returned error 11
Aug 31 20:15:18 nagios-prod nagios: job 13769 (pid=992): read() returned error 11
Aug 31 20:15:18 nagios-prod nagios: job 13770 (pid=993): read() returned error 11
Aug 31 20:15:18 nagios-prod nagios: job 13771 (pid=995): read() returned error 11
Aug 31 20:15:18 nagios-prod nagios: job 13772 (pid=996): read() returned error 11
Aug 31 20:15:18 nagios-prod nagios: job 13773 (pid=997): read() returned error 11
Aug 31 20:15:18 nagios-prod nagios: job 13774 (pid=998): read() returned error 11
Aug 31 20:15:18 nagios-prod nagios: job 13775 (pid=999): read() returned error 11
Aug 31 20:15:18 nagios-prod nagios: job 13776 (pid=1000): read() returned error 11
Aug 31 20:15:18 nagios-prod nagios: job 13777 (pid=1003): read() returned error 11
Aug 31 20:15:18 nagios-prod nagios: job 13778 (pid=1004): read() returned error 11
Aug 31 20:15:18 nagios-prod nagios: job 13779 (pid=1005): read() returned error 11
Aug 31 20:15:18 nagios-prod nagios: job 13780 (pid=1006): read() returned error 11
Aug 31 20:15:18 nagios-prod nagios: job 13781 (pid=1007): read() returned error 11
Aug 31 20:15:19 nagios-prod nagios: job 13781 (pid=465): read() returned error 11
Aug 31 20:15:19 nagios-prod nagios: job 13783 (pid=471): read() returned error 11
Aug 31 20:15:19 nagios-prod nagios: job 13781 (pid=774): read() returned error 11
Aug 31 20:15:19 nagios-prod nagios: job 13781 (pid=774): read() returned error 11
Aug 31 20:15:19 nagios-prod nagios: job 13782 (pid=775): read() returned error 11
Aug 31 20:15:19 nagios-prod nagios: job 13782 (pid=775): read() returned error 11
Aug 31 20:15:19 nagios-prod nagios: job 13782 (pid=775): read() returned error 11
Aug 31 20:15:19 nagios-prod nagios: job 13782 (pid=775): read() returned error 11
Aug 31 20:15:19 nagios-prod nagios: job 13782 (pid=775): read() returned error 11
Aug 31 20:15:19 nagios-prod nagios: job 13782 (pid=508): read() returned error 11
Aug 31 20:15:19 nagios-prod nagios: job 13783 (pid=509): read() returned error 11
Aug 31 20:15:20 nagios-prod nagios: job 13786 (pid=513): read() returned error 11
Aug 31 20:15:20 nagios-prod nagios: job 13788 (pid=515): read() returned error 11
Aug 31 20:15:20 nagios-prod nagios: job 13789 (pid=516): read() returned error 11
Aug 31 20:15:20 nagios-prod nagios: job 13790 (pid=517): read() returned error 11
Aug 31 20:15:20 nagios-prod nagios: job 13791 (pid=518): read() returned error 11
Aug 31 20:15:20 nagios-prod nagios: job 13792 (pid=519): read() returned error 11
Aug 31 20:15:20 nagios-prod nagios: job 13793 (pid=520): read() returned error 11
Aug 31 20:15:20 nagios-prod nagios: job 13796 (pid=524): read() returned error 11
After this repeats for a long time, the log switches over to:

Code:

Aug 31 20:24:01 nagios-prod systemd: Started Session 1649639 of user nagios.
Aug 31 20:24:01 nagios-prod systemd: Started Session 1649632 of user nagios.
Aug 31 20:24:01 nagios-prod systemd: Started Session 1649640 of user nagios.
Aug 31 20:24:01 nagios-prod systemd: Started Session 1649633 of user nagios.
Aug 31 20:24:01 nagios-prod systemd: Started Session 1649636 of user nagios.
Aug 31 20:24:01 nagios-prod systemd: Started Session 1649641 of user nagios.
Aug 31 20:24:01 nagios-prod systemd: Started Session 1649642 of user nagios.
Aug 31 20:24:01 nagios-prod systemd: Started Session 1649638 of user nagios.
Aug 31 20:25:01 nagios-prod systemd: Started Session 1649647 of user nagios.
Aug 31 20:25:01 nagios-prod systemd: Started Session 1649645 of user nagios.
Aug 31 20:25:01 nagios-prod systemd: Started Session 1649649 of user nagios.
Aug 31 20:25:01 nagios-prod systemd: Started Session 1649643 of user nagios.
Aug 31 20:25:01 nagios-prod systemd: Started Session 1649650 of user nagios.
Aug 31 20:25:01 nagios-prod systemd: Started Session 1649648 of user nagios.
Aug 31 20:25:01 nagios-prod systemd: Started Session 1649646 of user nagios.
Aug 31 20:25:01 nagios-prod systemd: Started Session 1649652 of user root.
Aug 31 20:25:01 nagios-prod systemd: Started Session 1649644 of user nagios.
Aug 31 20:25:01 nagios-prod systemd: Started Session 1649651 of user nagios.
Aug 31 20:25:01 nagios-prod systemd: Started Session 1649654 of user nagios.
Aug 31 20:25:01 nagios-prod systemd: Started Session 1649655 of user nagios.
Aug 31 20:25:01 nagios-prod systemd: Started Session 1649653 of user nagios.
Aug 31 20:26:01 nagios-prod systemd: Started Session 1649657 of user nagios.
Aug 31 20:26:01 nagios-prod systemd: Started Session 1649658 of user nagios.
Aug 31 20:26:01 nagios-prod systemd: Started Session 1649659 of user nagios.
Aug 31 20:26:01 nagios-prod systemd: Started Session 1649663 of user nagios.
Aug 31 20:26:01 nagios-prod systemd: Started Session 1649661 of user nagios.
Aug 31 20:26:01 nagios-prod systemd: Started Session 1649665 of user nagios.
Aug 31 20:26:01 nagios-prod systemd: Started Session 1649656 of user nagios.
Aug 31 20:26:01 nagios-prod systemd: Started Session 1649660 of user nagios.
Aug 31 20:26:01 nagios-prod systemd: Started Session 1649664 of user nagios.
Aug 31 20:26:01 nagios-prod systemd: Started Session 1649662 of user nagios.
Aug 31 20:26:01 nagios-prod systemd: Started Session 1649666 of user nagios.
Aug 31 20:27:01 nagios-prod systemd: Started Session 1649667 of user nagios.
Aug 31 20:27:01 nagios-prod systemd: Started Session 1649669 of user nagios.
Aug 31 20:27:01 nagios-prod systemd: Started Session 1649671 of user nagios.
Aug 31 20:27:01 nagios-prod systemd: Started Session 1649673 of user nagios.
Aug 31 20:27:01 nagios-prod systemd: Started Session 1649674 of user nagios.
Aug 31 20:27:01 nagios-prod systemd: Started Session 1649676 of user nagios.
Aug 31 20:27:01 nagios-prod systemd: Started Session 1649675 of user nagios.
Monitoring then appears to stay down until the monitoring process is restarted.

Any ideas?
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm
Location: Yo mama

Re: Getting a random crash, some log files attached

Post by dchurch »

If you PM me a system profile I can diagnose further. Get one by going to Admin (top menu) => System Profile (in the left menu), then clicking the blue button.

If you're unable to generate the profile through the web interface, please try generating it from the command line by running these commands as root:

Code:

rm -rf /usr/local/nagiosxi/var/components/profile*
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT
Then send me the resulting /usr/local/nagiosxi/var/components/profile.zip file.
If the profile script fails, please include the ENTIRE output.
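
To see how the `read() returned error 11` messages from the first post cluster over time, a rough sketch like the one below may help. On Linux, errno 11 is EAGAIN ("Resource temporarily unavailable"). The function name `count_error11_per_minute` is hypothetical, and it assumes a syslog-style file such as `/var/log/messages` with the timestamp in the first three fields, as in the excerpt above.

```shell
# Count "read() returned error 11" lines per minute in a syslog-style log.
# Assumes fields 1-3 are month, day and HH:MM:SS, as in the excerpt above.
count_error11_per_minute() {
    awk 'index($0, "read() returned error 11") {
             # Build a per-minute key such as "Aug 31 20:15"
             key = $1 " " $2 " " substr($3, 1, 5)
             count[key]++
         }
         END { for (k in count) print count[k], k }' "$1" | sort -k2
}

# Example (the log path is an assumption):
#   count_error11_per_minute /var/log/messages
```

Each output line is a count followed by the minute it applies to, which makes it easier to tell a short burst from a sustained run of errors.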
If you didn't get an 8% raise over the course of the pandemic, you took a pay cut.

Discussion of wages is protected speech under the National Labor Relations Act, and no employer can tell you you can't disclose your pay with your fellow employees.
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm
Location: Yo mama

Re: Getting a random crash, some log files attached

Post by dchurch »

Create the file `/etc/my.cnf.d/nagios.cnf` with the following lines:

Code:

[mysqld]
innodb_log_buffer_size = 32M
innodb_buffer_pool_size = 1G
innodb_log_file_size = 256M
max_allowed_packet = 67108864
Then run the following commands as root:

Code:

/usr/local/nagiosxi/scripts/manage_services.sh stop nagios
/usr/local/nagiosxi/scripts/manage_services.sh stop mysqld
rm /var/lib/mysql/ib_logfile0
rm /var/lib/mysql/ib_logfile1
/usr/local/nagiosxi/scripts/manage_services.sh restart mysqld
/usr/local/nagiosxi/scripts/manage_services.sh restart nagios
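
The two `rm` steps above delete the InnoDB redo log files, presumably so they are recreated at the new `innodb_log_file_size`. A slightly more cautious variant, sketched below, moves them aside instead of deleting them, so they can be put back if mysqld fails to start. The function name `stash_redo_logs` and the backup location are hypothetical, and the sketch assumes mysqld has already been stopped cleanly, as in the commands above.

```shell
# Move InnoDB redo logs into a timestamped backup directory instead of deleting them.
# $1 = MySQL data directory, $2 = directory to hold backups (hypothetical location).
# Run only while mysqld is stopped.
stash_redo_logs() {
    datadir=$1
    backup_root=$2
    dest="$backup_root/$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1
    for f in "$datadir"/ib_logfile0 "$datadir"/ib_logfile1; do
        # Skip quietly if a log file is not present
        if [ -f "$f" ]; then
            mv "$f" "$dest"/ || return 1
        fi
    done
    # Print where the files went so they can be restored if needed
    echo "$dest"
}

# Example, in place of the two rm lines (backup path is an assumption):
#   stash_redo_logs /var/lib/mysql /root/ib_logfile_backup
```

Once the services are back up and running normally, the backup directory can be removed.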
theo481
Posts: 9
Joined: Tue Sep 01, 2020 10:26 am

Re: Getting a random crash, some log files attached

Post by theo481 »

dchurch wrote: If you PM me a system profile I can diagnose further.

I've sent the system profile.
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm
Location: Yo mama

Re: Getting a random crash, some log files attached

Post by dchurch »

I guess I was too quick to reply after receiving your profile.

Did you try my suggestion above? Did that fix your issue?
theo481
Posts: 9
Joined: Tue Sep 01, 2020 10:26 am

Re: Getting a random crash, some log files attached

Post by theo481 »

I will try this today; I was out with COVID.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Getting a random crash, some log files attached

Post by benjaminsmith »

Sorry to hear that, and hope you are feeling much better.

Let us know how it goes. Dan is out on vacation, so please send any private messages to my account. Thanks, Benjamin
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
theo481
Posts: 9
Joined: Tue Sep 01, 2020 10:26 am

Re: Getting a random crash, some log files attached

Post by theo481 »

Seems to be running better, but now my backups are failing.

Code:

Last output from the system: Error backing up MySQL database 'nagios' - check the password in this script!
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Getting a random crash, some log files attached

Post by benjaminsmith »

Hi,

The last profile you sent showed crashed tables in the database log. Try running the repair script below as root from the command line.

Code:

/usr/local/nagiosxi/scripts/repair_databases.sh
Then run another test backup job. If that doesn't resolve it, verify that the passwords are correct using the guide below.
https://assets.nagios.com/downloads/nag ... ios-XI.pdf

--Benjamin