Issues on scheduled reports
Re: Issues on scheduled reports
Per the document https://dev.mysql.com/doc/refman/8.0/en/gone-away.html the most likely cause would be the timeout being the root of the message. I'm not sure why the difference in where it runs matters quiet yet, but I would try increasing the wait_time on the database. Also, checking the /var/log/mysqld.log may give us some clues.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Issues on scheduled reports
Hello,
MariaDB variables:
log_warning & slow_query_log are enabled.
If some connection issues or long queries related to reports happenned the event would have been logged.
(But there are some difference between MariaDB & MySQL)
MariaBD main log and slow-queries log files have no messages related to such events.
The main log is clean/empty and the slow-queries contains only few requests.
I scheduled a new report this morning. Same behavior but no log.
When a report is built in less than 5min, the "services availabilty" part is filled up correctly.
I may change the wait_timeout variable.
I can't change it at the GLOBAL level without a DB restart.
But as it is a production server, it can't be done easily. It has to be explained, documented, justified and scheduled.
I need certainties before doing that.
Or maybe a small patch in Nagios report tool/script that close DB connections between queries ?
MariaDB variables:
Code: Select all
log_warnings = 1
slow_query_log = ON
slow_query_log_file = /var/log/mariadb/mariadb-slow.log
log_slow_filter = admin,filesort,filesort_on_disk,full_join,full_scan,query_cache,query_cache_miss,tmp_table,tmp_table_on_disk
long_query_time = 10.000000
wait_timeout = 300
interactive_timeout = 28800
connect_timeout = 10
If some connection issues or long queries related to reports happenned the event would have been logged.
(But there are some difference between MariaDB & MySQL)
MariaBD main log and slow-queries log files have no messages related to such events.
The main log is clean/empty and the slow-queries contains only few requests.
I scheduled a new report this morning. Same behavior but no log.
Code: Select all
MariaDB [(none)]> SHOW GLOBAL STATUS WHERE Variable_name LIKE '%abort%';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| Aborted_clients | 3162 |
| Aborted_connects | 21331 |
+------------------+-------+
2 rows in set (0.00 sec)
- aborted_connects are clients who attempt to connect and fail. Usually this is because of incorrect credentials (wrong password or no matching host for the user).
- aborted_clients are the client not closing the connection properly.
When a report is built in less than 5min, the "services availabilty" part is filled up correctly.
I may change the wait_timeout variable.
I can't change it at the GLOBAL level without a DB restart.
But as it is a production server, it can't be done easily. It has to be explained, documented, justified and scheduled.
I need certainties before doing that.
Or maybe a small patch in Nagios report tool/script that close DB connections between queries ?
Loïc VAILLANT
Re: Issues on scheduled reports
Did you increase the max_input_vars settings in the /etc/php.ini file?
Edit the php.ini file and add the following to it.
Save the change and restart Apache by running
I have seen that if the php settings are not large enough, it can cause the MYSQL timeout error so try this change and let us know how it works out.
Edit the php.ini file and add the following to it.
Code: Select all
max_input_vars = 10000Code: Select all
service httpd restartBe sure to check out our Knowledgebase for helpful articles and solutions!
Re: Issues on scheduled reports
The current PHP configuration is:
max_input_vars = 1000 is the default value (see documentation) since PHP 5.3.9 (my version: php-5.4.16-43.el7_4.1.x86_64)
Moreover per documentation (max-input-vars) if there are more input variables than specified by this directive, an E_WARNING is issued, and further input variables are truncated from the request.
So E_WARNING should be logged... But nothing related in httpd log files.
Simulating and stracing a scheduled report in cmdline, I detected that /usr/local/nagiosxi/var/scheduledreporting.log file did not exist.
Now created, it does not reveal error messages. But I can see the related 2 wget commands.
The strace output is more interesting: it took less than 1min to generate the host_availability part of the report, but +5min for the service_availabilty part.
(The related servicegroup contains 9 hosts & 56 services.)
Why such a difference ?
For the second read(/wget), the length is unspecified.
I'm digging further more...
Code: Select all
error_reporting = E_ALL & ~E_DEPRECATED & ~E_STRICT
...
; How many GET/POST/COOKIE input variables may be accepted
; max_input_vars = 1000
Moreover per documentation (max-input-vars) if there are more input variables than specified by this directive, an E_WARNING is issued, and further input variables are truncated from the request.
So E_WARNING should be logged... But nothing related in httpd log files.
Simulating and stracing a scheduled report in cmdline, I detected that /usr/local/nagiosxi/var/scheduledreporting.log file did not exist.
Now created, it does not reveal error messages. But I can see the related 2 wget commands.
The strace output is more interesting: it took less than 1min to generate the host_availability part of the report, but +5min for the service_availabilty part.
(The related servicegroup contains 9 hosts & 56 services.)
Why such a difference ?
Code: Select all
time strace /usr/bin/php /usr/local/nagiosxi/html/includes/components/scheduledreporting/sendreport.php --report=2prcF0 --username="loic.vaillant" 2>&1 > /tmp/report.log
...
read(7, --2018-06-05 11:23:44-- https://XXX.YYY.ZZZ/nagiosxi//reports/availability.php?reportperiod=lastmonth&startdate=&enddate=&host=&service=&hostgroup=&servicegroup=myservicegroup&advanced=1&assumeinitialstates=yes&assumestateretention=yes&assumestatesduringdowntime=yes&includesoftstates=no&assumedhoststate=3&assumedservicestate=6&timeperiod=&dont_count_downtime=on&dont_count_unknown=on&username=loic.vaillant&ticket=foobarp&locale=en_US&mode=csv&csvtype=host
Resolving XXX.YYY.ZZZ (XXX.YYY.ZZZ)... a.b.c.d
Connecting to XXX.YYY.ZZZ (XXX.YYY.ZZZ)|a.b.c.d|:443... connected.
WARNING: cannot verify XXX.YYY.ZZZ's certificate, issued by ‘/DC=lan/DC=zodiac/CN=HQ-CA1’:
Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 200 OK
Length: 430 [application/octet-stream]
Saving to: ‘/usr/local/nagiosxi/tmp/scheduledreport-loic.vaillant-2prcF0-Host_Availability.csv’
100%[===================================================================================================================================================================================================>] 430 --.-K/s in 0s
2018-06-05 11:24:26 (32.5 MB/s) - ‘/usr/local/nagiosxi/tmp/scheduledreport-loic.vaillant-2prcF0-Host_Availability.csv’ saved [430/430]
...
read(7, --2018-06-05 11:24:26-- https://XXX.YYY.ZZZ/nagiosxi//reports/availability.php?reportperiod=lastmonth&startdate=&enddate=&host=&service=&hostgroup=&servicegroup=myservicegroup&advanced=1&assumeinitialstates=yes&assumestateretention=yes&assumestatesduringdowntime=yes&includesoftstates=no&assumedhoststate=3&assumedservicestate=6&timeperiod=&dont_count_downtime=on&dont_count_unknown=on&username=loic.vaillant&ticket=foobar&locale=en_US&mode=csv&csvtype=service
Resolving XXX.YYY.ZZZ (XXX.YYY.ZZZ)... a.b.c.d
Connecting to XXX.YYY.ZZZ (XXX.YYY.ZZZ)|a.b.c.d|:443... connected.
WARNING: cannot verify XXX.YYY.ZZZ's certificate, issued by ‘/DC=lan/DC=zodiac/CN=HQ-CA1’:
Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/octet-stream]
Saving to: ‘/usr/local/nagiosxi/tmp/scheduledreport-loic.vaillant-2prcF0-Service_Availability.csv’
[ <=> ] 24,262 --.-K/s in 0.1s
2018-06-05 11:29:57 (207 KB/s) - ‘/usr/local/nagiosxi/tmp/scheduledreport-loic.vaillant-2prcF0-Service_Availability.csv’ saved [24262]
exit_group(0) = ?
+++ exited with 0 +++
real 6m13.198s
user 0m0.250s
sys 0m0.291s
I'm digging further more...
Loïc VAILLANT
Re: Issues on scheduled reports
I took a look back to the wait_timeout attribute.
Even if it is a GLOBAL VARIABLE, it is possible to change its value without restarting MySQL/MariaDB.
After that, the report that failed is now ok.
I'm now trying bigger reports to test the efficiency of the fix.
You should also modify the /etc/my.cnf file to make this modification durable.
Even if it is a GLOBAL VARIABLE, it is possible to change its value without restarting MySQL/MariaDB.
Code: Select all
MariaDB [(none)]> SHOW GLOBAL VARIABLES WHERE Variable_name LIKE '%wait_timeout%';
+---------------------------------------------------+----------+
| Variable_name | Value |
+---------------------------------------------------+----------+
| wait_timeout | 300 |
+---------------------------------------------------+----------+
1 row in set (0.00 sec)
MariaDB [(none)]> SET GLOBAL wait_timeout = 600;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> SHOW GLOBAL VARIABLES WHERE Variable_name LIKE '%wait_timeout%';
+---------------------------------------------------+----------+
| Variable_name | Value |
+---------------------------------------------------+----------+
| wait_timeout | 600 |
+---------------------------------------------------+----------+
1 row in set (0.00 sec)
I'm now trying bigger reports to test the efficiency of the fix.
You should also modify the /etc/my.cnf file to make this modification durable.
Loïc VAILLANT
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Issues on scheduled reports
Thanks for reporting back and I'm glad to hear that the failed report is now oklvaillant wrote:I took a look back to the wait_timeout attribute.
Even if it is a GLOBAL VARIABLE, it is possible to change its value without restarting MySQL/MariaDB.
After that, the report that failed is now ok.Code: Select all
MariaDB [(none)]> SHOW GLOBAL VARIABLES WHERE Variable_name LIKE '%wait_timeout%'; +---------------------------------------------------+----------+ | Variable_name | Value | +---------------------------------------------------+----------+ | wait_timeout | 300 | +---------------------------------------------------+----------+ 1 row in set (0.00 sec) MariaDB [(none)]> SET GLOBAL wait_timeout = 600; Query OK, 0 rows affected (0.00 sec) MariaDB [(none)]> SHOW GLOBAL VARIABLES WHERE Variable_name LIKE '%wait_timeout%'; +---------------------------------------------------+----------+ | Variable_name | Value | +---------------------------------------------------+----------+ | wait_timeout | 600 | +---------------------------------------------------+----------+ 1 row in set (0.00 sec)
I'm now trying bigger reports to test the efficiency of the fix.
You should also modify the /etc/my.cnf file to make this modification durable.
Re: Issues on scheduled reports
Hello,
You can new close this thread as Solved.
Thank you for your help.
You can new close this thread as Solved.
Thank you for your help.
Loïc VAILLANT