live data timeout settings

This support forum board is for questions relating to Nagios Fusion.
Locked
nosajche
Posts: 9
Joined: Fri Jun 12, 2020 8:43 am

Post by nosajche »

Hello,

We are trying to get our Fusion environment to fuse with Nagios XI and pull our Log Server data and we are running into some problems.

We are running RHEL 7.9 (Maipo) on all boxes:

Fusion - v4.1.8
XI - v5.8.1
Log Server - v2.1.7

After successfully fusing the XI boxes (all green check marks for XI Fused servers and Log Server API authentication), we are seeing the following error message constantly in fusion.log:

Code:

[2021-05-14 17:02:41] [SYSTEM] [ERROR]: poll_server() CHECK YOUR LIVE_DATA_TIMEOUT SETTINGS. IT MAY NEED INCREASED
In addition, dberrors.log is constantly growing with the following message:

Code:

Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction
We have already adjusted the timeout values and changed the Polling config:
Memory Limit - 1 G
Simultaneous Pollers - 4
Live Data Timeout - 300 seconds

Some of the data is showing up in the dashlets, but not all of it. Any ideas on what to troubleshoot?


Thanks.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: live data timeout settings

Post by benjaminsmith »

Hi,

It looks like the system may be having trouble processing all the incoming data. Let's run a few tests to confirm this.

1. If these are large systems, try increasing the polling interval beyond 300 seconds. You can set this value globally or on a per-server basis.

2. For testing purposes, try setting the Live Data Timeout to a very large number (600) to see whether that resolves the error message in the Fusion log.

3. Try truncating the polled data so the server starts over with fresh data. Run this as root:

Code:

cd /usr/local/nagiosfusion/scripts
./truncate_polled.php
This removes all of the temporary data used in the Fusion interface, so you will have to wait until the next poll for updated data.

Lastly, double-check the disk space on this server, just to rule that out.
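
The disk-space check can be sketched as follows. This is a minimal sketch rather than an official Nagios procedure: `check_space` is a hypothetical helper name, and the paths in the usage note are assumptions based on the script location above.

```shell
#!/usr/bin/env bash
# Report free space and free inodes for the filesystem that holds a path.
# check_space is a hypothetical helper, not part of Nagios Fusion.
check_space() {
    local path="${1:-/}"
    df -h "$path"   # block usage in human-readable units
    df -i "$path"   # inode usage; running out of inodes also blocks writes
}
```

For example, `check_space /usr/local/nagiosfusion` covers the Fusion install, and `check_space /var/lib/mysql` covers the MariaDB data directory if the RHEL default location is in use.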

Regards,
Benjamin
nosajche
Posts: 9
Joined: Fri Jun 12, 2020 8:43 am

Re: live data timeout settings

Post by nosajche »

Hi Benjamin,

I've adjusted the polling interval for the larger servers to 600 seconds, set the timeout to 600 seconds, and run the truncate_polled.php script, but I am still receiving the same data timeout errors.

Disk space is also not an issue, definitely plenty of that to go around.

Also, after looking at other threads, I tried the curl command with the fuse key against the largest XI server, and it completes in pretty good time:

Code:

real	0m14.211s
user	0m0.096s
sys	0m0.179s

[1]-  Done                    time curl -k -XGET https://nagiosxihost.org/nagiosxi/api/v1/objects/servicestatus?fusekey=<FUSEKEY>
[3]+  Done                    user=nagiosadmin
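
A note on reading the output above: the `[1]-  Done` and `[3]+  Done ... user=nagiosadmin` job-control lines suggest the URL was passed unquoted, so the shell split the command at each `&` and curl only received the part up to the first `&`. The timing may therefore not reflect the full request. A minimal sketch of a quoted invocation follows. `build_url` and `time_poll` are hypothetical helper names, and the `fusekey` and `user` parameters are taken from the output above, not from the API documentation.

```shell
#!/usr/bin/env bash
# Build the servicestatus URL from host, fuse key, and user.
# Values are inserted as-is (no URL encoding), which is an assumption
# that they contain no reserved characters.
build_url() {
    printf 'https://%s/nagiosxi/api/v1/objects/servicestatus?fusekey=%s&user=%s' "$1" "$2" "$3"
}

# Time one request. The URL is double-quoted so '&' reaches curl intact
# instead of being treated as the shell's background operator.
time_poll() {
    # -k skips TLS verification, as in the original command.
    # -s and -o /dev/null suppress the progress meter and response body.
    # -w prints the HTTP status and the total time measured by curl.
    curl -k -s -o /dev/null -w '%{http_code} %{time_total}s\n' \
        "$(build_url "$1" "$2" "$3")"
}

# Usage (placeholders, not real values):
#   time_poll nagiosxihost.org '<FUSEKEY>' nagiosadmin
```

Comparing this timing against the configured Live Data Timeout gives a fairer picture than the unquoted run.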

Any other logs or anything else I can check on?


Thanks,


Jason
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: live data timeout settings

Post by ssax »

Please reboot the Fusion server to kill off any old processes. When it comes back up, go to Admin > and click the gear icon to clear the polling locks.

Attach this file as well:

Code:

/etc/php.ini
Send a screenshot of your settings in Admin > System Settings > Data & Polling.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: live data timeout settings

Post by ssax »

Your php.ini looks good.

Please go to Admin > System Settings > Data & Polling and set the Polling Subsystem Memory Limit to -1 (that's minus 1) and change the Simultaneous Pollers to 1.

Then check the box for Mapped Users Polling.

Then wait 5 minutes and let me know if they start polling properly.

What do you see in Admin > Fusion Logs?

Do you see any errors in the files in /usr/local/nagiosfusion/var/log that could be related?

Please note that for the hostgroups/servicegroups poll you may see the live data timeout message, but it can be ignored, as it's just the way that it's written.
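
To see which errors dominate the logs mentioned above, the entries can be grouped by message. A minimal sketch, assuming the `[YYYY-MM-DD HH:MM:SS]` timestamp prefix shown in this thread's fusion.log excerpt and assuming fusion.log sits in /usr/local/nagiosfusion/var/log. `summarize_errors` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count ERROR lines in a log file by message, ignoring the timestamp.
# summarize_errors is a hypothetical helper, not part of Nagios Fusion.
summarize_errors() {
    grep -F '[ERROR]' "$1" \
        | sed -E 's/^\[[0-9-]+ [0-9:]+\] //' \
        | sort | uniq -c | sort -rn
}

# Usage (path is an assumption):
#   summarize_errors /usr/local/nagiosfusion/var/log/fusion.log
```

Messages that only ever appear for the hostgroup or servicegroup polls can then be separated from ones that point at a real polling failure.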
nosajche
Posts: 9
Joined: Fri Jun 12, 2020 8:43 am

Re: live data timeout settings

Post by nosajche »

We are still seeing the following log entries constantly:

Code:

[2021-06-09 02:48:08] [SYSTEM] [ERROR]: poll_server() unable to poll data for s:<XI SERVER>, u:nagiosadmin, poll:servicegroup
[2021-06-09 02:48:08] [SYSTEM] [ERROR]: poll_server() CHECK YOUR LIVE_DATA_TIMEOUT SETTINGS. IT MAY NEED INCREASED
[2021-06-09 04:32:19] [SYSTEM] [ERROR]: poll_server() unable to poll data for s:<XI SERVER>, u:nagiosadmin, poll:servicegroup
[2021-06-09 04:32:19] [SYSTEM] [ERROR]: poll_server() CHECK YOUR LIVE_DATA_TIMEOUT SETTINGS. IT MAY NEED TO BE INCREASED
[2021-06-09 04:37:21] [SYSTEM] [ERROR]: poll_server() unable to poll data for s:<XI SERVER>, u:nagiosadmin, poll:servicegroup
[2021-06-09 04:37:21] [SYSTEM] [ERROR]: poll_server() CHECK YOUR LIVE_DATA_TIMEOUT SETTINGS. IT MAY NEED TO BE INCREASED
[2021-06-09 04:42:24] [SYSTEM] [ERROR]: poll_server() unable to poll data for s:<XI SERVER>, u:nagiosadmin, poll:servicegroup
However, your response seems to indicate that this is normal. Why is that?

Furthermore, after making the changes you detailed, the following entries started to appear in fusion.log:

Code:

[2021-06-09 10:58:02] [SYSTEM] [ERROR]: insert_polled_data() caught exception: Unable to UPDATE polled_data ( [:polled_data_id] => 7146, [:monitoring_engine] => 0, [:notifications] => 0, [:active_checks] => 0, [:passive_checks] => 0, [:event_handlers] => 0 )
[2021-06-09 10:58:02] [SYSTEM] [ERROR]: insert_polled_data() db error: ( )
The other log files in /usr/local/nagiosfusion/var/log don't contain any other entries that seem helpful.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: live data timeout settings

Post by ssax »

That's because there is a bug in that poll, but it doesn't affect anything beyond producing that message.

Try doing this again:

Code:

cd /usr/local/nagiosfusion/scripts
./truncate_polled.php
What is the output of these commands on the Fusion server?

Code:

uname -a
cat /etc/*release
rpm -qa | grep -i maria
rpm -qa | grep -i mysql
dpkg --list | grep -i maria
dpkg --list | grep -i mysql
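
The list above covers both RPM-based and Debian-based systems, so only one pair of package queries will apply on a given host. A minimal sketch that picks whichever package manager is present. `list_db_packages` is a hypothetical helper name, not a Nagios tool.

```shell
#!/usr/bin/env bash
# List installed MariaDB/MySQL packages using rpm or dpkg, whichever exists.
# list_db_packages is a hypothetical helper, not part of Nagios Fusion.
list_db_packages() {
    if command -v rpm >/dev/null 2>&1; then
        rpm -qa | grep -iE 'maria|mysql' || true    # no match is not an error
    elif command -v dpkg >/dev/null 2>&1; then
        dpkg --list | grep -iE 'maria|mysql' || true
    else
        echo "neither rpm nor dpkg found" >&2
        return 1
    fi
}
```

On a RHEL host only the `rpm` branch runs, which is why the `dpkg` commands produce no package listing there.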
nosajche
Posts: 9
Joined: Fri Jun 12, 2020 8:43 am

Re: live data timeout settings

Post by nosajche »

Code:

[root@fusionserver scripts]# uname -a
Linux fusionserver 3.10.0-1160.21.1.el7.x86_64 #1 SMP Mon Feb 22 18:03:13 EST 2021 x86_64 x86_64 x86_64 GNU/Linux

[root@fusionserver scripts]# cat /etc/*release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.9 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.9"
PRETTY_NAME=RHEL
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.9:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.9
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.9"
Red Hat Enterprise Linux Server release 7.9 (Maipo)
Red Hat Enterprise Linux Server release 7.9 (Maipo)

[root@fusionserver scripts]# rpm -qa | grep -i maria
mariadb-5.5.68-1.el7.x86_64
mariadb-libs-5.5.68-1.el7.x86_64
mariadb-devel-5.5.68-1.el7.x86_64
mariadb-server-5.5.68-1.el7.x86_64

[root@fusionserver scripts]# rpm -qa | grep -i mysql
perl-DBD-MySQL-4.023-6.el7.x86_64
php-mysql-5.4.16-48.el7.x86_64
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: live data timeout settings

Post by ssax »

Please create a ticket for this and include a link back to this forum thread so we can get a remote session set up:

https://support.nagios.com/tickets/

Thank you!