Re: A lot of Service check timed out.

Posted: Fri Aug 30, 2019 3:49 pm
by mbellerue
Alright, first and foremost, can you run these commands and then see if the number of timeouts decreases?

Code:

# Stop cron and the Nagios-related services so nothing restarts them mid-cleanup
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
# Kill any leftover processes owned by the nagios user
pkill -9 -u nagios
# Remove stale System V message queues owned by nagios
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q "$i"; done
# Remove stale lock, pid, and socket files
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
rm -rf /usr/local/nagios/var/ndo2db.lock
rm -rf /usr/local/nagios/var/ndo2db.pid
rm -rf /usr/local/nagios/var/ndo2db.sock
rm -rf /usr/local/nagios/var/ndo.sock
rm -rf /usr/local/nagiosxi/var/subsys/ndo2db
rm -rf /var/run/nagios.lock
rm -rf /usr/local/nagios/var/nagios.lock
# Start the services again (ndo2db before nagios) and restart the web/SNMP trap services
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
systemctl restart httpd
systemctl restart snmptt
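The sketch below offers one way to measure whether the number of timeouts drops after the restart. It counts `Socket timeout` alerts in nagios.log and can optionally ignore entries older than a given Unix epoch time. It assumes two things: the default log location /usr/local/nagios/var/nagios.log, and the `[epoch] SERVICE ALERT: ...` line format shown later in this thread. The function name `count_timeouts` is hypothetical.

```shell
#!/usr/bin/env bash
# Count "Socket timeout" alerts in a Nagios log, optionally only those
# logged at or after a given Unix epoch time.
# Usage: count_timeouts LOGFILE [SINCE_EPOCH]
count_timeouts() {
  local log="$1" since="${2:-0}"
  awk -v since="$since" '
    /Socket timeout/ {
      # Each line starts with "[epoch]"; strip the brackets to get the number.
      ts = substr($1, 2, length($1) - 2)
      if (ts + 0 >= since + 0) n++
    }
    END { print n + 0 }
  ' "$log"
}
```

For example, `count_timeouts /usr/local/nagios/var/nagios.log "$(date -d '1 hour ago' +%s)"` counts the last hour's timeouts. You can then compare that figure with the same window before the restart.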
If the timeouts persist, then I will need to get the following information from you.

Your /etc/hosts file
The version of MySQL or MariaDB running on the database server
The output of this command

Code:

rpm -qa | grep "nagios\|httpd\|php"
I was also wondering why the database was offloaded. Was it in response to these timeouts?

Re: A lot of Service check timed out.

Posted: Fri Aug 30, 2019 5:22 pm
by paulol
The problem continues...

/etc/hosts file

Code:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
MySQL Version

Code:

mysql-community-common-5.7.26-1.el7.x86_64
mysql-community-libs-compat-5.7.26-1.el7.x86_64
mysql-community-libs-5.7.26-1.el7.x86_64
mysql-community-server-5.7.26-1.el7.x86_64
mysql-community-devel-5.7.26-1.el7.x86_64
mysql-community-client-5.7.26-1.el7.x86_64
rpm -qa | grep "nagios\|httpd\|php"

Code:

php-devel-5.4.16-46.el7.x86_64
php-process-5.4.16-46.el7.x86_64
php-imap-5.4.16-9.el7.x86_64
php-mssql-5.4.16-9.el7.x86_64
httpd-2.4.6-89.el7.centos.x86_64
php-cli-5.4.16-46.el7.x86_64
php-mbstring-5.4.16-46.el7.x86_64
php-pdo-5.4.16-46.el7.x86_64
php-mysql-5.4.16-46.el7.x86_64
php-snmp-5.4.16-46.el7.x86_64
php-pear-1.9.4-21.el7.noarch
php-5.4.16-46.el7.x86_64
nagios-repo-7-3.el7.noarch
php-common-5.4.16-46.el7.x86_64
php-ldap-5.4.16-46.el7.x86_64
php-xml-5.4.16-46.el7.x86_64
php-gd-5.4.16-46.el7.x86_64
php-pgsql-5.4.16-46.el7.x86_64
php-pecl-ssh2-0.12-1.el7.x86_64
httpd-tools-2.4.6-89.el7.centos.x86_64
I was also wondering why the database was offloaded. Was it in response to these timeouts?
Yes, it was. The same error occurred when the database was on the same server, so I offloaded the database to try to improve performance.

Re: A lot of Service check timed out.

Posted: Tue Sep 03, 2019 3:10 pm
by mbellerue
Alright, so the software versions look good. Let's try running the repair script on the databases.

On the Nagios XI server, run:

Code:

/usr/local/nagiosxi/scripts/repair_databases.sh
That may take a moment to run. So let's gather a couple of other things as well.

If you could PM me a copy of the nsclient.ini file from one of your Windows machines, that would be great.
Once the database repair script has finished, run the following query to get the table sizes:

Code:

SELECT TABLE_NAME AS `Table`,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS `Size (MB)`
FROM information_schema.TABLES
WHERE TABLE_SCHEMA IN ('nagiosql', 'nagios', 'nagiosxi')
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;
Also, once the repair script is finished, let the system just run for about an hour, and then let's get a fresh profile from Nagios XI, and a fresh copy of the database logs.

Re: A lot of Service check timed out.

Posted: Thu Sep 05, 2019 9:18 am
by paulol
Here are the NSClient.ini file, the table sizes, and the output of the database repair script.

Support Edit: Files downloaded and shared with team

Re: A lot of Service check timed out.

Posted: Thu Sep 05, 2019 4:44 pm
by mbellerue
Thank you for those files! Could we also get another copy of your mysqld.log file?

We noticed that the message queues were a little high. You could try following this KB article to increase the message queue limits.

https://support.nagios.com/kb/article/n ... d-139.html
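Comparing the queue's usage against the current kernel limit before and after applying the KB changes can be useful. The rough sketch below parses `ipcs -q` output for queues owned by nagios. It reports each queue's used bytes as a percentage of `kernel.msgmnb`, the maximum number of bytes a single System V queue may hold. The function name `nagios_queue_usage` is hypothetical. The limit values to set come from the KB article and are not assumed here. Running `ipcs -q -l` or `sysctl kernel.msgmnb` shows the limits directly.

```shell
#!/usr/bin/env bash
# Report how full each nagios-owned System V message queue is, relative to
# kernel.msgmnb (the maximum number of bytes a single queue may hold).
# Reads `ipcs -q` output on stdin; an optional argument overrides msgmnb.
# Usage: ipcs -q | nagios_queue_usage [MSGMNB]
nagios_queue_usage() {
  local msgmnb="${1:-$(cat /proc/sys/kernel/msgmnb)}"
  awk -v max="$msgmnb" '
    # Columns: key msqid owner perms used-bytes messages
    $3 == "nagios" {
      pct = (max > 0) ? 100 * $5 / max : 0
      printf "msqid=%s used-bytes=%s messages=%s used=%.0f%% of msgmnb\n", $2, $5, $6, pct
    }'
}
```

A queue that sits close to 100% of msgmnb is a sign that messages are arriving faster than they are consumed.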

Re: A lot of Service check timed out.

Posted: Fri Sep 13, 2019 7:57 am
by paulol
I've made the change, but the problem continues...

[root@TREVOUX var]# ipcs -q

Code:

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x40000002 2392064    nagios     600        25600        25
[root@TREVOUX var]# tail -f nagios.log | grep timeout

Code:

[1568379101] SERVICE ALERT: SVCENSURA_ULA;Counter PhysicalDisk_Total - Disk Writes/sec;UNKNOWN;SOFT;1;CHECK_NRPE STATE UNKNOWN: Socket timeout after 50 seconds.
[1568379189] SERVICE ALERT: SAEBY;Counter PhysicalDisk_Total - Disk Reads/sec;UNKNOWN;SOFT;1;CHECK_NRPE STATE UNKNOWN: Socket timeout after 50 seconds.
[1568379199] SERVICE ALERT: ARRAS;Qtd. de canais ativos locais;UNKNOWN;HARD;1;CHECK_NRPE STATE UNKNOWN: Socket timeout after 50 seconds.
[1568379269] SERVICE ALERT: BOLONHA;Counter PhysicalDisk_Total - Idle Time;UNKNOWN;SOFT;1;CHECK_NRPE STATE UNKNOWN: Socket timeout after 50 seconds.
[1568379286] SERVICE ALERT: LILLE;Sincronizacao de Horario;UNKNOWN;SOFT;1;CHECK_NRPE STATE UNKNOWN: Socket timeout after 50 seconds.
[1568379325] SERVICE ALERT: TARANTO;Memoria;UNKNOWN;HARD;1;CHECK_NRPE STATE UNKNOWN: Socket timeout after 50 seconds.
[1568379334] SERVICE ALERT: ANTAKYA;Counter PhysicalDisk_Total - Disk Reads/sec;UNKNOWN;SOFT;1;CHECK_NRPE STATE UNKNOWN: Socket timeout after 50 seconds.
[1568379357] SERVICE ALERT: NANCY;/tmp Disk Usage;UNKNOWN;SOFT;1;CHECK_NRPE STATE UNKNOWN: Socket timeout after 50 seconds.
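In the excerpt above, the timeouts are spread across many unrelated hosts and services. That pattern suggests something they share, such as the XI server or the network path, rather than a single Windows agent. The small sketch below tallies the timeouts per host, busiest first. It assumes the `[epoch] SERVICE ALERT: host;service;...` format shown above and also accepts `HOST ALERT` lines. The name `timeouts_per_host` is hypothetical.

```shell
#!/usr/bin/env bash
# Tally "Socket timeout" alerts per host from a Nagios log, busiest first.
# Usage: timeouts_per_host LOGFILE
timeouts_per_host() {
  awk -F';' '/Socket timeout/ {
      # Field 1 looks like "[1568379101] SERVICE ALERT: SAEBY"; keep only the host.
      host = $1
      sub(/^.*(SERVICE|HOST) ALERT: /, "", host)
      count[host]++
    }
    END { for (h in count) print count[h], h }' "$1" | sort -rn
}
```

Usage: `timeouts_per_host /usr/local/nagios/var/nagios.log | head`.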

Re: A lot of Service check timed out.

Posted: Fri Sep 13, 2019 3:05 pm
by mbellerue
Is that the mysqld.log file from the offloaded server? The entries in there stop at 8/22.

Re: A lot of Service check timed out.

Posted: Mon Oct 28, 2019 8:46 am
by paulol
I found the problem.

It was my network firewall. I don't know why, but sometimes the firewall's IPS starts applying a block action to certain signatures (TCP.Out.Of.Range.Timestamp, SSL.Anonymous.Ciphers.Negotiation, and TCP.Overlapping.Fragments).

Sometimes, when NRPE checks services or hosts, the firewall thinks it is an attack coming from Nagios XI, perhaps because of the number of checks coming from one place.
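One way to correlate these blocks with NRPE failures is to probe each agent directly from the XI server while watching the firewall's IPS log. The sketch below does that. When `check_nrpe` is called with only `-H`, it asks the agent for its version, so no remote command needs to be defined. The `-t 50` value mirrors the 50-second timeout in the log above. Two parts are assumptions: the plugin path (a common XI default, which you can override via `CHECK_NRPE`) and the function name `probe_nrpe`.

```shell
#!/usr/bin/env bash
# Probe NRPE on each host from the Nagios XI server and report the exit code
# and elapsed time. A socket timeout makes check_nrpe exit non-zero (UNKNOWN).
# Assumed plugin path; override by exporting CHECK_NRPE.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

probe_nrpe() {
  local host rc start elapsed
  for host in "$@"; do
    start=$(date +%s)
    "$CHECK_NRPE" -H "$host" -t 50 >/dev/null 2>&1
    rc=$?
    elapsed=$(( $(date +%s) - start ))
    printf '%s rc=%d elapsed=%ss\n' "$host" "$rc" "$elapsed"
  done
}
```

For example, `probe_nrpe SAEBY LILLE NANCY` assumes those host names resolve from the XI server. A non-zero `rc` with an elapsed time near 50 seconds, matched by an IPS block entry at the same moment, points at the firewall rather than at the agent.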

Thanks for your help.

Re: A lot of Service check timed out.

Posted: Mon Oct 28, 2019 9:21 am
by scottwilkerson
Glad to hear you found it!

Locking thread