A lot of Service check timed out.

This board serves as an open discussion and support collaboration point for Nagios XI. NOTE: Nagios XI customers should use the Customer Support forum to obtain expedited support.

Re: A lot of Service check timed out.

Postby mbellerue » Fri Aug 30, 2019 3:49 pm

Alright, first and foremost, can you run these commands and then see if the number of timeouts decreases?

Code: Select all
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q "$i"; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
rm -rf /usr/local/nagios/var/ndo2db.lock
rm -rf /usr/local/nagios/var/ndo2db.pid
rm -rf /usr/local/nagios/var/ndo2db.sock
rm -rf /usr/local/nagios/var/ndo.sock
rm -rf /usr/local/nagiosxi/var/subsys/ndo2db
rm -rf /var/run/nagios.lock
rm -rf /usr/local/nagios/var/nagios.lock
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
systemctl restart httpd
systemctl restart snmptt
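
To compare the number of timeouts before and after the restart, one option is to count the "Socket timeout" entries logged since a given time. This is only a sketch: the function name count_timeouts_since is made up for illustration, and it assumes every log line starts with an epoch timestamp in square brackets.

```shell
#!/bin/sh
# Hypothetical helper: count "Socket timeout" entries logged at or after
# a given epoch time. Reads log lines on stdin; each line is assumed to
# start with "[<epoch>]".
count_timeouts_since() {
    since="$1"
    awk -v since="$since" '
        /Socket timeout/ {
            # First field looks like "[1568379101]"; strip the brackets.
            ts = substr($1, 2, length($1) - 2)
            if (ts + 0 >= since + 0) n++
        }
        END { print n + 0 }
    '
}
```

For example, assuming the log is at /usr/local/nagios/var/nagios.log (adjust if yours differs) and GNU date is available, `count_timeouts_since "$(date -d '1 hour ago' +%s)" < /usr/local/nagios/var/nagios.log` prints the number of timeouts from the last hour.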


If the timeouts persist, then I will need to get the following information from you.

Your /etc/hosts file
The version of MySQL or MariaDB running on the database server
The output of this command
Code: Select all
rpm -qa | grep "nagios\|httpd\|php"


I was also wondering why the database was offloaded. Was it in response to these timeouts?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
mbellerue
 
Posts: 882
Joined: Fri Jul 12, 2019 11:10 am

Re: A lot of Service check timed out.

Postby paulol » Fri Aug 30, 2019 5:22 pm

The problem continues...

/etc/hosts file

Code: Select all
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6


MySQL Version

Code: Select all
mysql-community-common-5.7.26-1.el7.x86_64
mysql-community-libs-compat-5.7.26-1.el7.x86_64
mysql-community-libs-5.7.26-1.el7.x86_64
mysql-community-server-5.7.26-1.el7.x86_64
mysql-community-devel-5.7.26-1.el7.x86_64
mysql-community-client-5.7.26-1.el7.x86_64


rpm -qa | grep "nagios\|httpd\|php"

Code: Select all
php-devel-5.4.16-46.el7.x86_64
php-process-5.4.16-46.el7.x86_64
php-imap-5.4.16-9.el7.x86_64
php-mssql-5.4.16-9.el7.x86_64
httpd-2.4.6-89.el7.centos.x86_64
php-cli-5.4.16-46.el7.x86_64
php-mbstring-5.4.16-46.el7.x86_64
php-pdo-5.4.16-46.el7.x86_64
php-mysql-5.4.16-46.el7.x86_64
php-snmp-5.4.16-46.el7.x86_64
php-pear-1.9.4-21.el7.noarch
php-5.4.16-46.el7.x86_64
nagios-repo-7-3.el7.noarch
php-common-5.4.16-46.el7.x86_64
php-ldap-5.4.16-46.el7.x86_64
php-xml-5.4.16-46.el7.x86_64
php-gd-5.4.16-46.el7.x86_64
php-pgsql-5.4.16-46.el7.x86_64
php-pecl-ssh2-0.12-1.el7.x86_64
httpd-tools-2.4.6-89.el7.centos.x86_64


mbellerue wrote:I was also wondering why the database was offloaded. Was it in response to these timeouts?

Yes, it was. The same error occurred when the database was on the same server, so I offloaded the database to try to improve performance.
paulol
 
Posts: 153
Joined: Wed Jul 02, 2014 11:39 am

Re: A lot of Service check timed out.

Postby mbellerue » Tue Sep 03, 2019 3:10 pm

Alright, so the software versions look good. Let's try running the repair script on the databases.

On the Nagios XI server, run
Code: Select all
/usr/local/nagiosxi/scripts/repair_databases.sh


That may take a moment to run. So let's gather a couple of other things as well.

If you could PM me a copy of the nsclient.ini file from one of your Windows machines, that would be great.
Once the database repair script has finished, run the following query to get the sizes of the tables:
Code: Select all
SELECT TABLE_NAME AS `Table`,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS `Size (MB)`
FROM information_schema.TABLES
WHERE TABLE_SCHEMA IN ('nagiosql', 'nagios', 'nagiosxi')
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;


Also, once the repair script is finished, let the system just run for about an hour, and then let's get a fresh profile from Nagios XI, and a fresh copy of the database logs.
mbellerue
 
Posts: 882
Joined: Fri Jul 12, 2019 11:10 am

Re: A lot of Service check timed out.

Postby paulol » Thu Sep 05, 2019 9:18 am

Attached are the nsclient.ini file, the table sizes, and the output of the database repair script.

Support Edit: Files downloaded and shared with team
paulol
 
Posts: 153
Joined: Wed Jul 02, 2014 11:39 am

Re: A lot of Service check timed out.

Postby mbellerue » Thu Sep 05, 2019 4:44 pm

Thank you for those files! Could we also get another copy of your mysqld.log file?

One thing we noticed was that the message queues were a little high. One thing you could try is running through this KB article to increase the queues.

https://support.nagios.com/kb/article/n ... d-139.html
mbellerue
 
Posts: 882
Joined: Fri Jul 12, 2019 11:10 am

Re: A lot of Service check timed out.

Postby paulol » Fri Sep 13, 2019 7:57 am

I've made the change, but the problem continues...

[root@TREVOUX var]# ipcs -q

Code: Select all
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x40000002 2392064    nagios     600        25600        25


[root@TREVOUX var]# tail -f nagios.log | grep timeout
Code: Select all
[1568379101] SERVICE ALERT: SVCENSURA_ULA;Counter PhysicalDisk_Total - Disk Writes/sec;UNKNOWN;SOFT;1;CHECK_NRPE STATE UNKNOWN: Socket timeout after 50 seconds.
[1568379189] SERVICE ALERT: SAEBY;Counter PhysicalDisk_Total - Disk Reads/sec;UNKNOWN;SOFT;1;CHECK_NRPE STATE UNKNOWN: Socket timeout after 50 seconds.
[1568379199] SERVICE ALERT: ARRAS;Qtd. de canais ativos locais;UNKNOWN;HARD;1;CHECK_NRPE STATE UNKNOWN: Socket timeout after 50 seconds.
[1568379269] SERVICE ALERT: BOLONHA;Counter PhysicalDisk_Total - Idle Time;UNKNOWN;SOFT;1;CHECK_NRPE STATE UNKNOWN: Socket timeout after 50 seconds.
[1568379286] SERVICE ALERT: LILLE;Sincronizacao de Horario;UNKNOWN;SOFT;1;CHECK_NRPE STATE UNKNOWN: Socket timeout after 50 seconds.
[1568379325] SERVICE ALERT: TARANTO;Memoria;UNKNOWN;HARD;1;CHECK_NRPE STATE UNKNOWN: Socket timeout after 50 seconds.
[1568379334] SERVICE ALERT: ANTAKYA;Counter PhysicalDisk_Total - Disk Reads/sec;UNKNOWN;SOFT;1;CHECK_NRPE STATE UNKNOWN: Socket timeout after 50 seconds.
[1568379357] SERVICE ALERT: NANCY;/tmp Disk Usage;UNKNOWN;SOFT;1;CHECK_NRPE STATE UNKNOWN: Socket timeout after 50 seconds.
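
To see whether the timeouts cluster on particular hosts, log lines like the ones above can be tallied per host. This is a rough sketch that assumes the `[epoch] SERVICE ALERT: host;service;state;type;attempt;output` layout shown in the excerpt; timeouts_by_host is a hypothetical helper name, not something that ships with Nagios XI.

```shell
#!/bin/sh
# Hypothetical helper: tally "Socket timeout" service alerts per host.
# Reads Nagios log lines on stdin and prints "<count> <host>",
# highest count first.
timeouts_by_host() {
    awk -F';' '
        /SERVICE ALERT:/ && /Socket timeout/ {
            host = $1
            # Drop everything up to and including "SERVICE ALERT: ".
            sub(/^.*SERVICE ALERT: /, "", host)
            count[host]++
        }
        END { for (h in count) print count[h], h }
    ' | sort -k1,1nr -k2,2
}
```

Run it as `timeouts_by_host < nagios.log` from the directory that holds the log. If a handful of hosts, or hosts behind the same network segment, dominate the list, that points toward something on the path to those hosts rather than the Nagios XI server itself.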
Attachments
mysqld.log
Mysql Log
paulol
 
Posts: 153
Joined: Wed Jul 02, 2014 11:39 am

Re: A lot of Service check timed out.

Postby mbellerue » Fri Sep 13, 2019 3:05 pm

Is that the mysqld.log file from the offloaded server? The entries in there stop at 8/22.
mbellerue
 
Posts: 882
Joined: Fri Jul 12, 2019 11:10 am

Re: A lot of Service check timed out.

Postby paulol » Mon Oct 28, 2019 8:46 am

I found the problem.

It was my network firewall. I don't know why, but sometimes the firewall's IPS starts blocking traffic that matches certain signatures (TCP.Out.Of.Range.Timestamp, SSL.Anonymous.Ciphers.Negotiation and TCP.Overlapping.Fragments).

Sometimes when NRPE checks services or hosts, the network firewall thinks it's an attack coming from Nagios XI, maybe because of the number of checks coming from one place.
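
For anyone who wants to reproduce this kind of intermittent blocking, a small loop that repeats the same check and counts failures can help line the failures up with the firewall log. This is only a sketch: count_failures is a made-up helper, and the check_nrpe path in the usage note is an assumption about a typical install.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and report how many runs
# exited non-zero. Output of the command itself is discarded.
count_failures() {
    runs="$1"
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed/$runs failed"
}
```

For example, `count_failures 20 /usr/local/nagios/libexec/check_nrpe -H TARANTO -t 50` runs twenty NRPE checks against one host in a row. A non-zero exit covers timeouts as well as any other check failure, so compare the result with the firewall log before drawing conclusions.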

Thanks for the help.
Attachments
Capture.JPG
Network firewall log
paulol
 
Posts: 153
Joined: Wed Jul 02, 2014 11:39 am

Re: A lot of Service check timed out.

Postby scottwilkerson » Mon Oct 28, 2019 9:21 am

Glad to hear you found it!

Locking thread
scottwilkerson
DevOps Engineer
 
Posts: 17032
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
