CHECK_NRPE: Socket timeout after 10 seconds

lcontreras
Posts: 48
Joined: Thu Sep 13, 2012 7:15 pm

CHECK_NRPE: Socket timeout after 10 seconds

Post by lcontreras »

Hi Guys,

I have the following situation.

I've installed and configured the following PostgreSQL plugin, https://exchange.nagios.org/directory/P ... es/details, to monitor PostgreSQL on CentOS 6.8 (64-bit).

On the host to be monitored, I've configured NRPE with the following commands:

command[check_pg_connection]=/usr/local/nagios/libexec/check_postgres.pl -H localhost -db template1 -u postgres --action connection
command[check_pg_dbstats]=/usr/local/nagios/libexec/check_postgres.pl -H localhost -db template1 -u postgres --action dbstats
command[check_pg_bloat]=/usr/local/nagios/libexec/check_postgres.pl -H localhost -db template1 -u postgres --action bloat
command[check_pg_database_size]=/usr/local/nagios/libexec/check_postgres.pl -H localhost -db template1 -u postgres --action database_size --warning='30 GB' --critical='35 GB'

There is no firewall running on the PostgreSQL server:

iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

NRPE is listening:

netstat -at | grep nrpe
tcp 0 0 *:nrpe *:* LISTEN

With nmap I've verified that the port is open:

nmap 192.168.151.205 -p 5666

Starting Nmap 5.51 ( http://nmap.org ) at 2016-06-12 14:54 BOT
Nmap scan report for 192.168.151.205
Host is up (0.000078s latency).
PORT STATE SERVICE
5666/tcp open nrpe

Nmap done: 1 IP address (1 host up) scanned in 0.06 seconds

Testing NRPE from the Nagios XI server, I get this:

./check_nrpe -H 192.168.151.205 -c check_users
USERS OK - 1 users currently logged in |users=1;5;10;0

So NRPE itself is working, but when I try one of the commands I defined on the PostgreSQL server, I get this:

./check_nrpe -H 192.168.151.205 -c check_pg_connection
CHECK_NRPE: Socket timeout after 10 seconds.

To solve the issue, I followed the procedure from https://assets.nagios.com/downloads/nag ... utions.pdf

But in my case it didn't work.

Any idea about how to solve this issue?

regards,
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: CHECK_NRPE: Socket timeout after 10 seconds

Post by rkennedy »

This could be any of several things.

Can you execute the commands on the local client machine without issue? What is the full input / output for the following?

/usr/local/nagios/libexec/check_postgres.pl -H localhost -db template1 -u postgres --action database_size --warning='30 GB' --critical='35 GB'

It could also be taking more than 10 seconds to come back with an answer. Any luck if you run check_nrpe with -t 60?

When you test over the CLI using ./check_nrpe -H 192.168.151.205 -c check_pg_connection, what do you see regarding NRPE in /var/log/messages on the client machine? This should help diagnose the problem.
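
A minimal sketch of the "is it taking more than 10 seconds?" question, assuming GNU coreutils `timeout` is available on the client. `run_with_limit` is a hypothetical helper name, and `true` and `sleep` are stand-ins for the real plugin call:

```shell
# Run a command under a time limit and report whether it finished in time.
# GNU timeout exits with status 124 when the limit is reached.
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s"
    else
        echo "finished with status $rc"
    fi
}

# Stand-in commands; on the client, substitute the check_postgres.pl call
# and a limit of 10 to mirror the default check_nrpe timeout.
fast_result=$(run_with_limit 2 true)
slow_result=$(run_with_limit 1 sleep 3)
printf '%s\n%s\n' "$fast_result" "$slow_result"
```

If the plugin finishes well inside the limit when run by hand but still times out through NRPE, the delay is probably not the query itself.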
Former Nagios Employee
lcontreras
Posts: 48
Joined: Thu Sep 13, 2012 7:15 pm

Re: CHECK_NRPE: Socket timeout after 10 seconds

Post by lcontreras »

Hi rkennedy,

Here are the answers to your questions:

Executing the command directly on the client works:

/usr/local/nagios/libexec/check_postgres.pl -H localhost -db template1 -u postgres --action database_size --warning='30 GB' --critical='35 GB'
POSTGRES_DATABASE_SIZE OK: DB "template1" (host:localhost) postgres: 6727864 (6570 kB) template1: 6727864 (6570 kB) template0: 6611460 (6457 kB) | time=0.01s postgres=6727864;32212254720;37580963840 template1=6727864;32212254720;37580963840 template0=6611460;32212254720;37580963840

In /var/log/messages on the client, I have the following entries for today:

Jun 14 07:23:54 linuxsvr1 xinetd[1224]: START: nrpe pid=1568 from=::ffff:192.168.151.192
Jun 14 07:24:24 linuxsvr1 xinetd[1224]: START: nrpe pid=1590 from=::ffff:192.168.151.192
Jun 14 07:24:34 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1568 duration=40(sec)
Jun 14 07:24:59 linuxsvr1 xinetd[1224]: START: nrpe pid=1612 from=::ffff:192.168.151.192
Jun 14 07:25:04 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1590 duration=40(sec)
Jun 14 07:25:29 linuxsvr1 xinetd[1224]: START: nrpe pid=1621 from=::ffff:192.168.151.192
Jun 14 07:25:39 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1612 duration=40(sec)
Jun 14 07:25:51 linuxsvr1 xinetd[1224]: START: nrpe pid=1634 from=::ffff:192.168.151.192
Jun 14 07:25:51 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1634 duration=0(sec)
Jun 14 07:25:57 linuxsvr1 xinetd[1224]: START: nrpe pid=1638 from=::ffff:192.168.151.192
Jun 14 07:26:07 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1638 duration=10(sec)
Jun 14 07:26:09 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1621 duration=40(sec)
Jun 14 07:28:54 linuxsvr1 xinetd[1224]: START: nrpe pid=1675 from=::ffff:192.168.151.192
Jun 14 07:29:10 linuxsvr1 xinetd[1224]: START: nrpe pid=1683 from=::1
Jun 14 07:29:10 linuxsvr1 xinetd[1683]: FAIL: nrpe address from=::1
Jun 14 07:29:10 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1683 duration=0(sec)
Jun 14 07:29:24 linuxsvr1 xinetd[1224]: START: nrpe pid=1685 from=::ffff:192.168.151.192
Jun 14 07:29:34 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1675 duration=40(sec)
Jun 14 07:29:59 linuxsvr1 xinetd[1716]: START: nrpe pid=1719 from=::ffff:192.168.151.192
Jun 14 07:30:29 linuxsvr1 xinetd[1716]: START: nrpe pid=1729 from=::ffff:192.168.151.192
Jun 14 07:30:37 linuxsvr1 xinetd[1716]: START: nrpe pid=1737 from=::ffff:127.0.0.1
Jun 14 07:30:39 linuxsvr1 xinetd[1716]: EXIT: nrpe status=0 pid=1719 duration=40(sec)
Jun 14 07:30:47 linuxsvr1 xinetd[1716]: EXIT: nrpe signal=13 pid=1737 duration=10(sec)
Jun 14 07:30:51 linuxsvr1 xinetd[1716]: START: nrpe pid=1746 from=::ffff:192.168.151.192
Jun 14 07:30:51 linuxsvr1 xinetd[1716]: EXIT: nrpe status=0 pid=1746 duration=0(sec)
Jun 14 07:30:57 linuxsvr1 xinetd[1716]: START: nrpe pid=1751 from=::1
Jun 14 07:30:57 linuxsvr1 xinetd[1751]: FAIL: nrpe address from=::1
Jun 14 07:30:57 linuxsvr1 xinetd[1716]: EXIT: nrpe status=0 pid=1751 duration=0(sec)
Jun 14 07:31:05 linuxsvr1 xinetd[1716]: START: nrpe pid=1753 from=::ffff:127.0.0.1
Jun 14 07:31:05 linuxsvr1 xinetd[1716]: EXIT: nrpe status=0 pid=1753 duration=0(sec)
Jun 14 07:31:09 linuxsvr1 xinetd[1716]: EXIT: nrpe status=0 pid=1729 duration=40(sec)


And in /var/log/messages on the Nagios XI 5 server, I have:

Jun 14 11:24:32 nagiosxi5 nagios: SERVICE NOTIFICATION: nagiosadmin;linuxsvr1;Check PG Bloat;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 40 seconds.
Jun 14 11:25:02 nagiosxi5 nagios: SERVICE NOTIFICATION: nagiosadmin;linuxsvr1;Check PG Connection;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 40 seconds.
Jun 14 11:25:36 nagiosxi5 nagios: SERVICE NOTIFICATION: nagiosadmin;linuxsvr1;Check PG DB Size;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 40 seconds.
Jun 14 11:26:06 nagiosxi5 nagios: SERVICE NOTIFICATION: nagiosadmin;linuxsvr1;Check PG DBStats;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 40 seconds.


And if I execute it from the client this way: ./check_nrpe -H 192.168.151.205 -c check_pg_connection, I get: CHECK_NRPE: Socket timeout after 10 seconds.

regards,
tgriep
Madmin
Posts: 9179
Joined: Thu Oct 30, 2014 9:02 am

Re: CHECK_NRPE: Socket timeout after 10 seconds

Post by tgriep »

When a command is run on a remote server using NRPE, it runs as the nagios user, so you may have a permission problem.
Can you log in to the remote server as root, run this, and post the output so we can see the permissions of the plugins?

ls -l /usr/local/nagios/libexec/

Then change to the nagios user and run your check. Post the output.

su nagios
/usr/local/nagios/libexec/check_postgres.pl -H localhost -db template1 -u postgres --action database_size --warning='30 GB' --critical='35 GB'

Be sure to check out our Knowledgebase for helpful articles and solutions!
lcontreras
Posts: 48
Joined: Thu Sep 13, 2012 7:15 pm

Re: CHECK_NRPE: Socket timeout after 10 seconds

Post by lcontreras »

On the PostgreSQL server, the ls output for the plugin is:

-rwxr-xr-x 1 nagios nagios 390169 Jun 12 13:52 check_postgres.pl

On the Nagios server:

-rwxr-xr-x. 1 nagios nagios 388326 Jun 12 13:15 /usr/local/nagios/libexec/check_postgres.pl

su nagios
/usr/local/nagios/libexec/check_postgres.pl -H localhost -db template1 -u postgres --action database_size --warning='30 GB' --critical='35 GB'
could not change directory to "/root"
Password for user postgres:
POSTGRES_DATABASE_SIZE OK: DB "template1" (host:localhost) postgres: 6727864 (6570 kB) template1: 6727864 (6570 kB) template0: 6611460 (6457 kB) | time=4.31s postgres=6727864;32212254720;37580963840 template1=6727864;32212254720;37580963840 template0=6611460;32212254720;37580963840

After changing to the nagios user, the command asked me for the postgres password. That happened because the hidden file with the postgres password is in root's home directory. These are its permissions:

-rw------- 1 nagios nagios 72 Jun 12 14:04 .pgpass
tgriep
Madmin
Posts: 9179
Joined: Thu Oct 30, 2014 9:02 am

Re: CHECK_NRPE: Socket timeout after 10 seconds

Post by tgriep »

My suggestion is to put the hidden password file in the nagios user's home folder on the remote system.
The user permissions may be correct for the file, but the permissions on the folder it is in may not be, and that is why it is failing.
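
A minimal sketch of that suggestion, under stated assumptions: a temporary directory stands in for the nagios user's home folder, port 5432 is assumed to be the PostgreSQL port, and `CHANGE_ME` is a placeholder rather than a real password:

```shell
# Stand-in for the nagios user's home directory (assumption: on the real
# host this is the home listed for nagios in /etc/passwd).
nagios_home=$(mktemp -d)

# .pgpass format: hostname:port:database:username:password
# Port 5432 and the CHANGE_ME password are placeholders for illustration.
printf '%s\n' 'localhost:5432:template1:postgres:CHANGE_ME' > "$nagios_home/.pgpass"

# libpq ignores a .pgpass file that is readable by group or others.
chmod 600 "$nagios_home/.pgpass"

# Show the resulting permission bits.
pgpass_mode=$(ls -l "$nagios_home/.pgpass" | cut -c1-10)
printf '%s\n' "$pgpass_mode"
```

On the real host the file would also need to be owned by the nagios user, since NRPE runs the plugin under that account.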
lcontreras
Posts: 48
Joined: Thu Sep 13, 2012 7:15 pm

Re: CHECK_NRPE: Socket timeout after 10 seconds

Post by lcontreras »

Hi,

I was able to solve the issue. I had to modify a couple of files on the PostgreSQL side to allow connections from the specific user to a specific database. After that, I could execute the command, and it's now working in Nagios XI.

Thanks so much for the support.
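
The post does not name the files that were changed. As an illustration only, and assuming one of them was PostgreSQL's client authentication file (pg_hba.conf), a rule for this check's user and database might look like the line written below. The temporary file is a stand-in for the real pg_hba.conf, and the `md5` method is an assumption:

```shell
# Hypothetical stand-in for pg_hba.conf; the real path depends on the install.
hba_file=$(mktemp)

# Fields: TYPE  DATABASE  USER  ADDRESS  METHOD
# Assumed rule: let user postgres reach template1 over TCP from localhost
# using password (md5) authentication.
printf '%s\n' 'host    template1    postgres    127.0.0.1/32    md5' >> "$hba_file"

# Show the rule that would match -H localhost -db template1 -u postgres.
hba_rule=$(awk '$1 == "host" && $2 == "template1" && $3 == "postgres"' "$hba_file")
printf '%s\n' "$hba_rule"
```

PostgreSQL has to reload its configuration before a change to the real file takes effect.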
tgriep
Madmin
Posts: 9179
Joined: Thu Oct 30, 2014 9:02 am

Re: CHECK_NRPE: Socket timeout after 10 seconds

Post by tgriep »

That is good to hear. Shall we mark this post as solved and lock it up then?