Hi Guys,
I have the following situation.
I've installed and configured the following PostgreSQL plugin https://exchange.nagios.org/directory/P ... es/details for monitoring PostgreSQL on CentOS 6.8 64-bit.
I've configured NRPE with some commands as follows (on the host to be monitored):
command[check_pg_connection]=/usr/local/nagios/libexec/check_postgres.pl -H localhost -db template1 -u postgres --action connection
command[check_pg_dbstats]=/usr/local/nagios/libexec/check_postgres.pl -H localhost -db template1 -u postgres --action dbstats
command[check_pg_bloat]=/usr/local/nagios/libexec/check_postgres.pl -H localhost -db template1 -u postgres --action bloat
command[check_pg_database_size]=/usr/local/nagios/libexec/check_postgres.pl -H localhost -db template1 -u postgres --action database_size --warning='30 GB' --critical='35 GB'
There is no firewall running on the server with postgresql:
iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
NRPE is listening:
netstat -at | grep nrpe
tcp 0 0 *:nrpe *:* LISTEN
With nmap I've verified that the port is open:
nmap 192.168.151.205 -p 5666
Starting Nmap 5.51 ( http://nmap.org ) at 2016-06-12 14:54 BOT
Nmap scan report for 192.168.151.205
Host is up (0.000078s latency).
PORT STATE SERVICE
5666/tcp open nrpe
Nmap done: 1 IP address (1 host up) scanned in 0.06 seconds
Testing NRPE from Nagios XI server, I get this:
./check_nrpe -H 192.168.151.205 -c check_users
USERS OK - 1 users currently logged in |users=1;5;10;0
So NRPE itself is working, but when I try one of the commands I've defined on the PostgreSQL server, I get this:
./check_nrpe -H 192.168.151.205 -c check_pg_connection
CHECK_NRPE: Socket timeout after 10 seconds.
To solve the issue, I followed the procedure from here https://assets.nagios.com/downloads/nag ... utions.pdf
But in my case it didn't work.
Any ideas on how to solve this issue?
regards,
CHECK_NRPE: Socket timeout after 10 seconds
-
- Posts: 48
- Joined: Thu Sep 13, 2012 7:15 pm
Re: CHECK_NRPE: Socket timeout after 10 seconds
This could be multiple different things.
Can you execute the commands on the local client machine without issue? What is the full input / output for the following?
/usr/local/nagios/libexec/check_postgres.pl -H localhost -db template1 -u postgres --action database_size --warning='30 GB' --critical='35 GB'
It could also be taking more than 10 seconds for the check to come back with an answer. Any luck if you run your check_nrpe with -t 60?
When you test over the CLI using ./check_nrpe -H 192.168.151.205 -c check_pg_connection, what do you see regarding NRPE in /var/log/messages on the client machine? This should help diagnose the issue.
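As a rough illustration of the -t suggestion, the sketch below runs the same NRPE command with a few increasing timeouts so you can see whether the check is merely slow or never returns. The helper name try_timeouts and the 10/30/60 second steps are made up for this example; only the -H, -c and -t options of check_nrpe are taken from the thread.

```shell
# Path to check_nrpe; ./check_nrpe matches the invocation used in this thread.
CHECK_NRPE=${CHECK_NRPE:-./check_nrpe}

# Hypothetical helper: run one NRPE command with increasing -t values
# and print the exit status of each attempt.
try_timeouts() {
  host=$1
  cmd=$2
  for t in 10 30 60; do
    echo "== timeout ${t}s =="
    "$CHECK_NRPE" -H "$host" -c "$cmd" -t "$t"
    echo "exit status: $?"
  done
}

# Example (run from the directory that contains check_nrpe):
# try_timeouts 192.168.151.205 check_pg_connection
```

If the check starts returning a result at a longer timeout, the plugin is probably just slow. If it still times out at 60 seconds, it is more likely hanging on something, for example waiting for input.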
Former Nagios Employee
Re: CHECK_NRPE: Socket timeout after 10 seconds
Hi rkennedy,
Here are the answers to your questions.
Executing the command directly on the client works:
/usr/local/nagios/libexec/check_postgres.pl -H localhost -db template1 -u postgres --action database_size --warning='30 GB' --critical='35 GB'
POSTGRES_DATABASE_SIZE OK: DB "template1" (host:localhost) postgres: 6727864 (6570 kB) template1: 6727864 (6570 kB) template0: 6611460 (6457 kB) | time=0.01s postgres=6727864;32212254720;37580963840 template1=6727864;32212254720;37580963840 template0=6611460;32212254720;37580963840
In /var/log/messages for today I have the following entries:
Jun 14 07:23:54 linuxsvr1 xinetd[1224]: START: nrpe pid=1568 from=::ffff:192.168.151.192
Jun 14 07:24:24 linuxsvr1 xinetd[1224]: START: nrpe pid=1590 from=::ffff:192.168.151.192
Jun 14 07:24:34 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1568 duration=40(sec)
Jun 14 07:24:59 linuxsvr1 xinetd[1224]: START: nrpe pid=1612 from=::ffff:192.168.151.192
Jun 14 07:25:04 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1590 duration=40(sec)
Jun 14 07:25:29 linuxsvr1 xinetd[1224]: START: nrpe pid=1621 from=::ffff:192.168.151.192
Jun 14 07:25:39 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1612 duration=40(sec)
Jun 14 07:25:51 linuxsvr1 xinetd[1224]: START: nrpe pid=1634 from=::ffff:192.168.151.192
Jun 14 07:25:51 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1634 duration=0(sec)
Jun 14 07:25:57 linuxsvr1 xinetd[1224]: START: nrpe pid=1638 from=::ffff:192.168.151.192
Jun 14 07:26:07 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1638 duration=10(sec)
Jun 14 07:26:09 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1621 duration=40(sec)
Jun 14 07:28:54 linuxsvr1 xinetd[1224]: START: nrpe pid=1675 from=::ffff:192.168.151.192
Jun 14 07:29:10 linuxsvr1 xinetd[1224]: START: nrpe pid=1683 from=::1
Jun 14 07:29:10 linuxsvr1 xinetd[1683]: FAIL: nrpe address from=::1
Jun 14 07:29:10 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1683 duration=0(sec)
Jun 14 07:29:24 linuxsvr1 xinetd[1224]: START: nrpe pid=1685 from=::ffff:192.168.151.192
Jun 14 07:29:34 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1675 duration=40(sec)
Jun 14 07:29:59 linuxsvr1 xinetd[1716]: START: nrpe pid=1719 from=::ffff:192.168.151.192
Jun 14 07:30:29 linuxsvr1 xinetd[1716]: START: nrpe pid=1729 from=::ffff:192.168.151.192
Jun 14 07:30:37 linuxsvr1 xinetd[1716]: START: nrpe pid=1737 from=::ffff:127.0.0.1
Jun 14 07:30:39 linuxsvr1 xinetd[1716]: EXIT: nrpe status=0 pid=1719 duration=40(sec)
Jun 14 07:30:47 linuxsvr1 xinetd[1716]: EXIT: nrpe signal=13 pid=1737 duration=10(sec)
Jun 14 07:30:51 linuxsvr1 xinetd[1716]: START: nrpe pid=1746 from=::ffff:192.168.151.192
Jun 14 07:30:51 linuxsvr1 xinetd[1716]: EXIT: nrpe status=0 pid=1746 duration=0(sec)
Jun 14 07:30:57 linuxsvr1 xinetd[1716]: START: nrpe pid=1751 from=::1
Jun 14 07:30:57 linuxsvr1 xinetd[1751]: FAIL: nrpe address from=::1
Jun 14 07:30:57 linuxsvr1 xinetd[1716]: EXIT: nrpe status=0 pid=1751 duration=0(sec)
Jun 14 07:31:05 linuxsvr1 xinetd[1716]: START: nrpe pid=1753 from=::ffff:127.0.0.1
Jun 14 07:31:05 linuxsvr1 xinetd[1716]: EXIT: nrpe status=0 pid=1753 duration=0(sec)
Jun 14 07:31:09 linuxsvr1 xinetd[1716]: EXIT: nrpe status=0 pid=1729 duration=40(sec)
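For anyone reading a log like the one above, a quick way to summarize it is to pull the pid and duration out of each EXIT line. This is only a sketch that assumes the xinetd log format shown here; the function name nrpe_durations is made up, and the sample lines are copied from the log above.

```shell
# Hypothetical helper: print "pid duration" for every nrpe EXIT line
# read from stdin, assuming the xinetd log format shown above.
nrpe_durations() {
  sed -n 's/.*EXIT: nrpe .*pid=\([0-9]*\) duration=\([0-9]*\)(sec).*/\1 \2/p'
}

# Sample lines copied from the log above.
sample='Jun 14 07:24:24 linuxsvr1 xinetd[1224]: START: nrpe pid=1590 from=::ffff:192.168.151.192
Jun 14 07:24:34 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1568 duration=40(sec)
Jun 14 07:25:51 linuxsvr1 xinetd[1224]: EXIT: nrpe status=0 pid=1634 duration=0(sec)
Jun 14 07:30:47 linuxsvr1 xinetd[1716]: EXIT: nrpe signal=13 pid=1737 duration=10(sec)'

printf '%s\n' "$sample" | nrpe_durations
```

On the real host this would be fed from the log, for example grep nrpe /var/log/messages | nrpe_durations. Durations that line up with the check_nrpe timeout (10 or 40 seconds in this thread) suggest the plugin was still running when the caller gave up, rather than failing quickly.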
And from /var/log/messages in the Nagios XI 5 Server, I have:
Jun 14 11:24:32 nagiosxi5 nagios: SERVICE NOTIFICATION: nagiosadmin;linuxsvr1;Check PG Bloat;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 40 seconds.
Jun 14 11:25:02 nagiosxi5 nagios: SERVICE NOTIFICATION: nagiosadmin;linuxsvr1;Check PG Connection;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 40 seconds.
Jun 14 11:25:36 nagiosxi5 nagios: SERVICE NOTIFICATION: nagiosadmin;linuxsvr1;Check PG DB Size;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 40 seconds.
Jun 14 11:26:06 nagiosxi5 nagios: SERVICE NOTIFICATION: nagiosadmin;linuxsvr1;Check PG DBStats;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 40 seconds.
And from the client, if I execute it in this way: ./check_nrpe -H 192.168.151.205 -c check_pg_connection I get this: CHECK_NRPE: Socket timeout after 10 seconds.
regards,
Re: CHECK_NRPE: Socket timeout after 10 seconds
When a command is run on a remote server using NRPE, it is run as the nagios user and you may have a permission problem.
Can you log in to the remote server as root, run this and post it so we can see the permissions of the plugins?
ls -l /usr/local/nagios/libexec/
Then change to the nagios user and run your check. Post the output.
su nagios
/usr/local/nagios/libexec/check_postgres.pl -H localhost -db template1 -u postgres --action database_size --warning='30 GB' --critical='35 GB'
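Besides the plugin listing, it can help to look at every directory on the way to a file the check depends on, since the nagios user needs to be able to enter each of them. A minimal sketch, assuming a POSIX shell; path_perms is a made-up helper name, not part of Nagios or NRPE.

```shell
# Hypothetical helper: list the permissions of a file and of every
# directory above it, so a directory the nagios user cannot enter stands out.
path_perms() {
  p=$1
  while :; do
    ls -ld -- "$p"
    # Stop at the filesystem root, or at "." for a relative path.
    if [ "$p" = "/" ] || [ "$p" = "." ]; then
      break
    fi
    p=$(dirname -- "$p")
  done
}

# Example:
# path_perms /usr/local/nagios/libexec/check_postgres.pl
```

Any directory in the output without the execute bit for the nagios user (or its group, or others) would block access to the file, even if the file itself looks fine.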
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: CHECK_NRPE: Socket timeout after 10 seconds
On the server with PostgreSQL, the output for the plugin is:
-rwxr-xr-x 1 nagios nagios 390169 Jun 12 13:52 check_postgres.pl
On the Nagios server:
-rwxr-xr-x. 1 nagios nagios 388326 Jun 12 13:15 /usr/local/nagios/libexec/check_postgres.pl
su nagios
/usr/local/nagios/libexec/check_postgres.pl -H localhost -db template1 -u postgres --action database_size --warning='30 GB' --critical='35 GB'
could not change directory to "/root"
Password for user postgres:
POSTGRES_DATABASE_SIZE OK: DB "template1" (host:localhost) postgres: 6727864 (6570 kB) template1: 6727864 (6570 kB) template0: 6611460 (6457 kB) | time=4.31s postgres=6727864;32212254720;37580963840 template1=6727864;32212254720;37580963840 template0=6611460;32212254720;37580963840
After changing to the nagios user, the command asked me for the postgres password. That happened because the hidden file with the postgres password is in the root directory. These are its permissions:
-rw------- 1 nagios nagios 72 Jun 12 14:04 .pgpass
Re: CHECK_NRPE: Socket timeout after 10 seconds
My suggestion is to put the hidden password file in the nagios user's home folder on the remote system.
The user permissions may be correct for the file, but the permissions on the folder it is in may not be, and that is why it is failing.
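A minimal sketch of that suggestion, assuming the standard libpq password file format (hostname:port:database:username:password) and mode 0600. The helper name write_pgpass is made up, and the port 5432 and the password in the example are placeholders, not values from this thread.

```shell
# Hypothetical helper: write a one-line .pgpass into the given home directory.
# Note: this overwrites an existing .pgpass in that directory.
write_pgpass() {
  home_dir=$1
  host=$2
  port=$3
  db=$4
  user=$5
  pass=$6
  pgpass="$home_dir/.pgpass"
  # Create the file with a restrictive umask so it is never world-readable.
  ( umask 077
    printf '%s:%s:%s:%s:%s\n' "$host" "$port" "$db" "$user" "$pass" > "$pgpass" )
  # libpq ignores the file if group or others have any access to it.
  chmod 600 "$pgpass"
}

# Example, run as the nagios user (port and password are placeholders):
# write_pgpass "$HOME" localhost 5432 template1 postgres 'changeme'
```

Running it as the nagios user keeps the file owned by the account that NRPE uses to execute the plugin.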
Re: CHECK_NRPE: Socket timeout after 10 seconds
Hi,
I was able to solve the issue. I had to modify a couple of files on the PostgreSQL side to allow connections from the specific user to a specific database. After that, I could execute the command, and now it's working in Nagios XI.
Thanks so much for the support.
Re: CHECK_NRPE: Socket timeout after 10 seconds
That is good to hear. Shall we mark this post as solved and lock it up then?