Monitoring nagios with another nagios.
Hi
I finally managed to get both Nagios monitoring servers fully working, one in LAN A and one in LAN B.
Now I want them to monitor each other, so that if one goes down, the other notifies me.
I treated one as a normal Ubuntu client and the other as the monitoring server, and added it to the other Nagios server the same way as the other clients. On check, the server can ping and SSH to it, but every other service I have defined returns the error "CHECK_NRPE: Socket timeout after 10 seconds".
We recently had this error with some clients because someone had closed port 5666. We opened it again, so why is this error coming back?
For clarification: I added this other monitoring server the same way as any other Ubuntu client I have added to either Nagios server.
All the files are right; the configs are exactly the same as on the confirmed working Ubuntu clients.
I hope this is enough information.
Re: Monitoring nagios with another nagios.
Please post the service definitions for the services that are timing out.
Also, from the Nagios machine, run nmap <ipofserverthatistimingout> and post the full output. (replace <ipofserverthatistimingout> with the actual IP for the host that is timing out.)
Former Nagios Employee
Re: Monitoring nagios with another nagios.
rkennedy wrote:
Please post the service definitions for the services that are timing out.
Also, from the Nagios machine, run nmap <ipofserverthatistimingout> and post the full output. (replace <ipofserverthatistimingout> with the actual IP for the host that is timing out.)
The service definitions are fine; they are copy-pasted from the other files. And yes, I edited the hostname and IP in the host definition.
Code: Select all
root@lannister:/home/administrator# nmap 10.0.60.10
Starting Nmap 6.40 ( http://nmap.org ) at 2016-05-18 00:15 CEST
Nmap scan report for 10.0.60.10
Host is up (0.0010s latency).
Not shown: 994 filtered ports
PORT STATE SERVICE
22/tcp open ssh
53/tcp closed domain
80/tcp open http
443/tcp closed https
587/tcp closed submission
3389/tcp closed ms-wbt-server
Nmap done: 1 IP address (1 host up) scanned in 32.67 seconds
root@lannister:/home/administrator#
Last edited by SaltyBear on Tue May 17, 2016 5:16 pm, edited 1 time in total.
Re: Monitoring nagios with another nagios.
Thanks, let us know what you come up with!
Re: Monitoring nagios with another nagios.
Are you running it with xinetd?
What do you see in /var/log/messages or /var/log/syslog, anything related?
Re: Monitoring nagios with another nagios.
ssax wrote:
Are you running it with xinetd?
What do you see in /var/log/messages or /var/log/syslog, anything related?
Lannister is the one monitoring nightswatch (lannister = LAN A, nightswatch = LAN B). The only thing I can find about the monitored Nagios is this:
Code: Select all
May 18 00:00:00 lannister nagios: CURRENT SERVICE STATE: nightswatch;CPU Load;CRITICAL;HARD;3;CHECK_NRPE: Socket timeout after 10 seconds.
May 18 00:00:00 lannister nagios: CURRENT SERVICE STATE: nightswatch;Check Zombies;CRITICAL;HARD;3;CHECK_NRPE: Socket timeout after 10 seconds.
May 18 00:00:00 lannister nagios: CURRENT SERVICE STATE: nightswatch;Check hda1;CRITICAL;HARD;3;CHECK_NRPE: Socket timeout after 10 seconds.
May 18 00:00:00 lannister nagios: CURRENT SERVICE STATE: nightswatch;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.90 ms
May 18 00:00:00 lannister nagios: CURRENT SERVICE STATE: nightswatch;SSH;OK;HARD;1;SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6 (protocol 2.0)
May 18 00:00:00 lannister nagios: CURRENT SERVICE STATE: nightswatch;Total Processes;CRITICAL;HARD;3;CHECK_NRPE: Socket timeout after 10 seconds.
May 18 00:00:00 lannister nagios: CURRENT SERVICE STATE: nightswatch;Users Load;CRITICAL;HARD;3;CHECK_NRPE: Socket timeout after 10 seconds.
Code: Select all
May 17 21:29:06 lannister nagios: Successfully launched command file worker with pid 6993
May 17 21:30:52 lannister nagios: SERVICE ALERT: nightswatch;CPU Load;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:31:06 lannister nagios: SERVICE ALERT: nightswatch;Users Load;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:32:29 lannister nagios: SERVICE ALERT: nightswatch;Check Zombies;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:32:52 lannister nagios: SERVICE ALERT: nightswatch;CPU Load;CRITICAL;SOFT;2;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:33:06 lannister nagios: SERVICE ALERT: nightswatch;Users Load;CRITICAL;SOFT;2;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:34:05 lannister nagios: SERVICE ALERT: nightswatch;Check hda1;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:34:29 lannister nagios: SERVICE ALERT: nightswatch;Check Zombies;CRITICAL;SOFT;2;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:34:52 lannister nagios: SERVICE ALERT: nightswatch;CPU Load;CRITICAL;HARD;3;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:34:52 lannister nagios: SERVICE NOTIFICATION: nagiosadmin;nightswatch;CPU Load;CRITICAL;notify-service-by-email;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:34:56 lannister sSMTP[7104]: Creating SSL connection to host
May 17 21:34:56 lannister sSMTP[7104]: SSL connection using RSA_AES_128_CBC_SHA1
May 17 21:34:58 lannister sSMTP[7104]: Sent mail for nagios@lannister (221 2.0.0 closing connection c85sm24765143wmd.0 - gsmtp) uid=1001 username=nagios outbytes=636
May 17 21:35:03 lannister nagios: EXTERNAL COMMAND: SCHEDULE_HOST_SVC_CHECKS;nightswatch;1463513703
May 17 21:35:06 lannister nagios: SERVICE ALERT: nightswatch;Users Load;CRITICAL;HARD;3;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:35:06 lannister nagios: SERVICE NOTIFICATION: nagiosadmin;nightswatch;Users Load;CRITICAL;notify-service-by-email;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:35:06 lannister sSMTP[7122]: Creating SSL connection to host
May 17 21:35:07 lannister sSMTP[7122]: SSL connection using RSA_AES_128_CBC_SHA1
May 17 21:35:08 lannister sSMTP[7122]: Sent mail for nagios@lannister (221 2.0.0 closing connection lr9sm4695078wjb.39 - gsmtp) uid=1001 username=nagios outbytes=640
May 17 21:35:13 lannister nagios: SERVICE ALERT: nightswatch;Total Processes;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:35:13 lannister nagios: SERVICE ALERT: nightswatch;Check hda1;CRITICAL;SOFT;2;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:35:13 lannister nagios: SERVICE ALERT: nightswatch;Check Zombies;CRITICAL;HARD;3;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:35:13 lannister nagios: SERVICE NOTIFICATION: nagiosadmin;nightswatch;Check Zombies;CRITICAL;notify-service-by-email;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:35:13 lannister sSMTP[7131]: Creating SSL connection to host
May 17 21:35:14 lannister sSMTP[7131]: SSL connection using RSA_AES_128_CBC_SHA1
May 17 21:35:15 lannister sSMTP[7131]: Sent mail for nagios@lannister (221 2.0.0 closing connection y3sm4695566wji.40 - gsmtp) uid=1001 username=nagios outbytes=646
May 17 21:37:13 lannister nagios: SERVICE ALERT: nightswatch;Total Processes;CRITICAL;SOFT;2;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:37:13 lannister nagios: SERVICE ALERT: nightswatch;Check hda1;CRITICAL;HARD;3;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:37:13 lannister nagios: SERVICE NOTIFICATION: nagiosadmin;nightswatch;Check hda1;CRITICAL;notify-service-by-email;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:37:13 lannister sSMTP[7157]: Creating SSL connection to host
May 17 21:37:14 lannister sSMTP[7157]: SSL connection using RSA_AES_128_CBC_SHA1
May 17 21:37:16 lannister sSMTP[7157]: Sent mail for nagios@lannister (221 2.0.0 closing connection y70sm5191963wmd.3 - gsmtp) uid=1001 username=nagios outbytes=640
May 17 21:39:01 lannister CRON[7173]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
May 17 21:39:13 lannister nagios: SERVICE ALERT: nightswatch;Total Processes;CRITICAL;HARD;3;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:39:13 lannister nagios: SERVICE NOTIFICATION: nagiosadmin;nightswatch;Total Processes;CRITICAL;notify-service-by-email;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 21:39:13 lannister sSMTP[7190]: Creating SSL connection to host
May 17 21:39:14 lannister sSMTP[7190]: SSL connection using RSA_AES_128_CBC_SHA1
May 17 21:39:15 lannister sSMTP[7190]: Sent mail for nagios@lannister (221 2.0.0 closing connection a63sm5181282wmh.11 - gsmtp) uid=1001 username=nagios outbytes=650
May 17 22:09:01 lannister CRON[7566]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
May 17 22:17:01 lannister CRON[7696]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 17 22:29:05 lannister nagios: Auto-save of retention data completed successfully.
May 17 22:35:06 lannister nagios: SERVICE NOTIFICATION: nagiosadmin;nightswatch;Users Load;CRITICAL;notify-service-by-email;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 22:35:09 lannister sSMTP[7926]: Creating SSL connection to host
May 17 22:35:09 lannister sSMTP[7926]: SSL connection using RSA_AES_128_CBC_SHA1
May 17 22:35:12 lannister sSMTP[7926]: Sent mail for nagios@lannister (221 2.0.0 closing connection d23sm5437012wmd.1 - gsmtp) uid=1001 username=nagios outbytes=640
May 17 22:35:13 lannister nagios: SERVICE NOTIFICATION: nagiosadmin;nightswatch;Check Zombies;CRITICAL;notify-service-by-email;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 22:35:13 lannister nagios: SERVICE NOTIFICATION: nagiosadmin;nightswatch;CPU Load;CRITICAL;notify-service-by-email;CHECK_NRPE: Socket timeout after 10 seconds.
May 17 22:35:13 lannister sSMTP[7936]: Creating SSL connection to host
May 17 22:35:13 lannister sSMTP[7937]: Creating SSL connection to host
May 17 22:35:13 lannister sSMTP[7936]: SSL connection using RSA_AES_128_CBC_SHA1
May 17 22:35:13 lannister sSMTP[7937]: SSL connection using RSA_AES_128_CBC_SHA1
May 17 22:35:15 lannister sSMTP[7936]: Sent mail for nagios@lannister (221 2.0.0 closing connection b15sm25782557wmd.1 - gsmtp) uid=1001 username=nagios outbytes=636
May 17 22:35:16 lannister sSMTP[7937]: Sent mail for nagios@lannister (221 2.0.0 closing connection lf9sm4919254wjc.44 - gsmtp) uid=1001 username=nagios outbytes=646
May 17 22:37:13 lannister nagios: SERVICE NOTIFICATION: nagiosadmin;nightswatch;Check hda1;CRITICAL;notify-service-by-email;CHECK_NRPE: Socket timeout after 10 seconds.
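The excerpts above show a pattern: PING and SSH stay OK while every NRPE-based check times out. Below is a minimal sketch for listing the affected services from such a log. It assumes the `CURRENT SERVICE STATE` line layout shown above, and the function name `nrpe_timeouts` is made up for illustration.

```shell
#!/bin/sh
# List the services whose CURRENT SERVICE STATE line reports an NRPE socket
# timeout. Reads syslog-style Nagios lines on stdin; the second
# semicolon-separated field is the service description.
nrpe_timeouts() {
    awk -F';' '/CURRENT SERVICE STATE:/ && /CHECK_NRPE: Socket timeout/ { print $2 }' | sort -u
}

# Two lines copied from the excerpt above: one NRPE check and the PING check.
sample='May 18 00:00:00 lannister nagios: CURRENT SERVICE STATE: nightswatch;CPU Load;CRITICAL;HARD;3;CHECK_NRPE: Socket timeout after 10 seconds.
May 18 00:00:00 lannister nagios: CURRENT SERVICE STATE: nightswatch;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.90 ms'
printf '%s\n' "$sample" | nrpe_timeouts

# On the monitoring host (log path as mentioned in the thread):
#   nrpe_timeouts < /var/log/syslog
```

If only the checks that go through check_nrpe appear in the list, that is consistent with something blocking the NRPE port rather than the host being unreachable.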
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
Re: Monitoring nagios with another nagios.
SaltyBear wrote:
We just had this error with some clients and this was due to the fact that someone closed the port 5666. We opened it so why is it returning this error again?
SaltyBear wrote:
Code: Select all
root@lannister:/home/administrator# nmap 10.0.60.10
Starting Nmap 6.40 ( http://nmap.org ) at 2016-05-18 00:15 CEST
Nmap scan report for 10.0.60.10
Host is up (0.0010s latency).
Not shown: 994 filtered ports
PORT STATE SERVICE
22/tcp open ssh
53/tcp closed domain
80/tcp open http
443/tcp closed https
587/tcp closed submission
3389/tcp closed ms-wbt-server
Nmap done: 1 IP address (1 host up) scanned in 32.67 seconds
root@lannister:/home/administrator#
Your nmap does not show it responded to port 5666, so I suspect it's old.
What OS is your Nagios Server? What commands did you use to open the ports?
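One way to make the port 5666 question explicit is to scan just that port and read the reported state. The sketch below is only an illustration: the helper name `nmap_port_state` is hypothetical, and it assumes the `PORT STATE SERVICE` row layout seen in the scan quoted above.

```shell
#!/bin/sh
# Print the state nmap reported for a TCP port, reading nmap output on stdin.
# Prints "not-listed" when the port has no row of its own, for example when
# it is folded into the "Not shown: N filtered ports" summary.
nmap_port_state() {
    awk -v want="$1/tcp" '
        $1 == want { print $2; found = 1 }
        END { if (!found) print "not-listed" }
    '
}

# Rows taken from the scan quoted above.
scan='22/tcp open ssh
53/tcp closed domain
80/tcp open http'
printf '%s\n' "$scan" | nmap_port_state 22
printf '%s\n' "$scan" | nmap_port_state 5666

# Against the live host (assumes nmap is installed on the Nagios machine):
#   nmap -p 5666 10.0.60.10 | nmap_port_state 5666
```

A result of "filtered" usually means something between the two hosts is silently dropping the packets, "closed" means the host answered but nothing is listening, and "open" means the NRPE port is reachable.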
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Monitoring nagios with another nagios.
Box293 wrote:
What OS is your Nagios Server? What commands did you use to open the ports?
Ubuntu 14.04, and we opened the ports on the firewall. Not sure what you mean, pfsense/OBSD?
Re: Monitoring nagios with another nagios.
You may need to open the ports on the Nagios server as well:
Code: Select all
sudo iptables -I INPUT -p tcp --destination-port 5666 -j ACCEPT
I think you need to run this as well:
Code: Select all
sudo apt-get install -y iptables-persistent
This command should confirm if your Ubuntu server is listening on port 5666:
Code: Select all
sudo lsof -i :5666
What output does it produce?
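Before inserting the rule, it can help to check whether an ACCEPT rule for port 5666 already exists. The following is a rough sketch, not a definitive procedure: the helper name `has_accept_rule` is hypothetical, and it assumes `iptables -S INPUT` prints rules in the `-A INPUT ... --dport N -j ACCEPT` form. The check is only meaningful on the host whose NRPE daemon is being polled (nightswatch in this thread).

```shell
#!/bin/sh
# Succeed if the iptables rule listing on stdin contains an INPUT rule that
# accepts TCP traffic to the given destination port.
has_accept_rule() {
    grep -Eq -- "^-A INPUT .*-p tcp .*--dport $1 .*-j ACCEPT"
}

# Sample listing with only an SSH rule, to show the negative case.
rules='-P INPUT ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT'
printf '%s\n' "$rules" | has_accept_rule 5666 || echo "no ACCEPT rule for 5666"

# On the monitored host, add the rule from the post above only if missing:
#   sudo iptables -S INPUT | has_accept_rule 5666 ||
#       sudo iptables -I INPUT -p tcp --destination-port 5666 -j ACCEPT
```

Note that a rule added this way does not survive a reboot on its own, which is presumably why iptables-persistent is suggested in the post.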
Re: Monitoring nagios with another nagios.
Box293 wrote:
You may need to open the ports on the Nagios server as well:
Code: Select all
sudo iptables -I INPUT -p tcp --destination-port 5666 -j ACCEPT
I think you need to run this as well:
Code: Select all
sudo apt-get install -y iptables-persistent
This command should confirm if your Ubuntu server is listening on port 5666:
Code: Select all
sudo lsof -i :5666
What output does it produce?
I didn't run the first two commands, but I get this:
Code: Select all
root@lannister:/home/administrator# sudo lsof -i :5666
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
xinetd 879 root 5u IPv4 9373 0t0 TCP *:nrpe (LISTEN)