Issue setting up DNX
Config
Linux 2.6.32-220.4.2.el6.i686 i386
VMware image Nagios XI 2011R2.2
DNX Configuration 1 Master with 3 Slaves
Nagios XI Installation Profile
CentOS release 6.2 (Final)
Gnome is not installed
Apache Information
PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5
First, let me admit that this is my first post, and that I have already resolved several problems using forum comments and posts. I guess it is now my turn to contribute or ask a question.
I have set up DNX as indicated, using the automated script, which ran successfully on the master and all 3 slaves. I have turned on logging, and the following is a sample of the output from dnxplg.debug.log on the master:
[Mon May 21 18:12:12.173 2012] Post failed: Resource was not found. Service check [4356] will execute locally: /usr/local/nagios/libexec/check_load_remote -e ssh -H 148.80.26.32 -w 5 -c 10 -t 30 .
[Mon May 21 18:12:12.183 2012] Post failed: Resource was not found. Service check [4357] will execute locally: /usr/local/nagios/libexec/check_users -w 20 -c 50.
[Mon May 21 18:12:12.221 2012] Post failed: Resource was not found. Service check [4360] will execute locally: /usr/local/nagios/libexec/check_icmp -H 148.80.26.21 -w 3000.0,80% -c 5000.0,100% -p 5.
The following is sample output from the slaves' dnxcld.debug.log:
[Mon May 21 18:18:05.163 2012] dnxSendNodeRequest: XML msg(197 bytes)=<dnxMessage><Request>NodeRequest</Request><XID>2-2920242032-1929160896</XID><GUID>2-2920242032-1929160896</GUID><ReqType>0</ReqType><JobCap>1</JobCap><Capacity>1</Capacity><TTL>4</TTL></dnxMessage>.
[Mon May 21 18:18:05.163 2012] dnxSendNodeRequest: XML msg(197 bytes)=<dnxMessage><Request>NodeRequest</Request><XID>2-3056610160-1929160896</XID><GUID>2-3056610160-1929160896</GUID><ReqType>0</ReqType><JobCap>1</JobCap><Capacity>1</Capacity><TTL>4</TTL></dnxMessage>.
[Mon May 21 18:18:05.163 2012] dnxSendNodeRequest: XML msg(197 bytes)=<dnxMessage><Request>NodeRequest</Request><XID>2-2962201456-1929160896</XID><GUID>2-2962201456-1929160896</GUID><ReqType>0</ReqType><JobCap>1</JobCap><Capacity>1</Capacity><TTL>4</TTL></dnxMessage>.
I don't appear to have any checks executing from the slaves.
Any recommendations or thoughts?
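To see how widespread the local fallback is, the master log can be tallied by failure reason and by plugin. The following is only a minimal sketch: the regular expression is inferred from the three dnxplg.debug.log lines quoted above (not from any DNX documentation), and the names `FALLBACK`, `SAMPLE`, and `summarize_fallbacks` are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the quoted dnxplg.debug.log lines (an assumption,
# not an official DNX log format specification).
FALLBACK = re.compile(
    r"^\[(?P<ts>[^\]]+)\] Post failed: (?P<reason>.+?)\. "
    r"Service check \[(?P<id>\d+)\] will execute locally: (?P<cmd>\S+)"
)

# The three sample lines from the post above.
SAMPLE = """
[Mon May 21 18:12:12.173 2012] Post failed: Resource was not found. Service check [4356] will execute locally: /usr/local/nagios/libexec/check_load_remote -e ssh -H 148.80.26.32 -w 5 -c 10 -t 30 .
[Mon May 21 18:12:12.183 2012] Post failed: Resource was not found. Service check [4357] will execute locally: /usr/local/nagios/libexec/check_users -w 20 -c 50.
[Mon May 21 18:12:12.221 2012] Post failed: Resource was not found. Service check [4360] will execute locally: /usr/local/nagios/libexec/check_icmp -H 148.80.26.21 -w 3000.0,80% -c 5000.0,100% -p 5.
"""


def summarize_fallbacks(lines):
    """Count locally executed checks by failure reason and by plugin name."""
    reasons = Counter()
    plugins = Counter()
    for line in lines:
        match = FALLBACK.match(line)
        if match:
            reasons[match.group("reason")] += 1
            # Keep only the plugin file name, dropping the libexec path.
            plugins[match.group("cmd").rsplit("/", 1)[-1]] += 1
    return reasons, plugins


if __name__ == "__main__":
    reasons, plugins = summarize_fallbacks(SAMPLE.splitlines())
    print(dict(reasons))
    print(dict(plugins))
```

If every entry carries the same reason regardless of plugin, as in the sample, the problem is more likely on the dispatch side (no worker available to take the job) than in any individual plugin.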
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Issue setting up DNX
Did you install all of the plugins on the DNX client in /usr/local/nagios/libexec?
Re: Issue setting up DNX
Also, when I manually execute the plugin from the client, it works successfully.
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Issue setting up DNX
Can you post your DNX server and client configs?
Thanks.
Re: Issue setting up DNX
I've attached the config files requested. We simplified down to just one client until we resolve the problem. Thanks in advance for your assistance. Just to provide the big picture, once we get this working in our development environment our intent is to roll out an architecture that will monitor approx. 70K host devices.
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Issue setting up DNX
To verify: after making changes to the DNX server config, you did restart Nagios, correct?
Also, can we verify that UDP on the required ports isn't being blocked by any firewalls?
Re: Issue setting up DNX
Yes, Nagios was restarted. I have been told by my network team that the required ports are open. However, the following is the output from the netstat -an -UDP command on the master:
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
udp 0 0 0.0.0.0:47960 0.0.0.0:* 1299/avahi-daemon
udp 0 0 0.0.0.0:5353 0.0.0.0:* 1299/avahi-daemon
udp 0 0 192.168.252.113:123 0.0.0.0:* 1393/ntpd
udp 0 0 127.0.0.1:123 0.0.0.0:* 1393/ntpd
udp 0 0 0.0.0.0:123 0.0.0.0:* 1393/ntpd
udp 0 0 0.0.0.0:162 0.0.0.0:* 1356/snmptrapd
udp 0 0 0.0.0.0:12480 0.0.0.0:* 1762/dnxServer
udp 0 0 0.0.0.0:12480 0.0.0.0:* 1766/dnxServer
*
*
*
udp 0 0 0.0.0.0:12482 0.0.0.0:* 1756/dnxServer
udp 0 0 0.0.0.0:12482 0.0.0.0:* 1760/dnxServer
udp 0 0 0.0.0.0:12482 0.0.0.0:* 1764/dnxServer
udp 0 0 fe80::250:56ff:fe80:5157:123 :::* 1393/ntpd
udp 0 0 ::1:123 :::* 1393/ntpd
udp 0 0 :::123 :::* 1393/ntpd
udp 0 0 ::1:60430 ::1:60430 ESTABLISHED 1562/postmaster
If there is a command to test the specific port communications, let me know.
Following is the output of the command netstat -an -udp on the CLIENT:
[root@localhost ~]# netstat -an -udp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
udp 0 0 192.168.252.114:33867 192.168.252.113:12481 ESTABLISHED 3135/./dnxClient
udp 0 0 192.168.252.114:60495 192.168.252.113:12480 ESTABLISHED 3096/./dnxClient
*
*
*
udp 0 0 192.168.252.114:34370 192.168.252.113:12481 ESTABLISHED 3135/./dnxClient
udp 0 0 0.0.0.0:12482 0.0.0.0:* 3135/./dnxClient
udp 0 0 0.0.0.0:12482 0.0.0.0:* 3096/./dnxClient
udp 0 0 192.168.252.114:37699 192.168.252.113:12481 ESTABLISHED 3096/./dnxClient
udp 0 0 192.168.252.114:38471 192.168.252.113:12480 ESTABLISHED 3096/./dnxClient
udp 0 0 ::1:60250 ::1:60250 ESTABLISHED 1554/postmaster
[root@localhost ~]#
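On the question of testing specific port communications: one option is a UDP round-trip probe between the two machines. The following is a minimal sketch, not a DNX tool. It assumes an echo listener that you run yourself on the far side, since the DNX daemons are not expected to echo an arbitrary payload; to test 12480 or 12481 themselves, DNX would have to be stopped on the master first so the listener can bind the port. The function names are hypothetical, and the demo only exercises the loopback interface.

```python
import socket
import threading


def udp_echo_once(sock):
    """Answer a single datagram on an already-bound socket, then return."""
    data, addr = sock.recvfrom(2048)
    sock.sendto(data, addr)


def udp_probe(host, port, payload=b"dnx-probe", timeout=2.0):
    """Return True if the same payload comes back from host:port in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(payload, (host, port))
            data, _ = sock.recvfrom(2048)
        except OSError:
            # Covers timeouts as well as ICMP-reported errors.
            return False
        return data == payload


def loopback_demo():
    """Run the listener and the probe on 127.0.0.1 with an ephemeral port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    port = listener.getsockname()[1]
    worker = threading.Thread(target=udp_echo_once, args=(listener,), daemon=True)
    worker.start()
    ok = udp_probe("127.0.0.1", port)
    worker.join(timeout=1)
    listener.close()
    return ok


if __name__ == "__main__":
    print("loopback round trip ok:", loopback_demo())
```

For a real test, the listener half would run on the master (bound to its own address and the port in question) and `udp_probe` on the client. A False result only says that no reply arrived within the timeout; it does not say which direction was blocked.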
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Issue setting up DNX
Just to verify, are all of the machines running the same OS version and architecture? These appear to be just plugin errors or something similar.
Re: Issue setting up DNX
A few hours ago, one of my project team members identified the source of the problem. Due to security requirements, we had to modify the auto-installation to download the DNX package manually and then auto-install it. To get quickly to the point, this caused the dnxClient to start up but quickly die. We resolved that problem by adding " -d 3" to the command in the init.d script. We are now definitely passing data from each client, but we still have some minor SSH and other issues to resolve. While we are not 100% recovered yet, netstat -aupn now shows multiple established dnxClient connections on ports 12480 and 12481 on each client, and the debug logs show workers receiving job messages. We are moving in the right direction.
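The netstat check described above can be repeated on each client with a small script. The following is a minimal sketch that counts established dnxClient UDP sockets per remote port. It assumes the seven-column layout of the netstat output posted earlier in this thread; the sample lines are copied from that earlier client output (taken before the fix), and the function name is hypothetical.

```python
from collections import Counter

# Lines copied from the client netstat output posted earlier in the thread.
SAMPLE = """\
udp 0 0 192.168.252.114:33867 192.168.252.113:12481 ESTABLISHED 3135/./dnxClient
udp 0 0 192.168.252.114:60495 192.168.252.113:12480 ESTABLISHED 3096/./dnxClient
udp 0 0 192.168.252.114:34370 192.168.252.113:12481 ESTABLISHED 3135/./dnxClient
udp 0 0 0.0.0.0:12482 0.0.0.0:* 3135/./dnxClient
udp 0 0 0.0.0.0:12482 0.0.0.0:* 3096/./dnxClient
udp 0 0 192.168.252.114:37699 192.168.252.113:12481 ESTABLISHED 3096/./dnxClient
udp 0 0 192.168.252.114:38471 192.168.252.113:12480 ESTABLISHED 3096/./dnxClient
udp 0 0 ::1:60250 ::1:60250 ESTABLISHED 1554/postmaster
"""


def established_by_remote_port(netstat_text, program="dnxClient"):
    """Count ESTABLISHED udp sockets owned by `program`, keyed by remote port."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state, pid/program.
        # Listening sockets have no state column (6 fields) and are skipped.
        if len(fields) != 7 or fields[0] != "udp" or fields[5] != "ESTABLISHED":
            continue
        if not fields[6].endswith(program):
            continue
        remote_port = int(fields[4].rsplit(":", 1)[1])
        counts[remote_port] += 1
    return counts


if __name__ == "__main__":
    for port, count in sorted(established_by_remote_port(SAMPLE).items()):
        print(port, count)
```

A port that is missing from the result, or a count that drops to zero shortly after the service starts, would match the symptom described above, where the dnxClient started and then died.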