Hello, good evening.
We are planning to upgrade our NLS cluster to version 210.
Before performing an upgrade, we usually update our lab environment and build a new cluster there.
When we try to add a new server to the cluster, an error occurs. Both machines are on the same VLAN, and their local firewalls are disabled.
We pasted the data into Notepad first to rule out stray characters introduced while copying, but that made no difference. Using tcpdump, we can see traffic arriving at the destination server, but we cannot tell what is going wrong.
In the release notes there are some notes about backend access restrictions. Could this be an unforeseen bug?
We also noticed that a portion of the home page is blank.
NLS 210 new cluster issue
-
ssoliveira
- Posts: 91
- Joined: Wed Dec 07, 2016 6:02 pm
Re: NLS 210 new cluster issue
Regarding the error attaching to the cluster, what OS are the lab machines using?
The blank section of the page is the Total Disk Usage data. Is there anything special about how the disks are set up on these lab servers? Could you run df -h on one of the servers where the disk usage dashlet isn't showing up and post the output? Also, try loading the page and check whether any errors appear in the Apache error log.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
-
ssoliveira
- Posts: 91
- Joined: Wed Dec 07, 2016 6:02 pm
Re: NLS 210 new cluster issue
Hi, good afternoon.
These are new CentOS 7.7 servers (fully updated), provisioned exclusively for testing the new version of Nagios Log Server.
I monitored the traffic between the two VMs while the second machine was joining the cluster. The only traffic is on port 9300.
I did not find any errors in the "/var/log/elasticsearch/74133b7f-483e-45db-b4fd-298d8f0792d7.log" file.
I provisioned two new servers for testing. When I try to add the second one, the same error occurs. Is there a log I can monitor that details the procedure for joining a server to the cluster?
Code: Select all
[root@centos702 ~]# cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
[root@centos702 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 1.9G 17M 1.9G 1% /run
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/mapper/centos-root 82G 3.5G 79G 5% /
/dev/mapper/centos-home 10G 33M 10G 1% /home
/dev/sda1 497M 255M 243M 52% /boot
tmpfs 379M 0 379M 0% /run/user/0
-
ssoliveira
- Posts: 91
- Joined: Wed Dec 07, 2016 6:02 pm
Re: NLS 210 new cluster issue
We identified the problem.
All our servers have two network cards (eth0: frontend/application and eth1: backend/backup).
For some reason, after the new server starts communicating with 10.144.142.12, it tries to continue the communication using the IP address of the eth1 interface (172.16.11.102).
This is strange, because we performed the installation using 10.144.142.12.
We will manually edit the cluster_hosts file and try to finish building the cluster. If necessary, we will temporarily remove the eth1 interface. This behavior is very strange, though: we have performed the installation several times following this same procedure, and it never happened before.
Code: Select all
[root@centos702 ~]# tail -f /var/log/elasticsearch/74133b7f-483e-45db-b4fd-298d8f0792d7.log
[2019-09-27 18:43:13,813][WARN ][discovery.zen ] [8e6e3199-003d-4e60-ae42-e26278140ffa] failed to connect to master [[c174a8c9-5660-4200-81a8-4e7443c67e54][75oZ8uvnTJmoYSTnc-m0Kg][centos701.local][inet[/172.16.11.102:9300]]{max_local_storage_nodes=1}], retrying...
org.elasticsearch.transport.ConnectTransportException: [c174a8c9-5660-4200-81a8-4e7443c67e54][inet[/172.16.11.102:9300]] connect_timeout[30s]
Code: Select all
[root@centos701 ~]# cat /tmp/nagioslogserver/install.log
...
Created symlink from /etc/systemd/system/multi-user.target.wants/httpd.service to /usr/lib/systemd/system/httpd.service.
daemons step completed OK
Running 'webroot'...
webroot step completed OK
Nagios Log Server Installation Success!
You can finish the final setup steps for Nagios Log Server by visiting:
http://10.144.142.12/nagioslogserver/
Code: Select all
[root@centos701 ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.144.142.12 netmask 255.255.192.0 broadcast 10.144.191.255
ether 00:50:56:88:71:5d txqueuelen 1000 (Ethernet)
RX packets 31863 bytes 2627232 (2.5 MiB)
RX errors 0 dropped 559 overruns 0 frame 0
TX packets 3434 bytes 1317220 (1.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.16.11.102 netmask 255.255.255.0 broadcast 172.16.11.255
ether 00:50:56:88:77:d2 txqueuelen 1000 (Ethernet)
RX packets 13 bytes 780 (780.0 B)
RX errors 0 dropped 13 overruns 0 frame 0
TX packets 6 bytes 360 (360.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 18802 bytes 4237891 (4.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 18802 bytes 4237891 (4.0 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Code: Select all
[root@centos701 ~]# cat /usr/local/nagioslogserver/var/cluster_hosts
localhost
172.16.11.102
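The cluster_hosts file above lists 172.16.11.102, which is the eth1 address of centos701. One way to do the manual edit mentioned earlier is to swap that entry for the eth0 address (10.144.142.12). This is a minimal sketch, assuming the file simply lists one host per line as shown; the function name replace_cluster_host is hypothetical, and it keeps a .bak copy so the change can be reverted.

```shell
#!/bin/sh
# Hypothetical helper: replace one host entry in a cluster_hosts-style file
# (assumed format: one host per line, as in the output above).
# A backup of the original is kept as "<file>.bak".
replace_cluster_host() {
    file="$1"   # e.g. /usr/local/nagioslogserver/var/cluster_hosts
    old="$2"    # entry to replace, e.g. the eth1 IP 172.16.11.102
    new="$3"    # replacement entry, e.g. the eth0 IP 10.144.142.12

    cp -p "$file" "$file.bak" || return 1
    # Compare whole lines as strings, so 172.16.11.1 would not match 172.16.11.102.
    awk -v old="$old" -v new="$new" \
        '($0 "") == (old "") { print new; next } { print }' \
        "$file.bak" > "$file"
}

# Usage (on centos701):
# replace_cluster_host /usr/local/nagioslogserver/var/cluster_hosts 172.16.11.102 10.144.142.12
```

Matching whole lines rather than using a substring substitution avoids accidentally rewriting a longer address that happens to start with the same digits.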
Re: NLS 210 new cluster issue
Great catch! You might also check the /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml file, specifically this section:
Code: Select all
############################## Network And HTTP ###############################
# Elasticsearch, by default, binds itself to the 0.0.0.0 address, and listens
# on port [9200-9300] for HTTP traffic and on port [9300-9400] for node-to-node
# communication. (the range means that if the port is busy, it will automatically
# try the next port).
# Set the bind address specifically (IPv4 or IPv6):
#
# network.bind_host: 192.168.0.1
# Set the address other nodes will use to communicate with this node. If not
# set, it is automatically derived. It must point to an actual IP address.
#
# network.publish_host: 192.168.0.1
# Set both 'bind_host' and 'publish_host':
#
# network.host: 192.168.0.1
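On a node with two interfaces like centos701, one option along these lines is to stop Elasticsearch from deriving the address it advertises and instead pin network.publish_host (or network.host, which also sets the bind address) to the eth0 IP. The sketch below is only an illustration: it assumes the config path quoted above, assumes 10.144.142.12 is the address the other nodes should use, and uses a hypothetical helper name, set_publish_host. Elasticsearch must be restarted for a change to elasticsearch.yml to take effect.

```shell
#!/bin/sh
# Hypothetical helper: set (or add) an active network.publish_host line in an
# elasticsearch.yml-style file. Commented-out example lines are left untouched.
# A backup of the original is kept as "<file>.bak".
set_publish_host() {
    file="$1"   # e.g. /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml
    addr="$2"   # address other nodes should use, e.g. eth0's 10.144.142.12

    cp -p "$file" "$file.bak" || return 1
    if grep -q '^network\.publish_host:' "$file.bak"; then
        # An active setting already exists: rewrite its value.
        sed "s/^network\.publish_host:.*/network.publish_host: $addr/" \
            "$file.bak" > "$file"
    else
        # No active setting: make sure the file ends with a newline, then append.
        if [ -n "$(tail -c 1 "$file")" ]; then
            echo >> "$file"
        fi
        printf 'network.publish_host: %s\n' "$addr" >> "$file"
    fi
}

# Usage (on centos701, then restart Elasticsearch):
# set_publish_host /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml 10.144.142.12
```

Setting only publish_host keeps Elasticsearch listening on all interfaces while telling the other nodes to reach this one over eth0; network.host would restrict the bind address as well.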