
NLS running for 1 year and now it cannot start

Posted: Thu Jan 16, 2020 8:57 pm
by cdcsysadmin
Waiting for Database Startup
It looks like your local elasticsearch service is starting.

Why am I getting this error?
Elasticsearch can take some time to initialize its indices. This may take a few seconds. If this persists for more than a minute, please contact our support team

The page will refresh automatically after 5 seconds...


How can I trace this problem, and which error logs can I check to fix it? Thanks!

Re: NLS running for 1 year and now it cannot start

Posted: Fri Jan 17, 2020 1:44 pm
by cdienger
Check the current log under /var/log/elasticsearch/ to help identify issues with the elasticsearch service. Also verify that it is running:

service elasticsearch status
Please provide copies of the log if you need a second pair of eyes to review it.
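
If the current log turns out to be empty, the newest rotated log is usually the next place to look. A rough sketch of that check follows; newest_nonempty_log and show_tail are helper names made up for this post (not part of NLS), and the find options assume GNU or BSD find:

```shell
# Print the most recently modified non-empty regular file in a directory.
newest_nonempty_log() {
    find "$1" -maxdepth 1 -type f ! -empty -exec ls -t {} + | head -n 1
}

# Show the last lines of a log (default 50), decompressing rotated .gz files.
show_tail() {
    case "$1" in
        *.gz) gzip -dc "$1" | tail -n "${2:-50}" ;;
        *)    tail -n "${2:-50}" "$1" ;;
    esac
}
```

For example: show_tail "$(newest_nonempty_log /var/log/elasticsearch)" 100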

Re: NLS running for 1 year and now it cannot start

Posted: Sun Jan 19, 2020 7:43 am
by cdcsysadmin
The current log is 0 KB:

-rw-r--r-- 1 nagios users 0 Oct 29 2018 afca2236-594e-44bb-96c9-7f6b7ac52711_index_indexing_slowlog.log
-rw-r--r-- 1 nagios users 0 Oct 29 2018 afca2236-594e-44bb-96c9-7f6b7ac52711_index_search_slowlog.log
-rw-r--r-- 1 nagios users 0 Jan 14 03:42 afca2236-594e-44bb-96c9-7f6b7ac52711.log
-rw-r--r-- 1 nagios users 313K Dec 18 03:06 afca2236-594e-44bb-96c9-7f6b7ac52711.log-20191218.gz
-rw-r--r-- 1 nagios users 233K Dec 18 22:09 afca2236-594e-44bb-96c9-7f6b7ac52711.log-20191219.gz
-rw-r--r-- 1 nagios users 19K Dec 23 16:14 afca2236-594e-44bb-96c9-7f6b7ac52711.log-20191224.gz
-rw-r--r-- 1 nagios users 28K Jan 5 03:13 afca2236-594e-44bb-96c9-7f6b7ac52711.log-20200105.gz
-rw-r--r-- 1 nagios users 25K Jan 6 03:36 afca2236-594e-44bb-96c9-7f6b7ac52711.log-20200106.gz
-rw-r--r-- 1 nagios users 22K Jan 6 07:26 afca2236-594e-44bb-96c9-7f6b7ac52711.log-20200107.gz
-rw-r--r-- 1 nagios users 68K Jan 13 18:27 afca2236-594e-44bb-96c9-7f6b7ac52711.log-20200114.gz
[root@localhost elasticsearch]# cat afc*search*.log
[root@localhost elasticsearch]# cat afc*11.log

And here is the output of service elasticsearch status:
[root@localhost elasticsearch]# service elasticsearch status
● elasticsearch.service - LSB: This service manages the elasticsearch daemon
Loaded: loaded (/etc/rc.d/init.d/elasticsearch; bad; vendor preset: disabled)
Active: active (exited) since Mon 2020-01-13 09:42:39 HKT; 6 days ago
Docs: man:systemd-sysv-generator(8)
Process: 2286 ExecStart=/etc/rc.d/init.d/elasticsearch start (code=exited, status=0/SUCCESS)

Jan 13 09:42:39 localhost.localdomain systemd[1]: Starting LSB: This service manages the elasticsearch daemon...
Jan 13 09:42:39 localhost.localdomain runuser[2360]: pam_unix(runuser:session): session opened for user nagios by (uid=0)
Jan 13 09:42:39 localhost.localdomain runuser[2360]: pam_unix(runuser:session): session closed for user nagios
Jan 13 09:42:39 localhost.localdomain elasticsearch[2286]: Starting elasticsearch: [ OK ]
Jan 13 09:42:39 localhost.localdomain systemd[1]: Started LSB: This service manages the elasticsearch daemon.

Re: NLS running for 1 year and now it cannot start

Posted: Sun Jan 19, 2020 7:47 am
by cdcsysadmin
After a manual start with the command service elasticsearch start, I can now log in to NLS.

Re: NLS running for 1 year and now it cannot start

Posted: Sun Jan 19, 2020 11:13 pm
by cdcsysadmin
After starting the elasticsearch service, the server has received only 10 MB of logs. We normally get around 40 GB of logs per day, so the server seems to have a problem receiving logs.

Another question: we don't know why we can't get into the forum. We get this message: "You do not have the required permissions to read topics within this forum."

Thanks

Re: NLS running for 1 year and now it cannot start

Posted: Mon Jan 20, 2020 3:32 pm
by mbellerue
Can you show the output of cat /etc/fstab? At 40GB of logs per day, I imagine you must have had some kind of additional storage tied to the Elasticsearch data directory. Maybe it didn't get mounted on the last reboot. Also, let's get the output of df -h and df -i.
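
A quick way to see whether anything in fstab failed to mount could look like the sketch below. The function names fstab_mountpoints and check_mounts are made up for this post, and it assumes mountpoint (from util-linux) is installed:

```shell
# Print the mount points listed in fstab-formatted text read from stdin.
# Comment lines and entries such as swap (no leading "/") are skipped.
fstab_mountpoints() {
    awk '!/^[[:space:]]*#/ && NF >= 2 && $2 ~ /^\// { print $2 }'
}

# Report every fstab mount point that is not currently mounted.
# Takes an optional fstab path; defaults to /etc/fstab.
check_mounts() {
    fstab_mountpoints < "${1:-/etc/fstab}" | while read -r mp; do
        mountpoint -q "$mp" || echo "NOT MOUNTED: $mp"
    done
}
```

Running check_mounts with no arguments should print nothing if every filesystem in /etc/fstab is mounted.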

Re: NLS running for 1 year and now it cannot start

Posted: Tue Jan 21, 2020 2:28 am
by cdcsysadmin
#
# /etc/fstab
# Created by anaconda on Tue May 8 13:21:12 2018
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root / ext4 defaults 1 1
UUID=fdcfcf9f-91e7-4a26-9ffb-3d5462862db5 /boot ext4 defaults 1 2
/dev/mapper/centos-swap swap swap defaults 0 0
/dev/snapshot_vg/data_lv /usr/local/nagioslogserver/snapshots ext4 defaults 1 1
10.10.110.225:/volume4/nagioslog01 /usr/local/nagioslogserver/snapshots02 nfs defaults 0 0
[root@localhost ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 68M 7.7G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/mapper/centos-root 7.4T 6.2T 946G 87% /
/dev/sda1 976M 239M 671M 27% /boot
/dev/mapper/snapshot_vg-data_lv 20T 18T 836G 96% /usr/local/nagioslogserver/snapshots
10.10.110.225:/volume4/nagioslog01 11T 3.6T 7.4T 33% /usr/local/nagioslogserver/snapshots02
tmpfs 1.6G 0 1.6G 0% /run/user/1000
tmpfs 1.6G 0 1.6G 0% /run/user/0

Re: NLS running for 1 year and now it cannot start

Posted: Tue Jan 21, 2020 2:30 am
by cdcsysadmin
df -i

Filesystem Inodes IUsed IFree IUse% Mounted on
devtmpfs 2029258 627 2028631 1% /dev
tmpfs 2032174 1 2032173 1% /dev/shm
tmpfs 2032174 1519 2030655 1% /run
tmpfs 2032174 16 2032158 1% /sys/fs/cgroup
/dev/mapper/centos-root 501022720 107861 500914859 1% /
/dev/sda1 65536 358 65178 1% /boot
/dev/mapper/snapshot_vg-data_lv 1324875776 138771 1324737005 1% /usr/local/nagioslogserver/snapshots
10.10.110.225:/volume4/nagioslog01 366067712 38986 366028726 1% /usr/local/nagioslogserver/snapshots02
tmpfs 2032174 1 2032173 1% /run/user/1000
tmpfs 2032174 1 2032173 1% /run/user/0

Re: NLS running for 1 year and now it cannot start

Posted: Tue Jan 21, 2020 3:59 pm
by cdienger
Disk usage on / is at 87%, which is going to cause issues with storing data. See the descriptions of the high and low disk watermarks at https://www.elastic.co/guide/en/elastic ... cator.html.
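
To make the watermark check concrete, here is a small sketch that classifies a usage percentage. It assumes the thresholds are 85% (low) and 90% (high), which I believe are the Elasticsearch defaults; your cluster settings may differ, so pass your own values if they do. classify_usage and usage_of are names made up for this post:

```shell
# Classify a disk-usage percentage against the disk watermarks.
# The 85 and 90 defaults are assumptions; override with arguments 2 and 3.
classify_usage() {
    used=$1
    low=${2:-85}
    high=${3:-90}
    if [ "$used" -ge "$high" ]; then
        echo "above-high"
    elif [ "$used" -ge "$low" ]; then
        echo "above-low"
    else
        echo "ok"
    fi
}

# Print the usage percentage (without the % sign) of the filesystem
# that holds the given path, using POSIX-format df output.
usage_of() {
    df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}
```

With the 87% shown in the df -h output, classify_usage 87 prints above-low. On the server itself you could run classify_usage "$(usage_of /)".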

I'd recommend removing older indices to clear up some space or allocating more drive space to the machine. Removing old indices can be done with:

/usr/local/nagioslogserver/scripts/curator.sh delete indices --older-than $delete_time --time-unit days --timestring %Y.%m.%d
Where $delete_time is the number of days' worth of logs to keep; anything older is removed from the local NLS disk. To run this automatically, set the "Delete indexes older than" field under Admin > System > Snapshots & Maintenance. You may need to wait a day for the optimizer to run and reclaim the space, or you can force it to run by making sure a value is set for "Optimize Indexes older than" and then running the snapshots_maintenance job under Admin > System > Command Subsystem.
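
If you want to preview which indices a given retention would remove before running the delete, something like this might help. It is only a sketch: it assumes index names end in a date in the same %Y.%m.%d format as the timestring above (logstash-2020.01.15 is a hypothetical example), and older_than is a helper made up for this post, not part of NLS or curator:

```shell
# Read index names on stdin and print those whose trailing date
# (the part after the last "-") is earlier than the cutoff YYYY.MM.DD.
older_than() {
    awk -v cutoff="$1" 'NF {
        n = split($1, parts, "-")
        # Appending "" forces string comparison; YYYY.MM.DD sorts in date order.
        if ((parts[n] "") < (cutoff "")) print $1
    }'
}
```

Assuming Elasticsearch answers on localhost:9200 and GNU date is available, a 30-day preview would be: curl -s 'localhost:9200/_cat/indices?h=index' | older_than "$(date -d '30 days ago' +%Y.%m.%d)"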