NLS running for 1 year and now it cannot start

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
Locked
cdcsysadmin
Posts: 55
Joined: Tue Dec 04, 2018 9:52 pm

NLS running for 1 year and now it cannot start

Post by cdcsysadmin »

Waiting for Database Startup
It looks like your local elasticsearch service is starting.

Why am I getting this error?
Elasticsearch can take some time to initialize its indices. This may take a few seconds. If this persists for more than a minute, please contact our support team

The page will refresh automatically after 5 seconds...


How can I trace this problem? Is there a related error log I can check to fix it? Thanks!
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: NLS running for 1 year and now it cannot start

Post by cdienger »

Check the current log under /var/log/elasticsearch/ to help identify issues with the elasticsearch service. Also verify that it is running:

Code:

service elasticsearch status
Please provide copies of the log if you need a second pair of eyes to review it.
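When the current log is empty because it has just been rotated, the useful messages are usually in the newest compressed file. A minimal sketch for finding the most recent non-empty log and printing its tail, assuming GNU find (for -printf) and gzip's zcat, both standard on CentOS. The function names are hypothetical helpers, not part of NLS:

```bash
#!/usr/bin/env bash
# Sketch: locate the newest non-empty Elasticsearch log and show its tail.
# Assumes GNU find (-printf) and GNU gzip (zcat -f).

# Print the most recently modified non-empty regular file in a directory.
latest_nonempty_log() {
    local dir="${1:-/var/log/elasticsearch}"
    # %T@ = mtime in seconds since the epoch, %p = path; newest sorts last.
    find "$dir" -maxdepth 1 -type f -size +0c -printf '%T@ %p\n' \
        | sort -n | tail -n 1 | cut -d' ' -f2-
}

# Print the last N lines (default 50) of a log, gzip-compressed or not.
# zcat -f passes uncompressed files through unchanged.
show_log_tail() {
    zcat -f -- "$1" | tail -n "${2:-50}"
}
```

Usage: `show_log_tail "$(latest_nonempty_log)" 100`. Skipping zero-byte files means an empty current log (and empty slowlogs) is passed over in favour of the latest rotated `.gz` file.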
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
cdcsysadmin
Posts: 55
Joined: Tue Dec 04, 2018 9:52 pm

Re: NLS running for 1 year and now it cannot start

Post by cdcsysadmin »

The current log is empty (0 bytes):

-rw-r--r-- 1 nagios users 0 Oct 29 2018 afca2236-594e-44bb-96c9-7f6b7ac52711_index_indexing_slowlog.log
-rw-r--r-- 1 nagios users 0 Oct 29 2018 afca2236-594e-44bb-96c9-7f6b7ac52711_index_search_slowlog.log
-rw-r--r-- 1 nagios users 0 Jan 14 03:42 afca2236-594e-44bb-96c9-7f6b7ac52711.log
-rw-r--r-- 1 nagios users 313K Dec 18 03:06 afca2236-594e-44bb-96c9-7f6b7ac52711.log-20191218.gz
-rw-r--r-- 1 nagios users 233K Dec 18 22:09 afca2236-594e-44bb-96c9-7f6b7ac52711.log-20191219.gz
-rw-r--r-- 1 nagios users 19K Dec 23 16:14 afca2236-594e-44bb-96c9-7f6b7ac52711.log-20191224.gz
-rw-r--r-- 1 nagios users 28K Jan 5 03:13 afca2236-594e-44bb-96c9-7f6b7ac52711.log-20200105.gz
-rw-r--r-- 1 nagios users 25K Jan 6 03:36 afca2236-594e-44bb-96c9-7f6b7ac52711.log-20200106.gz
-rw-r--r-- 1 nagios users 22K Jan 6 07:26 afca2236-594e-44bb-96c9-7f6b7ac52711.log-20200107.gz
-rw-r--r-- 1 nagios users 68K Jan 13 18:27 afca2236-594e-44bb-96c9-7f6b7ac52711.log-20200114.gz
[root@localhost elasticsearch]# cat afc*search*.log
[root@localhost elasticsearch]# cat afc*11.log

And here is the output of service elasticsearch status:
[root@localhost elasticsearch]# service elasticsearch status
● elasticsearch.service - LSB: This service manages the elasticsearch daemon
Loaded: loaded (/etc/rc.d/init.d/elasticsearch; bad; vendor preset: disabled)
Active: active (exited) since Mon 2020-01-13 09:42:39 HKT; 6 days ago
Docs: man:systemd-sysv-generator(8)
Process: 2286 ExecStart=/etc/rc.d/init.d/elasticsearch start (code=exited, status=0/SUCCESS)

Jan 13 09:42:39 localhost.localdomain systemd[1]: Starting LSB: This service manages the elasticsearch daemon...
Jan 13 09:42:39 localhost.localdomain runuser[2360]: pam_unix(runuser:session): session opened for user nagios by (uid=0)
Jan 13 09:42:39 localhost.localdomain runuser[2360]: pam_unix(runuser:session): session closed for user nagios
Jan 13 09:42:39 localhost.localdomain elasticsearch[2286]: Starting elasticsearch: [ OK ]
Jan 13 09:42:39 localhost.localdomain systemd[1]: Started LSB: This service manages the elasticsearch daemon.
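For an init-script service like this one, "active (exited)" means the start script returned success and systemd treats the unit as active without a running main process, so this status alone does not prove the Java process is still alive. A minimal sketch for asking Elasticsearch directly, assuming it serves HTTP on the default port 9200 on localhost (the function name is hypothetical):

```bash
#!/usr/bin/env bash
# Sketch: check Elasticsearch through its cluster-health API instead of
# trusting the init script's exit status.
# Assumption: HTTP on localhost:9200 (the Elasticsearch default); pass a
# different base URL if your configuration differs.
es_health() {
    local base="${1:-http://localhost:9200}"
    # -s: no progress meter; -f: non-zero exit on HTTP errors;
    # --max-time: give up instead of hanging on an unresponsive node.
    curl -sf --max-time 5 "$base/_cluster/health?pretty"
}
```

Usage: `es_health || echo "Elasticsearch is not answering"`. A non-zero exit (connection refused, timeout, or HTTP error) is a stronger sign that the daemon is down than the service status shown above.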
cdcsysadmin
Posts: 55
Joined: Tue Dec 04, 2018 9:52 pm

Re: NLS running for 1 year and now it cannot start

Post by cdcsysadmin »

I started it manually with service elasticsearch start, and now I can log in to NLS.
cdcsysadmin
Posts: 55
Joined: Tue Dec 04, 2018 9:52 pm

Re: NLS running for 1 year and now it cannot start

Post by cdcsysadmin »

Since starting the elasticsearch service, it has only received 10 MB of logs, but we normally have around 40 GB of logs per day. It seems the server has a problem receiving logs.

Another question: we don't know why we can't get into the forum; we get this message: "You do not have the required permissions to read topics within this forum."

Thanks
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: NLS running for 1 year and now it cannot start

Post by mbellerue »

Can you show the output of cat /etc/fstab? At 40 GB of logs per day, I imagine you must have some kind of additional storage attached to the Elasticsearch data directory. Maybe it didn't get mounted on the last reboot. Let's also get the output of df -h and df -i.
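To test the theory that a volume listed in /etc/fstab was not mounted after the reboot, here is a minimal sketch that compares fstab against the kernel's live mount table in /proc/mounts. It skips comments, blank lines and swap entries; the function name is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch: list fstab mount points that are not currently mounted.
# The live mount table is read from /proc/mounts; comments, blank lines
# and swap entries (mount-point field "swap" or "none") are skipped.
unmounted_fstab_entries() {
    local fstab="${1:-/etc/fstab}"
    awk 'NR == FNR { mounted[$2] = 1; next }    # 1st file: /proc/mounts
         /^[ \t]*#/ || NF == 0 { next }           # comment or blank line
         $2 == "swap" || $2 == "none" { next }
         !($2 in mounted) { print $2 }' /proc/mounts "$fstab"
}
```

Usage: `unmounted_fstab_entries`. Each printed path is an fstab mount point with nothing mounted on it right now; empty output means every listed filesystem is mounted.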

Be sure to check out our Knowledgebase for helpful articles and solutions!
cdcsysadmin
Posts: 55
Joined: Tue Dec 04, 2018 9:52 pm

Re: NLS running for 1 year and now it cannot start

Post by cdcsysadmin »

#
# /etc/fstab
# Created by anaconda on Tue May 8 13:21:12 2018
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root / ext4 defaults 1 1
UUID=fdcfcf9f-91e7-4a26-9ffb-3d5462862db5 /boot ext4 defaults 1 2
/dev/mapper/centos-swap swap swap defaults 0 0
/dev/snapshot_vg/data_lv /usr/local/nagioslogserver/snapshots ext4 defaults 1 1
10.10.110.225:/volume4/nagioslog01 /usr/local/nagioslogserver/snapshots02 nfs defaults 0 0
[root@localhost ~]# df -h
Filesystem                          Size  Used Avail Use% Mounted on
devtmpfs                            7.8G     0  7.8G   0% /dev
tmpfs                               7.8G     0  7.8G   0% /dev/shm
tmpfs                               7.8G   68M  7.7G   1% /run
tmpfs                               7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/centos-root             7.4T  6.2T  946G  87% /
/dev/sda1                           976M  239M  671M  27% /boot
/dev/mapper/snapshot_vg-data_lv      20T   18T  836G  96% /usr/local/nagioslogserver/snapshots
10.10.110.225:/volume4/nagioslog01   11T  3.6T  7.4T  33% /usr/local/nagioslogserver/snapshots02
tmpfs                               1.6G     0  1.6G   0% /run/user/1000
tmpfs                               1.6G     0  1.6G   0% /run/user/0
cdcsysadmin
Posts: 55
Joined: Tue Dec 04, 2018 9:52 pm

Re: NLS running for 1 year and now it cannot start

Post by cdcsysadmin »

df -i

Filesystem                             Inodes  IUsed      IFree IUse% Mounted on
devtmpfs                              2029258    627    2028631    1% /dev
tmpfs                                 2032174      1    2032173    1% /dev/shm
tmpfs                                 2032174   1519    2030655    1% /run
tmpfs                                 2032174     16    2032158    1% /sys/fs/cgroup
/dev/mapper/centos-root             501022720 107861  500914859    1% /
/dev/sda1                               65536    358      65178    1% /boot
/dev/mapper/snapshot_vg-data_lv    1324875776 138771 1324737005    1% /usr/local/nagioslogserver/snapshots
10.10.110.225:/volume4/nagioslog01  366067712  38986  366028726    1% /usr/local/nagioslogserver/snapshots02
tmpfs                                 2032174      1    2032173    1% /run/user/1000
tmpfs                                 2032174      1    2032173    1% /run/user/0
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: NLS running for 1 year and now it cannot start

Post by cdienger »

Disk usage on / is at 87%, which is going to cause problems storing data. See the descriptions of the low and high disk watermarks at https://www.elastic.co/guide/en/elastic ... cator.html.
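As a rough illustration of the two thresholds, here is a minimal sketch that reads a filesystem's usage from df and classifies it. It assumes Elasticsearch's default watermarks are in effect: 85% (low, above which no new shards are allocated to the node) and 90% (high, above which Elasticsearch tries to move shards off the node). Pass other values if cluster.routing.allocation.disk.watermark.low/high are set in your configuration. Both function names are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: compare a filesystem's usage with Elasticsearch's disk watermarks.
# Assumption: the defaults (low 85%, high 90%) apply unless overridden.

# Print the Use% of the filesystem holding PATH as a bare integer.
usage_pct() {
    # -P forces one line per filesystem, so row 2, column 5 is "Use%".
    df -P -- "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Classify a usage percentage against the low and high watermarks.
watermark_state() {
    local used="$1" low="${2:-85}" high="${3:-90}"
    if [ "$used" -gt "$high" ]; then
        # Elasticsearch tries to relocate shards away from this node.
        echo "above high watermark"
    elif [ "$used" -gt "$low" ]; then
        # Elasticsearch stops allocating new shards to this node.
        echo "above low watermark"
    else
        echo "below low watermark"
    fi
}
```

Usage: `watermark_state "$(usage_pct /)"`, assuming the Elasticsearch data directory lives on the root filesystem. With / at 87% as posted, this prints "above low watermark".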

I'd recommend removing older indices to clear up some space or allocating more drive space to the machine. Removing old indices can be done with:

Code:

/usr/local/nagioslogserver/scripts/curator.sh delete indices --older-than $delete_time --time-unit days --timestring %Y.%m.%d
Here $delete_time is the number of days' worth of logs to keep; older indices are removed from the local NLS disk. To run this automatically, set the "Delete indexes older than" field under Admin > System > Snapshots & Maintenance. You may need to wait a day for the optimizer to run and reclaim the space, or you can force it to run by making sure the "Optimize Indexes older than" field has a value and then running the snapshots_maintenance job under Admin > System > Command Subsystem.
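Before deleting anything, it can help to preview which indices a given $delete_time would cover. A minimal sketch, assuming daily index names that end in a YYYY.MM.DD date (for example logstash-2020.01.14; check the actual names on your system) and GNU date. The function name is hypothetical, and its cutoff only approximates curator's own boundary handling; it lists names and deletes nothing:

```bash
#!/usr/bin/env bash
# Sketch: preview which daily indices fall before a cutoff of N days.
# Assumptions: index names end in YYYY.MM.DD and GNU date is available.
# Reads one index name per line on stdin; names without a date are ignored.
indices_older_than() {
    local days="$1" ref="${2:-today}" cutoff
    cutoff=$(date -d "$ref $days days ago" +%Y.%m.%d) || return 1
    # YYYY.MM.DD strings sort in date order, so a string comparison works.
    awk -v cutoff="$cutoff" '
        { sub(/[ \t]+$/, "") }                    # drop trailing padding
        match($0, /[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]$/) {
            if (substr($0, RSTART) < cutoff) print
        }'
}
```

Usage, assuming Elasticsearch's default HTTP port: `curl -s 'http://localhost:9200/_cat/indices?h=index' | indices_older_than 30`. The optional second argument fixes the reference date, which makes the output reproducible.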