"Waiting for Database Startup" for 14 hours

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
waskinbas
Posts: 16
Joined: Wed Jun 02, 2021 1:59 pm

"Waiting for Database Startup" for 14 hours

Post by waskinbas »

Hello! Yesterday I powered off my Nagios LS server (the basic OVF I've used for months) and moved it between datastores to get ready for a full CentOS install. When the OVF powered up, it showed the "Waiting for Database Startup" page and has stayed there for about 14 hours. I read a bunch of forum posts and followed several suggestions that have helped others: I tried increasing the https.max, restored a configuration backup using the built-in script, and stopped and restarted the elasticsearch, logstash, and http services several times. I have also rebooted several times. I'd like to avoid restoring the entire server, so any suggestions would be helpful. I see similar, abandoned topics about this problem. I'm hoping this isn't a critical issue.
kg2857
Posts: 233
Joined: Wed Apr 12, 2023 5:48 pm

Re: "Waiting for Database Startup" for 14 hours

Post by kg2857 »

That means the elasticsearch service is failing to start.
I've seen that when this happens, the cluster_hosts file (I think) sometimes needs to be edited to contain the IP or hostname of each cluster host. Since it sounds like you only have one NLS host, this may not be an issue.
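
A quick way to confirm whether Elasticsearch is actually answering is to ask it for cluster health. This is only a rough sketch: it assumes Elasticsearch listens on the default http://localhost:9200, and the function names are made up for illustration.

```python
import json
import urllib.error
import urllib.request


def summarize_health(body: str) -> str:
    """Turn a _cluster/health JSON body into a one-line summary."""
    health = json.loads(body)
    status = health.get("status", "unknown")
    nodes = health.get("number_of_nodes", "?")
    return f"status={status} nodes={nodes}"


def check_elasticsearch(
    url: str = "http://localhost:9200/_cluster/health",  # assumed default address
    timeout: float = 5.0,
) -> str:
    """Query the health endpoint; report 'unreachable' if nothing answers."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return summarize_health(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Connection refused, timeout, or a non-JSON reply all end up here.
        return f"unreachable: {exc}"
```

Calling check_elasticsearch() on the NLS host should return a status line if the service is up. An "unreachable" result would be consistent with the service not having started.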
waskinbas
Posts: 16
Joined: Wed Jun 02, 2021 1:59 pm

Re: "Waiting for Database Startup" for 14 hours

Post by waskinbas »

It is a standalone server. I had a space crunch a while back and had to remove the second server from the cluster in a hurry. I did try commenting out the removed servers in cluster_hosts, but it didn't help Elasticsearch start up.

I've decided to build a new CentOS Stream 9 server and copy my backups and log data to the new machine. However... I can't locate my snapshot repository. I tried using curl -XGET but it won't work either. Does anyone know what the default snapshot repository path is?
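
For reference, when Elasticsearch is running, the registered snapshot repositories can be listed with a GET on /_snapshot/_all, and each filesystem repository reports its path under settings.location. The sketch below only parses such a response; the repository name and path in the usage note are invented, and none of this helps while Elasticsearch is down.

```python
import json


def repository_locations(body: str) -> dict:
    """Map each filesystem ("fs") snapshot repository to its location."""
    repos = json.loads(body)
    return {
        name: info.get("settings", {}).get("location")
        for name, info in repos.items()
        if info.get("type") == "fs"
    }
```

For example, feeding it the body '{"my_repo": {"type": "fs", "settings": {"location": "/mnt/backups"}}}' (hypothetical values) gives back a mapping from my_repo to /mnt/backups.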
kg2857
Posts: 233
Joined: Wed Apr 12, 2023 5:48 pm

Re: "Waiting for Database Startup" for 14 hours

Post by kg2857 »

Might want to look at the ES log file and delete any lines in cluster_hosts other than the one host.
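
Something along these lines could show which entries would go before you edit the file by hand. It is a sketch that assumes cluster_hosts holds one host per line. Whether the file honors "#" comments isn't confirmed in this thread, so commented-out lines are treated as entries to remove as well. The function name and sample hosts are made up.

```python
def split_hosts(text: str, keep: str):
    """Split cluster_hosts content into (kept, removed) entries.

    Assumes one host per line; blank lines are ignored and anything
    that is not exactly `keep` is listed for removal.
    """
    kept, removed = [], []
    for raw in text.splitlines():
        entry = raw.strip()
        if not entry:
            continue
        if entry == keep:
            kept.append(entry)
        else:
            removed.append(entry)
    return kept, removed
```

Read the file contents into a string, pass the one host you want to keep, and review the removed list before changing anything. Back up the original file first.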
waskinbas
Posts: 16
Joined: Wed Jun 02, 2021 1:59 pm

Re: "Waiting for Database Startup" for 14 hours

Post by waskinbas »

For anyone looking at this in the future, looks like the default datastore is /usr/local/nagioslogserver/elasticsearch/data/

It appears that the metadata files my Elasticsearch needed to open the indexes were gone. I'm not sure why moving the VM between datastores wiped out those files, but they were gone. I ended up building a CentOS Stream 9 server and restoring a system backup from the OVF there.
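
If anyone wants to check for the same symptom, here is a rough sketch that walks a data directory and lists index folders with no _state subdirectory. It assumes the older Elasticsearch on-disk layout, where each index lives under an "indices" folder and keeps its metadata in _state. That layout is my assumption and may not match your version.

```python
import os


def indices_missing_state(data_root: str) -> list:
    """Return index directories under any 'indices' folder that lack _state."""
    missing = []
    for dirpath, dirnames, _files in os.walk(data_root):
        if os.path.basename(dirpath) == "indices":
            for index_name in sorted(dirnames):
                index_dir = os.path.join(dirpath, index_name)
                if not os.path.isdir(os.path.join(index_dir, "_state")):
                    missing.append(index_dir)
            dirnames[:] = []  # do not descend into the index directories
    return sorted(missing)
```

Pointing it at the data path mentioned above would print nothing useful on a healthy node (an empty list) and a list of suspect index folders otherwise. It only reads the directory tree and changes nothing.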
waskinbas
Posts: 16
Joined: Wed Jun 02, 2021 1:59 pm

Re: "Waiting for Database Startup" for 14 hours

Post by waskinbas »

Very nice summary, thank you! I ended up installing CentOS Stream 9 and restoring my backups to the new host. So far so good. I will keep your post bookmarked, though!
kg2857
Posts: 233
Joined: Wed Apr 12, 2023 5:48 pm

Re: "Waiting for Database Startup" for 14 hours

Post by kg2857 »

This means ES isn't starting. Fix that.
patrickhayes
Posts: 3
Joined: Mon Jul 03, 2023 4:10 am

Re: "Waiting for Database Startup" for 14 hours

Post by patrickhayes »

When this problem is experienced the solution is to reboot ALL of the NLS instances in the cluster. This requires rebooting the entire operating system, not just some of the services on the server.

After rebooting ALL of the NLS instances in the cluster the problem will be resolved.
waskinbas
Posts: 16
Joined: Wed Jun 02, 2021 1:59 pm

Re: "Waiting for Database Startup" for 14 hours

Post by waskinbas »

patrickhayes wrote: Mon Jul 03, 2023 4:15 am When this problem is experienced the solution is to reboot ALL of the NLS instances in the cluster. This requires rebooting the entire operating system, not just some of the services on the server.

After rebooting ALL of the NLS instances in the cluster the problem will be resolved.
I only had one server and I rebooted it multiple times. There was something wrong with ES that I didn't feel like troubleshooting any longer. I just stood up a CentOS 9 server, restored the config snapshots, and went on with my life. I did lose most of my indexes, but I now have a better backup strategy for them, so I shouldn't have that issue next time.
kg2857
Posts: 233
Joined: Wed Apr 12, 2023 5:48 pm

Re: "Waiting for Database Startup" for 14 hours

Post by kg2857 »

Restarting all hosts is more of an "I don't know what to do" fix.
This issue was resolved some weeks ago, so I'm not sure why we're here.
Finally, correcting the cluster_hosts file and restarting ES generally resolves the issue, at least when there are multiple hosts.