
Login not working and Server extremely slow

Posted: Mon Jan 13, 2020 7:09 am
by domsch1988
Hello everyone,

First time posting here, but I'm running out of ideas. Until two days ago, my Nagios Log Server instance was running fine. Then the web frontend hung for some reason. I restarted the server, and since then I haven't been able to log in. Page load times are extremely slow: the login page takes 2-3 minutes to load, and after clicking "Login" nothing happens for minutes.

I noticed that there are two Java processes taking up over 70G of virtual memory each. The server is currently running with 8 cores and 32G of RAM; I could increase this for testing purposes. When it was last running, I had 24 hosts sending logs to it: 8 Windows servers, 4 Sophos firewalls, and the rest Debian and CentOS servers.
Log data is written to an SMB share. Currently there is roughly 900G of data on it, with a quota of 2TB.

The server is running CentOS 7.7.1908. I can't tell you the Nagios Log Server version, as I can't log in, but it should have been current as of two weeks ago.
I already reset the nagiosadmin password to something I'm 100% sure I'm typing correctly, yet the login still fails. I suspect the login verification might be failing because of the performance issues. I really hope someone can help me with this.

If you need any further information, I'm happy to provide it.

Re: Login not working and Server extremely slow

Posted: Mon Jan 13, 2020 11:13 am
by mbellerue
Let's start off by shutting down those Java processes. Try:

Code:

systemctl stop elasticsearch
systemctl stop logstash
After that, check whether the Java processes are gone, and if so, see if you can log in.

Re: Login not working and Server extremely slow

Posted: Tue Jan 14, 2020 9:57 am
by domsch1988
Sorry for taking some time to report back.

Both Java processes are killed when I stop Elasticsearch.
In that state I can't test the login, because I don't get a login prompt, just a message that Elasticsearch isn't running.

Re: Login not working and Server extremely slow

Posted: Tue Jan 14, 2020 12:12 pm
by mbellerue
Ack, I forgot it won't let you log in without Elasticsearch running. But that's okay, let's work with it down for the moment. Can you post the output of:

Code:

grep ES_HEAP_SIZE /etc/sysconfig/elasticsearch
That's the variable that tells Elasticsearch how much memory it can use. By default it's 50% of your system memory.

Re: Login not working and Server extremely slow

Posted: Wed Jan 15, 2020 1:57 am
by domsch1988
That would be

Code:

ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 )m
So yes, half the memory.
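For illustration, the expression above can be checked by hand. The sketch below applies the same integer division that `expr ... / 2` does to a total in MB. On a 32G VM the heap therefore comes out at roughly 16G; the exact figure depends on what `free -m` reports as total, which is usually a little under the nominal size. The function names are hypothetical, not part of Nagios Log Server.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the ES_HEAP_SIZE default from /etc/sysconfig/elasticsearch.
# heap_for_total_mb and current_heap are hypothetical helper names.

# Print half of the given total memory (in MB) with an "m" suffix,
# using integer division just like `expr ... / 2`.
heap_for_total_mb() {
  local total_mb=$1
  echo "$(( total_mb / 2 ))m"
}

# Same calculation on the live system: column 2 of the "Mem:" line
# of `free -m` is the total memory in MB.
current_heap() {
  heap_for_total_mb "$(free -m | awk '/^Mem:/{print $2}')"
}
```

Running `current_heap` on the server should print the same value Elasticsearch was started with, assuming the sysconfig file hasn't been edited.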

The processes aren't actually using that much memory, either. Only their virtual memory size is that high; actual RAM usage is around 50%.
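The distinction between virtual and resident memory can be seen directly with `ps`. Virtual size (VSZ) counts all address space a process has mapped or reserved, which for a JVM is typically far larger than what is actually in RAM; resident size (RSS) is the physical memory in use. This is a sketch; the helper name is hypothetical, and it only reformats the KiB values `ps` prints into GiB.

```shell
#!/usr/bin/env bash
# Sketch: compare virtual (VSZ) and resident (RSS) memory of Java processes.
# ps reports both values in KiB; this helper converts them to GiB.
# java_mem_gib is a hypothetical helper name.

# Reads lines of "PID VSZ RSS" (KiB) on stdin and prints
# the PID with both sizes in GiB.
java_mem_gib() {
  awk '{ printf "%s vsz=%.1fG rss=%.1fG\n", $1, $2 / 1048576, $3 / 1048576 }'
}

# Usage on the server (the "=" after each column name suppresses the header):
#   ps -C java -o pid=,vsz=,rss= | java_mem_gib
```

A VSZ above the machine's physical RAM, as with the 70G reported here on a 32G VM, is by itself not a sign of a leak; RSS is the number to compare against installed memory.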

Re: Login not working and Server extremely slow

Posted: Wed Jan 15, 2020 2:15 pm
by mbellerue
Alright, let's start up just Elasticsearch and see what it does:

Code:

systemctl start elasticsearch
Give it a few moments, and then, if you can, grab a screenshot of its memory usage. Also see whether it lets you log in to the web interface without Logstash running.
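Rather than guessing how long "a few moments" is, a small polling loop can wait until Elasticsearch answers and then print the cluster health. This is a sketch assuming Elasticsearch listens on localhost:9200, as in the other commands in this thread. `_cluster/health` is a standard Elasticsearch API; the helper names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: wait for Elasticsearch to answer and report cluster health.
# Assumes Elasticsearch on localhost:9200. Function names are hypothetical.

# Extract the "status" value (green/yellow/red) from _cluster/health JSON
# on stdin. Works for both compact and ?pretty output.
es_health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Poll _cluster/health until Elasticsearch responds, then print the status.
# $1 = number of attempts (default 30), $2 = seconds between attempts (default 10).
wait_for_es() {
  local tries=${1:-30} delay=${2:-10} i status
  for (( i = 0; i < tries; i++ )); do
    status=$(curl -s 'http://localhost:9200/_cluster/health' | es_health_status)
    if [ -n "$status" ]; then
      echo "$status"
      return 0
    fi
    sleep "$delay"
  done
  return 1   # Elasticsearch never answered within tries * delay seconds
}

# Usage:
#   systemctl start elasticsearch && wait_for_es
```

A "red" status means at least one primary shard is unavailable, which is worth knowing before trying the web login.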

Re: Login not working and Server extremely slow

Posted: Wed Feb 26, 2020 9:11 am
by domsch1988
I'm really sorry for taking so long to come back to this. I had other priorities at work and only now have time to look into it again.

I redid all the steps up to this point. Here's a screenshot of top after about 30 minutes:

Let me know what else I can provide.

Edit: Also, no, login doesn't work without Logstash running either. There's no error, just the same problem: the site loads for a minute or two and then says "Wrong Username or Password".

Edit 2: In the meantime, I also attached a 1.5TB local virtual disk to the VM and moved all data there to rule out SMB as the problem. Same result.

Re: Login not working and Server extremely slow

Posted: Wed Feb 26, 2020 5:33 pm
by cdienger
Please PM me a profile from the system. It can be gathered from the command line with:

Code:

/usr/local/nagioslogserver/scripts/profile.sh
This will create /tmp/system-profile.tar.gz.

Note that this file can be very large and may not upload through the PM system. This is usually due to the logs in the Logstash and/or Elasticsearch directories it contains. If it is too large, please open the profile, extract these directories/files, and send them separately.
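To see which entries make the profile too large before splitting it up, the archive listing can be sorted by size. This sketch assumes GNU tar's verbose listing format (permissions, owner/group, size in bytes, date, time, name); the internal layout of the profile is not shown here, and the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find which entries make the profile tarball too big to upload.
# largest_entries is a hypothetical helper name.

# Reads a GNU `tar -tv` listing on stdin (field 3 is the size in bytes)
# and prints the N largest entries as "size path", biggest first.
largest_entries() {
  local n=${1:-10}
  awk '{
    size = $3
    name = $0
    # strip permissions, owner/group, size, date and time
    sub(/^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +/, "", name)
    print size, name
  }' | sort -k1,1nr | head -n "$n"
}

# Usage:
#   tar -tzvf /tmp/system-profile.tar.gz | largest_entries 10
```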

I'd also like to get a copy of the current settings index. This can be gathered by running:

Code:

curl -XPOST http://localhost:9200/nagioslogserver/_export?path=/tmp/nagioslogserver.tar.gz
This creates /tmp/nagioslogserver.tar.gz, which is the file we'd like to see.

Re: Login not working and Server extremely slow

Posted: Thu Feb 27, 2020 3:50 am
by domsch1988
PM sent.

Re: Login not working and Server extremely slow

Posted: Thu Feb 27, 2020 5:46 pm
by cdienger
A few indices are in a bad/corrupt state, including the nagioslogserver index, which holds the NLS credentials. Close one of them with:

Code:

curl -XPOST "localhost:9200/logstash-2020.01.20/_close"
and then restart the elasticsearch service with:

Code:

service elasticsearch restart
This can sometimes bring the nagioslogserver index back. Check for any 'red' indices after restarting with:

Code:

curl 'localhost:9200/_cat/indices?pretty' | grep red
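`grep red` in the command above also matches any index whose name happens to contain "red". A slightly stricter variant checks only the health column, which is the first column of `_cat/indices` output. This is a sketch; the helper name is hypothetical, and it prints the whole matching line so it does not depend on the exact column layout after the health field.

```shell
#!/usr/bin/env bash
# Sketch: keep only _cat/indices lines whose health column is "red".
# red_indices is a hypothetical helper name.

# Reads _cat/indices output (without the ?v header) on stdin
# and prints the lines whose first column is exactly "red".
red_indices() {
  awk '$1 == "red"'
}

# Usage:
#   curl -s 'localhost:9200/_cat/indices' | red_indices
```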
If the nagioslogserver index is still red then you can try importing a backup by running:

Code:

cd /usr/local/nagioslogserver/scripts/
./restore_backup.sh /store/backups/nagioslogserver/<backup>.tar.gz
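To pick a value for `<backup>`, the available archives can be listed newest first. This sketch assumes the backups are `*.tar.gz` files in the directory used in the command above; the helper name is hypothetical. Note that the newest backup is not necessarily a healthy one, since it may have been taken after the index went bad.

```shell
#!/usr/bin/env bash
# Sketch: list candidate backups, newest first, for restore_backup.sh.
# Assumes *.tar.gz archives in the backup directory used above.
# newest_backups is a hypothetical helper name.

# Print up to N backup archives in DIR, most recently modified first.
# Prints nothing if the directory contains no matching archives.
newest_backups() {
  local dir=${1:-/store/backups/nagioslogserver} n=${2:-5}
  ls -1t "$dir"/*.tar.gz 2>/dev/null | head -n "$n"
}

# Usage:
#   newest_backups /store/backups/nagioslogserver 5
```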
Also, the files in /var/log/logstash/ can be deleted. They don't contain anything recent, and removing them will make the profile easier to handle if we need to gather another one.
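A minimal sketch of that cleanup, which reports the directory size and then removes only regular files, keeping the directory itself. The helper name is hypothetical. If Logstash is running while its log files are deleted, the disk space held by any file it still has open is only released once it reopens its logs or is restarted.

```shell
#!/usr/bin/env bash
# Sketch: show how much space the old Logstash logs use, then delete them.
# Directory taken from the advice above; clear_old_logs is a hypothetical helper.
# Only regular files are removed; the directory (and subdirectories) are kept.

clear_old_logs() {
  local dir=${1:-/var/log/logstash}
  du -sh "$dir"                  # size before cleanup
  find "$dir" -type f -delete    # remove the log files
}

# Usage:
#   clear_old_logs /var/log/logstash
```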