Login not working and Server extremely slow

domsch1988
Posts: 32
Joined: Tue Aug 15, 2017 1:20 am


Post by domsch1988 »

Hello everyone,

First time posting here, but I'm running out of ideas. Until two days ago, my Nagios Log Server instance was running fine. Then the web frontend hung for some reason. I restarted the server, and since then I haven't been able to log in. Page load times are extremely slow (the login page takes 2-3 minutes to load, and after clicking "Login" nothing happens for minutes).

I noticed that there are two Java processes using over 70 GB of virtual memory each. The server currently has 8 cores and 32 GB of RAM; I could increase this for testing purposes. When it was last running, 24 hosts were sending logs to it: 8 Windows servers, 4 Sophos firewalls, and the rest Debian and CentOS servers.
Log data is written to an SMB share, which currently holds roughly 900 GB of data against a 2 TB quota.

The server is running CentOS 7.7.1908. I can't tell you the Nagios Log Server version, since I can't log in, but it should have been current as of two weeks ago.
I already reset the nagiosadmin password to something I'm 100% sure I type correctly, yet the login still fails. I suspect the login verification is failing because of the performance issues. I really hope someone can help me with this.

If you need any further information, I'm happy to provide whatever you need.
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: Login not working and Server extremely slow

Post by mbellerue »

Let's start off by shutting down those Java processes. Try systemctl stop elasticsearch and systemctl stop logstash. After that, check whether the Java processes are gone, and if so, see if you can log in.
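The two steps above can be wrapped in a small helper. This is a minimal sketch, assuming systemd units named elasticsearch and logstash (as in the reply) and that the Java processes report their command name as java; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes systemd units "elasticsearch" and "logstash".

# Stop both services in the order given in the reply above.
stop_nls_services() {
  systemctl stop elasticsearch || return 1
  systemctl stop logstash || return 1
}

# Print any surviving processes whose name is exactly "java".
# Returns 0 if at least one is still running, 1 if none are left.
java_still_running() {
  pgrep -a -x java
}

# Example (as root):
#   stop_nls_services
#   if java_still_running; then echo "Java is still running"; else echo "No Java processes left"; fi
```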
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
domsch1988
Posts: 32
Joined: Tue Aug 15, 2017 1:20 am

Re: Login not working and Server extremely slow

Post by domsch1988 »

Sorry for taking some time to report back.

Both Java processes are killed when I stop elasticsearch.
In that case I can't test the login, because I don't get the login prompt, just a message that Elasticsearch isn't running.
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: Login not working and Server extremely slow

Post by mbellerue »

Ack, I forgot it won't let you log in without Elasticsearch running. That's okay; let's work with it down for the moment. Can you run grep ES_HEAP_SIZE /etc/sysconfig/elasticsearch and post the output? That variable tells Elasticsearch how much memory it can use. By default it's 50% of your system memory.
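The 50% default amounts to the total memory in MB, as reported by free -m, divided by two. A minimal sketch of that arithmetic, assuming procps free output with a line starting with Mem:; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: half of the given total memory (in MB), formatted like
# an ES_HEAP_SIZE value. Shell integer division rounds down, as expr does.
half_heap() {
  echo "$(( $1 / 2 ))m"
}

# Total RAM in MB from free -m (second field of the "Mem:" line).
total_mem_mb() {
  free -m | awk '/^Mem:/{print $2}'
}

# Example: prints half of the Mem: total with an "m" suffix.
#   half_heap "$(total_mem_mb)"
```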
domsch1988
Posts: 32
Joined: Tue Aug 15, 2017 1:20 am

Re: Login not working and Server extremely slow

Post by domsch1988 »

That would be

Code: Select all

ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 )m
So yes, half the memory.

The processes aren't actually using that much memory, either. Only the virtual memory figure is that high; actual RAM usage is around 50%.
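The difference between virtual and resident memory can be checked directly with ps. A minimal sketch, assuming procps ps on Linux, which reports vsz (reserved address space) and rss (RAM actually in use) in KiB; the formatting helper is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "pid vsz rss" lines (KiB, as printed by procps ps)
# and print them in MiB. VSZ is reserved address space; RSS is RAM actually in use.
mem_report() {
  awk '{ printf "%s VSZ=%dMiB RSS=%dMiB\n", $1, $2 / 1024, $3 / 1024 }'
}

# Example: report every process whose command name is java.
#   ps -C java -o pid=,vsz=,rss= | mem_report
```

A large VSZ next to a much smaller RSS matches what was described: the address space is reserved, but most of it is not backed by RAM.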
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: Login not working and Server extremely slow

Post by mbellerue »

Alright, let's start just Elasticsearch and see what it does: systemctl start elasticsearch. Give it a few moments, and then, if you can, grab a screenshot of its memory usage. Also see whether it lets you log in to the web interface without Logstash running.
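Rather than guessing how long "a few moments" is, one option is to poll Elasticsearch until it answers and then take a single top snapshot of the Java processes. This is a minimal sketch, assuming Elasticsearch listens on http://localhost:9200; the helper name and its default attempts/delay are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll Elasticsearch until it answers on HTTP.
# Arguments (all optional, defaults are assumptions): URL, attempts, delay in seconds.
wait_for_es() {
  local url=${1:-http://localhost:9200} attempts=${2:-30} delay=${3:-2} i
  for ((i = 1; i <= attempts; i++)); do
    # curl exits 0 once it gets any HTTP response (no -f flag).
    if curl -s -o /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Example (as root): start ES, wait for it, then one batch-mode top snapshot.
#   systemctl start elasticsearch
#   wait_for_es && top -b -n 1 -p "$(pgrep -d, -x java)"
```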
domsch1988
Posts: 32
Joined: Tue Aug 15, 2017 1:20 am

Re: Login not working and Server extremely slow

Post by domsch1988 »

I'm really sorry for taking so long to come back to this. Other priorities came up at work, and only now have I been able to look into it again.

I repeated all the steps up to this point. Here's a screenshot of top after about 30 minutes:
[Screenshot of top output]

Let me know what else I can provide.

Edit: Also, no, login doesn't work without Logstash running either. There's no error, but it's the same problem: the site loads for a minute or two and then just says "Wrong Username or Password".

Edit 2: In the meantime, I also attached a 1.5 TB local virtual disk to the VM and moved all the data there to rule out the SMB share as the problem. Same result.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Login not working and Server extremely slow

Post by cdienger »

Please PM me a profile from the system. It can be gathered from the command line with:

Code: Select all

/usr/local/nagioslogserver/scripts/profile.sh
This will create /tmp/system-profile.tar.gz.

Note that this file can be very large and may be too big to upload through the PM system, usually because of the logs in the Logstash and/or Elasticsearch directories inside it. If it is too large, please open the profile, extract those directories/files, and send them separately.
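To see which parts of the profile are taking up the space before splitting it, the archive listing can be sorted by size. A minimal sketch, assuming GNU tar (whose verbose listing puts the size in the third field); the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the N largest entries (default 10) in a .tar.gz,
# as "size path". Assumes GNU tar, whose -tv output has the size in field 3.
largest_entries() {
  tar -tzvf "$1" | sort -k3,3 -n -r | head -n "${2:-10}" | awk '{ print $3, $NF }'
}

# Example:
#   largest_entries /tmp/system-profile.tar.gz
#   mkdir -p /tmp/profile && tar -xzf /tmp/system-profile.tar.gz -C /tmp/profile
```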

I'd also like to get a copy of the current settings index. This can be gathered by running:

Code: Select all

curl -XPOST "http://localhost:9200/nagioslogserver/_export?path=/tmp/nagioslogserver.tar.gz"
The file it creates, and the one we'd like to see, is /tmp/nagioslogserver.tar.gz.
domsch1988
Posts: 32
Joined: Tue Aug 15, 2017 1:20 am

Re: Login not working and Server extremely slow

Post by domsch1988 »

PM Sent.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Login not working and Server extremely slow

Post by cdienger »

A few indices are in a bad/corrupt state, including the index (nagioslogserver) that holds the NLS credentials. Close one of the affected indices, logstash-2020.01.20, with:

Code: Select all

curl -XPOST "localhost:9200/logstash-2020.01.20/_close"
and then restart the elasticsearch service with:

Code: Select all

service elasticsearch restart
This can sometimes bring the nagioslogserver index back. Check for any 'red' indices after restarting with:

Code: Select all

curl 'localhost:9200/_cat/indices?pretty' | grep red
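Note that grep red also matches any index whose name merely contains the letters "red". A minimal sketch of a stricter check, assuming the _cat API's h parameter to select only the health and index columns; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "health index" lines and print only red index names.
red_indices_from() {
  awk '$1 == "red" { print $2 }'
}

# Hypothetical helper: succeed if the nagioslogserver index is among the red ones.
nls_index_red() {
  red_indices_from | grep -qx 'nagioslogserver'
}

# Example:
#   curl -s 'localhost:9200/_cat/indices?h=health,index' | red_indices_from
#   curl -s 'localhost:9200/_cat/indices?h=health,index' | nls_index_red && echo "still red"
```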
If the nagioslogserver index is still red then you can try importing a backup by running:

Code: Select all

cd /usr/local/nagioslogserver/scripts/
./restore_backup.sh /store/backups/nagioslogserver/<backup>.tar.gz
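If you're not sure which file to use for &lt;backup&gt;, listing the backup directory by modification time helps. A minimal sketch, assuming the backups are .tar.gz files directly under /store/backups/nagioslogserver (the path from the reply); the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified .tar.gz in a backup directory.
# Defaults to the path used in the reply above.
latest_backup() {
  local dir=${1:-/store/backups/nagioslogserver}
  ls -1t "$dir"/*.tar.gz 2>/dev/null | head -n 1
}

# Example (as root):
#   cd /usr/local/nagioslogserver/scripts/
#   ./restore_backup.sh "$(latest_backup)"
```

The newest backup may have been taken after the index went bad, so an older file from the same listing may be the better choice.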
Also, the files in /var/log/logstash/ can be deleted. They don't contain anything recent, and removing them will make profiles easier to handle if we need to gather another one.
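One way to clear that directory while keeping the directory itself in place is find with -delete. A minimal sketch, assuming GNU find; the helper name is hypothetical, and it deletes every regular file under the given path, so double-check the argument before running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show how much space a log directory uses, then delete
# the regular files inside it while keeping the directory itself.
clear_log_dir() {
  local dir=${1:-/var/log/logstash}
  du -sh "$dir"
  find "$dir" -type f -delete
}

# Example (as root):
#   clear_log_dir /var/log/logstash
```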