Logstash crash and unable to access web UI
Posted: Mon Apr 06, 2020 1:18 pm
Good morning Nagios team,
This morning I experienced what I would consider a "typical" crash/down scenario for my environment. I logged into my Log Server console just after 7AM PST when I started my shift, and everything was fine for a few hours across periodic refreshes of the 'Home' screen. We're troubleshooting another issue in a separate support ticket, but basically my session times out in the web UI and shows a yellow triangle in the upper right-hand corner, and the very next thing I click on after that logs me out of the current session completely.
Well, I went to log back in as I typically would (around 10:45AM PST) and the console became unresponsive, with just a spinning circle on that tab in Chrome. I closed the tab and tried logging in again, to no avail. I checked my vSphere console for any strange CPU behavior and noticed that on one of my nodes the CPU activity was fluctuating wildly every 30 seconds or so. I rebooted that node, and a few minutes later, once the CPU activity had normalized, I tried logging into the console again and it worked. The Home graph shows that logstash collection crashed/stopped just before 8AM PST, but when I SSH'd into all three nodes, each one showed both logstash and elasticsearch as running (active).
Whenever I experience an issue with Log Server, this is the behavior I see 95% of the time. I don't understand why the logstash service shows it's running when it very clearly isn't. Can I send a system profile to someone for review? Maybe you can see what's happening?
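For anyone comparing notes, here is a minimal sketch of one way to check whether a service that systemd reports as active is actually accepting connections. It is not an official Nagios tool. The unit names (`logstash`, `elasticsearch`) come from the description above; the ports are assumptions (9200 is the usual Elasticsearch HTTP port, and 5544 is assumed to be the syslog input on a default Log Server install), so adjust them to match your own inputs.

```python
import socket
import subprocess


def service_state(name):
    """Return what systemd reports for a unit, e.g. 'active' or 'inactive'."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", name],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        # Not a systemd host (or systemctl is not on PATH).
        return "unknown (systemctl not found)"
    return result.stdout.strip() or "unknown"


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed service-to-port mapping; change to match your configuration.
    checks = {"logstash": 5544, "elasticsearch": 9200}
    for unit, port in checks.items():
        state = service_state(unit)
        listening = port_is_open("127.0.0.1", port)
        print(f"{unit}: systemd={state}, port {port} open={listening}")
```

If a unit shows as active while its port is closed, the process is probably alive but no longer accepting input. That would be consistent with the Home graph going flat while the service still reports as running, though it would not by itself explain the cause.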
Thank you.