Logstash services stopped on all nodes
-
CFT6Server
- Posts: 506
- Joined: Wed Apr 15, 2015 4:21 pm
Logstash services stopped on all nodes
Just noticed that our cluster has stopped collecting logs, and after checking, it looks like logstash is not running on any of the nodes. When I start the service manually, it does not stay running on any of them. Nothing appears in the logstash logs. Any ideas?
Re: Logstash services stopped on all nodes
Anything in /var/log/messages? Is there any possibility that your disks are full?
My guesses are that either
A) Logstash can't start due to insufficient memory
-or-
B) Logstash can't start due to no disk space
Let me know!
-
CFT6Server
- Posts: 506
- Joined: Wed Apr 15, 2015 4:21 pm
Re: Logstash services stopped on all nodes
Space wise it is looking OK
Code: Select all
# df -h
Filesystem            Size  Used Avail Use% Mounted on
rootfs                 99G  3.2G   95G   4% /
devtmpfs              9.9G  148K  9.9G   1% /dev
tmpfs                 9.9G     0  9.9G   0% /dev/shm
/dev/sda1              99G  3.2G   95G   4% /
10.242.145.237:/vol/v_kdcnagiosnfs1_kdcnagls1n1_logs
                      2.5T  1.3T  1.3T  50% /nfs/logdata
10.242.145.250:/vol/v_kdcnagiosnfs2_repository
                      8.8T  7.9T  985G  90% /nfs/repository
Could be memory... but it shows that memory is fine within Log Server
Code: Select all
# free -m
             total       used       free     shared    buffers     cached
Mem:         20120      19544        576          0         65       4813
-/+ buffers/cache:      14664       5456
Swap:          255         33        222
I've restarted the node and the service is still dead....
Code: Select all
# service logstash status
Logstash Daemon dead but pid file exists
-
jdalrymple
- Skynet Drone
- Posts: 2620
- Joined: Wed Feb 11, 2015 1:56 pm
Re: Logstash services stopped on all nodes
CFT6Server wrote:
Code: Select all
# free -m
             total       used       free     shared    buffers     cached
Mem:         20120      19544        576          0         65       4813
-/+ buffers/cache:      14664       5456
Swap:          255         33        222
I would trust `free` over the web interface, wouldn't you?
I'd have to confirm this with the developers, but I bet that the data in the UI comes from elasticsearch, which is likely out of date if Logstash isn't up. It is surprising, though, that you're not seeing out-of-memory errors in the logstash log; it usually doesn't try to hide memory problems.
-
CFT6Server
- Posts: 506
- Joined: Wed Apr 15, 2015 4:21 pm
Re: Logstash services stopped on all nodes
Looks like this was due to an error in the configuration. Not a memory issue. Not sure why verification didn't pick this up, and there weren't any changes in the last 2 to 3 weeks, so the timing is a bit odd. But it turns out that one of the filters was referring to a pattern that is not valid. Since verification did not pick this up, I had to turn on verbose logging by editing /etc/init.d/logstash and putting -vvv in the arguments. Then it showed me the pattern that had errors, which was preventing logstash from starting.
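Since verification missed this, a rough pre-check along the following lines might help catch a filter that references a grok pattern nobody has defined. This is only a sketch under some assumptions: it treats the first whitespace-delimited token on each non-comment line of a pattern file as a pattern name, and the function names and any paths you pass in are hypothetical, not part of Nagios Log Server or Logstash.

```python
import re
from pathlib import Path

# Matches %{NAME}, %{NAME:field} and %{NAME:field:type}; captures NAME only.
PATTERN_REF = re.compile(r"%\{([A-Za-z0-9_]+)(?::[^}]*)?\}")


def defined_patterns(pattern_dir):
    """Collect pattern names defined in every file directly under pattern_dir.

    Assumes the usual grok pattern file layout: 'NAME regex...' per line,
    with '#' starting a comment line.
    """
    names = set()
    for path in Path(pattern_dir).iterdir():
        if not path.is_file():
            continue
        for line in path.read_text(errors="replace").splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                names.add(line.split(None, 1)[0])
    return names


def referenced_patterns(text):
    """Return the set of pattern names referenced as %{NAME...} in text."""
    return set(PATTERN_REF.findall(text))


def undefined_patterns(config_text, pattern_dirs):
    """Return, sorted, the names referenced in config_text but not defined
    in any of pattern_dirs."""
    known = set()
    for directory in pattern_dirs:
        known |= defined_patterns(directory)
    return sorted(referenced_patterns(config_text) - known)
```

To use it, read the filter configuration into a string and call `undefined_patterns(text, [dir1, dir2])` with every directory Logstash loads patterns from. If the stock patterns bundled with Logstash are left out of that list, they will show up as false positives. Pattern definitions can reference other patterns too, so it may be worth running the same check over the pattern files themselves.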
Re: Logstash services stopped on all nodes
That is a bit odd - I'm still working with the developers to get the configurations to verify before the Apply Configuration occurs. I have never encountered a logstash configuration that prevented it from starting entirely (without logging). If you're willing to shed some more light before I close this thread, what exactly was the configuration mistake?
-
CFT6Server
- Posts: 506
- Joined: Wed Apr 15, 2015 4:21 pm
Re: Logstash services stopped on all nodes
It was a filter which had a grok pattern that did not exist.
So a snippet of the grok filter...
Code: Select all
grok {
  match => [
    'message', '%{CISCOFW106001_1}',
    'message', '%{CISCOFW106001_2}',
    'message', '%{CISCOFW106006_106007_1}',
    'message', '%{CISCOFW106006_106007_2}',
    'message', '%{CISCOFW106006_106007_106010}',
    'message', '%{CISCOFW106015}',
    'message', '%{CISCOFW106021}',
    'message', '%{CISCOFW106023}',
    'message', '%{CISCOFW106100}',
    'message', '%{CISCOFW110002}',
    'message', '%{CISCOFW302010}',
    'message', '%{CISCOFW302013_302014_302015_302016_1}',
    'message', '%{CISCOFW302013_302014_302015_302016_2}',
    'message', '%{CISCOFW302020_302021_1}',
    'message', '%{CISCOFW302020_302021_2}',
    'message', '%{CISCOFW305011}',
    'message', '%{CISCOFW313001_313004_313008}',
    'message', '%{CISCOFW313005}',
    'message', '%{CISCOFW402117}',
    'message', '%{CISCOFW402119}',
    'message', '%{CISCOFW419001}',
    'message', '%{CISCOFW419002}',
    'message', '%{CISCOFW500004}',
    'message', '%{CISCOFW602303_602304_1}',
    'message', '%{CISCOFW602303_602304_2}',
    'message', '%{CISCOFW710001_710002_710003_710005_710006}',
    'message', '%{CISCOFW713172}',
    'message', '%{CISCOFW733100}',
    'message', '%{CISCOFW106014}'
  ]
}
One of those was renamed in the pattern file in /usr/local/nagioslogserver/logstash/patterns/, so the filter was pointing to an unknown pattern.
This did finally show in the logs once I enabled verbose logging.
Code: Select all
{:timestamp=>"2015-10-01T21:02:04.867000-0700", :message=>"The error reported is: \n pattern %{CISCOFW106010} not defined"}
I noticed that there is minimal logging by default for the logstash service. I have set mine to LS_OPTS="-v" in the /etc/init.d/logstash file.
Re: Logstash services stopped on all nodes
Thanks for the information. I attempted to replicate your problem, and logstash.log contained the following error:
Code: Select all
{:timestamp=>"2015-10-06T14:33:44.052000-0400", :message=>"The error reported is: \n pattern %{COMNEDAPACHELOG} not defined"}
This is a default install of Nagios Log Server, so I checked my init script, and I'm also at the lowest verbosity level. Any chance you're on an older version of NLS? I tested on 2.2 - the issue might be resolved already, and I am hesitant to file a bug report unless I can confirm that the bug still exists in the latest version.
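For anyone comparing notes across nodes, error lines like the ones quoted in this thread can be pulled out of logstash.log with a few lines of Python. This is only a sketch: it assumes the message text is exactly "pattern %{NAME} not defined" as shown above, and the function name is made up for illustration.

```python
import re

# Matches the message text quoted in this thread: pattern %{NAME} not defined
UNDEFINED = re.compile(r"pattern %\{([^}]+)\} not defined")


def undefined_from_log(lines):
    """Return pattern names reported as undefined, in first-seen order."""
    seen = []
    for line in lines:
        for name in UNDEFINED.findall(line):
            if name not in seen:
                seen.append(name)
    return seen
```

Passing an open file object for the log works, since iterating over it yields lines. If the log shows nothing at the default verbosity, as happened earlier in the thread, this will simply return an empty list.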