Logstash services stopped on all nodes

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
Locked
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Logstash services stopped on all nodes

Post by CFT6Server »

Just noticed that our cluster has stopped collecting logs, and after checking, it looks like logstash is not running on any of the nodes. When I start the service manually, it does not stay running on any of them. Nothing appears in the logstash logs. Any ideas?
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Logstash services stopped on all nodes

Post by jolson »

Anything in /var/log/messages? Is there any possibility that your disks are full?

My guesses are that either
A) Logstash can't start due to insufficient memory
-or-
B) Logstash can't start due to no disk space

Let me know!
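
Both guesses can be checked quickly from a script. The following is only a rough sketch, assuming syslog goes to /var/log/messages and that the kernel logs the usual "Out of memory" / "oom-killer" lines when it kills a process. The function names are made up for illustration, and the percentage may differ slightly from what `df` reports because of reserved blocks.

```python
import shutil

# Substrings the Linux kernel writes to syslog when the OOM killer fires.
OOM_MARKERS = ("Out of memory", "oom-killer")


def find_oom_lines(lines):
    """Return the log lines that mention the kernel OOM killer."""
    return [line.rstrip("\n") for line in lines
            if any(marker in line for marker in OOM_MARKERS)]


def percent_used(path):
    """Return how full the filesystem containing `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def quick_check(log_path="/var/log/messages", mount_points=("/",)):
    """Print disk usage for each mount point, then any OOM lines in the log."""
    for mount in mount_points:
        print(f"{mount}: {percent_used(mount):.0f}% used")
    with open(log_path, errors="replace") as handle:
        for line in find_oom_lines(handle):
            print(line)
```

Calling `quick_check()` as root on each node would print one usage line per mount point followed by any OOM killer entries; no OOM lines and low disk usage would point away from both guesses.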
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Logstash services stopped on all nodes

Post by CFT6Server »

Space wise it is looking OK

Code: Select all

# df -h
Filesystem            Size  Used Avail Use% Mounted on
rootfs                 99G  3.2G   95G   4% /
devtmpfs              9.9G  148K  9.9G   1% /dev
tmpfs                 9.9G     0  9.9G   0% /dev/shm
/dev/sda1              99G  3.2G   95G   4% /
10.242.145.237:/vol/v_kdcnagiosnfs1_kdcnagls1n1_logs
                      2.5T  1.3T  1.3T  50% /nfs/logdata
10.242.145.250:/vol/v_kdcnagiosnfs2_repository
                      8.8T  7.9T  985G  90% /nfs/repository
Could be memory... but it shows that memory is fine within Log Server

Code: Select all

# free -m
             total       used       free     shared    buffers     cached
Mem:         20120      19544        576          0         65       4813
-/+ buffers/cache:      14664       5456
Swap:          255         33        222
resource.JPG
I've restarted the node and the service is still dead....

Code: Select all

# service logstash status
Logstash Daemon dead but pid file exists
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Logstash services stopped on all nodes

Post by jdalrymple »

CFT6Server wrote:

Code: Select all

# free -m
             total       used       free     shared    buffers     cached
Mem:         20120      19544        576          0         65       4813
-/+ buffers/cache:      14664       5456
Swap:          255         33        222
I would trust `free` over the web interface, wouldn't you?

I'd have to confirm this with the developers, but I bet that the data in the UI comes from elasticsearch, which is likely out of date if Logstash isn't up. It is surprising, though, that you're not seeing out-of-memory errors in the logstash log; it usually doesn't try to hide memory problems.
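
For reference, the `-/+ buffers/cache` row of the quoted `free -m` output can be reproduced from the `Mem:` row by treating buffers and page cache as reclaimable. A minimal sketch using the numbers quoted above (the function name is just illustrative):

```python
def memory_without_cache(used, free, buffers, cached):
    """Return (used, free) with buffers and page cache counted as reclaimable.

    All values are in the same unit (megabytes for `free -m`).
    """
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable


# Values from the Mem: row quoted above.
used_mb, free_mb = memory_without_cache(used=19544, free=576, buffers=65, cached=4813)
print(used_mb, free_mb)  # prints: 14666 5454
```

The result is within a couple of megabytes of the 14664 / 5456 that `free` printed; the small difference most likely comes from rounding each column to whole megabytes.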
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Logstash services stopped on all nodes

Post by CFT6Server »

It looks like this was caused by an error in the configuration, not a memory issue. I'm not sure why verification didn't pick it up, and there weren't any changes in the last 2 to 3 weeks, so the timing is a bit odd. It turns out that one of the filters was referring to a pattern that is not valid. Since verification did not catch this, I had to turn on verbose logging by editing /etc/init.d/logstash and adding -vvv to the arguments. That showed me the pattern with the error, which was what prevented logstash from starting.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Logstash services stopped on all nodes

Post by jolson »

That is a bit odd - I'm still working with the developers to get the configurations to verify before the Apply Configuration occurs. I have never encountered a logstash configuration that caused it to stop starting entirely (without logging). If you're willing to shed some more light before I close this thread, what exactly was the configuration mistake?
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Logstash services stopped on all nodes

Post by CFT6Server »

It was a filter which had a grok pattern that did not exist.

So a snippet of the grok filter...

Code: Select all

grok {
    match => [
        'message', '%{CISCOFW106001_1}',
        'message', '%{CISCOFW106001_2}',
        'message', '%{CISCOFW106006_106007_1}',
        'message', '%{CISCOFW106006_106007_2}',
        'message', '%{CISCOFW106006_106007_106010}',
        'message', '%{CISCOFW106015}',
        'message', '%{CISCOFW106021}',
        'message', '%{CISCOFW106023}',
        'message', '%{CISCOFW106100}',
        'message', '%{CISCOFW110002}',
        'message', '%{CISCOFW302010}',
        'message', '%{CISCOFW302013_302014_302015_302016_1}',
        'message', '%{CISCOFW302013_302014_302015_302016_2}',
        'message', '%{CISCOFW302020_302021_1}',
        'message', '%{CISCOFW302020_302021_2}',
        'message', '%{CISCOFW305011}',
        'message', '%{CISCOFW313001_313004_313008}',
        'message', '%{CISCOFW313005}',
        'message', '%{CISCOFW402117}',
        'message', '%{CISCOFW402119}',
        'message', '%{CISCOFW419001}',
        'message', '%{CISCOFW419002}',
        'message', '%{CISCOFW500004}',
        'message', '%{CISCOFW602303_602304_1}',
        'message', '%{CISCOFW602303_602304_2}',
        'message', '%{CISCOFW710001_710002_710003_710005_710006}',
        'message', '%{CISCOFW713172}',
        'message', '%{CISCOFW733100}',
        'message', '%{CISCOFW106014}'
    ]
}
One of those was renamed in the pattern file in /usr/local/nagioslogserver/logstash/patterns/, so the filter was pointing to an unknown pattern.
This finally showed up in the logs once I enabled verbose logging.

Code: Select all

{:timestamp=>"2015-10-01T21:02:04.867000-0700", :message=>"The error reported is: \n  pattern %{CISCOFW106010} not defined"}
I noticed that there is minimal logging by default for the logstash service. I have set mine to LS_OPTS="-v" in the /etc/init.d/logstash file.
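
A pre-flight check along these lines might catch a renamed pattern before logstash is restarted. This is only a sketch, not something Nagios Log Server provides: it assumes pattern files use the usual grok layout of one `NAME regex` definition per line with `#` comments, and the function names are hypothetical. Patterns that ship with Logstash itself would be reported as undefined unless their definitions are included in the pattern text passed in.

```python
import re

# Matches %{NAME}, %{NAME:field} and %{NAME:field:type}, capturing NAME.
GROK_REF = re.compile(r"%\{(\w+)(?::[^}]*)?\}")


def defined_patterns(pattern_text):
    """Return the set of pattern names defined in grok pattern file text."""
    names = set()
    for line in pattern_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        names.add(line.split(None, 1)[0])  # first token is the pattern name
    return names


def referenced_patterns(config_text):
    """Return the set of pattern names referenced as %{NAME...} in the text."""
    return set(GROK_REF.findall(config_text))


def undefined_patterns(config_text, pattern_text):
    """Return a sorted list of names the config references but nothing defines."""
    return sorted(referenced_patterns(config_text) - defined_patterns(pattern_text))
```

To use it, read the filter configuration and the concatenated contents of the pattern files into two strings and pass them to `undefined_patterns`; an empty list means every referenced name has a definition in the supplied pattern text.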
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Logstash services stopped on all nodes

Post by jolson »

Thanks for the information. I attempted to replicate your problem, and logstash.log contained the following error:

Code: Select all

{:timestamp=>"2015-10-06T14:33:44.052000-0400", :message=>"The error reported is: \n  pattern %{COMNEDAPACHELOG} not defined"}
This is a default install of Nagios Log Server, so I checked my init script, and I'm also at the lowest verbosity level. Any chance you're on an older version of NLS? I tested on 2.2. The issue might be resolved already, and I am hesitant to file a bug report unless I can confirm that the bug still exists in the latest version.