Timeout error in Logstash and how to increase the Logstash log level
Posted: Tue Jan 26, 2021 5:01 am
by sacom01
Hi Team,
I send logs to Nagios Log Server with Filebeat.
Nagios Log Server regularly times out and receives no logs for several minutes at a time.
Here is my Filebeat log:
2021-01-26T15:50:26.532+0700 INFO [publisher] pipeline/retry.go:215 retryer: send wait signal to consumer
2021-01-26T15:50:26.532+0700 ERROR [logstash] logstash/async.go:280 Failed to publish events caused by: read tcp x.x.x.x:41432->x.x.x.x:5012: i/o timeout
2021-01-26T15:50:26.532+0700 INFO [publisher] pipeline/retry.go:219 done
2021-01-26T15:50:26.540+0700 ERROR [logstash] logstash/async.go:280 Failed to publish events caused by: client is not connected
2021-01-26T15:50:28.456+0700 ERROR [publisher_pipeline_output] pipeline/output.go:181 failed to publish events: client is not connected
My logstash input config:
beats {
    type => 'test_beat'
    port => 5000
    client_inactivity_timeout => 86400
}
Finally, how do I increase the Logstash log level in Nagios Log Server?
Thanks, team.
Re: Timeout error in Logstash and how to increase the Logstash log level
Posted: Tue Jan 26, 2021 5:59 pm
by cdienger
After adding an input that defines a port, make sure the firewall allows that port. See https://assets.nagios.com/downloads/nag ... Inputs.pdf for the commands to update the firewall.
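A minimal sketch of the firewall step, assuming the NLS nodes use firewalld (the linked PDF remains the authoritative source for the exact commands). `open_input_port` and `maybe_run` are hypothetical helper names. Note that the Filebeat log above shows connections to port 5012 while the quoted input listens on port 5000, so it is worth confirming which port Filebeat actually targets before opening it.

```shell
#!/usr/bin/env bash
# Sketch only: open a Logstash input port with firewalld.
# Assumes firewalld (firewall-cmd) is the active firewall on the node.
# Set DRY_RUN=1 to print the commands instead of executing them.

maybe_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # show the command that would run
  else
    "$@"
  fi
}

# open_input_port PORT [tcp|udp]  (hypothetical helper, not an NLS tool)
open_input_port() {
  local port="$1" proto="${2:-tcp}"
  case "$port" in
    ''|*[!0-9]*) echo "invalid port: '$port'" >&2; return 1 ;;
  esac
  if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    echo "port out of range: $port" >&2; return 1
  fi
  case "$proto" in
    tcp|udp) ;;
    *) echo "invalid protocol: $proto" >&2; return 1 ;;
  esac
  # --permanent survives reboots; --reload applies it to the running firewall.
  maybe_run firewall-cmd --permanent --add-port="${port}/${proto}" || return 1
  maybe_run firewall-cmd --reload
}
```

For example, `DRY_RUN=1 open_input_port 5012` only prints the two `firewall-cmd` commands; running `open_input_port 5012` as root applies them, presumably on each node that receives Beats traffic.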
Steps for enabling debug logging for logstash:
Edit /etc/init.d/logstash and change line 64 from:
Code:
DAEMON_OPTS="agent -f ${LS_CONF_DIR} -l ${LS_LOG_FILE} ${LS_OPTS}"
to:
Code:
DAEMON_OPTS="agent -f ${LS_CONF_DIR} -l ${LS_LOG_FILE} ${LS_OPTS} --debug"
and restart the service with:
Code:
systemctl daemon-reload
systemctl restart logstash
Let this run just long enough for NLS to process some events from this host, then collect the /var/log/logstash/logstash.log file before reverting the change.
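The edit and its revert can also be scripted. This is a sketch, assuming GNU sed and that the DAEMON_OPTS line matches the one quoted above; it finds the line by its content rather than relying on it being line 64. `enable_logstash_debug` and `disable_logstash_debug` are hypothetical names, not tools shipped with NLS.

```shell
#!/usr/bin/env bash
# Sketch: toggle --debug on the DAEMON_OPTS line of the Logstash init script.
# Assumes GNU sed (-i) and a line of the form
#   DAEMON_OPTS="agent -f ${LS_CONF_DIR} -l ${LS_LOG_FILE} ${LS_OPTS}"
# Function names are hypothetical helpers.

enable_logstash_debug() {
  local f="${1:-/etc/init.d/logstash}"
  # Keep one untouched copy so the original can always be restored.
  [ -e "$f.orig" ] || cp -p "$f" "$f.orig" || return 1
  # Append --debug only if it is not already on the line (safe to rerun).
  if ! grep -q '^[[:space:]]*DAEMON_OPTS=".*--debug"$' "$f"; then
    sed -i 's/^\([[:space:]]*DAEMON_OPTS="agent .*\)"$/\1 --debug"/' "$f"
  fi
}

disable_logstash_debug() {
  local f="${1:-/etc/init.d/logstash}"
  # Strip the trailing --debug added by enable_logstash_debug.
  sed -i 's/^\([[:space:]]*DAEMON_OPTS=".*\) --debug"$/\1"/' "$f"
}
```

Run `enable_logstash_debug`, then `systemctl daemon-reload` and `systemctl restart logstash`; once the log is collected, run `disable_logstash_debug` and restart again. The copy kept as logstash.orig can be deleted afterwards.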
Re: Timeout error in Logstash and how to increase the Logstash log level
Posted: Wed Jan 27, 2021 2:11 am
by sacom01
Hi cdienger,
I stopped firewalld, but it still times out.
Logstash log:
{:timestamp=>"2021-01-27T13:01:34.189000+0700", :message=>"retrying failed action with response code: 503 (UnavailableShardsException[[logstash-2021.01.27][1] Primary shard is not active or isn't assigned to a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@4223ff95])", :level=>:info}
{:timestamp=>"2021-01-27T13:01:34.189000+0700", :message=>"retrying failed action with response code: 503 (UnavailableShardsException[[logstash-2021.01.27][0] Primary shard is not active or isn't assigned to a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@7e183bc1])", :level=>:info}
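These 503 responses report that Elasticsearch had no active primary for shards 0 and 1 of logstash-2021.01.27. A read-only sketch for listing such shards, assuming Elasticsearch answers at http://localhost:9200 on the NLS node and supports the `_cat/shards` API with the `h=` column selector (both assumptions; adjust `ES_URL`). `fetch_shards` and `inactive_primaries` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: list primary shards that are not active (not STARTED/RELOCATING).
# ES_URL default is an assumption; override it if Elasticsearch listens elsewhere.
ES_URL="${ES_URL:-http://localhost:9200}"

fetch_shards() {
  # Pin the columns so the parser below does not depend on the default layout.
  curl -s "$ES_URL/_cat/shards?h=index,shard,prirep,state"
}

# Reads "index shard prirep state" lines on stdin and prints
# "index shard state" for every primary (p) shard that is not active.
inactive_primaries() {
  awk '$3 == "p" && $4 != "STARTED" && $4 != "RELOCATING" { print $1, $2, $4 }'
}
```

`fetch_shards | inactive_primaries` prints one line per affected primary, and `curl -s "$ES_URL/_cluster/health?pretty"` reports the overall status together with the `unassigned_shards` count.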
Elasticsearch log:
[2021-01-27 13:01:34,150][DEBUG][action.bulk ] [7744e59b-c59e-4e67-923d-e763d7b4c2e8] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
[2021-01-27 13:01:34,150][DEBUG][action.bulk ] [7744e59b-c59e-4e67-923d-e763d7b4c2e8] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
Please help me fix this error.
Thanks, cdienger.
Re: Timeout error in Logstash and how to increase the Logstash log level
Posted: Thu Jan 28, 2021 10:36 am
by cdienger
Please provide a profile from the system. It can be gathered under Admin > System > System Status > Download System Profile or from the command line with:
Code:
/usr/local/nagioslogserver/scripts/profile.sh
This will create /tmp/system-profile.tar.gz.
Note that this file can be very large and may be too large to upload through the system. You can split it into smaller files with the split command on the NLS (or another Linux machine) command line:
Code:
split -b 5000000 /tmp/system-profile.tar.gz system-profile- -d
The above command splits /tmp/system-profile.tar.gz into 5 MB (5,000,000-byte) segments and saves them in the current directory as files named system-profile-nn (system-profile-00, system-profile-01, and so on).
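On the receiving side the segments have to be concatenated back in order. A sketch of both halves, assuming GNU coreutils; `split_profile` and `reassemble_profile` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: split a large archive into numbered segments and rejoin them.
# Assumes GNU coreutils (split -d for numeric suffixes).

# split_profile SRC PREFIX [BYTES]  (default segment size: 5,000,000 bytes)
split_profile() {
  local src="$1" prefix="$2" size="${3:-5000000}"
  split -b "$size" -d "$src" "$prefix"
}

# reassemble_profile PREFIX DEST
reassemble_profile() {
  local prefix="$1" dest="$2"
  # GNU split's numeric suffixes (00, 01, ...) sort in creation order,
  # so a plain glob concatenates the pieces in sequence.
  cat "$prefix"* > "$dest"
}
```

On NLS: `split_profile /tmp/system-profile.tar.gz system-profile-` and note the output of `sha256sum /tmp/system-profile.tar.gz`. On the receiving machine: `reassemble_profile system-profile- system-profile.tar.gz`, then check that `sha256sum system-profile.tar.gz` prints the same digest.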
Send this to me via a private message.
Re: Timeout error in Logstash and how to increase the Logstash log level
Posted: Thu Jan 28, 2021 10:38 pm
by sacom01
Hi cdienger,
My system has 4 instances: 2 at the DC site and 2 at the DR site.
So I will send you 2 attached files, one from each site.
Please help me review my system, because the timeouts still appear every day.
Thank you.
Re: Timeout error in Logstash and how to increase the Logstash log level
Posted: Fri Jan 29, 2021 4:58 pm
by cdienger
I received two profiles and would like to get profiles from the other two nodes as well.
At least one node appears to be having trouble writing to the database quickly enough and is having to throttle indexing. Are the NLS machines using SSDs or spinning disks?
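One way to answer the disk question from the command line: `lsblk -d -n -o NAME,ROTA` reports whether the kernel treats each disk as rotational (`ROTA` 1 means spinning). A sketch of a small formatter for that output; `classify_disks` is a hypothetical name, and disks presented through a RAID controller or a hypervisor may not report this flag accurately.

```shell
#!/usr/bin/env bash
# Sketch: label each disk from `lsblk -d -n -o NAME,ROTA` output on stdin.
# ROTA=1 means the kernel reports the device as rotational (spinning).
classify_disks() {
  local name rota
  # "|| [ -n "$name" ]" also handles a last line without a trailing newline.
  while read -r name rota || [ -n "$name" ]; do
    [ -n "$name" ] || continue
    case "$rota" in
      1) echo "$name rotational" ;;
      0) echo "$name non-rotational" ;;
      *) echo "$name unknown" ;;
    esac
  done
}
```

Usage on each node: `lsblk -d -n -o NAME,ROTA | classify_disks`.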
Re: Timeout error in Logstash and how to increase the Logstash log level
Posted: Sun Jan 31, 2021 8:34 pm
by sacom01
Hi cdienger,
All 4 instances use 10,000 rpm SAS disks.
I have just sent you the system profiles from all 4 nodes.
Please help me review them.
Thank you.
Re: Timeout error in Logstash and how to increase the Logstash log level
Posted: Tue Feb 02, 2021 12:18 pm
by cdienger
There don't appear to be any throttling messages in the most recent log, but we do recommend SSDs for the best performance.
The problems seem to start around roughly the same time each day - 4am. Are there system backups or other tasks running around this time? What does the frequency and next run time look like under Admin > System > Command Subsystem?
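To check whether the failures really cluster around 4am, the Filebeat log can be bucketed by hour. A sketch, assuming each line starts with a timestamp in the format shown in the first post (2021-01-26T15:50:26.532+0700); hours are in the log's own time zone, and `timeouts_per_hour` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: count Filebeat "i/o timeout" publish failures per hour of day.
# Assumes lines start with an ISO-8601 timestamp such as
# 2021-01-26T15:50:26.532+0700 (field 1, split at the "T").
timeouts_per_hour() {
  grep 'i/o timeout' |
    awk '{ split($1, dt, "T"); print substr(dt[2], 1, 2) }' |
    sort | uniq -c |
    awk '{ printf "%s:00 %d\n", $2, $1 }'
}
```

For example `timeouts_per_hour < /var/log/filebeat/filebeat`; that path is a common default for package installs but may differ by Filebeat version and configuration.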
Re: Timeout error in Logstash and how to increase the Logstash log level
Posted: Thu Feb 04, 2021 10:22 pm
by sacom01
Hi cdienger,
The error appears several times a day, not only at 4am.
I have now removed the 2 DR-site servers, and the timeouts no longer appear on the client.
When I removed the 2 DR-site servers, I cleared all their data, but the cluster status is now red.
How can I make the cluster status green again?
And how can I build a DR site for disaster recovery without joining it to the DC site?
Thanks, cdienger.
Re: Timeout error in Logstash and how to increase the Logstash log level
Posted: Fri Feb 05, 2021 4:05 pm
by cdienger
Is the current status of the DC site red as well or is it ok?
The red status will occur when there is a primary shard that is unassigned. The fix is to either remove the index with the missing shard or to reassign the shard.
https://support.nagios.com/kb/article/n ... th-90.html has more information, but I'd like to get a profile from one of the machines in the DR cluster before you run anything.
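While waiting for that profile, a read-only check shows which indices are red, that is, which have at least one unassigned primary shard. A sketch, assuming Elasticsearch answers at http://localhost:9200 and supports `_cat/indices` with the `h=` column selector; `fetch_index_health` and `red_indices` are hypothetical names. It deliberately deletes and reroutes nothing.

```shell
#!/usr/bin/env bash
# Sketch: read-only listing of red indices. ES_URL default is an assumption.
ES_URL="${ES_URL:-http://localhost:9200}"

fetch_index_health() {
  # Only the two columns the parser needs.
  curl -s "$ES_URL/_cat/indices?h=health,index"
}

# Reads "health index" lines on stdin and prints the names of red indices.
red_indices() {
  awk '$1 == "red" { print $2 }'
}
```

Usage: `fetch_index_health | red_indices`, run on a node of the cluster in question.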