Timeout error in Logstash and increasing the Logstash log level

Post by **sacom01** » Tue Jan 26, 2021 5:01 am

Hi Team,

I send logs to Nagios Log Server with Filebeat.
Nagios Log Server regularly times out and stops receiving logs for a few minutes at a time.

Here is my Filebeat log:
2021-01-26T15:50:26.532+0700 INFO [publisher] pipeline/retry.go:215 retryer: send wait signal to consumer
2021-01-26T15:50:26.532+0700 ERROR [logstash] logstash/async.go:280 Failed to publish events caused by: read tcp x.x.x.x:41432->x.x.x.x:5012: i/o timeout
2021-01-26T15:50:26.532+0700 INFO [publisher] pipeline/retry.go:219 done
2021-01-26T15:50:26.540+0700 ERROR [logstash] logstash/async.go:280 Failed to publish events caused by: client is not connected
2021-01-26T15:50:28.456+0700 ERROR [publisher_pipeline_output] pipeline/output.go:181 failed to publish events: client is not connected

My Logstash input config:
beats {
    type => 'test_beat'
    port => 5000
    client_inactivity_timeout => 86400
}

Finally, how can I increase the Logstash log level in Nagios Log Server?

Thanks team.

Post by **cdienger** » Tue Jan 26, 2021 5:59 pm

After adding an input which defines a port, you need to make sure that the firewall allows that port. See https://assets.nagios.com/downloads/nag ... Inputs.pdf for commands to update the firewall.
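A minimal sketch of the firewall change, assuming firewalld with the `public` zone (the usual setup on CentOS/RHEL 7); the linked PDF has the authoritative commands. The port should be whichever one Filebeat actually targets: the input above listens on 5000, while the Filebeat error shows connections to 5012, so that is worth double-checking. `open_input_port` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: open a TCP port for a Logstash input with firewalld.
# ASSUMPTIONS: firewalld is the active firewall, the "public" zone applies,
# and the default port (5000) is taken from the beats input above.
open_input_port() {
  port="${1:-5000}"
  # Persist the rule, then reload so the running firewall picks it up.
  firewall-cmd --zone=public --add-port="${port}/tcp" --permanent &&
    firewall-cmd --reload
}
```

Run it as root, for example `open_input_port 5000`, and confirm with `firewall-cmd --zone=public --list-ports`.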

Steps for enabling debug logging for Logstash:

Edit /etc/init.d/logstash and change line 64 from:

DAEMON_OPTS="agent -f ${LS_CONF_DIR} -l ${LS_LOG_FILE} ${LS_OPTS}"

to:

DAEMON_OPTS="agent -f ${LS_CONF_DIR} -l ${LS_LOG_FILE} ${LS_OPTS} --debug"

and restart the service with:

systemctl daemon-reload
systemctl restart logstash

Let this run just long enough for NLS to process some events from this host, then collect the /var/log/logstash/logstash.log file before reverting the change.
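If hand-editing the init script is error-prone, the toggle can be scripted. This is a minimal sketch assuming GNU sed (for `-i`) and that `DAEMON_OPTS` sits on a single line ending in a closing double quote, as in the snippet above; `enable_debug` and `disable_debug` are hypothetical helpers, not part of Nagios Log Server.

```shell
#!/bin/sh
# Sketch: add or remove the --debug flag on the DAEMON_OPTS line.
# ASSUMPTIONS: GNU sed, DAEMON_OPTS on one line ending in a double quote.
INIT_SCRIPT="${INIT_SCRIPT:-/etc/init.d/logstash}"

enable_debug() {
  f="${1:-$INIT_SCRIPT}"
  # Keep one pristine backup; a second call must not overwrite it.
  [ -e "$f.bak" ] || cp -p "$f" "$f.bak"
  # Append --debug just inside the closing quote, only if not already there.
  grep -q -- '--debug"' "$f" ||
    sed -i 's|^\([[:space:]]*DAEMON_OPTS=".*\)"$|\1 --debug"|' "$f"
}

disable_debug() {
  f="${1:-$INIT_SCRIPT}"
  # Restore the original script saved by enable_debug.
  [ -e "$f.bak" ] && mv "$f.bak" "$f"
}
```

A typical cycle: `enable_debug`, then `systemctl daemon-reload` and `systemctl restart logstash`; let events arrive, copy /var/log/logstash/logstash.log somewhere safe, then `disable_debug` and run the same two systemctl commands again.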

Post by **sacom01** » Wed Jan 27, 2021 2:11 am

Hi cdienger,

I stopped firewalld, but it still times out.

Logstash log:
{:timestamp=>"2021-01-27T13:01:34.189000+0700", :message=>"retrying failed action with response code: 503 (UnavailableShardsException[[logstash-2021.01.27][1] Primary shard is not active or isn't assigned to a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@4223ff95])", :level=>:info}
{:timestamp=>"2021-01-27T13:01:34.189000+0700", :message=>"retrying failed action with response code: 503 (UnavailableShardsException[[logstash-2021.01.27][0] Primary shard is not active or isn't assigned to a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@7e183bc1])", :level=>:info}

Elasticsearch log:
[2021-01-27 13:01:34,150][DEBUG][action.bulk ] [7744e59b-c59e-4e67-923d-e763d7b4c2e8] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
[2021-01-27 13:01:34,150][DEBUG][action.bulk ] [7744e59b-c59e-4e67-923d-e763d7b4c2e8] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]

Please help me fix this error.

Thanks cdienger

Post by **cdienger** » Thu Jan 28, 2021 10:36 am

Please provide a profile from the system. It can be gathered under Admin > System > System Status > Download System Profile or from the command line with:

/usr/local/nagioslogserver/scripts/profile.sh

This will create /tmp/system-profile.tar.gz.

Note that this file can be very large and may not be able to be uploaded through the system. You can split the file into smaller files with the split command on the NLS (or other Linux machine) command line:

split -b 5000000 /tmp/system-profile.tar.gz system-profile- -d

The above command will split the system-profile.tar.gz into 5MB segments and save them to files with the naming convention system-profile-nn.
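For completeness, a sketch of both sides: splitting for upload and rejoining on the receiving end. Because `split -d` numbers the parts -00, -01, and so on, a sorted glob concatenates them back in order, and `tar -tzf` is a quick integrity check. The function names are hypothetical; running `sha256sum` on the original and the rejoined file is a stronger check if both sides are available.

```shell
#!/bin/sh
# Sketch: split a profile archive into 5 MB parts and rejoin them later.
# Parts are written to the current directory as system-profile-00, -01, ...
split_profile() {
  # $1: archive to split, e.g. /tmp/system-profile.tar.gz
  split -b 5000000 -d "$1" system-profile-
}

join_profile() {
  # $1: output archive; run in the directory holding the parts.
  # The glob sorts -00, -01, ... so cat restores the original byte order.
  cat system-profile-[0-9]* > "$1" &&
    tar -tzf "$1" > /dev/null && echo "archive OK: $1"
}
```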

Send this to me via a private message.

Post by **sacom01** » Thu Jan 28, 2021 10:38 pm

Hi cdienger,

My system has 4 instances: 2 in the DC site and 2 in the DR site.
So I will send you 2 attached files, one from each site.

Please help me review my system, because the timeouts still appear every day.

Thank you.

Post by **cdienger** » Fri Jan 29, 2021 4:58 pm

I received two profiles and would like to get profiles from the other two nodes as well.

At least one node appears to be having issues writing to the database quickly and is having to throttle indexing. Are the NLS machines using SSD or spinning disks?
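One quick way to answer the SSD question on each node is the kernel's rotational flag, which `lsblk` reports in its ROTA column (1 = rotational). This sketch only formats that output; note that on VMs or SAN-backed volumes the kernel may not know the real media, so treat the result as a hint. `classify_disks` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: label block devices as spinning or non-rotational from lsblk output.
# stdin: "NAME ROTA" pairs, as printed by: lsblk -d -n -o NAME,ROTA
classify_disks() {
  awk '{ print $1, ($2 == 1 ? "spinning" : "non-rotational") }'
}
```

Usage: `lsblk -d -n -o NAME,ROTA | classify_disks`.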

Post by **sacom01** » Sun Jan 31, 2021 8:34 pm

Hi cdienger,

All 4 instances use 10,000 rpm SAS disks.
I have just sent you the system profiles from all 4 nodes.
Please help me review them.

Thank you.

Post by **cdienger** » Tue Feb 02, 2021 12:18 pm

There don't appear to be any throttling messages in the most recent log, but we do recommend SSDs for the best performance.

The problems seem to start at roughly the same time each day: around 4am. Are there system backups or other tasks running around this time? What does the frequency and next run time look like under Admin > System > Command Subsystem?
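To check whether the timeouts really cluster at one time of day, the Filebeat errors can be bucketed by hour. A minimal sketch, assuming each line starts with an ISO-8601 timestamp like the Filebeat log in the first post (2021-01-26T15:50:26.532+0700), so the hour is characters 12 and 13. The log path in the usage line is an assumption (a common package default); `timeouts_by_hour` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: count Filebeat "i/o timeout" errors per hour of day.
# Reads the files given as arguments, or stdin if none are given.
timeouts_by_hour() {
  awk '/i\/o timeout/ { n[substr($1, 12, 2)]++ }
       END { for (h in n) print h, n[h] }' "$@" | sort
}
```

Usage (assumed path): `timeouts_by_hour /var/log/filebeat/filebeat`. A single dominant hour points at a scheduled job; an even spread points elsewhere.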

Post by **sacom01** » Thu Feb 04, 2021 10:22 pm

Hi Cdienger,

The error appears several times a day, not only at 4am.
After I removed the 2 DR site servers, the timeouts on the client stopped.
When I removed the 2 DR site servers, I cleared all their data, but the cluster status is now red.
How can I make the cluster status green again?
And how can I build a DR site for "disaster recovery" without joining it to the DC site?

Thanks Cdienger.

Post by **cdienger** » Fri Feb 05, 2021 4:05 pm

Is the current status of the DC site red as well or is it ok?

The red status will occur when there is a primary shard that is unassigned. The fix is to either remove the index with the missing shard or to reassign the shard. https://support.nagios.com/kb/article/n ... th-90.html has more information, but I'd like to get a profile from one of the machines in the DR cluster before you run anything.
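Before changing anything, it can help to see which primary shards are unassigned. This sketch is read-only (two GET requests, no deletes or reassignments), so it is consistent with waiting for the profile first. It assumes Elasticsearch answers on localhost:9200 (override `ES_URL` otherwise) and that `_cat/shards` prints columns in the order index, shard, prirep, state. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: read-only look at why the cluster is red. Nothing is modified.
# ASSUMPTION: Elasticsearch listens on localhost:9200 unless ES_URL is set.
ES_URL="${ES_URL:-http://localhost:9200}"

# stdin: _cat/shards text (index shard prirep state ...).
# Prints "index shard" for each primary shard that is UNASSIGNED.
unassigned_primaries() {
  awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2 }'
}

show_red_causes() {
  curl -s "$ES_URL/_cluster/health?pretty"
  curl -s "$ES_URL/_cat/shards" | unassigned_primaries
}
```

Source the file and call `show_red_causes`; the index names it prints are the candidates for the removal or reassignment described in the KB article.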

Nagios Support Forum
