Timeout error in logstash and increase log level logstash

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
sacom01
Posts: 194
Joined: Wed Dec 23, 2020 10:15 pm

Timeout error in logstash and increase log level logstash

Post by sacom01 »

Hi Team,

I send logs to Nagios Log Server with Filebeat.
The connection to Nagios Log Server times out regularly, and no logs are received for several minutes at a time.

Here is my Filebeat log:
2021-01-26T15:50:26.532+0700 INFO [publisher] pipeline/retry.go:215 retryer: send wait signal to consumer
2021-01-26T15:50:26.532+0700 ERROR [logstash] logstash/async.go:280 Failed to publish events caused by: read tcp x.x.x.x:41432->x.x.x.x:5012: i/o timeout
2021-01-26T15:50:26.532+0700 INFO [publisher] pipeline/retry.go:219 done
2021-01-26T15:50:26.540+0700 ERROR [logstash] logstash/async.go:280 Failed to publish events caused by: client is not connected
2021-01-26T15:50:28.456+0700 ERROR [publisher_pipeline_output] pipeline/output.go:181 failed to publish events: client is not connected

My logstash input config:
beats {
  type => 'test_beat'
  port => 5000
  client_inactivity_timeout => 86400
}

Finally, how can I increase the Logstash log level in Nagios Log Server?

Thanks team.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Timeout error in logstash and increase log level logsta

Post by cdienger »

After adding an input which defines a port, you need to make sure that the firewall allows that port. See https://assets.nagios.com/downloads/nag ... Inputs.pdf for commands to update the firewall.
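Before changing firewall rules, it can help to confirm from the Filebeat host whether the port is reachable at all. The following is only a minimal sketch: `check_port` is a helper name made up for this example, and the host name in the usage comment is a placeholder. Note also that the Filebeat log earlier in this thread shows a connection to port 5012, while the posted beats input listens on 5000, so it is worth testing whichever port Filebeat is actually configured to use.

```shell
#!/usr/bin/env bash
# check_port HOST PORT: succeed if a TCP connection opens within 3 seconds.
# Relies on bash's built-in /dev/tcp redirection and coreutils' timeout,
# so no extra tools such as nc or telnet are required.
check_port() {
    local host="$1" port="$2"
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "open: ${host}:${port}"
    else
        echo "unreachable: ${host}:${port}"
        return 1
    fi
}

# Example (hypothetical host name; 5000 is the port from the beats input):
# check_port nls.example.com 5000
```

A result of "unreachable" only says the TCP connection could not be opened. It does not tell you whether a firewall, a stopped Logstash service, or a wrong port is the cause.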

Steps for enabling debug logging for logstash:

Edit /etc/init.d/logstash and change line 64 from:

DAEMON_OPTS="agent -f ${LS_CONF_DIR} -l ${LS_LOG_FILE} ${LS_OPTS}"
to:

DAEMON_OPTS="agent -f ${LS_CONF_DIR} -l ${LS_LOG_FILE} ${LS_OPTS} --debug"
and restart the service with:

systemctl daemon-reload
systemctl restart logstash
Let this run just long enough to allow NLS to process some events from this host and then collect the /var/log/logstash/logstash.log file before reverting the config back.
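If you prefer not to edit the file by hand, the change and its later revert could be scripted roughly as follows. This is a sketch under the assumption that the `DAEMON_OPTS` line ends with a closing double quote, as in the lines quoted above. The function names are invented for this example, and `sed` keeps a `.bak` copy of the file each time.

```shell
#!/usr/bin/env bash
# enable_debug FILE: append --debug to the DAEMON_OPTS line, keeping FILE.bak.
# Lines that already contain --debug are left alone, so it is safe to re-run.
enable_debug() {
    sed -i.bak '/DAEMON_OPTS=/{/--debug/!s/"$/ --debug"/;}' "$1"
}

# disable_debug FILE: remove the flag again once the log has been collected.
disable_debug() {
    sed -i.bak '/DAEMON_OPTS=/s/ --debug"$/"/' "$1"
}

# Example (run as root, then restart the service as described above):
# enable_debug /etc/init.d/logstash
```

After either function, the daemon-reload and restart commands above are still needed for the change to take effect.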
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
sacom01
Posts: 194
Joined: Wed Dec 23, 2020 10:15 pm

Re: Timeout error in logstash and increase log level logsta

Post by sacom01 »

Hi cdienger,

I stopped firewalld, but the connection still times out.

Logstash log:
{:timestamp=>"2021-01-27T13:01:34.189000+0700", :message=>"retrying failed action with response code: 503 (UnavailableShardsException[[logstash-2021.01.27][1] Primary shard is not active or isn't assigned to a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@4223ff95])", :level=>:info}
{:timestamp=>"2021-01-27T13:01:34.189000+0700", :message=>"retrying failed action with response code: 503 (UnavailableShardsException[[logstash-2021.01.27][0] Primary shard is not active or isn't assigned to a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@7e183bc1])", :level=>:info}

Elasticsearch log:
[2021-01-27 13:01:34,150][DEBUG][action.bulk ] [7744e59b-c59e-4e67-923d-e763d7b4c2e8] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
[2021-01-27 13:01:34,150][DEBUG][action.bulk ] [7744e59b-c59e-4e67-923d-e763d7b4c2e8] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]

Please help me fix this error.

Thanks cdienger
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Timeout error in logstash and increase log level logsta

Post by cdienger »

Please provide a profile from the system. It can be gathered under Admin > System > System Status > Download System Profile or from the command line with:

/usr/local/nagioslogserver/scripts/profile.sh
This will create /tmp/system-profile.tar.gz.

Note that this file can be very large and may be too big to upload through the system. You can split it into smaller files with the split command on the NLS (or another Linux machine) command line:

split -b 5000000 /tmp/system-profile.tar.gz system-profile- -d
The above command will split the system-profile.tar.gz into 5MB segments and save them to files with the naming convention system-profile-nn.

Send this to me via a private message.
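For completeness, the pieces can be put back together with `cat`, and comparing the result with the original is a quick way to confirm nothing was lost. The sketch below wraps both directions in two small functions. Their names and the `rebuilt.tar.gz` file name are made up for this example, and `split -d` assumes GNU coreutils, as in the command above.

```shell
#!/usr/bin/env bash
# split_profile SRC PREFIX [BYTES]: cut SRC into numbered pieces (PREFIX00, ...).
# BYTES defaults to 5000000, matching the command above.
split_profile() {
    split -b "${3:-5000000}" -d "$1" "$2"
}

# join_profile PREFIX OUT: concatenate the pieces in order and write OUT.
# Shell globbing sorts the two-digit suffixes correctly for up to 100 pieces.
join_profile() {
    cat "$1"[0-9][0-9] > "$2"
}

# Example:
# split_profile /tmp/system-profile.tar.gz system-profile-
# join_profile system-profile- rebuilt.tar.gz
# cmp /tmp/system-profile.tar.gz rebuilt.tar.gz && echo "pieces reassemble cleanly"
```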
sacom01
Posts: 194
Joined: Wed Dec 23, 2020 10:15 pm

Re: Timeout error in logstash and increase log level logsta

Post by sacom01 »

Hi cdienger,

My system has 4 instances: 2 in the DC site and 2 in the DR site.
So I will send you 2 attachments, one from each site.

Please help me review my system, because the timeouts still appear every day.

Thank you.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Timeout error in logstash and increase log level logsta

Post by cdienger »

I received two profiles and would like to get profiles from the other two nodes as well.

At least one node appears to be having issues writing to the database quickly and is having to throttle indexing. Are the NLS machines using SSD or spinning disks?
sacom01
Posts: 194
Joined: Wed Dec 23, 2020 10:15 pm

Re: Timeout error in logstash and increase log level logsta

Post by sacom01 »

Hi cdienger,

I use 10,000 rpm SAS disks for all 4 instances.
I have just sent you the system profiles from all 4 nodes.
Please help me review them.

Thank you.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Timeout error in logstash and increase log level logsta

Post by cdienger »

There don't appear to be any throttling messages in the most recent log, but we do recommend SSDs for the best performance.

The problems seem to start at roughly the same time each day, around 4am. Are there system backups or other tasks running around this time? What do the frequency and next run time look like under Admin > System > Command Subsystem?
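One rough way to cross-check the timing from the client side is to count Filebeat ERROR lines per hour. The sketch below assumes the log format shown in the first post (timestamp in the first field, level in the second). The function name and the log path in the usage comment are assumptions, not something taken from the profiles.

```shell
#!/usr/bin/env bash
# errors_per_hour: read Filebeat log lines on stdin and print
# "<date>T<hour> <count>" for every hour that contains ERROR entries.
errors_per_hour() {
    awk '$2 == "ERROR" { print substr($1, 1, 13) }' | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example (the log path is an assumption; adjust it to the Filebeat host):
# errors_per_hour < /var/log/filebeat/filebeat
```

If the counts cluster in the same hour every day, a scheduled job is a likely suspect. If they are spread across the day, the cause is probably elsewhere.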
sacom01
Posts: 194
Joined: Wed Dec 23, 2020 10:15 pm

Re: Timeout error in logstash and increase log level logsta

Post by sacom01 »

Hi Cdienger,

The error appears several times a day, not only at 4am.
I have now removed the 2 DR site servers, and the timeouts on the client no longer appear.
When I removed the 2 DR site servers, I cleared all data, but the cluster status is now red.
How can I make the cluster status green again?
And how can I build a DR site for disaster recovery without joining it to the DC site?

Thanks Cdienger.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Timeout error in logstash and increase log level logsta

Post by cdienger »

Is the current status of the DC site red as well or is it ok?

The red status will occur when there is a primary shard that is unassigned. The fix is to either remove the index with the missing shard or to reassign the shard. https://support.nagios.com/kb/article/n ... th-90.html has more information, but I'd like to get a profile from one of the machines in the DR cluster before you run anything.
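In the meantime, the cluster status and the affected shards can be inspected with read-only requests, which change nothing on the cluster. The sketch below assumes Elasticsearch answers on localhost:9200 on the NLS node and that `_cat/shards` prints index, shard, primary/replica flag and state in its first four columns. The function names are made up for this example.

```shell
#!/usr/bin/env bash
# Assumption: Elasticsearch answers on localhost:9200 on the NLS node.
ES_URL="${ES_URL:-http://localhost:9200}"

# health_status: read _cluster/health JSON on stdin, print green/yellow/red.
health_status() {
    sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}

# unassigned_primaries: read _cat/shards output on stdin and print
# "<index> <shard>" for each primary (p) shard in state UNASSIGNED.
unassigned_primaries() {
    awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2 }'
}

# Example (read-only requests; nothing is changed on the cluster):
# curl -s "$ES_URL/_cluster/health" | health_status
# curl -s "$ES_URL/_cat/shards" | unassigned_primaries
```

Any index listed by the second command is one that would need its shard reassigned or the index removed, as described above.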