Fluctuating Amount of Logs

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
tvoll
Posts: 39
Joined: Fri Aug 16, 2019 9:06 am

Fluctuating Amount of Logs

Post by tvoll »

I recently noticed a strange trend going on with my Nagios Log Server install. At random times and at random intervals, the server will decrease the log intake from the hundreds of thousands (Usually seeing 500k-700k logs) down to thousands (Usually around 1k-4k). The sources that produce these logs are consistent in their output, but the Log Server will display huge dips in activity.

In the Logstash logs I was seeing a message repeat itself: "Received an event that has a different character encoding than you configured." along with "expected_charset=>"UTF-8""
From what I have read elsewhere online, this could be due to a configuration issue within the Log Server's inputs. However, the examples I have found online do not fix the issue when I apply them to our server.

The only input I currently have is as follows:

Code: Select all

tcp {
  port => "5544"
  type => "syslog"
  codec => plain {
    charset => "ISO-8859-1"
  }
}
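
For readers unfamiliar with the charset warning quoted above, here is a minimal Python sketch of the general idea, not of Logstash's own code: bytes that are not valid in the expected charset fail to decode, while ISO-8859-1 accepts any byte value in Python. The `decodes_as` helper and the sample payload are hypothetical and only illustrate the mismatch.

```python
def decodes_as(raw: bytes, charset: str) -> bool:
    """Return True if the raw bytes can be decoded with the given charset."""
    try:
        raw.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical payload: 0xE9 is "é" in ISO-8859-1, but in UTF-8 it starts a
# multi-byte sequence that the following space character does not complete.
sample = b"user=Andr\xe9 action=login"

print("Valid UTF-8:     ", decodes_as(sample, "utf-8"))
print("Valid ISO-8859-1:", decodes_as(sample, "iso-8859-1"))
```

Because ISO-8859-1 maps every byte to some character, a successful decode with it does not prove the sender actually used that charset; it only means no decode error is raised.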
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Fluctuating Amount of Logs

Post by ssax »

Please attach a copy of your profile from Admin > System Status > Download System Profile so that we can review the logs.

Do you see any drops/errors on the NIC on the logserver or any of the ports/firewalls/IPS devices in the network path?

Code: Select all

ifconfig -a
ethtool -S INTERFACENAME
tvoll
Posts: 39
Joined: Fri Aug 16, 2019 9:06 am

Re: Fluctuating Amount of Logs

Post by tvoll »

I am not seeing any errors related to ports/firewall.
After running ethtool -S on the interface being used for our log server, I see this as the output:

Code: Select all

NIC statistics:
     rx_packets: 32760508
     tx_packets: 10867519
     rx_bytes: 43505349930
     tx_bytes: 714017812
     rx_broadcast: 0
     tx_broadcast: 0
     rx_multicast: 0
     tx_multicast: 0
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 0
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 4510
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 43505349930
     rx_csum_offload_good: 32674732
     rx_csum_offload_errors: 17
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0
I've attached the System Profile as well.

Support edit: System profile downloaded, and shared with team.
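
With this many counters it is easy to miss a non-zero one. The following is a rough Python sketch, assuming the "name: value" layout shown in the output above, that keeps only non-zero counters whose names look error-related. The function name, the keyword pattern, and the shortened sample are illustrative choices, not part of ethtool or Nagios Log Server.

```python
import re

# Assumed keywords that mark a counter as error-related; adjust as needed.
SUSPECT = re.compile(r"err|drop|miss|fail|collision|timeout", re.IGNORECASE)


def nonzero_problem_counters(ethtool_output: str) -> dict:
    """Return error-like counters from `ethtool -S` text that are above zero."""
    counters = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and anything without a numeric value.
        if not sep or not value.isdigit():
            continue
        count = int(value)
        if count > 0 and SUSPECT.search(name):
            counters[name] = count
    return counters


# Shortened excerpt of the output posted above.
sample = """NIC statistics:
     rx_packets: 32760508
     rx_errors: 0
     tx_dropped: 0
     rx_csum_offload_good: 32674732
     rx_csum_offload_errors: 17
     alloc_rx_buff_failed: 0
"""

print(nonzero_problem_counters(sample))
```

In the full output above, the only non-zero error counter is rx_csum_offload_errors at 17, against 32760508 received packets.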
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Fluctuating Amount of Logs

Post by cdienger »

It looks like it is having an issue with the characters in the SerialNo field that is being sent over. Are you able to view the logs on the device? I'd be curious what the logs look like there, as well as the raw data being sent. The raw data can be gathered with:

Code: Select all

yum -y install tcpdump
tcpdump -s 0 -i any port 5544 and host w.x.y.z -w output.pcap
where w.x.y.z is the IP address of a device that is generating these errors in the logs. Let this run long enough to capture some of this traffic, use CTRL+C to stop it, and then PM us the output.pcap file this creates.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
tvoll
Posts: 39
Joined: Fri Aug 16, 2019 9:06 am

Re: Fluctuating Amount of Logs

Post by tvoll »

Unfortunately, I am told "The extension pcap is not allowed." in both this post and in the PM.
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: Fluctuating Amount of Logs

Post by mbellerue »

Oh, maybe try zipping the file, or just taking the extension off.

Support edit: output.zip shared with team

Be sure to check out our Knowledgebase for helpful articles and solutions!
tvoll
Posts: 39
Joined: Fri Aug 16, 2019 9:06 am

Re: Fluctuating Amount of Logs

Post by tvoll »

mbellerue wrote:Oh, maybe try zipping the file, or just taking the extension off.

Support edit: output.zip shared with team
It has been a few days since the output was received.
Are there any updates?
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: Fluctuating Amount of Logs

Post by mbellerue »

We found that there are a number of Windows clients sending to the Syslog port (5544), when they should be sending to the Windows Event Log port (3515). As an example, 172.31.55.48 should be sending to 3515. You should double-check your Windows clients to make sure they are all sending to 3515.
tvoll
Posts: 39
Joined: Fri Aug 16, 2019 9:06 am

Re: Fluctuating Amount of Logs

Post by tvoll »

Made the change.
Logs went back up to the hundreds of thousands, and then recently dipped back down to the thousands again.
Issue still persists.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Fluctuating Amount of Logs

Post by cdienger »

Please gather a profile the next time you see a dip, as well as a screenshot highlighting the dip and the time it occurred. I'd like to see if I can find anything interesting being logged when there is a dip. Please also verify the time on the NLS machine with the "date" command, as well as the time and timezone on the machine running the browser used to access the NLS web UI (it adjusts according to the browser's time).

That said, I would also suggest increasing the memory that logstash is allocated per https://support.nagios.com/kb/article/n ... g-576.html - I don't see logstash crashing, but this may help its behavior.
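
On the time check mentioned above: since the web UI adjusts to the browser's time, the same instant can appear at different clock times depending on where it is viewed. A small Python sketch of that effect follows; the dip timestamp, the offsets, and the `as_seen_in` helper are hypothetical and not taken from the profile.

```python
from datetime import datetime, timedelta, timezone


def as_seen_in(utc_time: datetime, utc_offset_hours: float) -> datetime:
    """Render the same instant in a fixed-offset timezone."""
    return utc_time.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# Hypothetical dip time in UTC; replace with the time read from the graph.
dip_utc = datetime(2019, 9, 1, 14, 0, tzinfo=timezone.utc)

# Hypothetical offsets for the server and two different browser machines.
for offset in (0, -5, 2):
    local = as_seen_in(dip_utc, offset)
    print(f"UTC{offset:+d}: {local:%Y-%m-%d %H:%M}")
```

Noting the UTC offset alongside the screenshot makes it easier to line the dip up with entries in the logs.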