I recently noticed a strange trend with my Nagios Log Server install. At random times and intervals, the server's log intake drops from hundreds of thousands of logs (usually 500k-700k) to only thousands (usually around 1k-4k). The sources that produce these logs are consistent in their output, but Log Server shows huge dips in activity.
In the Logstash logs I keep seeing this message repeated: "Received an event that has a different character encoding than you configured.", along with expected_charset=>"UTF-8".
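To check whether these charset warnings line up with the dips, one option is to count them per hour. This is a minimal sketch under two assumptions: the default log path `/var/log/logstash/logstash.log` may differ on your install (pass the real path as the first argument), and each log line is assumed to carry an ISO-8601-style timestamp, since the function just takes the first date-plus-hour it finds on each matching line.

```shell
#!/usr/bin/env bash
# Count Logstash "different character encoding" warnings per hour.
# ASSUMPTION: default log path below; pass a different path as $1 if needed.
# ASSUMPTION: each line contains an ISO-8601-style timestamp (YYYY-MM-DDTHH...).
charset_warnings_by_hour() {
  local log="${1:-/var/log/logstash/logstash.log}"
  grep -F 'different character encoding than you configured' "$log" \
    | awk 'match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][T ][0-9][0-9]/) {
        # Keep only the date and hour of the first timestamp, e.g. "2018-06-01T10"
        print substr($0, RSTART, RLENGTH)
      }' \
    | LC_ALL=C sort | uniq -c
}
```

Each output line is a count followed by a date-hour bucket, which can be compared against the times the dips show up in the dashboard.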
From what I have read elsewhere online, this could be caused by a configuration issue in the Log Server's inputs. However, the fixes I found online do not resolve the issue when I apply them to our server.
I am not seeing any errors related to ports/firewall.
After running ethtool -S on the interface being used for our log server, I see this as the output:
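Because `ethtool -S` can print dozens of counters, it can help to show only the nonzero drop- and error-style counters, since those would point at packets being lost before they reach Logstash. A small sketch: counter names differ between NIC drivers, so the name patterns matched below (`drop`, `err`, `miss`, `fifo`, `over`) are an assumption, and `eth0` in the usage line is a placeholder interface name.

```shell
#!/usr/bin/env bash
# Filter `ethtool -S` output (read on stdin) down to nonzero drop/error counters.
# ASSUMPTION: counters whose names contain these substrings are the interesting ones;
# actual names vary by NIC driver.
nonzero_error_counters() {
  awk -F': *' 'NR > 1 && tolower($1) ~ /drop|err|miss|fifo|over/ && ($2 + 0) > 0 {
    sub(/^[ \t]+/, "", $1)   # strip the indentation ethtool puts before each name
    print $1 ": " $2
  }'
}
```

Usage: `ethtool -S eth0 | nonzero_error_counters`. Running it twice a few minutes apart shows whether any of these counters are still climbing.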
It looks like it is having an issue with the characters in the SerialNo field that is being sent over. Are you able to view the logs on the device? I'd be curious what the logs look like there, as well as the raw data being sent. The raw data can be gathered with:
yum -y install tcpdump
tcpdump -s 0 -i any port 5544 and host w.x.y.z -w output.pcap
where w.x.y.z is the IP address of a device that is generating these errors in the logs. Let this run long enough to capture some of this traffic, use CTRL+C to stop it, and then PM us the output.pcap file it creates.
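If you have the raw text of a few messages (for example, a log file copied off the sending device), you can check whether it is actually valid UTF-8, which is the charset the warning says the input expects. A minimal sketch, assuming GNU/glibc `iconv`, which exits with a non-zero status when it hits an invalid byte sequence; `raw_messages.txt` in the usage line is a hypothetical file name.

```shell
#!/usr/bin/env bash
# Print the line numbers of lines in a file that are NOT valid UTF-8.
# ASSUMPTION: GNU/glibc iconv, which fails on invalid UTF-8 input.
bad_utf8_lines() {
  local file="$1" n=0 line
  # IFS= and -r keep each line byte-for-byte; the extra test catches a last line without "\n"
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    # Converting UTF-8 to UTF-8 is a no-op for valid input and an error otherwise
    if ! printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
      echo "$n"
    fi
  done < "$file"
}
```

Usage: `bad_utf8_lines raw_messages.txt`. Any reported line can then be dumped byte by byte, e.g. `sed -n '2p' raw_messages.txt | od -c`, to see which characters (for instance in the SerialNo field) are not UTF-8.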
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
We found that a number of Windows clients are sending to the Syslog port (5544) when they should be sending to the Windows Event Log port (3515). For example, 172.31.55.48 should be sending to 3515. Please double-check your Windows clients to make sure they are all sending to 3515.
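To see which hosts are actually sending to the syslog input, you can list the distinct source addresses in a capture taken on that port (5544 in the tcpdump command earlier in this thread). A minimal sketch: it parses the text output of `tcpdump -nn`, takes the field after `IP` as the source, and strips the trailing `.PORT`. It only handles IPv4 lines, and which of the listed hosts are Windows machines still has to be checked by hand.

```shell
#!/usr/bin/env bash
# Read `tcpdump -nn` text output on stdin and print each distinct IPv4 source address once.
syslog_port_sources() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "IP") {
        src = $(i + 1)
        sub(/\.[0-9]+$/, "", src)   # drop the trailing ".PORT" from "a.b.c.d.PORT"
        print src
        break
      }
    }
  }' | LC_ALL=C sort -u
}
```

Usage: `tcpdump -nn -r output.pcap 'dst port 5544' | syslog_port_sources`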
Be sure to check out our Knowledgebase for helpful articles and solutions!
Please gather a profile the next time you see a dip, along with a screenshot highlighting the dip and the time it occurred. I'd like to see whether anything interesting is being logged when a dip happens. Please also verify the time on the NLS machine with the "date" command, as well as the time and timezone on the machine running the browser used to access the NLS web UI (the UI adjusts displayed times according to the browser's timezone).
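When comparing clocks, printing both local time and UTC makes skew and timezone mismatches easy to spot, because the UTC value should agree across machines regardless of their timezone settings. A small sketch using only standard `date` format options; on systemd-based systems, `timedatectl` also shows the configured timezone and whether the clock is NTP-synchronized.

```shell
#!/usr/bin/env bash
# Print the current time in local time (with zone name and UTC offset) and in UTC.
show_clock() {
  date '+local: %Y-%m-%d %H:%M:%S %Z (UTC%z)'
  date -u '+utc:   %Y-%m-%d %H:%M:%S'
}
```

Running `show_clock` on the NLS server and noting the UTC time on the browser machine at the same moment should show whether the two clocks disagree or only their timezones differ.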