On NLS client rsyslog/python/httpd processing stopped
Posted: Wed Mar 11, 2015 9:27 am
Nagios Log Server • 2015R1.3 (From VM Template)
We had a strange thing happen yesterday. Two systems, a primary and a failover, which are configured to send all logs (system and httpd) to NLS, pretty much stopped processing. They mainly run Python WSGI processing. We spent some time trying to figure out what was going on: when the primary stopped responding to the varnishd cache server, Varnish would move over to the failover system, which would work for a few minutes and then bog down, and Varnish would fail back to the primary since it was now responding again... back and forth. This went on for 40 minutes or so until I decided to start backing out "extra" processing. First I shut off all Nagios XI checks, then moved the NLS rsyslog config files and restarted rsyslogd on the primary system. When it came back as primary, everything was working as it should: no slowdown, no errors, etc. So on the failover system I moved the NLS rsyslog config files, monitored the httpd access logs, and forced a failover. When we saw the system start to fail, I restarted rsyslog to implement the new config and all processing started back up. In the apache error_log, there were several stack trace entries from the restart of rsyslogd.
This configuration had been up and running since around Feb 25th, so it surprised us when removing it cleared up the problems. I have attached all the rsyslog config files used for the configuration. After we got things back up and running, I noticed that the NLS web interface was very slow to respond; in fact, I had to reboot the NLS to clear it up.
Code:
[Tue Mar 10 14:43:37 2015] [error] Traceback (most recent call last):
[Tue Mar 10 14:43:37 2015] [error] File "/usr/lib64/python2.6/logging/handlers.py", line 799, in emit
[Tue Mar 10 14:43:37 2015] [error] self._connect_unixsocket(self.address)
[Tue Mar 10 14:43:37 2015] [error] File "/usr/lib64/python2.6/logging/handlers.py", line 731, in _connect_unixsocket
[Tue Mar 10 14:43:37 2015] [error] self.socket.connect(address)
[Tue Mar 10 14:43:37 2015] [error] File "<string>", line 1, in connect
[Tue Mar 10 14:43:37 2015] [error] error: [Errno 2] No such file or directory
[Tue Mar 10 14:43:37 2015] [error] Traceback (most recent call last):
[Tue Mar 10 14:43:37 2015] [error] File "/usr/lib64/python2.6/logging/handlers.py", line 799, in emit
[Tue Mar 10 14:43:37 2015] [error] self._connect_unixsocket(self.address)
[Tue Mar 10 14:43:37 2015] [error] File "/usr/lib64/python2.6/logging/handlers.py", line 731, in _connect_unixsocket
[Tue Mar 10 14:43:37 2015] [error] self.socket.connect(address)
[Tue Mar 10 14:43:37 2015] [error] File "<string>", line 1, in connect
[Tue Mar 10 14:43:37 2015] [error] error: [Errno 2] No such file or directory
[Tue Mar 10 14:43:37 2015] [error] Traceback (most recent call last):
[Tue Mar 10 14:43:37 2015] [error] File "/usr/lib64/python2.6/logging/handlers.py", line 799, in emit
[Tue Mar 10 14:43:37 2015] [error] self._connect_unixsocket(self.address)
[Tue Mar 10 14:43:37 2015] [error] File "/usr/lib64/python2.6/logging/handlers.py", line 731, in _connect_unixsocket
[Tue Mar 10 14:43:37 2015] [error] self.socket.connect(address)
[Tue Mar 10 14:43:37 2015] [error] File "<string>", line 1, in connect
[Tue Mar 10 14:43:37 2015] [error] error: [Errno 2] No such file or directory
[Tue Mar 10 14:43:37 2015] [error] Traceback (most recent call last):
[Tue Mar 10 14:43:37 2015] [error] File "/usr/lib64/python2.6/logging/handlers.py", line 799, in emit
[Tue Mar 10 14:43:37 2015] [error] self._connect_unixsocket(self.address)
[Tue Mar 10 14:43:37 2015] [error] File "/usr/lib64/python2.6/logging/handlers.py", line 731, in _connect_unixsocket
[Tue Mar 10 14:43:37 2015] [error] self.socket.connect(address)
[Tue Mar 10 14:43:37 2015] [error] File "<string>", line 1, in connect
[Tue Mar 10 14:43:37 2015] [error] error: [Errno 2] No such file or directory
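For anyone reading the trace: the failing call is the plain socket connect inside the logging handler's _connect_unixsocket, and Errno 2 means the unix socket path did not exist at that moment. Here is a minimal sketch of that condition in Python. It assumes the handler was pointed at a unix socket that the syslog daemon creates (the actual path is not shown in the trace); the scratch path and the try_connect helper are hypothetical and only for illustration, and it uses a datagram socket only, which is a simplification of what the handler does.

```python
import errno
import os
import socket
import tempfile


def try_connect(path):
    """Connect a datagram unix socket to `path`.

    Returns None on success, or the errno of the failure.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sock.connect(path)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


# Hypothetical stand-in for the syslog socket, placed in a scratch directory.
workdir = tempfile.mkdtemp()
fake_log_socket = os.path.join(workdir, "log")

# 1. Nothing has created the socket yet (comparable to the daemon being down).
missing_result = try_connect(fake_log_socket)

# 2. A listener binds the path (comparable to the daemon being back up).
listener = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
listener.bind(fake_log_socket)
bound_result = try_connect(fake_log_socket)

# Clean up the scratch socket and directory.
listener.close()
os.unlink(fake_log_socket)
os.rmdir(workdir)

print("socket missing:", errno.errorcode.get(missing_result))
print("socket present:", bound_result)
```

The first attempt fails with ENOENT, the same error as in the trace, and the second succeeds once something is bound to the path. That would be consistent with the stack traces showing up only around the rsyslogd restart, but it does not by itself explain the slowdown before the restart.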
The system only allows 3 attachments, so I will attach the other 3 in a reply.
Client system OS: one is RHEL 6.4 and one is RHEL 6.6.
If other logs from NLS are wanted, let me know.
Thanks
Mitch