Nagios suddenly stopped sending logs

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
tcsdi
Posts: 46
Joined: Thu Jan 03, 2019 10:07 am

Nagios suddenly stopped sending logs

Post by tcsdi

Our Nagios Log Server suddenly stopped sending logs to our ELK SIEM. When we checked the status, the drop appeared to have started around March 8. I have also attached a screenshot of the history of the logs sent.

How can I determine whether my Nagios Log Server is functioning properly? It is still sending logs, but far fewer than it used to.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Nagios suddenly stopped sending logs

Post by npolovenko

Hello, @tcsdi. Please generate and send me a profile from each log server in the cluster. A profile can be generated under Admin > System > System Status or from the command line by running:
/usr/local/nagioslogserver/scripts/profile.sh
The profile can be found at /tmp/system-profile.tar.gz.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Nagios suddenly stopped sending logs

Post by tcsdi

Hi @npolovenko,

Thanks for the reply. Please see the attached logs :)
Re: Nagios suddenly stopped sending logs

Post by tcsdi

Hi @npolovenko,

Thanks for the reply. Here is the profile of the server :)
Re: Nagios suddenly stopped sending logs

Post by npolovenko

@tcsdi, thanks! I'm seeing that the Log Server indices have been in a critical state since January. It looks like insufficient RAM was the main reason Elasticsearch failed:
java.lang.OutOfMemoryError: Java heap space
I recommend increasing the RAM on the production system to at least 8 GB, then deleting the critical indices and restoring them from a backup:
https://support.nagios.com/kb/article.php?id=90
Re: Nagios suddenly stopped sending logs

Post by tcsdi

Hi @npolovenko,

We have now upgraded the RAM to 8 GB, but the server still stops after a few hours. I have attached another profile for you to look at.
Re: Nagios suddenly stopped sending logs

Post by npolovenko

@tcsdi, I'm not seeing anything out of the ordinary in the profile so far. Can you take a new screenshot of the Data Source graph?
Could you also clarify which filters are shown on your graph? It has many colors: is it a combined graph for all outputs, or for just one particular output?
Re: Nagios suddenly stopped sending logs

Post by tcsdi

Hi @npolovenko, please see the attached log for the list of all sources.
Re: Nagios suddenly stopped sending logs

Post by npolovenko

@tcsdi, I noticed that you're sending some logs twice to the same destination (172.31.108.236), once with each sourcehost value. For example:

Code:

if [type] =~ /(dnslog)/ {
    syslog {
        host => "172.31.108.236"
        port => 1523
        sourcehost => "10.5.115.106"
    }
}
if [type] =~ /(dnslog)/ {
    syslog {
        host => "172.31.108.236"
        port => 1523
        sourcehost => "10.5.115.107"
    }
}
Or:

Code:

if [type] =~ /(eventlog)/ {
    syslog {
        host => "172.31.108.236"
        port => 1522
        sourcehost => "10.5.115.106"
        codec => json {
            charset => 'CP1252'
        }
    }
}
if [type] =~ /(eventlog)/ {
    syslog {
        host => "172.31.108.236"
        port => 1522
        sourcehost => "10.5.115.107"
        codec => json {
            charset => 'CP1252'
        }
    }
}
I wonder whether that could be causing the issue. Are these two sourcehosts clustered? Could you disable the duplicate output filters for a while to see whether performance improves?
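For example, a temporary version of that test might look like the following minimal sketch. It assumes only these two pairs of outputs are changed and the rest of the output configuration stays as it is; keeping the 10.5.115.106 copies and commenting out the 10.5.115.107 copies is an arbitrary choice for illustration, not a recommendation of which one to keep.

```
# Temporary test (illustrative): send dnslog and eventlog events only once.
# The second syslog output for each type (sourcehost 10.5.115.107) is
# commented out rather than deleted, so it can be restored after the test.
if [type] =~ /(dnslog)/ {
    syslog {
        host => "172.31.108.236"
        port => 1523
        sourcehost => "10.5.115.106"
    }
    # syslog {
    #     host => "172.31.108.236"
    #     port => 1523
    #     sourcehost => "10.5.115.107"
    # }
}
if [type] =~ /(eventlog)/ {
    syslog {
        host => "172.31.108.236"
        port => 1522
        sourcehost => "10.5.115.106"
        codec => json {
            charset => 'CP1252'
        }
    }
    # syslog {
    #     host => "172.31.108.236"
    #     port => 1522
    #     sourcehost => "10.5.115.107"
    #     codec => json {
    #         charset => 'CP1252'
    #     }
    # }
}
```

Because both original conditions test the same [type], each pair of outputs can share a single if block; if the duplicates turn out to be required, they can be uncommented inside that one block instead of repeating the condition.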
Re: Nagios suddenly stopped sending logs

Post by tcsdi

Hi @npolovenko,

Yes, the two sourcehosts are clustered, but they have been configured that way from the beginning, including when the server was working fine. Is there any other configuration that could cause this issue?