Nagios suddenly stopped sending logs
Our Nagios Log Server suddenly stopped sending logs to our ELK SIEM. We noticed this around March 8, when we checked the status. I have also attached a screenshot of the history of the logs sent.
How can I determine whether my Nagios Log Server is functioning properly? It is still sending logs, but far fewer than it used to send.
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Nagios suddenly stopped sending logs
Hello, @tcsdi. Please generate and send me a profile from each log server in the cluster. A profile can be generated under Admin > System > System Status or in the command line by running:
/usr/local/nagioslogserver/scripts/profile.sh
The profile can be found at /tmp/system-profile.tar.gz.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Nagios suddenly stopped sending logs
Hi @npolovenko,
Thanks for the reply. Please see the attached logs.
Re: Nagios suddenly stopped sending logs
Hi @npolovenko,
Thanks for the reply. Here is the profile of the server.
npolovenko
Re: Nagios suddenly stopped sending logs
@tcsdi, thanks! I'm seeing that the Log Server indices have been critical since January. It seems that insufficient RAM was the biggest issue that caused Elasticsearch to fail:
java.lang.OutOfMemoryError: Java heap space
My recommendation would be to increase the RAM to at least 8 GB on the production system, then delete the critical indices and restore them from a backup:
https://support.nagios.com/kb/article.php?id=90
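Before deleting anything, it can help to list which indices are actually unhealthy. The following is a minimal sketch, not an official Nagios tool. It assumes that Elasticsearch on the Log Server node answers at http://localhost:9200, and that the "critical" indices correspond to Elasticsearch's `red` health state. The function names `fetch_cat_indices` and `red_indices` are hypothetical helpers made up for this illustration.

```python
import urllib.request
from typing import List


def fetch_cat_indices(base_url: str = "http://localhost:9200") -> str:
    """Fetch 'health status index' rows from the _cat/indices API.

    base_url is an assumption; adjust it to wherever Elasticsearch listens.
    """
    url = base_url + "/_cat/indices?h=health,status,index"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def red_indices(cat_output: str) -> List[str]:
    """Return the names of indices whose health column is 'red'."""
    names = []
    for line in cat_output.splitlines():
        parts = line.split()
        # Rows with fewer than three columns (e.g. closed indices that
        # report no health value) are skipped.
        if len(parts) >= 3 and parts[0] == "red":
            names.append(parts[2])
    return names
```

Calling `red_indices(fetch_cat_indices())` on the server would give the candidates to compare against the available backups before following the KB article.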
Re: Nagios suddenly stopped sending logs
Hi @npolovenko,
We have now upgraded the RAM to 8 GB, but the server still stops after a few hours. I have attached another profile for you to look at.
npolovenko
Re: Nagios suddenly stopped sending logs
@tcsdi, I'm not seeing anything out of the ordinary in the profile so far. Can you take a new screenshot of the Data Source graph?
Could you also clarify which filters are shown on your graph? I see that the graph has many colors. Is it an all-inclusive graph for all outputs or just for one particular output?
Re: Nagios suddenly stopped sending logs
Hi @npolovenko, please see the attached log for the list of all sources.
npolovenko
Re: Nagios suddenly stopped sending logs
@tcsdi, I noticed that you're sending some logs twice to two different destinations. For example:
Code:
if [type] =~ /(dnslog)/ {
  syslog {
    host => "172.31.108.236"
    port => 1523
    sourcehost => "10.5.115.106"
  }
}
if [type] =~ /(dnslog)/ {
  syslog {
    host => "172.31.108.236"
    port => 1523
    sourcehost => "10.5.115.107"
  }
}
Or:
Code:
if [type] =~ /(eventlog)/ {
  syslog {
    host => "172.31.108.236"
    port => 1522
    sourcehost => "10.5.115.106"
    codec => json {
      charset => 'CP1252'
    }
  }
}
if [type] =~ /(eventlog)/ {
  syslog {
    host => "172.31.108.236"
    port => 1522
    sourcehost => "10.5.115.107"
    codec => json {
      charset => 'CP1252'
    }
  }
}
I wonder if that could be causing issues. Are these two sourcehosts clustered? Could you disable the duplicate output filters for some time to see if the performance improves?
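To see how many outputs are doubled up in a larger configuration, the check can be scripted. This is a rough sketch only. It assumes the output section follows the same `if [type] =~ /(name)/ { syslog { ... } }` shape as the snippets in this thread, and it treats two blocks as duplicates when they share the same type, host, and port. `find_duplicate_outputs` is a hypothetical helper, not part of Logstash or Nagios Log Server.

```python
import re
from collections import defaultdict

# Matches the conditional that opens each output block, e.g. if [type] =~ /(dnslog)/ {
TYPE_RE = re.compile(r'if\s*\[type\]\s*=~\s*/\((\w+)\)/')
# Matches host, port, and sourcehost settings at the start of a line.
SETTING_RE = re.compile(r'^\s*(host|port|sourcehost)\s*=>\s*"?([^"\s]+)"?')


def find_duplicate_outputs(config_text):
    """Group syslog outputs by (type, host, port).

    Returns only the groups that appear more than once, mapped to the
    list of sourcehost values seen for each group.
    """
    groups = defaultdict(list)
    current = None

    def flush(block):
        # Record a finished block if it named both a host and a port.
        if block and "host" in block and "port" in block:
            key = (block["type"], block["host"], block["port"])
            groups[key].append(block.get("sourcehost"))

    for line in config_text.splitlines():
        type_match = TYPE_RE.search(line)
        if type_match:
            flush(current)
            current = {"type": type_match.group(1)}
            continue
        setting = SETTING_RE.match(line)
        if setting and current is not None:
            current.setdefault(setting.group(1), setting.group(2))
    flush(current)
    return {key: hosts for key, hosts in groups.items() if len(hosts) > 1}
```

Running it over the exported output configuration returns one entry per duplicated type, host, and port combination, which gives a short list of blocks to disable temporarily for the test.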
Re: Nagios suddenly stopped sending logs
Hi @npolovenko,
Yes, the two sourcehosts are clustered, but they have been configured that way since the beginning, even when the server was doing fine. Is there any other configuration that could cause the issue?