Nagios Support Forum

Posted: **Wed Oct 05, 2016 3:35 am**

Hello,

Is it normal that the RAM is fully used?

I have searched the forum and set ES_HEAP_SIZE to 4g,
but the RAM is still full. The image is attached.
My resources:
- 10 GB RAM
- Intel Xeon, 4 cores
- VMware ESXi VM
Please help me with my problem.

Best Regards,
Sven

Posted: **Wed Oct 05, 2016 9:13 am**

Yes, this is to be expected. Java is a memory hog and, from my understanding, pre-allocates memory to use at its own discretion. Beyond that, Nagios Log Server (NLS) will consume mostly RAM, because it caches your logs there to allow quick searching through all of them.
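Part of the "full" RAM that tools like `top` report on Linux is usually the kernel's page cache, which the kernel reclaims when applications need memory; Elasticsearch deliberately relies on that cache for fast searches. As a minimal Python sketch, assuming the standard `/proc/meminfo` format and a kernel that provides `MemAvailable` (added in Linux 3.14), the following separates memory that is genuinely unavailable from memory that is only cached. The function names are hypothetical helpers, not part of Nagios Log Server.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])  # first field is the value in kB
    return info


def memory_summary(text):
    """Return (total_kb, available_kb, cached_kb, pct_not_available).

    pct_not_available is based on the kernel's MemAvailable estimate, so
    reclaimable page cache is not counted as "used".
    """
    info = parse_meminfo(text)
    total = info["MemTotal"]
    available = info["MemAvailable"]
    cached = info.get("Cached", 0)
    pct_not_available = 100.0 * (total - available) / total
    return total, available, cached, pct_not_available
```

Usage on the server itself: `memory_summary(open("/proc/meminfo").read())`.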

A couple questions for you:
- How much data is held in all of your open indices?
- What are your current backup/maintenance settings for closing indices?

Posted: **Tue Oct 25, 2016 4:04 am**

Under Index Status, 1,685,549 documents are open. I think that is not much; I have 12 hosts that send logs to the log server.
My current backup settings are:
- Optimize indices older than 2 days
- Close indices older than 3 days
- Delete indices older than 30 days
- Delete backups older than 30 days
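To see which daily indices each of these rules would touch on a given day, the thresholds can be applied to index names. This is a minimal Python sketch, not how Nagios Log Server implements its maintenance jobs; it assumes daily indices are named `logstash-YYYY.MM.DD` (as in the `_cat/indices` output later in this thread) and that "older than N days" means an age strictly greater than N days. The action labels and function names are hypothetical.

```python
from datetime import date, datetime
from typing import Optional

# Age thresholds in days, copied from the maintenance settings above and
# ordered from most to least drastic. The action labels are illustrative only.
POLICY = [
    (30, "delete"),
    (3, "close"),
    (2, "optimize"),
]


def index_date(name: str) -> Optional[date]:
    """Return the date of a daily index such as 'logstash-2016.10.25'."""
    prefix, _, stamp = name.rpartition("-")
    if prefix != "logstash":
        return None  # e.g. 'kibana-int' or 'nagioslogserver_log'
    return datetime.strptime(stamp, "%Y.%m.%d").date()


def planned_action(name: str, today: date) -> Optional[str]:
    """Most drastic action whose 'older than N days' threshold is exceeded."""
    day = index_date(name)
    if day is None:
        return None  # not covered by the daily retention rules
    age = (today - day).days
    for threshold, action in POLICY:
        if age > threshold:
            return action
    return "keep open"
```

For example, on 2016-10-25 an index from 2016-10-21 (4 days old) falls under the "close" rule, while one from 2016-10-24 stays open.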

I have 10 GB of RAM, and it is fully used by Nagios Log Server. Nagios Log Server is also using swap (3 MB of 3 GB).

I thought there was a way to limit the RAM used by Nagios Log Server?

Greetings,
Sven

Posted: **Tue Oct 25, 2016 10:50 am**

Sven.Z wrote: I thought there is a way to limited the main RAM for Nagios Logserver???

You can hard-code memory/heap limits in Java for Elasticsearch and Logstash, but we do not recommend this.

If you are unsatisfied with the amount of memory your machine is using, your primary options are to either increase the available physical memory (Java does not use swap very effectively) or reduce the number of open indices. The open indices are what consume the most RAM, since open indices are kept in RAM for quick searching/processing. Elasticsearch is, by design, a memory hog.

Posted: **Thu Dec 08, 2016 2:51 am**

Hello,

I can't reach my Nagios Log Server instance because the RAM is full. I need a workaround to limit the RAM, or another tip. The system is only reachable over SSH.
I have attached the output of the top command.
Please help me with my problem. This Log Server instance is not stable with these settings.
After this uptime:
08:50:38 up 37 days, 22:09, 1 user, load average: 0,62, 0,35, 0,17
the instance is unreachable over the web front end.
I have limited the open indices to 1 day: indices are closed after 1 day and backed up after 3 days, and backups are deleted after 10 days.

I have a 4-core ESXi host with 12 GB RAM.

EDIT: I think there is a problem with the httpd service. I restarted the Logstash and Elasticsearch services, but the web front end was still unreachable. Then I restarted httpd, and the machine did not respond. I had to stop Logstash and Elasticsearch, then stop httpd, and then start all services again. This is only for information.

Here is the workflow:
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
Active: active (running) since Mo 2016-10-31 10:41:43 CET; 1 months 7 days ago
Docs: man:httpd(8)
man:apachectl(8)
Process: 23170 ExecReload=/usr/sbin/httpd $OPTIONS -k graceful (code=exited, status=0/SUCCESS)
Main PID: 858 (httpd)
Status: "Total requests: 0; Current requests/sec: 0; Current traffic: 0 B/sec"
CGroup: /system.slice/httpd.service
└─858 /usr/sbin/httpd -DFOREGROUND

Nov 25 14:31:00 *****: apache : TTY=unk...
Nov 25 14:31:00 *****: apache : TTY=unk...
Nov 28 03:44:01 *****: AH00557: httpd: a...
Nov 28 03:44:01 *****: AH00558: httpd: C...
Nov 28 03:44:02*****: Reloaded The Apache...
Dez 08 08:14:46 *****: apache : TTY=unk...
Dez 08 08:14:52 *****: apache : TTY=unk...
Dez 08 08:14:52 *****: apache : TTY=unk...
Dez 08 08:14:53 *****: apache : TTY=unk...
Dez 08 08:14:57 *****: apache : TTY=unk...
Hint: Some lines were ellipsized, use -l to show in full.
(***** masks the FQDN.)
[root@nagioslogserver ~]# service httpd restart
Redirecting to /bin/systemctl restart httpd.service
^C
[root@nagioslogserver ~]# service httpd stop
Redirecting to /bin/systemctl stop httpd.service
^C
[root@nagioslogserver ~]# service logstash stop
Stopping logstash (via systemctl): [ OK ]
[root@nagioslogserver ~]# service elasticsearch stop
Stopping elasticsearch (via systemctl): [ OK ]
[root@nagioslogserver ~]# service httpd stop
Redirecting to /bin/systemctl stop httpd.service
[root@nagioslogserver ~]# service httpd start
Redirecting to /bin/systemctl start httpd.service
[root@nagioslogserver ~]# service elasticsearch start
Starting elasticsearch (via systemctl): [ OK ]
[root@nagioslogserver ~]# service logstash start
Starting logstash (via systemctl): [ OK ]
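The recovery sequence above (stop Logstash and Elasticsearch, then httpd, and start them again in reverse order) can be captured in a small script. This is a hypothetical Python sketch, not a Nagios-provided tool; it assumes the service names `logstash`, `elasticsearch` and `httpd` accepted by `systemctl` in the session above, and it defaults to a dry run that only prints the commands.

```python
import subprocess

# Stop order taken from the session above: data pipeline first, web front end last.
# Services are started again in the reverse order.
STOP_ORDER = ["logstash", "elasticsearch", "httpd"]


def restart_plan():
    """Return the systemctl commands, in order, as argument lists."""
    stops = [["systemctl", "stop", svc] for svc in STOP_ORDER]
    starts = [["systemctl", "start", svc] for svc in reversed(STOP_ORDER)]
    return stops + starts


def run_plan(dry_run=True):
    """Print the commands (dry run) or execute them; real runs need root."""
    for cmd in restart_plan():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Keeping the dry run as the default makes it easy to review the order before anything is actually stopped.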

Thanks
Sven

Posted: **Thu Dec 08, 2016 10:31 am**

Let's take a look at what indices are open, so we can figure out what can be closed off. Please run the following, and send over the output.

Code: Select all

curl 'localhost:9200/_cat/indices?v'

Also, please tell us exactly what you have modified in your ES configuration so we can understand that part too.

Posted: **Mon Dec 19, 2016 4:59 am**

Here is the result:

Code: Select all

ERROR: The requested URL could not be retrieved

The following error was encountered while trying to retrieve the URL: http://localhost:9200/_cat/indices?
Connection to 127.0.0.1 failed.
The system returned: (111) Connection refused
The remote host or network may be down. Please try the request again.
Generated Mon, 19 Dec 2016 09:43:23 GMT by proxy.**.de (squid/3.1.19)
I have configured the standard syslog port 514, which was only possible with root permissions. Then I disabled IPv6.
Then I set ES_HEAP_SIZE=4g and MAX_LOCKED_MEMORY=2 in /etc/sysconfig/elasticsearch.
I have upgraded the system to version 1.4.4.
And of course there is a proxy server for internet connectivity.

That's all.
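Elastic's general guidance is to give the Elasticsearch heap no more than about half of physical RAM and leave the rest to the operating system's filesystem cache. A minimal Python sketch, assuming `/etc/sysconfig/elasticsearch` uses plain `KEY=VALUE` lines and JVM-style sizes such as `4g` or `4096m`, can check what share of RAM the configured heap reserves. The helper names are hypothetical.

```python
import re

# JVM-style size suffixes (binary multiples); no suffix means bytes.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_sysconfig(text):
    """Parse KEY=VALUE lines, ignoring comments and blank lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip('"')
    return values


def heap_bytes(size):
    """Convert a JVM-style size such as '4g' or '4096m' to bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", size.strip().lower())
    if not match:
        raise ValueError(f"unrecognised heap size: {size!r}")
    number, unit = match.groups()
    return int(number) * UNITS.get(unit, 1)


def heap_share(text, ram_bytes):
    """Fraction of physical RAM reserved for the Elasticsearch heap."""
    heap = heap_bytes(parse_sysconfig(text)["ES_HEAP_SIZE"])
    return heap / ram_bytes
```

With ES_HEAP_SIZE=4g on a 12 GB host, `heap_share` returns about 0.33, i.e. below the half-of-RAM guideline.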

Posted: **Mon Dec 19, 2016 11:27 am**

rkennedy wrote: Let's take a look at what indices are open, so we can figure out what can be closed off. Please run the following, and send over the output.
Code: Select all
curl 'localhost:9200/_cat/indices?v'
Also, please tell us exactly what you have modified in your ES configuration so we can understand that part too.

Did you run this from the NLS machine? If so, you'll want to configure your proxy not to proxy traffic to localhost. The output should show us the size of all your open indices, which is what we need to see. Could you try again?

Here's what it shows on my system:

Code: Select all

[root@bob linux-nrpe-agent]# curl 'localhost:9200/_cat/indices?v'
health status index               pri rep docs.count docs.deleted store.size pri.store.size
       close  logstash-2016.10.16
yellow open   nagioslogserver_log   5   1     747357            0     64.1mb         64.1mb
       close  logstash-2016.08.25
       close  logstash-2016.10.20
       close  logstash-2016.10.21
       close  logstash-2016.10.10
       close  logstash-2016.10.31
yellow open   nagioslogserver       1   1        108            5    101.2kb        101.2kb
       close  logstash-2016.10.12
       close  logstash-2016.10.08
       close  logstash-2016.10.15
       close  logstash-2016.10.09
       close  logstash-2016.10.22
yellow open   logstash-2016.12.21   5   1       4058            0    996.3kb        996.3kb
       close  logstash-2016.08.31
       close  logstash-2016.10.11
       close  logstash-2016.08.22
       close  logstash-2016.08.26
       close  logstash-2016.10.29
       close  logstash-2016.10.24
       close  logstash-2016.10.28
       close  logstash-2016.10.17
       close  logstash-2016.10.19
       close  logstash-2016.10.13
       close  logstash-2016.08.28
       close  logstash-2016.10.25
       close  logstash-2016.10.06
       close  logstash-2016.11.19
       close  logstash-2016.10.30
       close  logstash-2016.08.23
       close  logstash-2016.08.24
       close  logstash-2016.10.18
       close  logstash-2016.10.27
       close  logstash-2016.09.01
       close  logstash-2016.10.26
yellow open   kibana-int            5   1         17            1    189.9kb        189.9kb
       close  logstash-2016.08.21
       close  logstash-2016.10.14
       close  logstash-2016.11.07
       close  logstash-2016.10.07
       close  logstash-2016.10.23
       close  logstash-2016.11.01

Posted: **Tue Dec 20, 2016 2:46 am**

Okay, here it is:

Code: Select all

[root@nagioslogserver ~]# curl 'localhost:9200/_cat/indices?v'
health status index               pri rep docs.count docs.deleted store.size pri.store.size
       close  logstash-2016.12.08
yellow open   kibana-int            5   1         11            0      106kb          106kb
       close  logstash-2016.12.16
yellow open   nagioslogserver_log   5   1    1091871            0     98.9mb         98.9mb
yellow open   logstash-2016.12.20   5   1      86712            0     36.9mb         36.9mb
yellow open   nagioslogserver       1   1         57            8     97.5kb         97.5kb
       close  logstash-2016.12.18
       close  logstash-2016.12.17
yellow open   logstash-2016.12.19   5   1     372750            0     76.7mb         76.7mb

The problem is not present at the moment. I think the indices are only informative while the RAM is full.
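To see how much data the open indices hold, the `store.size` column of the output above can be summed. A minimal Python sketch, assuming the column layout shown (closed indices print only status and name) and that `_cat` sizes use binary multiples (1kb = 1024 bytes); the function names are hypothetical.

```python
import re

# _cat APIs report sizes with binary multiples (1kb = 1024 bytes).
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4}


def size_to_bytes(text):
    """Convert a _cat size string such as '98.9mb' to bytes."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb|tb)", text)
    if not match:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return float(number) * UNITS[unit]


def open_index_sizes(cat_output):
    """Map each open index to its store.size in bytes.

    Expected columns: health status index pri rep docs.count docs.deleted
    store.size pri.store.size. The header and closed indices are skipped.
    """
    sizes = {}
    for line in cat_output.splitlines():
        cols = line.split()
        if len(cols) >= 9 and cols[1] == "open":
            sizes[cols[2]] = size_to_bytes(cols[7])
    return sizes
```

For the output above, `sum(open_index_sizes(text).values()) / 1024**2` gives about 213 MB across the five open indices.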

Posted: **Tue Dec 20, 2016 2:31 pm**

The problem may also have been due to localhost being proxied. I can see where that would cause issues with the front-end of Nagios Log Server.

It sounds like the original problem has been resolved then. Did you have additional questions regarding this issue?

Nagios Logserver RAM Problems