Re: Multiple (40+) poller cron jobs running

Posted: Mon Apr 27, 2015 10:19 am
by GhostRider2110
PM with file attached sent. Thanks

See-ya
Mitch

Re: Multiple (40+) poller cron jobs running

Posted: Mon Apr 27, 2015 1:19 pm
by jolson
Mitch,

I should have included this in my previous post - could you also run the following:
pending tasks:

Code: Select all

curl 'localhost:9200/_cat/pending_tasks?v'
see recovery:

Code: Select all

curl -XGET 'localhost:9200/_cat/recovery?v'
A tail of the jobs logs (a few minutes):

Code: Select all

tail -f /usr/local/nagioslogserver/var/jobs.log
check knapsack state:

Code: Select all

curl -XPOST 'http://localhost:9200/_export/state'
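For convenience, the three Elasticsearch checks above could be collected in one pass. This is only a sketch: `collect_es_diagnostics`, `ES_URL` and `CURL` are made-up names for illustration, not part of Nagios Log Server, and the output file names are arbitrary.

```shell
# Sketch only: ES_URL, CURL and collect_es_diagnostics are illustrative names,
# not Nagios Log Server settings.
ES_URL="${ES_URL:-localhost:9200}"   # same host and port as the commands above
CURL="${CURL:-curl}"                 # overridable so the function can be dry-run

# Save the output of the three Elasticsearch checks into directory $1.
collect_es_diagnostics() {
    mkdir -p "$1" || return 1
    "$CURL" -s "$ES_URL/_cat/pending_tasks?v" > "$1/pending_tasks.txt"
    "$CURL" -s -XGET "$ES_URL/_cat/recovery?v" > "$1/recovery.txt"
    "$CURL" -s -XPOST "$ES_URL/_export/state" > "$1/export_state.json"
}
```

Calling `collect_es_diagnostics /tmp/es-diag` would leave three files that can be attached to a reply. The jobs.log tail is left out because `tail -f` does not return on its own.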
Best,


Jesse

Re: Multiple (40+) poller cron jobs running

Posted: Tue Apr 28, 2015 9:14 am
by GhostRider2110
Pending Tasks:

Code: Select all

insertOrder timeInQueue priority source
Recovery:
recovery.txt (file attached)

Code: Select all

[root@IGAnagioslog ~]# tail -f /usr/local/nagioslogserver/var/jobs.log
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
knapsack state:

Code: Select all

{"count":1,"states":[{"mode":"export","started":"2015-04-21T17:06:47.142Z","path":"file:///store/backups/nagioslogserver/1429636007/kibana-int.tar.gz","node_name":"bb8f313e-98b6-4e1d-8ac4-19e6421ac511"}]}
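Worth noting in the state above: the one remaining export reports `started` as 2015-04-21, a week before this post. A rough way to pull those timestamps out of the response, assuming the compact one-line JSON shown above (the function name `export_started` is made up for this sketch, and text matching like this is no substitute for a real JSON parser):

```shell
# Sketch: print the "started" timestamp of every export listed in a
# /_export/state response read from stdin. Assumes compact JSON as above.
export_started() {
    grep -o '"started":"[^"]*"' | cut -d'"' -f4
}

# Example with the response posted above, trimmed to the relevant fields.
state='{"count":1,"states":[{"mode":"export","started":"2015-04-21T17:06:47.142Z"}]}'
printf '%s\n' "$state" | export_started   # prints 2015-04-21T17:06:47.142Z
```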
See-ya
Mitch

Re: Multiple (40+) poller cron jobs running

Posted: Tue Apr 28, 2015 9:15 am
by GhostRider2110
ps -ef is now showing:

Code: Select all

nagios    1640  1638 16 Apr16 ?        1-23:21:14 /usr/bin/java -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xmx500m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true
root      1690     1  0 Apr16 tty1     00:00:00 /sbin/mingetty /dev/tty1
root      1692     1  0 Apr16 tty2     00:00:00 /sbin/mingetty /dev/tty2
root      1694     1  0 Apr16 tty3     00:00:00 /sbin/mingetty /dev/tty3
root      1696     1  0 Apr16 tty4     00:00:00 /sbin/mingetty /dev/tty4
root      1698     1  0 Apr16 tty5     00:00:00 /sbin/mingetty /dev/tty5
root      1701   457  0 Apr16 ?        00:00:00 /sbin/udevd -d
root      1702   457  0 Apr16 ?        00:00:00 /sbin/udevd -d
root      1703     1  0 Apr16 tty6     00:00:00 /sbin/mingetty /dev/tty6
root      1716     2  0 Apr16 ?        00:04:49 [flush-253:0]
root      2733  1613  0 Apr21 ?        00:00:00 CROND
nagios    2735  2733  0 Apr21 ?        00:00:00 /bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1
nagios    2738  2735  0 Apr21 ?        00:00:02 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs
nagios    2799  2738  0 Apr21 ?        00:01:38 /bin/sh /usr/local/nagioslogserver/scripts/create_backup.sh
root      9385  1506  0 09:56 ?        00:00:00 sshd: root@pts/1 
root      9387  9385  0 09:56 pts/1    00:00:00 -bash
apache   11539  1602  0 Apr26 ?        00:00:09 /usr/sbin/httpd
apache   11540  1602  0 Apr26 ?        00:00:09 /usr/sbin/httpd
apache   11541  1602  0 Apr26 ?        00:00:09 /usr/sbin/httpd
apache   11542  1602  0 Apr26 ?        00:00:08 /usr/sbin/httpd
apache   11543  1602  0 Apr26 ?        00:00:09 /usr/sbin/httpd
apache   11544  1602  0 Apr26 ?        00:00:09 /usr/sbin/httpd
apache   11545  1602  0 Apr26 ?        00:00:09 /usr/sbin/httpd
apache   11546  1602  0 Apr26 ?        00:00:09 /usr/sbin/httpd
apache   15651  1602  0 07:51 ?        00:00:01 /usr/sbin/httpd
root     17593  1613  0 10:14 ?        00:00:00 CROND
root     17594  1613  0 10:14 ?        00:00:00 CROND
nagios   17595 17594  0 10:14 ?        00:00:00 /bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1
nagios   17596 17595  0 10:14 ?        00:00:00 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller
nagios   17597 17593  0 10:14 ?        00:00:00 /bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1
nagios   17599 17597  0 10:14 ?        00:00:00 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs
nagios   17681  2799  0 10:14 ?        00:00:00 sleep 5
nagios   17682 20187  0 10:14 ?        00:00:00 sleep 5
nagios   17709 26852  0 10:14 ?        00:00:00 sleep 5
nagios   17716 55509  0 10:14 ?        00:00:00 sleep 5
nagios   17717 37393  0 10:14 ?        00:00:00 sleep 5
nagios   17721 22517  0 10:14 ?        00:00:00 sleep 5
nagios   17725 46589  0 10:14 ?        00:00:00 sleep 5
root     17726 63780  0 10:14 pts/0    00:00:00 ps -ef
root     20146  1613  0 Apr26 ?        00:00:00 CROND
nagios   20149 20146  0 Apr26 ?        00:00:00 /bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1
nagios   20150 20149  0 Apr26 ?        00:00:00 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs
nagios   20187 20150  0 Apr26 ?        00:00:26 /bin/sh /usr/local/nagioslogserver/scripts/create_backup.sh
root     22466  1613  0 Apr27 ?        00:00:00 CROND
nagios   22470 22466  0 Apr27 ?        00:00:00 /bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1
nagios   22471 22470  0 Apr27 ?        00:00:00 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs
nagios   22517 22471  0 Apr27 ?        00:00:12 /bin/sh /usr/local/nagioslogserver/scripts/create_backup.sh
root     26830  1613  0 Apr25 ?        00:00:00 CROND
nagios   26832 26830  0 Apr25 ?        00:00:00 /bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1
nagios   26833 26832  0 Apr25 ?        00:00:01 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs
nagios   26852 26833  0 Apr25 ?        00:00:39 /bin/sh /usr/local/nagioslogserver/scripts/create_backup.sh
root     37130  1613  0 Apr24 ?        00:00:00 CROND
nagios   37133 37130  0 Apr24 ?        00:00:00 /bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1
nagios   37135 37133  0 Apr24 ?        00:00:01 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs
nagios   37393 37135  0 Apr24 ?        00:00:55 /bin/sh /usr/local/nagioslogserver/scripts/create_backup.sh
root     46403  1613  0 Apr23 ?        00:00:00 CROND
nagios   46406 46403  0 Apr23 ?        00:00:00 /bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1
nagios   46408 46406  0 Apr23 ?        00:00:01 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs
nagios   46589 46408  0 Apr23 ?        00:01:07 /bin/sh /usr/local/nagioslogserver/scripts/create_backup.sh
root     55316  1613  0 Apr22 ?        00:00:00 CROND
nagios   55318 55316  0 Apr22 ?        00:00:00 /bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1
nagios   55320 55318  0 Apr22 ?        00:00:02 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs
nagios   55509 55320  0 Apr22 ?        00:01:21 /bin/sh /usr/local/nagioslogserver/scripts/create_backup.sh
root     63778  1506  0 Apr27 ?        00:00:00 sshd: root@pts/0 
root     63780 63778  0 Apr27 pts/0    00:00:00 -bash
Let me know when you want me to try to restart/reset
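To pick the stuck backup runs out of a long listing like this, the `ps -ef` output can be filtered. A small sketch (the function name `list_backup_procs` is invented here, and the column positions assume the `ps -ef` layout shown above):

```shell
# Sketch: read `ps -ef` output on stdin and print PID and start time
# (columns 2 and 5) of every create_backup.sh process.
list_backup_procs() {
    awk '/create_backup\.sh/ { print $2, $5 }'
}

# Typical use on the live system:
#   ps -ef | list_backup_procs
```

Applied to the listing above, this picks out seven create_backup.sh processes, one started on each day from Apr 21 to Apr 27.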

See-ya
Mitch

Re: Multiple (40+) poller cron jobs running

Posted: Tue Apr 28, 2015 5:14 pm
by jolson
Mitch,

I very much appreciate all of the information. I'll try to get some traction here and see what I can do. Feel free to restart your jobs if you need the backups up and working - but I would like to keep your machine in a 'broken' state so that we can troubleshoot this, since it's a problem that has been seen before. Of course, restarting elasticsearch isn't the solution we're looking for.

Can you think of anything significant that may have impacted or otherwise interrupted the backup process?

Re: Multiple (40+) poller cron jobs running

Posted: Wed Apr 29, 2015 7:21 am
by GhostRider2110
I have been looking at this. This does seem to be a different issue than earlier. Here it is just the backups that are not completing. Before, the poller was hanging.

If I am following the code in the backup script correctly, it seems to be hanging on the elasticsearch export part.

Code: Select all

# Wait for elasticsearch export jobs to finish...
echo "Waiting for backup. This may take a while."
count=3
while [[ $count -gt 0 ]]; do
	curl -s -XPOST 'http://localhost:9200/_export/state' > state.json
	count=$(python -m jsonselect.__main__ .count < state.json)
	echo -n "."
	sleep 5
done
There is only one sleep command in the code, and I have the same number of "sleep 5" processes as I do "create_backup.sh" processes.
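If the loop above is indeed where it hangs, it will spin for as long as Elasticsearch keeps reporting a non-zero export count, since nothing bounds the number of passes. Purely as an illustration (this is not the shipped `create_backup.sh`, and the names `get_export_count` and `wait_for_export` are invented for this sketch), a wait with an upper limit could look like this:

```shell
# Sketch only, not the shipped create_backup.sh. Function names are invented.

# Print the number of exports Elasticsearch reports as still running.
get_export_count() {
    curl -s -XPOST 'http://localhost:9200/_export/state' |
        sed -n 's/.*"count":\([0-9][0-9]*\).*/\1/p'
}

# wait_for_export MAX_CHECKS INTERVAL_SECONDS
# Returns 0 once no export is running, 1 if the limit is reached,
# 2 if the state could not be read.
wait_for_export() {
    checks=0
    while [ "$checks" -lt "$1" ]; do
        count=$(get_export_count)
        if [ -z "$count" ]; then
            echo "could not read export state" >&2
            return 2
        fi
        if [ "$count" -eq 0 ]; then
            return 0
        fi
        checks=$((checks + 1))
        sleep "$2"
    done
    echo "export still running after $1 checks" >&2
    return 1
}
```

With this, `wait_for_export 720 5` would give up after roughly an hour instead of leaving another waiting process behind each day.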

Also, what is strange: in the "Backup & Maintenance" screen I configured the repository "nagiosls1", which is defined as the /usr/local/nagioslogserver_bkuprep file system.
backup_Maint_Screen.png
So why would it use BACKUP_DIR="/store/backups/nagioslogserver" to place the backups?

Now I am just wondering if I should have started a new thread on this since I do believe it to be a different problem than what this thread started out with.

See-ya
Mitch

Re: Multiple (40+) poller cron jobs running

Posted: Wed Apr 29, 2015 10:49 am
by jolson
Mitch,

To be clear - the backup script that is hanging is /usr/local/nagioslogserver/scripts/create_backup.sh. This script controls configuration backup, not the backup of your actual logs. If you take a look at the filesizes of the backups in /store/backups/nagioslogserver/ you will see what I mean:

Code: Select all

[root@localhost ~]# ls -lh /store/backups/nagioslogserver/
-rw-r--r-- 1 nagios users 155K Apr  8 09:55 nagioslogserver.2015-04-07.1428419172.tar.gz
-rw-r--r-- 1 nagios users 300K Apr  8 10:06 nagioslogserver.2015-04-08.1428505577.tar.gz
-rw-r--r-- 1 nagios users 1.3M Apr 15 10:06 nagioslogserver.2015-04-15.1429110402.tar.gz
-rw-r--r-- 1 nagios users 1.5M Apr 16 10:06 nagioslogserver.2015-04-16.1429196807.tar.gz
-rw-r--r-- 1 nagios users 1.6M Apr 17 10:07 nagioslogserver.2015-04-17.1429283221.tar.gz
These backups are not controlled from the 'Backup and Maintenance' screen - they are pre-configured and automatic. For more information on this issue, another customer is experiencing similar symptoms: http://support.nagios.com/forum/viewtop ... 38&t=32218

I am having trouble reproducing this problem in lab, and due to that I can't open a bug report on it. I need to know how/why this is happening. Any chance you could collect the following information for me?

-The number of nodes in your cluster
-The number of logs coming in per 10 minutes (you can see this on your dashboard)
-Any custom logstash inputs/filters/outputs? If so, could you list them?
-Any customizations on the box you can think of? E.g. NRPE installed, non-default syslogger, etc.

Could you please post the results of these commands:

Code: Select all

cat /var/log/elasticsearch/*.log

Code: Select all

cat /var/log/httpd/error_log

Code: Select all

ls -lh /store/backups/nagioslogserver/
Thanks for your patience here Mitch, I'm hoping we can hunt whatever this is down and have it resolved soon.

Re: Multiple (40+) poller cron jobs running

Posted: Wed Apr 29, 2015 11:41 am
by GhostRider2110
jolson wrote:Mitch,

These backups are not controlled from the 'Backup and Maintenance' screen - they are pre-configured and automatic. For more information on this issue, another customer is experiencing similar symptoms: http://support.nagios.com/forum/viewtop ... 38&t=32218
Thanks for the explanation. Then I have another issue, since I'm not getting any backups in the repository I set up, but that is for another time.
jolson wrote:I am having trouble reproducing this problem in lab, and due to that I can't open a bug report on it. I need to know how/why this is happening. Any chance you could collect the following information for me?
-The amount of nodes in your cluster
Only 1
jolson wrote:-The amount of logs coming in per 10 minutes (you can see this on your dashboard)
count per 10m | (7442402 hits) Is that what you are asking for?
jolson wrote:-Any custom logstash inputs/filters/outputs? If so, could you list them?

Code: Select all

# 
# Logstash Configuration File
# Dynamically created by Nagios Log Server
#
# DO NOT EDIT THIS FILE. IT WILL BE OVERWRITTEN.
#
# Created Wed, 29 Apr 2015 12:39:20 -0400
#

#
# Global Configuration
#

input {
    syslog {
        type => 'syslog'
        port => 5544
    }
    tcp {
        type => 'eventlog'
        port => 3515
        codec => json {
            charset => 'CP1252'
        }
    }
    tcp {
        type => 'import_raw'
        tags => 'import_raw'
        port => 2056
    }
    tcp {
        type => 'import_json'
        tags => 'import_json'
        port => 2057
        codec => json
    }
    syslog {
        type => 'syslog'
        port => 514
    }
    syslog {
        type => 'asa'
        port => 6544
    }
}

filter {
    if [program] == 'apache_access' {
        grok {
            match => [ 'message', '%{COMBINEDAPACHELOG}']
        }
        date {
            match => [ 'timestamp', 'dd/MMM/yyyy:HH:mm:ss Z' ]
        }
        mutate {
            replace => [ 'type', 'apache_access' ]
            convert => [ 'bytes', 'integer' ]
            convert => [ 'response', 'integer' ]
        }
    }
     
    if [program] == 'apache_error' {
        grok {
            match => [ 'message', '\[(?<timestamp>%{DAY:day} %{MONTH:month} %{MONTHDAY} %{TIME} %{YEAR})\] \[%{WORD:class}\] \[%{WORD:originator} %{IP:clientip}\] %{GREEDYDATA:errmsg}']
        }
        mutate {
            replace => [ 'type', 'apache_error' ]
        }
    }
    if [program] == 'TrexSyncPubRep' {
        mutate {
            replace => [ 'type', 'TrexSyncPubRep' ]
        }
    }
    if [type] == 'asa' {
        grok {
            match => ['message', '%{SYSLOG5424PRI}%%{WORD:LogType}-%{INT:LogSeverity}-%{INT:LogMessageNumber}: Group = %{IPORHOST:Group}, Username = %{IPORHOST:username}, IP = %{IP:IPAddress}, Session disconnected. Session Type: %{WORD:SessionType}, Duration: %{CUSTOM1:DurationDays=[0-9]?}%{CUSTOM2=d? ?}%{INT:DurationHours:int}h:%{INT:DurationMinutes:int}m:%{INT:DurationSeconds:int}s, Bytes xmt: %{INT:BytesTransmitted:int}, Bytes rcv: %{INT:BytesReceived:int}, Reason: %{GREEDYDATA:Reason}']
        }
        geoip {
            source => "IPAddress"
        }
    }
    if [program] == 'apache_access' {
        geoip {
            source => 'clientip'
        }
    }
    if [program] == 'TrexSyncRep' {
        mutate {
            replace => [ 'type', 'TrexSyncRep' ]
        }
    }
    if [program] == 'Jupiter_log' {
        mutate {
            replace => [ 'type', 'Jupiter' ]
        }
    }
    if [program] == 'diablo_in1_video_management' {
        mutate {
            replace => [ 'type', 'diablo' ]
        }
    }
    if [program] == 'PUB_API_ACCESS' {
        mutate {
            replace => [ 'type', 'APIaccess' ]
        }
    }
    if [program] == 'sudo' {
        mutate {
            replace => [ 'type', 'sudo' ]
        }
    }
    if [program] == 'opt_lrms_logs_cmgopher' {
        mutate {
            replace => [ 'type', 'CMGopher_LRMS' ]
        }
    }
    if [program] == 'var_opt_lrms_log_uam' {
        mutate {
            replace => [ 'type', 'UAMGopher_LRMS' ]
        }
    }
    if [program] == 'opt_lrms_logs_uam' {
        mutate {
            replace => [ 'type', 'UAMGopher_LRMS' ]
        }
    }
    if [program] == 'opt_lrms_logs_cm' {
        mutate {
            replace => [ 'type', 'CM_LRMS' ]
        }
    }
    if [program] == 'Epsy_log' {
        mutate {
            replace => [ 'type', 'Epsy_log' ]
        }
    }
    if [program] == 'Wowzastream_access' {
        mutate {
            replace => [ 'type', 'wowzastream' ]
        }
    }
    if [program] == 'Wowzastream_error' {
        mutate {
            replace => [ 'type', 'wowzastream' ]
        }
    }
}

#
# Local Configuration
#
jolson wrote:-Any customizations on the box you can think of? E.g. NRPE installed, non-default syslogger, etc.
I do have the Nagios XI client installed on it and, except for OS security patches, nothing else I can think of. I did move the base install to a larger disk, but that was a while back and things have been working since then.
jolson wrote:Could you please post the results of these commands:

Code: Select all

cat /var/log/elasticsearch/*.log

Code: Select all

[root@IGAnagioslog local]# cat /var/log/elasticsearch/*.log
[2015-04-29 03:57:35,898][INFO ][cluster.metadata         ] [bb8f313e-98b6-4e1d-8ac4-19e6421ac511] [logstash-2015.04.29] update_mapping [apache_access] (dynamic)
[2015-04-29 05:14:40,084][INFO ][cluster.metadata         ] [bb8f313e-98b6-4e1d-8ac4-19e6421ac511] [logstash-2015.04.29] update_mapping [eventlog] (dynamic)
[2015-04-29 05:14:51,559][INFO ][cluster.metadata         ] [bb8f313e-98b6-4e1d-8ac4-19e6421ac511] [logstash-2015.04.29] update_mapping [eventlog] (dynamic)
[2015-04-29 05:14:53,163][INFO ][cluster.metadata         ] [bb8f313e-98b6-4e1d-8ac4-19e6421ac511] [logstash-2015.04.29] update_mapping [eventlog] (dynamic)
[2015-04-29 05:15:58,783][INFO ][cluster.metadata         ] [bb8f313e-98b6-4e1d-8ac4-19e6421ac511] [logstash-2015.04.29] update_mapping [eventlog] (dynamic)
[2015-04-29 05:17:32,378][INFO ][cluster.metadata         ] [bb8f313e-98b6-4e1d-8ac4-19e6421ac511] [logstash-2015.04.29] update_mapping [eventlog] (dynamic)
[2015-04-29 08:20:42,535][INFO ][cluster.metadata         ] [bb8f313e-98b6-4e1d-8ac4-19e6421ac511] [logstash-2015.04.29] update_mapping [eventlog] (dynamic)
[2015-04-29 08:20:43,490][INFO ][cluster.metadata         ] [bb8f313e-98b6-4e1d-8ac4-19e6421ac511] [logstash-2015.04.29] update_mapping [eventlog] (dynamic)
jolson wrote:

Code: Select all

cat /var/log/httpd/error_log

Code: Select all

[root@IGAnagioslog local]# cat /var/log/httpd/error_log
[Sun Apr 26 03:34:03 2015] [notice] Digest: generating secret for digest authentication ...
[Sun Apr 26 03:34:03 2015] [notice] Digest: done
[Sun Apr 26 03:34:03 2015] [notice] Apache/2.2.15 (Unix) DAV/2 PHP/5.3.3 configured -- resuming normal operations
jolson wrote:

Code: Select all

ls -lh /store/backups/nagioslogserver/

Code: Select all

[root@IGAnagioslog local]# ls -lh /store/backups/nagioslogserver/
total 451M
drwxrwxrwx 2 nagios users 4.0K Apr 21 13:06 1429636007
drwxrwxrwx 2 nagios users 4.0K Apr 22 13:06 1429722407
drwxrwxrwx 2 nagios users 4.0K Apr 23 13:06 1429808811
drwxrwxrwx 2 nagios users 4.0K Apr 24 13:06 1429895212
drwxrwxrwx 2 nagios users 4.0K Apr 25 13:07 1429981621
drwxrwxrwx 2 nagios users 4.0K Apr 26 13:07 1430068022
drwxrwxrwx 2 nagios users 4.0K Apr 27 13:07 1430154427
drwxrwxrwx 2 nagios users 4.0K Apr 28 13:07 1430240831
-rw-r--r-- 1 nagios users  21M Mar 27 13:06 nagioslogserver.2015-03-27.1427475951.tar.gz
-rw-r--r-- 1 nagios users 5.3M Mar 28 13:05 nagioslogserver.2015-03-28.1427562351.tar.gz
-rw-r--r-- 1 nagios users 5.6M Mar 29 13:05 nagioslogserver.2015-03-29.1427648752.tar.gz
-rw-r--r-- 1 nagios users  21M Mar 30 13:06 nagioslogserver.2015-03-30.1427735162.tar.gz
-rw-r--r-- 1 nagios users 5.6M Mar 31 13:06 nagioslogserver.2015-03-31.1427821562.tar.gz
-rw-r--r-- 1 nagios users  21M Apr  1 13:06 nagioslogserver.2015-04-01.1427907966.tar.gz
-rw-r--r-- 1 nagios users  21M Apr  2 13:06 nagioslogserver.2015-04-02.1427994366.tar.gz
-rw-r--r-- 1 nagios users  21M Apr  3 13:06 nagioslogserver.2015-04-03.1428080766.tar.gz
-rw-r--r-- 1 nagios users  21M Apr  4 13:06 nagioslogserver.2015-04-04.1428167167.tar.gz
-rw-r--r-- 1 nagios users  21M Apr  5 13:06 nagioslogserver.2015-04-05.1428253572.tar.gz
-rw-r--r-- 1 nagios users  21M Apr  6 13:06 nagioslogserver.2015-04-06.1428339977.tar.gz
-rw-r--r-- 1 nagios users  21M Apr  7 13:06 nagioslogserver.2015-04-07.1428426381.tar.gz
-rw-r--r-- 1 nagios users 5.6M Apr  8 13:06 nagioslogserver.2015-04-08.1428512781.tar.gz
-rw-r--r-- 1 nagios users  21M Apr  9 13:06 nagioslogserver.2015-04-09.1428599182.tar.gz
-rw-r--r-- 1 nagios users  21M Apr 10 13:06 nagioslogserver.2015-04-10.1428685587.tar.gz
-rw-r--r-- 1 nagios users  21M Apr 11 13:06 nagioslogserver.2015-04-11.1428771991.tar.gz
-rw-r--r-- 1 nagios users  21M Apr 12 13:06 nagioslogserver.2015-04-12.1428858392.tar.gz
-rw-r--r-- 1 nagios users  21M Apr 13 13:07 nagioslogserver.2015-04-13.1428944796.tar.gz
-rw-r--r-- 1 nagios users  21M Apr 14 13:07 nagioslogserver.2015-04-14.1429031196.tar.gz
-rw-r--r-- 1 nagios users  21M Apr 15 13:07 nagioslogserver.2015-04-15.1429117596.tar.gz
-rw-r--r-- 1 nagios users  21M Apr 16 13:07 nagioslogserver.2015-04-16.1429203997.tar.gz
-rw-r--r-- 1 nagios users  21M Apr 17 13:07 nagioslogserver.2015-04-17.1429290397.tar.gz
-rw-r--r-- 1 nagios users  21M Apr 18 13:07 nagioslogserver.2015-04-18.1429376797.tar.gz
-rw-r--r-- 1 nagios users  21M Apr 19 13:07 nagioslogserver.2015-04-19.1429463201.tar.gz
-rw-r--r-- 1 nagios users  21M Apr 20 13:07 nagioslogserver.2015-04-20.1429549602.tar.gz

[root@IGAnagioslog local]# df -kh
Filesystem                                  Size  Used  Avail  Use%  Mounted on
rootfs                                       99G  5.7G    92G    6%  /
devtmpfs                                    2.0G  208K   2.0G    1%  /dev
tmpfs                                       2.0G     0   2.0G    0%  /dev/shm
/dev/sda1                                    99G  5.7G    92G    6%  /
/dev/mapper/vg_nagioslog-lv_nagioslog_data  477G  256G   222G   54%  /usr/local/nagioslogserver
/dev/mapper/vg_nagioslog-lv_nagioslog_rep    49G   33M    49G    1%  /usr/local/nagioslogserver_bkuprep
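In the directory listing above, the last tarball is dated Apr 20, and each day since then has left only an epoch-named directory behind (eight of them, Apr 21 to Apr 28). The first, 1429636007, is the one named in the knapsack state posted earlier. A quick way to list such leftovers, as a sketch (`list_leftover_dirs` is an invented name, and treating every digit-named directory as an unfinished run is an assumption):

```shell
# Sketch: print the names of directories in $1 whose names start with a digit.
# Finished backups are nagioslogserver.*.tar.gz files and are skipped.
list_leftover_dirs() {
    for entry in "$1"/[0-9]*; do
        if [ -d "$entry" ]; then
            basename "$entry"
        fi
    done
}

# Typical use:
#   list_leftover_dirs /store/backups/nagioslogserver
```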

jolson wrote:Thanks for your patience here Mitch, I'm hoping we can hunt whatever this is down and have it resolved soon.

No problem.

See-ya
Mitch

Re: Multiple (40+) poller cron jobs running

Posted: Wed Apr 29, 2015 5:06 pm
by jolson
Mitch,

In the other thread, it was pointed out that the other customer's MAX_LOCKED_MEMORY variable was not set to 'unlimited'. Could you please cat your elasticsearch config file and show us what that variable is set to?

Code: Select all

cat /etc/sysconfig/elasticsearch
I would like you to change: MAX_LOCKED_MEMORY=X
to: MAX_LOCKED_MEMORY=unlimited

I know we discussed this change in a previous thread, but I'm not sure if it was implemented.

After this change, restart elasticsearch.

Code: Select all

service elasticsearch restart
I am ultimately looking for similarities between the two systems, and so far there is nothing that stands out. The MAX_LOCKED_MEMORY could be notable if yours was not set to unlimited as well.
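To check the setting without reading through the whole file, something like the following would do. It is a sketch: `sysconfig_value` is an invented helper, and it only reads plain `NAME=value` lines of the kind found in /etc/sysconfig/elasticsearch.

```shell
# Sketch: print the value of the last uncommented NAME=value line for
# variable $2 in file $1 (nothing is printed if the variable is not set).
sysconfig_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Typical use:
#   sysconfig_value /etc/sysconfig/elasticsearch MAX_LOCKED_MEMORY
```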

Re: Multiple (40+) poller cron jobs running

Posted: Thu Apr 30, 2015 7:31 am
by GhostRider2110
Sorry, it was set a while back:

Code: Select all

[root@IGAnagioslog nagioslogserver]# cat /etc/sysconfig/elasticsearch
# Directory where the Elasticsearch binary distribution resides
APP_DIR="/usr/local/nagioslogserver"
ES_HOME="$APP_DIR/elasticsearch"

# Heap Size (defaults to 256m min, 1g max)
ES_HEAP_SIZE=2g

# Heap new generation
#ES_HEAP_NEWSIZE=

# max direct memory
#ES_DIRECT_SIZE=

# Additional Java OPTS
#ES_JAVA_OPTS=

# Maximum number of open files
MAX_OPEN_FILES=65535

# Maximum amount of locked memory
MAX_LOCKED_MEMORY=unlimited

# Maximum number of VMA (Virtual Memory Areas) a process can own
MAX_MAP_COUNT=262144

# Elasticsearch log directory
LOG_DIR=/var/log/elasticsearch

# Elasticsearch data directory
DATA_DIR="$ES_HOME/data"

# Elasticsearch work directory
WORK_DIR="$APP_DIR/tmp/elasticsearch"

# Elasticsearch conf directory
CONF_DIR="$ES_HOME/config"

# Elasticsearch configuration file (elasticsearch.yml)
CONF_FILE="$ES_HOME/config/elasticsearch.yml"

# User to run as, change this to a specific elasticsearch user if possible
# Also make sure, this user can write into the log directories in case you change them
# This setting only works for the init script, but has to be configured separately for systemd startup
ES_USER=nagios
ES_GROUP=nagios

# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#RESTART_ON_UPGRADE=true

if [ "x$1" == "xstart" -o "x$1" == "xrestart" -o "x$1" == "xreload" -o "x$1" == "xforce-reload" ];then
	GET_ES_CONFIG_MESSAGE="$( php $APP_DIR/scripts/get_es_config.php )"
	GET_ES_CONFIG_RETURN=$?

	if [ "$GET_ES_CONFIG_RETURN" != "0" ]; then
		echo $GET_ES_CONFIG_MESSAGE
		exit 1
	else
		ES_JAVA_OPTS="$GET_ES_CONFIG_MESSAGE"
	fi
fi