
Re: Drop messages from central syslog server

Posted: Fri Sep 23, 2016 12:17 pm
by mcapra
It's hard to identify exactly where the failure is occurring, but at this point I'm fairly certain it's within the syslog-ng forwarder or the individual rsyslog shippers rather than Logstash (or any other NLS component).

It could still be that the NLS cluster is overloaded, but without messages in the Logstash log showing that traffic is being dropped, I'm not totally convinced. Nothing I've seen in the Logstash community indicates that Logstash would produce the "rate-limiting" error that rsyslog is printing, and everything in the rsyslog community indicates that "rate-limiting" errors point to the rsyslog configuration.
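If you want to put a number on how much rsyslog itself is discarding, a small sketch like the one below can tally the rate-limiting notices in a shipper's log. It assumes the notices use the imuxsock wording shown in the comments (other inputs, such as imjournal, phrase them differently), and `ratelimit_summary` is a hypothetical helper, not part of rsyslog or NLS.

```shell
#!/bin/sh
# Hypothetical helper: summarize rsyslog rate-limiting notices in log files
# (or stdin when no file is given).
# ASSUMPTION: notices use the imuxsock wording, for example
#   "imuxsock begins to drop messages from pid 1234 due to rate-limiting"
#   "imuxsock lost 567 messages from pid 1234 due to rate-limiting"
ratelimit_summary() {
  awk '
    /due to rate-limiting/ {
      notices++
      # Add up N from "... lost N messages ..." when the line reports it.
      for (i = 1; i < NF; i++)
        if ($i == "lost" && $(i + 1) ~ /^[0-9]+$/) lost += $(i + 1)
    }
    END { printf "notices=%d lost=%d\n", notices, lost }
  ' "$@"
}
```

For example, `ratelimit_summary /var/log/messages` on one of the rsyslog hosts; the log path is an assumption and varies by distribution.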

Let's see the outputs of:

df -h 
free -m
ps aux | grep java
curl -XGET 'http://localhost:9200/_nodes/?pretty'
curl -XGET 'http://localhost:9200/_cluster/health/*?level=shards&pretty'
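When only the headline numbers are needed, the plain `_cluster/health?pretty` response (without `level=shards`) is short enough to paste. This sketch pulls a few top-level fields out of that pretty-printed JSON with awk. `health_summary` is a hypothetical helper, and its line-based parsing assumes the one-key-per-line layout that `?pretty` produces, so it should not be fed the `level=shards` output, where nested `status` keys would also match.

```shell
#!/bin/sh
# Hypothetical helper: print key=value for a few headline fields of a
# pretty-printed _cluster/health response read on stdin.
# ASSUMPTION: one "key" : value pair per line, as ?pretty produces.
health_summary() {
  awk -F'"' '
    $2 ~ /^(status|number_of_nodes|initializing_shards|relocating_shards|unassigned_shards)$/ {
      val = $3                # numeric value, e.g.  : 2,
      if ($4 != "") val = $4  # quoted value, e.g. green
      gsub(/[ :,]/, "", val)  # strip the separator and trailing comma
      print $2 "=" val
    }'
}
```

Usage: `curl -s 'http://localhost:9200/_cluster/health?pretty' | health_summary`.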

Re: Drop messages from central syslog server

Posted: Fri Sep 23, 2016 1:16 pm
by krobertson71
Drive space and memory are well in the green. I had to attach the shard health output as it blew the character limit.
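As an alternative to attaching the full shard-level health output, `_cat/shards` prints one line per shard, and filtering out the `STARTED` ones leaves only the shards worth looking at. This is a sketch that assumes the columns are pinned with `h=index,shard,prirep,state` so the state is always the fourth field; `unhealthy_shards` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read _cat/shards output requested with
# h=index,shard,prirep,state and print only shards that are not STARTED.
unhealthy_shards() {
  # Column 4 is the shard state; blank lines are skipped.
  awk 'NF && $4 != "STARTED"'
}
```

Usage: `curl -s 'http://localhost:9200/_cat/shards?h=index,shard,prirep,state' | unhealthy_shards`. Empty output means every shard is started.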

ps aux | grep java

nagios    1759 18.2 53.3 67869128 19743780 ?   SLl  Sep21 529:04 /usr/bin/java -Xms18074m -Xmx18074m -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Des.cluster.name=907e60a9-dc29-411e-96e8-2dfe503e0867 -Des.node.name=b2733b10-233a-4593-9428-85145cd54c77 -Des.discovery.zen.ping.unicast.hosts=localhost,10.0.103.180,10.136.132.107 -Des.path.repo=/ -Delasticsearch -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid -Des.path.home=/usr/local/nagioslogserver/elasticsearch -cp :/usr/local/nagioslogserver/elasticsearch/lib/elasticsearch-1.6.0.jar:/usr/local/nagioslogserver/elasticsearch/lib/*:/usr/local/nagioslogserver/elasticsearch/lib/sigar/* -Des.default.path.home=/usr/local/nagioslogserver/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/usr/local/nagioslogserver/elasticsearch/data -Des.default.path.work=/usr/local/nagioslogserver/tmp/elasticsearch -Des.default.path.conf=/usr/local/nagioslogserver/elasticsearch/config org.elasticsearch.bootstrap.Elasticsearch
nagios    8803 12.1  1.4 4462948 526392 ?      SNsl 13:30   4:05 /usr/bin/java -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xmx500m -Xss2048k -Djffi.boot.library.path=/usr/local/nagioslogserver/logstash/vendor/jruby/lib/jni -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xbootclasspath/a:/usr/local/nagioslogserver/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/usr/local/nagioslogserver/logstash/vendor/jruby -Djruby.lib=/usr/local/nagioslogserver/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main --1.9 /usr/local/nagioslogserver/logstash/lib/bootstrap/environment.rb logstash/runner.rb agent -f /usr/local/nagioslogserver/logstash/etc/conf.d -l /var/log/logstash/logstash.log -w 4
curl -XGET 'http://localhost:9200/_nodes/?pretty'

curl -XGET 'http://localhost:9200/_nodes/?pretty'
{
  "cluster_name" : "907e60a9-dc29-411e-96e8-2dfe503e0867",
  "nodes" : {
    "lqgbDDScQKOsoLGmD69ibg" : {
      "name" : "11fe29cc-9353-4cc1-a368-14a0b6977937",
      "transport_address" : "inet[/10.136.132.107:9300]",
      "host" : "nagilgp02",
      "ip" : "10.136.132.107",
      "version" : "1.6.0",
      "build" : "cdd3ac4",
      "http_address" : "inet[localhost/127.0.0.1:9200]",
      "attributes" : {
        "max_local_storage_nodes" : "1"
      },
      "settings" : {
        "node" : {
          "max_local_storage_nodes" : "1",
          "name" : "11fe29cc-9353-4cc1-a368-14a0b6977937"
        },
        "bootstrap" : {
          "mlockall" : "true"
        },
        "client" : {
          "type" : "node"
        },
        "transport" : {
          "tcp" : {
            "compress" : "true"
          }
        },
        "http" : {
          "host" : "localhost"
        },
        "name" : "11fe29cc-9353-4cc1-a368-14a0b6977937",
        "pidfile" : "/var/run/elasticsearch/elasticsearch.pid",
        "path" : {
          "data" : "/usr/local/nagioslogserver/elasticsearch/data",
          "work" : "/usr/local/nagioslogserver/tmp/elasticsearch",
          "home" : "/usr/local/nagioslogserver/elasticsearch",
          "conf" : "/usr/local/nagioslogserver/elasticsearch/config",
          "logs" : "/var/log/elasticsearch",
          "repo" : "/"
        },
        "config" : {
          "ignore_system_properties" : "true"
        },
        "cluster" : {
          "name" : "907e60a9-dc29-411e-96e8-2dfe503e0867"
        },
        "discovery" : {
          "zen" : {
            "ping" : {
              "unicast" : {
                "hosts" : "localhost,10.0.103.180,10.136.132.107",
                "hosts.0" : "localhost"
              },
              "multicast" : {
                "enabled" : "false"
              }
            }
          }
        }
      },
      "os" : {
        "refresh_interval_in_millis" : 1000,
        "available_processors" : 6,
        "cpu" : {
          "vendor" : "Intel",
          "model" : "Xeon",
          "mhz" : 2493,
          "total_cores" : 6,
          "total_sockets" : 6,
          "cores_per_socket" : 1,
          "cache_size_in_bytes" : 25600
        },
        "mem" : {
          "total_in_bytes" : 37905866752
        },
        "swap" : {
          "total_in_bytes" : 2147479552
        }
      },
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 1772,
        "max_file_descriptors" : 65535,
        "mlockall" : true
      },
      "jvm" : {
        "pid" : 1772,
        "version" : "1.7.0_101",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "24.95-b01",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1470154194590,
        "mem" : {
          "heap_init_in_bytes" : 18951962624,
          "heap_max_in_bytes" : 18899664896,
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max_in_bytes" : 224395264,
          "direct_max_in_bytes" : 18899664896
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
      },
      "thread_pool" : {
        "generic" : {
          "type" : "cached",
          "keep_alive" : "30s",
          "queue_size" : -1
        },
        "index" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "200"
        },
        "fetch_shard_store" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 12,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "get" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "1k"
        },
        "snapshot" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "merge" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "suggest" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "1k"
        },
        "bulk" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "50"
        },
        "optimize" : {
          "type" : "fixed",
          "min" : 1,
          "max" : 1,
          "queue_size" : -1
        },
        "warmer" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "flush" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "search" : {
          "type" : "fixed",
          "min" : 10,
          "max" : 10,
          "queue_size" : "1k"
        },
        "fetch_shard_started" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 12,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "listener" : {
          "type" : "fixed",
          "min" : 3,
          "max" : 3,
          "queue_size" : -1
        },
        "percolate" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "1k"
        },
        "management" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "refresh" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        }
      },
      "network" : {
        "refresh_interval_in_millis" : 5000,
        "primary_interface" : {
          "address" : "10.136.132.107",
          "name" : "eth0",
          "mac_address" : "00:50:56:AD:36:1E"
        }
      },
      "transport" : {
        "bound_address" : "inet[/0:0:0:0:0:0:0:0%0:9300]",
        "publish_address" : "inet[/10.136.132.107:9300]",
        "profiles" : { }
      },
      "http" : {
        "bound_address" : "inet[/127.0.0.1:9200]",
        "publish_address" : "inet[/127.0.0.1:9200]",
        "max_content_length_in_bytes" : 104857600
      },
      "plugins" : [ {
        "name" : "knapsack-1.5.2.0-f340ad1",
        "version" : "1.5.2.0",
        "description" : "Knapsack plugin for import/export",
        "jvm" : true,
        "site" : false
      } ]
    },
    "5NJxoUWLQ6Co0GKbXJaHPw" : {
      "name" : "b2733b10-233a-4593-9428-85145cd54c77",
      "transport_address" : "inet[/10.0.103.180:9300]",
      "host" : "nagilgp01.dcri.duke.net",
      "ip" : "10.0.103.180",
      "version" : "1.6.0",
      "build" : "cdd3ac4",
      "http_address" : "inet[localhost/127.0.0.1:9200]",
      "attributes" : {
        "max_local_storage_nodes" : "1"
      },
      "settings" : {
        "node" : {
          "max_local_storage_nodes" : "1",
          "name" : "b2733b10-233a-4593-9428-85145cd54c77"
        },
        "bootstrap" : {
          "mlockall" : "true"
        },
        "client" : {
          "type" : "node"
        },
        "transport" : {
          "tcp" : {
            "compress" : "true"
          }
        },
        "http" : {
          "host" : "localhost"
        },
        "name" : "b2733b10-233a-4593-9428-85145cd54c77",
        "pidfile" : "/var/run/elasticsearch/elasticsearch.pid",
        "path" : {
          "data" : "/usr/local/nagioslogserver/elasticsearch/data",
          "work" : "/usr/local/nagioslogserver/tmp/elasticsearch",
          "home" : "/usr/local/nagioslogserver/elasticsearch",
          "conf" : "/usr/local/nagioslogserver/elasticsearch/config",
          "logs" : "/var/log/elasticsearch",
          "repo" : "/"
        },
        "config" : {
          "ignore_system_properties" : "true"
        },
        "cluster" : {
          "name" : "907e60a9-dc29-411e-96e8-2dfe503e0867"
        },
        "discovery" : {
          "zen" : {
            "ping" : {
              "unicast" : {
                "hosts" : "localhost,10.0.103.180,10.136.132.107",
                "hosts.0" : "localhost"
              },
              "multicast" : {
                "enabled" : "false"
              }
            }
          }
        }
      },
      "os" : {
        "refresh_interval_in_millis" : 1000,
        "available_processors" : 6,
        "cpu" : {
          "vendor" : "Intel",
          "model" : "Xeon",
          "mhz" : 2493,
          "total_cores" : 6,
          "total_sockets" : 6,
          "cores_per_socket" : 1,
          "cache_size_in_bytes" : 25600
        },
        "mem" : {
          "total_in_bytes" : 37905866752
        },
        "swap" : {
          "total_in_bytes" : 2147479552
        }
      },
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 1759,
        "max_file_descriptors" : 65535,
        "mlockall" : true
      },
      "jvm" : {
        "pid" : 1759,
        "version" : "1.7.0_101",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "24.95-b01",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1474479636053,
        "mem" : {
          "heap_init_in_bytes" : 18951962624,
          "heap_max_in_bytes" : 18899664896,
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max_in_bytes" : 224395264,
          "direct_max_in_bytes" : 18899664896
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
      },
      "thread_pool" : {
        "generic" : {
          "type" : "cached",
          "keep_alive" : "30s",
          "queue_size" : -1
        },
        "index" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "200"
        },
        "fetch_shard_store" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 12,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "get" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "1k"
        },
        "snapshot" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "merge" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "suggest" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "1k"
        },
        "bulk" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "50"
        },
        "optimize" : {
          "type" : "fixed",
          "min" : 1,
          "max" : 1,
          "queue_size" : -1
        },
        "warmer" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "flush" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "search" : {
          "type" : "fixed",
          "min" : 10,
          "max" : 10,
          "queue_size" : "1k"
        },
        "fetch_shard_started" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 12,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "listener" : {
          "type" : "fixed",
          "min" : 3,
          "max" : 3,
          "queue_size" : -1
        },
        "percolate" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "1k"
        },
        "management" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "refresh" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        }
      },
      "network" : {
        "refresh_interval_in_millis" : 5000,
        "primary_interface" : {
          "address" : "10.0.103.180",
          "name" : "eth0",
          "mac_address" : "00:50:56:AD:32:1E"
        }
      },
      "transport" : {
        "bound_address" : "inet[/0:0:0:0:0:0:0:0:9300]",
        "publish_address" : "inet[/10.0.103.180:9300]",
        "profiles" : { }
      },
      "http" : {
        "bound_address" : "inet[/127.0.0.1:9200]",
        "publish_address" : "inet[localhost/127.0.0.1:9200]",
        "max_content_length_in_bytes" : 104857600
      },
      "plugins" : [ {
        "name" : "knapsack-1.5.2.0-f340ad1",
        "version" : "1.5.2.0",
        "description" : "Knapsack plugin for import/export",
        "jvm" : true,
        "site" : false
      } ]
    }
  }
}
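On the "memory is in the green" point, the two numbers that matter in this output are `jvm.mem.heap_max_in_bytes` (18899664896) and `os.mem.total_in_bytes` (37905866752), which are identical on both nodes. A quick awk calculation puts the heap at just under half of physical RAM (about 17.6 GiB), in line with Elastic's usual guidance of giving the heap no more than about 50% of memory and keeping it below the roughly 32 GB point where the JVM loses compressed object pointers. `heap_share` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the JVM heap as a percentage of physical RAM.
#   $1 = jvm.mem.heap_max_in_bytes, $2 = os.mem.total_in_bytes
heap_share() {
  awk -v heap="$1" -v ram="$2" 'BEGIN { printf "%.1f%%\n", 100 * heap / ram }'
}

# Values copied from the _nodes output above (same on both nodes).
heap_share 18899664896 37905866752
```

With these values it prints `49.9%`.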

Re: Drop messages from central syslog server

Posted: Fri Sep 23, 2016 1:39 pm
by mcapra
Well, we've pretty much looked over everything on the Nagios Log Server end of things, and I can't see anything misbehaving. I would make sure the syslog-ng forwarder isn't being overloaded and that the individual rsyslog agents are indeed configured without rate limits.
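To check the second point on each shipper, the sketch below lists every non-comment line in the given rsyslog config files that mentions rate limiting. That covers both the legacy spelling (e.g. `$SystemLogRateLimitInterval`, `$imjournalRatelimitInterval`) and the RainerScript parameters (e.g. `SysSock.RateLimit.Interval` on imuxsock, `Ratelimit.Interval` on imjournal). An interval of 0 disables rate limiting. If nothing is printed (the function then returns non-zero), the module defaults apply, and depending on the rsyslog version those can include rate limiting. `show_ratelimit_settings` is a hypothetical helper, and the match is a plain substring search, so treat its output as a starting point rather than a full config parse.

```shell
#!/bin/sh
# Hypothetical helper: list non-comment lines mentioning rate limiting
# in the given rsyslog config files, printed as FILE:LINE: text.
# Returns non-zero when nothing matched (module defaults then apply).
show_ratelimit_settings() {
  awk '
    tolower($0) ~ /ratelimit/ && $0 !~ /^[ \t]*#/ {
      found = 1
      print FILENAME ":" FNR ": " $0
    }
    END { exit(found ? 0 : 1) }
  ' "$@"
}
```

Usage: `show_ratelimit_settings /etc/rsyslog.conf /etc/rsyslog.d/*.conf`.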