Drop messages from central syslog server

mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Drop messages from central syslog server

Post by mcapra »

It's hard to identify exactly where the failure is occurring, but at this point I'm fairly certain it's within the syslog-ng forwarder or the individual rsyslog shippers rather than Logstash (or any other NLS component).

It could still be that the NLS cluster is overloaded, but without messages in the Logstash log showing that traffic is being dropped, I'm not totally convinced. Nothing I see within the Logstash community indicates that Logstash would produce the "rate-limiting" error that rsyslog is printing, and everything in the rsyslog community indicates that the "rate-limiting" message points to the rsyslog configuration.

Let's see the outputs of:

Code:

df -h 
free -m
ps aux | grep java
curl -XGET 'http://localhost:9200/_nodes/?pretty'
curl -XGET 'http://localhost:9200/_cluster/health/*?level=shards&pretty'
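
One more check that may be worth running: the `_nodes` call reports thread pool sizes and queue limits, but not whether any requests were actually rejected. A rough sketch follows, assuming the `_cat/thread_pool` endpoint that Elasticsearch 1.x provides and its `bulk.rejected` column. The helper name `bulk_rejections` is made up for illustration and is not part of Nagios Log Server. A non-zero count would suggest the cluster is pushing back on bulk indexing, which would point toward overload rather than the shippers.

```shell
#!/bin/sh
# Hypothetical helper: reads `_cat/thread_pool?v` output on stdin and prints
# "<host> <bulk.rejected>" for every node whose bulk.rejected counter is above zero.
# Columns are located by header name, so the column order does not matter.
bulk_rejections() {
    awk '
        NR == 1 {
            # Find the positions of the columns we care about in the header row.
            for (i = 1; i <= NF; i++) {
                if ($i == "host") host_col = i
                if ($i == "bulk.rejected") rej_col = i
            }
            next
        }
        # Only report nodes that have rejected at least one bulk request.
        rej_col && $rej_col + 0 > 0 { print $host_col, $rej_col }
    '
}

# Usage on a cluster node (same localhost URL as the commands above):
# curl -s 'http://localhost:9200/_cat/thread_pool?v' | bulk_rejections
```

No output means no bulk rejections were recorded since the node started, which would fit with the cluster not being the bottleneck.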
Former Nagios employee
https://www.mcapra.com/
krobertson71
Posts: 444
Joined: Tue Feb 11, 2014 10:16 pm

Re: Drop messages from central syslog server

Post by krobertson71 »

Drive space and memory are well in the green. I had to attach the shard health output as it blew the character limit.

ps aux | grep java

Code:

nagios    1759 18.2 53.3 67869128 19743780 ?   SLl  Sep21 529:04 /usr/bin/java -Xms18074m -Xmx18074m -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Des.cluster.name=907e60a9-dc29-411e-96e8-2dfe503e0867 -Des.node.name=b2733b10-233a-4593-9428-85145cd54c77 -Des.discovery.zen.ping.unicast.hosts=localhost,10.0.103.180,10.136.132.107 -Des.path.repo=/ -Delasticsearch -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid -Des.path.home=/usr/local/nagioslogserver/elasticsearch -cp :/usr/local/nagioslogserver/elasticsearch/lib/elasticsearch-1.6.0.jar:/usr/local/nagioslogserver/elasticsearch/lib/*:/usr/local/nagioslogserver/elasticsearch/lib/sigar/* -Des.default.path.home=/usr/local/nagioslogserver/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/usr/local/nagioslogserver/elasticsearch/data -Des.default.path.work=/usr/local/nagioslogserver/tmp/elasticsearch -Des.default.path.conf=/usr/local/nagioslogserver/elasticsearch/config org.elasticsearch.bootstrap.Elasticsearch
nagios    8803 12.1  1.4 4462948 526392 ?      SNsl 13:30   4:05 /usr/bin/java -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xmx500m -Xss2048k -Djffi.boot.library.path=/usr/local/nagioslogserver/logstash/vendor/jruby/lib/jni -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xbootclasspath/a:/usr/local/nagioslogserver/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/usr/local/nagioslogserver/logstash/vendor/jruby -Djruby.lib=/usr/local/nagioslogserver/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main --1.9 /usr/local/nagioslogserver/logstash/lib/bootstrap/environment.rb logstash/runner.rb agent -f /usr/local/nagioslogserver/logstash/etc/conf.d -l /var/log/logstash/logstash.log -w 4

curl -XGET 'http://localhost:9200/_nodes/?pretty'

Code:

curl -XGET 'http://localhost:9200/_nodes/?pretty'
{
  "cluster_name" : "907e60a9-dc29-411e-96e8-2dfe503e0867",
  "nodes" : {
    "lqgbDDScQKOsoLGmD69ibg" : {
      "name" : "11fe29cc-9353-4cc1-a368-14a0b6977937",
      "transport_address" : "inet[/10.136.132.107:9300]",
      "host" : "nagilgp02",
      "ip" : "10.136.132.107",
      "version" : "1.6.0",
      "build" : "cdd3ac4",
      "http_address" : "inet[localhost/127.0.0.1:9200]",
      "attributes" : {
        "max_local_storage_nodes" : "1"
      },
      "settings" : {
        "node" : {
          "max_local_storage_nodes" : "1",
          "name" : "11fe29cc-9353-4cc1-a368-14a0b6977937"
        },
        "bootstrap" : {
          "mlockall" : "true"
        },
        "client" : {
          "type" : "node"
        },
        "transport" : {
          "tcp" : {
            "compress" : "true"
          }
        },
        "http" : {
          "host" : "localhost"
        },
        "name" : "11fe29cc-9353-4cc1-a368-14a0b6977937",
        "pidfile" : "/var/run/elasticsearch/elasticsearch.pid",
        "path" : {
          "data" : "/usr/local/nagioslogserver/elasticsearch/data",
          "work" : "/usr/local/nagioslogserver/tmp/elasticsearch",
          "home" : "/usr/local/nagioslogserver/elasticsearch",
          "conf" : "/usr/local/nagioslogserver/elasticsearch/config",
          "logs" : "/var/log/elasticsearch",
          "repo" : "/"
        },
        "config" : {
          "ignore_system_properties" : "true"
        },
        "cluster" : {
          "name" : "907e60a9-dc29-411e-96e8-2dfe503e0867"
        },
        "discovery" : {
          "zen" : {
            "ping" : {
              "unicast" : {
                "hosts" : "localhost,10.0.103.180,10.136.132.107",
                "hosts.0" : "localhost"
              },
              "multicast" : {
                "enabled" : "false"
              }
            }
          }
        }
      },
      "os" : {
        "refresh_interval_in_millis" : 1000,
        "available_processors" : 6,
        "cpu" : {
          "vendor" : "Intel",
          "model" : "Xeon",
          "mhz" : 2493,
          "total_cores" : 6,
          "total_sockets" : 6,
          "cores_per_socket" : 1,
          "cache_size_in_bytes" : 25600
        },
        "mem" : {
          "total_in_bytes" : 37905866752
        },
        "swap" : {
          "total_in_bytes" : 2147479552
        }
      },
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 1772,
        "max_file_descriptors" : 65535,
        "mlockall" : true
      },
      "jvm" : {
        "pid" : 1772,
        "version" : "1.7.0_101",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "24.95-b01",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1470154194590,
        "mem" : {
          "heap_init_in_bytes" : 18951962624,
          "heap_max_in_bytes" : 18899664896,
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max_in_bytes" : 224395264,
          "direct_max_in_bytes" : 18899664896
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
      },
      "thread_pool" : {
        "generic" : {
          "type" : "cached",
          "keep_alive" : "30s",
          "queue_size" : -1
        },
        "index" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "200"
        },
        "fetch_shard_store" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 12,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "get" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "1k"
        },
        "snapshot" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "merge" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "suggest" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "1k"
        },
        "bulk" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "50"
        },
        "optimize" : {
          "type" : "fixed",
          "min" : 1,
          "max" : 1,
          "queue_size" : -1
        },
        "warmer" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "flush" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "search" : {
          "type" : "fixed",
          "min" : 10,
          "max" : 10,
          "queue_size" : "1k"
        },
        "fetch_shard_started" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 12,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "listener" : {
          "type" : "fixed",
          "min" : 3,
          "max" : 3,
          "queue_size" : -1
        },
        "percolate" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "1k"
        },
        "management" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "refresh" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        }
      },
      "network" : {
        "refresh_interval_in_millis" : 5000,
        "primary_interface" : {
          "address" : "10.136.132.107",
          "name" : "eth0",
          "mac_address" : "00:50:56:AD:36:1E"
        }
      },
      "transport" : {
        "bound_address" : "inet[/0:0:0:0:0:0:0:0%0:9300]",
        "publish_address" : "inet[/10.136.132.107:9300]",
        "profiles" : { }
      },
      "http" : {
        "bound_address" : "inet[/127.0.0.1:9200]",
        "publish_address" : "inet[/127.0.0.1:9200]",
        "max_content_length_in_bytes" : 104857600
      },
      "plugins" : [ {
        "name" : "knapsack-1.5.2.0-f340ad1",
        "version" : "1.5.2.0",
        "description" : "Knapsack plugin for import/export",
        "jvm" : true,
        "site" : false
      } ]
    },
    "5NJxoUWLQ6Co0GKbXJaHPw" : {
      "name" : "b2733b10-233a-4593-9428-85145cd54c77",
      "transport_address" : "inet[/10.0.103.180:9300]",
      "host" : "nagilgp01.dcri.duke.net",
      "ip" : "10.0.103.180",
      "version" : "1.6.0",
      "build" : "cdd3ac4",
      "http_address" : "inet[localhost/127.0.0.1:9200]",
      "attributes" : {
        "max_local_storage_nodes" : "1"
      },
      "settings" : {
        "node" : {
          "max_local_storage_nodes" : "1",
          "name" : "b2733b10-233a-4593-9428-85145cd54c77"
        },
        "bootstrap" : {
          "mlockall" : "true"
        },
        "client" : {
          "type" : "node"
        },
        "transport" : {
          "tcp" : {
            "compress" : "true"
          }
        },
        "http" : {
          "host" : "localhost"
        },
        "name" : "b2733b10-233a-4593-9428-85145cd54c77",
        "pidfile" : "/var/run/elasticsearch/elasticsearch.pid",
        "path" : {
          "data" : "/usr/local/nagioslogserver/elasticsearch/data",
          "work" : "/usr/local/nagioslogserver/tmp/elasticsearch",
          "home" : "/usr/local/nagioslogserver/elasticsearch",
          "conf" : "/usr/local/nagioslogserver/elasticsearch/config",
          "logs" : "/var/log/elasticsearch",
          "repo" : "/"
        },
        "config" : {
          "ignore_system_properties" : "true"
        },
        "cluster" : {
          "name" : "907e60a9-dc29-411e-96e8-2dfe503e0867"
        },
        "discovery" : {
          "zen" : {
            "ping" : {
              "unicast" : {
                "hosts" : "localhost,10.0.103.180,10.136.132.107",
                "hosts.0" : "localhost"
              },
              "multicast" : {
                "enabled" : "false"
              }
            }
          }
        }
      },
      "os" : {
        "refresh_interval_in_millis" : 1000,
        "available_processors" : 6,
        "cpu" : {
          "vendor" : "Intel",
          "model" : "Xeon",
          "mhz" : 2493,
          "total_cores" : 6,
          "total_sockets" : 6,
          "cores_per_socket" : 1,
          "cache_size_in_bytes" : 25600
        },
        "mem" : {
          "total_in_bytes" : 37905866752
        },
        "swap" : {
          "total_in_bytes" : 2147479552
        }
      },
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 1759,
        "max_file_descriptors" : 65535,
        "mlockall" : true
      },
      "jvm" : {
        "pid" : 1759,
        "version" : "1.7.0_101",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "24.95-b01",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1474479636053,
        "mem" : {
          "heap_init_in_bytes" : 18951962624,
          "heap_max_in_bytes" : 18899664896,
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max_in_bytes" : 224395264,
          "direct_max_in_bytes" : 18899664896
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
      },
      "thread_pool" : {
        "generic" : {
          "type" : "cached",
          "keep_alive" : "30s",
          "queue_size" : -1
        },
        "index" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "200"
        },
        "fetch_shard_store" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 12,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "get" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "1k"
        },
        "snapshot" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "merge" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "suggest" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "1k"
        },
        "bulk" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "50"
        },
        "optimize" : {
          "type" : "fixed",
          "min" : 1,
          "max" : 1,
          "queue_size" : -1
        },
        "warmer" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "flush" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "search" : {
          "type" : "fixed",
          "min" : 10,
          "max" : 10,
          "queue_size" : "1k"
        },
        "fetch_shard_started" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 12,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "listener" : {
          "type" : "fixed",
          "min" : 3,
          "max" : 3,
          "queue_size" : -1
        },
        "percolate" : {
          "type" : "fixed",
          "min" : 6,
          "max" : 6,
          "queue_size" : "1k"
        },
        "management" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "refresh" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 3,
          "keep_alive" : "5m",
          "queue_size" : -1
        }
      },
      "network" : {
        "refresh_interval_in_millis" : 5000,
        "primary_interface" : {
          "address" : "10.0.103.180",
          "name" : "eth0",
          "mac_address" : "00:50:56:AD:32:1E"
        }
      },
      "transport" : {
        "bound_address" : "inet[/0:0:0:0:0:0:0:0:9300]",
        "publish_address" : "inet[/10.0.103.180:9300]",
        "profiles" : { }
      },
      "http" : {
        "bound_address" : "inet[/127.0.0.1:9200]",
        "publish_address" : "inet[localhost/127.0.0.1:9200]",
        "max_content_length_in_bytes" : 104857600
      },
      "plugins" : [ {
        "name" : "knapsack-1.5.2.0-f340ad1",
        "version" : "1.5.2.0",
        "description" : "Knapsack plugin for import/export",
        "jvm" : true,
        "site" : false
      } ]
    }
  }
}
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Drop messages from central syslog server

Post by mcapra »

Well, we've pretty much looked over everything on the Nagios Log Server end of things, and I can't see anything misbehaving. I would make sure the syslog-ng forwarder isn't being overloaded and that the individual rsyslog agents are indeed configured without rate limits.
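
To check the rsyslog side quickly, something like the following could be run on each shipper. It is only a sketch: the helper name `show_ratelimit_settings` is hypothetical, and the config paths in the usage line are the common defaults, which may differ on your hosts. It lists every active (non-comment) line that mentions rate limiting, which should catch legacy directives such as `$SystemLogRateLimitInterval` as well as newer-style parameters such as `SysSock.RateLimit.Interval`.

```shell
#!/bin/sh
# Hypothetical helper: prints "<file>: <line>" for every non-comment line in the
# given rsyslog config files that mentions rate limiting (case-insensitive).
show_ratelimit_settings() {
    for f in "$@"; do
        # Skip files that do not exist or cannot be read.
        [ -r "$f" ] || continue
        awk -v f="$f" '
            # Ignore comment lines, then match any "ratelimit" spelling.
            !/^[ \t]*#/ && tolower($0) ~ /ratelimit/ { print f ": " $0 }
        ' "$f"
    done
}

# Usage (paths are assumed defaults, adjust as needed):
# show_ratelimit_settings /etc/rsyslog.conf /etc/rsyslog.d/*.conf
```

If nothing is printed, no rate-limit setting is configured explicitly and the module's built-in default applies, which depends on the rsyslog version. For the imuxsock settings named above, an interval of 0 turns rate limiting off.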