Search found 15 matches

by NSchoenbaechler
Mon Aug 20, 2018 11:33 am
Forum: Open Source Nagios Projects
Topic: Checks randomly not reaching hard state
Replies: 13
Views: 3211

Re: Checks randomly not reaching hard state

Re-tested the same service in my original post with the host still up. Worked and notified fine. Thanks for your help, and thanks for submitting the bug request.
by NSchoenbaechler
Mon Aug 20, 2018 11:22 am
Forum: Open Source Nagios Projects
Topic: Checks randomly not reaching hard state
Replies: 13
Views: 3211

Re: Checks randomly not reaching hard state

Oh yeah, I forgot about the host state stuff, so my testing methodology was bad. I'll check a few instances where a service goes down but the host stays up. Thanks.
by NSchoenbaechler
Mon Aug 20, 2018 10:47 am
Forum: Open Source Nagios Projects
Topic: Checks randomly not reaching hard state
Replies: 13
Views: 3211

Re: Checks randomly not reaching hard state

Scott, thank you so much for the reply. I was really scratching my head and thinking I was crazy. In the example I provided, the host was in a down state at the time. I just shut the server down and let things play out (though I was forcing checks to speed things up). Here's the output you requested...
by NSchoenbaechler
Mon Aug 20, 2018 10:18 am
Forum: Open Source Nagios Projects
Topic: Checks randomly not reaching hard state
Replies: 13
Views: 3211

Checks randomly not reaching hard state

Since upgrading to Nagios Core 4.4.x (we are now on 4.4.2, the latest), we have seen a recurring, serious issue: checks randomly remain in a soft state even after they have reached their max check attempts. As a result they never send problem notifications, although we do get recovery notifications. Here's an example servi...
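
For readers unfamiliar with soft and hard states, here is a minimal Python sketch of the behavior the post expects: a failing check stays SOFT while it is retried, turns HARD once the attempt count reaches max check attempts, and only then sends a problem notification. This is an illustration only, not Nagios Core's actual code. The class and method names are hypothetical, and it deliberately ignores host state, recovery notifications, and everything else the real scheduler does.

```python
from dataclasses import dataclass

OK = 0  # plugin return code for an OK result


@dataclass
class ServiceCheck:
    """Hypothetical model of one service's soft/hard state tracking."""
    max_check_attempts: int
    attempt: int = 1
    state_type: str = "HARD"
    status: int = OK

    def process(self, status: int) -> bool:
        """Apply one check result.

        Returns True when the service has just entered a HARD problem
        state, i.e. the point where a problem notification is expected.
        """
        if status == OK:
            # Any OK result resets the attempt counter.
            self.attempt = 1
            self.state_type = "HARD"
            self.status = OK
            return False

        if self.status == OK:
            # First failing result after an OK state.
            self.attempt = 1
            already_hard = False
        else:
            already_hard = self.state_type == "HARD"
            if not already_hard:
                # Still retrying: count this as the next attempt.
                self.attempt += 1

        self.status = status
        if self.attempt >= self.max_check_attempts:
            self.state_type = "HARD"
        else:
            self.state_type = "SOFT"
        return self.state_type == "HARD" and not already_hard


if __name__ == "__main__":
    svc = ServiceCheck(max_check_attempts=3)
    for _ in range(4):
        notify = svc.process(2)  # 2 = CRITICAL
        print(svc.attempt, svc.state_type, notify)
```

Under this model the third consecutive CRITICAL result is the one that should flip the service to HARD. The symptom described in the post is a service that stays SOFT at that point.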
by NSchoenbaechler
Thu May 17, 2018 1:14 pm
Forum: Nagios Log Server
Topic: Snapshots & Maintenance jobs don't appear to be working
Replies: 5
Views: 265

Re: Snapshots & Maintenance jobs don't appear to be working

That took care of it. Thanks for the help.
by NSchoenbaechler
Tue May 15, 2018 9:22 am
Forum: Nagios Log Server
Topic: Snapshots & Maintenance jobs don't appear to be working
Replies: 5
Views: 265

Re: Snapshots & Maintenance jobs don't appear to be working

Attached. The last and next run times are way off given the frequency setting, and when I run manually, nothing seems to happen.
by NSchoenbaechler
Tue May 15, 2018 8:48 am
Forum: Nagios Log Server
Topic: Snapshots & Maintenance jobs don't appear to be working
Replies: 5
Views: 265

Snapshots & Maintenance jobs don't appear to be working

We had some issues with running out of disk space in the past, so I configured the maintenance settings to close indexes older than 30 days and delete indexes older than 35 days. I may be misunderstanding how this works, but indexes don't seem to be closing or deleting. Today is May 15 and I still have a...
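
As a rough illustration of what those two settings should do, the Python sketch below classifies daily indexes by age. It assumes index names follow a `logstash-YYYY.MM.DD` pattern and that "older than N days" means an age strictly greater than N. Both are assumptions for this example, and the function name is hypothetical rather than anything Nagios Log Server provides.

```python
from datetime import date, datetime


def classify_index(name: str, today: date,
                   close_after_days: int = 30,
                   delete_after_days: int = 35,
                   prefix: str = "logstash-") -> str:
    """Return "keep", "close", or "delete" for a date-named index.

    Assumes the name is `prefix` followed by a YYYY.MM.DD date.
    """
    index_day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
    age_days = (today - index_day).days
    if age_days > delete_after_days:
        return "delete"
    if age_days > close_after_days:
        return "close"
    return "keep"


if __name__ == "__main__":
    today = date(2018, 5, 15)  # the date mentioned in the post
    for name in ("logstash-2018.05.01",
                 "logstash-2018.04.12",
                 "logstash-2018.04.01"):
        print(name, classify_index(name, today))
```

Running a check like this against the real index list is one way to see which indexes the maintenance job should already have closed or deleted.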
by NSchoenbaechler
Wed May 02, 2018 1:38 pm
Forum: Nagios Log Server
Topic: Frequent reboots required to maintain performance
Replies: 8
Views: 1396

Re: Frequent reboots required to maintain performance

Cluster Health
Status: Red
Timed Out?: false
# Instances: 2
# Data Instances: 2
Active Primary Shards: 348
Active Shards: 348
Relocating Shards: 0
Initializing Shards: 0
Unassigned Shards: 426
by NSchoenbaechler
Wed May 02, 2018 1:13 pm
Forum: Nagios Log Server
Topic: Frequent reboots required to maintain performance
Replies: 8
Views: 1396

Re: Frequent reboots required to maintain performance

I've also just noticed that we have some more serious issues. Our cluster status is red and we have many unassigned shards. It looks like all of our replica shards are unassigned, and I've been having trouble figuring out how to get them assigned. I've verified that both instances are up and elastic...
by NSchoenbaechler
Fri Apr 27, 2018 12:44 pm
Forum: Nagios Log Server
Topic: Frequent reboots required to maintain performance
Replies: 8
Views: 1396

Re: Frequent reboots required to maintain performance

All pages are just extremely slow to load. Even getting beyond the login screen takes around a minute. Queries end up being unresponsive. I'll follow your article and provide more feedback afterwards. Thank you.