Soft state counter decreasing?!?!?

zzt
Posts: 4
Joined: Tue Oct 27, 2015 1:10 pm

Soft state counter decreasing?!?!?

Post by zzt »

Hello, the other day we had a switch go down, and Nagios got a little weird. The ping failure put the host into a soft state, and the attempt counter started incrementing towards our defined max of 5. At some point the counter went from 3 to 2, then back to 3. It eventually reached 5 and alerted, but I don't know why it would decrement in the first place. Has anyone else ever seen this behavior? We're running a basically stock setup of Nagios 3.5.1.

Syslog:
Oct 26 19:21:59 nagios01 nagios3: HOST ALERT: r2r10tor2;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%
Oct 26 19:22:49 nagios01 nagios3: HOST ALERT: r2r10tor2;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
Oct 26 19:25:29 nagios01 nagios3: HOST ALERT: r2r10tor2;DOWN;SOFT;3;PING CRITICAL - Packet loss = 100%
Oct 26 19:27:03 nagios01 nagios3: HOST ALERT: r2r10tor2;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
Oct 26 19:29:23 nagios01 nagios3: HOST ALERT: r2r10tor2;DOWN;SOFT;3;PING CRITICAL - Packet loss = 100%
Oct 26 19:32:03 nagios01 nagios3: HOST ALERT: r2r10tor2;DOWN;SOFT;4;PING CRITICAL - Packet loss = 100%
Oct 26 19:34:43 nagios01 nagios3: HOST ALERT: r2r10tor2;DOWN;HARD;5;PING CRITICAL - Packet loss = 100%

An alert was successfully sent at this point, but this behavior is very worrisome. I don't want to end up in a situation where the counter just flaps between 2 and 3 and never reaches the hard state.
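To check how often this happens across the whole log, something like the following could scan the syslog for HOST ALERT lines where the attempt counter fails to increase while the host stays in the same problem state. This is only a sketch: it assumes the line format shown in the excerpt above, and `find_attempt_anomalies` is a made-up helper name, not part of Nagios.

```python
import re

# Matches HOST ALERT lines in the format shown in the syslog excerpt:
#   HOST ALERT: <host>;<state>;<SOFT|HARD>;<attempt>;<plugin output>
ALERT_RE = re.compile(
    r"HOST ALERT: (?P<host>[^;]+);(?P<state>[^;]+);(?P<type>SOFT|HARD);(?P<attempt>\d+);"
)


def find_attempt_anomalies(lines):
    """Return (host, previous_attempt, current_attempt, line) tuples for every
    alert whose attempt number did not increase while the host stayed in the
    same non-UP state."""
    last_seen = {}  # host -> (state, attempt) from the previous alert line
    anomalies = []
    for line in lines:
        match = ALERT_RE.search(line)
        if not match:
            continue
        host = match["host"]
        state = match["state"]
        attempt = int(match["attempt"])
        previous = last_seen.get(host)
        if (
            previous is not None
            and previous[0] == state
            and state != "UP"
            and attempt <= previous[1]
        ):
            anomalies.append((host, previous[1], attempt, line.strip()))
        last_seen[host] = (state, attempt)
    return anomalies


if __name__ == "__main__":
    import sys

    # Usage (the log path is whatever your syslog setup uses):
    #   python find_anomalies.py < /path/to/syslog
    for host, prev, cur, raw in find_attempt_anomalies(sys.stdin):
        print(f"{host}: attempt went {prev} -> {cur}: {raw}")
```

It flags both a decrement (3 to 2) and a repeated value (2 to 2), since either one means a retry did not advance the counter.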
zzt
Posts: 4
Joined: Tue Oct 27, 2015 1:10 pm

Re: Soft state counter decreasing?!?!?

Post by zzt »

Just saw that there was a similar issue with an iDRAC. This one stalled at 2 before moving on to the max retry:

Oct 26 22:02:09 nagios01 nagios3: HOST ALERT: server213-drac;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%
Oct 26 22:04:39 nagios01 nagios3: HOST ALERT: server213-drac;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
Oct 26 22:08:04 nagios01 nagios3: HOST ALERT: server213-drac;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
Oct 26 22:10:14 nagios01 nagios3: HOST ALERT: server213-drac;DOWN;SOFT;3;PING CRITICAL - Packet loss = 100%
Oct 26 22:12:34 nagios01 nagios3: HOST ALERT: server213-drac;DOWN;SOFT;4;PING CRITICAL - Packet loss = 100%
Oct 26 22:14:54 nagios01 nagios3: HOST ALERT: server213-drac;DOWN;HARD;5;PING CRITICAL - Packet loss = 100%
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Soft state counter decreasing?!?!?

Post by Box293 »

What is the output of these commands:

Code:

ps -ef | grep nagios.cfg
ipcs -q
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
zzt
Posts: 4
Joined: Tue Oct 27, 2015 1:10 pm

Re: Soft state counter decreasing?!?!?

Post by zzt »

ps shows the main nagios3 process (45336) and a large number of subprocesses, which in turn have subprocesses doing the actual checks. There's only one parent nagios3 process, so I've been assuming this is the correct behavior. ipcs returns nothing, as far as I can tell.

$ ps -ef | grep nagios.cfg

Code:

nagios    40521  45336  0 15:57 ?        00:00:00 /usr/sbin/nagios3 -d /etc/nagios3/nagios.cfg
nagios    40927  45336  0 15:57 ?        00:00:00 /usr/sbin/nagios3 -d /etc/nagios3/nagios.cfg
nagios    45336      1 82 08:00 ?        06:35:23 /usr/sbin/nagios3 -d /etc/nagios3/nagios.cfg
nagios    45458  45336  0 15:57 ?        00:00:00 /usr/sbin/nagios3 -d /etc/nagios3/nagios.cfg
nagios    63240  45336  0 15:57 ?        00:00:00 /usr/sbin/nagios3 -d /etc/nagios3/nagios.cfg
nagios    68877  45336  0 15:57 ?        00:00:00 /usr/sbin/nagios3 -d /etc/nagios3/nagios.cfg
...etc

$ ps -auxf
...

Code:

nagios    45336 82.7  0.2 3115628 137196 ?      RNsl 08:00 399:41 /usr/sbin/nagios3 -d /etc/nagios3/nagios.cfg
nagios   123579  1.1  0.2 3115628 135080 ?      SN   16:03   0:00  \_ /usr/sbin/nagios3 -d /etc/nagios3/nagios.cfg
nagios   123643  0.2  0.0   4480   760 ?        SN   16:03   0:00  |   \_ sh -c /usr/lib/nagios/plugins/check_ping -H '192.168.0.44' -w 5000,100% -c 5000,100% -p 1
nagios   123659  0.0  0.0   8396   944 ?        SN   16:03   0:00  |       \_ /usr/lib/nagios/plugins/check_ping -H 192.168.0.44 -w 5000,100% -c 5000,100% -p 1
nagios   123660  0.0  0.0   4404   744 ?        SN   16:03   0:00  |           \_ /bin/ping -n -U -w 10 -c 1 192.168.0.44
nagios   123582  0.7  0.2 3115628 135080 ?      SN   16:03   0:00  \_ /usr/sbin/nagios3 -d /etc/nagios3/nagios.cfg
nagios   123633  0.4  0.0   4480   844 ?        SN   16:03   0:00  |   \_ sh -c /usr/lib/nagios/plugins/check_ping -H '192.168.0.10' -w 5000,100% -c 5000,100% -p 1
nagios   123667  0.0  0.0   8396  1016 ?        SN   16:03   0:00  |       \_ /usr/lib/nagios/plugins/check_ping -H 192.168.0.10 -w 5000,100% -c 5000,100% -p 1
nagios   123668  0.0  0.0   4404   800 ?        SN   16:03   0:00  |           \_ /bin/ping -n -U -w 10 -c 1 192.168.0.10
...etc

$ ipcs -q

Code:

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
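
For anyone who wants to script the "only one parent process" check rather than eyeball it, here is a rough sketch that reads `ps -ef` output and lists the matching processes whose parent is not itself a match. It assumes the standard eight-column `ps -ef` layout shown above; `top_level_pids` is a hypothetical helper name.

```python
def top_level_pids(ps_lines, needle="nagios.cfg"):
    """Return the sorted PIDs of processes whose command line contains
    `needle` and whose parent is not another matching process.

    Expects `ps -ef` lines: UID PID PPID C STIME TTY TIME CMD
    """
    procs = {}  # pid -> ppid for every matching process
    for line in ps_lines:
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        pid, ppid, cmd = fields[1], fields[2], fields[7]
        if not (pid.isdigit() and ppid.isdigit()):
            continue  # header line or truncated output
        if needle not in cmd or cmd.startswith("grep"):
            continue  # skip unrelated processes and the grep itself
        procs[int(pid)] = int(ppid)
    return sorted(pid for pid, ppid in procs.items() if ppid not in procs)


if __name__ == "__main__":
    import subprocess

    output = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
    parents = top_level_pids(output.splitlines())
    print(f"{len(parents)} top-level process(es): {parents}")
```

More than one PID in the result would suggest two independent daemons running against the same config.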
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Soft state counter decreasing?!?!?

Post by tmcdonald »

Did you configure NDO (ndoutils, ndo2db, etc) on this machine by any chance?
Former Nagios employee
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Soft state counter decreasing?!?!?

Post by Box293 »

I suspected multiple Nagios processes were causing the problem, but as we can see there is only one parent nagios process.

The ipcs check relates to ndoutils, which it appears you're not using.

I see you're on nagios3. I suggest upgrading to 4.1.1, as there have been many updates since then.

Are you using a load balancer like Mod_Gearman?
hsmith
Agent Smith
Posts: 3539
Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1

Re: Soft state counter decreasing?!?!?

Post by hsmith »

Also, what was the installation method for this? Source/repo? OS? Would be nice to have this information handy for investigation.
Former Nagios Employee.
me.
zzt
Posts: 4
Joined: Tue Oct 27, 2015 1:10 pm

Re: Soft state counter decreasing?!?!?

Post by zzt »

It's running on Ubuntu 12.04.5 LTS, and was installed using apt-get from our local mirror. There's no sign of NDO or Gearman either. The load average on this thing is pretty high, though. The CPUs (2 proc, 24 cores) aren't that busy, but there are a ton of processes. I guess that's just Nagios being Nagios:

Code:

top - 21:21:38 up 36 days, 20:28,  1 user,  load average: 33.02, 29.96, 28.02
Tasks: 795 total,  13 running, 782 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.3%us, 26.1%sy, 23.8%ni, 41.9%id,  3.5%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:  49411656k total, 14925488k used, 34486168k free,   482744k buffers
Swap:  4194300k total,        0k used,  4194300k free,  4807712k cached

Code:

 21:21:55 up 36 days, 20:28,  1 user,  load average: 30.57, 29.59, 27.94
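
To put those load figures in context, a quick sketch like this divides each load average by the core count. It assumes the "24 cores" above is the total across both sockets, which the post does not actually confirm; `load_per_core` is just an illustrative name.

```python
import re


def load_per_core(uptime_line, cores):
    """Parse the 1-, 5- and 15-minute load averages from an `uptime` or
    `top` header line and return each divided by `cores`, rounded to two
    decimal places."""
    match = re.search(r"load average: ([\d.]+), ([\d.]+), ([\d.]+)", uptime_line)
    if match is None:
        raise ValueError("no load average found in line")
    return tuple(round(float(value) / cores, 2) for value in match.groups())


if __name__ == "__main__":
    line = " 21:21:55 up 36 days, 20:28,  1 user,  load average: 30.57, 29.59, 27.94"
    # cores=24 is an assumption taken from the post ("2 proc, 24 cores").
    print(load_per_core(line, cores=24))
```

A sustained value above 1.0 per core means more runnable or I/O-blocked tasks than cores, on average.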
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Soft state counter decreasing?!?!?

Post by Box293 »

zzt wrote:It's running on Ubuntu 12.04.5 LTS, and installed using apt-get off our local mirror.


This is most likely Nagios Core 3.x. There were some major performance improvements in 4.x, and I recommend you upgrade to 4.1.1.

I have a guide here for installing from source on Ubuntu:
http://sites.box293.com/nagios/guides/i ... untu-14-04

However, that guide covers version 4.0.8. I don't have a guide for 4.1.1 on Ubuntu at the moment, but the relevant URLs are in this CentOS guide:
http://sites.box293.com/nagios/guides/i ... centos-6-7

You would need to take a copy of your configs first and then uninstall the old version before installing from source.
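
For the config copy, a minimal sketch along these lines would archive the whole config directory into a timestamped tarball. It assumes the configs live under `/etc/nagios3` (the directory shown in the ps output earlier in the thread); `backup_config` and the destination path are made-up examples.

```python
import tarfile
import time
from pathlib import Path


def backup_config(config_dir, dest_dir):
    """Archive `config_dir` into a timestamped .tar.gz under `dest_dir`
    and return the path of the archive that was written."""
    config_dir = Path(config_dir)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{config_dir.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths relative, e.g. nagios3/nagios.cfg
        tar.add(config_dir, arcname=config_dir.name)
    return archive


if __name__ == "__main__":
    # Both paths are assumptions; adjust them to your own layout.
    print(backup_config("/etc/nagios3", "/root/nagios-backup"))
```

Run it as a user that can read the config directory, and check the archive contents before uninstalling anything.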

You could look at installing Mod_Gearman locally which may help.

A RAM disk would also greatly improve performance. This guide is for Nagios XI on CentOS/RHEL, but the concepts are identical.
https://assets.nagios.com/downloads/nag ... giosXI.pdf