<<<R3.3-Monitoring of hosts nodes has not been stable>>>

uhiadmin · Post by **uhiadmin** » Tue Aug 28, 2012 9:58 pm

Linux Distribution and version?
1. CentOS 6.2, 64-bit
2. Manual Install
3. Gnome installed
4. Update Version R3.3

To Nagios XI Tech Support Specialist:
I applied the R3.3 update, and monitoring of host nodes has not been stable since. This is what I have observed since the update:

Total Processes Critical 1/4 2012-8-28: 2848 processes with a STATE=RSZDT

Hopefully the attached config snapshot will reveal the issue.

scottwilkerson · Post by **scottwilkerson** » Wed Aug 29, 2012 6:56 am

That's a lot of processes...

Can you run the following command:

ps -ef > /tmp/procs.txt

and then attach /tmp/procs.txt

thanks
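
To see which process states make up a count like the one reported, the state column can be tallied directly. This is only a minimal sketch, assuming the procps `ps` that ships with CentOS; the `count_states` helper name is hypothetical and is not part of Nagios XI.

```shell
# Hypothetical helper: tally process states read from stdin,
# one STAT value per line. Only the first character of each
# value is used (R, S, Z, D, T, ...); modifiers such as "s" or "+" are ignored.
count_states() {
    awk 'NF { c[substr($1, 1, 1)]++ } END { for (s in c) print s, c[s] }' | LC_ALL=C sort
}

# Live usage on the monitoring server (prints one "STATE COUNT" line per state):
# ps -eo stat= | count_states
```

A large Z (zombie) or D (uninterruptible sleep) count would point in a different direction than a large S count, so the breakdown may be more telling than the total.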

uhiadmin · Post by **uhiadmin** » Wed Aug 29, 2012 5:54 pm

Here is the file that you asked me to attach.

scottwilkerson · Post by **scottwilkerson** » Thu Aug 30, 2012 8:15 am

The problem must have cleared up because there are only 267 processes in that file...

uhiadmin · Post by **uhiadmin** » Sun Sep 02, 2012 2:45 pm

This message popped up after applying the upgrade:

[root@fldmon ~]# service ndo2db stop
Stopping ndo2db: head: cannot open `/usr/local/nagios/var/ndo2db.lock' for reading: No such file or directory
done.
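
The `head: cannot open` message suggests the stop action reads the lock file, which was not there. A rough sketch for checking the lock file before stopping the service follows. It assumes the first line of the lock file holds the daemon's PID (an assumption, not confirmed here) and should be run as root so that `kill -0` can probe the process; `lock_state` is a hypothetical helper name.

```shell
# Hypothetical helper: report the state of a daemon lock file.
# Assumption: the first line of the lock file holds the daemon's PID.
lock_state() {
    lock_file=$1
    if [ ! -r "$lock_file" ]; then
        echo "missing"      # no readable lock file at that path
        return 0
    fi
    # Keep only the digits from the first line.
    pid=$(head -n 1 "$lock_file" | tr -cd '0-9')
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"        # lock file exists but its PID is not alive
    fi
}

# Usage with the path from the error message above:
# lock_state /usr/local/nagios/var/ndo2db.lock
```

A result of "missing" while an ndo2db process is still listed by `ps` would mean the daemon is running without its lock file.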

scottwilkerson · Post by **scottwilkerson** » Sun Sep 02, 2012 3:42 pm

Was ndo2db running when you ran this stop command?

uhiadmin · Post by **uhiadmin** » Tue Sep 04, 2012 5:26 pm

Scott,
I believe ndo2db was running. I had to stop all services in order to run a MySQL repair. The reason for the repair is that we are experiencing high CPU load. Critical notifications on the services side go beyond 2. When that happens, we see a spike in down host nodes, and the "I/O Wait" status goes into the red. My understanding is that we need to offload the MySQL database. We have 3,364 host nodes and 75 services. I hope this explains the situation clearly.

Post by **lmiltchev** » Wed Sep 05, 2012 9:38 am

Is it stable now, after the mysql repair? Anything interesting in the system log?

tail /var/log/messages

If you decide to offload mysql, here is the document you need to review:

http://assets.nagios.com/downloads/nagi ... Server.pdf

Hope this helps.

uhiadmin · Post by **uhiadmin** » Wed Sep 05, 2012 7:57 pm

This is all that I am getting from the command:
Last login: Sun Sep 2 17:14:00 2012
[root@fldmon ~]# tail /var/log/messages
Sep 5 17:54:17 fldmon nagios: HOST ALERT: 747054SW01;DOWN;SOFT;1;CRITICAL - 10.47.54.2: rta nan, lost 100%
Sep 5 17:54:27 fldmon nagios: SERVICE ALERT: localhost;Current Load;CRITICAL;HARD;4;CRITICAL - load average: 6.00, 5.21, 4.29
Sep 5 17:54:29 fldmon nagios: HOST ALERT: 747054SW01;UP;SOFT;2;OK - 10.47.54.2: rta 46.917ms, lost 0%
Sep 5 17:55:26 fldmon nagios: HOST ALERT: 747054SW01;DOWN;SOFT;1;CRITICAL - 10.47.54.2: rta nan, lost 100%
Sep 5 17:55:26 fldmon nagios: HOST ALERT: 747054;DOWN;SOFT;1;CRITICAL - 10.47.54.1: rta nan, lost 100%
Sep 5 17:55:29 fldmon nagios: HOST ALERT: 747054;UP;SOFT;2;OK - 10.47.54.1: rta 45.857ms, lost 0%
Sep 5 17:55:29 fldmon nagios: HOST ALERT: 747054SW01;UP;SOFT;2;OK - 10.47.54.2: rta 46.944ms, lost 0%
Sep 5 17:55:29 fldmon nagios: HOST FLAPPING ALERT: 747054SW01;STARTED; Host appears to have started flapping (23.2% change > 20.0% threshold)
Sep 5 17:55:50 fldmon nagios: HOST ALERT: 747054SW01;DOWN;SOFT;1;CRITICAL - 10.47.54.2: rta nan, lost 100%
Sep 5 17:56:02 fldmon nagios: HOST ALERT: 747054SW01;UP;SOFT;2;OK - 10.47.54.2: rta 46.542ms, lost 20%
[root@fldmon ~]#
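
Ten lines of `tail` output only cover about two minutes. To get a wider view of which hosts keep dropping, the DOWN events can be counted per host across the whole log. This is a minimal sketch that assumes the log line format shown above; `count_down_alerts` is a hypothetical helper name.

```shell
# Hypothetical helper: count "HOST ALERT: <host>;DOWN;..." entries per host
# in syslog lines read from stdin. "HOST FLAPPING ALERT" lines are not counted.
count_down_alerts() {
    awk -F'HOST ALERT: ' 'NF > 1 {
        split($2, f, ";")              # f[1] = host name, f[2] = state
        if (f[2] == "DOWN") n[f[1]]++
    } END { for (h in n) print h, n[h] }' | LC_ALL=C sort
}

# Usage against the full system log:
# count_down_alerts < /var/log/messages
```

For the ten lines shown above this gives three DOWN events for 747054SW01 and one for 747054.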

scottwilkerson · Post by **scottwilkerson** » Thu Sep 06, 2012 8:57 am

Has the machine stabilized since the repair and re-start of ndo2db?
