<<<R3.3-Monitoring of hosts nodes has not been stable>>>
Posted: Tue Aug 28, 2012 9:58 pm
by uhiadmin
Linux Distribution and version?
1. CentOS 6.2 / 64-bit
2. Manual Install
3. Gnome installed
4. Update Version R3.3
To Nagios XI Tech Support Specialist:
I applied the R3.3 update, and monitoring of host nodes has not been stable since. This is what I have observed since the R3.3 update:
Total Processes Critical 1/4 2012-8-28: 2848 processes with a STATE=RSZDT
Hopefully, the attached config snapshot might reveal the issue.
Re: <<<R3.3-Monitoring of hosts nodes has not been stable>>>
Posted: Wed Aug 29, 2012 6:56 am
by scottwilkerson
That's a lot of processes...
Can you run the following
and then attach /tmp/procs.txt
thanks
Re: <<<R3.3-Monitoring of hosts nodes has not been stable>>>
Posted: Wed Aug 29, 2012 5:54 pm
by uhiadmin
Here is the file that you asked me to attach.
Re: <<<R3.3-Monitoring of hosts nodes has not been stable>>>
Posted: Thu Aug 30, 2012 8:15 am
by scottwilkerson
The problem must have cleared up because there are only 267 processes in that file...
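For anyone following along: the command itself isn't shown in the thread, so this is only a sketch. Assuming /tmp/procs.txt holds `ps aux`-style output (a header row with a STAT column), something like the following would tally processes by state letter, which is roughly what the Total Processes check counts with STATE=RSZDT. The function name `count_by_state` is made up for illustration.

```python
from collections import Counter


def count_by_state(listing: str, states: str = "RSZDT") -> Counter:
    """Tally processes by the first letter of their STAT field.

    Assumes `ps aux`-style text: one header row containing a STAT
    column, then one process per line (an assumption, not confirmed
    by the thread).
    """
    lines = [ln for ln in listing.splitlines() if ln.strip()]
    header = lines[0].split()
    stat_col = header.index("STAT")
    counts = Counter()
    for ln in lines[1:]:
        # Split into at most len(header) fields so the command keeps its spaces.
        fields = ln.split(None, len(header) - 1)
        if len(fields) <= stat_col:
            continue  # skip malformed rows
        state = fields[stat_col][0]  # "Ss" -> "S", "Z" -> "Z"
        if state in states:
            counts[state] += 1
    return counts
```

Calling `sum(count_by_state(open("/tmp/procs.txt").read()).values())` would then give a total to compare against the number reported by the check.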
Re: <<<R3.3-Monitoring of hosts nodes has not been stable>>>
Posted: Sun Sep 02, 2012 2:45 pm
by uhiadmin
This message popped up after applying the upgrade:
[root@fldmon ~]# service ndo2db stop
Stopping ndo2db: head: cannot open `/usr/local/nagios/var/ndo2db.lock' for reading: No such file or directory
done.
Re: <<<R3.3-Monitoring of hosts nodes has not been stable>>>
Posted: Sun Sep 02, 2012 3:42 pm
by scottwilkerson
Was ndo2db running when you ran this stop command?
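The `head: cannot open` message suggests the stop script reads the lock file and the file was already gone, which is what you would expect if ndo2db was not running or the lock had been cleaned up. A rough way to check by hand, assuming the lock file holds the daemon's PID on its first line (that is an assumption, as are the helper names `read_lock_pid` and `pid_is_running`):

```python
import os
from typing import Optional


def read_lock_pid(lock_path: str) -> Optional[int]:
    """Return the PID stored on the first line of the lock file, or None.

    Assumes the lock file contains a numeric PID (hypothetical layout).
    """
    try:
        with open(lock_path) as fh:
            first = fh.readline().strip()
    except FileNotFoundError:
        return None  # same situation as the "No such file" error above
    return int(first) if first.isdigit() else None


def pid_is_running(pid: int) -> bool:
    """Check for a live process without actually signalling it."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


if __name__ == "__main__":
    pid = read_lock_pid("/usr/local/nagios/var/ndo2db.lock")
    if pid is None:
        print("no lock file (or no PID in it)")
    else:
        print(pid, "running" if pid_is_running(pid) else "stale lock")
```

If the lock file is missing but a process is still up, the stop command would report "done." without having stopped anything, so it is worth confirming either way.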
Re: <<<R3.3-Monitoring of hosts nodes has not been stable>>>
Posted: Tue Sep 04, 2012 5:26 pm
by uhiadmin
Scott,
I believe ndo2db was running. I had to stop all services in order to run a MySQL repair. The reason for the repair is that we are experiencing high CPU load. Critical notifications on the services side go beyond 2. This is when we see a spike in down host nodes, and the "I/O Wait" status goes into the red. My understanding is that we need to offload the MySQL database. We have 3,364 host nodes and 75 services. Hopefully, I am explaining this clearly.
Re: <<<R3.3-Monitoring of hosts nodes has not been stable>>>
Posted: Wed Sep 05, 2012 9:38 am
by lmiltchev
Is it stable now, after the mysql repair? Anything interesting in the system log?
If you decide to offload mysql, here is the document you need to review:
http://assets.nagios.com/downloads/nagi ... Server.pdf
Hope this helps.
Re: <<<R3.3-Monitoring of hosts nodes has not been stable>>>
Posted: Wed Sep 05, 2012 7:57 pm
by uhiadmin
This is all that I am getting from the command:
Last login: Sun Sep 2 17:14:00 2012
[root@fldmon ~]# tail /var/log/messages
Sep 5 17:54:17 fldmon nagios: HOST ALERT: 747054SW01;DOWN;SOFT;1;CRITICAL - 10.47.54.2: rta nan, lost 100%
Sep 5 17:54:27 fldmon nagios: SERVICE ALERT: localhost;Current Load;CRITICAL;HARD;4;CRITICAL - load average: 6.00, 5.21, 4.29
Sep 5 17:54:29 fldmon nagios: HOST ALERT: 747054SW01;UP;SOFT;2;OK - 10.47.54.2: rta 46.917ms, lost 0%
Sep 5 17:55:26 fldmon nagios: HOST ALERT: 747054SW01;DOWN;SOFT;1;CRITICAL - 10.47.54.2: rta nan, lost 100%
Sep 5 17:55:26 fldmon nagios: HOST ALERT: 747054;DOWN;SOFT;1;CRITICAL - 10.47.54.1: rta nan, lost 100%
Sep 5 17:55:29 fldmon nagios: HOST ALERT: 747054;UP;SOFT;2;OK - 10.47.54.1: rta 45.857ms, lost 0%
Sep 5 17:55:29 fldmon nagios: HOST ALERT: 747054SW01;UP;SOFT;2;OK - 10.47.54.2: rta 46.944ms, lost 0%
Sep 5 17:55:29 fldmon nagios: HOST FLAPPING ALERT: 747054SW01;STARTED; Host appears to have started flapping (23.2% change > 20.0% threshold)
Sep 5 17:55:50 fldmon nagios: HOST ALERT: 747054SW01;DOWN;SOFT;1;CRITICAL - 10.47.54.2: rta nan, lost 100%
Sep 5 17:56:02 fldmon nagios: HOST ALERT: 747054SW01;UP;SOFT;2;OK - 10.47.54.2: rta 46.542ms, lost 20%
[root@fldmon ~]#
Re: <<<R3.3-Monitoring of hosts nodes has not been stable>>>
Posted: Thu Sep 06, 2012 8:57 am
by scottwilkerson
Has the machine stabilized since the repair and re-start of ndo2db?
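To get a quick picture of which hosts are bouncing, the HOST ALERT lines in /var/log/messages can be tallied per host. This is a minimal sketch, assuming the line layout matches the excerpt posted above (host;state;SOFT or HARD;attempt;...); the pattern and the function name `count_host_transitions` are illustrative, not part of Nagios.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... nagios: HOST ALERT: 747054SW01;DOWN;SOFT;1;CRITICAL - ...
# HOST FLAPPING ALERT and SERVICE ALERT lines are intentionally not matched.
HOST_ALERT = re.compile(
    r"nagios: HOST ALERT: ([^;]+);(UP|DOWN|UNREACHABLE);(SOFT|HARD);(\d+);"
)


def count_host_transitions(log_text: str) -> Counter:
    """Count how many times each host changed state in the given log text."""
    last_state = {}
    changes = Counter()
    for line in log_text.splitlines():
        match = HOST_ALERT.search(line)
        if not match:
            continue
        host, state = match.group(1), match.group(2)
        if host in last_state and last_state[host] != state:
            changes[host] += 1
        last_state[host] = state
    return changes


if __name__ == "__main__":
    with open("/var/log/messages") as fh:
        for host, n in count_host_transitions(fh.read()).most_common(10):
            print(host, n)
```

Hosts near the top of that list alongside a high "Current Load" reading would be consistent with the checks timing out under load rather than the hosts themselves going down.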