server malfunction (!)
Posted: Sun Nov 03, 2013 9:09 pm
by KiwiBloke
Hi,
Excuse the subject line; I didn't know how else to classify this.
I was creating some dashboards last week for capacity planning purposes and then took a week off. I returned to this today to carry on and noticed that all the charts stopped collecting data approximately a week ago. But it gets worse.
If I check a server at random:
Overview page
"Host check is pending..."
Next check is "not scheduled" and last check is "never".
Performance Graphs
I can see some charts, but they do not contain any data. I also see "You are not authorized to access this feature. Contact your Nagios XI administrator for more information, or to obtain access to this feature."
Advanced
Many host attributes are ticked green, but actually have red circles.
Minemap view: everything is largely green.
Engine status: everything appears green.
Halp!
C.
Re: server malfunction (!)
Posted: Mon Nov 04, 2013 12:40 pm
by slansing
Can you attach a screenshot of Admin > System Status?
Re: server malfunction (!)
Posted: Mon Nov 04, 2013 12:44 pm
by BanditBBS
I'm just going to throw this out there......
I had a very similar issue last week after a reboot of my server. Half of the screens showed everything green, but on other screens it looked like nothing was scheduled. I restarted the nagios service and everything was fine almost immediately.
Re: server malfunction (!)
Posted: Mon Nov 04, 2013 4:46 pm
by KiwiBloke
Hi,
Thanks for your suggestion. I monitored top for a few minutes and noticed that two apache processes were consuming >10% of CPU each, but this didn't seem excessive. I have rebooted the server and this seems to have gone away; there are still a few apache processes, but none are using more than ~4%. Process_Perfdat was consuming ~40%, but this has since quietened down considerably. We also see the ESXi perl commands consume ~10% CPU each, but they only seem to run for a few seconds.
Anyway, I have made other checks, but the issue is still in effect. All hosts appear to have similar host status screens.
Screenshots are attached.
Cheers,
C.
Re: server malfunction (!)
Posted: Wed Nov 06, 2013 5:53 pm
by slansing
Can you restart ndo2db and see if that kicks the checking off again?
I think those hosts may have been disabled somehow as well; you should click the green check marks to re-enable those attributes.
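If clicking through the web interface for every host is impractical, Nagios also accepts the same actions as external commands written to its command file. The following is a minimal sketch, not an official Nagios XI procedure: it assumes the command file lives at /usr/local/nagios/var/rw/nagios.cmd (verify `command_file` in nagios.cfg on your server), and `nagios_cmd` and `enable_host_checks` are hypothetical helper names introduced here.

```shell
# ASSUMPTION: default command file path; check command_file in nagios.cfg.
NAGIOS_CMD_FILE="${NAGIOS_CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# Hypothetical helper: append one external command in the
# "[epoch] COMMAND;args" format that Nagios reads from its command file.
nagios_cmd() {
    printf '[%s] %s\n' "$(date +%s)" "$1" >> "$NAGIOS_CMD_FILE"
}

# Hypothetical helper: re-enable active checks for a host and its services,
# then ask Nagios to check the host now instead of at the next polling cycle.
enable_host_checks() {
    host=$1
    now=$(date +%s)
    nagios_cmd "ENABLE_HOST_CHECK;$host"
    nagios_cmd "ENABLE_HOST_SVC_CHECKS;$host"
    nagios_cmd "SCHEDULE_FORCED_HOST_CHECK;$host;$now"
}
```

For example, `enable_host_checks myhost` (where `myhost` stands in for a real host name) queues three commands for that host. Because the command file is normally a named pipe, the write waits until the running Nagios process opens it for reading.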
Re: server malfunction (!)
Posted: Wed Nov 06, 2013 6:13 pm
by KiwiBloke
Hi,
I ran the command and got the following result (bear in mind the server had already been restarted as part of an attempt to resolve the issue):
Code:
[root@pfsunagiosxi ~]# service ndo2db restart
Stopping ndo2db: head: cannot open `/usr/local/nagios/var/ndo2db.lock' for reading: No such file or directory
done.
Starting ndo2db: done.
[root@pfsunagiosxi ~]#
Perhaps this is normal behaviour (I have checked the directory, and the lock file exists once the service has been restarted).
I have enabled the checks for one of our servers and waited 15 minutes (at least two polling cycles), but all the status flags are still red and the overview summary information is still missing.
So it looks like this has not changed anything. Can you point me to any system logs I can look at?
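One way to confirm that the lock file is more than a leftover is to check that the process it names is actually running. This is a sketch under two assumptions: that the lock file holds the daemon's PID on its first line (suggested by the init script reading it with `head` in the output above), and that `ndo2db_running` is a hypothetical helper name, not part of Nagios XI.

```shell
# ASSUMPTION: default lock file path, taken from the init script message above.
NDO2DB_LOCK="${NDO2DB_LOCK:-/usr/local/nagios/var/ndo2db.lock}"

# Hypothetical helper: succeed only if the PID in the lock file is alive.
ndo2db_running() {
    lockfile=${1:-$NDO2DB_LOCK}
    # A missing or unreadable lock file counts as "not running".
    [ -r "$lockfile" ] || return 1
    pid=$(head -n 1 "$lockfile")
    # Reject empty or non-numeric contents before passing them to kill.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # kill -0 sends no signal; it only reports whether the process exists
    # and whether the caller is allowed to signal it.
    kill -0 "$pid" 2>/dev/null
}
```

Run it as root (for example `ndo2db_running && echo up`), since `kill -0` also fails when the caller lacks permission to signal the process.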
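As a starting point for the log question, a sketch like the one below filters recent warnings, errors and timeouts out of the Nagios Core log. The default path /usr/local/nagios/var/nagios.log is an assumption about a standard Nagios XI install, `recent_problems` is a hypothetical helper, and the grep pattern is a rough filter rather than a list of known Nagios messages.

```shell
# ASSUMPTION: default Nagios Core log location on a Nagios XI install.
NAGIOS_LOG="${NAGIOS_LOG:-/usr/local/nagios/var/nagios.log}"

# Hypothetical helper: print the last N lines (default 20) that mention
# warnings, errors or timeouts, matching case-insensitively.
recent_problems() {
    logfile=${1:-$NAGIOS_LOG}
    count=${2:-20}
    grep -iE 'warning|error|timed out|timeout' "$logfile" | tail -n "$count"
}
```

For example, `recent_problems` with no arguments scans the default log, while `recent_problems /path/to/other.log 50` widens the window on another file.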
Cheers,
Re: server malfunction (!)
Posted: Thu Nov 07, 2013 1:22 pm
by abrist
Alright, bear with me. We are going to try to restart most of the relevant services:
Code:
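# Stop the monitoring engine, the database broker and both databases
service nagios stop
service ndo2db stop
service mysqld stop
service postgresql stop
# Make sure no stray nagios or ndo2db processes survived the stop
killall -9 nagios
killall -9 ndo2db
# Start the databases first, then the engine and the broker
service mysqld start
service postgresql start
service nagios start
service ndo2db start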
If this does not work for you, send an email to [email protected] to open a ticket. Attach your system profile .zip to the email.
Re: server malfunction (!)
Posted: Sun Nov 10, 2013 3:57 pm
by KiwiBloke
Hi,
Thanks for this; unfortunately, it did not seem to work.
I will make contact with the support address as you have described.
Cheers,
KB.
Re: server malfunction (!)
Posted: Mon Nov 11, 2013 10:37 am
by abrist
OP is pursuing support through the ticketing system. The primary issue is npcd/perfdata load and timeouts. Locking thread.
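Since the thread ends with npcd and performance-data load as the suspected cause, one rough health check is whether the perfdata spool directory keeps growing. This sketch assumes the spool lives at /usr/local/nagios/var/spool/perfdata (check `perfdata_spool_dir` in npcd.cfg) and uses a hypothetical helper, `perfdata_backlog`; it only counts queued files and does not diagnose why they pile up.

```shell
# ASSUMPTION: default spool directory read by npcd on a Nagios XI install.
PERFDATA_SPOOL="${PERFDATA_SPOOL:-/usr/local/nagios/var/spool/perfdata}"

# Hypothetical helper: count regular files waiting in the spool directory,
# without descending into subdirectories. Fails if the directory is missing.
perfdata_backlog() {
    spool=${1:-$PERFDATA_SPOOL}
    [ -d "$spool" ] || return 1
    find "$spool" -maxdepth 1 -type f | wc -l
}
```

Running `perfdata_backlog` a few minutes apart and comparing the counts gives a crude trend; a number that only climbs suggests npcd is not keeping up with the incoming check results.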