server malfunction (!)

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
KiwiBloke
Posts: 81
Joined: Fri Apr 27, 2012 7:23 pm

server malfunction (!)

Post by KiwiBloke »

Hi,

Excuse the subject line; I didn't know how else to classify this.

I was creating some dashboards last week for capacity planning purposes and then took a week off. I returned to this today to carry on and noticed that the charts all stopped collecting data approximately a week ago. But it gets worse.

If I check a server at random ....
Overview page
"Host check is pending..."
next check is "not scheduled" and last check is "never"

Performance Graphs
I can see some charts, but they do not contain any data. I also see "You are not authorized to access this feature. Contact your Nagios XI administrator for more information, or to obtain access to this feature."

Advanced
Many host attributes are ticked green, but actually have red circles.

Minemap view - everything is largely green.

Engine status - everything appears green.

Halp!

C.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: server malfunction (!)

Post by slansing »

Can you attach a screenshot of Admin > System Status?
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: server malfunction (!)

Post by BanditBBS »

I'm just going to throw this out there......

I had a very similar issue last week after a reboot of my server. On half the screens everything was green, but on other screens it looked like nothing was scheduled. I restarted the nagios service and everything was fine almost immediately.
KiwiBloke
Posts: 81
Joined: Fri Apr 27, 2012 7:23 pm

Re: server malfunction (!)

Post by KiwiBloke »

Hi,

Thanks for your suggestion. I monitored top for a few minutes and noticed that two apache processes were consuming >10% of CPU each, but this didn't seem excessive. I have rebooted the server and this seems to have gone away; there are still a few apache processes, but none are more than ~4%. Process_Perfdat was consuming ~40%, but this has since quietened down considerably. We also see the ESXi perl commands consume ~10% CPU each, but they only seem to run for a few seconds.

Anyway, I have made other checks but the issue is still in effect. All hosts appear to have similar host status screens.

Screenshots are attached.

Cheers,

C.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: server malfunction (!)

Post by slansing »

Can you restart ndo2db and see if that kicks the checking off again?

Code:

service ndo2db restart
I think those hosts may be disabled somehow as well; you should click the green check marks to re-enable those attributes.
KiwiBloke
Posts: 81
Joined: Fri Apr 27, 2012 7:23 pm

Re: server malfunction (!)

Post by KiwiBloke »

Hi,

I ran the command and got the following result (bear in mind the server had already been restarted as part of an attempt to resolve the issue):

Code:

[root@pfsunagiosxi ~]# service ndo2db restart
Stopping ndo2db: head: cannot open `/usr/local/nagios/var/ndo2db.lock' for reading: No such file or directory
done.
Starting ndo2db: done.
[root@pfsunagiosxi ~]#
Perhaps this is normal behaviour (I have checked the directory, and the lock file exists once the service has been restarted).

I have enabled the checks for one of our servers and waited 15 minutes (at least two polling cycles), but all the status flags are red and the overview summary information is still missing.

So it looks like this has not altered anything. Can you point me to any system logs I can look at?
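For anyone wanting to check the lock file mentioned above, here is a minimal sketch. It assumes the file holds the daemon's PID on its first line (which is what the init script's `head` call suggests, but treat that as an assumption), and the `lock_state` helper name is made up for illustration. Run it as root, since `kill -0` cannot probe another user's process otherwise.

```shell
# Hypothetical helper: report the state of a PID-style lock file.
# Prints "missing" (no readable file), "running" (PID is alive)
# or "stale" (file exists but the PID is not alive).
lock_state() {
    lock_file=$1
    if [ ! -r "$lock_file" ]; then
        echo "missing"
        return
    fi
    # Assumption: the first line of the lock file is the daemon's PID.
    pid=$(head -n 1 "$lock_file" | tr -d '[:space:]')
    # kill -0 sends no signal; it only tests whether the PID exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}

# Path taken from the error message in the output above.
lock_state /usr/local/nagios/var/ndo2db.lock
```

A "missing" result right before a restart would be consistent with the `head: cannot open` message, i.e. the daemon was not running when the stop was attempted.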

Cheers,
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: server malfunction (!)

Post by abrist »

Alright, bear with me. We are going to try to restart most of the relevant services:

Code:

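# Stop the monitoring daemons first, then the databases
service nagios stop
service ndo2db stop
service mysqld stop
service postgresql stop
# Force-kill any nagios or ndo2db processes that survived the stop
killall -9 nagios
killall -9 ndo2db
# Bring the databases back up before the daemons that use them
service mysqld start
service postgresql start
service nagios start
service ndo2db start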
If this does not work, send an email to [email protected] to open a ticket. Attach your system profile .zip to the email.
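After a restart sequence like the one above, it can help to confirm that the daemons actually came back. The following is only a sketch: `check_daemons` is a hypothetical helper, and it assumes the process names match the service names, which may not hold on every install.

```shell
# Hypothetical helper: report whether each named daemon has a running process.
# Returns the number of daemons that were NOT found (0 means all are up).
check_daemons() {
    missing=0
    for name in "$@"; do
        # pgrep -x matches the process name exactly.
        if pgrep -x "$name" >/dev/null 2>&1; then
            echo "$name: running"
        else
            echo "$name: NOT running"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

For example, `check_daemons nagios ndo2db mysqld` prints one line per daemon. PostgreSQL is left out here because its process name varies between versions.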
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
KiwiBloke
Posts: 81
Joined: Fri Apr 27, 2012 7:23 pm

Re: server malfunction (!)

Post by KiwiBloke »

Hi,

Thanks for this. Unfortunately, it did not seem to work.

I will make contact with the support address as you have described.

Cheers,

KB.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: server malfunction (!)

Post by abrist »

OP is pursuing support through the ticketing system. The primary issue is with npcd/perfdata load/timeouts. Locking thread.
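The thread does not say how the npcd/perfdata problem was diagnosed. As a rough sketch only, one way to spot a perfdata backlog is to count the files waiting in the spool directory. The `spool_backlog` helper is hypothetical, and the path below is an assumption about a default install, so check your own configuration.

```shell
# Hypothetical helper: count regular files directly inside a spool directory.
# Prints 0 if the directory does not exist.
spool_backlog() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "0"
        return
    fi
    # -maxdepth 1 keeps the count to the top level of the directory.
    find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Assumed default spool location; adjust to match your install.
echo "perfdata files waiting: $(spool_backlog /usr/local/nagios/var/spool/perfdata)"
```

A count that keeps growing between runs would suggest perfdata is being written faster than it is processed.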