Tactical Overview, Ops Center, Ops Screen Problems

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: Tactical Overview, Ops Center, Ops Screen Problems

Post by jbennett »

abrist wrote: Weren't the initial problems due to a change in architecture causing the RRDs to not be read/written? That, and the timeout/load issues with npcd. Let's get the RRDs handled and then move on to any other problems once they present themselves.
I have removed a number of hosts from my original Nagios install, as well as those same hosts on the new Nagios install, as they were recently deemed unnecessary (they are already checked on another system).

When I go to check the perfdata folder, these hosts are still there as folders. Does Nagios not delete them here once I have deleted them in Nagios?

I'm getting ready to remove a whole load of these folders, but I want to make sure I'm not going to be removing something erroneously. Are these folders all representative of hosts that have been entered into Nagios, and only hosts? Not host groups, etc?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Post by abrist »

jbennett wrote: Are these folders all representative of hosts that have been entered into Nagios, and only hosts? Not host groups, etc?
These folders should represent hosts only (and their respective services). As long as you do not need the historical information, you are safe to delete those files that pertain to removed hosts. XI does not clean these up automatically just in case the historical data should be retained.
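A minimal sketch for listing candidate folders before deleting anything. It assumes the perfdata directory is /usr/local/nagios/share/perfdata (the usual XI location, not stated in this thread) and that you have a text file with one current host name per line; the function name and that file are hypothetical. It only prints names and deletes nothing.

```shell
# List perfdata sub-directories whose names are not in a list of current hosts.
# $1: perfdata directory (assumed: /usr/local/nagios/share/perfdata)
# $2: text file with one configured host name per line (hypothetical file)
list_orphan_perfdata_dirs() {
    perfdata_dir=$1
    hosts_file=$2
    for dir in "$perfdata_dir"/*/; do
        [ -d "$dir" ] || continue          # the glob matched nothing
        name=$(basename "$dir")
        # -x: match the whole line, -F: fixed string, -q: no output
        if ! grep -qxF -- "$name" "$hosts_file"; then
            printf '%s\n' "$name"
        fi
    done
}

# Example (prints candidates only):
# list_orphan_perfdata_dirs /usr/local/nagios/share/perfdata /tmp/current_hosts.txt
```

Folder names may not match host names exactly if the graphing layer rewrites special characters, so review the list by hand before removing anything.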
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.

Post by jbennett »

Thanks, on my way there now.

One other question though. I'm seeing a number of services in these host folders that are no longer used. For instance, these services have been consolidated from something like A-Ping, B-Ping, C-Ping to just Ping. I'm still seeing A-Ping, B-Ping, and C-Ping in the folders of the still-active hosts, as well as the newer Ping service. Am I correct in assuming that it is fine to remove the files for services that are no longer in Nagios?

Post by abrist »

Yep, as long as you have no need for the historical data.
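One way to spot retired services such as A-Ping inside the folder of an active host is to look for files that have stopped being updated. This is a sketch under the assumption that files for active services are rewritten regularly; the function name and the 30-day cut-off are arbitrary choices, not something from this thread.

```shell
# List .rrd and .xml files under a directory that have not been modified
# for more than N days. Files for retired services stop being updated.
# $1: directory to scan, $2: age threshold in days
list_stale_perfdata_files() {
    find "$1" -type f \( -name '*.rrd' -o -name '*.xml' \) -mtime +"$2" | sort
}

# Example (prints candidates only):
# list_stale_perfdata_files /usr/local/nagios/share/perfdata 30
```

A service that is checked rarely, or that returns no performance data for long stretches, would also show up here, so treat the output as a list to review rather than a list to delete.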

Post by jbennett »

Ok - I'm in the process of unpacking the data onto the new machine right now.

However, I'm noticing that my service-perfdata file on the old machine is getting rather big (25MB currently) and that I'm now running out of space on my ramdisk.

When this happens, I can't see any data on the Service Status screen. I get "No matching services found" even though the Tactical Overview shows some services as being down.

Since I'm not yet able to move over to the new server, I need to get this resolved.
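A small sketch for keeping an eye on how full the ramdisk is. The path /var/nagiosramdisk is the one used in this thread; the helper name and the 90% limit are hypothetical.

```shell
# Extract the Capacity (Use%) value, without the % sign, from `df -P` output
# read on stdin. Only the first data line (line 2) is considered.
usage_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example usage (the 90% limit is an arbitrary choice):
# pct=$(df -P /var/nagiosramdisk | usage_pct)
# [ "$pct" -ge 90 ] && echo "ramdisk is ${pct}% full"
# ls -lh /var/nagiosramdisk    # see which files are taking the space
```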

Post by abrist »

jbennett wrote: However, I'm noticing that my service-perfdata file on the old machine is getting rather big (25MB currently) and that I'm now running out of space on my ramdisk.
The command "process-service-perfdata" should be run quite often and reduce the file size. Is any of the perfdata getting reaped and displayed in Nagios?
Check the npcd process one more time:

Code:

service npcd status

Also make sure there is not more than one parent instance of npcd:

Code:

ps -aef | grep npcd
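
The grep in that command also matches itself, which makes the output easy to misread. As a rough sketch, the count of parent instances can be taken directly from ps. The helper name is hypothetical, and it assumes a daemonized npcd parent has PID 1 as its parent, as in a classic init setup.

```shell
# Count npcd processes whose parent is PID 1 (daemonized parents).
# Reads "PID PPID COMMAND ARGS..." lines on stdin.
count_npcd_parents() {
    # $3 is the command path; match "npcd" or anything ending in "/npcd",
    # so the "grep npcd" process itself is never counted.
    awk '$2 == 1 && $3 ~ /(^|\/)npcd$/ { n++ } END { print n + 0 }'
}

# Example usage (the empty headers suppress the ps title line):
# ps -e -o pid= -o ppid= -o args= | count_npcd_parents
```

A result of 1 means a single parent instance; anything higher suggests a stale npcd that should be stopped.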

Post by jbennett »

When I check my process-service-perfdata command, I see the following:

Code:

/usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /usr/local/nagios/var/service-perfdata.out
Shouldn't that be pointing to the ramdisk instead?

I'm going to change the command to the following and see how it works. I'm going to try the same for the host-perfdata command as well.

Code:

/usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /var/nagiosramdisk/service-perfdata.out

Code:

[root@nagiosxivm ~]# service npcd status
NPCD running (pid 1913).
[root@nagiosxivm ~]# ps -aef | grep npcd
nagios    1913     1  0 13:43 ?        00:00:00 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
root     16644 11534  0 13:53 pts/0    00:00:00 grep npcd

When I open the service-perfdata file in vi, I see a whole bunch of lines like this:

Code:

HOSTSTATE::UP   HOSTSTATETYPE::HARD     SERVICESTATE::OK        SERVICESTATETYPE::HARD

When I search for either of the two hosts that are down, I don't find them listed in the file at all.
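
Instead of searching by eye in vi, the records for one host can be pulled out with awk. This is only a sketch: it assumes each line is a tab-separated list of KEY::VALUE fields like the ones shown above, and that the host and service appear under the keys HOSTNAME and SERVICEDESC, which are not visible in the excerpt. The function name is hypothetical.

```shell
# Print "service<TAB>state" for every record of one host in a perfdata
# file made of tab-separated KEY::VALUE fields.
# $1: perfdata file, $2: host name to look for
perfdata_states_for_host() {
    awk -F '\t' -v host="$2" '
        {
            split("", rec)                   # reset the per-line record
            for (i = 1; i <= NF; i++) {
                sep = index($i, "::")
                if (sep > 0)
                    rec[substr($i, 1, sep - 1)] = substr($i, sep + 2)
            }
            # HOSTNAME and SERVICEDESC are assumed key names
            if (rec["HOSTNAME"] == host)
                printf "%s\t%s\n", rec["SERVICEDESC"], rec["SERVICESTATE"]
        }
    ' "$1"
}

# Example usage (file path from this thread, host name is a placeholder):
# perfdata_states_for_host /var/nagiosramdisk/service-perfdata.out myhost
```

No output for a host means no record for it is currently in the file.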
Last edited by jbennett on Wed Mar 06, 2013 11:19 am, edited 2 times in total.

Post by jbennett »

Now that I've successfully imported the graphing information and have it showing up on the new server, I've tried clicking on one of the graphs to get more detailed information. Upon doing so, I see the following: "You are not authorized to access this feature. Contact your Nagios XI administrator for more information, or to obtain access to this feature."

I am logged in as admin on the Nagios server yet I still see this.

Also, am I correct in seeing that the files that were generated as part of the process have since been deleted? I no longer see the .rrd.xml files in any of the folders. I only see .rrd and .xml files. I just want to make sure I don't need to follow up and remove any of these.

I can now see the Tactical Overview and ops screens, but the data isn't consistent with the home screen, which links to the Tactical Overview. When I compare this to the old Nagios server, the home screen link is accurate, but when I click directly on the Tactical Overview, it is incorrect.

I am still running into the issue where I click on the 'x Unhandled Problems' link in the Tactical Overview and can't see anything: "No matching services found".
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Post by scottwilkerson »

jbennett wrote: Now that I've successfully imported the graphing information and have it showing up on the new server
Great...
jbennett wrote: I've tried clicking on one of the graphs to get more detailed information. Upon doing so, I see the following: You are not authorized to access this feature. Contact your Nagios XI administrator for more information, or to obtain access to this feature
This is odd. Does it only happen when you click on a performance graph that you are viewing on the service detail page -> Performance graph section?
jbennett wrote: Also, am I correct in seeing that the files that were generated as part of the process have since been deleted? I no longer see the .rrd.xml files in any of the folders. I only see .rrd and .xml files. I just want to make sure I don't need to follow up and remove any of these.
Correct.
jbennett wrote: I can now see the Tactical overview and ops screens, but the data isn't consistent with the home screen, which links to the tactical overview. When I compare this to the old Nagios server, the home screen link is accurate, but when I click directly on the Tactical Overview, it is incorrect.
There is a bug in the current TAC. This will be fixed in 2012R1.7 when it is released.
Former Nagios employee

Post by jbennett »

scottwilkerson wrote: This is odd, does it only happen when you click on a performance graph that you are viewing on the service detail page -> Performance graph section?
When I go to the Performance graph tab on the service details page, I get the same thing.

However, I am noticing that all of my host and service checks seem to be in a perpetual pending state once I click on them. For example:

When I'm using the quick find and I type in a host name, I get that host with a list of the services assigned to be checked.

The status for those services says "Ok" with a last check of very recently. However, the host itself is greyed out and notifications are disabled. When I click on either the host or the service, it tells me that they are pending??

I just checked my npcd log and have found the following:

Code:

[03-06-2013 07:59:22] NPCD: npcd Daemon (0.4.14) started with PID=3371
[03-06-2013 07:59:22] NPCD: Please have a look at 'npcd -V' to get license information
[03-06-2013 07:59:22] NPCD: HINT: load_threshold is enabled - ('20.000000')
[03-06-2013 11:29:01] NPCD: WARN: MAX load reached: load 22.010000/20.000000 at i=0
[03-06-2013 11:29:17] NPCD: WARN: MAX load reached: load 26.120000/20.000000 at i=1
[03-06-2013 11:29:32] NPCD: WARN: MAX load reached: load 25.710000/20.000000 at i=1
[03-06-2013 11:29:47] NPCD: WARN: MAX load reached: load 24.930000/20.000000 at i=1
[03-06-2013 11:29:56] NPCD: Caught Termination Signal - Hasta la vista... baby
[03-06-2013 11:35:42] NPCD: npcd Daemon (0.4.14) started with PID=3388
[03-06-2013 11:35:42] NPCD: Please have a look at 'npcd -V' to get license information
[03-06-2013 11:35:42] NPCD: HINT: load_threshold is enabled - ('20.000000')
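
The WARN entries mean the system load went above the load_threshold of 20 that npcd reports at startup, so npcd was holding back on processing perfdata during that window. A rough sketch for pulling those readings out of the log follows. The function name is hypothetical, and the log path in the example is an assumption, since the thread does not give it.

```shell
# Print "load threshold" pairs for every "MAX load reached" warning found
# on stdin. grep -o is used so that several entries fused onto one line
# are all picked up.
npcd_load_warnings() {
    grep -o 'MAX load reached: load [0-9.]*/[0-9.]*' |
        sed 's|.*load \([0-9.]*\)/\([0-9.]*\)|\1 \2|'
}

# Example usage (log path is an assumption):
# npcd_load_warnings < /usr/local/nagios/var/npcd.log
```

If the load regularly exceeds the threshold, the spool files will keep growing until the load drops again, which would fit the ramdisk filling up.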