Re: Tactical Overview, Ops Center, Ops Screen Problems
Posted: Tue Mar 05, 2013 11:21 am
by jbennett
abrist wrote: Weren't the initial problems due to a change in architecture causing the rrds to not be read/written? That and the timeout/load issues with npcd. Let's get the rrds handled and then move on to any other problems once they present themselves.
I have removed a number of hosts from my original Nagios install, as well as those same hosts on the new Nagios install, as they were recently deemed unnecessary (already checked on another system).
When I go to check the perfdata folder, these hosts are still there as folders. Does Nagios not delete them here once I have deleted them in Nagios?
I'm getting ready to remove a whole load of these folders, but I want to make sure I'm not going to be removing something erroneously. Are these folders all representative of hosts that have been entered into Nagios, and only hosts? Not host groups, etc?
Re: Tactical Overview, Ops Center, Ops Screen Problems
Posted: Tue Mar 05, 2013 11:44 am
by abrist
jbennett wrote: Are these folders all representative of hosts that have been entered into Nagios, and only hosts? Not host groups, etc?
These folders should represent hosts only (and their respective services). As long as you do not need the historical information, you are safe to delete those files that pertain to removed hosts. XI does not clean these up automatically just in case the historical data should be retained.
Re: Tactical Overview, Ops Center, Ops Screen Problems
Posted: Tue Mar 05, 2013 11:51 am
by jbennett
Thanks, on my way there now.
One other question though. I'm seeing a number of services in these host folders that are no longer used. For instance, these services have been consolidated from something like A-Ping, B-Ping, C-Ping to just Ping. I'm still seeing A-Ping, B-Ping, and C-Ping in the folders of the still active hosts, as well as the newer Ping service. Am I correct in assuming that I should be fine to remove these services that are no longer even in Nagios?
Re: Tactical Overview, Ops Center, Ops Screen Problems
Posted: Tue Mar 05, 2013 12:56 pm
by abrist
Yep, as long as you have no need for the historical data.
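Before deleting anything, the cleanup discussed above could be previewed with a short script. This is only a sketch: it lists candidates and deletes nothing. The function names are hypothetical, and it assumes the layout described in this thread (one folder per host, holding .rrd and .xml files per service). It also assumes that the names you pass in match the names on disk; the graphing layer may alter service names when it builds file names (for example, replacing spaces), so verify that first.

```python
from pathlib import Path


def stale_host_dirs(perfdata_dir, active_hosts):
    """Return names of host folders with no matching active host."""
    active = set(active_hosts)
    return sorted(
        entry.name
        for entry in Path(perfdata_dir).iterdir()
        if entry.is_dir() and entry.name not in active
    )


def stale_service_files(host_dir, active_services):
    """Return .rrd/.xml files whose base name is not an active service."""
    active = set(active_services)
    return sorted(
        entry.name
        for entry in Path(host_dir).iterdir()
        if entry.is_file()
        and entry.suffix in {".rrd", ".xml"}
        and entry.stem not in active
    )
```

Pass the perfdata folder and the list of hosts still defined in Nagios to `stale_host_dirs`, review the output, and only then remove the folders by hand if the historical data is not needed.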
Re: Tactical Overview, Ops Center, Ops Screen Problems
Posted: Tue Mar 05, 2013 5:25 pm
by jbennett
Ok - I'm in the process of unpacking the data onto the new machine right now.
However, I'm noticing that my service-perfdata file on the old machine is getting rather big (25MB currently) and that I'm now running out of space on my ramdisk.
When this happens, I can't see any data on the Service Status screen. I get "No matching services found" even though the Tactical overview shows some services as being down.
Since I'm not yet able to move over to the new server, I need to get this resolved.
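To keep an eye on this while the old server is still in use, something like the following could report how large the perfdata file is relative to the space left on the ramdisk. It is a rough sketch: `ramdisk_report` is a hypothetical helper, and both paths are parameters because the exact locations on the old machine are not stated here.

```python
import os
import shutil


def ramdisk_report(perfdata_file, mount_point):
    """Return the perfdata file size and filesystem space figures, in bytes."""
    usage = shutil.disk_usage(mount_point)
    return {
        "file_bytes": os.path.getsize(perfdata_file),
        "free_bytes": usage.free,
        "total_bytes": usage.total,
    }
```

A file that keeps growing between runs suggests the perfdata is not being reaped.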
Re: Tactical Overview, Ops Center, Ops Screen Problems
Posted: Tue Mar 05, 2013 5:32 pm
by abrist
jbennett wrote: However, I'm noticing that my service-perfdata file on the old machine is getting rather big (25MB currently) and that I'm now running out of space on my ramdisk.
The command "process-service-perfdata" should run frequently, which keeps the file size down. Is any of the perfdata getting reaped and displayed in Nagios?
Check the npcd process one more time:
Also make sure there is not more than 1 parent instance of npcd:
Re: Tactical Overview, Ops Center, Ops Screen Problems
Posted: Wed Mar 06, 2013 9:04 am
by jbennett
When I check my process-service-perfdata command, I see the following:
Code:
/usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /usr/local/nagios/var/service-perfdata.out
Shouldn't that be pointing to the ramdisk instead?
I'm going to change the command to the following and see how it works. I'm going to try the same for the host-perfdata command as well.
Code:
/usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /var/nagiosramdisk/service-perfdata.out
Code:
[root@nagiosxivm ~]# service npcd status
NPCD running (pid 1913).
[root@nagiosxivm ~]# ps -aef | grep npcd
nagios 1913 1 0 13:43 ? 00:00:00 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
root 16644 11534 0 13:53 pts/0 00:00:00 grep npcd
When I open the service-perfdata file in vi, I see a whole bunch of lines like this:
Code:
HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD
When I search for one of the 2 hosts that are down, I don't find them listed in the file at all.
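Rather than searching the file by hand in vi, the host names it contains could be collected with a small script. This sketch assumes each line is a tab-separated list of KEY::VALUE fields and that one of the keys is HOSTNAME; only the state fields are shown above, so check the actual file format first. The function names are hypothetical.

```python
def parse_perfdata_line(line):
    """Split one tab-separated line of KEY::VALUE fields into a dict."""
    fields = {}
    for token in line.rstrip("\n").split("\t"):
        key, sep, value = token.partition("::")
        if sep:  # skip tokens that are not KEY::VALUE pairs
            fields[key] = value
    return fields


def hosts_in_perfdata(lines):
    """Return the set of HOSTNAME values found in the given lines."""
    hosts = set()
    for line in lines:
        name = parse_perfdata_line(line).get("HOSTNAME")
        if name:
            hosts.add(name)
    return hosts
```

Opening the file and passing the file object to `hosts_in_perfdata` gives a set that can be compared against the hosts currently shown as down.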
Re: Tactical Overview, Ops Center, Ops Screen Problems
Posted: Wed Mar 06, 2013 9:15 am
by jbennett
Now that I've successfully imported the graphing information and have it showing up on the new server, I've tried clicking on one of the graphs to get more detailed information. Upon doing so, I see the following: "You are not authorized to access this feature. Contact your Nagios XI administrator for more information, or to obtain access to this feature."
I am logged in as admin on the Nagios server yet I still see this.
Also, am I correct in seeing that the files that were generated as part of the process have since been deleted? I no longer see the .rrd.xml files in any of the folders. I only see .rrd and .xml files. I just want to make sure I don't need to follow up and remove any of these.
I can now see the Tactical overview and ops screens, but the data isn't consistent with the home screen, which links to the tactical overview. When I compare this to the old Nagios server, the home screen link is accurate, but when I click directly on the Tactical Overview, it is incorrect.
I am still running into the issue where I click on the 'x Unhandled Problems' in the Tactical Overview and can't see anything: "No matching services found".
Re: Tactical Overview, Ops Center, Ops Screen Problems
Posted: Wed Mar 06, 2013 12:21 pm
by scottwilkerson
jbennett wrote: Now that I've successfully imported the graphing information and have it showing up on the new server
Great...
jbennett wrote: I've tried clicking on one of the graphs to get more detailed information. Upon doing so, I see the following: "You are not authorized to access this feature. Contact your Nagios XI administrator for more information, or to obtain access to this feature."
This is odd, does it only happen when you click on a performance graph that you are viewing on the service detail page -> Performance graph section?
jbennett wrote: Also, am I correct in seeing that the files that were generated as part of the process have since been deleted? I no longer see the .rrd.xml files in any of the folders. I only see .rrd and .xml files. I just want to make sure I don't need to follow up and remove any of these.
Correct.
jbennett wrote: I can now see the Tactical overview and ops screens, but the data isn't consistent with the home screen, which links to the tactical overview. When I compare this to the old Nagios server, the home screen link is accurate, but when I click directly on the Tactical Overview, it is incorrect.
There is a bug in the current TAC; this will be fixed in 2012R1.7 when it is released.
Re: Tactical Overview, Ops Center, Ops Screen Problems
Posted: Wed Mar 06, 2013 1:41 pm
by jbennett
scottwilkerson wrote: This is odd, does it only happen when you click on a performance graph that you are viewing on the service detail page -> Performance graph section?
When I go to the Performance graph tab on the service details page, I get the same thing.
However, I am noticing that all of my host and service checks seem to be in a perpetual pending state once I click on them. For example:
When I'm using the quick find and I type in a host name, I get that host with a list of the services assigned to be checked.
The status for those services says "Ok" with a last check of very recently. However, the host itself is greyed out and notifications are disabled. When I click on either the host or the service, it tells me that they are pending.
I just checked my npcd log and have found the following:
Code:
[03-06-2013 07:59:22] NPCD: npcd Daemon (0.4.14) started with PID=3371
[03-06-2013 07:59:22] NPCD: Please have a look at 'npcd -V' to get license information
[03-06-2013 07:59:22] NPCD: HINT: load_threshold is enabled - ('20.000000')
[03-06-2013 11:29:01] NPCD: WARN: MAX load reached: load 22.010000/20.000000 at i=0
[03-06-2013 11:29:17] NPCD: WARN: MAX load reached: load 26.120000/20.000000 at i=1
[03-06-2013 11:29:32] NPCD: WARN: MAX load reached: load 25.710000/20.000000 at i=1
[03-06-2013 11:29:47] NPCD: WARN: MAX load reached: load 24.930000/20.000000 at i=1
[03-06-2013 11:29:56] NPCD: Caught Termination Signal - Hasta la vista... baby
[03-06-2013 11:35:42] NPCD: npcd Daemon (0.4.14) started with PID=3388
[03-06-2013 11:35:42] NPCD: Please have a look at 'npcd -V' to get license information
[03-06-2013 11:35:42] NPCD: HINT: load_threshold is enabled - ('20.000000')
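To see how often npcd is hitting its load threshold, the warnings in a log like the one above could be pulled out with a regular expression. This is a sketch based only on the message format visible in this excerpt; `max_load_warnings` is a hypothetical helper. It scans the whole text rather than line by line, so it also copes with entries that run together on one line.

```python
import re

# Matches e.g. "[03-06-2013 11:29:01] NPCD: WARN: MAX load reached: load 22.010000/20.000000"
MAX_LOAD = re.compile(
    r"\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] NPCD: WARN: MAX load reached: "
    r"load (\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)"
)


def max_load_warnings(log_text):
    """Return (timestamp, load, threshold) for each MAX load warning."""
    return [
        (match.group(1), float(match.group(2)), float(match.group(3)))
        for match in MAX_LOAD.finditer(log_text)
    ]
```

Each tuple holds the timestamp string, the reported load, and the configured threshold, which makes it easy to see when the load stayed above the limit.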