
Re: Upgrade from 2011 to 2012 Failed

Posted: Tue Oct 23, 2012 4:29 pm
by JulianFDRacing
[1350988185] Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1350988178.perfdata.host' timed out after 5 seconds
[1351030293] Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1351030285.perfdata.host' timed out after 5 seconds
[1351030681] Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1351030675.perfdata.host' timed out after 5 seconds

Could this be disk corruption? If so, are there any recommended ways of running fsck?

Re: Upgrade from 2011 to 2012 Failed

Posted: Tue Oct 23, 2012 4:38 pm
by lmiltchev
Are your graphs working now? If you go to the "Advanced" tab under "Host (or Service) Status Detail", do you see anything on the "Performance Data:" line? Have you tried restarting npcd?

service npcd restart

Re: Upgrade from 2011 to 2012 Failed

Posted: Tue Oct 23, 2012 4:52 pm
by JulianFDRacing
Interestingly, NPCD was stopped when I tried to restart it. I'll see if anything starts to come through. The RRD timestamps look OK, but I'm not sure whether the data is actually in them. Is there anything that can view the content of an RRD?
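rrdtool itself can show what an RRD holds: `rrdtool last` prints the epoch time of the last update, and `rrdtool fetch` dumps the stored data points. The wrapper functions below are a hypothetical convenience sketch; the `RRDTOOL` variable only exists so the binary can be given as a full path.

```shell
#!/bin/sh
# Inspect what an RRD file actually contains, using rrdtool's own subcommands.
# RRDTOOL may be set to a full path; it defaults to rrdtool found on $PATH.
RRDTOOL="${RRDTOOL:-rrdtool}"

# Seconds elapsed since the RRD was last updated ("rrdtool last" prints an epoch time).
rrd_age() {
    last=$("$RRDTOOL" last "$1") || return 1
    now=$(date +%s)
    echo $((now - last))
}

# Dump the AVERAGE consolidated data points for the last N hours (default 3).
# Rows showing nan are intervals for which no data was stored.
rrd_recent() {
    "$RRDTOOL" fetch "$1" AVERAGE --start "-${2:-3}h"
}
```

For example, `rrd_recent /usr/local/nagios/share/perfdata/<host>/<service>.rrd 6` (the directory is an assumed Nagios XI default) shows whether the last six hours contain numbers or only nan rows, which is more telling than the file's modification time. `rrdtool info` on the same file lists its data sources and step size.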

Re: Upgrade from 2011 to 2012 Failed

Posted: Tue Oct 23, 2012 5:19 pm
by JulianFDRacing
It seems to have moved forward by 40 minutes but is still 3 hours behind. If NPCD is stopped intermittently (I have rebooted a couple of times), could it lose sections of data and end up a total of 3 hours out? If so, how can I make it catch up and start collecting again?

I did check that the services still have process_perf_data set to on, and that the Advanced tab lists the performance data in the field; everything looked OK. In fact, I set process_perf_data through a template, but I also used the bulk modification tool to make sure it's enabled, and it made no difference to that host.

My original post said this started with the upgrade, but after a restart it caught up to 7:20. I then restarted again at about 10 pm, and the data now shows up to just before 8 pm, hence my question: if NPCD stops, is it possible for it to get out of sync?
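One way to judge whether processing is catching up is to look at what is still queued. The file names in the timeout warnings start with the epoch time they were created (for example 1350988178.perfdata.host), so the oldest queued name gives a rough lag. This sketch assumes the xidpe spool directory from those warnings; whether npcd reads from that directory directly or from a later stage is not stated in this thread.

```shell
#!/bin/sh
# Report how many perfdata files are queued and how old the oldest one is.
# Usage: spool_backlog [dir] [now_epoch]
# ASSUMPTION: default directory taken from the timeout warnings in this thread.
spool_backlog() {
    dir="${1:-/usr/local/nagios/var/spool/xidpe}"
    now="${2:-$(date +%s)}"
    count=0
    oldest=""
    for f in "$dir"/[0-9]*.perfdata.*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        ts=${f##*/}               # strip the directory
        ts=${ts%%.*}              # keep the leading epoch time
        count=$((count + 1))
        if [ -z "$oldest" ] || [ "$ts" -lt "$oldest" ]; then
            oldest=$ts
        fi
    done
    if [ "$count" -eq 0 ]; then
        echo "queued=0 lag=0s"
    else
        echo "queued=$count lag=$((now - oldest))s"
    fi
}
```

Running `spool_backlog` a few minutes apart shows the trend: a lag that shrinks means the queue is draining, while a queue that only grows points at the processing step rather than at lost data.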

I'm keeping an eye on the NPCD service to see if it stops. Is there any way to check the RRDs, or to check that NPCD is working?
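An indirect check that NPCD is doing its job is to look for RRD files that have stopped being written to. A minimal sketch; the default directory /usr/local/nagios/share/perfdata is an assumption about where this install keeps its RRDs.

```shell
#!/bin/sh
# List RRD files that have not been modified in the last N minutes (default 30).
# ASSUMPTION: the default directory follows a typical Nagios XI layout.
stale_rrds() {
    dir="${1:-/usr/local/nagios/share/perfdata}"
    find "$dir" -type f -name '*.rrd' -mmin +"${2:-30}"
}
```

If `stale_rrds | wc -l` climbs while NPCD still shows as running, the daemon is up but not getting through its work.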

Re: Upgrade from 2011 to 2012 Failed

Posted: Tue Oct 23, 2012 6:03 pm
by JulianFDRacing
DATATYPE::SERVICEPERFDATA TIMET::Tue Oct 23 23:55:17 2012 HOSTNAME::asc-jadedev2.int.ascribe.com SERVICEDESC::cnwujcc FAT Client SERVICEPERFDATA:: SERVICECHECKCOMMAND::check_nrpe!check_jadestatus_cnwujcc!1!!!!!! HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD SERVICEOUTPUT::PROCESS OK - 1 process(es)
DATATYPE::SERVICEPERFDATA TIMET::Tue Oct 23 23:55:17 2012 HOSTNAME::asc-jadedev2.int.ascribe.com SERVICEDESC::rdnscam AppServer SERVICEPERFDATA:: SERVICECHECKCOMMAND::check_nrpe!check_appstatus_rdnscam!1!!!!!! HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD SERVICEOUTPUT::PROCESS OK - 1 process(es)
DATATYPE::SERVICEPERFDATA TIMET::Tue Oct 23 23:55:17 2012 HOSTNAME::asc-jadedev2.int.ascribe.com SERVICEDESC::ascsaccu FAT Client SERVICEPERFDATA:: SERVICECHECKCOMMAND::check_nrpe!check_jadestatus_ascsaccu!1!!!!!! HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD SERVICEOUTPUT::PROCESS OK - 4 process(es)

The extract above was produced with:
perl -pe 's/(\d+)/localtime($1)/e' service-perfdata

The time looks accurate at this point.

Re: Upgrade from 2011 to 2012 Failed

Posted: Tue Oct 23, 2012 7:33 pm
by scottwilkerson
Let us know if problems re-occur

Re: Upgrade from 2011 to 2012 Failed

Posted: Wed Oct 24, 2012 3:21 am
by JulianFDRacing
scottwilkerson wrote: Let us know if problems re-occur
The problem hasn't gone away. I left it overnight to see if any data was collected, and now it only appears to be about an hour out, but the Nagios log is still showing the timeout errors. This VM is running pretty slowly. Is it possible that it's just in catch-up mode?

Re: Upgrade from 2011 to 2012 Failed

Posted: Wed Oct 24, 2012 7:00 am
by scottwilkerson
JulianFDRacing wrote: is it possible that it's just in catch-up mode?
Yes. I'd also recommend setting the timeout to something like 15 seconds in /usr/local/nagios/etc/pnp/process_perfdata.cfg.
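The timeout change can be scripted. This is a minimal sketch that assumes the setting is a `TIMEOUT = 5` style line, which is how PNP4Nagios ships process_perfdata.cfg; check the file first, because the key name is an assumption about this install. The function keeps a .bak copy and rewrites the file in place.

```shell
#!/bin/sh
# Set TIMEOUT in PNP's process_perfdata.cfg, keeping a .bak copy.
# Usage: set_pnp_timeout SECONDS [cfg_file]
# ASSUMPTION: the file contains an uncommented "TIMEOUT = <n>" line.
set_pnp_timeout() {
    case "$1" in
        ''|*[!0-9]*) echo "timeout must be a whole number of seconds" >&2; return 1 ;;
    esac
    cfg="${2:-/usr/local/nagios/etc/pnp/process_perfdata.cfg}"
    cp "$cfg" "$cfg.bak" || return 1
    # Rewrite only the value, keeping the key and the spacing around "=".
    sed "s/^\([[:space:]]*TIMEOUT[[:space:]]*=[[:space:]]*\).*/\1$1/" "$cfg.bak" > "$cfg"
}
```

After `set_pnp_timeout 15`, restart npcd with `service npcd restart` so the new value is picked up. The warnings at the top of this page report a 5-second limit on the host perfdata file processing command itself; in Nagios Core that limit comes from the separate `perfdata_timeout` option in nagios.cfg, so it may be worth checking as well.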

Re: Upgrade from 2011 to 2012 Failed

Posted: Wed Oct 24, 2012 8:30 am
by JulianFDRacing
This is bizarre. When I left it overnight it caught up and was only about half an hour out, but then it seemed to stall for a few hours until I restarted the whole box. After that it appeared to jump forward an hour, from 11:10 to 12:10, but it seems to have stalled again. I restarted at 14:00, at which point it moved forward an hour. NPCD is running and the clock is correct (I managed to get ESX changed so that it no longer updates the time). The graph data is now generally showing up to 12:10, but a couple of graphs still show 11:10. I'm not convinced that the graphing service is running reliably, as it seems to work just after a reboot and then stop again.

I'm going to restart the services to pick up the timeout setting from your last post, but this issue still seems to exist despite everything appearing to work OK.

Thanks

Re: Upgrade from 2011 to 2012 Failed

Posted: Wed Oct 24, 2012 11:10 am
by lmiltchev
It seems like the problems you are experiencing are caused by your ESX server resetting the time after a reboot. Since you've fixed that, your system will (*should*) eventually catch up. I would recommend keeping an eye on the time and making sure npcd is running.
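Both things worth watching here can be checked from cron. This is a hypothetical watchdog sketch: it logs when the clock has moved backwards since the previous run (what a hypervisor resetting the guest clock would look like) and starts npcd if no process named exactly "npcd" is found. The log and state-file paths are illustrative assumptions, and `service npcd start` simply mirrors the restart command used earlier in this thread.

```shell
#!/bin/sh
# Hypothetical cron watchdog for npcd and the system clock.
# ASSUMPTIONS: log/state paths are illustrative; pgrep (procps) is installed.

npcd_is_running() { pgrep -x npcd >/dev/null; }
npcd_start()      { service npcd start; }

stamp() { date '+%Y-%m-%d %H:%M:%S'; }

# Start npcd if it is not running, logging the action and its output.
npcd_watchdog() {
    log="${1:-/var/log/npcd-watchdog.log}"
    npcd_is_running && return 0
    echo "$(stamp) npcd not running, starting it" >> "$log"
    npcd_start >> "$log" 2>&1
}

# Record the current epoch time and log a warning if it is earlier than
# the time recorded on the previous run.
clock_check() {
    state="${1:-/var/tmp/npcd-watchdog.last}"
    log="${2:-/var/log/npcd-watchdog.log}"
    now="${3:-$(date +%s)}"
    if [ -s "$state" ]; then
        last=$(cat "$state")
        if [ "$now" -lt "$last" ]; then
            echo "$(stamp) clock moved backwards by $((last - now))s" >> "$log"
        fi
    fi
    echo "$now" > "$state"
}
```

To use it, save the functions to a script, add `clock_check` and `npcd_watchdog` as its last two lines, and run it from root's crontab, for instance `*/5 * * * * /usr/local/bin/npcd-watchdog.sh` (a hypothetical path). The log then records when npcd died and whether a clock jump preceded it.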