Upgrade from 2011 to 2012 Failed

Post by **JulianFDRacing** » Tue Oct 23, 2012 4:29 pm

[1350988185] Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1350988178.perfdata.host' timed out after 5 seconds
[1351030293] Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1351030285.perfdata.host' timed out after 5 seconds
[1351030681] Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1351030675.perfdata.host' timed out after 5 seconds

Could this be disk corruption? If so are there any recommended ways of running fsck?
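When triaging warnings like these, it can help to turn the bracketed epoch values into readable times and see how far apart the timeouts are. The following is a minimal Python sketch; the function name `parse_perfdata_timeouts` and the regular expression are assumptions inferred only from the three lines quoted above, not from a documented Nagios log grammar.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the warnings quoted in this post (an assumption,
# not an official definition of the Nagios log format).
_TIMEOUT_RE = re.compile(
    r"^\[(?P<epoch>\d+)\] Warning: (?P<kind>Host|Service) performance data "
    r"file processing command '(?P<command>.*)' "
    r"timed out after (?P<seconds>\d+) seconds"
)


def parse_perfdata_timeouts(lines):
    """Return (UTC datetime, kind, timeout seconds, command) per matching line."""
    events = []
    for line in lines:
        match = _TIMEOUT_RE.match(line.strip())
        if match is None:
            continue  # not a perfdata timeout warning, skip it
        when = datetime.fromtimestamp(int(match["epoch"]), tz=timezone.utc)
        events.append(
            (when, match["kind"], int(match["seconds"]), match["command"])
        )
    return events


# First warning from the post, used as sample input.
SAMPLE_LINE = (
    "[1350988185] Warning: Host performance data file processing command "
    "'/bin/mv /usr/local/nagios/var/host-perfdata "
    "/usr/local/nagios/var/spool/xidpe/1350988178.perfdata.host' "
    "timed out after 5 seconds"
)

for when, kind, seconds, _command in parse_perfdata_timeouts([SAMPLE_LINE]):
    print(when.isoformat(), kind, f"timeout={seconds}s")
```

Times are reported in UTC so that the result does not depend on the server's local clock, which matters here because the clock itself is under suspicion.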

Post by **lmiltchev** » Tue Oct 23, 2012 4:38 pm

Are your graphs working now? If you go to the "Advanced" tab under "Host (or Service) Status Detail" do you see anything on the "Performance Data:" line? Have you tried restarting npcd?

service npcd restart

Post by **JulianFDRacing** » Tue Oct 23, 2012 4:52 pm

Interestingly, NPCD was stopped when I tried to restart it. I'll see whether anything starts to come through. The RRD timestamps look OK, but I'm not sure whether the data is in them. Is there anything that can view the contents of an RRD?

Post by **JulianFDRacing** » Tue Oct 23, 2012 5:19 pm

It seems to have moved forward by 40 minutes and is still 3 hours behind. If NPCD stops intermittently (I have rebooted a couple of times), could it lose sections of data and end up a total of 3 hours out? If so, how can I make it catch up and start collecting again?

I checked that the services still have process_perf_data enabled and that the "Advanced" tab lists the performance data in the field, and all looked OK. I set process_perf_data through a template, but I also used the bulk modification tool to make sure it is enabled; that made no difference for that host.

My original post said that this had been happening since the upgrade, but after a restart the data caught up to 7:20. I restarted again at about 10pm, and the data now shows up to just before 8pm. Hence my question: if NPCD stops, is it possible for it to get out of sync?

I'm keeping an eye on the NPCD service to see whether it stops. Is there any way to check the RRDs, or to confirm that NPCD is working?
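On the question of viewing RRD contents: the `rrdtool` command-line utility has `lastupdate`, `info`, and `fetch` subcommands that report the most recent update time, the file's metadata, and recent stored values. Below is a small Python sketch that builds and runs those commands. The helper names are hypothetical, and the example path in the comment is an assumption about where the RRD files live on this system; substitute the real path.

```python
import subprocess


def rrd_inspection_commands(rrd_path):
    """Build rrdtool command lines for inspecting a single RRD file."""
    return {
        # Timestamp and values of the most recent update written to the file.
        "lastupdate": ["rrdtool", "lastupdate", rrd_path],
        # Header metadata: step, data sources, archives, last_update.
        "info": ["rrdtool", "info", rrd_path],
        # Averaged values stored for the last hour.
        "fetch_last_hour": [
            "rrdtool", "fetch", rrd_path, "AVERAGE", "--start", "-1h",
        ],
    }


def run_inspection(rrd_path, which="lastupdate"):
    """Run one of the commands above and return its text output.

    Requires the rrdtool binary on PATH; raises CalledProcessError on failure.
    """
    argv = rrd_inspection_commands(rrd_path)[which]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# Example call (hypothetical path, adjust to your installation):
# print(run_inspection("/path/to/perfdata/<host>/<service>.rrd"))
```

If `lastupdate` reports a timestamp several hours old while the checks themselves are current, the data is not reaching the RRD, which points at the perfdata processing chain rather than at the graphs.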

Post by **JulianFDRacing** » Tue Oct 23, 2012 6:03 pm

DATATYPE::SERVICEPERFDATA TIMET::Tue Oct 23 23:55:17 2012 HOSTNAME::asc-jadedev2.int.ascribe.com SERVICEDESC::cnwujcc FAT Client SERVICEPERFDATA:: SERVICECHECKCOMMAND::check_nrpe!check_jadestatus_cnwujcc!1!!!!!! HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD SERVICEOUTPUT::PROCESS OK - 1 process(es)
DATATYPE::SERVICEPERFDATA TIMET::Tue Oct 23 23:55:17 2012 HOSTNAME::asc-jadedev2.int.ascribe.com SERVICEDESC::rdnscam AppServer SERVICEPERFDATA:: SERVICECHECKCOMMAND::check_nrpe!check_appstatus_rdnscam!1!!!!!! HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD SERVICEOUTPUT::PROCESS OK - 1 process(es)
DATATYPE::SERVICEPERFDATA TIMET::Tue Oct 23 23:55:17 2012 HOSTNAME::asc-jadedev2.int.ascribe.com SERVICEDESC::ascsaccu FAT Client SERVICEPERFDATA:: SERVICECHECKCOMMAND::check_nrpe!check_jadestatus_ascsaccu!1!!!!!! HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD SERVICEOUTPUT::PROCESS OK - 4 process(es)

Extract from the output of:
perl -pe 's/(\d+)/localtime($1)/e' service-perfdata

The time looks accurate at this point.
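To check many such records at once, the `KEY::value` layout of the lines above can be split into fields. This is a minimal Python sketch; `parse_perfdata_line` is a hypothetical helper, and it assumes that every key is an uppercase word followed by `::`, as in the extract. It does not rely on a particular separator between fields.

```python
import re

# A key is an uppercase word followed by "::" at the start of the line or
# after whitespace (assumption based on the extract above).
_KEY_RE = re.compile(r"(?:^|\s)([A-Z]+)::")


def parse_perfdata_line(line):
    """Split one perfdata record into a dict of KEY -> value strings."""
    matches = list(_KEY_RE.finditer(line))
    fields = {}
    for index, match in enumerate(matches):
        # A value runs from the end of its key to the start of the next key.
        if index + 1 < len(matches):
            end = matches[index + 1].start()
        else:
            end = len(line)
        fields[match.group(1)] = line[match.end():end].strip()
    return fields


# First record from the extract above.
SAMPLE = (
    "DATATYPE::SERVICEPERFDATA TIMET::Tue Oct 23 23:55:17 2012 "
    "HOSTNAME::asc-jadedev2.int.ascribe.com SERVICEDESC::cnwujcc FAT Client "
    "SERVICEPERFDATA:: SERVICECHECKCOMMAND::check_nrpe!check_jadestatus_cnwujcc!1!!!!!! "
    "HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD "
    "SERVICEOUTPUT::PROCESS OK - 1 process(es)"
)

record = parse_perfdata_line(SAMPLE)
print(record["SERVICEDESC"], "| perfdata:", repr(record["SERVICEPERFDATA"]))
```

In all three records shown, the `SERVICEPERFDATA` field is empty, so these particular checks would not be expected to feed a graph regardless of how NPCD behaves.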

Post by **scottwilkerson** » Tue Oct 23, 2012 7:33 pm

Let us know if problems re-occur

Post by **JulianFDRacing** » Wed Oct 24, 2012 3:21 am

scottwilkerson wrote: Let us know if problems re-occur

The problem hasn't gone away. I left it overnight to see whether any data would be collected, and it now appears to be only about an hour out. The Nagios log is still showing the timeout errors. This VM is running pretty slowly; is it possible that it's just in catch-up mode?

Post by **scottwilkerson** » Wed Oct 24, 2012 7:00 am

JulianFDRacing wrote: is it possible that it's just in catch-up mode?

Yes, and I would recommend raising the timeout to around 15 seconds in /usr/local/nagios/etc/pnp/process_perfdata.cfg
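As a rough illustration of that change, the Python sketch below rewrites the timeout value in the text of the config file. It assumes the setting appears as a line of the form `TIMEOUT = 5`, which should be verified against the actual file before relying on it; `set_pnp_timeout` is a hypothetical helper, not part of PNP or Nagios.

```python
import re

# Matches an uncommented "TIMEOUT = <number>" line (assumed layout).
_TIMEOUT_LINE = re.compile(r"^([ \t]*TIMEOUT[ \t]*=[ \t]*)\d+([ \t]*)$", re.MULTILINE)


def set_pnp_timeout(cfg_text, seconds):
    """Return cfg_text with the TIMEOUT value replaced by `seconds`."""
    new_text, count = _TIMEOUT_LINE.subn(
        lambda m: f"{m.group(1)}{seconds}{m.group(2)}", cfg_text
    )
    if count == 0:
        # Fail loudly rather than silently leaving the file unchanged.
        raise ValueError("no TIMEOUT line found in config text")
    return new_text


# Illustrative input only; the real file contains other settings as well.
print(set_pnp_timeout("TIMEOUT = 5\n", 15), end="")
```

Working on the text rather than the file keeps the sketch side-effect free; in practice, back up the file, write the result to it, and restart npcd so that the new value is picked up.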

Post by **JulianFDRacing** » Wed Oct 24, 2012 8:30 am

This is bizarre. When I left this overnight it caught up and was only about half an hour out. It then seemed to stall for a few hours until I restarted the whole box, at which point it jumped forward an hour, from 11:10 to 12:10, but it seems to have stalled again. I restarted at 14:00, when it moved forward an hour. NPCD is running and the clock is correct (I managed to get ESX changed so that it no longer updates the time). The graph data now generally shows up to 12:10, but a couple of graphs still show 11:10. I'm not convinced that the graphing service is running reliably, as it seems to work just after a reboot and then stop again.

I'm going to restart the services to apply the timeout setting from your last post, but the issue still seems to exist despite everything appearing to work OK.

Thanks

Post by **lmiltchev** » Wed Oct 24, 2012 11:10 am

It seems that the problems you are experiencing were caused by your ESX server resetting the time after a reboot. Since you've fixed that, your system will (*should*) eventually catch up. I would recommend keeping an eye on the time and making sure that npcd is running.

Nagios Support Forum
