Upgrade from 2011 to 2012 Failed
-
- Posts: 51
- Joined: Tue Oct 16, 2012 9:45 am
Re: Upgrade from 2011 to 2012 Failed
[1350988185] Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1350988178.perfdata.host' timed out after 5 seconds
[1351030293] Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1351030285.perfdata.host' timed out after 5 seconds
[1351030681] Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1351030675.perfdata.host' timed out after 5 seconds
Could this be disk corruption? If so, are there any recommended ways of running fsck?
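To see how often these timeouts occur and whether they cluster at particular times, a minimal Python sketch like the one below can scan the Nagios log. It assumes the default log path /usr/local/nagios/var/nagios.log and that the service-side warning uses the same wording as the host-side lines quoted above. The function names are hypothetical helpers, not part of Nagios.

```python
import os
import re
import time
from collections import Counter

# Path assumed: default Nagios log location for a source/XI install.
LOG_PATH = "/usr/local/nagios/var/nagios.log"

# Matches warnings such as:
# [1350988185] Warning: Host performance data file processing command '...' timed out after 5 seconds
TIMEOUT_RE = re.compile(
    r"^\[(?P<ts>\d+)\] Warning: (?P<kind>Host|Service) performance data file "
    r"processing command '(?P<cmd>[^']*)' timed out after (?P<secs>\d+) seconds"
)


def find_perfdata_timeouts(lines):
    """Yield (epoch, 'Host'|'Service', timeout_seconds) for each matching warning."""
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m:
            yield int(m.group("ts")), m.group("kind"), int(m.group("secs"))


def timeouts_per_hour(lines):
    """Count perfdata timeout warnings per local-time hour."""
    counts = Counter()
    for epoch, _kind, _secs in find_perfdata_timeouts(lines):
        counts[time.strftime("%Y-%m-%d %H:00", time.localtime(epoch))] += 1
    return counts


if __name__ == "__main__" and os.path.exists(LOG_PATH):
    with open(LOG_PATH, errors="replace") as fh:
        for hour, count in sorted(timeouts_per_hour(fh).items()):
            print(hour, count)
```

A steady trickle of timeouts points more towards a slow or overloaded VM than towards a one-off event, though it does not by itself rule disk problems in or out.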
Re: Upgrade from 2011 to 2012 Failed
Are your graphs working now? If you go to the "Advanced" tab under "Host (or Service) Status Detail" do you see anything on the "Performance Data:" line? Have you tried restarting npcd?
service npcd restart
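To confirm whether npcd is actually running before or after a restart, a minimal Python sketch roughly equivalent to `pgrep -x npcd` is shown below. It assumes a Linux /proc filesystem and that the daemon's process name is npcd; `running_pids` is a hypothetical helper, not part of Nagios or PNP.

```python
import os


def running_pids(name, proc_root="/proc"):
    """Return PIDs whose /proc/<pid>/comm matches `name` exactly (Linux only)."""
    pids = []
    try:
        entries = os.listdir(proc_root)
    except FileNotFoundError:
        return pids  # no /proc here, e.g. not a Linux host
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                if fh.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited meanwhile, or not readable
    return sorted(pids)


if __name__ == "__main__":
    pids = running_pids("npcd")
    if pids:
        print("npcd is running, PID(s):", pids)
    else:
        print("npcd is NOT running; try: service npcd restart")
```

Running this periodically (for example from cron) would show whether npcd stops on its own between reboots.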
Be sure to check out our Knowledgebase for helpful articles and solutions!
-
- Posts: 51
- Joined: Tue Oct 16, 2012 9:45 am
Re: Upgrade from 2011 to 2012 Failed
Interestingly, NPCD was stopped when I tried to restart it. I'll see if anything starts to come through. The RRD timestamps look OK, but I'm not sure whether the data is actually in them. Is there anything that can view the contents of an RRD?
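rrdtool itself can show what is inside an RRD: `rrdtool info`, `rrdtool lastupdate`, `rrdtool fetch` and `rrdtool dump` are standard subcommands. As a sketch, the Python below wraps `rrdtool lastupdate` to report how many seconds behind a given RRD is. It assumes rrdtool is on the PATH and that its output has the usual layout (a line of data-source names, then `<epoch>: <values>`). The helper names are hypothetical.

```python
import subprocess
import time


def parse_lastupdate(output):
    """Parse `rrdtool lastupdate` output into (ds_names, last_epoch, values).

    Expected layout (blank lines are ignored):
         1 2
        1351030681: 0.5 U
    """
    lines = [line for line in output.splitlines() if line.strip()]
    ds_names = lines[0].split()
    stamp, _sep, rest = lines[1].partition(":")
    return ds_names, int(stamp.strip()), rest.split()


def rrd_lag_seconds(rrd_path, now=None):
    """Seconds between `now` and the RRD's last update; needs rrdtool on PATH."""
    result = subprocess.run(
        ["rrdtool", "lastupdate", rrd_path],
        capture_output=True, text=True, check=True,
    )
    _names, last_epoch, _values = parse_lastupdate(result.stdout)
    current = time.time() if now is None else now
    return current - last_epoch
```

For example, `rrd_lag_seconds("/usr/local/nagios/share/perfdata/<host>/<service>.rrd") / 3600` would give the lag in hours, assuming that is where the PNP RRDs live on this install (the path is illustrative). A value of `U` means the RRD holds no known value for that data source.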
-
- Posts: 51
- Joined: Tue Oct 16, 2012 9:45 am
Re: Upgrade from 2011 to 2012 Failed
It seems to have moved forward by 40 minutes and is still 3 hours behind. If NPCD is stopped intermittently (I have rebooted a couple of times), could it lose sections of data and end up a total of 3 hours out? If so, how can I make it catch up and start collecting again? I did check that the services still have process_perf_data set to on, and that the Advanced tab lists the performance data in its field; both were OK. In fact, I set process_perf_data through a template but used the bulk modification tool to make sure it's enabled, which made no difference for that host. My original post said the problem started with the upgrade, but after a restart it caught up to 7:20; I then restarted again at about 10pm and the data now shows up to just before 8pm. Hence my question: if NPCD stops, can it get out of sync?
I'm keeping an eye on the NPCD service to see if it stops. Is there any way to check the RRDs, or to confirm that NPCD is working?
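The timeout warnings show Nagios moving the perfdata file into /usr/local/nagios/var/spool/xidpe/ under an epoch-stamped name (for example 1350988178.perfdata.host). If processing is falling behind, such files should pile up in the spool directory, and the age of the oldest one gives a rough measure of the lag. This is a minimal sketch, assuming processed files are removed from the directory and that names follow the `<epoch>.perfdata.host` / `<epoch>.perfdata.service` pattern; `spool_backlog` is a hypothetical helper, and the directory npcd actually watches may differ on a given install.

```python
import os
import re
import time

# Names like 1350988178.perfdata.host, as seen in the timeout warnings.
SPOOL_NAME_RE = re.compile(r"^(\d+)\.perfdata\.(host|service)$")


def spool_backlog(spool_dir, now=None):
    """Return (file_count, age_of_oldest_seconds) for epoch-stamped perfdata files."""
    stamps = []
    for name in os.listdir(spool_dir):
        m = SPOOL_NAME_RE.match(name)
        if m:
            stamps.append(int(m.group(1)))
    if not stamps:
        return 0, 0
    current = time.time() if now is None else now
    return len(stamps), current - min(stamps)


if __name__ == "__main__":
    # Directory taken from the log lines in this thread; adjust if npcd watches another one.
    spool = "/usr/local/nagios/var/spool/xidpe"
    if os.path.isdir(spool):
        count, age = spool_backlog(spool)
        print(f"{count} file(s) waiting, oldest is {age / 3600:.1f} h old")
```

If the oldest-file age keeps shrinking between runs, the system is catching up; if it keeps growing, processing is stalled.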
-
- Posts: 51
- Joined: Tue Oct 16, 2012 9:45 am
Re: Upgrade from 2011 to 2012 Failed
DATATYPE::SERVICEPERFDATA TIMET::Tue Oct 23 23:55:17 2012 HOSTNAME::asc-jadedev2.int.ascribe.com SERVICEDESC::cnwujcc FAT Client SERVICEPERFDATA:: SERVICECHECKCOMMAND::check_nrpe!check_jadestatus_cnwujcc!1!!!!!! HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD SERVICEOUTPUT::PROCESS OK - 1 process(es)
DATATYPE::SERVICEPERFDATA TIMET::Tue Oct 23 23:55:17 2012 HOSTNAME::asc-jadedev2.int.ascribe.com SERVICEDESC::rdnscam AppServer SERVICEPERFDATA:: SERVICECHECKCOMMAND::check_nrpe!check_appstatus_rdnscam!1!!!!!! HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD SERVICEOUTPUT::PROCESS OK - 1 process(es)
DATATYPE::SERVICEPERFDATA TIMET::Tue Oct 23 23:55:17 2012 HOSTNAME::asc-jadedev2.int.ascribe.com SERVICEDESC::ascsaccu FAT Client SERVICEPERFDATA:: SERVICECHECKCOMMAND::check_nrpe!check_jadestatus_ascsaccu!1!!!!!! HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD SERVICEOUTPUT::PROCESS OK - 4 process(es)
The above is an extract from the output of:
perl -pe 's/(\d+)/localtime($1)/e' service-perfdata
The time looks accurate at this point.
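The perl one-liner rewrites the first run of digits on each line, which here is the TIMET epoch. A rough Python equivalent that also splits each line into its KEY::VALUE fields is sketched below, assuming the fields are tab-separated as in the stock PNP bulk-mode templates (the extract above shows them with spaces). Note that SERVICEPERFDATA is empty in every line of the extract, which would leave nothing to graph for those particular checks. The function names are hypothetical.

```python
import time


def parse_perfdata_line(line):
    """Split a PNP bulk-mode line (tab-separated KEY::VALUE fields) into a dict."""
    fields = {}
    for chunk in line.rstrip("\n").split("\t"):
        key, sep, value = chunk.partition("::")
        if sep:  # ignore chunks without a KEY:: prefix
            fields[key] = value
    return fields


def describe(line):
    """Return (readable_time, host, service, has_perfdata) for one spool line."""
    f = parse_perfdata_line(line)
    # time.ctime() renders local time in the same style as Perl's scalar localtime().
    when = time.ctime(int(f["TIMET"]))
    has_perfdata = bool(f.get("SERVICEPERFDATA") or f.get("HOSTPERFDATA"))
    return when, f.get("HOSTNAME"), f.get("SERVICEDESC"), has_perfdata
```

Running each line of service-perfdata through `describe()` and printing those where `has_perfdata` is False would list the checks that are not producing anything to graph.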
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Upgrade from 2011 to 2012 Failed
Let us know if problems re-occur
-
- Posts: 51
- Joined: Tue Oct 16, 2012 9:45 am
Re: Upgrade from 2011 to 2012 Failed
scottwilkerson wrote: Let us know if problems re-occur
The problem hasn't gone away. I left it overnight to see whether any data is collected, and now it only appears to be about an hour out, but the Nagios log is still showing the timeout errors. This VM is running pretty slowly; is it possible that it's just in catch-up mode?
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Upgrade from 2011 to 2012 Failed
JulianFDRacing wrote: is it possible that it's just in catch-up mode?
Yes. I'd also recommend raising the timeout to something like 15 seconds in /usr/local/nagios/etc/pnp/process_perfdata.cfg.
-
- Posts: 51
- Joined: Tue Oct 16, 2012 9:45 am
Re: Upgrade from 2011 to 2012 Failed
This is bizarre. When I left it overnight it caught up and was only about half an hour out, but then it seemed to stall for a few hours until I restarted the whole box. After that it appeared to jump forward an hour, from 11:10 to 12:10, but it seems to have stalled again (I restarted at 14:00, which is when it moved forward that hour). NPCD is running and the clock is correct (I managed to change the ESX setting so it no longer updates the time). The graph data now generally shows up to 12:10, though a couple of graphs still show 11:10. I'm not convinced the graphing service is running reliably: it seems to work just after a reboot and then stop again.
I'm going to restart the services to pick up the timeout setting as per your last post, but this issue still seems to exist even though everything appears to be working OK.
Thanks
Re: Upgrade from 2011 to 2012 Failed
It seems like the problems you are experiencing are caused by your ESX server resetting the time after reboot. Since you've fixed that, your system will (*should*) eventually catch up. I would recommend keeping an eye on the time and making sure npcd is running.