Upgrade from 2011 to 2012 Failed

JulianFDRacing
Posts: 51
Joined: Tue Oct 16, 2012 9:45 am

Re: Upgrade from 2011 to 2012 Failed

Post by JulianFDRacing »

[1350988185] Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1350988178.perfdata.host' timed out after 5 seconds
[1351030293] Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1351030285.perfdata.host' timed out after 5 seconds
[1351030681] Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1351030675.perfdata.host' timed out after 5 seconds

Could this be disk corruption? If so, are there any recommended ways of running fsck?
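The bracketed numbers at the start of each log line are Unix epoch seconds, which makes it hard to see when the timeouts happened. A minimal sketch for converting them, assuming GNU `date` (the `-d @EPOCH` form); the function names `epoch_to_utc` and `humanize_log` are made up for illustration:

```shell
# Convert one epoch timestamp to a readable UTC time.
# Assumes GNU date, which accepts "-d @EPOCH".
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Read log lines on stdin and rewrite a leading [EPOCH] as [UTC time].
# Lines that do not start with [digits] are passed through unchanged.
humanize_log() {
    while IFS= read -r line; do
        case "$line" in
            \[[0-9]*\]*)
                ts="${line#\[}"      # drop the leading "["
                ts="${ts%%\]*}"      # keep only the text before the first "]"
                printf '[%s] %s\n' "$(epoch_to_utc "$ts")" "${line#*\] }"
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done
}

# First timestamp from the log excerpt above.
epoch_to_utc 1350988185   # prints 2012-10-23 10:29:45
```

To convert a whole log, something like `humanize_log < /usr/local/nagios/var/nagios.log` should work; that log path is the usual Nagios XI location and is an assumption here.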
lmiltchev
Former Nagios Staff
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Post by lmiltchev »

Are your graphs working now? If you go to the "Advanced" tab under "Host (or Service) Status Detail" do you see anything on the "Performance Data:" line? Have you tried restarting npcd?

service npcd restart
JulianFDRacing
Posts: 51
Joined: Tue Oct 16, 2012 9:45 am

Post by JulianFDRacing »

Interestingly, NPCD was stopped when I tried to restart it. I will see if anything starts to come through. The RRD timestamps look OK, but I'm not sure if the data is in them. Is there anything that can view the contents of an RRD?
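One way to look inside an RRD is the `rrdtool` command-line tool itself. The sketch below is a hedged example, not a confirmed procedure from this thread: the RRD path is a placeholder (Nagios XI usually keeps RRDs under `/usr/local/nagios/share/perfdata/<host>/`, but check your install), and `rrd_lag_minutes` is a made-up helper name.

```shell
# Minutes between an RRD's last update (epoch seconds) and "now".
# The second argument is optional and defaults to the current time.
rrd_lag_minutes() {
    last="$1"
    now="${2:-$(date +%s)}"
    echo $(( (now - last) / 60 ))
}

# Placeholder path: substitute a real host directory and service file.
rrd="/usr/local/nagios/share/perfdata/HOSTNAME/SERVICE.rrd"

# Only run the rrdtool commands if the tool and the file are present.
if command -v rrdtool >/dev/null 2>&1 && [ -f "$rrd" ]; then
    rrdtool lastupdate "$rrd"              # most recent values and their timestamp
    rrdtool fetch "$rrd" AVERAGE -s -3h    # stored averages for the past 3 hours
    echo "lag: $(rrd_lag_minutes "$(rrdtool last "$rrd")") minutes"
fi
```

Rows of `nan` in the `fetch` output would suggest the file is being touched but no samples are being stored for that period.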
JulianFDRacing
Posts: 51
Joined: Tue Oct 16, 2012 9:45 am

Post by JulianFDRacing »

It seems to have moved forward by 40 minutes and is still 3 hours behind. If NPCD is stopped intermittently (I have rebooted a couple of times), could it lose sections of data and end up a total of 3 hours out? If so, how can I make it catch up and start collecting again? I checked that the services still have process_perf_info enabled and that the "Advanced" tab lists the performance data in the field, and all looked OK. In fact, I set process_perf_info through a template, but I used the bulk modification tool to make sure it's enabled; it made no difference to that host. My original post said this started with the upgrade, but after a restart it caught up to 7:20. I restarted again at about 10pm, and the data now shows up to just before 8pm, hence my question: if NPCD stops, is it possible for it to get out of sync?

I'm keeping an eye on the NPCD service to see if it stops. Is there any way to check the RRDs, or to confirm that NPCD is working?
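A rough way to tell whether perfdata is being processed is to check that the `npcd` process exists and to watch whether files pile up in the spool directories. This is only a sketch: the `xidpe` path comes from the log lines earlier in the thread, the second spool path is an assumption about a typical Nagios XI layout, and `spool_backlog` is a made-up helper name.

```shell
# Count the files currently waiting in a spool directory.
# Prints 0 if the directory is empty or does not exist.
spool_backlog() {
    find "$1" -type f 2>/dev/null | wc -l | tr -d ' '
}

# Is an npcd process running at all?
if pgrep -x npcd >/dev/null 2>&1; then
    echo "npcd: running"
else
    echo "npcd: not running (or pgrep is unavailable)"
fi

# Report the backlog in each spool directory that exists.
for dir in /usr/local/nagios/var/spool/xidpe \
           /usr/local/nagios/var/spool/perfdata; do
    if [ -d "$dir" ]; then
        echo "$dir: $(spool_backlog "$dir") file(s) waiting"
    fi
done
```

Running this a few minutes apart gives a trend: a count that keeps growing suggests processing has stalled, while a count that shrinks suggests the system is working through a backlog.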
JulianFDRacing
Posts: 51
Joined: Tue Oct 16, 2012 9:45 am

Post by JulianFDRacing »

DATATYPE::SERVICEPERFDATA TIMET::Tue Oct 23 23:55:17 2012 HOSTNAME::asc-jadedev2.int.ascribe.com SERVICEDESC::cnwujcc FAT Client SERVICEPERFDATA:: SERVICECHECKCOMMAND::check_nrpe!check_jadestatus_cnwujcc!1!!!!!! HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD SERVICEOUTPUT::PROCESS OK - 1 process(es)
DATATYPE::SERVICEPERFDATA TIMET::Tue Oct 23 23:55:17 2012 HOSTNAME::asc-jadedev2.int.ascribe.com SERVICEDESC::rdnscam AppServer SERVICEPERFDATA:: SERVICECHECKCOMMAND::check_nrpe!check_appstatus_rdnscam!1!!!!!! HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD SERVICEOUTPUT::PROCESS OK - 1 process(es)
DATATYPE::SERVICEPERFDATA TIMET::Tue Oct 23 23:55:17 2012 HOSTNAME::asc-jadedev2.int.ascribe.com SERVICEDESC::ascsaccu FAT Client SERVICEPERFDATA:: SERVICECHECKCOMMAND::check_nrpe!check_jadestatus_ascsaccu!1!!!!!! HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD SERVICEOUTPUT::PROCESS OK - 4 process(es)

Extract from the output of:
perl -pe 's/(\d+)/localtime($1)/e' service-perfdata

The time looks accurate at this point.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Post by scottwilkerson »

Let us know if problems re-occur
JulianFDRacing
Posts: 51
Joined: Tue Oct 16, 2012 9:45 am

Post by JulianFDRacing »

scottwilkerson wrote:Let us know if problems re-occur
The problem hasn't gone away. I left it overnight to see if any data was collected, and now it only appears to be about an hour out. The Nagios log is still showing the timeout errors. This VM is running pretty slowly; is it possible that it's just in catch up mode?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Post by scottwilkerson »

JulianFDRacing wrote:is it possible that it's just in catch up mode?
Yes, and I'd recommend raising the timeout to something like 15 seconds in /usr/local/nagios/etc/pnp/process_perfdata.cfg
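A minimal sketch of that change, assuming the file contains a line of the form `TIMEOUT = <seconds>` (the key name used in stock PNP4Nagios configs; check your own copy first). The helper name `set_timeout` is made up for illustration.

```shell
# Rewrite the "TIMEOUT = N" line of a PNP-style config file in place.
# Usage: set_timeout FILE SECONDS
set_timeout() {
    cfg="$1"
    secs="$2"
    tmp="$(mktemp)" || return 1
    # Replace only the number, keeping the original spacing around "=".
    sed -E "s/^([[:space:]]*TIMEOUT[[:space:]]*=[[:space:]]*)[0-9]+/\1${secs}/" \
        "$cfg" > "$tmp" && cat "$tmp" > "$cfg"   # cat keeps the file's owner and mode
    rm -f "$tmp"
}

# Example (run as root, after taking a backup of the file):
#   cp /usr/local/nagios/etc/pnp/process_perfdata.cfg /root/process_perfdata.cfg.bak
#   set_timeout /usr/local/nagios/etc/pnp/process_perfdata.cfg 15
#   grep TIMEOUT /usr/local/nagios/etc/pnp/process_perfdata.cfg
#   service npcd restart
```

The "timed out after 5 seconds" warnings in nagios.log look like they are raised by Nagios Core, so it may also be worth checking `perfdata_timeout` in nagios.cfg. That is an assumption, not something confirmed in this thread.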
JulianFDRacing
Posts: 51
Joined: Tue Oct 16, 2012 9:45 am

Post by JulianFDRacing »

This is bizarre. When I left this overnight it caught up and was only about half an hour out. It then seemed to stall for a few hours until I restarted the whole box, after which it jumped forward an hour, from 11:10 to 12:10, but it seems to have stalled again. (I restarted at 14:00, which is when it moved forward an hour.) NPCD is running and the clock is correct (I managed to get ESX changed so that it no longer updates the time). The graph data now generally shows up to 12:10, but a couple of graphs still show 11:10. I'm not convinced that the graphing service is running reliably, as it seems to work just after a reboot and then stop again.

I'm going to restart the services to apply the timeout setting from your last post, but the issue still seems to exist despite everything appearing to work OK.

Thanks
lmiltchev
Former Nagios Staff
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Post by lmiltchev »

It seems that the problems you are experiencing were caused by your ESX server resetting the time after a reboot. Since you've fixed that, your system will (*should*) eventually catch up. I would recommend keeping an eye on the time and making sure that npcd is running.