Nagios Checks were not happening - system time change

ashok
Posts: 22
Joined: Mon Jul 21, 2014 8:07 am

Nagios Checks were not happening - system time change

Post by ashok »

Dear All,

Today I found a strange issue in Nagios.

Checks were not running in Nagios for about 30 minutes.

When I checked nagios.log,

I found the following entries. Please help me find the root cause.

I noticed that checks were not running, and after some time (about 10 minutes) I found the line below in nagios.log:

[1409219057] Warning: A system time change of 0d 0h 16m 56s (forwards in time) has been detected. Compensating...
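
The bracketed number in that line is a Unix epoch timestamp, which makes it hard to line up with the human-readable entries further down. A minimal Python sketch for decoding such a line follows. The function name `parse_time_change` is made up for illustration, and the regular expression assumes the message format shown above, with "backwards" as the assumed opposite of "forwards".

```python
import re
from datetime import datetime, timedelta, timezone

# Pattern based on the warning line quoted above; "backwards" is an assumption.
TIME_CHANGE_RE = re.compile(
    r"^\[(\d+)\] Warning: A system time change of "
    r"(\d+)d (\d+)h (\d+)m (\d+)s \((forwards|backwards) in time\)"
)


def parse_time_change(line):
    """Return (logged_at_utc, jump) for a time-change warning, else None."""
    m = TIME_CHANGE_RE.match(line)
    if m is None:
        return None
    epoch, days, hours, minutes, seconds = (int(g) for g in m.groups()[:5])
    jump = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    if m.group(6) == "backwards":
        jump = -jump
    # The bracketed value is seconds since 1970-01-01 UTC.
    logged_at = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return logged_at, jump
```

Applied to the line above, this gives a log time of 2014-08-28 09:44:17 UTC and a jump of 1016 seconds.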

I didn't make any changes on the server; I didn't even log in.


After this I found continuous log entries like:

[Thu Aug 28 15:20:36 2014] Max concurrent service checks (3000) has been reached. Nudging Ranchi by 10 seconds...
[Thu Aug 28 15:20:36 2014] Max concurrent service checks (3000) has been reached. Nudging Raipur by 10 seconds...
[Thu Aug 28 15:20:37 2014] Max concurrent service checks (3000) has been reached. Nudging Vapi by 5 seconds...
[Thu Aug 28 15:20:37 2014] Max concurrent service checks (3000) has been reached. Nudging Indore by 13 seconds...
[Thu Aug 28 15:20:37 2014] Max concurrent service checks (3000) has been reached. Nudging Plant by 6 seconds...
[Thu Aug 28 15:20:37 2014] Max concurrent service checks (3000) has been reached. Nudging DNS Server by 5 seconds...
[Thu Aug 28 15:20:38 2014] Max concurrent service checks (3000) has been reached. Nudging Ghaziabad:Environment by 7 seconds...
[Thu Aug 28 15:20:38 2014] Max concurrent service checks (3000) has been reached. Nudging Noida by 6 seconds...
[Thu Aug 28 15:20:38 2014] Max concurrent service checks (3000) has been reached. Nudging Lucknow by 13 seconds...
[Thu Aug 28 15:20:38 2014] Max concurrent service checks (3000) has been reached. Nudging FARIDABAD:Uptime by 11 seconds...
[Thu Aug 28 15:20:39 2014] Max concurrent service checks (3000) has been reached. Nudging Raipur by 9 seconds...
[Thu Aug 28 15:20:39 2014] Max concurrent service checks (3000) has been reached. Nudging Mahindra-REVA Link by 9 seconds...
[Thu Aug 28 15:20:39 2014] Max concurrent service checks (3000) has been reached. Nudging Goregaon:Environment by 9 seconds...
[Thu Aug 28 15:20:39 2014] Max concurrent service checks (3000) has been reached. Nudging :Environment by 10 seconds...
[Thu Aug 28 15:20:40 2014] Max concurrent service checks (3000) has been reached. Nudging Ghaziabad:Memory Used by 8 seconds...

All of these (Raipur, Vapi, Indore, Plant server, etc.) are network devices and servers.

All the checks after that were logged like this.

After half an hour,

the issue resolved itself without any intervention, and I found this log entry after 5-7 minutes:


[1409221391] Auto-save of retention data completed successfully.



Please help me get this issue resolved.

If this happens again, I will be in trouble.


Thanks in advance...

Regards,
Ashok.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Nagios Checks were not happening - system time change

Post by slansing »

Looks like you've got some time drift. Follow these steps to resolve that:

http://assets.nagios.com/downloads/nagi ... m_Time.pdf

The easiest way would be to set up NTP. As far as concurrent checks go... sheesh... that is a lot. What are your check intervals? How many services do you have on each interval? Here are some options for the concurrent checks setting:

http://nagios.sourceforge.net/docs/nagi ... uning.html
ashok
Posts: 22
Joined: Mon Jul 21, 2014 8:07 am

Re: Nagios Checks were not happening - system time change

Post by ashok »

Hi slansing,

We have already configured NTP, and the server is in sync with the NTP server.

I also checked another server, and there was no time change there.

Because our environment is very large, we had to configure it this way, and we are able to manage it.

The global check interval is 15 minutes, and it varies with the criticality of the servers: for a few it is 10 minutes, 5 minutes, 3 minutes, etc.
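
One way to put numbers on the interval question is to estimate the steady-state check rate. The sketch below is a rough back-of-the-envelope calculation only: the service counts and the 2-second average check duration are hypothetical values, not figures from this thread, and both function names are made up for illustration.

```python
def average_checks_per_second(services_by_interval):
    """services_by_interval maps check interval (minutes) -> number of services."""
    return sum(
        count / (minutes * 60.0)
        for minutes, count in services_by_interval.items()
    )


def estimated_concurrency(checks_per_second, avg_check_seconds):
    """Little's law: average checks in flight = arrival rate * time per check."""
    return checks_per_second * avg_check_seconds


# Hypothetical counts, NOT taken from the thread.
example_counts = {15: 9000, 10: 600, 5: 300, 3: 90}
rate = average_checks_per_second(example_counts)   # 12.5 checks per second
in_flight = estimated_concurrency(rate, 2.0)       # 25.0 checks in flight
```

If an estimate like this comes out far below the configured limit (3000 in the log above), hitting the limit would suggest a burst of checks or checks that are not finishing, rather than normal load.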

I think that because this warning appeared:

Warning: A system time change of 0d 0h 16m 56s (forwards in time) has been detected. Compensating...

the checks stopped because of the time difference: the checks were already scheduled, and Nagios could not find the expected timestamps.


I want clarity on why the time changed and where it changed: on the NTP server, or on the Nagios server?

When I checked with the NTP server team, they said they had not made any changes.

I did not make any changes on the Nagios server either.

Why this happened is where I am stuck.

Please help me out.
ashok
Posts: 22
Joined: Mon Jul 21, 2014 8:07 am

Re: Nagios Checks were not happening - system time change

Post by ashok »

Can anybody give me a clue on this, please?

I need to provide the root cause by tomorrow EOD.

I also need confirmation of whether this is an issue with the Nagios application, the server OS, or the NTP server.

Please help...
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Nagios Checks were not happening - system time change

Post by abrist »

Well, Nagios just detects the system time change; it has no control over actually changing it. I would assume your server experienced some drift and the NTP daemon fixed the offset. Nagios detected the change and attempted to compensate. As far as I know, Nagios, by default, does not effect system time changes.
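
To illustrate what "detecting" a time change can look like from inside a process, here is a minimal Python sketch. It is not Nagios's actual implementation. It assumes the approach of comparing the wall clock (which NTP or an administrator can step) against a monotonic clock (which cannot be stepped). The function name and the 1-second tolerance are made up for illustration.

```python
import time


def sample_clocks():
    """Return a (wall_clock, monotonic_clock) pair taken at the same moment."""
    return time.time(), time.monotonic()


def detect_time_jump(wall_before, mono_before, wall_after, mono_after,
                     tolerance=1.0):
    """Return the wall-clock jump in seconds (positive = forwards).

    Returns 0.0 when the two clocks agree within `tolerance` seconds.
    """
    wall_elapsed = wall_after - wall_before
    mono_elapsed = mono_after - mono_before
    jump = wall_elapsed - mono_elapsed
    return jump if abs(jump) > tolerance else 0.0
```

A monitoring loop could call `sample_clocks()` once per iteration and pass consecutive samples to `detect_time_jump`. A result near 1016 would correspond to the 16m 56s forward step in the warning above.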
Former Nagios employee
eloyd
Cool Title Here
Posts: 2129
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Nagios Checks were not happening - system time change

Post by eloyd »

Warning: A system time change of 0d 0h 16m 56s (forwards in time) has been detected. Compensating...
This is NTP changing your clock. Make sure your hardware clock is in sync with your software clock and that your NTP server is reachable and has the right time. If you have a properly synced NTP client, a drift of almost 17 minutes should never happen.
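
A quick way to check whether an NTP server is reachable and how far the local clock is from it is a single SNTP query. The sketch below is a minimal illustration, not a replacement for a real NTP client. It uses the standard offset formula ((t2 - t1) + (t3 - t4)) / 2, the function names are made up, and `ntp.example.com` in the usage note is a placeholder host.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def clock_offset(t1, t2, t3, t4):
    """Offset of the server clock relative to the local clock, in seconds.

    t1 = client send, t2 = server receive, t3 = server send, t4 = client receive.
    A positive result means the local clock is behind the server.
    """
    return ((t2 - t1) + (t3 - t4)) / 2.0


def ntp_to_unix(seconds, fraction):
    """Convert a 64-bit NTP timestamp (two 32-bit words) to Unix time."""
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_offset(server, timeout=5.0):
    """Send one SNTP client request to `server` (UDP port 123)."""
    packet = b"\x1b" + 47 * b"\0"  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(packet, (server, 123))
        data, _ = sock.recvfrom(512)
        t4 = time.time()
    if len(data) < 48:
        raise ValueError("short NTP response")
    words = struct.unpack("!12I", data[:48])
    t2 = ntp_to_unix(words[8], words[9])    # receive timestamp
    t3 = ntp_to_unix(words[10], words[11])  # transmit timestamp
    return clock_offset(t1, t2, t3, t4)
```

Calling `query_offset("ntp.example.com")` with your own server's name returns the estimated offset in seconds. A timeout would indicate that the server is not reachable over UDP port 123 from this host.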
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd • I'm a Nagios Fanatic!
ashok
Posts: 22
Joined: Mon Jul 21, 2014 8:07 am

Re: Nagios Checks were not happening - system time change

Post by ashok »

Thank you very much for your valuable suggestions..
:)
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Nagios Checks were not happening - system time change

Post by tmcdonald »

Did that resolve the problem?
Former Nagios employee
ashok
Posts: 22
Joined: Mon Jul 21, 2014 8:07 am

Re: Nagios Checks were not happening - system time change

Post by ashok »

Actually, we didn't do anything on the Nagios application side. It corrected itself after 17 minutes, without any service restarts or troubleshooting.
I have asked the concerned team to investigate why the drift is happening.
eloyd
Cool Title Here
Posts: 2129
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Nagios Checks were not happening - system time change

Post by eloyd »

Depending on the age and manufacturer of the machine, you may have a bad or dying battery on the motherboard. Worth looking at, but glad it's working.