Dear All,
Today I found a strange issue in Nagios.
Checks were not running for about 30 minutes.
When I checked nagios.log, I found the entries below. Please help me find the root cause.
After about 10 minutes without checks, this line appeared in nagios.log:
[1409219057] Warning: A system time change of 0d 0h 16m 56s (forwards in time) has been detected. Compensating...
I didn't make any changes on the server; I didn't even log in.
After this I found continuous log entries like:
[Thu Aug 28 15:20:36 2014] Max concurrent service checks (3000) has been reached. Nudging Ranchi by 10 seconds...
[Thu Aug 28 15:20:36 2014] Max concurrent service checks (3000) has been reached. Nudging Raipur by 10 seconds...
[Thu Aug 28 15:20:37 2014] Max concurrent service checks (3000) has been reached. Nudging Vapi by 5 seconds...
[Thu Aug 28 15:20:37 2014] Max concurrent service checks (3000) has been reached. Nudging Indore by 13 seconds...
[Thu Aug 28 15:20:37 2014] Max concurrent service checks (3000) has been reached. Nudging Plant by 6 seconds...
[Thu Aug 28 15:20:37 2014] Max concurrent service checks (3000) has been reached. Nudging DNS Server by 5 seconds...
[Thu Aug 28 15:20:38 2014] Max concurrent service checks (3000) has been reached. Nudging Ghaziabad:Environment by 7 seconds...
[Thu Aug 28 15:20:38 2014] Max concurrent service checks (3000) has been reached. Nudging Noida by 6 seconds...
[Thu Aug 28 15:20:38 2014] Max concurrent service checks (3000) has been reached. Nudging Lucknow by 13 seconds...
[Thu Aug 28 15:20:38 2014] Max concurrent service checks (3000) has been reached. Nudging FARIDABAD:Uptime by 11 seconds...
[Thu Aug 28 15:20:39 2014] Max concurrent service checks (3000) has been reached. Nudging Raipur by 9 seconds...
[Thu Aug 28 15:20:39 2014] Max concurrent service checks (3000) has been reached. Nudging Mahindra-REVA Link by 9 seconds...
[Thu Aug 28 15:20:39 2014] Max concurrent service checks (3000) has been reached. Nudging Goregaon:Environment by 9 seconds...
[Thu Aug 28 15:20:39 2014] Max concurrent service checks (3000) has been reached. Nudging :Environment by 10 seconds...
[Thu Aug 28 15:20:40 2014] Max concurrent service checks (3000) has been reached. Nudging Ghaziabad:Memory Used by 8 seconds...
All of these (Raipur, Vapi, Indore, Plant server, etc.) are network devices and servers.
All checks after that were logged like this.
After about half an hour the issue resolved itself without any intervention, and 5 to 7 minutes later I found this log entry:
[1409221391] Auto-save of retention data completed successfully.
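The bracketed numbers in nagios.log are Unix epoch seconds, so the two entries above can be lined up against the human-readable "Nudging" lines. Here is a minimal Python sketch of that conversion. The UTC+05:30 offset is an assumption (the site names suggest India Standard Time); adjust it to the server's actual timezone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the Nagios server clock is on India Standard Time (UTC+05:30).
IST = timezone(timedelta(hours=5, minutes=30))

def epoch_to_local(ts, tz=IST):
    """Convert a nagios.log epoch timestamp to a timezone-aware datetime."""
    return datetime.fromtimestamp(ts, tz=tz)

# The two epoch timestamps quoted in the post.
time_change_logged = 1409219057   # "system time change ... detected"
retention_saved = 1409221391      # "Auto-save of retention data completed"

fmt = "%a %b %d %H:%M:%S %Y"
print("time change logged:", epoch_to_local(time_change_logged).strftime(fmt))
print("retention saved:   ", epoch_to_local(retention_saved).strftime(fmt))

# Elapsed time between the two log entries.
gap = timedelta(seconds=retention_saved - time_change_logged)
print("gap between entries:", gap)
```

Under that timezone assumption, the time-change warning falls a few minutes before the first "Nudging" line shown above.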
Please help me get this issue resolved.
If it happens again, I will be in trouble.
Thanks in advance.
Regards,
Ashok.
Nagios Checks were not happening - system time change
Re: Nagios Checks were not happening - system time change
Looks like you've got some time drift. Follow these steps to resolve that:
http://assets.nagios.com/downloads/nagi ... m_Time.pdf
Easiest way would be to set up NTP. As far as concurrent checks go...sheesh... that is a lot, what are your check intervals? How many services do you have on each interval? Here are some options for the concurrent checks setting:
http://nagios.sourceforge.net/docs/nagi ... uning.html
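To put the questions about intervals and service counts into numbers, here is a rough Python sketch that estimates steady-state concurrency with Little's law (checks in flight ≈ checks started per second × average check duration). The function name, the service counts, and the 2.5 second average runtime are all hypothetical placeholders, not figures from this thread, and the estimate ignores bursts such as the backlog that can build up after a clock jump.

```python
import math

def estimate_concurrent_checks(services_by_interval, avg_check_seconds):
    """Estimate checks in flight at steady state (Little's law).

    services_by_interval maps a check interval in minutes to the number
    of services checked on that interval.
    """
    checks_per_second = sum(
        count / (minutes * 60)
        for minutes, count in services_by_interval.items()
    )
    return math.ceil(checks_per_second * avg_check_seconds)

# Hypothetical environment: 9000 services every 15 minutes,
# 3000 services every 5 minutes, checks averaging 2.5 seconds each.
example = {15: 9000, 5: 3000}
print(estimate_concurrent_checks(example, avg_check_seconds=2.5))  # prints 50
```

If the estimate for the real environment is far below the configured limit of 3000, hitting the limit more likely points to a temporary pile-up than to normal load.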
Re: Nagios Checks were not happening - system time change
Hi Slangsing,
We have already configured NTP, and we sync only with that NTP server.
I checked another server as well, and there was no time change there.
Our environment is very big, so we had to configure it like that, and we are able to manage it.
The global check interval is 15 minutes, and it varies with the criticality of the servers. For a few it is 10 minutes, 5 minutes, 3 minutes, etc.
I think that because this message appeared:
Warning: A system time change of 0d 0h 16m 56s (forwards in time) has been detected. Compensating...
the checks stopped because of the time difference: the checks were already scheduled and the expected timestamps no longer matched.
I want clarity on why the time changed and where it changed: on the NTP server or on the Nagios server?
When I checked with the NTP server team, they said they didn't make any changes,
and I didn't make any changes on the Nagios server either.
Why this happened is where I am stuck.
Please help me out.
Re: Nagios Checks were not happening - system time change
Can anybody give a clue on this, please?
I need to provide the root cause by tomorrow EOD.
I also need confirmation of whether this issue lies with the Nagios application, the server OS, or the NTP server.
Please help.
Re: Nagios Checks were not happening - system time change
Well, Nagios just detects the system time change, but has no control over actually changing it. I would assume your server experienced some drift and the NTP daemon fixed the offset. Nagios detected the change and attempted to compensate. As far as I know, Nagios, by default, does not effect system time changes.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Nagios Checks were not happening - system time change
This is NTP changing your clock. Make sure your hardware clock is in sync with your software clock and that your NTP server is reachable and has the right time. If you have a properly synced NTP client, a drift of almost 17 minutes should never happen.
Warning: A system time change of 0d 0h 16m 56s (forwards in time) has been detected. Compensating...
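To see whether jumps like the one in that warning are a one-off or recurring, the log can be scanned for every time-change warning. This is a minimal Python sketch; the regular expression is written against the single warning line quoted in this thread, so treat the pattern (and the helper name `find_time_changes`) as assumptions that may need adjusting for other Nagios versions.

```python
import re
from datetime import timedelta

# Pattern modelled on the one warning line quoted in this thread.
TIME_CHANGE = re.compile(
    r"^\[(?P<epoch>\d+)\] Warning: A system time change of "
    r"(?P<d>\d+)d (?P<h>\d+)h (?P<m>\d+)m (?P<s>\d+)s "
    r"\((?P<direction>\w+) in time\) has been detected"
)

def find_time_changes(lines):
    """Return (epoch, direction, jump) for each time-change warning."""
    changes = []
    for line in lines:
        match = TIME_CHANGE.match(line)
        if match:
            jump = timedelta(
                days=int(match["d"]),
                hours=int(match["h"]),
                minutes=int(match["m"]),
                seconds=int(match["s"]),
            )
            changes.append((int(match["epoch"]), match["direction"], jump))
    return changes

# Sample lines copied from the thread.
sample = [
    "[1409219057] Warning: A system time change of 0d 0h 16m 56s "
    "(forwards in time) has been detected. Compensating...",
    "[1409221391] Auto-save of retention data completed successfully.",
]

for epoch, direction, jump in find_time_changes(sample):
    print(epoch, direction, int(jump.total_seconds()), "seconds")
```

For a real log, pass an open file object instead of `sample`. The log path depends on the installation, so check the `log_file` setting in nagios.cfg rather than assuming a location.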
Re: Nagios Checks were not happening - system time change
Thank you very much for your valuable suggestions..
Re: Nagios Checks were not happening - system time change
Did that resolve the problem?
Former Nagios employee
Re: Nagios Checks were not happening - system time change
Actually, we didn't do anything on the Nagios application side. It corrected itself after 17 minutes, without any service restarts or troubleshooting.
I have asked the concerned team to investigate why the drift is happening.
Re: Nagios Checks were not happening - system time change
Depending on the age and manufacturer of the machine, you may have a bad or dying battery on the motherboard. Worth looking at, but glad it's working.