Re: Weird scheduling issues
Posted: Wed Apr 24, 2013 2:35 pm
by slansing
Are the check files getting cleared from your /tmp directory? What do their timestamps look like? Are some far enough in the past that they would normally have been reaped?
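One way to answer the timestamp question is a short script that lists leftover files by age. This is only a rough sketch: the function name `stale_check_files`, the `check` filename prefix, and the age threshold are assumptions for illustration, not anything Nagios provides, so adjust them to what you actually see in /tmp.

```python
import os
import time


def stale_check_files(directory, max_age_seconds, prefix="check", now=None):
    """Return (path, age_in_seconds) pairs for old files, oldest first.

    Assumption: leftover check files start with `prefix`; change it if
    the files in your directory are named differently.
    """
    now = time.time() if now is None else now
    stale = []
    for entry in os.scandir(directory):
        # Skip directories, symlinks, and unrelated files.
        if not entry.is_file(follow_symlinks=False):
            continue
        if not entry.name.startswith(prefix):
            continue
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age > max_age_seconds:
            stale.append((entry.path, age))
    # Largest age first, so the oldest leftovers are at the top.
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

Calling `stale_check_files("/tmp", 24 * 3600)` would return every matching file older than a day, which makes it easy to see how far back the leftovers go.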
Re: Weird scheduling issues
Posted: Wed Apr 24, 2013 2:44 pm
by vAJ
The longest-running check I have is 10 sec, per the perf page.
Re: Weird scheduling issues
Posted: Wed Apr 24, 2013 2:55 pm
by vAJ
They go all the way back to Oct of last year... so yeah, they should've been reaped.
Re: Weird scheduling issues
Posted: Wed Apr 24, 2013 2:59 pm
by scottwilkerson
The check files in /tmp will not get reaped if they are left behind from the case outlined.
The longest check may be 10 seconds now, but that figure only covers the period since Nagios was last started.
Re: Weird scheduling issues
Posted: Wed Apr 24, 2013 4:27 pm
by vAJ
So, given the number and age of these, is it reasonable to assume that I have some outstanding performance problems with this instance?
Re: Weird scheduling issues
Posted: Wed Apr 24, 2013 4:43 pm
by slansing
Not necessarily. They are temporary files created in memory when a check is sent out, and they are dropped from memory when the check returns. If a check does not return in time (within 10 seconds by default, for instance), its file can be left behind, and it will remain there until you remove it. So they are safe to delete, as you have little to lose. They could simply be there because a check took longer than 10 seconds to return at some point between October and now.
Re: Weird scheduling issues
Posted: Wed Apr 24, 2013 4:47 pm
by vAJ
They're not just from October. They consistently show dates from then until now, including today. 58k in total.
Re: Weird scheduling issues
Posted: Thu Apr 25, 2013 11:15 am
by slansing
Do you know offhand which checks in your environment could be taking a while to return data? Think along the lines of a Windows Update check, or a check that triggers a remote script and waits for it to finish running. You could possibly correlate the timestamps with how often those temp files get left behind, then try to figure out which check it may be from that angle as well. In any case, it is safe to remove those temp files now, as they are of no use to the system any more.
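To correlate timestamps with frequency, something like the following could bucket the leftover files by modification time. It is a sketch under assumptions: `mtime_histogram` is a made-up helper name, the `check` prefix is a guess at how the files are named, and the hourly bucket format is just one reasonable choice.

```python
import os
from collections import Counter
from datetime import datetime


def mtime_histogram(directory, prefix="check", fmt="%Y-%m-%d %H:00"):
    """Count files per time bucket, based on each file's modification time.

    Assumption: leftover check files start with `prefix`. `fmt` is a
    strftime pattern that defines the bucket (hourly by default).
    """
    counts = Counter()
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False) and entry.name.startswith(prefix):
            mtime = entry.stat(follow_symlinks=False).st_mtime
            # Local time, so buckets line up with the Nagios schedule you see.
            counts[datetime.fromtimestamp(mtime).strftime(fmt)] += 1
    return counts
```

For example, `mtime_histogram("/tmp").most_common(10)` would show the ten busiest hours. If the same hour of the day keeps showing up, a check scheduled around that time is a likely suspect.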
Re: Weird scheduling issues
Posted: Thu Apr 25, 2013 2:34 pm
by vAJ
Spot-checking the check files, they all look like NSClient checks against Windows hosts.
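With 58k files, a tally may be more reliable than spot-checking by hand. The sketch below assumes each file holds plain `key=value` lines that include `host_name` and `service_description`; verify that against a few of the real files first, since the field names, the `check` prefix, and the helper name `tally_checks` are all assumptions here.

```python
import os
from collections import Counter


def tally_checks(directory, prefix="check"):
    """Count leftover files per (host, service) pair.

    Assumptions: files start with `prefix` and contain key=value lines
    with `host_name` and, for service checks, `service_description`.
    """
    counts = Counter()
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue
        if not entry.name.startswith(prefix):
            continue
        fields = {}
        try:
            with open(entry.path, errors="replace") as handle:
                for line in handle:
                    # Split on the first "=" only, so values may contain "=".
                    key, sep, value = line.strip().partition("=")
                    if sep:
                        fields[key] = value
        except OSError:
            # Unreadable or already-removed file: skip it.
            continue
        host = fields.get("host_name", "?")
        service = fields.get("service_description", "(host check)")
        counts[(host, service)] += 1
    return counts
```

`tally_checks("/tmp").most_common(20)` would then list the host and service pairs that leave the most files behind.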
Re: Weird scheduling issues
Posted: Thu Apr 25, 2013 3:21 pm
by slansing
Have you noticed any notifications or alerts mentioning "Socket timeout" or "timeout after 'x' seconds"? You may want to add a:
flag, or even a higher number, to some of those NSClient++ Windows checks in the CCM that could take longer to return. This will extend the timeout range of the check.