Weird scheduling issues

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Weird scheduling issues

Post by slansing »

Are the check files getting cleared from your /tmp directory? What do their timestamps look like? Are some far enough in the past that they would normally have been reaped?
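If it helps, here is a rough sketch for answering the timestamp question without eyeballing a directory listing. It assumes the leftover files sit directly in /tmp and that their names start with "check"; both the prefix and the one-hour cutoff are my assumptions, so adjust them to what you actually see.

```python
import time
from pathlib import Path


def stale_check_files(directory="/tmp", prefix="check", max_age_seconds=3600, now=None):
    """Return (path, age_in_seconds) pairs for matching files older than the cutoff, oldest first."""
    now = time.time() if now is None else now
    root = Path(directory)
    if not root.is_dir():
        return []
    stale = []
    for entry in root.iterdir():
        # "check" is an assumed filename prefix, not something confirmed in this thread.
        if not entry.name.startswith(prefix) or not entry.is_file():
            continue
        try:
            age = now - entry.stat().st_mtime
        except OSError:
            continue  # file vanished or is unreadable; skip it
        if age > max_age_seconds:
            stale.append((entry, age))
    stale.sort(key=lambda item: item[1], reverse=True)
    return stale


if __name__ == "__main__":
    found = stale_check_files()
    print(f"{len(found)} stale files")
    for path, age in found[:20]:
        print(f"{age / 86400:8.1f} days  {path}")
```

This only lists files; it does not delete anything.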
vAJ
Posts: 456
Joined: Thu Nov 08, 2012 5:09 pm
Location: Austin, TX

Re: Weird scheduling issues

Post by vAJ »

The longest-running check I have is 10 seconds, per the perf page.
Andrew J. - Do you even grok?
vAJ

Re: Weird scheduling issues

Post by vAJ »

They go all the way back to October of last year, so yes, they should have been reaped.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Weird scheduling issues

Post by scottwilkerson »

The check files in /tmp will not get reaped if they were left behind in the case outlined.

Your longest check may be 10 seconds now, but that figure only covers the period since Nagios was last started.
vAJ

Re: Weird scheduling issues

Post by vAJ »

So, given the number and age of these files, is it reasonable to assume that I have some outstanding performance problems with this instance?
slansing

Re: Weird scheduling issues

Post by slansing »

Not necessarily. These are temporary files created when a check is sent out, and they are normally dropped when the check returns. If a check does not return in time (within the default of 10 seconds, for instance), its file can be left behind, and it will stay there until you remove it.

So they are safe to delete, and you have little to lose by doing so. They could simply be there because a check took longer than 10 seconds to return at some point between October and now.
vAJ

Re: Weird scheduling issues

Post by vAJ »

They're not just from October. The dates run consistently from then until now, including today. There are 58k files in total.
slansing

Re: Weird scheduling issues

Post by slansing »

Do you know offhand which checks in your environment could be taking a while to return data? Think along the lines of a Windows update check, or a check that triggers a remote script and waits for it to finish running. You could also correlate the timestamps and the frequency at which those temp files get left behind, then try to work out which check is responsible from that angle. Either way, it is safe to remove those temp files now, as they are of no further use to the system.
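For the timestamp correlation, something along these lines might do as a starting point. It is only a sketch: it groups the leftover files by the date and hour of their modification time, so you can see whether they cluster around a particular schedule. The /tmp location and the "check" filename prefix are assumptions on my part.

```python
import time
from collections import Counter
from pathlib import Path


def bucket_by_hour(directory="/tmp", prefix="check"):
    """Count leftover files per 'YYYY-MM-DD HH:00' bucket of their modification time (local time)."""
    buckets = Counter()
    # prefix is an assumed filename pattern; change it to match the real files.
    for entry in Path(directory).glob(prefix + "*"):
        if not entry.is_file():
            continue
        try:
            stamp = time.localtime(entry.stat().st_mtime)
        except OSError:
            continue  # file vanished between listing and stat
        buckets[time.strftime("%Y-%m-%d %H:00", stamp)] += 1
    return buckets


if __name__ == "__main__":
    # Print the busiest hours first.
    for bucket, count in bucket_by_hour().most_common(20):
        print(f"{count:6d}  {bucket}")
```

If the busiest buckets line up with a known check interval or a maintenance window, that narrows down which checks to look at.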
vAJ

Re: Weird scheduling issues

Post by vAJ »

Spot-checking the check files, they all look like NSClient checks against Windows hosts.
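With 58k files, a tally may be more reliable than spot checks. The sketch below assumes each leftover file contains plain `host_name=` and `service_description=` lines, which is how I understand Nagios check result files to be laid out; verify that against one of your files first. The "check" filename prefix is also an assumption.

```python
from collections import Counter
from pathlib import Path


def tally_checks(directory="/tmp", prefix="check"):
    """Count (host, service) pairs found in leftover check files."""
    counts = Counter()
    for entry in Path(directory).glob(prefix + "*"):
        if not entry.is_file():
            continue
        try:
            text = entry.read_text(errors="replace")
        except OSError:
            continue  # unreadable or already gone
        records = []
        for line in text.splitlines():
            key, sep, value = line.partition("=")
            if not sep:
                continue
            if key == "host_name":
                # Treat every host_name line as the start of a new result.
                records.append([value.strip(), None])
            elif key == "service_description" and records:
                records[-1][1] = value.strip()
        for host, service in records:
            counts[(host, service)] += 1
    return counts
```

Calling `tally_checks().most_common(20)` lists the hosts and services that leave the most files behind. A service of `None` means the file named a host but no service.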
slansing

Re: Weird scheduling issues

Post by slansing »

Have you noticed any notifications or alerts mentioning "Socket timeout" or "timeout after 'x' seconds"? You may want to add a:

-t 30

flag (or an even higher number) in the CCM to some of those NSClient++ Windows checks that could take longer to return. This will extend the timeout of the check.
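Before changing the checks in the CCM, you could try the longer timeout by hand against one slow host. This is a minimal sketch that assumes the checks use the `check_nt` plugin; the plugin path and the port are typical defaults rather than values from this thread, and the helper names are made up for illustration.

```python
import subprocess

# Assumed install path of the plugin; adjust to your system.
PLUGIN = "/usr/local/nagios/libexec/check_nt"


def build_check_nt_command(host, variable, timeout=30, port=12489, extra_args=()):
    """Build a check_nt argument list with an explicit -t timeout (port 12489 is an assumption)."""
    return [PLUGIN, "-H", host, "-p", str(port), "-v", variable, *extra_args, "-t", str(timeout)]


def run_check(command, timeout=30):
    """Run the plugin and return (exit_code, first line of output)."""
    # Allow a few extra seconds so the plugin's own timeout fires before ours does.
    done = subprocess.run(command, capture_output=True, text=True, timeout=timeout + 5)
    lines = done.stdout.strip().splitlines()
    return done.returncode, lines[0] if lines else ""
```

For example, `run_check(build_check_nt_command("192.0.2.10", "CPULOAD", extra_args=("-l", "5,80,90")))` would run a CPU load check with a 30 second timeout; the address here is a placeholder.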