a long running service

smcracraft · Post by **smcracraft** » Thu Jul 17, 2014 1:33 am

A service I wrote takes about 8 minutes to run.

It doesn't use the NRPE check; it is just a standalone check, run on a few boxes, that polls some queues
over time.

I don't want to raise the current service_check_timeout setting in the cfg from 60 seconds
to 8+ minutes. As it stands, the check gets killed by the default 60-second service check timeout.

What are my options other than:

+ not writing the monitor (that's not possible)
+ making it a standalone cron/at/daemon of some kind

The reasons for keeping it in Nagios are obvious.

I can't believe there is no per-service service_check_timeout, only a global setting. NRPE
is not needed in this case, so I can't use its -t option: the check runs on the monitoring server itself, not remotely,
and I don't want to "loop back".

smcracraft · Post by **smcracraft** » Thu Jul 17, 2014 2:16 pm

That's strange. I bumped up service_check_timeout in the cfg and restarted
but the service is still getting terminated within 60 seconds.

abrist · Post by **abrist** » Thu Jul 17, 2014 5:26 pm

Have you considered running the check from cron and having it write its results to the command pipe, or submitting a passive check result another way?

Leaving nagios forked for 8+ minutes is generally a bad idea.
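
The passive-check route suggested above can be sketched roughly as follows. This is a minimal illustration, not a drop-in script: the command file path, host name, and service description are assumptions (the real path is whatever `command_file` points to in nagios.cfg), and the service would need passive checks enabled. The long-running job does its work outside Nagios and only hands over the final result as a `PROCESS_SERVICE_CHECK_RESULT` external command.

```python
import time

# Assumed location; use the command_file value from your own nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_passive_result(host, service, return_code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line.

    return_code follows the plugin convention:
    0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    """
    timestamp = int(time.time() if now is None else now)
    # The command is a single line, so flatten any newlines in the output.
    output = output.replace("\n", " ")
    return "[{}] PROCESS_SERVICE_CHECK_RESULT;{};{};{};{}\n".format(
        timestamp, host, service, return_code, output
    )


def submit_passive_result(line, command_file=COMMAND_FILE):
    """Write the command line to the Nagios command pipe."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    # Hypothetical host and service names, for illustration only.
    line = format_passive_result(
        "monitor01", "Queue Poll", 0, "OK - queues polled for 8 minutes"
    )
    print(line, end="")
    # submit_passive_result(line)  # enable on a host where the pipe exists
```

Run from cron, the script can take as long as it needs, since Nagios never forks it and so never applies service_check_timeout to it.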

smcracraft · Post by **smcracraft** » Mon Jul 21, 2014 8:45 pm

Yes - we have.

tmcdonald · Post by **tmcdonald** » Wed Jul 23, 2014 12:51 pm

smcracraft wrote: Yes - we have.

What objections do you have to running it as a cron job? This seems like the easiest way to keep the check on your box without having to edit it and add in a -t flag.

Timeouts are usually handled by the plugin itself and tend to default to 10 seconds, so the global Nagios timeout of 60 seconds is a sort of failsafe.

One other option is to make the checks passive, stick them on the remote server, and then they can have whatever timeout they need.

smcracraft · Post by **smcracraft** » Tue Jul 29, 2014 3:51 pm

No objections to cron, although it puts a few too many eggs in one basket, but no matter. Remote runs are also
a no-no here.

In my case, I took the long-running, complex monitor and simply made it record its runs in logfiles and then parse the prior runs
on the Nth attempt, so it can measure long-running things over longer intervals than Nagios likes. It runs fine.
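
For anyone landing here later, one way the log-based approach described above might look is sketched below. It is a guess at the general shape, not the actual monitor: the log path, window size, threshold, and the idea of logging one queue-depth sample per run are all assumptions. Each invocation finishes quickly, appends one sample, and judges the service from the last few recorded runs.

```python
import sys
import time

LOG_PATH = "/var/tmp/queue_monitor.log"  # hypothetical log location
WINDOW = 8             # hypothetical: number of recent runs to evaluate
CRITICAL_DEPTH = 1000  # hypothetical queue-depth threshold


def record_sample(path, depth, now=None):
    """Append one 'timestamp depth' line for the current run."""
    timestamp = int(time.time() if now is None else now)
    with open(path, "a") as log:
        log.write("{} {}\n".format(timestamp, depth))


def read_recent(path, count):
    """Return the depths from the last `count` recorded runs, oldest first."""
    with open(path) as log:
        rows = [line.split() for line in log if line.strip()]
    return [int(depth) for _, depth in rows[-count:]]


def evaluate(samples, window=WINDOW, critical=CRITICAL_DEPTH):
    """Return (exit_code, message) using Nagios plugin exit codes."""
    if len(samples) < window:
        return 0, "OK - collecting samples ({}/{})".format(len(samples), window)
    recent = samples[-window:]
    if min(recent) >= critical:
        return 2, "CRITICAL - queue depth stayed at or above {} for {} runs".format(
            critical, window
        )
    return 0, "OK - queue depth dipped to {} within the last {} runs".format(
        min(recent), window
    )


def main(current_depth):
    """Plugin entry point: log this run, then judge the recent history."""
    record_sample(LOG_PATH, current_depth)
    code, message = evaluate(read_recent(LOG_PATH, WINDOW))
    print(message)
    sys.exit(code)
```

A wrapper would call `main()` with whatever the quick queue poll returns. With a check interval of one minute and a window of 8, the verdict covers roughly the same 8 minutes as the original long run, while each individual check stays well under the 60-second timeout.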

Hooray for space over time!

tmcdonald · Post by **tmcdonald** » Wed Jul 30, 2014 1:35 pm

If you have similar needs in the future (monitoring an average over time, it sounds like) you can take a look at a third-party addon called bischeck:

http://assets.nagios.com/downloads/nagi ... ios-XI.pdf

It can be a bit difficult to learn the syntax at first, but once you know it you can monitor things like change in averages over time (for disk usage increase rates) or varying thresholds throughout the day.

Nagios Support Forum
