Jobs failing.
Posted: Mon Jul 27, 2015 10:30 am
by Jklre
We have been having a problem where the job that triggers alerts keeps failing. We set everything up to validate the alerts that came in over the weekend against our current systems, and after about an hour the alert jobs stopped running. I reset all jobs at 4:30 pm, and they failed again roughly an hour later. We have 1866 alerts that should be running. Is there anywhere I can look to figure this out? Thank you.
Re: Jobs failing.
Posted: Mon Jul 27, 2015 11:10 am
by jolson
It's possible that all of your alerts take longer than 20 seconds to run in full. Try changing the frequency to every 1 minute and see if that makes a difference. Thank you!
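As a rough back-of-the-envelope check of why the interval matters, here is a minimal sketch. It assumes the alerts run one after another within a single job interval, which may not match how the scheduler actually works; the function name `per_alert_budget_ms` is made up for illustration. The alert count is the 1866 mentioned in the first post.

```shell
#!/bin/sh
# Hypothetical helper: average time budget per alert, in milliseconds,
# if all alerts had to finish inside one job interval (an assumption).
per_alert_budget_ms() {
    interval_s=$1   # job frequency in seconds
    alerts=$2       # number of alerts to run
    echo $(( interval_s * 1000 / alerts ))
}

# Compare the 20 second frequency with the 1 and 3 minute settings.
for interval in 20 60 180; do
    echo "every ${interval}s: $(per_alert_budget_ms "$interval" 1866) ms per alert"
done
```

With 1866 alerts this prints 10 ms per alert at 20 seconds, 32 ms at 1 minute, and 96 ms at 3 minutes (integer division).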
Re: Jobs failing.
Posted: Mon Jul 27, 2015 12:10 pm
by Jklre
I'm setting it to 1 minute. We will see if it stays stable. Thank you.
Re: Jobs failing.
Posted: Mon Jul 27, 2015 12:11 pm
by jolson
That sounds good - let us know if it helps.
Re: Jobs failing.
Posted: Mon Jul 27, 2015 5:53 pm
by Jklre
Looks like the alert jobs failed again. I set them to run every 3 minutes. I noticed the next run jumped to a seemingly random time: it shows 21:48:24, but it's currently 18:53:00.
Re: Jobs failing.
Posted: Mon Jul 27, 2015 6:45 pm
by Jklre
Weird. I think the NTP time is messed up on these boxes. I just reset the jobs, and it's now stating the time is 16:42:51.
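For reference, the size of the jump between the current time and the scheduled next run reported earlier (18:53:00 and 21:48:24) can be worked out with simple arithmetic. This is only a sketch: the helper names `to_seconds` and `gap_seconds` are made up for illustration, it assumes two-digit HH:MM:SS input, and it does not handle a run that crosses midnight.

```shell
#!/bin/sh
# Convert HH:MM:SS to seconds since midnight (assumes two-digit fields).
to_seconds() {
    IFS=: read -r h m s <<EOF
$1
EOF
    # Strip one leading zero so values like "08" are not parsed as octal.
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# Seconds from the current time ($1) to the scheduled next run ($2).
gap_seconds() {
    now=$(to_seconds "$1")
    next=$(to_seconds "$2")
    echo $(( next - now ))
}

gap=$(gap_seconds 18:53:00 21:48:24)
interval=180   # jobs were set to run every 3 minutes
if [ "$gap" -gt "$interval" ]; then
    echo "next run is ${gap}s away, expected at most ${interval}s"
fi
```

Here the gap comes out to 10524 seconds (2 h 55 min 24 s), far beyond the 180 second interval, which is consistent with a clock or timezone mismatch rather than a scheduling delay.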
Re: Jobs failing.
Posted: Tue Jul 28, 2015 9:15 am
by jolson
You can try this out to change the timezone on several services at once, if you haven't already:
Code:
cd /usr/local/nagioslogserver/scripts/
./change_timezone.sh -z America/Chicago
Feel free to replace 'America/Chicago' with a location of your choice.
Also, be sure to get NTP sync'd up properly. I would restart all of your processes after the time has been corrected.
Code:
service elasticsearch restart
service logstash restart
service crond restart
service httpd restart
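Before restarting, it may help to confirm that the clock is actually synchronized. A minimal sketch, assuming the `ntpstat` utility from the ntp package is available; the function name `check_time_sync` is made up for illustration, and it falls back to just printing the system time if `ntpstat` is missing.

```shell
#!/bin/sh
# Hypothetical helper: report NTP sync status if ntpstat is installed,
# then print the current system time for a manual comparison.
check_time_sync() {
    if command -v ntpstat >/dev/null 2>&1; then
        # ntpstat exits non-zero when the clock is not synchronized.
        ntpstat || echo "clock is NOT reported as synchronized"
    else
        echo "ntpstat not found; compare the time below against a trusted clock"
    fi
    date
}

check_time_sync
```

If the reported time or timezone still looks wrong, fixing that first and then restarting the services is likely to be more useful than restarting right away.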
Re: Jobs failing.
Posted: Wed Jul 29, 2015 11:05 am
by Jklre
So far it seems stable. I set the timezones and set up NTP sync. The jobs are set to run every 3 minutes, and that seems to be working so far.
Re: Jobs failing.
Posted: Wed Jul 29, 2015 11:24 am
by jolson
That's great news - I'll keep the thread open in case there are any further problems.