Alerts no longer firing
I loaded about 200 alerts into Nagios Log Server. Somewhere along the line they stopped firing. They will fire if I trigger them manually.
I have the alerts set to check every 5 minutes with a lookback of 60 minutes.
They all stopped firing at about the same time:
Wed, 29 Apr 2015 17:03:02 -0400
I'm running Nagios Log Server • 2015R1.4
Thank you
Re: Alerts no longer firing
If you do a 'follow tail' on jobs.log, do you see the 'run_alerts' command run every minute?
Code:
tail -f /usr/local/nagioslogserver/var/jobs.log
Every minute or so you should see:
Code:
Running command run_alerts with args ' ' for job id: run_all_alerts
If you do not see the above, you'll need to reset your jobs from the command subsystem page.
I would also like a tail of your cron log:
Code:
tail -n20 /var/log/cron
Let me know - thanks!
Re: Alerts no longer firing
The jobs.log is a 0 byte file
[root@pnagios05lxv ~]# tail -n20 /var/log/cron
Apr 30 14:40:01 pnagios05lxv CROND[2834]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:40:01 pnagios05lxv CROND[2835]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:41:01 pnagios05lxv CROND[2926]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:41:01 pnagios05lxv CROND[2928]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:42:01 pnagios05lxv CROND[3069]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:42:01 pnagios05lxv CROND[3070]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:43:01 pnagios05lxv CROND[3170]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:43:01 pnagios05lxv CROND[3172]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:44:01 pnagios05lxv CROND[3269]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:44:01 pnagios05lxv CROND[3271]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:45:01 pnagios05lxv CROND[3369]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:45:01 pnagios05lxv CROND[3370]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:46:01 pnagios05lxv CROND[3470]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:46:01 pnagios05lxv CROND[3471]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:47:01 pnagios05lxv CROND[3605]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:47:01 pnagios05lxv CROND[3606]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:48:01 pnagios05lxv CROND[3703]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:48:01 pnagios05lxv CROND[3705]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:49:01 pnagios05lxv CROND[3805]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:49:01 pnagios05lxv CROND[3807]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Re: Alerts no longer firing
A 0 byte jobs.log is not unusual: the file is truncated very often, which is why I suggested a 'follow tail' as opposed to a regular tail. I should have mentioned that in my original post. I would like to see if your alert jobs are firing.
Judging by your cron log, the jobs are firing properly, but I would like you to verify that.
Re: Alerts no longer firing
Is that space in "nagioslogser ver" in your /etc/cron.d/nagioslogserver file as well or is it a typo in your post?
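A quick way to answer that on the server itself, as a rough sketch: search the cron file for the spelling with the space. The function name is made up for illustration, and the path in the usage comment is simply the one named in the question above.

```shell
# Hypothetical helper: succeed (exit status 0) if the given cron file
# contains the "nagioslogser ver" spelling, with the space, that shows
# up in the pasted cron log.
cron_path_has_space() {
    # $1: cron file to inspect
    grep -q 'nagioslogser ver' "$1"
}

# Example use (path taken from the question above):
#   if cron_path_has_space /etc/cron.d/nagioslogserver; then
#       echo "the space is in the cron file"
#   else
#       echo "no space found, likely a wrapping artifact in the post"
#   fi
```

If the function reports no match, the space was most likely introduced when the log was pasted into the post.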
Re: Alerts no longer firing
Here's what I'm seeing from the tail:
[root@pnagios05lxv ~]# tail -f /usr/local/nagioslogserver/var/jobs.log
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Re: Alerts no longer firing
It looks like your alert jobs aren't coming in. Try resetting your jobs from the Command Subsystem:
Re: Alerts no longer firing
jolson wrote: It looks like your alert jobs aren't coming in. Try resetting your jobs from the Command Subsystem.
OK, cool, that looks like it got it running again. Do you know what may have caused this, and is there a way to get an alert if these jobs are not running?
Re: Alerts no longer firing
Glad to hear you're up and running again!
Typically this doesn't happen often, and I'm not sure what may have caused it in your case. Occasionally a Nagios Log Server update breaks the jobs system for whatever reason, and a job reset fixes the issue as it did in this case. Did you upgrade NLS recently? Anything notable that you might have changed before the alerts stopped triggering?
One method that I can think of is to create an alert that always fires - have it send an email to some mailbox every minute. If the emails stop coming in, that's an indication the job server may have detached.
Another method may be monitoring the maillog of the server in question for activity - if mail stops being sent, that's an indication the job server may have detached.
I understand that the above methods are a little patched together, but there is not currently a mechanism in Nagios Log Server for detecting this - I can definitely file a feature request on your behalf if you'd like me to.
Best,
Jesse
Re: Alerts no longer firing
No upgrade recently, but I did add more memory to the system and reboot. The timing almost lines up.
Thanks for the tip. I think I might just write a Nagios plugin or a check_mk check to monitor this. I don't think monitoring the maillog will work for me, as we aren't sending any e-mail notifications, just NRDP to our Nagios dashboard.
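For anyone who wants to go the plugin route, here is a minimal sketch of what such a check might look like. It is not an official Nagios Log Server tool. It assumes that a healthy jobs.log shows the "Running command run_alerts" line quoted earlier in this thread, and it samples the file several times because cron truncates it every minute. The function name and the default sample count and interval are arbitrary choices for illustration. The return codes follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# Sketch of a Nagios-style check: is the jobs subsystem still running run_alerts?
# Assumption (not confirmed beyond this thread): a healthy jobs.log contains
# "Running command run_alerts" at least once within the sampling window.

# $1: path to jobs.log
# $2: number of samples (default 5, arbitrary)
# $3: seconds between samples (default 10, arbitrary)
check_run_alerts() {
    cra_log=$1
    cra_samples=${2:-5}
    cra_interval=${3:-10}

    if [ ! -r "$cra_log" ]; then
        echo "UNKNOWN - cannot read $cra_log"
        return 3
    fi

    cra_i=1
    while [ "$cra_i" -le "$cra_samples" ]; do
        # The file is rewritten by cron, so one read can land on an
        # empty or partial file; sampling repeatedly reduces that risk.
        if grep -q 'Running command run_alerts' "$cra_log" 2>/dev/null; then
            echo "OK - run_alerts seen in $cra_log"
            return 0
        fi
        # Wait before the next sample, but not after the last one.
        if [ "$cra_i" -lt "$cra_samples" ]; then
            sleep "$cra_interval"
        fi
        cra_i=$((cra_i + 1))
    done

    echo "CRITICAL - no run_alerts line in $cra_log after $cra_samples samples"
    return 2
}
```

To use it as a plugin, end the script with `check_run_alerts /usr/local/nagioslogserver/var/jobs.log; exit $?`. With the defaults it can take up to about 40 seconds to report CRITICAL, so the check timeout on the monitoring side needs to allow for that.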