Alerts no longer firing

Posted: Thu Apr 30, 2015 1:38 pm
by Jklre
I loaded about 200 alerts into Nagios Log Server. Somewhere along the line they stopped firing. They still fire if I trigger them manually.

I have the alerts set to check every 5 minutes with a lookback period of 60 minutes.

They all stopped firing at about the same time:

Wed, 29 Apr 2015 17:03:02 -0400

I'm running Nagios Log Server • 2015R1.4

Thank you
4.jpg
5.jpg

Re: Alerts no longer firing

Posted: Thu Apr 30, 2015 1:42 pm
by jolson
If you do a 'follow tail' on jobs.log, do you see the 'run_alerts' command run every minute?

tail -f /usr/local/nagioslogserver/var/jobs.log

Every minute or so you should see:

Running command run_alerts with args ' ' for job id: run_all_alerts

If you do not see the above, you'll need to reset your jobs from the Command Subsystem page.

I would also like a tail of your cron log:

tail -n20 /var/log/cron

Let me know - thanks!

Re: Alerts no longer firing

Posted: Thu Apr 30, 2015 1:52 pm
by Jklre
The jobs.log is a 0 byte file
6.jpg
[root@pnagios05lxv ~]# tail -n20 /var/log/cron
Apr 30 14:40:01 pnagios05lxv CROND[2834]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:40:01 pnagios05lxv CROND[2835]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:41:01 pnagios05lxv CROND[2926]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:41:01 pnagios05lxv CROND[2928]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:42:01 pnagios05lxv CROND[3069]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:42:01 pnagios05lxv CROND[3070]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:43:01 pnagios05lxv CROND[3170]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:43:01 pnagios05lxv CROND[3172]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:44:01 pnagios05lxv CROND[3269]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:44:01 pnagios05lxv CROND[3271]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:45:01 pnagios05lxv CROND[3369]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:45:01 pnagios05lxv CROND[3370]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:46:01 pnagios05lxv CROND[3470]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:46:01 pnagios05lxv CROND[3471]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:47:01 pnagios05lxv CROND[3605]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:47:01 pnagios05lxv CROND[3606]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:48:01 pnagios05lxv CROND[3703]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:48:01 pnagios05lxv CROND[3705]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:49:01 pnagios05lxv CROND[3805]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:49:01 pnagios05lxv CROND[3807]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)

Re: Alerts no longer firing

Posted: Thu Apr 30, 2015 1:55 pm
by jolson
Jklre wrote: The jobs.log is a 0 byte file
jobs.log is truncated very often (the cron job redirects into it with '>' every minute), which is why I suggested a 'follow tail' as opposed to a regular tail - I should have mentioned that in my original post. I would like to see if your alert jobs are firing.

Based on the looks of your cron log, the jobs are firing properly - but I would like you to verify that.

Re: Alerts no longer firing

Posted: Thu Apr 30, 2015 1:59 pm
by ssax
Is that space in "nagioslogser ver" in your /etc/cron.d/nagioslogserver file as well or is it a typo in your post?
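
One way to answer this is to search the cron file for the split string directly. This is a minimal sketch, assuming the cron file is at /etc/cron.d/nagioslogserver as mentioned above; the function name has_split_path is made up for illustration.

```shell
# has_split_path FILE
# Prints (with line numbers) any line containing "nagioslogser ver" with the
# stray space, and returns 0 if at least one such line exists.
# The function name is hypothetical; only standard grep is used.
has_split_path() {
    grep -n 'nagioslogser ver' "$1"
}
```

For example: has_split_path /etc/cron.d/nagioslogserver || echo "no stray space in the cron file". If nothing matches, the space was most likely introduced when the log was pasted into the post.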

Re: Alerts no longer firing

Posted: Thu Apr 30, 2015 2:02 pm
by Jklre
Here's what I'm seeing from the tail:

[root@pnagios05lxv ~]# tail -f /usr/local/nagioslogserver/var/jobs.log
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated

Re: Alerts no longer firing

Posted: Thu Apr 30, 2015 2:19 pm
by jolson
It looks like your alert jobs aren't coming in. Try resetting your jobs from the Command Subsystem:
2015-04-30 14_19_28-Command Subsystem • Nagios Log Server.png

Re: Alerts no longer firing

Posted: Thu Apr 30, 2015 3:20 pm
by Jklre
OK, cool, that looks like it got things running again. Do you know what may have caused this, and is there a way to get an alert if these jobs are not running?

Re: Alerts no longer firing

Posted: Thu Apr 30, 2015 3:31 pm
by jolson
Glad to hear you're up and running again!

Typically this doesn't happen often, and I'm not sure what may have caused it in your case. Occasionally a Nagios Log Server update breaks the jobs system for whatever reason, and a job reset fixes the issue as it did in this case. Did you upgrade NLS recently? Anything notable that you might have changed before the alerts stopped triggering?
Jklre wrote: Do you know what may have caused this and is there a way to have this alert if these jobs are not running?
One method that I can think of is to create an alert that always fires - have it send an email to some mailbox every minute. If the emails stop coming in, that's an indication the job server may have detached.

Another method may be monitoring the maillog of the server in question for activity - if mail stops being sent, that's an indication the job server may have detached.

I understand that the above methods are a little patched together, but there is not currently a mechanism in Nagios Log Server for detecting this - I can definitely file a feature request on your behalf if you'd like me to.

Best,


Jesse
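
The maillog idea can be sketched as a simple freshness check: if an always-firing alert sends mail every minute, the mail log should never go long without being modified. This is only a rough sketch. The function name log_is_fresh is made up, and the log path in the usage line (/var/log/maillog) and the 10-minute threshold are assumptions to adjust for your system.

```shell
# log_is_fresh FILE MINUTES
# Returns 0 if FILE was modified within the last MINUTES minutes, 1 otherwise
# (including when FILE does not exist).
# Relies on find's -mmin test, which GNU and BSD find provide but strict
# POSIX find does not.
log_is_fresh() {
    [ -n "$(find "$1" -mmin "-$2" 2>/dev/null)" ]
}
```

For example: log_is_fresh /var/log/maillog 10 || echo "CRITICAL - no mail activity in the last 10 minutes". This only means something if an alert is expected to send mail at least that often, and other mail traffic on the same host would mask a stalled job server.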

Re: Alerts no longer firing

Posted: Thu Apr 30, 2015 3:43 pm
by Jklre
No upgrade recently, but I did add more memory to the system and reboot. The timing almost lines up.

Thanks for the tip. I think I might just write a Nagios plugin or a check_mk check to monitor this. I don't think monitoring the maillog will work for me, as we aren't sending any email notifications, just NRDP to our Nagios dashboard.
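
A plugin along those lines could watch jobs.log for the run_alerts line jolson quoted earlier. The sketch below is not an official check: the function name, the sampling defaults, and the assumption that the line appears at least once a minute on a healthy system are all illustrative. The log path and the matched text come from this thread, and the return codes follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
# Default path taken from this thread.
JOBS_LOG_DEFAULT=/usr/local/nagioslogserver/var/jobs.log

# check_run_alerts [FILE] [ATTEMPTS] [DELAY_SECONDS]
# cron rewrites jobs.log every minute, so one read can catch an empty or
# half-written file. The file is therefore sampled several times before
# the check gives up.
check_run_alerts() {
    file=${1:-$JOBS_LOG_DEFAULT}
    attempts=${2:-12}   # assumed default: 12 samples
    delay=${3:-5}       # assumed default: 5 seconds apart, about a minute total

    if [ ! -r "$file" ]; then
        echo "UNKNOWN - cannot read $file"
        return 3
    fi

    i=0
    while [ "$i" -lt "$attempts" ]; do
        if grep -q 'Running command run_alerts' "$file" 2>/dev/null; then
            echo "OK - run_alerts seen in $file"
            return 0
        fi
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
    done

    echo "CRITICAL - run_alerts not seen in $file after $attempts samples"
    return 2
}
```

A wrapper script would call check_run_alerts "$@" and exit with its return code, so the result can be run as a normal plugin or submitted through NRDP as a passive check. Against the output posted earlier ("Processed 0 node jobs." / "Processed 0 global jobs."), this would report CRITICAL.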