Alerts no longer firing

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
Jklre
Posts: 163
Joined: Wed May 28, 2014 1:56 pm

Post by Jklre »

I loaded about 200 alerts into Nagios Log Server. Somewhere along the line they stopped firing. They still fire if I trigger them manually.

I have the alerts set to check every 5 minutes with a lookback of 60 minutes.

They all stopped firing at about the same time:

Wed, 29 Apr 2015 17:03:02 -0400

I'm running Nagios Log Server 2015R1.4.

Thank you
4.jpg
5.jpg
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Post by jolson »

If you do a 'follow tail' on jobs.log, do you see the 'run_alerts' command run every minute?

Code:

tail -f /usr/local/nagioslogserver/var/jobs.log
Every minute or so you should see:

Code:

Running command run_alerts with args ' ' for job id: run_all_alerts
If you do not see the above, you'll need to reset your jobs from the Command Subsystem page.
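Because cron rewrites jobs.log every minute, a single look at the file can land just after it was truncated. A minimal sketch, assuming a POSIX-style shell with grep, that polls the file once a second for up to about two cron cycles instead of relying on an interactive tail. The function name 'wait_for_run_alerts' is hypothetical; the default path and the matched text come from the command and log line above.

```shell
# Poll jobs.log until the run_alerts line appears or the wait expires.
# jobs.log is truncated and rewritten by cron every minute, so checking it
# repeatedly catches the line even if one look finds an empty file.
# Returns 0 if the line was seen, 1 if roughly max_wait seconds passed without it.
wait_for_run_alerts() {
    local logfile=${1:-/usr/local/nagioslogserver/var/jobs.log}
    local max_wait=${2:-120}   # default: roughly two cron cycles
    local elapsed=0
    while [ "$elapsed" -lt "$max_wait" ]; do
        # -q: exit status only, no output; -F: match the text literally.
        # 2>/dev/null hides the error if the path is wrong or unreadable.
        if grep -qF "for job id: run_all_alerts" "$logfile" 2>/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

Usage, once the function is defined: wait_for_run_alerts /usr/local/nagioslogserver/var/jobs.log 120 && echo "run_alerts seen" || echo "run_alerts not seen, reset the jobs"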

I would also like a tail of your cron log:

Code:

tail -n20 /var/log/cron
Let me know - thanks!
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
Jklre
Posts: 163
Joined: Wed May 28, 2014 1:56 pm

Post by Jklre »

The jobs.log is a 0 byte file
6.jpg
[root@pnagios05lxv ~]# tail -n20 /var/log/cron
Apr 30 14:40:01 pnagios05lxv CROND[2834]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:40:01 pnagios05lxv CROND[2835]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:41:01 pnagios05lxv CROND[2926]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:41:01 pnagios05lxv CROND[2928]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:42:01 pnagios05lxv CROND[3069]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:42:01 pnagios05lxv CROND[3070]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:43:01 pnagios05lxv CROND[3170]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:43:01 pnagios05lxv CROND[3172]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:44:01 pnagios05lxv CROND[3269]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:44:01 pnagios05lxv CROND[3271]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:45:01 pnagios05lxv CROND[3369]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:45:01 pnagios05lxv CROND[3370]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:46:01 pnagios05lxv CROND[3470]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:46:01 pnagios05lxv CROND[3471]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:47:01 pnagios05lxv CROND[3605]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:47:01 pnagios05lxv CROND[3606]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:48:01 pnagios05lxv CROND[3703]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:48:01 pnagios05lxv CROND[3705]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Apr 30 14:49:01 pnagios05lxv CROND[3805]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Apr 30 14:49:01 pnagios05lxv CROND[3807]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogser ver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Post by jolson »

Jklre wrote: The jobs.log is a 0 byte file
jobs.log is truncated every time the jobs cron runs (once a minute), which is why I suggested a 'follow tail' rather than a regular tail. I should have mentioned that in my original post. I'd like to see whether your alert jobs are firing.

Judging by your cron log, the jobs are running properly, but I'd like you to verify that.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Post by ssax »

Is that space in "nagioslogser ver" in your /etc/cron.d/nagioslogserver file as well or is it a typo in your post?
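One quick way to answer that on the server itself is a hypothetical helper ('cron_has_split_path') that greps the cron file for the split path. grep exits 0 and prints the matching line numbers if the space really is there, 1 if it isn't, and 2 if the file can't be read.

```shell
# Print any lines in the cron file that contain the split path
# "nagioslogser ver" (with a space), prefixed by their line numbers.
# grep's exit status: 0 = found, 1 = not found, 2 = file could not be read.
cron_has_split_path() {
    local cronfile=${1:-/etc/cron.d/nagioslogserver}
    grep -nF "nagioslogser ver" "$cronfile"
}
```

Usage: cron_has_split_path /etc/cron.d/nagioslogserver || echo "no split path found (or file unreadable)"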
Jklre
Posts: 163
Joined: Wed May 28, 2014 1:56 pm

Post by Jklre »

Here's what I'm seeing from the tail:

[root@pnagios05lxv ~]# tail -f /usr/local/nagioslogserver/var/jobs.log
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Post by jolson »

It looks like your alert jobs aren't coming in. Try resetting your jobs from the Command Subsystem:
2015-04-30 14_19_28-Command Subsystem • Nagios Log Server.png
Jklre
Posts: 163
Joined: Wed May 28, 2014 1:56 pm

Post by Jklre »

OK, cool, that got it running again. Do you know what may have caused this? And is there a way to get an alert if these jobs stop running?
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Post by jolson »

Glad to hear you're up and running again!

This doesn't happen often, and I'm not sure what caused it in your case. Occasionally a Nagios Log Server update breaks the jobs system for whatever reason, and a job reset fixes the issue, as it did here. Did you upgrade NLS recently? Did you change anything notable before the alerts stopped triggering?
Jklre wrote: Do you know what may have caused this and is there a way to have this alert if these jobs are not running?
One method I can think of is to create an alert that always fires and have it send an email to a mailbox every minute. If those emails stop arriving, that's an indication the job server may have detached.
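A minimal dead-man's-switch sketch for that always-firing alert. It assumes, hypothetically, that the heartbeat emails are delivered to a local mbox file whose modification time changes on each delivery, and that GNU stat is available; note that a mail client editing the mbox also changes its mtime, so this is approximate. It follows the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN); the function name and the 300-second default are illustrative.

```shell
# Dead-man's-switch sketch for the always-firing heartbeat alert.
# Assumption (hypothetical): heartbeat mails are delivered to a local mbox
# file, and each delivery updates that file's modification time.
# Exit codes follow the Nagios plugin convention: 0 OK, 2 CRITICAL, 3 UNKNOWN.
check_heartbeat_mailbox() {
    local mbox=$1
    local max_age=${2:-300}   # seconds without new mail before CRITICAL
    local now mtime age
    if [ ! -f "$mbox" ]; then
        echo "UNKNOWN - $mbox not found"
        return 3
    fi
    now=$(date +%s)
    mtime=$(stat -c %Y "$mbox")   # GNU stat: mtime in seconds since the epoch
    age=$((now - mtime))
    if [ "$age" -gt "$max_age" ]; then
        echo "CRITICAL - no heartbeat mail for ${age}s"
        return 2
    fi
    echo "OK - mailbox last changed ${age}s ago"
    return 0
}
```

Usage: check_heartbeat_mailbox /var/spool/mail/nlsheartbeat 300, where the mailbox path is a placeholder for wherever the heartbeat alert's mail ends up.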

Another method is to monitor the server's maillog for activity: if mail stops being sent, that also suggests the job server may have detached.
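A similar sketch for the maillog approach, assuming Postfix-style log lines that contain 'status=sent', traditional syslog timestamps such as 'Apr 30 14:40:01', and GNU date to parse them. Because syslog omits the year, the computed age can be wrong around New Year. The function name and the 600-second default are hypothetical.

```shell
# Report how long ago the last successfully delivered message was logged.
# Assumptions: Postfix-style "status=sent" lines, syslog timestamps in the
# first three fields, GNU date for parsing. Nagios-style exit codes:
# 0 OK, 2 CRITICAL, 3 UNKNOWN.
check_maillog_activity() {
    local maillog=${1:-/var/log/maillog}
    local max_age=${2:-600}   # seconds without delivered mail before CRITICAL
    local last stamp ts now age
    last=$(grep -F 'status=sent' "$maillog" 2>/dev/null | tail -n 1)
    if [ -z "$last" ]; then
        echo "UNKNOWN - no delivered mail found in $maillog"
        return 3
    fi
    # awk splits on runs of blanks, so "May  1" (padded day) is handled too.
    stamp=$(printf '%s\n' "$last" | awk '{print $1, $2, $3}')
    ts=$(date -d "$stamp" +%s) || { echo "UNKNOWN - cannot parse '$stamp'"; return 3; }
    now=$(date +%s)
    age=$((now - ts))
    if [ "$age" -gt "$max_age" ]; then
        echo "CRITICAL - last mail sent ${age}s ago"
        return 2
    fi
    echo "OK - last mail sent ${age}s ago"
    return 0
}
```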

I realize these methods are a little patched together, but Nagios Log Server doesn't currently have a mechanism for detecting this. I can definitely file a feature request on your behalf if you'd like.

Best,


Jesse
Jklre
Posts: 163
Joined: Wed May 28, 2014 1:56 pm

Post by Jklre »

No upgrade recently, but I did add more memory to the system and reboot it. The timing roughly lines up.

Thanks for the tip. I think I might just write a Nagios plugin or a check_mk check to monitor this. I don't think monitoring the maillog will work for me, since we aren't sending any email notifications, just NRDP to our Nagios dashboard.
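A minimal sketch of such a plugin, written as a shell function. It assumes what this thread suggests: cron rewrites jobs.log every minute, a healthy run logs 'for job id: run_all_alerts', and the broken state shows 'Processed 0 global jobs' without that line. It also treats a jobs.log that hasn't been modified for a few minutes as a sign that cron itself stopped. GNU stat is assumed; the function name and thresholds are hypothetical, and the exit codes follow the Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
# Hypothetical Nagios plugin sketch: is the NLS jobs runner still queuing
# the run_all_alerts job? Exit codes: 0 OK, 2 CRITICAL, 3 UNKNOWN.
check_nls_jobs() {
    local log=${1:-/usr/local/nagioslogserver/var/jobs.log}
    local max_age=${2:-180}   # seconds before the log counts as stale
    local tries=0 age
    if [ ! -f "$log" ]; then
        echo "UNKNOWN - $log not found"
        return 3
    fi
    # The file may be caught just after cron truncated it; retry briefly.
    while [ ! -s "$log" ] && [ "$tries" -lt 5 ]; do
        sleep 2
        tries=$((tries + 1))
    done
    # A stale file suggests cron is no longer running the jobs command.
    age=$(( $(date +%s) - $(stat -c %Y "$log") ))
    if [ "$age" -gt "$max_age" ]; then
        echo "CRITICAL - $log not updated for ${age}s (is cron running?)"
        return 2
    fi
    if grep -qF "for job id: run_all_alerts" "$log"; then
        echo "OK - run_all_alerts job is running"
        return 0
    fi
    # The state seen in this thread: jobs ran, but nothing was queued.
    if grep -qF "Processed 0 global jobs" "$log"; then
        echo "CRITICAL - run_all_alerts not queued; try resetting jobs in the Command Subsystem"
        return 2
    fi
    echo "UNKNOWN - run_all_alerts not seen in $log"
    return 3
}
```

To use it as a standalone plugin, end a script containing the function with: check_nls_jobs "$@"; exit $? . For check_mk, the same script could be wired in through MRPE.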