Alerts are not running automatically

Posted: Tue Sep 19, 2017 1:03 pm
by ssoliveira
Alerts are not being executed automatically according to the "Check interval" parameter.

I have 3 alerts that analyze the logs for delays in email delivery.

Each alert analyzes a 1-hour period of logs and is configured to run every 5 minutes. However, the analyses are not being executed. I need them to run every 5 minutes so that the problem clears automatically in Nagios XI as the environment returns to normal.

I tried other settings, such as a 1-hour "Check Interval" and a 1-hour "Search Period", but the analysis only works when I run it manually.

Apparently the automatic routine is not working.

As shown in the pictures, the last execution was yesterday, when I was setting up the service.

Can you help me fix it?
Thank you.

Re: Alerts are not running automatically

Posted: Tue Sep 19, 2017 1:20 pm
by cdienger
Change the 0 to "1:" to send a warning if there are no matching lines.

https://nagios-plugins.org/doc/guidelin ... HOLDFORMAT
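
For readers unfamiliar with the range notation in that guideline, here is a minimal sketch of how it is usually read, for integers only. check_range is a hypothetical helper written for this illustration and is not part of Nagios Log Server. A bare "0" means the range 0..0, so zero matching lines is inside the range and raises nothing, while "1:" means 1 to infinity, so zero matching lines falls outside it and raises the alert.

```shell
#!/bin/sh
# Sketch of the Nagios plugin threshold range check (integers only).
# check_range is a hypothetical helper, not a Nagios Log Server command.
# Prints ALERT when the value triggers the threshold, OK otherwise.
check_range() {
    value=$1
    range=$2
    inside=0                          # becomes 1 when the range starts with "@"
    case $range in
        @*) inside=1; range=${range#@} ;;
    esac
    case $range in
        *:*) start=${range%%:*}; end=${range#*:} ;;
        *)   start=0; end=$range ;;   # "10" is shorthand for "0:10"
    esac
    [ -n "$start" ] || start=0        # ":10" is treated like "0:10"
    in_range=1
    # "~" as the start means negative infinity; an empty end means infinity.
    if [ "$start" != "~" ] && [ "$value" -lt "$start" ]; then in_range=0; fi
    if [ -n "$end" ] && [ "$value" -gt "$end" ]; then in_range=0; fi
    # Plain ranges alert when the value is outside; "@" ranges alert when inside.
    if [ "$in_range" -eq "$inside" ]; then echo ALERT; else echo OK; fi
}

check_range 0 0     # zero matches against threshold "0"
check_range 0 1:    # zero matches against threshold "1:"
```

The first call reports OK and the second reports ALERT, which is the difference between the two settings when no lines match.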

Re: Alerts are not running automatically

Posted: Tue Sep 19, 2017 6:33 pm
by ssoliveira
Hi, the problem is not the threshold.

The problem is that the analysis is not being executed automatically; it only works if I click the manual execution icon.

It is set to 1, and the alert still does not execute automatically according to the "Check interval" parameter.

Re: Alerts are not running automatically

Posted: Wed Sep 20, 2017 11:18 am
by cdienger
What is the status of the run_all_alerts job, and what is its frequency set to, under Administration > System > Command Subsystem? Try clearing the current status with the "Reset All Jobs" button. If it's still failing after this, make sure the cron job is running with the command:

Code:

ps aux | grep jobs
You should see a line like:

Code:

/bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1
Also, run the command:

Code:

tail -f /usr/local/nagioslogserver/var/jobs.log
Every 20 seconds (or as dictated by the frequency set on the run_all_alerts job) you should see an entry like:

Code:

Running command run_alerts with args ' ' for job id: run_all_alerts

If not, review the /var/log/cron log as well as /etc/cron.d/nagioslogserver, which should contain:

Code:

* * * * * nagios /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1
* * * * * nagios /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1
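
The two file checks above can be scripted roughly as follows. This is only a sketch: cron_has_entry and last_alert_run are hypothetical helper names made up for this example, and the paths in the usage comments are the ones quoted in this post.

```shell
#!/bin/sh
# cron_has_entry (hypothetical helper): succeeds when FILE contains an
# uncommented line that runs "index.php <subcommand>".
cron_has_entry() {
    file=$1
    subcommand=$2
    grep -Eq "^[[:space:]]*[^#[:space:]].*index\.php $subcommand( |\$)" "$file"
}

# last_alert_run (hypothetical helper): prints the last run_alerts line
# found in a jobs.log-style file, or nothing when there is none.
last_alert_run() {
    awk '/Running command run_alerts/ { line = $0 }
         END { if (line != "") print line }' "$1"
}

# Example use on a Log Server node (not run here):
#   cron_has_entry /etc/cron.d/nagioslogserver jobs   || echo "jobs entry missing"
#   cron_has_entry /etc/cron.d/nagioslogserver poller || echo "poller entry missing"
#   last_alert_run /usr/local/nagioslogserver/var/jobs.log
```

Since both cron lines redirect with ">", the log files appear to be overwritten on every run, so the log may hold only the most recent output.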

Re: Alerts are not running automatically

Posted: Wed Sep 20, 2017 6:31 pm
by ssoliveira
Hello, good evening,

The run_all_alerts job was in "Running" status, with its next run scheduled for the 24th of last month.

After I clicked the "Reset All Jobs" button, the job stayed in "Waiting" status for a few seconds and then changed to "Waiting" with a "Last Run Status" of "SUCCESS".

I ran the "ps aux | grep jobs" command on the 4 servers.

Three of them returned a jobs process and one did not, as shown in the output below.

Should there be a job running on all servers, or is there a concept of a master?

Code:

[root@datalog-ugt-log1 ~]#
[root@datalog-ugt-log1 ~]#
[root@datalog-ugt-log1 ~]# ps aux | grep jobs
nagios   10493  0.0  0.0 113120  1208 ?        Ss   20:15   0:00 /bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var jobs.log 2>&1
nagios   10495  0.1  0.0 262028 14732 ?        S    20:15   0:00 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs
root     10652  0.0  0.0 112648   968 pts/0    R+   20:15   0:00 grep --color=auto jobs
[root@datalog-ugt-log1 ~]#
[root@datalog-ugt-log1 ~]#
[root@datalog-ugt-log1 ~]# ssh datalog-ugt-log2
Last login: Thu Aug 31 17:57:30 2017 from 172.23.3.149
[root@datalog-ugt-log2 ~]#
[root@datalog-ugt-log2 ~]# ps aux | grep jobs
nagios    4790  0.0  0.0 113120  1192 ?        Ss   20:15   0:00 /bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var jobs.log 2>&1
nagios    4793  0.1  0.0 240140 14088 ?        S    20:15   0:00 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs
root      4947  0.0  0.0 112648   968 pts/0    S+   20:15   0:00 grep --color=auto jobs
[root@datalog-ugt-log2 ~]#
[root@datalog-ugt-log2 ~]#
[root@datalog-ugt-log2 ~]# ssh datalog-utb-log1
Last login: Thu Aug 31 17:23:56 2017 from datalog-ugt-log1
[root@datalog-utb-log1 ~]#
[root@datalog-utb-log1 ~]# ps aux | grep jobs
root     29678  0.0  0.0 112648   968 pts/0    S+   20:16   0:00 grep --color=auto jobs
[root@datalog-utb-log1 ~]#
[root@datalog-utb-log1 ~]#
[root@datalog-utb-log1 ~]# ssh datalog-utb-log2
Last login: Thu Aug 31 17:50:03 2017 from datalog-ugt-log2
[root@datalog-utb-log2 ~]#
[root@datalog-utb-log2 ~]#
[root@datalog-utb-log2 ~]# ps aux | grep jobs
nagios   25685  0.0  0.0 113120  1208 ?        Ss   20:16   0:00 /bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var jobs.log 2>&1
nagios   25687  0.1  0.0 240140 14088 ?        S    20:16   0:00 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs
root     25787  0.0  0.0 112648   968 pts/0    S+   20:16   0:00 grep --color=auto jobs
[root@datalog-utb-log2 ~]#
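
A note on reading the listing above: the cron entry starts the jobs command once a minute, so a single ps snapshot can miss it on a node, and the grep process always shows up in its own output. A minimal sketch for checking this more reliably follows. jobs_running is a hypothetical helper name, and whether every node is expected to run the job is not something this sketch can answer.

```shell
#!/bin/sh
# jobs_running (hypothetical helper): reads `ps aux`-style output on stdin
# and succeeds only if a process other than grep has "index.php jobs" in
# its command line (field 11 of `ps aux` is the start of COMMAND).
jobs_running() {
    awk '/index\.php jobs/ && $11 != "grep" { found = 1 }
         END { exit !found }'
}

# Example use (not run here): sample each host a few times over a minute
# instead of once.
#   for host in datalog-ugt-log1 datalog-ugt-log2 datalog-utb-log1 datalog-utb-log2; do
#       ssh "$host" ps aux | jobs_running && echo "$host: seen" || echo "$host: not seen"
#   done
```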
I then ran the job manually on the 4 servers.

When I executed it on the first server, datalog-ugt-log1, the command returned no output and kept running for minutes, until I cancelled it with "Ctrl + C".

Is this behavior expected? Does it keep running indefinitely, or should the request to this page start and then finish?

Code:

[root@datalog-ugt-log1 ~]#
[root@datalog-ugt-log1 ~]# /bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1
^C
[root@datalog-ugt-log1 ~]# tail -f /usr/local/nagioslogserver/var/jobs.log
Running command run_alerts with args ' ' for job id: run_all_alerts
SUCCESS
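
One possible explanation for the hang, offered as an assumption rather than a confirmed diagnosis: the command above was copied from the ps listing, where the quoting is no longer visible. Typed that way, "sh -c" takes only the next word (/usr/bin/php) as the command string, and the remaining words become $0, $1 and so on. php would then start with no script argument and wait for input on the terminal. The sketch below shows the same effect with echo.

```shell
#!/bin/sh
# Unquoted: only "echo" is the command string; "hello" becomes $0.
unquoted=$(sh -c echo hello)
# Quoted: the whole string "echo hello" is the command.
quoted=$(sh -c 'echo hello')
# The extra words are still visible to the command string as $0 and $1.
params=$(sh -c 'echo "$0 $1"' first second)

printf 'unquoted: [%s]\n' "$unquoted"
printf 'quoted:   [%s]\n' "$quoted"
printf 'params:   [%s]\n' "$params"

# For a manual test, the form shown in the cron file has no sh -c prefix:
#   /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs
```

This prints an empty pair of brackets for the unquoted call, "hello" for the quoted one, and "first second" for the last one.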
Contents of the /etc/cron.d/nagioslogserver file on 4 servers.

Code:

[root@datalog-ugt-log1 ~]# ssh datalog-ugt-log1 cat /etc/cron.d/nagioslogserver
# /etc/cron.d/nagioslogserver: crontab fragment for nagioslogserver

* * * * * nagios /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1
* * * * * nagios /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1

[root@datalog-ugt-log1 ~]# ssh datalog-ugt-log2 cat /etc/cron.d/nagioslogserver
# /etc/cron.d/nagioslogserver: crontab fragment for nagioslogserver

* * * * * nagios /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1
* * * * * nagios /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1

[root@datalog-ugt-log1 ~]# ssh datalog-utb-log1 cat /etc/cron.d/nagioslogserver
# /etc/cron.d/nagioslogserver: crontab fragment for nagioslogserver

* * * * * nagios /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1
* * * * * nagios /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1

[root@datalog-ugt-log1 ~]# ssh datalog-utb-log2 cat /etc/cron.d/nagioslogserver
# /etc/cron.d/nagioslogserver: crontab fragment for nagioslogserver

* * * * * nagios /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1
* * * * * nagios /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1

Re: Alerts are not running automatically

Posted: Thu Sep 21, 2017 9:33 am
by scottwilkerson
Click the button on the last image you uploaded that says "Reset All Jobs".

Somehow the run_all_alerts job got stuck in a running state.