Alerts not running

Post by **34Bearman** » Tue Mar 10, 2015 11:06 am

All of the alerts show a last run time from last week. I had this problem previously, but the fix doesn't appear to work this time.

Previous Post: http://support.nagios.com/forum/viewtop ... 37&t=30652

When I run the command at the bottom of that thread, alerts do not resume. Here is the output of the command that was requested last time.

nagioslogserver]# curl -XGET 'http://localhost:9200/nagioslogserver/c ... run_alerts'
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 3.3513753,
"hits" : [ {
"_index" : "nagioslogserver",
"_type" : "commands",
"_id" : "eN2vMd_BSHWiRRDFx5appQ",
"_score" : 3.3513753,
"_source":{"created":"2015-02-02 15:27:00","active":1,"status":"running","type":"system","node":"global","command":"run_alerts","run_time":1423538059,"frequency":"20","last_run_time":"2015-02-09 21:13:59","last_run_status":"SUCCESS"}
}, {
"_index" : "nagioslogserver",
"_type" : "commands",
"_id" : "run_all_alerts",
"_score" : 3.3513753,
"_source":{"command":"run_alerts","run_time":1,"frequency":20,"node":"global","type":"system","status":"waiting","active":1}
} ]
}
}

Post by **tgriep** » Tue Mar 10, 2015 2:24 pm

What version of Log Server are you running?
Try upgrading to the latest version if you have not done so already.

In the new version, you can check your job status by going to Administration > Command Subsystem.

Post by **34Bearman** » Tue Mar 10, 2015 3:48 pm

Thanks, I didn't know about that feature. When I went to the page, nothing was running and the last run times were dated 1969. I hit the Reset Jobs button. Jobs are still not running, and a screenshot is posted below.
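
A note on the 1969 dates: a date in 1969 usually means the stored timestamp is at or near zero Unix epoch seconds, displayed in a time zone west of UTC. The run_all_alerts document in the first post has "run_time":1, which would render that way. The sketch below converts the epoch values from that output using GNU date. The helper name epoch_to_local and the CST6 (UTC-6) offset are assumptions for illustration, not something confirmed in this thread.

```shell
#!/bin/sh
# Convert a Unix epoch value to a readable date (GNU date syntax).
# $1 = epoch seconds, $2 = optional POSIX TZ string (defaults to UTC).
# epoch_to_local is a hypothetical helper, not part of Nagios Log Server.
epoch_to_local() {
    TZ="${2:-UTC0}" date -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

epoch_to_local 1            # run_time of run_all_alerts, in UTC  -> 1970-01-01 00:00:01
epoch_to_local 1 CST6       # same instant at an assumed UTC-6    -> 1969-12-31 18:00:01
epoch_to_local 1423538059   # run_time of the older run_alerts document, in UTC
```

If the last value lands shortly after the document's last_run_time once the local offset is applied, run_time is probably the next scheduled run rather than the last one.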

Post by **jolson** » Tue Mar 10, 2015 4:00 pm

What happens if you click the 'Edit' button next to one of those commands, set the next runtime to 'Now', and press Save? Does the job in question start? If not, please run the following at the CLI and report your results:

curl -XGET 'http://localhost:9200/nagioslogserver/commands/_search?pretty&q=command:do_maintenance'
getenforce

Post by **34Bearman** » Tue Mar 17, 2015 2:57 pm

Clicking on Now and updating the task still has the same result as before. Jobs will not run. Results of the command are below:

# curl -XGET 'http://localhost:9200/nagioslogserver/c ... aintenance'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 3.730029,
"hits" : [ {
"_index" : "nagioslogserver",
"_type" : "commands",
"_id" : "backup_maintenance",
"_score" : 3.730029,
"_source":{"created":"2015-03-10 15:43:00","created_by":"1","active":1,"status":"waiting","type":"system","node":"global","command":"do_maintenance","run_time":1426106580,"frequency":"86400"}
} ]
}
}
[root@nagiosls ~]# getenforce
Disabled

Post by **jolson** » Tue Mar 17, 2015 3:47 pm

Can you please run the following command on all nodes and force a job from the command subsystem:

tail -f /usr/local/nagioslogserver/var/jobs.log

This will give us the output of your jobs, hopefully pointing us in the right direction.

If forcing a job from the command subsystem doesn't seem to work properly, please run the following in a separate terminal:

curl -XPOST localhost:9200/nagioslogserver/commands/backup_maintenance/_update -d '{ "doc": { "run_time": "10" } }'
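
To confirm that the update was actually stored, the document can be read back right after the POST. The following is a minimal sketch under stated assumptions: ES_URL, run_time_doc, and force_job are hypothetical names introduced here, and the read-back uses the standard Elasticsearch document GET on the same index, type, and ID as the command above.

```shell
#!/bin/sh
# Sketch only: ES_URL and both helper names are assumptions for illustration.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the partial-update body in the same shape as the command above.
run_time_doc() {
    printf '{ "doc": { "run_time": "%s" } }' "$1"
}

# $1 = command document ID, $2 = new run_time value.
# Posts the update, then fetches the document so run_time can be checked by eye.
force_job() {
    curl -s -XPOST "$ES_URL/nagioslogserver/commands/$1/_update" -d "$(run_time_doc "$2")"
    echo
    curl -s -XGET "$ES_URL/nagioslogserver/commands/$1?pretty"
}

# Example (run on a Log Server node): force_job backup_maintenance 10
```

A rising _version in the first response together with the new run_time in the second would show that the write succeeded, which narrows the problem to the job poller rather than the index.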

Post by **34Bearman** » Tue Mar 17, 2015 4:54 pm

Forcing a command to run didn't work. Ran the second command and here is the output:

[root@nagiosls ~]# curl -XPOST localhost:9200/nagioslogserver/commands/backup_maintenance/_update -d '{ "doc": { "run_time": "10" } }'
{"_index":"nagioslogserver","_type":"commands","_id":"backup_maintenance","_version":6}[root@nagiosls ~]#

Also it appears that NLS has stopped accepting events from clients as well. Services are running but no events are being recorded or indexed. I've stopped and restarted the Elasticsearch and Logstash services. It will bring in a few events and then stop. I'm not sure if this is related or not.

Screenshot attached.

Post by **jolson** » Wed Mar 18, 2015 9:33 am

While the command is running, could you please get us a tail of jobs.log?

tail -f /usr/local/nagioslogserver/var/jobs.log

This may provide us with some insight as to why the processes aren't firing. When did your alerts stop processing, and what was happening at that time?

Post by **34Bearman** » Wed Mar 18, 2015 8:37 pm

Sorry about that. So I tried to force the jobs while tailing the jobs.log. Nothing appeared in the tail of the log.

History on this is as follows:

1. Ran out of disk space.
2. With help from support, extended the disk.
3. After that, alerts ran but would stop off and on.
4. Finally Alerts stopped and would not run. At the same time the database would intermittently stop processing log data. Usually if I restarted Logstash and the Elasticsearch DB it would process data for at least a day or two.

Let me know any other info I can get you.

Post by **jolson** » Thu Mar 19, 2015 10:37 am

So that I have this straight - did you force the backup_maintenance command from both the GUI and CLI? Running out of disk space can cause a lot of damage to an Elasticsearch instance.

Can you take a screenshot of your 'Cluster Status' page please? I would also like a screenshot of 'Instance Status'.
