Alerts not running
Posted: Tue Mar 10, 2015 11:06 am
by 34Bearman
All of the alerts show a last-run time from last week. I had this problem previously, but the fix doesn't appear to work this time.
Previous Post:
http://support.nagios.com/forum/viewtop ... 37&t=30652
When I run the command at the bottom of that thread, alerts do not resume. Here is the output of the command that was requested last time.
nagioslogserver]# curl -XGET 'http://localhost:9200/nagioslogserver/c ... run_alerts'
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 3.3513753,
"hits" : [ {
"_index" : "nagioslogserver",
"_type" : "commands",
"_id" : "eN2vMd_BSHWiRRDFx5appQ",
"_score" : 3.3513753,
"_source":{"created":"2015-02-02 15:27:00","active":1,"status":"running","type":"system","node":"global","command":"run_alerts","run_time":1423538059,"frequency":"20","last_run_time":"2015-02-09 21:13:59","last_run_status":"SUCCESS"}
}, {
"_index" : "nagioslogserver",
"_type" : "commands",
"_id" : "run_all_alerts",
"_score" : 3.3513753,
"_source":{"command":"run_alerts","run_time":1,"frequency":20,"node":"global","type":"system","status":"waiting","active":1}
} ]
}
}
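For anyone reading along: run_time in that output is a Unix epoch, while last_run_time is a formatted string, so the two are hard to compare by eye. A minimal sketch for decoding it, assuming GNU date is available (the helper name epoch_to_utc is just made up for illustration):

```shell
#!/usr/bin/env bash
# Convert an epoch run_time value to a readable UTC timestamp.
# Assumes GNU date (the -d @SECONDS form), as found on most Linux systems.
epoch_to_utc() {
  date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# run_time taken from the run_alerts command in the output above
epoch_to_utc 1423538059   # prints 2015-02-10 03:14:19
```

If the server clock is six hours behind UTC (a guess, the thread does not say), that is 20 seconds after last_run_time, which matches the frequency of 20.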
Re: Alerts not running
Posted: Tue Mar 10, 2015 2:24 pm
by tgriep
What version of Log Server are you running?
Try upgrading to the latest version if you have not done so already.
In the new version, you can check your job status by going to Administration > Command Subsystem.
Re: Alerts not running
Posted: Tue Mar 10, 2015 3:48 pm
by 34Bearman
Thanks, I didn't know about that feature. When I went to the page, nothing was running and the last run times were dated 1969. I hit the Reset Jobs button. Jobs are still not running; a screenshot is posted below.
Re: Alerts not running
Posted: Tue Mar 10, 2015 4:00 pm
by jolson
What happens if you click the 'Edit' button next to one of those commands, set the next runtime to 'Now', and press Save? Does the job in question start? If not, please run the following at the CLI and report your results:
curl -XGET 'http://localhost:9200/nagioslogserver/commands/_search?pretty&q=command:do_maintenance'
getenforce
Re: Alerts not running
Posted: Tue Mar 17, 2015 2:57 pm
by 34Bearman
Clicking on Now and updating the task still has the same result as before. Jobs will not run. Results of the command are below:
# curl -XGET 'http://localhost:9200/nagioslogserver/c ... aintenance'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 3.730029,
"hits" : [ {
"_index" : "nagioslogserver",
"_type" : "commands",
"_id" : "backup_maintenance",
"_score" : 3.730029,
"_source":{"created":"2015-03-10 15:43:00","created_by":"1","active":1,"status":"waiting","type":"system","node":"global","command":"do_maintenance","run_time":1426106580,"frequency":"86400"}
} ]
}
}
[root@nagiosls ~]# getenforce
Disabled
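A quick way to read that result: if run_time is the next scheduled run (an assumption based on the 'next runtime' field in the GUI), then a job whose run_time is already in the past while its status is still "waiting" is overdue. A rough sketch of that check, assuming GNU date; is_overdue is a hypothetical helper, not part of Log Server:

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) when run_time is earlier than the reference time.
# The second argument is optional and defaults to the current epoch.
is_overdue() {
  local run_time="$1"
  local now="${2:-$(date +%s)}"
  [ "$run_time" -lt "$now" ]
}

# run_time taken from the do_maintenance output above
if is_overdue 1426106580; then
  echo "do_maintenance is overdue, run_time was $(date -u -d @1426106580 '+%Y-%m-%d %H:%M:%S') UTC"
fi
```

Here the run_time decodes to 2015-03-11 20:43:00 UTC, several days before this post, so under that assumption the job was due and never picked up.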
Re: Alerts not running
Posted: Tue Mar 17, 2015 3:47 pm
by jolson
Can you please run the following command on all nodes and force a job from the command subsystem:
tail -f /usr/local/nagioslogserver/var/jobs.log
This will give us the output of your jobs, hopefully pointing us in the right direction.
If forcing a job from the command subsystem doesn't seem to work properly, please run the following in a separate terminal:
curl -XPOST localhost:9200/nagioslogserver/commands/backup_maintenance/_update -d '{ "doc": { "run_time": "10" } }'
Re: Alerts not running
Posted: Tue Mar 17, 2015 4:54 pm
by 34Bearman
Forcing a command to run didn't work. I ran the second command; here is the output:
[root@nagiosls ~]# curl -XPOST localhost:9200/nagioslogserver/commands/backup_maintenance/_update -d '{ "doc": { "run_time": "10" } }'
{"_index":"nagioslogserver","_type":"commands","_id":"backup_maintenance","_version":6}[root@nagiosls ~]#
It also appears that NLS has stopped accepting events from clients. Services are running, but no events are being recorded or indexed. I've stopped and restarted the Elasticsearch and Logstash services; each time it brings in a few events and then stops. I'm not sure whether this is related.
Screenshot attached.
Re: Alerts not running
Posted: Wed Mar 18, 2015 9:33 am
by jolson
While the command is running, could you please get us a tail of jobs.log?
tail -f /usr/local/nagioslogserver/var/jobs.log
This may provide us with some insight as to why the processes aren't firing. When did your alerts stop processing, and what was happening at that time?
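If the tail stays silent, it can also help to see when jobs.log was last written at all. A small sketch, assuming GNU stat; file_age_seconds is a made-up helper name:

```shell
#!/usr/bin/env bash
# Print how many seconds ago a file was last modified.
# Assumes GNU stat, where -c %Y prints the modification time as an epoch.
file_age_seconds() {
  local mtime
  mtime=$(stat -c %Y "$1") || return 1
  echo $(( $(date +%s) - mtime ))
}

# Path taken from the tail command above
log=/usr/local/nagioslogserver/var/jobs.log
if [ -e "$log" ]; then
  echo "jobs.log last written $(file_age_seconds "$log") seconds ago"
fi
```

A large number here would suggest the job runner has not logged anything for a while, though it does not say why.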
Re: Alerts not running
Posted: Wed Mar 18, 2015 8:37 pm
by 34Bearman
Sorry about that. So I tried to force the jobs while tailing the jobs.log. Nothing appeared in the tail of the log.
History on this is as follows:
1. Ran out of disk space.
2. With help from support extended the disk.
3. After that, alerts ran but would stop off and on.
4. Finally Alerts stopped and would not run. At the same time the database would intermittently stop processing log data. Usually if I restarted Logstash and the Elasticsearch DB it would process data for at least a day or two.
Let me know any other info I can get you.
Re: Alerts not running
Posted: Thu Mar 19, 2015 10:37 am
by jolson
So that I have this straight - did you force the backup_maintenance command from both the GUI and CLI? Running out of disk space can cause a lot of damage to an Elasticsearch instance.
Can you take a screenshot of your 'Cluster Status' page please? I would also like a screenshot of 'Instance Status'.
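Since this started with a full disk, it may also be worth confirming how full the filesystem is now. A minimal sketch using df; the path / is only a placeholder, so substitute whichever mount holds your Elasticsearch data (the thread does not say where that is), and disk_used_pct is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Print the used percentage (without the % sign) of the filesystem holding a path.
# df -P forces POSIX output so each filesystem stays on a single line.
disk_used_pct() {
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

echo "Filesystem holding / is $(disk_used_pct /)% full"
```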