Alerts not running
All of the alerts show a last-run time from last week. I had this problem previously, but the fix doesn't appear to work this time.
Previous Post: http://support.nagios.com/forum/viewtop ... 37&t=30652
When I run the command at the bottom of that thread, alerts do not resume. Here is the output of the command that was requested last time.
nagioslogserver]# curl -XGET 'http://localhost:9200/nagioslogserver/c ... run_alerts'
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 3.3513753,
"hits" : [ {
"_index" : "nagioslogserver",
"_type" : "commands",
"_id" : "eN2vMd_BSHWiRRDFx5appQ",
"_score" : 3.3513753,
"_source":{"created":"2015-02-02 15:27:00","active":1,"status":"running","type":"system","node":"global","command":"run_alerts","run_time":1423538059,"frequency":"20","last_run_time":"2015-02-09 21:13:59","last_run_status":"SUCCESS"}
}, {
"_index" : "nagioslogserver",
"_type" : "commands",
"_id" : "run_all_alerts",
"_score" : 3.3513753,
"_source":{"command":"run_alerts","run_time":1,"frequency":20,"node":"global","type":"system","status":"waiting","active":1}
} ]
}
}
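For reading the output above: the run_time values appear to be Unix epoch seconds, while last_run_time is a formatted timestamp. A minimal sketch for converting them, assuming GNU date (as found on most Linux distributions). The helper name epoch_to_utc is made up for illustration and is not part of Nagios Log Server:

```shell
# Hypothetical helper (not part of Nagios Log Server): print a Unix epoch
# value as a UTC timestamp. Assumes GNU date, which supports -d @SECONDS.
epoch_to_utc() {
  date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# run_time of the first run_alerts entry
epoch_to_utc 1423538059   # prints 2015-02-10 03:14:19
# run_time of the run_all_alerts entry
epoch_to_utc 1            # prints 1970-01-01 00:00:01
```

The first value is 20 seconds (the job frequency) after the reported last_run_time of 2015-02-09 21:13:59, provided the server clock is six hours behind UTC, which is an assumption about the server's time zone. A run_time at or near 0 falls at the start of 1970 in UTC and would display as a date in 1969 in time zones west of UTC.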
Re: Alerts not running
What version of Log Server are you running?
Try upgrading to the latest version if you have not done so already.
In the new version, you can check your job status by going to Administration > Command Subsystem.
Re: Alerts not running
Thanks, I didn't know about that feature. When I went to the page, nothing was running and the last run times were dated 1969. I hit the Reset Jobs button. Jobs are still not running; a screenshot is posted below.
Re: Alerts not running
What happens if you click the 'Edit' button next to one of those commands, set the next run time to 'Now', and press Save? Does the job in question start? If not, please run the following at the CLI and report your results:
curl -XGET 'http://localhost:9200/nagioslogserver/commands/_search?pretty&q=command:do_maintenance'
getenforce
Re: Alerts not running
Clicking on Now and updating the task still has the same result as before. Jobs will not run. Results of the command are below:
# curl -XGET 'http://localhost:9200/nagioslogserver/c ... aintenance'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 3.730029,
"hits" : [ {
"_index" : "nagioslogserver",
"_type" : "commands",
"_id" : "backup_maintenance",
"_score" : 3.730029,
"_source":{"created":"2015-03-10 15:43:00","created_by":"1","active":1,"status":"waiting","type":"system","node":"global","command":"do_maintenance","run_time":1426106580,"frequency":"86400"}
} ]
}
}
[root@nagiosls ~]# getenforce
Disabled
Re: Alerts not running
Can you please run the following command on all nodes and force a job from the command subsystem:
tail -f /usr/local/nagioslogserver/var/jobs.log
This will give us the output of your jobs, hopefully pointing us in the right direction.
If forcing a job from the command subsystem doesn't seem to work properly, please run the following in a separate terminal:
curl -XPOST localhost:9200/nagioslogserver/commands/backup_maintenance/_update -d '{ "doc": { "run_time": "10" } }'
Re: Alerts not running
Forcing a command to run didn't work. Ran the second command and here is the output:
[root@nagiosls ~]# curl -XPOST localhost:9200/nagioslogserver/commands/backup_maintenance/_update -d '{ "doc": { "run_time": "10" } }'
{"_index":"nagioslogserver","_type":"commands","_id":"backup_maintenance","_version":6}[root@nagiosls ~]#
Also it appears that NLS has stopped accepting events from clients as well. Services are running but no events are being recorded or indexed. I've stopped and restarted the Elasticsearch and Logstash services. It will bring in a few events and then stop. I'm not sure if this is related or not.
Screenshot attached.
Re: Alerts not running
While the command is running, could you please get us a tail of jobs.log?
tail -f /usr/local/nagioslogserver/var/jobs.log
This may provide us with some insight as to why the processes aren't firing. When did your alerts stop processing, and what was happening at that time?
Re: Alerts not running
Sorry about that. So I tried to force the jobs while tailing the jobs.log. Nothing appeared in the tail of the log.
History on this is as follows:
1. Ran out of disk space.
2. With help from support, extended the disk.
3. After that, alerts ran but would stop off and on.
4. Finally Alerts stopped and would not run. At the same time the database would intermittently stop processing log data. Usually if I restarted Logstash and the Elasticsearch DB it would process data for at least a day or two.
Let me know any other info I can get you.
Re: Alerts not running
Just so I have this straight: did you force the backup_maintenance command from both the GUI and the CLI? Running out of disk space can cause a lot of damage to an Elasticsearch instance.
Can you take a screenshot of your 'Cluster Status' page please? I would also like a screenshot of 'Instance Status'.
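As a command-line complement to those screenshots, Elasticsearch exposes a cluster health endpoint at _cluster/health. A rough sketch that pulls the status field out of its response follows. The helper name health_status is hypothetical, and the host and port are assumed to match the earlier commands in this thread:

```shell
# Hypothetical helper: read a _cluster/health JSON response on stdin and
# print the value of its "status" field (green, yellow, or red).
health_status() {
  grep -o '"status" *: *"[a-z]*"' | head -n 1 | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Usage against the local node (same host and port as the earlier commands):
# curl -s 'http://localhost:9200/_cluster/health?pretty' | health_status
```

A red status means at least one primary shard is unassigned, which can happen after a disk fills up; yellow means only replica shards are unassigned.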