verify config / apply config sometimes not returning

_asp_
Posts: 91
Joined: Mon May 23, 2016 4:30 am

verify config / apply config sometimes not returning

Post by _asp_ »

Hi,

we have a single instance.
When saving a new configuration (new / changed filters), the functions "verify" and "apply config" do not return.
I waited several minutes.

It works slightly better if the logstash service has been stopped beforehand.

But here my questions:
1. Why is it so slow?
2. Can I run both from the command line, so that I can follow the progress and cancel it if needed?

In plain ELK it is much easier / faster, since the config is just stored in files instead of in Elasticsearch. Of course the cost is the lack of a GUI ;-)

Thanks, Andreas
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: verify config / apply config sometimes not returning

Post by rkennedy »

How long does it actually take for you? I just tested on a machine I have here locally, and it only took 12 seconds.

1. This does not sound normal.
2. Nope, the apply configuration will need to be done through the GUI due to everything that happens. We are not just writing to a file, and restarting - there is more of a process which involves writing to a DB, and then writing out to files.

A few questions for you -
1. What modifications have been made to the system / environment?
2. What OS / version of NLS are you running?
3. What resources do you have allocated to it, and how much data is in open indexes? This could just be a resource issue.
Former Nagios Employee
_asp_
Posts: 91
Joined: Mon May 23, 2016 4:30 am

Re: verify config / apply config sometimes not returning

Post by _asp_ »

The problem occurs in production:
24GB RAM, 9 CPU, RedHat 6.5, 150-200GB open indexes, version 1.4.1

But I can reproduce the problem in the VM you are offering for download:
- changed to 3 CPU and 5 GB RAM (on 4CPU + Hyperthreading, 8GB Host). Switched to HostOnlyNetwork. Version 1.4.4.
- new installation.

Verify is working. It takes about 1-2 minutes to return.

When hitting apply, I can see the logstash process consuming up to 230% CPU. Then its load drops to zero.
The web frontend hangs in status "Running". I have been waiting for 10 minutes now.

The configuration of logstash has been changed in /usr/local/nagioslogserver/logstash/etc/conf.d (checked the timestamp)

So is the problem only GUI related?

In the httpd log I can still see continuous requests:

Code:

192.168.56.1 - - [23/Nov/2016:10:29:31 +0100] "POST /nagioslogserver/api/system/status HTTP/1.1" 200 87 "http://192.168.56.101/nagioslogserver/configure/apply" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0"

mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: verify config / apply config sometimes not returning

Post by mcapra »

So is the problem only GUI related?
It could be a timeout issue with PHP/Ajax calls. I would be interested in seeing what the actual job is doing.

Code:

tail -f /usr/local/nagioslogserver/var/jobs.log
Run that, then do an apply configuration. You should see only one instance of "Running command apply_config with args". If you see several, something is not working right, which would explain the prolonged "Running" status. If it just takes a long time between "Running command apply_config with args" and "SUCCESS", then Logstash is likely just processing a very complex configuration. This is pretty common with configurations that contain several unique plugins or lots of branching logic.
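
As a rough sketch of the check described above, the apply_config starts can also be counted with grep instead of reading the tail output by eye. The function name count_apply_starts is made up for illustration; only the log path and the message text come from this thread. If the file is truncated between runs, the count only covers its current contents.

```shell
# Count how often an apply_config job was started according to a jobs log.
# More than one start for a single apply would point at the duplicate-job
# situation described above.
count_apply_starts() {
    # $1: path to the log file, e.g. /usr/local/nagioslogserver/var/jobs.log
    grep -c 'Running command apply_config with args' "$1"
}

# Example:
#   count_apply_starts /usr/local/nagioslogserver/var/jobs.log
```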
Former Nagios employee
https://www.mcapra.com/
_asp_
Posts: 91
Joined: Mon May 23, 2016 4:30 am

Re: verify config / apply config sometimes not returning

Post by _asp_ »

Here is what you wanted:

Code:

# tail -f /usr/local/nagioslogserver/var/jobs.log
Running command delete_snapshot with args 'a:1:{s:4:"path";s:75:"/usr/local/nagioslogserver/snapshots/applyconfig.snapshot.1480012681.tar.gz";}' for job id: AViXpmzBX8Pxd4A2tYTo
SUCCESS
Running command apply_config with args 'a:2:{s:5:"sh_id";s:20:"AViXpmzRX8Pxd4A2tYTp";s:10:"sh_created";i:1480013016;}' for job id: AViXpmzhX8Pxd4A2tYTq
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
SUCCESS
Processed 2 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Processed 0 node jobs.
Processed 0 global jobs.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated

The files were updated when the second SUCCESS was written.
The GUI stays in status "Running".
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: verify config / apply config sometimes not returning

Post by mcapra »

If the apply job is taking longer than 30 seconds, you might consider editing your /etc/php.ini and increasing the max_execution_time and max_input_time values. Be sure to restart httpd after doing this.
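
Before editing, it can help to see what the two limits are currently set to. A minimal sketch, assuming the usual php.ini layout with one `name = value` setting per line; the function name show_php_timeouts is hypothetical:

```shell
# Print the active (uncommented) timeout settings from a php.ini file.
# Lines starting with ';' are comments in php.ini and are skipped.
show_php_timeouts() {
    # $1: path to php.ini, e.g. /etc/php.ini
    grep -E '^[[:space:]]*(max_execution_time|max_input_time)[[:space:]]*=' "$1"
}

# Example:
#   show_php_timeouts /etc/php.ini
```

After raising the values in an editor, restart httpd (on RedHat 6 that is normally `service httpd restart`) and run the function again to confirm the change.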

Another place to check would be your browser's JavaScript console to see if there are any errors in there while applying configuration.

And last but not least, we can query the API directly for the status of the job. This will tell us if the job is actually stuck in a "running" state or if the JavaScript simply isn't updating correctly. The process is a bit of a pain though.

You'll need to tail -f /usr/local/nagioslogserver/var/jobs.log while running an apply config, then pull out the apply_config job id (the value after "for job id:") and feed that into a curl call. From your previous output, you can get the id from:

Running command apply_config with args 'a:2:{s:5:"sh_id";s:20:"AViXpmzRX8Pxd4A2tYTp";s:10:"sh_created";i:1480013016;}' for job id: AViXpmzhX8Pxd4A2tYTq

And plug it into a curl call to the back-end API endpoint that checks the status of jobs:

Code:

curl -XPOST 'http://<nagios_log_server_host>/nagioslogserver/api/system/get_cmd_info?token=<api_key>&cmd_id=AViXpmzhX8Pxd4A2tYTq'
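
To avoid copying the id by hand, the two steps can be sketched as small shell helpers. This is only an illustration: the function names latest_apply_job_id and cmd_info_url are invented here, the log line format is assumed to match the output posted earlier in this thread, and the host and API key are placeholders. The shape of the API response is not shown in this thread, so the sketch stops at making the request.

```shell
# Print the job id of the most recent apply_config line in a jobs log.
latest_apply_job_id() {
    # $1: path to the log file, e.g. /usr/local/nagioslogserver/var/jobs.log
    sed -n 's/.*Running command apply_config with args .* for job id: \([A-Za-z0-9_-]*\).*/\1/p' "$1" | tail -n 1
}

# Build the status URL for the get_cmd_info endpoint shown above.
cmd_info_url() {
    # $1: Log Server host, $2: API key, $3: job id
    printf 'http://%s/nagioslogserver/api/system/get_cmd_info?token=%s&cmd_id=%s\n' "$1" "$2" "$3"
}

# Example (host and API key are placeholders you must fill in):
#   id=$(latest_apply_job_id /usr/local/nagioslogserver/var/jobs.log)
#   curl -XPOST "$(cmd_info_url logserver.example.com YOUR_API_KEY "$id")"
```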
Former Nagios employee
https://www.mcapra.com/