
Slow api?

Posted: Thu Mar 09, 2017 8:12 am
by WillemDH
Hello,

I made this script, which can delete a single service, all services, and/or a host in Nagios XI through the API. After looping through all the services, I try to delete the host, but I get an error because, it seems, not all the services have been deleted yet. When I check the CCM, I can see the deleted services slowly disappear one by one.

Why is this so slow? I have no performance issues on my server (as long as I don't apply configuration). Are the API queries queued somewhere? Where can I check whether the queue is empty?

For reference, this is the script:

Code:

#!/bin/bash
Debug=0
Verbose=1
source /tmp/daf_linux_functions.sh
NagiosServer="@option.nagiosserver@"
NagiosApiKey="@option.nagiosapikey@"
NagiosHost="@option.nagioshost@"
NagiosService="@option.nagiosservice@"
WriteLog Verbose Info "Verifying host \"$NagiosHost\" on $NagiosServer"
NagHostCheck="$(curl -s -XGET "https://$NagiosServer/nagiosxi/api/v1/objects/host?apikey=$NagiosApiKey&pretty=1&host_name=$NagiosHost&records=1" | jq -r .hostlist.recordcount)"
if [[ "$NagHostCheck" = "1" ]] ; then WriteLog Output Info "Host $NagiosHost verified. "
else WriteLog Output Info "Host $NagiosHost not found. Exiting. " ; exit 2 ; fi
if [[ "$NagiosService" != 'All' ]] ; then
    WriteLog Output Info "Attempting to remove service $NagiosService of host $NagiosHost. "
    CurlResult=$(curl -s -XDELETE "https://$NagiosServer/nagiosxi/api/v1/config/service?apikey=$NagiosApiKey&pretty=1&host_name=$NagiosHost&service_description=$NagiosService" | jq .)
    case $CurlResult in
        *error*)    WriteLog Output Error "Error: $(echo "$CurlResult" | jq -r .error)" ; exit 2 ;;
        *success*)      WriteLog Output Info "Success: $(echo "$CurlResult" | jq -r .success)" ; exit 0;;
        *)      WriteLog Output Error "Unknown curl result: $CurlResult" ; exit 3;;
    esac
else
    WriteLog Output Info "Attempting to remove host ${NagiosHost}. "
    CurlGetResult=$(curl -s -XGET "https://$NagiosServer/nagiosxi/api/v1/objects/service?apikey=$NagiosApiKey&pretty=1&host_name=$NagiosHost" | jq .)
    WriteLog Debug Info "CurlGetResult: $CurlGetResult"
    ServiceCount=$(echo "$CurlGetResult" | jq -r .servicelist.recordcount)
    if [[ $ServiceCount = "1" ]] ; then
        ServiceName=$(echo "$CurlGetResult" | jq -r .servicelist.service.service_description)
        WriteLog Output Info "Attempting to remove the only service $ServiceName of host $NagiosHost. "
        CurlResult=$(curl -s -XDELETE "https://$NagiosServer/nagiosxi/api/v1/config/service?apikey=$NagiosApiKey&pretty=1&host_name=$NagiosHost&service_description=$ServiceName" | jq .)
        case $CurlResult in
            *error*)    WriteLog Output Error "Error: $(echo "$CurlResult" | jq -r .error)" ;;
            *success*)      WriteLog Output Info "Success: $(echo "$CurlResult" | jq -r .success)" ;;
            *)      WriteLog Output Error "Unknown curl result: $CurlResult" ; exit 3;;
        esac        
    else
        WriteLog Output Info "Looping through $ServiceCount services"
        for ((i=0; i<$ServiceCount; i++)); do
            ServiceName=$(echo "$CurlGetResult" | jq -r ".servicelist.service[$i].service_description")
            WriteLog Output Info "Attempting to remove service $ServiceName of host $NagiosHost. "
            CurlResult=$(curl -s -XDELETE "https://$NagiosServer/nagiosxi/api/v1/config/service?apikey=$NagiosApiKey&pretty=1&host_name=$NagiosHost&service_description=$ServiceName" | jq .)
            case $CurlResult in
                *error*)    WriteLog Output Error "Error: $(echo "$CurlResult" | jq -r .error)" ;;
                *success*)      WriteLog Output Info "Success: $(echo "$CurlResult" | jq -r .success)" ;;
                *)      WriteLog Output Error "Unknown curl result: $CurlResult" ; exit 3;;
            esac
        done
    fi
    WriteLog Output Info "All $ServiceCount services of $NagiosHost deleted successfully. "
    WriteLog Output Info "Attempting to remove host $NagiosHost. "
    CurlResult=$(curl -s -XDELETE "https://$NagiosServer/nagiosxi/api/v1/config/host?apikey=$NagiosApiKey&pretty=1&host_name=$NagiosHost" | jq .)
    case $CurlResult in
        *error*)    WriteLog Output Error "Error: $(echo "$CurlResult" | jq -r .error)" ; exit 2 ;;
        *success*)      WriteLog Output Info "Success: $(echo "$CurlResult" | jq -r .success)" ; exit 0;;
        *)      WriteLog Output Error "Unknown curl result: $CurlResult" ; exit 3;;
    esac    
fi    
exit 3
Yes I know I could put a sleep in there, but
- I'd rather just start the host deletion ASAP
- I would like to know how to monitor this api queue
- I would like to know why this is so slow (and this for only one host)

Grtz

Willem
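
A side note on the script above: it interpolates $NagiosService and $ServiceName straight into the query string, so a service description containing spaces or characters such as & would likely produce a malformed URL. A minimal sketch of a percent-encoding helper is shown below. The name urlencode is hypothetical (it is not part of the original script or of Nagios XI), and the helper is only meant for ASCII input.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): percent-encode a
# string so it can be used as a URL query value. Intended for ASCII input.
urlencode() {
    local LC_ALL=C              # match characters using the C locale
    local s=$1 out='' c hex i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $c in
            [a-zA-Z0-9.~_-]) out+=$c ;;                     # unreserved: keep as-is
            *) printf -v hex '%%%02X' "'$c"; out+=$hex ;;   # anything else: %XX
        esac
    done
    printf '%s\n' "$out"
}
```

In the script, the DELETE URLs could then end in something like service_description=$(urlencode "$ServiceName") instead of the raw variable.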

Re: Slow api?

Posted: Thu Mar 09, 2017 12:14 pm
by mcapra
The API is undergoing a bit of a rework at the moment to address problems like this. Realistically, the request should wait until the task is done before a response is given, and yes, it should be a bit faster.
"Yes I know I could put a sleep in there"
That's what most people have been doing to work around the current problem.
"I would like to know how to monitor this api queue"
We don't have any convenient hooks for accessing the queue, mostly because it's not really a proper queue. In the case of deleting services, those get passed through the external commands file to remove them from Core, after which (if applyconfig=1) a reconfigure command is submitted through the same external commands file. That gets them out of Core in a timely fashion, but doesn't get them out of the MySQL database. We need to rely on ndo2db for that.

The problem occurs when you then go to delete the parent host, which resolves its dependencies in the MySQL database (not via any hook in Nagios Core). If ndo2db hasn't yet picked up the changes to the Core environment (and sent them off to MySQL), the previously deleted services will still exist in the MySQL record (even though they may already be out of Core). I think that's where you're getting the consistency issues you're seeing with the "all the services haven't been deleted" message.
"I would like to know why this is so slow (and this for only one host)"
Does this host have a lot of dependencies to resolve prior to deletion? That would be my initial thought.
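
As an alternative to a fixed sleep, the race described above could be handled by polling until the service count for the host reaches zero, with a timeout. The following is only a sketch: wait_until_zero and service_count are hypothetical names, service_count reuses the objects/service query and the variables from the script earlier in the thread, and it is an assumption that this endpoint reflects the same MySQL data that the host deletion checks. If the services never leave the database (for example, because configuration is never applied), the wait simply times out.

```shell
#!/bin/bash
# Hypothetical helper: run a "count" command repeatedly until it prints 0.
#   $1 = timeout in seconds, $2 = poll interval in seconds,
#   remaining arguments = command (or function) that prints the current count.
# Returns 0 once the count is 0, or 1 if the timeout expires first.
wait_until_zero() {
    local timeout=$1 interval=$2
    shift 2
    local deadline=$(( SECONDS + timeout )) count
    while :; do
        count=$("$@")
        [[ "$count" == "0" ]] && return 0
        (( SECONDS >= deadline )) && return 1
        sleep "$interval"
    done
}

# Count the services still recorded for the host, using the same query as the
# original script. NagiosServer, NagiosApiKey and NagiosHost come from there.
service_count() {
    curl -s -XGET "https://$NagiosServer/nagiosxi/api/v1/objects/service?apikey=$NagiosApiKey&host_name=$NagiosHost" \
        | jq -r .servicelist.recordcount
}
```

In the script this might be called as "wait_until_zero 120 5 service_count" just before the host DELETE request, exiting with an error if it returns non-zero.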

Re: Slow api?

Posted: Thu Mar 09, 2017 3:24 pm
by WillemDH
The host has no dependencies whatsoever.

OK, looking forward to API v2 then. I'll make do with the sleep for now. I'm not going to file a feature request, as it seems you already know about these issues.

Re: Slow api?

Posted: Thu Mar 09, 2017 3:43 pm
by cdienger
Is it okay for us to lock the thread or did you have any more questions regarding this?

Re: Slow api?

Posted: Thu Mar 09, 2017 3:48 pm
by WillemDH
Sorry, I just want to say this. Even with a sleep of 2 seconds after each service deletion and a sleep of 20 seconds before the host deletion, I still cannot delete the host. This implies it takes AT LEAST 50 seconds to remove a host and its services. This is just ridiculous.

There must already be a better and faster method to automate this (deleting a host and all its services)...?

EDIT 1: Just changed the sleep to 100 seconds and it's still giving me an error. Is it even possible to delete all services and the host in one script without applying configuration in between?

EDIT 2: OK, I just read carefully through
"The problem occurs when you then go to delete the parent host, which resolves its dependencies in the MySQL database (not via any hook in Nagios Core). If ndo2db hasn't yet picked up the changes to the Core environment (and sent them off to MySQL), the previously deleted services will still exist in the MySQL record (even though they may already be out of Core). I think that's where you're getting the consistency issues you're seeing with the "all the services haven't been deleted" message."
C'mon, seeing that the apply configuration process is already super slow and causes all kinds of false positives, applying configuration between deleting the services and deleting the host is just not an option.

EDIT 3: So I gave nagiosql_delete_service.php a try but seem to hit another issue there:

Code:

cd /usr/local/nagiosxi/scripts
[root@nagiosserver scripts]# ./nagiosql_delete_service.php –-config=testserver
URL: https://localhost/nagiosxi/includes/components/ccm/
Usage: ./nagiosql_delete_service.php [--id=<service id>] [--config=<config_name>]
My first thought as to why this isn't working is that the URL https://localhost/nagiosxi/includes/components/ccm/ will cause issues, as the PKI-issued certificate was not created for localhost but for 'nagiosserver'..? So why doesn't it use the internal URL?

Re: Slow api?

Posted: Fri Mar 10, 2017 2:04 pm
by tmcdonald
I've spoken with the developers and unfortunately this just seems to be the way the API functions. It adds a job to a queue instead of executing immediately, and on large systems it seems it can take a while to process that queue. This will be changed in a future version, but for the time being this is how it operates.

As for the PHP script, the localhost part is hard-coded and can be modified to work with your PKI setup. I'll file a feature request to have that reference your hostname.
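
One way the hard-coded reference might be changed in the meantime is sketched below. This assumes the URL appears in the script as the literal string https://localhost/, which has not been verified against the actual file. patch_localhost_url is a hypothetical helper name, and the original file is kept as a .bak copy.

```shell
#!/bin/bash
# Hypothetical helper: replace a literal https://localhost/ prefix in a file
# with https://<hostname>/, keeping the original as <file>.bak.
# Assumes the URL is present as a literal string in the file (not verified).
patch_localhost_url() {
    local file=$1 host=$2
    # Print the lines that will change; stop if there is nothing to patch.
    grep -n 'https://localhost/' "$file" || return 1
    sed -i.bak "s#https://localhost/#https://${host}/#g" "$file"
}
```

For the setup in this thread, the call would look like "patch_localhost_url /usr/local/nagiosxi/scripts/nagiosql_delete_service.php nagiosserver". A later upgrade may overwrite the change.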