Slow api?

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Slow api?

Post by WillemDH »

Hello,

I made this script, which can delete a single service, all services, and/or a host in Nagios XI through the API. After looping through all the services, I try to delete the host, but I get an error because, it seems, not all the services have been deleted yet. When I check the CCM, I can see the deleted services slowly disappear one by one.

Why is this so slow? I have no performance issues on my server (as long as I don't apply configuration). Are the API queries queued somewhere? Where can I check whether the queue is empty?

For reference, this is the script:

Code: Select all

#!/bin/bash
Debug=0
Verbose=1
source /tmp/daf_linux_functions.sh
NagiosServer="@option.nagiosserver@"
NagiosApiKey="@option.nagiosapikey@"
NagiosHost="@option.nagioshost@"
NagiosService="@option.nagiosservice@"
WriteLog Verbose Info "Verifying host \"$NagiosHost\" on $NagiosServer"
NagHostCheck="$(curl -s -XGET "https://$NagiosServer/nagiosxi/api/v1/objects/host?apikey=$NagiosApiKey&pretty=1&host_name=$NagiosHost&records=1" | jq -r .hostlist.recordcount)"
if [[ "$NagHostCheck" = "1" ]] ; then WriteLog Output Info "Host $NagiosHost verified. "
else WriteLog Output Info "Host $NagiosHost not found. Exiting. " ; exit 2 ; fi
if [[ "$NagiosService" != 'All' ]] ; then
    WriteLog Output Info "Attempting to remove service $NagiosService of host $NagiosHost. "
    CurlResult=$(curl -s -XDELETE "https://$NagiosServer/nagiosxi/api/v1/config/service?apikey=$NagiosApiKey&pretty=1&host_name=$NagiosHost&service_description=$NagiosService" | jq .)
    case $CurlResult in
        *error*)    WriteLog Output Error "Error: $(echo "$CurlResult" | jq -r .error)" ; exit 2 ;;
        *success*)      WriteLog Output Info "Success: $(echo "$CurlResult" | jq -r .success)" ; exit 0;;
        *)      WriteLog Output Error "Unknown curl result: $CurlResult" ; exit 3;;
    esac
else
    WriteLog Output Info "Attempting to remove host ${NagiosHost}. "
    CurlGetResult=$(curl -s -XGET "https://$NagiosServer/nagiosxi/api/v1/objects/service?apikey=$NagiosApiKey&pretty=1&host_name=$NagiosHost" | jq .)
    WriteLog Debug Info "CurlGetResult: $CurlGetResult"
    ServiceCount=$(echo "$CurlGetResult" | jq -r .servicelist.recordcount)
    if [[ $ServiceCount = "1" ]] ; then
        ServiceName=$(echo "$CurlGetResult" | jq -r .servicelist.service.service_description)
        WriteLog Output Info "Attempting to remove the only service $ServiceName of host $NagiosHost. "
        CurlResult=$(curl -s -XDELETE "https://$NagiosServer/nagiosxi/api/v1/config/service?apikey=$NagiosApiKey&pretty=1&host_name=$NagiosHost&service_description=$ServiceName" | jq .)
        case $CurlResult in
            *error*)    WriteLog Output Error "Error: $(echo "$CurlResult" | jq -r .error)" ;;
            *success*)      WriteLog Output Info "Success: $(echo "$CurlResult" | jq -r .success)" ;;
            *)      WriteLog Output Error "Unknown curl result: $CurlResult" ; exit 3;;
        esac        
    else
        WriteLog Output Info "Looping through $ServiceCount services"
        for ((i=0; i<ServiceCount; i++)); do
            ServiceName=$(echo "$CurlGetResult" | jq -r ".servicelist.service[$i].service_description")
            WriteLog Output Info "Attempting to remove service $ServiceName of host $NagiosHost. "
            CurlResult=$(curl -s -XDELETE "https://$NagiosServer/nagiosxi/api/v1/config/service?apikey=$NagiosApiKey&pretty=1&host_name=$NagiosHost&service_description=$ServiceName" | jq .)
            case $CurlResult in
                *error*)    WriteLog Output Error "Error: $(echo "$CurlResult" | jq -r .error)" ;;
                *success*)      WriteLog Output Info "Success: $(echo "$CurlResult" | jq -r .success)" ;;
                *)      WriteLog Output Error "Unknown curl result: $CurlResult" ; exit 3;;
            esac
        done
    fi
    # Errors inside the loop are only logged, so do not claim success here.
    WriteLog Output Info "Finished processing $ServiceCount services of $NagiosHost. "
    WriteLog Output Info "Attempting to remove host $NagiosHost. "
    CurlResult=$(curl -s -XDELETE "https://$NagiosServer/nagiosxi/api/v1/config/host?apikey=$NagiosApiKey&pretty=1&host_name=$NagiosHost" | jq .)
    case $CurlResult in
        *error*)    WriteLog Output Error "Error: $(echo "$CurlResult" | jq -r .error)" ; exit 2 ;;
        *success*)      WriteLog Output Info "Success: $(echo "$CurlResult" | jq -r .success)" ; exit 0;;
        *)      WriteLog Output Error "Unknown curl result: $CurlResult" ; exit 3;;
    esac    
fi    
exit 3

Yes I know I could put a sleep in there, but
- I'd rather just start the host deletion ASAP
- I would like to know how to monitor this api queue
- I would like to know why this is so slow (and this for only one host)
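
A side note on the script above: it branches on recordcount because a single service comes back as an object while several come back as an array. A minimal sketch of one way to fold those branches into a single loop is shown here. The function name list_service_names is made up for illustration, and the shape of a response with no services (no "service" key at all) is an assumption, not something confirmed in this thread.

```shell
#!/bin/bash
# Hypothetical helper: reads an objects/service JSON response on stdin and
# prints one service_description per line.
# Covers the shapes the script distinguishes by recordcount:
#   - "service" is a single object (one service)
#   - "service" is an array of objects (several services)
#   - "service" is missing or null (assumed shape for zero services)
list_service_names() {
    jq -r '(.servicelist.service // empty)
           | if type == "array" then .[] else . end
           | .service_description'
}

# Example use (commented out so that sourcing this file has no side effects):
# while IFS= read -r ServiceName; do
#     echo "would delete: $ServiceName"
# done < <(echo "$CurlGetResult" | list_service_names)
```

Reading the names line by line also keeps descriptions that contain spaces intact, although they would still need URL-encoding before being placed in the DELETE query string.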

Grtz

Willem
Nagios XI 5.8.1
https://outsideit.net
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Slow api?

Post by mcapra »

The API is undergoing a bit of a rework at the moment to address problems like this. Realistically, the request should wait until the task is done before a response is given and yes it should be a bit faster.
"Yes I know I could put a sleep in there"
That's what most people have been doing to work around the current problem.
"I would like to know how to monitor this api queue"
We don't have any convenient hooks for accessing the queue, mostly because it's not really a proper queue. In the case of deleting services, those get passed through the external commands file to remove them from Core, after which (if applyconfig=1) a reconfigure command is submitted through the same external commands file. That gets them out of Core in a timely fashion, but doesn't get them out of the MySQL database. We need to rely on ndo2db for that.

The problem occurs when you then go to delete the parent host, which resolves its dependencies in the MySQL database (not via any hook in Nagios Core). If ndo2db hasn't yet picked up the changes to the Core environment (and sent them off to MySQL), the previously deleted services will still exist in the MySQL record (even though they may already be out of Core). I think that's where you're getting the consensus issues you're seeing with the "all the services haven't been deleted" message.
"I would like to know why this is so slow (and this for only one host)"
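
As a rough alternative to a fixed sleep, the wait described above can be sketched as a polling loop with a timeout. This is only an illustration: wait_until_zero and remaining_services are hypothetical names, and it assumes that the objects/service query from the original script reflects the same MySQL records the host deletion checks, which this thread does not confirm. If those records only clear after an Apply Configuration, the loop will simply time out.

```shell
#!/bin/bash
# Wait until a counting command prints 0, or give up after a timeout.
#   $1  command (or function name) that prints the remaining service count
#   $2  timeout in seconds (default 120)
#   $3  polling interval in seconds (default 5)
# Returns 0 once the count is 0, 1 if the timeout is exceeded.
wait_until_zero() {
    local count_cmd="$1" timeout="${2:-120}" interval="${3:-5}"
    local waited=0 count
    while (( waited <= timeout )); do
        count="$("$count_cmd")"
        if [[ "$count" == "0" ]]; then
            return 0
        fi
        sleep "$interval"
        waited=$(( waited + interval ))
    done
    return 1
}

# Hypothetical counter built from the objects/service query used in the
# script earlier in this thread. NagiosServer, NagiosApiKey and NagiosHost
# are expected to be set by the caller.
remaining_services() {
    curl -s -XGET "https://$NagiosServer/nagiosxi/api/v1/objects/service?apikey=$NagiosApiKey&host_name=$NagiosHost" \
        | jq -r .servicelist.recordcount
}

# Example use (commented out so that sourcing this file has no side effects):
# if wait_until_zero remaining_services 120 5; then
#     echo "No services listed for $NagiosHost, host deletion can be attempted"
# else
#     echo "Services still listed after 120 seconds" >&2
# fi
```

Passing the counter as a function name keeps the waiting logic separate from the API call, so the same loop can be pointed at a different check if the objects API turns out not to be the right signal.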
Does this host have a lot of dependencies to resolve prior to deletion? That would be my initial thought.
Former Nagios employee
https://www.mcapra.com/
WillemDH

Re: Slow api?

Post by WillemDH »

The host has no dependencies whatsoever.

Ok, looking forward to API v2 then. I'll make do with the sleep for now. I'm not going to file a feature request, as it seems you already know about these issues.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Slow api?

Post by cdienger »

Is it okay for us to lock the thread or did you have any more questions regarding this?
WillemDH

Re: Slow api?

Post by WillemDH »

Sorry, I just want to say this. Even with a sleep of 2 seconds after each service deletion and a sleep of 20 seconds before the host deletion, I still cannot delete the host, which implies it takes AT LEAST 50 seconds to remove a host and its services. This is just ridiculous.

There must already be a better and faster method to automate this (deleting a host and all its services)?

EDIT 1: Just changed the sleep to 100 seconds and it's still giving me an error. Is it even possible to delete all services and the host in one script without applying configuration in between?

EDIT 2: Ok, I just read carefully through
"The problem occurs when you then go to delete the parent host, which resolves its dependencies in the MySQL database (not via any hook in Nagios Core). If ndo2db hasn't yet picked up the changes to the Core environment (and sent them off to MySQL), the previously deleted services will still exist in the MySQL record (even though they may already be out of Core). I think that's where you're getting the consensus issues you're seeing with the "all the services haven't been deleted" message."
C'mon, seeing that the apply configuration process is already super slow and causes all kinds of false positives, applying configuration between deleting the services and deleting the host is just not possible.

EDIT 3: So I gave nagiosql_delete_service.php a try, but I seem to hit another issue there:

Code: Select all

cd /usr/local/nagiosxi/scripts
[root@nagiosserver scripts]# ./nagiosql_delete_service.php –-config=testserver
URL: https://localhost/nagiosxi/includes/components/ccm/
Usage: ./nagiosql_delete_service.php [--id=<service id>] [--config=<config_name>]

My first thought as to why this isn't working is that the URL https://localhost/nagiosxi/includes/components/ccm/ will cause issues, as the PKI-issued certificate was not created for localhost but for 'nagiosserver'. So why doesn't it use the internal URL?
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Slow api?

Post by tmcdonald »

I've spoken with the developers and unfortunately this just seems to be the way the API functions. It adds a job to a queue instead of executing it immediately, and on large systems it can take a while to process that queue. This will be changed in a future version, but for the time being this is how it operates.

As for the PHP script, the localhost part is hard-coded and can be modified to work with your PKI setup. I'll file a feature request to have that reference your hostname.
Former Nagios employee