Removing services takes a long time

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
developerx9
Posts: 20
Joined: Sun Jun 05, 2016 9:07 am

Re: Removing services takes a long time

Post by developerx9 »

Additional info regarding MySQL:

- delete_service.php is FAST before reconfigure_nagios is executed, so I wouldn't say that MySQL is the bottleneck here (or at least I haven't found any evidence of that so far).
- I have been running MySQL from memory, just for troubleshooting purposes. delete_service.php was still slow.

I hope we find a solution for this issue; this is a huge blocker for us.
developerx9
Posts: 20
Joined: Sun Jun 05, 2016 9:07 am

Re: Removing services takes a long time

Post by developerx9 »

So we figured out what the blocker is: the command
/usr/bin/wget --load-cookies=nagiosql.cookies http://localhost/nagiosxi/includes/components/ccm/ --no-check-certificate --post-data 'type=service&cmd=delete&id=18384' -O nagiosql.delete.service

is doing a bunch of unnecessary work and is actually "simulating" deleting services through the GUI, for whatever reason.

So when we checked nagiosql.delete.service we found an HTML page filled with a bunch of this: http://pastebin.com/hz2KNhiB
and a huge number of "option value" entries listing all of our servers. When we open this file in a browser, it's basically the Nagios GUI without CSS.

So what is taking so long is that every time we do "delete.service" via cmd, it loads the list of all of our hosts. Why??

The reason it is slow only AFTER "reconfigure_nagios" is executed is that, before that point, the request is NOT logged in:

<div id="screen-overlay"></div>
<div id="loginMsgDiv" style="display: none;">
<span class='deselect'>
<div >
Login Required! </div>
</span>
</div>


So actually, when we do nagiosql.delete_service.php BEFORE reconfigure_nagios.sh, we're basically doing nothing.
So the question is - how do we get rid of the above? (To avoid loading all of our hosts).
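
Both observations can be checked on a saved response with a couple of small helpers. This is only a sketch: the function names `count_option_entries` and `looks_logged_out` are made up for illustration, and it assumes the saved file (nagiosql.delete.service in the command above) contains the literal strings `<option value` and `Login Required!` as in the output shown here. Since the loginMsgDiv in the snippet is styled `display: none`, a match on the login marker is a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a response saved by wget -O.

# Print how many "<option value" entries the saved page contains.
count_option_entries() {
    grep -o '<option value' "$1" | wc -l | tr -d ' '
}

# Succeed (exit status 0) if the page contains the "Login Required!" marker.
looks_logged_out() {
    grep -q 'Login Required!' "$1"
}
```

For example, `count_option_entries nagiosql.delete.service` prints the number of option entries, and `looks_logged_out nagiosql.delete.service && echo "login marker present"` reports the marker.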

I'm really surprised this works like this. Surely there has to be a better way..
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: Removing services takes a long time

Post by avandemore »

developerx9 wrote:
So what is taking so long is that every time we do "delete.service" via cmd, it loads the list of all of our hosts. Why??
It isn't simulating, it is doing. It's using the same channel (HTTP) that a delete from the GUI uses, which may be why it's confusing for you.

The "option value" list file (nagiosql.delete.service) you reference is simply the standard HTML you would get from visiting the URL. Generating that page queries your DB, hence the wait.
Previous Nagios employee
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Removing services takes a long time

Post by tmcdonald »

Sorry that this was not mentioned more directly earlier in the thread, but you should really be using the built-in API to do config management. Historically, those scripts have been used primarily by support staff in order to troubleshoot, and I don't believe they were ever intended for public use.

The API documentation is available in the XI web interface under the Help menu, in the top-left navigation panel under the "Backend API Docs" section. Specifically, look into the "Config Reference" section.

This should be a lot faster and less error-prone than custom bash scripts.
Former Nagios employee
developerx9
Posts: 20
Joined: Sun Jun 05, 2016 9:07 am

Re: Removing services takes a long time

Post by developerx9 »

Hi,

sorry for the late reply.
I've rewritten the script to use the API, only to find out that it is still as slow as before.

Then I saw that when I use the API to delete a service, it actually calls the "nagiosql.delete.service" script with the "id" of the service, which then generates the same HTML as before.
So basically, the API is a wrapper around the script that I was using the entire time. So obviously that did not fix my issue.
avandemore wrote:
So what is taking so long is that every time we do "delete.service" via cmd, it loads the list of all of our hosts. Why??
It isn't simulating, it is doing. It's using the same channel(HTTP) a delete from the GUI utilizes so that may be why it's confusing for you.
The "option value" list file(nagiosql.delete.service) you reference is simply the standard HTML you would get from visiting the URL. Generating that page polls your DB hence the wait.

How can I avoid this? We have 7k services and checks. Imagine a massive change affecting at least 50% of those servers (e.g. a new Java dependency that we now have to monitor): it would take forever. (I think I mentioned the math earlier.)

I'm sorry guys, but that's ridiculous.
I don't want an HTML output that takes 5 seconds to generate!
That's the reason I'm not using the GUI in the first place, for God's sake.

To get to the point:

- Can we avoid polling the DB every time we delete the service?
Is there a way I can avoid this by adjusting the php scripts? If so, what exactly?

This is a huge deal breaker for us, and I'm hoping you understand my concern. Switching to a new monitoring system isn't what we want, but we'll be forced to do so if we don't find a solution for this.
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: Removing services takes a long time

Post by avandemore »

We do offer this document on optimizing your DB:

Database Optimization

Retaining too much data can negatively impact performance so reducing the length/amount of data kept may help.
Previous Nagios employee
developerx9
Posts: 20
Joined: Sun Jun 05, 2016 9:07 am

Re: Removing services takes a long time

Post by developerx9 »

The database is optimized. We went through this several posts ago.
The amount of data is not the issue. The issue is that the DB is polled every time we delete a service, fetching all the servers for the sake of a GUI which we do not even use. And the fact that your "API" (which is just a wrapper around the inefficient nagiosql.delete.service script) generates a huge amount of HTML/CSS for every 'delete' action is just insane.

I honestly cannot believe that you are not willing to address such a critical issue.

We'll be moving to another monitoring system.

Thank you.
developerx9
Posts: 20
Joined: Sun Jun 05, 2016 9:07 am

Re: Removing services takes a long time

Post by developerx9 »

Since you guys are unwilling to correct the issue, here is a custom fix for anyone who faces the same problem.
Create a custom case in the page_router.inc.php script, located in "/usr/local/nagiosxi/html/includes/components/ccm/includes":

Code:

    case 'deleteCustom':
        // Custom delete handler: deletes the object without rendering the full CCM page.
        if ($type == 'user') {
            $html = route_admin_view($type);
        } else {
            include_once(INCDIR.'delete_object.inc.php');
            $returnContent = delete_object($type, $id);
            // Flag that the config needs to be applied when delete_object returns code 0.
            if (ENVIRONMENT == "nagiosxi" && $returnContent[0] == 0) {
                set_option("ccm_apply_config_needed", 1);
            }
            // Return a short message instead of the full HTML page.
            $html = 'Object deleted';
        }
        return $html;

Also, edit your scripts to use 'deleteCustom' instead of 'delete', so the command will look something like this:

/usr/bin/wget --load-cookies=nagiosql.cookies http://localhost/nagiosxi/includes/components/ccm/ --no-check-certificate --post-data 'type=service&cmd=deleteCustom&id=15524' -O nagiosql.delete.service
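
For bulk removals, the same request can be wrapped in a loop. A minimal sketch, assuming the deleteCustom case above is in place and that nagiosql.cookies holds a valid session; the function names `build_post_data` and `delete_services` and the `CCM_URL` variable are hypothetical, and the service IDs in the usage line are placeholders.

```shell
#!/bin/sh
# Hypothetical wrapper around the wget call above; not part of Nagios XI.
CCM_URL='http://localhost/nagiosxi/includes/components/ccm/'

# Build the POST body for one CCM command, e.g.
# "type=service&cmd=deleteCustom&id=15524".
build_post_data() {
    printf 'type=%s&cmd=%s&id=%s' "$1" "$2" "$3"
}

# Send one deleteCustom request per service ID given as an argument.
# Only wget's exit status is checked; the response body is not inspected.
delete_services() {
    for id in "$@"; do
        /usr/bin/wget --load-cookies=nagiosql.cookies "$CCM_URL" \
            --no-check-certificate \
            --post-data "$(build_post_data service deleteCustom "$id")" \
            -O nagiosql.delete.service \
            || echo "delete request failed for service id $id" >&2
    done
}
```

Usage would be something like `delete_services 15524 15525 15526`.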

Now everything is flying: it takes 0.1 seconds per service instead of 3-4 seconds.
I hope this will get fixed in the upcoming versions, and I'm really surprised you're not taking the issue seriously.
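
To put those timings in perspective for the scenario from earlier in the thread (7k services with half of them changing, so roughly 3500 deletions, which is a rounded assumption), here is a back-of-the-envelope sketch. The function name `estimate_seconds` is just for illustration.

```shell
#!/bin/sh
# Rough total run time, in whole seconds, for a batch of deletions.
# $1 = number of services, $2 = time per deletion in tenths of a second
# (tenths keep the arithmetic in integers, which is all POSIX sh offers).
estimate_seconds() {
    echo $(( $1 * $2 / 10 ))
}

estimate_seconds 3500 30   # 3 s each   -> prints 10500
estimate_seconds 3500 40   # 4 s each   -> prints 14000
estimate_seconds 3500 1    # 0.1 s each -> prints 350
```

That is roughly 2.9 to 3.9 hours at 3-4 seconds per service, versus about 6 minutes at 0.1 seconds per service.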

We have over 2000 servers to monitor, an enterprise license, and the fact that we have to fix these kinds of issues ourselves is just crazy.
bheden
Product Development Manager
Posts: 179
Joined: Thu Feb 13, 2014 9:50 am
Location: Nagios Enterprises

Re: Removing services takes a long time

Post by bheden »

This was a bug that has been fixed in 5.3.3, which will be released shortly. Our fix was similar to the one you wrote. The same bug exists for deleting contacts, timeperiods and hosts.

As an FYI: the way that we're adding and removing configurations in an upcoming version of XI is being re-written to be a LOT more optimized. We are aware of the limitations that exist currently and we are addressing them.

Nagios Enterprises
Senior Developer
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Removing services takes a long time

Post by tmcdonald »

developerx9 wrote:We have over 2000 servers to monitor, an enterprise license, and the fact that we have to fix these kind of issues ourselves is just crazy.
(Emphasis mine)

If you are a current customer, please post in, or allow me to move this thread to, the Customer Support section of the forum. The level of support we are able to provide to non-customers regarding code modifications is limited; however, had we known earlier that you were a customer, we could have taken a much different approach involving direct input from the developers.

We are certainly not unwilling to address the issue - as was mentioned by @bheden there are quite a few optimizations in place for the API in future versions and 5.3.3 will be released shortly. It does take time to develop and test such changes, and since we just heard back from you on Sunday we have not had much time to address the specific points you made in your more recent posts. Please work with us to implement any improvements you feel can be made - we're certainly willing to do so on our end.
Former Nagios employee