Deadpool not deleting Stage 2 Services.

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
yo_marc
Posts: 83
Joined: Thu Aug 11, 2016 1:56 pm

Deadpool not deleting Stage 2 Services.

Post by yo_marc »

I have an XI system here running 5.5.2. It is not properly deleting Stage 2 services from the Deadpool. It removes them from the Deadpool service group fine, but not from monitoring itself. They live on, which then causes problems when the Deadpool tries to remove the host.

I dug into it and I am seeing a bunch of lines with Windows line endings in the nagiosql.delete.service file that is produced about two commands deep into the Deadpool service removal process. Are those Control-M's expected?

I did a dos2unix on the nagiosql.delete.service file before doing a manual ./reconfigure_nagios.sh - but that did not help. The service in question lives on, so there seems to be something else going on...
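
For anyone wanting to confirm the carriage returns without opening the file in an editor, counting the lines that end in one is a quick check. This is only a sketch: the helper name count_crlf_lines and the sample file are made up for illustration. On the XI box you would point it at the nagiosql.delete.service file instead.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a file that end with a
# carriage return (displayed as ^M in vim).
count_crlf_lines() {
    cr=$(printf '\r')                # literal CR, built portably for POSIX sh
    grep -c "${cr}\$" "$1" || true   # grep -c exits non-zero when the count is 0
}

# Demo on a throwaway sample file (not real Nagios XI output).
sample=$(mktemp)
printf 'unix line\nwindows line\r\nanother\r\n' > "$sample"
count_crlf_lines "$sample"           # prints 2
rm -f "$sample"
```

A non-zero count only confirms that the line endings are there. Since the dos2unix pass did not change the outcome, they may well be a side issue rather than the cause.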

Any tips on where to look next?
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Deadpool not deleting Stage 2 Services.

Post by ssax »

You may have multiple Nagios processes or kernel message queues. What is the output of these commands:

Code: Select all

ps aux | grep nagios.cfg
ipcs -q
I looked through the code, and I'm not sure why there would be Windows line endings in it.

What are you seeing in the /usr/local/nagiosxi/var/deadpool.log file?
yo_marc
Posts: 83
Joined: Thu Aug 11, 2016 1:56 pm

Re: Deadpool not deleting Stage 2 Services.

Post by yo_marc »

Ah - I think it's a multi-process issue... I'll kill this extra process and retry.

Code: Select all

[root@xi-server]# ps aux | grep nagios.cfg
nagios   29468  0.5  0.0  34104  4404 ?        Ss   09:59   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   29506  0.0  0.0  33580  3028 ?        S    09:59   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     31612  0.0  0.0 112708   976 pts/0    S+   10:02   0:00 grep --color=auto nagios.cfg


[root@xi-server]# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x56000080 20447232   nagios     600        0            0
The /usr/local/nagiosxi/var/deadpool.log did not give any errors. I will post it if killing that second process doesn't do the job.
Will respond soon.
yo_marc
Posts: 83
Joined: Thu Aug 11, 2016 1:56 pm

Re: Deadpool not deleting Stage 2 Services.

Post by yo_marc »

Actually, a follow-up question: is it correct to have two entries for the 'nagios.cfg' lines shown above?

One is the parent of the other... so this looks expected.

Code: Select all

[root@xi-server]# ps -afe | grep nagios.cfg
nagios    3101     1  1 08:52 ?        00:01:14 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    3169  3101  0 08:52 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     27388 26635  0 10:10 pts/0    00:00:00 grep --color=auto nagios.cfg
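
To make the parent/child check less of an eyeball exercise, something like the following could count how many of the matching processes are "roots", meaning their parent is not another process in the same list. It is a rough sketch: the function name count_root_processes is made up, and the demo input is just the PID/PPID pairs from the ps -afe output above.

```shell
#!/bin/sh
# Hypothetical helper: read "PID PPID" pairs on stdin and print how many of
# those PIDs have a parent that is NOT itself in the list.
count_root_processes() {
    awk '{ seen[$1] = 1; parent[$1] = $2 }
         END {
             n = 0
             for (p in seen) if (!(parent[p] in seen)) n++
             print n
         }'
}

# Demo with the PID/PPID pairs from the ps -afe output above.
printf '3101 1\n3169 3101\n' | count_root_processes    # prints 1

# On a live system with procps ps (an assumption), the input could come from:
#   ps -o pid=,ppid= -C nagios | count_root_processes
```

A result of 1 is consistent with a single daemon plus its child. More than 1 would suggest a second, independently started daemon.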
yo_marc
Posts: 83
Joined: Thu Aug 11, 2016 1:56 pm

Re: Deadpool not deleting Stage 2 Services.

Post by yo_marc »

Sanitized output of deadpool.log -- Initial deletion:

Code: Select all

Processing service 'my-client.domain.com' / 'All Disks' in stage 2
NagiosQL Service ID = 1866
SQL: DELETE FROM tbl_lnkServiceToServicegroup WHERE idSlave='2' AND idMaster='1866'
SQL: SELECt * FROM tbl_lnkServiceToServicegroup WHERE idMaster='1866'
SQL: UPDATE tbl_service SET servicegroups='0' WHERE id='1866'
SQL: UPDATE tbl_service SET last_modified='2018-08-28 11:25:02' WHERE id='1866'
Deleting service...
COMMAND: cd /usr/local/nagiosxi/scripts && ./nagiosql_delete_service.php --id=1866
--2018-08-28 11:25:02--  http://localhost/nagiosxi/includes/components/ccm/
Resolving localhost (localhost)... 127.0.0.1
Connecting to localhost (localhost)|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'nagiosql.delete.service'

     0K .......... .......                                     2.19M=0.008s

2018-08-28 11:25:02 (2.19 MB/s) - 'nagiosql.delete.service' saved [17994]


Reconfiguring Nagios Core...
PROCESSED HOSTS:
Array
(
)
PROCESSED SERVICES:
Array
(
    [0] => Array
        (
            [hostname] => my-client.domain.com
            [servicename] => All Disks
            [stage] => 2
        )

)
EMAIL:
Array
(
    [to] => [email protected]
    [subject] => Nagios Deadpool Report
    [message] =>
Deleted Services
===
The following services were deleted from the monitoring configuration because they remained in a problem state longer than the stage 2 deadpool threshold.

my-client.domain.com / All Disks


Access Nagios XI at:
http://xi-server.domain.com/nagiosxi/


)

Sanitized snippet of subsequent deadpool runs:

Code: Select all

Processing service 'my-client.domain.com' / 'All Disks' in stage 2
NagiosQL Service ID = 1866
   Not in deadpool -> skipping service
yo_marc
Posts: 83
Joined: Thu Aug 11, 2016 1:56 pm

Re: Deadpool not deleting Stage 2 Services.

Post by yo_marc »

For what it's worth, here are the Control-M's I mentioned.
(They are still being created in the generated /usr/local/nagiosxi/scripts/nagiosql.delete.service file.)
As seen in vim on a Linux system, from line 200 to the end of the file:

Code: Select all

<!--- CHILD HEADER START -->

<div id="child_popup_layer">
    <div id="child_popup_content">
        <div id="child_popup_close">
            <a id="close_child_popup_link" style="display: inline-block;" title="Close"><i class="fa fa-times" style="font-size: 16px;"></i></a>
        </div>
        <div id="child_popup_container">
        </div>
    </div>
</div>

<!--- CHILD HEADER END -->        <div id="throbber" class="sk-spinner sk-spinner-center sk-spinner-three-bounce">
            <div class="sk-bounce1"></div>
            <div class="sk-bounce2"></div>
            <div class="sk-bounce3"></div>
        </div>
    </div>

    <script type="text/javascript">^M
var CCM_SESSION_ID = 0;^M
var CCM_LOCK = { };^M
^M
$(document).ready(function() {^M
^M
    if (CCM_SESSION_ID) {^M
^M
        $(window).bind('beforeunload', function(e) {^M
            $.ajax({^M
                url: 'ajax.php',^M
                method: 'POST',^M
                async: false,^M
                data: { cmd: 'removesession', ccm_session_id: CCM_SESSION_ID }^M
            });^M
        });^M
^M
        // Update the session if user is just sitting on a page (or editing it)^M
        var update_id = setInterval(update_session_and_lock, 10000);^M
^M
        check_page_usage();^M
    }^M
^M
    $(window).resize(function() {^M
        $('#lock-notice').center().css('top', '250px');^M
    });^M
^M
    $('#remove-lock').click(function() {^M
        $.post('ajax.php', { cmd: 'takelock', lock_id: CCM_LOCK.id, ccm_session_id: CCM_SESSION_ID }, function(d) {^M
            if (d.success) {^M
                CCM_LOCK = { }^M
                $('#lock-notice').hide();^M
                clear_whiteout();^M
            }^M
        }, 'json');^M
    });^M
});^M
^M
function update_session_and_lock()^M
{^M
    // Update session and return lock values^M
    var vars = { cmd: 'updatesession', ccm_session_id: CCM_SESSION_ID, obj_id: 1866 };^M
    if (CCM_LOCK.id) {^M
        vars.lock_id = CCM_LOCK.id;^M
    }^M
^M
    // Update session and get new lock if there is one^M
    $.post('ajax.php', vars, function(d) {^M
        if (d.has_new_lock) {^M
            CCM_LOCK = d.lock;^M
            $('.lock-text').html(d.locktext);^M
            check_page_usage();^M
        }^M
    }, 'json');^M
}^M
^M
function check_page_usage()^M
{^M
    if (CCM_LOCK.id) {^M
        whiteout();^M
        $('#lock-notice').center().css('top', '250px').show();^M
    }^M
}^M
^M
^M
</script>^M
    <div id="screen-overlay"></div>^M
    <div id="whiteout"></div>^M
    <div id="lock-notice" class="hide info-popup" style="text-align: center; padding: 25px;">^M
        <h4><i class="fa fa-exclamation-triangle" style="vertical-align: middle;"></i> The page is currently being edited by another user.</h4>^M
        <div class="lock-text">^M
                    </div>^M
        <div class="btns">^M
            <button type="button" id="remove-lock" class="btn btn-sm btn-danger">Remove Lock</button>^M
            <a href="" class="btn btn-sm btn-default">Cancel</a>^M
        </div>^M
    </div>^M
    <div id="loginMsgDiv" >^M
        <span class='deselect'>^M
            <div >^M
                Login Required!            </div>^M
        </span>^M
    </div>^M
^M

    <div id='loginDiv'>
        <h3>Nagios CCM Login</h3>
        <form id='loginForm' action='index.php' method='post'>
            <label for='username'>Username: </label><br />
            <input type='text' name='username' id='username' size='20'  autocomplete='off'/><br /><br />
            <label for='password'>Password</label><br />
            <input type='password' name='password' id='password' size='20'  autocomplete='off'/><br /><br />
            <input type='hidden' name='loginSubmitted' value='true' />
            <input type='hidden' name='menu' value='invisible' />
            <input class='ccmbutton' type='submit' name='submit' id='submit' value='Login' />
        </form>
    </div><!-- CHILD FOOTER START -->


<!-- CHILD FOOTER END -->

    </div><!--page-->

    <noframes>
        <!-- This page requires a web browser which supports frames. -->
        <h2>Nagios XI</h2>
        <p align="center">
            <a href="https://www.nagios.com/">www.nagios.com</a><br>
            Copyright (c) 2009-2018 Nagios Enterprises, LLC<br>
        </p>
        <p>
            <i>Note: These pages require a browser which supports frames</i>
        </p>
    </noframes>


    </body>
</html>

lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Deadpool not deleting Stage 2 Services.

Post by lmiltchev »

You are correct about the "^M" entries in the /usr/local/nagiosxi/scripts/nagiosql.delete.service file. I can verify that I see them on my test XI boxes. I will notify our developers about the issue.

For the time being, you could try removing them in the vi editor by typing:

Code: Select all

:%s/^V^M//g
Then hit "Enter", save, and exit.
Note: the ^V and ^M are typed by pressing Ctrl+v and Ctrl+m. The command listed above replaces every "^M" in the file with an empty string.
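
If you would rather not do this interactively, the same global removal can be sketched in plain shell with tr. The helper name strip_cr is made up for illustration, and the demo runs on a throwaway file rather than the real nagiosql.delete.service.

```shell
#!/bin/sh
# Hypothetical helper: delete every carriage return in a file, in place.
strip_cr() {
    tmp=$(mktemp) || return 1
    # Write the cleaned copy first, then overwrite the original's contents
    # so its owner and permissions are left alone.
    if tr -d '\r' < "$1" > "$tmp"; then
        cat "$tmp" > "$1"
    fi
    rm -f "$tmp"
}

# Demo on a throwaway file.
demo=$(mktemp)
printf 'line one\r\nline two\r\n' > "$demo"
strip_cr "$demo"
od -c "$demo"     # dump the bytes; no \r should remain
rm -f "$demo"
```

Like the vi substitution, this removes every carriage return in the file, not only the ones at the end of a line.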

I would recommend upgrading your Nagios XI instance to the latest version (5.5.3), as this should fix the issue that you are having with deleting the services from the Deadpool. From the changelog:
"Fixed issue with deadpool cron job not being able to delete host/services due to script changes -JO"
See the entire changelog here:
https://www.nagios.com/downloads/nagios-xi/change-log/
Be sure to check out our Knowledgebase for helpful articles and solutions!
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Deadpool not deleting Stage 2 Services.

Post by lmiltchev »

One more thing we noticed:
<div id="loginMsgDiv" >^M
<span class='deselect'>^M
<div >^M
Login Required! </div>^M
</span>^M
</div>^M
It seems like the CCM user that nagiosql uses (nagiosxi) is not set properly... Most likely, you will need to go to the permissions page in the admin section and update the backend password.
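
A quick way to tell whether a saved response is the CCM login page rather than the result of the delete request is to search it for the login markers quoted in this thread. This is a sketch only: the helper name looks_like_login_page is made up, and the demo uses a throwaway file with a fragment like the one above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the file contains the CCM login
# markers quoted in this thread.
looks_like_login_page() {
    grep -q -e 'Login Required!' -e "id='loginForm'" "$1"
}

# Demo on a throwaway file holding a fragment like the one quoted above.
page=$(mktemp)
printf "<div >\r\n    Login Required!            </div>\r\n" > "$page"
if looks_like_login_page "$page"; then
    echo "login page returned: the request was not authenticated"
else
    echo "no login markers found"
fi
rm -f "$page"
```

Running the same check against nagiosql.delete.service after updating the backend password should show whether the request is now getting past the login.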
yo_marc
Posts: 83
Joined: Thu Aug 11, 2016 1:56 pm

Re: Deadpool not deleting Stage 2 Services.

Post by yo_marc »

Good info - Thanks! Looking into that now.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Deadpool not deleting Stage 2 Services.

Post by lmiltchev »

Sure, let us know how it went.