Deadpool not deleting Stage 2 Services.
I have an XI system here running 5.5.2. It is not properly deleting Stage 2 services from the deadpool. It removes them from the Deadpool servicegroup OK, but not from monitoring itself. They live on, and this then causes problems when the Deadpool tries to remove the host.
I dug into it and I am seeing a bunch of lines with Windows line endings in the nagiosql.delete.service file that is produced about two commands deep into the Deadpool service removal process. Are those Control-Ms expected?
I did a dos2unix on the nagiosql.delete.service file before doing a manual ./reconfigure_nagios.sh - but that did not help. The service in question lives on, so there seems to be something else going on...
Any tips on where to look next?
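As a quick way to confirm the ^M characters described above without opening vim, the following sketch counts the lines that end in a carriage return. The function name count_crlf_lines is made up for illustration; the path in the usage comment is the one mentioned in this post.

```shell
# Hypothetical helper: print how many lines of a file end in a carriage
# return (the character vim displays as ^M).
count_crlf_lines() {
    # printf '\r' yields a literal CR; command substitution only strips
    # trailing newlines, so the CR survives in $cr.
    cr=$(printf '\r')
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c "${cr}\$" "$1"
}

# Usage (path taken from the post):
#   count_crlf_lines /usr/local/nagiosxi/scripts/nagiosql.delete.service
```

A non-zero count only confirms the line endings are present; as noted above, converting them did not by itself make the deletion work.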
Re: Deadpool not deleting Stage 2 Services.
I looked through the code, and I'm not sure why there would be Windows line endings in it.
You may have multiple nagios processes / kernel message queues. What is the output of these commands:
Code: Select all
ps aux | grep nagios.cfg
ipcs -q
What are you seeing in the /usr/local/nagiosxi/var/deadpool.log file?
Re: Deadpool not deleting Stage 2 Services.
Ah - I think it's a multi-process issue... I'll kill this extra process and retry.
The /usr/local/nagiosxi/var/deadpool.log did not give any errors. I will post it if killing that second process doesn't do the job.
Code: Select all
[root@xi-server]# ps aux | grep nagios.cfg
nagios 29468 0.5 0.0 34104 4404 ? Ss 09:59 0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 29506 0.0 0.0 33580 3028 ? S 09:59 0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 31612 0.0 0.0 112708 976 pts/0 S+ 10:02 0:00 grep --color=auto nagios.cfg
[root@xi-server]# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
0x56000080 20447232 nagios 600 0 0
Will respond soon.
Re: Deadpool not deleting Stage 2 Services.
Actually, a follow-up question: is it correct to have two entries for the 'nagios.cfg' lines shown above?
One is the parent of the other... so this looks expected.
Code: Select all
[root@xi-server]# ps -afe | grep nagios.cfg
nagios 3101 1 1 08:52 ? 00:01:14 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 3169 3101 0 08:52 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 27388 26635 0 10:10 pts/0 00:00:00 grep --color=auto nagios.cfg
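The parent/child check done by eye above can be sketched as a small filter. It assumes `ps -ef` column order (UID, PID, PPID, ...), not the `ps aux` layout used earlier in the thread, and the function name top_level_nagios is hypothetical.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print the PID of
# every nagios daemon whose parent is NOT another nagios daemon.
top_level_nagios() {
    awk '
        # Remember PID -> PPID for each "nagios -d" daemon line.
        /\/nagios -d / { ppid[$2] = $3 }
        END {
            # A daemon is top-level if its parent is not in the set.
            for (p in ppid)
                if (!(ppid[p] in ppid))
                    print p
        }
    '
}

# Usage:
#   ps -ef | top_level_nagios
```

With the listing above this prints only 3101, since 3169 is its child. More than one PID in the output would suggest a genuinely separate second daemon.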
Re: Deadpool not deleting Stage 2 Services.
Sanitized output of deadpool.log -- Initial deletion:
Code: Select all
Processing service 'my-client.domain.com' / 'All Disks' in stage 2
NagiosQL Service ID = 1866
SQL: DELETE FROM tbl_lnkServiceToServicegroup WHERE idSlave='2' AND idMaster='1866'
SQL: SELECt * FROM tbl_lnkServiceToServicegroup WHERE idMaster='1866'
SQL: UPDATE tbl_service SET servicegroups='0' WHERE id='1866'
SQL: UPDATE tbl_service SET last_modified='2018-08-28 11:25:02' WHERE id='1866'
Deleting service...
COMMAND: cd /usr/local/nagiosxi/scripts && ./nagiosql_delete_service.php --id=1866
--2018-08-28 11:25:02-- http://localhost/nagiosxi/includes/components/ccm/
Resolving localhost (localhost)... 127.0.0.1
Connecting to localhost (localhost)|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'nagiosql.delete.service'
0K .......... ....... 2.19M=0.008s
2018-08-28 11:25:02 (2.19 MB/s) - 'nagiosql.delete.service' saved [17994]
Reconfiguring Nagios Core...
PROCESSED HOSTS:
Array
(
)
PROCESSED SERVICES:
Array
(
[0] => Array
(
[hostname] => my-client.domain.com
[servicename] => All Disks
[stage] => 2
)
)
EMAIL:
Array
(
[to] => [email protected]
[subject] => Nagios Deadpool Report
[message] =>
Deleted Services
===
The following services were deleted from the monitoring configuration because they remained in a problem state longer than the stage 2 deadpool threshold.
my-client.domain.com / All Disks
Access Nagios XI at:
http://xi-server.domain.com/nagiosxi/
)
Sanitized snippet of subsequent deadpool runs:
Code: Select all
Processing service 'my-client.domain.com' / 'All Disks' in stage 2
NagiosQL Service ID = 1866
Not in deadpool -> skipping service
Re: Deadpool not deleting Stage 2 Services.
For what it's worth, here are the Control-M's I spoke of.
(Still being created in generated /usr/local/nagiosxi/scripts/nagiosql.delete.service file)
As seen in vim on linux system, starting on line 200 to end of file:
Code: Select all
<!--- CHILD HEADER START -->
<div id="child_popup_layer">
<div id="child_popup_content">
<div id="child_popup_close">
<a id="close_child_popup_link" style="display: inline-block;" title="Close"><i class="fa fa-times" style="font-size: 16px;"></i></a>
</div>
<div id="child_popup_container">
</div>
</div>
</div>
<!--- CHILD HEADER END --> <div id="throbber" class="sk-spinner sk-spinner-center sk-spinner-three-bounce">
<div class="sk-bounce1"></div>
<div class="sk-bounce2"></div>
<div class="sk-bounce3"></div>
</div>
</div>
<script type="text/javascript">^M
var CCM_SESSION_ID = 0;^M
var CCM_LOCK = { };^M
^M
$(document).ready(function() {^M
^M
if (CCM_SESSION_ID) {^M
^M
$(window).bind('beforeunload', function(e) {^M
$.ajax({^M
url: 'ajax.php',^M
method: 'POST',^M
async: false,^M
data: { cmd: 'removesession', ccm_session_id: CCM_SESSION_ID }^M
});^M
});^M
^M
// Update the session if user is just sitting on a page (or editing it)^M
var update_id = setInterval(update_session_and_lock, 10000);^M
^M
check_page_usage();^M
}^M
^M
$(window).resize(function() {^M
$('#lock-notice').center().css('top', '250px');^M
});^M
^M
$('#remove-lock').click(function() {^M
$.post('ajax.php', { cmd: 'takelock', lock_id: CCM_LOCK.id, ccm_session_id: CCM_SESSION_ID }, function(d) {^M
if (d.success) {^M
CCM_LOCK = { }^M
$('#lock-notice').hide();^M
clear_whiteout();^M
}^M
}, 'json');^M
});^M
});^M
^M
function update_session_and_lock()^M
{^M
// Update session and return lock values^M
var vars = { cmd: 'updatesession', ccm_session_id: CCM_SESSION_ID, obj_id: 1866 };^M
if (CCM_LOCK.id) {^M
vars.lock_id = CCM_LOCK.id;^M
}^M
^M
// Update session and get new lock if there is one^M
$.post('ajax.php', vars, function(d) {^M
if (d.has_new_lock) {^M
CCM_LOCK = d.lock;^M
$('.lock-text').html(d.locktext);^M
check_page_usage();^M
}^M
}, 'json');^M
}^M
^M
function check_page_usage()^M
{^M
if (CCM_LOCK.id) {^M
whiteout();^M
$('#lock-notice').center().css('top', '250px').show();^M
}^M
}^M
^M
^M
</script>^M
<div id="screen-overlay"></div>^M
<div id="whiteout"></div>^M
<div id="lock-notice" class="hide info-popup" style="text-align: center; padding: 25px;">^M
<h4><i class="fa fa-exclamation-triangle" style="vertical-align: middle;"></i> The page is currently being edited by another user.</h4>^M
<div class="lock-text">^M
</div>^M
<div class="btns">^M
<button type="button" id="remove-lock" class="btn btn-sm btn-danger">Remove Lock</button>^M
<a href="" class="btn btn-sm btn-default">Cancel</a>^M
</div>^M
</div>^M
<div id="loginMsgDiv" >^M
<span class='deselect'>^M
<div >^M
Login Required! </div>^M
</span>^M
</div>^M
^M
<div id='loginDiv'>
<h3>Nagios CCM Login</h3>
<form id='loginForm' action='index.php' method='post'>
<label for='username'>Username: </label><br />
<input type='text' name='username' id='username' size='20' autocomplete='off'/><br /><br />
<label for='password'>Password</label><br />
<input type='password' name='password' id='password' size='20' autocomplete='off'/><br /><br />
<input type='hidden' name='loginSubmitted' value='true' />
<input type='hidden' name='menu' value='invisible' />
<input class='ccmbutton' type='submit' name='submit' id='submit' value='Login' />
</form>
</div><!-- CHILD FOOTER START -->
<!-- CHILD FOOTER END -->
</div><!--page-->
<noframes>
<!-- This page requires a web browser which supports frames. -->
<h2>Nagios XI</h2>
<p align="center">
<a href="https://www.nagios.com/">www.nagios.com</a><br>
Copyright (c) 2009-2018 Nagios Enterprises, LLC<br>
</p>
<p>
<i>Note: These pages require a browser which supports frames</i>
</p>
</noframes>
</body>
</html>
Re: Deadpool not deleting Stage 2 Services.
You are correct about the "^M" entries in the /usr/local/nagiosxi/scripts/nagiosql.delete.service file. I can verify that I see them on my test XI boxes. I will notify our developers about the issue.
For the time being, you could try removing them in the vi editor by typing:
Code: Select all
:%s/^V^M//g
and hitting "Enter". Save and exit.
Note: the ^V and ^M are typed by hitting Ctrl+v and Ctrl+m. The command listed above will replace the "^M" in the file with an empty string globally.
I would recommend upgrading your Nagios XI instance to the latest version (5.5.3), as this should fix the issue that you are having with deleting the services from deadpool. From the changelog:
Fixed issue with deadpool cron job not being able to delete host/services due to script changes -JO
See the entire changelog here:
https://www.nagios.com/downloads/nagios-xi/change-log/
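The vi substitution described in this post can also be done non-interactively. The following is a minimal sketch using `tr`; the function name strip_cr is made up for illustration. Like the vi command, it removes every carriage return in the file, not only those at line ends.

```shell
# Hypothetical helper: remove all carriage returns from a file in place.
strip_cr() {
    tmp=$(mktemp) || return 1
    # Write the cleaned copy to a temp file first, then copy it back with
    # cat so the original file keeps its owner and permissions.
    if tr -d '\r' < "$1" > "$tmp"; then
        cat "$tmp" > "$1"
    fi
    rm -f "$tmp"
}

# Usage (path taken from the thread):
#   strip_cr /usr/local/nagiosxi/scripts/nagiosql.delete.service
```

Note that the original poster had already tried dos2unix on this file without the service being removed, so this cleans the file but may not address the deletion problem itself.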
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Deadpool not deleting Stage 2 Services.
One more thing we noticed:
Code: Select all
<div id="loginMsgDiv" >^M
<span class='deselect'>^M
<div >^M
Login Required! </div>^M
</span>^M
</div>^M
It seems like the CCM user that nagiosql uses (nagiosxi) is not set properly... Most likely, you will need to go to the permissions page in the admin section and update the backend password.
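The "Login Required!" markup quoted in this post suggests the saved file is the CCM login page, not the result of a deletion. A rough way to test for that is sketched below; the function name is_ccm_login_page is hypothetical, and the two strings it searches for are taken from the file dump earlier in the thread.

```shell
# Hypothetical helper: succeed (exit 0) if the given file looks like the
# CCM login page, based on two markers seen in the posted file contents.
is_ccm_login_page() {
    grep -q -e 'Login Required!' -e "id='loginForm'" "$1"
}

# Usage (path taken from the thread):
#   if is_ccm_login_page /usr/local/nagiosxi/scripts/nagiosql.delete.service; then
#       echo "CCM returned the login page; check the backend credentials"
#   fi
```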
Re: Deadpool not deleting Stage 2 Services.
Good info - Thanks! Looking into that now.
Re: Deadpool not deleting Stage 2 Services.
Sure, let us know how it went.