Services disabled are reenabled

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
itunixops
Posts: 50
Joined: Tue Jul 28, 2020 12:27 pm

Post by itunixops »

Version: Nagios XI 5.6.5
OS: RHEL 7.4

Issue:

Recently, every time we disable a service, it comes back enabled again. We force those services down, but they always appear to come back up, which is a concern.
We're also getting some false data from some of our systems with the SNMP plugin. In one case, a disk check on one system reported the / filesystem as 85% full when it was actually 15% full.

Could we have someone look at this if we submit a dump of the system?
vtrac
Posts: 903
Joined: Tue Oct 27, 2020 1:35 pm

Re: Services disabled are reenabled

Post by vtrac »

Hi itunixops,
I tested disabling and enabling a couple of services on my Nagios XI v5.7.4.
I also created a new VM with Nagios XI v5.6.5, similar to your environment, but was not able to reproduce the issue.

Could you please upload picture(s) of some of those services?

Also, please remember to click "Apply Configuration" after you have clicked the "Save" button, so that your changes take effect under CCM (please see the pictures below).

To investigate the issue further, could you please send me the profile.zip and the exact names of the host and services that had the enable/disable issues?

To send us your system profile:
1) Log in to the Nagios XI GUI using a web browser.
2) Click the "Admin" > "System Profile" menu.
3) Click the "Download Profile" button.
4) Save the profile.zip file and share it in a private message or upload it to the post/ticket, then reply to this post to bring it up in the queue.
disable-service-in-CCM.png
apply-config-for-disable-service.png
Best Regards,
Vinh

Re: Services disabled are reenabled

Post by itunixops »

We have attached the profile per request. The service in question is a custom one called Check Gluster Volume.

Over the last couple of days we have noticed that it has not flapped, but we are sending this information anyway.

We're also looking to update Nagios XI to the latest version and to update the OS itself.


Moderator's Note: The profile has been shared with the support team but has been removed from the public forum.

Re: Services disabled are reenabled

Post by vtrac »

Hi itunixops,
Looking in the log "nagios.txt", I noticed a lot of the following:

Code:

[1607953674] Warning: The results of service 'Check Gluster Volume' on host 'IA-ITUNIXOPS - iaalmcbv01.mediacomcorp.com' are stale by 0d 0h 0m 40s (threshold=0d 0h 0m 20s).  I'm forcing an immediate check of the service.
[1607953674] Warning: The results of service 'Check Gluster Volume' on host 'IA-ITUNIXOPS - iaalmcbv02.mediacomcorp.com' are stale by 0d 0h 0m 40s (threshold=0d 0h 0m 20s).  I'm forcing an immediate check of the service.
[1607953733] Warning: The results of service 'Check Gluster Volume' on host 'IA-ITUNIXOPS - iaalmcbv01.mediacomcorp.com' are stale by 0d 0h 0m 39s (threshold=0d 0h 0m 20s).  I'm forcing an immediate check of the service.
[1607953734] Warning: The results of service 'Check Gluster Volume' on host 'IA-ITUNIXOPS - iaalmcbv02.mediacomcorp.com' are stale by 0d 0h 0m 39s (threshold=0d 0h 0m 20s).  I'm forcing an immediate check of the service.
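If it helps to see how often each service goes stale, here is a rough Python sketch that counts these warnings per host and service. It only assumes the message format quoted above; the function name summarize_stale and the SAMPLE_LOG variable are made up for illustration and are not part of Nagios XI.

```python
import re
from collections import Counter

# Matches the staleness warnings quoted above (format taken from this log only).
STALE_RE = re.compile(
    r"^\[(?P<epoch>\d+)\] Warning: The results of service '(?P<service>[^']+)' "
    r"on host '(?P<host>[^']+)' are stale by "
    r"(?P<d>\d+)d (?P<h>\d+)h (?P<m>\d+)m (?P<s>\d+)s"
)

# Three of the lines from the log excerpt, used as sample input.
SAMPLE_LOG = """\
[1607953674] Warning: The results of service 'Check Gluster Volume' on host 'IA-ITUNIXOPS - iaalmcbv01.mediacomcorp.com' are stale by 0d 0h 0m 40s (threshold=0d 0h 0m 20s).  I'm forcing an immediate check of the service.
[1607953674] Warning: The results of service 'Check Gluster Volume' on host 'IA-ITUNIXOPS - iaalmcbv02.mediacomcorp.com' are stale by 0d 0h 0m 40s (threshold=0d 0h 0m 20s).  I'm forcing an immediate check of the service.
[1607953733] Warning: The results of service 'Check Gluster Volume' on host 'IA-ITUNIXOPS - iaalmcbv01.mediacomcorp.com' are stale by 0d 0h 0m 39s (threshold=0d 0h 0m 20s).  I'm forcing an immediate check of the service.
"""


def summarize_stale(lines):
    """Return {(host, service): (warning_count, worst_staleness_in_seconds)}."""
    counts = Counter()
    worst = {}
    for line in lines:
        match = STALE_RE.match(line)
        if match is None:
            continue  # not a staleness warning
        key = (match["host"], match["service"])
        stale = (
            (int(match["d"]) * 24 + int(match["h"])) * 60 + int(match["m"])
        ) * 60 + int(match["s"])
        counts[key] += 1
        worst[key] = max(worst.get(key, 0), stale)
    return {key: (counts[key], worst[key]) for key in counts}


if __name__ == "__main__":
    for (host, service), (count, seconds) in summarize_stale(SAMPLE_LOG.splitlines()).items():
        print(f"{host} / {service}: {count} warnings, worst {seconds}s stale")
```

To run it against a real log, pass the lines of the file instead of SAMPLE_LOG.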
I also noticed the following:
There is no command called "check_gluster_vol", which is the name your service definition calls (below):

Code:

define service {
	host_name	IA-ITUNIXOPS - iaalmcbv01.mediacomcorp.com
	service_description	Check Gluster Volume
	display_name	Check Gluster Volume
	check_period	24x7
	check_command	check_nrpe!check_gluster_vol!!!!!!!
	contact_groups	IT UNIX Ops
	notification_period	24x7
	initial_state	o
	importance	0
but there is one called "check_gluster_vol_status", as defined (below) inside the commands.cfg file:

Code:

define command {
    command_name    check_gluster_vol_status
    command_line    sudo /usr/lib64/nagios/plugins/gluster/check_volume_status.py -v $ARG1$ -t $ARG2$
}
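The comparison above can be repeated across more services with a small script. This is only a sketch in Python: it mirrors the comparison made in this post (the first argument passed to check_nrpe versus the command_name entries in commands.cfg), the helper names are hypothetical, and the sample strings are trimmed copies of the definitions shown here rather than your full config files.

```python
import re

COMMAND_NAME_RE = re.compile(r"^\s*command_name\s+(\S+)", re.MULTILINE)
CHECK_COMMAND_RE = re.compile(r"^\s*check_command\s+(\S+)", re.MULTILINE)

# Trimmed samples based on the definitions quoted in this post.
SERVICE_CFG = """\
define service {
\thost_name\tIA-ITUNIXOPS - iaalmcbv01.mediacomcorp.com
\tservice_description\tCheck Gluster Volume
\tcheck_command\tcheck_nrpe!check_gluster_vol!!!!!!!
}
"""

COMMANDS_CFG = """\
define command {
    command_name    check_gluster_vol_status
    command_line    sudo /usr/lib64/nagios/plugins/gluster/check_volume_status.py -v $ARG1$ -t $ARG2$
}
"""


def defined_commands(cfg_text):
    """Collect every command_name found in the given config text."""
    return set(COMMAND_NAME_RE.findall(cfg_text))


def nrpe_targets(cfg_text):
    """Return the first argument of each check_command that uses check_nrpe."""
    targets = []
    for value in CHECK_COMMAND_RE.findall(cfg_text):
        command, *args = value.split("!")
        if command == "check_nrpe" and args and args[0]:
            targets.append(args[0])
    return targets


def missing_targets(service_cfg, commands_cfg):
    """List check_nrpe targets that have no matching command_name."""
    known = defined_commands(commands_cfg)
    return [name for name in nrpe_targets(service_cfg) if name not in known]


if __name__ == "__main__":
    print(missing_targets(SERVICE_CFG, COMMANDS_CFG))
```

With the samples above it reports "check_gluster_vol" as the name that has no matching definition. The name also has to exist in nrpe.cfg on the remote hosts, which this sketch does not check.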
Here are my suggestions:
1) Change the check command defined for "Check Gluster Volume" to the following on both "iaalmcbv01" and "iaalmcbv02":

Code:

check_command	check_nrpe!check_gluster_vol_status!!!!!!!
2) Add the command "check_gluster_vol_status" to "nrpe.cfg" on the remote NRPE hosts "iaalmcbv01" and "iaalmcbv02", as pictured below:
defined-command-ncpr.cfg.png
Once done, restart Nagios XI:

Code:

# systemctl restart nagios.service
Then restart NRPE on the remote hosts "iaalmcbv01" and "iaalmcbv02":

Code:

# systemctl restart nrpe.service
Hope this helps!!

Vinh