Services disabled are reenabled

Post by **itunixops** » Thu Dec 10, 2020 10:43 am

Version: 5.6.5
OS RHEL 7.4

Issue:

Recently, every time we disable a service it comes back online. We force those services down, but they always appear to come back up, which is a concern.
We're also getting false data from some of our systems through the SNMP plugin. In one case, our check disk routine reported the / directory on one system as 85% full when it was actually 15% full.

Could someone look at this if we submit a dump of the system?

Post by **vtrac** » Fri Dec 11, 2020 4:25 pm

Hi itunixops,
I tested disabling and enabling a couple of services on my Nagios XI v5.7.4 system.
I also created a new VM with Nagios XI v5.6.5, similar to your environment, but was not able to reproduce the issue.

Could you please upload picture(s) of some of those services?

Also, please remember to click "Apply Configuration" after clicking the "Save" button so that your changes take effect under CCM (please see the pictures below).

To further investigate the issue, could you please send me the profile.zip and the exact name of the host and services that were having enable/disable issues.

To send us your system profile:
1) Log in to the Nagios XI GUI using a web browser.
2) Click the "Admin" > "System Profile" menu.
3) Click the "Download Profile" button.
4) Save the profile.zip file and share it in a private message or upload it to the post/ticket, then reply to this post to bring it up in the queue.

disable-service-in-CCM.png

apply-config-for-disable-service.png

Best Regards,
Vinh

Post by **itunixops** » Mon Dec 14, 2020 8:53 am

We have attached the profile per request. The service in question is a custom one called Check Gluster Volume.

Over the last couple of days it has not flapped, but we are sending this information anyway.

We're also looking to update this to the latest code plus update the OS itself.

Moderator's Note: The profile has been shared with the support team but has been removed from the public forum.

Post by **vtrac** » Mon Dec 14, 2020 4:40 pm

Hi itunixops,
Looking in the log "nagios.txt", I noticed many entries like the following:

Code: Select all

[1607953674] Warning: The results of service 'Check Gluster Volume' on host 'IA-ITUNIXOPS - iaalmcbv01.mediacomcorp.com' are stale by 0d 0h 0m 40s (threshold=0d 0h 0m 20s).  I'm forcing an immediate check of the service.
[1607953674] Warning: The results of service 'Check Gluster Volume' on host 'IA-ITUNIXOPS - iaalmcbv02.mediacomcorp.com' are stale by 0d 0h 0m 40s (threshold=0d 0h 0m 20s).  I'm forcing an immediate check of the service.
[1607953733] Warning: The results of service 'Check Gluster Volume' on host 'IA-ITUNIXOPS - iaalmcbv01.mediacomcorp.com' are stale by 0d 0h 0m 39s (threshold=0d 0h 0m 20s).  I'm forcing an immediate check of the service.
[1607953734] Warning: The results of service 'Check Gluster Volume' on host 'IA-ITUNIXOPS - iaalmcbv02.mediacomcorp.com' are stale by 0d 0h 0m 39s (threshold=0d 0h 0m 20s).  I'm forcing an immediate check of the service.
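To see how often each host and service triggers this warning, the log can be tallied with a short script. The following is only a rough sketch: the function name `count_stale_warnings` and the regular expression are illustrative assumptions based on the four lines quoted above, not part of Nagios itself.

```python
import re
from collections import Counter

# Matches the "stale results" warning format quoted in this thread.
# The pattern is an assumption derived from those sample lines only.
STALE_RE = re.compile(
    r"^\[(?P<ts>\d+)\] Warning: The results of service '(?P<service>[^']+)' "
    r"on host '(?P<host>[^']+)' are stale by "
)


def count_stale_warnings(lines):
    """Count stale-result warnings per (host, service) pair."""
    counts = Counter()
    for line in lines:
        match = STALE_RE.match(line)
        if match:
            counts[(match.group("host"), match.group("service"))] += 1
    return counts


# The four log lines quoted above, used as sample input.
SAMPLE = [
    "[1607953674] Warning: The results of service 'Check Gluster Volume' on host 'IA-ITUNIXOPS - iaalmcbv01.mediacomcorp.com' are stale by 0d 0h 0m 40s (threshold=0d 0h 0m 20s).  I'm forcing an immediate check of the service.",
    "[1607953674] Warning: The results of service 'Check Gluster Volume' on host 'IA-ITUNIXOPS - iaalmcbv02.mediacomcorp.com' are stale by 0d 0h 0m 40s (threshold=0d 0h 0m 20s).  I'm forcing an immediate check of the service.",
    "[1607953733] Warning: The results of service 'Check Gluster Volume' on host 'IA-ITUNIXOPS - iaalmcbv01.mediacomcorp.com' are stale by 0d 0h 0m 39s (threshold=0d 0h 0m 20s).  I'm forcing an immediate check of the service.",
    "[1607953734] Warning: The results of service 'Check Gluster Volume' on host 'IA-ITUNIXOPS - iaalmcbv02.mediacomcorp.com' are stale by 0d 0h 0m 39s (threshold=0d 0h 0m 20s).  I'm forcing an immediate check of the service.",
]

if __name__ == "__main__":
    for (host, service), n in count_stale_warnings(SAMPLE).items():
        print(f"{n:3d}  {host}  {service}")
```

To run it against a real log, pass an open file object instead of `SAMPLE`; the location of the log file depends on the installation.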

I also noticed that the service definition (below) calls a command named "check_gluster_vol", but no command with that name is defined:

Code: Select all

define service {
	host_name	IA-ITUNIXOPS - iaalmcbv01.mediacomcorp.com
	service_description	Check Gluster Volume
	display_name	Check Gluster Volume
	check_period	24x7
	check_command	check_nrpe!check_gluster_vol!!!!!!!
	contact_groups	IT UNIX Ops
	notification_period	24x7
	initial_state	o
	importance	0

However, there is a command called "check_gluster_vol_status", defined as follows in the commands.cfg file:

Code: Select all

define command {
    command_name    check_gluster_vol_status
    command_line    sudo /usr/lib64/nagios/plugins/gluster/check_volume_status.py -v $ARG1$ -t $ARG2$
}
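The mismatch described above can be checked mechanically. This is a minimal sketch under stated assumptions: the helper names `nrpe_command_name` and `is_known` are hypothetical, and the set of defined names is supplied by the caller (here, only the one name quoted from commands.cfg).

```python
def nrpe_command_name(check_command):
    """Return the first argument of a check_nrpe check_command, or None.

    Nagios separates the command name and its arguments with '!'.
    """
    parts = check_command.split("!")
    if parts[0] != "check_nrpe" or len(parts) < 2 or not parts[1]:
        return None
    return parts[1]


def is_known(check_command, defined_commands):
    """True if the name passed to check_nrpe is in defined_commands."""
    name = nrpe_command_name(check_command)
    return name is not None and name in defined_commands


if __name__ == "__main__":
    defined = {"check_gluster_vol_status"}  # the name quoted from commands.cfg
    for cmd in (
        "check_nrpe!check_gluster_vol!!!!!!!",
        "check_nrpe!check_gluster_vol_status!!!!!!!",
    ):
        print(cmd, "->", "defined" if is_known(cmd, defined) else "NOT defined")
```

The trailing empty fields produced by the extra `!` separators are ignored, since only the first argument is inspected.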

Here are my suggestions:
1) Change the check command for "Check Gluster Volume" to the following on both "iaalmcbv01" and "iaalmcbv02":

Code: Select all

check_command	check_nrpe!check_gluster_vol_status!!!!!!!

2) Add the command "check_gluster_vol_status" to "nrpe.cfg" on the remote NRPE hosts "iaalmcbv01" and "iaalmcbv02", as pictured below:

defined-command-ncpr.cfg.png
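After editing nrpe.cfg, it can help to confirm that the command is really defined there. The sketch below parses the standard `command[name]=command_line` entries; the function name `nrpe_commands` and the sample excerpt are hypothetical, and `VOLUME` and `TYPE` are placeholders because the actual arguments are not shown in this thread.

```python
import re

# nrpe.cfg defines commands as: command[<name>]=<command line>
NRPE_COMMAND_RE = re.compile(
    r"^\s*command\[(?P<name>[^\]]+)\]\s*=\s*(?P<line>.+?)\s*$"
)


def nrpe_commands(cfg_text):
    """Return a dict mapping NRPE command names to their command lines."""
    commands = {}
    for raw in cfg_text.splitlines():
        if raw.lstrip().startswith("#"):
            continue  # skip commented-out entries
        match = NRPE_COMMAND_RE.match(raw)
        if match:
            commands[match.group("name")] = match.group("line")
    return commands


# Hypothetical excerpt; VOLUME and TYPE are placeholders, not real values.
SAMPLE_CFG = """\
# command[check_gluster_vol]=disabled example
command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
command[check_gluster_vol_status]=sudo /usr/lib64/nagios/plugins/gluster/check_volume_status.py -v VOLUME -t TYPE
"""

if __name__ == "__main__":
    found = nrpe_commands(SAMPLE_CFG)
    print("check_gluster_vol_status defined:", "check_gluster_vol_status" in found)
```

In practice the text would be read from the nrpe.cfg file on each remote host rather than from `SAMPLE_CFG`.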

Once done, restart Nagios XI:

Code: Select all

# systemctl restart nagios.service

Then restart NRPE on the remote hosts "iaalmcbv01" and "iaalmcbv02":

Code: Select all

# systemctl restart nrpe.service

Hope this helps!!

Vinh

Nagios Support Forum
