Zombie processes hanging server using check_esx via sudo

Posted: Thu May 30, 2019 6:45 am
by bomahony
Hey folks

We have hit an issue in the last two weeks where running check_esx via sudo is not terminating its child processes, and the checks continue to generate more and more zombie processes until the XI checks stop altogether. Restarting the monitoring service from the GUI resolves this. The setup had been working for the last few months.

XI version 5.5.8 [planning an upgrade to 5.6 in late June]
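
For anyone wanting to see which parent the zombies are piling up under, here is a minimal sketch that groups defunct processes by parent PID. It assumes a Linux-style `ps` that accepts `-eo pid,ppid,stat,comm`; the function names `count_zombies_by_parent` and `current_zombies` are made up for illustration and are not part of Nagios XI.

```python
import subprocess
from collections import Counter


def count_zombies_by_parent(ps_output):
    """Return {ppid: zombie_count} parsed from `ps -eo pid,ppid,stat,comm` output."""
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # first line is the column header
        fields = line.split(None, 3)         # pid, ppid, stat, command
        if len(fields) < 3:
            continue
        ppid, stat = fields[1], fields[2]
        if stat.startswith("Z"):             # state Z = zombie (defunct)
            counts[int(ppid)] += 1
    return dict(counts)


def current_zombies():
    """Run ps on this host and count zombies per parent (assumes ps is on PATH)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_zombies_by_parent(out)
```

Calling `current_zombies()` on the monitoring server while it is in the bad state should show whether the defunct entries all share one parent (for example the nagios daemon or a sudo process).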

Re: Zombie processes hanging server using check_esx via sudo

Posted: Thu May 30, 2019 11:41 am
by cdienger
Can you explain more about how it is configured using sudo? I'd also like to see a profile gathered while the server is in this state (Admin > System Config > System Profile > Download Profile). Please PM this to me.

Re: Zombie processes hanging server using check_esx via sudo

Posted: Tue Jun 04, 2019 6:57 am
by bomahony
Was a bank holiday here yesterday, will get this today for you.

B

Re: Zombie processes hanging server using check_esx via sudo

Posted: Tue Jun 04, 2019 11:33 am
by cdienger
Sounds good.

Re: Zombie processes hanging server using check_esx via sudo

Posted: Wed Jun 05, 2019 6:33 am
by bomahony
Apologies, we are currently using cron to restart the services every few hours, and apparently I need a CR to disable this even temporarily. It may be tomorrow before we get the data.

Re: Zombie processes hanging server using check_esx via sudo

Posted: Wed Jun 05, 2019 1:43 pm
by lmiltchev
Noted.

Re: Zombie processes hanging server using check_esx via sudo

Posted: Wed Jun 05, 2019 1:52 pm
by cdienger
I received the profile and it is likely hanging because there isn't an entry in the /etc/sudoers file to allow nagios to run the script without requiring a password. Make sure there is an entry like:

Code:

nagios ALL=NOPASSWD: /usr/local/nagios/libexec/check_vmware_api.pl

Re: Zombie processes hanging server using check_esx via sudo

Posted: Thu Jun 06, 2019 5:01 am
by bomahony
Sorry mate, I might not have explained this properly. This works fine *most* of the time, and has done since last October. The sudoers entry is in /etc/sudoers.d:
root@mon01 0 11:00:07 /home/ # cat /etc/sudoers.d/nagios
Defaults:nagios !requiretty
Cmnd_Alias NAGIOSCMD = /usr/local/nagios/libexec/check_vmware_api.pl
nagios ALL = NOPASSWD: NAGIOSCMD

Do I need to add something else to make it terminate properly?

Re: Zombie processes hanging server using check_esx via sudo

Posted: Thu Jun 06, 2019 12:56 pm
by cdienger
Thanks for the clarification. Does the situation improve if you run the check with a timeout set? Try running it with "-t 30" so that it times out after 30 seconds.
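
To illustrate the idea behind a timeout (this is not how the plugin's own "-t" option is implemented), here is a rough sketch of a wrapper that gives up on a hung check after a set number of seconds and reports UNKNOWN. The name `run_check` is hypothetical. One caveat: only the direct child is killed, so if that child is sudo, whatever sudo started may need separate handling.

```python
import subprocess

UNKNOWN = 3  # Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN


def run_check(argv, timeout_s=30):
    """Run a check command and return (exit_code, first-hand output).

    If the command runs longer than timeout_s, it is killed and an
    UNKNOWN result is returned instead of waiting forever.
    """
    try:
        done = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the direct child and waits on it before
        # raising, so that child is reaped rather than left defunct.
        return UNKNOWN, f"UNKNOWN - check timed out after {timeout_s}s"
    return done.returncode, done.stdout.strip()
```

Something like `run_check(["sudo", "/usr/local/nagios/libexec/check_vmware_api.pl", ...], timeout_s=30)` would be the shape of a call, with the plugin's real arguments filled in.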

Re: Zombie processes hanging server using check_esx via sudo

Posted: Thu Jun 06, 2019 2:33 pm
by bomahony
Never even thought of that. I have a CR in for tomorrow and might try and sneak this in with it on one site and see how it goes.

Thing is, now that I think about it, the ESX check is actually checking customer systems, and I reckon they upgraded to vCenter 6.7 around the time the problem started. I cannot confirm this, but it sounds possible. I checked the date on the sudoers file and it was last October, so it had been running fine for 6+ months.