Zombie processes hanging server using check_esx via sudo
Hey folks,
Over the last two weeks we have hit an issue where running check_esx via sudo does not terminate its child processes. The checks keep generating more and more zombie processes until the XI checks stop altogether. Restarting the monitoring service from the GUI resolves this. It had been working fine for the last few months.
XI version: 5.5.8 (planning an upgrade to 5.6 in late June)
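To confirm that zombies are accumulating, and to see which parent process is failing to reap them, a small filter over `ps` output can help. A zombie stays in the process table until its parent collects its exit status, so a pile-up under a single parent PID points at the process that is not doing so. This is a minimal sketch: `zombies_by_parent` is a hypothetical helper name, and it assumes the procps `ps` column layout shown in the comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count zombie processes per parent PID.
# Reads "pid ppid stat comm" lines on stdin, e.g. from:
#   ps -eo pid=,ppid=,stat=,comm=
# A process whose state (3rd column) starts with "Z" is a zombie.
zombies_by_parent() {
  awk '$3 ~ /^Z/ { count[$2]++ } END { for (p in count) print p, count[p] }'
}

# Usage on a live system:
#   ps -eo pid=,ppid=,stat=,comm= | zombies_by_parent
```

Running `ps -o pid,stat,args -p <PPID>` on each reported parent should then show whether it is the Nagios daemon, sudo, or the plugin itself.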
Re: Zombie processes hanging server using check_esx via sudo
Can you explain more about how it is configured using sudo? I'd also like to see a profile gathered while it is in this state (Admin > System Config > System Profile > Download Profile). Please PM this to me.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Zombie processes hanging server using check_esx via sudo
It was a bank holiday here yesterday; I will get this for you today.
B
Re: Zombie processes hanging server using check_esx via sudo
Sounds good.
Re: Zombie processes hanging server using check_esx via sudo
Apologies: we are currently using cron to restart the services every few hours, and apparently I need a change request (CR) to disable this, even temporarily. We may not get the data until tomorrow.
Re: Zombie processes hanging server using check_esx via sudo
Noted.
Re: Zombie processes hanging server using check_esx via sudo
I received the profile and it is likely hanging because there isn't an entry in the /etc/sudoers file to allow nagios to run the script without requiring a password. Make sure there is an entry like:
nagios ALL=NOPASSWD: /usr/local/nagios/libexec/check_vmware_api.pl
Re: Zombie processes hanging server using check_esx via sudo
Sorry mate, I might not have explained this properly. This works fine *most* of the time, and has done since last October. The sudoers entry is in /etc/sudoers.d:
root@mon01 0 11:00:07 /home/ # cat /etc/sudoers.d/nagios
Defaults:nagios !requiretty
Cmnd_Alias NAGIOSCMD = /usr/local/nagios/libexec/check_vmware_api.pl
nagios ALL = NOPASSWD: NAGIOSCMD
Do I need to add something else so that it terminates properly?
Re: Zombie processes hanging server using check_esx via sudo
Thanks for the clarification. Does the situation improve if you run the check with a timeout set? Try running it with "-t 30" so that it times out after 30 seconds.
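The plugin-level `-t 30` is the first thing to try. As an extra backstop, the check could also be run under an outer hard limit so that a hung child is killed rather than left behind. This is a minimal sketch assuming GNU coreutils `timeout` is installed; `run_with_limit` is a hypothetical wrapper name, and exit code 3 is the standard Nagios UNKNOWN state.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with a hard time limit.
# Usage: run_with_limit SECONDS COMMAND [ARGS...]
run_with_limit() {
  local limit="$1"; shift
  # Send SIGTERM after $limit seconds, then SIGKILL 5 seconds later
  # if the command is still running.
  timeout --kill-after=5 "$limit" "$@"
  local rc=$?
  # timeout exits 124 when the limit is hit, or 137 (128+9) if SIGKILL was needed.
  if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
    echo "UNKNOWN: plugin timed out after ${limit}s"
    return 3
  fi
  # Otherwise pass the plugin's own exit code (OK/WARNING/CRITICAL/UNKNOWN) through.
  return "$rc"
}
```

For example, `run_with_limit 40 sudo /usr/local/nagios/libexec/check_vmware_api.pl <your usual arguments> -t 30` would let the plugin's own 30-second timeout fire first, with the outer limit only as a fallback. How `timeout` interacts with signalling a `sudo` child has not been verified here, so it is worth trying on one host before rolling it out.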
Re: Zombie processes hanging server using check_esx via sudo
Never even thought of that. I have a CR in for tomorrow and might try and sneak this in with it on one site and see how it goes.
Thing is, now that I think about it, the ESX check is actually monitoring customer systems, and I reckon they upgraded to vCenter 6.7 around then. I cannot confirm this, but it sounds possible. I checked the date on the sudoers file and it was last October, so the check had been running fine for six months or more.