Zombie processes hanging server using check_esx via sudo

bomahony
Posts: 133
Joined: Wed Jul 04, 2018 10:46 am

Zombie processes hanging server using check_esx via sudo

Post by bomahony »

Hey folks

We have hit an issue in the last two weeks where running check_esx via sudo is not terminating its child processes, and the checks continue to generate more and more zombie processes until the XI checks stop altogether. Restarting the monitoring service from the GUI resolves this. This had been working fine for the last few months.

XI version 5.5.8 [planning an upgrade to 5.6 in late June]
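
For anyone hitting the same symptom, here is a minimal sketch for confirming where the zombies come from. A zombie stays in the process table until its parent reaps it, so grouping by parent PID shows which process is failing to clean up. The helper names are illustrative and not part of Nagios XI, and the ps options assume a Linux procps ps.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Nagios XI.

# Print one "pid ppid stat comm" row per process (Linux procps ps assumed).
snapshot() {
    ps -eo pid=,ppid=,stat=,comm=
}

# Read snapshot rows on stdin and print how many are zombies
# (process state field starts with Z).
count_zombies() {
    awk '$3 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Read snapshot rows on stdin and print "parent_pid zombie_count"
# for every parent that currently has at least one zombie child.
zombies_by_parent() {
    awk '$3 ~ /^Z/ { c[$2]++ } END { for (p in c) print p, c[p] }'
}
```

Typical use would be `snapshot | count_zombies` for a quick total and `snapshot | zombies_by_parent` to see whether the zombies hang off sudo, the plugin, or the Nagios worker.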
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Zombie processes hanging server using check_esx via sudo

Post by cdienger »

Can you explain more about how it is configured using sudo? I'd also like to see a profile gathered while it is in this state (Admin > System Config > System Profile > Download Profile). Please PM this to me.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
bomahony
Posts: 133
Joined: Wed Jul 04, 2018 10:46 am

Re: Zombie processes hanging server using check_esx via sudo

Post by bomahony »

Was a bank holiday here yesterday, will get this today for you.

B
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Zombie processes hanging server using check_esx via sudo

Post by cdienger »

Sounds good.
bomahony
Posts: 133
Joined: Wed Jul 04, 2018 10:46 am

Re: Zombie processes hanging server using check_esx via sudo

Post by bomahony »

Apologies, we are currently using cron to restart the services every few hours, and apparently I need a CR to disable this, even temporarily. It may be tomorrow before we get the data.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Zombie processes hanging server using check_esx via sudo

Post by lmiltchev »

Noted.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Zombie processes hanging server using check_esx via sudo

Post by cdienger »

I received the profile and it is likely hanging because there isn't an entry in the /etc/sudoers file to allow nagios to run the script without requiring a password. Make sure there is an entry like:

nagios ALL=NOPASSWD: /usr/local/nagios/libexec/check_vmware_api.pl
bomahony
Posts: 133
Joined: Wed Jul 04, 2018 10:46 am

Re: Zombie processes hanging server using check_esx via sudo

Post by bomahony »

Sorry mate, I might not have explained this properly. This works fine *most* of the time, and has done since last October. The sudoers config lives in /etc/sudoers.d:

root@mon01 0 11:00:07 /home/ # cat /etc/sudoers.d/nagios
Defaults:nagios !requiretty
Cmnd_Alias NAGIOSCMD = /usr/local/nagios/libexec/check_vmware_api.pl
nagios ALL = NOPASSWD: NAGIOSCMD

Do I need to add something else so the child processes terminate properly?
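
Two quick checks can rule the sudo side in or out. This is a sketch under assumptions: the helper names are made up, the syntax check needs root, and the second helper has to be run as the nagios user with whatever arguments the real check uses. With -n, sudo exits with an error instead of prompting when a password would be required, so a missing NOPASSWD rule shows up as an immediate failure rather than a process waiting at a prompt.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Nagios XI.

# Syntax-check a single sudoers drop-in file without installing it.
# Needs to be run as root.
check_sudoers_file() {
    visudo -cf "$1"
}

# Run a command through sudo in non-interactive mode. If sudo would have
# to ask for a password it fails straight away instead of blocking.
# Meant to be run as the user the checks run as (nagios in this thread).
run_noninteractive() {
    sudo -n "$@"
}
```

For example, `check_sudoers_file /etc/sudoers.d/nagios` as root, then, as nagios, `run_noninteractive /usr/local/nagios/libexec/check_vmware_api.pl` followed by the usual check arguments. If both succeed, the sudoers entry is unlikely to be what leaves processes behind.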
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Zombie processes hanging server using check_esx via sudo

Post by cdienger »

Thanks for the clarification. Does the situation improve if you run the check with a timeout set? Try running it with "-t 30" so that it times out after 30 seconds.
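
If the plugin's own timeout turns out not to be enough, an outer guard is another option. A minimal sketch, assuming GNU coreutils timeout is installed; the wrapper name and the 5-second grace period are arbitrary choices, and this is not something Nagios XI ships. It sends TERM when the limit is reached, follows up with KILL 5 seconds later if the command is still running, and reports UNKNOWN (exit code 3 in the Nagios plugin convention) when the check was cut off.

```shell
#!/bin/sh
# Sketch only: wrapper name and grace period are assumptions.

# run_check_with_timeout SECONDS COMMAND [ARGS...]
# Runs COMMAND under coreutils timeout. Returns COMMAND's own exit code,
# or 3 (UNKNOWN) if it had to be terminated.
run_check_with_timeout() {
    limit=$1
    shift
    rc=0
    # TERM after $limit seconds, KILL 5 seconds later if still alive.
    timeout -k 5 "$limit" "$@" || rc=$?
    # 124: timed out and stopped on TERM; 137: had to be KILLed (128 + 9).
    if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
        echo "UNKNOWN: check exceeded ${limit}s and was terminated"
        return 3
    fi
    return "$rc"
}
```

A check command could then be wrapped as `run_check_with_timeout 30 sudo /usr/local/nagios/libexec/check_vmware_api.pl` plus the usual arguments. Keeping the outer limit at or slightly above the plugin's -t value lets the plugin report its own timeout first.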
bomahony
Posts: 133
Joined: Wed Jul 04, 2018 10:46 am

Re: Zombie processes hanging server using check_esx via sudo

Post by bomahony »

Never even thought of that. I have a CR in for tomorrow and might try and sneak this in with it on one site and see how it goes.

Thing is, now that I think about it, the ESX check is actually checking customer systems, and I reckon they upgraded to vCenter 6.7 around the time this started. I cannot confirm this, but it sounds plausible. I checked the date on the sudoers file and it was last October, so it was running fine for 6+ months.