On-demand checks for a WoL render farm, Adaptive Monitoring?
Posted: Mon Aug 06, 2012 10:57 am
Hi there,
I have a nice little render farm. Since the nodes are not used 24/7, they are turned off and on by a managing software. When a node is supposed to be online, I need to monitor it.
Is the node really online? Service online? NFS mount OK? etc ...
The managing software is not great, but I do know when a node is supposed to be online.
I tried to disable the host checks via external commands, but this does not help: the hosts are still shown as critical. That is not very helpful, because when 37 hosts are offline, that's great! It saves a lot of power, and the air conditioning also runs at 50%. But host number 38 was supposed to be online! ...
I really need them green. I can turn off all alerts. But that's not enough.
Right now my idea is to use Adaptive Monitoring and exchange all test commands with fake ones producing lots of "OK". Is there a better approach?
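A minimal sketch of that idea, assuming the monitoring system is Nagios and accepts its usual external command line format (`[timestamp] COMMAND;arg1;arg2`). `CHANGE_HOST_CHECK_COMMAND` is a standard Nagios external command. The `check_dummy!0` command definition, the host name, and the command file path are assumptions for illustration and would have to match the local configuration.

```python
import time


def format_external_command(name, *args, timestamp=None):
    """Build one line in the external command file format:
    [unix_time] COMMAND_NAME;arg1;arg2;...
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)
    fields = (name,) + tuple(str(a) for a in args)
    return "[{}] {}\n".format(ts, ";".join(fields))


def fake_ok_host_check(host, dummy_command="check_dummy!0", timestamp=None):
    """Swap a host's check command for one that always returns OK.

    'check_dummy!0' is a hypothetical command definition; it assumes a
    command named check_dummy is defined and returns the state passed to it.
    """
    return format_external_command(
        "CHANGE_HOST_CHECK_COMMAND", host, dummy_command, timestamp=timestamp
    )


def submit(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write one command line to the external command file (a named pipe).

    The default path is an assumption; check 'command_file' in nagios.cfg.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

The managing software could call something like `submit(fake_ok_host_check("node38"))` when it powers a node down, and send the real check command back when it wakes the node up again. Note that this makes an offline node indistinguishable from a healthy one, which is exactly why it feels like a workaround.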
I have a similar problem with a full hard disk (yeeees, it's full... I know. I told the staff. Thank you.) ... Is there a plugin or something that removes hosts/services from the "critical-red-we-are-going-to-die" list when I say stop obsessing over/checking this host/service? I like the color blue, or even pink with lilac dots ... just not red. It always looks like I am not doing my job.
Have a nice day,
Timo