I'm wondering if there's a plugin or add-on that will list the current processes, keep track of each one, and graph its resource consumption in a Linux environment. Basically, I'd like to see the process ID, how long the process has been running, the resources it's using (CPU and memory), and for how long.
We have an Oracle server that slows down significantly once the number of processes exceeds around 1000. We'd like to start monitoring processes to see what is being spawned, how long each process runs, and what resources it consumes.
Thanks!
help with tracking historical data of running processes
bolson
Re: help with tracking historical data of running processes
Hello Chite, the difficulty in achieving your objective lies in the plugin specification. A plugin returns a single line of text as output along with (optionally) a status of OK, Warning, or Critical based on the thresholds you provide. There are plugins that report the current number of running processes and return Warning or Critical if the count exceeds a threshold you set. There is also a plugin, check_top, which aggregates parent and child (spawned or forked) processes and outputs the top CPU consumer. It could be modified to return the top memory consumer instead. Hope this helps!
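To make the plugin contract above concrete, here is a rough sketch of a process-count check in Python. It is not check_procs or check_top, just an illustration. The function names, the `PROCS` prefix, and the default thresholds of 900 and 1000 are all made up for the example, and it assumes a Linux host where every numeric directory under /proc is a process.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def count_processes(proc_root="/proc"):
    """Count processes by counting numeric directories under /proc (Linux only)."""
    return sum(1 for entry in os.listdir(proc_root) if entry.isdigit())


def evaluate(count, warn, crit):
    """Return (exit_code, output_line) for a process count and its thresholds."""
    if count >= crit:
        code = CRITICAL
    elif count >= warn:
        code = WARNING
    else:
        code = OK
    # Text before the pipe is for humans; text after it is performance data
    # in the form label=value;warn;crit;min.
    line = f"PROCS {STATUS_NAMES[code]}: {count} processes | procs={count};{warn};{crit};0"
    return code, line


def main(warn=900, crit=1000):
    """Hypothetical entry point; the default thresholds are illustrative only."""
    try:
        code, line = evaluate(count_processes(), warn, crit)
    except OSError as exc:
        code, line = UNKNOWN, f"PROCS UNKNOWN: {exc}"
    print(line)
    return code
```

To use it as an actual check, the script would end with `import sys; sys.exit(main())` so that Nagios sees the exit code. The part after the pipe is what gets graphed, so one number per check is easy; a full per-process history is where the single-line format starts to get in the way.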
Re: help with tracking historical data of running processes
There's also the inherent problem of PIDs eventually wrapping around, making PID a bad identifier for any given time series.
As already mentioned, you could probably modify check_top or check_procs to report the information you're looking for as Nagios-formatted performance data. Done correctly, this should let you use the Nagios XI performance graphs to work out which processes were running during a given period and what resources they consumed.
However, the work would require some knowledge of Bash or C depending on your tool of choice. If Python is more your style, psutil would be a useful tool.
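If you go the psutil route, something along these lines might be a starting point. This is only a sketch under a few assumptions: it groups processes by name instead of PID (to sidestep the wrap-around problem), the helper names and the `_count`/`_cpu`/`_rss` label suffixes are invented for the example, and it assumes process names contain no `'` or `=` characters, which are not allowed in performance data labels.

```python
from collections import defaultdict


def aggregate_by_name(samples):
    """Sum CPU percent and resident memory per process name.

    `samples` is an iterable of (name, cpu_percent, rss_bytes) tuples.
    """
    totals = defaultdict(lambda: {"count": 0, "cpu": 0.0, "rss": 0})
    for name, cpu, rss in samples:
        entry = totals[name]
        entry["count"] += 1
        entry["cpu"] += cpu
        entry["rss"] += rss
    return dict(totals)


def format_perfdata(totals):
    """Render the totals as space-separated Nagios performance data items."""
    items = []
    for name in sorted(totals):
        t = totals[name]
        items.append(f"'{name}_count'={t['count']}")
        items.append(f"'{name}_cpu'={t['cpu']:.1f}%")
        items.append(f"'{name}_rss'={t['rss']}B")
    return " ".join(items)


def sample_with_psutil():
    """Yield (name, cpu_percent, rss_bytes) for each process psutil can read."""
    import psutil  # third-party; imported lazily so the helpers above work without it

    for proc in psutil.process_iter(attrs=["name", "cpu_percent", "memory_info"]):
        info = proc.info
        mem = info["memory_info"]
        # Fields come back as None when access to the process is denied.
        yield (info["name"] or "unknown", info["cpu_percent"] or 0.0, mem.rss if mem else 0)
```

A plugin would print a status line followed by `" | " + format_perfdata(aggregate_by_name(sample_with_psutil()))`. One caveat: psutil measures CPU percent relative to the previous call, so a script that starts fresh on every check will report 0.0 for CPU unless it samples twice with a short pause in between or runs as a long-lived collector.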
Former Nagios employee
https://www.mcapra.com/