Limit SLA Reports to only Failures
Posted: Mon Mar 01, 2021 11:11 am
Hi
The customer wants us to tell him, using Nagios, which servers are experiencing a high load.
The customer has not defined "high load", so we decided to define it in terms of CPU load.
We set an 80% warning and a 90% critical threshold on the CPU metrics and put all of them in an "All_CPU" service group.
We can then run an SLA report on the "All_CPU" group with, say, an SLA target of 85%.
So if a device had high CPU for more than 15% of the time, it would fail the SLA.
So far so good.
However, the SLA report includes everything. With 5k devices it would generate a 400-page PDF document:
200 pages for the host data and 200 pages for the service data (200 x 25 = 5k metrics).
Is there a way to limit the output to just the devices that failed the SLA? Or is there some other way to extract such a report (an SQL query, perhaps)?
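To make it concrete, here is a rough sketch of the filtering I am after. It assumes the per-service availability figures could somehow be exported to CSV; the column names (host, service, percent_ok) and the sample rows are made up for illustration and do not come from any actual Nagios export.

```python
import csv
import io

SLA_TARGET = 85.0  # percent of time the CPU service must be OK


def failed_sla(rows, target=SLA_TARGET):
    """Keep only the rows whose OK percentage is below the SLA target."""
    return [r for r in rows if float(r["percent_ok"]) < target]


# Hypothetical sample data standing in for a real export.
sample = io.StringIO(
    "host,service,percent_ok\n"
    "web01,CPU,97.2\n"
    "db01,CPU,81.5\n"
    "app03,CPU,84.9\n"
)

rows = list(csv.DictReader(sample))
for r in failed_sla(rows):
    print(r["host"], r["service"], r["percent_ok"])
```

With this sample data only db01 and app03 would be listed, which is the kind of short list the customer wants instead of the full report.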
rgds
George