Re: Duplicate definition found for service
Posted: Tue Mar 05, 2019 11:52 pm
by Rohan77
Hi Benjamin Smith,
Thanks for the update. I have sent you the state history by private message for your review.
Re: Duplicate definition found for service
Posted: Wed Mar 06, 2019 1:07 pm
by benjaminsmith
Hi @Rohan77,
Thanks for sending that over. It confirms the data in the first report: the service CPU_Total_Linux never went into a hard critical state, so no notification was generated. The idle value reported by iostat is the percentage of time the CPU was idle with no outstanding disk I/O request; it is not the same as CPU load.
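To illustrate the difference, here is a minimal Python sketch of how a "usage" figure might be derived from the iostat idle value. It assumes the plugin reads the six avg-cpu columns that iostat prints (%user %nice %system %iowait %steal %idle); the sample numbers are made up for illustration and are not taken from PG938.

```python
def cpu_figures_from_iostat(avg_cpu_line: str) -> dict:
    """Derive two different 'CPU' figures from one iostat avg-cpu values line.

    Assumed column order: %user %nice %system %iowait %steal %idle
    """
    fields = [float(value) for value in avg_cpu_line.split()]
    if len(fields) != 6:
        raise ValueError("expected 6 avg-cpu columns, got %d" % len(fields))
    user, nice, system, iowait, steal, idle = fields
    return {
        # Everything that is not idle, which includes time spent waiting on I/O.
        "non_idle": round(100.0 - idle, 2),
        # Time the CPU was actually executing something.
        "busy": round(user + nice + system + steal, 2),
        "iowait": iowait,
    }


# Illustrative values only (hypothetical sample line).
sample = "12.50 0.00 4.25 30.00 0.00 53.25"
print(cpu_figures_from_iostat(sample))
```

With these sample values, a check based on 100 minus idle reports a higher figure than the time the CPU actually spent executing, because I/O wait is counted as non-idle. Neither figure is the load average, which counts runnable processes rather than a percentage.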
If you are receiving notifications from other services, I would recommend testing your plugin and correcting the timeout issue. I would increase the timeout setting to -t 60.
To test notifications, you can either adjust the check command parameters to force the service into a critical state, or send passive checks until the state type changes to hard critical. To send passive checks, go to Home > Service Status > CPU_Total_Linux > Advanced > Submit Passive Check Result.
Code:
define command {
    command_name    scb_cpu_total_linux
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c check_total_procsstat_org $ARG1$ $ARG2$
}
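As an alternative to the web form mentioned above, passive results can also be submitted through the Nagios external command file. The sketch below only builds the command line in the PROCESS_SERVICE_CHECK_RESULT format; the helper name is hypothetical, and the command file location (often /usr/local/nagios/var/rw/nagios.cmd) is an assumption that depends on the installation.

```python
import time


def passive_service_result(host, service, return_code, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line.

    return_code follows the plugin convention:
    0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    """
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2 or 3")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s" % (
        ts, host, service, return_code, output)


# Example: a forced CRITICAL result for the service discussed in this thread.
line = passive_service_result("PG938", "CPU_Total_Linux", 2,
                              "test critical", timestamp=1551877620)
print(line)
# To submit it, the line plus a trailing newline would be appended to the
# command file; the path is installation specific and is not hard-coded here.
```

Submitting the same CRITICAL result repeatedly should move the service from SOFT to HARD once the configured number of check attempts is reached.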
Re: Duplicate definition found for service
Posted: Thu Mar 07, 2019 5:21 am
by Rohan77
Hi Benjamin Smith,
Thanks for the updates and advice. Earlier, I ran a test on another host, HK1PNAG3, which is monitored through the same Nagios portal where PG938 was discovered. These are the steps I performed:
1) An individual CPU service is already present for HK1PNAG3.
2) I purposely added HK1PNAG3 to the default CPU service as well, to see whether any error appears.
3) As the CPU usage for HK1PNAG3 was 35% during the test, I lowered the thresholds to 30% for warning and 34% for critical.
4) I checked under Home -> Service Detail -> HK1PNAG3 -> CPU_Total_Linux.
5) Six checks are defined at a 5-minute polling interval, so the automatic alert would take 30 minutes (6 checks * 5-minute interval). Instead of waiting, I clicked "Schedule a forced immediate check" until the check count reached 6.
6) I observed that the state was still SOFT up to check 5; once it reached the 6th check, it changed to a HARD state and I got a notification in Nagios.
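The SOFT to HARD behaviour seen in the steps above can be sketched in a few lines of Python. This is a simplified model, not Nagios's actual implementation: it assumes every check in the run returns a non-OK result, and it reuses the 6 attempts and 5-minute interval from this test.

```python
def state_types(non_ok_checks, max_check_attempts=6):
    """Return the state type after each consecutive non-OK check.

    Simplified model: the state stays SOFT until the attempt counter
    reaches max_check_attempts, then becomes HARD (notifications are
    sent only on HARD states).
    """
    return ["HARD" if attempt >= max_check_attempts else "SOFT"
            for attempt in range(1, non_ok_checks + 1)]


def estimated_minutes_to_hard(max_check_attempts, interval_minutes):
    """Estimate from the post: attempts multiplied by the polling interval."""
    return max_check_attempts * interval_minutes


for attempt, state in enumerate(state_types(6), start=1):
    print("check %d: %s" % (attempt, state))
print("estimated wait: %d minutes" % estimated_minutes_to_hard(6, 5))
```

In this model, checks 1 to 5 print SOFT and check 6 prints HARD, which matches the observation in step 6. The 30-minute figure is the estimate from step 5; the real wait depends on the retry interval configured for the service.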
As you said initially, even if duplicate rules are present for the same service, it should still generate at least a single alert, and the test above confirms this. However, I still have a concern about the incident on PG938 on 19th Feb: why was there no notification when the client had already confirmed that the CPU on PG938 had reached 100%? I will send you the test output by private message. Thanks.
Re: Duplicate definition found for service
Posted: Thu Mar 07, 2019 11:30 am
by benjaminsmith
Hi @Rohan77,
"I still have a concern about the incident on PG938 on 19th Feb: why was there no notification when the client had already confirmed that the CPU on PG938 had reached 100%?"
Thanks for sending that over. I'm not familiar with that check command, but it looks like your check command is working as expected. The issue here is that there is no direct comparison between iostat data, CPU load, and CPU usage.
Going forward, I would recommend making sure that the check command used for CPU monitoring suits the customer's requirements. For example, our default Linux wizard uses check_load for monitoring CPU usage.
Load Checks
https://support.nagios.com/kb/article/l ... s-771.html
Re: Duplicate definition found for service
Posted: Fri Mar 08, 2019 2:51 am
by Rohan77
Hi Benjamin Smith,
"The issue here is that there is no direct comparison between iostat data, CPU load, and CPU usage."
Could you please elaborate on the above statement? We have similar check command settings applied to other Nagios hosts, and CPU alerts are being generated for them.
Re: Duplicate definition found for service
Posted: Fri Mar 08, 2019 10:31 am
by benjaminsmith
Hi @Rohan77,
If the check command is working and generating alerts on other systems, then there may be a server or mail setting issue on this system.
If the issue persists, please open a support ticket for faster resolution.
https://support.nagios.com/tickets/