Hi Benjamin Smith,
Thanks for the update. I have sent you the state history by private message for your review. Thanks.
Duplicate definition found for service
-
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: Duplicate definition found for service
Hi @Rohan77,
Thanks for sending that over. It confirms the data that was in the first report: the service CPU_Total_Linux never went into a hard critical state, so no notification was generated. The idle figure reported by iostat is the percentage of time the CPU was idle with no outstanding disk I/O request, and it is not the same thing as CPU load.
If you are receiving notifications from other services, I would recommend testing your plugin and correcting the timeout issue. I would increase the timeout setting to -t 60.
To test notifications, you can either adjust the check command parameters to force the service into a critical state, or send passive checks until the state type changes to hard critical. To send passive checks, go to Home > Service Status > CPU_Total_Linux > Advanced > Submit Passive Check Result.
Code: Select all
define command {
    command_name    scb_cpu_total_linux
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c check_total_procsstat_org $ARG1$ $ARG2$
}
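As an alternative to the web form, a passive result can also be submitted as a Nagios external command. The following is only a minimal sketch of how such a line could be built: the function name `passive_service_result` is hypothetical, the host name PG938 is assumed to match the host as defined in Nagios, and the command file path in the comment is an assumption that varies by install.

```python
import time


def passive_service_result(host, service, return_code, output, timestamp=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line.

    return_code follows the plugin convention:
    0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    """
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2 or 3")
    if timestamp is None:
        timestamp = int(time.time())
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )


if __name__ == "__main__":
    line = passive_service_result(
        "PG938", "CPU_Total_Linux", 2, "CRITICAL - test passive result"
    )
    print(line, end="")
    # To actually submit it, the line would be written to the Nagios
    # command file. The path below is an assumption, not taken from this thread:
    # with open("/usr/local/nagios/var/rw/nagios.cmd", "w") as cmd_file:
    #     cmd_file.write(line)
```

The result would need to be submitted repeatedly until the attempt count reaches the service's maximum check attempts, at which point the state type becomes hard.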
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Duplicate definition found for service
Hi Benjamin Smith,
Thanks for the updates and advice. Earlier, I ran a test on one of the other hosts, HK1PNAG3, which is monitored through the same Nagios portal where PG938 is discovered. Below are the steps I performed:
1) An individual CPU service is already present for HK1PNAG3.
2) I purposely added HK1PNAG3 to the default CPU service as well, to see if any error appears.
3) As the CPU usage for HK1PNAG3 was 35% during the test, I lowered the thresholds to 30% for warning and 34% for critical.
4) Checked under Home -> Service Detail -> HK1PNAG3 -> CPU_Total_Linux.
5) 6 checks are defined at a 5-minute polling interval, so I would have needed to wait 30 minutes (6 checks * 5-minute polling interval) for the automatic alert. Instead, I clicked "Schedule a forced immediate check" until the check count reached 6.
6) Observed that up to check 5 the state was still SOFT. Once it reached the 6th check, it changed to a HARD state and I got the notification in Nagios.
As you said initially, even if duplicate rules are present for the same service, it should generate at least a single alert, and the test above proves this. However, I still have a concern regarding the incident that happened for PG938 on 19th Feb: why was there no notification when the client had already confirmed that CPU for PG938 had reached 100%? I will private message the test output to you. Thanks.
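The SOFT to HARD behaviour observed in step 6 can be sketched roughly as follows. This is a simplified illustration, not Nagios's actual implementation: the function name `consecutive_critical_states` is hypothetical, the value of 6 maximum check attempts is taken from the test above, and recoveries, flapping and notification intervals are ignored.

```python
def consecutive_critical_states(num_results, max_check_attempts=6):
    """Return (attempt, state_type, notify) for a run of consecutive
    non-OK check results, starting from an OK service.

    Simplified model: the state stays SOFT until the attempt counter
    reaches max_check_attempts, then becomes HARD, and the first
    notification goes out at that point.
    """
    states = []
    for result_number in range(1, num_results + 1):
        attempt = min(result_number, max_check_attempts)
        state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
        notify = result_number == max_check_attempts
        states.append((attempt, state_type, notify))
    return states


if __name__ == "__main__":
    for attempt, state_type, notify in consecutive_critical_states(6):
        suffix = " (notification sent)" if notify else ""
        print(f"check {attempt}/6: {state_type}{suffix}")
```

Under this model, a service that returns to OK before the final attempt never reaches a HARD state, which would be consistent with no notification being generated.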
-
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: Duplicate definition found for service
Hi @Rohan77,
"i still have concern regards to incident which happened for PG938 on 19th Feb , why there was no notification when the client had already confirmed that CPU for PG938 had reached to 100"
Thanks for sending that over. I'm not familiar with that check command, but it looks like your check command is working as expected. The issue here is that there is no direct comparison between iostat data, CPU load, and CPU usage.
Going forward, I would recommend making sure that the check command used for CPU monitoring is suitable for the customer's requirements. For example, our default Linux wizard uses check_load for CPU monitoring.
Load Checks
https://support.nagios.com/kb/article/l ... s-771.html
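To illustrate why these figures are not directly comparable, here is a minimal sketch with made-up numbers. The function names and sample values are hypothetical and are not taken from the PG938 data. The iostat idle percentage excludes time spent waiting on I/O, while the load average is a count of runnable or waiting processes rather than a percentage.

```python
def busy_percent(idle_pct, iowait_pct):
    """Percentage of CPU time that was neither idle nor waiting on I/O,
    derived from the %idle and %iowait columns of iostat."""
    return 100.0 - idle_pct - iowait_pct


def load_per_core(load_average, cpu_count):
    """Load average normalised by the number of CPUs.
    A value above 1.0 means more runnable work than cores."""
    return load_average / cpu_count


if __name__ == "__main__":
    # Hypothetical sample: the CPU is mostly waiting on disk, not computing.
    idle, iowait = 10.0, 60.0
    print("100 - idle  :", 100.0 - idle)                # looks like 90% "used"
    print("busy percent:", busy_percent(idle, iowait))  # only 30% executing
    # Hypothetical load average of 8.0 on a 4-core host.
    print("load / core :", load_per_core(8.0, 4))
```

A threshold written against one of these numbers will therefore not fire at the same moment as a threshold written against another.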
Re: Duplicate definition found for service
Hi Benjamin Smith,
"The issue is here that there is not a direct comparison between iostat data, CPU load CPU usage."
Can you please elaborate on the above statement? We have similar check command settings applied to other Nagios hosts, and CPU alerts are being generated for them.
-
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: Duplicate definition found for service
Hi @Rohan77,
If the check command is working and generating alerts on other systems, then there may be a server or mail setting issue on this system.
If the issue persists, please open a support ticket for faster resolution.
https://support.nagios.com/tickets/