Hi all,
I'm new to Nagios. In my server.cfg, I have a check_rabbitmq_queue service that monitors the queues of some services in RabbitMQ. Lately I have been getting false alerts from it: everything is reported as CRITICAL, although I have checked manually and everything is fine. My inbox is flooded with these alerts (around 5 an hour for a single service). Please let me know how to fix this, and how to change the alert interval so that I'm not flooded with alerts.
This was the service queue.cfg
define service {
    use                  generic-service
    host_name            server
    service_description  ServiceQueue
    check_command        check_rabbitmq_queue!15672!username!password!ServiceQueue!10,-1,-1,-1!15,-1,-1,-1
    contact_groups       nagiosadmin
}
This is the service queue.cfg now
define service {
    use                  generic-service
    host_name            server
    service_description  ServiceQueue
    check_command        check_rabbitmq_queue!15672!username!password!ServiceQueue!10,5,5,1!15,10,10,2
    contact_groups       nagiosadmin
}
Even with this change, the issue persists.
This is the alert I'm getting:
>***** Nagios *****
>
>Notification Type: PROBLEM
>
>Service: ServiceQueue
>Host: server
>Address: x.x.x.x
>State: CRITICAL
>
>Date/Time: Mon Aug 07 15:42:02 UTC 2017
>
>Additional Info:
>
>RABBITMQ_QUEUE CRITICAL - consumers CRITICAL (3), messages OK (0)
>messages_ready OK (0) messages_unacknowledged OK (0)
Thanks in advance for all the help and suggestions.
Getting wrong alerts for check_rabbitmq_queue in nagios
-
Siddharth Hegde
- Posts: 70
- Joined: Mon Aug 07, 2017 4:19 am
Re: Getting wrong alerts for check_rabbitmq_queue in nagios
I am going to assume you are using this plugin:
https://github.com/nagios-plugins-rabbi ... itmq_queue
No one here wrote that plugin, so the assistance you will receive might be limited. The maintainers of this plugin have a gitter chat which seems to be more appropriate for such questions:
https://gitter.im/nagios-plugins-rabbit ... s-rabbitmq
From the plugin's documentation/help:
Code:
=item -w | --warning
The warning levels for each count of messages, messages_ready,
messages_unacknowledged and consumers. This field consists of
one to four comma-separated thresholds. Specify -1 if no threshold
for a particular count.
=item -c | --critical
The critical levels for each count of messages, messages_ready,
messages_unacknowledged and consumers. This field consists of
one to four comma-separated thresholds. Specify -1 if no threshold
for a particular count.
So following your service's check_command:
Code:
check_rabbitmq_queue!15672!username!password!ServiceQueue!10,5,5,1!15,10,10,2
1 is passed as the warning threshold for consumers, and 2 is passed as the critical. You are receiving critical alerts because the number of consumers (3) is past the threshold configured in the command for critical states (2).
Former Nagios employee
https://www.mcapra.com/
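Editor's sketch of the threshold logic described above. It is not taken from the plugin's source. It assumes the plugin evaluates each threshold with the standard Nagios plugin range rules (a bare number N alerts when the value is below 0 or above N; "N:" alerts when the value is below N) and that -1 disables a check, as the quoted help text says. The helper names `parse_range`, `alerts` and `state` are hypothetical.

```python
import math


def parse_range(spec):
    """Parse a Nagios-style range string into (low, high, inside)."""
    inside = spec.startswith("@")  # "@" inverts the range: alert when inside it
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low_s, high_s = spec.split(":", 1)
    else:
        low_s, high_s = "0", spec  # a bare "N" means the range 0..N
    low = -math.inf if low_s == "~" else float(low_s or 0)
    high = math.inf if high_s == "" else float(high_s)
    return low, high, inside


def alerts(value, spec):
    """Return True when value violates the range given by spec."""
    low, high, inside = parse_range(spec)
    in_range = low <= value <= high
    return in_range if inside else not in_range


def state(value, warning, critical):
    """Map a value to OK/WARNING/CRITICAL; "-1" means no threshold (assumption)."""
    if critical != "-1" and alerts(value, critical):
        return "CRITICAL"
    if warning != "-1" and alerts(value, warning):
        return "WARNING"
    return "OK"


# Values from the alert in this thread: 3 consumers, thresholds 1 (warning) and 2 (critical).
print(state(3, "1", "2"))    # CRITICAL: 3 is above the upper bound 2
print(state(0, "10", "15"))  # OK: 0 messages is inside 0..10
print(state(3, "1:", "1:"))  # OK: "1:" only alerts when fewer than 1 consumer
```

Under these assumptions, a bare number for consumers alerts when there are too many consumers, which is rarely the intent. If the plugin accepts range syntax in its comma-separated lists (worth checking against its documentation or with its maintainers), a value such as "1:" would alert only when no consumer is attached; otherwise -1 in the consumers position disables that check.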
-
dwhitfield
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
Re: Getting wrong alerts for check_rabbitmq_queue in nagios
mcapra wrote:
> The maintainers of this plugin have a gitter chat which seems to be more appropriate for such questions:
> https://gitter.im/nagios-plugins-rabbit ... s-rabbitmq
This is true as far as the output is concerned, but we can certainly help with the flood of emails. The easiest thing to do would be to acknowledge the problem. You can do that from the Service Status Detail page.
You could also change the notification interval in the CCM.
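For a setup that edits .cfg files directly rather than using the CCM, the notification interval mentioned above corresponds to the notification_interval directive on the service. A minimal sketch based on the service definition posted earlier in this thread follows; the three added values are illustrative assumptions, not recommendations, and the generic-service template may already set them.

```
define service {
    use                     generic-service
    host_name               server
    service_description     ServiceQueue
    check_command           check_rabbitmq_queue!15672!username!password!ServiceQueue!10,5,5,1!15,10,10,2
    contact_groups          nagiosadmin
    max_check_attempts      3     ; illustrative: notify only after 3 consecutive non-OK results
    retry_interval          1     ; illustrative: recheck every 1 time unit while in a soft state
    notification_interval   120   ; illustrative: re-notify at most every 120 time units
}
```

With the default interval_length of 60 seconds, one time unit is one minute, so this would send at most one repeat notification every two hours while the problem lasts. Setting notification_interval to 0 sends a single notification per problem. Nagios has to reload its configuration before the change takes effect.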
-
Siddharth Hegde
Re: Getting wrong alerts for check_rabbitmq_queue in nagios
mcapra wrote:
> I am going to assume you are using this plugin:
> https://github.com/nagios-plugins-rabbi ... itmq_queue
> No one here wrote that plugin, so the assistance you will receive might be limited. The maintainers of this plugin have a gitter chat which seems to be more appropriate for such questions:
> https://gitter.im/nagios-plugins-rabbit ... s-rabbitmq
> From the plugin's documentation/help:
> Code:
> =item -w | --warning
> The warning levels for each count of messages, messages_ready,
> messages_unacknowledged and consumers. This field consists of
> one to four comma-separated thresholds. Specify -1 if no threshold
> for a particular count.
> =item -c | --critical
> The critical levels for each count of messages, messages_ready,
> messages_unacknowledged and consumers. This field consists of
> one to four comma-separated thresholds. Specify -1 if no threshold
> for a particular count.
> So following your service's check_command:
> Code:
> check_rabbitmq_queue!15672!username!password!ServiceQueue!10,5,5,1!15,10,10,2
> 1 is passed as the warning threshold for consumers, and 2 is passed as the critical. You are receiving critical alerts because the number of consumers (3) is past the threshold configured in the command for critical states (2).
Thanks for the suggestions, and apologies for the late reply. I will follow this advice, increase the values, and monitor for some time. I will reply back tomorrow.
-
dwhitfield
Re: Getting wrong alerts for check_rabbitmq_queue in nagios
Siddharth Hegde wrote:
> apologies for the late reply.
No problem at all. We're here every weekday. Whenever you get around to it, we'll get you an answer!