check_bpi & BPI component
Posted: Mon Feb 08, 2016 9:24 am
by smoren
Hello,
It seems there is a bug in health calculation in check_bpi.
Imagine two services: one (svc1) is in an OK state, the other (svc2) is in a CRITICAL state, and both are members of a BPI group. If svc2 is NOT an essential member, the health of the group is 50%, which is correct and expected. However, if I mark svc2 as an essential member, the health is 100%. In this case, I'd expect the health to be either 50% or 0% (a very important component (essential member) of the service (BPI group) is CRITICAL).
As it stands, health graphs are usable only for BPI groups with no essential members, and I cannot show my managers a health of 100% when the service was actually down.
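To make the expected numbers concrete, here is a minimal sketch of the calculation described above. It is not the check_bpi implementation: the `Member` class, the `group_health` function and the `essential_zeroes_health` switch are hypothetical names, and treating an empty group as 100% healthy is an assumption.

```python
from dataclasses import dataclass

OK = "OK"


@dataclass
class Member:
    name: str
    state: str               # e.g. "OK", "WARNING", "CRITICAL"
    essential: bool = False  # hypothetical flag for an essential member


def group_health(members, essential_zeroes_health=False):
    """Return the percentage of group members that are in the OK state.

    If essential_zeroes_health is True, any essential member in a
    problem state forces the health to 0.0 (the stricter of the two
    outcomes suggested in the post).
    """
    if not members:
        return 100.0  # assumption: an empty group counts as healthy
    if essential_zeroes_health and any(
        m.essential and m.state != OK for m in members
    ):
        return 0.0
    ok_count = sum(1 for m in members if m.state == OK)
    return 100.0 * ok_count / len(members)


group = [Member("svc1", "OK"), Member("svc2", "CRITICAL", essential=True)]
print(group_health(group))                                # 50.0
print(group_health(group, essential_zeroes_health=True))  # 0.0
```

Either result would match the expectation in the post; neither variant ever reports 100% while svc2 is CRITICAL.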
In addition, here are some suggestions to improve BPI component:
- Add filter box for Available Hosts, Services and BPI Groups list. If this list contains thousands of items, it is hard to find the one I need.
- Add checkboxes to items in Authorized Users list. Selecting just rows gives room for one wrong click, and all authorized users will be no longer authorized... Alternatively, create two lists – All Users and Authorized Users + buttons to move users between these lists. The latter one will be even better – you will see all authorized users without need to scroll list of all users(especially useful when you have many users).
What's your thought on this?
Thanks.
Re: check_bpi & BPI component
Posted: Mon Feb 08, 2016 12:39 pm
by lmiltchev
However, if I mark svc2 as an essential member, the health is 100%. In this case, I'd expect the health to be either 50% or 0% (a very important component (essential member) of the service (BPI group) is CRITICAL).
This is by design. The "state" is still critical. In the first case, the health of the BPI group is determined, based on the threshold, in the second case - based on the state of an essential member. It is a bit confusing and I do believe we need to change:
to:
in these cases. I will discuss the issue with our developers, and will get back to you.
In addition, here are some suggestions to improve BPI component:
Add filter box for Available Hosts, Services and BPI Groups list. If this list contains thousands of items, it is hard to find the one I need.
Add checkboxes to items in Authorized Users list. Selecting just rows gives room for one wrong click, and all authorized users will be no longer authorized... Alternatively, create two lists – All Users and Authorized Users + buttons to move users between these lists. The latter one will be even better – you will see all authorized users without need to scroll list of all users(especially useful when you have many users).
These are good ideas. I filed an internal feature request ( TASK ID 7687) for adding this functionality. Thanks!
Re: check_bpi & BPI component
Posted: Mon Feb 08, 2016 1:16 pm
by smoren
Determining the group state using thresholds and essential members is clear to me, and it works fine. The only issue is with the health percentage. The current values make no sense to me (when essential members are involved).
Speaking of BPI - are there any recommendations for maximal number of BPI groups?
You're welcome for ideas...

Re: check_bpi & BPI component
Posted: Mon Feb 08, 2016 2:44 pm
by lmiltchev
Speaking of BPI - are there any recommendations for maximal number of BPI groups?
Not that I know of. Why are you asking? Are you having some issues when/after adding a large number of BPI groups?
Re: check_bpi & BPI component
Posted: Tue Feb 09, 2016 11:20 am
by smoren
I have no issues (for now). But we plan to add many new groups, so I wanted to know if there are any known issues that may arise.
Today, I received a request to show all groups to a specific user, but I wasn't able to find a way to do it. Is it even possible?
I know 3 options to see BPI groups:
- to be an admin
- to show all users all groups
- to show specific groups to specific users.
Am I missing something?
And here's another idea: how about giving a specific user the right to see all groups, including all used hosts/services/subgroups (with correct states), no matter whether that user is authorized for these hosts and services? This might be useful for service level managers, so they can check whether services (BPI groups) are configured as agreed.
You could broaden this idea in relation to my other thread: create a new role, Service Level Manager (in addition to User and Admin), and give it default permissions: show all BPI groups, show all (or these: ... ) dummy hosts used in the check_bpi wizard...
I know a lot of issues would have to be solved... but these are just my two cents to enhance Nagios XI

Re: check_bpi & BPI component
Posted: Tue Feb 09, 2016 11:38 am
by lmiltchev
Have you tried setting up a read-only user (Admin-Manage Users), who can see all hosts and services? What happens when you log in BPI as this read-only user? Can he/she see all of the groups?
Re: check_bpi & BPI component
Posted: Tue Feb 09, 2016 12:52 pm
by smoren
I have.. but I forgot about this option

It works, thanks.
I have a few more questions about check_bpi (it is a very important component for us, so I'd like to fine-tune it as much as possible):
- sometimes it returns "OK - Ok : Group health is 100.00% with 0 problem(s)" - notice 2 OK strings. Is there any reason for this? Usually there is only one OK...
- is it possible to update string pattern it uses for status information? For example I'd like information like this:
- OK - Service is available (health is 100%)
- WARNING - Service is slightly degraded (health is 50%) - host proxy1 is DOWN
- CRITICAL - Service is not available (health is 0%) - host proxy1 is DOWN,host proxy2 is DOWN
Warning and Critical states would always contain a list of the group items that are in a problem state (currently this happens only for essential members). This would give our Helpdesk, admins and managers much more useful information: their reports would contain not only the downtime periods but also the reason for the downtime (or degraded service)...
For a long list of hosts/services in problem states for one service (BPI group), use a shorter pattern, e.g. 3 out of 5 hosts are down, 12 out of 25 services are critical,...
What do you think about it?
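The requested output could be sketched roughly as follows. This only illustrates the proposal, not anything check_bpi does today: the `status_line` function, its `max_listed` cutoff and the generic wording of the shortened summary are all assumptions.

```python
# Hypothetical summaries, taken from the example strings in the post.
SUMMARIES = {
    "OK": "Service is available",
    "WARNING": "Service is slightly degraded",
    "CRITICAL": "Service is not available",
}


def status_line(state, health, problems, total, max_listed=3):
    """Build a status string that names the members in a problem state.

    problems: descriptions such as "host proxy1 is DOWN"
    total: number of members in the group
    max_listed: above this many problems, print a short count instead
    """
    line = f"{state} - {SUMMARIES[state]} (health is {health:.0f}%)"
    if not problems:
        return line
    if len(problems) <= max_listed:
        detail = ",".join(problems)
    else:
        detail = f"{len(problems)} out of {total} members are in a problem state"
    return f"{line} - {detail}"


print(status_line("OK", 100, [], 2))
print(status_line("WARNING", 50, ["host proxy1 is DOWN"], 2))
print(status_line("CRITICAL", 0,
                  ["host proxy1 is DOWN", "host proxy2 is DOWN"], 2))
```

The three calls reproduce the three example strings above; a group with more than `max_listed` problems gets the shortened count instead of the full list.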
I've got another enhancement idea: if a user is authorized for all services/hosts/groups in a specific BPI group, automatically show him this group.
Re: check_bpi & BPI component
Posted: Tue Feb 09, 2016 6:21 pm
by rkennedy
- sometimes it returns "OK - Ok : Group health is 100.00% with 0 problem(s)" - notice 2 OK strings. Is there any reason for this? Usually there is only one OK...
How many members do you have as part of the group? Can you post a screenshot as an example?
- is it possible to update string pattern it uses for status information? For example I'd like information like this:
OK - Service is available (health is 100%)
WARNING - Service is slightly degraded (health is 50%) - host proxy1 is DOWN
CRITICAL - Service is not available (health is 0%) - host proxy1 is DOWN,host proxy2 is DOWN
Yes, take a look at this
https://assets.nagios.com/downloads/nag ... BPI_v2.pdf (scroll down to Understanding the BPI Group Logic) - you will want to set your warning / critical accordingly.
Warning and Critical states would always contain list of group items that are in problem state. (currently it is only for essential members). This might provide much useful information for our Helpdesk, Admins and managers - they will have report containing not only downtime times, but also reason of downtime (or degraded service)...
For long list of host/services in problem states for one service (BPI group) - use shorter pattern - e.g. 3 out of 5 hosts are down, 12 out of 25 services are critical,...
I can file a feature request for this, if you'd like?
I've got another enhancing idea: If user is authorized for all services/hosts/groups for specific BPI group, automatically show him this group.
When you say automatically show him this group, what are you referring to? Where?
Re: check_bpi & BPI component
Posted: Thu Feb 11, 2016 9:12 am
by smoren
How many members do you have as part of the group? Can you post a screenshot as an example?
The number of members in a group usually ranges from 1 member (e.g. only one host) to a few services. But these two OK strings appear only for a few seconds, and then the output goes back to one OK string. I have checked the state history for the service related to the group (using check_bpi) and also for the group members: no state change for either (not even a soft one). This happens not in one group but in many, and always only for a short period of time.
If I may suggest, try asking the developers under what conditions check_bpi can generate a string like
OK - Ok : Group health....
I understand the BPI Group Logic. But I wanted to ask whether it is possible to customize the string patterns to the strings I wrote, e.g. change the string from "OK - Group health is 100.00% with 0 problem(s)" to "OK - Service is available (health is 100%)". This is probably a feature request. My idea is to have a configuration file for check_bpi with content like:
OK_STRING="OK - Service is available (health is $group_health$%)"
WARNING_STRING="WARNING - Service is slightly degraded (health is $group_health$%) - $list1$"
$list1$="$membertype$ $membername$ is $memberstate$"
What are your thoughts on this?
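A rough sketch of how such `$name$` patterns could be expanded is shown below. check_bpi has no such configuration file today (this is the feature being requested), so the `expand` and `render_warning` helpers and the `LIST1_ITEM` name are hypothetical; only the pattern strings come from the proposal above.

```python
import re

# Matches macros written as $name$.
MACRO = re.compile(r"\$(\w+)\$")


def expand(template, values):
    """Replace each $name$ macro with values[name].

    Macros without a value are left in the text unchanged.
    """
    return MACRO.sub(
        lambda m: str(values.get(m.group(1), m.group(0))), template
    )


OK_STRING = "OK - Service is available (health is $group_health$%)"
WARNING_STRING = (
    "WARNING - Service is slightly degraded "
    "(health is $group_health$%) - $list1$"
)
LIST1_ITEM = "$membertype$ $membername$ is $memberstate$"


def render_warning(health, problem_members):
    """Expand the per-member pattern first, then the warning pattern."""
    items = [expand(LIST1_ITEM, member) for member in problem_members]
    return expand(
        WARNING_STRING,
        {"group_health": health, "list1": ",".join(items)},
    )


print(expand(OK_STRING, {"group_health": 100}))
print(render_warning(50, [
    {"membertype": "host", "membername": "proxy1", "memberstate": "DOWN"},
]))
```

Leaving unknown macros untouched is a design choice in this sketch: a typo in the configuration then shows up verbatim in the status text instead of silently disappearing.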
I can file a feature request for this, if you'd like?
Please create a feature request - this may be related to previous paragraph...
When you say automatically show him this group, what are you referring to? Where?
In the BPI component. Currently there are two ways for users to see BPI groups: either they are given permission to see all groups, or they are authorized for specific group(s). This would be a 3rd option. It might significantly decrease the time needed to configure authorized users for groups, since in many situations no configuration would be needed.
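The rule behind this third option amounts to a subset check. The sketch below assumes that group membership and a user's authorized hosts/services are available as plain sets of names, which is not how Nagios XI necessarily stores them; `visible_groups` is a hypothetical helper.

```python
def visible_groups(user_objects, groups):
    """Return the names of groups whose members are all authorized.

    user_objects: names of hosts/services/groups the user may see
    groups: mapping of BPI group name -> set of member names
    Empty groups are not shown automatically (an assumption).
    """
    authorized = set(user_objects)
    return [
        name
        for name, members in groups.items()
        if members and set(members) <= authorized
    ]


groups = {
    "web": {"proxy1", "proxy2"},
    "db": {"db1"},
}
print(visible_groups({"proxy1", "proxy2"}, groups))  # ['web']
```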
I will discuss the issue with our developers, and will get back to you.
Did you have a chance to discuss this with developers? (health=100% when essential member is in problem state)
Sorry for such a long post..

Re: check_bpi & BPI component
Posted: Thu Feb 11, 2016 4:33 pm
by lmiltchev
I was able to recreate the issue with the "duplicated" status (OK - Ok: xxx, CRITICAL - Critical: xxx, etc.) in the output, and filed an internal bug report (TASK ID 7738).
Also, I posted a feature request about adding the ability in BPI to customize the messages (TASK ID 7739). Thank you!