XI: Challenge with "disappearing Service definitions"

Posted: Wed Dec 04, 2019 3:04 pm
by inversecow
Ahoy folks,

FYI that my users are reporting an issue when attempting to leverage Service definitions & Service Templates via XI.

Generally speaking they are following the process discussed in this thread (https://support.nagios.com/forum/viewto ... =6&t=56528).
I have also detailed my steps to reproduce below.
NOTE: Any fields not detailed are either blank or go to default values.

0.) Service Template is created & applied (via CCM)

Code: Select all

# NAGIOS XI: Service Template is created & applied (via CCM)

## CCM: Create Service Template

XI: Configure > Core Config Manager

Templates > Service Templates

Add New

### Common Settings

Template Name: TEAM_CPU_Usage

Description: TEAM_CPU_Usage

Active: CHECKED

### Check Settings

Max check attempts: 1

Check period: 24x7

### Alert Settings

Manage Contacts:  registered contact assigned

Notification period:  24x7

Notification options: Warning, Critical, Recovery

Notification enabled: Skip

### Misc Settings

Blank.

## CCM: Apply Configuration

Click the "Apply Configuration" button at bottom of *Service Templates* list.
Observe and check for positive result.

1.) Service definition is created & applied (via CCM)

NOTE: This is the step where the problem occurs!

Code: Select all

# NAGIOS XI: Service definition is created & applied (via CCM)

## CCM: Create Service definition

XI: Configure > Core Config Manager

Monitoring > Services

Add New

### Common Settings

Template Name: TEAM_CPU_Usage

Description: TEAM_CPU_Usage

**NOTE: This must align with the defined rule name in the host-side *nrdp.cfg* file.**

Manage Templates: Associated with "TEAM_CPU_Usage" Service Template.

Manage Host Groups: Associated with pre-existing host group(s) which user has permissions for.

Active: CHECKED

Check command:  check_dummy

$ARG1$: 0

$ARG2$: "No data received yet."

### Check Settings

Initial state:  Ok

Max check attempts: 1

Active checks enabled:  Off

Passive checks enabled: Skip

### Alert Settings

Notification enabled: Skip

### Misc Settings

Blank.

## CCM: Apply Configuration

Click the "Apply Configuration" button at bottom of *Services* list.
Observe and check for positive result.

NOTE: When I click "Apply Configuration", my new Service definition "disappears" from view and cannot be found again.
The same happens if I "click away" (EG: go to another section of CCM and then return).

A user with ADMIN permissions can still see this Service definition, however.
USERs whose permissions include "Can see all hosts and services" can also see the new Service definition, but this is *undesirable* in our multi-tenant environment (we do not want everyone to see everything).

The problem is that the USERs are thus unable to modify their own Service definitions (EG: in a self-service capacity).
If we change the "Manage Host Groups" option at the Service definition step to use the wild-card ("*") instead of targeted host group(s), this works (but then *every USER* sees it, regardless of tenant / security permissions).

If you would, please consider and advise how we might resolve this visibility issue?

Thank you for your time!

EDIT: Adding supplemental details below this line!

# ENV

Code: Select all

OS: RHEL 7.x (VMWare)
Nagios Core:  4.4.3
Nagios XI:  5.6.6

# NCPA: NRDP rule

Code: Select all

%HOSTNAME%|TEAM_CPU_Usage|300 = cpu/percent --warning 95 --critical 98 --aggregate avg
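
Because the Service definition's Description must match the rule name, the name can be pulled out of the rule line for a quick comparison. This is only a minimal sketch, assuming the rule key follows the `host|service|interval = check` layout shown above. The function name `nrdp_rule_service` is hypothetical and not part of NCPA or XI.

```shell
#!/bin/sh
# Hypothetical helper: print the service name (second "|"-separated field)
# of a passive-check rule line such as the one above.
nrdp_rule_service() {
    # Drop everything from the "=" onward, then take the second field of the key.
    printf '%s\n' "$1" | sed 's/ *=.*$//' | cut -d'|' -f2
}

rule='%HOSTNAME%|TEAM_CPU_Usage|300 = cpu/percent --warning 95 --critical 98 --aggregate avg'
nrdp_rule_service "$rule"
```

The printed name is what the Description field of the Service definition in CCM would need to match.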

# XI: USER level permissions

Code: Select all

# XI: USER level permissions

## General Settings

Username: LDAP username

Email User New Password:  unchecked

Name: Derived from LDAP entry.

Email Address:  Derived from LDAP entry.

Phone Number: blank

Enable Notifications: checked

Account Enabled:  checked

## Preferences

Languages:  English (English)

Date Format:  YYYY-MM-DD HH:MM:SS

Number Format:  1,000.00

Week Format:  Sunday - Saturday

## Authentication Settings

Auth Type:  Active Directory

AD Server:  AD_server_IP,AD_server_IP,AD_server_IP

AD Username:  Derived from LDAP entry.

Allow local login if auth server login fails: unchecked

## Security Settings

Authorization Level:  User

Can see all hosts and services: unchecked

Can control all hosts and services: checked

Can configure hosts and services: checked

Can access advanced features: checked

Can access monitoring engine: unchecked

Read-only access: unchecked

API access: checked

Core Config Manager access: Limited

### Limited Access CCM Permissions

#### Group Permissions

Host Groups:  unchecked

Service Groups: checked

#### Alerting Permissions

Contacts: checked

Contact Groups: unchecked

Time Periods: checked

Host Escalations: checked

Service Escalations:  checked

#### Template Permissions

Host Templates: checked

Service Templates: checked

Contact Templates: unchecked

#### Command Permissions

Commands: checked

#### Advanced Permissions

Host Dependencies:  checked

Service Dependencies: checked

#### Tool Permissions

Static Config Editor: unchecked

User Macros:  unchecked

Import Config Files:  unchecked

Config File Management: unchecked

Re: XI: Challenge with "disappearing Service definitions"

Posted: Wed Dec 04, 2019 5:55 pm
by ssax
On an apply configuration here's what happens:
- Writes out DB changes to disk
- Restarts nagios service

Once the nagios service restarts, it can take up to 3 minutes from an apply configuration for NDO2DB to write the information to the database and be considered ready for the CCM permissions building process to start. It then loops through each user and builds the permissions. How many users do you have? How many total hosts/services do you have? Both have an impact.

How long does it take (time it) from apply config until it's finally viewable by the user? What about for an admin? (time it)

Please send me a copy of your profile, you can download it from Admin > System Profile > Download Profile.

Additionally, please send the output of these commands (as root):
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the commands if your DB is offloaded to another server and/or you've changed the root mysql password

Code: Select all

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table

Code: Select all

echo "SELECT count(*) FROM nagios_objects;" | mysql -h 127.0.0.1 -uroot -pnagiosxi nagios --table
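
If the DB is offloaded, the connection details in the two commands above can be kept in variables instead of being edited into each command. This is only a sketch: the variable names `DB_HOST`, `DB_USER`, `DB_PASS` and the function `xi_query` are hypothetical, and the defaults simply mirror the values used above.

```shell
#!/bin/sh
# Hypothetical wrapper around the mysql client.
# Defaults mirror the commands above; override them in the environment
# if the DB lives on another server or the root password was changed.
DB_HOST="${DB_HOST:-127.0.0.1}"
DB_USER="${DB_USER:-root}"
DB_PASS="${DB_PASS:-nagiosxi}"

# Usage: xi_query <database> <sql>
# Pipes the SQL into mysql and prints the result as a table.
xi_query() {
    echo "$2" | mysql -h "$DB_HOST" -u"$DB_USER" -p"$DB_PASS" "$1" --table
}
```

With this in place, the second command would become `xi_query nagios "SELECT count(*) FROM nagios_objects;"`, and the first could be run against the `information_schema` database.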

What version of XI are you running? There were some recent changes that should speed up some of that in the latest versions.

Attach your /etc/php.ini as well and we will see if we can increase some of the limits for a little better performance.

Re: XI: Challenge with "disappearing Service definitions"

Posted: Fri Dec 06, 2019 10:57 am
by inversecow
Hello again,

Thanks for the eyes on this!
Understood regarding the "back-end process" and thank you for that insight.
To be clear, the "status displays" (EG: Service Status Summary, under the HOME section) expose the "last known results" for the proper nodes that have matching NRDP rules to feed them (with my USER view, as well as with my ADMIN user).
This appears to be a visibility challenge within the CCM sub-section itself, under the CCM: Monitoring > Services section (where it is not visible for a LIMITED user, but is for an ADMIN user).
Our ORG wishes to enable USER-level accounts to work on these, without making them ADMIN users.
I have detailed our present USER (LIMITED) permissions config at the bottom of my initial post, if this is relevant.

At present we have the following counts (going by the view in CCM Object Summary).
These numbers come from what is to be our PROD environment instance, where at present the users are performing testing ahead of official cut-over from our legacy product.
Thus, I would expect these numbers to go up in some areas (EG: hosts count).

Code: Select all

Hosts: 145
Host Groups:  20
Services: 30
Service Groups: 0
Contacts: 76
Contact Groups: 16

For context, we have positioned our strategy as follows:

Active checks:
UP/DOWN (check_ping) active checks (via a host template), will grow as more nodes are added (several thousand once fully cut-over to this application)
We are likely to add a select few other Active checks, like a handful of URL monitors.

Passive checks:
Everything else.
NCPA agent layered on monitored nodes, NRDP rules used to "call home" when there is an issue to alert upon.

We are really pushing folks to stay away from Active checks (and the wizards) to minimize load on XI, and delegate that to the nodes (via NCPA & passive checks).

"How long does it take (time it) from apply config until it's finally viewable by the user?"
This is the core issue from my users' perspective: it *never shows up* (the delay appears to be "infinite" for non-ADMIN users).
For example, one of my users will create an entry and days have gone by without it becoming visible to them.
In my reproduction attempts this past morning, my test user (no ADMIN perms) created a Service definition (CCM: Monitoring > Services > Add New), which then disappeared from view at the commit screen (either after you commit, or if you go to another page and then try to Apply Config).
Users with the ADMIN permission continue to see the new service definition without challenge (no interruption of visibility).

It feels like some sort of failure to complete the "CCM permissions building process", where perhaps the object visibility permissions are somehow not being properly generated/applied?

"What about for an admin?"
It appears almost instantly for an ADMIN-level user.
Specific timing: ~30 seconds from clicking "apply config" to the successful commit screen, then it is ready within another 30 seconds for ADMIN user visibility (hand timing).

I have the Profile to submit to you.
Where may I put it for you, without uploading to the forum thread?

As requested, here is the output (dB off-box):

Code: Select all

MariaDB [(none)]> SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');

+--------------------------------------------+------------+
| Table                                      | Size in MB |
+--------------------------------------------+------------+
| nagios_acknowledgements                    |       0.00 |
| nagios_commands                            |       0.02 |
| nagios_commenthistory                      |       0.73 |
| nagios_comments                            |       0.00 |
| nagios_configfiles                         |       0.00 |
| nagios_configfilevariables                 |       0.01 |
| nagios_conninfo                            |       0.10 |
| nagios_contact_addresses                   |       0.00 |
| nagios_contact_notificationcommands        |       0.04 |
| nagios_contactgroup_members                |       0.01 |
| nagios_contactgroups                       |       0.00 |
| nagios_contactnotificationmethods          |       2.00 |
| nagios_contactnotifications                |       2.44 |
| nagios_contacts                            |       0.01 |
| nagios_contactstatus                       |       0.01 |
| nagios_customvariables                     |       0.02 |
| nagios_customvariablestatus                |       0.02 |
| nagios_dbversion                           |       0.00 |
| nagios_downtimehistory                     |       0.01 |
| nagios_eventhandlers                       |       0.00 |
| nagios_externalcommands                    |       0.00 |
| nagios_flappinghistory                     |       0.01 |
| nagios_host_contactgroups                  |       0.01 |
| nagios_host_contacts                       |       0.01 |
| nagios_host_parenthosts                    |       0.00 |
| nagios_hostchecks                          |       0.00 |
| nagios_hostdependencies                    |       0.00 |
| nagios_hostescalation_contactgroups        |       0.00 |
| nagios_hostescalation_contacts             |       0.00 |
| nagios_hostescalations                     |       0.00 |
| nagios_hostgroup_members                   |       0.02 |
| nagios_hostgroups                          |       0.00 |
| nagios_hosts                               |       0.04 |
| nagios_hoststatus                          |       0.10 |
| nagios_instances                           |       0.00 |
| nagios_logentries                          |     154.19 |
| nagios_notifications                       |       3.00 |
| nagios_objects                             |       0.27 |
| nagios_processevents                       |       0.24 |
| nagios_programstatus                       |       0.00 |
| nagios_runtimevariables                    |       0.00 |
| nagios_scheduleddowntime                   |       0.00 |
| nagios_service_contactgroups               |       0.01 |
| nagios_service_contacts                    |       0.05 |
| nagios_service_parentservices              |       0.00 |
| nagios_servicechecks                       |       0.00 |
| nagios_servicedependencies                 |       0.00 |
| nagios_serviceescalation_contactgroups     |       0.00 |
| nagios_serviceescalation_contacts          |       0.00 |
| nagios_serviceescalations                  |       0.00 |
| nagios_servicegroup_members                |       0.00 |
| nagios_servicegroups                       |       0.00 |
| nagios_services                            |       0.22 |
| nagios_servicestatus                       |       0.95 |
| nagios_statehistory                        |       1.87 |
| nagios_systemcommands                      |       0.03 |
| nagios_timedeventqueue                     |       0.00 |
| nagios_timedevents                         |       0.00 |
| nagios_timeperiod_timeranges               |       0.03 |
| nagios_timeperiods                         |       0.01 |
| tbl_command                                |       0.06 |
| tbl_contact                                |       0.03 |
| tbl_contactgroup                           |       0.03 |
| tbl_contacttemplate                        |       0.03 |
| tbl_domain                                 |       0.03 |
| tbl_host                                   |       0.06 |
| tbl_hostdependency                         |       0.03 |
| tbl_hostescalation                         |       0.03 |
| tbl_hostextinfo                            |       0.03 |
| tbl_hostgroup                              |       0.03 |
| tbl_hosttemplate                           |       0.03 |
| tbl_info                                   |       0.17 |
| tbl_lnkContactToCommandHost                |       0.02 |
| tbl_lnkContactToCommandService             |       0.02 |
| tbl_lnkContactToContactgroup               |       0.02 |
| tbl_lnkContactToContacttemplate            |       0.02 |
| tbl_lnkContactToVariabledefinition         |       0.02 |
| tbl_lnkContactgroupToContact               |       0.02 |
| tbl_lnkContactgroupToContactgroup          |       0.02 |
| tbl_lnkContacttemplateToCommandHost        |       0.02 |
| tbl_lnkContacttemplateToCommandService     |       0.02 |
| tbl_lnkContacttemplateToContactgroup       |       0.02 |
| tbl_lnkContacttemplateToContacttemplate    |       0.02 |
| tbl_lnkContacttemplateToVariabledefinition |       0.02 |
| tbl_lnkHostToContact                       |       0.02 |
| tbl_lnkHostToContactgroup                  |       0.02 |
| tbl_lnkHostToHost                          |       0.02 |
| tbl_lnkHostToHostgroup                     |       0.02 |
| tbl_lnkHostToHosttemplate                  |       0.02 |
| tbl_lnkHostToVariabledefinition            |       0.02 |
| tbl_lnkHostdependencyToHost_DH             |       0.02 |
| tbl_lnkHostdependencyToHost_H              |       0.02 |
| tbl_lnkHostdependencyToHostgroup_DH        |       0.02 |
| tbl_lnkHostdependencyToHostgroup_H         |       0.02 |
| tbl_lnkHostescalationToContact             |       0.02 |
| tbl_lnkHostescalationToContactgroup        |       0.02 |
| tbl_lnkHostescalationToHost                |       0.02 |
| tbl_lnkHostescalationToHostgroup           |       0.02 |
| tbl_lnkHostgroupToHost                     |       0.02 |
| tbl_lnkHostgroupToHostgroup                |       0.02 |
| tbl_lnkHosttemplateToContact               |       0.02 |
| tbl_lnkHosttemplateToContactgroup          |       0.02 |
| tbl_lnkHosttemplateToHost                  |       0.02 |
| tbl_lnkHosttemplateToHostgroup             |       0.02 |
| tbl_lnkHosttemplateToHosttemplate          |       0.02 |
| tbl_lnkHosttemplateToVariabledefinition    |       0.02 |
| tbl_lnkServiceToContact                    |       0.02 |
| tbl_lnkServiceToContactgroup               |       0.02 |
| tbl_lnkServiceToHost                       |       0.02 |
| tbl_lnkServiceToHostgroup                  |       0.02 |
| tbl_lnkServiceToServicegroup               |       0.02 |
| tbl_lnkServiceToServicetemplate            |       0.02 |
| tbl_lnkServiceToVariabledefinition         |       0.02 |
| tbl_lnkServicedependencyToHost_DH          |       0.02 |
| tbl_lnkServicedependencyToHost_H           |       0.02 |
| tbl_lnkServicedependencyToHostgroup_DH     |       0.02 |
| tbl_lnkServicedependencyToHostgroup_H      |       0.02 |
| tbl_lnkServicedependencyToService_DS       |       0.02 |
| tbl_lnkServicedependencyToService_S        |       0.02 |
| tbl_lnkServicedependencyToServicegroup_DS  |       0.02 |
| tbl_lnkServicedependencyToServicegroup_S   |       0.02 |
| tbl_lnkServiceescalationToContact          |       0.02 |
| tbl_lnkServiceescalationToContactgroup     |       0.02 |
| tbl_lnkServiceescalationToHost             |       0.02 |
| tbl_lnkServiceescalationToHostgroup        |       0.02 |
| tbl_lnkServiceescalationToService          |       0.02 |
| tbl_lnkServiceescalationToServicegroup     |       0.02 |
| tbl_lnkServicegroupToService               |       0.02 |
| tbl_lnkServicegroupToServicegroup          |       0.02 |
| tbl_lnkServicetemplateToContact            |       0.02 |
| tbl_lnkServicetemplateToContactgroup       |       0.02 |
| tbl_lnkServicetemplateToHost               |       0.02 |
| tbl_lnkServicetemplateToHostgroup          |       0.02 |
| tbl_lnkServicetemplateToServicegroup       |       0.02 |
| tbl_lnkServicetemplateToServicetemplate    |       0.02 |
| tbl_lnkServicetemplateToVariabledefinition |       0.02 |
| tbl_lnkTimeperiodToTimeperiod              |       0.02 |
| tbl_logbook                                |       0.02 |
| tbl_mainmenu                               |       0.02 |
| tbl_permission                             |       0.06 |
| tbl_permission_inactive                    |       0.02 |
| tbl_service                                |       0.02 |
| tbl_servicedependency                      |       0.03 |
| tbl_serviceescalation                      |       0.03 |
| tbl_serviceextinfo                         |       0.03 |
| tbl_servicegroup                           |       0.03 |
| tbl_servicetemplate                        |       0.03 |
| tbl_session                                |       0.02 |
| tbl_session_locks                          |       0.02 |
| tbl_settings                               |       0.03 |
| tbl_submenu                                |       0.02 |
| tbl_timedefinition                         |       0.05 |
| tbl_timeperiod                             |       0.03 |
| tbl_user                                   |       0.03 |
| tbl_variabledefinition                     |       0.02 |
| xi_auditlog                                |       2.17 |
| xi_auth_tokens                             |       1.58 |
| xi_cmp_trapdata                            |       0.03 |
| xi_cmp_trapdata_log                        |       0.03 |
| xi_commands                                |       0.02 |
| xi_eventqueue                              |       0.03 |
| xi_events                                  |       0.09 |
| xi_meta                                    |       1.52 |
| xi_mibs                                    |       0.05 |
| xi_options                                 |       0.06 |
| xi_sessions                                |       0.03 |
| xi_sysstat                                 |       0.03 |
| xi_usermeta                                |       0.63 |
| xi_users                                   |       0.06 |
+--------------------------------------------+------------+
169 rows in set (0.01 sec)

Code: Select all

MariaDB [nagios]> SELECT count(*) FROM nagios_objects;

+----------+
| count(*) |
+----------+
|     2620 |
+----------+
1 row in set (0.00 sec)

"What version of XI are you running? There were some recent changes that should speed up some of that in the latest versions."

# ENV

Code: Select all

OS: RHEL 7.x (VMWare)
Nagios Core:  4.4.3
Nagios XI:  5.6.6
Nagios dB:  5.5.64-MariaDB MariaDB Server (off-box)

"Attach your /etc/php.ini as well and we will see if we can increase some of the limits for a little better performance."
Attached for your review.

Thanks for your efforts on this!

EDIT:
I have sent the 'profile.zip' file from my XI-PROD ENV to ssax via PM.

Re: XI: Challenge with "disappearing Service definitions"

Posted: Fri Dec 06, 2019 3:54 pm
by ssax
First, edit your /etc/php.ini and change these:

Code: Select all

max_execution_time = 60
max_input_time = 120
max_input_vars = 5000
memory_limit = 256M

To:

Code: Select all

max_execution_time = 300
max_input_time = 600
max_input_vars = 100000
memory_limit = 1G

Then restart the httpd service:

Code: Select all

service httpd restart

That should give better performance for PHP scripts.
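
To double-check that the four directives were saved as intended, the file can be read back before or after the restart. This is only a sketch and is read-only. The function name `show_php_limits` is hypothetical, and it assumes the file lives at /etc/php.ini unless another path is passed in.

```shell
#!/bin/sh
# Hypothetical helper: list the four directives discussed above from a php.ini file.
# Lines commented out with ";" are intentionally not matched.
show_php_limits() {
    ini="${1:-/etc/php.ini}"
    grep -E '^[[:space:]]*(max_execution_time|max_input_time|max_input_vars|memory_limit)[[:space:]]*=' "$ini"
}
```

Calling `show_php_limits` with no argument reads /etc/php.ini. If one of the four settings is missing from the output, it is probably still commented out in the file.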

Analyzing your profile now, I will post an update shortly.

Re: XI: Challenge with "disappearing Service definitions"

Posted: Fri Dec 06, 2019 4:01 pm
by ssax
I don't see anything that stands out from the profile.

Please create a ticket for this and include a link back to this forum thread so we can get a remote session setup:

Code: Select all

https://support.nagios.com/tickets/