Nagios Core and XI out of sync
-
safuanmansor
- Posts: 59
- Joined: Mon Jul 16, 2018 9:16 pm
Nagios Core and XI out of sync
Hi Support,
We have an incident where Nagios XI has lately gone out of sync with Nagios Core again. We encountered this issue before, and based on the MariaDB logs, it is showing [Warning] Could not increase number of max_open_files to more than 20000 (request: 70011) (last time the value was 5000). Unfortunately, we are currently also experiencing an "Object does not exist" issue that sometimes occurs when clicking on any host or service; the issue sometimes goes away after hitting the Apply Configuration button.
So I was wondering whether there is any value/guideline/recommendation/benchmark that can be followed to address the open file limits in the Nagios XI databases. Is the second issue related, and do you have any idea how to fix it?
Thanks
Safuan
Re: Nagios Core and XI out of sync
Do you know how many hosts / services you're monitoring? What distro are you running? What version of Nagios XI?
If you PM me a system profile I can diagnose further. Get one by going to Admin (top menu) => System Profile (in the left menu), then clicking the blue button.
If you're unable to generate the profile through the web interface, please try generating it from the command line by running these commands as root:
Code: Select all
rm -rf /usr/local/nagiosxi/var/components/profile*
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT
Then send me the resulting /usr/local/nagiosxi/var/components/profile.zip file.
If the profile script fails, please include the ENTIRE output.
If you didn't get an 8% raise over the course of the pandemic, you took a pay cut.
Discussion of wages is protected speech under the National Labor Relations Act, and no employer can tell you you can't disclose your pay with your fellow employees.
-
safuanmansor
- Posts: 59
- Joined: Mon Jul 16, 2018 9:16 pm
Re: Nagios Core and XI out of sync
Hi dchurch,
The system info is as below.
XI 5.6.7
Redhat 7.9
Hosts: 2000+
Services: 31000+
I have PM'd the profile for review.
Regards,
Safuan
Re: Nagios Core and XI out of sync
The second issue might be related to the max_open_files error, but it's not guaranteed.
It's likely not due to a corrupted database, but it's a good idea to run the database repair script anyway:
Code: Select all
/usr/local/nagiosxi/scripts/repair_databases.sh
MariaDB issue re: max_open_files
The error means the limit is being hit somewhere. Let’s resolve that by editing any configured limits. Have a look at the following files:
- /etc/systemd/system/mariadb.service.d/migrated-from-my.cnf-settings.conf
- /etc/systemd/system/mysqld.service.d/limits.conf
- /usr/lib/systemd/system/mariadb.service
- /etc/systemd/system/mysql.service
- /etc/systemd/system/mysqld.service
Look within those files for the following config lines:
Code: Select all
LimitNOFILE=
LimitMEMLOCK=
Change these lines to your new limit. For example:
Code: Select all
LimitNOFILE=100000
LimitMEMLOCK=100000
Other issues
Looks like the PHP process is running into issues because it can't modify the files in your ramdisk:
Code: Select all
[Sun Feb 07 21:19:24.688092 2021] [:error] [pid 21763] [client 10.150.1.143:51757] PHP Warning: unlink(/usr/local/nagiosramdisk//5/3249178946/3130682211): Permission denied in /usr/local/nagiosxi/html/includes/utils-backend.inc.php on line 0, referer: http://10.103.12.94/nagiosxi/includes/components/birdseye/birdseye.php
I'd inspect the permissions in that directory structure to make sure the apache daemon has access to modify it. OR, since it looks like your config shied away from storing the perf data in the ramdisk, reconfigure birdseye to not use the ramdisk anymore.
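As a rough sketch of how the LimitNOFILE change could be applied through a systemd drop-in rather than by editing the packaged unit file: the helper name write_limits_dropin is hypothetical, and the drop-in directory, the unit name (mariadb) and the process name (mysqld) are assumptions that should be checked against the system in question.

```shell
# Hypothetical helper: write a systemd drop-in that sets LimitNOFILE.
# $1 = drop-in directory (assumed, e.g. /etc/systemd/system/mariadb.service.d)
# $2 = new open-file limit (e.g. 100000, as in the example above)
write_limits_dropin() {
    dropin_dir="$1"
    nofile="$2"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nLimitNOFILE=%s\n' "$nofile" > "$dropin_dir/limits.conf"
}

# Example usage as root (unit and process names are assumptions):
#   write_limits_dropin /etc/systemd/system/mariadb.service.d 100000
#   systemctl daemon-reload
#   systemctl restart mariadb
#   grep 'Max open files' /proc/"$(pidof mysqld)"/limits
```

The last command in the usage comment reads the limit the running process actually received, which is a quick way to confirm the new value took effect after the restart.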
-
safuanmansor
- Posts: 59
- Joined: Mon Jul 16, 2018 9:16 pm
Re: Nagios Core and XI out of sync
Hi dchurch,
The issue sometimes happens 3 to 4 times a day, and yes, our current workaround is running the database repair script. Running this script multiple times a day is not efficient.
As for the open file limits: the limit was raised from 5000 to 20000 two months ago, and now it has hit the max again. I understand that increasing this number gives the database more room for open files/file descriptors. Do you have any method or benchmark that we can refer to, so that we can tune the DB before adding more services in the future?
E.g.:
10000 services - 30000 to 50000 limit
30000 services - 60000 to 100000 limit
Thanks,
Safuan
Re: Nagios Core and XI out of sync
Please send a screenshot of Admin > Performance Settings > Databases (the whole page).
Send the output of these commands as root:
Code: Select all
sar
sysctl -p
ulimit -a
su -s /bin/bash -c 'ulimit -a' nagios
su -s /bin/bash -c 'ulimit -a' mysql
Additionally, include the output of these commands:
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the first command if your DB is offloaded to another server and/or you've changed the root mysql password
Code: Select all
echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
This next command may fail, that's okay, not all systems run postgresql:
Code: Select all
echo "SELECT relname as Table, pg_size_pretty(pg_total_relation_size(relid)) As Size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as ExternalSize FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;" | psql nagiosxi nagiosxi
-
safuanmansor
- Posts: 59
- Joined: Mon Jul 16, 2018 9:16 pm
Re: Nagios Core and XI out of sync
Hi Support,
Database performance screenshot.
Command results:
1. sar
2. sysctl -p
3. ulimit -a
4. ulimit -a for nagios & mysql
5. Query
echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
+--------------------------------------------+------------+
| Table | Size in MB |
+--------------------------------------------+------------+
| alc | 0.00 |
| bdc | 0.00 |
| hla | 0.00 |
| hlib | 0.00 |
| hlisb | 0.00 |
| limit_t1 | 0.00 |
| limit_t2 | 0.00 |
| limit_t3 | 0.00 |
| limit_total | 0.00 |
| nagios_acknowledgements | 0.60 |
| nagios_commands | 0.08 |
| nagios_commenthistory | 2730.27 |
| nagios_comments | 1.13 |
| nagios_configfiles | 0.00 |
| nagios_configfilevariables | 0.01 |
| nagios_conninfo | 1.59 |
| nagios_contact_addresses | 0.00 |
| nagios_contact_notificationcommands | 0.25 |
| nagios_contactgroup_members | 0.04 |
| nagios_contactgroups | 0.01 |
| nagios_contactnotificationmethods | 211.60 |
| nagios_contactnotifications | 221.34 |
| nagios_contacts | 0.06 |
| nagios_contactstatus | 0.04 |
| nagios_customvariables | 0.98 |
| nagios_customvariablestatus | 1.37 |
| nagios_dbversion | 0.00 |
| nagios_downtimehistory | 81.36 |
| nagios_eventhandlers | 0.14 |
| nagios_externalcommands | 0.40 |
| nagios_flappinghistory | 9.54 |
| nagios_host_contactgroups | 0.18 |
| nagios_host_contacts | 0.26 |
| nagios_host_parenthosts | 0.00 |
| nagios_hostchecks | 0.00 |
| nagios_hostdependencies | 0.00 |
| nagios_hostescalation_contactgroups | 0.00 |
| nagios_hostescalation_contacts | 0.00 |
| nagios_hostescalations | 0.00 |
| nagios_hostgroup_members | 0.72 |
| nagios_hostgroups | 0.02 |
| nagios_hosts | 0.48 |
| nagios_hoststatus | 1.21 |
| nagios_instances | 0.00 |
| nagios_logentries | 10719.98 |
| nagios_notifications | 3057.69 |
| nagios_objects | 19.33 |
| nagios_processevents | 1.07 |
| nagios_programstatus | 0.00 |
| nagios_runtimevariables | 0.00 |
| nagios_scheduleddowntime | 0.60 |
| nagios_service_contactgroups | 1.78 |
| nagios_service_contacts | 7.24 |
| nagios_service_parentservices | 0.00 |
| nagios_servicechecks | 0.00 |
| nagios_servicedependencies | 0.00 |
| nagios_serviceescalation_contactgroups | 0.00 |
| nagios_serviceescalation_contacts | 0.00 |
| nagios_serviceescalations | 0.00 |
| nagios_servicegroup_members | 0.08 |
| nagios_servicegroups | 0.00 |
| nagios_services | 5.35 |
| nagios_servicestatus | 16.48 |
| nagios_statehistory | 865.41 |
| nagios_systemcommands | 0.11 |
| nagios_timedeventqueue | 0.00 |
| nagios_timedevents | 0.00 |
| nagios_timeperiod_timeranges | 0.04 |
| nagios_timeperiods | 0.01 |
| profile | 0.00 |
| region | 0.02 |
| tc | 0.00 |
| tbl_command | 0.12 |
| tbl_contact | 0.08 |
| tbl_contactgroup | 0.01 |
| tbl_contacttemplate | 0.01 |
| tbl_domain | 0.01 |
| tbl_host | 0.43 |
| tbl_hostdependency | 0.00 |
| tbl_hostescalation | 0.00 |
| tbl_hostextinfo | 0.00 |
| tbl_hostgroup | 0.03 |
| tbl_hosttemplate | 0.01 |
| tbl_info | 0.13 |
| tbl_lnkcontactgrouptocontact | 0.01 |
| tbl_lnkcontactgrouptocontactgroup | 0.00 |
| tbl_lnkcontacttemplatetocommandhost | 0.00 |
| tbl_lnkcontacttemplatetocommandservice | 0.00 |
| tbl_lnkcontacttemplatetocontactgroup | 0.00 |
| tbl_lnkcontacttemplatetocontacttemplate | 0.00 |
| tbl_lnkcontacttemplatetovariabledefinition | 0.00 |
| tbl_lnkcontacttocommandhost | 0.00 |
| tbl_lnkcontacttocommandservice | 0.00 |
| tbl_lnkcontacttocontactgroup | 0.00 |
| tbl_lnkcontacttocontacttemplate | 0.01 |
| tbl_lnkcontacttovariabledefinition | 0.00 |
| tbl_lnkhostdependencytohost_dh | 0.00 |
| tbl_lnkhostdependencytohost_h | 0.00 |
| tbl_lnkhostdependencytohostgroup_dh | 0.00 |
| tbl_lnkhostdependencytohostgroup_h | 0.00 |
| tbl_lnkhostescalationtocontact | 0.00 |
| tbl_lnkhostescalationtocontactgroup | 0.00 |
| tbl_lnkhostescalationtohost | 0.00 |
| tbl_lnkhostescalationtohostgroup | 0.00 |
| tbl_lnkhostgrouptohost | 0.17 |
| tbl_lnkhostgrouptohostgroup | 0.01 |
| tbl_lnkhosttemplatetocontact | 0.00 |
| tbl_lnkhosttemplatetocontactgroup | 0.00 |
| tbl_lnkhosttemplatetohost | 0.00 |
| tbl_lnkhosttemplatetohostgroup | 0.00 |
| tbl_lnkhosttemplatetohosttemplate | 0.00 |
| tbl_lnkhosttemplatetovariabledefinition | 0.00 |
| tbl_lnkhosttocontact | 0.14 |
| tbl_lnkhosttocontactgroup | 0.11 |
| tbl_lnkhosttohost | 0.00 |
| tbl_lnkhosttohostgroup | 0.02 |
| tbl_lnkhosttohosttemplate | 0.06 |
| tbl_lnkhosttovariabledefinition | 0.02 |
| tbl_lnkservicedependencytohost_dh | 0.00 |
| tbl_lnkservicedependencytohost_h | 0.00 |
| tbl_lnkservicedependencytohostgroup_dh | 0.00 |
| tbl_lnkservicedependencytohostgroup_h | 0.00 |
| tbl_lnkservicedependencytoservice_ds | 0.00 |
| tbl_lnkservicedependencytoservice_s | 0.00 |
| tbl_lnkservicedependencytoservicegroup_ds | 0.02 |
| tbl_lnkservicedependencytoservicegroup_s | 0.02 |
| tbl_lnkserviceescalationtocontact | 0.00 |
| tbl_lnkserviceescalationtocontactgroup | 0.00 |
| tbl_lnkserviceescalationtohost | 0.00 |
| tbl_lnkserviceescalationtohostgroup | 0.00 |
| tbl_lnkserviceescalationtoservice | 0.00 |
| tbl_lnkserviceescalationtoservicegroup | 0.02 |
| tbl_lnkservicegrouptoservice | 0.08 |
| tbl_lnkservicegrouptoservicegroup | 0.00 |
| tbl_lnkservicetemplatetocontact | 0.00 |
| tbl_lnkservicetemplatetocontactgroup | 0.00 |
| tbl_lnkservicetemplatetohost | 0.00 |
| tbl_lnkservicetemplatetohostgroup | 0.00 |
| tbl_lnkservicetemplatetoservicegroup | 0.00 |
| tbl_lnkservicetemplatetoservicetemplate | 0.01 |
| tbl_lnkservicetemplatetovariabledefinition | 0.00 |
| tbl_lnkservicetocontact | 0.19 |
| tbl_lnkservicetocontactgroup | 0.19 |
| tbl_lnkservicetohost | 0.66 |
| tbl_lnkservicetohostgroup | 0.00 |
| tbl_lnkservicetoservicegroup | 0.00 |
| tbl_lnkservicetoservicetemplate | 0.20 |
| tbl_lnkservicetovariabledefinition | 0.12 |
| tbl_lnktimeperiodtotimeperiod | 0.00 |
| tbl_logbook | 0.00 |
| tbl_mainmenu | 0.00 |
| tbl_permission | 0.02 |
| tbl_permission_inactive | 0.02 |
| tbl_service | 1.14 |
| tbl_servicedependency | 0.00 |
| tbl_serviceescalation | 0.00 |
| tbl_serviceextinfo | 0.00 |
| tbl_servicegroup | 0.01 |
| tbl_servicetemplate | 0.02 |
| tbl_session | 0.00 |
| tbl_session_locks | 0.00 |
| tbl_settings | 0.00 |
| tbl_submenu | 0.00 |
| tbl_timedefinition | 0.04 |
| tbl_timeperiod | 0.02 |
| tbl_user | 0.01 |
| tbl_variabledefinition | 0.26 |
| xi_auditlog | 14.85 |
| xi_auth_tokens | 0.03 |
| xi_cmp_trapdata | 0.03 |
| xi_cmp_trapdata_log | 0.03 |
| xi_commands | 0.01 |
| xi_eventqueue | 0.02 |
| xi_events | 0.19 |
| xi_incidents | 0.00 |
| xi_meta | 16.53 |
| xi_mibs | 0.05 |
| xi_options | 0.03 |
| xi_sessions | 0.03 |
| xi_sysstat | 0.01 |
| xi_usermeta | 4.34 |
| xi_users | 0.03 |
+--------------------------------------------+------------+
Re: Nagios Core and XI out of sync
What error did you get when you tried to increase the open file limit before?
The number of open files still has to be increased on the server. It is currently set to 100000 but it still needs to be increased.
There is an MRTG process that runs every 5 minutes to gather the bandwidth data for Nagios and it has 147305 files.
Depending on how it runs, that may be exceeding the open file limit.
Edit the file /etc/security/limits.conf and define / update the following settings:
Code: Select all
#locked memory
* hard memlock 128
* soft memlock 128
#open files
* soft nofile 1000000
* hard nofile 1000000
#max user processes
* hard nproc 100000
* soft nproc 100000
#stack size
* hard stack 20480
* soft stack 20480
Once you have made the changes, save the file and restart the server to guarantee the changes are loaded and that all of the processes are restarted.
In the following folder are the config files for the MRTG process.
Code: Select all
/etc/mrtg/conf.d/
They are typically named with the IP address that you are polling the bandwidth data from.
If the device is no longer on your network, delete the .cfg file and that should help in dropping the open files on the server.
Open a root shell on the Nagios server and run the following command and post the output so we can see the number of max connections to the MySQL database, as that may need to be increased.
Code: Select all
mysql -u root -pnagiosxi -e "show global status like '%used_connections%'; show variables like 'max_connections';"
Be sure to check out our Knowledgebase for helpful articles and solutions!
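To get a feel for how many MRTG configs exist and how many descriptors a given process actually holds, something along these lines may help. It is only a sketch: the function names are hypothetical, the /proc lookup is Linux-specific, and the process name used in the usage comment is an assumption.

```shell
# Hypothetical helper: count .cfg files directly inside an MRTG config directory.
# Defaults to the folder named in the post above.
count_mrtg_cfgs() {
    dir="${1:-/etc/mrtg/conf.d}"
    find "$dir" -maxdepth 1 -type f -name '*.cfg' | wc -l | tr -d ' '
}

# Hypothetical helper: count file descriptors currently held by a process (Linux /proc).
# Prints 0 if the process does not exist or /proc is unavailable.
open_fds() {
    pid="$1"
    ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Example usage as root ("mysqld" as the process name is an assumption):
#   count_mrtg_cfgs
#   open_fds "$(pidof mysqld)"
```

Comparing the descriptor count of a process against its configured limit over a few days gives a measured baseline, which is more reliable than picking a limit by guesswork.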
-
safuanmansor
- Posts: 59
- Joined: Mon Jul 16, 2018 9:16 pm
Re: Nagios Core and XI out of sync
Hi Tgriep,
1. The command output is as below:
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| Max_used_connections | 599 |
+----------------------+-------+
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 50000 |
+-----------------+-------+
2. The activity to reconfigure the limit is done. No error occurred when trying to increase the open file limit. We just want to understand how high the open files value actually needs to be set. It can't just be plucked out of the sky, right?
3. The activity to reconfigure the limit is scheduled.
4. tgriep wrote: There is an MRTG process that runs every 5 minutes to gather the bandwidth data for nagios and it has 147305 files.
Depending on how it runs, that may be exceeding the open file limit.
- We noticed that when adding interfaces, the wizard adds all of the interfaces at the backend even though we select only a few out of hundreds of interfaces per switch (the frontend GUI shows the monitored ones correctly). Is this intended to work that way? I suspect this also contributes to the open files hitting the limit.
Thanks,
Safuan
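One possible way to reason about the "request: 70011" figure in the warning, offered as a sketch rather than a confirmed explanation: assuming this MariaDB version estimates its file-handle needs at startup as roughly 10 + max_connections + extra_max_connections + 2 x table_open_cache (an assumption about server internals that should be checked against the documentation for the installed version), the logged request is consistent with the max_connections of 50000 reported above. The values 1 for extra_max_connections and 10000 for table_open_cache are guesses, not taken from this system, and wanted_files is a hypothetical helper name.

```shell
# Hypothetical helper: estimate the file handles MariaDB requests at startup.
# ASSUMED formula: 10 + max_connections + extra_max_connections + 2 * table_open_cache
wanted_files() {
    max_connections="$1"
    extra_max_connections="$2"
    table_open_cache="$3"
    echo $((10 + max_connections + extra_max_connections + 2 * table_open_cache))
}

# max_connections=50000 comes from the output above; 1 and 10000 are assumed values.
wanted_files 50000 1 10000   # prints 70011

# Max_used_connections peaked at 599, so a smaller max_connections
# (1000 here, purely as an illustration) would shrink the estimate.
wanted_files 1000 1 10000    # prints 21011
```

If that assumption holds, the number MariaDB asks for depends on max_connections and table_open_cache rather than directly on the number of monitored services.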
Re: Nagios Core and XI out of sync
To increase performance I would reduce the size of these tables:
Code: Select all
| nagios_commenthistory | 2730.27 |
| nagios_logentries | 10719.98 |
| nagios_notifications | 3057.69 |
Then go to Admin > Performance Settings > Databases and set ALL THREE Optimize Intervals to 300 and click Update Settings.
FAQ: Can I truncate the tables first before proceeding with database repair (if I have crashed tables)?
You can truncate before repairing the DB, it's up to you. If you want to back it up first, you'll need to repair it. If you don't care, or already have a backup, truncate it first as it will speed up the DB repair process.
NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the commands if your DB is housed/stored/offloaded/contained on a different server and/or you've changed the root mysql password
If you don't care about the data, or already have a backup, you can just truncate the tables which will essentially drop and recreate the table with zero data in it (removing all historical data for the respective reports):
nagios_logentries - Impacts Event Log report length
Code: Select all
mysql -uroot -pnagiosxi -h 127.0.0.1 -B nagios -e 'TRUNCATE TABLE nagios_logentries;'
nagios_statehistory - Impacts the State History report length
Code: Select all
mysql -uroot -pnagiosxi -h 127.0.0.1 -B nagios -e 'TRUNCATE TABLE nagios_statehistory;'
nagios_notifications - Impacts the Notifications report length
Code: Select all
mysql -uroot -pnagiosxi -h 127.0.0.1 -B nagios -e 'TRUNCATE TABLE nagios_notifications;'
nagios_commenthistory - Impacts the Comment History age
Code: Select all
mysql -uroot -pnagiosxi -h 127.0.0.1 -B nagios -e 'TRUNCATE TABLE nagios_commenthistory;'
These should technically work to clean the DB tables up manually (if the tables aren't crashed, if they are crashed, you will need to repair the database FIRST in order to run these queries):
nagios_logentries - Impacts Event Log report length
Code: Select all
mysql -uroot -pnagiosxi -h 127.0.0.1 -B nagios -e 'DELETE FROM nagios_logentries WHERE logentry_time <= (NOW() - INTERVAL 6 MONTH);'
nagios_statehistory - Impacts the State History report length
Code: Select all
mysql -uroot -pnagiosxi -h 127.0.0.1 -B nagios -e 'DELETE FROM nagios_statehistory WHERE state_time <= (NOW() - INTERVAL 6 MONTH);'
nagios_notifications - Impacts the Notifications report length
Code: Select all
mysql -uroot -pnagiosxi -h 127.0.0.1 -B nagios -e 'DELETE FROM nagios_notifications WHERE start_time <= (NOW() - INTERVAL 6 MONTH);'
nagios_commenthistory - Impacts the Comment History age
Code: Select all
mysql -uroot -pnagiosxi -h 127.0.0.1 -B nagios -e 'DELETE FROM nagios_commenthistory WHERE entry_time <= (NOW() - INTERVAL 6 MONTH);'
Then you should go to Admin > Performance Settings > Databases tab and adjust ALL of the retention intervals to meet your business data policy standards to keep them cleaned up as these settings are for adjusting the retention on those DB tables.
I would lower them to the smallest possible level and utilize the XI backup/restore process and the Admin > Scheduled Backups process to offload the backups to another server. Since these XI backups contain database backups you can spin them up to grab the data and report on them if needed.
2. I think MRTG is going to be the biggest culprit for this but when you have MRTG running and Nagios running checks and all the other processes running they add up. Given you have 147,000 MRTG configs and depending on the total number of ports each has it can add up quick, I think 1000000 is a good number but a lofty goal for a single XI system.
4. That is the way it was designed. It adds all of the ports that aren't administratively down. The only way around that would be to comment out the ones you don't want to monitor in the /etc/mrtg/conf.d files.
Given the size of your system you're likely getting close to extreme mitigation tactics such as implementing mod_gearman (I don't have access to your profile so I'm unsure of whether you're currently running that or not) in order to offload the processing of the checks (and open files for those checks) to increase the performance of the XI server to handle the things it needs to. You can only do so much on a single system.
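Before running any of the DELETE statements, it may be worth previewing how many rows fall outside the retention window. The following is a minimal sketch under the same assumptions as the commands in this post (default credentials, local database); build_count_query is a hypothetical helper, and the table and column names are the ones used in the DELETE statements.

```shell
# Hypothetical helper: build a COUNT(*) query that mirrors the DELETE filters,
# so the number of rows that would be removed can be checked first.
# $1 = table name, $2 = timestamp column, $3 = retention in months
build_count_query() {
    table="$1"
    column="$2"
    months="$3"
    printf 'SELECT COUNT(*) FROM %s WHERE %s <= (NOW() - INTERVAL %s MONTH);\n' \
        "$table" "$column" "$months"
}

# Example usage (adjust host and credentials if they were changed):
#   build_count_query nagios_logentries logentry_time 6 \
#       | mysql -uroot -pnagiosxi -h 127.0.0.1 -B nagios
```

Running the count for each of the four tables shows which retention interval will free the most space before any data is removed.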