Nagiosxi OOM issue/ kills monitoring engine
Over the past week I have been getting several OOM errors. Each time, the monitoring service also stops and I have to restart it, which is bad when it happens overnight and no one gets any alerts. Could someone please help me? Please let me know what extra information you need.
Re: Nagiosxi OOM issue/ kills monitoring engine
What version of XI?
How much RAM do you have?
Please PM me a copy of your profile. You can download it from Admin > System Profile by clicking the Download Profile button.
If that occurs again, please get the output of these commands before fixing it:
Code: Select all
top -n1
ps aux
Re: Nagiosxi OOM issue/ kills monitoring engine
It's XI 5.7.3 and it currently has 8 GB of RAM. It only had 1 GB before, but I added more last Wednesday and the system seems to recognize it, so I'm not sure what's going on.
I PM'd you my profile.
OK, I'll run the commands when it stops again.
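A minimal sketch of how the requested output could be saved to files before restarting the engine, so it is not lost with the terminal scrollback. The function name `capture_snapshot` and the default output directory are assumptions for illustration and are not part of Nagios XI; `-b` runs top in batch mode so its output can be redirected.

```shell
# capture_snapshot [DIR]: save `top` and `ps aux` output to timestamped files.
# Hypothetical helper for illustration; DIR defaults to /tmp (an assumption).
capture_snapshot() {
    snap_dir="${1:-/tmp}"
    snap_stamp="$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$snap_dir" || return 1
    # Batch mode (-b) writes plain text instead of drawing on the terminal.
    top -b -n1 > "$snap_dir/top-$snap_stamp.txt" 2>&1
    ps aux > "$snap_dir/ps-$snap_stamp.txt" 2>&1
    # Print the two file names so they can be attached to a reply.
    echo "$snap_dir/top-$snap_stamp.txt $snap_dir/ps-$snap_stamp.txt"
}
```

Running `capture_snapshot /root/oom-debug` (a hypothetical path) right after the engine stops, and before restarting it, leaves two timestamped files with the state of the system at that moment.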
Re: Nagiosxi OOM issue/ kills monitoring engine
Still seeing these:
Code: Select all
Oct 8 15:44:10 nagiosxi nagios: Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1602186250.perfdata.host" - errno: Cannot allocate memory
I'm also seeing the OOM in /var/log/messages. Is this a physical system? Since you just added more RAM, I'm wondering if you have some bad memory. Did you reboot it after adding the RAM?
You have crashed DB tables, please repair them:
Code: Select all
cd /usr/local/nagiosxi/scripts
./repair_databases.sh
Let's check our DB tables after you've repaired them:
Additionally, please send the output of these commands (as root):
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the first command if your DB is offloaded to another server and/or you've changed the root mysql password
Code: Select all
echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
This next command may fail. That's okay, not all systems run postgresql. Send the output anyway:
Code: Select all
echo "SELECT relname as Table, pg_size_pretty(pg_total_relation_size(relid)) As Size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as ExternalSize FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;" | psql nagiosxi nagiosxi
What is the output of this command:
Code: Select all
dmesg
Re: Nagiosxi OOM issue/ kills monitoring engine
To answer your first question: no, it is completely virtual, running on VMware, and yes, I did reboot afterwards.
Re: Nagiosxi OOM issue/ kills monitoring engine
First command output:
+--------------------------------------------+------------+
| Table | Size in MB |
+--------------------------------------------+------------+
| nagios_acknowledgements | 0.10 |
| nagios_commands | 0.02 |
| nagios_commenthistory | 1.04 |
| nagios_comments | 0.01 |
| nagios_configfiles | 0.01 |
| nagios_configfilevariables | 0.01 |
| nagios_conninfo | 0.29 |
| nagios_contact_addresses | 0.00 |
| nagios_contact_notificationcommands | 0.01 |
| nagios_contactgroup_members | 0.00 |
| nagios_contactgroups | 0.00 |
| nagios_contactnotificationmethods | 0.12 |
| nagios_contactnotifications | 0.12 |
| nagios_contacts | 0.01 |
| nagios_contactstatus | 0.00 |
| nagios_customvariables | 0.10 |
| nagios_customvariablestatus | 0.11 |
| nagios_dbversion | 0.00 |
| nagios_downtimehistory | 0.03 |
| nagios_eventhandlers | 0.00 |
| nagios_externalcommands | 0.01 |
| nagios_flappinghistory | 0.98 |
| nagios_host_contactgroups | 0.01 |
| nagios_host_contacts | 0.03 |
| nagios_host_parenthosts | 0.00 |
| nagios_hostchecks | 0.11 |
| nagios_hostdependencies | 0.00 |
| nagios_hostescalation_contactgroups | 0.00 |
| nagios_hostescalation_contacts | 0.00 |
| nagios_hostescalations | 0.00 |
| nagios_hostgroup_members | 0.00 |
| nagios_hostgroups | 0.00 |
| nagios_hosts | 0.06 |
| nagios_hoststatus | 0.16 |
| nagios_instances | 0.00 |
| nagios_logentries | 9.69 |
| nagios_notifications | 0.13 |
| nagios_objects | 0.34 |
| nagios_processevents | 0.06 |
| nagios_programstatus | 0.00 |
| nagios_runtimevariables | 0.00 |
| nagios_scheduleddowntime | 0.00 |
| nagios_service_contactgroups | 0.03 |
| nagios_service_contacts | 0.09 |
| nagios_service_parentservices | 0.00 |
| nagios_servicechecks | 0.45 |
| nagios_servicedependencies | 0.00 |
| nagios_serviceescalation_contactgroups | 0.00 |
| nagios_serviceescalation_contacts | 0.00 |
| nagios_serviceescalations | 0.00 |
| nagios_servicegroup_members | 0.00 |
| nagios_servicegroups | 0.00 |
| nagios_services | 0.18 |
| nagios_servicestatus | 0.75 |
| nagios_statehistory | 26.40 |
| nagios_systemcommands | 0.03 |
| nagios_timedeventqueue | 0.00 |
| nagios_timedevents | 0.00 |
| nagios_timeperiod_timeranges | 0.02 |
| nagios_timeperiods | 0.00 |
| tbl_command | 0.04 |
| tbl_contact | 0.01 |
| tbl_contactgroup | 0.01 |
| tbl_contacttemplate | 0.01 |
| tbl_domain | 0.01 |
| tbl_host | 0.05 |
| tbl_hostdependency | 0.00 |
| tbl_hostescalation | 0.00 |
| tbl_hostextinfo | 0.00 |
| tbl_hostgroup | 0.01 |
| tbl_hosttemplate | 0.01 |
| tbl_info | 0.13 |
| tbl_lnkContactToCommandHost | 0.00 |
| tbl_lnkContactToCommandService | 0.00 |
| tbl_lnkContactToContactgroup | 0.00 |
| tbl_lnkContactToContacttemplate | 0.00 |
| tbl_lnkContactToVariabledefinition | 0.00 |
| tbl_lnkContactgroupToContact | 0.00 |
| tbl_lnkContactgroupToContactgroup | 0.00 |
| tbl_lnkContacttemplateToCommandHost | 0.00 |
| tbl_lnkContacttemplateToCommandService | 0.00 |
| tbl_lnkContacttemplateToContactgroup | 0.00 |
| tbl_lnkContacttemplateToContacttemplate | 0.00 |
| tbl_lnkContacttemplateToVariabledefinition | 0.00 |
| tbl_lnkHostToContact | 0.01 |
| tbl_lnkHostToContactgroup | 0.00 |
| tbl_lnkHostToHost | 0.00 |
| tbl_lnkHostToHostgroup | 0.00 |
| tbl_lnkHostToHosttemplate | 0.01 |
| tbl_lnkHostToVariabledefinition | 0.01 |
| tbl_lnkHostdependencyToHost_DH | 0.00 |
| tbl_lnkHostdependencyToHost_H | 0.00 |
| tbl_lnkHostdependencyToHostgroup_DH | 0.00 |
| tbl_lnkHostdependencyToHostgroup_H | 0.00 |
| tbl_lnkHostescalationToContact | 0.00 |
| tbl_lnkHostescalationToContactgroup | 0.00 |
| tbl_lnkHostescalationToHost | 0.00 |
| tbl_lnkHostescalationToHostgroup | 0.00 |
| tbl_lnkHostgroupToHost | 0.00 |
| tbl_lnkHostgroupToHostgroup | 0.00 |
| tbl_lnkHosttemplateToContact | 0.00 |
| tbl_lnkHosttemplateToContactgroup | 0.00 |
| tbl_lnkHosttemplateToHost | 0.00 |
| tbl_lnkHosttemplateToHostgroup | 0.00 |
| tbl_lnkHosttemplateToHosttemplate | 0.00 |
| tbl_lnkHosttemplateToVariabledefinition | 0.00 |
| tbl_lnkServiceToContact | 0.05 |
| tbl_lnkServiceToContactgroup | 0.01 |
| tbl_lnkServiceToHost | 0.03 |
| tbl_lnkServiceToHostgroup | 0.00 |
| tbl_lnkServiceToServicegroup | 0.00 |
| tbl_lnkServiceToServicetemplate | 0.03 |
| tbl_lnkServiceToVariabledefinition | 0.02 |
| tbl_lnkServicedependencyToHost_DH | 0.00 |
| tbl_lnkServicedependencyToHost_H | 0.00 |
| tbl_lnkServicedependencyToHostgroup_DH | 0.00 |
| tbl_lnkServicedependencyToHostgroup_H | 0.00 |
| tbl_lnkServicedependencyToService_DS | 0.00 |
| tbl_lnkServicedependencyToService_S | 0.00 |
| tbl_lnkServicedependencyToServicegroup_DS | 0.02 |
| tbl_lnkServicedependencyToServicegroup_S | 0.02 |
| tbl_lnkServiceescalationToContact | 0.00 |
| tbl_lnkServiceescalationToContactgroup | 0.00 |
| tbl_lnkServiceescalationToHost | 0.00 |
| tbl_lnkServiceescalationToHostgroup | 0.00 |
| tbl_lnkServiceescalationToService | 0.00 |
| tbl_lnkServiceescalationToServicegroup | 0.02 |
| tbl_lnkServicegroupToService | 0.00 |
| tbl_lnkServicegroupToServicegroup | 0.00 |
| tbl_lnkServicetemplateToContact | 0.00 |
| tbl_lnkServicetemplateToContactgroup | 0.00 |
| tbl_lnkServicetemplateToHost | 0.00 |
| tbl_lnkServicetemplateToHostgroup | 0.00 |
| tbl_lnkServicetemplateToServicegroup | 0.00 |
| tbl_lnkServicetemplateToServicetemplate | 0.01 |
| tbl_lnkServicetemplateToVariabledefinition | 0.00 |
| tbl_lnkTimeperiodToTimeperiod | 0.00 |
| tbl_logbook | 0.00 |
| tbl_mainmenu | 0.00 |
| tbl_permission | 0.02 |
| tbl_permission_inactive | 0.02 |
| tbl_service | 0.14 |
| tbl_servicedependency | 0.00 |
| tbl_serviceescalation | 0.00 |
| tbl_serviceextinfo | 0.00 |
| tbl_servicegroup | 0.01 |
| tbl_servicetemplate | 0.02 |
| tbl_session | 0.00 |
| tbl_session_locks | 0.00 |
| tbl_settings | 0.00 |
| tbl_submenu | 0.00 |
| tbl_timedefinition | 0.02 |
| tbl_timeperiod | 0.01 |
| tbl_user | 0.01 |
| tbl_variabledefinition | 0.07 |
+--------------------------------------------+------------+
Second command output:
table | size | externalsize
--------------------------+------------+--------------
xi_meta | 15 MB | 3296 kB
xi_auditlog | 1512 kB | 1232 kB
xi_events | 1312 kB | 1304 kB
xi_usermeta | 632 kB | 424 kB
xi_auth_tokens | 296 kB | 288 kB
xi_options | 120 kB | 96 kB
xi_sysstat | 112 kB | 72 kB
xi_users | 88 kB | 72 kB
xi_commands | 80 kB | 72 kB
xi_mibs | 72 kB | 64 kB
xi_eventqueue | 64 kB | 56 kB
xi_cmp_nagiosbpi_backups | 64 kB | 56 kB
xi_sessions | 40 kB | 40 kB
xi_cmp_trapdata | 24 kB | 24 kB
xi_cmp_favorites | 16 kB | 16 kB
xi_deploy_jobs | 16 kB | 16 kB
xi_cmp_trapdata_log | 16 kB | 16 kB
xi_deploy_agents | 16 kB | 16 kB
xi_incidents | 8192 bytes | 8192 bytes
xi_cmp_ccm_backups | 8192 bytes | 8192 bytes
(20 rows)
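With this many rows, the largest tables are easy to miss. A minimal sketch, assuming the first command's output was saved to a file: the function name `largest_tables` and the file name in the usage note are hypothetical, and the parsing assumes the `| name | size |` row layout shown above.

```shell
# largest_tables FILE [N]: print the N largest tables (default 5) from saved
# `mysql --table` output whose second column is "Size in MB".
# Hypothetical helper for illustration only.
largest_tables() {
    awk -F'|' '
        # Data rows look like "| name | size |"; borders and the header are skipped.
        NF >= 4 && $3 ~ /^ *[0-9.]+ *$/ {
            gsub(/ /, "", $2)            # strip padding around the table name
            printf "%.2f %s\n", $3, $2   # size first, so sort can order by it
        }
    ' "$1" | sort -rn | head -n "${2:-5}"
}
```

For example, `largest_tables mysql_sizes.txt 2` on the output above would list nagios_statehistory (26.40 MB) first and nagios_logentries (9.69 MB) second, which suggests table size alone is not what is exhausting memory here.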
Re: Nagiosxi OOM issue/ kills monitoring engine
Have you experienced the OOM kill since repairing the databases?
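One rough way to check from the shell, as a sketch: count log lines that look OOM-related. The function name `count_oom_lines` is hypothetical, and the patterns are assumptions based on common kernel OOM messages plus the `Cannot allocate memory` error quoted earlier in this thread.

```shell
# count_oom_lines FILE: count lines in FILE that look like OOM-related entries.
# Hypothetical helper; the patterns are assumptions and may need adjusting.
count_oom_lines() {
    # -c counts matching lines, -i ignores case, -E enables alternation
    grep -ciE 'out of memory|oom-killer|cannot allocate memory' "$1"
}
```

For example, `count_oom_lines /var/log/messages` should print 0 if nothing matching has been logged in the current file. The same pattern can be applied to the kernel ring buffer with `dmesg | grep -iE 'out of memory|oom-killer'`.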
Re: Nagiosxi OOM issue/ kills monitoring engine
Nope. I checked Nagios this morning: the monitoring engine did not stop over the weekend, and when I checked the database it did not show any OOM kills. That seems to have fixed it. Thank you so much!
Re: Nagiosxi OOM issue/ kills monitoring engine
That's great to hear, I'm glad that resolved your issue! Are we okay to lock this up and mark it as resolved?