Server(XXXXXX): IPCS Queue is in WARNING state too frequently

Posted: Mon Aug 05, 2019 4:13 am
by rtsupport
Hello Nagios Support Forum,

We are getting the alert below very frequently. Please let us know which logs we should check to find the cause and fix the issue.

Alert Subject message >> "Message queue count on host usaXXXXXX has reached 50000000"

***** Nagios XI Alert *****

Nagios has detected a problem with this service.

Notification Type: PROBLEM

Service: IPCS Queue
Host: XXXXXXX
Address: XX.XXX.XXX.XXX
State: WARNING
Info:
WARNING: 1 message queue for the user nagios detected, Number Of Messages (Total) = 97409 (WARNING 80000), Used Bytes (Total) = 64312741B
Date/Time: 2019-08-05 04:03:12

Once the service recovers, we receive the following kind of mail:

Nagios has detected this service has recovered.

Notification Type: RECOVERY

Service: IPCS Queue
Host: XXXXXXXXXX
Address: XX.XXX.XXX.XXX
State: OK
Info:
OK: 1 message queue for the user nagios detected, Number Of Messages (Total) = 0, Used Bytes (Total) = 0B
Date/Time: 2019-08-05 04:06:12

Regards,
Ashutosh Tripathi

Re: Server(XXXXXX): IPCS Queue is WARNING state to frequentl

Posted: Mon Aug 05, 2019 1:25 pm
by ssax
Please send the output of these commands (run as root):

Code:

ipcs -q
sysctl -p
ulimit -a
su - nagios
ulimit -a
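
Because the alert in the first post clears within minutes (97409 messages at 04:03:12, 0 at 04:06:12), a single `ipcs -q` run can easily miss the backlog. A minimal sketch for summarizing the queue per owner so it can be sampled repeatedly; the function name `queue_summary` is hypothetical, and the column order (key, msqid, owner, perms, used-bytes, messages) is assumed to match the `ipcs -q` layout on this system:

```shell
# Summarize `ipcs -q` output read from stdin for one queue owner.
# Prints: "<queue count> <total messages> <total used bytes>"
# Assumed columns: key msqid owner perms used-bytes messages
queue_summary() {
    owner=$1
    awk -v owner="$owner" '
        # Data rows start with a hex key such as 0xdb000002
        $1 ~ /^0x/ && $3 == owner { q++; bytes += $5; msgs += $6 }
        END { printf "%d %d %d\n", q, msgs, bytes }
    '
}

# Example (run as root on the XI server):
#   ipcs -q | queue_summary nagios
```

Running it in a loop, for example `while sleep 10; do date; ipcs -q | queue_summary nagios; done`, would show whether the message count climbs steadily or spikes at particular times.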

Please also send me a copy of your system profile so that I can check a few things (the thresholds you have set, load, and so on). You can download it from Admin > System Profile > Download Profile.

If you're unable to generate the profile through the web interface, please try generating it from the command line by running these commands as root:

Code:

rm -rf /usr/local/nagiosxi/var/components/profile*
/usr/local/nagiosxi/html/includes/components/profile/getprofile.sh SUPPORT

Then send me the resulting /usr/local/nagiosxi/var/components/profile.zip file.

If the profile script fails, please include the ENTIRE output.

Additionally, please send the output of these commands (as root):
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the first command if your DB is offloaded to another server and/or you've changed the root mysql password

Code:

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table

Then run this command:

Code:

grep mysql /usr/local/nagiosxi/html/config.inc.php | wc -l

If it outputs the number 2, run the command below as well and include the output. If it outputs anything other than 2, don't run the command. (Some XI systems use both MySQL and PostgreSQL if they were installed prior to XI 5.0 and then upgraded from there.)

Code:

echo "SELECT relname as Table, pg_size_pretty(pg_total_relation_size(relid)) As Size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as ExternalSize FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;" | psql nagiosxi nagiosxi
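
The "only if it outputs 2" rule above can be expressed as a small guard so the PostgreSQL query is not run by mistake. This is only a sketch: the function name `should_run_psql_query` is hypothetical, and `grep -c` is used here as an equivalent of `grep ... | wc -l`.

```shell
# Succeeds (exit 0) only when exactly 2 lines of the given file mention
# "mysql", which is the case where the PostgreSQL size query should be run.
should_run_psql_query() {
    config=$1
    # grep -c prints the number of matching lines; a missing file counts as 0
    count=$(grep -c mysql "$config" 2>/dev/null)
    [ "${count:-0}" -eq 2 ]
}

# Example (run on the XI server):
#   if should_run_psql_query /usr/local/nagiosxi/html/config.inc.php; then
#       echo "config matches; run the psql size query shown above"
#   else
#       echo "count is not 2; skip the psql query"
#   fi
```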

Re: Server(XXXXXX): IPCS Queue is WARNING state to frequentl

Posted: Tue Aug 06, 2019 3:40 am
by rtsupport
Hello Team,

Please find the requested output below and investigate further based on it.

[user@usaXXXXXX ~]$ sudo su -
-bash-4.1# pws
-bash: pws: command not found
-bash-4.1# clear
-bash-4.1# pwd
/root
-bash-4.1# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xdb000002 13500416   nagios     600        0            0


-bash-4.1# sysctl -p
kernel.sysrq = 1
kernel.core_uses_pid = 1
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_max = 1048576
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_syncookies = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.ip_forward = 0
sunrpc.tcp_slot_table_entries = 128
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_tw_recycle = 0
net.ipv4.ip_local_port_range = 9000 65500
kernel.randomize_va_space = 2
kernel.exec-shield = 0
kernel.msgmni = 512000
kernel.msgmnb = 522288000
kernel.msgmax = 522288000
kernel.shmall = 268435456
kernel.shmmax = 4294967295
kernel.shmmni = 4096
kernel.sem = 1250 256000 100 1024
fs.file-max = 13041294
fs.aio-max-nr = 1048576
vm.max_map_count = 1000000
-bash-4.1#

-bash-4.1# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31360
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 31360
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
-bash-4.1#


-bash-4.1# pwd
/root
-bash-4.1# su - nagios
[nagios@XXXXXXX ~]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31360
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[nagios@XXXXXXX ~]$

[nagios@XXXXXXX ~]$ echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
+--------------------------------------------+------------+
| Table | Size in MB |
+--------------------------------------------+------------+
| nagios_acknowledgements | 2.42 |
| nagios_commands | 0.06 |
| nagios_commenthistory | 62.00 |
| nagios_comments | 0.88 |
| nagios_configfiles | 0.00 |
| nagios_configfilevariables | 0.01 |
| nagios_conninfo | 1.04 |
| nagios_contact_addresses | 0.00 |
| nagios_contact_notificationcommands | 0.04 |
| nagios_contactgroup_members | 0.13 |
| nagios_contactgroups | 0.04 |
| nagios_contactnotificationmethods | 926.51 |
| nagios_contactnotifications | 979.90 |
| nagios_contacts | 0.05 |
| nagios_contactstatus | 0.03 |
| nagios_customvariables | 0.49 |
| nagios_customvariablestatus | 0.54 |
| nagios_dbversion | 0.00 |
| nagios_downtimehistory | 13.87 |
| nagios_eventhandlers | 0.00 |
| nagios_externalcommands | 0.02 |
| nagios_flappinghistory | 59.42 |
| nagios_host_contactgroups | 0.05 |
| nagios_host_contacts | 0.01 |
| nagios_host_parenthosts | 0.00 |
| nagios_hostchecks | 0.00 |
| nagios_hostdependencies | 0.00 |
| nagios_hostescalation_contactgroups | 0.00 |
| nagios_hostescalation_contacts | 0.00 |
| nagios_hostescalations | 0.00 |
| nagios_hostgroup_members | 0.12 |
| nagios_hostgroups | 0.03 |
| nagios_hosts | 0.19 |
| nagios_hoststatus | 0.46 |
| nagios_instances | 0.00 |
| nagios_logentries | 4005.71 |
| nagios_notifications | 277.10 |
| nagios_objects | 10.09 |
| nagios_processevents | 0.32 |
| nagios_programstatus | 0.00 |
| nagios_runtimevariables | 0.00 |
| nagios_scheduleddowntime | 0.02 |
| nagios_service_contactgroups | 0.35 |
| nagios_service_contacts | 0.04 |
| nagios_service_parentservices | 0.00 |
| nagios_servicechecks | 0.00 |
| nagios_servicedependencies | 0.00 |
| nagios_serviceescalation_contactgroups | 0.00 |
| nagios_serviceescalation_contacts | 0.00 |
| nagios_serviceescalations | 0.00 |
| nagios_servicegroup_members | 0.00 |
| nagios_servicegroups | 0.00 |
| nagios_services | 1.21 |
| nagios_servicestatus | 3.54 |
| nagios_statehistory | 1838.37 |
| nagios_systemcommands | 0.06 |
| nagios_timedeventqueue | 0.00 |
| nagios_timedevents | 0.00 |
| nagios_timeperiod_timeranges | 0.27 |
| nagios_timeperiods | 0.06 |
| tbl_command | 0.09 |
| tbl_contact | 0.06 |
| tbl_contactgroup | 0.07 |
| tbl_contacttemplate | 0.01 |
| tbl_domain | 0.01 |
| tbl_host | 0.15 |
| tbl_hostdependency | 0.00 |
| tbl_hostescalation | 0.00 |
| tbl_hostextinfo | 0.00 |
| tbl_hostgroup | 0.09 |
| tbl_hosttemplate | 0.01 |
| tbl_info | 0.13 |
| tbl_lnkContactToCommandHost | 0.00 |
| tbl_lnkContactToCommandService | 0.00 |
| tbl_lnkContactToContactgroup | 0.12 |
| tbl_lnkContactToContacttemplate | 0.02 |
| tbl_lnkContactToVariabledefinition | 0.00 |
| tbl_lnkContactgroupToContact | 0.01 |
| tbl_lnkContactgroupToContactgroup | 0.00 |
| tbl_lnkContacttemplateToCommandHost | 0.00 |
| tbl_lnkContacttemplateToCommandService | 0.00 |
| tbl_lnkContacttemplateToContactgroup | 0.00 |
| tbl_lnkContacttemplateToContacttemplate | 0.00 |
| tbl_lnkContacttemplateToVariabledefinition | 0.00 |
| tbl_lnkHostToContact | 0.00 |
| tbl_lnkHostToContactgroup | 0.04 |
| tbl_lnkHostToHost | 0.00 |
| tbl_lnkHostToHostgroup | 0.07 |
| tbl_lnkHostToHosttemplate | 0.04 |
| tbl_lnkHostToVariabledefinition | 0.26 |
| tbl_lnkHostdependencyToHost_DH | 0.00 |
| tbl_lnkHostdependencyToHost_H | 0.00 |
| tbl_lnkHostdependencyToHostgroup_DH | 0.00 |
| tbl_lnkHostdependencyToHostgroup_H | 0.00 |
| tbl_lnkHostescalationToContact | 0.00 |
| tbl_lnkHostescalationToContactgroup | 0.00 |
| tbl_lnkHostescalationToHost | 0.00 |
| tbl_lnkHostescalationToHostgroup | 0.00 |
| tbl_lnkHostgroupToHost | 0.00 |
| tbl_lnkHostgroupToHostgroup | 0.00 |
| tbl_lnkHosttemplateToContact | 0.00 |
| tbl_lnkHosttemplateToContactgroup | 0.00 |
| tbl_lnkHosttemplateToHost | 0.00 |
| tbl_lnkHosttemplateToHostgroup | 0.00 |
| tbl_lnkHosttemplateToHosttemplate | 0.00 |
| tbl_lnkHosttemplateToVariabledefinition | 0.00 |
| tbl_lnkServiceToContact | 0.00 |
| tbl_lnkServiceToContactgroup | 0.00 |
| tbl_lnkServiceToHost | 0.07 |
| tbl_lnkServiceToHostgroup | 0.00 |
| tbl_lnkServiceToServicegroup | 0.00 |
| tbl_lnkServiceToServicetemplate | 0.12 |
| tbl_lnkServiceToVariabledefinition | 0.01 |
| tbl_lnkServicedependencyToHost_DH | 0.00 |
| tbl_lnkServicedependencyToHost_H | 0.00 |
| tbl_lnkServicedependencyToHostgroup_DH | 0.00 |
| tbl_lnkServicedependencyToHostgroup_H | 0.00 |
| tbl_lnkServicedependencyToService_DS | 0.00 |
| tbl_lnkServicedependencyToService_S | 0.00 |
| tbl_lnkServiceescalationToContact | 0.00 |
| tbl_lnkServiceescalationToContactgroup | 0.00 |
| tbl_lnkServiceescalationToHost | 0.00 |
| tbl_lnkServiceescalationToHostgroup | 0.00 |
| tbl_lnkServiceescalationToService | 0.00 |
| tbl_lnkServicegroupToService | 0.00 |
| tbl_lnkServicegroupToServicegroup | 0.00 |
| tbl_lnkServicetemplateToContact | 0.00 |
| tbl_lnkServicetemplateToContactgroup | 0.00 |
| tbl_lnkServicetemplateToHost | 0.00 |
| tbl_lnkServicetemplateToHostgroup | 0.00 |
| tbl_lnkServicetemplateToServicegroup | 0.00 |
| tbl_lnkServicetemplateToServicetemplate | 0.00 |
| tbl_lnkServicetemplateToVariabledefinition | 0.00 |
| tbl_lnkTimeperiodToTimeperiod | 0.00 |
| tbl_logbook | 0.00 |
| tbl_mainmenu | 0.00 |
| tbl_service | 0.37 |
| tbl_servicedependency | 0.00 |
| tbl_serviceescalation | 0.00 |
| tbl_serviceextinfo | 0.00 |
| tbl_servicegroup | 0.01 |
| tbl_servicetemplate | 0.02 |
| tbl_settings | 0.00 |
| tbl_submenu | 0.00 |
| tbl_timedefinition | 0.25 |
| tbl_timeperiod | 0.12 |
| tbl_user | 0.01 |
| tbl_variabledefinition | 0.55 |
+--------------------------------------------+------------+
[nagios@XXXXXXX ~]$


[nagios@XXXXXXX ~]$ grep mysql /usr/local/nagiosxi/html/config.inc.php | wc -l
2


[nagios@XXXXXXX ~]$ echo "SELECT relname as Table, pg_size_pretty(pg_total_relation_size(relid)) As Size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as ExternalSize FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;" | psql nagiosxi nagiosxi
    table     |    size    | externalsize
--------------+------------+--------------
 xi_meta      | 35 MB      | 5352 kB
 xi_usermeta  | 5400 kB    | 3232 kB
 xi_auditlog  | 3096 kB    | 2536 kB
 xi_events    | 1880 kB    | 1824 kB
 xi_users     | 224 kB     | 112 kB
 xi_commands  | 160 kB     | 152 kB
 xi_sysstat   | 120 kB     | 72 kB
 xi_options   | 88 kB      | 72 kB
 xi_incidents | 8192 bytes | 8192 bytes
(9 rows)

You have new mail in /var/spool/mail/nagios
[nagios@XXXXXXX ~]$

Re: Server(XXXXXX): IPCS Queue is WARNING state to frequentl

Posted: Tue Aug 06, 2019 4:57 pm
by ssax
The large tables can definitely have a performance impact. I'd look at truncating the 4 GB one below, as it holds duplicate data (already logged to /usr/local/nagios/var/nagios.log and archived to /usr/local/nagios/var/archives/). Truncating the table will impact historical data in the Event Log report ONLY (again, duplicate data), and as long as you have XI backups you can always recover it:
| nagios_logentries | 4005.71 |
Follow the "In certain instances, it may be necessary to truncate (empty) one or more tables" section of this guide to truncate the nagios_logentries table:

https://support.nagios.com/kb/article.php?id=24
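
To pick out the large tables from a size listing like the one posted earlier in the thread, the `mysql --table` output can be filtered by size. A rough sketch; the function name `large_tables` and the 100 MB threshold in the example are arbitrary choices, not something from the KB article:

```shell
# Read `mysql --table` output (| Table | Size in MB |) on stdin and print
# "<size> <table>" for every table at or above the threshold, largest first.
large_tables() {
    min_mb=$1
    awk -F'|' -v min="$min_mb" '
        NF >= 3 {
            name = $2; size = $3
            gsub(/ /, "", name); gsub(/ /, "", size)
            # Skip the header row and anything that is not a plain number
            if (size ~ /^[0-9.]+$/ && size + 0 >= min + 0) print size, name
        }
    ' | sort -rn
}

# Example: pipe the size query from earlier in the thread into the filter
#   echo "SELECT ..." | mysql -h 127.0.0.1 -uroot -pnagiosxi --table | large_tables 100
```

Applied to the listing posted above with a 100 MB threshold, this would leave nagios_logentries, nagios_statehistory, nagios_contactnotifications, nagios_contactnotificationmethods, and nagios_notifications.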

max user processes (-u) 1024
I would also increase your nagios user's max user processes limit to 4096 following this guide:

https://support.nagios.com/kb/article/n ... ng-19.html
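
The limit quoted above is the nagios user's `max user processes (-u) 1024`, while root shows 31360 in the earlier output. As a sketch only (the linked guide is the authoritative procedure), the helpers below check whether a limit is below a target and print entries in pam_limits `limits.conf` syntax. The function names are hypothetical, and where such entries belong (for example a file under /etc/security/limits.d/) is an assumption that depends on the distribution:

```shell
# Succeeds (exit 0) when the current limit is numerically below the target.
# "unlimited" is never treated as too low.
limit_too_low() {
    current=$1
    target=$2
    [ "$current" = "unlimited" ] && return 1
    [ "$current" -lt "$target" ]
}

# Print soft and hard nproc entries in limits.conf syntax:
#   <domain> <type> <item> <value>
nproc_limits_entries() {
    user=$1
    value=$2
    printf '%s soft nproc %s\n' "$user" "$value"
    printf '%s hard nproc %s\n' "$user" "$value"
}

# Example (run as the nagios user):
#   limit_too_low "$(ulimit -u)" 4096 && nproc_limits_entries nagios 4096
```

The new limit only applies to sessions started after the change, so the nagios user's `ulimit -a` should be checked again from a fresh login.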

Re: Server(XXXXXX): IPCS Queue is WARNING state to frequentl

Posted: Tue Aug 13, 2019 10:06 am
by rtsupport
Hello Team,

Is there any way to truncate only the logs for a specific period, and what will the impact be if we truncate the table you recommended?

What impact will there be if we truncate the logs for a specific period only?

Re: Server(XXXXXX): IPCS Queue is WARNING state to frequentl

Posted: Tue Aug 13, 2019 5:04 pm
by ssax
Is there any way to truncate only the logs for a specific period, and what will the impact be if we truncate the table you recommended?
Yes, you'd need to construct a SQL query to do it.
What impact will there be if we truncate the logs for a specific period only?
That depends on the tables you are truncating.

The information was included in the previous message. You can look at how dbmain.php does it to construct your queries, or I can take a look tomorrow and try to do it.

I recommend you work only on the nagios_notifications, nagios_logentries, and nagios_statehistory tables, as these affect their respective reports ONLY.
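
As an illustration of what such a query could look like, the sketch below builds a DELETE statement that removes rows older than a cutoff date. It is not taken from dbmain.php. The helper name `build_purge_sql` is hypothetical, and the timestamp column (`logentry_time` for nagios_logentries) and the database name `nagios` are assumptions that should be confirmed against the actual schema, for example with `DESCRIBE nagios_logentries;`, before anything is deleted:

```shell
# Build a DELETE statement that removes rows older than a cutoff date.
# Arguments: table name, timestamp column, cutoff date (YYYY-MM-DD).
# The column name is an assumption; verify it against your schema first.
build_purge_sql() {
    table=$1
    column=$2
    cutoff=$3
    printf "DELETE FROM %s WHERE %s < '%s 00:00:00';\n" "$table" "$column" "$cutoff"
}

# Example: print the statement and review it before running anything
#   build_purge_sql nagios_logentries logentry_time 2019-01-01
# Then, after taking an XI backup, pipe it to mysql with the same
# connection options used earlier in the thread:
#   build_purge_sql nagios_logentries logentry_time 2019-01-01 | mysql -h 127.0.0.1 -uroot -pnagiosxi nagios
```

Unlike a full truncate, a DELETE over a 4 GB table can take a while, and disk space may not be returned until the table is optimized or rebuilt.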