I want to share a problem that we are having on two Nagios XI servers. We have one old Nagios server with a local database that we are "migrating" to a new one; let's call it OldNagios. We have two other servers with their database offloaded, one for the production environment and one for the integration environment; let's call them NewNagios_PROD and NewNagios_INT.
Architecture: x86_64
CPU(s): 32
RAM: 128GB Used/free (5.3/114)
load average: 3.48, 3.45, 4.88
NagiosXI: 5.6.7
Nagios Core: 4.4.5
The problem only occurs on the servers with an offloaded database.
NewNagios_PROD: Monitoring Engine Status (PROD)
I thought it was a problem with NDO2DB data insertion, but after enabling debug output that shows the SQL statements, the inserted information doesn't show those differences:
Sorry for using quote instead of code, but it is for the bold marks.
[1617802749.793706] [002.0] [pid=22331] INSERT INTO nagios_servicestatus SET instance_id='1', service_object_id='4903', status_update_time=FROM_UNIXTIME(1617802690), output='Out: 61\.28Kbps: In: 7\.88Kbps \(Sent 2\.56Mb, Received 336\.68Kb in 342 seconds\)',..., last_check=FROM_UNIXTIME(1617802421), next_check=FROM_UNIXTIME(1617802716), check_type='0', last_state_change=FROM_UNIXTIME(1617702588), last_hard_state_change=FROM_UNIXTIME(1617702588), last_hard_state='0', last_time_ok=FROM_UNIXTIME(1617802421), last_time_warning=FROM_UNIXTIME(1617702588), last_time_unknown=FROM_UNIXTIME(0), last_time_critical=FROM_UNIXTIME(1617658232), ... ON DUPLICATE KEY UPDATE instance_id='1', service_object_id='4903', status_update_time=FROM_UNIXTIME(1617802690), output='Out: 61\.28Kbps: In: 7\.88Kbps \(Sent 2\.56Mb, Received 336\.68Kb in 342 seconds\)',..., last_check=FROM_UNIXTIME(1617802421), next_check=FROM_UNIXTIME(1617802716), check_type='0', last_state_change=FROM_UNIXTIME(1617702588), last_hard_state_change=FROM_UNIXTIME(1617702588), last_hard_state='0', last_time_ok=FROM_UNIXTIME(1617802421), last_time_warning=FROM_UNIXTIME(1617702588), last_time_unknown=FROM_UNIXTIME(0), last_time_critical=FROM_UNIXTIME(1617658232), ...
The scheduling queue shows the information correctly.
/etc/sysctl.conf:
kernel.msgmnb = 131072000
kernel.msgmax = 131072000
ipcs -q
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x32000040 11         nagios     600        131023872    127953
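To put the ipcs figures next to the queue size limit, here is a minimal Python sketch. It assumes the ipcs output was captured while kernel.msgmnb = 131072000 was in effect; the function names are made up for illustration and are not part of Nagios or NDOUtils.

```python
# Sketch only: parse_ipcs_queues and queue_report are hypothetical helper names.

def parse_ipcs_queues(text):
    """Return one dict per data row found in `ipcs -q` output."""
    queues = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have six fields and start with a hex key such as 0x32000040;
        # the banner and the column header do not.
        if len(fields) == 6 and fields[0].startswith("0x"):
            queues.append({
                "key": fields[0],
                "msqid": int(fields[1]),
                "owner": fields[2],
                "perms": fields[3],
                "used_bytes": int(fields[4]),
                "messages": int(fields[5]),
            })
    return queues


def queue_report(queue, msgmnb):
    """Compare one queue against the per-queue byte limit (kernel.msgmnb)."""
    messages = queue["messages"]
    return {
        "fill_percent": 100.0 * queue["used_bytes"] / msgmnb,
        "avg_message_bytes": queue["used_bytes"] / messages if messages else 0.0,
        "free_bytes": msgmnb - queue["used_bytes"],
    }


# Output copied from this post.
SAMPLE = """\
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x32000040 11         nagios     600        131023872    127953
"""

if __name__ == "__main__":
    for q in parse_ipcs_queues(SAMPLE):
        # Assumption: the limit in effect was the original kernel.msgmnb value.
        r = queue_report(q, msgmnb=131072000)
        print(q["key"], q["owner"],
              "fill=%.2f%%" % r["fill_percent"],
              "avg=%.1f bytes/msg" % r["avg_message_bytes"],
              "free=%d bytes" % r["free_bytes"])
```

With the numbers above, the queue works out to about 99.96% full, with an average of exactly 1024 bytes per message and 48128 bytes free, which is room for only 47 more messages of that size.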
We have had another problem with those queues in the past: while the database is purging the XI metadata, information can't be inserted, and that queue is used to store the pending messages. So I've changed those values, but the problem persists. I don't know why we are saturating those values/queues.
kernel.msgmnb = 796432000
kernel.msgmax = 796432000
kernel.shmmax = 4294967295
kernel.shmall = 268435456
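One thing that may be worth ruling out is whether the new values are actually in effect. Entries in /etc/sysctl.conf only apply after `sysctl -p` or a reboot, and as far as I know a System V message queue takes its byte limit from kernel.msgmnb when it is created, so an existing queue may keep the old limit until the process that created it is restarted. The sketch below compares the configured values with the live ones under /proc/sys. The helper names are hypothetical, and it only handles simple `key = value` lines.

```python
from pathlib import Path

# Sketch only: all function names here are hypothetical helpers.

def parse_sysctl_conf(text):
    """Parse simple `key = value` lines, skipping blanks and comments."""
    wanted = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        wanted[key.strip()] = value.strip()
    return wanted


def proc_path(key, root="/proc/sys"):
    """Map a dotted key such as kernel.msgmnb to /proc/sys/kernel/msgmnb."""
    return Path(root, *key.split("."))


def read_live(keys, root="/proc/sys"):
    """Read the current value of each key; None if the file does not exist."""
    live = {}
    for key in keys:
        path = proc_path(key, root)
        live[key] = path.read_text().strip() if path.exists() else None
    return live


def mismatches(wanted, live):
    """Return {key: (configured, live)} for every value that differs."""
    return {k: (v, live.get(k)) for k, v in wanted.items() if live.get(k) != v}


# Values copied from this post.
CONF = """\
kernel.msgmnb = 796432000
kernel.msgmax = 796432000
kernel.shmmax = 4294967295
kernel.shmall = 268435456
"""

if __name__ == "__main__":
    wanted = parse_sysctl_conf(CONF)
    diff = mismatches(wanted, read_live(wanted))
    for key, (configured, current) in diff.items():
        print("%s: configured %s, live %s" % (key, configured, current))
    if not diff:
        print("all configured values match the running kernel")
```

If the live values already match, comparing used-bytes in `ipcs -q` against the new kernel.msgmnb would show whether the existing queue is really allowed to grow past the old limit.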
We are going to configure jumbo frames, but I'm not sure this will help the situation.
Any help or comment will be welcome.
Thanks in advance.
BR,
Juanma.