Re: Scheduling very unstable
Posted: Fri Apr 06, 2018 1:51 pm
by cdienger
Try increasing the memory limit to 1024M or 2048M in php.ini and restarting the httpd service again. Digging a bit deeper in the code, I don't see where much else could go wrong like this unless the nagios database is also having issues and the full queue can't be pulled. Based on the profile though I would say the php memory problem is more likely.
The table that is queried for this chart is nagios_timedeventqueue. To check it, the following can be run (note the rows returned):
echo "select * from nagios_timedeventqueue" | mysql -uroot -pnagiosxi -Dnagios
https://assets.nagios.com/downloads/nag ... tabase.pdf covers repairing the database and although I don't think at this time it would be the problem, it wouldn't hurt to run the repair script anyway.
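To confirm which memory_limit is actually in effect before and after the edit, something like the following can help. This is only a sketch: the helper name php_memory_limit is made up for illustration, and the /etc/php.ini path in the usage comment is an assumption that may differ on your system.

```shell
# Print the last active memory_limit value from a php.ini-style file.
# Hypothetical helper; example call (path is an assumption):
#   php_memory_limit /etc/php.ini
php_memory_limit() {
    # Lines commented out with ';' do not match, so they are ignored.
    # If the setting appears more than once, the last assignment is printed.
    grep -E '^[[:space:]]*memory_limit[[:space:]]*=' "$1" \
        | tail -n 1 \
        | sed -E 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*(;.*)?$//'
}
```

After changing the value, httpd still has to be restarted for the new limit to apply.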
Re: Scheduling very unstable
Posted: Sun Apr 08, 2018 7:57 pm
by rajasegar
cdienger wrote:Try increasing the memory limit to 1024M or 2048M in php.ini and restarting the httpd service again. Digging a bit deeper in the code, I don't see where much else could go wrong like this unless the nagios database is also having issues and the full queue can't be pulled. Based on the profile though I would say the php memory problem is more likely.
The table that is queried for this chart is nagios_timedeventqueue. To check it, the following can be run (note the rows returned):
echo "select * from nagios_timedeventqueue" | mysql -uroot -pnagiosxi -Dnagios
https://assets.nagios.com/downloads/nag ... tabase.pdf covers repairing the database and although I don't think at this time it would be the problem, it wouldn't hurt to run the repair script anyway.
echo "select * from nagios_timedeventqueue" | mysql -uroot -pnagiosxi -Dnagios --host=10.17.19.237
This returns Empty set (0.00 sec). Scheduling seems fine now. Can you please double check?
Re: Scheduling very unstable
Posted: Mon Apr 09, 2018 12:52 pm
by scottwilkerson
rajasegar wrote:Scheduling seems fine now. Can you please double check?
There is not a lot to check if it all seems stable.
I have noted this behavior a few times in the past on systems with mod_gearman if they somehow spawn 2 nagios parent processes, but if you restart several things before we get any logs it's hard to tell. If you notice it again, it would be great if you could create a profile BEFORE trying to take any corrective action, and open a ticket with that for analysis.
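A rough way to spot a second nagios parent process is to count the nagios processes whose parent is PID 1. This is a sketch under assumptions: the helper name count_nagios_parents is hypothetical, and it assumes the daemonized parent is owned by PID 1, which may not hold in containers or unusual init setups.

```shell
# Count nagios processes whose parent PID is 1 (top-level daemons).
# Reads "PID PPID COMMAND" rows on stdin, for example from:
#   ps -eo pid,ppid,comm | count_nagios_parents
count_nagios_parents() {
    # The ps header row is skipped naturally because its third column
    # is "COMMAND", not "nagios".
    awk '$3 == "nagios" && $2 == 1 { n++ } END { print n + 0 }'
}
```

A result greater than 1 would be worth capturing in a profile before restarting anything.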
Re: Scheduling very unstable
Posted: Mon Apr 09, 2018 6:49 pm
by rajasegar
scottwilkerson wrote:rajasegar wrote:Scheduling seems fine now. Can you please double check?
There is not a lot to check if it all seems stable.
I have noted this behavior a few times in the past on systems with mod_gearman if they somehow spawn 2 nagios parent processes, but if you restart several things before we get any logs it's hard to tell. If you notice it again, it would be great if you could create a profile BEFORE trying to take any corrective action, and open a ticket with that for analysis.
Ok noted. Will do that if it happens again.
As I updated previously I disabled the mod_gearman in nagios.cfg.
Code:
#broker_module=/usr/lib64/mod_gearman2/mod_gearman2.o config=/etc/mod_gearman2/module.conf eventhandler=no
Re: Scheduling very unstable
Posted: Tue Apr 10, 2018 4:32 am
by rajasegar
This time another server is having the issue. I took the system profile before restarting.
18 cores, 20GB. DB not offloaded.
Capture_XI3.JPG
profile_xi3.zip
Re: Scheduling very unstable
Posted: Tue Apr 10, 2018 8:28 am
by scottwilkerson
Your system log is showing these errors:
Code:
ndo2db: Warning: queue send error, retrying...
Please follow this document and perform all the changes
https://support.nagios.com/kb/article/n ... d-139.html
Re: Scheduling very unstable
Posted: Tue Apr 10, 2018 6:39 pm
by rajasegar
Already done this before. Here is the current config.
I can keep increasing it but the problem will still persist like in the other server before.
Code:
# Controls the default maximum size of a message queue
#kernel.msgmnb = 431072000
kernel.msgmnb = 352144000
# Controls the maximum size of a message, in bytes
#kernel.msgmax = 431072000
kernel.msgmax = 352144000
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 4294967295
# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 268435456
# The maximum number of messages allowed in any one message queue
#kernel.msgmni = 356000
kernel.msgmni = 512000
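One thing that can be checked here is whether the running kernel has picked up the values from the file. A minimal sketch, assuming the settings live in /etc/sysctl.conf; the helper name sysctl_conf_value is made up for illustration.

```shell
# Print the last active (uncommented) value of a key in a
# sysctl.conf-style file. Prints nothing if the key is not set.
# Hypothetical helper; example comparison (file path is an assumption):
#   sysctl_conf_value kernel.msgmnb /etc/sysctl.conf
#   sysctl -n kernel.msgmnb
sysctl_conf_value() {
    awk -F'=' -v k="$1" '
        /^[ \t]*#/ { next }                 # skip commented-out lines
        {
            name = $1; gsub(/[ \t]/, "", name)
            if (name == k) { val = $2; gsub(/[ \t]/, "", val); found = val }
        }
        END { if (found != "") print found }
    ' "$2"
}
```

If the two numbers differ, the file has not been applied to the running kernel yet. ipcs -q lists the message queues currently in use, which can help show whether the ndo2db queue is actually filling up.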
Re: Scheduling very unstable
Posted: Wed Apr 11, 2018 10:06 am
by scottwilkerson
We may want to try increasing
to
The server is getting to the point that it could really benefit from offloading the MySQL databases, to reduce some of the load on the local machine and move the simultaneous processing to a second server.
https://assets.nagios.com/downloads/nag ... Server.pdf
Re: Scheduling very unstable
Posted: Mon Apr 30, 2018 4:03 am
by rajasegar
The problem came back. This has been driving me nuts the whole day.
Capture.JPG
profile.zip
Repaired the databases, restarted services, restarted App and DB servers.
Still same problem.
Please advise what else to check.
# Controls the default maximum size of a message queue
kernel.msgmnb = 352144000
# Controls the maximum size of a message, in bytes
kernel.msgmax = 352144000
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 4294967295
# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 268435456
# The maximum number of messages allowed in any one message queue
kernel.msgmni = 768000
Re: Scheduling very unstable
Posted: Mon Apr 30, 2018 1:38 pm
by tgriep
Can you run the following commands on the nagios server and post the output to the ticket? If the username, password, database name or host of your MySQL server differ from the examples below, please change them.
Code:
/usr/local/nagios/bin/nagiostats
echo 'SELECT COUNT(*) AS total FROM nagios_hoststatus WHERE TRUE AND `active_checks_enabled`=1 AND (TIMESTAMPDIFF(SECOND,nagios_hoststatus.last_check,NOW()) < 60);' |mysql -u nagios -pnagios --databases nagios -h nagiosproddb1
echo 'SELECT COUNT(*) AS total FROM nagios_hoststatus WHERE TRUE AND `active_checks_enabled`=1 AND (TIMESTAMPDIFF(SECOND,nagios_hoststatus.last_check,NOW()) < 300);' |mysql -u nagios -pnagios --databases nagios -h nagiosproddb1
echo 'SELECT COUNT(*) AS total FROM nagios_hoststatus WHERE TRUE AND `active_checks_enabled`=1 AND (TIMESTAMPDIFF(SECOND,nagios_hoststatus.last_check,NOW()) < 900);' |mysql -u nagios -pnagios --databases nagios -h nagiosproddb1
echo 'SELECT COUNT(*) AS total FROM nagios_servicestatus WHERE TRUE AND `active_checks_enabled`=1 AND (TIMESTAMPDIFF(SECOND,nagios_servicestatus.last_check,NOW()) < 60);' |mysql -u nagios -pnagios --databases nagios -h nagiosproddb1
echo 'SELECT COUNT(*) AS total FROM nagios_servicestatus WHERE TRUE AND `active_checks_enabled`=1 AND (TIMESTAMPDIFF(SECOND,nagios_servicestatus.last_check,NOW()) < 300);' |mysql -u nagios -pnagios --databases nagios -h nagiosproddb1
echo 'SELECT COUNT(*) AS total FROM nagios_servicestatus WHERE TRUE AND `active_checks_enabled`=1 AND (TIMESTAMPDIFF(SECOND,nagios_servicestatus.last_check,NOW()) < 900);' |mysql -u nagios -pnagios --databases nagios -h nagiosproddb1
What this will do is print out what Nagios Core has checked and what is in the MySQL database, so we can compare the information.
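Since the six queries differ only in the table name and the time window, they could also be generated with a small helper. This is just a sketch: the function name recent_check_query is hypothetical, and the credentials and host in the commented loop are the same example values as above, so adjust them for your setup.

```shell
# Build the "checked within the last N seconds" count query.
# $1 = nagios_hoststatus or nagios_servicestatus, $2 = window in seconds
recent_check_query() {
    printf 'SELECT COUNT(*) AS total FROM %s WHERE active_checks_enabled=1 AND (TIMESTAMPDIFF(SECOND,%s.last_check,NOW()) < %s);\n' "$1" "$1" "$2"
}

# Example loop (commented out; needs a reachable MySQL server):
# for t in nagios_hoststatus nagios_servicestatus; do
#     for s in 60 300 900; do
#         recent_check_query "$t" "$s" | mysql -u nagios -pnagios -h nagiosproddb1 nagios
#     done
# done
```

The totals for 60, 300 and 900 seconds can then be compared against the active check counts that nagiostats reports for similar time frames.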