Scheduling very unstable

rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Scheduling very unstable

Post by rajasegar »

The instance was totally jammed this morning. I captured the profile before and after running Apply Configuration, and then ran the commands you gave.
Capture4.JPG
profile_before_restart.zip
profile_after restart.zip

Code: Select all

[nagios@nagiosprodxi1 debug]$ /usr/local/nagios/bin/nagiostats

Nagios Stats 4.2.4
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 12-07-2016
License: GPL

CURRENT STATUS DATA
------------------------------------------------------
Status File:                            /var/nagiosramdisk/status.dat
Status File Age:                        0d 0h 0m 3s
Status File Version:                    4.2.4

Program Running Time:                   0d 0h 32m 44s
Nagios PID:                             14327

Total Services:                         10199
Services Checked:                       10199
Services Scheduled:                     10171
Services Actively Checked:              10171
Services Passively Checked:             28
Total Service State Change:             0.000 / 55.070 / 0.115 %
Active Service Latency:                 0.000 / 73.258 / 0.424 sec
Active Service Execution Time:          0.008 / 91.019 / 1.155 sec
Active Service State Change:            0.000 / 55.070 / 0.098 %
Active Services Last 1/5/15/60 min:     1734 / 8261 / 8597 / 8846
Passive Service Latency:                0.080 / 8.698 / 0.962 sec
Passive Service State Change:           6.120 / 6.250 / 6.227 %
Passive Services Last 1/5/15/60 min:    0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit:              9601 / 300 / 76 / 222
Services Flapping:                      25
Services In Downtime:                   85

Total Hosts:                            1382
Hosts Checked:                          1382
Hosts Scheduled:                        1382
Hosts Actively Checked:                 1382
Host Passively Checked:                 0
Total Host State Change:                0.000 / 0.000 / 0.000 %
Active Host Latency:                    0.000 / 1.016 / 0.006 sec
Active Host Execution Time:             0.002 / 10.003 / 0.100 sec
Active Host State Change:               0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min:        260 / 1382 / 1382 / 1382
Passive Host Latency:                   0.000 / 0.000 / 0.000 sec
Passive Host State Change:              0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min:       0 / 0 / 0 / 0
Hosts Up/Down/Unreach:                  1375 / 7 / 0
Hosts Flapping:                         0
Hosts In Downtime:                      12

Active Host Checks Last 1/5/15 min:     310 / 1945 / 5900
   Scheduled:                           265 / 1737 / 5321
   On-demand:                           45 / 208 / 579
   Parallel:                            265 / 1737 / 5321
   Serial:                              0 / 0 / 0
   Cached:                              45 / 208 / 579
Passive Host Checks Last 1/5/15 min:    0 / 0 / 0
Active Service Checks Last 1/5/15 min:  1837 / 8549 / 25707
   Scheduled:                           1837 / 8549 / 25707
   On-demand:                           0 / 0 / 0
   Cached:                              0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 0 / 0

External Commands Last 1/5/15 min:      0 / 0 / 0


[nagios@nagiosprodxi1 debug]$ echo 'SELECT COUNT(*) AS total FROM nagios_hoststatus  WHERE TRUE AND`active_checks_enabled`=1 AND (TIMESTAMPDIFF(SECOND,nagios_hoststatus.last_check,NOW()) < 60);' | mysql --user=nagios --password=nagiosdb --host=nagiosproddb1 --database=nagios
total
0

[nagios@nagiosprodxi1 debug]$ echo 'SELECT COUNT(*) AS total FROM nagios_hoststatus  WHERE TRUE AND`active_checks_enabled`=1 AND (TIMESTAMPDIFF(SECOND,nagios_hoststatus.last_check,NOW()) < 300);' | mysql --user=nagios --password=nagiosdb --host=nagiosproddb1 --database=nagios
total
0
[nagios@nagiosprodxi1 debug]$ echo 'SELECT COUNT(*) AS total FROM nagios_hoststatus  WHERE TRUE AND`active_checks_enabled`=1 AND (TIMESTAMPDIFF(SECOND,nagios_hoststatus.last_check,NOW()) < 900);' | mysql --user=nagios --password=nagiosdb --host=nagiosproddb1 --database=nagios
total
1296

[nagios@nagiosprodxi1 debug]$ echo 'SELECT COUNT(*) AS total FROM nagios_servicestatus  WHERE TRUE AND`active_checks_enabled`=1 AND (TIMESTAMPDIFF(SECOND,nagios_servicestatus.last_check,NOW()) < 60);' | mysql --user=nagios --password=nagiosdb --host=nagiosproddb1 --database=nagios
total
0

[nagios@nagiosprodxi1 debug]$ echo 'SELECT COUNT(*) AS total FROM nagios_servicestatus  WHERE TRUE AND`active_checks_enabled`=1 AND (TIMESTAMPDIFF(SECOND,nagios_servicestatus.last_check,NOW()) < 300);' | mysql --user=nagios --password=nagiosdb --host=nagiosproddb1 --database=nagios
total
0

[nagios@nagiosprodxi1 debug]$ echo 'SELECT COUNT(*) AS total FROM nagios_servicestatus  WHERE TRUE AND`active_checks_enabled`=1 AND (TIMESTAMPDIFF(SECOND,nagios_servicestatus.last_check,NOW()) < 900);' | mysql --user=nagios --password=nagiosdb --host=nagiosproddb1 --database=nagios
total
7268

[nagios@nagiosprodxi1 debug]$
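
The six queries above differ only in the table and the time window, so they can be generated in a loop and compared against the "Last 1/5/15 min" lines from nagiostats. This is a minimal sketch, not the commands from the thread: the function names are made up for illustration, the `WHERE TRUE AND` prefix is dropped as redundant, and the script only prints the SQL so that credentials stay out of the command line.

```shell
#!/usr/bin/env bash
# Sketch only: prints the freshness queries instead of running them.

# Build the COUNT(*) query for one NDOUtils status table and one time window.
# $1 = table name (nagios_hoststatus or nagios_servicestatus)
# $2 = window in seconds
freshness_sql() {
    local table=$1 window=$2
    printf 'SELECT COUNT(*) AS total FROM %s WHERE active_checks_enabled=1 AND TIMESTAMPDIFF(SECOND, %s.last_check, NOW()) < %s;\n' \
        "$table" "$table" "$window"
}

# Emit all six queries (two tables, three windows), one per line.
all_freshness_sql() {
    local table window
    for table in nagios_hoststatus nagios_servicestatus; do
        for window in 60 300 900; do
            freshness_sql "$table" "$window"
        done
    done
}

all_freshness_sql
```

The output can be piped into the mysql client against the nagios database. If nagiostats reports hundreds of checks in the last minute while the 60-second count stays at 0, the database is lagging behind the core.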


5 x Nagios 5.6.9 Enterprise Edition
RHEL 6 & 7
rrdcached & ramdisk optimisation
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Post by tgriep »

It looks like the ndo2db process may not be updating the data on the MySQL server and is queueing the data instead.
I see that the MySQL server has been offloaded to a remote server. One thing to try is to set up Jumbo Frames on the network interfaces between the XI server and the remote MySQL server.
That increases the size of each data packet, so there will be fewer packets on the network, which should speed up the updates to the MySQL server.

Can you enable that on the servers?
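
Before changing any interface settings, it may help to check what the path between the two servers carries today. A minimal sketch, assuming IPv4 and the Linux iputils ping; the function names are illustrative, and 9000 is the conventional jumbo MTU rather than a value from this thread. The script only prints the probe commands.

```shell
#!/usr/bin/env bash
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    local mtu=$1
    echo $(( mtu - 28 ))
}

# Print (do not run) a ping that fails if the path cannot carry the MTU.
# "-M do" sets the don't-fragment bit (Linux iputils ping).
mtu_probe_cmd() {
    local host=$1 mtu=$2
    echo "ping -c 3 -M do -s $(icmp_payload_for_mtu "$mtu") $host"
}

mtu_probe_cmd nagiosproddb1 9000   # jumbo-frame probe
mtu_probe_cmd nagiosproddb1 1500   # standard Ethernet probe
```

If the 9000-byte probe fails while the 1500-byte probe succeeds, jumbo frames are not enabled end to end.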

What are the specifications of the MySQL server and the XI server?
What is the speed of the CPUs allocated to both?
To find out, run the following on both the MySQL server and the XI server and post the output here.

Code: Select all

lscpu
Can you increase the speed of the CPUs allocated to the servers?
Be sure to check out our Knowledgebase for helpful articles and solutions!

Post by rajasegar »

The CPUs are all OK, and in fact underutilised, on both the app and DB servers.
DB1_CPULoadStat.JPG
DB1_CPULoad.JPG
Setting up jumbo frames is a no-go, as the network team will point the issue back at the applications.
It also does not explain why everything is OK for some time and then starts acting up.

Is there something else to check on the NDO2DB issue?

Code: Select all

[nagios@nagiosprodxi1 debug]$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                20
On-line CPU(s) list:   0-19
Thread(s) per core:    1
Core(s) per socket:    20
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 45
Stepping:              7
CPU MHz:               2893.028
BogoMIPS:              5786.05
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-19
You have new mail in /var/spool/mail/nagios
[nagios@nagiosprodxi1 debug]$


[nagios@nagiosproddb1 ~]$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 45
Stepping:              7
CPU MHz:               2893.028
BogoMIPS:              5786.05
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-3
[nagios@nagiosproddb1 ~]$
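
The two listings above differ mainly in core count (20 on the XI server, 4 on the DB server) at the same clock speed. For a side-by-side comparison, the relevant fields can be pulled out with a small filter. This is a sketch with a made-up function name; it only reads the `CPU(s)` and `CPU MHz` lines in the format shown above.

```shell
#!/usr/bin/env bash
# Reduce lscpu output (read from stdin) to "cpus=<n> mhz=<speed>".
# Only lines that start with "CPU(s):" or "CPU MHz:" are used, so
# "On-line CPU(s) list" and "NUMA node0 CPU(s)" are ignored.
lscpu_summary() {
    awk -F: '
        /^CPU\(s\):/ { gsub(/[ \t]/, "", $2); cpus = $2 }
        /^CPU MHz:/  { gsub(/[ \t]/, "", $2); mhz  = $2 }
        END { printf "cpus=%s mhz=%s\n", cpus, mhz }
    '
}

# Summarise the local machine when lscpu is available.
if command -v lscpu >/dev/null 2>&1; then
    lscpu | lscpu_summary
fi
```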




Post by rajasegar »

Do you have any instructions on how to move the offloaded DB back to the local server?
Restarting continuously is not practical.

Thanks
tacolover101
Posts: 432
Joined: Mon Apr 10, 2017 11:55 am

Post by tacolover101 »

stop nagios, then ndo2db, then sql.

you need to dump the remote DB, and then import it locally.

reverse engineer the instructions for offloading to apply to local once again.

then start the services back up again.

Post by tgriep »

When you migrated the MySQL database, did you delete the MySQL databases on the Nagios server or remove the MySQL server package?

The instructions below assume that MySQL Server is still installed on the Nagios server and that the databases exist.

Run this as root to move the MySQL databases back:

Code: Select all

service nagios stop
service ndo2db stop
service mysqld start

Code: Select all

mysqldump -u nagios -p'nagios' -h nagiosproddb1 nagios | mysql -u root -p'nagiosxi' nagios
mysqldump -u nagiosql -p'nagiosql' -h nagiosproddb1 nagiosql | mysql -u root -p'nagiosxi' nagiosql
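
One caveat with piping mysqldump straight into mysql: by default the shell reports only the exit status of the last command in a pipeline, so a dump that fails partway can still be imported without any error. A more cautious variant dumps to a file first and imports only if the dump succeeded. This is a sketch under assumptions: `copy_db` and the `MYSQLDUMP`/`MYSQL` override variables are made up for illustration, and the user and password options are left out on the assumption that they come from option files, so they would need adding for the accounts used above.

```shell
#!/usr/bin/env bash
# Sketch: dump to a temporary file first, import only if the dump succeeded.
# MYSQLDUMP and MYSQL can be overridden, which is handy for a dry run.
# $1 = source (remote) host, $2 = database name
copy_db() {
    local src_host=$1 db=$2 dump_file status=0
    dump_file=$(mktemp) || return 1
    if "${MYSQLDUMP:-mysqldump}" -h "$src_host" "$db" > "$dump_file"; then
        # Dump finished cleanly: load it into the local server.
        "${MYSQL:-mysql}" "$db" < "$dump_file" || status=$?
    else
        status=$?
        echo "dump of $db from $src_host failed; nothing imported" >&2
    fi
    rm -f "$dump_file"
    return "$status"
}
```

Usage would be `copy_db nagiosproddb1 nagios` followed by `copy_db nagiosproddb1 nagiosql`.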
Edit the /usr/local/nagios/etc/ndo2db.cfg file and change this line from

Code: Select all

db_host=nagiosproddb1
to

Code: Select all

db_host=localhost
Find these lines and change them from

Code: Select all

db_user=nagios
db_pass=nagios
to

Code: Select all

db_user=ndoutils
db_pass=n@gweb
Save the file.
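
The same three changes to ndo2db.cfg can be scripted instead of edited by hand. A minimal sketch, assuming the file uses plain `key=value` lines with no spaces around the `=`, as in the snippets above; `set_cfg_value` is a made-up helper, and it keeps the first untouched copy as `<file>.bak`.

```shell
#!/usr/bin/env bash
# Set key=value in a simple "key=value" config file.
# Only lines that already start with "key=" are rewritten; all other lines
# pass through unchanged. The original file is preserved once as FILE.bak.
set_cfg_value() {
    local file=$1 key=$2 value=$3 tmp status
    [ -e "$file.bak" ] || cp -p "$file" "$file.bak" || return 1
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 { print k "=" v; next }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example (not executed here):
#   cfg=/usr/local/nagios/etc/ndo2db.cfg
#   set_cfg_value "$cfg" db_host localhost
#   set_cfg_value "$cfg" db_user ndoutils
#   set_cfg_value "$cfg" db_pass 'n@gweb'
```

Values containing backslashes would need extra care, because awk interprets escape sequences in `-v` assignments.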

Edit the /usr/local/nagiosxi/html/config.inc.php file

Change this section from

Code: Select all

	"ndoutils" => array(
		"dbtype" => 'mysql',
		"dbserver" => 'nagiosproddb1',
		"user" => 'nagios',
to

Code: Select all

    "ndoutils" => array(
        "dbtype" => 'mysql',
        "dbserver" => 'localhost',
        "user" => 'ndoutils',
        "pwd" => 'n@gweb',
Change this section from

Code: Select all

	"nagiosql" => array(
		"dbtype" => 'mysql',
		"dbserver" => 'nagiosproddb1',
		"user" => 'nagiosql',
to

Code: Select all

    "nagiosql" => array(
        "dbtype" => 'mysql',
        "dbserver" => 'localhost',
        "user" => 'nagiosql',
        "pwd" => 'n@gweb',
Save the file


Then edit the /var/www/html/nagiosql/config/settings.php file.
Change this section from

Code: Select all

server = nagiosproddb1
port = 3306
database = nagiosql
username = nagiosql
password = nagiosql
to

Code: Select all

server = localhost
port = 3306
database = nagiosql
username = nagiosql
password = n@gweb
Save the file and start up the ndo2db and nagios processes by running

Code: Select all

service ndo2db start
service nagios start
Log in to the XI GUI and check that the system is working.

Then, to modify the backup script, edit the /root/scripts/automysqlbackup file.
Change line #34 from this:

Code: Select all

DBHOST=nagiosproddb1
to

Code: Select all

DBHOST=localhost
Save the file so the automatic backup will back up the correct server's data.
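
With four files edited by hand, it is easy to miss one reference to the old host. A quick check is to search the edited files for the remote hostname. This is a sketch; `find_remote_refs` is a made-up name, and the file list in the comment is simply the set of files mentioned in the steps above.

```shell
#!/usr/bin/env bash
# List every line in the given files that still mentions the old DB host.
# Prints "file:line:text" for each hit. Exit status is 0 if a hit was found
# and 1 if there were none (grep's usual convention).
# /dev/null is added so grep prints the filename even for a single file.
find_remote_refs() {
    local host=$1
    shift
    grep -n -F -- "$host" "$@" /dev/null
}

# Example (not executed here):
#   find_remote_refs nagiosproddb1 \
#       /usr/local/nagios/etc/ndo2db.cfg \
#       /usr/local/nagiosxi/html/config.inc.php \
#       /var/www/html/nagiosql/config/settings.php \
#       /root/scripts/automysqlbackup
```

No output means none of the listed files still point at the remote server.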

Post by rajasegar »

Thanks, that solved all the issues.

I created the databases and assigned the grants for each database:

Code: Select all

create database nagios;
create database nagiosql;

grant all privileges on *.* to nagiosql@localhost identified by 'n@gweb' with grant option;
grant all privileges on *.* to ndoutils@localhost identified by 'n@gweb' with grant option;
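
The grants above give both accounts every privilege on every database, plus the right to grant privileges to others. If each account only needs its own database (an assumption, not something confirmed in this thread), narrower grants could look like the output of the sketch below. `scoped_grant_sql` is a made-up helper that prints SQL rather than running it, and it keeps the `IDENTIFIED BY` form of GRANT used above, which newer MySQL releases no longer accept.

```shell
#!/usr/bin/env bash
# Print a GRANT limited to one database instead of *.* WITH GRANT OPTION.
# $1 = database, $2 = user, $3 = password (must not contain a single quote)
scoped_grant_sql() {
    local db=$1 user=$2 password=$3
    printf "GRANT ALL PRIVILEGES ON \`%s\`.* TO '%s'@'localhost' IDENTIFIED BY '%s';\n" \
        "$db" "$user" "$password"
}

scoped_grant_sql nagios   ndoutils 'n@gweb'
scoped_grant_sql nagiosql nagiosql 'n@gweb'
```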

Before, there used to be about 80k to 100k messages in the queue; now the count is very low.

Code: Select all

[nagios@nagiosprodxi1 debug]$ ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xba000002 3244032    nagios     600        57344        56
Everything is so fast now. No more offloading the database.
Capture.JPG
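
For anyone who wants to keep an eye on the queue depth over time, the message count can be extracted from the `ipcs -q` output with a short filter. A minimal sketch with a made-up function name; it assumes the six-column layout shown above, with the message count in the last column.

```shell
#!/usr/bin/env bash
# Sum the "messages" column of `ipcs -q` output (read from stdin).
# Data rows are recognised by a key that starts with "0x"; the banner and
# the column header are skipped.
queued_messages() {
    awk '$1 ~ /^0x/ { total += $6 } END { print total + 0 }'
}

# Report the current depth when ipcs is available.
if command -v ipcs >/dev/null 2>&1; then
    ipcs -q | queued_messages
fi
```

A count that keeps climbing into the tens of thousands, as it did before the move, suggests ndo2db is not draining the queue fast enough.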

Post by tgriep »

Thanks for the feedback; glad it worked for you.
If you do not have any further questions, shall I close the post?

Post by rajasegar »

Please close this thread. Thanks
Locked