This support forum board is for support questions relating to
Nagios XI, our flagship commercial network monitoring solution.
rajasegar
Posts: 1018 Joined: Sun Mar 30, 2014 10:49 pm
by rajasegar » Tue Jul 24, 2018 1:48 am
Previous Thread
https://support.nagios.com/forum/viewto ... 9&start=20
Capture.JPG
Same problem again. Last time we solved a similar issue on another instance by no longer offloading the DB.
We can't do that on this instance, as its DB is not offloaded.
The system profile is attached.
profile.zip
Code: Select all
[nagios@nagiosprodxi3 ~]$ /usr/local/nagios/bin/nagiostats
Nagios Stats 4.2.4
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 12-07-2016
License: GPL
CURRENT STATUS DATA
------------------------------------------------------
Status File: /var/nagiosramdisk/status.dat
Status File Age: 0d 0h 0m 10s
Status File Version: 4.2.4
Program Running Time: 0d 2h 0m 2s
Nagios PID: 15031
Total Services: 16140
Services Checked: 16140
Services Scheduled: 16140
Services Actively Checked: 16140
Services Passively Checked: 0
Total Service State Change: 0.000 / 71.840 / 0.353 %
Active Service Latency: 0.000 / 1.090 / 0.014 sec
Active Service Execution Time: 0.009 / 60.026 / 11.391 sec
Active Service State Change: 0.000 / 71.840 / 0.353 %
Active Services Last 1/5/15/60 min: 1456 / 9702 / 14507 / 16110
Passive Service Latency: 0.000 / 0.000 / 0.000 sec
Passive Service State Change: 0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit: 13269 / 7 / 2564 / 300
Services Flapping: 99
Services In Downtime: 0
Total Hosts: 2998
Hosts Checked: 2998
Hosts Scheduled: 2998
Hosts Actively Checked: 2998
Host Passively Checked: 0
Total Host State Change: 0.000 / 13.750 / 0.082 %
Active Host Latency: 0.000 / 1.113 / 0.013 sec
Active Host Execution Time: 4.032 / 28.313 / 6.700 sec
Active Host State Change: 0.000 / 13.750 / 0.082 %
Active Hosts Last 1/5/15/60 min: 487 / 2956 / 2998 / 2998
Passive Host Latency: 0.000 / 0.000 / 0.000 sec
Passive Host State Change: 0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
Hosts Up/Down/Unreach: 2508 / 490 / 0
Hosts Flapping: 3
Hosts In Downtime: 0
Active Host Checks Last 1/5/15 min: 738 / 4045 / 12363
Scheduled: 735 / 4024 / 12292
On-demand: 3 / 21 / 71
Parallel: 735 / 4024 / 12292
Serial: 0 / 0 / 0
Cached: 3 / 21 / 71
Passive Host Checks Last 1/5/15 min: 0 / 0 / 0
Active Service Checks Last 1/5/15 min: 2037 / 10459 / 33266
Scheduled: 2037 / 10459 / 33266
On-demand: 0 / 0 / 0
Cached: 0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 0 / 0
External Commands Last 1/5/15 min: 0 / 0 / 0
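For reference, individual lines such as the execution-time and latency figures can be pulled out of the `nagiostats` text with a small filter. This is only a sketch: the function name `stat_triplet` is made up for illustration, and it assumes the `label: min / max / avg unit` layout shown in the output above.

```shell
#!/bin/sh
# stat_triplet LABEL
# Reads nagiostats text on stdin and prints "min max avg" for the line that
# starts with LABEL. Assumes the "LABEL: min / max / avg unit" layout.
stat_triplet() {
    awk -v label="$1" '
        index($0, label ":") == 1 {
            rest = substr($0, length(label) + 2)   # text after "LABEL:"
            n = split(rest, part, "/")
            if (n == 3) {
                # "+ 0" turns each field into a number, dropping spaces and the unit
                print part[1] + 0, part[2] + 0, part[3] + 0
            }
        }'
}

# Usage (binary path taken from the post above):
#   /usr/local/nagios/bin/nagiostats | stat_triplet "Active Service Execution Time"
```

Fed the output above, the "Active Service Execution Time" line yields `0.009 60.026 11.391`, which makes it easier to compare runs taken before and after a restart.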
5 x Nagios 5.6.9 Enterprise Edition
RHEL 6 & 7
rrdcached & ramdisk optimisation
scottwilkerson
DevOps Engineer
Posts: 19396 Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
by scottwilkerson » Tue Jul 24, 2018 7:28 am
Let's run the following:
Code: Select all
echo "vacuum;vacuum analyse;vacuum full;"|psql nagiosxi postgres
by rajasegar » Tue Jul 24, 2018 7:46 pm
scottwilkerson wrote: Let's run the following:
Code: Select all
echo "vacuum;vacuum analyse;vacuum full;"|psql nagiosxi postgres
It finished very fast, in about 2 seconds.
Code: Select all
[nagios@nagiosprodxi3 ~]$ echo "vacuum;vacuum analyse;vacuum full;"|psql nagiosxi postgres
VACUUM
VACUUM
VACUUM
After restarting the services the queue rate was very high, which is good, but it then started dropping gradually.
After 6 minutes
Capture1.JPG
After 36 minutes
Capture2.JPG
After about 1 hour it is back to blank.
Capture3.JPG
Any other suggestions?
by scottwilkerson » Wed Jul 25, 2018 8:31 am
Can you send a current profile?
Also, let's run the following:
Code: Select all
echo "select count(*) from xi_events where status_code !=0"|psql nagiosxi nagiosxi
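Since the symptom develops over roughly an hour after a restart, it may help to log this count at intervals rather than run it once. A minimal sketch that reuses the same query: the function names `pending_events` and `log_pending_events` are hypothetical, and `-A`, `-t` and `-c` are the standard `psql` switches for unaligned, tuples-only output of a single command.

```shell
#!/bin/sh
# Print the number of xi_events rows that still have a non-zero status_code.
# -A (unaligned) and -t (tuples only) make psql print just the bare number.
pending_events() {
    psql -At -c "select count(*) from xi_events where status_code != 0" nagiosxi nagiosxi
}

# Print one "timestamp count" line; call it from a loop or from cron.
log_pending_events() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(pending_events)"
}

# Usage: sample once a minute until interrupted.
#   while :; do log_pending_events; sleep 60; done
```

A count that keeps climbing between samples would suggest events are piling up faster than they are processed; a count that stays flat points elsewhere.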
by rajasegar » Wed Jul 25, 2018 6:31 pm
scottwilkerson wrote: Can you send a current profile?
profile (1).zip
Also, let's run the following:
Code: Select all
echo "select count(*) from xi_events where status_code !=0"|psql nagiosxi nagiosxi
Code: Select all
[nagios@nagiosprodxi3 ~]$ echo "select count(*) from xi_events where status_code !=0"|psql nagiosxi nagiosxi
count
-------
6959
(1 row)
by scottwilkerson » Thu Jul 26, 2018 7:29 am
Could you create a new system profile at the time this happens again, before restarting any services?
by rajasegar » Thu Jul 26, 2018 6:35 pm
scottwilkerson wrote: Could you create a new system profile at the time this happens again, before restarting any services?
The profile I posted earlier was taken before the restart.
Anyway, this morning it looks back to normal. This is very frustrating, as it keeps happening.
Capture.JPG
Code: Select all
Last login: Thu Jul 26 11:23:46 2018 from 172.29.2.75
[nagios@nagiosprodxi3 ~]$ echo "select count(*) from xi_events where status_code !=0"|psql nagiosxi nagiosxi
count
-------
4346
(1 row)
Last edited by rajasegar on Sun Jul 29, 2018 7:06 pm, edited 1 time in total.
by scottwilkerson » Fri Jul 27, 2018 7:28 am
Looking at the profile again, it looks like ndo2db wasn't able to write a critical piece of information because the database connection was lost before the write. I'm not sure whether this was part of a system shutdown, but it could have been part of the cause.
Code: Select all
Jul 24 12:51:31 nagiosprodxi3 ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 24 12:51:31 nagiosprodxi3 ndo2db: mysql_error: 'MySQL server has gone away'
Jul 24 12:51:31 nagiosprodxi3 ndo2db: Error: Connection to MySQL database has been lost!
Jul 24 12:51:31 nagiosprodxi3 rrdcached[1289]: caught SIGTERM
Jul 24 12:51:31 nagiosprodxi3 rrdcached[1289]: starting shutdown
Jul 24 12:51:33 nagiosprodxi3 rrdcached[1289]: clean shutdown; all RRDs flushed
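To check whether the database errors line up with a service shutdown each time, the relevant syslog entries can be filtered out together. A rough sketch: the function name `ndo2db_shutdown_events` is made up, and the log path in the usage comment is an assumption, since the thread does not say which file these lines came from.

```shell
#!/bin/sh
# Read syslog-style lines on stdin and keep only ndo2db error lines and
# daemon SIGTERM notices, so their timestamps can be compared side by side.
ndo2db_shutdown_events() {
    grep -E 'ndo2db: (Error|mysql_error)|caught SIGTERM'
}

# Usage (log path is an assumption; adjust to wherever syslog writes):
#   ndo2db_shutdown_events < /var/log/messages
```

If every "MySQL server has gone away" entry sits within a second or two of a SIGTERM notice, the errors are more likely a side effect of a restart than the original trigger.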
by rajasegar » Sun Jul 29, 2018 7:07 pm
scottwilkerson wrote: Looking at the profile again, it looks like ndo2db wasn't able to write a critical piece of information because the database connection was lost before the write. I'm not sure whether this was part of a system shutdown, but it could have been part of the cause.
That looks like a normal shutdown, since we included the mysqld service shutdown in the script.
by rajasegar » Mon Jul 30, 2018 1:27 am
It is dead again. Please see the attached profile, taken before the restart.
profile.zip