Nagios process memory usage growing
Posted: Thu Nov 26, 2020 8:56 am
by sgargano
Hi all,
Since this morning we have been facing issues with the memory usage of our nagios process. Its memory usage keeps growing until the system crashes and becomes inaccessible. We upgraded the system about one week ago and it ran stably until today. No changes have been made to the system in the last few days.
We already ran repair_databases.sh, which didn't fix the issue.
Any advice on how to solve this?
br
Robert
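To see how quickly the memory grows before the system becomes unreachable, the resident memory (RSS) of the nagios processes could be sampled at intervals. This is only a minimal sketch, assuming a Linux host with the procps version of ps and that the processes are named nagios; the function names rss_total_kb and log_rss_sample are made up for illustration.

```shell
#!/bin/sh
# Sum the resident memory (RSS, in KiB) of all processes with a given command name.
# Prints 0 when no such process is running (or when ps does not support -C).
rss_total_kb() {
    ps -C "$1" -o rss= 2>/dev/null | awk '{ sum += $1 } END { print sum + 0 }'
}

# Print one timestamped sample; meant to be called repeatedly, e.g. from cron.
log_rss_sample() {
    printf '%s %s_rss_kb=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$(rss_total_kb "$1")"
}

log_rss_sample nagios
```

Redirecting the output to a file from a one-minute cron job would give a simple growth curve to attach to the thread.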
Re: Nagios process memory usage growing
Posted: Thu Nov 26, 2020 9:24 am
by sgargano
I am also not able to download the system profile.
Error:
PROFILE BUILD FAILED
Array
(
)
CODE: 1
Re: Nagios process memory usage growing
Posted: Mon Nov 30, 2020 3:55 pm
by benjaminsmith
Hi @sgargano,
So just to confirm, the upgrade worked fine for about a week, and then Nagios's process started consuming system memory?
Do you have any customized settings, such as an offloaded database, mod_gearman, etc.?
The error message you're seeing when trying to download the profile is usually due to an incorrect sudoers file (which can lead to other issues as well). Please follow the steps in the article below to correct it, and then send the profile for us to review.
Nagios XI - Profile Build Failed
If that fails again, try generating one from the command line.
Code:
rm -rf /usr/local/nagiosxi/var/components/profile*
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT
Then send me the resulting /usr/local/nagiosxi/var/components/profile.zip file.
Thanks, Benjamin
Re: Nagios process memory usage growing
Posted: Tue Dec 01, 2020 12:21 pm
by sgargano
Yes, I can confirm that it worked for about one week after the upgrade. During the upgrade from 5.5.10 to 5.7.5 we ran into some issues: we needed to adjust nagios.cfg because of the NDO change, and a database table was missing. I found some threads on how to solve this, and after applying those changes the system worked.
I am not 100% sure whether we have any customized configuration, but I don't think so. The only customization I heard about from the former admin is that the default path for some config files was changed.
We had to roll back fully to 5.5.10 because of these issues. Now the profile download is working again as well.
My impression is that something was filling up in the background, which caused the system to run out of memory and finally crash.
Even when we reverted to a snapshot taken one day before the problem appeared, the problem showed up again.
Re: Nagios process memory usage growing
Posted: Wed Dec 02, 2020 2:01 pm
by ssax
When you were on XI 5.7.5, were you running the new NDO3 or did you downgrade your NDO3 back to NDO2DB?
I've seen this occur when a column is missing from the nagios_hoststatus and nagios_servicestatus tables: the missing check_options field causes the nagios process to eventually segfault/crash.
Since you have reverted, we can't really tell whether those columns were missing, but they can be added like this:
Code:
mysql -h 127.0.0.1 -uroot -pnagiosxi nagios -e "ALTER TABLE nagios_hoststatus ADD check_options smallint(6) NOT NULL default '0' AFTER check_type;"
mysql -h 127.0.0.1 -uroot -pnagiosxi nagios -e "ALTER TABLE nagios_servicestatus ADD check_options smallint(6) NOT NULL default '0' AFTER check_type;"
If they already exist you will see this error, which can safely be ignored:
Code:
ERROR 1060 (42S21) at line 1: Duplicate column name 'check_options'
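As an alternative to relying on the duplicate-column error, the presence of the column could be checked first through information_schema. A rough sketch, assuming the schema is named nagios as in the commands above; the helper name column_check_sql is hypothetical, and the actual mysql call is left commented out so the snippet only prints the queries.

```shell
#!/bin/sh
# Build a query that counts columns named $2 in table $1 of the "nagios" schema.
# The result is 1 if the column is present and 0 if it is missing.
column_check_sql() {
    printf "SELECT COUNT(*) FROM information_schema.COLUMNS WHERE TABLE_SCHEMA='nagios' AND TABLE_NAME='%s' AND COLUMN_NAME='%s';" "$1" "$2"
}

for table in nagios_hoststatus nagios_servicestatus; do
    column_check_sql "$table" check_options
    echo
    # To run it with the same credentials as the ALTER TABLE commands above:
    # mysql -h 127.0.0.1 -uroot -pnagiosxi -N -e "$(column_check_sql "$table" check_options)"
done
```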
If the profile build failed (when it worked before), I assume the /etc/sudoers entries didn't get updated on the system or were broken somehow. You can update them by following this guide (make sure to download the same version of XI that your system is running at the time):
https://support.nagios.com/kb/article.p ... ategory=44
Re: Nagios process memory usage growing
Posted: Thu Dec 03, 2020 11:45 am
by sgargano
After the upgrade to 5.7.5 we were running the new NDO3 and did NOT downgrade to NDO2DB.
I don't remember exactly, but it could be that the check_options field was also missing in our case. We fixed that by manually adding the field.
I will plan a maintenance window for the beginning of next week to start the VM that currently has the problem. I will follow the guide to fix the profile build and send you the downloaded profile via private message for further investigation.
Re: Nagios process memory usage growing
Posted: Fri Dec 04, 2020 3:03 pm
by benjaminsmith
Hi,
sgargano wrote: I will plan a maintenance window for the beginning of next week to start the VM that currently has the problem. I will follow the guide to fix the profile build and send you the downloaded profile via private message for further investigation.
Sounds good. We'll wait for your reply. You can send the profile to either myself or Sean Sax.
If you PM the profile, please reply to this thread to bring it up in the support queue.
Thanks!
Re: Nagios process memory usage growing
Posted: Thu Dec 10, 2020 9:55 am
by sgargano
Hi Benjamin,
Profile sent via private message.
I hope you can find the issue.
If you need more information, please let me know.
best regards
Re: Nagios process memory usage growing
Posted: Fri Dec 11, 2020 10:44 am
by benjaminsmith
Hi @sgargano,
It looks like the main issue on this system is connection/write problems with ndo3, which are likely causing the memory usage to spike (and Apache/Core to crash).
[1607610437] wproc: 'Core Worker 1678' seems to be choked. ret = -1; bufsize = 599: written = 0; errno = 32 (Broken pipe)
[1607610437] NDO-3: The following query failed while MySQL appears to be connected:
[1607610437] NDO-3: INSERT INTO nagios_servicechecks (instance_id, start_time,
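To get a feel for how often these two symptoms occur, the log could be searched for them. A small sketch, assuming the log is at /usr/local/nagios/var/nagios.log (the usual location, but not confirmed in this thread); the function name count_ndo_symptoms is made up for illustration.

```shell
#!/bin/sh
# Count log lines matching the two messages quoted above.
# $1 = path to the log file (assumed default: /usr/local/nagios/var/nagios.log)
count_ndo_symptoms() {
    log="${1:-/usr/local/nagios/var/nagios.log}"
    # grep -c prints the number of matching lines; an unreadable file yields an empty value.
    choked=$(grep -c 'seems to be choked' "$log" 2>/dev/null || true)
    failed=$(grep -c 'NDO-3: The following query failed' "$log" 2>/dev/null || true)
    echo "choked_workers=${choked:-0} failed_queries=${failed:-0}"
}

count_ndo_symptoms
```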
My recommendation would be to either remain on 5.5.10 until 5.8 is released, or upgrade to 5.7.5 and then downgrade the database backend from ndo3 to the previous version (ndo2db). If you would like to downgrade ndo3, the instructions are below. It's a relatively simple procedure.
Code:
systemctl stop nagios
cd /tmp
rm -rf /tmp/nagiosxi
wget https://assets.nagios.com/downloads/nagiosxi/5/xi-5.6.14.tar.gz
tar zxf xi-5.6.14.tar.gz
cd /tmp/nagiosxi/subcomponents/ndoutils
./install
systemctl enable ndo2db
Then edit your /usr/local/nagios/etc/nagios.cfg and make sure this line is uncommented:
Code:
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
Make sure this line is commented:
Code:
#broker_module=/usr/local/nagios/bin/ndo.so /usr/local/nagios/etc/ndo.cfg
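To double-check the result of the edit, the active (uncommented) broker_module lines could be listed; after the downgrade only the ndomod.o line should show up. A minimal sketch; the function name active_broker_modules is hypothetical and the default path is the one given above.

```shell
#!/bin/sh
# Print the broker_module lines that are active (not commented out) in a nagios.cfg.
# $1 = path to nagios.cfg (default: /usr/local/nagios/etc/nagios.cfg)
active_broker_modules() {
    grep -E '^[[:space:]]*broker_module=' "${1:-/usr/local/nagios/etc/nagios.cfg}" 2>/dev/null || true
}

active_broker_modules
```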
Then start the nagios service again, e.g. with systemctl start nagios.
Let me know what you decide and if you have any questions.
--Benjamin
Re: Nagios process memory usage growing
Posted: Fri Dec 18, 2020 3:28 am
by sgargano
Hi Benjamin,
During the downgrade I see the following errors. I am not sure whether they cause a problem:
Code:
make[2]: Entering directory `/tmp/nagiosxi/subcomponents/ndoutils/ndoutils-2.1.3/src'
gcc -fPIC -g -O2 -DHAVE_CONFIG_H -c -o db.o db.c
db.c: In function ‘ndo2db_db_init’:
db.c:169:29: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
if(!mysql_init(&idi->dbinfo.mysql_conn)){
^
db.c: In function ‘ndo2db_db_connect’:
db.c:211:16: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
&idi->dbinfo.mysql_conn,
^
db.c:218:4: error: ‘CLIENT_REMEMBER_OPTIONS’ undeclared (first use in this function)
CLIENT_REMEMBER_OPTIONS
^
db.c:218:4: note: each undeclared identifier is reported only once for each function it appears in
db.c:220:27: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
mysql_close(&idi->dbinfo.mysql_conn);
^
db.c:221:101: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
syslog(LOG_USER|LOG_INFO,"Error: Could not connect to MySQL database: %s",mysql_error(&idi->dbinfo.mysql_conn));
^
db.c: In function ‘ndo2db_db_disconnect’:
db.c:244:26: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
mysql_close(&idi->dbinfo.mysql_conn);
^
db.c: In function ‘ndo2db_db_hello’:
db.c:268:14: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_result’
idi->dbinfo.mysql_result=mysql_store_result(&idi->dbinfo.mysql_conn);
^
db.c:268:59: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
idi->dbinfo.mysql_result=mysql_store_result(&idi->dbinfo.mysql_conn);
^
db.c:269:18: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_row’
if((idi->dbinfo.mysql_row=mysql_fetch_row(idi->dbinfo.mysql_result))!=NULL){
^
db.c:269:56: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_result’
if((idi->dbinfo.mysql_row=mysql_fetch_row(idi->dbinfo.mysql_result))!=NULL){
^
db.c:270:53: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_row’
ndo2db_convert_string_to_unsignedlong(idi->dbinfo.mysql_row[0],&idi->dbinfo.instance_id);
^
db.c:273:32: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_result’
mysql_free_result(idi->dbinfo.mysql_result);
^
db.c:274:14: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_result’
idi->dbinfo.mysql_result=NULL;
^
db.c:283:56: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
idi->dbinfo.instance_id=mysql_insert_id(&idi->dbinfo.mysql_conn);
^
db.c:303:55: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
idi->dbinfo.conninfo_id=mysql_insert_id(&idi->dbinfo.mysql_conn);
^
db.c: In function ‘ndo2db_db_query’:
db.c:477:30: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
if (mysql_query(&idi->dbinfo.mysql_conn,buf)) {
^
db.c:479:75: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
syslog(LOG_USER|LOG_INFO,"mysql_error: '%s'\n", mysql_error(&idi->dbinfo.mysql_conn));
^
db.c: In function ‘ndo2db_handle_db_error’:
db.c:512:33: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
result=mysql_errno(&idi->dbinfo.mysql_conn);
^
db.c:513:13: error: ‘CR_SERVER_LOST’ undeclared (first use in this function)
if(result==CR_SERVER_LOST || result==CR_SERVER_GONE_ERROR){
^
db.c:513:39: error: ‘CR_SERVER_GONE_ERROR’ undeclared (first use in this function)
if(result==CR_SERVER_LOST || result==CR_SERVER_GONE_ERROR){
^
db.c: In function ‘ndo2db_db_get_latest_data_time’:
db.c:565:14: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_result’
idi->dbinfo.mysql_result=mysql_store_result(&idi->dbinfo.mysql_conn);
^
db.c:565:59: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
idi->dbinfo.mysql_result=mysql_store_result(&idi->dbinfo.mysql_conn);
^
db.c:566:18: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_row’
if((idi->dbinfo.mysql_row=mysql_fetch_row(idi->dbinfo.mysql_result))!=NULL){
^
db.c:566:56: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_result’
if((idi->dbinfo.mysql_row=mysql_fetch_row(idi->dbinfo.mysql_result))!=NULL){
^
db.c:567:53: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_row’
ndo2db_convert_string_to_unsignedlong(idi->dbinfo.mysql_row[0],t);
^
db.c:569:32: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_result’
mysql_free_result(idi->dbinfo.mysql_result);
^
db.c:570:14: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_result’
idi->dbinfo.mysql_result=NULL;
^
make[2]: *** [db.o] Error 1
make[2]: Leaving directory `/tmp/nagiosxi/subcomponents/ndoutils/ndoutils-2.1.3/src'
make[1]: *** [ndo2db] Error 2
make[1]: Leaving directory `/tmp/nagiosxi/subcomponents/ndoutils/ndoutils-2.1.3/src'
make: *** [all] Error 2
When I try to start nagios after the downgrade, the service fails to start, apparently because /usr/local/nagios/bin/ndomod.o is missing.
br
Robert
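Since the build aborted, ndomod.o was most likely never installed. Errors such as "has no member named 'mysql_conn'" and undeclared CR_SERVER_LOST often point to the MySQL/MariaDB client development headers not being found at configure time, but that is only an assumption here; the thread does not confirm the cause. A quick sketch to check both points (the helper name report_file is made up for illustration):

```shell
#!/bin/sh
# Report whether a file exists; used here for the broker module the build should have produced.
report_file() {
    if [ -f "$1" ]; then echo "present: $1"; else echo "missing: $1"; fi
}

report_file /usr/local/nagios/bin/ndomod.o

# Assumption: the compile errors come from missing MySQL/MariaDB client headers.
# mysql_config normally ships with the client development package, so if it is
# absent the headers are probably absent too.
if command -v mysql_config >/dev/null 2>&1; then
    echo "mysql_config found: $(mysql_config --include)"
else
    echo "mysql_config not found (MySQL/MariaDB development package may be missing)"
fi
```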