Nagios process memory usage growing

sgargano
Posts: 51
Joined: Mon May 23, 2016 9:06 am

Nagios process memory usage growing

Post by sgargano »

Hi all,

Since this morning we have been facing issues with the memory usage of our nagios process. Its memory usage keeps growing until the system crashes and becomes inaccessible. We upgraded our system about one week ago, and it ran stably until today. No changes have been made to the system in the last few days.

We already ran repair_databases.sh, which didn't fix the issue.

Any advice on how to solve this issue?

br
Robert
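
For anyone tracking similar growth, here is a minimal sketch for logging the combined resident memory of the nagios processes over time. It assumes a Linux ps that accepts -eo pid=,rss=,comm= (headerless output); the function name sum_rss_kb, the 60-second interval, and the log path are hypothetical choices, not part of Nagios XI.

```shell
#!/bin/sh
# Sum the resident set size (RSS, in KiB) of all processes whose command
# name matches exactly. Expects "PID RSS COMMAND" lines on stdin, as
# produced by: ps -eo pid=,rss=,comm=
sum_rss_kb() {
    awk -v name="$1" '$3 == name { total += $2 } END { print total + 0 }'
}

# Example loop (commented out; interval and log path are arbitrary):
# while true; do
#     printf '%s %s\n' "$(date +%s)" \
#         "$(ps -eo pid=,rss=,comm= | sum_rss_kb nagios)" >> /tmp/nagios_rss.log
#     sleep 60
# done
```

A steadily rising second column in the log would show how fast memory is being consumed and roughly when the growth started.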

Re: Nagios process memory usage growing

Post by sgargano »

I am also not able to download the system profile.
Error:

PROFILE BUILD FAILED
Array
(
)
CODE: 1
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Nagios process memory usage growing

Post by benjaminsmith »

Hi @sgargano,

So just to confirm, the upgrade worked fine for about a week, and then Nagios's process started consuming system memory?

Do you have any customized settings, such as an offloaded database, mod_gearman, etc.?

The error message you're seeing when trying to download the profile is usually due to an incorrect sudoers file (which can lead to other issues as well). Please follow the steps in the article below to correct it, and then send the profile for us to review.

Nagios XI - Profile Build Failed

If that fails again, try generating one from the command line.

rm -rf /usr/local/nagiosxi/var/components/profile*
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT

Then send me the resulting /usr/local/nagiosxi/var/components/profile.zip file.

Thanks, Benjamin
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!

Re: Nagios process memory usage growing

Post by sgargano »

Yes, I can confirm that it worked for about one week after the upgrade. During the upgrade from 5.5.10 to 5.7.5 we ran into some issues: we needed to adjust nagios.cfg because of the NDO change, and a DB table was missing. I found some threads describing how to solve this, and after applying those changes the system worked.

I am not 100% sure whether we have any customized configuration, but I don't think so. The only customization I heard about from the former admin is that the default path for some config file was changed.

We had to fully roll back to 5.5.10 because of these issues. Now the profile download, among other things, is working again.

My impression is that something was filling up in the background, which caused the system to run out of memory and finally crash.
Even when we reverted to a snapshot taken one day before the problem appeared, the problem came back.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Nagios process memory usage growing

Post by ssax »

When you were on XI 5.7.5, were you running the new NDO3 or did you downgrade your NDO3 back to NDO2DB?

I've seen this occur when a column is missing from the nagios_hoststatus and nagios_servicestatus tables: the missing check_options field causes the nagios process to eventually segfault/crash.

Since you have reverted, we can't really tell whether that column was missing, but it can be added like this:

mysql -h 127.0.0.1 -uroot -pnagiosxi nagios -e "ALTER TABLE nagios_hoststatus ADD check_options smallint(6) NOT NULL default '0' AFTER check_type;"
mysql -h 127.0.0.1 -uroot -pnagiosxi nagios -e "ALTER TABLE nagios_servicestatus ADD check_options smallint(6) NOT NULL default '0' AFTER check_type;"
If they exist already you would see this which can safely be ignored:

ERROR 1060 (42S21) at line 1: Duplicate column name 'check_options'
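
To check for the column before altering anything, a query against information_schema is one option. The sketch below only builds the query text; the helper name column_exists_sql is hypothetical, and the schema name nagios and the connection options in the commented example are taken from the commands above.

```shell
#!/bin/sh
# Print a query that returns 1 if the given column exists in the given
# table of the "nagios" schema, and 0 otherwise.
column_exists_sql() {
    table=$1
    column=$2
    printf "SELECT COUNT(*) FROM information_schema.COLUMNS WHERE TABLE_SCHEMA='nagios' AND TABLE_NAME='%s' AND COLUMN_NAME='%s';\n" "$table" "$column"
}

# Example (commented out; needs a reachable MySQL/MariaDB server):
# for t in nagios_hoststatus nagios_servicestatus; do
#     mysql -h 127.0.0.1 -uroot -pnagiosxi -N -e "$(column_exists_sql "$t" check_options)"
# done
```

A result of 0 for either table would mean the corresponding ALTER TABLE statement is still needed.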
If the profile build failed (when it worked before), I assume the /etc/sudoers entries didn't get updated on the system or were broken somehow. You can update them by following this guide (make sure to download the same version of XI that your system is currently running):

https://support.nagios.com/kb/article.p ... ategory=44

Re: Nagios process memory usage growing

Post by sgargano »

After the upgrade to 5.7.5 we were running the new NDO3 and had NOT downgraded to NDO2DB.

I don't remember exactly, but it could be that the check_options field was also missing in our case. We fixed that by manually adding the field.
I will plan a maintenance window for the beginning of next week to start the VM that currently has the problem. I will follow the guide to fix the profile build and send you the downloaded profile via private message for further investigation.

Re: Nagios process memory usage growing

Post by benjaminsmith »

Hi,
sgargano wrote: "I will plan a maintenance for the beginning of next week to start the vm which currently have the problem. Will follow the guide to fix the profile build and send you via private message the downloaded profile for further investigation."

Sounds good. We'll wait for your reply. You can send the profile to either me or Sean Sax.

If you PM the profile, please reply to this thread to bring it up in the support queue.

Thanks!

Re: Nagios process memory usage growing

Post by sgargano »

Hi Benjamin,

Profile sent via private message.
I hope you find the issue.

If you need more information please let me know.

best regards

Re: Nagios process memory usage growing

Post by benjaminsmith »

Hi @sgargano,

It looks like the main issue is connection/write problems with NDO3 on this system, which is likely causing the memory usage to spike (and Apache/Core to crash).
[1607610437] wproc: 'Core Worker 1678' seems to be choked. ret = -1; bufsize = 599: written = 0; errno = 32 (Broken pipe)
[1607610437] NDO-3: The following query failed while MySQL appears to be connected:
[1607610437] NDO-3: INSERT INTO nagios_servicechecks (instance_id, start_time,
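
To gauge how often these two messages occur, a quick count over the log can help. This is only a sketch: the function name count_ndo_failures is hypothetical, and the log path in the commented example is an assumption about a default install.

```shell
#!/bin/sh
# Count lines in a Nagios log that match the two failure patterns quoted
# above: choked core workers and failed NDO-3 queries.
count_ndo_failures() {
    logfile=$1
    # grep -c exits non-zero when nothing matches; "|| true" keeps the
    # function usable under "set -e" while still printing the 0 count.
    choked=$(grep -c "seems to be choked" "$logfile" || true)
    failed=$(grep -c "NDO-3: The following query failed" "$logfile" || true)
    printf 'choked_workers=%s failed_queries=%s\n' "$choked" "$failed"
}

# Example (commented out; path is an assumption):
# count_ndo_failures /usr/local/nagios/var/nagios.log
```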
My recommendation would be to either remain on 5.5.10 until 5.8 is released or upgrade to 5.7.5 and then downgrade the backend database to the previous version. If you would like to downgrade ndo3, the instructions are below. It's a relatively simple procedure.

systemctl stop nagios
cd /tmp
rm -rf /tmp/nagiosxi
wget https://assets.nagios.com/downloads/nagiosxi/5/xi-5.6.14.tar.gz
tar zxf xi-5.6.14.tar.gz
cd /tmp/nagiosxi/subcomponents/ndoutils
./install
systemctl enable ndo2db
Then edit your /usr/local/nagios/etc/nagios.cfg and make sure this line is uncommented:

broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
Make sure this line is commented:

#broker_module=/usr/local/nagios/bin/ndo.so /usr/local/nagios/etc/ndo.cfg
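
A quick way to confirm which broker module is active after editing is to list the uncommented broker_module lines. A minimal sketch follows; the function name active_broker_modules is hypothetical.

```shell
#!/bin/sh
# Print the broker_module lines that are active (not commented out) in a
# Nagios main configuration file. Prints nothing if there are none.
active_broker_modules() {
    grep -E '^[[:space:]]*broker_module=' "$1" || true
}

# Example (commented out):
# active_broker_modules /usr/local/nagios/etc/nagios.cfg
```

After the edit described above, only the ndomod.o line should be printed.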
Then start the nagios service:

systemctl start nagios
Let me know what you decide and if you have any questions.

--Benjamin

Re: Nagios process memory usage growing

Post by sgargano »

Hi Benjamin,

During the downgrade I see the following error. I'm not sure whether it causes a problem:

make[2]: Entering directory `/tmp/nagiosxi/subcomponents/ndoutils/ndoutils-2.1.3/src'
gcc -fPIC -g -O2 -DHAVE_CONFIG_H  -c -o db.o db.c
db.c: In function ‘ndo2db_db_init’:
db.c:169:29: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
  if(!mysql_init(&idi->dbinfo.mysql_conn)){
                             ^
db.c: In function ‘ndo2db_db_connect’:
db.c:211:16: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
    &idi->dbinfo.mysql_conn,
                ^
db.c:218:4: error: ‘CLIENT_REMEMBER_OPTIONS’ undeclared (first use in this function)
    CLIENT_REMEMBER_OPTIONS
    ^
db.c:218:4: note: each undeclared identifier is reported only once for each function it appears in
db.c:220:27: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
   mysql_close(&idi->dbinfo.mysql_conn);
                           ^
db.c:221:101: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
   syslog(LOG_USER|LOG_INFO,"Error: Could not connect to MySQL database: %s",mysql_error(&idi->dbinfo.mysql_conn));
                                                                                                     ^
db.c: In function ‘ndo2db_db_disconnect’:
db.c:244:26: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
  mysql_close(&idi->dbinfo.mysql_conn);
                          ^
db.c: In function ‘ndo2db_db_hello’:
db.c:268:14: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_result’
   idi->dbinfo.mysql_result=mysql_store_result(&idi->dbinfo.mysql_conn);
              ^
db.c:268:59: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
   idi->dbinfo.mysql_result=mysql_store_result(&idi->dbinfo.mysql_conn);
                                                           ^
db.c:269:18: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_row’
   if((idi->dbinfo.mysql_row=mysql_fetch_row(idi->dbinfo.mysql_result))!=NULL){
                  ^
db.c:269:56: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_result’
   if((idi->dbinfo.mysql_row=mysql_fetch_row(idi->dbinfo.mysql_result))!=NULL){
                                                        ^
db.c:270:53: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_row’
    ndo2db_convert_string_to_unsignedlong(idi->dbinfo.mysql_row[0],&idi->dbinfo.instance_id);
                                                     ^
db.c:273:32: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_result’
   mysql_free_result(idi->dbinfo.mysql_result);
                                ^
db.c:274:14: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_result’
   idi->dbinfo.mysql_result=NULL;
              ^
db.c:283:56: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
    idi->dbinfo.instance_id=mysql_insert_id(&idi->dbinfo.mysql_conn);
                                                        ^
db.c:303:55: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
   idi->dbinfo.conninfo_id=mysql_insert_id(&idi->dbinfo.mysql_conn);
                                                       ^
db.c: In function ‘ndo2db_db_query’:
db.c:477:30: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
  if (mysql_query(&idi->dbinfo.mysql_conn,buf)) {
                              ^
db.c:479:75: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
   syslog(LOG_USER|LOG_INFO,"mysql_error: '%s'\n", mysql_error(&idi->dbinfo.mysql_conn));
                                                                           ^
db.c: In function ‘ndo2db_handle_db_error’:
db.c:512:33: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
  result=mysql_errno(&idi->dbinfo.mysql_conn);
                                 ^
db.c:513:13: error: ‘CR_SERVER_LOST’ undeclared (first use in this function)
  if(result==CR_SERVER_LOST || result==CR_SERVER_GONE_ERROR){
             ^
db.c:513:39: error: ‘CR_SERVER_GONE_ERROR’ undeclared (first use in this function)
  if(result==CR_SERVER_LOST || result==CR_SERVER_GONE_ERROR){
                                       ^
db.c: In function ‘ndo2db_db_get_latest_data_time’:
db.c:565:14: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_result’
   idi->dbinfo.mysql_result=mysql_store_result(&idi->dbinfo.mysql_conn);
              ^
db.c:565:59: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_conn’
   idi->dbinfo.mysql_result=mysql_store_result(&idi->dbinfo.mysql_conn);
                                                           ^
db.c:566:18: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_row’
   if((idi->dbinfo.mysql_row=mysql_fetch_row(idi->dbinfo.mysql_result))!=NULL){
                  ^
db.c:566:56: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_result’
   if((idi->dbinfo.mysql_row=mysql_fetch_row(idi->dbinfo.mysql_result))!=NULL){
                                                        ^
db.c:567:53: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_row’
    ndo2db_convert_string_to_unsignedlong(idi->dbinfo.mysql_row[0],t);
                                                     ^
db.c:569:32: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_result’
   mysql_free_result(idi->dbinfo.mysql_result);
                                ^
db.c:570:14: error: ‘ndo2db_dbconninfo’ has no member named ‘mysql_result’
   idi->dbinfo.mysql_result=NULL;
              ^
make[2]: *** [db.o] Error 1
make[2]: Leaving directory `/tmp/nagiosxi/subcomponents/ndoutils/ndoutils-2.1.3/src'
make[1]: *** [ndo2db] Error 2
make[1]: Leaving directory `/tmp/nagiosxi/subcomponents/ndoutils/ndoutils-2.1.3/src'
make: *** [all] Error 2
When I try to start nagios after the downgrade, the service fails to start, apparently because /usr/local/nagios/bin/ndomod.o is missing.

br
Robert
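
Since the build stopped at db.o, the install step most likely never copied the module into place, which would fit the missing ndomod.o. A small sketch for confirming which expected files are absent before starting the service; the function name missing_files is hypothetical, and the ndo2db path in the commented example is an assumption about where the daemon binary would be installed.

```shell
#!/bin/sh
# Print each given path that does not exist as a regular file.
missing_files() {
    for path in "$@"; do
        [ -f "$path" ] || printf '%s\n' "$path"
    done
}

# Example (commented out). ndomod.o is the path referenced in nagios.cfg;
# the ndo2db path is an assumption:
# missing_files /usr/local/nagios/bin/ndomod.o /usr/local/nagios/bin/ndo2db
```

Empty output would mean all listed files are present.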