Configuration Wizards - can not add new machine

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Configuration Wizards - can not add new machine

Post by tgriep »

One thing I see is that the system may be at the limit for the Kernel Message Queue, which can sometimes cause missing host or service data.
To fix that, follow the instructions in this KB article.
https://support.nagios.com/kb/article/n ... d-139.html
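Before changing anything, it can help to see what the limits currently are. This is only a minimal sketch, assuming a Linux host where the limits are exposed under /proc/sys/kernel; the function name show_msg_limits is made up for illustration, and the directory argument exists only so the function can be pointed somewhere else.

```shell
#!/usr/bin/env bash
# Print the current System V message queue limits.
#   msgmnb: maximum bytes allowed in a single queue
#   msgmax: maximum size of a single message
#   msgmni: maximum number of queues
# The directory is a parameter (hypothetical convenience) and defaults
# to the usual Linux location.
show_msg_limits() {
    local dir="${1:-/proc/sys/kernel}"
    local name
    for name in msgmnb msgmax msgmni; do
        if [ -r "$dir/$name" ]; then
            printf 'kernel.%s = %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf 'kernel.%s = (not readable)\n' "$name"
        fi
    done
}

show_msg_limits
```

Comparing these values before and after following the KB article shows whether the new settings actually took effect.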

Another thing I see is a lot of errors when Nagios tries to write to the MySQL database.
It looks like some of the fields in the status tables are not large enough, so they are not being updated, and that is also preventing the status from displaying in the GUI.

Can you run the following on the Nagios server to dump the database structure and post the /tmp/nagios.txt file so we can check the fields that are causing the errors?

mysqldump --no-data -u nagios -pnagios --databases nagios  -h xxx.xxx.xxx.xxx  >/tmp/nagios.txt
Replace xxx.xxx.xxx.xxx with the IP address of your MYSQL server.
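Once the dump exists, the short fields can also be listed locally. The following is a rough sketch, not an official tool: it assumes the default mysqldump layout (one column definition per line, names in backticks), and the function name short_varchars and the 1023 threshold are choices made here for illustration.

```shell
#!/usr/bin/env bash
# List varchar columns in a "mysqldump --no-data" file whose declared
# length is below a threshold (default 1023, an arbitrary choice here).
# Output format: table.column varchar(N)
short_varchars() {
    local dump="$1" limit="${2:-1023}"
    awk -v limit="$limit" '
        /^CREATE TABLE/ {
            # Third field is the backtick-quoted table name.
            table = $3
            gsub(/`/, "", table)
            next
        }
        match($0, /varchar\([0-9]+\)/) {
            # "varchar(" is 8 characters; the digits follow it.
            len = substr($0, RSTART + 8, RLENGTH - 9) + 0
            col = $1
            gsub(/`/, "", col)
            if (len < limit) print table "." col " varchar(" len ")"
        }
    ' "$dump"
}

# Run against the dump only if it is present.
if [ -r /tmp/nagios.txt ]; then
    short_varchars /tmp/nagios.txt
fi
```

This only narrows down candidates; which fields actually overflow still has to come from the errors in the log.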
Be sure to check out our Knowledgebase for helpful articles and solutions!
Eurocash
Posts: 29
Joined: Wed Dec 27, 2017 6:15 am

Re: Configuration Wizards - can not add new machine

Post by Eurocash »

That is true about the Kernel Message Queue.
I did follow this KB article:
https://support.nagios.com/kb/article/n ... d-139.html

I had already done everything in it except clearing the queue. I have now cleared the queue as well. I still have a problem with NDO (I mostly see it in NagVis):
NDO.png
If I restart the Database Backend it's all good for a while, but it always returns to the state shown above.

Database structure dump:
nagios.txt
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Configuration Wizards - can not add new machine

Post by tgriep »

Thanks for the files. To increase the size of the field in the MySQL database that is generating the error, run the following as root.

echo "use nagios;alter table nagios_eventhandlers modify command_line varchar(1023) not null;" | mysql -u root -pnagiosxi
After the change, let the system run for 15 minutes and see if the error is fixed.

If not, I would have to get a new copy of the /var/log/messages file with any new errors in it so I can see if there are any other tables that need to be updated.
Eurocash
Posts: 29
Joined: Wed Dec 27, 2017 6:15 am

Re: Configuration Wizards - can not add new machine

Post by Eurocash »

The error is not fixed. I can still see the NDO error (as in the screenshot provided), and I still cannot add a new host with the Configuration Wizards (I reproduced the steps posted by cdienger).

/var/log/messages:
messages.tar
Problems with: alias, display_name, varvalue and still with command_line?
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Configuration Wizards - can not add new machine

Post by tgriep »

I found more fields in the MySQL tables that need to be increased. To increase them, run the following as root.

echo "use nagios;alter table nagios_services modify display_name varchar(1023) not null;" | mysql -u root -pnagiosxi
echo "use nagios;alter table nagios_hosts modify alias varchar(1023) not null;" | mysql -u root -pnagiosxi
echo "use nagios;alter table nagios_services modify alias varchar(1023) not null;" | mysql -u root -pnagiosxi
echo "use nagios;alter table nagios_commands modify command_line varchar(1023) not null;" | mysql -u root -pnagiosxi
echo "use nagios;alter table nagios_configfilevariables modify varvalue varchar(1023) not null;" | mysql -u root -pnagiosxi
echo "use nagios;alter table nagios_conninfo modify bytes_processed int(12);" | mysql -u root -pnagiosxi
After running the above, let's stop and start these processes.

service nagios stop
service ndo2db stop
service crond restart
service ndo2db start
service nagios start
Hopefully I did not miss any of the tables but if I did, I would have to get an updated messages file with just today's data.

Another thing I would do is make sure the configs are in sync with the running config. To do that, follow this procedure.
Go to the Core Config Manager.
Under "Tools", click "Write Config Files" or, if you are running a newer version of XI, "Config File Management".
Click the "Write" button, then the "Delete" button, then the "Write" button again, and then the "Verify" button.
If you get any errors, resolve them and click "Delete", "Write", "Verify" again until all of the errors are resolved.
Once all of the errors are resolved, click the Apply Configuration link and then the "Apply Configuration" button.

Let us know if this fixes the issue.
Eurocash
Posts: 29
Joined: Wed Dec 27, 2017 6:15 am

Re: Configuration Wizards - can not add new machine

Post by Eurocash »

Results of increasing the fields
The command
echo "use nagios;alter table nagios_services modify alias varchar(1023) not null;" | mysql -u root -pnagiosxi
gives ERROR 1054 (42S22) at line 1: Unknown column 'alias' in 'nagios_services'
The rest of the commands ran fine.
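An "Unknown column" error like this can be caught up front by asking information_schema whether the column exists before altering it. A minimal sketch, assuming the schema is named nagios as in the commands in this thread; the helper name column_exists_sql is hypothetical and it only builds the query text.

```shell
#!/usr/bin/env bash
# Build a query that counts matching rows in information_schema.columns.
# A result of 0 means the column does not exist in that table, so an
# "alter table ... modify" on it would fail with error 1054.
column_exists_sql() {
    local schema="$1" table="$2" column="$3"
    printf "SELECT COUNT(*) FROM information_schema.columns WHERE table_schema='%s' AND table_name='%s' AND column_name='%s';\n" \
        "$schema" "$table" "$column"
}

# Print the query for the column that failed above.
column_exists_sql nagios nagios_services alias

# To actually run it, pipe it to the client (-N suppresses the header row),
# using the same credentials as the other commands in this thread:
#   column_exists_sql nagios nagios_services alias | mysql -N -u root -pnagiosxi
```

The arguments are pasted into the SQL unescaped, so this is only suitable for fixed, trusted table and column names typed by hand.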

Core Config Manager section
Did the whole procedure with Write Config Files: 0 errors, only warnings.
I tried adding a new host again, reproducing the steps posted by cdienger, but the host is not added...

About Queue
command
watch ipcs -q
gives me one line, and I can see that used-bytes is maxed out and that messages is also maxed out after some time...
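To put a number on "maxed out", the ipcs output can be compared against the per-queue byte limit. This is a sketch under assumptions: it expects the usual Linux `ipcs -q` column layout (key, msqid, owner, perms, used-bytes, messages), and the function name queue_usage is invented for this example.

```shell
#!/usr/bin/env bash
# Read "ipcs -q" output on stdin and report how full each queue is
# relative to the byte limit passed as the first argument.
queue_usage() {
    local limit="$1"
    awk -v limit="$limit" '
        # Data rows start with a hex key; header and separator lines are skipped.
        $1 ~ /^0x/ && NF >= 6 {
            printf "msqid %s: %d of %d bytes (%d%%), %d messages\n",
                   $2, $5, limit, ($5 * 100) / limit, $6
        }
    '
}

# Use the live values only when both sources are available.
if command -v ipcs >/dev/null 2>&1 && [ -r /proc/sys/kernel/msgmnb ]; then
    ipcs -q | queue_usage "$(cat /proc/sys/kernel/msgmnb)"
fi
```

Running this every few seconds (for example under watch) shows how quickly the queue fills after a restart of the database backend.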
queue.png
New /var/log/messages:
messages.tar
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Configuration Wizards - can not add new machine

Post by tgriep »

Sorry about that, I put in the wrong field name. Try this instead.

Code: Select all

echo "use nagios;alter table nagios_services modify display_name varchar(1023) not null;" | mysql -u root -pnagiosxi
Also, the Kernel message Queue being maxed out can be fixed by following this KB article.
https://support.nagios.com/kb/article/n ... d-139.html

Try that and see if it resolves the issue.
Eurocash
Posts: 29
Joined: Wed Dec 27, 2017 6:15 am

Re: Configuration Wizards - can not add new machine

Post by Eurocash »

echo "use nagios;alter table nagios_services modify display_name varchar(1023) not null;" | mysql -u root -pnagiosxi
This was the first in the list of commands you provided in the previous post, and it was done.
Also, the Kernel message Queue being maxed out can be fixed by following this KB article.
It was done.
sysctl -p.png
I also checked mysql-error.log and found several dropped connections. I changed max connections following this article:
https://support.nagios.com/kb/article/n ... s-513.html

It was 151; max connections is now 1000.

There was also:
139846686680832 [ERROR] mysqld: Table './nagios/nagios_notifications' is marked as crashed and last (automatic?) repair failed
and
139846686680832 [ERROR] mysqld: Table 'nagios_notifications' is marked as crashed and last (automatic?) repair failed

So I ran repairmysql.sh, and it was successful.
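To check whether any other tables are flagged the same way, the error log can be scanned for that message. A small sketch, assuming the log lines look like the two quoted above; the function name crashed_tables is made up here and the log path is passed in rather than guessed.

```shell
#!/usr/bin/env bash
# Print the unique names of tables reported as crashed in a MySQL error log.
# Handles both forms seen above: './nagios/nagios_notifications' and
# 'nagios_notifications' (any leading path is stripped).
crashed_tables() {
    local log="$1"
    sed -n "s/.*Table '\([^']*\)' is marked as crashed.*/\1/p" "$log" \
        | sed 's|.*/||' \
        | sort -u
}

# Example call; the path is a placeholder, so substitute the real log file:
#   crashed_tables /path/to/mysql-error.log
```

An empty result after the repair, on log entries newer than the repair, would suggest no further tables are being reported as crashed.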

I still have the same problems:
The Kernel Message Queue maxes out, then I get NDO errors and cannot see stats in the GUI. I also cannot add a new host with the Configuration Wizard.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Configuration Wizards - can not add new machine

Post by tgriep »

The problem could be that the MySQL database is offloaded to a remote server, and the extra latency of transferring the data over the network is causing the Kernel Message Queue to fill up because the data cannot be written quickly enough.
I suggest moving the MySQL database back to the Nagios server, as that should reduce the latency so the data can be written faster.
Sometimes offloading the MySQL database doesn't increase performance, so moving it back helps.
Eurocash
Posts: 29
Joined: Wed Dec 27, 2017 6:15 am

Re: Configuration Wizards - can not add new machine

Post by Eurocash »

We offloaded the database due to performance issues. The GUI was very laggy and we had issues with missing hosts, services, or status data even after we used this solution: https://support.nagios.com/kb/article/n ... d-139.html
After the offload, Nagios XI has a maximum load of 60-70% CPU, and the DB 15-30% CPU. Both have at least 1/4 of RAM free. The network load is 20-30% at most.
Once the Kernel Message Queue is full, the maximum load on the Nagios machine is 10% CPU...

Both machines are VMs on the same physical host.
We do not have jumbo frames turned on; I assume that could be a problem in this situation.

We have added all configuration files, but we have not yet added SNMP/NRPE to the old hosts. We have turned off postfix (mail alerts). Can the last two also be filling the Kernel Message Queue?

We have 1500 hosts and 10000 services, and most likely these numbers will grow fast in the near future.