What is wrong with Nagios making ALL BPI groups disappear?
Posted: Fri Sep 27, 2019 11:36 am
by dlukinski
Hello XI Support,
Our BPI configurations disappear for no apparent reason: 6 groups existed and survived a week of many changes.
All of them were suddenly gone today after a very small config change and applying changes.
After restoring the previous config snapshot, I manually restored the last good BPI backup (works so far).
I have not yet tried to apply any other changes.
What is wrong with Nagios making ALL BPI groups disappear?
Re: What is wrong with Nagios making ALL BPI groups disappea
Posted: Fri Sep 27, 2019 1:32 pm
by ssax
There was a bugfix (for the bug where a hostgroup and a servicegroup share the same name) that caused some issues on upgrade. The fix was to prepend them with hg_ and sg_, and if there were duplicates or other issues, the upgrade could have run into trouble. Sometimes this happens, and I apologize for any inconvenience. Our developers do try to catch everything, but sometimes things like this slip through, and they are working on ways to address them.
Please PM me your /usr/local/nagiosxi/etc/components/bpi.conf and /usr/local/nagiosxi/etc/components/bpi.conf.backup so that I can do some analyzing/processing and try to get you back in working order.
Whether or not this turns out to be a bpi.conf error, you will still need to follow this guide to re-match the groups by editing your current BPI services:
https://support.nagios.com/kb/article/n ... e-858.html
Let me know if you have any questions or if I can clarify anything.
Thank you
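A minimal sketch of the same-name collision described above, assuming the hostgroup and servicegroup names have been exported to two plain-text files, one name per line. The file names and both helper functions are hypothetical and not part of Nagios XI; the sketch only illustrates why distinct hg_ and sg_ prefixes remove the ambiguity.

```shell
#!/bin/sh
# Hypothetical helpers: neither function ships with Nagios XI.

# Print names that appear in both lists.
# $1 = file with hostgroup names, $2 = file with servicegroup names
common_group_names() {
    # -F fixed strings, -x whole-line match, -f read patterns from file
    grep -Fxf "$1" "$2" | sort -u
}

# Print every name in a list with a prefix added.
# $1 = prefix (e.g. hg_ or sg_), $2 = file with names
prefix_names() {
    sed "s/^/$1/" "$2"
}

# Example (assumed file names):
#   common_group_names hostgroups.txt servicegroups.txt
#   prefix_names hg_ hostgroups.txt    > hostgroups.prefixed
#   prefix_names sg_ servicegroups.txt > servicegroups.prefixed
```

Any name printed by common_group_names is shared by a hostgroup and a servicegroup; after prefixing, the two lists should no longer overlap.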
Re: What is wrong with Nagios making ALL BPI groups disappea
Posted: Fri Sep 27, 2019 4:01 pm
by dlukinski
Hi
Please see the conf files (I see nothing wrong with them).
I restored the .good config and have applied new configurations multiple times since then; it still works for now.
However, I do not understand what broke it the last time.
- Could you comment on any settings to tweak in case our BPI is too big?
The Nagios fix worked as far as I can tell, but from time to time Nagios generates a broken BPI for reasons unknown.
Re: What is wrong with Nagios making ALL BPI groups disappea
Posted: Fri Sep 27, 2019 4:36 pm
by ssax
Thank you. Let's get a remote session scheduled as well, just in case. Check your PMs.
Re: What is wrong with Nagios making ALL BPI groups disappea
Posted: Fri Sep 27, 2019 4:43 pm
by ssax
See previous message but:
- Additionally, we may have adjusted the max open files/nproc
What is the output of these commands (as root):
Code:
sysctl -p
ulimit -a
su - nagios
ulimit -a
Regarding "please comment any configurations to tweak because BPI is too big?":
/etc/php.ini - We increased the limits for memory_limit, max_input_time, max_input_vars, and max_execution_time. I'm not sure whether this was you or us, but these were updated as well:
Code:
upload_max_filesize = 20M
post_max_size = 21M
Attach yours as well if you would.
That was for the mod_gearman exclude group being so large that it could exceed a PHP limit.
Then there was the duplication issue. This one was hard to pin down, but it was eventually found to occur when you had a hostgroup AND a servicegroup with exactly the same name.
And now this bugfix (which was a fix for that duplication issue) caused the Unknown BPI Group Index errors.
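For anyone wanting to read back the php.ini limits mentioned above before attaching the file, here is a rough sketch. The ini_value helper is hypothetical (it is not a PHP or Nagios tool), and it assumes simple `key = value` lines with `;` comments; the path /etc/php.ini is the one named in this post.

```shell
#!/bin/sh
# Hypothetical helper: print the last uncommented value of a directive.
# $1 = ini file, $2 = directive name
ini_value() {
    awk -F= -v key="$2" '
        /^[ \t]*;/ { next }                      # skip commented-out lines
        { k = $1; gsub(/[ \t]/, "", k) }         # directive name without spaces
        k == key {
            v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", v)       # trim the value
            val = v                              # the last occurrence wins
        }
        END { if (val != "") print val }
    ' "$1"
}

# Print the directives discussed in this thread, if the file is readable.
if [ -r /etc/php.ini ]; then
    for key in memory_limit max_input_time max_input_vars \
               max_execution_time upload_max_filesize post_max_size; do
        printf '%s = %s\n' "$key" "$(ini_value /etc/php.ini "$key")"
    done
fi
```

This only reads the file on disk; the web server has to reload its configuration before edited values take effect.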
Re: What is wrong with Nagios making ALL BPI groups disappea
Posted: Mon Sep 30, 2019 3:52 pm
by dlukinski
I set php.ini to the same 20M/21M (it was 16M/21M for me). Please send the invite for the meeting.
Please see the output below.
Code:
[root@fikc-nagxiprod01 perfdata]# sysctl -p
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key
error: "net.bridge.bridge-nf-call-iptables" is an unknown key
error: "net.bridge.bridge-nf-call-arptables" is an unknown key
kernel.msgmnb = 131072000
kernel.msgmax = 131072000
kernel.shmmax = 4294967295
kernel.shmall = 268435456
kernel.pid_max = 4194303
[root@fikc-nagxiprod01 perfdata]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63677
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 10000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 131072
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[root@fikc-nagxiprod01 perfdata]# su - nagios
[nagios@fikc-nagxiprod01 ~]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63677
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 10000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
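In the two `ulimit -a` listings above, the only line that differs is max user processes (131072 for root, 1024 for nagios). A small sketch for spotting such differences automatically, assuming both listings were saved to files first; the diff_limits helper and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the lines that differ between two saved
# `ulimit -a` listings, comparing them line by line in the same order.
# $1 = first listing, $2 = second listing
diff_limits() {
    awk 'NR == FNR { first[FNR] = $0; next }       # remember the first file
         $0 != first[FNR] {                        # report mismatching lines
             print "< " first[FNR]
             print "> " $0
         }' "$1" "$2"
}

# Example (assumed file names), run as root:
#   ulimit -a > /tmp/root.limits
#   su - nagios -c 'ulimit -a' > /tmp/nagios.limits
#   diff_limits /tmp/root.limits /tmp/nagios.limits
```

This shows only where the limits differ; it does not say whether a given limit is related to the BPI problem.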
Re: What is wrong with Nagios making ALL BPI groups disappea
Posted: Mon Sep 30, 2019 4:53 pm
by ssax
I won't have the link until about 5 minutes before (when I set it up) as I'm not sure which account will be available.
Did you schedule on my calendar? I don't see the booking.
Re: What is wrong with Nagios making ALL BPI groups disappea
Posted: Thu Oct 03, 2019 7:51 am
by dlukinski
I did not get the link to your calendar.
Re: What is wrong with Nagios making ALL BPI groups disappea
Posted: Thu Oct 03, 2019 4:58 pm
by ssax
Resent. Here's the one I sent before:
Check your forum PMs; the number of new messages is shown at the top of the forums.
Re: What is wrong with Nagios making ALL BPI groups disappea
Posted: Fri Oct 04, 2019 10:38 am
by dlukinski
Hello Sean,
I can't find your email, and now I am in real trouble: the 5.6.7 upgrade broke and deleted BPI (this is a disaster).
I re-created the groups, but BPI is still broken.