Tactical Overview, Ops Center, Ops Screen Problems

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Tactical Overview, Ops Center, Ops Screen Problems

Post by scottwilkerson »

Let's rule out database corruption first. Run the following:

Code:

/usr/local/nagiosxi/scripts/repairmysql.sh nagios
Then, for good measure, run:

Code:

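# Stop both services cleanly first
service ndo2db stop
service nagios stop
# Force-kill (SIGKILL) any ndo2db/nagios processes that did not exit
killall -9 ndo2db
killall -9 nagios
# Start the ndo2db daemon before the Nagios core that sends data to it
service ndo2db start
service nagios start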
Former Nagios employee
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: Tactical Overview, Ops Center, Ops Screen Problems

Post by jbennett »

I did as suggested and I'm still running out of space on my ramdisk on both machines. One is set to 50MB while the new one is set to 75MB.

Code:

[root@nagiosxivm ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      108G   49G   55G  48% /
tmpfs                 7.4G     0  7.4G   0% /dev/shm
/dev/sda1              97M   82M   11M  89% /boot
tmpfs                  50M     -     -   -  /var/nagiosramdisk

Code:

[root@LNTTAVMNAG1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00_ROOT
                       48G   17G   28G  38% /
/dev/mapper/VolGroup00-LogVol00
                      2.9G   69M  2.7G   3% /tmp
/dev/mapper/VolGroup00-LogVol00_VAR
                      4.8G  3.1G  1.5G  68% /var
/dev/hda1             190M   47M  134M  26% /boot
tmpfs                 2.0G     0  2.0G   0% /dev/shm
tmpfs                  75M   75M     0 100% /var/nagiosramdisk
10.100.3.220:/kickstart
                      190G  110G   70G  62% /kickstart
From what I've been able to find, a reboot is the only thing that fixes it. I have tried stopping and starting nagios, npcd, ndo2db, and mysqld without luck. I just tried restarting httpd, and that appears to have cleared the file.

Somehow, the data is being written but not cleared. When I restart and check the services that are showing down, I can see them just fine. The closer the ramdisk gets to being full, the more likely I am to click on services and/or hosts that are down and get the screen that says nothing is down.

This is rendering both Nagios installs useless at this point. :(
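To get a warning before the ramdisk reaches 100% (as the 75M one above did), a small check can read the usage straight from df. This is a minimal sketch, not a Nagios XI feature: usage_percent and check_usage are hypothetical helper names, the threshold is whatever you pass in, and the return codes only loosely follow the Nagios plugin convention (0 OK, 1 WARNING, 3 UNKNOWN).

```shell
#!/bin/sh
# Sketch: warn before a mount point (e.g. the Nagios ramdisk) fills up.

# usage_percent MOUNT -> prints the Use% of the filesystem holding MOUNT
# as a bare integer. df -P forces one line per filesystem, so long
# /dev/mapper names do not wrap onto a second line as in "df -h" output.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# check_usage MOUNT THRESHOLD
# Returns 0 (OK), 1 (at or above THRESHOLD) or 3 (usage unreadable).
check_usage() {
    pct=$(usage_percent "$1")
    case $pct in
        # Empty or non-numeric (e.g. df failed or printed "-")
        ''|*[!0-9]*) echo "UNKNOWN: could not read usage for $1"; return 3 ;;
    esac
    if [ "$pct" -ge "$2" ]; then
        echo "WARNING: $1 is ${pct}% full"
        return 1
    fi
    echo "OK: $1 is ${pct}% full"
}
```

For example, check_usage /var/nagiosramdisk 90 prints an OK or WARNING line and sets the exit status accordingly, so it could be run from cron or wrapped as a simple check.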

I decided to take a look at the commands for process-host-perfdata-* and process-service-perfdata-* after reading this post: Nagios Performance Tuning - Tech Tips: Understanding Disk I/O
A number of these were still pointing to the /usr/local/nagios/var/* files. The documentation only says to update the process-*-perfdata-file-bulk commands so they point to the ramdisk.
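To find any command definitions that still use the old location, something like the following could scan the generated command file. It is a sketch that assumes the Core Config Manager has written the commands to a plain object file such as /usr/local/nagios/etc/commands.cfg (that path is an assumption, so check your install) in the usual "define command { ... }" layout, with the closing brace on its own line; find_commands_using is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list Nagios command definitions whose command_line contains a
# given path substring (e.g. the old /usr/local/nagios/var location).

# find_commands_using CONFIG_FILE PATH_SUBSTRING
find_commands_using() {
    awk -v needle="$2" '
        # Reset state at the start of each command definition
        /define[ \t]+command/ { name = ""; cl = "" }
        $1 == "command_name"  { name = $2 }
        $1 == "command_line"  { cl = $0 }
        $1 == "}" {
            # At the closing brace, report the command if its line matches
            if (name != "" && index(cl, needle) > 0) print name
            name = ""; cl = ""
        }
    ' "$1"
}
```

For example, find_commands_using /usr/local/nagios/etc/commands.cfg /usr/local/nagios/var lists the offending command names. Any fixes would then be made in Core Config Manager, which is where the documentation says the command definitions are updated, rather than in the generated file.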

I've let the new system run like this for about an hour now, and the service-perfdata file seems to have stabilized at 44K and host-perfdata at 11K.
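Watching whether those files level off or keep growing can be done by sampling their sizes at an interval. A minimal sketch: sample_sizes is a hypothetical helper, and the file paths in the usage example are the ramdisk locations from the documented bulk commands quoted later in this thread.

```shell
#!/bin/sh
# Sketch: sample the size (in bytes) of each file a few times so you can
# see whether it grows without bound or levels off.

# sample_sizes INTERVAL_SECONDS COUNT FILE...
sample_sizes() {
    interval=$1 count=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        for f in "$@"; do
            if [ -f "$f" ]; then
                # "wc -c < file" prints only the byte count, no file name
                printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
            else
                printf 'missing %s\n' "$f"
            fi
        done
        i=$((i + 1))
        # Sleep only between samples, not after the last one
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, sample_sizes 60 10 /var/nagiosramdisk/host-perfdata /var/nagiosramdisk/service-perfdata prints ten samples one minute apart. A file that is periodically moved into the spool should show its size dropping back rather than climbing steadily.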

On a separate note, I think I figured out why the Home link on my new system differs from the Tactical Overview link. When I imported the backup, I'm guessing it kept the links pointing to the old server and didn't update them for the new one. Could this be a bug?
scottwilkerson

Re: Tactical Overview, Ops Center, Ops Screen Problems

Post by scottwilkerson »

I would go through each of the items in
http://library.nagios.com/library/produ ... n-nagiosxi

and make sure each one has been modified with the correct path.

Also, running the following may give a clue as to what is causing the ramdisk to fill:

Code:

# Count entries in each spool directory (ls -1 avoids the "total" line ls -l adds)
ls -1 /var/nagiosramdisk/spool/perfdata | wc -l
ls -1 /var/nagiosramdisk/spool/xidpe | wc -l
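Running those counts a few times is what gives the clue: if they keep climbing, whatever consumes that spool directory is not keeping up. A small sketch that reports both directories in one go; count_spool is a hypothetical helper, and it counts regular files only, so the "total" header that ls -l prints is never included.

```shell
#!/bin/sh
# Sketch: print "<file count> <directory>" for each spool directory given.

count_spool() {
    for dir in "$@"; do
        # Top-level regular files only; tr strips padding some wc versions add
        n=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
        printf '%s %s\n' "$n" "$dir"
    done
}
```

For example: count_spool /var/nagiosramdisk/spool/perfdata /var/nagiosramdisk/spool/xidpe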
jbennett

Re: Tactical Overview, Ops Center, Ops Screen Problems

Post by jbennett »

scottwilkerson wrote: I would go through each of the items in
http://library.nagios.com/library/produ ... n-nagiosxi

and make sure all have been modified and with the correct path.

Also, if you run the following it may give a clue to what is causing the ram disk to fill

Code:

ll /var/nagiosramdisk/spool/perfdata|wc -l
ll /var/nagiosramdisk/spool/xidpe|wc -l
I have, multiple times.

The documentation only has the following:

Code:

Additionally, the following command definitions will need to be updated in the Nagios XI->Core Config Manager->Commands. 
 command_name process-host-perfdata-file-bulk
 command_line /bin/mv /var/nagiosramdisk/host-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.host
 command_name process-service-perfdata-file-bulk
 command_line /bin/mv /var/nagiosramdisk/service-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.service
It doesn't touch on any of the other commands:
  • process-host-perfdata-file-pnp-bulk
  • process-host-perfdata
  • process-service-perfdata-file-pnp-bulk
  • process-service-perfdata
It wasn't until I changed these as well to point to the new folder (/var/nagiosramdisk) that the issue was corrected. None of these is mentioned anywhere in the how-to document.
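A quick way to confirm that every command named above now points at the ramdisk is to check each one's command_line for the ramdisk path. This is a sketch assuming the same plain "define command { ... }" object file layout (for example /usr/local/nagios/etc/commands.cfg, an assumed path); check_ramdisk_commands is a hypothetical helper, not part of Nagios XI.

```shell
#!/bin/sh
# Sketch: for each command name, report OK if its command_line mentions
# the ramdisk path, OLD PATH if it does not, MISSING if it is not defined.
# Returns 1 if any command is not OK.

# check_ramdisk_commands CONFIG_FILE RAMDISK_PATH COMMAND_NAME...
check_ramdisk_commands() {
    cfg=$1 ramdisk=$2
    shift 2
    status=0
    for cmd in "$@"; do
        # Print the command_line of the definition whose name matches
        line=$(awk -v want="$cmd" '
            /define[ \t]+command/ { name = ""; cl = "" }
            $1 == "command_name"  { name = $2 }
            $1 == "command_line"  { cl = $0 }
            $1 == "}" && name == want { print cl; exit }
        ' "$cfg")
        if [ -z "$line" ]; then
            printf '%-9s%s\n' MISSING "$cmd"; status=1
        else
            case $line in
                *"$ramdisk"*) printf '%-9s%s\n' OK "$cmd" ;;
                *) printf '%-9s%s\n' "OLD PATH" "$cmd"; status=1 ;;
            esac
        fi
    done
    return $status
}
```

Usage would pass the config file, /var/nagiosramdisk, and the six command names from this thread (the two file-bulk commands plus the four listed above).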
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Tactical Overview, Ops Center, Ops Screen Problems

Post by slansing »

Hey jbennett,

Could you submit a ticket to [email protected]? Since this issue is still occurring and heavily impacting your monitoring environment, that way our team can work with you directly and, if needed, look at your system over a remote session.
jbennett

Re: Tactical Overview, Ops Center, Ops Screen Problems

Post by jbennett »

slansing wrote: Hey jbennett,

Can you submit a ticket to [email protected]? This way our team can work more hand in hand over this and take a look at your system via a remote session if it is needed, being that this issue is still occurring and impacting your monitoring environment so heavily.
Thank you for the offer.

However, that isn't necessary; as I stated, everything is now corrected.

The only reason for a ticket would be to inform whoever does documentation that the documentation for utilizing a ramdisk doesn't include all of the necessary changes to commands.

A ticket would also let them know that when migrating data, links keep pointing to the old server and aren't updated on the new one.
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Tactical Overview, Ops Center, Ops Screen Problems

Post by sreinhardt »

We will look into correcting the documentation. Thank you for the info and clear descriptions!
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.