Performance dive with 2012R1.1 Upgrade?

jbennett · Post by **jbennett** » Fri Nov 02, 2012 12:07 pm

I just upgraded from 2012R1.0 to 2012R1.1 and am running into a LOT of load.

When I verified the configs from the command line (-v option), I got over 2500 warnings for things that were working fine before the upgrade.

It appears that templates aren't being applied to my hosts or services. I'm getting a ton of 'no check time period defined' warnings, etc.
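To see how many of those warnings are of one kind, a small filter over the verification output can help. This is a minimal sketch, assuming each warning line starts with "Warning:" as in Nagios Core 3 output; the function name count_warnings is hypothetical.

```shell
#!/usr/bin/env bash
# count_warnings [PATTERN]   (hypothetical helper)
# Reads `nagios -v` output on stdin and prints how many lines start with
# "Warning:" (assumed prefix). If PATTERN (an extended regex) is given,
# only warnings that also match it are counted.
count_warnings() {
  local pattern=${1:-.}
  # `|| true` keeps the exit status 0 when the count is zero.
  grep -E '^Warning:' | grep -cE -- "$pattern" || true
}
```

Usage, assuming the default Nagios XI paths: `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | count_warnings 'check time period'`.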

Is this something that the recent update was trying to resolve? I thought I saw something about this in the release notes.

I made one change to remove a host from a service check, and the config has been applying for well over an hour now.

Any suggestions?

scottwilkerson · Post by **scottwilkerson** » Fri Nov 02, 2012 1:06 pm

As far as I am aware, nothing relating to templating has changed at all.

You may want to check your mysqld.log to make sure you don't have some sort of database corruption.

Code:

tail -f /var/log/mysqld.log
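Rather than watching the log scroll by, you can filter it down to the lines that usually signal trouble. A rough sketch, assuming the MySQL 5.1 log format and that problems are tagged [ERROR] or mention a table "marked as crashed"; the function name db_log_problems is hypothetical.

```shell
#!/usr/bin/env bash
# db_log_problems   (hypothetical helper)
# Reads a MySQL error log on stdin and prints only lines tagged [ERROR]
# or mentioning crashed tables. These patterns are assumptions; widen
# them if your log uses other wording.
db_log_problems() {
  # `|| true` so "no problems found" is not reported as a failure.
  grep -E '\[ERROR\]|crashed' || true
}
```

For example `db_log_problems < /var/log/mysqld.log` (the actual log path depends on your my.cnf). Empty output means neither pattern matched, not that the database is guaranteed healthy.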

jbennett · Post by **jbennett** » Fri Nov 02, 2012 2:13 pm

I ran the DB clean up and the permissions tool.

I'm just running into all kinds of issues now. It appears that I have a number of orphaned checks again, and when I go to look at my service problems, nothing shows up. It tells me I have close to 300 service warnings, yet when I click on that count in the Tactical Monitoring Overview, I get "No matching services found."

The hosts seem to show up just fine.

Also, when I try to access the Monitoring Wizard, I get an HTTP 500 error page.

When I tail the mysqld.log, I see the following. The shutdown was from the DB tool.

Code:

121102 18:18:30 [Note] /usr/libexec/mysqld: Shutdown complete

121102 18:18:30 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
121102 18:19:52 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
121102 18:19:52  InnoDB: Initializing buffer pool, size = 8.0M
121102 18:19:52  InnoDB: Completed initialization of buffer pool
121102 18:19:54  InnoDB: Started; log sequence number 0 44263
121102 18:19:54 [Note] Event Scheduler: Loaded 0 events
121102 18:19:54 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.61'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution

scottwilkerson · Post by **scottwilkerson** » Fri Nov 02, 2012 3:08 pm

Can we reset nagios & ndo2db?

Code:

# Stop both daemons cleanly
service nagios stop
service ndo2db stop
# Force-kill any processes that are still running
killall -9 nagios
killall -9 ndo2db
# Start both daemons again
service nagios start
service ndo2db start
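The same reset can be wrapped in a function so it is easy to repeat. This is a sketch with its own assumptions, not the exact procedure above: it uses the same service names, adds a hypothetical DRY_RUN switch for previewing the commands, and starts ndo2db before nagios so the NDO broker module has a daemon to connect to when Nagios loads.

```shell
#!/usr/bin/env bash
# run CMD...: execute a command, or only print it when DRY_RUN is set
# (DRY_RUN is a hypothetical convention for this sketch).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# restart_monitoring: stop Nagios and ndo2db, kill stragglers, start again.
restart_monitoring() {
  run service nagios stop
  run service ndo2db stop
  # killall exits non-zero when no process matches; that is expected here.
  run killall -9 nagios || true
  run killall -9 ndo2db || true
  # Start the database daemon first so Nagios's broker module can connect.
  run service ndo2db start
  run service nagios start
}
```

Run `DRY_RUN=1 restart_monitoring` to preview the six commands, then `restart_monitoring` as root to execute them.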

Also, let's just verify we don't have a full disk, which can cause odd behavior like this.

Code:

df -h
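To check this without eyeballing the table, a filter can flag anything over a chosen threshold. A sketch that assumes `df -P` (POSIX) output, which keeps each filesystem on one line instead of wrapping long device names the way plain `df -h` does; the function name and the 90% default are arbitrary choices. A full inode table can also cause odd failures, and the same filter works on `df -iP` output.

```shell
#!/usr/bin/env bash
# full_filesystems [THRESHOLD]   (hypothetical helper)
# Reads `df -P` or `df -iP` output on stdin and prints "MOUNT USE%" for
# every filesystem at or above THRESHOLD percent (default 90).
# Mount points containing spaces are not handled.
full_filesystems() {
  local threshold=${1:-90}
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)
    # Some filesystems report "-" for inode usage; skip those.
    if (use != "-" && use + 0 >= t) print $6, $5
  }'
}
```

For example `df -P | full_filesystems 85`; in the output posted below, /boot at 89% would be the only line reported at that threshold.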

jbennett · Post by **jbennett** » Mon Nov 05, 2012 9:35 am

scottwilkerson wrote: Can we reset nagios & ndo2db
Code:
service nagios stop
service ndo2db stop
killall -9 nagios
killall -9 ndo2db
service nagios start
service ndo2db start
Also, let's just verify we don't have a full disk which can cause odd behavior like this
Code:
df -h

Code:

[root@nagiosxivm ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      108G   43G   61G  42% /
tmpfs                 7.4G     0  7.4G   0% /dev/shm
/dev/sda1              97M   82M   11M  89% /boot

scottwilkerson · Post by **scottwilkerson** » Mon Nov 05, 2012 12:31 pm

Are you still getting errors?

Also, when I try to access the Monitoring Wizard, I get an HTTP 500 error page.

Can you look at the Apache error_log to see what is showing?

Code:

tail -f /var/log/httpd/error_log

jbennett · Post by **jbennett** » Mon Nov 05, 2012 5:57 pm

scottwilkerson wrote: Are you still getting errors?

Also, when I try to access the Monitoring Wizard, I get an HTTP 500 error page.
Can you look at the Apache error_log to see what is showing?
Code:
tail -f /var/log/httpd/error_log

Yes, I'm still getting the 500 error page.

When I tail the log you suggested, I see the following, over and over and over again:

Code:

[Mon Nov 05 22:58:01 2012] [error] [client 10.100.30.65] PHP Warning:  include_once(): Failed opening '/usr/local/nagiosxi/html/includes/components/snmpwalk/../configwizardhelper.inc.php' for inclusion (include_path='.:/usr/share/pear:/usr/share/php') in /usr/local/nagiosxi/html/includes/components/snmpwalk/snmpwalk.inc.php on line 8, referer: http://10.100.30.10/nagiosxi/includes/components/nocscreen/noc.php
[Mon Nov 05 22:58:02 2012] [error] [client 127.0.0.1] PHP Warning:  include_once(/usr/local/nagiosxi/html/includes/components/snmpwalk/../configwizardhelper.inc.php): failed to open stream: No such file or directory in /usr/local/nagiosxi/html/includes/components/snmpwalk/snmpwalk.inc.php on line 8
[Mon Nov 05 22:58:02 2012] [error] [client 127.0.0.1] PHP Warning:  include_once(): Failed opening '/usr/local/nagiosxi/html/includes/components/snmpwalk/../configwizardhelper.inc.php' for inclusion (include_path='.:/usr/share/pear:/usr/share/php') in /usr/local/nagiosxi/html/includes/components/snmpwalk/snmpwalk.inc.php on line 8
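When the same warning repeats this often, it helps to reduce the log to the distinct files PHP failed to load. A minimal sketch, assuming the "Failed opening '<path>'" wording shown in these lines; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# failed_includes   (hypothetical helper)
# Reads an Apache error_log on stdin and prints each distinct path that
# PHP reported as "Failed opening '<path>'", one per line.
failed_includes() {
  sed -n "s/.*Failed opening '\([^']*\)'.*/\1/p" | sort -u
}
```

For example `failed_includes < /var/log/httpd/error_log`. For the three lines above it prints only the snmpwalk path to configwizardhelper.inc.php.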

Also, when I run the config check, I'm seeing a lot of "duplicate definition found" warnings for services that I had deleted before the upgrade. I've tried finding them through the CCM, but they're not there. If I delete them from the /usr/local/nagios/etc/services/ folder, they reappear once I reapply the configs.

Should I delete them from the /usr/local/nagios/etc/services/ folder, then use the Write Config Files option under Tools in the CCM?
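To see which files the duplicates come from, you can pull the config file names out of the verification warnings. This sketch assumes the warnings follow the Nagios Core 3 wording "Duplicate definition found for service ... (config file '<path>', starting on line N)"; check that against your own `nagios -v` output first. Each warning names one of the two files involved, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# duplicate_sources   (hypothetical helper)
# Reads `nagios -v` output on stdin and prints, once each, the config
# files named in "Duplicate definition found" warnings (assumed format).
duplicate_sources() {
  grep 'Duplicate definition found' |
    sed -n "s/.*(config file '\([^']*\)'.*/\1/p" |
    sort -u
}
```

For example `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | duplicate_sources`, assuming the default Nagios XI paths.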

scottwilkerson · Post by **scottwilkerson** » Mon Nov 05, 2012 6:03 pm

OK, this looks like you have a wizard installed in a component directory. Run:

Code:

rm -rf /usr/local/nagiosxi/html/includes/components/snmpwalk

Then try again and check for errors.
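If one wizard ended up under components/, others may have too. The log suggests why it breaks: the wizard includes `../configwizardhelper.inc.php`, a relative path that presumably only resolves when the wizard sits in the config wizards directory. A sketch that lists component directories containing such an include, assuming grep supports `-r` and `--include` (GNU grep does) and the default /usr/local/nagiosxi path; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# misplaced_wizards [COMPONENTS_DIR]   (hypothetical helper)
# Prints each component directory whose PHP files reference
# configwizardhelper.inc.php, a hint that a config wizard was unpacked
# into the components tree.
misplaced_wizards() {
  local dir=${1:-/usr/local/nagiosxi/html/includes/components}
  grep -rl --include='*.php' 'configwizardhelper.inc.php' "$dir" 2>/dev/null |
    while IFS= read -r file; do
      # Keep only the first path segment below COMPONENTS_DIR.
      rel=${file#"$dir"/}
      printf '%s\n' "$dir/${rel%%/*}"
    done | sort -u
}
```

Review the output before removing anything; moving a directory aside with `mv` is easier to undo than `rm -rf`.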

jbennett · Post by **jbennett** » Tue Nov 06, 2012 9:28 am

scottwilkerson wrote: Ok, this looks like you have a wizard installed in a component directory
Run
Code:
rm -rf /usr/local/nagiosxi/html/includes/components/snmpwalk
Then try again, checking again for errors

This has taken care of the 500 error. I'm not sure how that happened in the first place, though.

Now, I need to go about getting the old config files removed. Suggestions?

jbennett · Post by **jbennett** » Tue Nov 06, 2012 10:56 am

It appears that stopping Nagios, manually deleting these files, applying the configs individually through the CCM, and then restarting Nagios has taken care of the duplicates.
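The manual cleanup is safer if the stale files are moved aside rather than deleted, so they can be restored if something turns out to depend on them. A sketch under that assumption, with Nagios already stopped; the function name and backup location are hypothetical.

```shell
#!/usr/bin/env bash
# stash_configs BACKUP_ROOT FILE...   (hypothetical helper)
# Moves each FILE into a new timestamped directory under BACKUP_ROOT and
# prints that directory's path. Nothing is deleted. Files with the same
# basename would collide, so this assumes they come from one directory
# such as services/.
stash_configs() {
  local root=$1
  shift
  local dest
  dest="$root/stale-configs-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dest" || return 1
  local f
  for f in "$@"; do
    mv -- "$f" "$dest/" || return 1
  done
  printf '%s\n' "$dest"
}
```

For example `stash_configs /root/nagios-backup /usr/local/nagios/etc/services/oldhost.cfg`, where both paths are placeholders for your own backup location and stale file.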

HOWEVER, I'm still running into the issue where services are reporting down in the tactical overview, yet when I click on the 7 critical, unhandled problems or the 1 warning, unhandled problem, I get a page that shows no issues.

Also, on this same page where no matching services are found, the host & service status summary boxes at the top of the screen show different counts than the tactical overview.

I don't understand why this is.

EDIT: I figured it out. It seems that in the upgrade process, things were reverted to a previous state for some reason. The hosts and services that I wasn't able to see no longer had a specific contact group assigned to them. This caused the hosts to show as down, but when I went to check on or search for them, I couldn't see them. This has since been resolved. Many thanks.
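Since the fix came down to missing contact assignments, a scan of the object files for service definitions with no contacts or contact_groups directive might catch this sooner. A rough sketch: it reads plain Nagios object definitions and does not follow `use` template inheritance, so a service that gets its contacts from a template will still be listed. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# services_without_contacts [FILE...]   (hypothetical helper)
# Reads Nagios object definitions (from FILEs or stdin) and prints
# "HOST / SERVICE" for each `define service` block that has neither a
# contacts nor a contact_groups directive. Templates (`use`) are NOT
# resolved, and a closing brace must be on its own line.
services_without_contacts() {
  awk '
    /^[ \t]*define[ \t]+service[ \t]*[{]/ {
      inblock = 1; has = 0; host = ""; desc = ""; next
    }
    inblock && /^[ \t]*[}]/ {
      if (!has) print host " / " desc
      inblock = 0; next
    }
    inblock {
      if ($1 == "contacts" || $1 == "contact_groups") has = 1
      else if ($1 == "host_name") { host = $0; sub(/^[ \t]*host_name[ \t]+/, "", host) }
      else if ($1 == "service_description") { desc = $0; sub(/^[ \t]*service_description[ \t]+/, "", desc) }
    }
  ' "$@"
}
```

For example `services_without_contacts /usr/local/nagios/etc/services/*.cfg`.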

Nagios Support Forum
