Performance dive with 2012R1.1 Upgrade?
Posted: Fri Nov 02, 2012 12:07 pm
by jbennett
I just upgraded from 2012R1.0 to 1.1 and am running into a LOT of load.
When I ran the configs from the command line (-v option), I have over 2500 warnings for things that were working fine before the upgrade.
It appears that templates aren't being applied to my hosts or services. I'm getting a ton of 'no check time period defined', etc.
Is this something that the recent update was trying to resolve? I thought I saw something about this in the release notes?
I made one change to remove a host from a service check and the config has been applying for well over an hour now.
Any suggestions?
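For anyone comparing runs before and after an upgrade, here is a minimal sketch that tallies the verification output rather than scrolling through it. It assumes warning and error lines begin with `Warning:` and `Error:`. The binary and config paths in the usage comment are the usual source-install locations, not something confirmed in this thread.

```shell
#!/bin/sh
# Tally "Warning:" and "Error:" lines from Nagios verification output on stdin.
# Assumption: each warning/error is printed on its own line with that prefix.
summarize_verify() {
    awk '
        /^Warning:/ { w++ }
        /^Error:/   { e++ }
        END { printf "warnings=%d errors=%d\n", w, e }
    '
}

# Usage (paths are assumptions, adjust to your install):
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | summarize_verify
```

Saving the full output to a file before and after a change also makes it easy to diff the two runs.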
Re: Performance dive with 2012R1.1 Upgrade?
Posted: Fri Nov 02, 2012 1:06 pm
by scottwilkerson
As far as I am aware, nothing relating to templating has changed at all.
You may want to check your mysql.log to make sure you don't have some sort of database corruption.
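A rough sketch of what that log check could look like. The search terms are only common hints (crashed tables, corruption messages, `[ERROR]` entries), not an exhaustive list, and the log path in the usage comment is an assumption.

```shell
#!/bin/sh
# Print lines from a MySQL log that hint at crashed or corrupt tables.
# $1: path to the log file.
scan_mysql_log() {
    grep -i -E 'corrupt|crashed|\[ERROR\]' "$1" \
        || echo "no obvious corruption hints in $1"
}

# Usage (log path is an assumption):
#   scan_mysql_log /var/log/mysqld.log
# A table-level check is also possible with the stock client tool:
#   mysqlcheck -u root -p --check --all-databases
```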
Re: Performance dive with 2012R1.1 Upgrade?
Posted: Fri Nov 02, 2012 2:13 pm
by jbennett
I ran the DB clean up and the permissions tool.
I'm just running into all kinds of issues now. It appears that I have a number of orphaned checks again, and when I go to look at my service problems, nothing shows up. It will tell me I have close to 300 service warnings, yet when I click on this from the Tactical Monitoring Overview, I get: No matching services found.
The hosts seem to show up just fine.
Also, when I try to access the Monitoring Wizard, I get an HTTP 500 error page.
When I tail the mysqld.log, I see the following. The shutdown was from the DB tool.
Code:
121102 18:18:30 [Note] /usr/libexec/mysqld: Shutdown complete
121102 18:18:30 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
121102 18:19:52 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
121102 18:19:52 InnoDB: Initializing buffer pool, size = 8.0M
121102 18:19:52 InnoDB: Completed initialization of buffer pool
121102 18:19:54 InnoDB: Started; log sequence number 0 44263
121102 18:19:54 [Note] Event Scheduler: Loaded 0 events
121102 18:19:54 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.61' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
Re: Performance dive with 2012R1.1 Upgrade?
Posted: Fri Nov 02, 2012 3:08 pm
by scottwilkerson
Can we reset nagios & ndo2db?
Code:
service nagios stop
service ndo2db stop
killall -9 nagios
killall -9 ndo2db
service nagios start
service ndo2db start
Also, let's just verify we don't have a full disk, which can cause odd behavior like this.
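A minimal sketch of that disk check, assuming POSIX-format `df -P` output (one line per filesystem, usage percentage in the fifth column). The 90 percent default is an arbitrary choice for illustration.

```shell
#!/bin/sh
# Flag filesystems at or above a usage threshold.
# Reads `df -P` output on stdin; $1 is the threshold in percent (default 90).
# Note: mount points containing spaces are not handled by this sketch.
flag_full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)          # strip the percent sign
            if (use + 0 >= limit + 0) print $6 " is at " $5
        }
    '
}

# Usage:
#   df -P | flag_full_filesystems 85
#   df -P -i | flag_full_filesystems 85   # same check for inode usage
```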
Re: Performance dive with 2012R1.1 Upgrade?
Posted: Mon Nov 05, 2012 9:35 am
by jbennett
scottwilkerson wrote: Can we reset nagios & ndo2db?
Code:
service nagios stop
service ndo2db stop
killall -9 nagios
killall -9 ndo2db
service nagios start
service ndo2db start
Also, let's just verify we don't have a full disk, which can cause odd behavior like this.
Code:
[root@nagiosxivm ~]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00  108G   43G   61G  42% /
tmpfs                            7.4G     0  7.4G   0% /dev/shm
/dev/sda1                         97M   82M   11M  89% /boot
Re: Performance dive with 2012R1.1 Upgrade?
Posted: Mon Nov 05, 2012 12:31 pm
by scottwilkerson
Are you still getting errors?
jbennett wrote: Also, when I try to access the Monitoring Wizard, I get an HTTP 500 error page.
Can you look at the Apache error_log to see what it is showing?
Re: Performance dive with 2012R1.1 Upgrade?
Posted: Mon Nov 05, 2012 5:57 pm
by jbennett
scottwilkerson wrote: Are you still getting errors?
Also, when I try to access the Monitoring Wizard, I get an HTTP 500 error page.
Can you look at the Apache error_log to see what it is showing?
Yes, I'm still getting the 500 error page.
When I tail the log suggested, I am seeing the following, over and over and over again:
Code:
[Mon Nov 05 22:58:01 2012] [error] [client 10.100.30.65] PHP Warning: include_once(): Failed opening '/usr/local/nagiosxi/html/includes/components/snmpwalk/../configwizardhelper.inc.php' for inclusion (include_path='.:/usr/share/pear:/usr/share/php') in /usr/local/nagiosxi/html/includes/components/snmpwalk/snmpwalk.inc.php on line 8, referer: http://10.100.30.10/nagiosxi/includes/components/nocscreen/noc.php
[Mon Nov 05 22:58:02 2012] [error] [client 127.0.0.1] PHP Warning: include_once(/usr/local/nagiosxi/html/includes/components/snmpwalk/../configwizardhelper.inc.php): failed to open stream: No such file or directory in /usr/local/nagiosxi/html/includes/components/snmpwalk/snmpwalk.inc.php on line 8
[Mon Nov 05 22:58:02 2012] [error] [client 127.0.0.1] PHP Warning: include_once(): Failed opening '/usr/local/nagiosxi/html/includes/components/snmpwalk/../configwizardhelper.inc.php' for inclusion (include_path='.:/usr/share/pear:/usr/share/php') in /usr/local/nagiosxi/html/includes/components/snmpwalk/snmpwalk.inc.php on line 8
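Since the same warning repeats over and over, a small sketch like the following can collapse the log into one line per missing file with a count. It assumes the PHP message format shown above (`Failed opening '<path>'`). The log path in the usage comment is an assumption.

```shell
#!/bin/sh
# Summarize PHP include failures from an Apache error_log read on stdin.
# Prints each distinct missing file once, prefixed with how often it appeared.
missing_includes() {
    sed -n "s/.*Failed opening '\([^']*\)'.*/\1/p" | sort | uniq -c | sort -rn
}

# Usage (log path is an assumption):
#   tail -n 5000 /var/log/httpd/error_log | missing_includes
```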
Also, when I run the config check, I'm seeing a lot of duplicate definitions reported for services that I had deleted prior to the upgrade. I've tried finding them through the CCM, but they're not there. If I delete them from the /usr/local/nagios/etc/services/ folder, they reappear once I reapply the configs.
Should I delete them from the /usr/local/nagios/etc/services/ folder, then use the Write Config Files option under Tools in the CCM?
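To see which files the duplicates actually live in, a rough sketch like this can help. It assumes plain `define service` blocks that each carry a single `host_name` and a `service_description`. Templates, host groups, and comma-separated host lists are not handled, so treat the output as a hint only.

```shell
#!/bin/sh
# Report service definitions whose host_name/service_description pair has
# already been seen in an earlier block or file. Arguments: .cfg files to scan.
find_duplicate_services() {
    awk '
        /^[ \t]*define[ \t]+service[ \t]*([{]|$)/ { in_svc = 1; host = ""; desc = ""; next }
        in_svc && $1 == "host_name" { host = $2 }
        in_svc && $1 == "service_description" {
            $1 = ""; sub(/^[ \t]+/, ""); desc = $0   # keep multi-word descriptions
        }
        in_svc && /}/ {
            key = host " / " desc
            if (host != "" && desc != "" && seen[key]++)
                print "duplicate: " key " (" FILENAME ")"
            in_svc = 0
        }
    ' "$@"
}

# Usage:
#   find_duplicate_services /usr/local/nagios/etc/services/*.cfg
```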
Re: Performance dive with 2012R1.1 Upgrade?
Posted: Mon Nov 05, 2012 6:03 pm
by scottwilkerson
Ok, this looks like you have a wizard installed in a component directory
Run
Code:
rm -rf /usr/local/nagiosxi/html/includes/components/snmpwalk
Then try again, checking again for errors
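If you would rather keep a copy than delete outright, here is a cautious sketch that moves the directory aside with a timestamp. The default backup location is hypothetical. Whatever you pick should sit outside the components directory, on the assumption that anything left under it could still be loaded.

```shell
#!/bin/sh
# Move a directory aside instead of deleting it.
# $1: directory to move; $2: backup root (hypothetical default below).
quarantine_dir() {
    src=$1
    dest_root=${2:-/root/nagiosxi-quarantine}
    [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
    mkdir -p "$dest_root" || return 1
    # Timestamp suffix so repeated runs never overwrite an earlier copy.
    mv "$src" "$dest_root/$(basename "$src").$(date +%Y%m%d%H%M%S)"
}

# Usage:
#   quarantine_dir /usr/local/nagiosxi/html/includes/components/snmpwalk
```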
Re: Performance dive with 2012R1.1 Upgrade?
Posted: Tue Nov 06, 2012 9:28 am
by jbennett
scottwilkerson wrote: Ok, this looks like you have a wizard installed in a component directory
Run
Code:
rm -rf /usr/local/nagiosxi/html/includes/components/snmpwalk
Then try again, checking again for errors
This has taken care of the 500 error. I'm not sure how that happened in the first place though?
Now, I need to go about getting the old config files removed. Suggestions?
Re: Performance dive with 2012R1.1 Upgrade?
Posted: Tue Nov 06, 2012 10:56 am
by jbennett
It appears that stopping nagios, manually deleting these files, applying the configs individually through the CCM, and then restarting nagios has taken care of the duplicates.
HOWEVER, I'm still running into the issue where services are reporting down in the tactical overview, yet when I click on the 7 critical, unhandled problems or the 1 warning, unhandled problem, I get a page that shows no issues.
Also, on this same page where no matching services are found, the host & service status summary boxes at the top of the screen show different counts than the tactical overview.
I'm not sure I understand why this is.
EDIT: I figured it out. It seems that in the upgrade process, things were reverted to a previous state for some reason. The hosts and services that I wasn't able to see no longer had a specific contact group assigned to them. This caused the hosts to show down, but when I would go to check or search for them I wasn't able to see them. This has since been resolved. Many thanks.