Performance dive with 2012R1.1 Upgrade?
I just upgraded from 2012R1.0 to 1.1 and am running into a LOT of load.
When I ran the configs from the command line (-v option), I have over 2500 warnings for things that were working fine before the upgrade.
It appears that templates aren't being applied to my hosts or services. I'm getting a ton of 'no check time period defined', etc.
Is this something that the recent update was trying to resolve? I thought I saw something about this in the release notes?
I made one change to remove a host from a service check and the config has been applying for well over an hour now.
Any suggestions?
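With 2500+ warnings it can help to condense the verification output before reading it. Here is a minimal sketch, assuming the verification run prints one `Warning:` line per problem and quotes object names in single quotes. The function names are made up for illustration, and the paths in the comment are the usual Nagios XI locations, which may differ on your install.

```shell
# Count the lines in a config verification run that start with "Warning:".
# Reads the verification output on stdin.
count_config_warnings() {
    # grep -c still prints 0 when nothing matches but exits non-zero,
    # so "|| true" keeps the function usable under "set -e".
    grep -c '^Warning:' || true
}

# Group warnings by kind: quoted object names are replaced with 'X'
# so that the same warning for different hosts/services collapses
# into one line with a count, most frequent first.
summarize_config_warnings() {
    grep '^Warning:' | sed "s/'[^']*'/'X'/g" | sort | uniq -c | sort -rn
}

# Example (paths are assumptions, adjust to your install):
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg > /tmp/verify.out
#   count_config_warnings < /tmp/verify.out
#   summarize_config_warnings < /tmp/verify.out | head
```

If one or two kinds of warning account for almost all of the count, that usually points at a shared template or time period rather than thousands of separate problems.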
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Performance dive with 2012R1.1 Upgrade?
As far as I am aware, nothing relating to templating has changed at all.
You may want to check your mysql.log to make sure you don't have some sort of database corruption.
Code: Select all
tail -f /var/log/mysql.log
Re: Performance dive with 2012R1.1 Upgrade?
I ran the DB clean up and the permissions tool.
I'm just running into all kinds of issues now. It appears that I have a number of orphaned checks again, and when I go to look at my service problems, nothing shows up. It will tell me I have close to 300 service warnings, yet when I click on this from the Tactical Monitoring Overview, I get: No matching services found.
The hosts seem to show up just fine.
Also, when I try to access the Monitoring Wizard, I get an HTTP 500 error page.
When I tail the mysqld.log, I see the following. The shutdown was from the db tool.
Code: Select all
121102 18:18:30 [Note] /usr/libexec/mysqld: Shutdown complete
121102 18:18:30 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
121102 18:19:52 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
121102 18:19:52 InnoDB: Initializing buffer pool, size = 8.0M
121102 18:19:52 InnoDB: Completed initialization of buffer pool
121102 18:19:54 InnoDB: Started; log sequence number 0 44263
121102 18:19:54 [Note] Event Scheduler: Loaded 0 events
121102 18:19:54 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.61' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
-
scottwilkerson
Re: Performance dive with 2012R1.1 Upgrade?
Can we reset nagios & ndo2db?
Code: Select all
service nagios stop
service ndo2db stop
killall -9 nagios
killall -9 ndo2db
service nagios start
service ndo2db start
Also, let's just verify we don't have a full disk, which can cause odd behavior like this.
Code: Select all
df -h
Re: Performance dive with 2012R1.1 Upgrade?
Code: Select all
[root@nagiosxivm ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      108G   43G   61G  42% /
tmpfs                 7.4G     0  7.4G   0% /dev/shm
/dev/sda1              97M   82M   11M  89% /boot
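For anyone who wants to script the disk check rather than eyeball it, here is a rough sketch. It assumes `df -P` output (the POSIX format keeps each filesystem on one line, unlike the wrapped LVM entry above) and mount points without spaces. The function name and the threshold value are made up for illustration.

```shell
# Print mount points whose usage is at or above a threshold percentage.
# Reads "df -P" style output on stdin; the first line is the header.
filesystems_over() {
    awk -v t="$1" 'NR > 1 {
        use = $5                 # capacity column, e.g. "89%"
        sub(/%/, "", use)        # strip the percent sign
        if (use + 0 >= t + 0) print $6 " " $5
    }'
}

# Example:
#   df -P | filesystems_over 85
```

Against the listing above this would report only /boot at 89%; the root filesystem at 42% is nowhere near full.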
scottwilkerson
Re: Performance dive with 2012R1.1 Upgrade?
"Also, when I try to access the Monitoring Wizard, I get a HTTP500 error page."
Are you still getting errors?
Can you look at the Apache error_log to see what is showing?
Code: Select all
tail -f /var/log/httpd/error_log
Re: Performance dive with 2012R1.1 Upgrade?
Yes, I'm still getting the 500 error page.
When I tail the log suggested, I am seeing the following, over and over and over again:
Code: Select all
[Mon Nov 05 22:58:01 2012] [error] [client 10.100.30.65] PHP Warning: include_once(): Failed opening '/usr/local/nagiosxi/html/includes/components/snmpwalk/../configwizardhelper.inc.php' for inclusion (include_path='.:/usr/share/pear:/usr/share/php') in /usr/local/nagiosxi/html/includes/components/snmpwalk/snmpwalk.inc.php on line 8, referer: http://10.100.30.10/nagiosxi/includes/components/nocscreen/noc.php
[Mon Nov 05 22:58:02 2012] [error] [client 127.0.0.1] PHP Warning: include_once(/usr/local/nagiosxi/html/includes/components/snmpwalk/../configwizardhelper.inc.php): failed to open stream: No such file or directory in /usr/local/nagiosxi/html/includes/components/snmpwalk/snmpwalk.inc.php on line 8
[Mon Nov 05 22:58:02 2012] [error] [client 127.0.0.1] PHP Warning: include_once(): Failed opening '/usr/local/nagiosxi/html/includes/components/snmpwalk/../configwizardhelper.inc.php' for inclusion (include_path='.:/usr/share/pear:/usr/share/php') in /usr/local/nagiosxi/html/includes/components/snmpwalk/snmpwalk.inc.php on line 8
Should I delete them from the /usr/local/nagios/etc/services/ folder, then use the Write Config Files option under Tools in the CCM?
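When the error_log repeats the same include_once() warning over and over, it can be easier to reduce it to the distinct files that fail to load. Below is a minimal sketch, assuming the log lines contain the "Failed opening '<path>'" wording shown above. The function name is made up for illustration, and `grep -o` is a GNU/BSD extension rather than strict POSIX.

```shell
# Count how often each file fails to load, based on the
# "Failed opening '<path>'" part of PHP include warnings.
# Reads an Apache error log on stdin.
missing_includes() {
    grep -o "Failed opening '[^']*'" \
        | sed "s/^Failed opening '//; s/'\$//" \
        | sort | uniq -c | sort -rn
}

# Example:
#   missing_includes < /var/log/httpd/error_log
```

For the excerpt above this would collapse to a single path, the configwizardhelper.inc.php that snmpwalk.inc.php expects one directory up.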
-
scottwilkerson
Re: Performance dive with 2012R1.1 Upgrade?
OK, this looks like you have a wizard installed in a component directory.
Run:
Code: Select all
rm -rf /usr/local/nagiosxi/html/includes/components/snmpwalk
Then try again, checking again for errors.
Re: Performance dive with 2012R1.1 Upgrade?
This has taken care of the 500 error. I'm not sure how that happened in the first place, though.
Now, I need to go about getting the old config files removed. Suggestions?
Re: Performance dive with 2012R1.1 Upgrade?
It appears that stopping nagios, manually deleting these files, then applying the configs individually through the CCM, then restarting nagios has taken care of the duplicates.
HOWEVER, I'm still running into the issue where services are reporting down in the tactical overview, yet when I click on the 7 critical, unhandled problems or the 1 warning, unhandled problem, I get a page that shows no issues.
Also, on this same page where no matching services are found, the host & service status summary boxes at the top of the screen show different counts than the tactical overview.
I'm not sure I understand why this is?
EDIT: I figured it out. It seems that in the upgrade process, things were reverted to a previous state for some reason. The hosts and services that I wasn't able to see no longer had a specific contact group assigned to them. This caused the hosts to show down, but when I would go to check or search for them I wasn't able to see them. This has since been resolved. Many thanks.
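In case it helps someone hitting the same symptom, here is a rough sketch for spotting host and service definitions that carry no contact_groups line of their own. It is only a first-pass filter: it does not resolve template inheritance via `use`, and it ignores the separate `contacts` directive, so entries that get their contact group from a template will be listed as well. The function name and the hosts directory in the comment are assumptions.

```shell
# List host/service definitions in Nagios object files that have no
# contact_groups directive of their own. Template inheritance ("use")
# is NOT resolved, so treat the output as candidates to review.
defs_without_contact_groups() {
    awk '
        # Start of a host or service block ("define host {" or "define host{").
        /^[ \t]*define[ \t]+(host|service)[ \t]*[{]/ {
            type = $2; sub(/[{].*/, "", type)
            inblock = 1; has_cg = 0; hn = ""; sd = ""
            next
        }
        # Remember identifying values; they may contain spaces.
        inblock && ($1 == "host_name" || $1 == "service_description") {
            val = $0
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", val)
            sub(/[ \t]+$/, "", val)
            if ($1 == "host_name") hn = val; else sd = val
        }
        inblock && $1 == "contact_groups" { has_cg = 1 }
        # End of block: report it if no contact_groups line was seen.
        inblock && /^[ \t]*[}]/ {
            if (!has_cg) {
                label = type ": " hn
                if (sd != "") label = label " / " sd
                print label
            }
            inblock = 0
        }
    ' "$@"
}

# Example (directories are assumptions, adjust to your install):
#   defs_without_contact_groups /usr/local/nagios/etc/hosts/*.cfg /usr/local/nagios/etc/services/*.cfg
```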