Re: Post Upgrade Issues on 2014R1.3
Posted: Mon Aug 04, 2014 4:48 pm
by chriscamm
Sorry, I've been away.
OK, I had issues upgrading from 2012 to 2014, which have been worked on in this forum. I installed mod_gearman at the suggestion of someone at Nagios, and it was working well, or seemed to be, until this upgrade. That said, I may have had other issues that were masking these errors.
I am available for a remote session should you need to see the issues.
Thanks
Chris
Re: Post Upgrade Issues on 2014R1.3
Posted: Tue Aug 05, 2014 2:12 am
by chriscamm
Hi,
My Nagios server now keeps running out of disk space:
Code:
/usr/local/nagios/var has the following files:
drwxrwxr-x. 7 nagios nagios 135168 Aug 5 07:56 .
drwxr-xr-x. 9 root root 4096 Jun 13 10:13 ..
drwxrwxr-x. 2 nagios nagios 20480 Aug 5 00:01 archives
-rw-r--r-- 1 nagios nagios 144482304 Aug 5 07:56 host-perfdata
-rw-r--r-- 1 nagios nagios 12508 Jun 16 21:33 livestatus.log
drwxr-xr-x 2 nagios nagios 4096 Jul 7 15:26 log
-rw-r--r-- 1 nagios nagios 6 Aug 4 23:33 nagios.lock
-rw-r--r-- 1 nagios nagios 2347008 Aug 5 07:56 nagios.log
-rw-r--r-- 1 nagios nagios 626688 Aug 5 07:56 ndo2db.debug
-rw-r--r-- 1 nagios nagios 1003389 Aug 5 01:34 ndo2db.debug.old
-rw-r--r-- 1 nagios nagios 5 Aug 4 22:15 ndo2db.lock
-rw-r--r-- 1 nagios nagios 0 Aug 4 23:30 ndomod.tmp
srwxr-xr-x 1 nagios nagios 0 Aug 4 22:15 ndo.sock
-rw-r--r-- 1 nagios nagios 10387456 Aug 5 07:55 npcd.log
-rw-r--r-- 1 nagios nagios 10485781 Jul 30 22:42 npcd.log.old
-rw-r--r--. 1 nagios nagios 4318354 Aug 4 23:33 objects.cache
-rw-rw-rw- 1 nagios nagios 5121567 Jul 18 22:40 perfdata.log
-rw------- 1 nagios users 0 Aug 5 07:33 retention.dat
drwxrwsr-x. 2 nagios nagcmd 4096 Aug 4 23:33 rw
---------- 1 nagios users 69632 Aug 5 01:37 sed04y1dh
---------- 1 root root 1005694353 Aug 4 23:48 sed0OQHX3
---------- 1 nagios users 2248564739 Aug 4 23:37 sed0URYMb
---------- 1 nagios users 4096 Aug 5 01:48 sed2esThS
---------- 1 nagios users 933888 Aug 5 01:34 sed3Hj3th
---------- 1 nagios users 1699409947 Aug 5 00:08 sed50Z2Ez
---------- 1 nagios users 4096 Aug 5 00:54 sed7z4EVR
---------- 1 nagios users 4096 Aug 5 01:23 sed97TL3a
---------- 1 nagios users 3285609404 Aug 4 23:45 sed9j1UzA
---------- 1 nagios users 1765825811 Aug 5 00:16 sedAm33Dd
---------- 1 nagios users 1258598043 Aug 4 23:59 sedb6m10N
---------- 1 nagios users 4096 Aug 5 00:46 sedcqHwdL
---------- 1 nagios users 4096 Aug 5 00:29 sedd2bazE
---------- 1 nagios users 2720014924 Aug 4 23:40 sedd9gCnd
---------- 1 nagios users 118784 Aug 5 00:29 sede7WkxK
---------- 1 nagios users 1431769009 Aug 5 00:02 sedEgf1Fx
---------- 1 nagios users 167591936 Aug 5 00:17 sedEKxV6Z
---------- 1 nagios users 1353080832 Aug 5 00:10 sedEOEMuF
---------- 1 nagios users 1734414111 Aug 5 00:09 sedf4GSMD
---------- 1 nagios users 4993024 Aug 5 00:18 sedFTRuFZ
---------- 1 nagios users 2486180834 Aug 4 23:53 sedG9SYtD
---------- 1 nagios users 4096 Aug 5 00:23 sedgdxshc
---------- 1 nagios users 3371499424 Aug 4 23:48 sedGgWR3D
---------- 1 nagios users 1195097957 Aug 4 23:58 sedHGispk
---------- 1 nagios users 4096 Aug 5 01:32 sedHYsGWQ
---------- 1 nagios users 1851322335 Aug 5 00:15 sedI2yCfI
---------- 1 nagios users 131072 Aug 5 01:29 sediHERPW
---------- 1 nagios users 3103634677 Aug 4 23:44 sedjkliQG
---------- 1 nagios users 1761226372 Aug 5 00:12 sedle9x7g
---------- 1 nagios users 4096 Aug 5 01:07 sedMY4QAc
---------- 1 nagios users 2928884739 Aug 4 23:41 sedNCvpp5
---------- 1 nagios users 3202558427 Aug 4 23:43 sednf4fuN
---------- 1 nagios users 2880693270 Aug 4 23:55 sedns6b5E
---------- 1 nagios users 16384 Aug 5 00:36 sednyqbEs
---------- 1 nagios users 3899392 Aug 5 01:07 sedodD9qN
---------- 1 nagios users 143360 Aug 5 01:33 sedQfqlGp
---------- 1 nagios users 2026523631 Aug 4 23:52 sedQIhCyt
---------- 1 nagios users 4096 Aug 5 01:33 sedrB4gTA
---------- 1 nagios users 888189674 Aug 4 23:56 sedrY4tT9
---------- 1 nagios users 1792732198 Aug 5 00:13 seds7YoO1
---------- 1 nagios users 4096 Aug 5 00:22 sedsjxFXG
---------- 1 nagios users 2145236747 Aug 4 23:35 sedsMjzDt
---------- 1 nagios users 0 Aug 5 00:39 sedsMO4YV
---------- 1 nagios users 4096 Aug 5 00:22 sedSmuNv4
---------- 1 nagios users 9875456 Aug 5 00:18 sedt5e29m
---------- 1 nagios users 1667689862 Aug 5 00:06 seduHurSD
---------- 1 nagios users 2285268992 Aug 4 23:50 seduZNllf
---------- 1 nagios users 2948509058 Aug 4 23:34 sedWbuPzK
---------- 1 nagios users 1776100559 Aug 4 23:39 sedwRuLMS
---------- 1 nagios users 3369140220 Aug 4 23:47 sedwsJ12f
---------- 1 nagios users 0 Aug 5 06:59 sedwsJW9X
---------- 1 nagios users 4096 Aug 5 00:34 sedWZ8QUH
---------- 1 nagios users 0 Aug 5 00:29 sedwZKCKn
---------- 1 nagios users 0 Aug 5 02:03 sedY9gWaT
---------- 1 nagios users 4096 Aug 5 00:32 sedyFHI71
---------- 1 nagios users 1623123406 Aug 5 00:03 sedYroHYK
---------- 1 nagios users 1325838268 Aug 5 00:00 sedytWlkJ
---------- 1 nagios users 0 Aug 5 01:06 sedYuvH7D
---------- 1 nagios users 1660052903 Aug 5 00:05 sedzSYAZH
-rw-r--r-- 1 nagios nagios 3415560192 Aug 5 07:56 service-perfdata
drwxr-xr-x. 5 nagios nagios 4096 Jun 6 2012 spool
drwxr-xr-x. 2 nagios nagios 4096 Jul 18 22:40 stats
-rw-rw-r-- 1 nagios users 0 Aug 5 07:56 status.dat
My process-service-perfdata-file-bulk command is
Code:
sed -i 's/\\n//g' /usr/local/nagios/var/service-perfdata /bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.service
My process-host-perfdata-file-bulk command is
Code:
sed -i 's/\\n//g' /usr/local/nagios/var/host-perfdata /bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.host
Any ideas why it is not importing them correctly?
Thanks
Chris
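A note on the listing above: the `sed*` entries (the prefix `sed` plus six random characters) match the naming of the temporary files that GNU `sed -i` writes next to its target before renaming them into place, so they are probably leftovers from runs of the perfdata commands that were interrupted before finishing. The following is a minimal sketch for totalling how much space such files take. It assumes GNU `find` (for `-printf`), and the function name `sed_tmp_usage` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count leftover sed temp files in a directory
# and total their size. Read-only; it deletes nothing.
# Assumes GNU find (for -maxdepth and -printf).
sed_tmp_usage() {
    dir="$1"
    # 'sed??????' matches "sed" followed by exactly six characters,
    # the temp-file naming GNU sed uses for in-place edits.
    find "$dir" -maxdepth 1 -type f -name 'sed??????' -printf '%s\n' |
        awk '{ total += $1; n++ }
             END { printf "%d files, %.0f bytes\n", n + 0, total + 0 }'
}

# Example:
# sed_tmp_usage /usr/local/nagios/var
```

If the total accounts for the missing disk space, the files can be removed once no `sed` process is still writing to them.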
Re: Post Upgrade Issues on 2014R1.3
Posted: Tue Aug 05, 2014 2:59 pm
by lmiltchev
Can you show us the output of the following command in code wraps?
Are you currently using livestatus? Did you install it after the upgrade? What's the livestatus version?
Re: Post Upgrade Issues on 2014R1.3
Posted: Tue Aug 05, 2014 4:16 pm
by chriscamm
Hi, no livestatus running.
/var/log/messages:
Code:
[root@qualngs ~]# tail -50 /var/log/messages
Aug 5 21:47:31 qualngs abrt-server[55041]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 21:47:31 qualngs abrt[55042]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 21:47:31 qualngs abrtd[2081]: New client connected
Aug 5 21:47:31 qualngs abrt-server[55043]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 21:47:31 qualngs abrt[55044]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 21:47:31 qualngs abrtd[2081]: New client connected
Aug 5 21:47:31 qualngs abrt-server[55045]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 21:47:32 qualngs abrt[55046]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 21:47:32 qualngs abrtd[2081]: New client connected
Aug 5 21:47:32 qualngs abrt-server[55047]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 21:47:32 qualngs abrt[55048]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 21:47:32 qualngs abrtd[2081]: New client connected
Aug 5 21:47:32 qualngs abrt-server[55049]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 21:47:32 qualngs abrt[55051]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 21:47:32 qualngs abrtd[2081]: New client connected
Aug 5 21:47:32 qualngs abrt-server[55052]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 21:47:33 qualngs abrt[55053]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 21:47:33 qualngs abrtd[2081]: New client connected
Aug 5 21:47:33 qualngs abrt-server[55054]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 21:47:33 qualngs abrt[55055]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 21:47:33 qualngs abrtd[2081]: New client connected
Aug 5 21:47:33 qualngs abrt-server[55056]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 21:47:34 qualngs abrt[55057]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 21:47:34 qualngs abrtd[2081]: New client connected
Aug 5 21:47:34 qualngs abrt-server[55058]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 21:50:12 qualngs xinetd[1943]: START: nrpe pid=61443 from=::ffff:172.20.10.134
Aug 5 21:50:12 qualngs nrpe[61443]: Error: Could not complete SSL handshake. 5
Aug 5 21:50:12 qualngs xinetd[1943]: EXIT: nrpe status=0 pid=61443 duration=0(sec)
Aug 5 21:55:34 qualngs xinetd[1943]: START: nrpe pid=9000 from=::ffff:172.20.10.126
Aug 5 21:55:34 qualngs nrpe[9000]: Error: Could not complete SSL handshake. 5
Aug 5 21:55:34 qualngs xinetd[1943]: EXIT: nrpe status=0 pid=9000 duration=0(sec)
Aug 5 22:01:00 qualngs xinetd[1943]: START: nrpe pid=21631 from=::ffff:172.20.10.126
Aug 5 22:01:00 qualngs nrpe[21631]: Error: Could not complete SSL handshake. 5
Aug 5 22:01:00 qualngs xinetd[1943]: EXIT: nrpe status=0 pid=21631 duration=0(sec)
Aug 5 22:06:51 qualngs xinetd[1943]: START: nrpe pid=34617 from=::ffff:172.20.10.126
Aug 5 22:06:51 qualngs nrpe[34617]: Error: Could not complete SSL handshake. 5
Aug 5 22:06:51 qualngs xinetd[1943]: EXIT: nrpe status=0 pid=34617 duration=0(sec)
Aug 5 22:07:59 qualngs abrt[36508]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 22:07:59 qualngs abrtd[2081]: New client connected
Aug 5 22:07:59 qualngs abrt-server[36509]: Saved Python crash dump of pid 36508 to /var/spool/abrt/pyhook-2014-08-05-22:07:59-36508
Aug 5 22:07:59 qualngs abrtd[2081]: Directory 'pyhook-2014-08-05-22:07:59-36508' creation detected
Aug 5 22:07:59 qualngs abrtd[2081]: Executable '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Aug 5 22:07:59 qualngs abrtd[2081]: 'post-create' on '/var/spool/abrt/pyhook-2014-08-05-22:07:59-36508' exited with 1
Aug 5 22:07:59 qualngs abrtd[2081]: Deleting problem directory '/var/spool/abrt/pyhook-2014-08-05-22:07:59-36508'
Aug 5 22:07:59 qualngs abrt[36512]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 22:07:59 qualngs abrtd[2081]: New client connected
Aug 5 22:07:59 qualngs abrt-server[36513]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug 5 22:12:08 qualngs xinetd[1943]: START: nrpe pid=49037 from=::ffff:172.20.10.126
Aug 5 22:12:08 qualngs nrpe[49037]: Error: Could not complete SSL handshake. 5
Aug 5 22:12:08 qualngs xinetd[1943]: EXIT: nrpe status=0 pid=49037 duration=0(sec)
Re: Post Upgrade Issues on 2014R1.3
Posted: Tue Aug 05, 2014 4:40 pm
by scottwilkerson
OK, here is the issue: you installed mod_gearman, and the check_rrdtraf plugin used for bandwidth checks needs to read a file from the local XI server.
These are items you would want to exclude from distribution to mod_gearman. You can do this by adding the MRTG bandwidth check services to a servicegroup, then modifying /usr/local/etc/mod_gearman/mod_gearman_neb.conf on your XI server to contain the following, replacing NEW_SERVICEGROUP_NAME with the name of the servicegroup:
Code:
# sets a list of servicegroups which will not be executed
# by gearman. They are just passed through.
# Default is none
localservicegroups=NEW_SERVICEGROUP_NAME
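After editing the file, it can help to confirm that the setting is actually active and not commented out. The following is a minimal sketch; the function name `local_servicegroups` is hypothetical, and the path in the example is the one given above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of every uncommented
# localservicegroups line in a mod_gearman NEB config file.
local_servicegroups() {
    conf="$1"
    # Lines starting with '#' do not match, so commented-out
    # settings are ignored.
    sed -n 's/^[[:space:]]*localservicegroups[[:space:]]*=[[:space:]]*//p' "$conf"
}

# Example:
# local_servicegroups /usr/local/etc/mod_gearman/mod_gearman_neb.conf
```

Empty output means the setting is missing or still commented out. The NEB module most likely reads this file only when Nagios loads it, so a Nagios restart is probably needed before the change takes effect.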
Re: Post Upgrade Issues on 2014R1.3
Posted: Wed Aug 06, 2014 5:09 am
by chriscamm
Thanks I will check and update
Chris
Re: Post Upgrade Issues on 2014R1.3
Posted: Wed Aug 13, 2014 8:59 am
by tmcdonald
Been a while since we heard from you. Any update?
Re: Post Upgrade Issues on 2014R1.3
Posted: Fri Aug 15, 2014 5:18 am
by chriscamm
Sorry, yes, this fixed the MRTG issue. I have been having other issues with check_wmi_plus results, and it looks like it's the same issue, with files holding the previous state being generated on the server.
Thanks
Chris
Re: Post Upgrade Issues on 2014R1.3
Posted: Fri Aug 15, 2014 11:40 am
by sreinhardt
Those wmi temp files should be in /tmp. You can just remove them and it will create new indexes if needed. If you have not done so, I would highly suggest starting with that. If you have, could you give more details as to what you are seeing and have seen in the past, as well as steps you may have performed recently to resolve?
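A cautious way to do that is to list the candidate files before deleting anything. The sketch below assumes the state files carry a `cwpss_` prefix; that prefix is an assumption to check against what is actually in /tmp, and the function name `list_wmi_state_files` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list candidate check_wmi_plus state files
# without deleting them.
# The "cwpss_" default prefix is an assumption; confirm it against
# the files actually present before removing anything.
list_wmi_state_files() {
    dir="${1:-/tmp}"
    prefix="${2:-cwpss_}"
    find "$dir" -maxdepth 1 -type f -name "${prefix}*" | sort
}

# Review the list first:
# list_wmi_state_files /tmp
# Once it looks right, the same match can be deleted with:
# find /tmp -maxdepth 1 -type f -name 'cwpss_*' -delete
```

With mod_gearman in use, the files would sit on whichever machine executed the check, so the same listing would need to be run on each worker.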
Re: Post Upgrade Issues on 2014R1.3
Posted: Sat Aug 16, 2014 4:09 am
by chriscamm
Hi,
My setup is this:
1 x NagiosXI Server running gearmand (server1)
2 x NagiosXI Servers running as mod_gearman_workers (server2 and server3)
When I run a service check, say CPU usage using the check_wmi_plus.pl command, it runs on, say, server1 and reports that this is the first run, so the results will be available the next time it runs. The next time it runs on server1, I get the results and all is well. The time after that it runs on server2, and I get the message that this is the first time it has run. It runs again on server3, and I get the same first-run message. Then it runs on server1 and I get stale data returned, scheduling an immediate check.
Can you confirm whether I am able to run check_wmi_plus.pl on gearmand, or whether I will always get these errors?
My current Nagios server is running over 4000 server checks, and I get ndo2db errors if I do not distribute the load across all three servers. However, this is now causing me a lot of headaches, as I keep getting alerts constantly.
Thanks
Chris
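The sequence described above is what one would expect if the plugin keeps its previous sample in a file on whichever machine ran the check. The following is a toy model of that behaviour, not check_wmi_plus itself: the function `run_check`, the directory layout, and the messages are all made up for illustration, and the stale-data case is left out.

```shell
#!/bin/sh
# Toy model: each worker keeps check state in its own local directory,
# so a check only has a previous sample on a worker that ran it before.
run_check() {
    state_dir="$1"   # per-worker state directory (hypothetical)
    check_id="$2"    # name of the check, used as the state file name
    if [ -f "$state_dir/$check_id" ]; then
        echo "OK - delta computed from previous sample"
    else
        echo "first run - collecting baseline"
    fi
    # Record the time of this sample for the next run on this worker.
    date +%s > "$state_dir/$check_id"
}

# Example: the same check bouncing between two workers.
# base=$(mktemp -d); mkdir "$base/server1" "$base/server2"
# run_check "$base/server1" cpu
# run_check "$base/server1" cpu
# run_check "$base/server2" cpu
```

In this model a check only returns a real result on a worker that has already run it, which is why pinning stateful checks to a single machine avoids the repeated first-run messages.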