Re: Post Upgrade Issues on 2014R1.3

Posted: Mon Aug 04, 2014 4:48 pm
by chriscamm
Sorry, I have been away.

OK, I had issues upgrading from 2012 to 2014, which have been worked on in this forum. I installed mod_gearman at the suggestion of someone at Nagios, and it was working well, or seemed to be, until this upgrade. That is not to say I did not have other issues that might have been masking these errors.

I am available for a remote session should you need to see the issues.

Thanks

Chris

Re: Post Upgrade Issues on 2014R1.3

Posted: Tue Aug 05, 2014 2:12 am
by chriscamm
Hi,

My Nagios server now keeps running out of disk space:

Code: Select all

/usr/local/nagios/var has the following files:

drwxrwxr-x. 7 nagios nagios     135168 Aug  5 07:56 .
drwxr-xr-x. 9 root   root         4096 Jun 13 10:13 ..
drwxrwxr-x. 2 nagios nagios      20480 Aug  5 00:01 archives
-rw-r--r--  1 nagios nagios  144482304 Aug  5 07:56 host-perfdata
-rw-r--r--  1 nagios nagios      12508 Jun 16 21:33 livestatus.log
drwxr-xr-x  2 nagios nagios       4096 Jul  7 15:26 log
-rw-r--r--  1 nagios nagios          6 Aug  4 23:33 nagios.lock
-rw-r--r--  1 nagios nagios    2347008 Aug  5 07:56 nagios.log
-rw-r--r--  1 nagios nagios     626688 Aug  5 07:56 ndo2db.debug
-rw-r--r--  1 nagios nagios    1003389 Aug  5 01:34 ndo2db.debug.old
-rw-r--r--  1 nagios nagios          5 Aug  4 22:15 ndo2db.lock
-rw-r--r--  1 nagios nagios          0 Aug  4 23:30 ndomod.tmp
srwxr-xr-x  1 nagios nagios          0 Aug  4 22:15 ndo.sock
-rw-r--r--  1 nagios nagios   10387456 Aug  5 07:55 npcd.log
-rw-r--r--  1 nagios nagios   10485781 Jul 30 22:42 npcd.log.old
-rw-r--r--. 1 nagios nagios    4318354 Aug  4 23:33 objects.cache
-rw-rw-rw-  1 nagios nagios    5121567 Jul 18 22:40 perfdata.log
-rw-------  1 nagios users           0 Aug  5 07:33 retention.dat
drwxrwsr-x. 2 nagios nagcmd       4096 Aug  4 23:33 rw
----------  1 nagios users       69632 Aug  5 01:37 sed04y1dh
----------  1 root   root   1005694353 Aug  4 23:48 sed0OQHX3
----------  1 nagios users  2248564739 Aug  4 23:37 sed0URYMb
----------  1 nagios users        4096 Aug  5 01:48 sed2esThS
----------  1 nagios users      933888 Aug  5 01:34 sed3Hj3th
----------  1 nagios users  1699409947 Aug  5 00:08 sed50Z2Ez
----------  1 nagios users        4096 Aug  5 00:54 sed7z4EVR
----------  1 nagios users        4096 Aug  5 01:23 sed97TL3a
----------  1 nagios users  3285609404 Aug  4 23:45 sed9j1UzA
----------  1 nagios users  1765825811 Aug  5 00:16 sedAm33Dd
----------  1 nagios users  1258598043 Aug  4 23:59 sedb6m10N
----------  1 nagios users        4096 Aug  5 00:46 sedcqHwdL
----------  1 nagios users        4096 Aug  5 00:29 sedd2bazE
----------  1 nagios users  2720014924 Aug  4 23:40 sedd9gCnd
----------  1 nagios users      118784 Aug  5 00:29 sede7WkxK
----------  1 nagios users  1431769009 Aug  5 00:02 sedEgf1Fx
----------  1 nagios users   167591936 Aug  5 00:17 sedEKxV6Z
----------  1 nagios users  1353080832 Aug  5 00:10 sedEOEMuF
----------  1 nagios users  1734414111 Aug  5 00:09 sedf4GSMD
----------  1 nagios users     4993024 Aug  5 00:18 sedFTRuFZ
----------  1 nagios users  2486180834 Aug  4 23:53 sedG9SYtD
----------  1 nagios users        4096 Aug  5 00:23 sedgdxshc
----------  1 nagios users  3371499424 Aug  4 23:48 sedGgWR3D
----------  1 nagios users  1195097957 Aug  4 23:58 sedHGispk
----------  1 nagios users        4096 Aug  5 01:32 sedHYsGWQ
----------  1 nagios users  1851322335 Aug  5 00:15 sedI2yCfI
----------  1 nagios users      131072 Aug  5 01:29 sediHERPW
----------  1 nagios users  3103634677 Aug  4 23:44 sedjkliQG
----------  1 nagios users  1761226372 Aug  5 00:12 sedle9x7g
----------  1 nagios users        4096 Aug  5 01:07 sedMY4QAc
----------  1 nagios users  2928884739 Aug  4 23:41 sedNCvpp5
----------  1 nagios users  3202558427 Aug  4 23:43 sednf4fuN
----------  1 nagios users  2880693270 Aug  4 23:55 sedns6b5E
----------  1 nagios users       16384 Aug  5 00:36 sednyqbEs
----------  1 nagios users     3899392 Aug  5 01:07 sedodD9qN
----------  1 nagios users      143360 Aug  5 01:33 sedQfqlGp
----------  1 nagios users  2026523631 Aug  4 23:52 sedQIhCyt
----------  1 nagios users        4096 Aug  5 01:33 sedrB4gTA
----------  1 nagios users   888189674 Aug  4 23:56 sedrY4tT9
----------  1 nagios users  1792732198 Aug  5 00:13 seds7YoO1
----------  1 nagios users        4096 Aug  5 00:22 sedsjxFXG
----------  1 nagios users  2145236747 Aug  4 23:35 sedsMjzDt
----------  1 nagios users           0 Aug  5 00:39 sedsMO4YV
----------  1 nagios users        4096 Aug  5 00:22 sedSmuNv4
----------  1 nagios users     9875456 Aug  5 00:18 sedt5e29m
----------  1 nagios users  1667689862 Aug  5 00:06 seduHurSD
----------  1 nagios users  2285268992 Aug  4 23:50 seduZNllf
----------  1 nagios users  2948509058 Aug  4 23:34 sedWbuPzK
----------  1 nagios users  1776100559 Aug  4 23:39 sedwRuLMS
----------  1 nagios users  3369140220 Aug  4 23:47 sedwsJ12f
----------  1 nagios users           0 Aug  5 06:59 sedwsJW9X
----------  1 nagios users        4096 Aug  5 00:34 sedWZ8QUH
----------  1 nagios users           0 Aug  5 00:29 sedwZKCKn
----------  1 nagios users           0 Aug  5 02:03 sedY9gWaT
----------  1 nagios users        4096 Aug  5 00:32 sedyFHI71
----------  1 nagios users  1623123406 Aug  5 00:03 sedYroHYK
----------  1 nagios users  1325838268 Aug  5 00:00 sedytWlkJ
----------  1 nagios users           0 Aug  5 01:06 sedYuvH7D
----------  1 nagios users  1660052903 Aug  5 00:05 sedzSYAZH
-rw-r--r--  1 nagios nagios 3415560192 Aug  5 07:56 service-perfdata
drwxr-xr-x. 5 nagios nagios       4096 Jun  6  2012 spool
drwxr-xr-x. 2 nagios nagios       4096 Jul 18 22:40 stats
-rw-rw-r--  1 nagios users           0 Aug  5 07:56 status.dat
My process-service-perfdata-file-bulk command is

Code: Select all

sed -i 's/\\n//g' /usr/local/nagios/var/service-perfdata /bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.service
My process-host-perfdata-file-bulk command is

Code: Select all

sed -i 's/\\n//g' /usr/local/nagios/var/host-perfdata /bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.host
Any ideas why it is not importing them correctly?

Thanks

Chris
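
A note on the listing above: GNU `sed -i` writes its output to a temporary file named `sed` plus six random characters in the same directory as the target, and renames it over the target only when it finishes. Files like `sed0URYMb` are therefore most likely left over from runs that were interrupted before completing, though that is an assumption based on the names and not something confirmed in this thread. A minimal sketch for totalling the space they hold (the function name `sed_tmp_usage` is hypothetical, and the default directory is simply the one from the listing):

```shell
#!/bin/sh
# Report how many leftover "sedXXXXXX" temp files a directory holds and their total size.
# Hypothetical helper: the name and the default directory are assumptions for illustration.
sed_tmp_usage() {
    dir="${1:-/usr/local/nagios/var}"
    # Match "sed" followed by exactly six characters, top level only.
    # ls -ln is used instead of reading the files, since their mode may be 000.
    find "$dir" -maxdepth 1 -type f -name 'sed??????' -exec ls -ln {} + 2>/dev/null |
        awk '{ sum += $5; n++ } END { printf "%d files, %.0f bytes\n", n, sum }'
}
```

Calling `sed_tmp_usage` with no argument checks `/usr/local/nagios/var`; pass another directory as the first argument to check elsewhere. It only reports and does not delete anything.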

Re: Post Upgrade Issues on 2014R1.3

Posted: Tue Aug 05, 2014 2:59 pm
by lmiltchev
Can you show us the output of the following command in code wraps?

Code: Select all

tail -50 /var/log/messages
Are you currently using livestatus? Did you install it after the upgrade? What's the livestatus version?

Re: Post Upgrade Issues on 2014R1.3

Posted: Tue Aug 05, 2014 4:16 pm
by chriscamm
Hi, no livestatus running.

/var/log/messages:

Code: Select all

[root@qualngs ~]# tail -50 /var/log/messages
Aug  5 21:47:31 qualngs abrt-server[55041]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 21:47:31 qualngs abrt[55042]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 21:47:31 qualngs abrtd[2081]: New client connected
Aug  5 21:47:31 qualngs abrt-server[55043]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 21:47:31 qualngs abrt[55044]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 21:47:31 qualngs abrtd[2081]: New client connected
Aug  5 21:47:31 qualngs abrt-server[55045]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 21:47:32 qualngs abrt[55046]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 21:47:32 qualngs abrtd[2081]: New client connected
Aug  5 21:47:32 qualngs abrt-server[55047]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 21:47:32 qualngs abrt[55048]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 21:47:32 qualngs abrtd[2081]: New client connected
Aug  5 21:47:32 qualngs abrt-server[55049]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 21:47:32 qualngs abrt[55051]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 21:47:32 qualngs abrtd[2081]: New client connected
Aug  5 21:47:32 qualngs abrt-server[55052]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 21:47:33 qualngs abrt[55053]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 21:47:33 qualngs abrtd[2081]: New client connected
Aug  5 21:47:33 qualngs abrt-server[55054]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 21:47:33 qualngs abrt[55055]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 21:47:33 qualngs abrtd[2081]: New client connected
Aug  5 21:47:33 qualngs abrt-server[55056]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 21:47:34 qualngs abrt[55057]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 21:47:34 qualngs abrtd[2081]: New client connected
Aug  5 21:47:34 qualngs abrt-server[55058]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 21:50:12 qualngs xinetd[1943]: START: nrpe pid=61443 from=::ffff:172.20.10.134
Aug  5 21:50:12 qualngs nrpe[61443]: Error: Could not complete SSL handshake. 5
Aug  5 21:50:12 qualngs xinetd[1943]: EXIT: nrpe status=0 pid=61443 duration=0(sec)
Aug  5 21:55:34 qualngs xinetd[1943]: START: nrpe pid=9000 from=::ffff:172.20.10.126
Aug  5 21:55:34 qualngs nrpe[9000]: Error: Could not complete SSL handshake. 5
Aug  5 21:55:34 qualngs xinetd[1943]: EXIT: nrpe status=0 pid=9000 duration=0(sec)
Aug  5 22:01:00 qualngs xinetd[1943]: START: nrpe pid=21631 from=::ffff:172.20.10.126
Aug  5 22:01:00 qualngs nrpe[21631]: Error: Could not complete SSL handshake. 5
Aug  5 22:01:00 qualngs xinetd[1943]: EXIT: nrpe status=0 pid=21631 duration=0(sec)
Aug  5 22:06:51 qualngs xinetd[1943]: START: nrpe pid=34617 from=::ffff:172.20.10.126
Aug  5 22:06:51 qualngs nrpe[34617]: Error: Could not complete SSL handshake. 5
Aug  5 22:06:51 qualngs xinetd[1943]: EXIT: nrpe status=0 pid=34617 duration=0(sec)
Aug  5 22:07:59 qualngs abrt[36508]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 22:07:59 qualngs abrtd[2081]: New client connected
Aug  5 22:07:59 qualngs abrt-server[36509]: Saved Python crash dump of pid 36508 to /var/spool/abrt/pyhook-2014-08-05-22:07:59-36508
Aug  5 22:07:59 qualngs abrtd[2081]: Directory 'pyhook-2014-08-05-22:07:59-36508' creation detected
Aug  5 22:07:59 qualngs abrtd[2081]: Executable '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Aug  5 22:07:59 qualngs abrtd[2081]: 'post-create' on '/var/spool/abrt/pyhook-2014-08-05-22:07:59-36508' exited with 1
Aug  5 22:07:59 qualngs abrtd[2081]: Deleting problem directory '/var/spool/abrt/pyhook-2014-08-05-22:07:59-36508'
Aug  5 22:07:59 qualngs abrt[36512]: detected unhandled Python exception in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 22:07:59 qualngs abrtd[2081]: New client connected
Aug  5 22:07:59 qualngs abrt-server[36513]: Not saving repeating crash in '/usr/local/nagiosxi/html/includes/components/capacityplanning/backend/capacityplanning.py'
Aug  5 22:12:08 qualngs xinetd[1943]: START: nrpe pid=49037 from=::ffff:172.20.10.126
Aug  5 22:12:08 qualngs nrpe[49037]: Error: Could not complete SSL handshake. 5
Aug  5 22:12:08 qualngs xinetd[1943]: EXIT: nrpe status=0 pid=49037 duration=0(sec)

Re: Post Upgrade Issues on 2014R1.3

Posted: Tue Aug 05, 2014 4:40 pm
by scottwilkerson
OK, here is the issue: you installed mod_gearman, and the check_rrdtraf plugin used for bandwidth checks needs to read a file from the local XI server.


These are items you would want to exclude from distribution to mod_gearman. You can do this by adding the MRTG bandwidth check services to a servicegroup, then modifying /usr/local/etc/mod_gearman/mod_gearman_neb.conf on your XI server to contain the following, replacing NEW_SERVICEGROUP_NAME with the name of the servicegroup:

Code: Select all

# sets a list of servicegroups which will not be executed
# by gearman. They are just passed through.
# Default is none
localservicegroups=NEW_SERVICEGROUP_NAME
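
After editing the file, it can help to confirm that the setting is present and not commented out. The following is only a sketch: the function name `local_servicegroups` is hypothetical, and the default path is the one given above. The NEB module most likely reads this file only when Nagios loads it, so a Nagios restart is probably needed for the change to take effect.

```shell
#!/bin/sh
# Print the servicegroups that mod_gearman is configured to leave on the local server.
# Hypothetical helper: the name is an assumption; the default path comes from the post above.
local_servicegroups() {
    conf="${1:-/usr/local/etc/mod_gearman/mod_gearman_neb.conf}"
    # Lines starting with "#" do not match, so commented-out settings are skipped.
    # Everything after the "=" is printed as the configured value.
    sed -n 's/^[[:space:]]*localservicegroups[[:space:]]*=[[:space:]]*//p' "$conf"
}
```

If the function prints nothing, the option is either missing or still commented out.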

Re: Post Upgrade Issues on 2014R1.3

Posted: Wed Aug 06, 2014 5:09 am
by chriscamm
Thanks, I will check and update.

Chris

Re: Post Upgrade Issues on 2014R1.3

Posted: Wed Aug 13, 2014 8:59 am
by tmcdonald
Been a while since we heard from you. Any update?

Re: Post Upgrade Issues on 2014R1.3

Posted: Fri Aug 15, 2014 5:18 am
by chriscamm
Sorry, yes, this fixed the MRTG issue. I have been having other issues with check_wmi_plus results, and it looks like it's the same issue, with files containing the previous state being generated on the server.

Thanks

Chris

Re: Post Upgrade Issues on 2014R1.3

Posted: Fri Aug 15, 2014 11:40 am
by sreinhardt
Those wmi temp files should be in /tmp. You can just remove them and it will create new indexes if needed. If you have not done so, I would highly suggest starting with that. If you have, could you give more details as to what you are seeing and have seen in the past, as well as steps you may have performed recently to resolve?

Re: Post Upgrade Issues on 2014R1.3

Posted: Sat Aug 16, 2014 4:09 am
by chriscamm
Hi,

My setup is this:

1 x NagiosXI Server running gearmand (server1)
2 x NagiosXI Servers running as mod_gearman_workers (server2 and server3)

When I run a service check, say to check CPU usage with the check_wmi_plus.pl command, it goes away and brings back the results on, say, server1. The output says this is the first run, so I will get the results the next time it runs. The next time it runs on server1 I get the results and all is well. The next time, it runs on server2 and I get the message that this is the first time it has run. It runs again on server3 and I get the same first-run message. Then it runs on server1 again and I get stale data returned, with an immediate check being scheduled.

Can you confirm if I am able to run check_wmi_plus.pl on gearmand or if I will always get these errors?

My current Nagios server is running over 4000 server checks, and I get ndo2db errors if I do not distribute the load across all three servers. However, this is now causing me a lot of headaches, as I keep getting alerts constantly.

Thanks

Chris