
Problems after 2014 Upgrade

Posted: Thu Jul 24, 2014 3:26 pm
by arnab.roy
Hi Support,

Since the upgrade we have been having major problems.

Nagios is no longer processing external commands, e.g. we are unable to process SNMP traps.

We are also seeing these messages:

Jul 24 21:21:33 karma nagios: wproc: Core Worker 10396: job 41 (pid=13272) timed out. Killing it
Jul 24 21:21:33 karma nagios: wproc: Core Worker 10396: kill(-13272, SIGKILL) failed: Operation not permitted
Jul 24 21:21:34 karma nagios: wproc: Core Worker 10394: job 41 (pid=13286) timed out. Killing it
Jul 24 21:21:34 karma nagios: wproc: Core Worker 10394: kill(-13286, SIGKILL) failed: Operation not permitted


Whenever a trap comes in, we see this in the error log:

karma nagios: Error: External command failed -> PROCESS_SERVICE_CHECK_RESULT;wmin-aruba;SNMP Traps;0; 07 DE 07 18 14 19 0C 00 2B 00 00 10.100.119.25 2C 54 CF E5 BB 13 00 24 6C 57 E2 F0 WMIN-RAP2-ISLS 0 0 4 / wlsxTrapTime (OCTETSTR):07 DE 07 18 14 19 0C 00 2B 00 00 wlsxTrapUserIpAddress.0 (IPADDR):10.100.119.25 wlsxTrapUserPhyAddress.0 (OCTETSTR):2C 54 CF E5 BB 13 wlsxTrapAPBSSID.0 (OCTETSTR):00 24 6C 57 E2 F0 wlsxTrapAPName.0 (OCTETSTR):WMIN-RAP2-ISLS wlsxTrapCardSlot.0 (INTEGER32):0 wlsxTrapPortNumber.0 (INTEGER32):0 wlsxTrapUserAttributeChangeType.0 (INTEGER):4

Can we have some urgent feedback? Otherwise we might need to perform a rollback.

Many Thanks
Arnab

Re: Problems after 2014 Upgrade

Posted: Thu Jul 24, 2014 4:22 pm
by lmiltchev
Were you able to actually complete the upgrade successfully? What is the output of the following commands?

Code:

/usr/local/nagios/bin/nagios | head -2
/usr/local/nagios/bin/ndo2db | head -2
Do you have any config errors? Are you using Mod-Gearman or MK Livestatus?

Re: Problems after 2014 Upgrade

Posted: Thu Jul 24, 2014 4:32 pm
by snapon_admin
I would also check:

Code:

service snmptrapd status
service snmptt status
Your post made me realize we haven't received any traps in a while, so I checked these two things on our server and snmptt wasn't running. I started it and traps started streaming in. Not sure why it was stopped, but judging by the last time we received a trap, it stopped right around the time we did the 2014 upgrade. Not sure if it's related or not, or even the same issue you're having, but I figured I'd throw in my 2 cents.

Re: Problems after 2014 Upgrade

Posted: Thu Jul 24, 2014 4:38 pm
by sreinhardt
XI upgrades should not affect running services other than mysql, postgres, httpd, iptables, selinux, nagios, npcd, and probably mrtg. The upgrade really should not touch snmptt, snmptrapd, or snmpd, especially since we consider that a separate integration. However, I do find it interesting that you both report this happening around upgrade time.

Re: Problems after 2014 Upgrade

Posted: Thu Jul 24, 2014 4:49 pm
by arnab.roy
Hi, the output is:

Nagios Core 4.0.7

Ndo2db 2.0.0

The upgrade did throw up some errors for the Nagios config. I went in and cleaned up the core config, then ran the upgrade. We are not using Mod-Gearman or MK Livestatus, but we do use NRDP and NSCA. I think the problem we are observing is with writing to nagios.cmd, because that is where the snmptraphandling.py script writes.

Can you suggest any further troubleshooting steps?
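
For anyone following along, here is a minimal sketch of what writing a passive check result to the command pipe looks like. It is not the trap handler shipped with XI; the function names `format_check_result` and `submit` are made up for illustration, and the pipe path is the one shown elsewhere in this thread. It only assumes the documented external command format, `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output`.

```python
import errno
import os
import time


def format_check_result(host, service, return_code, output, now=None):
    """Build one external-command line for a passive service check result."""
    timestamp = int(time.time() if now is None else now)
    # Nagios reads one command per line, so fold any newlines in the output.
    clean = " ".join(str(output).splitlines())
    return "[{}] PROCESS_SERVICE_CHECK_RESULT;{};{};{};{}\n".format(
        timestamp, host, service, return_code, clean
    )


def submit(line, pipe_path):
    """Write one command line to the pipe. Return True if it was written."""
    try:
        # O_NONBLOCK makes open() fail with ENXIO instead of hanging
        # when no process (i.e. Nagios) has the FIFO open for reading.
        fd = os.open(pipe_path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno in (errno.ENXIO, errno.ENOENT, errno.EACCES):
            return False
        raise
    try:
        os.write(fd, line.encode("utf-8"))
    finally:
        os.close(fd)
    return True
```

Calling `submit(format_check_result("wmin-aruba", "SNMP Traps", 0, "test trap"), "/usr/local/nagios/var/rw/nagios.cmd")` as the same user the trap handler runs as is one way to separate a permissions problem (the call returns False) from a problem inside Nagios itself (the call returns True but the result never shows up).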

Re: Problems after 2014 Upgrade

Posted: Thu Jul 24, 2014 4:56 pm
by abrist
What are the permissions on the command pipe?

Code:

ls -la /usr/local/nagios/var/rw/
ls -lad /usr/local/nagios/var/rw/

Re: Problems after 2014 Upgrade

Posted: Thu Jul 24, 2014 5:00 pm
by arnab.roy
Here you go

ls -la /usr/local/nagios/var/rw/
total 16
drwxrwsr-x 2 nagios nagcmd 4096 Jul 24 21:16 .
drwxrwxr-x 6 nagios nagios 4096 Jul 24 22:57 ..
prw-rw---- 1 nagios nagcmd 0 Jul 24 22:57 nagios.cmd
srw-rw---- 1 nagios nagcmd 0 Jul 24 21:16 nagios.qh
-rw-rw-r-- 1 nagios nagcmd 7137 Mar 14 16:04 nsca.dump


ls -lad /usr/local/nagios/var/rw/

drwxrwsr-x 2 nagios nagcmd 4096 Jul 24 21:16 /usr/local/nagios/var/rw/
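
The listing above shows nagios.cmd as a named pipe (the leading `p`) with mode `rw-rw----`. If you want to script the same check rather than eyeball it, something along these lines should work. This is only a sketch: `describe_command_pipe` is a hypothetical helper name, not part of Nagios.

```python
import os
import stat


def describe_command_pipe(path):
    """Return (is_fifo, group_writable, mode_string) for the given path."""
    st = os.stat(path)
    # The command file must be a FIFO, not a regular file left behind
    # by something writing to the path while Nagios was stopped.
    is_fifo = stat.S_ISFIFO(st.st_mode)
    # Writers reach the pipe through the group, so the group write bit matters.
    group_writable = bool(st.st_mode & stat.S_IWGRP)
    return is_fifo, group_writable, stat.filemode(st.st_mode)
```

For the pipe in the listing, `describe_command_pipe("/usr/local/nagios/var/rw/nagios.cmd")` would be expected to report a FIFO that is group-writable with mode string `prw-rw----`.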

Re: Problems after 2014 Upgrade

Posted: Thu Jul 24, 2014 5:08 pm
by abrist
Well, dang. Those look fine. How about groups?

Code:

grep nag /etc/group
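
If the grep output is long, a small parser makes the membership easier to read. This is a rough sketch that assumes the standard four-field `/etc/group` layout (`name:password:gid:members`); `groups_matching` is a hypothetical helper, and which users actually need to be in nagcmd depends on your install.

```python
def groups_matching(group_file_text, needle="nag"):
    """Parse /etc/group-style text and return {group_name: [members]}
    for every line containing `needle` (mirrors `grep nag /etc/group`)."""
    result = {}
    for line in group_file_text.splitlines():
        line = line.strip()
        if needle not in line:
            continue
        parts = line.split(":")
        if len(parts) != 4:
            # Skip malformed lines rather than guessing at their meaning.
            continue
        name, _password, _gid, members = parts
        result[name] = [m for m in members.split(",") if m]
    return result
```

Feed it the file contents, e.g. `groups_matching(open("/etc/group").read())`, and check that the user running the trap handler appears in the member list of the group that owns the pipe.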

Re: Problems after 2014 Upgrade

Posted: Thu Jul 24, 2014 5:21 pm
by arnab.roy
I think I found the answer myself in another thread (http://support.nagios.com/forum/viewtop ... 16&t=27376). I will do some more testing and confirm whether I am seeing the same issue.

Re: Problems after 2014 Upgrade

Posted: Fri Jul 25, 2014 9:21 am
by slansing
Great, let us know. It sounds similar.