Problems after 2014 Upgrade
Hi Support,
Since the upgrade we have been having major problems.
Nagios is no longer processing external commands, i.e. we are unable to process SNMP traps, etc.
We are also seeing these messages:
Jul 24 21:21:33 karma nagios: wproc: Core Worker 10396: job 41 (pid=13272) timed out. Killing it
Jul 24 21:21:33 karma nagios: wproc: Core Worker 10396: kill(-13272, SIGKILL) failed: Operation not permitted
Jul 24 21:21:34 karma nagios: wproc: Core Worker 10394: job 41 (pid=13286) timed out. Killing it
Jul 24 21:21:34 karma nagios: wproc: Core Worker 10394: kill(-13286, SIGKILL) failed: Operation not permitted
Whenever a trap comes in, we see this in the error log:
karma nagios: Error: External command failed -> PROCESS_SERVICE_CHECK_RESULT;wmin-aruba;SNMP Traps;0; 07 DE 07 18 14 19 0C 00 2B 00 00 10.100.119.25 2C 54 CF E5 BB 13 00 24 6C 57 E2 F0 WMIN-RAP2-ISLS 0 0 4 / wlsxTrapTime (OCTETSTR):07 DE 07 18 14 19 0C 00 2B 00 00 wlsxTrapUserIpAddress.0 (IPADDR):10.100.119.25 wlsxTrapUserPhyAddress.0 (OCTETSTR):2C 54 CF E5 BB 13 wlsxTrapAPBSSID.0 (OCTETSTR):00 24 6C 57 E2 F0 wlsxTrapAPName.0 (OCTETSTR):WMIN-RAP2-ISLS wlsxTrapCardSlot.0 (INTEGER32):0 wlsxTrapPortNumber.0 (INTEGER32):0 wlsxTrapUserAttributeChangeType.0 (INTEGER):4
Can we have some urgent feedback? Otherwise we might need to perform a rollback.
Many thanks,
Arnab
Re: Problems after 2014 Upgrade
Were you able to actually complete the upgrade successfully? What is the output of the following commands?
Code:
/usr/local/nagios/bin/nagios | head -2
/usr/local/nagios/bin/ndo2db | head -2
Do you have any config errors? Are you using Mod-Gearman or MK Livestatus?
Be sure to check out our Knowledgebase for helpful articles and solutions!
- snapon_admin
Re: Problems after 2014 Upgrade
I would also check:
Code:
service snmptrapd status
service snmptt status
Your post made me realize we haven't received any traps in a while, so I checked these two things on our server and snmptt wasn't running. I started it and traps started streaming in. I'm not sure why it was stopped, but judging by the last time we received a trap, it stopped right around the time we did the 2014 upgrade. I don't know whether it's related, or even the same issue you're having, but I figured I'd throw in my 2 cents.
- sreinhardt
Re: Problems after 2014 Upgrade
XI upgrades should not affect running services other than mysql, postgres, httpd, iptables, selinux, nagios, npcd, and probably mrtg. The upgrade really should not touch snmptt, snmptrapd, or snmpd, especially since we consider those a separate integration. However, I do find it interesting that you both report this starting around update time.
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
Re: Problems after 2014 Upgrade
Hi, the output is:
Nagios Core 4.0.7
Ndo2db 2.0.0
It did throw up some errors for the Nagios config; I went in and cleaned up the core config, then ran the upgrade. We are not using Mod-Gearman or MK Livestatus, but we do use NRDP and NSCA. The problem we are observing is, I think, with writing to nagios.cmd, because that is where the snmptraphandling.py script writes.
Can you suggest any further troubleshooting steps?
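One way to test the suspicion about nagios.cmd without waiting for a trap is to write a passive result to the command pipe by hand. The following is only a sketch: format_passive_result is a hypothetical helper (not part of Nagios), and the host and service names are taken from the error log earlier in this thread. It builds a line in the external command format, "[epoch] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output".

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Nagios): formats one passive service
# check result in the external command file syntax.
format_passive_result() {
    # $1=host  $2=service  $3=return code (0-3)  $4=plugin output
    # $5=epoch timestamp (optional, defaults to the current time)
    ts="${5:-$(date +%s)}"
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$ts" "$1" "$2" "$3" "$4"
}

# Example, to be run while Nagios is running and as a user that is allowed
# to write to the pipe (path as listed elsewhere in this thread):
#   format_passive_result wmin-aruba "SNMP Traps" 0 "manual test" \
#       > /usr/local/nagios/var/rw/nagios.cmd
```

If the hand-written result is accepted but trap results are not, the trap output itself would be the next thing to look at; if the hand-written one fails too, the pipe or its permissions are more likely.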
Re: Problems after 2014 Upgrade
What are the permissions on the command pipe?
Code:
ls -la /usr/local/nagios/var/rw/
ls -lad /usr/local/nagios/var/rw/
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Problems after 2014 Upgrade
Here you go
ls -la /usr/local/nagios/var/rw/
total 16
drwxrwsr-x 2 nagios nagcmd 4096 Jul 24 21:16 .
drwxrwxr-x 6 nagios nagios 4096 Jul 24 22:57 ..
prw-rw---- 1 nagios nagcmd 0 Jul 24 22:57 nagios.cmd
srw-rw---- 1 nagios nagcmd 0 Jul 24 21:16 nagios.qh
-rw-rw-r-- 1 nagios nagcmd 7137 Mar 14 16:04 nsca.dump
ls -lad /usr/local/nagios/var/rw/
drwxrwsr-x 2 nagios nagcmd 4096 Jul 24 21:16 /usr/local/nagios/var/rw/
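For anyone who wants to script the same check, here is a rough sketch. check_cmd_pipe is a hypothetical helper, and it only looks at two things visible in the listing above: that the path is a FIFO (the leading "p") and that the group has read and write permission. It does not check ownership.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the path is a FIFO whose group
# permission bits include read and write.
check_cmd_pipe() {
    pipe="$1"
    if [ ! -p "$pipe" ]; then
        echo "not a FIFO: $pipe"
        return 1
    fi
    # ls -ld prints the mode string first, e.g. prw-rw----
    mode=$(ls -ld "$pipe" | cut -c1-10)
    case "$mode" in
        # p, three user bits, then "rw" for the group
        p???rw*) echo "ok: $mode"; return 0 ;;
        *) echo "group cannot read/write: $mode"; return 1 ;;
    esac
}

# Example:
#   check_cmd_pipe /usr/local/nagios/var/rw/nagios.cmd
```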
Re: Problems after 2014 Upgrade
Well, dang. Those look fine. How about groups?
Code:
grep nag /etc/group
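The grep above shows the group entries; a related check is whether a particular account is actually a member of the group that owns the pipe. A minimal sketch, assuming the account of interest is whichever user runs the trap handler (that user is not named in this thread) and using a hypothetical helper in_group:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if user $1 is a member of group $2.
in_group() {
    # id -Gn prints all group names for the user, separated by spaces
    id -Gn "$1" 2>/dev/null | tr ' ' '\n' | grep -Fqx -- "$2"
}

# Example (the user name here is a placeholder for the trap handler's user):
#   in_group someuser nagcmd && echo "member" || echo "not a member"
```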
Re: Problems after 2014 Upgrade
I think I found the answer myself in another thread: http://support.nagios.com/forum/viewtop ... 16&t=27376. I will do some more testing and confirm whether I am seeing the same issue.
- slansing
Re: Problems after 2014 Upgrade
Great, let us know. It sounds similar.