External command processing

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.

External command processing

Post by cellact

Hi,
I've encountered an issue with the way Nagios processes external commands coming from snmptt.
Below is an example of SNMP traps sent from a Cisco router.
The traps come in this order: badPattern -> Failed -> Available
But Nagios processes them in this order: badPattern -> Available -> Failed

How can I fix this?

Thanks.

Code:

Event Handler 2013-11-12 08:13:51 GLOBAL SERVICE EVENT HANDLER: Cisco Router;SNMP Traps;CRITICAL;HARD;1;xi_service_event_handler
Service Critical 2013-11-12 08:13:51 SERVICE ALERT: Cisco Router;SNMP Traps;CRITICAL;HARD;1;Failed
Passive Check 2013-11-12 08:13:51 PASSIVE SERVICE CHECK: Cisco Router;SNMP Traps;2;Failed
Event Handler 2013-11-12 08:13:51 GLOBAL SERVICE EVENT HANDLER: Cisco Router;SNMP Traps;OK;HARD;1;xi_service_event_handler
Service Recovery 2013-11-12 08:13:51 SERVICE ALERT: Cisco Router;SNMP Traps;OK;HARD;1;Available
Passive Check 2013-11-12 08:13:51 PASSIVE SERVICE CHECK: Cisco Router;SNMP Traps;0;Available
Event Handler 2013-11-12 08:13:51 GLOBAL SERVICE EVENT HANDLER: Cisco Router;SNMP Traps;CRITICAL;HARD;1;xi_service_event_handler
Service Critical 2013-11-12 08:13:51 SERVICE ALERT: Cisco Router;SNMP Traps;CRITICAL;HARD;1;badPattern
Passive Check 2013-11-12 08:13:51 PASSIVE SERVICE CHECK: Cisco Router;SNMP Traps;2;badPattern
Passive Check 2013-11-12 08:13:46 EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;192.168.103.11;SNMP Traps;0;Available
Passive Check 2013-11-12 08:13:41 EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;192.168.103.11;SNMP Traps;2;Failed
Passive Check 2013-11-12 08:13:41 EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;192.168.103.11;SNMP Traps;2;badPattern

Re: External command processing

Post by sreinhardt

I highly doubt this is an issue with how Nagios is processing the traps; I suspect the snmptrapd/snmptt daemons instead, although I could certainly be wrong. My reasoning is this: snmptrapd puts new traps into files in a spool directory, snmptt then reaps these files and acts on them (which may or may not include sending a result to Nagios), and finally Nagios displays the notifications in the order received. That said, I do not know whether there is any control over how, or in what order, snmptt reaps those files. If it is not taking them in date/time order and instead just lists the directory and takes them as they come, I would completely understand this happening. However, I know that anything sent to the nagios.cmd file should be processed in the order received, which is why I don't think this is truly a Nagios issue. It would be great to know whether we could alter how the files are reaped to ensure this doesn't happen!
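To make the "order of reaping" concern concrete, here is a minimal Python sketch of how a spool reader could pick files oldest-first instead of relying on directory-listing order. It is not snmptt's actual code: the function name `spool_files_in_arrival_order` is hypothetical, and it assumes each trap lands in its own file whose modification time reflects when it was written. Traps written within the same clock tick can share an mtime, so the file name is used as a tie-breaker, which only helps if the writer names its files in a sortable, arrival-ordered way (also an assumption).

```python
from pathlib import Path


def spool_files_in_arrival_order(spool_dir):
    """Return regular files in spool_dir, oldest modification time first.

    Hypothetical helper, not snmptt's code. Files written within the same
    clock tick can share an mtime, so the file name breaks ties; that only
    reflects arrival order if the writer names files in a sortable way.
    """
    entries = [p for p in Path(spool_dir).iterdir() if p.is_file()]
    # iterdir()/os.listdir() return entries in filesystem-dependent order,
    # so sort explicitly instead of trusting the listing.
    return sorted(entries, key=lambda p: (p.stat().st_mtime_ns, p.name))
```

Sorting on `st_mtime_ns` avoids float rounding of `st_mtime`, but the effective timestamp resolution still depends on the filesystem, so ties remain possible.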
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.

Re: External command processing

Post by cellact

Hi sreinhardt,
I've read your comment but I have to disagree.
The log clearly shows that the external commands (snmptt -> snmptraphandling.py -> nagios.cmd) arrive in the correct order:

Passive Check 2013-11-12 08:13:46 EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;192.168.103.11;SNMP Traps;0;Available
Passive Check 2013-11-12 08:13:41 EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;192.168.103.11;SNMP Traps;2;Failed
Passive Check 2013-11-12 08:13:41 EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;192.168.103.11;SNMP Traps;2;badPattern

Only after that does Nagios process the check results in the wrong order.
My guess is that because two of the external commands were written in the exact same second (08:13:41), they got mixed up and the later of the two (Failed) was processed last.
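The same-second hypothesis can be illustrated with a generic priority queue. This Python sketch is not Nagios code and makes no claim about how Nagios queues check results internally; it only shows why ordering on a one-second timestamp alone is fragile. With a `heapq` entry of `(timestamp, output)`, two results from the same second are ordered by comparing their output strings rather than by submission order. Adding a submission counter as a tie-breaker, as the Python `heapq` documentation suggests for priority queues, restores submission order. The function name `drain` is hypothetical, and timestamps are simplified to seconds past 08:13.

```python
import heapq
import itertools


def drain(results, stable):
    """Push (timestamp, output) pairs into a min-heap, then pop them all.

    stable=False: entries are (timestamp, output), so equal timestamps are
    ordered by string comparison of the output ("F" sorts before "b" in ASCII).
    stable=True: entries are (timestamp, counter, output), so equal
    timestamps come out in the order they were submitted.
    """
    counter = itertools.count()
    heap = []
    for ts, output in results:
        entry = (ts, next(counter), output) if stable else (ts, output)
        heapq.heappush(heap, entry)
    return [heapq.heappop(heap)[-1] for _ in range(len(heap))]


submitted = [(41, "badPattern"), (41, "Failed"), (46, "Available")]
print(drain(submitted, stable=False))  # ['Failed', 'badPattern', 'Available']
print(drain(submitted, stable=True))   # ['badPattern', 'Failed', 'Available']
```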

How can I debug this?
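One way to debug this is to compare, for a single service, the order in which results were submitted (`EXTERNAL COMMAND` lines) with the order in which they were processed (`PASSIVE SERVICE CHECK` lines). The Python sketch below assumes the raw nagios.log contains both kinds of lines (i.e. `log_external_commands` and `log_passive_checks` are enabled in nagios.cfg) and that lines are fed oldest first; the XI event log view quoted in this thread lists newest first, so reverse it before using it. The function name `compare_order` is hypothetical. The host field is ignored because the command carries the IP address while the processed line shows the host name.

```python
import re

# The two nagios.log message types quoted in this thread. The host field is
# skipped: the command carries the IP, the processed line the host name.
SUBMITTED = re.compile(
    r"EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;[^;]*;([^;]*);(\d+);(.*)$")
PROCESSED = re.compile(r"PASSIVE SERVICE CHECK: [^;]*;([^;]*);(\d+);(.*)$")


def compare_order(log_lines, service):
    """Collect (return_code, output) pairs for one service.

    Returns (submitted, processed), each in the order the lines appear,
    so log_lines must be oldest first.
    """
    submitted, processed = [], []
    for line in log_lines:
        line = line.rstrip("\n")
        for pattern, bucket in ((SUBMITTED, submitted), (PROCESSED, processed)):
            m = pattern.search(line)
            if m and m.group(1) == service:
                bucket.append((int(m.group(2)), m.group(3)))
    return submitted, processed
```

Run it over the log file (assumed to be `/usr/local/nagios/var/nagios.log` on a stock install; the real path is the `log_file` setting in nagios.cfg) and the first index where the two lists differ shows where processing diverged from submission.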

Thanks,
Tal

Re: External command processing

Post by sreinhardt

Could you post the portion of snmptt.log that would correlate to these same three traps? As for debugging, this is an interesting one that we are going to have to think on for a little bit.

Re: External command processing

Post by cellact

Sure:

Code:

Tue Nov 12 08:13:40 2013 .1.3.6.1.4.1.9.9.336.0.2 Warning "Status Events" 192.168.103.11 - badPattern
Tue Nov 12 08:13:41 2013 .1.3.6.1.4.1.9.9.336.0.2 Warning "Status Events" 192.168.103.11 - Failed
Tue Nov 12 08:13:41 2013 .1.3.6.1.4.1.9.9.336.0.2 Warning "Status Events" 192.168.103.11 - Available
As you can see, two traps came at the same exact time.
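A quick way to check how often traps share a second in snmptt.log is to count the leading timestamps. This Python sketch assumes each line starts with the ctime-style stamp seen above (for example `Tue Nov 12 08:13:41 2013`); the function name `same_second_traps` is hypothetical.

```python
import re
from collections import Counter

# snmptt.log lines quoted here begin with a ctime-style stamp,
# e.g. "Tue Nov 12 08:13:41 2013".
STAMP = re.compile(r"^(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) ")


def same_second_traps(log_lines):
    """Return timestamps shared by more than one trap, in first-seen order."""
    counts = Counter()
    for line in log_lines:
        m = STAMP.match(line)
        if m:
            counts[m.group(1)] += 1
    # Counter keeps insertion order, so results follow the log's order.
    return [stamp for stamp, n in counts.items() if n > 1]
```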

They were submitted correctly:

Code:

Passive Check 2013-11-12 08:13:46 EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;192.168.103.11;SNMP Traps;0;Available
Passive Check 2013-11-12 08:13:41 EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;192.168.103.11;SNMP Traps;2;Failed
Passive Check 2013-11-12 08:13:41 EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;192.168.103.11;SNMP Traps;2;badPattern
And then Nagios mixed up the order:

Code:

Passive Check 2013-11-12 08:13:51 PASSIVE SERVICE CHECK: Cisco Router;SNMP Traps;2;badPattern
Passive Check 2013-11-12 08:13:51 PASSIVE SERVICE CHECK: Cisco Router;SNMP Traps;0;Available
Passive Check 2013-11-12 08:13:51 PASSIVE SERVICE CHECK: Cisco Router;SNMP Traps;2;Failed
I'm using Nagios XI 2011R3.2.

Thanks

Re: External command processing

Post by sreinhardt

OK, thanks for the additional info. I am going to speak with our core dev, who is in tomorrow, and see whether there is some logic I am missing, or logic that should be added, to handle these in the order submitted.

Edit: just an update, the core dev confirmed my thinking that nagios.cmd (which receives the commands from snmptt) should process them in the order received. We are testing this internally to see what we can find.
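For reference, each line written to nagios.cmd is one external command of the form `[timestamp] COMMAND;arguments`, where the timestamp is whole seconds since the epoch, so traps handled within the same second carry identical timestamps. The Python sketch below formats `PROCESS_SERVICE_CHECK_RESULT` lines and writes a batch to the command file in the given order. The command-file path is an assumed default (the real value is `command_file` in nagios.cfg), the helper names are hypothetical, and this is not the code of snmptraphandling.py.

```python
import time

# Assumed default command-file path on a Nagios XI host; the real value is
# the command_file setting in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_passive_result(host, service, return_code, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT line (hypothetical helper).

    The [timestamp] field is whole seconds since the epoch, so two results
    formatted within the same second carry the same timestamp.
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # A newline would end the command early, so flatten multi-line output.
    flat = output.replace("\n", " ")
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{flat}\n"


def submit(lines, command_file=COMMAND_FILE):
    """Write the given command lines to the command file, in list order."""
    # Opening the named pipe for writing blocks until Nagios has it open for
    # reading; on a regular file (e.g. in a test) "w" truncates it first.
    with open(command_file, "w") as cmd:
        cmd.write("".join(lines))
```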

Re: External command processing

Post by cellact

I just had this reproduced in my prod environment. :(

Code:

2013-11-18 14:55:43 GLOBAL SERVICE : Cisco ITP1;SNMP Traps;CRITICAL;HARD;1;xi_service_event_handler
2013-11-18 14:55:43 SERVICE ALERT: Cisco ITP1;SNMP Traps;CRITICAL;HARD;1;Linkset1 / LinkState failed / LinkReason changeOverInProgress / LinkTestResult noErrors /
2013-11-18 14:55:43 PASSIVE SERVICE CHECK: Cisco ITP1;SNMP Traps;2;Linkset1 / LinkState failed / LinkReason changeOverInProgress / LinkTestResult noErrors /
2013-11-18 14:55:43 GLOBAL SERVICE : Cisco ITP1;SNMP Traps;CRITICAL;HARD;1;xi_service_event_handler
2013-11-18 14:55:43 SERVICE ALERT: Cisco ITP1;SNMP Traps;CRITICAL;HARD;1;Linkset1 / LinkState failed / LinkReason changeOverInProgress / LinkTestResult badPattern /
2013-11-18 14:55:43 PASSIVE SERVICE CHECK: Cisco ITP1;SNMP Traps;2;Linkset1 / LinkState failed / LinkReason changeOverInProgress / LinkTestResult badPattern /
2013-11-18 14:55:43 GLOBAL SERVICE : Cisco ITP1;SNMP Traps;OK;HARD;1;xi_service_event_handler
2013-11-18 14:55:43 SERVICE ALERT: Cisco ITP1;SNMP Traps;OK;HARD;1;Linkset1 / LinkState available / LinkReason linkRestored / LinkTestResult noErrors /
2013-11-18 14:55:43 PASSIVE SERVICE CHECK: Cisco ITP1;SNMP Traps;0;Linkset1 / LinkState available / LinkReason linkRestored / LinkTestResult noErrors /
2013-11-18 14:55:37 EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;192.168.103.11;SNMP Traps;0;Linkset1 / LinkState available / LinkReason linkRestored / LinkTestResult noErrors /
2013-11-18 14:55:36 EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;192.168.103.11;SNMP Traps;2;Linkset1 / LinkState failed / LinkReason changeOverInProgress / LinkTestResult noErrors /
2013-11-18 14:55:36 EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;192.168.103.11;SNMP Traps;2;Linkset1 / LinkState failed / LinkReason changeOverInProgress / LinkTestResult badPattern /

Re: External command processing

Post by abrist

Please open a bug report at http://tracker.nagios.com
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.

Re: External command processing

Post by cellact

Hi,
I've opened a bug report - http://tracker.nagios.com/view.php?id=465
How soon can I get a reply on this?

Thanks.

Re: External command processing

Post by tmcdonald

Hard to say. We have a ton (100+) open feature requests and a dozen or so bug reports of varying importance. Looking at your ticket it seems it hasn't been claimed yet, but we just got a new dev so we'll see how the workload shifts.
Former Nagios employee