'SERVICEPROBLEMID' Questions

Gavin
Posts: 58
Joined: Mon Dec 24, 2012 4:56 am

Post by Gavin »

We have Nagios configured so that, every time it sends an alert, it sends an e-mail to our CRM system. This logs a ticket, which we can then use for reporting, customer notifications, etc. Currently, we do not send 'OK' notifications or repeat notifications, because both would log a new ticket. We also send only 'CRITICAL' notifications, but we'd rather send 'WARNING' notifications too, so that we get the complete picture.

Ideally, we'd like to send the OK notifications and have the ticket close automatically. If I can get the right information out of Nagios, this should be possible with our CRM system (Salesforce.com Service Cloud). Before I start this... does my logic below seem sound?

* ServiceA is WARNING
** E-Mail CRM with $SERVICEPROBLEMID$ and problem description.
** CRM system compares $SERVICEPROBLEMID$ with all current SERVICEPROBLEMIDs, fails to find it, and creates a new ticket.

* ServiceA is CRITICAL
** E-Mail CRM with $SERVICEPROBLEMID$ and problem description.
** CRM system compares $SERVICEPROBLEMID$ with all current SERVICEPROBLEMIDs, finds it, and updates the existing ticket.

* ServiceA is CRITICAL (repeat notification)
** E-Mail CRM with $SERVICEPROBLEMID$ and problem description.
** CRM system compares $SERVICEPROBLEMID$ with all current SERVICEPROBLEMIDs, finds it, and updates the existing ticket.

* ServiceA is OK
** E-Mail CRM with $LASTSERVICEPROBLEMID$, problem description, and a keyword that signifies the message is a recovery message.
** CRM system compares $LASTSERVICEPROBLEMID$ with all current SERVICEPROBLEMIDs, finds it, and closes the existing ticket.
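
The matching steps above can be sketched roughly as follows. This is only an illustration in Python, not Salesforce code: the `Ticket` class, the `handle_notification` function, and the dict used as a ticket store are all hypothetical names, and the sketch assumes the e-mail has already been parsed into a state, the two problem IDs, and a description.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    """Hypothetical stand-in for a CRM ticket, keyed by Nagios problem ID."""
    problem_id: int
    status: str = "open"
    updates: list = field(default_factory=list)


def handle_notification(tickets, state, problem_id, last_problem_id, description):
    """Apply one parsed notification to `tickets` (a dict: problem ID -> Ticket).

    `problem_id` is assumed to come from $SERVICEPROBLEMID$ and
    `last_problem_id` from $LASTSERVICEPROBLEMID$.
    """
    if state == "OK":
        # On recovery $SERVICEPROBLEMID$ is 0, so match on the last problem ID.
        ticket = tickets.get(last_problem_id)
        if ticket is not None:
            ticket.updates.append(description)
            ticket.status = "closed"
        return ticket

    # Non-OK state: update the existing ticket, or open one if none matches.
    ticket = tickets.get(problem_id)
    if ticket is None:
        ticket = Ticket(problem_id)
        tickets[problem_id] = ticket
    ticket.updates.append(description)
    return ticket
```

A WARNING, a CRITICAL, and a repeat CRITICAL carrying the same problem ID would all land on one ticket, and the OK message would close it. A recovery whose ID matches no ticket returns `None` here; what the CRM should do in that case is left open.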

It all seems pretty straightforward to me, but can anyone see any issues with doing the above? Also, the Nagios documentation says that the 'SERVICEPROBLEMID' does not change between Non-OK state transitions. Does 'UNKNOWN' count as 'Non-OK'? What about flapping? I assume that the ID remains the same until the service is 'OK' and 'HARD'?

It'd be really nice if we could get this working properly. If we do, I'll post a brief guide on the Exchange for getting this set up with Salesforce, as we can't be the only people doing this!

Thanks,

Gavin
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: 'SERVICEPROBLEMID' Questions

Post by abrist »

UNKNOWN should be considered non-OK. SERVICEPROBLEMIDs are in fact unique throughout the problem state.
I am not too sure about flapping, though I would presume it would be considered a non-OK state.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Gavin
Posts: 58
Joined: Mon Dec 24, 2012 4:56 am

Re: 'SERVICEPROBLEMID' Questions

Post by Gavin »

Great. Just to confirm, when you say 'globally' unique, does this apply to both 'HOSTPROBLEMID' and 'SERVICEPROBLEMID'? I'm planning on using just one field in the CRM for both Host and Service problems.

Thanks,

Gavin
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: 'SERVICEPROBLEMID' Questions

Post by abrist »

I believe they are both unique, as the problemid number counter is global and shared. See: http://nagios.sourceforge.net/docs/3_0/macrolist.html
$SERVICEPROBLEMID$
A globally unique number associated with the service's current problem state. Every time a service (or host) transitions from an OK or UP state to a problem state, a global problem ID number is incremented by one (1). This macro will be non-zero if the service is currently a non-OK state. State transitions between non-OK states (e.g. WARNING to CRITICAL) do not cause this problem id to increase. If the service is currently in an OK state, this macro will be set to zero (0). Combined with event handlers, this macro could be used to automatically open trouble tickets when services first enter a problem state.

[snip]

$HOSTPROBLEMID$
A globally unique number associated with the host's current problem state. Every time a host (or service) transitions from an UP or OK state to a problem state, a global problem ID number is incremented by one (1). This macro will be non-zero if the host is currently a non-UP state. State transitions between non-UP states (e.g. DOWN to UNREACHABLE) do not cause this problem id to increase. If the host is currently in an UP state, this macro will be set to zero (0). Combined with event handlers, this macro could be used to automatically open trouble tickets when hosts first enter a problem state.
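
The counter behaviour described in the quoted documentation can be modelled with a small sketch. This is a toy Python model, not Nagios source code: the class and method names are made up for illustration, and it ignores soft versus hard states and flapping.

```python
class ProblemIdCounter:
    """Toy model of one problem ID counter shared by hosts and services."""

    def __init__(self):
        self.last_issued = 0  # the global counter
        self.current = {}     # object name -> current problem ID (0 = OK/UP)

    def transition(self, name, new_state):
        """Record a state change and return the object's current problem ID."""
        if new_state in ("OK", "UP"):
            # Back to a good state: the macro is reset to zero.
            self.current[name] = 0
        elif self.current.get(name, 0) == 0:
            # OK/UP -> problem state: increment the shared counter.
            self.last_issued += 1
            self.current[name] = self.last_issued
        # Problem -> problem (e.g. WARNING to CRITICAL): ID is unchanged.
        return self.current[name]
```

In this model a service going WARNING and then CRITICAL keeps one ID, while a host going DOWN in the meantime takes the next number from the same counter. That is why a single CRM field should be able to hold both host and service problem IDs without collisions.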