Wildcard on service name
Posted: Mon Mar 31, 2014 4:28 pm
by sujitt
Can we configure passive services that accept wildcard names? For example:
We have programs which will send passive statuses to Nagios with names like
CIPBBILB, CIPBLPCB, CIPCBC2B, CIPCCSSB, CIPCMPPB, CIPCSATB, CIPDRFNB
All of these run multiple threads. For example, if the first program runs 5 threads, we will have services named CIPBBILB001, CIPBBILB002, CIPBBILB003, CIPBBILB004 and CIPBBILB005. Today we have configured most of these, and if we preconfigure all of our programs to a maximum number of threads, we will have 8200 services configured for passive mode. This increases the DB size, and we have seen apply config take a long time to complete.
Is there a way to have services accept a regular expression like CIPBBILB???, so that they use all the settings defined in one template and display statuses based on the name provided in the passive status sent?
If this scenario is not clear, let me know and I can send some samples.
Your help is appreciated. We are going live with a Nagios solution for batch application monitoring next week.
Re: Wildcard on service name
Posted: Tue Apr 01, 2014 1:19 pm
by sujitt
Please let me know if you need more explanation of the situation. The other question is: if I add 8200 services for passive monitoring, will this have any negative effects on the Nagios server other than the long time to apply config?
Is there a way for apply config to write out only the changes, instead of writing out the whole config every time?
Re: Wildcard on service name
Posted: Tue Apr 01, 2014 1:28 pm
by sreinhardt
More directly applicable examples would be extremely helpful at this point. As for the second set of questions: no, that number of passive services will not be terribly hard on the Nagios system. Unfortunately, at this time there is no way to output just new or altered configs; all of them are always written out. How configs are written out is something we are considering changing, though.
Re: Wildcard on service name
Posted: Tue Apr 01, 2014 2:06 pm
by sujitt
I have attached my current state of services. The naming convention for the services is <PROGNAME><XXX>, where XXX is the thread number.
Here is the exact scenario we are implementing.
1. This is a passive scenario where our Ctrl-M (job scheduling program) executes a program that calls a masterscript program, which spawns off multiple threads based on the thread count.
2. Each thread calls send_nsca, which sends the command line identifying the state of the thread. It also sends a PID for the thread, which allows the thread to be killed from Nagios using the Action component.
3. If the thread fails, it will have sent restart info so that the Action component can restart the thread. The Action component then calls NRPE to execute the thread and also calls the ticketing system through an action handler to create a ticket.
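As an illustration of step 2, here is a minimal Python sketch that builds the line a thread could feed to send_nsca. It assumes the default send_nsca input format for service results (host, service description, return code and plugin output, tab-separated, one result per line). The host name, the helper names and the "pid=" output layout are hypothetical and not taken from our actual scripts.

```python
# Nagios state codes as used by plugins and passive check results.
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def nsca_line(host, service, state, output, delim="\t"):
    """Build one passive service result in send_nsca's stdin format."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state}")
    # A newline or delimiter inside the output would corrupt the record.
    clean = output.replace("\n", " ").replace(delim, " ")
    return delim.join([host, service, str(STATES[state]), clean]) + "\n"


def thread_status(host, prog, thread_no, state, pid):
    """Status line for one thread; the 'pid=' output layout is hypothetical."""
    service = f"{prog}{thread_no:03d}"  # e.g. CIPBBILB003
    return nsca_line(host, service, state, f"{state}: pid={pid}")


if __name__ == "__main__":
    # Hypothetical host name; prints one tab-separated result line.
    print(thread_status("batchhost01", "CIPBBILB", 3, "OK", 4711), end="")
```

The resulting line would typically be piped into send_nsca on the client, with the Nagios server and the send_nsca config file given on its command line.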
Now the problem is that the number of threads each program spawns is dynamic. Since we have configured each thread name as a service name, we need to preconfigure all of these services up to a maximum of, say, 100 threads. This creates 8200 services, which is too much, and the effect on Nagios is unknown to me, especially with our current sizing of 2 procs.
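To show what preconfiguring means in practice, here is a rough Python sketch that generates passive-only service definitions in Nagios Core object syntax for every program and thread number. The template name "passive-batch-thread" and the host name are hypothetical, and the sketch assumes the template supplies the remaining required directives such as check_command. How such definitions would be imported into our setup is not covered here.

```python
# One passive-only service definition; doubled braces are literal braces.
SERVICE_TEMPLATE = """define service {{
    use                     {template}
    host_name               {host}
    service_description     {name}
    active_checks_enabled   0
    passive_checks_enabled  1
}}
"""


def service_names(prog, max_threads):
    """CIPBBILB -> CIPBBILB001 .. CIPBBILB<max_threads>, zero-padded to 3 digits."""
    return [f"{prog}{n:03d}" for n in range(1, max_threads + 1)]


def render_services(host, programs, max_threads, template="passive-batch-thread"):
    """Render one definition per program and thread number."""
    blocks = []
    for prog in programs:
        for name in service_names(prog, max_threads):
            blocks.append(
                SERVICE_TEMPLATE.format(template=template, host=host, name=name)
            )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical host; 2 programs x 3 threads gives 6 definitions.
    print(render_services("batchhost01", ["CIPBBILB", "CIPBLPCB"], 3))
```

Generating the definitions from the list of program names keeps the thread maximum in one place, so raising or lowering it does not mean editing thousands of entries by hand.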
I am concerned about the approach we are taking to solve the problem and need more guidance on this.
1. Is there a way to regex the thread numbers, define only the progname as the service name, and still get info at the thread level?
2. If we have to preconfigure, what is the drag on the system? Do we need more CPU, memory, disk, or any other resources?
3. Is there any other option I should consider?
Thanks
sujith
environment.png
Servicesccb.png
Re: Wildcard on service name
Posted: Tue Apr 01, 2014 4:29 pm
by abrist
sujitt wrote:1. Is there a way to regex the thread numbers, define only the progname as the service name, and still get info at the thread level?
This is possible, by parsing the output of "ps".
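A minimal Python sketch of that idea, assuming each thread's command line in the "ps" output contains its name in the form described above (an 8-character program name followed by a 3-digit thread number). The sample output and the "masterscript" command are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed convention: 8-character program name + 3-digit thread number.
THREAD_RE = re.compile(r"\b([A-Z][A-Z0-9]{7})(\d{3})\b")


def threads_by_program(ps_output):
    """Group the thread numbers found in ps output by program name."""
    found = defaultdict(set)
    for line in ps_output.splitlines():
        for prog, num in THREAD_RE.findall(line):
            found[prog].add(int(num))
    return {prog: sorted(nums) for prog, nums in found.items()}


if __name__ == "__main__":
    sample = (
        "  PID TTY          TIME CMD\n"
        " 4711 ?        00:00:01 masterscript CIPBBILB001\n"
        " 4712 ?        00:00:01 masterscript CIPBBILB002\n"
        " 4800 ?        00:00:03 masterscript CIPCBC2B001\n"
    )
    print(threads_by_program(sample))
```

With this, one service per program could report which thread numbers are currently running, instead of one service per thread.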
sujitt wrote:2. If we have to preconfigure, what is the drag on the system? Do we need more CPU, memory, disk, or any other resources?
The biggest issue with preconfiguring these 8200 checks is that a number of hosts will not be using all checks. So you could potentially have many false failures if the host checked does not support the max number of threads.
sujitt wrote:3. Is there any other option I should consider?
Is there any way to wrap all the thread checks on each unique host into one check apiece? The threads could all write to a common file, and your check could then parse the file and return the necessary strings for a Nagios server-side script to parse and restart the necessary threads.
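A rough Python sketch of the common-file idea, assuming a hypothetical file format of one line per thread with three whitespace-separated fields: thread name, state and PID. It collapses all lines into a single Nagios result and lists only the threads that are not OK.

```python
def summarize(status_lines):
    """Collapse per-thread lines 'NAME STATE PID' into one (code, text) result."""
    failed = []
    total = 0
    for line in status_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, state, pid = parts
        total += 1
        if state != "OK":
            failed.append(f"{name}(pid={pid})")
    if failed:
        text = f"CRITICAL - {len(failed)}/{total} threads failed: " + ", ".join(failed)
        return 2, text
    return 0, f"OK - {total} threads reporting"


if __name__ == "__main__":
    lines = [
        "CIPBBILB001 OK 4711",
        "CIPBBILB002 CRITICAL 4712",
        "CIPBBILB003 OK 4713",
    ]
    code, text = summarize(lines)
    print(code, text)
```

A real check would read the lines from the shared file, print the text and exit with the code. The failed thread names and PIDs in the output are what a server-side script could parse to decide which threads to restart.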
Re: Wildcard on service name
Posted: Tue Apr 01, 2014 5:47 pm
by sujitt
sujitt wrote:1. Is there a way to regex the thread numbers, define only the progname as the service name, and still get info at the thread level?
abrist wrote:This is possible, by parsing the output of "ps".
Whatever we do, if we do not preconfigure the service name, the NSCA output is going to end up in unconfigured objects.
sujitt wrote:2. If we have to preconfigure, what is the drag on the system? Do we need more CPU, memory, disk, or any other resources?
abrist wrote:The biggest issue with preconfiguring these 8200 checks is that a number of hosts will not be using all checks. So you could potentially have many false failures if the host checked does not support the max number of threads.
Since these are passive checks, does this number of services still cause issues?
sujitt wrote:3. Is there any other option I should consider?
abrist wrote:Is there any way to wrap all the thread checks on each unique host into one check apiece? The threads could all write to a common file, and your check could then parse the file and return the necessary strings for a Nagios server-side script to parse and restart the necessary threads.
Even in this case, we need multiple service names so that operators can restart a thread from the Nagios screens.
Re: Wildcard on service name
Posted: Wed Apr 02, 2014 10:59 am
by abrist
Passive checks, or checks in general, cannot use wildcards. You may have to just work with the implementation you currently have. I still think there may be another method to deal with the large process list without using individual service checks, but it would most likely be scripting-heavy. Have you thought about just identifying the processes that may need to be restarted and returning only those threads, instead of the whole list?
Re: Wildcard on service name
Posted: Wed Apr 02, 2014 11:11 am
by sujitt
This is kind of a feature request: the ability to dynamically configure objects that do not exist yet (today called unconfigured objects).
Set regular-expression-based templates for host names and service names.
When NSCA receives a packet for an object that is not already defined, that event should be checked against the template list for a match. The matching template is then used to configure the object on the fly and to report the status actually sent by the client.
This way you will not lose the event and will not have to predefine every object which may fail.
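To make the request concrete, here is a small Python sketch of the behavior I have in mind. It does not describe anything Nagios does today. The rule table, the template names and the in-memory dictionary standing in for configured objects are all hypothetical.

```python
import re

# Hypothetical pattern-to-template table; first matching rule wins.
TEMPLATE_RULES = [
    (re.compile(r"CIPBBILB\d{3}"), "passive-batch-thread"),
    (re.compile(r"CIP[A-Z0-9]{5}\d{3}"), "passive-batch-generic"),
]


def match_template(service_name, rules=TEMPLATE_RULES):
    """Return the template of the first rule matching the whole name, or None."""
    for pattern, template in rules:
        if pattern.fullmatch(service_name):
            return template
    return None


def handle_result(configured, service_name, state, rules=TEMPLATE_RULES):
    """Record a passive result, creating the service on first sight if a rule matches."""
    if service_name not in configured:
        template = match_template(service_name, rules)
        if template is None:
            return False  # no rule matches: stays an unconfigured object
        configured[service_name] = {"template": template, "state": None}
    configured[service_name]["state"] = state
    return True


if __name__ == "__main__":
    services = {}
    handle_result(services, "CIPBBILB004", 2)
    print(services)
```

Here CIPBBILB??? from my first post is written as the regular expression CIPBBILB followed by three digits.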
For now I am going to predefine all 8200 objects, assuming that this is not going to cause huge performance issues since they are in passive mode.
Re: Wildcard on service name
Posted: Wed Apr 02, 2014 11:14 am
by abrist
sujitt wrote:This is kind of a feature request where you can dynamically configure objects not existing
You may want to suggest this by opening a feature request at
http://tracker.nagios.com
sujitt wrote:For now I am going to pre-define all the 8200 objects assuming that this is not going to cause huge performance issues since it is passive mode.
They should be fine - how often are you returning results from the passive checks?
Re: Wildcard on service name
Posted: Wed Apr 02, 2014 9:27 pm
by sujitt
Each service check will receive at most 3 messages a day.
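As a back-of-the-envelope check, using the 8200 services and the maximum of 3 messages per service per day from this thread, and assuming (unrealistically for batch jobs, which tend to arrive in bursts) that results are spread evenly over the day:

```python
SERVICES = 8200          # preconfigured passive services, from this thread
RESULTS_PER_SERVICE = 3  # maximum messages per service per day, from this thread

results_per_day = SERVICES * RESULTS_PER_SERVICE
results_per_second = results_per_day / (24 * 60 * 60)

print(results_per_day)               # 24600
print(round(results_per_second, 2))  # 0.28
```

So the upper bound is about 24,600 passive results per day, or well under one result per second on average.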