
NagiosXI Possible Misconfigured Apply Conf Failing

Posted: Fri Apr 11, 2014 11:48 am
by mlopez
Hi All,


1. What version of Nagios XI are you using?
2. Linux distribution and version? CentOS 6.3
3. 32 or 64-bit? 64-bit
4. VMware image or manual install of XI? Nagios XI 2012R2.9
5. Are there special configurations on your system, i.e., is GNOME installed? Are you using a proxy? Are you using SSL?
**If you are encountering multiple issues that may not be related, start a thread for each issue.


Setup:
Hosts: 692 hosts
Services: 50976

What will Nagios be used for?
The Nagios XI system will only be used for passive SNMP traps. We have 75 trap types per host, hence the number of services; I have set up one service per OID.

Please keep in mind that we only receive around 10,000 to 15,000 traps a day, which is very low volume. I could reduce the number of services if I could figure out how to display multiple traps per service, but snmptt/Nagios seems to display only the latest trap. I need this because I want to migrate from a different NMS solution that can only alert by email. I really like Nagios and want our groups to be more interactive in investigating and acknowledging alerts, and criticality is very important; Nagios has all of these features and much more.
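To put the stated volume in perspective, here is a quick back-of-the-envelope calculation. It uses only the figures from this post (50,976 services, 10,000 to 15,000 traps per day) and assumes traps are spread evenly over the day and across services, which real trap traffic will not be.

```python
# Back-of-the-envelope trap load from the figures in this post.
# Assumes traps are spread evenly over the day and over all services (a simplification).
SERVICES = 50_976
SECONDS_PER_DAY = 24 * 60 * 60

def trap_rates(traps_per_day: int, services: int = SERVICES) -> tuple[float, float]:
    """Return (traps per second, traps per service per day)."""
    return traps_per_day / SECONDS_PER_DAY, traps_per_day / services

for traps_per_day in (10_000, 15_000):
    per_second, per_service = trap_rates(traps_per_day)
    print(f"{traps_per_day:,} traps/day: {per_second:.3f} traps/s, "
          f"{per_service:.3f} traps per service per day")
```

Even at the top of the range that is well under one trap per second, or roughly one trap per service every three and a half to five days, which suggests the cost lies more in the number of objects than in the trap rate.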


The ideal workflow (these are prerequisites):
1. An SNMP trap comes in.
2. The trap is displayed in the view.
3. Group X acknowledges the alert.
4. If service X has had no new alerts in the last 24 hours, clear all of that service's alerts. (This will make the system more manageable.)
5. If the host as a whole has not received any traps on any of its services in X time, create an alert advising of a connectivity issue.
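Requirements 4 and 5 are both "time since last trap" rules. The Python sketch below only illustrates that decision logic; it is not how Nagios implements it (in Nagios this is usually expressed with freshness checking on passive services). The function names, the fixed 24-hour service window, and treating a host that never sent a trap as silent are assumptions; the host's "X time" is left as a parameter.

```python
from datetime import datetime, timedelta
from typing import Optional

# Requirement 4 says 24 hours; everything else here is a hypothetical illustration.
SERVICE_CLEAR_AFTER = timedelta(hours=24)

def should_clear_service(last_trap: Optional[datetime], now: datetime,
                         clear_after: timedelta = SERVICE_CLEAR_AFTER) -> bool:
    """Requirement 4 (sketch): clear a trap service once it has been quiet for clear_after."""
    if last_trap is None:
        return False  # never received a trap, so there is nothing to clear
    return now - last_trap >= clear_after

def host_is_silent(last_traps: dict[str, Optional[datetime]], now: datetime,
                   silent_after: timedelta) -> bool:
    """Requirement 5 (sketch): alert when no service on the host saw a trap within silent_after."""
    seen = [t for t in last_traps.values() if t is not None]
    if not seen:
        return True  # assumption: a host that never sent any trap counts as silent
    return now - max(seen) >= silent_after

# Usage with made-up timestamps; "OtherTrap" is a placeholder service name.
now = datetime(2014, 4, 11, 12, 0)
traps = {"VoltageNotify": now - timedelta(hours=30), "OtherTrap": now - timedelta(hours=2)}
print(should_clear_service(traps["VoltageNotify"], now))  # → True
print(host_is_silent(traps, now, timedelta(hours=48)))    # → False
```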


---------------------------------------------
Q: I think I must have enabled something in the services that I shouldn't have, and it is slowing down "Apply Configuration": it now takes 10 minutes to apply and seems to be failing. I only need the traps to be displayed, and to clear only if the service has received no traps in 24 hours.

Example of one of my services (all are set up the same way, 75 per host):

define service {
        host_name                       HOSTNAME
        service_description             NAMEOFSNMPTRAP
        use                             xiwizard_snmptrap_service
        is_volatile                     1
        max_check_attempts              1
        check_interval                  1
        retry_interval                  1
        check_period                    xi_timeperiod_24x7
        notification_interval           1
        notification_period             xi_timeperiod_24x7
        notifications_enabled           1
        contacts                        nagiosadmin
        contact_groups                  admins
        stalking_options                o,w,c,u,
        icon_image                      snmptrap.png
        _xiwizard                       snmp_trap
        register                        1
        }
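With 75 near-identical services per host, most of the directives above could live in one service template, leaving each service definition with only what differs. A minimal Python sketch that renders such definitions, assuming a hypothetical template named snmptrap_passive_template (registered with register 0) that holds the shared settings; the trap names are placeholders apart from VoltageNotify. Whether this measurably shortens Apply Configuration in XI is not established in this thread.

```python
# Sketch: render minimal service definitions that inherit shared settings from a
# template. "snmptrap_passive_template" is a hypothetical template name; it would need
# to exist (with register 0) and hold the directives that are identical for every trap.
TEMPLATE = "snmptrap_passive_template"

def directive(key: str, value: str) -> str:
    """Format one directive line in the column layout used in the post."""
    return f"        {key:<32}{value}"

def render_service(host_name: str, service_description: str,
                   template: str = TEMPLATE) -> str:
    """Return one 'define service' block with only the per-service directives."""
    return "\n".join([
        "define service {",
        directive("host_name", host_name),
        directive("service_description", service_description),
        directive("use", template),
        "        }",
    ])

def render_host_services(host_name: str, trap_names: list[str]) -> str:
    """One service per trap (one per OID), as in the post, separated by blank lines."""
    return "\n\n".join(render_service(host_name, name) for name in trap_names)

# Placeholder trap list; only VoltageNotify comes from this thread.
print(render_host_services("HOSTNAME", ["VoltageNotify", "ExampleTrap2"]))
```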
---------------------------------------------

Q: Maybe I shouldn't be using check_interval, retry_interval and check_period, since these are SNMP traps?
(I only need the host to monitor all of its services, run passively only, and raise an alert saying no traps have been received from the host if there are none in 48 hours.)
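On the check/retry question: Nagios has freshness checking for this passive-only case. With check_freshness enabled, if no passive result arrives within freshness_threshold seconds, Nagios runs the service's check_command itself, even when active checks are disabled; pointing that at a dummy check that returns OK is one common way to "clear" a quiet trap service. The sketch renders such a template. The template name and the check_dummy command and its arguments are assumptions (check which dummy command your install defines), and it only covers the service side.

```python
# Sketch: render a passive-only service template that lets Nagios "clear" a quiet
# trap service. The directive names are standard Nagios object directives; the
# template name and the check_dummy command (and its arguments) are assumptions.
SECONDS_PER_HOUR = 3600

def render_passive_template(name: str, quiet_hours: int,
                            parent: str = "xiwizard_snmptrap_service") -> str:
    """Return a 'define service' template block with freshness checking enabled."""
    settings = {
        "name": name,                          # template name, referenced via 'use'
        "use": parent,                         # keep inheriting the XI trap template
        "active_checks_enabled": "0",          # never polled on a schedule
        "passive_checks_enabled": "1",         # results arrive from snmptt
        "check_freshness": "1",                # run check_command when results go stale
        "freshness_threshold": str(quiet_hours * SECONDS_PER_HOUR),  # in seconds
        # Hypothetical command: assumed to return OK (state 0) with the given text.
        "check_command": f'check_dummy!0!"No traps in {quiet_hours}h"',
        "register": "0",                       # a template, not a real service
    }
    body = "\n".join(f"        {key:<32}{value}" for key, value in settings.items())
    return "define service {\n" + body + "\n        }"

print(render_passive_template("snmptrap_passive_template", quiet_hours=24))
```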

Example of one of my hosts:

define host {
        host_name                       HOSTNAME
        use                             xiwizard_passive_host
        alias                           HOSTNAMEXXX
        address                         1.1.1.1
        max_check_attempts              5
        check_interval                  5
        retry_interval                  1
        check_period                    xi_timeperiod_24x7
        contacts                        nagiosadmin
        notification_interval           1
        notification_period             xi_timeperiod_24x7
        notification_options            d,
        notifications_enabled           1
        _xiwizard                       passivecheck
        register                        1
        }
---------------------------------------------

snmptt configuration (every trap is set up to go to a different service; this one goes to VoltageNotify). By the way, this part is working fine:

EVENT VoltageNotify .1.3.6.1.4.1.0000.100.3.74 "Status Events" Warning
FORMAT $1 $6 $7=$8, $9. - VoltageNotify
SDESC
EXEC /usr/local/bin/snmptraphandling.py "$1" "VoltageNotify" "$s" "$@" "$-*" "$1 $6 $7=$8, $9."
Voltage error/warning detected.
Variables:
  1: SystemSn
  2: SystemName
  3: SystemLocation
  4: SystemDescription
  5: SystemTime
  6: Origin
  7: ObjectName
  8: ObjectValue
  9: Message
EDESC
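The EXEC line hands snmptraphandling.py a host value ($1), the service name, the severity ($s), the receive time ($@) and the trap text. The script itself is not shown in this thread, so this is only a hedged Python sketch of the general technique such a handler uses: turning those arguments into a Nagios PROCESS_SERVICE_CHECK_RESULT external command line. The severity-to-state mapping and function name are assumptions, and writing the line to the Nagios command file is left out.

```python
import time
from typing import Optional

# Assumed mapping from snmptt severity strings to Nagios states (0=OK, 1=WARNING,
# 2=CRITICAL, 3=UNKNOWN). The real snmptraphandling.py may map these differently.
SEVERITY_TO_STATE = {
    "normal": 0, "informational": 0,
    "warning": 1, "minor": 1,
    "major": 2, "critical": 2, "severe": 2,
}

def build_check_result(host: str, service: str, severity: str, output: str,
                       timestamp: Optional[int] = None) -> str:
    """Return one PROCESS_SERVICE_CHECK_RESULT external command line."""
    state = SEVERITY_TO_STATE.get(severity.strip().lower(), 3)  # unknown severity -> UNKNOWN
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # External commands are line-based, so keep the plugin output on one line.
    one_line = " ".join(output.splitlines())
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{state};{one_line}"

# Example with made-up values; in practice the timestamp would come from snmptt's $@.
print(build_check_result("HOSTNAME", "VoltageNotify", "Warning",
                         "Voltage error/warning detected.", timestamp=1397231280))
# → [1397231280] PROCESS_SERVICE_CHECK_RESULT;HOSTNAME;VoltageNotify;1;Voltage error/warning detected.
```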
---------------------------------------------





After doing the MySQL offloading everything was working fine, but since I finished the ramdisk implementation I am now getting the following:

Command submitted for processing...
Waiting for configuration verification................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Configurations failed to write to file.

php.ini

max_execution_time = 450
max_input_time = 450
memory_limit = 800M


The reason for this really long post is that I'm about to uninstall and start from scratch, and I want to make sure the next install goes as planned, since things started slowing down after I added all the services.

Again, when I had one host and one service (SNMP traps), there was no load issue with incoming traps. It was only after I added all the individual services that "Apply Configuration" started taking 10 to 15 minutes, which forced me to add the MySQL offloading and the ramdisk. I suspect something is wrong with the services I created or with the host configuration.


Any help would be greatly appreciated; if you need any more information, just comment and I will provide it. I also want to thank slansing, tmcdonald, sreinhardt and scottwilkerson for all the help they have already provided.

Sincerely,
Michael

Re: NagiosXI Possible Misconfigured Apply Conf Failing

Posted: Fri Apr 11, 2014 2:43 pm
by sreinhardt
Wow, that is quite the post! Before you wipe anything out, could you send over your latest configuration snapshot via PM? With Core 3.5 it is not at all unheard of for 50k+ checks to take a few minutes to verify and apply, but some of this may be due to templating and similar issues that could be improved a bit. I don't think anything is really wrong with your system, except possibly some I/O or load issues caused purely by how long the whole apply config process takes.

On a side note, Core 4, which ships with XI 2014, will make this significantly quicker, and that release is a very short time away.
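To give a rough sense of why templating could matter at this scale, this sketch counts object-definition lines. It assumes every service is written out like the 19-line example in the first post, versus a hypothetical 5-line definition (define, host_name, service_description, use, closing brace) that inherits everything else from a template. Line count is only a proxy; it does not measure verification time.

```python
# Rough config-volume estimate. Line counts are a proxy only, not a timing model.
SERVICES = 50_976
FULL_DEFINITION_LINES = 19      # the example service in the first post, braces included
TEMPLATED_DEFINITION_LINES = 5  # hypothetical: define, host_name, service_description, use, }

def config_lines(services: int, lines_per_definition: int) -> int:
    """Total lines if every service definition has the given length."""
    return services * lines_per_definition

full = config_lines(SERVICES, FULL_DEFINITION_LINES)
templated = config_lines(SERVICES, TEMPLATED_DEFINITION_LINES)
print(f"full: {full:,} lines, templated: {templated:,} lines, "
      f"reduction: {1 - templated / full:.0%}")
# → full: 968,544 lines, templated: 254,880 lines, reduction: 74%
```

The number of service objects stays the same either way; only the amount of text written and parsed shrinks.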

Re: NagiosXI Possible Misconfigured Apply Conf Failing

Posted: Fri Apr 11, 2014 5:50 pm
by mlopez
Hi Spenser,
After removing all my services and hosts except one, "Apply Configuration" no longer fails.

Applying Configuration

    Command submitted for processing...
    Waiting for configuration verification........
    Configuration applied successfully.

Success! Nagios Core was restarted with an updated configuration.

This is good, as it tells me the MySQL offloading and ramdisk changes were done properly. When you have time, please check your private messages: now that everything is empty, I want to make sure I configure the services and hosts to put the least possible load on the system, since it will only handle inbound SNMP traps and nothing outbound.

Sincerely,
Michael

Re: NagiosXI Possible Misconfigured Apply Conf Failing

Posted: Mon Apr 14, 2014 10:14 am
by slansing
Going to close this for now, as it is resolved. sreinhardt may have time to take a look at the PM, but we try to handle all issues and questions here, in the open, so that all techs can assist if needed.