
Re: Notifications not being sent to all members of Contact G

Posted: Mon Jun 30, 2014 5:01 pm
by slansing
Well, it looks like PagerDuty can only handle Problem, Acknowledgement, and Recovery alerts:

http://www.pagerduty.com/docs/guides/na ... ion-guide/

A problem specifically being:
A service or host has just entered (or is still in) a problem state. If this is a service notification, it means the service is either in a WARNING, UNKNOWN or CRITICAL state. If this is a host notification, it means the host is in a DOWN or UNREACHABLE state.
Can you add the additional state change information around the log snippet you posted in your last reply, so we can see what the exact state was?

Re: Notifications not being sent to all members of Contact G

Posted: Mon Jun 30, 2014 6:16 pm
by crnelson
This is the log excerpt from a CRITICAL state that did not notify PagerDuty.
Originally posted Thu Jun 26, 2014 2:29 pm

Code: Select all

Jun 25 05:21:01 fwapp003 xinetd[3580]: START: nrpe pid=443 from=*.*.*.24
Jun 25 05:21:01 fwapp003 xinetd[3580]: EXIT: nrpe status=0 pid=443 duration=0(sec)
Jun 25 05:21:02 fwapp003 xinetd[3580]: START: nrpe pid=543 from=*.*.*.24
Jun 25 05:21:02 fwapp003 xinetd[3580]: START: nrpe pid=544 from=*.*.*.24
Jun 25 05:21:02 fwapp003 xinetd[3580]: START: nrpe pid=545 from=*.*.*.24
Jun 25 05:21:02 fwapp003 xinetd[3580]: START: nrpe pid=546 from=*.*.*.24
Jun 25 05:21:02 fwapp003 xinetd[3580]: START: nrpe pid=547 from=*.*.*.24
Jun 25 05:21:02 fwapp003 xinetd[3580]: EXIT: nrpe status=0 pid=547 duration=0(sec)
Jun 25 05:21:02 fwapp003 xinetd[3580]: EXIT: nrpe status=0 pid=545 duration=0(sec)
Jun 25 05:21:02 fwapp003 xinetd[3580]: EXIT: nrpe status=0 pid=546 duration=0(sec)
Jun 25 05:21:02 fwapp003 xinetd[3580]: EXIT: nrpe status=0 pid=543 duration=0(sec)
Jun 25 05:21:02 fwapp003 xinetd[3580]: EXIT: nrpe status=0 pid=544 duration=0(sec)
Jun 25 05:21:03 fwapp003 nagios: SERVICE ALERT: *201;Apache Web Server;CRITICAL;SOFT;2;[25.06.2014 05:21:00 SYSTEM] watchdog for monitor is not running. 
Jun 25 05:21:03 fwapp003 ndo2db: Error: mysql_query() failed for 'INSERT INTO nagios_statehistory SET instance_id='1', state_time=FROM_UNIXTIME(1403698863), state_time_usec='284697', object_id='433', state_change='1', state='2', state_type='0', current_check_attempt='2', max_check_attempts='5', last_state='2', last_hard_state='0', output='\[25\.06\.2014 05:21:00 SYSTEM\] watchdog for monitor is not running\.', long_output='\[25\.06\.2014 05:21:00 SYSTEM\] watchdog for monitor is not running\.'' 
Jun 25 05:21:03 fwapp003 ndo2db: mysql_error: 'Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed' 
Jun 25 05:21:12 fwapp003 nagios: SERVICE NOTIFICATION: stb*;*008;Drive_G: Disk Usage;CRITICAL;notify-service-by-email;G:\ - total: 75.00 Gb - used: 73.77 Gb (98%) - free 1.23 Gb (2%) 
Jun 25 05:21:12 fwapp003 nagios: SERVICE NOTIFICATION: rmc*;*008;Drive_G: Disk Usage;CRITICAL;notify-service-by-email;G:\ - total: 75.00 Gb - used: 73.77 Gb (98%) - free 1.23 Gb (2%) 
Jun 25 05:21:12 fwapp003 nagios: SERVICE NOTIFICATION: mpr*;*008;Drive_G: Disk Usage;CRITICAL;notify-service-by-email;G:\ - total: 75.00 Gb - used: 73.77 Gb (98%) - free 1.23 Gb (2%) 
Jun 25 05:21:12 fwapp003 nagios: SERVICE NOTIFICATION: eca*;*008;Drive_G: Disk Usage;CRITICAL;notify-service-by-email;G:\ - total: 75.00 Gb - used: 73.77 Gb (98%) - free 1.23 Gb (2%) 
Jun 25 05:21:12 fwapp003 xinetd[3580]: START: nrpe pid=1285 from=*.*.*.10
Jun 25 05:21:12 fwapp003 xinetd[3580]: EXIT: nrpe status=0 pid=1285 duration=0(sec)
Jun 25 05:21:22 fwapp003 xinetd[3580]: START: nrpe pid=1578 from=*.*.*.24
Jun 25 05:21:22 fwapp003 xinetd[3580]: EXIT: nrpe status=0 pid=1578 duration=0(sec)
Jun 25 05:21:44 fwapp003 xinetd[3580]: START: nrpe pid=2631 from=*.*.*.10
Jun 25 05:21:44 fwapp003 xinetd[3580]: EXIT: nrpe status=0 pid=2631 duration=0(sec)
Jun 25 05:21:59 fwapp003 xinetd[3580]: START: nrpe pid=3126 from=*.*.*.24
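
To see at a glance which contacts and notification commands actually fired for a given alert, the SERVICE NOTIFICATION lines in an excerpt like the one above can be grouped by host, service, and state. The following is only a rough sketch: the field order (contact;host;service;state;command;output) is read off the log lines in this post, and the function name and sample list are made up for illustration.

```python
import re
from collections import defaultdict

# Field order taken from the syslog lines above:
#   contact;host;service;state;command;plugin output
NOTIFICATION_RE = re.compile(
    r"nagios: SERVICE NOTIFICATION: "
    r"(?P<contact>[^;]*);(?P<host>[^;]*);(?P<service>[^;]*);"
    r"(?P<state>[^;]*);(?P<command>[^;]*);(?P<output>.*)$"
)


def notified_contacts(lines):
    """Group (contact, command) pairs by (host, service, state)."""
    result = defaultdict(list)
    for line in lines:
        match = NOTIFICATION_RE.search(line.rstrip())
        if match:
            key = (match["host"], match["service"], match["state"])
            result[key].append((match["contact"], match["command"]))
    return dict(result)


# Hypothetical sample: three lines copied from the excerpt above.
sample = [
    r"Jun 25 05:21:12 fwapp003 nagios: SERVICE NOTIFICATION: stb*;*008;Drive_G: Disk Usage;CRITICAL;notify-service-by-email;G:\ - total: 75.00 Gb - used: 73.77 Gb (98%) - free 1.23 Gb (2%)",
    r"Jun 25 05:21:12 fwapp003 nagios: SERVICE NOTIFICATION: rmc*;*008;Drive_G: Disk Usage;CRITICAL;notify-service-by-email;G:\ - total: 75.00 Gb - used: 73.77 Gb (98%) - free 1.23 Gb (2%)",
    "Jun 25 05:21:12 fwapp003 xinetd[3580]: START: nrpe pid=1285 from=*.*.*.10",
]

for key, contacts in notified_contacts(sample).items():
    print(key, contacts)
```

If the PagerDuty contact never shows up in this grouping for the affected service while the email contacts do, that suggests Nagios is not attempting the notification at all, rather than the PagerDuty command failing after it is called.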

Re: Notifications not being sent to all members of Contact G

Posted: Tue Jul 01, 2014 9:19 am
by slansing
Well, among other things, you have crashed MySQL tables, which you should take care of first:

http://assets.nagios.com/downloads/nagi ... tabase.pdf

How do you turn off your server? Use:

Code: Select all

shutdown -h now
Use that to shut it down safely if you have to. A power outage that affected the server could also have caused this.

Re: Notifications not being sent to all members of Contact G

Posted: Tue Jul 08, 2014 4:22 pm
by crnelson
I have completed the database repair as suggested and repeated the testing steps. Here's where it's at:
- There are no more crashed tables being referenced in the log files (win!)
- The notifications still don't reach the PagerDuty service, even if I force a "natural" failure by lowering the thresholds until the check goes critical

I've done some research in MySQL and found the following to confirm that, even at the database layer, the pagerduty_MSSQL_PROD contact does belong to the MSSQL_Disc_Usage_Notifiy group.

Code: Select all

mysql> select * from nagios_contactgroups where alias like "%Disc_Usage%";
+-----------------+-------------+-------------+------------------------+-------------------------------------+
| contactgroup_id | instance_id | config_type | contactgroup_object_id | alias                               |
+-----------------+-------------+-------------+------------------------+-------------------------------------+
|          167995 |           1 |           1 |                  15956 | PrintNet_Disc_Usage_Notifiy         | 
|          167987 |           1 |           1 |                  11841 | MSSQL_Disc_Usage_Notifiy            | 
|          167976 |           1 |           1 |                  14962 | FTP_Disc_Usage_Notifiy              | 
|          167966 |           1 |           1 |                  11973 | Business_Objects_Disc_Usage_Notifiy | 
+-----------------+-------------+-------------+------------------------+-------------------------------------+
4 rows in set (0.00 sec)



mysql> select * from nagios_contactgroup_members where contactgroup_id like "%167987%";
+------------------------+-------------+-----------------+-------------------+
| contactgroup_member_id | instance_id | contactgroup_id | contact_object_id |
+------------------------+-------------+-----------------+-------------------+
|                1061066 |           1 |          167987 |             11840 | 
|                1061065 |           1 |          167987 |             11839 | 
|                1061064 |           1 |          167987 |              7621 | 
|                1061063 |           1 |          167987 |              7618 | 
|                1061062 |           1 |          167987 |             11837 | 
|                1061061 |           1 |          167987 |             11392 | 
|                1061060 |           1 |          167987 |              4302 | 
|                1061059 |           1 |          167987 |              3053 | 
|                1061058 |           1 |          167987 |             16033 | 
|                1061057 |           1 |          167987 |             11836 | 
|                1061056 |           1 |          167987 |             11835 | 
|                1061055 |           1 |          167987 |              6879 | 
|                1061054 |           1 |          167987 |              6093 | 
+------------------------+-------------+-----------------+-------------------+
13 rows in set (0.00 sec)


mysql> select * from nagios_contacts where alias like "%pagerduty_MSSQL_PROD%";
+------------+-------------+-------------+-------------------+-------------------------------------------------+---------------+----------------------------------+---------------------------+------------------------------+----------------------------+-------------------------------+---------------------+-------------------------+------------------------+------------------------+-------------------------+-------------------------+-------------------------+----------------------+------------------+-------------------------+----------------------+----------------------+
| contact_id | instance_id | config_type | contact_object_id | alias                                           | email_address | pager_address                    | host_timeperiod_object_id | service_timeperiod_object_id | host_notifications_enabled | service_notifications_enabled | can_submit_commands | notify_service_recovery | notify_service_warning | notify_service_unknown | notify_service_critical | notify_service_flapping | notify_service_downtime | notify_host_recovery | notify_host_down | notify_host_unreachable | notify_host_flapping | notify_host_downtime |
+------------+-------------+-------------+-------------------+-------------------------------------------------+---------------+----------------------------------+---------------------------+------------------------------+----------------------------+-------------------------------+---------------------+-------------------------+------------------------+------------------------+-------------------------+-------------------------+-------------------------+----------------------+------------------+-------------------------+----------------------+----------------------+
|     616533 |           1 |           1 |             16166 | pagerduty_MSSQL_PROD_HOSTONLY_Tier1_DonorFacing |               | a2fa**************************19 |                      3334 |                           66 |                          1 |                             0 |                   1 |                       0 |                      0 |                      0 |                       0 |                       0 |                       0 |                    1 |                1 |                       0 |                    0 |                    0 | 
|     616534 |           1 |           1 |             16167 | pagerduty_MSSQL_PROD_Tier1_DonorFacing          |               | 82c3**************************85 |                      3334 |                         3334 |                          1 |                             1 |                   1 |                       1 |                      1 |                      0 |                       1 |                       0 |                       0 |                    1 |                1 |                       0 |                    0 |                    0 | 
|     616532 |           1 |           1 |             16076 | pagerduty_MSSQL_PROD_HOSTONLY                   |               | a2fa**************************19 |                      3334 |                           66 |                          1 |                             0 |                   1 |                       0 |                      0 |                      0 |                       0 |                       0 |                       0 |                    1 |                1 |                       0 |                    0 |                    0 | 
|     616531 |           1 |           1 |             16033 | pagerduty_MSSQL_PROD                            |               | 17be**************************9e |                      3334 |                         3334 |                          1 |                             1 |                   1 |                       1 |                      1 |                      0 |                       1 |                       0 |                       0 |                    1 |                1 |                       0 |                    0 |                    0 | 
+------------+-------------+-------------+-------------------+-------------------------------------------------+---------------+----------------------------------+---------------------------+------------------------------+----------------------------+-------------------------------+---------------------+-------------------------+------------------------+------------------------+-------------------------+-------------------------+-------------------------+----------------------+------------------+-------------------------+----------------------+----------------------+
4 rows in set (0.00 sec)
The logs for this particular service assigned to the hostgroup are the same as before: all other contacts are emailed, but the PagerDuty API is not called. Where this contact group is applied elsewhere, the PagerDuty API is still being called.
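
The three lookups above can probably be collapsed into one join that also returns the contact's service notification flags. This is a sketch under assumptions: it uses an in-memory SQLite database as a stand-in for the MySQL database queried above, recreates only the columns the check needs, and loads a couple of rows copied from the output in this post. The same SELECT should carry over to MySQL, but that has not been verified here.

```python
import sqlite3

# In-memory SQLite stand-in (an assumption) for the MySQL database above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nagios_contactgroups (contactgroup_id INTEGER, alias TEXT);
CREATE TABLE nagios_contactgroup_members (
    contactgroup_id INTEGER, contact_object_id INTEGER);
CREATE TABLE nagios_contacts (
    contact_object_id INTEGER, alias TEXT,
    service_notifications_enabled INTEGER, notify_service_critical INTEGER);
-- Rows copied from the query output above.
INSERT INTO nagios_contactgroups VALUES (167987, 'MSSQL_Disc_Usage_Notifiy');
INSERT INTO nagios_contactgroup_members VALUES (167987, 16033);
INSERT INTO nagios_contactgroup_members VALUES (167987, 11840);
INSERT INTO nagios_contacts VALUES (16033, 'pagerduty_MSSQL_PROD', 1, 1);
""")

# Group alias -> membership rows -> contact, in a single query.
MEMBERSHIP_SQL = """
SELECT c.alias, c.service_notifications_enabled, c.notify_service_critical
FROM nagios_contactgroups AS g
JOIN nagios_contactgroup_members AS m
  ON m.contactgroup_id = g.contactgroup_id
JOIN nagios_contacts AS c
  ON c.contact_object_id = m.contact_object_id
WHERE g.alias = ? AND c.alias = ?
"""

rows = conn.execute(
    MEMBERSHIP_SQL, ("MSSQL_Disc_Usage_Notifiy", "pagerduty_MSSQL_PROD")
).fetchall()
print(rows)  # [('pagerduty_MSSQL_PROD', 1, 1)]
```

An empty result would mean the contact is not in the group at the database layer. A row with a 0 in either flag would mean it is a member but not set to receive critical service notifications.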

Re: Notifications not being sent to all members of Contact G

Posted: Wed Jul 09, 2014 10:34 am
by tmcdonald
It looks like your pagerduty_MSSQL_PROD contact (contact_object_id = 16033) is in fact in the MSSQL_Disc_Usage_Notifiy contact group. In the second query result (nagios_contactgroup_members), look for 16033 in the contact_object_id column; it is the row with contactgroup_member_id 1061058.

Also, the nagiosql database is what reflects the CCM settings. You might want to look in there instead. Try applying the config after you've run the db repair script and see if that forces the configs. Is the contact a member of the group at the config file layer?
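
One way to check the config file layer is to scan the written object definitions for the group and the contact. The sketch below is illustrative only: it assumes standard Nagios `define contactgroup { ... }` and `define contact { ... }` blocks, assumes the object names match the aliases shown in the database output, and uses a made-up sample definition (including the `dba_oncall` member) rather than your real files. It does not follow nested groups (`contactgroup_members`) or templates.

```python
import re

# Matches "define <type> { ... }" blocks in Nagios object config text.
DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_objects(text):
    """Yield (object_type, directives) for each define block."""
    for obj_type, body in DEFINE_RE.findall(text):
        directives = {}
        for raw in body.splitlines():
            line = raw.split(";", 1)[0].strip()  # drop inline ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        yield obj_type, directives


def split_list(value):
    """Split a comma-separated directive value into clean names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_member(text, contact, group):
    """True if the group lists the contact, or the contact lists the group."""
    for obj_type, d in parse_objects(text):
        if obj_type == "contactgroup" and d.get("contactgroup_name") == group:
            if contact in split_list(d.get("members", "")):
                return True
        if obj_type == "contact" and d.get("contact_name") == contact:
            if group in split_list(d.get("contactgroups", "")):
                return True
    return False


# Hypothetical sample definition, not taken from the real configuration.
sample_config = """
define contactgroup {
    contactgroup_name   MSSQL_Disc_Usage_Notifiy
    alias               MSSQL_Disc_Usage_Notifiy
    members             dba_oncall,pagerduty_MSSQL_PROD
}
"""

print(is_member(sample_config, "pagerduty_MSSQL_PROD", "MSSQL_Disc_Usage_Notifiy"))
```

To use it against the real configuration, read the contents of the contact and contact group files that CCM writes out and pass that text to `is_member` in place of `sample_config`.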