Re: A simple question about simple notification

Posted: Mon Jul 23, 2018 7:29 am
by davide.bonicelli
Ok.
Do you have any other suggestions?

Re: A simple question about simple notification

Posted: Mon Jul 23, 2018 3:55 pm
by tgriep
The next thing to look at is the user's notification settings. If they are not enabled for Recoveries, that could cause the issue.

To check this, you will have to log in as the user in the XI GUI. Click on the user name in the top right of the window.
In the left menu, go to the Notification Preferences and Notification Methods menus, make sure the settings are correct, and save the settings if you make any changes.

The above settings affect all of the Service and Host email notification options for that user, not only Recoveries, but it would still be good to check them.

Also, can you run the following as root and post the output so we can see what processes are running on the server?

Code:

ps -ef --cols=300
Thanks
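
If the full listing is too long to read through, a filter like the one below can narrow it down to the processes that usually matter for notifications. This is only a rough sketch: the function name `filter_nagios_procs` and the pattern list are my own assumptions, not an official checklist, so please still post the full output.

```shell
# Sketch: keep only process lines that look related to Nagios and mail delivery.
# The pattern list is an assumption; extend it for your own setup.
filter_nagios_procs() {
    # Reads ps output on stdin, prints matching lines, drops the filter itself.
    grep -E 'nagios|ndo2db|npcd|postfix|qmgr|pickup|crond' | grep -v 'grep -E'
}

# Usage as root on the XI server:
#   ps -ef --cols=300 | filter_nagios_procs
```

The function reads from standard input, so it can also be run against a saved copy of the listing.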

Re: A simple question about simple notification

Posted: Wed Jul 25, 2018 9:11 am
by davide.bonicelli
The preferences look good to me:
Cattura.JPG
and these are the processes running:

Code:

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Jul13 ?        00:00:01 /sbin/init
root         2     0  0 Jul13 ?        00:00:00 [kthreadd]
root         3     2  0 Jul13 ?        00:00:49 [migration/0]
root         4     2  0 Jul13 ?        00:02:10 [ksoftirqd/0]
root         5     2  0 Jul13 ?        00:00:00 [stopper/0]
root         6     2  0 Jul13 ?        00:00:01 [watchdog/0]
root         7     2  0 Jul13 ?        00:00:49 [migration/1]
root         8     2  0 Jul13 ?        00:00:00 [stopper/1]
root         9     2  0 Jul13 ?        00:02:12 [ksoftirqd/1]
root        10     2  0 Jul13 ?        00:00:01 [watchdog/1]
root        11     2  0 Jul13 ?        00:00:47 [events/0]
root        12     2  0 Jul13 ?        00:01:08 [events/1]
root        13     2  0 Jul13 ?        00:00:00 [events/0]
root        14     2  0 Jul13 ?        00:00:00 [events/1]
root        15     2  0 Jul13 ?        00:00:00 [events_long/0]
root        16     2  0 Jul13 ?        00:00:00 [events_long/1]
root        17     2  0 Jul13 ?        00:00:00 [events_power_ef]
root        18     2  0 Jul13 ?        00:00:00 [events_power_ef]
root        19     2  0 Jul13 ?        00:00:00 [cgroup]
root        20     2  0 Jul13 ?        00:00:00 [khelper]
root        21     2  0 Jul13 ?        00:00:00 [netns]
root        22     2  0 Jul13 ?        00:00:00 [async/mgr]
root        23     2  0 Jul13 ?        00:00:00 [pm]
root        24     2  0 Jul13 ?        00:00:03 [sync_supers]
root        25     2  0 Jul13 ?        00:00:00 [bdi-default]
root        26     2  0 Jul13 ?        00:00:00 [kintegrityd/0]
root        27     2  0 Jul13 ?        00:00:00 [kintegrityd/1]
root        28     2  0 Jul13 ?        00:01:31 [kblockd/0]
root        29     2  0 Jul13 ?        00:01:26 [kblockd/1]
root        30     2  0 Jul13 ?        00:00:00 [kacpid]
root        31     2  0 Jul13 ?        00:00:00 [kacpi_notify]
root        32     2  0 Jul13 ?        00:00:00 [kacpi_hotplug]
root        33     2  0 Jul13 ?        00:00:00 [ata_aux]
root        34     2  0 Jul13 ?        00:00:00 [ata_sff/0]
root        35     2  0 Jul13 ?        00:00:00 [ata_sff/1]
root        36     2  0 Jul13 ?        00:00:00 [ksuspend_usbd]
root        37     2  0 Jul13 ?        00:00:00 [khubd]
root        38     2  0 Jul13 ?        00:00:00 [kseriod]
root        39     2  0 Jul13 ?        00:00:00 [md/0]
root        40     2  0 Jul13 ?        00:00:00 [md/1]
root        41     2  0 Jul13 ?        00:00:00 [md_misc/0]
root        42     2  0 Jul13 ?        00:00:00 [md_misc/1]
root        43     2  0 Jul13 ?        00:00:00 [linkwatch]
root        45     2  0 Jul13 ?        00:00:00 [khungtaskd]
root        46     2  0 Jul13 ?        00:04:41 [kswapd0]
root        47     2  0 Jul13 ?        00:00:00 [ksmd]
root        48     2  0 Jul13 ?        00:04:13 [khugepaged]
root        49     2  0 Jul13 ?        00:00:00 [aio/0]
root        50     2  0 Jul13 ?        00:00:00 [aio/1]
root        51     2  0 Jul13 ?        00:00:00 [crypto/0]
root        52     2  0 Jul13 ?        00:00:00 [crypto/1]
root        59     2  0 Jul13 ?        00:00:00 [kthrotld/0]
root        60     2  0 Jul13 ?        00:00:00 [kthrotld/1]
root        61     2  0 Jul13 ?        00:00:00 [pciehpd]
root        63     2  0 Jul13 ?        00:00:00 [kpsmoused]
root        64     2  0 Jul13 ?        00:00:00 [usbhid_resumer]
root        65     2  0 Jul13 ?        00:00:00 [deferwq]
root        98     2  0 Jul13 ?        00:00:00 [kdmremove]
root        99     2  0 Jul13 ?        00:00:00 [kstriped]
root       130     2  0 Jul13 ?        00:00:00 [ttm_swap]
root       252     2  0 Jul13 ?        00:00:21 [mpt_poll_0]
root       253     2  0 Jul13 ?        00:00:00 [mpt/0]
root       254     2  0 Jul13 ?        00:00:00 [scsi_eh_0]
root       259     2  0 Jul13 ?        00:00:00 [scsi_eh_1]
root       260     2  0 Jul13 ?        00:00:00 [scsi_eh_2]
root       385     2  0 Jul13 ?        00:00:00 [kdmflush]
root       387     2  0 Jul13 ?        00:00:00 [kdmflush]
root       405     2  0 Jul13 ?        00:02:15 [jbd2/dm-0-8]
root       406     2  0 Jul13 ?        00:00:00 [ext4-dio-unwrit]
root       487     1  0 Jul13 ?        00:00:00 /sbin/udevd -d
postgres   579  1850  0 07:22 ?        00:00:11 postgres: nagiosxi nagiosxi ::1(51994) idle
root       711     2  0 Jul13 ?        00:00:13 [vmmemctl]
root       828     2  0 Jul13 ?        00:00:00 [jbd2/sda1-8]
root       829     2  0 Jul13 ?        00:00:00 [ext4-dio-unwrit]
root       837     2  0 Jul13 ?        00:04:38 [flush-253:0]
root       867     2  0 Jul13 ?        00:00:26 [kauditd]
root      1491     1  0 Jul13 ?        00:01:22 auditd
root      1513     1  0 Jul13 ?        00:00:44 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
dbus      1542     1  0 Jul13 ?        00:00:00 dbus-daemon --system
root      1586     1  0 Jul13 ?        00:00:24 /usr/sbin/snmptrapd -Ln -p /var/run/snmptrapd.pid
root      1599     1  0 Jul13 ?        00:00:15 /usr/bin/perl /usr/sbin/snmptt --daemon
snmptt    1600  1599  0 Jul13 ?        00:00:32 /usr/bin/perl /usr/sbin/snmptt --daemon
root      1617     1  0 Jul13 ?        00:00:01 /usr/sbin/sshd
root      1628     1  0 Jul13 ?        00:00:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
ntp       1639     1  0 Jul13 ?        00:00:02 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
root      1676     1  0 Jul13 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql     1799  1676  0 Jul13 ?        02:45:00 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
postgres  1850     1  0 Jul13 ?        00:01:43 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
nagios    1866     1  0 Jul13 ?        00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
root      1945     1  0 Jul13 ?        00:00:10 /usr/libexec/postfix/master
postfix   1958  1945  0 Jul13 ?        00:00:04 qmgr -l -t fifo -u
496       1959     1  0 Jul13 ?        00:00:00 shellinaboxd -u shellinabox -g shellinabox --cert=/var/lib/shellinabox --port=7878 --background=/var/run/shellinaboxd.pid --disable-ssl-menu -s /:SSH --localhost-only --css white-on-black.css
496       1961  1959  0 Jul13 ?        00:00:00 shellinaboxd -u shellinabox -g shellinabox --cert=/var/lib/shellinabox --port=7878 --background=/var/run/shellinaboxd.pid --disable-ssl-menu -s /:SSH --localhost-only --css white-on-black.css
root      1971     1  0 Jul13 ?        00:00:36 /usr/sbin/httpd
root      1983     1  0 Jul13 ?        00:00:37 crond
nagios    1994     1  0 Jul13 ?        00:01:33 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
nagios    2008     1  0 Jul13 ?        00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
root      2248     1  0 Jul13 ?        00:14:23 /usr/sbin/vmtoolsd
root      2267     1  0 Jul13 tty1     00:00:00 /sbin/mingetty /dev/tty1
root      2269     1  0 Jul13 tty2     00:00:00 /sbin/mingetty /dev/tty2
root      2271     1  0 Jul13 tty3     00:00:00 /sbin/mingetty /dev/tty3
root      2273     1  0 Jul13 tty4     00:00:00 /sbin/mingetty /dev/tty4
root      2275     1  0 Jul13 tty5     00:00:00 /sbin/mingetty /dev/tty5
root      2277     1  0 Jul13 tty6     00:00:00 /sbin/mingetty /dev/tty6
root      2282   487  0 Jul13 ?        00:00:00 /sbin/udevd -d
root      2283   487  0 Jul13 ?        00:00:00 /sbin/udevd -d
postgres  2298  1850  0 Jul13 ?        00:00:29 postgres: logger process
postgres  2300  1850  0 Jul13 ?        00:03:25 postgres: writer process
postgres  2301  1850  0 Jul13 ?        00:02:31 postgres: wal writer process
postgres  2302  1850  0 Jul13 ?        00:00:36 postgres: autovacuum launcher process
postgres  2303  1850  0 Jul13 ?        00:02:17 postgres: stats collector process
apache    3446  1971  0 Jul24 ?        00:11:00 /usr/sbin/httpd
postgres  3569  1850  0 Jul24 ?        00:00:34 postgres: nagiosxi nagiosxi ::1(49374) idle
apache    4480  1971  0 Jul24 ?        00:11:21 /usr/sbin/httpd
apache    4574  1971  0 05:42 ?        00:04:41 /usr/sbin/httpd
postgres  4577  1850  0 Jul24 ?        00:00:33 postgres: nagiosxi nagiosxi ::1(49444) idle
postgres  5318  1850  0 05:43 ?        00:00:14 postgres: nagiosxi nagiosxi ::1(56394) idle
postfix  10537  1945  0 14:48 ?        00:00:00 pickup -l -t fifo -u
apache   14211  1971  0 04:58 ?        00:05:01 /usr/sbin/httpd
postgres 14442  1850  0 04:58 ?        00:00:16 postgres: nagiosxi nagiosxi ::1(45696) idle
root     14971  1983  0 16:09 ?        00:00:00 CROND
root     14972  1983  0 16:09 ?        00:00:00 CROND
root     14973  1983  0 16:09 ?        00:00:00 CROND
root     14974  1983  0 16:09 ?        00:00:00 CROND
root     14975  1983  0 16:09 ?        00:00:00 CROND
root     14976  1983  0 16:09 ?        00:00:00 CROND
nagios   14977 14975  0 16:09 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php >> /usr/local/nagiosxi/var/cmdsubsys.log 2>&1
nagios   14978 14971  0 16:09 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php >> /usr/local/nagiosxi/var/perfdataproc.log 2>&1
nagios   14979 14973  0 16:09 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/event_handler.php >> /usr/local/nagiosxi/var/event_handler.log 2>&1
nagios   14980 14972  0 16:09 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php >> /usr/local/nagiosxi/var/feedproc.log 2>&1
nagios   14983 14974  0 16:09 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php >> /usr/local/nagiosxi/var/eventman.log 2>&1
nagios   14984 14976  0 16:09 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php >> /usr/local/nagiosxi/var/sysstat.log 2>&1
nagios   14987 14978  0 16:09 ?        00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
nagios   14988 14980  0 16:09 ?        00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
nagios   14989 14983  0 16:09 ?        00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
nagios   14990 14977  0 16:09 ?        00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
nagios   14992 14979  0 16:09 ?        00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/event_handler.php
nagios   14999 14984  0 16:09 ?        00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
postgres 15073  1850  0 16:09 ?        00:00:00 postgres: nagiosxi nagiosxi ::1(37358) idle
postgres 15076  1850  0 16:09 ?        00:00:00 postgres: nagiosxi nagiosxi ::1(37360) idle
postgres 15078  1850  0 16:09 ?        00:00:00 postgres: nagiosxi nagiosxi ::1(37362) idle
postgres 15095  1850  0 16:09 ?        00:00:00 postgres: nagiosxi nagiosxi ::1(37364) idle
postgres 15164  1850  0 16:09 ?        00:00:00 postgres: nagiosxi nagiosxi ::1(37370) idle
postgres 15181  1850  0 16:09 ?        00:00:00 postgres: nagiosxi nagiosxi ::1(37374) idle
root     15367  1617  0 16:09 ?        00:00:00 sshd: root@pts/0
root     15604 15367  0 16:09 pts/0    00:00:00 -bash
nagios   16043 31811  0 16:09 ?        00:00:00 /usr/local/nagios/libexec/check_nt -H 88.149.179.123 -t 30 -s Ah124gen -p 12489 -v SERVICESTATE -l DNS -d SHOWALL
root     16044 15604  1 16:09 pts/0    00:00:00 ps -ef --cols=300
apache   17391  1971  0 08:27 ?        00:03:20 /usr/sbin/httpd
postgres 17446  1850  0 08:27 ?        00:00:10 postgres: nagiosxi nagiosxi ::1(39298) idle
apache   17708  1971  0 05:01 ?        00:04:59 /usr/sbin/httpd
apache   17746  1971  0 08:28 ?        00:03:20 /usr/sbin/httpd
apache   17804  1971  0 08:28 ?        00:03:19 /usr/sbin/httpd
postgres 17805  1850  0 08:28 ?        00:00:10 postgres: nagiosxi nagiosxi ::1(39470) idle
postgres 17973  1850  0 08:28 ?        00:00:10 postgres: nagiosxi nagiosxi ::1(39498) idle
postgres 18062  1850  0 05:02 ?        00:00:15 postgres: nagiosxi nagiosxi ::1(46650) idle
apache   18360  1971  0 08:28 ?        00:03:18 /usr/sbin/httpd
postgres 18431  1850  0 08:28 ?        00:00:10 postgres: nagiosxi nagiosxi ::1(39612) idle
apache   30140  1971  0 Jul24 ?        00:10:21 /usr/sbin/httpd
postgres 30152  1850  0 Jul24 ?        00:00:32 postgres: nagiosxi nagiosxi ::1(60444) idle
nagios   31809     1  0 Jul20 ?        00:17:33 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   31810 31809  0 Jul20 ?        00:02:26 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   31811 31809  0 Jul20 ?        00:02:27 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   31812 31809  0 Jul20 ?        00:02:26 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   31813 31809  0 Jul20 ?        00:02:28 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   31820  2008  0 Jul20 ?        00:01:37 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios   31821 31820  0 Jul20 ?        00:41:52 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios   31831 31809  0 Jul20 ?        00:00:24 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
apache   32559  1971  0 07:22 ?        00:03:52 /usr/sbin/httpd

Re: A simple question about simple notification

Posted: Wed Jul 25, 2018 1:23 pm
by tgriep
Those settings for the user account look good.
Can you log in to the XI GUI and go to the Home > Service details menu?
Select one of the services that is not sending the Recovery email and go to the Advanced tab.
Take a screen capture of that and add it to the forum post.

Then go to the Admin > Manage Email Settings menu and let us know which Mail Method is selected for the server.

Are the missing Notifications only happening for certain users or for all users?
Do the services that are sending Recovery notifications have the same contacts assigned as the failing services?
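
One way to answer the per-user question from the command line is to tally the notification entries in the Nagios log by contact and state. The sketch below assumes the usual Nagios Core log line layout (`[timestamp] SERVICE NOTIFICATION: contact;host;service;state;command;output`) and that recoveries are logged with the state `OK`; the function name `tally_notifications` is hypothetical, and the log path in the usage comment is the common default, so adjust it if your install differs.

```shell
# Sketch: count service notifications per contact and state.
# Assumes lines like:
#   [1532500000] SERVICE NOTIFICATION: contact;host;service;STATE;command;output
tally_notifications() {
    grep 'SERVICE NOTIFICATION:' "$1" |
        sed 's/^\[[0-9]*\] SERVICE NOTIFICATION: //' |
        awk -F';' '{ count[$1 ";" $4]++ }
                   END { for (k in count) print k ";" count[k] }' |
        sort
}

# Usage (path is an assumption based on the default install location):
#   tally_notifications /usr/local/nagios/var/nagios.log
```

Each output line has the form `contact;state;count`. A contact with CRITICAL or WARNING rows but no OK row would match the missing-recovery symptom.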

Re: A simple question about simple notification

Posted: Thu Jul 26, 2018 7:47 am
by davide.bonicelli
Here's the grab:
Cattura.JPG
The mail method is SMTP.
I just noticed that one user gets only critical, warning, flapping stopped and flapping start messages, but no recoveries!
wtf??
But the notification preferences are good!

Re: A simple question about simple notification

Posted: Thu Jul 26, 2018 10:54 am
by tgriep
The screen capture from your post shows that the service never sent a notification, as the Last Notification field says Never.

A couple of things: you can open a ticket so we can gather data from your server and check the settings.
This article has instructions for that.
https://support.nagios.com/kb/article/c ... r-769.html


Or you can post or PM me the information so we can check the settings.

Could you send in your Nagios XI System Profile so I can review it?
To get your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and upload it to the post or PM it to me.

Re: A simple question about simple notification

Posted: Fri Jul 27, 2018 4:21 am
by davide.bonicelli
Thanks Tom, you have a PM.

Re: A simple question about simple notification

Posted: Fri Jul 27, 2018 9:42 am
by tgriep
I took a look at the settings for the nagios_DustyRendering, Nagios XI Jobs service, and they look good to me.
But according to your screen capture, that service never sent a notification, so do you have another host and service that did not work?
Can you get the following file from the Nagios server and PM it to me?

Code:

/usr/local/nagios/var/status.dat
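
If you want a quick look yourself before sending the file, something like the sketch below lists the services whose `last_notification` value is 0, which as far as I know is how the file records that no notification has been sent yet. It assumes the usual `servicestatus { key=value ... }` block layout of status.dat; the function name `list_never_notified` is hypothetical.

```shell
# Sketch: print "host;service" for every servicestatus block in status.dat
# whose last_notification is 0 (assumed to mean no notification sent so far).
list_never_notified() {
    awk '
        /^servicestatus[ \t]*\{/ { in_block = 1; host = ""; svc = ""; last = ""; next }
        in_block && /^[ \t]*\}/ {
            if (last == "0") print host ";" svc
            in_block = 0
            next
        }
        in_block {
            line = $0
            sub(/^[ \t]+/, "", line)          # strip the leading indentation
            eq = index(line, "=")
            if (eq == 0) next
            key = substr(line, 1, eq - 1)
            val = substr(line, eq + 1)
            if (key == "host_name") host = val
            else if (key == "service_description") svc = val
            else if (key == "last_notification") last = val
        }
    ' "$1"
}

# Usage:
#   list_never_notified /usr/local/nagios/var/status.dat
```

Keep in mind that the file only reflects the current state, so a service that recovered a while ago may not show anything useful here.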

Re: A simple question about simple notification

Posted: Fri Jul 27, 2018 10:27 am
by davide.bonicelli
I have a lot of services with this problem. As I said, I never had it before the upgrade.
As you can see below, this service sent notifications regularly in the past:
Cattura.JPG

Re: A simple question about simple notification

Posted: Fri Jul 27, 2018 11:24 am
by tgriep
Do you have another host or service that failed today?
The profile and the status.dat file are snapshots in time and do not have any information on those older entries.