Acknowledgements missing

cmerchant
Posts: 546
Joined: Wed Sep 24, 2014 11:19 am

Re: Acknowledgements missing

Post by cmerchant »

I noticed that in your tag line, you have your database offloaded.

Could you run the following on the remote database server:

Code: Select all

tail -100 /var/log/mysqld.log
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Acknowledgements missing

Post by rajasegar »

cmerchant wrote:I noticed that in your tag line, you have your database offloaded.

Could you run the following on the remote database server:

Code: Select all

tail -100 /var/log/mysqld.log
I just posted that, just refer 2 posts back in this thread.
5 x Nagios 5.6.9 Enterprise Edition
RHEL 6 & 7
rrdcached & ramdisk optimisation
cmerchant
Posts: 546
Joined: Wed Sep 24, 2014 11:19 am

Re: Acknowledgements missing

Post by cmerchant »

rajasegar wrote:I just posted that, just refer 2 posts back in this thread.

Code: Select all

141027 17:46:47 [Note] Found 793650 of 793649 rows when repairing './nagios/nagios_statehistory'
150313 19:03:57 [Note] /usr/libexec/mysqld: Normal shutdown
There are no log entries between October 27, 2014 and March 13, 2015. Could you post the output of the following:

Code: Select all

grep db_host /usr/local/nagios/etc/ndo2db.cfg
tail -200 /usr/local/nagios/var/nagios.log
ipcs -q
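
The gap between the two mysqld.log entries quoted above is easier to spot when the log is summarised by day. A minimal sketch, assuming the older MySQL log format shown above, where each timestamped line starts with a six-digit YYMMDD date; the function name `mysqld_log_days` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count timestamped mysqld.log lines per day.
# Assumes timestamped lines start with a YYMMDD date, as in the excerpt above.
mysqld_log_days() {
    awk '$1 ~ /^[0-9][0-9][0-9][0-9][0-9][0-9]$/ { count[$1]++ }
         END { for (day in count) print day, count[day] }' "$1" | sort
}
```

Running `mysqld_log_days /var/log/mysqld.log` on the database server would print one line per day that has entries; days without entries simply do not appear, so a long silence shows up as a jump between consecutive dates.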
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Acknowledgements missing

Post by rajasegar »

cmerchant wrote:
I just posted that, just refer 2 posts back in this thread.

Code: Select all

141027 17:46:47 [Note] Found 793650 of 793649 rows when repairing './nagios/nagios_statehistory'
150313 19:03:57 [Note] /usr/libexec/mysqld: Normal shutdown
There are no log entries between October 27, 2014 and March 13, 2015. Could you post the output of the following:

Code: Select all

grep db_host /usr/local/nagios/etc/ndo2db.cfg
tail -200 /usr/local/nagios/var/nagios.log
ipcs -q

Code: Select all

[nagios@nagiosprodxi1 tmp]$ grep db_host /usr/local/nagios/etc/ndo2db.cfg
db_host=nagiosproddb1
[nagios@nagiosprodxi1 tmp]$
IP addresses are masked as example.com, and notification log entries have been removed.

Code: Select all

[nagios@nagiosprodxi1 tmp]$ tail -200 /usr/local/nagios/var/nagios.log |grep -v NOTIFICATION > tmp.log
[nagios@nagiosprodxi1 tmp]$ cat tmp.log | sed -r 's/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/example.com/g' >tmp1.log
[nagios@nagiosprodxi1 tmp]$ cat tmp1.log
[1427668736] HOST ALERT: mobilebanking02;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427668766] HOST ALERT: clksha-db02;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427668766] SERVICE ALERT: TCLIENTDEPSVR;MSSQL: Connection Time;WARNING;SOFT;1;WARNING - 1.01 seconds to connect as appsmonitor
[1427668767] HOST ALERT: mobilebanking02;DOWN;HARD;2;CRITICAL - example.com: rta nan, lost 100%
[1427668786] HOST ALERT: mobilebanking02;UP;HARD;1;OK - example.com: rta 8.738ms, lost 0%
[1427668807] SERVICE ALERT: BHQPDM000203;MSSQL: Connection Time;OK;SOFT;2;OK - 0.03 seconds to connect as appsmonitor
[1427668807] HOST ALERT: mobilebanking01;UP;SOFT;2;OK - example.com: rta 11.424ms, lost 0%
[1427668836] HOST ALERT: clksha-db02;UP;SOFT;2;OK - example.com: rta 8.017ms, lost 0%
[1427668866] SERVICE ALERT: GTW-DB01;Memory: Physical;WARNING;SOFT;2;WARNING: physical memory: Total: 16G - Used: 12G (75%) - Free: 3.96G (25%) > warning
[1427668896] SERVICE ALERT: SGR_BR_Bukittinggi;BW - Port 41 Virtual-Access1 Sec-1Mb;OK;HARD;3;OK - interface Virtual-Access1 usage is in:0.00% (0.00MBi/s) out:0.01% (0.00MBi/s)
[1427668987] SERVICE ALERT: BHQPDMESXIU09;VMH CPU Usage;WARNING;SOFT;1;ESX3 WARNING - cpu usage=70.75 %
[1427668987] SERVICE ALERT: Cambodia_UAT_SIT_DR;AS400 Disk Usage System Alt;CRITICAL;HARD;2;CRITICAL: Network error:java.net.SocketException: Connection reset
[1427669005] SERVICE ALERT: SQLSVR05;MSSQL: Connection Time;WARNING;SOFT;1;WARNING - 1.24 seconds to connect as appsmonitor
[1427669065] SERVICE ALERT: SQL-PP4;Memory: Physical;OK;SOFT;2;OK: physical memory: Total: 2G - Used: 1.01G (50%) - Free: 0.991G (50%)
[1427669066] SERVICE ALERT: TCLIENTDEPSVR;MSSQL: Connection Time;OK;SOFT;2;OK - 0.06 seconds to connect as appsmonitor
[1427669096] SERVICE ALERT: MIDAS2;CPU Usage;WARNING;HARD;3;WARNING: 30m: average load 80% > warning, 10m: average load 55%, 5m: average load 47%
[1427669096] SERVICE ALERT: BDWTWS1;CPU Load;WARNING;SOFT;2;WARNING - load average: 0.63, 0.71, 0.66
[1427669107] HOST ALERT: mobilebanking01;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427669176] SERVICE ALERT: CTCSUNI2;MSSQL: Connection Time;WARNING;SOFT;1;WARNING - 1.10 seconds to connect as appsmonitor
[1427669185] HOST ALERT: BHQPDM000131;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427669216] HOST ALERT: mobilebanking02;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427669246] SERVICE ALERT: BHQPDM000064;Port: 22;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
[1427669258] HOST ALERT: BHQPDM000064;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427669276] HOST ALERT: mobilebanking02;UP;SOFT;2;OK - example.com: rta 10.261ms, lost 0%
[1427669276] HOST FLAPPING ALERT: mobilebanking02;STARTED; Host appears to have started flapping (21.1% change > 20.0% threshold)
[1427669306] SERVICE ALERT: SQLSVR05;MSSQL: Connection Time;OK;SOFT;2;OK - 0.04 seconds to connect as appsmonitor
[1427669306] SERVICE ALERT: BHQPDMESXIU06;VMH CPU Usage;WARNING;SOFT;1;ESX3 WARNING - cpu usage=74.44 %
[1427669306] SERVICE FLAPPING ALERT: BHQPPT000004;Memory: Physical;STOPPED; Service appears to have stopped flapping (4.9% change < 5.0% threshold)
[1427669306] HOST ALERT: mobilebanking01;UP;SOFT;2;OK - example.com: rta 13.326ms, lost 0%
[1427669377] SERVICE ALERT: BHQPDM000363;Memory: Physical;WARNING;SOFT;1;WARNING: physical memory: Total: 4G - Used: 3.11G (77%) - Free: 910M (23%) > warning
[1427669377] SERVICE ALERT: BHQPDM000064;Disk: Drive All;UNKNOWN;SOFT;1;CHECK_NRPE: Socket timeout after 60 seconds.
[1427669396] HOST ALERT: BHQPDM000064;DOWN;HARD;2;CRITICAL - example.com: rta nan, lost 100%
[1427669425] SERVICE ALERT: esbwmbdr02;CPU Load;CRITICAL;SOFT;1;CRITICAL - load average: 1.68, 1.33, 0.89
[1427669436] HOST ALERT: BHQPDM000131;UP;SOFT;2;OK - example.com: rta 0.826ms, lost 0%
[1427669446] HOST ALERT: BHQPDM000003;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427669467] SERVICE ALERT: CTCSUNI2;MSSQL: Connection Time;OK;SOFT;2;OK - 0.05 seconds to connect as appsmonitor
[1427669467] SERVICE ALERT: GTW-DB01;Memory: Physical;WARNING;HARD;3;WARNING: physical memory: Total: 16G - Used: 12.1G (75%) - Free: 3.94G (25%) > warning
[1427669516] SERVICE ALERT: esbmqdb02;AUD Channel Count;CRITICAL;SOFT;1;Channel count is OK
[1427669516] SERVICE ALERT: BHQPDM000064;Memory: Physical;UNKNOWN;SOFT;1;CHECK_NRPE: Socket timeout after 60 seconds.
[1427669526] SERVICE ALERT: CMTWEB02-INT;Log: VREMIT_CreateTransHistory;OK;HARD;1;OK - no errors or warnings
[1427669546] SERVICE ALERT: BHQPDM000064;Port: 22;CRITICAL;HARD;2;CRITICAL - Socket timeout after 10 seconds
[1427669576] SERVICE ALERT: Cambodia_UAT_SIT_DR;AS400 Disk Usage System Alt;OK;HARD;2;OK - *SYSTEM: 69.11%
[1427669586] SERVICE ALERT: BHQPDMESXIU09;VMH CPU Usage;OK;SOFT;2;ESX3 OK - cpu usage=55.75 %
[1427669616] SERVICE ALERT: CMTWEB02-INT;Log: VREMIT_ReceiveMoneyChecker;OK;HARD;1;OK - no errors or warnings
[1427669626] HOST ALERT: mobilebanking02;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427669676] SERVICE ALERT: BHQPDM000064;Disk: Drive All;UNKNOWN;HARD;2;CHECK_NRPE: Socket timeout after 60 seconds.
[1427669676] SERVICE ALERT: CMTWEB02-INT;Log: VREMIT_ActiveX;OK;HARD;1;OK - no errors or warnings
[1427669676] HOST ALERT: mobilebanking02;UP;SOFT;2;OK - example.com: rta 9.358ms, lost 0%
[1427669685] SERVICE ALERT: BDWTWS1;CPU Load;OK;SOFT;3;OK - load average: 0.65, 0.65, 0.64
[1427669686] HOST ALERT: BHQPDM000003;UP;SOFT;2;OK - example.com: rta 783.545ms, lost 0%
[1427669696] SERVICE ALERT: MIDAS2;CPU Usage;OK;HARD;3;OK: 30m: average load 64%, 10m: average load 48%, 5m: average load 47%
[1427669696] SERVICE FLAPPING ALERT: MIDAS2;CPU Usage;STARTED; Service appears to have started flapping (21.4% change >= 20.0% threshold)
[1427669696] HOST ALERT: mobilebanking01;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427669715] SERVICE ALERT: BHQPDM000064;CPU Usage;UNKNOWN;SOFT;1;CHECK_NRPE: Socket timeout after 60 seconds.
[1427669716] HOST ALERT: mobilebanking01;UP;SOFT;2;OK - example.com: rta 8.963ms, lost 0%
[1427669725] SERVICE ALERT: CMTWEB02-INT;Log: VREMIT_FailedBeneficiary;OK;HARD;1;OK - no errors or warnings
[1427669738] SERVICE ALERT: CMTWEB02-INT;Log: VREMIT_PrimaryKeyViolation;OK;HARD;1;OK - no errors or warnings
[1427669756] SERVICE ALERT: CMTWEB02-INT;Log: VREMIT_RootCommTransAborted;OK;HARD;1;OK - no errors or warnings
[1427669776] SERVICE ALERT: BHQPDM000064;Memory: Pagefile;UNKNOWN;SOFT;1;CHECK_NRPE: Socket timeout after 60 seconds.
[1427669816] SERVICE ALERT: esbmqdb02;AUD Channel Count;OK;SOFT;2;Channel count is OK
[1427669816] HOST ALERT: mobilebanking02;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427669828] Auto-save of retention data completed successfully.
[1427669876] SERVICE ALERT: Cambodia_UAT_SIT_DR;AS400 Disk Usage System Alt;CRITICAL;SOFT;1;CRITICAL: Network error:java.net.SocketException: Connection reset
[1427669888] HOST ALERT: mobilebanking02;UP;SOFT;2;WARNING - example.com: rta 7.766ms, lost 80%
[1427669906] SERVICE ALERT: BHQPDMESXIU06;VMH CPU Usage;OK;SOFT;2;ESX3 OK - cpu usage=68.54 %
[1427669917] HOST ALERT: mobilebanking02;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427669947] HOST ALERT: mobilebanking02;UP;SOFT;2;OK - example.com: rta 8.094ms, lost 0%
[1427669976] SERVICE ALERT: BHQPDM000363;Memory: Physical;OK;SOFT;2;OK: physical memory: Total: 4G - Used: 2.62G (65%) - Free: 1.38G (35%)
[1427669976] SERVICE FLAPPING ALERT: PRODBDS027;Disk: Drive All;STOPPED; Service appears to have stopped flapping (3.8% change < 5.0% threshold)
[1427669986] SERVICE ALERT: PDMSMTP02;CPU Load;WARNING;SOFT;1;WARNING - load average: 0.76, 0.42, 0.36
[1427669996] HOST ALERT: BHQPDM000003;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427670016] SERVICE ALERT: BHQPDM000003;Memory: Physical;UNKNOWN;SOFT;1;CHECK_NRPE: Socket timeout after 60 seconds.
[1427670028] SERVICE ALERT: BHQPDM000003;Disk: Drive All;UNKNOWN;SOFT;1;CHECK_NRPE: Socket timeout after 60 seconds.
[1427670028] SERVICE ALERT: esbwmbdr02;CPU Load;CRITICAL;SOFT;2;CRITICAL - load average: 1.12, 1.32, 1.13
[1427670028] HOST ALERT: mobilebanking01;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427670029] HOST ALERT: BHQPDM000003;UP;SOFT;2;OK - example.com: rta 7.212ms, lost 0%
[1427670029] HOST FLAPPING ALERT: BHQPDM000003;STARTED; Host appears to have started flapping (23.0% change > 20.0% threshold)
[1427670067] SERVICE ALERT: BHQPDM000003;CPU Usage;UNKNOWN;SOFT;1;CHECK_NRPE: Socket timeout after 60 seconds.
[1427670086] HOST ALERT: RCSDB01;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427670106] SERVICE ALERT: BHQPDM000064;Memory: Physical;UNKNOWN;SOFT;2;CHECK_NRPE: Socket timeout after 60 seconds.
[1427670116] HOST ALERT: BHQPDM000130;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427670116] HOST ALERT: RCSDB01;DOWN;HARD;2;CRITICAL - example.com: rta nan, lost 100%
[1427670127] HOST ALERT: BHQPDM000319;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427670127] HOST ALERT: mobilebanking01;UP;SOFT;2;OK - example.com: rta 8.461ms, lost 0%
[1427670136] SERVICE ALERT: BCSDB01;Tablespace: EBDECS_STATIC_IDX;CRITICAL;SOFT;1;CRITICAL - cannot connect to example.com:1521/BCSPROD. timeout alarm
[1427670136] SERVICE ALERT: BCSDB01;CPU Stats;UNKNOWN;SOFT;1;CHECK_NRPE: Socket timeout after 60 seconds.
[1427670136] SERVICE ALERT: BHQPDM000345;MySQL: br1m Connection Time;WARNING;SOFT;1;WARNING - 2.36 seconds to connect as root
[1427670146] SERVICE ALERT: BHQPDM000335;Oracle: Connection Time;WARNING;SOFT;1;WARNING - 1.09 seconds to connect as APPSMONITOR
[1427670146] SERVICE ALERT: BHQPDM000345;MySQL: test Connection Time;WARNING;SOFT;1;WARNING - 1.05 seconds to connect as root
[1427670146] HOST ALERT: EPMSVR;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427670146] HOST ALERT: BHQPDM000130;DOWN;HARD;2;CRITICAL - example.com: rta nan, lost 100%
[1427670158] SERVICE ALERT: BHQPDM000200;Disk: Drive All;CRITICAL;SOFT;1;CHECK_NRPE: Error - Could not complete SSL handshake.
[1427670158] SERVICE ALERT: BHQKPK200019;MSSQL: Connection Time;WARNING;SOFT;1;WARNING - 1.28 seconds to connect as appsmonitor
[1427670158] SERVICE ALERT: RCSDB01;Tablespace: EBDOCGEN_STATIC;CRITICAL;SOFT;1;CRITICAL - cannot connect to example.com:1521/RCSPROD. timeout alarm
[1427670158] SERVICE ALERT: BCSDB01;Tablespace: EBBATCH_STATIC_IDX;CRITICAL;SOFT;1;CRITICAL - cannot connect to example.com:1521/BCSPROD. timeout alarm
[1427670158] SERVICE ALERT: BCSDB01;Tablespace: EBCOMMON_LOB;CRITICAL;SOFT;1;CRITICAL - cannot connect to example.com:1521/BCSPROD. timeout alarm
[1427670158] SERVICE ALERT: BCBSVR1;MSSQL: Connection Time;CRITICAL;SOFT;1;CRITICAL - 27.03 seconds to connect as appsmonitor
[1427670158] SERVICE ALERT: NEWWESTERNUNION;MSSQL: Connection Time;CRITICAL;SOFT;1;CRITICAL - 14.32 seconds to connect as appsmonitor
[1427670158] HOST ALERT: BHQPDM000345;DOWN;SOFT;1;CRITICAL - example.com: rta nan, lost 100%
[1427670166] SERVICE ALERT: BHQPDM000308;Memory: Pagefile;UNKNOWN;SOFT;1;CHECK_NRPE: Socket timeout after 60 seconds.
[1427670166] SERVICE ALERT: BHQPDM000255;Oracle: RCSRPT Session Usage;CRITICAL;SOFT;1;CRITICAL - cannot connect to example.com:1521/RCSRPT. timeout alarm
[1427670166] HOST ALERT: EPMSVR;UP;SOFT;2;OK - example.com: rta 50.699ms, lost 0%
[1427670177] SERVICE ALERT: RCSDB01;Tablespace: EBBATCH_TRANS;CRITICAL;SOFT;1;CRITICAL - cannot connect to example.com:1521/RCSPROD. timeout alarm
[1427670177] SERVICE ALERT: RCSDB01;Tablespace: EBMCIF_TRANS_IDX;CRITICAL;SOFT;1;CRITICAL - cannot connect to example.com:1521/RCSPROD. timeout alarm
[1427670177] SERVICE ALERT: RCSDB01;Tablespace: EBCOMMON_STATIC_IDX;CRITICAL;SOFT;1;CRITICAL - cannot connect to example.com:1521/RCSPROD. timeout alarm
[1427670177] SERVICE ALERT: MBCSVRD011;Memory: Physical;WARNING;SOFT;1;WARNING: physical memory: Total: 2G - Used: 1.4G (70%) - Free: 613M (30%) > warning
[1427670177] SERVICE ALERT: RCSDB01;Tablespace: EBDCES_STATIC;CRITICAL;SOFT;1;CRITICAL - cannot connect to example.com:1521/RCSPROD. timeout alarm
[1427670177] SERVICE ALERT: BHQPDM000353;Database Online DEFAULT: master;CRITICAL;SOFT;1;CRITICAL - connection could not be established within 60 seconds
[1427670177] SERVICE ALERT: Cambodia_UAT_SIT_DR;AS400 Disk Usage System Alt;CRITICAL;HARD;2;CRITICAL: Network error:java.net.SocketException: Connection reset
[1427670177] SERVICE ALERT: BHQPDM000319;Disk: Drive All x Cluster P Q;UNKNOWN;SOFT;1;CHECK_NRPE: Socket timeout after 60 seconds.
[1427670177] HOST ALERT: RCSDB01;UP;HARD;1;OK - example.com: rta 1.349ms, lost 0%
[1427670186] HOST ALERT: BHQPDM000319;UP;SOFT;2;OK - example.com: rta 0.806ms, lost 0%
[1427670186] HOST ALERT: BHQPDM000130;UP;HARD;1;OK - example.com: rta 0.894ms, lost 0%
[1427670196] SERVICE ALERT: BDWDSS1;CPU Load;CRITICAL;SOFT;1;CRITICAL - load average: 1.68, 1.21, 0.77
[1427670206] HOST ALERT: BHQPDM000345;UP;SOFT;2;OK - example.com: rta 13.796ms, lost 0%
[1427670220] SERVICE ALERT: BHQPDM000335;Disk: Drive All;OK;HARD;2;OK: All drives within bounds.
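
The filtering and masking steps above can also be combined into a single pipeline with no temporary files. A minimal sketch; the function name `sanitize_nagios_log` is hypothetical, and the pattern only covers dotted IPv4 addresses:

```shell
#!/bin/sh
# Hypothetical helper: print the last N lines of a Nagios log with
# NOTIFICATION entries dropped and IPv4 addresses replaced by a placeholder.
sanitize_nagios_log() {
    log_file=$1
    lines=${2:-200}
    tail -n "$lines" "$log_file" \
        | grep -v 'NOTIFICATION' \
        | sed 's/[0-9]\{1,3\}\(\.[0-9]\{1,3\}\)\{3\}/example.com/g'
}
```

For example, `sanitize_nagios_log /usr/local/nagios/var/nagios.log 200` would produce output comparable to the listing above. Values with a single dot, such as `1.01 seconds`, are left alone because the pattern requires four dot-separated groups.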
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Acknowledgements missing

Post by abrist »

You mentioned earlier that you are using a ramdisk. Was the retention.dat path in nagios.cfg changed to point at the ramdisk? Please check with:

Code: Select all

grep retention /usr/local/nagios/etc/nagios.cfg
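
A plain `grep retention` also matches commented-out lines. A slightly narrower sketch, assuming the standard Nagios Core directive names `retain_state_information`, `state_retention_file` and `retention_update_interval`; the function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers for inspecting state retention settings.
# The default config path is the one used in this thread.

# Print only the active (uncommented) retention directives.
retention_settings() {
    grep -E '^[[:space:]]*(retain_state_information|state_retention_file|retention_update_interval)=' \
        "${1:-/usr/local/nagios/etc/nagios.cfg}"
}

# Print the configured retention file path; if the directive appears
# more than once, this helper prints the last occurrence.
retention_file_path() {
    sed -n 's/^[[:space:]]*state_retention_file=//p' \
        "${1:-/usr/local/nagios/etc/nagios.cfg}" | tail -n 1
}
```

Something like `ls -l "$(retention_file_path)"` would then show where the file lives and when it was last written.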
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Acknowledgements missing

Post by rajasegar »

abrist wrote:You mentioned earlier that you are using a ramdisk. Was the retention.dat path in nagios.cfg changed to point at the ramdisk? Please check with:

Code: Select all

grep retention /usr/local/nagios/etc/nagios.cfg
No. retention.dat was still in /usr/local/nagios/var.
Only status.dat and objects.cache were on the ramdisk.

We disabled the ramdisk to troubleshoot the Gearman scheduling issue.
Currently it is not being used.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Acknowledgements missing

Post by abrist »

Do new acknowledgements/comments show in XI?
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Acknowledgements missing

Post by rajasegar »

abrist wrote:Do new acknowledgements/comments show in XI?
Yes. It is working fine now.
This is the second time this has happened. Most of the alerts went back to the unhandled state.
The monitoring team were not exactly pleased to go through all the alerts again.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Acknowledgements missing

Post by abrist »

1. I would suggest re-evaluating why so many alerts need acks. Have you considered disabling those checks or raising their thresholds? If you don't want them to alert, just remove the alert options or use "null".
2. The only way to lose the acks and comments is by removing or losing retention.dat. Check with your teams, as we do occasionally suggest removing retention.dat on these forums. In your case, that should pretty much never be done without understanding the consequences, given your reliance on acks for silencing alerts.
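
Whether the acknowledgements and comments are still present in the retention file can be checked directly. A minimal sketch, assuming the retention.dat layout used by Nagios Core, where acknowledged objects carry `problem_has_been_acknowledged=1` and comments are stored in `hostcomment` and `servicecomment` blocks; the function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers: count acknowledgements and comments in a retention file.
# The field and block names are assumptions about the retention.dat format.

# Count host/service entries flagged as acknowledged.
count_acks() {
    # grep -c prints 0 (and returns non-zero) when nothing matches.
    grep -c '^[[:space:]]*problem_has_been_acknowledged=1' "$1"
}

# Count stored host and service comment blocks.
count_comments() {
    grep -c -E '^[[:space:]]*(hostcomment|servicecomment)[[:space:]]*\{' "$1"
}
```

Comparing `count_acks /usr/local/nagios/var/retention.dat` before and after a restart of the nagios service could help narrow down the point at which the acks disappear.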
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Acknowledgements missing

Post by rajasegar »

abrist wrote:1. I would suggest re-evaluating why so many alerts need acks. Have you considered disabling those checks or raising their thresholds? If you don't want them to alert, just remove the alert options or use "null".
2. The only way to lose the acks and comments is by removing or losing retention.dat. Check with your teams, as we do occasionally suggest removing retention.dat on these forums. In your case, that should pretty much never be done without understanding the consequences, given your reliance on acks for silencing alerts.
We get people to acknowledge so that we know someone looked into it.
As mentioned before, nobody removed retention.dat. There must be some other reason.