Logfile monitoring for rotated logs

Post by **skattam** » Wed May 02, 2018 6:43 am

Hi,

We are trying to monitor a particular pattern 24/7 in a folder containing logs that are rotated after every 10 MB of data. These are execution logs of batch jobs from a WebSphere system. We want to catch the string "Exception caught" wherever it appears in the log files.

So we are using the check_log3.pl script: we pass the log folder with the -l option and the filename pattern POSync.log.* with the -m option, but the results are not accurate.

For one job, the pattern was written to POSync.log itself, but it still wasn’t recognized.

Here is an example.

iptposnc 2018-04-27 11:31:06,832 [main] INFO org.springframework.beans.factory.xml.XmlBeanDefinitionReader - Loading XML bean definitions from class path resource [org/springframework/jdbc/support/sql-error-codes.xml]
iptposnc 2018-04-27 11:31:06,931 [main] INFO org.springframework.jdbc.support.SQLErrorCodesFactory - SQLErrorCodes loaded: [DB2, Derby, H2, HSQL, Informix, MS-SQL, MySQL, Oracle, PostgreSQL, Sybase, Hana]
iptposnc 2018-04-27 11:31:06,948 [main] INFO PaymentHostDao.class - Exception caught while getting PO data T7756_IPT_PO tablePreparedStatementCallback; SQL [UPDATE LOWES.T7756_IPT_PO set RR_NBR = ? , FNL_DTN_LCT_NBR = ?, T7624_PO_STS_CD = CASE WHEN (T7624_PO_STS_CD > ?) THEN T7624_PO_STS_CD ELSE ? END , E213_PO_STS_CD = CASE WHEN (E213_PO_STS_CD > ?) THEN E213_PO_STS_CD ELSE ? END , CDK_FCY_LCN_NBR = ? , SHP_DT = ?, ARV_DT = ?, INI_RCP_DT = CASE WHEN (INI_RCP_DT IS NOT NULL) THEN INI_RCP_DT ELSE ? END , SHP_FRO_VBU_NBR = ?, CDN_VBU_NBR = ?, BYR_USE_NME = ? , MER_CST_AMT = ?, TOT_PO_ORD_QTY = ?, TOT_GRS_WGT_MSR = ?, TOT_GRS_WGT_UOM_TXT = 'LBS', TOT_PO_VLM_MSR = ?, TOT_PO_VLM_UOM_TXT = 'CFT', TL_NBR = ?, FIN_IMT_NBR = ?, T4830_CNR_ORD_SIZ_CD = ?, T4830_CNR_SHP_SIZ_CD = ?, T5104_LAD_PNT_ECG_LCT_CD = ?, T5104_PNT_DCH_ECG_LCT_CD = ?, T5104_PNT_OF_RSB_LCT_CD = ?, PO_SNC_IDC = 'N',PO_LST_SNC_DM = current_timestamp, UPD_DM = current_timestamp,UPD_ID = ? WHERE T7756_PO_ID = ? AND (PO_SNC_IDC = 'Y' OR PO_SNC_IDC = 'H' OR PO_SNC_IDC = 'A' ) ]; [jcc][t4][102][10040][4.18.60] Batch failure. The batch was submitted, but at least one exception occurred on an individual member of the batch.
Use getNextException() to retrieve the exceptions for specific batched elements. ERRORCODE=-4229, SQLSTATE=null; nested exception is com.ibm.db2.jcc.am.BatchUpdateException: [jcc][t4][102][10040][4.18.60] Batch failure. The batch was submitted, but at least one exception occurred on an individual member of the batch.
Use getNextException() to retrieve the exceptions for specific batched elements. ERRORCODE=-4229, SQLSTATE=null

Logs:
-rw-r--r-- 1 iptrbolt staff 1048658 Feb 20 09:59 POSync.log.10
-rw-r--r-- 1 iptrbolt staff 1048681 Feb 21 14:14 POSync.log.9
-rw-r--r-- 1 iptrbolt staff 1048755 Feb 21 14:14 POSync.log.8
-rw-r--r-- 1 iptrbolt staff 1048711 Feb 21 14:14 POSync.log.7
-rw-r--r-- 1 iptrbolt staff 1048934 Feb 21 14:14 POSync.log.6
-rw-r--r-- 1 iptrbolt staff 1048631 Apr 27 10:26 POSync.log.5
-rw-r--r-- 1 iptrbolt staff 1048958 Apr 27 11:19 POSync.log.4
-rw-r--r-- 1 iptrbolt staff 1048671 Apr 27 11:19 POSync.log.3
-rw-r--r-- 1 iptrbolt staff 1048752 Apr 27 11:19 POSync.log.2
-rw-r--r-- 1 iptrbolt staff 1049103 Apr 27 11:19 POSync.log.1
-rw-r--r-- 1 iptrbolt staff 666687 Apr 27 12:17 POSync.log

Location:
/usr/iptbatch/PaymentsPOSyncBatch/log

Command:

"/home/batman/nagios/check_log3.pl -l '/usr/iptbatch/PaymentsPOSyncBatch/log/' -m 'POSync.log.*' -p 'Exception caught' -p 'exception caught' -c 1"

Please let us know how we can achieve this.

Let me know if you need any clarification.

Post by **mcapra** » Wed May 02, 2018 9:20 am

Just a hunch (I could be totally wrong), but check_log3 may be persisting a record that it has already checked those files, since their names stay the same across rotations and they are all roughly the same length. By default, check_log3 stores a seek file in /tmp that tracks which files, and which lines in those files, it has already checked. You might take a look at that seek file to confirm or rule out this hunch. I haven't spent any time looking at how the seek file works.


-s, --seekfile=
    The temporary file to store the seek position of the last scan.  If not
    specified, it will be automatically generated in /tmp, based on the
    log file's base name.  If this is a directory, the seek file will be auto-
    generated there instead of in /tmp.
    If you specify the system's null device (/dev/null), the entire log file
    will be read every time.
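
To make the hunch concrete, here is a minimal Python sketch of how a seek-position file can miss lines after a rotation. It is not check_log3's actual implementation; `read_new_lines`, `demo`, and the `POSync.seek` file name are made up for illustration. The assumption is simply that the checker saves a byte offset per file name and resumes from it on the next run.

```python
import os
import tempfile


def read_new_lines(log_path, seek_path):
    """Return lines found after the byte offset saved in seek_path."""
    offset = 0
    if os.path.exists(seek_path):
        with open(seek_path) as f:
            offset = int(f.read().strip() or 0)

    # A file shorter than the saved offset was probably rotated: start over.
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
        new_offset = f.tell()

    with open(seek_path, "w") as f:
        f.write(str(new_offset))
    return data.decode("utf-8", errors="replace").splitlines()


def demo():
    """Scan a file, replace it with a longer one of the same name, scan again."""
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "POSync.log.1")
        seek = os.path.join(d, "POSync.seek")  # hypothetical seek file name

        with open(log, "w") as f:
            f.write("INFO first run\n" * 3)
        first = read_new_lines(log, seek)

        # Simulated rotation: same name, different and longer content.
        with open(log, "w") as f:
            f.write("Exception caught\n" + "INFO later line\n" * 5)
        second = read_new_lines(log, seek)
        return first, second


if __name__ == "__main__":
    first, second = demo()
    print(first)
    print(second)
```

In the second scan the saved offset lands in the middle of the replacement file, so the "Exception caught" line at the top is never read. The shrink check only helps when the new file is shorter than the old one, which is not the case when every rotated file is about the same size.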

If my hunch is right, you might alter your logrotate rule for these logs to append the date or timestamp rather than a simple series number and see if that simple change has the desired effect.
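
Separately, it may be worth checking whether the -m pattern matches the live file at all. The sketch below assumes -m is treated as a shell-style glob (the script's help output should confirm this) and uses Python's `fnmatch` only to imitate that matching; the file names come from the directory listing in the first post, and `matching` is a helper made up for this example.

```python
from fnmatch import fnmatchcase

# File names taken from the directory listing earlier in the thread.
NAMES = ["POSync.log", "POSync.log.1", "POSync.log.10"]


def matching(pattern, names=NAMES):
    """Return the names that match a case-sensitive shell-style glob."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    print(matching("POSync.log.*"))  # ['POSync.log.1', 'POSync.log.10']
    print(matching("POSync.log*"))   # ['POSync.log', 'POSync.log.1', 'POSync.log.10']
    print(matching("POsync.log*"))   # []
```

Under that assumption, 'POSync.log.*' requires a literal dot after "log", so the current POSync.log is skipped, which would fit the case where the pattern appeared in POSync.log but was not reported. The match is also case-sensitive, so POsync and POSync are different names.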

Also, a plug for Nagios Log Server which is superior to check_log3 in many ways.

Post by **kyang** » Wed May 02, 2018 4:09 pm

Thanks @mcapra!

Nagios Support Forum
