File check assistance
Posted: Thu Aug 27, 2015 7:40 am
by jkinning
I am having some difficulties in getting these 2 requests figured out.
1.) Check that files pamalm.txt, pamhold.txt, and pamsec.txt are available in directory e:\eas\server\batch\import\fwia\ on specific servers at 11:30 pm M – F. Send notification to Support only once.
I was thinking of using something like this:
Code:
./check_nrpe -H eas1t -t 30 -c CheckFiles -a path="e:\\eas\server\batch\import\fwia\*.txt" max-dir-depth=0 MinCrit=0
but I wasn't sure how to create the check so that it only runs at 11:30pm. Is it better to just always check and then send a notification at 11:30pm if no files are present? The directory is normally empty, but another job on a different server drops these 3 text files on this server after it completes. Today this is a manual task where people log in to see if these files exist. We have Nagios, so I am trying to get Nagios to eliminate the manual checking. Makes sense, right?
2.) Check to make sure file e:\eas\server\log\alloc.log is not open for more than 5 minutes on server. Once that is tested they will up the time to around 2 hours.
I looked around for a "file open" plugin but didn't find anything that looked to fit the bill. I didn't know if check_nrpe had anything that could help, or if anyone else is doing something similar to what I am trying to accomplish. I am trying to make sure the application didn't hang. The way I was thinking of doing it was to watch the log file. When the application is processing the files it writes to this log file. This is a long process but shouldn't take longer than 2 hours. I've heard it has taken a little over an hour to process everything. So I would like to see whether the log file is in use, have the service check initially use a 5 minute threshold, and then change that to 120 minutes or a little less. If the file is still in use or being written to at that point, send out the notification.
Re: File check assistance
Posted: Thu Aug 27, 2015 10:47 am
by jdalrymple
Part 1) Scheduled recurring downtime?
Part 2) The only 'lsof' I'm aware of for Windows is Handle (from Sysinternals). Maybe write a PowerShell script that matches the string "No matching handles found."?
Code:
D:\Users\jdalrymple\Downloads\Handle>Handle.exe c:\users\jdalrymple\NTUSER.DAT
Handle v4.0
Copyright (C) 1997-2014 Mark Russinovich
Sysinternals - www.sysinternals.com
System pid: 4 type: File 304: C:\Users\jdalrymple\NTUSER.DAT{77a2c7ec-26f0-11e5-80da-e41d2d741090}.TxR.blf
System pid: 4 type: File 524: C:\Users\jdalrymple\NTUSER.DAT
System pid: 4 type: File 590: C:\Users\jdalrymple\NTUSER.DAT{77a2c7ed-26f0-11e5-80da-e41d2d741090}.TM.blf
System pid: 4 type: File 11B4: C:\Users\jdalrymple\ntuser.dat.LOG2
System pid: 4 type: File 1498: C:\Users\jdalrymple\NTUSER.DAT{77a2c7ec-26f0-11e5-80da-e41d2d741090}.TxR.0.regtrans-ms
System pid: 4 type: File 1600: C:\Users\jdalrymple\NTUSER.DAT{77a2c7ed-26f0-11e5-80da-e41d2d741090}.TMContainer00000000000000000002.regtrans-ms
System pid: 4 type: File 164C: C:\Users\jdalrymple\NTUSER.DAT{77a2c7ec-26f0-11e5-80da-e41d2d741090}.TxR.1.regtrans-ms
System pid: 4 type: File 1688: C:\Users\jdalrymple\NTUSER.DAT{77a2c7ec-26f0-11e5-80da-e41d2d741090}.TxR.2.regtrans-ms
System pid: 4 type: File 16C0: C:\Users\jdalrymple\ntuser.dat.LOG1
System pid: 4 type: File 1788: C:\Users\jdalrymple\NTUSER.DAT{77a2c7ed-26f0-11e5-80da-e41d2d741090}.TMContainer00000000000000000001.regtrans-ms
D:\Users\jdalrymple\Downloads\Handle>Handle.exe "c:\users\jdalrymple\desktop\Clipboard01.jpg"
Handle v4.0
Copyright (C) 1997-2014 Mark Russinovich
Sysinternals - www.sysinternals.com
No matching handles found.
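The string-matching idea above could be sketched roughly as follows. This is only an illustration in Python rather than PowerShell, and it assumes the text printed by Handle.exe has already been captured by some wrapper. The function names and the choice to map "file is open" to CRITICAL are hypothetical, not part of any existing plugin.

```python
from typing import Tuple

# Marker line printed by Handle.exe when nothing holds the file open
# (taken from the sample output above).
NO_HANDLES_MARKER = "No matching handles found."

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def file_is_open(handle_output: str) -> bool:
    """Return True if the captured Handle.exe output lacks the 'no handles' marker."""
    return NO_HANDLES_MARKER not in handle_output


def to_nagios(handle_output: str, path: str, open_state: int = CRITICAL) -> Tuple[int, str]:
    """Map captured Handle.exe output to a (return code, message) pair.

    open_state is an assumption: pick WARNING or CRITICAL depending on
    how an open file should be reported.
    """
    if not handle_output.strip():
        # No output at all usually means the tool did not run.
        return UNKNOWN, "UNKNOWN - no output from Handle.exe for " + path
    if file_is_open(handle_output):
        return open_state, "OPEN - " + path + " has at least one open handle"
    return OK, "OK - " + path + " is not open"
```

Note that this only answers "is the file open right now". Deciding that it has been open for more than 5 minutes (or 2 hours) would need extra state kept between runs, for example the time the file was first seen open.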
Re: File check assistance
Posted: Thu Aug 27, 2015 11:51 am
by jkinning
1.) I just received some additional information on this one. These 3 files need to exist, and they are wondering if Nagios can check for them and send out a notification if they are not in the directory by 11:30pm. I was using *.txt, but there will be several .txt files in the directory. These 3 are the critical ones, and if they are not there then notifications need to be sent.
Does that help?
Re: File check assistance
Posted: Thu Aug 27, 2015 12:16 pm
by jdalrymple
If it's "around 11:30" I'd still run with a scheduled downtime.
If it's "exactly 11:30" you'll likely have to do a passive check.
This:
https://exchange.nagios.org/directory/P ... nt/details
And this:
http://docs.nsclient.org/tutorial/nagios/nsca.html
Incidentally, if it was Linux I'd use this instead of a full blown agent:
https://exchange.nagios.org/directory/A ... er/details
I've never seen something similar for Windows.
Re: File check assistance
Posted: Fri Aug 28, 2015 12:52 pm
by jkinning
I am not really following you, at least I don't think so.
Are you saying to use the checks
Code:
./check_nrpe -H eas1t -t 30 -c CheckFiles -a path="e:\\eas\server\batch\import\fwia\pamalm.txt" max-dir-depth=0 MinCrit=0
./check_nrpe -H eas1t -t 30 -c CheckFiles -a path="e:\\eas\server\batch\import\fwia\pamhold.txt" max-dir-depth=0 MinCrit=0
./check_nrpe -H eas1t -t 30 -c CheckFiles -a path="e:\\eas\server\batch\import\fwia\pamsec.txt" max-dir-depth=0 MinCrit=0
and just have the notifications enabled using a timeperiod of 11:30pm?
Re: File check assistance
Posted: Fri Aug 28, 2015 1:56 pm
by tmcdonald
An active check may or may not fall on a specific schedule. If it has a check interval of 5 minutes, that might turn into 5 minutes and 15 seconds, or 4 minutes and 45 seconds, depending on how loaded the system is. This compounds over time, so even though it is running *about* every 5 minutes, it might not run every time the minute hand on your wall clock is over a number.
Alternatively, a passive check runs only when you tell it to (or when it is triggered, in the case of an SNMP trap). This is a better choice if you need it to run exactly at 11:30.
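To make the passive option a little more concrete, here is a minimal sketch in Python of the line that a passive service result is submitted as, using the Nagios `PROCESS_SERVICE_CHECK_RESULT` external command format. The helper name and the service description used in the usage note are hypothetical. How the line gets delivered (the Nagios command file, NSCA, and so on) depends on your setup and is not shown.

```python
import time
from typing import Optional


def passive_result_line(host: str, service: str, return_code: int,
                        output: str, timestamp: Optional[int] = None) -> str:
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)")
    # Use the current time unless the caller supplies one.
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # An external command is a single line, so flatten any newlines in the output.
    flat_output = " ".join(output.splitlines())
    return "[{0}] PROCESS_SERVICE_CHECK_RESULT;{1};{2};{3};{4}".format(
        ts, host, service, return_code, flat_output)
```

A job scheduled for exactly 11:30pm on the monitored side could run the file check and then submit something like `passive_result_line("eas1t", "FWIA import files", 2, "CRITICAL - missing pamsec.txt")`, so the result arrives when the job runs rather than whenever the active check scheduler gets to it.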
Re: File check assistance
Posted: Fri Sep 04, 2015 7:18 am
by jkinning
I have had some additional discussions and meetings, and the reason the 11:30 time came about is that this is when the files usually are, or should be, there. If not, there is a problem.
So, I am now wondering if there is a way to check that all three files exist within one check and create a timeperiod for 11:30pm - 11:50pm.
Code:
./check_nrpe -H eas1t -t 30 -c CheckFiles -a path="e:\\eas\server\batch\import\fwia\pamalm.txt" max-dir-depth=0 MinCrit=0
./check_nrpe -H eas1t -t 30 -c CheckFiles -a path="e:\\eas\server\batch\import\fwia\pamhold.txt" max-dir-depth=0 MinCrit=0
./check_nrpe -H eas1t -t 30 -c CheckFiles -a path="e:\\eas\server\batch\import\fwia\pamsec.txt" max-dir-depth=0 MinCrit=0
When all three files are there per the service check the notification will be sent out. This would eliminate the manual task they are currently doing by logging into the VPN each night and checking to see if the 3 files are present or not.
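One way to fold the three checks into one would be a small custom script on the monitored server that looks for all three files and returns a single Nagios status. The following is only a sketch in Python under that assumption. The function names are hypothetical, it is not an existing NSClient++ command, and it treats a missing file as CRITICAL.

```python
import os
from typing import List, Sequence, Tuple

# The three critical files named in this thread.
REQUIRED = ("pamalm.txt", "pamhold.txt", "pamsec.txt")


def missing_files(directory: str, names: Sequence[str] = REQUIRED) -> List[str]:
    """Return the required file names that are not present in directory."""
    return [n for n in names if not os.path.isfile(os.path.join(directory, n))]


def check_required_files(directory: str, names: Sequence[str] = REQUIRED) -> Tuple[int, str]:
    """Return (Nagios return code, message): 0 if every file exists, else 2."""
    missing = missing_files(directory, names)
    if missing:
        return 2, "CRITICAL - missing: " + ", ".join(missing)
    return 0, "OK - all {0} files present".format(len(names))
```

Calling `check_required_files` with the import directory from the commands above gives one result for the whole set, and other .txt files in the directory are ignored because only the named files are tested.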
Re: File check assistance
Posted: Fri Sep 04, 2015 11:31 am
by lmiltchev
You can easily create a custom timeperiod under the CCM and add it to your services. As for the three checks - you could create a new BPI group, add the three checks to it, and set your warning & critical health thresholds to 100%. If any of the checks fails, the group's health would change to "Critical". Then, you could run the BPI wizard against the BPI group, and start monitoring the group's health. Hope this helps.
Re: File check assistance
Posted: Tue Sep 08, 2015 11:56 am
by jkinning
I am giving the BPI method a try since I was unable to figure out anything easier. Maybe this is easier and I am just unfamiliar with it.
So, I created the BPI group with the three services and defined them as you mentioned, with 100%, and I did get a notification. There are a couple of items I'm not sure about. The IP is 127.0.0.1. Should I change that to the actual host on which the files should be present, or is that normal for the BPI group? Also, since I have the three services with the notification time of 23:29-23:31 M-F, do I also need to use that timeperiod for the BPI group?
Again, I am just trying to get a notification from Nagios around 11:30pm to let me know that these three files are present. That would eliminate the manual process we have today, where people log in and verify.
Re: File check assistance
Posted: Tue Sep 08, 2015 2:58 pm
by lmiltchev
The host's IP should be 127.0.0.1 as this is a "dummy" host, so no, you don't need to change it. If you need to be notified only during a custom timeperiod, you will need to modify the notification period on your service (CCM->Services-><your service>->Alert Settings). Can you show us the service's config?
CCM->Services->select the "BPI config" from the "Filter by Config Name" drop-down menu, click on the "View Text Config" (the diskette icon), and copy/paste the config.