Re: check_disk not timing out!
Posted: Wed May 20, 2015 2:59 pm
by BanditBBS
abrist wrote: Then stat() is definitely failing. I thought it was odd that the alarm was not triggering on timeout... well, there was no alarm.

So I added one. Can you test this new bin to see if it times out?
AHH, even worse! It isn't timing out, and I can't abort the application anymore (I had to start a new session and kill it).
Re: check_disk not timing out!
Posted: Wed May 20, 2015 3:02 pm
by BanditBBS
Weird... I can abort before the timeout elapses, but if I try after the timeout has passed, I can't Ctrl-C it.
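For reference, the alarm()-around-stat() idea discussed above can be sketched roughly as follows. This is a minimal sketch, not the code in the test binary; the helper name stat_with_alarm() and its return codes are hypothetical. One caveat is worth noting: a SIGALRM handler can only make stat() give up if the call is sleeping interruptibly. If stat() is stuck in an uninterruptible kernel wait on the mount (which can happen with a hard-mounted NFS share), the signal just stays pending, which would be consistent with the timeout never taking effect.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Set by the SIGALRM handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t alarm_fired = 0;

static void on_alarm(int sig) {
    (void)sig;
    alarm_fired = 1;
}

/* Hypothetical helper: stat() path with an alarm armed for timeout_sec seconds.
 * Returns 0 on success, -1 if stat() failed (errno from stat()),
 * or -2 if the alarm fired and interrupted stat(). */
int stat_with_alarm(const char *path, struct stat *st, unsigned int timeout_sec) {
    assert(path != NULL && st != NULL && timeout_sec > 0);

    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  /* no SA_RESTART, so an interrupted stat() returns EINTR */
    if (sigaction(SIGALRM, &sa, &old_sa) == -1)
        return -1;

    alarm_fired = 0;
    alarm(timeout_sec);
    /* If stat() sleeps uninterruptibly in the kernel, the handler cannot run
     * until stat() returns, so this call can still block indefinitely. */
    int ret = stat(path, st);
    int saved_errno = errno;
    alarm(0);                           /* cancel any pending alarm */
    sigaction(SIGALRM, &old_sa, NULL);  /* restore the previous handler */

    if (ret == -1 && saved_errno == EINTR && alarm_fired)
        return -2;
    errno = saved_errno;
    return ret;
}
```

Because the handler only sets a flag, the caller decides what to do (for example, exit critical) once stat() comes back with -2.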
Re: check_disk not timing out!
Posted: Wed May 20, 2015 3:04 pm
by abrist
That is seriously odd.
Can we *pretty please* have a remote session?
Re: check_disk not timing out!
Posted: Wed May 20, 2015 3:06 pm
by abrist
Ah, ok, I will do some more digging.
Re: check_disk not timing out!
Posted: Wed May 20, 2015 3:07 pm
by BanditBBS
Yes, Andy, we can do a remote session, but no compiling on that box... I guess we could compile on the Nagios server first and transfer the file.
Re: check_disk not timing out!
Posted: Wed May 20, 2015 5:00 pm
by abrist
Let's wait on that remote session for now. Try the following program I whipped up; it should check for a stale mount:
Code:
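#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <iso646.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int
main (int argc, char **argv) {
    struct stat st;
    char *mount_point = NULL;   /* set by -m; stays NULL if -m is missing */
    int ret;
    int option = 0;

    /* Parse "-m <mount point>" from the command line. */
    while ((option = getopt(argc, argv, "m:")) != -1) {
        switch (option) {
        case 'm':
            mount_point = optarg;
            break;
        default:
            fprintf(stderr, "Usage: %s -m <mount point>\n", argv[0]);
            return EXIT_FAILURE;
        }
    }
    if (mount_point == NULL) {
        fprintf(stderr, "Usage: %s -m <mount point>\n", argv[0]);
        return EXIT_FAILURE;
    }

    /* A stale NFS file handle should make stat() fail with ESTALE.
     * Note: this test exits with status 0 when the mount IS stale. */
    ret = stat(mount_point, &st);
    if (ret == -1 and errno == ESTALE) {
        printf("Mount: %s is stale\n", mount_point);
        return EXIT_SUCCESS;
    } else {
        printf("Mount: %s is not stale\n", mount_point);
        return EXIT_FAILURE;
    }
}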
Save it as stale.c, then compile and run it:
Code:
gcc stale.c
./a.out -m <mount point>
Or just try running the attached binary. I want to see whether we can detect the stale mount point.
Re: check_disk not timing out!
Posted: Wed May 20, 2015 8:19 pm
by BanditBBS
It hangs, and I have to abort the process, Andy.
Re: check_disk not timing out!
Posted: Thu May 21, 2015 9:16 am
by abrist
Well, dang. I guess we could just fork a child process (to stat the mount) and kill its PID after a timeout (if the mount is stale). At first I didn't think any of the other nagios-plugins used fork(), which would have made this a questionable approach, but a number of them do. I will look into forking stat() and then running a timeout on the child process, allowing us to kill it after the timeout and exit critical. Sound fair?
Anyone out there have any other good ideas for detecting stale mounts using C?
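The fork-and-kill approach proposed above could look roughly like this. It is a sketch under assumptions: check_mount_forked(), the MOUNT_* result codes, and the 10 ms polling interval are hypothetical choices, not existing check_disk code. The key property is that only the child touches the mount, so a hung stat() blocks the child rather than the plugin itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum mount_status { MOUNT_OK = 0, MOUNT_ERROR = 1, MOUNT_TIMEOUT = 2 };

/* stat() path in a child process and give up after roughly timeout_ms.
 * Only the child touches the mount, so only the child can hang on it. */
enum mount_status check_mount_forked(const char *path, long timeout_ms) {
    assert(path != NULL && timeout_ms > 0);

    pid_t pid = fork();
    if (pid == -1)
        return MOUNT_ERROR;                 /* could not fork at all */
    if (pid == 0) {
        struct stat st;
        /* Child: report stat() success or failure through the exit status. */
        _exit(stat(path, &st) == 0 ? 0 : 1);
    }

    /* Parent: poll the child every 10 ms until it exits or time runs out. */
    const struct timespec tick = { 0, 10L * 1000 * 1000 };
    for (long waited = 0; waited < timeout_ms; waited += 10) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                       ? MOUNT_OK : MOUNT_ERROR;
        if (r == -1 && errno != EINTR)
            return MOUNT_ERROR;             /* lost track of the child */
        nanosleep(&tick, NULL);
    }

    /* Timed out: kill the child, but reap it only if it is already gone.
     * A blocking waitpid() here could hang on a child stuck in the kernel. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, WNOHANG);
    return MOUNT_TIMEOUT;
}
```

The non-blocking waitpid() after SIGKILL is deliberate: a child stuck in an uninterruptible wait may not die right away, and a blocking wait would hang the plugin just like the original stat() did. Since a plugin is short-lived, any such leftover child is reparented and reaped after the plugin exits.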
Re: check_disk not timing out!
Posted: Thu May 21, 2015 9:46 am
by BanditBBS
Definitely sounds fair to me, Andy. Really, that seems about the best method, and the other benefit is that I could remove my NFS check and just use check_disk to alert on stale mounts! That'll remove 1,000+ checks for me.
Re: check_disk not timing out!
Posted: Thu May 21, 2015 9:59 am
by abrist
I am worried, though, that we would have to fork() for each mount checked, track the PIDs, and then report on those that time out. This could actually get complicated and add to the overhead of check_disk. I may need to think about this a bit.
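One way to keep the per-mount bookkeeping manageable is to fork all the stat() children up front and wait for them against a single shared deadline, so the total run time is bounded by roughly one timeout rather than one timeout per mount. This is a hypothetical sketch: check_mounts_forked(), the CHECK_* codes, and the MAX_MOUNTS limit are illustrative assumptions, and it still pays one fork() per mount, which is exactly the overhead concern raised above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_MOUNTS 64  /* hypothetical limit that keeps the pid table static */

/* Hypothetical per-mount result codes for this sketch. */
enum { CHECK_PENDING = -1, CHECK_OK = 0, CHECK_ERROR = 1, CHECK_TIMEOUT = 2 };

/* Fork one stat() child per path, then wait for all of them against one
 * shared deadline of roughly timeout_ms. results[i] receives the outcome
 * for paths[i]; the return value is the number of paths that timed out. */
int check_mounts_forked(const char *paths[], int results[], size_t n, long timeout_ms) {
    assert(n <= MAX_MOUNTS && timeout_ms > 0);
    pid_t pids[MAX_MOUNTS];
    size_t pending = 0;

    for (size_t i = 0; i < n; i++) {
        pids[i] = fork();
        if (pids[i] == 0) {
            struct stat st;
            _exit(stat(paths[i], &st) == 0 ? 0 : 1);   /* child */
        }
        if (pids[i] == -1) {
            results[i] = CHECK_ERROR;                   /* fork failed */
        } else {
            results[i] = CHECK_PENDING;
            pending++;
        }
    }

    /* Poll every still-pending child every 10 ms until all finish or time runs out. */
    const struct timespec tick = { 0, 10L * 1000 * 1000 };
    for (long waited = 0; pending > 0 && waited < timeout_ms; waited += 10) {
        for (size_t i = 0; i < n; i++) {
            if (results[i] != CHECK_PENDING)
                continue;
            int status;
            pid_t r = waitpid(pids[i], &status, WNOHANG);
            if (r == pids[i]) {
                results[i] = (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                                 ? CHECK_OK : CHECK_ERROR;
                pending--;
            } else if (r == -1 && errno != EINTR) {
                results[i] = CHECK_ERROR;
                pending--;
            }
        }
        if (pending > 0)
            nanosleep(&tick, NULL);
    }

    /* Anything still pending hung on its mount: kill it and report a timeout. */
    int timeouts = 0;
    for (size_t i = 0; i < n; i++) {
        if (results[i] == CHECK_PENDING) {
            kill(pids[i], SIGKILL);
            waitpid(pids[i], NULL, WNOHANG);   /* never block on a hung child */
            results[i] = CHECK_TIMEOUT;
            timeouts++;
        }
    }
    return timeouts;
}
```

A caller would then map any CHECK_TIMEOUT entry to a critical state for that mount and report the rest as usual.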