check_disk not timing out!

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: check_disk not timing out!

Post by BanditBBS »

abrist wrote: Then stat() is definitely failing. I thought it was odd that the alarm was not triggering on timeout... well, there was no alarm :P
So I added one. Can you test this new bin to see if it times out?

AHH, even worse! It isn't timing out and I can't abort the application anymore (I started a new session and killed it).
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github

Post by BanditBBS »

Weird... I can abort before the timeout value, but if I try after the timeout value I cannot Ctrl-C it.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Post by abrist »

That is seriously odd.
Can we *pretty please* have a remote session?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.

Post by abrist »

Ah, ok, I will do some more digging.

Post by BanditBBS »

Yes Andy, we can do remote, but no compiling on that box....guess we could compile on the nagios server first and xfer the file.

Post by abrist »

Let's wait on that remote for now. Try the following program I whipped up; it should check for a stale mount:

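#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int
main (int argc, char **argv) {
    struct stat st;
    char *mount_point = NULL;   /* set by -m; stays NULL if the option is missing */
    int option;

    while ((option = getopt(argc, argv, "m:")) != -1) {
        switch (option) {
            case 'm':
                mount_point = optarg;
                break;
            default:
                fprintf(stderr, "Usage: %s -m <mount point>\n", argv[0]);
                return EXIT_FAILURE;
        }
    }

    /* Without this check stat() would be called on an uninitialized pointer. */
    if (mount_point == NULL) {
        fprintf(stderr, "Usage: %s -m <mount point>\n", argv[0]);
        return EXIT_FAILURE;
    }

    /* A stale NFS file handle makes stat() fail with errno set to ESTALE. */
    if (stat(mount_point, &st) == -1 && errno == ESTALE) {
        printf("Mount: %s is stale\n", mount_point);
        return EXIT_SUCCESS;
    } else {
        printf("Mount: %s is not stale\n", mount_point);
        return EXIT_FAILURE;
    }
}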
Save it to stale.c.

gcc stale.c
./a.out -m <mount point>
Or just try running the bin attached. I want to see if we can detect the stale mount point.

Post by BanditBBS »

It hangs and I have to abort the process, Andy.

Post by abrist »

Well, dang. I guess we could just fork a child process (to stat the mount) and kill the pid after a timeout (if it is stale). I first thought that none of the other nagios-plugins use forks, but a number of them do. I will look into forking stat() and then running a timeout on the child process, allowing us to kill it after the timeout and exit critical. Sound fair?

Anyone out there have any other good ideas for detecting stale mounts using C?
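
For anyone following along, here is a minimal sketch of the fork-and-timeout idea described above. It is not the check_disk implementation: the function name stat_with_timeout, its return codes, and the 100 ms polling interval are all assumptions made for illustration. One caveat is worth keeping in mind: on a hard-mounted NFS share the child can sit in uninterruptible sleep and ignore SIGKILL until the server answers, so the parent below never blocks waiting for the child after the deadline.

```c
#include <assert.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not part of nagios-plugins).
 * Returns  0 if stat() completed and succeeded,
 *          1 if stat() completed with an error,
 *          2 if stat() did not complete within timeout_sec,
 *         -1 if fork() or waitpid() failed. */
int stat_with_timeout(const char *path, int timeout_sec)
{
    assert(path != NULL && timeout_sec > 0);

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: this is the call that may hang on a stale mount. */
        struct stat st;
        _exit(stat(path, &st) == 0 ? 0 : 1);
    }

    /* Parent: poll for the child instead of blocking on it. */
    struct timespec tick = {0, 100 * 1000 * 1000};   /* 100 ms */
    for (int waited_ms = 0; waited_ms < timeout_sec * 1000; waited_ms += 100) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
        if (r == -1)
            return -1;
        nanosleep(&tick, NULL);
    }

    /* Deadline passed: ask the kernel to kill the child, then reap it only
     * if it is already gone. A child stuck in uninterruptible sleep will
     * linger until the mount responds, so we must not wait for it here. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, WNOHANG);
    return 2;
}
```

A plugin could map a return value of 2 to the Nagios CRITICAL exit code (2) and -1 to UNKNOWN (3). Polling with WNOHANG is used here rather than alarm() so that the parent stays in control even when the child cannot be interrupted.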

Post by BanditBBS »

abrist wrote: Well, dang. I guess we could just fork a child process (to stat the mount) and kill the pid after a timeout (if it is stale). I first thought that none of the other nagios-plugins use forks, but a number of them do. I will look into forking stat() and then running a timeout on the child process, allowing us to kill it after the timeout and exit critical. Sound fair?

Anyone out there have any other good ideas for detecting stale mounts using C?
Definitely sounds fair to me, Andy. Really, that seems about the best method, and the other benefit would be that I can remove my NFS check and just use check_disk to alert on stale mounts! That'll remove 1000+ checks for me.

Post by abrist »

I am worried, though, that we would have to fork() for each mount checked, track the pids, and then report on those that time out. This could actually get a bit complicated and add to the overhead of check_disk. I may need to think about this a bit.
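
To give a feel for the bookkeeping involved, here is a rough sketch of the per-mount variant: one child per mount, a pid table, and a single shared deadline. Everything named here (check_mounts, MAX_MOUNTS, the MOUNT_* result codes, the 100 ms polling interval) is hypothetical and only meant to illustrate the idea, not how check_disk does or should do it.

```c
#include <assert.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_MOUNTS 64   /* arbitrary cap chosen for this sketch */

enum { MOUNT_OK = 0, MOUNT_ERROR = 1, MOUNT_TIMEOUT = 2 };

/* Hypothetical helper: stat()s n paths in parallel child processes.
 * Fills results[i] with a MOUNT_* code and returns how many timed out. */
int check_mounts(const char *paths[], int n, int timeout_sec, int *results)
{
    pid_t pids[MAX_MOUNTS];
    int pending = 0;

    assert(n > 0 && n <= MAX_MOUNTS && timeout_sec > 0);

    /* Start one child per mount and remember its pid. */
    for (int i = 0; i < n; i++) {
        pids[i] = fork();
        if (pids[i] == 0) {
            struct stat st;
            _exit(stat(paths[i], &st) == 0 ? 0 : 1);
        }
        if (pids[i] == -1) {
            results[i] = MOUNT_ERROR;      /* fork failed for this mount */
        } else {
            results[i] = MOUNT_TIMEOUT;    /* assumed until the child is reaped */
            pending++;
        }
    }

    /* Poll all outstanding children against one shared deadline. */
    struct timespec tick = {0, 100 * 1000 * 1000};   /* 100 ms */
    for (int waited_ms = 0; pending > 0 && waited_ms < timeout_sec * 1000;
         waited_ms += 100) {
        for (int i = 0; i < n; i++) {
            int status;
            if (pids[i] <= 0)
                continue;                  /* finished earlier or never started */
            pid_t r = waitpid(pids[i], &status, WNOHANG);
            if (r == 0)
                continue;                  /* still running */
            results[i] = (r == pids[i] && WIFEXITED(status) &&
                          WEXITSTATUS(status) == 0) ? MOUNT_OK : MOUNT_ERROR;
            pids[i] = 0;
            pending--;
        }
        if (pending > 0)
            nanosleep(&tick, NULL);
    }

    /* Whatever is still running has timed out. Kill without blocking:
     * a child stuck on a hard NFS mount may not die until the server answers. */
    int timed_out = 0;
    for (int i = 0; i < n; i++) {
        if (pids[i] > 0) {
            kill(pids[i], SIGKILL);
            waitpid(pids[i], NULL, WNOHANG);
            timed_out++;
        }
    }
    return timed_out;
}
```

Running the children in parallel keeps the total wall-clock cost at roughly one timeout no matter how many mounts are stale, at the price of one fork() per mount, which is the overhead concern raised above.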