Re: create_backup.sh hanging
Posted: Thu Apr 30, 2015 9:54 am
I am able to reproduce this issue on my local system - though it's slightly different from what you describe.
In my testing, if the backup script is run on a single node at a time, the state will climb from 0 to a max of 2 and then descend back to 0 - the backup completes properly.
If the backup script is run on both nodes at once, the states will get stuck and the backups never finish. I spoke with a developer, and it looks like the backup job is currently running as an instance job - which means that every instance picks it up and runs it whenever a backup is scheduled and run from the command subsystem. I think the problem is that the 'state' is a cluster-wide variable, so the backups will inevitably get stuck if running on more than 1 instance at the same time.
This is my theory so far. I will do further testing and collaberation with the developers to see if this is the root cause, and we'll fix it accordingly if it is. Thanks for the information, it was very helpful in reproducing this bug. I'll keep you informed.
Best,
Jesse
In my testing, if the backup script is run on a single node at a time, the state will climb from 0 to a max of 2 and then descend back to 0 - the backup completes properly.
If the backup script is run on both nodes at once, the states will get stuck and the backups never finish. I spoke with a developer, and it looks like the backup job is currently running as an instance job - which means that every instance picks it up and runs it whenever a backup is scheduled and run from the command subsystem. I think the problem is that the 'state' is a cluster-wide variable, so the backups will inevitably get stuck if running on more than 1 instance at the same time.
This is my theory so far. I will do further testing and collaberation with the developers to see if this is the root cause, and we'll fix it accordingly if it is. Thanks for the information, it was very helpful in reproducing this bug. I'll keep you informed.
Best,
Jesse