b790c3b7c3
A long, complicated sequence of events, beginning with the RESEND flag not
being cleared on an lkb, can result in an unlock never completing.

- lkb on waiters list for remote lookup
- the remote node is both the dir node and the master node, so it optimizes
  the lookup into a request and sends a request reply back
- the request reply is saved on the requestqueue to be processed after
  recovery
- recovery runs dlm_recover_waiters_pre() which sets RESEND flag so the
  lookup will be resent after recovery
- end of recovery: process_requestqueue takes saved request reply which
  removes the lkb off the waiters list, _without_ clearing the RESEND flag
- end of recovery: dlm_recover_waiters_post() doesn't do anything with the
  now completed lookup lkb (would usually clear RESEND)
- later, the node unmounts, unlocks this lkb that still has RESEND flag set
- the lkb is on the waiters list again, now for unlock, when recovery
  occurs, dlm_recover_waiters_pre() shows the lkb for unlock with RESEND
  set, doesn't do anything since the master still exists
- end of recovery: dlm_recover_waiters_post() takes this lkb off the waiters
  list because it has the RESEND flag set, then reports an error because
  unlocks are never supposed to be handled in recover_waiters_post().
- later, the unlock reply is received, doesn't find the lkb on the waiters
  list because recover_waiters_post() has wrongly removed it.
- the unlock operation has been lost, and we're left with a stray granted
  lock
- unmount spins waiting for the unlock to complete

The visible evidence of this problem will be a node where gfs umount is
spinning, the dlm waiters list will be empty, and the dlm locks list will
show a granted lock.

The fix is simply to clear the RESEND flag when taking an lkb off the
waiters list.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

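The sequence described in the commit message can be replayed with a small userspace model. This is only an illustrative sketch, not the kernel code: `struct model_lkb`, `MODEL_IFL_RESEND`, the `with_fix` switch and every function body here are hypothetical stand-ins that mirror the steps listed in the message, and the single `wait_type` field stands in for membership of the waiters list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; names and layout are assumptions, not kernel definitions. */
#define MODEL_IFL_RESEND 0x1u

enum model_wait { WAIT_NONE, WAIT_LOOKUP, WAIT_UNLOCK };

struct model_lkb {
    unsigned int flags;
    enum model_wait wait_type;   /* WAIT_NONE means "not on the waiters list" */
};

static void add_to_waiters(struct model_lkb *lkb, enum model_wait type)
{
    assert(lkb->wait_type == WAIT_NONE);
    lkb->wait_type = type;
}

/* The fix modelled here: clear RESEND whenever the lkb leaves the list. */
static void remove_from_waiters(struct model_lkb *lkb, bool with_fix)
{
    lkb->wait_type = WAIT_NONE;
    if (with_fix)
        lkb->flags &= ~MODEL_IFL_RESEND;
}

/* Recovery start: a pending lookup is marked to be resent afterwards. */
static void recover_waiters_pre(struct model_lkb *lkb)
{
    if (lkb->wait_type == WAIT_LOOKUP)
        lkb->flags |= MODEL_IFL_RESEND;
}

/* Recovery end: any waiting lkb marked RESEND is taken off the list.
 * Returns -1 if that lkb was waiting for an unlock, which should never happen. */
static int recover_waiters_post(struct model_lkb *lkb, bool with_fix)
{
    enum model_wait type = lkb->wait_type;

    if (type == WAIT_NONE || !(lkb->flags & MODEL_IFL_RESEND))
        return 0;
    remove_from_waiters(lkb, with_fix);
    lkb->flags &= ~MODEL_IFL_RESEND;
    return type == WAIT_UNLOCK ? -1 : 0;
}

/* A reply completes the operation only if the lkb is still on the list. */
static bool receive_reply(struct model_lkb *lkb, bool with_fix)
{
    if (lkb->wait_type == WAIT_NONE)
        return false;
    remove_from_waiters(lkb, with_fix);
    return true;
}

/* Replays the steps from the commit message; true if the unlock completes. */
bool unlock_completes(bool with_fix)
{
    struct model_lkb lkb = { 0, WAIT_NONE };

    add_to_waiters(&lkb, WAIT_LOOKUP);      /* remote lookup pending          */
    recover_waiters_pre(&lkb);              /* first recovery sets RESEND     */
    receive_reply(&lkb, with_fix);          /* saved request reply processed  */
    recover_waiters_post(&lkb, with_fix);   /* nothing left to do for lookup  */

    add_to_waiters(&lkb, WAIT_UNLOCK);      /* unmount issues the unlock      */
    recover_waiters_pre(&lkb);              /* second recovery, master exists */
    recover_waiters_post(&lkb, with_fix);   /* stale RESEND removes the lkb   */

    return receive_reply(&lkb, with_fix);   /* does the unlock reply find it? */
}
```

In this model, `unlock_completes(false)` loses the unlock reply because the stale flag causes the lkb to be dropped during the second recovery, while `unlock_completes(true)` finds the lkb still waiting and completes.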
||
---|---|---|
.. | ||
ast.c | ||
ast.h | ||
config.c | ||
config.h | ||
debug_fs.c | ||
dir.c | ||
dir.h | ||
dlm_internal.h | ||
Kconfig | ||
lock.c | ||
lock.h | ||
lockspace.c | ||
lockspace.h | ||
lowcomms-sctp.c | ||
lowcomms-tcp.c | ||
lowcomms.h | ||
lvb_table.h | ||
main.c | ||
Makefile | ||
member.c | ||
member.h | ||
memory.c | ||
memory.h | ||
midcomms.c | ||
midcomms.h | ||
rcom.c | ||
rcom.h | ||
recover.c | ||
recover.h | ||
recoverd.c | ||
recoverd.h | ||
requestqueue.c | ||
requestqueue.h | ||
user.c | ||
user.h | ||
util.c | ||
util.h |