linux/fs/ocfs2
Junxiao Bi e0cbb79805 ocfs2: o2hb: add negotiate timer
This series of patches is to fix the issue that when storage down, all
nodes will fence self due to write timeout.

With this patch set, all nodes will keep going until storage back
online, except if the following issue happens, then all nodes will do as
before to fence self.

1. io error got
2. network between nodes down
3. nodes panic

This patch (of 6):

When storage down, all nodes will fence self due to write timeout.  The
negotiate timer is designed to avoid this, with it node will wait until
storage up again.

Negotiate timer working in the following way:

1. The timer expires before write timeout timer, its timeout is half
   of write timeout now.  It is re-queued along with write timeout timer.
   If expires, it will send NEGO_TIMEOUT message to master node(node with
   lowest node number).  This message does nothing but marks a bit in a
   bitmap recording which nodes are negotiating timeout on master node.

2. If storage down, nodes will send this message to master node, then
   when master node finds its bitmap including all online nodes, it sends
   NEGO_APPROVL message to all nodes one by one, this message will
   re-queue write timeout timer and negotiate timer.  For any node doesn't
   receive this message or meets some issue when handling this message, it
   will be fenced.  If storage up at any time, o2hb_thread will run and
   re-queue all the timer, nothing will be affected by these two steps.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Gang He <ghe@suse.com>
Cc: rwxybh <rwxybh@126.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
..
cluster ocfs2: o2hb: add negotiate timer 2016-05-27 14:49:37 -07:00
dlm ocfs2/dlm: return zero if deref_done message is successfully handled 2016-04-28 19:34:04 -07:00
dlmfs mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
acl.c ocfs2: fix posix_acl_create deadlock 2016-05-12 15:52:50 -07:00
acl.h ocfs2: fix posix_acl_create deadlock 2016-05-12 15:52:50 -07:00
alloc.c ocfs2: clean up an unused variable 'wants_rotate' in ocfs2_truncate_rec 2016-05-19 19:12:14 -07:00
alloc.h ocfs2: constify ocfs2_extent_tree_operations structures 2016-01-14 16:00:49 -08:00
aops.c Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
aops.h ocfs2: fix ip_unaligned_aio deadlock with dio work queue 2016-03-25 16:37:42 -07:00
blockcheck.c
blockcheck.h
buffer_head_io.c ocfs2: clear the rest of the buffers on error 2015-09-04 16:54:41 -07:00
buffer_head_io.h
dcache.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
dcache.h ocfs2: revert iput deferring code in ocfs2_drop_dentry_lock 2014-04-03 16:20:55 -07:00
dir.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
dir.h VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
dlmglue.c posix_acl: Inode acl caching fixes 2016-03-31 00:30:15 -04:00
dlmglue.h ocfs2: avoid blocking in ocfs2_mark_lockres_freeing() in downconvert thread 2014-04-03 16:20:55 -07:00
export.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-26 17:22:07 -07:00
export.h
extent_map.c ocfs2: neaten do_error, ocfs2_error and ocfs2_abort 2015-09-04 16:54:41 -07:00
extent_map.h
file.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 11:01:31 -07:00
file.h ocfs2: prepare some interfaces used in append direct io 2015-02-16 17:56:04 -08:00
filecheck.c ocfs2: sysfile interfaces for online file check 2016-03-22 15:36:02 -07:00
filecheck.h ocfs2: sysfile interfaces for online file check 2016-03-22 15:36:02 -07:00
heartbeat.c
heartbeat.h
inode.c ocfs2: fix improper handling of return errno 2016-05-26 15:35:44 -07:00
inode.h ocfs2: fix ip_unaligned_aio deadlock with dio work queue 2016-03-25 16:37:42 -07:00
ioctl.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
ioctl.h
journal.c ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local 2016-03-25 16:37:42 -07:00
journal.h jbd2: add support for avoiding data writes during transaction commits 2016-04-24 00:56:07 -04:00
Kconfig
localalloc.c ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local 2016-03-25 16:37:42 -07:00
localalloc.h ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters 2014-02-06 13:48:51 -08:00
locks.c ocfs2: fix flock panic issue 2015-12-29 17:45:49 -08:00
locks.h
Makefile ocfs2: sysfile interfaces for online file check 2016-03-22 15:36:02 -07:00
mmap.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
mmap.h
move_extents.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
move_extents.h
namei.c ocfs2: fix posix_acl_create deadlock 2016-05-12 15:52:50 -07:00
namei.h ocfs2: do not include dio entry in case of orphan scan 2015-11-05 19:34:48 -08:00
ocfs1_fs_compat.h
ocfs2_fs.h ocfs2: fix comment in struct ocfs2_extended_slot 2016-05-19 19:12:14 -07:00
ocfs2_ioctl.h
ocfs2_lockid.h
ocfs2_lockingver.h
ocfs2_trace.h ocfs2: code clean up for direct io 2016-03-25 16:37:42 -07:00
ocfs2.h mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
quota_global.c ocfs2: Fix Q_GETNEXTQUOTA for filesystem without quotas 2016-03-29 17:44:11 +02:00
quota_local.c ocfs2: neaten do_error, ocfs2_error and ocfs2_abort 2015-09-04 16:54:41 -07:00
quota.h quota: constify qtree_fmt_operations structures 2016-01-04 10:58:35 +01:00
refcounttree.c ocfs2: fix posix_acl_create deadlock 2016-05-12 15:52:50 -07:00
refcounttree.h
reservations.c ocfs2: make resv_lock spinlock static 2015-02-10 14:30:29 -08:00
reservations.h
resize.c ocfs2: solve a problem of crossing the boundary in updating backups 2016-03-25 16:37:42 -07:00
resize.h
slot_map.c ocfs2: clean up an unneeded goto in ocfs2_put_slot() 2016-05-19 19:12:14 -07:00
slot_map.h
stack_o2cb.c ocfs2: avoid a pointless delay in o2cb_cluster_check() 2015-04-14 16:48:57 -07:00
stack_user.c char: make misc_deregister a void function 2015-08-05 10:35:49 -07:00
stackglue.c ocfs2: export ocfs2_kset for online file check 2016-03-22 15:36:02 -07:00
stackglue.h ocfs2: export ocfs2_kset for online file check 2016-03-22 15:36:02 -07:00
suballoc.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
suballoc.h ocfs2: rollback alloc_dinode counts when ocfs2_block_group_set_bits() failed 2014-04-03 16:20:56 -07:00
super.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
super.h ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local 2016-03-25 16:37:42 -07:00
symlink.c switch ->get_link() to delayed_call, kill ->put_link() 2015-12-30 13:01:03 -05:00
symlink.h
sysfile.c ocfs2: avoid system inode ref confusion by adding mutex lock 2014-04-03 16:20:57 -07:00
sysfile.h
uptodate.c ocfs2: remove NULL assignments on static 2014-06-04 16:53:53 -07:00
uptodate.h
xattr.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 11:01:31 -07:00
xattr.h ocfs2: fix posix_acl_create deadlock 2016-05-12 15:52:50 -07:00