linux/fs/exofs
Boaz Harrosh 9ff19309a9 ore: Fix NFS crash by supporting any unaligned RAID IO
In RAID_5/6 we used to not permit an IO whose end byte is not
stripe_size aligned and which spans more than one stripe.
I.e. the caller had to check, after submission, whether the actual
number of transferred bytes was shorter, and if so resubmit
a new IO with the remainder.
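
As an illustration of the old contract described above, here is a minimal sketch in plain C. It is not the ORE code: old_submit_len() and submit_all() are hypothetical names, and the truncation rule is only a model of the restriction as this message states it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the old RAID_5/6 restriction: an IO that spans
 * more than one stripe and whose end is not stripe_size aligned is cut
 * back to the last stripe boundary. Returns the bytes accepted. */
uint64_t old_submit_len(uint64_t offset, uint64_t length,
                        uint64_t stripe_size)
{
    assert(stripe_size > 0 && length > 0);

    uint64_t end = offset + length;
    uint64_t first_stripe = offset / stripe_size;
    uint64_t last_stripe = (end - 1) / stripe_size;

    if (first_stripe != last_stripe && end % stripe_size != 0)
        end -= end % stripe_size;   /* truncate to the stripe boundary */

    return end - offset;
}

/* Caller side (assumed shape): resubmit the remainder until every byte
 * has been accepted. Returns how many submissions were needed. */
unsigned submit_all(uint64_t offset, uint64_t length, uint64_t stripe_size)
{
    unsigned submissions = 0;

    while (length > 0) {
        uint64_t done = old_submit_len(offset, length, stripe_size);

        offset += done;
        length -= done;
        submissions++;
    }
    return submissions;
}
```

With a stripe_size of 256, an IO at offset 100 of length 1000 is accepted only up to byte 1024 in this model, so the caller needs a second submission for the remaining 76 bytes.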

Exofs supports this, and NFS was supposed to support this
as well with its short write mechanism. But late testing has
exposed a CRASH when this is used with non-RPC layout-drivers.

The change at NFS is deep and risky. In its place, the fix
at ORE to lift the limitation is actually clean and simple.
So here it is below.

The principle here is that in the case of an IO that is unaligned
on both ends, beginning and end, we will send two read requests:
one like the old code, before the calculation of the first stripe,
and one at a new site, before the calculation of the last stripe.
If any "boundary" is aligned or the complete IO is within a single
stripe, we do a single read like before.
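
The rule above can be sketched as a small decision function. This is an assumption-laden model in plain C, not code from ore_raid.c: struct r4w_plan and plan_reads() are hypothetical names, and treating a fully aligned IO as needing zero reads is my assumption, since the message only covers the unaligned cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of which read-for-write requests an IO needs. */
struct r4w_plan {
    bool read_first;  /* start of the IO is not stripe_size aligned */
    bool read_last;   /* end of the IO is not stripe_size aligned */
    unsigned reads;   /* number of read requests submitted */
};

struct r4w_plan plan_reads(uint64_t offset, uint64_t length,
                           uint64_t stripe_size)
{
    assert(stripe_size > 0 && length > 0);

    uint64_t end = offset + length;
    bool single = offset / stripe_size == (end - 1) / stripe_size;
    struct r4w_plan p;

    p.read_first = offset % stripe_size != 0;
    p.read_last = end % stripe_size != 0;

    if (single)
        /* Everything is in one stripe: one combined read at most. */
        p.reads = (p.read_first || p.read_last) ? 1 : 0;
    else
        /* One read per unaligned boundary. */
        p.reads = (unsigned)p.read_first + (unsigned)p.read_last;

    return p;
}
```

With a stripe_size of 256, an IO at offset 100 of length 1000 is unaligned on both ends and spans several stripes, so the sketch reports two reads. Aligning either end, or shrinking the IO into one stripe, brings it back to a single read.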

The code stays clean and simple by splitting the old _read_4_write
into 3 parts:
1. _read_4_write_first_stripe
2. _read_4_write_last_stripe
3. _read_4_write_execute

We call 1+3 at the same place as before and 2+3 before the last
stripe. In the case of everything being in a single stripe, 1+2+3
is performed additively.
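
The call ordering just described can be traced with stubs. The following is only a sketch in plain C: stub_first_stripe(), stub_last_stripe(), stub_execute() and trace_write() are hypothetical stand-ins that record the step numbers 1, 2 and 3. They do no IO, and unlike what the real functions presumably do, they do not skip work when a boundary is already aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Records the order in which the three steps are called. */
struct trace {
    char buf[64];
    size_t len;
};

void trace_add(struct trace *t, const char *s)
{
    int n = snprintf(t->buf + t->len, sizeof(t->buf) - t->len, "%s", s);

    assert(n >= 0 && (size_t)n < sizeof(t->buf) - t->len);
    t->len += (size_t)n;
}

/* Hypothetical stand-ins for steps 1, 2 and 3 of the split. */
void stub_first_stripe(struct trace *t) { trace_add(t, "1"); }
void stub_last_stripe(struct trace *t)  { trace_add(t, "2"); }
void stub_execute(struct trace *t)      { trace_add(t, "3 "); }

void trace_write(struct trace *t, uint64_t offset, uint64_t length,
                 uint64_t stripe_size)
{
    assert(stripe_size > 0 && length > 0);

    uint64_t first = offset / stripe_size;
    uint64_t last = (offset + length - 1) / stripe_size;

    t->len = 0;
    t->buf[0] = '\0';

    if (first == last) {
        /* All in a single stripe: 1+2+3 in one go. */
        stub_first_stripe(t);
        stub_last_stripe(t);
        stub_execute(t);
        return;
    }

    /* Same place as before: 1+3 ahead of the first stripe. */
    stub_first_stripe(t);
    stub_execute(t);

    /* ... the stripes in between would be processed here ... */

    /* New call site: 2+3 ahead of the last stripe. */
    stub_last_stripe(t);
    stub_execute(t);
}
```

For an IO inside one stripe the trace reads "123 ", for a multi-stripe IO it reads "13 23 ", matching the 1+2+3 and 1+3 then 2+3 orderings in the text.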

Why did I not think of it before? Well, I had a stroke of
genius, because I have stared at this code for 2 years and did
not find this simple solution until today. Not that I did not try.

This solution is much better for NFS than the previous supposed
solution, because the short write was dealt with out-of-band after
IO_done, which would cause a seeky IO pattern, whereas here
we execute in order. In both solutions we do 2 separate reads, only
here we do it within a single IO request. (And actually combine two
writes into a single submission.)

NFS/exofs code need not change, since the ORE API communicates the new
shorter length on return. What will happen is that this case will not
occur anymore.

hurray!!

[Stable: this is an NFS bug since the 3.2 kernel; should apply cleanly]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-07-20 11:45:28 +03:00
..
BUGS exofs: Documentation 2009-03-31 19:44:38 +03:00
common.h Fix common misspellings 2011-03-31 11:26:23 -03:00
dir.c exofs: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:22 +08:00
exofs.h exofs: Add SYSFS info for autologin/pNFS export 2012-05-21 12:24:01 +03:00
file.c fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers 2011-07-20 20:47:59 -04:00
inode.c vfs: Rename end_writeback() to clear_inode() 2012-05-06 13:43:41 +08:00
Kbuild exofs: Add SYSFS info for autologin/pNFS export 2012-05-21 12:24:01 +03:00
Kconfig ore: FIX breakage when MISC_FILESYSTEMS is not set 2012-01-06 16:48:14 +02:00
Kconfig.ore ore: FIX breakage when MISC_FILESYSTEMS is not set 2012-01-06 16:48:14 +02:00
namei.c vfs: check i_nlink limits in vfs_{mkdir,rename_dir,link} 2012-03-20 21:29:32 -04:00
ore_raid.c ore: Fix NFS crash by supporting any unaligned RAID IO 2012-07-20 11:45:28 +03:00
ore_raid.h ore: RAID5 Write 2011-10-24 17:15:33 -07:00
ore.c ore: fix BUG_ON, too few sgs when reading 2012-01-06 16:49:07 +02:00
super.c exofs: Add SYSFS info for autologin/pNFS export 2012-05-21 12:24:01 +03:00
symlink.c exofs: Remove IBM copyrights 2009-06-21 17:53:47 +03:00
sys.c exofs: fix sparse non-ANSI function warning 2012-06-12 06:33:22 +03:00