linux/mm
Hisashi Hifumi 6ccfa806a9 VFS: fix dio write returning EIO when try_to_release_page fails
Dio write returns EIO when try_to_release_page fails because bh is
still referenced.

The patch

    commit 3f31fddfa2
    Author: Mingming Cao <cmm@us.ibm.com>
    Date:   Fri Jul 25 01:46:22 2008 -0700

        jbd: fix race between free buffer and commit transaction

was merged into 2.6.27-rc1, but I noticed that this patch is not enough
to fix the race.

I did fsstress test heavily to 2.6.27-rc1, and found that dio write still
sometimes got EIO through this test.

The patch above fixed race between freeing buffer(dio) and committing
transaction(jbd) but I discovered that there is another race, freeing
buffer(dio) and ext3/4_ordered_writepage.

: background_writeout()
     ->write_cache_pages()
       ->ext3_ordered_writepage()
     	   walk_page_buffers() -> take a bh ref
 	   block_write_full_page() -> unlock_page
		: <- end_page_writeback
                : <- race! (dio write->try_to_release_page fails)
      	   walk_page_buffers() ->release a bh ref

ext3_ordered_writepage holds bh ref and does unlock_page remaining
taking a bh ref, so this causes the race and failure of
try_to_release_page.

To fix this race, I used the approach of falling back to buffered
writes if try_to_release_page() fails on a page.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-02 19:21:37 -07:00
..
allocpercpu.c mm/allocpercpu.c: make 4 functions static 2008-07-26 12:00:12 -07:00
backing-dev.c mm: bdi: fix race in bdi_class device creation 2008-05-20 13:31:53 -07:00
bootmem.c bootmem: fix aligning of node-relative indexes and offsets 2008-08-20 15:40:31 -07:00
bounce.c block: Initial support for data-less (or empty) barrier support 2007-10-16 11:03:56 +02:00
dmapool.c dmapool: enable debugging for CONFIG_SLUB_DEBUG_ON too 2008-04-28 08:58:20 -07:00
fadvise.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
filemap_xip.c mm: xip/ext2 fix block allocation race 2008-08-20 15:40:32 -07:00
filemap.c VFS: fix dio write returning EIO when try_to_release_page fails 2008-09-02 19:21:37 -07:00
fremap.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
highmem.c highmem: Export totalhigh_pages. 2008-07-19 22:39:46 -07:00
hugetlb.c allocate structures for reservation tracking in hugetlbfs outside of spinlocks v2 2008-08-12 16:07:28 -07:00
internal.h mm: export prep_compound_page to mm 2008-07-24 10:47:17 -07:00
Kconfig mm: Make generic weak get_user_pages_fast and EXPORT_GPL it 2008-08-12 17:52:53 +10:00
maccess.c kgdb: fix optional arch functions and probe_kernel_* 2008-04-17 20:05:39 +02:00
madvise.c madvise: update function comment of madvise_dontneed 2008-07-30 09:41:45 -07:00
Makefile mmu-notifiers: core 2008-07-28 16:30:21 -07:00
memcontrol.c memcg: fix oops in mem_cgroup_shrink_usage 2008-08-12 16:07:28 -07:00
memory_hotplug.c memory-hotplug: add sysfs removable attribute for hotplug memory remove 2008-07-24 10:47:21 -07:00
memory.c mm: rename page trylock 2008-08-04 21:31:34 -07:00
mempolicy.c do_migrate_pages(): remove unused variable 2008-08-12 16:07:29 -07:00
mempool.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
migrate.c mm: rename page trylock 2008-08-04 21:31:34 -07:00
mincore.c mm: remove nopage 2008-04-28 08:58:18 -07:00
mlock.c mlock() fix return values 2008-08-04 16:58:45 -07:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c Merge branch 'core/locking' into core/urgent 2008-08-12 00:11:49 +02:00
mmu_notifier.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
mmzone.c mm: filter based on a nodemask as well as a gfp_mask 2008-04-28 08:58:19 -07:00
mprotect.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
mremap.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
msync.c Detach sched.h from mm.h 2007-05-21 09:18:19 -07:00
nommu.c nommu: Provide vmalloc_exec(). 2008-08-04 16:01:47 +09:00
oom_kill.c security: Fix setting of PF_SUPERPRIV by __capable() 2008-08-14 22:59:43 +10:00
page_alloc.c mm: make setup_zone_migrate_reserve() aware of overlapping nodes 2008-09-02 19:21:37 -07:00
page_io.c mm: fix PageUptodate data race 2008-02-05 09:44:19 -08:00
page_isolation.c Remove '#include <stddef.h>' from mm/page_isolation.c 2008-09-02 09:29:01 +01:00
page-writeback.c mm: spinlock tree_lock 2008-07-26 12:00:06 -07:00
pagewalk.c pagemap: pass mm into pagewalkers 2008-06-12 18:05:41 -07:00
pdflush.c pdflush: use time_after() instead of open-coding it 2008-07-25 10:53:28 -07:00
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c quicklists: Only consider memory that can be used with GFP_KERNEL 2008-01-14 08:52:22 -08:00
readahead.c mm: readahead scan lockless 2008-07-26 12:00:06 -07:00
rmap.c mm: dirty page tracking race fix 2008-08-20 15:40:32 -07:00
shmem_acl.c [PATCH] sanitize ->permission() prototype 2008-07-26 20:53:14 -04:00
shmem.c mm: rename page trylock 2008-08-04 21:31:34 -07:00
slab.c mm: unexport ksize 2008-07-29 23:44:26 +03:00
slob.c mm: unexport ksize 2008-07-29 23:44:26 +03:00
slub.c slub: Disable NUMA remote node defragmentation by default 2008-08-20 21:50:21 +03:00
sparse-vmemmap.c Christoph has moved 2008-07-04 10:40:04 -07:00
sparse.c mm/sparse.c: removed duplicated include 2008-08-12 16:07:30 -07:00
swap_state.c mm: show free swap as signed 2008-08-20 15:40:30 -07:00
swap.c mm: rename page trylock 2008-08-04 21:31:34 -07:00
swapfile.c mm: rename page trylock 2008-08-04 21:31:34 -07:00
thrash.c Bug in mm/thrash.c function grab_swap_token() 2007-05-11 08:29:32 -07:00
tiny-shmem.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2008-03-25 08:57:47 -07:00
truncate.c VFS: fix dio write returning EIO when try_to_release_page fails 2008-09-02 19:21:37 -07:00
util.c mm: Make generic weak get_user_pages_fast and EXPORT_GPL it 2008-08-12 17:52:53 +10:00
vmalloc.c Use WARN() in mm/vmalloc.c 2008-07-26 12:00:07 -07:00
vmscan.c mm: rename page trylock 2008-08-04 21:31:34 -07:00
vmstat.c [ARM] Skip memory holes in FLATMEM when reading /proc/pagetypeinfo 2008-08-27 20:09:28 +01:00