linux/mm
Zach Brown 8459d86aff [PATCH] dio: only call aio_complete() after returning -EIOCBQUEUED
The only time it is safe to call aio_complete() is when the ->ki_retry
function returns -EIOCBQUEUED to the AIO core.  direct_io_worker() has
historically done this by relying on its caller to translate positive return
codes into -EIOCBQUEUED for the aio case.  It did this by trying to keep
conditionals in sync.  direct_io_worker() knew when finished_one_bio() was
going to call aio_complete().  It would reverse the test and wait and free the
dio in the cases it thought that finished_one_bio() wasn't going to.

Not surprisingly, it ended up getting it wrong.  'ret' could be a negative
errno from the submission path but it failed to communicate this to
finished_one_bio().  direct_io_worker() would return < 0, it's callers
wouldn't raise -EIOCBQUEUED, and aio_complete() would be called.  In the
future finished_one_bio()'s tests wouldn't reflect this and aio_complete()
would be called for a second time which can manifest as an oops.

The previous cleanups have whittled the sync and async completion paths down
to the point where we can collapse them and clearly reassert the invariant
that we must only call aio_complete() after returning -EIOCBQUEUED.
direct_io_worker() will only return -EIOCBQUEUED when it is not the last to
drop the dio refcount and the aio bio completion path will only call
aio_complete() when it is the last to drop the dio refcount.
direct_io_worker() can ensure that it is the last to drop the reference count
by waiting for bios to drain.  It does this for sync ops, of course, and for
partial dio writes that must fall back to buffered and for aio ops that saw
errors during submission.

This means that operations that end up waiting, even if they were issued as
aio ops, will not call aio_complete() from dio.  Instead we return the return
code of the operation and let the aio core call aio_complete().  This is
purposely done to fix a bug where AIO DIO file extensions would call
aio_complete() before their callers have a chance to update i_size.

Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers
no longer have to translate for it.  XFS needs to be careful not to free
resources that will be used during AIO completion if -EIOCBQUEUED is returned.
 We maintain the previous behaviour of trying to write fs metadata for O_SYNC
aio+dio writes.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <xfs-masters@oss.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:21 -08:00
..
allocpercpu.c [PATCH] Allow NULL pointers in percpu_free 2006-12-07 08:39:22 -08:00
backing-dev.c [PATCH] separate bdi congestion functions from queue congestion functions 2006-10-20 10:26:35 -07:00
bootmem.c [PATCH] remove EXPORT_UNUSED_SYMBOL'ed symbols 2006-12-07 08:39:44 -08:00
bounce.c [PATCH] BLOCK: Separate the bounce buffering code from the highmem code [try #6] 2006-09-30 20:32:11 +02:00
fadvise.c [PATCH] mm: change uses of f_{dentry,vfsmnt} to use f_path 2006-12-08 08:28:43 -08:00
filemap_xip.c [PATCH] mm: change uses of f_{dentry,vfsmnt} to use f_path 2006-12-08 08:28:43 -08:00
filemap.c [PATCH] dio: only call aio_complete() after returning -EIOCBQUEUED 2006-12-10 09:57:21 -08:00
filemap.h Remove all inclusions of <linux/config.h> 2006-10-04 03:38:54 -04:00
fremap.c [PATCH] kill install_file_pte's pte_val 2006-12-07 08:39:23 -08:00
highmem.c [PATCH] BLOCK: Separate the bounce buffering code from the highmem code [try #6] 2006-09-30 20:32:11 +02:00
hugetlb.c [PATCH] mm: make compound page destructor handling explicit 2006-12-07 08:39:25 -08:00
internal.h [PATCH] mm: VM_BUG_ON 2006-09-26 08:48:44 -07:00
Kconfig Fix "can not" in Documentation and Kconfig 2006-10-03 22:53:09 +02:00
madvise.c [PATCH] Fix MADV_REMOVE protection checking 2006-04-17 18:22:18 -07:00
Makefile [PATCH] separate bdi congestion functions from queue congestion functions 2006-10-20 10:26:35 -07:00
memory_hotplug.c [PATCH] Get rid of zone_table[] 2006-12-07 08:39:20 -08:00
memory.c [PATCH] read_zero_pagealigned() locking fix 2006-12-10 09:55:39 -08:00
mempolicy.c [PATCH] struct path: convert mm 2006-12-08 08:28:47 -08:00
mempool.c [PATCH] dm: work around mempool_alloc, bio_alloc_bioset deadlocks 2006-09-01 11:39:09 -07:00
migrate.c [PATCH] radix-tree: RCU lockless readside 2006-12-07 08:39:25 -08:00
mincore.c [PATCH] freepgt: sys_mincore ignore FIRST_USER_PGD_NR 2005-04-19 13:29:20 -07:00
mlock.c [PATCH] mlock cleanup 2006-12-07 08:39:22 -08:00
mmap.c [PATCH] mm: change uses of f_{dentry,vfsmnt} to use f_path 2006-12-08 08:28:43 -08:00
mmzone.c [PATCH] remove EXPORT_UNUSED_SYMBOL'ed symbols 2006-12-07 08:39:44 -08:00
mprotect.c [PATCH] paravirt: lazy mmu mode hooks.patch 2006-10-01 00:39:33 -07:00
mremap.c [PATCH] paravirt: lazy mmu mode hooks.patch 2006-10-01 00:39:33 -07:00
msync.c [PATCH] mm: msync() cleanup 2006-09-26 08:48:45 -07:00
nommu.c [PATCH] struct path: convert mm 2006-12-08 08:28:47 -08:00
oom_kill.c [PATCH] oom: less memdie 2006-12-07 08:39:20 -08:00
page_alloc.c [PATCH] fault-injection: defaults likely to please a new user 2006-12-08 08:29:03 -08:00
page_io.c [PATCH] swsusp: use block device offsets to identify swap locations 2006-12-07 08:39:27 -08:00
page-writeback.c [PATCH] io-accounting: write accounting 2006-12-10 09:55:41 -08:00
pdflush.c [PATCH] Add include/linux/freezer.h and move definitions from sched.h 2006-12-07 08:39:27 -08:00
prio_tree.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
readahead.c [PATCH] io-accounting-read-accounting nfs fix 2006-12-10 09:55:41 -08:00
rmap.c [PATCH] mm: more commenting on lock ordering 2006-10-20 10:26:44 -07:00
shmem_acl.c [PATCH] Fix typos in mm/shmem_acl.c 2006-10-11 11:14:23 -07:00
shmem.c [PATCH] mm: change uses of f_{dentry,vfsmnt} to use f_path 2006-12-08 08:28:43 -08:00
slab.c [PATCH] fault-injection: defaults likely to please a new user 2006-12-08 08:29:03 -08:00
slob.c [PATCH] Make kmem_cache_destroy() return void 2006-09-27 08:26:11 -07:00
sparse.c [PATCH] numa node ids are int, page_to_nid and zone_to_nid should return int 2006-12-07 08:39:23 -08:00
swap_state.c [PATCH] lockdep: locking init debugging improvement 2006-07-03 15:27:02 -07:00
swap.c [PATCH] hotplug CPU: clean up hotcpu_notifier() use 2006-12-07 08:39:39 -08:00
swapfile.c [PATCH] mm: change uses of f_{dentry,vfsmnt} to use f_path 2006-12-08 08:28:43 -08:00
thrash.c [PATCH] make mm/thrash.c:global_faults static 2006-12-07 08:39:22 -08:00
tiny-shmem.c [PATCH] struct path: convert mm 2006-12-08 08:28:47 -08:00
truncate.c [PATCH] io-accounting: write-cancel accounting 2006-12-10 09:55:41 -08:00
util.c [PATCH] slab: clean up leak tracking ifdefs a little bit 2006-10-04 07:55:13 -07:00
vmalloc.c [PATCH] Fix strange size check in __get_vm_area_node() 2006-11-16 11:43:38 -08:00
vmscan.c [PATCH] hotplug CPU: clean up hotcpu_notifier() use 2006-12-07 08:39:39 -08:00
vmstat.c [PATCH] struct seq_operations and struct file_operations constification 2006-12-07 08:39:46 -08:00