linux

Author	SHA1	Message	Date
Henry C Chang	d3d0720d4a	ceph: do not use i_wrbuffer_ref as refcount for Fb cap We increments i_wrbuffer_ref when taking the Fb cap. This breaks the dirty page accounting and causes looping in __ceph_do_pending_vmtruncate, and ceph client hangs. This bug can be reproduced occasionally by running blogbench. Add a new field i_wb_ref to inode and dedicate it to Fb reference counting. Signed-off-by: Henry C Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-11 10:44:48 -07:00
Henry C Chang	a26a185d27	ceph: fix list_add in ceph_put_snap_realm Signed-off-by: Henry C Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-11 10:44:36 -07:00
Henry C Chang	7d8e18a69d	ceph: print debug message before put mds session The mds session, s, could be freed during ceph_put_mds_session. Move dout before ceph_put_mds_session. Signed-off-by: Henry C Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-11 10:44:34 -07:00
Sage Weil	fca65b4ad7	ceph: do not call __mark_dirty_inode under i_lock The __mark_dirty_inode helper now takes i_lock as of `250df6ed`. Fix the one ceph callers that held i_lock (__ceph_mark_dirty_caps) to return the flags value so that the callers can do it outside of i_lock. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-04 12:56:45 -07:00
Henry C Chang	8c71897be2	ceph: handle ceph_osdc_new_request failure in ceph_writepages_start We should unlock the page and return -ENOMEM if ceph_osdc_new_request failed. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-03 09:28:12 -07:00
Sage Weil	3772d26d87	ceph: use ihold() when i_lock is held See `0444d76ae6`. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-03 09:28:08 -07:00
Linus Torvalds	42933bac11	Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 * 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6: Fix common misspellings	2011-04-07 11:14:49 -07:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Linus Torvalds	50f3515828	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: Create a new key type "ceph". libceph: Get secret from the kernel keys api when mounting with key=NAME. ceph: Move secret key parsing earlier. libceph: fix null dereference when unregistering linger requests ceph: unlock on error in ceph_osdc_start_request() ceph: fix possible NULL pointer dereference ceph: flush msgr_wq during mds_client shutdown	2011-03-30 09:46:09 -07:00
Tommi Virtanen	8323c3aa74	ceph: Move secret key parsing earlier. This makes the base64 logic be contained in mount option parsing, and prepares us for replacing the homebew key management with the kernel key retention service. Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-29 12:11:16 -07:00
Dave Chinner	0444d76ae6	fs: don't use igrab() while holding i_lock Fix the incorrect use of igrab() inside the i_lock in NFS and Ceph‥ If we are already holding the i_lock, we have a reference to the inode so we can safely use ihold() to gain an extra reference. This avoids hangs due to lock recursion on the i_lock now that the inode_lock is gone and igrab() uses the i_lock itself. Signed-off-by: Dave Chinner <dchinner@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: Ryan Mallon <ryan@bluewatersys.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-29 07:50:34 -07:00
Sage Weil	ef550f6f4f	ceph: flush msgr_wq during mds_client shutdown The release method for mds connections uses a backpointer to the mds_client, so we need to flush the workqueue of any pending work (and ceph_connection references) prior to freeing the mds_client. This fixes an oops easily triggered under UML by while true ; do mount ... ; umount ... ; done Also fix an outdated comment: the flush in ceph_destroy_client only flushes OSD connections out. This bug is basically an artifact of the ceph -> ceph+libceph conversion. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-25 13:27:48 -07:00
Sage Weil	147851d2dc	ceph: rename dentry_release -> d_release, fix comment Just for consistency's sake. Fix obsolete comment too. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-21 12:24:26 -07:00
Henry C Chang	49bcb93236	ceph: add request to the tail of unsafe write list In sync_write_wait(), we assume that the newest request is at the tail of unsafe write list. We should maintain the semantics here. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-21 12:24:25 -07:00
Henry C Chang	78a255654f	ceph: remove request from unsafe list if it is canceled/timed out This fixes the list corruption warning like this: ------------[ cut here ]------------ WARNING: at lib/list_debug.c:30 __list_add+0x68/0x81() Hardware name: X8DTU list_add corruption. prev->next should be next (ffff880618931250), but was (null). (prev=ffff880c188b9130). Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs ceph libceph libcrc32c sunrpc ipv6 fuse igb i2c_i801 ioatdma i2c_core iTCO_wdt iTCO_vendor_support joydev dca serio_raw usb_storage [last unloaded: scsi_wait_scan] Pid: 10977, comm: smbd Tainted: G W 2.6.32.23-170.Elaster.xendom0.fc12.x86_64 #1 Call Trace: [<ffffffff8105753c>] warn_slowpath_common+0x7c/0x94 [<ffffffff810575ab>] warn_slowpath_fmt+0x41/0x43 [<ffffffff812351a3>] __list_add+0x68/0x81 [<ffffffffa014799d>] ceph_aio_write+0x614/0x8a2 [ceph] [<ffffffff8111d2a0>] do_sync_write+0xe8/0x125 [<ffffffff81075a1f>] ? autoremove_wake_function+0x0/0x39 [<ffffffff811f21ec>] ? selinux_file_permission+0x5c/0xb3 [<ffffffff811e8521>] ? security_file_permission+0x16/0x18 [<ffffffff8111d864>] vfs_write+0xae/0x10b [<ffffffff8111d91b>] sys_pwrite64+0x5a/0x76 [<ffffffff81012d32>] system_call_fastpath+0x16/0x1b ---[ end trace 08573eb9f07ff6f4 ]--- Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-21 12:24:24 -07:00
Sage Weil	80456f8672	ceph: move readahead default to fs/ceph from libceph Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-21 12:24:23 -07:00
Yehuda Sadeh	ad1fee96cb	ceph: add ino32 mount option The ino32 mount option forces the ceph fs to report 32 bit ino values. This is useful for 64 bit kernels with 32 bit userspace. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2011-03-21 12:24:22 -07:00
Sage Weil	21f3b5f1bb	ceph: remove debugfs debug cruft Whoops! Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-21 12:24:20 -07:00
Sage Weil	09adc80c61	ceph: preserve I_COMPLETE across rename d_move puts the renamed dentry at the end of d_subdirs, screwing with our cached dentry directory offsets. We were just clearing I_COMPLETE to avoid any possibility of trouble. However, assigning the renamed dentry an offset at the end of the directory (to match it's new d_subdirs position) is sufficient to maintain correct behavior and hold onto I_COMPLETE. This is especially important for workloads like rsync, which renames files into place. Before, we would lose I_COMPLETE and do MDS lookups for each file. With this patch we only talk to the MDS on create and rename. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-15 09:14:03 -07:00
Al Viro	0eb980e317	ceph: fix d_revalidate oopsen on NFS exports can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:44:05 -05:00
Sage Weil	455cec0abf	ceph: no .snap inside of snapped namespace Otherwise you can do things like # mkdir .snap/foo # cd .snap/foo/.snap # ls <badness> Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-04 12:25:09 -08:00
Sage Weil	16a8b70a5a	ceph: do not clear I_COMPLETE from d_release First, this was racy anyway: d_release isn't called until well after the dentry is unhashed. Second, this runs afoul of the recent dcache change that clears d_parent prior to calling d_release (`949854d0`), causing a NULL pointer dereference. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-03 10:09:52 -08:00
Sage Weil	b545cc1505	ceph: do not set I_COMPLETE Do not set the I_COMPLETE flag on directories until we resolve races with dcache pruning. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-03 10:09:51 -08:00
Sage Weil	9bde178d05	Revert "ceph: keep reference to parent inode on ceph_dentry" This reverts commit `97d79b403e`. This fails to account for d_parent changes due to rename or disconnected dentries due to submounts or NFS reexports. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-03 10:09:50 -08:00
Linus Torvalds	8bd89ca220	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: keep reference to parent inode on ceph_dentry ceph: queue cap_snaps once per realm libceph: fix socket write error handling libceph: fix socket read error handling	2011-02-21 15:01:38 -08:00
Yehuda Sadeh	97d79b403e	ceph: keep reference to parent inode on ceph_dentry When creating a new dentry we now hold a reference to the parent inode in the ceph_dentry. This is required due to the new RCU changes from `949854d0`, which set dentry->d_parent to NULL in d_kill before calling the ->release() callback. If/when that behavior is changed, we can revert this hack. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-02-19 19:59:14 -08:00
Sage Weil	e8e1ba96b2	ceph: queue cap_snaps once per realm We were forming a dirty list, and then queueing cap_snaps for each realm _and_ its children, regardless of whether the children were already in the dirty list. This meant we did it twice for some realms. Which in turn meant we corrupted mdsc->snap_flush_list when the cap_snap was re-added to the list it was already on, and could trigger an infinite loop. We were also using recursion to do reach all the children, a no-no when stack is limited. Instead, (re)queue any children on the dirty list, avoiding processing anything twice and avoiding any recursion. Signed-off-by: Sage Weil <sage@newdream.net>	2011-02-04 20:45:58 -08:00
Linus Torvalds	b12ece7d85	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: avoid picking MDS that is not active ceph: avoid immediate cap check after import ceph: fix flushing of caps vs cap import ceph: fix erroneous cap flush to non-auth mds ceph: fix cap_wanted_delay_{min,max} mount option initialization ceph: fix xattr rbtree search ceph: fix getattr on directory when using norbytes	2011-01-28 12:12:58 +10:00
Sage Weil	d66bbd441c	ceph: avoid picking MDS that is not active Ignore replication or auth frag data if it indicates an MDS that is not active. This can happen if the MDS shuts down and the client has stale data about the namespace distribution across the MDS cluster. If that's the case, fall back to directing the request based on the auth cap (which should always be accurate). Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-25 08:16:37 -08:00
Sage Weil	7e57b81c76	ceph: avoid immediate cap check after import The NODELAY flag avoids the heuristics that delay cap (issued/wanted) release. There's no reason for that after we import a cap, and it kills whatever benefit we get from those delays. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-19 09:23:26 -08:00
Sage Weil	088b3f5e9e	ceph: fix flushing of caps vs cap import If we are mid-flush and a cap is migrated to another node, we need to resend the cap flush message to the new MDS, and do so with the original flush_seq to avoid leaking across a sync boundary. Previously we didn't redo the flush (we only flushed newly dirty data), which would cause a later sync to hang forever. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-19 09:23:25 -08:00
Sage Weil	24be0c4810	ceph: fix erroneous cap flush to non-auth mds The int flushing is global and not clear on each iteration of the loop, which can cause a second flush of caps to any MDSs with ids greater than the auth. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-19 09:23:24 -08:00
Sage Weil	50aac4fec5	ceph: fix cap_wanted_delay_{min,max} mount option initialization These were initialized to 0 instead of the default, fallout from the RBD refactor in `3d14c5d2b6`. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-19 09:23:22 -08:00
Sage Weil	17db143fc0	ceph: fix xattr rbtree search Fix xattr name comparison in rbtree search for strings that share a prefix. The name argument is null terminated, but the xattr name is not, so we need to use strncmp, but that means adjusting for the case where name is a prefix of xattr->name. The corresponding case in __set_xattr() already handles this properly (although in that case name is also not null terminated). Reported-by: Sergiy Kibrik <sakib@meta.ua> Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-13 15:50:11 -08:00
Yehuda Sadeh	1c1266bb91	ceph: fix getattr on directory when using norbytes The norbytes mount option was broken, and when doing getattr on a directory it return the rbytes instead of the number of entities. This commit fixes it. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-13 15:50:06 -08:00
Linus Torvalds	a170315420	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix cleanup when trying to mount inexistent image net/ceph: make ceph_msgr_wq non-reentrant ceph: fsc->*_wq's aren't used in memory reclaim path ceph: Always free allocated memory in osdmap_decode() ceph: Makefile: Remove unnessary code ceph: associate requests with opening sessions ceph: drop redundant r_mds field ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS ceph: add dir_layout to inode	2011-01-13 10:25:24 -08:00
Tejun Heo	01e6acc4ea	ceph: fsc->_wq's aren't used in memory reclaim path fsc->_wq's aren't depended upon during memory reclaim. Convert to alloc_workqueue() w/o WQ_MEM_RECLAIM. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Sage Weil <sage@newdream.net> Cc: ceph-devel@vger.kernel.org Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:14 -08:00
Tracey Dent	582c86e690	ceph: Makefile: Remove unnessary code Remove the if and else conditional because the code is in mainline and there is no need in it being there. Also, Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:13 -08:00
Sage Weil	dc69e2e9fc	ceph: associate requests with opening sessions Associate request with sessions that aren't yep open. This makes the debugfs mdsc request list more informative. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:13 -08:00
Sage Weil	4af25fdda6	ceph: drop redundant r_mds field The r_mds field is redundant, since we can find the same information at r_session->s_mds, and when r_session is NULL then r_mds is meaningless. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:13 -08:00
Sage Weil	14303d20f3	ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS This implements the DIRLAYOUTHASH protocol feature, which passes the dir layout over the wire from the MDS. This gives the client knowledge of the correct hash function to use for mapping dentries among dir fragments. Note that if this feature is _not_ present on the client but is on the MDS, the client may misdirect requests. This will result in a forward and degrade performance. It may also result in inaccurate NFS filehandle generation, which will prevent fh resolution when the inode is not present in the client cache and the parent directories have been fragmented. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:13 -08:00
Sage Weil	6c0f3af72c	ceph: add dir_layout to inode Add a ceph_dir_layout to the inode, and calculate dentry hash values based on the parent directory's specified dir_hash function. This is needed because the old default Linux dcache hash function is extremely week and leads to a poor distribution of files among dir fragments. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:12 -08:00
Nick Piggin	b74c79e993	fs: provide rcu-walk aware permission i_ops Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:29 +11:00
Nick Piggin	34286d6662	fs: rcu-walk aware d_revalidate method Require filesystems be aware of .d_revalidate being called in rcu-walk mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning -ECHILD from all implementations. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:29 +11:00
Nick Piggin	fb045adb99	fs: dcache reduce branches in lookup path Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])d_op([[:space:]])=' \| xargs sed -e 's/\([^\t ]\)->d_op = \(.\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]\)\.d_op = \(.\);/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:28 +11:00
Nick Piggin	fa0d7e3de6	fs: icache RCU free inodes RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:26 +11:00
Nick Piggin	b5c84bf6f6	fs: dcache remove dcache_lock dcache_lock no longer protects anything. remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:23 +11:00
Nick Piggin	2fd6b7f507	fs: dcache scale subdirs Protect d_subdirs and d_child with d_lock, except in filesystems that aren't using dcache_lock for these anyway (eg. using i_mutex). Note: if we change the locking rule in future so that ->d_child protection is provided only with ->d_parent->d_lock, it may allow us to reduce some locking. But it would be an exception to an otherwise regular locking scheme, so we'd have to see some good results. Probably not worthwhile. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:21 +11:00
Nick Piggin	da5029563a	fs: dcache scale d_unhashed Protect d_unhashed(dentry) condition with d_lock. This means keeping DCACHE_UNHASHED bit in synch with hash manipulations. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:21 +11:00
Nick Piggin	b7ab39f631	fs: dcache scale dentry refcount Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:21 +11:00

1 2 3 4 5 ...

530 Commits