linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-14 16:12:02 +00:00

Author	SHA1	Message	Date
Alexander Aring	b92a4e3f86	fs: dlm: change posix lock sigint handling This patch changes the handling of a plock operation that was interrupted while waiting for a user space reply from dlm_controld. (This is not the lock blocking state, i.e. locks_lock_file_wait().) Currently, when an op is interrupted while waiting on user space, the op is removed. When the user space result later arrives, a kernel message is loggged: "dev_write no op...". This can be seen from a test such as "stress-ng --fcntl 100" and interrupting it with ctrl-c. Now, leave the op in place when interrupted and remove it when the result arrives (the result will be ignored.) With this change, the logged message is not expected to appear, and would indicate a bug. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:53:05 -05:00
Alexander Aring	4d413ae9ce	fs: dlm: use dlm_plock_info for do_unlock_close This patch refactors do_unlock_close() by using only struct dlm_plock_info as a parameter. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:53:04 -05:00
Alexander Aring	ea06d4cabf	fs: dlm: change plock interrupted message to debug again This patch reverses the commit `bcfad4265c` ("dlm: improve plock logging if interrupted") by moving it to debug level and notifying the user an op was removed. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:52:36 -05:00
Alexander Aring	19d7ca051d	fs: dlm: add pid to debug log This patch adds the pid information which requested the lock operation to the debug log output. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-23 14:41:39 -05:00
Alexander Aring	976a062434	fs: dlm: plock use list_first_entry This patch will use the list helper list_first_entry() instead of using list_entry() to get the first element of a list. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-23 14:22:10 -05:00
Jakob Koschel	dc1acd5c94	dlm: replace usage of found with dedicated list iterator variable To move the list iterator variable into the list_for_each_entry_() macro in the future it should be avoided to use the list iterator variable after the loop body. To never* use the list iterator variable after the loop it was concluded to use a separate iterator variable instead of a found boolean [1]. This removes the need to use a found variable and simply checking if the variable was set, can determine if the break/goto was hit. Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:03:14 -05:00
Alexander Aring	314a5540ff	dlm: move global to static inits Instead of init global module at module loading time we can move the initialization of those global variables at memory initialization of the module loader. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:16 -05:00
Alexander Aring	16d58904df	dlm: remove unnecessary INIT_LIST_HEAD() There is no need to call INIT_LIST_HEAD() when it's set directly afterwards by list_add_tail(). Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:13 -05:00
Alexander Aring	bcfad4265c	dlm: improve plock logging if interrupted This patch changes the log level if a plock is removed when interrupted from debug to info. Additional it signals now that the plock entity was removed to let the user know what's happening. If on a dev_write() a pending plock cannot be find it will signal that it might have been removed because wait interruption. Before this patch there might be a "dev_write no op ..." info message and the users can only guess that the plock was removed before because the wait interruption. To be sure that is the case we log both messages on the same log level. Let both message be logged on info layer because it should not happened a lot and if it happens it should be clear why the op was not found. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:09 -05:00
Alexander Aring	a800ba77fd	dlm: rearrange async condition return This patch moves the return of FILE_LOCK_DEFERRED a little bit earlier than checking afterwards again if the request was an asynchronous request. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:05 -05:00
Alexander Aring	bcbb4ba6c9	dlm: cleanup plock_op vs plock_xop Lately the different casting between plock_op and plock_xop and list holders which was involved showed some issues which were hard to see. This patch removes the "plock_xop" structure and introduces a "struct plock_async_data". This structure will be set in "struct plock_op" in case of asynchronous lock handling as the original "plock_xop" was made for. There is no need anymore to cast pointers around for additional fields in case of asynchronous lock handling. As disadvantage another allocation was introduces but only needed in the asynchronous case which is currently only used in combination with nfs lockd. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:02 -05:00
Alexander Aring	a559790caa	dlm: replace sanity checks with WARN_ON There are several sanity checks and recover handling if they occur in the dlm plock handling. From my understanding those operation can't run in parallel with any list manipulation which involved setting the list holder of plock_op, if so we have a bug which this sanity check will warn about. Previously if such sanity check occurred the dlm plock handling was trying to recover from it by deleting the plock_op from a list which the holder was set to. However there is a bug in the dlm plock handling if this case ever happens. To make such bugs are more visible for further investigations we add a WARN_ON() on those sanity checks and remove the recovering handling because other possible side effects. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:01:58 -05:00
Alexander Aring	42252d0d2a	dlm: fix plock invalid read This patch fixes an invalid read showed by KASAN. A unlock will allocate a "struct plock_op" and a followed send_op() will append it to a global send_list data structure. In some cases a followed dev_read() moves it to recv_list and dev_write() will cast it to "struct plock_xop" and access fields which are only available in those structures. At this point an invalid read happens by accessing those fields. To fix this issue the "callback" field is moved to "struct plock_op" to indicate that a cast to "plock_xop" is allowed and does the additional "plock_xop" handling if set. Example of the KASAN output which showed the invalid read: [ 2064.296453] ================================================================== [ 2064.304852] BUG: KASAN: slab-out-of-bounds in dev_write+0x52b/0x5a0 [dlm] [ 2064.306491] Read of size 8 at addr ffff88800ef227d8 by task dlm_controld/7484 [ 2064.308168] [ 2064.308575] CPU: 0 PID: 7484 Comm: dlm_controld Kdump: loaded Not tainted 5.14.0+ #9 [ 2064.310292] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 2064.311618] Call Trace: [ 2064.312218] dump_stack_lvl+0x56/0x7b [ 2064.313150] print_address_description.constprop.8+0x21/0x150 [ 2064.314578] ? dev_write+0x52b/0x5a0 [dlm] [ 2064.315610] ? dev_write+0x52b/0x5a0 [dlm] [ 2064.316595] kasan_report.cold.14+0x7f/0x11b [ 2064.317674] ? dev_write+0x52b/0x5a0 [dlm] [ 2064.318687] dev_write+0x52b/0x5a0 [dlm] [ 2064.319629] ? dev_read+0x4a0/0x4a0 [dlm] [ 2064.320713] ? bpf_lsm_kernfs_init_security+0x10/0x10 [ 2064.321926] vfs_write+0x17e/0x930 [ 2064.322769] ? __fget_light+0x1aa/0x220 [ 2064.323753] ksys_write+0xf1/0x1c0 [ 2064.324548] ? __ia32_sys_read+0xb0/0xb0 [ 2064.325464] do_syscall_64+0x3a/0x80 [ 2064.326387] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 2064.327606] RIP: 0033:0x7f807e4ba96f [ 2064.328470] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 39 87 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 7c 87 f8 ff 48 [ 2064.332902] RSP: 002b:00007ffd50cfe6e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 [ 2064.334658] RAX: ffffffffffffffda RBX: 000055cc3886eb30 RCX: 00007f807e4ba96f [ 2064.336275] RDX: 0000000000000040 RSI: 00007ffd50cfe7e0 RDI: 0000000000000010 [ 2064.337980] RBP: 00007ffd50cfe7e0 R08: 0000000000000000 R09: 0000000000000001 [ 2064.339560] R10: 000055cc3886eb30 R11: 0000000000000293 R12: 000055cc3886eb80 [ 2064.341237] R13: 000055cc3886eb00 R14: 000055cc3886f590 R15: 0000000000000001 [ 2064.342857] [ 2064.343226] Allocated by task 12438: [ 2064.344057] kasan_save_stack+0x1c/0x40 [ 2064.345079] __kasan_kmalloc+0x84/0xa0 [ 2064.345933] kmem_cache_alloc_trace+0x13b/0x220 [ 2064.346953] dlm_posix_unlock+0xec/0x720 [dlm] [ 2064.348811] do_lock_file_wait.part.32+0xca/0x1d0 [ 2064.351070] fcntl_setlk+0x281/0xbc0 [ 2064.352879] do_fcntl+0x5e4/0xfe0 [ 2064.354657] __x64_sys_fcntl+0x11f/0x170 [ 2064.356550] do_syscall_64+0x3a/0x80 [ 2064.358259] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 2064.360745] [ 2064.361511] Last potentially related work creation: [ 2064.363957] kasan_save_stack+0x1c/0x40 [ 2064.365811] __kasan_record_aux_stack+0xaf/0xc0 [ 2064.368100] call_rcu+0x11b/0xf70 [ 2064.369785] dlm_process_incoming_buffer+0x47d/0xfd0 [dlm] [ 2064.372404] receive_from_sock+0x290/0x770 [dlm] [ 2064.374607] process_recv_sockets+0x32/0x40 [dlm] [ 2064.377290] process_one_work+0x9a8/0x16e0 [ 2064.379357] worker_thread+0x87/0xbf0 [ 2064.381188] kthread+0x3ac/0x490 [ 2064.383460] ret_from_fork+0x22/0x30 [ 2064.385588] [ 2064.386518] Second to last potentially related work creation: [ 2064.389219] kasan_save_stack+0x1c/0x40 [ 2064.391043] __kasan_record_aux_stack+0xaf/0xc0 [ 2064.393303] call_rcu+0x11b/0xf70 [ 2064.394885] dlm_process_incoming_buffer+0x47d/0xfd0 [dlm] [ 2064.397694] receive_from_sock+0x290/0x770 [dlm] [ 2064.399932] process_recv_sockets+0x32/0x40 [dlm] [ 2064.402180] process_one_work+0x9a8/0x16e0 [ 2064.404388] worker_thread+0x87/0xbf0 [ 2064.406124] kthread+0x3ac/0x490 [ 2064.408021] ret_from_fork+0x22/0x30 [ 2064.409834] [ 2064.410599] The buggy address belongs to the object at ffff88800ef22780 [ 2064.410599] which belongs to the cache kmalloc-96 of size 96 [ 2064.416495] The buggy address is located 88 bytes inside of [ 2064.416495] 96-byte region [ffff88800ef22780, ffff88800ef227e0) [ 2064.422045] The buggy address belongs to the page: [ 2064.424635] page:00000000b6bef8bc refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xef22 [ 2064.428970] flags: 0xfffffc0000200(slab\|node=0\|zone=1\|lastcpupid=0x1fffff) [ 2064.432515] raw: 000fffffc0000200 ffffea0000d68b80 0000001400000014 ffff888001041780 [ 2064.436110] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 [ 2064.439813] page dumped because: kasan: bad access detected [ 2064.442548] [ 2064.443310] Memory state around the buggy address: [ 2064.445988] ffff88800ef22680: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc [ 2064.449444] ffff88800ef22700: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc [ 2064.452941] >ffff88800ef22780: 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc [ 2064.456383] ^ [ 2064.459386] ffff88800ef22800: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc [ 2064.462788] ffff88800ef22880: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc [ 2064.466239] ================================================================== reproducer in python: import argparse import struct import fcntl import os parser = argparse.ArgumentParser() parser.add_argument('-f', '--file', help='file to use fcntl, must be on dlm lock filesystem e.g. gfs2') args = parser.parse_args() f = open(args.file, 'wb+') lockdata = struct.pack('hhllhh', fcntl.F_WRLCK,0,0,0,0,0) fcntl.fcntl(f, fcntl.F_SETLK, lockdata) lockdata = struct.pack('hhllhh', fcntl.F_UNLCK,0,0,0,0,0) fcntl.fcntl(f, fcntl.F_SETLK, lockdata) Fixes: `586759f03e` ("gfs2: nfs lock support for gfs2") Cc: stable@vger.kernel.org Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:01:53 -05:00
Thomas Gleixner	7336d0e654	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 Based on 1 normalized pattern(s): this copyrighted material is made available to anyone wishing to use modify copy or redistribute it subject to the terms and conditions of the gnu general public license version 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 44 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-06-05 17:37:12 +02:00
Linus Torvalds	a9a08845e9	vfs: do bulk POLL* -> EPOLL* replacement This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V \| grep -v '^t' \| grep -v /um/ \| grep -v '^sa' \| grep -v '/poll.h$'\|grep -v '^D'` for f in $L; do sed -i "-es/^$[^\"]$$\<POLL$V\>$/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-02-11 14:34:03 -08:00
Al Viro	076ccb76e1	fs: annotate ->poll() instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-11-27 16:20:05 -05:00
Benjamin Coddington	9d5b86ac13	fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks Since commit `c69899a17c` "NFSv4: Update of VFS byte range lock must be atomic with the stateid update", NFSv4 has been inserting locks in rpciod worker context. The result is that the file_lock's fl_nspid is the kworker's pid instead of the original userspace pid. The fl_nspid is only used to represent the namespaced virtual pid number when displaying locks or returning from F_GETLK. There's no reason to set it for every inserted lock, since we can usually just look it up from fl_pid. So, instead of looking up and holding struct pid for every lock, let's just look up the virtual pid number from fl_pid when it is needed. That means we can remove fl_nspid entirely. The translaton and presentation of fl_pid should handle the following four cases: 1 - F_GETLK on a remote file with a remote lock: In this case, the filesystem should determine the l_pid to return here. Filesystems should indicate that the fl_pid represents a non-local pid value that should not be translated by returning an fl_pid <= 0. 2 - F_GETLK on a local file with a remote lock: This should be the l_pid of the lock manager process, and translated. 3 - F_GETLK on a remote file with a local lock, and 4 - F_GETLK on a local file with a local lock: These should be the translated l_pid of the local locking process. Fuse was already doing the correct thing by translating the pid into the caller's namespace. With this change we must update fuse to translate to init's pid namespace, so that the locks API can then translate from init's pid namespace into the pid namespace of the caller. With this change, the locks API will expect that if a filesystem returns a remote pid as opposed to a local pid for F_GETLK, that remote pid will be <= 0. This signifies that the pid is remote, and the locks API will forego translating that pid into the pid namespace of the local calling process. Finally, we convert remote filesystems to present remote pids using negative numbers. Have lustre, 9p, ceph, cifs, and dlm negate the remote pid returned for F_GETLK lock requests. Since local pids will never be larger than PID_MAX_LIMIT (which is currently defined as <= 4 million), but pid_t is an unsigned int, we should have plenty of room to represent remote pids with negative numbers if we assume that remote pid numbers are similarly limited. If this is not the case, then we run the risk of having a remote pid returned for which there is also a corresponding local pid. This is a problem we have now, but this patch should reduce the chances of that occurring, while also returning those remote pid numbers, for whatever that may be worth. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2017-07-16 10:28:22 -04:00
Linus Torvalds	d000f8d67f	dlm for 4.4 This includes one simple fix to make posix locks interruptible by signals in cases where a signal handler is used. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWO3ZiAAoJEDgbc8f8gGmqXc8P/j1GcuiMyHbRjlyGgExYYuFG l9eeoCAzUy4aE9PnN90ur/9FhYGQ3fEte4xaKDU7FMTsQIvNrKYdNAt6qWQdQI6M shogDw6hXwXJ42IPS9fuj2dEPFD5wBoMYjEGowzdAvsMH3cROyN03hIWqSWTL3jI UFSR7NjbnQwT8ZUAYcICEE5VerqsGzxGkaFF+V3fYASsZlALTlQkVHT5DQzCkXgq 0CkNhsMx5H4Ng9y/2dnhPj3y24NqbhdtLX4dkcKevMmHP5FJ/rEI82oizxPgJ8oZ QlcSOUZatNuqLVAStecmsd5sH80/IDspnpMDxnQCKnioNq3x6YXXfhyv5CKB6Ahy atA3SlYDACiZz5tydJ/97DJvvIrF2rUETPXk2Lobc972UU99r8zxCUah8xv4ThD/ DtuSkqNnTmXjMcTssHDqo/Kg16dZxpx+itxsWCEivfZm6EL1j5RAvZO5G04wMmry D/FXDKT/FZR+xYDIg1FLc1uOMldeRbMWhb+zGfTAnYy0aH43oyePVddbC+lSuVfp Pat2avXoovR59+7nhFk+s+xf3c8oKMoSwCZOso4OoVySRZQmE1dH6m6D5RLkgRHw nTGggRRAAOsLoiYFtnCKXHkxUTGZWDJfwtv1OI1IABqdoSj5Px/4JAswWj1e5+dH haCGlyvMDqK8ImaekGcZ =m+FJ -----END PGP SIGNATURE----- Merge tag 'dlm-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm update from David Teigland: "This includes one simple fix to make posix locks interruptible by signals in cases where a signal handler is used" * tag 'dlm-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: make posix locks interruptible	2015-11-05 11:15:25 -08:00
Eric Ren	a6b1533e9a	dlm: make posix locks interruptible Replace wait_event_killable with wait_event_interruptible so that a program waiting for a posix lock can be interrupted by a signal. With the killable version, a program was not interruptible by a signal if it had a signal handler set for it, overriding the default action of terminating the process. Signed-off-by: Eric Ren <zren@suse.com> Signed-off-by: David Teigland <teigland@redhat.com>	2015-11-03 10:38:22 -06:00
Benjamin Coddington	4f6563677a	Move locks API users to locks_lock_inode_wait() Instead of having users check for FL_POSIX or FL_FLOCK to call the correct locks API function, use the check within locks_lock_inode_wait(). This allows for some later cleanup. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>	2015-10-22 14:57:36 -04:00
Greg Kroah-Hartman	f368ed6088	char: make misc_deregister a void function With well over 200+ users of this api, there are a mere 12 users that actually checked the return value of this function. And all of them really didn't do anything with that information as the system or module was shutting down no matter what. So stop pretending like it matters, and just return void from misc_deregister(). If something goes wrong in the call, you will get a WARNING splat in the syslog so you know how to fix up your driver. Other than that, there's nothing that can go wrong. Cc: Alasdair Kergon <agk@redhat.com> Cc: Neil Brown <neilb@suse.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Andreas Dilger <andreas.dilger@intel.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Wim Van Sebroeck <wim@iguana.be> Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: David Teigland <teigland@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <jlbec@evilplan.org> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-08-05 10:35:49 -07:00
Joe Perches	d0449b90f8	locks: Remove unused conf argument from lm_grant This argument is always NULL so don't pass it around. [jlayton: remove dependencies on previous patches in series] Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-09 16:01:06 -04:00
David Teigland	9000831839	dlm: avoid unnecessary posix unlock When the kernel clears flocks/plocks during close, it calls posix unlock when there are flocks but no posix locks. Without this patch, that unnecessary posix unlock is passed to userland (dlm_controld), across the cluster, and back to the kernel. This can create a lot of plock activity, even when no posix locks had been used. This patch copies the nfs approach, and skips the full posix unlock if there is no plock found during the vfs unlock phase. Signed-off-by: David Teigland <teigland@redhat.com>	2013-04-08 12:03:15 -05:00
J. Bruce Fields	8fb47a4fbf	locks: rename lock-manager ops Both the filesystem and the lock manager can associate operations with a lock. Confusingly, one of them (fl_release_private) actually has the same name in both operation structures. It would save some confusion to give the lock-manager ops different names. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-20 20:23:19 -04:00
David Teigland	901025d2f3	dlm: make plock operation killable Allow processes blocked on plock requests to be interrupted when they are killed. This leaves the problem of cleaning up the lock state in userspace. This has three parts: 1. Add a flag to unlock operations sent to userspace indicating the file is being closed. Userspace will then look for and clear any waiting plock operations that were abandoned by an interrupted process. 2. Queue an unlock-close operation (like in 1) to clean up userspace from an interrupted plock request. This is needed because the vfs will not send a cleanup-unlock if it sees no locks on the file, which it won't if the interrupted operation was the only one. 3. Do not use replies from userspace for unlock-close operations because they are unnecessary (they are just cleaning up for the process which did not make an unlock call). This also simplifies the new unlock-close generated from point 2. Signed-off-by: David Teigland <teigland@redhat.com>	2011-05-23 10:47:06 -05:00
Arnd Bergmann	6038f373a3	llseek: automatically add .llseek fop All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode i, struct file f) { <+... ( nonseekable_open(...) \| nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable / }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, / open uses nonseekable / }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, / we have seq_read / }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, / readdir is present / }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, / read accesses f_pos / }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, / write accesses f_pos / }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, / read and write both use no f_pos / }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, / write uses no f_pos / }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, / read uses no f_pos / }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, / no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>	2010-10-15 15:53:27 +02:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Linus Torvalds	02412f49f6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: always use GFP_NOFS	2009-12-10 09:33:59 -08:00
André Goddard Rosa	af901ca181	tree-wide: fix assorted typos all over the place That is "success", "unknown", "through", "performance", "[re\|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over\|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-12-04 15:39:55 +01:00
David Teigland	573c24c4af	dlm: always use GFP_NOFS Replace all GFP_KERNEL and ls_allocation with GFP_NOFS. ls_allocation would be GFP_KERNEL for userland lockspaces and GFP_NOFS for file system lockspaces. It was discovered that any lockspaces on the system can affect all others by triggering memory reclaim in the file system which could in turn call back into the dlm to acquire locks, deadlocking dlm threads that were shared by all lockspaces, like dlm_recv. Signed-off-by: David Teigland <teigland@redhat.com>	2009-11-30 16:34:43 -06:00
David Teigland	c78a87d0a1	dlm: fix plock use-after-free Fix a regression from the original addition of nfs lock support `586759f03e`. When a synchronous (non-nfs) plock completes, the waiting thread will wake up and free the op struct. This races with the user thread in dev_write() which goes on to read the op's callback field to check if the lock is async and needs a callback. This check can happen on the freed op. The fix is to note the callback value before the op can be freed. Signed-off-by: David Teigland <teigland@redhat.com>	2009-06-18 13:42:42 -05:00
Jeff Layton	20d5a39929	dlm: initialize file_lock struct in GETLK before copying conflicting lock dlm_posix_get fills out the relevant fields in the file_lock before returning when there is a lock conflict, but doesn't clean out any of the other fields in the file_lock. When nfsd does a NFSv4 lockt call, it sets the fl_lmops to nfsd_posix_mng_ops before calling the lower fs. When the lock comes back after testing a lock on GFS2, it still has that field set. This confuses nfsd into thinking that the file_lock is a nfsd4 lock. Fix this by making DLM reinitialize the file_lock before copying the fields from the conflicting lock. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2009-01-21 15:28:45 -06:00
David Teigland	24179f4880	dlm: fix plock notify callback to lockd We should use the original copy of the file_lock, fl, instead of the copy, flc in the lockd notify callback. The range in flc has been modified by posix_lock_file(), so it will not match a copy of the lock in lockd. Signed-off-by: David Teigland <teigland@redhat.com>	2009-01-21 15:28:45 -06:00
Miklos Szeredi	bde74e4bc6	locks: add special return value for asynchronous locks Use a special error value FILE_LOCK_DEFERRED to mean that a locking operation returned asynchronously. This is returned by posix_lock_file() for sleeping locks to mean that the lock has been queued on the block list, and will be woken up when it might become available and needs to be retried (either fl_lmops->fl_notify() is called or fl_wait is woken up). f_op->lock() to mean either the above, or that the filesystem will call back with fl_lmops->fl_grant() when the result of the locking operation is known. The filesystem can do this for sleeping as well as non-sleeping locks. This is to make sure, that return values of -EAGAIN and -EINPROGRESS by filesystems are not mistaken to mean an asynchronous locking. This also makes error handling in fs/locks.c and lockd/svclock.c slightly cleaner. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:47 -07:00
David Teigland	817d10bad5	dlm: fix plock dev_write return value The return value on writes to the plock device should be the number of bytes written. It was returning 0 instead when an nfs lock callback was involved. Reported-by: Nathan Straz <nstraz@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2008-05-19 15:37:27 -05:00
David Teigland	2402211a83	dlm: move plock code from gfs2 Move the code that handles cluster posix locks from gfs2 into the dlm so that it can be used by both gfs2 and ocfs2. Signed-off-by: David Teigland <teigland@redhat.com>	2008-04-21 11:22:28 -05:00

36 Commits