linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-19 10:31:48 +00:00

Author	SHA1	Message	Date
Dan Carpenter	805eeb8e04	xfs: extra semi-colon breaks a condition There were some extra semi-colons here which mean that we return true unintentionally. Fixes: `a49935f200` ('xfs: xfs_check_page_type buffer checks need help') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-04-04 06:56:30 +11:00
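This class of bug is easy to reproduce outside xfs. The following minimal C sketch uses hypothetical function names and is not the actual `xfs_check_page_type()` code. The stray semicolon becomes the entire body of the `if`, so the `return true` that follows runs unconditionally.

```c
#include <stdbool.h>

/* Hypothetical example of the bug: the ';' after the condition is an
 * empty statement that forms the whole if-body. */
static bool page_matches_buggy(bool matches)
{
	if (matches);		/* condition guards nothing */
		return true;	/* always executed, despite the indentation */
	return false;		/* never reached */
}

/* Fixed version: without the ';' the return is guarded by the condition. */
static bool page_matches_fixed(bool matches)
{
	if (matches)
		return true;
	return false;
}
```

With the stray semicolon, `page_matches_buggy(false)` still returns true. GCC can flag this pattern with `-Wempty-body`, which `-Wextra` enables.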
Linus Torvalds	159d8133d0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual rocket science -- mostly documentation and comment updates" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: sparse: fix comment doc: fix double words isdn: capi: fix "CAPI_VERSION" comment doc: DocBook: Fix typos in xml and template file Bluetooth: add module name for btwilink driver core: unexport static function create_syslog_header mmc: core: typo fix in printk specifier ARM: spear: clean up editing mistake net-sysfs: fix comment typo 'CONFIG_SYFS' doc: Insert MODULE_ in module-signing macros Documentation: update URL to hfsplus Technote 1150 gpio: update path to documentation ixgbe: Fix format string in ixgbe_fcoe. Kconfig: Remove useless "default N" lines user_namespace.c: Remove duplicated word in comment CREDITS: fix formatting treewide: Fix typo in Documentation/DocBook mm: Fix warning on make htmldocs caused by slab.c ata: ata-samsung_cf: cleanup in header file idr: remove unused prototype of idr_free()	2014-04-02 16:23:38 -07:00
Linus Torvalds	b9f2b21a32	Devicetree changes for v3.15 Updates to devicetree core code. This branch contains the following notable changes: * Add reserved memory binding * Make struct device_node a kobject and remove legacy /proc/device-tree * ePAPR conformance fixes * Update in-kernel DTC copy to version v1.4.0 * Preparation changes for dynamic device tree overlays * minor bug fixes and documentation changes The most significant change in this branch is the conversion of struct device_node to be a kobject that is exposed via sysfs and removal of the old /proc/device-tree code. This simplifies the device tree handling code and tightens up the lifecycle on device tree nodes. [updated: added fix for dangling select PROC_DEVICETREE] -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABAgAGBQJTOyNwAAoJEMWQL496c2LNZY0QAIreUrpo3/hKRau61EDPXkOA UFRyPUHD0k/dNXWWDbTfvKH/nAfzdVwejhePqEWiODiFOFkq7JyQlMKPA+CZuZj0 ygN4215A1yj/hDf6JRD5Zn4WGpawDt9InlbZSps6P5dd8voV5t5dz6uzz+Y7uqaK CAjTDlBSmxEen5vRHiHQgKv74au/+b9yfSURjPQVWg46+wl3WJwjsdzerphm4unW tpEr8zkIsm51mqqAx4penIuiovh7+L2J5v4BFeg8o+kaZEuZpVxLHJPOuBd5hdom zeqEIj3AqHTh5suYIHe4aAbZ2wMP3kYGgkPGwfWLnwLyULxalcCtGZeaCi9nwTFj Fdj+7f17ocrt5mif0f5Deufi1LqJsDjhY6G9p7HuV7Y9hsMILpJIUoGENPji+TWj BA4L45eaPmNYdKJytEtFD7F2WnXeHZ6fDtYho/39DWW+Bt16IFX85T199irhxGG4 byN6LRaahk2UeycSXkQHAlWOQHqzBcJJAkQLN2iahzyYRr9Dy+VI2E9clm53m49O YQYcONdUlMYrtfRwJpbB9XHM0HgZUvg0LT5z/iHQs9uJtoo33Oj+zxFixyZLQ9Dq qyLqQWEpV9gFLAo9tpf56gffkLiJRsHkX4UJ6oTtj4DY1WWU9H81jjCvv/7flzp/ 8ZyyZzANQf1DZ9kqO2v+ =lyA5 -----END PGP SIGNATURE----- Merge tag 'dt-for-linus' of git://git.secretlab.ca/git/linux Pull devicetree changes from Grant Likely: "Updates to devicetree core code. This branch contains the following notable changes: - add reserved memory binding - make struct device_node a kobject and remove legacy /proc/device-tree - ePAPR conformance fixes - update in-kernel DTC copy to version v1.4.0 - preparatory changes for dynamic device tree overlays - minor bug fixes and documentation changes The most significant change in this branch is the conversion of struct device_node to be a kobject that is exposed via sysfs and removal of the old /proc/device-tree code. This simplifies the device tree handling code and tightens up the lifecycle on device tree nodes. [updated: added fix for dangling select PROC_DEVICETREE]" * tag 'dt-for-linus' of git://git.secretlab.ca/git/linux: (29 commits) dt: Remove dangling "select PROC_DEVICETREE" of: Add support for ePAPR "stdout-path" property of: device_node kobject lifecycle fixes of: only scan for reserved mem when fdt present powerpc: add support for reserved memory defined by device tree arm64: add support for reserved memory defined by device tree of: add missing major vendors of: add vendor prefix for SMSC of: remove /proc/device-tree of/selftest: Add self tests for manipulation of properties of: Make device nodes kobjects so they show up in sysfs arm: add support for reserved memory defined by device tree drivers: of: add support for custom reserved memory drivers drivers: of: add initialization code for dynamic reserved memory drivers: of: add initialization code for static reserved memory of: document bindings for reserved-memory nodes Revert "of: fix of_update_property()" kbuild: dtbs_install: new make target ARM: mvebu: Allows to get the SoC ID even without PCI enabled of: Allows to use the PCI translator without the PCI core ...	2014-04-02 14:27:15 -07:00
Linus Torvalds	7125764c5d	Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull compat time conversion changes from Peter Anvin: "Despite the branch name this is really neither an x86 nor an x32-specific patchset, although it is the implementation of the discussions that followed the x32 security hole a few months ago. This removes get/put_compat_timespec/val() and replaces them with compat_get/put_timespec/val() which are savvy as to the current status of COMPAT_USE_64BIT_TIME. It removes several unused and/or incorrect/misleading functions (like compat_put_timeval_convert which doesn't in fact do any conversion) and also replaces several open-coded implementations of what is now called compat_convert_timespec() with that function" * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: compat: Fix sparse address space warnings compat: Get rid of (get|put)_compat_time(val|spec)	2014-04-02 12:51:41 -07:00
Rajat Jain	f3846266f5	fuse: fix "uninitialized variable" warning Fix the following warning: In file included from include/linux/fs.h:16:0, from fs/fuse/fuse_i.h:13, from fs/fuse/file.c:9: fs/fuse/file.c: In function 'fuse_file_poll': include/linux/rbtree.h:82:28: warning: 'parent' may be used uninitialized in this function [-Wmaybe-uninitialized] fs/fuse/file.c:2592:27: note: 'parent' was declared here Signed-off-by: Rajat Jain <rajatxjain@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:51 +02:00
Pavel Emelyanov	4d99ff8f12	fuse: Turn writeback cache on Introduce a bit that the kernel and userspace exchange with each other at the init stage, and turn writeback on if userspace wants this and the mount option 'allow_wbcache' is present (controlled by fusermount). Also add each writable file into per-inode write list and call the generic_file_aio_write to make use of the Linux page cache engine. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:50 +02:00
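The init-stage negotiation in this commit can be modelled as a three-way check: the mount permits it, the kernel offers the bit, and the userspace server echoes it back. In this sketch the flag name and bit value are hypothetical placeholders rather than the definitions from `include/uapi/linux/fuse.h`. The `mount_allows` parameter stands in for the `allow_wbcache` mount option named in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag for illustration only; not the real FUSE_INIT bit. */
#define HYP_INIT_WRITEBACK_CACHE (UINT32_C(1) << 16)

/* Writeback caching is used only when all three conditions hold:
 * the mount allows it, the kernel offered the bit in its INIT request,
 * and the userspace server set the same bit in its INIT reply. */
static bool writeback_cache_enabled(bool mount_allows,
				    uint32_t kernel_offered,
				    uint32_t server_reply)
{
	return mount_allows &&
	       (kernel_offered & HYP_INIT_WRITEBACK_CACHE) != 0 &&
	       (server_reply & HYP_INIT_WRITEBACK_CACHE) != 0;
}
```

Keeping the feature off unless both sides opt in lets old servers, which never set the bit, keep their existing write-through behaviour.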
Pavel Emelyanov	ea8cd33390	fuse: Fix O_DIRECT operations vs cached writeback misorder The problem is: 1. write cached data to a file 2. read directly from the same file (via another fd) The 2nd operation may read stale data, i.e. the one that was in a file before the 1st op. Problem is in how fuse manages writeback. When direct op occurs the core kernel code calls filemap_write_and_wait to flush all the cached ops in flight. But fuse acks the writeback right after the ->writepages callback exits w/o waiting for the real write to happen. Thus the subsequent direct op proceeds while the real writeback is still in flight. This is a problem for backends that reorder operations. Fix this by making the fuse direct IO callback explicitly wait on the in-flight writeback to finish. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:50 +02:00
Maxim Patlasov	fe38d7df23	fuse: fuse_flush() should wait on writeback The aim of .flush fop is to hint to the file system that flushing its state or caches or any other important data to reliable storage would be desirable now. fuse_flush() passes this hint by sending FUSE_FLUSH request to userspace. However, dirty pages and pages under writeback may not be visible to userspace yet unless we ensure it explicitly. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:50 +02:00
Pavel Emelyanov	6b12c1b37e	fuse: Implement write_begin/write_end callbacks The .write_begin and .write_end are required to use generic routines (generic_file_aio_write --> ... --> generic_perform_write) for buffered writes. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:49 +02:00
Maxim Patlasov	482fce55d2	fuse: restructure fuse_readpage() Move the code filling and sending read request to a separate function. Future patches will use it for .write_begin -- partial modification of a page requires reading the page from the storage very similarly to what fuse_readpage does. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:49 +02:00
Pavel Emelyanov	e7cc133c37	fuse: Flush files on wb close Any write request requires a file handle to report to the userspace. Thus when we close a file (and free the fuse_file with this info) we have to flush all the outstanding dirty pages. filemap_write_and_wait() is enough because every page under fuse writeback is accounted in ff->count. This delays actual close until all fuse wb is completed. In case of "write cache" turned off, the flush is ensured by fuse_vma_close(). Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:49 +02:00
Maxim Patlasov	b0aa760652	fuse: Trust kernel i_mtime only Let the kernel maintain i_mtime locally: - clear S_NOCMTIME - implement i_op->update_time() - flush mtime on fsync and last close - update i_mtime explicitly on truncate and fallocate Fuse inode flag FUSE_I_MTIME_DIRTY serves as indication that local i_mtime should be flushed to the server eventually. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:48 +02:00
Pavel Emelyanov	8373200b12	fuse: Trust kernel i_size only Make fuse think that when writeback is on the inode's i_size is always up-to-date and not update it with the value received from the userspace. This is done because the page cache code may update i_size without letting the FS know. This assumption implies fixing the previously introduced short-read helper -- when a short read occurs the 'hole' is filled with zeroes. fuse_file_fallocate() is also fixed because now we should keep i_size up to date, so it must be updated if FUSE_FALLOCATE request succeeded. Signed-off-by: Maxim V. Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:48 +02:00
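The short-read handling described here can be sketched in a few lines of C. This is a simplified model and not the fuse code. `zero_fill_short_read()` is a hypothetical helper, and it ignores the offset and i_size arithmetic the real implementation must do. Because the kernel trusts its own i_size, any bytes the server fails to return inside that size are treated as a hole and zero-filled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a read of `requested` bytes returned only `got`
 * bytes, but the kernel's i_size says the data should exist, so the
 * missing tail is treated as a hole and filled with zeroes. */
static void zero_fill_short_read(char *buf, size_t requested, size_t got)
{
	if (got < requested)
		memset(buf + got, 0, requested - got);
}
```

A full read (`got == requested`) leaves the buffer untouched.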
Pavel Emelyanov	d5cd66c58e	fuse: Connection bit for enabling writeback Off (0) by default. Will be used in the next patches and will be turned on at the very end. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:48 +02:00
Pavel Emelyanov	a92adc824e	fuse: Prepare to handle short reads A helper which gets called when read reports fewer bytes than were requested. See patch "trust kernel i_size only" for details. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:47 +02:00
Pavel Emelyanov	650b22b941	fuse: Linking file to inode helper When writeback is ON every writeable file should be in per-inode write list, not only mmap-ed ones. Thus introduce a helper for this linkage. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:47 +02:00
Linus Torvalds	7a48837732	Merge branch 'for-3.15/core' of git://git.kernel.dk/linux-block Pull core block layer updates from Jens Axboe: "This is the pull request for the core block IO bits for the 3.15 kernel. It's a smaller round this time, it contains: - Various little blk-mq fixes and additions from Christoph and myself. - Cleanup of the IPI usage from the block layer, and associated helper code. From Frederic Weisbecker and Jan Kara. - Duplicate code cleanup in bio-integrity from Gu Zheng. This will give you a merge conflict, but that should be easy to resolve. - blk-mq notify spinlock fix for RT from Mike Galbraith. - A blktrace partial accounting bug fix from Roman Pen. - Missing REQ_SYNC detection fix for blk-mq from Shaohua Li" * 'for-3.15/core' of git://git.kernel.dk/linux-block: (25 commits) blk-mq: add REQ_SYNC early rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock blk-mq: support partial I/O completions blk-mq: merge blk_mq_insert_request and blk_mq_run_request blk-mq: remove blk_mq_alloc_rq blk-mq: don't dump CPU -> hw queue map on driver load blk-mq: fix wrong usage of hctx->state vs hctx->flags blk-mq: allow blk_mq_init_commands() to return failure block: remove old blk_iopoll_enabled variable blktrace: fix accounting of partially completed requests smp: Rename __smp_call_function_single() to smp_call_function_single_async() smp: Remove wait argument from __smp_call_function_single() watchdog: Simplify a little the IPI call smp: Move __smp_call_function_single() below its safe version smp: Consolidate the various smp_call_function_single() declensions smp: Teach __smp_call_function_single() to check for offline cpus smp: Remove unused list_head from csd smp: Iterate functions through llist_for_each_entry_safe() block: Stop abusing rq->csd.list in blk-softirq block: Remove useless IPI struct initialization ...	2014-04-01 19:19:15 -07:00
Eric Whitney	ad6599ab3a	ext4: fix premature freeing of partial clusters split across leaf blocks Xfstests generic/311 and shared/298 fail when run on a bigalloc file system. Kernel error messages produced during the tests report that blocks to be freed are already on the to-be-freed list. When e2fsck is run at the end of the tests, it typically reports bad i_blocks and bad free blocks counts. The bug that causes these failures is located in ext4_ext_rm_leaf(). Code at the end of the function frees a partial cluster if it's not shared with an extent remaining in the leaf. However, if all the extents in the leaf have been removed, the code dereferences an invalid extent pointer (off the front of the leaf) when the check for sharing is made. This generally has the effect of unconditionally freeing the partial cluster, which leads to the observed failures when the partial cluster is shared with the last extent in the next leaf. Fix this by attempting to free the cluster only if extents remain in the leaf. Any remaining partial cluster will be freed if possible when the next leaf is processed or when leaf removal is complete. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-04-01 19:49:30 -04:00
Linus Torvalds	158e0d3621	Driver core / sysfs patches for 3.15-rc1 Here's the big driver core / sysfs update for 3.15-rc1. Lots of kernfs updates to make it useful for other subsystems, and a few other tiny driver core patches. All have been in linux-next for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iEYEABECAAYFAlM7A0wACgkQMUfUDdst+ynJNACfZlY+KNKIhNFt1OOW8rQfSZzy 1PYAnjYuOoly01JlPrpJD5b4TdxaAq71 =GVUg -----END PGP SIGNATURE----- Merge tag 'driver-core-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core and sysfs updates from Greg KH: "Here's the big driver core / sysfs update for 3.15-rc1. Lots of kernfs updates to make it useful for other subsystems, and a few other tiny driver core patches. All have been in linux-next for a while" * tag 'driver-core-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (42 commits) Revert "sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()" kernfs: cache atomic_write_len in kernfs_open_file numa: fix NULL pointer access and memory leak in unregister_one_node() Revert "driver core: synchronize device shutdown" kernfs: fix off by one error. kernfs: remove duplicate dir.c at the top dir x86: align x86 arch with generic CPU modalias handling cpu: add generic support for CPU feature based module autoloading sysfs: create bin_attributes under the requested group driver core: unexport static function create_syslog_header firmware: use power efficient workqueue for unloading and aborting fw load firmware: give a protection when map page failed firmware: google memconsole driver fixes firmware: fix google/gsmi duplicate efivars_sysfs_init() drivers/base: delete non-required instances of include <linux/init.h> kernfs: fix kernfs_node_from_dentry() ACPI / platform: drop redundant ACPI_HANDLE check kernfs: fix hash calculation in kernfs_rename_ns() kernfs: add CONFIG_KERNFS sysfs, kobject: add sysfs wrapper for kernfs_enable_ns() ...	2014-04-01 16:28:19 -07:00
Linus Torvalds	1ead658124	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer changes from Thomas Gleixner: "This assorted collection provides: - A new timer based timer broadcast feature for systems which do not provide a globally accessible timer device. That allows those systems to put CPUs into deep idle states where the per cpu timer device stops. - A few NOHZ_FULL related improvements to the timer wheel - The usual updates to timer devices found in ARM SoCs - Small improvements and updates all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) tick: Remove code duplication in tick_handle_periodic() tick: Fix spelling mistake in tick_handle_periodic() x86: hpet: Use proper destructor for delayed work workqueue: Provide destroy_delayed_work_on_stack() clocksource: CMT, MTU2, TMU and STI should depend on GENERIC_CLOCKEVENTS timer: Remove code redundancy while calling get_nohz_timer_target() hrtimer: Rearrange comments in the order struct members are declared timer: Use variable head instead of &work_list in __run_timers() clocksource: exynos_mct: silence a static checker warning arm: zynq: Add support for cpufreq arm: zynq: Don't use arm_global_timer with cpufreq clocksource/cadence_ttc: Overhaul clocksource frequency adjustment clocksource/cadence_ttc: Call clockevents_update_freq() with IRQs enabled clocksource: Add Kconfig entries for CMT, MTU2, TMU and STI sh: Remove Kconfig entries for TMU, CMT and MTU2 ARM: shmobile: Remove CMT, TMU and STI Kconfig entries clocksource: armada-370-xp: Use atomic access for shared registers clocksource: orion: Use atomic access for shared registers clocksource: timer-keystone: Delete unnecessary variable clocksource: timer-keystone: introduce clocksource driver for Keystone ...	2014-04-01 11:00:07 -07:00
Linus Torvalds	a21e40877a	Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Ingo Molnar: "The main purpose is to fix a full dynticks bug related to virtualization, where steal time accounting appears to be zero in /proc/stat even after a few seconds of competing guests running busy loops on the same host CPU. It's not a regression though as it has been there since the beginning. The other commits are preparatory work to fix the bug and various cleanups" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arch: Remove stub cputime.h headers sched: Remove needless round trip nsecs <-> tick conversion of steal time cputime: Fix jiffies based cputime assumption on steal accounting cputime: Bring cputime -> nsecs conversion cputime: Default implementation of nsecs -> cputime conversion cputime: Fix nsecs_to_cputime() return type cast	2014-04-01 10:16:10 -07:00
Miklos Szeredi	bd42998a6b	ext4: add cross rename support Implement RENAME_EXCHANGE flag in renameat2 syscall. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz>	2014-04-01 17:08:44 +02:00
Miklos Szeredi	bd1af145b9	ext4: rename: split out helper functions Cross rename (exchange source and dest) will need to call some of these helpers for both source and dest, while overwriting rename currently only calls them for one or the other. This also makes the code easier to follow. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz>	2014-04-01 17:08:44 +02:00
Miklos Szeredi	0d7d5d678b	ext4: rename: move EMLINK check up Move checking i_nlink from after ext4_get_first_dir_block() to before. The check doesn't rely on the result of that function and the function only fails on fs corruption, so the order shouldn't matter. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz>	2014-04-01 17:08:44 +02:00
Miklos Szeredi	c0d268c366	ext4: rename: create ext4_renament structure for local vars Need to split up ext4_rename() into helpers but there are too many local variables involved, so create a new structure. This also, apparently, makes the generated code size slightly smaller. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz>	2014-04-01 17:08:43 +02:00
Miklos Szeredi	da1ce0670c	vfs: add cross-rename If flags contain RENAME_EXCHANGE then exchange source and destination files. There's no restriction on the type of the files; e.g. a directory can be exchanged with a symlink. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>	2014-04-01 17:08:43 +02:00
J. Bruce Fields	4fd699ae3f	vfs: lock_two_nondirectories: allow directory args lock_two_nondirectories warned if either of its args was a directory. Instead just ignore the directory args. This is needed for locking in cross rename. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-01 17:08:43 +02:00
Miklos Szeredi	0b3974eb04	security: add flags to rename hooks Add flags to security_path_rename() and security_inode_rename() hooks. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>	2014-04-01 17:08:43 +02:00
Miklos Szeredi	0a7c3937a1	vfs: add RENAME_NOREPLACE flag If this flag is specified and the target of the rename exists then the rename syscall fails with EEXIST. The VFS does the existence checking, so it is trivial to enable for most local filesystems. This patch only enables it in ext4. For network filesystems the VFS check is not enough as there may be a race between a remote create and the rename, so these filesystems need to handle this flag in their ->rename() implementations to ensure atomicity. Andy writes about why this is useful: "The trivial answer: to eliminate the race condition from 'mv -i'. Another answer: there's a common pattern to atomically create a file with contents: open a temporary file, write to it, optionally fsync it, close it, then link(2) it to the final name, then unlink the temporary file. The reason to use link(2) is because it won't silently clobber the destination. This is annoying: - It requires an extra system call that shouldn't be necessary. - It doesn't work on (IMO sensible) filesystems that don't support hard links (e.g. vfat). - It's not atomic -- there's an intermediate state where both files exist. - It's ugly. The new rename flag will make this totally sensible. To be fair, on new enough kernels, you can also use O_TMPFILE and linkat to achieve the same thing even more cleanly." Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>	2014-04-01 17:08:43 +02:00
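The "atomically create a file with contents" pattern Andy describes can use the new flag from userspace. The C sketch below makes several assumptions. It targets Linux, and it requires libc headers that define `SYS_renameat2`. The helper name `publish_noreplace()` is hypothetical, and the fallback `RENAME_NOREPLACE` value mirrors `include/uapi/linux/fs.h`. A filesystem that does not implement the flag may reject it with `EINVAL`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>       /* AT_FDCWD */
#include <stdio.h>       /* newer glibc defines RENAME_NOREPLACE here */
#include <sys/syscall.h> /* SYS_renameat2, if the headers know it */
#include <unistd.h>      /* syscall() */

#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1 << 0) /* same value as include/uapi/linux/fs.h */
#endif

/* Hypothetical helper: give tmp_path the name final_path atomically, but
 * fail with -EEXIST instead of silently clobbering an existing file.
 * Returns 0 on success or a negative errno value. */
static int publish_noreplace(const char *tmp_path, const char *final_path)
{
#ifdef SYS_renameat2
	if (syscall(SYS_renameat2, AT_FDCWD, tmp_path, AT_FDCWD, final_path,
		    RENAME_NOREPLACE) == 0)
		return 0;
	return -errno;	/* e.g. EEXIST, or EINVAL if the fs lacks support */
#else
	(void)tmp_path;
	(void)final_path;
	return -ENOSYS;	/* headers too old to know the syscall number */
#endif
}
```

A writer would create and fill the temporary file, optionally fsync it, close it, and then call `publish_noreplace()` in place of the `link()` plus `unlink()` pair. This avoids the extra system call and the intermediate state where both names exist.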
Miklos Szeredi	520c8b1650	vfs: add renameat2 syscall Add new renameat2 syscall, which is the same as renameat with an added flags argument. Pass flags to vfs_rename() and to i_op->rename() as well. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>	2014-04-01 17:08:42 +02:00
Miklos Szeredi	bc27027a73	vfs: rename: use common code for dir and non-dir There's actually very little difference between vfs_rename_dir() and vfs_rename_other() so move both inline into vfs_rename() which still stays reasonably readable. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>	2014-04-01 17:08:42 +02:00
Miklos Szeredi	de22a4c372	vfs: rename: move d_move() up Move the d_move() in vfs_rename_dir() up, similarly to how it's done in vfs_rename_other(). The next patch will consolidate these two functions and this is the only structural difference between them. I'm not sure if doing the d_move() after the dput is even valid. But there may be a logical explanation for that. But moving the d_move() before the dput() (and the mutex_unlock()) should definitely not hurt. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>	2014-04-01 17:08:42 +02:00
Miklos Szeredi	44b1d53043	vfs: add d_is_dir() Add d_is_dir(dentry) helper which is analogous to S_ISDIR(). To avoid confusion, rename d_is_directory() to d_can_lookup(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>	2014-04-01 17:08:41 +02:00
Lukas Czerner	e5b30416f3	ext4: remove unneeded test of ret variable Currently in ext4_fallocate() and ext4_zero_range() we're testing ret variable along with new_size. However in ext4_fallocate() we just tested ret before and in ext4_zero_range() it will always be zero when we get there so there is no need to test it in both cases. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-04-01 00:59:21 -04:00
Linus Torvalds	9d919e8d5b	Merge branch 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue changes from Tejun Heo: "PREPARE_[DELAYED_]WORK() were used to change the work function of work items without fully reinitializing it; however, this makes workqueue consider the work item as a different one from before and allows the work item to start executing before the previous instance is finished which can lead to extremely subtle issues which are painful to debug. The interface has never been popular. This pull request contains patches to remove existing usages and kill the interface. As one of the changes was routed during the last devel cycle and another depended on a pending change in nvme, for-3.15 contains a couple merge commits. In addition, interfaces which were deprecated quite a while ago - __cancel_delayed_work() and WQ_NON_REENTRANT - are removed too" * 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: remove deprecated WQ_NON_REENTRANT workqueue: Spelling s/instensive/intensive/ workqueue: remove PREPARE_[DELAYED_]WORK() staging/fwserial: don't use PREPARE_WORK afs: don't use PREPARE_WORK nvme: don't use PREPARE_WORK usb: don't use PREPARE_DELAYED_WORK floppy: don't use PREPARE_[DELAYED_]WORK ps3-vuart: don't use PREPARE_WORK wireless/rt2x00: don't use PREPARE_WORK in rt2800usb.c workqueue: Remove deprecated __cancel_delayed_work()	2014-03-31 15:08:51 -07:00
Linus Torvalds	1ce235faa8	- KGDB support for arm64 - PCI I/O space extended to 16M (in preparation of PCIe support patches) - Dropping ZONE_DMA32 in favour of ZONE_DMA (we only need one for the time being), together with swiotlb late initialisation to correctly setup the bounce buffer - DMA API cache maintenance support (not all ARMv8 platforms have hardware cache coherency) - Crypto extensions advertising via ELF_HWCAP2 for compat user space - Perf support for dwarf unwinding in compat mode - asm/tlb.h converted to the generic mmu_gather code - asm-generic rwsem implementation - Code clean-up -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iQIcBAABAgAGBQJTOaqsAAoJEGvWsS0AyF7xYNUP/3/IPySIB+/6pyUG6q7kvIpF Di93M+VdmnLEOKhhx/tjkiEmEQMp0hFPeOlQRWf/Ugg4ksulP6gRejdDEjIfkmsk LrRXLjvH79NDJbN0pTUXqGDvLLZ9Qnib+HEOuKABIYUrwhNKySBk+5omGfXFtwLR Mb5JxPX0kbBXOqbOX4RgANQoRlE8GxJR3V245zlGxA4klcN4IiaDy/99kj+kaeaa Cl8X9K2I550IZ2YUAWPOut2aee2qRFQtAhIDgVthTYlGRx7Y/rDLM16B8fFY/T0H 7azIpSO5hk5lp8J3giJHYajlJlXNla5FeHQb8XAVnlyqFBmCUn0vvd2VbPvWREJp UD8t1vZZt/s2he6CVAQIfQghwLyzrpPa19KbnyI+3HtsZ+NS/puBJmcVKZ2PBY/L 28BsRzB7BKAPEVhNmyPwFHNdZTvjaqYUCLhQ0uTp1sSHMcLeSs7+vyMR99f/0u9E doSYAeF41ZkxHXL5xEevdj4sFkCEY1XFxER1Y8VM1rqHTeGEoeYbdS/u9tEeBgit jBelvHAlNTBgbur2nW4E9fQpAF2CsvWnRq6lSmDRTkyjzcLUQqA8bsQJ3aUyJtZt j17kUIzSH1q7x3zAaWQcvMVeawdkv2+HanjuTOdeO2ehvyG71vvxA3RkCv8o5Jhh da+jAMhkpYQxk8mSKkWm =8+cB -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull ARM64 updates from Catalin Marinas: - KGDB support for arm64 - PCI I/O space extended to 16M (in preparation of PCIe support patches) - Dropping ZONE_DMA32 in favour of ZONE_DMA (we only need one for the time being), together with swiotlb late initialisation to correctly setup the bounce buffer - DMA API cache maintenance support (not all ARMv8 platforms have hardware cache coherency) - Crypto extensions advertising via ELF_HWCAP2 for compat user space - Perf support for dwarf unwinding in compat mode - asm/tlb.h converted to the generic mmu_gather code - asm-generic rwsem implementation - Code clean-up * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits) arm64: Remove pgprot_dmacoherent() arm64: Support DMA_ATTR_WRITE_COMBINE arm64: Implement custom mmap functions for dma mapping arm64: Fix __range_ok macro arm64: Fix duplicated Kconfig entries arm64: mm: Route pmd thp functions through pte equivalents arm64: rwsem: use asm-generic rwsem implementation asm-generic: rwsem: de-PPCify rwsem.h arm64: enable generic CPU feature modalias matching for this architecture arm64: smp: make local symbol static arm64: debug: make local symbols static ARM64: perf: support dwarf unwinding in compat mode ARM64: perf: add support for frame pointer unwinding in compat mode ARM64: perf: add support for perf registers API arm64: Add boot time configuration of Intermediate Physical Address size arm64: Do not synchronise I and D caches for special ptes arm64: Make DMA coherent and strongly ordered mappings not executable arm64: barriers: add dmb barrier arm64: topology: Implement basic CPU topology support arm64: advertise ARMv8 extensions to 32-bit compat ELF binaries ...	2014-03-31 15:01:45 -07:00
Linus Torvalds	190f918660	Merge branch 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 compat wrapper rework from Heiko Carstens: "S390 compat system call wrapper simplification work. The intention of this work is to get rid of all hand written assembly compat system call wrappers on s390, which perform proper sign or zero extension, or pointer conversion of compat system call parameters. Instead all of this should be done with C code eg by using Al's COMPAT_SYSCALL_DEFINEx() macro. Therefore all common code and s390 specific compat system calls have been converted to the COMPAT_SYSCALL_DEFINEx() macro. In order to generate correct code all compat system calls may only have eg compat_ulong_t parameters, but no unsigned long parameters. Those patches which change parameter types from unsigned long to compat_ulong_t parameters are separate in this series, but shouldn't cause any harm. The only compat system calls which intentionally have 64 bit parameters (preadv64 and pwritev64) in support of the x86/32 ABI haven't been changed, but are now only available if an architecture defines __ARCH_WANT_COMPAT_SYS_PREADV64/PWRITEV64. System calls which do not have a compat variant but still need proper zero extension on s390, like eg "long sys_brk(unsigned long brk)" will get a proper wrapper function with the new s390 specific COMPAT_SYSCALL_WRAPx() macro: COMPAT_SYSCALL_WRAP1(brk, unsigned long, brk); which generates the following code (simplified): asmlinkage long sys_brk(unsigned long brk); asmlinkage long compat_sys_brk(long brk) { return sys_brk((u32)brk); } Given that the C file which contains all the COMPAT_SYSCALL_WRAP lines includes both linux/syscall.h and linux/compat.h, it will generate build errors, if the declaration of sys_brk() doesn't match, or if there exists a non-matching compat_sys_brk() declaration. 
In addition this will intentionally result in a link error if somewhere else a compat_sys_brk() function exists, which probably should have been used instead. Two more BUILD_BUG_ONs make sure the size and type of each compat syscall parameter can be handled correctly with the s390 specific macros. I converted the compat system calls step by step to verify the generated code is correct and matches the previous code. In fact it did not always match, however that was always a bug in the hand written asm code. In result we get less code, less bugs, and much more sanity checking" * 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (44 commits) s390/compat: add copyright statement compat: include linux/unistd.h within linux/compat.h s390/compat: get rid of compat wrapper assembly code s390/compat: build error for large compat syscall args mm/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types kexec/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types net/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types ipc/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types ipc/compat: convert to COMPAT_SYSCALL_DEFINE fs/compat: convert to COMPAT_SYSCALL_DEFINE security/compat: convert to COMPAT_SYSCALL_DEFINE mm/compat: convert to COMPAT_SYSCALL_DEFINE net/compat: convert to COMPAT_SYSCALL_DEFINE kernel/compat: convert to COMPAT_SYSCALL_DEFINE fs/compat: optional preadv64/pwrite64 compat system calls ipc/compat_sys_msgrcv: change msgtyp type from long to compat_long_t s390/compat: partial parameter conversion within syscall wrappers s390/compat: automatic zero, sign and pointer conversion of syscalls s390/compat: add sync_file_range and fallocate compat syscalls ...	2014-03-31 14:32:17 -07:00
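The s390 wrapper described in the compat merge above comes down to a narrowing cast. The user-space sketch below writes out, by hand, the simplified expansion the message gives for COMPAT_SYSCALL_WRAP1(brk, unsigned long, brk). Some assumptions apply:

- native_brk() and compat_brk() are hypothetical stand-ins. native_brk() only echoes its argument so that the wrapper's effect is visible.
- A 64-bit long is assumed.

```c
#include <stdint.h>

/* Hypothetical stand-in for a native 64-bit system call such as
 * sys_brk(); it echoes its argument so the wrapper's effect is visible. */
static unsigned long native_brk(unsigned long brk)
{
    return brk;
}

/* Hand-written form of the simplified COMPAT_SYSCALL_WRAP1 expansion:
 * keep only the low 32 bits of the compat argument (zero extension),
 * then call the native function with a full-width unsigned long. */
static long compat_brk(long brk)
{
    return (long)native_brk((uint32_t)brk);
}
```

A -1 passed from the 31/32-bit side therefore reaches the native call as 0xffffffff, not as a sign-extended 64-bit value. This is the zero extension the hand-written assembly wrappers used to perform.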
Linus Torvalds	7cc3afdf43	Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI changes from Ingo Molnar: "The main changes: - Add debug code to the dump EFI pagetable - Borislav Petkov - Make 1:1 runtime mapping robust when booting on machines with lots of memory - Borislav Petkov - Move the EFI facilities bits out of 'x86_efi_facility' and into efi.flags which is the standard architecture independent place to keep EFI state, by Matt Fleming. - Add 'EFI mixed mode' support: this allows 64-bit kernels to be booted from 32-bit firmware. This needs a bootloader that supports the 'EFI handover protocol'. By Matt Fleming" * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) x86, efi: Abstract x86 efi_early calls x86/efi: Restore 'attr' argument to query_variable_info() x86/efi: Rip out phys_efi_get_time() x86/efi: Preserve segment registers in mixed mode x86/boot: Fix non-EFI build x86, tools: Fix up compiler warnings x86/efi: Re-disable interrupts after calling firmware services x86/boot: Don't overwrite cr4 when enabling PAE x86/efi: Wire up CONFIG_EFI_MIXED x86/efi: Add mixed runtime services support x86/efi: Firmware agnostic handover entry points x86/efi: Split the boot stub into 32/64 code paths x86/efi: Add early thunk code to go from 64-bit to 32-bit x86/efi: Build our own EFI services pointer table efi: Add separate 32-bit/64-bit definitions x86/efi: Delete dead code when checking for non-native x86/mm/pageattr: Always dump the right page table in an oops x86, tools: Consolidate #ifdef code x86/boot: Cleanup header.S by removing some #ifdefs efi: Use NULL instead of 0 for pointer ...	2014-03-31 12:26:05 -07:00
Linus Torvalds	b3fd4ea9df	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "Main changes: - Torture-test changes, including refactoring of rcutorture and introduction of a vestigial locktorture. - Real-time latency fixes. - Documentation updates. - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits) rcu: Provide grace-period piggybacking API rcu: Ensure kernel/rcu/rcu.h can be sourced/used stand-alone rcu: Fix sparse warning for rcu_expedited from kernel/ksysfs.c notifier: Substitute rcu_access_pointer() for rcu_dereference_raw() Documentation/memory-barriers.txt: Clarify release/acquire ordering rcutorture: Save kvm.sh output to log rcutorture: Add a lock_busted to test the test rcutorture: Place kvm-test-1-run.sh output into res directory rcutorture: Rename TREE_RCU-Kconfig.txt locktorture: Add kvm-recheck.sh plug-in for locktorture rcutorture: Gracefully handle NULL cleanup hooks locktorture: Add vestigial locktorture configuration rcutorture: Introduce "rcu" directory level underneath configs rcutorture: Rename kvm-test-1-rcu.sh rcutorture: Remove RCU dependencies from ver_functions.sh API rcutorture: Create CFcommon file for common Kconfig parameters rcutorture: Create config files for scripted test-the-test testing rcutorture: Add an rcu_busted to test the test locktorture: Add a lock-torture kernel module rcutorture: Abstract kvm-recheck.sh ...	2014-03-31 11:05:24 -07:00
Steven Whitehouse	1b2ad41214	GFS2: Fix address space from page function Now that rgrps use the address space which is part of the super block, we need to update gfs2_mapping2sbd() to take account of that. The only way to do that easily is to use a different set of address_space_operations for rgrps. Reported-by: Abhi Das <adas@redhat.com> Tested-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-03-31 17:48:27 +01:00
Abhi Das	059788039f	GFS2: Fix uninitialized VFS inode in gfs2_create_inode When gfs2_create_inode() fails due to quota violation, the VFS inode is not completely uninitialized. This can cause a list corruption error. This patch correctly uninitializes the VFS inode when a quota violation occurs in the gfs2_create_inode codepath. Resolves: rhbz#1059808 Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-03-31 16:41:39 +01:00
Jeff Layton	29723adee1	locks: make locks_mandatory_area check for file-private locks Allow locks_mandatory_area() to handle file-private locks correctly. If there is a file-private lock set on an open file and we're doing I/O via the same, then that should not cause anything to block. Handle this by first doing a non-blocking FL_ACCESS check for a file-private lock, and then fall back to checking for a classic POSIX lock (and possibly blocking). Note that this approach is subject to the same races that have always plagued mandatory locking on Linux. Reported-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:43 -04:00
Jeff Layton	d7a06983a0	locks: fix locks_mandatory_locked to respect file-private locks As Trond pointed out, you can currently deadlock yourself by setting a file-private lock on a file that requires mandatory locking and then trying to do I/O on it. Avoid this problem by plumbing some knowledge of file-private locks into the mandatory locking code. In order to do this, we must pass down information about the struct file that's being used to locks_verify_locked. Reported-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: J. Bruce Fields <bfields@redhat.com>	2014-03-31 08:24:43 -04:00
Jeff Layton	90478939dc	locks: require that flock->l_pid be set to 0 for file-private locks Neil Brown suggested potentially overloading the l_pid value as a "lock context" field for file-private locks. While I don't think we will probably want to do that here, it's probably a good idea to ensure that in the future we could extend this API without breaking existing callers. Typically the l_pid value is ignored for incoming struct flock arguments, serving mainly as a place to return the pid of the owner if there is a conflicting lock. For file-private locks, require that it currently be set to 0 and return EINVAL if it isn't. If we eventually want to make a non-zero l_pid mean something, then this will help ensure that we don't break legacy programs that are using file-private locks. Cc: Neil Brown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:43 -04:00
Jeff Layton	5d50ffd7c3	locks: add new fcntl cmd values for handling file private locks Due to some unfortunate history, POSIX locks have very strange and unhelpful semantics. The thing that usually catches people by surprise is that they are dropped whenever the process closes any file descriptor associated with the inode. This is extremely problematic for people developing file servers that need to implement byte-range locks. Developers often need a "lock management" facility to ensure that file descriptors are not closed until all of the locks associated with the inode are finished. Additionally, "classic" POSIX locks are owned by the process. Locks taken between threads within the same process won't conflict with one another, which renders them useless for synchronization between threads. This patchset adds a new type of lock that attempts to address these issues. These locks conflict with classic POSIX read/write locks, but have semantics that are more like BSD locks with respect to inheritance and behavior on close. This is implemented primarily by changing how fl_owner field is set for these locks. Instead of having them owned by the files_struct of the process, they are instead owned by the filp on which they were acquired. Thus, they are inherited across fork() and are only released when the last reference to a filp is put. These new semantics prevent them from being merged with classic POSIX locks, even if they are acquired by the same process. These locks will also conflict with classic POSIX locks even if they are acquired by the same process or on the same file descriptor. The new locks are managed using a new set of cmd values to the fcntl() syscall. The initial implementation of this converts these values to "classic" cmd values at a fairly high level, and the details are not exposed to the underlying filesystem. We may eventually want to push this handling out to the lower filesystem code but for now I don't see any need for it.
Also, note that with this implementation the new cmd values are only available via fcntl64() on 32-bit arches. There's little need to add support for legacy apps on a new interface like this. Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:43 -04:00
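The ownership difference described in this series can be observed from user space. This sketch is not part of the series, and it rests on these assumptions:

- It uses the commands as they were eventually merged and exposed by glibc 2.20+ under _GNU_SOURCE. In that final form the file-private locks were renamed "open file description" (OFD) locks, with commands F_OFD_GETLK, F_OFD_SETLK and F_OFD_SETLKW.
- try_write_lock() and second_lock_result() are hypothetical helpers.

The sketch opens one file twice within a single process and tries to take a whole-file write lock through each descriptor.

```c
#define _GNU_SOURCE /* for F_OFD_SETLK in glibc >= 2.20 */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Try a non-blocking whole-file write lock with the given fcntl cmd.
 * Returns 0 on success or the errno value on failure. */
static int try_write_lock(int fd, int cmd)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl)); /* l_pid must be 0 for OFD locks */
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */
    return fcntl(fd, cmd, &fl) == 0 ? 0 : errno;
}

/* Open path twice (two open file descriptions in one process), lock
 * through the first descriptor, then report what the second attempt
 * returns: 0 or an errno value, or -1 if setup failed. */
static int second_lock_result(const char *path, int cmd)
{
    int fd1 = open(path, O_RDWR);
    int fd2 = open(path, O_RDWR);
    int result = -1;

    if (fd1 >= 0 && fd2 >= 0 && try_write_lock(fd1, cmd) == 0)
        result = try_write_lock(fd2, cmd);
    if (fd2 >= 0)
        close(fd2);
    if (fd1 >= 0)
        close(fd1);
    return result;
}
```

With classic F_SETLK the second attempt succeeds, because both locks belong to the same process. With F_OFD_SETLK each open file description is a separate owner, so the second attempt should fail with EAGAIN. The struct flock is zeroed so that l_pid is 0, as the l_pid patch in this series requires.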
Jeff Layton	57b65325fe	locks: skip deadlock detection on FL_FILE_PVT locks It's not really feasible to do deadlock detection with FL_FILE_PVT locks since they aren't owned by a single task, per se. Deadlock detection also tends to be rather expensive so just skip it for these sorts of locks. Also, add a FIXME comment about adding more limited deadlock detection that just applies to ro -> rw upgrades, per Andy's request. Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:43 -04:00
Jeff Layton	c1e62b8fc3	locks: pass the cmd value to fcntl_getlk/getlk64 Once we introduce file private locks, we'll need to know what cmd value was used, as that affects the ownership and whether a conflict would arise. Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:43 -04:00
Jeff Layton	3fd80cddc6	locks: report l_pid as -1 for FL_FILE_PVT locks FL_FILE_PVT locks are no longer tied to a particular pid, and are instead inheritable by child processes. Report a l_pid of '-1' for these sorts of locks since the pid is somewhat meaningless for them. This precedent comes from FreeBSD. There, POSIX and flock() locks can conflict with one another. If fcntl(F_GETLK, ...) returns a lock set with flock() then the l_pid member cannot be a process ID because the lock is not held by a process as such. Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
Jeff Layton	c918d42a27	locks: make /proc/locks show IS_FILE_PVT locks as type "FLPVT" In a later patch, we'll be adding a new type of lock that's owned by the struct file instead of the files_struct. Those sorts of locks will be flagged with a new FL_FILE_PVT flag. Report these types of locks as "FLPVT" in /proc/locks to distinguish them from "classic" POSIX locks. Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
Jeff Layton	78ed8a1338	locks: rename locks_remove_flock to locks_remove_file This function currently removes leases in addition to flock locks and in a later patch we'll have it deal with file-private locks too. Rename it to locks_remove_file to indicate that it removes locks that are associated with a particular struct file, and not just flock locks. Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
Jeff Layton	bce7560d49	locks: consolidate checks for compatible filp->f_mode values in setlk handlers Move this check into flock64_to_posix_lock instead of duplicating it in two places. This also fixes a minor wart in the code where we continue referring to the struct flock after converting it to struct file_lock. Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
J. Bruce Fields	ef12e72a01	locks: fix posix lock range overflow handling In the 32-bit case fcntl assigns the 64-bit f_pos and i_size to a 32-bit off_t. The existing range checks also seem to depend on signed arithmetic wrapping when it overflows. In practice maybe that works, but we can be more careful. That also allows us to make a more reliable distinction between -EINVAL and -EOVERFLOW. Note that in the 32-bit case SEEK_CUR or SEEK_END might allow the caller to set a lock with starting point no longer representable as a 32-bit value. We could return -EOVERFLOW in such cases, but the locks code is capable of handling such ranges, so we choose to be lenient here. The only problem is that subsequent GETLK calls on such a lock will fail with EOVERFLOW. While we're here, do some cleanup including consolidating code for the flock and flock64 cases. Signed-off-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
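The overflow-safe range conversion described here can be sketched in user space. This illustrates the kind of checks the message describes; it is not the kernel's code. Assumptions:

- lock_range() is a hypothetical helper.
- base stands for 0, the file position or the file size, depending on l_whence, and is assumed to be non-negative.
- OFFSET_MAX is taken to be INT64_MAX.

Each bound is compared against OFFSET_MAX before the addition it guards, so no signed arithmetic ever wraps. This also lets the function distinguish -EOVERFLOW (range not representable) from -EINVAL (range would begin before offset 0).

```c
#include <errno.h>
#include <stdint.h>

#define OFFSET_MAX INT64_MAX /* assumption: largest file offset */

/* Convert (base + l_start, l_len) into an inclusive [*start, *end]
 * range. base is 0, f_pos or i_size depending on l_whence and must be
 * >= 0. Every comparison is made before the addition it guards, so no
 * signed arithmetic can overflow. Returns 0, -EINVAL or -EOVERFLOW. */
static int lock_range(int64_t base, int64_t l_start, int64_t l_len,
                      int64_t *start, int64_t *end)
{
    int64_t s;

    if (l_start > OFFSET_MAX - base)
        return -EOVERFLOW;      /* start not representable */
    s = base + l_start;
    if (s < 0)
        return -EINVAL;         /* range would begin before offset 0 */

    if (l_len > 0) {
        if (l_len - 1 > OFFSET_MAX - s)
            return -EOVERFLOW;  /* end not representable */
        *start = s;
        *end = s + l_len - 1;
    } else if (l_len < 0) {
        /* a negative length covers [s + l_len, s - 1] */
        if (s + l_len < 0)
            return -EINVAL;
        *start = s + l_len;
        *end = s - 1;
    } else {
        *start = s;             /* zero length: lock to end of file */
        *end = OFFSET_MAX;
    }
    return 0;
}
```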
Jeff Layton	8c3cac5e6a	locks: eliminate BUG() call when there's an unexpected lock on file close A leftover lock on the list is surely a sign of a problem of some sort, but it's not necessarily a reason to panic the box. Instead, just log a warning with some info about the lock, and then delete it like we would any other lock. In the event that the filesystem declares a ->lock f_op, we may end up leaking something, but that's generally preferable to an immediate panic. Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
Jeff Layton	b03dfdec03	locks: add __acquires and __releases annotations to locks_start and locks_stop ...to make sparse happy. Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
Jeff Layton	6ca10ed8ed	locks: remove "inline" qualifier from fl_link manipulation functions It's best to let the compiler decide that. Acked-by: J. Bruce Fields <bfields@fieldses.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
Jeff Layton	46dad7603f	locks: clean up comment typo Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:42 -04:00
Jeff Layton	24cbe7845e	locks: close potential race between setlease and open As Al Viro points out, there is an unlikely, but possible race between opening a file and setting a lease on it. generic_add_lease is done with the i_lock held, but the inode->i_flock check in break_lease is lockless. It's possible for another task doing an open to do the entire pathwalk and call break_lease between the point where generic_add_lease checks for a conflicting open and adds the lease to the list. If this occurs, we can end up with a lease set on the file with a conflicting open. To guard against that, check again for a conflicting open after adding the lease to the i_flock list. If the above race occurs, then we can simply unwind the lease setting and return -EAGAIN. Because we take dentry references and acquire write access on the file before calling break_lease, we know that if the i_flock list is empty when the open caller goes to check it then the necessary refcounts have already been incremented. Thus the additional check for a conflicting open will see that there is one and the setlease call will fail. Cc: Bruce Fields <bfields@fieldses.org> Cc: David Howells <dhowells@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@fieldses.org>	2014-03-31 08:24:42 -04:00
Abhi Das	e9fb7c73a4	GFS2: Fix return value in slot_get() ENOSPC was being returned in slot_get in spite of successful execution of the function. This patch fixes this return code. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-03-31 10:43:05 +01:00
Grant Likely	d88cf7d7b4	Merge remote-tracking branch 'robh/for-next' into devicetree/next	2014-03-31 08:10:55 +01:00
Linus Torvalds	fedc1ed0f1	Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Switch mnt_hash to hlist, turning the races between __lookup_mnt() and hash modifications into false negatives from __lookup_mnt() (instead of hangs)" On the false negatives from __lookup_mnt(): "The only thing we care about is not getting stuck in __lookup_mnt(). If it misses an entry because something in front of it just got moved around, etc, we are fine. We'll notice that mount_lock mismatch and that'll be it" * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch mnt_hash to hlist don't bother with propagate_mnt() unless the target is shared keep shadowed vfsmounts together resizable namespace.c hashes	2014-03-30 17:26:08 -07:00
Theodore Ts'o	00a1a053eb	ext4: atomically set inode->i_flags in ext4_set_inode_flags() Use cmpxchg() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race where an immutable file has the immutable flag cleared for a brief window of time. Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-03-30 17:02:06 -07:00
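The compare-and-swap loop described in this commit can be sketched with C11 atomics. The flag names and values are hypothetical stand-ins for S_IMMUTABLE, S_APPEND and related flags. set_masked_flags() is an illustrative helper, not an ext4 function.

The sketch computes the new value from a snapshot and publishes it in a single atomic step. A concurrent reader therefore never sees the immutable bit cleared in between. If another writer changes the word first, the exchange fails and the loop retries with the fresh value.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the VFS inode flag bits. */
#define FLAG_SYNC      0x001u
#define FLAG_APPEND    0x004u
#define FLAG_IMMUTABLE 0x008u
#define FLAG_OTHER     0x100u  /* a bit this update must not touch */

/* Replace the bits selected by mask with the matching bits of new_bits
 * in one atomic step and return the previous value. On a failed
 * exchange, atomic_compare_exchange_weak() reloads old with the current
 * value, so the loop recomputes from fresh data. */
static unsigned int set_masked_flags(atomic_uint *flags, unsigned int mask,
                                     unsigned int new_bits)
{
    unsigned int old = atomic_load(flags);
    unsigned int desired;

    do {
        desired = (old & ~mask) | (new_bits & mask);
    } while (!atomic_compare_exchange_weak(flags, &old, desired));
    return old;
}
```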
Al Viro	38129a13e6	switch mnt_hash to hlist fixes RCU bug - walking through hlist is safe in face of element moves, since it's self-terminating. Cyclic lists are not - if we end up jumping to another hash chain, we'll loop infinitely without ever hitting the original list head. [fix for dumb braino folded] Spotted by: Max Kellermann <mk@cm4all.com> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-03-30 19:18:51 -04:00
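The reasoning in this commit can be made concrete with a toy model. There are two hash chains, and one entry is moved from chain A to chain B while a lockless walker is standing on it. Assumptions:

- The structures and walkers are hypothetical simplifications of hlist_head/hlist_node and list_head.
- A step bound stands in for "loops forever".

A circular walk stops only when it reaches its own head, which it never meets again once it has been carried onto chain B. A NULL-terminated walk simply runs off the end of chain B and stops. At worst it misses some entries, which is the false negative the commit accepts.

```c
#include <stddef.h>

struct node {
    struct node *next;
};

/* hlist-style chain: NULL-terminated, so any walk ends.
 * Returns the number of nodes visited, or -1 past max_steps. */
static int walk_hlist(const struct node *first, int max_steps)
{
    int steps = 0;

    for (const struct node *p = first; p != NULL; p = p->next)
        if (++steps > max_steps)
            return -1;
    return steps;
}

/* list_head-style chain: circular, so the walk ends only when it comes
 * back to the walker's own head. Returns steps, or -1 past max_steps. */
static int walk_cyclic(const struct node *head, const struct node *from,
                       int max_steps)
{
    int steps = 0;

    for (const struct node *p = from; p != head; p = p->next)
        if (++steps > max_steps)
            return -1;
    return steps;
}
```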
Al Viro	0b1b901b5a	don't bother with propagate_mnt() unless the target is shared If the dest_mnt is not shared, propagate_mnt() does nothing - there are no mounts to propagate to and thus no copies to create. Might as well not bother calling it in that case. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-03-30 19:18:50 -04:00
Al Viro	1d6a32acd7	keep shadowed vfsmounts together preparation to switching mnt_hash to hlist Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-03-30 19:18:50 -04:00
Al Viro	0818bf27c0	resizable namespace.c hashes * switch allocation to alloc_large_system_hash() * make sizes overridable by boot parameters (mhash_entries=, mphash_entries=) * switch mountpoint_hashtable from list_head to hlist_head Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-03-30 19:18:49 -04:00
Trond Myklebust	e911b8158e	NFSv4: Fix a use-after-free problem in open() If we interrupt the nfs4_wait_for_completion_rpc_task() call in nfs4_run_open_task(), then we don't prevent the RPC call from completing. So freeing up the opendata->f_attr.mdsthreshold in the error path in _nfs4_do_open() leads to a use-after-free when the XDR decoder tries to decode the mdsthreshold information from the server. Fixes: `82be417aa3` (NFSv4.1 cache mdsthreshold values on OPEN) Tested-by: Steve Dickson <SteveD@redhat.com> Cc: stable@vger.kernel.org # 3.5+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-03-28 20:12:10 -04:00
Sasha Levin	d9060742fb	ocfs2: check if cluster name exists before deref Commit `c74a3bdd9b` ("ocfs2: add clustername to cluster connection") is trying to strlcpy a string which was explicitly passed as NULL in the very same patch, triggering a NULL ptr deref. BUG: unable to handle kernel NULL pointer dereference at (null) IP: strlcpy (lib/string.c:388 lib/string.c:151) CPU: 19 PID: 19426 Comm: trinity-c19 Tainted: G W 3.14.0-rc7-next-20140325-sasha-00014-g9476368-dirty #274 RIP: strlcpy (lib/string.c:388 lib/string.c:151) Call Trace: ocfs2_cluster_connect (fs/ocfs2/stackglue.c:350) ocfs2_cluster_connect_agnostic (fs/ocfs2/stackglue.c:396) user_dlm_register (fs/ocfs2/dlmfs/userdlm.c:679) dlmfs_mkdir (fs/ocfs2/dlmfs/dlmfs.c:503) vfs_mkdir (fs/namei.c:3467) SyS_mkdirat (fs/namei.c:3488 fs/namei.c:3472) tracesys (arch/x86/kernel/entry_64.S:749) akpm: this patch probably disables the feature. A temporary thing to avoid trivial oopses. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-03-28 13:56:58 -07:00
Jan Kara	75c5a52da3	vfs: Allocate anon_inode_inode in anon_inode_init() Currently we allocate anon_inode_inode in anon_inodefs_mount(). This is somewhat fragile: if that function ever gets called again, it will overwrite the anon_inode_inode pointer. So move the initialization of anon_inode_inode to anon_inode_init(). Signed-off-by: Jan Kara <jack@suse.cz> [ Further simplified on suggestion from Dave Jones ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-03-27 09:52:54 -07:00
Greg Kroah-Hartman	72099304ee	Revert "sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()" This reverts commit `d1ba277e79`. As reported by Stephen, this patch breaks linux-next as a ppc patch suddenly (after 2 years) started using this old api call. So revert it for now, it will go away in 3.15-rc2 when we can change the PPC call to the new api. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Tejun Heo <tj@kernel.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-03-25 20:54:57 -07:00
Linus Torvalds	fce7fc79c8	fs: remove now stale label in anon_inode_init() The previous commit removed the register_filesystem() call and the associated error handling, but left the label for the error path that no longer exists. Remove that too. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-03-25 17:43:34 -07:00
Jan Kara	d6f2589ad5	fs: Avoid userspace mounting anon_inodefs filesystem anon_inodefs filesystem is a kernel internal filesystem userspace shouldn't mess with. Remove registration of it so userspace cannot even try to mount it (which would fail anyway because the filesystem is MS_NOUSER). This fixes an oops triggered by trinity when it tried mounting anon_inodefs, which overwrote the anon_inode_inode pointer while another CPU was in anon_inode_getfile() between ihold() and d_instantiate(), thus effectively creating a dentry pointing to an inode without holding a reference to it. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-03-25 17:42:16 -07:00
Linus Torvalds	632b06aa28	Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux Pull nfsd fix from Bruce Fields: "J R Okajima sent this early and I was just slow to pass it along, apologies. Fortunately it's a simple fix" * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: nfsd: fix lost nfserrno() call in nfsd_setattr()	2014-03-25 15:24:11 -07:00
Matthew Wilcox	e04027e887	ext4: fix comment typo Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-24 15:15:07 -04:00
Matthew Wilcox	94350ab5c3	ext4: make ext4_block_zero_page_range static It's only called within inode.c, so make it static, remove its prototype from ext4.h and move it above all of its callers so it doesn't need a prototype within inode.c. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-24 15:09:16 -04:00
Theodore Ts'o	5f16f3225b	ext4: atomically set inode->i_flags in ext4_set_inode_flags() Use cmpxchg() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race where an immutable file has the immutable flag cleared for a brief window of time. Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2014-03-24 14:43:12 -04:00
Theodore Ts'o	ed3654eb98	ext4: optimize Hurd tests when reading/writing inodes Set an in-memory superblock flag to indicate whether the file system is designed to support the Hurd. Also, add a sanity check to make sure the 64-bit feature is not set for Hurd file systems, since i_file_acl_high conflicts with a Hurd-specific field. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-24 14:09:06 -04:00
Rusty Russell	58f86cc89c	VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms. Summary of http://lkml.org/lkml/2014/3/14/363 : Ted: module_param(queue_depth, int, 444) Joe: 0444! Rusty: User perms >= group perms >= other perms? Joe: CLASS_ATTR, DEVICE_ATTR, SENSOR_ATTR and SENSOR_ATTR_2? Side effect of stricter permissions means removing the unnecessary S_IFREG from several callers. Note that the BUILD_BUG_ON_ZERO((perm) & 2) test was removed: a fair number of drivers fail this test, so that will be the debate for a future patch. Suggested-by: Joe Perches <joe@perches.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> for drivers/pci/slot.c Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2014-03-24 12:21:00 +10:30
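The rule summarized above has two parts: the mode must be a valid octal mode, and user permissions must be at least group permissions, which must be at least other permissions. It can be written as a runtime check. The kernel macro enforces it at compile time. octal_perms_ok() below is a hypothetical runtime equivalent, for illustration only.

The sketch also shows why Ted's 444 is caught. Decimal 444 is octal 0674, whose group bits (7) exceed its user bits (6).

```c
#include <stdbool.h>

/* Hypothetical runtime version of the checks: the mode must fit in
 * 0777 and user perms >= group perms >= other perms. */
static bool octal_perms_ok(int perms)
{
    if (perms < 0 || perms > 0777)
        return false;

    int user = (perms >> 6) & 7;
    int group = (perms >> 3) & 7;
    int other = perms & 7;

    return user >= group && group >= other;
}
```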
Al Viro	b37199e626	rcuwalk: recheck mount_lock after mountpoint crossing attempts We can get false negative from __lookup_mnt() if an unrelated vfsmount gets moved. In that case legitimize_mnt() is guaranteed to fail, and we will fall back to non-RCU walk... unless we end up running into a hard error on a filesystem object we wouldn't have reached if not for that false negative. IOW, delaying that check until the end of pathname resolution is wrong - we should recheck right after we attempt to cross the mountpoint. We don't need to recheck unless we see d_mountpoint() being true - in that case even if we have just raced with mount/umount, we can simply go on as if we'd come at the moment when the sucker wasn't a mountpoint; if we run into a hard error as the result, it was a legitimate outcome. __lookup_mnt() returning NULL is different in that respect, since it might've happened due to operation on completely unrelated mountpoint. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-03-23 00:32:55 -04:00
Al Viro	e825196d48	make prepend_name() work correctly when called with negative buflen In all callchains leading to prepend_name(), the value left in buflen is eventually discarded unused if prepend_name() has returned a negative. So we are free to do what prepend() does, and subtract from buflen before checking for underflow (which turns into checking the sign of subtraction result, of course). Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-03-23 00:28:40 -04:00
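Here is a user-space sketch of the prepend() behaviour this commit relies on. The length is subtracted first and only then checked. A call made with an already negative buflen therefore also fails, without touching the buffer.

The sketch builds a path right to left from the end of a buffer. It is modeled on the description in the commit message, not copied from the kernel source.

```c
#include <errno.h>
#include <string.h>

/* Copy namelen bytes of str in front of *buffer, moving *buffer back.
 * *buflen is reduced first and checked afterwards, so a call that
 * starts with an already-negative *buflen fails as well, without
 * writing anything. */
static int prepend(char **buffer, int *buflen, const char *str, int namelen)
{
    *buflen -= namelen;
    if (*buflen < 0)
        return -ENAMETOOLONG;
    *buffer -= namelen;
    memcpy(*buffer, str, namelen);
    return 0;
}
```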
Eric Biggers	99aea68134	vfs: Don't let __fdget_pos() get FMODE_PATH files Commit `bd2a31d522` ("get rid of fget_light()") introduced the __fdget_pos() function, which returns the resulting file pointer and fdput flags combined in an 'unsigned long'. However, it also changed the behavior to return files with FMODE_PATH set, which shouldn't happen because read(), write(), lseek(), etc. aren't allowed on such files. This commit restores the old behavior. This regression actually had no effect on read() and write() since FMODE_READ and FMODE_WRITE are not set on file descriptors opened with O_PATH, but it did cause lseek() on a file descriptor opened with O_PATH to fail with ESPIPE rather than EBADF. Signed-off-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-03-23 00:03:12 -04:00
Eric Biggers	d7a15f8d07	vfs: atomic f_pos access in llseek() Commit `9c225f2655` ("vfs: atomic f_pos accesses as per POSIX") changed several system calls to use fdget_pos() instead of fdget(), but missed sys_llseek(). Fix it. Signed-off-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-03-23 00:03:12 -04:00
Liu Bo	00fdf13a2e	Btrfs: fix a crash of clone with inline extents's split xfstests's btrfs/035 triggers a BUG_ON, which we use to detect the split of inline extents in __btrfs_drop_extents(). For inline extents, we cannot duplicate another EXTENT_DATA item, because it breaks the rule of inline extents, that is, 'start offset' needs to be 0. We have set limitations for the source inode's compressed inline extents, because it needs to decompress and recompress. Now the destination inode's inline extents also need similar limitations. With this, xfstests btrfs/035 doesn't run into panic. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-03-21 17:35:18 -07:00
Chris Mason	73b802f447	btrfs: fix uninit variable warning fs/btrfs/send.c:2926: warning: ‘entry’ may be used uninitialized in this function Signed-off-by: Chris Mason <clm@fb.com>	2014-03-21 15:30:44 -07:00
Josef Bacik	4485386853	Btrfs: take into account total references when doing backref lookup I added an optimization for large files where we would stop searching for backrefs once we had looked at the number of references we currently had for this extent. This works great most of the time, but for snapshots that point to this extent and have changes in the original root this assumption falls on its face. So keep track of any delayed ref mods made and add in the actual ref count as reported by the extent item and use that to limit how far down an inode we'll search for extents. Thanks, Reportedy-by: Hugo Mills <hugo@carfax.org.uk> Signed-off-by: Josef Bacik <jbacik@fb.com> Reported-by: Hugo Mills <hugo@carfax.org.uk> Tested-by: Hugo Mills <hugo@carfax.org.uk> Signed-off-by: Chris Mason <clm@fb.com>	2014-03-21 15:28:09 -07:00
Filipe Manana	bfa7e1f8be	Btrfs: part 2, fix incremental send's decision to delay a dir move/rename For an incremental send, fix the process of determining whether the directory inode we're currently processing needs to have its move/rename operation delayed. We were ignoring the fact that if the inode's new immediate ancestor has a higher inode number than ours but wasn't renamed/moved, we might still need to delay our move/rename, because some other ancestor directory higher in the hierarchy might have an inode number higher than ours and was renamed/moved too - in this case we have to wait for rename/move of that ancestor to happen before our current directory's rename/move operation. Simple steps to reproduce this issue: $ mkfs.btrfs -f /dev/sdd $ mount /dev/sdd /mnt $ mkdir -p /mnt/a/x1/x2 $ mkdir /mnt/a/Z $ mkdir -p /mnt/a/x1/x2/x3/x4/x5 $ btrfs subvolume snapshot -r /mnt /mnt/snap1 $ btrfs send /mnt/snap1 -f /tmp/base.send $ mv /mnt/a/x1/x2/x3 /mnt/a/Z/X33 $ mv /mnt/a/x1/x2 /mnt/a/Z/X33/x4/x5/X22 $ btrfs subvolume snapshot -r /mnt /mnt/snap2 $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send The incremental send caused the kernel code to enter an infinite loop when building the path string for directory Z after its references are processed. A more complex scenario: $ mkfs.btrfs -f /dev/sdd $ mount /dev/sdd /mnt $ mkdir -p /mnt/a/b/c/d $ mkdir /mnt/a/b/c/d/e $ mkdir /mnt/a/b/c/d/f $ mv /mnt/a/b/c/d/e /mnt/a/b/c/d/f/E2 $ mkdir /mnt/a/b/c/g $ mv /mnt/a/b/c/d /mnt/a/b/D2 $ btrfs subvolume snapshot -r /mnt /mnt/snap1 $ btrfs send /mnt/snap1 -f /tmp/base.send $ mkdir /mnt/a/o $ mv /mnt/a/b/c/g /mnt/a/b/D2/f/G2 $ mv /mnt/a/b/D2 /mnt/a/b/dd $ mv /mnt/a/b/c /mnt/a/C2 $ mv /mnt/a/b/dd/f /mnt/a/o/FF $ mv /mnt/a/b /mnt/a/o/FF/E2/BB $ btrfs subvolume snapshot -r /mnt /mnt/snap2 $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send A test case for xfstests follows.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-03-21 15:25:48 -07:00
Filipe Manana	7b119a8b89	Btrfs: fix incremental send's decision to delay a dir move/rename It's possible to change the parent/child relationship between directories in such a way that if a child directory has a higher inode number than its parent, it doesn't necessarily mean the child rename/move operation can be performed immediately. The parent might have its own rename/move operation delayed, therefore in this case the child needs to have its rename/move operation delayed too, and be performed after its new parent's rename/move. Steps to reproduce the issue: $ umount /mnt $ mkfs.btrfs -f /dev/sdd $ mount /dev/sdd /mnt $ mkdir /mnt/A $ mkdir /mnt/B $ mkdir /mnt/C $ mv /mnt/C /mnt/A $ mv /mnt/B /mnt/A/C $ mkdir /mnt/A/C/D $ btrfs subvolume snapshot -r /mnt /mnt/snap1 $ btrfs send /mnt/snap1 -f /tmp/base.send $ mv /mnt/A/C/D /mnt/A/D2 $ mv /mnt/A/C/B /mnt/A/D2/B2 $ mv /mnt/A/C /mnt/A/D2/B2/C2 $ btrfs subvolume snapshot -r /mnt /mnt/snap2 $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send The incremental send caused the kernel code to enter an infinite loop when building the path string for directory C after its references are processed. The necessary conditions here are that C has an inode number higher than both A and B, and B has a higher inode number than A, and D has the highest inode number, that is: inode_number(A) < inode_number(B) < inode_number(C) < inode_number(D) The same issue could happen if after the first snapshot there's any number of intermediary parent directories between A2 and B2, and between B2 and C2. A test case for xfstests follows, covering this simple case and more advanced ones, with files and hard links created inside the directories. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-03-21 15:24:27 -07:00
Filipe Manana	425b5dafc8	Btrfs: remove unnecessary inode generation lookup in send No need to search in the send tree for the generation number of the inode, we already have it in the recorded_ref structure passed to us. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-03-20 17:15:28 -07:00
Filipe Manana	21543baddc	Btrfs: fix race when updating existing ref head While we update an existing ref head's extent_op, we're not holding its spinlock, so while we're updating its extent_op contents (key, flags) we can have a task running __btrfs_run_delayed_refs() that holds the ref head's lock and sets its extent_op to NULL right after the task updating the ref head just checked its extent_op was not NULL. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-03-20 17:15:28 -07:00
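A minimal user-space sketch of the locking rule this fix restores, assuming a pthread mutex in place of the ref head's spinlock. struct ref_head, struct extent_op, update_head() and run_head() are hypothetical simplifications, not the btrfs structures; the point is only that the NULL check and the update of extent_op happen under the same lock the runner holds when it clears the pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the btrfs delayed-ref types. */
struct extent_op {
	unsigned long long key;
	unsigned long long flags;
};

struct ref_head {
	pthread_mutex_t lock;        /* plays the role of the head's spinlock */
	struct extent_op *extent_op; /* protected by lock */
};

/* Runner side: consumes the pending extent_op and clears it. */
void run_head(struct ref_head *head)
{
	pthread_mutex_lock(&head->lock);
	free(head->extent_op);
	head->extent_op = NULL;
	pthread_mutex_unlock(&head->lock);
}

/*
 * Updater side: the NULL check and the update happen under the same
 * lock, so run_head() cannot clear extent_op between the two.
 * Returns 1 if an existing extent_op was updated, 0 otherwise.
 */
int update_head(struct ref_head *head, unsigned long long key,
		unsigned long long flags)
{
	int updated = 0;

	pthread_mutex_lock(&head->lock);
	if (head->extent_op) {
		head->extent_op->key = key;
		head->extent_op->flags |= flags;
		updated = 1;
	}
	pthread_mutex_unlock(&head->lock);
	return updated;
}
```

Without the lock around the check, a runner could free and clear extent_op after the updater's NULL check but before its writes.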
Qu Wenruo	c3a468915a	btrfs: Add trace for btrfs_workqueue alloc/destroy Since most of the btrfs_workqueue is printed as pointer address, for easier analysis, add trace for btrfs_workqueue alloc/destroy. So it is possible to determine the workqueue that a given work belongs to (by comparing the wq pointer address with alloc trace event). Signed-off-by: Qu Wenruo <quenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-03-20 17:15:28 -07:00
Filipe Manana	f094c9bd3e	Btrfs: less fs tree lock contention when using autodefrag When finding new extents during an autodefrag, don't do so many fs tree lookups to find an extent with a size smaller than the target threshold. Instead, after each fs tree forward search immediately unlock upper levels and process the entire leaf while holding a read lock on the leaf, since our leaf processing is very fast. This reduces lock contention, allowing for higher concurrency when other tasks want to write/update items related to other inodes in the fs tree, as we're not holding read locks on upper tree levels while processing the leaf and we do less tree searches. Test: sysbench --test=fileio --file-num=512 --file-total-size=16G \ --file-test-mode=rndrw --num-threads=32 --file-block-size=32768 \ --file-rw-ratio=3 --file-io-mode=sync --max-time=1800 \ --max-requests=10000000000 [prepare|run] (filesystem mounted with -o autodefrag, averages of 5 runs) Before this change: 58.852Mb/sec throughput, read 77.589Gb, written 25.863Gb After this change: 63.034Mb/sec throughput, read 83.102Gb, written 27.701Gb Test machine: quad core intel i5-3570K, 32Gb of RAM, SSD. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-03-20 17:15:27 -07:00
Guangyu Sun	72de6b5393	Btrfs: return EPERM when deleting a default subvolume The error message is confusing: # btrfs sub delete /mnt/mysub/ Delete subvolume '/mnt/mysub' ERROR: cannot delete '/mnt/mysub' - Directory not empty The error message does not make sense to me: It's not about deleting a directory but it's a subvolume, and it doesn't matter if the subvolume is empty or not. Maybe EPERM or is more appropriate in this case, combined with an explanatory kernel log message. (e.g. "subvolume with ID 123 cannot be deleted because it is configured as default subvolume.") Reported-by: Koen De Wit <koen.de.wit@oracle.com> Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-03-20 17:15:27 -07:00
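A small sketch of the check the commit describes, assuming the caller already knows the default subvolume ID. may_delete_subvol() is a hypothetical helper, and the log text reuses the example message quoted in the commit, not necessarily the wording that was merged.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical helper: refuse to delete the subvolume configured as the
 * default, returning EPERM with an explanatory message instead of the
 * misleading "Directory not empty".
 */
int may_delete_subvol(uint64_t subvol_id, uint64_t default_subvol_id)
{
	if (subvol_id == default_subvol_id) {
		fprintf(stderr,
			"subvolume with ID %" PRIu64 " cannot be deleted "
			"because it is configured as default subvolume.\n",
			subvol_id);
		return -EPERM;
	}
	return 0;
}
```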
Filipe Manana	ef66af101a	Btrfs: add missing kfree in btrfs_destroy_workqueue Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-03-20 17:15:27 -07:00
Filipe Manana	308d9800b2	Btrfs: cache extent states in defrag code path When locking file ranges in the inode's io_tree, cache the first extent state that belongs to the target range, so that when unlocking the range we don't need to search in the io_tree again, reducing CPU time and therefore holding the io_tree's lock for a shorter period. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-03-20 17:15:27 -07:00
Josef Bacik	3bbb24b20a	Btrfs: fix deadlock with nested trans handles Zach found this deadlock that would happen like this btrfs_end_transaction <- reduce trans->use_count to 0 btrfs_run_delayed_refs btrfs_cow_block find_free_extent btrfs_start_transaction <- increase trans->use_count to 1 allocate chunk btrfs_end_transaction <- decrease trans->use_count to 0 btrfs_run_delayed_refs lock tree block we are cowing above ^^ We need to only decrease trans->use_count if it is above 1, otherwise leave it alone. This will make nested trans be the only ones who decrease their added ref, and will let us get rid of the trans->use_count++ hack if we have to commit the transaction. Thanks, cc: stable@vger.kernel.org Reported-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Tested-by: Zach Brown <zab@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-03-20 17:15:27 -07:00
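The use_count rule can be modelled in a few lines. This is a sketch with a hypothetical struct trans_handle and a callback standing in for btrfs_run_delayed_refs(); it only shows that, when the count is decremented only above 1, a start/end pair nested inside the outermost end takes the cheap path instead of running delayed refs a second time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified transaction handle. */
struct trans_handle {
	int use_count;     /* references held on this handle */
	int outer_runs;    /* how many times delayed refs were run */
	int nested_result; /* return value seen by the nested end */
};

/* A nested start on an existing handle just takes another reference. */
void start_nested(struct trans_handle *h)
{
	h->use_count++;
}

/*
 * Returns 1 if this was the outermost end (delayed refs were run),
 * 0 if it only dropped a nested reference.
 */
int end_transaction(struct trans_handle *h,
		    void (*run_delayed_refs)(struct trans_handle *))
{
	assert(h->use_count >= 1);
	if (h->use_count > 1) {
		/* Nested end: drop our reference and leave the rest alone. */
		h->use_count--;
		return 0;
	}
	/*
	 * Outermost end: use_count stays at 1 while delayed refs run, so a
	 * nested start/end inside them (e.g. for a chunk allocation) takes
	 * the branch above rather than running delayed refs again.
	 */
	h->outer_runs++;
	if (run_delayed_refs)
		run_delayed_refs(h);
	h->use_count = 0;
	return 1;
}

/* Simulates find_free_extent() allocating a chunk while refs run. */
void simulate_chunk_alloc(struct trans_handle *h)
{
	start_nested(h);
	h->nested_result = end_transaction(h, NULL);
}
```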
Theodore Ts'o	c4f6570605	ext4: kill i_version support for Hurd-castrated file systems The Hurd file system uses the inode field which is now used for i_version for its translator block. This means that ext2 file systems that are formatted for GNU Hurd can't be used to support NFSv4. Given that Hurd file systems don't support extents, and a huge number of modern file system features, this is no great loss. If we don't do this, the attempt to update the i_version field will stomp over the translator block field, which will cause file system corruption for Hurd file systems. This can be replicated via: mke2fs -t ext2 -o hurd /dev/vdc mount -t ext4 /dev/vdc /vdc touch /vdc/bug0000 umount /dev/vdc e2fsck -f /dev/vdc Addresses-Debian-Bug: #738758 Reported-By: Gabriele Giacone <1o5g4r8o@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-20 00:32:57 -04:00
Bob Peterson	733dbc1b21	GFS2: inline function gfs2_set_mode Here is a revised patch based on Steve's feedback: This patch eliminates function gfs2_set_mode which was only called in one place, and always returned 0. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-03-19 15:53:52 +00:00
Bob Peterson	f45dc26ded	GFS2: Remove extraneous function gfs2_security_init This patch eliminates function gfs2_security_init in favor of just calling security_inode_init_security directly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-03-19 15:16:58 +00:00
Bob Peterson	b00263d1ca	GFS2: Increase the max number of ACLs This patch increases the maximum number of ACLs from 25 to 300 for a 4K block size. The value is adjusted accordingly if the block size is smaller. Note that this is an arbitrary limit with a performance tradeoff, and that the physical limit is slightly over 500. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-03-19 15:16:24 +00:00
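The block-size adjustment can be expressed as a one-line calculation. This sketch assumes the limit scales linearly from 300 entries at a 4K block size, with the block size given as a power-of-two shift; acl_max_entries() is a hypothetical name, so the exact macro in fs/gfs2/acl.h should be checked before relying on it.

```c
#include <assert.h>

/*
 * Maximum ACL entries for a block of (1 << bsize_shift) bytes:
 * 300 at 4096 bytes (shift 12), scaled linearly for smaller blocks.
 */
unsigned int acl_max_entries(unsigned int bsize_shift)
{
	assert(bsize_shift >= 9 && bsize_shift <= 12); /* 512..4096 bytes */
	return (300u << bsize_shift) >> 12;
}
```

Under that assumption this yields 300, 150, 75 and 37 entries for 4096-, 2048-, 1024- and 512-byte blocks.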
Trond Myklebust	150e7260f3	NFSv4: Ensure we respect soft mount timeouts during trunking discovery Tested-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-03-19 08:34:40 -04:00
Trond Myklebust	f9b7ebdf7e	NFSv4: Schedule recovery if nfs40_walk_client_list() is interrupted If a timeout or a signal interrupts the NFSv4 trunking discovery SETCLIENTID_CONFIRM call, then we don't know whether or not the server has changed the callback identifier on us. Assume that it did, and schedule a 'path down' recovery... Tested-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-03-19 08:34:20 -04:00
T Makphaibulchoke	9c191f701c	ext4: each filesystem creates and uses its own mb_cache This patch adds new interfaces to create and destroy the cache, ext4_xattr_create_cache() and ext4_xattr_destroy_cache(), and removes the cache creation and destroy calls from ext4_init_xattr() and ext4_exit_xattr() in fs/ext4/xattr.c. fs/ext4/super.c has been changed so that when a filesystem is mounted a cache is allocated and attached to its ext4_sb_info structure. fs/mbcache.c has been changed so that only one slab allocator is allocated and used by all mbcache structures. Signed-off-by: T. Makphaibulchoke <tmac@hp.com>	2014-03-18 19:24:49 -04:00
T Makphaibulchoke	1f3e55fe02	fs/mbcache.c: decouple the locking of local from global data The patch increases the parallelism of mbcache by using the built-in lock in the hlist_bl_node to protect the mb_cache's local block and index hash chains. The global data mb_cache_lru_list and mb_cache_list continue to be protected by the global mb_cache_spinlock. New block group spinlock, mb_cache_bg_lock is also added to serialize accesses to mb_cache_entry's local data. A new member e_refcnt is added to the mb_cache_entry structure to help prevent an mb_cache_entry from being deallocated by a free while it is being referenced by either mb_cache_entry_get() or mb_cache_entry_find(). Signed-off-by: T. Makphaibulchoke <tmac@hp.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-18 19:23:20 -04:00
T Makphaibulchoke	3e037e5211	fs/mbcache.c: change block and index hash chain to hlist_bl_node This patch changes both the block and the index hash chains of each mb_cache to use a hlist_bl_node, which contains a built-in lock. This is the first step in decoupling the locks serializing accesses to mb_cache global data from those serializing accesses to each mb_cache_entry's local data. Signed-off-by: T. Makphaibulchoke <tmac@hp.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-18 19:19:41 -04:00
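The idea behind hlist_bl is that each hash bucket carries its own lock in bit 0 of the head pointer, so different chains can be locked independently of any global spinlock. The sketch below models that idea in user space with C11 atomics; the bl_* names are hypothetical and this is not the kernel's list_bl.h API, which builds on bit_spin_lock().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define BL_LOCK_BIT ((uintptr_t)1)

struct bl_node {
	struct bl_node *next;
	int value;
};

/* Bucket head: first-node pointer whose bit 0 doubles as a spinlock. */
struct bl_head {
	_Atomic uintptr_t first;
};

void bl_lock(struct bl_head *h)
{
	/* Spin until this caller is the one that set the lock bit. */
	while (atomic_fetch_or_explicit(&h->first, BL_LOCK_BIT,
					memory_order_acquire) & BL_LOCK_BIT)
		;
}

void bl_unlock(struct bl_head *h)
{
	atomic_fetch_and_explicit(&h->first, ~BL_LOCK_BIT,
				  memory_order_release);
}

int bl_is_locked(struct bl_head *h)
{
	return (atomic_load(&h->first) & BL_LOCK_BIT) != 0;
}

/* Caller must hold the bucket lock; node alignment keeps bit 0 free. */
struct bl_node *bl_first_locked(struct bl_head *h)
{
	return (struct bl_node *)(atomic_load_explicit(&h->first,
				memory_order_relaxed) & ~BL_LOCK_BIT);
}

void bl_add_head_locked(struct bl_head *h, struct bl_node *n)
{
	assert(bl_is_locked(h));
	n->next = bl_first_locked(h);
	/* Publish the new first node while keeping the lock bit set. */
	atomic_store_explicit(&h->first, (uintptr_t)n | BL_LOCK_BIT,
			      memory_order_relaxed);
}
```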
Lukas Czerner	b8a8684502	ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate Introduce new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same functionality as xfs ioctl XFS_IOC_ZERO_RANGE. It can be used to convert a range of file to zeros preferably without issuing data IO. Blocks should be preallocated for the regions that span holes in the file, and the entire range is preferably converted to unwritten extents. This can also be used to preallocate blocks past EOF in the same way as with fallocate. The FALLOC_FL_KEEP_SIZE flag should cause the inode size to remain the same. Also add appropriate tracepoints. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-18 18:05:35 -04:00
Lukas Czerner	0e8b6879f3	ext4: refactor ext4_fallocate code Move block allocation out of the ext4_fallocate into separate function called ext4_alloc_file_blocks(). This will allow us to use the same allocation code for other allocation operations such as zero range which is committed in the next patch. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-18 18:03:51 -04:00
Lukas Czerner	f282ac19d8	ext4: Update inode i_size after the preallocation Currently in ext4_fallocate we would update inode size, c_time and sync the file with every partial allocation, which is entirely unnecessary. It is true that if the crash happens in the middle of truncate we might end up with unchanged i_size, or c_time which I do not think is really a problem - it does not mean file system corruption in any way. Note that xfs is doing things the same way, i.e. it updates all of the above after the allocation is done. This commit moves all the updates after the allocation is done. In addition we also need to change m_time, as not only has the inode been changed but also data regions might have changed (unwritten extents). However m_time will be only updated when i_size changed. Also we do not need to be paranoid about changing the c_time only if the actual allocation has happened; we can change it even if we try to allocate only to find out that there are already blocks allocated. It's not really a big deal and it will save us some additional complexity. Also use ext4_debug, instead of ext4_warning in #ifdef EXT4FS_DEBUG section. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> - -- v3: Do not remove the code to set EXT4_INODE_EOFBLOCKS flag fs/ext4/extents.c | 96 ++++++++++++++++++++++++------------------------------- 1 file changed, 42 insertions(+), 54 deletions(-)	2014-03-18 17:44:35 -04:00
Liu ShuoX	e32634f5d5	pstore: Fix memory leak when decompress using big_oops_buf After successful decompression, the buffer pointed to by 'buf' will be lost, as 'buf' is overwritten by 'big_oops_buf' and will never be freed. Signed-off-by: Liu ShuoX <shuox.liu@intel.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>	2014-03-17 14:14:03 -07:00
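The leak pattern is easy to show in isolation. In this sketch, adopt_decompressed() is a hypothetical helper, not the pstore code: it copies the decompressed bytes out of a shared scratch buffer (standing in for big_oops_buf) and frees the old buffer before reassigning the pointer, which is the step the bug skipped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Replace *buf with a private copy of len decompressed bytes held in a
 * shared scratch buffer. Returns 0 on success; on allocation failure it
 * returns -1 and leaves *buf untouched so the caller still owns it.
 */
int adopt_decompressed(char **buf, const char *scratch, size_t len)
{
	char *copy = malloc(len);

	if (!copy)
		return -1;
	memcpy(copy, scratch, len);
	free(*buf);  /* without this, the old buffer is leaked */
	*buf = copy;
	return 0;
}
```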
Liu ShuoX	017321cf39	pstore: Fix buffer overflow while write offset equal to buffer size If the new offset is equal to prz->buffer_size, it won't wrap at this time and will return the old (overflowed) value next time. Signed-off-by: Liu ShuoX <shuox.liu@intel.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>	2014-03-17 14:14:03 -07:00
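A sketch of the wrap-around check, assuming offsets into a ring buffer of size bytes are only valid in the range [0, size). ring_start_add() is a hypothetical name; comparing with >= rather than > is what makes an offset that lands exactly on size wrap to 0.

```c
#include <assert.h>
#include <stddef.h>

/* Advance a ring-buffer start offset by a bytes (a <= size). */
size_t ring_start_add(size_t start, size_t a, size_t size)
{
	size_t next;

	assert(size > 0 && start < size && a <= size);
	next = start + a;
	if (next >= size)  /* '>' would leave next == size, one past the end */
		next -= size;
	return next;
}
```

For a 10-byte buffer, advancing from 8 by 2 gives 0; the buggy > comparison would have returned 10.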
Liu ShuoX	34f0ec82e0	pstore: Correct the max_dump_cnt clearing of ramoops If ramoops_init_przs fails, max_dump_cnt is not reset to zero in the error handling path. Signed-off-by: Liu ShuoX <shuox.liu@intel.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>	2014-03-17 14:14:03 -07:00
Liu ShuoX	b0aa931fb8	pstore: Fix NULL pointer fault if get NULL prz in ramoops_get_next_prz ramoops_get_next_prz gets the prz according to the parameters. If it gets an uninitialized prz, accessing its members in the following persistent_ram_old_size(prz) call will cause a NULL pointer crash. Ex: if ftrace_size is 0, fprz will be NULL. Fix it by returning NULL in advance. Signed-off-by: Liu ShuoX <shuox.liu@intel.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>	2014-03-17 14:14:03 -07:00
Liu ShuoX	aa9a4a1edf	pstore: skip zero size persistent ram buffer in traverse In ramoops_pstore_read, a valid prz pointer with a zero-size buffer will break the traversal of all persistent RAM buffers, so later buffers might be lost. Signed-off-by: Liu ShuoX <shuox.liu@intel.com> Cc: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Cc: Colin Cross <ccross@android.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com>	2014-03-17 14:14:03 -07:00
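The two traversal fixes above can be sketched together: a missing (NULL) buffer or one with nothing recorded should be skipped rather than ending the walk. struct ram_buf and next_nonempty() are hypothetical simplifications of the ramoops zones and ramoops_get_next_prz().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a persistent RAM zone. */
struct ram_buf {
	size_t old_size;   /* bytes recorded from the previous boot */
};

/*
 * Return the index of the next buffer at or after *pos that is present
 * and non-empty, advancing *pos past it; return -1 when none are left.
 */
int next_nonempty(struct ram_buf **bufs, int count, int *pos)
{
	while (*pos < count) {
		int i = (*pos)++;

		if (!bufs[i] || bufs[i]->old_size == 0)
			continue;  /* skip instead of stopping the traversal */
		return i;
	}
	return -1;
}
```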
Liu ShuoX	57fd835385	pstore: clarify clearing of _read_cnt in ramoops_context *_read_cnt in ramoops_context needs to be cleared during pstore ->open to support reading the records multiple times. The patch adds the missing ftrace_read_cnt clearing and removes the duplicate clearing in ramoops_probe. Signed-off-by: Liu ShuoX <shuox.liu@intel.com> Cc: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Cc: Colin Cross <ccross@android.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com>	2014-03-17 14:14:03 -07:00
Chuck Lever	706cb8db3b	NFS: advertise only supported callback netids NFSv4.0 clients use the SETCLIENTID operation to inform NFS servers how to contact a client's callback service. If a server cannot contact a client's callback service, that server will not delegate to that client, which results in a performance loss. Our client advertises "rdma" as the callback netid when the forward channel is "rdma". But our client always starts only "tcp" and "tcp6" callback services. Instead of advertising the forward channel netid, advertise "tcp" or "tcp6" as the callback netid, based on the value of the clientaddr mount option, since those are what our client currently supports. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=69171 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-03-17 16:04:54 -04:00
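A user-space sketch of the selection rule, assuming clientaddr is a numeric address string. callback_netid() is a hypothetical helper, not the NFS client code; it only shows that the advertised netid follows the clientaddr family rather than the forward channel's transport.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Advertise only netids the client actually serves callbacks on:
 * "tcp6" for an IPv6 clientaddr, "tcp" otherwise, even when the forward
 * channel is "rdma".
 */
const char *callback_netid(const char *clientaddr)
{
	struct in6_addr addr6;

	if (inet_pton(AF_INET6, clientaddr, &addr6) == 1)
		return "tcp6";
	return "tcp";
}
```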
Chuck Lever	a7697f6ff8	NFS: Clean up: revert increase in READDIR RPC buffer max size Security labels go with each directory entry, thus they are always stored in the page cache, not in the head buffer. The length of the reply that goes in head[0] should not have changed to support NFSv4.2 labels. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-03-17 15:30:38 -04:00
Trond Myklebust	bd0f725c4c	Merge branch 'devel' into linux-next	2014-03-17 15:15:21 -04:00
Jeff Layton	f7be728468	nfs: emit a fsnotify_nameremove call in sillyrename codepath If a file is sillyrenamed, then the generic vfs_unlink code will skip emitting fsnotify events for it. This patch has the sillyrename code do that instead. In truth this is a little bit odd since we aren't actually removing the dentry per se, but renaming it. Still, this is probably the right thing to do since it's what userland apps expect to see when an unlink() occurs or some file is renamed on top of the dentry. Signed-off-by: Jeff Layton <jlayton@redhat.com> Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-03-17 15:14:17 -04:00
Jeff Layton	33912be816	nfs: remove synchronous rename code Now that nfs_rename uses the async infrastructure, we can remove this. Signed-off-by: Jeff Layton <jlayton@redhat.com> Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-03-17 15:14:17 -04:00
Jeff Layton	80a491fd40	nfs: convert nfs_rename to use async_rename infrastructure There isn't much sense in maintaining two separate versions of rename code. Convert nfs_rename to use the asynchronous rename infrastructure that nfs_sillyrename uses, and emulate synchronous behavior by having the task just wait on the reply. Signed-off-by: Jeff Layton <jlayton@redhat.com> Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-03-17 15:14:17 -04:00
Jeff Layton	0e862a4051	nfs: make nfs_async_rename non-static ...and move the prototype for nfs_sillyrename to internal.h. Signed-off-by: Jeff Layton <jlayton@redhat.com> Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-03-17 15:14:16 -04:00
Jeff Layton	96f9d8c074	nfs: abstract out code needed to complete a sillyrename The async rename code is currently "polluted" with some parts that are really just for sillyrenames. Add a new "complete" operation vector to the nfs_renamedata to separate out the stuff that just needs to be done for a sillyrename. Signed-off-by: Jeff Layton <jlayton@redhat.com> Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-03-17 15:14:16 -04:00
Eric Whitney	c063449394	ext4: fix partial cluster handling for bigalloc file systems Commit `9cb00419fa`, which enables hole punching for bigalloc file systems, exposed a bug introduced by commit `6ae06ff51e` in an earlier release. When run on a bigalloc file system, xfstests generic/013, 068, 075, 083, 091, 100, 112, 127, 263, 269, and 270 fail with e2fsck errors or cause kernel error messages indicating that previously freed blocks are being freed again. The latter commit optimizes the selection of the starting extent in ext4_ext_rm_leaf() when hole punching by beginning with the extent supplied in the path argument rather than with the last extent in the leaf node (as is still done when truncating). However, the code in rm_leaf that initially sets partial_cluster to track cluster sharing on extent boundaries is only guaranteed to run if rm_leaf starts with the last node in the leaf. Consequently, partial_cluster is not correctly initialized when hole punching, and a cluster on the boundary of a punched region that should be retained may instead be deallocated. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-03-13 23:34:16 -04:00
Eric Whitney	31cf0f2c31	ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents Code deallocating the extent path referenced by an argument to ext4_ext_handle_uninitialized_extents was made redundant with identical code in its one caller, ext4_ext_map_blocks, by commit `3779473246`. Allocating and deallocating the path in the same function also makes the code clearer. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-13 23:14:46 -04:00
Theodore Ts'o	38c03b3439	ext4: only call sync_filesystem() when remounting read-only This is the only time it is required for ext4. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-13 22:49:42 -04:00
Frederic Weisbecker	bfc3f0281e	cputime: Default implementation of nsecs -> cputime conversion The architectures that override cputime_t (s390, ppc) don't provide any version of nsecs_to_cputime(). Indeed this cputime_t implementation by backend only happens when CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y under which the core code doesn't make any use of nsecs_to_cputime(). At least for now. We are going to make broader use of it, so let's provide a default version with microsecond granularity. It should be good enough for most use cases. Cc: Ingo Molnar <mingo@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2014-03-13 15:56:43 +01:00
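The default conversion amounts to truncating to microseconds and reusing an existing microsecond conversion. This sketch assumes cputime_t counts jiffies at a hypothetical HZ of 1000 and that the microsecond conversion rounds up, as usecs_to_jiffies() does; the real definitions depend on the architecture and configuration.

```c
#include <assert.h>
#include <stdint.h>

#define HZ            1000ULL  /* assumed tick rate for this sketch */
#define USEC_PER_SEC  1000000ULL
#define NSEC_PER_USEC 1000ULL

typedef uint64_t cputime_t;    /* here: jiffies */

/* Microseconds to jiffies, rounding up. */
cputime_t usecs_to_cputime(uint64_t usecs)
{
	const uint64_t usecs_per_tick = USEC_PER_SEC / HZ;

	return (usecs + usecs_per_tick - 1) / usecs_per_tick;
}

/* Default: drop to microsecond granularity, then convert. */
cputime_t nsecs_to_cputime(uint64_t nsecs)
{
	return usecs_to_cputime(nsecs / NSEC_PER_USEC);
}
```

Anything below one microsecond is discarded, so 999 ns converts to 0 while 1000 ns rounds up to one tick.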
Theodore Ts'o	02b9984d64	fs: push sync_filesystem() down to the file system's remount_fs() Previously, the no-op "mount -o remount /dev/xxx" operation when the file system is already mounted read-write causes an implied, unconditional syncfs(). This seems pretty stupid, and it's certainly not documented or guaranteed to do this, nor is it particularly useful, except in the case where the file system was mounted rw and is getting remounted read-only. However, it's possible that there might be some file systems that are actually depending on this behavior. In most file systems, it's probably fine to only call sync_filesystem() when transitioning from read-write to read-only, and there are some file systems where this is not needed at all (for example, for a pseudo-filesystem or something like romfs). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Jan Kara <jack@suse.cz> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Anders Larsen <al@alarsen.net> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: xfs@oss.sgi.com Cc: linux-btrfs@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: codalist@coda.cs.cmu.edu Cc: linux-ext4@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: fuse-devel@lists.sourceforge.net Cc: cluster-devel@redhat.com Cc: linux-mtd@lists.infradead.org Cc: jfs-discussion@lists.sourceforge.net Cc: linux-nfs@vger.kernel.org Cc: linux-nilfs@vger.kernel.org Cc: linux-ntfs-dev@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Cc: reiserfs-devel@vger.kernel.org	2014-03-13 10:14:33 -04:00
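For file systems that only need the sync on a read-write to read-only transition, the condition can be written as a small predicate. This is a sketch with a hypothetical helper name, using the MS_RDONLY flag from <sys/mount.h>; under this assumption a file system's remount_fs() would call sync_filesystem() only when it returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mount.h>  /* MS_RDONLY */

/*
 * old_flags: the superblock's current flags; new_flags: the flags
 * requested by the remount. Sync only when going from rw to ro.
 */
bool remount_needs_sync(unsigned long old_flags, unsigned long new_flags)
{
	bool was_rw = !(old_flags & MS_RDONLY);
	bool going_ro = (new_flags & MS_RDONLY) != 0;

	return was_rw && going_ro;
}
```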
Dave Chinner	fe986f9d88	Merge branch 'xfs-O_TMPFILE-support' into for-next Conflicts: fs/xfs/xfs_trans_resv.c - fix for XFS_INODE_CLUSTER_SIZE macro removal	2014-03-13 19:14:43 +11:00
Dave Chinner	5f44e4c185	Merge branch 'xfs-bug-fixes-for-3.15-2' into for-next	2014-03-13 19:13:05 +11:00
Dave Chinner	49ae4b97d7	Merge branch 'xfs-verifier-cleanup' into for-next	2014-03-13 19:12:33 +11:00
Dave Chinner	730357a5cb	Merge branch 'xfs-stack-fixes' into for-next	2014-03-13 19:12:13 +11:00
Dave Chinner	b6db0551fd	Merge branch 'xfs-collapse-range' into for-next	2014-03-13 19:11:06 +11:00
Lukas Czerner	376ba31314	xfs: Add support for FALLOC_FL_ZERO_RANGE Introduce new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same functionality as xfs ioctl XFS_IOC_ZERO_RANGE. We can also preallocate blocks past EOF in the same way as with fallocate. Flag FALLOC_FL_KEEP_SIZE will cause the inode size to remain the same even if we preallocate blocks past EOF. It uses the same code to zero range as it is used by the XFS_IOC_ZERO_RANGE ioctl. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-03-13 19:07:58 +11:00
Lukas Czerner	409332b65d	fs: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate Introduce new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same functionality as xfs ioctl XFS_IOC_ZERO_RANGE. It can be used to convert a range of file to zeros preferably without issuing data IO. Blocks should be preallocated for the regions that span holes in the file, and the entire range is preferably converted to unwritten extents - even though the file system may choose to zero out the extent or do whatever else results in reading zeros from the range while the range remains allocated for the file. This can also be used to preallocate blocks past EOF in the same way as with fallocate. The FALLOC_FL_KEEP_SIZE flag should cause the inode size to remain the same. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-03-13 19:07:42 +11:00
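A user-space usage sketch of the new flag, assuming kernel headers new enough to define FALLOC_FL_ZERO_RANGE in <linux/falloc.h>. zero_range_keep_size() and zero_range_demo() are hypothetical helpers. File systems that do not implement the mode report EOPNOTSUPP, so callers should be prepared to fall back.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>        /* fallocate() with _GNU_SOURCE */
#include <linux/falloc.h> /* FALLOC_FL_ZERO_RANGE, FALLOC_FL_KEEP_SIZE */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Zero [offset, offset + len) without changing the file size.
 * Returns 0 on success, otherwise the errno from fallocate(). */
int zero_range_keep_size(int fd, off_t offset, off_t len)
{
	if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
		      offset, len) == 0)
		return 0;
	return errno;
}

/* Usage: fill a 64-byte temp file with 'x', zero bytes 16..31, check.
 * Returns 0 if verified, 1 if unsupported here, -1 on other errors. */
int zero_range_demo(void)
{
	char path[] = "/tmp/zero-range-XXXXXX";
	char buf[64];
	int fd, err, ret = -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);

	memset(buf, 'x', sizeof(buf));
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		goto out;

	err = zero_range_keep_size(fd, 16, 16);
	if (err == EOPNOTSUPP || err == ENOSYS) {
		ret = 1;
		goto out;
	}
	if (err != 0)
		goto out;

	if (pread(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		goto out;
	if (buf[15] == 'x' && buf[16] == 0 && buf[31] == 0 &&
	    buf[32] == 'x' && lseek(fd, 0, SEEK_END) == 64)
		ret = 0;
out:
	close(fd);
	return ret;
}
```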
Theodore Ts'o	66a4cb187b	jbd2: improve error messages for inconsistent journal heads Fix up error messages printed when the transaction pointers in a journal head are inconsistent. This improves the error messages which are printed when running xfstests generic/068 in data=journal mode. See the bug report at: https://bugzilla.kernel.org/show_bug.cgi?id=60786 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-12 16:38:03 -04:00
Bob Peterson	428fd95d85	GFS2: Re-add a call to log_flush_wait when flushing the journal Upstream commit `34cc178` changed a line of code from calling function log_flush_commit to calling log_write_header. This had the effect of eliminating a call to function log_flush_wait. That causes the journal to skip over log headers, which results in multiple wrap points, which itself leads to infinite loops in journal replay, both in the kernel code and fsck.gfs2 code. This patch re-adds that call. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-03-12 14:46:29 +00:00
Bob Peterson	01b172b7b1	GFS2: Ensure workqueue is scheduled after noexp request This patch closes a small timing window whereby a request to hold the transaction glock can get stuck. The problem is that after the DLM has granted the lock, it can get into a state whereby it doesn't transition the glock to a held state, due to not having requeued the glock state machine to finish the transition. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-03-12 14:45:48 +00:00
Abhi Das	48f8f711ed	GFS2: check NULL return value in gfs2_ok_to_move gfs2_lookupi() can return NULL if the path to the root is broken by another rename/rmdir. In this case gfs2_ok_to_move() must check for this NULL pointer and return error. Resolves: rhbz#1060246 Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-03-12 09:50:27 +00:00
Grant Likely	8357041a69	of: remove /proc/device-tree The same data is now available in sysfs, so we can remove the code that exports it in /proc and replace it with a symlink to the sysfs version. Tested on versatile qemu model and mpc5200 eval board. More testing would be appreciated. v5: Fixed up conflicts with mainline changes Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David S. Miller <davem@davemloft.net> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: Pantelis Antoniou <panto@antoniou-consulting.com>	2014-03-11 20:48:32 +00:00
Linus Torvalds	33807f4f0d	Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 Pull CIFS fixes from Steve French: "A fix for the problem which Al spotted in cifs_writev and a followup (noticed when fixing CVE-2014-0069) patch to ensure that cifs never sends more than the smb frame length over the socket (as we saw with that cifs_iovec_write problem that Jeff fixed last month)" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: cifs: mask off top byte in get_rfc1002_length() cifs: sanity check length of data to send before sending CIFS: Fix wrong pos argument of cifs_find_lock_conflict	2014-03-11 11:53:42 -07:00
Linus Torvalds	8712a00514	Merge branch 'akpm' (patches from Andrew Morton) Merge misc fixes from Andrew Morton: "Nine fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: cris: convert ffs from an object-like macro to a function-like macro hfsplus: add HFSX subfolder count support tools/testing/selftests/ipc/msgque.c: handle msgget failure return correctly MAINTAINERS: blackfin: add git repository revert "kallsyms: fix absolute addresses for kASLR" mm/Kconfig: fix URL for zsmalloc benchmark fs/proc/base.c: fix GPF in /proc/$PID/map_files mm/compaction: break out of loop on !PageBuddy in isolate_freepages_block mm: fix GFP_THISNODE callers and clarify	2014-03-10 17:26:36 -07:00
Sergei Antonov	d7d673a591	hfsplus: add HFSX subfolder count support Adds support for HFSX 'HasFolderCount' flag and a corresponding 'folderCount' field in folder records. (For reference see HFS_FOLDERCOUNT and kHFSHasFolderCountBit/kHFSHasFolderCountMask in Apple's source code.) Ignoring subfolder count leads to fs errors found by Mac: ... Checking catalog hierarchy. HasFolderCount flag needs to be set (id = 105) (It should be 0x10 instead of 0) Incorrect folder count in a directory (id = 2) (It should be 7 instead of 6) ... Steps to reproduce: Format with "newfs_hfs -s /dev/diskXXX". Mount in Linux. Create a new directory in root. Unmount. Run "fsck_hfs /dev/diskXXX". The patch handles directory creation, deletion, and rename. Signed-off-by: Sergei Antonov <saproj@gmail.com> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-03-10 17:26:21 -07:00
Artem Fetishev	70335abb26	fs/proc/base.c: fix GPF in /proc/$PID/map_files The expected logic of proc_map_files_get_link() is either to return 0 and initialize 'path' or return an error and leave 'path' uninitialized. By the time dname_to_vma_addr() returns 0 the corresponding vma may already be gone. In this case the path is not initialized but the return value is still 0. This results in 'general protection fault' inside d_path(). Steps to reproduce: CONFIG_CHECKPOINT_RESTORE=y fd = open(...); while (1) { mmap(fd, ...); munmap(fd, ...); } ls -la /proc/$PID/map_files Addresses https://bugzilla.kernel.org/show_bug.cgi?id=68991 Signed-off-by: Artem Fetishev <artem_fetishev@epam.com> Signed-off-by: Aleksandr Terekhov <aleksandr_terekhov@epam.com> Reported-by: <wiebittewas@gmail.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-03-10 17:26:20 -07:00
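The contract described here is "return 0 and fill in `path`, or return an error and leave it alone." A common way to enforce it is to start from a failure code and switch to 0 only after the output has actually been written. The sketch below models that pattern in userspace. The types and function names are hypothetical, and returning `-ENOENT` for a vanished mapping is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct path and a process's mappings. */
struct path { const char *name; };
struct mapping { unsigned long start; struct path file; };

static const struct mapping *find_exact(const struct mapping *maps, size_t n,
					unsigned long start)
{
	for (size_t i = 0; i < n; i++)
		if (maps[i].start == start)
			return &maps[i];
	return NULL;  /* mapping was unmapped after the name was parsed */
}

/* Contract: return 0 and fill *out, or return an error and leave *out alone. */
static int get_link(const struct mapping *maps, size_t n, unsigned long start,
		    struct path *out)
{
	int rc = -ENOENT;  /* assume failure until *out is actually set */
	const struct mapping *m = find_exact(maps, n, start);

	assert(out != NULL);
	if (m) {
		*out = m->file;
		rc = 0;
	}
	return rc;
}
```

Because `rc` can only become 0 on the branch that writes `*out`, a caller can never receive success together with an uninitialized path.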
Linus Torvalds	e6a4b6f5ea	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro. Clean up file table accesses (get rid of fget_light() in favor of the fdget() interface), add proper file position locking. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: get rid of fget_light() sockfd_lookup_light(): switch to fdget^W^Waway from fget_light vfs: atomic f_pos accesses as per POSIX ocfs2 syncs the wrong range...	2014-03-10 12:57:26 -07:00
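"Atomic f_pos accesses as per POSIX" means that a read on a shared open file must consume its bytes and advance the position as one step. Otherwise two readers could see the same offset. The sketch below models this with a mutex held across the copy and the position update. `struct sim_file` and `sim_read()` are hypothetical userspace stand-ins, not the VFS code.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical in-memory file shared by several readers. */
struct sim_file {
	const char *data;
	size_t size;
	off_t pos;                 /* shared file position */
	pthread_mutex_t pos_lock;  /* makes read-and-advance atomic */
};

static size_t sim_read(struct sim_file *f, char *buf, size_t len)
{
	assert(f != NULL && buf != NULL);
	pthread_mutex_lock(&f->pos_lock);
	size_t avail = (size_t)f->pos < f->size ? f->size - (size_t)f->pos : 0;
	size_t n = len < avail ? len : avail;
	memcpy(buf, f->data + f->pos, n);
	f->pos += (off_t)n;        /* advance under the same lock */
	pthread_mutex_unlock(&f->pos_lock);
	return n;
}
```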
Miao Xie	573bfb72f7	Btrfs: fix possible empty list access when flushing the delalloc inodes We didn't have a lock to protect access to the delalloc inodes list, so we might access an empty delalloc inodes list if someone started flushing delalloc inodes while the delalloc inodes had been moved to another list temporarily. Fix it by wrapping the access with a lock. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com>	2014-03-10 15:17:29 -04:00
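The race arises because a flusher temporarily splices the shared list onto a private one. During that window, a second flusher sees an empty list. The sketch below assumes the fix takes the form of a mutex held across the whole splice-and-flush pass, and it models the list as a simple count. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model: 'pending' stands in for the delalloc inode list. */
struct delalloc_root {
	int pending;                  /* inodes currently on the shared list */
	pthread_mutex_t list_lock;    /* protects 'pending' (the list head) */
	pthread_mutex_t flush_mutex;  /* held across a whole flush pass */
};

/* Flush all pending inodes; returns how many this call handled. */
static int flush_delalloc(struct delalloc_root *r)
{
	int flushed = 0;

	assert(r != NULL);
	/* Without this mutex, a second flusher could run while the list is
	 * spliced away below, see it empty, and wrongly return early. */
	pthread_mutex_lock(&r->flush_mutex);

	pthread_mutex_lock(&r->list_lock);
	int spliced = r->pending;     /* move the whole list to a private one */
	r->pending = 0;
	pthread_mutex_unlock(&r->list_lock);

	while (spliced-- > 0)
		flushed++;                /* "start writeback" for one inode */

	pthread_mutex_unlock(&r->flush_mutex);
	return flushed;
}
```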
Miao Xie	31f3d255c6	Btrfs: split the global ordered extents mutex When we create a snapshot, we only need to wait for the ordered extents in the source fs/file root. However, because a global mutex protects the ordered extents list of the source fs/file root (to avoid accessing an empty list), we had to wait whenever someone held that mutex to access the ordered extents list of another fs/file root. This patch splits the global mutex so that every fs/file root has its own mutex protecting its own list. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com>	2014-03-10 15:17:28 -04:00
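Splitting one global mutex into a per-root mutex means that waiting on one root no longer blocks behind work on another root. The sketch below shows this with a hypothetical `struct fs_root`. "Waiting" is reduced to clearing a counter.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical fs/file root with its own ordered-extents mutex. */
struct fs_root {
	int ordered_extents;            /* outstanding ordered extents */
	pthread_mutex_t ordered_mutex;  /* protects this root's list only */
};

/* Wait for (here: simply retire) this root's ordered extents. Only this
 * root's mutex is taken, so work on other roots does not block us. */
static void wait_ordered_extents(struct fs_root *root)
{
	assert(root != NULL);
	pthread_mutex_lock(&root->ordered_mutex);
	root->ordered_extents = 0;
	pthread_mutex_unlock(&root->ordered_mutex);
}
```

With a single global mutex, the usage below would deadlock: root `a` is busy while the code waits on root `b`. With per-root mutexes it completes immediately.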
Miao Xie	6c255e67ce	Btrfs: don't flush all delalloc inodes when we doesn't get s_umount lock We needn't flush all delalloc inodes when we don't get the s_umount lock; otherwise we would make the tasks wait for a long time. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com>	2014-03-10 15:17:27 -04:00
Miao Xie	24af7dd188	Btrfs: reclaim delalloc metadata more aggressively generic/074 in xfstests sometimes failed with ENOSPC. The reason is that we reclaimed only the space we needed from the space reserved for delalloc and then tried to reserve it; if another task did a no-flush reservation between the reclamation and the reservation, our reservation failed. Task1: shrink_delalloc() reclaims 1 block (1 block can now be reserved). Task2: a no-flush reservation reserves 1 block (0 blocks can now be reserved). Task1: reserving 1 block fails. Task1's reservation failed even though there would have been enough space had we reclaimed more beforehand. Fix this problem by reclaiming the reserved delalloc metadata space more aggressively. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com>	2014-03-10 15:17:26 -04:00
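The Task1/Task2 interleaving above can be replayed deterministically. The sketch models "more aggressive" reclamation as reclaiming more than the immediate 1-block need. The amounts and names are hypothetical, and this is not Btrfs's actual reclaim policy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical space accounting, in blocks. */
struct space { long free; long delalloc_reserved; };

/* Move up to 'want' blocks from the delalloc reservation to free space. */
static void shrink_delalloc(struct space *s, long want)
{
	long n = want < s->delalloc_reserved ? want : s->delalloc_reserved;
	s->delalloc_reserved -= n;
	s->free += n;
}

static bool try_reserve(struct space *s, long n)
{
	assert(n > 0);
	if (s->free < n)
		return false;
	s->free -= n;
	return true;
}

/* Replay: Task1 reclaims 'reclaim' blocks, Task2 does a no-flush
 * reservation of 1 block, then Task1 tries to reserve its 1 block.
 * Returns whether Task1 succeeds. */
static bool replay(long reclaim)
{
	struct space s = { .free = 0, .delalloc_reserved = 4 };

	shrink_delalloc(&s, reclaim);  /* Task1 */
	(void)try_reserve(&s, 1);      /* Task2 steals a block */
	return try_reserve(&s, 1);     /* Task1 */
}
```

Reclaiming exactly 1 block reproduces the failure. Reclaiming 2 leaves room for both tasks.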
Miao Xie	0424c54897	Btrfs: remove unnecessary lock in may_commit_transaction() The reasons are: - The per-cpu counter has its own lock to protect itself. - We don't need an exact value here. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com>	2014-03-10 15:17:25 -04:00
Miao Xie	b88935bf98	Btrfs: remove the unnecessary flush when preparing the pages Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com>	2014-03-10 15:17:25 -04:00
Miao Xie	41bd9ca459	Btrfs: just do dirty page flush for the inode with compression before direct IO As the comment in btrfs_direct_IO says, only compressed pages need to be flushed again to make sure they are on disk; ordinary pages don't. So we add an if statement that checks whether the inode has compressed pages and, if not, skips the flush. In addition, to prevent write ranges from intersecting, we need to wait for the running ordered extents. The current code waits for them twice: once before the direct IO starts (in btrfs_wait_ordered_range()) and again before we get the blocks. The first wait is unnecessary: because we can do direct IO without holding i_mutex, intersecting ordered extents may still appear during the direct IO, so the first wait cannot prevent the problem. We therefore use filemap_fdatawrite_range() instead of btrfs_wait_ordered_range() to remove the first wait. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com>	2014-03-10 15:17:24 -04:00
Miao Xie	af7a65097b	Btrfs: wake up the tasks that wait for the io earlier The tasks that wait for the IO_DONE flag only care about the IO of the dirty pages, so it is better to wake them up immediately after all the pages are written, rather than after the whole IO completion process finishes. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com>	2014-03-10 15:17:23 -04:00
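Waking IO_DONE waiters "as soon as all pages are written" amounts to broadcasting when the in-flight page count reaches zero, rather than at the end of the full completion path. The sketch below models this with a condition variable. `struct ordered_io` and its helpers are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical ordered-IO state shared by writers and waiters. */
struct ordered_io {
	int pages_left;             /* pages whose write is still in flight */
	bool io_done;               /* set once every page is written */
	pthread_mutex_t lock;
	pthread_cond_t io_done_cv;
};

/* Called per finished page; the last one wakes waiters right away. */
static void page_written(struct ordered_io *o)
{
	pthread_mutex_lock(&o->lock);
	assert(o->pages_left > 0);
	if (--o->pages_left == 0) {
		o->io_done = true;
		pthread_cond_broadcast(&o->io_done_cv);
	}
	pthread_mutex_unlock(&o->lock);
}

static void wait_io_done(struct ordered_io *o)
{
	pthread_mutex_lock(&o->lock);
	while (!o->io_done)
		pthread_cond_wait(&o->io_done_cv, &o->lock);
	pthread_mutex_unlock(&o->lock);
}
```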


35607 Commits