linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-17 17:41:44 +00:00

Author	SHA1	Message	Date
Fabian Frederick	c811f5f41e	fs/squashfs/super.c: logging cleanup - Convert printk to pr_foo() - Add pr_fmt for future logging entries - Coalesce formats Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-06 18:01:13 -07:00
Fabian Frederick	14694888db	fs/squashfs/file_direct.c: replace countsize kmalloc by kmalloc_array kmalloc_array() manages countsizeof overflow. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-06 18:01:13 -07:00
Fabian Frederick	2502722dde	ntfs: kernel-doc warning fixes cached_page and lru_pvec were removed from ntfs_attr_extend_initialized in commit `2ec93b0bf3` ("ntfs: clean up ntfs_attr_extend_initialized") lru_pvec has been removed from __ntfs_grab_cache_pages in commit `4c99000ac4` ("ntfs: use add_to_page_cache_lru()") Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-06 18:01:12 -07:00
Fabian Frederick	2122da26c3	fs/logfs/readwrite.c: kernel-doc warning fixes s/-/:/ and fix variable names. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-06 18:01:12 -07:00
Jan Kara	5838d4442b	fanotify: fix double free of pending permission events Commit `8581679424` ("fanotify: Fix use after free for permission events") introduced a double free issue for permission events which are pending in group's notification queue while group is being destroyed. These events are freed from fanotify_handle_event() but they are not removed from groups notification queue and thus they get freed again from fsnotify_flush_notify(). Fix the problem by removing permission events from notification queue before freeing them if we skip processing access response. Also expand comments in fanotify_release() to explain group shutdown in detail. Fixes: `8581679424` Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Douglas Leeder <douglas.leeder@sophos.com> Tested-by: Douglas Leeder <douglas.leeder@sophos.com> Reported-by: Heinrich Schuchard <xypron.glpk@gmx.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-06 18:01:12 -07:00
Jan Kara	8ba8fa9170	fsnotify: rename event handling functions Rename fsnotify_add_notify_event() to fsnotify_add_event() since the "notify" part is duplicit. Rename fsnotify_remove_notify_event() and fsnotify_peek_notify_event() to fsnotify_remove_first_event() and fsnotify_peek_first_event() respectively since "notify" part is duplicit and they really look at the first event in the queue. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-06 18:01:12 -07:00
Fabian Frederick	3e58406484	fs/fscache: make ctl_table static fscache_sysctls and fscache_sysctls_root are only used in main.c Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: David Howells <dhowells@redhat.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-06 18:01:12 -07:00
Linus Torvalds	ae045e2455	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "Highlights: 1) Steady transitioning of the BPF instructure to a generic spot so all kernel subsystems can make use of it, from Alexei Starovoitov. 2) SFC driver supports busy polling, from Alexandre Rames. 3) Take advantage of hash table in UDP multicast delivery, from David Held. 4) Lighten locking, in particular by getting rid of the LRU lists, in inet frag handling. From Florian Westphal. 5) Add support for various RFC6458 control messages in SCTP, from Geir Ola Vaagland. 6) Allow to filter bridge forwarding database dumps by device, from Jamal Hadi Salim. 7) virtio-net also now supports busy polling, from Jason Wang. 8) Some low level optimization tweaks in pktgen from Jesper Dangaard Brouer. 9) Add support for ipv6 address generation modes, so that userland can have some input into the process. From Jiri Pirko. 10) Consolidate common TCP connection request code in ipv4 and ipv6, from Octavian Purdila. 11) New ARP packet logger in netfilter, from Pablo Neira Ayuso. 12) Generic resizable RCU hash table, with intial users in netlink and nftables. From Thomas Graf. 13) Maintain a name assignment type so that userspace can see where a network device name came from (enumerated by kernel, assigned explicitly by userspace, etc.) From Tom Gundersen. 14) Automatic flow label generation on transmit in ipv6, from Tom Herbert. 15) New packet timestamping facilities from Willem de Bruijn, meant to assist in measuring latencies going into/out-of the packet scheduler, latency from TCP data transmission to ACK, etc" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1536 commits) cxgb4 : Disable recursive mailbox commands when enabling vi net: reduce USB network driver config options. tg3: Modify tg3_tso_bug() to handle multiple TX rings amd-xgbe: Perform phy connect/disconnect at dev open/stop amd-xgbe: Use dma_set_mask_and_coherent to set DMA mask net: sun4i-emac: fix memory leak on bad packet sctp: fix possible seqlock seadlock in sctp_packet_transmit() Revert "net: phy: Set the driver when registering an MDIO bus device" cxgb4vf: Turn off SGE RX/TX Callback Timers and interrupts in PCI shutdown routine team: Simplify return path of team_newlink bridge: Update outdated comment on promiscuous mode net-timestamp: ACK timestamp for bytestreams net-timestamp: TCP timestamping net-timestamp: SCHED timestamp on entering packet scheduler net-timestamp: add key to disambiguate concurrent datagrams net-timestamp: move timestamp flags out of sk_flags net-timestamp: extend SCM_TIMESTAMPING ancillary data struct cxgb4i : Move stray CPL definitions to cxgb4 driver tcp: reduce spurious retransmits due to transient SACK reneging qlcnic: Initialize dcbnl_ops before register_netdev ...	2014-08-06 09:38:14 -07:00
Linus Torvalds	bb2cbf5e93	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "In this release: - PKCS#7 parser for the key management subsystem from David Howells - appoint Kees Cook as seccomp maintainer - bugfixes and general maintenance across the subsystem" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (94 commits) X.509: Need to export x509_request_asymmetric_key() netlabel: shorter names for the NetLabel catmap funcs/structs netlabel: fix the catmap walking functions netlabel: fix the horribly broken catmap functions netlabel: fix a problem when setting bits below the previously lowest bit PKCS#7: X.509 certificate issuer and subject are mandatory fields in the ASN.1 tpm: simplify code by using %*phN specifier tpm: Provide a generic means to override the chip returned timeouts tpm: missing tpm_chip_put in tpm_get_random() tpm: Properly clean sysfs entries in error path tpm: Add missing tpm_do_selftest to ST33 I2C driver PKCS#7: Use x509_request_asymmetric_key() Revert "selinux: fix the default socket labeling in sock_graft()" X.509: x509_request_asymmetric_keys() doesn't need string length arguments PKCS#7: fix sparse non static symbol warning KEYS: revert encrypted key change ima: add support for measuring and appraising firmware firmware_class: perform new LSM checks security: introduce kernel_fw_from_file hook PKCS#7: Missing inclusion of linux/err.h ...	2014-08-06 08:06:39 -07:00
Linus Torvalds	e7fda6c4c3	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer and time updates from Thomas Gleixner: "A rather large update of timers, timekeeping & co - Core timekeeping code is year-2038 safe now for 32bit machines. Now we just need to fix all in kernel users and the gazillion of user space interfaces which rely on timespec/timeval :) - Better cache layout for the timekeeping internal data structures. - Proper nanosecond based interfaces for in kernel users. - Tree wide cleanup of code which wants nanoseconds but does hoops and loops to convert back and forth from timespecs. Some of it definitely belongs into the ugly code museum. - Consolidation of the timekeeping interface zoo. - A fast NMI safe accessor to clock monotonic for tracing. This is a long standing request to support correlated user/kernel space traces. With proper NTP frequency correction it's also suitable for correlation of traces accross separate machines. - Checkpoint/restart support for timerfd. - A few NOHZ[_FULL] improvements in the [hr]timer code. - Code move from kernel to kernel/time of all time* related code. - New clocksource/event drivers from the ARM universe. I'm really impressed that despite an architected timer in the newer chips SoC manufacturers insist on inventing new and differently broken SoC specific timers. [ Ed. "Impressed"? I don't think that word means what you think it means ] - Another round of code move from arch to drivers. Looks like most of the legacy mess in ARM regarding timers is sorted out except for a few obnoxious strongholds. - The usual updates and fixlets all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) timekeeping: Fixup typo in update_vsyscall_old definition clocksource: document some basic timekeeping concepts timekeeping: Use cached ntp_tick_length when accumulating error timekeeping: Rework frequency adjustments to work better w/ nohz timekeeping: Minor fixup for timespec64->timespec assignment ftrace: Provide trace clocks monotonic timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC seqcount: Add raw_write_seqcount_latch() seqcount: Provide raw_read_seqcount() timekeeping: Use tk_read_base as argument for timekeeping_get_ns() timekeeping: Create struct tk_read_base and use it in struct timekeeper timekeeping: Restructure the timekeeper some more clocksource: Get rid of cycle_last clocksource: Move cycle_last validation to core code clocksource: Make delta calculation a function wireless: ath9k: Get rid of timespec conversions drm: vmwgfx: Use nsec based interfaces drm: i915: Use nsec based interfaces timekeeping: Provide ktime_get_raw() hangcheck-timer: Use ktime_get_ns() ...	2014-08-05 17:46:42 -07:00
Jeff Mahoney	27d0e5bc85	reiserfs: fix corruption introduced by balance_leaf refactor Commits `f1f007c308` (reiserfs: balance_leaf refactor, pull out balance_leaf_insert_left) and `cf22df182b` (reiserfs: balance_leaf refactor, pull out balance_leaf_paste_left) missed that the `body' pointer was getting repositioned. Subsequent users of the pointer would expect it to be repositioned, and as a result, parts of the tree would get overwritten. The most common observed corruption is indirect block pointers being overwritten. Since the body value isn't actually used anymore in the called routines, we can pass back the offset it should be shifted. We constify the body and ih pointers in the balance_leaf as a mostly-free preventative measure. Cc: <stable@vger.kernel.org> # 3.16 Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz>	2014-08-05 23:18:38 +02:00
Jeff Layton	14a571a8ec	nfsd: add some comments to the nfsd4 object definitions Add some comments that describe what each of these objects is, and how they related to one another. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 16:09:20 -04:00
Jeff Layton	b687f6863e	nfsd: remove the client_mutex and the nfs4_lock/unlock_state wrappers Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 15:00:54 -04:00
Steve French	f29ebb47d5	Add worker function to set allocation size Adds setinfo worker function for SMB2/SMB3 support of SET_ALLOCATION_INFORMATION Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>	2014-08-05 12:53:37 -05:00
Jeff Layton	74cf76df0f	nfsd: remove nfs4_lock_state: nfs4_state_shutdown_net Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:20 -04:00
Jeff Layton	dab6ef2415	nfsd: remove nfs4_lock_state: nfs4_laundromat Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:20 -04:00
Trond Myklebust	05149dd4dc	nfsd: Remove nfs4_lock_state(): reclaim_complete() Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:19 -04:00
Trond Myklebust	cb86fb1428	nfsd: Remove nfs4_lock_state(): setclientid, setclientid_confirm, renew Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:18 -04:00
Trond Myklebust	3974552dce	nfsd: Remove nfs4_lock_state(): exchange_id, create/destroy_session() Also destroy_clientid and bind_conn_to_session. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:17 -04:00
Trond Myklebust	3234975f47	nfsd: Remove nfs4_lock_state(): nfsd4_open and nfsd4_open_confirm Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:16 -04:00
Trond Myklebust	084d4d4549	nfsd: Remove nfs4_lock_state(): nfsd4_delegreturn() Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:15 -04:00
Trond Myklebust	36626a2ecf	nfsd: Remove nfs4_lock_state(): nfsd4_open_downgrade + nfsd4_close Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:14 -04:00
Trond Myklebust	2dd7f2ad4e	nfsd: Remove nfs4_lock_state(): nfsd4_lock/locku/lockt() Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:13 -04:00
Trond Myklebust	51f5e78355	nfsd: Remove nfs4_lock_state(): nfsd4_release_lockowner Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:12 -04:00
Trond Myklebust	e7d5dc19ce	nfsd: Remove nfs4_lock_state(): nfsd4_test_stateid/nfsd4_free_stateid Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:12 -04:00
Trond Myklebust	c2d1d6a8f0	nfsd: Remove nfs4_lock_state(): nfs4_preprocess_stateid_op() Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:11 -04:00
Jeff Layton	285abdee53	nfsd: remove old fault injection infrastructure Remove the old nfsd_for_n_state function and move nfsd_find_client higher up into the file to get rid of forward declaration. Remove the struct nfsd_fault_inject_op arguments from the operations as they are no longer needed by any of them. Finally, remove the old "standard" get and set routines, which also eliminates the client_mutex from this code. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:10 -04:00
Jeff Layton	98d5c7c5bd	nfsd: add more granular locking to *_delegations fault injectors ...instead of relying on the client_mutex. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:09 -04:00
Jeff Layton	82e05efaec	nfsd: add more granular locking to forget_openowners fault injector ...instead of relying on the client_mutex. Also, fix up the printk output that is generated when the file is read. It currently says that it's reporting the number of open files, but it's actually reporting the number of openowners. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:08 -04:00
Jeff Layton	016200c373	nfsd: add more granular locking to forget_locks fault injector ...instead of relying on the client_mutex. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:07 -04:00
Jeff Layton	3738d50e7f	nfsd: add a list_head arg to nfsd_foreach_client_lock In a later patch, we'll want to collect the locks onto a list for later destruction. If "func" is defined and "collect" is defined, then we'll add the lock stateid to the list. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:06 -04:00
Jeff Layton	69fc9edf98	nfsd: add nfsd_inject_forget_clients ...which uses the client_lock for protection instead of client_mutex. Also remove nfsd_forget_client as there are no more callers. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:05 -04:00
Jeff Layton	a0926d1527	nfsd: add a forget_client set_clnt routine ...that relies on the client_lock instead of client_mutex. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:04 -04:00
Jeff Layton	7ec0e36f1a	nfsd: add a forget_clients "get" routine with proper locking Add a new "get" routine for forget_clients that relies on the client_lock instead of the client_mutex. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:04 -04:00
Jeff Layton	c96223d3b6	nfsd: abstract out the get and set routines into the fault injection ops Now that we've added more granular locking in other places, it's time to address the fault injection code. This code is currently quite reliant on the client_mutex for protection. Start to change this by adding a new set of fault injection op vectors. For now they all use the legacy ones. In later patches we'll add new routines that can deal with more granular locking. Also, move some of the printk routines into the callers to make the results of the operations more uniform. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:02 -04:00
Jeff Layton	294ac32e99	nfsd: protect clid and verifier generation with client_lock The clid counter is a global counter currently. Move it to be a per-net property so that it can be properly protected by the nn->client_lock instead of relying on the client_mutex. The verifier generator is also potentially racy if there are two simultaneous callers. Generate the verifier when we generate the clid value, so it's also created under the client_lock. With this, there's no need to keep two counters as they'd always be in sync anyway, so just use the clientid_counter for both. As Trond points out, what would be best is to eventually move this code to use IDR instead of the hash tables. That would also help ensure uniqueness, but that's probably best done as a separate project. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:02 -04:00
Jeff Layton	fd699b8a48	nfsd: don't destroy clients that are busy It's possible that we'll have an in-progress call on some of the clients while a rogue EXCHANGE_ID or DESTROY_CLIENTID call comes in. Be sure to try and mark the client expired first, so that the refcount is respected. This will only be a problem once the client_mutex is removed. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:01 -04:00
Kinglong Mee	fb94d766af	NFSD: Put the reference of nfs4_file when freeing stid After testing nfs4 lock, I restart the nfsd service, got messages as, [ 5677.403419] nfsd: last server has exited, flushing export cache [ 5677.463728] ============================================================================= [ 5677.463942] BUG nfsd4_files (Tainted: G B OE): Objects remaining in nfsd4_files on kmem_cache_close() [ 5677.464055] ----------------------------------------------------------------------------- [ 5677.464203] INFO: Slab 0xffffea0000233400 objects=28 used=1 fp=0xffff880008cd3d98 flags=0x3ffc0000004080 [ 5677.464318] CPU: 0 PID: 3772 Comm: rmmod Tainted: G B OE 3.16.0-rc2+ #29 [ 5677.464420] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 5677.464538] 0000000000000000 0000000036af2c9f ffff88000ce97d68 ffffffff816eacfa [ 5677.464643] ffffea0000233400 ffff88000ce97e40 ffffffff811cda44 ffffffff00000020 [ 5677.464774] ffff88000ce97e50 ffff88000ce97e00 656a624f00000008 616d657220737463 [ 5677.464875] Call Trace: [ 5677.464925] [<ffffffff816eacfa>] dump_stack+0x45/0x56 [ 5677.464983] [<ffffffff811cda44>] slab_err+0xb4/0xe0 [ 5677.465040] [<ffffffff811d0457>] ? __kmalloc+0x117/0x290 [ 5677.465099] [<ffffffff81100eec>] ? on_each_cpu_cond+0xac/0xf0 [ 5677.465158] [<ffffffff811d1bc0>] ? kmem_cache_close+0x110/0x2e0 [ 5677.465218] [<ffffffff811d1be0>] kmem_cache_close+0x130/0x2e0 [ 5677.465279] [<ffffffff8135a0c1>] ? kobject_cleanup+0x91/0x1b0 [ 5677.465338] [<ffffffff811d22be>] __kmem_cache_shutdown+0xe/0x10 [ 5677.465399] [<ffffffff8119bd28>] kmem_cache_destroy+0x48/0x100 [ 5677.465466] [<ffffffffa05ef78d>] nfsd4_free_slabs+0x2d/0x50 [nfsd] [ 5677.465530] [<ffffffffa05fa987>] exit_nfsd+0x34/0x6ad [nfsd] [ 5677.465589] [<ffffffff81104ac2>] SyS_delete_module+0x162/0x200 [ 5677.465649] [<ffffffff81013b69>] ? do_notify_resume+0x59/0x90 [ 5677.465759] [<ffffffff816f2369>] system_call_fastpath+0x16/0x1b [ 5677.465822] INFO: Object 0xffff880008cd0000 @offset=0 [ 5677.465882] INFO: Allocated in nfsd4_process_open1+0x61/0x350 [nfsd] age=7599 cpu=0 pid=3253 [ 5677.466115] __slab_alloc+0x3b0/0x4b1 [ 5677.466166] kmem_cache_alloc+0x1e4/0x240 [ 5677.466220] nfsd4_process_open1+0x61/0x350 [nfsd] [ 5677.466276] nfsd4_open+0xee/0x860 [nfsd] [ 5677.466329] nfsd4_proc_compound+0x4d7/0x7f0 [nfsd] [ 5677.466384] nfsd_dispatch+0xbb/0x200 [nfsd] [ 5677.466447] svc_process_common+0x453/0x6f0 [sunrpc] [ 5677.466506] svc_process+0x103/0x170 [sunrpc] [ 5677.466559] nfsd+0x117/0x190 [nfsd] [ 5677.466609] kthread+0xd8/0xf0 [ 5677.466656] ret_from_fork+0x7c/0xb0 [ 5677.466775] kmem_cache_destroy nfsd4_files: Slab cache still has objects [ 5677.466839] CPU: 0 PID: 3772 Comm: rmmod Tainted: G B OE 3.16.0-rc2+ #29 [ 5677.466937] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 5677.467049] 0000000000000000 0000000036af2c9f ffff88000ce97eb0 ffffffff816eacfa [ 5677.467150] ffff880020bb2d00 ffff88000ce97ed0 ffffffff8119bdd9 0000000000000000 [ 5677.467250] ffffffffa06065c0 ffff88000ce97ee0 ffffffffa05ef78d ffff88000ce97ef0 [ 5677.467351] Call Trace: [ 5677.467397] [<ffffffff816eacfa>] dump_stack+0x45/0x56 [ 5677.467454] [<ffffffff8119bdd9>] kmem_cache_destroy+0xf9/0x100 [ 5677.467516] [<ffffffffa05ef78d>] nfsd4_free_slabs+0x2d/0x50 [nfsd] [ 5677.467579] [<ffffffffa05fa987>] exit_nfsd+0x34/0x6ad [nfsd] [ 5677.467639] [<ffffffff81104ac2>] SyS_delete_module+0x162/0x200 [ 5677.467765] [<ffffffff81013b69>] ? do_notify_resume+0x59/0x90 [ 5677.467826] [<ffffffff816f2369>] system_call_fastpath+0x16/0x1b Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Fixes: `11b9164ada` "nfsd: Add a struct nfs4_file field to struct nfs4_stid" Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:53:36 -04:00
Linus Torvalds	8e099d1e8b	Bug fixes and clean ups for the 3.17 merge window -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJT4Eh0AAoJENNvdpvBGATwmaAP/0W9B/VPY4rSnanpXvsYlone wjIjh11NZdU3daW/c/VhgEIO81rFaoFoQ/dN+pIo7uHG8JRqZjyHXZY2eVF5SFFp FQyGEGMRhE+u78IDg3U99OlAUTo8SAHjHVlYUBVpT9lazMLWRPqP7uHJbXow5ijK /bXSVY+6fOWY1/yruCZv1nRtg9JNZgCc1LOPDqn6K16jItBKfYBvVbpw6hise2v2 rPfcSlKJ5Wzo/PgNX+IR9nnUOXzpbdM2CZbxy0qB2jZYirzE6VNeHhc1JWxQWNB1 Dg3j/ynEWPs03+9ywLcQ0kEQvUXhQQlMGmfPkgzWeAUQDUv4QAeYmFiRhc/EgJWY othDmKVqy0Pn9rmGCOMg/TJFH0Mz/c7PTxFVF1onxs2Sqvl3yCdZANT+ie5UoC9m zkUHdY3HkARiK/I6d5CCJzvHMxWNyf6bAJmoR6L/SaOPXebc4cfFtjgV01kbTWv+ rW9MCj3TiIC9MpWdVJADcmc2w3cY0L/NUbFWHZhMSDFiuJvcLUw5afaJTvKEPuKp WnnYICPj6wMP7Gy/isTxBGGi0UjPm67DHGLpG+syPDi1RxU6Vw3p9PjbdvCRj7so UD3xzntCHOzVcgAxE92V4pMZAajtv0sfIBVzMm7k8iUXzb7Er1c5dV6ROkOzguq3 Ogj/c2JHSSB4TSYdVxsG =hf9n -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Bug fixes and clean ups for the 3.17 merge window" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix ext4_discard_allocated_blocks() if we can't allocate the pa struct ext4: fix COLLAPSE RANGE test for bigalloc file systems ext4: check inline directory before converting ext4: fix incorrect locking in move_extent_per_page ext4: use correct depth value ext4: add i_data_sem sanity check ext4: fix wrong size computation in ext4_mb_normalize_request() ext4: make ext4_has_inline_data() as a inline function ext4: remove readpage() check in ext4_mmap_file() ext4: fix punch hole on files with indirect mapping ext4: remove metadata reservation checks ext4: rearrange initialization to fix EXT4FS_DEBUG	2014-08-04 20:46:54 -07:00
Linus Torvalds	b54ecfb702	Merge tag 'for-f2fs-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This series includes patches to: - add nobarrier mount option - support tmpfile and rename2 - enhance the fdatasync behavior - fix the error path - fix the recovery routine - refactor a part of the checkpoint procedure - reduce some lock contentions" * tag 'for-f2fs-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits) f2fs: use for_each_set_bit to simplify the code f2fs: add f2fs_balance_fs for expand_inode_data f2fs: invalidate xattr node page when evict inode f2fs: avoid skipping recover_inline_xattr after recover_inline_data f2fs: add tracepoint for f2fs_direct_IO f2fs: reduce competition among node page writes f2fs: fix coding style f2fs: remove redundant lines in allocate_data_block f2fs: add tracepoint for f2fs_issue_flush f2fs: avoid retrying wrong recovery routine when error was occurred f2fs: test before set/clear bits f2fs: fix wrong condition for unlikely f2fs: enable in-place-update for fdatasync f2fs: skip unnecessary data writes during fsync f2fs: add info of appended or updated data writes f2fs: use radix_tree for ino management f2fs: add infra for ino management f2fs: punch the core function for inode management f2fs: add nobarrier mount option f2fs: fix to put root inode in error path of fill_super ...	2014-08-04 20:30:07 -07:00
Linus Torvalds	29b88e23a9	Driver core patches for 3.17-rc1 Here's the big driver-core pull request for 3.17-rc1. Largest thing in here is the dma-buf rework and fence code, that touched many different subsystems so it was agreed it should go through this tree to handle merge issues. There's also some firmware loading updates, as well as tests added, and a few other tiny changes, the changelog has the details. All have been in linux-next for a long time. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlPf1XcACgkQMUfUDdst+ylREACdHLXBa02yLrRzbrONJ+nARuFv JuQAoMN49PD8K9iMQpXqKBvZBsu+iCIY =w8OJ -----END PGP SIGNATURE----- Merge tag 'driver-core-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here's the big driver-core pull request for 3.17-rc1. Largest thing in here is the dma-buf rework and fence code, that touched many different subsystems so it was agreed it should go through this tree to handle merge issues. There's also some firmware loading updates, as well as tests added, and a few other tiny changes, the changelog has the details. All have been in linux-next for a long time" * tag 'driver-core-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits) ARM: imx: Remove references to platform_bus in mxc code firmware loader: Fix _request_firmware_load() return val for fw load abort platform: Remove most references to platform_bus device test: add firmware_class loader test doc: fix minor typos in firmware_class README staging: android: Cleanup style issues Documentation: devres: Sort managed interfaces Documentation: devres: Add devm_kmalloc() et al fs: debugfs: remove trailing whitespace kernfs: kernel-doc warning fix debugfs: Fix corrupted loop in debugfs_remove_recursive stable_kernel_rules: Add pointer to netdev-FAQ for network patches driver core: platform: add device binding path 'driver_override' driver core/platform: remove unused implicit padding in platform_object firmware loader: inform direct failure when udev loader is disabled firmware: replace ALIGN(PAGE_SIZE) by PAGE_ALIGN firmware: read firmware size using i_size_read() firmware loader: allow disabling of udev as firmware loader reservation: add suppport for read-only access using rcu reservation: update api and add some helpers ... Conflicts: drivers/base/platform.c	2014-08-04 18:34:04 -07:00
Linus Torvalds	98959948a7	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Move the nohz kick code out of the scheduler tick to a dedicated IPI, from Frederic Weisbecker. This necessiated quite some background infrastructure rework, including: * Clean up some irq-work internals * Implement remote irq-work * Implement nohz kick on top of remote irq-work * Move full dynticks timer enqueue notification to new kick * Move multi-task notification to new kick * Remove unecessary barriers on multi-task notification - Remove proliferation of wait_on_bit() action functions and allow wait_on_bit_action() functions to support a timeout. (Neil Brown) - Another round of sched/numa improvements, cleanups and fixes. (Rik van Riel) - Implement fast idling of CPUs when the system is partially loaded, for better scalability. (Tim Chen) - Restructure and fix the CPU hotplug handling code that may leave cfs_rq and rt_rq's throttled when tasks are migrated away from a dead cpu. (Kirill Tkhai) - Robustify the sched topology setup code. (Peterz Zijlstra) - Improve sched_feat() handling wrt. static_keys (Jason Baron) - Misc fixes. * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) sched/fair: Fix 'make xmldocs' warning caused by missing description sched: Use macro for magic number of -1 for setparam sched: Robustify topology setup sched: Fix sched_setparam() policy == -1 logic sched: Allow wait_on_bit_action() functions to support a timeout sched: Remove proliferation of wait_on_bit() action functions sched/numa: Revert "Use effective_load() to balance NUMA loads" sched: Fix static_key race with sched_feat() sched: Remove extra static_key*() function indirection sched/rt: Fix replenish_dl_entity() comments to match the current upstream code sched: Transform resched_task() into resched_curr() sched/deadline: Kill task_struct->pi_top_task sched: Rework check_for_tasks() sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime() sched/fair: Disable runtime_enabled on dying rq sched/numa: Change scan period code to match intent sched/numa: Rework best node setting in task_numa_migrate() sched/numa: Examine a task move when examining a task swap sched/numa: Simplify task_numa_compare() sched/numa: Use effective_load() to balance NUMA loads ...	2014-08-04 16:23:30 -07:00
Scott Mayhew	71a6ec8ac5	nfs: reject changes to resvport and sharecache during remount Commit `c8e47028` made it possible to change resvport/noresvport and sharecache/nosharecache via a remount operation, neither of which should be allowed. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Fixes: `c8e47028` (nfs: Apply NFS_MOUNT_CMP_FLAGMASK to nfs_compare_remount_data) Cc: stable@vger.kernel.org # 3.16+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-04 17:41:52 -04:00
Kinglong Mee	5b53dc88b0	NFS: Avoid infinite loop when RELEASE_LOCKOWNER getting expired error Fix Commit `60ea681299` (NFS: Migration support for RELEASE_LOCKOWNER) If getting expired error, client will enter a infinite loop as, client server RELEASE_LOCKOWNER(old clid) -----> <--- expired error RENEW(old clid) -----> <--- expired error SETCLIENTID -----> <--- a new clid SETCLIENTID_CONFIRM (new clid) --> <--- ok RELEASE_LOCKOWNER(old clid) -----> <--- expired error RENEW(new clid) -----> <-- ok RELEASE_LOCKOWNER(old clid) -----> <--- expired error RENEW(new clid) -----> <-- ok ... ... Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> [Trond: replace call to nfs4_async_handle_error() with nfs4_schedule_lease_recovery()] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-04 16:51:38 -04:00
Chao Yu	b65ee14818	f2fs: use for_each_set_bit to simplify the code This patch uses for_each_set_bit to simplify some codes in f2fs. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-04 13:20:53 -07:00
Chao Yu	497a0930bb	f2fs: add f2fs_balance_fs for expand_inode_data This patch adds f2fs_balance_fs in expand_inode_data to avoid allocation failure with segment. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-04 13:02:03 -07:00
Chao Yu	002a41cabb	f2fs: invalidate xattr node page when evict inode When inode is evicted, all the page cache belong to this inode should be released including the xattr node page. But previously we didn't do this, this patch fixed this issue. v2: o reposition invalidate_mapping_pages() to the right place suggested by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-04 13:01:22 -07:00
Linus Torvalds	f2a84170ed	Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: - Major reorganization of percpu header files which I think makes things a lot more readable and logical than before. - percpu-refcount is updated so that it requires explicit destruction and can be reinitialized if necessary. This was pulled into the block tree to replace the custom percpu refcnting implemented in blk-mq. - In the process, percpu and percpu-refcount got cleaned up a bit * 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (21 commits) percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero() percpu-refcount: require percpu_ref to be exited explicitly percpu-refcount: use unsigned long for pcpu_count pointer percpu-refcount: add helpers for ->percpu_count accesses percpu-refcount: one bit is enough for REF_STATUS percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc() workqueue: stronger test in process_one_work() workqueue: clear POOL_DISASSOCIATED in rebind_workers() percpu: Use ALIGN macro instead of hand coding alignment calculation percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations percpu: preffity percpu header files percpu: use raw_cpu_() to define __this_cpu_() percpu: reorder macros in percpu header files percpu: move {raw\|this}_cpu_() definitions to include/linux/percpu-defs.h percpu: move generic {raw\|this}_cpu__N() definitions to include/asm-generic/percpu.h percpu: only allow sized arch overrides for {raw\|this}_cpu_*() ops percpu: reorganize include/linux/percpu-defs.h percpu: move accessors from include/linux/percpu.h to percpu-defs.h percpu: include/asm-generic/percpu.h should contain only arch-overridable parts percpu: introduce arch_raw_cpu_ptr() ...	2014-08-04 10:09:27 -07:00
Eric W. Biederman	344470cac4	proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts In oddball cases where the thread has a different mount namespace than the thread group leader or more likely in cases where the thread remains and the thread group leader has exited this ensures that /proc/mounts continues to work. This should not cause any problems but if it does this patch can just be reverted. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-08-04 10:07:15 -07:00
Eric W. Biederman	e813244072	proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net In oddball cases where the thread has a different network namespace than the primary thread group leader or more likely in cases where the thread remains and the thread group leader has exited this ensures that /proc/net continues to work. This should not cause any problems but if it does this patch can just be reverted. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-08-04 10:07:13 -07:00
Eric W. Biederman	0097875bd4	proc: Implement /proc/thread-self to point at the directory of the current thread /proc/thread-self is derived from /proc/self. /proc/thread-self points to the directory in proc containing information about the current thread. This funtionality has been missing for a long time, and is tricky to implement in userspace as gettid() is not exported by glibc. More importantly this allows fixing defects in /proc/mounts and /proc/net where in a threaded application today they wind up being empty files when only the initial pthread has exited, causing problems for other threads. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-08-04 10:07:11 -07:00
Eric W. Biederman	6ba8ed79a3	proc: Have net show up under /proc/<tgid>/task/<tid> Network namespaces are per task so it make sense for them to show up in the task directory. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-08-04 10:07:08 -07:00
Linus Torvalds	1bff598860	File locking related changes for v3.17 (pile #1 ) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJT34bvAAoJEAAOaEEZVoIVAPIQAINMD2fqeF3g9ZHxyzsKWoUp f14ZKeF/6nbG4Bn+iIihzxz/Bs9qS+03oVeI4oAg1c9crT+qZ6+nLM4C1n5gfck0 Z0DvF1ITFcr+Nv0D/GSIiI4NY8ZJLP5gZWCPYaO4xamwVs2Bh4/B4uxi7ETIkfXh uL6dN739D2fBDNZBbeRh4VJTGXbT6ipzkTIBFXkMfmqGtUxzeTfepN+IdhE5gVnx xXc8ZZOVNmWI7g/YAYKSMlLbufHHgX47U2sNTljtHII4GXf98DmiYulcJvhfQ1JP 7xSmbIrvn9Gm2iGobzbfED/OjXA0rsdw1vSzTO/uHUYPRriMOwuDRGE+S3oP0dRD ZdxQa8iOZjWEsWbDTRekBBAIXWcTUN8g8EbPj74EN0GWi3HYFj/ORkowj5Ym6zWh Sv4w9SafNMOKy9tt4RVh4iwendU/pNLrRgvR407aM+UWkhwCpinlO6vSLHppUwlC dgxFZtkdeBf5tMkm8Tja+XAV2SjU8DwP4nFU1kHu25L0W7m7hmmIeu6Crq5qlL3J 0NCPTO1LeGNP1WiOQf99nXoJVeL3//CfD+H4LjIMcGCc4P7gJc346rH91Zd6rXY/ kGomnkBMw+5WLvfOJ1NhuaEy3g8Wfk84QzlsmWgTEzl5qT+SEjPLcDHWqWJ2GTvB gFUPWHenVMcrFN/n0CWg =hvV8 -----END PGP SIGNATURE----- Merge tag 'locks-v3.17-1' of git://git.samba.org/jlayton/linux Pull file locking related changes from Jeff Layton: "Just a couple of changes from Christoph to start us down the road toward getting rid of the fl_owner_t typedef" * tag 'locks-v3.17-1' of git://git.samba.org/jlayton/linux: locks: purge fl_owner_t from fs/locks.c locks: typedef fl_owner_t to void *	2014-08-04 10:03:10 -07:00
Eric W. Biederman	65b38851a1	NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes The usage of pid_ns->child_reaper->nsproxy->net_ns in nfs_server_list_open and nfs_client_list_open is not safe. /proc for a pid namespace can remain mounted after the all of the process in that pid namespace have exited. There are also times before the initial process in a pid namespace has started or after the initial process in a pid namespace has exited where pid_ns->child_reaper can be NULL or stale. Making the idiom pid_ns->child_reaper->nsproxy a double whammy of problems. Luckily all that needs to happen is to move /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes under /proc/net to /proc/net/nfsfs/servers and /proc/net/nfsfs/volumes and add a symlink from the original location, and to use seq_open_net as it has been designed. Cc: stable@vger.kernel.org Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-08-04 09:28:32 -07:00
NeilBrown	50d77739fa	NFS: fix two problems in lookup_revalidate in RCU-walk 1/ rcu_dereference isn't correct: that field isn't RCU protected. It could potentially change at any time so ACCESS_ONCE might be justified. changes to ->d_parent are protected by ->d_seq. However that isn't always checked after ->d_revalidate is called, so it is safest to keep the double-check that ->d_parent hasn't changed at the end of these functions. 2/ in nfs4_lookup_revalidate, "->d_parent" was forgotten. So 'parent' was not the parent of 'dentry'. This fails safe is the context is that dentry->d_inode is NULL, and the result of parent->d_inode being NULL is that ECHILD is returned, which is always safe. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-04 09:22:08 -04:00
Dave Chinner	645f985721	Merge branch 'xfs-misc-fixes-3.17-2' into for-next	2014-08-04 13:55:27 +10:00
Dave Chinner	b076d8720d	Merge branch 'xfs-bulkstat-refactor' into for-next	2014-08-04 13:54:46 +10:00
Dave Chinner	4d7eece2c0	Merge branch 'xfs-misc-fixes-3.17-1' into for-next	2014-08-04 13:54:14 +10:00
Dave Chinner	e0ac6d45bc	Merge branch 'xfs-quota-eofblocks-scan' into for-next	2014-08-04 13:53:47 +10:00
kbuild test robot	6eee8972cc	xfs: fix coccinelle warnings Removes unneeded semicolon, introduced by commit `a70a4fa5` ("xfs: fix a couple error sequence jumps in xfs_mountfs"): fs/xfs/xfs_mount.c:858:24-25: Unneeded semicolon Generated by: scripts/coccinelle/misc/semicolon.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 13:49:40 +10:00
Dave Chinner	4ef897a275	xfs: flush both inodes in xfs_swap_extents We need to treat both inodes identically from a page cache point of view when prepareing them for extent swapping. We don't do this right now - we assume that one of the inodes empty, because that's what xfs_fsr currently does. Remove this assumption from the code. While factoring out the flushing and related checks, move the transactions reservation to immeidately after the flushes so that we don't need to pick up and then drop the ilock to do the transaction reservation. There are no issues with aborting the transaction it if the checks fail before we join the inodes to the transaction and dirty them, so this is a safe change to make. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 13:44:08 +10:00
Dave Chinner	8121768321	xfs: fix swapext ilock deadlock xfs_swap_extents() holds the ilock over a call to filemap_write_and_wait(), which can then try to write data and take the ilock. That causes a self-deadlock. Fix the deadlock and clean up the code by separating the locking appropriately. Add a lockflags variable to track what locks we are holding as we gain and drop them and cleanup the error handling to always use "out_unlock" with the lockflags variable. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 13:29:32 +10:00
Dave Chinner	b92cc59f69	xfs: kill xfs_vnode.h Move the IO flag definitions to xfs_inode.h and kill the header file as it is now empty. Removing the xfs_vnode.h file showed up an implicit header include path: xfs_linux.h -> xfs_vnode.h -> xfs_fs.h And so every xfs header file has been inplicitly been including xfs_fs.h where it is needed or not. Hence the removal of xfs_vnode.h causes all sorts of build issues because BBTOB() and friends are no longer automatically included in the build. This also gets fixed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 13:28:20 +10:00
Dave Chinner	dd8c38bab0	xfs: kill VN_MAPPED Only one user, no longer needed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 13:23:35 +10:00
Dave Chinner	2667c6f935	xfs: kill VN_CACHED Only has 2 users, has outlived it's usefulness. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 13:23:15 +10:00
Dave Chinner	eac152b474	xfs: kill VN_DIRTY() Only one user of the macro and the dirty mapping check is redundant so just get rid of it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 13:22:49 +10:00
Dave Chinner	ad3714b82c	xfs: dquot recovery needs verifiers dquot recovery should add verifiers to the dquot buffers that it recovers changes into. Unfortunately, it doesn't attached the verifiers to the buffers in a consistent manner. For example, xlog_recover_dquot_pass2() reads dquot buffers without a verifier and then writes it without ever having attached a verifier to the buffer. Further, dquot buffer recovery may write a dquot buffer that has not been modified, or indeed, shoul dbe written because quotas are not enabled and hence changes to the buffer were not replayed. In this case, we again write buffers without verifiers attached because that doesn't happen until after the buffer changes have been replayed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 12:59:31 +10:00
Dave Chinner	5fd364fee8	xfs: quotacheck leaves dquot buffers without verifiers When running xfs/305, I noticed that quotacheck was flushing dquot buffers that did not have the xfs_dquot_buf_ops verifiers attached: XFS (vdb): _xfs_buf_ioapply: no ops on block 0x1dc8/0x1dc8 ffff880052489000: 44 51 01 04 00 00 65 b8 00 00 00 00 00 00 00 00 DQ....e......... ffff880052489010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ ffff880052489020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ ffff880052489030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ CPU: 1 PID: 2376 Comm: mount Not tainted 3.16.0-rc2-dgc+ #306 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 ffff88006fe38000 ffff88004a0ffae8 ffffffff81cf1cca 0000000000000001 ffff88004a0ffb88 ffffffff814d50ca 000010004a0ffc70 0000000000000000 ffff88006be56dc4 0000000000000021 0000000000001dc8 ffff88007c773d80 Call Trace: [<ffffffff81cf1cca>] dump_stack+0x45/0x56 [<ffffffff814d50ca>] _xfs_buf_ioapply+0x3ca/0x3d0 [<ffffffff810db520>] ? wake_up_state+0x20/0x20 [<ffffffff814d51f5>] ? xfs_bdstrat_cb+0x55/0xb0 [<ffffffff814d513b>] xfs_buf_iorequest+0x6b/0xd0 [<ffffffff814d51f5>] xfs_bdstrat_cb+0x55/0xb0 [<ffffffff814d53ab>] __xfs_buf_delwri_submit+0x15b/0x220 [<ffffffff814d6040>] ? xfs_buf_delwri_submit+0x30/0x90 [<ffffffff814d6040>] xfs_buf_delwri_submit+0x30/0x90 [<ffffffff8150f89d>] xfs_qm_quotacheck+0x17d/0x3c0 [<ffffffff81510591>] xfs_qm_mount_quotas+0x151/0x1e0 [<ffffffff814ed01c>] xfs_mountfs+0x56c/0x7d0 [<ffffffff814f0f12>] xfs_fs_fill_super+0x2c2/0x340 [<ffffffff811c9fe4>] mount_bdev+0x194/0x1d0 [<ffffffff814f0c50>] ? xfs_finish_flags+0x170/0x170 [<ffffffff814ef0f5>] xfs_fs_mount+0x15/0x20 [<ffffffff811ca8c9>] mount_fs+0x39/0x1b0 [<ffffffff811e4d67>] vfs_kern_mount+0x67/0x120 [<ffffffff811e757e>] do_mount+0x23e/0xad0 [<ffffffff8117abde>] ? __get_free_pages+0xe/0x50 [<ffffffff811e71e6>] ? copy_mount_options+0x36/0x150 [<ffffffff811e8103>] SyS_mount+0x83/0xc0 [<ffffffff81cfd40b>] tracesys+0xdd/0xe2 This was caused by dquot buffer readahead not attaching a verifier structure to the buffer when readahead was issued, resulting in the followup read of the buffer finding a valid buffer and so not attaching new verifiers to the buffer as part of the read. Also, when a verifier failure occurs, we then read the buffer without verifiers. Attach the verifiers manually after this read so that if the buffer is then written it will be verified that the corruption has been repaired. Further, when flushing a dquot we don't ask for a verifier when reading in the dquot buffer the dquot belongs to. Most of the time this isn't an issue because the buffer is still cached, but when it is not cached it will result in writing the dquot buffer without having the verfier attached. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 12:43:26 +10:00
Dave Chinner	67dc288c21	xfs: ensure verifiers are attached to recovered buffers Crash testing of CRC enabled filesystems has resulted in a number of reports of bad CRCs being detected after the filesystem was mounted. Errors such as the following were being seen: XFS (sdb3): Mounting V5 Filesystem XFS (sdb3): Starting recovery (logdev: internal) XFS (sdb3): Metadata CRC error detected at xfs_agf_read_verify+0x5a/0x100 [xfs], block 0x1 XFS (sdb3): Unmount and run xfs_repair XFS (sdb3): First 64 bytes of corrupted metadata buffer: ffff880136ffd600: 58 41 47 46 00 00 00 01 00 00 00 00 00 0f aa 40 XAGF...........@ ffff880136ffd610: 00 02 6d 53 00 02 77 f8 00 00 00 00 00 00 00 01 ..mS..w......... ffff880136ffd620: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 03 ................ ffff880136ffd630: 00 00 00 04 00 08 81 d0 00 08 81 a7 00 00 00 00 ................ XFS (sdb3): metadata I/O error: block 0x1 ("xfs_trans_read_buf_map") error 74 numblks 1 The errors were typically being seen in AGF, AGI and their related btree block buffers some time after log recovery had run. Often it wasn't until later subsequent mounts that the problem was discovered. The common symptom was a buffer with the correct contents, but a CRC and an LSN that matched an older version of the contents. Some debug added to _xfs_buf_ioapply() indicated that buffers were being written without verifiers attached to them from log recovery, and Jan Kara isolated the cause to log recovery readahead an dit's interactions with buffers that had a more recent LSN on disk than the transaction being recovered. In this case, the buffer did not get a verifier attached, and os when the second phase of log recovery ran and recovered EFIs and unlinked inodes, the buffers were modified and written without the verifier running. Hence they had up to date contents, but stale LSNs and CRCs. Fix it by attaching verifiers to buffers we skip due to future LSN values so they don't escape into the buffer cache without the correct verifier attached. This patch is based on analysis and a patch from Jan Kara. cc: <stable@vger.kernel.org> Reported-by: Jan Kara <jack@suse.cz> Reported-by: Fanael Linithien <fanael4@gmail.com> Reported-by: Grozdan <neutrino8@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 12:43:06 +10:00
Dave Chinner	400b9d8875	xfs: catch buffers written without verifiers attached We recently had a bug where buffers were slipping through log recovery without any verifier attached to them. This was resulting in on-disk CRC mismatches for valid data. Add some warning code to catch this occurrence so that we catch such bugs during development rather than not being aware they exist. Note that we cannot do this verification unconditionally as non-CRC filesystems don't always attach verifiers to the buffers being written. e.g. during log recovery we cannot identify all the different types of buffers correctly on non-CRC filesystems, so we can't attach the correct verifiers in all cases and so we don't attach any. Hence we don't want on non-CRC filesystems to avoid spamming the logs with false indications. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 12:42:40 +10:00
Eric Sandeen	5ef828c415	xfs: avoid false quotacheck after unclean shutdown The commit `83e782e` xfs: Remove incore use of XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD added a new function xfs_sb_quota_from_disk() which swaps on-disk XFS_OQUOTA_* flags for in-core XFS_GQUOTA_* and XFS_PQUOTA_* flags after the superblock is read. However, if log recovery is required, the superblock is read again, and the modified in-core flags are re-read from disk, so we have XFS_OQUOTA_* flags in memory again. This causes the XFS_QM_NEED_QUOTACHECK() test to be true, because the XFS_OQUOTA_CHKD is still set, and not XFS_GQUOTA_CHKD or XFS_PQUOTA_CHKD. Change xfs_sb_from_disk to call xfs_sb_quota_from disk and always convert the disk flags to in-memory flags. Add a lower-level function which can be called with "false" to not convert the flags, so that the sb verifier can verify exactly what was on disk, per Brian Foster's suggestion. Reported-by: Cyril B. <cbay@excellency.fr> Signed-off-by: Eric Sandeen <sandeen@redhat.com>	2014-08-04 11:35:44 +10:00
Brian Foster	eedf32bfca	xfs: fix rounding error of fiemap length parameter The offset and length parameters are converted from bytes to basic blocks by xfs_vn_fiemap(). The BTOBB() converter rounds the value up to the nearest basic block. This leads to unexpected behavior when unaligned offsets are provided to FIEMAP. Fix the conversions of byte values to block values to cover the provided offsets. Round down the start offset to the nearest basic block. Calculate the end offset based on the provided values, round up and calculate length based on the start block offset. Reported-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 11:35:35 +10:00
Jie Liu	1e773c4989	xfs: introduce xfs_bulkstat_ag_ichunk Introduce xfs_bulkstat_ag_ichunk() to process inodes in chunk with a pointer to a formatter function that will iget the inode and fill in the appropriate structure. Refactor xfs_bulkstat() with it. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 11:22:31 +10:00
NeilBrown	f682a398b2	NFS: allow lockless access to access_cache The access cache is used during RCU-walk path lookups, so it is best to avoid locking if possible as taking a lock kills concurrency. The rbtree is not rcu-safe and cannot easily be made so. Instead we simply check the last (i.e. most recent) entry on the LRU list. If this doesn't match, then we return -ECHILD and retry in lock/refcount mode. This requires freeing the nfs_access_entry struct with rcu, and requires using rcu access primatives when adding entries to the lru, and when examining the last entry. Calling put_rpccred before kfree_rcu looks a bit odd, but as put_rpccred already provides rcu protection, we know that the cred will not actually be freed until the next grace period, so any concurrent access will be safe. This patch provides about 5% performance improvement on a stat-heavy synthetic work load with 4 threads on a 2-core CPU. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:14:13 -04:00
NeilBrown	1fa1e38447	NFS: teach nfs_lookup_verify_inode to handle LOOKUP_RCU It fails with -ECHILD rather than make an RPC call. This allows nfs_lookup_revalidate to call it in RCU-walk mode. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:14:13 -04:00
NeilBrown	912a108da7	NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU This requires nfs_check_verifier to take an rcu_walk flag, and requires an rcu version of nfs_revalidate_inode which returns -ECHILD rather than making an RPC call. With this, nfs_lookup_revalidate can call nfs_neg_need_reval in RCU-walk mode. We can also move the LOOKUP_RCU check past the nfs_check_verifier() call in nfs_lookup_revalidate. If RCU_WALK prevents nfs_check_verifier or nfs_neg_need_reval from doing a full check, they return a status indicating that a revalidation is required. As this revalidation will not be possible in RCU_WALK mode, -ECHILD will ultimately be returned, which is the desired result. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:14:12 -04:00
NeilBrown	f3324a2a94	NFS: support RCU_WALK in nfs_permission() nfs_permission makes two calls which are not always safe in RCU_WALK, rpc_lookup_cred and nfs_do_access. The second can easily be made rcu-safe by aborting with -ECHILD before making the RPC call. The former can be made rcu-safe by calling rpc_lookup_cred_nonblock() instead. As this will almost always succeed, we use it even when RCU_WALK isn't being used as it still saves some spinlocks in a common case. We only fall back to rpc_lookup_cred() if rpc_lookup_cred_nonblock() fails and MAY_NOT_BLOCK isn't set. This optimisation (always trying rpc_lookup_cred_nonblock()) is particularly important when a security module is active. In that case inode_permission() may return -ECHILD from security_inode_permission() even though ->permission() succeeded in RCU_WALK mode. This leads to may_lookup() retrying inode_permission after performing unlazy_walk(). The spinlock that rpc_lookup_cred() takes is often more expensive than anything security_inode_permission() does, so that spinlock becomes the main bottleneck. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:14:12 -04:00
NeilBrown	d51ac1a8e9	NFS: prepare for RCU-walk support but pushing tests later in code. nfs_lookup_revalidate, nfs4_lookup_revalidate, and nfs_permission all need to understand and handle RCU-walk for NFS to gain the benefits of RCU-walk for cached information. Currently these functions all immediately return -ECHILD if the relevant flag (LOOKUP_RCU or MAY_NOT_BLOCK) is set. This patch pushes those tests later in the code so that we only abort immediately before we enter rcu-unsafe code. As subsequent patches make that rcu-unsafe code rcu-safe, several of these new tests will disappear. With this patch there are several paths through the code which will no longer return -ECHILD during an RCU-walk. However these are mostly error paths or other uninteresting cases. A noteworthy change in nfs_lookup_revalidate is that we don't take (or put) the reference to ->d_parent when LOOKUP_RCU is set. Rather we rcu_dereference ->d_parent, and check that ->d_inode is not NULL. We also check that ->d_parent hasn't changed after all the tests. In nfs4_lookup_revalidate we simply avoid testing LOOKUP_RCU on the path that only calls nfs_lookup_revalidate() as that function already performs the required test. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:14:11 -04:00
NeilBrown	49317a7fda	NFS: nfs4_lookup_revalidate: only evaluate parent if it will be used. nfs4_lookup_revalidate only uses 'parent' to get 'dir', and only uses 'dir' if 'inode == NULL'. So we don't need to find out what 'parent' or 'dir' is until we know that 'inode' is NULL. By moving 'dget_parent' inside the 'if', we can reduce the number of call sites for 'dput(parent)'. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:14:11 -04:00
Alexey Khoroshilov	1f70ef96b1	NFS: add checks for returned value of try_module_get() There is a couple of places in client code where returned value of try_module_get() is ignored. As a result there is a small chance to premature unload module because of unbalanced refcounting. The patch adds error handling in that places. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:14:10 -04:00
Weston Andros Adamson	411a99adff	nfs: clear_request_commit while holding i_lock Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:26 -04:00
Weston Andros Adamson	e6cf82d183	pnfs: add pnfs_put_lseg_async This is useful when lsegs need to be released while holding locks. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:25 -04:00
Weston Andros Adamson	02d1426c70	pnfs: find swapped pages on pnfs commit lists too nfs_page_find_head_request_locked looks through the regular nfs commit lists when the page is swapped out, but doesn't look through the pnfs commit lists. I'm not sure if anyone has hit any issues caused by this. Suggested-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:25 -04:00
Weston Andros Adamson	b412ddf066	nfs: fix comment and add warn_on for PG_INODE_REF Fix the comment in nfs_page.h for PG_INODE_REF to reflect that it's no longer set only on head requests. Also add a WARN_ON_ONCE in nfs_inode_remove_request as PG_INODE_REF should always be set. Suggested-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:25 -04:00
Weston Andros Adamson	e7029206ff	nfs: check wait_on_bit_lock err in page_group_lock Return errors from wait_on_bit_lock from nfs_page_group_lock. Add a bool argument @wait to nfs_page_group_lock. If true, loop over wait_on_bit_lock until it returns cleanly. If false, return the error from wait_on_bit_lock. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:24 -04:00
NeilBrown	4fa2c54b51	NFS: nfs4_do_open should add negative results to the dcache. If you have an NFSv4 mounted directory which does not container 'foo' and: ls -l foo ssh $server touch foo cat foo then the 'cat' will fail (usually, depending a bit on the various cache ages). This is correct as negative looks are cached by default. However with the same initial conditions: cat foo ssh $server touch foo cat foo will usually succeed. This is because an "open" does not add a negative dentry to the dcache, while a "lookup" does. This can have negative performance effects. When "gcc" searches for an include file, it will try to "open" the file in every director in the search path. Without caching of negative "open" results, this generates much more traffic to the server than it should (or than NFSv3 does). The root of the problem is that _nfs4_open_and_get_state() will call d_add_unique() on a positive result, but not on a negative result. Compare with nfs_lookup() which calls d_materialise_unique on both a positive result and on ENOENT. This patch adds a call d_add() in the ENOENT case for _nfs4_open_and_get_state() and also calls nfs_set_verifier(). With it, many fewer "open" requests for known-non-existent files are sent to the server. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:22 -04:00
Andrey Utkin	7a9e75a185	nfs3_list_one_acl(): check get_acl() result with IS_ERR_OR_NULL There was a check for result being not NULL. But get_acl() may return NULL, or ERR_PTR, or actual pointer. The purpose of the function where current change is done is to "list ACLs only when they are available", so any error condition of get_acl() mustn't be elevated, and returning 0 there is still valid. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81111 Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Fixes: `74adf83f5d` (nfs: only show Posix ACLs in listxattr if actually...) Cc: stable@vger.kernel.org # 3.14+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:22 -04:00
Trond Myklebust	9806755c56	Merge branch 'nfs-rdma' of git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next * 'nfs-rdma' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (916 commits) xprtrdma: Handle additional connection events xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro xprtrdma: Make rpcrdma_ep_disconnect() return void xprtrdma: Schedule reply tasklet once per upcall xprtrdma: Allocate each struct rpcrdma_mw separately xprtrdma: Rename frmr_wr xprtrdma: Disable completions for LOCAL_INV Work Requests xprtrdma: Disable completions for FAST_REG_MR Work Requests xprtrdma: Don't post a LOCAL_INV in rpcrdma_register_frmr_external() xprtrdma: Reset FRMRs after a flushed LOCAL_INV Work Request xprtrdma: Reset FRMRs when FAST_REG_MR is flushed by a disconnect xprtrdma: Properly handle exhaustion of the rb_mws list xprtrdma: Chain together all MWs in same buffer pool xprtrdma: Back off rkey when FAST_REG_MR fails xprtrdma: Unclutter struct rpcrdma_mr_seg xprtrdma: Don't invalidate FRMRs if registration fails xprtrdma: On disconnect, don't ignore pending CQEs xprtrdma: Update rkeys after transport reconnect xprtrdma: Limit data payload size for ALLPHYSICAL xprtrdma: Protect ia->ri_id when unmapping/invalidating MRs ...	2014-08-03 17:04:51 -04:00
Trond Myklebust	3a505845cd	NFS: Enforce an upper limit on the number of cached access call This may be used to limit the number of cached credentials building up inside the access cache. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:03:22 -04:00
Steve French	59b04c5df7	[CIFS] Fix incorrect hex vs. decimal in some debug print statements Joe Perches and Hans Wennborg noticed that various places in the kernel were printing decimal numbers with 0x prefix. printk("0x%d") or equivalent This fixes the instances of this in the cifs driver. CC: Hans Wennborg <hans@hanshq.net> CC: Joe Perches <joe@perches.com> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 21:16:48 -05:00
Chao Yu	70cfed88ef	f2fs: avoid skipping recover_inline_xattr after recover_inline_data When we recover data of inode in roll-forward procedure, and the inode has both inline data and inline xattr. We may skip recovering inline xattr if we recover inline data form node page first. This patch will fix the problem that we lost inline xattr data in above scenario. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-02 07:43:51 -07:00
Chao Yu	70407fad85	f2fs: add tracepoint for f2fs_direct_IO This patch adds a tracepoint for f2fs_direct_IO. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-02 07:34:46 -07:00
Steve French	81691503b2	Update cifs version to 2.04 Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:04 -05:00
Pavel Shilovsky	21496687a7	CIFS: Fix STATUS_CANNOT_DELETE error mapping for SMB2 The existing mapping causes unlink() call to return error after delete operation. Changing the mapping to -EACCES makes the client process the call like CIFS protocol does - reset dos attributes with ATTR_READONLY flag masked off and retry the operation. Cc: stable@vger.kernel.org Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:04 -05:00
Pavel Shilovsky	b770ddfa26	CIFS: Optimize readpages in a short read case on reconnects by marking pages with a data from a partially received response up-to-date. This is suitable for non-signed connections. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:04 -05:00
Pavel Shilovsky	d913ed17f0	CIFS: Optimize cifs_user_read() in a short read case on reconnects by filling the output buffer with a data got from a partially received response and requesting the remaining data from the server. This is suitable for non-signed connections. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:04 -05:00
Pavel Shilovsky	fb8a3e5255	CIFS: Improve indentation in cifs_user_read() Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:04 -05:00
Pavel Shilovsky	2e8a05d802	CIFS: Fix possible buffer corruption in cifs_user_read() If there was a short read in the middle of the rdata list, we can end up with a corrupt output buffer. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:04 -05:00
Pavel Shilovsky	b3160aebb4	CIFS: Count got bytes in read_into_pages() that let us know how many bytes we have already got before reconnect. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:04 -05:00
Pavel Shilovsky	34a54d6177	CIFS: Use separate var for the number of bytes got in async read and don't mix it with the number of bytes that was requested. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:04 -05:00
Pavel Shilovsky	3fabaa2746	CIFS: Indicate reconnect with ECONNABORTED error code that let us not mix it with EAGAIN. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:03 -05:00
Pavel Shilovsky	bed9da0213	CIFS: Use multicredits for SMB 2.1/3 reads If we negotiate SMB 2.1 and higher version of the protocol and a server supports large read buffer size, we need to consume 1 credit per 65536 bytes. So, we need to know how many credits we have and obtain the required number of them before constructing a readdata structure in readpages and user read. Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:03 -05:00
Pavel Shilovsky	e374d90f8a	CIFS: Fix rsize usage for sync read If a server changes maximum buffer size for read requests (rsize) on reconnect we can fail on repeating with a big size buffer on -EAGAIN error in cifs_read. Fix this by checking rsize all the time before repeating requests. Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:03 -05:00
Pavel Shilovsky	25f402598d	CIFS: Fix rsize usage in user read If a server changes maximum buffer size for read (rsize) requests on reconnect we can fail on repeating with a big size buffer on -EAGAIN error in user read. Fix this by checking rsize all the time before repeating requests. Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:03 -05:00
Pavel Shilovsky	0ada36b244	CIFS: Separate page reading from user read Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:03 -05:00
Pavel Shilovsky	69cebd7560	CIFS: Fix rsize usage in readpages If a server changes maximum buffer size for read (rsize) requests on reconnect we can fail on repeating with a big size buffer on -EAGAIN error in readpages. Fix this by checking rsize all the time before repeating requests. Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:03 -05:00
Pavel Shilovsky	387eb92ac6	CIFS: Separate page search from readpages Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:03 -05:00
Pavel Shilovsky	cb7e9eabb2	CIFS: Use multicredits for SMB 2.1/3 writes If we negotiate SMB 2.1 and higher version of the protocol and a server supports large write buffer size, we need to consume 1 credit per 65536 bytes. So, we need to know how many credits we have and obtain the required number of them before constructing a writedata structure in writepages and iovec write. Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:03 -05:00
Pavel Shilovsky	6ec0b01b26	CIFS: Fix wsize usage in iovec write If a server change maximum buffer size for write (wsize) requests on reconnect we can fail on repeating with a big size buffer on -EAGAIN error in iovec write. Fix this by checking wsize all the time before repeating request in iovec write. Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:02 -05:00
Pavel Shilovsky	43de94eadf	CIFS: Separate writing from iovec write Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:02 -05:00
Pavel Shilovsky	66386c08be	CIFS: Separate filling pages from iovec write Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:02 -05:00
Pavel Shilovsky	7f6c50086a	CIFS: Fix cifs_writev_requeue when wsize changes If wsize changes on reconnect we need to use new writedata structure that for retrying. Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:02 -05:00
Pavel Shilovsky	66231a4796	CIFS: Fix wsize usage in writepages If a server change maximum buffer size for write (wsize) requests on reconnect we can fail on repeating with a big size buffer on -EAGAIN error in writepages. Fix this by checking wsize all the time before repeating request in writepages. Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:02 -05:00
Pavel Shilovsky	90ac1387c2	CIFS: Separate pages initialization from writepages Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:02 -05:00
Pavel Shilovsky	619aa48edb	CIFS: Separate page sending from writepages Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:02 -05:00
Steve French	27924075b5	Remove sparse build warning The recent session setup patch set (cifs-Separate-rawntlmssp-auth-from-CIFS_SessSetup.patch) had introduced a trivial sparse build warning. Signed-off-by: Steve French <smfrench@gmail.com> Cc: Sachin Prabhu <sprabhu@redhat.com>	2014-08-02 01:23:01 -05:00
Pavel Shilovsky	7e48ff8202	CIFS: Separate page processing from writepages Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:01 -05:00
Pavel Shilovsky	038bc961c3	CIFS: Fix async reading on reconnects If we get into read_into_pages() from cifs_readv_receive() and then loose a network, we issue cifs_reconnect that moves all mids to a private list and issue their callbacks. The callback of the async read request sets a mid to retry, frees it and wakes up a process that waits on the rdata completion. After the connection is established we return from read_into_pages() with a short read, use the mid that was freed before and try to read the remaining data from the a newly created socket. Both actions are not what we want to do. In reconnect cases (-EAGAIN) we should not mask off the error with a short read but should return the error code instead. Acked-by: Jeff Layton <jlayton@samba.org> Cc: stable@vger.kernel.org Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:01 -05:00
Jeff Layton	7abea1e8e8	nfsd: don't destroy client if mark_client_expired_locked fails If it fails, it means that the client is in use and so destroying it would be bad. Currently, the client_mutex prevents this from happening but once we remove it, we won't be able to do this. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:26 -04:00
Jeff Layton	97403d95e1	nfsd: move unhash_client_locked call into mark_client_expired_locked All the callers except for the fault injection code call it directly afterward, and in the fault injection case it won't hurt to do so anyway. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:25 -04:00
Jeff Layton	217526e7ec	nfsd: protect the close_lru list and oo_last_closed_stid with client_lock Currently, it's protected by the client_mutex. Move it so that the list and the fields in the openowner are protected by the client_lock. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:24 -04:00
Trond Myklebust	0a880a28f8	nfsd: Add lockdep assertions to document the nfs4_client/session locking Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:23 -04:00
Trond Myklebust	3e339f964b	nfsd: Ensure lookup_clientid() takes client_lock Ensure that the client lookup is done safely under the client_lock, so we're not relying on the client_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:23 -04:00
Trond Myklebust	6b10ad193d	nfsd: Protect nfsd4_destroy_clientid using client_lock ...instead of relying on the client_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:22 -04:00
Jeff Layton	d20c11d86d	nfsd: Protect session creation and client confirm using client_lock In particular, we want to ensure that the move_to_confirmed() is protected by the nn->client_lock spin lock, so that we can use that when looking up the clientid etc. instead of relying on the client_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:21 -04:00
Trond Myklebust	3dbacee6e1	nfsd: Protect unconfirmed client creation using client_lock ...instead of relying on the client_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:20 -04:00
Trond Myklebust	5cc40fd7b6	nfsd: Move create_client() call outside the lock For efficiency reasons, and because we want to use spin locks instead of relying on the client_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:20 -04:00
Trond Myklebust	425510f5c8	nfsd: Don't require client_lock in free_client The struct nfs_client is supposed to be invisible and unreferenced before it gets here. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:19 -04:00
Trond Myklebust	4864af97e0	nfsd: Ensure that the laundromat unhashes the client before releasing locks If we leave the client on the confirmed/unconfirmed tables, and leave the sessions visible on the sessionid_hashtbl, then someone might find them before we've had a chance to destroy them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:18 -04:00
Trond Myklebust	4beb345b37	nfsd: Ensure struct nfs4_client is unhashed before we try to destroy it When we remove the client_mutex protection, we will need to ensure that it can't be found by other threads while we're destroying it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:17 -04:00
J. Bruce Fields	83e452fee8	nfsd4: fix out of date comment Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:16 -04:00
Kinglong Mee	d9499a9571	NFSD: Decrease nfsd_users in nfsd_startup_generic fail A memory allocation failure could cause nfsd_startup_generic to fail, in which case nfsd_users wouldn't be incorrectly left elevated. After nfsd restarts nfsd_startup_generic will then succeed without doing anything--the first consequence is likely nfs4_start_net finding a bad laundry_wq and crashing. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: `4539f14981` "nfsd: replace boolean nfsd_up flag by users counter" Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:26:09 -04:00
Eric Biggers	6d2b6170c8	vfs: fix check for fallocate on active swapfile Fix the broken check for calling sys_fallocate() on an active swapfile, introduced by commit `0790b31b69` ("fs: disallow all fallocate operation on active swapfile"). Signed-off-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-08-01 02:36:04 -04:00
Christoph Hellwig	af43647277	direct-io: fix AIO regression The direct-io.c rewrite to use the iov_iter infrastructure stopped updating the size field in struct dio_submit, and thus rendered the check for allowing asynchronous completions to always return false. Fix this by comparing it to the count of bytes in the iov_iter instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Tim Chen <tim.c.chen@linux.intel.com> Tested-by: Tim Chen <tim.c.chen@linux.intel.com>	2014-08-01 02:35:51 -04:00
Sachin Prabhu	cc87c47d9d	cifs: Separate rawntlmssp auth from CIFS_SessSetup() Separate rawntlmssp authentication from CIFS_SessSetup(). Also cleanup CIFS_SessSetup() since we no longer do any auth within it. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>	2014-07-31 23:11:15 -05:00
Sachin Prabhu	ee03c646dd	cifs: Split Kerberos authentication off CIFS_SessSetup() Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>	2014-07-31 23:11:15 -05:00
Sachin Prabhu	583cf7afc7	cifs: Split ntlm and ntlmv2 authentication methods off CIFS_SessSetup() Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>	2014-07-31 23:11:15 -05:00
Sachin Prabhu	80a0e63751	cifs: Split lanman auth from CIFS_SessSetup() In preparation for splitting CIFS_SessSetup() into smaller more manageable chunks, we first add helper functions. We then proceed to split out lanman auth out of CIFS_SessSetup() Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>	2014-07-31 23:11:15 -05:00
Sachin Prabhu	6d81ed1ec2	cifs: replace code with free_rsp_buf() The functionality provided by free_rsp_buf() is duplicated in a number of places. Replace these instances with a call to free_rsp_buf(). Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>	2014-07-31 23:11:15 -05:00
Eric W. Biederman	ffbc6f0ead	mnt: Change the default remount atime from relatime to the existing value Since March 2009 the kernel has treated the state that if no MS_..ATIME flags are passed then the kernel defaults to relatime. Defaulting to relatime instead of the existing atime state during a remount is silly, and causes problems in practice for people who don't specify any MS_...ATIME flags and to get the default filesystem atime setting. Those users may encounter a permission error because the default atime setting does not work. A default that does not work and causes permission problems is ridiculous, so preserve the existing value to have a default atime setting that is always guaranteed to work. Using the default atime setting in this way is particularly interesting for applications built to run in restricted userspace environments without /proc mounted, as the existing atime mount options of a filesystem can not be read from /proc/mounts. In practice this fixes user space that uses the default atime setting on remount that are broken by the permission checks keeping less privileged users from changing more privileged users atime settings. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-07-31 17:12:59 -07:00
Eric W. Biederman	9566d67428	mnt: Correct permission checks in do_remount While invesgiating the issue where in "mount --bind -oremount,ro ..." would result in later "mount --bind -oremount,rw" succeeding even if the mount started off locked I realized that there are several additional mount flags that should be locked and are not. In particular MNT_NOSUID, MNT_NODEV, MNT_NOEXEC, and the atime flags in addition to MNT_READONLY should all be locked. These flags are all per superblock, can all be changed with MS_BIND, and should not be changable if set by a more privileged user. The following additions to the current logic are added in this patch. - nosuid may not be clearable by a less privileged user. - nodev may not be clearable by a less privielged user. - noexec may not be clearable by a less privileged user. - atime flags may not be changeable by a less privileged user. The logic with atime is that always setting atime on access is a global policy and backup software and auditing software could break if atime bits are not updated (when they are configured to be updated), and serious performance degradation could result (DOS attack) if atime updates happen when they have been explicitly disabled. Therefore an unprivileged user should not be able to mess with the atime bits set by a more privileged user. The additional restrictions are implemented with the addition of MNT_LOCK_NOSUID, MNT_LOCK_NODEV, MNT_LOCK_NOEXEC, and MNT_LOCK_ATIME mnt flags. Taken together these changes and the fixes for MNT_LOCK_READONLY should make it safe for an unprivileged user to create a user namespace and to call "mount --bind -o remount,... ..." without the danger of mount flags being changed maliciously. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-07-31 17:12:34 -07:00
Eric W. Biederman	07b645589d	mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount There are no races as locked mount flags are guaranteed to never change. Moving the test into do_remount makes it more visible, and ensures all filesystem remounts pass the MNT_LOCK_READONLY permission check. This second case is not an issue today as filesystem remounts are guarded by capable(CAP_DAC_ADMIN) and thus will always fail in less privileged mount namespaces, but it could become an issue in the future. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-07-31 17:12:17 -07:00
Eric W. Biederman	a6138db815	mnt: Only change user settable mount flags in remount Kenton Varda <kenton@sandstorm.io> discovered that by remounting a read-only bind mount read-only in a user namespace the MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user to the remount a read-only mount read-write. Correct this by replacing the mask of mount flags to preserve with a mask of mount flags that may be changed, and preserve all others. This ensures that any future bugs with this mask and remount will fail in an easy to detect way where new mount flags simply won't change. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-07-31 17:11:54 -07:00
Jeff Layton	4ae098d327	nfsd: rename unhash_generic_stateid to unhash_ol_stateid ...to better match other functions that deal with open/lock stateids. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:31 -04:00
Jeff Layton	d83017f94c	nfsd: don't thrash the cl_lock while freeing an open stateid When we remove the client_mutex, we'll have a potential race between FREE_STATEID and CLOSE. The root of the problem is that we are walking the st_locks list, dropping the spinlock and then trying to release the persistent reference to the lockstateid. In between, a FREE_STATEID call can come along and take the lock, find the stateid and then try to put the reference. That leads to a double put. Fix this by not releasing the cl_lock in order to release each lock stateid. Use put_generic_stateid_locked to unhash them and gather them onto a list, and free_ol_stateid_reaplist to free any that end up on the list. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:31 -04:00
Jeff Layton	2c41beb0e5	nfsd: reduce cl_lock thrashing in release_openowner Releasing an openowner is a bit inefficient as it can potentially thrash the cl_lock if you have a lot of stateids attached to it. Once we remove the client_mutex, it'll also potentially be dangerous to do this. Add some functions to make it easier to defer the part of putting a generic stateid reference that needs to be done outside the cl_lock while doing the parts that must be done while holding it under a single lock. First we unhash each open stateid. Then we call put_generic_stateid_locked which will put the reference to an nfs4_ol_stateid. If it turns out to be the last reference, it'll go ahead and remove the stid from the IDR tree and put it onto the reaplist using the st_locks list_head. Then, after dropping the lock we'll call free_ol_stateid_reaplist to walk the list of stateids that are fully unhashed and ready to be freed, and free each of them. This function can sleep, so it must be done outside any spinlocks. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:30 -04:00
Jeff Layton	fc5a96c3b7	nfsd: close potential race in nfsd4_free_stateid Once we remove the client_mutex, it'll be possible for the sc_type of a lock stateid to change after it's found and checked, but before we can go to destroy it. If that happens, we can end up putting the persistent reference to the stateid more than once, and unhash it more than once. Fix this by unhashing the lock stateid prior to dropping the cl_lock but after finding it. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:29 -04:00
Jeff Layton	3c1c995cc2	nfsd: optimize destroy_lockowner cl_lock thrashing Reduce the cl_lock trashing in destroy_lockowner. Unhash all of the lockstateids on the lockowner's list. Put the reference under the lock and see if it was the last one. If so, then add it to a private list to be destroyed after we drop the lock. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:28 -04:00
Jeff Layton	a819ecc1bb	nfsd: add locking to stateowner release Once we remove the client_mutex, we'll need to properly protect the stateowner reference counts using the cl_lock. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:27 -04:00
Jeff Layton	882e9d25e1	nfsd: clean up and reorganize release_lockowner Do more within the main loop, and simplify the function a bit. Also, there's no need to take a stateowner reference unless we're going to call release_lockowner. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:27 -04:00
Trond Myklebust	d4f0489f38	nfsd: Move the open owner hash table into struct nfs4_client Preparation for removing the client_mutex. Convert the open owner hash table into a per-client table and protect it using the nfs4_client->cl_lock spin lock. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:26 -04:00
Trond Myklebust	c58c6610ec	nfsd: Protect adding/removing lock owners using client_lock Once we remove client mutex protection, we'll need to ensure that stateowner lookup and creation are atomic between concurrent compounds. Ensure that alloc_init_lock_stateowner checks the hashtable under the client_lock before adding a new element. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:25 -04:00
Trond Myklebust	7ffb588086	nfsd: Protect adding/removing open state owners using client_lock Once we remove client mutex protection, we'll need to ensure that stateowner lookup and creation are atomic between concurrent compounds. Ensure that alloc_init_open_stateowner checks the hashtable under the client_lock before adding a new element. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:24 -04:00
Jeff Layton	b401be22b5	nfsd: don't allow CLOSE to proceed until refcount on stateid drops Once we remove client_mutex protection, it'll be possible to have an in-flight operation using an openstateid when a CLOSE call comes in. If that happens, we can't just put the sc_file reference and clear its pointer without risking an oops. Fix this by ensuring that v4.0 CLOSE operations wait for the refcount to drop before proceeding to do so. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:23 -04:00
Jeff Layton	d3134b1049	nfsd: make openstateids hold references to their openowners Change it so that only openstateids hold persistent references to openowners. References can still be held by compounds in progress. With this, we can get rid of NFS4_OO_NEW. It's possible that we will create a new openowner in the process of doing the open, but something later fails. In the meantime, another task could find that openowner and start using it on a successful open. If that occurs we don't necessarily want to tear it down, just put the reference that the failing compound holds. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:23 -04:00
Jeff Layton	5adfd8850b	nfsd: clean up refcounting for lockowners Ensure that lockowner references are only held by lockstateids and operations that are in-progress. With this, we can get rid of release_lockowner_if_empty, which will be racy once we remove client_mutex protection. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:22 -04:00
Trond Myklebust	e4f1dd7fc2	nfsd: Make lock stateid take a reference to the lockowner A necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:21 -04:00
Jeff Layton	8f4b54c53f	nfsd: add an operation for unhashing a stateowner Allow stateowners to be unhashed and destroyed when the last reference is put. The unhashing must be idempotent. In a future patch, we'll add some locking around it, but for now it's only protected by the client_mutex. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:20 -04:00
Jeff Layton	5db1c03feb	nfsd: clean up lockowner refcounting when finding them Ensure that when finding or creating a lockowner, that we get a reference to it. For now, we also take an extra reference when a lockowner is created that can be put when release_lockowner is called, but we'll remove that in a later patch once we change how references are held. Since we no longer destroy lockowners in the event of an error in nfsd4_lock, we must change how the seqid gets bumped in the lk_is_new case. Instead of doing so on creation, do it manually in nfsd4_lock. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:20 -04:00
Jeff Layton	58fb12e6a4	nfsd: Add a mutex to protect the NFSv4.0 open owner replay cache We don't want to rely on the client_mutex for protection in the case of NFSv4 open owners. Instead, we add a mutex that will only be taken for NFSv4.0 state mutating operations, and that will be released once the entire compound is done. Also, ensure that nfsd4_cstate_assign_replay/nfsd4_cstate_clear_replay take a reference to the stateowner when they are using it for NFSv4.0 open and lock replay caching. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:19 -04:00
Jeff Layton	6b180f0b57	nfsd: Add reference counting to state owners The way stateowners are managed today is somewhat awkward. They need to be explicitly destroyed, even though the stateids reference them. This will be particularly problematic when we remove the client_mutex. We may create a new stateowner and attempt to open a file or set a lock, and have that fail. In the meantime, another RPC may come in that uses that same stateowner and succeed. We can't have the first task tearing down the stateowner in that situation. To fix this, we need to change how stateowners are tracked altogether. Refcount them and only destroy them once all stateids that reference them have been destroyed. This patch starts by adding the refcounting necessary to do that. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:18 -04:00
Trond Myklebust	2d3f96689f	nfsd: Migrate the stateid reference into nfs4_find_stateid_by_type() Allow nfs4_find_stateid_by_type to take the stateid reference, while still holding the &cl->cl_lock. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:17 -04:00
Trond Myklebust	fd9110113c	nfsd: Migrate the stateid reference into nfs4_lookup_stateid() Allow nfs4_lookup_stateid to take the stateid reference, instead of having all the callers do so. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:16 -04:00
Trond Myklebust	4cbfc9f704	nfsd: Migrate the stateid reference into nfs4_preprocess_seqid_op Allow nfs4_preprocess_seqid_op to take the stateid reference, instead of having all the callers do so. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:15 -04:00
Trond Myklebust	0667b1e9d8	nfsd: Add reference counting to nfs4_preprocess_confirmed_seqid_op Ensure that all the callers put the open stateid after use. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:15 -04:00
Trond Myklebust	2585fc7958	nfsd: nfsd4_open_confirm() must reference the open stateid Ensure that nfsd4_open_confirm() keeps a reference to the open stateid until it is done working with it. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:14 -04:00
Trond Myklebust	8a0b589d8f	nfsd: Prepare nfsd4_close() for open stateid referencing Prepare nfsd4_close for a future where nfs4_preprocess_seqid_op() hands it a fully referenced open stateid. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:13 -04:00
Trond Myklebust	d6f2bc5dcf	nfsd: nfsd4_process_open2() must reference the open stateid Ensure that nfsd4_process_open2() keeps a reference to the open stateid until it is done working with it. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:12 -04:00
Trond Myklebust	dcd94cc2e7	nfsd: nfsd4_process_open2() must reference the delegation stateid Ensure that nfsd4_process_open2() keeps a reference to the delegation stateid until it is done working with it. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:11 -04:00
Trond Myklebust	67cb1279be	nfsd: Ensure that nfs4_open_delegation() references the delegation stateid Ensure that nfs4_open_delegation() keeps a reference to the delegation stateid until it is done working with it. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:11 -04:00
Trond Myklebust	858cc57336	nfsd: nfsd4_locku() must reference the lock stateid Ensure that nfsd4_locku() keeps a reference to the lock stateid until it is done working with it. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:10 -04:00
Trond Myklebust	3d0fabd5a4	nfsd: Add reference counting to lock stateids Ensure that nfsd4_lock() references the lock stateid while it is manipulating it. Not currently necessary, but will be once the client_mutex is removed. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:09 -04:00
Jeff Layton	1af71cc801	nfsd: ensure atomicity in nfsd4_free_stateid and nfsd4_validate_stateid Hold the cl_lock over the bulk of these functions. In addition to ensuring that they aren't freed prematurely, this will also help prevent a potential race that could be introduced later. Once we remove the client_mutex, it'll be possible for FREE_STATEID and CLOSE to race and for both to try to put the "persistent" reference to the stateid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:08 -04:00
Jeff Layton	356a95ece7	nfsd: clean up races in lock stateid searching and creation Preparation for removal of the client_mutex. Currently, no lock aside from the client_mutex is held when calling find_lock_state. Ensure that the cl_lock is held by adding a lockdep assertion. Once we remove the client_mutex, it'll be possible for another thread to race in and insert a lock state for the same file after we search but before we insert a new one. Ensure that doesn't happen by redoing the search after allocating a new stid that we plan to insert. If one is found just put the one that was allocated. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:07 -04:00
Jeff Layton	1c755dc1ad	nfsd: Add locking to protect the state owner lists Change to using the clp->cl_lock for this. For now, there's a lot of cl_lock thrashing, but in later patches we'll eliminate that and close the potential races that can occur when releasing the cl_lock while walking the lists. For now, the client_mutex prevents those races. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:07 -04:00
Jeff Layton	b49e084d8c	nfsd: do filp_close in sc_free callback for lock stateids Releasing locks when we unhash the stateid instead of doing so only when the stateid is actually released will be problematic in later patches when we need to protect the unhashing with spinlocks. Move it into the sc_free operation instead. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:19:50 -04:00
Jeff Layton	4770d72201	nfsd4: use cl_lock to synchronize all stateid idr calls Currently, this is serialized by the client_mutex, which is slated for removal. Add finer-grained locking here. Also, do some cleanup around find_stateid to prepare for taking references. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Benny Halevy <bhalevy@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:19:25 -04:00
Trond Myklebust	11b9164ada	nfsd: Add a struct nfs4_file field to struct nfs4_stid All stateids are associated with a nfs4_file. Let's consolidate. Replace delegation->dl_file with the dl_stid.sc_file, and nfs4_ol_stateid->st_file with st_stid.sc_file. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 12:51:34 -04:00
Trond Myklebust	6011695da2	nfsd: Add reference counting to the lock and open stateids When we remove the client_mutex, we'll need to be able to ensure that these objects aren't destroyed while we're not holding locks. Add a ->free() callback to the struct nfs4_stid, so that we can release a reference to the stid without caring about the contents. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 12:43:53 -04:00
hujianyang	25601a3c97	UBIFS: Add log overlap assertions We use a circle area to record the log nodes in ubifs. This log area should not be overlapped. But after researching the code, I found some conditions may lead log head wraps log ltail. Although we've fixed the problems discovered, there may be some other issues still left. This patch adds assertions where lhead changes to next leb to make sure ltail is not wrapped. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-31 15:52:51 +03:00
Chao Yu	b3582c6892	f2fs: reduce competition among node page writes We do not need to block on ->node_write among different node page writers e.g. fsync/flush, unless we have a node page writer from write_checkpoint. So it's better use rw_semaphore instead of mutex type for ->node_write to promote performance. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 23:28:37 -07:00
Theodore Ts'o	86f0afd463	ext4: fix ext4_discard_allocated_blocks() if we can't allocate the pa struct If there is a failure while allocating the preallocation structure, a number of blocks can end up getting marked in the in-memory buddy bitmap, and then not getting released. This can result in the following corruption getting reported by the kernel: EXT4-fs error (device sda3): ext4_mb_generate_buddy:758: group 1126, 12793 clusters in bitmap, 12729 in gd In that case, we need to release the blocks using mb_free_blocks(). Tested: fs smoke test; also demonstrated that with injected errors, the file system is no longer getting corrupted Google-Bug-Id: 16657874 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-07-30 22:17:17 -04:00
Jaegeuk Kim	65b85ccce0	f2fs: fix coding style This patch fixes wrong coding style. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 17:25:54 -07:00
Dongho Sim	33be828ada	f2fs: remove redundant lines in allocate_data_block There are redundant lines in allocate_data_block. In this function, we call refresh_sit_entry with old seg and old curseg. After that, we call locate_dirty_segment with old curseg. But, the new address is always allocated from old curseg and we call locate_dirty_segment with old curseg in refresh_sit_entry. So, we do not need to call locate_dirty_segment with old curseg again. We've discussed like below: Jaegeuk said: "When considering SSR, we need to take care of the following scenario. - old segno : X - new address : Z - old curseg : Y This means, a new block is supposed to be written to Z from X. And Z is newly allocated in the same path from Y. In that case, we should trigger locate_dirty_segment for Y, since it was a current_segment and can be dirty owing to SSR. But that was not included in the dirty list." Changman said: "We already choosed old curseg(Y) and then we allocate new address(Z) from old curseg(Y). After that we call refresh_sit_entry(old address, new address). In the funcation, we call locate_dirty_segment with old seg and old curseg. So calling locate_dirty_segment after refresh_sit_entry again is redundant." Jaegeuk said: "Right. The new address is always allocated from old_curseg." Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Dongho Sim <dh.sim@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 15:38:59 -07:00
Jaegeuk Kim	24a9ee0fa3	f2fs: add tracepoint for f2fs_issue_flush This patch adds a tracepoint for f2fs_issue_flush. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:36 -07:00
Jaegeuk Kim	cf2271e781	f2fs: avoid retrying wrong recovery routine when error was occurred This patch eliminates the propagation of recovery errors to the next mount. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:35 -07:00
Jaegeuk Kim	61e0f2d0a5	f2fs: test before set/clear bits If the bit is already set, we don't need to reset it, and vice versa. Because we don't need to make the caches dirty for that. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:35 -07:00
Jaegeuk Kim	01229f5e1b	f2fs: fix wrong condition for unlikely This patch fixes the wrongly used unlikely condition. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:31 -07:00
Jaegeuk Kim	ea1aa12ca2	f2fs: enable in-place-update for fdatasync This patch enforces in-place-updates only when fdatasync is requested. If we adopt this in-place-updates for the fdatasync, we can skip to write the recovery information. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:23 -07:00
Jaegeuk Kim	6d99ba41a7	f2fs: skip unnecessary data writes during fsync This patch intends to improve the fsync performance by skipping remaining the recovery information, only when there is no data that we should recover. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:03 -07:00
David S. Miller	f139c74a8d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-30 13:25:49 -07:00
Jeff Layton	b3fbfe0e7a	nfsd: print status when nfsd4_open fails to open file it just created It's possible for nfsd to fail opening a file that it has just created. When that happens, we throw a WARN but it doesn't include any info about the error code. Print the status code to give us a bit more info. Our QA group hit some of these warnings under some very heavy stress testing. My suspicion is that they hit the file-max limit, but it's hard to know for sure. Go ahead and add a -ENFILE mapping to nfserr_serverfault to make the error more distinct (and correct). Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 23:08:38 -04:00
Eric W. Biederman	728dba3a39	namespaces: Use task_lock and not rcu to protect nsproxy The synchronous syncrhonize_rcu in switch_task_namespaces makes setns a sufficiently expensive system call that people have complained. Upon inspect nsproxy no longer needs rcu protection for remote reads. remote reads are rare. So optimize for same process reads and write by switching using rask_lock instead. This yields a simpler to understand lock, and a faster setns system call. In particular this fixes a performance regression observed by Rafael David Tinoco <rafael.tinoco@canonical.com>. This is effectively a revert of Pavel Emelyanov's commit `cf7b708c8d` Make access to task's nsproxy lighter from 2007. The race this originialy fixed no longer exists as do_notify_parent uses task_active_pid_ns(parent) instead of parent->nsproxy. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-07-29 18:08:50 -07:00
Christoph Hellwig	d5cf09bace	xfs: require 64-bit sector_t Trying to support tiny disks only and saving a bit memory might have made sense on an SGI O2 15 years ago, but is pretty pointless today. Remove the rarely tested codepath that uses various smaller in-memory types to reduce our test matrix and make the codebase a little bit smaller and less complicated. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-30 09:12:05 +10:00
Jeff Layton	650ecc8f8f	nfsd: remove dl_fh field from struct nfs4_delegation Now that the nfs4_file has a filehandle in it, we no longer need to keep a per-delegation copy of it. Switch to using the one in the nfs4_file instead. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:58 -04:00
Jeff Layton	f54fe962b8	nfsd: give block_delegation and delegation_blocked its own spinlock The state lock can be fairly heavily contended, and there's no reason that nfs4_file lookups and delegation_blocked should be mutually exclusive. Let's give the new block_delegation code its own spinlock. It does mean that we'll need to take a different lock in the delegation break code, but that's not generally as critical to performance. Cc: Neil Brown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:57 -04:00
Jeff Layton	0b26693c56	nfsd: clean up nfs4_set_delegation Move the alloc_init_deleg call into nfs4_set_delegation and change the function to return a pointer to the delegation or an IS_ERR return. This allows us to skip allocating a delegation if the file has already experienced a lease conflict. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:56 -04:00
Jeff Layton	4cf59221c7	nfsd: clean up arguments to nfs4_open_delegation No need to pass in a net pointer since we can derive that. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:55 -04:00
Jeff Layton	f9416e281e	nfsd: drop unused stp arg to alloc_init_deleg Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:54 -04:00
Trond Myklebust	02a3508dba	nfsd: Convert delegation counter to an atomic_long_t type We want to convert to an atomic type so that we don't need to lock across the call to alloc_init_deleg(). Then convert to a long type so that we match the size of 'max_delegations'. None of this is a problem today, but it will be once we remove client_mutex protection. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:54 -04:00
Jeff Layton	2d4a532d38	nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock Currently, both destroy_revoked_delegation and revoke_delegation manipulate the cl_revoked list without any locking aside from the client_mutex. Ensure that the clp->cl_lock is held when manipulating it, except for the list walking in destroy_client. At that point, the client should no longer be in use, and so it should be safe to walk the list without any locking. That also means that we don't need to do the list_splice_init there either. Also, the fact that revoke_delegation deletes dl_recall_lru list_head without any locking makes it difficult to know whether it's doing so safely in all cases. Move the list_del_init calls into the callers, and add a WARN_ON in the event that t's passed a delegation that has a non-empty list_head. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:53 -04:00
Jeff Layton	4269067696	nfsd: fully unhash delegations when revoking them Ensure that the delegations cannot be found by the laundromat etc once we add them to the various 'revoke' lists. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:52 -04:00
Trond Myklebust	f83388341b	nfsd: simplify stateid allocation and file handling Don't allow stateids to clear the open file pointer until they are being destroyed. In a later patches we'll want to rely on the fact that we have a valid file pointer when dealing with the stateid and this will save us from having to do a lot of NULL pointer checks before doing so. Also, move to allocating stateids with kzalloc and get rid of the explicit zeroing of fields. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:51 -04:00
David Howells	0ef1351523	AFS: Correctly assemble the client UUID Correctly assemble the client UUID by OR'ing in the flags rather than assigning them over the other components. Reported-by: Himangi Saraogi <himangi774@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-07-29 10:14:36 -07:00
Jaegeuk Kim	fff04f90c1	f2fs: add info of appended or updated data writes This patch introduces a inode number list in which represents inodes having appended data writes or updated data writes after last checkpoint. This will be used at fsync to determine whether the recovery information should be written or not. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:46:11 -07:00
Jaegeuk Kim	39efac41fb	f2fs: use radix_tree for ino management For better ino management, this patch replaces the data structure from list to radix tree. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:46:11 -07:00
Jaegeuk Kim	6451e041c8	f2fs: add infra for ino management This patch changes the naming of orphan-related data structures to use as inode numbers managed globally. Later, we can use this facility for managing any inode number lists. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:45:54 -07:00
Namjae Jeon	ee98fa3a8b	ext4: fix COLLAPSE RANGE test for bigalloc file systems Blocks in collapse range should be collapsed per cluster unit when bigalloc is enable. If bigalloc is not enable, EXT4_CLUSTER_SIZE will be same with EXT4_BLOCK_SIZE. With this bug fixed, patch enables COLLAPSE_RANGE for bigalloc, which fixes a large number of xfstest failures which use fsx. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-07-29 10:45:31 -04:00
Jaegeuk Kim	953e6cc6bc	f2fs: punch the core function for inode management This patch punches out the core functions to manage the inode numbers. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:40:25 -07:00
Jaegeuk Kim	0f7b2abd18	f2fs: add nobarrier mount option This patch adds a mount option, nobarrier, in f2fs. The assumption in here is that file system keeps the IO ordering, but doesn't care about cache flushes inside the storages. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 05:27:48 -07:00
David S. Miller	3fd0202a0d	Merge tag 'master-2014-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-07-25 Please pull this batch of updates intended for the 3.17 stream! For the mac80211 bits, Johannes says: "We have a lot of TDLS patches, among them a fix that should make hwsim tests happy again. The rest, this time, is mostly small fixes." For the Bluetooth bits, Gustavo says: "Some more patches for 3.17. The most important change here is the move of the 6lowpan code to net/6lowpan. It has been agreed with Davem that this change will go through the bluetooth tree. The rest are mostly clean up and fixes." and, "Here follows some more patches for 3.17. These are mostly fixes to what we've sent to you before for next merge window." For the iwlwifi bits, Emmanuel says: "I have the usual amount of BT Coex stuff. Arik continues to work on TDLS and Ariej contributes a few things for HS2.0. I added a few more things to the firmware debugging infrastructure. Eran fixes a small bug - pretty normal content." And for the Atheros bits, Kalle says: "For ath6kl me and Jessica added support for ar6004 hw3.0, our latest version of ar6004. For ath10k Janusz added a printout so that it's easier to check what ath10k kconfig options are enabled. He also added a debugfs file to configure maximum amsdu and ampdu values. Also we had few fixes as usual." On top of that is the usual large batch of various driver updates -- brcmfmac, mwifiex, the TI drivers, and wil6210 all get some action. Rafał has also been very busy with b43 and related updates. Also, I pulled the wireless tree into this in order to resolve a merge conflict... P.S. The change to fs/compat_ioctl.c reflects a name change in a Bluetooth header file... ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-28 17:36:25 -07:00
Darrick J. Wong	40b163f1c4	ext4: check inline directory before converting Before converting an inline directory to a regular directory, check the directory entries to make sure they're not obviously broken. This helps us to avoid a BUG_ON if one of the dirents is trashed. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>	2014-07-28 13:06:26 -04:00
Artem Bityutskiy	6390e99177	Revert "UBIFS: add a log overlap assertion" This reverts commit `545f7fdf6d`. Hujianyang's testing revealed that the patch is bogus. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-28 19:15:19 +03:00
Yan, Zheng	06fee30f6a	ceph: fix append mode write generic_write_checks() may update 'pos', so we need to pass 'pos' to ceph_sync_write() and ceph_sync_direct_write(); Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-07-28 13:29:33 +04:00
Ilya Dryomov	7e8a295295	ceph: fix sizeof(struct tYpO *) typo struct ceph_xattr -> struct ceph_inode_xattr Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-28 13:29:27 +04:00
Ilya Dryomov	1a295bd8c8	ceph: remove redundant memset(0) xattrs array of pointers is allocated with kcalloc() - no need to memset() it to 0 right after that. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-28 13:28:33 +04:00
Ingo Molnar	ca5bc6cd5d	Merge branch 'sched/urgent' into sched/core, to merge fixes before applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-28 10:03:00 +02:00
Dmitry Monakhov	6e2631463f	ext4: fix incorrect locking in move_extent_per_page If we have to copy data we must drop i_data_sem because of get_blocks() will be called inside mext_page_mkuptodate(), but later we must reacquire it again because we are about to change extent's tree Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-07-27 22:32:27 -04:00
Dmitry Monakhov	29faed1638	ext4: use correct depth value Inode's depth can be changed from here: ext4_ext_try_to_merge() ->ext4_ext_try_to_merge_up() We must use correct value. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-07-27 22:30:29 -04:00
Dmitry Monakhov	4b1f166071	ext4: add i_data_sem sanity check Each caller of ext4_ext_dirty must hold i_data_sem, The only exception is migration code, let's make it convenient. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-07-27 22:28:15 -04:00
Xiaoguang Wang	b27b1535ac	ext4: fix wrong size computation in ext4_mb_normalize_request() As the member fe_len defined in struct ext4_free_extent is expressed as number of clusters, the variable "size" computation is wrong, we need to first translate fe_len to block number, then to bytes. Signed-off-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com>	2014-07-27 22:26:36 -04:00
Linus Torvalds	fbf08efa04	Merge branch 'vfs-for-3.16' of git://git.infradead.org/users/hch/vfs Pull vfs fixes from Christoph Hellwig: "A vfsmount leak fix, and a compile warning fix" * 'vfs-for-3.16' of git://git.infradead.org/users/hch/vfs: fs: umount on symlink leaks mnt count direct-io: fix uninitialized warning in do_direct_IO()	2014-07-27 09:43:41 -07:00
Linus Torvalds	0246544fc9	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "These two pathes fix issues with the kernel-userspace protocol changes in v3.15" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: add FUSE_NO_OPEN_SUPPORT flag to INIT fuse: s_time_gran fix	2014-07-25 16:16:34 -07:00
Chao Yu	9d84795077	f2fs: fix to put root inode in error path of fill_super We should put root inode correctly in error path of fill_super, otherwise we may encounter a leak case of inode resource. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-25 08:19:57 -07:00
Chao Yu	dbf20cb259	f2fs: avoid use invalid mapping of node_inode when evict meta inode Andrey Tsyvarev reported: "Using memory error detector reveals the following use-after-free error in 3.15.0: AddressSanitizer: heap-use-after-free in f2fs_evict_inode Read of size 8 by thread T22279: [<ffffffffa02d8702>] f2fs_evict_inode+0x102/0x2e0 [f2fs] [<ffffffff812359af>] evict+0x15f/0x290 [< inlined >] iput+0x196/0x280 iput_final [<ffffffff812369a6>] iput+0x196/0x280 [<ffffffffa02dc416>] f2fs_put_super+0xd6/0x170 [f2fs] [<ffffffff81210095>] generic_shutdown_super+0xc5/0x1b0 [<ffffffff812105fd>] kill_block_super+0x4d/0xb0 [<ffffffff81210a86>] deactivate_locked_super+0x66/0x80 [<ffffffff81211c98>] deactivate_super+0x68/0x80 [<ffffffff8123cc88>] mntput_no_expire+0x198/0x250 [< inlined >] SyS_umount+0xe9/0x1a0 SYSC_umount [<ffffffff8123f1c9>] SyS_umount+0xe9/0x1a0 [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b Freed by thread T3: [<ffffffffa02dc337>] f2fs_i_callback+0x27/0x30 [f2fs] [< inlined >] rcu_process_callbacks+0x2d6/0x930 __rcu_reclaim [< inlined >] rcu_process_callbacks+0x2d6/0x930 rcu_do_batch [< inlined >] rcu_process_callbacks+0x2d6/0x930 invoke_rcu_callbacks [< inlined >] rcu_process_callbacks+0x2d6/0x930 __rcu_process_callbacks [<ffffffff810fd266>] rcu_process_callbacks+0x2d6/0x930 [<ffffffff8107cce2>] __do_softirq+0x142/0x380 [<ffffffff8107cf50>] run_ksoftirqd+0x30/0x50 [<ffffffff810b2a87>] smpboot_thread_fn+0x197/0x280 [<ffffffff810a8238>] kthread+0x148/0x160 [<ffffffff81cc8d4c>] ret_from_fork+0x7c/0xb0 Allocated by thread T22276: [<ffffffffa02dc7dd>] f2fs_alloc_inode+0x2d/0x170 [f2fs] [<ffffffff81235e2a>] iget_locked+0x10a/0x230 [<ffffffffa02d7495>] f2fs_iget+0x35/0xa80 [f2fs] [<ffffffffa02e2393>] f2fs_fill_super+0xb53/0xff0 [f2fs] [<ffffffff81211bce>] mount_bdev+0x1de/0x240 [<ffffffffa02dbce0>] f2fs_mount+0x10/0x20 [f2fs] [<ffffffff81212a85>] mount_fs+0x55/0x220 [<ffffffff8123c026>] vfs_kern_mount+0x66/0x200 [< inlined >] do_mount+0x2b4/0x1120 do_new_mount [<ffffffff812400d4>] do_mount+0x2b4/0x1120 [< inlined >] SyS_mount+0xb2/0x110 SYSC_mount [<ffffffff812414a2>] SyS_mount+0xb2/0x110 [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b The buggy address ffff8800587866c8 is located 48 bytes inside of 680-byte region [ffff880058786698, ffff880058786940) Memory state around the buggy address: ffff880058786100: ffffffff ffffffff ffffffff ffffffff ffff880058786200: ffffffff ffffffff ffffffrr rrrrrrrr ffff880058786300: rrrrrrrr rrffffff ffffffff ffffffff ffff880058786400: ffffffff ffffffff ffffffff ffffffff ffff880058786500: ffffffff ffffffff ffffffff fffffffr >ffff880058786600: rrrrrrrr rrrrrrrr rrrfffff ffffffff ^ ffff880058786700: ffffffff ffffffff ffffffff ffffffff ffff880058786800: ffffffff ffffffff ffffffff ffffffff ffff880058786900: ffffffff rrrrrrrr rrrrrrrr rrrr.... ffff880058786a00: ........ ........ ........ ........ ffff880058786b00: ........ ........ ........ ........ Legend: f - 8 freed bytes r - 8 redzone bytes . - 8 allocated bytes x=1..7 - x allocated bytes + (8-x) redzone bytes Investigation shows, that f2fs_evict_inode, when called for 'meta_inode', uses invalidate_mapping_pages() for 'node_inode'. But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput(). It seems that in common usage scenario this use-after-free is benign, because 'node_inode' remains partially valid data even after kmem_cache_free(). But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another (mounted) f2fs filesystem requests inode from cache, and formely 'node_inode' of the first filesystem is returned." Nids for both meta_inode and node_inode are reservation, so it's not necessary for us to invalidate pages which will never be allocated. To fix this issue, let's skipping needlessly invalidating pages for {meta,node}_inode in f2fs_evict_inode. Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru> Tested-by: Andrey Tsyvarev <tsyvarev@ispras.ru> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-25 08:19:55 -07:00
Chao Yu	32f9bc25cb	f2fs: support ->rename2() Now new interface ->rename2() is added to VFS, here are related description: https://lkml.org/lkml/2014/2/7/873 https://lkml.org/lkml/2014/2/7/758 This patch adds function f2fs_rename2() to support ->rename2() including handling both RENAME_EXCHANGE and RENAME_NOREPLACE flag. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-25 08:14:08 -07:00
Huang Ying	79e35dc3c2	f2fs: add f2fs_balance_fs for direct IO Otherwise, if a large amount of direct IO writes were done, the segment allocation may be failed because no enough segments are gced. Changes: v2: add f2fs_balance_fs into __get_data_block instead of f2fs_direct_IO. Signed-off-by: Huang, Ying <ying.huang@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-25 08:13:58 -07:00
Gu Zheng	00fefb9cf2	aio: use iovec array rather than the single one Previously, we only offer a single iovec to handle all the read/write cases, so the PREADV/PWRITEV request always need to alloc more iovec buffer when copying user vectors. If we use a tmp iovec array rather than the single one, some small PREADV/PWRITEV workloads(vector size small than the tmp buffer) will not need to alloc more iovec buffer when copying user vectors. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>	2014-07-24 10:59:40 -04:00
Gu Zheng	2be4e7deec	aio: fix some comments The function comments of aio_run_iocb and aio_read_events are out of date, so fix them here. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>	2014-07-24 10:59:40 -04:00
Gu Zheng	8dc4379e17	aio: use the macro rather than the inline magic number Replace the inline magic number with the ready-made macro(AIO_RING_MAGIC), just clean up. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>	2014-07-24 10:59:40 -04:00
Gu Zheng	b53f1f82fb	aio: remove the needless registration of ring file's private_data Remove the registration of ring file's private_data, we do not use it. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>	2014-07-24 10:59:40 -04:00
Eric Paris	7d8b6c6375	CAPABILITIES: remove undefined caps from all processes This is effectively a revert of `7b9a7ec565` plus fixing it a different way... We found, when trying to run an application from an application which had dropped privs that the kernel does security checks on undefined capability bits. This was ESPECIALLY difficult to debug as those undefined bits are hidden from /proc/$PID/status. Consider a root application which drops all capabilities from ALL 4 capability sets. We assume, since the application is going to set eff/perm/inh from an array that it will clear not only the defined caps less than CAP_LAST_CAP, but also the higher 28ish bits which are undefined future capabilities. The BSET gets cleared differently. Instead it is cleared one bit at a time. The problem here is that in security/commoncap.c::cap_task_prctl() we actually check the validity of a capability being read. So any task which attempts to 'read all things set in bset' followed by 'unset all things set in bset' will not even attempt to unset the undefined bits higher than CAP_LAST_CAP. So the 'parent' will look something like: CapInh: 0000000000000000 CapPrm: 0000000000000000 CapEff: 0000000000000000 CapBnd: ffffffc000000000 All of this 'should' be fine. Given that these are undefined bits that aren't supposed to have anything to do with permissions. But they do... So lets now consider a task which cleared the eff/perm/inh completely and cleared all of the valid caps in the bset (but not the invalid caps it couldn't read out of the kernel). We know that this is exactly what the libcap-ng library does and what the go capabilities library does. They both leave you in that above situation if you try to clear all of you capapabilities from all 4 sets. If that root task calls execve() the child task will pick up all caps not blocked by the bset. The bset however does not block bits higher than CAP_LAST_CAP. So now the child task has bits in eff which are not in the parent. These are 'meaningless' undefined bits, but still bits which the parent doesn't have. The problem is now in cred_cap_issubset() (or any operation which does a subset test) as the child, while a subset for valid cap bits, is not a subset for invalid cap bits! So now we set durring commit creds that the child is not dumpable. Given it is 'more priv' than its parent. It also means the parent cannot ptrace the child and other stupidity. The solution here: 1) stop hiding capability bits in status This makes debugging easier! 2) stop giving any task undefined capability bits. it's simple, it you don't put those invalid bits in CAP_FULL_SET you won't get them in init and you won't get them in any other task either. This fixes the cap_issubset() tests and resulting fallout (which made the init task in a docker container untraceable among other things) 3) mask out undefined bits when sys_capset() is called as it might use ~0, ~0 to denote 'all capabilities' for backward/forward compatibility. This lets 'capsh --caps="all=eip" -- -c /bin/bash' run. 4) mask out undefined bit when we read a file capability off of disk as again likely all bits are set in the xattr for forward/backward compatibility. This lets 'setcap all+pe /bin/bash; /bin/bash' run Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Andrew G. Morgan <morgan@kernel.org> Cc: Serge E. Hallyn <serge.hallyn@canonical.com> Cc: Kees Cook <keescook@chromium.org> Cc: Steve Grubb <sgrubb@redhat.com> Cc: Dan Walsh <dwalsh@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: James Morris <james.l.morris@oracle.com>	2014-07-24 21:53:47 +10:00
James Morris	4ca332e11d	Merge tag 'keys-next-20140722' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into next	2014-07-24 21:36:19 +10:00
Jie Liu	74dc93a908	xfs: fix uflags detection at xfs_fs_rm_xquota We are intended to check up uflags against FS_PROJ_QUOTA rather than FS_USER_UQUOTA once more, it looks to me like a typo, but might cause the project quota metadata space can not be removed. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 21:27:17 +10:00
Jie Liu	7b409a7d6f	xfs: remove XFS_IS_OQUOTA_ON macros Remove the XFS_IS_OQUOTA_ON macros as it is obsoleted. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 21:27:16 +10:00
Eric Sandeen	54aa61f82d	xfs: tidy up xfs_set_inode32 xfs_set_inode32() caught my eye because it had weird spacing around the "-1's". In cleaning that up, I realized that the assignment in the declaration of "ino" is never used; it's rewritten before it gets read. Drop the ino initializer from its declaration since it's not used, and move the agino initialization into the body of the function, mostly so that we can have pretty whitespace and not exceed 80 columns. :) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 20:53:10 +10:00
Eric Sandeen	9de67c3ba9	xfs: allow inode allocations in post-growfs disk space Today, if we perform an xfs_growfs which adds allocation groups, mp->m_maxagi is not properly updated when the growfs is complete. Therefore inodes will continue to be allocated only in the AGs which existed prior to the growfs, and the new space won't be utilized. This is because of this path in xfs_growfs_data_private(): xfs_growfs_data_private xfs_initialize_perag(mp, nagcount, &nagimax); if (mp->m_flags & XFS_MOUNT_32BITINODES) index = xfs_set_inode32(mp); else index = xfs_set_inode64(mp); if (maxagi) maxagi = index; where xfs_set_inode iterates over the (old) agcount in mp->m_sb.sb_agblocks, which has not yet been updated in the growfs path. So "index" will be returned based on the old agcount, not the new one, and new AGs are not available for inode allocation. Fix this by explicitly passing the proper AG count (which xfs_initialize_perag() already has) down another level, so that xfs_set_inode* can make the proper decision about acceptable AGs for inode allocation in the potentially newly-added AGs. This has been broken since 3.7, when these two xfs_set_inode* functions were added in commit `2d2194f`. Prior to that, we looped over "agcount" not sb_agblocks in these calculations. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 20:51:54 +10:00
Jie Liu	eb866bbf09	xfs: mark xfs_qm_quotacheck as static xfs_qm_quotacheck() is not used outside of xfs_qm.c. Mark it static and move it around in the file to avoid a forward declaration. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 20:49:57 +10:00
Mark Tinguely	5c18717ea2	xfs: fix cil push sequence after log recovery When the CIL checkpoint is fully written to the log, the LSN of the checkpoint commit record is written into the CIL context structure. This allows log force waiters to correctly detect when the checkpoint they are waiting on have been fully written into the log buffers. However, the initial context after mount is initialised with a non-zero commit LSN, so appears to waiters as though it is complete even though it may not have even been pushed, let alone written to the log buffers. Hence a log force immediately after a filesystem is mounted may not behave correctly, nor does commit record ordering if multiple CIL pushes interleave immediately after mount. To fix this, make sure the initial context commit LSN is not touched until the first checkpointis actually pushed. [dchinner: rewrite commit message] Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 20:49:40 +10:00
Vasily Averin	295dc39d94	fs: umount on symlink leaks mnt count Currently umount on symlink blocks following umount: /vz is separate mount # ls /vz/ -al \| grep test drwxr-xr-x. 2 root root 4096 Jul 19 01:14 testdir lrwxrwxrwx. 1 root root 11 Jul 19 01:16 testlink -> /vz/testdir # umount -l /vz/testlink umount: /vz/testlink: not mounted (expected) # lsof /vz # umount /vz umount: /vz: device is busy. (unexpected) In this case mountpoint_last() gets an extra refcount on path->mnt Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Ian Kent <raven@themaw.net> Acked-by: Jeff Layton <jlayton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-07-24 06:18:12 -04:00
Boaz Harrosh	6fcc5420bf	direct-io: fix uninitialized warning in do_direct_IO() The following warnings: fs/direct-io.c: In function ‘__blockdev_direct_IO’: fs/direct-io.c:1011:12: warning: ‘to’ may be used uninitialized in this function [-Wmaybe-uninitialized] fs/direct-io.c:913:16: note: ‘to’ was declared here fs/direct-io.c:1011:12: warning: ‘from’ may be used uninitialized in this function [-Wmaybe-uninitialized] fs/direct-io.c:913:10: note: ‘from’ was declared here are false positive because dio_get_page() either fails, or sets both 'from' and 'to'. Paul Bolle said ... Maybe it's better to move initializing "to" and "from" out of dio_get_page(). That _might_ make it easier for both the the reader and the compiler to understand what's going on. Something like this: Christoph Hellwig said ... The fix of moving the code definitively looks nicer, while I think uninitialized_var is horrible wart that won't get anywhere near my code. Boaz Harrosh: I agree with Christoph and Paul Signed-off-by: Boaz Harrosh <boaz@plexistor.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-07-24 06:17:07 -04:00
Brian Foster	f074051ff5	xfs: squash prealloc while over quota free space as well From: Brian Foster <bfoster@redhat.com> Commit `4d559a3b` introduced heavy prealloc. squashing to catch the case of requesting too large a prealloc on smaller filesystems, leading to repeated flush and retry cycles that occur on ENOSPC. Now that we issue eofblocks scans on EDQUOT/ENOSPC, squash the prealloc against the minimum available free space across all applicable quotas as well to avoid a similar problem of repeated eofblocks scans. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 19:56:08 +10:00
Brian Foster	dc06f398f0	xfs: run an eofblocks scan on ENOSPC/EDQUOT From: Brian Foster <bfoster@redhat.com> Speculative preallocation and and the associated throttling metrics assume we're working with large files on large filesystems. Users have reported inefficiencies in these mechanisms when we happen to be dealing with large files on smaller filesystems. This can occur because while prealloc throttling is aggressive under low free space conditions, it is not active until we reach 5% free space or less. For example, a 40GB filesystem has enough space for several files large enough to have multi-GB preallocations at any given time. If those files are slow growing, they might reserve preallocation for long periods of time as well as avoid the background scanner due to frequent modification. If a new file is written under these conditions, said file has no access to this already reserved space and premature ENOSPC is imminent. To handle this scenario, modify the buffered write ENOSPC handling and retry sequence to invoke an eofblocks scan. In the smaller filesystem scenario, the eofblocks scan resets the usage of preallocation such that when the 5% free space threshold is met, throttling effectively takes over to provide fair and efficient preallocation until legitimate ENOSPC. The eofblocks scan is selective based on the nature of the failure. For example, an EDQUOT failure in a particular quota will use a filtered scan for that quota. Because we don't know which quota might have caused an allocation failure at any given time, we include each applicable quota determined to be under low free space conditions in the scan. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 19:49:28 +10:00
Brian Foster	f452639792	xfs: support a union-based filter for eofblocks scans From: Brian Foster <bfoster@redhat.com> The eofblocks scan inode filter uses intersection logic by default. E.g., specifying both user and group quota ids filters out inodes that are not covered by both the specified user and group quotas. This is suitable for behavior exposed to userspace. Scans that are initiated from within the kernel might require more broad semantics, such as scanning all inodes under each quota associated with an inode to alleviate low free space conditions in each. Create the XFS_EOF_FLAGS_UNION flag to support a conditional union-based filtering algorithm for eofblocks scans. This flag is intentionally left out of the valid mask as it is not supported for scans initiated from userspace. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 19:44:28 +10:00
Brian Foster	5400da7dc0	xfs: add scan owner field to xfs_eofblocks From: Brian Foster <bfoster@redhat.com> The scan owner field represents an optional inode number that is responsible for the current scan. The purpose is to identify that an inode is under iolock and as such, the iolock shouldn't be attempted when trimming eofblocks. This is an internal only field. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 19:40:22 +10:00
Jie Liu	f3d1e58743	xfs: introduce xfs_bulkstat_grab_ichunk From: Jie Liu <jeff.liu@oracle.com> Introduce xfs_bulkstat_grab_ichunk() to look up an inode chunk in where the given inode resides, then grab the record. Update the data for the pointed-to record if the inode was not the last in the chunk and there are some left allocated, return the grabbed inode count on success. Refactor xfs_bulkstat() with it. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 18:42:21 +10:00
Jie Liu	4b8fdfecd8	xfs: introduce xfs_bulkstat_ichunk_ra From: Jie Liu <jeff.liu@oracle.com> Introduce xfs_bulkstat_ichunk_ra() to loop over all clusters in the next inode chunk, then performs readahead if there are any allocated inodes in that cluster. Refactor xfs_bulkstat() with it. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 18:41:18 +10:00
Jie Liu	d4c2734875	xfs: fix error handling at xfs_bulkstat From: Jie Liu <jeff.liu@oracle.com> We should not ignore the btree operation errors at xfs_bulkstat() but to propagate them if any. This patch fix two places in this function and the remaining things will be fixed with code refactoring thereafter. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 18:40:43 +10:00
Jie Liu	296dfd7fdb	xfs: remove redundant user buffer count checks at xfs_bulkstat From: Jie Liu <jeff.liu@oracle.com> Remove the redundant user buffer and count checks as it has already been validated at xfs_ioc_bulkstat(). Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 18:40:26 +10:00
Himangi Saraogi	08a0f24e4c	ceph: replace comma with a semicolon Replace a comma between expression statements by a semicolon. This changes the semantics of the code, but given the current indentation appears to be what is intended. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: // <smpl> @r@ expression e1,e2; @@ e1 -, +; e2; // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-07-24 12:04:46 +04:00
Jie Liu	c7cb51dcb0	xfs: fix error handling at xfs_inumbers From: Jie Liu <jeff.liu@oracle.com> To fetch the file system number tables, we currently just ignore the errors and proceed to loop over the next AG or bump agino to the next chunk in case of btree operations failed, that is not properly because those errors might hint us potential file system problems. This patch rework xfs_inumbers() to handle the btree operation errors as well as the loop conditions. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 12:18:47 +10:00
Jie Liu	549fa00679	xfs: consolidate xfs_inumbers From: Jie Liu <jeff.liu@oracle.com> Consolidate xfs_inumbers() to make the formatter function return correct error and make the source code looks a bit neat. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 12:11:47 +10:00
Christoph Hellwig	d716f8eedb	xfs: remove xfs_bulkstat_single From: Christoph Hellwig <hch@lst.de> xfs_bukstat_one doesn't have any failure case that would go away when called through xfs_bulkstat, so remove the fallback and the now unessecary xfs_bulkstat_single function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 12:07:15 +10:00
Jie Liu	8fe657760d	xfs: remove redundant stat assignment in xfs_bulkstat_one_int From: Jie Liu <jeff.liu@oracle.com> Remove the redundant BULKSTAT_RV_NOTHING assignment in case of call xfs_iget() failed at xfs_bulkstat_one_int(). Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 11:33:28 +10:00
Linus Torvalds	82e13c71bc	Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux Pull nfsd bugfix from Bruce Fields: "Another regression from the xdr encoding rewrite" * 'for-3.16' of git://linux-nfs.org/~bfields/linux: NFSD: Fix crash encoding lock reply on 32-bit	2014-07-23 17:55:11 -07:00
Hugh Dickins	4e66d445d0	simple_xattr: permit 0-size extended attributes If a filesystem uses simple_xattr to support user extended attributes, LTP setxattr01 and xfstests generic/062 fail with "Cannot allocate memory": simple_xattr_alloc()'s wrap-around test mistakenly excludes values of zero size. Fix that off-by-one (but apparently no filesystem needs them yet). Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-07-23 15:10:55 -07:00
Silesh C V	aed8adb768	coredump: fix the setting of PF_DUMPCORE Commit `079148b919` ("coredump: factor out the setting of PF_DUMPCORE") cleaned up the setting of PF_DUMPCORE by removing it from all the linux_binfmt->core_dump() and moving it to zap_threads().But this ended up clearing all the previously set flags. This causes issues during core generation when tsk->flags is checked again (eg. for PF_USED_MATH to dump floating point registers). Fix this. Signed-off-by: Silesh C V <svellattu@mvista.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: <stable@vger.kernel.org> [3.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-07-23 15:10:54 -07:00
Thomas Gleixner	5eaaed4fe2	fs: lockd: Use ktime_get_ns() Replace the ever recurring: ts = ktime_get_ts(); ns = timespec_to_ns(&ts); with ns = ktime_get_ns(); Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-07-23 15:01:44 -07:00
Jeff Layton	f9c00c3ab4	nfsd: Do not let nfs4_file pin the struct inode Remove the fi_inode field in struct nfs4_file in order to remove the possibility of struct nfs4_file pinning the inode when it does not have any open state. The only place we still need to get to an inode is in check_for_locks, so change it to use find_any_file and use the inode from any that it finds. If it doesn't find one, then just assume there aren't any. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 16:35:24 -04:00
Trond Myklebust	b07c54a4a3	nfsd: nfs4_check_fh - make it actually check the filehandle ...instead of just checking the inode that corresponds to it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 16:35:24 -04:00
Trond Myklebust	ca94321783	nfsd: Use the filehandle to look up the struct nfs4_file instead of inode This makes more sense anyway since an inode pointer value can change even when the filehandle doesn't. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 16:35:24 -04:00
Trond Myklebust	e2cf80d73f	nfsd: Store the filehandle with the struct nfs4_file For use when we may not have a struct inode. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 16:35:23 -04:00
Himangi Saraogi	fc8e5a644c	nfsd4: convert comma to semicolon Replace a comma between expression statements by a semicolon. This changes the semantics of the code, but given the current indentation appears to be what is intended. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: // <smpl> @r@ expression e1,e2; @@ e1 -, +; e2; // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 14:20:49 -04:00
Jeff Layton	2f6ce8e73c	nfsd: ensure that st_access_bmap and st_deny_bmap are initialized to 0 Open stateids must be initialized with the st_access_bmap and st_deny_bmap set to 0, so that nfs4_get_vfs_file can properly record their state in old_access_bmap and old_deny_bmap. This bug was introduced in commit `baeb4ff0e5` (nfsd: make deny mode enforcement more efficient and close races in it) and was causing the refcounts to end up incorrect when nfs4_get_vfs_file returned an error after bumping the refcounts. This made it impossible to unmount the underlying filesystem after running pynfs tests that involve deny modes. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 14:20:47 -04:00
Thomas Gleixner	57e0be041d	sched: Make task->real_start_time nanoseconds based Simplify the only user of this data by removing the timespec conversion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-07-23 10:18:05 -07:00
Thomas Gleixner	53cc7bad37	timerfd: Use ktime_mono_to_real() We have a few other use cases of ktime_get_monotonic_offset() which can be optimized with ktime_mono_to_real(). The timerfd code uses the offset only for comparison, so we can use ktime_mono_to_real(0) for this as well. Funny enough text size shrinks with that on ARM and x8664 !? Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-07-23 10:18:02 -07:00
Kinglong Mee	f98bac5a30	NFSD: Fix crash encoding lock reply on 32-bit Commit `8c7424cff6` "nfsd4: don't try to encode conflicting owner if low on space" forgot to free conf->data in nfsd4_encode_lockt and before sign conf->data to NULL in nfsd4_encode_lock_denied, causing a leak. Worse, kfree() can be called on an uninitialized pointer in the case of a succesful lock (or one that fails for a reason other than a conflict). (Note that lock->lk_denied.ld_owner.data appears it should be zero here, until you notice that it's one arm of a union the other arm of which is written to in the succesful case by the memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid, sizeof(stateid_t)); in nfsd4_lock(). In the 32-bit case this overwrites ld_owner.data.) Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: `8c7424cff6` ""nfsd4: don't try to encode conflicting owner if low on space" Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 10:31:56 -04:00
David Howells	633706a2ee	Merge branch 'keys-fixes' into keys-next Signed-off-by: David Howells <dhowells@redhat.com>	2014-07-22 21:55:45 +01:00
David Howells	f9167789df	KEYS: user: Use key preparsing Make use of key preparsing in user-defined and logon keys so that quota size determination can take place prior to keyring locking when a key is being added. Also the idmapper key types need to change to match as they use the user-defined key type routines. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Jeff Layton <jlayton@primarydata.com>	2014-07-22 21:46:17 +01:00
Jeff Layton	d55a166c96	nfsd: bump dl_time when unhashing delegation There's a potential race between a lease break and DELEGRETURN call. Suppose a lease break comes in and queues the workqueue job for a delegation, but it doesn't run just yet. Then, a DELEGRETURN comes in finds the delegation and calls destroy_delegation on it to unhash it and put its primary reference. Next, the workqueue job runs and queues the delegation back onto the del_recall_lru list, issues the CB_RECALL and puts the final reference. With that, the final reference to the delegation is put, but it's still on the LRU list. When we go to unhash a delegation, it's because we intend to get rid of it soon afterward, so we don't want lease breaks to mess with it once that occurs. Fix this by bumping the dl_time whenever we unhash a delegation, to ensure that lease breaks don't monkey with it. I believe this is a regression due to commit `02e1215f9f` (nfsd: Avoid taking state_lock while holding inode lock in nfsd_break_one_deleg). Prior to that, the state_lock was held in the lm_break callback itself, and that would have prevented this race. Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-22 15:34:47 -04:00
Andrew Gallagher	d7afaec0b5	fuse: add FUSE_NO_OPEN_SUPPORT flag to INIT Here some additional changes to set a capability flag so that clients can detect when it's appropriate to return -ENOSYS from open. This amends the following commit introduced in 3.14: `7678ac5061` fuse: support clients that don't implement 'open' However we can only add the flag to 3.15 and later since there was no protocol version update in 3.14. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <stable@vger.kernel.org> # v3.15+	2014-07-22 16:37:43 +02:00
Miklos Szeredi	a800bad366	fuse: s_time_gran fix Default s_time_gran is 1, don't overwrite that if userspace didn't explicitly specify one. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <stable@vger.kernel.org> # v3.15+	2014-07-22 16:37:42 +02:00
Benjamin LaHaise	be6fb451a2	aio: remove no longer needed preempt_disable() Based on feedback from Jens Axboe on `263782c1c9`, clean up get/put_reqs_available() to remove the no longer needed preempt_disable() and preempt_enable() pair. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: Jens Axboe <axboe@kernel.dk>	2014-07-22 09:56:56 -04:00
Trond Myklebust	72c0b0fb9f	nfsd: Move the delegation reference counter into the struct nfs4_stid We will want to add reference counting to the lock stateid and open stateids too in later patches. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-21 17:03:00 -04:00
Jeff Layton	417c6629b2	nfsd: fix race that grants unrecallable delegation If nfs4_setlease succesfully acquires a new delegation, then another task breaks the delegation before we reach hash_delegation_locked, then the breaking task will see an empty fi_delegations list and do nothing. The client will receive an open reply incorrectly granting a delegation and will never receive a recall. Move more of the delegation fields to be protected by the fi_lock. It's more granular than the state_lock and in later patches we'll want to be able to rely on it in addition to the state_lock. Attempt to acquire a delegation. If that succeeds, take the spinlocks and then check to see if the file has had a conflict show up since then. If it has, then we assume that the lease is no longer valid and that we shouldn't hand out a delegation. There's also one more potential (but very unlikely) problem. If the lease is broken before the delegation is hashed, then it could leak. In the event that the fi_delegations list is empty, reset the fl_break_time to jiffies so that it's cleaned up ASAP by the normal lease handling code. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-21 16:31:17 -04:00
Greg Kroah-Hartman	90125edbc4	Merge 3.16-rc6 into driver-core-next We want the platform changes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-07-21 10:07:25 -07:00
J. Bruce Fields	57a3714421	nfsd4: CREATE_SESSION should update backchannel immediately nfsd4_probe_callback kicks off some work that will eventually run nfsd4_process_cb_update and update the session flags. In theory we could process a following SEQUENCE call before that update happens resulting in flags that don't accurately represent, for example, the lack of a backchannel. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-21 12:30:50 -04:00
Brian Norris	d0d5864676	Linux 3.16-rc6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJTzJFGAAoJEHm+PkMAQRiGNzQH/087gQch5K+A2HKvPzjUXq57 G82DJHLONMMq8+NY3Vqhp8g2V8zRbXGJEvMJMsyuscO37Vo7ADcrYo8lqY9w5bIl h+Zarhkqz0rqRs2SfMMIVzdd2W7MzL+lqj3GplGPxHztw0+qk7PRKILx6eRppGaH JaD4NfkD5+1vfve/2d1ze9D5pCiw6PFNzjesKZxScQhNhIyLdRamfSTY4r9XeURo CxpwjphEYfvAcgc39mwzEHPHyKSqULu0By6R8FXQpJ9QjVtzcGEiF+cPqGncpZOR 5ZSyU5e1CpBl9w8o6Lm9ewXmaCSnBU/VFrOwWvZrXfokZedXBOz7KdShU93XFjU= =0VJM -----END PGP SIGNATURE----- Merge tag 'v3.16-rc6' into MTD development branch Linux 3.16-rc6	2014-07-21 00:01:16 -07:00
Linus Torvalds	da83fc6e0f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "We have two more fixes in my for-linus branch. I was hoping to also include a fix for a btrfs deadlock with compression enabled, but we're still nailing that one down" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: test for valid bdev before kobj removal in btrfs_rm_device Btrfs: fix abnormal long waiting in fsync	2014-07-20 20:21:05 -07:00
Linus Torvalds	90d51d5606	NFS client fixes for Linux 3.16 Highlights include; - Stable fix for an NFSv3 posix ACL regression - Multiple fixes for regressions to the NFS generic read/write code - Fix page splitting bugs that come into play when a small rsize/wsize read/write needs to be sent again (due to error conditions or page redirty). - Fix nfs_wb_page_cancel, which is called by the "invalidatepage" method - Fix 2 compile warnings about unused variables. - Fix a performance issue affecting unstable writes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTys5JAAoJEGcL54qWCgDygakP/1JOfQZzyHNRml5aDVRLtfp/ QQpn8ne3jjEov+0BhzSxqkpHP+8fCcF0UgD5asEnbM83ruoB8EUHErXdq7kRkAYf COpDZAww4sPXUszG/xGqIED503DKbf69Ds5pQMh8g71hfQpw04UmMQYbnRNBP2bI Z5eu0Xiwaf3MVRaxHMXVy9sl7in/cQBvrXnKLTIYOLA0U5bdCI9JWT5+qbOUCC/3 YuYm5EjM+OMyhvEWyVDXFp9kmw0vtBS8FwAXYKBjwfJLNl8dGuERWKlFDPNOxdpZ QrOBhUH73d2tgvTHzUg/RRtpA6mKOfznU3SQK3muP188/2/sbb4y/RChwv+Ignat YqWFgbDTz1idKvjj+Vzcd7eL3HO9Kono/YkAG8i5xmOlSeenzDra1lDAbuB+zNzd oLVY1AJp8+13c74gaCDurxJ7fq6Fth97eqxdo8fdQH8Mn6m6qRm2V57rOo59eFQS bX6EN4ja8ashrKprCvlhXDGJmQKZWJMEGoTXJCj5w+HBklONV6jdbyzGFKT+Zmhg IP+USsaLJDswZNpdWq8Zb9RthfFAURKEQEemwV5eLQNtTa+xQ22ZOF2ycrbBqsED etCCm8gyNukl2RCAxGfLrtpBytlcNMmcRKGktKqO1SAAua3IP33wGD7zql8rwGCZ m9XMXUkVKiHoErl3jZ4T =qklz -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.16-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client fixes from Trond Myklebust: "Apologies for the relative lateness of this pull request, however the commits fix some issues with the NFS read/write code updates in 3.16-rc1 that can cause serious Oopsing when using small r/wsize. The delay was mainly due to extra testing to make sure that the fixes behave correctly. Highlights include; - Stable fix for an NFSv3 posix ACL regression - Multiple fixes for regressions to the NFS generic read/write code: - Fix page splitting bugs that come into play when a small rsize/wsize read/write needs to be sent again (due to error conditions or page redirty) - Fix nfs_wb_page_cancel, which is called by the "invalidatepage" method - Fix 2 compile warnings about unused variables - Fix a performance issue affecting unstable writes" * tag 'nfs-for-3.16-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Don't reset pg_moreio in __nfs_pageio_add_request NFS: Remove 2 unused variables nfs: handle multiple reqs in nfs_wb_page_cancel nfs: handle multiple reqs in nfs_page_async_flush nfs: change find_request to find_head_request nfs: nfs_page should take a ref on the head req nfs: mark nfs_page reqs with flag for extra ref nfs: only show Posix ACLs in listxattr if actually present	2014-07-20 19:55:44 -07:00
Yan, Zheng	d0d0db2268	ceph: check zero length in ceph_sync_read() Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-07-21 10:17:05 +08:00
Eric Sandeen	0bfaa9c5cb	btrfs: test for valid bdev before kobj removal in btrfs_rm_device commit `99994cd` btrfs: dev delete should remove sysfs entry added a btrfs_kobj_rm_device, which dereferences device->bdev... right after we check whether device->bdev might be NULL. I don't honestly know if it's possible to have a NULL device->bdev here, but assuming that it is (given the test), we need to move the kobject removal to be under that test. (Coverity spotted this) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-07-19 11:49:44 -07:00
Liu Bo	98ce2deda2	Btrfs: fix abnormal long waiting in fsync xfstests generic/127 detected this problem. With commit `7fc34a62ca`, now fsync will only flush data within the passed range. This is the cause of the above problem, -- btrfs's fsync has a stage called 'sync log' which will wait for all the ordered extents it've recorded to finish. In xfstests/generic/127, with mixed operations such as truncate, fallocate, punch hole, and mapwrite, we get some pre-allocated extents, and mapwrite will mmap, and then msync. And I find that msync will wait for quite a long time (about 20s in my case), thanks to ftrace, it turns out that the previous fallocate calls 'btrfs_wait_ordered_range()' to flush dirty pages, but as the range of dirty pages may be larger than 'btrfs_wait_ordered_range()' wants, there can be some ordered extents created but not getting corresponding pages flushed, then they're left in memory until we fsync which runs into the stage 'sync log', and fsync will just wait for the system writeback thread to flush those pages and get ordered extents finished, so the latency is inevitable. This adds a flush similar to btrfs_start_ordered_extent() in btrfs_wait_logged_extents() to fix that. Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-07-19 11:49:44 -07:00
Artem Bityutskiy	545f7fdf6d	UBIFS: add a log overlap assertion Add an assertion which checkes that the head of the log never overlaps with the tail of the log. Suggested-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:55:19 +03:00
Artem Bityutskiy	f1cb705acc	UBIFS: remove unnecessary check Remove the "if (c->lhead_offs == 0)" check because is unnecessary, since at that point the log head offset is guaranteed to be zero due to the previous operation. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:54:25 +03:00
Artem Bityutskiy	07e19dff63	UBIFS: remove mst_mutex The 'mst_mutex' is not needed since because 'ubifs_write_master()' is only called on the mount path and commit path. The mount path is sequential and there is no parallelism, and the commit path is also serialized - there is only one commit going on at a time. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:52 +03:00
Fabian Frederick	39274a1e8d	UBIFS: kernel-doc warning fix s/data/timer Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:52 +03:00
Fabian Frederick	d4eb08ff0a	UBIFS: replace seq_printf by seq_puts Fix checkpatch warnings: "WARNING: Prefer seq_puts to seq_printf" Andrew Morton wrote: " - puts is presumably faster - puts doesn't go rogue if you accidentally pass it a "%". - this patch actually made fs/ubifs/super.o 12 bytes smaller. Perhaps because seq_printf() is a varargs function, forcing the caller to pass args on the stack instead of in registers. " Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:52 +03:00
Fabian Frederick	86b4c14de3	UBIFS: replace countsize kzalloc by kcalloc kcalloc manages countsizeof overflow. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:52 +03:00
Fabian Frederick	ef13f01828	UBIFS: kernel-doc warning fix No grouped argument in drop_last_node. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:52 +03:00
hujianyang	6dcfb80264	UBIFS: fix error path in create_default_filesystem() In the end of 'create_default_filesystem()' we need to check the return value of 'ubifs_write_node()' to ensure that we have successfully written the 'cs_node'. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:52 +03:00
Artem Bityutskiy	f2b6521aa1	UBIFS: fix spelling of "scanned" Randy Dunlap pointed that we should use "scanned" instead of "scaned". This patch makes the correction. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:51 +03:00
Seunghun Lee	d685c41215	UBIFS: fix some comments This patch fixes some comments about return type. Signed-off-by: Seunghun Lee <waydi1@gmail.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:51 +03:00
hujianyang	c7b5bb0beb	UBIFS: remove useless @ecc in struct ubifs_scan_leb We set @ecc in ubifs_scan_leb only if leb_read returns EBADMSG and do not use it any more. This patch removes this variable and adds comments about EBADMSG handling. Artem: re-phrase commentaries Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:51 +03:00
hujianyang	b793a8c888	UBIFS: remove useless statements This patch removes useless and duplicate statements. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:51 +03:00
hujianyang	ce6ebdb87e	UBIFS: Add missing break statements in dbg_chk_pnode() This is a minor fix. These two branches in 'dbg_chk_pnode()' are dealing with different conditions. Although there is no fault in current state, I think adding "break"s in each end of branch is better. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:51 +03:00
hujianyang	5a95741a57	UBIFS: fix error handling in dump_lpt_leb() This patch checks the return value of 'ubifs_unpack_nnode()'. If this function returns an error, 'nnode' may not be initialized, so just print an error message and break. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:51 +03:00
Kees Cook	c2e1f2e30d	seccomp: implement SECCOMP_FILTER_FLAG_TSYNC Applying restrictive seccomp filter programs to large or diverse codebases often requires handling threads which may be started early in the process lifetime (e.g., by code that is linked in). While it is possible to apply permissive programs prior to process start up, it is difficult to further restrict the kernel ABI to those threads after that point. This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for synchronizing thread group seccomp filters at filter installation time. When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, filter) an attempt will be made to synchronize all threads in current's threadgroup to its new seccomp filter program. This is possible iff all threads are using a filter that is an ancestor to the filter current is attempting to synchronize to. NULL filters (where the task is running as SECCOMP_MODE_NONE) are also treated as ancestors allowing threads to be transitioned into SECCOMP_MODE_FILTER. If prctrl(PR_SET_NO_NEW_PRIVS, ...) has been set on the calling thread, no_new_privs will be set for all synchronized threads too. On success, 0 is returned. On failure, the pid of one of the failing threads will be returned and no filters will have been applied. The race conditions against another thread are: - requesting TSYNC (already handled by sighand lock) - performing a clone (already handled by sighand lock) - changing its filter (already handled by sighand lock) - calling exec (handled by cred_guard_mutex) The clone case is assisted by the fact that new threads will have their seccomp state duplicated from their parent before appearing on the tasklist. Holding cred_guard_mutex means that seccomp filters cannot be assigned while in the middle of another thread's exec (potentially bypassing no_new_privs or similar). The call to de_thread() may kill threads waiting for the mutex. Changes across threads to the filter pointer includes a barrier. Based on patches by Will Drewry. Suggested-by: Julien Tinnes <jln@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>	2014-07-18 12:13:40 -07:00
Kees Cook	1d4457f999	sched: move no_new_privs into new atomic flags Since seccomp transitions between threads requires updates to the no_new_privs flag to be atomic, the flag must be part of an atomic flag set. This moves the nnp flag into a separate task field, and introduces accessors. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>	2014-07-18 12:13:38 -07:00
Linus Torvalds	f839719122	This patch set contains two minor docs/spelling fixes, some fixes for flock, a change to use GFP_NOFS to avoid recursion on a rarely used code path and a fix for a race relating to the glock lru. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) iQIcBAABAgAGBQJTyPZQAAoJEMrg3m4a/8jSFBEQAKSnJQUP9MSxVwNBrgOiybXW kQd8RYs7cdt33i97C3Im9xSVktPz4HKTvuwHyvNV1oyWScfWSyqCgC//cU+/zlYV wJDZWIASNoQheY6UfxR6TeBPZo9Hgq7RQRGj4h1ttag9+b8Zz9aV5TCxcoh28ULF 629TyNwg4xdiEKX2xZusDwGCoHn5f5l9pAa5MyPrcyPzn1lOJP1lz++Lci2nqC4g DvA/KzQzDLQ2lKXdSd95avwQxnHqmeCTvClPmK9GgONrt66tqq6CcCLB1jPRE7/O J7f0VWy/PEeo8ot+9siiA380EvM6hWvJx5Fuen/Qb9dQ5sgsJMkvgbqlHK6zB/i3 Je6Qq+aVPz3qktmXdyEagpXfZAQAxy0PUWezQBQH8HIlhwKMGC1QaFgMoAFIks1Y S38IBHCwlymytWYdVaRhyUOnlzzaSyeYROzs7hZoxRRUilge5rPkrqtv4HWLSRtZ rGFEid181+qTO2TyoiMRY2oR3U0PHfbE9Dhv5Pu9caTl55kj9eAGwvqnOn6IpyvF eiUoWOnDYFO+8sxFKPYFndglEZx0zBU6B/7axyQ3qam3BojTJwKh+2+4TqauM0zo 4ehwJEzVmV21sbyMfUHCKTQEkW8OjQ+EkxAEmGhp4IODNwZ3vPfFBdhFi3fBipqO WhDmeDmOddb9cCoQG8WZ =VTve -----END PGP SIGNATURE----- Merge tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes Pull gfs2 fixes from Steven Whitehouse: "This patch set contains two minor docs/spelling fixes, some fixes for flock, a change to use GFP_NOFS to avoid recursion on a rarely used code path and a fix for a race relating to the glock lru" * tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes: GFS2: fs/gfs2/rgrp.c: kernel-doc warning fixes GFS2: memcontrol: Spelling s/invlidate/invalidate/ GFS2: Allow caching of glocks for flock GFS2: Allow flocks to use normal glock dq rather than dq_wait GFS2: replace count*size kzalloc by kcalloc GFS2: Use GFP_NOFS when allocating glocks GFS2: Fix race in glock lru glock disposal GFS2: Only wait for demote when last holder is dequeued	2014-07-18 06:26:04 -10:00
Linus Torvalds	847f56eb0e	xfs: fixes for 3.15-rc5 Fixes for low memory perforamnce regressions and a quota inode handling regression. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJTyFtkAAoJEK3oKUf0dfodAioP/0nsIq3be9zch0/Jjf9aduON 1aMaUE/9p4g/zu0f96ld3GI5/guRVllZ/0qJFyYAnAgOGs1hGARQfOuNnHBjgdLZ D261JbT9z8d8inQ7BMSc3EBBQ2CAZsAwRmg6UbeaWBE4hjlJ03RGWBE/aMMh10Wh t2fWUoeYWZWLi+Gfa+lPpnTESH7nBK5cW+daC16I1fU9Z/RcXAl+pCN2s5Ls6h7G 1mQNfhAuII+/ydB7UWXL6SQ7/sDvDwNvedMLljwg6oYu/riSG2kYPZKW6O9qwL6T z9onPg5lEQMjWlbl6qzNo6OT3pAs37vssG0zVw2ZS8GjZ84edRzw2PGctJAFu2Hh sWqmtYGNKBjtjnxJ4zlRfeBkwpHbZGlLyOwzKoDlyQ8j9KZ8v8lsKyEpJK6/XJNG 1rMJZV5twu+xvZUwf0zkg0tuxoT/3T3kbIHsFkEaJmQW7jrxTvdybW/rp6KHSbcb rzCpuZ5Ghh9qss+EeCv3k2nnWjysDP4kSwQMZ0zCedzvDTmga2TMw//MUwaM+i7M D7Raq4Qcs2updrFk2j9OyML2hi49KuPTtEu2OC7ObfxvBsZgSbTvyw1Vq/rsiDM5 FZMV/giKRoCFpRpp7xF+db0zkBC2xDU9tGz196dzGtg7rvp6Z5401mS8fAr9H/LJ D2Wf2OXx3oss9v4rrO7N =rnN8 -----END PGP SIGNATURE----- Merge tag 'xfs-for-linus-3.16-rc5' of git://oss.sgi.com/xfs/xfs Pull xfs fixes from Dave Chinner: "Fixes for low memory perforamnce regressions and a quota inode handling regression. These are regression fixes for issues recently introduced - the change in the stack switch location is fairly important, so I've held off sending this update until I was sure that it still addresses the stack usage problem the original solved. So while the commits in the xfs tree are recent, it has been under tested for several weeks now" * tag 'xfs-for-linus-3.16-rc5' of git://oss.sgi.com/xfs/xfs: xfs: null unused quota inodes when quota is on xfs: refine the allocation stack switch Revert "xfs: block allocation work needs to be kswapd aware"	2014-07-18 06:21:43 -10:00
Chuck Lever	3c45ddf823	svcrdma: Select NFSv4.1 backchannel transport based on forward channel The current code always selects XPRT_TRANSPORT_BC_TCP for the back channel, even when the forward channel was not TCP (eg, RDMA). When a 4.1 mount is attempted with RDMA, the server panics in the TCP BC code when trying to send CB_NULL. Instead, construct the transport protocol number from the forward channel transport or'd with XPRT_TRANSPORT_BC. Transports that do not support bi-directional RPC will not have registered a "BC" transport, causing create_backchannel_client() to fail immediately. Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-18 11:35:45 -04:00
Fabian Frederick	27ff6a0f7f	GFS2: fs/gfs2/rgrp.c: kernel-doc warning fixes Cc: cluster-devel@redhat.com Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:15:14 +01:00
Geert Uytterhoeven	6b49d1d9c3	GFS2: memcontrol: Spelling s/invlidate/invalidate/ Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: cluster-devel@redhat.com Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:14:31 +01:00
Bob Peterson	97a4f1d765	GFS2: Allow caching of glocks for flock This patch removes the GLF_NOCACHE flag from the glocks associated with flocks. There should be no good reason not to cache glocks for flocks: they only force the glock to be demoted before they can be reacquired, which can slow down performance and even cause glock hangs, especially in cases where the flocks are held in Shared (SH) mode. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:14:12 +01:00
Bob Peterson	5bef3e7cf1	GFS2: Allow flocks to use normal glock dq rather than dq_wait This patch allows flock glocks to use a non-blocking dequeue rather than dq_wait. It also reverts the previous patch I had posted regarding dq_wait. The reverted patch isn't necessarily a bad idea, but I decided this might avoid unforeseen side effects, and was therefore safer. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:13:56 +01:00
Fabian Frederick	6ec43b1838	GFS2: replace countsize kzalloc by kcalloc kcalloc manages countsizeof overflow. Cc: cluster-devel@redhat.com Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:13:38 +01:00
Steven Whitehouse	fe0bbd2986	GFS2: Use GFP_NOFS when allocating glocks Normally GFP_KERNEL is ok here, but there is now a rarely used code path relating to deallocation of unlinked inodes (in certain corner cases) which if hit at times of memory shortage can cause recursion while trying to free memory. One solution would be to try and move the gfs2_glock_get() call so that it is no longer called while another glock is held, but that doesn't look at all easy, so GFP_NOFS is the best solution for the time being. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:13:12 +01:00
Steven Whitehouse	94a09a3999	GFS2: Fix race in glock lru glock disposal We must not leave items on the LRU list with GLF_LOCK set, since they can be removed if the glock is brought back into use, which may then potentially result in a hang, waiting for GLF_LOCK to clear. It doesn't happen very often, since it requires a glock that has not been used for a long time to be brought back into use at the same moment that the shrinker is part way through disposing of glocks. The fix is to set GLF_LOCK at a later time, when we already know that the other locks can be obtained. Also, we now only release the lru_lock in case a resched is needed, rather than on every iteration. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:12:51 +01:00
Bob Peterson	79272b3562	GFS2: Only wait for demote when last holder is dequeued Function gfs2_glock_dq_wait is supposed to dequeue a glock and then wait for the lock to be demoted. The problem is, if this is a shared lock, its demote will depend on the other holders, which means you might end up waiting forever because the other process is blocked. This problem is especially apparent when dealing with nested flocks. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:12:14 +01:00
Cyrill Gorcunov	5442e9fbd7	timerfd: Implement timerfd_ioctl method to restore timerfd_ctx::ticks, v3 The read() of timerfd files allows to fetch the number of timer ticks while there is no way to set it back from userspace. To restore the timer's state as it was at checkpoint moment we need a path to bring @ticks back. Initially I thought about writing ticks back via write() interface but it seems such API is somehow obscure. Instead implement timerfd_ioctl() method with TFD_IOC_SET_TICKS command which allows to adjust @ticks into non-zero value waking up the waiters. I wrapped code with CONFIG_CHECKPOINT_RESTORE which can be dropped off if there users except c/r camp appear. v2 (by akpm@): - Use define timerfd_ioctl NULL for non c/r config v3: - Use copy_from_user for @ticks fetching since not all arch support get_user for 8 byte argument Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christopher Covington <cov@codeaurora.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Link: http://lkml.kernel.org/r/20140715215703.285617923@openvz.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-07-18 11:49:57 +02:00
Cyrill Gorcunov	af9c4957cf	timerfd: Implement show_fdinfo method For checkpoint/restore of timerfd files we need to know how exactly the timer were armed, to be able to recreate it on restore stage. Thus implement show_fdinfo method which provides enough information for that. One of significant changes I think is the addition of @settime_flags member. Currently there are two flags TFD_TIMER_ABSTIME and TFD_TIMER_CANCEL_ON_SET, and the second can be found from @might_cancel variable but in case if the flags will be extended in future we most probably will have to somehow remember them explicitly anyway so I guss doing that right now won't hurt. To not bloat the timerfd_ctx structure I've converted @expired to short integer and defined @settime_flags as short too. v2 (by avagin@, vdavydov@ and tglx@): - Add it_value/it_interval fields - Save flags being used in timerfd_setup in context v3 (by tglx@): - don't forget to use CONFIG_PROC_FS v4 (by akpm@): -Use define timerfd_show NULL for non c/r config Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Link: http://lkml.kernel.org/r/20140715215703.114365649@openvz.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-07-18 11:49:57 +02:00
J. Bruce Fields	5d6031ca74	nfsd4: zero op arguments beyond the 8th compound op The first 8 ops of the compound are zeroed since they're a part of the argument that's zeroed by the memset(rqstp->rq_argp, 0, procp->pc_argsize); in svc_process_common(). But we handle larger compounds by allocating the memory on the fly in nfsd4_decode_compound(). Other than code recently fixed by `01529e3f81` "NFSD: Fix memory leak in encoding denied lock", I don't know of any examples of code depending on this initialization. But it definitely seems possible, and I'd rather be safe. Compounds this long are unusual so I'm much more worried about failure in this poorly tested cases than about an insignificant performance hit. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-17 16:20:39 -04:00
Jeff Layton	ae4b884fc6	nfsd: silence sparse warning about accessing credentials sparse says: fs/nfsd/auth.c:31:38: warning: incorrect type in argument 1 (different address spaces) fs/nfsd/auth.c:31:38: expected struct cred const cred fs/nfsd/auth.c:31:38: got struct cred const [noderef] <asn:4>real_cred Add a new accessor for the ->real_cred and use that to fetch the pointer. Accessing current->real_cred directly is actually quite safe since we know that they can't go away so this is mostly a cosmetic fixup to silence sparse. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-17 16:15:35 -04:00
David Howells	0c7774abb4	KEYS: Allow special keys (eg. DNS results) to be invalidated by CAP_SYS_ADMIN Special kernel keys, such as those used to hold DNS results for AFS, CIFS and NFS and those used to hold idmapper results for NFS, used to be 'invalidateable' with key_revoke(). However, since the default permissions for keys were reduced: Commit: `96b5c8fea6` KEYS: Reduce initial permissions on keys it has become impossible to do this. Add a key flag (KEY_FLAG_ROOT_CAN_INVAL) that will permit a key to be invalidated by root. This should not be used for system keyrings as the garbage collector will try and remove any invalidate key. For system keyrings, KEY_FLAG_ROOT_CAN_CLEAR can be used instead. After this, from userspace, keyctl_invalidate() and "keyctl invalidate" can be used by any possessor of CAP_SYS_ADMIN (typically root) to invalidate DNS and idmapper keys. Invalidated keys are immediately garbage collected and will be immediately rerequested if needed again. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Steve Dickson <steved@redhat.com>	2014-07-17 20:45:08 +01:00
Trond Myklebust	b0fc29d6fc	nfsd: Ensure stateids remain unique until they are freed Add an extra delegation state to allow the stateid to remain in the idr tree until the last reference has been released. This will be necessary to ensure uniqueness once the client_mutex is removed. [jlayton: reset the sc_type under the state_lock in unhash_delegation] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-16 21:39:51 -04:00
Jeff Layton	d564fbec7a	nfsd: nfs4_alloc_init_lease should take a nfs4_file arg No need to pass the delegation pointer in here as it's only used to get the nfs4_file pointer. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-16 21:35:25 -04:00
Jeff Layton	02e1215f9f	nfsd: Avoid taking state_lock while holding inode lock in nfsd_break_one_deleg state_lock is a heavily contended global lock. We don't want to grab that while simultaneously holding the inode->i_lock. Add a new per-nfs4_file lock that we can use to protect the per-nfs4_file delegation list. Hold that while walking the list in the break_deleg callback and queue the workqueue job for each one. The workqueue job can then take the state_lock and do the list manipulations without the i_lock being held prior to starting the rpc call. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-16 21:06:12 -04:00
Jeff Layton	e8051c837b	nfsd: eliminate nfsd4_init_callback It's just an obfuscated INIT_WORK call. Just make the work_func_t a non-static symbol and use a normal INIT_WORK call. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-16 14:18:58 -04:00
NeilBrown	c1221321b7	sched: Allow wait_on_bit_action() functions to support a timeout It is currently not possible for various wait_on_bit functions to implement a timeout. While the "action" function that is called to do the waiting could certainly use schedule_timeout(), there is no way to carry forward the remaining timeout after a false wake-up. As false-wakeups a clearly possible at least due to possible hash collisions in bit_waitqueue(), this is a real problem. The 'action' function is currently passed a pointer to the word containing the bit being waited on. No current action functions use this pointer. So changing it to something else will be a little noisy but will have no immediate effect. This patch changes the 'action' function to take a pointer to the "struct wait_bit_key", which contains a pointer to the word containing the bit so nothing is really lost. It also adds a 'private' field to "struct wait_bit_key", which is initialized to zero. An action function can now implement a timeout with something like static int timed_out_waiter(struct wait_bit_key key) { unsigned long waited; if (key->private == 0) { key->private = jiffies; if (key->private == 0) key->private -= 1; } waited = jiffies - key->private; if (waited > 10 HZ) return -EAGAIN; schedule_timeout(waited - 10 * HZ); return 0; } If any other need for context in a waiter were found it would be easy to use ->private for some other purpose, or even extend "struct wait_bit_key". My particular need is to support timeouts in nfs_release_page() to avoid deadlocks with loopback mounted NFS. While wait_on_bit_timeout() would be a cleaner interface, it will not meet my need. I need the timeout to be sensitive to the state of the connection with the server, which could change. So I need to use an 'action' interface. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: David Howells <dhowells@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051604.28027.41257.stgit@notabene.brown Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-16 15:10:41 +02:00
NeilBrown	743162013d	sched: Remove proliferation of wait_on_bit() action functions The current "wait_on_bit" interface requires an 'action' function to be provided which does the actual waiting. There are over 20 such functions, many of them identical. Most cases can be satisfied by one of just two functions, one which uses io_schedule() and one which just uses schedule(). So: Rename wait_on_bit and wait_on_bit_lock to wait_on_bit_action and wait_on_bit_lock_action to make it explicit that they need an action function. Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io which are not given an action function but implicitly use a standard one. The decision to error-out if a signal is pending is now made based on the 'mode' argument rather than being encoded in the action function. All instances of the old wait_on_bit and wait_on_bit_lock which can use the new version have been changed accordingly and their action functions have been discarded. wait_on_bit{_lock} does not return any specific error code in the event of a signal so the caller must check for non-zero and interpolate their own error code as appropriate. The wait_on_bit() call in __fscache_wait_on_invalidate() was ambiguous as it specified TASK_UNINTERRUPTIBLE but used fscache_wait_bit_interruptible as an action function. David Howells confirms this should be uniformly "uninterruptible" The main remaining user of wait_on_bit{,_lock}_action is NFS which needs to use a freezer-aware schedule() call. A comment in fs/gfs2/glock.c notes that having multiple 'action' functions is useful as they display differently in the 'wchan' field of 'ps'. (and /proc/$PID/wchan). As the new bit_wait{,_io} functions are tagged "__sched", they will not show up at all, but something higher in the stack. So the distinction will still be visible, only with different function names (gds2_glock_wait versus gfs2_glock_dq_wait in the gfs2/glock.c case). Since first version of this patch (against 3.15) two new action functions appeared, on in NFS and one in CIFS. CIFS also now uses an action function that makes the same freezer aware schedule call as NFS. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: David Howells <dhowells@redhat.com> (fscache, keys) Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2) Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-16 15:10:39 +02:00
Linus Torvalds	c20ddc6499	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota fix from Jan Kara: "Fix locking of dquot shrinker" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: missing lock in dqcache_shrink_scan()	2014-07-15 17:47:42 -10:00
Chao Yu	f1121ab0ba	f2fs: reduce searching region of segmap when free section In __set_test_and_free we will check whether all segment are free in one section When free one segment, in order to set section to free status. But the searching region of segmap is from start segno to last segno of f2fs, it's not necessary. So let's just only check all segment bitmap of target section. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-15 13:56:49 -07:00
Chao Yu	3f1be4f9c9	udf: avoid redundant memcpy when writing data in ICB Valid data within i_size in page cache will be copied to ICB cache when we writeback the page by invoking udf_adinicb_writepage, so the copy in udf_adinicb_write_end is redundant. After we remove the copy, it's better to use simple_write_end directly in udf_adinicb_aops instead of udf_adinicb_write_end. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jan Kara <jack@suse.cz>	2014-07-15 22:40:24 +02:00
Andy Shevchenko	c7ff48212d	fs/udf: re-use hex_asc_upper_{hi,lo} macros This patch cleans up udf_translate_to_linux() a bit by using globally defined macros instead of custom code. We can use sprintf(buf, "%04X", ...) there as well, but this one faster. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz>	2014-07-15 22:40:24 +02:00
Fabian Frederick	2c15ac5bdb	fs/quota: kernel-doc warning fixes type and id were removed and qid added to quota_send_warning in commit `431f19744d` ("userns: Convert quota netlink aka quota_send_warning") Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>	2014-07-15 22:40:23 +02:00
Fabian Frederick	e973606cc2	udf: use linux/uaccess.h Fix checkpatch warning WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>	2014-07-15 22:40:23 +02:00
Himangi Saraogi	e625b310ed	fs/ext2/super.c: Drop memory allocation cast Drop cast on the result of kmem_cache_alloc. The semantic patch that makes this change is as follows: // <smpl> @@ type T; @@ - (T *) ($kmalloc\\|kzalloc\\|kcalloc\\|kmem_cache_alloc\\|kmem_cache_zalloc\\| kmem_cache_alloc_node\\|kmalloc_node\\|kzalloc_node$(...)) // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Jan Kara <jack@suse.cz>	2014-07-15 22:40:22 +02:00
Niu Yawei	b9ba6f94b2	quota: remove dqptr_sem Remove dqptr_sem to make quota code scalable: Remove the dqptr_sem, accessing inode->i_dquot now protected by dquot_srcu, and changing inode->i_dquot is now serialized by dq_data_lock. Signed-off-by: Lai Siyao <lai.siyao@intel.com> Signed-off-by: Niu Yawei <yawei.niu@intel.com> Signed-off-by: Jan Kara <jack@suse.cz>	2014-07-15 22:40:22 +02:00
Niu Yawei	9eb6463f31	quota: simplify remove_inode_dquot_ref() Simplify the remove_inode_dquot_ref() to make it more obvious that now we keep one reference for each dquot from inodes. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Niu Yawei <yawei.niu@intel.com> Signed-off-by: Jan Kara <jack@suse.cz>	2014-07-15 22:40:21 +02:00
Niu Yawei	1ea06bec78	quota: avoid unnecessary dqget()/dqput() calls Avoid unnecessary dqget()/dqput() calls in __dquot_initialize(), that will introduce global lock contention otherwise. Signed-off-by: Lai Siyao <lai.siyao@intel.com> Signed-off-by: Niu Yawei <yawei.niu@intel.com> Signed-off-by: Jan Kara <jack@suse.cz>	2014-07-15 22:40:21 +02:00
Niu Yawei	606cdcca04	quota: protect Q_GETFMT by dqonoff_mutex dqptr_sem will go away. Protect Q_GETFMT quotactl by dqonoff_mutex instead. This is also enough to make sure quota info will not go away while we are looking at it. Signed-off-by: Lai Siyao <lai.siyao@intel.com> Signed-off-by: Niu Yawei <yawei.niu@intel.com> Signed-off-by: Jan Kara <jack@suse.cz>	2014-07-15 22:40:20 +02:00
Niu Yawei	d68aab6b8f	quota: missing lock in dqcache_shrink_scan() Commit `1ab6c4997e` (fs: convert fs shrinkers to new scan/count API) accidentally removed locking from quota shrinker. Fix it - dqcache_shrink_scan() should use dq_list_lock to protect the scan on free_dquots list. CC: stable@vger.kernel.org Fixes: `1ab6c4997e` Signed-off-by: Niu Yawei <yawei.niu@intel.com> Signed-off-by: Jan Kara <jack@suse.cz>	2014-07-15 22:36:18 +02:00
Linus Torvalds	0b632204c7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "This contains miscellaneous fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: replace count*size kzalloc by kcalloc fuse: release temporary page if fuse_writepage_locked() failed fuse: restructure ->rename2() fuse: avoid scheduling while atomic fuse: handle large user and group ID fuse: inode: drop cast fuse: ignore entry-timeout on LOOKUP_REVAL fuse: timeout comparison fix	2014-07-15 08:57:17 -07:00
Zheng Liu	83447ccb4d	ext4: make ext4_has_inline_data() as a inline function Now ext4_has_inline_data() is used in wide spread codepaths. So we need to make it as a inline function to avoid burning some CPU cycles. Change in text size: text data bss dec hex filename before: 326110 19258 5528 350896 55ab0 fs/ext4/ext4.o after: 326227 19258 5528 351013 55b25 fs/ext4/ext4.o I use the following script to measure the CPU usage. #!/bin/bash shm_base='/dev/shm' img=${shm_base}/ext4-img mnt=/mnt/loop e2fsprgs_base=$HOME/e2fsprogs mkfs=${e2fsprgs_base}/misc/mke2fs fsck=${e2fsprgs_base}/e2fsck/e2fsck sudo umount $mnt dd if=/dev/zero of=$img bs=4k count=3145728 ${mkfs} -t ext4 -O inline_data -F $img sudo mount -t ext4 -o loop $img $mnt # start testing... testdir="${mnt}/testdir" mkdir $testdir cd $testdir echo "start testing..." for ((cnt=0;cnt<100;cnt++)); do for ((i=0;i<5;i++)); do for ((j=0;j<5;j++)); do for ((k=0;k<5;k++)); do for ((l=0;l<5;l++)); do mkdir -p $i/$j/$k/$l echo "$i-$j-$k-$l" > $i/$j/$k/$l/testfile done done done done ls -R $testdir > /dev/null rm -rf $testdir/* done The result of `perf top -G -U` is as below. vanilla: 13.92% [ext4] [k] ext4_do_update_inode 9.36% [ext4] [k] __ext4_get_inode_loc 4.07% [ext4] [k] ftrace_define_fields_ext4_writepages 3.83% [ext4] [k] __ext4_handle_dirty_metadata 3.42% [ext4] [k] ext4_get_inode_flags 2.71% [ext4] [k] ext4_mark_iloc_dirty 2.46% [ext4] [k] ftrace_define_fields_ext4_direct_IO_enter 2.26% [ext4] [k] ext4_get_inode_loc 2.22% [ext4] [k] ext4_has_inline_data [...] After applied the patch, we don't see ext4_has_inline_data() because it has been inlined and perf couldn't sample it. Although it doesn't mean that the CPU cycles can be saved but at least the overhead of function calls can be eliminated. So IMHO we'd better inline this function. Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-07-15 10:10:04 -04:00
Zhang Zhen	590a141863	ext4: remove readpage() check in ext4_mmap_file() There is no kind of file which does not supply a page reading function. Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-07-15 09:56:19 -04:00
Lukas Czerner	4f579ae7de	ext4: fix punch hole on files with indirect mapping Currently punch hole code on files with direct/indirect mapping has some problems which may lead to a data loss. For example (from Jan Kara): fallocate -n -p 10240000 4096 will punch the range 10240000 - 12632064 instead of the range 1024000 - 10244096. Also the code is a bit weird and it's not using infrastructure provided by indirect.c, but rather creating it's own way. This patch fixes the issues as well as making the operation to run 4 times faster from my testing (punching out 60GB file). It uses similar approach used in ext4_ind_truncate() which takes advantage of ext4_free_branches() function. Also rename the ext4_free_hole_blocks() to something more sensible, like the equivalent we have for extent mapped files. Call it ext4_ind_remove_space(). This has been tested mostly with fsx and some xfstests which are testing punch hole but does not require unwritten extents which are not supported with direct/indirect mapping. Not problems showed up even with 1024k block size. CC: stable@vger.kernel.org Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-07-15 06:03:38 -04:00
Theodore Ts'o	71d4f7d032	ext4: remove metadata reservation checks Commit `27dd438542` ("ext4: introduce reserved space") reserves 2% of the file system space to make sure metadata allocations will always succeed. Given that, tracking the reservation of metadata blocks is no longer necessary. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-07-15 06:02:38 -04:00
Theodore Ts'o	d5e03cbb0c	ext4: rearrange initialization to fix EXT4FS_DEBUG The EXT4FS_DEBUG is a very developer specific #ifdef designed for ext4 developers only. (You have to modify fs/ext4/ext4.h to enable it.) Rearrange how we initialize data structures to avoid calling ext4_count_free_clusters() until the multiblock allocator has been initialized. This also allows us to only call ext4_count_free_clusters() once, and simplifies the code somewhat. (Thanks to Chen Gang <gang.chen.5i5j@gmail.com> for pointing out a !CONFIG_SMP compile breakage in the original patch.) Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com>	2014-07-15 06:01:38 -04:00
Brian Foster	80d6d69821	xfs: add log attributes for log lsn and grant head data Create log attributes to export the current runtime state of the log to sysfs. Note that the filesystem should be frozen for consistency across attributes. The following per-mount attributes are created: log_head_lsn, log_tail_lsn, reserve_grant_head and write_grant_head. These represent the physical log head, tail and reserve and write grant heads respectively. Attribute values are exported in the following format: "cycle:[block,byte]" ... where cycle represents the log cycle and [block,bytes] represents either the basic block or byte offset of the log, depending on the attribute. Log sequence number (LSN) values are encoded in basic blocks and grant heads are encoded in bytes. All values are in decimal format. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-15 08:07:48 +10:00
Brian Foster	baff4e44b9	xfs: add xlog sysfs kobject and attribute handlers Embed a kobject into the xfs log data structure (xlog). This creates a 'log' subdirectory for every XFS mount instance in sysfs. The lifecycle of the log kobject is tied to the lifecycle of the log. Also define a set of generic attribute handlers associated with the log kobject in preparation for the addition of attributes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-15 08:07:29 +10:00
Brian Foster	a31b1d3d89	xfs: add xfs_mount sysfs kobject Embed a base kobject into xfs_mount. This creates a kobject associated with each XFS mount and a subdirectory in sysfs with the name of the filesystem. The subdirectory lifecycle matches that of the mount. Also add the new xfs_sysfs.[c,h] source files with some XFS sysfs infrastructure to facilitate attribute creation. Note that there are currently no attributes exported as part of the xfs_mount kobject. It exists solely to serve as a per-mount container for child objects. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-15 08:07:01 +10:00
Brian Foster	3d8712265c	xfs: add a sysfs kset Create a sysfs kset to contain all sub-objects associated with the XFS module. The kset is created and removed on module initialization and removal respectively. The kset uses fs_obj as a parent. This leads to the creation of a /sys/fs/xfs directory when the kset exists. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-15 07:41:37 +10:00
Brian Foster	a70a4fa528	xfs: fix a couple error sequence jumps in xfs_mountfs() xfs_mountfs() has a couple failure conditions that do not jump to the correct labels. Specifically: - xfs_initialize_perag_data() failure does not deallocate the log even though it occurs after log initialization - xfs_mount_reset_sbqflags() failure returns the error directly rather than jump to the error sequence Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-15 07:41:25 +10:00
Dave Chinner	7f8a058f6d	Merge branch 'xfs-libxfs-restructure' into for-next	2014-07-15 07:37:18 +10:00
Dave Chinner	03e01349c6	xfs: null unused quota inodes when quota is on When quota is on, it is expected that unused quota inodes have a value of NULLFSINO. The changes to support a separate project quota in 3.12 broken this rule for non-project quota inode enabled filesystem, as the code now refuses to write the group quota inode if neither group or project quotas are enabled. This regression was introduced by commit `d892d58` ("xfs: Start using pquotaino from the superblock"). In this case, we should be writing NULLFSINO rather than nothing to ensure that we leave the group quota inode in a valid state while quotas are enabled. Failure to do so doesn't cause a current kernel to break - the separate project quota inodes introduced translation code to always treat a zero inode as NULLFSINO. This was introduced by commit `0102629` ("xfs: Initialize all quota inodes to be NULLFSINO") with is also in 3.12 but older kernels do not do this and hence taking a filesystem back to an older kernel can result in quotas failing initialisation at mount time. When that happens, we see this in dmesg: [ 1649.215390] XFS (sdb): Mounting Filesystem [ 1649.316894] XFS (sdb): Failed to initialize disk quotas. [ 1649.316902] XFS (sdb): Ending clean mount By ensuring that we write NULLFSINO to quota inodes that aren't active, we avoid this problem. We have to be really careful when determining if the quota inodes are active or not, because we don't want to write a NULLFSINO if the quota inodes are active and we simply aren't updating them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-15 07:28:41 +10:00
Dave Chinner	cf11da9c5d	xfs: refine the allocation stack switch The allocation stack switch at xfs_bmapi_allocate() has served it's purpose, but is no longer a sufficient solution to the stack usage problem we have in the XFS allocation path. Whilst the kernel stack size is now 16k, that is not a valid reason for undoing all our "keep stack usage down" modifications. What it does allow us to do is have the freedom to refine and perfect the modifications knowing that if we get it wrong it won't blow up in our faces - we have a safety net now. This is important because we still have the issue of older kernels having smaller stacks and that they are still supported and are demonstrating a wide range of different stack overflows. Red Hat has several open bugs for allocation based stack overflows from directory modifications and direct IO block allocation and these problems still need to be solved. If we can solve them upstream, then distro's won't need to bake their own unique solutions. To that end, I've observed that every allocation based stack overflow report has had a specific characteristic - it has happened during or directly after a bmap btree block split. That event requires a new block to be allocated to the tree, and so we effectively stack one allocation stack on top of another, and that's when we get into trouble. A further observation is that bmap btree block splits are much rarer than writeback allocation - over a range of different workloads I've observed the ratio of bmap btree inserts to splits ranges from 100:1 (xfstests run) to 10000:1 (local VM image server with sparse files that range in the hundreds of thousands to millions of extents). Either way, bmap btree split events are much, much rarer than allocation events. Finally, we have to move the kswapd state to the allocation workqueue work when allocation is done on behalf of kswapd. This is proving to cause significant perturbation in performance under memory pressure and appears to be generating allocation deadlock warnings under some workloads, so avoiding the use of a workqueue for the majority of kswapd writeback allocation will minimise the impact of such behaviour. Hence it makes sense to move the stack switch to xfs_btree_split() and only do it for bmap btree splits. Stack switches during allocation will be much rarer, so there won't be significant performacne overhead caused by switching stacks. The worse case stack from all allocation paths will be split, not just writeback. And the majority of memory allocations will be done in the correct context (e.g. kswapd) without causing additional latency, and so we simplify the memory reclaim interactions between processes, workqueues and kswapd. The worst stack I've been able to generate with this patch in place is 5600 bytes deep. It's very revealing because we exit XFS at: 37) 1768 64 kmem_cache_alloc+0x13b/0x170 about 1800 bytes of stack consumed, and the remaining 3800 bytes (and 36 functions) is memory reclaim, swap and the IO stack. And this occurs in the inode allocation from an open(O_CREAT) syscall, not writeback. The amount of stack being used is much less than I've previously be able to generate - fs_mark testing has been able to generate stack usage of around 7k without too much trouble; with this patch it's only just getting to 5.5k. This is primarily because the metadata allocation paths (e.g. directory blocks) are no longer causing double splits on the same stack, and hence now stack tracing is showing swapping being the worst stack consumer rather than XFS. Performance of fs_mark inode create workloads is unchanged. Performance of fs_mark async fsync workloads is consistently good with context switches reduced by around 150,000/s (30%). Performance of dbench, streaming IO and postmark is unchanged. Allocation deadlock warnings have not been seen on the workloads that generated them since adding this patch. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-15 07:08:24 +10:00
Dave Chinner	aa182e64f1	Revert "xfs: block allocation work needs to be kswapd aware" This reverts commit `1f6d64829d`. This commit resulted in regressions in performance in low memory situations where kswapd was doing writeback of delayed allocation blocks. It resulted in significant parallelism of the kswapd work and with the special kswapd flags meant that hundreds of active allocation could dip into kswapd specific memory reserves and avoid being throttled. This cause a large amount of performance variation, as well as random OOM-killer invocations that didn't previously exist. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-15 07:08:10 +10:00
Benjamin LaHaise	6e830d5371	Merge ../aio-fixes	2014-07-14 13:14:27 -04:00
Benjamin LaHaise	263782c1c9	aio: protect reqs_available updates from changes in interrupt handlers As of commit `f8567a3845` it is now possible to have put_reqs_available() called from irq context. While put_reqs_available() is per cpu, it did not protect itself from interrupts on the same CPU. This lead to aio_complete() corrupting the available io requests count when run under a heavy O_DIRECT workloads as reported by Robert Elliott. Fix this by disabling irq updates around the per cpu batch updates of reqs_available. Many thanks to Robert and folks for testing and tracking this down. Reported-by: Robert Elliot <Elliott@hp.com> Tested-by: Robert Elliot <Elliott@hp.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@infradead.org> Cc: stable@vger.kenel.org	2014-07-14 13:05:26 -04:00

... 5 6 7 8 9 ...

37611 Commits