linux

Author	SHA1	Message	Date
Alexander Aring	63eab2b00b	fs: dlm: add lkb waiters debugfs functionality This patch adds functionality to put a lkb to the waiters state. It can be useful to combine this feature with the "rawmsg" debugfs functionality. It will bring the DLM lkb into a state that a message will be parsed by the kernel. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	5054e79de9	fs: dlm: add lkb debugfs functionality This patch adds functionality to add an lkb during runtime. This is a highly debugging feature only, wrong input can crash the kernel. It is a early state feature as well. The goal is to provide a user interface for manipulate dlm state and combine it with the rawmsg feature. It is debugfs functionality, we don't care about UAPI breakage. Even it's possible to add lkb's/rsb's which could never be exists in such wat by using normal DLM operation. The user of this interface always need to think before using this feature, not every crash which happens can really occur during normal dlm operation. Future there should be more functionality to add a more realistic lkb which reflects normal DLM state inside the kernel. For now this is enough. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	75d25ffe38	fs: dlm: allow create lkb with specific id range This patch adds functionality to add a lkb with a specific id range. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	9af5b8f0ea	fs: dlm: add debugfs rawmsg send functionality This patch adds a dlm functionality to send a raw dlm message to a specific cluster node. This raw message can be build by user space and send out by writing the message to "rawmsg" dlm debugfs file. There is a in progress scapy dlm module which provides a easy build of DLM messages in user space. For example: DLM(h_cmd=3, o_nextcmd=1, h_nodeid=1, h_lockspace=0xe4f48a18, ...) The goal is to provide an easy reproducable state to crash DLM or to fuzz the DLM kernel stack if there are possible ways to crash it. Note: that if the sequence number is zero and dlm version is not set to 3.1 the kernel will automatic will set a right sequence number, otherwise DLM stack testing is not possible. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	5c16febbc1	fs: dlm: let handle callback data as void This patch changes the dlm_lowcomms_new_msg() function pointer private data from "struct mhandle " to "void " to provide different structures than just "struct mhandle". Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	3cb5977c52	fs: dlm: ls_count busy wait to event based wait This patch changes the ls_count busy wait to use atomic counter values and wait_event() to wait until ls_count reach zero. It will slightly reduce the number of holding lslist_lock. At remove lockspace we need to retry the wait because it a lockspace get could interefere between wait_event() and holding the lock which deletes the lockspace list entry. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	164d88abd7	fs: dlm: requestqueue busy wait to event based wait This patch changes the requestqueue busy waiting algorithm to use atomic counter values and wait_event() to wait until the requestqueue is empty. It will slightly reduce the number of holding ls_requestqueue_mutex mutex. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	92732376fd	fs: dlm: trace socket handling This patch adds tracepoints for dlm socket receive and send functionality. We can use it to track how much data was send or received to or from a specific nodeid. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	f1d3b8f91d	fs: dlm: initial support for tracepoints This patch adds initial support for dlm tracepoints. It will introduce tracepoints to dlm main functionality dlm_lock()/dlm_unlock() and their complete ast() callback or blocking bast() callback. The lock/unlock functionality has a start and end tracepoint, this is because there exists a race in case if would have a tracepoint at the end position only the complete/blocking callbacks could occur before. To work with eBPF tracing and using their lookup hash functionality there could be problems that an entry was not inserted yet. However use the start functionality for hash insert and check again in end functionality if there was an dlm internal error so there is no ast callback. In further it might also that locks with local masters will occur those callbacks immediately so we must have such functionality. I did not make everything accessible yet, although it seems eBPF can be used to access a lot of internal datastructures if it's aware of the struct definitions of the running kernel instance. We still can change it, if you do eBPF experiments e.g. time measurements between lock and callback functionality you can simple use the local lkb_id field as hash value in combination with the lockspace id if you have multiple lockspaces. Otherwise you can simple use trace-cmd for some functionality, e.g. `trace-cmd record -e dlm` and `trace-cmd report` afterwards. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	2f05ec4327	fs: dlm: make dlm_callback_resume quite This patch makes dlm_callback_resume info printout less noisy by accumulate all callback queues into one printout not in 25 times steps. It seems this printout became lately quite noisy in relationship with gfs2. Before: [241767.849302] dlm: bin: dlm_callback_resume 25 [241767.854846] dlm: bin: dlm_callback_resume 25 [241767.860373] dlm: bin: dlm_callback_resume 25 ... [241767.865920] dlm: bin: dlm_callback_resume 25 [241767.871352] dlm: bin: dlm_callback_resume 25 [241767.876733] dlm: bin: dlm_callback_resume 25 After the patch: [ 385.485728] dlm: gfs2: dlm_callback_resume 175 if zero it will not be printed out. Reported-by: Barry Marson <bmarson@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	e10249b190	fs: dlm: use dlm_recovery_stopped in condition This patch will change to evaluate the dlm_recovery_stopped() in the condition of the if branch instead fetch it before evaluating the condition. As this is an atomic test-set operation it should be evaluated in the condition itself. Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	3e9736713d	fs: dlm: use dlm_recovery_stopped instead of test_bit This patch will change to use dlm_recovery_stopped() which is the dlm way to check if the LSFL_RECOVER_STOP flag in ls_flags by using the helper. It is an atomic operation but the check is still as before to fetch the value if ls_recover_lock is held. There might be more further investigations if the value can be changed afterwards and if it has any side effects. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	658bd576f9	fs: dlm: move version conversion to compile time This patch moves version conversion to little endian from a runtime variable to compile time constant. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	fe93367541	fs: dlm: remove check SCTP is loaded message Since commit 764ff4011424 ("fs: dlm: auto load sctp module") we try load the sctp module before we try to create a sctp kernel socket. That a socket creation fails now has more likely other reasons. This patch removes the part of error to load the sctp module and instead printout the error code. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	1aafd9c231	fs: dlm: debug improvements print nodeid This patch improves the debug output for midcomms layer by also printing out the nodeid where users counter belongs to. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	bb6866a5bd	fs: dlm: fix small lockspace typo This patch fixes a typo from lockspace to lockspace. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	dea450c90f	fs: dlm: remove obsolete INBUF define This patch removes an obsolete define for some length for an temporary buffer which is not being used anymore. The use of this define is not necessary anymore since commit `4798cbbfbd` ("fs: dlm: rework receive handling"). Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Linus Torvalds	78805cbe5d	Merge tag 'gfs2-v5.15-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Fix a locking order inversion between the inode and iopen glocks in gfs2_inode_lookup. - Implement proper queuing of glock holders for glocks that require instantiation (like reading an inode or bitmap blocks from disk). Before, multiple glock holders could race with each other and half-initialized objects could be exposed; the GL_SKIP flag further exacerbated this problem. - Fix a rare deadlock between inode lookup / creation and remote delete work. - Fix a rare scheduling-while-atomic bug in dlm during glock hash table walks. - Various other minor fixes and cleanups. * tag 'gfs2-v5.15-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (21 commits) gfs2: Fix unused value warning in do_gfs2_set_flags() gfs2: check context in gfs2_glock_put gfs2: Fix glock_hash_walk bugs gfs2: Cancel remote delete work asynchronously gfs2: set glock object after nq gfs2: remove RDF_UPTODATE flag gfs2: Eliminate GIF_INVALID flag gfs2: fix GL_SKIP node_scope problems gfs2: split glock instantiation off from do_promote gfs2: further simplify do_promote gfs2: re-factor function do_promote gfs2: Remove 'first' trace_gfs2_promote argument gfs2: change go_lock to go_instantiate gfs2: dump glocks from gfs2_consist_OBJ_i gfs2: dequeue iopen holder in gfs2_inode_lookup error gfs2: Save ip from gfs2_glock_nq_init gfs2: Allow append and immutable bits to coexist gfs2: Switch some BUG_ON to GLOCK_BUG_ON for debug gfs2: move GL_SKIP check from glops to do_promote gfs2: Add GL_SKIP holder flag to dump_holder ...	2021-11-02 12:35:04 -07:00
Linus Torvalds	c03098d4b9	Merge tag 'gfs2-v5.15-rc5-mmap-fault' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 mmap + page fault deadlocks fixes from Andreas Gruenbacher: "Functions gfs2_file_read_iter and gfs2_file_write_iter are both accessing the user buffer to write to or read from while holding the inode glock. In the most basic deadlock scenario, that buffer will not be resident and it will be mapped to the same file. Accessing the buffer will trigger a page fault, and gfs2 will deadlock trying to take the same inode glock again while trying to handle that fault. Fix that and similar, more complex scenarios by disabling page faults while accessing user buffers. To make this work, introduce a small amount of new infrastructure and fix some bugs that didn't trigger so far, with page faults enabled" * tag 'gfs2-v5.15-rc5-mmap-fault' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix mmap + page fault deadlocks for direct I/O iov_iter: Introduce nofault flag to disable page faults gup: Introduce FOLL_NOFAULT flag to disable page faults iomap: Add done_before argument to iomap_dio_rw iomap: Support partial direct I/O on user copy failures iomap: Fix iomap_dio_rw return value for user copies gfs2: Fix mmap + page fault deadlocks for buffered I/O gfs2: Eliminate ip->i_gh gfs2: Move the inode glock locking to gfs2_file_buffered_write gfs2: Introduce flag for glock holder auto-demotion gfs2: Clean up function may_grant gfs2: Add wrapper for iomap_file_buffered_write iov_iter: Introduce fault_in_iov_iter_writeable iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} powerpc/kvm: Fix kvm_use_magic_page iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value	2021-11-02 12:25:03 -07:00
Beld Zhang	71c9ce27bb	io-wq: fix max-workers not correctly set on multi-node system In io-wq.c:io_wq_max_workers(), new_count[] was changed right after each node's value was set. This caused the following node getting the setting of the previous one. Returned values are copied from node 0. Fixes: `2e480058dd` ("io-wq: provide a way to limit max number of workers") Signed-off-by: Beld Zhang <beldzhang@gmail.com> [axboe: minor fixups] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-02 12:33:35 -06:00
Steve French	7ae5e588b0	cifs: add mount parameter tcpnodelay Although corking and uncorking the socket (which cifs.ko already does) should usually have the desired benefit, using the new tcpnodelay mount option causes tcp_sock_set_nodelay() to be set on the socket which may be useful in order to ensure that we don't ever have cases where the network stack is waiting on sending an SMB request until multiple SMB requests have been added to the send queue (since this could lead to long latencies). To enable it simply append "tcpnodelay" it to the mount options Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-11-02 13:13:34 -05:00
Shyam Prasad N	7be3248f31	cifs: To match file servers, make sure the server hostname matches We generally rely on a bunch of factors to differentiate between servers. For example, IP address, port etc. For certain server types (like Azure), it is important to make sure that the server hostname matches too, even if the both hostnames currently resolve to the same IP address. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2021-11-02 13:12:55 -05:00
Chuck Lever	8791545eda	NFS: Move NFS protocol display macros to global header Refactor: surface useful show_ macros so they can be shared between the client and server trace code. Additional clean up: - Housekeeping: ensure the correct #include files are pulled in and add proper TRACE_DEFINE_ENUM where they are missing - Use a consistent naming scheme for the helpers - Store values to be displayed symbolically as unsigned long, as that is the type that the __print_yada() functions take Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-11-02 12:31:23 -04:00
Chuck Lever	9d2d48bbbd	NFS: Move generic FS show macros to global header Refactor: Surface useful show_ macros for use by other trace subsystems. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-11-02 12:31:23 -04:00
Pavel Begunkov	9881024aab	io_uring: clean up io_queue_sqe_arm_apoll The fix for linked timeout unprep got a bit distored with two rebases, handle linked timeouts for IO_APOLL_READY as with all other cases, i.e. queue it at the end of the function. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/130b1ea5605bbd81d7b874a95332295799d33b81.1635863773.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-02 09:26:14 -06:00
Miklos Szeredi	712a951025	fuse: fix page stealing It is possible to trigger a crash by splicing anon pipe bufs to the fuse device. The reason for this is that anon_pipe_buf_release() will reuse buf->page if the refcount is 1, but that page might have already been stolen and its flags modified (e.g. PG_lru added). This happens in the unlikely case of fuse_dev_splice_write() getting around to calling pipe_buf_release() after a page has been stolen, added to the page cache and removed from the page cache. Fix by calling pipe_buf_release() right after the page was inserted into the page cache. In this case the page has an elevated refcount so any release function will know that the page isn't reusable. Reported-by: Frank Dinoff <fdinoff@google.com> Link: https://lore.kernel.org/r/CAAmZXrsGg2xsP1CK+cbuEMumtrqdvD-NKnWzhNcvn71RV3c1yw@mail.gmail.com/ Fixes: `dd3bb14f44` ("fuse: support splice() writing to fuse device") Cc: <stable@vger.kernel.org> # v2.6.35 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2021-11-02 11:10:37 +01:00
Miklos Szeredi	7c594bbd2d	virtiofs: use strscpy for copying the queue name Always null terminate fsvq->name. Reported-by: kernel test robot <lkp@intel.com> Fixes: `b43b7e81eb` ("virtiofs: provide a helper function for virtqueue initialization") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2021-11-02 11:08:19 +01:00
Marc Dionne	52af7105ec	afs: Set mtime from the client for yfs create operations For operations that create vnodes on the server such as CreateFile, MakeDir or Symlink, the server will store its own current time as the mtime if the client doesn't pass in a time in the accompanying StoreStatus structure. If the server and client clocks are not well synchronized, the client may see timestamps in the future or inconsistent dependency checks with "make" for files that are not modified after creation: make[2]: Warning: File 'arch/x86/kernel/apic/modules.order' has modification time 0.14 s in the future make[2]: warning: Clock skew detected. Your build may be incomplete. This is already handled correctly for non yfs operations; also set the mtime for the corresponding yfs operations. Changes: v3: Replace S_IRWXUGO with 0777, per checkpatch v2: [dhowells] Merge the two xdr_encode_YFSStoreStatus*() functions together Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: http://lists.infradead.org/pipermail/linux-afs/2021-October/004395.html	2021-11-02 09:42:26 +00:00
David Howells	75bd228d56	afs: Sort out symlink reading afs_readpage() doesn't get a file pointer when called for a symlink, so separate it from regular file pointer handling. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Link: https://lore.kernel.org/r/162687508008.276387.6418924257569297305.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/162981152280.1901565.2264055504466731917.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/163005742570.2472992.7800423440314043178.stgit@warthog.procyon.org.uk/ # v2	2021-11-02 09:40:18 +00:00
Linus Torvalds	d2fac0afe8	Merge tag 'audit-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Add some additional audit logging to capture the openat2() syscall open_how struct info. Previous variations of the open()/openat() syscalls allowed audit admins to inspect the syscall args to get the information contained in the new open_how struct used in openat2()" * tag 'audit-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: return early if the filter rule has a lower priority audit: add OPENAT2 record to list "how" info audit: add support for the openat2 syscall audit: replace magic audit syscall class numbers with macros lsm_audit: avoid overloading the "key" audit field audit: Convert to SPDX identifier audit: rename struct node to struct audit_node to prevent future name collisions	2021-11-01 21:17:39 -07:00
Linus Torvalds	cdab10bf32	Merge tag 'selinux-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Add LSM/SELinux/Smack controls and auditing for io-uring. As usual, the individual commit descriptions have more detail, but we were basically missing two things which we're adding here: + establishment of a proper audit context so that auditing of io-uring ops works similarly to how it does for syscalls (with some io-uring additions because io-uring ops are not syscalls) + additional LSM hooks to enable access control points for some of the more unusual io-uring features, e.g. credential overrides. The additional audit callouts and LSM hooks were done in conjunction with the io-uring folks, based on conversations and RFC patches earlier in the year. - Fixup the binder credential handling so that the proper credentials are used in the LSM hooks; the commit description and the code comment which is removed in these patches are helpful to understand the background and why this is the proper fix. - Enable SELinux genfscon policy support for securityfs, allowing improved SELinux filesystem labeling for other subsystems which make use of securityfs, e.g. IMA. * tag 'selinux-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: security: Return xattr name from security_dentry_init_security() selinux: fix a sock regression in selinux_ip_postroute_compat() binder: use cred instead of task for getsecid binder: use cred instead of task for selinux checks binder: use euid from cred instead of using task LSM: Avoid warnings about potentially unused hook variables selinux: fix all of the W=1 build warnings selinux: make better use of the nf_hook_state passed to the NF hooks selinux: fix race condition when computing ocontext SIDs selinux: remove unneeded ipv6 hook wrappers selinux: remove the SELinux lockdown implementation selinux: enable genfscon labeling for securityfs Smack: Brutalist io_uring support selinux: add support for the io_uring access controls lsm,io_uring: add LSM hooks to io_uring io_uring: convert io_uring to the secure anon inode interface fs: add anon_inode_getfile_secure() similar to anon_inode_getfd_secure() audit: add filtering for io_uring records audit,io_uring,io-wq: add some basic audit support to io_uring audit: prepare audit_context for use in calling contexts beyond syscalls	2021-11-01 21:06:18 -07:00
Linus Torvalds	79ef0c0014	Merge tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - kprobes: Restructured stack unwinder to show properly on x86 when a stack dump happens from a kretprobe callback. - Fix to bootconfig parsing - Have tracefs allow owner and group permissions by default (only denying others). There's been pressure to allow non root to tracefs in a controlled fashion, and using groups is probably the safest. - Bootconfig memory managament updates. - Bootconfig clean up to have the tools directory be less dependent on changes in the kernel tree. - Allow perf to be traced by function tracer. - Rewrite of function graph tracer to be a callback from the function tracer instead of having its own trampoline (this change will happen on an arch by arch basis, and currently only x86_64 implements it). - Allow multiple direct trampolines (bpf hooks to functions) be batched together in one synchronization. - Allow histogram triggers to add variables that can perform calculations against the event's fields. - Use the linker to determine architecture callbacks from the ftrace trampoline to allow for proper parameter prototypes and prevent warnings from the compiler. - Extend histogram triggers to key off of variables. - Have trace recursion use bit magic to determine preempt context over if branches. - Have trace recursion disable preemption as all use cases do anyway. - Added testing for verification of tracing utilities. - Various small clean ups and fixes. * tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (101 commits) tracing/histogram: Fix semicolon.cocci warnings tracing/histogram: Fix documentation inline emphasis warning tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together tracing: Show size of requested perf buffer bootconfig: Initialize ret in xbc_parse_tree() ftrace: do CPU checking after preemption disabled ftrace: disable preemption when recursion locked tracing/histogram: Document expression arithmetic and constants tracing/histogram: Optimize division by a power of 2 tracing/histogram: Covert expr to const if both operands are constants tracing/histogram: Simplify handling of .sym-offset in expressions tracing: Fix operator precedence for hist triggers expression tracing: Add division and multiplication support for hist triggers tracing: Add support for creating hist trigger variables from literal selftests/ftrace: Stop tracing while reading the trace file by default MAINTAINERS: Update KPROBES and TRACING entries test_kprobes: Move it from kernel/ to lib/ docs, kprobes: Remove invalid URL and add new reference samples/kretprobes: Fix return value if register_kretprobe() failed lib/bootconfig: Fix the xbc_get_info kerneldoc ...	2021-11-01 20:05:19 -07:00
Linus Torvalds	bf953917be	Merge tag 'kspp-misc-fixes-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull hardening fixes and cleanups from Gustavo A. R. Silva: "Various hardening fixes and cleanups that I've been collecting during the last development cycle: Fix -Wcast-function-type error: - firewire: Remove function callback casts (Oscar Carter) Fix application of sizeof operator: - firmware/psci: fix application of sizeof to pointer (jing yangyang) Replace open coded instances with size_t saturating arithmetic helpers: - assoc_array: Avoid open coded arithmetic in allocator arguments (Len Baker) - writeback: prefer struct_size over open coded arithmetic (Len Baker) - aio: Prefer struct_size over open coded arithmetic (Len Baker) - dmaengine: pxa_dma: Prefer struct_size over open coded arithmetic (Len Baker) Flexible array transformation: - KVM: PPC: Replace zero-length array with flexible array member (Len Baker) Use 2-factor argument multiplication form: - nouveau/svm: Use kvcalloc() instead of kvzalloc() (Gustavo A. R. Silva) - xfs: Use kvcalloc() instead of kvzalloc() (Gustavo A. R. Silva)" * tag 'kspp-misc-fixes-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: firewire: Remove function callback casts nouveau/svm: Use kvcalloc() instead of kvzalloc() firmware/psci: fix application of sizeof to pointer dmaengine: pxa_dma: Prefer struct_size over open coded arithmetic KVM: PPC: Replace zero-length array with flexible array member aio: Prefer struct_size over open coded arithmetic writeback: prefer struct_size over open coded arithmetic xfs: Use kvcalloc() instead of kvzalloc() assoc_array: Avoid open coded arithmetic in allocator arguments	2021-11-01 17:29:10 -07:00
Linus Torvalds	2dc26d98cf	Merge tag 'overflow-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull overflow updates from Kees Cook: "The end goal of the current buffer overflow detection work[0] is to gain full compile-time and run-time coverage of all detectable buffer overflows seen via array indexing or memcpy(), memmove(), and memset(). The str() family of functions already have full coverage. While much of the work for these changes have been on-going for many releases (i.e. 0-element and 1-element array replacements, as well as avoiding false positives and fixing discovered overflows[1]), this series contains the foundational elements of several related buffer overflow detection improvements by providing new common helpers and FORTIFY_SOURCE changes needed to gain the introspection required for compiler visibility into array sizes. Also included are a handful of already Acked instances using the helpers (or related clean-ups), with many more waiting at the ready to be taken via subsystem-specific trees[2]. The new helpers are: - struct_group() for gaining struct member range introspection - memset_after() and memset_startat() for clearing to the end of structures - DECLARE_FLEX_ARRAY() for using flex arrays in unions or alone in structs Also included is the beginning of the refactoring of FORTIFY_SOURCE to support memcpy() introspection, fix missing and regressed coverage under GCC, and to prepare to fix the currently broken Clang support. Finishing this work is part of the larger series[0], but depends on all the false positives and buffer overflow bug fixes to have landed already and those that depend on this series to land. As part of the FORTIFY_SOURCE refactoring, a set of both a compile-time and run-time tests are added for FORTIFY_SOURCE and the mem()-family functions respectively. The compile time tests have found a legitimate (though corner-case) bug[6] already. Please note that the appearance of "panic" and "BUG" in the FORTIFY_SOURCE refactoring are the result of relocating existing code, and no new use of those code-paths are expected nor desired. Finally, there are two tree-wide conversions for 0-element arrays and flexible array unions to gain sane compiler introspection coverage that result in no known object code differences. After this series (and the changes that have now landed via netdev and usb), we are very close to finally being able to build with -Warray-bounds and -Wzero-length-bounds. However, due corner cases in GCC[3] and Clang[4], I have not included the last two patches that turn on these options, as I don't want to introduce any known warnings to the build. Hopefully these can be solved soon" Link: https://lore.kernel.org/lkml/20210818060533.3569517-1-keescook@chromium.org/ [0] Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=FORTIFY_SOURCE [1] Link: https://lore.kernel.org/lkml/202108220107.3E26FE6C9C@keescook/ [2] Link: https://lore.kernel.org/lkml/3ab153ec-2798-da4c-f7b1-81b0ac8b0c5b@roeck-us.net/ [3] Link: https://bugs.llvm.org/show_bug.cgi?id=51682 [4] Link: https://lore.kernel.org/lkml/202109051257.29B29745C0@keescook/ [5] Link: https://lore.kernel.org/lkml/20211020200039.170424-1-keescook@chromium.org/ [6] * tag 'overflow-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits) fortify: strlen: Avoid shadowing previous locals compiler-gcc.h: Define __SANITIZE_ADDRESS__ under hwaddress sanitizer treewide: Replace 0-element memcpy() destinations with flexible arrays treewide: Replace open-coded flex arrays in unions stddef: Introduce DECLARE_FLEX_ARRAY() helper btrfs: Use memset_startat() to clear end of struct string.h: Introduce memset_startat() for wiping trailing members and padding xfrm: Use memset_after() to clear padding string.h: Introduce memset_after() for wiping trailing members/padding lib: Introduce CONFIG_MEMCPY_KUNIT_TEST fortify: Add compile-time FORTIFY_SOURCE tests fortify: Allow strlen() and strnlen() to pass compile-time known lengths fortify: Prepare to improve strnlen() and strlen() warnings fortify: Fix dropped strcpy() compile-time write overflow check fortify: Explicitly disable Clang support fortify: Move remaining fortify helpers into fortify-string.h lib/string: Move helper functions out of string.c compiler_types.h: Remove __compiletime_object_size() cm4000_cs: Use struct_group() to zero struct cm4000_dev region can: flexcan: Use struct_group() to zero struct flexcan_regs regions ...	2021-11-01 17:12:56 -07:00
Linus Torvalds	6e5772c8d9	Merge tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull generic confidential computing updates from Borislav Petkov: "Add an interface called cc_platform_has() which is supposed to be used by confidential computing solutions to query different aspects of the system. The intent behind it is to unify testing of such aspects instead of having each confidential computing solution add its own set of tests to code paths in the kernel, leading to an unwieldy mess" * tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: treewide: Replace the use of mem_encrypt_active() with cc_platform_has() x86/sev: Replace occurrences of sev_es_active() with cc_platform_has() x86/sev: Replace occurrences of sev_active() with cc_platform_has() x86/sme: Replace occurrences of sme_active() with cc_platform_has() powerpc/pseries/svm: Add a powerpc version of cc_platform_has() x86/sev: Add an x86 version of cc_platform_has() arch/cc: Introduce a function to check for confidential computing features x86/ioremap: Selectively build arch override encryption functions	2021-11-01 15:16:52 -07:00
J. Bruce Fields	80479eb862	nfsd4: remove obselete comment Mandatory locking has been removed. And the rest of this comment is redundant with the code. Reported-by: Jeff layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-11-01 17:17:14 -04:00
Linus Torvalds	9a7e0a90a4	Merge tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Thomas Gleixner: - Revert the printk format based wchan() symbol resolution as it can leak the raw value in case that the symbol is not resolvable. - Make wchan() more robust and work with all kind of unwinders by enforcing that the task stays blocked while unwinding is in progress. - Prevent sched_fork() from accessing an invalid sched_task_group - Improve asymmetric packing logic - Extend scheduler statistics to RT and DL scheduling classes and add statistics for bandwith burst to the SCHED_FAIR class. - Properly account SCHED_IDLE entities - Prevent a potential deadlock when initial priority is assigned to a newly created kthread. A recent change to plug a race between cpuset and __sched_setscheduler() introduced a new lock dependency which is now triggered. Break the lock dependency chain by moving the priority assignment to the thread function. - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems. - Improve idle balancing in general and especially for NOHZ enabled systems. - Provide proper interfaces for live patching so it does not have to fiddle with scheduler internals. - Add cluster aware scheduling support. - A small set of tweaks for RT (irqwork, wait_task_inactive(), various scheduler options and delaying mmdrop) - The usual small tweaks and improvements all over the place * tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits) sched/fair: Cleanup newidle_balance sched/fair: Remove sysctl_sched_migration_cost condition sched/fair: Wait before decaying max_newidle_lb_cost sched/fair: Skip update_blocked_averages if we are defering load balance sched/fair: Account update_blocked_averages in newidle_balance cost x86: Fix __get_wchan() for !STACKTRACE sched,x86: Fix L2 cache mask sched/core: Remove rq_relock() sched: Improve wake_up_all_idle_cpus() take #2 irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support. sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ sched: Add cluster scheduler level for x86 sched: Add cluster scheduler level in core and related Kconfig for ARM64 topology: Represent clusters of CPUs within a die sched: Disable -Wunused-but-set-variable sched: Add wrapper for get_wchan() to keep task blocked x86: Fix get_wchan() to support the ORC unwinder proc: Use task_is_running() for wchan in /proc/$pid/stat ...	2021-11-01 13:48:52 -07:00
Linus Torvalds	037c50bfbe	Merge tag 'for-5.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "The updates this time are more under the hood and enhancing existing features (subpage with compression and zoned namespaces). Performance related: - misc small inode logging improvements (+3% throughput, -11% latency on sample dbench workload) - more efficient directory logging: bulk item insertion, less tree searches and locking - speed up bulk insertion of items into a b-tree, which is used when logging directories, when running delayed items for directories (fsync and transaction commits) and when running the slow path (full sync) of an fsync (bulk creation run time -4%, deletion -12%) Core: - continued subpage support - make defragmentation work - make compression write work - zoned mode - support ZNS (zoned namespaces), zone capacity is number of usable blocks in each zone - add dedicated block group (zoned) for relocation, to prevent out of order writes in some cases - greedy block group reclaim, pick the ones with least usable space first - preparatory work for send protocol updates - error handling improvements - cleanups and refactoring Fixes: - lockdep warnings - in show_devname callback, on seeding device - device delete on loop device due to conversions to workqueues - fix deadlock between chunk allocation and chunk btree modifications - fix tracking of missing device count and status" * tag 'for-5.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (140 commits) btrfs: remove root argument from check_item_in_log() btrfs: remove root argument from add_link() btrfs: remove root argument from btrfs_unlink_inode() btrfs: remove root argument from drop_one_dir_item() btrfs: clear MISSING device status bit in btrfs_close_one_device btrfs: call btrfs_check_rw_degradable only if there is a missing device btrfs: send: prepare for v2 protocol btrfs: fix comment about sector sizes supported in 64K systems btrfs: update device path inode time instead of bd_inode fs: export an inode_update_time helper btrfs: fix deadlock when defragging transparent huge pages btrfs: sysfs: convert scnprintf and snprintf to sysfs_emit btrfs: make btrfs_super_block size match BTRFS_SUPER_INFO_SIZE btrfs: update comments for chunk allocation -ENOSPC cases btrfs: fix deadlock between chunk allocation and chunk btree modifications btrfs: zoned: use greedy gc for auto reclaim btrfs: check-integrity: stop storing the block device name in btrfsic_dev_state btrfs: use btrfs_get_dev_args_from_path in dev removal ioctls btrfs: add a btrfs_get_dev_args_from_path helper btrfs: handle device lookup with btrfs_dev_lookup_args ...	2021-11-01 12:48:25 -07:00
Linus Torvalds	2cf3f8133b	btrfs: fix lzo_decompress_bio() kmap leakage Commit `ccaa66c8dd` reinstated the kmap/kunmap that had been dropped in commit `8c945d32e6` ("btrfs: compression: drop kmap/kunmap from lzo"). However, it seems to have done so incorrectly due to the change not reverting cleanly, and lzo_decompress_bio() ended up not having a matching "kunmap()" to the "kmap()" that was put back. Also, any assert that the page pointer is not NULL should be before the kmap() of said pointer, since otherwise you'd just oops in the kmap() before the assert would even trigger. I noticed this when trying to verify my btrfs merge, and things not adding up. I'm doing this fixup before re-doing my merge, because this commit needs to also be backported to 5.15 (after verification from the btrfs people). Fixes: `ccaa66c8dd` ("Revert 'btrfs: compression: drop kmap/kunmap from lzo'") Cc: David Sterba <dsterba@suse.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-01 12:46:47 -07:00
Linus Torvalds	9c6e8d52a7	Merge tag 'exfat-for-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat fix from Namjae Jeon: "Fix ->i_blocks truncation issue caused by wrong 32bit mask" * tag 'exfat-for-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: fix incorrect loading of i_blocks for large files	2021-11-01 11:45:04 -07:00
Linus Torvalds	67a135b80e	Merge tag 'erofs-for-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "There are some new features available for this cycle. Firstly, EROFS LZMA algorithm support, specifically called MicroLZMA, is available as an option for embedded devices, LiveCDs and/or as the secondary auxiliary compression algorithm besides the primary algorithm in one file. In order to better support the LZMA fixed-sized output compression, especially for 4KiB pcluster size (which has lowest memory pressure thus useful for memory-sensitive scenarios), Lasse introduced a new LZMA header/container format called MicroLZMA to minimize the original LZMA1 header (for example, we don't need to waste 4-byte dictionary size and another 8-byte uncompressed size, which can be calculated by fs directly, for each pcluster) and enable EROFS fixed-sized output compression. Note that MicroLZMA can also be later used by other things in addition to EROFS too where wasting minimal amount of space for headers is important and it can be only compiled by enabling XZ_DEC_MICROLZMA. MicroLZMA has been supported by the latest upstream XZ embedded [1] & XZ utils [2], apply the latest related XZ embedded upstream patches by the XZ author Lasse here. Secondly, multiple device is also supported in this cycle, which is designed for multi-layer container images. By working together with inter-layer data deduplication and compression, we can achieve the next high-performance container image solution. Our team will announce the new Nydus container image service [3] implementation with new RAFS v6 (EROFS-compatible) format in Open Source Summit 2021 China [4] soon. Besides, the secondary compression head support and readmore decompression strategy are also included in this cycle. There are also some minor bugfixes and cleanups, as always. Summary: - support multiple devices for multi-layer container images; - support the secondary compression head; - support readmore decompression strategy; - support new LZMA algorithm (specifically called MicroLZMA); - some bugfixes & cleanups" * tag 'erofs-for-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: don't trigger WARN() when decompression fails erofs: get rid of ->lru usage erofs: lzma compression support erofs: rename some generic methods in decompressor lib/xz, lib/decompress_unxz.c: Fix spelling in comments lib/xz: Add MicroLZMA decoder lib/xz: Move s->lzma.len = 0 initialization to lzma_reset() lib/xz: Validate the value before assigning it to an enum variable lib/xz: Avoid overlapping memcpy() with invalid input with in-place decompression erofs: introduce readmore decompression strategy erofs: introduce the secondary compression head erofs: get compression algorithms directly on mapping erofs: add multiple device support erofs: decouple basic mount options from fs_context erofs: remove the fast path of per-CPU buffer decompression	2021-11-01 11:39:22 -07:00
Linus Torvalds	cd3e8ea847	Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt Pull fscrypt updates from Eric Biggers: "Some cleanups for fs/crypto/: - Allow 256-bit master keys with AES-256-XTS - Improve documentation and comments - Remove unneeded field fscrypt_operations::max_namelen" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fscrypt: improve a few comments fscrypt: allow 256-bit master keys with AES-256-XTS fscrypt: improve documentation for inline encryption fscrypt: clean up comments in bio.c fscrypt: remove fscrypt_operations::max_namelen	2021-11-01 11:36:35 -07:00
Linus Torvalds	19901165d9	Merge tag 'for-5.16/inode-sync-2021-10-29' of git://git.kernel.dk/linux-block Pull block inode sync updates from Jens Axboe: "This contains improvements to how bdev inode syncing is handled, unifying the API" * tag 'for-5.16/inode-sync-2021-10-29' of git://git.kernel.dk/linux-block: block: simplify the block device syncing code ntfs3: use sync_blockdev_nowait fat: use sync_blockdev_nowait btrfs: use sync_blockdev xen-blkback: use sync_blockdev block: remove __sync_blockdev fs: remove __sync_filesystem	2021-11-01 10:25:27 -07:00
Linus Torvalds	b6773cdb0e	Merge tag 'for-5.16/ki_complete-2021-10-29' of git://git.kernel.dk/linux-block Pull kiocb->ki_complete() cleanup from Jens Axboe: "This removes the res2 argument from kiocb->ki_complete(). Only the USB gadget code used it, everybody else passes 0. The USB guys checked the user gadget code they could find, and everybody just uses res as expected for the async interface" * tag 'for-5.16/ki_complete-2021-10-29' of git://git.kernel.dk/linux-block: fs: get rid of the res2 iocb->ki_complete argument usb: remove res2 argument from gadget code completions	2021-11-01 10:17:11 -07:00
Linus Torvalds	71ae42629e	Merge tag 'for-5.16/passthrough-flag-2021-10-29' of git://git.kernel.dk/linux-block Pull QUEUE_FLAG_SCSI_PASSTHROUGH removal from Jens Axboe: "This contains a series leading to the removal of the QUEUE_FLAG_SCSI_PASSTHROUGH queue flag" * tag 'for-5.16/passthrough-flag-2021-10-29' of git://git.kernel.dk/linux-block: block: remove blk_{get,put}_request block: remove QUEUE_FLAG_SCSI_PASSTHROUGH block: remove the initialize_rq_fn blk_mq_ops method scsi: add a scsi_alloc_request helper bsg-lib: initialize the bsg_job in bsg_transport_sg_io_fn nfsd/blocklayout: use ->get_unique_id instead of sending SCSI commands sd: implement ->get_unique_id block: add a ->get_unique_id method	2021-11-01 10:12:44 -07:00
Linus Torvalds	3f01727f75	Merge tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-block Pull bdev size cleanups from Jens Axboe: "Clean up the bdev size handling with new bdev_nr_bytes() helper" * tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-block: (34 commits) partitions/ibm: use bdev_nr_sectors instead of open coding it partitions/efi: use bdev_nr_bytes instead of open coding it block/ioctl: use bdev_nr_sectors and bdev_nr_bytes block: cache inode size in bdev udf: use sb_bdev_nr_blocks reiserfs: use sb_bdev_nr_blocks ntfs: use sb_bdev_nr_blocks jfs: use sb_bdev_nr_blocks ext4: use sb_bdev_nr_blocks block: add a sb_bdev_nr_blocks helper block: use bdev_nr_bytes instead of open coding it in blkdev_fallocate squashfs: use bdev_nr_bytes instead of open coding it reiserfs: use bdev_nr_bytes instead of open coding it pstore/blk: use bdev_nr_bytes instead of open coding it ntfs3: use bdev_nr_bytes instead of open coding it nilfs2: use bdev_nr_bytes instead of open coding it nfs/blocklayout: use bdev_nr_bytes instead of open coding it jfs: use bdev_nr_bytes instead of open coding it hfsplus: use bdev_nr_sectors instead of open coding it hfs: use bdev_nr_sectors instead of open coding it ...	2021-11-01 09:50:37 -07:00
Linus Torvalds	8d1f01775f	Merge tag 'for-5.16/io_uring-2021-10-29' of git://git.kernel.dk/linux-block Pull io_uring updates from Jens Axboe: "Light on new features - basically just the hybrid mode support. Outside of that it's just fixes, cleanups, and performance improvements. In detail: - Add ring related information to the fdinfo output (Hao) - Hybrid async mode (Hao) - Support for batched issue on block (me) - sqe error trace improvement (me) - IOPOLL efficiency improvements (Pavel) - submit state cleanups and improvements (Pavel) - Completion side improvements (Pavel) - Drain improvements (Pavel) - Buffer selection cleanups (Pavel) - Fixed file node improvements (Pavel) - io-wq setup cancelation fix (Pavel) - Various other performance improvements and cleanups (Pavel) - Misc fixes (Arnd, Bixuan, Changcheng, Hao, me, Noah)" * tag 'for-5.16/io_uring-2021-10-29' of git://git.kernel.dk/linux-block: (97 commits) io-wq: remove worker to owner tw dependency io_uring: harder fdinfo sq/cq ring iterating io_uring: don't assign write hint in the read path io_uring: clusterise ki_flags access in rw_prep io_uring: kill unused param from io_file_supports_nowait io_uring: clean up timeout async_data allocation io_uring: don't try io-wq polling if not supported io_uring: check if opcode needs poll first on arming io_uring: clean iowq submit work cancellation io_uring: clean io_wq_submit_work()'s main loop io-wq: use helper for worker refcounting io_uring: implement async hybrid mode for pollable requests io_uring: Use ERR_CAST() instead of ERR_PTR(PTR_ERR()) io_uring: split logic of force_nonblock io_uring: warning about unused-but-set parameter io_uring: inform block layer of how many requests we are submitting io_uring: simplify io_file_supports_nowait() io_uring: combine REQ_F_NOWAIT_{READ,WRITE} flags io_uring: arm poll for non-nowait files fs/io_uring: Prioritise checking faster conditions first in io_write ...	2021-11-01 09:41:33 -07:00
Linus Torvalds	33c8846c81	Merge tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: - mq-deadline accounting improvements (Bart) - blk-wbt timer fix (Andrea) - Untangle the block layer includes (Christoph) - Rework the poll support to be bio based, which will enable adding support for polling for bio based drivers (Christoph) - Block layer core support for multi-actuator drives (Damien) - blk-crypto improvements (Eric) - Batched tag allocation support (me) - Request completion batching support (me) - Plugging improvements (me) - Shared tag set improvements (John) - Concurrent queue quiesce support (Ming) - Cache bdev in ->private_data for block devices (Pavel) - bdev dio improvements (Pavel) - Block device invalidation and block size improvements (Xie) - Various cleanups, fixes, and improvements (Christoph, Jackie, Masahira, Tejun, Yu, Pavel, Zheng, me) * tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block: (174 commits) blk-mq-debugfs: Show active requests per queue for shared tags block: improve readability of blk_mq_end_request_batch() virtio-blk: Use blk_validate_block_size() to validate block size loop: Use blk_validate_block_size() to validate block size nbd: Use blk_validate_block_size() to validate block size block: Add a helper to validate the block size block: re-flow blk_mq_rq_ctx_init() block: prefetch request to be initialized block: pass in blk_mq_tags to blk_mq_rq_ctx_init() block: add rq_flags to struct blk_mq_alloc_data block: add async version of bio_set_polled block: kill DIO_MULTI_BIO block: kill unused polling bits in __blkdev_direct_IO() block: avoid extra iter advance with async iocb block: Add independent access ranges support blk-mq: don't issue request directly in case that current is to be blocked sbitmap: silence data race warning blk-cgroup: synchronize blkg creation against policy deactivation block: refactor bio_iov_bvec_set() block: add single bio async direct IO helper ...	2021-11-01 09:19:50 -07:00
Linus Torvalds	9ac211426f	Merge tag 'locks-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "Most of this is just follow-on cleanup work of documentation and comments from the mandatory locking removal in v5.15. The only real functional change is that LOCK_MAND flock() support is also being removed, as it has basically been non-functional since the v2.5 days" * tag 'locks-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fs: remove leftover comments from mandatory locking removal locks: remove changelog comments docs: fs: locks.rst: update comment about mandatory file locking Documentation: remove reference to now removed mandatory-locking doc locks: remove LOCK_MAND flock lock support	2021-11-01 09:06:53 -07:00
Linus Torvalds	49f8275c7d	Merge tag 'folio-5.16' of git://git.infradead.org/users/willy/pagecache Pull memory folios from Matthew Wilcox: "Add memory folios, a new type to represent either order-0 pages or the head page of a compound page. This should be enough infrastructure to support filesystems converting from pages to folios. The point of all this churn is to allow filesystems and the page cache to manage memory in larger chunks than PAGE_SIZE. The original plan was to use compound pages like THP does, but I ran into problems with some functions expecting only a head page while others expect the precise page containing a particular byte. The folio type allows a function to declare that it's expecting only a head page. Almost incidentally, this allows us to remove various calls to VM_BUG_ON(PageTail(page)) and compound_head(). This converts just parts of the core MM and the page cache. For 5.17, we intend to convert various filesystems (XFS and AFS are ready; other filesystems may make it) and also convert more of the MM and page cache to folios. For 5.18, multi-page folios should be ready. The multi-page folios offer some improvement to some workloads. The 80% win is real, but appears to be an artificial benchmark (postgres startup, which isn't a serious workload). Real workloads (eg building the kernel, running postgres in a steady state, etc) seem to benefit between 0-10%. I haven't heard of any performance losses as a result of this series. Nobody has done any serious performance tuning; I imagine that tweaking the readahead algorithm could provide some more interesting wins. There are also other places where we could choose to create large folios and currently do not, such as writes that are larger than PAGE_SIZE. I'd like to thank all my reviewers who've offered review/ack tags: Christoph Hellwig, David Howells, Jan Kara, Jeff Layton, Johannes Weiner, Kirill A. Shutemov, Michal Hocko, Mike Rapoport, Vlastimil Babka, William Kucharski, Yu Zhao and Zi Yan. I'd also like to thank those who gave feedback I incorporated but haven't offered up review tags for this part of the series: Nick Piggin, Mel Gorman, Ming Lei, Darrick Wong, Ted Ts'o, John Hubbard, Hugh Dickins, and probably a few others who I forget" * tag 'folio-5.16' of git://git.infradead.org/users/willy/pagecache: (90 commits) mm/writeback: Add folio_write_one mm/filemap: Add FGP_STABLE mm/filemap: Add filemap_get_folio mm/filemap: Convert mapping_get_entry to return a folio mm/filemap: Add filemap_add_folio() mm/filemap: Add filemap_alloc_folio mm/page_alloc: Add folio allocation functions mm/lru: Add folio_add_lru() mm/lru: Convert __pagevec_lru_add_fn to take a folio mm: Add folio_evictable() mm/workingset: Convert workingset_refault() to take a folio mm/filemap: Add readahead_folio() mm/filemap: Add folio_mkwrite_check_truncate() mm/filemap: Add i_blocks_per_folio() mm/writeback: Add folio_redirty_for_writepage() mm/writeback: Add folio_account_redirty() mm/writeback: Add folio_clear_dirty_for_io() mm/writeback: Add folio_cancel_dirty() mm/writeback: Add folio_account_cleaned() mm/writeback: Add filemap_dirty_folio() ...	2021-11-01 08:47:59 -07:00

... 20 21 22 23 24 ...

74668 Commits