Currently we use the page->lru list for maintaining lists of slabs. We
have a list in the page structure (slab_list) that can be used for this
purpose. Doing so makes the code cleaner since we are not overloading the
lru list.
Use the slab_list instead of the lru list for maintaining lists of slabs.
Link: http://lkml.kernel.org/r/20190402230545.2929-6-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SLUB allocator makes heavy use of ifdef/endif pre-processor macros. The
pairing of these statements is at times hard to follow e.g. if the pair
are further than a screen apart or if there are nested pairs. We can
reduce cognitive load by adding a comment to the endif statement of form
#ifdef CONFIG_FOO
...
#endif /* CONFIG_FOO */
Add comments to endif pre-processor macros if ifdef/endif pair is not
immediately apparent.
Link: http://lkml.kernel.org/r/20190402230545.2929-5-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we use the page->lru list for maintaining lists of slabs. We
have a list_head in the page structure (slab_list) that can be used for
this purpose. Doing so makes the code cleaner since we are not
overloading the lru list.
The slab_list is part of a union within the page struct (included here
stripped down):
union {
struct { /* Page cache and anonymous pages */
struct list_head lru;
...
};
struct {
dma_addr_t dma_addr;
};
struct { /* slab, slob and slub */
union {
struct list_head slab_list;
struct { /* Partial pages */
struct page *next;
int pages; /* Nr of pages left */
int pobjects; /* Approximate count */
};
};
...
Here we see that slab_list and lru are the same bits. We can verify that
this change is safe to do by examining the object file produced from
slob.c before and after this patch is applied.
Steps taken to verify:
1. checkout current tip of Linus' tree
commit a667cb7a94 ("Merge branch 'akpm' (patches from Andrew)")
2. configure and build (select SLOB allocator)
CONFIG_SLOB=y
CONFIG_SLAB_MERGE_DEFAULT=y
3. dissasemble object file `objdump -dr mm/slub.o > before.s
4. apply patch
5. build
6. dissasemble object file `objdump -dr mm/slub.o > after.s
7. diff before.s after.s
Use slab_list list_head instead of the lru list_head for maintaining
lists of slabs.
Link: http://lkml.kernel.org/r/20190402230545.2929-4-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we reach inside the list_head. This is a violation of the layer
of abstraction provided by the list_head. It makes the code fragile.
More importantly it makes the code wicked hard to understand.
The code reaches into the list_head structure to counteract the fact that
the list _may_ have been changed during slob_page_alloc(). Instead of
this we can add a return parameter to slob_page_alloc() to signal that the
list was modified (list_del() called with page->lru to remove page from
the freelist).
This code is concerned with an optimisation that counters the tendency for
first fit allocation algorithm to fragment memory into many small chunks
at the front of the memory pool. Since the page is only removed from the
list when an allocation uses _all_ the remaining memory in the page then
in this special case fragmentation does not occur and we therefore do not
need the optimisation.
Add a return parameter to slob_page_alloc() to signal that the allocation
used up the whole page and that the page was removed from the free list.
After calling slob_page_alloc() check the return value just added and only
attempt optimisation if the page is still on the list.
Use list_head API instead of reaching into the list_head structure to
check if sp is at the front of the list.
Link: http://lkml.kernel.org/r/20190402230545.2929-3-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm: Use slab_list list_head instead of lru", v5.
Currently the slab allocators (ab)use the struct page 'lru' list_head. We
have a list head for slab allocators to use, 'slab_list'.
During v2 it was noted by Christoph that the SLOB allocator was reaching
into a list_head, this version adds 2 patches to the front of the set to
fix that.
Clean up all three allocators by using the 'slab_list' list_head instead
of overloading the 'lru' list_head.
This patch (of 7):
Currently if we wish to rotate a list until a specific item is at the
front of the list we can call list_move_tail(head, list). Note that the
arguments are the reverse way to the usual use of list_move_tail(list,
head). This is a hack, it depends on the developer knowing how the
list_head operates internally which violates the layer of abstraction
offered by the list_head. Also, it is not intuitive so the next developer
to come along must study list.h in order to fully understand what is meant
by the call, while this is 'good for' the developer it makes reading the
code harder. We should have an function appropriately named that does
this if there are users for it intree.
By grep'ing the tree for list_move_tail() and list_tail() and attempting
to guess the argument order from the names it seems there is only one
place currently in the tree that does this - the slob allocatator.
Add function list_rotate_to_front() to rotate a list until the specified
item is at the front of the list.
Link: http://lkml.kernel.org/r/20190402230545.2929-2-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In some cases, ocfs2_iget() reads the data of inode, which has been
deleted for some reason. That will make the system panic. So We should
judge whether this inode has been deleted, and tell the caller that the
inode is a bad inode.
For example, the ocfs2 is used as the backed of nfs, and the client is
nfsv3. This issue can be reproduced by the following steps.
on the nfs server side,
..../patha/pathb
Step 1: The process A was scheduled before calling the function fh_verify.
Step 2: The process B is removing the 'pathb', and just completed the call
to function dput. Then the dentry of 'pathb' has been deleted from the
dcache, and all ancestors have been deleted also. The relationship of
dentry and inode was deleted through the function hlist_del_init. The
following is the call stack.
dentry_iput->hlist_del_init(&dentry->d_u.d_alias)
At this time, the inode is still in the dcache.
Step 3: The process A call the function ocfs2_get_dentry, which get the
inode from dcache. Then the refcount of inode is 1. The following is the
call stack.
nfsd3_proc_getacl->fh_verify->exportfs_decode_fh->fh_to_dentry(ocfs2_get_dentry)
Step 4: Dirty pages are flushed by bdi threads. So the inode of 'patha'
is evicted, and this directory was deleted. But the inode of 'pathb'
can't be evicted, because the refcount of the inode was 1.
Step 5: The process A keep running, and call the function
reconnect_path(in exportfs_decode_fh), which call function
ocfs2_get_parent of ocfs2. Get the block number of parent
directory(patha) by the name of ... Then read the data from disk by the
block number. But this inode has been deleted, so the system panic.
Process A Process B
1. in nfsd3_proc_getacl |
2. | dput
3. fh_to_dentry(ocfs2_get_dentry) |
4. bdi flush dirty cache |
5. ocfs2_iget |
[283465.542049] OCFS2: ERROR (device sdp): ocfs2_validate_inode_block:
Invalid dinode #580640: OCFS2_VALID_FL not set
[283465.545490] Kernel panic - not syncing: OCFS2: (device sdp): panic forced
after error
[283465.546889] CPU: 5 PID: 12416 Comm: nfsd Tainted: G W
4.1.12-124.18.6.el6uek.bug28762940v3.x86_64 #2
[283465.548382] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
Desktop Reference Platform, BIOS 6.00 09/21/2015
[283465.549657] 0000000000000000 ffff8800a56fb7b8 ffffffff816e839c
ffffffffa0514758
[283465.550392] 000000000008dc20 ffff8800a56fb838 ffffffff816e62d3
0000000000000008
[283465.551056] ffff880000000010 ffff8800a56fb848 ffff8800a56fb7e8
ffff88005df9f000
[283465.551710] Call Trace:
[283465.552516] [<ffffffff816e839c>] dump_stack+0x63/0x81
[283465.553291] [<ffffffff816e62d3>] panic+0xcb/0x21b
[283465.554037] [<ffffffffa04e66b0>] ocfs2_handle_error+0xf0/0xf0 [ocfs2]
[283465.554882] [<ffffffffa04e7737>] __ocfs2_error+0x67/0x70 [ocfs2]
[283465.555768] [<ffffffffa049c0f9>] ocfs2_validate_inode_block+0x229/0x230
[ocfs2]
[283465.556683] [<ffffffffa047bcbc>] ocfs2_read_blocks+0x46c/0x7b0 [ocfs2]
[283465.557408] [<ffffffffa049bed0>] ? ocfs2_inode_cache_io_unlock+0x20/0x20
[ocfs2]
[283465.557973] [<ffffffffa049f0eb>] ocfs2_read_inode_block_full+0x3b/0x60
[ocfs2]
[283465.558525] [<ffffffffa049f5ba>] ocfs2_iget+0x4aa/0x880 [ocfs2]
[283465.559082] [<ffffffffa049146e>] ocfs2_get_parent+0x9e/0x220 [ocfs2]
[283465.559622] [<ffffffff81297c05>] reconnect_path+0xb5/0x300
[283465.560156] [<ffffffff81297f46>] exportfs_decode_fh+0xf6/0x2b0
[283465.560708] [<ffffffffa062faf0>] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
[283465.561262] [<ffffffff810a8196>] ? prepare_creds+0x26/0x110
[283465.561932] [<ffffffffa0630860>] fh_verify+0x350/0x660 [nfsd]
[283465.562862] [<ffffffffa0637804>] ? nfsd_cache_lookup+0x44/0x630 [nfsd]
[283465.563697] [<ffffffffa063a8b9>] nfsd3_proc_getattr+0x69/0xf0 [nfsd]
[283465.564510] [<ffffffffa062cf60>] nfsd_dispatch+0xe0/0x290 [nfsd]
[283465.565358] [<ffffffffa05eb892>] ? svc_tcp_adjust_wspace+0x12/0x30
[sunrpc]
[283465.566272] [<ffffffffa05ea652>] svc_process_common+0x412/0x6a0 [sunrpc]
[283465.567155] [<ffffffffa05eaa03>] svc_process+0x123/0x210 [sunrpc]
[283465.568020] [<ffffffffa062c90f>] nfsd+0xff/0x170 [nfsd]
[283465.568962] [<ffffffffa062c810>] ? nfsd_destroy+0x80/0x80 [nfsd]
[283465.570112] [<ffffffff810a622b>] kthread+0xcb/0xf0
[283465.571099] [<ffffffff810a6160>] ? kthread_create_on_node+0x180/0x180
[283465.572114] [<ffffffff816f11b8>] ret_from_fork+0x58/0x90
[283465.573156] [<ffffffff810a6160>] ? kthread_create_on_node+0x180/0x180
Link: http://lkml.kernel.org/r/1554185919-3010-1-git-send-email-sunny.s.zhang@oracle.com
Signed-off-by: Shuning Zhang <sunny.s.zhang@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: piaojun <piaojun@huawei.com>
Cc: "Gang He" <ghe@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Deduplicate the ocfs2 file type conversion implementation and remove
OCFS2_FT_* definitions - file systems that use the same file types as
defined by POSIX do not need to define their own versions and can use the
common helper functions decared in fs_types.h and implemented in
fs_types.c
Common implementation can be found via bbe7449e25 ("fs: common
implementation of file type").
Link: http://lkml.kernel.org/r/20190326213919.GA20878@pathfinder
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Phillip Potter <phil@philpotter.co.uk>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I have been contributing and reviewing to the ocfs2 filesystem for recent
years and I'm willing to continue doing so. Volunteer as a co-maintainer
for ocfs2 filesystem.
Link: http://lkml.kernel.org/r/f56d75b3-2be5-25c2-51f2-c3f5423d4f14@gmail.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.com>
Cc: piaojun <piaojun@huawei.com>
Cc: "Gang He" <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While validating new map we require the @start_data to be strictly less
than @end_data, which is fine for regular applications (this is why this
nit didn't trigger for that long). These members are set from executable
loaders such as elf handers, still it is pretty valid to have a loadable
data section with zero size in file, in such case the start_data is equal
to end_data once kernel loader finishes.
As a result when we're trying to restore such programs the procedure fails
and the kernel returns -EINVAL. From the image dump of a program:
| "mm_start_code": "0x400000",
| "mm_end_code": "0x8f5fb4",
| "mm_start_data": "0xf1bfb0",
| "mm_end_data": "0xf1bfb0",
Thus we need to change validate_prctl_map from strictly less to less or
equal operator use.
Link: http://lkml.kernel.org/r/20190408143554.GY1421@uranus.lan
Fixes: f606b77f1a ("prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation")
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Andrey Vagin <avagin@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
spinlock recursion happened when do LTP test:
#!/bin/bash
./runltp -p -f hugetlb &
./runltp -p -f hugetlb &
./runltp -p -f hugetlb &
./runltp -p -f hugetlb &
./runltp -p -f hugetlb &
The dtor returned by get_compound_page_dtor in __put_compound_page may be
the function of free_huge_page which will lock the hugetlb_lock, so don't
put_page in lock of hugetlb_lock.
BUG: spinlock recursion on CPU#0, hugemmap05/1079
lock: hugetlb_lock+0x0/0x18, .magic: dead4ead, .owner: hugemmap05/1079, .owner_cpu: 0
Call trace:
dump_backtrace+0x0/0x198
show_stack+0x24/0x30
dump_stack+0xa4/0xcc
spin_dump+0x84/0xa8
do_raw_spin_lock+0xd0/0x108
_raw_spin_lock+0x20/0x30
free_huge_page+0x9c/0x260
__put_compound_page+0x44/0x50
__put_page+0x2c/0x60
alloc_surplus_huge_page.constprop.19+0xf0/0x140
hugetlb_acct_memory+0x104/0x378
hugetlb_reserve_pages+0xe0/0x250
hugetlbfs_file_mmap+0xc0/0x140
mmap_region+0x3e8/0x5b0
do_mmap+0x280/0x460
vm_mmap_pgoff+0xf4/0x128
ksys_mmap_pgoff+0xb4/0x258
__arm64_sys_mmap+0x34/0x48
el0_svc_common+0x78/0x130
el0_svc_handler+0x38/0x78
el0_svc+0x8/0xc
Link: http://lkml.kernel.org/r/b8ade452-2d6b-0372-32c2-703644032b47@huawei.com
Fixes: 9980d744a0 ("mm, hugetlb: get rid of surplus page accounting tricks")
Signed-off-by: Kai Shen <shenkai8@huawei.com>
Signed-off-by: Feilong Lin <linfeilong@huawei.com>
Reported-by: Wang Wang <wangwang2@huawei.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXNpu6gAKCRDh3BK/laaZ
PNYnAQCLMJBZp9AVKU+5onOGKLmgUfnbKZhWJYICW6DVKobo6AEA4aXBIk5TIDiu
+a3Ny0nAutdpHcRkbi8jJty91BeJgQg=
=iLSD
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs update from Miklos Szeredi:
"Just bug fixes in this small update"
* tag 'ovl-update-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: relax WARN_ON() for overlapping layers use case
ovl: check the capability before cred overridden
ovl: do not generate duplicate fsnotify events for "fake" path
ovl: support stacked SEEK_HOLE/SEEK_DATA
ovl: fix missing upper fs freeze protection on copy up for ioctl
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXNpuRwAKCRDh3BK/laaZ
PMq/AP9kLvB97JU2GbzIJq6wOjDV8whPE/a2Knx0fajvW3AEOAD+NQwdZLmVNql7
DkkY8lZ7fVut3TMj8jHhpIbv4P1R1AE=
=qX6f
-----END PGP SIGNATURE-----
Merge tag 'fuse-update-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse update from Miklos Szeredi:
"Add more caching controls for userspace filesystems to use, as well as
bug fixes and cleanups"
* tag 'fuse-update-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: clean up fuse_alloc_inode
fuse: Add ioctl flag for x32 compat ioctl
fuse: Convert fusectl to use the new mount API
fuse: fix changelog entry for protocol 7.9
fuse: fix changelog entry for protocol 7.12
fuse: document fuse_fsync_in.fsync_flags
fuse: Add FOPEN_STREAM to use stream_open()
fuse: require /dev/fuse reads to have enough buffer capacity
fuse: retrieve: cap requested size to negotiated max_write
fuse: allow filesystems to have precise control over data cache
fuse: convert printk -> pr_*
fuse: honor RLIMIT_FSIZE in fuse_file_fallocate
fuse: fix writepages on 32bit
Another round of various bug fixes came in. Damien improved SMR drive support a
bit, and Chao replaced BUG_ON() with reporting errors to user since we've not
hit from users but did hit from crafted images. We've found a disk layout bug
in large_nat_bits feature which supports very large NAT entries enabled at mkfs.
If the feature is enabled, it will give a notice to run fsck to correct the
on-disk layout.
Enhancement:
- reduce memory consumption for SMR drive
- better discard handling for multiple partitions
- tracepoints for f2fs_file_write_iter/f2fs_filemap_fault
- allow to change CP_CHKSUM_OFFSET
- detect wrong layout of large_nat_bitmap feature
- enhance checking valid data indices
Bug fix:
- Multiple partition support for SMR drive
- deadlock problem in f2fs_balance_fs_bg
- add boundary checks to fix abnormal behaviors on fuzzed images
- inline_xattr space calculations
- replace f2fs_bug_on with errors
In addition, this series contains various memory boundary check and sanity check
of on-disk consistency.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAlzZ/ukACgkQQBSofoJI
UNKnkQ//U6QsEgNsC5GeNAXPLxodxC406CSo+aRk//3xUdORzkVNia1ThLeYVNfd
3LsLaSy1eLZ0GylrR/MpvHt+eSWH5L5Ferj2v4l1zAYLr29FEhl6bmYPyvc3e4pr
IbHwC6W1g1sV2LxeNp7z0chkT7cOAm633d3OVJzUXD5B1k52lx72QnqoXal0Ae1L
tt3h/TVBIRJX26nSTiEZTBnIDmpiZYSsJVGgjqLnTCXHWUKPPfuJqALiELluhBNJ
M7gDE3/JYJyb+1PrdtvsEesPJxSLovGJJJvcJKcuMhWwij0Jsq2BwiP3shcfj8iA
76PiPlhjS6D5sMo1hJzbKettusfZrxX284UHNacrkgA/TBbHeGGBy3Fbh6B3+/o1
qvCl0atqp3km6Z8vg5r7nDTOMrg0JSjsHz3WA9ZXZ+cSM6mGxk7vYMd7Sn7h2GqY
deIfWqlTdB8hOFQJWDswXI2ILyJlvquc4jTKIUmHp4ZEQXdYSWlBUWm7+1XZvoYn
rlrAcr/loSQ/gT4U/Z9RQdpMYb2n2n3+YF8QAeTDmVafMeqbrwspCZf6l8Pfoto1
ZVYO9QUIsvRBDEiDHdLmRwb1ckTTiHiHrO9YwtA5zlYRRyH93MamQTsH7BTGt0OC
tjCPbpQGlWK6M9p3LNQ4H5lsdLzD7EEP0JXLOQ57rY3aDaZsOsE=
=HOz+
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"Another round of various bug fixes came in. Damien improved SMR drive
support a bit, and Chao replaced BUG_ON() with reporting errors to
user since we've not hit from users but did hit from crafted images.
We've found a disk layout bug in large_nat_bits feature which supports
very large NAT entries enabled at mkfs. If the feature is enabled, it
will give a notice to run fsck to correct the on-disk layout.
Enhancements:
- reduce memory consumption for SMR drive
- better discard handling for multiple partitions
- tracepoints for f2fs_file_write_iter/f2fs_filemap_fault
- allow to change CP_CHKSUM_OFFSET
- detect wrong layout of large_nat_bitmap feature
- enhance checking valid data indices
Bug fixes:
- Multiple partition support for SMR drive
- deadlock problem in f2fs_balance_fs_bg
- add boundary checks to fix abnormal behaviors on fuzzed images
- inline_xattr space calculations
- replace f2fs_bug_on with errors
In addition, this series contains various memory boundary check and
sanity check of on-disk consistency"
* tag 'f2fs-for-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
f2fs: fix to avoid accessing xattr across the boundary
f2fs: fix to avoid potential race on sbi->unusable_block_count access/update
f2fs: add tracepoint for f2fs_filemap_fault()
f2fs: introduce DATA_GENERIC_ENHANCE
f2fs: fix to handle error in f2fs_disable_checkpoint()
f2fs: remove redundant check in f2fs_file_write_iter()
f2fs: fix to be aware of readonly device in write_checkpoint()
f2fs: fix to skip recovery on readonly device
f2fs: fix to consider multiple device for readonly check
f2fs: relocate chksum_offset for large_nat_bitmap feature
f2fs: allow unfixed f2fs_checkpoint.checksum_offset
f2fs: Replace spaces with tab
f2fs: insert space before the open parenthesis '('
f2fs: allow address pointer number of dnode aligning to specified size
f2fs: introduce f2fs_read_single_page() for cleanup
f2fs: mark is_extension_exist() inline
f2fs: fix to set FI_UPDATE_WRITE correctly
f2fs: fix to avoid panic in f2fs_inplace_write_data()
f2fs: fix to do sanity check on valid block count of segment
f2fs: fix to do sanity check on valid node/block count
...
As of Linux 5.1, alpha and s390 are the last architectures that
have defconfig in arch/*/ instead of arch/*/configs/.
$ find arch -name defconfig | sort
arch/alpha/defconfig
arch/arm64/configs/defconfig
arch/csky/configs/defconfig
arch/nds32/configs/defconfig
arch/riscv/configs/defconfig
arch/s390/defconfig
The arch/$(ARCH)/defconfig is the hard-coded default in Kconfig,
and I want to deprecate it after evacuating the remaining defconfig
into the standard location, arch/*/configs/.
Define KBUILD_DEFCONFIG like other architectures, and move defconfig
into the configs/ subdirectory.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The CNTLID value is required to be unique, and we do rely on this
for correct operation. So reject any controller for which a non-unique
CNTLID has been detected.
Based on a patch from Hannes Reinecke.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Life becomes a lot simpler if we just use the global
nvme_subsystems_lock to protect this list. Given that it is only
accessed during controller probing and removal that isn't a scalability
problem either.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
This patch removes the tracing of the NVMe Async events out of the
switch so that it can trace all the events including the ones which
are not handled in the nvme_handle_aen_notice(). The events which
are not handled in the nvme_handle_aen_notice() such as
NVME_AER_NOTICE_DISC_CHANGED corresponding event identifier needs
to be added in the drivers/nvme/host/trace.h so that it can stringify
the AER .
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The variable 'count' is not currently used by nvmf_create_ctrl(), so
remove it.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The function should return NULL in case no device is found, but it
always returns the last checked mc device from the list even if the
index did not match. Fix that.
I did some analysis why this did not raise any issues for about 3 years
and the reason is that edac_mc_find() is mostly used to search for
existing devices. Thus, the bug is not triggered.
[ bp: Drop the if (mci->mc_idx > idx) test in favor of readability. ]
Fixes: c73e8833be ("EDAC, mc: Fix locking around mc_devices list")
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lkml.kernel.org/r/20190514104838.15065-1-rrichter@marvell.com
Add patch for realtek codec in Lenovo B50-70 that fixes inverted
internal microphone channel.
Device IdeaPad Y410P has the same PCI SSID as Lenovo B50-70,
but first one is about fix the noise and it didn't seem help in a
later kernel version.
So I replaced IdeaPad Y410P device description with B50-70 and apply
inverted microphone fix.
Bugzilla: https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/1524215
Signed-off-by: Michał Wadowski <wadosm@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull x86 MDS mitigations from Thomas Gleixner:
"Microarchitectural Data Sampling (MDS) is a hardware vulnerability
which allows unprivileged speculative access to data which is
available in various CPU internal buffers. This new set of misfeatures
has the following CVEs assigned:
CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling
CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling
CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling
CVE-2019-11091 MDSUM Microarchitectural Data Sampling Uncacheable Memory
MDS attacks target microarchitectural buffers which speculatively
forward data under certain conditions. Disclosure gadgets can expose
this data via cache side channels.
Contrary to other speculation based vulnerabilities the MDS
vulnerability does not allow the attacker to control the memory target
address. As a consequence the attacks are purely sampling based, but
as demonstrated with the TLBleed attack samples can be postprocessed
successfully.
The mitigation is to flush the microarchitectural buffers on return to
user space and before entering a VM. It's bolted on the VERW
instruction and requires a microcode update. As some of the attacks
exploit data structures shared between hyperthreads, full protection
requires to disable hyperthreading. The kernel does not do that by
default to avoid breaking unattended updates.
The mitigation set comes with documentation for administrators and a
deeper technical view"
* 'x86-mds-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/speculation/mds: Fix documentation typo
Documentation: Correct the possible MDS sysfs values
x86/mds: Add MDSUM variant to the MDS documentation
x86/speculation/mds: Add 'mitigations=' support for MDS
x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off
x86/speculation/mds: Fix comment
x86/speculation/mds: Add SMT warning message
x86/speculation: Move arch_smt_update() call to after mitigation decisions
x86/speculation/mds: Add mds=full,nosmt cmdline option
Documentation: Add MDS vulnerability documentation
Documentation: Move L1TF to separate directory
x86/speculation/mds: Add mitigation mode VMWERV
x86/speculation/mds: Add sysfs reporting for MDS
x86/speculation/mds: Add mitigation control for MDS
x86/speculation/mds: Conditionally clear CPU buffers on idle entry
x86/kvm/vmx: Add MDS protection when L1D Flush is not active
x86/speculation/mds: Clear CPU buffers on exit to user
x86/speculation/mds: Add mds_clear_cpu_buffers()
x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests
x86/speculation/mds: Add BUG_MSBDS_ONLY
...
Valid pathnames will never exceed PATH_MAX, but these file names
are unsanitized and can cause buffer overflow if set incorrectly.
Use snprintf to avoid this. This was flagged during a Coverity scan
of the coreboot project, which also uses kconfig for its build system.
Signed-off-by: Jacob Garber <jgarber1@ualberta.ca>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
conf_write_dep() has just one caller:
conf_write_dep("include/config/auto.conf.cmd");
"name" always points to a valid string.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The msm_gem_object structure contains resv and _resv fields that are
no longer needed since the reservation object is now stored on
drm_gem_object. msm_atomic_prepare_fb() and msm_atomic_prepare_fb()
both referenced the wrong reservation object, and would lead to an
attempt to dereference a NULL pointer. Correct those two cases to
point to the correct reservation object.
Fixes: dd55cf6929 ("drm: msm: Switch to use drm_gem_object reservation_object")
Cc: David Airlie <airlied@linux.ie>
Cc: linux-arm-msm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: freedreno@lists.freedesktop.org
Cc: Rob Herring <robh@kernel.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Maxime Ripard <maxime.ripard@bootlin.com>
Cc: Sean Paul <sean@poorly.run>
Acked-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Brian Masney <masneyb@onstation.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190513234105.7531-1-masneyb@onstation.org
Use devm_thermal_of_cooling_device_register() to register the cooling
device. Also use devm_add_action_or_reset() to stop the fan on device
removal, and to disable the pwm. Introduce a local 'dev' variable in
the probe function to make the code easier to read.
As a side effect, this fixes a bug seen if pwm_fan_of_get_cooling_data()
returned an error. In that situation, the pwm was not disabled, and
the fan was not stopped. Using devm functions also ensures that the
pwm is disabled and that the fan is stopped only after the hwmon device
has been unregistered.
Cc: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Use devm_thermal_of_cooling_device_register() to register the cooling
device. As a side effect, this fixes a driver bug:
thermal_cooling_device_unregister() was not called on device removal.
Fixes: f1fd4a4db7 ("hwmon: Add NPCM7xx PWM and Fan driver")
Cc: Tomer Maimon <tmaimon77@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Call devm_thermal_of_cooling_device_register() to register the cooling
device. Also introduce struct device *dev = &pdev->dev; to make the code
easier to read.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Call devm_thermal_of_cooling_device_register() to register the cooling
device. Also use devm_add_action_or_reset() to stop the fan on device
removal. This fixes a race condition since the fan was stopped before
the hwmon device was removed.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Use devm_thermal_of_cooling_device_register() to register the cooling
device. As a side effect, this fixes a driver bug:
thermal_cooling_device_unregister() was not called on removal.
Fixes: f198907d2f ("hwmon: (aspeed-pwm-tacho) cooling device support.")
Cc: Mykola Kostenok <c_mykolak@mellanox.com>
Cc: Joel Stanley <joel@jms.id.au>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Patrick Venture <venture@google.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Currently after store trip points number in 'ret', it is overwritten
afterwards, this cause incorrect trip point number always be shown in
the debug information after register of each thermal zone.
This patch fix this issue by moving get of trip number to
end of thermal zone registration.
Fixes: 6269e9f790 ("thermal: rcar_gen3_thermal: Register hwmon sysfs interface")
Signed-off-by: Jiada Wang <jiada_wang@mentor.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Update calculation for the R-Car Gen3 and RZ/G2 SoCs which have a
thermal IP block controlled by this driver. That is the:
* R-Car D3 (r8a77995)
* R-Car E2 (r8a77990)
* R-Car V3M (r8a77970)
* RZ/G2E (r8a774c0)
The calculation update is as documented in the R-Car Gen3 User's Manual,
v1.50 Nov 2018:
- When CTEMP is less than 24
T = CTEMP[5:0] * 5.5 - 72
- When CTEMP is equal to/greater than 24
T = CTEMP[5:0] * 5 - 60
This was inspired by a patch in the BSP by Van Do <van.do.xw@renesas.com>
Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com>
Tested-by: Simon Horman <horms+renesas@verge.net.au>
Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
The CPU load values passed to the thermal_power_cpu_get_power
tracepoint are zero for all CPUs, unless, unless the
thermal_power_cpu_limit tracepoint is enabled too:
irq/41-rockchip-98 [000] .... 290.972410: thermal_power_cpu_get_power:
cpus=0000000f freq=1800000 load={{0x0,0x0,0x0,0x0}} dynamic_power=4815
vs
irq/41-rockchip-96 [000] .... 95.773585: thermal_power_cpu_get_power:
cpus=0000000f freq=1800000 load={{0x56,0x64,0x64,0x5e}} dynamic_power=4959
irq/41-rockchip-96 [000] .... 95.773596: thermal_power_cpu_limit:
cpus=0000000f freq=408000 cdev_state=10 power=416
There seems to be no good reason for omitting the CPU load information
depending on another tracepoint. My guess is that the intention was to
check whether thermal_power_cpu_get_power is (still) enabled, however
'load_cpu != NULL' already indicates that it was at least enabled when
cpufreq_get_requested_power() was entered, there seems little gain
from omitting the assignment if the tracepoint was just disabled, so
just remove the check.
Fixes: 6828a4711f ("thermal: add trace events to the power allocator governor")
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Javi Merino <javi.merino@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
PX30 SOC has two Temperature Sensors for CPU and GPU.
Signed-off-by: Elaine Zhang <zhangqing@rock-chips.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Add a new compatible for thermal founding on PX30 SoCs.
Signed-off-by: Elaine Zhang <zhangqing@rock-chips.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Explicitly use the pinctrl to set/unset the right mode
instead of relying on the pinctrl init mode.
And it requires setting the tshut polarity before select pinctrl.
When the temperature sensor mode is set to 0, it will automatically
reset the board via the Clock-Reset-Unit (CRU) if the over temperature
threshold is reached. However, when the pinctrl initializes, it does a
transition to "otp_out" which may lead the SoC restart all the time.
"otp_out" IO may be connected to the RESET circuit on the hardware.
If the IO is in the wrong state, it will trigger RESET.
(similar to the effect of pressing the RESET button)
which will cause the soc to restart all the time.
Signed-off-by: Elaine Zhang <zhangqing@rock-chips.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Unlike DT framework, thermal-zones and its parameters can't be parsed
using ACPI framework. So that ACPI support is removed in this driver.
Signed-off-by: Srinath Mannam <srinath.mannam@broadcom.com>
Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
The devres.o gets linked if HAS_IOMEM is present so on ARCH=um
allyesconfig (COMPILE_TEST) failed on many files with:
drivers/thermal/thermal_mmio.o:
In function 'thermal_mmio_probe':thermal_mmio.c:(.text+0xe1):
undefined reference to `devm_ioremap_resource'
The users of devm_ioremap_resource() which are compile-testable
should depend on HAS_IOMEM.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Talel Shenhar <talel@amazon.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
The structure cpufreq_cooling_device provides a backpointer to the thermal
device but this one is used for a trace and to unregister. For the trace,
we don't really need this field and the unregister function as the same
pointer passed as parameter. Remove it.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
The copyright format does not conform to the format requested by
Linaro: https://wiki.linaro.org/Copyright
Fix it.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
When the static power computation was removed, the test with the power
being negative was not removed. However, the substraction which was
responsible of the negative value was removed and the variable is now
an u32. A double reason to remove the test which does not make sense.
Fixes: 84fe2cab48 ("cpu_cooling: Drop static-power related stuff")
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Currently IRQ remains enabled after .remove, later if device is probed,
IRQ is requested before .thermal_init, this may cause IRQ function be
called before device is initialized.
this patch disables interrupt in .remove, to ensure irq function
only be called after device is fully initialized.
Signed-off-by: Jiada Wang <jiada_wang@mentor.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Currently IRQF_SHARED type interrupt line is allocated, but it
is not appropriate, as the interrupt line isn't shared between
different devices, instead IRQF_ONESHOT is the proper type.
By changing interrupt type to IRQF_ONESHOT, now irq handler is
no longer needed, as clear of interrupt status can be done in
threaded interrupt context.
Because IRQF_ONESHOT type interrupt line is kept disabled until
the threaded handler has been run, so there is no need to protect
read/write of REG_GEN3_IRQSTR with lock.
Fixes: 7d4b269776 ("enable hardware interrupts for trip points")
Signed-off-by: Jiada Wang <jiada_wang@mentor.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Tested-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
thermal_of_cooling_device_register() and thermal_cooling_device_register()
are typically called from driver probe functions, and
thermal_cooling_device_unregister() is called from remove functions. This
makes both a perfect candidate for device managed functions.
Introduce devm_thermal_of_cooling_device_register(). This function can
also be used to replace thermal_cooling_device_register() by passing a NULL
pointer as device node. The new function requires both struct device *
and struct device_node * as parameters since the struct device_node *
parameter is not always identical to dev->of_node.
Don't introduce a device managed remove function since it is not needed
at this point.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Certain ADC channels, such as the xilinx-ams temperature channels, give
milliCelcius already when read with iio_read_channel_processed.
Rather than having to provide a 1:1 dummy lookup table, simply allow to
bypass the mechanism.
Signed-off-by: Jean-Francois Dagenais <jeff.dagenais@gmail.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>