Commit Graph

399 Commits

Author SHA1 Message Date
Mukesh Ojha
2c10a20f5e binder_alloc: Fix sleeping function called from invalid context
36c55ce870 ("binder_alloc: Replace kcalloc with kvcalloc to
mitigate OOM issues") introduced schedule while atomic issue.

[ 2689.152635][ T4275] BUG: sleeping function called from invalid context at mm/vmalloc.c:2847
[ 2689.161291][ T4275] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 4275, name: kworker/1:140
[ 2689.170708][ T4275] preempt_count: 1, expected: 0
[ 2689.175572][ T4275] RCU nest depth: 0, expected: 0
[ 2689.180521][ T4275] INFO: lockdep is turned off.
[ 2689.180523][ T4275] Preemption disabled at:
[ 2689.180525][ T4275] [<ffffffe031f2a2dc>] binder_alloc_deferred_release+0x2c/0x388
..
..
[ 2689.213419][ T4275]  __might_resched+0x174/0x178
[ 2689.213423][ T4275]  __might_sleep+0x48/0x7c
[ 2689.213426][ T4275]  vfree+0x4c/0x15c
[ 2689.213430][ T4275]  kvfree+0x24/0x44
[ 2689.213433][ T4275]  binder_alloc_deferred_release+0x2c0/0x388
[ 2689.213436][ T4275]  binder_proc_dec_tmpref+0x15c/0x2a8
[ 2689.213440][ T4275]  binder_deferred_func+0xa8/0x8ec
[ 2689.213442][ T4275]  process_one_work+0x254/0x59c
[ 2689.213447][ T4275]  worker_thread+0x274/0x3ec
[ 2689.213450][ T4275]  kthread+0x110/0x134
[ 2689.213453][ T4275]  ret_from_fork+0x10/0x20

Fix it by moving the place of kvfree outside of spinlock context.

Fixes: 36c55ce870 ("binder_alloc: Replace kcalloc with kvcalloc to mitigate OOM issues")
Acked-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
Link: https://lore.kernel.org/r/20240725062510.2856662-1-quic_mojha@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-31 13:48:25 +02:00
Carlos Llamas
11512c197d binder: fix descriptor lookup for context manager
In commit 15d9da3f81 ("binder: use bitmap for faster descriptor
lookup"), it was incorrectly assumed that references to the context
manager node should always get descriptor zero assigned to them.

However, if the context manager dies and a new process takes its place,
then assigning descriptor zero to the new context manager might lead to
collisions, as there could still be references to the older node. This
issue was reported by syzbot with the following trace:

  kernel BUG at drivers/android/binder.c:1173!
  Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 1 PID: 447 Comm: binder-util Not tainted 6.10.0-rc6-00348-g31643d84b8c3 #10
  Hardware name: linux,dummy-virt (DT)
  pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : binder_inc_ref_for_node+0x500/0x544
  lr : binder_inc_ref_for_node+0x1e4/0x544
  sp : ffff80008112b940
  x29: ffff80008112b940 x28: ffff0e0e40310780 x27: 0000000000000000
  x26: 0000000000000001 x25: ffff0e0e40310738 x24: ffff0e0e4089ba34
  x23: ffff0e0e40310b00 x22: ffff80008112bb50 x21: ffffaf7b8f246970
  x20: ffffaf7b8f773f08 x19: ffff0e0e4089b800 x18: 0000000000000000
  x17: 0000000000000000 x16: 0000000000000000 x15: 000000002de4aa60
  x14: 0000000000000000 x13: 2de4acf000000000 x12: 0000000000000020
  x11: 0000000000000018 x10: 0000000000000020 x9 : ffffaf7b90601000
  x8 : ffff0e0e48739140 x7 : 0000000000000000 x6 : 000000000000003f
  x5 : ffff0e0e40310b28 x4 : 0000000000000000 x3 : ffff0e0e40310720
  x2 : ffff0e0e40310728 x1 : 0000000000000000 x0 : ffff0e0e40310710
  Call trace:
   binder_inc_ref_for_node+0x500/0x544
   binder_transaction+0xf68/0x2620
   binder_thread_write+0x5bc/0x139c
   binder_ioctl+0xef4/0x10c8
  [...]

This patch adds back the previous behavior of assigning the next
non-zero descriptor if references to previous context managers still
exist. It amends both strategies, the newer dbitmap code and also the
legacy slow_desc_lookup_olocked(), by allowing them to start looking
for available descriptors at a given offset.

Fixes: 15d9da3f81 ("binder: use bitmap for faster descriptor lookup")
Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+3dae065ca76952a67257@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000c1c0a0061d1e6979@google.com/
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20240722150512.4192473-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-31 13:47:48 +02:00
Carlos Llamas
31643d84b8 binder: fix hang of unregistered readers
With the introduction of binder_available_for_proc_work_ilocked() in
commit 1b77e9dcc3 ("ANDROID: binder: remove proc waitqueue") a binder
thread can only "wait_for_proc_work" after its thread->looper has been
marked as BINDER_LOOPER_STATE_{ENTERED|REGISTERED}.

This means an unregistered reader risks waiting indefinitely for work
since it never gets added to the proc->waiting_threads. If there are no
further references to its waitqueue either the task will hang. The same
applies to readers using the (e)poll interface.

I couldn't find the rationale behind this restriction. So this patch
restores the previous behavior of allowing unregistered threads to
"wait_for_proc_work". Note that an error message for this scenario,
which had previously become unreachable, is now re-enabled.

Fixes: 1b77e9dcc3 ("ANDROID: binder: remove proc waitqueue")
Cc: stable@vger.kernel.org
Cc: Martijn Coenen <maco@google.com>
Cc: Arve Hjønnevåg <arve@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20240711201452.2017543-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-12 11:31:37 +02:00
Lei Liu
36c55ce870 binder_alloc: Replace kcalloc with kvcalloc to mitigate OOM issues
In binder_alloc, there is a frequent need for order3 memory allocation,
especially on small-memory mobile devices, which can lead to OOM and
cause foreground applications to be killed, resulting in flashbacks.

We use kvcalloc to allocate memory, which can reduce system OOM
occurrences, as well as decrease the time and probability of failure for
order3 memory allocations. Additionally, It has little impact on the
throughput of the binder. (as verified by Google's binder_benchmark
testing tool).

We have conducted multiple tests on an 8GB memory phone, kvcalloc has
little performance degradation and resolves frequent OOM issues, Below
is a partial excerpt of the test data.

throughput(TH_PUT) = (size * Iterations)/Time
kcalloc->kvcalloc:

Sample with kcalloc():
adb shell stop/ kcalloc /8+256G
---------------------------------------------------------------------
Benchmark                Time     CPU   Iterations  TH-PUT  TH-PUTCPU
                         (ns)     (ns)              (GB/s)    (GB/s)
---------------------------------------------------------------------
BM_sendVec_binder4      39126    18550    38894    3.976282  8.38684
BM_sendVec_binder8      38924    18542    37786    7.766108  16.3028
BM_sendVec_binder16     38328    18228    36700    15.32039  32.2141
BM_sendVec_binder32     38154    18215    38240    32.07213  67.1798
BM_sendVec_binder64     39093    18809    36142    59.16885  122.977
BM_sendVec_binder128    40169    19188    36461    116.1843  243.2253
BM_sendVec_binder256    40695    19559    35951    226.1569  470.5484
BM_sendVec_binder512    41446    20211    34259    423.2159  867.8743
BM_sendVec_binder1024   44040    22939    28904    672.0639  1290.278
BM_sendVec_binder2048   47817    25821    26595    1139.063  2109.393
BM_sendVec_binder4096   54749    30905    22742    1701.423  3014.115
BM_sendVec_binder8192   68316    42017    16684    2000.634  3252.858
BM_sendVec_binder16384  95435    64081    10961    1881.752  2802.469
BM_sendVec_binder32768  148232  107504     6510    1439.093  1984.295
BM_sendVec_binder65536  326499  229874     3178    637.8991  906.0329
NORAML TEST                                 SUM    10355.79  17188.15
stressapptest eat 2G                        SUM    10088.39  16625.97

Sample with kvcalloc():
adb shell stop/ kvcalloc /8+256G
----------------------------------------------------------------------
Benchmark                Time     CPU   Iterations  TH-PUT  TH-PUTCPU
                         (ns)     (ns)              (GB/s)    (GB/s)
----------------------------------------------------------------------
BM_sendVec_binder4       39673    18832    36598    3.689965  7.773577
BM_sendVec_binder8       39869    18969    37188    7.462038  15.68369
BM_sendVec_binder16      39774    18896    36627    14.73405  31.01355
BM_sendVec_binder32      40225    19125    36995    29.43045  61.90013
BM_sendVec_binder64      40549    19529    35148    55.47544  115.1862
BM_sendVec_binder128     41580    19892    35384    108.9262  227.6871
BM_sendVec_binder256     41584    20059    34060    209.6806  434.6857
BM_sendVec_binder512     42829    20899    32493    388.4381  796.0389
BM_sendVec_binder1024    45037    23360    29251    665.0759  1282.236
BM_sendVec_binder2048    47853    25761    27091    1159.433  2153.735
BM_sendVec_binder4096    55574    31745    22405    1651.328  2890.877
BM_sendVec_binder8192    70706    43693    16400    1900.105  3074.836
BM_sendVec_binder16384   96161    64362    10793    1838.921  2747.468
BM_sendVec_binder32768  147875   107292     6296    1395.147  1922.858
BM_sendVec_binder65536  330324   232296     3053    605.7126  861.3209
NORAML TEST                                 SUM     10033.56  16623.35
stressapptest eat 2G                        SUM      9958.43  16497.55

Signed-off-by: Lei Liu <liulei.rjpt@vivo.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20240619113841.3362-1-liulei.rjpt@vivo.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-03 16:22:36 +02:00
Carlos Llamas
15d9da3f81 binder: use bitmap for faster descriptor lookup
When creating new binder references, the driver assigns a descriptor id
that is shared with userspace. Regrettably, the driver needs to keep the
descriptors small enough to accommodate userspace potentially using them
as Vector indexes. Currently, the driver performs a linear search on the
rb-tree of references to find the smallest available descriptor id. This
approach, however, scales poorly as the number of references grows.

This patch introduces the usage of bitmaps to boost the performance of
descriptor assignments. This optimization results in notable performance
gains, particularly in processes with a large number of references. The
following benchmark with 100,000 references showcases the difference in
latency between the dbitmap implementation and the legacy approach:

  [  587.145098] get_ref_desc_olocked: 15us (dbitmap on)
  [  602.788623] get_ref_desc_olocked: 47343us (dbitmap off)

Note the bitmap size is dynamically adjusted in line with the number of
references, ensuring efficient memory usage. In cases where growing the
bitmap is not possible, the driver falls back to the slow legacy method.

A previous attempt to solve this issue was proposed in [1]. However,
such method involved adding new ioctls which isn't great, plus older
userspace code would not have benefited from the optimizations either.

Link: https://lore.kernel.org/all/20240417191418.1341988-1-cmllamas@google.com/ [1]
Cc: Tim Murray <timmurray@google.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Martijn Coenen <maco@android.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: John Stultz <jstultz@google.com>
Cc: Steven Moreland <smoreland@google.com>
Suggested-by: Nick Chen <chenjia3@oppo.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20240612042535.1556708-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-03 16:21:59 +02:00
Carlos Llamas
4231694133 binder: fix max_thread type inconsistency
The type defined for the BINDER_SET_MAX_THREADS ioctl was changed from
size_t to __u32 in order to avoid incompatibility issues between 32 and
64-bit kernels. However, the internal types used to copy from user and
store the value were never updated. Use u32 to fix the inconsistency.

Fixes: a9350fc859 ("staging: android: binder: fix BINDER_SET_MAX_THREADS declaration")
Reported-by: Arve Hjønnevåg <arve@android.com>
Cc: stable@vger.kernel.org
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240421173750.3117808-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-04 18:59:47 +02:00
Carlos Llamas
aaef73821a binder: check offset alignment in binder_get_object()
Commit 6d98eb95b4 ("binder: avoid potential data leakage when copying
txn") introduced changes to how binder objects are copied. In doing so,
it unintentionally removed an offset alignment check done through calls
to binder_alloc_copy_from_buffer() -> check_buffer().

These calls were replaced in binder_get_object() with copy_from_user(),
so now an explicit offset alignment check is needed here. This avoids
later complications when unwinding the objects gets harder.

It is worth noting this check existed prior to commit 7a67a39320
("binder: add function to copy binder object from buffer"), likely
removed due to redundancy at the time.

Fixes: 6d98eb95b4 ("binder: avoid potential data leakage when copying txn")
Cc: stable@vger.kernel.org
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20240330190115.1877819-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-11 15:19:12 +02:00
Linus Torvalds
bb41fe35dc Char/Misc and other driver subsystem updates for 6.9-rc1
Here is the big set of char/misc and a number of other driver subsystem
 updates for 6.9-rc1.  Included in here are:
   - IIO driver updates, loads of new ones and evolution of existing ones
   - coresight driver updates
   - const cleanups for many driver subsystems
   - speakup driver additions
   - platform remove callback void cleanups
   - mei driver updates
   - mhi driver updates
   - cdx driver updates for MSI interrupt handling
   - nvmem driver updates
   - other smaller driver updates and cleanups, full details in the
     shortlog
 
 All of these have been in linux-next for a long time with no reported
 issue, other than a build warning with some older versions of gcc for a
 speakup driver, fix for that will come in a few days when I catch up
 with my pending patch queues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZfwuLg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynKVACgjvR1cD8NYk9PcGWc9ZaXAZ6zSnwAn260kMoe
 lLFtwszo7m0N6ZULBWBd
 =y3yz
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc and other driver subsystem updates from Greg KH:
 "Here is the big set of char/misc and a number of other driver
  subsystem updates for 6.9-rc1. Included in here are:

   - IIO driver updates, loads of new ones and evolution of existing ones

   - coresight driver updates

   - const cleanups for many driver subsystems

   - speakup driver additions

   - platform remove callback void cleanups

   - mei driver updates

   - mhi driver updates

   - cdx driver updates for MSI interrupt handling

   - nvmem driver updates

   - other smaller driver updates and cleanups, full details in the
    shortlog

  All of these have been in linux-next for a long time with no reported
  issue, other than a build warning for the speakup driver"

The build warning hits clang and is a gcc (and C23) extension, and is
fixed up in the merge.

Link: https://lore.kernel.org/all/20240321134831.GA2762840@dev-arch.thelio-3990X/

* tag 'char-misc-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (279 commits)
  binder: remove redundant variable page_addr
  uio_dmem_genirq: UIO_MEM_DMA_COHERENT conversion
  uio_pruss: UIO_MEM_DMA_COHERENT conversion
  cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
  uio: introduce UIO_MEM_DMA_COHERENT type
  cdx: add MSI support for CDX bus
  pps: use cflags-y instead of EXTRA_CFLAGS
  speakup: Add /dev/synthu device
  speakup: Fix 8bit characters from direct synth
  parport: sunbpp: Convert to platform remove callback returning void
  parport: amiga: Convert to platform remove callback returning void
  char: xillybus: Convert to platform remove callback returning void
  vmw_balloon: change maintainership
  MAINTAINERS: change the maintainer for hpilo driver
  char: xilinx_hwicap: Fix NULL vs IS_ERR() bug
  hpet: remove hpets::hp_clocksource
  platform: goldfish: move the separate 'default' propery for CONFIG_GOLDFISH
  char: xilinx_hwicap: drop casting to void in dev_set_drvdata
  greybus: move is_gb_* functions out of greybus.h
  greybus: Remove usage of the deprecated ida_simple_xx() API
  ...
2024-03-21 13:21:31 -07:00
Colin Ian King
367b3560e1 binder: remove redundant variable page_addr
Variable page_addr is being assigned a value that is never read. The
variable is redundant and can be removed.

Cleans up clang scan build warning:
warning: Value stored to 'page_addr' is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@intel.com>
Fixes: 162c797314 ("binder: avoid user addresses in debug logs")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312060851.cudv98wG-lkp@intel.com/
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20240307221505.101431-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-07 22:22:32 +00:00
Pierre Gondois
3fa2601e4a binder: use of hlist_count_nodes()
Make use of the newly added hlist_count_nodes().

Link: https://lkml.kernel.org/r/20240104164937.424320-3-pierre.gondois@arm.com
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Acked-by: Marco Elver <elver@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Coly Li <colyli@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Martijn Coenen <maco@android.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Todd Kjos <tkjos@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22 15:38:51 -08:00
Carlos Llamas
97830f3c30 binder: signal epoll threads of self-work
In (e)poll mode, threads often depend on I/O events to determine when
data is ready for consumption. Within binder, a thread may initiate a
command via BINDER_WRITE_READ without a read buffer and then make use
of epoll_wait() or similar to consume any responses afterwards.

It is then crucial that epoll threads are signaled via wakeup when they
queue their own work. Otherwise, they risk waiting indefinitely for an
event leaving their work unhandled. What is worse, subsequent commands
won't trigger a wakeup either as the thread has pending work.

Fixes: 457b9a6f09 ("Staging: android: add binder driver")
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Martijn Coenen <maco@android.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Steven Moreland <smoreland@google.com>
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20240131215347.1808751-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-31 14:08:28 -08:00
Linus Torvalds
296455ade1 Char/Misc and other Driver changes for 6.8-rc1
Here is the big set of char/misc and other driver subsystem changes for
 6.8-rc1.  Lots of stuff in here, but first off, you will get a merge
 conflict in drivers/android/binder_alloc.c when merging this tree due to
 changing coming in through the -mm tree.
 
 The resolution of the merge issue can be found here:
 	https://lore.kernel.org/r/20231207134213.25631ae9@canb.auug.org.au
 or in a simpler patch form in that thread:
 	https://lore.kernel.org/r/ZXHzooF07LfQQYiE@google.com
 
 If there are issues with the merge of this file, please let me know.
 
 Other than lots of binder driver changes (as you can see by the merge
 conflicts) included in here are:
  - lots of iio driver updates and additions
  - spmi driver updates
  - eeprom driver updates
  - firmware driver updates
  - ocxl driver updates
  - mhi driver updates
  - w1 driver updates
  - nvmem driver updates
  - coresight driver updates
  - platform driver remove callback api changes
  - tags.sh script updates
  - bus_type constant marking cleanups
  - lots of other small driver updates
 
 All of these have been in linux-next for a while with no reported issues
 (other than the binder merge conflict.)
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZaeMMQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynWNgCfQ/Yz7QO6EMLDwHO5LRsb3YMhjL4AoNVdanjP
 YoI7f1I4GBcC0GKNfK6s
 =+Kyv
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc and other driver updates from Greg KH:
 "Here is the big set of char/misc and other driver subsystem changes
  for 6.8-rc1.

  Other than lots of binder driver changes (as you can see by the merge
  conflicts) included in here are:

   - lots of iio driver updates and additions

   - spmi driver updates

   - eeprom driver updates

   - firmware driver updates

   - ocxl driver updates

   - mhi driver updates

   - w1 driver updates

   - nvmem driver updates

   - coresight driver updates

   - platform driver remove callback api changes

   - tags.sh script updates

   - bus_type constant marking cleanups

   - lots of other small driver updates

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (341 commits)
  android: removed duplicate linux/errno
  uio: Fix use-after-free in uio_open
  drivers: soc: xilinx: add check for platform
  firmware: xilinx: Export function to use in other module
  scripts/tags.sh: remove find_sources
  scripts/tags.sh: use -n to test archinclude
  scripts/tags.sh: add local annotation
  scripts/tags.sh: use more portable -path instead of -wholename
  scripts/tags.sh: Update comment (addition of gtags)
  firmware: zynqmp: Convert to platform remove callback returning void
  firmware: turris-mox-rwtm: Convert to platform remove callback returning void
  firmware: stratix10-svc: Convert to platform remove callback returning void
  firmware: stratix10-rsu: Convert to platform remove callback returning void
  firmware: raspberrypi: Convert to platform remove callback returning void
  firmware: qemu_fw_cfg: Convert to platform remove callback returning void
  firmware: mtk-adsp-ipc: Convert to platform remove callback returning void
  firmware: imx-dsp: Convert to platform remove callback returning void
  firmware: coreboot_table: Convert to platform remove callback returning void
  firmware: arm_scpi: Convert to platform remove callback returning void
  firmware: arm_scmi: Convert to platform remove callback returning void
  ...
2024-01-17 16:47:17 -08:00
Linus Torvalds
fb46e22a9e Many singleton patches against the MM code. The patch series which
are included in this merge do the following:
 
 - Peng Zhang has done some mapletree maintainance work in the
   series
 
 	"maple_tree: add mt_free_one() and mt_attr() helpers"
 	"Some cleanups of maple tree"
 
 - In the series "mm: use memmap_on_memory semantics for dax/kmem"
   Vishal Verma has altered the interworking between memory-hotplug
   and dax/kmem so that newly added 'device memory' can more easily
   have its memmap placed within that newly added memory.
 
 - Matthew Wilcox continues folio-related work (including a few
   fixes) in the patch series
 
 	"Add folio_zero_tail() and folio_fill_tail()"
 	"Make folio_start_writeback return void"
 	"Fix fault handler's handling of poisoned tail pages"
 	"Convert aops->error_remove_page to ->error_remove_folio"
 	"Finish two folio conversions"
 	"More swap folio conversions"
 
 - Kefeng Wang has also contributed folio-related work in the series
 
 	"mm: cleanup and use more folio in page fault"
 
 - Jim Cromie has improved the kmemleak reporting output in the
   series "tweak kmemleak report format".
 
 - In the series "stackdepot: allow evicting stack traces" Andrey
   Konovalov to permits clients (in this case KASAN) to cause
   eviction of no longer needed stack traces.
 
 - Charan Teja Kalla has fixed some accounting issues in the page
   allocator's atomic reserve calculations in the series "mm:
   page_alloc: fixes for high atomic reserve caluculations".
 
 - Dmitry Rokosov has added to the samples/ dorectory some sample
   code for a userspace memcg event listener application.  See the
   series "samples: introduce cgroup events listeners".
 
 - Some mapletree maintanance work from Liam Howlett in the series
   "maple_tree: iterator state changes".
 
 - Nhat Pham has improved zswap's approach to writeback in the
   series "workload-specific and memory pressure-driven zswap
   writeback".
 
 - DAMON/DAMOS feature and maintenance work from SeongJae Park in
   the series
 
 	"mm/damon: let users feed and tame/auto-tune DAMOS"
 	"selftests/damon: add Python-written DAMON functionality tests"
 	"mm/damon: misc updates for 6.8"
 
 - Yosry Ahmed has improved memcg's stats flushing in the series
   "mm: memcg: subtree stats flushing and thresholds".
 
 - In the series "Multi-size THP for anonymous memory" Ryan Roberts
   has added a runtime opt-in feature to transparent hugepages which
   improves performance by allocating larger chunks of memory during
   anonymous page faults.
 
 - Matthew Wilcox has also contributed some cleanup and maintenance
   work against eh buffer_head code int he series "More buffer_head
   cleanups".
 
 - Suren Baghdasaryan has done work on Andrea Arcangeli's series
   "userfaultfd move option".  UFFDIO_MOVE permits userspace heap
   compaction algorithms to move userspace's pages around rather than
   UFFDIO_COPY'a alloc/copy/free.
 
 - Stefan Roesch has developed a "KSM Advisor", in the series
   "mm/ksm: Add ksm advisor".  This is a governor which tunes KSM's
   scanning aggressiveness in response to userspace's current needs.
 
 - Chengming Zhou has optimized zswap's temporary working memory
   use in the series "mm/zswap: dstmem reuse optimizations and
   cleanups".
 
 - Matthew Wilcox has performed some maintenance work on the
   writeback code, both code and within filesystems.  The series is
   "Clean up the writeback paths".
 
 - Andrey Konovalov has optimized KASAN's handling of alloc and
   free stack traces for secondary-level allocators, in the series
   "kasan: save mempool stack traces".
 
 - Andrey also performed some KASAN maintenance work in the series
   "kasan: assorted clean-ups".
 
 - David Hildenbrand has gone to town on the rmap code.  Cleanups,
   more pte batching, folio conversions and more.  See the series
   "mm/rmap: interface overhaul".
 
 - Kinsey Ho has contributed some maintenance work on the MGLRU
   code in the series "mm/mglru: Kconfig cleanup".
 
 - Matthew Wilcox has contributed lruvec page accounting code
   cleanups in the series "Remove some lruvec page accounting
   functions".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZyF2wAKCRDdBJ7gKXxA
 jjWjAP42LHvGSjp5M+Rs2rKFL0daBQsrlvy6/jCHUequSdWjSgEAmOx7bc5fbF27
 Oa8+DxGM9C+fwqZ/7YxU2w/WuUmLPgU=
 =0NHs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Many singleton patches against the MM code. The patch series which are
  included in this merge do the following:

   - Peng Zhang has done some mapletree maintainance work in the series

	'maple_tree: add mt_free_one() and mt_attr() helpers'
	'Some cleanups of maple tree'

   - In the series 'mm: use memmap_on_memory semantics for dax/kmem'
     Vishal Verma has altered the interworking between memory-hotplug
     and dax/kmem so that newly added 'device memory' can more easily
     have its memmap placed within that newly added memory.

   - Matthew Wilcox continues folio-related work (including a few fixes)
     in the patch series

	'Add folio_zero_tail() and folio_fill_tail()'
	'Make folio_start_writeback return void'
	'Fix fault handler's handling of poisoned tail pages'
	'Convert aops->error_remove_page to ->error_remove_folio'
	'Finish two folio conversions'
	'More swap folio conversions'

   - Kefeng Wang has also contributed folio-related work in the series

	'mm: cleanup and use more folio in page fault'

   - Jim Cromie has improved the kmemleak reporting output in the series
     'tweak kmemleak report format'.

   - In the series 'stackdepot: allow evicting stack traces' Andrey
     Konovalov to permits clients (in this case KASAN) to cause eviction
     of no longer needed stack traces.

   - Charan Teja Kalla has fixed some accounting issues in the page
     allocator's atomic reserve calculations in the series 'mm:
     page_alloc: fixes for high atomic reserve caluculations'.

   - Dmitry Rokosov has added to the samples/ dorectory some sample code
     for a userspace memcg event listener application. See the series
     'samples: introduce cgroup events listeners'.

   - Some mapletree maintanance work from Liam Howlett in the series
     'maple_tree: iterator state changes'.

   - Nhat Pham has improved zswap's approach to writeback in the series
     'workload-specific and memory pressure-driven zswap writeback'.

   - DAMON/DAMOS feature and maintenance work from SeongJae Park in the
     series

	'mm/damon: let users feed and tame/auto-tune DAMOS'
	'selftests/damon: add Python-written DAMON functionality tests'
	'mm/damon: misc updates for 6.8'

   - Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
     memcg: subtree stats flushing and thresholds'.

   - In the series 'Multi-size THP for anonymous memory' Ryan Roberts
     has added a runtime opt-in feature to transparent hugepages which
     improves performance by allocating larger chunks of memory during
     anonymous page faults.

   - Matthew Wilcox has also contributed some cleanup and maintenance
     work against eh buffer_head code int he series 'More buffer_head
     cleanups'.

   - Suren Baghdasaryan has done work on Andrea Arcangeli's series
     'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
     compaction algorithms to move userspace's pages around rather than
     UFFDIO_COPY'a alloc/copy/free.

   - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
     Add ksm advisor'. This is a governor which tunes KSM's scanning
     aggressiveness in response to userspace's current needs.

   - Chengming Zhou has optimized zswap's temporary working memory use
     in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.

   - Matthew Wilcox has performed some maintenance work on the writeback
     code, both code and within filesystems. The series is 'Clean up the
     writeback paths'.

   - Andrey Konovalov has optimized KASAN's handling of alloc and free
     stack traces for secondary-level allocators, in the series 'kasan:
     save mempool stack traces'.

   - Andrey also performed some KASAN maintenance work in the series
     'kasan: assorted clean-ups'.

   - David Hildenbrand has gone to town on the rmap code. Cleanups, more
     pte batching, folio conversions and more. See the series 'mm/rmap:
     interface overhaul'.

   - Kinsey Ho has contributed some maintenance work on the MGLRU code
     in the series 'mm/mglru: Kconfig cleanup'.

   - Matthew Wilcox has contributed lruvec page accounting code cleanups
     in the series 'Remove some lruvec page accounting functions'"

* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
  mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
  mm, treewide: introduce NR_PAGE_ORDERS
  selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
  selftests/mm: skip test if application doesn't has root privileges
  selftests/mm: conform test to TAP format output
  selftests: mm: hugepage-mmap: conform to TAP format output
  selftests/mm: gup_test: conform test to TAP format output
  mm/selftests: hugepage-mremap: conform test to TAP format output
  mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
  mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
  mm/memcontrol: remove __mod_lruvec_page_state()
  mm/khugepaged: use a folio more in collapse_file()
  slub: use a folio in __kmalloc_large_node
  slub: use folio APIs in free_large_kmalloc()
  slub: use alloc_pages_node() in alloc_slab_page()
  mm: remove inc/dec lruvec page state functions
  mm: ratelimit stat flush from workingset shrinker
  kasan: stop leaking stack trace handles
  mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
  mm/mglru: add dummy pmd_dirty()
  ...
2024-01-09 11:18:47 -08:00
Tanzir Hasan
5850edccec android: removed duplicate linux/errno
There are two linux/errno.h inclusions in this file. The second one has
been removed and the file builds correctly.

Fixes: 54ffdab820 ("android: binder: binderfs.c: removed asm-generic/errno-base.h")
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Tanzir Hasan <tanzirh@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20240104-removeduperror-v1-1-d170d4b3675a@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-07 09:48:26 +01:00
Tanzir Hasan
54ffdab820 android: binder: binderfs.c: removed asm-generic/errno-base.h
asm-generic/errno-base.h can be replaced by linux/errno.h and the file
will still build correctly. It is an asm-generic file which should be
avoided if possible.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Tanzir Hasan <tanzirh@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231226-binderfs-v1-1-66829e92b523@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-04 16:36:07 +01:00
Nhat Pham
0a97c01cd2 list_lru: allow explicit memcg and NUMA node selection
Patch series "workload-specific and memory pressure-driven zswap
writeback", v8.

There are currently several issues with zswap writeback:

1. There is only a single global LRU for zswap, making it impossible to
   perform worload-specific shrinking - an memcg under memory pressure
   cannot determine which pages in the pool it owns, and often ends up
   writing pages from other memcgs. This issue has been previously
   observed in practice and mitigated by simply disabling
   memcg-initiated shrinking:

   https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u

   But this solution leaves a lot to be desired, as we still do not
   have an avenue for an memcg to free up its own memory locked up in
   the zswap pool.

2. We only shrink the zswap pool when the user-defined limit is hit.
   This means that if we set the limit too high, cold data that are
   unlikely to be used again will reside in the pool, wasting precious
   memory. It is hard to predict how much zswap space will be needed
   ahead of time, as this depends on the workload (specifically, on
   factors such as memory access patterns and compressibility of the
   memory pages).

This patch series solves these issues by separating the global zswap LRU
into per-memcg and per-NUMA LRUs, and performs workload-specific (i.e
memcg- and NUMA-aware) zswap writeback under memory pressure.  The new
shrinker does not have any parameter that must be tuned by the user, and
can be opted in or out on a per-memcg basis.

As a proof of concept, we ran the following synthetic benchmark: build the
linux kernel in a memory-limited cgroup, and allocate some cold data in
tmpfs to see if the shrinker could write them out and improved the overall
performance.  Depending on the amount of cold data generated, we observe
from 14% to 35% reduction in kernel CPU time used in the kernel builds.


This patch (of 6):

The interface of list_lru is based on the assumption that the list node
and the data it represents belong to the same allocated on the correct
node/memcg.  While this assumption is valid for existing slab objects LRU
such as dentries and inodes, it is undocumented, and rather inflexible for
certain potential list_lru users (such as the upcoming zswap shrinker and
the THP shrinker).  It has caused us a lot of issues during our
development.

This patch changes list_lru interface so that the caller must explicitly
specify numa node and memcg when adding and removing objects.  The old
list_lru_add() and list_lru_del() are renamed to list_lru_add_obj() and
list_lru_del_obj(), respectively.

It also extends the list_lru API with a new function, list_lru_putback,
which undoes a previous list_lru_isolate call.  Unlike list_lru_add, it
does not increment the LRU node count (as list_lru_isolate does not
decrement the node count).  list_lru_putback also allows for explicit
memcg and NUMA node selection.

Link: https://lkml.kernel.org/r/20231130194023.4102148-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20231130194023.4102148-2-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12 10:57:01 -08:00
Christian Brauner
a88c955fcf
file: s/close_fd_get_file()/file_close_fd()/g
That really shouldn't have "get" in there as that implies we're bumping
the reference count which we don't do at all. We used to but not anmore.
Now we're just closing the fd and pick that file from the fdtable
without bumping the reference count. Update the wrong documentation
while at it.

Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-1-e73ca6f4ea83@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-12-12 14:24:13 +01:00
Randy Dunlap
96d1d578de android: binder: fix a kernel-doc enum warning
Add kernel-doc notation for @LOOP_END to prevent a kernel-doc
warning.

binder_alloc_selftest.c:76: warning: Enum value 'LOOP_END' not described in enum 'buf_end_align_type'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: Martijn Coenen <maco@android.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Christian Brauner <christian@brauner.io>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231205225324.32362-1-rdunlap@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06 08:48:26 +09:00
Carlos Llamas
7710e2cca3 binder: switch alloc->mutex to spinlock_t
The alloc->mutex is a highly contended lock that causes performance
issues on Android devices. When a low-priority task is given this lock
and it sleeps, it becomes difficult for the task to wake up and complete
its work. This delays other tasks that are also waiting on the mutex.

The problem gets worse when there is memory pressure in the system,
because this increases the contention on the alloc->mutex while the
shrinker reclaims binder pages.

Switching to a spinlock helps to keep the waiters running and avoids the
overhead of waking up tasks. This significantly improves the transaction
latency when the problematic scenario occurs.

The performance impact of this patchset was measured by stress-testing
the binder alloc contention. In this test, several clients of different
priorities send thousands of transactions of different sizes to a single
server. In parallel, pages get reclaimed using the shinker's debugfs.

The test was run on a Pixel 8, Pixel 6 and qemu machine. The results
were similar on all three devices:

after:
  | sched  | prio | average | max       | min     |
  |--------+------+---------+-----------+---------|
  | fifo   |   99 | 0.135ms |   1.197ms | 0.022ms |
  | fifo   |   01 | 0.136ms |   5.232ms | 0.018ms |
  | other  |  -20 | 0.180ms |   7.403ms | 0.019ms |
  | other  |   19 | 0.241ms |  58.094ms | 0.018ms |

before:
  | sched  | prio | average | max       | min     |
  |--------+------+---------+-----------+---------|
  | fifo   |   99 | 0.350ms | 248.730ms | 0.020ms |
  | fifo   |   01 | 0.357ms | 248.817ms | 0.024ms |
  | other  |  -20 | 0.399ms | 249.906ms | 0.020ms |
  | other  |   19 | 0.477ms | 297.756ms | 0.022ms |

The key metrics above are the average and max latencies (wall time).
These improvements should roughly translate to p95-p99 latencies on real
workloads. The response time is up to 200x faster in these scenarios and
there is no penalty in the regular path.

Note that it is only possible to convert this lock after a series of
changes made by previous patches. These mainly include refactoring the
sections that might_sleep() and changing the locking order with the
mmap_lock amongst others.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-29-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:41 +09:00
Carlos Llamas
e50f4e6cc9 binder: reverse locking order in shrinker callback
The locking order currently requires the alloc->mutex to be acquired
first followed by the mmap lock. However, the alloc->mutex is converted
into a spinlock in subsequent commits so the order needs to be reversed
to avoid nesting the sleeping mmap lock under the spinlock.

The shrinker's callback binder_alloc_free_page() is the only place that
needs to be reordered since other functions have been refactored and no
longer nest these locks.

Some minor cosmetic changes are also included in this patch.

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-28-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:41 +09:00
Carlos Llamas
162c797314 binder: avoid user addresses in debug logs
Prefer logging vma offsets instead of addresses or simply drop the debug
log altogether if not useful. Note this covers the instances affected by
the switch to store addresses as unsigned long. However, there are other
sections in the driver that could do the same.

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-27-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:40 +09:00
Carlos Llamas
f07b83a48e binder: refactor binder_delete_free_buffer()
Skip the freelist call immediately as needed, instead of continuing the
pointless checks. Also, drop the debug logs that we don't really need.

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-26-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:40 +09:00
Carlos Llamas
8e905217c4 binder: collapse print_binder_buffer() into caller
The code in print_binder_buffer() is quite small so it can be collapsed
into its single caller binder_alloc_print_allocated().

No functional change in this patch.

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-25-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:40 +09:00
Carlos Llamas
67dcc88078 binder: document the final page calculation
The code to determine the page range for binder_lru_freelist_del() is
quite obscure. It leverages the buffer_size calculated before doing an
oversized buffer split. This is used to figure out if the last page is
being shared with another active buffer. If so, the page gets trimmed
out of the range as it has been previously removed from the freelist.

This would be equivalent to getting the start page of the next in-use
buffer explicitly. However, the code for this is much larger as we can
see in binder_free_buf_locked() routine. Instead, lets settle on
documenting the tricky step and using better names for now.

I believe an ideal solution would be to count the binder_page->users to
determine when a page should be added or removed from the freelist.
However, this is a much bigger change than what I'm willing to risk at
this time.

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-24-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:40 +09:00
Carlos Llamas
ea9cdbf0c7 binder: rename lru shrinker utilities
Now that the page allocation step is done separately we should rename
the binder_free_page_range() and binder_allocate_page_range() functions
to provide a more accurate description of what they do. Lets borrow the
freelist concept used in other parts of the kernel for this.

No functional change here.

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-23-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:40 +09:00
Carlos Llamas
de0e657312 binder: make oversized buffer code more readable
The sections in binder_alloc_new_buf_locked() dealing with oversized
buffers are scattered which makes them difficult to read. Instead,
consolidate this code into a single block to improve readability.

No functional change here.

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-22-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:40 +09:00
Carlos Llamas
258ce20ede binder: remove redundant debug log
The debug information in this statement is already logged earlier in the
same function. We can get rid of this duplicate log.

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-21-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:40 +09:00
Carlos Llamas
37ebbb4f73 binder: perform page installation outside of locks
Split out the insertion of pages to be outside of the alloc->mutex in a
separate binder_install_buffer_pages() routine. Since this is no longer
serialized, we must look at the full range of pages used by the buffers.
The installation is protected with mmap_sem in write mode since multiple
tasks might race to install the same page.

Besides avoiding unnecessary nested locking this helps in preparation of
switching the alloc->mutex into a spinlock_t in subsequent patches.

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-20-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:40 +09:00
Carlos Llamas
68aef12d09 binder: initialize lru pages in mmap callback
Rather than repeatedly initializing some of the binder_lru_page members
during binder_alloc_new_buf(), perform this initialization just once in
binder_alloc_mmap_handler(), after the pages have been created.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-19-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:40 +09:00
Carlos Llamas
c7ac30fad1 binder: malloc new_buffer outside of locks
Preallocate new_buffer before acquiring the alloc->mutex and hand it
down to binder_alloc_new_buf_locked(). The new buffer will be used in
the vast majority of requests (measured at 98.2% in field data). The
buffer is discarded otherwise. This change is required in preparation
for transitioning alloc->mutex into a spinlock in subsequent commits.

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-18-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:39 +09:00
Carlos Llamas
ea2735ce19 binder: refactor page range allocation
Instead of looping through the page range twice to first determine if
the mmap lock is required, simply do it per-page as needed. Split out
all this logic into a separate binder_install_single_page() function.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-17-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:39 +09:00
Carlos Llamas
cbc174a64b binder: relocate binder_alloc_clear_buf()
Move this function up along with binder_alloc_get_page() so that their
prototypes aren't necessary.

No functional change in this patch.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-16-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:39 +09:00
Carlos Llamas
c13500eaab binder: relocate low space calculation
Move the low async space calculation to debug_low_async_space_locked().
This logic not only fits better here but also offloads some of the many
tasks currently done in binder_alloc_new_buf_locked().

No functional change in this patch.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-15-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:39 +09:00
Carlos Llamas
9409af24e4 binder: separate the no-space debugging logic
Move the no-space debugging logic into a separate function. Lets also
mark this branch as unlikely in binder_alloc_new_buf_locked() as most
requests will fit without issue.

Also add a few cosmetic changes and suggestions from checkpatch.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-14-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:39 +09:00
Carlos Llamas
89f71743bf binder: remove pid param in binder_alloc_new_buf()
Binder attributes the buffer allocation to the current->tgid everytime.
There is no need to pass this as a parameter so drop it.

Also add a few touchups to follow the coding guidelines. No functional
changes are introduced in this patch.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-13-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:39 +09:00
Carlos Llamas
377e1684db binder: do unlocked work in binder_alloc_new_buf()
Extract non-critical sections from binder_alloc_new_buf_locked() that
don't require holding the alloc->mutex. While we are here, consolidate
the checks for size overflow and zero-sized padding into a separate
sanitized_size() helper function.

Also add a few touchups to follow the coding guidelines.

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-12-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:39 +09:00
Carlos Llamas
0d35bf3bf2 binder: split up binder_update_page_range()
The binder_update_page_range() function performs both allocation and
freeing of binder pages. However, these two operations are unrelated and
have no common logic. In fact, when a free operation is requested, the
allocation logic is skipped entirely. This behavior makes the error path
unnecessarily complex. To improve readability of the code, this patch
splits the allocation and freeing operations into separate functions.

No functional changes are introduced by this patch.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-11-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:39 +09:00
Carlos Llamas
df9aabead7 binder: keep vma addresses type as unsigned long
The vma addresses in binder are currently stored as void __user *. This
requires casting back and forth between the mm/ api which uses unsigned
long. Since we also do internal arithmetic on these addresses we end up
having to cast them _again_ to an integer type.

Lets stop all the unnecessary casting which kills code readability and
store the virtual addresses as the native unsigned long from mm/. Note
that this approach is preferred over uintptr_t as Linus explains in [1].

Opportunistically add a few cosmetic touchups.

Link: https://lore.kernel.org/all/CAHk-=wj2OHy-5e+srG1fy+ZU00TmZ1NFp6kFLbVLMXHe7A1d-g@mail.gmail.com/ [1]
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-10-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:38 +09:00
Carlos Llamas
da483f8b39 binder: remove extern from function prototypes
The kernel coding style does not require 'extern' in function prototypes
in .h files, so remove them from drivers/android/binder_alloc.h as they
are not needed.

No functional changes in this patch.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-9-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:38 +09:00
Carlos Llamas
e1090371e0 binder: fix comment on binder_alloc_new_buf() return value
Update the comments of binder_alloc_new_buf() to reflect that the return
value of the function is now ERR_PTR(-errno) on failure.

No functional changes in this patch.

Cc: stable@vger.kernel.org
Fixes: 57ada2fb22 ("binder: add log information for binder transaction failures")
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-8-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:38 +09:00
Carlos Llamas
122a3c1cb0 binder: fix trivial typo of binder_free_buf_locked()
Fix minor misspelling of the function in the comment section.

No functional changes in this patch.

Cc: stable@vger.kernel.org
Fixes: 0f966cba95 ("binder: add flag to clear buffer on txn complete")
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-7-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:38 +09:00
Carlos Llamas
c6d05e0762 binder: fix unused alloc->free_async_space
Each transaction is associated with a 'struct binder_buffer' that stores
the metadata about its buffer area. Since commit 74310e06be ("android:
binder: Move buffer out of area shared with user space") this struct is
no longer embedded within the buffer itself but is instead allocated on
the heap to prevent userspace access to this driver-exclusive info.

Unfortunately, the space of this struct is still being accounted for in
the total buffer size calculation, specifically for async transactions.
This results in an additional 104 bytes added to every async buffer
request, and this area is never used.

This wasted space can be substantial. If we consider the maximum mmap
buffer space of SZ_4M, the driver will reserve half of it for async
transactions, or 0x200000. This area should, in theory, accommodate up
to 262,144 buffers of the minimum 8-byte size. However, after adding
the extra 'sizeof(struct binder_buffer)', the total number of buffers
drops to only 18,724, which is a sad 7.14% of the actual capacity.

This patch fixes the buffer size calculation to enable the utilization
of the entire async buffer space. This is expected to reduce the number
of -ENOSPC errors that are seen on the field.

Fixes: 74310e06be ("android: binder: Move buffer out of area shared with user space")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-6-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:38 +09:00
Carlos Llamas
3091c21d3e binder: fix async space check for 0-sized buffers
Move the padding of 0-sized buffers to an earlier stage to account for
this round up during the alloc->free_async_space check.

Fixes: 74310e06be ("android: binder: Move buffer out of area shared with user space")
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-5-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:38 +09:00
Carlos Llamas
9a9ab0d963 binder: fix race between mmput() and do_exit()
Task A calls binder_update_page_range() to allocate and insert pages on
a remote address space from Task B. For this, Task A pins the remote mm
via mmget_not_zero() first. This can race with Task B do_exit() and the
final mmput() refcount decrement will come from Task A.

  Task A            | Task B
  ------------------+------------------
  mmget_not_zero()  |
                    |  do_exit()
                    |    exit_mm()
                    |      mmput()
  mmput()           |
    exit_mmap()     |
      remove_vma()  |
        fput()      |

In this case, the work of ____fput() from Task B is queued up in Task A
as TWA_RESUME. So in theory, Task A returns to userspace and the cleanup
work gets executed. However, Task A instead sleep, waiting for a reply
from Task B that never comes (it's dead).

This means the binder_deferred_release() is blocked until an unrelated
binder event forces Task A to go back to userspace. All the associated
death notifications will also be delayed until then.

In order to fix this use mmput_async() that will schedule the work in
the corresponding mm->async_put_work WQ instead of Task A.

Fixes: 457b9a6f09 ("Staging: android: add binder driver")
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-4-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:38 +09:00
Carlos Llamas
3f489c2067 binder: fix use-after-free in shinker's callback
The mmap read lock is used during the shrinker's callback, which means
that using alloc->vma pointer isn't safe as it can race with munmap().
As of commit dd2283f260 ("mm: mmap: zap pages with read mmap_sem in
munmap") the mmap lock is downgraded after the vma has been isolated.

I was able to reproduce this issue by manually adding some delays and
triggering page reclaiming through the shrinker's debug sysfs. The
following KASAN report confirms the UAF:

  ==================================================================
  BUG: KASAN: slab-use-after-free in zap_page_range_single+0x470/0x4b8
  Read of size 8 at addr ffff356ed50e50f0 by task bash/478

  CPU: 1 PID: 478 Comm: bash Not tainted 6.6.0-rc5-00055-g1c8b86a3799f-dirty #70
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   zap_page_range_single+0x470/0x4b8
   binder_alloc_free_page+0x608/0xadc
   __list_lru_walk_one+0x130/0x3b0
   list_lru_walk_node+0xc4/0x22c
   binder_shrink_scan+0x108/0x1dc
   shrinker_debugfs_scan_write+0x2b4/0x500
   full_proxy_write+0xd4/0x140
   vfs_write+0x1ac/0x758
   ksys_write+0xf0/0x1dc
   __arm64_sys_write+0x6c/0x9c

  Allocated by task 492:
   kmem_cache_alloc+0x130/0x368
   vm_area_alloc+0x2c/0x190
   mmap_region+0x258/0x18bc
   do_mmap+0x694/0xa60
   vm_mmap_pgoff+0x170/0x29c
   ksys_mmap_pgoff+0x290/0x3a0
   __arm64_sys_mmap+0xcc/0x144

  Freed by task 491:
   kmem_cache_free+0x17c/0x3c8
   vm_area_free_rcu_cb+0x74/0x98
   rcu_core+0xa38/0x26d4
   rcu_core_si+0x10/0x1c
   __do_softirq+0x2fc/0xd24

  Last potentially related work creation:
   __call_rcu_common.constprop.0+0x6c/0xba0
   call_rcu+0x10/0x1c
   vm_area_free+0x18/0x24
   remove_vma+0xe4/0x118
   do_vmi_align_munmap.isra.0+0x718/0xb5c
   do_vmi_munmap+0xdc/0x1fc
   __vm_munmap+0x10c/0x278
   __arm64_sys_munmap+0x58/0x7c

Fix this issue by performing instead a vma_lookup() which will fail to
find the vma that was isolated before the mmap lock downgrade. Note that
this option has better performance than upgrading to a mmap write lock
which would increase contention. Plus, mmap_write_trylock() has been
recently removed anyway.

Fixes: dd2283f260 ("mm: mmap: zap pages with read mmap_sem in munmap")
Cc: stable@vger.kernel.org
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-3-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:38 +09:00
Carlos Llamas
6ac061db9c binder: use EPOLLERR from eventpoll.h
Use EPOLLERR instead of POLLERR to make sure it is cast to the correct
__poll_t type. This fixes the following sparse issue:

  drivers/android/binder.c:5030:24: warning: incorrect type in return expression (different base types)
  drivers/android/binder.c:5030:24:    expected restricted __poll_t
  drivers/android/binder.c:5030:24:    got int

Fixes: f88982679f ("binder: check for binder_thread allocation failure in binder_poll()")
Cc: stable@vger.kernel.org
Cc: Eric Biggers <ebiggers@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20231201172212.1813387-2-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05 09:23:37 +09:00
Linus Torvalds
d99b91a99b Char/Misc and other driver changes for 6.7-rc1
Here is the big set of char/misc and other small driver subsystem
 changes for 6.7-rc1.  Included in here are:
   - IIO subsystem driver updates and additions (largest part of this
     pull request)
   - FPGA subsystem driver updates
   - Counter subsystem driver updates
   - ICC subsystem driver updates
   - extcon subsystem driver updates
   - mei driver updates and additions
   - nvmem subsystem driver updates and additions
   - comedi subsystem dependency fixes
   - parport driver fixups
   - cdx subsystem driver and core updates
   - splice support for /dev/zero and /dev/full
   - other smaller driver cleanups
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZUTSzg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylH3QCfbZuG8MiglEZUd4slRLUNqcRQ5tQAn1yKpDFo
 l3KLkxo1UTLMXbJBWe+b
 =gafK
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc updates from Greg KH:
 "Here is the big set of char/misc and other small driver subsystem
  changes for 6.7-rc1. Included in here are:

   - IIO subsystem driver updates and additions (largest part of this
     pull request)

   - FPGA subsystem driver updates

   - Counter subsystem driver updates

   - ICC subsystem driver updates

   - extcon subsystem driver updates

   - mei driver updates and additions

   - nvmem subsystem driver updates and additions

   - comedi subsystem dependency fixes

   - parport driver fixups

   - cdx subsystem driver and core updates

   - splice support for /dev/zero and /dev/full

   - other smaller driver cleanups

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (326 commits)
  cdx: add sysfs for subsystem, class and revision
  cdx: add sysfs for bus reset
  cdx: add support for bus enable and disable
  cdx: Register cdx bus as a device on cdx subsystem
  cdx: Create symbol namespaces for cdx subsystem
  cdx: Introduce lock to protect controller ops
  cdx: Remove cdx controller list from cdx bus system
  dts: ti: k3-am625-beagleplay: Add beaglecc1352
  greybus: Add BeaglePlay Linux Driver
  dt-bindings: net: Add ti,cc1352p7
  dt-bindings: eeprom: at24: allow NVMEM cells based on old syntax
  dt-bindings: nvmem: SID: allow NVMEM cells based on old syntax
  Revert "nvmem: add new config option"
  MAINTAINERS: coresight: Add missing Coresight files
  misc: pci_endpoint_test: Add deviceID for J721S2 PCIe EP device support
  firmware: xilinx: Move EXPORT_SYMBOL_GPL next to zynqmp_pm_feature definition
  uacce: make uacce_class constant
  ocxl: make ocxl_class constant
  cxl: make cxl_class constant
  misc: phantom: make phantom_class constant
  ...
2023-11-03 14:51:08 -10:00
Linus Torvalds
ecae0bd517 Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
 
 - Kemeng Shi has contributed some compation maintenance work in the
   series "Fixes and cleanups to compaction".
 
 - Joel Fernandes has a patchset ("Optimize mremap during mutual
   alignment within PMD") which fixes an obscure issue with mremap()'s
   pagetable handling during a subsequent exec(), based upon an
   implementation which Linus suggested.
 
 - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the
   following patch series:
 
 	mm/damon: misc fixups for documents, comments and its tracepoint
 	mm/damon: add a tracepoint for damos apply target regions
 	mm/damon: provide pseudo-moving sum based access rate
 	mm/damon: implement DAMOS apply intervals
 	mm/damon/core-test: Fix memory leaks in core-test
 	mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval
 
 - In the series "Do not try to access unaccepted memory" Adrian Hunter
   provides some fixups for the recently-added "unaccepted memory' feature.
   To increase the feature's checking coverage.  "Plug a few gaps where
   RAM is exposed without checking if it is unaccepted memory".
 
 - In the series "cleanups for lockless slab shrink" Qi Zheng has done
   some maintenance work which is preparation for the lockless slab
   shrinking code.
 
 - Qi Zheng has redone the earlier (and reverted) attempt to make slab
   shrinking lockless in the series "use refcount+RCU method to implement
   lockless slab shrink".
 
 - David Hildenbrand contributes some maintenance work for the rmap code
   in the series "Anon rmap cleanups".
 
 - Kefeng Wang does more folio conversions and some maintenance work in
   the migration code.  Series "mm: migrate: more folio conversion and
   unification".
 
 - Matthew Wilcox has fixed an issue in the buffer_head code which was
   causing long stalls under some heavy memory/IO loads.  Some cleanups
   were added on the way.  Series "Add and use bdev_getblk()".
 
 - In the series "Use nth_page() in place of direct struct page
   manipulation" Zi Yan has fixed a potential issue with the direct
   manipulation of hugetlb page frames.
 
 - In the series "mm: hugetlb: Skip initialization of gigantic tail
   struct pages if freed by HVO" has improved our handling of gigantic
   pages in the hugetlb vmmemmep optimizaton code.  This provides
   significant boot time improvements when significant amounts of gigantic
   pages are in use.
 
 - Matthew Wilcox has sent the series "Small hugetlb cleanups" - code
   rationalization and folio conversions in the hugetlb code.
 
 - Yin Fengwei has improved mlock()'s handling of large folios in the
   series "support large folio for mlock"
 
 - In the series "Expose swapcache stat for memcg v1" Liu Shixin has
   added statistics for memcg v1 users which are available (and useful)
   under memcg v2.
 
 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
   prctl so that userspace may direct the kernel to not automatically
   propagate the denial to child processes.  The series is named "MDWE
   without inheritance".
 
 - Kefeng Wang has provided the series "mm: convert numa balancing
   functions to use a folio" which does what it says.
 
 - In the series "mm/ksm: add fork-exec support for prctl" Stefan Roesch
   makes is possible for a process to propagate KSM treatment across
   exec().
 
 - Huang Ying has enhanced memory tiering's calculation of memory
   distances.  This is used to permit the dax/kmem driver to use "high
   bandwidth memory" in addition to Optane Data Center Persistent Memory
   Modules (DCPMM).  The series is named "memory tiering: calculate
   abstract distance based on ACPI HMAT"
 
 - In the series "Smart scanning mode for KSM" Stefan Roesch has
   optimized KSM by teaching it to retain and use some historical
   information from previous scans.
 
 - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the
   series "mm: memcg: fix tracking of pending stats updates values".
 
 - In the series "Implement IOCTL to get and optionally clear info about
   PTEs" Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits
   us to atomically read-then-clear page softdirty state.  This is mainly
   used by CRIU.
 
 - Hugh Dickins contributed the series "shmem,tmpfs: general maintenance"
   - a bunch of relatively minor maintenance tweaks to this code.
 
 - Matthew Wilcox has increased the use of the VMA lock over file-backed
   page faults in the series "Handle more faults under the VMA lock".  Some
   rationalizations of the fault path became possible as a result.
 
 - In the series "mm/rmap: convert page_move_anon_rmap() to
   folio_move_anon_rmap()" David Hildenbrand has implemented some cleanups
   and folio conversions.
 
 - In the series "various improvements to the GUP interface" Lorenzo
   Stoakes has simplified and improved the GUP interface with an eye to
   providing groundwork for future improvements.
 
 - Andrey Konovalov has sent along the series "kasan: assorted fixes and
   improvements" which does those things.
 
 - Some page allocator maintenance work from Kemeng Shi in the series
   "Two minor cleanups to break_down_buddy_pages".
 
 - In thes series "New selftest for mm" Breno Leitao has developed
   another MM self test which tickles a race we had between madvise() and
   page faults.
 
 - In the series "Add folio_end_read" Matthew Wilcox provides cleanups
   and an optimization to the core pagecache code.
 
 - Nhat Pham has added memcg accounting for hugetlb memory in the series
   "hugetlb memcg accounting".
 
 - Cleanups and rationalizations to the pagemap code from Lorenzo
   Stoakes, in the series "Abstract vma_merge() and split_vma()".
 
 - Audra Mitchell has fixed issues in the procfs page_owner code's new
   timestamping feature which was causing some misbehaviours.  In the
   series "Fix page_owner's use of free timestamps".
 
 - Lorenzo Stoakes has fixed the handling of new mappings of sealed files
   in the series "permit write-sealed memfd read-only shared mappings".
 
 - Mike Kravetz has optimized the hugetlb vmemmap optimization in the
   series "Batch hugetlb vmemmap modification operations".
 
 - Some buffer_head folio conversions and cleanups from Matthew Wilcox in
   the series "Finish the create_empty_buffers() transition".
 
 - As a page allocator performance optimization Huang Ying has added
   automatic tuning to the allocator's per-cpu-pages feature, in the series
   "mm: PCP high auto-tuning".
 
 - Roman Gushchin has contributed the patchset "mm: improve performance
   of accounted kernel memory allocations" which improves their performance
   by ~30% as measured by a micro-benchmark.
 
 - folio conversions from Kefeng Wang in the series "mm: convert page
   cpupid functions to folios".
 
 - Some kmemleak fixups in Liu Shixin's series "Some bugfix about
   kmemleak".
 
 - Qi Zheng has improved our handling of memoryless nodes by keeping them
   off the allocation fallback list.  This is done in the series "handle
   memoryless nodes more appropriately".
 
 - khugepaged conversions from Vishal Moola in the series "Some
   khugepaged folio conversions".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZULEMwAKCRDdBJ7gKXxA
 jhQHAQCYpD3g849x69DmHnHWHm/EHQLvQmRMDeYZI+nx/sCJOwEAw4AKg0Oemv9y
 FgeUPAD1oasg6CP+INZvCj34waNxwAc=
 =E+Y4
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Many singleton patches against the MM code. The patch series which are
  included in this merge do the following:

   - Kemeng Shi has contributed some compation maintenance work in the
     series 'Fixes and cleanups to compaction'

   - Joel Fernandes has a patchset ('Optimize mremap during mutual
     alignment within PMD') which fixes an obscure issue with mremap()'s
     pagetable handling during a subsequent exec(), based upon an
     implementation which Linus suggested

   - More DAMON/DAMOS maintenance and feature work from SeongJae Park i
     the following patch series:

	mm/damon: misc fixups for documents, comments and its tracepoint
	mm/damon: add a tracepoint for damos apply target regions
	mm/damon: provide pseudo-moving sum based access rate
	mm/damon: implement DAMOS apply intervals
	mm/damon/core-test: Fix memory leaks in core-test
	mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval

   - In the series 'Do not try to access unaccepted memory' Adrian
     Hunter provides some fixups for the recently-added 'unaccepted
     memory' feature. To increase the feature's checking coverage. 'Plug
     a few gaps where RAM is exposed without checking if it is
     unaccepted memory'

   - In the series 'cleanups for lockless slab shrink' Qi Zheng has done
     some maintenance work which is preparation for the lockless slab
     shrinking code

   - Qi Zheng has redone the earlier (and reverted) attempt to make slab
     shrinking lockless in the series 'use refcount+RCU method to
     implement lockless slab shrink'

   - David Hildenbrand contributes some maintenance work for the rmap
     code in the series 'Anon rmap cleanups'

   - Kefeng Wang does more folio conversions and some maintenance work
     in the migration code. Series 'mm: migrate: more folio conversion
     and unification'

   - Matthew Wilcox has fixed an issue in the buffer_head code which was
     causing long stalls under some heavy memory/IO loads. Some cleanups
     were added on the way. Series 'Add and use bdev_getblk()'

   - In the series 'Use nth_page() in place of direct struct page
     manipulation' Zi Yan has fixed a potential issue with the direct
     manipulation of hugetlb page frames

   - In the series 'mm: hugetlb: Skip initialization of gigantic tail
     struct pages if freed by HVO' has improved our handling of gigantic
     pages in the hugetlb vmmemmep optimizaton code. This provides
     significant boot time improvements when significant amounts of
     gigantic pages are in use

   - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code
     rationalization and folio conversions in the hugetlb code

   - Yin Fengwei has improved mlock()'s handling of large folios in the
     series 'support large folio for mlock'

   - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has
     added statistics for memcg v1 users which are available (and
     useful) under memcg v2

   - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
     prctl so that userspace may direct the kernel to not automatically
     propagate the denial to child processes. The series is named 'MDWE
     without inheritance'

   - Kefeng Wang has provided the series 'mm: convert numa balancing
     functions to use a folio' which does what it says

   - In the series 'mm/ksm: add fork-exec support for prctl' Stefan
     Roesch makes is possible for a process to propagate KSM treatment
     across exec()

   - Huang Ying has enhanced memory tiering's calculation of memory
     distances. This is used to permit the dax/kmem driver to use 'high
     bandwidth memory' in addition to Optane Data Center Persistent
     Memory Modules (DCPMM). The series is named 'memory tiering:
     calculate abstract distance based on ACPI HMAT'

   - In the series 'Smart scanning mode for KSM' Stefan Roesch has
     optimized KSM by teaching it to retain and use some historical
     information from previous scans

   - Yosry Ahmed has fixed some inconsistencies in memcg statistics in
     the series 'mm: memcg: fix tracking of pending stats updates
     values'

   - In the series 'Implement IOCTL to get and optionally clear info
     about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap
     which permits us to atomically read-then-clear page softdirty
     state. This is mainly used by CRIU

   - Hugh Dickins contributed the series 'shmem,tmpfs: general
     maintenance', a bunch of relatively minor maintenance tweaks to
     this code

   - Matthew Wilcox has increased the use of the VMA lock over
     file-backed page faults in the series 'Handle more faults under the
     VMA lock'. Some rationalizations of the fault path became possible
     as a result

   - In the series 'mm/rmap: convert page_move_anon_rmap() to
     folio_move_anon_rmap()' David Hildenbrand has implemented some
     cleanups and folio conversions

   - In the series 'various improvements to the GUP interface' Lorenzo
     Stoakes has simplified and improved the GUP interface with an eye
     to providing groundwork for future improvements

   - Andrey Konovalov has sent along the series 'kasan: assorted fixes
     and improvements' which does those things

   - Some page allocator maintenance work from Kemeng Shi in the series
     'Two minor cleanups to break_down_buddy_pages'

   - In thes series 'New selftest for mm' Breno Leitao has developed
     another MM self test which tickles a race we had between madvise()
     and page faults

   - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups
     and an optimization to the core pagecache code

   - Nhat Pham has added memcg accounting for hugetlb memory in the
     series 'hugetlb memcg accounting'

   - Cleanups and rationalizations to the pagemap code from Lorenzo
     Stoakes, in the series 'Abstract vma_merge() and split_vma()'

   - Audra Mitchell has fixed issues in the procfs page_owner code's new
     timestamping feature which was causing some misbehaviours. In the
     series 'Fix page_owner's use of free timestamps'

   - Lorenzo Stoakes has fixed the handling of new mappings of sealed
     files in the series 'permit write-sealed memfd read-only shared
     mappings'

   - Mike Kravetz has optimized the hugetlb vmemmap optimization in the
     series 'Batch hugetlb vmemmap modification operations'

   - Some buffer_head folio conversions and cleanups from Matthew Wilcox
     in the series 'Finish the create_empty_buffers() transition'

   - As a page allocator performance optimization Huang Ying has added
     automatic tuning to the allocator's per-cpu-pages feature, in the
     series 'mm: PCP high auto-tuning'

   - Roman Gushchin has contributed the patchset 'mm: improve
     performance of accounted kernel memory allocations' which improves
     their performance by ~30% as measured by a micro-benchmark

   - folio conversions from Kefeng Wang in the series 'mm: convert page
     cpupid functions to folios'

   - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about
     kmemleak'

   - Qi Zheng has improved our handling of memoryless nodes by keeping
     them off the allocation fallback list. This is done in the series
     'handle memoryless nodes more appropriately'

   - khugepaged conversions from Vishal Moola in the series 'Some
     khugepaged folio conversions'"

[ bcachefs conflicts with the dynamically allocated shrinkers have been
  resolved as per Stephen Rothwell in

     https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/

  with help from Qi Zheng.

  The clone3 test filtering conflict was half-arsed by yours truly ]

* tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits)
  mm/damon/sysfs: update monitoring target regions for online input commit
  mm/damon/sysfs: remove requested targets when online-commit inputs
  selftests: add a sanity check for zswap
  Documentation: maple_tree: fix word spelling error
  mm/vmalloc: fix the unchecked dereference warning in vread_iter()
  zswap: export compression failure stats
  Documentation: ubsan: drop "the" from article title
  mempolicy: migration attempt to match interleave nodes
  mempolicy: mmap_lock is not needed while migrating folios
  mempolicy: alloc_pages_mpol() for NUMA policy without vma
  mm: add page_rmappable_folio() wrapper
  mempolicy: remove confusing MPOL_MF_LAZY dead code
  mempolicy: mpol_shared_policy_init() without pseudo-vma
  mempolicy trivia: use pgoff_t in shared mempolicy tree
  mempolicy trivia: slightly more consistent naming
  mempolicy trivia: delete those ancient pr_debug()s
  mempolicy: fix migrate_pages(2) syscall return nr_failed
  kernfs: drop shared NUMA mempolicy hooks
  hugetlbfs: drop shared NUMA mempolicy pretence
  mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets()
  ...
2023-11-02 19:38:47 -10:00
Linus Torvalds
14ab6d425e vfs-6.7.ctime
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTppYgAKCRCRxhvAZXjc
 okIHAP9anLz1QDyMLH12ASuHjgBc0Of3jcB6NB97IWGpL4O21gEA46ohaD+vcJuC
 YkBLU3lXqQ87nfu28ExFAzh10hG2jwM=
 =m4pB
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull vfs inode time accessor updates from Christian Brauner:
 "This finishes the conversion of all inode time fields to accessor
  functions as discussed on list. Changing timestamps manually as we
  used to do before is error prone. Using accessors function makes this
  robust.

  It does not contain the switch of the time fields to discrete 64 bit
  integers to replace struct timespec and free up space in struct inode.
  But after this, the switch can be trivially made and the patch should
  only affect the vfs if we decide to do it"

* tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (86 commits)
  fs: rename inode i_atime and i_mtime fields
  security: convert to new timestamp accessors
  selinux: convert to new timestamp accessors
  apparmor: convert to new timestamp accessors
  sunrpc: convert to new timestamp accessors
  mm: convert to new timestamp accessors
  bpf: convert to new timestamp accessors
  ipc: convert to new timestamp accessors
  linux: convert to new timestamp accessors
  zonefs: convert to new timestamp accessors
  xfs: convert to new timestamp accessors
  vboxsf: convert to new timestamp accessors
  ufs: convert to new timestamp accessors
  udf: convert to new timestamp accessors
  ubifs: convert to new timestamp accessors
  tracefs: convert to new timestamp accessors
  sysv: convert to new timestamp accessors
  squashfs: convert to new timestamp accessors
  server: convert to new timestamp accessors
  client: convert to new timestamp accessors
  ...
2023-10-30 09:47:13 -10:00
Jeff Layton
5463704f78
android: convert to new timestamp accessors
Convert to using the new inode timestamp accessor functions.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20231004185347.80880-3-jlayton@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-18 13:26:15 +02:00