linux

Author	SHA1	Message	Date
Chunfeng Yun	259714100d	PM / wakeirq: support enabling wake-up irq after runtime_suspend called When the dedicated wake IRQ is level trigger, and it uses the device's low-power status as the wakeup source, that means if the device is not in low-power state, the wake IRQ will be triggered if enabled; For this case, need enable the wake IRQ after running the device's ->runtime_suspend() which make it enter low-power state. e.g. Assume the wake IRQ is a low level trigger type, and the wakeup signal comes from the low-power status of the device. The wakeup signal is low level at running time (0), and becomes high level when the device enters low-power state (runtime_suspend (1) is called), a wakeup event at (2) make the device exit low-power state, then the wakeup signal also becomes low level. ------------------ \| ^ ^\| ---------------- \| \| -------------- \|<---(0)--->\|<--(1)--\| (3) (2) (4) if enable the wake IRQ before running runtime_suspend during (0), a wake IRQ will arise, it causes resume immediately; it works if enable wake IRQ ( e.g. at (3) or (4)) after running ->runtime_suspend(). This patch introduces a new status WAKE_IRQ_DEDICATED_REVERSE to optionally support enabling wake IRQ after running ->runtime_suspend(). Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-10-27 20:49:32 +02:00
Andy Shevchenko	27e0bcd029	device property: Drop redundant NULL checks In cases when functions are called via fwnode operations, we already know that this is software node we are dealing with, hence no need to check if it's NULL, it can't be, Reported-by: YE Chengfeng <cyeaa@connect.ust.hk> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20211026162954.89811-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-26 19:14:01 +02:00
Rafael J. Wysocki	23f62d7ab2	PM: sleep: Pause cpuidle later and resume it earlier during system transitions Commit `8651f97bd9` ("PM / cpuidle: System resume hang fix with cpuidle") that introduced cpuidle pausing during system suspend did that to work around a platform firmware issue causing systems to hang during resume if CPUs were allowed to enter idle states in the system suspend and resume code paths. However, pausing cpuidle before the last phase of suspending devices is the source of an otherwise arbitrary difference between the suspend-to-idle path and other system suspend variants, so it is cleaner to do that later, before taking secondary CPUs offline (it is still safer to take secondary CPUs offline with cpuidle paused, though). Modify the code accordingly, but in order to avoid code duplication, introduce new wrapper functions, pm_sleep_disable_secondary_cpus() and pm_sleep_enable_secondary_cpus(), to combine cpuidle_pause() and cpuidle_resume(), respectively, with the handling of secondary CPUs during system-wide transitions to sleep states. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Ulf Hansson <ulf.hansson@linaro.org>	2021-10-26 15:52:07 +02:00
Rafael J. Wysocki	8d89835b04	PM: suspend: Do not pause cpuidle in the suspend-to-idle path It is pointless to pause cpuidle in the suspend-to-idle path, because it is going to be resumed in the same path later and pausing it does not serve any particular purpose in that case. Rework the code to avoid doing that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Ulf Hansson <ulf.hansson@linaro.org>	2021-10-26 15:52:07 +02:00
Sean Anderson	65aa371ea5	net: Convert more users of mdiobus_* to mdiodev_* This converts users of mdiobus to mdiodev using the following semantic patch: @@ identifier mdiodev; expression regnum; @@ - mdiobus_read(mdiodev->bus, mdiodev->addr, regnum) + mdiodev_read(mdiodev, regnum) @@ identifier mdiodev; expression regnum, val; @@ - mdiobus_write(mdiodev->bus, mdiodev->addr, regnum, val) + mdiodev_write(mdiodev, regnum, val) Signed-off-by: Sean Anderson <sean.anderson@seco.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-24 13:40:33 +01:00
Lucas Tanure	f231ff38b7	regmap: spi: Set regmap max raw r/w from max_transfer_size Set regmap raw read/write from spi max_transfer_size so regmap_raw_read/write can split the access into chunks Signed-off-by: Lucas Tanure <tanureal@opensource.cirrus.com> Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com> [André: fix build warning] Signed-off-by: André Almeida <andrealmeid@collabora.com> Link: https://lore.kernel.org/r/20211021132721.13669-1-andrealmeid@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org>	2021-10-23 19:55:03 +01:00
Rafael J. Wysocki	928265e360	PM: sleep: Do not let "syscore" devices runtime-suspend during system transitions There is no reason to allow "syscore" devices to runtime-suspend during system-wide PM transitions, because they are subject to the same possible failure modes as any other devices in that respect. Accordingly, change device_prepare() and device_complete() to call pm_runtime_get_noresume() and pm_runtime_put(), respectively, for "syscore" devices too. Fixes: `057d51a126` ("Merge branch 'pm-sleep'") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 3.10+ <stable@vger.kernel.org> # 3.10+ Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>	2021-10-22 18:09:58 +02:00
Luis Chamberlain	e2e2c0f20f	firmware_loader: move struct builtin_fw to the only place used Now that x86 doesn't abuse picking at internals to the firmware loader move out the built-in firmware struct to its only user. Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211021155843.1969401-5-mcgrof@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-22 14:13:53 +02:00
Luis Chamberlain	48d09e9787	firmware_loader: formalize built-in firmware API Formalize the built-in firmware with a proper API. This can later be used by other callers where all they need is built-in firmware. We export the firmware_request_builtin() call for now only under the TEST_FIRMWARE symbol namespace as there are no direct modular users for it. If they pop up they are free to export it generally. Built-in code always gets access to the callers and we'll demonstrate a hidden user which has been lurking in the kernel for a while and the reason why using a proper API was better long term. Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211021155843.1969401-2-mcgrof@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-22 14:13:44 +02:00
David S. Miller	bdfa75ad70	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Lots of simnple overlapping additions. With a build fix from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-22 11:41:16 +01:00
Kai Vehmanen	c87761db21	component: do not leave master devres group open after bind In current code, the devres group for aggregate master is left open after call to component_master_add_*(). This leads to problems when the master does further managed allocations on its own. When any participating driver calls component_del(), this leads to immediate release of resources. This came up when investigating a page fault occurring with i915 DRM driver unbind with 5.15-rc1 kernel. The following sequence occurs: i915_pci_remove() -> intel_display_driver_unregister() -> i915_audio_component_cleanup() -> component_del() -> component.c:take_down_master() -> hdac_component_master_unbind() [via master->ops->unbind()] -> devres_release_group(master->parent, NULL) With older kernels this has not caused issues, but with audio driver moving to use managed interfaces for more of its allocations, this no longer works. Devres log shows following to occur: component_master_add_with_match() [ 126.886032] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000323ccdc5 devm_component_match_release (24 bytes) [ 126.886045] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000865cdb29 grp< (0 bytes) [ 126.886049] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 grp< (0 bytes) audio driver completes its PCI probe() [ 126.892238] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 pcim_iomap_release (48 bytes) component_del() called() at DRM/i915 unbind() [ 137.579422] i915 0000:00:02.0: DEVRES REL 00000000ef44c293 grp< (0 bytes) [ 137.579445] snd_hda_intel 0000:00:1f.3: DEVRES REL 00000000865cdb29 grp< (0 bytes) [ 137.579458] snd_hda_intel 0000:00:1f.3: DEVRES REL 000000001b480725 pcim_iomap_release (48 bytes) So the "devres_release_group(master->parent, NULL)" ends up freeing the pcim_iomap allocation. Upon next runtime resume, the audio driver will cause a page fault as the iomap alloc was released without the driver knowing about it. Fix this issue by using the "struct master" pointer as identifier for the devres group, and by closing the devres group after the master->ops->bind() call is done. This allows devres allocations done by the driver acting as master to be isolated from the binding state of the aggregate driver. This modifies the logic originally introduced in commit `9e1ccb4a77` ("drivers/base: fix devres handling for master device") Fixes: `9e1ccb4a77` ("drivers/base: fix devres handling for master device") Cc: stable@vger.kernel.org Acked-by: Imre Deak <imre.deak@intel.com> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/4136 Link: https://lore.kernel.org/r/20211013161345.3755341-1-kai.vehmanen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-21 13:01:56 +02:00
Andy Shevchenko	a164ff53cb	driver core: Provide device_match_acpi_handle() helper We have a couple of users of this helper, make it available for them. The prototype for the helper is specifically crafted in order to be easily used with bus_find_device() call. That's why its location is in the driver core rather than ACPI. Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20211014134756.39092-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-20 19:38:29 +02:00
Greg Kroah-Hartman	b5bc8ac25a	Merge 5.15-rc6 into driver-core-next We need the driver-core fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-18 09:43:37 +02:00
Linus Torvalds	cf52ad5ff1	Merge tag 'driver-core-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are some small driver core fixes for 5.15-rc6, all of which have been in linux-next for a while with no reported issues. They include: - kernfs negative dentry bugfix - simple pm bus fixes to resolve reported issues" * tag 'driver-core-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: drivers: bus: Delete CONFIG_SIMPLE_PM_BUS drivers: bus: simple-pm-bus: Add support for probing simple bus only devices driver core: Reject pointless SYNC_STATE_ONLY device links kernfs: don't create a negative dentry if inactive node exists	2021-10-17 17:17:28 -10:00
Jonathan Cameron	c5e22feffd	topology: Represent clusters of CPUs within a die Both ACPI and DT provide the ability to describe additional layers of topology between that of individual cores and higher level constructs such as the level at which the last level cache is shared. In ACPI this can be represented in PPTT as a Processor Hierarchy Node Structure [1] that is the parent of the CPU cores and in turn has a parent Processor Hierarchy Nodes Structure representing a higher level of topology. For example Kunpeng 920 has 6 or 8 clusters in each NUMA node, and each cluster has 4 cpus. All clusters share L3 cache data, but each cluster has local L3 tag. On the other hand, each clusters will share some internal system bus. +-----------------------------------+ +---------+ \| +------+ +------+ +--------------------------+ \| \| \| CPU0 \| \| cpu1 \| \| +-----------+ \| \| \| +------+ +------+ \| \| \| \| \| \| +----+ L3 \| \| \| \| +------+ +------+ cluster \| \| tag \| \| \| \| \| CPU2 \| \| CPU3 \| \| \| \| \| \| \| +------+ +------+ \| +-----------+ \| \| \| \| \| \| +-----------------------------------+ \| \| +-----------------------------------+ \| \| \| +------+ +------+ +--------------------------+ \| \| \| \| \| \| \| +-----------+ \| \| \| +------+ +------+ \| \| \| \| \| \| \| \| L3 \| \| \| \| +------+ +------+ +----+ tag \| \| \| \| \| \| \| \| \| \| \| \| \| \| +------+ +------+ \| +-----------+ \| \| \| \| \| \| +-----------------------------------+ \| L3 \| \| data \| +-----------------------------------+ \| \| \| +------+ +------+ \| +-----------+ \| \| \| \| \| \| \| \| \| \| \| \| \| +------+ +------+ +----+ L3 \| \| \| \| \| \| tag \| \| \| \| +------+ +------+ \| \| \| \| \| \| \| \| \| \| \| +-----------+ \| \| \| +------+ +------+ +--------------------------+ \| +-----------------------------------\| \| \| +-----------------------------------\| \| \| \| +------+ +------+ +--------------------------+ \| \| \| \| \| \| \| +-----------+ \| \| \| +------+ +------+ \| \| \| \| \| \| +----+ L3 \| \| \| \| +------+ +------+ \| \| tag \| \| \| \| \| \| \| \| \| \| \| \| \| \| +------+ +------+ \| +-----------+ \| \| \| \| \| \| +-----------------------------------+ \| \| +-----------------------------------+ \| \| \| +------+ +------+ +--------------------------+ \| \| \| \| \| \| \| +-----------+ \| \| \| +------+ +------+ \| \| \| \| \| \| \| \| L3 \| \| \| \| +------+ +------+ +---+ tag \| \| \| \| \| \| \| \| \| \| \| \| \| \| +------+ +------+ \| +-----------+ \| \| \| \| \| \| +-----------------------------------+ \| \| +-----------------------------------+ \| \| \| +------+ +------+ +--------------------------+ \| \| \| \| \| \| \| +-----------+ \| \| \| +------+ +------+ \| \| \| \| \| \| \| \| L3 \| \| \| \| +------+ +------+ +--+ tag \| \| \| \| \| \| \| \| \| \| \| \| \| \| +------+ +------+ \| +-----------+ \| \| \| \| +---------+ +-----------------------------------+ That means spreading tasks among clusters will bring more bandwidth while packing tasks within one cluster will lead to smaller cache synchronization latency. So both kernel and userspace will have a chance to leverage this topology to deploy tasks accordingly to achieve either smaller cache latency within one cluster or an even distribution of load among clusters for higher throughput. This patch exposes cluster topology to both kernel and userspace. Libraried like hwloc will know cluster by cluster_cpus and related sysfs attributes. PoC of HWLOC support at [2]. Note this patch only handle the ACPI case. Special consideration is needed for SMT processors, where it is necessary to move 2 levels up the hierarchy from the leaf nodes (thus skipping the processor core level). Note that arm64 / ACPI does not provide any means of identifying a die level in the topology but that may be unrelate to the cluster level. [1] ACPI Specification 6.3 - section 5.2.29.1 processor hierarchy node structure (Type 0) [2] https://github.com/hisilicon/hwloc/tree/linux-cluster Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210924085104.44806-2-21cnbao@gmail.com	2021-10-15 11:25:15 +02:00
Jakub Kicinski	e15f5972b8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net tools/testing/selftests/net/ioam6.sh `7b1700e009` ("selftests: net: modify IOAM tests for undef bits") `bf77b1400a` ("selftests: net: Test for the IOAM encapsulation with IPv6") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-14 16:50:14 -07:00
Yang Yingliang	55e6d80378	regmap: Fix possible double-free in regcache_rbtree_exit() In regcache_rbtree_insert_to_block(), when 'present' realloc failed, the 'blk' which is supposed to assign to 'rbnode->block' will be freed, so 'rbnode->block' points a freed memory, in the error handling path of regcache_rbtree_init(), 'rbnode->block' will be freed again in regcache_rbtree_exit(), KASAN will report double-free as follows: BUG: KASAN: double-free or invalid-free in kfree+0xce/0x390 Call Trace: slab_free_freelist_hook+0x10d/0x240 kfree+0xce/0x390 regcache_rbtree_exit+0x15d/0x1a0 regcache_rbtree_init+0x224/0x2c0 regcache_init+0x88d/0x1310 __regmap_init+0x3151/0x4a80 __devm_regmap_init+0x7d/0x100 madera_spi_probe+0x10f/0x333 [madera_spi] spi_probe+0x183/0x210 really_probe+0x285/0xc30 To fix this, moving up the assignment of rbnode->block to immediately after the reallocation has succeeded so that the data structure stays valid even if the second reallocation fails. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: `3f4ff561bc` ("regmap: rbtree: Make cache_present bitmap per node") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20211012023735.1632786-1-yangyingliang@huawei.com Signed-off-by: Mark Brown <broonie@kernel.org>	2021-10-12 11:48:43 +01:00
Linus Torvalds	fa58787605	Merge tag 'linux-kselftest-kunit-fixes-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kunit fixes from Shuah Khan: - Fixes to address the structleak plugin causing the stack frame size to grow immensely when used with KUnit. Fixes include adding a new makefile to disable structleak and using it from KUnit iio, device property, thunderbolt, and bitfield tests to disable it. - KUnit framework reference count leak in kfree_at_end - KUnit tool fix to resolve conflict between --json and --raw_output and generate correct test output in either case. - kernel-doc warnings due to mismatched arg names * tag 'linux-kselftest-kunit-fixes-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: fix kernel-doc warnings due to mismatched arg names bitfield: build kunit tests without structleak plugin thunderbolt: build kunit tests without structleak plugin device property: build kunit tests without structleak plugin iio/test-format: build kunit tests without structleak plugin gcc-plugins/structleak: add makefile var for disabling structleak kunit: fix reference count leak in kfree_at_end kunit: tool: better handling of quasi-bool args (--json, --raw_output)	2021-10-11 17:25:08 -07:00
Jakub Kicinski	9fe1155233	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-07 15:24:06 -07:00
Jakub Kicinski	433baf0719	device property: move mac addr helpers to eth.c Move the mac address helpers out, eth.c already contains a bunch of similar helpers. Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-07 13:39:51 +01:00
Brendan Higgins	6a1e2d93d5	device property: build kunit tests without structleak plugin The structleak plugin causes the stack frame size to grow immensely when used with KUnit: ../drivers/base/test/property-entry-test.c:492:1: warning: the frame size of 2832 bytes is larger than 2048 bytes [-Wframe-larger-than=] ../drivers/base/test/property-entry-test.c:322:1: warning: the frame size of 2080 bytes is larger than 2048 bytes [-Wframe-larger-than=] ../drivers/base/test/property-entry-test.c:250:1: warning: the frame size of 4976 bytes is larger than 2048 bytes [-Wframe-larger-than=] ../drivers/base/test/property-entry-test.c:115:1: warning: the frame size of 3280 bytes is larger than 2048 bytes [-Wframe-larger-than=] Turn it off in this file. Signed-off-by: Brendan Higgins <brendanhiggins@google.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2021-10-06 17:53:42 -06:00
Saravana Kannan	f729a592ad	driver core: Reject pointless SYNC_STATE_ONLY device links SYNC_STATE_ONLY device links intentionally allow cycles because cyclic sync_state() dependencies are valid and necessary. However a SYNC_STATE_ONLY device link where the consumer and the supplier are the same device is pointless because the device link would be deleted as soon as the device probes (because it's also the consumer) and won't affect when the sync_state() callback is called. It's a waste of CPU cycles and memory to create this device link. So reject any attempts to create such a device link. Fixes: `05ef983e0d` ("driver core: Add device link support for SYNC_STATE_ONLY flag") Cc: stable <stable@vger.kernel.org> Reported-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20210929190549.860541-1-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-05 17:45:54 +02:00
Luis Chamberlain	0f8d7ccc2e	firmware_loader: add a sanity check for firmware_request_builtin() Right now firmware_request_builtin() is used internally only and so we have control over the callers. But if we want to expose that API more broadly we should ensure the firmware pointer is valid. This doesn't fix any known issue, it just prepares us to later expose this API to other users. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20210917182226.3532898-4-mcgrof@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-05 16:26:49 +02:00
Luis Chamberlain	7c4fd90741	firmware_loader: split built-in firmware call There are two ways the firmware_loader can use the built-in firmware: with or without the pre-allocated buffer. We already have one explicit use case for each of these, and so split them up so that it is clear what the intention is on the caller side. This also paves the way so that eventually other callers outside of the firmware loader can uses these if and when needed. While at it, adopt the firmware prefix for the routine names. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20210917182226.3532898-3-mcgrof@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-05 16:26:37 +02:00
Luis Chamberlain	f7a07f7b96	firmware_loader: fix pre-allocated buf built-in firmware use The firmware_loader can be used with a pre-allocated buffer through the use of the API calls: o request_firmware_into_buf() o request_partial_firmware_into_buf() If the firmware was built-in and present, our current check for if the built-in firmware fits into the pre-allocated buffer does not return any errors, and we proceed to tell the caller that everything worked fine. It's a lie and no firmware would end up being copied into the pre-allocated buffer. So if the caller trust the result it may end up writing a bunch of 0's to a device! Fix this by making the function that checks for the pre-allocated buffer return non-void. Since the typical use case is when no pre-allocated buffer is provided make this return successfully for that case. If the built-in firmware does not fit into the pre-allocated buffer size return a failure as we should have been doing before. I'm not aware of users of the built-in firmware using the API calls with a pre-allocated buffer, as such I doubt this fixes any real life issue. But you never know... perhaps some oddball private tree might use it. In so far as upstream is concerned this just fixes our code for correctness. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20210917182226.3532898-2-mcgrof@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-05 16:26:29 +02:00
Mianhan Liu	30b7ecf731	drivers/base/component.c: remove superfluous header files from component.c component.c hasn't use any macro or function declared in linux/kref.h. Thus, these files can be removed from component.c safely without affecting the compilation of the drivers/base/ module Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Link: https://lore.kernel.org/r/20210928193849.28717-1-liumh1@shanghaitech.edu.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-05 16:21:10 +02:00
Mianhan Liu	b39214911a	drivers/base/arch_topology.c: remove superfluous header arch_topology.c hasn't use any macro or function declared in linux/percpu.h, linux/smp.h and linux/string.h. Thus, these files can be removed from arch_topology.c safely without affecting the compilation of the drivers/base/ module Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Link: https://lore.kernel.org/r/20210928193138.24192-1-liumh1@shanghaitech.edu.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-05 16:21:01 +02:00
Max Gurtovoy	d460d7f7bb	driver core: use NUMA_NO_NODE during device_initialize Don't use (-1) constant for setting initial device node. Instead, use the generic NUMA_NO_NODE definition to indicate that "no node id specified". Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Link: https://lore.kernel.org/r/20211004133453.18881-1-mgurtovoy@nvidia.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-05 15:42:22 +02:00
Yang Yingliang	df0a181494	driver core: Fix possible memory leak in device_link_add() I got memory leak as follows: unreferenced object 0xffff88801f0b2200 (size 64): comm "i2c-lis2hh12-21", pid 5455, jiffies 4294944606 (age 15.224s) hex dump (first 32 bytes): 72 65 67 75 6c 61 74 6f 72 3a 72 65 67 75 6c 61 regulator:regula 74 6f 72 2e 30 2d 2d 69 32 63 3a 31 2d 30 30 31 tor.0--i2c:1-001 backtrace: [<00000000bf5b0c3b>] __kmalloc_track_caller+0x19f/0x3a0 [<0000000050da42d9>] kvasprintf+0xb5/0x150 [<000000004bbbed13>] kvasprintf_const+0x60/0x190 [<00000000cdac7480>] kobject_set_name_vargs+0x56/0x150 [<00000000bf83f8e8>] dev_set_name+0xc0/0x100 [<00000000cc1cf7e3>] device_link_add+0x6b4/0x17c0 [<000000009db9faed>] _regulator_get+0x297/0x680 [<00000000845e7f2b>] _devm_regulator_get+0x5b/0xe0 [<000000003958ee25>] st_sensors_power_enable+0x71/0x1b0 [st_sensors] [<000000005f450f52>] st_accel_i2c_probe+0xd9/0x150 [st_accel_i2c] [<00000000b5f2ab33>] i2c_device_probe+0x4d8/0xbe0 [<0000000070fb977b>] really_probe+0x299/0xc30 [<0000000088e226ce>] __driver_probe_device+0x357/0x500 [<00000000c21dda32>] driver_probe_device+0x4e/0x140 [<000000004e650441>] __device_attach_driver+0x257/0x340 [<00000000cf1891b8>] bus_for_each_drv+0x166/0x1e0 When device_register() returns an error, the name allocated in dev_set_name() will be leaked, the put_device() should be used instead of kfree() to give up the device reference, then the name will be freed in kobject_cleanup() and the references of consumer and supplier will be decreased in device_link_release_fn(). Fixes: `287905e68d` ("driver core: Expose device link details in sysfs") Reported-by: Hulk Robot <hulkci@huawei.com> Reviewed-by: Saravana Kannan <saravanak@google.com> Reviewed-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20210930085714.2057460-1-yangyingliang@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-05 15:39:29 +02:00
Greg Kroah-Hartman	bb76c82358	Merge 5.15-rc4 into driver-core-next We need the driver core fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-04 09:20:57 +02:00
Linus Torvalds	84928ce3bb	Merge tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are some driver core and kernfs fixes for reported issues for 5.15-rc4. These fixes include: - kernfs positive dentry bugfix - debugfs_create_file_size error path fix - cpumask sysfs file bugfix to preserve the user/kernel abi (has been reported multiple times.) - devlink fixes for mdiobus devices as reported by the subsystem maintainers. Also included in here are some devlink debugging changes to make it easier for people to report problems when asked. They have already helped with the mdiobus and other subsystems reporting issues. All of these have been linux-next for a while with no reported issues" * tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: kernfs: also call kernfs_set_rev() for positive dentry driver core: Add debug logs when fwnode links are added/deleted driver core: Create __fwnode_link_del() helper function driver core: Set deferred probe reason when deferred by driver core net: mdiobus: Set FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for mdiobus parents driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD driver core: fw_devlink: Improve handling of cyclic dependencies cpumask: Omit terminating null byte in cpumap_print_{list,bitmask}_to_buf debugfs: debugfs_create_file_size(): use IS_ERR to check for error	2021-10-03 11:10:09 -07:00
Saravana Kannan	ebd6823af3	driver core: Add debug logs when fwnode links are added/deleted This will help with debugging fw_devlink issues. Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20210915172808.620546-4-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-09-28 09:48:48 +02:00
Saravana Kannan	76f130810b	driver core: Create __fwnode_link_del() helper function The same code is repeated in multiple locations. Create a helper function for it. Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20210915172808.620546-3-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-09-28 09:48:48 +02:00
Saravana Kannan	68223eeec7	driver core: Set deferred probe reason when deferred by driver core When the driver core defers the probe of a device, set the deferred probe reason so that it's easier to debug. The deferred probe reason is available in debugfs under devices_deferred. Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20210915172808.620546-2-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-09-28 09:48:48 +02:00
Linus Torvalds	47d7e65d64	Merge tag 'devprop-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull device properties framework fix from Rafael Wysocki: "Fix software node refcount imbalance on device removal (Laurentiu Tudor)" * tag 'devprop-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: software node: balance refcount for managed software nodes	2021-09-24 11:20:29 -07:00
Saravana Kannan	5501765a02	driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD If a parent device is also a supplier to a child device, fw_devlink=on by design delays the probe() of the child device until the probe() of the parent finishes successfully. However, some drivers of such parent devices (where parent is also a supplier) expect the child device to finish probing successfully as soon as they are added using device_add() and before the probe() of the parent device has completed successfully. One example of such a case is discussed in the link mentioned below. Add a flag to make fw_devlink=on not enforce these supplier-consumer relationships, so these drivers can continue working. Link: https://lore.kernel.org/netdev/CAGETcx_uj0V4DChME-gy5HGKTYnxLBX=TH2rag29f_p=UcG+Tg@mail.gmail.com/ Fixes: `ea718c6990` ("Revert "Revert "driver core: Set fw_devlink=on by default""") Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20210915170940.617415-3-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-09-23 19:26:54 +02:00
Douglas Anderson	7065f92255	driver core: Clarify that dev_err_probe() is OK even w/out -EPROBE_DEFER There is some debate about whether it's deemed acceptable to call dev_err_probe() if you know that the error code can never be -EPROBE_DEFER. Clarify in the function comments that this is OK. Specifically this makes us able to transform code like this: ret = do_something_that_cant_defer(); if (ret < 0) { dev_err(dev, "The foo failed to bar (%pe)\n", ERR_PTR(ret)); return ret; } to code like this: ret = do_something_that_cant_defer(); if (ret < 0) return dev_err_probe(dev, ret, "The foo failed to bar\n"); It is also possible that in the future folks might want a CONFIG option to strip out all probe error strings to save space (keeping non-probe errors) with the argument that probe errors rarely happen after bringup. Having probe errors reported with a consistent function would allow that. Cc: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20210916161931.1.I32bea713bd6c6fb419a24da73686145742b6c117@changeid Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-09-21 18:29:35 +02:00
Saravana Kannan	2de9d8e0d2	driver core: fw_devlink: Improve handling of cyclic dependencies When we have a dependency of the form: Device-A -> Device-C Device-B Device-C -> Device-B Where, * Indentation denotes "child of" parent in previous line. * X -> Y denotes X is consumer of Y based on firmware (Eg: DT). We have cyclic dependency: device-A -> device-C -> device-B -> device-A fw_devlink current treats device-C -> device-B dependency as an invalid dependency and doesn't enforce it but leaves the rest of the dependencies as is. While the current behavior is necessary, it is not sufficient if the false dependency in this example is actually device-A -> device-C. When this is the case, device-C will correctly probe defer waiting for device-B to be added, but device-A will be incorrectly probe deferred by fw_devlink waiting on device-C to probe successfully. Due to this, none of the devices in the cycle will end up probing. To fix this, we need to go relax all the dependencies in the cycle like we already do in the other instances where fw_devlink detects cycles. A real world example of this was reported[1] and analyzed[2]. [1] - https://lore.kernel.org/lkml/0a2c4106-7f48-2bb5-048e-8c001a7c3fda@samsung.com/ [2] - https://lore.kernel.org/lkml/CAGETcx8peaew90SWiux=TyvuGgvTQOmO4BFALz7aj0Za5QdNFQ@mail.gmail.com/ Fixes: `f9aa460672` ("driver core: Refactor fw_devlink feature") Cc: stable <stable@vger.kernel.org> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20210915170940.617415-2-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-09-21 18:21:48 +02:00
Linus Torvalds	c6460daea2	Merge tag 'for-linus-5.15b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - The first hunk of a Xen swiotlb fixup series fixing multiple minor issues and doing some small cleanups - Some further Xen related fixes avoiding WARN() splats when running as Xen guests or dom0 - A Kconfig fix allowing the pvcalls frontend to be built as a module * tag 'for-linus-5.15b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: swiotlb-xen: drop DEFAULT_NSLABS swiotlb-xen: arrange to have buffer info logged swiotlb-xen: drop leftover __ref swiotlb-xen: limit init retries swiotlb-xen: suppress certain init retries swiotlb-xen: maintain slab count properly swiotlb-xen: fix late init retry swiotlb-xen: avoid double free xen/pvcalls: backend can be a module xen: fix usage of pmd_populate in mremap for pv guests xen: reset legacy rtc flag for PV domU PM: base: power: don't try to use non-existing RTC for storing data xen/balloon: use a kernel thread instead a workqueue	2021-09-17 08:31:49 -07:00
Laurentiu Tudor	5aeb05b27f	software node: balance refcount for managed software nodes software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed software nodes, thus leading to underflow errors. Balance the refcount by bumping it in the device_create_managed_software_node() function. The error [1] was encountered after adding a .shutdown() op to our fsl-mc-bus driver. [1] pc : refcount_warn_saturate+0xf8/0x150 lr : refcount_warn_saturate+0xf8/0x150 sp : ffff80001009b920 x29: ffff80001009b920 x28: ffff1a2420318000 x27: 0000000000000000 x26: ffffccac15e7a038 x25: 0000000000000008 x24: ffffccac168e0030 x23: ffff1a2428a82000 x22: 0000000000080000 x21: ffff1a24287b5000 x20: 0000000000000001 x19: ffff1a24261f4400 x18: ffffffffffffffff x17: 6f72645f726f7272 x16: 0000000000000000 x15: ffff80009009b607 x14: 0000000000000000 x13: ffffccac16602670 x12: 0000000000000a17 x11: 000000000000035d x10: ffffccac16602670 x9 : ffffccac16602670 x8 : 00000000ffffefff x7 : ffffccac1665a670 x6 : ffffccac1665a670 x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff1a2420318000 Call trace: refcount_warn_saturate+0xf8/0x150 kobject_put+0x10c/0x120 software_node_notify+0xd8/0x140 device_platform_notify+0x4c/0xb4 device_del+0x188/0x424 fsl_mc_device_remove+0x2c/0x4c rebofind sp.c__fsl_mc_device_remove+0x14/0x2c device_for_each_child+0x5c/0xac dprc_remove+0x9c/0xc0 fsl_mc_driver_remove+0x28/0x64 __device_release_driver+0x188/0x22c device_release_driver+0x30/0x50 bus_remove_device+0x128/0x134 device_del+0x16c/0x424 fsl_mc_bus_remove+0x8c/0x114 fsl_mc_bus_shutdown+0x14/0x20 platform_shutdown+0x28/0x40 device_shutdown+0x15c/0x330 __do_sys_reboot+0x218/0x2a0 __arm64_sys_reboot+0x28/0x34 invoke_syscall+0x48/0x114 el0_svc_common+0x40/0xdc do_el0_svc+0x2c/0x94 el0_svc+0x2c/0x54 el0t_64_sync_handler+0xa8/0x12c el0t_64_sync+0x198/0x19c ---[ end trace 32eb1c71c7d86821 ]--- Fixes: `151f6ff78c` ("software node: Provide replacement for device_add_properties()") Reported-by: Jon Nettleton <jon@solid-run.com> Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Cc: 5.12+ <stable@vger.kernel.org> # 5.12+ [ rjw: Fix up the software_node_notify() invocation ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-09-16 13:13:48 +02:00
Linus Torvalds	77e02cf57b	memblock: introduce saner 'memblock_free_ptr()' interface The boot-time allocation interface for memblock is a mess, with 'memblock_alloc()' returning a virtual pointer, but then you are supposed to free it with 'memblock_free()' that takes a _physical_ address. Not only is that all kinds of strange and illogical, but it actually causes bugs, when people then use it like a normal allocation function, and it fails spectacularly on a NULL pointer: https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/ or just random memory corruption if the debug checks don't catch it: https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/ I really don't want to apply patches that treat the symptoms, when the fundamental cause is this horribly confusing interface. I started out looking at just automating a sane replacement sequence, but because of this mix or virtual and physical addresses, and because people have used the "__pa()" macro that can take either a regular kernel pointer, or just the raw "unsigned long" address, it's all quite messy. So this just introduces a new saner interface for freeing a virtual address that was allocated using 'memblock_alloc()', and that was kept as a regular kernel pointer. And then it converts a couple of users that are obvious and easy to test, including the 'xbc_nodes' case in lib/bootconfig.c that caused problems. Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: `40caa127f3` ("init: bootconfig: Remove all bootconfig data when the init memory is removed") Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-14 13:23:22 -07:00
Cai Huoqing	86854b4379	driver core: platform: Make use of the helper macro SET_RUNTIME_PM_OPS() Use the helper macro SET_RUNTIME_PM_OPS() instead of the verbose operators ".runtime_suspend/.runtime_resume", because the SET_RUNTIME_PM_OPS() is a nice helper macro that could be brought in to make code a little clearer, a little more concise. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Link: https://lore.kernel.org/r/20210828090219.1177-1-caihuoqing@baidu.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-09-14 16:52:41 +02:00
Juergen Gross	0560204b36	PM: base: power: don't try to use non-existing RTC for storing data If there is no legacy RTC device, don't try to use it for storing trace data across suspend/resume. Cc: <stable@vger.kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20210903084937.19392-2-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>	2021-09-14 09:56:20 +02:00
Rafael J. Wysocki	be2d24336f	Merge branches 'pm-cpufreq', 'pm-sleep' and 'pm-em' * pm-cpufreq: cpufreq: intel_pstate: hybrid: Rework HWP calibration ACPI: CPPC: Introduce cppc_get_nominal_perf() * pm-sleep: PM: sleep: core: Avoid setting power.must_resume to false PM: sleep: wakeirq: drop useless parameter from dev_pm_attach_wake_irq() * pm-em: Documentation: power: include kernel-doc in Energy Model doc PM: EM: fix kernel-doc comments	2021-09-10 20:26:08 +02:00
Linus Torvalds	30f3490978	Merge tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These are mostly ARM cpufreq driver updates, including one new MediaTek driver that has just passed all of the reviews, with the addition of a revert of a recent intel_pstate commit, some core cpufreq changes and a DT-related update of the operating performance points (OPP) support code. Specifics: - Add new cpufreq driver for the MediaTek MT6779 platform called mediatek-hw along with corresponding DT bindings (Hector.Yuan). - Add DCVS interrupt support to the qcom-cpufreq-hw driver (Thara Gopinath). - Make the qcom-cpufreq-hw driver set the dvfs_possible_from_any_cpu policy flag (Taniya Das). - Blocklist more Qualcomm platforms in cpufreq-dt-platdev (Bjorn Andersson). - Make the vexpress cpufreq driver set the CPUFREQ_IS_COOLING_DEV flag (Viresh Kumar). - Add new cpufreq driver callback to allow drivers to register with the Energy Model in a consistent way and make several drivers use it (Viresh Kumar). - Change the remaining users of the .ready() cpufreq driver callback to move the code from it elsewhere and drop it from the cpufreq core (Viresh Kumar). - Revert recent intel_pstate change adding HWP guaranteed performance change notification support to it that led to problems, because the notification in question is triggered prematurely on some systems (Rafael Wysocki). - Convert the OPP DT bindings to DT schema and clean them up while at it (Rob Herring)" * tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits) Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification" cpufreq: mediatek-hw: Add support for CPUFREQ HW cpufreq: Add of_perf_domain_get_sharing_cpumask dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW cpufreq: Remove ready() callback cpufreq: sh: Remove sh_cpufreq_cpu_ready() cpufreq: acpi: Remove acpi_cpufreq_cpu_ready() cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support cpufreq: scmi: Use .register_em() to register with energy model cpufreq: vexpress: Use .register_em() to register with energy model cpufreq: scpi: Use .register_em() to register with energy model dt-bindings: opp: Convert to DT schema dt-bindings: Clean-up OPP binding node names in examples ARM: dts: omap: Drop references to opp.txt cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model cpufreq: omap: Use .register_em() to register with energy model cpufreq: mediatek: Use .register_em() to register with energy model cpufreq: imx6q: Use .register_em() to register with energy model ...	2021-09-08 16:38:25 -07:00
Linus Torvalds	2d338201d5	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: "147 patches, based on `7d2a07b769`. Subsystems affected by this patch series: mm (memory-hotplug, rmap, ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan), alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib, checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig, selftests, ipc, and scripts" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits) scripts: check_extable: fix typo in user error message mm/workingset: correct kernel-doc notations ipc: replace costly bailout check in sysvipc_find_ipc() selftests/memfd: remove unused variable Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH configs: remove the obsolete CONFIG_INPUT_POLLDEV prctl: allow to setup brk for et_dyn executables pid: cleanup the stale comment mentioning pidmap_init(). kernel/fork.c: unexport get_{mm,task}_exe_file coredump: fix memleak in dump_vma_snapshot() fs/coredump.c: log if a core dump is aborted due to changed file permissions nilfs2: use refcount_dec_and_lock() to fix potential UAF nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group nilfs2: fix NULL pointer in nilfs_##name##_attr_release nilfs2: fix memory leak in nilfs_sysfs_create_device_group trap: cleanup trap_init() init: move usermodehelper_enable() to populate_rootfs() ...	2021-09-08 12:55:35 -07:00
David Hildenbrand	3fcebf9020	mm/memory_hotplug: improved dynamic memory group aware "auto-movable" online policy Currently, the "auto-movable" online policy does not allow for hotplugged KERNEL (ZONE_NORMAL) memory to increase the amount of MOVABLE memory we can have, primarily, because there is no coordiantion across memory devices and we don't want to create zone-imbalances accidentially when unplugging memory. However, within a single memory device it's different. Let's allow for KERNEL memory within a dynamic memory group to allow for more MOVABLE within the same memory group. The only thing we have to take care of is that the managing driver avoids zone imbalances by unplugging MOVABLE memory first, otherwise there can be corner cases where unplug of memory could result in (accidential) zone imbalances. virtio-mem is the only user of dynamic memory groups and recently added support for prioritizing unplug of ZONE_MOVABLE over ZONE_NORMAL, so we don't need a new toggle to enable it for dynamic memory groups. We limit this handling to dynamic memory groups, because: * We want to keep the runtime overhead for collecting stats when onlining a single memory block small. We tend to have only a handful of dynamic memory groups, but we can have quite some static memory groups (e.g., 256 DIMMs). * It doesn't make too much sense for static memory groups, as we try onlining all applicable memory blocks either completely to ZONE_MOVABLE or not. In ordinary operation, we won't have a mixture of zones within a static memory group. When adding memory to a dynamic memory group, we'll first online memory to ZONE_MOVABLE as long as early KERNEL memory allows for it. Then, we'll online the next unit(s) to ZONE_NORMAL, until we can online the next unit(s) to ZONE_MOVABLE. For a simple virtio-mem device with a MOVABLE:KERNEL ratio of 3:1, it will result in a layout like: [M][M][M][M][M][M][M][M][N][M][M][M][N][M][M][M]... ^ movable memory due to early kernel memory ^ allows for more movable memory ... ^-----^ ... here ^ allows for more movable memory ... ^-----^ ... here While the created layout is sub-optimal when it comes to contiguous zones, it gives us the maximum flexibility when dynamically growing/shrinking a device; we can grow small VMs really big in small steps, and still shrink reliably to e.g., 1/4 of the maximum VM size in this example, removing full memory blocks along with meta data more reliably. Mark dynamic memory groups in the xarray such that we can efficiently iterate over them when collecting stats. In usual setups, we have one virtio-mem device per NUMA node, and usually only a small number of NUMA nodes. Note: for now, there seems to be no compelling reason to make this behavior configurable. Link: https://lkml.kernel.org/r/20210806124715.17090-10-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hui Zhu <teawater@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Marek Kedzierski <mkedzier@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 11:50:23 -07:00
David Hildenbrand	445fcf7c72	mm/memory_hotplug: memory group aware "auto-movable" online policy Use memory groups to improve our "auto-movable" onlining policy: 1. For static memory groups (e.g., a DIMM), online a memory block MOVABLE only if all other memory blocks in the group are either MOVABLE or could be onlined MOVABLE. A DIMM will either be MOVABLE or not, not a mixture. 2. For dynamic memory groups (e.g., a virtio-mem device), online a memory block MOVABLE only if all other memory blocks inside the current unit are either MOVABLE or could be onlined MOVABLE. For a virtio-mem device with a device block size with 512 MiB, all 128 MiB memory blocks wihin a 512 MiB unit will either be MOVABLE or not, not a mixture. We have to pass the memory group to zone_for_pfn_range() to take the memory group into account. Note: for now, there seems to be no compelling reason to make this behavior configurable. Link: https://lkml.kernel.org/r/20210806124715.17090-9-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hui Zhu <teawater@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Marek Kedzierski <mkedzier@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 11:50:23 -07:00
David Hildenbrand	836809ec75	mm/memory_hotplug: track present pages in memory groups Let's track all present pages in each memory group. Especially, track memory present in ZONE_MOVABLE and memory present in one of the kernel zones (which really only is ZONE_NORMAL right now as memory groups only apply to hotplugged memory) separately within a memory group, to prepare for making smart auto-online decision for individual memory blocks within a memory group based on group statistics. Link: https://lkml.kernel.org/r/20210806124715.17090-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hui Zhu <teawater@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Marek Kedzierski <mkedzier@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 11:50:23 -07:00
David Hildenbrand	028fc57a1c	drivers/base/memory: introduce "memory groups" to logically group memory blocks In our "auto-movable" memory onlining policy, we want to make decisions across memory blocks of a single memory device. Examples of memory devices include ACPI memory devices (in the simplest case a single DIMM) and virtio-mem. For now, we don't have a connection between a single memory block device and the real memory device. Each memory device consists of 1..X memory block devices. Let's logically group memory blocks belonging to the same memory device in "memory groups". Memory groups can span multiple physical ranges and a memory group itself does not contain any information regarding physical ranges, only properties (e.g., "max_pages") necessary for improved memory onlining. Introduce two memory group types: 1) Static memory group: E.g., a single ACPI memory device, consisting of 1..X memory resources. A memory group consists of 1..Y memory blocks. The whole group is added/removed in one go. If any part cannot get offlined, the whole group cannot be removed. 2) Dynamic memory group: E.g., a single virtio-mem device. Memory is dynamically added/removed in a fixed granularity, called a "unit", consisting of 1..X memory blocks. A unit is added/removed in one go. If any part of a unit cannot get offlined, the whole unit cannot be removed. In case of 1) we usually want either all memory managed by ZONE_MOVABLE or none. In case of 2) we usually want to have as many units as possible managed by ZONE_MOVABLE. We want a single unit to be of the same type. For now, memory groups are an internal concept that is not exposed to user space; we might want to change that in the future, though. add_memory() users can specify a mgid instead of a nid when passing the MHP_NID_IS_MGID flag. Link: https://lkml.kernel.org/r/20210806124715.17090-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hui Zhu <teawater@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Marek Kedzierski <mkedzier@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 11:50:23 -07:00

... 4 5 6 7 8 ...

5452 Commits