The name of pm_runtime_callbacks_present() is confusing, because
it suggests that the device has PM-runtime callbacks if 'true' is
returned by that function, but in fact that may not be the case,
so replace it with pm_runtime_has_no_callbacks() which is not
ambiguous.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
rpm_suspend() simple bails out when conditions are wrong. But this is not
immediately obvious from the code. Make it clear what we do when conditions
are wrong in rpm_suspend().
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 21c27f0658 ("driver core: Fix SYNC_STATE_ONLY device link
implementation") didn't completely fix STATELESS + SYNC_STATE_ONLY
handling.
What looks like an optimization in that commit is actually a bug that
causes an if condition to always take the else path. This prevents
reordering of devices in the dpm_list when a DL_FLAG_STATELESS device
link is create on top of an existing DL_FLAG_SYNC_STATE_ONLY device
link.
Fixes: 21c27f0658 ("driver core: Fix SYNC_STATE_ONLY device link implementation")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20200520043626.181820-1-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When SYNC_STATE_ONLY support was added in commit 05ef983e0d ("driver
core: Add device link support for SYNC_STATE_ONLY flag"),
device_link_add() incorrectly skipped adding the new SYNC_STATE_ONLY
device link to the supplier's and consumer's "device link" list.
This causes multiple issues:
- The device link is lost forever from driver core if the caller
didn't keep track of it (caller typically isn't expected to). This is
a memory leak.
- The device link is also never visible to any other code path after
device_link_add() returns.
If we fix the "device link" list handling, that exposes a bunch of
issues.
1. The device link "status" state management code rightfully doesn't
handle the case where a DL_FLAG_MANAGED device link exists between a
supplier and consumer, but the consumer manages to probe successfully
before the supplier. The addition of DL_FLAG_SYNC_STATE_ONLY links break
this assumption. This causes device_links_driver_bound() to throw a
warning when this happens.
Since DL_FLAG_SYNC_STATE_ONLY device links are mainly used for creating
proxy device links for child device dependencies and aren't useful once
the consumer device probes successfully, this patch just deletes
DL_FLAG_SYNC_STATE_ONLY device links once its consumer device probes.
This way, we avoid the warning, free up some memory and avoid
complicating the device links "status" state management code.
2. Creating a DL_FLAG_STATELESS device link between two devices that
already have a DL_FLAG_SYNC_STATE_ONLY device link will result in the
DL_FLAG_STATELESS flag not getting set correctly. This patch also fixes
this.
Lastly, this patch also fixes minor whitespace issues.
Cc: stable@vger.kernel.org
Fixes: 05ef983e0d ("driver core: Add device link support for SYNC_STATE_ONLY flag")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20200519063000.128819-1-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit c8c43cee29 ("driver core: Fix
driver_deferred_probe_check_state() logic"), we set the default
driver_deferred_probe_timeout value to 30 seconds to allow for
drivers that are missing dependencies to have some time so that
the dependency may be loaded from userland after initcalls_done
is set.
However, Yoshihiro Shimoda reported that on his device that
expects to have unmet dependencies (due to "optional links" in
its devicetree), was failing to mount the NFS root.
In digging further, it seemed the problem was that while the
device properly probes after waiting 30 seconds for any missing
modules to load, the ip_auto_config() had already failed,
resulting in NFS to fail. This was due to ip_auto_config()
calling wait_for_device_probe() which doesn't wait for the
driver_deferred_probe_timeout to fire.
This patch tries to fix the issue by creating a waitqueue
for the driver_deferred_probe_timeout, and calling wait_event()
to make sure driver_deferred_probe_timeout is zero in
wait_for_device_probe() to make sure all the probing is
finished.
The downside to this solution is that kernel functionality that
uses wait_for_device_probe(), will block until the
driver_deferred_probe_timeout fires, regardless of if there is
any missing dependencies.
However, the previous patch reverts the default timeout value to
zero, so this side-effect will only affect users who specify a
driver_deferred_probe_timeout= value as a boot argument, where
the additional delay would be beneficial to allow modules to
load later during boot.
Thanks to Geert for chasing down that ip_auto_config was why NFS
was failing in this case!
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Basil Eljuse <Basil.Eljuse@arm.com>
Cc: Ferry Toth <fntoth@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: linux-pm@vger.kernel.org
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Fixes: c8c43cee29 ("driver core: Fix driver_deferred_probe_check_state() logic")
Signed-off-by: John Stultz <john.stultz@linaro.org>
Link: https://lore.kernel.org/r/20200422203245.83244-4-john.stultz@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch addresses a regression in 5.7-rc1+
In commit c8c43cee29 ("driver core: Fix
driver_deferred_probe_check_state() logic"), we both cleaned up
the logic and also set the default driver_deferred_probe_timeout
value to 30 seconds to allow for drivers that are missing
dependencies to have some time so that the dependency may be
loaded from userland after initcalls_done is set.
However, Yoshihiro Shimoda reported that on his device that
expects to have unmet dependencies (due to "optional links" in
its devicetree), was failing to mount the NFS root.
In digging further, it seemed the problem was that while the
device properly probes after waiting 30 seconds for any missing
modules to load, the ip_auto_config() had already failed,
resulting in NFS to fail. This was due to ip_auto_config()
calling wait_for_device_probe() which doesn't wait for the
driver_deferred_probe_timeout to fire.
Fixing that issue is possible, but could also introduce 30
second delays in bootups for users who don't have any
missing dependencies, which is not ideal.
So I think the best solution to avoid any regressions is to
revert back to a default timeout value of zero, and allow
systems that need to utilize the timeout in order for userland
to load any modules that supply misisng dependencies in the dts
to specify the timeout length via the exiting documented boot
argument.
Thanks to Geert for chasing down that ip_auto_config was why NFS
was failing in this case!
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Basil Eljuse <Basil.Eljuse@arm.com>
Cc: Ferry Toth <fntoth@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Anders Roxell <anders.roxell@linaro.org>
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fixes: c8c43cee29 ("driver core: Fix driver_deferred_probe_check_state() logic")
Signed-off-by: John Stultz <john.stultz@linaro.org>
Link: https://lore.kernel.org/r/20200422203245.83244-2-john.stultz@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When commit 8375e74f2b ("driver core: Add fw_devlink kernel
commandline option") added fw_devlink, it didn't implement "permissive"
mode correctly.
That commit got the device links flags correct to make sure unprobed
suppliers don't block the probing of a consumer. However, if a consumer
is waiting for mandatory suppliers to register, that could still block a
consumer from probing.
This commit fixes that by making sure in permissive mode, all suppliers
to a consumer are treated as a optional suppliers. So, even if a
consumer is waiting for suppliers to register and link itself (using the
DL_FLAG_SYNC_STATE_ONLY flag) to the supplier, the consumer is never
blocked from probing.
Fixes: 8375e74f2b ("driver core: Add fw_devlink kernel commandline option")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200331022832.209618-1-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It's currently the platform driver's responsibility to initialize the
pointer, dma_parms, for its corresponding struct device. The benefit with
this approach allows us to avoid the initialization and to not waste memory
for the struct device_dma_parameters, as this can be decided on a case by
case basis.
However, it has turned out that this approach is not very practical. Not
only does it lead to open coding, but also to real errors. In principle
callers of dma_set_max_seg_size() doesn't check the error code, but just
assumes it succeeds.
For these reasons, let's do the initialization from the common platform bus
at the device registration point. This also follows the way the PCI devices
are being managed, see pci_device_add().
Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org>
Tested-by: Haibo Chen <haibo.chen@nxp.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20200422100954.31211-1-ulf.hansson@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull driver core fixes from Greg KH:
"Here are some small firmware/driver core/debugfs fixes for 5.7-rc3.
The debugfs change is now possible as now the last users of
debugfs_create_u32() have been fixed up in the different trees that
got merged into 5.7-rc1, and I don't want it creeping back in.
The firmware changes did cause a regression in linux-next, so the
final patch here reverts part of that, re-exporting the symbol to
resolve that issue. All of these patches, with the exception of the
final one, have been in linux-next with only that one reported issue"
* tag 'driver-core-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
firmware_loader: revert removal of the fw_fallback_config export
debugfs: remove return value of debugfs_create_u32()
firmware_loader: remove unused exports
firmware: imx: fix compile-testing
Christoph's patch removed two unsused exported symbols, however, one
symbol is used by the firmware_loader itself. If CONFIG_FW_LOADER=m so
the firmware_loader is modular but CONFIG_FW_LOADER_USER_HELPER=y we fail
the build at mostpost.
ERROR: modpost: "fw_fallback_config" [drivers/base/firmware_loader/firmware_class.ko] undefined!
This happens because the variable fw_fallback_config is built into the
kernel if CONFIG_FW_LOADER_USER_HELPER=y always, so we need to grant
access to the firmware loader module by exporting it.
Revert only one hunk from his patch.
Fixes: 739604734b ("firmware_loader: remove unused exports")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20200424184916.22843-1-mcgrof@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Because all callers of dev_pm_smart_suspend_and_suspended use it only
for checking whether or not to skip driver suspend callbacks for a
device, rename it to dev_pm_skip_suspend() in analogy with
dev_pm_skip_resume().
No functional impact.
Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
The name of dev_pm_may_skip_resume() may be easily confused with the
power.may_skip_resume flag which is not checked by that function, so
rename the former as dev_pm_skip_resume().
No functional impact.
Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Because the power.may_skip_resume device status bit is taken
into account in combination with the DPM_FLAG_LEAVE_SUSPENDED
driver flag, it can be set to 'true' for all devices in the
"suspend" phase of a suspend-resume cycle, so do that.
Then, neither the PM core nor the middle-layer (sybsystem) code
handling it needs to set it to 'true' any more and it just has
to be cleared if there is a reason to avoid skipping the "noirq"
and "early" resume callbacks provided by the driver, so update
the code in question accordingly.
Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
The current code in device_resume_noirq() causes the entire early
resume and resume phases of device suspend to be skipped for
devices for which the noirq resume phase have been skipped (due
to the LEAVE_SUSPENDED flag being set) on the premise that those
devices should stay in runtime-suspend after system-wide resume.
However, that may not be correct in two situations. First, the
middle layer (subsystem) noirq resume callback may be missing for
a given device, but its early resume callback may be present and it
may need to do something even if it decides to skip the driver
callback. Second, if the device's wakeup settings were adjusted
in the suspend phase without resuming the device (that was in
runtime suspend at that time), they most likely need to be
adjusted again in the resume phase and so the driver callback
in that phase needs to be run.
For the above reason, modify the core to allow the middle layer
->resume_late callback to run even if its ->resume_noirq callback
is missing (and the core has skipped the driver-level callback
in that phase) and to allow all device callbacks to run in the
resume phase. Also make the core set the PM-runtime status of
devices with SMART_SUSPEND set whose resume callbacks are not
skipped to "active" in the "noirq" resume phase and update the
affected subsystems (PCI and ACPI) accordingly.
After this change, middle-layer (subsystem) callbacks will always
be invoked in all phases of system suspend and resume and driver
callbacks will always run in the prepare, suspend, resume, and
complete phases for all devices.
For devices with SMART_SUSPEND set, driver callbacks will be
skipped in the late and noirq phases of system suspend if those
devices remain in runtime suspend in __device_suspend_late().
Driver callbacks will also be skipped for them during the
noirq and early phases of the "thaw" transition related to
hibernation in that case.
Setting LEAVE_SUSPENDED means that the driver allows its callbacks
to be skipped in the noirq and early phases of system resume, but
some additional conditions need to be met for that to happen (among
other things, the power.may_skip_resume flag needs to be set for the
device during system suspend for the driver callbacks to be skipped
during the subsequent resume transition).
For all devices with SMART_SUSPEND set whose driver callbacks are
invoked during system resume, the PM-runtime status will be set to
"active" (by the core).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Commit 8b9ec6b732 ("PM core: Use new async_schedule_dev command")
introduced a new function for better performance.
However commit f2a424f6c6 ("PM / core: Introduce dpm_async_fn()
helper") went back to the non-optimized version, async_schedule().
So switch back to the sync_schedule_dev() to improve performance
Fixes: f2a424f6c6 ("PM / core: Introduce dpm_async_fn() helper")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fold four functions in the PM core that each have only one caller
now into their callers.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
The code to handle the SMART_SUSPEND driver PM flag is hard to follow
and somewhat inconsistent with respect to devices without middle-layer
(subsystem) callbacks.
Namely, for those devices the core takes the role of a middle layer
in providing the expected ordering of execution of callbacks (under
the assumption that the drivers setting SMART_SUSPEND can reuse their
PM-runtime callbacks directly for system-wide suspend). To that end,
it prevents driver ->suspend_late and ->suspend_noirq callbacks from
being executed for devices that are still runtime-suspended in
__device_suspend_late(), because running the same callback funtion
that was previously run by PM-runtime for them may be invalid.
However, it does that only for devices without any middle-layer
callbacks for the late/noirq/early suspend/resume phases even
though it would be simpler and more consistent to skip the
driver-lavel callbacks for all devices with SMART_SUSPEND set
that are runtime-suspended in __device_suspend_late().
Simplify the code in accordance with the above observation.
Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
For now, distributions implement advanced udev rules to essentially
- Don't online any hotplugged memory (s390x)
- Online all memory to ZONE_NORMAL (e.g., most virt environments like
hyperv)
- Online all memory to ZONE_MOVABLE in case the zone imbalance is taken
care of (e.g., bare metal, special virt environments)
In summary: All memory is usually onlined the same way, however, the
kernel always has to ask user space to come up with the same answer.
E.g., Hyper-V always waits for a memory block to get onlined before
continuing, otherwise it might end up adding memory faster than
onlining it, which can result in strange OOM situations. This waiting
slows down adding of a bigger amount of memory.
Let's allow to specify a default online_type, not just "online" and
"offline". This allows distributions to configure the default online_type
when booting up and be done with it.
We can now specify "offline", "online", "online_movable" and
"online_kernel" via
- "memhp_default_state=" on the kernel cmdline
- /sys/devices/system/memory/auto_online_blocks
just like we are able to specify for a single memory block via
/sys/devices/system/memory/memoryX/state
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Yumei Huang <yuhuang@redhat.com>
Link: http://lkml.kernel.org/r/20200317104942.11178-9-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm/memory_hotplug: allow to specify a default online_type", v3.
Distributions nowadays use udev rules ([1] [2]) to specify if and how to
online hotplugged memory. The rules seem to get more complex with many
special cases. Due to the various special cases,
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug
is handled via udev rules.
Every time we hotplug memory, the udev rule will come to the same
conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of
memory in separate memory blocks and wait for memory to get onlined by
user space before continuing to add more memory blocks (to not add memory
faster than it is getting onlined). This of course slows down the whole
memory hotplug process.
To make the job of distributions easier and to avoid udev rules that get
more and more complicated, let's extend the mechanism provided by
- /sys/devices/system/memory/auto_online_blocks
- "memhp_default_state=" on the kernel cmdline
to be able to specify also "online_movable" as well as "online_kernel"
=== Example /usr/libexec/config-memhotplug ===
#!/bin/bash
VIRT=`systemd-detect-virt --vm`
ARCH=`uname -p`
sense_virtio_mem() {
if [ -d "/sys/bus/virtio/drivers/virtio_mem/" ]; then
DEVICES=`find /sys/bus/virtio/drivers/virtio_mem/ -maxdepth 1 -type l | wc -l`
if [ $DEVICES != "0" ]; then
return 0
fi
fi
return 1
}
if [ ! -e "/sys/devices/system/memory/auto_online_blocks" ]; then
echo "Memory hotplug configuration support missing in the kernel"
exit 1
fi
if grep "memhp_default_state=" /proc/cmdline > /dev/null; then
echo "Memory hotplug configuration overridden in kernel cmdline (memhp_default_state=)"
exit 1
fi
if [ $VIRT == "microsoft" ]; then
echo "Detected Hyper-V on $ARCH"
# Hyper-V wants all memory in ZONE_NORMAL
ONLINE_TYPE="online_kernel"
elif sense_virtio_mem; then
echo "Detected virtio-mem on $ARCH"
# virtio-mem wants all memory in ZONE_NORMAL
ONLINE_TYPE="online_kernel"
elif [ $ARCH == "s390x" ] || [ $ARCH == "s390" ]; then
echo "Detected $ARCH"
# standby memory should not be onlined automatically
ONLINE_TYPE="offline"
elif [ $ARCH == "ppc64" ] || [ $ARCH == "ppc64le" ]; then
echo "Detected" $ARCH
# PPC64 onlines all hotplugged memory right from the kernel
ONLINE_TYPE="offline"
elif [ $VIRT == "none" ]; then
echo "Detected bare-metal on $ARCH"
# Bare metal users expect hotplugged memory to be unpluggable. We assume
# that ZONE imbalances on such enterpise servers cannot happen and is
# properly documented
ONLINE_TYPE="online_movable"
else
# TODO: Hypervisors that want to unplug DIMMs and can guarantee that ZONE
# imbalances won't happen
echo "Detected $VIRT on $ARCH"
# Usually, ballooning is used in virtual environments, so memory should go to
# ZONE_NORMAL. However, sometimes "movable_node" is relevant.
ONLINE_TYPE="online"
fi
echo "Selected online_type:" $ONLINE_TYPE
# Configure what to do with memory that will be hotplugged in the future
echo $ONLINE_TYPE 2>/dev/null > /sys/devices/system/memory/auto_online_blocks
if [ $? != "0" ]; then
echo "Memory hotplug cannot be configured (e.g., old kernel or missing permissions)"
# A backup udev rule should handle old kernels if necessary
exit 1
fi
# Process all already pluggedd blocks (e.g., DIMMs, but also Hyper-V or virtio-mem)
if [ $ONLINE_TYPE != "offline" ]; then
for MEMORY in /sys/devices/system/memory/memory*; do
STATE=`cat $MEMORY/state`
if [ $STATE == "offline" ]; then
echo $ONLINE_TYPE > $MEMORY/state
fi
done
fi
=== Example /usr/lib/systemd/system/config-memhotplug.service ===
[Unit]
Description=Configure memory hotplug behavior
DefaultDependencies=no
Conflicts=shutdown.target
Before=sysinit.target shutdown.target
After=systemd-modules-load.service
ConditionPathExists=|/sys/devices/system/memory/auto_online_blocks
[Service]
ExecStart=/usr/libexec/config-memhotplug
Type=oneshot
TimeoutSec=0
RemainAfterExit=yes
[Install]
WantedBy=sysinit.target
=== Example modification to the 40-redhat.rules [2] ===
: diff --git a/40-redhat.rules b/40-redhat.rules-new
: index 2c690e5..168fd03 100644
: --- a/40-redhat.rules
: +++ b/40-redhat.rules-new
: @@ -6,6 +6,9 @@ SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}
: # Memory hotadd request
: SUBSYSTEM!="memory", GOTO="memory_hotplug_end"
: ACTION!="add", GOTO="memory_hotplug_end"
: +# memory hotplug behavior configured
: +PROGRAM=="grep online /sys/devices/system/memory/auto_online_blocks", GOTO="memory_hotplug_end"
: +
: PROGRAM="/bin/uname -p", RESULT=="s390*", GOTO="memory_hotplug_end"
:
: ENV{.state}="online"
===
[1] https://github.com/lnykryn/systemd-rhel/pull/281
[2] https://github.com/lnykryn/systemd-rhel/blob/staging/rules/40-redhat.rules
This patch (of 8):
The name is misleading and it's not really clear what is "kept". Let's
just name it like the online_type name we expose to user space ("online").
Add some documentation to the types.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Yumei Huang <yuhuang@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Link: http://lkml.kernel.org/r/20200319131221.14044-1-david@redhat.com
Link: http://lkml.kernel.org/r/20200317104942.11178-2-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pages_correctly_probed() is a leftover from ancient times. It dates back
to commit 3947be1969 ("[PATCH] memory hotplug: sysfs and add/remove
functions"), where Pg_reserved checks were added as a sfety net:
/*
* The probe routines leave the pages reserved, just
* as the bootmem code does. Make sure they're still
* that way.
*/
The checks were refactored quite a bit over the years, especially in
commit b77eab7079 ("mm/memory_hotplug: optimize probe routine"), where
checks for present, valid, and online sections were added.
Hotplugged memory is added via add_memory(), which will create the full
memmap for the hotplugged memory, and mark all sections valid and present.
Only full memory blocks are onlined/offlined, so we also cannot have an
inconsistency in that regard (especially, memory blocks with some sections
being online and some being offline).
1. Boot memory always starts online. Since commit c5e79ef561
("mm/memory_hotplug.c: don't allow to online/offline memory blocks with
holes") we disallow to offline any memory with holes. Therefore, we
never online memory with holes. Present and validity checks are
superfluous.
2. Only complete memory blocks are onlined/offlined (and especially,
the state - online or offline - is stored for whole memory blocks).
Besides the core, only arch/powerpc/platforms/powernv/memtrace.c
manually calls offline_pages() and fiddels with memory block states.
But it also only offlines complete memory blocks.
3. To make any of these conditions trigger, something would have to be
terribly messed up in the core. (e.g., online/offline only some
sections of a memory block).
4. Memory unplug properly makes sure that all sysfs attributes were
removed (and therefore, that all threads left the sysfs handlers). We
don't have to worry about zombie devices at this point.
5. The valid_section_nr(section_nr) check is actually dead code, as it
would never have been reached due to the WARN_ON_ONCE(!pfn_valid(pfn)).
No wonder we haven't seen any of these errors in a long time (or even
ever, according to my search). Let's just get rid of them. Now, all
checks that could hinder onlining and offlining are completely
contained in online_pages()/offline_pages().
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Link: http://lkml.kernel.org/r/20200127110424.5757-3-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm: drop superfluous section checks when onlining/offlining".
Let's drop some superfluous section checks on the onlining/offlining path.
This patch (of 3):
Since commit c5e79ef561 ("mm/memory_hotplug.c: don't allow to
online/offline memory blocks with holes") we have a generic check in
offline_pages() that disallows offlining memory blocks with holes.
Memory blocks with missing sections are just another variant of these type
of blocks. We can stop checking (and especially storing) present
sections. A proper error message is now printed why offlining failed.
section_count was initially introduced in commit 0768121597 ("Driver
core: Add section count to memory_block struct") in order to detect when
it is okay to remove a memory block. It was used in commit 26bbe7ef6d
("drivers/base/memory.c: prohibit offlining of memory blocks with missing
sections") to disallow offlining memory blocks with missing sections. As
we refactored creation/removal of memory devices and have a proper check
for holes in place, we can drop the section_count.
This also removes a leftover comment regarding the mem_sysfs_mutex, which
was removed in commit 848e19ad3c ("drivers/base/memory.c: drop the
mem_sysfs_mutex").
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Link: http://lkml.kernel.org/r/20200127110424.5757-2-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull more power management updates from Rafael Wysocki:
"Additional power management updates.
These fix a corner-case suspend-to-idle wakeup issue on systems where
the ACPI SCI is shared with another wakeup source, add a kernel
command line option to set pm_debug_messages via the kernel command
line, add a document desctibing system-wide suspend and resume code
flows, modify cpufreq Kconfig to choose schedutil as the preferred
governor by default in a couple of cases and do some assorted
cleanups.
Specifics:
- Fix corner-case suspend-to-idle wakeup issue on systems where the
ACPI SCI is shared with another wakeup source (Hans de Goede).
- Add document describing system-wide suspend and resume code flows
to the admin guide (Rafael Wysocki).
- Add kernel command line option to set pm_debug_messages (Chen Yu).
- Choose schedutil as the preferred scaling governor by default on
ARM big.LITTLE systems and on x86 systems using the intel_pstate
driver in the passive mode (Linus Walleij, Rafael Wysocki).
- Drop racy and redundant checks from the PM core's device_prepare()
routine (Rafael Wysocki).
- Make resume from hibernation take the hibernation_restore() return
value into account (Dexuan Cui)"
* tag 'pm-5.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
platform/x86: intel_int0002_vgpio: Use acpi_register_wakeup_handler()
ACPI: PM: Add acpi_[un]register_wakeup_handler()
Documentation: PM: sleep: Document system-wide suspend code flows
cpufreq: Select schedutil when using big.LITTLE
PM: sleep: Add pm_debug_messages kernel command line option
PM: sleep: core: Drop racy and redundant checks from device_prepare()
PM: hibernate: Propagate the return value of hibernation_restore()
cpufreq: intel_pstate: Select schedutil as the default governor
Alan Stern points out that the WARN_ON() check in device_prepare()
is racy (because the PM-runtime API can be disabled briefly for any
device at any time and system suspend can start at any time too) and
the pm_runtime_suspended() check in the computation of the
direct_complete flag value is redundant (because it will be
repeated later anyway).
Drop both these checks accordingly.
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull networking updates from David Miller:
"Highlights:
1) Fix the iwlwifi regression, from Johannes Berg.
2) Support BSS coloring and 802.11 encapsulation offloading in
hardware, from John Crispin.
3) Fix some potential Spectre issues in qtnfmac, from Sergey
Matyukevich.
4) Add TTL decrement action to openvswitch, from Matteo Croce.
5) Allow paralleization through flow_action setup by not taking the
RTNL mutex, from Vlad Buslov.
6) A lot of zero-length array to flexible-array conversions, from
Gustavo A. R. Silva.
7) Align XDP statistics names across several drivers for consistency,
from Lorenzo Bianconi.
8) Add various pieces of infrastructure for offloading conntrack, and
make use of it in mlx5 driver, from Paul Blakey.
9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.
10) Lots of parallelization improvements during configuration changes
in mlxsw driver, from Ido Schimmel.
11) Add support to devlink for generic packet traps, which report
packets dropped during ACL processing. And use them in mlxsw
driver. From Jiri Pirko.
12) Support bcmgenet on ACPI, from Jeremy Linton.
13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei
Starovoitov, and your's truly.
14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.
15) Fix sysfs permissions when network devices change namespaces, from
Christian Brauner.
16) Add a flags element to ethtool_ops so that drivers can more simply
indicate which coalescing parameters they actually support, and
therefore the generic layer can validate the user's ethtool
request. Use this in all drivers, from Jakub Kicinski.
17) Offload FIFO qdisc in mlxsw, from Petr Machata.
18) Support UDP sockets in sockmap, from Lorenz Bauer.
19) Fix stretch ACK bugs in several TCP congestion control modules,
from Pengcheng Yang.
20) Support virtual functiosn in octeontx2 driver, from Tomasz
Duszynski.
21) Add region operations for devlink and use it in ice driver to dump
NVM contents, from Jacob Keller.
22) Add support for hw offload of MACSEC, from Antoine Tenart.
23) Add support for BPF programs that can be attached to LSM hooks,
from KP Singh.
24) Support for multiple paths, path managers, and counters in MPTCP.
From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
and others.
25) More progress on adding the netlink interface to ethtool, from
Michal Kubecek"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
cxgb4/chcr: nic-tls stats in ethtool
net: dsa: fix oops while probing Marvell DSA switches
net/bpfilter: remove superfluous testing message
net: macb: Fix handling of fixed-link node
net: dsa: ksz: Select KSZ protocol tag
netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
net: stmmac: create dwmac-intel.c to contain all Intel platform
net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
net: dsa: bcm_sf2: Add support for matching VLAN TCI
net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
net: dsa: bcm_sf2: Disable learning for ASP port
net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
net: dsa: b53: Restore VLAN entries upon (re)configuration
net: dsa: bcm_sf2: Fix overflow checks
hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
...
Pull arm64 updates from Catalin Marinas:
"The bulk is in-kernel pointer authentication, activity monitors and
lots of asm symbol annotations. I also queued the sys_mremap() patch
commenting the asymmetry in the address untagging.
Summary:
- In-kernel Pointer Authentication support (previously only offered
to user space).
- ARM Activity Monitors (AMU) extension support allowing better CPU
utilisation numbers for the scheduler (frequency invariance).
- Memory hot-remove support for arm64.
- Lots of asm annotations (SYM_*) in preparation for the in-kernel
Branch Target Identification (BTI) support.
- arm64 perf updates: ARMv8.5-PMU 64-bit counters, refactoring the
PMU init callbacks, support for new DT compatibles.
- IPv6 header checksum optimisation.
- Fixes: SDEI (software delegated exception interface) double-lock on
hibernate with shared events.
- Minor clean-ups and refactoring: cpu_ops accessor,
cpu_do_switch_mm() converted to C, cpufeature finalisation helper.
- sys_mremap() comment explaining the asymmetric address untagging
behaviour"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (81 commits)
mm/mremap: Add comment explaining the untagging behaviour of mremap()
arm64: head: Convert install_el2_stub to SYM_INNER_LABEL
arm64: Introduce get_cpu_ops() helper function
arm64: Rename cpu_read_ops() to init_cpu_ops()
arm64: Declare ACPI parking protocol CPU operation if needed
arm64: move kimage_vaddr to .rodata
arm64: use mov_q instead of literal ldr
arm64: Kconfig: verify binutils support for ARM64_PTR_AUTH
lkdtm: arm64: test kernel pointer authentication
arm64: compile the kernel with ptrauth return address signing
kconfig: Add support for 'as-option'
arm64: suspend: restore the kernel ptrauth keys
arm64: __show_regs: strip PAC from lr in printk
arm64: unwind: strip PAC from kernel addresses
arm64: mask PAC bits of __builtin_return_address
arm64: initialize ptrauth keys for kernel booting task
arm64: initialize and switch ptrauth kernel keys
arm64: enable ptrauth earlier
arm64: cpufeature: handle conflicts based on capability
arm64: cpufeature: Move cpu capability helpers inside C file
...
Pull core SMP updates from Thomas Gleixner:
"CPU (hotplug) updates:
- Support for locked CSD objects in smp_call_function_single_async()
which allows to simplify callsites in the scheduler core and MIPS
- Treewide consolidation of CPU hotplug functions which ensures the
consistency between the sysfs interface and kernel state. The low
level functions cpu_up/down() are now confined to the core code and
not longer accessible from random code"
* tag 'smp-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus()
cpu/hotplug: Hide cpu_up/down()
cpu/hotplug: Move bringup of secondary CPUs out of smp_init()
torture: Replace cpu_up/down() with add/remove_cpu()
firmware: psci: Replace cpu_up/down() with add/remove_cpu()
xen/cpuhotplug: Replace cpu_up/down() with device_online/offline()
parisc: Replace cpu_up/down() with add/remove_cpu()
sparc: Replace cpu_up/down() with add/remove_cpu()
powerpc: Replace cpu_up/down() with add/remove_cpu()
x86/smp: Replace cpu_up/down() with add/remove_cpu()
arm64: hibernate: Use bringup_hibernate_cpu()
cpu/hotplug: Provide bringup_hibernate_cpu()
arm64: Use reboot_cpu instead of hardconding it to 0
arm64: Don't use disable_nonboot_cpus()
ARM: Use reboot_cpu instead of hardcoding it to 0
ARM: Don't use disable_nonboot_cpus()
ia64: Replace cpu_down() with smp_shutdown_nonboot_cpus()
cpu/hotplug: Create a new function to shutdown nonboot cpus
cpu/hotplug: Add new {add,remove}_cpu() functions
sched/core: Remove rq.hrtick_csd_pending
...
Pull power management updates from Rafael Wysocki:
"These clean up and rework the PM QoS API, address a suspend-to-idle
wakeup regression on some ACPI-based platforms, clean up and extend a
few cpuidle drivers, update multiple cpufreq drivers and cpufreq
documentation, and fix a number of issues in devfreq and several other
things all over.
Specifics:
- Clean up and rework the PM QoS API to simplify the code and reduce
the size of it (Rafael Wysocki).
- Fix a suspend-to-idle wakeup regression on Dell XPS13 9370 and
similar platforms where the USB plug/unplug events are handled by
the EC (Rafael Wysocki).
- CLean up the intel_idle and PSCI cpuidle drivers (Rafael Wysocki,
Ulf Hansson).
- Extend the haltpoll cpuidle driver so that it can be forced to run
on some systems where it refused to load (Maciej Szmigiero).
- Convert several cpufreq documents to the .rst format and move the
legacy driver documentation into one common file (Mauro Carvalho
Chehab, Rafael Wysocki).
- Update several cpufreq drivers:
* Extend and fix the imx-cpufreq-dt driver (Anson Huang).
* Improve the -EPROBE_DEFER handling and fix unwanted CPU
overclocking on i.MX6ULL in imx6q-cpufreq (Anson Huang,
Christoph Niedermaier).
* Add support for Krait based SoCs to the qcom driver (Ansuel
Smith).
* Add support for OPP_PLUS to ti-cpufreq (Lokesh Vutla).
* Add platform specific intermediate callbacks support to
cpufreq-dt and update the imx6q driver (Peng Fan).
* Simplify and consolidate some pieces of the intel_pstate
driver and update its documentation (Rafael Wysocki, Alex
Hung).
- Fix several devfreq issues:
* Remove unneeded extern keyword from a devfreq header file and
use the DEVFREQ_GOV_UPDATE_INTERNAL event name instead of
DEVFREQ_GOV_INTERNAL (Chanwoo Choi).
* Fix the handling of dev_pm_qos_remove_request() result
(Leonard Crestez).
* Use constant name for userspace governor (Pierre Kuo).
* Get rid of doc warnings and fix a typo (Christophe JAILLET).
- Use built-in RCU list checking in some places in the PM core to
avoid false-positive RCU usage warnings (Madhuparna Bhowmik).
- Add explicit READ_ONCE()/WRITE_ONCE() annotations to low-level PM
QoS routines (Qian Cai).
- Fix removal of wakeup sources to avoid NULL pointer dereferences in
a corner case (Neeraj Upadhyay).
- Clean up the handling of hibernate compat ioctls and fix the
related documentation (Eric Biggers).
- Update the idle_inject power capping driver to use variable-length
arrays instead of zero-length arrays (Gustavo Silva).
- Fix list format in a PM QoS document (Randy Dunlap).
- Make the cpufreq stats module use scnprintf() to avoid potential
buffer overflows (Takashi Iwai).
- Add pm_runtime_get_if_active() to PM-runtime API (Sakari Ailus).
- Allow no domain-idle-states DT property in generic PM domains (Ulf
Hansson).
- Fix a broken y-axis scale in the intel_pstate_tracer utility (Doug
Smythies)"
* tag 'pm-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (78 commits)
cpufreq: intel_pstate: Simplify intel_pstate_cpu_init()
tools/power/x86/intel_pstate_tracer: fix a broken y-axis scale
ACPI: PM: s2idle: Refine active GPEs check
ACPICA: Allow acpi_any_gpe_status_set() to skip one GPE
PM: sleep: wakeup: Skip wakeup_source_sysfs_remove() if device is not there
PM / devfreq: Get rid of some doc warnings
PM / devfreq: Fix handling dev_pm_qos_remove_request result
PM / devfreq: Fix a typo in a comment
PM / devfreq: Change to DEVFREQ_GOV_UPDATE_INTERVAL event name
PM / devfreq: Remove unneeded extern keyword
PM / devfreq: Use constant name of userspace governor
ACPI: PM: s2idle: Fix comment in acpi_s2idle_prepare_late()
cpufreq: qcom: Add support for krait based socs
cpufreq: imx6q-cpufreq: Improve the logic of -EPROBE_DEFER handling
cpufreq: Use scnprintf() for avoiding potential buffer overflow
cpuidle: psci: Split psci_dt_cpu_init_idle()
PM / Domains: Allow no domain-idle-states DT property in genpd when parsing
PM / hibernate: Remove unnecessary compat ioctl overrides
PM: hibernate: fix docs for ioctls that return loff_t via pointer
Documentation: intel_pstate: update links for references
...
Pull driver core updates from Greg KH:
"Here is the "big" set of driver core changes for 5.7-rc1.
Nothing huge in here, just lots of little firmware core changes and
use of new apis, a libfs fix, a debugfs api change, and some driver
core deferred probe rework.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (44 commits)
Revert "driver core: Set fw_devlink to "permissive" behavior by default"
driver core: Set fw_devlink to "permissive" behavior by default
driver core: Replace open-coded list_last_entry()
driver core: Read atomic counter once in driver_probe_done()
libfs: fix infoleak in simple_attr_read()
driver core: Add device links from fwnode only for the primary device
platform/x86: touchscreen_dmi: Add info for the Chuwi Vi8 Plus tablet
platform/x86: touchscreen_dmi: Add EFI embedded firmware info support
Input: icn8505 - Switch to firmware_request_platform for retreiving the fw
Input: silead - Switch to firmware_request_platform for retreiving the fw
selftests: firmware: Add firmware_request_platform tests
test_firmware: add support for firmware_request_platform
firmware: Add new platform fallback mechanism and firmware_request_platform()
Revert "drivers: base: power: wakeup.c: Use built-in RCU list checking"
drivers: base: power: wakeup.c: Use built-in RCU list checking
component: allow missing unbind callback
debugfs: remove return value of debugfs_create_file_size()
debugfs: Check module state before warning in {full/open}_proxy_open()
firmware: fix a double abort case with fw_load_sysfs_fallback
arch_topology: Fix putting invalid cpu clk
...
Pull USB / PHY updates from Greg KH:
"Here are the big set of USB and PHY driver patches for 5.7-rc1.
Nothing huge here, some new PHY drivers, loads of USB gadget fixes and
updates, xhci updates, usb-serial driver updates and new device ids,
and other minor things. Full details in the shortlog.
All have been in linux-next for a while with no reported issues"
* tag 'usb-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (239 commits)
USB: cdc-acm: restore capability check order
usb: cdns3: make signed 1 bit bitfields unsigned
usb: gadget: fsl: remove unused variable 'driver_desc'
usb: gadget: f_fs: Fix use after free issue as part of queue failure
usb: typec: Correct the documentation for typec_cable_put()
USB: serial: io_edgeport: fix slab-out-of-bounds read in edge_interrupt_callback
USB: serial: option: add Wistron Neweb D19Q1
USB: serial: option: add BroadMobi BM806U
USB: serial: option: add support for ASKEY WWHC050
usb: core: Add ACPI support for USB interface devices
driver core: platform: Reimplement devm_platform_ioremap_resource
usb: dwc2: convert to devm_platform_get_and_ioremap_resource
usb: host: hisilicon: convert to devm_platform_get_and_ioremap_resource
usb: host: xhci-plat: convert to devm_platform_get_and_ioremap_resource
drivers: provide devm_platform_get_and_ioremap_resource()
phy: qcom-qusb2: Add new overriding tuning parameters in QUSB2 V2 PHY
phy: qcom-qusb2: Add support for overriding tuning parameters in QUSB2 V2 PHY
dt-bindings: phy: qcom-qusb2: Add support for overriding Phy tuning parameters
phy: qcom-qusb2: Add generic QUSB2 V2 PHY support
dt-bindings: phy: qcom,qusb2: Add compatibles for QUSB2 V2 phy and SC7180
...
Pull media updates from Mauro Carvalho Chehab:
- New sensor driver: imx219
- Support for some new pixelformats
- Support for Sun8i SoC
- Added more codecs to meson vdec driver
- Prepare for removing the legacy usbvision driver by moving it to
staging. This driver has issues and use legacy core APIs. If nobody
steps up to address those, it is time for its retirement.
- Several cleanups and improvements on drivers, with the addition of
new supported boards
* tag 'media/v5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (236 commits)
media: venus: firmware: Ignore secure call error on first resume
media: mtk-vpu: load vpu firmware from the new location
media: i2c: video-i2c: fix build errors due to 'imply hwmon'
media: MAINTAINERS: add myself to co-maintain Hantro G1/G2 for i.MX8MQ
media: hantro: add initial i.MX8MQ support
media: dt-bindings: Document i.MX8MQ VPU bindings
media: vivid: fix incorrect PA assignment to HDMI outputs
media: hantro: Add linux-rockchip mailing list to MAINTAINERS
media: cedrus: h264: Fix 4K decoding on H6
media: siano: Use scnprintf() for avoiding potential buffer overflow
media: rc: Use scnprintf() for avoiding potential buffer overflow
media: allegro: create new struct for channel parameters
media: allegro: move mail definitions to separate file
media: allegro: pass buffers through firmware
media: allegro: verify source and destination buffer in VCU response
media: allegro: handle dependency of bitrate and bitrate_peak
media: allegro: read bitrate mode directly from control
media: allegro: make QP configurable
media: allegro: make frame rate configurable
media: allegro: skip filler data if possible
...
We see multiple issues with the implementation/interface to compute
whether a memory block can be offlined (exposed via
/sys/devices/system/memory/memoryX/removable) and would like to simplify
it (remove the implementation).
1. It runs basically lockless. While this might be good for performance,
we see possible races with memory offlining that will require at
least some sort of locking to fix.
2. Nowadays, more false positives are possible. No arch-specific checks
are performed that validate if memory offlining will not be denied
right away (and such check will require locking). For example, arm64
won't allow to offline any memory block that was added during boot -
which will imply a very high error rate. Other archs have other
constraints.
3. The interface is inherently racy. E.g., if a memory block is detected
to be removable (and was not a false positive at that time), there is
still no guarantee that offlining will actually succeed. So any
caller already has to deal with false positives.
4. It is unclear which performance benefit this interface actually
provides. The introducing commit 5c755e9fd8 ("memory-hotplug: add
sysfs removable attribute for hotplug memory remove") mentioned
"A user-level agent must be able to identify which sections
of memory are likely to be removable before attempting the
potentially expensive operation."
However, no actual performance comparison was included.
Known users:
- lsmem: Will group memory blocks based on the "removable" property. [1]
- chmem: Indirect user. It has a RANGE mode where one can specify
removable ranges identified via lsmem to be offlined. However,
it also has a "SIZE" mode, which allows a sysadmin to skip the
manual "identify removable blocks" step. [2]
- powerpc-utils: Uses the "removable" attribute to skip some memory
blocks right away when trying to find some to offline+remove.
However, with ballooning enabled, it already skips this
information completely (because it once resulted in many false
negatives). Therefore, the implementation can deal with false
positives properly already. [3]
According to Nathan Fontenot, DLPAR on powerpc is nowadays no longer
driven from userspace via the drmgr command (powerpc-utils). Nowadays
it's managed in the kernel - including onlining/offlining of memory
blocks - triggered by drmgr writing to /sys/kernel/dlpar. So the
affected legacy userspace handling is only active on old kernels. Only
very old versions of drmgr on a new kernel (unlikely) might execute
slower - totally acceptable.
With CONFIG_MEMORY_HOTREMOVE, always indicating "removable" should not
break any user space tool. We implement a very bad heuristic now.
Without CONFIG_MEMORY_HOTREMOVE we cannot offline anything, so report
"not removable" as before.
Original discussion can be found in [4] ("[PATCH RFC v1] mm:
is_mem_section_removable() overhaul").
Other users of is_mem_section_removable() will be removed next, so that
we can remove is_mem_section_removable() completely.
[1] http://man7.org/linux/man-pages/man1/lsmem.1.html
[2] http://man7.org/linux/man-pages/man8/chmem.8.html
[3] https://github.com/ibm-power-utilities/powerpc-utils
[4] https://lkml.kernel.org/r/20200117105759.27905-1-david@redhat.com
Also, this patch probably fixes a crash reported by Steve.
http://lkml.kernel.org/r/CAPcyv4jpdaNvJ67SkjyUJLBnBnXXQv686BiVW042g03FUmWLXw@mail.gmail.com
Reported-by: "Scargall, Steve" <steve.scargall@intel.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Nathan Fontenot <ndfont@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Karel Zak <kzak@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200128093542.6908-1-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit c442a0d187 as it
breaks some of the Raspberry Pi devices. Marek writes:
This patch has just landed in linux-next 20200326. Sadly it
breaks booting of the Raspberry Pi3b and Pi4 boards, either in
32bit or 64bit mode. There is no warning nor panic message, just
a silent freeze. The last message shown on the earlycon is:
[ 0.893217] Serial: 8250/16550 driver, 1 ports, IRQ sharing enabled
so revert it for now and let's try again and add it to linux-next after
5.7-rc1 is out so that we can try to get more debugging/testing
happening.
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Saravana Kannan <saravanak@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use separate functions for the device core to bring a CPU up and down.
Users outside the device core must use add/remove_cpu() which will take
care of extra housekeeping work like keeping sysfs in sync.
Make cpu_up/down() static and replace the extra layer of indirection.
[ tglx: Removed the extra wrapper functions and adjusted function names ]
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200323135110.30522-18-qais.yousef@arm.com
Skip wakeup_source_sysfs_remove() to fix a NULL pinter dereference via
ws->dev, if the wakeup source is unregistered before registering the
wakeup class from device_add().
Fixes: 2ca3d1ecb8 ("PM / wakeup: Register wakeup class kobj after device is added")
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
[ rjw: Subject & changelog, white space ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Set fw_devlink to "permissive" behavior by default so that device links
are automatically created (with DL_FLAG_SYNC_STATE_ONLY) by scanning the
firmware.
This ensures suppliers get their sync_state() calls only after all their
consumers have probed successfully. Without this, suppliers will get
their sync_state() calls at late_initcall_sync() even if their consuer
Ideally, we'd want to set fw_devlink to "on" or "rpm" by default. But
that needs more testing as it's known to break some corner case
drivers/platforms.
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: devicetree@vger.kernel.org
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20200321210305.28937-1-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>