* pm-cpufreq-sched:
cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq
* pm-opp:
PM / OPP: Add dev_pm_opp_{un}register_get_pstate_helper()
PM / OPP: Support updating performance state of device's power domain
PM / OPP: add missing of_node_put() for of_get_cpu_node()
PM / OPP: Rename dev_pm_opp_register_put_opp_helper()
PM / OPP: Add missing of_node_put(np)
PM / OPP: Move error message to debug level
PM / OPP: Use snprintf() to avoid kasprintf() and kfree()
PM / OPP: Move the OPP directory out of power/
* pm-cpufreq: (22 commits)
cpufreq: stats: Handle the case when trans_table goes beyond PAGE_SIZE
cpufreq: arm_big_little: make cpufreq_arm_bL_ops structures const
cpufreq: arm_big_little: make function arguments and structure pointer const
cpufreq: pxa: convert to clock API
cpufreq: speedstep-lib: mark expected switch fall-through
cpufreq: ti-cpufreq: add missing of_node_put()
cpufreq: dt: Remove support for Exynos4212 SoCs
cpufreq: imx6q: Move speed grading check to cpufreq driver
cpufreq: ti-cpufreq: kfree opp_data when failure
cpufreq: SPEAr: pr_err() strings should end with newlines
cpufreq: powernow-k8: pr_err() strings should end with newlines
cpufreq: dt-platdev: drop socionext,uniphier-ld6b from whitelist
arm64: wire cpu-invariant accounting support up to the task scheduler
arm64: wire frequency-invariant accounting support up to the task scheduler
arm: wire cpu-invariant accounting support up to the task scheduler
arm: wire frequency-invariant accounting support up to the task scheduler
drivers base/arch_topology: allow inlining cpu-invariant accounting support
drivers base/arch_topology: provide frequency-invariant accounting support
cpufreq: dt: invoke frequency-invariance setter function
cpufreq: arm_big_little: invoke frequency-invariance setter function
...
* pm-domains:
PM / Domains: Fix genpd to deal with drivers returning 1 from ->prepare()
PM / domains: Rework governor code to be more consistent
PM / Domains: Remove gpd_dev_ops.active_wakeup() callback
soc: rockchip: power-domain: Use GENPD_FLAG_ACTIVE_WAKEUP
soc: mediatek: Use GENPD_FLAG_ACTIVE_WAKEUP
ARM: shmobile: pm-rmobile: Use GENPD_FLAG_ACTIVE_WAKEUP
PM / Domains: Allow genpd users to specify default active wakeup behavior
PM / Domains: Add support to select performance-state of domains
PM / Domains: Rename genpd internals from pm_genpd_* to genpd_*
During system-wide PM, genpd relies on its PM callbacks to be invoked for
all its attached devices, as to deal with powering off/on the PM domain. In
other words, genpd is not compatible with the direct_complete path, if
executed by the PM core for any of its attached devices.
However, when genpd's ->prepare() callback invokes pm_generic_prepare(), it
does not take into account that it may return 1. Instead it treats that as
an error internally and expects the PM core to abort the prepare phase and
roll back. This leads to genpd not properly powering on/off the PM domain,
because its internal counters gets wrongly balanced.
To fix the behaviour, allow drivers to return 1 from their ->prepare()
callbacks, but let's return 0 from genpd's ->prepare() callback in such
case, as that prevents the PM core from running the direct_complete path
for the device.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The special value of 0 for device resume latency PM QoS means
"no restriction", but there are two problems with that.
First, device resume latency PM QoS requests with 0 as the
value are always put in front of requests with positive
values in the priority lists used internally by the PM QoS
framework, causing 0 to be chosen as an effective constraint
value. However, that 0 is then interpreted as "no restriction"
effectively overriding the other requests with specific
restrictions which is incorrect.
Second, the users of device resume latency PM QoS have no
way to specify that *any* resume latency at all should be
avoided, which is an artificial limitation in general.
To address these issues, modify device resume latency PM QoS to
use S32_MAX as the "no constraint" value and 0 as the "no
latency at all" one and rework its users (the cpuidle menu
governor, the genpd QoS governor and the runtime PM framework)
to follow these changes.
Also add a special "n/a" value to the corresponding user space I/F
to allow user space to indicate that it cannot accept any resume
latencies at all for the given device.
Fixes: 85dc0b8a40 (PM / QoS: Make it possible to expose PM QoS latency constraints)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197323
Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Tero Kristo <t-kristo@ti.com>
Reviewed-by: Ramesh Thomas <ramesh.thomas@intel.com>
The genpd governor currently uses negative PM QoS values to indicate
the "no suspend" condition and 0 as "no restriction", but it doesn't
use them consistently. Moreover, it tries to refresh QoS values for
already suspended devices in a quite questionable way.
For the above reasons, rework it to be a bit more consistent.
First off, note that dev_pm_qos_read_value() in
dev_update_qos_constraint() and __default_power_down_ok() is
evaluated for devices in suspend. Moreover, that only happens if the
effective_constraint_ns value for them is negative (meaning "no
suspend"). It is not evaluated in any other cases, so effectively
the QoS values are only updated for devices in suspend that should
not have been suspended in the first place. In all of the other
cases, the QoS values taken into account are the effective ones from
the time before the device has been suspended, so generally devices
need to be resumed and suspended again for new QoS values to take
effect anyway. Thus evaluating dev_update_qos_constraint() in
those two places doesn't make sense at all, so drop it.
Second, initialize effective_constraint_ns to 0 ("no constraint")
rather than to (-1) ("no suspend"), which makes more sense in
general and in case effective_constraint_ns is never updated
(the device is in suspend all the time or it is never suspended)
it doesn't affect the device's parent and so on.
Finally, rework default_suspend_ok() to explicitly handle the
"no restriction" and "no suspend" special cases.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Tero Kristo <t-kristo@ti.com>
Reviewed-by: Ramesh Thomas <ramesh.thomas@intel.com>
There are no more users left of the gpd_dev_ops.active_wakeup()
callback. All have been converted to GENPD_FLAG_ACTIVE_WAKEUP.
Hence remove the callback.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is quite common for PM Domains to require slave devices to be kept
active during system suspend if they are to be used as wakeup sources.
To enable this, currently each PM Domain or driver has to provide its
own gpd_dev_ops.active_wakeup() callback.
Introduce a new flag GENPD_FLAG_ACTIVE_WAKEUP to consolidate this.
If specified, all slave devices configured as wakeup sources will be
kept active during system suspend.
PM Domains that need more fine-grained controls, based on the slave
device, can still provide their own callbacks, as before.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The special value of 0 for device resume latency PM QoS means
"no restriction", but there are two problems with that.
First, device resume latency PM QoS requests with 0 as the
value are always put in front of requests with positive
values in the priority lists used internally by the PM QoS
framework, causing 0 to be chosen as an effective constraint
value. However, that 0 is then interpreted as "no restriction"
effectively overriding the other requests with specific
restrictions which is incorrect.
Second, the users of device resume latency PM QoS have no
way to specify that *any* resume latency at all should be
avoided, which is an artificial limitation in general.
To address these issues, modify device resume latency PM QoS to
use S32_MAX as the "no constraint" value and 0 as the "no
latency at all" one and rework its users (the cpuidle menu
governor, the genpd QoS governor and the runtime PM framework)
to follow these changes.
Also add a special "n/a" value to the corresponding user space I/F
to allow user space to indicate that it cannot accept any resume
latencies at all for the given device.
Fixes: 85dc0b8a40 (PM / QoS: Make it possible to expose PM QoS latency constraints)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197323
Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Alex Shi <alex.shi@linaro.org>
Cc: All applicable <stable@vger.kernel.org>
When I execute numactl -H (which reads /sys/devices/system/node/nodeX/cpumap
and displays cpumask_of_node for each node), I get different result
on X86 and arm64. For each numa node, the former only displayed online
CPUs, and the latter displayed all possible CPUs. Unfortunately, both
Linux documentation and numactl manual have not described it clear.
I sent a mail to ask for help, and Michal Hocko replied that he
preferred to print online cpus because it doesn't really make much sense
to bind anything on offline nodes.
Will said:
"I suspect the vast majority (if not all) code that reads this file was
developed for x86, so having the same behaviour for arm64 sounds like
something we should do ASAP before people try to special case with
things like #ifdef __aarch64__. I'd rather have this in 4.14 if
possible."
Link: http://lkml.kernel.org/r/1506678805-15392-2-git-send-email-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tianhong Ding <dingtianhong@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Libin <huawei.libin@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The PM QoS flag PM_QOS_FLAG_REMOTE_WAKEUP is not used consistently
and the vast majority of code simply assumes that remote wakeup
should be enabled for devices in runtime suspend if they can
generate wakeup signals, so drop it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Some platforms have the capability to configure the performance state of
PM domains. This patch enhances the genpd core to support such
platforms.
The performance levels (within the genpd core) are identified by
positive integer values, a lower value represents lower performance
state.
This patch adds a new genpd API, which is called by user drivers (like
OPP framework):
- int dev_pm_genpd_set_performance_state(struct device *dev,
unsigned int state);
This updates the performance state constraint of the device on its PM
domain. On success, the genpd will have its performance state set to a
value which is >= "state" passed to this routine. The genpd core calls
the genpd->set_performance_state() callback, if implemented,
else -ENODEV is returned to the caller.
The PM domain drivers need to implement the following callback if they
want to support performance states.
- int (*set_performance_state)(struct generic_pm_domain *genpd,
unsigned int state);
This is called internally by the genpd core on several occasions. The
genpd core passes the genpd pointer and the aggregate of the
performance states of the devices supported by that genpd to this
callback. This callback must update the performance state of the genpd
(in a platform dependent way).
The power domains can avoid supplying above callback, if they don't
support setting performance-states.
Currently we aren't propagating performance state changes of a subdomain
to its masters as we don't have hardware that needs it right now. Over
that, the performance states of subdomain and its masters may not have
one-to-one mapping and would require additional information. We can get
back to this once we have hardware that needs it.
Tested-by: Rajendra Nayak <rnayak@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
acpi_fwnode_get_reference_args(), the function implementing ACPI
support for fwnode_property_get_reference_args(), returns directly
error codes from __acpi_node_get_property_reference(). The latter
uses different error codes than the OF implementation. In particular,
the OF implementation uses -ENOENT to indicate that the property is
not found, a reference entry is empty and there are no more
references.
Document and align the error codes for property for
fwnode_property_get_reference_args() so that they match with
of_parse_phandle_with_args().
Fixes: 3e3119d308 (device property: Introduce fwnode_property_get_reference_args)
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Most of the functions names has already moved the genpd naming rules,
however let's make this complete to avoid any further confusions.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Deletion of subdevice will remove device properties associated to parent
when they share the same firmware node after commit 478573c93a (driver
core: Don't leak secondary fwnode on device removal). This was observed
with a driver adding subdevice that driver wasn't able to read device
properties after rmmod/modprobe cycle.
Consider the lifecycle of it:
parent device registration
ACPI_COMPANION_SET()
device_add_properties()
pset_copy_set()
set_secondary_fwnode(dev, &p->fwnode)
device_add()
parent probe
read device properties
ACPI_COMPANION_SET(subdevice, ACPI_COMPANION(parent))
device_add(subdevice)
parent remove
device_del(subdevice)
device_remove_properties()
set_secondary_fwnode(dev, NULL);
pset_free()
Parent device will have its primary firmware node pointing to an ACPI
node and secondary firmware node point to device properties.
ACPI_COMPANION_SET() call in parent probe will set the subdevice's
firmware node to point to the same 'struct fwnode_handle' and the
associated secondary firmware node, i.e. the device properties as the
parent.
When subdevice is deleted in parent remove that will remove those
device properties and attempt to read device properties in next
parent probe call will fail.
Fix this by tracking the owner device of device properties and delete
them only when owner device is being deleted.
Fixes: 478573c93a (driver core: Don't leak secondary fwnode on device removal)
Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Here are a few small fixes for 4.14-rc4.
The removal of DRIVER_ATTR() was almost completed by 4.14-rc1, but one
straggler made it in through some other tree (odds are, one of mine...)
So there's a simple removal of the last user, and then finally the macro
is removed from the tree.
There's a fix for old crazy udev instances that insist on reloading a
module when it is removed from the kernel due to the new uevents for
bind/unbind. This fixes the reported regression, hopefully some year in
the future we can drop the workaround, once users update to the latest
version, but I'm not holding my breath.
And then there's a build fix for a linker warning, and a buffer overflow
fix to match the PCI fixes you took through the PCI tree in the same
area.
All of these have been in linux-next for a few weeks while I've been
traveling, sorry for the delay.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWdN8qA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymLEgCfUSSBhxW04teEcPua4QygLv2omK0An2SRkpnY
28nn+D+AfeOByQImY8v+
=RQY+
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are a few small fixes for 4.14-rc4.
The removal of DRIVER_ATTR() was almost completed by 4.14-rc1, but one
straggler made it in through some other tree (odds are, one of
mine...) So there's a simple removal of the last user, and then
finally the macro is removed from the tree.
There's a fix for old crazy udev instances that insist on reloading a
module when it is removed from the kernel due to the new uevents for
bind/unbind. This fixes the reported regression, hopefully some year
in the future we can drop the workaround, once users update to the
latest version, but I'm not holding my breath.
And then there's a build fix for a linker warning, and a buffer
overflow fix to match the PCI fixes you took through the PCI tree in
the same area.
All of these have been in linux-next for a few weeks while I've been
traveling, sorry for the delay"
* tag 'driver-core-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: remove DRIVER_ATTR
fpga: altera-cvp: remove DRIVER_ATTR() usage
driver core: platform: Don't read past the end of "driver_override" buffer
base: arch_topology: fix section mismatch build warnings
driver core: suppress sending MODALIAS in UNBIND uevents
The drivers/base/power/ directory is special and contains code related
to power management core like system suspend/resume, hibernation, etc.
It was fine to keep the OPP code inside it when we had just one file for
it, but it is growing now and already has a directory for itself.
Lets move it directly under drivers/ directory, just like cpufreq and
cpuidle.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Allow inlining of topology_get_cpu_scale() into the task
scheduler fast path (e.g. __update_load_avg_se()) by coding it as a
static inline function in the arch topology header file.
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Implements the arch-specific (arm and arm64) frequency-invariance setter
function arch_set_freq_scale() which provides the following frequency
scaling factor:
current_freq(cpu) << SCHED_CAPACITY_SHIFT / max_supported_freq(cpu)
One possible consumer of the frequency-invariance getter function
topology_get_freq_scale() is the Per-Entity Load Tracking (PELT)
mechanism of the task scheduler.
Allow inlining of topology_get_freq_scale() into the task scheduler
fast path (e.g. __update_load_avg_se()) by coding it as a static inline
function in the arch topology header file.
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Free cpumask cpus_to_visit in case registering
init_cpu_capacity_notifier has failed or the parsing of the cpu
capacity-dmips-mhz property is done. The cpumask cpus_to_visit is
only used inside the notifier call init_cpu_capacity_callback.
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Reviewed-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The notifier callbacks may want to call some OPP helper routines which
may try to take the same opp_table->lock again and cause a deadlock. One
such usecase was reported by Chanwoo Choi, where calling
dev_pm_opp_disable() leads us to the devfreq's OPP notifier handler,
which further calls dev_pm_opp_find_freq_floor() and it deadlocks.
We don't really need the opp_table->lock to be held across the notifier
call though, all we want to make sure is that the 'opp' doesn't get
freed while being used from within the notifier chain. We can do it with
help of dev_pm_opp_get/put() as well. Let's do it.
Cc: 4.11+ <stable@vger.kernel.org> # 4.11+
Fixes: 5b650b3888 "PM / OPP: Take kref from _find_opp_table()"
Reported-by: Chanwoo Choi <cw00.choi@samsung.com>
Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Fix a regression in cpufreq on systems using DT as the source of
CPU configuration information where two different code paths
attempt to create the cpufreq-dt device object (there can be only
one) and fix up the "compatible" matching for some TI platforms
on top of that (Viresh Kumar, Dave Gerlach).
- Fix an initialization time memory leak in cpuidle on ARM which
occurs if the cpuidle driver initialization fails (Stefan Wahren).
- Fix a PM core function that checks whether or not there are any
system suspend/resume callbacks for a device, but forgets to
check legacy callbacks which then may be skipped incorrectly
and the system may crash and/or the device may become unusable
after a suspend-resume cycle (Rafael Wysocki).
- Fix request type validation for latency tolerance PM QoS requests
which may lead to unexpected behavior (Jan Schönherr).
- Fix a broken link to PM documentation from a header file and a
typo in a PM document (Geert Uytterhoeven, Rafael Wysocki).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZxYXOAAoJEILEb/54YlRxgkQP/1tQha6hykoryGscv1XGFPsc
VP7tpsf00gjMcUnbuRFbgD+jF60SAWtt48sIACUOqtxSXRTGdPHS8zggLo9pa0+A
A0Y8QlFXYHlk6up2iUyNOE0C2+UfD1pnh1fGfDtNLKob+hDhrrmRI2foMRVij9Cw
0+szDkUENcVwLcCzyWgv7uCdrG823kzBS0gCOJT4cpYid+s7lfFEwRZMz7m2Q/2Y
s0MMELdhBaaMAZyGIh28B2D+bAS1QTTxLr++nSiiIejVplraWlUYRToGlQMpZbJn
yPjKld5Zc6+VJ4sPl3oRw3wnfo4Rndwgx0VDGDdne1e0S7pQtIjNRbyGTbl2n6yS
+nckLpW767l9Mt9TVQzQBgnalUUZX78sqHqa3Ca9j8QXFcG0Khn5aewq76+9KVdv
uSAxUmthvErJUGa6GTkhBGywZxKtuIdA/eg23XDN/P2PSVW8hj+z4Fn8DylXSEo7
H3KMOJgSRWDFvAaYsusNEdQwy6+UWN9OiMoZYMVL6XcDYyFZrAaXtaZIb7MjOL5i
U9cv3QCU4CWtsbKoQb3qvQZudUa92MIHtLWILTuLT/lWPIPGiUGLbIYRE+hzRzkj
26J21Ex8veYuGEJT+YP6GBqtywX9aKuec2RZVSfqtAK/x/zXxwjFP6T3JHVril3T
/mxhLZIg/aJwIeJQ3JTG
=Cbgd
-----END PGP SIGNATURE-----
Merge tag 'pm-4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a cpufreq regression introduced by recent changes related to
the generic DT driver, an initialization time memory leak in cpuidle
on ARM, a PM core bug that may cause system suspend/resume to fail on
some systems, a request type validation issue in the PM QoS framework
and two documentation-related issues.
Specifics:
- Fix a regression in cpufreq on systems using DT as the source of
CPU configuration information where two different code paths
attempt to create the cpufreq-dt device object (there can be only
one) and fix up the "compatible" matching for some TI platforms on
top of that (Viresh Kumar, Dave Gerlach).
- Fix an initialization time memory leak in cpuidle on ARM which
occurs if the cpuidle driver initialization fails (Stefan Wahren).
- Fix a PM core function that checks whether or not there are any
system suspend/resume callbacks for a device, but forgets to check
legacy callbacks which then may be skipped incorrectly and the
system may crash and/or the device may become unusable after a
suspend-resume cycle (Rafael Wysocki).
- Fix request type validation for latency tolerance PM QoS requests
which may lead to unexpected behavior (Jan Schönherr).
- Fix a broken link to PM documentation from a header file and a typo
in a PM document (Geert Uytterhoeven, Rafael Wysocki)"
* tag 'pm-4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: ti-cpufreq: Support additional am43xx platforms
ARM: cpuidle: Avoid memleak if init fail
cpufreq: dt-platdev: Add some missing platforms to the blacklist
PM: core: Fix device_pm_check_callbacks()
PM: docs: Drop an excess character from devices.rst
PM / QoS: Use the correct variable to check the QoS request type
driver core: Fix link to device power management documentation
* pm-core:
PM: core: Fix device_pm_check_callbacks()
* pm-qos:
PM / QoS: Use the correct variable to check the QoS request type
* pm-docs:
PM: docs: Drop an excess character from devices.rst
driver core: Fix link to device power management documentation
The device_pm_check_callbacks() function doesn't check legacy
->suspend and ->resume callback pointers under the device's
bus type, class and driver, so in some cases it may set the
no_pm_callbacks flag for the device incorrectly and then the
callbacks may be skipped during system suspend/resume, which
shouldn't happen.
Fixes: aa8e54b559 (PM / sleep: Go direct_complete if driver has no callbacks)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 4.5+ <stable@vger.kernel.org> # 4.5+
When printing the driver_override parameter when it is 4095 and 4094 bytes
long, the printing code would access invalid memory because we need count+1
bytes for printing.
Reject driver_override values of these lengths in driver_override_store().
This is in close analogy to commit 4efe874aac ("PCI: Don't read past the
end of sysfs "driver_override" buffer") from Sasha Levin.
Fixes: 3d713e0e38 ("driver core: platform: add device binding path 'driver_override'")
Cc: stable@vger.kernel.org # v3.17+
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 2ef7a2953c ("arm, arm64: factorize common cpu capacity default code")
introduced init_cpu_capacity_callback and init_cpu_capacity_notifier
which are referenced from initcall and are missing __init{,data}
annotations resulting the below section mismatch build warnings.
"WARNING: vmlinux.o(.text+0xbab790): Section mismatch in reference from
the function init_cpu_capacity_callback() to the variable .init.text:$x
The function init_cpu_capacity_callback() references the variable
__init $x. This is often because init_cpu_capacity_callback lacks a
__init annotation or the annotation of $x is wrong."
This patch fixes the above build warnings by adding the required annotations.
Fixes: 2ef7a2953c ("arm, arm64: factorize common cpu capacity default code")
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use the actual function argument for the validation of the request type,
instead of the type field in a fresh (supposedly zero-initialized)
request structure.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
My recent bug fix introduced another bug, which caused rmem_dma_device_init
to always fail, as rmem->priv is never set to anything.
This restores the previous behavior, calling dma_init_coherent_memory()
whenever ->priv is NULL.
Fixes: d35b0996fe ("dma-coherent: fix dma_declare_coherent_memory() logic error")
Reported-by: Roy Pledge <roy.pledge@nxp.com>
Tested-by: Roy Pledge <roy.pledge@nxp.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Commit 5620a0d1aa ("firmware: delete in-kernel firmware") removed the
entire firmware directory. Unfortunately it thereby also removed the
support for built-in firmware.
This restores the ability to build firmware directly into the kernel by
pruning the original Makefile to the necessary minimum. The default for
EXTRA_FIRMWARE_DIR is now the standard directory /lib/firmware/.
Fixes: 5620a0d1aa ("firmware: delete in-kernel firmware")
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Acked-by: Greg K-H <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- removal of the old dma_alloc_noncoherent interface
- remove unused flags to dma_declare_coherent_memory
- restrict OF DMA configuration to specific physical busses
- use the iommu mailing list for dma-mapping questions and
patches
-----BEGIN PGP SIGNATURE-----
iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlm38VMLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYNa+BAAjlTbZ2DhEmBDKHDYjQepnqdw7oY/5B/6A3GKArbR
psXDmWR8HetbJC1+Yg2aaCvSWdza9cR36juI4WSsA2+4FVLU0nj5elkX9QEN/eP4
kiUuwUXCKi3MNd55ESIiVQos0RKvQ8us7rLDfug2h/8dVP5EdLNI//axHY4Y/R7k
6j6ymal9QdHxLj8wdkAQ332womV2jQH44ISIy/Eu/eF1DmWJ1HZvgO/PNeSyVpwB
bP8CRfMAnQDFqB0XiwCXyukNKntvBe7ZpazMdQKH6bObiLOa8DT2Ex/UeG6VBtb3
TXeLW9aSJDG9FfzlqwDlySZWqQlvcEEK7DWm7kNka9tiHx+AKZW+utlrS7Pvofut
ufqD9NRrmutBzgXsZiri/x9b5+UsREKsdHNyLHVfjYo9PlEsZwMjaXxExAlEdgvl
L0pU4EpLP8kK5pRV5K8IXCEVMyoejI2ZIrtu5DHlc3mblUApXkKx1TG81gJsncHh
fDLARilSgCN6g8wQjSiBiXwS6ynBRcmvpeIxrh3x7xQ7OqmqDMS5JcgJCNvBQEKm
yEMCKd8vf30Jpva3VRmQXyUiDMwebguUqSJUDJW0JeDFFGQ7O39JeOwB72LzVHjZ
N9VMh3gCvtkF2ZPV+HIPNbTFnfydzyjNW/BpGt4r5cyvfd2TjEDSPFeIOWZ+chVQ
j68=
=qHNs
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.14' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- removal of the old dma_alloc_noncoherent interface
- remove unused flags to dma_declare_coherent_memory
- restrict OF DMA configuration to specific physical busses
- use the iommu mailing list for dma-mapping questions and patches
* tag 'dma-mapping-4.14' of git://git.infradead.org/users/hch/dma-mapping:
dma-coherent: fix dma_declare_coherent_memory() logic error
ARM: imx: mx31moboard: Remove unused 'dma' variable
dma-coherent: remove an unused variable
MAINTAINERS: use the iommu list for the dma-mapping subsystem
dma-coherent: remove the DMA_MEMORY_MAP and DMA_MEMORY_IO flags
dma-coherent: remove the DMA_MEMORY_INCLUDES_CHILDREN flag
of: restrict DMA configuration
dma-mapping: remove dma_alloc_noncoherent and dma_free_noncoherent
i825xx: switch to switch to dma_alloc_attrs
au1000_eth: switch to dma_alloc_attrs
sgiseeq: switch to dma_alloc_attrs
dma-mapping: reduce dma_mapping_error inline bloat
This reverts commit 81f9507628.
It causes random failures of firmware loading at resume time (well,
random for me, it seems to be more reliable for others) because the
firmware disabling is not actually synchronous with any particular
resume event, and at least the btusb driver that uses a workqueue to
load the firmware at resume seems to occasionally hit the "firmware
loading is disabled" logic because the firmware loader hasn't gotten the
resume event yet.
Some kind of sanity check for not trying to load firmware when it's not
possible might be a good thing, but this commit was not it.
Greg seems to have silently suffered the same issue, and pointed to the
likely culprit, and Gabriel C verified the revert fixed it for him too.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Pointed-at-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Gabriel C <nix.or.die@gmail.com>
Cc: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
First, number of CPUs can't be negative number.
Second, different signnnedness leads to suboptimal code in the following
cases:
1)
kmalloc(nr_cpu_ids * sizeof(X));
"int" has to be sign extended to size_t.
2)
while (loff_t *pos < nr_cpu_ids)
MOVSXD is 1 byte longed than the same MOV.
Other cases exist as well. Basically compiler is told that nr_cpu_ids
can't be negative which can't be deduced if it is "int".
Code savings on allyesconfig kernel: -3KB
add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
function old new delta
coretemp_cpu_online 450 512 +62
rcu_init_one 1234 1272 +38
pci_device_probe 374 399 +25
...
pgdat_reclaimable_pages 628 556 -72
select_fallback_rq 446 369 -77
task_numa_find_cpu 1923 1807 -116
Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Separate NUMA statistics from zone statistics", v2.
Each page allocation updates a set of per-zone statistics with a call to
zone_statistics(). As discussed in 2017 MM summit, these are a
substantial source of overhead in the page allocator and are very rarely
consumed. This significant overhead in cache bouncing caused by zone
counters (NUMA associated counters) update in parallel in multi-threaded
page allocation (pointed out by Dave Hansen).
A link to the MM summit slides:
http://people.netfilter.org/hawk/presentations/MM-summit2017/MM-summit2017-JesperBrouer.pdf
To mitigate this overhead, this patchset separates NUMA statistics from
zone statistics framework, and update NUMA counter threshold to a fixed
size of MAX_U16 - 2, as a small threshold greatly increases the update
frequency of the global counter from local per cpu counter (suggested by
Ying Huang). The rationality is that these statistics counters don't
need to be read often, unlike other VM counters, so it's not a problem
to use a large threshold and make readers more expensive.
With this patchset, we see 31.3% drop of CPU cycles(537-->369, see
below) for per single page allocation and reclaim on Jesper's
page_bench03 benchmark. Meanwhile, this patchset keeps the same style
of virtual memory statistics with little end-user-visible effects (only
move the numa stats to show behind zone page stats, see the first patch
for details).
I did an experiment of single page allocation and reclaim concurrently
using Jesper's page_bench03 benchmark on a 2-Socket Broadwell-based
server (88 processors with 126G memory) with different size of threshold
of pcp counter.
Benchmark provided by Jesper D Brouer(increase loop times to 10000000):
https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench
Threshold CPU cycles Throughput(88 threads)
32 799 241760478
64 640 301628829
125 537 358906028 <==> system by default
256 468 412397590
512 428 450550704
4096 399 482520943
20000 394 489009617
30000 395 488017817
65533 369(-31.3%) 521661345(+45.3%) <==> with this patchset
N/A 342(-36.3%) 562900157(+56.8%) <==> disable zone_statistics
This patch (of 3):
In this patch, NUMA statistics is separated from zone statistics
framework, all the call sites of NUMA stats are changed to use
numa-stats-specific functions, it does not have any functionality change
except that the number of NUMA stats is shown behind zone page stats
when users *read* the zone info.
E.g. cat /proc/zoneinfo
***Base*** ***With this patch***
nr_free_pages 3976 nr_free_pages 3976
nr_zone_inactive_anon 0 nr_zone_inactive_anon 0
nr_zone_active_anon 0 nr_zone_active_anon 0
nr_zone_inactive_file 0 nr_zone_inactive_file 0
nr_zone_active_file 0 nr_zone_active_file 0
nr_zone_unevictable 0 nr_zone_unevictable 0
nr_zone_write_pending 0 nr_zone_write_pending 0
nr_mlock 0 nr_mlock 0
nr_page_table_pages 0 nr_page_table_pages 0
nr_kernel_stack 0 nr_kernel_stack 0
nr_bounce 0 nr_bounce 0
nr_zspages 0 nr_zspages 0
numa_hit 0 *nr_free_cma 0*
numa_miss 0 numa_hit 0
numa_foreign 0 numa_miss 0
numa_interleave 0 numa_foreign 0
numa_local 0 numa_interleave 0
numa_other 0 numa_local 0
*nr_free_cma 0* numa_other 0
... ...
vm stats threshold: 10 vm stats threshold: 10
... ...
The next patch updates the numa stats counter size and threshold.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1503568801-21305-2-git-send-email-kemi.wang@intel.com
Signed-off-by: Kemi Wang <kemi.wang@intel.com>
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Ying Huang <ying.huang@intel.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Historically we have enforced that any kernel zone (e.g ZONE_NORMAL) has
to precede the Movable zone in the physical memory range. The purpose
of the movable zone is, however, not bound to any physical memory
restriction. It merely defines a class of migrateable and reclaimable
memory.
There are users (e.g. CMA) who might want to reserve specific physical
memory ranges for their own purpose. Moreover our pfn walkers have to
be prepared for zones overlapping in the physical range already because
we do support interleaving NUMA nodes and therefore zones can interleave
as well. This means we can allow each memory block to be associated
with a different zone.
Loosen the current onlining semantic and allow explicit onlining type on
any memblock. That means that online_{kernel,movable} will be allowed
regardless of the physical address of the memblock as long as it is
offline of course. This might result in moveble zone overlapping with
other kernel zones. Default onlining then becomes a bit tricky but
still sensible. echo online > memoryXY/state will online the given
block to
1) the default zone if the given range is outside of any zone
2) the enclosing zone if such a zone doesn't interleave with
any other zone
3) the default zone if more zones interleave for this range
where default zone is movable zone only if movable_node is enabled
otherwise it is a kernel zone.
Here is an example of the semantic with (movable_node is not present but
it work in an analogous way). We start with following memblocks, all of
them offline:
memory34/valid_zones:Normal Movable
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Normal Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal Movable
memory40/valid_zones:Normal Movable
memory41/valid_zones:Normal Movable
Now, we online block 34 in default mode and block 37 as movable
root@test1:/sys/devices/system/node/node1# echo online > memory34/state
root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal Movable
memory40/valid_zones:Normal Movable
memory41/valid_zones:Normal Movable
As we can see all other blocks can still be onlined both into Normal and
Movable zones and the Normal is default because the Movable zone spans
only block37 now.
root@test1:/sys/devices/system/node/node1# echo online_movable > memory41/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Movable Normal
memory39/valid_zones:Movable Normal
memory40/valid_zones:Movable Normal
memory41/valid_zones:Movable
Now the default zone for blocks 37-41 has changed because movable zone
spans that range.
root@test1:/sys/devices/system/node/node1# echo online_kernel > memory39/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal
memory40/valid_zones:Movable Normal
memory41/valid_zones:Movable
Note that the block 39 now belongs to the zone Normal and so block38
falls into Normal by default as well.
For completness
root@test1:/sys/devices/system/node/node1# for i in memory[34]?
do
echo online > $i/state 2>/dev/null
done
memory34/valid_zones:Normal
memory35/valid_zones:Normal
memory36/valid_zones:Normal
memory37/valid_zones:Movable
memory38/valid_zones:Normal
memory39/valid_zones:Normal
memory40/valid_zones:Movable
memory41/valid_zones:Movable
Implementation wise the change is quite straightforward. We can get rid
of allow_online_pfn_range altogether. online_pages allows only offline
nodes already. The original default_zone_for_pfn will become
default_kernel_zone_for_pfn. New default_zone_for_pfn implements the
above semantic. zone_for_pfn_range is slightly reorganized to implement
kernel and movable online type explicitly and MMOP_ONLINE_KEEP becomes a
catch all default behavior.
Link: http://lkml.kernel.org/r/20170714121233.16861-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kani Toshimitsu <toshi.kani@hpe.com>
Cc: <slaoub@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prior to commit f1dd2cd13c ("mm, memory_hotplug: do not associate
hotadded memory to zones until online") we used to allow to change the
valid zone types of a memory block if it is adjacent to a different zone
type.
This fact was reflected in memoryNN/valid_zones by the ordering of
printed zones. The first one was default (echo online > memoryNN/state)
and the other one could be onlined explicitly by online_{movable,kernel}.
This behavior was removed by the said patch and as such the ordering was
not all that important. In most cases a kernel zone would be default
anyway. The only exception is movable_node handled by "mm,
memory_hotplug: support movable_node for hotpluggable nodes".
Let's reintroduce this behavior again because later patch will remove
the zone overlap restriction and so user will be allowed to online
kernel resp. movable block regardless of its placement. Original
behavior will then become significant again because it would be
non-trivial for users to see what is the default zone to online into.
Implementation is really simple. Pull out zone selection out of
move_pfn_range into zone_for_pfn_range helper and use it in
show_valid_zones to display the zone for default onlining and then both
kernel and movable if they are allowed. Default online zone is not
duplicated.
Link: http://lkml.kernel.org/r/20170714121233.16861-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kani Toshimitsu <toshi.kani@hpe.com>
Cc: <slaoub@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Introduce fwnode operations for all of the separate types of
"firmware nodes" that can be handled by the device properties
framework and drop the type field from struct fwnode_handle
(Sakari Ailus, Arnd Bergmann).
- Make the device properties framework use const fwnode arguments
where possible (Sakari Ailus).
- Add a helper for the consolidated handling of node references
to the device properties framework (Sakari Ailus).
- Switch over the ACPI part of the device properties framework
to the new UUID API (Andy Shevchenko).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZrcHoAAoJEILEb/54YlRxVH4P/i7MVmWxZW1qosqt8NbI+kqu
rjxBiQ1YaPuwWiZk5LMRQWIr4Y52v+8uwoVAoQbpfkpQpxpUtIApqFGGHkOK091S
6wcwdAJv78m7dQGJZ96nQkBdw+qCUG+s9L3KMfXYiipwyG7bg4BVcs5jZcIqcZ4F
2xecG6DMn4ESwFbZyVULWyQh50tSBztaHEG6AU2T/07yXU3RNJmwAVVZzpHdtA80
mDbWcCFjcmhrpPa0Aq6MrSMjKso1zd8Es+xwYhXsIQpD1l0HhLLQ0X4veSPcPG4B
aSNEYuribpvZ2FIRti7H7gi/F+Arm9vPdc9WHbOPLOIF1z+GJKiqjBuxUrfXKPqG
v1W3f1bcApe9DfmC5z1wZBi2d7thQOzRFfc8WRrMybQ6z1MAqqe5PfAlgpMFmL22
8ZCzzXIBUsfUjVlwYBvgkKvpLioEl88otWGdhewWY6F+DZ8+vPyvrpi15P36Xgos
ijX89cvyfze3m5GW08hQ6DTOVvaFoMyucYfSo6/MBamw9fbUgiEgBfUAsQyb3sRU
8g1KrwkAX8KFmoocX/AVjvwVBaKNdYeJ9Gy6EItAPxNl+F1q6vjkO0r/VeSrO1KW
3GRqw5MZP35DD9IRo4DTAjwtNVkgIUjpG/hfB7l3PFdDxWfeiM5tf2zMExhT0nIR
h8s8mn61KZp0gpsE02FS
=0rnk
-----END PGP SIGNATURE-----
Merge tag 'devprop-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull device properties framework updates from Rafael Wysocki:
"These introduce fwnode operations for all of the separate types of
'firmware nodes' that can be handled by the device properties
framework, make the framework use const fwnode arguments all over, add
a helper for the consolidated handling of node references and switch
over the framework to the new UUID API.
Specifics:
- Introduce fwnode operations for all of the separate types of
'firmware nodes' that can be handled by the device properties
framework and drop the type field from struct fwnode_handle (Sakari
Ailus, Arnd Bergmann).
- Make the device properties framework use const fwnode arguments
where possible (Sakari Ailus).
- Add a helper for the consolidated handling of node references to
the device properties framework (Sakari Ailus).
- Switch over the ACPI part of the device properties framework to the
new UUID API (Andy Shevchenko)"
* tag 'devprop-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: device property: Switch to use new generic UUID API
device property: export irqchip_fwnode_ops
device property: Introduce fwnode_property_get_reference_args
device property: Constify fwnode property API
device property: Constify argument to pset fwnode backend
ACPI: Constify internal fwnode arguments
ACPI: Constify acpi_bus helper functions, switch to macros
ACPI: Prepare for constifying acpi_get_next_subnode() fwnode argument
device property: Get rid of struct fwnode_handle type field
ACPI: Use IS_ERR_OR_NULL() instead of non-NULL check in is_acpi_data_node()
- Drop the P-state selection algorithm based on a PID controller
from intel_pstate and make it use the same P-state selection
method (based on the CPU load) for all types of systems in the
active mode (Rafael Wysocki, Srinivas Pandruvada).
- Rework the cpufreq core and governors to make it possible to
take cross-CPU utilization updates into account and modify the
schedutil governor to actually do so (Viresh Kumar).
- Clean up the handling of transition latency information in the
cpufreq core and untangle it from the information on which drivers
cannot do dynamic frequency switching (Viresh Kumar).
- Add support for new SoCs (MT2701/MT7623 and MT7622) to the
mediatek cpufreq driver and update its DT bindings (Sean Wang).
- Modify the cpufreq dt-platdev driver to autimatically create
cpufreq devices for the new (v2) Operating Performance Points
(OPP) DT bindings and update its whitelist of supported systems
(Viresh Kumar, Shubhrajyoti Datta, Marc Gonzalez, Khiem Nguyen,
Finley Xiao).
- Add support for Ux500 to the cpufreq-dt driver and drop the
obsolete dbx500 cpufreq driver (Linus Walleij, Arnd Bergmann).
- Add new SoC (R8A7795) support to the cpufreq rcar driver (Khiem
Nguyen).
- Fix and clean up assorted issues in the cpufreq drivers and core
(Arvind Yadav, Christophe Jaillet, Colin Ian King, Gustavo Silva,
Julia Lawall, Leonard Crestez, Rob Herring, Sudeep Holla).
- Update the IO-wait boost handling in the schedutil governor to
make it less aggressive (Joel Fernandes).
- Rework system suspend diagnostics to make it print fewer messages
to the kernel log by default, add a sysfs knob to allow more
suspend-related messages to be printed and add Low Power S0 Idle
constraints checks to the ACPI suspend-to-idle code (Rafael
Wysocki, Srinivas Pandruvada).
- Prefer suspend-to-idle over S3 on ACPI-based systems with the
ACPI_FADT_LOW_POWER_S0 flag set and the Low Power Idle S0 _DSM
interface present in the ACPI tables (Rafael Wysocki).
- Update documentation related to system sleep and rename a number
of items in the code to make it cleare that they are related to
suspend-to-idle (Rafael Wysocki).
- Export a variable allowing device drivers to check the target
system sleep state from the core system suspend code (Florian
Fainelli).
- Clean up the cpuidle subsystem to handle the polling state on
x86 in a more straightforward way and to use %pOF instead of
full_name (Rafael Wysocki, Rob Herring).
- Update the devfreq framework to fix and clean up a few minor
issues (Chanwoo Choi, Rob Herring).
- Extend diagnostics in the generic power domains (genpd) framework
and clean it up slightly (Thara Gopinath, Rob Herring).
- Fix and clean up a couple of issues in the operating performance
points (OPP) framework (Viresh Kumar, Waldemar Rymarkiewicz).
- Add support for RV1108 to the rockchip-io Adaptive Voltage Scaling
(AVS) driver (David Wu).
- Fix the usage of notifiers in CPU power management on some
platforms (Alex Shi).
- Update the pm-graph system suspend/hibernation and boot profiling
utility (Todd Brandt).
- Make it possible to run the cpupower utility without CPU0 (Prarit
Bhargava).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZrcDJAAoJEILEb/54YlRx9FUQAIUKvWBAARc61ZIZXjbqZF1v
aEMOBuksFns0CMekdptSic6n4wc81E/XYMS8yDhOOMpyDzfAZsTWjmu+gKwN7w3l
E/yf/NVlhob9JZ7MqGgqD4EUFfFIaKBXPlWFdDi2rdCUXE2L8xJ7rla8i7zyZlc5
pYHfAppBbF4qUcEY4OoOVOOGRZCfMdiLXj0iZOhMX8Y6yLBRk/AjnVADYsF33hoj
gBEfomU+H0K5V8nQEp0ZFKDArPwL+oElHQj6i+nxBpGfPM5evvLXhHOyR6AsldJ5
J4YI1kMuQNSCmvHMqOTxTYyJf8Jcf3Fj4wcjwaVMVGceY1lz6McAKknnFnCqCvz+
mskn84gFCBCM8EoJDqRf0b9MQHcuRyQKM+yw4tjnR9r8yd32erb85ZWFHcPWYhCT
fZatNOwFFv2MU+2vo5J3yeUNSWIKT+uBjy+tKPbrDkUwpKZVRj3Oj+hP3Mq9NE8U
YBqltsj7tmrdA634zI8C7jfS6wF221S0fId/iPszwmPJaVn/lq8Ror7pWL5YI8U7
SCJFjiqDiGmAcQEkuWwFAQnscZkyHpO+Y3A+jfXl/izoaZETaI5+ceIHBaocm3+5
XrOOpHS3ik8EHf9ji0KFCKZ/pYDwllday3cBQPWo3sMIzpQ2lrjbqdnE1cVnBrld
OtHZAeD/jLUXuY6XW2jN
=mAiV
-----END PGP SIGNATURE-----
Merge tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"This time (again) cpufreq gets the majority of changes which mostly
are driver updates (including a major consolidation of intel_pstate),
some schedutil governor modifications and core cleanups.
There also are some changes in the system suspend area, mostly related
to diagnostics and debug messages plus some renames of things related
to suspend-to-idle. One major change here is that suspend-to-idle is
now going to be preferred over S3 on systems where the ACPI tables
indicate to do so and provide requsite support (the Low Power Idle S0
_DSM in particular). The system sleep documentation and the tools
related to it are updated too.
The rest is a few cpuidle changes (nothing major), devfreq updates,
generic power domains (genpd) framework updates and a few assorted
modifications elsewhere.
Specifics:
- Drop the P-state selection algorithm based on a PID controller from
intel_pstate and make it use the same P-state selection method
(based on the CPU load) for all types of systems in the active mode
(Rafael Wysocki, Srinivas Pandruvada).
- Rework the cpufreq core and governors to make it possible to take
cross-CPU utilization updates into account and modify the schedutil
governor to actually do so (Viresh Kumar).
- Clean up the handling of transition latency information in the
cpufreq core and untangle it from the information on which drivers
cannot do dynamic frequency switching (Viresh Kumar).
- Add support for new SoCs (MT2701/MT7623 and MT7622) to the mediatek
cpufreq driver and update its DT bindings (Sean Wang).
- Modify the cpufreq dt-platdev driver to autimatically create
cpufreq devices for the new (v2) Operating Performance Points (OPP)
DT bindings and update its whitelist of supported systems (Viresh
Kumar, Shubhrajyoti Datta, Marc Gonzalez, Khiem Nguyen, Finley
Xiao).
- Add support for Ux500 to the cpufreq-dt driver and drop the
obsolete dbx500 cpufreq driver (Linus Walleij, Arnd Bergmann).
- Add new SoC (R8A7795) support to the cpufreq rcar driver (Khiem
Nguyen).
- Fix and clean up assorted issues in the cpufreq drivers and core
(Arvind Yadav, Christophe Jaillet, Colin Ian King, Gustavo Silva,
Julia Lawall, Leonard Crestez, Rob Herring, Sudeep Holla).
- Update the IO-wait boost handling in the schedutil governor to make
it less aggressive (Joel Fernandes).
- Rework system suspend diagnostics to make it print fewer messages
to the kernel log by default, add a sysfs knob to allow more
suspend-related messages to be printed and add Low Power S0 Idle
constraints checks to the ACPI suspend-to-idle code (Rafael
Wysocki, Srinivas Pandruvada).
- Prefer suspend-to-idle over S3 on ACPI-based systems with the
ACPI_FADT_LOW_POWER_S0 flag set and the Low Power Idle S0 _DSM
interface present in the ACPI tables (Rafael Wysocki).
- Update documentation related to system sleep and rename a number of
items in the code to make it cleare that they are related to
suspend-to-idle (Rafael Wysocki).
- Export a variable allowing device drivers to check the target
system sleep state from the core system suspend code (Florian
Fainelli).
- Clean up the cpuidle subsystem to handle the polling state on x86
in a more straightforward way and to use %pOF instead of full_name
(Rafael Wysocki, Rob Herring).
- Update the devfreq framework to fix and clean up a few minor issues
(Chanwoo Choi, Rob Herring).
- Extend diagnostics in the generic power domains (genpd) framework
and clean it up slightly (Thara Gopinath, Rob Herring).
- Fix and clean up a couple of issues in the operating performance
points (OPP) framework (Viresh Kumar, Waldemar Rymarkiewicz).
- Add support for RV1108 to the rockchip-io Adaptive Voltage Scaling
(AVS) driver (David Wu).
- Fix the usage of notifiers in CPU power management on some
platforms (Alex Shi).
- Update the pm-graph system suspend/hibernation and boot profiling
utility (Todd Brandt).
- Make it possible to run the cpupower utility without CPU0 (Prarit
Bhargava)"
* tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (87 commits)
cpuidle: Make drivers initialize polling state
cpuidle: Move polling state initialization code to separate file
cpuidle: Eliminate the CPUIDLE_DRIVER_STATE_START symbol
cpufreq: imx6q: Fix imx6sx low frequency support
cpufreq: speedstep-lib: make several arrays static, makes code smaller
PM: docs: Delete the obsolete states.txt document
PM: docs: Describe high-level PM strategies and sleep states
PM / devfreq: Fix memory leak when fail to register device
PM / devfreq: Add dependency on PM_OPP
PM / devfreq: Move private devfreq_update_stats() into devfreq
PM / devfreq: Convert to using %pOF instead of full_name
PM / AVS: rockchip-io: add io selectors and supplies for RV1108
cpufreq: ti: Fix 'of_node_put' being called twice in error handling path
cpufreq: dt-platdev: Drop few entries from whitelist
cpufreq: dt-platdev: Automatically create cpufreq device with OPP v2
ARM: ux500: don't select CPUFREQ_DT
cpuidle: Convert to using %pOF instead of full_name
cpufreq: Convert to using %pOF instead of full_name
PM / Domains: Convert to using %pOF instead of full_name
cpufreq: Cap the default transition delay value to 10 ms
...
A recent change interprets the return code of dma_init_coherent_memory
as an error value, but it is instead a boolean, where 'true' indicates
success. This leads causes the caller to always do the wrong thing,
and also triggers a compile-time warning about it:
drivers/base/dma-coherent.c: In function 'dma_declare_coherent_memory':
drivers/base/dma-coherent.c:99:15: error: 'mem' may be used uninitialized in this function [-Werror=maybe-uninitialized]
I ended up changing the code a little more, to give use the usual
error handling, as this seemed the best way to fix up the warning
and make the code look reasonable at the same time.
Fixes: 2436bdcda5 ("dma-coherent: remove the DMA_MEMORY_MAP and DMA_MEMORY_IO flags")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
* pm-sleep:
ACPI / PM: Check low power idle constraints for debug only
PM / s2idle: Rename platform operations structure
PM / s2idle: Rename ->enter_freeze to ->enter_s2idle
PM / s2idle: Rename freeze_state enum and related items
PM / s2idle: Rename PM_SUSPEND_FREEZE to PM_SUSPEND_TO_IDLE
ACPI / PM: Prefer suspend-to-idle over S3 on some systems
platform/x86: intel-hid: Wake up Dell Latitude 7275 from suspend-to-idle
PM / suspend: Define pr_fmt() in suspend.c
PM / suspend: Use mem_sleep_labels[] strings in messages
PM / sleep: Put pm_test under CONFIG_PM_SLEEP_DEBUG
PM / sleep: Check pm_wakeup_pending() in __device_suspend_noirq()
PM / core: Add error argument to dpm_show_time()
PM / core: Split dpm_suspend_noirq() and dpm_resume_noirq()
PM / s2idle: Rearrange the main suspend-to-idle loop
PM / timekeeping: Print debug messages when requested
PM / sleep: Mark suspend/hibernation start and finish
PM / sleep: Do not print debug messages by default
PM / suspend: Export pm_suspend_target_state
* pm-core:
PM / wakeup: Set power.can_wakeup if wakeup_sysfs_add() fails
* pm-opp:
PM / OPP: Fix get sharing CPUs when hotplug is used
PM / OPP: OF: Use pr_debug() instead of pr_err() while adding OPP table
* pm-domains:
PM / Domains: Convert to using %pOF instead of full_name
PM / Domains: Extend generic power domain debugfs
PM / Domains: Add time accounting to various genpd states
* pm-cpu:
PM / CPU: replace raw_notifier with atomic_notifier
* pm-avs:
PM / AVS: rockchip-io: add io selectors and supplies for RV1108
DMA_MEMORY_IO was never used in the tree, so remove it. That means there is
no need for the DMA_MEMORY_MAP flag either now, so remove it as well and
change dma_declare_coherent_memory to return a normal errno value.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com>
The .release function of driver_ktype is 'driver_release()'.
This function frees the container_of this kobject.
So, this memory must not be freed explicitly in the error handling path of
'bus_add_driver()'. Otherwise a double free will occur.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As seen from the implementation of the single class shutdown hook this
is not very sound design.
Rename the class shutdown hook to shutdown_pre to make it clear it runs
before the driver shutdown hook.
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
attribute_group are not supposed to change at runtime. All functions
working with attribute_group provided by <linux/sysfs.h> work with
const attribute_group. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>