408c9861c6
Merge tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "The big ticket items here are the rework of suspend-to-idle in order
  to add proper support for power button wakeup from it on recent Dell
  laptops and the rework of interfaces exporting the current CPU
  frequency on x86.

  In addition to that, support for a few new pieces of hardware is
  added, the PCI/ACPI device wakeup infrastructure is simplified
  significantly and the wakeup IRQ framework is fixed to unbreak the IRQ
  bus locking infrastructure.

  Also, there are some functional improvements for intel_pstate, tools
  updates and small fixes and cleanups all over.

  Specifics:

   - Rework suspend-to-idle to allow it to take wakeup events signaled
     by the EC into account on ACPI-based platforms in order to properly
     support power button wakeup from suspend-to-idle on recent Dell
     laptops (Rafael Wysocki).

     That includes the core suspend-to-idle code rework, support for the
     Low Power S0 _DSM interface, and support for the ACPI INT0002
     Virtual GPIO device from Hans de Goede (required for USB keyboard
     wakeup from suspend-to-idle to work on some machines).

   - Stop trying to export the current CPU frequency via /proc/cpuinfo
     on x86 as that is inaccurate and confusing (Len Brown).

   - Rework the way in which the current CPU frequency is exported by
     the kernel (over the cpufreq sysfs interface) on x86 systems with
     the APERF and MPERF registers by always using values read from
     these registers, when available, to compute the current frequency
     regardless of which cpufreq driver is in use (Len Brown).

   - Rework the PCI/ACPI device wakeup infrastructure to remove the
     questionable and artificial distinction between "devices that can
     wake up the system from sleep states" and "devices that can
     generate wakeup signals in the working state" from it, which allows
     the code to be simplified quite a bit (Rafael Wysocki).

   - Fix the wakeup IRQ framework by making it use SRCU instead of RCU
     which doesn't allow sleeping in the read-side critical sections,
     but which in turn is expected to be allowed by the IRQ bus locking
     infrastructure (Thomas Gleixner).

   - Modify some computations in the intel_pstate driver to avoid
     rounding errors resulting from them (Srinivas Pandruvada).

   - Reduce the overhead of the intel_pstate driver in the HWP
     (hardware-managed P-states) mode and when the "performance" P-state
     selection algorithm is in use by making it avoid registering
     scheduler callbacks in those cases (Len Brown).

   - Rework the energy_performance_preference sysfs knob in intel_pstate
     by changing the values that correspond to different symbolic hint
     names used by it (Len Brown).

   - Make it possible to use more than one cpuidle driver at the same
     time on ARM (Daniel Lezcano).

   - Make it possible to prevent the cpuidle menu governor from using
     the 0 state by disabling it via sysfs (Nicholas Piggin).

   - Add support for FFH (Fixed Functional Hardware) MWAIT in ACPI C1 on
     AMD systems (Yazen Ghannam).

   - Make the CPPC cpufreq driver take the lowest nonlinear performance
     information into account (Prashanth Prakash).

   - Add support for hi3660 to the cpufreq-dt driver, fix the imx6q
     driver and clean up the sfi, exynos5440 and intel_pstate drivers
     (Colin Ian King, Krzysztof Kozlowski, Octavian Purdila, Rafael
     Wysocki, Tao Wang).

   - Fix a few minor issues in the generic power domains (genpd)
     framework and clean it up somewhat (Krzysztof Kozlowski, Mikko
     Perttunen, Viresh Kumar).

   - Fix a couple of minor issues in the operating performance points
     (OPP) framework and clean it up somewhat (Viresh Kumar).

   - Fix a CONFIG dependency in the hibernation core and clean it up
     slightly (Balbir Singh, Arvind Yadav, BaoJun Luo).

   - Add rk3228 support to the rockchip-io adaptive voltage scaling
     (AVS) driver (David Wu).

   - Fix an incorrect bit shift operation in the RAPL power capping
     driver (Adam Lessnau).

   - Add support for the EPP field in the HWP (hardware managed
     P-states) control register, HWP.EPP, to the x86_energy_perf_policy
     tool and update msr-index.h with HWP.EPP values (Len Brown).

   - Fix some minor issues in the turbostat tool (Len Brown).

   - Add support for AMD family 0x17 CPUs to the cpupower tool and fix a
     minor issue in it (Sherry Hurwitz).

   - Assorted cleanups, mostly related to the constification of some
     data structures (Arvind Yadav, Joe Perches, Kees Cook, Krzysztof
     Kozlowski)"

* tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (69 commits)
  cpufreq: Update scaling_cur_freq documentation
  cpufreq: intel_pstate: Clean up after performance governor changes
  PM: hibernate: constify attribute_group structures.
  cpuidle: menu: allow state 0 to be disabled
  intel_idle: Use more common logging style
  PM / Domains: Fix missing default_power_down_ok comment
  PM / Domains: Fix unsafe iteration over modified list of domains
  PM / Domains: Fix unsafe iteration over modified list of domain providers
  PM / Domains: Fix unsafe iteration over modified list of device links
  PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device
  PM / Domains: Call driver's noirq callbacks
  PM / core: Drop run_wake flag from struct dev_pm_info
  PCI / PM: Simplify device wakeup settings code
  PCI / PM: Drop pme_interrupt flag from struct pci_dev
  ACPI / PM: Consolidate device wakeup settings code
  ACPI / PM: Drop run_wake from struct acpi_device_wakeup_flags
  PM / QoS: constify *_attribute_group.
  PM / AVS: rockchip-io: add io selectors and supplies for rk3228
  powercap/RAPL: prevent overridding bits outside of the mask
  PM / sysfs: Constify attribute groups
  ...
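The commit message says the current frequency is now computed from the APERF and MPERF registers but does not spell out the arithmetic. A minimal sketch of the usual ratio follows, assuming the common relationship in which MPERF counts at a fixed reference rate and APERF counts at the actual rate while the CPU is not idle. The helper name `avg_freq_khz` and its parameters are hypothetical and are not taken from the kernel sources.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: estimate the average frequency in
 * kHz over a sampling interval from the APERF and MPERF deltas observed
 * during that interval. base_khz is the reference (base) frequency at which
 * MPERF is assumed to count.
 */
static uint64_t avg_freq_khz(uint64_t base_khz, uint64_t aperf_delta,
			     uint64_t mperf_delta)
{
	/* No reference ticks elapsed: nothing meaningful to report. */
	if (mperf_delta == 0)
		return 0;

	/*
	 * average frequency = base frequency * (APERF delta / MPERF delta).
	 * The multiplication comes first to keep integer precision; this
	 * sketch assumes the product fits in 64 bits.
	 */
	return base_khz * aperf_delta / mperf_delta;
}
```

For example, with a 2 GHz base frequency, an APERF delta 1.5 times the MPERF delta yields 3 GHz, which is how a turbo frequency above the base shows up in the result.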
331 lines
8.1 KiB
C
/*
 * Hibernation support for x86-64
 *
 * Distribute under GPLv2
 *
 * Copyright (c) 2007 Rafael J. Wysocki <rjw@sisk.pl>
 * Copyright (c) 2002 Pavel Machek <pavel@ucw.cz>
 * Copyright (c) 2001 Patrick Mochel <mochel@osdl.org>
 */

#include <linux/gfp.h>
#include <linux/smp.h>
#include <linux/suspend.h>
#include <linux/scatterlist.h>
#include <linux/kdebug.h>

#include <crypto/hash.h>

#include <asm/e820/api.h>
#include <asm/init.h>
#include <asm/proto.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/mtrr.h>
#include <asm/sections.h>
#include <asm/suspend.h>
#include <asm/tlbflush.h>

/* Defined in hibernate_asm_64.S */
extern asmlinkage __visible int restore_image(void);

/*
 * Address to jump to in the last phase of restore in order to get to the image
 * kernel's text (this value is passed in the image header).
 */
unsigned long restore_jump_address __visible;
unsigned long jump_address_phys;

/*
 * Value of the cr3 register from before the hibernation (this value is passed
 * in the image header).
 */
unsigned long restore_cr3 __visible;

unsigned long temp_level4_pgt __visible;

unsigned long relocated_restore_code __visible;

static int set_up_temporary_text_mapping(pgd_t *pgd)
{
	pmd_t *pmd;
	pud_t *pud;
	p4d_t *p4d;

	/*
	 * The new mapping only has to cover the page containing the image
	 * kernel's entry point (jump_address_phys), because the switch over to
	 * it is carried out by relocated code running from a page allocated
	 * specifically for this purpose and covered by the identity mapping, so
	 * the temporary kernel text mapping is only needed for the final jump.
	 * Moreover, in that mapping the virtual address of the image kernel's
	 * entry point must be the same as its virtual address in the image
	 * kernel (restore_jump_address), so the image kernel's
	 * restore_registers() code doesn't find itself in a different area of
	 * the virtual address space after switching over to the original page
	 * tables used by the image kernel.
	 */

	if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
		p4d = (p4d_t *)get_safe_page(GFP_ATOMIC);
		if (!p4d)
			return -ENOMEM;
	}

	pud = (pud_t *)get_safe_page(GFP_ATOMIC);
	if (!pud)
		return -ENOMEM;

	pmd = (pmd_t *)get_safe_page(GFP_ATOMIC);
	if (!pmd)
		return -ENOMEM;

	set_pmd(pmd + pmd_index(restore_jump_address),
		__pmd((jump_address_phys & PMD_MASK) | __PAGE_KERNEL_LARGE_EXEC));
	set_pud(pud + pud_index(restore_jump_address),
		__pud(__pa(pmd) | _KERNPG_TABLE));
	if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
		set_p4d(p4d + p4d_index(restore_jump_address), __p4d(__pa(pud) | _KERNPG_TABLE));
		set_pgd(pgd + pgd_index(restore_jump_address), __pgd(__pa(p4d) | _KERNPG_TABLE));
	} else {
		/* No p4d for 4-level paging: point the pgd to the pud page table */
		set_pgd(pgd + pgd_index(restore_jump_address), __pgd(__pa(pud) | _KERNPG_TABLE));
	}

	return 0;
}

static void *alloc_pgt_page(void *context)
{
	return (void *)get_safe_page(GFP_ATOMIC);
}

static int set_up_temporary_mappings(void)
{
	struct x86_mapping_info info = {
		.alloc_pgt_page	= alloc_pgt_page,
		.page_flag	= __PAGE_KERNEL_LARGE_EXEC,
		.offset		= __PAGE_OFFSET,
	};
	unsigned long mstart, mend;
	pgd_t *pgd;
	int result;
	int i;

	pgd = (pgd_t *)get_safe_page(GFP_ATOMIC);
	if (!pgd)
		return -ENOMEM;

	/* Prepare a temporary mapping for the kernel text */
	result = set_up_temporary_text_mapping(pgd);
	if (result)
		return result;

	/* Set up the direct mapping from scratch */
	for (i = 0; i < nr_pfn_mapped; i++) {
		mstart = pfn_mapped[i].start << PAGE_SHIFT;
		mend   = pfn_mapped[i].end << PAGE_SHIFT;

		result = kernel_ident_mapping_init(&info, pgd, mstart, mend);
		if (result)
			return result;
	}

	temp_level4_pgt = __pa(pgd);
	return 0;
}

static int relocate_restore_code(void)
{
	pgd_t *pgd;
	p4d_t *p4d;
	pud_t *pud;
	pmd_t *pmd;
	pte_t *pte;

	relocated_restore_code = get_safe_page(GFP_ATOMIC);
	if (!relocated_restore_code)
		return -ENOMEM;

	memcpy((void *)relocated_restore_code, core_restore_code, PAGE_SIZE);

	/* Make the page containing the relocated code executable */
	pgd = (pgd_t *)__va(read_cr3_pa()) +
		pgd_index(relocated_restore_code);
	p4d = p4d_offset(pgd, relocated_restore_code);
	if (p4d_large(*p4d)) {
		set_p4d(p4d, __p4d(p4d_val(*p4d) & ~_PAGE_NX));
		goto out;
	}
	pud = pud_offset(p4d, relocated_restore_code);
	if (pud_large(*pud)) {
		set_pud(pud, __pud(pud_val(*pud) & ~_PAGE_NX));
		goto out;
	}
	pmd = pmd_offset(pud, relocated_restore_code);
	if (pmd_large(*pmd)) {
		set_pmd(pmd, __pmd(pmd_val(*pmd) & ~_PAGE_NX));
		goto out;
	}
	pte = pte_offset_kernel(pmd, relocated_restore_code);
	set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_NX));
out:
	__flush_tlb_all();
	return 0;
}

int swsusp_arch_resume(void)
{
	int error;

	/* We have got enough memory and from now on we cannot recover */
	error = set_up_temporary_mappings();
	if (error)
		return error;

	error = relocate_restore_code();
	if (error)
		return error;

	restore_image();
	return 0;
}

/*
 *	pfn_is_nosave - check if given pfn is in the 'nosave' section
 */

int pfn_is_nosave(unsigned long pfn)
{
	unsigned long nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
	unsigned long nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;
	return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
}

#define MD5_DIGEST_SIZE 16

struct restore_data_record {
	unsigned long jump_address;
	unsigned long jump_address_phys;
	unsigned long cr3;
	unsigned long magic;
	u8 e820_digest[MD5_DIGEST_SIZE];
};

#define RESTORE_MAGIC	0x23456789ABCDEF01UL

#if IS_BUILTIN(CONFIG_CRYPTO_MD5)
/**
 * get_e820_md5 - calculate md5 according to given e820 table
 *
 * @table: the e820 table to be calculated
 * @buf: the md5 result to be stored to
 */
static int get_e820_md5(struct e820_table *table, void *buf)
{
	struct scatterlist sg;
	struct crypto_ahash *tfm;
	int size;
	int ret = 0;

	tfm = crypto_alloc_ahash("md5", 0, CRYPTO_ALG_ASYNC);
	if (IS_ERR(tfm))
		return -ENOMEM;

	{
		AHASH_REQUEST_ON_STACK(req, tfm);
		size = offsetof(struct e820_table, entries) + sizeof(struct e820_entry) * table->nr_entries;
		ahash_request_set_tfm(req, tfm);
		sg_init_one(&sg, (u8 *)table, size);
		ahash_request_set_callback(req, 0, NULL, NULL);
		ahash_request_set_crypt(req, &sg, buf, size);

		if (crypto_ahash_digest(req))
			ret = -EINVAL;
		ahash_request_zero(req);
	}
	crypto_free_ahash(tfm);

	return ret;
}

static void hibernation_e820_save(void *buf)
{
	get_e820_md5(e820_table_firmware, buf);
}

static bool hibernation_e820_mismatch(void *buf)
{
	int ret;
	u8 result[MD5_DIGEST_SIZE];

	memset(result, 0, MD5_DIGEST_SIZE);
	/* If there is no digest in suspend kernel, let it go. */
	if (!memcmp(result, buf, MD5_DIGEST_SIZE))
		return false;

	ret = get_e820_md5(e820_table_firmware, result);
	if (ret)
		return true;

	return memcmp(result, buf, MD5_DIGEST_SIZE) ? true : false;
}
#else
static void hibernation_e820_save(void *buf)
{
}

static bool hibernation_e820_mismatch(void *buf)
{
	/* If md5 is not builtin for restore kernel, let it go. */
	return false;
}
#endif

/**
 *	arch_hibernation_header_save - populate the architecture specific part
 *		of a hibernation image header
 *	@addr: address to save the data at
 *	@max_size: maximum size of the architecture specific header data
 */
int arch_hibernation_header_save(void *addr, unsigned int max_size)
{
	struct restore_data_record *rdr = addr;

	if (max_size < sizeof(struct restore_data_record))
		return -EOVERFLOW;
	rdr->jump_address = (unsigned long)restore_registers;
	rdr->jump_address_phys = __pa_symbol(restore_registers);
	rdr->cr3 = restore_cr3;
	rdr->magic = RESTORE_MAGIC;

	hibernation_e820_save(rdr->e820_digest);

	return 0;
}

/**
 *	arch_hibernation_header_restore - read the architecture specific data
 *		from the hibernation image header
 *	@addr: address to read the data from
 */
int arch_hibernation_header_restore(void *addr)
{
	struct restore_data_record *rdr = addr;

	restore_jump_address = rdr->jump_address;
	jump_address_phys = rdr->jump_address_phys;
	restore_cr3 = rdr->cr3;

	if (rdr->magic != RESTORE_MAGIC) {
		pr_crit("Unrecognized hibernate image header format!\n");
		return -EINVAL;
	}

	if (hibernation_e820_mismatch(rdr->e820_digest)) {
		pr_crit("Hibernate inconsistent memory map detected!\n");
		return -ENODEV;
	}

	return 0;
}
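The header round trip performed by arch_hibernation_header_save() and arch_hibernation_header_restore() can be illustrated outside the kernel. The following is a user-space sketch only, assuming a simplified header that carries just the magic value and a 16-byte digest. The names `demo_header`, `demo_header_save` and `demo_header_check` and the return codes -1 and -2 are hypothetical stand-ins, and the digest is supplied by the caller instead of being computed with MD5 over the e820 table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAGIC		0x23456789ABCDEF01ULL	/* same value as RESTORE_MAGIC */
#define DEMO_DIGEST_SIZE	16

/* Simplified, hypothetical stand-in for struct restore_data_record. */
struct demo_header {
	uint64_t magic;
	uint8_t digest[DEMO_DIGEST_SIZE];
};

/* Fill in the header; fail if the caller's buffer is too small. */
static int demo_header_save(void *addr, size_t max_size, const uint8_t *digest)
{
	struct demo_header *hdr = addr;

	if (max_size < sizeof(*hdr))
		return -1;

	hdr->magic = DEMO_MAGIC;
	memcpy(hdr->digest, digest, DEMO_DIGEST_SIZE);
	return 0;
}

/*
 * Validate a saved header against the digest of the current memory map.
 * Returns 0 if the header is acceptable, -1 on a bad magic value and -2 on
 * a digest mismatch.
 */
static int demo_header_check(const void *addr, const uint8_t *current_digest)
{
	const struct demo_header *hdr = addr;
	static const uint8_t zero[DEMO_DIGEST_SIZE];

	if (hdr->magic != DEMO_MAGIC)
		return -1;

	/* An all-zero digest means the saving side stored none: accept it. */
	if (!memcmp(hdr->digest, zero, DEMO_DIGEST_SIZE))
		return 0;

	if (memcmp(hdr->digest, current_digest, DEMO_DIGEST_SIZE))
		return -2;

	return 0;
}
```

As in the file above, an all-zero digest is treated as "no digest recorded" and passes the check, so an image written without MD5 support can still be restored.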