Deep idle states like sleep and winkle are per core idle states. A core
enters these states only when all the threads enter either the
particular idle state or a deeper one. There are tasks like fastsleep
hardware bug workaround and hypervisor core state save which have to be
done only by the last thread of the core entering deep idle state and
similarly tasks like timebase resync, hypervisor core register restore
that have to be done only by the first thread waking up from these
state.
The current idle state management does not have a way to distinguish the
first/last thread of the core waking/entering idle states. Tasks like
timebase resync are done for all the threads. This is not only is
suboptimal, but can cause functionality issues when subcores and kvm is
involved.
This patch adds the necessary infrastructure to track idle states of
threads in a per-core structure. It uses this info to perform tasks like
fastsleep workaround and timebase resync only once per core.
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Originally-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The secondary threads should enter deep idle states so as to gain maximum
powersavings when the entire core is offline. To do so the offline path
must be made aware of the available deepest idle state. Hence probe the
device tree for the possible idle states in powernv core code and
expose the deepest idle state through flags.
Since the device tree is probed by the cpuidle driver as well, move
the parameters required to discover the idle states into an appropriate
common place to both the driver and the powernv core code.
Another point is that fastsleep idle state may require workarounds in
the kernel to function properly. This workaround is introduced in the
subsequent patches. However neither the cpuidle driver or the hotplug
path need be bothered about this workaround.
They will be taken care of by the core powernv code.
Originally-by: Srivatsa S. Bhat <srivatsa@mit.edu>
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The patch exposes the available i2c busses on the PowerNV platform
to the kernel and implements the bus driver to support i2c and
smbus commands.
The driver uses the platform device infrastructure to probe the busses
on the platform and registers them with the i2c driver framework.
Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Wolfram Sang <wsa@the-dreams.de> (I2C part, excluding the bindings)
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When a secondary hardware thread has finished running a KVM guest, we
currently put that thread into nap mode using a nap instruction in
the KVM code. This changes the code so that instead of doing a nap
instruction directly, we instead cause the call to power7_nap() that
put the thread into nap mode to return. The reason for doing this is
to avoid having the KVM code having to know what low-power mode to
put the thread into.
In the case of a secondary thread used to run a KVM guest, the thread
will be offline from the point of view of the host kernel, and the
relevant power7_nap() call is the one in pnv_smp_cpu_disable().
In this case we don't want to clear pending IPIs in the offline loop
in that function, since that might cause us to miss the wakeup for
the next time the thread needs to run a guest. To tell whether or
not to clear the interrupt, we use the SRR1 value returned from
power7_nap(), and check if it indicates an external interrupt. We
arrange that the return from power7_nap() when we have finished running
a guest returns 0, so pending interrupts don't get flushed in that
case.
Note that it is important a secondary thread that has finished
executing in the guest, or that didn't have a guest to run, should
not return to power7_nap's caller while the kvm_hstate.hwthread_req
flag in the PACA is non-zero, because the return from power7_nap
will reenable the MMU, and the MMU might still be in guest context.
In this situation we spin at low priority in real mode waiting for
hwthread_req to become zero.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
upatepp can get called for a nohpte fault when we find from the linux
page table that the translation was hashed before. In that case
we are sure that there is no existing translation, hence we could
avoid doing tlbie.
We could possibly race with a parallel fault filling the TLB. But
that should be ok because updatepp is only ever relaxing permissions.
We also look at linux pte permission bits when filling hash pte
permission bits. We also hold the linux pte busy bits while
inserting/updating a hashpte entry, hence a paralle update of
linux pte is not possible. On the other hand mprotect involves
ptep_modify_prot_start which cause a hpte invalidate and not updatepp.
Performance number:
We use randbox_access_bench written by Anton.
Kernel with THP disabled and smaller hash page table size.
86.60% random_access_b [kernel.kallsyms] [k] .native_hpte_updatepp
2.10% random_access_b random_access_bench [.] doit
1.99% random_access_b [kernel.kallsyms] [k] .do_raw_spin_lock
1.85% random_access_b [kernel.kallsyms] [k] .native_hpte_insert
1.26% random_access_b [kernel.kallsyms] [k] .native_flush_hash_range
1.18% random_access_b [kernel.kallsyms] [k] .__delay
0.69% random_access_b [kernel.kallsyms] [k] .native_hpte_remove
0.37% random_access_b [kernel.kallsyms] [k] .clear_user_page
0.34% random_access_b [kernel.kallsyms] [k] .__hash_page_64K
0.32% random_access_b [kernel.kallsyms] [k] fast_exception_return
0.30% random_access_b [kernel.kallsyms] [k] .hash_page_mm
With Fix:
27.54% random_access_b random_access_bench [.] doit
22.90% random_access_b [kernel.kallsyms] [k] .native_hpte_insert
5.76% random_access_b [kernel.kallsyms] [k] .native_hpte_remove
5.20% random_access_b [kernel.kallsyms] [k] fast_exception_return
5.12% random_access_b [kernel.kallsyms] [k] .__hash_page_64K
4.80% random_access_b [kernel.kallsyms] [k] .hash_page_mm
3.31% random_access_b [kernel.kallsyms] [k] data_access_common
1.84% random_access_b [kernel.kallsyms] [k] .trace_hardirqs_on_caller
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If we know that user address space has never executed on other cpus
we could use tlbiel.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cleanup OpalMCE_* definitions/declarations and other related code which
is not used anymore.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Benjamin Herrrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On PowerNV platform, PHB diag-data is dumped after stopping device
drivers. In case of recursive EEH errors, the kernel is usually
crashed before dumping PHB diag-data for the second EEH error. It's
hard to locate the root cause of the second EEH error without PHB
diag-data.
The patch adds one more EEH option "eeh=early_log", which helps
dumping PHB diag-data immediately once frozen PE is detected, in
order to get the PHB diag-data for the second EEH error.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch introduces additional flag EEH_PE_RESET to indicate the
corresponding PE is under reset. In turn, the PE retrieval bakcend
on PowerNV platform can return unfrozen state for the EEH core to
moving forward. Flag EEH_PE_CFG_BLOCKED isn't the correct one for
the purpose.
In PCI passthrou case, the problem is more worse: Guest doesn't
recover 6th EEH error. The PE is left in isolated (frozen) and
config blocked state on Broadcom adapters. We can't retrieve the
PE's state correctly any more, even from the host side via sysfs
/sys/bus/pci/devices/xxx/eeh_pe_state.
Reported-by: Rajeshkumar Subramanian <rajeshkumars@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Although we are now selecting NO_BOOTMEM, we still have some traces of
bootmem lying around. That is because even with NO_BOOTMEM there is
still a shim that converts bootmem calls into memblock calls, but
ultimately we want to remove all traces of bootmem.
Most of the patch is conversions from alloc_bootmem() to
memblock_virt_alloc(). In general a call such as:
p = (struct foo *)alloc_bootmem(x);
Becomes:
p = memblock_virt_alloc(x, 0);
We don't need the cast because memblock_virt_alloc() returns a void *.
The alignment value of zero tells memblock to use the default alignment,
which is SMP_CACHE_BYTES, the same value alloc_bootmem() uses.
We remove a number of NULL checks on the result of
memblock_virt_alloc(). That is because memblock_virt_alloc() will panic
if it can't allocate, in exactly the same way as alloc_bootmem(), so the
NULL checks are and always have been redundant.
The memory returned by memblock_virt_alloc() is already zeroed, so we
remove several memsets of the result of memblock_virt_alloc().
Finally we convert a few uses of __alloc_bootmem(x, y, MAX_DMA_ADDRESS)
to just plain memblock_virt_alloc(). We don't use memblock_alloc_base()
because MAX_DMA_ADDRESS is ~0ul on powerpc, so limiting the allocation
to that is pointless, 16XB ought to be enough for anyone.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
nvram_pstore_info's buf_lock is not initialized before registering,
which is clearly incorrect.
It causes some strange behavior when trying to obtain the lock during
kdump process.
On a UP configuration, the console stopped for a couple of seconds, then
"lockup suspected" warning printed out, but then it continued to run.
So try lock fails, and lockup reported, but then arch_spin_lock()
passes.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
[mpe: Edited changelog]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Scott says:
"Highlights include a bunch of 8xx optimizations, device tree bindings
for Freescale BMan, QMan, and FMan datapath components, misc device tree
updates, and inbound rio window support."
The patch implements the OPAL rtc driver that binds with the rtc
driver subsystem. The driver uses the platform device infrastructure
to probe the rtc device and register it to rtc class framework. The
'wakeup' is supported depending upon the property 'has-tpo' present
in the OF node. It provides a way to load the generic rtc driver in
in the absence of an OPAL driver.
The patch also moves the existing OPAL rtc get/set time interfaces to the
new driver and exposes the necessary OPAL calls using EXPORT_SYMBOL_GPL.
Test results:
-------------
Host:
[root@tul169p1 ~]# ls -l /sys/class/rtc/
total 0
lrwxrwxrwx 1 root root 0 Oct 14 03:07 rtc0 -> ../../devices/opal-rtc/rtc/rtc0
[root@tul169p1 ~]# cat /sys/devices/opal-rtc/rtc/rtc0/time
08:10:07
[root@tul169p1 ~]# echo `date '+%s' -d '+ 2 minutes'` > /sys/class/rtc/rtc0/wakealarm
[root@tul169p1 ~]# cat /sys/class/rtc/rtc0/wakealarm
1413274345
[root@tul169p1 ~]#
FSP:
$ smgr mfgState
standby
$ rtim timeofday
System time is valid: 2014/10/14 08:12:04.225115
$ smgr mfgState
ipling
$
CC: devicetree@vger.kernel.org
CC: tglx@linutronix.de
CC: rtc-linux@googlegroups.com
CC: a.zummo@towertech.it
Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If there're no PHBs under P5IOC2 HUB device tree node, we should
bail early to avoid zero devisor and allocating TCE tables.
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When freezing compound PEs in pnv_ioda_freeze_pe(), we should bail
upon illegal master PE. We needn't freeze slave PE because it should
have been put into frozen state by hardware.
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nested if statements are always bad and the patch avoids one by
checking PHB type and bail in advance if necessary.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 262af55 ("powerpc/powernv: Enable M64 aperatus for PHB3")
introduced compound PEs in order to support M64 aperatus on PHB3.
However, we never configured PELTV for compound PEs. The patch
fixes that by: parent PE can freeze all child compound PEs. Any
compound PE affects the group.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The patch initializes PE instance when reserving PE number to
keep consistent things as we did before. Also, it replaces the
iteration on bridge's windows with the prefered way.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The patch renames alloc_m64_pe() to reserve_m64_pe() to reflect
its real usage: We reserve PE numbers for M64 segments in advance
and then pick up the reserved PE numbers when building the mapping
between PE numbers and M64 segments.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The M64 resource should be removed if we don't have hook to
initialize it, or (not and) fail to do that.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The patch checks PHB type a bit early to save a bit cycles
for P7 because we don't support M64 for P7IOC no matter what
OPAL firmware we have.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Recent OPAL firmare adds a couple of functions to send and receive IPMI
messages:
https://github.com/open-power/skiboot/commit/b2a374da
This change updates the token list and wrappers to suit, and adds the
platform devices for any IPMI interfaces.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Simplify the error path to avoid calling of_node_put when it is not needed.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Of_node_put supports NULL as its argument, so the initial test is not
necessary.
Suggested by Uwe Kleine-König.
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression e;
@@
-if (e)
of_node_put(e);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The H_SET_MODE hcall returns H_P2 if a function is not implemented
and all callers should handle this case.
The call to enable relocation on exceptions currently prints an error
message if the feature is not implemented. While H_SET_MODE was
first introduced on POWER8 (which has relocation on exceptions), it
has been now added on some POWER7 configurations (which does not).
Check for H_P2 and print an informational message instead.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The ibm,pcie-link-speed-stats isn't mandatory, so we shouldn't print
a high priority error message when missing. One example where we see
this is QEMU.
Reduce it to pr_debug.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit d4fe0965e2 ("powerpc/jump_label: use HAVE_JUMP_LABEL?")
missed a few conversions. Change the remaining uses of
CONFIG_JUMP_LABEL to HAVE_JUMP_LABEL.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Lots of places included bootmem.h even when not using bootmem.
Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
At the moment we transition from the memblock alloctor to the bootmem
allocator. Gitting rid of the bootmem allocator removes a bunch of
complicated code (most of which I owe the dubious honour of being
responsible for writing).
Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 39eb56da2b ("pcmcia: Remove m8xx_pcmcia driver") removed the
only driver that used CONFIG_FADS. Setting the Kconfig symbol FADS is
pointless since that commit. Remove it.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Scott Wood <scottwood@freescale.com>
We have an extra level of indirection on memory hot remove which is not
matched on memory hot add. Memory hotplug is book3s only, so there is
no need for it.
This also enables means remove_memory() (ie memory hot unplug) works
on powernv.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The generic Linux framework to power off the machine is a function pointer
called pm_power_off. The trick about this pointer is that device drivers can
potentially implement it rather than board files.
Today on powerpc we set pm_power_off to invoke our generic full machine power
off logic which then calls ppc_md.power_off to invoke machine specific power
off.
However, when we want to add a power off GPIO via the "gpio-poweroff" driver,
this card house falls apart. That driver only registers itself if pm_power_off
is NULL to ensure it doesn't override board specific logic. However, since we
always set pm_power_off to the generic power off logic (which will just not
power off the machine if no ppc_md.power_off call is implemented), we can't
implement power off via the generic GPIO power off driver.
To fix this up, let's get rid of the ppc_md.power_off logic and just always use
pm_power_off as was intended. Then individual drivers such as the GPIO power off
driver can implement power off logic via that function pointer.
With this patch set applied and a few patches on top of QEMU that implement a
power off GPIO on the virt e500 machine, I can successfully turn off my virtual
machine after halt.
Signed-off-by: Alexander Graf <agraf@suse.de>
[mpe: Squash into one patch and update changelog based on cover letter]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This still has not been merged and now powerpc is the only arch that does
not have this change. Sorry about missing linuxppc-dev before.
V2->V2
- Fix up to work against 3.18-rc1
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.
Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.
__get_cpu_var() is defined as :
__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.
This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.
At the end of the patch set all uses of __get_cpu_var have been removed so
the macro is removed too.
The patch set includes passes over all arches as well. Once these operations
are used throughout then specialized macros can be defined in non -x86
arches as well in order to optimize per cpu access by f.e. using a global
register that may be set to the per cpu base.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processors instance of a per cpu
variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;
Converts to
__this_cpu_write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
__this_cpu_inc(y)
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
[mpe: Fix build errors caused by set/or_softirq_pending(), and rework
assignment in __set_breakpoint() to use memcpy().]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In powerpc pseries platform dlpar operations, use device_online() and
device_offline() instead of cpu_up() and cpu_down().
Calling cpu_up/down() directly does not update the cpu device offline
field, which is used to online/offline a cpu from sysfs. Calling
device_online/offline() instead keeps the sysfs cpu online value
correct. The hotplug lock, which is required to be held when calling
device_online/offline(), is already held when dlpar_online/offline_cpu()
are called, since they are called only from cpu_probe|release_store().
This patch fixes errors on phyp (PowerVM) systems that have cpu(s)
added/removed using dlpar operations; without this patch, the
/sys/devices/system/cpu/cpuN/online nodes do not correctly show the
online state of added/removed cpus.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Fixes: 0902a9044f ("Driver core: Use generic offline/online for CPU offline/online")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Endian is hard, especially when I designed a stupid FW interface, and
I should know better... oh well, this is attempt #2 at fixing this
properly. This time it seems to work with all access sizes and I
can run my flashing tool (which exercises all sort of access sizes
and types to access the SPI controller in the BMC) just fine.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This reverts commit bf7588a085.
Ben says although the code is not correct "[this] fix was completely
wrong and does more damages than it fixes things."
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently, we can't call opal wrappers from modules when using the LE
ABIv2, which requires a TOC init. If we do we'll try and load the opal
entry point using the wrong toc and probably explode or worse jump to
the wrong address.
Nothing in upstream is making opal calls from a module, but we do export
one of the wrappers so we should fix this anyway.
This change uses the _GLOBAL_TOC() macro (rather than _GLOBAL) for the
opal wrappers, so that we can do non-local calls to them.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull more powerpc updates from Michael Ellerman:
"Here's some more updates for powerpc for 3.18.
They are a bit late I know, though must are actually bug fixes. In my
defence I nearly cut the top of my finger off last weekend in a
gruesome bike maintenance accident, so I spent a good part of the week
waiting around for doctors. True story, I can send photos if you like :)
Probably the most interesting fix is the sys_call_table one, which
enables syscall tracing for powerpc. There's a fix for HMI handling
for old firmware, more endian fixes for firmware interfaces, more EEH
fixes, Anton fixed our routine that gets the current stack pointer,
and a few other misc bits"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (22 commits)
powerpc: Only do dynamic DMA zone limits on platforms that need it
powerpc: sync pseries_le_defconfig with pseries_defconfig
powerpc: Add printk levels to setup_system output
powerpc/vphn: NUMA node code expects big-endian
powerpc/msi: Use WARN_ON() in msi bitmap selftests
powerpc/msi: Fix the msi bitmap alignment tests
powerpc/eeh: Block CFG upon frozen Shiner adapter
powerpc/eeh: Don't collect logs on PE with blocked config space
powerpc/eeh: Block PCI config access upon frozen PE
powerpc/pseries: Drop config requests in EEH accessors
powerpc/powernv: Drop config requests in EEH accessors
powerpc/eeh: Rename flag EEH_PE_RESET to EEH_PE_CFG_BLOCKED
powerpc/eeh: Fix condition for isolated state
powerpc/pseries: Make CPU hotplug path endian safe
powerpc/pseries: Use dump_stack instead of show_stack
powerpc: Rename __get_SP() to current_stack_pointer()
powerpc: Reimplement __get_SP() as a function not a define
powerpc/numa: Add ability to disable and debug topology updates
powerpc/numa: check error return from proc_create
powerpc/powernv: Fallback to old HMI handling behavior for old firmware
...
The Broadcom Shiner 2-ports 10G ethernet adapter has same problem
commit 6f20bda0 ("powerpc/eeh: Block PCI config access upon frozen
PE") fixes. Put it to the black list as well.
# lspci -s 0004:01:00.0
0004:01:00.0 Ethernet controller: Broadcom Corporation \
NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
# lspci -n -s 0004:01:00.0
0004:01:00.0 0200: 14e4:168e (rev 10)
Reported-by: John Walthour <jwalthour@us.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The problem was found when I tried to inject PCI config error by
PHB3 PAPR error injection registers into Broadcom Austin 4-ports
NIC adapter. The frozen PE was reported successfully and EEH core
started to recover it. However, I run into fenced PHB when dumping
PCI config space as EEH logs. I was told that PCI config requests
should not be progagated to the adapter until PE reset is done
successfully. Otherise, we would run out of PHB internal credits
and trigger PCT (PCIE Completion Timeout), which leads to the
fenced PHB.
The patch introduces another PE flag EEH_PE_CFG_RESTRICTED, which
is set during PE initialization time if the PE includes the specific
PCI devices that need block PCI config access until PE reset is done.
When the PE becomes frozen for the first time, EEH_PE_CFG_BLOCKED is
set if the PE has flag EEH_PE_CFG_RESTRICTED. Then the PCI config
access to the PE will be dropped by platform PCI accessors until
PE reset is done successfully. The mechanism is shared by PowerNV
platform owned PE or userland owned ones. It's not used on pSeries
platform yet.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
It's bad idea to access the PCI config registers of the adapters,
which is experiencing reset. It leads to recursive EEH error without
exception. The patch drops PCI config requests in EEH accessors if
the PE has been marked to accept PCI config requests, for example
during PE reseet time.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The flag EEH_PE_RESET indicates blocking config space of the PE
during reset time. We potentially need block PE's config space
other than reset time. So it's reasonable to replace it with
EEH_PE_CFG_BLOCKED to indicate its usage.
There are no substantial code or logic changes in this patch.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
- ibm,rtas-configure-connector should treat the RTAS data as big endian.
- Treat ibm,ppc-interrupt-server#s as big-endian when setting
smp_processor_id during hotplug.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We can use the simpler dump_stack() instead of
show_stack(current, __get_SP())
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Recently we moved HMI handling into Linux kernel instead of taking
HMI directly in OPAL. This new change is dependent on new OPAL call
for HMI recovery which was introduced in newer firmware. While this new
change works fine with latest OPAL firmware, we broke the HMI handling
if we run newer kernel on old OPAL firmware that results in system hang.
This patch fixes this issue by falling back to old HMI behavior on older
OPAL firmware.
This patch introduces a check for opal token OPAL_HANDLE_HMI to see
if we are running on newer firmware or old firmware. On newer firmware
this check would return OPAL_TOKEN_PRESENT, otherwise we are running on
old firmware and fallback to old HMI behavior.
Old firmware: POWER8 System Firmware Release as of today <= SV810_087
Action: Let OPAL handle HMIs
Newer firmware: in development/yet to be released.
Action: Let Linux host handle HMIs.
This patch depends on opal check token patch posted at ppc-devel
https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-August/120224.html
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
[mpe: Minor comment and printk rewording]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>