Commit Graph

9423 Commits

Author SHA1 Message Date
Takuya Yoshikawa
db3fe4eb45 KVM: Introduce kvm_memory_slot::arch and move lpage_info into it
Some members of kvm_memory_slot are not used by every architecture.

This patch is the first step to make this difference clear by
introducing kvm_memory_slot::arch;  lpage_info is moved into it.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08 14:10:22 +02:00
Akinobu Mita
2d4b971287 powerpc/pmac: Use string library in nvram code
- Use memchr_inv to check if the data contains all 0xFF bytes.
  It is faster than looping for each byte.

- Use memcmp to compare memory areas

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-07 17:09:05 +11:00
Grant Likely
ad5b7f1350 powerpc: Make SPARSE_IRQ required
All IRQs on powerpc are managed via irq_domain anyway, there isn't really
any advantage to turning SPARSE_IRQ off, and it's the direction we want
to take the kernel design anyway.  This patch makes powerpc always use
SPARSE_IRQ.

On pseries_defconfig, SPARSE_IRQ adds only about 0x300 bytes to the
.text sections, and removes about 0x20000 from the data section for the
static irq_desc table.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-07 17:09:04 +11:00
Nishanth Aravamudan
e9daf2ad7f powerpc/prom: Remove limit on maximum size of properties
On a 16TB system (using AMS/CMO), I get:

WARNING: ignoring large property [/ibm,dynamic-reconfiguration-memory] ibm,dynamic-memory length 0x000000000017ffec

and significantly less memory is thus shown to the partition. As far as
I can tell, the constant used is arbitrary. Ben Herrenschmidt provided
additional background that

> The limit was originally set because of Apple machines carrying ROM
> images in the device-tree, at a time where we were much more memory
> constrained than we are now.

and that it is likely not very useful any longer.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-07 17:06:10 +11:00
Matt Fleming
a2007ce844 powerpc: Use set_current_blocked() and block_sigmask()
As described in e6fa16ab ("signal: sigprocmask() should do
retarget_shared_pending()") the modification of current->blocked is
incorrect as we need to check whether the signal we're about to block
is pending in the shared queue.

Also, use the new helper function introduced in commit 5e6292c0f2
("signal: add block_sigmask() for adding sigmask to current->blocked")
which centralises the code for updating current->blocked after
successfully delivering a signal and reduces the amount of duplicate
code across architectures. In the past some architectures got this
code wrong, so using this helper function should stop that from
happening again.

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-07 17:06:09 +11:00
Joe Perches
a2234b4bae powerpc: Use vsprintf extention %pf with builtin_return_address
Emit the function name not the address when possible.

builtin_return_address() gives an address.  When building
a kernel with CONFIG_KALLSYMS, emit the actual function
name not the address.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-07 17:06:09 +11:00
Jimi Xenidis
de801de139 powerpc/icswx: Fix race condition with IPI setting ACOP
There is a race where a thread causes a coprocessor type to be valid
in its own ACOP _and_ in the current context, but it does not
propagate to the ACOP register of other threads in time for them to
use it.  The original code tries to solve this by sending an IPI to
all threads on the system, which is heavy handed, but unfortunately
still provides a window where the icswx is issued by other threads and
the ACOP is not up to date.

This patch detects that the ACOP DSI fault was a "false positive" and
syncs the ACOP and causes the icswx to be replayed.

Signed-off-by: Jimi Xenidis <jimix@pobox.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-07 17:06:09 +11:00
Anton Blanchard
a6cf7ed511 powerpc/atomic: Implement atomic*_inc_not_zero
Implement atomic_inc_not_zero and atomic64_inc_not_zero. At the
moment we use atomic*_add_unless which requires us to put 0 and
1 constants into registers. We can also avoid a subtract by
saving the original value in a second temporary.

This removes 3 instructions from fget:

- c0000000001b63c0:       39 00 00 00     li      r8,0
- c0000000001b63c4:       39 40 00 01     li      r10,1
...
- c0000000001b63e8:       7c 0a 00 50     subf    r0,r10,r0

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-07 17:06:08 +11:00
Duc Dang
8dfc2b45ff powerpc/44x: Add new compatible value for EMAC node of APM821XX dts file.
This compatible value will be used to distinguish some special features of APM821XX EMAC: no half duplex mode support, configuring jumbo frame.

Signed-off-by: Duc Dang <dhdang@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 17:06:07 -05:00
Stephane Eranian
2481c5fa6d perf: Disable PERF_SAMPLE_BRANCH_* when not supported
PERF_SAMPLE_BRANCH_* is disabled for:

 - SW events (sw counters, tracepoints)
 - HW breakpoints
 - ALL but Intel x86 architecture
 - AMD64 processors

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-10-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-05 14:55:42 +01:00
Alexander Graf
d2a1b483a4 KVM: PPC: Add HPT preallocator
We're currently allocating 16MB of linear memory on demand when creating
a guest. That does work some times, but finding 16MB of linear memory
available in the system at runtime is definitely not a given.

So let's add another command line option similar to the RMA preallocator,
that we can use to keep a pool of page tables around. Now, when a guest
gets created it has a pretty low chance of receiving an OOM.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:57:28 +02:00
Alexander Graf
b7f5d0114c KVM: PPC: Initialize linears with zeros
RMAs and HPT preallocated spaces should be zeroed, so we don't accidently
leak information from previous VM executions.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:57:27 +02:00
Alexander Graf
b4e706111d KVM: PPC: Convert RMA allocation into generic code
We have code to allocate big chunks of linear memory on bootup for later use.
This code is currently used for RMA allocation, but can be useful beyond that
extent.

Make it generic so we can reuse it for other stuff later.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:57:25 +02:00
Alexander Graf
9cf7c0e465 KVM: PPC: E500: Fail init when not on e500v2
When enabling the current KVM code on e500mc, I get the following oops:

    Oops: Exception in kernel mode, sig: 4 [#1]
    SMP NR_CPUS=8 P2041 RDB
    Modules linked in:
    NIP: c067df4c LR: c067df44 CTR: 00000000
    REGS: ee055ed0 TRAP: 0700   Not tainted  (3.2.0-10391-g36c5afe)
    MSR: 00029002 <CE,EE,ME>  CR: 24042022  XER: 00000000
    TASK = ee0429b0[1] 'swapper/0' THREAD: ee054000 CPU: 2
    GPR00: c067df44 ee055f80 ee0429b0 00000000 00000058 0000003f ee211600 60c6b864
    GPR08: 7cc903a6 0000002c 00000000 00000001 44042082 2d180088 00000000 00000000
    GPR16: c0000a00 00000014 3fffffff 03fe9000 00000015 7ff3be68 c06e0000 00000000
    GPR24: 00000000 00000000 00001720 c067df1c c06e0000 00000000 ee054000 c06ab51c
    NIP [c067df4c] kvmppc_e500_init+0x30/0xf8
    LR [c067df44] kvmppc_e500_init+0x28/0xf8
    Call Trace:
    [ee055f80] [c067df44] kvmppc_e500_init+0x28/0xf8 (unreliable)
    [ee055fb0] [c0001d30] do_one_initcall+0x50/0x1f0
    [ee055fe0] [c06721dc] kernel_init+0xa4/0x14c
    [ee055ff0] [c000e910] kernel_thread+0x4c/0x68
    Instruction dump:
    9421ffd0 7c0802a6 93410018 9361001c 90010034 93810020 93a10024 93c10028
    93e1002c 4bfffe7d 2c030000 408200a4 <7c1082a6> 90010008 7c1182a6 9001000c
    ---[ end trace b8ef4903fcbf9dd3 ]---

Since it doesn't make sense to run the init function on any non-supported
platform, we can just call our "is this platform supported?" function and
bail out of init() if it's not.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:57:23 +02:00
Paul Mackerras
9d4cba7f93 KVM: Move gfn_to_memslot() to kvm_host.h
This moves __gfn_to_memslot() and search_memslots() from kvm_main.c to
kvm_host.h to reduce the code duplication caused by the need for
non-modular code in arch/powerpc/kvm/book3s_hv_rm_mmu.c to call
gfn_to_memslot() in real mode.

Rather than putting gfn_to_memslot() itself in a header, which would
lead to increased code size, this puts __gfn_to_memslot() in a header.
Then, the non-modular uses of gfn_to_memslot() are changed to call
__gfn_to_memslot() instead.  This way there is only one place in the
source code that needs to be changed should the gfn_to_memslot()
implementation need to be modified.

On powerpc, the Book3S HV style of KVM has code that is called from
real mode which needs to call gfn_to_memslot() and thus needs this.
(Module code is allocated in the vmalloc region, which can't be
accessed in real mode.)

With this, we can remove builtin_gfn_to_memslot() from book3s_hv_rm_mmu.c.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:57:22 +02:00
Scott Wood
54f65795c8 KVM: PPC: refer to paravirt docs in header file
Instead of keeping separate copies of struct kvm_vcpu_arch_shared (one in
the code, one in the docs) that inevitably fail to be kept in sync
(already sr[] is missing from the doc version), just point to the header
file as the source of documentation on the contents of the magic page.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:41 +02:00
Alexander Graf
b3c5d3c2a4 KVM: PPC: Rename MMIO register identifiers
We need the KVM_REG namespace for generic register settings now, so
let's rename the existing users to something different, enabling
us to reuse the namespace for more visible interfaces.

While at it, also move these private constants to a private header.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:41 +02:00
Paul Mackerras
31f3438eca KVM: PPC: Move kvm_vcpu_ioctl_[gs]et_one_reg down to platform-specific code
This moves the get/set_one_reg implementation down from powerpc.c into
booke.c, book3s_pr.c and book3s_hv.c.  This avoids #ifdefs in C code,
but more importantly, it fixes a bug on Book3s HV where we were
accessing beyond the end of the kvm_vcpu struct (via the to_book3s()
macro) and corrupting memory, causing random crashes and file corruption.

On Book3s HV we only accept setting the HIOR to zero, since the guest
runs in supervisor mode and its vectors are never offset from zero.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
[agraf update to apply on top of changed ONE_REG patches]
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:41 +02:00
Alexander Graf
1022fc3d3b KVM: PPC: Add support for explicit HIOR setting
Until now, we always set HIOR based on the PVR, but this is just wrong.
Instead, we should be setting HIOR explicitly, so user space can decide
what the initial HIOR value is - just like on real hardware.

We keep the old PVR based way around for backwards compatibility, but
once user space uses the SET_ONE_REG based method, we drop the PVR logic.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:41 +02:00
Alexander Graf
e24ed81fed KVM: PPC: Add generic single register ioctls
Right now we transfer a static struct every time we want to get or set
registers. Unfortunately, over time we realize that there are more of
these than we thought of before and the extensibility and flexibility of
transferring a full struct every time is limited.

So this is a new approach to the problem. With these new ioctls, we can
get and set a single register that is identified by an ID. This allows for
very precise and limited transmittal of data. When we later realize that
it's a better idea to shove over multiple registers at once, we can reuse
most of the infrastructure and simply implement a GET_MANY_REGS / SET_MANY_REGS
interface.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:40 +02:00
Sasha Levin
6b75e6bfef KVM: PPC: Use the vcpu kmem_cache when allocating new VCPUs
Currently the code kzalloc()s new VCPUs instead of using the kmem_cache
which is created when KVM is initialized.

Modify it to allocate VCPUs from that kmem_cache.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:40 +02:00
Liu Yu
d37b1a037c KVM: PPC: booke: Add booke206 TLB trace
The existing kvm_stlb_write/kvm_gtlb_write were a poor match for
the e500/book3e MMU -- mas1 was passed as "tid", mas2 was limited
to "unsigned int" which will be a problem on 64-bit, mas3/7 got
split up rather than treated as a single 64-bit word, etc.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
[scottwood@freescale.com: made mas2 64-bit, and added mas8 init]
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:40 +02:00
Paul Mackerras
82ed36164c KVM: PPC: Book3s HV: Implement get_dirty_log using hardware changed bit
This changes the implementation of kvm_vm_ioctl_get_dirty_log() for
Book3s HV guests to use the hardware C (changed) bits in the guest
hashed page table.  Since this makes the implementation quite different
from the Book3s PR case, this moves the existing implementation from
book3s.c to book3s_pr.c and creates a new implementation in book3s_hv.c.
That implementation calls kvmppc_hv_get_dirty_log() to do the actual
work by calling kvm_test_clear_dirty on each page.  It iterates over
the HPTEs, clearing the C bit if set, and returns 1 if any C bit was
set (including the saved C bit in the rmap entry).

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:39 +02:00
Paul Mackerras
5551489373 KVM: PPC: Book3S HV: Use the hardware referenced bit for kvm_age_hva
This uses the host view of the hardware R (referenced) bit to speed
up kvm_age_hva() and kvm_test_age_hva().  Instead of removing all
the relevant HPTEs in kvm_age_hva(), we now just reset their R bits
if set.  Also, kvm_test_age_hva() now scans the relevant HPTEs to
see if any of them have R set.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:39 +02:00
Paul Mackerras
bad3b5075e KVM: PPC: Book3s HV: Maintain separate guest and host views of R and C bits
This allows both the guest and the host to use the referenced (R) and
changed (C) bits in the guest hashed page table.  The guest has a view
of R and C that is maintained in the guest_rpte field of the revmap
entry for the HPTE, and the host has a view that is maintained in the
rmap entry for the associated gfn.

Both view are updated from the guest HPT.  If a bit (R or C) is zero
in either view, it will be initially set to zero in the HPTE (or HPTEs),
until set to 1 by hardware.  When an HPTE is removed for any reason,
the R and C bits from the HPTE are ORed into both views.  We have to
be careful to read the R and C bits from the HPTE after invalidating
it, but before unlocking it, in case of any late updates by the hardware.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:39 +02:00
Paul Mackerras
a92bce95f0 KVM: PPC: Book3S HV: Keep HPTE locked when invalidating
This reworks the implementations of the H_REMOVE and H_BULK_REMOVE
hcalls to make sure that we keep the HPTE locked and in the reverse-
mapping chain until we have finished invalidating it.  Previously
we would remove it from the chain and unlock it before invalidating
it, leaving a tiny window when the guest could access the page even
though we believe we have removed it from the guest (e.g.,
kvm_unmap_hva() has been called for the page and has found no HPTEs
in the chain).  In addition, we'll need this for future patches where
we will need to read the R and C bits in the HPTE after invalidating
it.

Doing this required restructuring kvmppc_h_bulk_remove() substantially.
Since we want to batch up the tlbies, we now need to keep several
HPTEs locked simultaneously.  In order to avoid possible deadlocks,
we don't spin on the HPTE bitlock for any except the first HPTE in
a batch.  If we can't acquire the HPTE bitlock for the second or
subsequent HPTE, we terminate the batch at that point, do the tlbies
that we have accumulated so far, unlock those HPTEs, and then start
a new batch to do the remaining invalidations.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:39 +02:00
Matt Evans
b5434032fc KVM: PPC: Add KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS
PPC KVM lacks these two capabilities, and as such a userland system must assume
a max of 4 VCPUs (following api.txt).  With these, a userland can determine
a more realistic limit.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:38 +02:00
Matt Evans
03cdab5340 KVM: PPC: Fix vcpu_create dereference before validity check.
Fix usage of vcpu struct before check that it's actually valid.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:38 +02:00
Paul Mackerras
4cf302bc10 KVM: PPC: Allow for read-only pages backing a Book3S HV guest
With this, if a guest does an H_ENTER with a read/write HPTE on a page
which is currently read-only, we make the actual HPTE inserted be a
read-only version of the HPTE.  We now intercept protection faults as
well as HPTE not found faults, and for a protection fault we work out
whether it should be reflected to the guest (e.g. because the guest HPTE
didn't allow write access to usermode) or handled by switching to
kernel context and calling kvmppc_book3s_hv_page_fault, which will then
request write access to the page and update the actual HPTE.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:38 +02:00
Paul Mackerras
342d3db763 KVM: PPC: Implement MMU notifiers for Book3S HV guests
This adds the infrastructure to enable us to page out pages underneath
a Book3S HV guest, on processors that support virtualized partition
memory, that is, POWER7.  Instead of pinning all the guest's pages,
we now look in the host userspace Linux page tables to find the
mapping for a given guest page.  Then, if the userspace Linux PTE
gets invalidated, kvm_unmap_hva() gets called for that address, and
we replace all the guest HPTEs that refer to that page with absent
HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit
set, which will cause an HDSI when the guest tries to access them.
Finally, the page fault handler is extended to reinstantiate the
guest HPTE when the guest tries to access a page which has been paged
out.

Since we can't intercept the guest DSI and ISI interrupts on PPC970,
we still have to pin all the guest pages on PPC970.  We have a new flag,
kvm->arch.using_mmu_notifiers, that indicates whether we can page
guest pages out.  If it is not set, the MMU notifier callbacks do
nothing and everything operates as before.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:38 +02:00
Paul Mackerras
697d3899dc KVM: PPC: Implement MMIO emulation support for Book3S HV guests
This provides the low-level support for MMIO emulation in Book3S HV
guests.  When the guest tries to map a page which is not covered by
any memslot, that page is taken to be an MMIO emulation page.  Instead
of inserting a valid HPTE, we insert an HPTE that has the valid bit
clear but another hypervisor software-use bit set, which we call
HPTE_V_ABSENT, to indicate that this is an absent page.  An
absent page is treated much like a valid page as far as guest hcalls
(H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that
an absent HPTE doesn't need to be invalidated with tlbie since it
was never valid as far as the hardware is concerned.

When the guest accesses a page for which there is an absent HPTE, it
will take a hypervisor data storage interrupt (HDSI) since we now set
the VPM1 bit in the LPCR.  Our HDSI handler for HPTE-not-present faults
looks up the hash table and if it finds an absent HPTE mapping the
requested virtual address, will switch to kernel mode and handle the
fault in kvmppc_book3s_hv_page_fault(), which at present just calls
kvmppc_hv_emulate_mmio() to set up the MMIO emulation.

This is based on an earlier patch by Benjamin Herrenschmidt, but since
heavily reworked.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:37 +02:00
Paul Mackerras
06ce2c63d9 KVM: PPC: Maintain a doubly-linked list of guest HPTEs for each gfn
This expands the reverse mapping array to contain two links for each
HPTE which are used to link together HPTEs that correspond to the
same guest logical page.  Each circular list of HPTEs is pointed to
by the rmap array entry for the guest logical page, pointed to by
the relevant memslot.  Links are 32-bit HPT entry indexes rather than
full 64-bit pointers, to save space.  We use 3 of the remaining 32
bits in the rmap array entries as a lock bit, a referenced bit and
a present bit (the present bit is needed since HPTE index 0 is valid).
The bit lock for the rmap chain nests inside the HPTE lock bit.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:37 +02:00
Paul Mackerras
9d0ef5ea04 KVM: PPC: Allow I/O mappings in memory slots
This provides for the case where userspace maps an I/O device into the
address range of a memory slot using a VM_PFNMAP mapping.  In that
case, we work out the pfn from vma->vm_pgoff, and record the cache
enable bits from vma->vm_page_prot in two low-order bits in the
slot_phys array entries.  Then, in kvmppc_h_enter() we check that the
cache bits in the HPTE that the guest wants to insert match the cache
bits in the slot_phys array entry.  However, we do allow the guest to
create what it thinks is a non-cacheable or write-through mapping to
memory that is actually cacheable, so that we can use normal system
memory as part of an emulated device later on.  In that case the actual
HPTE we insert is a cacheable HPTE.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:37 +02:00
Paul Mackerras
da9d1d7f28 KVM: PPC: Allow use of small pages to back Book3S HV guests
This relaxes the requirement that the guest memory be provided as
16MB huge pages, allowing it to be provided as normal memory, i.e.
in pages of PAGE_SIZE bytes (4k or 64k).  To allow this, we index
the kvm->arch.slot_phys[] arrays with a small page index, even if
huge pages are being used, and use the low-order 5 bits of each
entry to store the order of the enclosing page with respect to
normal pages, i.e. log_2(enclosing_page_size / PAGE_SIZE).

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:37 +02:00
Paul Mackerras
c77162dee7 KVM: PPC: Only get pages when actually needed, not in prepare_memory_region()
This removes the code from kvmppc_core_prepare_memory_region() that
looked up the VMA for the region being added and called hva_to_page
to get the pfns for the memory.  We have no guarantee that there will
be anything mapped there at the time of the KVM_SET_USER_MEMORY_REGION
ioctl call; userspace can do that ioctl and then map memory into the
region later.

Instead we defer looking up the pfn for each memory page until it is
needed, which generally means when the guest does an H_ENTER hcall on
the page.  Since we can't call get_user_pages in real mode, if we don't
already have the pfn for the page, kvmppc_h_enter() will return
H_TOO_HARD and we then call kvmppc_virtmode_h_enter() once we get back
to kernel context.  That calls kvmppc_get_guest_page() to get the pfn
for the page, and then calls back to kvmppc_h_enter() to redo the HPTE
insertion.

When the first vcpu starts executing, we need to have the RMO or VRMA
region mapped so that the guest's real mode accesses will work.  Thus
we now have a check in kvmppc_vcpu_run() to see if the RMO/VRMA is set
up and if not, call kvmppc_hv_setup_rma().  It checks if the memslot
starting at guest physical 0 now has RMO memory mapped there; if so it
sets it up for the guest, otherwise on POWER7 it sets up the VRMA.
The function that does that, kvmppc_map_vrma, is now a bit simpler,
as it calls kvmppc_virtmode_h_enter instead of creating the HPTE itself.

Since we are now potentially updating entries in the slot_phys[]
arrays from multiple vcpu threads, we now have a spinlock protecting
those updates to ensure that we don't lose track of any references
to pages.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:36 +02:00
Paul Mackerras
075295dd32 KVM: PPC: Make the H_ENTER hcall more reliable
At present, our implementation of H_ENTER only makes one try at locking
each slot that it looks at, and doesn't even retry the ldarx/stdcx.
atomic update sequence that it uses to attempt to lock the slot.  Thus
it can return the H_PTEG_FULL error unnecessarily, particularly when
the H_EXACT flag is set, meaning that the caller wants a specific PTEG
slot.

This improves the situation by making a second pass when no free HPTE
slot is found, where we spin until we succeed in locking each slot in
turn and then check whether it is full while we hold the lock.  If the
second pass fails, then we return H_PTEG_FULL.

This also moves lock_hpte to a header file (since later commits in this
series will need to use it from other source files) and renames it to
try_lock_hpte, which is a somewhat less misleading name.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:36 +02:00
Paul Mackerras
93e602490c KVM: PPC: Add an interface for pinning guest pages in Book3s HV guests
This adds two new functions, kvmppc_pin_guest_page() and
kvmppc_unpin_guest_page(), and uses them to pin the guest pages where
the guest has registered areas of memory for the hypervisor to update,
(i.e. the per-cpu virtual processor areas, SLB shadow buffers and
dispatch trace logs) and then unpin them when they are no longer
required.

Although it is not strictly necessary to pin the pages at this point,
since all guest pages are already pinned, later commits in this series
will mean that guest pages aren't all pinned.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:36 +02:00
Paul Mackerras
b2b2f16508 KVM: PPC: Keep page physical addresses in per-slot arrays
This allocates an array for each memory slot that is added to store
the physical addresses of the pages in the slot.  This array is
vmalloc'd and accessed in kvmppc_h_enter using real_vmalloc_addr().
This allows us to remove the ram_pginfo field from the kvm_arch
struct, and removes the 64GB guest RAM limit that we had.

We use the low-order bits of the array entries to store a flag
indicating that we have done get_page on the corresponding page,
and therefore need to call put_page when we are finished with the
page.  Currently this is set for all pages except those in our
special RMO regions.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:35 +02:00
Paul Mackerras
8936dda4c2 KVM: PPC: Keep a record of HV guest view of hashed page table entries
This adds an array that parallels the guest hashed page table (HPT),
that is, it has one entry per HPTE, used to store the guest's view
of the second doubleword of the corresponding HPTE.  The first
doubleword in the HPTE is the same as the guest's idea of it, so we
don't need to store a copy, but the second doubleword in the HPTE has
the real page number rather than the guest's logical page number.
This allows us to remove the back_translate() and reverse_xlate()
functions.

This "reverse mapping" array is vmalloc'd, meaning that to access it
in real mode we have to walk the kernel's page tables explicitly.
That is done by the new real_vmalloc_addr() function.  (In fact this
returns an address in the linear mapping, so the result is usable
both in real mode and in virtual mode.)

There are also some minor cleanups here: moving the definitions of
HPT_ORDER etc. to a header file and defining HPT_NPTE for HPT_NPTEG << 3.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:35 +02:00
Paul Mackerras
4e72dbe135 KVM: PPC: Make wakeups work again for Book3S HV guests
When commit f43fdc15fa ("KVM: PPC: booke: Improve timer register
emulation") factored out some code in arch/powerpc/kvm/powerpc.c
into a new helper function, kvm_vcpu_kick(), an error crept in
which causes Book3s HV guest vcpus to stall.  This fixes it.
On POWER7 machines, guest vcpus are grouped together into virtual
CPU cores that share a single waitqueue, so it's important to use
vcpu->arch.wqp rather than &vcpu->wq.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:34 +02:00
Liu Yu-B13201
befdc0a65a KVM: PPC: Avoid patching paravirt template code
Currently we patch the whole code include paravirt template code.
This isn't safe for scratch area and has impact to performance.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:34 +02:00
Scott Wood
570135243a KVM: PPC: e500: use hardware hint when loading TLB0 entries
The hardware maintains a per-set next victim hint.  Using this
reduces conflicts, especially on e500v2 where a single guest
TLB entry is mapped to two shadow TLB entries (user and kernel).
We want those two entries to go to different TLB ways.

sesel is now only used for TLB1.

Reported-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:34 +02:00
Scott Wood
7b11dc9938 KVM: PPC: e500: Fix TLBnCFG in KVM_CONFIG_TLB
The associativity, not just total size, can differ from the host
hardware.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:32 +02:00
Alexander Graf
e371f713db KVM: PPC: Book3S: PR: Fix signal check race
As Scott put it:

> If we get a signal after the check, we want to be sure that we don't
> receive the reschedule IPI until after we're in the guest, so that it
> will cause another signal check.

we need to have interrupts disabled from the point we do signal_check()
all the way until we actually enter the guest.

This patch fixes potential signal loss races.

Reported-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:30 +02:00
Alexander Graf
ae21216bec KVM: PPC: align vcpu_kick with x86
Our vcpu kick implementation differs a bit from x86 which resulted in us not
disabling preemption during the kick. Get it a bit closer to what x86 does.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:30 +02:00
Alexander Graf
468a12c2b5 KVM: PPC: Use get/set for to_svcpu to help preemption
When running the 64-bit Book3s PR code without CONFIG_PREEMPT_NONE, we were
doing a few things wrong, most notably access to PACA fields without making
sure that the pointers stay stable accross the access (preempt_disable()).

This patch moves to_svcpu towards a get/put model which allows us to disable
preemption while accessing the shadow vcpu fields in the PACA. That way we
can run preemptible and everyone's happy!

Reported-by: Jörg Sommer <joerg@alea.gnuu.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:30 +02:00
Alexander Graf
d33ad328c0 KVM: PPC: Book3s: PR: No irq_disable in vcpu_run
Somewhere during merges we ended up from

  local_irq_enable()
  foo();
  local_irq_disable()

to always keeping irqs enabled during that part. However, we now
have the following code:

  foo();
  local_irq_disable()

which disables interrupts without the surrounding code enabling them
again! So let's remove that disable and be happy.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:28 +02:00
Alexander Graf
7d82714d4d KVM: PPC: Book3s: PR: Disable preemption in vcpu_run
When entering the guest, we want to make sure we're not getting preempted
away, so let's disable preemption on entry, but enable it again while handling
guest exits.

Reported-by: Jörg Sommer <joerg@alea.gnuu.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:27 +02:00
Scott Wood
dfd4d47e9a KVM: PPC: booke: Improve timer register emulation
Decrementers are now properly driven by TCR/TSR, and the guest
has full read/write access to these registers.

The decrementer keeps ticking (and setting the TSR bit) regardless of
whether the interrupts are enabled with TCR.

The decrementer stops at zero, rather than going negative.

Decrementers (and FITs, once implemented) are delivered as
level-triggered interrupts -- dequeued when the TSR bit is cleared, not
on delivery.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
[scottwood@freescale.com: significant changes]
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:27 +02:00
Scott Wood
b59049720d KVM: PPC: Paravirtualize SPRG4-7, ESR, PIR, MASn
This allows additional registers to be accessed by the guest
in PR-mode KVM without trapping.

SPRG4-7 are readable from userspace.  On booke, KVM will sync
these registers when it enters the guest, so that accesses from
guest userspace will work.  The guest kernel, OTOH, must consistently
use either the real registers or the shared area between exits.  This
also applies to the already-paravirted SPRG3.

On non-booke, it's not clear to what extent SPRG4-7 are supported
(they're not architected for book3s, but exist on at least some classic
chips).  They are copied in the get/set regs ioctls, but I do not see any
non-booke emulation.  I also do not see any syncing with real registers
(in PR-mode) including the user-readable SPRG3.  This patch should not
make that situation any worse.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:26 +02:00
Scott Wood
940b45ec18 KVM: PPC: booke: Paravirtualize wrtee
Also fix wrteei 1 paravirt to check for a pending interrupt.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:26 +02:00
Scott Wood
29ac26efbd KVM: PPC: booke: Fix int_pending calculation for MSR[EE] paravirt
int_pending was only being lowered if a bit in pending_exceptions
was cleared during exception delivery -- but for interrupts, we clear
it during IACK/TSR emulation.  This caused paravirt for enabling
MSR[EE] to be ineffective.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:26 +02:00
Scott Wood
c59a6a3e4e KVM: PPC: booke: Check for MSR[WE] in prepare_to_enter
This prevents us from inappropriately blocking in a KVM_SET_REGS
ioctl -- the MSR[WE] will take effect when the guest is next entered.

It also causes SRR1[WE] to be set when we enter the guest's interrupt
handler, which is what e500 hardware is documented to do.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:26 +02:00
Scott Wood
25051b5a5a KVM: PPC: Move prepare_to_enter call site into subarch code
This function should be called with interrupts disabled, to avoid
a race where an exception is delivered after we check, but the
resched kick is received before we disable interrupts (and thus doesn't
actually trigger the exit code that would recheck exceptions).

booke already does this properly in the lightweight exit case, but
not on initial entry.

For now, move the call of prepare_to_enter into subarch-specific code so
that booke can do the right thing here.  Ideally book3s would do the same
thing, but I'm having a hard time seeing where it does any interrupt
disabling of this sort (plus it has several additional call sites), so
I'm deferring the book3s fix to someone more familiar with that code.
book3s behavior should be unchanged by this patch.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:26 +02:00
Scott Wood
7e28e60ef9 KVM: PPC: Rename deliver_interrupts to prepare_to_enter
This function also updates paravirt int_pending, so rename it
to be more obvious that this is a collection of checks run prior
to (re)entering a guest.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:25 +02:00
Scott Wood
1d1ef22208 KVM: PPC: booke: check for signals in kvmppc_vcpu_run
Currently we check prior to returning from a lightweight exit,
but not prior to initial entry.

book3s already does a similar test.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:25 +02:00
Bharat Bhushan
7401f6266d KVM: PPC: booke: Do Not start decrementer when SPRN_DEC set 0
As per specification the decrementer interrupt not happen when DEC is written
with 0. Also when DEC is zero, no decrementer running. So we should not start
hrtimer for decrementer when DEC = 0.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:25 +02:00
Bharat Bhushan
dc2babfea5 KVM: PPC: Fix DEC truncation for greater than 0xffff_ffff/1000
kvmppc_emulate_dec() uses dec_nsec of type unsigned long and does below calculation:

        dec_nsec = vcpu->arch.dec;
        dec_nsec *= 1000;
This will truncate if DEC value "vcpu->arch.dec" is greater than 0xffff_ffff/1000.
For example : For tb_ticks_per_usec = 4a, we can not set decrementer more than ~58ms.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Acked-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:25 +02:00
Bharat Bhushan
f9208427f7 PPC: Fix race in mtmsr paravirt implementation
The current implementation of mtmsr and mtmsrd are racy in that it does:

  * check (int_pending == 0)
  ---> host sets int_pending = 1 <---
  * write shared page
  * done

while instead we should check for int_pending after the shared page is written.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:24 +02:00
Alexander Graf
95325e6b19 KVM: PPC: E500: Support hugetlbfs
With hugetlbfs support emerging on e500, we should also support KVM
backing its guest memory by it.

This patch adds support for hugetlbfs into the e500 shadow mmu code.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:24 +02:00
Scott Wood
841741f23b KVM: PPC: e500: Don't hardcode PIR=0
The hardcoded behavior prevents proper SMP support.

user space shall specify the vcpu's PIR as the vcpu id.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:24 +02:00
Scott Wood
303b7c97e3 KVM: PPC: e500: tlbsx: fix tlb0 esel
It should contain the way, not the absolute TLB0 index.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:24 +02:00
Scott Wood
dc83b8bc02 KVM: PPC: e500: MMU API
This implements a shared-memory API for giving host userspace access to
the guest's TLB.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:24 +02:00
Scott Wood
0164c0f0c4 KVM: PPC: e500: clear up confusion between host and guest entries
Split out the portions of tlbe_priv that should be associated with host
entries into tlbe_ref.  Base victim selection on the number of hardware
entries, not guest entries.

For TLB1, where one guest entry can be mapped by multiple host entries,
we use the host tlbe_ref for tracking page references.  For the guest
TLB0 entries, we still track it with gtlb_priv, to avoid having to
retranslate if the entry is evicted from the host TLB but not the
guest TLB.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:23 +02:00
Scott Wood
90b92a6f51 KVM: PPC: e500: Eliminate preempt_disable in local_sid_destroy_all
The only place it makes sense to call this function already needs
to have preemption disabled.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:23 +02:00
Scott Wood
3bf3cdcc14 KVM: PPC: e500: don't translate gfn to pfn with preemption disabled
Delay allocation of the shadow pid until we're ready to disable
preemption and write the entry.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:23 +02:00
Christian Borntraeger
b9e5dc8d45 KVM: provide synchronous registers in kvm_run
On some cpus the overhead for virtualization instructions is in the same
range as a system call. Having to call multiple ioctls to get set registers
will make certain userspace handled exits more expensive than necessary.
Lets provide a section in kvm_run that works as a shared save area
for guest registers.
We also provide two 64bit flags fields (architecture specific), that will
specify
1. which parts of these fields are valid.
2. which registers were modified by userspace

Each bit for these flag fields will define a group of registers (like
general purpose) or a single register.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:22 +02:00
Carsten Otte
5b1c1493af KVM: s390: ucontrol: export SIE control block to user
This patch exports the s390 SIE hardware control block to userspace
via the mapping of the vcpu file descriptor. In order to do so,
a new arch callback named kvm_arch_vcpu_fault  is introduced for all
architectures. It allows to map architecture specific pages.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:19 +02:00
Carsten Otte
e08b963716 KVM: s390: add parameter for KVM_CREATE_VM
This patch introduces a new config option for user controlled kernel
virtual machines. It introduces a parameter to KVM_CREATE_VM that
allows to set bits that alter the capabilities of the newly created
virtual machine.
The parameter is passed to kvm_arch_init_vm for all architectures.
The only valid modifier bit for now is KVM_VM_S390_UCONTROL.
This requires CAP_SYS_ADMIN privileges and creates a user controlled
virtual machine on s390 architectures.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:18 +02:00
Ingo Molnar
737f24bda7 Merge branch 'perf/urgent' into perf/core
Conflicts:
	tools/perf/builtin-record.c
	tools/perf/builtin-top.c
	tools/perf/perf.h
	tools/perf/util/top.h

Merge reason: resolve these cherry-picking conflicts.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-05 09:20:08 +01:00
Thomas Gleixner
ba74c1448f sched/rt: Document scheduler related skip-resched-check sites
Create a distinction between scheduler related preempt_enable_no_resched()
calls and the nearly one hundred other places in the kernel that do not
want to reschedule, for one reason or another.

This distinction matters for -rt, where the scheduler and the non-scheduler
preempt models (and checks) are different. For upstream it's purely
documentational.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/n/tip-gs88fvx2mdv5psnzxnv575ke@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-01 10:28:04 +01:00
Thomas Gleixner
bd2f55361f sched/rt: Use schedule_preempt_disabled()
Coccinelle based conversion.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-24swm5zut3h9c4a6s46x8rws@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-01 10:28:03 +01:00
Paul Gortmaker
50af5ead3b bug.h: add include of it to various implicit C users
With bug.h currently living right in linux/kernel.h there
are files that use BUG_ON and friends but are not including
the header explicitly.  Fix them up so we can remove the
presence in kernel.h file.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-02-29 17:15:08 -05:00
Yinghai Lu
210647af89 PCI: Rename pci_remove_bus_device to pci_stop_and_remove_bus_device
The old pci_remove_bus_device actually did stop and remove.

Make the name reflect that to reduce confusion.

This patch is done by sed scripts and changes back some incorrect
__pci_remove_bus_device changes.

Suggested-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2012-02-27 12:12:18 -08:00
David S. Miller
ff4783ce78 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/sfc/rx.c

Overlapping changes in drivers/net/ethernet/sfc/rx.c, one to change
the rx_buf->is_page boolean into a set of u16 flags, and another to
adjust how ->ip_summed is initialized.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-26 21:55:51 -05:00
Danny Kukawka
0a167e0a5c arch/powerpc/platforms/powernv/setup.c: included asm/xics.h twice
arch/powerpc/platforms/powernv/setup.c: included 'asm/xics.h' twice,
remove the duplicate.

Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-27 11:33:59 +11:00
Danny Kukawka
ed7e3d1ca7 arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice
arch/powerpc/kvm/book3s_hv.c: included 'linux/sched.h' twice,
remove the duplicate.

Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-27 11:33:58 +11:00
Stephen Rothwell
3d066d77cf powerpc: remove CONFIG_PPC_ISERIES from the architecture Kconfig files
After this, we can remove the legacy iSeries code more easily.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-27 11:33:58 +11:00
Benjamin Herrenschmidt
fe83364f0b powerpc/mpic: Fix allocation of reverse-map for multi-ISU mpics
When using a multi-ISU MPIC, we can interrupts up to
isu_size * MPIC_MAX_ISU, not just isu_size, so allocate
the right size reverse map.

Without this, the code will constantly fallback to
a linear search.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-27 11:33:58 +11:00
Benjamin Herrenschmidt
f851013cb2 Merge remote-tracking branch 'origin/master' into next 2012-02-27 10:50:11 +11:00
Ben Greear
3bdc0eba0b net: Add framework to allow sending packets with customized CRC.
This is useful for testing RX handling of frames with bad
CRCs.

Requires driver support to actually put the packet on the
wire properly.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-02-24 01:37:35 -08:00
Ingo Molnar
c5905afb0e static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]()
So here's a boot tested patch on top of Jason's series that does
all the cleanups I talked about and turns jump labels into a
more intuitive to use facility. It should also address the
various misconceptions and confusions that surround jump labels.

Typical usage scenarios:

        #include <linux/static_key.h>

        struct static_key key = STATIC_KEY_INIT_TRUE;

        if (static_key_false(&key))
                do unlikely code
        else
                do likely code

Or:

        if (static_key_true(&key))
                do likely code
        else
                do unlikely code

The static key is modified via:

        static_key_slow_inc(&key);
        ...
        static_key_slow_dec(&key);

The 'slow' prefix makes it abundantly clear that this is an
expensive operation.

I've updated all in-kernel code to use this everywhere. Note
that I (intentionally) have not pushed through the rename
blindly through to the lowest levels: the actual jump-label
patching arch facility should be named like that, so we want to
decouple jump labels from the static-key facility a bit.

On non-jump-label enabled architectures static keys default to
likely()/unlikely() branches.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jason Baron <jbaron@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: a.p.zijlstra@chello.nl
Cc: mathieu.desnoyers@efficios.com
Cc: davem@davemloft.net
Cc: ddaney.cavm@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-24 10:05:59 +01:00
Bjorn Helgaas
fb127cb9de PCI: collapse pcibios_resource_to_bus
Everybody uses the generic pcibios_resource_to_bus() supplied by the core
now, so remove the ARCH_HAS_GENERIC_PCI_OFFSETS used during conversion.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-02-23 20:19:04 -07:00
Bjorn Helgaas
6c5705fec6 powerpc/PCI: get rid of device resource fixups
Tell the PCI core about host bridge address translation so it can take
care of bus-to-resource conversion for us.

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-02-23 20:19:03 -07:00
Bjorn Helgaas
673c975624 powerpc/PCI: replace pci_probe_only with pci_flags
We already use pci_flags, so this just sets pci_flags directly and removes
the intermediate step of figuring out pci_probe_only, then using it to set
pci_flags.

The PCI core provides a pci_flags definition (currently __weak), so drop
the powerpc definitions in favor of that.

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-02-23 20:18:58 -07:00
Bjorn Helgaas
3c13be017a powerpc/PCI: make pci_probe_only default to 0
pci_probe_only is set on ppc64 to prevent resource re-allocation
by the core. It's meant to be used in very specific circumstances
such as when operating under a hypervisor that may prevent such
re-allocation.

Instead of default to 1, we make it default to 0 and explicitly
set it in the few cases where we need it.

This fixes FSL PCI which wants it clear among others.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-02-23 20:18:58 -07:00
Paul Gortmaker
6d166fec12 ppc-6xx: fix build failure in flipper-pic.c and hlwd-pic.c
The commit bae1d8f199 (linux-next)

  "irq_domain/powerpc: Use common irq_domain structure instead of irq_host"

made this change:

   -static struct irq_host *flipper_irq_host;
   +static struct irq_domain *flipper_irq_host;

and this change:

   -static struct irq_host *hlwd_irq_host;
   +static struct irq_domain *hlwd_irq_host;

The intent was to change the type, and not the name, but then in a
couple of instances, it looks like the sed to change the irq_domain_ops
name inadvertently also changed the irq_host name where it was not
supposed to, causing build failures.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2012-02-22 18:41:19 -07:00
Michael Ellerman
f2699491e0 powerpc/perf: Move perf core & PMU code into a subdirectory
The perf code has grown a lot since it started, and is big enough to
warrant its own subdirectory. For reference it's ~60% bigger than the
oprofile code. It declutters the kernel directory, makes it simpler to
grep for "just perf stuff", and allows us to shorten some filenames.

While we're at it, make it more obvious that we have two implementations
of the core perf logic. One for (roughly) Book3S CPUs, which was the
original implementation, and the other for Freescale embedded CPUs.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:04 +11:00
Mahesh Salgaonkar
12d9299241 fadump: Remove the phyp assisted dump code.
Remove the phyp assisted dump implementation which is not is use.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:03 +11:00
Mahesh Salgaonkar
67b43b9d7c fadump: Invalidate the fadump registration during machine shutdown.
If dump is active during system reboot, shutdown or halt then invalidate
the fadump registration as it does not get invalidated automatically.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:03 +11:00
Mahesh Salgaonkar
b500afff11 fadump: Invalidate registration and release reserved memory for general use.
This patch introduces an sysfs interface '/sys/kernel/fadump_release_mem' to
invalidate the last fadump registration, invalidate '/proc/vmcore', release
the reserved memory for general use and re-register for future kernel dump.
Once the dump is copied to the disk, unlike phyp dump, the userspace tool
can release all the memory reserved for dump with one single operation of
echo 1 to '/sys/kernel/fadump_release_mem'.

Release the reserved memory region excluding the size of the memory required
for future kernel dump registration. And therefore, unlike kdump, Fadump
doesn't need a 2nd reboot to get back the system to the production
configuration.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:02 +11:00
Mahesh Salgaonkar
d34c5f26cf fadump: Add PT_NOTE program header for vmcoreinfo
Introduce a PT_NOTE program header that points to physical address of
vmcoreinfo_note buffer declared in kernel/kexec.c. The vmcoreinfo
note buffer is populated during crash_fadump() at the time of system
crash.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:02 +11:00
Mahesh Salgaonkar
ebaeb5ae24 fadump: Convert firmware-assisted cpu state dump data into elf notes.
When registered for firmware assisted dump on powerpc, firmware preserves
the registers for the active CPUs during a system crash. This patch reads
the cpu register data stored in Firmware-assisted dump format (except for
crashing cpu) and converts it into elf notes and updates the PT_NOTE program
header accordingly. The exact register state for crashing cpu is saved to
fadump crash info structure in scratch area during crash_fadump() and read
during second kernel boot.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:01 +11:00
Mahesh Salgaonkar
2df173d9e8 fadump: Initialize elfcore header and add PT_LOAD program headers.
Build the crash memory range list by traversing through system memory during
the first kernel before we register for firmware-assisted dump. After the
successful dump registration, initialize the elfcore header and populate
PT_LOAD program headers with crash memory ranges. The elfcore header is
saved in the scratch area within the reserved memory. The scratch area starts
at the end of the memory reserved for saving RMR region contents. The
scratch area contains fadump crash info structure that contains magic number
for fadump validation and physical address where the eflcore header can be
found. This structure will also be used to pass some important crash info
data to the second kernel which will help second kernel to populate ELF core
header with correct data before it gets exported through /proc/vmcore. Since
the firmware preserves the entire partition memory at the time of crash the
contents of the scratch area will be preserved till second kernel boot.

Since the memory dump exported through /proc/vmcore is in ELF format similar
to kdump, it will help us to reuse the kdump infrastructure for dump capture
and filtering. Unlike phyp dump, userspace tool does not need to refer any
sysfs interface while reading /proc/vmcore.

NOTE: The current design implementation does not address a possibility of
introducing additional fields (in future) to this structure without affecting
compatibility. It's on TODO list to come up with better approach to
address this.

Reserved dump area start => +-------------------------------------+
                            |  CPU state dump data                |
                            +-------------------------------------+
                            |  HPTE region data                   |
                            +-------------------------------------+
                            |  RMR region data                    |
Scratch area start       => +-------------------------------------+
                            |  fadump crash info structure {      |
                            |     magic nummber                   |
                     +------|---- elfcorehdr_addr                 |
                     |      |  }                                  |
                     +----> +-------------------------------------+
                            |  ELF core header                    |
Reserved dump area end   => +-------------------------------------+

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:01 +11:00
Mahesh Salgaonkar
3ccc00a7e0 fadump: Register for firmware assisted dump.
On 2012-02-20 11:02:51 Mon, Paul Mackerras wrote:
> On Thu, Feb 16, 2012 at 04:44:30PM +0530, Mahesh J Salgaonkar wrote:
>
> If I have read the code correctly, we are going to get this printk on
> non-pSeries machines or on older pSeries machines, even if the user
> has not put the fadump=on option on the kernel command line.  The
> printk will be annoying since there is no actual error condition.  It
> seems to me that the condition for the printk should include
> fw_dump.fadump_enabled.  In other words you should probably add
>
> 	if (!fw_dump.fadump_enabled)
> 		return 0;
>
> at the beginning of the function.

Hi Paul,

Thanks for pointing it out. Please find the updated patch below.

The existing patches above this (4/10 through 10/10) cleanly applies
on this update.

Thanks,
-Mahesh.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:01 +11:00
Mahesh Salgaonkar
eb39c8803d fadump: Reserve the memory for firmware assisted dump.
Reserve the memory during early boot to preserve CPU state data, HPTE region
and RMA (real mode area) region data in case of kernel crash. At the time of
crash, powerpc firmware will store CPU state data, HPTE region data and move
RMA region data to the reserved memory area.

If the firmware-assisted dump fails to reserve the memory, then fallback
to existing kexec-based kdump.

Most of the code implementation to reserve memory has been
adapted from phyp assisted dump implementation written by Linas Vepstas
and Manish Ahuja

This patch also introduces a config option CONFIG_FA_DUMP for firmware
assisted dump feature on Powerpc (ppc64) architecture.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:01 +11:00
Kyle Moffett
e55d7f737d powerpc/mpic: Remove duplicate MPIC_WANTS_RESET flag
There are two separate flags controlling whether or not the MPIC is
reset during initialization, which is completely unnecessary, and only
one of them can be specified in the device tree.

Also, most platforms in-tree right now do actually want to reset the
MPIC during initialization anyways, which means lots of duplicate code
passing the MPIC_WANTS_RESET flag.

Fix all of the callers which currently do not pass the MPIC_WANTS_RESET
flag to pass the MPIC_NO_RESET flag, then remove the MPIC_WANTS_RESET
flag and make the code reset the MPIC by default.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:00 +11:00
Kyle Moffett
c1b8d45db4 powerpc/mpic: Add "last-interrupt-source" property to override hardware
The FreeScale PowerQUICC-III-compatible (mpc85xx/mpc86xx) MPICs do not
correctly report the number of hardware interrupt sources, so software
needs to override the detected value with "256".

To avoid needing to write custom board-specific code to detect that
scenario, allow it to be easily overridden in the device-tree.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:00 +11:00
Kyle Moffett
5019609fce powerpc/mpic: Remove MPIC_BROKEN_FRR_NIRQS and duplicate irq_count
The mpic->irq_count variable is only used as a software error-checking
limit to determine whether or not an IRQ number is valid.  In board code
which does not manually specify an IRQ count to mpic_alloc(), i.e. 0, it
is automatically detected from the number of ISUs and the ISU size.

In practice, all hardware ends up with irq_count == num_sources, so all
of the runtime checks on mpic->irq_count should just check the value of
mpic->num_sources instead.

When platform hardware does not correctly report the number of IRQs,
which only happens on the MPC85xx/MPC86xx, the MPIC_BROKEN_FRR_NIRQS
flag is used to override the detected value of num_sources with the
manual irq_count parameter.  Since there's no need to manually specify
the number of IRQs except in this case, the extra flag can be eliminated
and the test changed to "irq_count != 0".

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:49:59 +11:00
Kyle Moffett
9ca163c860 fsl/mpic: Create and document the "single-cpu-affinity" device-tree flag
The Freescale MPIC (and perhaps others in the future) is incapable of
routing non-IPI interrupts to more than once CPU at a time.  Currently
all of the Freescale boards msut pass the MPIC_SINGLE_DEST_CPU flag to
mpic_alloc(), but that information should really be present in the
device-tree.

Older board code can't rely on the device-tree having the property set,
but newer platforms won't need it manually specified in the code.

[BenH: Remove unrelated changes, folded in a different patch]

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:49:59 +11:00
Kyle Moffett
98cca250ae fsl/mpic: Document and use the "big-endian" device-tree flag
The MPIC code checks for a "big-endian" property and sets the flag
MPIC_BIG_ENDIAN if one is present, although prior to the "mpic->flags"
fixup that would never have worked anways.

Unfortunately, even now that it works properly, the Freescale mpic
device-node (the "PowerQUICC-III"-compatible one) does not specify it,
so all of the board ports need to manually pass it to mpic_alloc().

Document the flag and add it to the pq3 device tree.  Existing code will
still need to pass the MPIC_BIG_ENDIAN flag because their dtb may not
have this property, but new platforms shouldn't need to do so.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:49:58 +11:00
Kyle Moffett
3a7a7176e8 powerpc/mpic: Fix use of "flags" variable in mpic_alloc()
The mpic_alloc() function takes a "flags" parameter and assigns it into
the mpic->flags variable fairly early on, but several later pieces of
code detect various device-tree properties and save them into the
"mpic->flags" variable (EG: "big-endian" => MPIC_BIG_ENDIAN).

Unfortunately, a number of codepaths (including several which test the
flag MPIC_BIG_ENDIAN!) test "flags" instead of "mpic->flags", and get
wrong answers as a result.

Consolidate the device-tree flag tests early in mpic_alloc() and change
all of the checks after "mpic->flags" is init'ed to use "mpic->flags".

[BenH: Fixed up use of mpic->node before it's initialized]

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:49:58 +11:00
Benjamin Herrenschmidt
18b246fa60 powerpc: Fix various issues with return to userspace
We have a few problems when returning to userspace. This is a
quick set of fixes for 3.3, I'll look into a more comprehensive
rework for 3.4. This fixes:

 - We kept interrupts soft-disabled when schedule'ing or calling
do_signal when returning to userspace as a result of a hardware
interrupt.

 - Rename do_signal to do_notify_resume like all other archs (and
do_signal_pending back to do_signal, which it was before Roland
changed it).

 - Add the missing call to key_replace_session_keyring() to
do_notify_resume().

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
2012-02-22 16:48:53 +11:00
Michael Ellerman
922b9f86a0 powerpc: Fix program check handling when lockdep is enabled
In commit 54321242af ("Disable interrupts early in Program Check"), we
switched from enabling to disabling interrupts in program_check_common.

Whereas ENABLE_INTS leaves r3 untouched, if lockdep is enabled DISABLE_INTS
calls into lockdep code and will clobber r3. That means we pass a bogus
struct pt_regs* into program_check_exception() and all hell breaks loose.

So load our regs pointer into r3 after we call DISABLE_INTS.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-22 16:48:49 +11:00
Rusty Russell
07d2f1a54a powerpc: Remove references to cpu_*_map
This has been obsolescent for a while; time for the final push.

In adjacent context, replaced old cpus_* with cpumask_*.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-22 16:48:47 +11:00
Grant Likely
0f22dd395f of: Only compile OF_DYNAMIC on PowerPC pseries and iseries
Only two architectures use the OF node reference counting and reclaim bits.
There is no need to compile it for the rest of the PowerPC platforms or for
any of the other architectures.  This patch makes iseries and pseries
select CONFIG_OF_DYNAMIC, and makes it default to off for everything else.

It is still safe to turn on CONFIG_OF_DYNAMIC on all architectures, it just
isn't necessary.

v2: Also select OF_DYNAMIC for PPC_CHROMA and MPC885ADS as reported by Michael
    Meuling

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Jimi Xenidis <jimix@pobox.com> (for PPC_CHROMA bug fix)
Cc: Rob Herring <rob.herring@calxeda.com>
2012-02-21 13:33:00 -07:00
Pavel Emelyanov
ef64a54f6e sock: Introduce the SO_PEEK_OFF sock option
This one specifies where to start MSG_PEEK-ing queue data from. When
set to negative value means that MSG_PEEK works as ususally -- peeks
from the head of the queue always.

When some bytes are peeked from queue and the peeking offset is non
negative it is moved forward so that the next peek will return next
portion of data.

When non-peeking recvmsg occurs and the peeking offset is non negative
is is moved backward so that the next peek will still peek the proper
data (i.e. the one that would have been picked if there were no non
peeking recv in between).

The offset is set using per-proto opteration to let the protocol handle
the locking issues and to check whether the peeking offset feature is
supported by the protocol the socket belongs to.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21 15:03:48 -05:00
Justin P. Mattock
a80581d0d1 Typos: change aditional to additional.
The below patch fixes some typos "aditional" to "additional", and also fixes
a comment with another word mispelled.

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-02-21 11:40:36 +01:00
David Howells
1dce27c5aa Wrap accesses to the fd_sets in struct fdtable
Wrap accesses to the fd_sets in struct fdtable (for recording open files and
close-on-exec flags) so that we can move away from using fd_sets since we
abuse the fd_set structs by not allocating the full-sized structure under
normal circumstances and by non-core code looking at the internals of the
fd_sets.

The first abuse means that use of FD_ZERO() on these fd_sets is not permitted,
since that cannot be told about their abnormal lengths.

This introduces six wrapper functions for setting, clearing and testing
close-on-exec flags and fd-is-open flags:

	void __set_close_on_exec(int fd, struct fdtable *fdt);
	void __clear_close_on_exec(int fd, struct fdtable *fdt);
	bool close_on_exec(int fd, const struct fdtable *fdt);
	void __set_open_fd(int fd, struct fdtable *fdt);
	void __clear_open_fd(int fd, struct fdtable *fdt);
	bool fd_is_open(int fd, const struct fdtable *fdt);

Note that I've prepended '__' to the names of the set/clear functions because
they require the caller to hold a lock to use them.

Note also that I haven't added wrappers for looking behind the scenes at the
the array.  Possibly that should exist too.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20120216174942.23314.1364.stgit@warthog.procyon.org.uk
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
2012-02-19 10:30:52 -08:00
Grant Likely
ff8c3ab816 irq_domain/powerpc: Replace custom xlate functions with library functions
This patch converts a number of the powerpc drivers to use the common library
of irq_domain xlate functions, dropping a bunch of lines in the process.

v5: - Remove tsi108 changes from patch

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
2012-02-16 06:11:24 -07:00
Grant Likely
9f70b8eb3c irq_domain/powerpc: constify irq_domain_ops
Make all the irq_domain_ops structures in powerpc 'static const'

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
2012-02-16 06:11:24 -07:00
Grant Likely
a18dc81bf5 irq_domain: constify irq_domain_ops
Make irq_domain_ops pointer a constant to make it safer for multiple
instances to share the same ops pointer and change the irq_domain code
so that it does not modify the ops.

v4: Fix mismatched type reference in powerpc code

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
2012-02-16 06:11:24 -07:00
Grant Likely
1bc04f2cf8 irq_domain: Add support for base irq and hwirq in legacy mappings
Add support for a legacy mapping where irq = (hwirq - first_hwirq + first_irq)
so that a controller driver can allocate a fixed range of irq_descs and use
a simple calculation to translate back and forth between linux and hw irq
numbers.  This is needed to use an irq_domain with many of the ARM interrupt
controller drivers that manage their own irq_desc allocations.  Ultimately
the goal is to migrate those drivers to use the linear revmap, but doing it
this way allows each driver to be converted separately which makes the
migration path easier.

This patch generalizes the IRQ_DOMAIN_MAP_LEGACY method to use
(first_irq-first_hwirq) as the offset between hwirq and linux irq number,
and adds checks to make sure that the hwirq number does not exceed range
assigned to the controller.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
2012-02-16 06:11:23 -07:00
Grant Likely
a8db8cf0d8 irq_domain: Replace irq_alloc_host() with revmap-specific initializers
Each revmap type has different arguments for setting up the revmap.
This patch splits up the generator functions so that each revmap type
can do its own setup and the user doesn't need to keep track of how
each revmap type handles the arguments.

This patch also adds a host_data argument to the generators.  There are
cases where the host_data pointer will be needed before the function returns.
ie. the legacy map calls the .map callback for each irq before returning.

v2: - Add void *host_data argument to irq_domain_add_*() functions
    - fixed failure to compile
    - Moved IRQ_DOMAIN_MAP_* defines into irqdomain.c

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
2012-02-16 06:11:22 -07:00
Grant Likely
cc79ca691c irq_domain: Move irq_domain code from powerpc to kernel/irq
This patch only moves the code.  It doesn't make any changes, and the
code is still only compiled for powerpc.  Follow-on patches will generalize
the code for other architectures.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
2012-02-16 01:37:49 -07:00
Grant Likely
6d9285b00f irq_domain/powerpc: Eliminate virq_is_host()
There is only one user, and it is trivial to open-code.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
2012-02-16 01:36:46 -07:00
Anton Blanchard
9a45a9407c powerpc/perf: power_pmu_start restores incorrect values, breaking frequency events
perf on POWER stopped working after commit e050e3f0a7 (perf: Fix
broken interrupt rate throttling). That patch exposed a bug in
the POWER perf_events code.

Since the PMCs count upwards and take an exception when the top bit
is set, we want to write 0x80000000 - left in power_pmu_start. We were
instead programming in left which effectively disables the counter
until we eventually hit 0x80000000. This could take seconds or longer.

With the patch applied I get the expected number of samples:

          SAMPLE events:       9948

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: <stable@kernel.org>
2012-02-16 16:24:35 +11:00
Benjamin Herrenschmidt
54321242af powerpc: Disable interrupts early in Program Check
Program Check exceptions are the result of WARNs, BUGs, some
type of breakpoints, kprobe, and other illegal instructions.

We want interrupts (and thus preemption) to remain disabled
while doing the initial stage of testing the reason and
branching off to a debugger or kprobe, so we are still on
the original CPU which makes debugging easier in various cases.

This is how the code was intended, hence the local_irq_enable()
right in the middle of program_check_exception().

However, the assembly exception prologue for that exception was
incorrectly marked as enabling interrupts, which defeats that
(and records a redundant enable with lockdep).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-16 16:15:10 +11:00
Stephen Rothwell
a1a1d1bfc9 powerpc: Remove legacy iSeries from ppc64_defconfig
Since we are heading towards removing the Legacy iSeries platform, start
by no longer building it for ppc64_defconfig.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-16 16:15:08 +11:00
Benjamin Herrenschmidt
13635dfdc6 powerpc/fsl/pci: Fix PCIe fixup regression
Upstream changes to the way PHB resources are registered
broke the resource fixup for FSL boards.

We can no longer rely on the resource pointer array for the PHB's
pci_bus structure, so let's leave it alone and go straight for
the PHB resources instead. This also makes the code generally
more readable.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-16 16:15:03 +11:00
Ira Snyder
40c8cefaaf powerpc: Fix kernel log of oops/panic instruction dump
A kernel oops/panic prints an instruction dump showing several
instructions before and after the instruction which caused the
oops/panic.

The code intended that the faulting instruction be enclosed in angle
brackets, however a bug caused the faulting instruction to be
interpreted by printk() as the message log level.

To fix this, the KERN_CONT log level is added before the actual text of
the printed message.

=== Before the patch ===

[ 1081.587266] Instruction dump:
[ 1081.590236] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001
[ 1081.598034] 3d20c03a 9009a114 7c0004ac 39200000
[ 1081.602500]  4e800020 3803ffd0 2b800009

<4>[ 1081.587266] Instruction dump:
<4>[ 1081.590236] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001
<4>[ 1081.598034] 3d20c03a 9009a114 7c0004ac 39200000
<98090000>[ 1081.602500]  4e800020 3803ffd0 2b800009

=== After the patch ===

[   51.385216] Instruction dump:
[   51.388186] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001
[   51.395986] 3d20c03a 9009a114 7c0004ac 39200000 <98090000> 4e800020 3803ffd0 2b800009

<4>[   51.385216] Instruction dump:
<4>[   51.388186] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001
<4>[   51.395986] 3d20c03a 9009a114 7c0004ac 39200000 <98090000> 4e800020 3803ffd0 2b800009

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-16 16:11:23 +11:00
Grant Likely
4bbdd45afd irq_domain/powerpc: eliminate irq_map; use irq_alloc_desc() instead
This patch drops the powerpc-specific irq_map table and replaces it with
directly using the irq_alloc_desc()/irq_free_desc() interfaces for allocating
and freeing irq_desc structures.

This patch is a preparation step for generalizing the powerpc-specific virq
infrastructure to become irq_domains.

As part of this change, the irq_big_lock is changed to a mutex from a raw
spinlock.  There is no longer any need to use a spin lock since the irq_desc
allocation code is now responsible for the critical section of finding
an unused range of irq numbers.

The radix lookup table is also changed to store the irq_data pointer instead
of the irq_map entry since the irq_map is removed.  This should end up being
functionally equivalent since only allocated irq_descs are ever added to the
radix tree.

v5: - Really don't ever allocate virq 0.  The previous version could still
      do it if hint == 0
    - Respect irq_virq_count setting for NOMAP.  Some NOMAP domains cannot
      use virq values above irq_virq_count.
    - Use numa_node_id() when allocating irq_descs.  Ideally the API should
      obtain that value from the caller, but that touches a lot of call sites
      so will be deferred to a follow-on patch.
    - Fix irq_find_mapping() to include irq numbers lower than
      NUM_ISA_INTERRUPTS.  With the switch to irq_alloc_desc*(), the lowest
      possible allocated irq is now returned by arch_probe_nr_irqs().
v4: - Fix incorrect access to irq_data structure in debugfs code
    - Don't ever allocate virq 0

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
2012-02-14 14:06:51 -07:00
Grant Likely
bae1d8f199 irq_domain/powerpc: Use common irq_domain structure instead of irq_host
This patch drops the powerpc-specific irq_host structures and uses the common
irq_domain strucutres defined in linux/irqdomain.h.  It also fixes all
the users to use the new structure names.

Renaming irq_host to irq_domain has been discussed for a long time, and this
patch is a step in the process of generalizing the powerpc virq code to be
usable by all architecture.

An astute reader will notice that this patch actually removes the irq_host
structure instead of renaming it.  This is because the irq_domain structure
already exists in include/linux/irqdomain.h and has the needed data members.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
2012-02-14 14:06:50 -07:00
H. Peter Anvin
fae89ee8d7 powerpc: Use generic posix_types.h
Change the powerpc architecture to use <asm-generic/posix_types.h>.

[ v2: fix the definition for __kernel_ssize_t ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1328677745-20121-16-git-send-email-hpa@zytor.com
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
2012-02-14 12:01:29 -08:00
Thadeu Lima de Souza Cascardo
778a785f02 powerpc/pseries/eeh: Fix crash when error happens during device probe
EEH may happen during a PCI driver probe. If the driver is trying to
access some register in a loop, the EEH code will try to print the
driver name. But the driver pointer in struct pci_dev is not set until
probe returns successfully.

Use a function to test if the device and the driver pointer is NULL
before accessing the driver's name.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-14 15:01:39 +11:00
Brian King
444080d13d powerpc/pseries: Fix partition migration hang in stop_topology_update
This fixes a hang that was observed during live partition migration.
Since stop_topology_update must not be called from an interrupt
context, call it earlier in the migration process. The hang observed
can be seen below:

WARNING: at kernel/timer.c:1011
Modules linked in: ip6t_LOG xt_tcpudp xt_pkttype ipt_LOG xt_limit ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6table_filter ip6_tables x_tables ipv6 fuse loop ibmveth sg ext3 jbd mbcache raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid10 raid1 raid0 scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc dm_round_robin dm_multipath scsi_dh sd_mod crc_t10dif ibmvfc scsi_transport_fc scsi_tgt scsi_mod dm_snapshot dm_mod
NIP: c0000000000c52d8 LR: c00000000004be28 CTR: 0000000000000000
REGS: c00000005ffd77d0 TRAP: 0700   Not tainted  (3.2.0-git-00001-g07d106d)
MSR: 8000000000021032 <ME,CE,IR,DR>  CR: 48000084  XER: 00000001
CFAR: c00000000004be20
TASK = c00000005ec78860[0] 'swapper/3' THREAD: c00000005ec98000 CPU: 3
GPR00: 0000000000000001 c00000005ffd7a50 c000000000fbbc98 c000000000ec8340
GPR04: 00000000282a0020 0000000000000000 0000000000004000 0000000000000101
GPR08: 0000000000000012 c00000005ffd4000 0000000000000020 c000000000f3ba88
GPR12: 0000000000000000 c000000007f40900 0000000000000001 0000000000000004
GPR16: 0000000000000001 0000000000000000 0000000000000000 c000000001022310
GPR20: 0000000000000001 0000000000000000 0000000000200200 c000000001029e14
GPR24: 0000000000000000 0000000000000001 0000000000000040 c00000003f74bc80
GPR28: c00000003f74bc84 c000000000f38038 c000000000f16b58 c000000000ec8340
NIP [c0000000000c52d8] .del_timer_sync+0x28/0x60
LR [c00000000004be28] .stop_topology_update+0x20/0x38
Call Trace:
[c00000005ffd7a50] [c00000005ec78860] 0xc00000005ec78860 (unreliable)
[c00000005ffd7ad0] [c00000000004be28] .stop_topology_update+0x20/0x38
[c00000005ffd7b40] [c000000000028378] .__rtas_suspend_last_cpu+0x58/0x260
[c00000005ffd7bf0] [c0000000000fa230] .generic_smp_call_function_interrupt+0x160/0x358
[c00000005ffd7cf0] [c000000000036ec8] .smp_ipi_demux+0x88/0x100
[c00000005ffd7d80] [c00000000005c154] .icp_hv_ipi_action+0x5c/0x80
[c00000005ffd7e00] [c00000000012a088] .handle_irq_event_percpu+0x100/0x318
[c00000005ffd7f00] [c00000000012e774] .handle_percpu_irq+0x84/0xd0
[c00000005ffd7f90] [c000000000022ba8] .call_handle_irq+0x1c/0x2c
[c00000005ec9ba20] [c00000000001157c] .do_IRQ+0x22c/0x2a8
[c00000005ec9bae0] [c0000000000054bc] hardware_interrupt_entry+0x18/0x1c
Exception: 501 at .cpu_idle+0x194/0x2f8
    LR = .cpu_idle+0x194/0x2f8
[c00000005ec9bdd0] [c000000000017e58] .cpu_idle+0x188/0x2f8 (unreliable)
[c00000005ec9be90] [c00000000067ec18] .start_secondary+0x3e4/0x524
[c00000005ec9bf90] [c0000000000093e8] .start_secondary_prolog+0x10/0x14
Instruction dump:
ebe1fff8 4e800020 fbe1fff8 7c0802a6 f8010010 7c7f1b78 f821ff81 78290464
80090014 5400019e 7c0000d0 78000fe0 <0b000000> 4800000c 7c210b78 7c421378

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-14 15:01:39 +11:00
Michael Ellerman
f1c853b53c powerpc/powernv: Disable interrupts while taking phb->lock
We need to disable interrupts when taking the phb->lock. Otherwise
we could deadlock with pci_lock taken from an interrupt.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-14 15:01:39 +11:00
Benjamin Herrenschmidt
6fe5f5f3ff powerpc: Fix WARN_ON in decrementer_check_overflow
We use __get_cpu_var() which triggers a false positive warning
in smp_processor_id() thinking interrupts are enabled (at this
point, they are soft-enabled but hard-disabled).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-14 15:01:38 +11:00
Benjamin Herrenschmidt
7a768d30ca powerpc/wsp: Fix IRQ affinity setting
We call the cache_hwirq_map() function with a linux IRQ number
but it expects a HW irq number. This triggers a BUG on multic-chip
setups in addition to not doing the right thing.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-14 15:01:38 +11:00
Srikar Dronamraju
e62894273c powerpc: Implement GET_IP/SET_IP
With this change, helpers such as instruction_pointer() et al, get defined
in the generic header in terms of GET_IP

Removed the unnecessary definition of profile_pc in !CONFIG_SMP case as
suggested by Mike Frysinger.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-14 15:01:38 +11:00
Benjamin Herrenschmidt
454c0bfd0c powerpc/wsp: Permanently enable PCI class code workaround
It appears that on the Chroma card, the class code of the root
complex is still wrong even on DD2 or later chips. This could
be a firmware issue, but that breaks resource allocation so let's
unconditionally fix it up.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-14 15:01:38 +11:00
Grant Likely
07d57a32fb drivercore: Output common devicetree information in uevent
When userspace needs to find a specific device, it currently isn't easy to
resolve a /sys/devices/ path from a specific device tree node.  Nor is it
easy to obtain the compatible list for devices.

This patch generalizes the code that inserts OF_* values into the uevent
device attribute so that any device that is attached to an OF node will
have that information exported to userspace.  Without this patch only
platform devices and some powerpc-specific busses have access to this
data.

The original function also creates a MODALIAS property for the compatible
list, but that code has not been generalized into the common case because
it has the potential to break module loading on a lot of bus types.  Bus
types are still responsible for their own MODALIAS properties.

Boot tested on ARM and compile tested on PowerPC and SPARC.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tobias Klauser <tklauser@distanz.ch>
Cc: Frederic Lambert <frdrc66@gmail.com>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Mark Brown <broonie@sirena.org.uk>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-01 14:26:30 -07:00
Ingo Molnar
44a6839711 Merge branch 'perf/fast' into perf/core
Merge reason: Lets ready it for v3.4

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-27 12:08:09 +01:00
Rob Herring
2ed86b16ea irq: make SPARSE_IRQ an optionally hidden option
On ARM, we don't want SPARSE_IRQ to be a user visible option. Make
SPARSE_IRQ visible based on MAY_HAVE_SPARSE_IRQ instead of depending
on HAVE_SPARSE_IRQ.

With this, SPARSE_IRQ is not visible on C6X and ARM.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-sh@vger.kernel.org
2012-01-25 20:37:42 -06:00
Benjamin Herrenschmidt
3493c85366 powerpc: Fix build on some non-freescale platforms
Commit 9deaa53ac7 broke build
on platforms that use legacy_serial.c without also having
CONFIG_SERIAL_8250_FSL enabled due to an unconditional code
to a routine in that module.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-01-25 13:33:22 +11:00
Benjamin Herrenschmidt
f7ea82beb2 powerpc/powernv: Fix PCI resource handling
Recent changes to the handling of PCI resources for host bridges
are breaking the PowerNV code for assigning resources on IODA.

The root of the problem is that the pci_bus attached to a host
bridge no longer has its "legacy" resource pointers populated
but only uses the newer list instead.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-01-25 13:32:00 +11:00
Christian Kujau
897e01a08c powerpc/crash: Fix build error without SMP
I could not find cpus_in_crash anywhere in the sourcetree, except for
arch/powerpc/kernel/crash.c. Moving the definition into the CONFIG_SMP
fixes it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-01-25 09:47:45 +11:00
Deepthi Dharwar
f7aa554510 powerpc/cpuidle: Make it a bool, not a tristate
As pointed out, asm/system.h has empty inline implementations for
update_smt_snooze_delay and pseries_notify_cpuidle_add_cpu, which are
used when CONFIG_PSERIES_IDLE is undefined. Since those two functions
are used in core power architecture functions (store_smt_snooze_delay
at kernel/sysfs.c and smp_xics_setup_cpu at platforms/pseries/smp.c),

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-01-25 09:43:06 +11:00
Benjamin Herrenschmidt
407a362f94 Merge remote-tracking branch 'kumar/merge' into merge 2012-01-25 09:40:34 +11:00
Ramneek Mehresh
621c4b999e powerpc/85xx: Add dr_mode property in USB nodes
Add usb2 controller node for P1020RDB, P2020RDB, P2020DS, P1021MDS

Signed-off-by: Ramneek Mehresh <ramneek.mehresh@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-01-18 08:05:42 -06:00
Ramneek Mehresh
70a0ac6686 powerpc/85xx: Enable USB2 controller node for P1020RDB
Enable USB2 controller node for P1020RDB. USB2 controller is used only
when board boots from SPI or SD as it is muxed with eLBC

Signed-off-by: Ramneek Mehresh <ramneek.mehresh@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-01-18 08:05:36 -06:00
Jerry Huang
f8b5a31877 powerpc/85xx: Fix cmd12 bug and add the chip compatible for eSDHC
According to latest kernel, the auto-cmd12 property should be
"sdhci,auto-cmd12", and according to the SDHC binding and the workaround for
the special chip, add the chip compatible for eSDHC: "fsl,p1022-esdhc",
"fsl,mpc8536-esdhc", "fsl,p1020-esdhc", "fsl,p2020-esdhc" and
"fsl,p1010-esdhc".

Signed-off-by: Jerry Huang <Chang-Ming.Huang@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-01-18 08:01:53 -06:00
Linus Torvalds
f429ee3b80 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit: (29 commits)
  audit: no leading space in audit_log_d_path prefix
  audit: treat s_id as an untrusted string
  audit: fix signedness bug in audit_log_execve_info()
  audit: comparison on interprocess fields
  audit: implement all object interfield comparisons
  audit: allow interfield comparison between gid and ogid
  audit: complex interfield comparison helper
  audit: allow interfield comparison in audit rules
  Kernel: Audit Support For The ARM Platform
  audit: do not call audit_getname on error
  audit: only allow tasks to set their loginuid if it is -1
  audit: remove task argument to audit_set_loginuid
  audit: allow audit matching on inode gid
  audit: allow matching on obj_uid
  audit: remove audit_finish_fork as it can't be called
  audit: reject entry,always rules
  audit: inline audit_free to simplify the look of generic code
  audit: drop audit_set_macxattr as it doesn't do anything
  audit: inline checks for not needing to collect aux records
  audit: drop some potentially inadvisable likely notations
  ...

Use evil merge to fix up grammar mistakes in Kconfig file.

Bad speling and horrible grammar (and copious swearing) is to be
expected, but let's keep it to commit messages and comments, rather than
expose it to users in config help texts or printouts.
2012-01-17 16:41:31 -08:00
Eric Paris
b05d8447e7 audit: inline audit_syscall_entry to reduce burden on archs
Every arch calls:

if (unlikely(current->audit_context))
	audit_syscall_entry()

which requires knowledge about audit (the existance of audit_context) in
the arch code.  Just do it all in static inline in audit.h so that arch's
can remain blissfully ignorant.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17 16:16:56 -05:00
Eric Paris
d7e7528bcd Audit: push audit success and retcode into arch ptrace.h
The audit system previously expected arches calling to audit_syscall_exit to
supply as arguments if the syscall was a success and what the return code was.
Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things
by converting from negative retcodes to an audit internal magic value stating
success or failure.  This helper was wrong and could indicate that a valid
pointer returned to userspace was a failed syscall.  The fix is to fix the
layering foolishness.  We now pass audit_syscall_exit a struct pt_reg and it
in turns calls back into arch code to collect the return value and to
determine if the syscall was a success or failure.  We also define a generic
is_syscall_success() macro which determines success/failure based on if the
value is < -MAX_ERRNO.  This works for arches like x86 which do not use a
separate mechanism to indicate syscall failure.

We make both the is_syscall_success() and regs_return_value() static inlines
instead of macros.  The reason is because the audit function must take a void*
for the regs.  (uml calls theirs struct uml_pt_regs instead of just struct
pt_regs so audit_syscall_exit can't take a struct pt_regs).  Since the audit
function takes a void* we need to use static inlines to cast it back to the
arch correct structure to dereference it.

The other major change is that on some arches, like ia64, MIPS and ppc, we
change regs_return_value() to give us the negative value on syscall failure.
THE only other user of this macro, kretprobe_example.c, won't notice and it
makes the value signed consistently for the audit functions across all archs.

In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old
audit code as the return value.  But the ptrace_64.h code defined the macro
regs_return_value() as regs[3].  I have no idea which one is correct, but this
patch now uses the regs_return_value() function, so it now uses regs[3].

For powerpc we previously used regs->result but now use the
regs_return_value() function which uses regs->gprs[3].  regs->gprs[3] is
always positive so the regs_return_value(), much like ia64 makes it negative
before calling the audit code when appropriate.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com> [for x86 portion]
Acked-by: Tony Luck <tony.luck@intel.com> [for ia64]
Acked-by: Richard Weinberger <richard@nod.at> [for uml]
Acked-by: David S. Miller <davem@davemloft.net> [for sparc]
Acked-by: Ralf Baechle <ralf@linux-mips.org> [for mips]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [for ppc]
2012-01-17 16:16:56 -05:00
Julia Lawall
0cf572dc00 arch/powerpc/sysdev/fsl_pci.c: add missing iounmap
Add missing iounmap in error handling code, in a case where the function
already preforms iounmap on some other execution path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression e;
statement S,S1;
int ret;
@@
e = \(ioremap\|ioremap_nocache\)(...)
... when != iounmap(e)
if (<+...e...+>) S
... when any
    when != iounmap(e)
*if (...)
   { ... when != iounmap(e)
     return ...; }
... when any
iounmap(e);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-01-17 13:08:35 -06:00
Michael Neuling
ba8438fb4e powerpc: fix compile error with 85xx/p1022_ds.c
Current linux-next compiled with mpc85xx_defconfig causes this:
 arch/powerpc/platforms/85xx/p1022_ds.c:341:14: error: 'udbg_progress' undeclared here (not in a function)

Add include to fix this.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-01-17 13:08:08 -06:00
Linus Torvalds
c63dbbd526 Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  Kbuild: Use dtc's -d (dependency) option
  dtc: Implement -d option to write out a dependency file
  kbuild: Fix comment in Makefile.lib
  scripts/genksyms: clean lex/yacc generated files
  kbuild: Correctly deal with make options which contain an "s"
2012-01-16 14:34:54 -08:00
Stephen Warren
7c43185138 Kbuild: Use dtc's -d (dependency) option
This hooks dtc into Kbuild's dependency system.

Thus, for example, "make dtbs" will rebuild tegra-harmony.dtb if only
tegra20.dtsi has changed yet tegra-harmony.dts has not. The previous
lack of this feature recently caused me to have very confusing "git
bisect" results.

For ARM, it's obvious what to add to $(targets). I'm not familiar enough
with other architectures to know what to add there. Powerpc appears to
already add various .dtb files into $(targets), but the other archs may
need something added to $(targets) to work.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
[mmarek: Dropped arch/c6x part to avoid merging commits from the middle
of the merge window]
Signed-off-by: Michal Marek <mmarek@suse.cz>
2012-01-15 00:04:35 +01:00
Linus Torvalds
8e63dd6e1c Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Fix unpaired __trace_hcall_entry and __trace_hcall_exit
  powerpc: Fix RCU idle and hcall tracing
2012-01-14 12:26:23 -08:00
WANG Cong
a3dd332305 kexec: remove KMSG_DUMP_KEXEC
KMSG_DUMP_KEXEC is useless because we already save kernel messages inside
/proc/vmcore, and it is unsafe to allow modules to do other stuffs in a
crash dump scenario.

[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:11 -08:00
Wanlong Gao
9512938b88 cpumask: update setup_node_to_cpumask_map() comments
node_to_cpumask() has been replaced by cpumask_of_node(), and wholly
removed since commit 29c337a0 ("cpumask: remove obsolete node_to_cpumask
now everyone uses cpumask_of_node").

So update the comments for setup_node_to_cpumask_map().

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:11 -08:00
Joe Perches
ff2d8b19a3 treewide: convert uses of ATTRIB_NORETURN to __noreturn
Use the more commonly used __noreturn instead of ATTRIB_NORETURN.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:03 -08:00
Joe Perches
9402c95f34 treewide: remove useless NORET_TYPE macro and uses
It's a very old and now unused prototype marking so just delete it.

Neaten panic pointer argument style to keep checkpatch quiet.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:03 -08:00
Linus Torvalds
7b67e75147 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci: (80 commits)
  x86/PCI: Expand the x86_msi_ops to have a restore MSIs.
  PCI: Increase resource array mask bit size in pcim_iomap_regions()
  PCI: DEVICE_COUNT_RESOURCE should be equal to PCI_NUM_RESOURCES
  PCI: pci_ids: add device ids for STA2X11 device (aka ConneXT)
  PNP: work around Dell 1536/1546 BIOS MMCONFIG bug that breaks USB
  x86/PCI: amd: factor out MMCONFIG discovery
  PCI: Enable ATS at the device state restore
  PCI: msi: fix imbalanced refcount of msi irq sysfs objects
  PCI: kconfig: English typo in pci/pcie/Kconfig
  PCI/PM/Runtime: make PCI traces quieter
  PCI: remove pci_create_bus()
  xtensa/PCI: convert to pci_scan_root_bus() for correct root bus resources
  x86/PCI: convert to pci_create_root_bus() and pci_scan_root_bus()
  x86/PCI: use pci_scan_bus() instead of pci_scan_bus_parented()
  x86/PCI: read Broadcom CNB20LE host bridge info before PCI scan
  sparc32, leon/PCI: convert to pci_scan_root_bus() for correct root bus resources
  sparc/PCI: convert to pci_create_root_bus()
  sh/PCI: convert to pci_scan_root_bus() for correct root bus resources
  powerpc/PCI: convert to pci_create_root_bus()
  powerpc/PCI: split PHB part out of pcibios_map_io_space()
  ...

Fix up conflicts in drivers/pci/msi.c and include/linux/pci_regs.h due
to the same patches being applied in other branches.
2012-01-11 18:50:26 -08:00
Linus Torvalds
e343a895a9 lib: use generic pci_iomap on all architectures
Many architectures don't want to pull in iomap.c,
 so they ended up duplicating pci_iomap from that file.
 That function isn't trivial, and we are going to modify it
 https://lkml.org/lkml/2011/11/14/183
 so the duplication hurts.
 
 This reduces the scope of the problem significantly,
 by moving pci_iomap to a separate file and
 referencing that from all architectures.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJPBZXBAAoJECgfDbjSjVRpuuYIAIMD0wE96MuTOSBJX4VG8VAP
 UyjL9dsfMRy8CKioQo5/fxpTY07YBCWmNauSSX7pzgcoUKBfYIGn4Z1qwGYsWK9M
 CzLs6PXLTugw0FtKobHZl/klRTWEBS6YOUjp9x568rplwF+Ppk7b993uj7eS/g+e
 T0mUKzqg4/UavbHd9+W5KgC4drQ5hgtu2WZHoUxBK4umnd3C2G+U82Sthg50o/XU
 SC8IGm39K8I36HoIWgXj3Y7nkOP3mQELohOT4ZPiVSmLvGS4i47+ix75anO+8ZvZ
 jxHr8RC85IK1Nd89NZhbKOyvx0QQiwoKUZaTwcWXJNSOADzZnM6icdIsodc+Elo=
 =ccQZ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

lib: use generic pci_iomap on all architectures

Many architectures don't want to pull in iomap.c,
so they ended up duplicating pci_iomap from that file.
That function isn't trivial, and we are going to modify it
https://lkml.org/lkml/2011/11/14/183
so the duplication hurts.

This reduces the scope of the problem significantly,
by moving pci_iomap to a separate file and
referencing that from all architectures.

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  alpha: drop pci_iomap/pci_iounmap from pci-noop.c
  mn10300: switch to GENERIC_PCI_IOMAP
  mn10300: add missing __iomap markers
  frv: switch to GENERIC_PCI_IOMAP
  tile: switch to GENERIC_PCI_IOMAP
  tile: don't panic on iomap
  sparc: switch to GENERIC_PCI_IOMAP
  sh: switch to GENERIC_PCI_IOMAP
  powerpc: switch to GENERIC_PCI_IOMAP
  parisc: switch to GENERIC_PCI_IOMAP
  mips: switch to GENERIC_PCI_IOMAP
  microblaze: switch to GENERIC_PCI_IOMAP
  arm: switch to GENERIC_PCI_IOMAP
  alpha: switch to GENERIC_PCI_IOMAP
  lib: add GENERIC_PCI_IOMAP
  lib: move GENERIC_IOMAP to lib/Kconfig

Fix up trivial conflicts due to changes nearby in arch/{m68k,score}/Kconfig
2012-01-10 18:04:27 -08:00
Li Zhong
ebb7f616ab powerpc: Fix unpaired __trace_hcall_entry and __trace_hcall_exit
Unpaired calling of __trace_hcall_entry and __trace_hcall_exit could
 cause incorrect preempt count. And it might happen as the global
 variable hcall_tracepoint_refcount is checked separately before calling
 them.

 Instead, store the value that was used on entry in the stack frame
 and retreive it from there after the call

Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-01-11 12:50:26 +11:00
Anton Blanchard
a5ccfee05a powerpc: Fix RCU idle and hcall tracing
Tracepoints should not be called inside an rcu_idle_enter/rcu_idle_exit
region. Since pSeries calls H_CEDE in the idle loop, we were violating
this rule.

commit a7b152d534 (powerpc: Tell RCU about idle after hcall tracing)
tried to work around it by delaying the rcu_idle_enter until after we
called the hcall tracepoint, but there are a number of issues with it.

The hcall tracepoint trampoline code is called conditionally when the
tracepoint is enabled. If the tracepoint is not enabled we never call
rcu_idle_enter. The idle_uses_rcu check was also done at compile time
which breaks multiplatform builds.

The simple fix is to avoid tracing H_CEDE and rely on other tracepoints
and the hypervisor dispatch trace log to work out if we called H_CEDE.

This fixes a hang during boot on pSeries.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-01-11 11:54:20 +11:00
Linus Torvalds
3dcf6c1b6b Merge branch 'kvm-updates/3.3' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/3.3' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (74 commits)
  KVM: PPC: Whitespace fix for kvm.h
  KVM: Fix whitespace in kvm_para.h
  KVM: PPC: annotate kvm_rma_init as __init
  KVM: x86 emulator: implement RDPMC (0F 33)
  KVM: x86 emulator: fix RDPMC privilege check
  KVM: Expose the architectural performance monitoring CPUID leaf
  KVM: VMX: Intercept RDPMC
  KVM: SVM: Intercept RDPMC
  KVM: Add generic RDPMC support
  KVM: Expose a version 2 architectural PMU to a guests
  KVM: Expose kvm_lapic_local_deliver()
  KVM: x86 emulator: Use opcode::execute for Group 9 instruction
  KVM: x86 emulator: Use opcode::execute for Group 4/5 instructions
  KVM: x86 emulator: Use opcode::execute for Group 1A instruction
  KVM: ensure that debugfs entries have been created
  KVM: drop bsp_vcpu pointer from kvm struct
  KVM: x86: Consolidate PIT legacy test
  KVM: x86: Do not rely on implicit inclusions
  KVM: Make KVM_INTEL depend on CPU_SUP_INTEL
  KVM: Use memdup_user instead of kmalloc/copy_from_user
  ...
2012-01-10 09:57:11 -08:00
Linus Torvalds
5983faf942 Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
* 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (65 commits)
  tty: serial: imx: move del_timer_sync() to avoid potential deadlock
  imx: add polled io uart methods
  imx: Add save/restore functions for UART control regs
  serial/imx: let probing fail for the dt case without a valid alias
  serial/imx: propagate error from of_alias_get_id instead of using -ENODEV
  tty: serial: imx: Allow UART to be a source for wakeup
  serial: driver for m32 arch should not have DEC alpha errata
  serial/documentation: fix documented name of DCD cpp symbol
  atmel_serial: fix spinlock lockup in RS485 code
  tty: Fix memory leak in virtual console when enable unicode translation
  serial: use DIV_ROUND_CLOSEST instead of open coding it
  serial: add support for 400 and 800 v3 series Titan cards
  serial: bfin-uart: Remove ASYNC_CTS_FLOW flag for hardware automatic CTS.
  serial: bfin-uart: Enable hardware automatic CTS only when CTS pin is available.
  serial: make FSL errata depend on 8250_CONSOLE, not just 8250
  serial: add irq handler for Freescale 16550 errata.
  serial: manually inline serial8250_handle_port
  serial: make 8250 timeout use the specified IRQ handler
  serial: export the key functions for an 8250 IRQ handler
  serial: clean up parameter passing for 8250 Rx IRQ handling
  ...
2012-01-09 12:09:24 -08:00
Linus Torvalds
98793265b4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
  Kconfig: acpi: Fix typo in comment.
  misc latin1 to utf8 conversions
  devres: Fix a typo in devm_kfree comment
  btrfs: free-space-cache.c: remove extra semicolon.
  fat: Spelling s/obsolate/obsolete/g
  SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
  tools/power turbostat: update fields in manpage
  mac80211: drop spelling fix
  types.h: fix comment spelling for 'architectures'
  typo fixes: aera -> area, exntension -> extension
  devices.txt: Fix typo of 'VMware'.
  sis900: Fix enum typo 'sis900_rx_bufer_status'
  decompress_bunzip2: remove invalid vi modeline
  treewide: Fix comment and string typo 'bufer'
  hyper-v: Update MAINTAINERS
  treewide: Fix typos in various parts of the kernel, and fix some comments.
  clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
  gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
  leds: Kconfig: Fix typo 'D2NET_V2'
  sound: Kconfig: drop unknown symbol ARCH_CLPS7500
  ...

Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
kconfig additions, close to removed commented-out old ones)
2012-01-08 13:21:22 -08:00
Linus Torvalds
eb59c505f8 Merge branch 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
* 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits)
  PM / Hibernate: Implement compat_ioctl for /dev/snapshot
  PM / Freezer: fix return value of freezable_schedule_timeout_killable()
  PM / shmobile: Allow the A4R domain to be turned off at run time
  PM / input / touchscreen: Make st1232 use device PM QoS constraints
  PM / QoS: Introduce dev_pm_qos_add_ancestor_request()
  PM / shmobile: Remove the stay_on flag from SH7372's PM domains
  PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume
  PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode
  PM: Drop generic_subsys_pm_ops
  PM / Sleep: Remove forward-only callbacks from AMBA bus type
  PM / Sleep: Remove forward-only callbacks from platform bus type
  PM: Run the driver callback directly if the subsystem one is not there
  PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
  PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412.
  PM / Sleep: Merge internal functions in generic_ops.c
  PM / Sleep: Simplify generic system suspend callbacks
  PM / Hibernate: Remove deprecated hibernation snapshot ioctls
  PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
  ARM: S3C64XX: Implement basic power domain support
  PM / shmobile: Use common always on power domain governor
  ...

Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused
XBT_FORCE_SLEEP bit
2012-01-08 13:10:57 -08:00
Linus Torvalds
972b2c7199 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
  reiserfs: Properly display mount options in /proc/mounts
  vfs: prevent remount read-only if pending removes
  vfs: count unlinked inodes
  vfs: protect remounting superblock read-only
  vfs: keep list of mounts for each superblock
  vfs: switch ->show_options() to struct dentry *
  vfs: switch ->show_path() to struct dentry *
  vfs: switch ->show_devname() to struct dentry *
  vfs: switch ->show_stats to struct dentry *
  switch security_path_chmod() to struct path *
  vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
  vfs: trim includes a bit
  switch mnt_namespace ->root to struct mount
  vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
  vfs: opencode mntget() mnt_set_mountpoint()
  vfs: spread struct mount - remaining argument of next_mnt()
  vfs: move fsnotify junk to struct mount
  vfs: move mnt_devname
  vfs: move mnt_list to struct mount
  vfs: switch pnode.h macros to struct mount *
  ...
2012-01-08 12:19:57 -08:00
Linus Torvalds
fbce1c234f Changes queued in gpio/next for the start of the 3.3 merge window
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPBSJSAAoJEEFnBt12D9kBDmIP/R6PWxg+NrJaGDmxRWBBkOch
 iI4Et7GT3Sk5hYqPz0kehiSz1F4bqpYK3JmdfqotJIhQCM2znXM+BaDLzgFnMsRf
 g4t70PIZQMpk3pj4gtIkMpZCkRdA9VFriIf1cEdHcLiRdNVsXlTE6CcQT4fbE1P/
 JTAGBrVDwbchuoPsf7Mbu8SYT+cQdWLKpgzC4fAzI++QoYyYBtfvuCHTE4VvasNS
 jMDOfT+u0Gwp6LW34v0gKk7MXEQzeq07r1VZxMyuaWbRQoiLr5d1fy97+Fh5+4dU
 Rb9C6zGIoBzRdiJLlfNYrls/FueDHwPyIwXuapMp5QbaGN5fkhVpIVIhYICOLdHW
 IXH2V3saK1s6QbD7nErNT2lCWZQSCUKq/PR9W40dj3PYAI0oxbRBSoPdfGuW1Bl7
 fYqClUza1Bln8bEmmsvAOnIDdfs6zpkWmfouwx4AGaHTfzIAg4VRdoxdUpS8h+/d
 sNEYi99IuYfl3KWCwQbWJMgCOvoBdOGZSCsTrXnjLf1HJCsQOt/Md69Ff6DLn1VR
 gDp0EeqgDH5oXJmQTW1WjXaJJair2j8pY8mlmGxl1AqA4aY8C3dxIfHxlBPzdxii
 I6KTPaIUSXZaOo6e7XDjjjTzabL4q0aQGT5ahpKj928Rnx00QZNXoTy4FmYkL+0k
 j7QZfM/p6+vfxekKUR17
 =9ugo
 -----END PGP SIGNATURE-----

Merge tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux-2.6

Changes queued in gpio/next for the start of the 3.3 merge window

* tag 'gpio-for-linus-20120104' of git://git.secretlab.ca/git/linux-2.6:
  gpio: Add decode of WM8994 GPIO configuration
  gpio: Convert GPIO drivers to module_platform_driver
  gpio: Fix typo in comment in Samsung driver
  gpio: Explicitly index samsung_gpio_cfgs
  gpio: Add Linus Walleij as gpio co-maintainer
  of: Add device tree selftests
  of: create of_phandle_args to simplify return of phandle parsing data
  gpio/powerpc: Eliminate duplication of of_get_named_gpio_flags()
  gpio/microblaze: Eliminate duplication of of_get_named_gpio_flags()
  gpiolib: output basic details and consolidate gpio device drivers
  pch_gpio: Change company name OKI SEMICONDUCTOR to LAPIS Semiconductor
  pch_gpio: Support new device LAPIS Semiconductor ML7831 IOH
  spi/pl022: make the chip deselect handling thread safe
  spi/pl022: add support for pm_runtime autosuspend
  spi/pl022: disable the PL022 block when unused
  spi/pl022: move device disable to workqueue thread
  spi/pl022: skip default configuration before suspending
  spi/pl022: fix build warnings
  spi/pl022: only enable RX interrupts when TX is complete
2012-01-07 12:15:36 -08:00
Linus Torvalds
7affca3537 Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits)
  arm: fix up some samsung merge sysdev conversion problems
  firmware: Fix an oops on reading fw_priv->fw in sysfs loading file
  Drivers:hv: Fix a bug in vmbus_driver_unregister()
  driver core: remove __must_check from device_create_file
  debugfs: add missing #ifdef HAS_IOMEM
  arm: time.h: remove device.h #include
  driver-core: remove sysdev.h usage.
  clockevents: remove sysdev.h
  arm: convert sysdev_class to a regular subsystem
  arm: leds: convert sysdev_class to a regular subsystem
  kobject: remove kset_find_obj_hinted()
  m86k: gpio - convert sysdev_class to a regular subsystem
  mips: txx9_sram - convert sysdev_class to a regular subsystem
  mips: 7segled - convert sysdev_class to a regular subsystem
  sh: dma - convert sysdev_class to a regular subsystem
  sh: intc - convert sysdev_class to a regular subsystem
  power: suspend - convert sysdev_class to a regular subsystem
  power: qe_ic - convert sysdev_class to a regular subsystem
  power: cmm - convert sysdev_class to a regular subsystem
  s390: time - convert sysdev_class to a regular subsystem
  ...

Fix up conflicts with 'struct sysdev' removal from various platform
drivers that got changed:
 - arch/arm/mach-exynos/cpu.c
 - arch/arm/mach-exynos/irq-eint.c
 - arch/arm/mach-s3c64xx/common.c
 - arch/arm/mach-s3c64xx/cpu.c
 - arch/arm/mach-s5p64x0/cpu.c
 - arch/arm/mach-s5pv210/common.c
 - arch/arm/plat-samsung/include/plat/cpu.h
 - arch/powerpc/kernel/sysfs.c
and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
2012-01-07 12:03:30 -08:00
Al Viro
ece2ccb668 Merge branches 'vfsmount-guts', 'umode_t' and 'partitions' into Z 2012-01-06 23:15:54 -05:00
Linus Torvalds
e4e88f31bc Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (185 commits)
  powerpc: fix compile error with 85xx/p1010rdb.c
  powerpc: fix compile error with 85xx/p1023_rds.c
  powerpc/fsl: add MSI support for the Freescale hypervisor
  arch/powerpc/sysdev/fsl_rmu.c: introduce missing kfree
  powerpc/fsl: Add support for Integrated Flash Controller
  powerpc/fsl: update compatiable on fsl 16550 uart nodes
  powerpc/85xx: fix PCI and localbus properties in p1022ds.dts
  powerpc/85xx: re-enable ePAPR byte channel driver in corenet32_smp_defconfig
  powerpc/fsl: Update defconfigs to enable some standard FSL HW features
  powerpc: Add TBI PHY node to first MDIO bus
  sbc834x: put full compat string in board match check
  powerpc/fsl-pci: Allow 64-bit PCIe devices to DMA to any memory address
  powerpc: Fix unpaired probe_hcall_entry and probe_hcall_exit
  offb: Fix setting of the pseudo-palette for >8bpp
  offb: Add palette hack for qemu "standard vga" framebuffer
  offb: Fix bug in calculating requested vram size
  powerpc/boot: Change the WARN to INFO for boot wrapper overlap message
  powerpc/44x: Fix build error on currituck platform
  powerpc/boot: Change the load address for the wrapper to fit the kernel
  powerpc/44x: Enable CRASH_DUMP for 440x
  ...

Fix up a trivial conflict in arch/powerpc/include/asm/cputime.h due to
the additional sparse-checking code for cputime_t.
2012-01-06 17:58:22 -08:00
Linus Torvalds
9753dfe19a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1958 commits)
  net: pack skb_shared_info more efficiently
  net_sched: red: split red_parms into parms and vars
  net_sched: sfq: extend limits
  cnic: Improve error recovery on bnx2x devices
  cnic: Re-init dev->stats_addr after chip reset
  net_sched: Bug in netem reordering
  bna: fix sparse warnings/errors
  bna: make ethtool_ops and strings const
  xgmac: cleanups
  net: make ethtool_ops const
  vmxnet3" make ethtool ops const
  xen-netback: make ops structs const
  virtio_net: Pass gfp flags when allocating rx buffers.
  ixgbe: FCoE: Add support for ndo_get_fcoe_hbainfo() call
  netdev: FCoE: Add new ndo_get_fcoe_hbainfo() call
  igb: reset PHY after recovering from PHY power down
  igb: add basic runtime PM support
  igb: Add support for byte queue limits.
  e1000: cleanup CE4100 MDIO registers access
  e1000: unmap ce4100_gbe_mdio_base_virt in e1000_remove
  ...
2012-01-06 17:22:09 -08:00
Bjorn Helgaas
45a709f890 powerpc/PCI: convert to pci_create_root_bus()
Convert from pci_create_bus() to pci_create_root_bus().  This way the root
bus resources are correct immediately.  This patch doesn't fix a problem
because powerpc fixed the resources before scanning the bus, but it makes
powerpc more consistent with other architectures.

v2: fix build error with resource pointer passing

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2012-01-06 12:11:09 -08:00
Bjorn Helgaas
49a6cba4eb powerpc/PCI: split PHB part out of pcibios_map_io_space()
No functional change.  This is so we can use pcibios_phb_map_io_space()
before we have a struct pci_bus.

v2: fix map io phb typo

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2012-01-06 12:11:08 -08:00
Bjorn Helgaas
a46770f5b9 powerpc/PCI: make pcibios_setup_phb_resources() static
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2012-01-06 12:11:08 -08:00
Myron Stowe
79c8be8384 PCI: PowerPC: convert pcibios_set_master() to a non-inlined function
This patch converts PowerPC's architecture-specific
'pcibios_set_master()' routine to a non-inlined function.  This will
allow follow on patches to create a generic 'pcibios_set_master()'
function using the '__weak' attribute which can be used by all
architectures as a default which, if necessary, can then be over-
ridden by architecture-specific code.

Converting 'pci_bios_set_master()' to a non-inlined function will
allow PowerPC's 'pcibios_set_master()' implementation to remain
architecture-specific after the generic version is introduced and
thus, not change current behavior.

No functional change.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2012-01-06 12:10:38 -08:00
Greg Kroah-Hartman
ff4b8a57f0 Merge branch 'driver-core-next' into Linux 3.2
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
and it fixes the build error in the arch/x86/kernel/microcode_core.c
file, that the merge did not catch.

The microcode_core.c patch was provided by Stephen Rothwell
<sfr@canb.auug.org.au> who was invaluable in the merge issues involved
with the large sysdev removal process in the driver-core tree.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 11:42:52 -08:00
Linus Torvalds
0db49b72bc Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  sched/tracing: Add a new tracepoint for sleeptime
  sched: Disable scheduler warnings during oopses
  sched: Fix cgroup movement of waking process
  sched: Fix cgroup movement of newly created process
  sched: Fix cgroup movement of forking process
  sched: Remove cfs bandwidth period check in tg_set_cfs_period()
  sched: Fix load-balance lock-breaking
  sched: Replace all_pinned with a generic flags field
  sched: Only queue remote wakeups when crossing cache boundaries
  sched: Add missing rcu_dereference() around ->real_parent usage
  [S390] fix cputime overflow in uptime_proc_show
  [S390] cputime: add sparse checking and cleanup
  sched: Mark parent and real_parent as __rcu
  sched, nohz: Fix missing RCU read lock
  sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer
  sched, nohz: Fix the idle cpu check in nohz_idle_balance
  sched: Use jump_labels for sched_feat
  sched/accounting: Fix parameter passing in task_group_account_field
  sched/accounting: Fix user/system tick double accounting
  sched/accounting: Re-use scheduler statistics for the root cgroup
  ...

Fix up conflicts in
 - arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h
	usecs_to_cputime64() vs the sparse cleanups
 - kernel/sched/fair.c, kernel/time/tick-sched.c
	scheduler changes in multiple branches
2012-01-06 08:44:54 -08:00
Linus Torvalds
423d091dfe Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  cpu: Export cpu_up()
  rcu: Apply ACCESS_ONCE() to rcu_boost() return value
  Revert "rcu: Permit rt_mutex_unlock() with irqs disabled"
  docs: Additional LWN links to RCU API
  rcu: Augment rcu_batch_end tracing for idle and callback state
  rcu: Add rcutorture tests for srcu_read_lock_raw()
  rcu: Make rcutorture test for hotpluggability before offlining CPUs
  driver-core/cpu: Expose hotpluggability to the rest of the kernel
  rcu: Remove redundant rcu_cpu_stall_suppress declaration
  rcu: Adaptive dyntick-idle preparation
  rcu: Keep invoking callbacks if CPU otherwise idle
  rcu: Irq nesting is always 0 on rcu_enter_idle_common
  rcu: Don't check irq nesting from rcu idle entry/exit
  rcu: Permit dyntick-idle with callbacks pending
  rcu: Document same-context read-side constraints
  rcu: Identify dyntick-idle CPUs on first force_quiescent_state() pass
  rcu: Remove dynticks false positives and RCU failures
  rcu: Reduce latency of rcu_prepare_for_idle()
  rcu: Eliminate RCU_FAST_NO_HZ grace-period hang
  rcu: Avoid needlessly IPIing CPUs at GP end
  ...
2012-01-06 08:02:40 -08:00
Linus Torvalds
4a2164a7db Merge branch 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  memblock: Reimplement memblock allocation using reverse free area iterator
  memblock: Kill early_node_map[]
  score: Use HAVE_MEMBLOCK_NODE_MAP
  s390: Use HAVE_MEMBLOCK_NODE_MAP
  mips: Use HAVE_MEMBLOCK_NODE_MAP
  ia64: Use HAVE_MEMBLOCK_NODE_MAP
  SuperH: Use HAVE_MEMBLOCK_NODE_MAP
  sparc: Use HAVE_MEMBLOCK_NODE_MAP
  powerpc: Use HAVE_MEMBLOCK_NODE_MAP
  memblock: Implement memblock_add_node()
  memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users
  memblock: Track total size of regions automatically
  powerpc: Cleanup memblock usage
  memblock: Reimplement memblock_enforce_memory_limit() using __memblock_remove()
  memblock: Make memblock functions handle overflowing range @size
  memblock: Reimplement __memblock_remove() using memblock_isolate_range()
  memblock: Separate out memblock_isolate_range() from memblock_set_node()
  memblock: Kill memblock_init()
  memblock: Kill sentinel entries at the end of static region arrays
  memblock: Add __memblock_dump_all()
  ...
2012-01-06 07:54:53 -08:00
Tony Breeds
ef88e3911c powerpc: fix compile error with 85xx/p1010rdb.c
Current linux-next compiled with mpc85xx_defconfig causes this:

arch/powerpc/platforms/85xx/p1010rdb.c:41:14: error: 'np' undeclared (first use in this function)
arch/powerpc/platforms/85xx/p1023_rds.c:102:14: error: 'np' undeclared (first use in this function)

Introduced in:
  commit 996983b75c
  Author: Kyle Moffett <Kyle.D.Moffett@boeing.com>
  Date:   Fri Dec 2 06:28:02 2011 +0000
  powerpc/mpic: Search for open-pic device-tree node if NULL

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-01-04 15:52:40 -06:00
Michael Neuling
2c1c743039 powerpc: fix compile error with 85xx/p1023_rds.c
Current linux-next compiled with mpc85xx_smp_defconfig causes this:
arch/powerpc/platforms/85xx/p1023_rds.c: In function 'mpc85xx_rds_pic_init':
arch/powerpc/platforms/85xx/p1023_rds.c:102:14: error: 'np' undeclared (first use in this function)
arch/powerpc/platforms/85xx/p1023_rds.c:102:14: note: each undeclared identifier is reported only once for each function it appears in

Introduced in:
  commit 996983b75c
  Author: Kyle Moffett <Kyle.D.Moffett@boeing.com>
  Date:   Fri Dec 2 06:28:02 2011 +0000
  powerpc/mpic: Search for open-pic device-tree node if NULL

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-01-04 15:51:24 -06:00
Timur Tabi
446bc1ffe4 powerpc/fsl: add MSI support for the Freescale hypervisor
Add support for vmpic-msi nodes to the fsl_msi driver.  The MSI is
virtualized by the hypervisor, so the vmpic-msi does not contain a 'reg'
property.  Instead, the driver uses hcalls.

Add support for the "msi-address-64" property to the fsl_pci driver.
The Freescale hypervisor typically puts the virtualized MSIIR register
in the page after the end of DDR, so we extend the DDR ATMU to cover it.
Any other location for MSIIR is not supported, for now.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-01-04 15:47:44 -06:00
Julia Lawall
c6ca52ad32 arch/powerpc/sysdev/fsl_rmu.c: introduce missing kfree
rmu needs to be freed before leaving the function in an error case.

A simplified version of the semantic match that finds the problem is as
follows: (http://coccinelle.lip6.fr)

// <smpl>
@r exists@
local idexpression x;
statement S;
identifier f1;
position p1,p2;
expression *ptr != NULL;
@@

x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
...
if (x == NULL) S
<... when != x
     when != if (...) { <+...x...+> }
x->f1
...>
(
 return \(0\|<+...x...+>\|ptr\);
|
 return@p2 ...;
)

@script:python@
p1 << r.p1;
p2 << r.p2;
@@

print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-01-04 15:43:03 -06:00
Prabhakar Kushwaha
a20cbdeffc powerpc/fsl: Add support for Integrated Flash Controller
Integrated Flash Controller supports various flashes like NOR, NAND
and other devices using NOR, NAND and GPCM Machine available on it.
IFC supports four chip selects.

Signed-off-by: Dipen Dudhat <Dipen.Dudhat@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Liu Shuo <b35362@freescale.com>
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-01-04 15:41:22 -06:00
Kumar Gala
f706bed114 powerpc/fsl: update compatiable on fsl 16550 uart nodes
The Freescale serial port's are pretty much a 16550, however there are
some FSL specific bugs and features.  Add a "fsl,ns16550" compatiable
string to allow code to handle those FSL specific issues.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-01-04 15:38:40 -06:00
Timur Tabi
0725696306 powerpc/85xx: fix PCI and localbus properties in p1022ds.dts
PCI ranges, localbus reg and localbus chip-select 2 range do not match
the memory map setup by bootloader.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-01-04 15:33:58 -06:00
Timur Tabi
3900fad3d3 powerpc/85xx: re-enable ePAPR byte channel driver in corenet32_smp_defconfig
Commit 7c4b2f09 (powerpc: Update mpc85xx/corenet 32-bit defconfigs)
accidentally disabled the ePAPR byte channel driver in the defconfig for
Freescale CoreNet platforms.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-01-04 15:33:58 -06:00
Kumar Gala
389a6c527e powerpc/fsl: Update defconfigs to enable some standard FSL HW features
corenet64_smp_defconfig:
 - enabled rapidio

corenet32_smp_defconfig:
 - enabled hugetlbfs, rapidio

mpc85xx_smp_defconfig:
 - enabled P1010RDB, hugetlbfs, SPI, SDHC, Crypto/CAAM

mpc85xx_smp_defconfig:
 - enabled hugetlbfs, SPI, SDHC, Crypto/CAAM

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-01-04 15:33:58 -06:00
Andy Fleming
220669495b powerpc: Add TBI PHY node to first MDIO bus
Systems which use the fsl_pq_mdio driver need to specify an
address for TBI PHY transactions such that the address does
not conflict with any PHYs on the bus (all transactions to
that address are directed to the onboard TBI PHY). The driver
used to scan for a free address if no address was specified,
however this ran into issues when the PHY Lib was fixed so
that all MDIO transactions were protected by a mutex. As it
is, the code was meant to serve as a transitional tool until
the device trees were all updated to specify the TBI address.

The best fix for the mutex issue was to remove the scanning code,
but it turns out some of the newer SoCs have started to omit
the tbi-phy node when SGMII is not being used. As such, these
devices will now fail unless we add a tbi-phy node to the first
mdio controller.

Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-01-04 15:33:51 -06:00
Paul Gortmaker
dabc78403f sbc834x: put full compat string in board match check
The commit 883c2cfc8b:

 "fix of_flat_dt_is_compatible() to match the full compatible string"

causes silent boot death on the sbc8349 board because it was
just looking for 8349 and not 8349E -- as originally there
were non-E (no SEC/encryption) chips available.  Just add the
E to the board detection string since all boards I've seen
were manufactured with the E versions.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-01-04 15:27:59 -06:00
Kumar Gala
96ea3b4a70 powerpc/fsl-pci: Allow 64-bit PCIe devices to DMA to any memory address
There is an issue on FSL-BookE 64-bit devices (P5020) in which PCIe
devices that are capable of doing 64-bit DMAs (like an Intel e1000) do
not function and crash the kernel if we have >4G of memory in the system.

The reason is that the existing code only sets up one inbound window for
access to system memory across PCIe.  That window is limited to a 32-bit
address space.  So on systems we'll end up utilizing SWIOTLB for dma
mappings.  However SWIOTLB dma ops implement dma_alloc_coherent() as
dma_direct_alloc_coherent().  Thus we can end up with dma addresses that
are not accessible because of the inbound window limitation.

We could possibly set the SWIOTLB alloc_coherent op to
swiotlb_alloc_coherent() however that does not address the issue since
the swiotlb_alloc_coherent() will behave almost identical to
dma_direct_alloc_coherent() since the devices coherent_dma_mask will be
greater than any address allocated by swiotlb_alloc_coherent() and thus
we'll never bounce buffer it into a range that would be dma-able.

The easiest and best solution is to just make it so that a 64-bit
capable device is able to DMA to any internal system address.

We accomplish this by opening up a second inbound window that maps all
of memory above the internal SoC address width so we can set it up to
access all of the internal SoC address space if needed.

We than fixup the dma_ops and dma_offset for PCIe devices with a dma
mask greater than the maximum internal SoC address.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-01-04 15:27:58 -06:00
Al Viro
0583fcc96b consolidate umode_t declarations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:17 -05:00
Al Viro
1bc94226d5 switch spu_create(2) to use of SYSCALL_DEFINE4, make it use umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:16 -05:00
Al Viro
c6684b2685 switch spufs guts to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:15 -05:00
Al Viro
d161a13f97 switch procfs to umode_t use
both proc_dir_entry ->mode and populating functions

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:56 -05:00
Al Viro
ff01bb4832 fs: move code out of buffer.c
Move invalidate_bdev, block_sync_page into fs/block_dev.c.  Export
kill_bdev as well, so brd doesn't have to open code it.  Reduce
buffer_head.h requirement accordingly.

Removed a rather large comment from invalidate_bdev, as it looked a bit
obsolete to bother moving.  The small comment replacing it says enough.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:07 -05:00
Al Viro
6b520e0565 vfs: fix the stupidity with i_dentry in inode destructors
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
the cost of taking it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else.  Not to mention the removal of
boilerplate code from ->destroy_inode() instances...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:40 -05:00
Li Zhong
e4f387d8db powerpc: Fix unpaired probe_hcall_entry and probe_hcall_exit
Unpaired calling of probe_hcall_entry and probe_hcall_exit might happen
as following, which could cause incorrect preempt count.

__trace_hcall_entry => trace_hcall_entry -> probe_hcall_entry =>
get_cpu_var => preempt_disable

__trace_hcall_exit => trace_hcall_exit -> probe_hcall_exit =>
put_cpu_var => preempt_enable

where:
A => B and A -> B means A calls B, but
=> means A will call B through function name, and B will definitely be
called.
-> means A will call B through function pointer, so B might not be
called if the function pointer is not set.

So error happens when only one of probe_hcall_entry and probe_hcall_exit
get called during a hcall.

This patch tries to move the preempt count operations from
probe_hcall_entry and probe_hcall_exit to its callers.

Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: stable@kernel.org [v2.6.32+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-01-03 12:09:27 +11:00
David S. Miller
7f8e3234c5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2011-12-30 13:04:14 -05:00
Andreas Schwab
34845636a1 procfs: do not confuse jiffies with cputime64_t
Commit 2a95ea6c0d ("procfs: do not overflow get_{idle,iowait}_time
for nohz") did not take into account that one some architectures jiffies
and cputime use different units.

This causes get_idle_time() to return numbers in the wrong units, making
the idle time fields in /proc/stat wrong.

Instead of converting the usec value returned by
get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function
usecs_to_cputime64 to convert it to the correct unit of cputime64_t.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Artem S. Tashkinov" <t.artem@mailcity.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-29 16:31:57 -08:00
Alexander Graf
da69dee073 KVM: PPC: Whitespace fix for kvm.h
kvm.h had sparse whitespace at the end of the line. Clean it
up so syncing with QEMU gets easier.

Signed-off-by: Alexander Graf <agraf@suse.de>
2011-12-27 11:26:43 +02:00
Nishanth Aravamudan
6c9b7c409c KVM: PPC: annotate kvm_rma_init as __init
kvm_rma_init() is only called at boot-time, by setup_arch, which is also __init.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-12-27 11:26:40 +02:00
Xiao Guangrong
28a37544fb KVM: introduce id_to_memslot function
Introduce id_to_memslot to get memslot by slot id

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-12-27 11:17:39 +02:00
Scott Wood
fae9dbb4b4 KVM: PPC: e500: include linux/export.h
This is required for THIS_MODULE.  We recently stopped acquiring
it via some other header.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-12-26 13:28:03 +02:00
Michael Neuling
251da03897 KVM: PPC: fix kvmppc_start_thread() for CONFIG_SMP=N
Currently kvmppc_start_thread() tries to wake other SMT threads via
xics_wake_cpu().  Unfortunately xics_wake_cpu only exists when
CONFIG_SMP=Y so when compiling with CONFIG_SMP=N we get:

  arch/powerpc/kvm/built-in.o: In function `.kvmppc_start_thread':
  book3s_hv.c:(.text+0xa1e0): undefined reference to `.xics_wake_cpu'

The following should be fine since kvmppc_start_thread() shouldn't
called to start non-zero threads when SMP=N since threads_per_core=1.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-12-26 13:28:02 +02:00
Andreas Schwab
96f38d7286 KVM: PPC: protect use of kvmppc_h_pr
kvmppc_h_pr is only available if CONFIG_KVM_BOOK3S_64_PR.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-12-26 13:28:01 +02:00
Andreas Schwab
36cc66d638 KVM: PPC: move compute_tlbie_rb to book3s_64 common header
compute_tlbie_rb is only used on ppc64 and cannot be compiled on ppc32.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-12-26 13:28:00 +02:00
Kay Sievers
edbaa603eb driver-core: remove sysdev.h usage.
The sysdev.h file should not be needed by any in-kernel code, so remove
the .h file from these random files that seem to still want to include
it.

The sysdev code will be going away soon, so this include needs to be
removed no matter what.

Cc: Jiandong Zheng <jdzheng@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: Daniel Walker <dwalker@fifo99.com>
Cc: Bryan Huntsman <bryanh@codeaurora.org>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Wan ZongShun <mcuos.com@gmail.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: "Venkatesh Pallipadi
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
2011-12-21 16:26:03 -08:00
Kay Sievers
86ba41d033 power: suspend - convert sysdev_class to a regular subsystem
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21 15:09:52 -08:00
Kay Sievers
cfde779fcc power: qe_ic - convert sysdev_class to a regular subsystem
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Cc: Timur Tabi <timur@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21 15:09:51 -08:00
Kay Sievers
6c9d290952 power: cmm - convert sysdev_class to a regular subsystem
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21 15:09:51 -08:00
Kay Sievers
10fbcf4c6c convert 'memory' sysdev_class to a regular subsystem
This moves the 'memory sysdev_class' over to a regular 'memory' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.

After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21 14:48:43 -08:00
Kay Sievers
8a25a2fd12 cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem
This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.

After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Userspace relies on events and generic sysfs subsystem infrastructure
from sysdev devices, which are made available with this conversion.

Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Len Brown <lenb@kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21 14:29:42 -08:00
Rafael J. Wysocki
90363ddf0a PM: Drop generic_subsys_pm_ops
Since the PM core is now going to execute driver callbacks directly
if the corresponding subsystem callbacks are not present,
forward-only subsystem callbacks (i.e. such that only execute the
corresponding driver callbacks) are not necessary any more.  Thus
it is possible to remove generic_subsys_pm_ops, because the only
callback in there that is not forward-only, .runtime_idle, is not
really used by the only user of generic_subsys_pm_ops, which is
vio_bus_type.

However, the generic callback routines themselves cannot be removed
from generic_ops.c, because they are used individually by a number
of subsystems.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-12-21 22:03:32 +01:00
Rafael J. Wysocki
b00f4dc5ff Merge branch 'master' into pm-sleep
* master: (848 commits)
  SELinux: Fix RCU deref check warning in sel_netport_insert()
  binary_sysctl(): fix memory leak
  mm/vmalloc.c: remove static declaration of va from __get_vm_area_node
  ipmi_watchdog: restore settings when BMC reset
  oom: fix integer overflow of points in oom_badness
  memcg: keep root group unchanged if creation fails
  nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()
  nilfs2: unbreak compat ioctl
  cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
  evm: prevent racing during tfm allocation
  evm: key must be set once during initialization
  mmc: vub300: fix type of firmware_rom_wait_states module parameter
  Revert "mmc: enable runtime PM by default"
  mmc: sdhci: remove "state" argument from sdhci_suspend_host
  x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT
  IB/qib: Correct sense on freectxts increment and decrement
  RDMA/cma: Verify private data length
  cgroups: fix a css_set not found bug in cgroup_attach_proc
  oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
  Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
  ...

Conflicts:
	kernel/cgroup_freezer.c
2011-12-21 21:59:45 +01:00
Suzuki Poulose
eba3d97db8 powerpc/boot: Change the WARN to INFO for boot wrapper overlap message
commit c55aef0e5b ("powerpc/boot: Change the load address
for the wrapper to fit the kernel") introduced a WARNING to
inform the user that the uncompressed kernel would overlap
the boot uncompressing wrapper code. Change it to an INFO.

I initially thought, this would be a 'WARNING' for the those
boards, where the link_address should be fixed, so that the
user can take actions accordingly.

Changing the same to INFO.

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-21 15:09:25 -05:00
Peter Zijlstra
35edc2a509 perf, arch: Rework perf_event_index()
Put the logic to compute the event index into a per pmu method. This
is required because the x86 rules are weird and wonderful and don't
match the capabilities of the current scheme.

AFAIK only powerpc actually has a usable userspace read of the PMCs
but I'm not at all sure anybody actually used that.

ARM is restored to the default since it currently does not support
userspace access at all. And all software events are provided with a
method that reports their index as 0 (disabled).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Arun Sharma <asharma@fb.com>
Link: http://lkml.kernel.org/n/tip-dfydxodki16lylkt3gl2j7cw@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-21 11:01:07 +01:00
Josh Boyer
eb975652b8 powerpc/44x: Fix build error on currituck platform
The MPIC_PRIMARY define was recently made "default" and the meaning was
inverted to MPIC_SECONDARY.  This causes compile errors in currituck now, so
fix it to the new manner of allocating mpics.

Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-20 10:41:28 -05:00
Suzuki Poulose
c55aef0e5b powerpc/boot: Change the load address for the wrapper to fit the kernel
The wrapper code which uncompresses the kernel in case of a 'ppc' boot
is by default loaded at 0x00400000 and the kernel will be uncompressed
to fit the location 0-0x00400000. But with dynamic relocations, the size
of the kernel may exceed 0x00400000(4M). This would cause an overlap
of the uncompressed kernel and the boot wrapper, causing a failure in
boot.

The message looks like :

   zImage starting: loaded at 0x00400000 (sp: 0x0065ffb0)
   Allocating 0x5ce650 bytes for kernel ...
   Insufficient memory for kernel at address 0! (_start=00400000, uncompressed size=00591a20)

This patch shifts the load address of the boot wrapper code to the next
higher MB, according to the size of  the uncompressed vmlinux.

With the patch, we get the following message while building the image :

 WARN: Uncompressed kernel (size 0x5b0344) overlaps the address of the wrapper(0x400000)
 WARN: Fixing the link_address of wrapper to (0x600000)

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-20 10:22:56 -05:00
Suzuki Poulose
5b2e478da0 powerpc/44x: Enable CRASH_DUMP for 440x
Now that we have relocatable kernel, supporting CRASH_DUMP only requires
turning the switches on for UP machines.

We don't have kexec support on 47x yet. Enabling SMP support would be done
as part of enabling the PPC_47x support.

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-20 10:22:14 -05:00
Suzuki Poulose
26ecb6c44b powerpc/44x: Enable CONFIG_RELOCATABLE for PPC44x
The following patch adds relocatable kernel support - based on processing
of dynamic relocations - for PPC44x kernel.

We find the runtime address of _stext and relocate ourselves based
on the following calculation.

	virtual_base = ALIGN(KERNELBASE,256M) +
			MODULO(_stext.run,256M)

relocate() is called with the Effective Virtual Base Address (as
shown below)

            | Phys. Addr| Virt. Addr |
Page (256M) |------------------------|
Boundary    |           |            |
            |           |            |
            |           |            |
Kernel Load |___________|_ __ _ _ _ _|<- Effective
Addr(_stext)|           |      ^     |Virt. Base Addr
            |           |      |     |
            |           |      |     |
            |           |reloc_offset|
            |           |      |     |
            |           |      |     |
            |           |______v_____|<-(KERNELBASE)%256M
            |           |            |
            |           |            |
            |           |            |
Page(256M)  |-----------|------------|
Boundary    |           |            |

The virt_phys_offset is updated accordingly, i.e,

	virt_phys_offset = effective. kernel virt base - kernstart_addr

I have tested the patches on 440x platforms only. However this should
work fine for PPC_47x also, as we only depend on the runtime address
and the current TLB XLAT entry for the startup code, which is available
in r25. I don't have access to a 47x board yet. So, it would be great if
somebody could test this on 47x.

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Tony Breeds <tony@bakeyournoodle.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-20 10:21:57 -05:00
Suzuki Poulose
368ff8f14d powerpc: Define virtual-physical translations for RELOCATABLE
We find the runtime address of _stext and relocate ourselves based
on the following calculation.

	virtual_base = ALIGN(KERNELBASE,KERNEL_TLB_PIN_SIZE) +
			MODULO(_stext.run,KERNEL_TLB_PIN_SIZE)

relocate() is called with the Effective Virtual Base Address (as
shown below)

            | Phys. Addr| Virt. Addr |
Page        |------------------------|
Boundary    |           |            |
            |           |            |
            |           |            |
Kernel Load |___________|_ __ _ _ _ _|<- Effective
Addr(_stext)|           |      ^     |Virt. Base Addr
            |           |      |     |
            |           |      |     |
            |           |reloc_offset|
            |           |      |     |
            |           |      |     |
            |           |______v_____|<-(KERNELBASE)%TLB_SIZE
            |           |            |
            |           |            |
            |           |            |
Page        |-----------|------------|
Boundary    |           |            |

On BookE, we need __va() & __pa() early in the boot process to access
the device tree.

Currently this has been defined as :

#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) -
						PHYSICAL_START + KERNELBASE)
where:
 PHYSICAL_START is kernstart_addr - a variable updated at runtime.
 KERNELBASE	is the compile time Virtual base address of kernel.

This won't work for us, as kernstart_addr is dynamic and will yield different
results for __va()/__pa() for same mapping.

e.g.,

Let the kernel be loaded at 64MB and KERNELBASE be 0xc0000000 (same as
PAGE_OFFSET).

In this case, we would be mapping 0 to 0xc0000000, and kernstart_addr = 64M

Now __va(1MB) = (0x100000) - (0x4000000) + 0xc0000000
		= 0xbc100000 , which is wrong.

it should be : 0xc0000000 + 0x100000 = 0xc0100000

On platforms which support AMP, like PPC_47x (based on 44x), the kernel
could be loaded at highmem. Hence we cannot always depend on the compile
time constants for mapping.

Here are the possible solutions:

1) Update kernstart_addr(PHSYICAL_START) to match the Physical address of
compile time KERNELBASE value, instead of the actual Physical_Address(_stext).

The disadvantage is that we may break other users of PHYSICAL_START. They
could be replaced with __pa(_stext).

2) Redefine __va() & __pa() with relocation offset

#ifdef	CONFIG_RELOCATABLE_PPC32
#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) - PHYSICAL_START + (KERNELBASE + RELOC_OFFSET)))
#define __pa(x) ((unsigned long)(x) + PHYSICAL_START - (KERNELBASE + RELOC_OFFSET))
#endif

where, RELOC_OFFSET could be

  a) A variable, say relocation_offset (like kernstart_addr), updated
     at boot time. This impacts performance, as we have to load an additional
     variable from memory.

		OR

  b) #define RELOC_OFFSET ((PHYSICAL_START & PPC_PIN_SIZE_OFFSET_MASK) - \
                      (KERNELBASE & PPC_PIN_SIZE_OFFSET_MASK))

   This introduces more calculations for doing the translation.

3) Redefine __va() & __pa() with a new variable

i.e,

#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) + VIRT_PHYS_OFFSET))

where VIRT_PHYS_OFFSET :

#ifdef CONFIG_RELOCATABLE_PPC32
#define VIRT_PHYS_OFFSET virt_phys_offset
#else
#define VIRT_PHYS_OFFSET (KERNELBASE - PHYSICAL_START)
#endif /* CONFIG_RELOCATABLE_PPC32 */

where virt_phy_offset is updated at runtime to :

	Effective KERNELBASE - kernstart_addr.

Taking our example, above:

virt_phys_offset = effective_kernelstart_vaddr - kernstart_addr
		 = 0xc0400000 - 0x400000
		 = 0xc0000000
	and

	__va(0x100000) = 0xc0000000 + 0x100000 = 0xc0100000
	 which is what we want.

I have implemented (3) in the following patch which has same cost of
operation as the existing one.

I have tested the patches on 440x platforms only. However this should
work fine for PPC_47x also, as we only depend on the runtime address
and the current TLB XLAT entry for the startup code, which is available
in r25. I don't have access to a 47x board yet. So, it would be great if
somebody could test this on 47x.

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-20 10:21:34 -05:00
Suzuki Poulose
9c5f7d39a8 powerpc: Process dynamic relocations for kernel
The following patch implements the dynamic relocation processing for
PPC32 kernel. relocate() accepts the target virtual address and relocates
 the kernel image to the same.

Currently the following relocation types are handled :

	R_PPC_RELATIVE
	R_PPC_ADDR16_LO
	R_PPC_ADDR16_HI
	R_PPC_ADDR16_HA

The last 3 relocations in the above list depends on value of Symbol indexed
whose index is encoded in the Relocation entry. Hence we need the Symbol
Table for processing such relocations.

Note: The GNU ld for ppc32 produces buggy relocations for relocation types
that depend on symbols. The value of the symbols with STB_LOCAL scope
should be assumed to be zero. - Alan Modra

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alan Modra <amodra@au1.ibm.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-20 10:21:08 -05:00
Suzuki Poulose
2391324545 powerpc/44x: Enable DYNAMIC_MEMSTART for 440x
DYNAMIC_MEMSTART(old RELOCATABLE) was restricted only to PPC_47x variants
of 44x. This patch enables DYNAMIC_MEMSTART for 440x based chipsets.

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux ppc dev <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-20 10:20:38 -05:00
Suzuki Poulose
0f890c8d20 powerpc: Rename mapping based RELOCATABLE to DYNAMIC_MEMSTART for BookE
The current implementation of CONFIG_RELOCATABLE in BookE is based
on mapping the page aligned kernel load address to KERNELBASE. This
approach however is not enough for platforms, where the TLB page size
is large (e.g, 256M on 44x). So we are renaming the RELOCATABLE used
currently in BookE to DYNAMIC_MEMSTART to reflect the actual method.

The CONFIG_RELOCATABLE for PPC32(BookE) based on processing of the
dynamic relocations will be introduced in the later in the patch series.

This change would allow the use of the old method of RELOCATABLE for
platforms which can afford to enforce the page alignment (platforms with
smaller TLB size).

Changes since v3:

* Introduced a new config, NONSTATIC_KERNEL, to denote a kernel which is
  either a RELOCATABLE or DYNAMIC_MEMSTART(Suggested by: Josh Boyer)

Suggested-by: Scott Wood <scottwood@freescale.com>
Tested-by: Scott Wood <scottwood@freescale.com>

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: Scott Wood <scottwood@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux ppc dev <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-20 10:20:19 -05:00
Benjamin Herrenschmidt
3f53638c80 powerpc: Fix old bug in prom_init setting of the color
We have an array of 16 entries and a loop of 32 iterations... oops.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-19 14:41:25 +11:00
Paul Mackerras
64968f60e7 powerpc: Only use initrd_end as the limit for alloc_bottom if it's inside the RMO.
As the kernels and initrd's get bigger boot-loaders and possibly
kexec-tools will need to place the initrd outside the RMO.  When this
happens we end up with no lowmem and the boot doesn't get very far.

Only use initrd_end as the limit for alloc_bottom if it's inside the
RMO.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-19 14:41:24 +11:00
Anton Blanchard
b206590c04 powerpc: Fix comment explaining our VSID layout
We support 16TB of user address space and half a million contexts
so update the comment to reflect this.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-19 14:41:22 +11:00
Andreas Schwab
9f5072d4f6 powerpc: Fix wrong divisor in usecs_to_cputime
Commit d57af9b (taskstats: use real microsecond granularity for CPU times)
renamed msecs_to_cputime to usecs_to_cputime, but failed to update all
numbers on the way.  This causes nonsensical cpu idle/iowait values to be
displayed in /proc/stat (the only user of usecs_to_cputime so far).

This also renames __cputime_msec_factor to __cputime_usec_factor, adapting
its value and using it directly in cputime_to_usecs instead of doing two
multiplications.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-19 14:41:20 +11:00
David Rientjes
2011b1d0d3 powerpc/mm: Fix section mismatch for read_n_cells
read_n_cells() cannot be marked as .devinit.text since it is referenced
from two functions that are not in that section: of_get_lmb_size() and
hot_add_drconf_scn_to_nid().

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-19 14:41:18 +11:00
David Rientjes
28e86bdbc9 powerpc/mm: Fix section mismatch for mark_reserved_regions_for_nid
mark_reserved_regions_for_nid() is only called from do_init_bootmem(),
which is in .init.text, so it must be in the same section to avoid a
section mismatch warning.

Reported-by: Subrata Modak <subrata@linux.vnet.ibm.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-19 14:41:16 +11:00
Matt Evans
2c9c6ce019 powerpc: Add __SANE_USERSPACE_TYPES__ to asm/types.h for LL64
PPC64 uses long long for u64 in the kernel, but powerpc's asm/types.h
prevents 64-bit userland from seeing this definition, instead defaulting
to u64 == long in userspace.  Some user programs (e.g. kvmtool) may actually
want LL64, so this patch adds a check for __SANE_USERSPACE_TYPES__ so that,
if defined, int-ll64.h is included instead.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-19 14:41:14 +11:00
Anton Blanchard
a66086b819 powerpc: POWER7 optimised copy_to_user/copy_from_user using VMX
Implement a POWER7 optimised copy_to_user/copy_from_user using VMX.
For large aligned copies this new loop is over 10% faster, and for
large unaligned copies it is over 200% faster.

If we take a fault we fall back to the old version, this keeps
things relatively simple and easy to verify.

On POWER7 unaligned stores rarely slow down - they only flush when
a store crosses a 4KB page boundary. Furthermore this flush is
handled completely in hardware and should be 20-30 cycles.

Unaligned loads on the other hand flush much more often - whenever
crossing a 128 byte cache line, or a 32 byte sector if either sector
is an L1 miss.

Considering this information we really want to get the loads aligned
and not worry about the alignment of the stores. Microbenchmarks
confirm that this approach is much faster than the current unaligned
copy loop that uses shifts and rotates to ensure both loads and
stores are aligned.

We also want to try and do the stores in cacheline aligned, cacheline
sized chunks. If the store queue is unable to merge an entire
cacheline of stores then the L2 cache will have to do a
read/modify/write. Even worse, we will serialise this with the stores
in the next iteration of the copy loop since both iterations hit
the same cacheline.

Based on this, the new loop does the following things:

1 - 127 bytes
Get the source 8 byte aligned and use 8 byte loads and stores. Pretty
boring and similar to how the current loop works.

128 - 4095 bytes
Get the source 8 byte aligned and use 8 byte loads and stores,
1 cacheline at a time. We aren't doing the stores in cacheline
aligned chunks so we will potentially serialise once per cacheline.
Even so it is much better than the loop we have today.

4096 - bytes
If both source and destination have the same alignment get them both
16 byte aligned, then get the destination cacheline aligned. Do
cacheline sized loads and stores using VMX.

If source and destination do not have the same alignment, we get the
destination cacheline aligned, and use permute to do aligned loads.

In both cases the VMX loop should be optimal - we always do aligned
loads and stores and are always doing stores in cacheline aligned,
cacheline sized chunks.

To be able to use VMX we must be careful about interrupts and
sleeping. We don't use the VMX loop when in an interrupt (which should
be rare anyway) and we wrap the VMX loop in disable/enable_pagefault
and fall back to the existing copy_tofrom_user loop if we do need to
sleep.

The VMX breakpoint of 4096 bytes was chosen using this microbenchmark:

http://ozlabs.org/~anton/junkcode/copy_to_user.c

Since we are using VMX and there is a cost to saving and restoring
the user VMX state there are two broad cases we need to benchmark:

- Best case - userspace never uses VMX

- Worst case - userspace always uses VMX

In reality a userspace process will sit somewhere between these two
extremes. Since we need to test both aligned and unaligned copies we
end up with 4 combinations. The point at which the VMX loop begins to
win is:

0% VMX
aligned		2048 bytes
unaligned	2048 bytes

100% VMX
aligned		16384 bytes
unaligned	8192 bytes

Considering this is a microbenchmark, the data is hot in cache and
the VMX loop has better store queue merging properties we set the
breakpoint to 4096 bytes, a little below the unaligned breakpoints.

Some future optimisations we can look at:

- Looking at the perf data, a significant part of the cost when a
  task is always using VMX is the extra exception we take to restore
  the VMX state. As such we should do something similar to the x86
  optimisation that restores FPU state for heavy users. ie:

        /*
         * If the task has used fpu the last 5 timeslices, just do a full
         * restore of the math state immediately to avoid the trap; the
         * chances of needing FPU soon are obviously high now
         */
        preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;

  and

        /*
         * fpu_counter contains the number of consecutive context switches
         * that the FPU is used. If this is over a threshold, the lazy fpu
         * saving becomes unlazy to save the trap. This is an unsigned char
         * so that after 256 times the counter wraps and the behavior turns
         * lazy again; this to deal with bursty apps that only use FPU for
         * a short time
         */

- We could create a paca bit to mirror the VMX enabled MSR bit and check
  that first, avoiding multiple calls to calling enable_kernel_altivec.
  That should help with iovec based system calls like readv.

- We could have two VMX breakpoints, one for when we know the user VMX
  state is loaded into the registers and one when it isn't. This could
  be a second bit in the paca so we can calculate the break points quickly.

- One suggestion from Ben was to save and restore the VSX registers
  we use inline instead of using enable_kernel_altivec.

[BenH: Fixed a problem with preempt and fixed build without CONFIG_ALTIVEC]

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-19 14:40:40 +11:00
Richard Kuo
0766387bcf powerpc: Use rwsem.h from generic location
As of commit dd472da38, rwsem.h was moved into asm-generic.
This patch removes the arch file and points the build at
its new location.

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-16 14:39:48 +11:00
Benjamin Herrenschmidt
1e7342e778 Merge remote-tracking branch 'jwb/next' into next
Conflicts:
	arch/powerpc/platforms/40x/ppc40x_simple.c
2011-12-16 11:24:25 +11:00
Benjamin Herrenschmidt
78c5c68a4c powerpc/pmac: Fix SMP kernels on pre-core99 UP machines
The code for "powersurge" SMP would kick in and cause a crash
at boot due to the lack of a NULL test.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-16 11:10:25 +11:00
Benjamin Herrenschmidt
8e609d5e7b powerpc/pmac: Simplify old pmac PIC interrupt handling
In the old days, we treated all interrupts from the legacy Apple home made
interrupt controllers as level, with a trick reading the "level" register
along with the "event" register to work arounds bugs where it would
occasionally fail to latch some events.

Doing so appeared to work fine for both level and edge interrupts.

Later on, we discovered in Darwin source the magic masks that define which
interrupts are actually level and which are edge, and implemented a
different algorithm, more similar to what Apple does, that treats those
differently.

I recently discovered however that this caused problems (including loss
of interrupts) with an old Wallstreet PowerBook when trying to use the
internal modem (connected to a cascaded controller).

It looks like some interrupts are treated as edge while they are really
level and I'm starting to seriously doubt the correctness of the Darwin
code (which has other obvious bugs when you read it, so ...)

This patch reverts to our original behaviour of treating everything as
a level interrupt. It appears to solve the problems with the modem on
the Wallstreet and everything else seems to be working properly as well.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-16 11:10:11 +11:00
Benjamin Herrenschmidt
43ca5d347a Merge branch 'kexec' into next 2011-12-16 11:09:21 +11:00
Benjamin Herrenschmidt
efdad722ef Merge branch 'ps3' into next 2011-12-16 11:09:15 +11:00
Benjamin Herrenschmidt
e6f08d37e6 Merge branch 'cpuidle' into next 2011-12-16 11:09:11 +11:00
Martin Schwidefsky
648616343c [S390] cputime: add sparse checking and cleanup
Make cputime_t and cputime64_t nocast to enable sparse checking to
detect incorrect use of cputime. Drop the cputime macros for simple
scalar operations. The conversion macros are still needed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-15 14:56:19 +01:00
Grant Likely
1a2d397a6e gpio/powerpc: Eliminate duplication of of_get_named_gpio_flags()
A large chunk of qe_pin_request() is unnecessarily cut-and-paste
directly from of_get_named_gpio_flags().  This patch cuts out the
duplicate code and replaces it with a call to of_get_gpio().

v2: fixed compile error due to missing gpio_to_chip()

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@kernel.crashing.org>
2011-12-12 13:40:16 -07:00
Frederic Weisbecker
1268fbc746 nohz: Remove tick_nohz_idle_enter_norcu() / tick_nohz_idle_exit_norcu()
Those two APIs were provided to optimize the calls of
tick_nohz_idle_enter() and rcu_idle_enter() into a single
irq disabled section. This way no interrupt happening in-between would
needlessly process any RCU job.

Now we are talking about an optimization for which benefits
have yet to be measured. Let's start simple and completely decouple
idle rcu and dyntick idle logics to simplify.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:57 -08:00
Paul E. McKenney
a7b152d534 powerpc: Tell RCU about idle after hcall tracing
The PowerPC pSeries platform (CONFIG_PPC_PSERIES=y) enables
hypervisor-call tracing for CONFIG_TRACEPOINTS=y kernels.  One of the
hypervisor calls that is traced is the H_CEDE call in the idle loop
that tells the hypervisor that this OS instance no longer needs the
current CPU.  However, tracing uses RCU, so this combination of kernel
configuration variables needs to avoid telling RCU about the current CPU's
idleness until after the H_CEDE-entry tracing completes on the one hand,
and must tell RCU that the the current CPU is no longer idle before the
H_CEDE-exit tracing starts.

In all other cases, it suffices to inform RCU of CPU idleness upon
idle-loop entry and exit.

This commit makes the required adjustments.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:39 -08:00
Frederic Weisbecker
2bbb6817c0 nohz: Allow rcu extended quiescent state handling seperately from tick stop
It is assumed that rcu won't be used once we switch to tickless
mode and until we restart the tick. However this is not always
true, as in x86-64 where we dereference the idle notifiers after
the tick is stopped.

To prepare for fixing this, add two new APIs:
tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().

If no use of RCU is made in the idle loop between
tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
must instead call the new *_norcu() version such that the arch doesn't
need to call rcu_idle_enter() and rcu_idle_exit().

Otherwise the arch must call tick_nohz_enter_idle() and
tick_nohz_exit_idle() and also call explicitly:

- rcu_idle_enter() after its last use of RCU before the CPU is put
to sleep.
- rcu_idle_exit() before the first use of RCU after the CPU is woken
up.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:36 -08:00
Frederic Weisbecker
280f06774a nohz: Separate out irq exit and idle loop dyntick logic
The tick_nohz_stop_sched_tick() function, which tries to delay
the next timer tick as long as possible, can be called from two
places:

- From the idle loop to start the dytick idle mode
- From interrupt exit if we have interrupted the dyntick
idle mode, so that we reprogram the next tick event in
case the irq changed some internal state that requires this
action.

There are only few minor differences between both that
are handled by that function, driven by the ts->inidle
cpu variable and the inidle parameter. The whole guarantees
that we only update the dyntick mode on irq exit if we actually
interrupted the dyntick idle mode, and that we enter in RCU extended
quiescent state from idle loop entry only.

Split this function into:

- tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
dynticks idle mode unconditionally if it can, and enters into RCU
extended quiescent state.

- tick_nohz_irq_exit() which only updates the dynticks idle mode
when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).

To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
into tick_nohz_idle_exit().

This simplifies the code and micro-optimize the irq exit path (no need
for local_irq_save there). This also prepares for the split between
dynticks and rcu extended quiescent state logics. We'll need this split to
further fix illegal uses of RCU in extended quiescent states in the idle
loop.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:35 -08:00
Paul Gortmaker
9deaa53ac7 serial: add irq handler for Freescale 16550 errata.
Sending a break on the SOC UARTs found in some MPC83xx/85xx/86xx
chips seems to cause a short lived IRQ storm (/proc/interrupts
typically shows somewhere between 300 and 1500 events).  Unfortunately
this renders SysRQ over the serial console completely inoperable.

The suggested workaround in the errata is to read the Rx register,
wait one character period, and then read the Rx register again.
We achieve this by tracking the old LSR value, and on the subsequent
interrupt event after a break, we don't read LSR, instead we just
read the RBR again and return immediately.

The "fsl,ns16550" is used in the compatible field of the serial
device to mark UARTs known to have this issue.

Thanks to Scott Wood for providing the errata data which led to
a much cleaner fix.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 19:14:13 -08:00
Tony Breeds
228d550533 powerpc/47x: Add support for the new IBM currituck platform
Based on original work by David 'Shaggy' Kleikamp.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-09 07:51:40 -05:00
Tony Breeds
df777bd39a powerpc/476fpe: Add 476fpe SoC code
Based on original work by David 'Shaggy' Kleikamp.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-09 07:51:02 -05:00
Tony Breeds
075bcf5879 powerpc/boot: Add mfdcrx
Needed for currituck support.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-09 07:49:50 -05:00
Tony Breeds
e32a03290c powerpc/boot: Add extended precision shifts to the boot wrapper.
The upcomming currituck patches will need to do 64-bit shifts which will
fail with undefined symbol without this patch.

I looked at linking against libgcc but we can't guarantee that libgcc
was compiled with soft-float.  Also Using ../lib/div64.S or
../kernel/misc_32.S, this will break the build as the .o's need to be
built with different flags for the bootwrapper vs the kernel.  So for
now the easyest option is to just copy code from
arch/powerpc/kernel/misc_32.S  I don't think this code changes too often ;P

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-09 07:49:27 -05:00
Christoph Egger
ca899859f1 powerpc/44x: Removing dead CONFIG_PPC47x
CONFIG_PPC47x doesn't exist in Kconfig and no 476 processor calls this
function ppc44x_pin_tlb() as it has it's own ppc47x_pin_tlb().

This code is probably an artifact of the original 476 code that
shouldn't have made it upstream.

Signed-off-by: Christoph Egger <siccegge@cs.fau.de>
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-09 07:48:56 -05:00
Tony Breeds
466c2bc762 powerpc/44x: pci: Setup the dma_window properties for each pci_controller
Needed if you want to use swiotlb, harmless otherwise.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-09 07:48:36 -05:00