Commit Graph

496 Commits

Author SHA1 Message Date
Benjamin Herrenschmidt
bcd6acd51f Merge commit 'origin/master' into next
Conflicts:
	include/linux/kvm.h
2009-12-09 17:14:38 +11:00
Mark Nelson
49bd364713 powerpc/pseries: Track previous CPPR values to correctly EOI interrupts
At the moment when we EOI an interrupt we set the CPPR back to 0xFF
regardless of its previous value. This could lead to problems if we
take an interrupt with a priority of 5, but before EOIing it we get
an IPI which has a priority of 4. The problem is that at the moment
when we EOI the IPI we will set the CPPR to 0xFF, but it should
really be set back to 5 (the previous priority).

To keep track of the previous CPPR values we create the xics_cppr
structure that has an array for CPPR values and an index pointing
to the current priority. This can easily grow if new priorities get
added in the future.

This will also be useful because the partition adjunct option of
upcoming machines will update the H_XIRR hcall to accept the CPPR
as a parameter.

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-09 17:10:38 +11:00
Nathan Fontenot
275a64f604 powerpc/pseries: Correct pseries/dlpar.c build break without CONFIG_SMP
The recent patch to add cpu offline/online as part of the DLPAR
process for pseries causes a build break if CONFIG_SMP is not
defined.  Original patch here;
http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-November/078299.html

This corrects the build break by moving the online_node_cpus
and offline_node_cpus under the #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
portions of dlpar.c.

This patch also slightly modifies the online_node_cpus and offline_node_cpus
routines to prepend dlpar_ to the them and make them static.  These two
routine are only used in the dlpar add/remove of cpus and these changes
should help clarify that.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-09 17:10:38 +11:00
Roman Fietze
40d50cf7ca powerpc: Make "intspec" pointers in irq_host->xlate() const
Writing a driver using SCLPC on the MPC5200B I detected, that the
intspec arrays to map irqs to Linux virq cannot be const, because the
mapping and xlate functions only take non const pointers. All those
functions do not modify the intspec, so a const pointer could be used.

Signed-off-by: Roman Fietze <roman.fietze@telemotive.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-09 17:10:37 +11:00
Gautham R Shenoy
51badebdcf powerpc/pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
Currently the cpu-allocation/deallocation process comprises of two steps:
- Set the indicators and to update the device tree with DLPAR node
  information.

- Online/offline the allocated/deallocated CPU.

This is achieved by writing to the sysfs tunables "probe" during allocation
and "release" during deallocation.

At the sametime, the userspace can independently online/offline the CPUs of
the system using the sysfs tunable "online".

It is quite possible that when a userspace tool offlines a CPU
for the purpose of deallocation and is in the process of updating the device
tree, some other userspace tool could bring the CPU back online by writing to
the "online" sysfs tunable thereby causing the deallocate process to fail.

The solution to this is to serialize writes to the "probe/release" sysfs
tunable with the writes to the "online" sysfs tunable.

This patch employs a mutex to provide this serialization, which is a no-op on
all architectures except PPC_PSERIES

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-09 17:09:36 +11:00
Gautham R Shenoy
b6db63d1a7 pseries/pseries: Add code to online/offline CPUs of a DLPAR node
Currently the cpu-allocation/deallocation on pSeries is a
two step process from the Userspace.

- Set the indicators and update the device tree by writing to the sysfs
  tunable "probe" during allocation and "release" during deallocation.
- Online / Offline the CPUs of the allocated/would_be_deallocated node by
  writing to the sysfs tunable "online".

This patch adds kernel code to online/offline the CPUs soon_after/just_before
they have been allocated/would_be_deallocated. This way, the userspace tool
that performs DLPAR operations would only have to deal with one set of sysfs
tunables namely "probe" and release".

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-09 17:09:35 +11:00
Nathan Fontenot
1a8061c46c powerpc/pseries: Add kernel based CPU DLPAR handling
This patch adds the specific routines to probe and release (add and remove)
cpu resource for the powerpc pseries platform and registers these handlers
with the ppc_md callout structure.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-09 17:09:34 +11:00
Nathan Fontenot
ab519a011c powerpc/pseries: Kernel DLPAR Infrastructure
The Dynamic Logical Partitioning capabilities of the powerpc pseries platform
allows for the addition and removal of resources (i.e. CPU's, memory, and PCI
devices) from a partition. The removal of a resource involves
removing the resource's node from the device tree and then returning the
resource to firmware via the rtas set-indicator call.  To add a resource, it
is first obtained from firmware via the rtas set-indicator call and then a
new device tree node is created using the ibm,configure-coinnector rtas call
and added to the device tree.

This patch provides the kernel DLPAR infrastructure in a new filed named
dlpar.c.  The functionality provided is for acquiring and releasing a resource
from firmware and the parsing of information returned from the
ibm,configure-connector rtas call.  Additionally this exports the pSeries
reconfiguration notifier chain so that it can be invoked when device tree
updates are made.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-09 17:09:32 +11:00
Gautham R Shenoy
3aa565f53c powerpc/pseries: Add hooks to put the CPU into an appropriate offline state
When a CPU is offlined on POWER currently, we call rtas_stop_self() and hand
the CPU back to the resource pool. This path is used for DLPAR which will
cause a change in the LPAR configuration which will be visible outside.

This patch changes the default state a CPU is put into when it is offlined.
On platforms which support ceding the processor to the hypervisor with
latency hint specifier value, during a cpu offline operation,
instead of calling rtas_stop_self(), we cede the vCPU to the hypervisor
while passing a latency hint specifier value. The Hypervisor can use this hint
to provide better energy savings. Also, during the offline
operation, the control of the vCPU remains with the LPAR as oppposed to
returning it to the resource pool.

The patch achieves this by creating an infrastructure to set the
preferred_offline_state() which can be either
- CPU_STATE_OFFLINE: which is the current behaviour of calling
  rtas_stop_self()

- CPU_STATE_INACTIVE: which cedes the vCPU to the hypervisor with the latency
  hint specifier.

The codepath which wants to perform a DLPAR operation can set the
preferred_offline_state() of a CPU to CPU_STATE_OFFLINE before invoking
cpu_down().

The patch also provides a boot-time command line argument to disable/enable
CPU_STATE_INACTIVE.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-24 14:33:04 +11:00
Gautham R Shenoy
69ddb57cbe powerpc/pseries: Add extended_cede_processor() helper function.
This patch provides an extended_cede_processor() helper function
which takes the cede latency hint as an argument. This hint is to be passed
on to the hypervisor to cede to the corresponding state on platforms
which support it.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-24 14:33:03 +11:00
Thomas Gleixner
b27df67248 powerpc: Fixup last users of irq_chip->typename
The typename member of struct irq_chip was kept for migration purposes
and is obsolete since more than 2 years. Fix up the leftovers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@ozlabs.org
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-24 14:32:45 +11:00
Ingo Molnar
0ffa798d94 Merge branches 'perf/powerpc' and 'perf/bench' into perf/core
Merge reason: Both 'perf bench' and the pending PowerPC changes
              are now ready for the next merge window.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-15 09:51:24 +01:00
Benjamin Herrenschmidt
0526484aa3 Merge commit 'origin/master' into next 2009-11-12 10:59:04 +11:00
Andre Detsch
8435b027b8 powerpc/pci: Fix regression in powerpc MSI-X
Patch f598282f51 exposed a problem in
powerpc MSI-X functionality, making network interfaces such as ixgbe
and cxgb3 stop to work when MSI-X is enabled. RX interrupts were not
being generated.

The problem was caused because MSI irq was not being effectively
unmasked after device initialization.

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 17:06:27 +11:00
Brian King
8be8cf5b47 powerpc: Add kdump support to Collaborative Memory Manager
When running Active Memory Sharing, the Collaborative Memory Manager (CMM)
may mark some pages as "loaned" with the hypervisor. Periodically, the
CMM will query the hypervisor for a loan request, which is a single signed
value. When kexec'ing into a kdump kernel, the CMM driver in the kdump
kernel is not aware of the pages the previous kernel had marked as "loaned",
so the hypervisor and the CMM driver are out of sync. Fix the CMM driver
to handle this scenario by ignoring requests to decrease the number of loaned
pages if we don't think we have any pages loaned. Pages that are marked as
"loaned" which are not in the balloon will automatically get switched to "active"
the next time we touch the page. This also fixes the case where totalram_pages
is smaller than min_mem_mb, which can occur during kdump.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:56 +11:00
Michael Ellerman
6cff46f4bc powerpc: Remove get_irq_desc()
get_irq_desc() is a powerpc-specific version of irq_to_desc(). That
is reason enough to remove it, but it also doesn't know about sparse
irq_desc support which irq_to_desc() does (when we enable it).

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:55 +11:00
Michael Ellerman
59e3f83702 powerpc/pseries: Use irq_has_action() in eeh_disable_irq()
Rather than open-coding our own check, use irq_has_action()
to check if an irq has an action - ie. is "in use".

irq_has_action() doesn't take the descriptor lock, but it
shouldn't matter - we're just using it as an indicator
that the irq is in use. disable_irq_nosync() will take
the descriptor lock before doing anything also.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:54 +11:00
Benjamin Herrenschmidt
3d541c4b7f powerpc/chrp: Use the same RTAS daemon as pSeries
The CHRP code has some fishy timer based code to scan the RTAS event
log, which uses a 1KB stack buffer and doesn't even use the results.

The pSeries code as a nicer daemon that allows userspace to read the
event log and basically uses the same RTAS interface

This patch moves rtasd.c out of platform/pseries and makes it usable
by CHRP, after removing the old crufty event log mechanism in there.

The nvram logging part of the daemon is still only available on 64-bit
since the underlying nvram management routines aren't currently shared.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:53 +11:00
Benjamin Herrenschmidt
188917e183 powerpc: Move /proc/ppc64 to /proc/powerpc and add symlink
Some of the stuff in /proc/ppc64 such as the RTAS bits are actually
useful to some 32-bit platforms. Rename the file, and create a
symlink on 64-bit for backward compatibility

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:53 +11:00
Anton Blanchard
6f26353ca2 powerpc: tracing: Give hypervisor call tracepoints access to arguments
While most users of the hcall tracepoints will only want the opcode
and return code, some will want all the arguments.  To avoid the
complexity of using varargs we pass a pointer to the register save
area, which contains all the arguments.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-10-28 16:13:04 +11:00
Anton Blanchard
c8cd093a6e powerpc: tracing: Add hypervisor call tracepoints
Add hcall_entry and hcall_exit tracepoints.  This replaces the inline
assembly HCALL_STATS code and converts it to use the new tracepoints.

To keep the disabled case as quick as possible, we embed a status word
in the TOC so we can get at it with a single load.  By doing so we
keep the overhead at a minimum.  Time taken for a null hcall:

No tracepoint code:	135.79 cycles
Disabled tracepoints:	137.95 cycles

For reference, before this patch enabling HCALL_STATS resulted in a null
hcall of 201.44 cycles!

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-10-28 16:13:04 +11:00
Anton Blanchard
b6dcde5c74 powerpc: Fix hypervisor TLB batching
Profiling of a page fault scalability microbenchmark shows flush_hash_range
is not calling the batch hpte invalidate hcall (H_BULK_REMOVE).

It turns out we have a duplicate firmware feature for hcall-bulk and the
current setup code stops after finding the first match. This meant we never
batch and always do individual invalidates.

The patch below removes the duplicate and shifts FW_FEATURE_CMO to close
the gap. With the patch applied the single threaded page fault rate improves
from 217169 to 238755 per second on a POWER5 test box, a 10% improvement.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-14 16:58:37 +11:00
Alexey Dobriyan
828c09509b const: constify remaining file_operations
[akpm@linux-foundation.org: fix KVM]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01 16:11:11 -07:00
Becky Bruce
738ef42e32 powerpc: Change archdata dma_data to a union
Sometimes this is used to hold a simple offset, and sometimes
it is used to hold a pointer.  This patch changes it to a union containing
void * and dma_addr_t.  get/set accessors are also provided, because it was
getting a bit ugly to get to the actual data.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24 15:31:43 +10:00
Rusty Russell
ea0f1cab6e cpumask: Use accessors for cpu_*_mask: powerpc
Use the accessors rather than frobbing bits directly (the new versions
are const).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
2009-09-24 09:34:48 +09:30
James Morris
88e9d34c72 seq_file: constify seq_operations
Make all seq_operations structs const, to help mitigate against
revectoring user-triggerable function pointers.

This is derived from the grsecurity patch, although generated from scratch
because it's simpler than extracting the changes from there.

Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 07:39:29 -07:00
Linus Torvalds
4406c56d0a Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (75 commits)
  PCI hotplug: clean up acpi_run_hpp()
  PCI hotplug: acpiphp: use generic pci_configure_slot()
  PCI hotplug: shpchp: use generic pci_configure_slot()
  PCI hotplug: pciehp: use generic pci_configure_slot()
  PCI hotplug: add pci_configure_slot()
  PCI hotplug: clean up acpi_get_hp_params_from_firmware() interface
  PCI hotplug: acpiphp: don't cache hotplug_params in acpiphp_bridge
  PCI hotplug: acpiphp: remove superfluous _HPP/_HPX evaluation
  PCI: Clear saved_state after the state has been restored
  PCI PM: Return error codes from pci_pm_resume()
  PCI: use dev_printk in quirk messages
  PCI / PCIe portdrv: Fix pcie_portdrv_slot_reset()
  PCI Hotplug: convert acpi_pci_detect_ejectable() to take an acpi_handle
  PCI Hotplug: acpiphp: find bridges the easy way
  PCI: pcie portdrv: remove unused variable
  PCI / ACPI PM: Propagate wake-up enable for devices w/o ACPI support
  ACPI PM: Replace wakeup.prepared with reference counter
  PCI PM: Introduce device flag wakeup_prepared
  PCI / ACPI PM: Rework some debug messages
  PCI PM: Simplify PCI wake-up code
  ...

Fixed up conflict in arch/powerpc/kernel/pci_64.c due to OF device tree
scanning having been moved and merged for the 32- and 64-bit cases.  The
'needs_freset' initialization added in 6e19314cc ("PCI/powerpc: support
PCIe fundamental reset") is now in arch/powerpc/kernel/pci_of_scan.c.
2009-09-16 07:49:54 -07:00
Paul Mackerras
a6dbf93a2a powerpc: Fix bug where perf_counters breaks oprofile
Currently there is a bug where if you use oprofile on a pSeries
machine, then use perf_counters, then use oprofile again, oprofile
will not work correctly; it will lose the PMU configuration the next
time the hypervisor does a partition context switch, and thereafter
won't count anything.

Maynard Johnson identified the sequence causing the problem:
- oprofile setup calls ppc_enable_pmcs(), which calls
  pseries_lpar_enable_pmcs, which tells the hypervisor that we want
  to use the PMU, and sets the "PMU in use" flag in the lppaca.
  This flag tells the hypervisor whether it needs to save and restore
  the PMU config.
- The perf_counter code sets and clears the "PMU in use" flag directly
  as it context-switches the PMU between tasks, and leaves it clear
  when it finishes.
- oprofile setup, called for a new oprofile run, calls ppc_enable_pmcs,
  which does nothing because it has already been called.  In particular
  it doesn't set the "PMU in use" flag.

This fixes the problem by arranging for ppc_enable_pmcs to always set
the "PMU in use" flag.  It makes the perf_counter code call
ppc_enable_pmcs also rather than calling the lower-level function
directly, and removes the setting of the "PMU in use" flag from
pseries_lpar_enable_pmcs, since that is now done in its caller.

This also removes the declaration of pasemi_enable_pmcs because it
isn't defined anywhere.

Reported-by: Maynard Johnson <mpjohn@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: <stable@kernel.org)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-11 11:27:58 +10:00
Mike Mason
6e19314cc9 PCI/powerpc: support PCIe fundamental reset
By default, the EEH framework on powerpc does what's known as a "hot
reset" during recovery of a PCI Express device.  We've found a case
where the device needs a "fundamental reset" to recover properly.  The
current PCI error recovery and EEH frameworks do not support this
distinction.

The attached patch makes changes to EEH to utilize the new bit field.

Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Signed-off-by: Richard Lary <rlary@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:29:41 -07:00
Brian King
46db2f86a3 powerpc/pseries: Fix to handle slb resize across migration
The SLB can change sizes across a live migration, which was not
being handled, resulting in possible machine crashes during
migration if migrating to a machine which has a smaller max SLB
size than the source machine. Fix this by first reducing the
SLB size to the minimum possible value, which is 32, prior to
migration. Then during the device tree update which occurs after
migration, we make the call to ensure the SLB gets updated. Also
add the slb_size to the lparcfg output so that the migration
tools can check to make sure the kernel has this capability
before allowing migration in scenarios where the SLB size will change.

BenH: Fixed #include <asm/mmu-hash64.h> -> <asm/mmu.h> to avoid
      breaking ppc32 build

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-02 16:19:01 +10:00
Grant Likely
0ed2c722c6 powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan()
The two versions are doing almost exactly the same thing.  No need to
maintain them as separate files.  This patch also has the side effect
of making the PCI device tree scanning code available to 32 bit powerpc
machines, but no board ports actually make use of this feature at this
point.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-02 15:45:53 +10:00
Benjamin Herrenschmidt
cf54dc7cd4 powerpc: Move definitions of secondary CPU spinloop to header file
Those definitions are currently declared extern in the .c file where
they are used, move them to a header file instead.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:44 +10:00
Michael Ellerman
b69e9e931d powerpc/pseries: Use pr_devel() in xics.c
pr_debug() can now result in code being generated even when DEBUG
is not defined. That's not really desirable in some places.

With CONFIG_DYNAMIC_DEBUG=y:

size before:
   text    data     bss     dec     hex filename
   7720    5488     296   13504    34c0 platforms/pseries/xics.o

size after:
   text    data     bss     dec     hex filename
   7535	   5456	    296	  13287	   33e7	platforms/pseries/xics.o

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08 13:50:21 +10:00
Michael Ellerman
551a232c87 powerpc/pseries: Use pr_devel() in pseries LPAR HPTE routines
pr_debug() can now result in code being generated even when DEBUG
is not defined. That's not really desirable in some places.

In particular, pSeries_lpar_hpte_insert() goes from 185 instructions
to 77 instructions as a result of this patch. Luckily that code
isn't called very often ...

With CONFIG_DYNAMIC_DEBUG=y:

size before:
   text    data     bss     dec     hex filename
   7284    1552     296    9132    23ac platforms/pseries/lpar.o

size after:
   text    data     bss     dec     hex filename
   5806    1096     296    7198    1c1e platforms/pseries/lpar.o

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08 13:50:21 +10:00
Benjamin Herrenschmidt
c4007a2fbf powerpc: Use one common impl. of RTAS timebase sync and use raw spinlock
Several platforms use their own copy of what is essentially the same code,
using RTAS to synchronize the timebases when bringing up new CPUs. This
moves it all into a single common implementation and additionally
turns the spinlock into a raw spinlock since the former can rely on
the timebase not being frozen when spinlock debugging is enabled, and finally
masks interrupts while the timebase is disabled.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-26 16:55:25 +10:00
Linus Torvalds
59ef7a83f1 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (74 commits)
  PCI: make msi_free_irqs() to use msix_mask_irq() instead of open coded write
  PCI: Fix the NIU MSI-X problem in a better way
  PCI ASPM: remove get_root_port_link
  PCI ASPM: cleanup pcie_aspm_sanity_check
  PCI ASPM: remove has_switch field
  PCI ASPM: cleanup calc_Lx_latency
  PCI ASPM: cleanup pcie_aspm_get_cap_device
  PCI ASPM: cleanup clkpm checks
  PCI ASPM: cleanup __pcie_aspm_check_state_one
  PCI ASPM: cleanup initialization
  PCI ASPM: cleanup change input argument of aspm functions
  PCI ASPM: cleanup misc in struct pcie_link_state
  PCI ASPM: cleanup clkpm state in struct pcie_link_state
  PCI ASPM: cleanup latency field in struct pcie_link_state
  PCI ASPM: cleanup aspm state field in struct pcie_link_state
  PCI ASPM: fix typo in struct pcie_link_state
  PCI: drivers/pci/slot.c should depend on CONFIG_SYSFS
  PCI: remove redundant __msi_set_enable()
  PCI PM: consistently use type bool for wake enable variable
  x86/ACPI: Correct maximum allowed _CRS returned resources and warn if exceeded
  ...
2009-06-22 11:59:51 -07:00
Zhang, Yanmin
70298c6e6c PCI AER: support Multiple Error Received and no error source id
Based on PCI Express AER specs, a root port might receive multiple
TLP errors while it could only save a correctable error source id
and an uncorrectable error source id at the same time. In addition,
some root port hardware might be unable to provide a correct source
id, i.e., the source id, or the bus id part of the source id provided
by root port might be equal to 0.

The patchset implements the support in kernel by searching the device
tree under the root port.

Patch 1 changes parameter cb of function pci_walk_bus to return a value.
When cb return non-zero, pci_walk_bus stops more searching on the
device tree.

Reviewed-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-06-16 14:30:13 -07:00
Benjamin Herrenschmidt
bc47ab0241 Merge commit 'origin/master' into next
Manual merge of:
	arch/powerpc/kernel/asm-offsets.c
2009-06-12 16:53:38 +10:00
Stephen Rothwell
41febbc829 powerpc/pseries: Fix warnings when printing resource_size_t
resource_size_t is 64 bits on pseries

Gets rid of these warnings:

arch/powerpc/platforms/pseries/iommu.c: In function 'pci_dma_bus_setup_pSeries':
arch/powerpc/platforms/pseries/iommu.c:391: warning: format '%lx' expects type 'long unsigned int', but argument 2 has type 'resource_size_t'
arch/powerpc/platforms/pseries/iommu.c:417: warning: format '%lx' expects type 'long unsigned int', but argument 2 has type 'resource_size_t'

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-09 16:47:39 +10:00
Anton Blanchard
f8729e8531 powerpc: Convert RTAS event scan from kernel thread to workqueue
RTAS event scan has to run across all cpus. Right now we use a kernel
thread and set_cpus_allowed but in doing so we wake up the previous cpu
unnecessarily.

Some ftrace output shows this:

previous cpu (2):
[002]  7.022331: sched_switch: task swapper:0 [140] ==> rtasd:194 [120]
[002]  7.022338: sched_switch: task rtasd:194 [120] ==> migration/2:9 [0]
[002]  7.022344: sched_switch: task migration/2:9 [0] ==> swapper:0 [140]

next cpu (3):
[003]  7.022345: sched_switch: task swapper:0 [140] ==> rtasd:194 [120]
[003]  7.022371: sched_switch: task rtasd:194 [120] ==> swapper:0 [140]

We can use schedule_delayed_work_on and avoid the unnecessary wakeup.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-02 10:35:32 +10:00
Kumar Gala
2eb4afb69f powerpc/pci: Move pseries code into pseries platform specific area
There doesn't appear to be any specific reason that we need to setup the
pseries specific notifier in generic arch pci code.  Move it into pseries
land.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:44:24 +10:00
Robert Jennings
14f966e794 powerpc/pseries: CMO unused page hinting
Adds support for the "unused" page hint which can be used in shared
memory partitions to flag pages not in use, which will then be stolen
before active pages by the hypervisor when memory needs to be moved to
LPARs in need of additional memory.  Failure to mark pages as 'unused'
makes the LPAR slower to give up unused memory to other partitions.

This adds the kernel parameter 'cmo_free_hint' to disable this
functionality.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:43:58 +10:00
Yinghai Lu
d5dedd4507 irq: change ->set_affinity() to return status
according to Ingo, change set_affinity() in irq_chip should return int,
because that way we can handle failure cases in a much cleaner way, in
the genirq layer.

v2: fix two typos

[ Impact: extend API ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-arch@vger.kernel.org
LKML-Reference: <49F654E9.4070809@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-28 12:21:16 +02:00
Sachin Sant
b71a0c296c powerpc: pseries/dtl.c should include asm/firmware.h
A randconfig build on powerpc failed with:

dtl.c: In function 'dtl_init':
dtl.c:238: error: implicit declaration of function 'firmware_has_feature'
dtl.c:238: error: 'FW_FEATURE_SPLPAR' undeclared (first use in this function)

- We need firmware.h for these definitions.

Signed-off-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-15 15:23:55 +10:00
Mike Mason
c58dc575f3 powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset()
While adding native EEH support to Emulex and Qlogic drivers, it was
discovered that dev->error_state was set to pci_io_channel_normal too
late in the recovery process. These drivers rely on error_state to
determine if they can access the device in their slot_reset callback,
thus error_state needs to be set to pci_io_channel_normal in
eeh_report_reset(). Below is a detailed explanation (courtesy of Richard
Lary) as to why this is necessary.

Background:
PCI MMIO or DMA accesses to a frozen slot generate additional EEH
errors. If the number of additional EEH errors exceeds EEH_MAX_FAILS the
adapter will be shutdown. To avoid triggering excessive EEH errors and
an undesirable adapter shutdown, some drivers use the
pci_channel_offline(dev) wrapper function to return a Boolean value
based on the value of pci_dev->error_state to determine if PCI MMIO or
DMA accesses are safe. If the wrapper returns TRUE, drivers must not
make PCI MMIO or DMA access to their hardware.

The pci_dev structure member error_state reflects one of three values,
1) pci_channel_io_normal, 2) pci_channel_io_frozen, 3)
pci_channel_io_perm_failure.  Function pci_channel_offline(dev) returns
TRUE if error_state is pci_channel_io_frozen or pci_channel_io_perm_failure.

The EEH driver sets pci_dev->error_state to pci_channel_io_frozen at the
point where the PCI slot is frozen. Currently, the EEH driver restores
dev->error_state to pci_channel_io_normal in eeh_report_resume() before
calling the driver's resume callback. However, when the EEH driver calls
the driver's slot_reset callback() from eeh_report_reset(), it
incorrectly indicates the error state is still pci_channel_io_frozen.

Waiting until eeh_report_resume() to restore dev->error_state to
pci_channel_io_normal is too late for Emulex and QLogic FC drivers and
any other drivers which are designed to use common code paths in these
two cases: i) those called after the driver's slot_reset callback() and
ii) those called after the PCI slot is frozen but before the driver's
slot_reset callback is called. Case i) all driver paths executed to
reinitialize the hardware after a reset and case ii) all code paths
executed by driver kernel threads that run asynchronous to the main
driver thread, such as interrupt handlers and worker threads to process
driver work queues.

Emulex and QLogic FC drivers are designed with common code paths which
require that pci_channel_offline(dev) reflect the true state of the
hardware. The state transitions that the hardware takes from Normal
Operations to Slot Frozen to Reset to Normal Operations are documented
in the Power Architecture™ Platform Requirements+ (PAPR+) in Table 75.
PE State Control.

PAPR defines the following 3 states:

0 -- Not reset, Not EEH stopped, MMIO load/store allowed, DMA allowed
     (Normal Operations)
1 -- Reset, Not EEH stopped, MMIO load/store disabled, DMA disabled
2 -- Not reset, EEH stopped, MMIO load/store disabled, DMA disabled
     (Slot Frozen)

An EEH error places the slot in state 2 (Frozen) and the adapter driver
is notified that an EEH error was detected. If the adapter driver
returns PCI_ERS_RESULT_NEED_RESET, the EEH driver calls
eeh_reset_device() to place the slot into state 1 (Reset) and
eeh_reset_device completes by placing the slot into State 0 (Normal
Operations). Upon return from eeh_reset_device(), the EEH driver calls
eeh_report_reset, which then calls the adapter's slot_reset callback. At
the time the adapter's slot_reset callback is called, the true state of
the hardware is Normal Operations and should be accurately reflected by
setting dev->error_state to pci_channel_io_normal.

The current implementation of EEH driver does not do so and requires
this change to correct this deficiency.

Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Acked-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-15 15:23:53 +10:00
Benjamin Herrenschmidt
9ff9a26b78 Merge commit 'origin/master' into next
Manual merge of:
	arch/powerpc/include/asm/elf.h
	drivers/i2c/busses/i2c-mpc.c
2009-03-30 14:04:53 +11:00
Jeremy Kerr
82631f5dd1 powerpc: Add write barrier before enabling DTL flags
Currently, we don't enforce any ordering for updates to the lppaca
when enabling dtl logging, so we may end up enabling logging before the
index fields have been established.

This change adds a smp_wmb() before doing the actual enable.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-27 16:58:23 +11:00
Jeremy Kerr
fc59a3fc8e powerpc: Add virtual processor dispatch trace log
pseries SPLPAR machines are able to retrieve a log of dispatch and
preempt events from the hypervisor. With this information, we can
see when and why each dispatch & preempt is occuring.

This change adds a set of debugfs files allowing userspace to read this
dispatch log.

Based on initial patches from Nishanth Aravamudan <nacc@us.ibm.com>.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:28 +11:00
Nathan Fontenot
c5785f9e1c powerpc/pseries: Failed reconfig notifier chain call cleanup
The return code from invoking the notifier chain when updating the
ibm,dynamic-memory property is not handled properly. In failure
cases (rc == NOTIFY_BAD) we should be restoring the original value
of the property.  In success (rc == NOTIFY_OK) we should be returning
zero from the calling routine.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:43:52 +11:00
Benjamin Herrenschmidt
28794d34ec powerpc/kconfig: Kill PPC_MULTIPLATFORM
CONFIG_PPC_MULTIPLATFORM is a remain of the pre-powerpc days and isn't
really meaningful anymore. It was basically equivalent to PPC64 || 6xx.

This removes it along with the following changes:

 - 32-bit platforms that relied on PPC32 && PPC_MULTIPLATFORM now rely
   on 6xx which is what they want anyway.

 - A new symbol, PPC_BOOK3S, is defined that represent compliance with
   the "Server" variant of the architecture. This is set when either 6xx
   or PPC64 is set and open the door for future BOOK3E 64-bit.

 - 64-bit platforms that relied on PPC64 && PPC_MULTIPLATFORM now use
   PPC64 && PPC_BOOK3S

 - A separate and selectable CONFIG_PPC_OF_BOOT_TRAMPOLINE option is now
   used to control the use of prom_init.c

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:11:35 +11:00