linux/arch
Hidetoshi Seto 68cb14c7c4 [IA64] kdump: Don't return APs to SAL from kdump
Summary:

  Asserting INIT on cpu going to be offline will result in unexpected
  behavior.  It will be a real problem in kdump cases where INIT might
  be asserted to unstable APs going to be offline by returning to SAL.

Description:

  Since psr.mc is cleared when bits in psr are set to SAL_PSR_BITS_TO_SET
  in ia64_jump_to_sal(), there is a small window (~few msecs) that the
  cpu can receive INIT even if the cpu enter there via INIT handler.
  In this window we do restore of registers for SAL, so INIT asserted
  here will not work properly.

  It is hard to remove this window by masking INIT (i.e. setting psr.mc)
  because we have to unmask it later in OS, because we have to use branch
  instruction (br.ret, not rfi) to return SAL, due to OS_BOOT_RENDEZ to
  SAL return convention.

  I suppose this window will not be a real problem on cpu offline if we
  can educate people not to push INIT button during hotplug operation.
  However, only exception is a race in kdump and INIT.  Now kdump returns
  APs to SAL before processing dump, but the kernel might receive INIT at
  that point in time.  Such INIT might be asserted by kdump itself if an
  AP doesn't react IPI soon and kdump decided to use INIT to stop the AP.
  Or it might be asserted by operator or an external agent to start dump
  on the unstable system.

  Such panic+INIT or INIT+INIT cases should be rare, but it will be happy
  if we can retrieve crashdump even in such cases.

How to reproduce:

  panic+INIT or INIT+INIT, with kdump configured

Expected results:

  crashdump is retrieved anyway

Actual results:

  panic, hang etc. (unexpected)

Proposed fix

  To avoid the window on the way to SAL, this patch stops returning APs
  to SAL in case of kdump.  In other words, this patch makes APs spin
  in OS instead of spinning in SAL.

  (* Note: What impact would be there?  If a cpu is spinning in SAL,
   the cpu is in BOOT_RENDEZ loop, as same as offlined cpu.
   In theory if an INIT is asserted there, cpus in the BOOT_RENDEZ loop
   should not invoke OS_INIT on it.  So in either way, no matter where
   the cpu is spinning actually in, once cpu starts spin and act as
   "frozen," INIT on the cpu have no effects.
   From another point of view, all debug information on the cpu should
   have stored to memory before the cpu start to be frozen.  So no more
   action on the cpu is required.)

  I confirmed that the kdump sometime hangs by concurrent INITs (another
  INIT after an INIT), and it doesn't hang after applying this patch.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Haren Myneni <hbabu@us.ibm.com>
Cc: kexec@lists.infradead.org
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-09-14 16:18:37 -07:00
..
alpha mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() 2009-07-27 12:10:38 -07:00
arm [ARM] Kirkwood: enable eSATA on QNAP TS-219P 2009-08-24 11:56:00 -04:00
avr32 Merge git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6 2009-08-24 12:26:48 -07:00
blackfin blackfin: fix wrong CTS inversion 2009-07-20 16:38:44 -07:00
cris mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() 2009-07-27 12:10:38 -07:00
frv mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() 2009-07-27 12:10:38 -07:00
h8300 sched: INIT_PREEMPT_COUNT 2009-07-10 14:24:05 -07:00
ia64 [IA64] kdump: Don't return APs to SAL from kdump 2009-09-14 16:18:37 -07:00
m32r mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() 2009-07-27 12:10:38 -07:00
m68k m68k,m68knommu: Wire up rt_tgsigqueueinfo and perf_counter_open 2009-08-26 23:14:50 +02:00
m68knommu m68k,m68knommu: Wire up rt_tgsigqueueinfo and perf_counter_open 2009-08-26 23:14:50 +02:00
microblaze microblaze: Update Microblaze defconfigs 2009-08-18 11:05:11 +02:00
mips Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus 2009-08-17 13:39:52 -07:00
mn10300 MN10300: includecheck fix: mn10300, pci.h 2009-08-10 08:54:27 -07:00
parisc parisc: fix warning in traps.c 2009-08-28 19:37:20 -10:00
powerpc powerpc: Fix i8259 interrupt driver kernel crash on ML510 2009-09-05 14:58:07 -07:00
s390 [S390] set preferred console based on conmode 2009-08-23 18:10:01 +02:00
sh sh: sh7724 ddr self-refresh changes 2009-08-15 12:58:50 +09:00
sparc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 2009-09-05 13:49:06 -07:00
um mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() 2009-07-27 12:10:38 -07:00
x86 x86: Fix vSMP boot crash 2009-08-26 10:13:17 +02:00
xtensa mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() 2009-07-27 12:10:38 -07:00
.gitignore
Kconfig gcov: add gcov profiling infrastructure 2009-06-18 13:03:57 -07:00