linux/drivers
NeilBrown dfc7064500 md: restart recovery cleanly after device failure.
When we get any IO error during a recovery (rebuilding a spare), we abort
the recovery and restart it.

For RAID6 (and multi-drive RAID1) it may not be best to restart at the
beginning: when multiple failures can be tolerated, the recovery may be
able to continue and re-doing all that has already been done doesn't make
sense.

We already have the infrastructure to record where a recovery is up to
and restart from there, but it is not being used properly.
This is because:
  - We sometimes abort with MD_RECOVERY_ERR rather than just MD_RECOVERY_INTR,
    which causes the recovery not to be checkpointed.
  - We remove spares and then re-add them, which loses important state
    information.

The distinction between MD_RECOVERY_ERR and MD_RECOVERY_INTR really isn't
needed.  If there is an error, the relevant drive will be marked as
Faulty, and that is enough to ensure correct handling of the error.  So we
first remove MD_RECOVERY_ERR, changing some of the uses of it to
MD_RECOVERY_INTR.

Then we make any attempt to remove a non-faulty device from an array
fail (unless recovery is impossible because the array is too degraded).
So when remove_and_add_spares attempts to remove the devices on which
recovery can continue, the removal fails, they remain in place, and
recovery continues on them as desired.
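
The removal rule described above can be sketched roughly as follows.  This
is a simplified model and not the kernel code: the struct names, the
fields and sketch_remove_disk() are all hypothetical, and the "too
degraded" test is assumed to be a simple comparison against the number of
failures the RAID level tolerates.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's data structures. */
struct sketch_dev {
	bool faulty;   /* device has reported an I/O error */
	bool present;  /* device currently occupies a slot in the array */
};

struct sketch_array {
	int degraded;      /* members that are missing or not yet in sync */
	int max_degraded;  /* failures the level tolerates (assumed: 2 for RAID6) */
};

/*
 * Try to remove a device from the array.
 * Returns 0 on success, or -EBUSY if the device is healthy and
 * recovery onto it is still possible.
 */
int sketch_remove_disk(const struct sketch_array *a, struct sketch_dev *d)
{
	assert(a != NULL && d != NULL);

	if (!d->present)
		return 0;	/* nothing to remove */

	/* Refuse to drop a non-faulty device while the array can still recover. */
	if (!d->faulty && a->degraded <= a->max_degraded)
		return -EBUSY;

	d->present = false;
	return 0;
}
```

In this model a caller that tries to remove every device simply sees
-EBUSY for the ones that are still useful, so they stay in place and keep
whatever recovery state they had.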

Issue:  If we are halfway through rebuilding a spare and another drive
fails, and a new spare is immediately available,  do we want to:
 1/ complete the current rebuild, then go back and rebuild the new spare or
 2/ restart the rebuild from the start and rebuild both devices in
    parallel.

Both options can be argued for.  The code currently takes option 2 as
  a/ this requires the least code change
  b/ this results in a minimally-degraded array in minimal time.
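
One way to picture option 2: if each device being rebuilt carries its own
checkpoint, a combined rebuild has to start from the smallest of them, and
a freshly added spare has a checkpoint of zero.  The sketch below assumes
that model; struct sketch_rebuild, its recovery_offset field and
sketch_resume_point() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one entry per device that is being rebuilt. */
struct sketch_rebuild {
	uint64_t recovery_offset;  /* sectors already rebuilt on this device */
};

/*
 * Return the sector at which a combined rebuild of all listed devices
 * must start: the smallest checkpoint among them, or 0 if the list is
 * empty.
 */
uint64_t sketch_resume_point(const struct sketch_rebuild *devs, size_t n)
{
	uint64_t start = UINT64_MAX;
	size_t i;

	assert(devs != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (devs[i].recovery_offset < start)
			start = devs[i].recovery_offset;

	return n ? start : 0;
}
```

With a half-rebuilt spare alone the rebuild resumes at its checkpoint;
once a second, empty spare joins, the minimum drops to zero and both
devices are rebuilt in parallel from the start.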

Cc: "Eivind Sarto" <ivan@kasenna.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:10 -07:00
..
accessibility Kconfig: improved help for CONFIG_ACCESSIBILITY 2008-05-08 10:46:55 -07:00
acorn/char
acpi acpi: fix integer as NULL pointer warning 2008-05-23 08:11:06 -07:00
amba
ata drivers/ata: trim trailing whitespace 2008-05-19 17:56:10 -04:00
atm drivers/atm/: remove CVS keywords 2008-05-20 14:52:25 -07:00
auxdisplay
base Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-05-20 17:23:03 -07:00
block [POWERPC] iSeries: Remove unused mail address 2008-05-23 16:45:04 +10:00
bluetooth hci_usb.h: fix hard-to-trigger race 2008-05-02 16:45:10 -07:00
cdrom [POWERPC] iSeries: Remove unused mail address 2008-05-23 16:45:04 +10:00
char remove debug printk from DRM suspend path 2008-05-23 08:53:13 -07:00
clocksource
connector
cpufreq [CPUFREQ] clarify license of freq_table.c 2008-05-22 16:38:03 -04:00
cpuidle
crypto
dca
dio
dma iop-adma: fixup some kzalloc/memset confusions 2008-05-20 13:51:20 -07:00
edac dev_name introduction fall out fix 2008-05-05 15:08:38 -07:00
eisa
firewire firewire: prevent userspace from accessing shut down devices 2008-05-20 18:24:17 +02:00
firmware edd: add default mode CONFIG_EDD_OFF=n, override with edd={on,off} 2008-04-29 08:06:23 -07:00
gpio gpio: pca953x: add support for pca9555 I2C I/O expander 2008-05-01 08:04:01 -07:00
hid HID: remove CVS keywords 2008-05-20 16:44:43 +02:00
hwmon ibmaem: new driver for power/energy/temp meters in IBM System X hardware 2008-05-24 09:56:08 -07:00
i2c i2c/max6875: Really prevent 24RF08 corruption 2008-05-18 20:49:41 +02:00
ide ide: fix race in device_create 2008-05-20 13:31:54 -07:00
ieee1394 ieee1394: sbp2: use correct size of command descriptor block 2008-05-20 18:24:17 +02:00
infiniband Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband 2008-05-23 11:11:44 -07:00
input MODULE_LICENSE expects "GPL v2", not "GPLv2" 2008-05-21 16:56:00 -07:00
isdn isdn: fix integer as NULL pointer warning 2008-05-23 08:11:06 -07:00
leds LEDS: fix race in device_create 2008-05-20 13:31:55 -07:00
lguest lguest: make Launcher see device status updates 2008-05-02 21:50:54 +10:00
macintosh [POWERPC] macintosh: Replace deprecated __initcall with device_initcall 2008-05-15 20:50:00 +10:00
mca proc: remove proc_root from drivers 2008-04-29 08:06:18 -07:00
md md: restart recovery cleanly after device failure. 2008-05-24 09:56:10 -07:00
media missing dependencies on HAS_DMA 2008-05-21 16:55:59 -07:00
memstick
message Remove duplicated unlikely() in IS_ERR() 2008-04-29 08:06:25 -07:00
mfd HTC_EGPIO is ARM-only 2008-05-21 16:56:00 -07:00
misc drivers/misc/sgi-xp: replace partid_t with a short 2008-05-13 08:02:23 -07:00
mmc missing dependencies on HAS_DMA 2008-05-21 16:55:59 -07:00
mtd mtd: solutionengine flash map depends on solution engine mach group. 2008-05-08 19:51:40 +09:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-05-21 22:14:39 -07:00
nubus proc: convert /proc/bus/nubus to seq_file interface 2008-04-29 08:06:19 -07:00
of [POWERPC] Add null pointer check to of_find_property 2008-05-15 20:49:49 +10:00
oprofile oprofile: don't request cache line alignment for cpu_buffer 2008-05-14 19:11:12 -07:00
parisc drivers/parisc: replace remaining __FUNCTION__ occurrences 2008-05-15 10:38:54 -04:00
parport debugobjects: add timer specific object debugging code 2008-04-30 08:29:53 -07:00
pci Clean up 'print_fn_descriptor_symbol()' types 2008-05-15 17:50:37 -07:00
pcmcia pcmcia: replace remaining __FUNCTION__ occurrences 2008-05-01 08:04:00 -07:00
pnp Clean up 'print_fn_descriptor_symbol()' types 2008-05-15 17:50:37 -07:00
power Power Supply: fix race in device_create 2008-05-20 13:31:55 -07:00
ps3 [POWERPC] PS3: Remove unsupported wakeup sources 2008-05-02 15:00:44 +10:00
rapidio [RAPIDIO] Auto-probe the RapidIO system size 2008-04-29 19:40:28 +10:00
rtc rtc: m41t80: include <linux/kernel.h> for printk() 2008-05-13 08:02:26 -07:00
s390 s390: fix race in device_create 2008-05-20 13:31:56 -07:00
sbus sbus bpp: instances missed in s/dev_name/bpp_dev_name/ 2008-05-21 16:55:59 -07:00
scsi scsi: fix integer as NULL pointer warning 2008-05-23 08:11:07 -07:00
serial serial: support for InstaShield IS-400 four port RS-232 PCI card 2008-05-24 09:56:09 -07:00
sh
sn
spi mpc5200_psc_spi: typo fix in header block 2008-05-14 19:11:12 -07:00
ssb
tc
telephony
thermal thermal: re-name thermal.c to thermal_sys.c 2008-04-29 03:12:17 -04:00
uio UIO: fix race in device_create 2008-05-20 13:31:55 -07:00
usb Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6 2008-05-20 17:20:49 -07:00
video fbdev: fix integer as NULL pointer warning 2008-05-23 08:11:07 -07:00
virtio virtio: explicit advertisement of driver features 2008-05-02 21:50:50 +10:00
w1 drivers: replace remaining __FUNCTION__ occurrences 2008-04-30 08:29:53 -07:00
watchdog
xen
zorro zorro: use non-racy method for proc entries creation 2008-04-29 08:06:21 -07:00
Kconfig Basic braille screen reader support 2008-04-30 08:29:52 -07:00
Makefile Basic braille screen reader support 2008-04-30 08:29:52 -07:00