linux

Christian Borntraeger 52a3a05f3a virtio_net: another race with virtio_net and enable_cb	2008-05-30 15:09:45 +10:00

Hello Rusty,

seems that we still have a problem with virtio_net and the enable_cb
callback. During a long-running network stress test with virtio I got the
following oops:

------------[ cut here ]------------
kernel BUG at drivers/virtio/virtio_ring.c:230!
illegal operation: 0001 [#1] SMP
Modules linked in:
CPU: 0 Not tainted 2.6.26-rc2-kvm-00436-gc94c08b-dirty #34
Process netserver (pid: 2582, task: 000000000fbc4c68, ksp: 000000000f42b990)
Krnl PSW : 0704c00180000000 00000000002d0ec8 (vring_enable_cb+0x1c/0x60)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 0000000000000000 0000000000000000 000000000ef3d000 0000000010009800
           0000000000000000 0000000000419ce0 0000000000000080 000000000000007b
           000000000adb5538 000000000ef40900 000000000ef40000 000000000ef40920
           0000000000000000 0000000000000005 000000000029c1b0 000000000fea7d18
Krnl Code: 00000000002d0ebc: a7110001       tmll  %r1,1
           00000000002d0ec0: a7740004       brc   7,2d0ec8
           00000000002d0ec4: a7f40001       brc   15,2d0ec6
          >00000000002d0ec8: a517fffe       nill  %r1,65534
           00000000002d0ecc: 40103000       sth   %r1,0(%r3)
           00000000002d0ed0: 07f0           bcr   15,%r0
           00000000002d0ed2: e31020380004   lg    %r1,56(%r2)
           00000000002d0ed8: a7480000       lhi   %r4,0
Call Trace:
([<000000000029c0fc>] virtnet_poll+0x290/0x3b8)
 [<0000000000333fb8>] net_rx_action+0x9c/0x1b8
 [<00000000001394bc>] __do_softirq+0x74/0x108
 [<000000000010d16a>] do_softirq+0x92/0xac
 [<0000000000139826>] irq_exit+0x72/0xc8
 [<000000000010a7b6>] do_extint+0xe2/0x104
 [<0000000000110508>] ext_no_vtime+0x16/0x1a
Last Breaking-Event-Address:
 [<00000000002d0ec4>] vring_enable_cb+0x18/0x60

I looked into the virtio_net code for some time and I think the following
scenario happened. Please look at virtnet_poll:

	[...]
	/* Out of packets? */
	if (received < budget) {
		netif_rx_complete(vi->dev, napi);
		if (unlikely(!vi->rvq->vq_ops->enable_cb(vi->rvq))
		    && napi_schedule_prep(napi)) {
			vi->rvq->vq_ops->disable_cb(vi->rvq);
			__netif_rx_schedule(vi->dev, napi);
			goto again;
		}
	}

If an interrupt arrives after netif_rx_complete, a second poll routine can
run on a different cpu. The second check for napi_schedule_prep would
prevent any harm in the network stack, but we have called enable_cb possibly
after the disable_cb in skb_recv_done.

	static void skb_recv_done(struct virtqueue *rvq)
	{
		struct virtnet_info *vi = rvq->vdev->priv;
		/* Schedule NAPI, Suppress further interrupts if successful. */
		if (netif_rx_schedule_prep(vi->dev, &vi->napi)) {
			rvq->vq_ops->disable_cb(rvq);
			__netif_rx_schedule(vi->dev, &vi->napi);
		}
	}

That means that the second poll routine runs with interrupts enabled, which
is ok, since we can handle additional interrupts. The problem is now that
the second poll routine might also call enable_cb, triggering the BUG.
The only solution I can come up with is to remove the BUG statement in
enable_cb - similar to disable_cb.

Opinions or better ideas where the oops could come from?

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
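The proposed fix, making enable_cb tolerate being called while callbacks are already enabled, can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: `struct toy_vring`, `toy_disable_cb` and `toy_enable_cb` are hypothetical names invented for illustration, the flag value 1 is inferred from the `tmll %r1,1` test in the oops above, and the memory barrier a real driver would need between clearing the flag and re-checking the ring is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* "Do not interrupt me" bit in the avail flags. The value 1 is inferred from
 * the "tmll %r1,1" / "nill %r1,65534" pair in the oops; the name is assumed
 * to follow the virtio ring header. */
#define VRING_AVAIL_F_NO_INTERRUPT 1

/* Hypothetical, heavily simplified stand-in for the real virtqueue state. */
struct toy_vring {
	uint16_t avail_flags;   /* models the avail ring's flags field */
	unsigned int used_idx;  /* buffers the host has handed back */
	unsigned int last_seen; /* buffers the guest has already consumed */
};

/* Suppress callbacks. Setting an already-set bit is harmless. */
static void toy_disable_cb(struct toy_vring *vq)
{
	vq->avail_flags |= VRING_AVAIL_F_NO_INTERRUPT;
}

/* Re-enable callbacks without asserting that they were disabled before,
 * so two racing pollers may both call this safely.
 * Returns false when used buffers are still pending, which tells the
 * caller to keep polling. (A real driver needs a memory barrier between
 * the flag update and the pending check; omitted in this model.) */
static bool toy_enable_cb(struct toy_vring *vq)
{
	vq->avail_flags &= (uint16_t)~VRING_AVAIL_F_NO_INTERRUPT;
	return vq->used_idx == vq->last_seen;
}
```

In this model a second toy_enable_cb call on an already-enabled queue simply clears a bit that is already clear, which mirrors how disable_cb is described as behaving in the message above.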
accessibility	Kconfig: improved help for CONFIG_ACCESSIBILITY	2008-05-08 10:46:55 -07:00
acorn/char
acpi	acpi: fix integer as NULL pointer warning	2008-05-23 08:11:06 -07:00
amba
ata	drivers/ata: trim trailing whitespace	2008-05-19 17:56:10 -04:00
atm	drivers/atm/: remove CVS keywords	2008-05-20 14:52:25 -07:00
auxdisplay
base	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	2008-05-20 17:23:03 -07:00
block	virtio_blk: allow read-only disks	2008-05-30 15:09:44 +10:00
bluetooth	hci_usb.h: fix hard-to-trigger race	2008-05-02 16:45:10 -07:00
cdrom	[POWERPC] iSeries: Remove unused mail address	2008-05-23 16:45:04 +10:00
char	virtio: An entropy device, as suggested by hpa.	2008-05-30 15:09:44 +10:00
clocksource
connector
cpufreq	[CPUFREQ] clarify license of freq_table.c	2008-05-22 16:38:03 -04:00
cpuidle
crypto
dca
dio
dma	iop-adma: fixup some kzalloc/memset confusions	2008-05-20 13:51:20 -07:00
edac	edac: mpc85xx: fix building as a module	2008-05-24 09:56:13 -07:00
eisa
firewire	firewire: prevent userspace from accessing shut down devices	2008-05-20 18:24:17 +02:00
firmware
gpio	gpiolib: fix off by one errors	2008-05-24 09:56:11 -07:00
hid	HID: remove CVS keywords	2008-05-20 16:44:43 +02:00
hwmon	ibmaem: new driver for power/energy/temp meters in IBM System X hardware	2008-05-24 09:56:08 -07:00
i2c	i2c/max6875: Really prevent 24RF08 corruption	2008-05-18 20:49:41 +02:00
ide	ide: fix race in device_create	2008-05-20 13:31:54 -07:00
ieee1394	ieee1394: sbp2: use correct size of command descriptor block	2008-05-20 18:24:17 +02:00
infiniband	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband	2008-05-23 11:11:44 -07:00
input	MODULE_LICENSE expects "GPL v2", not "GPLv2"	2008-05-21 16:56:00 -07:00
isdn	isdn: fix integer as NULL pointer warning	2008-05-23 08:11:06 -07:00
leds	LEDS: fix race in device_create	2008-05-20 13:31:55 -07:00
lguest	virtio: set device index in common code.	2008-05-30 15:09:42 +10:00
macintosh	[POWERPC] macintosh: Replace deprecated __initcall with device_initcall	2008-05-15 20:50:00 +10:00
mca
md	md: restart recovery cleanly after device failure.	2008-05-24 09:56:10 -07:00
media	tuner: Do not alter i2c_client.name	2008-05-26 16:08:40 +02:00
memstick
message
mfd	HTC_EGPIO is ARM-only	2008-05-21 16:56:00 -07:00
misc	drivers/misc/sgi-xp: replace partid_t with a short	2008-05-13 08:02:23 -07:00
mmc	missing dependencies on HAS_DMA	2008-05-21 16:55:59 -07:00
mtd	ck804rom: fix driver_data in probe table.	2008-05-27 07:34:38 -07:00
net	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	2008-05-26 10:14:02 -07:00
nubus
of	[POWERPC] Add null pointer check to of_find_property	2008-05-15 20:49:49 +10:00
oprofile	oprofile: don't request cache line alignment for cpu_buffer	2008-05-14 19:11:12 -07:00
parisc	drivers/parisc: replace remaining __FUNCTION__ occurrences	2008-05-15 10:38:54 -04:00
parport
pci	pciehp: add message about pciehp_slot_with_bus option	2008-05-27 15:43:47 -07:00
pcmcia
pnp	Clean up 'print_fn_descriptor_symbol()' types	2008-05-15 17:50:37 -07:00
power	Power Supply: fix race in device_create	2008-05-20 13:31:55 -07:00
ps3	[POWERPC] PS3: Remove unsupported wakeup sources	2008-05-02 15:00:44 +10:00
rapidio
rtc	rtc: m41t80: include <linux/kernel.h> for printk()	2008-05-13 08:02:26 -07:00
s390	virtio: set device index in common code.	2008-05-30 15:09:42 +10:00
sbus	sbus bpp: instances missed in s/dev_name/bpp_dev_name/	2008-05-21 16:55:59 -07:00
scsi	scsi: fix integer as NULL pointer warning	2008-05-23 08:11:07 -07:00
serial	serial: fix enable_irq_wake/disable_irq_wake imbalance in serial_core.c	2008-05-24 09:56:11 -07:00
sh
sn
spi	spi: remove some spidev oops-on-rmmod paths	2008-05-24 09:56:14 -07:00
ssb
tc
telephony
thermal
uio	UIO: fix race in device_create	2008-05-20 13:31:55 -07:00
usb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6	2008-05-20 17:20:49 -07:00
video	Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm	2008-05-24 10:13:16 -07:00
virtio	virtio_net: another race with virtio_net and enable_cb	2008-05-30 15:09:45 +10:00
w1
watchdog	[WATCHDOG] Add ICH9DO into the iTCO_wdt.c driver	2008-05-25 09:45:39 +00:00
xen
zorro
Kconfig
Makefile