linux

Author	SHA1	Message	Date
Herbert Xu	97402b96f8	virtio net: Allow receiving SG packets Finally this patch lets virtio_net receive GSO packets in addition to sending them. This can definitely be optimised for the non-GSO case. For comparison the Xen approach stores one page in each skb and uses subsequent skb's pages to construct an SG skb instead of preallocating the maximum amount of pages per skb. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (added feature bits)	2008-07-25 12:06:01 +10:00
Herbert Xu	a9ea3fc6f2	virtio net: Add ethtool ops for SG/GSO This patch adds some basic ethtool operations to virtio_net so I could test SG without GSO (which was really useful because TSO turned out to be buggy :) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (remove MTU setting)	2008-07-25 12:06:01 +10:00
Mark McLoughlin	9953ca6cb7	virtio: fix virtio_net xmit of freed skb bug On Mon, 2008-05-26 at 17:42 +1000, Rusty Russell wrote: > If we fail to transmit a packet, we assume the queue is full and put > the skb into last_xmit_skb. However, if more space frees up before we > xmit it, we loop, and the result can be transmitting the same skb twice. > > Fix is simple: set skb to NULL if we've used it in some way, and check > before sending. ... > diff -r 564237b31993 drivers/net/virtio_net.c > --- a/drivers/net/virtio_net.c Mon May 19 12:22:00 2008 +1000 > +++ b/drivers/net/virtio_net.c Mon May 19 12:24:58 2008 +1000 > @@ -287,21 +287,25 @@ again: > free_old_xmit_skbs(vi); > > /* If we has a buffer left over from last time, send it now. / > - if (vi->last_xmit_skb) { > + if (unlikely(vi->last_xmit_skb)) { > if (xmit_skb(vi, vi->last_xmit_skb) != 0) { > / Drop this skb: we only queue one. */ > vi->dev->stats.tx_dropped++; > kfree_skb(skb); > + skb = NULL; > goto stop_queue; > } > vi->last_xmit_skb = NULL; With this, may drop an skb and then later in the function discover that we could have sent it after all. Poor wee skb :) How about the incremental patch below? Cheers, Mark. Subject: [PATCH] virtio_net: Delay dropping tx skbs Currently we drop the skb in start_xmit() if we have a queued buffer and fail to transmit it. However, if we delay dropping it until we've stopped the queue and enabled the tx notification callback, then there is a chance space might become available for it. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-07-25 12:06:00 +10:00
Mark McLoughlin	5e4fe5c45a	virtio_net: Set VIRTIO_NET_F_GUEST_CSUM feature We can handle receiving partial csums, so set the appropriate feature bit. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-07-11 01:20:33 -04:00
Rusty Russell	363f15149c	virtio: use callback on empty in virtio_net virtio_net uses a timer to free old transmitted packets, rather than leaving callbacks enabled all the time. If the host promises to always notify us when the transmit ring is empty, we can free packets at that point and avoid the timer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-06-10 18:20:32 -04:00
Mark McLoughlin	14c998f034	virtio: virtio_net free transmit skbs in a timer virtio_net currently only frees old transmit skbs just before queueing new ones. If the queue is full, it then enables interrupts and waits for notification that more work has been performed. However, a side-effect of this scheme is that there are always xmit skbs left dangling when no new packets are sent, against the Documentation/networking/driver.txt guideline: "... it is not allowed for your TX mitigation scheme to let TX packets "hang out" in the TX ring unreclaimed forever if no new TX packets are sent." Add a timer to ensure that any time we queue new TX skbs, we will shortly free them again. This fixes an easily reproduced hang at shutdown where iptables attempts to unload nf_conntrack and nf_conntrack waits for an skb it is tracking to be freed, but virtio_net never frees it. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-06-10 18:20:31 -04:00
Mark McLoughlin	23cde76d80	virtio_net: Fix skb->csum_start computation hdr->csum_start is the offset from the start of the ethernet header to the transport layer checksum field. skb->csum_start is the offset from skb->head. skb_partial_csum_set() assumes that skb->data points to the ethernet header - i.e. it computes skb->csum_start by adding the headroom to hdr->csum_start. Since eth_type_trans() skb_pull()s the ethernet header, skb_partial_csum_set() should be called before eth_type_trans(). (Without this patch, GSO packets from a guest to the world outside the host are corrupted). Signed-off-by: Mark McLoughlin <markmc@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-06-10 18:20:29 -04:00
Rusty Russell	11a3a1546d	virtio: fix delayed xmit of packet and freeing of old packets. Because we cache the last failed-to-xmit packet, if there are no packets queued behind that one we may never send it (reproduced here as TCP stalls, "cured" by an outgoing ping). Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-05-30 22:07:21 -04:00
Rusty Russell	7eb2e25112	virtio: fix virtio_net xmit of freed skb bug If we fail to transmit a packet, we assume the queue is full and put the skb into last_xmit_skb. However, if more space frees up before we xmit it, we loop, and the result can be transmitting the same skb twice. Fix is simple: set skb to NULL if we've used it in some way, and check before sending. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-05-30 22:07:20 -04:00
Wang Chen	288369cc25	VIRTIO: Use __skb_queue_purge() Use standard routine for queue purging. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-05-22 14:01:02 -04:00
Rusty Russell	c45a6816c1	virtio: explicit advertisement of driver features A recent proposed feature addition to the virtio block driver revealed some flaws in the API: in particular, we assume that feature negotiation is complete once a driver's probe function returns. There is nothing in the API to require this, however, and even I didn't notice when it was violated. So instead, we require the driver to specify what features it supports in a table, we can then move the feature negotiation into the virtio core. The intersection of device and driver features are presented in a new 'features' bitmap in the struct virtio_device. Note that this highlights the difference between Linux unsigned-long bitmaps where each unsigned long is in native endian, and a straight-forward little-endian array of bytes. Drivers can still remove feature bits in their probe routine if they really have to. API changes: - dev->config->feature() no longer gets and acks a feature. - drivers should advertise their features in the 'feature_table' field - use virtio_has_feature() for extra sanity when checking feature bits Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-05-02 21:50:50 +10:00
Rusty Russell	5539ae9613	virtio: finer-grained features for virtio_net So, we previously had a 'VIRTIO_NET_F_GSO' bit which meant that 'the host can handle csum offload, and any TSO (v4&v6 incl ECN) or UFO packets you might want to send. I thought this was good enough for Linux, but it actually isn't, since we don't do UFO in software. So, add separate feature bits for what the host can handle. Add equivalent ones for the guest to say what it can handle, because LRO is coming too (thanks Herbert!). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-05-02 21:50:47 +10:00
Rusty Russell	99ffc696d1	virtio: wean net driver off NETDEV_TX_BUSY Herbert tells me that returning NETDEV_TX_BUSY from hard_start_xmit is seen as a poor thing to do; we should cache the packet and stop the queue. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>	2008-05-02 21:50:46 +10:00
Rusty Russell	0527168522	virtio: fix scatterlist sizing in net driver. Herbert Xu points out (within another patch) that my scatterlists are too short: one entry for the gso header, one for the skb->data, and MAX_SKB_FRAGS for all the fragments. Fix both xmit and recv sides (recv currently unused, coming in later patch). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-05-02 21:50:45 +10:00
Rusty Russell	655aa31f02	virtio: fix tx_ stats in virtio_net get_buf() gives the length written by the other side, which will be zero. We want to add the skb length. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-05-02 21:50:44 +10:00
Linus Torvalds	90768c09bc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: [NETNS][IPV6] tcp - assign the netns for timewait sockets [IPV4]: Fix byte value boundary check in do_ip_getsockopt(). BNX2X: Correct bringing chip out of reset [NETFILTER]: nf_nat: autoload IPv4 connection tracking [NETFILTER]: xt_hashlimit: fix mask calculation [XFRM]: xfrm_user: fix selector family initialization rt61pci: rt61pci_beacon_update do not free skb twice ssb-mipscore: Fix interrupt vectors ssb-pcicore: Fix IRQ TPS flag handling mac80211: use short_preamble mode from capability if ERP IE not present [NET]: Undo code bloat in hot paths due to print_mac(). [TCP]: Don't allow FRTO to take place while MTU is being probed [TCP]: tcp_simple_retransmit can cause S+L [TCP]: Fix NewReno's fast rexmit/recovery problems with GSOed skb [TCP]: Restore 2.6.24 mark_head_lost behavior for newreno/fack nl80211: fix STA AID bug b43legacy: fix bcm4303 crash iwlwifi: fix n-band association problem ipw2200: set MAC address on radiotap interface libertas: fix mode initialization problem	2008-04-11 08:10:24 -07:00
David S. Miller	21f644f3ea	[NET]: Undo code bloat in hot paths due to print_mac(). If print_mac() is used inside of a pr_debug() the compiler can't see that the call is redundant so still performs it even of pr_debug() ends up being a nop. So don't use print_mac() in such cases in hot code paths, use MAC_FMT et al. instead. As noted by Joe Perches, pr_debug() could be modified to handle this better, but that is a change to an interface used by the entire kernel and thus needs to be validated carefully. This here is thus the less risky fix for 2.6.25 Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-08 16:50:44 -07:00
Anthony Liguori	6ea0a4679d	virtio_net: remove overzealous printk The 'disable_cb' is really just a hint and as such, it's possible for more work to get queued up while callbacks are disabled. Under stress with an SMP guest, this printk triggers very frequently. There is no race here, this is how things are designed to work so let's just remove the printk. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-07 13:56:33 -07:00
Christian Borntraeger	4265f161b6	virtio: fix race in enable_cb There is a race in virtio_net, dealing with disabling/enabling the callback. I saw the following oops: kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:218! illegal operation: 0001 [#1] SMP Modules linked in: sunrpc dm_mod CPU: 2 Not tainted 2.6.25-rc1zlive-host-10623-gd358142-dirty #99 Process swapper (pid: 0, task: 000000000f85a610, ksp: 000000000f873c60) Krnl PSW : 0404300180000000 00000000002b81a6 (vring_disable_cb+0x16/0x20) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3 Krnl GPRS: 0000000000000001 0000000000000001 0000000010005800 0000000000000001 000000000f3a0900 000000000f85a610 0000000000000000 0000000000000000 0000000000000000 000000000f870000 0000000000000000 0000000000001237 000000000f3a0920 000000000010ff74 00000000002846f6 000000000fa0bcd8 Krnl Code: 00000000002b819a: a7110001 tmll %r1,1 00000000002b819e: a7840004 brc 8,2b81a6 00000000002b81a2: a7f40001 brc 15,2b81a4 >00000000002b81a6: a51b0001 oill %r1,1 00000000002b81aa: 40102000 sth %r1,0(%r2) 00000000002b81ae: 07fe bcr 15,%r14 00000000002b81b0: eb7ff0380024 stmg %r7,%r15,56(%r15) 00000000002b81b6: a7f13e00 tmll %r15,15872 Call Trace: ([<000000000fa0bcd0>] 0xfa0bcd0) [<00000000002b8350>] vring_interrupt+0x5c/0x6c [<000000000010ab08>] do_extint+0xb8/0xf0 [<0000000000110716>] ext_no_vtime+0x16/0x1a [<0000000000107e72>] cpu_idle+0x1c2/0x1e0 The problem can be triggered with a high amount of host->guest traffic. I think its the following race: poll says netif_rx_complete poll calls enable_cb enable_cb opens the interrupt mask a new packet comes, an interrupt is triggered----\ enable_cb sees that there is more work \| enable_cb disables the interrupt \| . V . interrupt is delivered . skb_recv_done does atomic napi test, ok some waiting disable_cb is called->check fails->bang! . poll would do napi check poll would do disable_cb The fix is to let enable_cb not disable the interrupt again, but expect the caller to do the cleanup if it returns false. In that case, the interrupt is only disabled, if the napi test_set_bit was successful. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cleaned up doco)	2008-03-17 22:58:21 +11:00
Amit Shah	da74e89d40	virtio: Enable netpoll interface for netconsole logging Add a new poll_controller handler that the netpoll interface needs. This enables netconsole logging from a kvm guest over the virtio net interface. Signed-off-by: Amit Shah <amitshah@gmx.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-03-17 22:58:20 +11:00
Christian Borntraeger	d9d5dcc88c	virtio_net: Fix oops on early interrupts - introduced by virtio reset code Signed-off-by: Jeff Garzik <jeff@garzik.org>	2008-02-23 23:55:04 -05:00
Christian Borntraeger	370076d932	virtio net: fix oops on interface-up I got the following oops during interface ifup. Unfortunately its not easily reproducable so I cant say for sure that my fix fixes this problem, but I am confident and I think its correct anyway: <2>kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:234! <4>illegal operation: 0001 [#1] PREEMPT SMP <4>Modules linked in: <4>CPU: 0 Not tainted 2.6.24zlive-guest-07293-gf1ca151-dirty #91 <4>Process swapper (pid: 0, task: 0000000000800938, ksp: 000000000084ddb8) <4>Krnl PSW : 0404300180000000 0000000000466374 (vring_disable_cb+0x30/0x34) <4> R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3 <4>Krnl GPRS: 0000000000000001 0000000000000001 0000000010003800 0000000000466344 <4> 000000000e980900 00000000008848b0 000000000084e748 0000000000000000 <4> 000000000087b300 0000000000001237 0000000000001237 000000000f85bdd8 <4> 000000000e980920 00000000001137c0 0000000000464754 000000000f85bdd8 <4>Krnl Code: 0000000000466368: e3b0b0700004 lg %r11,112(%r11) <4> 000000000046636e: 07fe bcr 15,%r14 <4> 0000000000466370: a7f40001 brc 15,466372 <4> >0000000000466374: a7f4fff6 brc 15,466360 <4> 0000000000466378: eb7ff0500024 stmg %r7,%r15,80(%r15) <4> 000000000046637e: a7f13e00 tmll %r15,15872 <4> 0000000000466382: b90400ef lgr %r14,%r15 <4> 0000000000466386: a7840001 brc 8,466388 <4>Call Trace: <4>([<000201500f85c000>] 0x201500f85c000) <4> [<0000000000466556>] vring_interrupt+0x72/0x88 <4> [<00000000004801a0>] kvm_extint_handler+0x34/0x44 <4> [<000000000010d22c>] do_extint+0xbc/0xf8 <4> [<0000000000113f98>] ext_no_vtime+0x16/0x1a <4> [<000000000010a182>] cpu_idle+0x216/0x238 <4>([<000000000010a162>] cpu_idle+0x1f6/0x238) <4> [<0000000000568656>] rest_init+0xaa/0xb8 <4> [<000000000084ee2c>] start_kernel+0x3fc/0x490 <4> [<0000000000100020>] _stext+0x20/0x80 <4> <4> <0>Kernel panic - not syncing: Fatal exception in interrupt <4> After looking at the code and the dump I think the following scenario happened: Ifup was running on cpu2 and the interrupt arrived on cpu0. Now virtnet_open on cpu 2 managed to execute napi_enable and disable_cb but did not execute rx_schedule. Meanwhile on cpu 0 skb_recv_done was called by vring_interrupt, executed netif_rx_schedule_prep, which succeeded and therefore called disable_cb. This triggered the BUG_ON, as interrupts were already disabled by cpu 2. I think the proper solution is to make the call to disable_cb depend on the atomic update of NAPI_STATE_SCHED by using netif_rx_schedule_prep in the same way as skb_recv_done. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2008-02-06 06:42:30 -05:00
Dor Laor	6c0cd7c000	virtio_net: parametrize the napi_weight for virtio receive queue. It is done in order to improve performance. Signed-off-by: Dor Laor <dor.laor@qumranet.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-02-04 23:50:09 +11:00
Rusty Russell	2cb9c6bafc	virtio: free transmit skbs when notified, not on next xmit. This fixes a potential dangling xmit problem. We also suppress refill interrupts until we need them. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-02-04 23:50:08 +11:00
Rusty Russell	a48bd8f670	virtio: flush buffers on open Fix bug found by Christian Borntraeger: if the other side fills all the registered network buffers before we enable NAPI, we will never get an interrupt. The simplest fix is to process the input queue once on open. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-02-04 23:50:07 +11:00
Christian Borntraeger	e70f2f1bb8	virtnet: remove double ether_setup Hello Rusty, virtnet_probe already calls alloc_etherdev, which calls ether_setup. There is no need to do that again. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-02-04 23:50:07 +11:00
Rusty Russell	6e5aa7efb2	virtio: reset function A reset function solves three problems: 1) It allows us to renegotiate features, eg. if we want to upgrade a guest driver without rebooting the guest. 2) It gives us a clean way of shutting down virtqueues: after a reset, we know that the buffers won't be used by the host, and 3) It helps the guest recover from messed-up drivers. So we remove the ->shutdown hook, and the only way we now remove feature bits is via reset. We leave it to the driver to do the reset before it deletes queues: the balloon driver, for example, needs to chat to the host in its remove function. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-02-04 23:50:03 +11:00
Rusty Russell	b3369c1fb4	virtio: populate network rings in the probe routine, not open Since we want to reset the device to remove them, this is simpler (device is reset for us on driver remove). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-02-04 23:50:03 +11:00
Rusty Russell	34a48579e4	virtio: Tweak virtio_net defines 1) Turn GSO on virtio net into an all-or-nothing (keep checksumming separate). Having multiple bits is a pain: if you can't support something you should handle it in software, which is still a performance win. 2) Make VIRTIO_NET_HDR_GSO_ECN a flag in the header, so it can apply to IPv6 or v4. 3) Rename VIRTIO_NET_F_NO_CSUM to VIRTIO_NET_F_CSUM (ie. means we do checksumming). 4) Add csum and gso params to virtio_net to allow more testing. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-02-04 23:50:02 +11:00
Rusty Russell	50c8ea8080	virtio: Net header needs hdr_len It's far easier to deal with packets if we don't have to parse the packet to figure out the header length to know how much to pull into the skb data. Add the field to the virtio_net_hdr struct (and fix the spaces that somehow crept in there). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-02-04 23:50:02 +11:00
Rusty Russell	18445c4d50	virtio: explicit enable_cb/disable_cb rather than callback return. It seems that virtio_net wants to disable callbacks (interrupts) before calling netif_rx_schedule(), so we can't use the return value to do so. Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback now returns void, rather than a boolean. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-02-04 23:49:58 +11:00
Rusty Russell	a586d4f601	virtio: simplify config mechanism. Previously we used a type/len pair within the config space, but this seems overkill. We now simply define a structure which represents the layout in the config space: the config space can now only be extended at the end. The main driver-visible changes: 1) We indicate what fields are present with an explicit feature bit. 2) Virtqueues are explicitly numbered, and not in the config space. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-02-04 23:49:57 +11:00
Rusty Russell	f35d9d8aae	virtio: Implement skb_partial_csum_set, for setting partial csums on untrusted packets. Use it in virtio_net (replacing buggy version there), it's also going to be used by TAP for partial csum support. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: David S. Miller <davem@davemloft.net>	2008-02-04 23:49:56 +11:00
Rusty Russell	8329d98e48	virtio: fix net driver loop case where we fail to restart skb is only NULL the first time around: it's more correct to test for being under-budget. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-11-19 11:20:44 +11:00
Rusty Russell	74b2553f1d	virtio: fix module/device unloading The virtio code never hooked through the ->remove callback. Although noone supports device removal at the moment, this code is already needed for module unloading. This of course also revealed bugs in virtio_blk, virtio_net and lguest unloading paths. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-11-19 11:20:42 +11:00
Rusty Russell	4d125de3a5	virtio: more fallout from scatterlist changes. This fixes OOPS in network driver when CONFIG_DEBUG_SG=y. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-11-12 13:55:25 +11:00
Rusty Russell	296f96fcfc	Net driver using virtio The network driver uses two virtqueues: one for input packets and one for output packets. This has nice locking properties (ie. we don't do any for recv vs send). TODO: 1) Big packets. 2) Multi-client devices (maybe separate driver?). 3) Resolve freeing of old xmit skbs (Christian Borntraeger) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: netdev@vger.kernel.org	2007-10-23 15:49:54 +10:00

1 2 3

137 Commits