linux

Author	SHA1	Message	Date
Julian Stecklina	eca2a33c98	igbvf: Remove some dead code in igbvf Removed unused variable in igbvf. Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de> Acked-by: Greg Rose <greg.v.rose@intel.com> Tested-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:41:36 -08:00
Greg Rose	2c20ebbaed	igbvf: Update version and Copyright Update version string and copyright notice Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:41:35 -08:00
Greg Rose	5d426ad1af	ixgbevf: Fix Oops The driver is calling netif_carrier_off and netif_tx_stop_all_queues before the netdevice is registered which causes an Oops. Move call to netif_carrier_off after the netdevice is registered and remove call to netif_tx_stop_all_queues because there aren't any TX queues yet. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:19 -08:00
Eric Dumazet	e2ddeba95c	ixgbe: refactor ixgbe_alloc_queues() I noticed the ring variable was initialized before allocations, and that memory node management was a bit ugly. We also leak memory in case of a ring allocation error. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:18 -08:00
Don Skidmore	b93a22260f	ixgbe: add support for x540 MAC This patch adds support for the x540 MAC which is the next MAC in the 82598/82599 line. Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:17 -08:00
Don Skidmore	fe15e8e1c7	ixgbe: add MAC and PHY support for x540 Adds the new x540.c file and Aquantia 1202 PHY for X540 support. Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:16 -08:00
Don Skidmore	a391f1d512	ixgbe: make silicon specific functions generic The new MAC type X540 shares much of the functionality of some silicon-specific functions. To reduce duplicate code, these functions were made generic. Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:15 -08:00
Yi Zou	9b55bb0384	ixgbe: make sure FCoE DDP user buffers are really released by the HW When the DDP context is invalidated, the HW may not be done with the user buffer right away. In that case, we poll the FCBUFF register to check whether the buffer valid bit has been cleared; if not, we wait up to 100us, which is guaranteed by the HW. Signed-off-by: Yi Zou <yi.zou@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:14 -08:00
Yi Zou	8ca371e484	ixgbe: invalidate FCoE DDP context when no error status is available The hw automatically invalidates the context if DDP is successful or an error is detected. In case there is no error status available from the hw, initializing the per-context error status to 1 allows the DDP context to still be invalidated via the upper layer call to ddp_put(). Signed-off-by: Yi Zou <yi.zou@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:13 -08:00
Yi Zou	a41c059741	ixgbe: avoid doing FCoE DDP when adapter is DOWN or RESETTING There is no point in allowing incoming DDP requests from the upper layer stack if the adapter is going down or being reset. Signed-off-by: Yi Zou <yi.zou@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:13 -08:00
John Fastabend	c84d324c77	ixgbe: rework Tx hang detection to fix reoccurring false Tx hangs The Tx hang logic has been known to detect false hangs when the device is receiving pause frames or has delayed processing for some other reason. This patch makes the logic more robust and resolves these known issues. The old logic checked whether the device was paused by querying the HW, and aborted the hang logic if the device was currently paused. This check was racy because the device could have been in the pause state at any time up to this check. The other part of the hang logic verifies that the Tx ring is still advancing; the old logic checked the EOP timestamp. This is not sufficient to determine that the ring is not advancing; it only suggests that it may be moving slowly. Here we add logic to track the number of completed Tx descriptors and use the adapter stats to check whether any pause frames have been received since the previous Tx hang check. This way we avoid racing with the HW register and do not detect false hangs if the ring is advancing slowly. This patch is primarily the work of Jesse Brandeburg. I cleaned it up some and fixed the PFC checking. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:12 -08:00
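A simplified model of the reworked check described above may help: progress is judged by the count of completed descriptors, and any pause frames received since the previous check disarm it. The two-step "armed" state (report a hang only after two consecutive stalled checks) and every name here are assumptions for illustration, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring bookkeeping for a simplified hang check. */
struct tx_hang_state {
	uint64_t done_old; /* completed descriptors at the previous check */
	uint64_t xoff_old; /* pause (XOFF) frames seen at the previous check */
	bool armed;        /* ring looked stalled at the previous check */
};

/* Called periodically (e.g. from a watchdog). Returns true only when the
 * ring has pending work, completed nothing new, received no pause frames,
 * and was already stalled at the previous check. */
static bool tx_check_hang(struct tx_hang_state *s, uint64_t tx_done,
			  uint64_t tx_pending, uint64_t xoff_rx)
{
	/* Pause frames since the last check explain a stall: not a hang.
	 * Comparing counters avoids racing with a live "paused" register. */
	if (xoff_rx != s->xoff_old) {
		s->xoff_old = xoff_rx;
		s->done_old = tx_done;
		s->armed = false;
		return false;
	}

	if (tx_done == s->done_old && tx_pending > 0) {
		/* No progress: report a hang only on the second stalled check. */
		bool hung = s->armed;

		s->armed = true;
		return hung;
	}

	/* The ring advanced (even slowly) or is idle: record progress. */
	s->done_old = tx_done;
	s->armed = false;
	return false;
}
```

A slowly advancing ring never trips the check, because any increase in `tx_done` resets the armed state.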
Alexander Duyck	e3de4b7bdf	ixgbe: Resolve null function pointer accesses on 82598 w/ multi-speed fiber This change resolves some null function pointer accesses on 82598 when a multi-speed fiber module is inserted into the adapter. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:11 -08:00
Alexander Duyck	2274543f15	ixgbe: populate the ring->q_vector pointer during ring mapping The q_vector back pointer was not being set in the rings so it would not have been possible to determine the parent q_vector of the ring. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:10 -08:00
Alexander Duyck	d0759ebb05	ixgbe: cleanup ixgbe_map_rings_to_vectors This change cleans up some of the items in ixgbe_map_rings_to_vectors. Specifically it merges the two for loops and drops the unnecessary vectors parameter. It also moves the vector names into the q_vectors themselves. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:09 -08:00
Alexander Duyck	125601bf03	ixgbe: simplify math and improve stack use of ixgbe_set_itr functions This change is meant to improve the stack utilization and simplify the math used in ixgbe_set_itr_msix. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:08 -08:00
Alexander Duyck	bf29ee6c48	ixgbe: cleanup unclear references to reg_idx There are a number of places where we use the variable j to contain the register index of the ring. Instead of using such a non-descriptive variable name it is better that we name it reg_idx so that it is clear what the variable contains. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:07 -08:00
Alexander Duyck	9d6b758f42	ixgbe: cleanup unnecessary return value in ixgbe_cache_ring_rss This change is just to cleanup some confusing logic in ixgbe_cache_ring_rss which can be simplified by adding a conditional with return to the start of the call. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:06 -08:00
Alexander Duyck	673ac60461	ixgbe: Cleanup DCB logic, whitespace, and comments in ixgbe_ethtool.c This change addresses a few whitespace issues in DCB #ifdefs, adds a comment calling out the DCB-specific registers, and nests an if statement inline with a number of if statements related to flow control. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:05 -08:00
Alexander Duyck	50d6c681d0	ixgbe: add WOL support for backplane adapters This change adds support for certain 82599 based Mezzanine adapters. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:05 -08:00
Alexander Duyck	e2b4e216b7	ixgbe: cleanup ixgbe_set_tx_csum ethtool flags configuration This change makes it so that we always disable SCTP regardless of mac type since we shouldn't need to check mac type before disabling a feature that isn't supported on a given piece of hardware. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:04 -08:00
Alexander Duyck	bd50817859	ixgbe: change mac_type if statements to switch statements This change replaces a number of if/elseif/else statements with switch statements to support the addition of future devices to the ixgbe driver. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:03 -08:00
Alexander Duyck	aa80175a53	ixgbe: cleanup use of ixgbe_rsc_count and RSC_CB This change cleans up the use of rsc_count and changes it to a boolean since the actual numerical value is used nowhere in the Rx cleanup path. I am also moving the skb count into the RSC_CB path since it is much easier to track it there than when it is passed as a parameter to various function calls. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:02 -08:00
Alexander Duyck	ee9e0f0b40	ixgbe: cleanup ATR filter setup function This change cleans up the ixgbe_atr filter setup function so that it uses fewer items from the stack. Since the code is only applicable to IPv4 w/ TCP it makes sense to just use the pointers based on the headers themselves instead of copying them to temp variables and then writing those to the filters. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:01 -08:00
Alexander Duyck	c267fc166a	ixgbe: cleanup ixgbe_clean_rx_irq The code for ixgbe_clean_rx_irq was much more tangled up than it needed to be in terms of logic statements and unused variables. This change untangles much of that and drops several unused variables such as cleaned which was being returned but never checked. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:27:00 -08:00
Alexander Duyck	32aa77a4fc	ixgbe: change vector numbering so that queues end up on correct CPUs This changes the numbering scheme slightly. Previously the ordering was coming out like this: Rx-2 Rx-1 Rx-0 TxRx-0 which would place two queues on CPU 0. This change makes the ordering like this: Rx-3 Rx-2 Rx-1 TxRx-0 This means that each CPU will have its own Rx queue, and only CPU 0 will have the Tx queue. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:59 -08:00
Alexander Duyck	b953799ee2	ixgbe: reorder Tx cleanup so that if adapter will reset we don't rearm The code as it existed could re-arm the queues when it was requesting a HW reset due to a TX hang. Instead of doing that this change makes it so that we will just exit if the hardware is believed to be hung. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:58 -08:00
Alexander Duyck	80fba3f434	ixgbe: Disable RSC when ITR setting is too high to allow RSC RSC will flush its descriptors every time the interrupt throttle timer expires. In addition there are known issues with RSC when the rx-usecs value is set too low. As such we are forced to clear the RSC_ENABLED bit and reset the adapter when the rx-usecs value is set too low. However we do not need to clear the NETIF_F_LRO flag because it is used to indicate that the user wants to leave the LRO feature enabled, and in fact with this change we will now re-enable RSC as soon as the rx-usecs value is increased and the flag is still set. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:57 -08:00
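The flag handling described above can be sketched as a small policy function. This is a hypothetical model: `lro_requested` stands in for NETIF_F_LRO, `rsc_enabled` for the RSC_ENABLED bit, and the minimum rx-usecs threshold is passed in because the commit message does not give its value.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's feature and state flags. */
struct rsc_cfg {
	bool lro_requested; /* user's wish, like NETIF_F_LRO */
	bool rsc_enabled;   /* hardware state, like RSC_ENABLED */
};

/* Apply a new rx-usecs value. RSC runs only while LRO is requested and
 * rx-usecs is at or above min_rsc_usecs. The user's LRO request is never
 * cleared, so RSC comes back once rx-usecs is raised again.
 * Returns true if the RSC state changed (i.e. the adapter needs a reset). */
static bool rsc_apply_rx_usecs(struct rsc_cfg *cfg, unsigned int rx_usecs,
			       unsigned int min_rsc_usecs)
{
	bool want = cfg->lro_requested && rx_usecs >= min_rsc_usecs;

	if (want == cfg->rsc_enabled)
		return false;

	cfg->rsc_enabled = want;
	return true;
}
```

Keeping the user request and the hardware state in separate flags is what lets the sketch re-enable RSC without the user touching the LRO setting.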
Alexander Duyck	73c4b7cdd2	ixgbe: cleanup race conditions in link setup This change makes it so that we perform link setup with interrupts disabled. If the SFP has not been detected previously we will schedule the SFP detection task to run in order to detect link. By doing this we avoid the possibility of interrupts firing in the middle of our link setup during ixgbe_up_complete. In addition this change makes it so that the multi-speed fiber setup and SFP setup are not mutually exclusive. This addresses issues in which a link would only come up at 1G on some multi-speed fiber modules. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:57 -08:00
Alexander Duyck	7d637bcc8f	ixgbe: add a state flags to ring This change adds a set of state flags to the rings that allow them to independently function allowing for features like RSC, packet split, and TX hang detection to be done per ring instead of for the entire device. This is accomplished by re-purposing the flow director reinit_state member and making it a global state instead since a long for a single bit flag is a bit wasteful. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:56 -08:00
Alexander Duyck	33cf09c958	ixgbe: move CPU variable from ring into q_vector, add ring->q_vector This is the start of work to sort out what belongs in the rings and what belongs in the q_vector. Items like the CPU variable make much more sense in the q_vector since the CPU is a per-interrupt thing rather than a per-ring thing. I also added a back-pointer from the ring to the q_vector. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:55 -08:00
Alexander Duyck	c60fbb00f0	ixgbe: move adapter into pci_dev driver data instead of netdev This change moves an adapter pointer into the private portion of the pci_dev instead of a pointer to the netdev. The reason for this change is that in most cases we just want the adapter anyway. In addition, as we start moving toward multiple netdevs per port we may want to move the adapter pointer out of the netdevs entirely. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:54 -08:00
Alexander Duyck	01fa7d905f	ixgbe: remove residual code left over from earlier combining of TXDCTL Missed some code that was left floating around in the DCB configuration for the TXDCTL register. As a result the register was being messed with in two different spots when we only needed to do the change once. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:53 -08:00
Alexander Duyck	5f5ae6fc86	ixgbe: move ixgbe_clear_interrupt_scheme to before pci_save_state The main reason for this change is to keep the suspend/resume logic matched up. The clear_interrupt_scheme function will disable MSI-X, which will affect the PCIe configuration space. Therefore we will want to do it before we save state to avoid having the interrupt state restored by pci_restore_state, and then trying to re-enable MSI/MSI-X interrupts via ixgbe_setup_interrupt_scheme. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:52 -08:00
Alexander Duyck	fc77dc3cc1	ixgbe: add a netdev pointer to the ring structure This change places a netdev pointer directly into the ring structure. This way we can avoid having to determine which netdev we are supposed to be using and can just access the one on the ring directly. As a result of this change further collapse of the code is possible by dropping the adapter from ixgbe_alloc_rx_buffers, and the netdev pointer from ixgbe_xmit_frame_ring_adv and ixgbe_maybe_stop_tx. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:51 -08:00
Alexander Duyck	5b7da51547	ixgbe: combine some stats into a union to allow for Tx/Rx stats overlap This change moved some of the RX and TX stats into separate structures and then placed those structures in a union in order to help reduce the size of the ring structure. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:50 -08:00
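A minimal illustration of why the union shrinks the ring: a ring is either Tx or Rx, so the two stat blocks never need to coexist. The struct and counter names below are hypothetical, not the driver's actual fields.

```c
#include <stdint.h>

/* Hypothetical per-direction counters. */
struct tx_queue_stats {
	uint64_t restart_queue;
	uint64_t tx_busy;
};

struct rx_queue_stats {
	uint64_t alloc_failed;
	uint64_t csum_err;
	uint64_t non_eop_descs;
};

/* A ring is used for Tx or for Rx, never both, so the two stat blocks can
 * share storage: the anonymous union (C11) costs the size of the larger
 * member rather than the sum of both. */
struct ring {
	uint64_t packets;
	uint64_t bytes;
	union {
		struct tx_queue_stats tx_stats;
		struct rx_queue_stats rx_stats;
	};
};
```

Here `sizeof(struct ring)` is 16 bytes of common counters plus 24 bytes for the larger Rx block, instead of 16 + 16 + 24.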
Alexander Duyck	b6ec895ecd	ixgbe: move device pointer into the ring structure This change is meant to simplify DMA map/unmap by providing a device pointer. As a result the adapter pointer can be dropped from many of the calls. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:49 -08:00
Alexander Duyck	84ea2591e4	ixgbe: drop ring->head, make ring->tail a pointer instead of offset This change drops ring->head since it is not used in any hot-path and can easily be determined using IXGBE_[RT]DH(ring->reg_idx). It also changes ring->tail into a true pointer so we can avoid unnecessary pointer math to find the location of the tail. In addition I also dropped the setting of head and tail in ixgbe_clean_[rx|tx]_ring. The only location that should be setting the head and tail values is ixgbe_configure_[rx|tx]_ring and that is only while the queue is disabled. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:49 -08:00
Alexander Duyck	d5f398ed73	ixgbe: cleanup ixgbe_alloc_rx_buffers This change re-orders alloc_rx_buffers to make better use of the packet split enabled flag. The new setup should require less branching in the code, since we are now down to fewer if statements: we either are handling packet split or we aren't. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:48 -08:00
Alexander Duyck	8ad494b0e5	ixgbe: move GSO segments and byte count processing into ixgbe_tx_map This change simplifies the work being done by the TX interrupt handler and pushes it into the tx_map call. This allows for fewer cache misses since the TX cleanup now accesses almost none of the skb members. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:47 -08:00
Alexander Duyck	4c0ec6544a	ixgbe: remove unnecessary re-init of adapter on Rx-csum change There is no need to reset the adapter when changing the Rx checksum settings. Since the only change is a software flag we can disable it without needing to reset the entire adapter. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:46 -08:00
John Fastabend	80ab193dce	ixgbe: DCB: credit max only needs to be gt TSO size for 82598 The maximum credits per traffic class only needs to be greater than the TSO size for 82598 devices. The 82599 devices do not have this requirement so only do this test for 82598 devices. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:45 -08:00
John Fastabend	16b61beb39	ixgbe: DCB set PFC high and low water marks per data sheet specs Currently the high and low water marks for PFC are being set conservatively for jumbo frames. This means the RX buffers are being underutilized with the default 1500 MTU. This patch fixes this so that the water marks are set as described in the data sheet, considering the MTU size. The equation used is RTT * 1.44 + MTU * 1.44 + MTU where RTT is the round trip time and MTU is the max frame size in KB. To avoid floating point arithmetic, FC_HIGH_WATER is defined as ((((RTT + MTU) * 144) + 99) / 100) + MTU This changes how the hardware fields fc.low_water and fc.high_water are used. With this change they no longer store the actual low water and high water markers but store the required head room in the buffer. This simplifies the logic, and we do not need to account for the size of the buffer when setting the thresholds. Testing with iperf and 16 threads showed a slight uptick in throughput over a single traffic class (0.1-0.2 Gbps) and a reduction in pause frames. Without the patch a 30 second run would show ~10-15 pause frames being transmitted; with the patch ~2-5 are seen. Tests were run back to back with 82599. Note RXPBSIZE is in KB and the low and high water mark fields are also in KB. However, the FCRT* registers have 32B granularity and the value is shifted left by 5 bits into the register, so (((rx_pbsize - water_mark) * 1024) / 32) << 5 is the most explicit conversion; here we simplify (rx_pbsize - water_mark) * 32 << 5 = (rx_pbsize - water_mark) << 10 This patch updates the PFC thresholds and legacy FC thresholds. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:44 -08:00
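The integer arithmetic above can be checked with a short sketch. The function names are hypothetical, and RTT and MTU are treated as already expressed in KB, as in the commit message.

```c
#include <stdint.h>

/* Integer form of the data-sheet head-room equation
 *   RTT * 1.44 + MTU * 1.44 + MTU   (all values in KB)
 * Adding 99 before dividing by 100 rounds the 1.44 scaling up. */
static uint32_t fc_high_water_kb(uint32_t rtt_kb, uint32_t mtu_kb)
{
	return ((((rtt_kb + mtu_kb) * 144) + 99) / 100) + mtu_kb;
}

/* Convert head room (KB) into an FCRT*-style register value: 32-byte
 * granularity, field shifted left by 5, so
 *   ((kb * 1024) / 32) << 5 == (kb * 32) << 5 == kb << 10
 * The caller is assumed to ensure rx_pbsize_kb >= water_mark_kb. */
static uint32_t fcrt_reg_value(uint32_t rx_pbsize_kb, uint32_t water_mark_kb)
{
	return (rx_pbsize_kb - water_mark_kb) << 10;
}
```

For example, RTT = 10 KB and MTU = 2 KB give 12 * 1.44 = 17.28, rounded up to 18, plus 2, so `fc_high_water_kb(10, 2)` is 20.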
Greg Rose	66c87bd50d	ixgbevf: Update Version String and Copyright Notice Update version string and copyright notice. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:43 -08:00
Eric Dumazet	1a51502bdd	ixgbe: delay rx_ring freeing "cat /proc/net/dev" uses RCU protection only. It's quite possible we call a driver get_stats() method while the device is dismantling and freeing its data structures. So get_stats() methods must be very careful not to access driver private data without appropriate locking. In the ixgbe case, we access rx_ring pointers. These pointers are freed in ixgbe_clear_interrupt_scheme() and set to NULL; this can trigger a NULL dereference in ixgbe_get_stats64(). A possible fix is to use RCU locking in ixgbe_get_stats64() and defer rx_ring freeing until after a grace period in ixgbe_clear_interrupt_scheme(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Tantilov, Emil S <emil.s.tantilov@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2010-11-16 19:26:42 -08:00
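The ordering the fix relies on (unpublish the pointer, wait out concurrent readers, then free, with readers tolerating NULL) can be sketched in user-space C11. This deliberately swaps RCU for a crude shared reader counter, and all names are hypothetical; it shows only the ordering, not how the kernel's RCU works.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical ring stats; stands in for the driver's rx_ring. */
struct rx_ring_stats {
	uint64_t packets;
	uint64_t bytes;
};

static _Atomic(struct rx_ring_stats *) rx_ring_ptr;
static atomic_int active_readers;

static void publish_rx_ring(struct rx_ring_stats *ring)
{
	atomic_store(&rx_ring_ptr, ring);
}

/* Reader side, like a get_stats64() callback: it must tolerate NULL. */
static uint64_t read_rx_packets(void)
{
	uint64_t packets = 0;
	struct rx_ring_stats *ring;

	atomic_fetch_add(&active_readers, 1);
	ring = atomic_load(&rx_ring_ptr);
	if (ring)
		packets = ring->packets;
	atomic_fetch_sub(&active_readers, 1);
	return packets;
}

/* Writer side, like the teardown path: unpublish first, wait until no
 * reader can still hold the old pointer (a crude "grace period"), then
 * free. Default seq_cst atomics make this ordering hold. */
static void retire_rx_ring(void)
{
	struct rx_ring_stats *old = atomic_exchange(&rx_ring_ptr, NULL);

	while (atomic_load(&active_readers) != 0)
		; /* spin until in-flight readers finish */
	free(old);
}
```

Real RCU avoids having readers write shared state at all; the counter here only makes the "no free while a reader may hold the pointer" rule explicit.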
Eric Dumazet	b178bb3dfc	net: reorder struct sock fields Right now, fields in struct sock are not optimally ordered, because each path (RX softirq, TX completion, RX user, TX user) has to touch fields that are contained in many different cache lines. The really critical thing is to shrink the number of cache lines that are used at RX softirq time: the CPU handling softirqs for a device can receive many frames per second for many sockets. If the load is too big, we can drop frames at NIC level. RPS or multiqueue cards can help, but it is better to reduce latency if possible. This patch starts with the UDP protocol; additional patches will try to reduce latencies of the other ones as well. At RX softirq time, fields of interest for the UDP protocol are (not counting ones in the inet struct for the lookup): Read/Written: sk_refcnt (atomic increment/decrement) sk_rmem_alloc & sk_backlog.len (to check if there is room in queues) sk_receive_queue sk_backlog (if socket locked by user program) sk_rxhash sk_forward_alloc sk_drops Read only: sk_rcvbuf (sk_rcvqueues_full()) sk_filter sk_wq sk_policy[0] sk_flags Additional notes: - sk_backlog has one hole on 64bit arches. We can fill it to save 8 bytes. - sk_backlog is used only if the RX softirq handler finds the socket while it is locked by the user. - sk_rxhash is written only once per flow. - sk_drops is written only if queues are full Final layout: [1] One section grouping all read/write fields, but placing rxhash and sk_backlog at the end of this section. 
[2] One section grouping all read fields in RX handler (sk_filter, sk_rcvbuf, sk_wq) [3] Section used by other paths I'll post a separate patch to put sk_refcnt at the end of struct sock_common so that it shares the same cache line as section [1] New offsets on 64bit arch : sizeof(struct sock)=0x268 offsetof(struct sock, sk_refcnt) =0x10 offsetof(struct sock, sk_lock) =0x48 offsetof(struct sock, sk_receive_queue)=0x68 offsetof(struct sock, sk_backlog)=0x80 offsetof(struct sock, sk_rmem_alloc)=0x80 offsetof(struct sock, sk_forward_alloc)=0x98 offsetof(struct sock, sk_rxhash)=0x9c offsetof(struct sock, sk_rcvbuf)=0xa4 offsetof(struct sock, sk_drops) =0xa0 offsetof(struct sock, sk_filter)=0xa8 offsetof(struct sock, sk_wq)=0xb0 offsetof(struct sock, sk_policy)=0xd0 offsetof(struct sock, sk_flags) =0xe0 Instead of : sizeof(struct sock)=0x270 offsetof(struct sock, sk_refcnt) =0x10 offsetof(struct sock, sk_lock) =0x50 offsetof(struct sock, sk_receive_queue)=0xc0 offsetof(struct sock, sk_backlog)=0x70 offsetof(struct sock, sk_rmem_alloc)=0xac offsetof(struct sock, sk_forward_alloc)=0x10c offsetof(struct sock, sk_rxhash)=0x128 offsetof(struct sock, sk_rcvbuf)=0x4c offsetof(struct sock, sk_drops) =0x16c offsetof(struct sock, sk_filter)=0x198 offsetof(struct sock, sk_wq)=0x88 offsetof(struct sock, sk_policy)=0x98 offsetof(struct sock, sk_flags) =0x130 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-16 11:17:43 -08:00
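The grouping idea can be illustrated with a toy struct and offsetof(). The struct below is hypothetical, its field types are simplified stand-ins for those in struct sock, and the check assumes the struct itself starts on a 64-byte cache-line boundary.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed cache-line size */

/* Hypothetical socket-like struct: RX-softirq fields first, so they span
 * as few cache lines as possible; other paths' fields come after. */
struct demo_sock {
	/* [1] read/write at RX softirq time */
	int32_t refcnt;
	int32_t rmem_alloc;
	int32_t forward_alloc;
	int32_t drops;
	void *receive_queue_head;
	void *backlog_head;
	uint32_t rxhash;
	/* [2] read-only at RX softirq time */
	int32_t rcvbuf;
	void *filter;
	void *wq;
	/* [3] fields used only by other paths */
	char other_paths[128];
};

/* Nonzero if every RX-hot field ends within the first cache line. */
static int rx_hot_fields_in_one_line(void)
{
	return offsetof(struct demo_sock, wq) + sizeof(void *) <= CACHE_LINE;
}
```

On a typical 64-bit ABI the RX-hot fields end at byte 56, so the softirq path touches a single line of this toy struct.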
Eric Dumazet	c31504dc0d	udp: use atomic_inc_not_zero_hint The UDP socket refcount is usually 2, unless an incoming frame is going to be queued in the receive or backlog queue. Using atomic_inc_not_zero_hint() reduces latency, because the processor issues fewer memory transactions. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-16 11:17:43 -08:00
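The idea behind atomic_inc_not_zero_hint() can be sketched with C11 atomics: start the compare-and-swap from the expected value (the hint) rather than reading the counter first, so the common case is a single CAS. This is a user-space approximation with a hypothetical name, not the kernel's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *v unless it is zero, guessing its current value is `hint`.
 * Returns true if the increment happened. */
static bool inc_not_zero_hint(atomic_int *v, int hint)
{
	int c = hint;

	if (c == 0)
		c = atomic_load(v); /* no useful hint: read the real value */

	while (c != 0) {
		/* On failure, c is overwritten with the value actually seen,
		 * so a wrong hint costs one extra CAS, not a separate load. */
		if (atomic_compare_exchange_strong(v, &c, c + 1))
			return true;
	}
	return false; /* object is being freed: refcount already zero */
}
```

With UDP's typical refcount of 2, calling it with a hint of 2 normally succeeds on the first CAS.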
Eric Dumazet	213b15ca81	vlan: remove ndo_select_queue() logic Now that vlans are lockless, we don't need special ndo_select_queue() logic. dev_pick_tx() will do the multiqueue stuff on the real device transmit. Suggested-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-16 11:17:42 -08:00
Eric Dumazet	4af429d29b	vlan: lockless transmit path vlan is a stacked device, like tunnels. We should use the lockless mechanism we are using in tunnels and loopback. This patch completely removes locking in TX path. tx stat counters are added into existing percpu stat structure, renamed from vlan_rx_stats to vlan_pcpu_stats. Note : this partially reverts commit `2e59af3dcb` (vlan: multiqueue vlan device) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-16 11:15:08 -08:00
Eric Dumazet	8ffab51b3d	macvlan: lockless tx path macvlan is a stacked device, like tunnels. We should use the lockless mechanism we are using in tunnels and loopback. This patch completely removes locking in TX path. tx stat counters are added into existing percpu stat structure, renamed from rx_stats to pcpu_stats. Note : this reverts commit `2c11455321` (macvlan: add multiqueue capability) Note : rx_errors converted to a 32bit counter, like tx_dropped, since they don't need the 64bit range. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Ben Greear <greearb@candelatech.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-16 10:58:30 -08:00
Neil Horman	0e3125c755	packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4) Version 4 of this patch. Change notes: 1) Removed extra memset. Didn't think kcalloc added a GFP_ZERO the way kzalloc did :) Summary: It was shown to me recently that systems under high load were driven very deep into swap when tcpdump was run. This happened because the AF_PACKET protocol has a SET_RINGBUFFER socket option that allows the user space application to specify how many entries an AF_PACKET socket will have and how large each entry will be. It seems the default setting for tcpdump is to set the ring buffer to 32 entries of 64 Kb each, which implies 32 order 5 allocations. That's difficult under good circumstances, and horrid under memory pressure. I thought it would be good to make that a bit more usable. I was going to do a simple conversion of the ring buffer from contiguous pages to iovecs, but unfortunately, the metadata which AF_PACKET places in these buffers can easily span a page boundary, and given that these buffers get mapped into user space, and the data layout doesn't easily allow for a change to padding between frames to avoid that, a simple iovec change is just going to break user space ABI consistency. So I've done this: I've added a three-tiered mechanism to the af_packet set_ring socket option. It attempts to allocate memory in the following order: 1) Using __get_free_pages with GFP_NORETRY set, so as to fail quickly without digging into swap 2) Using vmalloc 3) Using __get_free_pages with GFP_NORETRY clear, causing us to try as hard as needed to get the memory The effect is that we don't disturb the system as much when we're under load, while still being able to conduct tcpdumps effectively. Tested successfully by me. 
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Maciej Żenczykowski <zenczykowski@gmail.com> Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-16 10:26:47 -08:00
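The three-tier fallback above can be sketched with the allocators abstracted as function pointers so the order can be exercised in user space. The hooks are placeholders for __get_free_pages() with and without the no-retry flag and for vmalloc(); all names, and returning the tier, are assumptions for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Which tier satisfied the request (hypothetical). */
enum alloc_tier { TIER_NONE, TIER_PAGES_NORETRY, TIER_VMALLOC, TIER_PAGES_HARD };

typedef void *(*alloc_fn)(size_t size);

/* Hypothetical hooks standing in for the kernel allocators. */
struct tiered_allocator {
	alloc_fn pages_noretry; /* contiguous, fail fast, no swapping */
	alloc_fn vmalloc_like;  /* virtually contiguous fallback */
	alloc_fn pages_hard;    /* contiguous, try as hard as needed */
};

static void *alloc_ring_block(const struct tiered_allocator *a, size_t size,
			      enum alloc_tier *tier)
{
	void *p;

	if ((p = a->pages_noretry(size)) != NULL) {
		*tier = TIER_PAGES_NORETRY;
		return p;
	}
	if ((p = a->vmalloc_like(size)) != NULL) {
		*tier = TIER_VMALLOC;
		return p;
	}
	p = a->pages_hard(size); /* last resort */
	*tier = p ? TIER_PAGES_HARD : TIER_NONE;
	return p;
}

/* Test stand-ins: one hook that always fails, one backed by malloc(). */
static void *always_fail(size_t size) { (void)size; return NULL; }
static void *use_malloc(size_t size) { return malloc(size); }
```

Reporting the tier reflects that a block must be released by the routine matching the allocator that produced it; the commit message does not say how the actual patch tracks this.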


222345 Commits