While investigating, I found that the ip_fragment() slow path was taken
because ip_append_data() provides the following layout for a send(MTU +
N*(MTU - 20)) syscall:
- one skb with 1500 (mtu) bytes
- N fragments of 1480 (mtu-20) bytes (before adding IP header)
The last fragment gets 17 bytes of trail data because of the following bit:
	if (datalen == length + fraggap)
		alloclen += rt->dst.trailer_len;
Then esp4 adds 16 bytes of data (while trailer_len is 17... hmm...
another bug?)
In ip_fragment(), we notice the last fragment is too big,
(1496 + 20) > mtu, so we take the slow path, building another skb chain.
To avoid taking the slow path, we should correct ip_append_data() to make
sure the last fragment has real trail space while staying under the mtu.
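The arithmetic can be checked with a small standalone sketch. The numbers
(mtu of 1500, 20-byte IP header, 16 bytes added by esp4) come from the
description above; the helper name and the enum are hypothetical and this
is not the kernel code.

```c
#include <stdbool.h>

enum {
    DEMO_MTU = 1500,        /* mtu used in the example above */
    DEMO_IP_HDR_LEN = 20,   /* IPv4 header without options */
    DEMO_ESP_TRAILER = 16,  /* bytes esp4 adds, per the report */
};

/* Hypothetical helper: does a fragment carrying `payload` bytes, plus
 * `trailer` bytes appended later and the IP header, still fit the mtu? */
static bool demo_frag_fits_mtu(int payload, int trailer)
{
    return payload + trailer + DEMO_IP_HDR_LEN <= DEMO_MTU;
}
```

A full 1480-byte last fragment plus the 16 trailer bytes does not fit,
while a fragment that leaves 16 bytes of room does.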
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change "return (EXPR);" to "return EXPR;"
return is not a function, parentheses are not required.
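A minimal before/after sketch of the style change, using hypothetical
function names rather than any function touched by the patch:

```c
/* Before: redundant parentheses around the returned expression. */
static int demo_sum_old(int a, int b)
{
    return (a + b);
}

/* After: identical behaviour without the parentheses. */
static int demo_sum_new(int a, int b)
{
    return a + b;
}
```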
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
E1000 can benefit from calling the GRO receive functions.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Net drivers in general have an issue where timers fired
by mod_timer or work threads with schedule_work are running
outside of the rtnl_lock.
With no other lock protection these routines are vulnerable
to races with driver unload or reset paths.
The longer term solution to this might be a redesign with
safer locks being taken in the driver to guarantee no
reentrance, but for now a safe and effective fix is
to take the rtnl_lock in these routines.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
E1000 is using several timers that, in a follow-on patch,
will need to acquire the rtnl_lock in order to be safe.
This patch moves the timer bodies into work queues, which
will allow the next patch to add rtnl_lock.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If NETIF_F_HIGHDMA is set in netdev->features, we should set it in
netdev->vlan_features as well, so that a VLAN netdev created on top of the
real netdev can also benefit from HIGHDMA on 32-bit systems. This reduces
the performance hit caused by __skb_linearize(), particularly for large
sends. This patch fixes it for the Intel e1000, e1000e, igb, ixgb, and
ixgbe drivers, since it should benefit all devices supported by these
drivers.
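The idea can be sketched in userspace as follows. The struct, the flag
value and the use_highdma parameter are stand-ins invented for
illustration; the real drivers operate on struct net_device and
NETIF_F_HIGHDMA.

```c
#include <stdint.h>

/* Stand-in for NETIF_F_HIGHDMA; the bit position is arbitrary here. */
#define DEMO_F_HIGHDMA (1u << 5)

struct demo_netdev {
    uint32_t features;       /* features of the real device */
    uint32_t vlan_features;  /* features inherited by VLAN devices on top */
};

/* Set the flag in both masks so VLAN devices inherit it too. */
static void demo_enable_highdma(struct demo_netdev *dev, int use_highdma)
{
    if (use_highdma) {
        dev->features |= DEMO_F_HIGHDMA;
        dev->vlan_features |= DEMO_F_HIGHDMA;
    }
}
```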
Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for the Intel(r) DH89xxCC series. The new
device will be using Intel(r) i347-AT4 and Marvell(r) M88E1322 and
M88E1112 PHYs. Support for these devices has also been added here.
Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change corrects an issue in which we were setting all flag bits except
for promisc instead of clearing the promisc bits due to the incorrect use
of an |= instead of an &=.
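The difference between the two operators can be shown with a short
sketch. The flag names and bit positions are hypothetical, not the
driver's real definitions.

```c
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define DEMO_UNICAST_PROMISC   (1u << 0)
#define DEMO_MULTICAST_PROMISC (1u << 1)
#define DEMO_PROMISC_MASK (DEMO_UNICAST_PROMISC | DEMO_MULTICAST_PROMISC)

/* Buggy form: OR-ing with the inverted mask sets every non-promisc bit
 * and leaves the promisc bits as they were. */
static uint32_t demo_clear_promisc_buggy(uint32_t flags)
{
    flags |= ~DEMO_PROMISC_MASK;
    return flags;
}

/* Fixed form: AND-ing with the inverted mask clears only the promisc bits. */
static uint32_t demo_clear_promisc_fixed(uint32_t flags)
{
    flags &= ~DEMO_PROMISC_MASK;
    return flags;
}
```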
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For non-managed versions of 82579, set the bit that prevents the hardware
from automatically configuring the PHY after resets only when the driver
performs a reset, and clear the bit after the reset. This lets the hardware
configure the PHY automatically when the part is reset in a manner that is
not controlled by the driver (e.g. in a virtual environment via PCI FLR);
otherwise the PHY will be misconfigured, causing issues such as failing to
link at 1000Mbps.
For managed versions of 82579, keep the previous behavior since the
manageability firmware will handle the PHY configuration.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The subject workaround was causing CRC errors due to writing the wrong
register with updates of the RCTL register. It was also found that the
workaround function which modifies the RCTL register was being called in
the middle of a read-modify-write operation of the RCTL register, so the
function call has been moved appropriately. Lastly, jumbo frames must not
be allowed when CRC stripping is disabled by a module parameter because the
workaround requires the CRC be stripped.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On 82579, there is a hardware bug that can cause received packets to not
get transferred from the PHY to the MAC due to K1 (a power saving feature
of the PHY-MAC interconnect similar to ASPM L1). Since the MAC controls
the accounting of missed packets, these will go unnoticed. Work around the
issue by setting the K1 beacon duration according to the link speed.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two recent patches to clean up the reset[1] and initial PHY configuration[2]
code paths for ICH/PCH devices inadvertently left out a 10msec delay and a
device ID check, respectively, which are necessary for the 82566DC (device
id 0x104b) to be configured properly; otherwise it will not get link.
[1] commit e98cac447c
[2] commit 3f0c16e844
CC: stable@kernel.org
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the hardware is prevented from performing automatic PHY configuration
(the driver does it instead), the OEM_WRITE_ENABLE bit in the EXTCNF_CTRL
register will not get cleared, preventing the SMBus address and the LED
configuration from being written to the PHY registers. On 82579, do not
check the OEM_WRITE_ENABLE bit.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When going to Sx, disable gigabit in PHY (e1000_oem_bits_config_ich8lan)
in addition to the MAC before configuring PHY wakeup; otherwise the PHY
configuration writes might be missed. Also write the LED configuration
and SMBus address to the PHY registers (e1000_oem_bits_config_ich8lan and
e1000_write_smbus_addr, respectively). The reset is no longer needed
since re-auto-negotiation is forced in e1000_oem_bits_config_ich8lan and
leaving it in causes issues with auto-negotiating the link.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
otherwise ECT(1) bit will get interpreted as RTO_ONLINK
and routing will fail with XfrmOutBundleGenError.
Signed-off-by: Ulrich Weber <uweber@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
They are allocated in atl1_setup_ring_resources, so zero out the pointers
in atl1_free_ring_resources (like the other resources).
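A userspace sketch of the pattern, assuming a hypothetical ring structure
with two allocations; the real driver uses its own structures and DMA
allocators rather than calloc()/free().

```c
#include <stdlib.h>

/* Hypothetical ring with two separately allocated areas. */
struct demo_ring {
    void *desc;
    void *buffer_info;
};

/* Free the resources and zero the pointers so later code can tell the
 * ring is gone instead of following stale pointers. */
static void demo_free_ring(struct demo_ring *ring)
{
    free(ring->desc);
    free(ring->buffer_info);
    ring->desc = NULL;
    ring->buffer_info = NULL;
}

/* Allocate both areas; on failure release whatever was obtained. */
static int demo_setup_ring(struct demo_ring *ring, size_t count)
{
    ring->desc = calloc(count, 16);
    ring->buffer_info = calloc(count, 16);
    if (ring->desc == NULL || ring->buffer_info == NULL) {
        demo_free_ring(ring);
        return -1;
    }
    return 0;
}
```

With the pointers reset, calling the free routine a second time is
harmless, because free(NULL) is a no-op.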
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com>
Acked-by: Chris Snook <chris.snook@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
adapter->cmb.cmb is initialized when the device is opened and freed when
it's closed. Accessing it unconditionally during resume results either
in a crash (NULL pointer dereference, when the interface has not been
opened yet) or data corruption (when the interface has been used and
brought down, adapter->cmb.cmb points to a deallocated memory area).
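One plausible shape for such a guard is sketched below. This is an
assumption about the general approach, not the actual patch: the struct
layout, the running flag (a stand-in for netif_running()) and the
function name are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_cmb {
    unsigned int int_stats;
};

struct demo_adapter {
    struct demo_cmb *cmb;  /* valid only between open and close */
    bool running;          /* stand-in for netif_running() */
};

/* Touch the block only when the interface is up and the pointer is set.
 * Returns true if the block was reset. */
static bool demo_resume(struct demo_adapter *adapter)
{
    if (!adapter->running || adapter->cmb == NULL)
        return false;

    adapter->cmb->int_stats = 0;
    return true;
}
```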
Cc: stable@kernel.org
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com>
Acked-by: Chris Snook <chris.snook@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
x25_datagram_poll didn't add anything, so remove it.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This board never went into production, but some engineering samples
are in use.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SFN4111T never reached production and is not being used for internal
or customer testing.
Since we have no production Falcon boards using the SFT9001 or the
GMAC, remove support for them as well.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reduces namespace pollution by moving the "struct net" declaration
out of the userspace-facing portion of linux/netlink.h. It has no impact on
the kernel.
(This came up because we have several C++ applications which use "net" as a
namespace name.)
Signed-off-by: Ollie Wild <aaw@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to check for the proper socket type within the
ipv4_conntrack_defrag function before referencing the nodefrag flag.
For example, the tun driver receive path produces skbs with the
AF_UNSPEC socket type, so the current code causes unwanted
fragmented packets to go out.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix checksum calculation in nf_nat_snmp_basic.
Based on patches by Clark Wang <wtweeker@163.com> and
Stephen Hemminger <shemminger@vyatta.com>.
https://bugzilla.kernel.org/show_bug.cgi?id=17622
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
As soon as rcu_read_unlock() is called, there is no guarantee the current
thread can safely dereference the RCU-protected t pointer.
The fix is to copy t->alloc_size into a temporary variable.
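The ordering can be sketched in userspace as below. The lock and unlock
functions are counting stubs added only so the sketch compiles outside
the kernel; they provide no real RCU protection, and the struct and
variable names are hypothetical.

```c
#include <stddef.h>

/* Stubs standing in for rcu_read_lock()/rcu_read_unlock(). */
static int demo_rcu_depth;
static void demo_rcu_read_lock(void)   { demo_rcu_depth++; }
static void demo_rcu_read_unlock(void) { demo_rcu_depth--; }

struct demo_table {
    size_t alloc_size;
};

static struct demo_table *demo_current_table;

static size_t demo_table_alloc_size(void)
{
    struct demo_table *t;
    size_t size = 0;

    demo_rcu_read_lock();
    t = demo_current_table;    /* kernel code would use rcu_dereference() */
    if (t != NULL)
        size = t->alloc_size;  /* copy while t is still protected */
    demo_rcu_read_unlock();

    /* t must not be dereferenced past this point; use the copy. */
    return size;
}
```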
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_route_me_harder can't create the route cache when the outdev is the same
as the indev for skbs without a valid protocol set.
The __mkroute_input function has this check:
	if (skb->protocol != htons(ETH_P_IP)) {
		/* Not IP (i.e. ARP). Do not create route, if it is
		 * invalid for proxy arp. DNAT routes are always valid.
		 *
		 * Proxy arp feature have been extended to allow, ARP
		 * replies back to the same interface, to support
		 * Private VLAN switch technologies. See arp.c.
		 */
		if (out_dev == in_dev &&
		    IN_DEV_PROXY_ARP_PVLAN(in_dev) == 0) {
			err = -EINVAL;
			goto cleanup;
		}
	}
This patch gives the new skb a valid protocol to bypass this check. In order
to make ipt_REJECT work with bridges, you also need to enable ip_forward.
This patch also fixes a regression. When we used skb_copy_expand(), we
didn't have this issue stated above, as the protocol was properly set.
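The effect of setting the protocol can be sketched in userspace. The
struct and function names are hypothetical simplifications of the check
quoted above; only the 0x0800 value (ETH_P_IP) and the use of htons()
mirror the kernel code.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_ETH_P_IP 0x0800  /* same value as ETH_P_IP */

struct demo_skb {
    uint16_t protocol;  /* network byte order, as in the kernel */
};

/* Simplified form of the quoted check: non-IP packets routed back out
 * of the incoming device are rejected unless PVLAN proxy ARP is on. */
static int demo_mkroute_input(const struct demo_skb *skb,
                              int out_is_in, int proxy_arp_pvlan)
{
    if (skb->protocol != htons(DEMO_ETH_P_IP)) {
        if (out_is_in && !proxy_arp_pvlan)
            return -EINVAL;
    }
    return 0;
}

/* Giving the new skb a valid protocol lets it pass the check. */
static void demo_prepare_reply(struct demo_skb *nskb)
{
    nskb->protocol = htons(DEMO_ETH_P_IP);
}
```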
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I initially noticed this because of the compiler warning below, but it
does seem to be a valid concern in the case where ct_sip_get_header()
returns 0 in the first iteration of the while loop.
net/netfilter/nf_conntrack_sip.c: In function 'sip_help_tcp':
net/netfilter/nf_conntrack_sip.c:1379: warning: 'ret' may be used uninitialized in this function
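A reduced sketch of the problem and the fix. The verdict values match
NF_DROP (0) and NF_ACCEPT (1); the function and its array argument are
hypothetical stand-ins for the header-parsing loop.

```c
#include <stddef.h>

enum { DEMO_NF_DROP = 0, DEMO_NF_ACCEPT = 1 };

/* Walk a list of per-message verdicts, stopping at the first one that
 * is not an accept. Initialising ret gives a defined result when the
 * loop body never runs, which is the case the warning points at. */
static int demo_help(const int *verdicts, size_t count)
{
    int ret = DEMO_NF_ACCEPT;
    size_t i = 0;

    while (i < count) {
        ret = verdicts[i++];
        if (ret != DEMO_NF_ACCEPT)
            break;
    }
    return ret;
}
```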
Signed-off-by: Simon Horman <horms@verge.net.au>
[Patrick: changed NF_DROP to NF_ACCEPT]
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The transparent field of a socket is either inet_twsk(sk)->tw_transparent
for timewait sockets, or inet_sk(sk)->transparent for other sockets
(TCP/UDP).
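The selection can be sketched with a simplified, hypothetical socket
type. In the kernel the two fields live in different structures reached
through inet_twsk() and inet_sk(); here they are flattened into one
struct for illustration.

```c
#include <stdbool.h>

enum demo_state { DEMO_ESTABLISHED, DEMO_TIME_WAIT };

/* Flattened stand-in for the two kernel socket flavours. */
struct demo_sock {
    enum demo_state state;
    bool transparent;     /* meaningful for full sockets */
    bool tw_transparent;  /* meaningful for timewait sockets */
};

/* Read the flag from the field that matches the socket's state. */
static bool demo_sk_transparent(const struct demo_sock *sk)
{
    if (sk->state == DEMO_TIME_WAIT)
        return sk->tw_transparent;
    return sk->transparent;
}
```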
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
CAIF sockets should use the socket's default send and receive buffer sizes.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check that receive function pointer is not null before calling it.
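A minimal sketch of such a check, with a hypothetical layer struct and an
arbitrarily chosen error code; the real CAIF layer type and return value
may differ.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical layer with an optional receive callback. */
struct demo_layer {
    int (*receive)(struct demo_layer *self, int payload);
};

/* Call the upper layer only if it actually provides a receive function. */
static int demo_deliver(struct demo_layer *up, int payload)
{
    if (up == NULL || up->receive == NULL)
        return -EINVAL;  /* error code chosen for the sketch */
    return up->receive(up, payload);
}

/* Trivial callback used to exercise the non-NULL path. */
static int demo_echo(struct demo_layer *self, int payload)
{
    (void)self;
    return payload;
}
```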
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use pr_debug for flow control printouts, and refine an error printout.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove debugging quirk redefining pr_debug to pr_warning.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Use rcu_dereference_rtnl() in __in6_dev_get
- kerneldoc for __in6_dev_get() and in6_dev_get()
- Use inline functions instead of macros
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use dev_kfree_skb_any() helper to free the skb
Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Chas Williams <chas@cmf.nrl.navy.mil>
Cc: linux-atm-general@lists.sourceforge.net
Signed-off-by: David S. Miller <davem@davemloft.net>
tmspci driver uses dev->name before register_netdev() and so prints tr%d
in initialization messages. Fix it by using dev_info.
Found and tested on real hardware.
Signed-off-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/core/ethtool.c: In function 'ethtool_get_regs':
net/core/ethtool.c:818:2: error: implicit declaration of function 'vmalloc'
net/core/ethtool.c:818:9: warning: assignment makes pointer from integer without a cast
net/core/ethtool.c:833:2: error: implicit declaration of function 'vfree'
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/sfc/filter.c: In function ‘efx_probe_filters’:
drivers/net/sfc/filter.c:422: error: implicit declaration of function ‘vmalloc’
drivers/net/sfc/filter.c:422: warning: assignment makes pointer from integer without a cast
drivers/net/sfc/filter.c: In function ‘efx_remove_filters’:
drivers/net/sfc/filter.c:442: error: implicit declaration of function ‘vfree’
Signed-off-by: David S. Miller <davem@davemloft.net>
Special care should be taken when the slow path is hit in ip_fragment():
When walking through frags, we transfer truesize ownership from skb to
frags. Then if we hit a slow_path condition, we must undo this or risk
uncharging frags->truesize twice and, in the end, having a negative socket
sk_wmem_alloc counter, or even freeing the socket sooner than expected.
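The transfer-then-undo logic can be sketched in userspace as follows.
The struct, the owned flag (standing in for frag->sk and
frag->destructor) and the length check are simplified, hypothetical
stand-ins for what ip_fragment() does.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_frag {
    unsigned int len;
    unsigned int truesize;
    bool owned;  /* stand-in for frag->sk / frag->destructor being set */
};

/* Try the fast path: move truesize accounting from the head to each
 * frag. If a frag fails the check, roll back every transfer made so
 * far so that no truesize is uncharged twice later. */
static bool demo_try_fast_path(unsigned int *head_truesize,
                               struct demo_frag *frags, size_t n,
                               unsigned int max_len)
{
    size_t i, j;

    for (i = 0; i < n; i++) {
        if (frags[i].len > max_len)
            goto undo;
        frags[i].owned = true;
        *head_truesize -= frags[i].truesize;
    }
    return true;

undo:
    for (j = 0; j < i; j++) {
        frags[j].owned = false;
        *head_truesize += frags[j].truesize;
    }
    return false;
}
```

After a failed attempt the head accounts for all the memory again, which
is the state the slow path expects.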
Many thanks to Nick Bowler, who provided a very clean bug report and
test program.
Thanks to Jarek for reviewing my first patch and providing a V2.
While Nick's bisection pointed to commit 2b85a34e91 (net: No more
expensive sock_hold()/sock_put() on each tx), the underlying bug is older
(2.6.12-rc5).
A side effect is to extend work done in commit b2722b1c3a
(ip_fragment: also adjust skb->truesize for packets not owned by a
socket) to ipv6 as well.
Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com>
Tested-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
For backward compatibility, add it at the end.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some NICs have huge register files which exceed the maximum heap
allocation size.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>