This is a way to avoid nasty routing loops when multiple ipvs instances can
forward to eachother.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
In the event of an icmp packet, take only the ports instead of trying to
grab the full header.
In the event of an inverse packet, use the source address and port.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
In the event of an icmp packet, take only the ports instead of trying to
grab the full header.
In the event of an inverse packet, use the source address and port.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
In the event of an icmp packet, take only the ports instead of trying to
grab the full header.
In the event of an inverse packet, use the source address and port.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Check the header for icmp before sending a PACKET_TOO_BIG
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Invoke the try_to_schedule logic from the icmp path and update it to the
appropriate ip_vs_conn_put function. The schedule functions have been
updated to reject the packets immediately for now.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
"source_hash" the dest fields if it's an inverse packet.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The ip_vs_iphdr may refer to an internal header, so use the outer one
instead.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
This sysctl will be used to enable the scheduling of icmp packets.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
This is necessary to schedule icmp later.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
No longer necessary since the information is included in the ip_vs_iphdr
itself.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
This is necessary as we'll be trying to schedule icmp later and we'll want
to share this code.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
These flags contain information like whether or not the addresses are
inverted or from icmp. The first will allow us to drop an inverse param
all over the place, and the second will later be useful in scheduling icmp.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
This removes some duplicated code and makes the ICMPv6 path look more like
the ICMP path.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
bridge/netfilter/ebtables.c:290:26: warning: incorrect type in assignment (different modifiers)
-> remove __pure annotation.
ipv6/netfilter/ip6t_SYNPROXY.c:240:27: warning: cast from restricted __be16
-> switch ntohs to htons and vice versa.
netfilter/core.c:391:30: warning: symbol 'nfq_ct_nat_hook' was not declared. Should it be static?
-> delete it, got removed
net/netfilter/nf_synproxy_core.c:221:48: warning: cast to restricted __be32
-> Use __be32 instead of u32.
Tested with objdiff that these changes do not affect generated code.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This reverts commit 98d1bd802c.
mark_source_chains will not re-visit chains, so
*filter
:INPUT ACCEPT [365:25776]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [217:45832]
:t1 - [0:0]
:t2 - [0:0]
:t3 - [0:0]
:t4 - [0:0]
-A t1 -i lo -j t2
-A t2 -i lo -j t3
-A t3 -i lo -j t4
# -A INPUT -j t4
# -A INPUT -j t3
# -A INPUT -j t2
-A INPUT -j t1
COMMIT
Will compute a chain depth of 2 if the comments are removed.
Revert back to counting the number of chains for the time being.
Reported-by: Cong Wang <cwang@twopensource.com>
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Simon Horman says:
====================
Second Round of IPVS Updates for v4.3
I realise these are a little late in the cycle, so if you would prefer
me to repost them for v4.4 then just let me know.
The updates include:
* A new scheduler from Raducu Deaconu
* Enhanced configurability of the sync daemon from Julian Anastasov
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
RFC 4443 added two new codes values for ICMPv6 type 1:
5 - Source address failed ingress/egress policy
6 - Reject route to destination
And RFC 7084 states in L-14 that IPv6 Router MUST send ICMPv6 Destination
Unreachable with code 5 for packets forwarded to it that use an address
from a prefix that has been invalidated.
Codes 5 and 6 are more informative subsets of code 1.
Signed-off-by: Andreas Herz <andi@geekosphere.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Instead of IS_ENABLED(CONFIG_IPV6), otherwise we hit:
et/built-in.o: In function `tee_tg6':
>> xt_TEE.c:(.text+0x6cd8c): undefined reference to `nf_dup_ipv6'
when:
CONFIG_IPV6=y
CONFIG_NF_DUP_IPV4=y
# CONFIG_NF_DUP_IPV6 is not set
CONFIG_NETFILTER_XT_TARGET_TEE=y
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
- mcast_group: configure the multicast address, now IPv6
is supported too
- mcast_port: configure the multicast port
- mcast_ttl: configure the multicast TTL/HOP_LIMIT
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Allow setups with large MTU to send large sync packets by
adding sync_maxlen parameter. The default value is now based
on MTU but no more than 1500 for compatibility reasons.
To avoid problems if MTU changes allow fragmentation by
sending packets with DF=0. Problem reported by Dan Carpenter.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
When the sync damon is started we need to hold rtnl
lock while calling ip_mc_join_group. Currently, we have
a wrong locking order because the correct one is
rtnl_lock->__ip_vs_mutex. It is implied from the usage
of __ip_vs_mutex in ip_vs_dst_event() which is called
under rtnl lock during NETDEV_* notifications.
Fix the problem by calling rtnl_lock early only for the
start_sync_thread call. As a bonus this fixes the usage
__dev_get_by_name which was not called under rtnl lock.
This patch actually extends and depends on commit 54ff9ef36b
("ipv4, ipv6: kill ip_mc_{join, leave}_group and
ipv6_sock_mc_{join, drop}").
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The weighted overflow scheduling algorithm directs network connections
to the server with the highest weight that is currently available
and overflows to the next when active connections exceed the node's weight.
Signed-off-by: Raducu Deaconu <rhadoo.io88@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Convert the xgene_get_mac_address to device_get_mac_address(), and
xgene_get_phy_mode() to device_get_phy_mode().
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas reported breakage adding routes with local nexthops:
$ ip route show table main
...
172.28.0.0/24 dev vnf-xe1p0 proto kernel scope link src 172.28.0.16
$ ip route add 10.0.0.0/8 via 172.28.0.32 table 100 dev vnf-xe1p0
RTNETLINK answers: Resource temporarily unavailable
3bfd847203 changed the lookup to use the passed in table but for cases like
this the nexthop is in the local table rather than the passed in table.
Fixes: 3bfd847203 ("net: Use passed in table for nexthop lookups")
Reported-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
.maxtype should match .policy. Probably just been getting lucky here
because IFLA_BRPORT_MAX > IFLA_BR_MAX.
Fixes: 13323516 ("bridge: implement rtnl_link_ops->changelink")
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dev==NULL check in smsc911x_probe_config is useless
and isn't providing any additional protection. If a fwnode
doesn't exist then an appropriate error should be returned
by device_get_phy_mode() covering the original case
of a missing of/fwnode.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds MAC address length check back into
the device_get_mac_addr() function before calling
is_valid_ether_addr() similar to the way the OF
routine does it.
Update the comments for the two new functions.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ath10k:
* add support for qca99x0 family of devices
* improve performance of tx_lock
* add support for raw mode (802.11 frame format) and software crypto
engine enabled via a module parameter
ath9k:
* add fast-xmit support
wil6210:
* implement TSO support
* support bootloader v1 and onwards
iwlwifi:
* Deprecate -10.ucode
* Clean ups towards multiple Rx queues
* Add support for longer CMD IDs. This will be required by new
firmwares since we are getting close to the u8 limit.
* bugfixes for the D0i3 power state
* Add basic support for FTM
* polish the Miracast operation
* fix a few power consumption issues
* scan cleanup
* fixes for D0i3 system state
* add paging for devices that support it
* add again the new RBD allocation model
* add more options to the firmware debug system
* add support for frag SKBs in Tx
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAABAgAGBQJV1FziAAoJEG4XJFUm622bEJwH/RdNlAK4+IS9Tx0K6EA2fDE3
001J16in/1IPPr8/RmQWNzHGN5/GAYlAdl5v4fkwb8qvcDCyPlQbSEC8ghd32H6M
G1101qEfGk9bhLTpI8xjeqsT1gl98LswPYNAfoRX4AJKAmNCkfJ1WILLi/Q2DUOf
oG6IersuqQOdQbkXDnMm49FBZeSkVvzsJL+WQhKhYblh0bDH3qGuLcYvtkSnt+P8
8QG5X+DPEpmaYcu+5E5N0XgffsdPj/+xXlFova4DFySDYb4OzEH1MYp/rHjPpnUo
nDJqxZIgkkmhu0CGtvpWNeSZ7GQBI0Bzdd+dwBY5yyg5AqWuQ5A1W+4r3SjvHyU=
=A8Ax
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-next-for-davem-2015-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
Major changes:
ath10k:
* add support for qca99x0 family of devices
* improve performance of tx_lock
* add support for raw mode (802.11 frame format) and software crypto
engine enabled via a module parameter
ath9k:
* add fast-xmit support
wil6210:
* implement TSO support
* support bootloader v1 and onwards
iwlwifi:
* Deprecate -10.ucode
* Clean ups towards multiple Rx queues
* Add support for longer CMD IDs. This will be required by new
firmwares since we are getting close to the u8 limit.
* bugfixes for the D0i3 power state
* Add basic support for FTM
* polish the Miracast operation
* fix a few power consumption issues
* scan cleanup
* fixes for D0i3 system state
* add paging for devices that support it
* add again the new RBD allocation model
* add more options to the firmware debug system
* add support for frag SKBs in Tx
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Make fib_encap_match() static as it isn't used outside the file.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg says:
====================
average: convert users to inline implementation
Since there's very little benefit of the out-of-line implementation
(a single byte of .text in one driver as far as I've seen), convert
all drivers to the inline implementation, saving memory, and remove
the out-of-line implementation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since all users are now converted to the inline implementation,
remove the out-of-line implementation entirely.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of using the out-of-line EWMA calculation, use DECLARE_EWMA()
to create static inlines. On x86/64 this results in code that's one
byte larger (for me), but reduces struct link_ant and struct link
size by the two unsigned long values that store the parameters each.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reduces code size slightly (at least on x86/64) while also
removing memory consumption by two unsigned long values for each
ath5k device.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of using the out-of-line EWMA calculation, use DECLARE_EWMA()
to create static inlines. On x86/64 this results in no change in code
size for me, but reduces the struct receive_queue size by the two
unsigned long values that store the parameters.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f34fa14cc0 ("bnx2x: Add vxlan RSS support") has introduced an
endianity issue when passing the vxlan UDP port to the HW.
Reported-by: <fengguang.wu@intel.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov says:
====================
vrf: cleanups part 2
This is the next part of vrf cleanups, patch 1 drops the SLAB_PANIC
when creating kmem cache since it's handled, patch 02 removes a slave
duplicate check which is already done by the lower/upper code, patch 3
moves the ndo_add_slave code around a bit so we can drop an error
label and patch 4 drops the master device checks which are unnecessary
because the ops are taken from the master device itself so it can't be
different.
====================
Acked-by: David Ahern <dsa@cumulusnetworks.com>
When ndo_add|del_slave ops are used, they're taken from the respective
master device's netdev ops, so if the master device is a VRF only then
the VRF ops will get called thus no need to check the type of the
master.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can simplify do_vrf_add_slave by moving vrf_insert_slave in the end
of the enslaving and thus eliminate an error goto label. It always
succeeds and isn't needed before that anyway.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The upper/lower functions already check for duplicate slaves so no need
to do it again.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's pointless to panic on cache create failure when that case is handled
and even more so since it's not a kernel-wide fatal problem so don't
panic.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently whenever a packet different from ETH_P_IP is sent through the
VRF device it is leaked so plug the leaks and properly drop these
packets.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CONFIG_LWTUNNEL config is not enabled, the lwtstate_free() is not
declared in lwtunnel.h at all. However, even in this case, the function
is still referenced in fib_semantics.c so that there appears the
following sparse warnings:
net/ipv4/fib_semantics.c:553:17: error: undefined identifier 'lwtstate_free'
CC net/ipv4/fib_semantics.o
net/ipv4/fib_semantics.c: In function ‘fib_encap_match’:
net/ipv4/fib_semantics.c:553:3: error: implicit declaration of function ‘lwtstate_free’ [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
make[1]: *** [net/ipv4/fib_semantics.o] Error 1
make: *** [net/ipv4/fib_semantics.o] Error 2
To eliminate the error, we define an empty function for lwtstate_free()
in lwtunnel.h when CONFIG_LWTUNNEL is disabled.
Fixes: df383e6240 ("lwtunnel: fix memory leak")
Cc: Jiri Benc <jbenc@redhat.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
make payload expression aware of the fact that VLAN offload may have
removed a vlan header.
When we encounter tagged skb, transparently insert the tag into the
register so that vlan header matching can work without userspace being
aware of offload features.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-08-18
This series contains updates to igb, e100, e1000e and ixgbe.
Shota Suzuki provides a fix for a possible overflow in
igb_set_interrupt_capability() which leads to an oops. When changing the
number of queues by "ethtool -L", set IGB_FLAG_QUEUE_PAIRS in the same
manner as when initializing the igb driver.
Vasily Averin provides a fix for a missing rtnl_unlock() for when we
error out due to not being able to allocate memory for our queues.
Stefan Assman provides a couple of fixes for igb/igbvf. First changes
the igb driver in probe to simply call igb_enable_sriov() instead of
igb_sriov_reinit() since we are starting from scratch. Then in igbvf,
fix the driver where it does not clear the buffer_info->dma in all
cases after calling dma_unmap_single(), which was found by changing the
MTU twice.
Richard Cochran implements the periodic output function using the
programmable clock outputs available in i210 when possible, falling
back to the target time for longer periods.
Todd adds support for the Marvell PHY 1512 which is required for i354
devices. Then updates igb to make sure SR-IOV init uses the correct
number of queues, since recent changes could result in the PF holding
onto all of the queues.
Alex Williamson provides a fix in the case where a guest OS does not
support hot-unplug, so disable SR-IOV prior to unregister_netdev() to
avoid the problem.
Jia-Ju Bai provides several patches, first knocks some collecting dust
off an old e100 driver to add a check to avoid a null pointer
dereference. Then cleans up a possible resource leak by releasing the
skb buffer allocated when the e100_xmit_prepare() runs into an issue
in the DMA mapping. In igb, add a missing rtnl_unlock() for when we
error out due to igb_sriov_reinit() in the igb_init_interrupt_scheme().
Provides a e1000e fix, based on suggestions from Alex Duyck to move
head/tail register writing to e1000_configure_tx/rx() to avoid a
possible null pointer dereference (similar to igb driver). Lastly,
fix a possible memory leak in igb_probe(), where the memory shadow_vfta
allocated by kcalloc in igb_sw_init() is not freed.
Mark simplifies port-specific macros for ixgbe by eliminating explicit
comparisons with 0 and enclose formal parameters in parens to eliminate
the risk of an operator precedence issue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov says:
====================
vrf: a few simplifications and cleanups
These patches remove some unnecessary checks (patches 3, 4), unnecessary
num_slaves member and refcnt manipulations which are already done by the
upper functions.
====================
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>