linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-18 01:51:53 +00:00

Author	SHA1	Message	Date
Yunsheng Lin	206703289a	net: hns3: Remove error log when getting pfc stats fails When mac supports DCB, but is in GE mode, it does not support querying pfc stats, and firmware returns an error when trying to query the pfc stats. This creates a lot of noise in the kernel log when it prints the error log. This patch fixes it by removing the error log, because it already returns the error to the user space, so the user should be aware of the error. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 15:08:36 -04:00
Nick Dyer	068bdb67ef	Input: atmel_mxt_ts - fix the firmware update The automatic update mechanism will trigger an update if the info block CRCs are different between maxtouch configuration file (maxtouch.cfg) and chip. The driver compared the CRCs without retrieving the chip CRC, resulting always in a failure and firmware flashing action triggered. Fix this issue by retrieving the chip info block CRC before the check. Note that this solution has the benefit that by reading the information block and the object table into a contiguous region of memory, we can verify the checksum at probe time. This means we make sure that we are indeed talking to a chip that supports object protocol correctly. Using this patch on a kevin chromebook, the touchscreen and touchpad drivers are able to match the CRC: atmel_mxt_ts 3-004b: Family: 164 Variant: 14 Firmware V2.3.AA Objects: 40 atmel_mxt_ts 5-004a: Family: 164 Variant: 17 Firmware V2.0.AA Objects: 31 atmel_mxt_ts 3-004b: Resetting device atmel_mxt_ts 5-004a: Resetting device atmel_mxt_ts 3-004b: Config CRC 0x573E89: OK atmel_mxt_ts 3-004b: Touchscreen size X4095Y2729 input: Atmel maXTouch Touchscreen as /devices/platform/ff130000.i2c/i2c-3/3-004b/input/input5 atmel_mxt_ts 5-004a: Config CRC 0x0AF6BA: OK atmel_mxt_ts 5-004a: Touchscreen size X1920Y1080 input: Atmel maXTouch Touchpad as /devices/platform/ff140000.i2c/i2c-5/5-004a/input/input6 Signed-off-by: Nick Dyer <nick.dyer@shmanahar.org> Acked-by: Benson Leung <bleung@chromium.org> [Ezequiel: minor patch massage] Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> Tested-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>	2018-05-01 11:54:51 -07:00
Vittorio Gambaletta (VittGam)	f372b81101	Input: atmel_mxt_ts - add touchpad button mapping for Samsung Chromebook Pro This patch adds the correct platform data information for the Caroline Chromebook, so that the mouse button does not get stuck in pressed state after the first click. The Samus button keymap and platform data definition are the correct ones for Caroline, so they have been reused here. Signed-off-by: Vittorio Gambaletta <linuxbugs@vittgam.net> Signed-off-by: Salvatore Bellizzi <lkml@seppia.net> Tested-by: Guenter Roeck <groeck@chromium.org> Cc: stable@vger.kernel.org [dtor: adjusted vendor spelling to match shipping firmware] Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>	2018-05-01 11:35:33 -07:00
Stefan Strogin	b086ff8725	connector: add parent pid and tgid to coredump and exit events The intention is to get notified of process failures as soon as possible, before a possible core dumping (which could be very long) (e.g. in some process-manager). Coredump and exit process events are perfect for such use cases (see `2b5faa4c55` "connector: Added coredumping event to the process connector"). The problem is that for now the process-manager cannot know the parent of a dying process using connectors. This could be useful if the process-manager should monitor for failures only children of certain parents, so we could filter the coredump and exit events by parent process and/or thread ID. Add parent pid and tgid to coredump and exit process connectors event data. Signed-off-by: Stefan Strogin <sstrogin@cisco.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 14:25:37 -04:00
Florian Fainelli	e283de3a4f	net: core: Inline netdev_features_size_check() We do not require this inline function to be used in multiple different locations, just inline it where it gets used in register_netdevice(). Suggested-by: David Miller <davem@davemloft.net> Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 14:24:19 -04:00
Thomas Winter	edd7ceb782	ipv6: Allow non-gateway ECMP for IPv6 It is valid to have static routes where the nexthop is an interface not an address such as tunnels. For IPv4 it was possible to use ECMP on these routes but not for IPv6. Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz> Cc: David Ahern <dsahern@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 14:23:33 -04:00
Willem de Bruijn	a8c744a8b4	udp: disable gso with no_check_tx Syzbot managed to send a udp gso packet without checksum offload into the gso stack by disabling tx checksum (UDP_NO_CHECK6_TX). This triggered the skb_warn_bad_offload. RIP: 0010:skb_warn_bad_offload+0x2bc/0x600 net/core/dev.c:2658 skb_gso_segment include/linux/netdevice.h:4038 [inline] validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3120 __dev_queue_xmit+0xbf8/0x34c0 net/core/dev.c:3577 dev_queue_xmit+0x17/0x20 net/core/dev.c:3618 UDP_NO_CHECK6_TX sets skb->ip_summed to CHECKSUM_NONE just after the udp gso integrity checks in udp_(v6_)send_skb. Extend those checks to catch and fail in this case. After the integrity checks jump directly to the CHECKSUM_PARTIAL case to avoid reading the no_check_tx flags again (a TOCTTOU race). Fixes: `bec1f6f697` ("udp: generate gso with UDP_SEGMENT") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 14:20:14 -04:00
Wenwen Wang	d656fe49e3	ethtool: fix a potential missing-check bug In ethtool_get_rxnfc(), the object "info" is firstly copied from user-space. If the FLOW_RSS flag is set in the member field flow_type of "info" (and cmd is ETHTOOL_GRXFH), info needs to be copied again from user-space because FLOW_RSS is newer and has new definition, as mentioned in the comment. However, given that the user data resides in user-space, a malicious user can race to change the data after the first copy. By doing so, the user can inject inconsistent data. For example, in the second copy, the FLOW_RSS flag could be cleared in the field flow_type of "info". In the following execution, "info" will be used in the function ops->get_rxnfc(). Such inconsistent data can potentially lead to unexpected information leakage since ops->get_rxnfc() will prepare various types of data according to flow_type, and the prepared data will be eventually copied to user-space. This inconsistent data may also cause undefined behaviors based on how ops->get_rxnfc() is implemented. This patch simply re-verifies the flow_type field of "info" after the second copy. If the value is not as expected, an error code will be returned. Signed-off-by: Wenwen Wang <wang6495@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 14:18:47 -04:00
Colin Ian King	26ff75857e	net/mlx4: fix spelling mistake: "failedi" -> "failed" Trivial fix to a spelling mistake in an mlx4_warn message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 14:17:29 -04:00
Paul Blakey	05cd271fd6	cls_flower: Support multiple masks per priority Currently flower doesn't support inserting filters with different masks on a single priority, even if the actual flows (key + mask) inserted aren't overlapping, as with the use case of offloading openvswitch datapath flows. Instead one must go up one level, and assign a different priority to each mask, which creates different flower instances. This patch opens flower to support more than one mask per priority, and a single flower instance. It does so by adding another hash table on top of the existing one which will store the different masks, and the filters that share them. The user is left with the responsibility of ensuring non-overlapping flows, otherwise precedence is not guaranteed. Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 14:14:15 -04:00
Michael S. Tsirkin	de08481a25	vhost: make msg padding explicit There's a 32 bit hole just after type. It's best to give it a name; this way the compiler is forced to initialize it with the rest of the structure. Reported-by: Kevin Easton <kevin@guarana.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 14:05:13 -04:00
Eric Dumazet	bf2acc943a	tcp: fix TCP_REPAIR_QUEUE bound checking syzbot is able to produce a nasty WARN_ON() in tcp_verify_left_out() with following C-repro : socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3 setsockopt(3, SOL_TCP, TCP_REPAIR, [1], 4) = 0 setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [-1], 4) = 0 bind(3, {sa_family=AF_INET, sin_port=htons(20002), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 sendto(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1242, MSG_FASTOPEN, {sa_family=AF_INET, sin_port=htons(20002), sin_addr=inet_addr("127.0.0.1")}, 16) = 1242 setsockopt(3, SOL_TCP, TCP_REPAIR_WINDOW, "\4\0\0@+\205\0\0\377\377\0\0\377\377\377\177\0\0\0\0", 20) = 0 writev(3, [{"\270", 1}], 1) = 1 setsockopt(3, SOL_TCP, TCP_REPAIR_OPTIONS, "\10\0\0\0\0\0\0\0\0\0\0\0\|\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 386) = 0 writev(3, [{"\210v\r[\226\320t\231qwQ\204\264l\254\t\1\20\245\214p\350H\223\254;\\\37\345\307p$"..., 3144}], 1) = 3144 The 3rd system call looks odd : setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [-1], 4) = 0 This patch makes sure bound checking is using an unsigned compare. Fixes: `ee9952831c` ("tcp: Initial repair mode") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 12:25:58 -04:00
Eric Dumazet	cea67a2dd6	ipv6: fix uninit-value in ip6_multipath_l3_keys() syzbot/KMSAN reported an uninit-value in ip6_multipath_l3_keys(), root caused to a bad assumption of ICMP header being already pulled in skb->head. ip_multipath_l3_keys() does the correct thing, so it is an IPv6 only bug. BUG: KMSAN: uninit-value in ip6_multipath_l3_keys net/ipv6/route.c:1830 [inline] BUG: KMSAN: uninit-value in rt6_multipath_hash+0x5c4/0x640 net/ipv6/route.c:1858 CPU: 0 PID: 4507 Comm: syz-executor661 Not tainted 4.16.0+ #87 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683 ip6_multipath_l3_keys net/ipv6/route.c:1830 [inline] rt6_multipath_hash+0x5c4/0x640 net/ipv6/route.c:1858 ip6_route_input+0x65a/0x920 net/ipv6/route.c:1884 ip6_rcv_finish+0x413/0x6e0 net/ipv6/ip6_input.c:69 NF_HOOK include/linux/netfilter.h:288 [inline] ipv6_rcv+0x1e16/0x2340 net/ipv6/ip6_input.c:208 __netif_receive_skb_core+0x47df/0x4a90 net/core/dev.c:4562 __netif_receive_skb net/core/dev.c:4627 [inline] netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701 netif_receive_skb+0x230/0x240 net/core/dev.c:4725 tun_rx_batched drivers/net/tun.c:1555 [inline] tun_get_user+0x740f/0x7c60 drivers/net/tun.c:1962 tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990 call_write_iter include/linux/fs.h:1782 [inline] new_sync_write fs/read_write.c:469 [inline] __vfs_write+0x7fb/0x9f0 fs/read_write.c:482 vfs_write+0x463/0x8d0 fs/read_write.c:544 SYSC_write+0x172/0x360 fs/read_write.c:589 SyS_write+0x55/0x80 fs/read_write.c:581 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Fixes: `23aebdacb0` ("ipv6: Compute multipath hash for ICMP errors from offending packet") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jakub Sitnicki <jkbs@redhat.com> Acked-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 12:15:24 -04:00
Linus Torvalds	f2125992e7	Merge tag 'xfs-4.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: "Here are a few more bug fixes for xfs for 4.17-rc4. Most of them are fixes for bad behavior. This series has been run through a full xfstests run during LSF and through a quick xfstests run against this morning's master, with no major failures reported. Summary: - Enhance inode fork verifiers to prevent loading of corrupted metadata. - Fix a crash when we try to convert extents format inodes to btree format, we run out of space, but forget to revert the in-core state changes. - Fix file size checks when doing INSERT_RANGE that could cause files to end up negative size if there previously was an extent mapped at s_maxbytes. - Fix a bug when doing a remove-then-add ATTR_REPLACE xattr update where we forget to clear ATTR_REPLACE after the remove, which causes the attr to be lost and the fs to shut down due to (what it thinks is) inconsistent in-core state" * tag 'xfs-4.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: don't fail when converting shortform attr to long form during ATTR_REPLACE xfs: prevent creating negative-sized file via INSERT_RANGE xfs: set format back to extents if xfs_bmap_extents_to_btree xfs: enhance dinode verifier	2018-05-01 09:11:45 -07:00
David S. Miller	9908b3630f	Merge branch 'sctp-unify-sctp_make_op_error_fixed-and-sctp_make_op_error_space' Marcelo Ricardo Leitner says: ==================== sctp: unify sctp_make_op_error_fixed and sctp_make_op_error_space These two variants are very close to each other and can be merged to avoid code duplication. That's what this patchset does. First, we allow sctp_init_cause to return errors, which then allows us to add sctp_make_op_error_limited that handles both situations. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 12:09:36 -04:00
Marcelo Ricardo Leitner	8914f4bace	sctp: add sctp_make_op_error_limited and reuse inner functions The idea is quite similar to the old functions, but note that the _fixed function wasn't "fixed" as in that it would generate a packet with a fixed size, but rather limited/bounded to PMTU. Also, now with sctp_mtu_payload(), we have a more accurate limit. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 12:09:35 -04:00
Marcelo Ricardo Leitner	6d3e8aa876	sctp: allow sctp_init_cause to return errors And do so if the skb doesn't have enough space for the payload. This is a preparation for the next patch. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 12:09:35 -04:00
David S. Miller	065662d941	Merge branch 'net-stmmac-dwmac-meson-100M-phy-mode-support-for-AXG-SoC' Yixun Lan says: ==================== net: stmmac: dwmac-meson: 100M phy mode support for AXG SoC Because the dwmac glue layer register changed, we need to introduce a new compatible name for the Meson-AXG SoC to support the RMII 100M ethernet PHY. Changes since v1 at [1]: - implement set_phy_mode() for each SoC [1] https://lkml.kernel.org/r/20180426160508.29380-1-yixun.lan@amlogic.com ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 11:30:00 -04:00
Yixun Lan	efacb568c9	net: stmmac: dwmac-meson: extend phy mode setting In the Meson-AXG SoC, the phy mode setting of PRG_ETH0 in the glue layer is extended from bit[0] to bit[2:0]. There is no problem if we configure it to the RGMII 1000M PHY mode, since the register setting is coincidentally compatible with the previous one, but for the RMII 100M PHY mode, the configuration needs to be changed to the value b100. This patch was verified with an RTL8201F 100M ethernet PHY. Signed-off-by: Yixun Lan <yixun.lan@amlogic.com> Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 11:29:59 -04:00
Yixun Lan	7e5d05e18b	dt-bindings: net: meson-dwmac: new compatible name for AXG SoC We need to introduce a new compatible name for the Meson-AXG SoC in order to support the RMII 100M ethernet PHY, since the PRG_ETH0 register of the dwmac glue layer has changed from previous SoCs. Signed-off-by: Yixun Lan <yixun.lan@amlogic.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 11:29:59 -04:00
David S. Miller	90d52d4fd8	Merge branch 'netns-uevent-filtering' Christian Brauner says: ==================== netns: uevent filtering This is the new approach to uevent filtering as discussed (see the threads in [1], [2], and [3]). It only contains non-functional changes. This series deals with fixing up uevent filtering logic: - uevent filtering logic is simplified - locking time on uevent_sock_list is minimized - tagged and untagged kobjects are handled in separate codepaths - permissions for userspace are fixed for network device uevents in network namespaces owned by non-initial user namespaces Udev is now able to see those events correctly which it wasn't before. For example, moving a physical device into a network namespace not owned by the initial user namespace before gave: root@xen1:~# udevadm --debug monitor -k calling: monitor monitor will print the received events for: KERNEL - the kernel uevent sender uid=65534, message ignored sender uid=65534, message ignored sender uid=65534, message ignored sender uid=65534, message ignored sender uid=65534, message ignored and now after the discussion and solution in [3] correctly gives: root@xen1:~# udevadm --debug monitor -k calling: monitor monitor will print the received events for: KERNEL - the kernel uevent KERNEL[625.301042] add /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/enp1s0f1 (net) KERNEL[625.301109] move /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/enp1s0f1 (net) KERNEL[625.301138] move /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/eth1 (net) KERNEL[655.333272] remove /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/eth1 (net) Thanks! Christian [1]: https://lkml.org/lkml/2018/4/4/739 [2]: https://lkml.org/lkml/2018/4/26/767 [3]: https://lkml.org/lkml/2018/4/26/738 ==================== Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 10:22:41 -04:00
Christian Brauner	a3498436b3	netns: restrict uevents commit `07e98962fa` ("kobject: Send hotplug events in all network namespaces") enabled sending hotplug events into all network namespaces back in 2010. Over time the set of uevents that get sent into all network namespaces has shrunk. We have now reached the point where hotplug events for all devices that carry a namespace tag are filtered according to that namespace. Specifically, they are filtered whenever the namespace tag of the kobject does not match the namespace tag of the netlink socket. Currently, only network devices carry namespace tags (i.e. network namespace tags). Hence, uevents for network devices only show up in the network namespace such devices are created in or moved to. However, any uevent for a kobject that does not have a namespace tag associated with it will not be filtered and we will broadcast it into all network namespaces. This behavior stopped making sense when user namespaces were introduced. This patch simplifies and fixes a couple of things: - Split codepath for sending uevents by kobject namespace tags: 1. Untagged kobjects - uevent_net_broadcast_untagged(): Untagged kobjects will be broadcast into all uevent sockets recorded in uevent_sock_list, i.e. into all network namespaces owned by the initial user namespace. 2. Tagged kobjects - uevent_net_broadcast_tagged(): Tagged kobjects will only be broadcast into the network namespace they were tagged with. Handling of tagged kobjects in 2. does not cause any semantic changes. This is just splitting out the filtering logic that was handled by kobj_bcast_filter() before. Handling of untagged kobjects in 1. will cause a semantic change. The reasons why this is needed and ok have been discussed in [1]. Here is a short summary: - Userspace ignores uevents from network namespaces that are not owned by the initial user namespace: Uevents are filtered by userspace in a user namespace because the received uid != 0. Instead the uid associated with the event will be 65534 == "nobody" because the global root uid is not mapped. This means we can safely and without introducing regressions modify the kernel to not send uevents into all network namespaces whose owning user namespace is not the initial user namespace because we know that userspace will ignore the message because of the uid anyway. I have verified a) that this is true for every udev implementation out there and b) that this behavior has been present in all udev implementations from the very beginning. - Thundering herd: Broadcasting uevents into all network namespaces introduces significant overhead. All processes that listen to uevents running in non-initial user namespaces will end up responding to uevents that will be meaningless to them. Mainly, because non-initial user namespaces cannot easily manage devices unless they have a privileged host-process helping them out. This means that there will be a thundering herd of activity when there shouldn't be any. - Removing needless overhead/Increasing performance: Currently, the uevent socket for each network namespace is added to the global variable uevent_sock_list. The list itself needs to be protected by a mutex. So every time a uevent is generated the mutex is taken on the list. The mutex is held from the creation of the uevent (memory allocation, string creation, etc.) until all uevent sockets have been handled. This is aggravated by the fact that for each uevent socket that has listeners the mc_list must be walked as well which means we're talking O(n^2) here. Given that a standard Linux workload usually has quite a lot of network namespaces and - in the face of containers - a lot of user namespaces this quickly becomes a performance problem (see "Thundering herd" above). By just recording uevent sockets of network namespaces that are owned by the initial user namespace we significantly increase performance in this codepath. - Injecting uevents: There's a valid argument that containers might be interested in receiving device events especially if they are delegated to them by a privileged userspace process. One prime example is SR-IOV enabled devices that are explicitly designed to be handed off to other users such as VMs or containers. This use-case can now be correctly handled since commit `692ec06d7c` ("netns: send uevent messages"). This commit introduced the ability to send uevents from userspace. As such we can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user namespace of the network namespace of the netlink socket) userspace process make a decision what uevents should be sent. This removes the need to blindly broadcast uevents into all user namespaces and provides a performant and safe solution to this problem. - Filtering logic: This patch filters by owning user namespace of the network namespace a given task resides in and not by user namespace of the task per se. This means if the user namespace of a given task is unshared but the network namespace is kept and is owned by the initial user namespace a listener that is opening the uevent socket in that network namespace can still listen to uevents. - Fix permission for tagged kobjects: Network devices that are created or moved into a network namespace that is owned by a non-initial user namespace are currently sent with INVALID_{G,U}ID in their credentials. This means that all current udev implementations in userspace will ignore the uevent they receive for them. This has led to weird bugs whereby new devices showing up in such network namespaces were not recognized and did not get IPs assigned etc. This patch adjusts the permission to the appropriate {g,u}id in the respective user namespace. This way udevd is able to correctly handle such devices. - Simplify filtering logic: do_one_broadcast() already ensures that only listeners in mc_list receive uevents that have the same network namespace as the uevent socket itself. So the filtering logic in kobj_bcast_filter is not needed (see [3]). This patch therefore removes kobj_bcast_filter() and replaces netlink_broadcast_filtered() with the simpler netlink_broadcast() everywhere. [1]: https://lkml.org/lkml/2018/4/4/739 [2]: https://lkml.org/lkml/2018/4/26/767 [3]: https://lkml.org/lkml/2018/4/26/738 Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 10:22:41 -04:00
Christian Brauner	26045a7b14	uevent: add alloc_uevent_skb() helper This patch adds alloc_uevent_skb() in preparation for follow up patches. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 10:22:40 -04:00
David S. Miller	e33200bc01	Merge branch 'tls-offload-netdev-and-mlx5-support' Boris Pismenny says: ==================== TLS offload, netdev & MLX5 support The following series provides TLS TX inline crypto offload. v1->v2: - Added IS_ENABLED(CONFIG_TLS_DEVICE) and a STATIC_KEY for icsk_clean_acked - File license fix - Fix spelling, comment by DaveW - Move memory allocations out of tls_set_device_offload and other misc fixes, comments by Kiril. v2->v3: - Reversed xmas tree where needed and style fixes - Removed the need for skb_page_frag_refill, per Eric's comment - IPv6 dependency fixes v3->v4: - Remove "inline" from functions in C files - Make clean_acked_data_enabled a static variable and add enable/disable functions to control it. - Remove unnecessary variable initialization mentioned by ShannonN - Rebase over TLS RX - Refactor the tls_software_fallback to reduce the number of variables mentioned by KirilT v4->v5: - Add missing CONFIG_TLS_DEVICE v5->v6: - Move changes to the software implementation into a separate patch - Fix some checkpatch warnings - GPL export the enable/disable clean_acked_data functions v6->v7: - Use the dst_entry to obtain the netdev in dev_get_by_index - Remove the IPv6 patch since it is redundant now v7->v8: - Fix a merge conflict in mlx5 header v8->v9: - Fix false -Wmaybe-uninitialized warning - Fix empty space in the end of new files v9->v10: - Remove default "n" in net/Kconfig This series adds a generic infrastructure to offload TLS crypto to network devices. It enables the kernel TLS socket to skip encryption and authentication operations on the transmit side of the data path, leaving those computationally expensive operations to the NIC. The NIC offload infrastructure builds TLS records and pushes them to the TCP layer just like the SW KTLS implementation and using the same API. TCP segmentation is mostly unaffected. Currently the only exception is that we prevent mixed SKBs where only part of the payload requires offload. In the future we are likely to add a similar restriction following a change cipher spec record. The notable differences between SW KTLS and NIC offloaded TLS implementations are as follows: 1. The offloaded implementation builds "plaintext TLS records"; those records contain plaintext instead of ciphertext and place holder bytes instead of authentication tags. 2. The offloaded implementation maintains a mapping from TCP sequence number to TLS records. Thus given a TCP SKB sent from a NIC offloaded TLS socket, we can use the tls NIC offload infrastructure to obtain enough context to encrypt the payload of the SKB. A TLS record is released when the last byte of the record is ack'ed; this is done through the new icsk_clean_acked callback. The infrastructure should be extendable to support various NIC offload implementations. However it is currently written with the implementation below in mind: The NIC assumes that packets from each offloaded stream are sent as plaintext and in-order. It keeps track of the TLS records in the TCP stream. When a packet marked for offload is transmitted, the NIC encrypts the payload in-place and puts authentication tags in the relevant place holders. The responsibility for handling out-of-order packets (i.e. TCP retransmission, qdisc drops) falls on the netdev driver. The netdev driver keeps track of the expected TCP SN from the NIC's perspective. If the next packet to transmit matches the expected TCP SN, the driver advances the expected TCP SN, and transmits the packet with TLS offload indication. If the next packet to transmit does not match the expected TCP SN, the driver calls the TLS layer to obtain the TLS record that includes the TCP of the packet for transmission. Using this TLS record, the driver posts a work entry on the transmit queue to reconstruct the NIC TLS state required for the offload of the out-of-order packet. It updates the expected TCP SN accordingly and transmits the now in-order packet. The same queue is used for packet transmission and TLS context reconstruction to avoid the need for flushing the transmit queue before issuing the context reconstruction request. Expected TCP SN is accessed without a lock, under the assumption that TCP doesn't transmit SKBs from different TX queues concurrently. If packets are rerouted to a different netdevice, then a software fallback routine handles encryption. Paper: https://www.netdevconf.org/1.2/papers/netdevconf-TLS.pdf ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:48 -04:00
Boris Pismenny	f9c8141fc1	MAINTAINERS: Update TLS maintainers Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:48 -04:00
Boris Pismenny	a051505c7e	MAINTAINERS: Update mlx5 innova driver maintainers Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:48 -04:00
Ilya Lesokhin	43585a41bd	net/mlx5e: TLS, Add error statistics Add statistics for rare TLS related errors. Since the errors are rare we have a counter per netdev rather than per SQ. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:48 -04:00
Ilya Lesokhin	bf23974104	net/mlx5e: TLS, Add Innova TLS TX offload data path Implement the TLS tx offload data path according to the requirements of the TLS generic NIC offload infrastructure. Special metadata ethertype is used to pass information to the hardware. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:47 -04:00
Ilya Lesokhin	c83294b9ef	net/mlx5e: TLS, Add Innova TLS TX support Add NETIF_F_HW_TLS_TX capability and expose tlsdev_ops to work with the TLS generic NIC offload infrastructure. The NETIF_F_HW_TLS_TX capability will be added in the next patch. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:47 -04:00
Ilya Lesokhin	1ae1732284	net/mlx5: Accel, Add TLS tx offload interface Add routines for manipulating TLS TX offload contexts. In Innova TLS, TLS contexts are added or deleted via a command message over the SBU connection. The HW then sends a response message over the same connection. Add implementation for Innova TLS (FPGA-based) hardware. These routines will be used by the TLS offload support in a later patch. mlx5/accel is a middle acceleration layer to allow mlx5e and other ULPs to work directly with mlx5_core rather than Innova FPGA or other mlx5 acceleration providers. In the future, when IPSec/TLS or any other acceleration gets integrated into ConnectX chip, mlx5/accel layer will provide the integrated acceleration, rather than the Innova one. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:47 -04:00
Ilya Lesokhin	bb9094161b	net/mlx5e: Move defines out of ipsec code The defines are not IPSEC specific. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:47 -04:00
Ilya Lesokhin	e8f6979981	net/tls: Add generic NIC offload infrastructure This patch adds a generic infrastructure to offload TLS crypto to a network device. It enables the kernel TLS socket to skip encryption and authentication operations on the transmit side of the data path, leaving those computationally expensive operations to the NIC. The NIC offload infrastructure builds TLS records and pushes them to the TCP layer just like the SW KTLS implementation and using the same API. TCP segmentation is mostly unaffected. Currently the only exception is that we prevent mixed SKBs where only part of the payload requires offload. In the future we are likely to add a similar restriction following a change cipher spec record. The notable differences between SW KTLS and NIC offloaded TLS implementations are as follows: 1. The offloaded implementation builds "plaintext TLS records"; those records contain plaintext instead of ciphertext and placeholder bytes instead of authentication tags. 2. The offloaded implementation maintains a mapping from TCP sequence number to TLS records. Thus given a TCP SKB sent from a NIC offloaded TLS socket, we can use the tls NIC offload infrastructure to obtain enough context to encrypt the payload of the SKB. A TLS record is released when the last byte of the record is ack'ed; this is done through the new icsk_clean_acked callback. The infrastructure should be extendable to support various NIC offload implementations. However, it is currently written with the implementation below in mind: The NIC assumes that packets from each offloaded stream are sent as plaintext and in-order. It keeps track of the TLS records in the TCP stream. When a packet marked for offload is transmitted, the NIC encrypts the payload in-place and puts authentication tags in the relevant placeholders. The responsibility for handling out-of-order packets (i.e. TCP retransmission, qdisc drops) falls on the netdev driver. 
The netdev driver keeps track of the expected TCP SN from the NIC's perspective. If the next packet to transmit matches the expected TCP SN, the driver advances the expected TCP SN, and transmits the packet with TLS offload indication. If the next packet to transmit does not match the expected TCP SN, the driver calls the TLS layer to obtain the TLS record that includes the TCP sequence number of the packet for transmission. Using this TLS record, the driver posts a work entry on the transmit queue to reconstruct the NIC TLS state required for the offload of the out-of-order packet. It updates the expected TCP SN accordingly and transmits the now in-order packet. The same queue is used for packet transmission and TLS context reconstruction to avoid the need for flushing the transmit queue before issuing the context reconstruction request. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:47 -04:00
Boris Pismenny	f66de3ee2c	net/tls: Split conf to rx + tx In TLS inline crypto, we can have one direction in software and another in hardware. Thus, we split the TLS configuration to separate structures for receive and transmit. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:47 -04:00
Ilya Lesokhin	2342a8512a	net: Add TLS TX offload features This patch adds a netdev feature to configure TLS TX offloads. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:47 -04:00
Ilya Lesokhin	a5c37c63f7	net: Add TLS offload netdev ops Add new netdev ops to add and delete tls context Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:47 -04:00
Ilya Lesokhin	ebf4e808fa	net: Add Software fallback infrastructure for socket dependent offloads With socket dependent offloads we rely on the netdev to transform the transmitted packets before sending them to the wire. When a packet from an offloaded socket is rerouted to a different device we need to detect it and do the transformation in software. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:46 -04:00
Ilya Lesokhin	08303c1895	net: Rename and export copy_skb_header copy_skb_header is renamed to skb_copy_header and exported. Exposing this function gives more flexibility in copying SKBs. skb_copy and skb_copy_expand do not give enough control over which parts are copied. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:46 -04:00
Ilya Lesokhin	6dac152355	tcp: Add clean acked data hook Called when a TCP segment is acknowledged. Could be used by application protocols that hold additional metadata associated with the stream data. This is required by TLS device offload to release metadata associated with acknowledged TLS records. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:42:46 -04:00
David S. Miller	1a1f4a28f3	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2018-04-30 This series contains updates to i40e and i40evf only. Jia-Ju Bai replaces an instance of GFP_ATOMIC with GFP_KERNEL, since i40evf is not in atomic context when i40evf_add_vlan() is called. Jake cleans up function header comments to ensure that the function parameter comments actually match the function parameters. Fixed a possible overflow error in the PTP clock code. Fixed warnings regarding restricted __be32 type usage. Mariusz fixes the reading of the LLDP configuration, which moves from using relative values to calculating the absolute address. Jakub adds a check for 10G LR mode for i40e. Paweł fixes an issue where changing the MTU would turn on TSO, GSO and GRO. Alex fixes a couple of issues with the UDP tunnel filter configuration. The first was that the tunnels did not have mutual exclusion in place to prevent a race condition between a user request to add/remove a port and an update. The second was that we were deleting filters that were not associated with the actual filter we wanted to delete. Harshitha ensures that the queue map sent by the VF is taken into account when enabling/disabling queues in the VF VSI. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-01 09:37:44 -04:00
Sun Lianwen	154a8c46ba	change the comment of vti6_ioctl The comment of vti6_ioctl() is wrong: it uses vti6_tnl_ioctl instead of vti6_ioctl. Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-05-01 09:38:41 +02:00
Linus Torvalds	fff75eb2a0	errseq infrastructure fix for v4.17 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJa55HYAAoJEAAOaEEZVoIVbr4QAICUweu5RqbcNm6/zGZkYJvP Cmk9bRhqluM3sx7RpDpsgde12z5j6qt+MbS1k+uZ7+V5gsO25jYuQYNGs6io6IEL WzO81HsMmqqqV30MI0/S4ffaYDtqTfJdKkbWvvV7aoFUZpz5oHNjqQM30hSRzy+i 1e6EfxtzUvaavekCvVCqQoDwXWMp1jaxCIFPNiIY82NPmszovydaZiZORL6qRXIM x/n3gWLJv2c0iRr6PBLIwKVOqPmdWwsmgSXTvP5/IH6LuQOZjmfHCJaxA8HZeFyz WvVTaH7aGtYp/pZt6UzvUlCXXcDKVYR4RmCfHW5+OXBsRQPfkSWKRguDXsvedrHe vQN7csCR47AwBsGtli6EF/VzwQnbCLjxOJ8kxVItPHlplYWeRsQCNw0JT9i6wB2k OIuRwekQTYUTXAHYm7XM21wS+JJbcTfANIfmmmfIrcF/Z/Zard5/UuCPSaKcizTy +qVhYE7m+cNYvARgxYrDvCXO7fnef6fj+cC4DrPmqfkvaJfd4U61Yls7QDi9NuO9 DNehjrWOR1ZLlI98aHjs65tilVFA++k2YdR8283ZqXE1tksUI3w/JvBa/h/H/9lr F290fgEe31yMAv3ky0QKm3Glr5Qu9I6AOSWWIHCmW7w6CkFB/K/1iKgy3h2g6rPL 3cMUf/uV4mA9RzAhSgx3 =pLgb -----END PGP SIGNATURE----- Merge tag 'errseq-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull errseq infrastructure fix from Jeff Layton: "The PostgreSQL developers recently had a spirited discussion about the writeback error handling in Linux, and reached out to us about a behavior change to the code that bit them when the errseq_t changes were merged. When we changed to using errseq_t for tracking writeback errors, we lost the ability for an application to see a writeback error that occurred before the open on which the fsync was issued. This was problematic for PostgreSQL which offloads fsync calls to a completely separate process from the DB writers. This patch restores that ability. If the errseq_t value in the inode does not have the SEEN flag set, then we just return 0 for the sample. That ensures that any recorded error is always delivered at least once. Note that we might still lose the error if the inode gets evicted from the cache before anything can reopen it, but that was the case before errseq_t was merged. 
At LSF/MM we had some discussion about keeping inodes with unreported writeback errors around in the cache for longer (possibly indefinitely), but that's really a separate problem" * tag 'errseq-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: errseq: Always report a writeback error once	2018-04-30 16:53:40 -07:00
Linus Torvalds	8188fc8bef	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc - Fixup license text for oradax driver, from Rob Gardner. - Release device object with put_device() instead of straight kfree(), from Arvind Yadav. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc: vio: use put_device() instead of kfree() sparc64: Fix mistake in oradax license text	2018-04-30 13:27:16 -07:00
Arvind Yadav	00ad691ab1	sparc: vio: use put_device() instead of kfree() Never directly free @dev after calling device_register(), even if it returned an error. Always use put_device() to give up the reference initialized. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-30 16:09:34 -04:00
Rob Gardner	d3c68d0b41	sparc64: Fix mistake in oradax license text The license text in both oradax files mistakenly specifies "version 3" of the GNU General Public License. This is corrected to specify "version 2". Signed-off-by: Rob Gardner <rob.gardner@oracle.com> Signed-off-by: Jonathan Helman <jonathan.helman@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-30 16:06:01 -04:00
David S. Miller	8231bee646	Merge branch 'mlxsw-SPAN-Support-routes-pointing-at-bridges' Ido Schimmel says: ==================== mlxsw: SPAN: Support routes pointing at bridges Petr says: When mirroring to a gretap or ip6gretap netdevice, the route that directs the encapsulated packets can reference a bridge. In that case, in the software model, the packet is switched. Thus when offloading mirroring like that, take into consideration FDB, STP, PVID configured at the bridge, and whether that VLAN ID should be tagged on egress. Patch #1 introduces functions to get bridge PVID, VLAN flags and to look up an FDB entry. Patches #2 and #3 refactor some existing code and introduce a new accessor function. With patches #4 and #5 mlxsw calls mlxsw_sp_span_respin() on switchdev events as well. There is no impact yet, because bridge as an underlay device is still not allowed. That is implemented in patch #6, which uses the new interfaces to figure out on which one port the mirroring should be configured, and whether the mirrored packets should be VLAN-tagged and how. Changes from v2 to v3: - Rename the suite of bridge accessor functions to br_vlan_get_pvid(), br_vlan_get_info() and br_fdb_find_port(). The _get bit is to avoid clashing with an existing static function. Changes from v1 to v2: - Change the suite of bridge accessor functions to br_vlan_pvid_rtnl(), br_vlan_info_rtnl(), br_fdb_find_port_rtnl(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-30 12:42:41 -04:00
Petr Machata	946a11e740	mlxsw: spectrum_span: Allow bridge for gretap mirror When handling mirroring to a gretap or ip6gretap netdevice in mlxsw, the underlay address (i.e. the remote address of the tunnel) may be routed to a bridge. In that case, look up the resolved neighbor Ethernet address in that bridge's FDB. Then configure the offload to direct the mirrored traffic to that port, possibly with tagging. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-30 12:42:40 -04:00
Petr Machata	c520bc6986	mlxsw: Respin SPAN on switchdev events Changes to switchdev artifacts can make a SPAN entry offloadable or unoffloadable. To that end: - Listen to SWITCHDEV_FDB_*_TO_BRIDGE notifications in addition to the *_TO_DEVICE ones, to catch whatever activity is sent to the bridge (likely by mlxsw itself). On each FDB notification, respin SPAN to reconcile it with the FDB changes. - Also respin on switchdev port attribute changes (which currently covers changes to STP state of ports) and port object additions and removals. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-30 12:42:40 -04:00
Petr Machata	cda880de93	mlxsw: spectrum: Register SPAN before switchdev Since switchdev events can trigger SPAN respin, it is necessary that the data structures are available. Register SPAN first, with a commentary on what the dependencies are. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-30 12:42:40 -04:00
Petr Machata	ea93c7b608	mlxsw: spectrum_switchdev: Publish two functions Publish the existing function mlxsw_sp_bridge_port_find(), and add another service accessor mlxsw_sp_bridge_port_stp_state(). Publish both in a new file spectrum_switchdev.h. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-30 12:42:40 -04:00
Petr Machata	541e11595c	mlxsw: spectrum: Extract mlxsw_sp_stp_spms_state() Instead of duplicating the decision regarding port forwarding state made by mlxsw_sp_port_vid_stp_set(), extract the decision-making into a new function and reuse. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-30 12:42:40 -04:00
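The driver-side sequence tracking described in e8f6979981 (net/tls: Add generic NIC offload infrastructure) can be sketched roughly as below. This is an illustrative model only, not the kernel or mlx5 code: the struct names, fields, and the functions find_record() and tx_packet() are all hypothetical, the record list stands in for the TCP-sequence-to-TLS-record mapping, and the resyncs counter stands in for posting a context-reconstruction work entry on the transmit queue.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: the TCP sequence range covered by one TLS record. */
struct tls_record_sketch {
    uint32_t start_seq; /* TCP sequence number of the record's first byte */
    uint32_t len;       /* record length in bytes */
};

/* Hypothetical per-stream driver state. */
struct tx_ctx_sketch {
    uint32_t expected_seq;                /* next TCP SN the NIC expects */
    const struct tls_record_sketch *recs; /* records not yet acked */
    size_t nrecs;
    unsigned resyncs;                     /* context reconstructions requested */
};

/* Look up the record containing seq. Unsigned subtraction keeps the
 * range check correct across 32-bit sequence number wraparound. */
static const struct tls_record_sketch *
find_record(const struct tx_ctx_sketch *ctx, uint32_t seq)
{
    for (size_t i = 0; i < ctx->nrecs; i++) {
        if ((uint32_t)(seq - ctx->recs[i].start_seq) < ctx->recs[i].len)
            return &ctx->recs[i];
    }
    return NULL;
}

/* Returns true if the packet may be sent with the offload indication. */
static bool tx_packet(struct tx_ctx_sketch *ctx, uint32_t seq, uint32_t payload_len)
{
    if (seq != ctx->expected_seq) {
        /* Out of order (e.g. a retransmission): find the covering record. */
        if (find_record(ctx, seq) == NULL)
            return false; /* no record available; this sketch gives up */
        /* A real driver would post a work entry on the same transmit
         * queue here to rebuild the NIC's TLS state for this record. */
        ctx->resyncs++;
    }
    /* In both cases the NIC is now expected to continue after this packet. */
    ctx->expected_seq = seq + payload_len;
    return true;
}
```

Using the same queue for the reconstruction request and the packet, as the commit message notes, is what lets the sketch update expected_seq immediately without waiting for the queue to drain.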


753622 Commits