linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-15 08:31:55 +00:00

Author	SHA1	Message	Date
Martin KaFai Lau	cfea5a688e	tcp: Merge tx_flags and tskey in tcp_shifted_skb After receiving sacks, tcp_shifted_skb() will collapse skbs if possible. tx_flags and tskey also have to be merged. This patch reuses the tcp_skb_collapse_tstamp() to handle them. BPF Output Before: ~~~~~ <no-output-due-to-missing-tstamp-event> BPF Output After: ~~~~~ <...>-2024 [007] d.s. 88.644374: : ee_data:14599 Packetdrill Script: ~~~~~ +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10` +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1` +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 0.200 write(4, ..., 1460) = 1460 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 write(4, ..., 13140) = 13140 0.200 > P. 1:1461(1460) ack 1 0.200 > . 1461:8761(7300) ack 1 0.200 > P. 8761:14601(5840) ack 1 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:14601,nop,nop> 0.300 > P. 1:1461(1460) ack 1 0.400 < . 1:1(0) ack 14601 win 257 0.400 close(4) = 0 0.400 > F. 14601:14601(0) ack 1 0.500 < F. 1:1(0) ack 14602 win 257 0.500 > . 14602:14602(0) ack 2 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 14:40:55 -04:00
Martin KaFai Lau	082ac2d51d	tcp: Merge tx_flags and tskey in tcp_collapse_retrans If two skbs are merged/collapsed during retransmission, the current logic does not merge the tx_flags and tskey. The end result is the SCM_TSTAMP_ACK timestamp could be missing for a packet. The patch: 1. Merge the tx_flags 2. Overwrite the prev_skb's tskey with the next_skb's tskey BPF Output Before: ~~~~~~ <no-output-due-to-missing-tstamp-event> BPF Output After: ~~~~~~ packetdrill-2092 [001] d.s. 453.998486: : ee_data:1459 Packetdrill Script: ~~~~~~ +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10` +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1` +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 0.200 write(4, ..., 730) = 730 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 write(4, ..., 730) = 730 +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0 0.200 write(4, ..., 11680) = 11680 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 > P. 1:731(730) ack 1 0.200 > P. 731:1461(730) ack 1 0.200 > . 1461:8761(7300) ack 1 0.200 > P. 8761:13141(4380) ack 1 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop> 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop> 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop> 0.300 > P. 1:1461(1460) ack 1 0.400 < . 1:1(0) ack 13141 win 257 0.400 close(4) = 0 0.400 > F. 13141:13141(0) ack 1 0.500 < F. 1:1(0) ack 13142 win 257 0.500 > . 13142:13142(0) ack 2 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 14:40:55 -04:00
Grygorii Strashko	3fa88c51c7	drivers: net: cpsw: fix wrong regs access in cpsw_ndo_open The cpsw_ndo_open() could try to access CPSW registers before calling pm_runtime_get_sync(). This will trigger L3 error: WARNING: CPU: 0 PID: 21 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x220/0x34c() 44000000.ocp:L3 Custom Error: MASTER M2 (64-bit) TARGET L4_FAST (Idle): Data Access in Supervisor mode during Functional access and CPSW will stop functioning. Hence, fix it by moving pm_runtime_get_sync() before the first access to CPSW registers in cpsw_ndo_open(). Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 14:30:39 -04:00
Martin KaFai Lau	479f85c366	tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks Assuming SOF_TIMESTAMPING_TX_ACK is on. When dup acks are received, it could incorrectly think that a skb has already been acked and queue a SCM_TSTAMP_ACK cmsg to the sk->sk_error_queue. In tcp_ack_tstamp(), it checks 'between(shinfo->tskey, prior_snd_una, tcp_sk(sk)->snd_una - 1)'. If prior_snd_una == tcp_sk(sk)->snd_una like the following packetdrill script, between() returns true but the tskey is actually not acked. e.g. try between(3, 2, 1). The fix is to replace between() with one before() and one !before(). By doing this, the -1 offset on the tcp_sk(sk)->snd_una can also be removed. A packetdrill script is used to reproduce the dup ack scenario. Due to the lacking cmsg support in packetdrill (may be I cannot find it), a BPF prog is used to kprobe to sock_queue_err_skb() and print out the value of serr->ee.ee_data. Both the packetdrill and the bcc BPF script is attached at the end of this commit message. BPF Output Before Fix: ~~~~~~ <...>-2056 [001] d.s. 433.927987: : ee_data:1459 #incorrect packetdrill-2056 [001] d.s. 433.929563: : ee_data:1459 #incorrect packetdrill-2056 [001] d.s. 433.930765: : ee_data:1459 #incorrect packetdrill-2056 [001] d.s. 434.028177: : ee_data:1459 packetdrill-2056 [001] d.s. 434.029686: : ee_data:14599 BPF Output After Fix: ~~~~~~ <...>-2049 [000] d.s. 113.517039: : ee_data:1459 <...>-2049 [000] d.s. 113.517253: : ee_data:14599 BCC BPF Script: ~~~~~~ #!/usr/bin/env python from __future__ import print_function from bcc import BPF bpf_text = """ #include <uapi/linux/ptrace.h> #include <net/sock.h> #include <bcc/proto.h> #include <linux/errqueue.h> #ifdef memset #undef memset #endif int trace_err_skb(struct pt_regs ctx) { struct sk_buff skb = (struct sk_buff )ctx->si; struct sock sk = (struct sock )ctx->di; struct sock_exterr_skb serr; u32 ee_data = 0; if (!sk \|\| !skb) return 0; serr = SKB_EXT_ERR(skb); bpf_probe_read(&ee_data, sizeof(ee_data), &serr->ee.ee_data); bpf_trace_printk("ee_data:%u\\n", ee_data); return 0; }; """ b = BPF(text=bpf_text) b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb") print("Attached to kprobe") b.trace_print() Packetdrill Script: ~~~~~~ +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10` +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1` +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 write(4, ..., 1460) = 1460 0.200 write(4, ..., 13140) = 13140 0.200 > P. 1:1461(1460) ack 1 0.200 > . 1461:8761(7300) ack 1 0.200 > P. 8761:14601(5840) ack 1 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop> 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop> 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop> 0.300 > P. 1:1461(1460) ack 1 0.400 < . 1:1(0) ack 14601 win 257 0.400 close(4) = 0 0.400 > F. 14601:14601(0) ack 1 0.500 < F. 1:1(0) ack 14602 win 257 0.500 > . 14602:14602(0) ack 2 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil.kdev@gmail.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 13:45:43 -04:00
Joe Stringer	49e261a8a2	openvswitch: Orphan skbs before IPv6 defrag This is the IPv6 counterpart to commit `8282f27449` ("inet: frag: Always orphan skbs inside ip_defrag()"). Prior to commit `029f7f3b87` ("netfilter: ipv6: nf_defrag: avoid/free clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would be cloned (implicitly orphaning) prior to queueing for reassembly. As such, when the IPv6 message is eventually reassembled, the skb->sk for all fragments would be NULL. After that commit was introduced, rather than cloning, the original skbs were queued directly without orphaning. The end result is that all frags except for the first and last may have a socket attached. This commit explicitly orphans such skbs during nf_ct_frag6_gather() to prevent BUG_ON(skb->sk) during a later call to ip6_fragment(). kernel BUG at net/ipv6/ip6_output.c:631! [...] Call Trace: <IRQ> [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0 [<ffffffffa042c7c0>] ? do_output.isra.28+0x1b0/0x1b0 [openvswitch] [<ffffffff810bb8a2>] ? __lock_is_held+0x52/0x70 [<ffffffffa042c587>] ovs_fragment+0x1f7/0x280 [openvswitch] [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0 [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50 [<ffffffff81697ea0>] ? dst_discard_out+0x20/0x20 [<ffffffff81697e80>] ? dst_ifdown+0x80/0x80 [<ffffffffa042c703>] do_output.isra.28+0xf3/0x1b0 [openvswitch] [<ffffffffa042d279>] do_execute_actions+0x709/0x12c0 [openvswitch] [<ffffffffa04340a4>] ? ovs_flow_stats_update+0x74/0x1e0 [openvswitch] [<ffffffffa04340d1>] ? ovs_flow_stats_update+0xa1/0x1e0 [openvswitch] [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40 [<ffffffffa042de75>] ovs_execute_actions+0x45/0x120 [openvswitch] [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch] [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40 [<ffffffffa042def4>] ovs_execute_actions+0xc4/0x120 [openvswitch] [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch] [<ffffffffa04337f2>] ? key_extract+0x442/0xc10 [openvswitch] [<ffffffffa043b26d>] ovs_vport_receive+0x5d/0xb0 [openvswitch] [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0 [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0 [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0 [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50 [<ffffffffa043c11d>] internal_dev_xmit+0x6d/0x150 [openvswitch] [<ffffffffa043c0b5>] ? internal_dev_xmit+0x5/0x150 [openvswitch] [<ffffffff8168fb5f>] dev_hard_start_xmit+0x2df/0x660 [<ffffffff8168f5ea>] ? validate_xmit_skb.isra.105.part.106+0x1a/0x2b0 [<ffffffff81690925>] __dev_queue_xmit+0x8f5/0x950 [<ffffffff81690080>] ? __dev_queue_xmit+0x50/0x950 [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0 [<ffffffff81690990>] dev_queue_xmit+0x10/0x20 [<ffffffff8169a418>] neigh_resolve_output+0x178/0x220 [<ffffffff81752759>] ? ip6_finish_output2+0x219/0x7b0 [<ffffffff81752759>] ip6_finish_output2+0x219/0x7b0 [<ffffffff817525a5>] ? ip6_finish_output2+0x65/0x7b0 [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80 [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50 [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50 [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40 [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0 [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0 [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50 [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80 [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0 [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50 [<ffffffff817796cc>] icmpv6_push_pending_frames+0xac/0xe0 [<ffffffff8177a4be>] icmpv6_echo_reply+0x42e/0x500 [<ffffffff8177acbf>] icmpv6_rcv+0x4cf/0x580 [<ffffffff81755ac7>] ip6_input_finish+0x1a7/0x690 [<ffffffff81755925>] ? ip6_input_finish+0x5/0x690 [<ffffffff817567a0>] ip6_input+0x30/0xa0 [<ffffffff81755920>] ? ip6_rcv_finish+0x1a0/0x1a0 [<ffffffff817557ce>] ip6_rcv_finish+0x4e/0x1a0 [<ffffffff8175640f>] ipv6_rcv+0x45f/0x7c0 [<ffffffff81755fe6>] ? ipv6_rcv+0x36/0x7c0 [<ffffffff81755780>] ? ip6_make_skb+0x1c0/0x1c0 [<ffffffff8168b649>] __netif_receive_skb_core+0x229/0xb80 [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0 [<ffffffff8168c07f>] ? process_backlog+0x6f/0x230 [<ffffffff8168bfb6>] __netif_receive_skb+0x16/0x70 [<ffffffff8168c088>] process_backlog+0x78/0x230 [<ffffffff8168c0ed>] ? process_backlog+0xdd/0x230 [<ffffffff8168db43>] net_rx_action+0x203/0x480 [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0 [<ffffffff817c156e>] __do_softirq+0xde/0x49f [<ffffffff81752768>] ? ip6_finish_output2+0x228/0x7b0 [<ffffffff817c070c>] do_softirq_own_stack+0x1c/0x30 <EOI> [<ffffffff8106f88b>] do_softirq.part.18+0x3b/0x40 [<ffffffff8106f946>] __local_bh_enable_ip+0xb6/0xc0 [<ffffffff81752791>] ip6_finish_output2+0x251/0x7b0 [<ffffffff81754af1>] ? ip6_fragment+0xba1/0xc50 [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80 [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50 [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50 [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40 [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0 [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0 [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50 [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80 [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0 [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50 [<ffffffff81778558>] rawv6_sendmsg+0xa28/0xe30 [<ffffffff81719097>] ? inet_sendmsg+0xc7/0x1d0 [<ffffffff817190d6>] inet_sendmsg+0x106/0x1d0 [<ffffffff81718fd5>] ? inet_sendmsg+0x5/0x1d0 [<ffffffff8166d078>] sock_sendmsg+0x38/0x50 [<ffffffff8166d4d6>] SYSC_sendto+0xf6/0x170 [<ffffffff8100201b>] ? trace_hardirqs_on_thunk+0x1b/0x1d [<ffffffff8166e38e>] SyS_sendto+0xe/0x10 [<ffffffff817bebe5>] entry_SYSCALL_64_fastpath+0x18/0xa8 Code: 06 48 83 3f 00 75 26 48 8b 87 d8 00 00 00 2b 87 d0 00 00 00 48 39 d0 72 14 8b 87 e4 00 00 00 83 f8 01 75 09 48 83 7f 18 00 74 9a <0f> 0b 41 8b 86 cc 00 00 00 49 8# RIP [<ffffffff8175468a>] ip6_fragment+0x73a/0xc50 RSP <ffff880072803120> Fixes: `029f7f3b87` ("netfilter: ipv6: nf_defrag: avoid/free clone operations") Reported-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 13:42:05 -04:00
Andrew Goodbody	df63719390	Revert "Prevent NUll pointer dereference with two PHYs on cpsw" This reverts commit `cfe2556001` This can result in a "Unable to handle kernel paging request" during boot. This was due to using an uninitialised struct member, data->slaves. Signed-off-by: Andrew Goodbody <andrew.goodbody@cambrionix.com> Tested-by: Tony Lindgren <tony@atomide.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-20 12:01:36 -04:00
Jorgen Hansen	9c995cc9a2	VSOCK: Only check error on skb_recv_datagram when skb is NULL If skb_recv_datagram returns an skb, we should ignore the err value returned. Otherwise, datagram receives will return EAGAIN when they have to wait for a datagram. Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-19 20:42:01 -04:00
Konstantin Khlebnikov	2309236c13	cls_cgroup: get sk_classid only from full sockets skb->sk could point to timewait or request socket which has no sk_classid. Detected as "BUG: KASAN: slab-out-of-bounds in cls_cgroup_classify". Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-19 20:09:25 -04:00
Konstantin Khlebnikov	851b10d608	net/mlx4_en: do batched put_page using atomic_sub This patch fixes couple error paths after allocation failures. Atomic set of page reference counter is safe only if it is zero, otherwise set can race with any speculative get_page_unless_zero. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-19 20:04:24 -04:00
Konstantin Khlebnikov	04aeb56a17	net/mlx4_en: allocate non 0-order pages for RX ring with __GFP_NOMEMALLOC High order pages are optional here since commit `51151a16a6` ("mlx4: allow order-0 memory allocations in RX path"), so here is no reason for depleting reserves. Generic __netdev_alloc_frag() implements the same logic. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-19 20:04:24 -04:00
Arnd Bergmann	ab2ed0171a	macsec: fix crypto Kconfig dependency The new MACsec driver uses the AES crypto algorithm, but can be configured even if CONFIG_CRYPTO is disabled, leading to a build error: warning: (MAC80211 && MACSEC) selects CRYPTO_GCM which has unmet direct dependencies (CRYPTO) warning: (BT && CEPH_LIB && INET && MAC802154 && MAC80211 && BLK_DEV_RBD && MACSEC && AIRO_CS && LIBIPW && HOSTAP && USB_WUSB && RTLLIB_CRYPTO_CCMP && FS_ENCRYPTION && EXT4_ENCRYPTION && CEPH_FS && BIG_KEYS && ENCRYPTED_KEYS) selects CRYPTO_AES which has unmet direct dependencies (CRYPTO) crypto/built-in.o: In function `gcm_enc_copy_hash': aes_generic.c:(.text+0x2b8): undefined reference to `crypto_xor' aes_generic.c:(.text+0x2dc): undefined reference to `scatterwalk_map_and_copy' This adds an explicit 'select CRYPTO' statement the way that other drivers handle it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `c09440f7dc` ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-17 18:39:02 -04:00
Eric Dumazet	6517eb59b0	net: bcmgenet: device stats are unsigned long On 64bit kernels, device stats are 64bit wide, not 32bit. Fixes: `1c1008c793` ("net: bcmgenet: add main driver file") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 22:03:39 -04:00
David S. Miller	cf6b5fb251	Merge branch 'dsa-mv88e6xxx-fix-cross-chip-bridging' Vivien Didelot says: ==================== net: dsa: mv88e6xxx: fix hardware cross-chip bridging In order to accelerate cross-chip switching of frames with the hardware, the DSA Tag ports, used to interconnect switch devices, must learn SA and DA addresses, and share the same FDB with the user ports. The two first patches restore address learning on DSA links. This fixes hardware cross-chip bridging in a VLAN filtering enabled system, which implements a bridge group as a 802.1Q VLAN and thus share an isolated address database between DSA and user ports. The third patch changes the distinct default databases used for each port, to the same address database. This fixes the hardware cross-chip bridging in a VLAN filtering disabled system, where a bridge group gets implemented only as a port-based VLAN. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 19:07:11 -04:00
Vivien Didelot	207afda1b5	net: dsa: mv88e6xxx: share the same default FDB For hardware cross-chip bridging to work, user ports and DSA ports need to share a common address database, in order to switch a frame to the correct interconnected device. This is currently working for VLAN filtering aware systems, since Linux will implement a bridge group as a 802.1Q VLAN, which has its own FDB, including DSA and CPU links as members. However when the system doesn't support VLAN filtering, Linux only relies on the port-based VLAN to implement a bridge group. To fix hardware cross-chip bridging for such systems, set the same default address database 0 for user and DSA ports, instead of giving them all a different default database. Note that the bridging code prevents frames to egress between unbridged ports, and flushes FDB entries of a port when changing its STP state. Also note that the FID 0 is special and means "all" for ATU operations, but it's OK since it is used as a default forwarding address database. Fixes: `2db9ce1fd9` ("net: dsa: mv88e6xxx: assign default FDB to ports") Fixes: `466dfa0770` ("net: dsa: mv88e6xxx: assign dynamic FDB to bridges") Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 19:07:10 -04:00
Vivien Didelot	996ecb8246	net: dsa: mv88e6xxx: enable SA learning on DSA ports In multi-chip systems, DSA Tag ports must learn SA addresses in order to correctly switch frames between interconnected chips. This fixes cross-chip hardware bridging in a VLAN filtering aware system, because a bridge group gets implemented as an hardware 802.1Q VLAN and thus DSA and user ports share the same FDB. Fixes: `4c7ea3c079` ("net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports") Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 19:07:10 -04:00
Vivien Didelot	65fa40276a	net: dsa: mv88e6xxx: unlock DSA and CPU ports Locking a port generates an hardware interrupt when a new SA address is received. This enables CPU directed learning, which is needed for 802.1X MAC authentication. To disable automatic learning on a port, the only configuration needed is to set its Port Association Vector to all zero. Clear PAV when SA learning should be disabled instead of locking a port. Fixes: `4c7ea3c079` ("net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports") Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 19:07:10 -04:00
santosh.shilimkar@oracle.com	e47db94e10	RDS: Fix the atomicity for congestion map update Two different threads with different rds sockets may be in rds_recv_rcvbuf_delta() via receive path. If their ports both map to the same word in the congestion map, then using non-atomic ops to update it could cause the map to be incorrect. Lets use atomics to avoid such an issue. Full credit to Wengang <wen.gang.wang@oracle.com> for finding the issue, analysing it and also pointing out to offending code with spin lock based fix. Reviewed-by: Leon Romanovsky <leon@leon.nu> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 19:01:05 -04:00
Qing Huang	a7c556546f	RDS: fix endianness for dp_ack_seq dp->dp_ack_seq is used in big endian format. We need to do the big endianness conversion when we assign a value in host format to it. Signed-off-by: Qing Huang <qing.huang@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 19:01:05 -04:00
Daniel Borkmann	9241e2df4f	vlan: pull on __vlan_insert_tag error path and fix csum correction When __vlan_insert_tag() fails from skb_vlan_push() path due to the skb_cow_head(), we need to undo the __skb_push() in the error path as well that was done earlier to move skb->data pointer to mac header. Moreover, I noticed that when in the non-error path the __skb_pull() is done and the original offset to mac header was non-zero, we fixup from a wrong skb->data offset in the checksum complete processing. So the skb_postpush_rcsum() really needs to be done before __skb_pull() where skb->data still points to the mac header start and thus operates under the same conditions as in __vlan_insert_tag(). Fixes: `93515d53b1` ("net: move vlan pop/push functions into common code") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 23:20:11 -04:00
Andrew Goodbody	cfe2556001	cpsw: Prevent NUll pointer dereference with two PHYs Adding a 2nd PHY to cpsw results in a NULL pointer dereference as below. Fix by maintaining a reference to each PHY node in slave struct instead of a single reference in the priv struct which was overwritten by the 2nd PHY. [ 17.870933] Unable to handle kernel NULL pointer dereference at virtual address 00000180 [ 17.879557] pgd = dc8bc000 [ 17.882514] [00000180] pgd=9c882831, pte=00000000, *ppte=00000000 [ 17.889213] Internal error: Oops: 17 [#1] ARM [ 17.893838] Modules linked in: [ 17.897102] CPU: 0 PID: 1657 Comm: connmand Not tainted 4.5.0-ge463dfb-dirty #11 [ 17.904947] Hardware name: Cambrionix whippet [ 17.909576] task: dc859240 ti: dc968000 task.ti: dc968000 [ 17.915339] PC is at phy_attached_print+0x18/0x8c [ 17.920339] LR is at phy_attached_info+0x14/0x18 [ 17.925247] pc : [<c042baec>] lr : [<c042bb74>] psr: 600f0113 [ 17.925247] sp : dc969cf8 ip : dc969d28 fp : dc969d18 [ 17.937425] r10: dda7a400 r9 : 00000000 r8 : 00000000 [ 17.942971] r7 : 00000001 r6 : ddb00480 r5 : ddb8cb34 r4 : 00000000 [ 17.949898] r3 : c0954cc0 r2 : c09562b0 r1 : 00000000 r0 : 00000000 [ 17.956829] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 17.964401] Control: 10c5387d Table: 9c8bc019 DAC: 00000051 [ 17.970500] Process connmand (pid: 1657, stack limit = 0xdc968210) [ 17.977059] Stack: (0xdc969cf8 to 0xdc96a000) [ 17.981692] 9ce0: dc969d28 dc969d08 [ 17.990386] 9d00: c038f9bc c038f6b4 ddb00480 dc969d34 dc969d28 c042bb74 c042bae4 00000000 [ 17.999080] 9d20: c09562b0 c0954cc0 dc969d5c dc969d38 c043ebfc c042bb6c 00000007 00000003 [ 18.007773] 9d40: ddb00000 ddb8cb58 ddb00480 00000001 dc969dec dc969d60 c0441614 c043ea68 [ 18.016465] 9d60: 00000000 00000003 00000000 fffffff4 dc969df4 0000000d 00000000 00000000 [ 18.025159] 9d80: dc969db4 dc969d90 c005dc08 c05839e0 dc969df4 0000000d ddb00000 00001002 [ 18.033851] 9da0: 00000000 00000000 dc969dcc dc969db8 c005ddf4 c005dbc8 00000000 00000118 [ 18.042544] 9dc0: dc969dec dc969dd0 ddb00000 c06db27c ffff9003 00001002 00000000 00000000 [ 18.051237] 9de0: dc969e0c dc969df0 c057c88c c04410dc dc969e0c ddb00000 ddb00000 00000001 [ 18.059930] 9e00: dc969e34 dc969e10 c057cb44 c057c7d8 ddb00000 ddb00138 00001002 beaeda20 [ 18.068622] 9e20: 00000000 00000000 dc969e5c dc969e38 c057cc28 c057cac0 00000000 dc969e80 [ 18.077315] 9e40: dda7a40c beaeda20 00000000 00000000 dc969ecc dc969e60 c05e36d0 c057cc14 [ 18.086007] 9e60: dc969e84 00000051 beaeda20 00000000 dda7a40c 00000014 ddb00000 00008914 [ 18.094699] 9e80: 30687465 00000000 00000000 00000000 00009003 00000000 00000000 00000000 [ 18.103391] 9ea0: 00001002 00008914 dd257ae0 beaeda20 c098a428 beaeda20 00000011 00000000 [ 18.112084] 9ec0: dc969edc dc969ed0 c05e4e54 c05e3030 dc969efc dc969ee0 c055f5ac c05e4cc4 [ 18.120777] 9ee0: beaeda20 dd257ae0 dc8ab4c0 00008914 dc969f7c dc969f00 c010b388 c055f45c [ 18.129471] 9f00: c071ca40 dd257ac0 c00165e8 dc968000 dc969f3c dc969f20 dc969f64 dc969f28 [ 18.138164] 9f20: c0115708 c0683ec8 dd257ac0 dd257ac0 dc969f74 dc969f40 c055f350 c00fc66c [ 18.146857] 9f40: dd82e4d0 00000011 00000000 00080000 dd257ac0 00000000 dc8ab4c0 dc8ab4c0 [ 18.155550] 9f60: 00008914 beaeda20 00000011 00000000 dc969fa4 dc969f80 c010bc34 c010b2fc [ 18.164242] 9f80: 00000000 00000011 00000002 00000036 c00165e8 dc968000 00000000 dc969fa8 [ 18.172935] 9fa0: c00163e0 c010bbcc 00000000 00000011 00000011 00008914 beaeda20 00009003 [ 18.181628] 9fc0: 00000000 00000011 00000002 00000036 00081018 00000001 00000000 beaedc10 [ 18.190320] 9fe0: 00083188 beaeda1c 00043a5d b6d29c0c 600b0010 00000011 00000000 00000000 [ 18.198989] Backtrace: [ 18.201621] [<c042bad8>] (phy_attached_print) from [<c042bb74>] (phy_attached_info+0x14/0x18) [ 18.210664] r3:c0954cc0 r2:c09562b0 r1:00000000 [ 18.215588] r4:ddb00480 [ 18.218322] [<c042bb60>] (phy_attached_info) from [<c043ebfc>] (cpsw_slave_open+0x1a0/0x280) [ 18.227293] [<c043ea5c>] (cpsw_slave_open) from [<c0441614>] (cpsw_ndo_open+0x544/0x674) [ 18.235874] r7:00000001 r6:ddb00480 r5:ddb8cb58 r4:ddb00000 [ 18.241944] [<c04410d0>] (cpsw_ndo_open) from [<c057c88c>] (__dev_open+0xc0/0x128) [ 18.249972] r9:00000000 r8:00000000 r7:00001002 r6:ffff9003 r5:c06db27c r4:ddb00000 [ 18.258255] [<c057c7cc>] (__dev_open) from [<c057cb44>] (__dev_change_flags+0x90/0x154) [ 18.266745] r5:00000001 r4:ddb00000 [ 18.270575] [<c057cab4>] (__dev_change_flags) from [<c057cc28>] (dev_change_flags+0x20/0x50) [ 18.279523] r9:00000000 r8:00000000 r7:beaeda20 r6:00001002 r5:ddb00138 r4:ddb00000 [ 18.287811] [<c057cc08>] (dev_change_flags) from [<c05e36d0>] (devinet_ioctl+0x6ac/0x76c) [ 18.296483] r9:00000000 r8:00000000 r7:beaeda20 r6:dda7a40c r5:dc969e80 r4:00000000 [ 18.304762] [<c05e3024>] (devinet_ioctl) from [<c05e4e54>] (inet_ioctl+0x19c/0x1c8) [ 18.312882] r10:00000000 r9:00000011 r8:beaeda20 r7:c098a428 r6:beaeda20 r5:dd257ae0 [ 18.321235] r4:00008914 [ 18.323956] [<c05e4cb8>] (inet_ioctl) from [<c055f5ac>] (sock_ioctl+0x15c/0x2d8) [ 18.331829] [<c055f450>] (sock_ioctl) from [<c010b388>] (do_vfs_ioctl+0x98/0x8d0) [ 18.339765] r7:00008914 r6:dc8ab4c0 r5:dd257ae0 r4:beaeda20 [ 18.345822] [<c010b2f0>] (do_vfs_ioctl) from [<c010bc34>] (SyS_ioctl+0x74/0x84) [ 18.353573] r10:00000000 r9:00000011 r8:beaeda20 r7:00008914 r6:dc8ab4c0 r5:dc8ab4c0 [ 18.361924] r4:00000000 [ 18.364653] [<c010bbc0>] (SyS_ioctl) from [<c00163e0>] (ret_fast_syscall+0x0/0x3c) [ 18.372682] r9:dc968000 r8:c00165e8 r7:00000036 r6:00000002 r5:00000011 r4:00000000 [ 18.380960] Code: e92dd810 e24cb010 e24dd010 e59b4004 (e5902180) [ 18.387580] ---[ end trace c80529466223f3f3 ]--- Signed-off-by: Andrew Goodbody <andrew.goodbody@cambrionix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:24:37 -04:00
Felix Fietkau	c02bc350f9	bgmac: fix MAC soft-reset bit for corerev > 4 Only core revisions older than 4 use BGMAC_CMDCFG_SR_REV0. This mainly fixes support for BCM4708A0KF SoCs with Ethernet core rev 5 (it means only some devices as most of BCM4708A0KF-s got core rev 4). This was tested for regressions on BCM47094 which doesn't seem to care which bit gets used. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 21:17:14 -04:00
David S. Miller	01c445a4bd	Merge branch 'soreuseport-mixed-v4-v6-fixes' Craig Gallek says: ==================== Fixes for SO_REUSEPORT and mixed v4/v6 sockets Recent changes to the datastructures associated with SO_REUSEPORT broke an existing behavior when equivalent SO_REUSEPORT sockets are created using both AF_INET and AF_INET6. This patch series restores the previous behavior and includes a test to validate it. This series should be a trivial merge to stable kernels (if deemed necessary), but will have conflicts in net-next. The following patches recently replaced the use of hlist_nulls with hlists for UDP and TCP socket lists: `ca065d0cf8` ("udp: no longer use SLAB_DESTROY_BY_RCU") `3b24d854cb` ("tcp/dccp: do not touch listener sk_refcnt under synflood") If this series is accepted, I will send an RFC for the net-next change to assist with the merge. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 21:14:04 -04:00
Craig Gallek	d6a61f80b8	soreuseport: test mixed v4/v6 sockets Test to validate the behavior of SO_REUSEPORT sockets that are created with both AF_INET and AF_INET6. See the commit prior to this for a description of this behavior. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 21:14:04 -04:00
Craig Gallek	d894ba18d4	soreuseport: fix ordering for mixed v4/v6 sockets With the SO_REUSEPORT socket option, it is possible to create sockets in the AF_INET and AF_INET6 domains which are bound to the same IPv4 address. This is only possible with SO_REUSEPORT and when not using IPV6_V6ONLY on the AF_INET6 sockets. Prior to the commits referenced below, an incoming IPv4 packet would always be routed to a socket of type AF_INET when this mixed-mode was used. After those changes, the same packet would be routed to the most recently bound socket (if this happened to be an AF_INET6 socket, it would have an IPv4 mapped IPv6 address). The change in behavior occurred because the recent SO_REUSEPORT optimizations short-circuit the socket scoring logic as soon as they find a match. They did not take into account the scoring logic that favors AF_INET sockets over AF_INET6 sockets in the event of a tie. To fix this problem, this patch changes the insertion order of AF_INET and AF_INET6 addresses in the TCP and UDP socket lists when the sockets have SO_REUSEPORT set. AF_INET sockets will be inserted at the head of the list and AF_INET6 sockets with SO_REUSEPORT set will always be inserted at the tail of the list. This will force AF_INET sockets to always be considered first. Fixes: `e32ea7e747` ("soreuseport: fast reuseport UDP socket selection") Fixes: 125e80b88687 ("soreuseport: fast reuseport TCP socket selection") Reported-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 21:14:03 -04:00
Bjørn Mork	c5b5343cfb	cdc_mbim: apply "NDP to end" quirk to all Huawei devices We now have a positive report of another Huawei device needing this quirk: The ME906s-158 (12d1:15c1). This is an m.2 form factor modem with no obvious relationship to the E3372 (12d1:157d) we already have a quirk entry for. This is reason enough to believe the quirk might be necessary for any number of current and future Huawei devices. Applying the quirk to all Huawei devices, since it is crucial to any device affected by the firmware bug, while the impact on non-affected devices is negligible. The quirk can if necessary be disabled per-device by writing N to /sys/class/net/<iface>/cdc_ncm/ndp_to_end Reported-by: Andreas Fett <andreas.fett@secunet.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 21:04:04 -04:00
Rafał Miłecki	b4dfd8e929	bgmac: reset & enable Ethernet core before using it This fixes Ethernet on D-Link DIR-885L with BCM47094 SoC. Felix reported similar fix was needed for his BCM4709 device (Buffalo WXR-1900DHP?). I tested this for regressions on BCM4706, BCM4708A0 and BCM47081A0. Cc: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 17:15:55 -04:00
David S. Miller	97cc931fc9	Merge branch 'ipv6-dgram-dst-cache' Martin KaFai Lau says: ==================== ipv6: datagram: Update dst cache of a connected udp sk during pmtu update v2: ~ Protect __sk_dst_get() operations with rcu_read_lock in release_cb() because another thread may do ip6_dst_store() for a udp sk without taking the sk lock (e.g. in sendmsg). ~ Do a ipv6_addr_v4mapped(&sk->sk_v6_daddr) check before calling ip6_datagram_dst_update() in patch 3 and 4. It is similar to how __ip6_datagram_connect handles it. ~ One fix in ip6_datagram_dst_update() in patch 2. It needs to check (np->flow_label & IPV6_FLOWLABEL_MASK) before doing fl6_sock_lookup. I was confused with the naming of IPV6_FLOWLABEL_MASK and IPV6_FLOWINFO_MASK. ~ Check dst->obsolete just on the safe side, although I think it should at least have DST_OBSOLETE_FORCE_CHK by now. ~ Add Fixes tag to patch 3 and 4 ~ Add some points from the previous discussion about holding sk lock to the commit message in patch 3. v1: There is a case in connected UDP socket such that getsockopt(IPV6_MTU) will return a stale MTU value. The reproducible sequence could be the following: 1. Create a connected UDP socket 2. Send some datagrams out 3. Receive a ICMPV6_PKT_TOOBIG 4. No new outgoing datagrams to trigger the sk_dst_check() logic to update the sk->sk_dst_cache. 5. getsockopt(IPV6_MTU) returns the mtu from the invalid sk->sk_dst_cache instead of the newly created RTF_CACHE clone. Patch 1 and 2 are the prep work. Patch 3 and 4 are the fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:29:54 -04:00
Martin KaFai Lau	e646b657f6	ipv6: udp: Do a route lookup and update during release_cb This patch adds a release_cb for UDPv6. It does a route lookup and updates sk->sk_dst_cache if it is needed. It picks up the left-over job from ip6_sk_update_pmtu() if the sk was owned by user during the pmtu update. It takes a rcu_read_lock to protect the __sk_dst_get() operations because another thread may do ip6_dst_store() without taking the sk lock (e.g. sendmsg). Fixes: `45e4fd2668` ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reported-by: Wei Wang <weiwan@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:29:53 -04:00
Martin KaFai Lau	33c162a980	ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update There is a case in connected UDP socket such that getsockopt(IPV6_MTU) will return a stale MTU value. The reproducible sequence could be the following: 1. Create a connected UDP socket 2. Send some datagrams out 3. Receive a ICMPV6_PKT_TOOBIG 4. No new outgoing datagrams to trigger the sk_dst_check() logic to update the sk->sk_dst_cache. 5. getsockopt(IPV6_MTU) returns the mtu from the invalid sk->sk_dst_cache instead of the newly created RTF_CACHE clone. This patch updates the sk->sk_dst_cache for a connected datagram sk during pmtu-update code path. Note that the sk->sk_v6_daddr is used to do the route lookup instead of skb->data (i.e. iph). It is because a UDP socket can become connected after sending out some datagrams in un-connected state. or It can be connected multiple times to different destinations. Hence, iph may not be related to where sk is currently connected to. It is done under '!sock_owned_by_user(sk)' condition because the user may make another ip6_datagram_connect() (i.e changing the sk->sk_v6_daddr) while dst lookup is happening in the pmtu-update code path. For the sock_owned_by_user(sk) == true case, the next patch will introduce a release_cb() which will update the sk->sk_dst_cache. Test: Server (Connected UDP Socket): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Route Details: [root@arch-fb-vm1 ~]# ip -6 r show \| egrep '2fac' 2fac::/64 dev eth0 proto kernel metric 256 pref medium 2fac:face::/64 via 2fac::face dev eth0 metric 1024 pref medium A simple python code to create a connected UDP socket: import socket import errno HOST = '2fac::1' PORT = 8080 s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) s.bind((HOST, PORT)) s.connect(('2fac:face::face', 53)) print("connected") while True: try: data = s.recv(1024) except socket.error as se: if se.errno == errno.EMSGSIZE: pmtu = s.getsockopt(41, 24) print("PMTU:%d" % pmtu) break s.close() Python program output after getting a ICMPV6_PKT_TOOBIG: [root@arch-fb-vm1 ~]# python2 ~/devshare/kernel/tasks/fib6/udp-connect-53-8080.py connected PMTU:1300 Cache routes after recieving TOOBIG: [root@arch-fb-vm1 ~]# ip -6 r show table cache 2fac:face::face via 2fac::face dev eth0 metric 0 cache expires 463sec mtu 1300 pref medium Client (Send the ICMPV6_PKT_TOOBIG): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ scapy is used to generate the TOOBIG message. Here is the scapy script I have used: >>> p=Ether(src='da:75:4d:36:ac:32', dst='52:54:00:12:34:66', type=0x86dd)/IPv6(src='2fac::face', dst='2fac::1')/ICMPv6PacketTooBig(mtu=1300)/IPv6(src='2fac:: 1',dst='2fac:face::face', nh='UDP')/UDP(sport=8080,dport=53) >>> sendp(p, iface='qemubr0') Fixes: `45e4fd2668` ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reported-by: Wei Wang <weiwan@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:29:51 -04:00
Martin KaFai Lau	7e2040db15	ipv6: datagram: Refactor dst lookup and update codes to a new function This patch moves the route lookup and update codes for connected datagram sk to a newly created function ip6_datagram_dst_update() It will be reused during the pmtu update in the later patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:29:50 -04:00
Martin KaFai Lau	80fbdb208f	ipv6: datagram: Refactor flowi6 init codes to a new function Move flowi6 init codes for connected datagram sk to a newly created function ip6_datagram_flow_key_init(). Notes: 1. fl6_flowlabel is used instead of fl6.flowlabel in __ip6_datagram_connect 2. ipv6_addr_is_multicast(&fl6->daddr) is used instead of (addr_type & IPV6_ADDR_MULTICAST) in ip6_datagram_flow_key_init() This new function will be reused during pmtu update in the later patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:29:49 -04:00
John Crispin	f1d0540db6	net: mediatek: update the IRQ part of the binding document The current binding document only describes a single interrupt. Update the document by adding the 2 other interrupts. The driver currently only uses a single interrupt. The HW is however able to using IRQ grouping to split TX and RX onto separate GIC irqs. Signed-off-by: John Crispin <blogic@openwrt.org> Cc: devicetree@vger.kernel.org Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:28:10 -04:00
David S. Miller	5e26502912	This has just the single fix from Dmitry Ivanov, adding the missing netlink notifier family check to avoid the socket close DoS problem. -----BEGIN PGP SIGNATURE----- iQIbBAABCgAGBQJXD0YaAAoJEGt7eEactAAdJqUP9R0Ljud2jnH2JeNRv1qdEAXw /PKERFsiZjAnU7kXXdOlm5mqYNJILNGbtPaKF5oi6mswzojwgaIQ+rjsHidX0SVs 1F1+VFxt0//pyfp6udEJ3gyx6wxGlncTdWUtxmzikgmMacn20ejT/wJFzF6y8MHO YSyI8NGNxdTn4eBPLL1QdipsDFYpkS7Z2P8ZUquJmR+uMccjIJ4m4kB2NM8AGQgf Z6KVjVP5dOvelWoEjSKaw2MyAAN/jRTxc9PFtEARoKMMtGMHBkn9qzin9wDjLA6k h3M34Aj5i4R8AIFWJmrCw06rDhPT6j02l4g2IXpmZQqvbP7zewyXyRUkmxMiFvQd n6zpNnQf3hDSNC8f7sBjNM5syi23j9ineJgtXdYtHqyqO0xY2qAVRYm7cSEXxFVk WaxrWDO/wtGyKm1lTQYudAAsFARIgomra4QJd5FqFdIwX2nuZekuikOUsytUqnEV ta+cwLJGLmof5jicrkoYD83974awJHcngLYMYLbkZC0kwWEZEiXX58D3o+FoF9ns 4D+qkLPRDAVIJI/W9j5yuIA6/Aj7Cy9kuxtKf5CoqxL35k47lu69aHEXEceB6fDD M24G8nwEFcMimEWeYz4AARqaEJ08PWcA3/5dgf+N0XDCvQT95OutsZbqAdR2kqzF ApCKvGmbx/sNeyZtjbs= =FvAT -----END PGP SIGNATURE----- Merge tag 'mac80211-for-davem-2016-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== This has just the single fix from Dmitry Ivanov, adding the missing netlink notifier family check to avoid the socket close DoS problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 12:00:59 -04:00
Alexei Starovoitov	d82bccc690	bpf/verifier: reject invalid LD_ABS \| BPF_DW instruction verifier must check for reserved size bits in instruction opcode and reject BPF_LD \| BPF_ABS \| BPF_DW and BPF_LD \| BPF_IND \| BPF_DW instructions, otherwise interpreter will WARN_RATELIMIT on them during execution. Fixes: `ddd872bc30` ("bpf: verifier: add checks for BPF_ABS \| BPF_IND instructions") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 01:31:50 -04:00
Lars Persson	3dcd493fbe	net: sched: do not requeue a NULL skb A failure in validate_xmit_skb_list() triggered an unconditional call to dev_requeue_skb with skb=NULL. This slowly grows the queue discipline's qlen count until all traffic through the queue stops. We take the optimistic approach and continue running the queue after a failure since it is unknown if later packets also will fail in the validate path. Fixes: `55a93b3ea7` ("qdisc: validate skb without holding lock") Signed-off-by: Lars Persson <larper@axis.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 01:28:51 -04:00
Mathias Krause	309cf37fe2	packet: fix heap info leak in PACKET_DIAG_MCLIST sock_diag interface Because we miss to wipe the remainder of i->addr[] in packet_mc_add(), pdiag_put_mclist() leaks uninitialized heap bytes via the PACKET_DIAG_MCLIST netlink attribute. Fix this by explicitly memset(0)ing the remaining bytes in i->addr[]. Fixes: `eea68e2f1a` ("packet: Report socket mclist info via diag module") Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 00:46:39 -04:00
David S. Miller	41015e89d9	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2016-04-13 This series contains updates to i40e, i40evf and fm10k. Alex fixes a bug introduced earlier based on his interpretation of the XL710 datasheet. The actual limit for fragments with TSO and a skbuff that has payload data in the header portion of the buffer is actually only 7 fragments and the skb-data portion counts as 2 buffers, one for the TSO header, and the one for a segment payload buffer. Jacob fixes a bug where in a previous refactor of the code broke multi-bit updates for VFs. The problem occurs because a multi-bit request has a non-zero length, and the PF would simply drop any request with the upper 16 bits set. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 00:31:45 -04:00
Chris Friesen	d6d5e999e5	route: do not cache fib route info on local routes with oif For local routes that require a particular output interface we do not want to cache the result. Caching the result causes incorrect behaviour when there are multiple source addresses on the interface. The end result being that if the intended recipient is waiting on that interface for the packet he won't receive it because it will be delivered on the loopback interface and the IP_PKTINFO ipi_ifindex will be set to the loopback interface as well. This can be tested by running a program such as "dhcp_release" which attempts to inject a packet on a particular interface so that it is received by another program on the same board. The receiving process should see an IP_PKTINFO ipi_ifndex value of the source interface (e.g., eth1) instead of the loopback interface (e.g., lo). The packet will still appear on the loopback interface in tcpdump but the important aspect is that the CMSG info is correct. Sample dhcp_release command line: dhcp_release eth1 192.168.204.222 02:11:33:22:44:66 Signed-off-by: Allain Legacy <allain.legacy@windriver.com> Signed off-by: Chris Friesen <chris.friesen@windriver.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 23:33:01 -04:00
Jacob Keller	f808c5dbcd	fm10k: fix multi-bit VLAN update requests from VF The VF uses a multi-bit update request to clear unused VLANs whenever it resets. However, an accident in a previous refector broke multi-bit updates for VFs, due to misreading a comment in fm10k_vf.c and attempting to reduce code duplication. The problem occurs because a multi-bit request has a non-zero length, and the PF would simply drop any request with the upper 16 bits set. We can't simply remove the check of the upper 16 bits and the call to fm10k_iov_select vid, because this would remove the checks for default VID and for ensuring no other VLANs can be enabled except pf_vid when it has been set. To resolve that issue, this revision uses the iov_select_vid when we have a single-bit update, and denies any multi-bit update when the VLAN was administratively set by the PF. This should be ok since the PF properly updates VLAN_TABLE when it assigns the PF vid. This ensures that requests to add or remove the PF vid work as expected, but a rogue VF could not use the multi-bit update as a loophole to attempt receiving traffic on other VLANs. Reported-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2016-04-13 20:06:55 -07:00
David Daney	65c66af660	net: thunderx: Fix broken of_node_put() code. commit `b7d3e3d3d2` ("net: thunderx: Don't leak phy device references on -EPROBE_DEFER condition.") incorrectly moved the call to of_node_put() outside of the loop. Under normal loop exit, the node has already had of_node_put() called, so the extra call results in: [ 8.228020] ERROR: Bad of_node_put() on /soc@0/pci@848000000000/mrml-bridge0@1,0/bgx0/xlaui00 [ 8.239433] CPU: 16 PID: 608 Comm: systemd-udevd Not tainted 4.6.0-rc1-numa+ #157 [ 8.247380] Hardware name: www.cavium.com EBB8800/EBB8800, BIOS 0.3 Mar 2 2016 [ 8.273541] Call trace: [ 8.273550] [<fffffc0008097364>] dump_backtrace+0x0/0x210 [ 8.273557] [<fffffc0008097598>] show_stack+0x24/0x2c [ 8.273560] [<fffffc0008399ed0>] dump_stack+0x8c/0xb4 [ 8.273566] [<fffffc00085aa828>] of_node_release+0xa8/0xac [ 8.273570] [<fffffc000839cad8>] kobject_cleanup+0x8c/0x194 [ 8.273573] [<fffffc000839c97c>] kobject_put+0x44/0x6c [ 8.273576] [<fffffc00085a9ab0>] of_node_put+0x24/0x30 [ 8.273587] [<fffffc0000bd0f74>] bgx_probe+0x17c/0xcd8 [thunder_bgx] [ 8.273591] [<fffffc00083ed220>] pci_device_probe+0xa0/0x114 [ 8.273596] [<fffffc0008473fbc>] driver_probe_device+0x178/0x418 [ 8.273599] [<fffffc000847435c>] __driver_attach+0x100/0x118 [ 8.273602] [<fffffc0008471b58>] bus_for_each_dev+0x6c/0xac [ 8.273605] [<fffffc0008473884>] driver_attach+0x30/0x38 [ 8.273608] [<fffffc00084732f4>] bus_add_driver+0x1f8/0x29c [ 8.273611] [<fffffc0008475028>] driver_register+0x70/0x110 [ 8.273617] [<fffffc00083ebf08>] __pci_register_driver+0x60/0x6c [ 8.273623] [<fffffc0000bf0040>] bgx_init_module+0x40/0x48 [thunder_bgx] [ 8.273626] [<fffffc0008090d04>] do_one_initcall+0xcc/0x1c0 [ 8.273631] [<fffffc0008198abc>] do_init_module+0x68/0x1c8 [ 8.273635] [<fffffc0008125668>] load_module+0xf44/0x11f4 [ 8.273638] [<fffffc0008125b64>] SyS_finit_module+0xb8/0xe0 [ 8.273641] [<fffffc0008093b30>] el0_svc_naked+0x24/0x28 Go back to the previous (correct) code that only did the extra of_node_put() call on early exit from the loop. Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 23:06:31 -04:00
Emrah Demir	b821646826	mISDN: Fixing missing validation in base_sock_bind() Add validation code into mISDN/socket.c Signed-off-by: Emrah Demir <ed@abdsec.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 23:00:50 -04:00
Alexander Duyck	3f3f7cb875	i40e/i40evf: Limit TSO to 7 descriptors for payload instead of 8 per packet This patch addresses a bug introduced based on my interpretation of the XL710 datasheet. Specifically section 8.4.1 states that "A single transmit packet may span up to 8 buffers (up to 8 data descriptors per packet including both the header and payload buffers)." It then later goes on to say that each segment for a TSO obeys the previous rule, however it then refers to TSO header and the segment payload buffers. I believe the actual limit for fragments with TSO and a skbuff that has payload data in the header portion of the buffer is actually only 7 fragments as the skb->data portion counts as 2 buffers, one for the TSO header, and one for a segment payload buffer. Fixes: `2d37490b82` ("i40e/i40evf: Rewrite logic for 8 descriptor per packet check") Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2016-04-13 19:59:23 -07:00
David Ahern	70af921db6	net: ipv6: Do not keep linklocal and loopback addresses `f1705ec197` added the option to retain user configured addresses on an admin down. A comment to one of the later revisions suggested using the IFA_F_PERMANENT flag rather than adding a user_managed boolean to the ifaddr struct. A side effect of this change is that link local and loopback addresses are also retained which is not part of the objective of `f1705ec197`. Add check to drop those addresses. Fixes: `f1705ec197` ("net: ipv6: Make address flushing on ifdown optional") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 22:58:37 -04:00
Wolfram Sang	a6d37131c0	net: ethernet: renesas: ravb_main: test clock rate to avoid division by 0 The clk API may return 0 on clk_get_rate, so we should check the result before using it as a divisor. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 22:36:28 -04:00
David S. Miller	60e19518d6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree. More specifically, they are: 1) Fix missing filter table per-netns registration in arptables, from Florian Westphal. 2) Resolve out of bound access when parsing TCP options in nf_conntrack_tcp, patch from Jozsef Kadlecsik. 3) Prefer NFPROTO_BRIDGE extensions over NFPROTO_UNSPEC in ebtables, this resolves conflict between xt_limit and ebt_limit, from Phil Sutter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 21:49:03 -04:00
David S. Miller	4bc0eb3a1b	wireless-drivers fixes for 4.6 b43 * fix memory leaks when removing the device bcma * fix building without OF_IRQ rtlwifi * fix gcc-6 indentation warning iwlwifi * lower the debug level of a benign print * fix a memory leak -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAABAgAGBQJXDomvAAoJEG4XJFUm622bwXkH/0NnBnBRcNTNsnuNkrJ4bRu9 QVb+D0KfjnuIeOfg4ZWt4hmzw24fl4wfdjCK96P/TBYt9we900rCX1fVhLJydLfb JHqpPuc7rruvJSCv8UEzwwf3cbMFyxdrqvugVsQhx5q/OnLwfQgnVGGNbwQ+ODyU 8xDtDjq8JPhTD3aE7Y38O9aTwryu+3TBdrOWaxwld0SN4MMcfIMcP3T4Jzx5vVUI WiTdlNIxWPb90k9MBhQt+sUJQTNhs4NqeNQg2vObrxGh3L6g9ci5st9SgvuDVOLO 1a8EwBDQ1qIOuWkrdLWVRO8UN0jin9ZMGdExsOVddv4Qk63m+zCsi6tEm9pPeIQ= =jWwA -----END PGP SIGNATURE----- Merge tag 'wireless-drivers-for-davem-2016-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 4.6 b43 * fix memory leaks when removing the device bcma * fix building without OF_IRQ rtlwifi * fix gcc-6 indentation warning iwlwifi * lower the debug level of a benign print * fix a memory leak ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 17:55:39 -04:00
Phil Sutter	bcf4934288	netfilter: ebtables: Fix extension lookup with identical name If a requested extension exists as module and is not loaded, ebt_check_match() might accidentally use an NFPROTO_UNSPEC one with same name and fail. Reproduced with limit match: Given xt_limit and ebt_limit both built as module, the following would fail: modprobe xt_limit ebtables -I INPUT --limit 1/s -j ACCEPT The fix is to make ebt_check_match() distrust a found NFPROTO_UNSPEC extension and retry after requesting an appropriate module. Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-13 01:16:57 +02:00
Dmitry Ivanov	8f815cdde3	nl80211: check netlink protocol in socket release notification A non-privileged user can create a netlink socket with the same port_id as used by an existing open nl80211 netlink socket (e.g. as used by a hostapd process) with a different protocol number. Closing this socket will then lead to the notification going to nl80211's socket release notification handler, and possibly cause an action such as removing a virtual interface. Fix this issue by checking that the netlink protocol is NETLINK_GENERIC. Since generic netlink has no notifier chain of its own, we can't fix the problem more generically. Fixes: `026331c4d9` ("cfg80211/mac80211: allow registering for and sending action frames") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Ivanov <dima@ubnt.com> [rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-12 15:39:06 +02:00
stephen hemminger	1ecf689013	devlink: add missing install of header The new devlink.h in uapi was not being installed by make headers_install Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 21:33:44 -04:00
David Ahern	4f7f34eaab	net: vrf: Fix dev refcnt leak due to IPv6 prefix route ifupdown2 found a kernel bug with IPv6 routes and movement from the main table to the VRF table. Sequence of events: Create the interface and add addresses: ip link add dev eth4.105 link eth4 type vlan id 105 ip addr add dev eth4.105 8.105.105.10/24 ip -6 addr add dev eth4.105 2008:105:105::10/64 At this point IPv6 has inserted a prefix route in the main table even though the interface is 'down'. From there the VRF device is created: ip link add dev vrf105 type vrf table 105 ip addr add dev vrf105 9.9.105.10/32 ip -6 addr add dev vrf105 2000:9:105::10/128 ip link set vrf105 up Then the interface is enslaved, while still in the 'down' state: ip link set dev eth4.105 master vrf105 Since the device is down the VRF driver cycling the device does not send the NETDEV_UP and NETDEV_DOWN but rather the NETDEV_CHANGE event which does not flush the routes inserted prior. When the link is brought up ip link set dev eth4.105 up the prefix route is added in the VRF table, but does not remove the route from the main table. Fix by handling the NETDEV_CHANGEUPPER event similar what was implemented for IPv4 in `7f49e7a38b` ("net: Flush local routes when device changes vrf association") Fixes: `35402e3136` ("net: Add IPv6 support to VRF device") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 15:56:20 -04:00

1 2 3 4 5 ...

589096 Commits