Commit Graph

709762 Commits

Author SHA1 Message Date
Brenda J. Butler
518828fcdf tc-testing: fix arg to ip command: -s -> -n
Fixes: 31c2611b66 ("selftests: Introduce a new test case to tc testsuite")
Fixes: 76b903ee19 ("selftests: Introduce tc testsuite")
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-31 11:12:52 +09:00
Cong Wang
822e86d997 net_sched: remove tcf_block_put_deferred()
In commit 7aa0045dad ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
I defer tcf_chain_flush() to a workqueue, this causes a use-after-free
because qdisc is already destroyed after we queue this work.

The tcf_block_put_deferred() is no longer necessary after we get RTNL
for each tc filter destroy work, no others could jump in at this point.
Same for tcf_chain_hold(), we are fully serialized now.

This also reduces one indirection therefore makes the code more
readable. Note this brings back a rcu_barrier(), however comparing
to the code prior to commit 7aa0045dad we still reduced one
rcu_barrier(). For net-next, we can consider to refcnt tcf block to
avoid it.

Fixes: 7aa0045dad ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-31 11:06:01 +09:00
Guillaume Nault
f9e56baf03 l2tp: hold tunnel in pppol2tp_connect()
Use l2tp_tunnel_get() in pppol2tp_connect() to ensure the tunnel isn't
going to disappear while processing the rest of the function.

Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-31 10:59:52 +09:00
Jakub Kicinski
aa2bc739ef net: filter: remove unused variable and fix warning
bpf_getsockopt bpf call sets the ret variable to zero and
never changes it.  What's worse in case CONFIG_INET is
not selected the variable is completely unused generating
a warning.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-31 09:18:04 +09:00
Linus Torvalds
5f479447d9 Urgent power management fix for v4.14
This fixes new breakage introduced by the most recent PM QoS
 fix in which, embarrassingly enough, I forgot to update
 dev_pm_qos_raw_read_value() to return the right default
 for devices with no PM QoS constraints at all which prevents
 runtime PM from suspending those devices (fix from Tero Kristo).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZ97EwAAoJEILEb/54YlRxAw8P/Rtl4qgVBGncDCDlmVK7uyaa
 nFGSSAdgTVo+WlyZ/ubs1zXj8ra8AhkKIOa3rwtFI21e/DejlqTxBbUHQMqSDPZC
 JUfS6F4hkLW86BmAMeDh+tma0oG5abXvhzx1F26/1LRFsAAORW2MQ02GUzQfe04j
 iThjk1J+CmfdZ5Lkc6fIWnI5bQRo7WxuYlT1GfthJVPyorTgCviQ/p5I7012HgF1
 ZDwCbUWyEereybIMYaXkwXdyP2RltBKDlxRRWe/tC6O3m3VsDHWuO9+HoS6ZFMCH
 sicgpovEHhZagJSWdNuEK2HAvE3XoDTtWU3xSRkZpxJSarKxPD/aXZzwGkyg1x6N
 8/KmuwsBuNThQLY6ODDxBP9+1ThpbpgtFbx8JMPy5/Usdm/HlIJSHZyeGSc6BWoj
 TI8GLEzqhF61aHWM1u5Nxh8v1DsGvAXEXLWKgQlgfInUPaw1PxYv5lCa5r4fhRzr
 te3ePtbaMh2bNu0fbJyYgTcteaOz/1DPSc/keYFhhGj/iU39232IzryDZKTbz0Cn
 Zb0OVhkpQcr+1e3HPf1aTjHVQcWds4QtTmDmaoushxmBxhdBYnWX8qiwwmSnvW8S
 +kkax7UY4EE7FH+8+3cW7OfqQ7vGLqqG+l3pQptf5yeGMb/hwdWidC669Be6g9Bz
 tZ0sDywMg2MGcbeRPx/j
 =xbGZ
 -----END PGP SIGNATURE-----

Merge tag 'pm-urgent-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "This fixes new breakage introduced by the most recent PM QoS fix in
  which, embarrassingly enough, I forgot to update
  dev_pm_qos_raw_read_value() to return the right default for devices
  with no PM QoS constraints at all which prevents runtime PM from
  suspending those devices (fix from Tero Kristo)"

* tag 'pm-urgent-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / QoS: Fix default runtime_pm device resume latency
2017-10-30 16:38:03 -07:00
Linus Torvalds
b39ab98e2f Mark 'ioremap_page_range()' as possibly sleeping
It turns out that some drivers seem to think it's ok to remap page
ranges from within interrupts and even NMI's.  That is definitely not
the case, since the page table build-up is simply not interrupt-safe.

This showed up in the zero-day robot that reported it for the ACPI APEI
GHES ("Generic Hardware Error Source") driver.  Normally it had been
hidden by the fact that no page table operations had been needed because
the vmalloc area had been set up by other things.

Apparently due to a recent change to the GHEI driver: commit
77b246b32b ("acpi: apei: check for pending errors when probing GHES
entries") 0day actually caught a case during bootup whenthe ioremap
called down to page allocation.  But that recent change only showed the
symptom, it wasn't the root cause of the problem.

Hopefully it is limited to just that one driver.

If you need to access random physical memory, you either need to ioremap
in process context, or you need to use the FIXMAP facility to set one
particular fixmap entry to the required mapping - that can be done safely.

Cc: Borislav Petkov <bp@suse.de>
Cc: Len Brown <lenb@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Tyler Baicar <tbaicar@codeaurora.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-30 10:09:56 -07:00
Linus Torvalds
daea3daaf9 MMC host:
- renesas_sdhi: fix kernel panic
  - tmio: fix swiotlb buffer is full
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZ9vhhAAoJEP4mhCVzWIwp9zAQAKeBpohfcBWTHMlZDX5mOBQW
 kyYIkWwszFyEHO74akBDZ8twsYIt1NDttRc95pXz/kEEfe5AOn0e5TBt/GCsYToy
 eVn9joQS0SCmwzTuSi44faJmLkHGAD76Ssrh06x6hM3JGR8kT4s25wODrW/Xl81Q
 JHNk4mkdLxXBYuahlftoW0U/d+ZAHmqlvf4jzUsOZtFZ9dyjEEUi3aKINbUOKlDf
 cdBT/Rg4ozGqS7seI650r2EDx/VkvfjnBn/QqAZu84WLGa6gseU/A+tTJGFdA83W
 GhabNmrid98/O+AAgBLlXBFc94mAwjUTUtgKmLOjLAsvCaDv0i1StL5Umdv1nuVo
 r2RSnM6qYzhXiD+mU/erQUSNPhFJRhoJ7yu+CsLlnGKeBcbQX+p/nkZPx29vTlcl
 t3LIbxUQJrWBNldqmDoe8QOzPfhxmFVh3LyfcQ1V/f3CFPXY4SgAwd5Nr+E0vLON
 Jp/PLtvIEVNjkO04KgExKHM2m5haZW1roX/3ZDUvifRtZ0UuUoO8/EB/v+K4cXCP
 mtMN+9n6BtDMAfqlPlsi4EjqrQmdK732erjSTLdSpKCTVIj/cNp8+LCJWUfj/xzX
 tfj+zalVe0tvBBwTZO0lE2C5MEWUG9JpPf7ntn6ZyrN1Wxv6mSrWsBiP3gtAgSka
 mn75GI9U+iIJKfIr7XYT
 =/InS
 -----END PGP SIGNATURE-----

Merge tag 'mmc-v4.14-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "A couple of MMC host fixes intended for v4.14-rc8:

   - renesas_sdhi: fix kernel panic
   - tmio: fix swiotlb buffer is full"

* tag 'mmc-v4.14-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: renesas_sdhi: fix kernel panic in _internal_dmac.c
  mmc: tmio: fix swiotlb buffer is full
2017-10-30 09:41:54 -07:00
Linus Torvalds
1960e8eabc Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "This fixes an objtool regression"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/chacha20 - satisfy stack validation 2.0
2017-10-30 09:31:15 -07:00
Ronald Tschalär
0338b1b393 Bluetooth: hci_ldisc: Fix another race when closing the tty.
The following race condition still existed:

         P1                                P2
  cancel_work_sync()
                                     hci_uart_tx_wakeup()
                                     hci_uart_write_work()
                                     hci_uart_dequeue()
  clear_bit(HCI_UART_PROTO_READY)
  hci_unregister_dev(hdev)
  hci_free_dev(hdev)
  hu->proto->close(hu)
  kfree(hu)
                                     access to hdev and hu

Cancelling the work after clearing the HCI_UART_PROTO_READY bit avoids
this as any hci_uart_tx_wakeup() issued after the flag is cleared will
detect that and not schedule further work.

Signed-off-by: Ronald Tschalär <ronald@innovation.ch>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-10-30 15:48:32 +01:00
Keith Busch
5e0fab57fb nvme: Fix setting logical block format when revalidating
Revalidating the disk needs to set the logical block format and capacity,
otherwise it can't figure out if the users modified anything about
the namespace.

Fixes: cdbff4f26b ("nvme: remove nvme_revalidate_ns")

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-30 08:22:41 -06:00
David S. Miller
e1ea2f9856 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several conflicts here.

NFP driver bug fix adding nfp_netdev_is_nfp_repr() check to
nfp_fl_output() needed some adjustments because the code block is in
an else block now.

Parallel additions to net/pkt_cls.h and net/sch_generic.h

A bug fix in __tcp_retransmit_skb() conflicted with some of
the rbtree changes in net-next.

The tc action RCU callback fixes in 'net' had some overlap with some
of the recent tcf_block reworking.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-30 21:09:24 +09:00
Marcel Holtmann
459232fc0e Bluetooth: btusb: Fix isochronous interface assignments
The recent MacBook's with multi-function USB interfaces for HID and
Bluetooth operation have the isochronous interface on number 3 instead
of number 1. Store the interface number and use it.

P:  Vendor=05ac ProdID=8290 Rev= 1.40
S:  Manufacturer=Broadcom Corp.
S:  Product=Bluetooth USB Host Controller
C:* #Ifs= 6 Cfg#= 1 Atr=e0 MxPwr=  0mA
A:  FirstIf#= 2 IfCount= 4 Cls=ff(vend.) Sub=01 Prot=01
I:* If#= 0 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=01 Prot=01 Driver=usbhid
E:  Ad=85(I) Atr=03(Int.) MxPS=   8 Ivl=10ms
I:* If#= 1 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=01 Prot=02 Driver=usbhid
E:  Ad=86(I) Atr=03(Int.) MxPS=   8 Ivl=10ms
I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=01 Prot=01 Driver=btusb
E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=1ms
E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
I:  If#= 3 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
I:  If#= 3 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
I:  If#= 3 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
I:  If#= 3 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
I:  If#= 3 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms
I:* If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=btusb
E:  Ad=84(I) Atr=02(Bulk) MxPS=  32 Ivl=0ms
E:  Ad=04(O) Atr=02(Bulk) MxPS=  32 Ivl=0ms
I:* If#= 5 Alt= 0 #EPs= 0 Cls=fe(app. ) Sub=01 Prot=01 Driver=(none)

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2017-10-30 12:28:13 +02:00
Jaya P G
af3715e5ce Bluetooth: btusb: Update firmware filename for Intel 9x60 and later
The format of Intel Bluetooth firmware for bootloader product is
ibt-<hw_variant>-<device_revision_id>.sfi and .ddc.

But for the SKU's 9x60, there a 3 variants of FW, which cannot be
differentiated just with hw_variant and devision_revision_id.
So to pick the appropriate FW file for 9x60 SKU's, it will be
differentiated using hw_variant, hw_revision and fw_revision rather
than hw_variant and device_revision_id only.

Format will be like this:
ibt-<hw_variant>-<hw_revision>-<fw_revision>.sfi and .ddc

Signed-off-by: Jaya P G <jaya.p.g@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2017-10-30 12:25:49 +02:00
Marcel Holtmann
2064ee332e Bluetooth: Use bt_dev_err and bt_dev_info when possible
In case of using BT_ERR and BT_INFO, convert to bt_dev_err and
bt_dev_info when possible. This allows for controller specific
reporting.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2017-10-30 12:25:45 +02:00
Tero Kristo
2a9a86d5c8 PM / QoS: Fix default runtime_pm device resume latency
The recent change to the PM QoS framework to introduce a proper
no constraint value overlooked to handle the devices which don't
implement PM QoS OPS.  Runtime PM is one of the more severely
impacted subsystems, failing every attempt to runtime suspend
a device.  This leads into some nasty second level issues like
probe failures and increased power consumption among other
things.

Fix this by adding a proper return value for devices that don't
implement PM QoS.

Fixes: 0cc2b4e5a0 (PM / QoS: Fix device resume latency PM QoS)
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-10-30 10:54:57 +01:00
Kalle Valo
e48e9c429a Revert "ath10k: fix napi_poll budget overflow"
Thorsten reported on <fa6e3ee2-91b5-a54b-afe3-87f30aac7a48@leemhuis.info> that
commit c9353bf483 made ath10k unstable with QCA6174 on his Dell XPS13 (9360)
with an error message:

ath10k_pci 0000:3a:00.0: failed to extract amsdu: -11

It only seemed to happen with certain APs, not all, but when it happened the
only way to get ath10k working was to switch the wifi off and on with a hotkey.

As this commit made things even worse (a warning vs breaking the whole
connection) let's revert the commit for now and while the issue is being fixed.

Link: http://lists.infradead.org/pipermail/ath10k/2017-October/010227.html
Reported-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2017-10-30 10:39:42 +02:00
Vasanthakumar Thiagarajan
7eccb738fc ath10k: rebuild crypto header in rx data frames
Rx data frames notified through HTT_T2H_MSG_TYPE_RX_IND and
HTT_T2H_MSG_TYPE_RX_FRAG_IND expect PN/TSC check to be done
on host (mac80211) rather than firmware. Rebuild cipher header
in every received data frames (that are notified through those
HTT interfaces) from the rx_hdr_status tlv available in the
rx descriptor of the first msdu. Skip setting RX_FLAG_IV_STRIPPED
flag for the packets which requires mac80211 PN/TSC check support
and set appropriate RX_FLAG for stripped crypto tail. Hw QCA988X,
QCA9887, QCA99X0, QCA9984, QCA9888 and QCA4019 currently need the
rebuilding of cipher header to perform PN/TSC check for replay
attack.

Please note that removing crypto tail for CCMP-256, GCMP and GCMP-256 ciphers
in raw mode needs to be fixed. Since Rx with these ciphers in raw
mode does not work in the current form even without this patch and
removing crypto tail for these chipers needs clean up, raw mode related
issues in CCMP-256, GCMP and GCMP-256 can be addressed in follow up
patches.

Tested-by: Manikanta Pubbisetty <mpubbise@qti.qualcomm.com>
Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2017-10-30 10:36:57 +02:00
Sebastian Andrzej Siewior
a9ee77af75 Bluetooth: avoid recursive locking in hci_send_to_channel()
Mart reported a deadlock in -RT in the call path:
  hci_send_monitor_ctrl_event() -> hci_send_to_channel()

because both functions acquire the same read lock hci_sk_list.lock. This
is also a mainline issue because the qrwlock implementation is writer
fair (the traditional rwlock implementation is reader biased).

To avoid the deadlock there is now __hci_send_to_channel() which expects
the readlock to be held.

Fixes: 38ceaa00d0 ("Bluetooth: Add support for sending MGMT commands and events to monitor")
Reported-by: Mart van de Wege <mvdwege@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-10-30 09:04:07 +01:00
Ronnie Sahlberg
f74bc7c667 cifs: check MaxPathNameComponentLength != 0 before using it
And fix tcon leak in error path.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: David Disseldorp <ddiss@samba.org>
2017-10-30 02:11:38 -05:00
Linus Torvalds
0b07194bb5 Linux 4.14-rc7 2017-10-29 13:58:38 -07:00
Linus Torvalds
19e12196da Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix route leak in xfrm_bundle_create().

 2) In mac80211, validate user rate mask before configuring it. From
    Johannes Berg.

 3) Properly enforce memory limits in fair queueing code, from Toke
    Hoiland-Jorgensen.

 4) Fix lockdep splat in inet_csk_route_req(), from Eric Dumazet.

 5) Fix TSO header allocation and management in mvpp2 driver, from Yan
    Markman.

 6) Don't take socket lock in BH handler in strparser code, from Tom
    Herbert.

 7) Don't show sockets from other namespaces in AF_UNIX code, from
    Andrei Vagin.

 8) Fix double free in error path of tap_open(), from Girish Moodalbail.

 9) Fix TX map failure path in igb and ixgbe, from Jean-Philippe Brucker
    and Alexander Duyck.

10) Fix DCB mode programming in stmmac driver, from Jose Abreu.

11) Fix err_count handling in various tunnels (ipip, ip6_gre). From Xin
    Long.

12) Properly align SKB head before building SKB in tuntap, from Jason
    Wang.

13) Avoid matching qdiscs with a zero handle during lookups, from Cong
    Wang.

14) Fix various endianness bugs in sctp, from Xin Long.

15) Fix tc filter callback races and add selftests which trigger the
    problem, from Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
  selftests: Introduce a new test case to tc testsuite
  selftests: Introduce a new script to generate tc batch file
  net_sched: fix call_rcu() race on act_sample module removal
  net_sched: add rtnl assertion to tcf_exts_destroy()
  net_sched: use tcf_queue_work() in tcindex filter
  net_sched: use tcf_queue_work() in rsvp filter
  net_sched: use tcf_queue_work() in route filter
  net_sched: use tcf_queue_work() in u32 filter
  net_sched: use tcf_queue_work() in matchall filter
  net_sched: use tcf_queue_work() in fw filter
  net_sched: use tcf_queue_work() in flower filter
  net_sched: use tcf_queue_work() in flow filter
  net_sched: use tcf_queue_work() in cgroup filter
  net_sched: use tcf_queue_work() in bpf filter
  net_sched: use tcf_queue_work() in basic filter
  net_sched: introduce a workqueue for RCU callbacks of tc filter
  sctp: fix some type cast warnings introduced since very beginning
  sctp: fix a type cast warnings that causes a_rwnd gets the wrong value
  sctp: fix some type cast warnings introduced by transport rhashtable
  sctp: fix some type cast warnings introduced by stream reconf
  ...
2017-10-29 08:11:49 -07:00
David S. Miller
6c325f4eca Merge branch 'net_sched-fix-races-with-RCU-callbacks'
Cong Wang says:

====================
net_sched: fix races with RCU callbacks

Recently, the RCU callbacks used in TC filters and TC actions keep
drawing my attention, they introduce at least 4 race condition bugs:

1. A simple one fixed by Daniel:

commit c78e1746d3
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Wed May 20 17:13:33 2015 +0200

    net: sched: fix call_rcu() race on classifier module unloads

2. A very nasty one fixed by me:

commit 1697c4bb52
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Mon Sep 11 16:33:32 2017 -0700

    net_sched: carefully handle tcf_block_put()

3. Two more bugs found by Chris:
https://patchwork.ozlabs.org/patch/826696/
https://patchwork.ozlabs.org/patch/826695/

Usually RCU callbacks are simple, however for TC filters and actions,
they are complex because at least TC actions could be destroyed
together with the TC filter in one callback. And RCU callbacks are
invoked in BH context, without locking they are parallel too. All of
these contribute to the cause of these nasty bugs.

Alternatively, we could also:

a) Introduce a spinlock to serialize these RCU callbacks. But as I
said in commit 1697c4bb52 ("net_sched: carefully handle
tcf_block_put()"), it is very hard to do because of tcf_chain_dump().
Potentially we need to do a lot of work to make it possible (if not
impossible).

b) Just get rid of these RCU callbacks, because they are not
necessary at all, callers of these call_rcu() are all on slow paths
and holding RTNL lock, so blocking is allowed in their contexts.
However, David and Eric dislike adding synchronize_rcu() here.

As suggested by Paul, we could defer the work to a workqueue and
gain the permission of holding RTNL again without any performance
impact, however, in tcf_block_put() we could have a deadlock when
flushing workqueue while hodling RTNL lock, the trick here is to
defer the work itself in workqueue and make it queued after all
other works so that we keep the same ordering to avoid any
use-after-free. Please see the first patch for details.

Patch 1 introduces the infrastructure, patch 2~12 move each
tc filter to the new tc filter workqueue, patch 13 adds
an assertion to catch potential bugs like this, patch 14
closes another rcu callback race, patch 15 and patch 16 add
new test cases.
====================

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:32 +09:00
Chris Mi
31c2611b66 selftests: Introduce a new test case to tc testsuite
In this patchset, we fixed a tc bug. This patch adds the test case
that reproduces the bug. To run this test case, user should specify
an existing NIC device:
  # sudo ./tdc.py -d enp4s0f0

This test case belongs to category "flower". If user doesn't specify
a NIC device, the test cases belong to "flower" will not be run.

In this test case, we create 1M filters and all filters share the same
action. When destroying all filters, kernel should not panic. It takes
about 18s to run it.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Chris Mi
7f07199847 selftests: Introduce a new script to generate tc batch file
# ./tdc_batch.py -h
  usage: tdc_batch.py [-h] [-n NUMBER] [-o] [-s] [-p] device file

  TC batch file generator

  positional arguments:
    device                device name
    file                  batch file name

  optional arguments:
    -h, --help            show this help message and exit
    -n NUMBER, --number NUMBER
                          how many lines in batch file
    -o, --skip_sw         skip_sw (offload), by default skip_hw
    -s, --share_action    all filters share the same action
    -p, --prio            all filters have different prio

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
46e235c15c net_sched: fix call_rcu() race on act_sample module removal
Similar to commit c78e1746d3
("net: sched: fix call_rcu() race on classifier module unloads"),
we need to wait for flying RCU callback tcf_sample_cleanup_rcu().

Cc: Yotam Gigi <yotamg@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
2d132eba1d net_sched: add rtnl assertion to tcf_exts_destroy()
After previous patches, it is now safe to claim that
tcf_exts_destroy() is always called with RTNL lock.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
27ce4f05e2 net_sched: use tcf_queue_work() in tcindex filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
d4f84a41dc net_sched: use tcf_queue_work() in rsvp filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
c2f3f31d40 net_sched: use tcf_queue_work() in route filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
c0d378ef12 net_sched: use tcf_queue_work() in u32 filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
df2735ee8e net_sched: use tcf_queue_work() in matchall filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
e071dff2a6 net_sched: use tcf_queue_work() in fw filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
0552c8afa0 net_sched: use tcf_queue_work() in flower filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
94cdb47566 net_sched: use tcf_queue_work() in flow filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:31 +09:00
Cong Wang
b1b5b04fdb net_sched: use tcf_queue_work() in cgroup filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:30 +09:00
Cong Wang
e910af676b net_sched: use tcf_queue_work() in bpf filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:30 +09:00
Cong Wang
c96a48385d net_sched: use tcf_queue_work() in basic filter
Defer the tcf_exts_destroy() in RCU callback to
tc filter workqueue and get RTNL lock.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:30 +09:00
Cong Wang
7aa0045dad net_sched: introduce a workqueue for RCU callbacks of tc filter
This patch introduces a dedicated workqueue for tc filters
so that each tc filter's RCU callback could defer their
action destroy work to this workqueue. The helper
tcf_queue_work() is introduced for them to use.

Because we hold RTNL lock when calling tcf_block_put(), we
can not simply flush works inside it, therefore we have to
defer it again to this workqueue and make sure all flying RCU
callbacks have already queued their work before this one, in
other words, to ensure this is the last one to execute to
prevent any use-after-free.

On the other hand, this makes tcf_block_put() ugly and
harder to understand. Since David and Eric strongly dislike
adding synchronize_rcu(), this is probably the only
solution that could make everyone happy.

Please also see the code comments below.

Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 22:49:30 +09:00
Konrad Zapałowicz
1f01d8be0e Bluetooth: increase timeout for le auto connections
This patch increases the connection timeout for LE connections that are
triggered by the advertising report to 4 seconds.

It has been observed that devices equipped with wifi+bt combo SoC fail
to create a connection with BLE devices due to their coexistence issues.
Increasing this timeout gives them enough time to complete the
connection with success.

Signed-off-by: Konrad Zapałowicz <konrad.zapalowicz@canonical.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-10-29 14:10:35 +01:00
Loic Poulain
13df5000d3 Bluetooth: hci_ath: Add ath_vendor_cmd helper
Introduce ath_vendor_cmd function which can be used to
configure 'tags' and patch the firmware.

ATH vendor command has the following format:
| OPCODE (u8) | INDEX (LE16) | DLEN (U8) | DATA (U8 * DLEN) |

BD address configuration tag is at index 0x0001.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-10-29 14:08:56 +01:00
Jaganath Kanakkassery
f17d858ed0 Bluetooth: Fix potential memory leak
If command is added to req then it should be freed in case if
hdev is down or HCI_ADVERTISING flag is set.

This introduces a helper in hci_request to purge the cmd_q
to make cmd_q internal to hci_request which is used to fix
the leak.

This also replace accessing of cmd_q in hci_conn with the
new helper.

Signed-off-by: Jaganath Kanakkassery <jaganathx.kanakkassery@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-10-29 14:07:10 +01:00
Bartosz Chronowski
858ff38af7 Bluetooth: btusb: Add new NFA344A entry.
This change allows proper low power mode entry in suspend.

/sys/kernel/debug/usb/devices entry:
T:  Bus=01 Lev=01 Prnt=01 Port=05 Cnt=03 Dev#=  3 Spd=12   MxCh= 0
D:  Ver= 2.01 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=0489 ProdID=e09f Rev= 0.01
C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=1ms
E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
I:  If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
I:  If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
I:  If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
I:  If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
I:  If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms

Signed-off-by: Bartosz Chronowski <ext.bartosz.chronowski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-10-29 14:05:17 +01:00
Ronald Tschalär
67d2f8781b Bluetooth: hci_ldisc: Allow sleeping while proto locks are held.
Commit dec2c92880 ("Bluetooth: hci_ldisc:
Use rwlocking to avoid closing proto races") introduced locks in
hci_ldisc that are held while calling the proto functions. These locks
are rwlock's, and hence do not allow sleeping while they are held.
However, the proto functions that hci_bcm registers use mutexes and
hence need to be able to sleep.

In more detail: hci_uart_tty_receive() and hci_uart_dequeue() both
acquire the rwlock, after which they call proto->recv() and
proto->dequeue(), respectively. In the case of hci_bcm these point to
bcm_recv() and bcm_dequeue(). The latter both acquire the
bcm_device_lock, which is a mutex, so doing so results in a call to
might_sleep(). But since we're holding a rwlock in hci_ldisc, that
results in the following BUG (this for the dequeue case - a similar
one for the receive case is omitted for brevity):

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c
  in_atomic(): 1, irqs_disabled(): 0, pid: 7303, name: kworker/7:3
  INFO: lockdep is turned off.
  CPU: 7 PID: 7303 Comm: kworker/7:3 Tainted: G        W  OE   4.13.2+ #17
  Hardware name: Apple Inc. MacBookPro13,3/Mac-A5C67F76ED83108C, BIOS MBP133.8
  Workqueue: events hci_uart_write_work [hci_uart]
  Call Trace:
   dump_stack+0x8e/0xd6
   ___might_sleep+0x164/0x250
   __might_sleep+0x4a/0x80
   __mutex_lock+0x59/0xa00
   ? lock_acquire+0xa3/0x1f0
   ? lock_acquire+0xa3/0x1f0
   ? hci_uart_write_work+0xd3/0x160 [hci_uart]
   mutex_lock_nested+0x1b/0x20
   ? mutex_lock_nested+0x1b/0x20
   bcm_dequeue+0x21/0xc0 [hci_uart]
   hci_uart_write_work+0xe6/0x160 [hci_uart]
   process_one_work+0x253/0x6a0
   worker_thread+0x4d/0x3b0
   kthread+0x133/0x150

We can't replace the mutex in hci_bcm, because there are other calls
there that might sleep. Therefore this replaces the rwlock's in
hci_ldisc with rw_semaphore's (which allow sleeping). This is a safer
approach anyway as it reduces the restrictions on the proto callbacks.
Also, because acquiring write-lock is very rare compared to acquiring
the read-lock, the percpu variant of rw_semaphore is used.

Lastly, because hci_uart_tx_wakeup() may be called from an IRQ context,
we can't block (sleep) while trying acquire the read lock there, so we
use the trylock variant.

Signed-off-by: Ronald Tschalär <ronald@innovation.ch>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-10-29 14:03:28 +01:00
David S. Miller
aad93c70b9 Merge branch 'ipvlan-private-vepa'
Mahesh Bandewar says:

====================
add 'private' and 'vepa' attributes to ipvlan modes

IPvlan has always been operating in bridge-mode for its supported modes i.e.
if the packets are destined to the adjacent neighbor dev, then IPvlan driver
will switch the packet internally without needing the packets to hit the
wire or get routed. However, there are situations where this bridge-mode is
not needed. e.g. two private processes running inside two namespaces which
are having one IPvlan slave each for its namespace but sharing the master. These
processes should reach the outside world through the master device but at
the same time the bridge function should not work. Currently that's not
possible hence the private attribute for the selected mode comes in play.

VEPA or 802.1Qbg on the other hand has limited appeal with IPvlan since IPvlan
uses the mac-address of the lower device. So packets that are destined to
the adjacent neighbor slave-dev will have same src and dest mac. When these
packets reach the external switch/router, they will send you the redirect
message which the host will have to deal with. Having said that this attribute
will have appeal in debugging as IPvlan will not switch / short-circuit
packets internally. e.g. using VEPA mode with lower-device in loopback mode
will avoid some complicated set-ups that use non-local-bind with some route
jugglery.

This patch-set implements these attributes for the existing modes that
IPvlan has. Please see individual patches for their detailed implementation.
A subsequent ip-utils patch is needed and will be sent soon.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 18:39:58 +09:00
Mahesh Bandewar
fe89aa6b25 ipvlan: implement VEPA mode
This is very similar to the Macvlan VEPA mode, however, there is some
difference. IPvlan uses the mac-address of the lower device, so the VEPA
mode has implications of ICMP-redirects for packets destined for its
immediate neighbors sharing same master since the packets will have same
source and dest mac. The external switch/router will send redirect msg.

Having said that, this will be useful tool in terms of debugging
since IPvlan will not switch packets within its slaves and rely completely
on the external entity as intended in 802.1Qbg.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 18:39:57 +09:00
Mahesh Bandewar
a190d04db9 ipvlan: introduce 'private' attribute for all existing modes.
IPvlan has always operated in bridge mode. However there are scenarios
where each slave should be able to talk through the master device but
not necessarily across each other. Think of an environment where each
of a namespace is a private and independant customer. In this scenario
the machine which is hosting these namespaces neither want to tell who
their neighbor is nor the individual namespaces care to talk to neighbor
on short-circuited network path.

This patch implements the mode that is very similar to the 'private' mode
in macvlan where individual slaves can send and receive traffic through
the master device, just that they can not talk among slave devices.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 18:39:57 +09:00
Quentin Monnet
995231c820 tools: bpftool: add bash completion for bpftool
Add a completion file for bash. The completion function runs bpftool
when needed, making it smart enough to help users complete ids or tags
for eBPF programs and maps currently on the system.

Update Makefile to install completion file to
/usr/share/bash-completion/completions when running `make install`.

Emacs file mode and (at the end) Vim modeline have been added, to keep
the style in use for most existing bash completion files. In this, it
differs from tools/perf/perf-completion.sh, which seems to be the only
other completion file among the kernel sources repository. This is also
valid for indent style: 4-space indents, as in other completion files.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 18:37:33 +09:00
David S. Miller
8c83c88584 Merge branch 'sctp-endianness-fixes'
Xin Long says:

====================
sctp: a bunch of fixes for some sparse warnings

As Eric noticed, when running 'make C=2 M=net/sctp/', a plenty of
warnings or errors checked by sparse appear. They are all problems
about Endian and type cast.

Most of them are just warnings by which no issues could be caused
while some might be bugs.

This patchset fixes them with four patches basically according to
how they are introduced.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 18:03:25 +09:00
Xin Long
978aa04741 sctp: fix some type cast warnings introduced since very beginning
These warnings were found by running 'make C=2 M=net/sctp/'.
They are there since very beginning.

Note after this patch, there still one warning left in
sctp_outq_flush():
  sctp_chunk_fail(chunk, SCTP_ERROR_INV_STRM)

Since it has been moved to sctp_stream_outq_migrate on net-next,
to avoid the extra job when merging net-next to net, I will post
the fix for it after the merging is done.

Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 18:03:24 +09:00
Xin Long
f6fc6bc0b8 sctp: fix a type cast warnings that causes a_rwnd gets the wrong value
These warnings were found by running 'make C=2 M=net/sctp/'.

Commit d4d6fb5787 ("sctp: Try not to change a_rwnd when faking a
SACK from SHUTDOWN.") expected to use the peers old rwnd and add
our flight size to the a_rwnd. But with the wrong Endian, it may
not work as well as expected.

So fix it by converting to the right value.

Fixes: d4d6fb5787 ("sctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN.")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 18:03:24 +09:00