To allocate the array of bucket locks for the hash table we now
call library function alloc_bucket_spinlocks. This function is
based on the old alloc_bucket_locks in rhashtable and should
produce the same effect.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add two new library functions: alloc_bucket_spinlocks and
free_bucket_spinlocks. These are used to allocate and free an array
of spinlocks that are useful as locks for hash buckets. The interface
specifies the maximum number of spinlocks in the array as well
as a CPU multiplier to derive the number of spinlocks to allocate.
The number allocated is rounded up to a power of two to make the
array amenable to hash lookup.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split out most of rht_key_hashfn which is calculating the hash into
its own function. This way the hash function can be called separately to
get the hash value.
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function is like rhashtable_walk_next except that it only returns
the current element in the inter and does not advance the iter.
This patch also creates __rhashtable_walk_find_next. It finds the next
element in the table when the entry cached in iter is NULL or at the end
of a slot. __rhashtable_walk_find_next is called from
rhashtable_walk_next and rhastable_walk_peek.
end_of_table is an added field to the iter structure. This indicates
that the end of table was reached (walker.tbl being NULL is not a
sufficient condition for end of table).
Signed-off-by: Tom Herbert <tom@quantonium.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most callers of rhashtable_walk_start don't care about a resize event
which is indicated by a return value of -EAGAIN. So calls to
rhashtable_walk_start are wrapped wih code to ignore -EAGAIN. Something
like this is common:
ret = rhashtable_walk_start(rhiter);
if (ret && ret != -EAGAIN)
goto out;
Since zero and -EAGAIN are the only possible return values from the
function this check is pointless. The condition never evaluates to true.
This patch changes rhashtable_walk_start to return void. This simplifies
code for the callers that ignore -EAGAIN. For the few cases where the
caller cares about the resize event, particularly where the table can be
walked in mulitple parts for netlink or seq file dump, the function
rhashtable_walk_start_check has been added that returns -EAGAIN on a
resize event.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 46e6b992c2 ("rtnetlink: allow GSO maximums to be set on device creation")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 4675ff05de ("kmemcheck: rip it out") has removed the code but
for some reason SPDX header stayed in place. This looks like a rebase
mistake in the mmotm tree or the merge mistake. Let's drop those
leftovers as well.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) CAN fixes from Martin Kelly (cancel URBs properly in all the CAN usb
drivers).
2) Revert returning -EEXIST from __dev_alloc_name() as this propagates
to userspace and broke some apps. From Johannes Berg.
3) Fix conn memory leaks and crashes in TIPC, from Jon Malloc and Cong
Wang.
4) Gianfar MAC can't do EEE so don't advertise it by default, from
Claudiu Manoil.
5) Relax strict netlink attribute validation, but emit a warning. From
David Ahern.
6) Fix regression in checksum offload of thunderx driver, from Florian
Westphal.
7) Fix UAPI bpf issues on s390, from Hendrik Brueckner.
8) New card support in iwlwifi, from Ihab Zhaika.
9) BBR congestion control bug fixes from Neal Cardwell.
10) Fix port stats in nfp driver, from Pieter Jansen van Vuuren.
11) Fix leaks in qualcomm rmnet, from Subash Abhinov Kasiviswanathan.
12) Fix DMA API handling in sh_eth driver, from Thomas Petazzoni.
13) Fix spurious netpoll warnings in bnxt_en, from Calvin Owens.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
net: mvpp2: fix the RSS table entry offset
tcp: evaluate packet losses upon RTT change
tcp: fix off-by-one bug in RACK
tcp: always evaluate losses in RACK upon undo
tcp: correctly test congestion state in RACK
bnxt_en: Fix sources of spurious netpoll warnings
tcp_bbr: reset long-term bandwidth sampling on loss recovery undo
tcp_bbr: reset full pipe detection on loss recovery undo
tcp_bbr: record "full bw reached" decision in new full_bw_reached bit
sfc: pass valid pointers from efx_enqueue_unwind
gianfar: Disable EEE autoneg by default
tcp: invalidate rate samples during SACK reneging
can: peak/pcie_fd: fix potential bug in restarting tx queue
can: usb_8dev: cancel urb on -EPIPE and -EPROTO
can: kvaser_usb: cancel urb on -EPIPE and -EPROTO
can: esd_usb2: cancel urb on -EPIPE and -EPROTO
can: ems_usb: cancel urb on -EPIPE and -EPROTO
can: mcba_usb: cancel urb on -EPROTO
usbnet: fix alignment for frames with no ethernet header
tcp: use current time in tcp_rcv_space_adjust()
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaKrOpAAoJEAhfPr2O5OEVK5sP/iHWDJtRw/WWBYOqF9cGFl0o
ssh1iXJAsOmLjaARMOkxQBPLwTvVMreux2fHow/mukrXB8BIR7OmflBaQRLM3lbW
xm9G4yarXf7xgkxngisemQrJiweNyaNDX9P0BqJLV55Xhp+rO2Q6rutspho3xzoo
Pmz7bgnt0FjBKz+0LWnKnzozdNinj2uhTdBOTgm9rUuUUGiAVTAQSNUq2e5Gzw3m
DhO4UPOJDRmnZ6Rqldq2pD2wzmRfbVonirz7IEh7j2opcAoCM2TnmM+Z9C5zjlUj
XVFagiR+XG8lsh2zvHdweter4X9DqLBlMbBDVAQ+vH+xhBuz63Yqq1pSIvXA342U
rs1X182FSyE+pOxipuae94csHkyzlb9tmzoiUoItU+QXi8Wlg8pLH6GFQe/72t/g
HpcXMBq2lflc2KDstx+QGs/G+1EYtz64vUXZ4pvuIitrhdbd5ulLh7Y7nb7qfEym
WEHpaJwxh4MZh6RzB2tq+Tvy0/sD1krNjPTXtf2CuTdFNUuvk8QenRW1q2YVA06L
UOTxRpfPYUPwhGKOoKb2IJ5g1xoztoqMaafdsUJ16H7fcuVtE9xQayVCVhaBuo08
4g7SFCb7MhNLFG1Su291u9PKYlvGhCJaDPKE/6E+IM6szxt7h/kveYsdb05yWOoB
di+cIWZkCRUzbO7fxz4J
=xoHW
-----END PGP SIGNATURE-----
Merge tag 'media/v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"A series of fixes for the media subsytem:
- The largest amount of fixes in this series is with regards to
comments that aren't kernel-doc, but start with "/**".
A new check added for 4.15 makes it to produce a *huge* amount of
new warnings (I'm compiling here with W=1). Most of the patches in
this series fix those.
No code changes - just comment changes at the source files
- rc: some fixed in order to better handle RC repetition codes
- v4l-async: use the v4l2_dev from the root notifier when matching
sub-devices
- v4l2-fwnode: Check subdev count after checking port
- ov 13858 and et8ek8: compilation fix with randconfigs
- usbtv: a trivial new USB ID addition
- dibusb-common: don't do DMA on stack on firmware load
- imx274: Fix error handling, add MAINTAINERS entry
- sir_ir: detect presence of port"
* tag 'media/v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (50 commits)
media: imx274: Fix error handling, add MAINTAINERS entry
media: v4l: async: use the v4l2_dev from the root notifier when matching sub-devices
media: v4l2-fwnode: Check subdev count after checking port
media: et8ek8: select V4L2_FWNODE
media: ov13858: Select V4L2_FWNODE
media: rc: partial revert of "media: rc: per-protocol repeat period"
media: dvb: i2c transfers over usb cannot be done from stack
media: dvb-frontends: complete kernel-doc markups
media: docs: add documentation for frontend attach info
media: dvb_frontends: fix kernel-doc macros
media: drivers: remove "/**" from non-kernel-doc comments
media: lm3560: add a missing kernel-doc parameter
media: rcar_jpu: fix two kernel-doc markups
media: vsp1: add a missing kernel-doc parameter
media: soc_camera: fix a kernel-doc markup
media: mt2063: fix some kernel-doc warnings
media: radio-wl1273: fix a parameter name at kernel-doc macro
media: s3c-camif: add missing description at s3c_camif_find_format()
media: mtk-vpu: add description for wdt fields at struct mtk_vpu
media: vdec: fix some kernel-doc warnings
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaKedrAAoJEAx081l5xIa+CLkP/AkLTMW17WT/aktLfq4/+NQV
WGcLxQw+SB9iYMffFUNdQdMqv3pKsDzeatifxNuoRnjvK6rNZDUFK11SoTEFCdA+
opI3zBPiyMVv71LtLF/r2kj3Vw5rnARdCg24FIzptGg/4j+G0EOb9uzd9qjmjof1
r2XD1f0mjIG6SREMYCZPJj/Q/ZGfWFKXxcatmf87sAoq4o2nfoQ1Sl0T4cnMV7yp
FzzFKA/j6OaiVa0Tb1KiJOl8VnW5f9STanUynFcPtKd2mhDcVaEP4lAfBhHgccd8
ufmsyPJDadGQDT0iQY5E8n+ht124wg4gW8TKGqyxDfxuzTFj2k62WOUMH4eSjGSt
x7SHEewTQLoV5kIPp0y00T2NWuA4/jmjjgllQZgqSFwnsv8fQ9/GwLNPWrUGM0+z
6sqzj7QRyn8GGuYblqVghVYj0bX3ovmYlLzOaKZ+pNSM34hLBFRenZQ5mKJ6WrcE
SIKzuQm3HEC8KwNw72zZ6+zz5fc7x9KuURztwEPt6vOMnxz1PtjWSdHzfvja4cH2
6D2t0lZSqkM244FFG4ImFP6CDF+8vaY7i9YybjYteycWH7o9haVa869vCxUdNUWo
DDghWKPRjbl7LJLJ14BxMrg0+sk+XWYcoZXjUAM0rYFFOWoFkVAlF8Sy3kgb5Pky
wMLalQohWFbkZY60q02h
=2f34
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-for-v4.15-rc3' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This pull is a bit larger than I'd like but a large bunch of it is
license fixes, AMD wanted to fix the licenses for a bunch of files
that were missing them,
Otherwise a bunch of TTM regression fix since the hugepage support,
some i915 and gvt fixes, a core connector free in a safe context fix,
and one bridge fix"
* tag 'drm-fixes-for-v4.15-rc3' of git://people.freedesktop.org/~airlied/linux: (26 commits)
drm/bridge: analogix dp: Fix runtime PM state in get_modes() callback
Revert "drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk"
drm/vc4: Fix false positive WARN() backtrace on refcount_inc() usage
drm/i915: Call i915_gem_init_userptr() before taking struct_mutex
drm/exynos: remove unnecessary function declaration
drm/exynos: remove unnecessary descrptions
drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU
drm/exynos: Fix dma-buf import
drm/ttm: swap consecutive allocated pooled pages v4
drm: safely free connectors from connector_iter
drm/i915/gvt: set max priority for gvt context
drm/i915/gvt: Don't mark vgpu context as inactive when preempted
drm/i915/gvt: Limit read hw reg to active vgpu
drm/i915/gvt: Export intel_gvt_render_mmio_to_ring_id()
drm/i915/gvt: Emulate PCI expansion ROM base address register
drm/ttm: swap consecutive allocated cached pages v3
drm/ttm: roundup the shrink request to prevent skip huge pool
drm/ttm: add page order support in ttm_pages_put
drm/ttm: add set_pages_wb for handling page order more than zero
drm/ttm: add page order in page pool
...
Pull md fixes from Shaohua Li:
"Some MD fixes.
The notable one is a raid5-cache deadlock bug with dm-raid, others are
not significant"
* tag 'md/4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md/raid1/10: add missed blk plug
md: limit mdstat resync progress to max_sectors
md/r5cache: move mddev_lock() out of r5c_journal_mode_set()
md/raid5: correct degraded calculation in raid5_error
- Fixes from overlay code rework. A trifecta of fixes to the locking,
an out of bounds access, and a memory leak in of_overlay_apply().
- Clean-up at25 eeprom binding document
- Remove leading '0x' in unit-addresses from binding docs
-----BEGIN PGP SIGNATURE-----
iQItBAABCAAXBQJaKrC8EBxyb2JoQGtlcm5lbC5vcmcACgkQ+vtdtY28YcPY7Q/9
GOZDOhajewVoYo+LkKl8+1wtdih7QHOvziKY8Jn1Nd/o6ASXU5ltOxvk5iF/q4tz
x7JFKMkUo8eyOy5p5XnkqAzOnQ+rP+aRvHxjeAiMtQZRtMcDJ6bv4FRMI4/hq610
QocAcxzblXJayy/Ad2sbuMr34O6QBkvnicMiLHZNb2IhCgk0nxBbyiNJDr868XE6
CwlkKgo+y0fFJKkW3y4v4jaMdMlmcxTPTY+4jOJoGnnD1LgNwNRgkTFCzRQLebBk
E5D6XweMS7G0IOO8wLYEn70JYsNk4wRGjIvlZGR1e5REQNgT9H+1J8rXLxlWc5rk
l2o4V3+nrBj3BpVCG5oQWBrwVsMZS8ExdnRnrGmZJ25C1EFzlRHGYtsvo6fvXsLs
T2VfYPEDhN5IHZiUU/rcHfz1VpHjuBrtFStPQkFTSTfO52n22Q5KWBxHUqUjXbBZ
phcPoHcDm+FfO4swkLuIP0wDAiNj8SJpJJNnHLrgIuy52UvXe/VD5BlY5RgXIZbg
XhxAcsNKrqfH8ggfbM7ndXkUoFrDyl/dOupfwbtwzNqDQiPGvpjkr9hmE7XIL2aJ
tyGahgkpZ3n49oGJPe5rTHKJacZBkl6rNI2CFQfERXyGYUmQSEU6AViSSyRG6ljK
kXLmIzNPgJJF/I9YC3q8Wr6FUoOAapE7Zww/Cc83xJ8=
=vBBN
-----END PGP SIGNATURE-----
Merge tag 'devicetree-fixes-for-4.15-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree fixes from Rob Herring:
"Another set of DT fixes:
- Fixes from overlay code rework. A trifecta of fixes to the locking,
an out of bounds access, and a memory leak in of_overlay_apply()
- Clean-up at25 eeprom binding document
- Remove leading '0x' in unit-addresses from binding docs"
* tag 'devicetree-fixes-for-4.15-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: overlay: Make node skipping in init_overlay_changeset() clearer
of: overlay: Fix out-of-bounds write in init_overlay_changeset()
of: overlay: Fix (un)locking in of_overlay_apply()
of: overlay: Fix memory leak in of_overlay_apply() error path
dt-bindings: eeprom: at25: Document device-specific compatible values
dt-bindings: eeprom: at25: Grammar s/are can/can/
dt-bindings: Remove leading 0x from bindings notation
of: overlay: Remove else after goto
of: Spelling s/changset/changeset/
of: unittest: Remove bogus overlay mutex release from overlay_data_add()
A couple of minor bugfixes.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJaKW26AAoJECgfDbjSjVRpDwIH/1tq1vcjd5i00nh+gQD7jWm2
qWrpcKbsS0ANTvhm9UNFY24BVoPuNpNrApQpuBxmgxE/XGqx7A+8xXhwPAM5lLiG
uDSPB2nsfjvEUOde7bgeR+t6ay+Ki2UWKzY46lSoCTcIN7BSVFWQZonsGu2xLDzz
kKpCtlobXRnXzeWm+fh1oOZu/cn/TuAF0mbb+6TUQqSsHAt6PQ3Hwsly2EmQV5xR
of6Si/TnxFOkhZUKpezbuTU/g/20ZkHUSzgfrOuZLGonaIvZIE6BUm8m21/IgCbw
WnwSqe66WsWF2vd0RDLFq4L3qaRyGn+W7iYo3KPbwIPEEotSRFMUgixBWBB3IgU=
=Pf4w
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio bugfixes from Michael Tsirkin:
"A couple of minor bugfixes"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_net: fix return value check in receive_mergeable()
virtio_mmio: add cleanup for virtio_mmio_remove
virtio_mmio: add cleanup for virtio_mmio_probe
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABAgAGBQJaKlu/AAoJELDendYovxMvaqoH/i7gN9xry7QkUM6RkwddGwYY
v0rqaUo4WCW27yFOE7Bzej9Y+W92/eFPJnVUhc/quTVfV+uEjbs4PiAwuxSr+lIU
X+BhBNbEi9C5RlRL1z75J0ZySyu6WXL2hsmPbc0wrrqdQikfiZ7bnRjdGAHh5C5C
TijKQKGZTt6ccjIPUEZTIqeajOt/p7uxkCXPWhHQA1mudf9PVhsKyYnGdYp5gp8X
KID+8XmKtAcSwPUz+eG9vGlGwmP28mH0BfCT0suC2uUI4o+PJFPqBTlfsco2kfHO
NqVCgnMZs31Js8mdEVz8h2ZO8m2T5m1oml1zOeyDbgTJ8yjqgADy8K6Lm38clko=
=ZHtb
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Just two small fixes for the new pvcalls frontend driver"
* tag 'for-linus-4.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/pvcalls: Fix a check in pvcalls_front_remove()
xen/pvcalls: check for xenbus_read() errors
One notable fix for kexec on Power9, where we were not clearing MMU PID properly
which sometimes leads to hangs. Finally debugged to a root cause by Nick.
A revert of a patch which tried to rework our panic handling to get more output
on the console, but inadvertently broke reporting the panic to the hypervisor,
which apparently people care about.
Then a fix for an oops in the PMU code, and finally some s/%p/%px/ in xmon.
Thanks to:
David Gibson, Nicholas Piggin, Ravi Bangoria.
-----BEGIN PGP SIGNATURE-----
iQIwBAABCAAaBQJaKoWXExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYAp
Rw/+KRvwt1jt3vFKrWlcXQ4Mx4UTseSaBO7FsGwyANqNGUNvkIEIAZYu6M9x0LLh
tfVowZdJ2vQrgdZy4Rd5zhIjzVaybyENMAMZFmGCxQUidORdibP2qT+3612FmQl0
rczQB4Ra1Jymw+42iwe4WQfyta9cvVgfk7D+1KVWaCXQ0lx8DynZ75yK+U0fensz
FPQNdtkfC2D37IFrqtgGBS5YLkeQpfftm8C/eBG0n2tv8PO1KM5xwVU8Ovf5LoIm
8NbWL//H+zUOoU2jCGHDMfg1qLv9owScTMRtquQSmrE1i21mE2lLOSDSM105+AP2
7CVRMMkth8V9w/nauPq0a5OGyzJWtClI9qj2ZPWS2wPF331g58GUNJsEy7OQAJgO
QZoqcCkpT5qarmxkcKJlYZGF6AZ/4mIBL9mucfQc/afEgRUqksaUKck0qD5SW28j
fm3pPjlMyf2vMRKGgaE9/+by5N/Bmxy2VCoFSuhm1ZrQsIpZXtp/Mfylqz0msdhU
VCt4T229S7rdCQTn2TyMNW+iVmjlgvR4OUXvba/eBz67gzGk4huLNB4EnEwHA/SK
qhkTJqYbP9B/MBD9GrNLFzG5yZTTv+3OA/aehL0PEGouV7cgMEqyGtYw2afwggRC
sf+veK/2apPMnlA5WItEa7JPWaTLsxljZ65acskb7S/4W8g=
=wU60
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"One notable fix for kexec on Power9, where we were not clearing MMU
PID properly which sometimes leads to hangs. Finally debugged to a
root cause by Nick.
A revert of a patch which tried to rework our panic handling to get
more output on the console, but inadvertently broke reporting the
panic to the hypervisor, which apparently people care about.
Then a fix for an oops in the PMU code, and finally some s/%p/%px/ in
xmon.
Thanks to: David Gibson, Nicholas Piggin, Ravi Bangoria"
* tag 'powerpc-4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/xmon: Don't print hashed pointers in xmon
powerpc/64s: Initialize ISAv3 MMU registers before setting partition table
Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"
powerpc/perf: Fix oops when grouping different pmu events
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEE4bay/IylYqM/npjQHv7KIOw4HPYFAloqeVYTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRAe/sog7Dgc9tiqCADK4f/QYW5q5jC93A6JZSItI8vAK2+h
0s4MRTj9x+thBIGIhJ59uYBSTd374bvsWmrGdV7CBoGX4TnEJfGiMV77lpGnGRVG
jpzk9cSFoAnE5UW2qZlF+JM8SNEFlU18MCQQlnMzKbSGerUAlveK+mcF5sJrqrQh
CGZ9MH1Bp4Fz3WMRQ9hHzKWjTOhhM54qjPceCVTZM6I0RJam6I2lpVZPQeom9uVa
r+F5Lv2ZOpNZc+8Pbu+L95YyivKKaQOPzeP4btFLNEHUyFDygHcv2iKRIn9MdEu2
2XfaDVKk2Ey/qWc782SLBxLOihnhWltwC7Kg1ZnrLhNZ6V5UbYQ5FzF4
=OMZE
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-4.15-20171208' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2017-12-08
this is a pull request of 6 patches for net/master.
Martin Kelly provides 5 patches for various USB based CAN drivers, that
properly cancel the URBs on adapter unplug, so that the driver doesn't
end up in an endless loop. Stephane Grosjean provides a patch to restart
the tx queue if zero length packages are transmitted.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Move 'macaddr_count' from after 'netpoll' to after 'nest_level' to pack
and reduce a memory hole.
Fixes: 88ca59d1aa (macvlan: remove unused fields in struct macvlan_dev)
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Second set of fixes for 4.15. This time a lot of iwlwifi patches and
two brcmfmac patches. Most important here are the MIC and IVC fixes
for iwlwifi to unbreak 9000 series.
iwlwifi
* fix rate-scaling to not start lowest possible rate
* fix the TX queue hang detection for AP/GO modes
* fix the TX queue hang timeout in monitor interfaces
* fix packet injection
* remove a wrong error message when dumping PCI registers
* fix race condition with RF-kill
* tell mac80211 when the MIC has been stripped (9000 series)
* tell mac80211 when the IVC has been stripped (9000 series)
* add 2 new PCI IDs, one for 9000 and one for 22000
* fix a queue hang due during a P2P Remain-on-Channel operation
brcmfmac
* fix a race which sometimes caused a crash during sdio unbind
* fix a kernel-doc related build error
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJaKp9UAAoJEG4XJFUm622bhXgH+wTtTVEH0lAOTtK+PyBkxkRH
Q+55Yf1XZ9lNYxmfXhYgObusSbmeL8tClMuISCcQS9gX0um1Vuuud4CLemgO2V7R
V1xuWTKjanaKf8PouKx9SUt1Fx6CsFdwlivJX+eZTfKlKYtwNbNX4onWl9GN2jVZ
5/2l+m3MJbTMMzarZTGLkBJqpTk8DGTNINtKeRd+VF+717SlbqpRlw1TTlbVJcR2
nHcJ3p5JGbU/+hOTroOWUr7kGdpdYlWLfcyOL8iT3rZXtAzH/POjPAmQv9VVRaC+
anm5zn+gZ5GH9XF+pc3nrGOEZ2Ei4LtszMQjdo4Zo9V3ngCj/0OoEnkto6VLbkw=
=lzG8
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-for-davem-2017-12-08' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 4.15
Second set of fixes for 4.15. This time a lot of iwlwifi patches and
two brcmfmac patches. Most important here are the MIC and IVC fixes
for iwlwifi to unbreak 9000 series.
iwlwifi
* fix rate-scaling to not start lowest possible rate
* fix the TX queue hang detection for AP/GO modes
* fix the TX queue hang timeout in monitor interfaces
* fix packet injection
* remove a wrong error message when dumping PCI registers
* fix race condition with RF-kill
* tell mac80211 when the MIC has been stripped (9000 series)
* tell mac80211 when the IVC has been stripped (9000 series)
* add 2 new PCI IDs, one for 9000 and one for 22000
* fix a queue hang due during a P2P Remain-on-Channel operation
brcmfmac
* fix a race which sometimes caused a crash during sdio unbind
* fix a kernel-doc related build error
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The first and only parameter of sl_alloc() is unused, so remove it.
Fixes: 5342b77c41 slip: ("Clean up create and destroy")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The macro used to access or set an RSS table entry was using an offset
of 8, while it should use an offset of 0. This lead to wrongly configure
the RSS table, not accessing the right entries.
Fixes: 1d7d15d79f ("net: mvpp2: initialize the RSS tables")
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use meminfo to identify the egress and ingress context regions and
fetch all valid contexts from those regions. Also flush all contexts
before attempting collection to prevent stale information.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use meminfo to identify TX and RX payload regions and skip them in
collection of EDC, MC, and HMA.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use meminfo to get base address and size of MC memory. Also use same
meminfo for EDC memory dumps.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Collect memory layout of various on-chip memory regions. Move code
for collecting on-chip memory information to cudbg_lib.c and update
cxgb4_debugfs.c to use the common function. Also include
cudbg_entity.h before cudbg_lib.h to avoid adding cudbg entity
structure forward declarations in cudbg_lib.h.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger says:
====================
veth and GSO maximums
This is the more general way to solving the issue of GSO limits
not being set correctly for containers on Azure. If a GSO packet
is sent to host that exceeds the limit (reported by NDIS), then
the host is forced to do segmentation in software which has noticeable
performance impact.
The core rtnetlink infrastructure already has the messages and
infrastructure to allow changing gso limits. With an updated iproute2
the following already works:
# ip li set dev dummy0 gso_max_size 30000
These patches are about making it easier with veth.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When new veth is created, and GSO values have been configured
on one device, clone those values to the peer.
For example:
# ip link add dev vm1 gso_max_size 65530 type veth peer name vm2
This should create vm1 <--> vm2 with both having GSO maximum
size set to 65530.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netlink device already allows changing GSO sizes with
ip set command. The part that is missing is allowing overriding
GSO settings on device creation.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is nothing that says that number of TX queues == number of RX
queues. E.g. the ARTPEC-6 SoC has 2 TX queues and 1 RX queue.
This code is obviously wrong:
for (chan = 0; chan < tx_channel_count; chan++) {
struct stmmac_rx_queue *rx_q = &priv->rx_queue[chan];
priv->rx_queue has size MTL_MAX_RX_QUEUES, so this will send an
uninitialized napi_struct to __napi_schedule(), causing us to
crash in net_rx_action(), because napi_struct->poll is zero.
[12846.759880] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[12846.768014] pgd = (ptrval)
[12846.770742] [00000000] *pgd=39ec7831, *pte=00000000, *ppte=00000000
[12846.777023] Internal error: Oops: 80000007 [#1] PREEMPT SMP ARM
[12846.782942] Modules linked in:
[12846.785998] CPU: 0 PID: 161 Comm: dropbear Not tainted 4.15.0-rc2-00285-gf5fb5f2f39a7 #36
[12846.794177] Hardware name: Axis ARTPEC-6 Platform
[12846.798879] task: (ptrval) task.stack: (ptrval)
[12846.803407] PC is at 0x0
[12846.805942] LR is at net_rx_action+0x274/0x43c
[12846.810383] pc : [<00000000>] lr : [<80bff064>] psr: 200e0113
[12846.816648] sp : b90d9ae8 ip : b90d9ae8 fp : b90d9b44
[12846.821871] r10: 00000008 r9 : 0013250e r8 : 00000100
[12846.827094] r7 : 0000012c r6 : 00000000 r5 : 00000001 r4 : bac84900
[12846.833619] r3 : 00000000 r2 : b90d9b08 r1 : 00000000 r0 : bac84900
Since each DMA channel can be used for rx and tx simultaneously,
the current code should probably be rewritten so that napi_struct is
embedded in a new struct stmmac_channel.
That way, stmmac_poll() can call stmmac_tx_clean() on just the tx queue
where we got the IRQ, instead of looping through all tx queues.
This is also how the xgbe driver does it (another driver for this IP).
Fixes: c22a3f48ef ("net: stmmac: adding multiple napi mechanism")
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng says:
====================
tcp: RACK loss recovery bug fixes
This patch set has four minor bug fixes in TCP RACK loss recovery.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
RACK skips an ACK unless it advances the most recently delivered
TX timestamp (rack.mstamp). Since RACK also uses the most recent
RTT to decide if a packet is lost, RACK should still run the
loss detection whenever the most recent RTT changes. For example,
an ACK that does not advance the timestamp but triggers the cwnd
undo due to reordering, would then use the most recent (higher)
RTT measurement to detect further losses.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RACK should mark a packet lost when remaining wait time is zero.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sender detects spurious retransmission, all packets
marked lost are remarked to be in-flight. However some may
be considered lost based on its timestamps in RACK. This patch
forces RACK to re-evaluate, which may be skipped previously if
the ACK does not advance RACK timestamp.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RACK does not test the loss recovery state correctly to compute
the reordering window. It assumes if lost_out is zero then TCP is
not in loss recovery. But it can be zero during recovery before
calling tcp_rack_detect_loss(): when an ACK acknowledges all
packets marked lost before receiving this ACK, but has not yet
to discover new ones by tcp_rack_detect_loss(). The fix is to
simply test the congestion state directly.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ALR table operations are a sequence of related register operations which
should be protected from concurrent access. The alr_cache should also be
protected. Add alr_mutex doing that.
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the block is freed with last chain being put, once we reach the
end of iteration of list_for_each_entry_safe, the block may be
already freed. I'm hitting this only by creating and deleting clsact:
[ 202.171952] ==================================================================
[ 202.180182] BUG: KASAN: use-after-free in tcf_block_put_ext+0x240/0x390
[ 202.187590] Read of size 8 at addr ffff880225539a80 by task tc/796
[ 202.194508]
[ 202.196185] CPU: 0 PID: 796 Comm: tc Not tainted 4.15.0-rc2jiri+ #5
[ 202.203200] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
[ 202.213613] Call Trace:
[ 202.216369] dump_stack+0xda/0x169
[ 202.220192] ? dma_virt_map_sg+0x147/0x147
[ 202.224790] ? show_regs_print_info+0x54/0x54
[ 202.229691] ? tcf_chain_destroy+0x1dc/0x250
[ 202.234494] print_address_description+0x83/0x3d0
[ 202.239781] ? tcf_block_put_ext+0x240/0x390
[ 202.244575] kasan_report+0x1ba/0x460
[ 202.248707] ? tcf_block_put_ext+0x240/0x390
[ 202.253518] tcf_block_put_ext+0x240/0x390
[ 202.258117] ? tcf_chain_flush+0x290/0x290
[ 202.262708] ? qdisc_hash_del+0x82/0x1a0
[ 202.267111] ? qdisc_hash_add+0x50/0x50
[ 202.271411] ? __lock_is_held+0x5f/0x1a0
[ 202.275843] clsact_destroy+0x3d/0x80 [sch_ingress]
[ 202.281323] qdisc_destroy+0xcb/0x240
[ 202.285445] qdisc_graft+0x216/0x7b0
[ 202.289497] tc_get_qdisc+0x260/0x560
Fix this by holding the block also by chain 0 and put chain 0
explicitly, out of the list_for_each_entry_safe loop at the very
end of tcf_block_put_ext.
Fixes: efbf789739 ("net_sched: get rid of rcu_barrier() in tcf_block_put_ext()")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After applying 2270bc5da3 ("bnxt_en: Fix netpoll handling") and
903649e718 ("bnxt_en: Improve -ENOMEM logic in NAPI poll loop."),
we still see the following WARN fire:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1875170 at net/core/netpoll.c:165 netpoll_poll_dev+0x15a/0x160
bnxt_poll+0x0/0xd0 exceeded budget in poll
<snip>
Call Trace:
[<ffffffff814be5cd>] dump_stack+0x4d/0x70
[<ffffffff8107e013>] __warn+0xd3/0xf0
[<ffffffff8107e07f>] warn_slowpath_fmt+0x4f/0x60
[<ffffffff8179519a>] netpoll_poll_dev+0x15a/0x160
[<ffffffff81795f38>] netpoll_send_skb_on_dev+0x168/0x250
[<ffffffff817962fc>] netpoll_send_udp+0x2dc/0x440
[<ffffffff815fa9be>] write_ext_msg+0x20e/0x250
[<ffffffff810c8125>] call_console_drivers.constprop.23+0xa5/0x110
[<ffffffff810c9549>] console_unlock+0x339/0x5b0
[<ffffffff810c9a88>] vprintk_emit+0x2c8/0x450
[<ffffffff810c9d5f>] vprintk_default+0x1f/0x30
[<ffffffff81173df5>] printk+0x48/0x50
[<ffffffffa0197713>] edac_raw_mc_handle_error+0x563/0x5c0 [edac_core]
[<ffffffffa0197b9b>] edac_mc_handle_error+0x42b/0x6e0 [edac_core]
[<ffffffffa01c3a60>] sbridge_mce_output_error+0x410/0x10d0 [sb_edac]
[<ffffffffa01c47cc>] sbridge_check_error+0xac/0x130 [sb_edac]
[<ffffffffa0197f3c>] edac_mc_workq_function+0x3c/0x90 [edac_core]
[<ffffffff81095f8b>] process_one_work+0x19b/0x480
[<ffffffff810967ca>] worker_thread+0x6a/0x520
[<ffffffff8109c7c4>] kthread+0xe4/0x100
[<ffffffff81884c52>] ret_from_fork+0x22/0x40
This happens because we increment rx_pkts on -ENOMEM and -EIO, resulting
in rx_pkts > 0. Fix this by only bumping rx_pkts if we were actually
given a non-zero budget.
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend says:
====================
lockless qdisc series
This series adds support for building lockless qdiscs. This is
the result of noticing the qdisc lock is a common hot-spot in
perf analysis of the Linux network stack, especially when testing
with high packet per second rates. However, nothing is free and
most qdiscs rely on the qdisc lock for their data structures so
each qdisc must be converted on a case by case basis. In this
series, to kick things off, we make pfifo_fast, mq, and mqprio
lockless. Follow up series can address additional qdiscs as needed.
For example sch_tbf might be useful. To allow this the lockless
design is an opt-in flag. In some future utopia we convert all
qdiscs and we get to drop this case analysis, but in order to
make progress we live in the real-world.
There are also a handful of optimizations I have behind this
series and a few code cleanups that I couldn't figure out how
to fit neatly into this series with out increasing the patch
count. Once this is in additional patches can address this. The
most notable is in skb_dequeue we can push the consumer lock
out a bit and consume multiple skbs off the skb_array in pfifo
fast per iteration. Ideally we could push arrays of packets at
drivers as well but we would need the infrastructure for this.
The other notable improvement is to do less locking in the
overrun cases where bad tx queue list and gso_skb are being
hit. Although, nice in theory in practice this is the error
case and I haven't found a benchmark where this matters yet.
For testing...
My first test case uses multiple containers (via cilium) where
multiple client containers use 'wrk' to benchmark connections with
a server container running lighttpd. Where lighttpd is configured
to use multiple threads, one per core. Additionally this test has
a proxy agent running so all traffic takes an extra hop through a
proxy container. In these cases each TCP packet traverses the egress
qdisc layer at least four times and the ingress qdisc layer an
additional four times. This makes for a good stress test IMO, perf
details below.
The other micro-benchmark I run is injecting packets directly into
qdisc layer using pktgen. This uses the benchmark script,
./pktgen_bench_xmit_mode_queue_xmit.sh
Benchmarks taken in two cases, "base" running latest net-next no
changes to qdisc layer and "qdisc" tests run with qdisc lockless
updates. Numbers reported in req/sec. All virtual 'veth' devices
run with pfifo_fast in the qdisc test case.
`wrk -t16 -c $conns -d30 "http://[$SERVER_IP4]:80"`
conns 16 32 64 1024
-----------------------------------------------
base: 18831 20201 21393 29151
qdisc: 19309 21063 23899 29265
notice in all cases we see performance improvement when running
with qdisc case.
Microbenchmarks using pktgen are as follows,
`pktgen_bench_xmit_mode_queue_xmit.sh -t 1 -i eth2 -c 20000000
base(mq): 2.1Mpps
base(pfifo_fast): 2.1Mpps
qdisc(mq): 2.6Mpps
qdisc(pfifo_fast): 2.6Mpps
notice numbers are the same for mq and pfifo_fast because only
testing a single thread here. In both tests we see a nice bump
in performance gain. The key with 'mq' is it is already per
txq ring so contention is minimal in the above cases. Qdiscs
such as tbf or htb which have more contention will likely show
larger gains when/if lockless versions are implemented.
Thanks to everyone who helped with this work especially Daniel
Borkmann, Eric Dumazet and Willem de Bruijn for discussing the
design and reviewing versions of the code.
Changes from the RFC: dropped a couple patches off the end,
fixed a bug with skb_queue_walk_safe not unlinking skb in all
cases, fixed a lockdep splat with pfifo_fast_destroy not calling
*_bh lock variant, addressed _most_ of Willem's comments, there
was a bug in the bulk locking (final patch) of the RFC series.
@Willem, I left out lockdep annotation for a follow on series
to add lockdep more completely, rather than just in code I
touched.
Comments and feedback welcome.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This converts the pfifo_fast qdisc to use the skb_array data structure
and set the lockless qdisc bit. pfifo_fast is the first qdisc to support
the lockless bit that can be a child of a qdisc requiring locking. So
we add logic to clear the lock bit on initialization in these cases when
the qdisc graft operation occurs.
This also removes the logic used to pick the next band to dequeue from
and instead just checks a per priority array for packets from top priority
to lowest. This might need to be a bit more clever but seems to work
for now.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a peek routine to skb_array.h for use with qdisc.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sch_mqprio qdisc creates a sub-qdisc per tx queue which are then
called independently for enqueue and dequeue operations. However
statistics are aggregated and pushed up to the "master" qdisc.
This patch adds support for any of the sub-qdiscs to be per cpu
statistic qdiscs. To handle this case add a check when calculating
stats and aggregate the per cpu stats if needed.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sch_mq qdisc creates a sub-qdisc per tx queue which are then
called independently for enqueue and dequeue operations. However
statistics are aggregated and pushed up to the "master" qdisc.
This patch adds support for any of the sub-qdiscs to be per cpu
statistic qdiscs. To handle this case add a check when calculating
stats and aggregate the per cpu stats if needed.
Also exports __gnet_stats_copy_queue() to use as a helper function.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add qdisc qlen helper routines for lockless qdiscs to use.
The qdisc qlen is no longer used in the hotpath but it is reported
via stats query on the qdisc so it still needs to be tracked. This
adds the per cpu operations needed along with a helper to return
the summation of per cpu stats.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I can not think of any reason to pull the bad txq skb off the qdisc if
the txq we plan to send this on is still frozen. So check for frozen
queue first and abort before dequeuing either skb_bad_txq skb or
normal qdisc dequeue() skb.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to how gso is handled use skb list for skb_bad_tx this is
required with lockless qdiscs because we may have multiple cores
attempting to push skbs into skb_bad_tx concurrently
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In qdisc_graft_qdisc a "new" qdisc is attached and the 'qdisc_destroy'
operation is called on the old qdisc. The destroy operation will wait
a rcu grace period and call qdisc_rcu_free(). At which point
gso_cpu_skb is free'd along with all stats so no need to zero stats
and gso_cpu_skb from the graft operation itself.
Further after dropping the qdisc locks we can not continue to call
qdisc_reset before waiting an rcu grace period so that the qdisc is
detached from all cpus. By removing the qdisc_reset() here we get
the correct property of waiting an rcu grace period and letting the
qdisc_destroy operation clean up the qdisc correctly.
Note, a refcnt greater than 1 would cause the destroy operation to
be aborted however if this ever happened the reference to the qdisc
would be lost and we would have a memory leak.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This work is preparing the qdisc layer to support egress lockless
qdiscs. If we are running the egress qdisc lockless in the case we
overrun the netdev, for whatever reason, the netdev returns a busy
error code and the skb is parked on the gso_skb pointer. With many
cores all hitting this case at once its possible to have multiple
sk_buffs here so we turn gso_skb into a queue.
This should be the edge case and if we see this frequently then
the netdev/qdisc layer needs to back off.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>