linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-10 22:21:40 +00:00

Author	SHA1	Message	Date
Tom Herbert	64e0cd0d35	rhashtable: Call library function alloc_bucket_locks To allocate the array of bucket locks for the hash table we now call library function alloc_bucket_spinlocks. This function is based on the old alloc_bucket_locks in rhashtable and should produce the same effect. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-11 09:58:39 -05:00
Tom Herbert	92f36cca57	spinlock: Add library function to allocate spinlock buckets array Add two new library functions: alloc_bucket_spinlocks and free_bucket_spinlocks. These are used to allocate and free an array of spinlocks that are useful as locks for hash buckets. The interface specifies the maximum number of spinlocks in the array as well as a CPU multiplier to derive the number of spinlocks to allocate. The number allocated is rounded up to a power of two to make the array amenable to hash lookup. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-11 09:58:39 -05:00
Tom Herbert	2b86093135	rhashtable: abstract out function to get hash Split out most of rht_key_hashfn which is calculating the hash into its own function. This way the hash function can be called separately to get the hash value. Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-11 09:58:38 -05:00
Tom Herbert	2db54b475a	rhashtable: Add rhastable_walk_peek This function is like rhashtable_walk_next except that it only returns the current element in the inter and does not advance the iter. This patch also creates __rhashtable_walk_find_next. It finds the next element in the table when the entry cached in iter is NULL or at the end of a slot. __rhashtable_walk_find_next is called from rhashtable_walk_next and rhastable_walk_peek. end_of_table is an added field to the iter structure. This indicates that the end of table was reached (walker.tbl being NULL is not a sufficient condition for end of table). Signed-off-by: Tom Herbert <tom@quantonium.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-11 09:58:38 -05:00
Tom Herbert	97a6ec4ac0	rhashtable: Change rhashtable_walk_start to return void Most callers of rhashtable_walk_start don't care about a resize event which is indicated by a return value of -EAGAIN. So calls to rhashtable_walk_start are wrapped wih code to ignore -EAGAIN. Something like this is common: ret = rhashtable_walk_start(rhiter); if (ret && ret != -EAGAIN) goto out; Since zero and -EAGAIN are the only possible return values from the function this check is pointless. The condition never evaluates to true. This patch changes rhashtable_walk_start to return void. This simplifies code for the callers that ignore -EAGAIN. For the few cases where the caller cares about the resize event, particularly where the table can be walked in mulitple parts for netlink or seq file dump, the function rhashtable_walk_start_check has been added that returns -EAGAIN on a resize event. Signed-off-by: Tom Herbert <tom@quantonium.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-11 09:58:38 -05:00
Stephen Hemminger	a0b586fa75	rtnetlink: fix typo in GSO max segments Fixes: `46e6b992c2` ("rtnetlink: allow GSO maximums to be set on device creation") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-11 09:45:59 -05:00
David S. Miller	51e18a453f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflict was two parallel additions of include files to sch_generic.c, no biggie. Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-09 22:09:55 -05:00
Michal Hocko	f335195adf	kmemcheck: rip it out for real Commit `4675ff05de` ("kmemcheck: rip it out") has removed the code but for some reason SPDX header stayed in place. This looks like a rebase mistake in the mmotm tree or the merge mistake. Let's drop those leftovers as well. Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-12-08 13:40:17 -08:00
Linus Torvalds	e9ef1fe312	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) CAN fixes from Martin Kelly (cancel URBs properly in all the CAN usb drivers). 2) Revert returning -EEXIST from __dev_alloc_name() as this propagates to userspace and broke some apps. From Johannes Berg. 3) Fix conn memory leaks and crashes in TIPC, from Jon Malloc and Cong Wang. 4) Gianfar MAC can't do EEE so don't advertise it by default, from Claudiu Manoil. 5) Relax strict netlink attribute validation, but emit a warning. From David Ahern. 6) Fix regression in checksum offload of thunderx driver, from Florian Westphal. 7) Fix UAPI bpf issues on s390, from Hendrik Brueckner. 8) New card support in iwlwifi, from Ihab Zhaika. 9) BBR congestion control bug fixes from Neal Cardwell. 10) Fix port stats in nfp driver, from Pieter Jansen van Vuuren. 11) Fix leaks in qualcomm rmnet, from Subash Abhinov Kasiviswanathan. 12) Fix DMA API handling in sh_eth driver, from Thomas Petazzoni. 13) Fix spurious netpoll warnings in bnxt_en, from Calvin Owens. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits) net: mvpp2: fix the RSS table entry offset tcp: evaluate packet losses upon RTT change tcp: fix off-by-one bug in RACK tcp: always evaluate losses in RACK upon undo tcp: correctly test congestion state in RACK bnxt_en: Fix sources of spurious netpoll warnings tcp_bbr: reset long-term bandwidth sampling on loss recovery undo tcp_bbr: reset full pipe detection on loss recovery undo tcp_bbr: record "full bw reached" decision in new full_bw_reached bit sfc: pass valid pointers from efx_enqueue_unwind gianfar: Disable EEE autoneg by default tcp: invalidate rate samples during SACK reneging can: peak/pcie_fd: fix potential bug in restarting tx queue can: usb_8dev: cancel urb on -EPIPE and -EPROTO can: kvaser_usb: cancel urb on -EPIPE and -EPROTO can: esd_usb2: cancel urb on -EPIPE and -EPROTO can: ems_usb: cancel urb on -EPIPE and -EPROTO can: mcba_usb: cancel urb on -EPROTO usbnet: fix alignment for frames with no ethernet header tcp: use current time in tcp_rcv_space_adjust() ...	2017-12-08 13:32:44 -08:00
Linus Torvalds	77071bc6c4	media fixes for v4.15-rc3 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJaKrOpAAoJEAhfPr2O5OEVK5sP/iHWDJtRw/WWBYOqF9cGFl0o ssh1iXJAsOmLjaARMOkxQBPLwTvVMreux2fHow/mukrXB8BIR7OmflBaQRLM3lbW xm9G4yarXf7xgkxngisemQrJiweNyaNDX9P0BqJLV55Xhp+rO2Q6rutspho3xzoo Pmz7bgnt0FjBKz+0LWnKnzozdNinj2uhTdBOTgm9rUuUUGiAVTAQSNUq2e5Gzw3m DhO4UPOJDRmnZ6Rqldq2pD2wzmRfbVonirz7IEh7j2opcAoCM2TnmM+Z9C5zjlUj XVFagiR+XG8lsh2zvHdweter4X9DqLBlMbBDVAQ+vH+xhBuz63Yqq1pSIvXA342U rs1X182FSyE+pOxipuae94csHkyzlb9tmzoiUoItU+QXi8Wlg8pLH6GFQe/72t/g HpcXMBq2lflc2KDstx+QGs/G+1EYtz64vUXZ4pvuIitrhdbd5ulLh7Y7nb7qfEym WEHpaJwxh4MZh6RzB2tq+Tvy0/sD1krNjPTXtf2CuTdFNUuvk8QenRW1q2YVA06L UOTxRpfPYUPwhGKOoKb2IJ5g1xoztoqMaafdsUJ16H7fcuVtE9xQayVCVhaBuo08 4g7SFCb7MhNLFG1Su291u9PKYlvGhCJaDPKE/6E+IM6szxt7h/kveYsdb05yWOoB di+cIWZkCRUzbO7fxz4J =xoHW -----END PGP SIGNATURE----- Merge tag 'media/v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "A series of fixes for the media subsytem: - The largest amount of fixes in this series is with regards to comments that aren't kernel-doc, but start with "/*". A new check added for 4.15 makes it to produce a huge* amount of new warnings (I'm compiling here with W=1). Most of the patches in this series fix those. No code changes - just comment changes at the source files - rc: some fixed in order to better handle RC repetition codes - v4l-async: use the v4l2_dev from the root notifier when matching sub-devices - v4l2-fwnode: Check subdev count after checking port - ov 13858 and et8ek8: compilation fix with randconfigs - usbtv: a trivial new USB ID addition - dibusb-common: don't do DMA on stack on firmware load - imx274: Fix error handling, add MAINTAINERS entry - sir_ir: detect presence of port" * tag 'media/v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (50 commits) media: imx274: Fix error handling, add MAINTAINERS entry media: v4l: async: use the v4l2_dev from the root notifier when matching sub-devices media: v4l2-fwnode: Check subdev count after checking port media: et8ek8: select V4L2_FWNODE media: ov13858: Select V4L2_FWNODE media: rc: partial revert of "media: rc: per-protocol repeat period" media: dvb: i2c transfers over usb cannot be done from stack media: dvb-frontends: complete kernel-doc markups media: docs: add documentation for frontend attach info media: dvb_frontends: fix kernel-doc macros media: drivers: remove "/**" from non-kernel-doc comments media: lm3560: add a missing kernel-doc parameter media: rcar_jpu: fix two kernel-doc markups media: vsp1: add a missing kernel-doc parameter media: soc_camera: fix a kernel-doc markup media: mt2063: fix some kernel-doc warnings media: radio-wl1273: fix a parameter name at kernel-doc macro media: s3c-camif: add missing description at s3c_camif_find_format() media: mtk-vpu: add description for wdt fields at struct mtk_vpu media: vdec: fix some kernel-doc warnings ...	2017-12-08 13:18:47 -08:00
Linus Torvalds	4066aa72f9	i915, amdgpu + misc fixes -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJaKedrAAoJEAx081l5xIa+CLkP/AkLTMW17WT/aktLfq4/+NQV WGcLxQw+SB9iYMffFUNdQdMqv3pKsDzeatifxNuoRnjvK6rNZDUFK11SoTEFCdA+ opI3zBPiyMVv71LtLF/r2kj3Vw5rnARdCg24FIzptGg/4j+G0EOb9uzd9qjmjof1 r2XD1f0mjIG6SREMYCZPJj/Q/ZGfWFKXxcatmf87sAoq4o2nfoQ1Sl0T4cnMV7yp FzzFKA/j6OaiVa0Tb1KiJOl8VnW5f9STanUynFcPtKd2mhDcVaEP4lAfBhHgccd8 ufmsyPJDadGQDT0iQY5E8n+ht124wg4gW8TKGqyxDfxuzTFj2k62WOUMH4eSjGSt x7SHEewTQLoV5kIPp0y00T2NWuA4/jmjjgllQZgqSFwnsv8fQ9/GwLNPWrUGM0+z 6sqzj7QRyn8GGuYblqVghVYj0bX3ovmYlLzOaKZ+pNSM34hLBFRenZQ5mKJ6WrcE SIKzuQm3HEC8KwNw72zZ6+zz5fc7x9KuURztwEPt6vOMnxz1PtjWSdHzfvja4cH2 6D2t0lZSqkM244FFG4ImFP6CDF+8vaY7i9YybjYteycWH7o9haVa869vCxUdNUWo DDghWKPRjbl7LJLJ14BxMrg0+sk+XWYcoZXjUAM0rYFFOWoFkVAlF8Sy3kgb5Pky wMLalQohWFbkZY60q02h =2f34 -----END PGP SIGNATURE----- Merge tag 'drm-fixes-for-v4.15-rc3' of git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "This pull is a bit larger than I'd like but a large bunch of it is license fixes, AMD wanted to fix the licenses for a bunch of files that were missing them, Otherwise a bunch of TTM regression fix since the hugepage support, some i915 and gvt fixes, a core connector free in a safe context fix, and one bridge fix" * tag 'drm-fixes-for-v4.15-rc3' of git://people.freedesktop.org/~airlied/linux: (26 commits) drm/bridge: analogix dp: Fix runtime PM state in get_modes() callback Revert "drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk" drm/vc4: Fix false positive WARN() backtrace on refcount_inc() usage drm/i915: Call i915_gem_init_userptr() before taking struct_mutex drm/exynos: remove unnecessary function declaration drm/exynos: remove unnecessary descrptions drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU drm/exynos: Fix dma-buf import drm/ttm: swap consecutive allocated pooled pages v4 drm: safely free connectors from connector_iter drm/i915/gvt: set max priority for gvt context drm/i915/gvt: Don't mark vgpu context as inactive when preempted drm/i915/gvt: Limit read hw reg to active vgpu drm/i915/gvt: Export intel_gvt_render_mmio_to_ring_id() drm/i915/gvt: Emulate PCI expansion ROM base address register drm/ttm: swap consecutive allocated cached pages v3 drm/ttm: roundup the shrink request to prevent skip huge pool drm/ttm: add page order support in ttm_pages_put drm/ttm: add set_pages_wb for handling page order more than zero drm/ttm: add page order in page pool ...	2017-12-08 13:11:57 -08:00
Linus Torvalds	7267212c80	Merge tag 'md/4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md Pull md fixes from Shaohua Li: "Some MD fixes. The notable one is a raid5-cache deadlock bug with dm-raid, others are not significant" * tag 'md/4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: md/raid1/10: add missed blk plug md: limit mdstat resync progress to max_sectors md/r5cache: move mddev_lock() out of r5c_journal_mode_set() md/raid5: correct degraded calculation in raid5_error	2017-12-08 13:03:02 -08:00
Linus Torvalds	78d9b04844	DeviceTree fixes for v4.15 (part2): - Fixes from overlay code rework. A trifecta of fixes to the locking, an out of bounds access, and a memory leak in of_overlay_apply(). - Clean-up at25 eeprom binding document - Remove leading '0x' in unit-addresses from binding docs -----BEGIN PGP SIGNATURE----- iQItBAABCAAXBQJaKrC8EBxyb2JoQGtlcm5lbC5vcmcACgkQ+vtdtY28YcPY7Q/9 GOZDOhajewVoYo+LkKl8+1wtdih7QHOvziKY8Jn1Nd/o6ASXU5ltOxvk5iF/q4tz x7JFKMkUo8eyOy5p5XnkqAzOnQ+rP+aRvHxjeAiMtQZRtMcDJ6bv4FRMI4/hq610 QocAcxzblXJayy/Ad2sbuMr34O6QBkvnicMiLHZNb2IhCgk0nxBbyiNJDr868XE6 CwlkKgo+y0fFJKkW3y4v4jaMdMlmcxTPTY+4jOJoGnnD1LgNwNRgkTFCzRQLebBk E5D6XweMS7G0IOO8wLYEn70JYsNk4wRGjIvlZGR1e5REQNgT9H+1J8rXLxlWc5rk l2o4V3+nrBj3BpVCG5oQWBrwVsMZS8ExdnRnrGmZJ25C1EFzlRHGYtsvo6fvXsLs T2VfYPEDhN5IHZiUU/rcHfz1VpHjuBrtFStPQkFTSTfO52n22Q5KWBxHUqUjXbBZ phcPoHcDm+FfO4swkLuIP0wDAiNj8SJpJJNnHLrgIuy52UvXe/VD5BlY5RgXIZbg XhxAcsNKrqfH8ggfbM7ndXkUoFrDyl/dOupfwbtwzNqDQiPGvpjkr9hmE7XIL2aJ tyGahgkpZ3n49oGJPe5rTHKJacZBkl6rNI2CFQfERXyGYUmQSEU6AViSSyRG6ljK kXLmIzNPgJJF/I9YC3q8Wr6FUoOAapE7Zww/Cc83xJ8= =vBBN -----END PGP SIGNATURE----- Merge tag 'devicetree-fixes-for-4.15-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull DeviceTree fixes from Rob Herring: "Another set of DT fixes: - Fixes from overlay code rework. A trifecta of fixes to the locking, an out of bounds access, and a memory leak in of_overlay_apply() - Clean-up at25 eeprom binding document - Remove leading '0x' in unit-addresses from binding docs" * tag 'devicetree-fixes-for-4.15-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: of: overlay: Make node skipping in init_overlay_changeset() clearer of: overlay: Fix out-of-bounds write in init_overlay_changeset() of: overlay: Fix (un)locking in of_overlay_apply() of: overlay: Fix memory leak in of_overlay_apply() error path dt-bindings: eeprom: at25: Document device-specific compatible values dt-bindings: eeprom: at25: Grammar s/are can/can/ dt-bindings: Remove leading 0x from bindings notation of: overlay: Remove else after goto of: Spelling s/changset/changeset/ of: unittest: Remove bogus overlay mutex release from overlay_data_add()	2017-12-08 13:00:51 -08:00
Linus Torvalds	900add27f5	virtio: bugfixes A couple of minor bugfixes. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJaKW26AAoJECgfDbjSjVRpDwIH/1tq1vcjd5i00nh+gQD7jWm2 qWrpcKbsS0ANTvhm9UNFY24BVoPuNpNrApQpuBxmgxE/XGqx7A+8xXhwPAM5lLiG uDSPB2nsfjvEUOde7bgeR+t6ay+Ki2UWKzY46lSoCTcIN7BSVFWQZonsGu2xLDzz kKpCtlobXRnXzeWm+fh1oOZu/cn/TuAF0mbb+6TUQqSsHAt6PQ3Hwsly2EmQV5xR of6Si/TnxFOkhZUKpezbuTU/g/20ZkHUSzgfrOuZLGonaIvZIE6BUm8m21/IgCbw WnwSqe66WsWF2vd0RDLFq4L3qaRyGn+W7iYo3KPbwIPEEotSRFMUgixBWBB3IgU= =Pf4w -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio bugfixes from Michael Tsirkin: "A couple of minor bugfixes" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_net: fix return value check in receive_mergeable() virtio_mmio: add cleanup for virtio_mmio_remove virtio_mmio: add cleanup for virtio_mmio_probe	2017-12-08 12:58:51 -08:00
Linus Torvalds	32abeb09ab	xen: fixes for 4.15-rc3 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABAgAGBQJaKlu/AAoJELDendYovxMvaqoH/i7gN9xry7QkUM6RkwddGwYY v0rqaUo4WCW27yFOE7Bzej9Y+W92/eFPJnVUhc/quTVfV+uEjbs4PiAwuxSr+lIU X+BhBNbEi9C5RlRL1z75J0ZySyu6WXL2hsmPbc0wrrqdQikfiZ7bnRjdGAHh5C5C TijKQKGZTt6ccjIPUEZTIqeajOt/p7uxkCXPWhHQA1mudf9PVhsKyYnGdYp5gp8X KID+8XmKtAcSwPUz+eG9vGlGwmP28mH0BfCT0suC2uUI4o+PJFPqBTlfsco2kfHO NqVCgnMZs31Js8mdEVz8h2ZO8m2T5m1oml1zOeyDbgTJ8yjqgADy8K6Lm38clko= =ZHtb -----END PGP SIGNATURE----- Merge tag 'for-linus-4.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Just two small fixes for the new pvcalls frontend driver" * tag 'for-linus-4.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pvcalls: Fix a check in pvcalls_front_remove() xen/pvcalls: check for xenbus_read() errors	2017-12-08 12:53:43 -08:00
Linus Torvalds	d90696ed61	powerpc fixes for 4.15 #4 One notable fix for kexec on Power9, where we were not clearing MMU PID properly which sometimes leads to hangs. Finally debugged to a root cause by Nick. A revert of a patch which tried to rework our panic handling to get more output on the console, but inadvertently broke reporting the panic to the hypervisor, which apparently people care about. Then a fix for an oops in the PMU code, and finally some s/%p/%px/ in xmon. Thanks to: David Gibson, Nicholas Piggin, Ravi Bangoria. -----BEGIN PGP SIGNATURE----- iQIwBAABCAAaBQJaKoWXExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYAp Rw/+KRvwt1jt3vFKrWlcXQ4Mx4UTseSaBO7FsGwyANqNGUNvkIEIAZYu6M9x0LLh tfVowZdJ2vQrgdZy4Rd5zhIjzVaybyENMAMZFmGCxQUidORdibP2qT+3612FmQl0 rczQB4Ra1Jymw+42iwe4WQfyta9cvVgfk7D+1KVWaCXQ0lx8DynZ75yK+U0fensz FPQNdtkfC2D37IFrqtgGBS5YLkeQpfftm8C/eBG0n2tv8PO1KM5xwVU8Ovf5LoIm 8NbWL//H+zUOoU2jCGHDMfg1qLv9owScTMRtquQSmrE1i21mE2lLOSDSM105+AP2 7CVRMMkth8V9w/nauPq0a5OGyzJWtClI9qj2ZPWS2wPF331g58GUNJsEy7OQAJgO QZoqcCkpT5qarmxkcKJlYZGF6AZ/4mIBL9mucfQc/afEgRUqksaUKck0qD5SW28j fm3pPjlMyf2vMRKGgaE9/+by5N/Bmxy2VCoFSuhm1ZrQsIpZXtp/Mfylqz0msdhU VCt4T229S7rdCQTn2TyMNW+iVmjlgvR4OUXvba/eBz67gzGk4huLNB4EnEwHA/SK qhkTJqYbP9B/MBD9GrNLFzG5yZTTv+3OA/aehL0PEGouV7cgMEqyGtYw2afwggRC sf+veK/2apPMnlA5WItEa7JPWaTLsxljZ65acskb7S/4W8g= =wU60 -----END PGP SIGNATURE----- Merge tag 'powerpc-4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "One notable fix for kexec on Power9, where we were not clearing MMU PID properly which sometimes leads to hangs. Finally debugged to a root cause by Nick. A revert of a patch which tried to rework our panic handling to get more output on the console, but inadvertently broke reporting the panic to the hypervisor, which apparently people care about. Then a fix for an oops in the PMU code, and finally some s/%p/%px/ in xmon. Thanks to: David Gibson, Nicholas Piggin, Ravi Bangoria" * tag 'powerpc-4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/xmon: Don't print hashed pointers in xmon powerpc/64s: Initialize ISAv3 MMU registers before setting partition table Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier" powerpc/perf: Fix oops when grouping different pmu events	2017-12-08 12:52:09 -08:00
David S. Miller	fd29117aeb	linux-can-fixes-for-4.15-20171208 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEE4bay/IylYqM/npjQHv7KIOw4HPYFAloqeVYTHG1rbEBwZW5n dXRyb25peC5kZQAKCRAe/sog7Dgc9tiqCADK4f/QYW5q5jC93A6JZSItI8vAK2+h 0s4MRTj9x+thBIGIhJ59uYBSTd374bvsWmrGdV7CBoGX4TnEJfGiMV77lpGnGRVG jpzk9cSFoAnE5UW2qZlF+JM8SNEFlU18MCQQlnMzKbSGerUAlveK+mcF5sJrqrQh CGZ9MH1Bp4Fz3WMRQ9hHzKWjTOhhM54qjPceCVTZM6I0RJam6I2lpVZPQeom9uVa r+F5Lv2ZOpNZc+8Pbu+L95YyivKKaQOPzeP4btFLNEHUyFDygHcv2iKRIn9MdEu2 2XfaDVKk2Ey/qWc782SLBxLOihnhWltwC7Kg1ZnrLhNZ6V5UbYQ5FzF4 =OMZE -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-4.15-20171208' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2017-12-08 this is a pull request of 6 patches for net/master. Martin Kelly provides 5 patches for various USB based CAN drivers, that properly cancel the URBs on adapter unplug, so that the driver doesn't end up in an endless loop. Stephane Grosjean provides a patch to restart the tx queue if zero length packages are transmitted. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:53:54 -05:00
Girish Moodalbail	5e54b3c120	macvlan: fix memory hole in macvlan_dev Move 'macaddr_count' from after 'netpoll' to after 'nest_level' to pack and reduce a memory hole. Fixes: `88ca59d1aa` (macvlan: remove unused fields in struct macvlan_dev) Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:51:46 -05:00
David S. Miller	03afb6e43a	wireless-drivers fixes for 4.15 Second set of fixes for 4.15. This time a lot of iwlwifi patches and two brcmfmac patches. Most important here are the MIC and IVC fixes for iwlwifi to unbreak 9000 series. iwlwifi * fix rate-scaling to not start lowest possible rate * fix the TX queue hang detection for AP/GO modes * fix the TX queue hang timeout in monitor interfaces * fix packet injection * remove a wrong error message when dumping PCI registers * fix race condition with RF-kill * tell mac80211 when the MIC has been stripped (9000 series) * tell mac80211 when the IVC has been stripped (9000 series) * add 2 new PCI IDs, one for 9000 and one for 22000 * fix a queue hang due during a P2P Remain-on-Channel operation brcmfmac * fix a race which sometimes caused a crash during sdio unbind * fix a kernel-doc related build error -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJaKp9UAAoJEG4XJFUm622bhXgH+wTtTVEH0lAOTtK+PyBkxkRH Q+55Yf1XZ9lNYxmfXhYgObusSbmeL8tClMuISCcQS9gX0um1Vuuud4CLemgO2V7R V1xuWTKjanaKf8PouKx9SUt1Fx6CsFdwlivJX+eZTfKlKYtwNbNX4onWl9GN2jVZ 5/2l+m3MJbTMMzarZTGLkBJqpTk8DGTNINtKeRd+VF+717SlbqpRlw1TTlbVJcR2 nHcJ3p5JGbU/+hOTroOWUr7kGdpdYlWLfcyOL8iT3rZXtAzH/POjPAmQv9VVRaC+ anm5zn+gZ5GH9XF+pc3nrGOEZ2Ei4LtszMQjdo4Zo9V3ngCj/0OoEnkto6VLbkw= =lzG8 -----END PGP SIGNATURE----- Merge tag 'wireless-drivers-for-davem-2017-12-08' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 4.15 Second set of fixes for 4.15. This time a lot of iwlwifi patches and two brcmfmac patches. Most important here are the MIC and IVC fixes for iwlwifi to unbreak 9000 series. iwlwifi * fix rate-scaling to not start lowest possible rate * fix the TX queue hang detection for AP/GO modes * fix the TX queue hang timeout in monitor interfaces * fix packet injection * remove a wrong error message when dumping PCI registers * fix race condition with RF-kill * tell mac80211 when the MIC has been stripped (9000 series) * tell mac80211 when the IVC has been stripped (9000 series) * add 2 new PCI IDs, one for 9000 and one for 22000 * fix a queue hang due during a P2P Remain-on-Channel operation brcmfmac * fix a race which sometimes caused a crash during sdio unbind * fix a kernel-doc related build error ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:48:49 -05:00
Marc Kleine-Budde	936e5d8bdf	slip: sl_alloc(): remove unused parameter "dev_t line" The first and only parameter of sl_alloc() is unused, so remove it. Fixes: `5342b77c41` slip: ("Clean up create and destroy") Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:41:02 -05:00
Antoine Tenart	8a7b741e76	net: mvpp2: fix the RSS table entry offset The macro used to access or set an RSS table entry was using an offset of 8, while it should use an offset of 0. This lead to wrongly configure the RSS table, not accessing the right entries. Fixes: `1d7d15d79f` ("net: mvpp2: initialize the RSS tables") Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:35:15 -05:00
David S. Miller	2deeb49550	Merge branch 'cxgb4-collect-hardware-logs-via-ethtool' Rahul Lakkireddy says: ==================== cxgb4: collect hardware logs via ethtool Collect more hardware logs via ethtool --get-dump facility. Patch 1 collects on-chip memory layout information. Patch 2 collects on-chip MC memory dumps. Patch 3 collects HMA memory dump. Patch 4 evaluates and skips TX and RX payload regions in memory dumps. Patch 5 collects egress and ingress SGE queue contexts. Patch 6 collects PCIe configuration logs ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:31:51 -05:00
Rahul Lakkireddy	6078ab196b	cxgb4: collect PCIe configuration logs Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:31:50 -05:00
Rahul Lakkireddy	736c3b9447	cxgb4: collect egress and ingress SGE queue contexts Use meminfo to identify the egress and ingress context regions and fetch all valid contexts from those regions. Also flush all contexts before attempting collection to prevent stale information. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:31:50 -05:00
Rahul Lakkireddy	c1219653f3	cxgb4: skip TX and RX payload regions in memory dumps Use meminfo to identify TX and RX payload regions and skip them in collection of EDC, MC, and HMA. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:31:50 -05:00
Rahul Lakkireddy	4db0401f8a	cxgb4: collect HMA memory dump Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:31:50 -05:00
Rahul Lakkireddy	a1c69520f7	cxgb4: collect MC memory dump Use meminfo to get base address and size of MC memory. Also use same meminfo for EDC memory dumps. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:31:50 -05:00
Rahul Lakkireddy	123e25c4a5	cxgb4: collect on-chip memory information Collect memory layout of various on-chip memory regions. Move code for collecting on-chip memory information to cudbg_lib.c and update cxgb4_debugfs.c to use the common function. Also include cudbg_entity.h before cudbg_lib.h to avoid adding cudbg entity structure forward declarations in cudbg_lib.h. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:31:49 -05:00
David S. Miller	62fd8b189e	Merge branch 'veth-and-GSO-maximums' Stephen Hemminger says: ==================== veth and GSO maximums This is the more general way to solving the issue of GSO limits not being set correctly for containers on Azure. If a GSO packet is sent to host that exceeds the limit (reported by NDIS), then the host is forced to do segmentation in software which has noticeable performance impact. The core rtnetlink infrastructure already has the messages and infrastructure to allow changing gso limits. With an updated iproute2 the following already works: # ip li set dev dummy0 gso_max_size 30000 These patches are about making it easier with veth. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:23:00 -05:00
Stephen Hemminger	72d24955b4	veth: set peer GSO values When new veth is created, and GSO values have been configured on one device, clone those values to the peer. For example: # ip link add dev vm1 gso_max_size 65530 type veth peer name vm2 This should create vm1 <--> vm2 with both having GSO maximum size set to 65530. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:22:59 -05:00
Stephen Hemminger	46e6b992c2	rtnetlink: allow GSO maximums to be set on device creation Netlink device already allows changing GSO sizes with ip set command. The part that is missing is allowing overriding GSO settings on device creation. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:22:59 -05:00
Niklas Cassel	5a6a0445d1	net: stmmac: fix broken dma_interrupt handling for multi-queues There is nothing that says that number of TX queues == number of RX queues. E.g. the ARTPEC-6 SoC has 2 TX queues and 1 RX queue. This code is obviously wrong: for (chan = 0; chan < tx_channel_count; chan++) { struct stmmac_rx_queue rx_q = &priv->rx_queue[chan]; priv->rx_queue has size MTL_MAX_RX_QUEUES, so this will send an uninitialized napi_struct to __napi_schedule(), causing us to crash in net_rx_action(), because napi_struct->poll is zero. [12846.759880] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [12846.768014] pgd = (ptrval) [12846.770742] [00000000] pgd=39ec7831, pte=00000000, ppte=00000000 [12846.777023] Internal error: Oops: 80000007 [#1] PREEMPT SMP ARM [12846.782942] Modules linked in: [12846.785998] CPU: 0 PID: 161 Comm: dropbear Not tainted 4.15.0-rc2-00285-gf5fb5f2f39a7 #36 [12846.794177] Hardware name: Axis ARTPEC-6 Platform [12846.798879] task: (ptrval) task.stack: (ptrval) [12846.803407] PC is at 0x0 [12846.805942] LR is at net_rx_action+0x274/0x43c [12846.810383] pc : [<00000000>] lr : [<80bff064>] psr: 200e0113 [12846.816648] sp : b90d9ae8 ip : b90d9ae8 fp : b90d9b44 [12846.821871] r10: 00000008 r9 : 0013250e r8 : 00000100 [12846.827094] r7 : 0000012c r6 : 00000000 r5 : 00000001 r4 : bac84900 [12846.833619] r3 : 00000000 r2 : b90d9b08 r1 : 00000000 r0 : bac84900 Since each DMA channel can be used for rx and tx simultaneously, the current code should probably be rewritten so that napi_struct is embedded in a new struct stmmac_channel. That way, stmmac_poll() can call stmmac_tx_clean() on just the tx queue where we got the IRQ, instead of looping through all tx queues. This is also how the xgbe driver does it (another driver for this IP). Fixes: `c22a3f48ef` ("net: stmmac: adding multiple napi mechanism") Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:18:40 -05:00
David S. Miller	b7e445a1c8	Merge branch 'tcp-RACK-loss-recovery-bug-fixes' Yuchung Cheng says: ==================== tcp: RACK loss recovery bug fixes This patch set has four minor bug fixes in TCP RACK loss recovery. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:14:12 -05:00
Yuchung Cheng	6065fd0d17	tcp: evaluate packet losses upon RTT change RACK skips an ACK unless it advances the most recently delivered TX timestamp (rack.mstamp). Since RACK also uses the most recent RTT to decide if a packet is lost, RACK should still run the loss detection whenever the most recent RTT changes. For example, an ACK that does not advance the timestamp but triggers the cwnd undo due to reordering, would then use the most recent (higher) RTT measurement to detect further losses. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:14:11 -05:00
Yuchung Cheng	428aec5e69	tcp: fix off-by-one bug in RACK RACK should mark a packet lost when remaining wait time is zero. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:14:11 -05:00
Yuchung Cheng	cd1fc85b43	tcp: always evaluate losses in RACK upon undo When sender detects spurious retransmission, all packets marked lost are remarked to be in-flight. However some may be considered lost based on its timestamps in RACK. This patch forces RACK to re-evaluate, which may be skipped previously if the ACK does not advance RACK timestamp. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:14:11 -05:00
Yuchung Cheng	0ce294d884	tcp: correctly test congestion state in RACK RACK does not test the loss recovery state correctly to compute the reordering window. It assumes if lost_out is zero then TCP is not in loss recovery. But it can be zero during recovery before calling tcp_rack_detect_loss(): when an ACK acknowledges all packets marked lost before receiving this ACK, but has not yet to discover new ones by tcp_rack_detect_loss(). The fix is to simply test the congestion state directly. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:14:11 -05:00
Egil Hjelmeland	2e8d243e88	net: dsa: lan9303: Protect ALR operations with mutex ALR table operations are a sequence of related register operations which should be protected from concurrent access. The alr_cache should also be protected. Add alr_mutex doing that. Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:12:33 -05:00
Jiri Pirko	df45bf84e4	net: sched: fix use-after-free in tcf_block_put_ext Since the block is freed with last chain being put, once we reach the end of iteration of list_for_each_entry_safe, the block may be already freed. I'm hitting this only by creating and deleting clsact: [ 202.171952] ================================================================== [ 202.180182] BUG: KASAN: use-after-free in tcf_block_put_ext+0x240/0x390 [ 202.187590] Read of size 8 at addr ffff880225539a80 by task tc/796 [ 202.194508] [ 202.196185] CPU: 0 PID: 796 Comm: tc Not tainted 4.15.0-rc2jiri+ #5 [ 202.203200] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016 [ 202.213613] Call Trace: [ 202.216369] dump_stack+0xda/0x169 [ 202.220192] ? dma_virt_map_sg+0x147/0x147 [ 202.224790] ? show_regs_print_info+0x54/0x54 [ 202.229691] ? tcf_chain_destroy+0x1dc/0x250 [ 202.234494] print_address_description+0x83/0x3d0 [ 202.239781] ? tcf_block_put_ext+0x240/0x390 [ 202.244575] kasan_report+0x1ba/0x460 [ 202.248707] ? tcf_block_put_ext+0x240/0x390 [ 202.253518] tcf_block_put_ext+0x240/0x390 [ 202.258117] ? tcf_chain_flush+0x290/0x290 [ 202.262708] ? qdisc_hash_del+0x82/0x1a0 [ 202.267111] ? qdisc_hash_add+0x50/0x50 [ 202.271411] ? __lock_is_held+0x5f/0x1a0 [ 202.275843] clsact_destroy+0x3d/0x80 [sch_ingress] [ 202.281323] qdisc_destroy+0xcb/0x240 [ 202.285445] qdisc_graft+0x216/0x7b0 [ 202.289497] tc_get_qdisc+0x260/0x560 Fix this by holding the block also by chain 0 and put chain 0 explicitly, out of the list_for_each_entry_safe loop at the very end of tcf_block_put_ext. Fixes: `efbf789739` ("net_sched: get rid of rcu_barrier() in tcf_block_put_ext()") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:09:08 -05:00
Calvin Owens	2edbdb3159	bnxt_en: Fix sources of spurious netpoll warnings After applying `2270bc5da3` ("bnxt_en: Fix netpoll handling") and `903649e718` ("bnxt_en: Improve -ENOMEM logic in NAPI poll loop."), we still see the following WARN fire: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1875170 at net/core/netpoll.c:165 netpoll_poll_dev+0x15a/0x160 bnxt_poll+0x0/0xd0 exceeded budget in poll <snip> Call Trace: [<ffffffff814be5cd>] dump_stack+0x4d/0x70 [<ffffffff8107e013>] __warn+0xd3/0xf0 [<ffffffff8107e07f>] warn_slowpath_fmt+0x4f/0x60 [<ffffffff8179519a>] netpoll_poll_dev+0x15a/0x160 [<ffffffff81795f38>] netpoll_send_skb_on_dev+0x168/0x250 [<ffffffff817962fc>] netpoll_send_udp+0x2dc/0x440 [<ffffffff815fa9be>] write_ext_msg+0x20e/0x250 [<ffffffff810c8125>] call_console_drivers.constprop.23+0xa5/0x110 [<ffffffff810c9549>] console_unlock+0x339/0x5b0 [<ffffffff810c9a88>] vprintk_emit+0x2c8/0x450 [<ffffffff810c9d5f>] vprintk_default+0x1f/0x30 [<ffffffff81173df5>] printk+0x48/0x50 [<ffffffffa0197713>] edac_raw_mc_handle_error+0x563/0x5c0 [edac_core] [<ffffffffa0197b9b>] edac_mc_handle_error+0x42b/0x6e0 [edac_core] [<ffffffffa01c3a60>] sbridge_mce_output_error+0x410/0x10d0 [sb_edac] [<ffffffffa01c47cc>] sbridge_check_error+0xac/0x130 [sb_edac] [<ffffffffa0197f3c>] edac_mc_workq_function+0x3c/0x90 [edac_core] [<ffffffff81095f8b>] process_one_work+0x19b/0x480 [<ffffffff810967ca>] worker_thread+0x6a/0x520 [<ffffffff8109c7c4>] kthread+0xe4/0x100 [<ffffffff81884c52>] ret_from_fork+0x22/0x40 This happens because we increment rx_pkts on -ENOMEM and -EIO, resulting in rx_pkts > 0. Fix this by only bumping rx_pkts if we were actually given a non-zero budget. Signed-off-by: Calvin Owens <calvinowens@fb.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:07:19 -05:00
David S. Miller	fc8b81a598	Merge branch 'lockless-qdisc-series' John Fastabend says: ==================== lockless qdisc series This series adds support for building lockless qdiscs. This is the result of noticing the qdisc lock is a common hot-spot in perf analysis of the Linux network stack, especially when testing with high packet per second rates. However, nothing is free and most qdiscs rely on the qdisc lock for their data structures so each qdisc must be converted on a case by case basis. In this series, to kick things off, we make pfifo_fast, mq, and mqprio lockless. Follow up series can address additional qdiscs as needed. For example sch_tbf might be useful. To allow this the lockless design is an opt-in flag. In some future utopia we convert all qdiscs and we get to drop this case analysis, but in order to make progress we live in the real-world. There are also a handful of optimizations I have behind this series and a few code cleanups that I couldn't figure out how to fit neatly into this series with out increasing the patch count. Once this is in additional patches can address this. The most notable is in skb_dequeue we can push the consumer lock out a bit and consume multiple skbs off the skb_array in pfifo fast per iteration. Ideally we could push arrays of packets at drivers as well but we would need the infrastructure for this. The other notable improvement is to do less locking in the overrun cases where bad tx queue list and gso_skb are being hit. Although, nice in theory in practice this is the error case and I haven't found a benchmark where this matters yet. For testing... My first test case uses multiple containers (via cilium) where multiple client containers use 'wrk' to benchmark connections with a server container running lighttpd. Where lighttpd is configured to use multiple threads, one per core. Additionally this test has a proxy agent running so all traffic takes an extra hop through a proxy container. In these cases each TCP packet traverses the egress qdisc layer at least four times and the ingress qdisc layer an additional four times. This makes for a good stress test IMO, perf details below. The other micro-benchmark I run is injecting packets directly into qdisc layer using pktgen. This uses the benchmark script, ./pktgen_bench_xmit_mode_queue_xmit.sh Benchmarks taken in two cases, "base" running latest net-next no changes to qdisc layer and "qdisc" tests run with qdisc lockless updates. Numbers reported in req/sec. All virtual 'veth' devices run with pfifo_fast in the qdisc test case. `wrk -t16 -c $conns -d30 "http://[$SERVER_IP4]:80"` conns 16 32 64 1024 ----------------------------------------------- base: 18831 20201 21393 29151 qdisc: 19309 21063 23899 29265 notice in all cases we see performance improvement when running with qdisc case. Microbenchmarks using pktgen are as follows, `pktgen_bench_xmit_mode_queue_xmit.sh -t 1 -i eth2 -c 20000000 base(mq): 2.1Mpps base(pfifo_fast): 2.1Mpps qdisc(mq): 2.6Mpps qdisc(pfifo_fast): 2.6Mpps notice numbers are the same for mq and pfifo_fast because only testing a single thread here. In both tests we see a nice bump in performance gain. The key with 'mq' is it is already per txq ring so contention is minimal in the above cases. Qdiscs such as tbf or htb which have more contention will likely show larger gains when/if lockless versions are implemented. Thanks to everyone who helped with this work especially Daniel Borkmann, Eric Dumazet and Willem de Bruijn for discussing the design and reviewing versions of the code. Changes from the RFC: dropped a couple patches off the end, fixed a bug with skb_queue_walk_safe not unlinking skb in all cases, fixed a lockdep splat with pfifo_fast_destroy not calling *_bh lock variant, addressed _most_ of Willem's comments, there was a bug in the bulk locking (final patch) of the RFC series. @Willem, I left out lockdep annotation for a follow on series to add lockdep more completely, rather than just in code I touched. Comments and feedback welcome. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 13:32:27 -05:00
John Fastabend	c5ad119fb6	net: sched: pfifo_fast use skb_array This converts the pfifo_fast qdisc to use the skb_array data structure and set the lockless qdisc bit. pfifo_fast is the first qdisc to support the lockless bit that can be a child of a qdisc requiring locking. So we add logic to clear the lock bit on initialization in these cases when the qdisc graft operation occurs. This also removes the logic used to pick the next band to dequeue from and instead just checks a per priority array for packets from top priority to lowest. This might need to be a bit more clever but seems to work for now. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 13:32:26 -05:00
John Fastabend	4a86a4cf68	net: skb_array: expose peek API This adds a peek routine to skb_array.h for use with qdisc. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 13:32:26 -05:00
John Fastabend	ce679e8df7	net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio The sch_mqprio qdisc creates a sub-qdisc per tx queue which are then called independently for enqueue and dequeue operations. However statistics are aggregated and pushed up to the "master" qdisc. This patch adds support for any of the sub-qdiscs to be per cpu statistic qdiscs. To handle this case add a check when calculating stats and aggregate the per cpu stats if needed. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 13:32:26 -05:00
John Fastabend	b01ac095c7	net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq The sch_mq qdisc creates a sub-qdisc per tx queue which are then called independently for enqueue and dequeue operations. However statistics are aggregated and pushed up to the "master" qdisc. This patch adds support for any of the sub-qdiscs to be per cpu statistic qdiscs. To handle this case add a check when calculating stats and aggregate the per cpu stats if needed. Also exports __gnet_stats_copy_queue() to use as a helper function. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 13:32:26 -05:00
John Fastabend	7e66016f2c	net: sched: helpers to sum qlen and qlen for per cpu logic Add qdisc qlen helper routines for lockless qdiscs to use. The qdisc qlen is no longer used in the hotpath but it is reported via stats query on the qdisc so it still needs to be tracked. This adds the per cpu operations needed along with a helper to return the summation of per cpu stats. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 13:32:26 -05:00
John Fastabend	fd8e8d1a77	net: sched: check for frozen queue before skb_bad_txq check I can not think of any reason to pull the bad txq skb off the qdisc if the txq we plan to send this on is still frozen. So check for frozen queue first and abort before dequeuing either skb_bad_txq skb or normal qdisc dequeue() skb. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 13:32:26 -05:00
John Fastabend	70e57d5e3f	net: sched: use skb list for skb_bad_tx Similar to how gso is handled use skb list for skb_bad_tx this is required with lockless qdiscs because we may have multiple cores attempting to push skbs into skb_bad_tx concurrently Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 13:32:26 -05:00
John Fastabend	7bbde83b18	net: sched: drop qdisc_reset from dev_graft_qdisc In qdisc_graft_qdisc a "new" qdisc is attached and the 'qdisc_destroy' operation is called on the old qdisc. The destroy operation will wait a rcu grace period and call qdisc_rcu_free(). At which point gso_cpu_skb is free'd along with all stats so no need to zero stats and gso_cpu_skb from the graft operation itself. Further after dropping the qdisc locks we can not continue to call qdisc_reset before waiting an rcu grace period so that the qdisc is detached from all cpus. By removing the qdisc_reset() here we get the correct property of waiting an rcu grace period and letting the qdisc_destroy operation clean up the qdisc correctly. Note, a refcnt greater than 1 would cause the destroy operation to be aborted however if this ever happened the reference to the qdisc would be lost and we would have a memory leak. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 13:32:25 -05:00
John Fastabend	a53851e2c3	net: sched: explicit locking in gso_cpu fallback This work is preparing the qdisc layer to support egress lockless qdiscs. If we are running the egress qdisc lockless in the case we overrun the netdev, for whatever reason, the netdev returns a busy error code and the skb is parked on the gso_skb pointer. With many cores all hitting this case at once its possible to have multiple sk_buffs here so we turn gso_skb into a queue. This should be the edge case and if we see this frequently then the netdev/qdisc layer needs to back off. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 13:32:25 -05:00

1 2 3 4 5 ...

722910 Commits