Commit Graph

Lorenz Bauer
dbaf2877e9 selftests/bpf: allow specifying helper for BPF_SK_LOOKUP
Make the BPF_SK_LOOKUP macro take a helper function, to ease
writing tests for new helpers.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:11 -07:00
Lorenz Bauer
253c8dde3c tools: update include/uapi/linux/bpf.h
Pull definitions for bpf_skc_lookup_tcp and bpf_sk_check_syncookie.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:10 -07:00
Lorenz Bauer
3990408470 bpf: add helper to check for a valid SYN cookie
Using bpf_skc_lookup_tcp it's possible to ascertain whether a packet
belongs to a known connection. However, there is one corner case: no
sockets are created if SYN cookies are active. This means that the final
ACK in the 3WHS is misclassified.

Using the helper, we can look up the listening socket via
bpf_skc_lookup_tcp and then check whether a packet is a valid SYN
cookie ACK.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:10 -07:00
Lorenz Bauer
edbf8c01de bpf: add skc_lookup_tcp helper
Allow looking up a sock_common. This gives eBPF programs
access to timewait and request sockets.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:10 -07:00
Lorenz Bauer
85a51f8c28 bpf: allow helpers to return PTR_TO_SOCK_COMMON
It's currently not possible to access timewait or request sockets
from eBPF, since there is no way to return a PTR_TO_SOCK_COMMON
from a helper. Introduce RET_PTR_TO_SOCK_COMMON to enable this
behaviour.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:10 -07:00
Lorenz Bauer
0f3adc288d bpf: track references based on is_acquire_func
So far, the verifier only acquires reference tracking state for
RET_PTR_TO_SOCKET_OR_NULL. Instead of extending this for every
new return type which desires these semantics, acquire reference
tracking state iff the called helper is an acquire function.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:10 -07:00
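
A minimal sketch of what such a predicate can look like in the verifier
(illustrative only; the function name follows the commit subject and the
helper list is inferred from this series, not copied from the patch):

    /* Acquire reference tracking state when the called helper is one of
     * the known "acquire" helpers, instead of keying on its return type.
     */
    static bool is_acquire_function(enum bpf_func_id func_id)
    {
            return func_id == BPF_FUNC_sk_lookup_tcp ||
                   func_id == BPF_FUNC_sk_lookup_udp ||
                   func_id == BPF_FUNC_skc_lookup_tcp;
    }
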
Uwe Kleine-König
507aaeeef8 ARM: imx_v4_v5_defconfig: enable PWM driver
While there is no mainline board that makes use of the PWM, still enable
the driver to increase compile-test coverage.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2019-03-22 09:56:11 +08:00
Uwe Kleine-König
728e096dd7 ARM: imx_v6_v7_defconfig: continue compiling the pwm driver
After the pwm-imx driver was split into two drivers and the Kconfig symbol
changed accordingly, use the new name to continue being able to use the
PWM hardware.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2019-03-22 09:55:59 +08:00
Dave Airlie
6a9d8fc018 Merge branch 'vmwgfx-fixes-5.1' of git://people.freedesktop.org/~thomash/linux into drm-fixes
Two fixes CC'd stable. One fix for a long-standing, somewhat hard-to-trigger
fbdev modesetting bug and one out-of-bo-id fix.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thellstrom@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190321112026.114328-1-thellstrom@vmware.com
2019-03-22 11:53:36 +10:00
Dave Airlie
28d3ba6c99 - Fix page fault issue at Mixer device
. This patch fixes the page fault issue by correcting the synchronization
     method for updating shadow registers for the Mixer device.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJck17SAAoJEFc4NIkMQxK4VHkP/jYOO+ByvOw1rlC1F1CfvYfT
 hQTa/pxrCtqUquLGUst+dc7bj8jTszlam5V3SpxRrlCWCiS3qrXuyzaWw6XnX9MT
 dcrha5TmxYXVHX7q3ISkI65qiav/drt1mHDfoRP8W7cOUrgssARfKi2xsyB+STnt
 0JqfWD4WMZGcr8gx1b/Xjm6dqJQc3s4QB181lw5v5RkdpNbM2kVAYQ9dZ9iTBmFH
 QhyDJ4qJ3huqq9dG59RpQvr2AT/EmklFCfOuXqoJYV1cklR8qUFT4FZEqyN8KQsE
 cD2xX8FStZKlLWKKhexljGteLodRIF09hfc3Zwj9w3HPdEbtQwpMKWzjaGJ6DpXi
 fm3YkW8fYigY/pOUSwl5f25WP2QnhoZxU5olvSdAyaJPr2ajKxpCnZOT0lbQApOX
 Qseb3vX/rh2CUYBHndP0jXv+A8wdNWqNVR9sDAC3ENNfbxs4G8C2Mlp/L9MF/BRb
 nHxGvAsQG5ws5g+WNPFbAhD2ChWF4PYYUN+vSCUN/Pmz/LZBhdiDospE596KkTZV
 vURrZAoHSFJvYEe9FHUZeHhNWExhMm+TkEjkPj3xG4+Chc/L87Radycndy1wj804
 ewMe+UHpyIDRQa/ql+XQaj6t/D6qm10FugHeWvmK6vgdWyQ19hEBYX/uKU2DlhtI
 SB4NWkrgchshZ3HPPFnW
 =5NcR
 -----END PGP SIGNATURE-----

Merge tag 'exynos-drm-fixes-for-5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes

- Fix page fault issue at Mixer device
  . This patch fixes the page fault issue by correcting the synchronization
    method for updating shadow registers for the Mixer device.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Inki Dae <inki.dae@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1553162223-10090-1-git-send-email-inki.dae@samsung.com
2019-03-22 11:52:44 +10:00
Michal Vokáč
15b43e497f ARM: dts: imx6dl-yapp4: Use correct pseudo PHY address for the switch
The switch is accessible through a pseudo PHY which is located at address 0x10.

Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Fixes: 87489ec3a7 ("ARM: dts: imx: Add Y Soft IOTA Draco, Hydra and Ursa boards")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2019-03-22 09:20:32 +08:00
Dave Airlie
8cf13f71fa A protection on our mmap against attempts to map past the end of the object;
plus an off-by-one fix in our hang report and a protection;
 and a fix for eDP panels on Gen9 platforms when the VBT is absent.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJckpgDAAoJEPpiX2QO6xPKDDgH/i9AQQLbJCX/nXzJUhnKSvu5
 xFIFbXt/kd5hkeRVZ/qLW831okdJdVq8GW/C4AAfEMNbnhhhAI1/PRuO7lrHKuU6
 QAOKvJrNKkdRxNrkHArIKXTdfjr4UnXOgQFSgFQNRLFmOi9/czoeWG0/kGkvp+WT
 6HV7o1Zgghqs9SUOtmA9TZ/znq1neLlwJRuePep1TY4vbJoWnZ0ekn/MqUPDWpg2
 W5stfS7AyuMX1CWJnr91G6NEqaU8Bm9aC1WmBqvdPPkVYWIJNDGWLjOEDpqzdoi4
 D17RntvZ685mer+GM6fMICY09lC9vuYomT3gdRy4ZLz24VmMSyZJ7juJoYYGa7Y=
 =gHfL
 -----END PGP SIGNATURE-----

Merge tag 'drm-intel-fixes-2019-03-20' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

A protection on our mmap against attempts to map past the end of the object;
plus an off-by-one fix in our hang report and a protection;
and a fix for eDP panels on Gen9 platforms when the VBT is absent.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190320201451.GA7993@intel.com
2019-03-22 10:52:27 +10:00
Masanari Iida
41b37f4c0f ARM: dts: imx6qdl: Fix typo in imx6qdl-icore-rqs.dtsi
This patch fixes a spelling typo.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Fixes: cc42603de3 ("ARM: dts: imx6q-icore-rqs: Add Engicam IMX6 Q7 initial support")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2019-03-22 08:42:24 +08:00
Dave Airlie
cd84579112 Merge branch 'linux-5.1' of git://github.com/skeggsb/linux into drm-fixes
Some minor nouveau dmem and other fixes.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <bskeggs@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CABDvA==kMkD6n-cS9KpQBcTU1E8p7Wc+H1ZuOhSfD7yTFJVvkw@mail.gmail.com
2019-03-22 10:39:35 +10:00
Jérôme Glisse
8385741807 drm/nouveau/dmem: empty chunk do not have a buffer object associated with them.
Empty chunks do not have a bo associated with them, so there is no need to
pin/unpin on suspend/resume.

This fixes suspend/resume on 5.1-rc1 when NOUVEAU_SVM is enabled.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Tested-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-03-22 09:58:38 +10:00
YueHaibing
909e9c9c42 drm/nouveau/debugfs: Fix check of pm_runtime_get_sync failure
pm_runtime_get_sync returns negative on failure.

Fixes: eaeb9010bb ("drm/nouveau/debugfs: Wake up GPU before doing any reclocking")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-03-22 09:57:58 +10:00
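
The pattern being fixed, sketched (the 'dev' argument stands in for the
GPU's struct device; the surrounding debugfs code is omitted):

    #include <linux/pm_runtime.h>

    /* pm_runtime_get_sync() returns a negative errno on failure and may
     * return 1 if the device was already resumed, so failure has to be
     * detected with 'ret < 0', not 'ret != 0'.
     */
    static int wake_device(struct device *dev)
    {
            int ret = pm_runtime_get_sync(dev);

            if (ret < 0)
                    return ret;
            return 0;
    }
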
Dan Carpenter
18ec3c129b drm/nouveau/dmem: Fix a NULL vs IS_ERR() check
The hmm_devmem_add() function doesn't return NULL on failure; it returns
error pointers.

Fixes: 5be73b6908 ("drm/nouveau/dmem: device memory helpers for SVM")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-03-22 09:57:58 +10:00
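
A generic sketch of the pattern (struct thing and alloc_thing() are made-up
stand-ins for any API that, like hmm_devmem_add(), encodes failure as an
ERR_PTR() value rather than returning NULL):

    #include <linux/err.h>

    static int setup_thing(void)
    {
            struct thing *t = alloc_thing();   /* may return ERR_PTR() */

            /* ERR_PTR()-style APIs never return NULL on failure, so a
             * NULL test lets error pointers slip through; use
             * IS_ERR()/PTR_ERR() instead.
             */
            if (IS_ERR(t))
                    return PTR_ERR(t);
            /* ... use t ... */
            return 0;
    }
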
YueHaibing
2219c9ee92 drm/nouveau/dmem: remove set but not used variable 'drm'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/gpu/drm/nouveau/nouveau_dmem.c: In function 'nouveau_dmem_free':
drivers/gpu/drm/nouveau/nouveau_dmem.c:103:22: warning:
 variable 'drm' set but not used [-Wunused-but-set-variable]
  struct nouveau_drm *drm;
                      ^

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-03-22 09:57:58 +10:00
David S. Miller
1d965c4def Merge branch 'Refactor-flower-classifier-to-remove-dependency-on-rtnl-lock'
Vlad Buslov says:

====================
Refactor flower classifier to remove dependency on rtnl lock

Currently, all netlink protocol handlers for updating rules, actions and
qdiscs are protected with single global rtnl lock which removes any
possibility for parallelism. This patch set is a third step to remove
rtnl lock dependency from TC rules update path.

Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added.
TC rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER, etc.) are
already registered with this flag and only take rtnl lock when qdisc or
classifier requires it. Classifiers can indicate that their ops
callbacks don't require caller to hold rtnl lock by setting the
TCF_PROTO_OPS_DOIT_UNLOCKED flag. The goal of this change is to refactor
flower classifier to support unlocked execution and register it with
unlocked flag.

This patch set implements following changes to make flower classifier
concurrency-safe:

- Implement reference counting for individual filters. Change fl_get to
  take reference to filter. Implement tp->ops->put callback that was
  introduced in cls API patch set to release reference to flower filter.

- Use tp->lock spinlock to protect internal classifier data structures
  from concurrent modification.

- Handle concurrent tcf proto deletion by returning EAGAIN, which will
  cause cls API to retry and create new proto instance or return error
  to the user (depending on message type).

- Handle concurrent insertion of filter with same priority and handle by
  returning EAGAIN, which will cause cls API to lookup filter again and
  process it accordingly to netlink message flags.

- Extend flower mask with reference counting and protect masks list with
  masks_lock spinlock.

- Prevent concurrent mask insertion by inserting temporary value to
  masks hash table. This is necessary because mask initialization is a
  sleeping operation and cannot be done while holding tp->lock.

Both chain-level and classifier-level conflicts are resolved by returning
-EAGAIN to the cls API, which results in a restart of the whole operation.
This retry mechanism is a result of fine-grained locking approach used
in this and previous changes in series and is necessary to allow
concurrent updates on same chain instance. Alternative approach would be
to lock the whole chain while updating filters on any of child tp's,
adding and removing classifier instances from the chain. However, since
most CPU-intensive parts of filter update code are specifically in
classifier code and its dependencies (extensions and hw offloads), such
approach would negate most of the gains introduced by this change and
previous changes in the series when updating same chain instance.

Tcf hw offloads API is not changed by this patch set and still requires
caller to hold rtnl lock. Refactored flower classifier tracks rtnl lock
state by means of 'rtnl_held' flag provided by cls API and obtains the
lock before calling hw offloads. Following patch set will lift this
restriction and refactor cls hw offloads API to support unlocked
execution.

With these changes flower classifier is safely registered with
TCF_PROTO_OPS_DOIT_UNLOCKED flag in last patch.

Changes from V2 to V3:
- Rebase on latest net-next

Changes from V1 to V2:
- Extend cover letter with explanation about retry mechanism.
- Rebase on current net-next.
- Patch 1:
  - Use rcu_dereference_raw() for tp->root dereference.
  - Update comment in fl_head_dereference().
- Patch 2:
  - Remove redundant check in fl_change error handling code.
  - Add empty line between error check and new handle assignment.
- Patch 3:
  - Refactor loop in fl_get_next_filter() to improve readability.
- Patch 4:
  - Refactor __fl_delete() to improve readability.
- Patch 6:
  - Fix comment in fl_check_assign_mask().
- Patch 9:
  - Extend commit message.
  - Fix error code in comment.
- Patch 11:
  - Fix fl_hw_replace_filter() to always release rtnl lock in error
    handlers.
- Patch 12:
  - Don't take rtnl lock before calling __fl_destroy_filter() in
    workqueue context.
  - Extend commit message with explanation why flower still takes rtnl
    lock before calling hardware offloads API.

Github: <https://github.com/vbuslov/linux/tree/unlocked-flower-cong3>
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 14:32:17 -07:00
Vlad Buslov
9214919006 net: sched: flower: set unlocked flag for flower proto ops
Set TCF_PROTO_OPS_DOIT_UNLOCKED for flower classifier to indicate that its
ops callbacks don't require caller to hold rtnl lock. Don't take rtnl lock
in fl_destroy_filter_work() that is executed on workqueue instead of being
called by cls API and is not affected by setting
TCF_PROTO_OPS_DOIT_UNLOCKED. Rtnl mutex is still manually taken by flower
classifier before calling hardware offloads API that has not been updated
for unlocked execution.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 14:32:17 -07:00
Vlad Buslov
c24e43d83b net: sched: flower: track rtnl lock state
Use 'rtnl_held' flag to track if caller holds rtnl lock. Propagate the flag
to internal functions that need to know rtnl lock state. Take rtnl lock
before calling tcf APIs that require it (hw offload, bind filter, etc.).

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 14:32:17 -07:00
Vlad Buslov
3d81e7118d net: sched: flower: protect flower classifier state with spinlock
struct tcf_proto was extended with spinlock to be used by classifiers
instead of global rtnl lock. Use it to protect shared flower classifier
data structures (handle_idr, mask hashtable and list) and fields of
individual filters that can be accessed concurrently. This patch set uses
tcf_proto->lock as a per-instance lock that protects all filters on a
tcf_proto.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 14:32:17 -07:00
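
A sketch of the per-instance locking described above (variable names and
the idr call are illustrative; GFP_ATOMIC is used because the allocation
happens under a spinlock):

    spin_lock(&tp->lock);
    /* shared flower state -- handle_idr, the mask hashtable/list and
     * per-filter fields -- is only modified under tp->lock now
     */
    err = idr_alloc_u32(&head->handle_idr, fnew, &handle, handle,
                        GFP_ATOMIC);
    spin_unlock(&tp->lock);
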
Vlad Buslov
272ffaadeb net: sched: flower: handle concurrent tcf proto deletion
Without rtnl lock protection tcf proto can be deleted concurrently. Check
tcf proto 'deleting' flag after taking tcf spinlock to verify that no
concurrent deletion is in progress. Return EAGAIN error if concurrent
deletion detected, which will cause caller to retry and possibly create new
instance of tcf proto.

Retry mechanism is a result of fine-grained locking approach used in this
and previous changes in series and is necessary to allow concurrent updates
on same chain instance. Alternative approach would be to lock the whole
chain while updating filters on any of child tp's, adding and removing
classifier instances from the chain. However, since most CPU-intensive
parts of filter update code are specifically in classifier code and its
dependencies (extensions and hw offloads), such approach would negate most
of the gains introduced by this change and previous changes in the series
when updating same chain instance.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 14:32:17 -07:00
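
Sketched in C (the 'deleting' flag and tp->lock are the ones described
above; the surrounding update function is omitted):

    spin_lock(&tp->lock);
    if (tp->deleting) {
            /* tcf proto is being torn down concurrently; return -EAGAIN
             * so the cls API retries, possibly with a fresh instance.
             */
            spin_unlock(&tp->lock);
            return -EAGAIN;
    }
    /* ... proceed with the filter update under tp->lock ... */
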
Vlad Buslov
9a2d938998 net: sched: flower: handle concurrent filter insertion in fl_change
Check if user specified a handle and another filter with the same handle
was inserted concurrently. Return EAGAIN to retry filter processing (in
case it is an overwrite request).

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 14:32:17 -07:00
Vlad Buslov
259e60f967 net: sched: flower: protect masks list with spinlock
Protect modifications of flower masks list with spinlock to remove
dependency on rtnl lock and allow concurrent access.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 14:32:17 -07:00
Vlad Buslov
195c234d15 net: sched: flower: handle concurrent mask insertion
Without rtnl lock protection, masks with the same key can be inserted
concurrently. Insert a temporary mask with reference count zero into the
masks hashtable. This will cause any concurrent modifications to retry.

Wait for rcu grace period to complete after removing temporary mask from
masks hashtable to accommodate concurrent readers.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Suggested-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 14:32:17 -07:00
Vlad Buslov
f48ef4d5b0 net: sched: flower: add reference counter to flower mask
Extend fl_flow_mask structure with reference counter to allow parallel
modification without relying on rtnl lock. Use rcu read lock to safely
lookup mask and increment reference counter in order to accommodate
concurrent deletes.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 14:32:17 -07:00
Vlad Buslov
b2552b8c40 net: sched: flower: track filter deletion with flag
In order to prevent double deletion of filter by concurrent tasks when rtnl
lock is not used for synchronization, add 'deleted' filter field. Check
value of this field when modifying filters and return error if concurrent
deletion is detected.

Refactor __fl_delete() to accept pointer to 'last' boolean as argument,
and return error code as function return value instead. This is necessary
to signal concurrent filter delete to caller.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 14:32:17 -07:00
Vlad Buslov
061775583e net: sched: flower: introduce reference counting for filters
Extend flower filters with reference counting in order to remove dependency
on rtnl lock in flower ops and allow filters to be modified concurrently.
Reference to flower filter can be taken/released concurrently as soon as it
is marked as 'unlocked' by last patch in this series. Use atomic reference
counter type to make concurrent modifications safe.

Always take reference to flower filter while working with it:
- Modify fl_get() to take reference to filter.
- Implement tp->put() callback as fl_put() function to allow cls API to
release reference taken by fl_get().
- Modify fl_change() to assume that caller holds reference to fold and take
reference to fnew.
- Take reference to filter while using it in fl_walk().

Implement helper functions to get/put filter reference counter.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 14:32:17 -07:00
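
A minimal sketch of the get/put helpers this describes, using the kernel's
refcount_t (the structure layout and the freeing path are simplified):

    #include <linux/refcount.h>

    struct cls_fl_filter {
            /* ... */
            refcount_t refcnt;
    };

    static void __fl_get(struct cls_fl_filter *f)
    {
            refcount_inc(&f->refcnt);
    }

    static void __fl_put(struct cls_fl_filter *f)
    {
            if (!refcount_dec_and_test(&f->refcnt))
                    return;
            /* last reference dropped: free the filter (flower defers the
             * actual teardown to a workqueue because parts of it sleep)
             */
    }
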
Vlad Buslov
620da48608 net: sched: flower: refactor fl_change
As a preparation for using classifier spinlock instead of relying on
external rtnl lock, rearrange code in fl_change. The goal is to group the
code which changes classifier state into a single block in order to allow
following commits in this set to protect it from parallel modification with
tp->lock. Data structures that require tp->lock protection are mask
hashtable and filters list, and classifier handle_idr.

fl_hw_replace_filter() is a sleeping function and cannot be called while
holding a spinlock. In order to execute all sequence of changes to shared
classifier data structures atomically, call fl_hw_replace_filter() before
modifying them.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 14:32:17 -07:00
Vlad Buslov
e474619a24 net: sched: flower: don't check for rtnl on head dereference
Flower classifier only changes root pointer during init and destroy. Cls
API implements reference counting for tcf_proto, so there is no danger of
concurrent access to tp when it is being destroyed, even without protection
provided by rtnl lock.

Implement new function fl_head_dereference() to dereference tp->root
without checking for rtnl lock. Use it in all flower functions that obtain
the head pointer, instead of rtnl_dereference().

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 14:32:17 -07:00
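
The helper is small; a sketch consistent with the cover letter's note about
using rcu_dereference_raw() for tp->root:

    static struct cls_fl_head *fl_head_dereference(struct tcf_proto *tp)
    {
            /* tp->root is only assigned in init() and destroy(), and the
             * cls API refcounts tcf_proto, so neither rtnl nor an RCU
             * read lock is required to read it here.
             */
            return rcu_dereference_raw(tp->root);
    }
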
Jakub Kicinski
31f1a0e37c nfp: remove defines for unused control bits
NFP driver ABI contains bits for L2 switching which were never
implemented in initially envisioned form.

Remove the defines, and open up the possibility of
reclaiming the bits for other uses.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 14:01:55 -07:00
David S. Miller
143eb9ac9f Merge branch 'rhashtable-cleanups'
NeilBrown says:

====================
Two clean-ups for rhashtable.

These two patches make small improvements to
rhashtable, but are otherwise unrelated.

Thanks to Herbert, Miguel, and Paul for the review.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 14:01:10 -07:00
NeilBrown
f7ad68bf98 rhashtable: rename rht_for_each*continue as *from.
The pattern set by list.h is that for_each..continue()
iterators start at the next entry after the given one,
while for_each..from() iterators start at the given
entry.

The rht_for_each*continue() iterators are documented as though they
start at the 'next' entry, but they actually start at the given entry,
and they are used expecting that behaviour.
So fix the documentation and change the names to *from for consistency
with list.h.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 14:01:10 -07:00
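
The list.h precedent the rename follows, sketched (struct item and the
walker are illustrative; the iterators themselves are the real list.h ones):

    #include <linux/list.h>

    struct item {
            struct list_head node;
    };

    /* list_for_each_entry_continue() resumes at the entry *after* pos,
     * while list_for_each_entry_from() starts *at* pos itself.  The old
     * rht_for_each*continue() iterators behaved like the latter, hence
     * the rename to rht_for_each*from().
     */
    static void walk_from(struct item *pos, struct list_head *head)
    {
            list_for_each_entry_from(pos, head, node) {
                    /* 'pos' itself is visited on the first iteration */
            }
    }
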
NeilBrown
4feb7c7a4f rhashtable: don't hold lock on first table throughout insertion.
rhashtable_try_insert() currently holds a lock on the bucket in
the first table, while also locking buckets in subsequent tables.
This is unnecessary and looks like a hold-over from some earlier
version of the implementation.

As insert and remove always lock a bucket in each table in turn, and
as insert only inserts in the final table, there cannot be any races
that are not covered by simply locking a bucket in each table in turn.

When an insert call reaches that last table it can be sure that there
is no matching entry in any other table as it has searched them all, and
insertion never happens anywhere but in the last table.  The fact that
code tests for the existence of future_tbl while holding a lock on
the relevant bucket ensures that two threads inserting the same key
will make compatible decisions about which is the "last" table.

This simplifies the code and allows the ->rehash field to be
discarded.

We still need a way to ensure that a dead bucket_table is never
re-linked by rhashtable_walk_stop().  This can be achieved by calling
call_rcu() inside the locked region, and checking with
rcu_head_after_call_rcu() in rhashtable_walk_stop() to see if the
bucket table is empty and dead.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 14:01:10 -07:00
Yunsheng Lin
5f543a54ee net: hns3: fix for not calculating tx bd num correctly
When there is only one byte in a frag, the current calculation
using "(size + HNS3_MAX_BD_SIZE - 1) >> HNS3_MAX_BD_SIZE_OFFSET"
will return zero, because HNS3_MAX_BD_SIZE is 65535 and
HNS3_MAX_BD_SIZE_OFFSET is 16. So it will cause tx error when
a frag's size is one byte.

This patch fixes it by using DIV_ROUND_UP.

Fixes: 3fe13ed95d ("net: hns3: avoid mult + div op in critical data path")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:59:24 -07:00
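
The arithmetic from the commit message, spelled out (constants as quoted
above; the wrapper function is illustrative, DIV_ROUND_UP() is the standard
kernel macro):

    #include <linux/kernel.h>       /* DIV_ROUND_UP() */

    #define HNS3_MAX_BD_SIZE        65535U
    #define HNS3_MAX_BD_SIZE_OFFSET 16

    /* For size == 1:
     *   old: (1 + 65535 - 1) >> 16  ==  65535 >> 16  ==  0   (no BD!)
     *   new: DIV_ROUND_UP(1, 65535)                  ==  1
     * The shift trick only works for power-of-two divisors, and 65535
     * is not one.
     */
    static unsigned int tx_bd_count(unsigned int size)
    {
            return DIV_ROUND_UP(size, HNS3_MAX_BD_SIZE);
    }
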
Herbert Xu
408f13ef35 rhashtable: Still do rehash when we get EEXIST
As it stands, if a shrink is delayed because of an outstanding
rehash, we will go into a rescheduling loop without ever doing
the rehash.

This patch fixes this by still carrying out the rehash and then
rescheduling so that we can shrink after the completion of the
rehash should it still be necessary.

The return value of EEXIST captures this case and other cases
(e.g., another thread expanded/rehashed the table at the same
time) where we should still proceed with the rehash.

Fixes: da20420f83 ("rhashtable: Add nested tables")
Reported-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:57:28 -07:00
David S. Miller
83b038db25 Merge branch 'net-phy-Move-Omega-PHY-entry-to-Cygnus-PHY-driver'
Florian Fainelli says:

====================
net: phy: Move Omega PHY entry to Cygnus PHY driver

In order to pave the way for adding some specific Omega PHY features
that may not be desirable on other products covered by the bcm7xxx PHY
driver, split the Omega PHY entry into the Cygnus PHY driver such that
the PHY drivers are reflective of product lines/business units
maintaining them within Broadcom.

No functional changes intended.
====================

Acked-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:41:52 -07:00
Florian Fainelli
17cc982176 net: phy: Move Omega PHY entry to Cygnus PHY driver
Cygnus and Omega are part of the same business unit and product line, it
makes sense to group PHY entries by products such that a platform can
select only the drivers that it needs. Bring all the functionality that
the BCM7XXX_28NM_GPHY() macro hides for us and remove the Omega PHY
entry from bcm7xxx.c.

As an added bonus, we now have a proper mdio_device_id entry to permit
auto-loading.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:41:26 -07:00
Florian Fainelli
f878fe5685 net: phy: Prepare for moving Omega out of bcm7xxx
The Omega PHY entry was added to bcm7xxx.c out of convenience and this
breaks the one driver per product line paradigm that was applied up
until now. Since the AFE initialization is shared between Omega and
BCM7xxx move the relevant functions to bcm-phy-lib.[ch]. No functional
changes introduced.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:41:26 -07:00
Julian Wiedmann
02afc7ad45 net: dst: remove gc leftovers
Get rid of some obsolete gc-related documentation and macros that were
missed in commit 5b7c9a8ff8 ("net: remove dst gc related code").

CC: Wei Wang <weiwan@google.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:39:25 -07:00
Wang Hai
6b70fc94af net-sysfs: Fix memory leak in netdev_register_kobject
When registering struct net_device, it will call
	register_netdevice ->
		netdev_register_kobject ->
			device_initialize(dev);
			dev_set_name(dev, "%s", ndev->name)
			device_add(dev)
			register_queue_kobjects(ndev)

In netdev_register_kobject(), if device_add(dev) or
register_queue_kobjects(ndev) fails, register_netdevice()
returns an error, causing netdev_freemem(ndev) to be
called to free the net_device; however, put_device(&dev->dev)->..->
kobject_cleanup() won't be called, resulting in a memory leak.

syzkaller reported this:
BUG: memory leak
unreferenced object 0xffff8881f4fad168 (size 8):
comm "syz-executor.0", pid 3575, jiffies 4294778002 (age 20.134s)
hex dump (first 8 bytes):
  77 70 61 6e 30 00 ff ff                          wpan0...
backtrace:
  [<000000006d2d91d7>] kstrdup_const+0x3d/0x50 mm/util.c:73
  [<00000000ba9ff953>] kvasprintf_const+0x112/0x170 lib/kasprintf.c:48
  [<000000005555ec09>] kobject_set_name_vargs+0x55/0x130 lib/kobject.c:281
  [<0000000098d28ec3>] dev_set_name+0xbb/0xf0 drivers/base/core.c:1915
  [<00000000b7553017>] netdev_register_kobject+0xc0/0x410 net/core/net-sysfs.c:1727
  [<00000000c826a797>] register_netdevice+0xa51/0xeb0 net/core/dev.c:8711
  [<00000000857bfcfd>] cfg802154_update_iface_num.isra.2+0x13/0x90 [ieee802154]
  [<000000003126e453>] ieee802154_llsec_fill_key_id+0x1d5/0x570 [ieee802154]
  [<00000000e4b3df51>] 0xffffffffc1500e0e
  [<00000000b4319776>] platform_drv_probe+0xc6/0x180 drivers/base/platform.c:614
  [<0000000037669347>] really_probe+0x491/0x7c0 drivers/base/dd.c:509
  [<000000008fed8862>] driver_probe_device+0xdc/0x240 drivers/base/dd.c:671
  [<00000000baf52041>] device_driver_attach+0xf2/0x130 drivers/base/dd.c:945
  [<00000000c7cc8dec>] __driver_attach+0x10e/0x210 drivers/base/dd.c:1022
  [<0000000057a757c2>] bus_for_each_dev+0x154/0x1e0 drivers/base/bus.c:304
  [<000000005f5ae04b>] bus_add_driver+0x427/0x5e0 drivers/base/bus.c:645

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 1fa5ae857b ("driver core: get rid of struct device's bus_id string array")
Signed-off-by: Wang Hai <wanghai26@huawei.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:38:27 -07:00
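
A sketch of the error-handling pattern the report points at (heavily
simplified; netdev_register_kobject() does more than this):

    /* Once device_initialize()/dev_set_name() have run, the embedded
     * kobject owns allocations such as the duplicated name, so an error
     * path must drop that reference with put_device() -- which ends in
     * kobject_cleanup() -- rather than freeing the outer structure
     * directly.
     */
    error = device_add(dev);
    if (error) {
            put_device(dev);
            return error;
    }
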
David S. Miller
88f808f312 Merge branch 'net-broadcom-Remove-print-of-base-address'
Florian Fainelli says:

====================
net: broadcom: Remove print of base address

Some broadcom MDIO/switch/Ethernet MAC drivers insist on printing the
base register virtual address which has little value.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:32:35 -07:00
Florian Fainelli
62be757fbe net: systemport: Remove print of base address
Since commit ad67b74d24 ("printk: hash addresses printed with %p")
pointers are being hashed when printed. Displaying the virtual memory at
bootup time is not helpful, especially given we use a dev_info() which
already displays the platform device's address.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:32:35 -07:00
Florian Fainelli
fbb7bc45ea net: dsa: bcm_sf2: Remove print of base address
Since commit ad67b74d24 ("printk: hash addresses printed with %p")
pointers are being hashed when printed. Displaying the virtual memory at
bootup time is not helpful; we use a dev_info() print which already
displays the platform device's address.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:32:35 -07:00
Florian Fainelli
647aed232a net: phy: mdio-bcm-unimac: Remove print of base address
Since commit ad67b74d24 ("printk: hash addresses printed with %p")
pointers are being hashed when printed. Displaying the virtual memory at
bootup time is not helpful, especially given we use a dev_info() which
already displays the platform device's address.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:32:35 -07:00
David Ahern
10585b4342 ipv6: Remove fallback argument from ip6_hold_safe
net and null_fallback are redundant. Remove null_fallback in favor of
!net check.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:30:34 -07:00
David Ahern
9ab948a91b ipv4: Allow amount of dirty memory from fib resizing to be controllable
fib_trie implementation calls synchronize_rcu when a certain amount of
pages are dirty from freed entries. The number of pages was determined
experimentally in 2009 (commit c3059477fc).

At the current setting, synchronize_rcu is called often -- 51 times in a
second in one test with an average of an 8 msec delay adding a fib entry.
The total impact is a significant slowdown when modifying the fib. This is seen
in the output of 'time' - the difference between real time and sys+user.
For example, using 720,022 single path routes and 'ip -batch'[1]:

    $ time ./ip -batch ipv4/routes-1-hops
    real    0m14.214s
    user    0m2.513s
    sys     0m6.783s

So roughly 35% of the actual time to install the routes is from the ip
command getting scheduled out, most notably due to synchronize_rcu (this
is observed using 'perf sched timehist').

This patch makes the amount of dirty memory configurable from 64k, where
synchronize_rcu is called often (small, low-end systems that are memory
sensitive), to 64M, where synchronize_rcu is called rarely during a large
FIB change (for high-end systems with lots of memory). The default is 512kB
which corresponds to the current setting of 128 pages with a 4kB page size.

As an example, at 16MB the worst interval shows 4 calls to synchronize_rcu
in a second blocking for up to 30 msec in a single instance, and a total
of almost 100 msec across the 4 calls in the second. The trade-off is
allowing FIB entries to consume more memory in a given time window,
but with much better fib insertion rates (~30% increase in prefixes/sec).
With this patch and net.ipv4.fib_sync_mem set to 16MB, the same batch
file runs in:

    $ time ./ip -batch ipv4/routes-1-hops
    real    0m9.692s
    user    0m2.491s
    sys     0m6.769s

So the dead time is reduced to about 1/2 second or <5% of the real time.

[1] 'ip' modified to not request ACK messages which improves route
    insertion times by about 20%

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:29:53 -07:00
David S. Miller
1ea186e3ae Merge branch 'net-sched-validate-the-control-action-with-all-the-other-parameters'
Davide Caratti says:

====================
net/sched: validate the control action with all the other parameters

currently, the kernel checks for bad values of the control action in
tcf_action_init_1(), after a successful call to the action's init()
function. When the control action is 'goto chain', this causes two
undesired behaviors:

1. "misconfigured action after replace that causes kernel crash":
   if users replace a valid TC action with another one having an invalid
   control action, all the new configuration data (including the bad
   control action) is applied successfully, even if the kernel returned
   an error. As a consequence, it's possible to trigger a NULL pointer
   dereference in the traffic path of every TC action (1), replacing the
   control action with 'goto chain x', when chain <x> doesn't exist.

2. "refcount leak that makes kmemleak complain"
   when a valid 'goto chain' action is overwritten with another action,
   the kernel forgets to decrease refcounts in the chain.

The above problems can be fixed if we validate the control action in each
action's init() function, the same way as we are already doing for all the
other configuration parameters.
Now that chains can be released after an action is replaced, we need to
care about concurrent access of 'goto_chain' pointer: ensure we access it
through RCU, like we did with most action-specific configuration parameters.

- Patch 1 removes the wrong checks and provides functions that can be
  used to properly validate control actions in  individual actions
- Patch 2 to 16 fix individual actions, and add TDC selftest code to
  verify the correct behavior (2)
- Patch 17 and 18 fix concurrent access issues on 'goto_chain', that can be
  observed after the chain refcount leak is fixed.

Changes since v1:
- reword the cover letter
- condense the extack message in case tc_action_check_ctrlact() is called
  with invalid parameters.
- add tcf_action_set_ctrlact() to avoid code duplication and make the
  RCU-ification of 'goto_chain' easier.
- fix errors in act_ife, act_simple, act_skbedit, and avoid useless 'goto
  end' in act_connmark, thanks a lot to Vlad Buslov.
- avoid dereferencing 'goto_chain' in tcf_gact_goto_chain_index(), so
  we don't have to care about the grace period there.
- let actions respect the grace period when they release chains, thanks
  to Cong Wang and Vlad Buslov.

Changes since RFC:
- include a fix for all TC actions
- add a selftest for each TC action
- squash fix for refcount leaks into a single patch, the first in the
  series, thanks to Cong Wang
- ensure that chain refcount is released without tcfa_lock held, thanks
  to Vlad Buslov

Notes:
(1) act_ipt didn't need any fix, as the control action is constantly equal
    to TC_ACT_OK.
(2) the selftest for act_simple fails because userspace tc backend for
    'simple' does not parse the control action correctly (and hardcodes it
    to TC_ACT_PIPE).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:42 -07:00
Davide Caratti
ee3bbfe806 net/sched: let actions use RCU to access 'goto_chain'
use RCU when accessing the action chain, to avoid a use-after-free in the
traffic path when 'goto chain' is replaced on existing TC actions (see
script below). Since the control action is read in the traffic path
without holding the action spinlock, we need to explicitly ensure that
a->goto_chain is not NULL before dereferencing (i.e. it's not sufficient
to rely on the value of TC_ACT_GOTO_CHAIN bits). Not doing so caused NULL
dereferences in tcf_action_goto_chain_exec() when the following script:

 # tc chain add dev dd0 chain 42 ingress protocol ip flower \
 > ip_proto udp action pass index 4
 # tc filter add dev dd0 ingress protocol ip flower \
 > ip_proto udp action csum udp goto chain 42 index 66
 # tc chain del dev dd0 chain 42 ingress
 (start UDP traffic towards dd0)
 # tc action replace action csum udp pass index 66

was run repeatedly for several hours.

Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Suggested-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:42 -07:00
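
A sketch of the traffic-path access described above (the a->goto_chain
field is per the commit text; the surrounding function and the TC_ACT_SHOT
fallback are illustrative):

    const struct tcf_chain *chain;

    /* The softirq traffic path runs under rcu_read_lock_bh(); read the
     * chain pointer once and bail out if a concurrent replace already
     * cleared it, instead of trusting the TC_ACT_GOTO_CHAIN bits alone.
     */
    chain = rcu_dereference_bh(a->goto_chain);
    if (unlikely(!chain))
            return TC_ACT_SHOT;
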