Commit Graph

619976 Commits

Author SHA1 Message Date
David S. Miller
32986b554a Merge branch 'ovs-mpls'
Jiri Benc says:

====================
openvswitch: mpls fix and clean up

Convert to the new mpls skb layout the last remaining place in openvswitch,
forgotten on the mpls GSO rework. The GSO rework also allows for some
cleanup in the third patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03 02:00:29 -04:00
Jiri Benc
85de4a2101 openvswitch: use mpls_hdr
skb_mpls_header is equivalent to mpls_hdr now. Use the existing helper
instead.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03 02:00:22 -04:00
Jiri Benc
9095e10edd mpls: move mpls_hdr to a common location
This will be also used by openvswitch.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03 02:00:21 -04:00
Jiri Benc
f7d49bce8e openvswitch: mpls: set network header correctly on key extract
After the 48d2ab609b ("net: mpls: Fixups for GSO"), MPLS handling in
openvswitch was changed to have network header pointing to the start of the
MPLS headers and inner_network_header pointing after the MPLS headers.

However, key_extract was missed by the mentioned commit, causing incorrect
headers to be set when a MPLS packet just enters the bridge or after it is
recirculated.

Fixes: 48d2ab609b ("net: mpls: Fixups for GSO")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03 02:00:21 -04:00
Arnd Bergmann
ab58070569 mlxsw: spectrum_router: avoid potential uninitialized data usage
If fi->fib_nhs is zero, the router interface pointer is uninitialized, as shown by
this warning:

drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c: In function 'mlxsw_sp_router_fib_event':
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:1674:21: error: 'r' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:1643:23: note: 'r' was declared here

This changes the loop so we handle the case the same way as finding no router
interface pointer attached to one of the nexthops to ensure we always
trap here instead of using uninitialized data.

Fixes: b45f64d16d ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03 01:55:18 -04:00
Arnd Bergmann
d0debb76df net/mlx5e: shut up maybe-uninitialized warning
Build-testing this driver with -Wmaybe-uninitialized gives a new false-positive
warning that I can't really explain:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function 'mlx5e_configure_flower':
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:509:3: error: 'old_attr' may be used uninitialized in this function [-Werror=maybe-uninitialized]

It's obvious from the code that 'old_attr' is initialized whenever 'old'
is non-NULL here. The warning appears with all versions I tested from gcc-4.7
through gcc-6.1, and I could not come up with a way to rewrite the function
in a more readable way that avoids the warning, so I'm adding another
initialization to shut it up.

Fixes: 8b32580df1 ("net/mlx5e: Add TC vlan action for SRIOV offloads")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03 01:55:18 -04:00
Arnd Bergmann
7c70c4f8b2 cxgb4: unexport cxgb4_dcb_enabled
A recent cleanup marked cxgb4_dcb_enabled as 'static', which is correct, but this ignored
how the symbol is also exported. In addition, the export can be compiled out when modules
are disabled, causing a harmless compiler warning in configurations for which it is not
used at all:

drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:282:12: error: 'cxgb4_dcb_enabled' defined but not used [-Werror=unused-function]

This removes the export and moves the function into the correct #ifdef so we only build
it when there are users.

Fixes: 50935857f8 ("cxgb4: mark symbols static where possible")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03 01:33:15 -04:00
Arnd Bergmann
fa34cd94fb net: rtnl: avoid uninitialized data in IFLA_VF_VLAN_LIST handling
With the newly added support for IFLA_VF_VLAN_LIST netlink messages,
we get a warning about potential uninitialized variable use in
the parsing of the user input when enabling the -Wmaybe-uninitialized
warning:

net/core/rtnetlink.c: In function 'do_setvfinfo':
net/core/rtnetlink.c:1756:9: error: 'ivvl$' may be used uninitialized in this function [-Werror=maybe-uninitialized]

I have not been able to prove whether it is possible to arrive in
this code with an empty IFLA_VF_VLAN_LIST block, but if we do,
then ndo_set_vf_vlan gets called with uninitialized arguments.

This adds an explicit check for an empty list, making it obvious
to the reader and the compiler that this cannot happen.

Fixes: 79aab093a0 ("net: Update API for VF vlan protocol 802.1ad support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03 01:31:48 -04:00
Paolo Abeni
63d75463c9 net: pktgen: fix pkt_size
The commit 879c7220e8 ("net: pktgen: Observe needed_headroom
of the device") increased the 'pkt_overhead' field value by
LL_RESERVED_SPACE.
As a side effect the generated packet size, computed as:

	/* Eth + IPh + UDPh + mpls */
	datalen = pkt_dev->cur_pkt_size - 14 - 20 - 8 -
		  pkt_dev->pkt_overhead;

is decreased by the same value.
The above changed slightly the behavior of existing pktgen users,
and made the procfs interface somewhat inconsistent.
Fix it by restoring the previous pkt_overhead value and using
LL_RESERVED_SPACE as extralen in skb allocation.
Also, change pktgen_alloc_skb() to only partially reserve
the headroom to allow the caller to prefetch from ll header
start.

v1 -> v2:
 - fixed some typos in the comments

Fixes: 879c7220e8 ("net: pktgen: Observe needed_headroom of the device")
Suggested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03 01:29:57 -04:00
Gavin Schenk
b82d44d784 net: fec: set mac address unconditionally
If the mac address origin is not dt, you can only safely assign a mac
address after "link up" of the device. If the link is off the clocks are
disabled and because of issues assigning registers when clocks are off the
new mac address cannot be written in .ndo_set_mac_address() on some soc's.
This fix sets the mac address unconditionally in fec_restart(...) and
ensures consistency between fec registers and the network layer.

Signed-off-by: Gavin Schenk <g.schenk@eckelmann.de>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Fixes: 9638d19e48 ("net: fec: add netif status check before set mac address")
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03 01:27:41 -04:00
Baoyou Xie
3a82e78c13 net: ethernet: mediatek: mark symbols static where possible
We get 2 warnings when building kernel with W=1:
drivers/net/ethernet/mediatek/mtk_eth_soc.c:2041:5: warning: no previous prototype for 'mtk_get_link_ksettings' [-Wmissing-prototypes]
drivers/net/ethernet/mediatek/mtk_eth_soc.c:2052:5: warning: no previous prototype for 'mtk_set_link_ksettings' [-Wmissing-prototypes]

In fact, these functions are only used in the file in which they are
declared and don't need a declaration, but can be made static.
So this patch marks these functions with 'static'.

Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03 01:23:47 -04:00
Baoyou Xie
8efebd6e5e cxgb4: mark cxgb_setup_tc() static
We get 1 warning when building kernel with W=1:
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:2715:5: warning: no previous prototype for 'cxgb_setup_tc' [-Wmissing-prototypes]

In fact, this function is only used in the file in which it is
declared and don't need a declaration, but can be made static.
so this patch marks this function with 'static'.

Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03 01:19:29 -04:00
Maciej Żenczykowski
cb9e684e89 ipv6 addrconf: remove addrconf_sysctl_hop_limit()
This is an effective no-op in terms of user observable behaviour.

By preventing the overwrite of non-null extra1/extra2 fields
in addrconf_sysctl() we can enable the use of proc_dointvec_minmax().

This allows us to eliminate the constant min/max (1..255) trampoline
function that is addrconf_sysctl_hop_limit().

This is nice because it simplifies the code, and allows future
sysctls with constant min/max limits to also not require trampolines.

We still can't eliminate the trampoline for mtu because it isn't
actually a constant (it depends on other tunables of the device)
and thus requires at-write-time logic to enforce range.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Acked-by: Erik Kline <ek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-02 23:48:13 -04:00
Stefan Agner
d4ef9f7212 netfilter: bridge: clarify bridge/netfilter message
When using bridge without bridge netfilter enabled the message
displayed is rather confusing and leads to belive that a deprecated
feature is in use. Use IS_MODULE to be explicit that the message only
affects users which use bridge netfilter as module and reword the
message.

Signed-off-by: Stefan Agner <stefan@agner.ch>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-02 22:44:03 -04:00
David S. Miller
b50afd203a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Three sets of overlapping changes.  Nothing serious.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-02 22:20:41 -04:00
Linus Torvalds
c8d2bc9bc3 Linux 4.8 2016-10-02 16:24:33 -07:00
Linus Torvalds
f76d9c61d9 Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
 "Three relatively small fixes for ARM:

   - Roger noticed that dma_max_pfn() was calculating the upper limit
     wrongly, by adding the PFN offset of memory twice.

   - A fix from Robin to correct parsing of MPIDR values when the
     address size is larger than one BE32 unit.

   - A fix from Srinivas to ensure that we do not rely on the boot
     loader (or previous Linux kernel) setting the translation table
     base register a certain way in the decompressor, which can lead to
     crashes"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8618/1: decompressor: reset ttbcr fields to use TTBR0 on ARMv7
  ARM: 8617/1: dma: fix dma_max_pfn()
  ARM: 8616/1: dt: Respect property size when parsing CPUs
2016-10-02 15:23:00 -07:00
Srinivas Ramana
117e5e9c4c ARM: 8618/1: decompressor: reset ttbcr fields to use TTBR0 on ARMv7
If the bootloader uses the long descriptor format and jumps to
kernel decompressor code, TTBCR may not be in a right state.
Before enabling the MMU, it is required to clear the TTBCR.PD0
field to use TTBR0 for translation table walks.

The commit dbece45894 ("ARM: 7501/1: decompressor:
reset ttbcr for VMSA ARMv7 cores") does the reset of TTBCR.N, but
doesn't consider all the bits for the size of TTBCR.N.

Clear TTBCR.PD0 field and reset all the three bits of TTBCR.N to
indicate the use of TTBR0 and the correct base address width.

Fixes: dbece45894 ("ARM: 7501/1: decompressor: reset ttbcr for VMSA ARMv7 cores")
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Srinivas Ramana <sramana@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-10-02 20:05:14 +01:00
Linus Torvalds
be67d60ba9 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "The last regression fixes for 4.8 final:

   - Two patches addressing the fallout of the CR4 optimizations which
     caused CR4-less machines to fail.

   - Fix the VDSO build on big endian machines

   - Take care of FPU initialization if no CPUID is available otherwise
     task struct size ends up being zero

   - Fix up context tracking in case load_gs_index fails"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/64: Fix context tracking state warning when load_gs_index fails
  x86/boot: Initialize FPU and X86_FEATURE_ALWAYS even if we don't have CPUID
  x86/vdso: Fix building on big endian host
  x86/boot: Fix another __read_cr4() case on 486
  x86/init: Fix cr4_init_shadow() on CR4-less machines
2016-10-02 11:04:29 -07:00
Linus Torvalds
66188fb11a Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
 "Another round of fixes:

   - CM: Fix mips_cm_max_vp_width for non-MT kernels on MT systems
   - CPS: Avoid BUG() when offlining pre-r6 CPUs
   - DEC: Avoid gas warnings due to suspicious instruction scheduling by
     manually expanding assembler macros.
   - FTLB: Fix configuration by moving confiuguratoin after probing
   - FTLB: clear execution hazard after changing FTLB enable
   - Highmem: Fix detection of unsupported highmem with cache aliases
   - I6400: Don't touch FTLBP chicken bits
   - microMIPS: Fix BUILD_ROLLBACK_PROLOGUE
   - Malta: Fix IOCU disable switch read for MIPS64
   - Octeon: Fix probing of devices attached to GPIO lines
   - uprobes: Misc small fixes"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: CM: Fix mips_cm_max_vp_width for non-MT kernels on MT systems
  MIPS: Fix detection of unsupported highmem with cache aliases
  MIPS: Malta: Fix IOCU disable switch read for MIPS64
  MIPS: Fix BUILD_ROLLBACK_PROLOGUE for microMIPS
  MIPS: clear execution hazard after changing FTLB enable
  MIPS: Configure FTLB after probing TLB sizes from config4
  MIPS: Stop setting I6400 FTLBP
  MIPS: DEC: Avoid la pseudo-instruction in delay slots
  MIPS: Octeon: mark GPIO controller node not populated after IRQ init.
  MIPS: uprobes: fix use of uninitialised variable
  MIPS: uprobes: remove incorrect set_orig_insn
  MIPS: fix uretprobe implementation
  MIPS: smp-cps: Avoid BUG() when offlining pre-r6 CPUs
2016-10-02 10:53:38 -07:00
Linus Torvalds
0c7fc30f18 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:

 1) Fix section mismatches in some builds, from Paul Gortmaker.

 2) Need to count huge zero page mappings when doing TSB sizing, from
    Mike Kravetz.

 3) Fix handing of cpu_possible_mask when nr_cpus module option is
    specified, from Atish Patra.

 4) Don't allocate irq stacks until nr_irqs has been processed, also
    from Atish Patra.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix non-SMP build.
  sparc64: Fix irq stack bootmem allocation.
  sparc64: Fix cpu_possible_mask if nr_cpus is set
  sparc64 mm: Fix more TSB sizing issues
  sparc64: fix section mismatch in find_numa_latencies_for_group
2016-10-02 10:42:26 -07:00
Linus Torvalds
bb6bbc7ca2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix wrong TCP checksums on MTU probing when checksum offloading is
    disabled, from Douglas Caetano dos Santos.

 2) Fix qdisc backlog updates in qfq and sfb schedulers, from Cong Wang.

 3) Route lookup flow key protocol value is wrong in ip6gre_xmit_other(),
    fix from Lance Richardson.

 4) Scheduling while atomic in multicast routing code of ipv4 and ipv6,
    fix from Nikolay Aleksandrov.

 5) Fix packet alignment in fec driver, from Eric Nelson.

 6) Fix perf regression in sctp due to struct layout and cache misses,
    from Xin Long.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock
  sctp: change to check peer prsctp_capable when using prsctp polices
  sctp: remove prsctp_param from sctp_chunk
  sctp: move sent_count to the memory hole in sctp_chunk
  tg3: Avoid NULL pointer dereference in tg3_io_error_detected()
  act_ife: Fix false encoding
  act_ife: Fix external mac header on encode
  VSOCK: Don't dec ack backlog twice for rejected connections
  Revert "net: ethernet: bcmgenet: use phydev from struct net_device"
  net: fec: align IP header in hardware
  net: fec: remove QUIRK_HAS_RACC from i.mx27
  net: fec: remove QUIRK_HAS_RACC from i.mx25
  ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route
  ip6_gre: fix flowi6_proto value in ip6gre_xmit_other()
  tcp: fix a compile error in DBGUNDO()
  tcp: fix wrong checksum calculation on MTU probing
  sch_sfb: keep backlog updated with qlen
  sch_qfq: keep backlog updated with qlen
  can: dev: fix deadlock reported after bus-off
2016-10-02 10:36:41 -07:00
Paul Burton
6605d156bd MIPS: CM: Fix mips_cm_max_vp_width for non-MT kernels on MT systems
When discovering the number of VPEs per core, smp_num_siblings will be
incorrect for kernels built without support for the MIPS MultiThreading
(MT) ASE running on systems which implement said ASE. This leads to
accesses to VPEs in secondary cores being performed incorrectly since
mips_cm_vp_id calculates the wrong ID to write to the local "other"
registers. Fix this by examining the number of VPEs in the core as
reported by the CM.

This patch presumes that the number of VPEs will be the same in each
core of the system. As this path only applies to systems with CM version
2.5 or lower, and this property is true of all such known systems, this
is likely to be fine but is described in a comment for good measure.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14338/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-10-02 01:40:56 +02:00
Linus Torvalds
f51fdffad5 SCSI fixes on 20161001
One final fix before 4.8: there's a memory leak triggered by turning scsi mq
 off due to the fact that we assume on host release that the already running
 hosts weren't mq based because that's the state of the global flag (even
 though they were), so fix it by tracking this on a per host host basis.
 
 Signed-off-by: James Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJX72dVAAoJEAVr7HOZEZN47IQP/jwEcng9UUJ/OtxM+awKigDF
 3ySxC1giaYCh8wOreKhJZt+0145D2NDFLGtiW5/yTHFOTPynAEetesnZSXKFUk6e
 PSdHVs8EU+gBhmL6wEtV3K4zcM4crgY+mSYZHy6LgVzZqRu8xljnb+fmHL69mkHy
 FvxiKBFz4T/sULN7gEr37UN9SfjQpSP6MmyLl9Q4g2BFsEWPXWBkPD03qc59TGoC
 U15srgCxQLmKZYq5WXdH0eU4XViUZZFD8Y0b02c23R/ltisblBX5LYUSGQh8njaY
 ZEt7EcPtI0K47uadxizx47rULkiaKZsX1bnwRfCJKiR14TTj9neJ38mQjBuGBPPf
 w5zR33ljp9SyDPsNFNVSoF03gFfc13T6Q5TSoUQix4yOKzDuCyuV5Wcvamkh3FyO
 Zw57k1LsR+3v2aFP5OgTs4QXOZx+6WmztXWGSFs0JZRD99CNxfqRrkD1XSvVcsfh
 Mi+OQAbLh1iqmZ4AGmASYiJTA+Ef3N9+aF8LtdPEjcLFjJij1PZQydxx6x5a0or+
 E7dzo4ZGjVaPG6acyFRaw0AOJ7xbZeT5Ydt24Psm5Mj7YHNXIwapYJl/5tiD/iHy
 SqmecHJz5xc0aso+bzM+2OBCl7fzEYV2sua+8hmBw8XAItxrkGEMTAoG7Bk2Lees
 udOErciZ39/NfHFO7GOM
 =awzl
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fix from James Bottomley:
 "One final fix before 4.8.

  There was a memory leak triggered by turning scsi mq off due to the
  fact that we assume on host release that the already running hosts
  weren't mq based because that's the state of the global flag (even
  though they were).

  Fix it by tracking this on a per host host basis"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: Avoid that toggling use_blk_mq triggers a memory leak
2016-10-01 07:37:15 -07:00
Tyler Hicks
d6169b0206 net: Use ns_capable_noaudit() when determining net sysctl permissions
The capability check should not be audited since it is only being used
to determine the inode permissions. A failed check does not indicate a
violation of security policy but, when an LSM is enabled, a denial audit
message was being generated.

The denial audit message caused confusion for some application authors
because root-running Go applications always triggered the denial. To
prevent this confusion, the capability check in net_ctl_permissions() is
switched to the noaudit variant.

BugLink: https://launchpad.net/bugs/1465724

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
[dtor: reapplied after e79c6a4fc9 ("net: make net namespace sysctls
belong to container's owner") accidentally reverted the change.]
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-01 03:24:28 -04:00
Linus Torvalds
2161a2a644 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fix from Dmitry Torokhov:
 "One small change to make joydev (which is used by older games) to bind
  to devices that export Z axis but not X or Y (such as TRC rudder)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: joydev - recognize devices with Z axis as joysticks
2016-09-30 21:25:09 -07:00
Linus Torvalds
dbd8805b0a Merge branch 'akpm' (patches from Andrew)
Merge more fixes from Andrew Morton:
 "Three fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  include/linux/property.h: fix typo/compile error
  ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
  mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()
2016-09-30 15:51:10 -07:00
John Youn
37aa7271d9 include/linux/property.h: fix typo/compile error
This fixes commit d76eebfa17 ("include/linux/property.h: fix build
issues with gcc-4.4.4").

With that commit we get the following compile error when using the
PROPERTY_ENTRY_INTEGER_ARRAY macro.

 include/linux/property.h:201:39: error: `u32_data' undeclared (first
                 use in this function)
  PROPERTY_ENTRY_INTEGER_ARRAY(_name_, u32, _val_)
                                       ^
 include/linux/property.h:193:17: note: in definition of macro
                 `PROPERTY_ENTRY_INTEGER_ARRAY'
  { .pointer = { _type_##_data = _val_ } },  \
                 ^

This needs a '.' to reference the union member.  It seems this was just
overlooked here since it is done correctly in similar constructs in
other parts of the original commit.

This fix is in preparation of upcoming commits that will use this macro.

Fixes: commit d76eebfa17 ("include/linux/property.h: fix build issues with gcc-4.4.4")
Link: http://lkml.kernel.org/r/2de3b929290d88a723ed829a3e3cbd02044714df.1475114627.git.johnyoun@synopsys.com
Signed-off-by: John Youn <johnyoun@synopsys.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-30 15:26:52 -07:00
Eric Ren
c33f0785bf ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
The testcase "mmaptruncate" of ocfs2-test deadlocks occasionally.

In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it;
there are 2 process repeatedly performing the following operations
respectively: one is doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1, 'a',
1), while the another is playing ftruncate(fd, 2*CLUSTER_SIZE) and then
ftruncate(fd, CLUSTER_SIZE) again and again.

This is the backtrace when the deadlock happens:

   __wait_on_bit_lock+0x50/0xa0
   __lock_page+0xb7/0xc0
   ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2]
   ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2]
   do_page_mkwrite+0x66/0xc0
   handle_mm_fault+0x685/0x1350
   __do_page_fault+0x1d8/0x4d0
   trace_do_page_fault+0x37/0xf0
   do_async_page_fault+0x19/0x70
   async_page_fault+0x28/0x30

In ocfs2_write_begin_nolock(), we first grab the pages and then allocate
disk space for this write; ocfs2_try_to_free_truncate_log() will be
called if -ENOSPC is returned; if we're lucky to get enough clusters,
which is usually the case, we start over again.

But in ocfs2_free_write_ctxt() the target page isn't unlocked, so we
will deadlock when trying to grab the target page again.

Also, -ENOMEM might be returned in ocfs2_grab_pages_for_write().
Another deadlock will happen in __do_page_mkwrite() if
ocfs2_page_mkwrite() returns non-VM_FAULT_LOCKED, and along with a
locked target page.

These two errors fail on the same path, so fix them by unlocking the
target page manually before ocfs2_free_write_ctxt().

Jan Kara helps me clear out the JBD2 part, and suggest the hint for root
cause.

Changes since v1:
1. Also put ENOMEM error case into consideration.

Link: http://lkml.kernel.org/r/1474173902-32075-1-git-send-email-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: He Gang <ghe@suse.com>
Acked-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-30 15:26:52 -07:00
Johannes Weiner
22f2ac51b6 mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()
Antonio reports the following crash when using fuse under memory pressure:

  kernel BUG at /build/linux-a2WvEb/linux-4.4.0/mm/workingset.c:346!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: all of them
  CPU: 2 PID: 63 Comm: kswapd0 Not tainted 4.4.0-36-generic #55-Ubuntu
  Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
  task: ffff88040cae6040 ti: ffff880407488000 task.ti: ffff880407488000
  RIP: shadow_lru_isolate+0x181/0x190
  Call Trace:
    __list_lru_walk_one.isra.3+0x8f/0x130
    list_lru_walk_one+0x23/0x30
    scan_shadow_nodes+0x34/0x50
    shrink_slab.part.40+0x1ed/0x3d0
    shrink_zone+0x2ca/0x2e0
    kswapd+0x51e/0x990
    kthread+0xd8/0xf0
    ret_from_fork+0x3f/0x70

which corresponds to the following sanity check in the shadow node
tracking:

  BUG_ON(node->count & RADIX_TREE_COUNT_MASK);

The workingset code tracks radix tree nodes that exclusively contain
shadow entries of evicted pages in them, and this (somewhat obscure)
line checks whether there are real pages left that would interfere with
reclaim of the radix tree node under memory pressure.

While discussing ways how fuse might sneak pages into the radix tree
past the workingset code, Miklos pointed to replace_page_cache_page(),
and indeed there is a problem there: it properly accounts for the old
page being removed - __delete_from_page_cache() does that - but then
does a raw raw radix_tree_insert(), not accounting for the replacement
page.  Eventually the page count bits in node->count underflow while
leaving the node incorrectly linked to the shadow node LRU.

To address this, make sure replace_page_cache_page() uses the tracked
page insertion code, page_cache_tree_insert().  This fixes the page
accounting and makes sure page-containing nodes are properly unlinked
from the shadow node LRU again.

Also, make the sanity checks a bit less obscure by using the helpers for
checking the number of pages and shadows in a radix tree node.

Fixes: 449dd6984d ("mm: keep page cache radix tree nodes in check")
Link: http://lkml.kernel.org/r/20160919155822.29498-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Antonio SJ Musumeci <trapexit@spawn.link>
Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>	[3.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-30 15:26:52 -07:00
Javi Merino
9a2172a8d5 MAINTAINERS: Switch to kernel.org email address for Javi Merino
Change my email address to my kernel.org account instead of the ARM one.

Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-30 10:45:11 -07:00
Wanpeng Li
2fa5f04f85 x86/entry/64: Fix context tracking state warning when load_gs_index fails
This warning:

 WARNING: CPU: 0 PID: 3331 at arch/x86/entry/common.c:45 enter_from_user_mode+0x32/0x50
 CPU: 0 PID: 3331 Comm: ldt_gdt_64 Not tainted 4.8.0-rc7+ #13
 Call Trace:
  dump_stack+0x99/0xd0
  __warn+0xd1/0xf0
  warn_slowpath_null+0x1d/0x20
  enter_from_user_mode+0x32/0x50
  error_entry+0x6d/0xc0
  ? general_protection+0x12/0x30
  ? native_load_gs_index+0xd/0x20
  ? do_set_thread_area+0x19c/0x1f0
  SyS_set_thread_area+0x24/0x30
  do_int80_syscall_32+0x7c/0x220
  entry_INT80_compat+0x38/0x50

... can be reproduced by running the GS testcase of the ldt_gdt test unit in
the x86 selftests.

do_int80_syscall_32() will call enter_form_user_mode() to convert context
tracking state from user state to kernel state. The load_gs_index() call
can fail with user gsbase, gsbase will be fixed up and proceed if this
happen.

However, enter_from_user_mode() will be called again in the fixed up path
though it is context tracking kernel state currently.

This patch fixes it by just fixing up gsbase and telling lockdep that IRQs
are off once load_gs_index() failed with user gsbase.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1475197266-3440-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30 13:53:12 +02:00
Andy Lutomirski
05fb3c199b x86/boot: Initialize FPU and X86_FEATURE_ALWAYS even if we don't have CPUID
Otherwise arch_task_struct_size == 0 and we die.  While we're at it,
set X86_FEATURE_ALWAYS, too.

Reported-by: David Saggiorato <david@saggiorato.net>
Tested-by: David Saggiorato <david@saggiorato.net>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: aaeb5c01c5b ("x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86")
Link: http://lkml.kernel.org/r/8de723afbf0811071185039f9088733188b606c9.1475103911.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30 13:53:04 +02:00
Segher Boessenkool
e4aad64597 x86/vdso: Fix building on big endian host
We need to call GET_LE to read hdr->e_type.

Fixes: 57f90c3dfc ("x86/vdso: Error out if the vDSO isn't a valid DSO")
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: linux-next@vger.kernel.org
Link: http://lkml.kernel.org/r/20160929193442.GA16617@gate.crashing.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-30 12:37:40 +02:00
Andy Lutomirski
192d1dccbf x86/boot: Fix another __read_cr4() case on 486
The condition for reading CR4 was wrong: there are some CPUs with
CPUID but not CR4.  Rather than trying to make the condition exact,
use __read_cr4_safe().

Fixes: 18bc7bd523 ("x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly")
Reported-by: david@saggiorato.net
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Link: http://lkml.kernel.org/r/8c453a61c4f44ab6ff43c29780ba04835234d2e5.1475178369.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-30 12:37:40 +02:00
Calvin Owens
803783849f mlx5: Add ndo_poll_controller() implementation
This implements ndo_poll_controller in net_device_ops callbacks for mlx5,
which is necessary to use netconsole with this driver.

Acked-By: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 02:11:16 -04:00
Jakub Kicinski
6cd80b5547 nfp: bpf: zero extend 4 byte context loads
Set upper 32 bits of destination register to zeros after
load from the context structure.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 02:10:15 -04:00
Xin Long
1cceda7849 sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock
When sctp dumps all the ep->assocs, it needs to lock_sock first,
but now it locks sock in rcu_read_lock, and lock_sock may sleep,
which would break rcu_read_lock.

This patch is to get and hold one sock when traversing the list.
After that and get out of rcu_read_lock, lock and dump it. Then
it will traverse the list again to get the next one until all
sctp socks are dumped.

For sctp_diag_dump_one, it fixes this issue by holding asoc and
moving cb() out of rcu_read_lock in sctp_transport_lookup_process.

Fixes: 8f840e47f1 ("sctp: add the sctp_diag.c file")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 02:08:57 -04:00
David S. Miller
75b005b949 Merge branch 'sctp-fixes'
Xin Long says:

====================
sctp: a bunch of fixes for prsctp polices

This patchset is to fix 2 issues for prsctp polices:

  1. patch 1 and 2 fix "netperf-Throughput_Mbps -37.2% regression" issue
     when overloading the CPU.

  2. patch 3 fix "prsctp polices should check both sides' prsctp_capable,
     instead of only local side".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 02:07:10 -04:00
Xin Long
be4947bf46 sctp: change to check peer prsctp_capable when using prsctp polices
Now before using prsctp polices, sctp uses asoc->prsctp_enable to
check if prsctp is enabled. However asoc->prsctp_enable is set only
means local host support prsctp, sctp should not abandon packet if
peer host doesn't enable prsctp.

So this patch is to use asoc->peer.prsctp_capable to check if prsctp
is enabled on both side, instead of asoc->prsctp_enable, as asoc's
peer.prsctp_capable is set only when local and peer both enable prsctp.

Fixes: a6c2f79287 ("sctp: implement prsctp TTL policy")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 02:07:05 -04:00
Xin Long
0605483f6a sctp: remove prsctp_param from sctp_chunk
Now sctp uses chunk->prsctp_param to save the prsctp param for all the
prsctp polices, we didn't need to introduce prsctp_param to sctp_chunk.
We can just use chunk->sinfo.sinfo_timetolive for RTX and BUF polices,
and reuse msg->expires_at for TTL policy, as the prsctp polices and old
expires policy are mutual exclusive.

This patch is to remove prsctp_param from sctp_chunk, and reuse msg's
expires_at for TTL and chunk's sinfo.sinfo_timetolive for RTX and BUF
polices.

Note that sctp can't use chunk's sinfo.sinfo_timetolive for TTL policy,
as it needs a u64 variables to save the expires_at time.

This one also fixes the "netperf-Throughput_Mbps -37.2% regression"
issue.

Fixes: a6c2f79287 ("sctp: implement prsctp TTL policy")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 02:07:05 -04:00
Xin Long
73dca124cd sctp: move sent_count to the memory hole in sctp_chunk
Now pahole sctp_chunk, it has 2 memory holes:
   struct sctp_chunk {
	struct list_head           list;
	atomic_t                   refcnt;
	/* XXX 4 bytes hole, try to pack */
	...
	long unsigned int          prsctp_param;
	int                        sent_count;
	/* XXX 4 bytes hole, try to pack */

This patch is to move up sent_count to fill the 1st one and eliminate
the 2nd one.

It's not just another struct compaction, it also fixes the "netperf-
Throughput_Mbps -37.2% regression" issue when overloading the CPU.

Fixes: a6c2f79287 ("sctp: implement prsctp TTL policy")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 02:07:05 -04:00
David Decotigny
5038056e6b mlx4: remove unused fields
This also can address following UBSAN warnings:
[   36.640343] ================================================================================
[   36.648772] UBSAN: Undefined behaviour in drivers/net/ethernet/mellanox/mlx4/fw.c:857:26
[   36.656853] shift exponent 64 is too large for 32-bit type 'int'
[   36.663348] ================================================================================
[   36.671783] ================================================================================
[   36.680213] UBSAN: Undefined behaviour in drivers/net/ethernet/mellanox/mlx4/fw.c:861:27
[   36.688297] shift exponent 35 is too large for 32-bit type 'int'
[   36.694702] ================================================================================

Tested:
  reboot with UBSAN, no warning.

Signed-off-by: David Decotigny <decot@googlers.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 01:56:41 -04:00
Maciej Żenczykowski
bd11f0741f ipv6 addrconf: implement RFC7559 router solicitation backoff
This implements:
  https://tools.ietf.org/html/rfc7559

Backoff is performed according to RFC3315 section 14:
  https://tools.ietf.org/html/rfc3315#section-14

We allow setting /proc/sys/net/ipv6/conf/*/router_solicitations
to a negative value meaning an unlimited number of retransmits,
and we make this the new default (inline with the RFC).

We also add a new setting:
  /proc/sys/net/ipv6/conf/*/router_solicitation_max_interval
defaulting to 1 hour (per RFC recommendation).

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Acked-by: Erik Kline <ek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 01:54:28 -04:00
David S. Miller
bcdc6efabd Merge branch 'net_proc_perf'
Jia He says:

====================
Reduce cache miss for snmp_fold_field

In a PowerPc server with large cpu number(160), besides commit
a3a773726c ("net: Optimize snmp stat aggregation by walking all
the percpu data at once"), I watched several other snmp_fold_field
callsites which would cause high cache miss rate.

test source code:
================
My simple test case, which read from the procfs items endlessly:
/***********************************************************/
int main(int argc, char **argv)
{
        int i;
        int fd = -1 ;
        int rdsize = 0;
        char buf[LINELEN+1];

        buf[LINELEN] = 0;
        memset(buf,0,LINELEN);

        if(1 >= argc) {
                printf("file name empty\n");
                return -1;
        }

        fd = open(argv[1], O_RDWR, 0644);
        if(0 > fd){
                printf("open error\n");
                return -2;
        }

        for(i=0;i<0xffffffff;i++) {
                while(0 < (rdsize = read(fd,buf,LINELEN))){
                        //nothing here
                }

                lseek(fd, 0, SEEK_SET);
        }

        close(fd);
        return 0;
}
/**********************************************************/

compile and run:
================
gcc test.c -o test

perf stat -d -e cache-misses ./test /proc/net/snmp
perf stat -d -e cache-misses ./test /proc/net/snmp6
perf stat -d -e cache-misses ./test /proc/net/sctp/snmp
perf stat -d -e cache-misses ./test /proc/net/xfrm_stat

before the patch set:
====================
 Performance counter stats for 'system wide':

         355911097      cache-misses                                                 [40.08%]
        2356829300      L1-dcache-loads                                              [60.04%]
         355642645      L1-dcache-load-misses     #   15.09% of all L1-dcache hits   [60.02%]
         346544541      LLC-loads                                                    [59.97%]
            389763      LLC-load-misses           #    0.11% of all LL-cache hits    [40.02%]

       6.245162638 seconds time elapsed

After the patch set:
===================
 Performance counter stats for 'system wide':

         194992476      cache-misses                                                 [40.03%]
        6718051877      L1-dcache-loads                                              [60.07%]
         194871921      L1-dcache-load-misses     #    2.90% of all L1-dcache hits   [60.11%]
         187632232      LLC-loads                                                    [60.04%]
            464466      LLC-load-misses           #    0.25% of all LL-cache hits    [39.89%]

       6.868422769 seconds time elapsed
The cache-miss rate can be reduced from 15% to 2.9%

changelog
=========
v6:
- correct v5
v5:
- order local variables from longest to shortest line
v4:
- move memset into one block of if statement in snmp6_seq_show_item
- remove the changes in netstat_seq_show considerred the stack usage is too large
v3:
- introduce generic interface (suggested by Marcelo Ricardo Leitner)
- use max_t instead of self defined macro (suggested by David Miller)
v2:
- fix bug in udplite statistics.
- snmp_seq_show is split into 2 parts
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 01:50:57 -04:00
Jia He
6d4a741cbb net: Suppress the "Comparison to NULL could be written" warnings
This is to suppress the checkpatch.pl warning "Comparison to NULL
could be written". No functional changes here.

Signed-off-by: Jia He <hejianet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 01:50:45 -04:00
Jia He
aca05671d5 ipv6: Remove useless parameter in __snmp6_fill_statsdev
The parameter items(is always ICMP6_MIB_MAX) is useless for __snmp6_fill_statsdev

Signed-off-by: Jia He <hejianet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 01:50:45 -04:00
Jia He
07613873f1 proc: Reduce cache miss in xfrm_statistics_seq_show
This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to
aggregate the data by going through all the items of each cpu sequentially.

Signed-off-by: Jia He <hejianet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 01:50:45 -04:00
Jia He
7d64a94be2 proc: Reduce cache miss in sctp_snmp_seq_show
This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to
aggregate the data by going through all the items of each cpu sequentially.

Signed-off-by: Jia He <hejianet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 01:50:44 -04:00
Jia He
4a4857b1c8 proc: Reduce cache miss in snmp6_seq_show
This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to
aggregate the data by going through all the items of each cpu sequentially.

Signed-off-by: Jia He <hejianet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 01:50:44 -04:00