Merge updates from Andrew Morton:
- a few misc bits
- ocfs2 updates
- almost all of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits)
memory hotplug: fix comments when adding section
mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
mm: simplify nodemask printing
mm,oom_reaper: remove pointless kthread_run() error check
mm/page_ext.c: check if page_ext is not prepared
writeback: remove unused function parameter
mm: do not rely on preempt_count in print_vma_addr
mm, sparse: do not swamp log with huge vmemmap allocation failures
mm/hmm: remove redundant variable align_end
mm/list_lru.c: mark expected switch fall-through
mm/shmem.c: mark expected switch fall-through
mm/page_alloc.c: broken deferred calculation
mm: don't warn about allocations which stall for too long
fs: fuse: account fuse_inode slab memory as reclaimable
mm, page_alloc: fix potential false positive in __zone_watermark_ok
mm: mlock: remove lru_add_drain_all()
mm, sysctl: make NUMA stats configurable
shmem: convert shmem_init_inodecache() to void
Unify migrate_pages and move_pages access checks
mm, pagevec: rename pagevec drained field
...
As the page free path makes no distinction between cache hot and cold
pages, there is no real useful ordering of pages in the free list that
allocation requests can take advantage of. Juding from the users of
__GFP_COLD, it is likely that a number of them are the result of copying
other sites instead of actually measuring the impact. Remove the
__GFP_COLD parameter which simplifies a number of paths in the page
allocator.
This is potentially controversial but bear in mind that the size of the
per-cpu pagelists versus modern cache sizes means that the whole per-cpu
list can often fit in the L3 cache. Hence, there is only a potential
benefit for microbenchmarks that alloc/free pages in a tight loop. It's
even worse when THP is taken into account which has little or no chance
of getting a cache-hot page as the per-cpu list is bypassed and the
zeroing of multiple pages will thrash the cache anyway.
The truncate microbenchmarks are not shown as this patch affects the
allocation path and not the free path. A page fault microbenchmark was
tested but it showed no sigificant difference which is not surprising
given that the __GFP_COLD branches are a miniscule percentage of the
fault path.
Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Add iWARP support to qedr driver
- Lots of misc fixes across subsystem
- Multiple update series to hns roce driver
- Multiple update series to hfi1 driver
- Updates to vnic driver
- Add kref to wait struct in cxgb4 driver
- Updates to i40iw driver
- Mellanox shared pull request
- timer_setup changes
- massive cleanup series from Bart Van Assche
- Two series of SRP/SRPT changes from Bart Van Assche
- Core updates from Mellanox
- i40iw updates
- IPoIB updates
- mlx5 updates
- mlx4 updates
- hns updates
- bnxt_re fixes
- PCI write padding support
- Sparse/Smatch/warning cleanups/fixes
- CQ moderation support
- SRQ support in vmw_pvrdma
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaDF9JAAoJELgmozMOVy/dDXUP/i92g+G4OJ+4hHMh4KCjQMHT
eMr/w9l1C033HrtsU1afPhqHOsKSxwCuJSiTgN4uXIm67/2kPK5Vlx+ir7mbOLwB
3ukVK6Q/aFdigWCUhIaJSlDpjbd2sEj7JwKtM3rucvMWJlBJ4mAbcVQVfU96CCsv
V9mO7dpR3QtYWDId9DukfnAfPUPFa3SMZnD7tdl6mKNRg/MjWGYLAL4nJoBfex5f
b4o+MTrbuFWXYsfDru1m9BpHgyul20ldfcnbe8C/sVOQmOgkX7ngD5Sdi1FLeRJP
GF/DnAqInC9N7cAxZHx4kH9x6mLMmEdfnwQ9VTVqGUHBsj3H4hQTVIAFfHUhWUbG
TP5ZHgZG2CewZ0rf092cWlDZwp6n0BalnbQJr+QN4MzPmYbofs3AccSKUwrle+e+
E6yYf4XxJdt7wRr4F1QKygtUEXSnNkNYUDQ4ZFbpJS/D4Sq80R1ZV/WZ7PJxm1D/
EIKoi7NU9cbPMIlbCzn8kzgfjS7Pe4p0WW/Xxc/IYmACzpwNPkZuFGSND79ksIpF
jhHqwZsOWFuXISjvcR4loc8wW6a5w5vjOiX0lLVz0NSdXSzVqav/2at7ZLDx/PT+
Lh9YVL51akA3hiD+3X6iOhfOUu6kskjT9HijE5T8rJnf0V+C6AtIRpwrQ7ONmjJm
3JMrjjLxtCIvpUyzCvDW
=A1oL
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"This is a fairly plain pull request. Lots of driver updates across the
stack, a huge number of static analysis cleanups including a close to
50 patch series from Bart Van Assche, and a number of new features
inside the stack such as general CQ moderation support.
Nothing really stands out, but there might be a few conflicts as you
take things in. In particular, the cleanups touched some of the same
lines as the new timer_setup changes.
Everything in this pull request has been through 0day and at least two
days of linux-next (since Stephen doesn't necessarily flag new
errors/warnings until day2). A few more items (about 30 patches) from
Intel and Mellanox showed up on the list on Tuesday. I've excluded
those from this pull request, and I'm sure some of them qualify as
fixes suitable to send any time, but I still have to review them
fully. If they contain mostly fixes and little or no new development,
then I will probably send them through by the end of the week just to
get them out of the way.
There was a break in my acceptance of patches which coincides with the
computer problems I had, and then when I got things mostly back under
control I had a backlog of patches to process, which I did mostly last
Friday and Monday. So there is a larger number of patches processed in
that timeframe than I was striving for.
Summary:
- Add iWARP support to qedr driver
- Lots of misc fixes across subsystem
- Multiple update series to hns roce driver
- Multiple update series to hfi1 driver
- Updates to vnic driver
- Add kref to wait struct in cxgb4 driver
- Updates to i40iw driver
- Mellanox shared pull request
- timer_setup changes
- massive cleanup series from Bart Van Assche
- Two series of SRP/SRPT changes from Bart Van Assche
- Core updates from Mellanox
- i40iw updates
- IPoIB updates
- mlx5 updates
- mlx4 updates
- hns updates
- bnxt_re fixes
- PCI write padding support
- Sparse/Smatch/warning cleanups/fixes
- CQ moderation support
- SRQ support in vmw_pvrdma"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (296 commits)
RDMA/core: Rename kernel modify_cq to better describe its usage
IB/mlx5: Add CQ moderation capability to query_device
IB/mlx4: Add CQ moderation capability to query_device
IB/uverbs: Add CQ moderation capability to query_device
IB/mlx5: Exposing modify CQ callback to uverbs layer
IB/mlx4: Exposing modify CQ callback to uverbs layer
IB/uverbs: Allow CQ moderation with modify CQ
iw_cxgb4: atomically flush the qp
iw_cxgb4: only call the cq comp_handler when the cq is armed
iw_cxgb4: Fix possible circular dependency locking warning
RDMA/bnxt_re: report vlan_id and sl in qp1 recv completion
IB/core: Only maintain real QPs in the security lists
IB/ocrdma_hw: remove unnecessary code in ocrdma_mbx_dealloc_lkey
RDMA/core: Make function rdma_copy_addr return void
RDMA/vmw_pvrdma: Add shared receive queue support
RDMA/core: avoid uninitialized variable warning in create_udata
RDMA/bnxt_re: synchronize poll_cq and req_notify_cq verbs
RDMA/bnxt_re: Flush CQ notification Work Queue before destroying QP
RDMA/bnxt_re: Set QP state in case of response completion errors
RDMA/bnxt_re: Add memory barriers when processing CQ/EQ entries
...
Pull networking updates from David Miller:
"Highlights:
1) Maintain the TCP retransmit queue using an rbtree, with 1GB
windows at 100Gb this really has become necessary. From Eric
Dumazet.
2) Multi-program support for cgroup+bpf, from Alexei Starovoitov.
3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew
Lunn.
4) Add meter action support to openvswitch, from Andy Zhou.
5) Add a data meta pointer for BPF accessible packets, from Daniel
Borkmann.
6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet.
7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli.
8) More work to move the RTNL mutex down, from Florian Westphal.
9) Add 'bpftool' utility, to help with bpf program introspection.
From Jakub Kicinski.
10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper
Dangaard Brouer.
11) Support 'blocks' of transformations in the packet scheduler which
can span multiple network devices, from Jiri Pirko.
12) TC flower offload support in cxgb4, from Kumar Sanghvi.
13) Priority based stream scheduler for SCTP, from Marcelo Ricardo
Leitner.
14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg.
15) Add RED qdisc offloadability, and use it in mlxsw driver. From
Nogah Frankel.
16) eBPF based device controller for cgroup v2, from Roman Gushchin.
17) Add some fundamental tracepoints for TCP, from Song Liu.
18) Remove garbage collection from ipv6 route layer, this is a
significant accomplishment. From Wei Wang.
19) Add multicast route offload support to mlxsw, from Yotam Gigi"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits)
tcp: highest_sack fix
geneve: fix fill_info when link down
bpf: fix lockdep splat
net: cdc_ncm: GetNtbFormat endian fix
openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start
netem: remove unnecessary 64 bit modulus
netem: use 64 bit divide by rate
tcp: Namespace-ify sysctl_tcp_default_congestion_control
net: Protect iterations over net::fib_notifier_ops in fib_seq_sum()
ipv6: set all.accept_dad to 0 by default
uapi: fix linux/tls.h userspace compilation error
usbnet: ipheth: prevent TX queue timeouts when device not ready
vhost_net: conditionally enable tx polling
uapi: fix linux/rxrpc.h userspace compilation errors
net: stmmac: fix LPI transitioning for dwmac4
atm: horizon: Fix irq release error
net-sysfs: trigger netlink notification on ifalias change via sysfs
openvswitch: Using kfree_rcu() to simplify the code
openvswitch: Make local function ovs_nsh_key_attr_size() static
openvswitch: Fix return value check in ovs_meter_cmd_features()
...
Pull crypto updates from Herbert Xu:
"Here is the crypto update for 4.15:
API:
- Disambiguate EBUSY when queueing crypto request by adding ENOSPC.
This change touches code outside the crypto API.
- Reset settings when empty string is written to rng_current.
Algorithms:
- Add OSCCA SM3 secure hash.
Drivers:
- Remove old mv_cesa driver (replaced by marvell/cesa).
- Enable rfc3686/ecb/cfb/ofb AES in crypto4xx.
- Add ccm/gcm AES in crypto4xx.
- Add support for BCM7278 in iproc-rng200.
- Add hash support on Exynos in s5p-sss.
- Fix fallback-induced error in vmx.
- Fix output IV in atmel-aes.
- Fix empty GCM hash in mediatek.
Others:
- Fix DoS potential in lib/mpi.
- Fix potential out-of-order issues with padata"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (162 commits)
lib/mpi: call cond_resched() from mpi_powm() loop
crypto: stm32/hash - Fix return issue on update
crypto: dh - Remove pointless checks for NULL 'p' and 'g'
crypto: qat - Clean up error handling in qat_dh_set_secret()
crypto: dh - Don't permit 'key' or 'g' size longer than 'p'
crypto: dh - Don't permit 'p' to be 0
crypto: dh - Fix double free of ctx->p
hwrng: iproc-rng200 - Add support for BCM7278
dt-bindings: rng: Document BCM7278 RNG200 compatible
crypto: chcr - Replace _manual_ swap with swap macro
crypto: marvell - Add a NULL entry at the end of mv_cesa_plat_id_table[]
hwrng: virtio - Virtio RNG devices need to be re-registered after suspend/resume
crypto: atmel - remove empty functions
crypto: ecdh - remove empty exit()
MAINTAINERS: update maintainer for qat
crypto: caam - remove unused param of ctx_map_to_sec4_sg()
crypto: caam - remove unneeded edesc zeroization
crypto: atmel-aes - Reset the controller before each use
crypto: atmel-aes - properly set IV after {en,de}crypt
hwrng: core - Reset user selected rng by writing "" to rng_current
...
The LPI transitioning logic in stmmac_main uses
priv->tx_path_in_lpi_mode to enter/exit LPI.
However, priv->tx_path_in_lpi_mode is assigned
using the return value from host_irq_status().
So for dwmac4, priv->tx_path_in_lpi_mode was always false,
so stmmac_tx_clean() would always try to put us in eee mode,
and stmmac_xmit() would never take us out of eee mode.
To fix this, make host_irq_status() read and return the LPI
irq status also for dwmac4.
This also increments the existing LPI counters, so that
ethtool --statistics shows LPI transitions also for dwmac4.
For dwmac1000, irqs are enabled/disabled using the register
named "Interrupt Mask Register", and thus setting a bit disables
that specific irq.
For dwmac4 the matching register is named "MAC_Interrupt_Enable",
and thus setting a bit enables that specific irq.
Looking at dwmac1000_core.c, the irqs that are always enabled are:
LPI and PMT.
Looking at dwmac4_core.c, the irqs that are always enabled are:
PMT.
To be able to read the LPI irq status, we need to enable the LPI
irq also for dwmac4.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We accidentally return success if lio_vf_rep_modinit() fails instead of
propogating the error code.
Fixes: e20f469660 ("liquidio: synchronize VF representor names with NIC firmware")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements and enables VDP support for the ibmvnic driver.
Moreover, it includes the implementation of suitable structs, signal
transmission/handling and functions which allows the retrival of firmware
information from the ibmvnic card through the ethtool command.
Signed-off-by: Desnes A. Nunes do Rosario <desnesn@linux.vnet.ibm.com>
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mvneta controller provides a 8-bit register to update the pending
Tx descriptor counter. Then, a maximum of 255 Tx descriptors can be
added at once. In the current code the mvneta_txq_pend_desc_add function
assumes the caller takes care of this limit. But it is not the case. In
some situations (xmit_more flag), more than 255 descriptors are added.
When this happens, the Tx descriptor counter register is updated with a
wrong value, which breaks the whole Tx queue management.
This patch fixes the issue by allowing the mvneta_txq_pend_desc_add
function to process more than 255 Tx descriptors.
Fixes: 2a90f7e1d5 ("net: mvneta: add xmit_more support")
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch migrates the HNS3 driver code from use of depricated PCI
MSI/MSI-X interrupt vector allocation/free APIs to new common APIs.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 4a3c67a6e7 ("mlxsw: spectrum_router: Don't batch neighbour
deletion") I removed the support for batch deletion of neighbours on a
router interface (RIF) since at that time the firmware did not support
it for IPv6 neighbours.
This is now supported by the version enforced by the driver, so there is
no reason to delete neighbours one by one anymore.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When "NETDEV WATCHDOG: em4 (bnx2x): transmit queue 2 timed out" occurs,
BNX2X_SP_RTNL_TX_TIMEOUT is set. In the function bnx2x_sp_rtnl_task,
bnx2x_nic_unload and bnx2x_nic_load are executed to shutdown and open
NIC. In the function bnx2x_nic_load, bnx2x_alloc_mem allocates dma
failure. The message "bnx2x: [bnx2x_alloc_mem:8399(em4)]Can't
allocate memory" pops out. The variable slowpath is set to NULL.
When shutdown the NIC, the function bnx2x_nic_unload is called. In
the function bnx2x_nic_unload, the following functions are executed.
bnx2x_chip_cleanup
bnx2x_set_storm_rx_mode
bnx2x_set_q_rx_mode
bnx2x_set_q_rx_mode
bnx2x_config_rx_mode
bnx2x_set_rx_mode_e2
In the function bnx2x_set_rx_mode_e2, the variable slowpath is operated.
Then the crash occurs.
To fix this crash, the variable slowpath is checked. And in the function
bnx2x_sp_rtnl_task, after dma memory allocation fails, another shutdown
and open NIC is executed.
CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Ariel Elior <aelior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull timer updates from Thomas Gleixner:
"Yet another big pile of changes:
- More year 2038 work from Arnd slowly reaching the point where we
need to think about the syscalls themself.
- A new timer function which allows to conditionally (re)arm a timer
only when it's either not running or the new expiry time is sooner
than the armed expiry time. This allows to use a single timer for
multiple timeout requirements w/o caring about the first expiry
time at the call site.
- A new NMI safe accessor to clock real time for the printk timestamp
work. Can be used by tracing, perf as well if required.
- A large number of timer setup conversions from Kees which got
collected here because either maintainers requested so or they
simply got ignored. As Kees pointed out already there are a few
trivial merge conflicts and some redundant commits which was
unavoidable due to the size of this conversion effort.
- Avoid a redundant iteration in the timer wheel softirq processing.
- Provide a mechanism to treat RTC implementations depending on their
hardware properties, i.e. don't inflict the write at the 0.5
seconds boundary which originates from the PC CMOS RTC to all RTCs.
No functional change as drivers need to be updated separately.
- The usual small updates to core code clocksource drivers. Nothing
really exciting"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits)
timers: Add a function to start/reduce a timer
pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday()
timer: Prepare to change all DEFINE_TIMER() callbacks
netfilter: ipvs: Convert timers to use timer_setup()
scsi: qla2xxx: Convert timers to use timer_setup()
block/aoe: discover_timer: Convert timers to use timer_setup()
ide: Convert timers to use timer_setup()
drbd: Convert timers to use timer_setup()
mailbox: Convert timers to use timer_setup()
crypto: Convert timers to use timer_setup()
drivers/pcmcia: omap1: Fix error in automated timer conversion
ARM: footbridge: Fix typo in timer conversion
drivers/sgi-xp: Convert timers to use timer_setup()
drivers/pcmcia: Convert timers to use timer_setup()
drivers/memstick: Convert timers to use timer_setup()
drivers/macintosh: Convert timers to use timer_setup()
hwrng/xgene-rng: Convert timers to use timer_setup()
auxdisplay: Convert timers to use timer_setup()
sparc/led: Convert timers to use timer_setup()
mips: ip22/32: Convert timers to use timer_setup()
...
Pull scheduler updates from Ingo Molnar:
"The main updates in this cycle were:
- Group balancing enhancements and cleanups (Brendan Jackman)
- Move CPU isolation related functionality into its separate
kernel/sched/isolation.c file, with related 'housekeeping_*()'
namespace and nomenclature et al. (Frederic Weisbecker)
- Improve the interactive/cpu-intense fairness calculation (Josef
Bacik)
- Improve the PELT code and related cleanups (Peter Zijlstra)
- Improve the logic of pick_next_task_fair() (Uladzislau Rezki)
- Improve the RT IPI based balancing logic (Steven Rostedt)
- Various micro-optimizations:
- better !CONFIG_SCHED_DEBUG optimizations (Patrick Bellasi)
- better idle loop (Cheng Jian)
- ... plus misc fixes, cleanups and updates"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds
sched/sysctl: Fix attributes of some extern declarations
sched/isolation: Document isolcpus= boot parameter flags, mark it deprecated
sched/isolation: Add basic isolcpus flags
sched/isolation: Move isolcpus= handling to the housekeeping code
sched/isolation: Handle the nohz_full= parameter
sched/isolation: Introduce housekeeping flags
sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL
sched/isolation: Rename is_housekeeping_cpu() to housekeeping_cpu()
sched/isolation: Use its own static key
sched/isolation: Make the housekeeping cpumask private
sched/isolation: Provide a dynamic off-case to housekeeping_any_cpu()
sched/isolation, watchdog: Use housekeeping_cpumask() instead of ad-hoc version
sched/isolation: Move housekeeping related code to its own file
sched/idle: Micro-optimize the idle loop
sched/isolcpus: Fix "isolcpus=" boot parameter handling when !CONFIG_CPUMASK_OFFSTACK
x86/tsc: Append the 'tsc=' description for the 'tsc=unstable' boot parameter
sched/rt: Simplify the IPI based RT balancing logic
block/ioprio: Use a helper to check for RT prio
sched/rt: Add a helper to test for a RT task
...
Pull core locking updates from Ingo Molnar:
"The main changes in this cycle are:
- Another attempt at enabling cross-release lockdep dependency
tracking (automatically part of CONFIG_PROVE_LOCKING=y), this time
with better performance and fewer false positives. (Byungchul Park)
- Introduce lockdep_assert_irqs_enabled()/disabled() and convert
open-coded equivalents to lockdep variants. (Frederic Weisbecker)
- Add down_read_killable() and use it in the VFS's iterate_dir()
method. (Kirill Tkhai)
- Convert remaining uses of ACCESS_ONCE() to
READ_ONCE()/WRITE_ONCE(). Most of the conversion was Coccinelle
driven. (Mark Rutland, Paul E. McKenney)
- Get rid of lockless_dereference(), by strengthening Alpha atomics,
strengthening READ_ONCE() with smp_read_barrier_depends() and thus
being able to convert users of lockless_dereference() to
READ_ONCE(). (Will Deacon)
- Various micro-optimizations:
- better PV qspinlocks (Waiman Long),
- better x86 barriers (Michael S. Tsirkin)
- better x86 refcounts (Kees Cook)
- ... plus other fixes and enhancements. (Borislav Petkov, Juergen
Gross, Miguel Bernal Marin)"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE
rcu: Use lockdep to assert IRQs are disabled/enabled
netpoll: Use lockdep to assert IRQs are disabled/enabled
timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabled
sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled
irq_work: Use lockdep to assert IRQs are disabled/enabled
irq/timings: Use lockdep to assert IRQs are disabled/enabled
perf/core: Use lockdep to assert IRQs are disabled/enabled
x86: Use lockdep to assert IRQs are disabled/enabled
smp/core: Use lockdep to assert IRQs are disabled/enabled
timers/hrtimer: Use lockdep to assert IRQs are disabled/enabled
timers/nohz: Use lockdep to assert IRQs are disabled/enabled
workqueue: Use lockdep to assert IRQs are disabled/enabled
irq/softirqs: Use lockdep to assert IRQs are disabled/enabled
locking/lockdep: Add IRQs disabled/enabled assertion APIs: lockdep_assert_irqs_enabled()/disabled()
locking/pvqspinlock: Implement hybrid PV queued/unfair locks
locking/rwlocks: Fix comments
x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized
block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion()
workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes
...
In xmit process, the variables are set many times. In fact,
it is enough for these variables to be set once.
After a long time test, the throughput performance is better
than before.
CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since Mellanox focus is on newer adapters, we would like to have the
ability to disable the support for old gen2 adapters.
This can be done by turning off the MLX4_CORE_GEN2 Kconfig flag.
We keep it turned on by default.
Signed-off-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable giga_ctrl is being assigned to zero however this is
never read and hence the assignment is redundant, so remove it.
Cleans up clang warning:
drivers/net/ethernet/realtek/r8169.c:1978:3: warning: Value stored
to 'giga_ctrl' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for enabling Broadcom tags with b53, pad packets to a
minimum size of 64 bytes (sans FCS) in order for the Broadcom switch to
accept ingressing frames. Without this, we would typically be able to
DHCP, but not resolve with ARP because packets are too small and get
rejected by the switch.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes an error in memory allocation/freeing in
ThunderX PF driver.
I moved the allocation to the probe() function and made it managed.
>From the Colin's email:
While running static analysis on linux-next with CoverityScan I found 3
double free errors in the Cavium thunder driver.
The issue occurs on the err_disable_device: label of function nic_probe
when nic_free_lmacmem(nic) is called and a double free occurs on
nic->duplex, nic->link and nic->speed. This occurs when nic_init_hw()
fails:
/* Initialize hardware */
err = nic_init_hw(nic);
if (err)
goto err_release_regions;
nic_init_hw() calls nic_get_hw_info() and this calls nic_free_lmacmem()
if any of the allocations fail. This free'ing occurs again by the call
to nic_free_lmacmem() on the err_release_regions exit path in nic_probe().
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Variable start is assigned but never read hence it is redundant
and can be removed. Cleans up clang warning:
drivers/net/ethernet/sfc/ptp.c:655:2: warning: Value stored to 'start'
is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The assignment to mbcp is identical to the initiatialized value assigned
to mbcp at declaration time a few lines earlier, hence we can remove the
second redundant assignment. Cleans up clang warning:
drivers/net/ethernet/qlogic/qlge/qlge_mpi.c:209:22: warning:
Value stored to 'mbcp' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 114888
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 114891
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 1397960
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 1397972
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the login buffer to include client data for the vnic driver,
this includes the OS name, LPAR name, and device name. This update
allows this information to be available in the VIOS.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We add the call of_node_put(bp->phy_node) to all associated error
paths for memory clean up.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We add the call of_phy_deregister_fixed_link to all associated
error paths for memory clean up.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GOP statistics from all ports of one instance of the driver are gathered
with one work recalled in loop in a workqueue. The loop is started when
a port is up, and stopped when a port is down. This last condition is
obviously wrong.
Fix this by having a work per port. This way, starting and stoping it
when the port is up or down will be fine, while minimizing unnecessary
CPU usage.
Fixes: 118d6298f6 ("net: mvpp2: add ethtool GOP statistics")
Reported-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: Miquel Raynal <miquel.raynal@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When checking whether auto-negotiation is on, driver only needs to
check the value of mac.autoneg(SW) directly, and does not need to
query it from hardware. Because this value is always synchronized
with the auto-negotiation state of hardware.
This patch removes mac auto-negotiation state query in
hclge_update_speed_duplex().
Fixes: 46a3df9f97 (net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support)
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver gets phy address from NCL_config file and uses the phy address
to initialize phydev. There are 5 bits for phy address. And C22 phy
address has 5 bits. So 0-31 are all valid address for phy. If there
is no phy, it will crash. Because driver always get a valid phy address.
This patch fixes the phy address to 8 bits, and use 0xff to indicate
invalid phy address.
Fixes: 46a3df9f97 (net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support)
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is to prevent the case of working with a single MPWQE
(1 WQE is always reserved as RQ is linked-list).
When the WQE is fully consumed, HW should still have available buffer
in order not to drop packets.
Fixes: 461017cb00 ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently, when dma mapping fails, put_page is called,
but the page is not set to null. Later, in the page_reuse treatment in
mlx5e_free_rx_descs(), mlx5e_page_release() is called for the second time,
improperly doing dma_unmap (for a non-mapped address) and an extra put_page.
Prevent this by nullifying the page pointer when dma_map fails.
Fixes: accd588332 ("net/mlx5e: Introduce RX Page-Reuse")
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
napi->poll can be called with budget 0, e.g. in netpoll scenarios
where the caller only wants to poll TX rings
(poll_one_napi@net/core/netpoll.c).
The below commit changed RX polling from "while" loop to "do {} while",
which caused to ignore the initial budget and handle at least one RX
packet.
This fixes the following warning:
[ 2852.049194] mlx5e_napi_poll+0x0/0x260 [mlx5_core] exceeded budget in poll
[ 2852.049195] ------------[ cut here ]------------
[ 2852.049195] WARNING: CPU: 0 PID: 25691 at net/core/netpoll.c:171 netpoll_poll_dev+0x18a/0x1a0
Fixes: 4b7dfc9925 ("net/mlx5e: Early-return on empty completion queues")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Martin KaFai Lau <kafai@fb.com>
Tested-by: Martin KaFai Lau <kafai@fb.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
After the panic teardown firmware command, health_care detects the error
in PCI bus and calls the mlx5_pci_err_detected. This health_care flow is
no longer needed because the panic teardown firmware command will bring
down the PCI bus communication with the HCA.
The solution is to cancel the health care timer and its pending
workqueue request before sending panic teardown firmware command.
Kernel trace:
mlx5_core 0033:01:00.0: Shutdown was called
mlx5_core 0033:01:00.0: health_care:154:(pid 9304): handling bad device here
mlx5_core 0033:01:00.0: mlx5_handle_bad_state:114:(pid 9304): NIC state 1
mlx5_core 0033:01:00.0: mlx5_pci_err_detected was called
mlx5_core 0033:01:00.0: mlx5_enter_error_state:96:(pid 9304): start
mlx5_3:mlx5_ib_event:3061:(pid 9304): warning: event on port 0
mlx5_core 0033:01:00.0: mlx5_enter_error_state:104:(pid 9304): end
Unable to handle kernel paging request for data at address 0x0000003f
Faulting instruction address: 0xc0080000434b8c80
Fixes: 8812c24d28 ('net/mlx5: Add fast unload support in shutdown flow')
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
list_splice_init initializing waiting_events_list after splicing it to
temp list, therefore we should loop over temp list to fire the events.
Fixes: 4ca637a20a ("net/mlx5: Delay events till mlx5 interface's add complete for pci resume")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Modified netronome nfp flower action to use VLAN helper functions instead
of accessing/referencing TC act_vlan private structures directly.
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Manish Kurup <manish.kurup@verizon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 535a61777f ("sfc: suppress handled MCDI failures when changing the MAC address")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are several pointers that are being assigned but never read
so remove these as they are redundant. Also remove an assignment
to function_mode that is never read. Cleans up several clang
warnings:
vxge-main.c:1139:2: warning: Value stored to 'hldev' is never read
vxge-main.c:1294:2: warning: Value stored to 'hldev' is never read
vxge-main.c:2188:2: warning: Value stored to 'dev' is never read
vxge-main.c:2188:2: warning: Value stored to 'dev' is never read
vxge-main.c:2723:2: warning: Value stored to 'function_mode' is
never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
mlx5-updates-2017-11-09
This series introduces vlan offloads related improvements for mlx5
ethernet netdev driver, from Gal Pressman.
- Add support for 802.1ad vlan filter
- Add support for 802.1ad vlan insertion
- Add vlan offloads statistics to ethtool (inserted/stripped vlans)
- CHECKSUM_COMPLETE support for vlan traffic when vlan stripping is off! (Finally)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Simple cases of overlapping changes in the packet scheduler.
Must easier to resolve this time.
Which probably means that I screwed it up somehow.
Signed-off-by: David S. Miller <davem@davemloft.net>
When the VLAN tag is present in the packet buffer (i.e VLAN stripping disabled, QinQ)
the driver will currently report CHECKSUM_UNNECESSARY.
Instead of using CHECKSUM_COMPLETE offload for packets with first
ethertype of IPv4/6, use it for packets with last ethertype of IPv4/6 to
cover the former cases as well.
The checksum field present in the CQE is calculated from the IP header
until the end of the packet. When the first ethertype is different than
IPv4/6 (for ex. 802.1Q VLAN) a checksum of the VLAN header/s should be
added. The small header/s checksum calculation will allow us to use
CHECKSUM_COMPLETE instead of CHECKSUM_UNNECESSARY.
Testing bandwidth of one and 8 TCP streams to a single RQ,
LRO and VLAN stripping offloads disabled:
CPU: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
NIC: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
Before:
+--------------+--------------------+---------------------+----------------------+
| Traffic type | 1 Stream BW [Mbps] | 8 Streams BW [Mbps] | Checksum offload |
+--------------+--------------------+---------------------+----------------------+
| Untagged | 28,247.35 | 24,716.88 | CHECKSUM_COMPLETE |
| VLAN | 27,516.69 | 23,752.26 | CHECKSUM_UNNECESSARY |
| QinQ | 6,961.30 | 20,667.04 | CHECKSUM_UNNECESSARY |
+--------------+--------------------+---------------------+----------------------+
Now:
+--------------+--------------------+---------------------+-------------------+
| Traffic type | 1 Stream BW [Mbps] | 8 Streams BW [Mbps] | Checksum offload |
+--------------+--------------------+---------------------+-------------------+
| Untagged | 28,521.28 | 24,926.32 | CHECKSUM_COMPLETE |
| VLAN | 27,389.37 | 23,715.34 | CHECKSUM_COMPLETE |
| QinQ | 6,901.77 | 20,845.73 | CHECKSUM_COMPLETE |
+--------------+--------------------+---------------------+-------------------+
No performance degradation observed.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The following counters are now exposed through ethtool -S:
rx[i]_removed_vlan_packets (per channel)
rx_removed_vlan_packets
tx[i]_added_vlan_packets (per channel)
tx_added_vlan_packets
rx_removed_vlan_packets: The number of packets that had their
outer VLAN header stripped to the CQE by the hardware.
tx_added_vlan_packets: The number of packets that had their
outer VLAN header inserted by the hardware.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>