Commit Graph

27986 Commits

Author SHA1 Message Date
Linus Torvalds
2406fb8d94 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
 "A single bugfix to prevent a pinned thread which queues stomp machine
  work to be preempted by the stopper thread on its CPU which causes a
  live lock as it is unable to wake the second CPUs stopper thread"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  stop_machine: Atomically queue and wake stopper threads
2018-08-13 11:21:34 -07:00
Linus Torvalds
b99cdfdf0b Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Thomas Gleixner:
 "A large update to RCU:

  Preparatory work for consolidating the RCU flavors:

   - Introduce grace-period sequence numbers to the RCU-bh, RCU-preempt,
     and RCU-sched flavors, replacing the old ->gpnum and ->completed
     pair of fields.

     This change allows lockless code to obtain the complete
     grace-period state with a single READ_ONCE(), which is needed to
     maintain tolerable lock contention during the upcoming
     consolidation of the three RCU flavors.

     Note that grace-period sequence numbers are already used by
     rcu_barrier(), expedited RCU grace periods, and SRCU, and are thus
     already heavily used and well-tested. Joel Fernandes contributed a
     number of excellent fixes and improvements.

   - Clean up some grace-period-reporting loose ends, including
     improving the handling of quiescent states from offline CPUs and
     fixing some false-positive WARN_ON_ONCE() invocations.

     (Strictly speaking, the WARN_ON_ONCE() invocations were quite
     correct, but their invariants were (harmlessly) violated by the
     earlier sloppy handling of quiescent states from offline CPUs.)

     In addition, improve grace-period forward-progress guarantees so as
     to allow removal of fail-safe checks that required otherwise
     needless lock acquisitions. Finally, add more diagnostics to help
     debug the upcoming consolidation of the RCU-bh, RCU-preempt, and
     RCU-sched flavors.

  The rest:

   - SRCU updates

   - Updates to rcutorture and associated scripting.

   - The usual pile of miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (118 commits)
  rcutorture: Fix rcu_barrier successes counter
  rcutorture: Add support to detect if boost kthread prio is too low
  rcutorture: Use monotonic timestamp for stall detection
  rcutorture: Make boost test more robust
  rcutorture: Disable RT throttling for boost tests
  rcutorture: Emphasize testing of single reader protection type
  rcutorture: Handle extended read-side critical sections
  rcutorture: Make rcu_torture_timer() use rcu_torture_one_read()
  rcutorture: Use per-CPU random state for rcu_torture_timer()
  rcutorture: Use atomic increment for n_rcu_torture_timers
  rcutorture: Extract common code from rcu_torture_reader()
  rcuperf: Remove unused torturing_tasks() function
  rcu: Remove rcutorture test version and sequence number
  rcutorture: Change units of onoff_interval to jiffies
  rcu: Assign higher prio to RCU threads if rcutorture is built-in
  rculist: Improve documentation for list_for_each_entry_from_rcu()
  srcu: Add grace-period number to rcutorture statistics printout
  rcu: Print stall-warning NMI dyntick state in hexadecimal
  MAINTAINERS: Update RCU, SRCU, and TORTURE-TEST entries
  rcu: Make rcu_seq_diff() more exact
  ...
2018-08-13 10:49:41 -07:00
Linus Torvalds
d0daaeaf60 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull genirq updates from Thomas Gleixner:
 "The irq departement provides:

   - A synchronization fix for free_irq() to synchronize just the
     removed interrupt thread on shared interrupt lines.

   - Consolidate the multi low level interrupt entry handling and mvoe
     it to the generic code instead of adding yet another copy for
     RISC-V

   - Refactoring of the ARM LPI allocator and LPI exposure to the
     hypervisor

   - Yet another interrupt chip driver for the JZ4725B SoC

   - Speed up for /proc/interrupts as people seem to love reading this
     file with high frequency

   - Miscellaneous fixes and updates"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  irqchip/gic-v3-its: Make its_lock a raw_spin_lock_t
  genirq/irqchip: Remove MULTI_IRQ_HANDLER as it's now obselete
  openrisc: Use the new GENERIC_IRQ_MULTI_HANDLER
  arm64: Use the new GENERIC_IRQ_MULTI_HANDLER
  ARM: Convert to GENERIC_IRQ_MULTI_HANDLER
  irqchip: Port the ARM IRQ drivers to GENERIC_IRQ_MULTI_HANDLER
  irqchip/gic-v3-its: Reduce minimum LPI allocation to 1 for PCI devices
  dt-bindings: irqchip: renesas-irqc: Document r8a77980 support
  dt-bindings: irqchip: renesas-irqc: Document r8a77470 support
  irqchip/ingenic: Add support for the JZ4725B SoC
  irqchip/stm32: Add exti0 translation for stm32mp1
  genirq: Remove redundant NULL pointer check in __free_irq()
  irqchip/gic-v3-its: Honor hypervisor enforced LPI range
  irqchip/gic-v3: Expose GICD_TYPER in the rdist structure
  irqchip/gic-v3-its: Drop chunk allocation compatibility
  irqchip/gic-v3-its: Move minimum LPI requirements to individual busses
  irqchip/gic-v3-its: Use full range of LPIs
  irqchip/gic-v3-its: Refactor LPI allocator
  genirq: Synchronize only with single thread on free_irq()
  genirq: Update code comments wrt recycled thread_mask
  ...
2018-08-13 10:47:26 -07:00
Linus Torvalds
b5b1404d08 init: rename and re-order boot_cpu_state_init()
This is purely a preparatory patch for upcoming changes during the 4.19
merge window.

We have a function called "boot_cpu_state_init()" that isn't really
about the bootup cpu state: that is done much earlier by the similarly
named "boot_cpu_init()" (note lack of "state" in name).

This function initializes some hotplug CPU state, and needs to run after
the percpu data has been properly initialized.  It even has a comment to
that effect.

Except it _doesn't_ actually run after the percpu data has been properly
initialized.  On x86 it happens to do that, but on at least arm and
arm64, the percpu base pointers are initialized by the arch-specific
'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().

This had some unexpected results, and in particular we have a patch
pending for the merge window that did the obvious cleanup of using
'this_cpu_write()' in the cpu hotplug init code:

  -       per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
  +       this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);

which is obviously the right thing to do.  Except because of the
ordering issue, it actually failed miserably and unexpectedly on arm64.

So this just fixes the ordering, and changes the name of the function to
be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
hotplug state, because the core CPU state was supposed to have already
been done earlier.

Marked for stable, since the (not yet merged) patch that will show this
problem is marked for stable.

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-12 12:19:42 -07:00
Jesper Dangaard Brouer
1bf9116d08 xdp: fix bug in devmap teardown code path
Like cpumap teardown, the devmap teardown code also flush remaining
xdp_frames, via bq_xmit_all() in case map entry is removed.  The code
can call xdp_return_frame_rx_napi, from the the wrong context, in-case
ndo_xdp_xmit() fails.

Fixes: 389ab7f01a ("xdp: introduce xdp_return_frame_rx_napi")
Fixes: 735fc4054b ("xdp: change ndo_xdp_xmit API to support bulking")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-09 21:50:44 +02:00
Jesper Dangaard Brouer
ad0ab027fc xdp: fix bug in cpumap teardown code path
When removing a cpumap entry, a number of syncronization steps happen.
Eventually the teardown code __cpu_map_entry_free is invoked from/via
call_rcu.

The teardown code __cpu_map_entry_free() flushes remaining xdp_frames,
by invoking bq_flush_to_queue, which calls xdp_return_frame_rx_napi().
The issues is that the teardown code is not running in the RX NAPI
code path.  Thus, it is not allowed to invoke the NAPI variant of
xdp_return_frame.

This bug was found and triggered by using the --stress-mode option to
the samples/bpf program xdp_redirect_cpu.  It is hard to trigger,
because the ptr_ring have to be full and cpumap bulk queue max
contains 8 packets, and a remote CPU is racing to empty the ptr_ring
queue.

Fixes: 389ab7f01a ("xdp: introduce xdp_return_frame_rx_napi")
Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-09 21:50:44 +02:00
Daniel Borkmann
7c81c71730 bpf, sockmap: fix leak in bpf_tcp_sendmsg wait for mem path
In bpf_tcp_sendmsg() the sk_alloc_sg() may fail. In the case of
ENOMEM, it may also mean that we've partially filled the scatterlist
entries with pages. Later jumping to sk_stream_wait_memory()
we could further fail with an error for several reasons, however
we miss to call free_start_sg() if the local sk_msg_buff was used.

Fixes: 4f738adba3 ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-08 12:06:17 -07:00
Daniel Borkmann
5121700b34 bpf, sockmap: fix bpf_tcp_sendmsg sock error handling
While working on bpf_tcp_sendmsg() code, I noticed that when a
sk->sk_err is set we error out with err = sk->sk_err. However
this is problematic since sk->sk_err is a positive error value
and therefore we will neither go into sk_stream_error() nor will
we report an error back to user space. I had this case with EPIPE
and user space was thinking sendmsg() succeeded since EPIPE is
a positive value, thinking we submitted 32 bytes. Fix it by
negating the sk->sk_err value.

Fixes: 4f738adba3 ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-08 12:06:17 -07:00
Thomas Gleixner
9e90c79852 irqchip updates for 4.19
- GICv3 ITS LPI allocation revamp
 - GICv3 support for hypervisor-enforced LPI range
 - GICv3 ITS conversion to raw spinlock
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAltoBXMVHG1hcmMuenlu
 Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDyUYP/1feAq3F7ZmhCIZka4c6y/m4EBpq
 BjWEEgOAGMEyyB4s98flsRtZcEUxxp6CqEXo2FgCsd1Nj+og7oA7vwOlqy3aGzsi
 9f/Z5Wi6SlG06lH5tmYNkyVbGk2tE3s2FzkH5Rg8qZGk+X3OCOdNs/+G20pYAkSp
 ESePWSapbQUJSExJ1MqzfdHFidtVA1V+ev8BKdIp2ykl1NRae8LJeKHIbqac49Ym
 JclfCLFpQM1M1ElB9j0E8hAvZhz10oOz7TtBR737O/1QEifVyFqGBckPzldvwIJM
 zZ+nR+Yzj1ruD109xwaF1iKy9AinZWhiqrtN7UXJ3jwHtNih+sy0R6FQ38GMNoOC
 0K02n/qStR5xglGr4BmAcWlOuFtBYWfz6HpSVMqaTWWmOxHEiqS6pXtEA+dV/YyI
 wHLbo0YzpWTQm6t1+b/PoByAJ0/hOcD1nOD57b+NGjX7tZV0sGjpGsecvFhTSywh
 BN3COBi9k/FOBrOTGDX1qUAI+mEf76vc2BAC+BkkoiiMg3WlY0E9qfQJguUxHdrb
 0LS3lDZoHCNoz8RZLrUyenTT0NYGcjPGUTinMDJWG79VGXOWFexTDdCuX0kF90CK
 1Zie3O6lrTYolmaiyLUxwukKp1SVUyoA5IpKVwfDJQYUhEfk27yvlzg2MBMcHDRA
 uy3QSkmjx9vw/sAu
 =gKw8
 -----END PGP SIGNATURE-----

Merge tag 'irqchip-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core

Pull irqchip updates from Marc Zyngier:

- GICv3 ITS LPI allocation revamp
- GICv3 support for hypervisor-enforced LPI range
- GICv3 ITS conversion to raw spinlock
2018-08-06 12:45:42 +02:00
Prasad Sodagudi
cfd355145c stop_machine: Atomically queue and wake stopper threads
When cpu_stop_queue_work() releases the lock for the stopper
thread that was queued into its wake queue, preemption is
enabled, which leads to the following deadlock:

CPU0                              CPU1
sched_setaffinity(0, ...)
__set_cpus_allowed_ptr()
stop_one_cpu(0, ...)              stop_two_cpus(0, 1, ...)
cpu_stop_queue_work(0, ...)       cpu_stop_queue_two_works(0, ..., 1, ...)

-grabs lock for migration/0-
                                  -spins with preemption disabled,
                                   waiting for migration/0's lock to be
                                   released-

-adds work items for migration/0
and queues migration/0 to its
wake_q-

-releases lock for migration/0
 and preemption is enabled-

-current thread is preempted,
and __set_cpus_allowed_ptr
has changed the thread's
cpu allowed mask to CPU1 only-

                                  -acquires migration/0 and migration/1's
                                   locks-

                                  -adds work for migration/0 but does not
                                   add migration/0 to wake_q, since it is
                                   already in a wake_q-

                                  -adds work for migration/1 and adds
                                   migration/1 to its wake_q-

                                  -releases migration/0 and migration/1's
                                   locks, wakes migration/1, and enables
                                   preemption-

                                  -since migration/1 is requested to run,
                                   migration/1 begins to run and waits on
                                   migration/0, but migration/0 will never
                                   be able to run, since the thread that
                                   can wake it is affine to CPU1-

Disable preemption in cpu_stop_queue_work() before queueing works for
stopper threads, and queueing the stopper thread in the wake queue, to
ensure that the operation of queueing the works and waking the stopper
threads is atomic.

Fixes: 0b26351b91 ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock")
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: matt@codeblueprint.co.uk
Cc: bigeasy@linutronix.de
Cc: gregkh@linuxfoundation.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1533329766-4856-1-git-send-email-isaacm@codeaurora.org

Co-Developed-by: Isaac J. Manjarres <isaacm@codeaurora.org>
2018-08-05 21:58:31 +02:00
Linus Torvalds
2f3672cbf9 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "Two oneliners addressing NOHZ failures:

   - Use a bitmask to check for the pending timer softirq and not the
     bit number. The existing code using the bit number checked for
     the wrong bit, which caused timers to either expire late or stop
     completely.

   - Make the nohz evaluation on interrupt exit more robust. The
     existing code did not re-arm the hardware when interrupting a
     running softirq in task context (ksoftirqd or tail of
     local_bh_enable()), which caused timers to either expire late
     or stop completely"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  nohz: Fix missing tick reprogram when interrupting an inline softirq
  nohz: Fix local_timer_softirq_pending()
2018-08-05 09:25:29 -07:00
Frederic Weisbecker
0a0e0829f9 nohz: Fix missing tick reprogram when interrupting an inline softirq
The full nohz tick is reprogrammed in irq_exit() only if the exit is not in
a nesting interrupt. This stands as an optimization: whether a hardirq or a
softirq is interrupted, the tick is going to be reprogrammed when necessary
at the end of the inner interrupt, with even potential new updates on the
timer queue.

When soft interrupts are interrupted, it's assumed that they are executing
on the tail of an interrupt return. In that case tick_nohz_irq_exit() is
called after softirq processing to take care of the tick reprogramming.

But the assumption is wrong: softirqs can be processed inline as well, ie:
outside of an interrupt, like in a call to local_bh_enable() or from
ksoftirqd.

Inline softirqs don't reprogram the tick once they are done, as opposed to
interrupt tail softirq processing. So if a tick interrupts an inline
softirq processing, the next timer will neither be reprogrammed from the
interrupting tick's irq_exit() nor after the interrupted softirq
processing. This situation may leave the tick unprogrammed while timers are
armed.

To fix this, simply keep reprogramming the tick even if a softirq has been
interrupted. That can be optimized further, but for now correctness is more
important.

Note that new timers enqueued in nohz_full mode after a softirq gets
interrupted will still be handled just fine through self-IPIs triggered by
the timer code.

Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: stable@vger.kernel.org # 4.14+
Link: https://lkml.kernel.org/r/1533303094-15855-1-git-send-email-frederic@kernel.org
2018-08-03 15:52:10 +02:00
Thomas Gleixner
d1f0301b33 genirq: Make force irq threading setup more robust
The support of force threading interrupts which are set up with both a
primary and a threaded handler wreckaged the setup of regular requested
threaded interrupts (primary handler == NULL).

The reason is that it does not check whether the primary handler is set to
the default handler which wakes the handler thread. Instead it replaces the
thread handler with the primary handler as it would do with force threaded
interrupts which have been requested via request_irq(). So both the primary
and the thread handler become the same which then triggers the warnon that
the thread handler tries to wakeup a not configured secondary thread.

Fortunately this only happens when the driver omits the IRQF_ONESHOT flag
when requesting the threaded interrupt, which is normaly caught by the
sanity checks when force irq threading is disabled.

Fix it by skipping the force threading setup when a regular threaded
interrupt is requested. As a consequence the interrupt request which lacks
the IRQ_ONESHOT flag is rejected correctly instead of silently wreckaging
it.

Fixes: 2a1d3ab898 ("genirq: Handle force threading of irqs with primary and thread handler")
Reported-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
Cc: stable@vger.kernel.org
2018-08-03 15:19:01 +02:00
Palmer Dabbelt
4f7799d96e genirq/irqchip: Remove MULTI_IRQ_HANDLER as it's now obselete
Now that every user of MULTI_IRQ_HANDLER has been convereted over to use
GENERIC_IRQ_MULTI_HANDLER remove the references to MULTI_IRQ_HANDLER.

Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux@armlinux.org.uk
Cc: catalin.marinas@arm.com
Cc: Will Deacon <will.deacon@arm.com>
Cc: jonas@southpole.se
Cc: stefan.kristiansson@saunalahti.fi
Cc: shorne@gmail.com
Cc: jason@lakedaemon.net
Cc: marc.zyngier@arm.com
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: nicolas.pitre@linaro.org
Cc: vladimir.murzin@arm.com
Cc: keescook@chromium.org
Cc: jinb.park7@gmail.com
Cc: yamada.masahiro@socionext.com
Cc: alexandre.belloni@bootlin.com
Cc: pombredanne@nexb.com
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: kstewart@linuxfoundation.org
Cc: jhogan@kernel.org
Cc: mark.rutland@arm.com
Cc: ard.biesheuvel@linaro.org
Cc: james.morse@arm.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: openrisc@lists.librecores.org
Link: https://lkml.kernel.org/r/20180622170126.6308-6-palmer@sifive.com
2018-08-03 12:14:10 +02:00
Linus Torvalds
37b71411b7 audit/stable-4.18 PR 20180731
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEcQCq365ubpQNLgrWVeRaWujKfIoFAltguv8UHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQVeRaWujKfIqYOA/9GgMzBJYU+bVCbNSagcq6LluWFoYV
 ObZb9sfsf23wL0YgtKgkWaCefWAAYnWOr6bUvDa+5oMRLVR+bsP+YEkCVK45CJr0
 g44oe4VH9t5inX2F2JSkoVbkUDZIwwOxiTi/L4Emqhv8cT9zc89tcKRjYhqt50d1
 4Gm4++jZcTHQNKkzYUIIpKc0TZmKW5mRNmFaGogWPi72FWrhbDfjKLZZUvd+kIUC
 HSKnv6pKnwbxLPhd9i0p5NchuTM6kRCptGzN07UUzeww6UVvs8t62+DzHUM1o3Ft
 sraIx7BLenGC8OBCgi8aNkE+yseQE4h2OTym3paEkLVJsl/9qcsSyXL1dwO4Z96U
 HFq/TpDZoBieZihHDBk4ry7ox942mE5N51QTDUh+cygEWeNvqGwqpAUbI14J23oh
 3p7w7hgXAtdtuj4pzqUARemHvIR0Xbpn8ritH9cx1s1mDdycyyBDn9mFw3Ehigom
 XIpUrSJtdfJYFj+z6wA4vXssvXe4TITrJTUmPAM1Alk1p+LhRkTA8JxBjHmL3qjR
 mFIxA40t+ON5OtCqTtGsapaoJy2Jj97dPEp5i5Jg49BQclQoTG2rpYuIu/aKrixG
 EZwdezckD3DPQUQdQidru7dS1J/phIaDDvEauq291ERHPfNAxQuMllXHeczzyJkc
 eVRMkj0/E5lihlE=
 =MN3z
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20180731' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit fix from Paul Moore:
 "A single small audit fix to guard against memory allocation failures
  when logging information about a kernel module load.

  It's small, easy to understand, and self-contained; while nothing is
  zero risk, this should be pretty low"

* tag 'audit-pr-20180731' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: fix potential null dereference 'context->module.name'
2018-07-31 13:17:46 -07:00
Anna-Maria Gleixner
80d20d35af nohz: Fix local_timer_softirq_pending()
local_timer_softirq_pending() checks whether the timer softirq is
pending with: local_softirq_pending() & TIMER_SOFTIRQ.

This is wrong because TIMER_SOFTIRQ is the softirq number and not a
bitmask. So the test checks for the wrong bit.

Use BIT(TIMER_SOFTIRQ) instead.

Fixes: 5d62c183f9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()")
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Cc: bigeasy@linutronix.de
Cc: peterz@infradead.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-maria@linutronix.de
2018-07-31 22:08:44 +02:00
Linus Torvalds
f67077deb4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Several smallish fixes, I don't think any of this requires another -rc
  but I'll leave that up to you:

   1) Don't leak uninitialzed bytes to userspace in xfrm_user, from Eric
      Dumazet.

   2) Route leak in xfrm_lookup_route(), from Tommi Rantala.

   3) Premature poll() returns in AF_XDP, from Björn Töpel.

   4) devlink leak in netdevsim, from Jakub Kicinski.

   5) Don't BUG_ON in fib_compute_spec_dst, the condition can
      legitimately happen. From Lorenzo Bianconi.

   6) Fix some spectre v1 gadgets in generic socket code, from Jeremy
      Cline.

   7) Don't allow user to bind to out of range multicast groups, from
      Dmitry Safonov with a follow-up by Dmitry Safonov.

   8) Fix metrics leak in fib6_drop_pcpu_from(), from Sabrina Dubroca"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
  netlink: Don't shift with UB on nlk->ngroups
  net/ipv6: fix metrics leak
  xen-netfront: wait xenbus state change when load module manually
  can: ems_usb: Fix memory leak on ems_usb_disconnect()
  openvswitch: meter: Fix setting meter id for new entries
  netlink: Do not subscribe to non-existent groups
  NET: stmmac: align DMA stuff to largest cache line length
  tcp_bbr: fix bw probing to raise in-flight data for very small BDPs
  net: socket: Fix potential spectre v1 gadget in sock_is_registered
  net: socket: fix potential spectre v1 gadget in socketcall
  net: mdio-mux: bcm-iproc: fix wrong getter and setter pair
  ipv4: remove BUG_ON() from fib_compute_spec_dst
  enic: handle mtu change for vf properly
  net: lan78xx: fix rx handling before first packet is send
  nfp: flower: fix port metadata conversion bug
  bpf: use GFP_ATOMIC instead of GFP_KERNEL in bpf_parse_prog()
  bpf: fix bpf_skb_load_bytes_relative pkt length check
  perf build: Build error in libbpf missing initialization
  net: ena: Fix use of uninitialized DMA address bits field
  bpf: btf: Use exact btf value_size match in map_check_btf()
  ...
2018-07-30 21:40:37 -07:00
Yi Wang
b305f7ed0f audit: fix potential null dereference 'context->module.name'
The variable 'context->module.name' may be null pointer when
kmalloc return null, so it's better to check it before using
to avoid null dereference.
Another one more thing this patch does is using kstrdup instead
of (kmalloc + strcpy), and signal a lost record via audit_log_lost.

Cc: stable@vger.kernel.org # 4.11
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-07-30 18:09:37 -04:00
Linus Torvalds
ae3e10aba5 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Misc fixes:

   - a deadline scheduler related bug fix which triggered a kernel
     warning

   - an RT_RUNTIME_SHARE fix

   - a stop_machine preemption fix

   - a potential NULL dereference fix in sched_domain_debug_one()"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/rt: Restore rt_runtime after disabling RT_RUNTIME_SHARE
  sched/deadline: Update rq_clock of later_rq when pushing a task
  stop_machine: Disable preemption after queueing stopper threads
  sched/topology: Check variable group before dereferencing it
2018-07-30 12:13:56 -07:00
Linus Torvalds
0634922a78 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes:

   - AMD IBS data corruptor fix (uncovered by UBSAN)

   - an Intel PEBS entry unwind error fix

   - a HW-tracing crash fix

   - a MAINTAINERS update"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix crash when using HW tracing kernel filters
  perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)
  MAINTAINERS: Add Naveen N. Rao as kprobes co-maintainer
  perf/x86/amd/ibs: Don't access non-started event
2018-07-30 11:45:30 -07:00
Linus Torvalds
fb20c03d37 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
 "A paravirt UP-patching fix, and an I2C MUX driver lockdep warning fix"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/pvqspinlock/x86: Use LOCK_PREFIX in __pv_queued_spin_unlock() assembly code
  i2c/mux, locking/core: Annotate the nested rt_mutex usage
  locking/rtmutex: Allow specifying a subclass for nested locking
2018-07-30 11:37:16 -07:00
David S. Miller
958b4cd8fa Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-07-28

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) API fixes for libbpf's BTF mapping of map key/value types in order
   to make them compatible with iproute2's BPF_ANNOTATE_KV_PAIR()
   markings, from Martin.

2) Fix AF_XDP to not report POLLIN prematurely by using the non-cached
   consumer pointer of the RX queue, from Björn.

3) Fix __xdp_return() to check for NULL pointer after the rhashtable
   lookup that retrieves the allocator object, from Taehee.

4) Fix x86-32 JIT to adjust ebp register in prologue and epilogue
   by 4 bytes which got removed from overall stack usage, from Wang.

5) Fix bpf_skb_load_bytes_relative() length check to use actual
   packet length, from Daniel.

6) Fix uninitialized return code in libbpf bpf_perf_event_read_simple()
   handler, from Thomas.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-28 21:02:21 -07:00
Linus Torvalds
864af0d40c Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "11 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kvm, mm: account shadow page tables to kmemcg
  zswap: re-check zswap_is_full() after do zswap_shrink()
  include/linux/eventfd.h: include linux/errno.h
  mm: fix vma_is_anonymous() false-positives
  mm: use vma_init() to initialize VMAs on stack and data segments
  mm: introduce vma_init()
  mm: fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL
  ipc/sem.c: prevent queue.status tearing in semop
  mm: disallow mappings that conflict for devm_memremap_pages()
  kasan: only select SLUB_DEBUG with SYSFS=y
  delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
2018-07-27 10:30:47 -07:00
Linus Torvalds
3ebb6fb03d Various fixes to the tracing infrastructure:
- Fix double free when the reg() call fails in event_trigger_callback()
 
  - Fix anomoly of snapshot causing tracing_on flag to change
 
  - Add selftest to test snapshot and tracing_on affecting each other
 
  - Fix setting of tracepoint flag on error that prevents probes from
    being deleted.
 
  - Fix another possible double free that is similar to event_trigger_callback()
 
  - Quiet a gcc warning of a false positive unused variable
 
  - Fix crash of partial exposed task->comm to trace events
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW1pToBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qijEAQCzqQsnlO6YBCYajRBq2wFaM7J6tVnJ
 LxLZlVE8lJlHZQD/YpyGOPq98CB81BfQV7RA/CAVd4RZAhTjldDgGyfL/QI=
 =wU8I
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Various fixes to the tracing infrastructure:

   - Fix double free when the reg() call fails in
     event_trigger_callback()

   - Fix anomoly of snapshot causing tracing_on flag to change

   - Add selftest to test snapshot and tracing_on affecting each other

   - Fix setting of tracepoint flag on error that prevents probes from
     being deleted.

   - Fix another possible double free that is similar to
     event_trigger_callback()

   - Quiet a gcc warning of a false positive unused variable

   - Fix crash of partial exposed task->comm to trace events"

* tag 'trace-v4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  kthread, tracing: Don't expose half-written comm when creating kthreads
  tracing: Quiet gcc warning about maybe unused link variable
  tracing: Fix possible double free in event_enable_trigger_func()
  tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure
  selftests/ftrace: Add snapshot and tracing_on test case
  ring_buffer: tracing: Inherit the tracing setting to next ring buffer
  tracing: Fix double free of event_trigger_data
2018-07-27 09:50:33 -07:00
Kirill A. Shutemov
027232da7c mm: introduce vma_init()
Not all VMAs allocated with vm_area_alloc().  Some of them allocated on
stack or in data segment.

The new helper can be use to initialize VMA properly regardless where it
was allocated.

Link: http://lkml.kernel.org/r/20180724121139.62570-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-26 19:38:03 -07:00
Dan Williams
31c5bda3a6 mm: fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL
Commit e763848843 ("mm: introduce MEMORY_DEVICE_FS_DAX and
CONFIG_DEV_PAGEMAP_OPS") added two EXPORT_SYMBOL_GPL() symbols, but
these symbols are required by the inlined put_page(), thus accidentally
making put_page() a GPL export only.  This breaks OpenAFS (at least).

Mark them EXPORT_SYMBOL() instead.

Link: http://lkml.kernel.org/r/153128611970.2928.11310692420711601254.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: e763848843 ("mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Joe Gorse <jhgorse@gmail.com>
Reported-by: John Hubbard <jhubbard@nvidia.com>
Tested-by: Joe Gorse <jhgorse@gmail.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Mark Vitale <mvitale@sinenomine.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-26 19:38:03 -07:00
Dave Jiang
15d36fecd0 mm: disallow mappings that conflict for devm_memremap_pages()
When pmem namespaces created are smaller than section size, this can
cause an issue during removal and gpf was observed:

  general protection fault: 0000 1 SMP PTI
  CPU: 36 PID: 3941 Comm: ndctl Tainted: G W 4.14.28-1.el7uek.x86_64 #2
  task: ffff88acda150000 task.stack: ffffc900233a4000
  RIP: 0010:__put_page+0x56/0x79
  Call Trace:
    devm_memremap_pages_release+0x155/0x23a
    release_nodes+0x21e/0x260
    devres_release_all+0x3c/0x48
    device_release_driver_internal+0x15c/0x207
    device_release_driver+0x12/0x14
    unbind_store+0xba/0xd8
    drv_attr_store+0x27/0x31
    sysfs_kf_write+0x3f/0x46
    kernfs_fop_write+0x10f/0x18b
    __vfs_write+0x3a/0x16d
    vfs_write+0xb2/0x1a1
    SyS_write+0x55/0xb9
    do_syscall_64+0x79/0x1ae
    entry_SYSCALL_64_after_hwframe+0x3d/0x0

Add code to check whether we have a mapping already in the same section
and prevent additional mappings from being created if that is the case.

Link: http://lkml.kernel.org/r/152909478401.50143.312364396244072931.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Robert Elliott <elliott@hpe.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-26 19:38:03 -07:00
Martin KaFai Lau
5f300e8004 bpf: btf: Use exact btf value_size match in map_check_btf()
The current map_check_btf() in BPF_MAP_TYPE_ARRAY rejects
'> map->value_size' to ensure map_seq_show_elem() will not
access things beyond an array element.

Yonghong suggested that using '!=' is a more correct
check.  The 8 bytes round_up on value_size is stored
in array->elem_size.  Hence, using '!=' on map->value_size
is a proper check.

This patch also adds new tests to check the btf array
key type and value type.  Two of these new tests verify
the btf's value_size (the change in this patch).

It also fixes two existing tests that wrongly encoded
a btf's type size (pprint_test) and the value_type_id (in one
of the raw_tests[]).  However, that do not affect these two
BTF verification tests before or after this test changes.
These two tests mainly failed at array creation time after
this patch.

Fixes: a26ca7c982 ("bpf: btf: Add pretty print support to the basic arraymap")
Suggested-by: Yonghong Song <yhs@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 03:45:49 +02:00
Snild Dolkow
3e536e222f kthread, tracing: Don't expose half-written comm when creating kthreads
There is a window for racing when printing directly to task->comm,
allowing other threads to see a non-terminated string. The vsnprintf
function fills the buffer, counts the truncated chars, then finally
writes the \0 at the end.

	creator                     other
	vsnprintf:
	  fill (not terminated)
	  count the rest            trace_sched_waking(p):
	  ...                         memcpy(comm, p->comm, TASK_COMM_LEN)
	  write \0

The consequences depend on how 'other' uses the string. In our case,
it was copied into the tracing system's saved cmdlines, a buffer of
adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be):

	crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk'
	0xffffffd5b3818640:     "irq/497-pwr_evenkworker/u16:12"

...and a strcpy out of there would cause stack corruption:

	[224761.522292] Kernel panic - not syncing: stack-protector:
	    Kernel stack is corrupted in: ffffff9bf9783c78

	crash-arm64> kbt | grep 'comm\|trace_print_context'
	#6  0xffffff9bf9783c78 in trace_print_context+0x18c(+396)
	      comm (char [16]) =  "irq/497-pwr_even"

	crash-arm64> rd 0xffffffd4d0e17d14 8
	ffffffd4d0e17d14:  2f71726900000000 5f7277702d373934   ....irq/497-pwr_
	ffffffd4d0e17d24:  726f776b6e657665 3a3631752f72656b   evenkworker/u16:
	ffffffd4d0e17d34:  f9780248ff003231 cede60e0ffffff9b   12..H.x......`..
	ffffffd4d0e17d44:  cede60c8ffffffd4 00000fffffffffd4   .....`..........

The workaround in e09e28671 (use strlcpy in __trace_find_cmdline) was
likely needed because of this same bug.

Solved by vsnprintf:ing to a local buffer, then using set_task_comm().
This way, there won't be a window where comm is not terminated.

Link: http://lkml.kernel.org/r/20180726071539.188015-1-snild@sony.com

Cc: stable@vger.kernel.org
Fixes: bc0c38d139 ("ftrace: latency tracer infrastructure")
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Snild Dolkow <snild@sony.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-26 09:59:33 -04:00
Steven Rostedt (VMware)
2519c1bbe3 tracing: Quiet gcc warning about maybe unused link variable
Commit 57ea2a34ad ("tracing/kprobes: Fix trace_probe flags on
enable_trace_kprobe() failure") added an if statement that depends on another
if statement that gcc doesn't see will initialize the "link" variable and
gives the warning:

 "warning: 'link' may be used uninitialized in this function"

It is really a false positive, but to quiet the warning, and also to make
sure that it never actually is used uninitialized, initialize the "link"
variable to NULL and add an if (!WARN_ON_ONCE(!link)) where the compiler
thinks it could be used uninitialized.

Cc: stable@vger.kernel.org
Fixes: 57ea2a34ad ("tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-25 22:33:50 -04:00
Steven Rostedt (VMware)
15cc78644d tracing: Fix possible double free in event_enable_trigger_func()
There was a case that triggered a double free in event_trigger_callback()
due to the called reg() function freeing the trigger_data and then it
getting freed again by the error return by the caller. The solution there
was to up the trigger_data ref count.

Code inspection found that event_enable_trigger_func() has the same issue,
but is not as easy to trigger (requires harder to trigger failures). It
needs to be solved slightly different as it needs more to clean up when the
reg() function fails.

Link: http://lkml.kernel.org/r/20180725124008.7008e586@gandalf.local.home

Cc: stable@vger.kernel.org
Fixes: 7862ad1846 ("tracing: Add 'enable_event' and 'disable_event' event trigger commands")
Reivewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-25 21:25:16 -04:00
Artem Savkov
57ea2a34ad tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure
If enable_trace_kprobe fails to enable the probe in enable_k(ret)probe
it returns an error, but does not unset the tp flags it set previously.
This results in a probe being considered enabled and failures like being
unable to remove the probe through kprobe_events file since probes_open()
expects every probe to be disabled.

Link: http://lkml.kernel.org/r/20180725102826.8300-1-asavkov@redhat.com
Link: http://lkml.kernel.org/r/20180725142038.4765-1-asavkov@redhat.com

Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 41a7dd420c ("tracing/kprobes: Support ftrace_event_file base multibuffer")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-25 11:41:08 -04:00
Masami Hiramatsu
73c8d89455 ring_buffer: tracing: Inherit the tracing setting to next ring buffer
Maintain the tracing on/off setting of the ring_buffer when switching
to the trace buffer snapshot.

Taking a snapshot is done by swapping the backup ring buffer
(max_tr_buffer). But since the tracing on/off setting is defined
by the ring buffer, when swapping it, the tracing on/off setting
can also be changed. This causes a strange result like below:

  /sys/kernel/debug/tracing # cat tracing_on
  1
  /sys/kernel/debug/tracing # echo 0 > tracing_on
  /sys/kernel/debug/tracing # cat tracing_on
  0
  /sys/kernel/debug/tracing # echo 1 > snapshot
  /sys/kernel/debug/tracing # cat tracing_on
  1
  /sys/kernel/debug/tracing # echo 1 > snapshot
  /sys/kernel/debug/tracing # cat tracing_on
  0

We don't touch tracing_on, but snapshot changes tracing_on
setting each time. This is an anomaly, because user doesn't know
that each "ring_buffer" stores its own tracing-enable state and
the snapshot is done by swapping ring buffers.

Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
Cc: stable@vger.kernel.org
Fixes: debdd57f51 ("tracing: Make a snapshot feature available from userspace")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
[ Updated commit log and comment in the code ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-25 10:29:41 -04:00
Steven Rostedt (VMware)
1863c38725 tracing: Fix double free of event_trigger_data
Running the following:

 # cd /sys/kernel/debug/tracing
 # echo 500000 > buffer_size_kb
[ Or some other number that takes up most of memory ]
 # echo snapshot > events/sched/sched_switch/trigger

Triggers the following bug:

 ------------[ cut here ]------------
 kernel BUG at mm/slub.c:296!
 invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
 CPU: 6 PID: 6878 Comm: bash Not tainted 4.18.0-rc6-test+ #1066
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
 RIP: 0010:kfree+0x16c/0x180
 Code: 05 41 0f b6 72 51 5b 5d 41 5c 4c 89 d7 e9 ac b3 f8 ff 48 89 d9 48 89 da 41 b8 01 00 00 00 5b 5d 41 5c 4c 89 d6 e9 f4 f3 ff ff <0f> 0b 0f 0b 48 8b 3d d9 d8 f9 00 e9 c1 fe ff ff 0f 1f 40 00 0f 1f
 RSP: 0018:ffffb654436d3d88 EFLAGS: 00010246
 RAX: ffff91a9d50f3d80 RBX: ffff91a9d50f3d80 RCX: ffff91a9d50f3d80
 RDX: 00000000000006a4 RSI: ffff91a9de5a60e0 RDI: ffff91a9d9803500
 RBP: ffffffff8d267c80 R08: 00000000000260e0 R09: ffffffff8c1a56be
 R10: fffff0d404543cc0 R11: 0000000000000389 R12: ffffffff8c1a56be
 R13: ffff91a9d9930e18 R14: ffff91a98c0c2890 R15: ffffffff8d267d00
 FS:  00007f363ea64700(0000) GS:ffff91a9de580000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055c1cacc8e10 CR3: 00000000d9b46003 CR4: 00000000001606e0
 Call Trace:
  event_trigger_callback+0xee/0x1d0
  event_trigger_write+0xfc/0x1a0
  __vfs_write+0x33/0x190
  ? handle_mm_fault+0x115/0x230
  ? _cond_resched+0x16/0x40
  vfs_write+0xb0/0x190
  ksys_write+0x52/0xc0
  do_syscall_64+0x5a/0x160
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x7f363e16ab50
 Code: 73 01 c3 48 8b 0d 38 83 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 79 db 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e e3 01 00 48 89 04 24
 RSP: 002b:00007fff9a4c6378 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f363e16ab50
 RDX: 0000000000000009 RSI: 000055c1cacc8e10 RDI: 0000000000000001
 RBP: 000055c1cacc8e10 R08: 00007f363e435740 R09: 00007f363ea64700
 R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000009
 R13: 0000000000000001 R14: 00007f363e4345e0 R15: 00007f363e4303c0
 Modules linked in: ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device i915 snd_pcm snd_timer i2c_i801 snd soundcore i2c_algo_bit drm_kms_helper
86_pkg_temp_thermal video kvm_intel kvm irqbypass wmi e1000e
 ---[ end trace d301afa879ddfa25 ]---

The cause is because the register_snapshot_trigger() call failed to
allocate the snapshot buffer, and then called unregister_trigger()
which freed the data that was passed to it. Then on return to the
function that called register_snapshot_trigger(), as it sees it
failed to register, it frees the trigger_data again and causes
a double free.

By calling event_trigger_init() on the trigger_data (which only ups
the reference counter for it), and then event_trigger_free() afterward,
the trigger_data would not get freed by the registering trigger function
as it would only up and lower the ref count for it. If the register
trigger function fails, then the event_trigger_free() called after it
will free the trigger data normally.

Link: http://lkml.kernel.org/r/20180724191331.738eb819@gandalf.local.home

Cc: stable@vger.kerne.org
Fixes: 93e31ffbf4 ("tracing: Add 'snapshot' event trigger command")
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-25 10:29:24 -04:00
Mathieu Poirier
7f635ff187 perf/core: Fix crash when using HW tracing kernel filters
In function perf_event_parse_addr_filter(), the path::dentry of each struct
perf_addr_filter is left unassigned (as it should be) when the pattern
being parsed is related to kernel space.  But in function
perf_addr_filter_match() the same dentries are given to d_inode() where
the value is not expected to be NULL, resulting in the following splat:

  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
  pc : perf_event_mmap+0x2fc/0x5a0
  lr : perf_event_mmap+0x2c8/0x5a0
  Process uname (pid: 2860, stack limit = 0x000000001cbcca37)
  Call trace:
   perf_event_mmap+0x2fc/0x5a0
   mmap_region+0x124/0x570
   do_mmap+0x344/0x4f8
   vm_mmap_pgoff+0xe4/0x110
   vm_mmap+0x2c/0x40
   elf_map+0x60/0x108
   load_elf_binary+0x450/0x12c4
   search_binary_handler+0x90/0x290
   __do_execve_file.isra.13+0x6e4/0x858
   sys_execve+0x3c/0x50
   el0_svc_naked+0x30/0x34

This patch is fixing the problem by introducing a new check in function
perf_addr_filter_match() to see if the filter's dentry is NULL.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: miklos@szeredi.hu
Cc: namhyung@kernel.org
Cc: songliubraving@fb.com
Fixes: 9511bce9fe ("perf/core: Fix bad use of igrab()")
Link: http://lkml.kernel.org/r/1531782831-1186-1-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25 11:46:22 +02:00
Peter Zijlstra
6cbc304f2f perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)
Vince reported the perf_fuzzer giving various unwinder warnings and
Josh reported:

> Deja vu.  Most of these are related to perf PEBS, similar to the
> following issue:
>
>   b8000586c9 ("perf/x86/intel: Cure bogus unwind from PEBS entries")
>
> This is basically the ORC version of that.  setup_pebs_sample_data() is
> assembling a franken-pt_regs which ORC isn't happy about.  RIP is
> inconsistent with some of the other registers (like RSP and RBP).

And where the previous unwinder only needed BP,SP ORC also requires
IP. But we cannot spoof IP because then the sample will get displaced,
entirely negating the point of PEBS.

So cure the whole thing differently by doing the unwind early; this
does however require a means to communicate we did the unwind early.
We (ab)use an unused sample_type bit for this, which we set on events
that fill out the data->callchain before the normal
perf_prepare_sample().

Debugged-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25 11:46:21 +02:00
Hailong Liu
f3d133ee0a sched/rt: Restore rt_runtime after disabling RT_RUNTIME_SHARE
NO_RT_RUNTIME_SHARE feature is used to prevent a CPU borrow enough
runtime with a spin-rt-task.

However, if RT_RUNTIME_SHARE feature is enabled and rt_rq has borrowd
enough rt_runtime at the beginning, rt_runtime can't be restored to
its initial bandwidth rt_runtime after we disable RT_RUNTIME_SHARE.

E.g. on my PC with 4 cores, procedure to reproduce:
1) Make sure  RT_RUNTIME_SHARE is enabled
 cat /sys/kernel/debug/sched_features
  GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY
  CACHE_HOT_BUDDY WAKEUP_PREEMPTION NO_HRTICK NO_DOUBLE_TICK
  LB_BIAS NONTASK_CAPACITY TTWU_QUEUE NO_SIS_AVG_CPU SIS_PROP
  NO_WARN_DOUBLE_CLOCK RT_PUSH_IPI RT_RUNTIME_SHARE NO_LB_MIN
  ATTACH_AGE_LOAD WA_IDLE WA_WEIGHT WA_BIAS
2) Start a spin-rt-task
 ./loop_rr &
3) set affinity to the last cpu
 taskset -p 8 $pid_of_loop_rr
4) Observe that last cpu have borrowed enough runtime.
 cat /proc/sched_debug | grep rt_runtime
  .rt_runtime                    : 950.000000
  .rt_runtime                    : 900.000000
  .rt_runtime                    : 950.000000
  .rt_runtime                    : 1000.000000
5) Disable RT_RUNTIME_SHARE
 echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features
6) Observe that rt_runtime can not been restored
 cat /proc/sched_debug | grep rt_runtime
  .rt_runtime                    : 950.000000
  .rt_runtime                    : 900.000000
  .rt_runtime                    : 950.000000
  .rt_runtime                    : 1000.000000

This patch help to restore rt_runtime after we disable
RT_RUNTIME_SHARE.

Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn>
Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: zhong.weidong@zte.com.cn
Link: http://lkml.kernel.org/r/1531874815-39357-1-git-send-email-liu.hailong6@zte.com.cn
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25 11:29:08 +02:00
Daniel Bristot de Oliveira
840d719604 sched/deadline: Update rq_clock of later_rq when pushing a task
Daniel Casini got this warn while running a DL task here at RetisLab:

  [  461.137582] ------------[ cut here ]------------
  [  461.137583] rq->clock_update_flags < RQCF_ACT_SKIP
  [  461.137599] WARNING: CPU: 4 PID: 2354 at kernel/sched/sched.h:967 assert_clock_updated.isra.32.part.33+0x17/0x20
      [a ton of modules]
  [  461.137646] CPU: 4 PID: 2354 Comm: label_image Not tainted 4.18.0-rc4+ #3
  [  461.137647] Hardware name: ASUS All Series/Z87-K, BIOS 0801 09/02/2013
  [  461.137649] RIP: 0010:assert_clock_updated.isra.32.part.33+0x17/0x20
  [  461.137649] Code: ff 48 89 83 08 09 00 00 eb c6 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 98 7a 6c a5 c6 05 bc 0d 54 01 01 48 89 e5 e8 a9 84 fb ff <0f> 0b 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 83 7e 60 01 74 0a 48 3b
  [  461.137673] RSP: 0018:ffffa77e08cafc68 EFLAGS: 00010082
  [  461.137674] RAX: 0000000000000000 RBX: ffff8b3fc1702d80 RCX: 0000000000000006
  [  461.137674] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff8b3fded164b0
  [  461.137675] RBP: ffffa77e08cafc68 R08: 0000000000000026 R09: 0000000000000339
  [  461.137676] R10: ffff8b3fd060d410 R11: 0000000000000026 R12: ffffffffa4e14e20
  [  461.137677] R13: ffff8b3fdec22940 R14: ffff8b3fc1702da0 R15: ffff8b3fdec22940
  [  461.137678] FS:  00007efe43ee5700(0000) GS:ffff8b3fded00000(0000) knlGS:0000000000000000
  [  461.137679] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  461.137680] CR2: 00007efe30000010 CR3: 0000000301744003 CR4: 00000000001606e0
  [  461.137680] Call Trace:
  [  461.137684]  push_dl_task.part.46+0x3bc/0x460
  [  461.137686]  task_woken_dl+0x60/0x80
  [  461.137689]  ttwu_do_wakeup+0x4f/0x150
  [  461.137690]  ttwu_do_activate+0x77/0x80
  [  461.137692]  try_to_wake_up+0x1d6/0x4c0
  [  461.137693]  wake_up_q+0x32/0x70
  [  461.137696]  do_futex+0x7e7/0xb50
  [  461.137698]  __x64_sys_futex+0x8b/0x180
  [  461.137701]  do_syscall_64+0x5a/0x110
  [  461.137703]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [  461.137705] RIP: 0033:0x7efe4918ca26
  [  461.137705] Code: 00 00 00 74 17 49 8b 48 20 44 8b 59 10 41 83 e3 30 41 83 fb 20 74 1e be 85 00 00 00 41 ba 01 00 00 00 41 b9 01 00 00 04 0f 05 <48> 3d 01 f0 ff ff 73 1f 31 c0 c3 be 8c 00 00 00 49 89 c8 4d 31 d2
  [  461.137738] RSP: 002b:00007efe43ee4928 EFLAGS: 00000283 ORIG_RAX: 00000000000000ca
  [  461.137739] RAX: ffffffffffffffda RBX: 0000000005094df0 RCX: 00007efe4918ca26
  [  461.137740] RDX: 0000000000000001 RSI: 0000000000000085 RDI: 0000000005094e24
  [  461.137741] RBP: 00007efe43ee49c0 R08: 0000000005094e20 R09: 0000000004000001
  [  461.137741] R10: 0000000000000001 R11: 0000000000000283 R12: 0000000000000000
  [  461.137742] R13: 0000000005094df8 R14: 0000000000000001 R15: 0000000000448a10
  [  461.137743] ---[ end trace 187df4cad2bf7649 ]---

This warning happened in the push_dl_task(), because
__add_running_bw()->cpufreq_update_util() is getting the rq_clock of
the later_rq before its update, which takes place at activate_task().
The fix then is to update the rq_clock before calling add_running_bw().

To avoid double rq_clock_update() call, we set ENQUEUE_NOCLOCK flag to
activate_task().

Reported-by: Daniel Casini <daniel.casini@santannapisa.it>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>
Fixes: e0367b1267 sched/deadline: Move CPU frequency selection triggering points
Link: http://lkml.kernel.org/r/ca31d073a4788acf0684a8b255f14fea775ccf20.1532077269.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25 11:29:08 +02:00
Isaac J. Manjarres
2610e88946 stop_machine: Disable preemption after queueing stopper threads
This commit:

  9fb8d5dc4b ("stop_machine, Disable preemption when waking two stopper threads")

does not fully address the race condition that can occur
as follows:

On one CPU, call it CPU 3, thread 1 invokes
cpu_stop_queue_two_works(2, 3,...), and the execution is such
that thread 1 queues the works for migration/2 and migration/3,
and is preempted after releasing the locks for migration/2 and
migration/3, but before waking the threads.

Then, On CPU 2, a kworker, call it thread 2, is running,
and it invokes cpu_stop_queue_two_works(1, 2,...), such that
thread 2 queues the works for migration/1 and migration/2.
Meanwhile, on CPU 3, thread 1 resumes execution, and wakes
migration/2 and migration/3. This means that when CPU 2
releases the locks for migration/1 and migration/2, but before
it wakes those threads, it can be preempted by migration/2.

If thread 2 is preempted by migration/2, then migration/2 will
execute the first work item successfully, since migration/3
was woken up by CPU 3, but when it goes to execute the second
work item, it disables preemption, calls multi_cpu_stop(),
and thus, CPU 2 will wait forever for migration/1, which should
have been woken up by thread 2. However migration/1 cannot be
woken up by thread 2, since it is a kworker, so it is affine to
CPU 2, but CPU 2 is running migration/2 with preemption
disabled, so thread 2 will never run.

Disable preemption after queueing works for stopper threads
to ensure that the operation of queueing the works and waking
the stopper threads is atomic.

Co-Developed-by: Prasad Sodagudi <psodagud@codeaurora.org>
Co-Developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bigeasy@linutronix.de
Cc: gregkh@linuxfoundation.org
Cc: matt@codeblueprint.co.uk
Fixes: 9fb8d5dc4b ("stop_machine, Disable preemption when waking two stopper threads")
Link: http://lkml.kernel.org/r/1531856129-9871-1-git-send-email-isaacm@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25 11:25:08 +02:00
Yi Wang
6cd0c583b0 sched/topology: Check variable group before dereferencing it
The 'group' variable in sched_domain_debug_one() is not checked
when firstly used in cpumask_test_cpu(cpu, sched_group_span(group)),
but it might be NULL (it is checked later in the following while loop)
and may cause NULL pointer dereference.

We need to check it before using to avoid NULL dereference.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: zhong.weidong@zte.com.cn
Link: http://lkml.kernel.org/r/1532319547-33335-1-git-send-email-wang.yi59@zte.com.cn
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25 11:25:07 +02:00
Peter Rosin
62cedf3e60 locking/rtmutex: Allow specifying a subclass for nested locking
Needed for annotating rt_mutex locks.

Tested-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Deepa Dinamani <deepadinamani@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Chang <dpf@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wolfram Sang <wsa@the-dreams.de>
Link: http://lkml.kernel.org/r/20180720083914.1950-2-peda@axentia.se
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25 11:22:19 +02:00
Linus Torvalds
0723090656 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Handle stations tied to AP_VLANs properly during mac80211 hw
    reconfig. From Manikanta Pubbisetty.

 2) Fix jump stack depth validation in nf_tables, from Taehee Yoo.

 3) Fix quota handling in aRFS flow expiration of mlx5 driver, from Eran
    Ben Elisha.

 4) Exit path handling fix in powerpc64 BPF JIT, from Daniel Borkmann.

 5) Use ptr_ring_consume_bh() in page pool code, from Tariq Toukan.

 6) Fix cached netdev name leak in nf_tables, from Florian Westphal.

 7) Fix memory leaks on chain rename, also from Florian Westphal.

 8) Several fixes to DCTCP congestion control ACK handling, from Yuchunk
    Cheng.

 9) Missing rcu_read_unlock() in CAIF protocol code, from Yue Haibing.

10) Fix link local address handling with VRF, from David Ahern.

11) Don't clobber 'err' on a successful call to __skb_linearize() in
    skb_segment(). From Eric Dumazet.

12) Fix vxlan fdb notification races, from Roopa Prabhu.

13) Hash UDP fragments consistently, from Paolo Abeni.

14) If TCP receives lots of out of order tiny packets, we do really
    silly stuff. Make the out-of-order queue ending more robust to this
    kind of behavior, from Eric Dumazet.

15) Don't leak netlink dump state in nf_tables, from Florian Westphal.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
  net: axienet: Fix double deregister of mdio
  qmi_wwan: fix interface number for DW5821e production firmware
  ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull
  bnx2x: Fix invalid memory access in rss hash config path.
  net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
  r8169: restore previous behavior to accept BIOS WoL settings
  cfg80211: never ignore user regulatory hint
  sock: fix sg page frag coalescing in sk_alloc_sg
  netfilter: nf_tables: move dumper state allocation into ->start
  tcp: add tcp_ooo_try_coalesce() helper
  tcp: call tcp_drop() from tcp_data_queue_ofo()
  tcp: detect malicious patterns in tcp_collapse_ofo_queue()
  tcp: avoid collapses in tcp_prune_queue() if possible
  tcp: free batches of packets in tcp_prune_ofo_queue()
  ip: hash fragments consistently
  ipv6: use fib6_info_hold_safe() when necessary
  can: xilinx_can: fix power management handling
  can: xilinx_can: fix incorrect clear of non-processed interrupts
  can: xilinx_can: fix RX overflow interrupt not being enabled
  can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
  ...
2018-07-24 17:31:47 -07:00
Martin KaFai Lau
6283fa38dc bpf: btf: Ensure the member->offset is in the right order
This patch ensures the member->offset of a struct
is in the correct order (i.e the later member's offset cannot
go backward).

The current "pahole -J" BTF encoder does not generate something
like this.  However, checking this can ensure future encoder
will not violate this.

Fixes: 69b693f0ae ("bpf: btf: Introduce BPF Type Format (BTF)")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-24 01:20:44 +02:00
Linus Torvalds
48b1db7c7a Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Two fixes: a stop-machine preemption fix and a SCHED_DEADLINE fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Fix switched_from_dl() warning
  stop_machine: Disable preemption when waking two stopper threads
2018-07-21 17:21:34 -07:00
Linus Torvalds
490fc05386 mm: make vm_area_alloc() initialize core fields
Like vm_area_dup(), it initializes the anon_vma_chain head, and the
basic mm pointer.

The rest of the fields end up being different for different users,
although the plan is to also initialize the 'vm_ops' field to a dummy
entry.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-21 15:24:03 -07:00
Linus Torvalds
95faf6992d mm: make vm_area_dup() actually copy the old vma data
.. and re-initialize th eanon_vma_chain head.

This removes some boiler-plate from the users, and also makes it clear
why it didn't need use the 'zalloc()' version.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-21 14:48:45 -07:00
Linus Torvalds
3928d4f5ee mm: use helper functions for allocating and freeing vm_area structs
The vm_area_struct is one of the most fundamental memory management
objects, but the management of it is entirely open-coded evertwhere,
ranging from allocation and freeing (using kmem_cache_[z]alloc and
kmem_cache_free) to initializing all the fields.

We want to unify this in order to end up having some unified
initialization of the vmas, and the first step to this is to at least
have basic allocation functions.

Right now those functions are literally just wrappers around the
kmem_cache_*() calls.  This is a purely mechanical conversion:

    # new vma:
    kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc()

    # copy old vma
    kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old)

    # free vma
    kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma)

to the point where the old vma passed in to the vm_area_dup() function
isn't even used yet (because I've left all the old manual initialization
alone).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-21 13:48:51 -07:00
Martin KaFai Lau
36fc3c8c28 bpf: btf: Clean up BTF_INT_BITS() in uapi btf.h
This patch shrinks the BTF_INT_BITS() mask.  The current
btf_int_check_meta() ensures the nr_bits of an integer
cannot exceed 64.  Hence, it is mostly an uapi cleanup.

The actual btf usage (i.e. seq_show()) is also modified
to use u8 instead of u16.  The verification (e.g. btf_int_check_meta())
path stays as is to deal with invalid BTF situation.

Fixes: 69b693f0ae ("bpf: btf: Introduce BPF Type Format (BTF)")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-20 10:25:48 +02:00
Linus Torvalds
024ddc0ce1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Lots of fixes, here goes:

   1) NULL deref in qtnfmac, from Gustavo A. R. Silva.

   2) Kernel oops when fw download fails in rtlwifi, from Ping-Ke Shih.

   3) Lost completion messages in AF_XDP, from Magnus Karlsson.

   4) Correct bogus self-assignment in rhashtable, from Rishabh
      Bhatnagar.

   5) Fix regression in ipv6 route append handling, from David Ahern.

   6) Fix masking in __set_phy_supported(), from Heiner Kallweit.

   7) Missing module owner set in x_tables icmp, from Florian Westphal.

   8) liquidio's timeouts are HZ dependent, fix from Nicholas Mc Guire.

   9) Link setting fixes for sh_eth and ravb, from Vladimir Zapolskiy.

  10) Fix NULL deref when using chains in act_csum, from Davide Caratti.

  11) XDP_REDIRECT needs to check if the interface is up and whether the
      MTU is sufficient. From Toshiaki Makita.

  12) Net diag can do a double free when killing TCP_NEW_SYN_RECV
      connections, from Lorenzo Colitti.

  13) nf_defrag in ipv6 can unnecessarily hold onto dst entries for a
      full minute, delaying device unregister. From Eric Dumazet.

  14) Update MAC entries in the correct order in ixgbe, from Alexander
      Duyck.

  15) Don't leave partial mangles bpf program in jit_subprogs, from
      Daniel Borkmann.

  16) Fix pfmemalloc SKB state propagation, from Stefano Brivio.

  17) Fix ACK handling in DCTCP congestion control, from Yuchung Cheng.

  18) Use after free in tun XDP_TX, from Toshiaki Makita.

  19) Stale ipv6 header pointer in ipv6 gre code, from Prashant Bhole.

  20) Don't reuse remainder of RX page when XDP is set in mlx4, from
      Saeed Mahameed.

  21) Fix window probe handling of TCP rapair sockets, from Stefan
      Baranoff.

  22) Missing socket locking in smc_ioctl(), from Ursula Braun.

  23) IPV6_ILA needs DST_CACHE, from Arnd Bergmann.

  24) Spectre v1 fix in cxgb3, from Gustavo A. R. Silva.

  25) Two spots in ipv6 do a rol32() on a hash value but ignore the
      result. Fixes from Colin Ian King"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (176 commits)
  tcp: identify cryptic messages as TCP seq # bugs
  ptp: fix missing break in switch
  hv_netvsc: Fix napi reschedule while receive completion is busy
  MAINTAINERS: Drop inactive Vitaly Bordug's email
  net: cavium: Add fine-granular dependencies on PCI
  net: qca_spi: Fix log level if probe fails
  net: qca_spi: Make sure the QCA7000 reset is triggered
  net: qca_spi: Avoid packet drop during initial sync
  ipv6: fix useless rol32 call on hash
  ipv6: sr: fix useless rol32 call on hash
  net: sched: Using NULL instead of plain integer
  net: usb: asix: replace mii_nway_restart in resume path
  net: cxgb3_main: fix potential Spectre v1
  lib/rhashtable: consider param->min_size when setting initial table size
  net/smc: reset recv timeout after clc handshake
  net/smc: add error handling for get_user()
  net/smc: optimize consumer cursor updates
  net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL.
  ipv6: ila: select CONFIG_DST_CACHE
  net: usb: rtl8150: demote allmulti message to dev_dbg()
  ...
2018-07-18 19:32:54 -07:00
Linus Torvalds
3c53776e29 Mark HI and TASKLET softirq synchronous
Way back in 4.9, we committed 4cd13c21b2 ("softirq: Let ksoftirqd do
its job"), and ever since we've had small nagging issues with it.  For
example, we've had:

  1ff688209e ("watchdog: core: make sure the watchdog_worker is not deferred")
  8d5755b3f7 ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
  217f697436 ("net: busy-poll: allow preemption in sk_busy_loop()")

all of which worked around some of the effects of that commit.

The DVB people have also complained that the commit causes excessive USB
URB latencies, which seems to be due to the USB code using tasklets to
schedule USB traffic.  This seems to be an issue mainly when already
living on the edge, but waiting for ksoftirqd to handle it really does
seem to cause excessive latencies.

Now Hanna Hawa reports that this issue isn't just limited to USB URB and
DVB, but also causes timeout problems for the Marvell SoC team:

 "I'm facing kernel panic issue while running raid 5 on sata disks
  connected to Macchiatobin (Marvell community board with Armada-8040
  SoC with 4 ARMv8 cores of CA72) Raid 5 built with Marvell DMA engine
  and async_tx mechanism (ASYNC_TX_DMA [=y]); the DMA driver (mv_xor_v2)
  uses a tasklet to clean the done descriptors from the queue"

The latency problem causes a panic:

  mv_xor_v2 f0400000.xor: dma_sync_wait: timeout!
  Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction

We've discussed simply just reverting the original commit entirely, and
also much more involved solutions (with per-softirq threads etc).  This
patch is intentionally stupid and fairly limited, because the issue
still remains, and the other solutions either got sidetracked or had
other issues.

We should probably also consider the timer softirqs to be synchronous
and not be delayed to ksoftirqd (since they were the issue with the
earlier watchdog problems), but that should be done as a separate patch.
This does only the tasklet cases.

Reported-and-tested-by: Hanna Hawa <hannah@marvell.com>
Reported-and-tested-by: Josef Griebichler <griebichler.josef@gmx.at>
Reported-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-17 11:12:43 -07:00