Commit Graph

721807 Commits

Author SHA1 Message Date
Rafael J. Wysocki
29aaf90875 Merge branch 'pm-domains'
* pm-domains:
  PM / Domains: Fix genpd to deal with drivers returning 1 from ->prepare()
  PM / domains: Rework governor code to be more consistent
  PM / Domains: Remove gpd_dev_ops.active_wakeup() callback
  soc: rockchip: power-domain: Use GENPD_FLAG_ACTIVE_WAKEUP
  soc: mediatek: Use GENPD_FLAG_ACTIVE_WAKEUP
  ARM: shmobile: pm-rmobile: Use GENPD_FLAG_ACTIVE_WAKEUP
  PM / Domains: Allow genpd users to specify default active wakeup behavior
  PM / Domains: Add support to select performance-state of domains
  PM / Domains: Rename genpd internals from pm_genpd_* to genpd_*
2017-11-13 01:33:35 +01:00
Linus Torvalds
9d5604101e Merge branch 'kallsyms-restrictions'
Merge /proc/kallsyms pointer value restrictions.

Instead of using %pK, and making it about root access (at the wrong
time, no less), make the whole choice of whether to show the actual
pointer value be very explicit to the kallsyms code.

In particular, we can now default to not doing so, and yet avoid
annoying kernel profiling by actually looking at whether kernel
profiling is allowed or not (by default it is not).

This is all mostly preparation for the real "let's stop leaking kernel
addresses" work that Tobin Harding is working on.

Small steps.

* kallsyms-restrictions:
  stop using '%pK' for /proc/kallsyms pointer values
2017-11-12 16:33:33 -08:00
Rafael J. Wysocki
040e8a4a4c Merge branches 'pm-pci', 'pm-avs' and 'pm-docs'
* pm-pci:
  PCI / PM: Add dev_dbg() to print device suspend power states
  PCI / PM: Do not resume any devices in pci_pm_prepare()

* pm-avs:
  PM / AVS: Use %pS printk format for direct addresses

* pm-docs:
  PM: docs: Fix formatting typo in devices.rst
2017-11-13 01:32:25 +01:00
Naveen N. Rao
46725b17f1 powerpc/signal: Properly handle return value from uprobe_deny_signal()
When a uprobe is installed on an instruction that we currently do not
emulate, we copy the instruction into a xol buffer and single step
that instruction. If that instruction generates a fault, we abort the
single stepping before invoking the signal handler. Once the signal
handler is done, the uprobe trap is hit again since the instruction is
retried and the process repeats.

We use uprobe_deny_signal() to detect if the xol instruction triggered
a signal. If so, we clear TIF_SIGPENDING and set TIF_UPROBE so that the
signal is not handled until after the single stepping is aborted. In
this case, uprobe_deny_signal() returns true and get_signal() ends up
returning 0. However, in do_signal(), we are not looking at the return
value, but depending on ksig.sig for further action, all with an
uninitialized ksig that is not touched in this scenario. Fix the same
by initializing ksig.sig to 0.

Fixes: 129b69df9c ("powerpc: Use get_signal() signal_setup_done()")
Cc: stable@vger.kernel.org # v3.17+
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-13 10:53:05 +11:00
Michal Suchanek
dcdc46794b powerpc/fadump: use kstrtoint to handle sysfs store
Currently sysfs store handlers in fadump use if buf[0] == 'char'.

This means input "100foo" is interpreted as '1' and "01" as '0'.

Change to kstrtoint so leading zeroes and the like is handled in
expected way.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Acked-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michal Suchanek <a class="moz-txt-link-rfc2396E" href="mailto:msuchanek@suse.de">&lt;msuchanek@suse.de&gt;</a></pre>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-13 10:51:38 +11:00
Dan Williams
0e7f074145 acpi, nfit: validate commands against the device type
Fix occasions in acpi_nfit_ctl where we check the command type without
validating whether we are parsing dimm vs bus level commands. Where the
command numbers alias between dimms and bus we can make the wrong
assumption just checking the raw command number. For example, with a
simple nfit_test mock up of the clear-error command we trigger the
following:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000094
    IP: acpi_nfit_ctl+0x29b/0x930 [nfit]
    [..]
    Call Trace:
     nfit_test_probe+0xb85/0xc09 [nfit_test]
     platform_drv_probe+0x3b/0xa0
     ? platform_drv_probe+0x3b/0xa0
     driver_probe_device+0x29c/0x450
     ? test_alloc+0x180/0x180 [nfit_test]
     __driver_attach+0xe3/0xf0
     ? driver_probe_device+0x450/0x450
     bus_for_each_dev+0x73/0xc0
     driver_attach+0x1e/0x20
     bus_add_driver+0x173/0x270
     driver_register+0x60/0xe0
     __platform_driver_register+0x36/0x40
     nfit_test_init+0x2a1/0x1000 [nfit_test]

Fixes: 4b27db7e26 ("acpi, nfit: add support for the _LSI, _LSR, and...")
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-12 15:18:57 -08:00
Rasmus Villemoes
ffc661c99f genirq: Fix type of shifting literal 1 in __setup_irq()
If ffz() ever returns a value >= 31 then the following shift is undefined
behaviour because the literal 1 which gets shifted is treated as signed
integer.

In practice, the bug is probably harmless, since the first undefined shift
count is 31 which results - ignoring UB - in (int)(0x80000000). This gets
sign extended so bit 32-63 will be set as well and all subsequent
__setup_irq() calls would just end up hitting the -EBUSY branch.

However, a sufficiently aggressive optimizer may use the UB of 1<<31
to decide that doesn't happen, and hence elide the sign-extension
code, so that subsequent calls can indeed get ffz > 31.

In any case, the right thing to do is to make the literal 1UL.

[ tglx: For this to happen a single interrupt would have to be shared by 32
  	devices. Hardware like that does not exist and would have way more
  	problems than that. ]

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20171030213548.16831-1-linux@rasmusvillemoes.dk
2017-11-12 23:25:40 +01:00
Rasmus Villemoes
306eb5a38d irqdomain: Drop pointless NULL check in virq_debug_show_one
data has been already derefenced unconditionally, so it's pointless to do a
NULL pointer check on it afterwards. Drop it.

[ tglx: Depersonify changelog. ]

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/20171112212904.28574-1-linux@rasmusvillemoes.dk
2017-11-12 23:25:40 +01:00
Wen Yaxng
6714796edc genirq/proc: Return proper error code when irq_set_affinity() fails
write_irq_affinity() returns the number of written bytes, which means
success, unconditionally whether the actual irq_set_affinity() call
succeeded or not.

Add proper error handling and pass the error code returned from
irq_set_affinity() back to user space in case of failure.

[ tglx: Fixed coding style and massaged changelog ]

Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Cc: zhong.weidong@zte.com.cn
Link: https://lkml.kernel.org/r/1510106103-184761-1-git-send-email-wen.yang99@zte.com.cn
2017-11-12 23:25:39 +01:00
Randy Dunlap
ba1029c9cb modpost: detect modules without a MODULE_LICENSE
Partially revert commit 2fa3656829 ("kbuild: soften MODULE_LICENSE
check") so that modpost detects modules that do not have a
MODULE_LICENSE.

Sam's commit also changed the fatal error to a warning, which I am
leaving as is.

This gives advance notice of when a module has no license and will taint
the kernel if the module is loaded.

This produces the following warnings on x86_64 allmodconfig:

    MODPOST 6520 modules
  WARNING: modpost: missing MODULE_LICENSE() in drivers/auxdisplay/img-ascii-lcd.o
  WARNING: modpost: missing MODULE_LICENSE() in drivers/gpio/gpio-ath79.o
  WARNING: modpost: missing MODULE_LICENSE() in drivers/gpio/gpio-iop.o
  WARNING: modpost: missing MODULE_LICENSE() in drivers/iio/accel/kxsd9-i2c.o
  WARNING: modpost: missing MODULE_LICENSE() in drivers/iio/adc/qcom-vadc-common.o
  WARNING: modpost: missing MODULE_LICENSE() in drivers/media/platform/mtk-vcodec/mtk-vcodec-common.o
  WARNING: modpost: missing MODULE_LICENSE() in drivers/media/platform/soc_camera/soc_scale_crop.o
  WARNING: modpost: missing MODULE_LICENSE() in drivers/mtd/nand/denali_pci.o
  WARNING: modpost: missing MODULE_LICENSE() in drivers/net/phy/cortina.o
  WARNING: modpost: missing MODULE_LICENSE() in drivers/pinctrl/pxa/pinctrl-pxa2xx.o
  WARNING: modpost: missing MODULE_LICENSE() in drivers/power/reset/zx-reboot.o
  WARNING: modpost: missing MODULE_LICENSE() in drivers/rpmsg/qcom_glink_native.o
  WARNING: modpost: missing MODULE_LICENSE() in drivers/staging/comedi/drivers/ni_atmio.o
  WARNING: modpost: missing MODULE_LICENSE() in net/9p/9pnet_xen.o
  WARNING: modpost: missing MODULE_LICENSE() in sound/soc/codecs/snd-soc-pcm512x-spi.o

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-12 13:47:40 -08:00
Oliver O'Halloran
6c44741d75 powerpc/lib: Implement UACCESS_FLUSHCACHE API
Implement the architecture specific portitions of the UACCESS_FLUSHCACHE
API. This provides functions for the copy_user_flushcache iterator that
ensure that when the copy is finished the destination buffer contains
a copy of the original and that the destination buffer is clean in the
processor caches.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-13 08:00:31 +11:00
Oliver O'Halloran
32ce3862af powerpc/lib: Implement PMEM API
Implement the architecture specific cache maintence functions that make
up the "PMEM API". Currently the writeback and invalidate functions
are the same since the function of the DCBST (data cache block store)
instruction is typically interpreted as "writeback to the point of
coherency" rather than to memory. As a result implementing the API
requires a full cache flush rather than just a cache write back. This
will probably change in the not-too-distant future.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-13 08:00:30 +11:00
Alistair Popple
1b2c2b1238 powerpc/powernv/npu: Don't explicitly flush nmmu tlb
The nest mmu required an explicit flush as a tlbi would not flush it in the
same way as the core. However an alternate firmware fix exists which should
eliminate the need for this flush, so instead add a device-tree property
(ibm,nmmu-flush) on the NVLink2 PHB to enable it only if required.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-13 08:00:30 +11:00
Alistair Popple
2a31ad093b powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm()
With the optimisations introduced by commit a46cc7a908 ("powerpc/mm/radix:
Improve TLB/PWC flushes"), flush_tlb_mm() no longer flushes the page walk
cache with radix. Switch to using flush_all_mm() to ensure the pwc and tlb
are properly flushed on the nmmu.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-13 08:00:29 +11:00
Vaidyanathan Srinivasan
8d4e10e9ed powerpc/powernv/idle: Round up latency and residency values
On PowerNV platforms, firmware provides exit latency and
target residency for each of the idle states in nano
seconds.  Cpuidle framework expects the values in micro
seconds.  Round up to nearest micro seconds to avoid errors
in cases where the values are defined as fractional micro
seconds.

Default idle state of 'snooze' has exit latency of zero.  If
other states have fractional micro second exit latency, they
would get rounded down to zero micro second and make cpuidle
framework choose deeper idle state when snooze loop is the
right choice.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-13 08:00:29 +11:00
Linus Torvalds
bebc6082da Linux 4.14 2017-11-12 10:46:13 -08:00
Linus Torvalds
152bbb43b3 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of small fixes:

   - make KGDB work again which got broken by the conversion of WARN()
     to #UD. The WARN fixup needs to run before the notifier callchain,
     otherwise KGDB tries to handle it and crashes.

   - disable KASAN in the ORC unwinder to prevent false positive KASAN
     warnings

   - prevent default mapping above 47bit when 5 level page tables are
     enabled

   - make the delay calibration optimization work correctly, which had
     the conditionals the wrong way around and was operating on data
     which was not yet updated.

   - remove the bogus X86_TRAP_BP trap init from the default IDT init
     table, which broke 32bit int3 handling by overwriting the correct
     int3 setup.

   - replace this_cpu* with boot_cpu_data access in the preemptible
     oprofile init code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/debug: Handle warnings before the notifier chain, to fix KGDB crash
  x86/mm: Fix ELF_ET_DYN_BASE for 5-level paging
  x86/idt: Remove X86_TRAP_BP initialization in idt_setup_traps()
  x86/oprofile/ppro: Do not use __this_cpu*() in preemptible context
  x86/unwind: Disable KASAN checking in the ORC unwinder
  x86/smpboot: Make optimization of delay calibration work correctly
2017-11-12 10:12:41 -08:00
Linus Torvalds
69581c7472 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tool fixes from Thomas Gleixner:
 "A small set of fixes for perf tool:

   - synchronize the i915 drm header to avoid the 'out of date' warning

   - make sure that perf trace cleans up its temporary files on exit

   - unbreak the build with newer flex versions

   - add missing braces in the eBPF parsing rules"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tooling/headers: Sync the tools/include/uapi/drm/i915_drm.h UAPI header
  perf trace: Call machine__exit() at exit
  perf tools: Fix eBPF event specification parsing
  perf tools: Add "reject" option for parse-events.l
2017-11-12 09:43:53 -08:00
Johan Hovold
00ee9a1ca5 irqchip/gic-v3: Fix ppi-partitions lookup
Fix child-node lookup during initialisation, which ended up searching
the whole device tree depth-first starting at the parent rather than
just matching on its children.

To make things worse, the parent gic node was prematurely freed, while
the ppi-partitions node was leaked.

Fixes: e3825ba1af ("irqchip/gic-v3: Add support for partitioned PPIs")
Cc: stable <stable@vger.kernel.org>     # 4.7
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-11-12 14:23:19 +00:00
David Howells
b24591e2fc timers: Add a function to start/reduce a timer
Add a function, similar to mod_timer(), that will start a timer if it isn't
running and will modify it if it is running and has an expiry time longer
than the new time.  If the timer is running with an expiry time that's the
same or sooner, no change is made.

The function looks like:

	int timer_reduce(struct timer_list *timer, unsigned long expires);

This can be used by code such as networking code to make it easier to share
a timer for multiple timeouts.  For instance, in upcoming AF_RXRPC code,
the rxrpc_call struct will maintain a number of timeouts:

	unsigned long	ack_at;
	unsigned long	resend_at;
	unsigned long	ping_at;
	unsigned long	expect_rx_by;
	unsigned long	expect_req_by;
	unsigned long	expect_term_by;

each of which is set independently of the others.  With timer reduction
available, when the code needs to set one of the timeouts, it only needs to
look at that timeout and then call timer_reduce() to modify the timer,
starting it or bringing it forward if necessary.  There is no need to refer
to the other timeouts to see which is earliest and no need to take any lock
other than, potentially, the timer lock inside timer_reduce().

Note, that this does not protect against concurrent invocations of any of
the timer functions.

As an example, the expect_rx_by timeout above, which terminates a call if
we don't get a packet from the server within a certain time window, would
be set something like this:

	unsigned long now = jiffies;
	unsigned long expect_rx_by = now + packet_receive_timeout;
	WRITE_ONCE(call->expect_rx_by, expect_rx_by);
	timer_reduce(&call->timer, expect_rx_by);

The timer service code (which might, say, be in a work function) would then
check all the timeouts to see which, if any, had triggered, deal with
those:

	t = READ_ONCE(call->ack_at);
	if (time_after_eq(now, t)) {
		cmpxchg(&call->ack_at, t, now + MAX_JIFFY_OFFSET);
		set_bit(RXRPC_CALL_EV_ACK, &call->events);
	}

and then restart the timer if necessary by finding the soonest timeout that
hasn't yet passed and then calling timer_reduce().

The disadvantage of doing things this way rather than comparing the timers
each time and calling mod_timer() is that you *will* take timer events
unless you can finish what you're doing and delete the timer in time.

The advantage of doing things this way is that you don't need to use a lock
to work out when the next timer should be set, other than the timer's own
lock - which you might not have to take.

[ tglx: Fixed weird formatting and adopted it to pending changes ]

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keyrings@vger.kernel.org
Cc: linux-afs@lists.infradead.org
Link: https://lkml.kernel.org/r/151023090769.23050.1801643667223880753.stgit@warthog.procyon.org.uk
2017-11-12 15:10:27 +01:00
Arnd Bergmann
df27067e60 pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday()
__getnstimeofday() is a rather odd interface, with a number of quirks:

- The caller may come from NMI context, but the implementation is not NMI safe,
  one way to get there from NMI is

      NMI handler:
        something bad
          panic()
            kmsg_dump()
              pstore_dump()
                 pstore_record_init()
                   __getnstimeofday()

- The calling conventions are different from any other timekeeping functions,
  to deal with returning an error code during suspended timekeeping.

Address the above issues by using a completely different method to get the
time: ktime_get_real_fast_ns() is NMI safe and has a reasonable behavior
when timekeeping is suspended: it returns the time at which it got
suspended. As Thomas Gleixner explained, this is safe, as
ktime_get_real_fast_ns() does not call into the clocksource driver that
might be suspended.

The result can easily be transformed into a timespec structure. Since
ktime_get_real_fast_ns() was not exported to modules, add the export.

The pstore behavior for the suspended case changes slightly, as it now
stores the timestamp at which timekeeping was suspended instead of storing
a zero timestamp.

This change is not addressing y2038-safety, that's subject to a more
complex follow up patch.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Colin Cross <ccross@android.com>
Link: https://lkml.kernel.org/r/20171110152530.1926955-1-arnd@arndb.de
2017-11-12 15:05:52 +01:00
Naveen N. Rao
acdfe93101 powerpc/kprobes: refactor kprobe_lookup_name for safer string operations
Use safer string manipulation functions when dealing with a
user-provided string in kprobe_lookup_name().

Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 23:51:43 +11:00
Naveen N. Rao
67ac0bfe29 powerpc/kprobes: Blacklist emulate_update_regs() from kprobes
Commit 3cdfcbfd32 ("powerpc: Change analyse_instr so it doesn't
modify *regs") introduced emulate_update_regs() to perform part of what
emulate_step() was doing earlier. However, this function was not added
to the kprobes blacklist. Add it so as to prevent it from being probed.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 23:51:42 +11:00
Naveen N. Rao
f72180cc93 powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace
Per Documentation/kprobes.txt, we don't necessarily need to disable
interrupts before invoking the kprobe handlers. Masami submitted
similar changes for x86 via commit a19b2e3d78 ("kprobes/x86: Remove
IRQ disabling from ftrace-based/optimized kprobes"). Do the same for
powerpc.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 23:51:41 +11:00
Naveen N. Rao
8a2d71a3f2 powerpc/kprobes: Disable preemption before invoking probe handler for optprobes
Per Documentation/kprobes.txt, probe handlers need to be invoked with
preemption disabled. Update optimized_callback() to do so. Also move
get_kprobe_ctlblk() invocation post preemption disable, since it
accesses pre-cpu data.

This was not an issue so far since optprobes wasn't selected if
CONFIG_PREEMPT was enabled. Commit a30b85df7d ("kprobes: Use
synchronize_rcu_tasks() for optprobe with CONFIG_PREEMPT=y") changes
this.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 23:51:40 +11:00
Stephen Rothwell
fc2a5a6161 powerpc/64s: ppc_save_regs is now needed for all 64s builds
Commit 78adf6c214 ("powerpc/64s: Implement system reset idle wakeup
reason"), added a call to ppc_save_regs() in the book3s code.

ppc_save_regs() is only built if XMON and/or KEXEC_CORE are enabled,
which is usually the case, however if they're not enabled then the
build breaks.

Fix it by making the Makefile check also build ppc_save_regs.o if
CONFIG_PPC_BOOK3S is enabled.

Fixes: 78adf6c214 ("powerpc/64s: Implement system reset idle wakeup reason")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[mpe: Write change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 23:44:36 +11:00
Balbir Singh
f79ad50ea3 powerpc/mm/radix: Fix crashes on Power9 DD1 with radix MMU and STRICT_RWX
When using the radix MMU on Power9 DD1, to work around a hardware
problem, radix__pte_update() is required to do a two stage update of
the PTE. First we write a zero value into the PTE, then we flush the
TLB, and then we write the new PTE value.

In the normal case that works OK, but it does not work if we're
updating the PTE that maps the code we're executing, because the
mapping is removed by the TLB flush and we can no longer execute from
it. Unfortunately the STRICT_RWX code needs to do exactly that.

The exact symptoms when we hit this case vary, sometimes we print an
oops and then get stuck after that, but I've also seen a machine just
get stuck continually page faulting with no oops printed. The variance
is presumably due to the exact layout of the text and the page size
used for the mappings. In all cases we are unable to boot to a shell.

There are possible solutions such as creating a second mapping of the
TLB flush code, executing from that, and then jumping back to the
original. However we don't want to add that level of complexity for a
DD1 work around.

So just detect that we're running on Power9 DD1 and refrain from
changing the permissions, effectively disabling STRICT_RWX on Power9
DD1.

Fixes: 7614ff3272 ("powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for Radix")
Cc: stable@vger.kernel.org # v4.13+
Reported-by: Andrew Jeffery <andrew@aj.id.au>
[Changelog as suggested by Michael Ellerman <mpe@ellerman.id.au>]
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 23:25:48 +11:00
Thomas Gleixner
d00a08cf9e irq/work: Use llist_for_each_entry_safe
The llist_for_each_entry() loop in irq_work_run_list() is unsafe because
once the works PENDING bit is cleared it can be requeued on another CPU.

Use llist_for_each_entry_safe() instead.

Fixes: 16c0890dc6 ("irq/work: Don't reinvent the wheel but use existing llist API")
Reported-by:Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petri Latvala <petri.latvala@intel.com>
Link: http://lkml.kernel.org/r/151027307351.14762.4611888896020658384@mail.alporthouse.com
2017-11-12 13:15:14 +01:00
Xiaochen Shen
2244645ab1 x86/intel_rdt: Fix a silent failure when writing zero value schemata
Writing an invalid schemata with no domain values (e.g., "(L3|MB):"),
results in a silent failure, i.e. the last_cmd_status returns OK,

Check for an empty value and set the result string with a proper error
message and return -EINVAL.

Before the fix:
 # mkdir /sys/fs/resctrl/p1

 # echo "L3:" > /sys/fs/resctrl/p1/schemata
 (silent failure)
 # cat /sys/fs/resctrl/info/last_cmd_status
 ok

 # echo "MB:" > /sys/fs/resctrl/p1/schemata
 (silent failure)
 # cat /sys/fs/resctrl/info/last_cmd_status
 ok

After the fix:
 # mkdir /sys/fs/resctrl/p1

 # echo "L3:" > /sys/fs/resctrl/p1/schemata
 -bash: echo: write error: Invalid argument
 # cat /sys/fs/resctrl/info/last_cmd_status
 Missing 'L3' value

 # echo "MB:" > /sys/fs/resctrl/p1/schemata
 -bash: echo: write error: Invalid argument
 # cat /sys/fs/resctrl/info/last_cmd_status
 Missing 'MB' value

[ Tony: This is an unintended side effect of the patch earlier to allow the
    	user to just write the value they want to change.  While allowing
    	user to specify less than all of the values, it also allows an
    	empty value. ]

Fixes: c4026b7b95 ("x86/intel_rdt: Implement "update" mode when writing schemata file")
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lkml.kernel.org/r/20171110191624.20280-1-tony.luck@intel.com
2017-11-12 09:01:40 +01:00
David S. Miller
fdae5f37a8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-11-12 09:17:05 +09:00
Martin Wilck
a04b5de505 nvme: fix visibility of "uuid" ns attribute
"uuid" must be invisible if both ns->uuid and ns->nguid are unset,
not if either one is.

Fixes: d934f9848a "nvme: provide UUID value to userspace"
Signed-off-by: Martin Wilck <mwilck@suse.com>
[hch: rebased to the nvme-4.15 tree to help resolving a conflict]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-11 15:38:21 -07:00
Haren Myneni
0f46a79a5a crypto/nx: Do not initialize workmem allocation
We are using percpu send window on P9 NX (powerNV) instead of opening
/ closing per each crypto session. Means txwin is removed from
workmem. So we do not need to initialize workmem for each request.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:11 +11:00
Haren Myneni
976dd6490b crypto/nx: Use percpu send window for NX requests
For P9 NX, the send window is opened for each crypto session and
closed upon free. But VAS supports 64K windows per chip for all
coprocessors including in user space support. So there is a
possibility of not getting the window for kernel requests.

This patch reserves windows for each coprocessor type (NX842) and are
available forever for kernel requests, Opens each window for each CPU
on the corresponding chip during driver initialization. So then use
the percpu txwin for NX requests depends on the CPU on which the
process is executing.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:11 +11:00
Sukadev Bhattiprolu
6c8e6bb2a5 powerpc/vas: Add support for user receive window
Add support for user space receive window (for the Fast thread-wakeup
coprocessor type)

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:10 +11:00
Sukadev Bhattiprolu
61f3cca8cd powerpc/vas: Define vas_win_id()
Define an interface to return a system-wide unique id for a given VAS
window.

The vas_win_id() will be used in a follow-on patch to generate an unique
handle for a user space receive window. Applications can use this handle
to pair send and receive windows for fast thread-wakeup.

The hardware refers to this system-wide unique id as a Partition Send
Window ID which is expected to be used during fault handling. Hence the
"pswid" in the function names.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:10 +11:00
Sukadev Bhattiprolu
5676be2fb7 powerpc/vas: Define vas_win_paste_addr()
Define an interface that the NX drivers can use to find the physical
paste address of a send window. This interface is expected to be used
with the mmap() operation of the NX driver's device. i.e the user space
process can use driver's mmap() operation to map the send window's paste
address into their address space and then use copy and paste instructions
to submit the CRBs to the NX engine.

Note that kernel drivers will use vas_paste_crb() directly and don't need
this interface.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:10 +11:00
Sukadev Bhattiprolu
9d2a4d7133 powerpc: Define set_thread_uses_vas()
A CP_ABORT instruction is required in processes that have mapped a VAS
"paste address" with the intention of using COPY/PASTE instructions.
But since CP_ABORT is expensive, we want to restrict it to only
processes that use/intend to use COPY/PASTE.

Define an interface, set_thread_uses_vas(), that VAS can use to
indicate that the current process opened a send window. During context
switch, issue CP_ABORT only for processes that have the flag set.

Thanks for input from Nick Piggin, Michael Ellerman.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[mpe: Fix to not use new_thread after _switch() returns]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:09 +11:00
Sukadev Bhattiprolu
ec233ede4c powerpc: Add support for setting SPRN_TIDR
We need the SPRN_TIDR to be set for use with fast thread-wakeup (core-
to-core wakeup) and also with CAPI.

Each thread in a process needs to have a unique id within the process.
But for now, we assign globally unique thread ids to all threads in
the system.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
[mpe: Simplify tidr clearing on fork() and ctx switch code]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:09 +11:00
Sukadev Bhattiprolu
ece4e51291 powerpc/vas: Export HVWC to debugfs
Export the VAS Window context information to debugfs.

We need to hold a mutex when closing the window to prevent a race
with the debugfs read(). Rather than introduce a per-instance mutex,
we use the global vas_mutex for now, since it is not heavily contended.

The window->cop field is only relevant to a receive window so we were
not setting it for a send window (which is is paired to a receive window
anyway). But to simplify reporting in debugfs, set the 'cop' field for the
send window also.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:09 +11:00
Sukadev Bhattiprolu
d4ef61b5e8 powerpc/vas, nx-842: Define and use chip_to_vas_id()
Define a helper, chip_to_vas_id() to map a given chip id to corresponding
vas id.

Normally, callers of vas_rx_win_open() and vas_tx_win_open() want the VAS
window to be on the same chip where the calling thread is executing. These
callers can pass in -1 for the VAS id.

This interface will be useful if a thread running on one chip wants to open
a window on another chip (like the NX-842 driver does during start up).

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:08 +11:00
Sukadev Bhattiprolu
ca03258b6b powerpc/vas: Create cpu to vas id mapping
Create a cpu to vasid mapping so callers can specify -1 instead of
trying to find a VAS id.

Changelog[v2]
	[Michael Ellerman] Use per-cpu variables to simplify code.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:08 +11:00
Sukadev Bhattiprolu
6fccac16c5 powerpc/vas: poll for return of window credits
Normally, the NX driver waits for the CRBs to be processed before closing
the window. But it is better to ensure that the credits are returned before
the window gets reassigned later.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:08 +11:00
Sukadev Bhattiprolu
62f659e08c powerpc/vas: Save configured window credits
Save the configured max window credits for a window in the vas_window
structure. We will need this when polling for return of window credits.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:07 +11:00
Sukadev Bhattiprolu
dfe954e445 powerpc/vas: Reduce polling interval for busy state
A VAS window is normally in "busy" state for only a short duration.
Reduce the time we wait for the window to go to "not-busy" state to
speed-up vas_win_close() a bit.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:07 +11:00
Sukadev Bhattiprolu
36a288fe9d powerpc/vas: Use helper to unpin/close window
Use a helper to have the hardware unpin and mark a window closed.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:07 +11:00
Sukadev Bhattiprolu
4963ac3632 powerpc/vas: Drop poll_window_cast_out().
Polling for window cast out is listed in the spec, but turns out that
it is not strictly necessary and slows down window close. Making it a
stub for now.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:06 +11:00
Sukadev Bhattiprolu
0a2c2c24cf powerpc/vas: Cleanup some debug code
Clean up vas.h and the debug code around ifdef vas_debug.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:06 +11:00
Sukadev Bhattiprolu
51b537124f powerpc/vas: Validate window credits
NX-842, the only user of VAS, sets the window credits to default values
but VAS should check the credits against the possible max values.

The VAS_WCREDS_MIN is not needed and can be dropped.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:05 +11:00
Sukadev Bhattiprolu
e34917fbee powerpc/vas: init missing fields from [rt]xattr
Initialize a few missing window context fields from the window attributes
specified by the caller. These fields are currently set to their default
values by the caller (NX-842), but would be good to apply them anyway.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-12 09:03:05 +11:00
Linus Torvalds
b39545684a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Use after free in vlan, from Cong Wang.

 2) Handle NAPI poll with a zero budget properly in mlx5 driver, from
    Saeed Mahameed.

 3) If DMA mapping fails in mlx5 driver, NULL out page, from Inbar
    Karmy.

 4) Handle overrun in RX FIFO of sun4i CAN driver, from Gerhard
    Bertelsmann.

 5) Missing return in mdb and vlan prepare phase of DSA layer, from
    Vivien Didelot.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  vlan: fix a use-after-free in vlan_device_event()
  net: dsa: return after vlan prepare phase
  net: dsa: return after mdb prepare phase
  can: ifi: Fix transmitter delay calculation
  tcp: fix tcp_fastretrans_alert warning
  tcp: gso: avoid refcount_t warning from tcp_gso_segment()
  can: peak: Add support for new PCIe/M2 CAN FD interfaces
  can: sun4i: handle overrun in RX FIFO
  can: c_can: don't indicate triple sampling support for D_CAN
  net/mlx5e: Increase Striding RQ minimum size limit to 4 multi-packet WQEs
  net/mlx5e: Set page to null in case dma mapping fails
  net/mlx5e: Fix napi poll with zero budget
  net/mlx5: Cancel health poll before sending panic teardown command
  net/mlx5: Loop over temp list to release delay events
  rds: ib: Fix NULL pointer dereference in debug code
2017-11-11 09:10:39 -08:00