linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-29 06:12:08 +00:00

Author	SHA1	Message	Date
Andi Kleen	54b5091606	perf stat: Implement --metric-only mode Add a new mode to only print metrics. Sometimes we don't care about the raw values, just want the computed metrics. This allows more compact printing, so with -I each sample is only a single line. This also allows easier plotting and processing with other tools. The main target is with using --topdown, but it also works with -T and standard perf stat. A few metrics are not supported. To avoiding having to hardcode all the metrics in the code it uses a two pass approach: first compute dummy metrics and only print the headers in the print_metric callback. Then use the callback to print the actual values. There are some additional changes in the stat printout code to handle all metrics being on a single line. One issue is that the column code doesn't know in advance what events are not supported by the CPU, and it would be hard to find out as this could change based on dynamic conditions. That causes empty columns in some cases. The output can be fairly wide, often you may need more than 80 columns. Example: % perf stat -a -I 1000 --metric-only 1.001452803 frontend cycles idle insn per cycle stalled cycles per insn branch-misses of all branches 1.001452803 158.91% 0.66 2.39 2.92% 2.002192321 180.63% 0.76 2.08 2.96% 3.003088282 150.59% 0.62 2.57 2.84% 4.004369835 196.20% 0.98 1.62 3.79% 5.005227314 231.98% 0.84 1.90 4.71% v2: Lots of updates. v3: Use slightly narrower columns v4: Add comment Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1457049458-28956-6-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2016-03-10 16:49:40 -03:00
Andi Kleen	6b45f7b2a3	perf stat: Document CSV format in manpage With all the recently added fields in the perf stat CSV output we should finally document them in the man page. Do this here. v2: Fix fields in documentation (Jiri) v3: fix order of fields again (Jiri) v4: Change order again. v5: Document more fields (Jiri) v6: Move time stamp first v7: More fixes (Jiri) Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1457049458-28956-5-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2016-03-10 16:49:06 -03:00
Namhyung Kim	599a2f38a9	perf hists browser: Check sort keys before hot key actions The context menu in TUI hists browser checks corresponding sort keys when creating the menu item. But hotkey actions lacks these checks so it can filter using incorrect info. For example, default sort key of 'perf top' doesn't contain 'comm' or 'pid' sort key so each hist entry's thread info is not reliable. Thus it should prohibit using thread filter on 't' key. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1457533253-21419-3-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2016-03-10 16:48:02 -03:00
Namhyung Kim	6962ccb37b	perf hists browser: Allow thread filtering for comm sort key The commit `2eafd410e6` ("perf hists browser: Only 'Zoom into thread' only when sort order has 'pid'") disabled thread filtering in hist browser for the default sort key. However the he->thread is still valid even if 'pid' sort key is not given. Only thing it should not use is the pid (or tid) of the thread. So allow to filter by thread when 'comm' sort key is given and show pid only if 'pid' sort key is given. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1457536490-24084-1-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2016-03-10 16:47:37 -03:00
Namhyung Kim	078b8d4a40	perf tools: Add sort__has_comm variable The sort__has_comm variable is to check whether the comm sort key is given. This is necessary to support thread filtering in the TUI hists browser later. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1457533253-21419-1-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2016-03-10 16:47:19 -03:00
Namhyung Kim	f7fb538afe	perf tools: Recalc total periods using top-level entries in hierarchy When hierarchy mode is enabled, each entry in a hierarchy level shares the period. IOW an upper level entry's period is the sum of lower level entries. Thus perf uses only one of them to calculate the total period of hists. It was lowest-level (leaf) entries but it has a problem when it comes to filters. If a filter is applied, entries in the same level will be filtered or not. But upper level entries still have period of their sum including filtered one. So total sum of upper level entries will not be same as sum of lower level entries. This resulted in entries having more than 100% of overhead and it can be produced using perf top with filter(s). Reported-and-Tested-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1457531222-18130-8-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2016-03-10 16:46:13 -03:00
Namhyung Kim	86e3ee5224	perf tools: Remove nr_sort_keys field The nr_sort_keys field is to carry the number of sort entries in a hpp_list or hists to determine the depth of indentation of a hist entry. As it's only used in hierarchy mode and now we have used nr_hpp_node for this reason, there's no need to keep it anymore. Let's get rid of it. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1457531222-18130-7-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2016-03-10 16:46:08 -03:00
Namhyung Kim	325a62834e	perf hists browser: Cleanup hist_browser__fprintf_hierarchy_entry() The hist_browser__fprintf_hierarchy_entry() if to dump current output into a file so it needs to be sync-ed with the corresponding function hist_browser__show_hierarchy_entry(). So use hists->nr_hpp_node to indent width and use first fmt_node to print overhead columns instead of checking whether it's a sort entry (or dynamic entry). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1457531222-18130-6-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2016-03-10 16:46:04 -03:00
Namhyung Kim	a515d8ff70	perf tools: Remove hist_entry->fmt field It's not used anymore and the output format is accessed by the hpp_list pointer instead when hierarchy is enabled. Let's get rid of it. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1457531222-18130-5-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2016-03-10 16:45:59 -03:00
Namhyung Kim	aec13a7ec7	perf tools: Fix command line filters in hierarchy mode When a command-line filter is applied in hierarchy mode, output is broken especially when filtering on lower level. The higher level entries doesn't show up so it's hard to see the results. Also it needs to handle multi sort keys in a single hierarchy level. Before: $ perf report --hierarchy -s 'cpu,{dso,comm}' --comms swapper --stdio ... # Overhead CPU / Shared Object+Command # ........... ........................... # 13.79% [kernel.vmlinux] swapper 31.71% 000 13.80% [kernel.vmlinux] swapper 0.43% [e1000e] swapper 11.89% [kernel.vmlinux] swapper 9.18% [kernel.vmlinux] swapper After: # Overhead CPU / Shared Object+Command # ........... ............................... # 33.09% 003 13.79% [kernel.vmlinux] swapper 31.71% 000 13.80% [kernel.vmlinux] swapper 0.43% [e1000e] swapper 21.90% 002 11.89% [kernel.vmlinux] swapper 13.30% 001 9.18% [kernel.vmlinux] swapper Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1457531222-18130-4-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2016-03-10 16:45:48 -03:00
Namhyung Kim	4945cf2aa1	perf tools: Add more sort entry check functions Those functions are for checkinf if a given perf_hpp_fmt is a filter-related sort entry. With hierarchy mode, it needs to check filters on the hist entries with its own hpp format list. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1457531222-18130-3-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2016-03-10 16:45:44 -03:00
Namhyung Kim	f4954cfb1c	perf tools: Fix hist_entry__filter() for hierarchy When hierarchy mode is enabled each output format is in a separate hpp list. So when applying a filter it should check all formats in the list. Currently it only checks a single ->fmt field which was not set properly. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1457531222-18130-2-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2016-03-10 16:45:36 -03:00
Jiri Olsa	e12b202f8f	perf jitdump: Build only on supported archs Build jitdump only on architectures defined in util/genelf.h file, to avoid breaking the build on such arches. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Borislav Petkov <bp@suse.de> Cc: Colin Ian King <colin.king@canonical.com> Cc: David Ahern <dsahern@gmail.com> Cc: Davidlohr Bueso <dbueso@suse.com> Cc: He Kuang <hekuang@huawei.com> Cc: Mel Gorman <mgorman@suse.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20160310164113.GA11357@krava.redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2016-03-10 16:33:19 -03:00
Steven Rostedt	9eb42dee2b	tools lib traceevent: Add '~' operation within arg_num_eval() When evaluating values for print flags, if the value included a '~' operator, the parsing would fail. This broke kmalloc's parsing of: __print_flags(REC->gfp_flags, "\|", {(unsigned long)((((((( gfp_t)(0x400000u\|0x2000000u)) \| (( gfp_t)0x40u) \| (( gfp_t)0x80u) \| (( gfp_t)0x20000u)) \| (( gfp_t)0x02u)) \| (( gfp_t)0x08u)) \| (( gfp_t)0x4000u) \| (( gfp_t)0x10000u) \| (( gfp_t)0x1000u) \| (( gfp_t)0x200u)) & ~(( gfp_t)0x2000000u)) ^ \| here Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20160226181328.22f47129@gandalf.local.home Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2016-03-10 16:27:41 -03:00
Boris BREZILLON	9df4f913eb	MAINTAINERS: add a maintainer for the NAND subsystem Add myself as the maintainer of the NAND subsystem. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Acked-by: Richard Weinberger <richard@nod.at> Acked-by: Brian Norris <computersforpeace@gmail.com> Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>	2016-03-10 10:48:04 -08:00
Linus Torvalds	f2c1242194	A few simple fixes for ARM, x86, PPC and generic code. The x86 MMU fix is a bit larger because the surrounding code needed a cleanup, but nothing worrisome. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJW4UwZAAoJEL/70l94x66DG3YH/0PfUr4sW0jnWRVXmYlPVka4 sNFYrdtYnx08PwXu2sWMm1F+OBXlF/t0ZSJXJ9OBF8WdKIu8TU4yBOINRAvGO/oE slrivjktLTKgicTtIXP5BpRR14ohwHIGcuiIlppxvnhmQz1/rMtig7fvhZxYI545 lJyIbyquNR86tiVdUSG9/T9+ulXXXCvOspYv8jPXZx7VKBXKTvp5P5qavSqciRb+ O9RqY+GDCR/5vrw+MV0J7H9ZydeEJeD02LcWguTGMATTm0RCrhydvSbou42UcKfY osWii0kwt2LhcM/sTOz+cWnLJ6gwU9T+ZtJTTbLvYWXWDLP/+icp9ACMkwNciNo= =/y4V -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "A few simple fixes for ARM, x86, PPC and generic code. The x86 MMU fix is a bit larger because the surrounding code needed a cleanup, but nothing worrisome" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo kvm: cap halt polling at exactly halt_poll_ns KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUS KVM: VMX: disable PEBS before a guest entry KVM: PPC: Book3S HV: Sanitize special-purpose register values on guest exit	2016-03-10 10:42:15 -08:00
Linus Torvalds	c32c2cb272	arm64 fixes: - Temporarily disable huge pages built using contiguous ptes - Ensure vmemmap region is sufficiently aligned for sparsemem sections -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABCgAGBQJW4FSYAAoJELescNyEwWM0cp4H/0+9iMTqb3KowIW1vQPNnOG2 BG/RVsGlCPAwBu0V4FdcY7eK4fQ9J+/UCmVd5/SrlpjNpblinxRPihbIzs4bToqa It02wNk7ISPm0oYtyGrRu1TpC5AvMykcluZkU/CUk0sjZlBAi8WSaJiqftFuZSGH lhyhARO4KscbAUUhwDDYNKuWmLbmyOpt9RM2fziNQdjSp+8czCoCR9G+JXiPQFsJ ORU10BqBDCyFpp8/NhM55qA76FJo6RCBUWx/6L1oJJxjvahkmPba/hhnfI7+Xj1u 3FKAntJ6wVZeKqRsIkOlECoU/mrgjlTByTFN+o3KhOky8ZYBHoveIQWtsqNMFd4= =7d1o -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "I thought we were done for 4.5, but then the 64k-page chaps came crawling out of the woodwork. sigh The vmemmap fix I sent for -rc7 caused a regression with 64k pages and sparsemem and at some point during the release cycle the new hugetlb code using contiguous ptes started failing the libhugetlbfs tests with 64k pages enabled. So here are a couple of patches that fix the vmemmap alignment and disable the new hugetlb page sizes whilst a proper fix is being developed: - Temporarily disable huge pages built using contiguous ptes - Ensure vmemmap region is sufficiently aligned for sparsemem sections" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: hugetlb: partial revert of `66b3923a1a` arm64: account for sparsemem section alignment when choosing vmemmap offset	2016-03-10 10:39:04 -08:00
Linus Torvalds	2da33f9f96	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: "Three bug fixes: - The fix for the page table corruption (CVE-2016-2143) - The diagnose statistics introduced a regression for the dasd diag driver - Boot crash on systems without the set-program-parameters facility" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/mm: four page table levels vs. fork s390/cpumf: Fix lpp detection s390/dasd: fix diag 0x250 inline assembly	2016-03-10 10:36:07 -08:00
Mauro Carvalho Chehab	b2cd27448b	[media] media-device: map new functions into old types for legacy API The legacy media controller userspace API exposes entity types that carry both type and function information. The new API replaces the type with a function. It preserves backward compatibility by defining legacy functions for the existing types and using them in drivers. This works fine, as long as newer entity functions won't be added. Unfortunately, some tools, like media-ctl with --print-dot argument rely on the now legacy MEDIA_ENT_T_V4L2_SUBDEV and MEDIA_ENT_T_DEVNODE numeric ranges to identify what entities will be shown. Also, if the entity doesn't match those ranges, it will ignore the major/minor information on devnodes, and won't be getting the devnode name via udev or sysfs. As we're now adding devices outside the old range, the legacy ioctl needs to map the new entity functions into a type at the old range, or otherwise we'll have a regression. Detected on all released media-ctl versions (e. g. versions <= 1.10). Fix this by deriving the type from the function to emulate the legacy API if the function isn't in the legacy functions range. Reported-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>	2016-03-10 15:10:59 -03:00
Luck, Tony	eb1af3b71f	EDAC/sb_edac: Fix computation of channel address Large memory Haswell-EX systems with multiple DIMMs per channel were sometimes reporting the wrong DIMM. Found three problems: 1) Debug printouts for socket and channel interleave were not interpreting the register fields correctly. The socket interleave field is a 2^X value (0=1, 1=2, 2=4, 3=8). The channel interleave is X+1 (0=1, 1=2, 2=3. 3=4). 2) Actual use of the socket interleave value didn't interpret as 2^X 3) Conversion of address to channel address was complicated, and wrong. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-03-10 18:31:55 +01:00
Ludovic Desroches	25c5e9626c	dmaengine: at_xdmac: fix residue computation When computing the residue we need two pieces of information: the current descriptor and the remaining data of the current descriptor. To get that information, we need to read consecutively two registers but we can't do it in an atomic way. For that reason, we have to check manually that current descriptor has not changed. Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com> Suggested-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Reported-by: David Engraf <david.engraf@sysgo.com> Tested-by: David Engraf <david.engraf@sysgo.com> Fixes: `e1f7c9eee7` ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver") Cc: stable@vger.kernel.org #4.1 and later Signed-off-by: Vinod Koul <vinod.koul@intel.com>	2016-03-10 16:32:36 +05:30
Borislav Petkov	84477336ec	x86/delay: Avoid preemptible context checks in delay_mwaitx() We do use this_cpu_ptr(&cpu_tss) as a cacheline-aligned, seldomly accessed per-cpu var as the MONITORX target in delay_mwaitx(). However, when called in preemptible context, this_cpu_ptr -> smp_processor_id() -> debug_smp_processor_id() fires: BUG: using smp_processor_id() in preemptible [00000000] code: udevd/312 caller is delay_mwaitx+0x40/0xa0 But we don't care about that check - we only need cpu_tss as a MONITORX target and it doesn't really matter which CPU's var we're touching as we're going idle anyway. Fix that. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Huang Rui <ray.huang@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: spg_linux_kernel@amd.com Link: http://lkml.kernel.org/r/20160309205622.GG6564@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-03-10 11:27:12 +01:00
Paolo Bonzini	5f0b819995	KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 KVM has special logic to handle pages with pte.u=1 and pte.w=0 when CR0.WP=1. These pages' SPTEs flip continuously between two states: U=1/W=0 (user and supervisor reads allowed, supervisor writes not allowed) and U=0/W=1 (supervisor reads and writes allowed, user writes not allowed). When SMEP is in effect, however, U=0 will enable kernel execution of this page. To avoid this, KVM also sets NX=1 in the shadow PTE together with U=0, making the two states U=1/W=0/NX=gpte.NX and U=0/W=1/NX=1. When guest EFER has the NX bit cleared, the reserved bit check thinks that the latter state is invalid; teach it that the smep_andnot_wp case will also use the NX bit of SPTEs. Cc: stable@vger.kernel.org Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.inel.com> Fixes: `c258b62b26` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-10 11:26:10 +01:00
Paolo Bonzini	844a5fe219	KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo Yes, all of these are needed. :) This is admittedly a bit odd, but kvm-unit-tests access.flat tests this if you run it with "-cpu host" and of course ept=0. KVM runs the guest with CR0.WP=1, so it must handle supervisor writes specially when pte.u=1/pte.w=0/CR0.WP=0. Such writes cause a fault when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0. When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and restarts execution. This will still cause a user write to fault, while supervisor writes will succeed. User reads will fault spuriously now, and KVM will then flip U and W again in the SPTE (U=1, W=0). User reads will be enabled and supervisor writes disabled, going back to the originary situation where supervisor writes fault spuriously. When SMEP is in effect, however, U=0 will enable kernel execution of this page. To avoid this, KVM also sets NX=1 in the shadow PTE together with U=0. If the guest has not enabled NX, the result is a continuous stream of page faults due to the NX bit being reserved. The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER switch. (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry control, so they do not use user-return notifiers for EFER---if they did, EFER.NX would be forced to the same value as the host). There is another bug in the reserved bit check, which I've split to a separate patch for easier application to stable kernels. Cc: stable@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Fixes: `f6577a5fa1` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-10 11:26:07 +01:00
Davidlohr Bueso	38460a2178	locking/csd_lock: Use smp_cond_acquire() in csd_lock_wait() We can micro-optimize this call and mildly relax the barrier requirements by relying on ctrl + rmb, keeping the acquire semantics. In addition, this is pretty much the now standard for busy-waiting under such restraints. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Link: http://lkml.kernel.org/r/1457574936-19065-3-git-send-email-dbueso@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-03-10 10:28:35 +01:00
Davidlohr Bueso	90d1098478	locking/csd_lock: Explicitly inline csd_lock*() helpers While the compiler tends to already to it for us (except for csd_unlock), make it explicit. These helpers mainly deal with the ->flags, are short-lived and can be called, for example, from smp_call_function_many(). Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Link: http://lkml.kernel.org/r/1457574936-19065-2-git-send-email-dbueso@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-03-10 10:28:35 +01:00
Ingo Molnar	6cbe9e4a22	Merge branch 'linus' into locking/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-03-10 10:28:27 +01:00
Yu-cheng Yu	a65050c6f1	x86/fpu: Revert ("x86/fpu: Disable AVX when eagerfpu is off") Leonid Shatz noticed that the SDM interpretation of the following recent commit: `394db20ca2` ("x86/fpu: Disable AVX when eagerfpu is off") ... is incorrect and that the original behavior of the FPU code was correct. Because AVX is not stated in CR0 TS bit description, it was mistakenly believed to be not supported for lazy context switch. This turns out to be false: Intel Software Developer's Manual Vol. 3A, Sec. 2.5 Control Registers: 'TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the x87 FPU/ MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed by the new task.' Intel Software Developer's Manual Vol. 2A, Sec. 2.4 Instruction Exception Specification: 'AVX instructions refer to exceptions by classes that include #NM "Device Not Available" exception for lazy context switch.' So revert the commit. Reported-by: Leonid Shatz <leonid.shatz@ravellosystems.com> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1457569734-3785-1-git-send-email-yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-03-10 10:15:58 +01:00
Martin Schwidefsky	3446c13b26	s390/mm: four page table levels vs. fork The fork of a process with four page table levels is broken since git commit `6252d702c5` "[S390] dynamic page tables." All new mm contexts are created with three page table levels and an asce limit of 4TB. If the parent has four levels dup_mmap will add vmas to the new context which are outside of the asce limit. The subsequent call to copy_page_range will walk the three level page table structure of the new process with non-zero pgd and pud indexes. This leads to memory clobbers as the pgd_index and the pud_index is added to the mm->pgd pointer without a pgd_deref in between. The init_new_context() function is selecting the number of page table levels for a new context. The function is used by mm_init() which in turn is called by dup_mm() and mm_alloc(). These two are used by fork() and exec(). The init_new_context() function can distinguish the two cases by looking at mm->context.asce_limit, for fork() the mm struct has been copied and the number of page table levels may not change. For exec() the mm_alloc() function set the new mm structure to zero, in this case a three-level page table is created as the temporary stack space is located at STACK_TOP_MAX = 4TB. This fixes CVE-2016-2143. Reported-by: Marcin Kościelnicki <koriakin@0x04.net> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2016-03-10 09:21:24 +01:00
Linus Torvalds	8e0f93cda4	spi: Fixes for v4.5 A few driver specific fixes for the Rockchip and i.MX SPI controllers, especially for the i.MX they're annoying bugs if you run into them. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJW4POVAAoJECTWi3JdVIfQMRYH/R3bNmMQHZB+qBm6vVL28Pdo OXNejqStZLJfvdP6+xNXKijomMmvL/Xd+Jf2XMhDcEH8FZDVyemp9WIfQg8LMsy8 uSYxfNIitL7T4xfPkKF8J0z3E81lc6EDGtv9M/7XWYJM1FrfMibQB3lUv1OK6+Gd z3Q6lpTblh6o6sqN0m2g23uSv3FGhoSdTNMhLeT3Wo5L3SfGFupO35Um5pbX8nHZ ZKmVNu1wmnWvf879NsaRs8W7btP/l/lcaz+aKB0HlJfzjUjyojsgSZu+EYTBXEDF ZOe3YZnodXhnFjyHLAcI09aXeDbtLlx4QHGv36OL8ObWHX9M8UYG9pQUuh5/Ujc= =gJMG -----END PGP SIGNATURE----- Merge tag 'spi-fix-v4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A few driver specific fixes for the Rockchip and i.MX SPI controllers, especially for the i.MX they're annoying bugs if you run into them" * tag 'spi-fix-v4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: imx: fix spi resource leak with dma transfer spi: imx: allow only WML aligned transfers to use DMA spi: rockchip: add missing spi_master_put spi: rockchip: disable runtime pm when in err case	2016-03-09 20:24:23 -08:00
Mark Brown	3ee20abb06	Merge remote-tracking branch 'spi/fix/rockchip' into spi-linus	2016-03-10 10:42:24 +07:00
Mark Brown	c23663ace8	Merge remote-tracking branch 'spi/fix/imx' into spi-linus	2016-03-10 10:42:22 +07:00
Linus Torvalds	718e47a573	This fixes a regression which crept in v4.5-rc5. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJW4N4xAAoJEPL5WVaVDYGjJQsH/i/9SP178CiaMeUp22PHmETi UpCaQd9AY3xGGIjCktL2DC4NC86fjsRMYl1FJdVMxElUx54fuEU17wEW4BZyjUhI aF9X7LfxQcxe+CRsY37ZdJ19nmE6EUZay8Vt/tB2LK/RvfruLNYmnzX5MmmjJY/S 1TKz6Jy5M0DTl+jpod2nv/xJ2j32WSPul8Un/iBinC16LPH+Q7KZRVjFLlf/krsM SvZ1G6I70P7t9HW88BO9KhiYyxxuwqWC6SSoPMKTr4WeGnYQbA2JE6PJPktqsq76 Q91ucFkkGi+DZuZe5EuDMYMBrwaHQG8hKG3ueCj/pTu9IRErW94uO++H03bichk= =Yjfq -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fix from Ted Ts'o: "This fixes a regression which crept in v4.5-rc5" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: iterate over buffer heads correctly in move_extent_per_page()	2016-03-09 19:33:05 -08:00
Linus Torvalds	a6e434e955	Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "A few imx fixes I missed from a couple of weeks ago, they still aren't that big and fix some regression and a fail to boot problem. Other than that, a couple of regression fixes for radeon/amdgpu, one regression fix for vmwgfx and one regression fix for tda998x" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: Revert "drm/radeon/pm: adjust display configuration after powerstate" drm/amdgpu/dp: add back special handling for NUTMEG drm/radeon/dp: add back special handling for NUTMEG drm/i2c: tda998x: Choose between atomic or non atomic dpms helper drm/vmwgfx: Add back ->detect() and ->fill_modes() drm/radeon: Fix error handling in radeon_flip_work_func. drm/amdgpu: Fix error handling in amdgpu_flip_work_func. drm/imx: Add missing DRM_FORMAT_RGB565 to ipu_plane_formats drm/imx: notify DRM core about CRTC vblank state gpu: ipu-v3: Reset IPU before activating IRQ gpu: ipu-v3: Do not bail out on missing optional port nodes	2016-03-09 19:12:37 -08:00
Linus Torvalds	8205ff1dc8	I previously sent a fix that prevents all trace events from being called if the current cpu is offline. But I forgot that in 3.18, we added lockdep checks to test RCU usage even when the event is disabled. Although there cannot be any bug when a cpu is going offline, we now get false warnings triggered by the added checks of the event being disabled. I removed the check from the tracepoint code itself, and added it to the condition section (which is "1" for 'no condition'). This way the online cpu check will get checked in all the right locations. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJW4N3rAAoJEKKk/i67LK/8WBIH/1WS6n919iU5sbdwmb843o4v KTeZ962l7fsiU+Op2ha4eLO6qXwa85X1sq3yHxo7APttE0oN933b6VQFjEs+HnqU INZorOQpy7soztHNewr48hdS0Z/x57xHywuf9i1K51zKCycuhupS6eZxN65zcuZp jeyg1dWqcvUzQcbxc5xflt0+n27txUpHix3e290aNoH9cya7gdbXi5dWAQgM8Kfm l8i2DJeyEy9nAKMjsKpKvdPkV5C8ZMGS1sJc/Psx9MGL08kM5Lqtuu8gkrvjqiLk HYmWPKQ+l3OROORd6Sia88SniPT9ZU4A73CobgPt5flr8BvU51kxLeObD8myXps= =s1tp -----END PGP SIGNATURE----- Merge tag 'trace-fixes-v4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "I previously sent a fix that prevents all trace events from being called if the current cpu is offline. But I forgot that in 3.18, we added lockdep checks to test RCU usage even when the event is disabled. Although there cannot be any bug when a cpu is going offline, we now get false warnings triggered by the added checks of the event being disabled. I removed the check from the tracepoint code itself, and added it to the condition section (which is "1" for 'no condition'). This way the online cpu check will get checked in all the right locations" * tag 'trace-fixes-v4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix check for cpu online when event is disabled	2016-03-09 19:01:58 -08:00
Eryu Guan	6ffe77bad5	ext4: iterate over buffer heads correctly in move_extent_per_page() In commit `bcff24887d` ("ext4: don't read blocks from disk after extents being swapped") bh is not updated correctly in the for loop and wrong data has been written to disk. generic/324 catches this on sub-page block size ext4. Fixes: `bcff24887d` ("ext4: don't read blocks from disk after extentsbeing swapped") Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-09 21:37:53 -05:00
Linus Torvalds	380173ff56	Merge branch 'akpm' (patches from Andrew) Merge fixes from Andrew Morton: "13 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: dma-mapping: avoid oops when parameter cpu_addr is null mm/hugetlb: use EOPNOTSUPP in hugetlb sysctl handlers memremap: check pfn validity before passing to pfn_to_page() mm, thp: fix migration of PTE-mapped transparent huge pages dax: check return value of dax_radix_entry() ocfs2: fix return value from ocfs2_page_mkwrite() arm64: kasan: clear stale stack poison sched/kasan: remove stale KASAN poison after hotplug kasan: add functions to clear stack poison mm: fix mixed zone detection in devm_memremap_pages list: kill list_force_poison() mm: __delete_from_page_cache show Bad page if mapped mm/hugetlb: hugetlb_no_page: rate-limit warning message	2016-03-09 18:27:52 -08:00
Zhen Lei	d6b7eaeb03	dma-mapping: avoid oops when parameter cpu_addr is null To keep consistent with kfree, which tolerate ptr is NULL. We do this because sometimes we may use goto statement, so that success and failure case can share parts of the code. But unfortunately, dma_free_coherent called with parameter cpu_addr is null will cause oops, such as showed below: Unable to handle kernel paging request at virtual address ffffffc020d3b2b8 pgd = ffffffc083a61000 [ffffffc020d3b2b8] pgd=0000000000000000, pud=0000000000000000 CPU: 4 PID: 1489 Comm: malloc_dma_1 Tainted: G O 4.1.12 #1 Hardware name: ARM64 (DT) PC is at __dma_free_coherent.isra.10+0x74/0xc8 LR is at __dma_free+0x9c/0xb0 Process malloc_dma_1 (pid: 1489, stack limit = 0xffffffc0837fc020) [...] Call trace: __dma_free_coherent.isra.10+0x74/0xc8 __dma_free+0x9c/0xb0 malloc_dma+0x104/0x158 [dma_alloc_coherent_mtmalloc] kthread+0xec/0xfc Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-09 15:43:42 -08:00
Jan Stancek	86613628b3	mm/hugetlb: use EOPNOTSUPP in hugetlb sysctl handlers Replace ENOTSUPP with EOPNOTSUPP. If hugepages are not supported, this value is propagated to userspace. EOPNOTSUPP is part of uapi and is widely supported by libc libraries. It gives nicer message to user, rather than: # cat /proc/sys/vm/nr_hugepages cat: /proc/sys/vm/nr_hugepages: Unknown error 524 And also LTP's proc01 test was failing because this ret code (524) was unexpected: proc01 1 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_hugepages: errno=???(524): Unknown error 524 proc01 2 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_hugepages_mempolicy: errno=???(524): Unknown error 524 proc01 3 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_overcommit_hugepages: errno=???(524): Unknown error 524 Signed-off-by: Jan Stancek <jstancek@redhat.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-09 15:43:42 -08:00
Ard Biesheuvel	ac343e882a	memremap: check pfn validity before passing to pfn_to_page() In memremap's helper function try_ram_remap(), we dereference a struct page pointer that was derived from a PFN that is known to be covered by a 'System RAM' iomem region, and is thus assumed to be a 'valid' PFN, i.e., a PFN that has a struct page associated with it and is covered by the kernel direct mapping. However, the assumption that there is a 1:1 relation between the System RAM iomem region and the kernel direct mapping is not universally valid on all architectures, and on ARM and arm64, 'System RAM' may include regions for which pfn_valid() returns false. Generally speaking, both __va() and pfn_to_page() should only ever be called on PFNs/physical addresses for which pfn_valid() returns true, so add that check to try_ram_remap(). Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-09 15:43:42 -08:00
Kirill A. Shutemov	0a2e280b6d	mm, thp: fix migration of PTE-mapped transparent huge pages We don't have native support of THP migration, so we have to split huge page into small pages in order to migrate it to different node. This includes PTE-mapped huge pages. I made mistake in refcounting patchset: we don't actually split PTE-mapped huge page in queue_pages_pte_range(), if we step on head page. The result is that the head page is queued for migration, but none of tail pages: putting head page on queue takes pin on the page and any subsequent attempts of split_huge_pages() would fail and we skip queuing tail pages. unmap_and_move_huge_page() will eventually split the huge pages, but only one of 512 pages would get migrated. Let's fix the situation. Fixes: `248db92da1` ("migrate_pages: try to split pages on queuing") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-09 15:43:42 -08:00
Ross Zwisler	30f471fd88	dax: check return value of dax_radix_entry() dax_pfn_mkwrite() previously wasn't checking the return value of the call to dax_radix_entry(), which was a mistake. Instead, capture this return value and return the appropriate VM_FAULT_ value. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-09 15:43:42 -08:00
Jan Kara	566e8dfd88	ocfs2: fix return value from ocfs2_page_mkwrite() ocfs2_page_mkwrite() could mistakenly return error code instead of mkwrite status value. Fix it. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-09 15:43:42 -08:00
Mark Rutland	0d97e6d802	arm64: kasan: clear stale stack poison Functions which the compiler has instrumented for KASAN place poison on the stack shadow upon entry and remove this poison prior to returning. In the case of cpuidle, CPUs exit the kernel a number of levels deep in C code. Any instrumented functions on this critical path will leave portions of the stack shadow poisoned. If CPUs lose context and return to the kernel via a cold path, we restore a prior context saved in __cpu_suspend_enter are forgotten, and we never remove the poison they placed in the stack shadow area by functions calls between this and the actual exit of the kernel. Thus, (depending on stackframe layout) subsequent calls to instrumented functions may hit this stale poison, resulting in (spurious) KASAN splats to the console. To avoid this, clear any stale poison from the idle thread for a CPU prior to bringing a CPU online. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-09 15:43:42 -08:00
Mark Rutland	e1b77c9298	sched/kasan: remove stale KASAN poison after hotplug Functions which the compiler has instrumented for KASAN place poison on the stack shadow upon entry and remove this poision prior to returning. In the case of CPU hotplug, CPUs exit the kernel a number of levels deep in C code. Any instrumented functions on this critical path will leave portions of the stack shadow poisoned. When a CPU is subsequently brought back into the kernel via a different path, depending on stackframe, layout calls to instrumented functions may hit this stale poison, resulting in (spurious) KASAN splats to the console. To avoid this, clear any stale poison from the idle thread for a CPU prior to bringing a CPU online. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-09 15:43:42 -08:00
Mark Rutland	e3ae116339	kasan: add functions to clear stack poison Functions which the compiler has instrumented for ASAN place poison on the stack shadow upon entry and remove this poison prior to returning. In some cases (e.g. hotplug and idle), CPUs may exit the kernel a number of levels deep in C code. If there are any instrumented functions on this critical path, these will leave portions of the idle thread stack shadow poisoned. If a CPU returns to the kernel via a different path (e.g. a cold entry), then depending on stack frame layout subsequent calls to instrumented functions may use regions of the stack with stale poison, resulting in (spurious) KASAN splats to the console. Contemporary GCCs always add stack shadow poisoning when ASAN is enabled, even when asked to not instrument a function [1], so we can't simply annotate functions on the critical path to avoid poisoning. Instead, this series explicitly removes any stale poison before it can be hit. In the common hotplug case we clear the entire stack shadow in common code, before a CPU is brought online. On architectures which perform a cold return as part of cpu idle may retain an architecture-specific amount of stack contents. To retain the poison for this retained context, the arch code must call the core KASAN code, passing a "watermark" stack pointer value beyond which shadow will be cleared. Architectures which don't perform a cold return as part of idle do not need any additional code. This patch (of 3): Functions which the compiler has instrumented for KASAN place poison on the stack shadow upon entry and remove this poision prior to returning. In some cases (e.g. hotplug and idle), CPUs may exit the kernel a number of levels deep in C code. If there are any instrumented functions on this critical path, these will leave portions of the stack shadow poisoned. If a CPU returns to the kernel via a different path (e.g. a cold entry), then depending on stack frame layout subsequent calls to instrumented functions may use regions of the stack with stale poison, resulting in (spurious) KASAN splats to the console. To avoid this, we must clear stale poison from the stack prior to instrumented functions being called. This patch adds functions to the KASAN core for removing poison from (portions of) a task's stack. These will be used by subsequent patches to avoid problems with hotplug and idle. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-09 15:43:42 -08:00
Dan Williams	5f29a77cd9	mm: fix mixed zone detection in devm_memremap_pages The check for whether we overlap "System RAM" needs to be done at section granularity. For example a system with the following mapping: 100000000-37bffffff : System RAM 37c000000-837ffffff : Persistent Memory ...is unable to use devm_memremap_pages() as it would result in two zones colliding within a given section. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-09 15:43:42 -08:00
Dan Williams	d77a117e68	list: kill list_force_poison() Given we have uninitialized list_heads being passed to list_add() it will always be the case that those uninitialized values randomly trigger the poison value. Especially since a list_add() operation will seed the stack with the poison value for later stack allocations to trip over. For example, see these two false positive reports: list_add attempted on force-poisoned entry WARNING: at lib/list_debug.c:34 [..] NIP [c00000000043c390] __list_add+0xb0/0x150 LR [c00000000043c38c] __list_add+0xac/0x150 Call Trace: __list_add+0xac/0x150 (unreliable) __down+0x4c/0xf8 down+0x68/0x70 xfs_buf_lock+0x4c/0x150 [xfs] list_add attempted on force-poisoned entry(0000000000000500), new->next == d0000000059ecdb0, new->prev == 0000000000000500 WARNING: at lib/list_debug.c:33 [..] NIP [c00000000042db78] __list_add+0xa8/0x140 LR [c00000000042db74] __list_add+0xa4/0x140 Call Trace: __list_add+0xa4/0x140 (unreliable) rwsem_down_read_failed+0x6c/0x1a0 down_read+0x58/0x60 xfs_log_commit_cil+0x7c/0x600 [xfs] Fixes: commit `5c2c2587b1` ("mm, dax, pmem: introduce {get\|put}_dev_pagemap() for dax-gup") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Eryu Guan <eguan@redhat.com> Tested-by: Eryu Guan <eguan@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-09 15:43:42 -08:00
Hugh Dickins	06b241f32c	mm: __delete_from_page_cache show Bad page if mapped Commit `e1534ae950` ("mm: differentiate page_mapped() from page_mapcount() for compound pages") changed the famous BUG_ON(page_mapped(page)) in __delete_from_page_cache() to VM_BUG_ON_PAGE(page_mapped(page)): which gives us more info when CONFIG_DEBUG_VM=y, but nothing at all when not. Although it has not usually been very helpul, being hit long after the error in question, we do need to know if it actually happens on users' systems; but reinstating a crash there is likely to be opposed :) In the non-debug case, pr_alert("BUG: Bad page cache") plus dump_page(), dump_stack(), add_taint() - I don't really believe LOCKDEP_NOW_UNRELIABLE, but that seems to be the standard procedure now. Move that, or the VM_BUG_ON_PAGE(), up before the deletion from tree: so that the unNULLified page->mapping gives a little more information. If the inode is being evicted (rather than truncated), it won't have any vmas left, so it's safe(ish) to assume that the raised mapcount is erroneous, and we can discount it from page_count to avoid leaking the page (I'm less worried by leaking the occasional 4kB, than losing a potential 2MB page with each 4kB page leaked). Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-09 15:43:42 -08:00
Geoffrey Thomas	910154d520	mm/hugetlb: hugetlb_no_page: rate-limit warning message The warning message "killed due to inadequate hugepage pool" simply indicates that SIGBUS was sent, not that the process was forcibly killed. If the process has a signal handler installed does not fix the problem, this message can rapidly spam the kernel log. On my amd64 dev machine that does not have hugepages configured, I can reproduce the repeated warnings easily by setting vm.nr_hugepages=2 (i.e., 4 megabytes of huge pages) and running something that sets a signal handler and forks, like #include <sys/mman.h> #include <signal.h> #include <stdlib.h> #include <unistd.h> sig_atomic_t counter = 10; void handler(int signal) { if (counter-- == 0) exit(0); } int main(void) { int status; char addr = mmap(NULL, 4 1048576, PROT_READ \| PROT_WRITE, MAP_PRIVATE \| MAP_ANONYMOUS \| MAP_HUGETLB, -1, 0); if (addr == MAP_FAILED) {perror("mmap"); return 1;} addr = 'x'; switch (fork()) { case -1: perror("fork"); return 1; case 0: signal(SIGBUS, handler); addr = 'x'; break; default: *addr = 'x'; wait(&status); if (WIFSIGNALED(status)) { psignal(WTERMSIG(status), "child"); } break; } } Signed-off-by: Geoffrey Thomas <geofft@ldpreload.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-09 15:43:42 -08:00

1 2 3 4 5 ...

576095 Commits