linux

Author	SHA1	Message	Date
Thibaut Sautereau	09a6b0bc3b	random32: Restore __latent_entropy attribute on net_rand_state Commit `f227e3ec3b` ("random32: update the net random state on interrupt and activity") broke compilation and was temporarily fixed by Linus in `83bdc7275e` ("random32: remove net_rand_state from the latent entropy gcc plugin") by entirely moving net_rand_state out of the things handled by the latent_entropy GCC plugin. From what I understand when reading the plugin code, using the __latent_entropy attribute on a declaration was the wrong part and simply keeping the __latent_entropy attribute on the variable definition was the correct fix. Fixes: `83bdc7275e` ("random32: remove net_rand_state from the latent entropy gcc plugin") Acked-by: Willy Tarreau <w@1wt.eu> Cc: Emese Revfy <re.emese@gmail.com> Signed-off-by: Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-10-02 09:31:54 -07:00
Roman Gushchin	be458311cd	mm: memcg/slab: fix slab statistics in !SMP configuration Since commit `ea426c2a7d` ("mm: memcg: prepare for byte-sized vmstat items") the write side of slab counters accepts a value in bytes and converts it to pages. It happens in __mod_node_page_state(). However a non-SMP version of __mod_node_page_state() doesn't perform this conversion. It leads to incorrect (unrealistically high) slab counters values. Fix this by adding a similar conversion to the non-SMP version of __mod_node_page_state(). Signed-off-by: Roman Gushchin <guro@fb.com> Reported-and-tested-by: Bastian Bittorf <bb@npl.de> Fixes: `ea426c2a7d` ("mm: memcg: prepare for byte-sized vmstat items") Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-10-02 09:13:41 -07:00
Linus Torvalds	472e5b056f	pipe: remove pipe_wait() and fix wakeup race with splice The pipe splice code still used the old model of waiting for pipe IO by using a non-specific "pipe_wait()" that waited for any pipe event to happen, which depended on all pipe IO being entirely serialized by the pipe lock. So by checking the state you were waiting for, and then adding yourself to the wait queue before dropping the lock, you were guaranteed to see all the wakeups. Strictly speaking, the actual wakeups were not done under the lock, but the pipe_wait() model still worked, because since the waiter held the lock when checking whether it should sleep, it would always see the current state, and the wakeup was always done after updating the state. However, commit `0ddad21d3e` ("pipe: use exclusive waits when reading or writing") split the single wait-queue into two, and in the process also made the "wait for event" code wait for _two_ wait queues, and that then showed a race with the wakers that were not serialized by the pipe lock. It's only splice that used that "pipe_wait()" model, so the problem wasn't obvious, but Josef Bacik reports: "I hit a hang with fstest btrfs/187, which does a btrfs send into /dev/null. This works by creating a pipe, the write side is given to the kernel to write into, and the read side is handed to a thread that splices into a file, in this case /dev/null. The box that was hung had the write side stuck here [pipe_write] and the read side stuck here [splice_from_pipe_next -> pipe_wait]. [ more details about pipe_wait() scenario ] The problem is we're doing the prepare_to_wait, which sets our state each time, however we can be woken up either with reads or writes. In the case above we race with the WRITER waking us up, and re-set our state to INTERRUPTIBLE, and thus never break out of schedule" Josef had a patch that avoided the issue in pipe_wait() by just making it set the state only once, but the deeper problem is that pipe_wait() depends on a level of synchonization by the pipe mutex that it really shouldn't. And the whole "wait for any pipe state change" model really isn't very good to begin with. So rather than trying to work around things in pipe_wait(), remove that legacy model of "wait for arbitrary pipe event" entirely, and actually create functions that wait for the pipe actually being readable or writable, and can do so without depending on the pipe lock serializing everything. Fixes: `0ddad21d3e` ("pipe: use exclusive waits when reading or writing") Link: https://lore.kernel.org/linux-fsdevel/bfa88b5ad6f069b2b679316b9e495a970130416c.1601567868.git.josef@toxicpanda.com/ Reported-by: Josef Bacik <josef@toxicpanda.com> Reviewed-and-tested-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-10-01 19:14:36 -07:00
Linus Torvalds	44b6e23be3	Merge tag 'iommu-fixes-v5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: - Fix a device reference counting bug in the Exynos IOMMU driver. - Lockdep fix for the Intel VT-d driver. - Fix a bug in the AMD IOMMU driver which caused corruption of the IVRS ACPI table and caused IOMMU driver initialization failures in kdump kernels. * tag 'iommu-fixes-v5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Fix lockdep splat in iommu_flush_dev_iotlb() iommu/amd: Fix the overwritten field in IVMD header iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate()	2020-10-01 12:59:36 -07:00
Linus Torvalds	eed2ef4403	Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "A previous commit to prevent AML memory opregions from accessing the kernel memory turned out to be too restrictive. Relax the permission check to permit the ACPI core to map kernel memory used for table overrides" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: permit ACPI core to map kernel memory used for table overrides	2020-10-01 11:49:01 -07:00
Linus Torvalds	fcadab7404	Merge tag 'drm-fixes-2020-10-01-1' of git://anongit.freedesktop.org/drm/drm Pull drm fixes from Dave Airlie: "AMD and vmwgfx fixes. Just dequeuing these a bit early as the AMD ones are bit larger than I'd prefer, but Alex missed last week so it's a double set of fixes. The larger ones are just register header fixes for the new chips that were just introduced in rc1 along with some new PCI IDs for new hw. Otherwise it is usual fixes. The vmwgfx fix was due to some testing I was doing and found we weren't booting properly, vmware had the fix internally so hurried it vmwgfx: - fix a regression due to TTM refactor amdgpu: - Fix potential double free in userptr handling - Sienna Cichlid and Navy Flounder udpates - Add Sienna Cichlid PCI IDs - Drop experimental flag for navi12 - Raven fixes - Renoir fixes - HDCP fix - DCN3 fix for clang and older versions of gcc - Fix a runtime pm refcount issue" * tag 'drm-fixes-2020-10-01-1' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu: disable gfxoff temporarily for navy_flounder drm/amd/pm: setup APU dpm clock table in SMU HW initialization drm/vmwgfx: Fix error handling in get_node drm/amd/display: remove duplicate call to rn_vbios_smu_get_smu_version() drm/amdgpu/swsmu/smu12: fix force clock handling for mclk drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config drm/amdgpu/display: fix CFLAGS setup for DCN30 drm/amd/display: fix return value check for hdcp_work drm/amdgpu: remove gpu_info fw support for sienna_cichlid etc. drm/amd/pm: Removed fixed clock in auto mode DPM drm/amdgpu: remove experimental flag from navi12 drm/amdgpu: add device ID for sienna_cichlid (v2) drm/amdgpu: use the AV1 defines for VCN 3.0 drm/amdgpu: add VCN 3.0 AV1 registers drm/amdgpu: add the GC 10.3 VRS registers drm/amdgpu: prevent double kfree ttm->sg	2020-10-01 09:45:37 -07:00
Linus Torvalds	aa5ff93523	Merge tag 'trace-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Two tracing fixes: - Fix temp buffer accounting that caused a WARNING for ftrace_dump_on_opps() - Move the recursion check in one of the function callback helpers to the beginning of the function, as if the rcu_is_watching() gets traced, it will cause a recursive loop that will crash the kernel" * tag 'trace-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Move RCU is watching check after recursion check tracing: Fix trace_find_next_entry() accounting of temp buffer size	2020-10-01 09:41:02 -07:00
Lu Baolu	1a3f2fd7fc	iommu/vt-d: Fix lockdep splat in iommu_flush_dev_iotlb() Lock(&iommu->lock) without disabling irq causes lockdep warnings. [ 12.703950] ======================================================== [ 12.703962] WARNING: possible irq lock inversion dependency detected [ 12.703975] 5.9.0-rc6+ #659 Not tainted [ 12.703983] -------------------------------------------------------- [ 12.703995] systemd-udevd/284 just changed the state of lock: [ 12.704007] ffffffffbd6ff4d8 (device_domain_lock){..-.}-{2:2}, at: iommu_flush_dev_iotlb.part.57+0x2e/0x90 [ 12.704031] but this lock took another, SOFTIRQ-unsafe lock in the past: [ 12.704043] (&iommu->lock){+.+.}-{2:2} [ 12.704045] and interrupts could create inverse lock ordering between them. [ 12.704073] other info that might help us debug this: [ 12.704085] Possible interrupt unsafe locking scenario: [ 12.704097] CPU0 CPU1 [ 12.704106] ---- ---- [ 12.704115] lock(&iommu->lock); [ 12.704123] local_irq_disable(); [ 12.704134] lock(device_domain_lock); [ 12.704146] lock(&iommu->lock); [ 12.704158] <Interrupt> [ 12.704164] lock(device_domain_lock); [ 12.704174] * DEADLOCK * Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20200927062428.13713-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2020-10-01 14:54:17 +02:00
Adrian Huang	0bbe4ced53	iommu/amd: Fix the overwritten field in IVMD header Commit `387caf0b75` ("iommu/amd: Treat per-device exclusion ranges as r/w unity-mapped regions") accidentally overwrites the 'flags' field in IVMD (struct ivmd_header) when the I/O virtualization memory definition is associated with the exclusion range entry. This leads to the corrupted IVMD table (incorrect checksum). The kdump kernel reports the invalid checksum: ACPI BIOS Warning (bug): Incorrect checksum in table [IVRS] - 0x5C, should be 0x60 (20200717/tbprint-177) AMD-Vi: [Firmware Bug]: IVRS invalid checksum Fix the above-mentioned issue by modifying the 'struct unity_map_entry' member instead of the IVMD header. Cleanup: The exclusion_range functions are not used anymore, so get rid of them. Fixes: `387caf0b75` ("iommu/amd: Treat per-device exclusion ranges as r/w unity-mapped regions") Reported-and-tested-by: Baoquan He <bhe@redhat.com> Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20200926102602.19177-1-adrianhuang0701@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2020-10-01 14:11:36 +02:00
Dave Airlie	132d7c8abe	Merge tag 'amd-drm-fixes-5.9-2020-09-30' of git://people.freedesktop.org/~agd5f/linux into drm-fixes amd-drm-fixes-5.9-2020-09-30: amdgpu: - Fix potential double free in userptr handling - Sienna Cichlid and Navy Flounder udpates - Add Sienna Cichlid PCI IDs - Drop experimental flag for navi12 - Raven fixes - Renoir fixes - HDCP fix - DCN3 fix for clang and older versions of gcc - Fix a runtime pm refcount issue Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200930161326.4243-1-alexander.deucher@amd.com	2020-10-01 15:25:33 +10:00
Ard Biesheuvel	a509a66a9d	arm64: permit ACPI core to map kernel memory used for table overrides Jonathan reports that the strict policy for memory mapped by the ACPI core breaks the use case of passing ACPI table overrides via initramfs. This is due to the fact that the memory type used for loading the initramfs in memory is not recognized as a memory type that is typically used by firmware to pass firmware tables. Since the purpose of the strict policy is to ensure that no AML or other ACPI code can manipulate any memory that is used by the kernel to keep its internal state or the state of user tasks, we can relax the permission check, and allow mappings of memory that is reserved and marked as NOMAP via memblock, and therefore not covered by the linear mapping to begin with. Fixes: `1583052d11` ("arm64/acpi: disallow AML memory opregions to access kernel memory") Fixes: `325f5585ec` ("arm64/acpi: disallow writeable AML opregion mapping for EFI code regions") Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Link: https://lore.kernel.org/r/20200929132522.18067-1-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2020-09-30 22:27:51 +01:00
Linus Torvalds	60e7209315	Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "Another batch of clk driver fixes: - Make sure DRAM and ChipID region doesn't get disabled on Exynos - Fix a SATA failure on Tegra - Fix the emac_ptp clk divider on stratix10" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: socfpga: stratix10: fix the divider for the emac_ptp_free_clk clk: samsung: exynos4: mark 'chipid' clock as CLK_IGNORE_UNUSED clk: tegra: Fix missing prototype for tegra210_clk_register_emc() clk: tegra: Always program PLL_E when enabled clk: tegra: Capitalization fixes clk: samsung: Keep top BPLL mux on Exynos542x enabled	2020-09-30 14:18:38 -07:00
Jiansong Chen	95433a1305	drm/amdgpu: disable gfxoff temporarily for navy_flounder gfxoff is temporarily disabled for navy_flounder, since at present the feature caused some tdr when performing display operations. Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-09-30 09:47:43 -04:00
Evan Quan	b195152536	drm/amd/pm: setup APU dpm clock table in SMU HW initialization As the dpm clock table is needed during DC HW initialization. And that (DC HW initialization) comes before smu_late_init() where current APU dpm clock table setup is performed. So, NULL pointer dereference will be triggered. By moving APU dpm clock table setup to smu_hw_init(), this can be avoided. Fixes: `02cf91c113` ("drm/amd/powerplay: postpone operations not required for hw setup to late_init") Acked-by: Nirmoy Das <nirmoy.das@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Evan Quan <evan.quan@amd.com> Reported-by: Dirk Gouders <dirk@gouders.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-09-30 09:29:00 -04:00
Dave Airlie	6f4fc18f35	Merge branch 'vmwgfx-fixes-5.9' of git://people.freedesktop.org/~sroland/linux into drm-fixes One vmwgfx regression fix. Signed-off-by: Dave Airlie <airlied@redhat.com> From: "Roland Scheidegger (VMware)" <rscheidegger.oss@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200930041000.2423-1-rscheidegger.oss@gmail.com	2020-09-30 14:22:05 +10:00
Zack Rusin	f54c444289	drm/vmwgfx: Fix error handling in get_node ttm_mem_type_manager_func.get_node was changed to return -ENOSPC instead of setting the node pointer to NULL. Unfortunately vmwgfx still had two places where it was explicitly converting -ENOSPC to 0 causing regressions. This fixes those spots by allowing -ENOSPC to be returned. That seems to fix recent regressions with vmwgfx. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Martin Krastev <krastevm@vmware.com> Sigend-off-by: Roland Scheidegger <sroland@vmware.com>	2020-09-30 05:44:28 +02:00
Linus Torvalds	02de58b24d	Merge tag 'devicetree-fixes-for-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Fix handling of HOST_EXTRACFLAGS for dtc - Several warning fixes for DT bindings * tag 'devicetree-fixes-for-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: scripts/dtc: only append to HOST_EXTRACFLAGS instead of overwriting dt-bindings: Fix 'reg' size issues in zynqmp examples ARM: dts: bcm2835: Change firmware compatible from simple-bus to simple-mfd dt-bindings: leds: cznic,turris-omnia-leds: fix error in binding dt-bindings: crypto: sa2ul: fix a DT binding check warning	2020-09-29 17:56:30 -07:00
Linus Torvalds	90fb702791	autofs: use __kernel_write() for the autofs pipe writing autofs got broken in some configurations by commit `13c164b1a1` ("autofs: switch to kernel_write") because there is now an extra LSM permission check done by security_file_permission() in rw_verify_area(). autofs is one if the few places that really does want the much more limited __kernel_write(), because the write is an internal kernel one that shouldn't do any user permission checks (it also doesn't need the file_start_write/file_end_write logic, since it's just a pipe). There are a couple of other cases like that - accounting, core dumping, and splice - but autofs stands out because it can be built as a module. As a result, we need to export this internal __kernel_write() function again. We really don't want any other module to use this, but we don't have a "EXPORT_SYMBOL_FOR_AUTOFS_ONLY()". But we can mark it GPL-only to at least approximate that "internal use only" for licensing. While in this area, make autofs pass in NULL for the file position pointer, since it's always a pipe, and we now use a NULL file pointer for streaming file descriptors (see file_ppos() and commit `438ab720c6`: "vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files") This effectively reverts commits `9db9775224` ("fs: unexport __kernel_write") and `13c164b1a1` ("autofs: switch to kernel_write"). Fixes: `13c164b1a1` ("autofs: switch to kernel_write") Reported-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-09-29 17:18:34 -07:00
Dirk Gouders	548c7ba7dc	drm/amd/display: remove duplicate call to rn_vbios_smu_get_smu_version() Commit `78fe9f6394` ("drm/amd/display: Remove DISPCLK Limit Floor for Certain SMU Versions") added a call to rn_vbios_smu_get_smu_version() to set clk_mgr->smu_ver. That field is initialized prior to the if-statement, already. Fixes: `78fe9f6394` (drm/amd/display: Remove DISPCLK Limit Floor for Certain SMU Versions) Signed-off-by: Dirk Gouders <dirk@gouders.net> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Sung Lee <sung.lee@amd.com> Cc: Yongqiang Sun <yongqiang.sun@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-09-29 17:10:31 -04:00
Alex Deucher	3c26d0314c	drm/amdgpu/swsmu/smu12: fix force clock handling for mclk The state array is in the reverse order compared to other asics (high to low rather than low to high). Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1313 Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-09-29 17:09:59 -04:00
Jean Delvare	a39d0d7bdf	drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config A recent attempt to fix a ref count leak in amdgpu_display_crtc_set_config() turned out to be doing too much and "fixed" an intended decrease as if it were a leak. Undo that part to restore the proper balance. This is the very nature of this function to increase or decrease the power reference count depending on the situation. Consequences of this bug is that the power reference would eventually get down to 0 while the display was still in use, resulting in that display switching off unexpectedly. Signed-off-by: Jean Delvare <jdelvare@suse.de> Fixes: `e008fa6fb4` ("drm/amdgpu: fix ref count leak in amdgpu_display_crtc_set_config") Cc: stable@vger.kernel.org Cc: Navid Emamdoost <navid.emamdoost@gmail.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-09-29 17:09:22 -04:00
Alex Deucher	c73d05eaba	drm/amdgpu/display: fix CFLAGS setup for DCN30 Properly handle clang and older versions of gcc. Fixes: `e77165bf7b` ("drm/amd/display: Add DCN3 blocks to Makefile") Acked-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-09-29 17:08:02 -04:00
Flora Cui	898c7302f4	drm/amd/display: fix return value check for hdcp_work max_caps might be 0, thus hdcp_work might be ZERO_SIZE_PTR Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-09-29 17:07:34 -04:00
Jiansong Chen	0c7014154d	drm/amdgpu: remove gpu_info fw support for sienna_cichlid etc. Remove gpu_info fw support for sienna_cichlid etc., since the information can be retrieved from discovery binary. Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-09-29 17:07:06 -04:00
Sudheesh Mavila	97cf32996c	drm/amd/pm: Removed fixed clock in auto mode DPM SMU10_UMD_PSTATE_PEAK_FCLK value should not be used to set the DPM. Suggested-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Sudheesh Mavila <sudheesh.mavila@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-09-29 17:05:02 -04:00
Uwe Kleine-König	efe84d408b	scripts/dtc: only append to HOST_EXTRACFLAGS instead of overwriting When building with $ HOST_EXTRACFLAGS=-g make the expectation is that host tools are built with debug informations. This however doesn't happen if the Makefile assigns a new value to the HOST_EXTRACFLAGS instead of appending to it. So use += instead of := for the first assignment. Fixes: `e3fd9b5384` ("scripts/dtc: consolidate include path options in Makefile") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Rob Herring <robh@kernel.org>	2020-09-29 15:48:08 -05:00
Rob Herring	64ff609b55	dt-bindings: Fix 'reg' size issues in zynqmp examples The default sizes in examples for 'reg' are 1 cell each. Fix the incorrect sizes in zynqmp examples: Documentation/devicetree/bindings/dma/xilinx/xlnx,zynqmp-dpdma.example.dt.yaml: example-0: dma-controller@fd4c0000:reg:0: [0, 4249616384, 0, 4096] is too long From schema: /usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml Documentation/devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.example.dt.yaml: example-0: display@fd4a0000:reg:0: [0, 4249485312, 0, 4096] is too long From schema: /usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml Documentation/devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.example.dt.yaml: example-0: display@fd4a0000:reg:1: [0, 4249526272, 0, 4096] is too long From schema: /usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml Documentation/devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.example.dt.yaml: example-0: display@fd4a0000:reg:2: [0, 4249530368, 0, 4096] is too long From schema: /usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml Documentation/devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.example.dt.yaml: example-0: display@fd4a0000:reg:3: [0, 4249534464, 0, 4096] is too long From schema: /usr/local/lib/python3.8/dist-packages/dtschema/schemas/reg.yaml Cc: Hyun Kwon <hyun.kwon@xilinx.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Vinod Koul <vkoul@kernel.org> Cc: dri-devel@lists.freedesktop.org Cc: dmaengine@vger.kernel.org Acked-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Rob Herring <robh@kernel.org>	2020-09-29 15:39:02 -05:00
Linus Torvalds	ccc1d052ef	Merge tag 'dmaengine-fix-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine fix from Vinod Koul: "Fix dmatest for misconfigured channel" * tag 'dmaengine-fix-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: dmaengine: dmatest: Prevent to run on misconfigured channel	2020-09-29 10:35:42 -07:00
Linus Torvalds	1ccfa66d94	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio fixes from Michael Tsirkin: "A couple of last minute fixes" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost-vdpa: fix backend feature ioctls vhost: Fix documentation	2020-09-29 10:33:06 -07:00
Steven Rostedt (VMware)	b40341fad6	ftrace: Move RCU is watching check after recursion check The first thing that the ftrace function callback helper functions should do is to check for recursion. Peter Zijlstra found that when "rcu_is_watching()" had its notrace removed, it caused perf function tracing to crash. This is because the call of rcu_is_watching() is tested before function recursion is checked and and if it is traced, it will cause an infinite recursion loop. rcu_is_watching() should still stay notrace, but to prevent this should never had crashed in the first place. The recursion prevention must be the first thing done in callback functions. Link: https://lore.kernel.org/r/20200929112541.GM2628@hirez.programming.kicks-ass.net Cc: stable@vger.kernel.org Cc: Paul McKenney <paulmck@kernel.org> Fixes: `c68c0fa293` ("ftrace: Have ftrace_ops_get_func() handle RCU and PER_CPU flags too") Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2020-09-29 13:05:10 -04:00
Steven Rostedt (VMware)	851e6f61cd	tracing: Fix trace_find_next_entry() accounting of temp buffer size The temp buffer size variable for trace_find_next_entry() was incorrectly being updated when the size did not change. The temp buffer size should only be updated when it is reallocated. This is mostly an issue when used with ftrace_dump(). That's because ftrace_dump() can not allocate a new buffer, and instead uses a temporary buffer with a fix size. But the variable that keeps track of that size is incorrectly updated with each call, and it could fall into the path that would try to reallocate the buffer and produce a warning. ------------[ cut here ]------------ WARNING: CPU: 1 PID: 1601 at kernel/trace/trace.c:3548 trace_find_next_entry+0xd0/0xe0 Modules linked in [..] CPU: 1 PID: 1601 Comm: bash Not tainted 5.9.0-rc5-test+ #521 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:trace_find_next_entry+0xd0/0xe0 Code: 40 21 00 00 4c 89 e1 31 d2 4c 89 ee 48 89 df e8 c6 9e ff ff 89 ab 54 21 00 00 5b 5d 41 5c 41 5d c3 48 63 d5 eb bf 31 c0 eb f0 <0f> 0b 48 63 d5 eb b4 66 0f 1f 84 00 00 00 00 00 53 48 8d 8f 60 21 RSP: 0018:ffff95a4f2e8bd70 EFLAGS: 00010046 RAX: ffffffff96679fc0 RBX: ffffffff97910de0 RCX: ffffffff96679fc0 RDX: ffff95a4f2e8bd98 RSI: ffff95a4ee321098 RDI: ffffffff97913000 RBP: 0000000000000018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000046 R12: ffff95a4f2e8bd98 R13: 0000000000000000 R14: ffff95a4ee321098 R15: 00000000009aa301 FS: 00007f8565484740(0000) GS:ffff95a55aa40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055876bd43d90 CR3: 00000000b76e6003 CR4: 00000000001706e0 Call Trace: trace_print_lat_context+0x58/0x2d0 ? cpumask_next+0x16/0x20 print_trace_line+0x1a4/0x4f0 ftrace_dump.cold+0xad/0x12c __handle_sysrq.cold+0x51/0x126 write_sysrq_trigger+0x3f/0x4a proc_reg_write+0x53/0x80 vfs_write+0xca/0x210 ksys_write+0x70/0xf0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f8565579487 Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007ffd40707948 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f8565579487 RDX: 0000000000000002 RSI: 000055876bd74de0 RDI: 0000000000000001 RBP: 000055876bd74de0 R08: 000000000000000a R09: 0000000000000001 R10: 000055876bdec280 R11: 0000000000000246 R12: 0000000000000002 R13: 00007f856564a500 R14: 0000000000000002 R15: 00007f856564a700 irq event stamp: 109958 ---[ end trace 7aab5b7e51484b00 ]--- Not only fix the updating of the temp buffer, but also do not free the temp buffer before a new buffer is allocated (there's no reason to not continue to use the current temp buffer if an allocation fails). Cc: stable@vger.kernel.org Fixes: `8e99cf91b9` ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic") Reported-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2020-09-29 12:46:22 -04:00
Linus Torvalds	fb0155a09b	Merge tag 'nfs-for-5.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: "Highlights include: - NFSv4.2: copy_file_range needs to invalidate caches on success - NFSv4.2: Fix security label length not being reset - pNFS/flexfiles: Ensure we initialise the mirror bsizes correctly on read - pNFS/flexfiles: Fix signed/unsigned type issues with mirror indices" * tag 'nfs-for-5.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: pNFS/flexfiles: Be consistent about mirror index types pNFS/flexfiles: Ensure we initialise the mirror bsizes correctly on read NFSv4.2: fix client's attribute cache management for copy_file_range nfs: Fix security label length not being reset	2020-09-28 11:05:56 -07:00
Jason A. Donenfeld	a4d63c3732	mm: do not rely on mm == current->mm in __get_user_pages_locked It seems likely this block was pasted from internal_get_user_pages_fast, which is not passed an mm struct and therefore uses current's. But __get_user_pages_locked is passed an explicit mm, and current->mm is not always valid. This was hit when being called from i915, which uses: pin_user_pages_remote-> __get_user_pages_remote-> __gup_longterm_locked-> __get_user_pages_locked Before, this would lead to an OOPS: BUG: kernel NULL pointer dereference, address: 0000000000000064 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page CPU: 10 PID: 1431 Comm: kworker/u33:1 Tainted: P S U O 5.9.0-rc7+ #140 Hardware name: LENOVO 20QTCTO1WW/20QTCTO1WW, BIOS N2OET47W (1.34 ) 08/06/2020 Workqueue: i915-userptr-acquire __i915_gem_userptr_get_pages_worker [i915] RIP: 0010:__get_user_pages_remote+0xd7/0x310 Call Trace: __i915_gem_userptr_get_pages_worker+0xc8/0x260 [i915] process_one_work+0x1ca/0x390 worker_thread+0x48/0x3c0 kthread+0x114/0x130 ret_from_fork+0x1f/0x30 CR2: 0000000000000064 This commit fixes the problem by using the mm pointer passed to the function rather than the bogus one in current. Fixes: `008cfe4418` ("mm: Introduce mm_struct.has_pinned") Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reported-by: Harald Arnesen <harald@skogtun.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-09-28 09:21:50 -07:00
Maxime Ripard	6a7548305a	ARM: dts: bcm2835: Change firmware compatible from simple-bus to simple-mfd The current binding for the RPi firmware uses the simple-bus compatible as a fallback to benefit from its automatic probing of child nodes. However, simple-bus also comes with some constraints, like having the ranges, our case. Let's switch to simple-mfd that provides the same probing logic without those constraints. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://lore.kernel.org/r/20200924082642.18144-1-maxime@cerno.tech Signed-off-by: Rob Herring <robh@kernel.org>	2020-09-28 07:55:12 -05:00
Linus Torvalds	a1b8638ba1	Linux 5.9-rc7	2020-09-27 14:38:10 -07:00
Linus Torvalds	16bc1d5432	Merge tag 'kbuild-fixes-v5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - ignore compiler stubs for PPC to fix builds - fix the usage of --target mentioned in the LLVM document * tag 'kbuild-fixes-v5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: Documentation/llvm: Fix clang target examples scripts/kallsyms: skip ppc compiler stub .long_branch. / .plt_branch.	2020-09-27 12:18:57 -07:00
Linus Torvalds	f8818559ca	Merge tag 'x86-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Two fixes for the x86 interrupt code: - Unbreak the magic 'search the timer interrupt' logic in IO/APIC code which got wreckaged when the core interrupt code made the state tracking logic stricter. That caused the interrupt line to stay masked after switching from IO/APIC to PIC delivery mode, which obviously prevents interrupts from being delivered. - Make run_on_irqstack_code() typesafe. The function argument is a void pointer which is then cast to 'void (fun)(void ). This breaks Control Flow Integrity checking in clang. Use proper helper functions for the three variants reuqired" * tag 'x86-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioapic: Unbreak check_timer() x86/irq: Make run_on_irqstack_cond() typesafe	2020-09-27 12:15:21 -07:00
Linus Torvalds	ba25f0570b	Merge tag 'timers-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "A set of clocksource/clockevents updates: - Reset the TI/DM timer before enabling it instead of doing it the other way round. - Initialize the reload value for the GX6605s timer correctly so the hardware counter starts at 0 again after overrun. - Make error return value negative in the h8300 timer init function" * tag 'timers-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/drivers/timer-gx6605s: Fixup counter reload clocksource/drivers/timer-ti-dm: Do reset before enable clocksource/drivers/h8300_timer8: Fix wrong return value in h8300_8timer_init()	2020-09-27 12:11:35 -07:00
Peter Xu	d042035eaf	mm/thp: Split huge pmds/puds if they're pinned when fork() Pinned pages shouldn't be write-protected when fork() happens, because follow up copy-on-write on these pages could cause the pinned pages to be replaced by random newly allocated pages. For huge PMDs, we split the huge pmd if pinning is detected. So that future handling will be done by the PTE level (with our latest changes, each of the small pages will be copied). We can achieve this by let copy_huge_pmd() return -EAGAIN for pinned pages, so that we'll fallthrough in copy_pmd_range() and finally land the next copy_pte_range() call. Huge PUDs will be even more special - so far it does not support anonymous pages. But it can actually be done the same as the huge PMDs even if the split huge PUDs means to erase the PUD entries. It'll guarantee the follow up fault ins will remap the same pages in either parent/child later. This might not be the most efficient way, but it should be easy and clean enough. It should be fine, since we're tackling with a very rare case just to make sure userspaces that pinned some thps will still work even without MADV_DONTFORK and after they fork()ed. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-09-27 11:21:35 -07:00
Peter Xu	70e806e4e6	mm: Do early cow for pinned pages during fork() for ptes This allows copy_pte_range() to do early cow if the pages were pinned on the source mm. Currently we don't have an accurate way to know whether a page is pinned or not. The only thing we have is page_maybe_dma_pinned(). However that's good enough for now. Especially, with the newly added mm->has_pinned flag to make sure we won't affect processes that never pinned any pages. It would be easier if we can do GFP_KERNEL allocation within copy_one_pte(). Unluckily, we can't because we're with the page table locks held for both the parent and child processes. So the page allocation needs to be done outside copy_one_pte(). Some trick is there in copy_present_pte(), majorly the wrprotect trick to block concurrent fast-gup. Comments in the function should explain better in place. Oleg Nesterov reported a (probably harmless) bug during review that we didn't reset entry.val properly in copy_pte_range() so that potentially there's chance to call add_swap_count_continuation() multiple times on the same swp entry. However that should be harmless since even if it happens, the same function (add_swap_count_continuation()) will return directly noticing that there're enough space for the swp counter. So instead of a standalone stable patch, it is touched up in this patch directly. Link: https://lore.kernel.org/lkml/20200914143829.GA1424636@nvidia.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-09-27 11:21:35 -07:00
Peter Xu	7a4830c380	mm/fork: Pass new vma pointer into copy_page_range() This prepares for the future work to trigger early cow on pinned pages during fork(). No functional change intended. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-09-27 11:21:35 -07:00
Peter Xu	008cfe4418	mm: Introduce mm_struct.has_pinned (Commit message majorly collected from Jason Gunthorpe) Reduce the chance of false positive from page_maybe_dma_pinned() by keeping track if the mm_struct has ever been used with pin_user_pages(). This allows cases that might drive up the page ref_count to avoid any penalty from handling dma_pinned pages. Future work is planned, to provide a more sophisticated solution, likely to turn it into a real counter. For now, make it atomic_t but use it as a boolean for simplicity. Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-09-27 11:21:35 -07:00
Thomas Gleixner	a7b6c0feda	Merge tag 'timers-v5.9-rc4' of https://git.linaro.org/people/daniel.lezcano/linux into timers/urgent Pull clocksource/clockevent fixes from Daniel Lezcano: - Fix wrong signed return value when checking of_iomap in the probe function for the h8300 timer (Tianjia Zhang) - Fix reset sequence when setting up the timer on the dm_timer (Tony Lindgren) - Fix counter reload when the interrupt fires on gx6605s (Guo Ren)	2020-09-27 11:24:34 +02:00
Linus Torvalds	a1bffa4874	Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Three fixes: one in drivers (lpfc) and two for zoned block devices. The latter also impinges on the block layer but only to introduce a new block API for setting the zone model rather than fiddling with the queue directly in the zoned block driver" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: sd_zbc: Fix ZBC disk initialization scsi: sd: sd_zbc: Fix handling of host-aware ZBC disks scsi: lpfc: Fix initial FLOGI failure due to BBSCN not supported	2020-09-26 11:18:37 -07:00
Linus Torvalds	692495baa3	Merge tag 'io_uring-5.9-2020-09-25' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "Two fixes for regressions in this cycle, and one that goes to 5.8 stable: - fix leak of getname() retrieved filename - remove plug->nowait assignment, fixing a regression with btrfs - fix for async buffered retry" * tag 'io_uring-5.9-2020-09-25' of git://git.kernel.dk/linux-block: io_uring: ensure async buffered read-retry is setup properly io_uring: don't unconditionally set plug->nowait = true io_uring: ensure open/openat2 name is cleaned on cancelation	2020-09-26 11:13:51 -07:00
Linus Torvalds	9d2fbaefb3	Merge tag 'block-5.9-2020-09-25' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "NVMe pull request from Christoph, and removal of a dead define. - fix error during controller probe that cause double free irqs (Keith Busch) - FC connection establishment fix (James Smart) - properly handle completions for invalid tags (Xianting Tian) - pass the correct nsid to the command effects and supported log (Chaitanya Kulkarni)" * tag 'block-5.9-2020-09-25' of git://git.kernel.dk/linux-block: block: remove unused BLK_QC_T_EAGAIN flag nvme-core: don't use NVME_NSID_ALL for command effects and supported log nvme-fc: fail new connections to a deleted host or remote port nvme-pci: fix NULL req in completion handler nvme: return errors for hwmon init	2020-09-26 11:07:36 -07:00
Linus Torvalds	eeddbe6841	Merge tag 's390-5.9-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fix from Vasily Gorbik: "Fix truncated ZCRYPT_PERDEV_REQCNT ioctl result. Copy entire reqcnt list" * tag 's390-5.9-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/zcrypt: Fix ZCRYPT_PERDEV_REQCNT ioctl	2020-09-26 11:01:18 -07:00
Linus Torvalds	8fb1e91033	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "9 patches. Subsystems affected by this patch series: mm (thp, memcg, gup, migration, memory-hotplug), lib, and x86" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: don't rely on system state to detect hot-plug operations mm: replace memmap_context by meminit_context arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback lib/memregion.c: include memregion.h lib/string.c: implement stpcpy mm/migrate: correct thp migration stats mm/gup: fix gup_fast with dynamic page table folding mm: memcontrol: fix missing suffix of workingset_restore mm, THP, swap: fix allocating cluster for swapfile by mistake	2020-09-26 10:53:35 -07:00
Minchan Kim	ce2684254b	mm: validate pmd after splitting syzbot reported the following KASAN splat: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] CPU: 1 PID: 6826 Comm: syz-executor142 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__lock_acquire+0x84/0x2ae0 kernel/locking/lockdep.c:4296 Code: ff df 8a 04 30 84 c0 0f 85 e3 16 00 00 83 3d 56 58 35 08 00 0f 84 0e 17 00 00 83 3d 25 c7 f5 07 00 74 2c 4c 89 e8 48 c1 e8 03 <80> 3c 30 00 74 12 4c 89 ef e8 3e d1 5a 00 48 be 00 00 00 00 00 fc RSP: 0018:ffffc90004b9f850 EFLAGS: 00010006 Call Trace: lock_acquire+0x140/0x6f0 kernel/locking/lockdep.c:5006 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:354 [inline] madvise_cold_or_pageout_pte_range+0x52f/0x25c0 mm/madvise.c:389 walk_pmd_range mm/pagewalk.c:89 [inline] walk_pud_range mm/pagewalk.c:160 [inline] walk_p4d_range mm/pagewalk.c:193 [inline] walk_pgd_range mm/pagewalk.c:229 [inline] __walk_page_range+0xe7b/0x1da0 mm/pagewalk.c:331 walk_page_range+0x2c3/0x5c0 mm/pagewalk.c:427 madvise_pageout_page_range mm/madvise.c:521 [inline] madvise_pageout mm/madvise.c:557 [inline] madvise_vma mm/madvise.c:946 [inline] do_madvise+0x12d0/0x2090 mm/madvise.c:1145 __do_sys_madvise mm/madvise.c:1171 [inline] __se_sys_madvise mm/madvise.c:1169 [inline] __x64_sys_madvise+0x76/0x80 mm/madvise.c:1169 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The backing vma was shmem. In case of split page of file-backed THP, madvise zaps the pmd instead of remapping of sub-pages. So we need to check pmd validity after split. Reported-by: syzbot+ecf80462cb7d5d552bc7@syzkaller.appspotmail.com Fixes: `1a4e58cce8` ("mm: introduce MADV_PAGEOUT") Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-09-26 10:48:08 -07:00
Laurent Dufour	f85086f95f	mm: don't rely on system state to detect hot-plug operations In register_mem_sect_under_node() the system_state's value is checked to detect whether the call is made during boot time or during an hot-plug operation. Unfortunately, that check against SYSTEM_BOOTING is wrong because regular memory is registered at SYSTEM_SCHEDULING state. In addition, memory hot-plug operation can be triggered at this system state by the ACPI [1]. So checking against the system state is not enough. The consequence is that on system with interleaved node's ranges like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] This can be seen on PowerPC LPAR after multiple memory hot-plug and hot-unplug operations are done. At the next reboot the node's memory ranges can be interleaved and since the call to link_mem_sections() is made in topology_init() while the system is in the SYSTEM_SCHEDULING state, the node's id is not checked, and the sections registered to multiple nodes: $ ls -l /sys/devices/system/memory/memory21/node* total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 In that case, the system is able to boot but if later one of theses memory blocks is hot-unplugged and then hot-plugged, the sysfs inconsistency is detected and this is triggering a BUG_ON(): kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25 Call Trace: add_memory_resource+0x23c/0x340 (unreliable) __add_memory+0x5c/0xf0 dlpar_add_lmb+0x1b4/0x500 dlpar_memory+0x1f8/0xb80 handle_dlpar_errorlog+0xc0/0x190 dlpar_store+0x198/0x4a0 kobj_attr_store+0x30/0x50 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 vfs_write+0xe8/0x290 ksys_write+0xdc/0x130 system_call_exception+0x160/0x270 system_call_common+0xf0/0x27c This patch addresses the root cause by not relying on the system_state value to detect whether the call is due to a hot-plug operation. An extra parameter is added to link_mem_sections() detailing whether the operation is due to a hot-plug operation. [1] According to Oscar Salvador, using this qemu command line, ACPI memory hotplug operations are raised at SYSTEM_SCHEDULING state: $QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \ -m size=$MEM,slots=255,maxmem=4294967296k \ -numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \ -object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \ -object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \ -object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \ -object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \ -object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \ -object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \ -object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \ Fixes: `4fbce63391` ("mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()") Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200915094143.79181-3-ldufour@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-09-26 10:33:57 -07:00

1 2 3 4 5 ...

950941 Commits