linux

Author	SHA1	Message	Date
Anshuman Khandual	b4d6c06c8d	powerpc/perf: Configure BHRB filter before enabling PMU interrupts Right now the config_bhrb() PMU specific call happens after write_mmcr0(), which actually enables the PMU for event counting and interrupts. So there is a small window of time where the PMU and BHRB runs without the required HW branch filter (if any) enabled in BHRB. This can cause some of the branch samples to be collected through BHRB without any filter applied and hence affects the correctness of the results. This patch moves the BHRB config function call before enabling interrupts. Here are some data points captured via trace prints which depicts how we could get PMU interrupts with BHRB filter NOT enabled with a standard perf record command line (asking for branch record information as well). $ perf record -j any_call ls Before the patch:- ls-1962 [003] d... 2065.299590: .perf_event_interrupt: MMCRA: 40000000000 ls-1962 [003] d... 2065.299603: .perf_event_interrupt: MMCRA: 40000000000 ... All the PMU interrupts before this point did not have the requested HW branch filter enabled in the MMCRA. ls-1962 [003] d... 2065.299647: .perf_event_interrupt: MMCRA: 40040000000 ls-1962 [003] d... 2065.299662: .perf_event_interrupt: MMCRA: 40040000000 After the patch:- ls-1850 [008] d... 190.311828: .perf_event_interrupt: MMCRA: 40040000000 ls-1850 [008] d... 190.311848: .perf_event_interrupt: MMCRA: 40040000000 All the PMU interrupts have the requested HW BHRB branch filter enabled in MMCRA. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> [mpe: Fixed up whitespace and cleaned up changelog] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-02-11 11:24:50 +11:00
Nathan Fontenot	0ba3e10116	crypto/nx/nx-842: Fix handling of vmalloc addresses The powerpc specific nx-842 compression driver does not currently handle translating a vmalloc address to a physical address. The current driver uses __pa() for all addresses which does not properly handle vmalloc addresses and thus causes a failure since we do not pass a proper physical address to the hypervisor. This patch adds a routine to convert an address to a physical address by checking for vmalloc addresses and handling them properly. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> --- drivers/crypto/nx/nx-842.c \| 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-02-11 11:24:49 +11:00
Michael Ellerman	8d4887ee30	powerpc/pseries: Select ARCH_RANDOM on pseries We have a driver for the ARCH_RANDOM hook in rng.c, so we should select ARCH_RANDOM on pseries. Without this the build breaks if you turn ARCH_RANDOM off. This hasn't broken the build because pseries_defconfig doesn't specify a value for PPC_POWERNV, which is default y, and selects ARCH_RANDOM. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-02-11 11:24:49 +11:00
Michael Ellerman	2fdd313f54	powerpc/perf: Add Power8 cache & TLB events Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-02-11 11:24:48 +11:00
Laurent Dufour	3b830c824a	powerpc/relocate fix relocate processing in LE mode Relocation's code is not working in little endian mode because the r_info field, which is a 64 bits value, should be read from the right offset. The current code is optimized to read the r_info field as a 32 bits value starting at the middle of the double word (offset 12). When running in LE mode, the read value is not correct since only the MSB is read. This patch removes this optimization which consist to deal with a 32 bits value instead of a 64 bits one. This way it works in big and little endian mode. Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-02-11 11:24:48 +11:00
Mahesh Salgaonkar	429d2e8342	powerpc: Fix kdump hang issue on p8 with relocation on exception enabled. On p8 systems, with relocation on exception feature enabled we are seeing kdump kernel hang at interrupt vector 0xc4400. The reason is, with this feature enabled, exception are raised with MMU (IR=DR=1) ON with the default offset of 0xc4000. Since exception is raised in virtual mode it requires the vector region to be executable without which it fails to fetch and execute instruction at 0xc*4xxx. For default kernel since kernel is loaded at real 0, the htab mappings sets the entire kernel text region executable. But for relocatable kernel (e.g. kdump case) we only copy interrupt vectors down to real 0 and never marked that region as executable because in p7 and below we always get exception in real mode. This patch fixes this issue by marking htab mapping range as executable that overlaps with the interrupt vector region for relocatable kernel. Thanks to Ben who helped me to debug this issue and find the root cause. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-02-11 11:24:47 +11:00
Mahesh Salgaonkar	3ec8b78fcc	powerpc/pseries: Disable relocation on exception while going down during crash. Disable relocation on exception while going down even in kdump case. This is because we are about clear htab mappings while kexec-ing into kdump kernel and we may run into issues if we still have AIL ON. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-02-11 11:24:47 +11:00
Thadeu Lima de Souza Cascardo	8cc6b6cd87	powerpc/eeh: Drop taken reference to driver on eeh_rmv_device Commit `f5c57710dd` ("powerpc/eeh: Use partial hotplug for EEH unaware drivers") introduces eeh_rmv_device, which may grab a reference to a driver, but not release it. That prevents a driver from being removed after it has gone through EEH recovery. This patch drops the reference if it was taken. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-02-11 11:24:46 +11:00
Paul Gortmaker	0215b4aa06	powerpc: Fix build failure in sysdev/mpic.c for MPIC_WEIRD=y Commit `446f6d06fa` ("powerpc/mpic: Properly set default triggers") breaks the mpc7447_hpc_defconfig as follows: CC arch/powerpc/sysdev/mpic.o arch/powerpc/sysdev/mpic.c: In function 'mpic_set_irq_type': arch/powerpc/sysdev/mpic.c:886:9: error: case label does not reduce to an integer constant arch/powerpc/sysdev/mpic.c:890:9: error: case label does not reduce to an integer constant arch/powerpc/sysdev/mpic.c:894:9: error: case label does not reduce to an integer constant arch/powerpc/sysdev/mpic.c:898:9: error: case label does not reduce to an integer constant Looking at the cpp output (gcc 4.7.3), I see: case mpic->hw_set[MPIC_IDX_VECPRI_SENSE_EDGE] \| mpic->hw_set[MPIC_IDX_VECPRI_POLARITY_POSITIVE]: The pointer into an array appears because CONFIG_MPIC_WEIRD=y is set for this platform, thus enabling the following: ------------------- #ifdef CONFIG_MPIC_WEIRD static u32 mpic_infos[][MPIC_IDX_END] = { [0] = { /* Original OpenPIC compatible MPIC / [...] #define MPIC_INFO(name) mpic->hw_set[MPIC_IDX_##name] #else / CONFIG_MPIC_WEIRD / #define MPIC_INFO(name) MPIC_##name #endif / CONFIG_MPIC_WEIRD */ ------------------- Here we convert the case section to if/else if, and also add the equivalent of a default case to warn about unknown types. Boot tested on sbc8548, build tested on all defconfigs. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-02-11 11:24:45 +11:00
Linus Torvalds	6792dfe383	Merge branch 'akpm' (patches from Andrew Morton) Merge misc fixes from Andrew Morton: "A bunch of fixes" * emailed patches fron Andrew Morton <akpm@linux-foundation.org>: ocfs2: check existence of old dentry in ocfs2_link() ocfs2: update inode size after zeroing the hole ocfs2: fix issue that ocfs2_setattr() does not deal with new_i_size==i_size mm/memory-failure.c: move refcount only in !MF_COUNT_INCREASED smp.h: fix x86+cpu.c sparse warnings about arch nonboot CPU calls mm: fix page leak at nfs_symlink() slub: do not assert not having lock in removing freed partial gitignore: add all.config ocfs2: fix ocfs2_sync_file() if filesystem is readonly drivers/edac/edac_mc_sysfs.c: poll timeout cannot be zero fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem xen: properly account for _PAGE_NUMA during xen pte translations mm/slub.c: list_lock may not be held in some circumstances drivers/md/bcache/extents.c: use %zi to format size_t vmcore: prevent PT_NOTE p_memsz overflow during header update drivers/message/i2o/i2o_config.c: fix deadlock in compat_ioctl(I2OGETIOPS) Documentation/: update 00-INDEX files checkpatch: fix detection of git repository get_maintainer: fix detection of git repository drivers/misc/sgi-gru/grukdump.c: unlocking should be conditional in gru_dump_context()	2014-02-10 16:03:16 -08:00
Xue jiufei	0e048316ff	ocfs2: check existence of old dentry in ocfs2_link() System call linkat first calls user_path_at(), check the existence of old dentry, and then calls vfs_link()->ocfs2_link() to do the actual work. There may exist a race when Node A create a hard link for file while node B rm it. Node A Node B user_path_at() ->ocfs2_lookup(), find old dentry exist rm file, add inode say inodeA to orphan_dir call ocfs2_link(),create a hard link for inodeA. rm the link, add inodeA to orphan_dir again When orphan_scan work start, it calls ocfs2_queue_orphans() to do the main work. It first tranverses entrys in orphan_dir, linking all inodes in this orphan_dir to a list look like this: inodeA->inodeB->...->inodeA When tranvering this list, it will fall into loop, calling iput() again and again. And finally trigger BUG_ON(inode->i_state & I_CLEAR). Signed-off-by: joyce <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:43 -08:00
Junxiao Bi	c7d2cbc364	ocfs2: update inode size after zeroing the hole fs-writeback will release the dirty pages without page lock whose offset are over inode size, the release happens at block_write_full_page_endio(). If not update, dirty pages in file holes may be released before flushed to the disk, then file holes will contain some non-zero data, this will cause sparse file md5sum error. To reproduce the bug, find a big sparse file with many holes, like vm image file, its actual size should be bigger than available mem size to make writeback work more frequently, tar it with -S option, then keep untar it and check its md5sum again and again until you get a wrong md5sum. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Younger Liu <younger.liu@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:43 -08:00
Younger Liu	d62e74be12	ocfs2: fix issue that ocfs2_setattr() does not deal with new_i_size==i_size The issue scenario is as following: - Create a small file and fallocate a large disk space for a file with FALLOC_FL_KEEP_SIZE option. - ftruncate the file back to the original size again. but the disk free space is not changed back. This is a real bug that be fixed in this patch. In order to solve the issue above, we modified ocfs2_setattr(), if attr->ia_size != i_size_read(inode), It calls ocfs2_truncate_file(), and truncate disk space to attr->ia_size. Signed-off-by: Younger Liu <younger.liu@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Tested-by: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Sunil Mushran <sunil.mushran@gmail.com> Reviewed-by: Jensen <shencanquan@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:43 -08:00
Naoya Horiguchi	8d547ff4ac	mm/memory-failure.c: move refcount only in !MF_COUNT_INCREASED mce-test detected a test failure when injecting error to a thp tail page. This is because we take page refcount of the tail page in madvise_hwpoison() while the fix in commit `a3e0f9e47d` ("mm/memory-failure.c: transfer page count from head page to tail page after split thp") assumes that we always take refcount on the head page. When a real memory error happens we take refcount on the head page where memory_failure() is called without MF_COUNT_INCREASED set, so it seems to me that testing memory error on thp tail page using madvise makes little sense. This patch cancels moving refcount in !MF_COUNT_INCREASED for valid testing. [akpm@linux-foundation.org: s/&&/&/] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Chen Gong <gong.chen@linux.intel.com> Cc: <stable@vger.kernel.org> [3.9+: `a3e0f9e47d`] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:43 -08:00
Paul Gortmaker	fb37bb04d6	smp.h: fix x86+cpu.c sparse warnings about arch nonboot CPU calls Use what we already do for arch_disable_smp_support() to fix these: arch/x86/kernel/smpboot.c:1155:6: warning: symbol 'arch_enable_nonboot_cpus_begin' was not declared. Should it be static? arch/x86/kernel/smpboot.c:1160:6: warning: symbol 'arch_enable_nonboot_cpus_end' was not declared. Should it be static? kernel/cpu.c:512:13: warning: symbol 'arch_enable_nonboot_cpus_begin' was not declared. Should it be static? kernel/cpu.c:516:13: warning: symbol 'arch_enable_nonboot_cpus_end' was not declared. Should it be static? Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:42 -08:00
Rafael Aquini	a0b54adda3	mm: fix page leak at nfs_symlink() Changes in commit `a0b8cab3b9` ("mm: remove lru parameter from __pagevec_lru_add and remove parts of pagevec API") have introduced a call to add_to_page_cache_lru() which causes a leak in nfs_symlink() as now the page gets an extra refcount that is not dropped. Jan Stancek observed and reported the leak effect while running test8 from Connectathon Testsuite. After several iterations over the test case, which creates several symlinks on a NFS mountpoint, the test system was quickly getting into an out-of-memory scenario. This patch fixes the page leak by dropping that extra refcount add_to_page_cache_lru() is grabbing. Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Rafael Aquini <aquini@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Jeff Layton <jlayton@redhat.com> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: <stable@vger.kernel.org> [3.11.x+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:42 -08:00
Steven Rostedt	1e4dd9461f	slub: do not assert not having lock in removing freed partial Vladimir reported the following issue: Commit `c65c1877bd` ("slub: use lockdep_assert_held") requires remove_partial() to be called with n->list_lock held, but free_partial() called from kmem_cache_close() on cache destruction does not follow this rule, leading to a warning: WARNING: CPU: 0 PID: 2787 at mm/slub.c:1536 __kmem_cache_shutdown+0x1b2/0x1f0() Modules linked in: CPU: 0 PID: 2787 Comm: modprobe Tainted: G W 3.14.0-rc1-mm1+ #1 Hardware name: 0000000000000600 ffff88003ae1dde8 ffffffff816d9583 0000000000000600 0000000000000000 ffff88003ae1de28 ffffffff8107c107 0000000000000000 ffff880037ab2b00 ffff88007c240d30 ffffea0001ee5280 ffffea0001ee52a0 Call Trace: __kmem_cache_shutdown+0x1b2/0x1f0 kmem_cache_destroy+0x43/0xf0 xfs_destroy_zones+0x103/0x110 [xfs] exit_xfs_fs+0x38/0x4e4 [xfs] SyS_delete_module+0x19a/0x1f0 system_call_fastpath+0x16/0x1b His solution was to add a spinlock in order to quiet lockdep. Although there would be no contention to adding the lock, that lock also requires disabling of interrupts which will have a larger impact on the system. Instead of adding a spinlock to a location where it is not needed for lockdep, make a __remove_partial() function that does not test if the list_lock is held, as no one should have it due to it being freed. Also added a __add_partial() function that does not do the lock validation either, as it is not needed for the creation of the cache. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Vladimir Davydov <vdavydov@parallels.com> Suggested-by: David Rientjes <rientjes@google.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:42 -08:00
Borislav Petkov	25fba9bebe	gitignore: add all.config This is used by kbuild to load preset Kconfig options. We need to ignore it, otherwise git clean kills it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:42 -08:00
Younger Liu	a987c7ca7f	ocfs2: fix ocfs2_sync_file() if filesystem is readonly If filesystem is readonly, there is no need to flush drive's caches or force any uncommitted transactions. [akpm@linux-foundation.org: return -EROFS, not 0] Signed-off-by: Younger Liu <younger.liucn@gmail.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:42 -08:00
Prarit Bhargava	79040cad3f	drivers/edac/edac_mc_sysfs.c: poll timeout cannot be zero If you do echo 0 > /sys/module/edac_core/parameters/edac_mc_poll_msec the following stack trace is output because the edac module is not designed to poll with a timeout of zero. WARNING: CPU: 12 PID: 0 at lib/list_debug.c:33 __list_add+0xac/0xc0() list_add corruption. prev->next should be next (ffff8808291dd1b8), but was (null). (prev=ffff8808286fe3f8). Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 12 PID: 0 Comm: swapper/12 Not tainted 3.13.0+ #1 Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Call Trace: <IRQ> __list_add+0xac/0xc0 __internal_add_timer+0xab/0x130 internal_add_timer+0x17/0x40 mod_timer_pinned+0xca/0x170 intel_pstate_timer_func+0x28a/0x380 call_timer_fn+0x36/0x100 run_timer_softirq+0x1ff/0x2f0 __do_softirq+0xf5/0x2e0 irq_exit+0x10d/0x120 smp_apic_timer_interrupt+0x45/0x60 apic_timer_interrupt+0x6d/0x80 <EOI> cpuidle_idle_call+0xb9/0x1f0 arch_cpu_idle+0xe/0x30 cpu_startup_entry+0x9e/0x240 start_secondary+0x1e4/0x290 kernel BUG at kernel/timer.c:1084! invalid opcode: 0000 [#1] SMP Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 12 PID: 0 Comm: swapper/12 Tainted: G W 3.13.0+ #1 Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Call Trace: <IRQ> run_timer_softirq+0x245/0x2f0 __do_softirq+0xf5/0x2e0 irq_exit+0x10d/0x120 smp_apic_timer_interrupt+0x45/0x60 apic_timer_interrupt+0x6d/0x80 <EOI> cpuidle_idle_call+0xb9/0x1f0 arch_cpu_idle+0xe/0x30 cpu_startup_entry+0x9e/0x240 start_secondary+0x1e4/0x290 RIP cascade+0x93/0xa0 WARNING: CPU: 36 PID: 1154 at kernel/workqueue.c:1461 __queue_delayed_work+0xed/0x1a0() Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 36 PID: 1154 Comm: kworker/u481:3 Tainted: G W 3.13.0+ #1 Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Workqueue: edac-poller edac_mc_workq_function [edac_core] Call Trace: dump_stack+0x45/0x56 warn_slowpath_common+0x7d/0xa0 warn_slowpath_null+0x1a/0x20 __queue_delayed_work+0xed/0x1a0 queue_delayed_work_on+0x27/0x50 edac_mc_workq_function+0x72/0xa0 [edac_core] process_one_work+0x17b/0x460 worker_thread+0x11b/0x400 kthread+0xd2/0xf0 ret_from_fork+0x7c/0xb0 This patch adds a range check in the edac_mc_poll_msec code to check for 0. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:41 -08:00
Eric W. Biederman	96c7a2ff21	fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem Recently due to a spike in connections per second memcached on 3 separate boxes triggered the OOM killer from accept. At the time the OOM killer was triggered there was 4GB out of 36GB free in zone 1. The problem was that alloc_fdtable was allocating an order 3 page (32KiB) to hold a bitmap, and there was sufficient fragmentation that the largest page available was 8KiB. I find the logic that PAGE_ALLOC_COSTLY_ORDER can't fail pretty dubious but I do agree that order 3 allocations are very likely to succeed. There are always pathologies where order > 0 allocations can fail when there are copious amounts of free memory available. Using the pigeon hole principle it is easy to show that it requires 1 page more than 50% of the pages being free to guarantee an order 1 (8KiB) allocation will succeed, 1 page more than 75% of the pages being free to guarantee an order 2 (16KiB) allocation will succeed and 1 page more than 87.5% of the pages being free to guarantee an order 3 allocate will succeed. A server churning memory with a lot of small requests and replies like memcached is a common case that if anything can will skew the odds against large pages being available. Therefore let's not give external applications a practical way to kill linux server applications, and specify __GFP_NORETRY to the kmalloc in alloc_fdmem. Unless I am misreading the code and by the time the code reaches should_alloc_retry in __alloc_pages_slowpath (where __GFP_NORETRY becomes signification). We have already tried everything reasonable to allocate a page and the only thing left to do is wait. So not waiting and falling back to vmalloc immediately seems like the reasonable thing to do even if there wasn't a chance of triggering the OOM killer. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Cong Wang <cwang@twopensource.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:41 -08:00
Mel Gorman	a9c8e4beee	xen: properly account for _PAGE_NUMA during xen pte translations Steven Noonan forwarded a users report where they had a problem starting vsftpd on a Xen paravirtualized guest, with this in dmesg: BUG: Bad page map in process vsftpd pte:8000000493b88165 pmd:e9cc01067 page:ffffea00124ee200 count:0 mapcount:-1 mapping: (null) index:0x0 page flags: 0x2ffc0000000014(referenced\|dirty) addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping: (null) index:7f97eea74 CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1 Call Trace: dump_stack+0x45/0x56 print_bad_pte+0x22e/0x250 unmap_single_vma+0x583/0x890 unmap_vmas+0x65/0x90 exit_mmap+0xc5/0x170 mmput+0x65/0x100 do_exit+0x393/0x9e0 do_group_exit+0xcc/0x140 SyS_exit_group+0x14/0x20 system_call_fastpath+0x1a/0x1f Disabling lock debugging due to kernel taint BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1 BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1 The issue could not be reproduced under an HVM instance with the same kernel, so it appears to be exclusive to paravirtual Xen guests. He bisected the problem to commit `1667918b64` ("mm: numa: clear numa hinting information on mprotect") that was also included in 3.12-stable. The problem was related to how xen translates ptes because it was not accounting for the _PAGE_NUMA bit. This patch splits pte_present to add a pteval_present helper for use by xen so both bare metal and xen use the same code when checking if a PTE is present. [mgorman@suse.de: wrote changelog, proposed minor modifications] [akpm@linux-foundation.org: fix typo in comment] Reported-by: Steven Noonan <steven@uplinklabs.net> Tested-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:41 -08:00
David Rientjes	255d0884f5	mm/slub.c: list_lock may not be held in some circumstances Commit `c65c1877bd` ("slub: use lockdep_assert_held") incorrectly required that add_full() and remove_full() hold n->list_lock. The lock is only taken when kmem_cache_debug(s), since that's the only time it actually does anything. Require that the lock only be taken under such a condition. Reported-by: Larry Finger <Larry.Finger@lwfinger.net> Tested-by: Larry Finger <Larry.Finger@lwfinger.net> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:41 -08:00
Geert Uytterhoeven	bd180b4e2b	drivers/md/bcache/extents.c: use %zi to format size_t drivers/md/bcache/extents.c: In function `btree_ptr_bad_expensive': drivers/md/bcache/extents.c:196: warning: format `%li' expects type `long int', but argument 4 has type `size_t' Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:40 -08:00
Greg Pearson	38dfac843c	vmcore: prevent PT_NOTE p_memsz overflow during header update Currently, update_note_header_size_elf64() and update_note_header_size_elf32() will add the size of a PT_NOTE entry to real_sz even if that causes real_sz to exceeds max_sz. This patch corrects the while loop logic in those routines to ensure that does not happen and prints a warning if a PT_NOTE entry is dropped. If zero PT_NOTE entries are found or this condition is encountered because the only entry was dropped, a warning is printed and an error is returned. One possible negative side effect of exceeding the max_sz limit is an allocation failure in merge_note_headers_elf64() or merge_note_headers_elf32() which would produce console output such as the following while booting the crash kernel. vmalloc: allocation failure: 14076997632 bytes swapper/0: page allocation failure: order:0, mode:0x80d2 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-gbp1 #7 Call Trace: dump_stack+0x19/0x1b warn_alloc_failed+0xf0/0x160 __vmalloc_node_range+0x19e/0x250 vmalloc_user+0x4c/0x70 merge_note_headers_elf64.constprop.9+0x116/0x24a vmcore_init+0x2d4/0x76c do_one_initcall+0xe2/0x190 kernel_init_freeable+0x17c/0x207 kernel_init+0xe/0x180 ret_from_fork+0x7c/0xb0 Kdump: vmcore not initialized kdump: dump target is /dev/sda4 kdump: saving to /sysroot//var/crash/127.0.0.1-2014.01.28-13:58:52/ kdump: saving vmcore-dmesg.txt Cannot open /proc/vmcore: No such file or directory kdump: saving vmcore-dmesg.txt failed kdump: saving vmcore kdump: saving vmcore failed This type of failure has been seen on a four socket prototype system with certain memory configurations. Most PT_NOTE sections have a single entry similar to: n_namesz = 0x5 n_descsz = 0x150 n_type = 0x1 Occasionally, a second entry is encountered with very large n_namesz and n_descsz sizes: n_namesz = 0x80000008 n_descsz = 0x510ae163 n_type = 0x80000008 Not yet sure of the source of these extra entries, they seem bogus, but they shouldn't cause crash dump to fail. Signed-off-by: Greg Pearson <greg.pearson@hp.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:40 -08:00
Alexey Khoroshilov	a3eb7fbb41	drivers/message/i2o/i2o_config.c: fix deadlock in compat_ioctl(I2OGETIOPS) i2o_cfg_compat_ioctl(I2OGETIOPS) locks i2o_cfg_mutex and then calls i2o_cfg_ioctl(I2OGETIOPS) that locks i2o_cfg_mutex as well. A deadlock is guaranteed. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:40 -08:00
Henrik Austad	3cf8ca1c25	Documentation/: update 00-INDEX files Some of the 00-INDEX files are somewhat outdated and some folders does not contain 00-INDEX at all. Only outdated (with the notably exception of spi) indexes are touched here, the 169 folders without 00-INDEX has not been touched. New 00-INDEX - spi/* was added in a series of commits dating back to 2006 Added files (missing in (/)00-INDEX) - dmatest.txt was added by commit `851b7e16a0` ("dmatest: run test via debugfs") - this_cpu_ops.txt was added by commit `a1b2a555d6` ("percpu: add documentation on this_cpu operations") - ww-mutex-design.txt was added by commit `040a0a3710` ("mutex: Add support for wound/wait style locks") - bcache.txt was added by commit `cafe563591` ("bcache: A block layer cache") - kernel-per-CPU-kthreads.txt was added by commit `49717cb404` ("kthread: Document ways of reducing OS jitter due to per-CPU kthreads") - phy.txt was added by commit `ff76496347` ("drivers: phy: add generic PHY framework") - block/null_blk was added by commit `12f8f4fc03` ("null_blk: documentation") - module-signing.txt was added by commit `3cafea3076` ("Add Documentation/module-signing.txt file") - assoc_array.txt was added by commit `3cb989501c` ("Add a generic associative array implementation.") - arm/IXP4xx was part of the initial repo - arm/cluster-pm-race-avoidance.txt was added by commit `7fe31d28e8` ("ARM: mcpm: introduce helpers for platform coherency exit/setup") - arm/firmware.txt was added by commit `7366b92a77` ("ARM: Add interface for registering and calling firmware-specific operations") - arm/kernel_mode_neon.txt was added by commit `2afd0a0524` ("ARM: 7825/1: document the use of NEON in kernel mode") - arm/tcm.txt was added by commit `bc581770cf` ("ARM: 5580/2: ARM TCM (Tightly-Coupled Memory) support v3") - arm/vlocks.txt was added by commit `9762f12d3e` ("ARM: mcpm: Add baremetal voting mutexes") - blackfin/gptimers-example.c, Makefile was added by commit `4b60779d5e` ("Blackfin: add an example showing how to use the gptimers API") - devicetree/usage-model.txt was added by commit `31134efc68` ("dt: Linux DT usage model documentation") - fb/api.txt was added by commit `fb21c2f428` ("fbdev: Add FOURCC-based format configuration API") - fb/sm501.txt was added by commit `e6a0498071` ("video, sm501: add edid and commandline support") - fb/udlfb.txt was added by commit `96f8d864af` ("fbdev: move udlfb out of staging.") - filesystems/Makefile was added by commit `1e0051ae48` ("Documentation/fs/: split txt and source files") - filesystems/nfs/nfsd-admin-interfaces.txt was added by commit `8a4c6e19cf` ("nfsd: document kernel interfaces for nfsd configuration") - ide/warm-plug-howto.txt was added by commit `f74c91413e` ("ide: add warm-plug support for IDE devices (take 2)") - laptops/Makefile was added by commit `d49129accc` ("Documentation/laptop/: split txt and source files") - leds/leds-blinkm.txt was added by commit `b54cf35a7f` ("LEDS: add BlinkM RGB LED driver, documentation and update MAINTAINERS") - leds/ledtrig-oneshot.txt was added by commit `5e417281cd` ("leds: add oneshot trigger") - leds/ledtrig-transient.txt was added by commit `44e1e9f8e7` ("leds: add new transient trigger for one shot timer activation") - m68k/README.buddha was part of the initial repo - networking/LICENSE.(qla3xxx\|qlcnic\|qlge) was added by commits `40839129f7`, `c4e84bde1d`, `5a4faa8737` - networking/Makefile was added by commit `3794f3e812` ("docsrc: build Documentation/ sources") - networking/i40evf.txt was added by commit `105bf2fe6b` ("i40evf: add driver to kernel build system") - networking/ipsec.txt was added by commit `b3c6efbc36` ("xfrm: Add file to document IPsec corner case") - networking/mac80211-auth-assoc-deauth.txt was added by commit `3cd7920a2b` ("mac80211: add auth/assoc/deauth flow diagram") - networking/netlink_mmap.txt was added by commit `5683264c39` ("netlink: add documentation for memory mapped I/O") - networking/nf_conntrack-sysctl.txt was added by commit `c9f9e0e159` ("netfilter: doc: add nf_conntrack sysctl api documentation") lan) - networking/team.txt was added by commit `3d249d4ca7` ("net: introduce ethernet teaming device") - networking/vxlan.txt was added by commit `d342894c5d` ("vxlan: virtual extensible lan") - power/runtime_pm.txt was added by commit `5e928f77a0` ("PM: Introduce core framework for run-time PM of I/O devices (rev. 17)") - power/charger-manager.txt was added by commit `3bb3dbbd56` ("power_supply: Add initial Charger-Manager driver") - RCU/lockdep-splat.txt was added by commit `d7bd2d68aa` ("rcu: Document interpretation of RCU-lockdep splats") - s390/kvm.txt was added by `5ecee4b` (KVM: s390: API documentation) - s390/qeth.txt was added by commit `b4d72c08b3` ("qeth: bridgeport support - basic control") - scheduler/sched-bwc.txt was added by commit `88ebc08ea9` ("sched: Add documentation for bandwidth control") - scsi/advansys.txt was added by commit `4bd6d7f356` ("[SCSI] advansys: Move documentation to Documentation/scsi") - scsi/bfa.txt was added by commit `1ec90174bd` ("[SCSI] bfa: add readme file") - scsi/bnx2fc.txt was added by commit `12b8fc10ea` ("[SCSI] bnx2fc: Add driver documentation") - scsi/cxgb3i.txt was added by commit `c3673464eb` ("[SCSI] cxgb3i: Add cxgb3i iSCSI driver.") - scsi/hpsa.txt was added by commit `992ebcf14f` ("[SCSI] hpsa: Add hpsa.txt to Documentation/scsi") - scsi/link_power_management_policy.txt was added by commit `ca77329fb7` ("[libata] Link power management infrastructure") - scsi/osd.txt was added by commit `78e0c621de` ("[SCSI] osd: Documentation for OSD library") - scsi/scsi-parameter.txt was created/moved by commit `163475fb11` ("Documentation: move SCSI parameters to their own text file") - serial/driver was part of the initial repo - serial/n_gsm.txt was added by commit `323e84122e` ("n_gsm: add a documentation") - timers/Makefile was added by commit `3794f3e812` ("docsrc: build Documentation/ sources") - virt/kvm/s390.txt was added by commit `d9101fca3d` ("KVM: s390: diagnose call documentation") - vm/split_page_table_lock was added by commit `49076ec2cc` ("mm: dynamically allocate page->ptl if it cannot be embedded to struct page") - w1/slaves/w1_ds28e04 was added by commit `fbf7f7b4e2` ("w1: Add 1-wire slave device driver for DS28E04-100") - w1/masters/omap-hdq was added by commit `e0a29382c6` ("hdq: documentation for OMAP HDQ") - x86/early-microcode.txt was added by commit `0d91ea86a8` ("x86, doc: Documentation for early microcode loading") - x86/earlyprintk.txt was added by commit `a1aade4788` ("x86/doc: mini-howto for using earlyprintk=dbgp") - x86/entry_64.txt was added by commit `8b4777a4b5` ("x86-64: Document some of entry_64.S") - x86/pat.txt was added by commit `d27554d874` ("x86: PAT documentation") Moved files - arm/kernel_user_helpers.txt was moved out of arch/arm/kernel by commit `37b8304642` ("ARM: kuser: move interface documentation out of the source code") - efi-stub.txt was moved out of x86/ and down into Documentation/ in commit `4172fe2f8a` ("EFI stub documentation updates") - laptops/hpfall.c was moved out of hwmon/ and into laptops/ in commit `efcfed9bad` ("Move hp_accel to drivers/platform/x86") - commit `5616c23ad9` ("x86: doc: move x86-generic documentation from Doc/x86/i386"): x86/usb-legacy-support.txt * x86/boot.txt * x86/zero_page.txt - power/video_extension.txt was moved to acpi in commit `70e66e4df1` ("ACPI / video: move video_extension.txt to Documentation/acpi") Removed files (left in 00-INDEX) - memory.txt was removed by commit `00ea8990aa` ("memory.txt: remove stray information") - gpio.txt was moved to gpio/ in commit `fd8e198cfc` ("Documentation: gpiolib: document new interface") - networking/DLINK.txt was removed by commit `168e06ae26` ("drivers/net: delete old parallel port de600/de620 drivers") - serial/hayes-esp.txt was removed by commit `f53a2ade0b` ("tty: esp: remove broken driver") - s390/TAPE was removed by commit `9e280f6693` ("[S390] remove tape block docu") - vm/locking was removed by commit `57ea8171d2` ("mm: documentation: remove hopelessly out-of-date locking doc") - laptops/acer-wmi.txt was remvoed by commit `020036678e` ("acer-wmi: Delete out-of-date documentation") Typos/misc issues - rpc-server-gss.txt was added as knfsd-rpcgss.txt in commit `030d794bf4` ("SUNRPC: Use gssproxy upcall for server RPCGSS authentication.") - commit `b88cf73d92` ("net: add missing entries to Documentation/networking/00-INDEX") * generic-hdlc.txt was added as generic_hdlc.txt * spider_net.txt was added as spider-net.txt - w1/master/mxc-w1 was added as mxc_w1 by commit `a5fd9139f7` ("w1: add 1-wire master driver for i.MX27 / i.MX31") - s390/zfcpdump.txt was added as zfcpdump by commit `6920c12a40` ("[S390] Add Documentation/s390/00-INDEX.") Signed-off-by: Henrik Austad <henrik@austad.us> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [rcu bits] Acked-by: Rob Landley <rob@landley.net> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rob Herring <robh+dt@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Mark Brown <broonie@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Gleb Natapov <gleb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Len Brown <len.brown@intel.com> Cc: James Bottomley <JBottomley@parallels.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:40 -08:00
Richard Genoud	3645e3283b	checkpatch: fix detection of git repository Since git v1.7.7, the .git directory can be a file when, for example, the kernel is a submodule of another git super project. So, the check "-d .git" is not working anymore in this case. Using a more generic check like "-e .git" corrects this behaviour. Signed-off-by: Richard Genoud <richard.genoud@gmail.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:40 -08:00
Richard Genoud	ec83b616a7	get_maintainer: fix detection of git repository Since git v1.7.7, the .git directory can be a file when, for example, the kernel is a submodule of another git super project. So, the check "-d .git" is not working anymore in this case. Using a more generic check like "-e .git" corrects this behaviour. Signed-off-by: Richard Genoud <richard.genoud@gmail.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:40 -08:00
Dan Carpenter	49d3d6c37a	drivers/misc/sgi-gru/grukdump.c: unlocking should be conditional in gru_dump_context() I was reviewing this and noticed that unlocking should be conditional on the error path. I've changed it to unlock and return directly since we only do it once and it seems unlikely to change in the near future. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-02-10 16:01:39 -08:00
John Ogness	bf06200e73	tcp: tsq: fix nonagle handling Commit `46d3ceabd8` ("tcp: TCP Small Queues") introduced a possible regression for applications using TCP_NODELAY. If TCP session is throttled because of tsq, we should consult tp->nonagle when TX completion is done and allow us to send additional segment, especially if this segment is not a full MSS. Otherwise this segment is sent after an RTO. [edumazet] : Cooked the changelog, added another fix about testing sk_wmem_alloc twice because TX completion can happen right before setting TSQ_THROTTLED bit. This problem is particularly visible with recent auto corking, but might also be triggered with low tcp_limit_output_bytes values or NIC drivers delaying TX completion by hundred of usec, and very low rtt. Thomas Glanzmann for example reported an iscsi regression, caused by tcp auto corking making this bug quite visible. Fixes: `46d3ceabd8` ("tcp: TCP Small Queues") Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Thomas Glanzmann <thomas@glanzmann.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 15:23:39 -08:00
David S. Miller	684bd2e199	Merge branch 'bridge' Toshiaki Makita says: ==================== bridge: Fix corner case problems around local fdb entries There are so many corner cases that are not handled properly around local fdb entries. - We might fail to delete the old entry and might delete an arbitrary local entry when changing mac address of a bridge port. - We always fail to delete the old entry when changing mac address of the bridge device. - We might incorrectly delete a necessary entry when detaching a bridge port. - We might incorrectly delete a necessary entry when deleting a vlan. and so on. This is a patch series to fix these issues. v3: - Handle NTF_USE case in patch 1/9, commented by Vlad Yasevich. - Tested port detach/attach and didn't find any problem with patch 5/9, suggested by Stephen Hemminger. - Add comments about possible inconsistent state in current implementation into commit log of patch 5/9, found by the above test. - Reword unintensive changelog of patch 7/9, commented by Vlad Yasevich. v2: - Change the way to find the old entry in br_fdb_changeaddr() from memorizing previous port address to introducing a new flag indicating whether a fdb entry is added by user or not, commented by Stephen Hemminger. - Add a fix for the way to insert a new address in br_fdb_changeaddr(). - Prevent creating an entry such that its dst is NULL in br_add_if() to preserve old behavior, commented by Vlad Yasevich. - Add more comments about slight behavior change, where the bridge device come to be able to receive traffic to an address it has during short window, to changelogs, commented by Vlad Yasevich. - Add a fix for possible race in br_fdb_change_mac_address(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:35:11 -08:00
Toshiaki Makita	ac4c886883	bridge: Prevent possible race condition in br_fdb_change_mac_address br_fdb_change_mac_address() calls fdb_insert()/fdb_delete() without br->hash_lock. These hash list updates are racy with br_fdb_update()/br_fdb_cleanup(). Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:34 -08:00
Toshiaki Makita	424bb9c97c	bridge: Properly check if local fdb entry can be deleted when deleting vlan Vlan codes unconditionally delete local fdb entries. We should consider the possibility that other ports have the same address and vlan. Example of problematic case: ip link set eth0 address 12:34:56:78:90:ab ip link set eth1 address aa:bb:cc:dd:ee:ff brctl addif br0 eth0 brctl addif br0 eth1 # br0 will have mac address 12:34:56:78:90:ab bridge vlan add dev eth0 vid 10 bridge vlan add dev eth1 vid 10 bridge vlan add dev br0 vid 10 self We will have fdb entry such that f->dst == eth0, f->vlan_id == 10 and f->addr == 12:34:56:78:90:ab at this time. Next, delete eth0 vlan 10. bridge vlan del dev eth0 vid 10 In this case, we still need the entry for br0, but it will be deleted. Note that br0 needs the entry even though its mac address is not set manually. To delete the entry with proper condition checking, fdb_delete_local() is suitable to use. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:34 -08:00
Toshiaki Makita	a778e6d1a5	bridge: Properly check if local fdb entry can be deleted in br_fdb_delete_by_port br_fdb_delete_by_port() doesn't care about vlan and mac address of the bridge device. As the check is almost the same as mac address changing, slightly modify fdb_delete_local() and use it. Note that we can always set added_by_user to 0 in fdb_delete_local() because - br_fdb_delete_by_port() calls fdb_delete_local() for local entries regardless of its added_by_user. In this case, we have to check if another port has the same address and vlan, and if found, we have to create the entry (by changing dst). This is kernel-added entry, not user-added. - br_fdb_changeaddr() doesn't call fdb_delete_local() for user-added entry. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:34 -08:00
Toshiaki Makita	960b589f86	bridge: Properly check if local fdb entry can be deleted in br_fdb_change_mac_address br_fdb_change_mac_address() doesn't check if the local entry has the same address as any of bridge ports. Although I'm not sure when it is beneficial, current implementation allow the bridge device to receive any mac address of its ports. To preserve this behavior, we have to check if the mac address of the entry being deleted is identical to that of any port. As this check is almost the same as that in br_fdb_changeaddr(), create a common function fdb_delete_local() and call it from br_fdb_changeadddr() and br_fdb_change_mac_address(). Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Toshiaki Makita	2b292fb4a5	bridge: Fix the way to check if a local fdb entry can be deleted We should take into account the followings when deleting a local fdb entry. - nbp_vlan_find() can be used only when vid != 0 to check if an entry is deletable, because a fdb entry with vid 0 can exist at any time while nbp_vlan_find() always return false with vid 0. Example of problematic case: ip link set eth0 address 12:34:56:78:90:ab ip link set eth1 address 12:34:56:78:90:ab brctl addif br0 eth0 brctl addif br0 eth1 ip link set eth0 address aa:bb:cc:dd:ee:ff Then, the fdb entry 12:34:56:78:90:ab will be deleted even though the bridge port eth1 still has that address. - The port to which the bridge device is attached might needs a local entry if its mac address is set manually. Example of problematic case: ip link set eth0 address 12:34:56:78:90:ab brctl addif br0 eth0 ip link set br0 address 12:34:56:78:90:ab ip link set eth0 address aa:bb:cc:dd:ee:ff Then, the fdb still must have the entry 12:34:56:78:90:ab, but it will be deleted. We can use br->dev->addr_assign_type to check if the address is manually set or not, but I propose another approach. Since we delete and insert local entries whenever changing mac address of the bridge device, we can change dst of the entry to NULL regardless of addr_assign_type when deleting an entry associated with a certain port, and if it is found to be unnecessary later, then delete it. That is, if changing mac address of a port, the entry might be changed to its dst being NULL first, but is eventually deleted when recalculating and changing bridge id. This approach is especially useful when we want to share the code with deleting vlan in which the bridge device might want such an entry regardless of addr_assign_type, and makes things easy because we don't have to consider if mac address of the bridge device will be changed or not at the time we delete a local entry of a port, which means fdb code will not be bothered even if the bridge id calculating logic is changed in the future. Also, this change reduces inconsistent state, where frames whose dst is the mac address of the bridge, can't reach the bridge because of premature fdb entry deletion. This change reduces the possibility that the bridge device replies unreachable mac address to arp requests, which could occur during the short window between calling del_nbp() and br_stp_recalculate_bridge_id() in br_del_if(). This will effective after br_fdb_delete_by_port() starts to use the same code by following patch. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Toshiaki Makita	a4b816d8ba	bridge: Change local fdb entries whenever mac address of bridge device changes Vlan code may need fdb change when changing mac address of bridge device even if it is caused by the mac address changing of a bridge port. Example configuration: ip link set eth0 address 12:34:56:78:90:ab ip link set eth1 address aa:bb:cc:dd:ee:ff brctl addif br0 eth0 brctl addif br0 eth1 # br0 will have mac address 12:34:56:78:90:ab bridge vlan add dev br0 vid 10 self bridge vlan add dev eth0 vid 10 We will have fdb entry such that f->dst == NULL, f->vlan_id == 10 and f->addr == 12:34:56:78:90:ab at this time. Next, change the mac address of eth0 to greater value. ip link set eth0 address ee:ff:12:34:56:78 Then, mac address of br0 will be recalculated and set to aa:bb:cc:dd:ee:ff. However, an entry aa:bb:cc:dd:ee:ff will not be created and we will be not able to communicate using br0 on vlan 10. Address this issue by deleting and adding local entries whenever changing the mac address of the bridge device. If there already exists an entry that has the same address, for example, in case that br_fdb_changeaddr() has already inserted it, br_fdb_change_mac_address() will simply fail to insert it and no duplicated entry will be made, as it was. This approach also needs br_add_if() to call br_fdb_insert() before br_stp_recalculate_bridge_id() so that we don't create an entry whose dst == NULL in this function to preserve previous behavior. Note that this is a slight change in behavior where the bridge device can receive the traffic to the new address before calling br_stp_recalculate_bridge_id() in br_add_if(). However, it is not a problem because we have already the address on the new port and such a way to insert new one before recalculating bridge id is taken in br_device_event() as well. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Toshiaki Makita	a3ebb7efe7	bridge: Fix the way to find old local fdb entries in br_fdb_change_mac_address We have been always failed to delete the old entry at br_fdb_change_mac_address() because br_set_mac_address() updates dev->dev_addr before calling br_fdb_change_mac_address() and br_fdb_change_mac_address() uses dev->dev_addr to find the old entry. That update of dev_addr is completely unnecessary because the same work is done in br_stp_change_bridge_id() which is called right away after calling br_fdb_change_mac_address(). Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Toshiaki Makita	2836882fe0	bridge: Fix the way to insert new local fdb entries in br_fdb_changeaddr Since commit `bc9a25d21e` ("bridge: Add vlan support for local fdb entries"), br_fdb_changeaddr() has inserted a new local fdb entry only if it can find old one. But if we have two ports where they have the same address or user has deleted a local entry, there will be no entry for one of the ports. Example of problematic case: ip link set eth0 address aa:bb:cc:dd:ee:ff ip link set eth1 address aa:bb:cc:dd:ee:ff brctl addif br0 eth0 brctl addif br0 eth1 # eth1 will not have a local entry due to dup. ip link set eth1 address 12:34:56:78:90:ab Then, the new entry for the address 12:34:56:78:90:ab will not be created, and the bridge device will not be able to communicate. Insert new entries regardless of whether we can find old entries or not. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Toshiaki Makita	a5642ab474	bridge: Fix the way to find old local fdb entries in br_fdb_changeaddr br_fdb_changeaddr() assumes that there is at most one local entry per port per vlan. It used to be true, but since commit `36fd2b63e3` ("bridge: allow creating/deleting fdb entries via netlink"), it has not been so. Therefore, the function might fail to search a correct previous address to be deleted and delete an arbitrary local entry if user has added local entries manually. Example of problematic case: ip link set eth0 address ee:ff:12:34:56:78 brctl addif br0 eth0 bridge fdb add 12:34:56:78:90:ab dev eth0 master ip link set eth0 address aa:bb:cc:dd:ee:ff Then, the address 12:34:56:78:90:ab might be deleted instead of ee:ff:12:34:56:78, the original mac address of eth0. Address this issue by introducing a new flag, added_by_user, to struct net_bridge_fdb_entry. Note that br_fdb_delete_by_port() has to set added_by_user to 0 in cases like: ip link set eth0 address 12:34:56:78:90:ab ip link set eth1 address aa:bb:cc:dd:ee:ff brctl addif br0 eth0 bridge fdb add aa:bb:cc:dd:ee:ff dev eth0 master brctl addif br0 eth1 brctl delif br0 eth0 In this case, kernel should delete the user-added entry aa:bb:cc:dd:ee:ff, but it also should have been added by "brctl addif br0 eth1" originally, so we don't delete it and treat it a new kernel-created entry. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Jens Axboe	9d4cb8e3a5	Merge branch 'stable/for-jens-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip into for-linus Konrad writes: Please git pull the following branch: git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git stable/for-jens-3.14 which is based off v3.13-rc6. If you would like me to rebase it on a different branch/tag I would be more than happy to do so. The patches are all bug-fixes and hopefully can go in 3.14. They deal with xen-blkback shutdown and cause memory leaks as well as shutdown races. They should go to stable tree and if you are OK with I will ask them to backport those fixes. There is also a fix to xen-blkfront to deal with unexpected state transition. And lastly a fix to the header where it was using the __aligned__ unnecessarily.	2014-02-10 12:52:34 -07:00
Michal Simek	46f5b96085	ARM: zynq: Reserve not DMAable space in front of the kernel Reserve space from 0x0 - __pa(swapper_pg_dir), if kernel is loaded from 0, which is not DMAable. It is causing problem with MMC driver and others which want to add dma buffers to this space. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Kevin Hilman <khilman@linaro.org>	2014-02-10 10:45:42 -08:00
Kevin Hilman	8760f7990a	First series of AT91 fixes for 3.14. All of them are DT-related. - fixes for typos in i2c and ohci clocks - addition of a USB host node for at91sam9n12ek - 2 DT documentation updates that have been sent a long time ago - a new board based on the sama5d36 SoC -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAABAgAGBQJS9QjhAAoJEAf03oE53VmQwwMIAMj6mM4Fv042REUssGIeVGlk kmt3NzFeQI2ipTsgdwloAd53Hvty9RiX0ww4ZIGnxy8eZdEaRy6aDCohQsCvfYtx LDcGdmIZ9056+62h/dqEbKYzs23j+diSJPe5sJF3/EHRRrRtBLSccEjnAMYI+Vqx Gpj4XplbG89yXQZb1sKCCo9OsScu1ncdXs6G1ghZt1WhcvDwRuf/C0Af5KYCOibs iXyMYxw7dd2VS8DxABxfi4xhW/ZZFR5ml66XdEW6MQLRG5Z33v7v7S9Wt0DV8uqd C+naFnbm09/miHFX8bGZeI1Yaavs+UhKdD86fCn0Mn4uHNr4IdqqLGPEkZpjyJA= =m41s -----END PGP SIGNATURE----- Merge tag 'at91-fixes' of git://github.com/at91linux/linux-at91 into fixes From Nicolas Ferre: First series of AT91 fixes for 3.14. All of them are DT-related. - fixes for typos in i2c and ohci clocks - addition of a USB host node for at91sam9n12ek - 2 DT documentation updates that have been sent a long time ago - a new board based on the sama5d36 SoC * tag 'at91-fixes' of git://github.com/at91linux/linux-at91: ARM: at91: add Atmel's SAMA5D3 Xplained board spi/atmel: document clock properties mmc: atmel-mci: document clock properties ARM: at91: enable USB host on at91sam9n12ek board ARM: at91/dt: fix sama5d3 ohci hclk clock reference ARM: at91/dt: sam9263: fix compatibility string for the I2C Signed-off-by: Kevin Hilman <khilman@linaro.org>	2014-02-10 10:45:26 -08:00
Nishanth Menon	5db1dabc56	ARM: multi_v7_defconfig: Select CONFIG_SOC_DRA7XX Select CONFIG_SOC_DRA7XX so that we can boot dra7-evm. DRA7 family are A15 based processors that supports LPAE and an evolutionary update to the OMAP5 generation of processors. Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Kevin Hilman <khilman@linaro.org>	2014-02-10 10:38:06 -08:00
Philipp Zabel	e7c57ecd60	ARM: imx6: Initialize low-power mode early again Since commit `9e8147bb5e` "ARM: imx6q: move low-power code out of clock driver" the kernel fails to boot on i.MX6Q/D if preemption is enabled (CONFIG_PREEMPT=y). The kernel just hangs before the console comes up. The above commit moved the initalization of the low-power mode setting (enabling clocked WAIT states), which was introduced in commit `83ae20981a` "ARM: imx: correct low-power mode setting", from imx6q_clks_init to imx6q_pm_init. Now it is called much later, after all cores are enabled. This patch moves the low-power mode initialization back to imx6q_clks_init again (and to imx6sl_clks_init). Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Kevin Hilman <khilman@linaro.org>	2014-02-10 10:37:32 -08:00
Linus Torvalds	cbf2822a7d	Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 Pull CIFS fixes from Steve French: "Small fix from Jeff for writepages leak, and some fixes for ACLs and xattrs when SMB2 enabled. Am expecting another fix from Jeff and at least one more fix (for mounting SMB2 with cifsacl) in the next week" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: [CIFS] clean up page array when uncached write send fails cifs: use a flexarray in cifs_writedata retrieving CIFS ACLs when mounted with SMB2 fails dropping session Add protocol specific operation for CIFS xattrs	2014-02-10 10:33:50 -08:00
Linus Walleij	9705e74671	ARM: pxa: fix various compilation problems Due to commit `88f718e3fa` "ARM: pxa: delete the custom GPIO header" some drivers fail compilation, for example like this: In file included from sound/soc/pxa/spitz.c:28:0: sound/soc/pxa/spitz.c: In function ‘spitz_ext_control’: arch/arm/mach-pxa/include/mach/spitz.h:111:30: error: ‘PXA_NR_BUILTIN_GPIO’ undeclared (first use in this function) #define SPITZ_SCP_GPIO_BASE (PXA_NR_BUILTIN_GPIO) (etc.) This is caused by implicit inclusion of <mach/irqs.h> from various board-specific headers under <mach/*> in the PXA platform. So we take a sweep over these, and for every such header that uses PXA_NR_BUILTIN_GPIO or PXA_GPIO_TO_IRQ() we explicitly #include "irqs.h" so that we satisfy the dependency in the board include file alone. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Kevin Hilman <khilman@linaro.org>	2014-02-10 10:33:04 -08:00
Linus Walleij	29ffa48fa6	ARM: pxa: fix compilation problem on AM300EPD board This board fails compilation like this: arch/arm/mach-pxa/am300epd.c: In function ‘am300_cleanup’: arch/arm/mach-pxa/am300epd.c:179:2: error: implicit declaration of function ‘PXA_GPIO_TO_IRQ’ [-Werror=implicit-function-declaration] free_irq(PXA_GPIO_TO_IRQ(RDY_GPIO_PIN), par); This was caused by commit `88f718e3fa` "ARM: pxa: delete the custom GPIO header" This is because it was previously getting the macro PXA_GPIO_TO_IRQ implicitly from <linux/gpio.h> which in turn implicitly included <mach/gpio.h> which in turn included <mach/irqs.h>. Add the missing include so that the board compiles again. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Kevin Hilman <khilman@linaro.org>	2014-02-10 10:32:08 -08:00
Kevin Hilman	41a11e31d4	mvebu phy and ata fixes for v3.14 - phy - add support for optional phys via NULL - ata - fix boot hang due to probe failure of optional phys NOTE: Series has been Ack'd by both the phy maintainer and the ata maintainer for going through arm-soc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJS9mY1AAoJEP45WPkGe8ZnbcYP/jkidupz/J/AFkZ9h94hxBin qxU54nQkUNSfhwqhZzVoED7DMCokCNj09t4I3b++YHg1SDXcRpPbZfuUcEDd7RD9 ysiyFlrm6pv68KyYuR2IFOtaEP2MRU4iewgzMYHDYNCrovUoiy9TA/owdX2Pamnw qL9knDOCvRJvByTlv1+13giFbDu0JHpPiOOmW8S53c9jfrWuBX+VzVfF8UHTiJhg 3d3NfQlWcIpvAA4nZbc2+4gubiv1PeWYfJ48nWw95uzsfEIPgXGNklJpHwuF4u0J sJpZNP5L7Y4xJCGBCjMIgAx7YDs3mnkGGDLC9n0as9HTp0X0edMD0gnJVUOszZS9 CQZSnmMxtbY6w6WzGo1fHD6c4cZoVUc6TsJqwagblxN2oR70QlPu/jodwFieTbnE iZIUAMBQZMaE6xnBWfp1LDy/LW2IiaLGFGAaQydVdl8Sys4iXqyG18qKi3VAvhwM JrLMsYXDR400n/E9b0dMFmCpGT0z/NPug1GFH750rtDVe8p1vI2BJJfRhmetMMP0 EYyS11i3Z6IaGCAxph6YaCPbtlASZsuJRJznmyk+4CWhgMloYJpSOtJQX9D5gM0u hRxABHz/sw32W/TyvhSR6Dp50Oa+vRyz69difVaannzzw8V5li8iQ7fqfdeUBUUc +i1mKeNKg8HNaW/I8LEk =TXSu -----END PGP SIGNATURE----- Merge tag 'mvebu-phy_ata-fixes-3.14' of git://git.infradead.org/linux-mvebu into fixes From Jason Cooper: mvebu phy and ata fixes for v3.14 - phy - add support for optional phys via NULL - ata - fix boot hang due to probe failure of optional phys NOTE: Series has been Ack'd by both the phy maintainer and the ata maintainer for going through arm-soc * tag 'mvebu-phy_ata-fixes-3.14' of git://git.infradead.org/linux-mvebu: ata: sata_mv: Fix probe failures with optional phys drivers: phy: Add support for optional phys drivers: phy: Make NULL a valid phy reference Signed-off-by: Kevin Hilman <khilman@linaro.org>	2014-02-10 09:44:45 -08:00

... 2 3 4 5 6 ...

426659 Commits