linux

Author	SHA1	Message	Date
David Sterba	7ab19625a9	btrfs: add write protection to SET_FEATURES ioctl Perform the want_write check if we get far enough to do any writes. Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
Anand Jain	48b3b9d401	btrfs: fix lock dep warning move scratch super outside of chunk_mutex Move scratch super outside of the chunk lock to avoid below lockdep warning. The better place to scratch super is in the function btrfs_rm_dev_replace_free_srcdev() just before free_device, which is outside of the chunk lock as well. To reproduce: (fresh boot) mkfs.btrfs -f -draid5 -mraid5 /dev/sdc /dev/sdd /dev/sde mount /dev/sdc /btrfs dd if=/dev/zero of=/btrfs/tf1 bs=4096 count=100 (get devmgt from https://github.com/asj/devmgt.git) devmgt detach /dev/sde dd if=/dev/zero of=/btrfs/tf1 bs=4096 count=100 sync btrfs replace start -Brf 3 /dev/sdf /btrfs <-- devmgt attach host7 ====================================================== [ INFO: possible circular locking dependency detected ] 4.6.0-rc2asj+ #1 Not tainted --------------------------------------------------- btrfs/2174 is trying to acquire lock: (sb_writers){.+.+.+}, at: [<ffffffff812449b4>] __sb_start_write+0xb4/0xf0 but task is already holding lock: (&fs_info->chunk_mutex){+.+.+.}, at: [<ffffffffa05c5f55>] btrfs_dev_replace_finishing+0x145/0x980 [btrfs] which lock already depends on the new lock. 
Chain exists of: sb_writers --> &fs_devs->device_list_mutex --> &fs_info->chunk_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fs_info->chunk_mutex); lock(&fs_devs->device_list_mutex); lock(&fs_info->chunk_mutex); lock(sb_writers); * DEADLOCK * -> #0 (sb_writers){.+.+.+}: [<ffffffff810e6415>] __lock_acquire+0x1bc5/0x1ee0 [<ffffffff810e707e>] lock_acquire+0xbe/0x210 [<ffffffff810df49a>] percpu_down_read+0x4a/0xa0 [<ffffffff812449b4>] __sb_start_write+0xb4/0xf0 [<ffffffff81265534>] mnt_want_write+0x24/0x50 [<ffffffff812508a2>] path_openat+0x952/0x1190 [<ffffffff81252451>] do_filp_open+0x91/0x100 [<ffffffff8123f5cc>] file_open_name+0xfc/0x140 [<ffffffff8123f643>] filp_open+0x33/0x60 [<ffffffffa0572bb6>] update_dev_time+0x16/0x40 [btrfs] [<ffffffffa057f60d>] btrfs_scratch_superblocks+0x5d/0xb0 [btrfs] [<ffffffffa057f70e>] btrfs_rm_dev_replace_remove_srcdev+0xae/0xd0 [btrfs] [<ffffffffa05c62c5>] btrfs_dev_replace_finishing+0x4b5/0x980 [btrfs] [<ffffffffa05c6ae8>] btrfs_dev_replace_start+0x358/0x530 [btrfs] Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
Ashish Samant	2473114981	btrfs: Fix BUG_ON condition in scrub_setup_recheck_block() pagev array in scrub_block{} is of size SCRUB_MAX_PAGES_PER_BLOCK. page_index should be checked with the same to trigger BUG_ON(). Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
Josef Bacik	e042d1ec44	Btrfs: remove BUG_ON()'s in btrfs_map_block btrfs_map_block can go horribly wrong in the face of fs corruption, let's agree to not be assholes and panic at any possible chance things are all fucked up. Signed-off-by: Josef Bacik <jbacik@fb.com> [ removed type casts ] Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
Liu Bo	3d8da67817	Btrfs: fix divide error upon chunk's stripe_len The struct 'map_lookup' uses type int for @stripe_len, while btrfs_chunk_stripe_len() can return a u64 value, so @stripe_len may end up with an undefined value, which can lead to a 'divide error' in __btrfs_map_block(). This changes 'map_lookup' to use type u64 for stripe_len. Also, right now we only use BTRFS_STRIPE_LEN for stripe_len, so this adds a validity check for BTRFS_STRIPE_LEN. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ folded division fix to scrub_raid56_parity ] Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
David Sterba	ee17fc8005	btrfs: sysfs: protect reading label by lock If the label setting ioctl races with the sysfs label handler, we could get a mixed result in the output, part old, part new. We should get either the old or the new label. The chances of hitting this race are low. Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
David Sterba	66ac9fe7ba	btrfs: add check to sysfs handler of label Add a sanity check for the fs_info as we will dereference it, similar to what the 'store features' handler does. Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
David Sterba	ee6111386a	btrfs: add read-only check to sysfs handler of features We don't want to trigger the change on a read-only filesystem, similar to what the label handler does. Signed-off-by: David Sterba <dsterba@suse.cz>	2016-05-06 15:22:49 +02:00
David Sterba	e6c11f9a46	btrfs: reuse existing variable in scrub_stripe, reduce stack usage The key variable occupies 17 bytes and key_start is used only once, so we can simply reuse the existing 'key' for that purpose. As the key is not a simple type, the compiler does not do this on its own. Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
David Sterba	49a3c4d9b6	btrfs: use dynamic allocation for root item in create_subvol The size of root item is more than 400 bytes, which is quite a lot of stack space. As we do IO from inside the subvolume ioctls, we should keep the stack usage low in case the filesystem is on top of other layers (NFS, device mapper, iscsi, etc). Reviewed-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
David Sterba	153519559a	btrfs: clone: use vmalloc only as fallback for nodesize buffer Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
David Sterba	2f91306a37	btrfs: send: use vmalloc only as fallback for clone_sources_tmp Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
David Sterba	c03d01f340	btrfs: send: use vmalloc only as fallback for clone_roots Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
David Sterba	e55d1153db	btrfs: send: use temporary variable to store allocation size We're going to use the argument multiple times later. Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
David Sterba	eb5b75fe2e	btrfs: send: use vmalloc only as fallback for read_buf Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
David Sterba	6ff48ce06b	btrfs: send: use vmalloc only as fallback for send_buf Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
Anand Jain	779bf3fefa	btrfs: fix lock dep warning, move scratch dev out of device_list_mutex and uuid_mutex When the replace target fails, the target device will be taken out of fs device list, scratch + update_dev_time and freed. However we could do the scratch + update_dev_time and free part after the device has been taken out of device list, so that we don't have to hold the device_list_mutex and uuid_mutex locks. Reported issue: [ 5375.718845] ====================================================== [ 5375.718846] [ INFO: possible circular locking dependency detected ] [ 5375.718849] 4.4.5-scst31x-debug-11+ #40 Not tainted [ 5375.718849] ------------------------------------------------------- [ 5375.718851] btrfs-health/4662 is trying to acquire lock: [ 5375.718861] (sb_writers){.+.+.+}, at: [<ffffffff812214f7>] __sb_start_write+0xb7/0xf0 [ 5375.718862] [ 5375.718862] but task is already holding lock: [ 5375.718907] (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffffa028263c>] btrfs_destroy_dev_replace_tgtdev+0x3c/0x150 [btrfs] [ 5375.718907] [ 5375.718907] which lock already depends on the new lock. 
[ 5375.718907] [ 5375.718908] [ 5375.718908] the existing dependency chain (in reverse order) is: [ 5375.718911] [ 5375.718911] -> #3 (&fs_devs->device_list_mutex){+.+.+.}: [ 5375.718917] [<ffffffff810da4be>] lock_acquire+0xce/0x1e0 [ 5375.718921] [<ffffffff81633949>] mutex_lock_nested+0x69/0x3c0 [ 5375.718940] [<ffffffffa0219bf6>] btrfs_show_devname+0x36/0x210 [btrfs] [ 5375.718945] [<ffffffff81267079>] show_vfsmnt+0x49/0x150 [ 5375.718948] [<ffffffff81240b07>] m_show+0x17/0x20 [ 5375.718951] [<ffffffff81246868>] seq_read+0x2d8/0x3b0 [ 5375.718955] [<ffffffff8121df28>] __vfs_read+0x28/0xd0 [ 5375.718959] [<ffffffff8121e806>] vfs_read+0x86/0x130 [ 5375.718962] [<ffffffff8121f4c9>] SyS_read+0x49/0xa0 [ 5375.718966] [<ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a [ 5375.718968] [ 5375.718968] -> #2 (namespace_sem){+++++.}: [ 5375.718971] [<ffffffff810da4be>] lock_acquire+0xce/0x1e0 [ 5375.718974] [<ffffffff81635199>] down_write+0x49/0x80 [ 5375.718977] [<ffffffff81243593>] lock_mount+0x43/0x1c0 [ 5375.718979] [<ffffffff81243c13>] do_add_mount+0x23/0xd0 [ 5375.718982] [<ffffffff81244afb>] do_mount+0x27b/0xe30 [ 5375.718985] [<ffffffff812459dc>] SyS_mount+0x8c/0xd0 [ 5375.718988] [<ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a [ 5375.718991] [ 5375.718991] -> #1 (&sb->s_type->i_mutex_key#5){+.+.+.}: [ 5375.718994] [<ffffffff810da4be>] lock_acquire+0xce/0x1e0 [ 5375.718996] [<ffffffff81633949>] mutex_lock_nested+0x69/0x3c0 [ 5375.719001] [<ffffffff8122d608>] path_openat+0x468/0x1360 [ 5375.719004] [<ffffffff8122f86e>] do_filp_open+0x7e/0xe0 [ 5375.719007] [<ffffffff8121da7b>] do_sys_open+0x12b/0x210 [ 5375.719010] [<ffffffff8121db7e>] SyS_open+0x1e/0x20 [ 5375.719013] [<ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a [ 5375.719015] [ 5375.719015] -> #0 (sb_writers){.+.+.+}: [ 5375.719018] [<ffffffff810d97ca>] __lock_acquire+0x17ba/0x1ae0 [ 5375.719021] [<ffffffff810da4be>] lock_acquire+0xce/0x1e0 [ 5375.719026] [<ffffffff810d3bef>] 
percpu_down_read+0x4f/0xa0 [ 5375.719028] [<ffffffff812214f7>] __sb_start_write+0xb7/0xf0 [ 5375.719031] [<ffffffff81242eb4>] mnt_want_write+0x24/0x50 [ 5375.719035] [<ffffffff8122ded2>] path_openat+0xd32/0x1360 [ 5375.719037] [<ffffffff8122f86e>] do_filp_open+0x7e/0xe0 [ 5375.719040] [<ffffffff8121d8a4>] file_open_name+0xe4/0x130 [ 5375.719043] [<ffffffff8121d923>] filp_open+0x33/0x60 [ 5375.719073] [<ffffffffa02776a6>] update_dev_time+0x16/0x40 [btrfs] [ 5375.719099] [<ffffffffa02825be>] btrfs_scratch_superblocks+0x4e/0x90 [btrfs] [ 5375.719123] [<ffffffffa0282665>] btrfs_destroy_dev_replace_tgtdev+0x65/0x150 [btrfs] [ 5375.719150] [<ffffffffa02c6c80>] btrfs_dev_replace_finishing+0x6b0/0x990 [btrfs] [ 5375.719175] [<ffffffffa02c729e>] btrfs_dev_replace_start+0x33e/0x540 [btrfs] [ 5375.719199] [<ffffffffa02c7f58>] btrfs_auto_replace_start+0xf8/0x140 [btrfs] [ 5375.719222] [<ffffffffa02464e6>] health_kthread+0x246/0x490 [btrfs] [ 5375.719225] [<ffffffff810a70df>] kthread+0xef/0x110 [ 5375.719229] [<ffffffff81637d2f>] ret_from_fork+0x3f/0x70 [ 5375.719230] [ 5375.719230] other info that might help us debug this: [ 5375.719230] [ 5375.719233] Chain exists of: [ 5375.719233] sb_writers --> namespace_sem --> &fs_devs->device_list_mutex [ 5375.719233] [ 5375.719234] Possible unsafe locking scenario: [ 5375.719234] [ 5375.719234] CPU0 CPU1 [ 5375.719235] ---- ---- [ 5375.719236] lock(&fs_devs->device_list_mutex); [ 5375.719238] lock(namespace_sem); [ 5375.719239] lock(&fs_devs->device_list_mutex); [ 5375.719241] lock(sb_writers); [ 5375.719241] [ 5375.719241] * DEADLOCK * [ 5375.719241] [ 5375.719243] 4 locks held by btrfs-health/4662: [ 5375.719266] #0: (&fs_info->health_mutex){+.+.+.}, at: [<ffffffffa0246303>] health_kthread+0x63/0x490 [btrfs] [ 5375.719293] #1: (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+.+.}, at: [<ffffffffa02c6611>] btrfs_dev_replace_finishing+0x41/0x990 [btrfs] [ 5375.719319] #2: (uuid_mutex){+.+.+.}, at: [<ffffffffa0282620>] 
btrfs_destroy_dev_replace_tgtdev+0x20/0x150 [btrfs] [ 5375.719343] #3: (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffffa028263c>] btrfs_destroy_dev_replace_tgtdev+0x3c/0x150 [btrfs] [ 5375.719343] [ 5375.719343] stack backtrace: [ 5375.719347] CPU: 2 PID: 4662 Comm: btrfs-health Not tainted 4.4.5-scst31x-debug-11+ #40 [ 5375.719348] Hardware name: Supermicro SYS-6018R-WTRT/X10DRW-iT, BIOS 1.0c 01/07/2015 [ 5375.719352] 0000000000000000 ffff880856f73880 ffffffff813529e3 ffffffff826182a0 [ 5375.719354] ffffffff8260c090 ffff880856f738c0 ffffffff810d667c ffff880856f73930 [ 5375.719357] ffff880861f32b40 ffff880861f32b68 0000000000000003 0000000000000004 [ 5375.719357] Call Trace: [ 5375.719363] [<ffffffff813529e3>] dump_stack+0x85/0xc2 [ 5375.719366] [<ffffffff810d667c>] print_circular_bug+0x1ec/0x260 [ 5375.719369] [<ffffffff810d97ca>] __lock_acquire+0x17ba/0x1ae0 [ 5375.719373] [<ffffffff810f606d>] ? debug_lockdep_rcu_enabled+0x1d/0x20 [ 5375.719376] [<ffffffff810da4be>] lock_acquire+0xce/0x1e0 [ 5375.719378] [<ffffffff812214f7>] ? __sb_start_write+0xb7/0xf0 [ 5375.719383] [<ffffffff810d3bef>] percpu_down_read+0x4f/0xa0 [ 5375.719385] [<ffffffff812214f7>] ? __sb_start_write+0xb7/0xf0 [ 5375.719387] [<ffffffff812214f7>] __sb_start_write+0xb7/0xf0 [ 5375.719389] [<ffffffff81242eb4>] mnt_want_write+0x24/0x50 [ 5375.719393] [<ffffffff8122ded2>] path_openat+0xd32/0x1360 [ 5375.719415] [<ffffffffa02462a0>] ? btrfs_congested_fn+0x180/0x180 [btrfs] [ 5375.719418] [<ffffffff810f606d>] ? debug_lockdep_rcu_enabled+0x1d/0x20 [ 5375.719420] [<ffffffff8122f86e>] do_filp_open+0x7e/0xe0 [ 5375.719423] [<ffffffff810f615d>] ? rcu_read_lock_sched_held+0x6d/0x80 [ 5375.719426] [<ffffffff81201a9b>] ? kmem_cache_alloc+0x26b/0x5d0 [ 5375.719430] [<ffffffff8122e7d4>] ? 
getname_kernel+0x34/0x120 [ 5375.719433] [<ffffffff8121d8a4>] file_open_name+0xe4/0x130 [ 5375.719436] [<ffffffff8121d923>] filp_open+0x33/0x60 [ 5375.719462] [<ffffffffa02776a6>] update_dev_time+0x16/0x40 [btrfs] [ 5375.719485] [<ffffffffa02825be>] btrfs_scratch_superblocks+0x4e/0x90 [btrfs] [ 5375.719506] [<ffffffffa0282665>] btrfs_destroy_dev_replace_tgtdev+0x65/0x150 [btrfs] [ 5375.719530] [<ffffffffa02c6c80>] btrfs_dev_replace_finishing+0x6b0/0x990 [btrfs] [ 5375.719554] [<ffffffffa02c6b23>] ? btrfs_dev_replace_finishing+0x553/0x990 [btrfs] [ 5375.719576] [<ffffffffa02c729e>] btrfs_dev_replace_start+0x33e/0x540 [btrfs] [ 5375.719598] [<ffffffffa02c7f58>] btrfs_auto_replace_start+0xf8/0x140 [btrfs] [ 5375.719621] [<ffffffffa02464e6>] health_kthread+0x246/0x490 [btrfs] [ 5375.719641] [<ffffffffa02463d8>] ? health_kthread+0x138/0x490 [btrfs] [ 5375.719661] [<ffffffffa02462a0>] ? btrfs_congested_fn+0x180/0x180 [btrfs] [ 5375.719663] [<ffffffff810a70df>] kthread+0xef/0x110 [ 5375.719666] [<ffffffff810a6ff0>] ? kthread_create_on_node+0x200/0x200 [ 5375.719669] [<ffffffff81637d2f>] ret_from_fork+0x3f/0x70 [ 5375.719672] [<ffffffff810a6ff0>] ? kthread_create_on_node+0x200/0x200 [ 5375.719697] ------------[ cut here ]------------ Signed-off-by: Anand Jain <anand.jain@oracle.com> Reported-by: Yauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
Dan Carpenter	f5ecec3ce2	btrfs: send: silence an integer overflow warning The "sizeof(*arg->clone_sources) * arg->clone_sources_count" expression can overflow. It causes several static checker warnings. It's all under CAP_SYS_ADMIN so it's not that serious but let's silence the warnings. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
Luis de Bethencourt	41b34accb2	btrfs: avoid overflowing f_bfree Since mixed block groups accounting isn't byte-accurate and f_bfree is an unsigned integer, it could overflow. Avoid this. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Suggested-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
Luis de Bethencourt	ae02d1bd07	btrfs: fix mixed block count of available space Metadata for mixed block is already accounted in total data and should not be counted as part of the free metadata space. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=114281 Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
Austin S. Hemmelgarn	88be159c90	btrfs: allow balancing to dup with multi-device Currently, we don't allow the user to try and rebalance to a dup profile on a multi-device filesystem. In most cases, this is a perfectly sensible restriction as raid1 uses the same amount of space and provides better protection. However, when reshaping a multi-device filesystem down to a single device filesystem, this requires the user to convert metadata and system chunks to single profile before deleting devices, and then convert again to dup, which leaves a period of time where metadata integrity is reduced. This patch removes the single-device-only restriction from converting to dup profile to remove this potential data integrity reduction. Signed-off-by: Austin S. Hemmelgarn <ahferroin7@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 15:22:49 +02:00
Dmitry V. Levin	f0b22d1bb2	parisc: fix a bug when syscall number of tracee is __NR_Linux_syscalls Do not load one entry beyond the end of the syscall table when the syscall number of a traced process equals __NR_Linux_syscalls. A similar bug with regular processes was fixed by commit `3bb457af4f` ("[PARISC] Fix bug when syscall nr is __NR_Linux_syscalls"). This bug was found by the strace test suite. Cc: stable@vger.kernel.org Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Helge Deller <deller@gmx.de>	2016-05-06 15:09:07 +02:00
David Sterba	2355ac8495	btrfs: ioctl: reorder exclusive op check in RM_DEV Move the op exclusivity check before the other code (same as in ADD_DEV). Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 14:58:00 +02:00
David Sterba	58409edd2d	btrfs: kill unused writepage_io_hook callback It seems to have been unused for a long time, since 2008 and `6885f308b5` ("Btrfs: Misc 2.6.25 updates"). Propagating the removal touches some code but has no functional effect. Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-06 14:57:57 +02:00
Rafael J. Wysocki	5f2f88e330	Merge branches 'pm-opp-fixes', 'pm-cpufreq-fixes' and 'pm-cpuidle-fixes' * pm-opp-fixes: PM / OPP: Remove useless check * pm-cpufreq-fixes: intel_pstate: Fix intel_pstate_get() cpufreq: intel_pstate: Fix HWP on boot CPU after system resume cpufreq: st: enable selective initialization based on the platform * pm-cpuidle-fixes: ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value	2016-05-06 13:16:22 +02:00
Rafael J. Wysocki	7c21b38ca9	Merge branches 'acpica-fixes' and 'device-properties-fixes' * acpica-fixes: ACPICA: Dispatcher: Update thread ID for recursive method calls * device-properties-fixes: device property: Avoid potential dereferences of invalid pointers	2016-05-06 13:15:52 +02:00
Chen Yu	886123fb3a	x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO Currently we read the TSC ratio as: ratio = (MSR_PLATFORM_INFO >> 8) & 0x1f; Thus we get bits 8-12 of MSR_PLATFORM_INFO, however according to the SDM (35.5), the ratio bits are bits 8-15. Ignoring the upper bits can result in an incorrect TSC ratio, which causes the TSC calibration and the Local APIC timer frequency to be incorrect. Fix this problem by masking 0xff instead. [ tglx: Massaged changelog ] Fixes: `7da7c15613` "x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs" Signed-off-by: Chen Yu <yu.c.chen@intel.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: stable@vger.kernel.org Cc: Bin Gao <bin.gao@intel.com> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/1462505619-5516-1-git-send-email-yu.c.chen@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-05-06 11:50:50 +02:00
Linus Torvalds	9caa7e7848	Merge branch 'akpm' (patches from Andrew) Merge fixes from Andrew Morton: "14 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: byteswap: try to avoid __builtin_constant_p gcc bug lib/stackdepot: avoid to return 0 handle mm: fix kcompactd hang during memory offlining modpost: fix module autoloading for OF devices with generic compatible property proc: prevent accessing /proc/<PID>/environ until it's ready mm/zswap: provide unique zpool name mm: thp: kvm: fix memory corruption in KVM with THP enabled MAINTAINERS: fix Rajendra Nayak's address mm, cma: prevent nr_isolated_* counters from going negative mm: update min_free_kbytes from khugepaged after core initialization huge pagecache: mmap_sem is unlocked when truncation splits pmd rapidio/mport_cdev: fix uapi type definitions mm: memcontrol: let v2 cgroups follow changes in system swappiness mm: thp: correct split_huge_pages file permission	2016-05-05 20:48:35 -07:00
Linus Torvalds	43a3e837e2	mailmap: add John Paul Adrian Glaubitz Apparently patchwork ended up truncating the full name. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-05 20:07:14 -07:00
Linus Torvalds	7270a3f761	Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: - a fix for the persistent memory 'struct page' driver. The implementation overlooked the fact that pages are allocated in 2MB units leading to -ENOMEM when establishing some configurations. It's tagged for -stable as the problem was introduced with the initial implementation in 4.5. - The new "error status translation" routine, introduced with the 4.6 updates to the nfit driver, missed a necessary path in acpi_nfit_ctl(). The end result is that we are falsely assuming commands complete successfully when the embedded status says otherwise. * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nfit: fix translation of command status results libnvdimm, pfn: fix memmap reservation sizing	2016-05-05 18:10:01 -07:00
Arnd Bergmann	7322dd755e	byteswap: try to avoid __builtin_constant_p gcc bug This is another attempt to avoid a regression in wwn_to_u64() after that started using get_unaligned_be64(), which in turn ran into a bug on gcc-4.9 through 6.1. The regression got introduced due to the combination of two separate workarounds (commits `e3bde9568d`: "include/linux/unaligned: force inlining of byteswap operations" and `ef3fb2422f`: "scsi: fc: use get/put_unaligned64 for wwn access") that each try to sidestep distinct problems with gcc behavior (code growth and increased stack usage). Unfortunately after both have been applied, a more serious gcc bug has been uncovered, leading to incorrect object code that discards part of a function and causes undefined behavior. As part of this problem is how __builtin_constant_p gets evaluated on an argument passed by reference into an inline function, this avoids the use of __builtin_constant_p() for all architectures that set CONFIG_ARCH_USE_BUILTIN_BSWAP. Most architectures do not set ARCH_SUPPORTS_OPTIMIZED_INLINING, which means they probably do not suffer from the problem in the qla2xxx driver, but they might still run into it elsewhere. Both of the original workarounds were only merged in the 4.6 kernel, and the bug that is fixed by this patch should only appear if both are there, so we probably don't need to backport the fix. On the other hand, it works by simplifying the code path and should not have any negative effects. 
[arnd@arndb.de: fix older gcc warnings] (http://lkml.kernel.org/r/12243652.bxSxEgjgfk@wuerfel) Link: https://lkml.org/lkml/headers/2016/4/12/1103 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70232 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 Fixes: `e3bde9568d` ("include/linux/unaligned: force inlining of byteswap operations") Fixes: `ef3fb2422f` ("scsi: fc: use get/put_unaligned64 for wwn access") Link: http://lkml.kernel.org/r/1780465.XdtPJpi8Tt@wuerfel Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Josh Poimboeuf <jpoimboe@redhat.com> # on gcc-5.3 Tested-by: Quinn Tran <quinn.tran@qlogic.com> Cc: Martin Jambor <mjambor@suse.cz> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Thomas Graf <tgraf@suug.ch> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Himanshu Madhani <himanshu.madhani@qlogic.com> Cc: Jan Hubicka <hubicka@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-05 17:38:53 -07:00
Joonsoo Kim	7c31190bcf	lib/stackdepot: avoid to return 0 handle Recently, we started allowing stack traces whose hash value is 0 to be saved. As a result, stackdepot could return a 0 handle even on success, and users of stackdepot cannot tell whether the call succeeded, so we need to solve this problem. This patch adds 1 bit to the handle and sets it to make every valid handle non-zero. After that, a valid handle will never be 0 and a 0 handle will correctly represent failure. Fixes: `33334e2576` ("lib/stackdepot.c: allow the stack trace hash to be zero") Link: http://lkml.kernel.org/r/1462252403-1106-1-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-05 17:38:53 -07:00
Vlastimil Babka	172400c69c	mm: fix kcompactd hang during memory offlining Assume memory47 is the last online block left in node1. This will hang: # echo offline > /sys/devices/system/node/node1/memory47/state After a couple of minutes, the following pops up in dmesg: INFO: task bash:957 blocked for more than 120 seconds. Not tainted 4.6.0-rc6+ #6 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. bash D ffff8800b7adbaf8 0 957 951 0x00000000 Call Trace: schedule+0x35/0x80 schedule_timeout+0x1ac/0x270 wait_for_completion+0xe1/0x120 kthread_stop+0x4f/0x110 kcompactd_stop+0x26/0x40 __offline_pages.constprop.28+0x7e6/0x840 offline_pages+0x11/0x20 memory_block_action+0x73/0x1d0 memory_subsys_offline+0x47/0x60 device_offline+0x86/0xb0 store_mem_state+0xda/0xf0 dev_attr_store+0x18/0x30 sysfs_kf_write+0x37/0x40 kernfs_fop_write+0x11d/0x170 __vfs_write+0x37/0x120 vfs_write+0xa9/0x1a0 SyS_write+0x55/0xc0 entry_SYSCALL_64_fastpath+0x1a/0xa4 kcompactd is waiting for kcompactd_max_order > 0 when it's woken up to actually exit. Check kthread_should_stop() to break out of the wait. Fixes: `698b1b306` ("mm, compaction: introduce kcompactd"). Reported-by: Reza Arbab <arbab@linux.vnet.ibm.com> Tested-by: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-05 17:38:53 -07:00
Philipp Zabel	acbef7b766	modpost: fix module autoloading for OF devices with generic compatible property Since the wildcard at the end of OF module aliases is gone, autoloading of modules that don't match a device's last (most generic) compatible value fails. For example the CODA960 VPU on i.MX6Q has the SoC specific compatible "fsl,imx6q-vpu" and the generic compatible "cnm,coda960". Since the driver currently only works with knowledge about the SoC specific integration, it doesn't list "cnm,coda960" in the module device table. This results in the device compatible "of:NvpuT<NULL>Cfsl,imx6q-vpuCcnm,coda960" not matching the module alias "of:N*T*Cfsl,imx6q-vpu" anymore, whereas before commit `2f632369ab` ("modpost: don't add a trailing wildcard for OF module aliases") it matched the module alias "of:N*T*Cfsl,imx6q-vpu*". This patch adds two module aliases for each compatible, one without the wildcard and one with "C*" appended. $ modinfo coda | grep imx6q alias: of:N*T*Cfsl,imx6q-vpuC* alias: of:N*T*Cfsl,imx6q-vpu Fixes: `2f632369ab` ("modpost: don't add a trailing wildcard for OF module aliases") Link: http://lkml.kernel.org/r/1462203339-15340-1-git-send-email-p.zabel@pengutronix.de Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Cc: Javier Martinez Canillas <javier@osg.samsung.com> Cc: Brian Norris <computersforpeace@gmail.com> Cc: Sjoerd Simons <sjoerd.simons@collabora.co.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-05 17:38:53 -07:00
Mathias Krause	8148a73c99	proc: prevent accessing /proc/<PID>/environ until it's ready If /proc/<PID>/environ gets read before the envp[] array is fully set up in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to read more bytes than are actually written, as env_start will already be set but env_end will still be zero, making the range calculation underflow, allowing to read beyond the end of what has been written. Fix this as it is done for /proc/<PID>/cmdline by testing env_end for zero. It is, apparently, intentionally set last in create_*_tables(). This bug was found by the PaX size_overflow plugin that detected the arithmetic underflow of 'this_len = env_end - (env_start + src)' when env_end is still zero. The expected consequence is that userland trying to access /proc/<PID>/environ of a not yet fully set up process may get inconsistent data as we're in the middle of copying in the environment variables. Fixes: https://forums.grsecurity.net/viewtopic.php?f=3&t=4363 Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=116461 Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Emese Revfy <re.emese@gmail.com> Cc: Pax Team <pageexec@freemail.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Jarod Wilson <jarod@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-05 17:38:53 -07:00
Dan Streetman	32a4e16903	mm/zswap: provide unique zpool name Instead of using "zswap" as the name for all zpools created, add an atomic counter and use "zswap%x" with the counter number for each zpool created, to provide a unique name for each new zpool. As zsmalloc, one of the zpool implementations, requires/expects a unique name for each pool created, zswap should provide a unique name. The zsmalloc pool creation does not fail if a new pool with a conflicting name is created, unless CONFIG_ZSMALLOC_STAT is enabled; in that case, zsmalloc pool creation fails with -ENOMEM. Then zswap will be unable to change its compressor parameter if its zpool is zsmalloc; it also will be unable to change its zpool parameter back to zsmalloc, if it has any existing old zpool using zsmalloc with page(s) in it. Attempts to change the parameters will result in failure to create the zpool. This changes zswap to provide a unique name for each zpool creation. Fixes: `f1c54846ee` ("zswap: dynamic pool creation") Signed-off-by: Dan Streetman <ddstreet@ieee.org> Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Dan Streetman <dan.streetman@canonical.com> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-05 17:38:53 -07:00
Andrea Arcangeli	127393fbe5	mm: thp: kvm: fix memory corruption in KVM with THP enabled After the THP refcounting change, obtaining a compound pages from get_user_pages() no longer allows us to assume the entire compound page is immediately mappable from a secondary MMU. A secondary MMU doesn't want to call get_user_pages() more than once for each compound page, in order to know if it can map the whole compound page. So a secondary MMU needs to know from a single get_user_pages() invocation when it can map immediately the entire compound page to avoid a flood of unnecessary secondary MMU faults and spurious atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier users). Ideally instead of the page->_mapcount < 1 check, get_user_pages() should return the granularity of the "page" mapping in the "mm" passed to get_user_pages(). However it's non trivial change to pass the "pmd" status belonging to the "mm" walked by get_user_pages up the stack (up to the caller of get_user_pages). So the fix just checks if there is not a single pte mapping on the page returned by get_user_pages, and in turn if the caller can assume that the whole compound page is mapped in the current "mm" (in a pmd_trans_huge()). In such case the entire compound page is safe to map into the secondary MMU without additional get_user_pages() calls on the surrounding tail/head pages. In addition of being faster, not having to run other get_user_pages() calls also reduces the memory footprint of the secondary MMU fault in case the pmd split happened as result of memory pressure. 
Without this fix, after a MADV_DONTNEED (as invoked by QEMU during postcopy live migration or ballooning) or after generic swapping (with a failure in split_huge_page() that would only result in pmd splitting and not a physical page split), KVM would map the whole compound page into the shadow pagetables, even though regular faults or userfaults (like UFFDIO_COPY) may map regular pages into the primary MMU as a result of the pte faults, leading to the guest mode and userland mode going out of sync and not working on the same memory at all times. Any other secondary MMU notifier manager (KVM is just one of the many MMU notifier users) will need the same information if it doesn't want to run a flood of get_user_pages_fast and it can support multiple granularity in the secondary MMU mappings, so I think it is justified to expose this not just to KVM. The other option would be to move transparent_hugepage_adjust to mm/huge_memory.c, but that currently has all kinds of KVM data structures in it, so it's definitely not cut-and-paste work, and I couldn't do a fix as clean as this one for 4.6. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: "Li, Liang Z" <liang.z.li@intel.com> Cc: Amit Shah <amit.shah@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-05 17:38:53 -07:00
Eric Engestrom	ff2de822c9	MAINTAINERS: fix Rajendra Nayak's address Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Cc: Rajendra Nayak <rnayak@codeaurora.org> Cc: Afzal Mohammed <afzal.mohd.ma@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-05 17:38:53 -07:00
Hugh Dickins	14af4a5e9b	mm, cma: prevent nr_isolated_* counters from going negative /proc/sys/vm/stat_refresh warns nr_isolated_anon and nr_isolated_file go increasingly negative under compaction: which would add delay when should be none, or no delay when should delay. The bug in compaction was due to a recent mmotm patch, but much older instance of the bug was also noticed in isolate_migratepages_range() which is used for CMA and gigantic hugepage allocations. The bug is caused by putback_movable_pages() in an error path decrementing the isolated counters without them being previously incremented by acct_isolated(). Fix isolate_migratepages_range() by removing the error-path putback, thus reaching acct_isolated() with migratepages still isolated, and leaving putback to caller like most other places do. Fixes: `edc2ca6124` ("mm, compaction: move pageblock checks up from isolate_migratepages_range()") [vbabka@suse.cz: expanded the changelog] Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-05 17:38:53 -07:00
Jason Baron	bc22af74f2	mm: update min_free_kbytes from khugepaged after core initialization Khugepaged attempts to raise min_free_kbytes if its set too low. However, on boot khugepaged sets min_free_kbytes first from subsys_initcall(), and then the mm 'core' over-rides min_free_kbytes after from init_per_zone_wmark_min(), via a module_init() call. Khugepaged used to use a late_initcall() to set min_free_kbytes (such that it occurred after the core initialization), however this was removed when the initialization of min_free_kbytes was integrated into the starting of the khugepaged thread. The fix here is simply to invoke the core initialization using a core_initcall() instead of module_init(), such that the previous initialization ordering is restored. I didn't restore the late_initcall() since start_stop_khugepaged() already sets min_free_kbytes via set_recommended_min_free_kbytes(). This was noticed when we had a number of page allocation failures when moving a workload to a kernel with this new initialization ordering. On an 8GB system this restores min_free_kbytes back to 67584 from 11365 when CONFIG_TRANSPARENT_HUGEPAGE=y is set and either CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y or CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y. Fixes: `79553da293` ("thp: cleanup khugepaged startup") Signed-off-by: Jason Baron <jbaron@akamai.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-05 17:38:53 -07:00
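The ordering problem in bc22af74f2 can be modelled with a toy boot sequence. This is a sketch under stated assumptions: the enum, the macros and `boot_min_free_kbytes()` are hypothetical, the two values are the 8GB figures quoted in the message, and the model only captures "core init overwrites unconditionally, khugepaged only raises".

```c
#include <assert.h>

/* Initcall levels run in ascending order; module_init() in built-in code
 * runs at the device level. */
enum initcall_level {
    LEVEL_CORE = 1,    /* core_initcall() */
    LEVEL_SUBSYS = 4,  /* subsys_initcall() */
    LEVEL_DEVICE = 6,  /* module_init() when built in */
    LEVEL_LATE = 7     /* late_initcall() */
};

#define CORE_MIN_FREE_KBYTES       11365UL  /* value from the commit message */
#define KHUGEPAGED_MIN_FREE_KBYTES 67584UL  /* value from the commit message */

/* Replay boot and return the final min_free_kbytes for a given ordering. */
static unsigned long boot_min_free_kbytes(enum initcall_level core_level,
                                          enum initcall_level khugepaged_level)
{
    unsigned long min_free_kbytes = 0;

    assert(core_level != khugepaged_level);
    for (int level = 0; level <= LEVEL_LATE; level++) {
        if (level == (int)core_level)
            min_free_kbytes = CORE_MIN_FREE_KBYTES;        /* unconditional */
        if (level == (int)khugepaged_level &&
            min_free_kbytes < KHUGEPAGED_MIN_FREE_KBYTES)
            min_free_kbytes = KHUGEPAGED_MIN_FREE_KBYTES;  /* raise only */
    }
    return min_free_kbytes;
}
```

In this model, `boot_min_free_kbytes(LEVEL_DEVICE, LEVEL_SUBSYS)` ends at the lower core value, while `boot_min_free_kbytes(LEVEL_CORE, LEVEL_SUBSYS)` keeps the raised value, matching the before and after behaviour the message reports.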
Hugh Dickins	684283988f	huge pagecache: mmap_sem is unlocked when truncation splits pmd zap_pmd_range()'s CONFIG_DEBUG_VM !rwsem_is_locked(&mmap_sem) BUG() will be invalid with huge pagecache, in whatever way it is implemented: truncation of a hugely-mapped file to an unhugely-aligned size would easily hit it. (Although anon THP could in principle apply khugepaged to private file mappings, which are not excluded by the MADV_HUGEPAGE restrictions, in practice there's a vm_ops check which excludes them, so it never hits this BUG() - there's no interface to "truncate" an anonymous mapping.) We could complicate the test, to check i_mmap_rwsem also when there's a vm_file; but my inclination was to make zap_pmd_range() more readable by simply deleting this check. A search has shown no report of the issue in the years since commit `e0897d75f0` ("mm, thp: print useful information when mmap_sem is unlocked in zap_pmd_range") expanded it from VM_BUG_ON() - though I cannot point to what commit I would say then fixed the issue. But there are a couple of other patches now floating around, neither yet in the tree: let's agree to retain the check as a VM_BUG_ON_VMA(), as Matthew Wilcox has done; but subject to a vma_is_anonymous() check, as Kirill Shutemov has done. And let's get this in, without waiting for any particular huge pagecache implementation to reach the tree. Matthew said "We can reproduce this BUG() in the current Linus tree with DAX PMDs". Signed-off-by: Hugh Dickins <hughd@google.com> Tested-by: Matthew Wilcox <willy@linux.intel.com> Cc: "Kirill A. 
Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Yang Shi <yang.shi@linaro.org> Cc: Ning Qu <quning@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-05 17:38:53 -07:00
Alexandre Bounine	4e1016dac1	rapidio/mport_cdev: fix uapi type definitions Fix problems in uapi definitions reported by Gabriel Laskar: (see https://lkml.org/lkml/2016/4/5/205 for details) - move public header file rio_mport_cdev.h to include/uapi/linux directory - change types in data structures passed as IOCTL parameters - improve parameter checking in some IOCTL service routines Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Reported-by: Gabriel Laskar <gabriel@lse.epita.fr> Tested-by: Barry Wood <barry.wood@idt.com> Cc: Gabriel Laskar <gabriel@lse.epita.fr> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-05 17:38:53 -07:00
Johannes Weiner	4550c4e157	mm: memcontrol: let v2 cgroups follow changes in system swappiness Cgroup2 currently doesn't have a per-cgroup swappiness setting. We might want to add one later - that's a different discussion - but until we do, the cgroups should always follow the system setting. Otherwise it will be unchangeably set to whatever the ancestor inherited from the system setting at the time of cgroup creation. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: <stable@vger.kernel.org> [4.5] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-05 17:38:53 -07:00
Yang Shi	145bdaa150	mm: thp: correct split_huge_pages file permission split_huge_pages doesn't support get method at all, so the read permission sounds confusing, change the permission to write only. And, add "\n" to the output of set method to make it more readable. Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-05 17:38:53 -07:00
Linus Torvalds	85f397a97a	Merge tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic syscall fix from Arnd Bergmann: "My last pull request for asm-generic had just one patch that added two new system calls to asm/unistd.h, but unfortunately it turned out to be wrong, pointing arch/tile compat mode at the native handlers rather than the compat ones. This was spotted by Yury Norov, who is working on ILP32 mode for arch/arm64, which would have the same problem when merged. This fixes the table to use the correct compat syscalls, like the other 64-bit architectures do. I'll try to find the time to come up with a solution that prevents this problem from happening again, by allowing all future system calls to just get added in a single file for use by all architectures" * tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic: use compat version for preadv2 and pwritev2	2016-05-05 15:40:38 -07:00
Greg Kroah-Hartman	2b86c4a843	Merge tag 'iio-fixes-for-4.6d' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus Jonathan writes: Fourth set of IIO fixes for the 4.6 cycle. This last minute set is concerned with a regression in the mpu6050 driver. The regression causes a null pointer dereference on any ACPI device that has one of these present such as the ASUS T100TA Baytrail/T. The issue was known but thought (i.e. misunderstood by me) to be only a possibility, with no reports, so it was routed via the normal merge window. Turns out this was wrong (thanks to Alan for reporting the crash). The pull is just for the null dereference fix and a followup fix that also stops the reported name of the device being NULL. * mpu6050 - Fix a 'possible' NULL dereference introduced as part of splitting the driver to allow both i2c and spi to be supported. The issue affects ACPI systems with this device. - Fix a follow up issue where the name and chip id both get set to null if the device driver instance is instantiated from ACPI tables.	2016-05-05 15:38:07 -07:00
Linus Torvalds	c4781a8df9	Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "Here are a couple last-minute fixes for ARM SoCs. Most of them are for the OMAP platforms, the rest are all for different platforms. OMAP: All dts fixes, mostly affecting voltages and pinctrl for various device drivers: - Regulator minimum voltage fixes for omap5 - ISP syscon register offset fix for omap3 - Fix regulator initial modes for n900 - Fix omap5 pinctrl wkup instance size Allwinner: Remove incorrect constraints from a dcdc1 regulator Altera SoCFPGA: Fix compilation in thumb2 mode Samsung exynos: Fix a potential oops in the pm-domain error handling Davinci: Avoid a link error if NVMEM is disabled Renesas: Do not mark an external uart clock as disabled, to allow probing the uarts" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: davinci: only use NVMEM when available ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel ARM: dts: omap5: fix range of permitted wakeup pinmux registers ARM: dts: omap3-n900: Specify peripherals LDO regulators initial mode ARM: dts: omap3: Fix ISP syscon register offset ARM: dts: omap5-cm-t54: fix ldo1_reg and ldo4_reg ranges ARM: dts: omap5-board-common: fix ldo1_reg and ldo4_reg ranges arm64: dts: r8a7795: Don't disable referenced optional scif clock ARM: EXYNOS: Properly skip unitialized parent clock in power domain on ARM: dts: sun8i-q8-common: Do not set constraints on dc1sw regulator	2016-05-05 15:31:35 -07:00
Russell King	54176cc680	maintainers: update rmk's email address(es) Update my email and web addresses in the kernel maintainers file. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-05 15:26:31 -07:00
Howard Cochran	74d3694433	writeback: Fix performance regression in wb_over_bg_thresh() Commit `947e9762a8` ("writeback: update wb_over_bg_thresh() to use wb_domain aware operations") unintentionally changed this function's meaning from "are there more dirty pages than the background writeback threshold" to "are there more dirty pages than the writeback threshold". The background writeback threshold is typically half of the writeback threshold, so this had the effect of raising the number of dirty pages required to cause a writeback worker to perform background writeout. This can cause a very severe performance regression when a BDI uses BDI_CAP_STRICTLIMIT because balance_dirty_pages() and the writeback worker can now disagree on whether writeback should be initiated. For example, in a system having 1GB of RAM, a single spinning disk, and a "pass-through" FUSE filesystem mounted over the disk, application code mmapped a 128MB file on the disk and was randomly dirtying pages in that mapping. Because FUSE uses strictlimit and has a default max_ratio of only 1%, in balance_dirty_pages, thresh is ~200, bg_thresh is ~100, and the dirty_freerun_ceiling is the average of those, ~150. So, it pauses the dirtying processes when we have 151 dirty pages and wakes up a background writeback worker. But the worker tests the wrong threshold (200 instead of 100), so it does not initiate writeback and just returns. Thus, balance_dirty_pages keeps looping, sleeping and then waking up the worker who will do nothing. It remains stuck in this state until the few dirty pages that we have finally expire and we write them back for that reason. Then the whole process repeats, resulting in near-zero throughput through the FUSE BDI. The fix is to call the parameterized variant of wb_calc_thresh, so that the worker will do writeback if the bg_thresh is exceeded which was the behavior before the referenced commit. 
Fixes: `947e9762a8` ("writeback: update wb_over_bg_thresh() to use wb_domain aware operations") Signed-off-by: Howard Cochran <hcochran@kernelspring.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> # v4.2+ Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-05-05 15:44:55 -06:00
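The FUSE example in 74d3694433 comes down to three comparisons against two thresholds. The sketch below replays that arithmetic with the approximate numbers from the message (thresh ~200, bg_thresh ~100); `struct wb_limits` and the helper names are hypothetical and do not reproduce the kernel's functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the two thresholds named in the message. */
struct wb_limits {
    unsigned long thresh;     /* writeback threshold */
    unsigned long bg_thresh;  /* background writeback threshold */
};

/* The freerun ceiling is the average of the two thresholds. */
static unsigned long freerun_ceiling(const struct wb_limits *l)
{
    assert(l->bg_thresh <= l->thresh);
    return (l->thresh + l->bg_thresh) / 2;
}

/* The dirtying process is paused once it exceeds the freerun ceiling. */
static bool dirtier_must_pause(const struct wb_limits *l, unsigned long dirty)
{
    return dirty > freerun_ceiling(l);
}

/* Regressed worker check: compares against the full threshold. */
static bool worker_writes_back_buggy(const struct wb_limits *l, unsigned long dirty)
{
    return dirty > l->thresh;
}

/* Intended worker check: compares against the background threshold. */
static bool worker_writes_back_fixed(const struct wb_limits *l, unsigned long dirty)
{
    return dirty > l->bg_thresh;
}
```

With 151 dirty pages the dirtier is paused, the regressed check declines to write back (151 is not above 200), and the intended check starts writeback (151 is above 100), which is the stall the message describes.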
Vladimir Murzin	ec953b70f3	ARM: 8573/1: domain: move {set,get}_domain under config guard A recursive undefined instruction fault is seen when an R-class core takes an exception. The reason is that __show_regs() tries to get domain information, but domains are not available on !MMU cores, like R/M class. Fix it by putting the {set,get}_domain functions under a CONFIG_CPU_CP15_MMU guard and providing stubs for the case where domains are not supported. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2016-05-05 19:03:02 +01:00


590115 Commits