Commit Graph

617695 Commits

Author SHA1 Message Date
Liu Bo
1ba98d086f Btrfs: detect corruption when non-root leaf has zero item
Right now we treat leaf which has zero item as a valid one
because we could have an empty tree, that is, a root that is
also a leaf without any item, however, in the same case but
when the leaf is not a root, we can end up with hitting the
BUG_ON(1) in btrfs_extend_item() called by
setup_inline_extent_backref().

This makes us check the situation as a corruption if leaf is
not its own root.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:31 -07:00
Liu Bo
053ab70f06 Btrfs: check btree node's nritems
When btree node (level = 1) has nritems which equals to zero,
we can end up with panic due to insert_ptr()'s

BUG_ON(slot > nritems);

where slot is 1 and nritems is 0, as copy_for_split() calls
insert_ptr(.., path->slots[1] + 1, ...);

A invalid value results in the whole mess, this adds the check
for btree's node nritems so that we stop reading block when
when something is wrong.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:30 -07:00
Jeff Mahoney
35bbb97fc8 btrfs: don't create or leak aliased root while cleaning up orphans
commit 909c3a22da (Btrfs: fix loading of orphan roots leading to BUG_ON)
avoids the BUG_ON but can add an aliased root to the dead_roots list or
leak the root.

Since we've already been loading roots into the radix tree, we should
use it before looking the root up on disk.

Cc: <stable@vger.kernel.org> # 4.5
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:29 -07:00
Josef Bacik
187ee58c62 Btrfs: fix em leak in find_first_block_group
We need to call free_extent_map() on the em we look up.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:29 -07:00
Anand Jain
1423881941 btrfs: do not background blkdev_put()
At the end of unmount/dev-delete, if the device exclusive open is not
actually closed, then there might be a race with another program in
the userland who is trying to open the device in exclusive mode and
it may fail for eg:
      unmount /btrfs; fsck /dev/x
      btrfs dev del /dev/x /btrfs; fsck /dev/x
so here background blkdev_put() is not a choice

Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:28 -07:00
Liu Bo
28b737f6ed Btrfs: clarify do_chunk_alloc()'s return value
Function start_transaction() can return ERR_PTR(1) when flush is
BTRFS_RESERVE_FLUSH_LIMIT, so the call graph is

start_transaction (return ERR_PTR(1))
  -> btrfs_block_rsv_add (return 1)
     -> reserve_metadata_bytes (return 1)
        -> flush_space (return 1)
           -> do_chunk_alloc  (return 1)

With BTRFS_RESERVE_FLUSH_LIMIT, if flush_space is already on the
flush_state of ALLOC_CHUNK and it successfully allocates a new
chunk, then instead of trying to reserve space again,
reserve_metadata_bytes returns 1 immediately.

Eventually the callers who call start_transaction() usually just
do the IS_ERR() check which ERR_PTR(1) can pass, then it'll get
a panic when dereferencing a pointer which is ERR_PTR(1).

The following patch fixes the above problem.
"btrfs: flush_space: treat return value of do_chunk_alloc properly"
https://patchwork.kernel.org/patch/7778651/

This add comments to clarify do_chunk_alloc()'s return value.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:27 -07:00
Wang Xiaoguang
9e7cc91a6d btrfs: fix fsfreeze hang caused by delayed iputs deal
When running fstests generic/068, sometimes we got below deadlock:
  xfs_io          D ffff8800331dbb20     0  6697   6693 0x00000080
  ffff8800331dbb20 ffff88007acfc140 ffff880034d895c0 ffff8800331dc000
  ffff880032d243e8 fffffffeffffffff ffff880032d24400 0000000000000001
  ffff8800331dbb38 ffffffff816a9045 ffff880034d895c0 ffff8800331dbba8
  Call Trace:
  [<ffffffff816a9045>] schedule+0x35/0x80
  [<ffffffff816abab2>] rwsem_down_read_failed+0xf2/0x140
  [<ffffffff8118f5e1>] ? __filemap_fdatawrite_range+0xd1/0x100
  [<ffffffff8134f978>] call_rwsem_down_read_failed+0x18/0x30
  [<ffffffffa06631fc>] ? btrfs_alloc_block_rsv+0x2c/0xb0 [btrfs]
  [<ffffffff810d32b5>] percpu_down_read+0x35/0x50
  [<ffffffff81217dfc>] __sb_start_write+0x2c/0x40
  [<ffffffffa067f5d5>] start_transaction+0x2a5/0x4d0 [btrfs]
  [<ffffffffa067f857>] btrfs_join_transaction+0x17/0x20 [btrfs]
  [<ffffffffa068ba34>] btrfs_evict_inode+0x3c4/0x5d0 [btrfs]
  [<ffffffff81230a1a>] evict+0xba/0x1a0
  [<ffffffff812316b6>] iput+0x196/0x200
  [<ffffffffa06851d0>] btrfs_run_delayed_iputs+0x70/0xc0 [btrfs]
  [<ffffffffa067f1d8>] btrfs_commit_transaction+0x928/0xa80 [btrfs]
  [<ffffffffa0646df0>] btrfs_freeze+0x30/0x40 [btrfs]
  [<ffffffff81218040>] freeze_super+0xf0/0x190
  [<ffffffff81229275>] do_vfs_ioctl+0x4a5/0x5c0
  [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
  [<ffffffff810038cf>] ? syscall_trace_enter_phase1+0x11f/0x140
  [<ffffffff81229409>] SyS_ioctl+0x79/0x90
  [<ffffffff81003c12>] do_syscall_64+0x62/0x110
  [<ffffffff816acbe1>] entry_SYSCALL64_slow_path+0x25/0x25

>From this warning, freeze_super() already holds SB_FREEZE_FS, but
btrfs_freeze() will call btrfs_commit_transaction() again, if
btrfs_commit_transaction() finds that it has delayed iputs to handle,
it'll start_transaction(), which will try to get SB_FREEZE_FS lock
again, then deadlock occurs.

The root cause is that in btrfs, sync_filesystem(sb) does not make
sure all metadata is updated. There still maybe some codes adding
delayed iputs, see below sample race window:

         CPU1                                  |         CPU2
|-> freeze_super()                             |
    |-> sync_filesystem(sb);                   |
    |                                          |-> cleaner_kthread()
    |                                          |   |-> btrfs_delete_unused_bgs()
    |                                          |       |-> btrfs_remove_chunk()
    |                                          |           |-> btrfs_remove_block_group()
    |                                          |               |-> btrfs_add_delayed_iput()
    |                                          |
    |-> sb->s_writers.frozen = SB_FREEZE_FS;   |
    |-> sb_wait_write(sb, SB_FREEZE_FS);       |
    |   acquire SB_FREEZE_FS lock.             |
    |                                          |
    |-> btrfs_freeze()                         |
        |-> btrfs_commit_transaction()         |
            |-> btrfs_run_delayed_iputs()      |
            |   will handle delayed iputs,     |
            |   that means start_transaction() |
            |   will be called, which will try |
            |   to get SB_FREEZE_FS lock.      |

To fix this issue, introduce a "int fs_frozen" to record internally whether
fs has been frozen. If fs has been frozen, we can not handle delayed iputs.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment to btrfs_freeze ]
Signed-off-by: David Sterba <dsterba@suse.com>

Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:26 -07:00
Wang Xiaoguang
18513091af btrfs: update btrfs_space_info's bytes_may_use timely
This patch can fix some false ENOSPC errors, below test script can
reproduce one false ENOSPC error:
	#!/bin/bash
	dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
	dev=$(losetup --show -f fs.img)
	mkfs.btrfs -f -M $dev
	mkdir /tmp/mntpoint
	mount $dev /tmp/mntpoint
	cd /tmp/mntpoint
	xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile

Above script will fail for ENOSPC reason, but indeed fs still has free
space to satisfy this request. Please see call graph:
btrfs_fallocate()
|-> btrfs_alloc_data_chunk_ondemand()
|   bytes_may_use += 64M
|-> btrfs_prealloc_file_range()
    |-> btrfs_reserve_extent()
        |-> btrfs_add_reserved_bytes()
        |   alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not
        |   change bytes_may_use, and bytes_reserved += 64M. Now
        |   bytes_may_use + bytes_reserved == 128M, which is greater
        |   than btrfs_space_info's total_bytes, false enospc occurs.
        |   Note, the bytes_may_use decrease operation will be done in
        |   end of btrfs_fallocate(), which is too late.

Here is another simple case for buffered write:
                    CPU 1              |              CPU 2
                                       |
|-> cow_file_range()                   |-> __btrfs_buffered_write()
    |-> btrfs_reserve_extent()         |   |
    |                                  |   |
    |                                  |   |
    |    .....                         |   |-> btrfs_check_data_free_space()
    |                                  |
    |                                  |
    |-> extent_clear_unlock_delalloc() |

In CPU 1, btrfs_reserve_extent()->find_free_extent()->
btrfs_add_reserved_bytes() do not decrease bytes_may_use, the decrease
operation will be delayed to be done in extent_clear_unlock_delalloc().
Assume in this case, btrfs_reserve_extent() reserved 128MB data, CPU2's
btrfs_check_data_free_space() tries to reserve 100MB data space.
If
	100MB > data_sinfo->total_bytes - data_sinfo->bytes_used -
		data_sinfo->bytes_reserved - data_sinfo->bytes_pinned -
		data_sinfo->bytes_readonly - data_sinfo->bytes_may_use
btrfs_check_data_free_space() will try to allcate new data chunk or call
btrfs_start_delalloc_roots(), or commit current transaction in order to
reserve some free space, obviously a lot of work. But indeed it's not
necessary as long as decreasing bytes_may_use timely, we still have
free space, decreasing 128M from bytes_may_use.

To fix this issue, this patch chooses to update bytes_may_use for both
data and metadata in btrfs_add_reserved_bytes(). For compress path, real
extent length may not be equal to file content length, so introduce a
ram_bytes argument for btrfs_reserve_extent(), find_free_extent() and
btrfs_add_reserved_bytes(), it's becasue bytes_may_use is increased by
file content length. Then compress path can update bytes_may_use
correctly. Also now we can discard RESERVE_ALLOC_NO_ACCOUNT, RESERVE_ALLOC
and RESERVE_FREE.

As we know, usually EXTENT_DO_ACCOUNTING is used for error path. In
run_delalloc_nocow(), for inode marked as NODATACOW or extent marked as
PREALLOC, we also need to update bytes_may_use, but can not pass
EXTENT_DO_ACCOUNTING, because it also clears metadata reservation, so
here we introduce EXTENT_CLEAR_DATA_RESV flag to indicate btrfs_clear_bit_hook()
to update btrfs_space_info's bytes_may_use.

Meanwhile __btrfs_prealloc_file_range() will call
btrfs_free_reserved_data_space() internally for both sucessful and failed
path, btrfs_prealloc_file_range()'s callers does not need to call
btrfs_free_reserved_data_space() any more.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:26 -07:00
Wang Xiaoguang
4824f1f412 btrfs: divide btrfs_update_reserved_bytes() into two functions
This patch divides btrfs_update_reserved_bytes() into
btrfs_add_reserved_bytes() and btrfs_free_reserved_bytes(), and
next patch will extend btrfs_add_reserved_bytes()to fix some
false ENOSPC error, please see later patch for detailed info.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:25 -07:00
Wang Xiaoguang
dcb40c196f btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
wrong file offset for reloc_inode, it uses cluster->start and cluster->end,
which indeed are extent's bytenr. The correct value should be
cluster->[start|end] minus block group's start bytenr.

start bytenr   cluster->start
|              |     extent      |   extent   | ...| extent |
|----------------------------------------------------------------|
|                block group reloc_inode                         |

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:24 -07:00
Qu Wenruo
df2c95f33e btrfs: qgroup: Fix qgroup incorrectness caused by log replay
When doing log replay at mount time(after power loss), qgroup will leak
numbers of replayed data extents.

The cause is almost the same of balance.
So fix it by manually informing qgroup for owner changed extents.

The bug can be detected by btrfs/119 test case.

Cc: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:23 -07:00
Qu Wenruo
62b99540a1 btrfs: relocation: Fix leaking qgroups numbers on data extents
This patch fixes a REGRESSION introduced in 4.2, caused by the big quota
rework.

When balancing data extents, qgroup will leak all its numbers for
relocated data extents.

The relocation is done in the following steps for data extents:
1) Create data reloc tree and inode
2) Copy all data extents to data reloc tree
   And commit transaction
3) Create tree reloc tree(special snapshot) for any related subvolumes
4) Replace file extent in tree reloc tree with new extents in data reloc
   tree
   And commit transaction
5) Merge tree reloc tree with original fs, by swapping tree blocks

For 1)~4), since tree reloc tree and data reloc tree doesn't count to
qgroup, everything is OK.

But for 5), the swapping of tree blocks will only info qgroup to track
metadata extents.

If metadata extents contain file extents, qgroup number for file extents
will get lost, leading to corrupted qgroup accounting.

The fix is, before commit transaction of step 5), manually info qgroup to
track all file extents in data reloc tree.
Since at commit transaction time, the tree swapping is done, and qgroup
will account these data extents correctly.

Cc: Mark Fasheh <mfasheh@suse.de>
Reported-by: Mark Fasheh <mfasheh@suse.de>
Reported-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:22 -07:00
Qu Wenruo
cb93b52cc0 btrfs: qgroup: Refactor btrfs_qgroup_insert_dirty_extent()
Refactor btrfs_qgroup_insert_dirty_extent() function, to two functions:
1. btrfs_qgroup_insert_dirty_extent_nolock()
   Almost the same with original code.
   For delayed_ref usage, which has delayed refs locked.

   Change the return value type to int, since caller never needs the
   pointer, but only needs to know if they need to free the allocated
   memory.

2. btrfs_qgroup_insert_dirty_extent()
   The more encapsulated version.

   Will do the delayed_refs lock, memory allocation, quota enabled check
   and other things.

The original design is to keep exported functions to minimal, but since
more btrfs hacks exposed, like replacing path in balance, we need to
record dirty extents manually, so we have to add such functions.

Also, add comment for both functions, to info developers how to keep
qgroup correct when doing hacks.

Cc: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:21 -07:00
Jeff Mahoney
d06f23d6a9 btrfs: waiting on qgroup rescan should not always be interruptible
We wait on qgroup rescan completion in three places: file system
shutdown, the quota disable ioctl, and the rescan wait ioctl.  If the
user sends a signal while we're waiting, we continue happily along.  This
is expected behavior for the rescan wait ioctl.  It's racy in the shutdown
path but mostly works due to other unrelated synchronization points.
In the quota disable path, it Oopses the kernel pretty much immediately.

Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:20 -07:00
Jeff Mahoney
d2c609b834 btrfs: properly track when rescan worker is running
The qgroup_flags field is overloaded such that it reflects the on-disk
status of qgroups and the runtime state.  The BTRFS_QGROUP_STATUS_FLAG_RESCAN
flag is used to indicate that a rescan operation is in progress, but if
the file system is unmounted while a rescan is running, the rescan
operation is paused.  If the file system is then mounted read-only,
the flag will still be present but the rescan operation will not have
been resumed.  When we go to umount, btrfs_qgroup_wait_for_completion
will see the flag and interpret it to mean that the rescan worker is
still running and will wait for a completion that will never come.

This patch uses a separate flag to indicate when the worker is
running.  The locking and state surrounding the qgroup rescan worker
needs a lot of attention beyond this patch but this is enough to
avoid a hung umount.

Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by; Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:19 -07:00
Alex Lyakas
eecba891d3 btrfs: flush_space: treat return value of do_chunk_alloc properly
do_chunk_alloc returns 1 when it succeeds to allocate a new chunk.
But flush_space will not convert this to 0, and will also return 1.
As a result, reserve_metadata_bytes will think that flush_space failed,
and may potentially return this value "1" to the caller (depends how
reserve_metadata_bytes was called). The caller will also treat this as an error.
For example, btrfs_block_rsv_refill does:

int ret = -ENOSPC;
...
ret = reserve_metadata_bytes(root, block_rsv, num_bytes, flush);
if (!ret) {
        block_rsv_add_bytes(block_rsv, num_bytes, 0);
        return 0;
}

return ret;

So it will return -ENOSPC.

Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:18 -07:00
Liu Bo
f3bca8028b Btrfs: add ASSERT for block group's memory leak
This adds several ASSERT()' s to report memory leak of block group cache.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:17 -07:00
Qu Wenruo
d8422ba334 btrfs: backref: Fix soft lockup in __merge_refs function
When over 1000 file extents refers to one extent, find_parent_nodes()
will be obviously slow, due to the O(n^2)~O(n^3) loops inside
__merge_refs().

The following ftrace shows the cubic growth of execution time:

256 refs
 5) + 91.768 us   |  __add_keyed_refs.isra.12 [btrfs]();
 5)   1.447 us    |  __add_missing_keys.isra.13 [btrfs]();
 5) ! 114.544 us  |  __merge_refs [btrfs]();
 5) ! 136.399 us  |  __merge_refs [btrfs]();

512 refs
 6) ! 279.859 us  |  __add_keyed_refs.isra.12 [btrfs]();
 6)   3.164 us    |  __add_missing_keys.isra.13 [btrfs]();
 6) ! 442.498 us  |  __merge_refs [btrfs]();
 6) # 2091.073 us |  __merge_refs [btrfs]();

and 1024 refs
 7) ! 368.683 us  |  __add_keyed_refs.isra.12 [btrfs]();
 7)   4.810 us    |  __add_missing_keys.isra.13 [btrfs]();
 7) # 2043.428 us |  __merge_refs [btrfs]();
 7) * 18964.23 us |  __merge_refs [btrfs]();

And sort them into the following char:
(Unit: us)
------------------------------------------------------------------------
 Trace function        | 256 ref        | 512 refs      | 1024 refs    |
------------------------------------------------------------------------
 __add_keyed_refs      | 91             | 249           | 368          |
 __add_missing_keys    | 1              | 3             | 4            |
 __merge_refs 1st call | 114            | 442           | 2043         |
 __merge_refs 2nd call | 136            | 2091          | 18964        |
------------------------------------------------------------------------

We can see the that __add_keyed_refs() grows almost in linear behavior.
And __add_missing_keys() in this case doesn't change much or takes much
time.

While for the 1st __merge_refs() it's square growth
for the 2nd __merge_refs() call it's cubic growth.

It's no doubt that merge_refs() will take a long long time to execute if
the number of refs continues its grows.

So add a cond_resced() into the loop of __merge_refs().

Although this will solve the problem of soft lockup, we need to use the
new rb_tree based structure introduced by Lu Fengqi to really solve the
long execution time.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:16 -07:00
Liu Bo
1c1ea4f781 Btrfs: fix memory leak of reloc_root
When some critical errors occur and FS would be flipped into RO,
if we have an on-going balance, we can end up with a memory leak
of root->reloc_root since btrfs_drop_snapshots() bails out
without freeing reloc_root at the very early start.

However, we're not able to free reloc_root in btrfs_drop_snapshots()
because its caller, merge_reloc_roots(), still needs to access it to
cleanup reloc_root's rbtree.

This makes us free reloc_root when we're going to free fs/file roots.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:15 -07:00
Mark Rutland
fd363bd417 arm64: avoid TLB conflict with CONFIG_RANDOMIZE_BASE
When CONFIG_RANDOMIZE_BASE is selected, we modify the page tables to remap the
kernel at a newly-chosen VA range. We do this with the MMU disabled, but do not
invalidate TLBs prior to re-enabling the MMU with the new tables. Thus the old
mappings entries may still live in TLBs, and we risk violating
Break-Before-Make requirements, leading to TLB conflicts and/or other issues.

We invalidate TLBs when we uninsall the idmap in early setup code, but prior to
this we are subject to issues relating to the Break-Before-Make violation.

Avoid these issues by invalidating the TLBs before the new mappings can be
used by the hardware.

Fixes: f80fb3a3d5 ("arm64: add support for kernel ASLR")
Cc: <stable@vger.kernel.org> # 4.6+
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-08-25 11:11:32 +01:00
Linus Torvalds
61c04572de Merge branch 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal fixes from Zhang Rui:

 - Fix cpu_cooling to have separate thermal_cooling_device_ops
   structures for cpus with and without power model, to avoid NULL
   dereference in cpufreq_state2power.  From Brendan Jackman.

 - Fix a possible NULL dereference in imx_thermal driver.  From Corentin
   LABBE.

 - Another two trivial fixes, one typo fix and one deleting module
   owner.  From Caesar Wang and Markus Elfring.

* 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
  thermal: imx: fix a possible NULL dereference
  thermal: trivial: fix the typo
  Thermal-INT3406: Delete owner assignment
  thermal: cpu_cooling: Fix NULL dereference in cpufreq_state2power
2016-08-25 05:49:38 -04:00
Lorenzo Colitti
a52e95abf7 net: diag: allow socket bytecode filters to match socket marks
This allows a privileged process to filter by socket mark when
dumping sockets via INET_DIAG_BY_FAMILY. This is useful on
systems that use mark-based routing such as Android.

The ability to filter socket marks requires CAP_NET_ADMIN, which
is consistent with other privileged operations allowed by the
SOCK_DIAG interface such as the ability to destroy sockets and
the ability to inspect BPF filters attached to packet sockets.

Tested: https://android-review.googlesource.com/261350
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 21:57:20 -07:00
Lorenzo Colitti
627cc4add5 net: diag: slightly refactor the inet_diag_bc_audit error checks.
This simplifies the code a bit and also allows inet_diag_bc_audit
to send to userspace an error that isn't EINVAL.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 21:57:20 -07:00
Vivien Didelot
9d490b4ee4 net: dsa: rename switch operations structure
Now that the dsa_switch_driver structure contains only function pointers
as it is supposed to, rename it to the more appropriate dsa_switch_ops,
uniformly to any other operations structure in the kernel.

No functional changes here, basically just the result of something like:
s/dsa_switch_driver *drv/dsa_switch_ops *ops/g

However keep the {un,}register_switch_driver functions and their
dsa_switch_drivers list as is, since they represent the -- likely to be
deprecated soon -- legacy DSA registration framework.

In the meantime, also fix the following checks from checkpatch.pl to
make it happy with this patch:

    CHECK: Comparison to NULL could be written "!ops"
    #403: FILE: net/dsa/dsa.c:470:
    +	if (ops == NULL) {

    CHECK: Comparison to NULL could be written "ds->ops->get_strings"
    #773: FILE: net/dsa/slave.c:697:
    +		if (ds->ops->get_strings != NULL)

    CHECK: Comparison to NULL could be written "ds->ops->get_ethtool_stats"
    #824: FILE: net/dsa/slave.c:785:
    +	if (ds->ops->get_ethtool_stats != NULL)

    CHECK: Comparison to NULL could be written "ds->ops->get_sset_count"
    #835: FILE: net/dsa/slave.c:798:
    +		if (ds->ops->get_sset_count != NULL)

    total: 0 errors, 0 warnings, 4 checks, 784 lines checked

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 21:45:39 -07:00
Dave Airlie
179ca3bb2c Merge branch 'drm-fixes-4.8' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
radeon and amdgpu fixes for 4.8.  Nothing major:
- fix a performance regression due to the LRU changes in 4.7
- 32 bit fixes
- fix a PLL regression
- misc bug fixes

* 'drm-fixes-4.8' of git://people.freedesktop.org/~agd5f/linux:
  drm/amdgpu: skip TV/CV in display parsing
  drm/amdgpu: avoid a possible array overflow
  drm/amdgpu: fix lru size grouping v2
  drm/amdgpu: fix timeout value check in amd_sched_job_recovery
  drm/amdgpu: fix sdma_v2_4_ring_test_ib
  drm/amdgpu: fix amdgpu_move_blit on 32bit systems
  drm/radeon: fix radeon_move_blit on 32bit systems
  drm/radeon: only apply the SS fractional workaround to RS[78]80
2016-08-25 12:50:30 +10:00
Dave Airlie
30bffd1b44 drm/tegra: Fixes for v4.8-rc4
This contains one fix for DSI runtime power management support that was
 introduced in v4.8-rc1. This is slightly more elaborate than I would've
 wished, but there are a few corner cases that needed fixing.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIwBAABCAAaBQJXvaiSExx0cmVkaW5nQG52aWRpYS5jb20ACgkQ3SOs138+s6Hv
 jg/6AnYkczQYraAb8Oy2YO3dIfz3r0TLGEEnY3AKudur867mNG0V59JcUXHQYKin
 FL7z4iIEx21RAtA9gUEc7xz9gitXP2W/GqRG6v1u0wLn+GS72TuNZCsqUVPQ6Glr
 yg0uLPoxsFyLa3rsTgif5jXxpsbSd6OV2NdkpImqt1EQLaOWYSNqLLq/YNvMsvCK
 QivRvvJEYznIyJayIN/PDekfYY3L5esMQNdDEI9jJKdJrTykQThx2eOIbsgY4MyC
 sdp/h9R23O1ZFvfGONrQHOOzDGumhDJtsRfew+q9JoBaOOgztk2LmPpRZFQsIrmV
 1rv1ahQh3boPWw3cSoQMCE8E3Eu2wD/zNrWYbKyhjChx/HTaSPyQGwk0v2WycizA
 KnNo3eQOTT/5Wb/KvhCaef4Q+yc4LbcAC8T3pkYyGVy+yhUOXr4SGOQLh4Za0Sag
 uaM20wk3TPYAiyUKZFE7RZ1WHtOp56gOIHbS5eV5ppuR2lmoCh4vK5yCojaz4gww
 DAHzldegReurS3du613KrV63lue1r3nwX1f0kKDsWd4yN0FH/74g/QC1jXKqJmmk
 ZevqKCQ1eoaPyOMQI2G0rvRL4B63s800oHO2L760e5SdbM1/Oraxzc+FobGZz9KW
 Y55SyNnn3QT5HcH6dntmisnuSbZVXVNonEJaPJWFLH/ChgI=
 =lyq1
 -----END PGP SIGNATURE-----

Merge tag 'drm/tegra/for-4.8-rc4' of git://anongit.freedesktop.org/tegra/linux into drm-fixes

drm/tegra: Fixes for v4.8-rc4

This contains one fix for DSI runtime power management support that was
introduced in v4.8-rc1. This is slightly more elaborate than I would've
wished, but there are a few corner cases that needed fixing.

* tag 'drm/tegra/for-4.8-rc4' of git://anongit.freedesktop.org/tegra/linux:
  drm/tegra: dsi: Enhance runtime power management
2016-08-25 12:49:22 +10:00
Heinz Mauelshagen
9c5a559d94 dm log: fix unitialized bio operation flags
Commit e6047149db ("dm: use bio op accessors") switched DM over to
using bio_set_op_attrs() but didn't take care to initialize
lc->io_req.bi_op_flags in dm-log.c:rw_header().  This caused
rw_header()'s call to dm_io() to make bio->bi_op_flags be uninitialized
in dm-io.c:do_region(), which ultimately resulted in a SCSI BUG() in
sd_init_command().

Also, adjust rw_header() and its callers to use REQ_OP_{READ|WRITE}.

Fixes: e6047149db ("dm: use bio op accessors")
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-08-24 21:55:05 -04:00
Mike Snitzer
299f6230bc dm flakey: fix reads to be issued if drop_writes configured
v4.8-rc3 commit 99f3c90d0d ("dm flakey: error READ bios during the
down_interval") overlooked the 'drop_writes' feature, which is meant to
allow reads to be issued rather than errored, during the down_interval.

Fixes: 99f3c90d0d ("dm flakey: error READ bios during the down_interval")
Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2016-08-24 21:55:05 -04:00
Jens Axboe
0e87e58bf6 blk-mq: improve warning for running a queue on the wrong CPU
__blk_mq_run_hw_queue() currently warns if we are running the queue on a
CPU that isn't set in its mask. However, this can happen if a CPU is
being offlined, and the workqueue handling will place the work on CPU0
instead. Improve the warning so that it only triggers if the batch cpu
in the hardware queue is currently online.  If it triggers for that
case, then it's indicative of a flow problem in blk-mq, so we want to
retain it for that case.

Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-24 15:38:01 -06:00
Jens Axboe
e57690fe00 blk-mq: don't overwrite rq->mq_ctx
We do this in a few places, if the CPU is offline. This isn't allowed,
though, since on multi queue hardware, we can't just move a request
from one software queue to another, if they map to different hardware
queues. The request and tag isn't valid on another hardware queue.

This can happen if plugging races with CPU offlining. But it does
no harm, since it can only happen in the window where we are
currently busy freezing the queue and flushing IO, in preparation
for redoing the software <-> hardware queue mappings.

Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-24 15:34:35 -06:00
Doug Ledford
716b076ba4 IB/srpt: Update sport->port_guid with each port refresh
If port_guid is set with the default subnet_prefix, then we get a change
event and run a port refresh, we don't update the port_guid.  As a
result, attempts to create a target device that uses the new
subnet_prefix in the wwn will fail to find a match and be rejected by
the ib_srpt driver.  This makes it impossible to configure a port if it
was initialized with a default subnet_prefix and later changed to any
non-default subnet-prefix.  Updating the port refresh task to always
update the wwn based upon the current subnext_prefix solves this
problem.

Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: nab@linux-iscsi.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-24 16:51:16 -04:00
Linus Torvalds
4935e04ef4 Merge branch 'for-linus-4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML fix from Richard Weinberger:
 "This contains a fix for a build regression introduced during the merge
  window"

* 'for-linus-4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Don't discard .text.exit section
2016-08-24 16:04:59 -04:00
Linus Torvalds
94ef71a99a This pull requests contains fixes for two issues in UBI and UBIFS:
1. Wrong UBIFS assertion.
 2. A UBIFS xattr regression.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJXvflzAAoJEEtJtSqsAOnWs8QP/jEgGY5QcuvdIA+ymFFFeZ8o
 /8YwzbLula+M5T7trMSaRmT5AW5iwY/Xu/VVpVipKMVkAP7079jLjJljOckwriut
 FY60BoQ8VcxwFPRn5xMDJ6KdDMAzVFX10j/+h71VrlE6Ej4nu8XVYGYiRdnjTYiF
 JdxvuWgIDmycRT6bH69c5ZSNpMuOPpCydX0CbWEFk9P07BKL2+9inpPBGRJxy8y8
 abT4ByCJmZWjruzjeBrR4o9A2hrDTrlHPH2RzQgJXCDKntM8AjsCCCReHbhrCKLo
 QmZh+8l8N8HN4GQczcrbTSL0EZn3IsbAS6Ut03NOPcSb+kWjaH5Hcr2kEUEFA7R4
 myKfFe6/BorgHX4QTqNiX7r0y63YepcIFAIUnfzv4wya8p+IGAunouZ+D6e+3BPy
 ICUv3oGqDZkI/fqc5h3cU9RLF7fOvdAtqO+M/lInwFPbqv3jJoVxuC4oD7PEh/eY
 n7VNeVyYr+8JZ5MKBU3zYHRNyHDYME+wTpNGu/4fJR1Rsym523L7hka3zQDkDqs3
 xqFoln1BcRKT/kMTKubK5dLAWaRv68RuMZTPRDeSBrY5vY7jYTYg/42eXnmcOzeP
 vi8L3p8o2CSLHTjeX+Q0lHAw/Ppy/FFowUEx03huj0C+5LNtY6RvIvBtC3YABR8U
 nC92PnhRhl9gUvkDuii9
 =8pdh
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.8-rc4' of git://git.infradead.org/linux-ubifs

Pull UBIFS fixes from Richard Weinberger:
 "This pull requests contains fixes for two issues in UBI and UBIFS:

   - wrong UBIFS assertion.
   - a UBIFS xattr regression"

* tag 'upstream-4.8-rc4' of git://git.infradead.org/linux-ubifs:
  ubifs: Fix xattr generic handler usage
  ubifs: Fix assertion in layout_in_gaps()
2016-08-24 15:54:41 -04:00
Mark Brown
cfb89f2e75 Merge remote-tracking branches 'asoc/fix/max98371', 'asoc/fix/nau8825', 'asoc/fix/omap', 'asoc/fix/samsung', 'asoc/fix/simple' and 'asoc/fix/wm2000' into asoc-linus 2016-08-24 19:05:25 +01:00
Mark Brown
e8f0f8aa4e Merge remote-tracking branches 'asoc/fix/atmel', 'asoc/fix/compress', 'asoc/fix/da7213' and 'asoc/fix/debugfs' into asoc-linus 2016-08-24 19:05:22 +01:00
Mark Brown
d520519518 Merge remote-tracking branch 'asoc/fix/rcar' into asoc-linus 2016-08-24 19:05:21 +01:00
Mark Brown
a74306fe94 Merge remote-tracking branch 'asoc/fix/intel' into asoc-linus 2016-08-24 19:05:20 +01:00
Mark Brown
b5db6c57c9 Merge remote-tracking branch 'asoc/fix/dapm' into asoc-linus 2016-08-24 19:05:18 +01:00
Mark Brown
ae16842306 Merge remote-tracking branch 'asoc/fix/core' into asoc-linus 2016-08-24 19:05:17 +01:00
Alex Deucher
611a1507fe drm/amdgpu: skip TV/CV in display parsing
No asics supported by amdgpu support analog TV.

Workaround for bug:
https://bugs.freedesktop.org/show_bug.cgi?id=97460

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2016-08-24 14:04:38 -04:00
Linus Torvalds
fe2dd21282 xen: regression fix for 4.8-rc3
- Fix a regression in the xenbus device preventing userspace tools
   from working.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXvdugAAoJEFxbo/MsZsTRAwEH/AiKLV4T0OiARv/df827WVnL
 obUmEAh/wVSWZh2xdUNurDOH64lEfeBDSBIpGPQMLGmXLzNEQO9u8ZJYWJ7R1Ryp
 JU37lu3DP7HqQqTXsy8ltgcBkwVaQZAo0GRtDeua80ZPdjulnZirwHWS48TuNIFF
 pVtW4Eoy1BNAVri55o5hOIub4HUKMRoNB/J+o+SKLyJEvOon+qD4pOfIhR3sqeja
 oYVX7QpY/4Miymd5uI9v8LUefS4PW/U58a7tjr414Ng4mzQbZOHDmNyWF0CH27lj
 INAmgMXDG7RtiSQMWPKtDQUvuefApKoeRmFr6mQ/xHyCX3cAzOw07+p0rKacCig=
 =PTX1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.8b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen regression fix from David Vrabel:
 "Fix a regression in the xenbus device preventing userspace tools from
  working"

* tag 'for-linus-4.8b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: change the type of xen_vcpu_id to uint32_t
  xenbus: don't look up transaction IDs for ordinary writes
2016-08-24 14:04:30 -04:00
Alex Deucher
e1718d97aa drm/amdgpu: avoid a possible array overflow
When looking up the connector type make sure the index
is valid.  Avoids a later crash if we read past the end
of the array.

Workaround for bug:
https://bugs.freedesktop.org/show_bug.cgi?id=97460

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2016-08-24 14:04:24 -04:00
Vitaly Kuznetsov
55467dea29 xen: change the type of xen_vcpu_id to uint32_t
We pass xen_vcpu_id mapping information to hypercalls which require
uint32_t type so it would be cleaner to have it as uint32_t. The
initializer to -1 can be dropped as we always do the mapping before using
it and we never check the 'not set' value anyway.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-08-24 18:17:27 +01:00
Jan Beulich
9a035a40f7 xenbus: don't look up transaction IDs for ordinary writes
This should really only be done for XS_TRANSACTION_END messages, or
else at least some of the xenstore-* tools don't work anymore.

Fixes: 0beef634b8 ("xenbus: don't BUG() on user mode induced condition")
Reported-by: Richard Schütz <rschuetz@uni-koblenz.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Richard Schütz <rschuetz@uni-koblenz.de>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-08-24 18:16:18 +01:00
Yuval Mintz
c7b7b483cc bnx2x: Don't flush multicast MACs
When ndo_set_rx_mode() is called for bnx2x, as part of process of
configuring the new MAC address filters [both unicast & multicast]
driver begins by flushing the existing configuration and then iterating
over the network device's list of addresses and configures those instead.

This has the side-effect of creating a short gap where traffic wouldn't
be properly classified, as no filters are configured in HW.
While for unicasts this is rather insignificant [as unicast MACs don't
frequently change while interface is actually running],
for multicast traffic it does pose an issue as there are multicast-based
networks where new multicast groups would constantly be removed and
added.

This patch tries to remedy this [at least for the newer adapters] -
Instead of flushing & reconfiguring all existing multicast filters,
the driver would instead create the approximate hash match that would
result from the required filters. It would then compare it against the
currently configured approximate hash match, and only add and remove the
delta between those.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:45:20 -07:00
David S. Miller
6546c78ea6 RxRPC rewrite
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAV72ycPSw1s6N8H32AQI9fQ/8CQQSB8vCRKCdTvL4fDVN9z66nwJOzt0E
 4D41F646d2iYpaL7l2/7Z1ZUdTP0722Oss3b562vf24VjGSfJs4qFkqkXuhmtWhU
 O33qk3/p/eVaHatkQuyLyvZut9CkGLY8sYiGozcEsEVzNYcEvAxXi95Mw3YpHRJV
 OXbFedjaIrf2c2f2GsotsgLJz+1R2aCcbDePRpckh2dmNeN5tKtgnHx1+LSDGFL+
 gyGzfY5wEt6tdunnqPnutL1KSLckCQnQdM22P6HA3L4ZspEsSmVX92WfBhBWQXOi
 mG+lX3+qchACNHAQeSflxsP+hAWXKQCIE9c2wZs/jWHscZC7xXRao4mxkGEblczy
 T+WENnNof5qqxCOrUkjKor1FyU06DbBYOXlM4u10iZLSK4smuMC2AYEf5+yA85hZ
 D98ldD1/dtmck9nF5k719J+8qwbaskU7ZHWLny8Iz71qWymIKKUWd3B31tOqH6YV
 it+YGSS0JMKOrJgd+QyxSkf5KLnqOLLd5aEpXkGNAcUQZfVcuxScF1HMed9odkny
 TIzsLqHyVMBJ0Ik9vYXdcTVVDujWFFxyRGfRK4ZlGKBRjzAfyuUsHvMcZiXA9xYI
 5bCQh0WKKWGgl6/dWdIXidWy7HtNburv1Cse7q+2X1Sly2Kzz2c75vASgWb/oWaR
 /m6nA60G5FM=
 =0zp2
 -----END PGP SIGNATURE-----

Merge tag 'rxrpc-rewrite-20160824-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Add better client conn management strategy

These two patches add a better client connection management strategy.  They
need to be applied on top of the just-posted fixes.

 (1) Duplicate the connection list and separate out procfs iteration from
     garbage collection.  This is necessary for the next patch as with that
     client connections no longer appear on a single list and may not
     appear on a list at all - and really don't want to be exposed to the
     old garbage collector.

     (Note that client conns aren't left dangling, they're also in a tree
     rooted in the local endpoint so that they can be found by a user
     wanting to make a new client call.  Service conns do not appear in
     this tree.)

 (2) Implement a better lifetime management and garbage collection strategy
     for client connections.

     In this, a client connection can be in one of five cache states
     (inactive, waiting, active, culled and idle).  Limits are set on the
     number of client conns that may be active at any one time and makes
     users wait if they want to start a new call when there isn't capacity
     available.

     To make capacity available, active and idle connections can be culled,
     after a short delay (to allow for retransmission).  The delay is
     reduced if the capacity exceeds a tunable threshold.

     If there is spare capacity, client conns are permitted to hang around
     a fair bit longer (tunable) so as to allow reuse of negotiated
     security contexts.

     After this patch, the client conn strategy is separate from that of
     service conns (which continues to use the old code for the moment).

     This difference in strategy is because the client side retains control
     over when it allows a connection to become active, whereas the service
     side has no control over when it sees a new connection or a new call
     on an old connection.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:43:44 -07:00
David S. Miller
d3c10db138 RxRPC rewrite
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAV72yZ/Sw1s6N8H32AQITcQ//Tj+QPw2wnc6p6CAXSQ4CJAgprZP25dEY
 11aaLsKBsBpJXyGjRWVxHJeoNnfVO05ATyZU1AjIlALDUY2Kq1IyNJWmmxZbx0M/
 oaRN4kB41jKyJRGWnPdvQb7KL0SvjlyiEWNV9ztEk4W5Ik7UInAYl2sdovwzvgL0
 Nw7KClg2lTLE8Nu4v0GYFxz5bCUw3M4a0+C5oXCSIpXwLOMQezAmXRhYNhlxHwmZ
 phuvJrP7xH0Z2G4MgVwvyOsQzzKptWoo3c5YxTWlhUl2qk1ZUKcu+eMv9ir/T3mD
 exiiMgDLd74Wb9J9U1AEmGp9NgfxM20MGR7O/ARZ8K8FUrxMWbEifjv/eqMl0YGr
 Wk/df+VwUsp3nfOMe7/UZaBCx5ZSV7x8WT6p6lRIQAIrJj1CFbo5pxHLgG/SPt5x
 EPniDw/oC+0G7sc3BjqTLZZP7qh27TuvuUVqAdgM7lJCpozk37Qnq4C0jwheJ7ct
 MvB1mGkfiHVnq3F0UL7emazmlYBULPTJvj7fN9iPAsvNYBbdENCbjfNP0T2YKNlW
 1B08pUtMfHiS3/+LkFQ9yl0lhWnkApm++pNcOM1nLANi49Th92ch8lpfD6poAUBl
 vRWHcMeeqevnydl57WQJ40gtmJr1/5pKgVQwRY/sSBQztr/uqtM811dkNwlF3szw
 es7wA7HkKlI=
 =mirU
 -----END PGP SIGNATURE-----

Merge tag 'rxrpc-rewrite-20160824-1' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: More fixes

Here are a couple of fix patches:

 (1) Fix the conn-based retransmission patch posted yesterday.  This breaks
     if it actually has to retransmit.  However, it seems the likelihood of
     this happening is really low, despite the server I'm testing against
     being located >3000 miles away, and sometime of the time it's handled
     in the call background processor before we manage to disconnect the
     call - hence why I didn't spot it.

 (2) /proc/net/rxrpc_calls can cause a crash it accessed whilst a call is
     being torn down.  The window of opportunity is pretty small, however,
     as calls don't stay in this state for long.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:42:57 -07:00
David S. Miller
d14c800ba6 Merge branch 'mlxsw-fdb-learning-offload'
Jiri Pirko says:

====================
mlxsw: Offload FDB learning configuration

Ido says:
This patchset addresses two long standing issues in the mlxsw driver
concerning FDB learning.

Patch 1 limits the number of FDB records processed by the driver in a
single session. This is useful in situations in which many new records
need to be processed, thereby causing the RTNL mutex to be held for
long periods of time.

Patches 2-6 offload the learning configuration (on / off) of bridge
ports to the device instead of having the driver decide whether a
record needs to be learned or not.

The last patch is fallout and removes configuration no longer necessary
after the first patches are applied.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:41:13 -07:00
Ido Schimmel
0f7a4d8a9d mlxsw: spectrum: Don't set learning when creating vPorts
Before commit 99724c18fc ("mlxsw: spectrum: Introduce support for
router interfaces") we used to assign vFIDs to the created vPorts. Since
these vPorts were used for slow path traffic we had to disable learning
for them, as it doesn't make sense to have it enabled.

This is no longer the case and now vPorts are either used for router
interfaces (for which learning is disabled by the firmware) or bridge
ports (for which learning is explicitly enabled by the driver).

Therefore, we can remove the learning configuration upon vPort creation.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:41:13 -07:00
Ido Schimmel
81f77bc006 mlxsw: spectrum: Remove unnecessary check in FDB processing
We now offload the learning configuration to the device and don't rely
on the driver to decide whether to learn the FDB record, so remove the
check.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:41:12 -07:00