Commit Graph

574937 Commits

Author SHA1 Message Date
Jaegeuk Kim
60b286c442 f2fs: use correct errno
This patch is to fix misused error number.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
745e8490b1 f2fs crypto: add missing locking for keyring_key access
This patch adopts:
	ext4 crypto: add missing locking for keyring_key access

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
1dafa51d45 f2fs crypto: check for too-short encrypted file names
This patch adopts:
	ext4 crypto: check for too-short encrypted file names

An encrypted file name should never be shorter than an 16 bytes, the
AES block size.  The 3.10 crypto layer will oops and crash the kernel
if ciphertext shorter than the block size is passed to it.

Fortunately, in modern kernels the crypto layer will not crash the
kernel in this scenario, but nevertheless, it represents a corrupted
directory, and we should detect it and mark the file system as
corrupted so that e2fsck can fix this.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
ce855a3bd0 f2fs crypto: f2fs_page_crypto() doesn't need a encryption context
This patch adopts:
	ext4 crypto: ext4_page_crypto() doesn't need a encryption context

Since ext4_page_crypto() doesn't need an encryption context (at least
not any more), this allows us to simplify a number function signature
and also allows us to avoid needing to allocate a context in
ext4_block_write_begin().  It also means we no longer need a separate
ext4_decrypt_one() function.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
0fac2d501b f2fs crypto: fix spelling typo in comment
This patch adopts:
	ext4 crypto: fix spelling typo in comment

Signed-off-by: Laurent Navet <laurent.navet@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
66aa3e1274 f2fs crypto: replace some BUG_ON()'s with error checks
This patch adopts:
	ext4 crypto: replace some BUG_ON()'s with error checks

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
8ef2af45ae f2fs: increase i_size to avoid missing data
When finsert is doing with dirting pages, we should increase i_size right away.
Otherwise, the moved page is able to be dropped by the following
filemap_write_and_wait_range before updating i_size.
Especially, it can be done by
	if ((page->index >= end_index + 1) || !offset)
		goto out;
in f2fs_write_data_page.

This should resolve the below xfstests/091 failure reported by Dave.

$ diff -u tests/generic/091.out /home/dave/src/xfstests-dev/results//f2fs/generic/091.out.bad
--- tests/generic/091.out       2014-01-20 16:57:33.000000000 +1100
+++ /home/dave/src/xfstests-dev/results//f2fs/generic/091.out.bad       2016-02-08 15:21:02.701375087 +1100
@@ -1,7 +1,18 @@
 QA output created by 091
 fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -W
+mapped writes DISABLED
+skipping insert range behind EOF
+skipping insert range behind EOF
+truncating to largest ever: 0x11e00
+dowrite: write: Invalid argument
+LOG DUMP (7 total operations):
+1(  1 mod 256): SKIPPED (no operation)
+2(  2 mod 256): SKIPPED (no operation)
+3(  3 mod 256): FALLOC   0x2e0f2 thru 0x3134a  (0x3258 bytes) PAST_EOF
+4(  4 mod 256): SKIPPED (no operation)
+5(  5 mod 256): SKIPPED (no operation)
+6(  6 mod 256): TRUNCATE UP    from 0x0 to 0x11e00
+7(  7 mod 256): WRITE    0x73400 thru 0x79fff  (0x6c00 bytes) HOLE
+Log of operations saved to "/mnt/test/junk.fsxops"; replay with --replay-ops
+Correct content saved for comparison
+(maybe hexdump "/mnt/test/junk" vs "/mnt/test/junk.fsxgood")

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
24b8491251 f2fs: preallocate blocks for buffered aio writes
This patch preallocates data blocks for buffered aio writes.
With this patch, we can avoid redundant locking and unlocking of node pages
given consecutive aio request.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
b439b103a6 f2fs: move dio preallocation into f2fs_file_write_iter
This patch moves preallocation code for direct IOs into f2fs_file_write_iter.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Yunlei He
d31c7c3f0b f2fs: fix missing skip pages info
fix missing skip pages info in f2fs_writepages trace event.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Chao Yu
0c3a579758 f2fs: introduce f2fs_submit_merged_bio_cond
f2fs use single bio buffer per type data (META/NODE/DATA) for caching
writes locating in continuous block address as many as possible, after
submitting, these writes may be still cached in bio buffer, so we have
to flush cached writes in bio buffer by calling f2fs_submit_merged_bio.

Unfortunately, in the scenario of high concurrency, bio buffer could be
flushed by someone else before we submit it as below reasons:
a) there is no space in bio buffer.
b) add a request of different type (SYNC, ASYNC).
c) add a discontinuous block address.

For this condition, f2fs_submit_merged_bio will be devastating, because
it could break the following merging of writes in bio buffer, split one
big bio into two smaller one.

This patch introduces f2fs_submit_merged_bio_cond which can do a
conditional submitting with bio buffer, before submitting it will judge
whether:
 - page in DATA type bio buffer is matching with specified page;
 - page in DATA type bio buffer is belong to specified inode;
 - page in NODE type bio buffer is belong to specified inode;
If there is no eligible page in bio buffer, we will skip submitting step,
result in gaining more chance to merge consecutive block IOs in bio cache.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
d48dfc2073 f2fs: fix conflict on page->private usage
This patch fixes confilct on page->private value between f2fs_trace_pid and
atomic page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
17c19120eb f2fs: flush bios to handle cp_error in put_super
Sometimes, if cp_error is set, there remains under-writeback pages, resulting in
kernel hang in put_super.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
fa3d2bdf94 f2fs: wait on page's writeback in writepages path
Likewise f2fs_write_cache_pages, let's do for node and meta pages too.
Especially, for node blocks, we should do this before marking its fsync
and dentry flags.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Sheng Yong
479c8bc40c f2fs: fix endianness of on-disk summary_footer
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Chao Yu
da85985c61 f2fs: speed up handling holes in fiemap
This patch makes f2fs_map_blocks supporting returning next potential
page offset which skips hole region in indirect tree of inode, and
use it to speed up fiemap in handling big hole case.

Test method:
xfs_io -f /mnt/f2fs/file  -c "pwrite 1099511627776 4096"
time xfs_io -f /mnt/f2fs/file -c "fiemap -v"

Before:
time xfs_io -f /mnt/f2fs/file -c "fiemap -v"
/mnt/f2fs/file:
 EXT: FILE-OFFSET              BLOCK-RANGE      TOTAL FLAGS
   0: [0..2147483647]:         hole             2147483648
   1: [2147483648..2147483655]: 81920..81927         8   0x1

real    3m3.518s
user    0m0.000s
sys     3m3.456s

After:
time xfs_io -f /mnt/f2fs/file -c "fiemap -v"
/mnt/f2fs/file:
 EXT: FILE-OFFSET              BLOCK-RANGE      TOTAL FLAGS
   0: [0..2147483647]:         hole             2147483648
   1: [2147483648..2147483655]: 81920..81927         8   0x1

real    0m0.008s
user    0m0.000s
sys     0m0.008s

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Chao Yu
3cf4574705 f2fs: introduce get_next_page_offset to speed up SEEK_DATA
When seeking data in ->llseek, if we encounter a big hole which covers
several dnode pages, we will try to seek data from index of page which
is the first page of next dnode page, at most we could skip searching
(ADDRS_PER_BLOCK - 1) pages.

However it's still not efficient, because if our indirect/double-indirect
pointer are NULL, there are no dnode page locate in the tree indirect/
double-indirect pointer point to, it's not necessary to search the whole
region.

This patch introduces get_next_page_offset to calculate next page offset
based on current searching level and max searching level returned from
get_dnode_of_data, with this, we could skip searching the entire area
indirect or double-indirect node block is not exist.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Chao Yu
81ca7350ce f2fs: remove unneeded pointer conversion
There are redundant pointer conversion in following call stack:
 - at position a, inode was been converted to f2fs_file_info.
 - at position b, f2fs_file_info was been converted to inode again.

 - truncate_blocks(inode,..)
  - fi = F2FS_I(inode)		---a
  - ADDRS_PER_PAGE(node_page, fi)
   - addrs_per_inode(fi)
    - inode = &fi->vfs_inode	---b
    - f2fs_has_inline_xattr(inode)
     - fi = F2FS_I(inode)
     - is_inode_flag_set(fi,..)

In order to avoid unneeded conversion, alter ADDRS_PER_PAGE and
addrs_per_inode to acept parameter with type of inode pointer.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Chao Yu
5b8db7fada f2fs: simplify __allocate_data_blocks
This patch uses existing function f2fs_map_block to simplify implementation
of __allocate_data_blocks.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Chao Yu
4fe71e88bf f2fs: simplify f2fs_map_blocks
In f2fs_map_blocks, we use duplicated codes to handle first block mapping
and the following blocks mapping, it's unnecessary. This patch simplifies
f2fs_map_blocks to avoid using copied codes.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Shuoran Liu
8f1dbbbbdf f2fs: introduce lifetime write IO statistics
This patch introduces lifetime IO write statistics exposed to the sysfs interface.
The write IO amount is obtained from block layer, accumulated in the file system and
stored in the hot node summary of checkpoint.

Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Pengyang Hou <houpengyang@huawei.com>
[Jaegeuk Kim: add sysfs documentation]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
6fe2bc9561 f2fs: give scheduling point in shrinking path
It needs to give a chance to be rescheduled while shrinking slab entries.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Hou Pengyang
201ef5e080 f2fs: improve shrink performance of extent nodes
On the worst case, we need to scan the whole radix tree and every rb-tree to
free the victimed extent_nodes when shrinking.

Pengyang initially introduced a victim_list to record the victimed extent_nodes,
and free these extent_nodes by just scanning a list.

Later, Chao Yu enhances the original patch to improve memory footprint by
removing victim list.

The policy of lru list shrinking becomes:
1) lock lru list's lock
2) trylock extent tree's lock
3) remove extent node from lru list
4) unlock lru list's lock
5) do shrink
6) repeat 1) to 5)

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
429267442a f2fs: don't set cached_en if it will be freed
If en has empty list pointer, it will be freed sooner, so we don't need to
set cached_en with it.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
43a2fa180e f2fs: move extent_node list operations being coupled with rbtree operation
This patch moves extent_node list operations to be handled together with
its rbtree operations.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Hou Pengyang
a03f01f267 f2fs: reconstruct the code to free an extent_node
There are three steps to free an extent node:
1) list_del_init, 2)__detach_extent_node, 3) kmem_cache_free

In path f2fs_destroy_extent_tree, 1->2->3 to free a node,
But in path f2fs_update_extent_tree_range, it is 2->1->3.

This patch makes all the order to be: 1->2->3
It makes sense, since in the next patch, we import a victim list in the
path shrink_extent_tree, we could check if the extent_node is in the victim
list by checking the list_empty(). So it is necessary to put 1) first.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
7c506896cf f2fs: use wq_has_sleeper for cp_wait wait_queue
We need to use wq_has_sleeper including smp_mb to consider cp_wait concurrency.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Fan Li
688159b6db f2fs: avoid unnecessary search while finding victim in gc
variable nsearched in get_victim_by_default() indicates the number of
dirty segments we already checked. There are 2 problems about the way
it updates:
1. When p.ofs_unit is greater than 1, the victim we find consists
   of multiple segments, possibly more than 1 dirty segment.
   But nsearched always increases by 1.
2. If segments have been found but not been chosen, nsearched won't
   increase. So even we have checked all dirty segments, nsearched
   may still less than p.max_search.
All these problems could cause unnecessary search after all dirty
segments have already been checked.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Yunlei He
85ead8185a f2fs: delete unnecessary wait for page writeback
no need to wait inline file page writeback for no one
use it, so this patch delete unnecessary wait.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
fec1d6576c f2fs: use wait_for_stable_page to avoid contention
In write_begin, if storage supports stable_page, we don't need to wait for
writeback to update its contents.
This patch introduces to use wait_for_stable_page instead of
wait_on_page_writeback.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Chao Yu
718e53fa63 f2fs: enhance foreground GC
If we configure section consist of multiple segments, foreground GC will
do the garbage collection with following approach:

	for each segment in victim section
		blk_start_plug
		for each valid block in segment
			write out by OPU method
		submit bio cache   <---
		blk_finish_plug   <---

There are two issue:
1) for most of the time, 'submit bio cache' will break the merging in
current bio buffer from writes of next segments, making a smaller bio
submitting.
2) block plug only cover IO submitting in one segment, which reduce
opportunity of merging IOs in plug with multiple segments.

So refactor the code as below structure to strive for biggest
opportunity of merging IOs:

	blk_start_plug
	for each segment in victim section
		for each valid block in segment
			write out by OPU method
	submit bio cache
	blk_finish_plug

Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl

Before patch, there are totoally 40 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat for 8 times

After patch, there are totally 35 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
e3ef18762f f2fs: don't need to call set_page_dirty for io error
If end_io gets an error, we don't need to set the page as dirty, since we
already set f2fs_stop_checkpoint which will not flush any data.

This will resolve the following warning.

======================================================
[ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
4.4.0+ #9 Tainted: G           O
------------------------------------------------------
xfs_io/26773 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
 (&(&sbi->inode_lock[i])->rlock){+.+...}, at: [<ffffffffc025483f>] update_dirty_page+0x6f/0xd0 [f2fs]

and this task is already holding:
 (&(&q->__queue_lock)->rlock){-.-.-.}, at: [<ffffffff81396ea2>] blk_queue_bio+0x422/0x490
which would create a new lock dependency:
 (&(&q->__queue_lock)->rlock){-.-.-.} -> (&(&sbi->inode_lock[i])->rlock){+.+...}

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
ae96e7bdd4 f2fs: avoid needless sync_inode_page when reading inline_data
In write_begin, if there is an inline_data, f2fs loads it into 0'th data page.
Since it's the read path, we don't need to sync its inode page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
52f8033712 f2fs: don't need to sync node page at every time
In write_end, we don't need to sync inode page at every time.
Instead, we can expect f2fs_write_inode will update later.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
2049d4fcb0 f2fs: avoid multiple node page writes due to inline_data
The sceanrio is:
1. create fully node blocks
2. flush node blocks
3. write inline_data for all the node blocks again
4. flush node blocks redundantly

So, this patch tries to flush inline_data when flushing node blocks.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
3c082b7b5b f2fs: do f2fs_balance_fs when block is allocated
We should consider data block allocation to trigger f2fs_balance_fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
6e17bfbc75 f2fs: fix to overcome inline_data floods
The scenario is:
1. create lots of node blocks
2. sync
3. write lots of inline_data
-> got panic due to no free space

In that case, we should flush node blocks when writing inline_data in #3,
and trigger gc as well.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
25c1355151 f2fs: use writepages->lock for WB_SYNC_ALL
If there are many writepages calls by multiple threads in background, we don't
need to serialize to merge all the bios, since it's background.
In such the case, it'd better to run writepages concurrently.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
b483fadf7e f2fs: remove needless condition check
This patch removes needless condition variable.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Chao Yu
0ab1435631 f2fs: correct search area in get_new_segment
get_new_segment starts from current segment position, tries to search a
free segment among its right neighbors locate in same section.

But previously our search area was set as [current segment, max segment],
which means we have to search to more bits in free_segmap bitmap for some
worse cases. So here we correct the search area to [current segment, last
segment in section] to avoid unnecessary searching.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Chao Yu
2304cb0c44 f2fs: export dirty_nats_ratio in sysfs
This patch exports a new sysfs entry 'dirty_nat_ratio' to control threshold
of dirty nat entries, if current ratio exceeds configured threshold,
checkpoint will be triggered in f2fs_balance_fs_bg for flushing dirty nats.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Chao Yu
7d768d2c26 f2fs: flush dirty nat entries when exceeding threshold
When testing f2fs with xfstest, generic/251 is stuck for long time,
the case uses below serials to obtain fresh released space in device,
in order to prepare for following fstrim test.

1. rm -rf /mnt/dir
2. mkdir /mnt/dir/
3. cp -axT `pwd`/ /mnt/dir/
4. goto 1

During preparing step, all nat entries will be cached in nat cache,
most of them are dirty entries with invalid blkaddr, which means
nodes related to these entries have been truncated, and they could
be reused after the dirty entries been checkpointed.

However, there was no checkpoint been triggered, so nid allocators
(e.g. mkdir, creat) will run into long journey of iterating all NAT
pages, looking for free nids in alloc_nid->build_free_nids.

Here, in f2fs_balance_fs_bg we give another chance to do checkpoint
to flush nat entries for reusing them in free nid cache when dirty
entry count exceeds 10% of max count.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Chao Yu
0fd785eb93 f2fs: relocate is_merged_page
Operations in is_merged_page is related to inner bio cache, move it to
data.c.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Linus Torvalds
4de8ebeff8 Two more small fixes.
One is by Yang Shi who added a READ_ONCE_NOCHECK() to the scan of the
 stack made by the stack tracer. As the stack tracer scans the entire
 kernel stack, KASAN triggers seeing it as a "stack out of bounds" error.
 As the scan is looking at the contents of the stack from parent functions.
 The NOCHECK() tells KASAN that this is done on purpose, and is not some
 kind of stack overflow.
 
 The second fix is to the ftrace selftests, to retrieve the PID of executed
 commands from the shell with "$!" and not by parsing "jobs".
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWyycyAAoJEKKk/i67LK/8ALoH/RkMZ7Cih7vXb30wB13xSNrB
 6o4ApuC4YOS9Un/4ruCXb+cGbW2LJLHkEU2ageoHLOZMvwuAM7iQ6fTUW1KxCRP2
 ECvqyqi0ZRyoi/CibxVVH9hHEAJzUTwok67nkLeZBqIN9Fglcfd7toAwgcrH3y59
 Pybyv5CV2eaff5IKoLXKZJNRLdrVLeM7v4BvdI0dxEikhWZ0XsA0RoIaNfTPqyQJ
 F6sJ/njdoMMJK4N8CCPxlvnvEOzn0DnJnfUNUQEj5J3YU9DbAHAACaBSg5oSh9oK
 BcFYKV2GIzPku1cafutRRlErcGyB2yqv7bB8Eo86zXRHbeonaj4XGJmH276ldVg=
 =srlj
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v4.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Two more small fixes.

  One is by Yang Shi who added a READ_ONCE_NOCHECK() to the scan of the
  stack made by the stack tracer.  As the stack tracer scans the entire
  kernel stack, KASAN triggers seeing it as a "stack out of bounds"
  error.  As the scan is looking at the contents of the stack from
  parent functions.  The NOCHECK() tells KASAN that this is done on
  purpose, and is not some kind of stack overflow.

  The second fix is to the ftrace selftests, to retrieve the PID of
  executed commands from the shell with '$!' and not by parsing 'jobs'"

* tag 'trace-fixes-v4.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing, kasan: Silence Kasan warning in check_stack of stack_tracer
  ftracetest: Fix instance test to use proper shell command for pids
2016-02-22 14:09:18 -08:00
Linus Torvalds
692b8c663c Xen bug fixes for 4.5-rc5
- Two scsiback fixes (resource leak and spurious warning).
 - Fix DMA mapping of compound pages on arm/arm64.
 - Fix some pciback regressions in MSI-X handling.
 - Fix a pcifront crash due to some uninitialize state.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWyvatAAoJEFxbo/MsZsTRBFcH+wWnv0/N+gKib3cKCI4lwmTg
 n8iVgf8dNWwD36M2s/OlzCAglAIt8Xr6ySNvPqTerpm7lT9yXlIVQxGXTbIGuTAA
 h8Kt8WiC0BNLHHlLxBuCz62KR47DvMhsr84lFURE8FmpUiulFjXmRcbrZkHIMYRS
 l/X+xJWO1vxwrSYho0P9n3ksTWHm488DTPvZz3ICNI2G2sndDfbT3gv3tMDaQhcX
 ZaQR93vtIoldqk29Ga59vaVtksbgxHZIbasY9PQ8rqOxHJpDQbPzpjocoLxAzf50
 cioQVyKQ7i9vUvZ+B3TTAOhxisA2hDwNhLGQzmjgxe2TXeKdo3yjYwO6m1dDBzY=
 =VY/S
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.5-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel:

 - Two scsiback fixes (resource leak and spurious warning).

 - Fix DMA mapping of compound pages on arm/arm64.

 - Fix some pciback regressions in MSI-X handling.

 - Fix a pcifront crash due to some uninitialize state.

* tag 'for-linus-4.5-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pcifront: Fix mysterious crashes when NUMA locality information was extracted.
  xen/pcifront: Report the errors better.
  xen/pciback: Save the number of MSI-X entries to be copied later.
  xen/pciback: Check PF instead of VF for PCI_COMMAND_MEMORY
  xen: fix potential integer overflow in queue_reply
  xen/arm: correctly handle DMA mapping of compound pages
  xen/scsiback: avoid warnings when adding multiple LUNs to a domain
  xen/scsiback: correct frontend counting
2016-02-22 13:57:01 -08:00
Linus Torvalds
dea08e6044 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Looks like a lot, but mostly driver fixes scattered all over as usual.

  Of note:

   1) Add conditional sched in nf conntrack in cleanup to avoid NMI
      watchdogs.  From Florian Westphal.

   2) Fix deadlock in nfnetlink cttimeout, also from Floarian.

   3) Fix handling of slaves in bonding ARP monitor validation, from Jay
      Vosburgh.

   4) Callers of ip_cmsg_send() are responsible for freeing IP options,
      some were not doing so.  Fix from Eric Dumazet.

   5) Fix per-cpu bugs in mvneta driver, from Gregory CLEMENT.

   6) Fix vlan handling in mv88e6xxx DSA driver, from Vivien Didelot.

   7) bcm7xxx PHY driver bug fixes from Florian Fainelli.

   8) Avoid unaligned accesses to protocol headers wrt.  GRE, from
      Alexander Duyck.

   9) SKB leaks and other problems in arc_emac driver, from Alexander
      Kochetkov.

  10) tcp_v4_inbound_md5_hash() releases listener socket instead of
      request socket on error path, oops.  Fix from Eric Dumazet.

  11) Missing socket release in pppoe_rcv_core() that seems to have
      existed basically forever.  From Guillaume Nault.

  12) Missing slave_dev unregister in dsa_slave_create() error path,
      from Florian Fainelli.

  13) crypto_alloc_hash() never returns NULL, fix return value check in
      __tcp_alloc_md5sig_pool.  From Insu Yun.

  14) Properly expire exception route entries in ipv4, from Xin Long.

  15) Fix races in tcp/dccp listener socket dismantle, from Eric
      Dumazet.

  16) Don't set IFF_TX_SKB_SHARING in vxlan, geneve, or GRE, it's not
      legal.  These drivers modify the SKB on transmit.  From Jiri Benc.

  17) Fix regression in the initialziation of netdev->tx_queue_len.
      From Phil Sutter.

  18) Missing unlock in tipc_nl_add_bc_link() error path, from Insu Yun.

  19) SCTP port hash sizing does not properly ensure that table is a
      power of two in size.  From Neil Horman.

  20) Fix initializing of software copy of MAC address in fmvj18x_cs
      driver, from Ken Kawasaki"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (129 commits)
  bnx2x: Fix 84833 phy command handler
  bnx2x: Fix led setting for 84858 phy.
  bnx2x: Correct 84858 PHY fw version
  bnx2x: Fix 84833 RX CRC
  bnx2x: Fix link-forcing for KR2
  net: ethernet: davicom: fix devicetree irq resource
  fmvj18x_cs: fix incorrect indexing of dev->dev_addr[] when copying the MAC address
  Driver: Vmxnet3: Update Rx ring 2 max size
  net: netcp: rework the code for get/set sw_data in dma desc
  soc: ti: knav_dma: rename pad in struct knav_dma_desc to sw_data
  net: ti: netcp: restore get/set_pad_info() functionality
  MAINTAINERS: Drop myself as xen netback maintainer
  sctp: Fix port hash table size computation
  can: ems_usb: Fix possible tx overflow
  Bluetooth: hci_core: Avoid mixing up req_complete and req_complete_skb
  net: bcmgenet: Fix internal PHY link state
  af_unix: Don't use continue to re-execute unix_stream_read_generic loop
  unix_diag: fix incorrect sign extension in unix_lookup_by_ino
  bnxt_en: Failure to update PHY is not fatal condition.
  bnxt_en: Remove unnecessary call to update PHY settings.
  ...
2016-02-22 12:18:07 -08:00
Linus Torvalds
5c102d0eca Two fixes headed for stable:
Remove an un-necessary speed_index lookup for thermal hook in the gpio-fan
 driver. The unnecessary speed lookup can hog the system.
 
 Handle negative conversion values correctly in the ads1015 driver.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWydseAAoJEMsfJm/On5mBhxgP/iVNIDIusNLBHD7VG2vtpWim
 +nAmcdllLdtuT459zGX5z5HE3b7oj8v5JYW9TUH0x+N4WhvFZ9TWgH8OXt2cuUZ3
 vgbgt9QJLTzP6PHFdhhF2rLrL1Ur5eyeV0fIlRVHj2KqZixILcutLn3+TDKE1Buc
 o2Y5+CXI/XJZ2y5U8SOK5xYC1EmYkijc5rTmDdF1/5DhNzupO4S6bW/MdY9Qo5X6
 0qkyLQ01eH1GqnvUA9VNhDjlU7NZWVFpd/StL8IQk313E+79m3vwAlSNK6a33eC2
 w4nuFLA4zPfJwRYI7dN8jePCs+Y10ucfmX+EKFU7JzC4WTvyv1P76Ilrd6/ZuDJx
 YQ8ClT9ZscT9xqr+0MnhkBeE1Mu6YZoDLVaSCymcIp1E7mTAMY8yMBknFcWEZIX7
 cJuqMskvBHn7o1WCzzYOqFgOv4csm3fcazeEzp+R25Ds4NE715ClUfFZI6gLc9lg
 vYlLtU2mQGI3IsdD/a1/WOefOPGsFrGwVIAcgsi5zlCYFY/BSa9PLu/dWoLtomnP
 GEUUDIEyKsZ7om9B4B9229izL9jRT5MsYbh7vJb/1XULpXMBg1ESqhamxmtxV+oU
 J5fIlpw0g1/83mul8rq9FIeAxTcgAXsrGtJUWpatd/rHrQOgP/w5w9N3MkX7LDLE
 ccP+uEEP2z/i+PgOmr0L
 =luAI
 -----END PGP SIGNATURE-----

Merge tag 'hwmon-for-linus-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
 "Two fixes headed for stable:

   - Remove an unnecessary speed_index lookup for thermal hook in the
     gpio-fan driver.  The unnecessary speed lookup can hog the system.

   - Handle negative conversion values correctly in the ads1015 driver"

* tag 'hwmon-for-linus-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (gpio-fan) Remove un-necessary speed_index lookup for thermal hook
  hwmon: (ads1015) Handle negative conversion values correctly
2016-02-22 12:12:46 -08:00
Linus Torvalds
a16152c897 Additional 4.5-rc5 fixes
- One fix ocrdma - The new CQ API support was added to ocrdma, but they
   got the arming logic wrong, so without this, transfers eventually fail
   when they fail to arm the interrupt properly under load
 - Two related fixes for mlx4 - When we added the 64bit extended counters
   support to the core IB code, they forgot to update the RoCE side of the
   mlx4 driver (the IB side they properly updated).  I debated whether or
   not to include these patches as they could be considered feature
   enablement patches, but the existing code will blindy copy the 32bit
   counters, whether any counters were requested at all (a bug).  These
   two patches make it A) check to see that counters were requested and
   B) copy the right counters (the 64bit support is new, the 32bit is
   not).  For that reason I went ahead and took them.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWxI9xAAoJELgmozMOVy/dqRgP/3JbB5MMcqqrxCXGE7874YOO
 ZXldH2nJ+uIffWn1IUDJd01vWKDbxBD5joD+HgH1XLjmSQZf7I6APmayTMx43k/k
 Ps4fg4iDU2HY35w4s8hCgIj1WmKtYWPA2wHFB1VUO8nhLQD0fxCInw3feDhOXu23
 VYSBOypHIMHZfldOog/RXQ4/cCStTmhyDlPtcldWTpMm/SmBEs/iGoUXe5gLHZfk
 aKEK2OOC/Yk/iK4flnYsNcgDhFfTcFlpbUtBfLYvFd9UxBCkPsB7WE5BT06zvEHe
 Cp5IN7093Nhh66uKyDHyzBJFMrZB3crcqmEGsauE6iBF6iQItlvVMch7DHP4FEWP
 2VlYKbhINH7yfl/bXq9CyljA2wcPS5MDtQgrEwa/qMeA8sqEQWnyPgS4xGKnufoQ
 SbvPY9pdzhU2G6weoRSiVfpMXh2dl/kbOcsn/rywfyrJ4Ne1lE60Di75WEpwxD+I
 4cRktAZp5W2gnv9NJRA/iYMQCVxJ9Sd7aj+dFfWOzJf3uuOlvMALTPuWF4zyeT1U
 VKpptmAmq0l3zVtpnmuVNO4BqC+00L+sojrkeGX9ZhLBaJZfUkj10sjPAWGHNU28
 liQ3Wnqb7q5Z7ROBs5bbMub7Q4+hWiEINPQtFm4f43bmievI9QgPjoNQXjXea7vI
 W9WXEhYS1fjI9z3w4Oqg
 =QD36
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma fixes from Doug Ledford:
 "One ocrdma fix:

   - The new CQ API support was added to ocrdma, but they got the arming
     logic wrong, so without this, transfers eventually fail when they
     fail to arm the interrupt properly under load

  Two related fixes for mlx4:

   - When we added the 64bit extended counters support to the core IB
     code, they forgot to update the RoCE side of the mlx4 driver (the
     IB side they properly updated).

     I debated whether or not to include these patches as they could be
     considered feature enablement patches, but the existing code will
     blindy copy the 32bit counters, whether any counters were requested
     at all (a bug).

     These two patches make it (a) check to see that counters were
     requested and (b) copy the right counters (the 64bit support is
     new, the 32bit is not).  For that reason I went ahead and took
     them"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/mlx4: Add support for the port info class for RoCE ports
  IB/mlx4: Add support for extended counters over RoCE ports
  RDMA/ocrdma: Fix arm logic to align with new cq API
2016-02-22 12:04:11 -08:00
Linus Torvalds
7ee302f6f7 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "Some bugfixes from I2C for you:

  A fix for a RuntimePM regression with OMAP, a fix to enable TCO for
  Lewisburg platforms, and a typo fix while we are here"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: i801: Adding Intel Lewisburg support for iTCO
  i2c: uniphier: fix typos in error messages
  i2c: omap: Fix PM regression with deferred probe for pm_runtime_reinit
2016-02-22 11:55:18 -08:00
David S. Miller
d856626d3b linux-can-fixes-for-4.5-20160221
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCgAGBQJWycVaAAoJED07qiWsqSVqhoIIAKNVMLApOlvduZ2JQYSUJ+tC
 IWjRSuB3sOwCIMKx4DXWhovkTU6j00jxQiEHF5phSOA9n5n+wwJMhgZQeOehAYoS
 smwkKMjJhgvEd4lZ7+OBWj6Xi1mStU3ddYYubE7MkTE/4E6vsnJ6/iVbCyIv4d0F
 7zGnWaq/bca+ccZyrVO9hk3mg8ZD+2L6hGR47E5vWerR4mupifMK0LvmfgIPBsMk
 FVMKhddDek2xUTZxdUesXCvAoOewSfhFbNG3uMH6AFKVjEhFoEC8c0Wn7zLrJ5Zn
 FCec9e4qXRHwQo1MgobLn6SXgrwEoPqXUFTQfNVG8rYkX/vXScXX9V7JZCCuC8U=
 =0Qdk
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-4.5-20160221' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2016-02-21

this is a pull reqeust of one patch for net/master.

The patch is by Gerhard Uttenthaler and fixes a potential tx overflow in the
ems_usb driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-21 22:51:55 -05:00