Ruslan reported that f2fs hangs with an infinite loop in f2fs_sync_file():
while (sync_node_pages(sbi, inode->i_ino, &wbc) == 0)
f2fs_write_inode(inode, NULL);
The reason was revealed that the cold flag is not set even thought this inode is
a normal file. Therefore, sync_node_pages() skips to write node blocks since it
only writes cold node blocks.
The cold flag is stored to the node_footer in node block, and whenever a new
node page is allocated, it is set according to its file type, file or directory.
But, after sudden-power-off, when recovering the inode page, f2fs doesn't recover
its cold flag.
So, let's assign the cold flag in more right places.
One more thing:
If f2fs_write_inode() returns an error due to whatever situations, there would
be no dirty node pages so that sync_node_pages() returns zero.
(i.e., zero means nothing was written.)
Reported-by: Ruslan N. Marchenko <me@ruff.mobi>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Test Case:
[NFS Client]
ls -lR .
[NFS Server]
while [ 1 ]
do
echo 3 > /proc/sys/vm/drop_caches
done
Error on NFS Client: "No such file or directory"
When cache is dropped at the server, it results in lookup failure at the
NFS client due to non-connection with the parent. The default path is it
initiates a lookup by calculating the hash value for the name, even though
the hash values stored on the disk for "." and ".." is maintained as zero,
which results in failure from find_in_block due to not matching HASH values.
Fix up, by using the correct hashing values for these entries.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
In f2fs_statfs(), f_files should be the total number of available inodes
instead of the currently allocated inodes.
So, this patch should resolve the reported bug below.
Note that, showing 10% usage is not a bug, since f2fs reveals whole volume size
as much as possible and shows the space overhead as *used*.
This policy is fair enough with respect to other file systems.
<Reported Bug>
(loop0 is backed by 1GiB file)
$ mkfs.f2fs /dev/loop0
F2FS-tools: Ver: 1.1.0 (2012-12-11)
Info: sector size = 512
Info: total sectors = 2097152 (in 512bytes)
Info: zone aligned segment0 blkaddr: 512
Info: format successful
$ mount /dev/loop0 mnt/
$ df mnt/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/loop0 1046528 98312 929784 10%
/home/zeta/linux-devel/mtd-bench/mnt
$ df mnt/ -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/loop0 1 -465918 465919 - /home/zeta/linux-devel/mtd-bench/mnt
Notice IUsed is negative. Also, 10% usage on a fresh f2fs seems too
much to be correct.
Reported-and-Tested-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
We should guarantee not to do *scheduling while atomic*.
I found, in atomic f2fs_end_io_write(), there is a set_page_dirty() call
to deal with IO errors.
But, set_page_dirty() calls:
-> f2fs_set_data_page_dirty()
-> set_dirty_dir_page()
-> cond_resched() which results in scheduling.
In order to avoid this, I'd like to remove simply set_page_dirty(),
since the page is already marked as ERROR and f2fs will be operated
as the read-only mode as well.
So, there is no recovery issue with this.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Previously, f2fs didn't track the parent inode number correctly which is stored
in each f2fs_inode. In the case of the following scenario, a bug can be occured.
Let's suppose there are one directory, "/b", and two files, "/a" and "/b/a".
- pino of "/a" is ROOT_INO.
- pino of "/b/a" is DIR_B_INO.
Then,
# sync
: The inode pages of "/a" and "/b/a" contain the parent inode numbers as
ROOT_INO and DIR_B_INO respectively.
# mv /a /b/a
: The parent inode number of "/a" should be changed to DIR_B_INO, but f2fs
didn't do that. Ref. f2fs_set_link().
In order to fix this clearly, I added i_pino in f2fs_inode_info, and whenever
it needs to be changed like in f2fs_add_link() and f2fs_set_link(), it is
updated temporarily in f2fs_inode_info.
And later, f2fs_write_inode() stores the latest information to the inode pages.
For power-off-recovery, f2fs_sync_file() triggers simply f2fs_write_inode().
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Do cleanup more for better code readability.
- Change the parameter set of f2fs_bio_alloc()
This function should allocate a bio only since it is not something like
f2fs_bio_init(). Instead, the caller should initialize the allocated bio.
- Introduce SECTOR_FROM_BLOCK
This macro translates a block address to its sector address.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Simplify code by providing the accessor macro to retrieve the
number of dentry slots for a given filename length.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Since, we anyway need to put the page after deleting entry. So, there is no
need to make same call under different conditions.
Move out the f2fs_put_page from the two conditions and call at once.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Since, GFP_NOFS and __GFP_ZERO is being used to set gfp_mask.
We can instead make use of already predefined macro GFP_F2FS_ZERO.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Since, GFP_NOFS(__GFP_WAIT) is used for allocation requests of bio in f2fs.
So, there is no chance of returning NULL from the BIO allocation.
Making the bio allocation routine for f2fs simpler.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
The variables node_page and page_offset are initialized but never used
otherwise, so remove those unused variables.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
In function f2fs_mkdir, err is being initialized without even checking
if there was any error in new inode creation. So, instead check the
inode error and make use of error/return condition.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
No need to initialize "struct f2fs_gc_kthread *gc_th = NULL",
as gc_th = NULL, will be taken care by the return values of kmalloc().
And fix codes in other places.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
If the filesystem is mounted as read-only then return from that point itself
instead of first doing a writeout/wait and then checking for read-only
condition.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Since, __GFP_ZERO is used while f2fs inode allocation, so we do not
need memset for f2fs_inode_info, as this is already zeroed out.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
print the invalid argument/value from parse_options in case of
mount failure.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
When CONFIG_CC_OPTIMIZE_FOR_SIZE is enabled in the kernel, -Os optimisation
flag is passed to gcc for compilation, and somehow while trying to optimize
the code, compiler is might not able to see the initialisation of variable
ne struct variable inside the get_node_info() function and results into
following warning:
fs/f2fs/node.c: In function 'get_node_info':
fs/f2fs/node.c:175:3: warning: 'ne.block_addr' may be used uninitialized in
this function [-Wuninitialized]
fs/f2fs/node.c:265:24: note: 'ne.block_addr' was declared here
fs/f2fs/node.c:176:3: warning: 'ne.ino' may be used uninitialized in this
function [-Wuninitialized]
fs/f2fs/node.c:265:24: note: 'ne.ino' was declared here
fs/f2fs/node.c:177:3: warning: 'ne.version' may be used uninitialized in
this function [-Wuninitialized]
fs/f2fs/node.c:265:24: note: 'ne.version' was declared here
Hence, lets initialise the ne struct variable to zero, which will remove
this warning and also doing this does not seems to making any impact on the
code behavior.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
There exist two build failures reported by Randy Dunlap as follows.
(on i386)
a. (config-r8857)
ERROR: "f2fs_xattr_advise_handler" [fs/f2fs/f2fs.ko] undefined!
Key configs in (config-r8857) are as follows.
CONFIG_F2FS_FS=m
# CONFIG_F2FS_STAT_FS is not set
CONFIG_F2FS_FS_XATTR=y
# CONFIG_F2FS_FS_POSIX_ACL is not set
The error was occurred due to the function location that we made a mistake.
Recently we added a new functionality for users to indicate cold files
explicitly through xattr operations (i.e., f2fs_xattr_advise_handler).
This handler should have been added in xattr.c instead of acl.c in order
to avoid an undefined operation like in this case where XATTR is set and
ACL is not set.
b. (config-r8855)
fs/f2fs/file.c: In function 'f2fs_vm_page_mkwrite':
fs/f2fs/file.c:97:2: error: implicit declaration of function
'block_page_mkwrite_return'
Key config in (config-r8855) is CONFIG_BLOCK.
Obviously, f2fs works on top of the block device so that we should consider
carefully a sort of config dependencies.
The reason why this error was occurred was that f2fs_vm_page_mkwrite() calls
block_page_mkwrite_return() which is enalbed only if CONFIG_BLOCK is set.
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
As pointed out by Randy Dunlap, this patch removes all usage of "/**" for comment
blocks. Instead, just use "/*".
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch should resolve the bugs reported by the sparse tool.
Initial reports were written by "kbuild test robot" managed by fengguang.wu.
In my local machines, I've tested also by running:
> make C=2 CF="-D__CHECK_ENDIAN__"
Accordingly, I've found lots of warnings and bugs related to the endian
conversion. And I've fixed all at this moment.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds Makefile and Kconfig for f2fs, and updates Makefile and Kconfig files
in the fs directory.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This moves all of the f2fs debugging files into debugfs. The files are
located in /sys/kernel/debug/f2fs/
Note, I think we are generating all of the same information in each of
the files for every unique f2fs filesystem in the machine. This copies
the functionality that was present in the proc files, but this should be
fixed up in the future.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[jaegeuk.kim@samsung.com: merged 3 debugfs entries into a *status* entry]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds roll-forward routines to recover fsynced data.
- F2FS uses basically roll-back model with checkpointing.
- In order to implement fsync(), there are two approaches as follows.
1. A roll-back model with checkpointing at every fsync()
: This is a naive method, but suffers from very low performance.
2. A roll-forward model
: F2FS adopts this model where all the fsynced data should be recovered, which
were written after checkpointing was done. In order to figure out the data,
F2FS keeps a "fsync" mark in direct node blocks. In addition, F2FS remains
the location of next node block in each direct node block for reconstructing
the chain of node blocks during the recovery.
- In order to enhance the performance, F2FS keeps a "dentry" mark also in direct
node blocks. If this is set during the recovery, F2FS replays adding a dentry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This implements xattr and acl functionalities.
- F2FS uses a node page to contain use extended attributes.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds inode operations for directory, symlink, and special inodes.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds core functions to get, read, write, and evict an inode.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds address space operations for data.
- F2FS supports readpages(), writepages(), and direct_IO().
- Because of out-of-place writes, f2fs_direct_IO() does not write data in place.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds memory operations and file/file_inode operations.
- F2FS supports fallocate(), mmap(), fsync(), and basic ioctl().
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds specific functions to manage NAT pages, a cache for NAT entries, free
nids, direct/indirect node blocks for indexing data, and address space for node
pages.
- The key information of an NAT entry consists of a node id and a block address.
- An NAT page is composed of block addresses covered by a certain range of NAT
entries, which is maintained by the address space of meta_inode.
- A radix tree structure is used to cache NAT entries. The index for the tree
is a node id.
- When there is no free nid, F2FS should scan NAT entries to find new one. In
order to avoid scanning frequently, F2FS manages a list containing a number of
free nids in memory. Only when free nids in the list are exhausted, scanning
process, build_free_nids(), is triggered.
- F2FS has direct and indirect node blocks for indexing data. This patch adds
fuctions related to the node block management such as getting, allocating, and
truncating node blocks to index data.
- In order to cache node blocks in memory, F2FS has a node_inode with an address
space for node pages. This patch also adds the address space operations for
node_inode.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds functions required by the checkpoint operations.
Basically, f2fs adopts a roll-back model with checkpoint blocks written in the
CP area. The checkpoint procedure includes as follows.
- write_checkpoint()
1. block_operations() freezes VFS calls.
2. submit cached bios.
3. flush_nat_entries() writes NAT pages updated by dirty NAT entries.
4. flush_sit_entries() writes SIT pages updated by dirty SIT entries.
5. do_checkpoint() writes,
- checkpoint block (#0)
- orphan inode blocks
- summary blocks made by active logs
- checkpoint block (copy of #0)
6. unblock_opeations()
In order to provide an address space for meta pages, f2fs_sb_info has a special
inode, namely meta_inode. This patch also adds the address space operations for
meta_inode.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds the implementation of superblock operations for f2fs, which includes
- init_f2fs_fs/exit_f2fs_fs
- f2fs_mount
- super_operations of f2fs
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>