We have 3 znode flags: cow, obsolete, dirty. For the last flag we have a
'ubifs_zn_dirty()' helper function, but for the other 2 flags we use
'test_bit()' directly.
This patch makes the situation more consistent and introduces helpers for the
other 2 flags: 'ubifs_zn_cow()' and 'ubifs_zn_obsolete()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Remove dead pieces of code under "if (c->min_io_size == 1)" statement -
we never execute it because in UBIFS 'c->min_io_size' is always at least 8.
This are leftovers from old pre-mainline prototype.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Remove unnecessary brackets in "inode->i_flags |= (S_NOCMTIME)" statement to
make the code not look silly.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Instead of using long "(inode->i_mode & S_IFMT) != S_IFREG" expression, use
shorted "!S_ISREG(inode->i_mode)".
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Teach 'dbg_dump_inode()' dump directory entries for directory inodes.
This requires few additional changes:
1. The 'c' argument of 'dbg_dump_inode()' cannot be const any more.
2. Users of 'dbg_dump_inode()' should not have 'tnc_mutex' locked.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch lessens the 'struct ubifs_debug_info' size by 90 bytes by
allocating less bytes for the debugfs root directory name. It introduces macros
for the name patter an length instead of hard-coding 100 bytes. It also makes
UBIFS use 'snprintf()' and teaches it to gracefully catch situations when the
name array is too short.
Additionally, this patch makes 2 unrelated changes - I just thought they do not
deserve separate commits: simplifies 'ubifs_assert()' for non-debugging case
and makes 'dbg_debugfs_init()' properly verify debugfs return code which may be
an error code or NULL, so we should you 'IS_ERR_OR_NULL()' instead of
'IS_ERR()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
If commit failed and it is in broken state, UBIFS switches to R/O mode. Most
operations return -EROFS in this case, except of commit which returns -EINVAL.
Make it return -EROFS too for consistency. This is also important for our power
cut emulation testing.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
d251ed271d "ubifs: fix sget races" left out the goto from this
error path so the static checkers complain that we're dereferencing
"sb" when it's an ERR_PTR.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* allocate ubifs_info in ->mount(), fill it enough for sb_test() and
set ->s_fs_info to it in set() callback passed to sget().
* do *not* free it in ->put_super(); do that in ->kill_sb() after we'd
done kill_anon_super().
* don't free it in ubifs_fill_super() either - deactivate_locked_super()
done by caller when ubifs_fill_super() returns an error will take care
of that sucker.
* get rid of kludge with passing ubi to ubifs_fill_super() in ->s_fs_info;
we only need it in alloc_ubifs_info(), so ubifs_fill_super() will need
only ubifs_info. Which it will find in ->s_fs_info just fine, no need to
reassign anything...
As the result, sb_test() becomes safe to apply to all superblocks that
can be found by sget() (and a kludge with temporary use of ->s_fs_info
to store a pointer to very different structure goes away).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The free space fixup is currently initiated during mount after the call to
ubifs_write_master() which results in a write to PEBs; this has been observed
with the patch 'assert no fixup when writing a node' applied:
Move the free space fixup on mount to before the calls to
ubifs_recover_inl_heads() and ubifs_write_master(). This results in no
assertions with the previously mentioned patch applied.
Artem: tweaked the patch a bit
Signed-off-by: Ben Gardiner <bengardiner@nanometrics>
Reviewed-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The current 'mount_ubifs()' implementation does not initialize the LPT until the
the master node is marked dirty. Move the LPT initialization to before marking
the master node dirty. This is a preparation for the next patch which will move
the free-space-fixup check to before marking the master node dirty, because we
have to fix-up the free space before doing any writes.
Artem: massaged the patch and commit message.
Signed-off-by: Ben Gardiner <bengardiner@nanometrics.ca>
Reviewed-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The current free space fixup can result in some writing to the UBI volume
when the space_fixup flag is set.
To catch instances where UBIFS is writing to the NAND while the space_fixup
flag is set, add an assert to ubifs_write_node().
Artem: tweaked the patch, added similar assertion to the write buffer
write path.
Signed-off-by: Ben Gardiner <bengardiner@nanometrics.ca>
Reviewed-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
UBIFS maintains per-filesystem and global clean znode counters
('c->clean_zn_cnt' and 'ubifs_clean_zn_cnt'). It is important to maintain
correct values there since the shrinker relies on 'ubifs_clean_zn_cnt'.
However, in case of failures during commit the counters were corrupted. E.g.,
if a failure happens in the middle of 'write_index()', then some nodes in the
commit list ('c->cnext') are marked as clean, and some are marked as dirty. And
the 'ubifs_destroy_tnc_subtree()' frees does not retrun correct count, and we
end up with non-zero 'c->clean_zn_cnt' when unmounting. This means that if we
have 2 file-sytem and one of them fails, and we unmount it,
'ubifs_clean_zn_cnt' stays incorrect and confuses the shrinker.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
UBIFS leaks memory on error path in 'ubifs_jnl_update()' in case of write
failure because it forgets to free the 'struct ubifs_dent_node *dent' object.
Although the object is small, the alignment can make it large - e.g., 2KiB
if the min. I/O unit is 2KiB.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
Sometimes VM asks the shrinker to return amount of objects it can shrink,
and we return the ubifs_clean_zn_cnt in that case. However, it is possible
that this counter is negative for a short period of time, due to the way
UBIFS TNC code updates it. And I can observe the following warnings sometimes:
shrink_slab: ubifs_shrinker+0x0/0x2b7 [ubifs] negative objects to delete nr=-8541616642706119788
This patch makes sure UBIFS never returns negative count of objects.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
Unfortunately, the recovery fix d1606a59b6be4ea392eabd40d1250aa1eeb19efb
(UBIFS: fix extremely rare mount failure) broke recovery. This commit make
UBIFS drop the last min. I/O unit in all journal heads, but this is needed only
for the GC head. And this does not work for non-GC heads. For example, if
suppose we have min. I/O units A and B, and A contains a valid node X, which
was fsynced, and then a group of nodes Y which spans the rest of A and B. In
this case we'll drop not only Y, but also X, which is obviously incorrect.
This patch fixes the issue and additionally makes recovery to drop last min.
I/O unit only for the GC head, and leave things as they have been for ages for
the other heads - this is safer.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Instead of passing "grouped" parameter to 'ubifs_recover_leb()' which tells
whether the nodes are grouped in the LEB to recover, pass the journal head
number and let 'ubifs_recover_leb()' look at the journal head's 'grouped' flag.
This patch is a preparation to a further fix where we'll need to know the
journal head number for other purposes.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Journal heads are different in a way how UBIFS writes nodes there. All normal
journal heads receive grouped nodes, while the GC journal heads receives
ungrouped nodes. This patch adds a 'grouped' flag to 'struct ubifs_jhead' which
describes this property.
This patch is a preparation to a further recovery fix.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Commit ab51afe05273741f72383529ef488aa1ea598ec6 was a good clean-up, but
it introduced a regression - now UBIFS prints scary error messages during
recovery on all corrupted nodes, even though the corruptions are expected
(due to a power cut). This patch fixes the issue.
Additionally fix a typo in a commentary introduced by the same commit.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Commit 1495f230fa ("vmscan: change shrinker API by passing
shrink_control struct") changed the API of ->shrink(), but missed ubifs
and cifs instances.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ubifs does not have problems with references to unlinked directories.
CC: Artem Bityutskiy <dedekind1@gmail.com>
CC: Adrian Hunter <adrian.hunter@nokia.com>
CC: linux-mtd@lists.infradead.org
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
anything else, so that the filesystem can track internally if it
needs to push out a transaction for fdatasync or not.
This is just the prototype change with no user for it yet. I plan
to push large XFS changes for the next merge window, and getting
this trivial infrastructure in this window would help a lot to avoid
tree interdependencies.
Also remove incorrect comments that ->dirty_inode can't block. That
has been changed a long time ago, and many implementations rely on it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (25 commits)
cifs: remove unnecessary dentry_unhash on rmdir/rename_dir
ocfs2: remove unnecessary dentry_unhash on rmdir/rename_dir
exofs: remove unnecessary dentry_unhash on rmdir/rename_dir
nfs: remove unnecessary dentry_unhash on rmdir/rename_dir
ext2: remove unnecessary dentry_unhash on rmdir/rename_dir
ext3: remove unnecessary dentry_unhash on rmdir/rename_dir
ext4: remove unnecessary dentry_unhash on rmdir/rename_dir
btrfs: remove unnecessary dentry_unhash in rmdir/rename_dir
ceph: remove unnecessary dentry_unhash calls
vfs: clean up vfs_rename_other
vfs: clean up vfs_rename_dir
vfs: clean up vfs_rmdir
vfs: fix vfs_rename_dir for FS_RENAME_DOES_D_MOVE filesystems
libfs: drop unneeded dentry_unhash
vfs: update dentry_unhash() comment
vfs: push dentry_unhash on rename_dir into file systems
vfs: push dentry_unhash on rmdir into file systems
vfs: remove dget() from dentry_unhash()
vfs: dentry_unhash immediately prior to rmdir
vfs: Block mmapped writes while the fs is frozen
...
Only a few file systems need this. Start by pushing it down into each
rename method (except gfs2 and xfs) so that it can be dealt with on a
per-fs basis.
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Only a few file systems need this. Start by pushing it down into each
fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs
basis.
This does not change behavior for any in-tree file systems.
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Switch to debugging using dynamic printk (pr_debug()). There is no good reason
to carry custom debugging prints if there is so cool and powerful generic
dynamic printk infrastructure, see Documentation/dynamic-debug-howto.txt. With
dynamic printks we can switch on/of individual prints, per-file, per-function
and per format messages. This means that instead of doing old-fashioned
echo 1 > /sys/module/ubifs/parameters/debug_msgs
to enable general messages, we can do:
echo 'format "UBIFS DBG gen" +ptlf' > control
to enable general messages and additionally ask the dynamic printk
infrastructure to print process ID, line number and function name. So there is
no reason to keep UBIFS-specific crud if there is more powerful generic thing.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a minor fix for UBIFS kernel-doc comments - we forgot the "@" symbol
for several 'struct ubifs_debug_info'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch fixes an extremely rare mount failure after a power cut, when mount
fails with ENOSPC error because UBIFS could not find the GC LEB.
In short, the reason for this failure is that after recovery the GC head LEB
contains less free space than it had contained just before the power cut
happened. As a result, if the FS is full, 'ubifs_rcvry_gc_commit()' is unable
to find a dirty LEB to GC and a free LEB, so mount fails.
This patch contains a huge comment with more detailed explanation, please refer
that comment.
Since this is really really rare and unlikely situation, I do not send this
patch to the stable tree, also because it requires a lot of preparation
patches which I did before. So sending this to -stable would be too risky.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Further simplify 'ubifs_recover_leb()' by noticing that we have to call
'clean_buf()' in any case, and it is fine to call it if the offset is
aligned to 'c->min_io_size'. Thus, we do not have to call it separately
from every "if" - just call it once at the end.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Now when we call 'ubifs_recover_leb()' only for LEBs which are potentially
corrupted (i.e., only for last buds, not for all of them), we can cleanup every
LEB, not only those where we find corruption. The reason - unstable bits. Even
though the LEB may look good now, it might contain unstable bits which may hit
us a bit later.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch cleans up 'ubifs_recover_leb()' function and makes it more readable.
Move things which are done only once out of the loop and kill unneeded 'switch'
statement.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
If a UBIFS filesystem is being mounted read-write, or is being remounted
from read-only to read-write, check for the "space_fixup" flag and fix
all LEBs containing empty space if necessary.
Artem: tweaked the patch a bit
Signed-off-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch adds the 'ubifs_fixup_free_space()' function which scans all
LEBs in the filesystem for those that are in-use but have one or more
empty pages, then re-maps the LEBs in order to erase the empty portions.
Afterward it removes the "space_fixup" flag from the UBIFS superblock.
Artem: massaged the patch
Signed-off-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The 'space_fixup' flag can be set in the superblock of a new filesystem by
mkfs.ubifs to indicate that any eraseblocks with free space remaining should be
fixed-up the first time it's mounted (after which the flag is un-set). This
means that the UBIFS image has been flashed by a "dumb" flasher and the free
space has been actually programmed (writing all 0xFFs), so this free space
cannot be used. UBIFS fixes the free space up by re-writing the contents of all
LEBs with free space using the atomic LEB change UBI operation.
Artem: improved commit message, add some more commentaries to the code.
Signed-off-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
We'll need to use the 'next_log_lnum()' helper function from log.c in the fixup
code, so let's move it to misc.h. IOW, this is a preparation to the following
free space fixup changes.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch improves UBIFS recovery and teaches it to expect corruption only
in the last buds. Indeed, currently we just recover all buds, which is
incorrect because only the last buds can have corruptions in case of a power
cut. So it is inconsistent with the rest of the recovery strategy which tries
hard to distinguish between corruptions cause by power cuts and other types of
corruptions.
This patch also adds one quirk - a bit older UBIFS was could have corruption in
the next to last bud because of the way it switched buds: when bud A is full,
it first searched for the next bud B, the wrote a reference node to the log
about B, and then synchronized the write-buffer of A. So we could end up with
buds A and B, where B is the last, but A had corruption. The UBIFS behavior
was fixed, though, so currently it always first synchronizes A's write-buffer
and only after this adds B to the log. However, to be make sure that we handle
unclean (after a power cut) UBIFS images belonging to older UBIFS - we need to
add a quirk and keep it for some time: we need to check for the situation
described above.
Thankfully, it is easy to check for that situation. When UBIFS adds B to the
log, it always first unmaps B, then maps it, and then syncs A's write-buffer.
Thus, in that situation we can check that B is empty, in which case it is OK to
have corruption in A. To check that B is empty it is enough to just read the
first few bytes of the bud and compare them with 0xFFs. This quirk may be
removed in a couple of years.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Currently when UBIFS fills up the current bud (which is the last in the journal
head) and switches to the next bud, it first writes the log reference node for
the next bud and only after this synchronizes the write-buffer of the previous
bud. This is not a big deal, but an unclean power cut may lead to a situation
when we have corruption in a next-to-last bud, although it is much more logical
that we have to have corruption only in the last bud.
This patch also removes write-buffer synchronization from
'ubifs_wbuf_seek_nolock()' because this is not needed anymore (we synchronize
the write-buffer explicitly everywhere now) and also because this is just
prone to various errors.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Remove a 'BUG()' statement when we are unable to find a bud and add a
similar 'ubifs_assert()' statement instead.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a minor preparation patch which changes 'replay_bud()' interface -
instead of passing bud lnum, offs, jhead, etc directly, pass a pointer to the
bud entry which contains all the information. The bud entry will be also needed
in one of the following patches.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch simplifies replay even further - it removes the replay tree and
adds the replay list instead. Indeed, we just do not need to use a tree here -
all we need to do is to add all nodes to the list and then sort it. Using
RB-tree is an overkill - more code and slower. And since we replay buds in
order, we expect the nodes to follow in _mostly_ sorted order, so the merge
sort becomes much cheaper in average than an RB-tree.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch simplifies the replay code and makes it smaller. First of all, we
can notice that we do not really need to create bud replay entries and insert
them to the replay tree, because the only reason we do this is to set buds
lprops correctly at the end. Instead, we can just walk the list of buds at the
very end and set lprops for each bud. This allows us to get rid of whole
'insert_ref_node()' function, the 'REPLAY_REF' flag, and several fields in
'struct replay_entry'. Then we can also notice that we do not need the 'flags'
'struct replay_entry' field, because there is only one flag -
'REPLAY_DELETION'. Instead, we can just add a 'deletion' bit fields. As a
result, this patch deletes much more lines that in adds.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is just a small preparation patch which adds 'free' and 'drity' fields to
'struct bud_entry'. They will be used to set bud lprops.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is patch removes an unnecessary 'offs' variable from 'ubifs_wbuf_write_nolock()'
- we can just keep 'wbuf->offs' up-to-date instead. This patch is very minor
the only motivation for it was that it is cleaner to keep wbuf->offs up-to-date
by the time we call 'ubifs_leb_write()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Commit 52c6e6f990 provides misleading infomation
in the commit messages - buds are replied in order. And the real reason why
that fix helped is probably because it made sure we seek head even in read-only
mode (so deferred recovery will have seeked heads).
This patch adds an assertion which will fire if we reply buds out of order.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a minor change which makes 2 functions static because they
are not used outside the gc.c file: 'data_nodes_cmp()' and
'nondata_nodes_cmp()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>