linux

History

Linus Torvalds 24e7ea3bea Major changes for 3.14 include support for the newly added ZERO_RANGE and COLLAPSE_RANGE fallocate operations, and scalability improvements in the jbd2 layer and in xattr handling when the extended attributes spill over into an external block. Other than that, the usual clean ups and minor bug fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJTPbD2AAoJENNvdpvBGATwDmUQANSfGYIQazB8XKKgtNTMiG/Y Ky7n1JzN9lTX/6nMsqQnbfCweLRmxqpWUBuyKDRHUi8IG0/voXSTFsAOOgz0R15A ERRRWkVvHixLpohuL/iBdEMFHwNZYPGr3jkm0EIgzhtXNgk5DNmiuMwvHmCY27kI kdNZIw9fip/WRNoFLDBGnLGC37aanoHhCIbVlySy5o9LN1pkC8BgXAYV0Rk19SVd bWCudSJEirFEqWS5H8vsBAEm/ioxTjwnNL8tX8qms6orZ6h8yMLFkHoIGWPw3Q15 a0TSUoMyav50Yr59QaDeWx9uaPQVeK41wiYFI2rZOnyG2ts0u0YXs/nLwJqTovgs rzvbdl6cd3Nj++rPi97MTA7iXK96WQPjsDJoeeEgnB0d/qPyTk6mLKgftzLTNgSa ZmWjrB19kr6CMbebMC4L6eqJ8Fr66pCT8c/iue8wc4MUHi7FwHKH64fqWvzp2YT/ +165dqqo2JnUv7tIp6sUi1geun+bmDHLZFXgFa7fNYFtcU3I+uY1mRr3eMVAJndA 2d6ASe/KhQbpVnjKJdQ8/b833ZS3p+zkgVPrd68bBr3t7gUmX91wk+p1ct6rUPLr 700F+q/pQWL8ap0pU9Ht/h3gEJIfmRzTwxlOeYyOwDseqKuS87PSB3BzV3dDunSU DrPKlXwIgva7zq5/S0Vr =4s1Z -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Major changes for 3.14 include support for the newly added ZERO_RANGE and COLLAPSE_RANGE fallocate operations, and scalability improvements in the jbd2 layer and in xattr handling when the extended attributes spill over into an external block. Other than that, the usual clean ups and minor bug fixes" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits) ext4: fix premature freeing of partial clusters split across leaf blocks ext4: remove unneeded test of ret variable ext4: fix comment typo ext4: make ext4_block_zero_page_range static ext4: atomically set inode->i_flags in ext4_set_inode_flags() ext4: optimize Hurd tests when reading/writing inodes ext4: kill i_version support for Hurd-castrated file systems ext4: each filesystem creates and uses its own mb_cache fs/mbcache.c: doucple the locking of local from global data fs/mbcache.c: change block and index hash chain to hlist_bl_node ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate ext4: refactor ext4_fallocate code ext4: Update inode i_size after the preallocation ext4: fix partial cluster handling for bigalloc file systems ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents ext4: only call sync_filesystm() when remounting read-only fs: push sync_filesystem() down to the file system's remount_fs() jbd2: improve error messages for inconsistent journal heads jbd2: minimize region locked by j_list_lock in jbd2_journal_forget() jbd2: minimize region locked by j_list_lock in journal_get_create_access() ...		2014-04-04 15:39:39 -07:00
..
inode.c	Major changes for 3.14 include support for the newly added ZERO_RANGE	2014-04-04 15:39:39 -07:00
internal.h	cramfs: take headers to fs/cramfs	2014-01-25 03:13:02 -05:00
Kconfig	cramfs: mark as obsolete	2013-11-13 12:09:12 +09:00
Makefile
README
uncompress.c	cramfs: take headers to fs/cramfs	2014-01-25 03:13:02 -05:00

README

Notes on Filesystem Layout
--------------------------

These notes describe what mkcramfs generates.  Kernel requirements are
a bit looser, e.g. it doesn't care if the <file_data> items are
swapped around (though it does care that directory entries (inodes) in
a given directory are contiguous, as this is used by readdir).

All data is currently in host-endian format; neither mkcramfs nor the
kernel ever do swabbing.  (See section `Block Size' below.)

<filesystem>:
	<superblock>
	<directory_structure>
	<data>

<superblock>: struct cramfs_super (see cramfs_fs.h).

<directory_structure>:
	For each file:
		struct cramfs_inode (see cramfs_fs.h).
		Filename.  Not generally null-terminated, but it is
		 null-padded to a multiple of 4 bytes.

The order of inode traversal is described as "width-first" (not to be
confused with breadth-first); i.e. like depth-first but listing all of
a directory's entries before recursing down its subdirectories: the
same order as `ls -AUR' (but without the /^\..*:$/ directory header
lines); put another way, the same order as `find -type d -exec
ls -AU1 {} \;'.

Beginning in 2.4.7, directory entries are sorted.  This optimization
allows cramfs_lookup to return more quickly when a filename does not
exist, speeds up user-space directory sorts, etc.

<data>:
	One <file_data> for each file that's either a symlink or a
	 regular file of non-zero st_size.

<file_data>:
	nblocks * <block_pointer>
	 (where nblocks = (st_size - 1) / blksize + 1)
	nblocks * <block>
	padding to multiple of 4 bytes

The i'th <block_pointer> for a file stores the byte offset of the
*end* of the i'th <block> (i.e. one past the last byte, which is the
same as the start of the (i+1)'th <block> if there is one).  The first
<block> immediately follows the last <block_pointer> for the file.
<block_pointer>s are each 32 bits long.

The order of <file_data>'s is a depth-first descent of the directory
tree, i.e. the same order as `find -size +0 \( -type f -o -type l \)
-print'.


<block>: The i'th <block> is the output of zlib's compress function
applied to the i'th blksize-sized chunk of the input data.
(For the last <block> of the file, the input may of course be smaller.)
Each <block> may be a different size.  (See <block_pointer> above.)
<block>s are merely byte-aligned, not generally u32-aligned.


Holes
-----

This kernel supports cramfs holes (i.e. [efficient representation of]
blocks in uncompressed data consisting entirely of NUL bytes), but by
default mkcramfs doesn't test for & create holes, since cramfs in
kernels up to at least 2.3.39 didn't support holes.  Run mkcramfs
with -z if you want it to create files that can have holes in them.


Tools
-----

The cramfs user-space tools, including mkcramfs and cramfsck, are
located at <http://sourceforge.net/projects/cramfs/>.


Future Development
==================

Block Size
----------

(Block size in cramfs refers to the size of input data that is
compressed at a time.  It's intended to be somewhere around
PAGE_CACHE_SIZE for cramfs_readpage's convenience.)

The superblock ought to indicate the block size that the fs was
written for, since comments in <linux/pagemap.h> indicate that
PAGE_CACHE_SIZE may grow in future (if I interpret the comment
correctly).

Currently, mkcramfs #define's PAGE_CACHE_SIZE as 4096 and uses that
for blksize, whereas Linux-2.3.39 uses its PAGE_CACHE_SIZE, which in
turn is defined as PAGE_SIZE (which can be as large as 32KB on arm).
This discrepancy is a bug, though it's not clear which should be
changed.

One option is to change mkcramfs to take its PAGE_CACHE_SIZE from
<asm/page.h>.  Personally I don't like this option, but it does
require the least amount of change: just change `#define
PAGE_CACHE_SIZE (4096)' to `#include <asm/page.h>'.  The disadvantage
is that the generated cramfs cannot always be shared between different
kernels, not even necessarily kernels of the same architecture if
PAGE_CACHE_SIZE is subject to change between kernel versions
(currently possible with arm and ia64).

The remaining options try to make cramfs more sharable.

One part of that is addressing endianness.  The two options here are
`always use little-endian' (like ext2fs) or `writer chooses
endianness; kernel adapts at runtime'.  Little-endian wins because of
code simplicity and little CPU overhead even on big-endian machines.

The cost of swabbing is changing the code to use the le32_to_cpu
etc. macros as used by ext2fs.  We don't need to swab the compressed
data, only the superblock, inodes and block pointers.


The other part of making cramfs more sharable is choosing a block
size.  The options are:

  1. Always 4096 bytes.

  2. Writer chooses blocksize; kernel adapts but rejects blocksize >
     PAGE_CACHE_SIZE.

  3. Writer chooses blocksize; kernel adapts even to blocksize >
     PAGE_CACHE_SIZE.

It's easy enough to change the kernel to use a smaller value than
PAGE_CACHE_SIZE: just make cramfs_readpage read multiple blocks.

The cost of option 1 is that kernels with a larger PAGE_CACHE_SIZE
value don't get as good compression as they can.

The cost of option 2 relative to option 1 is that the code uses
variables instead of #define'd constants.  The gain is that people
with kernels having larger PAGE_CACHE_SIZE can make use of that if
they don't mind their cramfs being inaccessible to kernels with
smaller PAGE_CACHE_SIZE values.

Option 3 is easy to implement if we don't mind being CPU-inefficient:
e.g. get readpage to decompress to a buffer of size MAX_BLKSIZE (which
must be no larger than 32KB) and discard what it doesn't need.
Getting readpage to read into all the covered pages is harder.

The main advantage of option 3 over 1, 2, is better compression.  The
cost is greater complexity.  Probably not worth it, but I hope someone
will disagree.  (If it is implemented, then I'll re-use that code in
e2compr.)


Another cost of 2 and 3 over 1 is making mkcramfs use a different
block size, but that just means adding and parsing a -b option.


Inode Size
----------

Given that cramfs will probably be used for CDs etc. as well as just
silicon ROMs, it might make sense to expand the inode a little from
its current 12 bytes.  Inodes other than the root inode are followed
by filename, so the expansion doesn't even have to be a multiple of 4
bytes.