Commit Graph

34219 Commits

Author SHA1 Message Date
Al Viro
9fd4d05949 [readdir] convert omfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:37 +04:00
Al Viro
1616abe841 [readdir] convert nilfs2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:36 +04:00
Al Viro
d55fea8ddb [readdir] convert sysfs
get rid of the kludges in sysfs_readdir()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:36 +04:00
Al Viro
d81a8ef598 [readdir] convert gfs2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:35 +04:00
Al Viro
75811d4fda [readdir] convert exofs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:34 +04:00
Al Viro
81b9f66e6b [readdir] convert bfs
... and get rid of that ridiculous mutex in bfs_readdir()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:33 +04:00
Al Viro
f0c3b5093a [readdir] convert procfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:32 +04:00
Al Viro
68c6147113 [readdir] convert openpromfs
what the hell is op_mutex for, BTW?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:32 +04:00
Al Viro
7aa123a0dc [readdir] convert efs
* sanity checks belong before risky operation, not after it
* don't quit as soon as we'd found an entry

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:31 +04:00
Al Viro
52018855e6 [readdir] convert configfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:30 +04:00
Al Viro
3903b38ce7 [readdir] convert romfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:29 +04:00
Al Viro
5f6039ce69 [readdir] convert squashfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:28 +04:00
Al Viro
01122e0688 [readdir] convert ubifs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:25 +04:00
Al Viro
5add2ee198 [readdir] convert udf
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:50 +04:00
Al Viro
5ded75ec4c [readdir] convert ext3
new helper: dir_relax(inode).  Call when you are in location that will
_not_ be invalidated by directory modifications (block boundary, in case
of ext*).  Returns whether the directory has survived (dropping i_mutex
allows rmdir to kill the sucker; if it returns false to us, ->iterate()
is obviously done)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:49 +04:00
Al Viro
5f99f4e79a [readdir] switch dcache_readdir() users to ->iterate()
new helpers - dir_emit_dot(file, ctx, dentry), dir_emit_dotdot(file, ctx),
dir_emit_dots(file, ctx).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:48 +04:00
Al Viro
80886298c0 [readdir] simple local unixlike: switch to ->iterate()
ext2, ufs, minix, sysv

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:47 +04:00
Al Viro
bb6f619b3a [readdir] introduce ->iterate(), ctx->pos, dir_emit()
New method - ->iterate(file, ctx).  That's the replacement for ->readdir();
it takes callback from ctx->actor, uses ctx->pos instead of file->f_pos and
calls dir_emit(ctx, ...) instead of filldir(data, ...).  It does *not*
update file->f_pos (or look at it, for that matter); iterate_dir() does the
update.

Note that dir_emit() takes the offset from ctx->pos (and eventually
filldir_t will lose that argument).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:47 +04:00
Al Viro
5c0ba4e076 [readdir] introduce iterate_dir() and dir_context
iterate_dir(): new helper, replacing vfs_readdir().

struct dir_context: contains the readdir callback (and will get more stuff
in it), embedded into whatever data that callback wants to deal with;
eventually, we'll be passing it to ->readdir() replacement instead of
(data,filldir) pair.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:46 +04:00
Al Viro
e06aeb5716 compat.c: LOOP_CLR_FD is taken care of in loop.c itself...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:44 +04:00
Artem Bityutskiy
605c912bb8 UBIFS: fix a horrid bug
Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no
mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are
in the middle of 'ubifs_readdir()'.

This means that 'file->private_data' can be freed while 'ubifs_readdir()' uses
it, and this is a very bad bug: not only 'ubifs_readdir()' can return garbage,
but this may corrupt memory and lead to all kinds of problems like crashes an
security holes.

This patch fixes the problem by using the 'file->f_version' field, which
'->llseek()' always unconditionally sets to zero. We set it to 1 in
'ubifs_readdir()' and whenever we detect that it became 0, we know there was a
seek and it is time to clear the state saved in 'file->private_data'.

I tested this patch by writing a user-space program which runds readdir and
seek in parallell. I could easily crash the kernel without these patches, but
could not crash it with these patches.

Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:45:37 +04:00
Artem Bityutskiy
33f1a63ae8 UBIFS: prepare to fix a horrid bug
Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no
mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are
in the middle of 'ubifs_readdir()'.

First of all, this means that 'file->private_data' can be freed while
'ubifs_readdir()' uses it.  But this particular patch does not fix the problem.
This patch is only a preparation, and the fix will follow next.

In this patch we make 'ubifs_readdir()' stop using 'file->f_pos' directly,
because 'file->f_pos' can be changed by '->llseek()' at any point. This may
lead 'ubifs_readdir()' to returning inconsistent data: directory entry names
may correspond to incorrect file positions.

So here we introduce a local variable 'pos', read 'file->f_pose' once at very
the beginning, and then stick to 'pos'. The result of this is that when
'ubifs_dir_llseek()' changes 'file->f_pos' while we are in the middle of
'ubifs_readdir()', the latter "wins".

Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:45:37 +04:00
David Disseldorp
7ac0febb81 cifs: fill TRANS2_QUERY_FILE_INFO ByteCount fields
Currently the trans2 ByteCount field is incorrectly left zero in
TRANS2_QUERY_FILE_INFO info_level=SMB_QUERY_FILE_ALL_INFO and
info_level=SMB_QUERY_FILE_UNIX_BASIC requests. The field should properly
reflect the FID, information_level and padding bytes carried in these
requests.

Leaving this field zero causes such requests to fail against Novell CIFS
servers. Other SMB servers (e.g. Samba) use the parameter count fields
for data length calculations instead, so do not suffer the same fate.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-06-29 00:09:44 -05:00
Chandra Seetharaman
83e782e1a1 xfs: Remove incore use of XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD
Remove all incore use of XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD. Instead,
start using XFS_GQUOTA_.* XFS_PQUOTA_.* counterparts for GQUOTA and
PQUOTA respectively.

On-disk copy still uses XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD.

Read and write of the superblock does the conversion from *OQUOTA*
to *[PG]QUOTA*.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-28 17:39:22 -05:00
Lenny Szubowicz
8e48b1a8ed pstore: Return unique error if backend registration excluded by kernel param
This is patch 1/3 of a patch set that avoids what misleadingly appears
to be a error during boot:

ERST: Could not register with persistent store

This message is displayed if the system has a valid ACPI ERST table and the
pstore.backend kernel parameter has been used to disable use of ERST by
pstore. But this same message is used for errors that preclude registration.

As part of fixing this, return a unique error status from pstore_register
if the pstore.backend kernel parameter selects a specific facility other
than the requesting facility and check for this condition before any others.
This allows the caller to distinquish this benign case from the other failure
cases.

Also, print an informational console message about which facility
successfully registered as the pstore backend. Since there are various
kernel parameters, config build options, and boot-time errors that can
influence which facility registers with pstore, it's useful to have a
positive indication.

Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com>
Reported-by: Naotaka Hamaguchi <n.hamaguchi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-28 15:21:52 -07:00
Trond Myklebust
959d921f5e Merge branch 'labeled-nfs' into linux-next
* labeled-nfs:
  NFS: Apply v4.1 capabilities to v4.2
  NFS: Add in v4.2 callback operation
  NFS: Make callbacks minor version generic
  Kconfig: Add Kconfig entry for Labeled NFS V4 client
  NFS: Extend NFS xattr handlers to accept the security namespace
  NFS: Client implementation of Labeled-NFS
  NFS: Add label lifecycle management
  NFS:Add labels to client function prototypes
  NFSv4: Extend fattr bitmaps to support all 3 words
  NFSv4: Introduce new label structure
  NFSv4: Add label recommended attribute and NFSv4 flags
  NFSv4.2: Added NFS v4.2 support to the NFS client
  SELinux: Add new labeling type native labels
  LSM: Add flags field to security_sb_set_mnt_opts for in kernel mount data.
  Security: Add Hook to test if the particular xattr is part of a MAC model.
  Security: Add hook to calculate context based on a negative dentry.
  NFS: Add NFSv4.2 protocol constants

Conflicts:
	fs/nfs/nfs4proc.c
2013-06-28 16:29:51 -04:00
Chuck Lever
f112bb4899 NFS: Set NFS_CS_MIGRATION for NFSv4 mounts
NFS_CS_MIGRATION makes sense only for NFSv4 mounts.  Introduced by
commit 89652617 (NFS: Introduce "migration" mount option) Fri Sep 14
17:24:11 2012.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 16:04:10 -04:00
Andy Adamson
18aad3d552 NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs
nfs4_init_session was originally written to be called prior to
nfs4_init_channel_attrs, setting the session target_max response and request
sizes that nfs4_init_channel_attrs would pay attention to.

In the current code flow, nfs4_init_session, just like nfs4_init_ds_session
for the data server case, is called after the session is all negotiated, and
is actually used in a RECLAIM COMPLETE call to the server.

Remove the un-needed fc_target_max response and request fields from
nfs4_session and just set the max_resp_sz and max_rqst_sz in
nfs4_init_channel_attrs.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:55:19 -04:00
Jeff Layton
9111c95b07 nfs: have NFSv3 try server-specified auth flavors in turn
The current scheme is to try and pick the auth flavor that the server
prefers. In some cases though, we may find that we're not actually
able to use that auth flavor later. For instance, the server may
prefer an AUTH_GSS flavor, but we may not be able to get GSSAPI creds.

The current code just gives up at that point. Change it instead to
try the ->create_server call using each of the different authflavors
in the server's list if one was not specified at mount time. Once
we have a successful ->create_server call, return the result. Only
give up and return error if all attempts fail.

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:52:37 -04:00
Jeff Layton
fb9b02fda0 nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it
Instead of handling this as a special case in the auth-selection code,
we can simply fake up an auth_flavs list when the server doesn't
provide it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:51:51 -04:00
Jeff Layton
294ae81d4f nfs: move server_authlist into nfs_try_mount_request
In a later patch we're going to want to cycle over this list and attempt
to call ->create_server for each different flavor until one succeeds.
Move the list allocation to the stack of nfs_try_mount_request() and
pass a pointer to it and its length to nfs_request_mount().

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:49:49 -04:00
Jeff Layton
d17540c61b nfs: refactor "need_mount" code out of nfs_try_mount
This looks like pointless refactoring for now, but we'll flesh out
the need_mount case a little more in a later patch.

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:49:27 -04:00
Andy Adamson
52fcac988a NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:34:45 -04:00
Andy Adamson
968fe25243 NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:34:44 -04:00
Andy Adamson
f1c097be2b NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize
The GETDEVICEINFO gdia_maxcount represents all of the data being returned
within the GETDEVICEINFO4resok structure and includes the XDR overhead.

The CREATE_SESSION ca_maxresponsesize is the maximum reply and includes the RPC
headers (including security flavor credentials and verifiers).

Split out the struct pnfs_device field maxcount which is the gdia_maxcount
from the pglen field which is the reply (the total) buffer length.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:34:43 -04:00
Bryan Schumaker
ffa57b9e53 NFS: Improve legacy idmapping fallback
Fallback should happen only when the request_key() call fails, because
this indicates that there was a problem running the nfsidmap program.
We shouldn't call the legacy code if the error was elsewhere.

Signed-off-by: Bryan Schumaker <bjschuma@netappp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:22:07 -04:00
Chandra Seetharaman
0e6436d99e xfs: Change xfs_dquot_acct to be a 2-dimensional array
In preparation for combined pquota/gquota support, for the sake
of readability, change xfs_dquot_acct to be a 2-dimensional array.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-28 14:12:22 -05:00
Chandra Seetharaman
113a56835d xfs: Code cleanup and removal of some typedef usage
In preparation for combined pquota/gquota support, for the sake
of readability, do some code cleanup surrounding the affected
code.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-28 13:13:59 -05:00
Chandra Seetharaman
995961c451 xfs: Replace macro XFS_DQ_TO_QIP with a function
In preparation for combined pquota/gquota support, for the sake
of readability, change the macro to an inline function.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-28 13:12:42 -05:00
Chandra Seetharaman
329e087528 xfs: Replace macro XFS_DQUOT_TREE with a function
In preparation for combined pquota/gquota support, for the sake
of readability, change the macro to an inline function.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-28 13:05:07 -05:00
Chandra Seetharaman
9cad19d2cb xfs: Define a new function xfs_is_quota_inode()
In preparation for combined pquota/gquota support, define
a new function to check if the given inode is a quota inode.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-28 13:03:49 -05:00
Dave Chinner
dc037ad7d2 xfs: implement inode change count
For CRC enabled filesystems, add support for the monotonic inode
version change counter that is needed by protocols like NFSv4 for
determining if the inode has changed in any way at all between two
unrelated operations on the inode.

This bumps the change count the first time an inode is dirtied in a
transaction. Since all modifications to the inode are logged, this
will catch all changes that are made to the inode, including
timestamp updates that occur during data writes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-28 13:00:05 -05:00
Jan Kara
a5faeaf910 writeback: Fix periodic writeback after fs mount
Code in blkdev.c moves a device inode to default_backing_dev_info when
the last reference to the device is put and moves the device inode back
to its bdi when the first reference is acquired. This includes moving to
wb.b_dirty list if the device inode is dirty. The code however doesn't
setup timer to wake corresponding flusher thread and while wb.b_dirty
list is non-empty __mark_inode_dirty() will not set it up either. Thus
periodic writeback is effectively disabled until a sync(2) call which can
lead to unexpected data loss in case of crash or power failure.

Fix the problem by setting up a timer for periodic writeback in case we
add the first dirty inode to wb.b_dirty list in bdev_inode_switch_bdi().

Reported-by: Bert De Jonghe <Bert.DeJonghe@amplidata.com>
CC: stable@vger.kernel.org # >= 3.0
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-06-28 16:04:02 +02:00
Rafael J. Wysocki
207bc1181b Merge branch 'freezer'
* freezer:
  af_unix: use freezable blocking calls in read
  sigtimedwait: use freezable blocking call
  nanosleep: use freezable blocking call
  futex: use freezable blocking call
  select: use freezable blocking call
  epoll: use freezable blocking call
  binder: use freezable blocking calls
  freezer: add new freezable helpers using freezer_do_not_count()
  freezer: convert freezable helpers to static inline where possible
  freezer: convert freezable helpers to freezer_do_not_count()
  freezer: skip waking up tasks with PF_FREEZER_SKIP set
  freezer: shorten freezer sleep time using exponential backoff
  lockdep: check that no locks held at freeze time
  lockdep: remove task argument from debug_check_no_locks_held
  freezer: add unsafe versions of freezable helpers for CIFS
  freezer: add unsafe versions of freezable helpers for NFS
2013-06-28 13:00:53 +02:00
Jeff Layton
50285882fd cifs: fix SMB2 signing enablement in cifs_enable_signing
Commit 9ddec56131 (cifs: move handling of signed connections into
separate function) broke signing on SMB2/3 connections. While the code
to enable signing on the connections was very similar between the two,
the bits that get set in the sec_mode are different.

Declare a couple of new smb_version_values fields and set them
appropriately for SMB1 and SMB2/3. Then change cifs_enable_signing to
use those instead.

Reported-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-06-27 23:42:18 -05:00
Dave Chinner
ddf6ad0143 xfs: Use inode create transaction
Replace the use of buffer based logging of inode initialisation,
uses the new logical form to describe the range to be initialised
in recovery. We continue to "log" the inode buffers to push them
into the AIL and ensure that the inode create transaction is not
removed from the log before the inode buffers are written to disk.

Update the transaction identifier and reservations to match the
changed implementation.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-27 14:27:18 -05:00
Dave Chinner
28c8e41af6 xfs: Inode create item recovery
When we find a icreate transaction, we need to get and initialise
the buffers in the range that has been passed. Extract and verify
the information in the item record, then loop over the range
initialising and issuing the buffer writes delayed.

Support an arbitrary size range to initialise so that in
future when we allocate inodes in much larger chunks all kernels
that understand this transaction can still recover them.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-27 14:26:21 -05:00
Dave Chinner
b8402b4729 xfs: Inode create transaction reservations
Define the log and space transaction sizes. Factor the current
create log reservation macro into the two logical halves and reuse
one half for the new icreate transactions. The icreate transaction
is transparent to all the high level create code - the
pre-calculated reservations will correctly set the reservations
dependent on whether the filesystem supports the icreate
transaction.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-27 13:36:37 -05:00
Dave Chinner
3ebe7d2d73 xfs: Inode create log items
Introduce the inode create log item type for logical inode create logging.
Instead of logging the changes in buffers, pass the range to be
initialised through the log by a new transaction type.  This reduces
the amount of log space required to record initialisation during
allocation from about 128 bytes per inode to a small fixed amount
per inode extent to be initialised.

This requires a new log item type to track it through the log
and the AIL. This is a relatively simple item - most callbacks are
noops as this item has the same life cycle as the transaction.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-27 13:34:12 -05:00
Dave Chinner
5f6bed76c0 xfs: Introduce an ordered buffer item
If we have a buffer that we have modified but we do not wish to
physically log in a transaction (e.g. we've logged a logical
change), we still need to ensure that transactional integrity is
maintained. Hence we must not move the tail of the log past the
transaction that the buffer is associated with before the buffer is
written to disk.

This means these special buffers still need to be included in the
transaction and added to the AIL just like a normal buffer, but we
do not want the modifications to the buffer written into the
transaction. IOWs, what we want is an "ordered buffer" that
maintains the same transactional life cycle as a physically logged
buffer, just without the transcribing of the modifications to the
log.

Hence we need to flag the buffer as an "ordered buffer" to avoid
including it in vector size calculations or formatting during the
transaction. Once the transaction is committed, the buffer appears
for all intents to be the same as a physically logged buffer as it
transitions through the log and AIL.

Relogging will also work just fine for such an ordered buffer - the
logical transaction will be replayed before the subsequent
modifications that relog the buffer, so everything will be
reconstructed correctly by recovery.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-27 13:33:11 -05:00