We re-use the RESVSP/UNRESVSP ioctls from xfs which allow the user to
allocate and deallocate regions to a file without zeroing data or changing
i_size.
Though renamed, the structure passed in from user is identical to struct
xfs_flock64. The three fields that are actually used right now are l_whence,
l_start and l_len.
This should get ocfs2 immediate compatibility with userspace software using
the pre-existing xfs ioctls.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Provide an internal interface for the removal of arbitrary file regions.
ocfs2_remove_inode_range() takes a byte range within a file and will remove
existing extents within that range. Partial clusters will be zeroed so that
any read from within the region will return zeros.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The partial cluster zeroing code used during truncate usually assumes that
the rightmost byte in the range to be zeroed lies on a cluster boundary.
This makes sense for truncate, but punching holes might require zeroing on
non-aligned rightmost boundaries.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Add code to the btree paths to support the removal of arbitrary regions
within an existing extent. With proper higher level support this can be used
to "punch holes" in a file. Truncate (a special case of hole punching) could
also be converted to use these methods.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This can now be trivially supported with re-use of our existing extend code.
ocfs2_allocate_unwritten_extents() takes a start offset and a byte length
and iterates over the inode, adding extents (marked as unwritten) until len
is reached. Existing extents are skipped over.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Update the write code to detect when the user is asking to write to an
unwritten extent. Like writing to a hole, we must zero the region between
the write and the cluster boundaries. Most of the existing cluster zeroing
logic can be re-used with some additional checks for the unwritten flag on
extent records.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Writes to a region marked as unwritten might result in a record split or
merge. We can support splits by making minor changes to the existing insert
code. Merges require left rotations which mostly re-use right rotation
support functions.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The top level calls and logic for growing a tree can easily be abstracted
out of ocfs2_insert_extent() into a seperate function - ocfs2_grow_tree().
This allows future code to easily grow btrees when needed.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Now that we have a method to deallocate blocks from them, each node should
allocate extent blocks from their local suballocator file.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Deallocation of suballocator blocks, most notably extent blocks, might
involve multiple suballocator inodes.
The locking for this can get extremely complicated, especially when the
suballocator inodes to delete from aren't known until deep within an
unrelated codepath.
Implement a simple scheme for recording the blocks to be unlinked so that
the actual deallocation can be done in a context which won't deadlock.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
We don't want to submit buffer_new blocks for read i/o. This actually won't
happen right now because those requests during an allocating write are all nicely
aligned. It's probably a good idea to provide an explicit check though.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
ocfs2_mkwrite() will want this so that it can add some mmap specific checks
before asking for a write.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Use some ideas from the new-aops patch series and turn
ocfs2_buffered_write_cluster() into a 2 stage operation with the caller
copying data in between. The code now understands multiple cluster writes as
a result of having to deal with a full page write for greater than 4k pages.
This sets us up to easily call into the write path during ->page_mkwrite().
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Use of the alloc sem during truncate was too narrow - we want to protect
the i_size change and page truncation against mmap now.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
ocfs2 will attempt to assign the node the slot# provided in the mount
option. Failure to assign the preferred slot is not an error. This small
feature can be useful for automated testing.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in
fs/ocfs2/dlm/dlmrecovery.c
Signed-off-by: Shani Moideen <shani.moideen@wipro.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Tell o2cb_region_dev_write() to wake up if rmdir(2) happens on the
heartbeat region while it is starting up. Then o2hb_region_dev_write()
can check to see if it is alive and act accordingly. This prevents a hang
(not being woken) and a crash (if it's woken by a signal).
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Removing the local node configuration out from underneath a running
heartbeat is "bad". Provide an API in the ocfs2 nodemanager to request
a configfs dependancy on the local node, then use it in heartbeat.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
ocfs2 mounts require a heartbeat region. Use the new configfs_depend_item()
facility to actually depend on them so they can't go away from under us.
First, teach cluster/nodemanager.c to depend an item on the o2cb subsystem.
Then teach o2hb_register_callbacks to take a UUID and depend on the
appropriate region. Finally, teach all users of o2hb to pass a UUID or
NULL if they don't require a pin.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Sometimes other drivers depend on particular configfs items. For
example, ocfs2 mounts depend on a heartbeat region item. If that
region item is removed with rmdir(2), the ocfs2 mount must BUG or go
readonly. Not happy.
This provides two additional API calls: configfs_depend_item() and
configfs_undepend_item(). A client driver can call
configfs_depend_item() on an existing item to tell configfs that it is
depended on. configfs will then return -EBUSY from rmdir(2) for that
item. When the item is no longer depended on, the client driver calls
configfs_undepend_item() on it.
These API cannot be called underneath any configfs callbacks, as
they will conflict. They can block and allocate. A client driver
probably shouldn't calling them of its own gumption. Rather it should
be providing an API that external subsystems call.
How does this work? Imagine the ocfs2 mount process. When it mounts,
it asks for a heart region item. This is done via a call into the
heartbeat code. Inside the heartbeat code, the region item is looked
up. Here, the heartbeat code calls configfs_depend_item(). If it
succeeds, then heartbeat knows the region is safe to give to ocfs2.
If it fails, it was being torn down anyway, and heartbeat can gracefully
pass up an error.
[ Fixed some bad whitespace in configfs.txt. --Mark ]
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Add a notification callback, ops->disconnect_notify(). It has the same
prototype as ->drop_item(), but it will be called just before the item
linkage is broken. This way, configfs users who want to do work while
the object is still in the heirarchy have a chance.
Client drivers will still need to config_item_put() in their
->drop_item(), if they implement it. They need do nothing in
->disconnect_notify(). They don't have to provide it if they don't
care. But someone who wants to be notified before ci_parent is set to
NULL can now be notified.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Seems copied from sysfs, but I don't see a reason here nor there to use
a semaphore instead of a mutex. Convert.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Convert the su_sem member of struct configfs_subsystem to a struct
mutex, as that's what it is. Also convert all the users and update
Documentation/configfs.txt and Documentation/configfs_example.c
accordingly.
[ Conflict in fs/dlm/config.c with commit
3168b0780d manually resolved. --Mark ]
Inspired-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Configfs being based upon sysfs code, config_group_find_obj() is probably
so named because of the similar kset_find_obj() in sysfs. However,
"kobject"s in sysfs become "config_item"s in configfs, so let's call it
config_group_find_item() instead, for sake of uniformity, and make
corresponding change in the users of this function.
BTW a crucial difference between kset_find_obj and config_group_find_item
is in locking expectations. kset_find_obj does its locking by itself, but
config_group_find_item expects the *caller* to do the locking. The reason
for this: kset's have their own locks, config_group's don't but instead
rely on the subsystem mutex. And, subsystem needn't necessarily be around
when config_group_find_item() is called.
So let's state these locking semantics explicitly, and rectify the comment,
otherwise bugs could continue to occur in future, as they did in the past
(refer commit d82b8191e238 in gfs2-2.6-fixes.git).
[ I also took the opportunity to fix some bad whitespace and
double-empty lines. --Joel ]
[ Conflict in fs/dlm/config.c with commit
3168b0780d manually resolved. --Mark ]
Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Cc: David Teigland <teigland@redhat.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
fs/dlm/config.c contains a useful generic macro called __CONFIGFS_ATTR
that is similar to sysfs' __ATTR macro that makes defining attributes
easy for any user of configfs. Separate it out into configfs.h so that
other users (forthcoming in dynamic netconsole patchset) can use it too.
Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Cc: David Teigland <teigland@redhat.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
1. item.c:config_item_cleanup() is a private function (only called by
config_item_release() in same file). However, it is spuriously
exported in include/linux/configfs.h, so remove that export and make
it static in item.c. Also, it is no longer exported / interface
function, so no need to give comment for this function (the comment
was stating obvious thing, anyway).
2. Kernel-doc comment format does not allow empty line between end of
comment and start of function (declaration line). There were several
such spurious empty lines in item.c, so fix them.
fs/configfs/item.c | 15 +++------------
include/linux/configfs.h | 1 -
2 files changed, 3 insertions(+), 13 deletions(-)
Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The attribute store/show code currently limits attributes at PAGE_SIZE.
This code comes from sysfs, where it still works that way.
However, PAGE_SIZE is not constant. A 16k attribute string works on
ia64 but not on x86. Really a subsystem shouldn't allow different
attribute sizes based on platform.
As such, limit all simple attributes to 4k. This works on all
platforms, and is consistent with all current code.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] vmlogrdr function annotation.
[S390] s390: rename CPU_IDLE to S390_CPU_IDLE
[S390] cio: Remove prototype for non-existing function cmf_reset().
[S390] zcrypt: fix request timeout handling
[S390] system call optimization.
[S390] dasd: Avoid compile warnings on !CONFIG_DASD_PROFILE
[S390] Remove volatile from atomic_t
[S390] Program check in diag 210 under 31 bit
[S390] Bogomips calculation for 64 bit.
[S390] smp: Merge smp_count_cpus() and smp_get_save_areas().
[S390] zcore: Fix __user annotation.
[S390] fixed cdl-format detection.
[S390] sclp: Test facility list before executing a service call.
[S390] sclp: introduce some new interfaces.
[S390] Fixed comment typo.
[S390] vmcp cleanup
* 'splice-2.6.23' of git://git.kernel.dk/data/git/linux-2.6-block:
pipe: add documentation and comments
pipe: change the ->pin() operation to ->confirm()
Remove remnants of sendfile()
xip sendfile removal
splice: completely document external interface with kerneldoc
sendfile: remove bad_sendfile() from bad_file_ops
shmem: convert to using splice instead of sendfile()
relay: use splice_to_pipe() instead of open-coding the pipe loop
pipe: allow passing around of ops private pointer
splice: divorce the splice structure/function definitions from the pipe header
splice: relay support
sendfile: convert nfsd to splice_direct_to_actor()
sendfile: convert nfs to using splice_read()
loop: convert to using splice_direct_to_actor() instead of sendfile()
splice: add void cookie to the actor data
sendfile: kill generic_file_sendfile()
sendfile: remove .sendfile from filesystems that use generic_file_sendfile()
sys_sendfile: switch to using ->splice_read, if available
vmsplice: add vmsplice-to-user support
splice: abstract out actor data
On Tue, 2007-07-10 at 10:06 +0100, Christoph Hellwig wrote:
> > -#define GFS2_LARGE_FH_SIZE 10
> > -
> > -struct gfs2_fh_obj {
> > - struct gfs2_inum_host this;
> > - u32 imode;
> > -};
> > +#define GFS2_LARGE_FH_SIZE 8
>
> Because gfs2_decode_fh only accepts file handles with GFS2_LARGE_FH_SIZE
> or GFS2_LARGE_FH_SIZE you don't accept filehandles sent out by and older
> gfs version anymore. Stale filehandles because of a new kernel version
> are a big no-no, so please add back code to handle the old filehandles
> on the decode side.
>
This should fix that problem I think since its only relating to end of
the fh we can just ignore that field in order to accept the older
format.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Wendy Cheng <wcheng@redhat.com>
CDL formated DASDs are now detected correctly even if no VOL1 label is
on the disk. This prevents possible loss of data.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
As per Andrew Mortons request, here's a set of documentation for
the generic pipe_buf_operations hooks, the pipe, and pipe_buffer
structures.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The name 'pin' was badly chosen, it doesn't pin a pipe buffer
in the most commonly used sense in the kernel. So change the
name to 'confirm', after debating this issue with Hugh
Dickins a bit.
A good return from ->confirm() means that the buffer is really
there, and that the contents are good.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
There are now zero users of .sendfile() in the kernel, so kill
it from the file_operations structure and in do_sendfile().
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch removes xip_file_sendfile, the sendfile implementation for
xip without replacement. Those customers that use xip on s390 are not
using sendfile() as far as we know, and so far s390 is the only platform
this could potentially be used on so far.
Having sendfile is not a popular feature for execute in place file
systems, however we have a working implementation of splice_read() based
on fs/splice.c if anyone asks for it.
At this point in time, it does not seem preferable to merge
splice_read() for xip because it causes extra maintenence effort due to
code duplication and it requires struct page behind the xip memory
segment. We'd like to get rid of that in favor of supporting flash based
embedded platforms (Monta Vista work) soon.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Also add fs/splice.c as a kerneldoc target with a smaller blurb that
should be expanded to better explain the overview of splice.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
relay needs this for proper consumption handling, and the network
receive support needs it as well to lookup the sk_buff on pipe
release.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
We need to move even more stuff into the header so that folks can use
the splice_to_pipe() implementation instead of open-coding a lot of
pipe knowledge (see relay implementation), so move to our own header
file finally.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
They can use generic_file_splice_read() instead. Since sys_sendfile() now
prefers that, there should be no change in behaviour.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch makes sendfile prefer to use ->splice_read(), if it's
available in the file_operations structure.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
A bit of a cheat, it actually just copies the data to userspace. But
this makes the interface nice and symmetric and enables people to build
on splice, with room for future improvement in performance.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>