AIO/DIO prefix is wrong because it account unwritten extents which
also may be scheduled from buffered write endio
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Generic inode has unused i_private pointer which may be used as cur_aio_dio
storage.
TODO: If cur_aio_dio will be passed as an argument to get_block_t this allow
to have concurent AIO_DIO requests.
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Rebased and resending the patch.
Path based queries can fail for lack of access, especially during lookup
during open.
open itself would actually succeed becasue of back up intent bit
but queries (either path or file handle based) do not have a means to
specifiy backup intent bit.
So query the file info during lookup using
trans2 / findfirst / file_id_full_dir_info
to obtain file info as well as file_id/inode value.
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Acked-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Conflicts:
drivers/net/team/team.c
drivers/net/usb/qmi_wwan.c
net/batman-adv/bat_iv_ogm.c
net/ipv4/fib_frontend.c
net/ipv4/route.c
net/l2tp/l2tp_netlink.c
The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.
qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.
With help from Antonio Quartulli.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull vfs fixes from Al Viro:
"A couple of fixes; one for automount/lazy umount race, another a
classic "we don't protect the refcount transition to zero with the
lock that protects looking for object in hash" kind of crap in lockd."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
close the race in nlmsvc_free_block()
do_add_mount()/umount -l races
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQEcBAABAgAGBQJQX7MuAAoJEHm+PkMAQRiG0h0IAJURkrMCAQUxA+Ik66ReH89s
LQcVd0U9uL4UUOi7f5WR64Vf9Cfu6VVGX9ZKSvjpNskvlQaUQPMIt4pMe6g4X4dI
u0bApEy4XZz3nGabUAghIU8jJ8cDmhCG6kPpSiS7pi7KHc0yIa4WFtJRrIpGaIWT
xuK38YOiOHcSDRlLyWZzainMncQp/ixJdxnqVMTonkVLk0q0b84XzOr4/qlLE5lU
i+TsK3PRKdQXgvZ4CebL+srPBwWX1dmgP3VkeBloQbSSenSeELICbFWavn2ml+sF
GXi4dO93oNquL/Oy5SwI666T4uNcrRPaS+5X+xSZgBW/y2aQVJVJuNZg6ZP/uWk=
=0v2l
-----END PGP SIGNATURE-----
Merge tag 'v3.6-rc7' into next
Linux 3.6-rc7
Requested by David Howells so he can merge his key susbsystem work into
my tree with requisite -linus changesets.
"Search list for X" sounds like you're trying to find X on a list.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert cpu_to_leXX(leXX_to_cpu(E1) + E2) to use leXX_add_cpu().
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When ext4_bread() returns NULL and err is set to zero, this means
there is no phyical block mapped to the specified logical block
number. (Previous to commit 90b0a97323, err was uninitialized in this
case, which caused other problems.)
The directory handling routines use ext4_bread() in many places, the
fact that ext4_bread() now returns NULL with err set to zero could
cause problems since a number of these functions will simply return
the value of err if the result of ext4_bread() was the NULL pointer,
causing the caller of the function to think that the function was
successful.
Since directories should never contain holes, this case can only
happen if the file system is corrupted. This commit audits all of the
callers of ext4_bread(), and makes sure they do the right thing if a
hole in a directory is found by ext4_bread().
Some ext4_bread() callers did not need any changes either because they
already had its own hole detector paths.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In the check code above, if orig_start != donor_start, we would
return -EINVAL. So here, orig_start should be equal with donor_start.
Remove the redundant check here.
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If the file system contains errors and it is being mounted read-only,
don't clear the orphan list. We should minimize changes to the file
system if it is mounted read-only.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
and remove redundant (rsp == NULL) checks after SendReceive2.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
ext4 users of data=journal mode with blocksize < pagesize were
occasionally hitting assertion failure in
jbd2_journal_commit_transaction() checking whether the transaction has
at least as many credits reserved as buffers attached. The core of the
problem is that when a file gets truncated, buffers that still need
checkpointing or that are attached to the committing transaction are
left with buffer_mapped set. When this happens to buffers beyond i_size
attached to a page stradding i_size, subsequent write extending the file
will see these buffers and as they are mapped (but underlying blocks
were freed) things go awry from here.
The assertion failure just coincidentally (and in this case luckily as
we would start corrupting filesystem) triggers due to journal_head not
being properly cleaned up as well.
We fix the problem by unmapping buffers if possible (in lots of cases we
just need a buffer attached to a transaction as a place holder but it
must not be written out anyway). And in one case, we just have to bite
the bullet and wait for transaction commit to finish.
CC: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Jan Kara <jack@suse.cz>
When the EXT4_IOC_MOVE_EXT ioctl() fails on bigalloc file systems, we
should jump to the 'mext_out' label to release the donor file reference.
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
With a minor tweaks regarding minimum extent size to discard and
discarded bytes reporting the FITRIM can be enabled on bigalloc file
system and it works without any problem.
This patch fixes minlen handling and discarded bytes reporting to
take into consideration bigalloc enabled file systems and finally
removes the restriction and allow FITRIM to be used on file system with
bigalloc feature enabled.
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In !CORE_DUMP_USE_REGSET case, if elf_note_info_init fails to allocate
memory for info->fields, it frees already allocated stuff and returns
error to its caller, fill_note_info. Which in turn returns error to its
caller, elf_core_dump. Which jumps to cleanup label and calls
free_note_info, which will happily try to free all info->fields again.
BOOM.
This is the fix.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Venu Byravarasu <vbyravarasu@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Each iteration of d_delete we reload inode from dentry->d_inode and
then call S_ISDIR(inode-i_mode), so inode cannot possibly be NULL
shortly afterwards unless something went horribly wrong.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Code tracking when transaction needs to be committed on fdatasync(2) forgets
to handle a situation when only inode's i_size is changed. Thus in such
situations fdatasync(2) doesn't force transaction with new i_size to disk
and that can result in wrong i_size after a crash.
Fix the issue by updating inode's i_datasync_tid whenever its size is
updated.
CC: <stable@vger.kernel.org> # >= 2.6.32
Reported-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Signed-off-by: Jan Kara <jack@suse.cz>
ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR
to mask those methods. And ext4_iget also always sets it, so there is
an inconsistency.
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
get_write_access() is needed for nfsd, not binfmt_aout (the latter
has no business doing anything of that kind, of course)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch converts /proc/pid/fdinfo/ handling routines to seq-file which
is needed to extend seq operations and plug in auxiliary fdinfo provides
from subsystems like eventfd/eventpoll/fsnotify.
Note the proc_fd_link no longer call for proc_fd_info, simply because
the guts of proc_fd_info() got merged into ->show() of that seq_file
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch prepares the ground for further extension of
/proc/pid/fd[info] handling code by moving fdinfo handling
code into fs/proc/fd.c.
I think such move makes both fs/proc/base.c and fs/proc/fd.c
easier to read.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
CC: Al Viro <viro@ZenIV.linux.org.uk>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: James Bottomley <jbottomley@parallels.com>
CC: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Matthew Helsley <matt.helsley@gmail.com>
CC: "J. Bruce Fields" <bfields@fieldses.org>
CC: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
descriptor-related parts of daemonize, done right. As the
result we simplify the locking rules for ->files - we
hold task_lock in *all* cases when we modify ->files.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
iterates through the opened files in given descriptor table,
calling a supplied function; we stop once non-zero is returned.
Callback gets struct file *, descriptor number and const void *
argument passed to iterator. It is called with files->file_lock
held, so it is not allowed to block.
tty_io, netprio_cgroup and selinux flush_unauthorized_files()
converted to its use.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Remove unused function ext4_ext_check_cache() and merge the code back to
the ext4_ext_in_cache().
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Similar situation to that of __alloc_fd(); do not use unless you
really have to. You should not touch any descriptor table other
than your own; it's a sure sign of a really bad API design.
As with __alloc_fd(), you *must* use a first-class reference to
struct files_struct; something obtained by get_files_struct(some task)
(let alone direct task->files) will not do. It must be either
current->files, or obtained by get_files_struct(current) by the
owner of that sucker and given to you.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
At that point nobody can see us anyway; everything that
looks at files_fdtable(files) is separated from the
guts of put_files_struct(files) - either since files is
current->files or because we fetched it under task_lock()
and hadn't dropped that yet, or because we'd bumped
files->count while holding task_lock()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Essentially, alloc_fd() in a files_struct we own a reference to.
Most of the time wanting to use it is a sign of lousy API
design (such as android/binder). It's *not* a general-purpose
interface; better that than open-coding its guts, but again,
playing with other process' descriptor table is a sign of bad
design.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... except for one in android, where the check is different
and already done in caller. No need to recalculate rlimit
many times in alloc_fd() either.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* do copy_to_user() before prepare_for_access_response(); that kills
the need in remove_access_response().
* don't do fd_install() until we are past the last possible failure
exit. Don't use sys_close() on cleanup side - just put_unused_fd()
and fput(). Less racy that way...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
don't mess with sys_close() if copy_to_user() fails; just postpone
fd_install() until we know it hasn't.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The only difference between autofs_dev_ioctl_fd_install() and
fd_install() is __set_close_on_exec() done by the latter. Just
use get_unused_fd_flags(O_CLOEXEC) to allocate the descriptor
and be done with that...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Using kmem_cache_zalloc() instead of kmem_cache_alloc() and memset().
spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
As inode64 is the default option now, and was also made remountable
previously, inode32 can also be remounted on-the-fly when it is needed.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
To make inode32 a remountable option, xfs_set_inode32() should be able
to make a transition from inode64 option, disabling inode allocation on
higher AGs.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
With the changes made on xfs_set_inode64(), to make it behave as
xfs_set_inode32() (now leaving to the caller the responsibility to update
mp->m_maxagi), we use the return value of xfs_set_inode64() to update
mp->m_maxagi during remount.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Add xfs_set_inode32() to be used to enable inode32 allocation mode. this
will reduce the amount of duplicated code needed to mount/remount a
filesystem with inode32 option. This patch also changes
xfs_set_inode64() to return the maximum AG number that inodes can be
allocated instead of set mp->m_maxagi by itself, so that the behaviour
is the same as xfs_set_inode32(). This simplifies code that calls these
functions and needs to know the maximum AG that inodes can be allocated
in.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
since 64-bit inodes can be accessed while using inode32, and these can
also be used on 32-bit kernels, there is no reason to still keep inode32
as the default mount option. If the filesystem cannot handle 64bit
inode numbers (i.e CONFIG_LBDAF is not enabled and BITS_PER_LONG == 32),
XFS_MOUNT_SMALL_INUMS will still be set by default, so inode64 is not an
unconditional default value.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
xfs_ialloc_next_ag() currently resets m_agirotor when it is equal to
m_maxagi:
if (++mp->m_agirotor == mp->m_maxagi)
mp->m_agirotor = 0;
But, if for some reason mp->m_maxagi changes to a lower value than
current m_agirotor, this condition will never be true, causing
m_agirotor to exceed the maximum allowed value (m_maxagi).
This implies mainly during lookups for xfs_perag structs in its radix
tree, since the agno value used for the lookup is based on m_agirotor.
An out-of-range m_agirotor may cause a lookup failure which in case will
return NULL.
As an example, the value of m_maxagi is decreased during
inode64->inode32 remount process, case where I've found this problem.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Actually, there is no reason about why a user must umount and mount a
XFS filesystem to enable 'inode64' option. So, this patch makes this a
remountable option.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
ERRnoresource is an ERRSRV level (aka server-side) error and means "No
resources currently available for request". Currently that maps to POSIX
-ENOBUFS. No NT errors map to it currently.
NT_STATUS_INSUFFICIENT_RESOURCES and NT_STATUS_INSUFF_SERVER_RESOURCES
are also similar in meaning. Currently the client maps those to
ERRnomem, which maps to -ENOMEM in POSIX.
All of these mappings seem to be quite wrong to me and are confusing for
users. All of the above errors indicate problems on the server, not the
client. Reporting -ENOMEM or -ENOBUFS implies that the client is running
out of resources.
This patch changes those mappings. The NT_* errors are changed to map to
the SRV level ERRnoresource. That error is in turn changed to return
-EREMOTEIO which is the only POSIX error I could find that conveys that
something went wrong on the server. While we're at it, change the SMB2
equivalent error to return the same.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Suresh Jayaraman <sjayaraman@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Uninitialized extent may became initialized(parallel writeback task)
at any moment after we drop i_data_sem, so we have to recheck extent's
state after we hold page's lock and i_data_sem.
If we about to change page's mapping we must hold page's lock in order to
serialize other users.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Non-full list of bugs:
1) uninitialized extent optimization does not hold page's lock,
and simply replace brunches after that writeback code goes
crazy because block mapping changed under it's feets
kernel BUG at fs/ext4/inode.c:1434! ( 288'th xfstress)
2) uninitialized extent may became initialized right after we
drop i_data_sem, so extent state must be rechecked
3) Locked pages goes uptodate via following sequence:
->readpage(page); lock_page(page); use_that_page(page)
But after readpage() one may invalidate it because it is
uptodate and unlocked (reclaimer does that)
As result kernel bug at include/linux/buffer_head.c:133!
4) We call write_begin() with already opened stansaction which
result in following deadlock:
->move_extent_per_page()
->ext4_journal_start()-> hold journal transaction
->write_begin()
->ext4_da_write_begin()
->ext4_nonda_switch()
->writeback_inodes_sb_if_idle() --> will wait for journal_stop()
5) try_to_release_page() may fail and it does fail if one of page's bh was
pinned by journal
6) If we about to change page's mapping we MUST hold it's lock during entire
remapping procedure, this is true for both pages(original and donor one)
Fixes:
- Avoid (1) and (2) simply by temproraly drop uninitialized extent handling
optimization, this will be reimplemented later.
- Fix (3) by manually forcing page to uptodate state w/o dropping it's lock
- Fix (4) by rearranging existing locking:
from: journal_start(); ->write_begin
to: write_begin(); journal_extend()
- Fix (5) simply by checking retvalue
- Fix (6) by locking both (original and donor one) pages during extent swap
with help of mext_page_double_lock()
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Proper block swap for inodes with full journaling enabled is
truly non obvious task. In order to be on a safe side let's
explicitly disable it for now.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
- Remove usless checks, because it is too late to check that inode != NULL
at the moment it was referenced several times.
- Double lock routines looks very ugly and locking ordering relays on
order of i_ino, but other kernel code rely on order of pointers.
Let's make them simple and clean.
- check that inodes belongs to the same SB as soon as possible.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
When performing an online resize, we add a bunch of groups at one time
in ext4_flex_group_add, so in most cases a lot of group descriptors
will be in the same group block. But in the end of this function,
update_backups will be called for every group descriptor and the same
block will be copied and journalled again and again. It is really a
waste.
Fix things so we only update a particular bg descriptor block once and
skip subsequent updates of the same block.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
bh_submit_read() is responsible for unlock bh on endio. In addition,
we need to use bh_uptodate_or_lock() to avoid races.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Wrap the use of TIOCSRS485 and TIOCGRS485 in #ifdef so that we avoid
adding undefined IOCTLs to the ioctl pointer list as compatible
ioctls.
This change was motivated by a build error on a MIPS build.
tree: git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
tty-next
head: ac57e7f38e
commit: 84c3b84860 [10/16] compat_ioctl:
Add RS-485 IOCTLs to the list
config: mips-fuloong2e_defconfig
All related error/warning messages:
fs/compat_ioctl.c:869:1: error: 'TIOCSRS485' undeclared here (not in a
function)
fs/compat_ioctl.c:870:1: error: 'TIOCGRS485' undeclared here (not in a
function)
vim +869 fs/compat_ioctl.c
863 COMPATIBLE_IOCTL(TIOCSPGRP)
864 COMPATIBLE_IOCTL(TIOCGPGRP)
865 COMPATIBLE_IOCTL(TIOCGPTN)
866 COMPATIBLE_IOCTL(TIOCSPTLCK)
867 COMPATIBLE_IOCTL(TIOCSERGETLSR)
868 COMPATIBLE_IOCTL(TIOCSIG)
> 869 COMPATIBLE_IOCTL(TIOCSRS485)
870 COMPATIBLE_IOCTL(TIOCGRS485)
871 #ifdef TCGETS2
872 COMPATIBLE_IOCTL(TCGETS2)
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jaeden Amero <jaeden.amero@ni.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
...and make the default cache=strict as promised for 3.7.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
and add missed increments of failed async read and write requests.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
by making it __le64 rather than __u64 in FILE_AL_INFO structure.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
Some trivial endian fixes for the SMB2 code. One
warning remains which I asked Pavel to look at.
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
Now that the merge of the remaining pieces needed for
SMB2 (SMB2.1 dialect) are in, and most test cases pass,
we can consider SMB2.1 EXPERIMENTAL rather than "BROKEN."
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
With SMB2 support, update from version 1.79 to 2.0 to make
it easier for users to recognize which version has SMB2 support.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
FL_CLOSE is quite common when you close a file on which you hold a
lock. The spurious "Unknown lock flags" message in cFYI is
confusing in this case.
Reported-by: Alexander Bokovoy <abokovoy@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The string for "unc=" in /proc/mounts needs to be escaped. The current
behaviour can create problems in cases when mounting a share starting
with a number.
example:
>mount -t cifs -o username=test,password=x vm140-31:/17000-test /mnt
>mount -o remount,password=x /mnt
mount error: could not resolve address for vm140-31x00-test: Unknown
error
The sub-string "\170" which is part of the unc for the mount above in
/proc/mounts is interpreted as character'x' in the case above. Escaping
the string fixes the problem.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Rename inode pointers for better clarity. Move the d_instantiate call to
the end of the function to prevent other tasks from seeing it before
we've finished constructing it. Since we should have exclusive access to
the inode at this point, remove the spinlock around i_nlink update.
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Now we walk though cifsFileInfo's list for every incoming lease
break and look for an equivalent there. That approach misses lease
breaks that come just after an open response - we don't have time
to populate new cifsFileInfo structure to the list. Fix this by
adding new list of pending opens and look for a lease there if we
didn't find it in the list of cifsFileInfo structures.
Signed-off-by: Pavel Shilovsky <pshilovsky@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When we have a file opened with read oplock and we are writing a data
to this file, we need to store the data in the cache and then send to
the server to ensure that the next read operation will get a coherent
data.
Also mark it as CONFIG_CIFS_SMB2 because it's more suitable for SMB2
code but can fix some CIFS problems too (when server delays sending
an oplock break after a write request). We can drop this ifdefs
dependence in future.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Currently CIFS code accept read/write ops on mandatory locked area
when two processes use the same file descriptor - it's wrong.
Fix this by serializing io and brlock operations on the inode.
Signed-off-by: Pavel Shilovsky <pshilovsky@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
and allow several processes to walk through the lock list and read
can_cache_brlcks value if they are not going to modify them.
Signed-off-by: Pavel Shilovsky <pshilovsky@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Now we need to lock/unlock a spinlock while processing brlock ops
on the inode. Move brlocks of a fid to a separate list and attach
all such lists to the inode. This let us not hold a spinlock.
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Now that we aren't abusing the kmap address space, there's no need for
this lock or to impose a limit on the rsize.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Replace the "marshal_iov" function with a "read_into_pages" function.
That function will copy the read data off the socket and into the
pages array, kmapping and reading pages one at a time.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
We'll need an array to put into a smb_rqst, so convert this into an array
instead of (ab)using the lru list_head.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Eventually, we're going to want to append a list of pages to
cifs_readdata instead of a list of kvecs. To prepare for that, turn
the kvec array allocation into a separate one and just keep a
pointer to it in the readdata.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Now that we're using TCP_CORK on the socket, there's no value in
continuting to support this option. Schedule it for removal in 3.9.
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Now that we're not kmapping so much at once, there's no need to cap
the wsize at the amount that can be simultaneously kmapped.
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
For now, none of the callers populate rq_pages. That will be done for
writes in a later patch. While we're at it, change the prototype of
setup_async_request not to need a return pointer argument. Just
return the pointer to the mid_q_entry or an ERR_PTR.
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Use the smb_send_rqst helper function to kmap each page in the array
and update the hash for that chunk.
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Add code that allows smb_send_rqst to send an array of pages after the
initial kvec array has been sent. For now, we simply kmap the page
array and send it using the standard smb_send_kvec function. Eventually,
we may want to convert this code to use kernel_sendpage under the hood
and avoid the kmap altogether for the page data.
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
We want to send SMBs as "atomically" as possible. Prior to sending any
data on the socket, cork it to make sure that no non-full frames go
out. Afterward, uncork it to make sure all of the data gets pushed out
to the wire.
Note that this more or less renders the socket=TCP_NODELAY mount option
obsolete. When TCP_CORK and TCP_NODELAY are used on the same socket,
TCP_NODELAY is essentially ignored.
Acked-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Again, just a change in the arguments and some function renaming here.
In later patches, we'll change this code to deal with page arrays.
In this patch, we add a new smb_send_rqst wrapper and have smb_sendv
call that. Then we move most of the existing smb_sendv code into a new
function -- smb_send_kvec. This seems a little redundant, but later
we'll flesh this out to deal with arrays of pages.
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
We need a way to represent a call to be sent on the wire that does not
require having all of the page data kmapped. Behold the smb_rqst struct.
This new struct represents an array of kvecs immediately followed by an
array of pages.
Convert the signing routines to use these structs under the hood and
turn the existing functions for this into wrappers around that. For now,
we're just changing these functions to take different args. Later, we'll
teach them how to deal with arrays of pages.
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Use hmac-sha256 and rather than hmac-md5 that is used for CIFS/SMB.
Signature field in SMB2 header is 16 bytes instead of 8 bytes.
Automatically enable signing by client when requested by the server
when signing ability is available to the client.
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>