Commit Graph

99 Commits

Author SHA1 Message Date
Linus Torvalds
c96e6dabfb We've got eight GFS2 patches for this merge window:
1. Andreas Gruenbacher has four patches related to cleaning up the GFS2
    inode evict process. This is about half of his patches designed to
    fix a long-standing GFS2 hang related to the inode shrinker.
    (Shrinker calls gfs2 evict, evict calls DLM, DLM requires memory
    and blocks on the shrinker.) These 4 patches have been well tested.
    His second set of patches are still being tested, so I plan to hold
    them until the next merge window, after we have more weeks of testing.
    The first patch eliminates the flush_delayed_work, which can block.
 2. Andreas's second patch protects setting of gl_object for rgrps with
    a spin_lock to prevent proven races.
 3. His third patch introduces a centralized mechanism for queueing glock
    work with better reference counting, to prevent more races.
 4. His fourth patch retains a reference to inode glocks when an error
    occurs while creating an inode. This keeps the subsequent evict from
    needing to reacquire the glock, which might call into DLM and block
    in low memory conditions.
 5. Arvind Yadav has a patch to add const to attribute_group structures.
 6. I have a patch to detect directory entry inconsistencies and withdraw
    the file system if any are found. Better that than silent corruption.
 7. I have a patch to remove a vestigial variable from glock structures,
    saving some slab space.
 8. I have another patch to remove a vestigial variable from the GFS2
    in-core superblock structure.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZXOIfAAoJENeLYdPf93o7RVcH/jLEK3hmZOd94pDTYg3Damuo
 KI3xjyutDgQT83uwg8p5UBPwRYCDnyiOLwOWGBJJvjPEI1S4syrXq/FzOmxmX6cV
 nE28ARL/OXCoFEXBMUVHvHL3nK+zEUr8rO6Xz51B1ifVq7GV8iVK+ZgxzRhx0PWP
 f+0SVHiQtU0HKyxR5y9p43oygtHZaGbjy4WL0YbmFZM59y5q9A8rBHFACn2JyPBm
 /zXN6gF/Orao+BDXLT6OM3vNXZcOQ7FUPWwctguHsAO/bLzWiISyfJxLWJsHvSdW
 tzFTN1DByjXvqAhs4HTSuh9JfBDAyxcXkmczXJyATBkCTEJv42Iev+ILmre+wwQ=
 =YTwn
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.13.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Bob Peterson:
 "We've got eight GFS2 patches for this merge window:

   - Andreas Gruenbacher has four patches related to cleaning up the
     GFS2 inode evict process. This is about half of his patches
     designed to fix a long-standing GFS2 hang related to the inode
     shrinker: Shrinker calls gfs2 evict, evict calls DLM, DLM requires
     memory and blocks on the shrinker.

     These four patches have been well tested. His second set of patches
     are still being tested, so I plan to hold them until the next merge
     window, after we have more weeks of testing. The first patch
     eliminates the flush_delayed_work, which can block.

   - Andreas's second patch protects setting of gl_object for rgrps with
     a spin_lock to prevent proven races.

   - His third patch introduces a centralized mechanism for queueing
     glock work with better reference counting, to prevent more races.

    -His fourth patch retains a reference to inode glocks when an error
     occurs while creating an inode. This keeps the subsequent evict
     from needing to reacquire the glock, which might call into DLM and
     block in low memory conditions.

   - Arvind Yadav has a patch to add const to attribute_group
     structures.

   - I have a patch to detect directory entry inconsistencies and
     withdraw the file system if any are found. Better that than silent
     corruption.

   - I have a patch to remove a vestigial variable from glock
     structures, saving some slab space.

   - I have another patch to remove a vestigial variable from the GFS2
     in-core superblock structure"

* tag 'gfs2-4.13.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  GFS2: constify attribute_group structures.
  gfs2: gfs2_create_inode: Keep glock across iput
  gfs2: Clean up glock work enqueuing
  gfs2: Protect gl->gl_object by spin lock
  gfs2: Get rid of flush_delayed_work in gfs2_evict_inode
  GFS2: Eliminate vestigial sd_log_flush_wrapped
  GFS2: Remove gl_list from glock structure
  GFS2: Withdraw when directory entry inconsistencies are detected
2017-07-05 16:57:08 -07:00
Arvind Yadav
29695254ec GFS2: constify attribute_group structures.
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with const
attribute_group. So mark the non-const structs as const.

File size before:
   text	   data	    bss	    dec	    hex	filename
   5259	   1344	      8	   6611	   19d3	fs/gfs2/sys.o

File size After adding 'const':
   text	   data	    bss	    dec	    hex	filename
   5371	   1216	      8	   6595	   19c3	fs/gfs2/sys.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-05 07:21:14 -05:00
Christoph Hellwig
85787090a2 fs: switch ->s_uuid to uuid_t
For some file systems we still memcpy into it, but in various places this
already allows us to use the proper uuid helpers.  More to come..

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> (Changes to IMA/EVM)
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2017-06-05 16:59:12 +02:00
Ingo Molnar
5b825c3af1 sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h
doing that for them.

Note that even if the count where we need to add extra headers seems high,
it's still a net win, because <linux/sched.h> is included in over
2,200 files ...

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:31 +01:00
Linus Torvalds
7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Fabian Frederick
e50ead480f gfs2: convert simple_str to kstr
-Remove obsolete simple_str functions.
-Return error code when kstr failed.
-This patch also calls functions corresponding to destination type.

Thanks to Alexey Dobriyan for suggesting improvements in
block_store() and wdack_store()

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-05-05 13:23:22 -05:00
alex chen
3566c96476 GFS2: fix sprintf format specifier
Sprintf format specifier "%d" and "%u" are mixed up in
gfs2_recovery_done() and freeze_show(). So correct them.

Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2015-01-13 10:48:57 +00:00
Linus Torvalds
ba1bdefec3 This must be about the smallest merge window patch set ever for GFS2.
It is probably also the first one without a single patch from me. That
 is down to a combination of factors, and I have some things in the works
 that are not quite ready yet, that I hope to put in next time around.
 
 Returning to what is here this time... we have 3 patches which fix
 various warnings. Two are bug fixes (for quotas and also a
 rare recovery race condition). The final patch, from Ben Marzinski,
 is an important change in the freeze code which has been in
 progress for some time. This removes the need to take and drop the
 transaction lock for every single transaction, when the only time it
 was used, was at file system freeze time. Ben's patch integrates the
 freeze operation into the journal flush code as an alternative with
 lower overheads and also lands up resolving some difficult to fix races
 at the same time.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJTjawdAAoJEMrg3m4a/8jS+fEP/19pd9norrPgt+INKeWF3Nlj
 0cCVyivBjjEOQseiokbn6AlO9sIBETCMsd3v/ke8haleR8J6F1K8OvRY2LV76vZT
 SKTae4Lts7Pzbf8JF9wSi3mpr3zhtQ47v6DvRYEylc68HcwM4EybaSKWEX3By2zd
 Xmhlv+v7V+PRYthaMalOXjhzuYA4Sv2BgdUGG9xKtIfzjvhHhzws/xBcr9UrotX2
 oPjq08X9HY1TNuWN8tTs4P7BrOx8QCb7ZJzT2A9girFyVXNiduGTd11mzCguvHVQ
 /Ove3/7Cg3fABMg/3Ub2dpARqYxJRV25FlTV8RrOWj0BMhndWAbzMt1KPexk4FAE
 a3KCMBbo8WZbjRmOB4tmfknxDCdeUDAIlm1mwDPFJ1/0Vv8rkove1+xWHDvOPWD3
 219GLiUe7PyVowBW4FQhW+CTjArqz3TWB+R/US18rXcwDS9s/vEIDVVwNYlrxRmK
 pztGMr25UoFhbvMe3jtu5xRwQM5oZfQtlYdL09+0BgkgPmuOtEwzuopa7g5MBAze
 Xq7h+oN8M4AtJs/msBF3di+fgOhUyoJmj129xgoZeCxbe80nA0ge0hnb93vQJHmE
 uHe4zV26ChGjUtxUwf77xOZfCEWKsp1ORJkFN+2SMcpUIlfNNumBv/UhrVRN55AO
 CneZaFLboYhxqc28K+Ms
 =x9iK
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw into next

Pull gfs2 updates from Steven Whitehouse:
 "This must be about the smallest merge window patch set ever for GFS2.
  It is probably also the first one without a single patch from me.
  That is down to a combination of factors, and I have some things in
  the works that are not quite ready yet, that I hope to put in next
  time around.

  Returning to what is here this time...  we have 3 patches which fix
  various warnings.  Two are bug fixes (for quotas and also a rare
  recovery race condition).  The final patch, from Ben Marzinski, is an
  important change in the freeze code which has been in progress for
  some time.  This removes the need to take and drop the transaction
  lock for every single transaction, when the only time it was used, was
  at file system freeze time.  Ben's patch integrates the freeze
  operation into the journal flush code as an alternative with lower
  overheads and also lands up resolving some difficult to fix races at
  the same time"

* tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
  GFS2: Prevent recovery before the local journal is set
  GFS2: fs/gfs2/file.c: kernel-doc warning fixes
  GFS2: fs/gfs2/bmap.c: kernel-doc warning fixes
  GFS2: remove transaction glock
  GFS2: lops.c: replace 0 by NULL for pointers
  GFS2: quotas not being refreshed in gfs2_adjust_quota
2014-06-04 08:30:10 -07:00
Bob Peterson
0e48e055a7 GFS2: Prevent recovery before the local journal is set
This patch uses a completion to prevent dlm's recovery process from
referencing and trying to recover a journal before a journal has been
opened.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-06-02 19:12:06 +01:00
Benjamin Marzinski
24972557b1 GFS2: remove transaction glock
GFS2 has a transaction glock, which must be grabbed for every
transaction, whose purpose is to deal with freezing the filesystem.
Aside from this involving a large amount of locking, it is very easy to
make the current fsfreeze code hang on unfreezing.

This patch rewrites how gfs2 handles freezing the filesystem. The
transaction glock is removed. In it's place is a freeze glock, which is
cached (but not held) in a shared state by every node in the cluster
when the filesystem is mounted. This lock only needs to be grabbed on
freezing, and actions which need to be safe from freezing, like
recovery.

When a node wants to freeze the filesystem, it grabs this glock
exclusively.  When the freeze glock state changes on the nodes (either
from shared to unlocked, or shared to exclusive), the filesystem does a
special log flush.  gfs2_log_flush() does all the work for flushing out
the and shutting down the incore log, and then it tries to grab the
freeze glock in a shared state again.  Since the filesystem is stuck in
gfs2_log_flush, no new transaction can start, and nothing can be written
to disk. Unfreezing the filesytem simply involes dropping the freeze
glock, allowing gfs2_log_flush() to grab and then release the shared
lock, so it is cached for next time.

However, in order for the unfreezing ioctl to occur, gfs2 needs to get a
shared lock on the filesystem root directory inode to check permissions.
If that glock has already been grabbed exclusively, fsfreeze will be
unable to get the shared lock and unfreeze the filesystem.

In order to allow the unfreeze, this patch makes gfs2 grab a shared lock
on the filesystem root directory during the freeze, and hold it until it
unfreezes the filesystem.  The functions which need to grab a shared
lock in order to allow the unfreeze ioctl to be issued now use the lock
grabbed by the freeze code instead.

The freeze and unfreeze code take care to make sure that this shared
lock will not be dropped while another process is using it.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-05-14 10:04:34 +01:00
Peter Zijlstra
4e857c58ef arch: Mass conversion of smp_mb__*()
Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 14:20:48 +02:00
Joe Perches
cb94eb066e GFS2: Convert gfs2_lm_withdraw to use fs_err
vprintk use is not prefixed by a KERN_<LEVEL>,
so emit these messages at KERN_ERR level.

Using %pV can save some code and allow fs_err to
be used, so do it.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-03-07 09:39:18 +00:00
Joe Perches
d77d1b58aa GFS2: Use pr_<level> more consistently
Add pr_fmt, remove embedded "GFS2: " prefixes.
This now consistently emits lower case "gfs2: " for each message.

Other miscellanea around these changes:

o Add missing newlines
o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-03-07 09:30:51 +00:00
Steven Whitehouse
bef292a72d GFS2: Remove obsolete quota tunable
There is no need for a paramater which relates to the internals
of quota to be exposed to users. The only possible use would be
to turn it up so large that the memory allocation fails. So lets
remove it and set it to a sensible value which ensures that we
don't ask for multipage allocations.

Currently the size of struct gfs2_holder means that the caluclated
value is identical to the previous default value, so there should
be no functional change.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
2013-10-04 09:49:29 +01:00
Linus Torvalds
d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
Zhao Hongjiang
4173581876 fs: change return values from -EACCES to -EPERM
According to SUSv3:

[EACCES] Permission denied. An attempt was made to access a file in a way
forbidden by its file access permissions.

[EPERM] Operation not permitted. An attempt was made to perform an operation
limited to processes with appropriate privileges or to the owner of a file
or other resource.

So -EPERM should be returned if capability checks fails.

Strictly speaking this is an API change since the error code user sees is
altered.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:14 -05:00
Linus Torvalds
94f2f14234 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace and namespace infrastructure changes from Eric W Biederman:
 "This set of changes starts with a few small enhnacements to the user
  namespace.  reboot support, allowing more arbitrary mappings, and
  support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
  user namespace root.

  I do my best to document that if you care about limiting your
  unprivileged users that when you have the user namespace support
  enabled you will need to enable memory control groups.

  There is a minor bug fix to prevent overflowing the stack if someone
  creates way too many user namespaces.

  The bulk of the changes are a continuation of the kuid/kgid push down
  work through the filesystems.  These changes make using uids and gids
  typesafe which ensures that these filesystems are safe to use when
  multiple user namespaces are in use.  The filesystems converted for
  3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs.  The
  changes for these filesystems were a little more involved so I split
  the changes into smaller hopefully obviously correct changes.

  XFS is the only filesystem that remains.  I was hoping I could get
  that in this release so that user namespace support would be enabled
  with an allyesconfig or an allmodconfig but it looks like the xfs
  changes need another couple of days before it they are ready."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
  cifs: Enable building with user namespaces enabled.
  cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
  cifs: Convert struct cifs_sb_info to use kuids and kgids
  cifs: Modify struct smb_vol to use kuids and kgids
  cifs: Convert struct cifsFileInfo to use a kuid
  cifs: Convert struct cifs_fattr to use kuid and kgids
  cifs: Convert struct tcon_link to use a kuid.
  cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
  cifs: Convert from a kuid before printing current_fsuid
  cifs: Use kuids and kgids SID to uid/gid mapping
  cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
  cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
  cifs: Override unmappable incoming uids and gids
  nfsd: Enable building with user namespaces enabled.
  nfsd: Properly compare and initialize kuids and kgids
  nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
  nfsd: Modify nfsd4_cb_sec to use kuids and kgids
  nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
  nfsd: Convert nfsxdr to use kuids and kgids
  nfsd: Convert nfs3xdr to use kuids and kgids
  ...
2013-02-25 16:00:49 -08:00
Eric W. Biederman
ed87dabcc3 gfs2: Convert gfs2_quota_refresh to take a kqid
- In quota_refresh_user_store convert the user supplied uid
  into a kqid and pass it to gfs2_quota_refresh.

- In quota_refresh_group_store convert the user supplied gid
  into a kqid and pass it to gfs2_quota_refresh.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:06 -08:00
Steven Whitehouse
fd95e81cb1 GFS2: Reinstate withdraw ack system
This patch reinstates the ack system which withdraw should be using. It
appears to have been accidentally forgotten when the lock module was
merged into GFS2, due to two different sysfs files having the same name.

Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-02-13 12:21:40 +00:00
Steven Whitehouse
d564053f07 GFS2: Clean up freeze code
The freeze code has not been looked at a lot recently. Upstream has
moved on, and this is an attempt to catch us back up again. There
is a vfs level interface for the freeze code which can be called
from our (obsolete, but kept for backward compatibility purposes)
sysfs freeze interface. This means freezing this way vs. doing it
from the ioctl should now work in identical fashion.

As a result of this, the freeze function is only called once
and we can drop our own special purpose code for counting the
number of freezes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:29:05 +00:00
Linus Torvalds
801b03653f Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw
Pull GFS2 updates from Steven Whitehouse.

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
  GFS2: Eliminate 64-bit divides
  GFS2: Reduce file fragmentation
  GFS2: kernel panic with small gfs2 filesystems - 1 RG
  GFS2: Fixing double brelse'ing bh allocated in gfs2_meta_read when EIO occurs
  GFS2: Combine functions get_local_rgrp and gfs2_inplace_reserve
  GFS2: Add kobject release method
  GFS2: Size seq_file buffer more carefully
  GFS2: Use seq_vprintf for glocks debugfs file
  seq_file: Add seq_vprintf function and export it
  GFS2: Use lvbs for storing rgrp information with mount option
  GFS2: Cache last hash bucket for glock seq_files
  GFS2: Increase buffer size for glocks and glstats debugfs files
  GFS2: Fix error handling when reading an invalid block from the journal
  GFS2: Add "top dir" flag support
  GFS2: Fold quota data into the reservations struct
  GFS2: Extend the life of the reservations
2012-07-24 17:57:05 -07:00
Jan Kara
ceed17236a quota: Split dquot_quota_sync() to writeback and cache flushing part
Split off part of dquot_quota_sync() which writes dquots into a quota file
to a separate function. In the next patch we will use the function from
filesystems and we do not want to abuse ->quota_sync quotactl callback more
than necessary.

Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:19 +04:00
Bob Peterson
0d515210b6 GFS2: Add kobject release method
This patch adds a kobject release function that properly maintains
the kobject use count, so that accesses to the sysfs files do not
cause an access to freed kernel memory after an unmount.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-06-13 15:59:48 +01:00
David Teigland
1a058f5288 gfs2: fix recovery during unmount
Journal recovery from lock_dlm should not be ignored
if there is an unmount in progress.  Ignoring it will
causes the recovery to get stuck.  The recovery
process will correctly handle an in-progess unmount.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-05-02 14:19:12 -05:00
David Teigland
e0c2a9aa1e GFS2: dlm based recovery coordination
This new method of managing recovery is an alternative to
the previous approach of using the userland gfs_controld.

- use dlm slot numbers to assign journal id's
- use dlm recovery callbacks to initiate journal recovery
- use a dlm lock to determine the first node to mount fs
- use a dlm lock to track journals that need recovery

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-01-11 09:23:05 +00:00
Steven Whitehouse
3942ae5319 GFS2: Fix race during filesystem mount
There is a potential race during filesystem mounting which has recently
been reported. It occurs when the userland gfs_controld is able to
process requests fast enough that it tries to use the sysfs interface
before the lock module is properly initialised. This is a pretty
unusual case as normally the lock module initialisation is very quick
compared with gfs_controld.

This patch adds an interruptible completion which is used to ensure that
userland will wait for the initialisation of the lock module to
complete.

There are other potential solutions to this problem, but this is the
quickest at this stage and has been tested both with and without
mount.gfs2 present in the system.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: David Booher <dbooher@adams.net>
2011-07-12 09:15:46 +01:00
Steven Whitehouse
32e471ef10 GFS2: Use UUID field in generic superblock
The VFS superblock structure now has a UUID field, so we can use that
in preference to the UUID field in the GFS2 superblock now.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-10 15:01:59 +01:00
Steven Whitehouse
134669854e GFS2: Fix type mapping for demote_rq interface
Mostly the glock operations follow the type of the glock. The
one exception is the transaction glock, so we need to check for
that directly.

Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-10-06 09:58:44 +01:00
Steven Whitehouse
feb47ca931 GFS2: Improve journal allocation via sysfs
Recently a feature was added to GFS2 to allow journal id allocation
via sysfs. This patch builds upon that so that a negative journal id
will be treated as an error code to be passed back as the return code
from mount. This allows termination of the mount process if there is
a failure.

Also, the process has been updated so that the kernel will wait
for a journal id, even in the "spectator" case. This is required
in order to avoid mounting a filesystem in case there is an error
while joining the cluster. In the spectator case, 0 is written into
the file to indicate that all is well, and that mount should continue.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-09-29 15:04:18 +01:00
Linus Torvalds
3b7433b8a8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
  workqueue: mark init_workqueues() as early_initcall()
  workqueue: explain for_each_*cwq_cpu() iterators
  fscache: fix build on !CONFIG_SYSCTL
  slow-work: kill it
  gfs2: use workqueue instead of slow-work
  drm: use workqueue instead of slow-work
  cifs: use workqueue instead of slow-work
  fscache: drop references to slow-work
  fscache: convert operation to use workqueue instead of slow-work
  fscache: convert object to use workqueue instead of slow-work
  workqueue: fix how cpu number is stored in work->data
  workqueue: fix mayday_mask handling on UP
  workqueue: fix build problem on !CONFIG_SMP
  workqueue: fix locking in retry path of maybe_create_worker()
  async: use workqueue for worker pool
  workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
  workqueue: implement unbound workqueue
  workqueue: prepare for WQ_UNBOUND implementation
  libata: take advantage of cmwq and remove concurrency limitations
  workqueue: fix worker management invocation without pending works
  ...

Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c
2010-08-07 12:42:58 -07:00
Steven Whitehouse
ba6e93645f GFS2: Wait for journal id on mount if not specified on mount command line
This patch implements a wait for the journal id in the case that it has
not been specified on the command line. This is to allow the future
removal of the mount.gfs2 helper. The journal id would instead be
directly communicated by gfs_controld to the file system. Here is a
comparison of the two systems:

Current:
1. mount calls mount.gfs2
2. mount.gfs2 connects to gfs_controld to retrieve the journal id
3. mount.gfs2 adds the journal id to the mount command line and calls
the mount system call
4. gfs_controld receives the status of the mount request via a uevent

Proposed:
1. mount calls the mount system call (no mount.gfs2 helper)
2. gfs_controld receives a uevent for a gfs2 fs which it doesn't know
about already
3. gfs_controld assigns a journal id to it via sysfs
4. the mount system call then completes as normal (sending a uevent
according to status)

The advantage of the proposed system is that it is completely backward
compatible with the current system both at the kernel and at the
userland levels. The "first" parameter can also be set the same way,
with the restriction that it must be set before the journal id is
assigned.

In addition, if mount becomes stuck waiting for a reply from
gfs_controld which never arrives, then it is killable and will abort the
mount gracefully.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-07-29 09:36:35 +01:00
Tejun Heo
6ecd7c2dd9 gfs2: use workqueue instead of slow-work
Workqueue can now handle high concurrency.  Convert gfs to use
workqueue instead of slow-work.

* Steven pointed out that recovery path might be run from allocation
  path and thus requires forward progress guarantee without memory
  allocation.  Create and use gfs_recovery_wq with rescuer.  Please
  note that forward progress wasn't guaranteed with slow-work.

* Updated to use non-reentrant workqueue.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2010-07-23 13:14:25 +02:00
Linus Torvalds
677abe49ad Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
  GFS2: Fix typo
  GFS2: stuck in inode wait, no glocks stuck
  GFS2: Eliminate useless err variable
  GFS2: Fix writing to non-page aligned gfs2_quota structures
  GFS2: Add some useful messages
  GFS2: fix quota state reporting
  GFS2: Various gfs2_logd improvements
  GFS2: glock livelock
  GFS2: Clean up stuffed file copying
  GFS2: docs update
  GFS2: Remove space from slab cache name
2010-05-21 07:29:15 -07:00
Steven Whitehouse
6a99be5d7b GFS2: Fix typo
A missing ! in a test.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-05-14 14:05:51 +01:00
Steven Whitehouse
913a71d250 GFS2: Add some useful messages
The following patch adds a message to indicate when barriers have been
disabled due to a block device which doesn't support them. You could
already tell this via the mount options in /proc/mounts, but all the
other filesystems also log a message at the same time.

Also, the same mechanisms are used to indicate when the lock
demote interface has been used (only ever used for debugging)
which is a request from our support team.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-05-06 11:03:29 +01:00
Benjamin Marzinski
5e687eac1b GFS2: Various gfs2_logd improvements
This patch contains various tweaks to how log flushes and active item writeback
work. gfs2_logd is now managed by a waitqueue, and gfs2_log_reseve now waits
for gfs2_logd to do the log flushing.  Multiple functions were rewritten to
remove the need to call gfs2_log_lock(). Instead of using one test to see if
gfs2_logd had work to do, there are now seperate tests to check if there
are two many buffers in the incore log or if there are two many items on the
active items list.

This patch is a port of a patch Steve Whitehouse wrote about a year ago, with
some minor changes.  Since gfs2_ail1_start always submits all the active items,
it no longer needs to keep track of the first ai submitted, so this has been
removed. In gfs2_log_reserve(), the order of the calls to
prepare_to_wait_exclusive() and wake_up() when firing off the logd thread has
been switched.  If it called wake_up first there was a small window for a race,
where logd could run and return before gfs2_log_reserve was ready to get woken
up. If gfs2_logd ran, but did not free up enough blocks, gfs2_log_reserve()
would be left waiting for gfs2_logd to eventualy run because it timed out.
Finally, gt_logd_secs, which controls how long to wait before gfs2_logd times
out, and flushes the log, can now be set on mount with ar_commit.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-05-05 09:39:18 +01:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Emese Revfy
52cf25d0ab Driver core: Constify struct sysfs_ops in struct kobj_type
Constify struct sysfs_ops.

This is part of the ops structure constification
effort started by Arjan van de Ven et al.

Benefits of this constification:

 * prevents modification of data that is shared
   (referenced) by many other structure instances
   at runtime

 * detects/prevents accidental (but not intentional)
   modification attempts on archs that enforce
   read-only kernel data at runtime

 * potentially better optimized code as the compiler
   can assume that the const data cannot be changed

 * the compiler/linker move const data into .rodata
   and therefore exclude them from false sharing

Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:49 -08:00
Emese Revfy
9cd43611cc kobject: Constify struct kset_uevent_ops
Constify struct kset_uevent_ops.

This is part of the ops structure constification
effort started by Arjan van de Ven et al.

Benefits of this constification:

 * prevents modification of data that is shared
   (referenced) by many other structure instances
   at runtime

 * detects/prevents accidental (but not intentional)
   modification attempts on archs that enforce
   read-only kernel data at runtime

 * potentially better optimized code as the compiler
   can assume that the const data cannot be changed

 * the compiler/linker move const data into .rodata
   and therefore exclude them from false sharing

Signed-off-by: Emese Revfy <re.emese@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:49 -08:00
Linus Torvalds
e213e26ab3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
  quota: stop using QUOTA_OK / NO_QUOTA
  dquot: cleanup dquot initialize routine
  dquot: move dquot initialization responsibility into the filesystem
  dquot: cleanup dquot drop routine
  dquot: move dquot drop responsibility into the filesystem
  dquot: cleanup dquot transfer routine
  dquot: move dquot transfer responsibility into the filesystem
  dquot: cleanup inode allocation / freeing routines
  dquot: cleanup space allocation / freeing routines
  ext3: add writepage sanity checks
  ext3: Truncate allocated blocks if direct IO write fails to update i_size
  quota: Properly invalidate caches even for filesystems with blocksize < pagesize
  quota: generalize quota transfer interface
  quota: sb_quota state flags cleanup
  jbd: Delay discarding buffers in journal_unmap_buffer
  ext3: quota_write cross block boundary behaviour
  quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
  quota: split out compat_sys_quotactl support from quota.c
  quota: split out netlink notification support from quota.c
  quota: remove invalid optimization from quota_sync_all
  ...

Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c
2010-03-05 13:20:53 -08:00
Christoph Hellwig
5fb324ad24 quota: move code from sync_quota_sb into vfs_quota_sync
Currenly sync_quota_sb does a lot of sync and truncate action that only
applies to "VFS" style quotas and is actively harmful for the sync
performance in XFS.  Move it into vfs_quota_sync and add a wait parameter
to ->quota_sync to tell if we need it or not.

My audit of the GFS2 code says it's also not needed given the way GFS2
implements quotas, but I'd be happy if this can get a detailed review.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:24 +01:00
Steven Whitehouse
c1184f8ab7 GFS2: Remove loopy umount code
As a consequence of the previous patch, we can now remove the
loop which used to be required due to the circular dependency
between the inodes and glocks. Instead we can just invalidate
the inodes, and then clear up any glocks which are left.

Also we no longer need the rwsem since there is no longer any
danger of the inode invalidation calling back into the glock
code (and from there back into the inode code).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-03-01 14:07:53 +00:00
Joe Perches
f0b34ae634 fs/gfs2/sys.c: use %pUB to print UUIDs
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:33 -08:00
Steven Whitehouse
ea76233859 GFS2: Add proper error reporting to quota sync via sysfs
For some reason, the errors were not making it to userspace.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-12-03 11:50:03 +00:00
Steven Whitehouse
8c42d637f6 GFS2: Alter arguments of gfs2_quota/statfs_sync
These two functions are altered so that gfs2_quota_sync may
in future be called directly from the VFS. The GFS2 superblock
changes to a VFS super block and there is an addition of an int
argument which is currently ignored.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-12-03 11:48:54 +00:00
Steven Whitehouse
2b88f7c535 GFS2: Remove unused sysfs file
The /sys/fs/gfs2/<fsname>/lock_module/id file has been unused for
some time now, so we can remove it. We still accept the mount option
though, as userspace still sends that.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-09-09 15:59:35 +01:00
Steven Whitehouse
31e54b01f3 GFS2: Add sysfs link to device
This adds a link from the per-gfs2 sb sysfs directory to
the block device upon which the filesystem is mounted. The
link is called "device", strangely enough :-)

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-08-17 11:11:18 +01:00
Steven Whitehouse
440d6da207 GFS2: Add some more info to uevents
With each uevent, we now always include the journal ID. We
can't call it JID since that is already in use by some of
the individual events relating to recovery, so we use
JOURNALID instead. We don't send the JOURNALID for spectator
mounts, since there isn't one.

Also the ADD event now has both RDONLY and SPECTATOR information
to match that of the ONLINE event.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-08-17 11:05:00 +01:00
Steven Whitehouse
d7e623da1a GFS2: Fix permissions on "recover" file
Although this file is only ever written and not read by
userspace, it seems that the utils are opening this
file O_RDWR, so we need to allow that.

Also fixes the whitespace which seemed to be broken.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: David Teigland <teigland@redhat.com>
2009-08-14 14:04:46 +01:00
Steven Whitehouse
f6eb53498e GFS2: Remove args subdir from gfs2 sysfs files
Since we can cat /proc/mounts there is no need to have this
subdirectory in the gfs2 sysfs files. In fact this does not
reflect the full range of possible mount argumenmts, where
as /proc/mounts does.

There was only one userland user of this set of sysfs files
and it will function perfectly well without these files
being present (in fact that subcommand of gfs2_tool is
obsolete anyway).

The tune/* subdirectory is also considered mostly obsolete,
but there are a few uses of this until mount arguments can
be added for the last few functions for which there are no
equivalents currently. However the tune/* directory is still
in my sights and new code should avoid using it. Only the gfs2_quota
and gfs2_tool programs are know to use tune/* at the moment.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-05-26 15:50:25 +01:00