Commit Graph

40 Commits

Author SHA1 Message Date
Alexander Aring
d47b822974 dlm: warn about invalid nodeid comparsions
This patch adds a warning when is_master() or dlm_is_removed() is called
on a nodeid in an invalid state, which is probably not what the caller
intends. The is_master() check on r->res_nodeid is invalid when the field
is set to -1, whereas dlm_is_removed() takes a "nodeid member", which has
a different meaning and for which 0 is also invalid.

We currently run into these cases, and this patch changes the affected
callers so that we never will. There should be no functional change, as
the conditions should return the same result. However, the patch now
signals at the caller level that there might be an "extra" case to
handle.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-08-08 15:15:07 -05:00
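The check described in d47b822974 can be illustrated with a small user-space sketch. This is not the kernel code: struct rsb_sketch, is_master_sketch() and warn_count are hypothetical names, and the convention that 0 means "this node is the master" is an assumption made for the example. Only the invalidity of -1 comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel rsb; only the nodeid is modelled. */
struct rsb_sketch {
	int res_nodeid;	/* assumed: 0 = local master, -1 = not known */
};

/* Counts warnings so the behaviour can be observed in user space. */
static int warn_count;

/*
 * Return true if this node is the master of the resource. The result is
 * the same as without the warning; the warning only flags callers that
 * compare while the nodeid is still in the invalid -1 state.
 */
static bool is_master_sketch(const struct rsb_sketch *r)
{
	assert(r != NULL);

	if (r->res_nodeid == -1) {
		warn_count++;
		fprintf(stderr, "warning: master check on invalid nodeid -1\n");
	}
	return r->res_nodeid == 0;
}
```

The return value is unchanged by the warning, which matches the commit's claim that no functional change is intended.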
David Teigland
4f5957a980 dlm: change list and timer names
The old terminology of "toss" and "keep" is no longer an
accurate description of the rsb states and lists, so change
the names to "inactive" and "active".  The old names had
also been copied into the scanning code, which is changed
back to use the "scan" name.

- "active" rsb structs have lkb's attached, and are ref counted.
- "inactive" rsb structs have no lkb's attached, are not ref counted.
- "scan" list is for rsb's that can be freed after a timeout period.
- "slow" lists are for infrequent iterations through active or
  inactive rsb structs.
- inactive rsb structs that are directory records will not be put
  on the scan list, since they are not freed based on timeouts.
- inactive rsb structs that are not directory records will be
  put on the scan list to be freed, since they are no longer needed.

Signed-off-by: David Teigland <teigland@redhat.com>
2024-06-10 15:11:46 -05:00
Alexander Aring
b1f2381c1a dlm: drop dlm_scand kthread and use timers
Currently the scand kthread acts as a garbage collector for expired
rsbs on the toss list, cleaning them up after a certain timeout. It
triggers every couple of seconds and iterates over the toss list while
holding ls_rsbtbl_lock for the whole hash bucket iteration.

To reduce the amount of time holding ls_rsbtbl_lock, we now handle the
disposal of expired rsbs using a per-lockspace timer that expires for the
earliest tossed rsb on the lockspace toss queue. This toss queue is
ordered according to the rsb res_toss_time with the earliest tossed rsb
as the first entry. The toss timer will only trylock() necessary locks,
since it is low priority garbage collection, and will rearm the timer
if trylock() fails. If the timer function does not find any expired
rsb's, it rearms the timer for the rsb that will expire next.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16 14:40:27 -05:00
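The timer logic described in b1f2381c1a can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel implementation: struct scan_queue, scan_timer_fn(), the retry_delay parameter and the boolean standing in for the lock are all hypothetical, and time is a plain counter instead of jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rsb: only the toss time and a "freed" marker are modelled. */
struct rsb_sketch {
	unsigned long toss_time;
	bool freed;
};

/* Queue ordered by toss_time, earliest first. */
struct scan_queue {
	struct rsb_sketch *items;
	size_t head;	/* index of the earliest rsb still queued */
	size_t count;
	bool locked;	/* stands in for the lock the timer must trylock */
};

static bool queue_trylock(struct scan_queue *q)
{
	if (q->locked)
		return false;
	q->locked = true;
	return true;
}

/*
 * Timer callback sketch. Returns the time at which the timer should be
 * rearmed, or 0 if the queue is empty and no rearm is needed.
 */
static unsigned long scan_timer_fn(struct scan_queue *q, unsigned long now,
				   unsigned long timeout,
				   unsigned long retry_delay)
{
	unsigned long next = 0;

	assert(q != NULL);

	/* Low-priority work: never wait for the lock, just try again later. */
	if (!queue_trylock(q))
		return now + retry_delay;

	/* The queue is ordered, so stop at the first rsb that has not expired. */
	while (q->head < q->count) {
		struct rsb_sketch *r = &q->items[q->head];

		if (r->toss_time + timeout > now)
			break;
		r->freed = true;
		q->head++;
	}

	/* Rearm for the earliest remaining rsb, if there is one. */
	if (q->head < q->count)
		next = q->items[q->head].toss_time + timeout;

	q->locked = false;
	return next;
}
```

Because the queue is ordered, the callback only ever inspects the expired entries plus one, which is what keeps the lock hold time short compared with a full hash bucket walk.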
Alexander Aring
6644925abf dlm: do not use ref counts for rsb in the toss state
In the past we had problems when an rsb had a reference counter greater
than one while in the toss state.  An rsb in the toss state is not
actively used for locking, and should not have any other references
apart from the single ref keeping it on the rsb hash.  Shift to freeing
rsb's directly rather than using kref_put to free them, since the ref
counting is not meant to be used in this state.  Add warnings if ref
counting is seen while an rsb is in the toss state.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16 14:34:39 -05:00
Alexander Aring
6c648035cb dlm: switch to use rhashtable for rsbs
Replace our own hash table with the more advanced rhashtable
for keeping rsb structs.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16 14:34:39 -05:00
Alexander Aring
578acf9a87 dlm: use spin_lock_bh for message processing
Use spin_lock_bh for all spinlocks involved in message processing,
in preparation for softirq message processing.  DLM lock requests
from user space involve dlm processing in user context, in addition
to the standard kernel context, necessitating bh variants.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 11:45:23 -05:00
Alexander Aring
cc396e2355 dlm: convert res_lock to spinlock
Convert the rsb struct res_lock from a mutex to a spinlock
in preparation for processing messages in softirq context.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 11:44:49 -05:00
Alexander Aring
6b52ea7916 dlm: drop mutex use in waiters recovery
The waiters_mutex no longer needs to be used in the waiters recovery
functions dlm_recover_waiters_pre() and dlm_recover_waiters_post().
During recovery, ordinary locking operations are paused, and the
recovery thread is the only context accessing the waiters list,
so the lock is not needed.

Access to the waiters list from debugfs functions is avoided by
taking the top level recovery lock in the debugfs dump function.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 11:44:49 -05:00
Alexander Aring
3a747f4a2e dlm: move rsb root_list to ls_recover() stack
Move the rsb root_list from the lockspace to a stack variable since
it is now only used by the ls_recover() function.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 11:44:49 -05:00
Alexander Aring
1151935182 fs: dlm: constify receive buffer
The dlm receive buffer should never be manipulated, as DLM is the last
parsing layer. This patch constifies the whole receive buffer so we can
be sure it is never modified while it is being parsed.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2023-08-10 10:33:03 -05:00
Alexander Aring
b9d2f6ada0 fs: dlm: drop rxbuf manipulation in dlm_recover_master_copy
Currently dlm_recover_master_copy() manipulates the receive buffer of an
rcom lock message, modifying it on the fly so that a later memcpy() into
a new rcom message carries the new values. This patch avoids manipulating
the received rcom message by storing the values for the new rcom message
in parameters passed by reference. Later, when dlm_send_rcom_lock()
constructs a new message and memcpy()s the receive buffer into it, those
values are set on the newly constructed message.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2023-08-10 10:33:03 -05:00
Alexander Aring
c4f4e135c2 fs: dlm: get recovery sequence number as parameter
This patch removes a read of the ls->ls_recover_seq uint64_t number in
_create_rcom(). When ls->ls_recover_seq is read, the ls_recover_lock
needs to be held. However, this number has always already been read when
an rcom message is received, so it is not necessary to read it again from
the per-lockspace variable for the reply message. This patch passes the
sequence number as a parameter, so that another read of
ls->ls_recover_seq while holding ls->ls_recover_lock is not required.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2023-08-10 10:33:03 -05:00
Alexander Aring
01c7a59789 fs: dlm: remove deprecated code parts
This patch removes code that was declared deprecated by
commit 6b0afc0cc3 ("fs: dlm: don't use deprecated timeout features by
default"). This covers the following dlm functionality:

- starting a cancel of a dlm request that did not complete after a
  certain timeout:
  The way dlm cancellation currently works, it can interfere with other
  dlm requests triggered by the user, overlap with them, and end up
  returning -EBUSY. Most users don't handle this case and are unaware
  that DLM can return this errno in such a situation. Because of the
  timeout, users are mostly unaware of when this happens.
- sending netlink warning messages to user space if dlm requests did
  not complete after a certain timeout:
  This feature was never built into the only known dlm user space side.
  As we are removing the timeout cancellation feature, we can remove
  this feature as well.

It may be possible to bring the timeout cancellation feature back.
However, the current handling of the -EBUSY case, which is only a
software limitation and not a hardware limitation, should be changed.
We minimize the current code base of the DLM cancellation feature so
that we do not have to deal with those existing features while solving
DLM cancellation in general.

The UAPI define DLM_LSFL_TIMEWARN is now commented as a deprecated and
reserved value. For now we should avoid giving it a new meaning, but we
keep the define so that possible users still compile. In the far future
we can give this flag a new meaning. The same applies to the
DLM_LKF_TIMEOUT lock request flag.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2023-03-06 15:49:07 -06:00
Alexander Aring
56171e0db2 fs: dlm: const void resource name parameter
The resource name parameter should never be changed by DLM, so we declare
it as const. In some places it is handled as a char pointer, but a
resource name can also be a non-printable ASCII string. This patch changes
it to be handled as a void pointer, as offered by the DLM API.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-23 15:02:47 -05:00
Alexander Aring
6b0afc0cc3 fs: dlm: don't use deprecated timeout features by default
This patch will disable use of deprecated timeout features if
CONFIG_DLM_DEPRECATED_API is not set.  The deprecated features
will be removed in upcoming kernel release v6.2.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-01 09:31:38 -05:00
Alexander Aring
8d614a4457 fs: dlm: remove timeout from dlm_user_adopt_orphan
Remove the unused timeout parameter from dlm_user_adopt_orphan().

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:57:53 -05:00
Alexander Aring
2bb2a3d66c fs: dlm: remove waiter warnings
This patch removes warning messages that could be logged when
remote requests had been waiting on a reply message for some timeout
period (which could be set through configfs, but was rarely enabled.)
The improved midcomms layer now carefully tracks all messages and
replies, and logs much more useful messages if there is an actual
problem.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:57:52 -05:00
Alexander Aring
63eab2b00b fs: dlm: add lkb waiters debugfs functionality
This patch adds functionality to put an lkb into the waiters state. It
can be useful to combine this feature with the "rawmsg" debugfs
functionality. It brings the DLM lkb into a state in which a message
will be parsed by the kernel.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
5054e79de9 fs: dlm: add lkb debugfs functionality
This patch adds functionality to add an lkb at runtime. This is purely a
debugging feature; wrong input can crash the kernel. It is also an
early-stage feature. The goal is to provide a user interface for
manipulating dlm state and to combine it with the rawmsg feature. It is
debugfs functionality, so we don't care about UAPI breakage. It is even
possible to add lkb's/rsb's that could never exist in that way during
normal DLM operation. Users of this interface always need to think before
using this feature: not every crash that happens can really occur during
normal dlm operation.

In the future there should be more functionality to add a more realistic
lkb that reflects normal DLM state inside the kernel. For now this is
enough.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Thomas Gleixner
2522fe45a1 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 193
Based on 1 normalized pattern(s):

  this copyrighted material is made available to anyone wishing to use
  modify copy or redistribute it subject to the terms and conditions
  of the gnu general public license v 2

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 45 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528170027.342746075@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:29:21 -07:00
David Teigland
2ab4bd8ea3 dlm: adopt orphan locks
A process may exit, leaving an orphan lock in the lockspace.
This adds the capability for another process to acquire the
orphan lock.  Acquiring the orphan just moves the lock from
the orphan list onto the acquiring process's list of locks.

An adopting process must specify the resource name and mode
of the lock it wants to adopt.  If a matching lock is found,
the lock is moved to the caller's list of locks, and the
lkid of the lock is returned like the lkid of a new lock.

If an orphan with a different mode is found, then -EAGAIN is
returned.  If no orphan lock is found on the resource, then
-ENOENT is returned.  No async completion is used because
the result is immediately available.

Also, when orphans are purged, allow a zero nodeid to refer
to the local nodeid so the caller does not need to look up
the local nodeid.

Signed-off-by: David Teigland <teigland@redhat.com>
2014-11-19 14:48:02 -06:00
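The return values described in 2ab4bd8ea3 can be illustrated with a short user-space sketch. It is not the kernel function: struct orphan_sketch and adopt_orphan_sketch() are hypothetical, the orphan list is a plain array, and resource names are compared as C strings for brevity, which is a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical orphan lock record. */
struct orphan_sketch {
	const char *name;	/* resource name, a C string in this sketch */
	int mode;		/* lock mode */
	unsigned int lkid;	/* lock id handed back to the adopter */
	int adopted;		/* nonzero once moved to a process's lock list */
};

/*
 * Adopt an orphan matching both name and mode.
 * Returns 0 and stores the lkid on success, -EAGAIN if only orphans with
 * a different mode exist on the resource, -ENOENT if there is no orphan
 * on the resource at all. The result is available immediately.
 */
static int adopt_orphan_sketch(struct orphan_sketch *orphans, size_t n,
			       const char *name, int mode,
			       unsigned int *lkid)
{
	int other_mode_seen = 0;
	size_t i;

	assert(name != NULL && lkid != NULL);

	for (i = 0; i < n; i++) {
		struct orphan_sketch *o = &orphans[i];

		if (o->adopted || strcmp(o->name, name) != 0)
			continue;
		if (o->mode != mode) {
			other_mode_seen = 1;
			continue;
		}
		o->adopted = 1;	/* stands in for moving between lists */
		*lkid = o->lkid;
		return 0;
	}
	return other_mode_seen ? -EAGAIN : -ENOENT;
}
```

A caller can treat -EAGAIN as "retry with the mode the orphan actually holds" and -ENOENT as "request a new lock instead".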
David Teigland
c04fecb4d9 dlm: use rsbtbl as resource directory
Remove the dir hash table (dirtbl), and use
the rsb hash table (rsbtbl) as the resource
directory.  It has always been an unnecessary
duplication of information.

This improves efficiency by using a single rsbtbl
lookup in many cases where both rsbtbl and dirtbl
lookups were needed previously.

This eliminates the need to handle cases of rsbtbl
and dirtbl being out of sync.

In many cases there will be memory savings because
the dir hash table no longer exists.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16 14:16:19 -05:00
David Teigland
4875647a08 dlm: fixes for nodir mode
The "nodir" mode (statically assign master nodes instead
of using the resource directory) has always been highly
experimental, and never seriously used.  This commit
fixes a number of problems, making nodir much more usable.

- Major change to recovery: recover all locks and restart
  all in-progress operations after recovery.  In some
  cases it's not possible to know which in-progress locks
  to recover, so recover all.  (Most require recovery
  in nodir mode anyway since rehashing changes most
  master nodes.)

- Change the way nodir mode is enabled, from a command
  line mount arg passed through gfs2, into a sysfs
  file managed by dlm_controld, consistent with the
  other config settings.

- Allow recovering MSTCPY locks on an rsb that has not
  yet been turned into a master copy.

- Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages
  from a previous, aborted recovery cycle.  Base this
  on the local recovery status not being in the state
  where any nodes should be sending LOCK messages for the
  current recovery cycle.

- Hold rsb lock around dlm_purge_mstcpy_locks() because it
  may run concurrently with dlm_recover_master_copy().

- Maintain highbast on process-copy lkb's (in addition to
  the master as is usual), because the lkb can switch
  back and forth between being a master and being a
  process copy as the master node changes in recovery.

- When recovering MSTCPY locks, flag rsb's that have
  non-empty convert or waiting queues for granting
  at the end of recovery.  (Rename flag from LOCKS_PURGED
  to RECOVER_GRANT and similar for the recovery function,
  because it's not only resources with purged locks
  that need a grant attempt.)

- Replace a couple of unnecessary assertion panics with
  error messages.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-05-02 14:15:27 -05:00
David Teigland
6d40c4a708 dlm: improve error and debug messages
Change some existing error/debug messages to
collect more useful information, and add
some new error/debug messages to address
recently found problems.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-04-26 15:41:46 -05:00
David Teigland
7210cb7a72 dlm: fix slow rsb search in dir recovery
The function used to find an rsb during directory
recovery was searching the single linear list of
rsb's.  This wasted a lot of time compared to
using the standard hash table to find the rsb.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-03-08 14:46:30 -06:00
David Teigland
c6ff669bac dlm: delayed reply message warning
Add an option (disabled by default) to print a warning message
when a lock has been waiting a configurable amount of time for
a reply message from another node.  This is mainly for debugging.

Signed-off-by: David Teigland <teigland@redhat.com>
2011-04-01 14:19:06 -05:00
Adrian Bunk
170e19ab29 dlm: make dlm_print_rsb() static
dlm_print_rsb() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-04-21 11:18:01 -05:00
Al Viro
eef7d739c2 dlm: dlm_process_incoming_buffer() fixes
* check that length is large enough to cover the non-variable part of message or
  rcom resp. (after checking that it's large enough to cover the header, of
  course).

* kill more pointless casts

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-04 01:22:42 -06:00
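The two-stage length check described in eef7d739c2 can be sketched as below. The sizes, command codes and the position of the command byte are invented for the example and do not reflect the real DLM wire format; length_ok() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes and command codes, chosen only for illustration. */
enum {
	HDR_SIZE = 16,		/* common header */
	MSG_FIXED_SIZE = 64,	/* header plus fixed part of a message */
	RCOM_FIXED_SIZE = 40	/* header plus fixed part of an rcom */
};

enum {
	CMD_MESSAGE = 1,
	CMD_RCOM = 2
};

/*
 * Validate a received buffer before parsing it. The header is checked
 * first, because the command byte that selects the second check lives
 * inside the header (at offset 0 in this sketch).
 */
static bool length_ok(const unsigned char *buf, size_t len)
{
	assert(buf != NULL);

	if (len < HDR_SIZE)
		return false;

	switch (buf[0]) {
	case CMD_MESSAGE:
		return len >= MSG_FIXED_SIZE;
	case CMD_RCOM:
		return len >= RCOM_FIXED_SIZE;
	default:
		return false;	/* unknown command */
	}
}
```

The order matters: reading the command byte before confirming the header is present would itself be an out-of-bounds read on a short buffer.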
David Teigland
85f0379aa0 dlm: keep cached master rsbs during recovery
To prevent the master of an rsb from changing rapidly, an unused rsb is kept
on the "toss list" for a period of time to be reused.  The toss list was
being cleared completely for each recovery, which is unnecessary.  Much of
the benefit of the toss list can be maintained if nodes keep rsb's in their
toss list that they are the master of.  These rsb's need to be included
when the resource directory is rebuilt during recovery.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:43 -06:00
David Teigland
c36258b592 [DLM] block dlm_recv in recovery transition
Introduce a per-lockspace rwsem that's held in read mode by dlm_recv
threads while working in the dlm.  This allows dlm_recv activity to be
suspended when the lockspace transitions to, from and between recovery
cycles.

The specific bug prompting this change is one where an in-progress
recovery cycle is aborted by a new recovery cycle.  While dlm_recv was
processing a recovery message, the recovery cycle was aborted and
dlm_recoverd began cleaning up.  dlm_recv decremented recover_locks_count
on an rsb after dlm_recoverd had reset it to zero.  This is fixed by
suspending dlm_recv (taking write lock on the rwsem) before aborting the
current recovery.

The transitions to/from normal and recovery modes are simplified by using
this new ability to block dlm_recv.  The switch from normal to recovery
mode means dlm_recv goes from processing locking messages, to saving them
for later, and vice versa.  Races are avoided by blocking dlm_recv when
setting the flag that switches between modes.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:56:38 +01:00
David Teigland
8b4021fa43 [DLM] canceling deadlocked lock
Add a function that can be used through libdlm by a system daemon to cancel
another process's deadlocked lock.  A completion ast with EDEADLK is returned
to the process waiting for the lock.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:54 +01:00
David Teigland
d7db923ea4 [DLM] dlm_device interface changes [3/6]
Change the user/kernel device interface used by libdlm:
- Add ability for userspace to check the version of the interface.  libdlm
  can now adapt to different versions of the kernel interface.
- Increase the size of the flags passed in a lock request so all possible
  flags can be used from userspace.
- Add an opaque "xid" value for each lock.  This "transaction id" will be
  used later to associate locks with each other during deadlock detection.
- Add a "timeout" value for each lock.  This is used along with the
  DLM_LKF_TIMEOUT flag.

Also, remove a fragment of unused code in device_read().

This patch requires updating libdlm which is backward compatible with
older kernels.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:36 +01:00
David Teigland
3ae1acf93a [DLM] add lock timeouts and warnings [2/6]
New features: lock timeouts and time warnings.  If the DLM_LKF_TIMEOUT
flag is set, then the request/conversion will be canceled after waiting
the specified number of centiseconds (specified per lock).  This feature
is only available for locks requested through libdlm (can be enabled for
kernel dlm users if there's a use for it.)

If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then
a warning message will be sent to userspace (using genetlink) after a
request/conversion has been waiting for a given number of centiseconds
(configurable per node).  The time warnings will be used in the future
to do deadlock detection in userspace.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:33 +01:00
David Teigland
85e86edf95 [DLM] block scand during recovery [1/6]
Don't let dlm_scand run during recovery since it may try to do a resource
directory removal while the directory nodes are changing.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:31 +01:00
David Teigland
72c2be776b [DLM] interface for purge (2/2)
Add code to accept purge commands from userland.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:12 +01:00
David Teigland
a345da3e8f [DLM] dump rsb and locks on assert
Introduce new function dlm_dump_rsb() to call within assertions instead of
dlm_print_rsb().  The new function dumps info about all locks on the rsb
in addition to rsb details.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-08-21 09:50:09 -04:00
David Teigland
597d0cae0f [DLM] dlm: user locks
This changes the way the dlm handles user locks.  The core dlm is now
aware of user locks so they can be dealt with more efficiently.  There is
no more dlm_device module which previously managed its own duplicate copy
of every user lock.

Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-07-13 09:25:34 -04:00
David Teigland
97a35d1e5f [DLM] fix grant_after_purge softlockup
In dlm_grant_after_purge() we were holding a hash table read_lock while
calling put_rsb() which potentially removes the rsb from the hash table,
taking the same lock in write.  Fix this by flagging rsb's ahead of time
that have been purged.  Then iteratively read_lock the hash table, find a
flagged rsb, unlock, process rsb.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-02 13:34:03 -04:00
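The pattern described in 97a35d1e5f (flag ahead of time, then repeatedly lock, find one flagged entry, unlock, process) can be sketched in user space as follows. All names are hypothetical, the table is a flat array instead of a hash table, and a reader counter stands in for the rwlock so that the "no lock held while processing" rule can be asserted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rsb with the "purged" flag set ahead of time. */
struct rsb_sketch {
	bool purged;
	bool processed;
};

struct table_sketch {
	struct rsb_sketch *rsbs;
	size_t count;
	int readers;	/* stands in for the hash table rwlock */
};

/* Find one flagged rsb under the read lock and clear its flag. */
static struct rsb_sketch *find_flagged(struct table_sketch *t)
{
	struct rsb_sketch *found = NULL;
	size_t i;

	t->readers++;			/* read_lock */
	for (i = 0; i < t->count; i++) {
		if (t->rsbs[i].purged) {
			t->rsbs[i].purged = false;
			found = &t->rsbs[i];
			break;
		}
	}
	t->readers--;			/* read_unlock */
	return found;
}

/* Processing may need the write lock, so no read lock may be held here. */
static void process_rsb(struct table_sketch *t, struct rsb_sketch *r)
{
	assert(t->readers == 0);
	r->processed = true;
}

/* Process every flagged rsb; returns how many were handled. */
static size_t grant_after_purge_sketch(struct table_sketch *t)
{
	struct rsb_sketch *r;
	size_t handled = 0;

	assert(t != NULL);

	while ((r = find_flagged(t)) != NULL) {
		process_rsb(t, r);
		handled++;
	}
	return handled;
}
```

Each pass restarts the search from the beginning, which costs extra iteration but guarantees the table lock is never held across the call that may modify the table.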
David Teigland
901359256b [DLM] Update DLM to the latest patch level
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steve Whitehouse <swhiteho@redhat.com>
2006-01-20 08:47:07 +00:00
David Teigland
e7fd41792f [DLM] The core of the DLM for GFS2/CLVM
This is the core of the distributed lock manager which is required
to use GFS2 as a cluster filesystem. It is also used by CLVM and
can be used as a standalone lock manager independently of either
of these two projects.

It implements VAX-style locking modes.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steve Whitehouse <swhiteho@redhat.com>
2006-01-18 09:30:29 +00:00