/*
 * Copyright (c) 2000,2005 Silicon Graphics, Inc.
 * All Rights Reserved.
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License as
 * published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it would be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */
#ifndef __XFS_EXTFREE_ITEM_H__
#define __XFS_EXTFREE_ITEM_H__

/* kernel only EFI/EFD definitions */

struct xfs_mount;
struct kmem_zone;

/*
 * Max number of extents in fast allocation path.
 */
#define XFS_EFI_MAX_FAST_EXTENTS	16

/*
 * Define EFI flag bits. Manipulated by set/clear/test_bit operators.
 */
#define XFS_EFI_RECOVERED	1

/*
 * This is the "extent free intention" log item. It is used to log the fact
 * that some extents need to be freed. It is used in conjunction with the
 * "extent free done" log item described below.
 *
 * The EFI is reference counted so that it is not freed prior to both the EFI
 * and EFD being committed and unpinned. This ensures the EFI is inserted into
 * the AIL even in the event of out-of-order EFI/EFD processing. In other words,
 * an EFI is born with two references:
 *
 *	1.) an EFI-held reference to track EFI AIL insertion
 *	2.) an EFD-held reference to track EFD commit
 *
 * On allocation, both references are the responsibility of the caller. Once the
 * EFI is added to and dirtied in a transaction, ownership of reference one
 * transfers to the transaction. The reference is dropped once the EFI is
 * inserted into the AIL or in the event of failure along the way (e.g., commit
 * failure, log I/O error, etc.). Note that the caller remains responsible for
 * the EFD reference under all circumstances to this point. The caller has no
 * means to detect failure once the transaction is committed, however.
 * Therefore, an EFD is required after this point, even in the event of
 * unrelated failure.
 *
 * Once an EFD is allocated and dirtied in a transaction, reference two
 * transfers to the transaction. The EFD reference is dropped once it reaches
 * the unpin handler. Similar to the EFI, the reference also drops in the event
 * of commit failure or log I/O errors. Note that the EFD is not inserted in the
 * AIL, so at this point both the EFI and EFD are freed.
 */
typedef struct xfs_efi_log_item {
	xfs_log_item_t		efi_item;
	atomic_t		efi_refcount;
	atomic_t		efi_next_extent;
	unsigned long		efi_flags;	/* misc flags */
	xfs_efi_log_format_t	efi_format;
} xfs_efi_log_item_t;
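
/*
 * Illustrative sketch only, not code from the XFS sources: one way a caller
 * could be expected to pair the two EFI references in the lifecycle described
 * for this item. The locals (mp, efip, efdp, nextents) are placeholders, and
 * transaction setup, extent logging and error handling are omitted.
 *
 *	efip = xfs_efi_init(mp, nextents);
 *		(the EFI is born holding two references)
 *	... add the EFI to a transaction, dirty it and commit;
 *	    reference one now belongs to that transaction ...
 *	efdp = xfs_efd_init(mp, efip, nextents);
 *	... free the extents, dirty the EFD in a transaction and commit;
 *	    reference two now belongs to that transaction ...
 *
 * Until the EFD is dirtied in a transaction, the caller still owns reference
 * two. xfs_efi_release(), declared in this header, is assumed here to be how
 * a caller would drop a reference it still owns on an error path.
 */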

/*
 * This is the "extent free done" log item. It is used to log
 * the fact that some extents mentioned earlier in an EFI item
 * have been freed.
 */
typedef struct xfs_efd_log_item {
	xfs_log_item_t		efd_item;
	xfs_efi_log_item_t	*efd_efip;
	uint			efd_next_extent;
	xfs_efd_log_format_t	efd_format;
} xfs_efd_log_item_t;

/*
 * Max number of extents in fast allocation path.
 */
#define XFS_EFD_MAX_FAST_EXTENTS	16

extern struct kmem_zone	*xfs_efi_zone;
extern struct kmem_zone	*xfs_efd_zone;

xfs_efi_log_item_t	*xfs_efi_init(struct xfs_mount *, uint);
xfs_efd_log_item_t	*xfs_efd_init(struct xfs_mount *, xfs_efi_log_item_t *,
				      uint);
int			xfs_efi_copy_format(xfs_log_iovec_t *buf,
					    xfs_efi_log_format_t *dst_efi_fmt);
void			xfs_efi_item_free(xfs_efi_log_item_t *);
void			xfs_efi_release(struct xfs_efi_log_item *);

#endif /* __XFS_EXTFREE_ITEM_H__ */