That is necessary in case a connection does not have a volume 0
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
In the context of drbd-8.4 it no longer makes sense to
dissalow that.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Took the chance and converted tconn_process_done_ee() to use
idr_for_each_entry()
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This function was used to broadcast the (leading part of the)
bio payload in case we see a data integrity error. It could be received
from userland with the drbdsetup events subcommand,
to have a peek into the payload that caused the checksum mismatch,
and guess from there what may have caused the mismatch,
mainly to guess wether it was modification of in-flight data,
or data corruption by broken hardware or software bugs.
Meanwhile we support bios that are larger than the maximum payload a
netlink datagram can carry.
And we have means to reliably detect modification of in-flight data by
calculating, and comparing, the checksum before and after sendmsg.
There is no need to carry this around anymore.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Don't rely on availability of bios from the global fs_bio_set,
we should use our own bio_set for meta data IO.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This commit got it wrong:
drbd: Make the peer_seq updating code more obvious
Make it more clear that update_peer_seq() is supposed to wake up the
seq_wait queue whenever the sequence number changes.
We don't need to wake up everytime we receive a sequence number
that is _different_ from our currently stored "newest" sequence number,
but only if we receive a sequence number _newer_ than what we already
have, when we actually change mdev->peer_seq.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
* Moved CONFIG_PENDING and DEVICE_DYING from mdev to tconn.
* Renamed drbd_reconfig_start() and drbd_reconfig_done() to
conn_reconfig_start() and conn_reconfig_done().
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Instead of artificially enlarging the command decoding arrays to
P_MAX_CMD entries, check if an index is within the valid range using the
ARRAY_SIZE() macro.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The previous algorithm for dealing with overlapping concurrent writes
was generating unnecessary warnings for scenarios which could be
legitimate, and did not always handle partially overlapping requests
correctly. Improve it algorithm as follows:
* While local or remote write requests are in progress, conflicting new
local write requests will be delayed (commit 82172f7).
* When a conflict between a local and remote write request is detected,
the node with the discard flag decides how to resolve the conflict: It
will ask its peer to discard conflicting requests which are fully
contained in the local request and retry requests which overlap only
partially. This involves a protocol change.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When the node with the discard flag resolves write conflicts in
dual-primary mode, it may determine that its peer has sent ack packets
on the metadata socket which did not arrive, yet. Wait for the next ack
with ping-timeout instead of a hard-coded 30 seconds.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Commit 9b1e63e changed the concurrent write detection algorithm to only insert
peer requests into write_requests tree after determining that there is no
conflict. With this change, new conflicting local requests could be added
while the algorithm runs, but this case was not handled correctly. Instead of
making the algorithm deal with this case, switch back to adding peer requests
to the write_requests tree immediately: this improves fairness.
When a peer request is discarded, remove that request from the write_requests
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
In compatibility mode with old DRBDs, use that as the state_mutex
as well.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The lock they constructed is only taken when the state_mutex
was already taken. It is superficial.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
No longer work callbacks must operate on a mdev. From now on they
can also operate on a tconn.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The drbd_md_sync(mdev) happens in the after state change anyways...
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Make it more clear that update_peer_seq() is supposed to wake up the
seq_wait queue whenever the sequence number changes.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Instead of keeping a separate tree for local and remote write requests
for finding requests and for conflict detection, use the same tree for
both purposes. Introduce a flag to allow distinguishing the two
possible types of entries in this tree.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This flag is set when a processes puts itself to sleep to wait for a
conflicting request to complete.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
These things are only used there.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Now drbd communication with protocol 100 actually works.
Replaced the remaining p_header80 with p_header where we
no longer know which header it is.
In the places where p_header80 is still in use, it is on
purpose, because we know that it is an old style header
there.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The new header layout will only be used if the peer supports
it of course.
For the first packet and the handshake packet the old (h80)
layout is used for compatibility reasons.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
recv_bm_rle_bits() should not make any assumptions abou the layout
of the packet header
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Remove the file name and line number from the syslog messages generated:
we have no duplicate function names, and no function contains the same
assertion more than once.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Factor out duplicate code in got_NegAck().
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Get rid of the ar_id_to_req() and ack_id_to_req() wrappers.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Unify the ar_id_to_req() and ack_id_to_req() functions: make both fail
if the consistency check fails. Move the request lookup code now
duplicated in both functions into its own function.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Move _ar_id_to_req() to drbd_receiver.c and mark it non-inline. Remove
the leading underscores from _ar_id_to_req() and _ack_id_to_req(). Mark
ar_hash_slot() inline.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This is the only place where this function is used. Make it static.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The ID_VACANT definition has become entirely irrelevant by now.
The is_syncer_block_id() macro does not improve the code, so eliminated
it.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Converting the constants happens at compile time.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
DRBD_MAGIC has nothing to do with block ids and the funny values
computed were not actually used, anyway.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If we have an asymetrically congested network, we may send P_PING,
but due to congestion, the corresponding P_PING_ACK would time out,
and we would drop a (congested, but otherwise) healthy connection
("PingAck did not arrive in time.")
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Found these with the help of ispell -l.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
The old (optimistic) implementation could shrink the bio size
on an primary device.
Shrinking the bio size on a primary device is bad. Since there
we might get BIOs with the old (bigger) size shortly after
we published the new size.
The new implementation is more conservative, and eventually
increases the max_bio_size on a primary device (which is valid).
It does so, when it knows the local limit AND the remote limit.
We cache the last seen max_bio_size of the peer in the meta
data, and rely on that, to make the operation of single
nodes more efficient.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
It seems that the real cause of all the issues where that
we did not noticed in drbd_try_connect() when the other
guy closes one socket if the round trip time gets higher
than 100ms. There were that 100ms hard coded!
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If there is no replication traffic within the idle timeout
(ping-int seconds), DRBD will send a P_PING,
and adjust the timeout to ping-timeout.
If there is no P_PING_ACK received within this ping-timeout,
DRBD finally drops the connection, and tries to re-establish it.
To decide which timeout was active, we compared the current timeout
with the ping-timeout, and dropped the connection, if that was the case.
By default, ping-int is 10 seconds, ping-timeout is 500 ms.
Unfortunately, if you configure ping-timeout to be the same as ping-int,
expiry of the idle-timeout had been mistaken for a missing ping ack,
and caused an immediate reconnection attempt.
Fix:
Allow both timeouts to be equal, use a local variable
to store which timeout is active.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Just deal with it more gracefully, if we fail to add even a single page
to an empty bio. We used to BUG_ON() there, but it has been observed in
some Xen deployment, so we need to handle that case more robustly now.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If we fail to send the information that we lost our disk,
we have no connection, and no disk: no access to data anymore.
That is either expected (deconfiguration), or there will be so much
noise in the logs that "Sending state failed" is not useful at all.
Drop it.
If the reason for a shorter than expected receive was a signal,
which we sent because we already decided to disconnect,
these additional log messages are confusing and useless.
This patch follows this pattern:
- dev_warn(DEV, "short read expecting header on sock: r=%d\n", r);
+ if (!signal_pending(current))
+ dev_warn(DEV, "short read expecting header on sock: r=%d\n", r);
Also make them all dev_warn for consistency.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Now that we do no longer in-place endian-swap the bitmap, we allow
selected bitmap operations (testing bits, sometimes even settting bits)
during some bulk operations.
This caused us to hit a lot of FIXME asserts similar to
FIXME asender in drbd_bm_count_bits,
bitmap locked for 'write from resync_finished' by worker
Which now is nonsense: looking at the bitmap is perfectly legal
as long as it is not being resized.
This cosmetic patch defines some flags to describe expectations in finer
detail, so the asserts in e.g. bm_change_bits_to() can be skipped if
appropriate.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
All decisions about sync, sync direction, and wether or not to
allow a connect or attach are based on our set of UUIDs to tag a
data generation.
Log changes to the UUIDs whenever they occur,
logging "new current UUID P:Q:R:S" is more useful
than "Creating new current UUID".
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The "lazy writeout" of cleared bitmap pages happens during resync, and
should happen again once the resync finishes cleanly, or is aborted.
If resync finished cleanly, or was aborted because of peer disk
failure, we trigger the writeout from worker context in the after
state change work.
If resync was aborted because of connection failure, we should not
immediately trigger bitmap writeout, but rather postpone the
writeout to after the connection cleanup happened. We now do it
in the receiver context from drbd_disconnect().
If resync was aborted because of local disk failure, well, there
is nothing to write to anymore.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Protocol A has no P_WRITE_ACKs, but has P_NEG_ACKs.
The master bio might already be completed, therefore the
request is no longer in the collision hash.
=> Do not try to validate block_id as request
In Protocol B we might already have got a P_RECV_ACK
but then get a P_NEG_ACK after wards.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The point is that drbd_disconnect() can be called with a cstate of
WFConnection.
That happens if the user issues "drbdsetup disconnect" while the
drbd_connect() function executes. Then drbdd_init() will call
drbdd(), which in turn will return without receiving any
packets. Then drbdd_init() will end up calling drbd_disconnect()
with a cstate of WFConnection.
Bottom line: This assertion is wrong as it is, and we do not
see value in fixing it. => Removing it.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The test if rs_pending_cnt == 0 was too weak. Using Test for
unacked_cnt == 0 instead. Moved that into the worker.
Since unacked_cnt gets already increased when an P_RS_DATA_REQ
comes in.
Also using a timer to make Ahead -> SyncSource -> Ahead cycles
slower...
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
See also commit from 2009-08-15
"drbd_uuid_compare(): Do not full sync in case a P_SYNC_UUID packet gets lost."
We saw cases where the History UUIDs where not as expected. So the
detection of the special case did not trigger. With the sync UUID
no longer being a random number, but deducible from the previous
bitmap UUID, the detection of this special case becomes more
reliable.
The SyncUUID now is the previous bitmap UUID + 0x1000000000000.
Rule 5a:
Cs = H1p & H1p + Offset = Bp
Connection was lost before SyncUUID Packet came through.
Corrent (peer) UUIDs:
Bp = H1p
H1p = H2p
H2p = 0
Become Sync target.
Rule 7a:
Cp = H1s & H1s + Offset = Bs
Connection was lost before SyncUUID Packet came through.
Correct (own) UUIDs:
Bs = H1s
H1s = H2s
H2s = 0
Become Sync source.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We may not get from SyncSource to Ahead if we have sent some
P_RS_DATA_REPLY packets to the peer and are waiting for
P_WRITE_ACK.
Again, this is not relevant for proper tuned systems, but makes
sure that the not-tuned system does not get diverging bitmaps.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When the sync source node replies to a P_RS_DATA_REQUEST packet
when it is already in ahead mode. I.e. those two packets
crossed each other on the wire, that may lead to diverging
bitmaps.
This never happens in a well-tuned-system. In a well-tuned-
system the resync controller has reduced the resync speed
to zero long before we got into ahead-mode.
But we have to be prepared for the not-well-tuned-system
of course as well.
Because -> diverging bitmaps = non terminating resync.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We expect to only receive the recently introduced "set out of sync"
packets in specific states. If we receive them in different states, that
may confuse the resync process to the point where it won't terminate, or
think it made negative progress.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The FAULT_ACTIVE macro just wraps the drbd_insert_fault macro for no
apparent reason.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
* C_STARTING_SYNC_S, C_STARTING_SYNC_T In these states the bitmap gets
written to disk. Locking out of app-IO is done by using the
drbd_queue_bitmap_io() and drbd_bitmap_io() functions these days.
It is no longer necessary to lock out app-IO based on the connection
state.
App-IO that may come in after the BITMAP_IO flag got cleared before the
state transition to C_SYNC_(SOURCE|TARGET) does not get mirrored, sets
a bit in the local bitmap, that is already set, therefore changes nothing.
* C_WF_BITMAP_S In this state we send updates (P_OUT_OF_SYNC packets).
With that we make sure they have the same number of bits when going
into the C_SYNC_(SOURCE|TARGET) connection state.
* C_UNCONNECTED: The receiver starts, no need to lock out IO.
* C_DISCONNECTING: in drbd_disconnect() we had a wait_event()
to wait until ap_bio_cnt reaches 0. Removed that.
* C_TIMEOUT, C_BROKEN_PIPE, C_NETWORK_FAILURE
C_PROTOCOL_ERROR, C_TEAR_DOWN: Same as C_DISCONNECTING
* C_WF_REPORT_PARAMS: IO still possible since that is still
like C_WF_CONNECTION.
And we do not need to send barriers in C_WF_BITMAP_S connection state.
Allow concurrent accesses to the bitmap when receiving the bitmap.
Everything gets ORed anyways.
A drbd_free_tl_hash() is in after_state_chg_work(). At that point
all the work items of the last connections must have been processed.
Introduced a call to drbd_free_tl_hash() into drbd_free_mdev()
for paranoia reasons.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We only issue resync requests if there is no significant application IO
going on. = Application IO has higher priority than resnyc IO.
If application IO can not be started because the resync process locked
an resync_lru entry, start the IO operations necessary to release the
lock ASAP.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
In this connection mode, the ahead node no longer replicates
application IO. The behind's disk becomes out dated.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
To ease tracking of bios in some hash tables, we want it to
not cross certain boundaries (128k, used to be 32k).
We limit the maximum bio size using queue parameters.
Historically some defines and variables we use there have been named
max_segment_size, which was misguided. Rename them to max_bio_size,
and use [blk_]queue_max_hw_sectors where appropriate.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
With data-integrity digest enabled, double-check on the sending side
for modifications by upper layers of buffers under write back,
so we can tell it appart from corruption on the "wire".
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Show progressbar and ETA always, with proc_details >= 1 also show the
current sector position for both resync and online-verify on both nodes.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
For partial (resumed) online verify, initialize the resync step marks
once we know what the online verify start sector is.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
For a partial (resumed) online-verify, initialize rs_total not to total
bits, but to number of bits to check in this run, to match the meaning
rs_total has for actual resync.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>