249 Commits

Author SHA1 Message Date
Alexander Aring
6cde210a97 fs: dlm: add helper for init connection
This patch moves the connection structure initialization into its own
function. This avoids having to update the othercon initialization
separately.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-11-10 12:14:20 -06:00
Alexander Aring
19633c7e20 fs: dlm: handle non blocked connect event
The connect manpage says that in non-blocking mode, writability
indicates a successful connection event. This patch handles this event
inside the writability callback. For SCTP we use the blocking connect
functionality, which indicates a successful connect when the function
returns with a successful return value.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-11-10 12:14:20 -06:00
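
The non-blocking connect behaviour described in the commit above can be sketched in userspace. This is only an illustration of the connect(2) semantics, not the kernel code: the helper name nonblocking_connect and its timeout parameter are hypothetical, and the sketch additionally reads SO_ERROR after the socket becomes writable, as the manpage recommends.

```python
import errno
import os
import select
import socket


def nonblocking_connect(address, timeout=5.0):
    """Start a non-blocking TCP connect and wait until it completes.

    Hypothetical helper: a userspace model, not the dlm lowcomms code.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)

    # connect_ex() returns an errno value instead of raising.
    err = sock.connect_ex(address)
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        raise OSError(err, os.strerror(err))

    # Writability signals that the connect attempt has completed.
    _, writable, _ = select.select([], [sock], [], timeout)
    if not writable:
        sock.close()
        raise TimeoutError("connect did not complete in time")

    # SO_ERROR tells us whether the completed attempt succeeded.
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        sock.close()
        raise OSError(err, os.strerror(err))
    return sock
```

A blocking connect, as used for SCTP in the commit, needs none of this: the call itself returns once the connection is established or has failed.
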
Alexander Aring
53a5edaa05 fs: dlm: flush othercon at close
This patch ensures we also flush the othercon writequeue when a lowcomms
close occurs.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-11-10 12:14:20 -06:00
Alexander Aring
692f51c8cb fs: dlm: add get buffer error handling
This patch adds error handling to the get buffer functionality for the
case where the user requests a buffer length larger than the internal
buffer allocator can provide. This should never happen because the
specific handling is decided at compile time, but it will warn if
somebody forgets to handle this limitation correctly.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-11-10 12:14:20 -06:00
Alexander Aring
5cbec208dc fs: dlm: fix proper srcu api call
This patch uses call_srcu() instead of call_rcu() because the related
data structure resources are handled under srcu context. I assume the
current code is fine anyway, since free_conn() must only be called when
the related resources are no longer in use. However, this corrects the
overall handling in a srcu context.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-11-10 12:14:20 -06:00
Alexander Aring
4f2b30fd9b fs: dlm: fix race in nodeid2con
This patch fixes a race in nodeid2con where two lookups run in parallel
and both create a connection structure for the same nodeid. Creating a
new connection structure is a rare case, so to keep readers lockless we
just do the lookup again inside the protected area and drop the previous
work if this race happens.

Fixes: a47666eb76 ("fs: dlm: make connection hash lockless")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-10-01 09:25:07 -05:00
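
The re-check pattern described in the commit above (look up without the lock, allocate, then look up again under the lock before inserting) can be modelled as follows. This is a userspace sketch with a plain dict and a threading.Lock standing in for the kernel's hash table and locking; the class name ConnectionTable is hypothetical.

```python
import threading


class ConnectionTable:
    """Hypothetical userspace model of a lookup-or-create table."""

    def __init__(self):
        self._conns = {}
        self._lock = threading.Lock()

    def nodeid2con(self, nodeid):
        # Fast path: readers do not take the lock.
        con = self._conns.get(nodeid)
        if con is not None:
            return con

        # Rare path: prepare a new connection outside the lock.
        new_con = {"nodeid": nodeid}

        with self._lock:
            # Look up again inside the protected area. Another caller
            # may have inserted a connection for this nodeid meanwhile.
            con = self._conns.get(nodeid)
            if con is not None:
                return con  # lost the race, new_con is simply dropped
            self._conns[nodeid] = new_con
            return new_con
```

All concurrent callers for the same nodeid end up with the same object, at the cost of an occasional wasted allocation.
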
Alexander Aring
4798cbbfbd fs: dlm: rework receive handling
This patch reworks the current receive handling of dlm. While trying to
change the send handling to fix reorder issues, I took a look at the
receive handling and simplified it. It works as follows:

Each connection has a preallocated receive buffer with a minimum length of
4096. On receive, the upper layer protocol will process all dlm messages
until there is not enough data left. If there is "leftover" data at the
end of the receive buffer because a dlm message wasn't fully received,
it will be copied to the beginning of the preallocated receive buffer. On
the next receive, more data will be appended to the previous "leftover"
data and processing will begin again.

This removes a lot of code from the current mechanism. Inside the
processing functionality we ensure with a memmove() that the dlm
message is memory aligned. Having a dlm message always start at the
beginning of the buffer avoids some memmove() calls because the src
and dest pointers are the same.

The cluster attribute "buffer_size" takes on a new meaning: it is now the
size of the application layer receive buffer. If this is changed during
runtime the receive buffer will be reallocated. It's important that the
receive buffer is at least as large as the maximum possible dlm message
size, otherwise a received message cannot be placed inside the receive
buffer.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-09-29 14:00:32 -05:00
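
The leftover handling described in the commit above can be sketched in userspace. Everything here is an assumption made for illustration: the class name ReceiveBuffer, the 2-byte length-prefixed framing (the real dlm header is different), and the use of a bytearray in place of the kernel's preallocated buffer.

```python
import struct

# Hypothetical framing: a 2-byte big-endian payload length, then payload.
HEADER = struct.Struct("!H")


class ReceiveBuffer:
    """Userspace model of a fixed receive buffer with leftover handling."""

    def __init__(self, size=4096):
        self.buf = bytearray(size)  # preallocated once
        self.used = 0               # bytes currently held, from offset 0

    def feed(self, data):
        """Append received bytes and return all complete messages."""
        if self.used + len(data) > len(self.buf):
            raise ValueError("receive buffer too small for pending data")
        self.buf[self.used:self.used + len(data)] = data
        self.used += len(data)

        messages = []
        offset = 0
        while self.used - offset >= HEADER.size:
            (length,) = HEADER.unpack_from(self.buf, offset)
            end = offset + HEADER.size + length
            if end > self.used:
                break  # message not fully received yet
            messages.append(bytes(self.buf[offset + HEADER.size:end]))
            offset = end

        # Move the leftover bytes to the beginning of the buffer so the
        # next receive appends directly behind them.
        leftover = self.used - offset
        self.buf[0:leftover] = self.buf[offset:self.used]
        self.used = leftover
        return messages
```

The size check in feed() mirrors the requirement stated in the commit: a buffer smaller than the largest possible message could never hold that message in full.
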
Alexander Aring
3f78cd7d24 fs: dlm: fix mark per nodeid setting
This patch fixes the per nodeid mark configuration so that it is set for
accepted sockets as well. Before this patch only the listen socket mark
value was used for all accepted connections. This patch ensures that
the cluster mark attribute value is always used for all sockets; if a
per nodeid mark value is specified, dlm will use this value for the
specific node.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-09-29 14:00:32 -05:00
Alexander Aring
0461e0db94 fs: dlm: remove lock dependency warning
During my experiments to make dlm robust against the tcpkill application
I sometimes ran into a circular lock dependency warning between
clusters_root.subsys.su_mutex and con->sock_mutex. We don't need to
hold the sock_mutex when getting the mark value, which takes the
clusters_root.subsys.su_mutex. This patch moves the specific handling
to just before the sock_mutex is taken.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-09-29 14:00:32 -05:00
Alexander Aring
7ae0451e2e fs: dlm: use free_con to free connection
This patch uses the free_con() functionality to free the listen connection
if listen fails. It also fixes an issue where a freed resource is still
part of the connection_hash, as hlist_del() is not called in this case.
The only difference is that free_con() handles othercon as well, but
this is never set for the listen connection.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-27 15:59:09 -05:00
Alexander Aring
948c47e9bc fs: dlm: handle possible othercon writequeues
This patch frees possible remaining writequeue entries in the othercon
member of struct connection.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-27 15:59:09 -05:00
Alexander Aring
0de984323a fs: dlm: move free writequeue into con free
This patch just moves the freeing of the struct connection member
writequeue into the function that frees struct connection, instead of
doing two iterations.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-27 15:59:09 -05:00
Alexander Aring
043697f030 fs: dlm: fix dlm_local_addr memory leak
This patch fixes the following memory leak, detected by kmemleak after
unmounting a gfs2 filesystem, which removed the last lockspace:

unreferenced object 0xffff9264f4f48f00 (size 128):
  comm "mount", pid 425, jiffies 4294690253 (age 48.159s)
  hex dump (first 32 bytes):
    02 00 52 48 c0 a8 7a fb 00 00 00 00 00 00 00 00  ..RH..z.........
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<0000000067a34940>] kmemdup+0x18/0x40
    [<00000000c935f9ab>] init_local+0x4c/0xa0
    [<00000000bbd286ef>] dlm_lowcomms_start+0x28/0x160
    [<00000000a86625cb>] dlm_new_lockspace+0x7e/0xb80
    [<000000008df6cd63>] gdlm_mount+0x1cc/0x5de
    [<00000000b67df8c7>] gfs2_lm_mount.constprop.0+0x1a3/0x1d3
    [<000000006642ac5e>] gfs2_fill_super+0x717/0xba9
    [<00000000d3ab7118>] get_tree_bdev+0x17f/0x280
    [<000000001975926e>] gfs2_get_tree+0x21/0x90
    [<00000000561ce1c4>] vfs_get_tree+0x28/0xc0
    [<000000007fecaf63>] path_mount+0x434/0xc00
    [<00000000636b9594>] __x64_sys_mount+0xe3/0x120
    [<00000000cc478a33>] do_syscall_64+0x33/0x40
    [<00000000ce9ccf01>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-27 15:59:09 -05:00
Alexander Aring
a47666eb76 fs: dlm: make connection hash lockless
There are some problems with the connections_lock. During my
experiments I sometimes saw circular dependencies with sock_lock.
The reason might be code paths which run nodeid2con() before
or after sock_lock is acquired.

Another issue is missing locks in the for_conn() iteration. Maybe this
works fine because for_conn() is running in a context where
connection_hash cannot be manipulated by others anymore.

However this patch changes the connection_hash to be protected by
sleepable rcu. The hotpath function __find_con() is implemented
lockless as it is only a reader of connection_hash and this hopefully
fixes the circular locking dependencies. The iteration for_conn() will
still call some sleepable functionality, that's why we use sleepable rcu
in this case.

This patch removes the kmemcache functionality as I think I need to
do some free() functionality via call_rcu(). However, allocation time
isn't an issue here. The dlm_allow_con will no longer be protected by a
lock, as I think it's enough to just set it and flush the workqueues
afterwards.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-27 15:59:09 -05:00
Alexander Aring
aa7ab1e208 fs: dlm: synchronize dlm before shutdown
This patch moves the dlm workqueue synchronization before the shutdown
handling. The patch just flushes all pending work before starting to
shut down the connection. At least for the send_workqueue we should flush
the workqueue to make sure there is no new connection handling going on,
as the dlm_allow_conn switch is set to false beforehand.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-27 15:59:09 -05:00
Alexander Aring
055923bf6b fs: dlm: implement tcp graceful shutdown
During my code inspection I saw that there is no implementation of a
graceful shutdown for tcp. This patch introduces a graceful shutdown for
tcp connections. The shutdown is implemented synchronously, as
dlm_lowcomms_stop() is called to end all dlm communication. After shutdown
is done, a lot of flush and closing functionality will be called. However
I don't see a problem with that.

The waitqueue for synchronizing the shutdown has a timeout of 10 seconds;
on timeout a force close will be executed.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-06 10:30:54 -05:00
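
The shutdown-with-timeout idea in the commit above can be sketched in userspace. This is an illustration under assumptions, not the dlm implementation: the helper name graceful_shutdown is hypothetical, select() stands in for the kernel waitqueue, and "graceful" is taken to mean that the peer's end-of-stream was seen before the timeout.

```python
import select
import socket
import time


def graceful_shutdown(sock, timeout=10.0):
    """Half-close sock, wait for the peer's EOF, then close.

    Returns True if the peer's EOF arrived before the timeout and
    False if the socket had to be force closed. Hypothetical helper.
    """
    # Tell the peer we are done sending; receiving is still possible.
    sock.shutdown(socket.SHUT_WR)

    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        readable, _, _ = select.select([sock], [], [], remaining)
        if not readable:
            break  # timed out while waiting
        if sock.recv(4096) == b"":
            sock.close()  # peer finished as well: graceful close
            return True
        # Otherwise late data arrived; discard it and keep waiting.

    sock.close()  # timeout: force close
    return False
```

With the 10 second default, a peer that never answers delays the caller by at most that long before the socket is closed anyway.
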
Alexander Aring
ba3ab3ca68 fs: dlm: change handling of reconnects
This patch changes the handling of reconnects. At first we only close
the connection related to the communication failure. If we get a new
connection for an already existing connection we close the existing
connection and take the new one.

This patch significantly improves the stability of tcp connections while
running "tcpkill -9 -i $IFACE port 21064" while generating a lot of dlm
messages, e.g. on a gfs2 mount with many files. My test setup shows that a
deadlock is "more" unlikely. Before this patch I always hit a deadlock
within 5 seconds. After this patch my observation is that it's more
likely to survive 5 seconds and more, but a deadlock still occurs after
a certain time. My guess is that there are still "segments" inside the
tcp writequeue or retransmit queue which get dropped when receiving a
tcp reset [1]. This is hard to reproduce because the right messages need
to be inside these queues, which might even happen in the first 5 seconds
with this patch.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/tcp_input.c?h=v5.8-rc6#n4122

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-06 10:30:54 -05:00
Alexander Aring
0ea47e4d21 fs: dlm: don't close socket on invalid message
This patch doesn't close sockets when an invalid dlm message is
received. The connection will probably reconnect anyway. Not closing
the connection reduces the number of possible failures. As we don't
have a different strategy to react to such a scenario, just keep the
connection going and ignore the message.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-06 10:30:53 -05:00
Alexander Aring
9c9f168f5b fs: dlm: set skb mark per peer socket
This patch adds support to set the skb mark value for the DLM tcp and
sctp socket per peer. The mark value will be offered as per comm value
of configfs. At creation time of the peer socket it will be set as
socket option.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-06 10:30:52 -05:00
Alexander Aring
a5b7ab6352 fs: dlm: set skb mark for listen socket
This patch adds support to set the skb mark value for the DLM listen
tcp and sctp sockets. The mark value will be offered as cluster
configuration. At creation time of the listen socket it will be set as
socket option.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-06 10:30:51 -05:00
Christoph Hellwig
c0425a4249 net: add a new bind_add method
The SCTP protocol allows binding multiple addresses to a socket.  That
feature is currently only exposed as a socket option.  Add a bind_add
method to struct proto that allows binding additional addresses, and
switch the dlm code to use the method instead of going through the
socket option from kernel space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-29 13:10:39 -07:00
Christoph Hellwig
40ef92c6ec sctp: add sctp_sock_set_nodelay
Add a helper to directly set the SCTP_NODELAY sockopt from kernel space
without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-29 13:10:39 -07:00
Christoph Hellwig
12abc5ee78 tcp: add tcp_sock_set_nodelay
Add a helper to directly set the TCP_NODELAY sockopt from kernel space
without going through a fake uaccess.  Cleanup the callers to avoid
pointless wrappers now that this is a simple function call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28 11:11:45 -07:00
Christoph Hellwig
26cfabf9cd net: add sock_set_rcvbuf
Add a helper to directly set the SO_RCVBUFFORCE sockopt from kernel space
without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28 11:11:44 -07:00
Christoph Hellwig
ce3d9544ce net: add sock_set_keepalive
Add a helper to directly set the SO_KEEPALIVE sockopt from kernel space
without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28 11:11:44 -07:00
Christoph Hellwig
76ee0785f4 net: add sock_set_sndtimeo
Add a helper to directly set the SO_SNDTIMEO_NEW sockopt from kernel
space without going through a fake uaccess.  The interface is
simplified to only pass the seconds value, as that is the only
thing needed at the moment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28 11:11:44 -07:00
Christoph Hellwig
b58f0e8f38 net: add sock_set_reuseaddr
Add a helper to directly set the SO_REUSEADDR sockopt from kernel space
without going through a fake uaccess.

For this the iscsi target now has to formally depend on inet to avoid
a mostly theoretical compile failure.  For actual operation it already
did depend on having ipv4 or ipv6 support.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28 11:11:44 -07:00
Christoph Hellwig
0774dc7643 dlm: use the tcp version of accept_from_sock for sctp as well
The only difference between a few missing fixes applied to the SCTP
one is that TCP uses ->getpeername to get the remote address, while
SCTP uses kernel_getsockopt(.. SCTP_PRIMARY_ADDR).  But given that
getpeername is defined to return the primary address for sctp, there
doesn't seem to be any reason for the different way of querying the
peername, or all the code duplication.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27 15:11:33 -07:00
Arnd Bergmann
5311f707b4 dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD
Eliminate one more use of 'struct timeval' from the kernel so
we can eventually remove the definition as well.

The kernel supports the new format with a 64-bit time_t version
of timeval here, so use that instead of the old timeval.

Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-12-18 18:07:31 +01:00
David Windsor
b355516f45 dlm: check if workqueues are NULL before flushing/destroying
If the DLM lowcomms stack is shut down before any DLM
traffic can be generated, flush_workqueue() and
destroy_workqueue() can be called on empty send and/or recv
workqueues.

Insert guard conditionals to only call flush_workqueue()
and destroy_workqueue() on workqueues that are not NULL.

Signed-off-by: David Windsor <dwindsor@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2019-07-11 11:01:58 -05:00
Thomas Gleixner
2522fe45a1 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 193
Based on 1 normalized pattern(s):

  this copyrighted material is made available to anyone wishing to use
  modify copy or redistribute it subject to the terms and conditions
  of the gnu general public license v 2

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 45 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528170027.342746075@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:29:21 -07:00
Deepa Dinamani
45bdc66159 socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixes
SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval
as the time format. struct timeval is not y2038 safe.
The subsequent patches in the series add support for new socket
timeout options with _NEW suffix that will use y2038 safe
data structures. Although the existing struct timeval layout
is sufficiently wide to represent timeouts, because of the way
libc will interpret time_t based on user defined flag, these
new flags provide a way of having a structure that is the same
for all architectures consistently.
Rename the existing options with _OLD suffix forms so that the
right option is enabled for userspace applications according
to the architecture and time_t definition of libc.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: ccaulfie@redhat.com
Cc: deller@gmx.de
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: cluster-devel@redhat.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03 11:17:31 -08:00
David Howells
aa563d7bca iov_iter: Separate type from direction and use accessor functions
In the iov_iter struct, separate the iterator type from the iterator
direction and use accessor functions to access them in most places.

Convert a bunch of places to use switch-statements to access them rather
than chains of bitwise-AND statements.  This makes it easier to add further
iterator types.  Also, this can be more efficient as to implement a switch
of small contiguous integers, the compiler can use ~50% fewer compare
instructions than it has to use bitwise-and instructions.

Further, cease passing the iterator type into the iterator setup function.
The iterator function can set that itself.  Only the direction is required.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
Gang He
da3627c30d dlm: remove O_NONBLOCK flag in sctp_connect_to_sock
We should remove the O_NONBLOCK flag when calling sock->ops->connect()
in the sctp_connect_to_sock() function.
Why?
1. Up to now, the sctp socket connect() function ignores the flag argument,
which means the O_NONBLOCK flag does not take effect, so we should remove
it to avoid confusion (but this is not urgent).
2. In the future, there will be a patch to fix this problem, and then the
flag argument will take effect. The patch has been queued at:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/net/sctp?id=644fbdeacf1d3edd366e44b8ba214de9d1dd66a9
But the O_NONBLOCK flag will make sock->ops->connect() return directly
without any wait time, so the connection will not be established, and the
DLM kernel module will call sock->ops->connect() again and again. The bad
results are that CPU usage is almost 100%, even triggering the soft_lockup
problem if the related configurations are enabled, and the DLM kernel
module also prints lots of messages like:
[Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
[Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
[Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
[Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
The upper application (e.g. the ocfs2 mount command) hangs at new_lockspace();
the whole backtrace is as below:
tb0307-nd2:~ # cat /proc/2935/stack
[<0>] new_lockspace+0x957/0xac0 [dlm]
[<0>] dlm_new_lockspace+0xae/0x140 [dlm]
[<0>] user_cluster_connect+0xc3/0x3a0 [ocfs2_stack_user]
[<0>] ocfs2_cluster_connect+0x144/0x220 [ocfs2_stackglue]
[<0>] ocfs2_dlm_init+0x215/0x440 [ocfs2]
[<0>] ocfs2_fill_super+0xcb0/0x1290 [ocfs2]
[<0>] mount_bdev+0x173/0x1b0
[<0>] mount_fs+0x35/0x150
[<0>] vfs_kern_mount.part.23+0x54/0x100
[<0>] do_mount+0x59a/0xc40
[<0>] SyS_mount+0x80/0xd0
[<0>] do_syscall_64+0x76/0x140
[<0>] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[<0>] 0xffffffffffffffff

So, I think we should remove the O_NONBLOCK flag here, since the DLM kernel
module cannot handle a non-blocking socket in connect() properly.

Signed-off-by: Gang He <ghe@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2018-05-29 10:48:35 -05:00
Gang He
f706d83015 dlm: make sctp_connect_to_sock() return in specified time
When the user sets up a two-ring cluster, the DLM kernel module
automatically selects the SCTP protocol to communicate
between the nodes. There will be a hang of about 5 minutes in the DLM
kernel module if one ring is broken before switching to
the other ring, which will potentially affect the dependent upper
applications, e.g. ocfs2, gfs2, clvm and clustered-MD, etc.
Unfortunately, if the user sets up a two-ring cluster, we cannot
explicitly select TCP as the DLM communication protocol, since
the DLM kernel module only supports the SCTP protocol for multiple
ring clusters.
Based on my investigation, the time is spent in the sock->ops->connect()
function before it returns the ETIMEDOUT(-110) error. Since the O_NONBLOCK
argument to the connect() function does not work here, we should
make the sock->ops->connect() function return within a specified time by
setting the socket SO_SNDTIMEO attribute.

Signed-off-by: Gang He <ghe@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2018-05-02 10:28:35 -05:00
Gang He
b09c603ca4 dlm: fix a clerical error when set SCTP_NODELAY
There is a clerical error when turning off Nagle's algorithm in the
sctp_connect_to_sock() function, which causes turning off
Nagle's algorithm to fail.
After this correction, DLM performance improves noticeably
when using the SCTP protocol.

Signed-off-by: Gang He <ghe@suse.com>
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David Teigland <teigland@redhat.com>
2018-05-02 10:22:25 -05:00
Denys Vlasenko
9b2c45d479 net: make getname() functions return length rather than use int* parameter
Changes since v1:
Added changes in these files:
    drivers/infiniband/hw/usnic/usnic_transport.c
    drivers/staging/lustre/lnet/lnet/lib-socket.c
    drivers/target/iscsi/iscsi_target_login.c
    drivers/vhost/net.c
    fs/dlm/lowcomms.c
    fs/ocfs2/cluster/tcp.c
    security/tomoyo/network.c

Before:
All these functions either return a negative error indicator,
or store length of sockaddr into "int *socklen" parameter
and return zero on success.

"int *socklen" parameter is awkward. For example, if caller does not
care, it still needs to provide on-stack storage for the value
it does not need.

None of the many FOO_getname() functions of various protocols
ever used old value of *socklen. They always just overwrite it.

This change drops this parameter, and makes all these functions, on success,
return length of sockaddr. It's always >= 0 and can be differentiated
from an error.

Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.

rpc_sockname() lost "int buflen" parameter, since its only use was
to be passed to kernel_getsockname() as &buflen and subsequently
not used in any way.

Userspace API is not changed.

    text    data     bss      dec     hex filename
30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
30108109 2633612  873672 33615393 200ee21 vmlinux.o

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: David S. Miller <davem@davemloft.net>
CC: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-bluetooth@vger.kernel.org
CC: linux-decnet-user@lists.sourceforge.net
CC: linux-wireless@vger.kernel.org
CC: linux-rdma@vger.kernel.org
CC: linux-sctp@vger.kernel.org
CC: linux-nfs@vger.kernel.org
CC: linux-x25@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-12 14:15:04 -05:00
Al Viro
c8c7840ea9 dlm: switch to sock_recvmsg()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-12-02 20:37:47 -05:00
tsutomu.owa@toshiba.co.jp
26b41099e7 DLM: fix NULL pointer dereference in send_to_sock()
The writequeue and writequeue_lock members of othercon were not initialized.
If lowcomms_state_change() is called from the network layer, othercon->swork
may be scheduled. In this case, send_to_sock() will cause a NULL pointer
dereference. We avoid this problem by correctly initializing the writequeue
and writequeue_lock members of othercon.

Signed-off-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: Tsutomu Owa <tsutomu.owa@toshiba.co.jp>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-09-25 12:45:21 -05:00
tsutomu.owa@toshiba.co.jp
0aa18464c8 DLM: fix to reschedule rwork
When an error occurs in kernel_recvmsg or kernel_sendpage,
close_connection is called, and receive work is already scheduled,
the receive work is canceled. In that case, the receive work will never
be scheduled again after reconnection, because the CF_READ_PENDING
flag is still set.

Signed-off-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: Tsutomu Owa <tsutomu.owa@toshiba.co.jp>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-09-25 12:45:21 -05:00
tsutomu.owa@toshiba.co.jp
93eaadebe9 DLM: fix to use sk_callback_lock correctly
In the current implementation, we think the mutual exclusion between
the code that sets the callback functions for the connection structure
and the code that refers to the connection structure from the callback
functions was insufficient. We fix this.

Signed-off-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: Tsutomu Owa <tsutomu.owa@toshiba.co.jp>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-09-25 12:45:21 -05:00
tsutomu.owa@toshiba.co.jp
3421fb15be DLM: fix memory leak in tcp_accept_from_sock()
The sk member of the socket generated by sock_create_kern() is overwritten
by ops->accept(). So the previous sk will not be released.
We use kernel_accept() instead of sock_create_kern() and ops->accept().

Signed-off-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: Tsutomu Owa <tsutomu.owa@toshiba.co.jp>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-09-25 12:45:21 -05:00
tsutomu.owa@toshiba.co.jp
173a31fe2b DLM: use CF_CLOSE flag to stop dlm_send correctly
If reconnection fails while executing dlm_lowcomms_stop,
dlm_send will not stop.

Signed-off-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: Tsutomu Owa <tsutomu.owa@toshiba.co.jp>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-09-25 12:45:21 -05:00
tsutomu.owa@toshiba.co.jp
8a4abb0819 DLM: Reanimate CF_WRITE_PENDING flag
CF_WRITE_PENDING flag has been reanimated to make dlm_send stop properly
when running dlm_lowcomms_stop.

Signed-off-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: Tsutomu Owa <tsutomu.owa@toshiba.co.jp>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-09-25 12:45:21 -05:00
tsutomu.owa@toshiba.co.jp
c553e173b0 DLM: close othercon at send/receive error
If an error occurs while sending or receiving and othercon exists,
sending or receiving via othercon may also result in an error.
We fix this by closing othercon in advance as well.

Signed-off-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: Tsutomu Owa <tsutomu.owa@toshiba.co.jp>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-09-25 12:45:21 -05:00
tsutomu.owa@toshiba.co.jp
c7355827b2 DLM: fix to use sock_mutex correctly in xxx_accept_from_sock
In the current implementation, we think the mutual exclusion for
othercon in tcp_accept_from_sock() and sctp_accept_from_sock()
was insufficient. We fix this.

Signed-off-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: Tsutomu Owa <tsutomu.owa@toshiba.co.jp>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-09-25 12:45:21 -05:00
tsutomu.owa@toshiba.co.jp
b2a6662932 DLM: fix race condition between dlm_send and dlm_recv
When kernel_sendpage (in send_to_sock) and kernel_recvmsg
(in receive_from_sock) return errors, close_connection may run in both
at the same time. They may then wait for each other in cancel_work_sync.

Signed-off-by: Tadashi Miyauchi <miayuchi@toshiba-tops.co.jp>
Signed-off-by: Tsutomu Owa <tsutomu.owa@toshiba.co.jp>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-09-25 12:45:21 -05:00
tsutomu.owa@toshiba.co.jp
f0fb83cb92 DLM: fix double list_del()
dlm_lowcomms_stop() was not functioning properly. To be correct, we have
to wait until all processing is finished on send_workqueue and
recv_workqueue.
This problem causes the following issue. The scenario is:

1. dlm_send thread:
    send_to_sock refers con->writequeue
2. main thread:
    dlm_lowcomms_stop calls list_del
3. dlm_send thread:
    send_to_sock calls list_del in writequeue_entry_complete

[ 1925.770305] dlm: canceled swork for node 4
[ 1925.772374] general protection fault: 0000 [#1] SMP
[ 1925.777930] Modules linked in: ocfs2_stack_user ocfs2 ocfs2_nodemanager ocfs2_stackglue dlm fmxnet(O) fmx_api(O) fmx_cu(O) igb(O) kvm_intel kvm irqbypass autofs4
[ 1925.794131] CPU: 3 PID: 6994 Comm: kworker/u8:0 Tainted: G           O    4.4.39 #1
[ 1925.802684] Hardware name: TOSHIBA OX/OX, BIOS OX-P0015 12/03/2015
[ 1925.809595] Workqueue: dlm_send process_send_sockets [dlm]
[ 1925.815714] task: ffff8804398d3c00 ti: ffff88046910c000 task.ti: ffff88046910c000
[ 1925.824072] RIP: 0010:[<ffffffffa04bd158>]  [<ffffffffa04bd158>] process_send_sockets+0xf8/0x280 [dlm]
[ 1925.834480] RSP: 0018:ffff88046910fde0  EFLAGS: 00010246
[ 1925.840411] RAX: dead000000000200 RBX: 0000000000000001 RCX: 000000000000000a
[ 1925.848372] RDX: ffff88046bd980c0 RSI: 0000000000000000 RDI: ffff8804673c5670
[ 1925.856341] RBP: ffff88046910fe20 R08: 00000000000000c9 R09: 0000000000000010
[ 1925.864311] R10: ffffffff81e22fc0 R11: 0000000000000000 R12: ffff8804673c56d8
[ 1925.872281] R13: ffff8804673c5660 R14: ffff88046bd98440 R15: 0000000000000058
[ 1925.880251] FS:  0000000000000000(0000) GS:ffff88047fd80000(0000) knlGS:0000000000000000
[ 1925.889280] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1925.895694] CR2: 00007fff09eadf58 CR3: 00000004690f5000 CR4: 00000000001006e0
[ 1925.903663] Stack:
[ 1925.905903]  ffff8804673c5630 ffff8804673c5620 ffff8804673c5670 ffff88007d219b40
[ 1925.914181]  ffff88046f095800 0000000000000100 ffff8800717a1400 ffff8804673c56d8
[ 1925.922459]  ffff88046910fe60 ffffffff81073db2 00ff880400000000 ffff88007d219b40
[ 1925.930736] Call Trace:
[ 1925.933468]  [<ffffffff81073db2>] process_one_work+0x162/0x450
[ 1925.939983]  [<ffffffff81074459>] worker_thread+0x69/0x4a0
[ 1925.946109]  [<ffffffff810743f0>] ? rescuer_thread+0x350/0x350
[ 1925.952622]  [<ffffffff8107956f>] kthread+0xef/0x110
[ 1925.958165]  [<ffffffff81079480>] ? kthread_park+0x60/0x60
[ 1925.964283]  [<ffffffff8186ab2f>] ret_from_fork+0x3f/0x70
[ 1925.970312]  [<ffffffff81079480>] ? kthread_park+0x60/0x60
[ 1925.976436] Code: 01 00 00 48 8b 7d d0 e8 07 d3 3a e1 45 01 7e 18 45 29 7e 1c 75 ab 41 8b 46 24 85 c0 75 a3 49 8b 16 49 8b 46 08 31 f6 48 89 42 08 <48> 89 10 48 b8 00 01 00 00 00 00 ad de 49 8b 7e 10 49 89 06 66
[ 1925.997791] RIP  [<ffffffffa04bd158>] process_send_sockets+0xf8/0x280 [dlm]
[ 1926.005577]  RSP <ffff88046910fde0>
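
The scenario above can be illustrated with a minimal userspace sketch. It
assumes the usual kernel behaviour that list_del() overwrites the entry's
pointers with poison values; RAX in the trace (dead000000000200) appears to
match the second poison value on x86_64. All demo_* names are hypothetical
and are not part of the DLM code. Instead of faulting, the sketch refuses the
second delete:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_list {
	struct demo_list *next, *prev;
};

/* Illustrative poison values, modelled on the kernel's x86_64 constants.
 * They are only compared against, never dereferenced. */
#define DEMO_POISON1 ((struct demo_list *)(uintptr_t)0xdead000000000100ULL)
#define DEMO_POISON2 ((struct demo_list *)(uintptr_t)0xdead000000000200ULL)

void demo_list_init(struct demo_list *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
void demo_list_add(struct demo_list *entry, struct demo_list *head)
{
	assert(entry != NULL && head != NULL);
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry and poison its pointers. Returns false, without touching
 * the list, if the entry was already deleted (the double list_del case). */
bool demo_list_del_checked(struct demo_list *entry)
{
	if (entry->next == DEMO_POISON1 || entry->prev == DEMO_POISON2)
		return false;
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = DEMO_POISON1;
	entry->prev = DEMO_POISON2;
	return true;
}
```

In the kernel, the second delete is not checked in this way: it dereferences
the poisoned prev pointer, which is consistent with the general protection
fault shown above.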

Signed-off-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: Tsutomu Owa <tsutomu.owa@toshiba.co.jp>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-09-25 12:45:21 -05:00
tsutomu.owa@toshiba.co.jp
988419a9de DLM: fix remove save_cb argument from add_sock()
The save_cb argument is not used, so we remove it.

Signed-off-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: Tsutomu Owa <tsutomu.owa@toshiba.co.jp>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-09-25 12:45:21 -05:00
Bob Peterson
cc661fc934 DLM: Fix saving of NULL callbacks
In a previous patch I noted that accept() often copies the struct
sock (sk) which overwrites the sock callbacks. However, in testing
we discovered that the dlm connection structures (con) are sometimes
deleted and recreated as connections come and go, and since they're
zeroed out by kmem_cache_zalloc, the saved callback pointers are
also initialized to zero. But with today's DLM code, the callbacks
are only saved when a socket is added.

During recovery testing, we discovered a common situation in which
the new con is initialized to zero, then a socket is added after
accept(). In this case, the sock's saved values are all NULL, but
the saved values are wiped out, due to accept(). Therefore, we
don't have a known good copy of the callbacks from which we can
restore.

Since the struct sock callbacks are always good after listen(),
this patch saves the known good values after listen(). These good
values are then used for subsequent restores.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-09-25 12:45:21 -05:00
Bob Peterson
01da24d3fb DLM: Eliminate CF_WRITE_PENDING flag
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-09-25 12:45:21 -05:00
Bob Peterson
61d9102b62 DLM: Eliminate CF_CONNECT_PENDING flag
Before this patch, there was a flag in the con structure that was
used to determine whether or not a connect was needed. The bit was
set here and there, and cleared here and there, so it left some
race conditions: the bit was set, work was queued, then the worker
cleared the bit, allowing someone else to set it while the worker
ran. For the most part, this worked okay, but we got into trouble
if connections were lost and it needed to reconnect.

This patch eliminates the flag in favor of simply checking if we
actually have a sock pointer while protected by the mutex.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-09-25 12:45:21 -05:00
Guoqing Jiang
1c24285372 dlm: use sock_create_lite inside tcp_accept_from_sock
With commit 0ffdaf5b41 ("net/sock: add WARN_ON(parent->sk)
in sock_graft()"), a calltrace happened as follows:

[  457.018340] WARNING: CPU: 0 PID: 15623 at ./include/net/sock.h:1703 inet_accept+0x135/0x140
...
[  457.018381] RIP: 0010:inet_accept+0x135/0x140
[  457.018381] RSP: 0018:ffffc90001727d18 EFLAGS: 00010286
[  457.018383] RAX: 0000000000000001 RBX: ffff880012413000 RCX: 0000000000000001
[  457.018384] RDX: 000000000000018a RSI: 00000000fffffe01 RDI: ffffffff8156fae8
[  457.018384] RBP: ffffc90001727d38 R08: 0000000000000000 R09: 0000000000004305
[  457.018385] R10: 0000000000000001 R11: 0000000000004304 R12: ffff880035ae7a00
[  457.018386] R13: ffff88001282af10 R14: ffff880034e4e200 R15: 0000000000000000
[  457.018387] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[  457.018388] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  457.018389] CR2: 00007fdec22f9000 CR3: 0000000002b5a000 CR4: 00000000000006f0
[  457.018395] Call Trace:
[  457.018402]  tcp_accept_from_sock.part.8+0x12d/0x449 [dlm]
[  457.018405]  ? vprintk_emit+0x248/0x2d0
[  457.018409]  tcp_accept_from_sock+0x3f/0x50 [dlm]
[  457.018413]  process_recv_sockets+0x3b/0x50 [dlm]
[  457.018415]  process_one_work+0x138/0x370
[  457.018417]  worker_thread+0x4d/0x3b0
[  457.018419]  kthread+0x109/0x140
[  457.018421]  ? rescuer_thread+0x320/0x320
[  457.018422]  ? kthread_park+0x60/0x60
[  457.018424]  ret_from_fork+0x25/0x30

Since the new socket created by sock_create_kern sets its
sock by the path:

	sock_create_kern -> __sock_create
			 ->pf->create => inet_create
			 -> sock_init_data

Then WARN_ON is triggered by "con->sock->ops->accept =>
inet_accept -> sock_graft", it also means newsock->sk
is leaked since sock_graft will replace it with a new
sk.

To resolve the issue, we need to use sock_create_lite
instead of sock_create_kern, like commit 0933a578cd
("rds: tcp: use sock_create_lite() to create the accept
socket") did.

Reported-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-08-07 11:23:09 -05:00
David Howells
cdfbabfb2f net: Work around lockdep limitation in sockets that use sockets
Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

 (1) If the pagefault handler decides it needs to read pages from AFS, it
     calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
     creating a call requires the socket lock:

	mmap_sem must be taken before sk_lock-AF_RXRPC

 (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
     binds the underlying UDP socket whilst holding its socket lock.
     inet_bind() takes its own socket lock:

	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

 (3) Reading from a TCP socket into a userspace buffer might cause a fault
     and thus cause the kernel to take the mmap_sem, but the TCP socket is
     locked whilst doing this:

	sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks.  The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace.  This is
a limitation in the design of lockdep.

Fix the general case by:

 (1) Double up all the locking keys used in sockets so that one set are
     used if the socket is created by userspace and the other set is used
     if the socket is created by the kernel.

 (2) Store the kern parameter passed to sk_alloc() in a variable in the
     sock struct (sk_kern_sock).  This informs sock_lock_init(),
     sock_init_data() and sk_clone_lock() as to the lock keys to be used.

     Note that the child created by sk_clone_lock() inherits the parent's
     kern setting.

 (3) Add a 'kern' parameter to ->accept() that is analogous to the one
     passed in to ->create() that distinguishes whether kernel_accept() or
     sys_accept4() was the caller and can be passed to sk_alloc().

     Note that a lot of accept functions merely dequeue an already
     allocated socket.  I haven't touched these as the new socket already
     exists before we get the parameter.

     Note also that there are a couple of places where I've made the accepted
     socket unconditionally kernel-based:

	irda_accept()
	rds_tcp_accept_one()
	tcp_accept_from_sock()

     because they follow a sock_create_kern() and accept off of that.

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal.  I wonder if these should do that so
that they use the new set of lock keys.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 18:23:27 -08:00
Wei Yongjun
26c1ec2fe4 dlm: fix error return code in sctp_accept_from_sock()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2016-10-24 10:01:51 -05:00
Bob Peterson
d2fee58a3b dlm: remove lock_sock to avoid scheduling while atomic
Before this patch, functions save_callbacks and restore_callbacks
called function lock_sock and release_sock to prevent other processes
from messing with the struct sock while the callbacks were saved and
restored. However, function add_sock calls write_lock_bh prior to
calling save_callbacks, which disables preemption. So the call to
lock_sock would try to schedule when we can't schedule.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2016-10-19 11:00:03 -05:00
Bob Peterson
3735b4b9f1 dlm: don't save callbacks after accept
When DLM calls accept() on a socket, the comm code copies the sk
after we've saved its callbacks. Afterward, it calls add_sock which
saves the callbacks a second time. Since the error reporting function
lowcomms_error_report calls the previous callback too, this results
in a recursive call to itself. This patch adds a new parameter to
function add_sock to tell whether to save the callbacks. Function
tcp_accept_from_sock (and its sctp counterpart) then calls it with
false to avoid the recursion.
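
A minimal userspace sketch of the recursion described above, assuming a
simplified socket structure with a single callback. The demo_* names are
hypothetical stand-ins and do not reproduce the lowcomms code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_sock {
	void (*error_report)(struct demo_sock *sk);       /* active callback */
	void (*saved_error_report)(struct demo_sock *sk); /* original callback */
	int default_calls;
	int dlm_calls;
};

/* Stand-in for the network stack's original error handler. */
void demo_default_error_report(struct demo_sock *sk)
{
	sk->default_calls++;
}

/* Stand-in for the DLM handler: do DLM work, then chain to the original. */
void demo_lowcomms_error_report(struct demo_sock *sk)
{
	sk->dlm_calls++;
	/* If the saved pointer were this function, the call below would
	 * recurse without end; that is the bug the commit describes. */
	assert(sk->saved_error_report != demo_lowcomms_error_report);
	sk->saved_error_report(sk);
}

/* Install the DLM handler; save the previous one only when asked to. */
void demo_add_sock(struct demo_sock *sk, bool save_cb)
{
	if (save_cb)
		sk->saved_error_report = sk->error_report;
	sk->error_report = demo_lowcomms_error_report;
}
```

An accepted socket that starts as a copy of the listening one already carries
the DLM handler as its active callback, so calling demo_add_sock() on it with
save_cb set to true would store the DLM handler as the "original". Passing
false keeps the good saved value.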

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2016-10-19 11:00:03 -05:00
Marcelo Ricardo Leitner
3a8db79889 dlm: free workqueues after the connections
After backporting commit ee44b4bc05 ("dlm: use sctp 1-to-1 API")
series to a kernel with an older workqueue which didn't use RCU yet, it
was noticed that we are freeing the workqueues in dlm_lowcomms_stop()
too early as free_conn() will try to access that memory for canceling
the queued works if any.

This issue was introduced by commit 0d737a8cfd; before it, no such
attempt to cancel the queued works was performed, so the issue was
not present.

This patch fixes it by simply inverting the free order.

Cc: stable@vger.kernel.org
Fixes: 0d737a8cfd ("dlm: fix race while closing connections")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2016-10-10 09:54:00 -05:00
Amitoj Kaur Chawla
5c93f56f77 dlm: Use kmemdup instead of kmalloc and memcpy
Replace calls to kmalloc followed by a memcpy with a direct call to
kmemdup.

The Coccinelle semantic patch used to make this change is as follows:
@@
expression from,to,size,flag;
statement S;
@@

-  to = \(kmalloc\|kzalloc\)(size,flag);
+  to = kmemdup(from,size,flag);
   if (to==NULL || ...) S
-  memcpy(to, from, size);

Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2016-06-23 11:55:58 -05:00
Kirill A. Shutemov
09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CACHE_ALIGN definition: we are going to drop it later.

There are a few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Bob Peterson
b81171cb68 DLM: Save and restore socket callbacks properly
This patch fixes the problems with patch b3a5bbfd7.

1. It removes a return statement from lowcomms_error_report
   because it needs to call the original error report in all paths
   through the function.
2. All socket callbacks are saved and restored, not just the
   sk_error_report, and that's done so with proper locking like
   sunrpc does.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2016-02-22 14:02:17 -06:00
Bob Peterson
1a31833d08 DLM: Replace nodeid_to_addr with kernel_getpeername
This patch replaces the call to nodeid_to_addr with a call to
kernel_getpeername. This avoids taking a spinlock because it may
potentially be called from a softirq context.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2016-02-22 14:02:11 -06:00
Eric Dumazet
9cd3e072b0 net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
This patch is a cleanup to make following patch easier to
review.

Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
from (struct socket)->flags to a (struct socket_wq)->flags
to benefit from RCU protection in sock_wake_async()

To ease backports, we rename both constants.

Two new helpers, sk_set_bit(int nr, struct sock *sk)
and sk_clear_bit(int nr, struct sock *sk) are added so that
following patch can change their implementation.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01 15:45:05 -05:00
Bob Peterson
b3a5bbfd78 dlm: print error from kernel_sendpage
Print a dlm-specific error when a socket error occurs
when sending a dlm message.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-27 09:34:47 -05:00
kbuild test robot
18df8a87ba dlm: sctp_accept_from_sock() can be static
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-17 16:23:09 -05:00
Marcelo Ricardo Leitner
00dcffaebf dlm: fix reconnecting but not sending data
There are cases in which lowcomms_connect_sock() is called directly,
which caused the CF_WRITE_PENDING flag to not be set upon reconnect,
especially in the send_to_sock() error handling. In that path, the flag was
already cleared and no further attempt at transmitting would be made.

As dlm tends to connect when it needs to transmit something, it makes
sense to always mark this flag right after the connect.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-17 16:22:21 -05:00
Marcelo Ricardo Leitner
acee4e527d dlm: replace BUG_ON with a less severe handling
BUG_ON() is a severe action for this case, especially now that DLM with
SCTP will use 1 socket per association. Instead, we can just close the
socket on this error condition and return from the function.

Also move the check to an earlier stage as it won't change and thus we
can abort as soon as possible.

Although this issue was reported when still using SCTP with 1-to-many
API, this cleanup wouldn't be that simple back then because we couldn't
close the socket and making sure such event would cease would be hard.
And actually, previous code was closing the association, yet SCTP layer
is still raising the new data event. Probably a bug to be fixed in SCTP.

Reported-by: <tan.hu@zte.com.cn>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-17 16:22:21 -05:00
Marcelo Ricardo Leitner
ee44b4bc05 dlm: use sctp 1-to-1 API
DLM is using 1-to-many API but in a 1-to-1 fashion. That is, it's not
needed but this causes it to use sctp_do_peeloff() to mimic a
kernel_accept() and this causes a symbol dependency on sctp module.

By switching it to 1-to-1 API we can avoid this dependency and also
reduce quite a lot of SCTP-specific code in lowcomms.c.

The caveat is that now DLM won't always use the same src port. It will
choose a random one, just like TCP code. This allows the peers to
attempt simultaneous connections, which now are handled just like for
TCP.

Even more sharing between TCP and SCTP code on DLM is possible, but it
is intentionally left for a later commit.

Note that when running nodes with this commit, all nodes need at least
the early fixes from this patchset, otherwise it will trigger some issues
on old nodes.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-17 16:22:20 -05:00
Marcelo Ricardo Leitner
356344c4c3 dlm: fix not reconnecting on connecting error handling
If we don't clear that bit, lowcomms_connect_sock() will not schedule
another attempt, and no further attempt will be done.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-17 16:22:19 -05:00
Marcelo Ricardo Leitner
0d737a8cfd dlm: fix race while closing connections
When a connection has issues DLM may need to close it.  Therefore we
should also cancel pending workqueues for such connection at that time,
and not just when dlm is not willing to use this connection anymore.

Also, if we don't clear CF_CONNECT_PENDING flag, the error handling
routines won't be able to re-connect as lowcomms_connect_sock() will
check for it.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-17 16:22:19 -05:00
Marcelo Ricardo Leitner
28926a0965 dlm: fix connection stealing if using SCTP
When using SCTP and accepting a new connection, DLM currently validates
if the peer trying to connect to it is one of the cluster nodes, but it
doesn't check if it already has a connection to it or not.

If it already had a connection, it will be overwritten, and the new one
will be used for writes, possibly causing the node to leave the cluster
due to communication breakage.

Still, one could DoS the node by attempting N connections and keeping
them open.

As said, but being explicit, both situations are only triggerable from
other cluster nodes, but are doable with only user-level perms.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-17 16:22:15 -05:00
Eric W. Biederman
eeb1bd5c40 net: Add a struct net parameter to sock_create_kern
This is long overdue, and is part of cleaning up how we allocate kernel
sockets that don't reference count struct net.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 10:50:17 -04:00
Lidong Zhong
883854c545 dlm: keep listening connection alive with sctp mode
The connection struct with nodeid 0 is the listening socket,
not a connection to another node.  The sctp resend function
was not checking that the nodeid was valid (non-zero), so it
would mistakenly get and resend on the listening connection
when nodeid was zero.

Signed-off-by: Lidong Zhong <lzhong@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2014-06-12 10:26:14 -05:00
David S. Miller
676d23690f net: Fix use after free by removing length arg from sk_data_ready callbacks.
Several spots in the kernel perform a sequence like:

	skb_queue_tail(&sk->sk_receive_queue, skb);
	sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up.  So this skb->len access is potentially
to freed up memory.

Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument.  And since nobody actually cared about its
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-11 16:15:36 -04:00
Linus Torvalds
4ba9920e5e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) BPF debugger and asm tool by Daniel Borkmann.

 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

 3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

 4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match.  From Ben Hutchings.

 5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

 6) Implement auto corking in TCP, from Eric Dumazet.  Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

 9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

10) Fix ipv6 router reachability probing, from Jiri Benc.

11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

12) Support tunneling in GRO layer, from Jerry Chu.

13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI.  From Atzm Watanabe.

15) New "Heavy Hitter" qdisc, from Terry Lam.

16) Significantly improve the IPSEC support in pktgen, from Fan Du.

17) Allow ipv4 tunnels to cache routes, just like sockets.  From Tom
    Herbert.

18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

20) Key TCP metrics blobs also by source address, not just destination
    address.  From Christoph Paasch.

21) Support 10G in generic phylib.  From Andy Fleming.

22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided.  From Tom Herbert.

The wireless and netfilter folks have been busy little bees too.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
  net/cxgb4: Fix referencing freed adapter
  ipv6: reallocate addrconf router for ipv6 address when lo device up
  fib_frontend: fix possible NULL pointer dereference
  rtnetlink: remove IFLA_BOND_SLAVE definition
  rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
  qlcnic: update version to 5.3.55
  qlcnic: Enhance logic to calculate msix vectors.
  qlcnic: Refactor interrupt coalescing code for all adapters.
  qlcnic: Update poll controller code path
  qlcnic: Interrupt code cleanup
  qlcnic: Enhance Tx timeout debugging.
  qlcnic: Use bool for rx_mac_learn.
  bonding: fix u64 division
  rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
  sfc: Use the correct maximum TX DMA ring size for SFC9100
  Add Shradha Shah as the sfc driver maintainer.
  net/vxlan: Share RX skb de-marking and checksum checks with ovs
  tulip: cleanup by using ARRAY_SIZE()
  ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
  net/cxgb4: Don't retrieve stats during recovery
  ...
2014-01-25 11:17:34 -08:00
wangweidong
048ed4b626 sctp: remove macros sctp_{lock|release}_sock
{lock|release}_sock were redefined as sctp_{lock|release}_sock for user space friendly
code which we haven't used in years, so remove them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:41:36 -08:00
Dongmao Zhang
ece35848c1 dlm: set zero linger time on sctp socket
The recovery time for a failed node was taking a long
time because the failed node could not perform the full
shutdown process.  Removing the linger time speeds this
up.  The dlm does not care what happens to messages to
or from the failed node.
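
As a userspace illustration of what a zero linger time means, the sketch
below sets SO_LINGER with a zero timeout through the standard sockets API.
demo_set_zero_linger is a hypothetical helper; the kernel patch itself
operates on the in-kernel socket and is not reproduced here:

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable SO_LINGER with a zero timeout, so that close() does not wait for
 * queued data to be delivered. Returns 0 on success, -1 on error (errno is
 * set by setsockopt). */
int demo_set_zero_linger(int fd)
{
	struct linger lg = { .l_onoff = 1, .l_linger = 0 };

	assert(fd >= 0);
	return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

With this option set, closing the socket typically aborts the connection
rather than performing the full shutdown exchange, which is the trade-off the
commit accepts for messages to or from a failed node.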

Signed-off-by: Dongmao Zhang <dmzhang@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2013-12-16 09:52:34 -06:00
Wei Yongjun
06452eb053 dlm: remove duplicated include from lowcomms.c
Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-19 09:52:09 -05:00
Mike Christie
86e92ad299 dlm: disable nagle for SCTP
For TCP we disable Nagle and I cannot think of why it would be needed
for SCTP. When disabled it seems to improve dlm_lock operations like it
does for TCP.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-14 13:07:11 -05:00
Mike Christie
5d6898714f dlm: retry failed SCTP sends
Currently if a SCTP send fails, we lose the data we were trying
to send because the writequeue_entry is released when we do the send.
When this happens other nodes will then hang waiting for a reply.

This adds support for SCTP to retry the send operation.

I also removed the retry limit for SCTP use, because we want
to make sure we try every path during init time and for longer
failures we want to continually retry in case paths come back up
while trying other paths. We will do this until userspace tells us
to stop.
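
The retry idea can be sketched in userspace as follows, assuming a simplified
write queue entry and a pluggable send hook. All demo_* names are
hypothetical; the point is only that the entry stays queued until a send
succeeds:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_entry {
	size_t len;   /* bytes still to send */
	bool queued;  /* true while the entry remains on the write queue */
};

/* Hypothetical transport hook: returns bytes sent, or -1 on failure. */
typedef long (*demo_send_fn)(size_t len);

/* Stub transports used to exercise both outcomes. */
long demo_send_fail(size_t len) { (void)len; return -1; }
long demo_send_all(size_t len)  { return (long)len; }

/* Try to send one entry. The entry is released only after its data has been
 * sent, so a failed attempt leaves it in place for a later retry. */
bool demo_send_entry(struct demo_entry *e, demo_send_fn send)
{
	long ret;

	assert(e->queued);
	ret = send(e->len);
	if (ret < 0)
		return false;   /* keep the entry; the caller may retry */
	e->len -= (size_t)ret;
	if (e->len == 0)
		e->queued = false;
	return true;
}
```

Releasing the entry before the send, as the old behaviour is described above,
would lose the data on failure and leave the peer waiting for a reply.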

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-14 13:07:11 -05:00
Mike Christie
98e1b60ecc dlm: try other IPs when sctp init assoc fails
Currently, if we cannot create an association to the first IP addr
that is added to DLM, the SCTP init assoc code will just retry
the same IP. This patch adds a simple failover scheme where we
will try one of the addresses that was passed into DLM.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-14 13:07:11 -05:00
Mike Christie
b390ca38d2 dlm: clear correct bit during sctp init failure handling
We should be testing and clearing the init pending bit because later
when sctp_init_assoc is recalled it will be checking that it is not set
and set the bit.

We do not want to touch CF_CONNECT_PENDING here because we will queue
swork and process_send_sockets will then call the connect_action function.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-14 13:07:11 -05:00
Mike Christie
e1631d0c48 dlm: set sctp assoc id during setup
sctp_assoc was not getting set so later lookups failed.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-14 13:07:10 -05:00
Mike Christie
efad7e6b1a dlm: clear correct init bit during sctp setup
We were clearing the base con's init pending flags, but the
con for the node was the one with the pending bit set.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-14 13:07:10 -05:00
Daniel Borkmann
1b86643411 net: sctp: introduce uapi header for sctp
This patch introduces an UAPI header for the SCTP protocol,
so that we can facilitate the maintenance and development of
user land applications or libraries, in particular in terms
of header synchronization.

To not break compatibility, some fragments from lksctp-tools'
netinet/sctp.h have been carefully included, while taking care
that neither kernel nor user land breaks, so both compile fine
with this change (for lksctp-tools I tested with the old
netinet/sctp.h header and with a newly adapted one that includes
the uapi sctp header). lksctp-tools smoke test run through
successfully as well in both cases.

Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-09 13:19:39 -04:00
Sasha Levin
b67bfe0d42 hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived
differently from the list ones. The list iterator takes three arguments:

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
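
To show why the extra cursor is unnecessary, here is a minimal userspace
sketch of a three-argument iterator over a singly linked hash list. It is a
simplified stand-in (no pprev pointer, demo_* names are hypothetical) and
relies on the __typeof__ extension available in GCC and Clang:

```c
#include <assert.h>
#include <stddef.h>

struct demo_hlist_node { struct demo_hlist_node *next; };
struct demo_hlist_head { struct demo_hlist_node *first; };

/* Recover the enclosing structure from a pointer to its member. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Like demo_container_of, but maps a NULL node to a NULL entry. */
#define demo_entry_safe(ptr, type, member) \
	((ptr) ? demo_container_of(ptr, type, member) : NULL)

/* Three-argument form: the next node is reached through pos itself,
 * so no separate node cursor is needed. */
#define demo_hlist_for_each_entry(pos, head, member)				\
	for (pos = demo_entry_safe((head)->first, __typeof__(*(pos)), member);	\
	     pos;								\
	     pos = demo_entry_safe((pos)->member.next, __typeof__(*(pos)), member))

struct demo_item {
	int value;
	struct demo_hlist_node node;
};

/* Sum the values of all items on the list. */
int demo_sum(struct demo_hlist_head *head)
{
	struct demo_item *item;
	int sum = 0;

	assert(head != NULL);
	demo_hlist_for_each_entry(item, head, node)
		sum += item->value;
	return sum;
}
```

The call site now has the same shape as list_for_each_entry(pos, head, member).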

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small number of places were using the 'node' parameter; these
 were modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foundation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:24 -08:00
Wei Yongjun
eeee2b5fe1 dlm: remove unused variable in *dlm_lowcomms_get_buffer()
The variable users is initialized but never used
otherwise, so remove the unused variable.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David Teigland <teigland@redhat.com>
2012-11-01 15:27:13 -05:00
Ying Xue
9c5bef5849 dlm: cleanup send_to_sock routine
Remove unnecessary code from the send_to_sock routine.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2012-08-13 10:03:18 -05:00
Ying Xue
4dd40f0cd9 dlm: convert add_sock routine return value type to void
Since add_sock() always returns a success code - 0, its return
value type should be changed from integer to void.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2012-08-10 09:10:10 -05:00
Xue Ying
b4c798cf69 dlm: remove redundant variable assignments
Once tcp_create_listen_sock() has returned successfully, we
will invoke add_sock() immediately. In add_sock(), the 'con'
variable is assigned to 'sk_user_data', meanwhile, the 'sock' is
also set to 'con->sock'. So it's unnecessary to do the same thing
in tcp_create_listen_sock().

Signed-off-by: Xue Ying <ying.xue@windriver.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2012-08-10 09:10:10 -05:00
David Teigland
36b71a8bfb dlm: fix deadlock between dlm_send and dlm_controld
A deadlock sometimes occurs between dlm_controld closing
a lowcomms connection through configfs and dlm_send looking
up the address for a new connection in configfs.

dlm_controld does a configfs rmdir which calls
dlm_lowcomms_close which waits for dlm_send to
cancel work on the workqueues.

The dlm_send workqueue thread has called
tcp_connect_to_sock which calls dlm_nodeid_to_addr
which does a configfs lookup and blocks on a lock
held by dlm_controld in the rmdir path.

The solution here is to save the node addresses within
the lowcomms code so that the lowcomms workqueue does
not need to step through configfs to get a node address.

dlm_controld:
wait_for_completion+0x1d/0x20
__cancel_work_timer+0x1b3/0x1e0
cancel_work_sync+0x10/0x20
dlm_lowcomms_close+0x4c/0xb0 [dlm]
drop_comm+0x22/0x60 [dlm]
client_drop_item+0x26/0x50 [configfs]
configfs_rmdir+0x180/0x230 [configfs]
vfs_rmdir+0xbd/0xf0
do_rmdir+0x103/0x120
sys_rmdir+0x16/0x20

dlm_send:
mutex_lock+0x2b/0x50
get_comm+0x34/0x140 [dlm]
dlm_nodeid_to_addr+0x18/0xd0 [dlm]
tcp_connect_to_sock+0xf4/0x2d0 [dlm]
process_send_sockets+0x1d2/0x260 [dlm]
worker_thread+0x170/0x2a0

Signed-off-by: David Teigland <teigland@redhat.com>
2012-08-08 11:33:35 -05:00
David Teigland
513ef596d4 dlm: prevent connections during shutdown
During lowcomms shutdown, a new connection could possibly
be created, and attempt to use a workqueue that's been
destroyed.  Similarly, during startup, a new connection
could attempt to use a workqueue that's not been set up
yet.  Add a global variable to indicate when new connections
are allowed.

Based on patch by: Christine Caulfield <ccaulfie@redhat.com>

Reported-by: dann frazier <dann.frazier@canonical.com>
Reviewed-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2012-04-26 15:35:38 -05:00
Linus Torvalds
30d73f3752 dlm for 3.4
This set includes one trivial fix, and one simple recovery
 speed up.  Directory recovery can use the standard hash table
 to find resources rather than always searching the linear
 recovery list.

Merge tag 'dlm-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm

Pull dlm updates for 3.4 from David Teigland:
 "This set includes one trivial fix, and one simple recovery speed up.
  Directory recovery can use the standard hash table to find resources
  rather than always searching the linear recovery list."

* tag 'dlm-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: last element of dlm_local_addr[] never used
  dlm: fix slow rsb search in dir recovery
2012-03-21 13:54:22 -07:00
David Teigland
1b189b8889 dlm: last element of dlm_local_addr[] never used
The last element of dlm_local_addr[DLM_MAX_ADDR_COUNT]
was not used because the loop ended at COUNT - 1.
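
A minimal sketch of this kind of off-by-one, assuming a simple counted loop (the actual loop in lowcomms.c is not shown here, and `MAX_ADDR_COUNT` is a hypothetical stand-in for DLM_MAX_ADDR_COUNT):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ADDR_COUNT 3	/* hypothetical stand-in for DLM_MAX_ADDR_COUNT */

/* Bound of COUNT - 1: the last array slot is never visited. */
static size_t slots_visited_buggy(void)
{
	size_t i, n = 0;

	for (i = 0; i < MAX_ADDR_COUNT - 1; i++)
		n++;
	return n;
}

/* Bound of COUNT: every slot, including the last, is visited. */
static size_t slots_visited_fixed(void)
{
	size_t i, n = 0;

	for (i = 0; i < MAX_ADDR_COUNT; i++)
		n++;
	return n;
}
```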

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2012-03-21 09:18:34 -05:00
Benjamin Poirier
2f2d76cc3e dlm: Do not allocate a fd for peeloff
Avoids allocating a fd that a) propagates to every kernel thread and
usermodehelper and b) is not properly released.

References: http://article.gmane.org/gmane.linux.network.drbd/22529
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-08 13:52:09 -08:00
Alexey Dobriyan
4e3fd7a06d net: remove ipv6_addr_copy()
C assignment can handle struct in6_addr copying.
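
As a userspace illustration (not the kernel change itself), plain struct assignment copies all 16 bytes of an address; `copy_matches` is a hypothetical helper name:

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Copy an IPv6 address by assignment and report whether the bytes match. */
static int copy_matches(const struct in6_addr *src)
{
	struct in6_addr dst;

	assert(src != NULL);
	dst = *src;	/* plain struct assignment, no helper needed */
	return memcmp(&dst, src, sizeof(dst)) == 0;
}
```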

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-22 16:43:32 -05:00
Masatake YAMATO
bcaadf5c1a dlm: dump address of unknown node
When the dlm fails to make a network connection to another
node, include the address of the node in the error message.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2011-07-06 16:37:23 -05:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
David Teigland
e43f055a95 dlm: use alloc_workqueue function
Replaces deprecated create_singlethread_workqueue().

Signed-off-by: David Teigland <teigland@redhat.com>
2011-03-10 13:22:34 -06:00
David Teigland
6b155c8fd4 dlm: use single thread workqueues
The recent commit to use cmwq for send and recv threads
dcce240ead introduced problems,
apparently due to multiple workqueue threads.  Single threads
make the problems go away, so return to that until we fully
understand the concurrency issues with multiple threads.

Signed-off-by: David Teigland <teigland@redhat.com>
2011-02-11 16:50:47 -06:00
Namhyung Kim
b9d4105279 dlm: sanitize work_start() in lowcomms.c
create_workqueue() returns NULL on failure rather than an ERR_PTR().
Fix error checking and remove unnecessary variable 'error'.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: David Teigland <teigland@redhat.com>
2010-12-13 13:42:24 -06:00
Bob Peterson
f92c8dd7a0 dlm: reduce cond_resched during send
Calling cond_resched() after every send can unnecessarily
degrade performance.  Go back to an old method of scheduling
after 25 messages.
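
The batching idea can be sketched in userspace as follows. This is an illustration under assumptions, not the lowcomms code: sched_yield() stands in for cond_resched(), and the names `MAX_SEND_MSG_COUNT`, `send_batch` and `noop_send` are hypothetical.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

#define MAX_SEND_MSG_COUNT 25	/* batch size from the commit text; name assumed */

/* Placeholder sender used only to exercise the loop. */
static void noop_send(int i)
{
	(void)i;
}

/* Send n messages, yielding once per full batch; returns the yield count. */
static int send_batch(int n, void (*send_one)(int))
{
	int count = 0, yields = 0, i;

	assert(send_one != NULL);
	for (i = 0; i < n; i++) {
		send_one(i);
		if (++count >= MAX_SEND_MSG_COUNT) {
			sched_yield();	/* userspace stand-in for cond_resched() */
			count = 0;
			yields++;
		}
	}
	return yields;
}
```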

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2010-11-12 11:15:20 -06:00
David Teigland
cb2d45da81 dlm: use TCP_NODELAY
Nagling doesn't help and can sometimes hurt dlm comms.
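
For reference, a userspace sketch of disabling Nagle's algorithm on a TCP socket. The kernel code sets the option on its own in-kernel socket; `tcp_socket_nodelay` is a hypothetical helper for illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket with TCP_NODELAY set; returns the fd or -1. */
static int tcp_socket_nodelay(void)
{
	int one = 1;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```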

Signed-off-by: David Teigland <teigland@redhat.com>
2010-11-12 11:12:55 -06:00
Steven Whitehouse
dcce240ead dlm: Use cmwq for send and receive workqueues
So far as I can tell, there is no reason to use a single-threaded
send workqueue for dlm, since it may need to send to several sockets
concurrently. Both workqueues are set to WQ_MEM_RECLAIM to avoid
any possible deadlocks, WQ_HIGHPRI since locking traffic is highly
latency sensitive (and to avoid a priority inversion wrt GFS2's
glock_workqueue) and WQ_FREEZABLE just in case someone needs to do
that (even though with current cluster infrastructure, it doesn't
make sense as the node will most likely land up ejected from the
cluster) in the future.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: David Teigland <teigland@redhat.com>
2010-11-12 11:08:03 -06:00
David Miller
b36930dd50 dlm: Handle application limited situations properly.
In the normal regime where an application uses non-blocking I/O
writes on a socket, they will handle -EAGAIN and use poll() to
wait for send space.

They don't actually sleep on the socket I/O write.

But kernel level RPC layers that do socket I/O operations directly
and key off of -EAGAIN on the write() to "try again later" don't
use poll(), they instead have their own sleeping mechanism and
rely upon ->sk_write_space() to trigger the wakeup.

So they do effectively sleep on the write(), but this mechanism
alone does not let the socket layers know what's going on.

Therefore they must emulate what would have happened, otherwise
TCP cannot possibly see that the connection is application window
size limited.

Handle this, therefore, like SUNRPC by setting SOCK_NOSPACE and
bumping ->sk_write_pending as needed when we hit the send buffer
limits.

This should make TCP send buffer size auto-tuning and the
->sk_write_space() callback invocations actually happen.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David Teigland <teigland@redhat.com>
2010-11-11 13:05:12 -06:00
Julia Lawall
f70cb33b9c fs/dlm: Drop unnecessary null test
hlist_for_each_entry binds its first argument to a non-null value, and thus
any null test on the value of that argument is superfluous.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
iterator I;
expression x,E,E1,E2;
statement S,S1,S2;
@@

I(x,...) { <...
- (x != NULL) &&
  E
  ...> }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David Teigland <teigland@redhat.com>
2010-08-05 14:23:45 -05:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and tries to put the new include such that its order conforms
  to its surroundings.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   widely available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
David Teigland
573c24c4af dlm: always use GFP_NOFS
Replace all GFP_KERNEL and ls_allocation with GFP_NOFS.
ls_allocation would be GFP_KERNEL for userland lockspaces
and GFP_NOFS for file system lockspaces.

It was discovered that any lockspaces on the system can
affect all others by triggering memory reclaim in the
file system which could in turn call back into the dlm
to acquire locks, deadlocking dlm threads that were
shared by all lockspaces, like dlm_recv.

Signed-off-by: David Teigland <teigland@redhat.com>
2009-11-30 16:34:43 -06:00
David Teigland
6861f35078 dlm: fix socket fd translation
The code to set up sctp sockets was not using the sockfd_lookup()
and sockfd_put() routines to translate an fd to a socket.  The
direct fget and fput calls were resulting in error messages from
alloc_fd().

Also clean up two log messages and remove a third, related to
setting up sctp associations.

Signed-off-by: David Teigland <teigland@redhat.com>
2009-09-30 12:19:44 -05:00
David Teigland
04bedd79a7 dlm: fix lowcomms_connect_node for sctp
The recently added dlm_lowcomms_connect_node() from
391fbdc5d5 does not work
when using SCTP instead of TCP.  The sctp connection code
has nothing to do without data to send.  Check for no data
in the sctp connection code and do nothing instead of
triggering a BUG.  Also have connect_node() do nothing
when the protocol is sctp.

Signed-off-by: David Teigland <teigland@redhat.com>
2009-09-30 12:19:44 -05:00
Paolo Bonzini
1329e3f2c8 dlm: use kernel_sendpage
Using kernel_sendpage() is cleaner and safer than following
sock->ops ourselves.

Signed-off-by: Paolo Bonzini <bonzini@gnu.org>
Signed-off-by: David Teigland <teigland@redhat.com>
2009-08-24 13:18:04 -05:00
Lars Marowsky-Bree
063c4c9963 dlm: fix connection close handling
Closing a connection to a node can create problems if there are
outstanding messages for that node.  The problems include dlm_send
spinning attempting to reconnect, or BUG from tcp_connect_to_sock()
attempting to use a partially closed connection.

To cleanly close a connection, we now first attempt to send any pending
messages, cancel any remaining workqueue work, and flag the connection
as closed to avoid reconnect attempts.

Signed-off-by: Lars Marowsky-Bree <lmb@suse.de>
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2009-08-24 13:13:56 -05:00
Casey Dahlin
b5711b8e5a dlm: fix double-release of socket in error exit path
The last correction to the tcp_connect_to_sock error exit path,
commit a89d63a159, can free an already
freed socket, due to collision with a previous (incomplete) attempt
to fix the same issue, commit 311f6fc77c.

Signed-off-by: Casey Dahlin <cdahlin@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2009-08-18 15:09:24 -05:00
Casey Dahlin
a89d63a159 dlm: free socket in error exit path
In the tcp_connect_to_sock() error exit path, the socket
allocated at the top of the function was not being freed.

Signed-off-by: Casey Dahlin <cdahlin@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2009-07-14 12:28:43 -05:00
David Teigland
748285ccf7 dlm: use more NOFS allocation
Change some GFP_KERNEL allocations to use either GFP_NOFS or
ls_allocation (when available) which the fs sets to GFP_NOFS.
The point is to prevent allocations from going back into the
cluster fs in places where that might lead to deadlock.

Signed-off-by: David Teigland <teigland@redhat.com>
2009-05-15 11:24:59 -05:00
Christine Caulfield
391fbdc5d5 dlm: connect to nodes earlier
Make network connections to other nodes earlier, in the context of
dlm_recoverd.  This avoids connecting to nodes from dlm_send where we
try to avoid allocations which could possibly deadlock if memory reclaim
goes into the cluster fs which may try to do a dlm operation.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2009-05-15 09:34:12 -05:00
Christine Caulfield
5e9ccc372d dlm: replace idr with hash table for connections
Integer nodeids can be too large for the idr code; use a hash
table instead.
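
A rough userspace sketch of a nodeid-keyed hash table with chained buckets. The bucket count, struct layout and function names here are assumptions for illustration, not the lowcomms definitions.

```c
#include <assert.h>
#include <stdlib.h>

#define CONN_HASH_SIZE 32	/* hypothetical bucket count; a power of two */

struct conn {
	int nodeid;
	struct conn *next;	/* chain within one bucket */
};

static struct conn *conn_hash[CONN_HASH_SIZE];

/* Any integer nodeid maps to a bucket, however large it is. */
static unsigned int nodeid_hash(int nodeid)
{
	return (unsigned int)nodeid & (CONN_HASH_SIZE - 1);
}

/* Look up an existing connection; returns NULL if there is none. */
static struct conn *conn_find(int nodeid)
{
	struct conn *c;

	for (c = conn_hash[nodeid_hash(nodeid)]; c; c = c->next)
		if (c->nodeid == nodeid)
			return c;
	return NULL;
}

/* Look up a connection, allocating and linking it if it does not exist. */
static struct conn *conn_get(int nodeid)
{
	struct conn *c = conn_find(nodeid);
	unsigned int b;

	if (c)
		return c;
	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->nodeid = nodeid;
	b = nodeid_hash(nodeid);
	c->next = conn_hash[b];
	conn_hash[b] = c;
	return c;
}
```

Large nodeids cost nothing extra here: they only select a bucket, and collisions are resolved by walking the chain.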

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2009-03-11 12:20:58 -05:00
Joe Perches
2cf12c0bf2 dlm: comment typo fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2009-01-28 12:56:07 -06:00
Joe Perches
44ad532b32 dlm: use ipv6_addr_copy
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2009-01-28 12:56:02 -06:00
Steven Whitehouse
1521848cbb dlm: remove kmap/kunmap
The pages used in lowcomms are not highmem, so kmap is not necessary.

Cc: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-12-23 10:16:01 -06:00
Steven Whitehouse
d6d7b702a3 dlm: fix up memory allocation flags
Use ls_allocation for memory allocations, which a cluster fs sets to
GFP_NOFS.  Use GFP_NOFS for allocations when no lockspace struct is
available.  Taking dlm locks needs to avoid calling back into the
cluster fs because write-out can require taking dlm locks.

Cc: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-12-23 10:15:40 -06:00
Masatake YAMATO
311f6fc77c dlm: release socket on error
It seems that `sock' allocated by sock_create_kern in
tcp_connect_to_sock() of fs/dlm/lowcomms.c is not released if
dlm_nodeid_to_addr returns an error.

Acked-by: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-07-14 13:56:59 -05:00
Marcin Slusarz
0035a4b149 dlm: tcp_connect_to_sock should check for -EINVAL, not EINVAL
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Cc: cluster-devel@redhat.com
Signed-off-by: David Teigland <teigland@redhat.com>
2008-05-19 15:37:27 -05:00
Matthias Kaehlcke
7a936ce71e dlm: convert connections_lock to a mutex
The semaphore connections_lock is used as a mutex.  Convert it to the mutex
API.

Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-05-19 15:37:27 -05:00
Patrick Caulfield
39bd4177dd dlm: close othercons
This patch addresses a problem introduced with the last round of
lowcomms patches where the 'othercon' connections do not get freed when
the DLM shuts down.

This results in the error message
"slab error in kmem_cache_destroy(): cache `dlm_conn': Can't free all
objects"

and the DLM cannot be restarted without a system reboot.

See bz#428119

Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-29 17:17:32 -06:00
Lon Hohberger
6bd8fedaa1 dlm: bind connections from known local address when using TCP
A common problem occurs when multiple IP addresses within the same
subnet are assigned to the same NIC.  If we make a connection attempt to
another address on the same subnet as one of those addresses, the
connection attempt will not necessarily be routed from the address we
want.

In the case of the DLM, the other nodes will quickly drop the connection
attempt, causing problems.

This patch makes the DLM bind to the local address it acquired from the
cluster manager when using TCP prior to making a connection, obviating
the need for administrators to "fix" their systems or use clever routing
tricks.

Signed-off-by: Lon Hohberger <lhh@redhat.com>
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-29 16:44:25 -06:00
David S. Miller
df61c95262 [DLM] lowcomms: Do not muck with sysctl_rmem_max.
Use SO_RCVBUFFORCE instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:11:42 -08:00
Patrick Caulfield
d66f8277f5 [DLM] Make dlm_sendd cond_resched more
Under high recovery loads dlm_sendd can monopolise the CPU and cause soft lockups.

This one extra and one moved cond_resched() make it yield a little more during
such times keeping work moving.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:56:19 +01:00
Patrick Caulfield
61d96be0f4 [DLM] Fix lowcomms socket closing
This patch fixes the slight mess made in lowcomms closing by previous patches
and fixes all sorts of DLM hangs.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:55:39 +01:00
Patrick Caulfield
9e5f2825a8 [DLM] More othercon fixes
The last patch to clean out 'othercon' structures only fixed half the problem.
The attached addresses the other situations too, and fixes bz#238490

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-08-14 10:30:36 +01:00
Patrick Caulfield
01c8cab258 [DLM] zero unused parts of sockaddr_storage
When we build a sockaddr_storage for an IP address, clear the unused parts as
they could be used for node comparisons.

I have seen this occasionally make sctp connections fail.
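
The effect can be sketched in userspace: if two addresses are compared byte for byte, any uninitialised tail of the sockaddr_storage must be cleared first. `make_ipv4_addr` and `addr_equal` are hypothetical helpers, and comparing with memcmp over the whole structure is an assumption made for this illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Build an IPv4 address (host-order inputs) in a fully zeroed storage. */
static void make_ipv4_addr(struct sockaddr_storage *ss, uint32_t addr, uint16_t port)
{
	struct sockaddr_in *in4 = (struct sockaddr_in *)ss;

	assert(ss != NULL);
	memset(ss, 0, sizeof(*ss));	/* clear bytes the IPv4 fields do not cover */
	in4->sin_family = AF_INET;
	in4->sin_port = htons(port);
	in4->sin_addr.s_addr = htonl(addr);
}

/* Whole-structure comparison; only reliable when unused bytes are zero. */
static int addr_equal(const struct sockaddr_storage *a,
		      const struct sockaddr_storage *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```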

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-08-14 10:29:27 +01:00
Patrick Caulfield
25720c2d73 [DLM] Clear othercon pointers when a connection is closed
This patch clears the othercon pointer and frees the memory when a connection
is closed. Previously this could cause a small memory leak when nodes left the
cluster.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-08-14 10:28:05 +01:00
Paul Mundt
20c2df83d2 mm: Remove slab destructors from kmem_cache_create().
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-20 10:11:58 +09:00
akpm@linux-foundation.org
f4fadb23ca [GFS2] git-gfs2-nmw-build-fix
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:24:06 +01:00
Patrick Caulfield
97d848365e [DLM] Telnet to port 21064 can stop all lockspaces
This patch fixes Red Hat bz#245892

Opening a tcp connection from a cluster member to another cluster member
targeting the dlm port is enough to stop every dlm operation in the cluster.
This means that GFS and rgmanager will hang.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:57 +01:00
Patrick Caulfield
afb853fb4e [DLM] fix socket shutdown
This patch clears the user_data of active sockets as part of cleanup.
This prevents any late-arriving data from trying to add jobs to the work
queue while we are tidying up.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-Off-By: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:05 +01:00
David Teigland
617e82e10c [DLM] lowcomms style
Replace some printk with log_print, and fix some simple cases of lines
over 80.  Also, return -ENOTCONN if lowcomms_start fails due to no local
IP address being available.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:51 +01:00
Patrick Caulfield
30d3a2373f [DLM] Lowcomms nodeid range & initialisation fixes
Fix a few range & initialization bugs in lowcomms.
- max_nodeid is really the highest nodeid encountered, so all loops must include
it in their iterations.
- clean dlm_local_count & connection_idr so we can do a clean restart.
- Remove a spurious BUG_ON

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:41 +01:00
Josef Bacik
2439fe5072 [DLM] Fix dlm_lowcomms_stop hang
When you attempt to release a lockspace in DLM, it will hang trying to down a
semaphore that has already been downed.  The attached patch fixes the problem.

Signed-off-by: Josef Bacik <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Patrick Caulfield <pcaulfie@redhat.com>
2007-05-01 09:11:38 +01:00
Patrick Caulfield
6ed7257b46 [DLM] Consolidate transport protocols
This patch consolidates the TCP & SCTP protocols for the DLM into a single file
and makes it switchable at run-time (well, at least before the DLM actually
starts up!)

For RHEL5 this patch requires Neil Horman's patch that expands the in-kernel
socket API but that has already been twice ACKed so it should be OK.

The patch adds a new lowcomms.c file that replaces the existing lowcomms-sctp.c
& lowcomms-tcp.c files.

Signed-off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:23 +01:00
Patrick Caulfield
fdda387f73 [DLM] Add support for tcp communications
The following patch adds a TCP based communications layer
to the DLM which is compile time selectable. The existing SCTP
layer gives the advantage of allowing multihoming, whereas
the TCP layer has been heavily tested in previous versions of
the DLM and is known to be robust and therefore can be used as
a baseline for performance testing.

Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:35:00 -05:00
Patrick Caulfield
42fb00838a [DLM] fix iovec length in recvmsg
I didn't spot that the msg_iovlen was set to 2 if there
were two elements in the iovec but left at zero if not :(

I think this might be why bob was still seeing trouble.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-20 09:13:10 -04:00
Patrick Caulfield
4c5e1b1a8c [DLM] fix iovec length in recvmsg
The DLM always passes the iovec length as 1, this is wrong when the circular
buffer wraps round.
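
A sketch of the wrap case, assuming a circular buffer described by a start offset and a length. `cbuf_free_iov` is a hypothetical helper; its return value is the entry count a caller would pass as the iovec length.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Describe a region of a circular buffer of `size` bytes that begins at
 * offset `start` and is `len` bytes long.  Returns the number of iovec
 * entries filled in: 1 normally, 2 when the region wraps past the end.
 */
static int cbuf_free_iov(char *buf, size_t size, size_t start, size_t len,
			 struct iovec iov[2])
{
	size_t first;

	assert(start < size && len <= size);
	first = size - start;	/* bytes available before the wrap point */

	iov[0].iov_base = buf + start;
	if (len <= first) {
		iov[0].iov_len = len;
		return 1;
	}
	iov[0].iov_len = first;
	iov[1].iov_base = buf;	/* remainder continues at the buffer start */
	iov[1].iov_len = len - first;
	return 2;
}
```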

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-12 17:11:33 -04:00
Al Viro
38d6fd26ea [PATCH] dlm gfp_t annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-09 14:19:08 -07:00
David Teigland
fcc8abc8d4 [DLM] move kmap to after spin_unlock
Doing the kmap() while holding the spinlock was causing recursive spinlock
problems.  It seems the kmap was scheduling, although there was no warning
as I'd expect.  Patrick, do we need locking around the kmap?

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-08-11 09:44:00 -04:00
David Teigland
7d5513d58d [DLM] init rwsem earlier
The nodeinfo_lock rwsem needs to be initialized when the module is loaded
instead of when the dlm is first used.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-06-19 09:15:38 -04:00
Steven Whitehouse
47c96298cd [GFS2] Change name due to local_nodeid being a macro
Change names of local_nodeid to dlm_local_nodeid to prevent a
namespace collision. Changed other local variable to match.

Cc: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-05-25 17:43:14 -04:00
David Teigland
1c032c0311 [DLM] PATCH 2/3 dlm: lowcomms close
When a node is removed from a lockspace configuration, close our
connection to it, clearing any remaining messages for it.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-28 10:50:41 -04:00
David Teigland
e7fd41792f [DLM] The core of the DLM for GFS2/CLVM
This is the core of the distributed lock manager which is required
to use GFS2 as a cluster filesystem. It is also used by CLVM and
can be used as a standalone lock manager independently of either
of these two projects.

It implements VAX-style locking modes.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steve Whitehouse <swhiteho@redhat.com>
2006-01-18 09:30:29 +00:00