Commit Graph

3190 Commits

Author SHA1 Message Date
Al Viro
39c853ebfe Merge branch 'for-davem' into for-next 2015-04-11 22:27:19 -04:00
Al Viro
36e9f6535f Merge branch 'iov_iter' into for-next 2015-04-11 22:26:51 -04:00
Anton Altaparmakov
171a02032b VFS: Add iov_iter_fault_in_multipages_readable()
simillar to iov_iter_fault_in_readable() but differs in that it is
not limited to faulting in the first iovec and instead faults in
"bytes" bytes iterating over the iovecs as necessary.

Also, instead of only faulting in the first and last page of the
range, all pages are faulted in.

This function is needed by NTFS when it does multi page file
writes.

Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:24:32 -04:00
Al Viro
fe3cce2e01 Merge branch 'iov_iter' into for-davem 2015-04-09 00:02:06 -04:00
David S. Miller
c85d6975ef Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mellanox/mlx4/cmd.c
	net/core/fib_rules.c
	net/ipv4/fib_frontend.c

The fib_rules.c and fib_frontend.c conflicts were locking adjustments
in 'net' overlapping addition and removal of code in 'net-next'.

The mlx4 conflict was a bug fix in 'net' happening in the same
place a constant was being replaced with a more suitable macro.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-06 22:34:15 -04:00
Linus Torvalds
57a9d89dc0 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fix from Jens Axboe:
 "Just one patch in this pull request, fixing a regression caused by a
  'mathematically correct' change to lcm()"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: fix blk_stack_limits() regression due to lcm() change
2015-04-03 14:49:26 -07:00
Herbert Xu
b81b7be6ae test_rhashtable: Remove bogus max_size setting
Now that resizing is completely automatic, we need to remove
the max_size setting or the test will fail.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-03 15:09:36 -04:00
David S. Miller
9f0d34bc34 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/asix_common.c
	drivers/net/usb/sr9800.c
	drivers/net/usb/usbnet.c
	include/linux/usb/usbnet.h
	net/ipv4/tcp_ipv4.c
	net/ipv6/tcp_ipv6.c

The TCP conflicts were overlapping changes.  In 'net' we added a
READ_ONCE() to the socket cached RX route read, whilst in 'net-next'
Eric Dumazet touched the surrounding code dealing with how mini
sockets are handled.

With USB, it's a case of the same bug fix first going into net-next
and then I cherry picked it back into net.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:16:53 -04:00
Jiri Benc
5899f04785 netlink: pad nla_memcpy dest buffer with zeroes
This is especially important in cases where the kernel allocs a new
structure and expects a field to be set from a netlink attribute. If such
attribute is shorter than expected, the rest of the field is left containing
previous data. When such field is read back by the user space, kernel memory
content is leaked.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 14:07:24 -04:00
Mike Snitzer
e9637415a9 block: fix blk_stack_limits() regression due to lcm() change
Linux 3.19 commit 69c953c ("lib/lcm.c: lcm(n,0)=lcm(0,n) is 0, not n")
caused blk_stack_limits() to not properly stack queue_limits for stacked
devices (e.g. DM).

Fix this regression by establishing lcm_not_zero() and switching
blk_stack_limits() over to using it.

DM uses blk_set_stacking_limits() to establish the initial top-level
queue_limits that are then built up based on underlying devices' limits
using blk_stack_limits().  In the case of optimal_io_size (io_opt)
blk_set_stacking_limits() establishes a default value of 0.  With commit
69c953c, lcm(0, n) is no longer n, which compromises proper stacking of
the underlying devices' io_opt.

Test:
$ modprobe scsi_debug dev_size_mb=10 num_tgts=1 opt_blks=1536
$ cat /sys/block/sde/queue/optimal_io_size
786432
$ dmsetup create node --table "0 100 linear /dev/sde 0"

Before this fix:
$ cat /sys/block/dm-5/queue/optimal_io_size
0

After this fix:
$ cat /sys/block/dm-5/queue/optimal_io_size
786432

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.19+
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-03-31 09:45:50 -06:00
Al Viro
bc917be810 saner iov_iter initialization primitives
iovec-backed iov_iter instances are assumed to satisfy several properties:
	* no more than UIO_MAXIOV elements in iovec array
	* total size of all ranges is no more than MAX_RW_COUNT
	* all ranges pass access_ok().

The problem is, invariants of data structures should be established in the
primitives creating those data structures, not in the code using those
primitives.  And iov_iter_init() violates that principle.  For a while we
managed to get away with that, but once the use of iov_iter started to
spread, it didn't take long for shit to hit the fan - missed check in
sys_sendto() had introduced a roothole.

We _do_ have primitives for importing and validating iovecs (both native and
compat ones) and those primitives are almost always followed by shoving the
resulting iovec into iov_iter.  Life would be considerably simpler (and safer)
if we combined those primitives with initializing iov_iter.

That gives us two new primitives - import_iovec() and compat_import_iovec().
Calling conventions:
	iovec = iov_array;
	err = import_iovec(direction, uvec, nr_segs,
			   ARRAY_SIZE(iov_array), &iovec,
			   &iter);
imports user vector into kernel space (into iov_array if it fits, allocated
if it doesn't fit or if iovec was NULL), validates it and sets iter up to
refer to it.  On success 0 is returned and allocated kernel copy (or NULL
if the array had fit into caller-supplied one) is returned via iovec.
On failure all allocations are undone and -E... is returned.  If the total
size of ranges exceeds MAX_RW_COUNT, the excess is silently truncated.

compat_import_iovec() expects uvec to be a pointer to user array of compat_iovec;
otherwise it's identical to import_iovec().

Finally, import_single_range() sets iov_iter backed by single-element iovec
covering a user-supplied range -

	err = import_single_range(direction, address, size, iovec, &iter);

does validation and sets iter up.  Again, size in excess of MAX_RW_COUNT gets
silently truncated.

Next commits will be switching the things up to use of those and reducing
the amount of iov_iter_init() instances.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-30 11:08:16 -04:00
Patrick McHardy
49f7b33e63 rhashtable: provide len to obj_hashfn
nftables sets will be converted to use so called setextensions, moving
the key to a non-fixed position. To hash it, the obj_hashfn must be used,
however it so far doesn't receive the length parameter.

Pass the key length to obj_hashfn() and convert existing users.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-25 17:18:33 +01:00
Thomas Graf
6b6f302ced rhashtable: Add rhashtable_free_and_destroy()
rhashtable_destroy() variant which stops rehashes, iterates over
the table and calls a callback to release resources.

Avoids need for nft_hash to embed rhashtable internals and allows to
get rid of the being_destroyed flag. It also saves a 2nd mutex
lock upon destruction.

Also fixes an RCU lockdep splash on nft set destruction due to
calling rht_for_each_entry_safe() without holding bucket locks.
Open code this loop as we need know that no mutations may occur in
parallel.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 17:48:40 -04:00
Thomas Graf
b5e2c150ac rhashtable: Disable automatic shrinking by default
Introduce a new bool automatic_shrinking to require the
user to explicitly opt-in to automatic shrinking of tables.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 17:48:40 -04:00
Thomas Graf
299e5c32a3 rhashtable: Use 'unsigned int' consistently
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 17:48:39 -04:00
Herbert Xu
27ed44a5d6 rhashtable: Add comment on choice of elasticity value
This patch adds a comment on the choice of the value 16 as the
maximum chain length before we force a rehash.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 14:57:04 -04:00
David S. Miller
d5c1d8c567 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/netfilter/nf_tables_core.c

The nf_tables_core.c conflict was resolved using a conflict resolution
from Stephen Rothwell as a guide.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:22:43 -04:00
Herbert Xu
ba7c95ea38 rhashtable: Fix sleeping inside RCU critical section in walk_stop
The commit 963ecbd41a ("rhashtable:
Fix use-after-free in rhashtable_walk_stop") fixed a real bug
but created another one because we may end up sleeping inside an
RCU critical section.

This patch fixes it properly by replacing the mutex with a spin
lock that specifically protects the walker lists.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:16:07 -04:00
Hannes Frederic Sowa
ab2bb32417 lib: EXPORT_SYMBOL sha_init
We need this symbol later on in ipv6.ko, thus export it via EXPORT_SYMBOL
like sha_transform already is.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:12:08 -04:00
Herbert Xu
ccd57b1bd3 rhashtable: Add immediate rehash during insertion
This patch reintroduces immediate rehash during insertion.  If
we find during insertion that the table is full or the chain
length exceeds a set limit (currently 16 but may be disabled
with insecure_elasticity) then we will force an immediate rehash.
The rehash will contain an expansion if the table utilisation
exceeds 75%.

If this rehash fails then the insertion will fail.  Otherwise the
insertion will be reattempted in the new hash table.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:07:52 -04:00
Herbert Xu
b9ecfdaa10 rhashtable: Allow GFP_ATOMIC bucket table allocation
This patch adds the ability to allocate bucket table with GFP_ATOMIC
instead of GFP_KERNEL.  This is needed when we perform an immediate
rehash during insertion.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:07:52 -04:00
Herbert Xu
b824478b21 rhashtable: Add multiple rehash support
This patch adds the missing bits to allow multiple rehashes.  The
read-side as well as remove already handle this correctly.  So it's
only the rehasher and insertion that need modification to handle
this.

Note that this patch doesn't actually enable it so for now rehashing
is still only performed by the worker thread.

This patch also disables the explicit expand/shrink interface because
the table is meant to expand and shrink automatically, and continuing
to export these interfaces unnecessarily complicates the life of the
rehasher since the rehash process is now composed of two parts.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:07:52 -04:00
Herbert Xu
18093d1c0d rhashtable: Shrink to fit
This patch changes rhashtable_shrink to shrink to the smallest
size possible rather than halving the table.  This is needed
because with multiple rehashing we will defer shrinking until
all other rehashing is done, meaning that when we do shrink
we may be able to shrink a lot.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:07:52 -04:00
Herbert Xu
31ccde2dac rhashtable: Allow hashfn to be unset
Since every current rhashtable user uses jhash as their hash
function, the fact that jhash is an inline function causes each
user to generate a copy of its code.

This function provides a solution to this problem by allowing
hashfn to be unset.  In which case rhashtable will automatically
set it to jhash.  Furthermore, if the key length is a multiple
of 4, we will switch over to jhash2.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:07:51 -04:00
Herbert Xu
d88252f9bb rhashtable: Add barrier to ensure we see new tables in walker
The walker is a lockless reader so it too needs an smp_rmb before
reading the future_tbl field in order to see any new tables that
may contain elements that we should have walked over.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:07:51 -04:00
David S. Miller
0fa74a4be4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be_main.c
	net/core/sysctl_net_core.c
	net/ipv4/inet_diag.c

The be_main.c conflict resolution was really tricky.  The conflict
hunks generated by GIT were very unhelpful, to say the least.  It
split functions in half and moved them around, when the real actual
conflict only existed solely inside of one function, that being
be_map_pci_bars().

So instead, to resolve this, I checked out be_main.c from the top
of net-next, then I applied the be_main.c changes from 'net' since
the last time I merged.  And this worked beautifully.

The inet_diag.c and sysctl_net_core.c conflicts were simple
overlapping changes, and were easily to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 18:51:09 -04:00
Herbert Xu
dc0ee268d8 rhashtable: Rip out obsolete out-of-line interface
Now that all rhashtable users have been converted over to the
inline interface, this patch removes the unused out-of-line
interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 16:16:24 -04:00
Herbert Xu
b182aa6e96 test_rhashtable: Use inlined rhashtable interface
This patch converts test_rhashtable to the inlined rhashtable
interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 16:16:24 -04:00
Herbert Xu
02fd97c3d4 rhashtable: Allow hash/comparison functions to be inlined
This patch deals with the complaint that we make indirect function
calls on the fast paths unnecessarily in rhashtable.  We resolve
it by moving the fast paths into inline functions that take struct
rhashtable_param (which obviously must be the same set of parameters
supplied to rhashtable_init) as an argument.

The only remaining indirect call is to obj_hashfn (or key_hashfn it
obj_hashfn is unset) on the rehash as well as the insert-during-
rehash slow path.

This patch also extends the support of vairable-length keys to
include those where the key is fixed but scattered in the object.
For example, in netlink we want to key off the namespace and the
portid but they're not next to each other.

This patch does this by directly using the object hash function
as the indicator of whether the key is accessible or not.  It
also adds a new function obj_cmpfn to compare a key against an
object.  This means that the caller no longer needs to supply
explicit compare functions.

All this is done in a backwards compatible manner so no existing
users are affected until they convert to the new interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 16:16:24 -04:00
Herbert Xu
488fb86ee9 rhashtable: Make rhashtable_init params argument const
This patch marks the rhashtable_init params argument const as
there is no reason to modify it since we will always make a copy
of it in the rhashtable.

This patch also fixes a bug where we don't actually round up the
value of min_size unless it is less than HASH_MIN_SIZE.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 16:16:24 -04:00
Thomas Graf
a998f712f7 rhashtable: Round up/down min/max_size to ensure we respect limit
Round up min_size respectively round down max_size to the next power
of two to make sure we always respect the limit specified by the
user. This is required because we compare the table size against the
limit before we expand or shrink.

Also fixes a minor bug where we modified min_size in the params
provided instead of the copy stored in struct rhashtable.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-19 21:02:23 -04:00
Herbert Xu
e2e21c1c58 rhashtable: Remove max_shift and min_shift
Now that nobody uses max_shift and min_shift, we can safely remove
them.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 12:46:41 -04:00
Herbert Xu
4f509df4f5 test_rhashtable: Use rhashtable max_size instead of max_shift
This patch converts test_rhashtable to use rhashtable max_size
instead of the obsolete max_shift.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 12:46:40 -04:00
Herbert Xu
c2e213cff7 rhashtable: Introduce max_size/min_size
This patch adds the parameters max_size and min_size which are
meant to replace max_shift and min_shift.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 12:46:40 -04:00
Herbert Xu
6aebd94084 rhashtable: Remove shift from bucket_table
Keeping both size and shift is silly.  We only need one.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 12:46:40 -04:00
Thomas Graf
617011e7d5 rhashtable: Avoid calculating hash again to unlock
Caching the lock pointer avoids having to hash on the object
again to unlock the bucket locks.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 17:14:34 -04:00
JeHyeon Yeon
d5e7cafd69 LZ4 : fix the data abort issue
If the part of the compression data are corrupted, or the compression
data is totally fake, the memory access over the limit is possible.

This is the log from my system usning lz4 decompression.
   [6502]data abort, halting
   [6503]r0  0x00000000 r1  0x00000000 r2  0xdcea0ffc r3  0xdcea0ffc
   [6509]r4  0xb9ab0bfd r5  0xdcea0ffc r6  0xdcea0ff8 r7  0xdce80000
   [6515]r8  0x00000000 r9  0x00000000 r10 0x00000000 r11 0xb9a98000
   [6522]r12 0xdcea1000 usp 0x00000000 ulr 0x00000000 pc  0x820149bc
   [6528]spsr 0x400001f3
and the memory addresses of some variables at the moment are
    ref:0xdcea0ffc, op:0xdcea0ffc, oend:0xdcea1000

As you can see, COPYLENGH is 8bytes, so @ref and @op can access the momory
over @oend.

Signed-off-by: JeHyeon Yeon <tom.yeon@windriver.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-16 21:55:35 +01:00
Thomas Graf
db4374f48a rhashtable: Annotate RCU locking of walkers
Fixes the following sparse warnings:

lib/rhashtable.c:767:5: warning: context imbalance in 'rhashtable_walk_start' - wrong count at exit
lib/rhashtable.c:849:6: warning: context imbalance in 'rhashtable_walk_stop' - unexpected unlock

Fixes: f2dba9c6ff ("rhashtable: Introduce rhashtable_walk_*")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16 16:24:13 -04:00
Herbert Xu
565e86404e rhashtable: Fix rhashtable_remove failures
The commit 9d901bc051 ("rhashtable:
Free bucket tables asynchronously after rehash") causes gratuitous
failures in rhashtable_remove.

The reason is that it inadvertently introduced multiple rehashing
from the perspective of readers.  IOW it is now possible to see
more than two tables during a single RCU critical section.

Fortunately the other reader rhashtable_lookup already deals with
this correctly thanks to c4db8848af
("rhashtable: rhashtable: Move future_tbl into struct bucket_table")
so only rhashtable_remove is broken by this change.

This patch fixes this by looping over every table from the first
one to the last or until we find the element that we were trying
to delete.

Incidentally the simple test for detecting rehashing to prevent
starting another shrinking no longer works.  Since it isn't needed
anyway (the work queue and the mutex serves as a natural barrier
to unnecessary rehashes) I've simply killed the test.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15 22:22:08 -04:00
Herbert Xu
963ecbd41a rhashtable: Fix use-after-free in rhashtable_walk_stop
The commit c4db8848af ("rhashtable:
Move future_tbl into struct bucket_table") introduced a use-after-
free bug in rhashtable_walk_stop because it dereferences tbl after
droping the RCU read lock.

This patch fixes it by moving the RCU read unlock down to the bottom
of rhashtable_walk_stop.  In fact this was how I had it originally
but it got dropped while rearranging patches because this one
depended on the async freeing of bucket_table.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15 22:22:08 -04:00
Herbert Xu
c4db8848af rhashtable: Move future_tbl into struct bucket_table
This patch moves future_tbl to open up the possibility of having
multiple rehashes on the same table.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15 01:35:34 -04:00
Herbert Xu
63d512d0cf rhashtable: Add rehash counter to bucket_table
This patch adds a rehash counter to bucket_table to indicate
the last bucket that has been rehashed.  This serves two purposes:

1. Any bucket that has been rehashed can never gain a new object.
2. If the rehash counter reaches the size of the table, the table
will forever remain empty.

This patch also downsizes bucket_table->size to an unsigned int
since we do not support sizes greater than 32 bits yet.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15 01:35:34 -04:00
Herbert Xu
9d901bc051 rhashtable: Free bucket tables asynchronously after rehash
There is in fact no need to wait for an RCU grace period in the
rehash function, since all insertions are guaranteed to go into
the new table through spin locks.

This patch uses call_rcu to free the old/rehashed table at our
leisure.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15 01:35:34 -04:00
Herbert Xu
5269b53da4 rhashtable: Move seed init into bucket_table_alloc
It seems that I have already made every rehash redo the random
seed even though my commit message indicated otherwise :)

Since we have already taken that step, this patch goes one step
further and moves the seed initialisation into bucket_table_alloc.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15 01:35:34 -04:00
Herbert Xu
8f2484bdb5 rhashtable: Use SINGLE_DEPTH_NESTING
We only nest one level deep there is no need to roll our own
subclasses.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15 01:35:34 -04:00
Herbert Xu
eddee5ba34 rhashtable: Fix walker behaviour during rehash
Previously whenever the walker encountered a resize it simply
snaps back to the beginning and starts again.  However, this only
works if the rehash started and completed while the walker was
idle.

If the walker attempts to restart while the rehash is still ongoing,
we may miss objects that we shouldn't have.

This patch fixes this by making the walker walk the old table
followed by the new table just like all other readers.  If a
rehash is detected we will still signal our caller of the fact
so they can prepare for duplicates but we will simply continue
the walk onto the new table after the old one is finished either
by us or by the rehasher.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15 01:35:34 -04:00
Linus Torvalds
f788baadbd Merge branch 'gadget' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull gadgetfs fixes from Al Viro:
 "Assorted fixes around AIO on gadgetfs: leaks, use-after-free, troubles
  caused by ->f_op flipping"

* 'gadget' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  gadgetfs: really get rid of switching ->f_op
  gadgetfs: get rid of flipping ->f_op in ep_config()
  gadget: switch ep_io_operations to ->read_iter/->write_iter
  gadgetfs: use-after-free in ->aio_read()
  gadget/function/f_fs.c: switch to ->{read,write}_iter()
  gadget/function/f_fs.c: use put iov_iter into io_data
  gadget/function/f_fs.c: close leaks
  move iov_iter.c from mm/ to lib/
  new helper: dup_iter()
2015-03-13 10:55:32 -07:00
Herbert Xu
393619474e rhashtable: Fix read-side crash during rehash
This patch fixes a typo rhashtable_lookup_compare where we fail
to recompute the hash when looking up the new table.  This causes
elements to be missed and potentially a crash during a resize.

Reported-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 23:02:30 -04:00
Daniel Borkmann
a5b6846f9e rhashtable: kill ht->shift atomic operations
Commit c0c09bfdc4 ("rhashtable: avoid unnecessary wakeup for worker
queue") changed ht->shift to be atomic, which is actually unnecessary.

Instead of leaving the current shift in the core rhashtable structure,
it can be cached inside the individual bucket tables.

There, it will only be initialized once during a new table allocation
in the shrink/expansion slow path, and from then onward it stays immutable
for the rest of the bucket table liftime.

That allows shift to be non-atomic. The patch also moves hash_rnd
management into the table setup. The rhashtable structure now consumes
3 instead of 4 cachelines.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ying Xue <ying.xue@windriver.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 23:02:30 -04:00
Herbert Xu
9497df88ab rhashtable: Fix reader/rehash race
There is a potential race condition between readers and the rehasher.
In particular, the rehasher could have started a rehash while the
reader finishes a scan of the old table but fails to see the new
table pointer.

This patch closes this window by adding smp_wmb/smp_rmb.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 23:02:30 -04:00