linux/fs/gfs2
Benjamin Marzinski 471f3db278 gfs2: change gfs2 readdir cookie
gfs2 currently returns 31 bits of filename hash as a cookie that readdir
uses for an offset into the directory.  When there are a large number of
directory entries, the likelihood of a collision goes up far too
quickly.  GFS2 will now return cookies that are guaranteed unique for a
while, and then fall back to using 30 bits of filename hash.
Specifically, the directory leaf blocks are divided up into chunks based
on the minimum size of a gfs2 directory entry (48 bytes). Each entry's
cookie is based off the chunk where it starts, in the linked list of
leaf blocks that it hashes to (there are 131072 hash buckets). Directory
entries will have unique cookies until they reach chunk 8192.
Assuming the largest filenames possible, and the least efficient spacing
possible, this new method will still be able to return unique cookies
when the previous method has statistically more than a 99% chance of a
collision.  The non-unique cookies it falls back to are guaranteed not
to collide with the unique cookies.
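
As a rough illustration of the chunk numbering described above, the
sketch below maps an entry's position in a hash bucket's leaf chain to
a chunk number. The helper name dirent_chunk() and the chunks_per_leaf
parameter are hypothetical (how many 48-byte chunks fit in one leaf
depends on the block size and leaf header); this is not the code in
dir.c.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIN_DIRENT_SIZE   48u    /* minimum gfs2 dirent size, from the text */
#define MAX_UNIQUE_CHUNKS 8192u  /* 2^13: first chunk that is no longer unique */

/*
 * Hypothetical helper: compute the chunk in which an entry starts.
 * leaf_nr is the leaf's position in the bucket's chain, byte_offset is
 * the entry's offset within that leaf's dirent area.
 * Returns false once the chunk no longer fits in 13 bits, meaning the
 * caller has to fall back to a hash-based cookie.
 */
static bool dirent_chunk(uint32_t leaf_nr, uint32_t chunks_per_leaf,
			 uint32_t byte_offset, uint32_t *chunk)
{
	uint64_t n = (uint64_t)leaf_nr * chunks_per_leaf +
		     byte_offset / MIN_DIRENT_SIZE;

	if (n >= MAX_UNIQUE_CHUNKS)
		return false;
	*chunk = (uint32_t)n;
	return true;
}
```

With an assumed 84 chunks per leaf, an entry starting 48 bytes into the
second leaf of a chain lands in chunk 85.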

Unique cookies will be in this format:
- 1 bit "0" to make sure the returned cookie is positive
- 17 bits for the hash table index
- 1 bit for the mode "0"
- 13 bits for the offset

Non-unique cookies will be in this format:
- 1 bit "0" to make sure the returned cookie is positive
- 17 bits for the hash table index
- 1 bit for the mode "1"
- 13 more bits of the name hash
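
The two layouts can be sketched as bit packing into a 32-bit value.
This assumes the fields are listed from most significant to least
significant bit; the macro and function names are hypothetical, and
which 13 hash bits go into a non-unique cookie is left to the caller
because the description does not say.

```c
#include <assert.h>
#include <stdint.h>

#define COOKIE_LOW_BITS    13u                      /* offset or extra hash bits */
#define COOKIE_INDEX_BITS  17u                      /* hash table index */
#define COOKIE_MODE_SHIFT  COOKIE_LOW_BITS          /* mode is bit 13 */
#define COOKIE_INDEX_SHIFT (COOKIE_LOW_BITS + 1u)   /* index is bits 14..30 */
#define COOKIE_LOW_MASK    ((1u << COOKIE_LOW_BITS) - 1u)

/* Bit 31 is never set, so the cookie stays positive as a signed value. */
static uint32_t pack_cookie(uint32_t index, uint32_t mode, uint32_t low)
{
	assert(index < (1u << COOKIE_INDEX_BITS));
	assert(mode <= 1u);
	return (index << COOKIE_INDEX_SHIFT) |
	       (mode << COOKIE_MODE_SHIFT) |
	       (low & COOKIE_LOW_MASK);
}

/* Mode 0: location-based cookie, low bits hold the chunk offset. */
static uint32_t unique_cookie(uint32_t index, uint32_t chunk)
{
	assert(chunk <= COOKIE_LOW_MASK);
	return pack_cookie(index, 0u, chunk);
}

/* Mode 1: hash-based cookie, low bits hold 13 more bits of name hash. */
static uint32_t hashed_cookie(uint32_t index, uint32_t hash_bits)
{
	return pack_cookie(index, 1u, hash_bits);
}
```

Because the mode bit differs, a mode 1 cookie can never equal a mode 0
cookie for the same hash table index, which is why the two kinds cannot
collide with each other.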

Another benefit of location-based cookies is that once a directory's
exhash table is fully extended (so that multiple hash table indexes do
not use the same leaf blocks), gfs2 can skip sorting the directory
entries until it reaches the non-unique ones, and then it only needs to
sort these. This provides a significant speedup for directory reads of
very large directories.

The only issue is that for these cookies to continue to point to the
correct entry as files are added and removed from the directory, gfs2
must keep the entries at the same offset in the leaf block when they are
split (see my previous patch). This means that until all the nodes in a
cluster are running with code that will split the directory leaf blocks
this way, none of the nodes can use the new cookie code. To deal with
this, gfs2 now has the mount option loccookie, which, if set, will make
it return these new location-based cookies.  This option must not be set
until all nodes in the cluster are at least running this version of the
kernel code, and you have guaranteed that there are no outstanding
cookies required by other software, such as NFS.

gfs2 uses some of the extra space at the end of the gfs2_dirent
structure to store the calculated readdir cookies. This keeps us from
needing to allocate a separate array to hold these values.  gfs2
recomputes the cookie stored in de_cookie for every readdir call.  The
time it takes to do so is small, and if gfs2 expected this value to be
saved on disk, the new code wouldn't work correctly on filesystems
created with an earlier version of gfs2.

One issue with adding de_cookie to the union in the gfs2_dirent
structure is that it caused the union to align itself to a 4 byte
boundary, instead of its previous 2 byte boundary. This changed the
offset of de_rahead. To solve that, I pulled de_rahead out of the union,
since it does not need to be there.
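
The alignment effect can be reproduced with a simplified model of the
tail of the structure. The struct names below are hypothetical and the
fields are shortened stand-ins (plain uint16_t/uint32_t instead of the
kernel's on-disk types), so treat this as a sketch of the mechanism on
an ABI where uint32_t is 4-byte aligned, not as the real gfs2_dirent
definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Before: every union member has 1- or 2-byte alignment. */
struct tail_before {
	uint16_t type;
	union {
		uint8_t pad[14];
		struct {
			uint16_t rahead;
			uint8_t pad2[12];
		};
	};
};

/* Naive change: a 32-bit member raises the union's alignment to 4,
 * so the compiler inserts 2 bytes of padding after 'type'. */
struct tail_naive {
	uint16_t type;
	union {
		uint8_t pad[14];
		struct {
			uint16_t rahead;
			uint8_t pad2[12];
		};
		uint32_t cookie;
	};
};

/* Fix: rahead lives outside the union and keeps its old offset. */
struct tail_fixed {
	uint16_t type;
	uint16_t rahead;
	union {
		uint8_t pad[12];
		struct {
			uint32_t cookie;
			uint8_t pad3[8];
		};
	};
};

static size_t rahead_offset_before(void)
{
	return offsetof(struct tail_before, rahead);
}

static size_t rahead_offset_naive(void)
{
	return offsetof(struct tail_naive, rahead);
}

static size_t rahead_offset_fixed(void)
{
	return offsetof(struct tail_fixed, rahead);
}
```

In this model the field sits at offset 2 before the change, moves to
offset 4 in the naive version, and returns to offset 2 once it is
pulled out of the union, with the overall size unchanged.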

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-12-14 12:19:37 -06:00
acl.c GFS2: gfs2_set_acl(): Cache "no acl" as well 2015-03-18 12:41:57 -05:00
acl.h GFS2: Increase the max number of ACLs 2014-03-19 15:16:24 +00:00
aops.c GFS2: Extract quota data from reservations structure (revert 5407e24) 2015-11-24 08:38:44 -06:00
bmap.c GFS2: Make rgrp reservations part of the gfs2_inode structure 2015-12-14 12:16:38 -06:00
bmap.h GFS2: Clean up journal extent mapping 2014-03-03 13:50:12 +00:00
dentry.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
dir.c gfs2: change gfs2 readdir cookie 2015-12-14 12:19:37 -06:00
dir.h GFS2: Make rename not save dirent location 2014-10-01 14:06:15 +01:00
export.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
file.c GFS2: Reduce size of incore inode 2015-12-14 12:19:24 -06:00
gfs2.h
glock.c GFS2: Reintroduce a timeout in function gfs2_gl_hash_clear 2015-12-14 12:19:31 -06:00
glock.h GFS2: Reduce size of incore inode 2015-12-14 12:19:24 -06:00
glops.c gfs2: Remove gl_spin define 2015-10-29 12:57:48 -05:00
glops.h GFS2: update freeze code to use freeze/thaw_super on all nodes 2014-11-17 10:36:39 +00:00
incore.h gfs2: change gfs2 readdir cookie 2015-12-14 12:19:37 -06:00
inode.c GFS2: Make rgrp reservations part of the gfs2_inode structure 2015-12-14 12:16:38 -06:00
inode.h GFS2: Add atomic_open support 2013-06-14 11:17:15 +01:00
Kconfig Finally eradicate CONFIG_HOTPLUG 2013-06-03 14:20:18 -07:00
lock_dlm.c remove abs64() 2015-11-09 15:11:24 -08:00
log.c GFS2: update freeze code to use freeze/thaw_super on all nodes 2014-11-17 10:36:39 +00:00
log.h GFS2: remove transaction glock 2014-05-14 10:04:34 +01:00
lops.c GFS2: merge window 2015-09-11 12:23:51 -07:00
lops.h GFS2: Move log buffer lists into transaction 2014-02-24 16:54:54 +00:00
main.c GFS2: Make rgrp reservations part of the gfs2_inode structure 2015-12-14 12:16:38 -06:00
Makefile GFS2: Rename ops_inode.c to inode.c 2011-05-10 13:12:49 +01:00
meta_io.c gfs2: Extended attribute readahead optimization 2015-11-18 14:51:50 -06:00
meta_io.h gfs2: Extended attribute readahead 2015-11-16 12:00:29 -06:00
ops_fstype.c gfs2: change gfs2 readdir cookie 2015-12-14 12:19:37 -06:00
quota.c GFS2: Reduce size of incore inode 2015-12-14 12:19:24 -06:00
quota.h GFS2: Make rgrp reservations part of the gfs2_inode structure 2015-12-14 12:16:38 -06:00
recovery.c GFS2: fix sprintf format specifier 2015-01-13 10:48:57 +00:00
recovery.h GFS2: Move recovery variables to journal structure in memory 2014-03-07 09:14:48 +00:00
rgrp.c GFS2: Make rgrp reservations part of the gfs2_inode structure 2015-12-14 12:16:38 -06:00
rgrp.h GFS2: Make rgrp reservations part of the gfs2_inode structure 2015-12-14 12:16:38 -06:00
super.c gfs2: change gfs2 readdir cookie 2015-12-14 12:19:37 -06:00
super.h GFS2: update freeze code to use freeze/thaw_super on all nodes 2014-11-17 10:36:39 +00:00
sys.c gfs2: convert simple_str to kstr 2015-05-05 13:23:22 -05:00
sys.h GFS2: dlm based recovery coordination 2012-01-11 09:23:05 +00:00
trace_gfs2.h gfs2: Make statistics unsigned, suitable for use with do_div() 2015-09-03 13:33:32 -05:00
trans.c gfs2: Add missing else in trans_add_meta/data 2015-10-01 12:00:59 -05:00
trans.h GFS2: Split gfs2_trans_add_bh() into two 2013-01-29 10:28:04 +00:00
util.c GFS2: Make rgrp reservations part of the gfs2_inode structure 2015-12-14 12:16:38 -06:00
util.h GFS2: Make rgrp reservations part of the gfs2_inode structure 2015-12-14 12:16:38 -06:00
xattr.c gfs2: Extended attribute readahead 2015-11-16 12:00:29 -06:00
xattr.h