linux/fs/xfs
Dave Jiang a2d581675d mm,fs,dax: change ->pmd_fault to ->huge_fault
Patch series "1G transparent hugepage support for device dax", v2.

The following series implements support for 1G trasparent hugepage on
x86 for device dax.  The bulk of the code was written by Mathew Wilcox a
while back supporting transparent 1G hugepage for fs DAX.  I have
forward ported the relevant bits to 4.10-rc.  The current submission has
only the necessary code to support device DAX.

Comments from Dan Williams: So the motivation and intended user of this
functionality mirrors the motivation and users of 1GB page support in
hugetlbfs.  Given expected capacities of persistent memory devices an
in-memory database may want to reduce tlb pressure beyond what they can
already achieve with 2MB mappings of a device-dax file.  We have
customer feedback to that effect as Willy mentioned in his previous
version of these patches [1].

[1]: https://lkml.org/lkml/2016/1/31/52

Comments from Nilesh @ Oracle:

There are applications which have a process model; and if you assume
10,000 processes attempting to mmap all the 6TB memory available on a
server; we are looking at the following:

processes         : 10,000
memory            :    6TB
pte @ 4k page size: 8 bytes / 4K of memory * #processes = 6TB / 4k * 8 * 10000 = 1.5GB * 80000 = 120,000GB
pmd @ 2M page size: 120,000 / 512 = ~240GB
pud @ 1G page size: 240GB / 512 = ~480MB

As you can see with 2M pages, this system will use up an exorbitant
amount of DRAM to hold the page tables; but the 1G pages finally brings
it down to a reasonable level.  Memory sizes will keep increasing; so
this number will keep increasing.

An argument can be made to convert the applications from process model
to thread model, but in the real world that may not be always practical.
Hopefully this helps explain the use case where this is valuable.

This patch (of 3):

In preparation for adding the ability to handle PUD pages, convert
vm_operations_struct.pmd_fault to vm_operations_struct.huge_fault.  The
vm_fault structure is extended to include a union of the different page
table pointers that may be needed, and three flag bits are reserved to
indicate which type of pointer is in the union.

[ross.zwisler@linux.intel.com: remove unused function ext4_dax_huge_fault()]
  Link: http://lkml.kernel.org/r/1485813172-7284-1-git-send-email-ross.zwisler@linux.intel.com
[dave.jiang@intel.com: clear PMD or PUD size flags when in fall through path]
  Link: http://lkml.kernel.org/r/148589842696.5820.16078080610311444794.stgit@djiang5-desk3.ch.intel.com
Link: http://lkml.kernel.org/r/148545058784.17912.6353162518188733642.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
..
libxfs xfs: remove XFS_ALLOCTYPE_ANY_AG and XFS_ALLOCTYPE_START_AG 2017-02-17 20:32:10 -08:00
Kconfig
kmem.c
kmem.h
Makefile xfs: introduce the CoW fork 2016-10-04 18:06:40 -07:00
mrlock.h
uuid.c
uuid.h
xfs_acl.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
xfs_acl.h
xfs_aops.c xfs: mark speculative prealloc CoW fork extents unwritten 2017-02-02 15:14:02 -08:00
xfs_aops.h xfs: use iomap_dio_rw 2016-11-30 14:37:15 +11:00
xfs_attr_inactive.c
xfs_attr_list.c xfs: several xattr functions can be void 2016-12-05 12:32:14 +11:00
xfs_attr.h xfs: several xattr functions can be void 2016-12-05 12:32:14 +11:00
xfs_bmap_item.c xfs: when replaying bmap operations, don't let unlinked inodes get reaped 2016-10-04 11:05:44 -07:00
xfs_bmap_item.h xfs: log bmap intent items 2016-10-04 11:05:44 -07:00
xfs_bmap_util.c xfs: simplify xfs_rtallocate_extent 2017-02-17 16:52:52 -08:00
xfs_bmap_util.h xfs: remove unused full argument from bmap 2017-01-30 16:32:25 -08:00
xfs_buf_item.c xfs: reset b_first_retry_time when clear the retry status of xfs_buf_t 2017-02-03 14:39:07 -08:00
xfs_buf_item.h
xfs_buf.c Merge branch 'for-4.11/next' into for-4.11/linus-merge 2017-02-17 14:08:19 -07:00
xfs_buf.h block: Get rid of blk_get_backing_dev_info() 2017-02-02 08:21:32 -07:00
xfs_dir2_readdir.c xfs: remove i_iolock and use i_rwsem in the VFS inode instead 2016-11-30 14:33:25 +11:00
xfs_discard.c xfs: don't block the log commit handler for discards 2017-02-09 11:36:40 -08:00
xfs_discard.h xfs: don't block the log commit handler for discards 2017-02-09 11:36:40 -08:00
xfs_dquot_item.c xfs: allocate log vector buffers outside CIL context lock 2016-07-22 09:52:35 +10:00
xfs_dquot_item.h
xfs_dquot.c xfs: don't wrap ID in xfs_dq_get_next_id 2017-01-17 11:43:38 -08:00
xfs_dquot.h
xfs_error.c Merge branch 'xfs-4.8-misc-fixes-3' into for-next 2016-07-20 11:51:08 +10:00
xfs_error.h xfs: simulate per-AG reservations being critically low 2016-10-05 16:26:31 -07:00
xfs_export.c xfs: abstract block export operations from nfsd layouts 2016-07-15 15:31:29 -04:00
xfs_export.h
xfs_extent_busy.c xfs: fix len comparison in xfs_extent_busy_trim 2017-02-16 17:20:12 -08:00
xfs_extent_busy.h xfs: improve handling of busy extents in the low-level allocator 2017-02-09 10:50:25 -08:00
xfs_extfree_item.c xfs: remove unnecessary parentheses from log redo item recovery functions 2016-08-03 12:29:32 +10:00
xfs_extfree_item.h xfs: refactor redo intent item processing 2016-08-03 11:23:49 +10:00
xfs_file.c mm,fs,dax: change ->pmd_fault to ->huge_fault 2017-02-24 17:46:54 -08:00
xfs_filestream.c Merge branch 'xfs-4.9-log-recovery-fixes' into for-next 2016-10-03 09:56:28 +11:00
xfs_filestream.h
xfs_fsops.c xfs: remove boilerplate around xfs_btree_init_block 2017-01-30 16:32:24 -08:00
xfs_fsops.h xfs: preallocate blocks for worst-case btree expansion 2016-10-05 16:26:27 -07:00
xfs_globals.c xfs: garbage collect old cowextsz reservations 2016-10-05 16:26:28 -07:00
xfs_icache.c xfs: sync eofblocks scans under iolock are livelock prone 2017-01-30 16:32:25 -08:00
xfs_icache.h xfs: sync eofblocks scans under iolock are livelock prone 2017-01-30 16:32:25 -08:00
xfs_icreate_item.c fs: xfs: xfs_icreate_item: constify xfs_item_ops structure 2016-11-28 14:57:42 +11:00
xfs_icreate_item.h
xfs_inode_item.c xfs: provide helper for counting extents from if_bytes 2016-11-08 12:59:42 +11:00
xfs_inode_item.h
xfs_inode.c xfs: pull up iolock from xfs_free_eofblocks() 2017-01-30 16:32:25 -08:00
xfs_inode.h xfs: remove i_iolock and use i_rwsem in the VFS inode instead 2016-11-30 14:33:25 +11:00
xfs_ioctl32.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
xfs_ioctl32.h
xfs_ioctl.c xfs: remove unused full argument from bmap 2017-01-30 16:32:25 -08:00
xfs_ioctl.h xfs: don't pass ioflags around in the ioctl path 2016-07-20 11:29:35 +10:00
xfs_iomap.c xfs: resurrect debug mode drop buffered writes mechanism 2017-02-16 17:19:15 -08:00
xfs_iomap.h xfs: introduce xfs_aligned_fsb_count 2017-02-06 17:47:46 -08:00
xfs_iops.c xfs: sanity check inode mode when creating new dentry 2017-01-17 11:42:22 -08:00
xfs_iops.h xfs: Propagate dentry down to inode_change_ok() 2016-09-22 10:56:19 +02:00
xfs_itable.c xfs: create a separate cow extent size hint for the allocator 2016-10-05 16:26:26 -07:00
xfs_itable.h
xfs_linux.h xfs: make the ASSERT() condition likely 2017-01-17 11:41:41 -08:00
xfs_log_cil.c xfs: don't block the log commit handler for discards 2017-02-09 11:36:40 -08:00
xfs_log_priv.h xfs: don't block the log commit handler for discards 2017-02-09 11:36:40 -08:00
xfs_log_recover.c Merge branch 'xfs-4.10-misc-fixes-3' into for-next 2016-12-07 17:42:30 +11:00
xfs_log.c xfs: don't print warnings when xfs_log_force fails 2017-01-09 13:45:01 -08:00
xfs_log.h xfs: remove unused struct declarations 2017-01-30 16:32:25 -08:00
xfs_message.c
xfs_message.h
xfs_mount.c xfs: don't block the log commit handler for discards 2017-02-09 11:36:40 -08:00
xfs_mount.h xfs: resurrect debug mode drop buffered writes mechanism 2017-02-16 17:19:15 -08:00
xfs_mru_cache.c
xfs_mru_cache.h
xfs_ondisk.h xfs: define the on-disk refcount btree format 2016-10-03 09:11:18 -07:00
xfs_pnfs.c xfs: remove i_iolock and use i_rwsem in the VFS inode instead 2016-11-30 14:33:25 +11:00
xfs_pnfs.h xfs: remove i_iolock and use i_rwsem in the VFS inode instead 2016-11-30 14:33:25 +11:00
xfs_qm_bhv.c
xfs_qm_syscalls.c
xfs_qm.c xfs: prevent quotacheck from overloading inode lru 2017-01-27 09:32:30 -08:00
xfs_qm.h
xfs_quota.h
xfs_quotaops.c
xfs_refcount_item.c xfs: fix double-cleanup when CUI recovery fails 2017-01-03 18:39:32 -08:00
xfs_refcount_item.h xfs: log refcount intent items 2016-10-03 09:11:21 -07:00
xfs_reflink.c xfs: fix uninitialized variable in _reflink_convert_cow 2017-02-16 17:20:12 -08:00
xfs_reflink.h xfs: allocate direct I/O COW blocks in iomap_begin 2017-02-06 17:47:47 -08:00
xfs_rmap_item.c xfs: convert unwritten status of reverse mappings for shared files 2016-10-05 16:26:29 -07:00
xfs_rmap_item.h xfs: convert RUI log formats to use variable length arrays 2016-09-19 10:24:27 +10:00
xfs_rtalloc.c xfs: simplify xfs_rtallocate_extent 2017-02-17 16:52:52 -08:00
xfs_rtalloc.h xfs: simplify xfs_rtallocate_extent 2017-02-17 16:52:52 -08:00
xfs_stats.c xfs: make xfs btree stats less huge 2016-12-05 14:38:58 +11:00
xfs_stats.h xfs: make xfs btree stats less huge 2016-12-05 14:38:58 +11:00
xfs_super.c xfs: don't block the log commit handler for discards 2017-02-09 11:36:40 -08:00
xfs_super.h xfs: don't block the log commit handler for discards 2017-02-09 11:36:40 -08:00
xfs_symlink.c xfs: remove i_iolock and use i_rwsem in the VFS inode instead 2016-11-30 14:33:25 +11:00
xfs_symlink.h
xfs_sysctl.c xfs: garbage collect old cowextsz reservations 2016-10-05 16:26:28 -07:00
xfs_sysctl.h xfs: garbage collect old cowextsz reservations 2016-10-05 16:26:28 -07:00
xfs_sysfs.c xfs: resurrect debug mode drop buffered writes mechanism 2017-02-16 17:19:15 -08:00
xfs_sysfs.h
xfs_trace.c xfs: rework xfs_bmap_free callers to use xfs_defer_ops 2016-08-03 11:15:38 +10:00
xfs_trace.h mm,fs,dax: change ->pmd_fault to ->huge_fault 2017-02-24 17:46:54 -08:00
xfs_trans_ail.c
xfs_trans_bmap.c xfs: implement deferred bmbt map/unmap operations 2016-10-04 11:05:44 -07:00
xfs_trans_buf.c
xfs_trans_dquot.c
xfs_trans_extfree.c xfs: set up per-AG free space reservations 2016-09-19 10:30:52 +10:00
xfs_trans_inode.c fs: Replace current_fs_time() with current_time() 2016-09-27 21:06:22 -04:00
xfs_trans_priv.h
xfs_trans_refcount.c xfs: connect refcount adjust functions to upper layers 2016-10-03 09:11:22 -07:00
xfs_trans_rmap.c xfs: add shared rmap map/unmap/convert log item types 2016-10-05 16:26:29 -07:00
xfs_trans.c Merge branch 'xfs-4.9-reflink-prep' into for-next 2016-10-03 09:52:31 +11:00
xfs_trans.h xfs: remove unused struct declarations 2017-01-30 16:32:25 -08:00
xfs_xattr.c xfs: several xattr functions can be void 2016-12-05 12:32:14 +11:00
xfs.h