linux/Documentation/vm
Kirill A. Shutemov c8d78c1823 mm: replace remap_file_pages() syscall with emulation
remap_file_pages(2) was invented to be able efficiently map parts of
huge file into limited 32-bit virtual address space such as in database
workloads.

Nonlinear mappings are pain to support and it seems there's no
legitimate use-cases nowadays since 64-bit systems are widely available.

Let's drop it and get rid of all these special-cased code.

The patch replaces the syscall with emulation which creates new VMA on
each remap_file_pages(), unless they it can be merged with an adjacent
one.

I didn't find *any* real code that uses remap_file_pages(2) to test
emulation impact on.  I've checked Debian code search and source of all
packages in ALT Linux.  No real users: libc wrappers, mentions in
strace, gdb, valgrind and this kind of stuff.

There are few basic tests in LTP for the syscall.  They work just fine
with emulation.

To test performance impact, I've written small test case which
demonstrate pretty much worst case scenario: map 4G shmfs file, write to
begin of every page pgoff of the page, remap pages in reverse order,
read every page.

The test creates 1 million of VMAs if emulation is in use, so I had to
set vm.max_map_count to 1100000 to avoid -ENOMEM.

Before:		23.3 ( +-  4.31% ) seconds
After:		43.9 ( +-  0.85% ) seconds
Slowdown:	1.88x

I believe we can live with that.

Test case:

        #define _GNU_SOURCE
        #include <assert.h>
        #include <stdlib.h>
        #include <stdio.h>
        #include <sys/mman.h>

        #define MB	(1024UL * 1024)
        #define SIZE	(4096 * MB)

        int main(int argc, char **argv)
        {
                unsigned long *p;
                long i, pass;

                for (pass = 0; pass < 10; pass++) {
                        p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
                        if (p == MAP_FAILED) {
                                perror("mmap");
                                return -1;
                        }

                        for (i = 0; i < SIZE / 4096; i++)
                                p[i * 4096 / sizeof(*p)] = i;

                        for (i = 0; i < SIZE / 4096; i++) {
                                if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096,
                                                0, (SIZE - 4096 * (i + 1)) >> 12, 0)) {
                                        perror("remap_file_pages");
                                        return -1;
                                }
                        }

                        for (i = SIZE / 4096 - 1; i >= 0; i--)
                                assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1);

                        munmap(p, SIZE);
                }

                return 0;
        }

[akpm@linux-foundation.org: fix spello]
[sasha.levin@oracle.com: initialize populate before usage]
[sasha.levin@oracle.com: grab file ref to prevent race while mmaping]
Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Armin Rigo <arigo@tunes.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10 14:30:30 -08:00
..
.gitignore Documentation/vm/.gitignore: add page-types 2009-09-24 07:20:57 -07:00
00-INDEX Documentation/: update 00-INDEX files 2014-02-10 16:01:40 -08:00
active_mm.txt Fix common misspellings 2011-03-31 11:26:23 -03:00
balance
cleancache.txt Cleanups: rename of flush to invalidate, moving reporting of statistics 2012-03-22 19:52:47 -07:00
frontswap.txt doc: fix quite a few typos within Documentation 2012-11-19 14:28:24 +01:00
highmem.txt mm: highmem documentation 2010-10-26 16:52:08 -07:00
hugetlbpage.txt Documentation: vm: Add 1GB large page support information 2014-11-06 15:14:11 -05:00
hwpoison.txt mm/memory-failure.c: support use of a dedicated thread to handle SIGBUS(BUS_MCEERR_AO) 2014-06-04 16:54:13 -07:00
ksm.txt ksm: add some comments 2013-02-23 17:50:23 -08:00
numa doc: fix broken references 2011-09-27 18:08:04 +02:00
numa_memory_policy.txt Documentation/vm/numa_memory_policy.txt: fix wrong document in numa_memory_policy.txt 2014-04-18 16:40:08 -07:00
overcommit-accounting mm: add overcommit_kbytes sysctl variable 2014-01-21 16:19:44 -08:00
page_migration
page_owner.txt Documentation: add new page_owner document 2014-12-13 12:42:48 -08:00
pagemap.txt Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2013-07-04 11:40:58 -07:00
remap_file_pages.txt mm: replace remap_file_pages() syscall with emulation 2015-02-10 14:30:30 -08:00
slub.txt Documentations: Fix slabinfo.c directory in vm/slub.txt 2012-05-10 11:45:23 +03:00
soft-dirty.txt mm: track vma changes with VM_SOFTDIRTY bit 2013-09-11 15:57:56 -07:00
split_page_table_lock x86, mm: do not leak page->ptl for pmd page tables 2013-11-21 16:42:28 -08:00
transhuge.txt doc: spelling error changes 2014-05-05 15:32:05 +02:00
unevictable-lru.txt doc: fix double words 2014-03-21 13:16:58 +01:00
zswap.txt Documentation/vm/zswap.txt: fix typos 2013-11-13 12:09:05 +09:00