Merge branch 'nfs' into docs-next
Daniel W. S. Almeida writes: This series converts a few docs in Documentation/filesystems/nfs to RST. The docs were also moved into admin-guide because they contain information that might be useful for system administrators. Most changes are related to aesthetics and presentation, i.e. the content itself remains mostly untouched. The use of markup was limited in order not to negatively impact the plain-text reading experience.
commit
61f005901b
@ -76,6 +76,7 @@ configure specific aspects of kernel behavior to your liking.
|
|||||||
device-mapper/index
|
device-mapper/index
|
||||||
efi-stub
|
efi-stub
|
||||||
ext4
|
ext4
|
||||||
|
nfs/index
|
||||||
gpio/index
|
gpio/index
|
||||||
highuid
|
highuid
|
||||||
hw_random
|
hw_random
|
||||||
|
@ -1,6 +1,7 @@
|
|||||||
|
===================
|
||||||
|
NFS Fault Injection
|
||||||
|
===================
|
||||||
|
|
||||||
Fault Injection
|
|
||||||
===============
|
|
||||||
Fault injection is a method for forcing errors that may not normally occur, or
|
Fault injection is a method for forcing errors that may not normally occur, or
|
||||||
may be difficult to reproduce. Forcing these errors in a controlled environment
|
may be difficult to reproduce. Forcing these errors in a controlled environment
|
||||||
can help the developer find and fix bugs before their code is shipped in a
|
can help the developer find and fix bugs before their code is shipped in a
|
15
Documentation/admin-guide/nfs/index.rst
Normal file
@ -0,0 +1,15 @@
|
|||||||
|
=============
|
||||||
|
NFS
|
||||||
|
=============
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 1
|
||||||
|
|
||||||
|
nfs-client
|
||||||
|
nfsroot
|
||||||
|
nfs-rdma
|
||||||
|
nfsd-admin-interfaces
|
||||||
|
nfs-idmapper
|
||||||
|
pnfs-block-server
|
||||||
|
pnfs-scsi-server
|
||||||
|
fault_injection
|
@ -1,3 +1,6 @@
|
|||||||
|
==========
|
||||||
|
NFS Client
|
||||||
|
==========
|
||||||
|
|
||||||
The NFS client
|
The NFS client
|
||||||
==============
|
==============
|
||||||
@ -59,10 +62,11 @@ The DNS resolver
|
|||||||
|
|
||||||
NFSv4 allows for one server to refer the NFS client to data that has been
|
NFSv4 allows for one server to refer the NFS client to data that has been
|
||||||
migrated onto another server by means of the special "fs_locations"
|
migrated onto another server by means of the special "fs_locations"
|
||||||
attribute. See
|
attribute. See `RFC3530 Section 6: Filesystem Migration and Replication`_ and
|
||||||
http://tools.ietf.org/html/rfc3530#section-6
|
`Implementation Guide for Referrals in NFSv4`_.
|
||||||
and
|
|
||||||
http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00
|
.. _RFC3530 Section 6\: Filesystem Migration and Replication: http://tools.ietf.org/html/rfc3530#section-6
|
||||||
|
.. _Implementation Guide for Referrals in NFSv4: http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00
|
||||||
|
|
||||||
The fs_locations information can take the form of either an ip address and
|
The fs_locations information can take the form of either an ip address and
|
||||||
a path, or a DNS hostname and a path. The latter requires the NFS client to
|
a path, or a DNS hostname and a path. The latter requires the NFS client to
|
||||||
@ -78,8 +82,8 @@ Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual
|
|||||||
(2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent'
|
(2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent'
|
||||||
(may be changed using the 'nfs.cache_getent' kernel boot parameter)
|
(may be changed using the 'nfs.cache_getent' kernel boot parameter)
|
||||||
is run, with two arguments:
|
is run, with two arguments:
|
||||||
- the cache name, "dns_resolve"
|
- the cache name, "dns_resolve"
|
||||||
- the hostname to resolve
|
- the hostname to resolve
|
||||||
|
|
||||||
(3) After looking up the corresponding ip address, the helper script
|
(3) After looking up the corresponding ip address, the helper script
|
||||||
writes the result into the rpc_pipefs pseudo-file
|
writes the result into the rpc_pipefs pseudo-file
|
||||||
@ -94,43 +98,44 @@ Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual
|
|||||||
script, and <ttl> is the 'time to live' of this cache entry (in
|
script, and <ttl> is the 'time to live' of this cache entry (in
|
||||||
units of seconds).
|
units of seconds).
|
||||||
|
|
||||||
Note: If <ip address> is invalid, say the string "0", then a negative
|
.. note::
|
||||||
entry is created, which will cause the kernel to treat the hostname
|
If <ip address> is invalid, say the string "0", then a negative
|
||||||
as having no valid DNS translation.
|
entry is created, which will cause the kernel to treat the hostname
|
||||||
|
as having no valid DNS translation.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
A basic sample /sbin/nfs_cache_getent
|
A basic sample /sbin/nfs_cache_getent
|
||||||
=====================================
|
=====================================
|
||||||
|
.. code-block:: sh
|
||||||
|
|
||||||
#!/bin/bash
|
#!/bin/bash
|
||||||
#
|
#
|
||||||
ttl=600
|
ttl=600
|
||||||
#
|
#
|
||||||
cut=/usr/bin/cut
|
cut=/usr/bin/cut
|
||||||
getent=/usr/bin/getent
|
getent=/usr/bin/getent
|
||||||
rpc_pipefs=/var/lib/nfs/rpc_pipefs
|
rpc_pipefs=/var/lib/nfs/rpc_pipefs
|
||||||
#
|
#
|
||||||
die()
|
die()
|
||||||
{
|
{
|
||||||
echo "Usage: $0 cache_name entry_name"
|
echo "Usage: $0 cache_name entry_name"
|
||||||
exit 1
|
exit 1
|
||||||
}
|
}
|
||||||
|
|
||||||
[ $# -lt 2 ] && die
|
[ $# -lt 2 ] && die
|
||||||
cachename="$1"
|
cachename="$1"
|
||||||
cache_path=${rpc_pipefs}/cache/${cachename}/channel
|
cache_path=${rpc_pipefs}/cache/${cachename}/channel
|
||||||
|
|
||||||
case "${cachename}" in
|
|
||||||
dns_resolve)
|
|
||||||
name="$2"
|
|
||||||
result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )"
|
|
||||||
[ -z "${result}" ] && result="0"
|
|
||||||
;;
|
|
||||||
*)
|
|
||||||
die
|
|
||||||
;;
|
|
||||||
esac
|
|
||||||
echo "${result} ${name} ${ttl}" >${cache_path}
|
|
||||||
|
|
||||||
|
case "${cachename}" in
|
||||||
|
dns_resolve)
|
||||||
|
name="$2"
|
||||||
|
result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )"
|
||||||
|
[ -z "${result}" ] && result="0"
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
die
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
echo "${result} ${name} ${ttl}" >${cache_path}
|
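The cache entry that the sample script writes can be sketched on its own. The function below is a minimal illustration, assuming the "<ip address> <hostname> <ttl>" layout described above; the name format_dns_entry is hypothetical and not part of nfs-utils. An empty address is turned into "0", which the text above describes as producing a negative entry.

```shell
#!/bin/sh
# Format one dns_resolve cache entry: "<ip address> <hostname> <ttl>".
# format_dns_entry is a hypothetical helper name used for illustration.
format_dns_entry() {
    addr="${1:-}"
    name="${2:-}"
    ttl="${3:-600}"          # same default ttl as the sample script
    if [ -z "$addr" ]; then
        addr="0"             # invalid address: request a negative entry
    fi
    printf '%s %s %s\n' "$addr" "$name" "$ttl"
}
```

In the sample script the resulting line is redirected into the "channel" pseudo-file of the dns_resolve cache under rpc_pipefs.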
@ -1,7 +1,7 @@
|
|||||||
|
=============
|
||||||
|
NFS ID Mapper
|
||||||
|
=============
|
||||||
|
|
||||||
=========
|
|
||||||
ID Mapper
|
|
||||||
=========
|
|
||||||
Id mapper is used by NFS to translate user and group ids into names, and to
|
Id mapper is used by NFS to translate user and group ids into names, and to
|
||||||
translate user and group names into ids. Part of this translation involves
|
translate user and group names into ids. Part of this translation involves
|
||||||
performing an upcall to userspace to request the information. There are two
|
performing an upcall to userspace to request the information. There are two
|
||||||
@ -20,22 +20,24 @@ legacy rpc.idmap daemon for the id mapping. This result will be stored
|
|||||||
in a custom NFS idmap cache.
|
in a custom NFS idmap cache.
|
||||||
|
|
||||||
|
|
||||||
===========
|
|
||||||
Configuring
|
Configuring
|
||||||
===========
|
===========
|
||||||
|
|
||||||
The file /etc/request-key.conf will need to be modified so /sbin/request-key can
|
The file /etc/request-key.conf will need to be modified so /sbin/request-key can
|
||||||
direct the upcall. The following line should be added:
|
direct the upcall. The following line should be added:
|
||||||
|
|
||||||
#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...
|
``#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...``
|
||||||
#====== ======= =============== =============== ===============================
|
``#====== ======= =============== =============== ===============================``
|
||||||
create id_resolver * * /usr/sbin/nfs.idmap %k %d 600
|
``create id_resolver * * /usr/sbin/nfs.idmap %k %d 600``
|
||||||
|
|
||||||
|
|
||||||
This will direct all id_resolver requests to the program /usr/sbin/nfs.idmap.
|
This will direct all id_resolver requests to the program /usr/sbin/nfs.idmap.
|
||||||
The last parameter, 600, defines how many seconds into the future the key will
|
The last parameter, 600, defines how many seconds into the future the key will
|
||||||
expire. This parameter is optional for /usr/sbin/nfs.idmap. When the timeout
|
expire. This parameter is optional for /usr/sbin/nfs.idmap. When the timeout
|
||||||
is not specified, nfs.idmap will default to 600 seconds.
|
is not specified, nfs.idmap will default to 600 seconds.
|
||||||
|
|
||||||
id mapper uses for key descriptions:
|
id mapper uses for key descriptions::
|
||||||
|
|
||||||
uid: Find the UID for the given user
|
uid: Find the UID for the given user
|
||||||
gid: Find the GID for the given group
|
gid: Find the GID for the given group
|
||||||
user: Find the user name for the given UID
|
user: Find the user name for the given UID
|
||||||
@ -45,23 +47,24 @@ You can handle any of these individually, rather than using the generic upcall
|
|||||||
program. If you would like to use your own program for a uid lookup then you
|
program. If you would like to use your own program for a uid lookup then you
|
||||||
would edit your request-key.conf so it looks similar to this:
|
would edit your request-key.conf so it looks similar to this:
|
||||||
|
|
||||||
#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...
|
``#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...``
|
||||||
#====== ======= =============== =============== ===============================
|
``#====== ======= =============== =============== ===============================``
|
||||||
create id_resolver uid:* * /some/other/program %k %d 600
|
``create id_resolver uid:* * /some/other/program %k %d 600``
|
||||||
create id_resolver * * /usr/sbin/nfs.idmap %k %d 600
|
``create id_resolver * * /usr/sbin/nfs.idmap %k %d 600``
|
||||||
|
|
||||||
|
|
||||||
Notice that the new line was added above the line for the generic program.
|
Notice that the new line was added above the line for the generic program.
|
||||||
request-key will find the first matching line and corresponding program. In
|
request-key will find the first matching line and corresponding program. In
|
||||||
this case, /some/other/program will handle all uid lookups and
|
this case, /some/other/program will handle all uid lookups and
|
||||||
/usr/sbin/nfs.idmap will handle gid, user, and group lookups.
|
/usr/sbin/nfs.idmap will handle gid, user, and group lookups.
|
||||||
|
|
||||||
See <file:Documentation/security/keys/request-key.rst> for more information
|
See Documentation/security/keys/request-key.rst for more information
|
||||||
about the request-key function.
|
about the request-key function.
|
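The "first matching line wins" behaviour can be illustrated with a small shell sketch. This only mimics the ordering rule using the two example rules above; it is not how /sbin/request-key is implemented, and the function name pick_program and the sample descriptions are assumptions made for illustration.

```shell
#!/bin/sh
# Illustrative only: walk the rules in file order and print the program
# of the first rule whose description pattern matches.
pick_program() {
    desc="$1"
    # Each rule line: "<description pattern> <program>"
    while read -r pattern program; do
        case "$desc" in
            $pattern)        # unquoted on purpose: treated as a glob
                echo "$program"
                return 0
                ;;
        esac
    done <<'EOF'
uid:* /some/other/program
* /usr/sbin/nfs.idmap
EOF
    return 1
}
```

With these rules, a description such as "uid:alice" selects /some/other/program, while "gid:staff" falls through to /usr/sbin/nfs.idmap. Swapping the two rule lines would make the generic rule catch everything, which is why the specific line has to come first.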
||||||
|
|
||||||
|
|
||||||
=========
|
|
||||||
nfs.idmap
|
nfs.idmap
|
||||||
=========
|
=========
|
||||||
|
|
||||||
nfs.idmap is designed to be called by request-key, and should not be run "by
|
nfs.idmap is designed to be called by request-key, and should not be run "by
|
||||||
hand". This program takes two arguments, a serialized key and a key
|
hand". This program takes two arguments, a serialized key and a key
|
||||||
description. The serialized key is first converted into a key_serial_t, and
|
description. The serialized key is first converted into a key_serial_t, and
|
292
Documentation/admin-guide/nfs/nfs-rdma.rst
Normal file
@ -0,0 +1,292 @@
|
|||||||
|
===================
|
||||||
|
Setting up NFS/RDMA
|
||||||
|
===================
|
||||||
|
|
||||||
|
:Author:
|
||||||
|
NetApp and Open Grid Computing (May 29, 2008)
|
||||||
|
|
||||||
|
.. warning::
|
||||||
|
This document is probably obsolete.
|
||||||
|
|
||||||
|
Overview
|
||||||
|
========
|
||||||
|
|
||||||
|
This document describes how to install and set up the Linux NFS/RDMA client
|
||||||
|
and server software.
|
||||||
|
|
||||||
|
The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server
|
||||||
|
was first included in the following release, Linux 2.6.25.
|
||||||
|
|
||||||
|
In our testing, we have obtained excellent performance results (full 10Gbit
|
||||||
|
wire bandwidth at minimal client CPU) under many workloads. The code passes
|
||||||
|
the full Connectathon test suite and operates over both InfiniBand and iWARP
|
||||||
|
RDMA adapters.
|
||||||
|
|
||||||
|
Getting Help
|
||||||
|
============
|
||||||
|
|
||||||
|
If you get stuck, you can ask questions on the
|
||||||
|
nfs-rdma-devel@lists.sourceforge.net mailing list.
|
||||||
|
|
||||||
|
Installation
|
||||||
|
============
|
||||||
|
|
||||||
|
These instructions are a step-by-step guide to building a machine for
|
||||||
|
use with NFS/RDMA.
|
||||||
|
|
||||||
|
- Install an RDMA device
|
||||||
|
|
||||||
|
Any device supported by the drivers in drivers/infiniband/hw is acceptable.
|
||||||
|
|
||||||
|
Testing has been performed using several Mellanox-based IB cards, the
|
||||||
|
Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.
|
||||||
|
|
||||||
|
- Install a Linux distribution and tools
|
||||||
|
|
||||||
|
The first kernel release to contain both the NFS/RDMA client and server was
|
||||||
|
Linux 2.6.25. Therefore, a distribution compatible with this and subsequent
|
||||||
|
Linux kernel releases should be installed.
|
||||||
|
|
||||||
|
The procedures described in this document have been tested with
|
||||||
|
distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).
|
||||||
|
|
||||||
|
- Install nfs-utils-1.1.2 or greater on the client
|
||||||
|
|
||||||
|
An NFS/RDMA mount point can be obtained by using the mount.nfs command in
|
||||||
|
nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
|
||||||
|
version with support for NFS/RDMA mounts, but for various reasons we
|
||||||
|
recommend using nfs-utils-1.1.2 or greater). To see which version of
|
||||||
|
mount.nfs you are using, type:
|
||||||
|
|
||||||
|
.. code-block:: sh
|
||||||
|
|
||||||
|
$ /sbin/mount.nfs -V
|
||||||
|
|
||||||
|
If the version is less than 1.1.2 or the command does not exist,
|
||||||
|
you should install the latest version of nfs-utils.
|
||||||
|
|
||||||
|
Download the latest package from: http://www.kernel.org/pub/linux/utils/nfs
|
||||||
|
|
||||||
|
Uncompress the package and follow the installation instructions.
|
||||||
|
|
||||||
|
If you will not need the idmapper and gssd executables (you do not need
|
||||||
|
these to create an NFS/RDMA enabled mount command), the installation
|
||||||
|
process can be simplified by disabling these features when running
|
||||||
|
configure:
|
||||||
|
|
||||||
|
.. code-block:: sh
|
||||||
|
|
||||||
|
$ ./configure --disable-gss --disable-nfsv4
|
||||||
|
|
||||||
|
To build nfs-utils you will need the tcp_wrappers package installed. For
|
||||||
|
more information on this see the package's README and INSTALL files.
|
||||||
|
|
||||||
|
After building the nfs-utils package, there will be a mount.nfs binary in
|
||||||
|
the utils/mount directory. This binary can be used to initiate NFS v2, v3,
|
||||||
|
or v4 mounts. To initiate a v4 mount, the binary must be called
|
||||||
|
mount.nfs4. The standard technique is to create a symlink called
|
||||||
|
mount.nfs4 to mount.nfs.
|
||||||
|
|
||||||
|
This mount.nfs binary should be installed at /sbin/mount.nfs as follows:
|
||||||
|
|
||||||
|
.. code-block:: sh
|
||||||
|
|
||||||
|
$ sudo cp utils/mount/mount.nfs /sbin/mount.nfs
|
||||||
|
|
||||||
|
In this location, mount.nfs will be invoked automatically for NFS mounts
|
||||||
|
by the system mount command.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
|
||||||
|
on the NFS client machine. You do not need this specific version of
|
||||||
|
nfs-utils on the server. Furthermore, only the mount.nfs command from
|
||||||
|
nfs-utils-1.1.2 is needed on the client.
|
||||||
|
|
||||||
|
- Install a Linux kernel with NFS/RDMA
|
||||||
|
|
||||||
|
The NFS/RDMA client and server are both included in the mainline Linux
|
||||||
|
kernel version 2.6.25 and later. This and other versions of the Linux
|
||||||
|
kernel can be found at: https://www.kernel.org/pub/linux/kernel/
|
||||||
|
|
||||||
|
Download the sources and place them in an appropriate location.
|
||||||
|
|
||||||
|
- Configure the RDMA stack
|
||||||
|
|
||||||
|
Make sure your kernel configuration has RDMA support enabled. Under
|
||||||
|
Device Drivers -> InfiniBand support, update the kernel configuration
|
||||||
|
to enable InfiniBand support [NOTE: the option name is misleading. Enabling
|
||||||
|
InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)].
|
||||||
|
|
||||||
|
Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
|
||||||
|
iWARP adapter support (amso, cxgb3, etc.).
|
||||||
|
|
||||||
|
If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.
|
||||||
|
|
||||||
|
- Configure the NFS client and server
|
||||||
|
|
||||||
|
Your kernel configuration must also have NFS file system support and/or
|
||||||
|
NFS server support enabled. These and other NFS related configuration
|
||||||
|
options can be found under File Systems -> Network File Systems.
|
||||||
|
|
||||||
|
- Build, install, reboot
|
||||||
|
|
||||||
|
The NFS/RDMA code will be enabled automatically if NFS and RDMA
|
||||||
|
are turned on. The NFS/RDMA client and server are configured via the hidden
|
||||||
|
SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
|
||||||
|
value of SUNRPC_XPRT_RDMA will be:
|
||||||
|
|
||||||
|
#. N if either SUNRPC or INFINIBAND is N; in this case the NFS/RDMA client
|
||||||
|
and server will not be built
|
||||||
|
|
||||||
|
#. M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M;
|
||||||
|
in this case the NFS/RDMA client and server will be built as modules
|
||||||
|
|
||||||
|
#. Y if both SUNRPC and INFINIBAND are Y; in this case the NFS/RDMA client
|
||||||
|
and server will be built into the kernel
|
||||||
|
|
||||||
|
Therefore, if you have followed the steps above and turned on NFS and RDMA,
|
||||||
|
the NFS/RDMA client and server will be built.
|
||||||
|
|
||||||
|
Build a new kernel, install it, boot it.
|
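The three rules for SUNRPC_XPRT_RDMA in the last step can be written down as a small function. This is a sketch for reasoning about a configuration, not part of Kconfig; the function name and the lowercase y/m/n spelling are assumptions.

```shell
#!/bin/sh
# Print the SUNRPC_XPRT_RDMA value (y, m, or n) for the given SUNRPC and
# INFINIBAND settings, following the three rules listed above.
# usage: sunrpc_xprt_rdma <sunrpc> <infiniband>
sunrpc_xprt_rdma() {
    sunrpc="$1"
    infiniband="$2"
    if [ "$sunrpc" = n ] || [ "$infiniband" = n ]; then
        echo n      # either one off: NFS/RDMA is not built
    elif [ "$sunrpc" = y ] && [ "$infiniband" = y ]; then
        echo y      # both built in: NFS/RDMA is built in
    else
        echo m      # both on, at least one modular: built as modules
    fi
}
```

For example, SUNRPC=y with INFINIBAND=m gives m, so the NFS/RDMA transport is then built as modules.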
||||||
|
|
||||||
|
Check RDMA and NFS Setup
|
||||||
|
========================
|
||||||
|
|
||||||
|
Before configuring the NFS/RDMA software, it is a good idea to test
|
||||||
|
your new kernel to ensure that the kernel is working correctly.
|
||||||
|
In particular, it is a good idea to verify that the RDMA stack
|
||||||
|
is functioning as expected and standard NFS over TCP/IP and/or UDP/IP
|
||||||
|
is working properly.
|
||||||
|
|
||||||
|
- Check RDMA Setup
|
||||||
|
|
||||||
|
If you built the RDMA components as modules, load them at
|
||||||
|
this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
|
||||||
|
card:
|
||||||
|
|
||||||
|
.. code-block:: sh
|
||||||
|
|
||||||
|
$ modprobe ib_mthca
|
||||||
|
$ modprobe ib_ipoib
|
||||||
|
|
||||||
|
If you are using InfiniBand, make sure there is a Subnet Manager (SM)
|
||||||
|
running on the network. If your IB switch has an embedded SM, you can
|
||||||
|
use it. Otherwise, you will need to run an SM, such as OpenSM, on one
|
||||||
|
of your end nodes.
|
||||||
|
|
||||||
|
If an SM is running on your network, you should see the following:
|
||||||
|
|
||||||
|
.. code-block:: sh
|
||||||
|
|
||||||
|
$ cat /sys/class/infiniband/driverX/ports/1/state
|
||||||
|
4: ACTIVE
|
||||||
|
|
||||||
|
where driverX is mthca0, ipath5, ehca3, etc.
|
||||||
|
|
||||||
|
To further test the InfiniBand software stack, use IPoIB (this
|
||||||
|
assumes you have two IB hosts named host1 and host2):
|
||||||
|
|
||||||
|
.. code-block:: sh
|
||||||
|
|
||||||
|
host1$ ip link set dev ib0 up
|
||||||
|
host1$ ip address add dev ib0 a.b.c.x
|
||||||
|
host2$ ip link set dev ib0 up
|
||||||
|
host2$ ip address add dev ib0 a.b.c.y
|
||||||
|
host1$ ping a.b.c.y
|
||||||
|
host2$ ping a.b.c.x
|
||||||
|
|
||||||
|
For other device types, follow the appropriate procedures.
|
||||||
|
|
||||||
|
- Check NFS Setup
|
||||||
|
|
||||||
|
For the NFS components enabled above (client and/or server),
|
||||||
|
test their functionality over standard Ethernet using TCP/IP or UDP/IP.
|
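The port-state check from the RDMA step above can be scripted so that every device and port is examined at once. The sketch below assumes the "4: ACTIVE" string shown earlier is the only state of interest; the function name and the optional directory argument (there only so it can be tried against a scratch directory) are assumptions.

```shell
#!/bin/sh
# Print the state file of every InfiniBand port that is not "4: ACTIVE".
# usage: inactive_ib_ports [sysfs_root]   (default /sys/class/infiniband)
inactive_ib_ports() {
    sysfs_root="${1:-/sys/class/infiniband}"
    for state_file in "$sysfs_root"/*/ports/*/state; do
        # Skip the unexpanded glob when no device is present.
        [ -r "$state_file" ] || continue
        if [ "$(cat "$state_file")" != "4: ACTIVE" ]; then
            echo "$state_file"
        fi
    done
}
```

No output means every port found reports ACTIVE. Any path that is printed points at a port that likely still needs a cable, a driver, or a running Subnet Manager.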
||||||
|
|
||||||
|
NFS/RDMA Setup
|
||||||
|
==============
|
||||||
|
|
||||||
|
We recommend that you use two machines, one to act as the client and
|
||||||
|
one to act as the server.
|
||||||
|
|
||||||
|
One time configuration:
|
||||||
|
-----------------------
|
||||||
|
|
||||||
|
- On the server system, configure the /etc/exports file and start the NFS/RDMA server.
|
||||||
|
|
||||||
|
Exports entries with the following formats have been tested::
|
||||||
|
|
||||||
|
/vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
|
||||||
|
/vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)
|
||||||
|
|
||||||
|
The IP address(es) is(are) the client's IPoIB address for an InfiniBand
|
||||||
|
HCA or the client's iWARP address(es) for an RNIC.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
The "insecure" option must be used because the NFS/RDMA client does
|
||||||
|
not use a reserved port.
|
||||||
|
|
||||||
|
Each time a machine boots:
|
||||||
|
--------------------------
|
||||||
|
|
||||||
|
- Load and configure the RDMA drivers
|
||||||
|
|
||||||
|
For InfiniBand using a Mellanox adapter:
|
||||||
|
|
||||||
|
.. code-block:: sh
|
||||||
|
|
||||||
|
$ modprobe ib_mthca
|
||||||
|
$ modprobe ib_ipoib
|
||||||
|
$ ip li set dev ib0 up
|
||||||
|
$ ip addr add dev ib0 a.b.c.d
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
Please use unique addresses for the client and server!
|
||||||
|
|
||||||
|
- Start the NFS server
|
||||||
|
|
||||||
|
If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
|
||||||
|
kernel config), load the RDMA transport module:
|
||||||
|
|
||||||
|
.. code-block:: sh
|
||||||
|
|
||||||
|
$ modprobe svcrdma
|
||||||
|
|
||||||
|
Regardless of how the server was built (module or built-in), start the
|
||||||
|
server:
|
||||||
|
|
||||||
|
.. code-block:: sh
|
||||||
|
|
||||||
|
$ /etc/init.d/nfs start
|
||||||
|
|
||||||
|
or
|
||||||
|
|
||||||
|
.. code-block:: sh
|
||||||
|
|
||||||
|
$ service nfs start
|
||||||
|
|
||||||
|
Instruct the server to listen on the RDMA transport:
|
||||||
|
|
||||||
|
.. code-block:: sh
|
||||||
|
|
||||||
|
$ echo rdma 20049 > /proc/fs/nfsd/portlist
|
||||||
|
|
||||||
|
- On the client system
|
||||||
|
|
||||||
|
If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
|
||||||
|
kernel config), load the RDMA client module:
|
||||||
|
|
||||||
|
.. code-block:: sh
|
||||||
|
|
||||||
|
$ modprobe xprtrdma
|
||||||
|
|
||||||
|
Regardless of how the client was built (module or built-in), use this
|
||||||
|
command to mount the NFS/RDMA server:
|
||||||
|
|
||||||
|
.. code-block:: sh
|
||||||
|
|
||||||
|
$ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt
|
||||||
|
|
||||||
|
To verify that the mount is using RDMA, run "cat /proc/mounts" and check
|
||||||
|
the "proto" field for the given mount.
|
||||||
|
|
||||||
|
Congratulations! You're using NFS/RDMA!
|
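The /proc/mounts check described above can also be scripted. This is a minimal sketch that looks for "proto=rdma" in the options field of the entry for one mount point; the function name is hypothetical, and the second argument exists only so the function can be run against a sample file.

```shell
#!/bin/sh
# Succeed if the given mount point is mounted with proto=rdma.
# usage: mount_uses_rdma <mount-point> [mounts_file]
mount_uses_rdma() {
    mount_point="$1"
    mounts_file="${2:-/proc/mounts}"
    # Fields: device, mount point, fs type, options, dump, pass
    while read -r _dev mnt _type opts _rest; do
        [ "$mnt" = "$mount_point" ] || continue
        case ",$opts," in
            *,proto=rdma,*) return 0 ;;
        esac
    done < "$mounts_file"
    return 1
}
```

A typical use would be "mount_uses_rdma /mnt && echo RDMA in use" after running the mount command shown above.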
@ -1,5 +1,6 @@
|
|||||||
|
==================================
|
||||||
Administrative interfaces for nfsd
|
Administrative interfaces for nfsd
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
==================================
|
||||||
|
|
||||||
Note that normally these interfaces are used only by the utilities in
|
Note that normally these interfaces are used only by the utilities in
|
||||||
nfs-utils.
|
nfs-utils.
|
||||||
@ -13,18 +14,16 @@ nfsd/threads.
|
|||||||
Before doing that, NFSD can be told which sockets to listen on by
|
Before doing that, NFSD can be told which sockets to listen on by
|
||||||
writing to nfsd/portlist; that write may be:
|
writing to nfsd/portlist; that write may be:
|
||||||
|
|
||||||
- an ascii-encoded file descriptor, which should refer to a
|
- an ascii-encoded file descriptor, which should refer to a
|
||||||
bound (and listening, for tcp) socket, or
|
bound (and listening, for tcp) socket, or
|
||||||
- "transportname port", where transportname is currently either
|
- "transportname port", where transportname is currently either
|
||||||
"udp", "tcp", or "rdma".
|
"udp", "tcp", or "rdma".
|
||||||
|
|
||||||
If nfsd is started without doing any of these, then it will create one
|
If nfsd is started without doing any of these, then it will create one
|
||||||
udp and one tcp listener at port 2049 (see nfsd_init_socks).
|
udp and one tcp listener at port 2049 (see nfsd_init_socks).
|
||||||
|
|
||||||
On startup, nfsd and lockd grace periods start.
|
On startup, nfsd and lockd grace periods start. nfsd is shut down by a write of
|
||||||
|
0 to nfsd/threads. All locks and state are thrown away at that point.
|
||||||
nfsd is shut down by a write of 0 to nfsd/threads. All locks and state
|
|
||||||
are thrown away at that point.
|
|
||||||
|
|
||||||
Between startup and shutdown, the number of threads may be adjusted up
|
Between startup and shutdown, the number of threads may be adjusted up
|
||||||
or down by additional writes to nfsd/threads or by writes to
|
or down by additional writes to nfsd/threads or by writes to
|
||||||
@ -34,7 +33,7 @@ For more detail about files under nfsd/ and what they control, see
|
|||||||
fs/nfsd/nfsctl.c; most of them have detailed comments.
|
fs/nfsd/nfsctl.c; most of them have detailed comments.
|
||||||
|
|
||||||
Implementation notes
|
Implementation notes
|
||||||
^^^^^^^^^^^^^^^^^^^^
|
====================
|
||||||
|
|
||||||
Note that the rpc server requires the caller to serialize addition and
|
Note that the rpc server requires the caller to serialize addition and
|
||||||
removal of listening sockets, and startup and shutdown of the server.
|
removal of listening sockets, and startup and shutdown of the server.
|
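The writes to nfsd/portlist and nfsd/threads described above can be wrapped in two small helpers. This is a sketch only: the function names are hypothetical, and /proc/fs/nfsd is assumed as the mount point of the nfsd filesystem. As the note at the top says, these files are normally driven by the nfs-utils tools rather than by hand.

```shell
#!/bin/sh
# Add a listener by writing "transportname port" to nfsd/portlist.
# usage: nfsd_add_listener <transport> <port> [nfsd_dir]
nfsd_add_listener() {
    echo "$1 $2" > "${3:-/proc/fs/nfsd}/portlist"
}

# Set the number of nfsd threads; writing 0 shuts the server down.
# usage: nfsd_set_threads <count> [nfsd_dir]
nfsd_set_threads() {
    echo "$1" > "${2:-/proc/fs/nfsd}/threads"
}
```

Following the order given above, listeners are added first (for example "nfsd_add_listener tcp 2049"), then the server is started with a non-zero thread count, and "nfsd_set_threads 0" stops it again. The thread count itself is whatever suits the workload; no particular value is implied here.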
@ -1,27 +1,34 @@
|
|||||||
|
===============================================
|
||||||
Mounting the root filesystem via NFS (nfsroot)
|
Mounting the root filesystem via NFS (nfsroot)
|
||||||
===============================================
|
===============================================
|
||||||
|
|
||||||
Written 1996 by Gero Kuhlmann <gero@gkminix.han.de>
|
:Authors:
|
||||||
Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
|
Written 1996 by Gero Kuhlmann <gero@gkminix.han.de>
|
||||||
Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org>
|
|
||||||
Updated 2006 by Horms <horms@verge.net.au>
|
Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
|
||||||
Updated 2018 by Chris Novakovic <chris@chrisn.me.uk>
|
|
||||||
|
Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org>
|
||||||
|
|
||||||
|
Updated 2006 by Horms <horms@verge.net.au>
|
||||||
|
|
||||||
|
Updated 2018 by Chris Novakovic <chris@chrisn.me.uk>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
In order to use a diskless system, such as an X-terminal or printer server
|
In order to use a diskless system, such as an X-terminal or printer server for
|
||||||
for example, it is necessary for the root filesystem to be present on a
|
example, it is necessary for the root filesystem to be present on a non-disk
|
||||||
non-disk device. This may be an initramfs (see Documentation/filesystems/
|
device. This may be an initramfs (see
|
||||||
ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/admin-guide/initrd.rst) or a
|
Documentation/filesystems/ramfs-rootfs-initramfs.txt), a ramdisk (see
|
||||||
filesystem mounted via NFS. The following text describes how to use NFS
|
Documentation/admin-guide/initrd.rst) or a filesystem mounted via NFS. The
|
||||||
for the root filesystem. For the rest of this text 'client' means the
|
following text describes how to use NFS for the root filesystem. For the rest
|
||||||
diskless system, and 'server' means the NFS server.
|
of this text 'client' means the diskless system, and 'server' means the NFS
|
||||||
|
server.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
1.) Enabling nfsroot capabilities
|
Enabling nfsroot capabilities
|
||||||
-----------------------------
|
=============================
|
||||||
|
|
||||||
In order to use nfsroot, NFS client support needs to be selected as
|
In order to use nfsroot, NFS client support needs to be selected as
|
||||||
built-in during configuration. Once this has been selected, the nfsroot
|
built-in during configuration. Once this has been selected, the nfsroot
|
||||||
@ -34,8 +41,8 @@ DHCP, BOOTP and RARP is safe.
|
|||||||
|
|
||||||
|
|
||||||
|
|
||||||
2.) Kernel command line
|
Kernel command line
|
||||||
-------------------
|
===================
|
||||||
|
|
||||||
When the kernel has been loaded by a boot loader (see below) it needs to be
|
When the kernel has been loaded by a boot loader (see below) it needs to be
|
||||||
told what root fs device to use. And in the case of nfsroot, where to find
|
told what root fs device to use. And in the case of nfsroot, where to find
|
||||||
@ -44,19 +51,17 @@ This can be established using the following kernel command line parameters:
|
|||||||
|
|
||||||
|
|
||||||
root=/dev/nfs
|
root=/dev/nfs
|
||||||
|
|
||||||
This is necessary to enable the pseudo-NFS-device. Note that it's not a
|
This is necessary to enable the pseudo-NFS-device. Note that it's not a
|
||||||
real device but just a synonym to tell the kernel to use NFS instead of
|
real device but just a synonym to tell the kernel to use NFS instead of
|
||||||
a real device.
|
a real device.
|
||||||
|
|
||||||
|
|
||||||
nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
|
nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
|
||||||
|
|
||||||
If the `nfsroot' parameter is NOT given on the command line,
|
If the `nfsroot' parameter is NOT given on the command line,
|
||||||
the default "/tftpboot/%s" will be used.
|
the default ``"/tftpboot/%s"`` will be used.
|
||||||
|
|
||||||
<server-ip> Specifies the IP address of the NFS server.
|
<server-ip> Specifies the IP address of the NFS server.
|
||||||
The default address is determined by the `ip' parameter
|
The default address is determined by the ip parameter
|
||||||
(see below). This parameter allows the use of different
|
(see below). This parameter allows the use of different
|
||||||
servers for IP autoconfiguration and NFS.
|
servers for IP autoconfiguration and NFS.
|
||||||
|
|
||||||
@ -66,7 +71,8 @@ nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
|
|||||||
IP address.
|
IP address.
|
||||||
|
|
||||||
<nfs-options> Standard NFS options. All options are separated by commas.
|
<nfs-options> Standard NFS options. All options are separated by commas.
|
||||||
The following defaults are used:
|
The following defaults are used::
|
||||||
|
|
||||||
port = as given by server portmap daemon
|
port = as given by server portmap daemon
|
||||||
rsize = 4096
|
rsize = 4096
|
||||||
wsize = 4096
|
wsize = 4096
|
||||||
@ -79,13 +85,11 @@ nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
|
|||||||
flags = hard, nointr, noposix, cto, ac
|
flags = hard, nointr, noposix, cto, ac
|
||||||
|
|
||||||
|
|
||||||
ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:
|
ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:<dns0-ip>:<dns1-ip>:<ntp0-ip>
|
||||||
<dns0-ip>:<dns1-ip>:<ntp0-ip>
|
|
||||||
|
|
||||||
This parameter tells the kernel how to configure IP addresses of devices
|
This parameter tells the kernel how to configure IP addresses of devices
|
||||||
and also how to set up the IP routing table. It was originally called
|
and also how to set up the IP routing table. It was originally called
|
||||||
`nfsaddrs', but now the boot-time IP configuration works independently of
|
nfsaddrs, but now the boot-time IP configuration works independently of
|
||||||
NFS, so it was renamed to `ip' and the old name remained as an alias for
|
NFS, so it was renamed to ip and the old name remained as an alias for
|
||||||
compatibility reasons.
|
compatibility reasons.
|
||||||
|
|
||||||
If this parameter is missing from the kernel command line, all fields are
|
If this parameter is missing from the kernel command line, all fields are
|
||||||
@ -93,17 +97,17 @@ ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:
|
|||||||
this means that the kernel tries to configure everything using
|
this means that the kernel tries to configure everything using
|
||||||
autoconfiguration.
|
autoconfiguration.
|
||||||
|
|
||||||
The <autoconf> parameter can appear alone as the value to the `ip'
|
The <autoconf> parameter can appear alone as the value to the ip
|
||||||
parameter (without all the ':' characters before). If the value is
|
parameter (without all the ':' characters before). If the value is
|
||||||
"ip=off" or "ip=none", no autoconfiguration will take place, otherwise
|
"ip=off" or "ip=none", no autoconfiguration will take place, otherwise
|
||||||
autoconfiguration will take place. The most common way to use this
|
autoconfiguration will take place. The most common way to use this
|
||||||
is "ip=dhcp".
|
is "ip=dhcp".
|
||||||
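As a rough sketch of how the colon-separated fields fit together, the helper below joins the first seven ``ip=`` fields in order. The function name ``build_ip_param`` and all addresses are hypothetical; empty arguments stand for fields that are left to autoconfiguration.

```shell
#!/bin/sh
# Hypothetical helper: join the first seven ip= fields with ':'.
# Arguments, in order: client-ip server-ip gw-ip netmask hostname device autoconf
build_ip_param() {
    printf 'ip=%s:%s:%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

# Fully static configuration (addresses are illustrative assumptions).
build_ip_param 192.168.1.10 192.168.1.1 192.168.1.254 255.255.255.0 client1 eth0 off

# Only the device and the autoconfiguration method are given; the
# remaining fields stay empty.
build_ip_param "" "" "" "" "" eth0 dhcp
```

The second call prints ``ip=:::::eth0:dhcp``, which shows why the empty fields still need their ``:`` separators unless ``<autoconf>`` is used alone, as in ``ip=dhcp``.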
|
|
||||||
<client-ip> IP address of the client.
|
<client-ip> IP address of the client.
|
||||||
|
|
||||||
Default: Determined using autoconfiguration.
|
Default: Determined using autoconfiguration.
|
||||||
|
|
||||||
<server-ip> IP address of the NFS server. If RARP is used to determine
|
<server-ip> IP address of the NFS server.
|
||||||
|
If RARP is used to determine
|
||||||
the client address and this parameter is NOT empty only
|
the client address and this parameter is NOT empty only
|
||||||
replies from the specified server are accepted.
|
replies from the specified server are accepted.
|
||||||
|
|
||||||
@ -115,19 +119,19 @@ ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:
|
|||||||
(see below).
|
(see below).
|
||||||
|
|
||||||
Default: Determined using autoconfiguration.
|
Default: Determined using autoconfiguration.
|
||||||
The address of the autoconfiguration server is used.
|
The address of the autoconfiguration server is used.
|
||||||
|
|
||||||
<gw-ip> IP address of a gateway if the server is on a different subnet.
|
<gw-ip> IP address of a gateway if the server is on a different subnet.
|
||||||
|
|
||||||
Default: Determined using autoconfiguration.
|
Default: Determined using autoconfiguration.
|
||||||
|
|
||||||
<netmask> Netmask for local network interface. If unspecified
|
<netmask> Netmask for local network interface.
|
||||||
the netmask is derived from the client IP address assuming
|
If unspecified the netmask is derived from the client IP address
|
||||||
classful addressing.
|
assuming classful addressing.
|
||||||
|
|
||||||
Default: Determined using autoconfiguration.
|
Default: Determined using autoconfiguration.
|
||||||
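The classful derivation mentioned for ``<netmask>`` can be illustrated with the traditional class A/B/C rule. This is a sketch of that rule only, assuming a dotted-quad IPv4 address; it is not the kernel's actual implementation, and the function name ``classful_netmask`` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: derive a netmask from the first octet of an IPv4
# address using the traditional class A/B/C boundaries.
classful_netmask() {
    first_octet=${1%%.*}    # text before the first '.'
    if [ "$first_octet" -lt 128 ]; then
        echo 255.0.0.0      # class A
    elif [ "$first_octet" -lt 192 ]; then
        echo 255.255.0.0    # class B
    else
        echo 255.255.255.0  # class C (used here for everything else)
    fi
}

classful_netmask 10.1.2.3
classful_netmask 172.16.0.5
classful_netmask 192.168.1.10
```

Because classful defaults rarely match modern subnetting, specifying ``<netmask>`` explicitly or relying on autoconfiguration is usually the safer choice.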
|
|
||||||
<hostname> Name of the client. If a '.' character is present, anything
|
<hostname> Name of the client.
|
||||||
|
If a '.' character is present, anything
|
||||||
before the first '.' is used as the client's hostname, and anything
|
before the first '.' is used as the client's hostname, and anything
|
||||||
after it is used as its NIS domain name. May be supplied by
|
after it is used as its NIS domain name. May be supplied by
|
||||||
autoconfiguration, but its absence will not trigger autoconfiguration.
|
autoconfiguration, but its absence will not trigger autoconfiguration.
|
||||||
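The split of ``<hostname>`` at the first '.' can be sketched with POSIX parameter expansion. The function name ``split_hostname_field`` and the sample names are hypothetical; the snippet only mirrors the rule stated above.

```shell
#!/bin/sh
# Hypothetical helper: split a <hostname> field at the first '.' into the
# client's hostname and its NIS domain name.
split_hostname_field() {
    field=$1
    case $field in
        *.*)
            # ${field%%.*} keeps the text before the first '.',
            # ${field#*.} keeps the text after it.
            printf 'hostname=%s nisdomain=%s\n' "${field%%.*}" "${field#*.}"
            ;;
        *)
            printf 'hostname=%s nisdomain=\n' "$field"
            ;;
    esac
}

split_hostname_field client1.example.org
split_hostname_field client1
```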
@ -138,21 +142,21 @@ ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:
|
|||||||
Default: Client IP address is used in ASCII notation.
|
Default: Client IP address is used in ASCII notation.
|
||||||
|
|
||||||
<device> Name of network device to use.
|
<device> Name of network device to use.
|
||||||
|
|
||||||
Default: If the host only has one device, it is used.
|
Default: If the host only has one device, it is used.
|
||||||
Otherwise the device is determined using
|
Otherwise the device is determined using
|
||||||
autoconfiguration. This is done by sending
|
autoconfiguration. This is done by sending
|
||||||
autoconfiguration requests out of all devices,
|
autoconfiguration requests out of all devices,
|
||||||
and using the device that received the first reply.
|
and using the device that received the first reply.
|
||||||
|
|
||||||
<autoconf> Method to use for autoconfiguration. In the case of options
|
<autoconf> Method to use for autoconfiguration.
|
||||||
which specify multiple autoconfiguration protocols,
|
In the case of options
|
||||||
|
which specify multiple autoconfiguration protocols,
|
||||||
requests are sent using all protocols, and the first one
|
requests are sent using all protocols, and the first one
|
||||||
to reply is used.
|
to reply is used.
|
||||||
|
|
||||||
Only autoconfiguration protocols that have been compiled
|
Only autoconfiguration protocols that have been compiled
|
||||||
into the kernel will be used, regardless of the value of
|
into the kernel will be used, regardless of the value of
|
||||||
this option.
|
this option::
|
||||||
|
|
||||||
off or none: don't use autoconfiguration
|
off or none: don't use autoconfiguration
|
||||||
(do static IP assignment instead)
|
(do static IP assignment instead)
|
||||||
@ -221,7 +225,6 @@ ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:
|
|||||||
|
|
||||||
|
|
||||||
nfsrootdebug
|
nfsrootdebug
|
||||||
|
|
||||||
This parameter enables debugging messages to appear in the kernel
|
This parameter enables debugging messages to appear in the kernel
|
||||||
log at boot time so that administrators can verify that the correct
|
log at boot time so that administrators can verify that the correct
|
||||||
NFS mount options, server address, and root path are passed to the
|
NFS mount options, server address, and root path are passed to the
|
||||||
@ -229,36 +232,32 @@ nfsrootdebug
|
|||||||
|
|
||||||
|
|
||||||
rdinit=<executable file>
|
rdinit=<executable file>
|
||||||
|
|
||||||
To specify which file contains the program that starts system
|
To specify which file contains the program that starts system
|
||||||
initialization, administrators can use this command line parameter.
|
initialization, administrators can use this command line parameter.
|
||||||
The default value of this parameter is "/init". If the specified
|
The default value of this parameter is "/init". If the specified
|
||||||
file exists and the kernel can execute it, root filesystem related
|
file exists and the kernel can execute it, root filesystem related
|
||||||
kernel command line parameters, including `nfsroot=', are ignored.
|
kernel command line parameters, including 'nfsroot=', are ignored.
|
||||||
|
|
||||||
A description of the process of mounting the root file system can be
|
A description of the process of mounting the root file system can be
|
||||||
found in:
|
found in Documentation/driver-api/early-userspace/early_userspace_support.rst
|
||||||
|
|
||||||
Documentation/driver-api/early-userspace/early_userspace_support.rst
|
|
||||||
|
|
||||||
|
|
||||||
|
Boot Loader
|
||||||
|
===========
|
||||||
3.) Boot Loader
|
|
||||||
----------
|
|
||||||
|
|
||||||
To get the kernel into memory different approaches can be used.
|
To get the kernel into memory different approaches can be used.
|
||||||
They depend on various facilities being available:
|
They depend on various facilities being available:
|
||||||
|
|
||||||
|
|
||||||
3.1) Booting from a floppy using syslinux
|
- Booting from a floppy using syslinux
|
||||||
|
|
||||||
When building kernels, an easy way to create a boot floppy that uses
|
When building kernels, an easy way to create a boot floppy that uses
|
||||||
syslinux is to use the zdisk or bzdisk make targets which use zimage
|
syslinux is to use the zdisk or bzdisk make targets which use zimage
|
||||||
and bzimage images respectively. Both targets accept the
|
and bzimage images respectively. Both targets accept the
|
||||||
FDARGS parameter which can be used to set the kernel command line.
|
FDARGS parameter which can be used to set the kernel command line.
|
||||||
|
|
||||||
e.g.
|
e.g::
|
||||||
|
|
||||||
make bzdisk FDARGS="root=/dev/nfs"
|
make bzdisk FDARGS="root=/dev/nfs"
|
||||||
|
|
||||||
Note that the user running this command will need to have
|
Note that the user running this command will need to have
|
||||||
@ -267,32 +266,36 @@ They depend on various facilities being available:
|
|||||||
For more information on syslinux, including how to create bootdisks
|
For more information on syslinux, including how to create bootdisks
|
||||||
for prebuilt kernels, see http://syslinux.zytor.com/
|
for prebuilt kernels, see http://syslinux.zytor.com/
|
||||||
|
|
||||||
N.B: Previously it was possible to write a kernel directly to
|
.. note::
|
||||||
a floppy using dd, configure the boot device using rdev, and
|
Previously it was possible to write a kernel directly to
|
||||||
boot using the resulting floppy. Linux no longer supports this
|
a floppy using dd, configure the boot device using rdev, and
|
||||||
method of booting.
|
boot using the resulting floppy. Linux no longer supports this
|
||||||
|
method of booting.
|
||||||
|
|
||||||
3.2) Booting from a cdrom using isolinux
|
- Booting from a cdrom using isolinux
|
||||||
|
|
||||||
When building kernels, an easy way to create a bootable cdrom that
|
When building kernels, an easy way to create a bootable cdrom that
|
||||||
uses isolinux is to use the isoimage target which uses a bzimage
|
uses isolinux is to use the isoimage target which uses a bzimage
|
||||||
image. Like zdisk and bzdisk, this target accepts the FDARGS
|
image. Like zdisk and bzdisk, this target accepts the FDARGS
|
||||||
parameter which can be used to set the kernel command line.
|
parameter which can be used to set the kernel command line.
|
||||||
|
|
||||||
e.g.
|
e.g::
|
||||||
|
|
||||||
make isoimage FDARGS="root=/dev/nfs"
|
make isoimage FDARGS="root=/dev/nfs"
|
||||||
|
|
||||||
The resulting iso image will be arch/<ARCH>/boot/image.iso
|
The resulting iso image will be arch/<ARCH>/boot/image.iso
|
||||||
This can be written to a cdrom using a variety of tools including
|
This can be written to a cdrom using a variety of tools including
|
||||||
cdrecord.
|
cdrecord.
|
||||||
|
|
||||||
e.g.
|
e.g::
|
||||||
|
|
||||||
cdrecord dev=ATAPI:1,0,0 arch/x86/boot/image.iso
|
cdrecord dev=ATAPI:1,0,0 arch/x86/boot/image.iso
|
||||||
|
|
||||||
For more information on isolinux, including how to create bootdisks
|
For more information on isolinux, including how to create bootdisks
|
||||||
for prebuilt kernels, see http://syslinux.zytor.com/
|
for prebuilt kernels, see http://syslinux.zytor.com/
|
||||||
|
|
||||||
3.2) Using LILO
|
- Using LILO
|
||||||
|
|
||||||
When using LILO all the necessary command line parameters may be
|
When using LILO all the necessary command line parameters may be
|
||||||
specified using the 'append=' directive in the LILO configuration
|
specified using the 'append=' directive in the LILO configuration
|
||||||
file.
|
file.
|
||||||
@ -300,15 +303,19 @@ They depend on various facilities being available:
|
|||||||
However, to use the 'root=' directive you also need to create
|
However, to use the 'root=' directive you also need to create
|
||||||
a dummy root device, which may be removed after LILO is run.
|
a dummy root device, which may be removed after LILO is run.
|
||||||
|
|
||||||
mknod /dev/boot255 c 0 255
|
e.g::
|
||||||
|
|
||||||
|
mknod /dev/boot255 c 0 255
|
||||||
|
|
||||||
For information on configuring LILO, please refer to its documentation.
|
For information on configuring LILO, please refer to its documentation.
|
||||||
|
|
||||||
3.3) Using GRUB
|
- Using GRUB
|
||||||
|
|
||||||
When using GRUB, kernel parameters are simply appended after the kernel
|
When using GRUB, kernel parameters are simply appended after the kernel
|
||||||
specification: kernel <kernel> <parameters>
|
specification: kernel <kernel> <parameters>
|
||||||
|
|
||||||
3.4) Using loadlin
|
- Using loadlin
|
||||||
|
|
||||||
loadlin may be used to boot Linux from a DOS command prompt without
|
loadlin may be used to boot Linux from a DOS command prompt without
|
||||||
requiring a local hard disk to mount as root. This has not been
|
requiring a local hard disk to mount as root. This has not been
|
||||||
thoroughly tested by the authors of this document, but in general
|
thoroughly tested by the authors of this document, but in general
|
||||||
@ -317,7 +324,8 @@ They depend on various facilities being available:
|
|||||||
|
|
||||||
Please refer to the loadlin documentation for further information.
|
Please refer to the loadlin documentation for further information.
|
||||||
|
|
||||||
3.5) Using a boot ROM
|
- Using a boot ROM
|
||||||
|
|
||||||
This is probably the most elegant way of booting a diskless client.
|
This is probably the most elegant way of booting a diskless client.
|
||||||
With a boot ROM the kernel is loaded using the TFTP protocol. The
|
With a boot ROM the kernel is loaded using the TFTP protocol. The
|
||||||
authors of this document are not aware of any commercial boot
|
authors of this document are not aware of any commercial boot
|
||||||
@ -326,7 +334,8 @@ They depend on various facilities being available:
|
|||||||
etherboot, both of which are available on sunsite.unc.edu, and both
|
etherboot, both of which are available on sunsite.unc.edu, and both
|
||||||
of which contain everything you need to boot a diskless Linux client.
|
of which contain everything you need to boot a diskless Linux client.
|
||||||
|
|
||||||
3.6) Using pxelinux
|
- Using pxelinux
|
||||||
|
|
||||||
Pxelinux may be used to boot linux using the PXE boot loader
|
Pxelinux may be used to boot linux using the PXE boot loader
|
||||||
which is present on many modern network cards.
|
which is present on many modern network cards.
|
||||||
|
|
||||||
@ -342,8 +351,8 @@ They depend on various facilities being available:
|
|||||||
|
|
||||||
|
|
||||||
|
|
||||||
4.) Credits
|
Credits
|
||||||
-------
|
=======
|
||||||
|
|
||||||
The nfsroot code in the kernel and the RARP support have been written
|
The nfsroot code in the kernel and the RARP support have been written
|
||||||
by Gero Kuhlmann <gero@gkminix.han.de>.
|
by Gero Kuhlmann <gero@gkminix.han.de>.
|
@ -1,4 +1,6 @@
|
|||||||
|
===================================
|
||||||
pNFS block layout server user guide
|
pNFS block layout server user guide
|
||||||
|
===================================
|
||||||
|
|
||||||
The Linux NFS server now supports the pNFS block layout extension. In this
|
The Linux NFS server now supports the pNFS block layout extension. In this
|
||||||
case the NFS server acts as Metadata Server (MDS) for pNFS, which in addition
|
case the NFS server acts as Metadata Server (MDS) for pNFS, which in addition
|
||||||
@ -22,16 +24,19 @@ If the nfsd server needs to fence a non-responding client it calls
|
|||||||
/sbin/nfsd-recall-failed with the first argument set to the IP address of
|
/sbin/nfsd-recall-failed with the first argument set to the IP address of
|
||||||
the client, and the second argument set to the device node without the /dev
|
the client, and the second argument set to the device node without the /dev
|
||||||
prefix for the file system to be fenced. Below is an example file that shows
|
prefix for the file system to be fenced. Below is an example file that shows
|
||||||
how to translate the device into a serial number from SCSI EVPD 0x80:
|
how to translate the device into a serial number from SCSI EVPD 0x80::
|
||||||
|
|
||||||
cat > /sbin/nfsd-recall-failed << 'EOF'
|
cat > /sbin/nfsd-recall-failed << 'EOF'
|
||||||
#!/bin/sh
|
|
||||||
|
|
||||||
CLIENT="$1"
|
.. code-block:: sh
|
||||||
DEV="/dev/$2"
|
|
||||||
EVPD=`sg_inq --page=0x80 ${DEV} | \
|
|
||||||
grep "Unit serial number:" | \
|
|
||||||
awk -F ': ' '{print $2}'`
|
|
||||||
|
|
||||||
echo "fencing client ${CLIENT} serial ${EVPD}" >> /var/log/pnfsd-fence.log
|
#!/bin/sh
|
||||||
EOF
|
|
||||||
|
CLIENT="$1"
|
||||||
|
DEV="/dev/$2"
|
||||||
|
EVPD=`sg_inq --page=0x80 ${DEV} | \
|
||||||
|
grep "Unit serial number:" | \
|
||||||
|
awk -F ': ' '{print $2}'`
|
||||||
|
|
||||||
|
echo "fencing client ${CLIENT} serial ${EVPD}" >> /var/log/pnfsd-fence.log
|
||||||
|
EOF
|
@ -1,4 +1,5 @@
|
|||||||
|
|
||||||
|
==================================
|
||||||
pNFS SCSI layout server user guide
|
pNFS SCSI layout server user guide
|
||||||
==================================
|
==================================
|
||||||
|
|
@ -1,274 +0,0 @@
|
|||||||
################################################################################
|
|
||||||
# #
|
|
||||||
# NFS/RDMA README #
|
|
||||||
# #
|
|
||||||
################################################################################
|
|
||||||
|
|
||||||
Author: NetApp and Open Grid Computing
|
|
||||||
Date: May 29, 2008
|
|
||||||
|
|
||||||
Table of Contents
|
|
||||||
~~~~~~~~~~~~~~~~~
|
|
||||||
- Overview
|
|
||||||
- Getting Help
|
|
||||||
- Installation
|
|
||||||
- Check RDMA and NFS Setup
|
|
||||||
- NFS/RDMA Setup
|
|
||||||
|
|
||||||
Overview
|
|
||||||
~~~~~~~~
|
|
||||||
|
|
||||||
This document describes how to install and setup the Linux NFS/RDMA client
|
|
||||||
and server software.
|
|
||||||
|
|
||||||
The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server
|
|
||||||
was first included in the following release, Linux 2.6.25.
|
|
||||||
|
|
||||||
In our testing, we have obtained excellent performance results (full 10Gbit
|
|
||||||
wire bandwidth at minimal client CPU) under many workloads. The code passes
|
|
||||||
the full Connectathon test suite and operates over both Infiniband and iWARP
|
|
||||||
RDMA adapters.
|
|
||||||
|
|
||||||
Getting Help
|
|
||||||
~~~~~~~~~~~~
|
|
||||||
|
|
||||||
If you get stuck, you can ask questions on the
|
|
||||||
|
|
||||||
nfs-rdma-devel@lists.sourceforge.net
|
|
||||||
|
|
||||||
mailing list.
|
|
||||||
|
|
||||||
Installation
|
|
||||||
~~~~~~~~~~~~
|
|
||||||
|
|
||||||
These instructions are a step by step guide to building a machine for
|
|
||||||
use with NFS/RDMA.
|
|
||||||
|
|
||||||
- Install an RDMA device
|
|
||||||
|
|
||||||
Any device supported by the drivers in drivers/infiniband/hw is acceptable.
|
|
||||||
|
|
||||||
Testing has been performed using several Mellanox-based IB cards, the
|
|
||||||
Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.
|
|
||||||
|
|
||||||
- Install a Linux distribution and tools
|
|
||||||
|
|
||||||
The first kernel release to contain both the NFS/RDMA client and server was
|
|
||||||
Linux 2.6.25. Therefore, a distribution compatible with this and subsequent
|
|
||||||
Linux kernel releases should be installed.
|
|
||||||
|
|
||||||
The procedures described in this document have been tested with
|
|
||||||
distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).
|
|
||||||
|
|
||||||
- Install nfs-utils-1.1.2 or greater on the client
|
|
||||||
|
|
||||||
An NFS/RDMA mount point can be obtained by using the mount.nfs command in
|
|
||||||
nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
|
|
||||||
version with support for NFS/RDMA mounts, but for various reasons we
|
|
||||||
recommend using nfs-utils-1.1.2 or greater). To see which version of
|
|
||||||
mount.nfs you are using, type:
|
|
||||||
|
|
||||||
$ /sbin/mount.nfs -V
|
|
||||||
|
|
||||||
If the version is less than 1.1.2 or the command does not exist,
|
|
||||||
you should install the latest version of nfs-utils.
|
|
||||||
|
|
||||||
Download the latest package from:
|
|
||||||
|
|
||||||
http://www.kernel.org/pub/linux/utils/nfs
|
|
||||||
|
|
||||||
Uncompress the package and follow the installation instructions.
|
|
||||||
|
|
||||||
If you will not need the idmapper and gssd executables (you do not need
|
|
||||||
these to create an NFS/RDMA enabled mount command), the installation
|
|
||||||
process can be simplified by disabling these features when running
|
|
||||||
configure:
|
|
||||||
|
|
||||||
$ ./configure --disable-gss --disable-nfsv4
|
|
||||||
|
|
||||||
To build nfs-utils you will need the tcp_wrappers package installed. For
|
|
||||||
more information on this see the package's README and INSTALL files.
|
|
||||||
|
|
||||||
After building the nfs-utils package, there will be a mount.nfs binary in
|
|
||||||
the utils/mount directory. This binary can be used to initiate NFS v2, v3,
|
|
||||||
or v4 mounts. To initiate a v4 mount, the binary must be called
|
|
||||||
mount.nfs4. The standard technique is to create a symlink called
|
|
||||||
mount.nfs4 to mount.nfs.
|
|
||||||
|
|
||||||
This mount.nfs binary should be installed at /sbin/mount.nfs as follows:
|
|
||||||
|
|
||||||
$ sudo cp utils/mount/mount.nfs /sbin/mount.nfs
|
|
||||||
|
|
||||||
In this location, mount.nfs will be invoked automatically for NFS mounts
|
|
||||||
by the system mount command.
|
|
||||||
|
|
||||||
NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
|
|
||||||
on the NFS client machine. You do not need this specific version of
|
|
||||||
nfs-utils on the server. Furthermore, only the mount.nfs command from
|
|
||||||
nfs-utils-1.1.2 is needed on the client.
|
|
||||||
|
|
||||||
- Install a Linux kernel with NFS/RDMA
|
|
||||||
|
|
||||||
The NFS/RDMA client and server are both included in the mainline Linux
|
|
||||||
kernel version 2.6.25 and later. This and other versions of the Linux
|
|
||||||
kernel can be found at:
|
|
||||||
|
|
||||||
https://www.kernel.org/pub/linux/kernel/
|
|
||||||
|
|
||||||
Download the sources and place them in an appropriate location.
|
|
||||||
|
|
||||||
- Configure the RDMA stack
|
|
||||||
|
|
||||||
Make sure your kernel configuration has RDMA support enabled. Under
|
|
||||||
Device Drivers -> InfiniBand support, update the kernel configuration
|
|
||||||
to enable InfiniBand support [NOTE: the option name is misleading. Enabling
|
|
||||||
InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)].
|
|
||||||
|
|
||||||
Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
|
|
||||||
iWARP adapter support (amso, cxgb3, etc.).
|
|
||||||
|
|
||||||
If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.
|
|
||||||
|
|
||||||
- Configure the NFS client and server
|
|
||||||
|
|
||||||
Your kernel configuration must also have NFS file system support and/or
|
|
||||||
NFS server support enabled. These and other NFS related configuration
|
|
||||||
options can be found under File Systems -> Network File Systems.
|
|
||||||
|
|
||||||
- Build, install, reboot
|
|
||||||
|
|
||||||
The NFS/RDMA code will be enabled automatically if NFS and RDMA
|
|
||||||
are turned on. The NFS/RDMA client and server are configured via the hidden
|
|
||||||
SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
|
|
||||||
value of SUNRPC_XPRT_RDMA will be:
|
|
||||||
|
|
||||||
- N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client
|
|
||||||
and server will not be built
|
|
||||||
- M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M,
|
|
||||||
in this case the NFS/RDMA client and server will be built as modules
|
|
||||||
- Y if both SUNRPC and INFINIBAND are Y, in this case the NFS/RDMA client
|
|
||||||
and server will be built into the kernel
|
|
||||||
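The three rules above amount to a small tristate function. The following sketch restates them in shell; the function name ``xprt_rdma_value`` is hypothetical, and the lowercase ``y``, ``m``, ``n`` values follow ``.config`` notation.

```shell
#!/bin/sh
# Hypothetical helper: compute the resulting SUNRPC_XPRT_RDMA value from
# the SUNRPC and INFINIBAND settings, following the rules listed above.
xprt_rdma_value() {
    sunrpc=$1
    infiniband=$2
    if [ "$sunrpc" = n ] || [ "$infiniband" = n ]; then
        echo n    # either dependency is off: NFS/RDMA is not built
    elif [ "$sunrpc" = m ] || [ "$infiniband" = m ]; then
        echo m    # both are on and at least one is a module
    else
        echo y    # both are built in
    fi
}

xprt_rdma_value y m
xprt_rdma_value y y
xprt_rdma_value n y
```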
|
|
||||||
Therefore, if you have followed the steps above and turned on NFS and RDMA,
|
|
||||||
the NFS/RDMA client and server will be built.
|
|
||||||
|
|
||||||
Build a new kernel, install it, boot it.
|
|
||||||
|
|
||||||
Check RDMA and NFS Setup
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Before configuring the NFS/RDMA software, it is a good idea to test
|
|
||||||
your new kernel to ensure that the kernel is working correctly.
|
|
||||||
In particular, it is a good idea to verify that the RDMA stack
|
|
||||||
is functioning as expected and standard NFS over TCP/IP and/or UDP/IP
|
|
||||||
is working properly.
|
|
||||||
|
|
||||||
- Check RDMA Setup
|
|
||||||
|
|
||||||
If you built the RDMA components as modules, load them at
|
|
||||||
this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
|
|
||||||
card:
|
|
||||||
|
|
||||||
$ modprobe ib_mthca
|
|
||||||
$ modprobe ib_ipoib
|
|
||||||
|
|
||||||
If you are using InfiniBand, make sure there is a Subnet Manager (SM)
|
|
||||||
running on the network. If your IB switch has an embedded SM, you can
|
|
||||||
use it. Otherwise, you will need to run an SM, such as OpenSM, on one
|
|
||||||
of your end nodes.
|
|
||||||
|
|
||||||
If an SM is running on your network, you should see the following:
|
|
||||||
|
|
||||||
$ cat /sys/class/infiniband/driverX/ports/1/state
|
|
||||||
4: ACTIVE
|
|
||||||
|
|
||||||
where driverX is mthca0, ipath5, ehca3, etc.
|
|
||||||
|
|
||||||
To further test the InfiniBand software stack, use IPoIB (this
|
|
||||||
assumes you have two IB hosts named host1 and host2):
|
|
||||||
|
|
||||||
host1$ ip link set dev ib0 up
|
|
||||||
host1$ ip address add dev ib0 a.b.c.x
|
|
||||||
host2$ ip link set dev ib0 up
|
|
||||||
host2$ ip address add dev ib0 a.b.c.y
|
|
||||||
host1$ ping a.b.c.y
|
|
||||||
host2$ ping a.b.c.x
|
|
||||||
|
|
||||||
For other device types, follow the appropriate procedures.
|
|
||||||
|
|
||||||
- Check NFS Setup
|
|
||||||
|
|
||||||
For the NFS components enabled above (client and/or server),
|
|
||||||
test their functionality over standard Ethernet using TCP/IP or UDP/IP.
|
|
||||||
|
|
||||||
NFS/RDMA Setup
|
|
||||||
~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
We recommend that you use two machines, one to act as the client and
|
|
||||||
one to act as the server.
|
|
||||||
|
|
||||||
One time configuration:
|
|
||||||
|
|
||||||
- On the server system, configure the /etc/exports file and
|
|
||||||
start the NFS/RDMA server.
|
|
||||||
|
|
||||||
Exports entries with the following formats have been tested:
|
|
||||||
|
|
||||||
/vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
|
|
||||||
/vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)
|
|
||||||
|
|
||||||
The IP address(es) is(are) the client's IPoIB address for an InfiniBand
|
|
||||||
HCA or the client's iWARP address(es) for an RNIC.
|
|
||||||
|
|
||||||
NOTE: The "insecure" option must be used because the NFS/RDMA client does
|
|
||||||
not use a reserved port.
|
|
||||||
|
|
||||||
Each time a machine boots:
|
|
||||||
|
|
||||||
- Load and configure the RDMA drivers
|
|
||||||
|
|
||||||
For InfiniBand using a Mellanox adapter:
|
|
||||||
|
|
||||||
$ modprobe ib_mthca
|
|
||||||
$ modprobe ib_ipoib
|
|
||||||
$ ip li set dev ib0 up
|
|
||||||
$ ip addr add dev ib0 a.b.c.d
|
|
||||||
|
|
||||||
NOTE: use unique addresses for the client and server
|
|
||||||
|
|
||||||
- Start the NFS server
|
|
||||||
|
|
||||||
If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
|
|
||||||
kernel config), load the RDMA transport module:
|
|
||||||
|
|
||||||
$ modprobe svcrdma
|
|
||||||
|
|
||||||
Regardless of how the server was built (module or built-in), start the
|
|
||||||
server:
|
|
||||||
|
|
||||||
$ /etc/init.d/nfs start
|
|
||||||
|
|
||||||
or
|
|
||||||
|
|
||||||
$ service nfs start
|
|
||||||
|
|
||||||
Instruct the server to listen on the RDMA transport:
|
|
||||||
|
|
||||||
$ echo rdma 20049 > /proc/fs/nfsd/portlist
|
|
||||||
|
|
||||||
- On the client system
|
|
||||||
|
|
||||||
If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
|
|
||||||
kernel config), load the RDMA client module:
|
|
||||||
|
|
||||||
$ modprobe xprtrdma.ko
|
|
||||||
|
|
||||||
Regardless of how the client was built (module or built-in), use this
|
|
||||||
command to mount the NFS/RDMA server:
|
|
||||||
|
|
||||||
$ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt
|
|
||||||
|
|
||||||
To verify that the mount is using RDMA, run "cat /proc/mounts" and check
|
|
||||||
the "proto" field for the given mount.
|
|
||||||
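The manual check of the "proto" field can also be scripted. The sketch below assumes the usual ``/proc/mounts`` layout (device, mount point, type, options) and looks for a ``proto=rdma`` option on the given mount point. The function name ``mount_uses_rdma`` and its optional second argument are hypothetical conveniences.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given mount point is listed with
# proto=rdma among its mount options.
# $1: mount point, $2: mounts file to inspect (defaults to /proc/mounts)
mount_uses_rdma() {
    awk -v mp="$1" '
        # Field 2 is the mount point, field 4 the comma-separated options.
        $2 == mp && $4 ~ /(^|,)proto=rdma(,|$)/ { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

if mount_uses_rdma /mnt; then
    echo "/mnt is mounted over RDMA"
else
    echo "/mnt is not mounted over RDMA"
fi
```

Passing a file name as the second argument makes it possible to try the check against a saved copy of ``/proc/mounts``.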
|
|
||||||
Congratulations! You're using NFS/RDMA!
|
|