Merge tag 'docs-5.7' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:

"This has been a busy cycle for documentation work. Highlights include:

- Lots of RST conversion work by Mauro, Daniel Almeida, and others.
  Maybe someday we'll get to the end of this stuff...maybe...
- Some organizational work to bring some order to the core-api manual.
- Various new docs and additions to the existing documentation.
- Typo fixes, warning fixes, ..."
* tag 'docs-5.7' of git://git.lwn.net/linux: (123 commits)
  Documentation: x86: exception-tables: document CONFIG_BUILDTIME_TABLE_SORT
  MAINTAINERS: adjust to filesystem doc ReST conversion
  docs: deprecated.rst: Add BUG()-family
  doc: zh_CN: add translation for virtiofs
  doc: zh_CN: index files in filesystems subdirectory
  docs: locking: Drop :c:func: throughout
  docs: locking: Add 'need' to hardirq section
  docs: conf.py: avoid thousands of duplicate label warning on Sphinx
  docs: prevent warnings due to autosectionlabel
  docs: fix reference to core-api/namespaces.rst
  docs: fix pointers to io-mapping.rst and io_ordering.rst files
  Documentation: Better document the softlockup_panic sysctl
  docs: hw-vuln: tsx_async_abort.rst: get rid of an unused ref
  docs: perf: imx-ddr.rst: get rid of a warning
  docs: filesystems: fuse.rst: supress a Sphinx warning
  docs: translations: it: avoid duplicate refs at programming-language.rst
  docs: driver.rst: supress two ReSt warnings
  docs: trace: events.rst: convert some new stuff to ReST format
  Documentation: Add io_ordering.rst to driver-api manual
  Documentation: Add io-mapping.rst to driver-api manual
  ...
commit 481ed297d9
@ -1,5 +1,5 @@
|
||||
What: /sys/kernel/uids/<uid>/cpu_shares
|
||||
Date: December 2007
|
||||
Date: December 2007, finally removed in kernel v2.6.34-rc1
|
||||
Contact: Dhaval Giani <dhaval@linux.vnet.ibm.com>
|
||||
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
|
||||
Description:
|
@ -13,7 +13,7 @@ endif
|
||||
SPHINXBUILD = sphinx-build
|
||||
SPHINXOPTS =
|
||||
SPHINXDIRS = .
|
||||
_SPHINXDIRS = $(patsubst $(srctree)/Documentation/%/index.rst,%,$(wildcard $(srctree)/Documentation/*/index.rst))
|
||||
_SPHINXDIRS = $(sort $(patsubst $(srctree)/Documentation/%/index.rst,%,$(wildcard $(srctree)/Documentation/*/index.rst)))
|
||||
SPHINX_CONF = conf.py
|
||||
PAPER =
|
||||
BUILDDIR = $(obj)/output
|
||||
|
@ -239,7 +239,7 @@ from the PCI device config space. Use the values in the pci_dev structure
|
||||
as the PCI "bus address" might have been remapped to a "host physical"
|
||||
address by the arch/chip-set specific kernel support.
|
||||
|
||||
See Documentation/io-mapping.txt for how to access device registers
|
||||
See Documentation/driver-api/io-mapping.rst for how to access device registers
|
||||
or device memory.
|
||||
|
||||
The device driver needs to call pci_request_region() to verify
|
||||
|
@ -1,3 +1,5 @@
|
||||
.. _psi:
|
||||
|
||||
================================
|
||||
PSI - Pressure Stall Information
|
||||
================================
|
||||
|
@ -1,5 +1,5 @@
|
||||
Kernel Support for miscellaneous (your favourite) Binary Formats v1.1
|
||||
=====================================================================
|
||||
Kernel Support for miscellaneous Binary Formats (binfmt_misc)
|
||||
=============================================================
|
||||
|
||||
This Kernel feature allows you to invoke almost (for restrictions see below)
|
||||
every program by simply typing its name in the shell.
|
||||
|
@ -251,8 +251,6 @@ line of text and contains the following stats separated by whitespace:
|
||||
|
||||
================ =============================================================
|
||||
orig_data_size uncompressed size of data stored in this disk.
|
||||
This excludes same-element-filled pages (same_pages) since
|
||||
no memory is allocated for them.
|
||||
Unit: bytes
|
||||
compr_data_size compressed size of data stored in this disk
|
||||
mem_used_total the amount of memory allocated for this disk. This
|
||||
|
@ -23,7 +23,7 @@ of dot-connected-words, and key and value are connected by ``=``. The value
|
||||
has to be terminated by semi-colon (``;``) or newline (``\n``).
|
||||
For array value, array entries are separated by comma (``,``). ::
|
||||
|
||||
KEY[.WORD[...]] = VALUE[, VALUE2[...]][;]
|
||||
KEY[.WORD[...]] = VALUE[, VALUE2[...]][;]
|
||||
|
||||
Unlike the kernel command line syntax, spaces are OK around the comma and ``=``.
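As a rough illustration of the syntax above, the following Python sketch splits a single ``KEY = VALUE[, VALUE2...]`` statement into its key and its list of values. The function name ``parse_statement`` is hypothetical, and the sketch assumes one flat statement per call; it does not handle quoting, comments, or any syntax not shown in this excerpt.

```python
def parse_statement(statement):
    """Split one 'KEY = VALUE[, VALUE2...][;]' statement.

    Hypothetical helper for illustration only: it covers just the flat
    form shown above and ignores quoting and comments.
    """
    # Drop surrounding whitespace and the optional terminating semicolon.
    statement = statement.strip().rstrip(";")
    # The key is everything before the first '='.
    key, _, value = statement.partition("=")
    # Array entries are comma-separated; spaces around commas are allowed.
    values = [entry.strip() for entry in value.split(",")]
    return key.strip(), values


print(parse_statement("feature.options = foo, bar;"))
```

Spaces around ``=`` and the commas are stripped, which mirrors the note that such spaces are permitted.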
|
||||
|
||||
|
@ -1,3 +1,5 @@
|
||||
.. _cgroup-v1:
|
||||
|
||||
========================
|
||||
Control Groups version 1
|
||||
========================
|
||||
|
@ -9,7 +9,7 @@ This is the authoritative documentation on the design, interface and
|
||||
conventions of cgroup v2. It describes all userland-visible aspects
|
||||
of cgroup including core and specific controller behaviors. All
|
||||
future changes must be reflected in this document. Documentation for
|
||||
v1 is available under Documentation/admin-guide/cgroup-v1/.
|
||||
v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgroup-v1>`.
|
||||
|
||||
.. CONTENTS
|
||||
|
||||
@ -1023,7 +1023,7 @@ All time durations are in microseconds.
|
||||
A read-only nested-key file which exists on non-root cgroups.
|
||||
|
||||
Shows pressure stall information for CPU. See
|
||||
Documentation/accounting/psi.rst for details.
|
||||
:ref:`Documentation/accounting/psi.rst <psi>` for details.
|
||||
|
||||
cpu.uclamp.min
|
||||
A read-write single value file which exists on non-root cgroups.
|
||||
@ -1313,53 +1313,41 @@ PAGE_SIZE multiple when read back.
|
||||
Number of major page faults incurred
|
||||
|
||||
workingset_refault
|
||||
|
||||
Number of refaults of previously evicted pages
|
||||
|
||||
workingset_activate
|
||||
|
||||
Number of refaulted pages that were immediately activated
|
||||
|
||||
workingset_nodereclaim
|
||||
|
||||
Number of times a shadow node has been reclaimed
|
||||
|
||||
pgrefill
|
||||
|
||||
Amount of scanned pages (in an active LRU list)
|
||||
|
||||
pgscan
|
||||
|
||||
Amount of scanned pages (in an inactive LRU list)
|
||||
|
||||
pgsteal
|
||||
|
||||
Amount of reclaimed pages
|
||||
|
||||
pgactivate
|
||||
|
||||
Amount of pages moved to the active LRU list
|
||||
|
||||
pgdeactivate
|
||||
|
||||
Amount of pages moved to the inactive LRU list
|
||||
|
||||
pglazyfree
|
||||
|
||||
Amount of pages postponed to be freed under memory pressure
|
||||
|
||||
pglazyfreed
|
||||
|
||||
Amount of reclaimed lazyfree pages
|
||||
|
||||
thp_fault_alloc
|
||||
|
||||
Number of transparent hugepages which were allocated to satisfy
|
||||
a page fault, including COW faults. This counter is not present
|
||||
when CONFIG_TRANSPARENT_HUGEPAGE is not set.
|
||||
|
||||
thp_collapse_alloc
|
||||
|
||||
Number of transparent hugepages which were allocated to allow
|
||||
collapsing an existing range of pages. This counter is not
|
||||
present when CONFIG_TRANSPARENT_HUGEPAGE is not set.
|
||||
@ -1403,7 +1391,7 @@ PAGE_SIZE multiple when read back.
|
||||
A read-only nested-key file which exists on non-root cgroups.
|
||||
|
||||
Shows pressure stall information for memory. See
|
||||
Documentation/accounting/psi.rst for details.
|
||||
:ref:`Documentation/accounting/psi.rst <psi>` for details.
|
||||
|
||||
|
||||
Usage Guidelines
|
||||
@ -1478,7 +1466,7 @@ IO Interface Files
|
||||
dios Number of discard IOs
|
||||
====== =====================
|
||||
|
||||
An example read output follows:
|
||||
An example read output follows::
|
||||
|
||||
8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353 dbytes=0 dios=0
|
||||
8:0 rbytes=90430464 wbytes=299008000 rios=8950 wios=1252 dbytes=50331648 dios=3021
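The example output above can be consumed by a small parser. The following Python sketch is one possible way to do it, assuming each line starts with a ``MAJ:MIN`` device number followed by ``key=value`` pairs whose values are all integers, as in the sample. The function name ``parse_nested_keyed`` is hypothetical.

```python
def parse_nested_keyed(text):
    """Parse nested-keyed output into {device: {key: int}}.

    Assumes every value is an integer, as in the sample output; this is
    an assumption of the sketch, not a guarantee of the interface.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        device, pairs = fields[0], fields[1:]
        result[device] = {
            key: int(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


sample = """\
8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353 dbytes=0 dios=0
8:0 rbytes=90430464 wbytes=299008000 rios=8950 wios=1252 dbytes=50331648 dios=3021
"""
stats = parse_nested_keyed(sample)
print(stats["8:0"]["rios"])
```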
|
||||
@ -1643,7 +1631,7 @@ IO Interface Files
|
||||
A read-only nested-key file which exists on non-root cgroups.
|
||||
|
||||
Shows pressure stall information for IO. See
|
||||
Documentation/accounting/psi.rst for details.
|
||||
:ref:`Documentation/accounting/psi.rst <psi>` for details.
|
||||
|
||||
|
||||
Writeback
|
||||
@ -1853,7 +1841,7 @@ Cpuset Interface Files
|
||||
from the requested CPUs.
|
||||
|
||||
The CPU numbers are comma-separated numbers or ranges.
|
||||
For example:
|
||||
For example::
|
||||
|
||||
# cat cpuset.cpus
|
||||
0-4,6,8-10
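The list format shown above (comma-separated numbers or ranges) can be expanded into individual CPU numbers with a few lines of Python. This is only a sketch; the function name ``parse_cpu_list`` is hypothetical, and the input is assumed to be well-formed.

```python
def parse_cpu_list(text):
    """Expand a list such as '0-4,6,8-10' into individual numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file yields an empty list
        if "-" in part:
            # A range "first-last" is inclusive on both ends.
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


print(parse_cpu_list("0-4,6,8-10"))
```

The same helper would apply to any file that uses this number-or-range notation.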
|
||||
@ -1892,7 +1880,7 @@ Cpuset Interface Files
|
||||
from the requested memory nodes.
|
||||
|
||||
The memory node numbers are comma-separated numbers or ranges.
|
||||
For example:
|
||||
For example::
|
||||
|
||||
# cat cpuset.mems
|
||||
0-1,3
|
||||
|
@ -11,11 +11,13 @@ Today, with the advent of Kernel Mode Setting, a graphics board is
|
||||
either correctly working because all components follow the standards -
|
||||
or the computer is unusable, because the screen remains dark after
|
||||
booting or it displays the wrong area. Cases when this happens are:
|
||||
|
||||
- The graphics board does not recognize the monitor.
|
||||
- The graphics board is unable to detect any EDID data.
|
||||
- The graphics board incorrectly forwards EDID data to the driver.
|
||||
- The monitor sends no or bogus EDID data.
|
||||
- A KVM sends its own EDID data instead of querying the connected monitor.
|
||||
|
||||
Adding the kernel parameter "nomodeset" helps in most cases, but causes
|
||||
restrictions later on.
|
||||
|
||||
@ -32,7 +34,7 @@ individual data for a specific misbehaving monitor, commented sources
|
||||
and a Makefile environment are given here.
|
||||
|
||||
To create binary EDID and C source code files from the existing data
|
||||
material, simply type "make".
|
||||
material, simply type "make" in tools/edid/.
|
||||
|
||||
If you want to create your own EDID file, copy the file 1024x768.S,
|
||||
replace the settings with your own data and add a new target to the
|
@ -136,8 +136,6 @@ enables the mitigation by default.
|
||||
The mitigation can be controlled at boot time via a kernel command line option.
|
||||
See :ref:`taa_mitigation_control_command_line`.
|
||||
|
||||
.. _virt_mechanism:
|
||||
|
||||
Virtualization mitigation
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
|
@ -75,6 +75,7 @@ configure specific aspects of kernel behavior to your liking.
|
||||
cputopology
|
||||
dell_rbu
|
||||
device-mapper/index
|
||||
edid
|
||||
efi-stub
|
||||
ext4
|
||||
nfs/index
|
||||
|
@ -1099,6 +1099,12 @@
|
||||
A valid base address must be provided, and the serial
|
||||
port must already be setup and configured.
|
||||
|
||||
ec_imx21,<addr>
|
||||
ec_imx6q,<addr>
|
||||
Start an early, polled-mode, output-only console on the
|
||||
Freescale i.MX UART at the specified address. The UART
|
||||
must already be setup and configured.
|
||||
|
||||
ar3700_uart,<addr>
|
||||
Start an early, polled-mode console on the
|
||||
Armada 3700 serial port at the specified
|
||||
@ -1779,7 +1785,7 @@
|
||||
provided by tboot because it makes the system
|
||||
vulnerable to DMA attacks.
|
||||
nobounce [Default off]
|
||||
Disable bounce buffer for unstrusted devices such as
|
||||
Disable bounce buffer for untrusted devices such as
|
||||
the Thunderbolt devices. This will treat the untrusted
|
||||
devices as the trusted ones, hence might expose security
|
||||
risks of DMA attacks.
|
||||
@ -1883,7 +1889,7 @@
|
||||
No delay
|
||||
|
||||
ip= [IP_PNP]
|
||||
See Documentation/filesystems/nfs/nfsroot.txt.
|
||||
See Documentation/admin-guide/nfs/nfsroot.rst.
|
||||
|
||||
ipcmni_extend [KNL] Extend the maximum number of unique System V
|
||||
IPC identifiers from 32,768 to 16,777,216.
|
||||
@ -2795,7 +2801,7 @@
|
||||
<name>,<region-number>[,<base>,<size>,<buswidth>,<altbuswidth>]
|
||||
|
||||
mtdparts= [MTD]
|
||||
See drivers/mtd/cmdlinepart.c.
|
||||
See drivers/mtd/parsers/cmdlinepart.c
|
||||
|
||||
multitce=off [PPC] This parameter disables the use of the pSeries
|
||||
firmware feature for updating multiple TCE entries
|
||||
@ -2853,13 +2859,13 @@
|
||||
Default value is 0.
|
||||
|
||||
nfsaddrs= [NFS] Deprecated. Use ip= instead.
|
||||
See Documentation/filesystems/nfs/nfsroot.txt.
|
||||
See Documentation/admin-guide/nfs/nfsroot.rst.
|
||||
|
||||
nfsroot= [NFS] nfs root filesystem for disk-less boxes.
|
||||
See Documentation/filesystems/nfs/nfsroot.txt.
|
||||
See Documentation/admin-guide/nfs/nfsroot.rst.
|
||||
|
||||
nfsrootdebug [NFS] enable nfsroot debugging messages.
|
||||
See Documentation/filesystems/nfs/nfsroot.txt.
|
||||
See Documentation/admin-guide/nfs/nfsroot.rst.
|
||||
|
||||
nfs.callback_nr_threads=
|
||||
[NFSv4] set the total number of threads that the
|
||||
@ -4514,10 +4520,10 @@
|
||||
Format: <integer>
|
||||
|
||||
A nonzero value instructs the soft-lockup detector
|
||||
to panic the machine when a soft-lockup occurs. This
|
||||
is also controlled by CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC
|
||||
which is the respective build-time switch to that
|
||||
functionality.
|
||||
to panic the machine when a soft-lockup occurs. It is
|
||||
also controlled by the kernel.softlockup_panic sysctl
|
||||
and CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC, which is the
|
||||
respective build-time switch to that functionality.
|
||||
|
||||
softlockup_all_cpu_backtrace=
|
||||
[KNL] Should the soft-lockup detector generate
|
||||
|
@ -234,7 +234,7 @@ To reduce its OS jitter, do any of the following:
|
||||
Such a workqueue can be confined to a given subset of the
|
||||
CPUs using the ``/sys/devices/virtual/workqueue/*/cpumask`` sysfs
|
||||
files. The set of WQ_SYSFS workqueues can be displayed using
|
||||
"ls sys/devices/virtual/workqueue". That said, the workqueues
|
||||
"ls /sys/devices/virtual/workqueue". That said, the workqueues
|
||||
maintainer would like to caution people against indiscriminately
|
||||
sprinkling WQ_SYSFS across all the workqueues. The reason for
|
||||
caution is that it is easy to add WQ_SYSFS, but because sysfs is
|
||||
|
@ -43,7 +43,8 @@ value 1 for supported.
|
||||
|
||||
AXI_ID and AXI_MASKING are mapped on DPCR1 register in performance counter.
|
||||
When non-masked bits are matching corresponding AXI_ID bits then counter is
|
||||
incremented. Perf counter is incremented if
|
||||
incremented. Perf counter is incremented if::
|
||||
|
||||
AxID && AXI_MASKING == AXI_ID && AXI_MASKING
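The condition above can be sketched in Python as follows. The sketch assumes that ``&&`` in the expression denotes a bitwise AND, which is what the "non-masked bits" wording suggests, and it follows the expression literally, so the bits set in ``axi_masking`` are the ones that get compared. The function name ``axi_id_matches`` is hypothetical.

```python
def axi_id_matches(axid, axi_id, axi_masking):
    """Return True when axid and axi_id agree on every bit set in axi_masking.

    Assumption: '&&' in the documented expression means bitwise AND.
    """
    return (axid & axi_masking) == (axi_id & axi_masking)


# With the low four bits excluded from the comparison, 0x13 matches 0x12.
print(axi_id_matches(0x13, 0x12, 0xFFF0))
```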
|
||||
|
||||
This filter doesn't support filter different AXI ID for axid-read and axid-write
|
||||
|
File diff suppressed because it is too large
@ -4,18 +4,18 @@ ARM TCM (Tightly-Coupled Memory) handling in Linux
|
||||
|
||||
Written by Linus Walleij <linus.walleij@stericsson.com>
|
||||
|
||||
Some ARM SoC:s have a so-called TCM (Tightly-Coupled Memory).
|
||||
Some ARM SoCs have a so-called TCM (Tightly-Coupled Memory).
|
||||
This is usually just a few (4-64) KiB of RAM inside the ARM
|
||||
processor.
|
||||
|
||||
Due to being embedded inside the CPU The TCM has a
|
||||
Due to being embedded inside the CPU, the TCM has a
|
||||
Harvard-architecture, so there is an ITCM (instruction TCM)
|
||||
and a DTCM (data TCM). The DTCM can not contain any
|
||||
instructions, but the ITCM can actually contain data.
|
||||
The size of DTCM or ITCM is minimum 4KiB so the typical
|
||||
minimum configuration is 4KiB ITCM and 4KiB DTCM.
|
||||
|
||||
ARM CPU:s have special registers to read out status, physical
|
||||
ARM CPUs have special registers to read out status, physical
|
||||
location and size of TCM memories. arch/arm/include/asm/cputype.h
|
||||
defines a CPUID_TCM register that you can read out from the
|
||||
system control coprocessor. Documentation from ARM can be found
|
||||
|
@ -38,7 +38,11 @@ needs_sphinx = '1.3'
|
||||
# ones.
|
||||
extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain',
|
||||
'kfigure', 'sphinx.ext.ifconfig', 'automarkup',
|
||||
'maintainers_include']
|
||||
'maintainers_include', 'sphinx.ext.autosectionlabel' ]
|
||||
|
||||
# Ensure that autosectionlabel will produce unique names
|
||||
autosectionlabel_prefix_document = True
|
||||
autosectionlabel_maxdepth = 2
|
||||
|
||||
# The name of the math extension changed on Sphinx 1.4
|
||||
if (major == 1 and minor > 3) or (major > 1):
|
||||
|
@ -8,41 +8,81 @@ This is the beginning of a manual for core kernel APIs. The conversion
|
||||
Core utilities
|
||||
==============
|
||||
|
||||
This section has general and "core core" documentation. The first is a
|
||||
massive grab-bag of kerneldoc info left over from the docbook days; it
|
||||
should really be broken up someday when somebody finds the energy to do
|
||||
it.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
kernel-api
|
||||
assoc_array
|
||||
atomic_ops
|
||||
cachetlb
|
||||
refcount-vs-atomic
|
||||
cpu_hotplug
|
||||
idr
|
||||
local_ops
|
||||
workqueue
|
||||
genericirq
|
||||
xarray
|
||||
librs
|
||||
genalloc
|
||||
errseq
|
||||
packing
|
||||
printk-formats
|
||||
symbol-namespaces
|
||||
|
||||
Data structures and low-level utilities
|
||||
=======================================
|
||||
|
||||
Library functionality that is used throughout the kernel.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
kobject
|
||||
assoc_array
|
||||
xarray
|
||||
idr
|
||||
circular-buffers
|
||||
generic-radix-tree
|
||||
packing
|
||||
timekeeping
|
||||
errseq
|
||||
|
||||
Concurrency primitives
|
||||
======================
|
||||
|
||||
How Linux keeps everything from happening at the same time. See
|
||||
:doc:`/locking/index` for more related documentation.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
atomic_ops
|
||||
refcount-vs-atomic
|
||||
local_ops
|
||||
padata
|
||||
../RCU/index
|
||||
|
||||
Low-level hardware management
|
||||
=============================
|
||||
|
||||
Cache management, managing CPU hotplug, etc.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
cachetlb
|
||||
cpu_hotplug
|
||||
memory-hotplug
|
||||
genericirq
|
||||
protection-keys
|
||||
|
||||
Memory management
|
||||
=================
|
||||
|
||||
How to allocate and use memory in the kernel. Note that there is a lot
|
||||
more memory-management documentation in :doc:`/vm/index`.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
memory-allocation
|
||||
mm-api
|
||||
genalloc
|
||||
pin_user_pages
|
||||
gfp_mask-from-fs-io
|
||||
timekeeping
|
||||
boot-time-mm
|
||||
memory-hotplug
|
||||
protection-keys
|
||||
../RCU/index
|
||||
gcc-plugins
|
||||
symbol-namespaces
|
||||
padata
|
||||
ioctl
|
||||
|
||||
gfp_mask-from-fs-io
|
||||
|
||||
Interfaces for kernel debugging
|
||||
===============================
|
||||
@ -53,6 +93,16 @@ Interfaces for kernel debugging
|
||||
debug-objects
|
||||
tracepoint
|
||||
|
||||
Everything else
|
||||
===============
|
||||
|
||||
Documents that don't fit elsewhere or which have yet to be categorized.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
librs
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
|
@ -25,7 +25,7 @@ some terms we will be working with.
|
||||
usually embedded within some other structure which contains the stuff
|
||||
the code is really interested in.
|
||||
|
||||
No structure should EVER have more than one kobject embedded within it.
|
||||
No structure should **EVER** have more than one kobject embedded within it.
|
||||
If it does, the reference counting for the object is sure to be messed
|
||||
up and incorrect, and your code will be buggy. So do not do this.
|
||||
|
||||
@ -55,7 +55,7 @@ a larger, domain-specific object. To this end, kobjects will be found
|
||||
embedded in other structures. If you are used to thinking of things in
|
||||
object-oriented terms, kobjects can be seen as a top-level, abstract class
|
||||
from which other classes are derived. A kobject implements a set of
|
||||
capabilities which are not particularly useful by themselves, but which are
|
||||
capabilities which are not particularly useful by themselves, but are
|
||||
nice to have in other objects. The C language does not allow for the
|
||||
direct expression of inheritance, so other techniques - such as structure
|
||||
embedding - must be used.
|
||||
@ -65,7 +65,7 @@ this is analogous as to how "list_head" structs are rarely useful on
|
||||
their own, but are invariably found embedded in the larger objects of
|
||||
interest.)
|
||||
|
||||
So, for example, the UIO code in drivers/uio/uio.c has a structure that
|
||||
So, for example, the UIO code in ``drivers/uio/uio.c`` has a structure that
|
||||
defines the memory region associated with a uio device::
|
||||
|
||||
struct uio_map {
|
||||
@ -78,26 +78,26 @@ just a matter of using the kobj member. Code that works with kobjects will
|
||||
often have the opposite problem, however: given a struct kobject pointer,
|
||||
what is the pointer to the containing structure? You must avoid tricks
|
||||
(such as assuming that the kobject is at the beginning of the structure)
|
||||
and, instead, use the container_of() macro, found in <linux/kernel.h>::
|
||||
and, instead, use the container_of() macro, found in ``<linux/kernel.h>``::
|
||||
|
||||
container_of(pointer, type, member)
|
||||
|
||||
where:
|
||||
|
||||
* "pointer" is the pointer to the embedded kobject,
|
||||
* "type" is the type of the containing structure, and
|
||||
* "member" is the name of the structure field to which "pointer" points.
|
||||
* ``pointer`` is the pointer to the embedded kobject,
|
||||
* ``type`` is the type of the containing structure, and
|
||||
* ``member`` is the name of the structure field to which ``pointer`` points.
|
||||
|
||||
The return value from container_of() is a pointer to the corresponding
|
||||
container type. So, for example, a pointer "kp" to a struct kobject
|
||||
embedded *within* a struct uio_map could be converted to a pointer to the
|
||||
*containing* uio_map structure with::
|
||||
container type. So, for example, a pointer ``kp`` to a struct kobject
|
||||
embedded **within** a struct uio_map could be converted to a pointer to the
|
||||
**containing** uio_map structure with::
|
||||
|
||||
struct uio_map *u_map = container_of(kp, struct uio_map, kobj);
|
||||
|
||||
For convenience, programmers often define a simple macro for "back-casting"
|
||||
For convenience, programmers often define a simple macro for **back-casting**
|
||||
kobject pointers to the containing type. Exactly this happens in the
|
||||
earlier drivers/uio/uio.c, as you can see here::
|
||||
earlier ``drivers/uio/uio.c``, as you can see here::
|
||||
|
||||
struct uio_map {
|
||||
struct kobject kobj;
|
||||
@ -172,13 +172,13 @@ call to kobject_uevent()::
|
||||
|
||||
int kobject_uevent(struct kobject *kobj, enum kobject_action action);
|
||||
|
||||
Use the KOBJ_ADD action for when the kobject is first added to the kernel.
|
||||
Use the **KOBJ_ADD** action for when the kobject is first added to the kernel.
|
||||
This should be done only after any attributes or children of the kobject
|
||||
have been initialized properly, as userspace will instantly start to look
|
||||
for them when this call happens.
|
||||
|
||||
When the kobject is removed from the kernel (details on how to do that are
|
||||
below), the uevent for KOBJ_REMOVE will be automatically created by the
|
||||
below), the uevent for **KOBJ_REMOVE** will be automatically created by the
|
||||
kobject core, so the caller does not have to worry about doing that by
|
||||
hand.
|
||||
|
||||
@ -238,7 +238,7 @@ Both types of attributes used here, with a kobject that has been created
|
||||
with the kobject_create_and_add(), can be of type kobj_attribute, so no
|
||||
special custom attribute is needed to be created.
|
||||
|
||||
See the example module, samples/kobject/kobject-example.c for an
|
||||
See the example module, ``samples/kobject/kobject-example.c`` for an
|
||||
implementation of a simple kobject and attributes.
|
||||
|
||||
|
||||
@ -365,7 +365,7 @@ Because other references to the kset may still exist, the release may happen
|
||||
after kset_unregister() returns.
|
||||
|
||||
An example of using a kset can be seen in the
|
||||
samples/kobject/kset-example.c file in the kernel tree.
|
||||
``samples/kobject/kset-example.c`` file in the kernel tree.
|
||||
|
||||
If a kset wishes to control the uevent operations of the kobjects
|
||||
associated with it, it can use the struct kset_uevent_ops to handle it::
|
||||
@ -408,8 +408,8 @@ Kobject removal
|
||||
After a kobject has been registered with the kobject core successfully, it
|
||||
must be cleaned up when the code is finished with it. To do that, call
|
||||
kobject_put(). By doing this, the kobject core will automatically clean up
|
||||
all of the memory allocated by this kobject. If a KOBJ_ADD uevent has been
|
||||
sent for the object, a corresponding KOBJ_REMOVE uevent will be sent, and
|
||||
all of the memory allocated by this kobject. If a ``KOBJ_ADD`` uevent has been
|
||||
sent for the object, a corresponding ``KOBJ_REMOVE`` uevent will be sent, and
|
||||
any other sysfs housekeeping will be handled for the caller properly.
|
||||
|
||||
If you need to do a two-stage delete of the kobject (say you are not
|
||||
@ -430,5 +430,5 @@ Example code to copy from
|
||||
=========================
|
||||
|
||||
For a more complete example of using ksets and kobjects properly, see the
|
||||
example programs samples/kobject/{kobject-example.c,kset-example.c},
|
||||
which will be built as loadable modules if you select CONFIG_SAMPLE_KOBJECT.
|
||||
example programs ``samples/kobject/{kobject-example.c,kset-example.c}``,
|
||||
which will be built as loadable modules if you select ``CONFIG_SAMPLE_KOBJECT``.
|
@ -1,22 +0,0 @@
|
||||
Debugging Modules after 2.6.3
|
||||
-----------------------------
|
||||
|
||||
In almost all distributions, the kernel asks for modules which don't
|
||||
exist, such as "net-pf-10" or whatever. Changing "modprobe -q" to
|
||||
"succeed" in this case is hacky and breaks some setups, and also we
|
||||
want to know if it failed for the fallback code for old aliases in
|
||||
fs/char_dev.c, for example.
|
||||
|
||||
In the past a debugging message which would fill people's logs was
|
||||
emitted. This debugging message has been removed. The correct way
|
||||
of debugging module problems is something like this:
|
||||
|
||||
echo '#! /bin/sh' > /tmp/modprobe
|
||||
echo 'echo "$@" >> /tmp/modprobe.log' >> /tmp/modprobe
|
||||
echo 'exec /sbin/modprobe "$@"' >> /tmp/modprobe
|
||||
chmod a+x /tmp/modprobe
|
||||
echo /tmp/modprobe > /proc/sys/kernel/modprobe
|
||||
|
||||
Note that the above applies only when the *kernel* is requesting
|
||||
that the module be loaded -- it won't have any effect if that module
|
||||
is being loaded explicitly using "modprobe" from userspace.
|
@ -203,7 +203,7 @@ Cause
|
||||
may not correctly copy files from sysfs.
|
||||
|
||||
Solution
|
||||
Use ``cat``' to read ``.gcda`` files and ``cp -d`` to copy links.
|
||||
Use ``cat`` to read ``.gcda`` files and ``cp -d`` to copy links.
|
||||
Alternatively use the mechanism shown in Appendix B.
|
||||
|
||||
|
||||
|
@ -8,7 +8,8 @@ with the difference that the orphan objects are not freed but only
|
||||
reported via /sys/kernel/debug/kmemleak. A similar method is used by the
|
||||
Valgrind tool (``memcheck --leak-check``) to detect the memory leaks in
|
||||
user-space applications.
|
||||
Kmemleak is supported on x86, arm, powerpc, sparc, sh, microblaze, ppc, mips, s390 and tile.
|
||||
Kmemleak is supported on x86, arm, arm64, powerpc, sparc, sh, microblaze, mips,
|
||||
s390, nds32, arc and xtensa.
|
||||
|
||||
Usage
|
||||
-----
|
||||
|
@ -272,8 +272,8 @@ STA information lifetime rules
|
||||
.. kernel-doc:: net/mac80211/sta_info.c
|
||||
:doc: STA information lifetime rules
|
||||
|
||||
Aggregation
|
||||
===========
|
||||
Aggregation Functions
|
||||
=====================
|
||||
|
||||
.. kernel-doc:: net/mac80211/sta_info.h
|
||||
:functions: sta_ampdu_mlme
|
||||
@ -284,8 +284,8 @@ Aggregation
|
||||
.. kernel-doc:: net/mac80211/sta_info.h
|
||||
:functions: tid_ampdu_rx
|
||||
|
||||
Synchronisation
|
||||
===============
|
||||
Synchronisation Functions
|
||||
=========================
|
||||
|
||||
TBD
|
||||
|
||||
|
@ -5,8 +5,8 @@ DMAEngine documentation
|
||||
DMAEngine documentation provides documents for various aspects of DMAEngine
|
||||
framework.
|
||||
|
||||
DMAEngine documentation
|
||||
-----------------------
|
||||
DMAEngine development documentation
|
||||
-----------------------------------
|
||||
|
||||
This book helps with DMAengine internal APIs and guide for DMAEngine device
|
||||
driver writers.
|
||||
|
@ -210,7 +210,7 @@ probed.
|
||||
While the typical use case for sync_state() is to have the kernel cleanly take
|
||||
over management of devices from the bootloader, the usage of sync_state() is
|
||||
not restricted to that. Use it whenever it makes sense to take an action after
|
||||
all the consumers of a device have probed.
|
||||
all the consumers of a device have probed::
|
||||
|
||||
int (*remove) (struct device *dev);
|
||||
|
||||
|
@ -17,6 +17,7 @@ available subsections can be seen below.
|
||||
driver-model/index
|
||||
basics
|
||||
infrastructure
|
||||
ioctl
|
||||
early-userspace/index
|
||||
pm/index
|
||||
clk
|
||||
@ -74,11 +75,12 @@ available subsections can be seen below.
|
||||
connector
|
||||
console
|
||||
dcdbas
|
||||
edid
|
||||
eisa
|
||||
ipmb
|
||||
isa
|
||||
isapnp
|
||||
io-mapping
|
||||
io_ordering
|
||||
generic-counter
|
||||
lightnvm-pblk
|
||||
memory-devices/index
|
||||
|
@ -23,7 +23,7 @@
|
||||
| openrisc: | TODO |
|
||||
| parisc: | TODO |
|
||||
| powerpc: | ok |
|
||||
| riscv: | TODO |
|
||||
| riscv: | ok |
|
||||
| s390: | ok |
|
||||
| sh: | ok |
|
||||
| sparc: | ok |
|
||||
|
@ -1,7 +1,10 @@
|
||||
v9fs: Plan 9 Resource Sharing for Linux
|
||||
=======================================
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
ABOUT
|
||||
=======================================
|
||||
v9fs: Plan 9 Resource Sharing for Linux
|
||||
=======================================
|
||||
|
||||
About
|
||||
=====
|
||||
|
||||
v9fs is a Unix implementation of the Plan 9 9p remote filesystem protocol.
|
||||
@ -14,9 +17,11 @@ and Maya Gokhale. Additional development by Greg Watson
|
||||
|
||||
The best detailed explanation of the Linux implementation and applications of
|
||||
the 9p client is available in the form of a USENIX paper:
|
||||
|
||||
http://www.usenix.org/events/usenix05/tech/freenix/hensbergen.html
|
||||
|
||||
Other applications are described in the following papers:
|
||||
|
||||
* XCPU & Clustering
|
||||
http://xcpu.org/papers/xcpu-talk.pdf
|
||||
* KVMFS: control file system for KVM
|
||||
@ -28,18 +33,18 @@ Other applications are described in the following papers:
|
||||
* VirtFS: A Virtualization Aware File System pass-through
|
||||
http://goo.gl/3WPDg
|
||||
|
||||
USAGE
|
||||
Usage
|
||||
=====
|
||||
|
||||
For remote file server:
|
||||
For remote file server::
|
||||
|
||||
mount -t 9p 10.10.1.2 /mnt/9
|
||||
|
||||
For Plan 9 From User Space applications (http://swtch.com/plan9)
|
||||
For Plan 9 From User Space applications (http://swtch.com/plan9)::
|
||||
|
||||
mount -t 9p `namespace`/acme /mnt/9 -o trans=unix,uname=$USER
|
||||
|
||||
For server running on QEMU host with virtio transport:
|
||||
For server running on QEMU host with virtio transport::
|
||||
|
||||
mount -t 9p -o trans=virtio <mount_tag> /mnt/9
|
||||
|
||||
@ -48,18 +53,22 @@ mount points. Each 9P export is seen by the client as a virtio device with an
|
||||
associated "mount_tag" property. Available mount tags can be
|
||||
seen by reading /sys/bus/virtio/drivers/9pnet_virtio/virtio<n>/mount_tag files.
|
||||
|
||||
OPTIONS
|
||||
Options
|
||||
=======
|
||||
|
||||
============= ===============================================================
|
||||
trans=name select an alternative transport. Valid options are
|
||||
currently:
|
||||
unix - specifying a named pipe mount point
|
||||
tcp - specifying a normal TCP/IP connection
|
||||
fd - used passed file descriptors for connection
|
||||
|
||||
======== ============================================
|
||||
unix specifying a named pipe mount point
|
||||
tcp specifying a normal TCP/IP connection
|
||||
fd use passed file descriptors for connection
|
||||
(see rfdno and wfdno)
|
||||
virtio - connect to the next virtio channel available
|
||||
virtio connect to the next virtio channel available
|
||||
(from QEMU with trans_virtio module)
|
||||
rdma - connect to a specified RDMA channel
|
||||
rdma connect to a specified RDMA channel
|
||||
======== ============================================
|
||||
|
||||
uname=name user name to attempt mount as on the remote server. The
|
||||
server may override or ignore this value. Certain user
|
||||
@ -69,28 +78,36 @@ OPTIONS
|
||||
offering several exported file systems.
|
||||
|
||||
cache=mode specifies a caching policy. By default, no caches are used.
|
||||
none = default no cache policy, metadata and data
|
||||
|
||||
none
|
||||
default no cache policy, metadata and data
|
||||
alike are synchronous.
|
||||
loose = no attempts are made at consistency,
|
||||
loose
|
||||
no attempts are made at consistency,
|
||||
intended for exclusive, read-only mounts
|
||||
fscache = use FS-Cache for a persistent, read-only
|
||||
fscache
|
||||
use FS-Cache for a persistent, read-only
|
||||
cache backend.
|
||||
mmap = minimal cache that is only used for read-write
|
||||
mmap
|
||||
minimal cache that is only used for read-write
|
||||
mmap. Nothing else is cached, like cache=none
|
||||
|
||||
debug=n specifies debug level. The debug level is a bitmask.
|
||||
0x01 = display verbose error messages
|
||||
0x02 = developer debug (DEBUG_CURRENT)
|
||||
0x04 = display 9p trace
|
||||
0x08 = display VFS trace
|
||||
0x10 = display Marshalling debug
|
||||
0x20 = display RPC debug
|
||||
0x40 = display transport debug
|
||||
0x80 = display allocation debug
|
||||
0x100 = display protocol message debug
|
||||
0x200 = display Fid debug
|
||||
0x400 = display packet debug
|
||||
0x800 = display fscache tracing debug
|
||||
|
||||
===== ================================
|
||||
0x01 display verbose error messages
|
||||
0x02 developer debug (DEBUG_CURRENT)
|
||||
0x04 display 9p trace
|
||||
0x08 display VFS trace
|
||||
0x10 display Marshalling debug
|
||||
0x20 display RPC debug
|
||||
0x40 display transport debug
|
||||
0x80 display allocation debug
|
||||
0x100 display protocol message debug
|
||||
0x200 display Fid debug
|
||||
0x400 display packet debug
|
||||
0x800 display fscache tracing debug
|
||||
===== ================================
|
||||
|
||||
rfdno=n the file descriptor for reading with trans=fd
|
||||
|
||||
@ -103,9 +120,12 @@ OPTIONS
|
||||
noextend force legacy mode (no 9p2000.u or 9p2000.L semantics)
|
||||
|
||||
version=name Select 9P protocol version. Valid options are:
|
||||
9p2000 - Legacy mode (same as noextend)
|
||||
9p2000.u - Use 9P2000.u protocol
|
||||
9p2000.L - Use 9P2000.L protocol
|
||||
|
||||
======== ==============================
|
||||
9p2000 Legacy mode (same as noextend)
|
||||
9p2000.u Use 9P2000.u protocol
|
||||
9p2000.L Use 9P2000.L protocol
|
||||
======== ==============================
|
||||
|
||||
dfltuid attempt to mount as a particular uid
|
||||
|
||||
@ -118,22 +138,27 @@ OPTIONS
|
||||
hosts. This functionality will be expanded in later versions.
|
||||
|
||||
access there are four access modes.
|
||||
user = if a user tries to access a file on v9fs
|
||||
user
|
||||
if a user tries to access a file on v9fs
|
||||
filesystem for the first time, v9fs sends an
|
||||
attach command (Tattach) for that user.
|
||||
This is the default mode.
|
||||
<uid> = allows only user with uid=<uid> to access
|
||||
<uid>
|
||||
allows only user with uid=<uid> to access
|
||||
the files on the mounted filesystem
|
||||
any = v9fs does single attach and performs all
|
||||
any
|
||||
v9fs does single attach and performs all
|
||||
operations as one user
|
||||
client = ACL based access check on the 9p client
|
||||
client
|
||||
ACL based access check on the 9p client
|
||||
side for access validation
|
||||
|
||||
cachetag cache tag to use the specified persistent cache.
|
||||
cache tags for existing cache sessions can be listed at
|
||||
/sys/fs/9p/caches. (applies only to cache=fscache)
|
||||
============= ===============================================================
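Because debug=n is a bitmask, several debug classes can be enabled at once by OR-ing their values. The following is a minimal sketch of that arithmetic. The flag names are hypothetical labels chosen for illustration; only the numeric values come from the table above.

```python
# Values are taken from the debug table; the dictionary keys are
# hypothetical labels, not identifiers used by the 9p code.
DEBUG_FLAGS = {
    "verbose_errors": 0x01,
    "developer": 0x02,
    "9p_trace": 0x04,
    "vfs_trace": 0x08,
    "marshalling": 0x10,
    "rpc": 0x20,
    "transport": 0x40,
    "allocation": 0x80,
    "protocol_message": 0x100,
    "fid": 0x200,
    "packet": 0x400,
    "fscache": 0x800,
}


def debug_mask(*names):
    """OR together the bit values of the named debug classes."""
    mask = 0
    for name in names:
        mask |= DEBUG_FLAGS[name]
    return mask


if __name__ == "__main__":
    # Verbose errors plus 9p trace.
    print(debug_mask("verbose_errors", "9p_trace"))
```

The resulting integer is the value to pass as debug=n; which number bases the option parser accepts is not stated here, so the sketch leaves the formatting to the caller.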
|
||||
|
||||
RESOURCES
|
||||
Resources
|
||||
=========
|
||||
|
||||
Protocol specifications are maintained on github:
|
||||
@ -158,4 +183,3 @@ http://plan9.bell-labs.com/plan9
|
||||
|
||||
For information on Plan 9 from User Space (Plan 9 applications and libraries
|
||||
ported to Linux/BSD/OSX/etc) check out http://swtch.com/plan9
|
||||
|
@ -1,3 +1,9 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===============================
|
||||
Acorn Disc Filing System - ADFS
|
||||
===============================
|
||||
|
||||
Filesystems supported by ADFS
|
||||
-----------------------------
|
||||
|
||||
@ -25,6 +31,7 @@ directory updates, specifically updating the access mode and timestamp.
|
||||
Mount options for ADFS
|
||||
----------------------
|
||||
|
||||
============ ======================================================
|
||||
uid=nnn All files in the partition will be owned by
|
||||
user id nnn. Default 0 (root).
|
||||
gid=nnn All files in the partition will be in group
|
||||
@ -36,22 +43,23 @@ Mount options for ADFS
|
||||
ftsuffix=n When ftsuffix=0, no file type suffix will be applied.
|
||||
When ftsuffix=1, a hexadecimal suffix corresponding to
|
||||
the RISC OS file type will be added. Default 0.
|
||||
============ ======================================================
|
||||
|
||||
Mapping of ADFS permissions to Linux permissions
|
||||
------------------------------------------------
|
||||
|
||||
ADFS permissions consist of the following:
|
||||
|
||||
Owner read
|
||||
Owner write
|
||||
Other read
|
||||
Other write
|
||||
- Owner read
|
||||
- Owner write
|
||||
- Other read
|
||||
- Other write
|
||||
|
||||
(In older versions, an 'execute' permission did exist, but this
|
||||
does not hold the same meaning as the Linux 'execute' permission
|
||||
and is now obsolete).
|
||||
|
||||
The mapping is performed as follows:
|
||||
The mapping is performed as follows::
|
||||
|
||||
Owner read -> -r--r--r--
|
||||
Owner write -> --w--w--w-
|
||||
@ -66,17 +74,18 @@ Mapping of ADFS permissions to Linux permissions
|
||||
Possible other mode permissions -> ----rwxrwx
|
||||
|
||||
Hence, with the default masks, if a file is owner read/write, and
|
||||
not a UnixExec filetype, then the permissions will be:
|
||||
not a UnixExec filetype, then the permissions will be::
|
||||
|
||||
-rw-------
|
||||
|
||||
However, if the masks were ownmask=0770,othmask=0007, then this would
|
||||
be modified to:
|
||||
be modified to::
|
||||
|
||||
-rw-rw----
|
||||
|
||||
There is no restriction on what you can do with these masks. You may
|
||||
wish that either read bits give read access to the file for all, but
|
||||
keep the default write protection (ownmask=0755,othmask=0577):
|
||||
keep the default write protection (ownmask=0755,othmask=0577)::
|
||||
|
||||
-rw-r--r--
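The three mask examples above can be reproduced with a few lines of arithmetic. This is only a sketch of the mapping as described in this section, not the driver's code: the function names are hypothetical, the default masks of 0700 and 0077 are an assumption inferred from the first example, and the UnixExec filetype case is ignored.

```python
import stat


def adfs_mode(owner_read, owner_write, other_read, other_write,
              ownmask=0o700, othmask=0o077):
    """Map ADFS permission flags to Linux mode bits (UnixExec ignored).

    The default masks are an assumption inferred from the examples.
    """
    # Each ADFS bit expands to the matching bit in all three classes...
    owner_bits = (0o444 if owner_read else 0) | (0o222 if owner_write else 0)
    other_bits = (0o444 if other_read else 0) | (0o222 if other_write else 0)
    # ...and the masks decide which classes actually receive it.
    return (owner_bits & ownmask) | (other_bits & othmask)


def mode_string(mode):
    """Render mode bits in ls -l style for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


if __name__ == "__main__":
    # An owner read/write file under the three mask settings discussed above.
    print(mode_string(adfs_mode(True, True, False, False)))
    print(mode_string(adfs_mode(True, True, False, False, 0o770, 0o007)))
    print(mode_string(adfs_mode(True, True, False, False, 0o755, 0o577)))
```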
|
||||
|
@ -1,9 +1,13 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=============================
|
||||
Overview of Amiga Filesystems
|
||||
=============================
|
||||
|
||||
Not all varieties of the Amiga filesystems are supported for reading and
|
||||
writing. The Amiga currently knows six different filesystems:
|
||||
|
||||
============== ===============================================================
|
||||
DOS\0 The old or original filesystem, not really suited for
|
||||
hard disks and normally not used on them, either.
|
||||
Supported read/write.
|
||||
@ -23,6 +27,7 @@ DOS\4 The original filesystem with directory cache. The directory
|
||||
sense on hard disks. Supported read only.
|
||||
|
||||
DOS\5 The Fast File System with directory cache. Supported read only.
|
||||
============== ===============================================================
|
||||
|
||||
All of the above filesystems allow block sizes from 512 to 32K bytes.
|
||||
Supported block sizes are: 512, 1024, 2048 and 4096 bytes. Larger blocks
|
||||
@ -36,14 +41,18 @@ are supported, too.
|
||||
Mount options for the AFFS
|
||||
==========================
|
||||
|
||||
protect If this option is set, the protection bits cannot be altered.
|
||||
protect
|
||||
If this option is set, the protection bits cannot be altered.
|
||||
|
||||
setuid[=uid] This sets the owner of all files and directories in the file
|
||||
setuid[=uid]
|
||||
This sets the owner of all files and directories in the file
|
||||
system to uid or the uid of the current user, respectively.
|
||||
|
||||
setgid[=gid] Same as above, but for gid.
|
||||
setgid[=gid]
|
||||
Same as above, but for gid.
|
||||
|
||||
mode=mode Sets the mode flags to the given (octal) value, regardless
|
||||
mode=mode
|
||||
Sets the mode flags to the given (octal) value, regardless
|
||||
of the original permissions. Directories will get an x
|
||||
permission if the corresponding r bit is set.
|
||||
This is useful since most of the plain AmigaOS files
|
||||
@ -53,33 +62,41 @@ nofilenametruncate
|
||||
The file system will return an error when filename exceeds
|
||||
standard maximum filename length (30 characters).
|
||||
|
||||
reserved=num Sets the number of reserved blocks at the start of the
|
||||
reserved=num
|
||||
Sets the number of reserved blocks at the start of the
|
||||
partition to num. You should never need this option.
|
||||
Default is 2.
|
||||
|
||||
root=block Sets the block number of the root block. This should never
|
||||
root=block
|
||||
Sets the block number of the root block. This should never
|
||||
be necessary.
|
||||
|
||||
bs=blksize Sets the blocksize to blksize. Valid block sizes are 512,
|
||||
bs=blksize
|
||||
Sets the blocksize to blksize. Valid block sizes are 512,
|
||||
1024, 2048 and 4096. Like the root option, this should
|
||||
never be necessary, as the affs can figure it out itself.
|
||||
|
||||
quiet The file system will not return an error for disallowed
|
||||
quiet
|
||||
The file system will not return an error for disallowed
|
||||
mode changes.
|
||||
|
||||
verbose The volume name, file system type and block size will
|
||||
verbose
|
||||
The volume name, file system type and block size will
|
||||
be written to the syslog when the filesystem is mounted.
|
||||
|
||||
mufs The filesystem is really a muFS, also it doesn't
|
||||
mufs
|
||||
The filesystem is really a muFS, although it doesn't
|
||||
identify itself as one. This option is necessary if
|
||||
the filesystem wasn't formatted as muFS, but is used
|
||||
as one.
|
||||
|
||||
prefix=path Path will be prefixed to every absolute path name of
|
||||
prefix=path
|
||||
Path will be prefixed to every absolute path name of
|
||||
symbolic links on an AFFS partition. Default = "/".
|
||||
(See below.)
|
||||
|
||||
volume=name When symbolic links with an absolute path are created
|
||||
volume=name
|
||||
When symbolic links with an absolute path are created
|
||||
on an AFFS partition, name will be prepended as the
|
||||
volume name. Default = "" (empty string).
|
||||
(See below.)
|
||||
@ -148,11 +165,13 @@ might be "User", "WB" and "Graphics", the mount points /amiga/User,
|
||||
Examples
|
||||
========
|
||||
|
||||
Command line:
|
||||
Command line::
|
||||
|
||||
mount Archive/Amiga/Workbench3.1.adf /mnt -t affs -o loop,verbose
|
||||
mount /dev/sda3 /Amiga -t affs
|
||||
|
||||
/etc/fstab entry:
|
||||
/etc/fstab entry::
|
||||
|
||||
/dev/sdb5 /amiga/Workbench affs noauto,user,exec,verbose 0 0
|
||||
|
||||
IMPORTANT NOTE
|
||||
@ -170,7 +189,8 @@ before booting Windows!
|
||||
|
||||
If the damage is already done, the following should fix the RDB
|
||||
(where <disk> is the device name).
|
||||
DO AT YOUR OWN RISK:
|
||||
|
||||
DO AT YOUR OWN RISK::
|
||||
|
||||
dd if=/dev/<disk> of=rdb.tmp count=1
|
||||
cp rdb.tmp rdb.fixed
|
||||
@ -189,10 +209,14 @@ By default, filenames are truncated to 30 characters without warning.
|
||||
'nofilenametruncate' mount option can change that behavior.
|
||||
|
||||
Case is ignored by the affs in filename matching, but Linux shells
|
||||
do care about the case. Example (with /wb being an affs mounted fs):
|
||||
do care about the case. Example (with /wb being an affs mounted fs)::
|
||||
|
||||
rm /wb/WRONGCASE
|
||||
will remove /mnt/wrongcase, but
|
||||
|
||||
will remove /wb/wrongcase, but::
|
||||
|
||||
rm /wb/WR*
|
||||
|
||||
will not since the names are matched by the shell.
|
||||
|
||||
The block allocation is designed for hard disk partitions. If more
|
||||
@ -219,4 +243,4 @@ due to an incompatibility with the Amiga floppy controller.
|
||||
|
||||
If you are interested in an Amiga Emulator for Linux, look at
|
||||
|
||||
http://web.archive.org/web/*/http://www.freiburg.linux.de/~uae/
|
||||
http://web.archive.org/web/%2E/http://www.freiburg.linux.de/~uae/
|
@ -1,8 +1,10 @@
|
||||
====================
|
||||
kAFS: AFS FILESYSTEM
|
||||
====================
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
Contents:
|
||||
====================
|
||||
kAFS: AFS FILESYSTEM
|
||||
====================
|
||||
|
||||
.. Contents:
|
||||
|
||||
- Overview.
|
||||
- Usage.
|
||||
@ -14,8 +16,7 @@ Contents:
|
||||
- The @sys substitution.
|
||||
|
||||
|
||||
========
|
||||
OVERVIEW
|
||||
Overview
|
||||
========
|
||||
|
||||
This filesystem provides a fairly simple secure AFS filesystem driver. It is
|
||||
@ -35,35 +36,33 @@ It does not yet support the following AFS features:
|
||||
(*) pioctl() system call.
|
||||
|
||||
|
||||
===========
|
||||
COMPILATION
|
||||
Compilation
|
||||
===========
|
||||
|
||||
The filesystem should be enabled by turning on the kernel configuration
|
||||
options:
|
||||
options::
|
||||
|
||||
CONFIG_AF_RXRPC - The RxRPC protocol transport
|
||||
CONFIG_RXKAD - The RxRPC Kerberos security handler
|
||||
CONFIG_AFS - The AFS filesystem
|
||||
|
||||
Additionally, the following can be turned on to aid debugging:
|
||||
Additionally, the following can be turned on to aid debugging::
|
||||
|
||||
CONFIG_AF_RXRPC_DEBUG - Permit AF_RXRPC debugging to be enabled
|
||||
CONFIG_AFS_DEBUG - Permit AFS debugging to be enabled
|
||||
|
||||
They permit the debugging messages to be turned on dynamically by manipulating
|
||||
the masks in the following files:
|
||||
the masks in the following files::
|
||||
|
||||
/sys/module/af_rxrpc/parameters/debug
|
||||
/sys/module/kafs/parameters/debug
|
||||
|
||||
|
||||
=====
|
||||
USAGE
|
||||
Usage
|
||||
=====
|
||||
|
||||
When inserting the driver modules the root cell must be specified along with a
|
||||
list of volume location server IP addresses:
|
||||
list of volume location server IP addresses::
|
||||
|
||||
modprobe rxrpc
|
||||
modprobe kafs rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
|
||||
@ -77,14 +76,14 @@ The second module is the kerberos RxRPC security driver, and the third module
|
||||
is the actual filesystem driver for the AFS filesystem.
|
||||
|
||||
Once the module has been loaded, more cells can be added by the following
|
||||
procedure:
|
||||
procedure::
|
||||
|
||||
echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
|
||||
|
||||
Where the parameters to the "add" command are the name of a cell and a list of
|
||||
volume location servers within that cell, with the latter separated by colons.
|
||||
|
||||
Filesystems can be mounted anywhere by commands similar to the following:
|
||||
Filesystems can be mounted anywhere by commands similar to the following::
|
||||
|
||||
mount -t afs "%cambridge.redhat.com:root.afs." /afs
|
||||
mount -t afs "#cambridge.redhat.com:root.cell." /afs/cambridge
|
||||
@ -104,8 +103,7 @@ named volume will be looked up in the cell specified during modprobe.
|
||||
Additional cells can be added through /proc (see later section).
|
||||
|
||||
|
||||
===========
|
||||
MOUNTPOINTS
|
||||
Mountpoints
|
||||
===========
|
||||
|
||||
AFS has a concept of mountpoints. In AFS terms, these are specially formatted
|
||||
@ -123,42 +121,40 @@ culled first. If all are culled, then the requested volume will also be
|
||||
unmounted, otherwise error EBUSY will be returned.
|
||||
|
||||
This can be used by the administrator to attempt to unmount the whole AFS tree
|
||||
mounted on /afs in one go by doing:
|
||||
mounted on /afs in one go by doing::
|
||||
|
||||
umount /afs
|
||||
|
||||
|
||||
============
|
||||
DYNAMIC ROOT
|
||||
Dynamic Root
|
||||
============
|
||||
|
||||
A mount option is available to create a serverless mount that is only usable
|
||||
for dynamic lookup. Creating such a mount can be done by, for example:
|
||||
for dynamic lookup. Creating such a mount can be done by, for example::
|
||||
|
||||
mount -t afs none /afs -o dyn
|
||||
|
||||
This creates a mount that just has an empty directory at the root. Attempting
|
||||
to look up a name in this directory will cause a mountpoint to be created that
|
||||
looks up a cell of the same name, for example:
|
||||
looks up a cell of the same name, for example::
|
||||
|
||||
ls /afs/grand.central.org/
|
||||
|
||||
|
||||
===============
|
||||
PROC FILESYSTEM
|
||||
Proc Filesystem
|
||||
===============
|
||||
|
||||
The AFS module creates a "/proc/fs/afs/" directory and populates it:
|
||||
|
||||
(*) A "cells" file that lists cells currently known to the afs module and
|
||||
their usage counts:
|
||||
their usage counts::
|
||||
|
||||
[root@andromeda ~]# cat /proc/fs/afs/cells
|
||||
USE NAME
|
||||
3 cambridge.redhat.com
|
||||
|
||||
(*) A directory per cell that contains files that list volume location
|
||||
servers, volumes, and active servers known within that cell.
|
||||
servers, volumes, and active servers known within that cell::
|
||||
|
||||
[root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/servers
|
||||
USE ADDR STATE
|
||||
@ -171,8 +167,7 @@ The AFS modules creates a "/proc/fs/afs/" directory and populates it:
|
||||
1 Val 20000000 20000001 20000002 root.afs
|
||||
|
||||
|
||||
=================
|
||||
THE CELL DATABASE
|
||||
The Cell Database
|
||||
=================
|
||||
|
||||
The filesystem maintains an internal database of all the cells it knows and the
|
||||
@ -181,7 +176,7 @@ the system belongs is added to the database when modprobe is performed by the
|
||||
"rootcell=" argument or, if compiled in, using a "kafs.rootcell=" argument on
|
||||
the kernel command line.
|
||||
|
||||
Further cells can be added by commands similar to the following:
|
||||
Further cells can be added by commands similar to the following::
|
||||
|
||||
echo add CELLNAME VLADDR[:VLADDR][:VLADDR]... >/proc/fs/afs/cells
|
||||
echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
|
||||
@ -189,8 +184,7 @@ Further cells can be added by commands similar to the following:
|
||||
No other cell database operations are available at this time.
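The "add" line has a fixed shape: the cell name, a space, then the volume location server addresses joined by colons. A minimal sketch of building that line follows. The helper name is hypothetical, and the commented-out write assumes the kafs module is loaded and the caller has permission to write to /proc/fs/afs/cells.

```python
def afs_add_cell_command(cell, vl_addrs):
    """Build the line written to /proc/fs/afs/cells to add a cell."""
    if not vl_addrs:
        raise ValueError("at least one volume location server is required")
    # Addresses are colon-separated, as in the examples above.
    return "add {} {}".format(cell, ":".join(vl_addrs))


if __name__ == "__main__":
    line = afs_add_cell_command(
        "grand.central.org",
        ["18.9.48.14", "128.2.203.61", "130.237.48.87"],
    )
    print(line)
    # On a system with kafs loaded, the line could then be written out:
    # with open("/proc/fs/afs/cells", "w") as cells:
    #     cells.write(line + "\n")
```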
|
||||
|
||||
|
||||
========
|
||||
SECURITY
|
||||
Security
|
||||
========
|
||||
|
||||
Secure operations are initiated by acquiring a key using the klog program. A
|
||||
@ -198,17 +192,17 @@ very primitive klog program is available at:
|
||||
|
||||
http://people.redhat.com/~dhowells/rxrpc/klog.c
|
||||
|
||||
This should be compiled by:
|
||||
This should be compiled by::
|
||||
|
||||
make klog LDLIBS="-lcrypto -lcrypt -lkrb4 -lkeyutils"
|
||||
|
||||
And then run as:
|
||||
And then run as::
|
||||
|
||||
./klog
|
||||
|
||||
Assuming it's successful, this adds a key of type RxRPC, named for the service
|
||||
and cell, eg: "afs@<cellname>". This can be viewed with the keyctl program or
|
||||
by cat'ing /proc/keys:
|
||||
by cat'ing /proc/keys::
|
||||
|
||||
[root@andromeda ~]# keyctl show
|
||||
Session Keyring
|
||||
@ -232,20 +226,19 @@ socket), then the operations on the file will be made with key that was used to
|
||||
open the file.
|
||||
|
||||
|
||||
=====================
|
||||
THE @SYS SUBSTITUTION
|
||||
The @sys Substitution
|
||||
=====================
|
||||
|
||||
The list of up to 16 @sys substitutions for the current network namespace can
|
||||
be configured by writing a list to /proc/fs/afs/sysname:
|
||||
be configured by writing a list to /proc/fs/afs/sysname::
|
||||
|
||||
[root@andromeda ~]# echo foo amd64_linux_26 >/proc/fs/afs/sysname
|
||||
|
||||
or cleared entirely by writing an empty list:
|
||||
or cleared entirely by writing an empty list::
|
||||
|
||||
[root@andromeda ~]# echo >/proc/fs/afs/sysname
|
||||
|
||||
The current list for current network namespace can be retrieved by:
|
||||
The current list for the current network namespace can be retrieved by::
|
||||
|
||||
[root@andromeda ~]# cat /proc/fs/afs/sysname
|
||||
foo
|
@ -1,4 +1,6 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
====================================================================
|
||||
Miscellaneous Device control operations for the autofs kernel module
|
||||
====================================================================
|
||||
|
||||
@ -36,24 +38,24 @@ For example, there are two types of automount maps, direct (in the kernel
|
||||
module source you will see a third type called an offset, which is just
|
||||
a direct mount in disguise) and indirect.
|
||||
|
||||
Here is a master map with direct and indirect map entries:
|
||||
Here is a master map with direct and indirect map entries::
|
||||
|
||||
/- /etc/auto.direct
|
||||
/test /etc/auto.indirect
|
||||
/- /etc/auto.direct
|
||||
/test /etc/auto.indirect
|
||||
|
||||
and the corresponding map files:
|
||||
and the corresponding map files::
|
||||
|
||||
/etc/auto.direct:
|
||||
/etc/auto.direct:
|
||||
|
||||
/automount/dparse/g6 budgie:/autofs/export1
|
||||
/automount/dparse/g1 shark:/autofs/export1
|
||||
and so on.
|
||||
/automount/dparse/g6 budgie:/autofs/export1
|
||||
/automount/dparse/g1 shark:/autofs/export1
|
||||
and so on.
|
||||
|
||||
/etc/auto.indirect:
|
||||
/etc/auto.indirect::
|
||||
|
||||
g1 shark:/autofs/export1
|
||||
g6 budgie:/autofs/export1
|
||||
and so on.
|
||||
g1 shark:/autofs/export1
|
||||
g6 budgie:/autofs/export1
|
||||
and so on.
|
||||
|
||||
For the above indirect map an autofs file system is mounted on /test and
|
||||
mounts are triggered for each sub-directory key by the inode lookup
|
||||
@ -69,18 +71,18 @@ use the follow_link inode operation to trigger the mount.
|
||||
But, each entry in direct and indirect maps can have offsets (making
|
||||
them multi-mount map entries).
|
||||
|
||||
For example, an indirect mount map entry could also be:
|
||||
For example, an indirect mount map entry could also be::
|
||||
|
||||
g1 \
|
||||
g1 \
|
||||
/ shark:/autofs/export5/testing/test \
|
||||
/s1 shark:/autofs/export/testing/test/s1 \
|
||||
/s2 shark:/autofs/export5/testing/test/s2 \
|
||||
/s1/ss1 shark:/autofs/export1 \
|
||||
/s2/ss2 shark:/autofs/export2
|
||||
|
||||
and a similarly a direct mount map entry could also be:
|
||||
and similarly a direct mount map entry could also be::
|
||||
|
||||
/automount/dparse/g1 \
|
||||
/automount/dparse/g1 \
|
||||
/ shark:/autofs/export5/testing/test \
|
||||
/s1 shark:/autofs/export/testing/test/s1 \
|
||||
/s2 shark:/autofs/export5/testing/test/s2 \
|
||||
@ -170,9 +172,9 @@ autofs Miscellaneous Device mount control interface
|
||||
The control interface is opening a device node, typically /dev/autofs.
|
||||
|
||||
All the ioctls use a common structure to pass the needed parameter
|
||||
information and return operation results:
|
||||
information and return operation results::
|
||||
|
||||
struct autofs_dev_ioctl {
|
||||
struct autofs_dev_ioctl {
|
||||
__u32 ver_major;
|
||||
__u32 ver_minor;
|
||||
__u32 size; /* total size of data passed in
|
||||
@ -195,7 +197,7 @@ struct autofs_dev_ioctl {
|
||||
};
|
||||
|
||||
char path[0];
|
||||
};
|
||||
};
|
||||
|
||||
The ioctlfd field is a mount point file descriptor of an autofs mount
|
||||
point. It is returned by the open call and is used by all calls except
|
||||
@ -212,7 +214,7 @@ is used account for the increased structure length when translating the
|
||||
structure sent from user space.
|
||||
|
||||
This structure can be initialized before setting specific fields by using
|
||||
the void function call init_autofs_dev_ioctl(struct autofs_dev_ioctl *).
|
||||
the void function call init_autofs_dev_ioctl(``struct autofs_dev_ioctl *``).
|
||||
|
||||
All of the ioctls perform a copy of this structure from user space to
|
||||
kernel space and return -EINVAL if the size parameter is smaller than
|
@ -1,48 +1,54 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=========================
|
||||
BeOS filesystem for Linux
|
||||
=========================
|
||||
|
||||
Document last updated: Dec 6, 2001
|
||||
|
||||
WARNING
|
||||
Warning
|
||||
=======
|
||||
Make sure you understand that this is alpha software. This means that the
|
||||
implementation is neither complete nor well-tested.
|
||||
|
||||
I DISCLAIM ALL RESPONSIBILITY FOR ANY POSSIBLE BAD EFFECTS OF THIS CODE!
|
||||
|
||||
LICENSE
|
||||
=====
|
||||
License
|
||||
=======
|
||||
This software is covered by the GNU General Public License.
|
||||
See the file COPYING for the complete text of the license.
|
||||
Or the GNU website: <http://www.gnu.org/licenses/licenses.html>
|
||||
|
||||
AUTHOR
|
||||
=====
|
||||
Author
|
||||
======
|
||||
The largest part of the code was written by Will Dyson <will_dyson@pobox.com>.
|
||||
He has been working on the code since Aug 13, 2001. See the changelog for
|
||||
details.
|
||||
|
||||
Original Author: Makoto Kato <m_kato@ga2.so-net.ne.jp>
|
||||
|
||||
His original code can still be found at:
|
||||
<http://hp.vector.co.jp/authors/VA008030/bfs/>
|
||||
|
||||
Does anyone know of a more current email address for Makoto? He doesn't
|
||||
respond to the address given above...
|
||||
|
||||
This filesystem doesn't have a maintainer.
|
||||
|
||||
WHAT IS THIS DRIVER?
|
||||
==================
|
||||
What is this Driver?
|
||||
====================
|
||||
This module implements the native filesystem of BeOS http://www.beincorporated.com/
|
||||
for the linux 2.4.1 and later kernels. Currently it is a read-only
|
||||
implementation.
|
||||
|
||||
Which is it, BFS or BEFS?
|
||||
================
|
||||
=========================
|
||||
Be, Inc said, "BeOS Filesystem is officially called BFS, not BeFS".
|
||||
But Unixware Boot Filesystem is called bfs, too. And they are already in
|
||||
the kernel. Because of this naming conflict, on Linux the BeOS
|
||||
filesystem is called befs.
|
||||
|
||||
HOW TO INSTALL
|
||||
How to Install
|
||||
==============
|
||||
step 1. Install the BeFS patch into the source code tree of linux.
|
||||
|
||||
@ -63,7 +69,7 @@ The linux kernel has many compile-time options. Most of them are beyond the
|
||||
scope of this document. I suggest the Kernel-HOWTO document as a good general
|
||||
reference on this topic. http://www.linuxdocs.org/HOWTOs/Kernel-HOWTO-4.html
|
||||
|
||||
However, to use the BeFS module, you must enable it at configure time.
|
||||
However, to use the BeFS module, you must enable it at configure time::
|
||||
|
||||
cd /foo/bar/linux
|
||||
make menuconfig (or xconfig)
|
||||
@ -82,35 +88,40 @@ step 3. Install
|
||||
See the kernel howto <http://www.linux.com/howto/Kernel-HOWTO.html> for
|
||||
instructions on this critical step.
|
||||
|
||||
USING BFS
|
||||
Using BFS
|
||||
=========
|
||||
To use the BeOS filesystem, use filesystem type 'befs'.
|
||||
|
||||
ex)
|
||||
ex::
|
||||
|
||||
mount -t befs /dev/fd0 /beos
|
||||
|
||||
MOUNT OPTIONS
|
||||
Mount Options
|
||||
=============
|
||||
|
||||
============= ===========================================================
|
||||
uid=nnn All files in the partition will be owned by user id nnn.
|
||||
gid=nnn All files in the partition will be in group nnn.
|
||||
iocharset=xxx Use xxx as the name of the NLS translation table.
|
||||
debug The driver will output debugging information to the syslog.
|
||||
============= ===========================================================
|
||||
|
||||
HOW TO GET LASTEST VERSION
|
||||
How to Get Latest Version
|
||||
==========================
|
||||
|
||||
The latest version is currently available at:
|
||||
<http://befs-driver.sourceforge.net/>
|
||||
|
||||
ANY KNOWN BUGS?
|
||||
===========
|
||||
Any Known Bugs?
|
||||
===============
|
||||
As of Jan 20, 2002:
|
||||
|
||||
None
|
||||
|
||||
SPECIAL THANKS
|
||||
Special Thanks
|
||||
==============
|
||||
Dominic Giampaolo ... Writing "Practical file system design with Be filesystem"
|
||||
|
||||
Hiroyuki Yamada ... Testing LinuxPPC.
|
||||
|
||||
|
@ -1,4 +1,7 @@
|
||||
BFS FILESYSTEM FOR LINUX
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
========================
|
||||
BFS Filesystem for Linux
|
||||
========================
|
||||
|
||||
The BFS filesystem is used by SCO UnixWare OS for the /stand slice, which
|
||||
@ -9,20 +12,20 @@ In order to access /stand partition under Linux you obviously need to
|
||||
know the partition number and the kernel must support UnixWare disk slices
|
||||
(CONFIG_UNIXWARE_DISKLABEL config option). However BFS support does not
|
||||
depend on having UnixWare disklabel support because one can also mount
|
||||
BFS filesystem via loopback:
|
||||
BFS filesystem via loopback::
|
||||
|
||||
# losetup /dev/loop0 stand.img
|
||||
# mount -t bfs /dev/loop0 /mnt/stand
|
||||
# losetup /dev/loop0 stand.img
|
||||
# mount -t bfs /dev/loop0 /mnt/stand
|
||||
|
||||
where stand.img is a file containing the image of BFS filesystem.
|
||||
When you have finished using it and umounted you need to also deallocate
|
||||
/dev/loop0 device by:
|
||||
/dev/loop0 device by::
|
||||
|
||||
# losetup -d /dev/loop0
|
||||
# losetup -d /dev/loop0
|
||||
|
||||
You can simplify mounting by just typing:
|
||||
You can simplify mounting by just typing::
|
||||
|
||||
# mount -t bfs -o loop stand.img /mnt/stand
|
||||
# mount -t bfs -o loop stand.img /mnt/stand
|
||||
|
||||
this will allocate the first available loopback device (and load loop.o
|
||||
kernel module if necessary) automatically. If the loopback driver is not
|
||||
@ -33,21 +36,21 @@ that modprobe is functioning. Beware that umount will not deallocate
|
||||
losetup(8). Read losetup(8) manpage for more info.
|
||||
|
||||
To create the BFS image under UnixWare you need to find out first which
|
||||
slice contains it. The command prtvtoc(1M) is your friend:
|
||||
slice contains it. The command prtvtoc(1M) is your friend::
|
||||
|
||||
# prtvtoc /dev/rdsk/c0b0t0d0s0
|
||||
# prtvtoc /dev/rdsk/c0b0t0d0s0
|
||||
|
||||
(assuming your root disk is on target=0, lun=0, bus=0, controller=0). Then you
|
||||
look for the slice with tag "STAND", which is usually slice 10. With this
|
||||
information you can use dd(1) to create the BFS image:
|
||||
information you can use dd(1) to create the BFS image::
|
||||
|
||||
# umount /stand
|
||||
# dd if=/dev/rdsk/c0b0t0d0sa of=stand.img bs=512
|
||||
# umount /stand
|
||||
# dd if=/dev/rdsk/c0b0t0d0sa of=stand.img bs=512
|
||||
|
||||
Just in case, you can verify that you have done the right thing by checking
|
||||
the magic number:
|
||||
the magic number::
|
||||
|
||||
# od -Ad -tx4 stand.img | more
|
||||
# od -Ad -tx4 stand.img | more
|
||||
|
||||
The first 4 bytes should be 0x1badface.
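
The same check can be scripted instead of read off the od(1) output. The
following is only a minimal sketch, not code from the kernel: the helper
names ``bfs_le32`` and ``bfs_image_has_magic`` are hypothetical, and it
assumes the magic number is stored little-endian, which is what od(1)
shows on a little-endian host such as x86.

```c
#include <stdint.h>
#include <stdio.h>

#define BFS_MAGIC 0x1BADFACEu	/* value quoted in the text above */

/* Decode 4 bytes as a little-endian 32-bit value (assumed byte order). */
uint32_t bfs_le32(const unsigned char b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/*
 * Return 1 if the image file starts with the BFS magic number,
 * 0 if it does not, and -1 if the file cannot be opened or is
 * shorter than 4 bytes.
 */
int bfs_image_has_magic(const char *path)
{
	unsigned char buf[4];
	size_t n;
	FILE *f = fopen(path, "rb");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf), f);
	fclose(f);
	if (n != sizeof(buf))
		return -1;
	return bfs_le32(buf) == BFS_MAGIC;
}
```

Calling ``bfs_image_has_magic("stand.img")`` right after the dd(1) step
gives a quick sanity check before trying to mount the image.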
|
||||
|
@ -1,3 +1,6 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=====
|
||||
BTRFS
|
||||
=====
|
||||
|
@ -1,3 +1,6 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
============================
|
||||
Ceph Distributed File System
|
||||
============================
|
||||
|
||||
@ -15,6 +18,7 @@ Basic features include:
|
||||
* Easy deployment: most FS components are userspace daemons
|
||||
|
||||
Also,
|
||||
|
||||
* Flexible snapshots (on any directory)
|
||||
* Recursive accounting (nested files, directories, bytes)
|
||||
|
||||
@ -63,7 +67,7 @@ no 'du' or similar recursive scan of the file system is required.
|
||||
Finally, Ceph also allows quotas to be set on any directory in the system.
|
||||
The quota can restrict the number of bytes or the number of files stored
|
||||
beneath that point in the directory hierarchy. Quotas can be set using
|
||||
extended attributes 'ceph.quota.max_files' and 'ceph.quota.max_bytes', eg:
|
||||
extended attributes 'ceph.quota.max_files' and 'ceph.quota.max_bytes', eg::
|
||||
|
||||
setfattr -n ceph.quota.max_bytes -v 100000000 /some/dir
|
||||
getfattr -n ceph.quota.max_bytes /some/dir
|
||||
@ -76,7 +80,7 @@ from writing as much data as it needs.
|
||||
Mount Syntax
|
||||
============
|
||||
|
||||
The basic mount syntax is:
|
||||
The basic mount syntax is::
|
||||
|
||||
# mount -t ceph monip[:port][,monip2[:port]...]:/[subdir] mnt
|
||||
|
||||
@ -84,7 +88,7 @@ You only need to specify a single monitor, as the client will get the
|
||||
full list when it connects. (However, if the monitor you specify
|
||||
happens to be down, the mount won't succeed.) The port can be left
|
||||
off if the monitor is using the default. So if the monitor is at
|
||||
1.2.3.4,
|
||||
1.2.3.4::
|
||||
|
||||
# mount -t ceph 1.2.3.4:/ /mnt/ceph
|
||||
|
||||
@ -179,8 +183,8 @@ For more information on Ceph, see the home page at
|
||||
https://ceph.com/
|
||||
|
||||
The Linux kernel client source tree is available at
|
||||
https://github.com/ceph/ceph-client.git
|
||||
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
|
||||
- https://github.com/ceph/ceph-client.git
|
||||
- git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
|
||||
|
||||
and the source for the full system is at
|
||||
https://github.com/ceph/ceph.git
|
@ -13,7 +13,7 @@ network by utilizing SMB or CIFS protocol.
|
||||
|
||||
In order to mount, the network stack will also need to be set up by
|
||||
using 'ip=' config option. For more details, see
|
||||
Documentation/filesystems/nfs/nfsroot.txt.
|
||||
Documentation/admin-guide/nfs/nfsroot.rst.
|
||||
|
||||
A CIFS root mount currently requires the use of SMB1+UNIX Extensions
|
||||
which is only supported by the Samba server. SMB1 is the older
|
||||
|
@ -1,5 +1,8 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
Cramfs - cram a filesystem onto a small ROM
|
||||
===========================================
|
||||
Cramfs - cram a filesystem onto a small ROM
|
||||
===========================================
|
||||
|
||||
cramfs is designed to be simple and small, and to compress things well.
|
||||
|
||||
@ -28,9 +31,9 @@ issue.
|
||||
Hard links are supported, but hard linked files
|
||||
will still have a link count of 1 in the cramfs image.
|
||||
|
||||
Cramfs directories have no `.' or `..' entries. Directories (like
|
||||
Cramfs directories have no ``.`` or ``..`` entries. Directories (like
|
||||
every other file on cramfs) always have a link count of 1. (There's
|
||||
no need to use -noleaf in `find', btw.)
|
||||
no need to use -noleaf in ``find``, btw.)
|
||||
|
||||
No timestamps are stored in a cramfs, so these default to the epoch
|
||||
(1970 GMT). Recently-accessed files may have updated timestamps, but
|
||||
@ -70,9 +73,9 @@ MTD drivers are cfi_cmdset_0001 (Intel/Sharp CFI flash) or physmap
|
||||
(Flash device in physical memory map). MTD partitions based on such devices
|
||||
are fine too. Then that device should be specified with the "mtd:" prefix
|
||||
as the mount device argument. For example, to mount the MTD device named
|
||||
"fs_partition" on the /mnt directory:
|
||||
"fs_partition" on the /mnt directory::
|
||||
|
||||
$ mount -t cramfs mtd:fs_partition /mnt
|
||||
$ mount -t cramfs mtd:fs_partition /mnt
|
||||
|
||||
To boot a kernel with this as root filesystem, it suffices to specify
|
||||
something like "root=mtd:fs_partition" on the kernel command line.
|
||||
@ -90,6 +93,7 @@ https://github.com/npitre/cramfs-tools
|
||||
For /usr/share/magic
|
||||
--------------------
|
||||
|
||||
===== ======================= =======================
|
||||
0 ulelong 0x28cd3d45 Linux cramfs offset 0
|
||||
>4 ulelong x size %d
|
||||
>8 ulelong x flags 0x%x
|
||||
@ -110,6 +114,7 @@ For /usr/share/magic
|
||||
>552 ulelong x fsid.blocks %d
|
||||
>556 ulelong x fsid.files %d
|
||||
>560 string >\0 name "%.16s"
|
||||
===== ======================= =======================
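
The first three magic entries (offsets 0, 4 and 8, all ``ulelong``) can
also be read from a program. This is a rough sketch under stated
assumptions: the names ``cramfs_le32``, ``cramfs_header_info`` and
``cramfs_probe`` are made up for illustration, and only the
little-endian layout described by the entries above is handled.

```c
#include <stddef.h>
#include <stdint.h>

#define CRAMFS_MAGIC 0x28cd3d45u	/* offset 0 in the magic entries above */

/* Hypothetical container for the two fields that follow the magic. */
struct cramfs_header_info {
	uint32_t size;	/* ulelong at offset 4 */
	uint32_t flags;	/* ulelong at offset 8 */
};

/* Decode 4 bytes as an unsigned little-endian 32-bit value. */
uint32_t cramfs_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Return 0 and fill *info if buf begins with the cramfs magic;
 * return -1 if the buffer is too short or the magic does not match.
 */
int cramfs_probe(const unsigned char *buf, size_t len,
		 struct cramfs_header_info *info)
{
	if (len < 12 || cramfs_le32(buf) != CRAMFS_MAGIC)
		return -1;
	info->size = cramfs_le32(buf + 4);
	info->flags = cramfs_le32(buf + 8);
	return 0;
}
```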
|
||||
|
||||
|
||||
Hacker Notes
|
@ -1,4 +1,11 @@
|
||||
Copyright 2009 Jonathan Corbet <corbet@lwn.net>
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
=======
|
||||
DebugFS
|
||||
=======
|
||||
|
||||
Copyright |copy| 2009 Jonathan Corbet <corbet@lwn.net>
|
||||
|
||||
Debugfs exists as a simple way for kernel developers to make information
|
||||
available to user space. Unlike /proc, which is only meant for information
|
||||
@ -6,11 +13,11 @@ about a process, or sysfs, which has strict one-value-per-file rules,
|
||||
debugfs has no rules at all. Developers can put any information they want
|
||||
there. The debugfs filesystem is also intended to not serve as a stable
|
||||
ABI to user space; in theory, there are no stability constraints placed on
|
||||
files exported there. The real world is not always so simple, though [1];
|
||||
files exported there. The real world is not always so simple, though [1]_;
|
||||
even debugfs interfaces are best designed with the idea that they will need
|
||||
to be maintained forever.
|
||||
|
||||
Debugfs is typically mounted with a command like:
|
||||
Debugfs is typically mounted with a command like::
|
||||
|
||||
mount -t debugfs none /sys/kernel/debug
|
||||
|
||||
@ -23,7 +30,7 @@ Note that the debugfs API is exported GPL-only to modules.
|
||||
|
||||
Code using debugfs should include <linux/debugfs.h>. Then, the first order
|
||||
of business will be to create at least one directory to hold a set of
|
||||
debugfs files:
|
||||
debugfs files::
|
||||
|
||||
struct dentry *debugfs_create_dir(const char *name, struct dentry *parent);
|
||||
|
||||
@ -36,7 +43,7 @@ something went wrong. If ERR_PTR(-ENODEV) is returned, that is an
|
||||
indication that the kernel has been built without debugfs support and none
|
||||
of the functions described below will work.
|
||||
|
||||
The most general way to create a file within a debugfs directory is with:
|
||||
The most general way to create a file within a debugfs directory is with::
|
||||
|
||||
struct dentry *debugfs_create_file(const char *name, umode_t mode,
|
||||
struct dentry *parent, void *data,
|
||||
@ -53,7 +60,7 @@ ERR_PTR(-ERROR) on error, or ERR_PTR(-ENODEV) if debugfs support is
|
||||
missing.
|
||||
|
||||
To create a file with an initial size, the following function can be used
|
||||
instead:
|
||||
instead::
|
||||
|
||||
struct dentry *debugfs_create_file_size(const char *name, umode_t mode,
|
||||
struct dentry *parent, void *data,
|
||||
@ -66,7 +73,7 @@ as the function debugfs_create_file.
|
||||
In a number of cases, the creation of a set of file operations is not
|
||||
actually necessary; the debugfs code provides a number of helper functions
|
||||
for simple situations. Files containing a single integer value can be
|
||||
created with any of:
|
||||
created with any of::
|
||||
|
||||
void debugfs_create_u8(const char *name, umode_t mode,
|
||||
struct dentry *parent, u8 *value);
|
||||
@ -80,7 +87,7 @@ created with any of:
|
||||
These files support both reading and writing the given value; if a specific
|
||||
file should not be written to, simply set the mode bits accordingly. The
|
||||
values in these files are in decimal; if hexadecimal is more appropriate,
|
||||
the following functions can be used instead:
|
||||
the following functions can be used instead::
|
||||
|
||||
void debugfs_create_x8(const char *name, umode_t mode,
|
||||
struct dentry *parent, u8 *value);
|
||||
@ -94,7 +101,7 @@ the following functions can be used instead:
|
||||
These functions are useful as long as the developer knows the size of the
|
||||
value to be exported. Some types can have different widths on different
|
||||
architectures, though, complicating the situation somewhat. There are
|
||||
functions meant to help out in such special cases:
|
||||
functions meant to help out in such special cases::
|
||||
|
||||
void debugfs_create_size_t(const char *name, umode_t mode,
|
||||
struct dentry *parent, size_t *value);
|
||||
@ -103,7 +110,7 @@ As might be expected, this function will create a debugfs file to represent
|
||||
a variable of type size_t.
|
||||
|
||||
Similarly, there are helpers for variables of type unsigned long, in decimal
|
||||
and hexadecimal:
|
||||
and hexadecimal::
|
||||
|
||||
struct dentry *debugfs_create_ulong(const char *name, umode_t mode,
|
||||
struct dentry *parent,
|
||||
@ -111,7 +118,7 @@ and hexadecimal:
|
||||
void debugfs_create_xul(const char *name, umode_t mode,
|
||||
struct dentry *parent, unsigned long *value);
|
||||
|
||||
Boolean values can be placed in debugfs with:
|
||||
Boolean values can be placed in debugfs with::
|
||||
|
||||
struct dentry *debugfs_create_bool(const char *name, umode_t mode,
|
||||
struct dentry *parent, bool *value);
|
||||
@ -120,7 +127,7 @@ A read on the resulting file will yield either Y (for non-zero values) or
|
||||
N, followed by a newline. If written to, it will accept either upper- or
|
||||
lower-case values, or 1 or 0. Any other input will be silently ignored.
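
Seen from user space, the text format described above is easy to handle.
The sketch below is illustrative only: ``debugfs_bool_parse`` and
``debugfs_bool_format`` are hypothetical user-space helpers, not part of
the debugfs API, and they rely solely on the Y/N read format and the
1/0 write format stated in the paragraph above.

```c
#include <stddef.h>

/*
 * Interpret the bytes read from a debugfs boolean file.
 * Returns 1 for "Y", 0 for "N", and -1 for anything else
 * (including an empty read).
 */
int debugfs_bool_parse(const char *buf, size_t len)
{
	if (len < 1)
		return -1;
	if (buf[0] == 'Y')
		return 1;
	if (buf[0] == 'N')
		return 0;
	return -1;
}

/* Return the string a tool could write to set the value. */
const char *debugfs_bool_format(int value)
{
	return value ? "1\n" : "0\n";
}
```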
|
||||
|
||||
Also, atomic_t values can be placed in debugfs with:
|
||||
Also, atomic_t values can be placed in debugfs with::
|
||||
|
||||
void debugfs_create_atomic_t(const char *name, umode_t mode,
|
||||
struct dentry *parent, atomic_t *value)
|
||||
@ -129,7 +136,7 @@ A read of this file will get atomic_t values, and a write of this file
|
||||
will set atomic_t values.
|
||||
|
||||
Another option is exporting a block of arbitrary binary data, with
|
||||
this structure and function:
|
||||
this structure and function::
|
||||
|
||||
struct debugfs_blob_wrapper {
|
||||
void *data;
|
||||
@ -151,7 +158,7 @@ If you want to dump a block of registers (something that happens quite
|
||||
often during development, even if little such code reaches mainline),
|
||||
debugfs offers two functions: one to make a registers-only file, and
|
||||
another to insert a register block in the middle of another sequential
|
||||
file.
|
||||
file::
|
||||
|
||||
struct debugfs_reg32 {
|
||||
char *name;
|
||||
@ -175,7 +182,7 @@ The "base" argument may be 0, but you may want to build the reg32 array
|
||||
using __stringify, and a number of register names (macros) are actually
|
||||
byte offsets over a base for the register block.
|
||||
|
||||
If you want to dump a u32 array in debugfs, you can create a file with:
|
||||
If you want to dump a u32 array in debugfs, you can create a file with::
|
||||
|
||||
void debugfs_create_u32_array(const char *name, umode_t mode,
|
||||
struct dentry *parent,
|
||||
@ -185,7 +192,7 @@ The "array" argument provides data, and the "elements" argument is
|
||||
the number of elements in the array. Note: Once the array is created, its
|
||||
size cannot be changed.
|
||||
|
||||
There is a helper function to create device related seq_file:
|
||||
There is a helper function to create device related seq_file::
|
||||
|
||||
struct dentry *debugfs_create_devm_seqfile(struct device *dev,
|
||||
const char *name,
|
||||
@ -197,7 +204,7 @@ The "dev" argument is the device related to this debugfs file, and
|
||||
the "read_fn" is a function pointer which is called to print the
|
||||
seq_file content.
|
||||
|
||||
There are a couple of other directory-oriented helper functions:
|
||||
There are a couple of other directory-oriented helper functions::
|
||||
|
||||
struct dentry *debugfs_rename(struct dentry *old_dir,
|
||||
struct dentry *old_dentry,
|
||||
@ -219,7 +226,7 @@ module is unloaded without explicitly removing debugfs entries, the result
|
||||
will be a lot of stale pointers and no end of highly antisocial behavior.
|
||||
So all debugfs users - at least those which can be built as modules - must
|
||||
be prepared to remove all files and directories they create there. A file
|
||||
can be removed with:
|
||||
can be removed with::
|
||||
|
||||
void debugfs_remove(struct dentry *dentry);
|
||||
|
||||
@ -229,7 +236,7 @@ be removed.
|
||||
Once upon a time, debugfs users were required to remember the dentry
|
||||
pointer for every debugfs file they created so that all files could be
|
||||
cleaned up. We live in more civilized times now, though, and debugfs users
|
||||
can call:
|
||||
can call::
|
||||
|
||||
void debugfs_remove_recursive(struct dentry *dentry);
|
||||
|
||||
@ -237,5 +244,4 @@ If this function is passed a pointer for the dentry corresponding to the
|
||||
top-level directory, the entire hierarchy below that directory will be
|
||||
removed.
|
||||
|
||||
Notes:
|
||||
[1] http://lwn.net/Articles/309298/
|
||||
.. [1] http://lwn.net/Articles/309298/
|
@ -1,20 +1,25 @@
|
||||
dlmfs
|
||||
==================
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
=====
|
||||
DLMFS
|
||||
=====
|
||||
|
||||
A minimal DLM userspace interface implemented via a virtual file
|
||||
system.
|
||||
|
||||
dlmfs is built with OCFS2 as it requires most of its infrastructure.
|
||||
|
||||
Project web page: http://ocfs2.wiki.kernel.org
|
||||
Tools web page: https://github.com/markfasheh/ocfs2-tools
|
||||
OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
|
||||
:Project web page: http://ocfs2.wiki.kernel.org
|
||||
:Tools web page: https://github.com/markfasheh/ocfs2-tools
|
||||
:OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
|
||||
|
||||
All code copyright 2005 Oracle except when otherwise noted.
|
||||
|
||||
CREDITS
|
||||
Credits
|
||||
=======
|
||||
|
||||
Some code taken from ramfs which is Copyright (C) 2000 Linus Torvalds
|
||||
Some code taken from ramfs which is Copyright |copy| 2000 Linus Torvalds
|
||||
and Transmeta Corp.
|
||||
|
||||
Mark Fasheh <mark.fasheh@oracle.com>
|
||||
@ -96,14 +101,19 @@ operation. If the lock succeeds, you'll get an fd.
|
||||
open(2) with O_CREAT to ensure the resource inode is created - dlmfs does
|
||||
not automatically create inodes for existing lock resources.
|
||||
|
||||
============ ===========================
|
||||
Open Flag Lock Request Type
|
||||
--------- -----------------
|
||||
============ ===========================
|
||||
O_RDONLY Shared Read
|
||||
O_RDWR Exclusive
|
||||
============ ===========================
|
||||
|
||||
|
||||
============ ===========================
|
||||
Open Flag Resulting Locking Behavior
|
||||
--------- --------------------------
|
||||
============ ===========================
|
||||
O_NONBLOCK Trylock operation
|
||||
============ ===========================
|
||||
|
||||
You must provide exactly one of O_RDONLY or O_RDWR.
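
The two tables above can be restated as a small user-space helper. This
is a sketch for illustration only: ``dlmfs_lock_request_type`` and
``dlmfs_is_trylock`` are hypothetical names, not functions provided by
dlmfs, and the returned strings simply repeat the table entries.

```c
#include <fcntl.h>
#include <stddef.h>

/*
 * Map the access mode in open(2) flags to the lock request type
 * listed in the first table. Returns NULL for O_WRONLY, since
 * exactly one of O_RDONLY or O_RDWR must be given.
 */
const char *dlmfs_lock_request_type(int open_flags)
{
	switch (open_flags & O_ACCMODE) {
	case O_RDONLY:
		return "Shared Read";
	case O_RDWR:
		return "Exclusive";
	default:
		return NULL;
	}
}

/* Per the second table, O_NONBLOCK turns the request into a trylock. */
int dlmfs_is_trylock(int open_flags)
{
	return (open_flags & O_NONBLOCK) != 0;
}
```

For example, ``O_CREAT | O_RDWR | O_NONBLOCK`` would describe an
exclusive trylock on a resource whose inode is created if needed.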
|
||||
|
@ -1,14 +1,18 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
======================================================
|
||||
eCryptfs: A stacked cryptographic filesystem for Linux
|
||||
======================================================
|
||||
|
||||
eCryptfs is free software. Please see the file COPYING for details.
|
||||
For documentation, please see the files in the doc/ subdirectory. For
|
||||
building and installation instructions please see the INSTALL file.
|
||||
|
||||
Maintainer: Phillip Hellewell
|
||||
Lead developer: Michael A. Halcrow <mhalcrow@us.ibm.com>
|
||||
Developers: Michael C. Thompson
|
||||
:Maintainer: Phillip Hellewell
|
||||
:Lead developer: Michael A. Halcrow <mhalcrow@us.ibm.com>
|
||||
:Developers: Michael C. Thompson
|
||||
Kent Yoder
|
||||
Web Site: http://ecryptfs.sf.net
|
||||
:Web Site: http://ecryptfs.sf.net
|
||||
|
||||
This software is currently undergoing development. Make sure to
|
||||
maintain a backup copy of any data you write into eCryptfs.
|
||||
@ -19,34 +23,36 @@ SourceForge site:
|
||||
http://sourceforge.net/projects/ecryptfs/
|
||||
|
||||
Userspace requirements include:
|
||||
- David Howells' userspace keyring headers and libraries (version
|
||||
|
||||
- David Howells' userspace keyring headers and libraries (version
|
||||
1.0 or higher), obtainable from
|
||||
http://people.redhat.com/~dhowells/keyutils/
|
||||
- Libgcrypt
|
||||
- Libgcrypt
|
||||
|
||||
|
||||
NOTES
|
||||
.. note::
|
||||
|
||||
In the beta/experimental releases of eCryptfs, when you upgrade
|
||||
eCryptfs, you should copy the files to an unencrypted location and
|
||||
then copy the files back into the new eCryptfs mount to migrate the
|
||||
files.
|
||||
In the beta/experimental releases of eCryptfs, when you upgrade
|
||||
eCryptfs, you should copy the files to an unencrypted location and
|
||||
then copy the files back into the new eCryptfs mount to migrate the
|
||||
files.
|
||||
|
||||
|
||||
MOUNT-WIDE PASSPHRASE
|
||||
Mount-wide Passphrase
|
||||
=====================
|
||||
|
||||
Create a new directory into which eCryptfs will write its encrypted
|
||||
files (e.g., /root/crypt). Then, create the mount point directory
|
||||
(e.g., /mnt/crypt). Now it's time to mount eCryptfs:
|
||||
(e.g., /mnt/crypt). Now it's time to mount eCryptfs::
|
||||
|
||||
mount -t ecryptfs /root/crypt /mnt/crypt
|
||||
mount -t ecryptfs /root/crypt /mnt/crypt
|
||||
|
||||
You should be prompted for a passphrase and a salt (the salt may be
|
||||
blank).
|
||||
|
||||
Try writing a new file:
|
||||
Try writing a new file::
|
||||
|
||||
echo "Hello, World" > /mnt/crypt/hello.txt
|
||||
echo "Hello, World" > /mnt/crypt/hello.txt
|
||||
|
||||
The operation will complete. Notice that there is a new file in
|
||||
/root/crypt that is at least 12288 bytes in size (depending on your
|
||||
@ -59,10 +65,13 @@ keyctl clear @u
|
||||
Then umount /mnt/crypt and mount again per the instructions given
|
||||
above.
|
||||
|
||||
cat /mnt/crypt/hello.txt
|
||||
::
|
||||
|
||||
cat /mnt/crypt/hello.txt
|
||||
|
||||
|
||||
NOTES
|
||||
Notes
|
||||
=====
|
||||
|
||||
eCryptfs version 0.1 should only be mounted on (1) empty directories
|
||||
or (2) directories containing files only created by eCryptfs. If you
|
@ -1,5 +1,8 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=======================================
|
||||
efivarfs - a (U)EFI variable filesystem
|
||||
=======================================
|
||||
|
||||
The efivarfs filesystem was created to address the shortcomings of
|
||||
using entries in sysfs to maintain EFI variables. The old sysfs EFI
|
||||
@ -11,7 +14,7 @@ than a single page, sysfs isn't the best interface for this.
|
||||
Variables can be created, deleted and modified with the efivarfs
|
||||
filesystem.
|
||||
|
||||
efivarfs is typically mounted like this,
|
||||
efivarfs is typically mounted like this::
|
||||
|
||||
mount -t efivarfs none /sys/firmware/efi/efivars
|
||||
|
@ -1,3 +1,9 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
======================================
|
||||
Enhanced Read-Only File System - EROFS
|
||||
======================================
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
@ -6,6 +12,7 @@ from other read-only file systems, it aims to be designed for flexibility,
|
||||
scalability, but be kept simple and high performance.
|
||||
|
||||
It is designed as a better filesystem solution for the following scenarios:
|
||||
|
||||
- read-only storage media or
|
||||
|
||||
- part of a fully trusted read-only solution, which means it needs to be
|
||||
@ -17,6 +24,7 @@ It is designed as a better filesystem solution for the following scenarios:
|
||||
for those embedded devices with limited memory (ex, smartphone);
|
||||
|
||||
Here are the main features of EROFS:
|
||||
|
||||
- Little endian on-disk design;
|
||||
|
||||
- Currently 4KB block size (nobh) and therefore maximum 16TB address space;
|
||||
@ -24,13 +32,17 @@ Here is the main features of EROFS:
|
||||
- Metadata & data could be mixed by design;
|
||||
|
||||
- 2 inode versions for different requirements:
|
||||
|
||||
===================== ============ =====================================
|
||||
compact (v1) extended (v2)
|
||||
Inode metadata size: 32 bytes 64 bytes
|
||||
Max file size: 4 GB 16 EB (also limited by max. vol size)
|
||||
Max uids/gids: 65536 4294967296
|
||||
File change time: no yes (64 + 32-bit timestamp)
|
||||
Max hardlinks: 65536 4294967296
|
||||
Metadata reserved: 4 bytes 14 bytes
|
||||
===================== ============ =====================================
|
||||
Inode metadata size 32 bytes 64 bytes
|
||||
Max file size 4 GB 16 EB (also limited by max. vol size)
|
||||
Max uids/gids 65536 4294967296
|
||||
File change time no yes (64 + 32-bit timestamp)
|
||||
Max hardlinks 65536 4294967296
|
||||
Metadata reserved 4 bytes 14 bytes
|
||||
===================== ============ =====================================
|
||||
|
||||
- Support extended attributes (xattrs) as an option;
|
||||
|
||||
@ -43,29 +55,36 @@ Here is the main features of EROFS:
|
||||
|
||||
The following git tree provides the file system user-space tools under
|
||||
development (ex, formatting tool mkfs.erofs):
|
||||
>> git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git
|
||||
|
||||
- git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git
|
||||
|
||||
Bugs and patches are welcome, please kindly help us and send to the following
|
||||
linux-erofs mailing list:
|
||||
>> linux-erofs mailing list <linux-erofs@lists.ozlabs.org>
|
||||
|
||||
- linux-erofs mailing list <linux-erofs@lists.ozlabs.org>
|
||||
|
||||
Mount options
|
||||
=============
|
||||
|
||||
=================== =========================================================
|
||||
(no)user_xattr Setup Extended User Attributes. Note: xattr is enabled
|
||||
by default if CONFIG_EROFS_FS_XATTR is selected.
|
||||
(no)acl Setup POSIX Access Control List. Note: acl is enabled
|
||||
by default if CONFIG_EROFS_FS_POSIX_ACL is selected.
|
||||
cache_strategy=%s Select a strategy for cached decompression from now on:
|
||||
disabled: In-place I/O decompression only;
|
||||
readahead: Cache the last incomplete compressed physical
|
||||
|
||||
========== =============================================
|
||||
disabled In-place I/O decompression only;
|
||||
readahead Cache the last incomplete compressed physical
|
||||
cluster for further reading. It still does
|
||||
in-place I/O decompression for the rest
|
||||
compressed physical clusters;
|
||||
readaround: Cache both ends of incomplete compressed
|
||||
readaround Cache both ends of incomplete compressed
|
||||
physical clusters for further reading.
|
||||
It still does in-place I/O decompression
|
||||
for the rest compressed physical clusters.
|
||||
========== =============================================
|
||||
=================== =========================================================
|
||||
|
||||
On-disk details
|
||||
===============
|
||||
@ -73,7 +92,7 @@ On-disk details
|
||||
Summary
|
||||
-------
|
||||
Different from other read-only file systems, an EROFS volume is designed
|
||||
to be as simple as possible:
|
||||
to be as simple as possible::
|
||||
|
||||
|-> aligned with the block size
|
||||
____________________________________________________________
|
||||
@ -83,13 +102,17 @@ to be as simple as possible:
|
||||
|
||||
All data areas should be aligned with the block size, but metadata areas
|
||||
may not. All metadata can now be observed in two different spaces (views):
|
||||
|
||||
1. Inode metadata space
|
||||
|
||||
Each valid inode should be aligned with an inode slot, which is a fixed
|
||||
value (32 bytes) and designed to be kept in line with compact inode size.
|
||||
|
||||
Each inode can be directly found with the following formula:
|
||||
inode offset = meta_blkaddr * block_size + 32 * nid
|
||||
|
||||
::
|
||||
|
||||
|-> aligned with 8B
|
||||
|-> followed closely
|
||||
+ meta_blkaddr blocks |-> another slot
|
||||
@ -117,7 +140,7 @@ may not. All metadatas can be now observed in two different spaces (views):
|
||||
|-> aligned with 4B
|
||||
|
||||
Inode could be 32 or 64 bytes, which can be distinguished from a common
|
||||
field which all inode versions have -- i_format:
|
||||
field which all inode versions have -- i_format::
|
||||
|
||||
__________________ __________________
|
||||
| i_format | | i_format |
|
||||
@ -132,16 +155,19 @@ may not. All metadatas can be now observed in two different spaces (views):
|
||||
proper alignment, and they could be optional for different data mappings.
|
||||
*Currently*, a total of 4 valid data mappings are supported:
|
||||
|
||||
== ====================================================================
|
||||
0 flat file data without data inline (no extent);
|
||||
1 fixed-sized output data compression (with non-compacted indexes);
|
||||
2 flat file data with tail packing data inline (no extent);
|
||||
3 fixed-sized output data compression (with compacted indexes, v5.3+).
|
||||
== ====================================================================
|
||||
|
||||
The size of the optional xattrs is indicated by i_xattr_count in inode
|
||||
header. Large xattrs or xattrs shared by many different files can be
|
||||
stored in shared xattrs metadata rather than inlined right after inode.
|
||||
|
||||
2. Shared xattrs metadata space
|
||||
|
||||
Shared xattrs space is similar to the above inode space, started with
|
||||
a specific block indicated by xattr_blkaddr, organized one by one with
|
||||
proper alignment.
|
||||
@ -149,6 +175,8 @@ may not. All metadatas can be now observed in two different spaces (views):
|
||||
Each shared xattr can also be directly found by the following formula:
|
||||
xattr offset = xattr_blkaddr * block_size + 4 * xattr_id
|
||||
|
||||
::
|
||||
|
||||
|-> aligned by 4 bytes
|
||||
+ xattr_blkaddr blocks |-> aligned with 4 bytes
|
||||
_________________________________________________________________________
|
||||
@ -163,13 +191,15 @@ random file lookup, and all directory entries are _strictly_ recorded in
|
||||
alphabetical order in order to support improved prefix binary search
|
||||
algorithm (could refer to the related source code).
|
||||
|
||||
::
|
||||
|
||||
___________________________
|
||||
/ |
|
||||
/ ______________|________________
|
||||
/ / | nameoff1 | nameoffN-1
|
||||
____________.______________._______________v________________v__________
|
||||
| dirent | dirent | ... | dirent | filename | filename | ... | filename |
|
||||
|___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
|
||||
| dirent | dirent | ... | dirent | filename | filename | ... | filename |
|
||||
|___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
|
||||
\ ^
|
||||
\ | * could have
|
||||
\ | trailing '\0'
|
||||
@ -184,14 +214,14 @@ introduce another on-disk field at all.
|
||||
Compression
|
||||
-----------
|
||||
Currently, EROFS supports 4KB fixed-sized output transparent file compression,
|
||||
as illustrated below:
|
||||
as illustrated below::
|
||||
|
||||
|---- Variant-Length Extent ----|-------- VLE --------|----- VLE -----
|
||||
clusterofs clusterofs clusterofs
|
||||
| | | logical data
|
||||
_________v_______________________________v_____________________v_______________
|
||||
... | . | | . | | . | ...
|
||||
____|____.________|_____________|________.____|_____________|__.__________|____
|
||||
_________v_______________________________v_____________________v_______________
|
||||
... | . | | . | | . | ...
|
||||
____|____.________|_____________|________.____|_____________|__.__________|____
|
||||
|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|
|
||||
size size size size size
|
||||
. . . .
|
||||
@ -208,4 +238,3 @@ at most. For each logical cluster, there is a corresponding on-disk index to
|
||||
describe its cluster type, physical cluster address, etc.
|
||||
|
||||
See "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details.
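
The two offset formulas given earlier for the inode metadata space and
the shared xattrs metadata space can be written down directly. This is
a minimal sketch rather than code from erofs_fs.h: the function names
and parameter types are assumptions chosen for illustration, while the
constants 32 and 4 come from the formulas in the text.

```c
#include <stdint.h>

/* inode offset = meta_blkaddr * block_size + 32 * nid */
uint64_t erofs_inode_offset(uint32_t meta_blkaddr, uint32_t block_size,
			    uint64_t nid)
{
	return (uint64_t)meta_blkaddr * block_size + 32 * nid;
}

/* xattr offset = xattr_blkaddr * block_size + 4 * xattr_id */
uint64_t erofs_xattr_offset(uint32_t xattr_blkaddr, uint32_t block_size,
			    uint32_t xattr_id)
{
	return (uint64_t)xattr_blkaddr * block_size + (uint64_t)4 * xattr_id;
}
```

With a 4KB block size, an inode with nid 3 in a metadata area starting
at block 1 would sit at byte offset 4096 + 96.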
|
||||
|
@ -1,3 +1,5 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
|
||||
The Second Extended Filesystem
|
||||
==============================
|
||||
@ -14,8 +16,9 @@ Options
|
||||
Most defaults are determined by the filesystem superblock, and can be
|
||||
set using tune2fs(8). Kernel-determined defaults are indicated by (*).
|
||||
|
||||
bsddf (*) Makes `df' act like BSD.
|
||||
minixdf Makes `df' act like Minix.
|
||||
==================== === ================================================
|
||||
bsddf (*) Makes ``df`` act like BSD.
|
||||
minixdf Makes ``df`` act like Minix.
|
||||
|
||||
check=none, nocheck (*) Don't do extra checking of bitmaps on mount
|
||||
(check=normal and check=strict options removed)
|
||||
@ -62,6 +65,7 @@ quota, usrquota Enable user disk quota support
|
||||
|
||||
grpquota Enable group disk quota support
|
||||
(requires CONFIG_QUOTA).
|
||||
==================== === ================================================
|
||||
|
||||
noquota option is silently ignored by ext2.
|
||||
|
||||
@ -294,9 +298,9 @@ respective fsck programs.
|
||||
If you're exceptionally paranoid, there are 3 ways of making metadata
|
||||
writes synchronous on ext2:
|
||||
|
||||
per-file if you have the program source: use the O_SYNC flag to open()
|
||||
per-file if you don't have the source: use "chattr +S" on the file
|
||||
per-filesystem: add the "sync" option to mount (or in /etc/fstab)
|
||||
- per-file if you have the program source: use the O_SYNC flag to open()
|
||||
- per-file if you don't have the source: use "chattr +S" on the file
|
||||
- per-filesystem: add the "sync" option to mount (or in /etc/fstab)
|
||||
|
||||
The first and last are not ext2-specific but do force the metadata to
|
||||
be written synchronously. See also Journaling below.
|
||||
@ -316,10 +320,12 @@ Most of these limits could be overcome with slight changes in the on-disk
|
||||
format and using a compatibility flag to signal the format change (at
|
||||
the expense of some compatibility).
|
||||
|
||||
Filesystem block size: 1kB 2kB 4kB 8kB
|
||||
|
||||
File size limit: 16GB 256GB 2048GB 2048GB
|
||||
Filesystem size limit: 2047GB 8192GB 16384GB 32768GB
|
||||
===================== ======= ======= ======= ========
|
||||
Filesystem block size 1kB 2kB 4kB 8kB
|
||||
===================== ======= ======= ======= ========
|
||||
File size limit 16GB 256GB 2048GB 2048GB
|
||||
Filesystem size limit 2047GB 8192GB 16384GB 32768GB
|
||||
===================== ======= ======= ======= ========
|
||||
|
||||
There is a 2.4 kernel limit of 2048GB for a single block device, so no
|
||||
filesystem larger than that can be created at this time. There is also
|
||||
@ -370,19 +376,24 @@ ext4 and journaling.
|
||||
References
|
||||
==========
|
||||
|
||||
======================= ===============================================
|
||||
The kernel source file:/usr/src/linux/fs/ext2/
|
||||
e2fsprogs (e2fsck) http://e2fsprogs.sourceforge.net/
|
||||
Design & Implementation http://e2fsprogs.sourceforge.net/ext2intro.html
|
||||
Journaling (ext3) ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/
|
||||
Filesystem Resizing http://ext2resize.sourceforge.net/
|
||||
Compression (*) http://e2compr.sourceforge.net/
|
||||
Compression [1]_ http://e2compr.sourceforge.net/
|
||||
======================= ===============================================
|
||||
|
||||
Implementations for:
|
||||
Windows 95/98/NT/2000 http://www.chrysocome.net/explore2fs
|
||||
Windows 95 (*) http://www.yipton.net/content.html#FSDEXT2
|
||||
DOS client (*) ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
|
||||
OS/2 (+) ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
|
||||
RISC OS client http://www.esw-heim.tu-clausthal.de/~marco/smorbrod/IscaFS/
|
||||
|
||||
(*) no longer actively developed/supported (as of Apr 2001)
|
||||
(+) no longer actively developed/supported (as of Mar 2009)
|
||||
======================= ===========================================================
|
||||
Windows 95/98/NT/2000 http://www.chrysocome.net/explore2fs
|
||||
Windows 95 [1]_ http://www.yipton.net/content.html#FSDEXT2
|
||||
DOS client [1]_ ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
|
||||
OS/2 [2]_ ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
|
||||
RISC OS client http://www.esw-heim.tu-clausthal.de/~marco/smorbrod/IscaFS/
|
||||
======================= ===========================================================
|
||||
|
||||
.. [1] no longer actively developed/supported (as of Apr 2001)
|
||||
.. [2] no longer actively developed/supported (as of Mar 2009)
|
@ -1,4 +1,6 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===============
|
||||
Ext3 Filesystem
|
||||
===============
|
||||
|
@ -1,6 +1,8 @@
|
||||
================================================================================
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==========================================
|
||||
WHAT IS Flash-Friendly File System (F2FS)?
|
||||
================================================================================
|
||||
==========================================
|
||||
|
||||
NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
|
||||
been equipped on a variety of systems ranging from mobile to server systems. Since
|
||||
@ -20,14 +22,15 @@ layout, but also for selecting allocation and cleaning algorithms.
|
||||
|
||||
The following git tree provides the file system formatting tool (mkfs.f2fs),
|
||||
a consistency checking tool (fsck.f2fs), and a debugging tool (dump.f2fs).
|
||||
>> git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git
|
||||
|
||||
- git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git
|
||||
|
||||
For reporting bugs and sending patches, please use the following mailing list:
|
||||
>> linux-f2fs-devel@lists.sourceforge.net
|
||||
|
||||
================================================================================
|
||||
BACKGROUND AND DESIGN ISSUES
|
||||
================================================================================
|
||||
- linux-f2fs-devel@lists.sourceforge.net
|
||||
|
||||
Background and Design issues
|
||||
============================
|
||||
|
||||
Log-structured File System (LFS)
|
||||
--------------------------------
|
||||
@ -61,6 +64,7 @@ needs to reclaim these obsolete blocks seamlessly to users. This job is called
|
||||
as a cleaning process.
|
||||
|
||||
The process consists of three operations as follows.
|
||||
|
||||
1. A victim segment is selected by referencing the segment usage table.
|
||||
2. It loads parent index structures of all the data in the victim identified by
|
||||
segment summary blocks.
|
||||
@ -71,9 +75,8 @@ This cleaning job may cause unexpected long delays, so the most important goal
|
||||
is to hide the latencies from users. It should also reduce the
|
||||
amount of valid data to be moved, and move them quickly as well.
|
||||
|
||||
================================================================================
|
||||
KEY FEATURES
|
||||
================================================================================
|
||||
Key Features
|
||||
============
|
||||
|
||||
Flash Awareness
|
||||
---------------
|
||||
@ -94,10 +97,11 @@ Cleaning Overhead
|
||||
- Support multi-head logs for static/dynamic hot and cold data separation
|
||||
- Introduce adaptive logging for efficient block allocation
|
||||
|
||||
================================================================================
|
||||
MOUNT OPTIONS
|
||||
================================================================================
|
||||
Mount Options
|
||||
=============
|
||||
|
||||
|
||||
====================== ============================================================
|
||||
background_gc=%s Turn on/off cleaning operations, namely garbage
|
||||
collection, triggered in background when I/O subsystem is
|
||||
idle. If background_gc=on, it will turn on the garbage
|
||||
@ -167,7 +171,10 @@ fault_injection=%d Enable fault injection in all supported types with
|
||||
fault_type=%d Support configuring fault injection type, should be
|
||||
enabled with fault_injection option, fault type value
|
||||
is shown below, it supports single or combined type.
|
||||
|
||||
=================== ===========
|
||||
Type_Name Type_Value
|
||||
=================== ===========
|
||||
FAULT_KMALLOC 0x000000001
|
||||
FAULT_KVMALLOC 0x000000002
|
||||
FAULT_PAGE_ALLOC 0x000000004
|
||||
@ -183,6 +190,7 @@ fault_type=%d Support configuring fault injection type, should be
|
||||
FAULT_CHECKPOINT 0x000001000
|
||||
FAULT_DISCARD 0x000002000
|
||||
FAULT_WRITE_IO 0x000004000
|
||||
=================== ===========
|
||||
mode=%s Control block allocation mode which supports "adaptive"
|
||||
and "lfs". In "lfs" mode, there should be no random
|
||||
writes towards main area.
|
||||
@ -246,22 +254,22 @@ compress_extension=%s Support adding specified extension, so that f2fs can enab
|
||||
on compression extension list and enable compression on
|
||||
these file by default rather than to enable it via ioctl.
|
||||
For other files, we can still enable compression via ioctl.
|
||||
====================== ============================================================
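The fault_type row in the table above says the value "supports single or combined type". A minimal sketch of how such a combined value could be built is shown here, assuming the listed type values are bit flags that are OR-ed together. Only the types visible in the table excerpt are included, and the helper names are hypothetical.

```python
# Fault type values as listed in the fault_type table (subset shown there).
FAULT_KMALLOC = 0x000000001
FAULT_KVMALLOC = 0x000000002
FAULT_PAGE_ALLOC = 0x000000004
FAULT_CHECKPOINT = 0x000001000
FAULT_DISCARD = 0x000002000
FAULT_WRITE_IO = 0x000004000


def combine_fault_types(*types):
    """OR the given fault type flags into one combined value (assumed semantics)."""
    mask = 0
    for fault_type in types:
        mask |= fault_type
    return mask


def fault_type_option(mask):
    """Format the mask as a mount option string; the option takes %d, so use decimal."""
    return f"fault_type={mask}"


if __name__ == "__main__":
    mask = combine_fault_types(FAULT_KMALLOC, FAULT_PAGE_ALLOC)
    print(fault_type_option(mask))
```

The resulting string would be passed together with the fault_injection option, which the table says must be enabled as well.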
|
||||
|
||||
================================================================================
|
||||
DEBUGFS ENTRIES
|
||||
================================================================================
|
||||
Debugfs Entries
|
||||
===============
|
||||
|
||||
/sys/kernel/debug/f2fs/ contains information about all the partitions mounted as
|
||||
f2fs. Each file shows the whole f2fs information.
|
||||
|
||||
/sys/kernel/debug/f2fs/status includes:
|
||||
|
||||
- major file system information managed by f2fs currently
|
||||
- average SIT information about whole segments
|
||||
- current memory footprint consumed by f2fs.
|
||||
|
||||
================================================================================
|
||||
SYSFS ENTRIES
|
||||
================================================================================
|
||||
Sysfs Entries
|
||||
=============
|
||||
|
||||
Information about mounted f2fs file systems can be found in
|
||||
/sys/fs/f2fs. Each mounted filesystem will have a directory in
|
||||
@ -271,20 +279,22 @@ The files in each per-device directory are shown in table below.
|
||||
Files in /sys/fs/f2fs/<devname>
|
||||
(see also Documentation/ABI/testing/sysfs-fs-f2fs)
|
||||
|
||||
================================================================================
|
||||
USAGE
|
||||
================================================================================
|
||||
Usage
|
||||
=====
|
||||
|
||||
1. Download userland tools and compile them.
|
||||
|
||||
2. Skip, if f2fs was compiled statically inside kernel.
|
||||
Otherwise, insert the f2fs.ko module.
|
||||
Otherwise, insert the f2fs.ko module::
|
||||
|
||||
# insmod f2fs.ko
|
||||
|
||||
3. Create a directory trying to mount
|
||||
3. Create a directory trying to mount::
|
||||
|
||||
# mkdir /mnt/f2fs
|
||||
|
||||
4. Format the block device, and then mount as f2fs
|
||||
4. Format the block device, and then mount as f2fs::
|
||||
|
||||
# mkfs.f2fs -l label /dev/block_device
|
||||
# mount -t f2fs /dev/block_device /mnt/f2fs
|
||||
|
||||
@ -294,18 +304,26 @@ The mkfs.f2fs is for the use of formatting a partition as the f2fs filesystem,
|
||||
which builds a basic on-disk layout.
|
||||
|
||||
The options consist of:
|
||||
-l [label] : Give a volume label, up to 512 unicode name.
|
||||
-a [0 or 1] : Split start location of each area for heap-based allocation.
|
||||
|
||||
=============== ===========================================================
|
||||
``-l [label]`` Give a volume label, up to 512 unicode name.
|
||||
``-a [0 or 1]`` Split start location of each area for heap-based allocation.
|
||||
|
||||
1 is set by default, which performs this.
|
||||
-o [int] : Set overprovision ratio in percent over volume size.
|
||||
``-o [int]`` Set overprovision ratio in percent over volume size.
|
||||
|
||||
5 is set by default.
|
||||
-s [int] : Set the number of segments per section.
|
||||
``-s [int]`` Set the number of segments per section.
|
||||
|
||||
1 is set by default.
|
||||
-z [int] : Set the number of sections per zone.
|
||||
``-z [int]`` Set the number of sections per zone.
|
||||
|
||||
1 is set by default.
|
||||
-e [str] : Set basic extension list. e.g. "mp3,gif,mov"
|
||||
-t [0 or 1] : Disable discard command or not.
|
||||
``-e [str]`` Set basic extension list. e.g. "mp3,gif,mov"
|
||||
``-t [0 or 1]`` Disable discard command or not.
|
||||
|
||||
1 is set by default, which conducts discard.
|
||||
=============== ===========================================================
|
||||
|
||||
fsck.f2fs
|
||||
---------
|
||||
@ -314,7 +332,8 @@ partition, which examines whether the filesystem metadata and user-made data
|
||||
are cross-referenced correctly or not.
|
||||
Note that the initial version of the tool does not fix any inconsistencies.
|
||||
|
||||
The options consist of:
|
||||
The options consist of::
|
||||
|
||||
-d debug level [default:0]
|
||||
|
||||
dump.f2fs
|
||||
@ -327,20 +346,21 @@ It shows on-disk inode information recognized by a given inode number, and is
|
||||
able to dump all the SSA and SIT entries into predefined files, ./dump_ssa and
|
||||
./dump_sit respectively.
|
||||
|
||||
The options consist of:
|
||||
The options consist of::
|
||||
|
||||
-d debug level [default:0]
|
||||
-i inode no (hex)
|
||||
-s [SIT dump segno from #1~#2 (decimal), for all 0~-1]
|
||||
-a [SSA dump segno from #1~#2 (decimal), for all 0~-1]
|
||||
|
||||
Examples:
|
||||
# dump.f2fs -i [ino] /dev/sdx
|
||||
# dump.f2fs -s 0~-1 /dev/sdx (SIT dump)
|
||||
# dump.f2fs -a 0~-1 /dev/sdx (SSA dump)
|
||||
Examples::
|
||||
|
||||
================================================================================
|
||||
DESIGN
|
||||
================================================================================
|
||||
# dump.f2fs -i [ino] /dev/sdx
|
||||
# dump.f2fs -s 0~-1 /dev/sdx (SIT dump)
|
||||
# dump.f2fs -a 0~-1 /dev/sdx (SSA dump)
|
||||
|
||||
Design
|
||||
======
|
||||
|
||||
On-disk Layout
|
||||
--------------
|
||||
@ -351,7 +371,7 @@ consists of a set of sections. By default, section and zone sizes are set to one
|
||||
segment size identically, but users can easily modify the sizes by mkfs.
|
||||
|
||||
F2FS splits the entire volume into six areas, and all the areas except superblock
|
||||
consist of multiple segments as described below.
|
||||
consist of multiple segments as described below::
|
||||
|
||||
align with the zone size <-|
|
||||
|-> align with the segment size
|
||||
@ -373,28 +393,28 @@ consists of multiple segments as described below.
|
||||
|__zone__|
|
||||
|
||||
- Superblock (SB)
|
||||
: It is located at the beginning of the partition, and there exist two copies
|
||||
It is located at the beginning of the partition, and there exist two copies
|
||||
to avoid file system crash. It contains basic partition information and some
|
||||
default parameters of f2fs.
|
||||
|
||||
- Checkpoint (CP)
|
||||
: It contains file system information, bitmaps for valid NAT/SIT sets, orphan
|
||||
It contains file system information, bitmaps for valid NAT/SIT sets, orphan
|
||||
inode lists, and summary entries of current active segments.
|
||||
|
||||
- Segment Information Table (SIT)
|
||||
: It contains segment information such as valid block count and bitmap for the
|
||||
It contains segment information such as valid block count and bitmap for the
|
||||
validity of all the blocks.
|
||||
|
||||
- Node Address Table (NAT)
|
||||
: It is composed of a block address table for all the node blocks stored in
|
||||
It is composed of a block address table for all the node blocks stored in
|
||||
Main area.
|
||||
|
||||
- Segment Summary Area (SSA)
|
||||
: It contains summary entries which contain the owner information of all the
|
||||
It contains summary entries which contain the owner information of all the
|
||||
data and node blocks stored in Main area.
|
||||
|
||||
- Main Area
|
||||
: It contains file and directory data including their indices.
|
||||
It contains file and directory data including their indices.
|
||||
|
||||
In order to avoid misalignment between file system and flash-based storage, F2FS
|
||||
aligns the start block address of CP with the segment size. Also, it aligns the
|
||||
@ -414,7 +434,7 @@ One of them always indicates the last valid data, which is called as shadow copy
|
||||
mechanism. In addition to CP, NAT and SIT also adopt the shadow copy mechanism.
|
||||
|
||||
For file system consistency, each CP points to which NAT and SIT copies are
|
||||
valid, as shown below.
|
||||
valid, as shown below::
|
||||
|
||||
+--------+----------+---------+
|
||||
| CP | SIT | NAT |
|
||||
@ -438,7 +458,7 @@ indirect node. F2FS assigns 4KB to an inode block which contains 923 data block
|
||||
indices, two direct node pointers, two indirect node pointers, and one double
|
||||
indirect node pointer as described below. One direct node block contains 1018
|
||||
data blocks, and one indirect node block also contains 1018 node blocks. Thus,
|
||||
one inode block (i.e., a file) covers:
|
||||
one inode block (i.e., a file) covers::
|
||||
|
||||
4KB * (923 + 2 * 1018 + 2 * 1018 * 1018 + 1018 * 1018 * 1018) := 3.94TB.
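The coverage figure above can be checked with a few lines of arithmetic. This sketch simply evaluates the formula from the text; the constant and function names are chosen for the example, and the "TB" in the text is interpreted here as 2**40 bytes.

```python
BLOCK_SIZE = 4 * 1024      # 4KB per block
ADDRS_IN_INODE = 923       # data block indices held directly in the inode block
ADDRS_PER_NODE = 1018      # entries in one direct or indirect node block


def max_file_blocks():
    """Number of data blocks reachable from one inode block."""
    direct = ADDRS_IN_INODE
    via_direct_nodes = 2 * ADDRS_PER_NODE             # two direct node pointers
    via_indirect_nodes = 2 * ADDRS_PER_NODE ** 2      # two indirect node pointers
    via_double_indirect = ADDRS_PER_NODE ** 3         # one double indirect pointer
    return direct + via_direct_nodes + via_indirect_nodes + via_double_indirect


def max_file_bytes():
    """Total bytes covered by one inode block."""
    return BLOCK_SIZE * max_file_blocks()


if __name__ == "__main__":
    print(f"{max_file_blocks()} blocks, {max_file_bytes() / 2**40:.2f} TB")
```

The double indirect term dominates: it alone accounts for more than 99% of the addressable blocks.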
|
||||
|
||||
@ -473,6 +493,8 @@ A dentry block consists of 214 dentry slots and file names. Therein a bitmap is
|
||||
used to represent whether each dentry is valid or not. A dentry block occupies
|
||||
4KB with the following composition.
|
||||
|
||||
::
|
||||
|
||||
Dentry Block(4 K) = bitmap (27 bytes) + reserved (3 bytes) +
|
||||
dentries(11 * 214 bytes) + file name (8 * 214 bytes)
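The composition above adds up to exactly one 4KB block, which can be verified directly. The names below are illustrative and are not taken from the f2fs source.

```python
NR_DENTRY_SLOTS = 214   # dentry slots per dentry block
BITMAP_BYTES = 27       # validity bitmap, one bit per slot
RESERVED_BYTES = 3
DENTRY_BYTES = 11       # size of one dentry
NAME_SLOT_BYTES = 8     # size of one file name slot


def dentry_block_bytes():
    """Sum the four components of a dentry block."""
    return (BITMAP_BYTES + RESERVED_BYTES
            + DENTRY_BYTES * NR_DENTRY_SLOTS
            + NAME_SLOT_BYTES * NR_DENTRY_SLOTS)


def bitmap_bytes_needed(slots):
    """Bytes needed to hold one validity bit per slot, rounded up."""
    return (slots + 7) // 8


if __name__ == "__main__":
    print(dentry_block_bytes(), bitmap_bytes_needed(NR_DENTRY_SLOTS))
```

The 27-byte bitmap is the smallest whole number of bytes that can hold 214 validity bits.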
|
||||
|
||||
@ -498,23 +520,25 @@ F2FS implements multi-level hash tables for directory structure. Each level has
|
||||
a hash table with dedicated number of hash buckets as shown below. Note that
|
||||
"A(2B)" means a bucket includes 2 data blocks.
|
||||
|
||||
----------------------
|
||||
A : bucket
|
||||
B : block
|
||||
N : MAX_DIR_HASH_DEPTH
|
||||
----------------------
|
||||
::
|
||||
|
||||
level #0 | A(2B)
|
||||
|
|
||||
level #1 | A(2B) - A(2B)
|
||||
|
|
||||
level #2 | A(2B) - A(2B) - A(2B) - A(2B)
|
||||
. | . . . .
|
||||
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
|
||||
. | . . . .
|
||||
level #N | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
|
||||
----------------------
|
||||
A : bucket
|
||||
B : block
|
||||
N : MAX_DIR_HASH_DEPTH
|
||||
----------------------
|
||||
|
||||
The number of blocks and buckets are determined by,
|
||||
level #0 | A(2B)
|
||||
|
|
||||
level #1 | A(2B) - A(2B)
|
||||
|
|
||||
level #2 | A(2B) - A(2B) - A(2B) - A(2B)
|
||||
. | . . . .
|
||||
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
|
||||
. | . . . .
|
||||
level #N | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
|
||||
|
||||
The number of blocks and buckets are determined by::
|
||||
|
||||
,- 2, if n < MAX_DIR_HASH_DEPTH / 2,
|
||||
# of blocks in level #n = |
|
||||
@ -532,7 +556,7 @@ dentry consisting of the file name and its inode number. If not found, F2FS
|
||||
scans the next hash table in level #1. In this way, F2FS scans hash tables in
|
||||
each level incrementally from 1 to N. In each level, F2FS needs to scan only
|
||||
one bucket determined by the following equation, which shows O(log(# of files))
|
||||
complexity.
|
||||
complexity::
|
||||
|
||||
bucket number to scan in level #n = (hash value) % (# of buckets in level #n)
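The lookup rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the kernel implementation: the number of buckets per level is passed in by the caller because its formula is not reproduced in this excerpt, the 4-block case for deeper levels is inferred from the level diagram, and both function names are hypothetical.

```python
def blocks_in_level(n, max_dir_hash_depth):
    """Blocks per bucket in level n: 2 for shallow levels, 4 otherwise.

    The '4 otherwise' branch is an assumption based on the level #N row
    of the diagram, since the full formula is elided in this excerpt.
    """
    return 2 if n < max_dir_hash_depth / 2 else 4


def bucket_to_scan(hash_value, buckets_in_level):
    """Index of the single bucket that must be scanned in a level."""
    if buckets_in_level <= 0:
        raise ValueError("a level must contain at least one bucket")
    return hash_value % buckets_in_level


if __name__ == "__main__":
    # Example values only; real depths and bucket counts come from f2fs.
    print(blocks_in_level(1, 8), bucket_to_scan(0x1234, 4))
```

Because only one bucket per level is examined, the work per lookup grows with the number of levels rather than the number of entries.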
|
||||
|
||||
@ -540,7 +564,8 @@ In the case of file creation, F2FS finds empty consecutive slots that cover the
|
||||
file name. F2FS searches the empty slots in the hash tables of whole levels from
|
||||
1 to N in the same way as the lookup operation.
|
||||
|
||||
The following figure shows an example of two cases holding children.
|
||||
The following figure shows an example of two cases holding children::
|
||||
|
||||
--------------> Dir <--------------
|
||||
| |
|
||||
child child
|
||||
@ -611,14 +636,15 @@ Write-hint Policy
|
||||
2) whint_mode=user-based. F2FS tries to pass down hints given by
|
||||
users.
|
||||
|
||||
===================== ======================== ===================
|
||||
User F2FS Block
|
||||
---- ---- -----
|
||||
===================== ======================== ===================
|
||||
META WRITE_LIFE_NOT_SET
|
||||
HOT_NODE "
|
||||
WARM_NODE "
|
||||
COLD_NODE "
|
||||
*ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME
|
||||
*extension list " "
|
||||
ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME
|
||||
extension list " "
|
||||
|
||||
-- buffered io
|
||||
WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
|
||||
@ -635,11 +661,13 @@ WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
|
||||
WRITE_LIFE_NONE " WRITE_LIFE_NONE
|
||||
WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM
|
||||
WRITE_LIFE_LONG " WRITE_LIFE_LONG
|
||||
===================== ======================== ===================
|
||||
|
||||
3) whint_mode=fs-based. F2FS passes down hints with its policy.
|
||||
|
||||
===================== ======================== ===================
|
||||
User F2FS Block
|
||||
---- ---- -----
|
||||
===================== ======================== ===================
|
||||
META WRITE_LIFE_MEDIUM;
|
||||
HOT_NODE WRITE_LIFE_NOT_SET
|
||||
WARM_NODE "
|
||||
@ -662,6 +690,7 @@ WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
|
||||
WRITE_LIFE_NONE " WRITE_LIFE_NONE
|
||||
WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM
|
||||
WRITE_LIFE_LONG " WRITE_LIFE_LONG
|
||||
===================== ======================== ===================
|
||||
|
||||
Fallocate(2) Policy
|
||||
-------------------
|
||||
@ -681,6 +710,7 @@ Allocating disk space
|
||||
However, once F2FS receives ioctl(fd, F2FS_IOC_SET_PIN_FILE) prior to
|
||||
fallocate(fd, DEFAULT_MODE), it allocates on-disk block addresses having
|
||||
zero or random data, which is useful in the scenario below:
|
||||
|
||||
1. create(fd)
|
||||
2. ioctl(fd, F2FS_IOC_SET_PIN_FILE)
|
||||
3. fallocate(fd, 0, 0, size)
|
||||
@ -692,26 +722,28 @@ Compression implementation
|
||||
--------------------------
|
||||
|
||||
- New term named cluster is defined as basic unit of compression, file can
|
||||
be divided into multiple clusters logically. One cluster includes 4 << n
|
||||
(n >= 0) logical pages, compression size is also cluster size, each of
|
||||
cluster can be compressed or not.
|
||||
be divided into multiple clusters logically. One cluster includes 4 << n
|
||||
(n >= 0) logical pages, compression size is also cluster size, each of
|
||||
cluster can be compressed or not.
|
||||
|
||||
- In cluster metadata layout, one special block address is used to indicate
|
||||
cluster is compressed one or normal one, for compressed cluster, following
|
||||
metadata maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs
|
||||
stores data including compress header and compressed data.
|
||||
cluster is compressed one or normal one, for compressed cluster, following
|
||||
metadata maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs
|
||||
stores data including compress header and compressed data.
|
||||
|
||||
- In order to eliminate write amplification during overwrite, F2FS only
|
||||
support compression on write-once file, data can be compressed only when
|
||||
all logical blocks in file are valid and cluster compress ratio is lower
|
||||
than specified threshold.
|
||||
support compression on write-once file, data can be compressed only when
|
||||
all logical blocks in file are valid and cluster compress ratio is lower
|
||||
than specified threshold.
|
||||
|
||||
- To enable compression on regular inode, there are three ways:
|
||||
* chattr +c file
|
||||
* chattr +c dir; touch dir/file
|
||||
* mount w/ -o compress_extension=ext; touch file.ext
|
||||
|
||||
Compress metadata layout:
|
||||
* chattr +c file
|
||||
* chattr +c dir; touch dir/file
|
||||
* mount w/ -o compress_extension=ext; touch file.ext
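The cluster sizing described in the list above can be made concrete with a short sketch. It assumes the range "[1, 4 << n - 1]" is meant as one to (4 << n) - 1 physical blocks, which matches the diagram of a compressed cluster holding a flag plus three blocks; the function names are hypothetical.

```python
def cluster_pages(n):
    """Logical pages in one cluster: 4 << n, for n >= 0."""
    if n < 0:
        raise ValueError("n must be >= 0")
    return 4 << n


def compressed_block_range(n):
    """(min, max) physical blocks a compressed cluster may map to.

    Assumes the upper bound is (4 << n) - 1, i.e. at least one block
    is saved compared with storing the cluster uncompressed.
    """
    return 1, cluster_pages(n) - 1


if __name__ == "__main__":
    for n in range(3):
        print(n, cluster_pages(n), compressed_block_range(n))
```

For n = 0 this gives a 4-page cluster that occupies between 1 and 3 physical blocks when compressed.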
|
||||
|
||||
Compress metadata layout::
|
||||
|
||||
[Dnode Structure]
|
||||
+-----------------------------------------------+
|
||||
| cluster 1 | cluster 2 | ......... | cluster N |
|
||||
@ -719,9 +751,9 @@ Compress metadata layout:
|
||||
. . . .
|
||||
. . . .
|
||||
. Compressed Cluster . . Normal Cluster .
|
||||
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|
||||
|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
|
||||
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|
||||
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|
||||
|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
|
||||
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|
||||
. .
|
||||
. .
|
||||
. .
|
@ -1,7 +1,8 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
==============
|
||||
|
||||
====
|
||||
FUSE
|
||||
==============
|
||||
====
|
||||
|
||||
Definitions
|
||||
===========
|
||||
|
@ -1,14 +1,18 @@
|
||||
uevents and GFS2
|
||||
==================
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
================
|
||||
uevents and GFS2
|
||||
================
|
||||
|
||||
During the lifetime of a GFS2 mount, a number of uevents are generated.
|
||||
This document explains what the events are and what they are used
|
||||
for (by gfs_controld in gfs2-utils).
|
||||
|
||||
A list of GFS2 uevents
|
||||
-----------------------
|
||||
======================
|
||||
|
||||
1. ADD
|
||||
------
|
||||
|
||||
The ADD event occurs at mount time. It will always be the first
|
||||
uevent generated by the newly created filesystem. If the mount
|
||||
@ -21,6 +25,7 @@ with no journal assigned), and read-only (with journal assigned) status
|
||||
of the filesystem respectively.
|
||||
|
||||
2. ONLINE
|
||||
---------
|
||||
|
||||
The ONLINE uevent is generated after a successful mount or remount. It
|
||||
has the same environment variables as the ADD uevent. The ONLINE
|
||||
@ -29,6 +34,7 @@ RDONLY are a relatively recent addition (2.6.32-rc+) and will not
|
||||
be generated by older kernels.
|
||||
|
||||
3. CHANGE
|
||||
---------
|
||||
|
||||
The CHANGE uevent is used in two places. One is when reporting the
|
||||
successful mount of the filesystem by the first node (FIRSTMOUNT=Done).
|
||||
@ -52,6 +58,7 @@ cluster. For this reason the ONLINE uevent was used when adding a new
|
||||
uevent for a successful mount or remount.
|
||||
|
||||
4. OFFLINE
|
||||
----------
|
||||
|
||||
The OFFLINE uevent is only generated due to filesystem errors and is used
|
||||
as part of the "withdraw" mechanism. Currently this doesn't give any
|
||||
@ -59,6 +66,7 @@ information about what the error is, which is something that needs to
|
||||
be fixed.
|
||||
|
||||
5. REMOVE
|
||||
---------
|
||||
|
||||
The REMOVE uevent is generated at the end of an unsuccessful mount
|
||||
or at the end of a umount of the filesystem. All REMOVE uevents will
|
||||
@ -68,9 +76,10 @@ kobject subsystem.
|
||||
|
||||
|
||||
Information common to all GFS2 uevents (uevent environment variables)
|
||||
----------------------------------------------------------------------
|
||||
=====================================================================
|
||||
|
||||
1. LOCKTABLE=
|
||||
--------------
|
||||
|
||||
The LOCKTABLE is a string, as supplied on the mount command
|
||||
line (locktable=) or via fstab. It is used as a filesystem label
|
||||
@ -78,6 +87,7 @@ as well as providing the information for a lock_dlm mount to be
|
||||
able to join the cluster.
|
||||
|
||||
2. LOCKPROTO=
|
||||
-------------
|
||||
|
||||
The LOCKPROTO is a string, and its value depends on what is set
|
||||
on the mount command line, or via fstab. It will be either
|
||||
@ -85,12 +95,14 @@ lock_nolock or lock_dlm. In the future other lock managers
|
||||
may be supported.
|
||||
|
||||
3. JOURNALID=
|
||||
-------------
|
||||
|
||||
If a journal is in use by the filesystem (journals are not
|
||||
assigned for spectator mounts) then this will give the
|
||||
numeric journal id in all GFS2 uevents.
|
||||
|
||||
4. UUID=
|
||||
--------
|
||||
|
||||
With recent versions of gfs2-utils, mkfs.gfs2 writes a UUID
|
||||
into the filesystem superblock. If it exists, this will
|
@ -1,5 +1,8 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==================
|
||||
Global File System
|
||||
------------------
|
||||
==================
|
||||
|
||||
https://fedorahosted.org/cluster/wiki/HomePage
|
||||
|
||||
@ -14,16 +17,18 @@ on one machine show up immediately on all other machines in the cluster.
|
||||
GFS uses interchangeable inter-node locking mechanisms, the currently
|
||||
supported mechanisms are:
|
||||
|
||||
lock_nolock -- allows gfs to be used as a local file system
|
||||
lock_nolock
|
||||
- allows gfs to be used as a local file system
|
||||
|
||||
lock_dlm -- uses a distributed lock manager (dlm) for inter-node locking
|
||||
lock_dlm
|
||||
- uses a distributed lock manager (dlm) for inter-node locking.
|
||||
The dlm is found at linux/fs/dlm/
|
||||
|
||||
Lock_dlm depends on user space cluster management systems found
|
||||
at the URL above.
|
||||
|
||||
To use gfs as a local file system, no external clustering systems are
|
||||
needed, simply:
|
||||
needed, simply::
|
||||
|
||||
$ mkfs -t gfs2 -p lock_nolock -j 1 /dev/block_device
|
||||
$ mount -t gfs2 /dev/block_device /dir
|
||||
@ -37,9 +42,12 @@ GFS2 is not on-disk compatible with previous versions of GFS, but it
|
||||
is pretty close.
|
||||
|
||||
The following man pages can be found at the URL above:
|
||||
|
||||
============ =============================================
|
||||
fsck.gfs2 to repair a filesystem
|
||||
gfs2_grow to expand a filesystem online
|
||||
gfs2_jadd to add journals to a filesystem online
|
||||
tunegfs2 to manipulate, examine and tune a filesystem
|
||||
gfs2_convert to convert a gfs filesystem to gfs2 in-place
|
||||
mkfs.gfs2 to make a filesystem
|
||||
============ =============================================
|
@ -1,11 +1,16 @@
|
||||
Note: This filesystem doesn't have a maintainer.
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==================================
|
||||
Macintosh HFS Filesystem for Linux
|
||||
==================================
|
||||
|
||||
HFS stands for ``Hierarchical File System'' and is the filesystem used
|
||||
|
||||
.. Note:: This filesystem doesn't have a maintainer.
|
||||
|
||||
|
||||
HFS stands for ``Hierarchical File System`` and is the filesystem used
|
||||
by the Mac Plus and all later Macintosh models. Earlier Macintosh
|
||||
models used MFS (``Macintosh File System''), which is not supported,
|
||||
models used MFS (``Macintosh File System``), which is not supported,
|
||||
MacOS 8.1 and newer support a filesystem called HFS+ that's similar to
|
||||
HFS but is extended in various areas. Use the hfsplus filesystem driver
|
||||
to access such filesystems from Linux.
|
||||
@ -49,25 +54,25 @@ Writing to HFS Filesystems
|
||||
HFS is not a UNIX filesystem, thus it does not have the usual features you'd
|
||||
expect:
|
||||
|
||||
o You can't modify the set-uid, set-gid, sticky or executable bits or the uid
|
||||
* You can't modify the set-uid, set-gid, sticky or executable bits or the uid
|
||||
and gid of files.
|
||||
o You can't create hard- or symlinks, device files, sockets or FIFOs.
|
||||
* You can't create hard- or symlinks, device files, sockets or FIFOs.
|
||||
|
||||
HFS does, on the other hand, have the concept of multiple forks per file. These
|
||||
non-standard forks are represented as hidden additional files in the normal
|
||||
filesystem's namespace, which is kind of a kludge and makes the semantics for
|
||||
them a little strange:
|
||||
|
||||
o You can't create, delete or rename resource forks of files or the
|
||||
* You can't create, delete or rename resource forks of files or the
|
||||
Finder's metadata.
|
||||
o They are however created (with default values), deleted and renamed
|
||||
* They are however created (with default values), deleted and renamed
|
||||
along with the corresponding data fork or directory.
|
||||
o Copying files to a different filesystem will lose those attributes
|
||||
* Copying files to a different filesystem will lose those attributes
|
||||
that are essential for MacOS to work.
|
||||
|
||||
|
||||
Creating HFS filesystems
|
||||
===================================
|
||||
========================
|
||||
|
||||
The hfsutils package from Robert Leslie contains a program called
|
||||
hformat that can be used to create an HFS filesystem. See
|
@ -1,4 +1,6 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
======================================
|
||||
Macintosh HFSPlus Filesystem for Linux
|
||||
======================================
|
||||
|
@ -1,13 +1,21 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
====================
|
||||
Read/Write HPFS 2.09
|
||||
====================
|
||||
|
||||
1998-2004, Mikulas Patocka
|
||||
|
||||
email: mikulas@artax.karlin.mff.cuni.cz
|
||||
homepage: http://artax.karlin.mff.cuni.cz/~mikulas/vyplody/hpfs/index-e.cgi
|
||||
:email: mikulas@artax.karlin.mff.cuni.cz
|
||||
:homepage: http://artax.karlin.mff.cuni.cz/~mikulas/vyplody/hpfs/index-e.cgi
|
||||
|
||||
CREDITS:
|
||||
Credits
|
||||
=======
|
||||
Chris Smith, 1993, original read-only HPFS, some code and hpfs structures file
|
||||
is taken from it
|
||||
|
||||
Jacques Gelinas, MSDos mmap, Inspired by fs/nfs/mmap.c (Jon Tombs 15 Aug 1993)
|
||||
|
||||
Werner Almesberger, 1992, 1993, MSDos option parser & CR/LF conversion
|
||||
|
||||
Mount options
|
||||
@ -50,6 +58,7 @@ timeshift=(-)nnn (default 0)
|
||||
|
||||
|
||||
File names
|
||||
==========
|
||||
|
||||
As in OS/2, filenames are case insensitive. However, shell thinks that names
|
||||
are case sensitive, so for example when you create a file FOO, you can use
|
||||
@ -64,6 +73,7 @@ access it under names 'a.', 'a..', 'a . . . ' etc.
|
||||
|
||||
|
||||
Extended attributes
|
||||
===================
|
||||
|
||||
On HPFS partitions, OS/2 can associate with each file special information called
|
||||
extended attributes. Extended attributes are pairs of (key,value) where key is
|
||||
@ -88,6 +98,7 @@ values doesn't work.
|
||||
|
||||
|
||||
Symlinks
|
||||
========
|
||||
|
||||
You can create symlinks on an HPFS partition; symlinks are implemented by setting an extended
|
||||
attribute named "SYMLINK" with symlink value. Like on ext2, you can chown and
|
||||
@ -101,6 +112,7 @@ to analyze or change OS2SYS.INI.
|
||||
|
||||
|
||||
Codepages
|
||||
=========
|
||||
|
||||
HPFS can contain several uppercasing tables for several codepages and each
|
||||
file has a pointer to the codepage its name is in. However OS/2 was created in
|
||||
@ -128,6 +140,7 @@ this codepage - if you don't try to do what I described above :-)
|
||||
|
||||
|
||||
Known bugs
|
||||
==========
|
||||
|
||||
HPFS386 on OS/2 server is not supported. HPFS386 installed on normal OS/2 client
|
||||
should work. If you have OS/2 server, use only read-only mode. I don't know how
|
||||
@ -152,7 +165,8 @@ would result in directory tree splitting, that takes disk space. Workaround is
|
||||
to delete other files that are leaf (probability that the file is non-leaf is
|
||||
about 1/50) or to truncate file first to make some space.
|
||||
You encounter this problem only if you have many directories so that
|
||||
preallocated directory band is full i.e.
|
||||
preallocated directory band is full i.e.::
|
||||
|
||||
number_of_directories / size_of_filesystem_in_mb > 4.
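The rule of thumb above can be checked with a few lines of Python. This is a minimal sketch: the function name and the sample figures are hypothetical, and only the threshold of 4 directories per megabyte comes from the text.

```python
def directory_band_likely_full(number_of_directories: int,
                               size_of_filesystem_in_mb: float) -> bool:
    """Return True when directories per megabyte exceed the threshold of 4."""
    if size_of_filesystem_in_mb <= 0:
        raise ValueError("filesystem size must be positive")
    return number_of_directories / size_of_filesystem_in_mb > 4


# Hypothetical figures: a 1000 MB partition with 5000 and then 3000 directories.
print(directory_band_likely_full(5000, 1000))  # ratio is 5.0
print(directory_band_likely_full(3000, 1000))  # ratio is 3.0
```

A ratio above 4 only suggests that the preallocated directory band may be full; it does not inspect the filesystem itself.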
|
||||
|
||||
You can't delete open directories.
|
||||
@ -174,6 +188,7 @@ anybody know what does it mean?
|
||||
|
||||
|
||||
What does "unbalanced tree" message mean?
|
||||
=========================================
|
||||
|
||||
Old versions of this driver sometimes created unbalanced dnode trees. OS/2
|
||||
chkdsk doesn't scream if the tree is unbalanced (and sometimes creates
|
||||
@ -187,6 +202,7 @@ whole created by this driver, it is BUG - let me know about it.
|
||||
|
||||
|
||||
Bugs in OS/2
|
||||
============
|
||||
|
||||
When you have two (or more) lost directories pointing to each other, chkdsk
|
||||
locks up when repairing filesystem.
|
||||
@ -199,13 +215,16 @@ File names like "a .b" are marked as 'long' by OS/2 but chkdsk "corrects" it and
|
||||
marks them as short (and writes "minor fs error corrected"). This bug is not in
|
||||
HPFS386.
|
||||
|
||||
Codepage bugs described above.
|
||||
Codepage bugs described above
|
||||
=============================
|
||||
|
||||
If you don't install fixpacks, there are many, many more...
|
||||
|
||||
|
||||
History
|
||||
=======
|
||||
|
||||
====== =========================================================================
|
||||
0.90 First public release
|
||||
0.91 Fixed bug that caused shooting to memory when write_inode was called on
|
||||
open inode (rarely happened)
|
||||
@ -219,78 +238,116 @@ History
|
||||
1.91 Fixed a bug that chk_sectors failed when sectors were at the end of disk
|
||||
Fixed a race-condition when write_inode is called while deleting file
|
||||
Fixed a bug that could possibly happen (with very low probability) when
|
||||
using 0xff in filenames
|
||||
using 0xff in filenames.
|
||||
|
||||
Rewritten locking to avoid race-conditions
|
||||
|
||||
Mount option 'eas' now works
|
||||
|
||||
Fsync no longer returns error
|
||||
|
||||
Files beginning with '.' are marked hidden
|
||||
|
||||
Remount support added
|
||||
|
||||
Alloc is not so slow when filesystem becomes full
|
||||
|
||||
Atimes are no longer updated because it slows down operation
|
||||
|
||||
Code cleanup (removed all commented debug prints)
|
||||
1.92 Corrected a bug when sync was called just before closing file
|
||||
1.93 Modified, so that it works with kernels >= 2.1.131, I don't know if it
|
||||
works with previous versions
|
||||
|
||||
Fixed a possible problem with disks > 64G (but I don't have one, so I can't
|
||||
test it)
|
||||
|
||||
Fixed a file overflow at 2G
|
||||
|
||||
Added new option 'timeshift'
|
||||
|
||||
Changed behaviour on HPFS386: It is now possible to operate on HPFS386 in
|
||||
read-only mode
|
||||
|
||||
Fixed a bug that slowed down alloc and prevented allocating 100% space
|
||||
(this bug was not destructive)
|
||||
1.94 Added workaround for one bug in Linux
|
||||
|
||||
Fixed one buffer leak
|
||||
|
||||
Fixed some incompatibilities with large extended attributes (but it's still
|
||||
not 100% ok, I have no info on it and OS/2 doesn't want to create them)
|
||||
|
||||
Rewritten allocation
|
||||
|
||||
Fixed a bug with i_blocks (du sometimes didn't display correct values)
|
||||
|
||||
Directories have no longer archive attribute set (some programs don't like
|
||||
it)
|
||||
|
||||
Fixed a bug that it set badly one flag in large anode tree (it was not
|
||||
destructive)
|
||||
1.95 Fixed one buffer leak, that could happen on corrupted filesystem
|
||||
|
||||
Fixed one bug in allocation in 1.94
|
||||
1.96 Added workaround for one bug in OS/2 (HPFS locked up, HPFS386 reported
|
||||
error sometimes when opening directories in PMSHELL)
|
||||
|
||||
Fixed a possible bitmap race
|
||||
|
||||
Fixed possible problem on large disks
|
||||
|
||||
You can now delete open files
|
||||
|
||||
Fixed a nondestructive race in rename
|
||||
1.97 Support for HPFS v3 (on large partitions)
|
||||
Fixed a bug that it didn't allow creation of files > 128M (it should be 2G)
|
||||
|
||||
Fixed a bug that it didn't allow creation of files > 128M
|
||||
(it should be 2G)
|
||||
1.97.1 Changed names of global symbols
|
||||
|
||||
Fixed a bug when chmoding or chowning root directory
|
||||
1.98 Fixed a deadlock when using old_readdir
|
||||
Better directory handling; workaround for "unbalanced tree" bug in OS/2
|
||||
1.99 Corrected a possible problem when there's not enough space while deleting
|
||||
file
|
||||
Now it tries to truncate the file if there's not enough space when deleting
|
||||
|
||||
Now it tries to truncate the file if there's not enough space when
|
||||
deleting
|
||||
|
||||
Removed a lot of redundant code
|
||||
2.00 Fixed a bug in rename (it was there since 1.96)
|
||||
Better anti-fragmentation strategy
|
||||
2.01 Fixed problem with directory listing over NFS
|
||||
|
||||
Directory lseek now checks for proper parameters
|
||||
|
||||
Fixed race-condition in buffer code - it is in all filesystems in Linux;
|
||||
when reading device (cat /dev/hda) while creating files on it, files
|
||||
could be damaged
|
||||
2.02 Workaround for bug in breada in Linux. breada could cause accesses beyond
|
||||
end of partition
|
||||
2.03 Char, block devices and pipes are correctly created
|
||||
|
||||
Fixed non-crashing race in unlink (Alexander Viro)
|
||||
|
||||
Now it works with Japanese version of OS/2
|
||||
2.04 Fixed error when ftruncate used to extend file
|
||||
2.05 Fixed crash when mount parameters were given without =
|
||||
|
||||
Fixed crash when allocation of anode failed due to full disk
|
||||
|
||||
Fixed some crashes when block io or inode allocation failed
|
||||
2.06 Fixed some crash on corrupted disk structures
|
||||
|
||||
Better allocation strategy
|
||||
|
||||
Reschedule points added so that it doesn't lock CPU long time
|
||||
|
||||
It should work in read-only mode on Warp Server
|
||||
2.07 More fixes for Warp Server. Now it really works
|
||||
2.08 Creating new files is not so slow on large disks
|
||||
|
||||
An attempt to sync deleted file does not generate filesystem error
|
||||
2.09 Fixed error on extremely fragmented files
|
||||
|
||||
|
||||
vim: set textwidth=80:
|
||||
====== =========================================================================
|
@ -1,3 +1,5 @@
|
||||
.. _filesystems_index:
|
||||
|
||||
===============================
|
||||
Filesystems in the Linux kernel
|
||||
===============================
|
||||
@ -46,8 +48,53 @@ Documentation for filesystem implementations.
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
9p
|
||||
adfs
|
||||
affs
|
||||
afs
|
||||
autofs
|
||||
autofs-mount-control
|
||||
befs
|
||||
bfs
|
||||
btrfs
|
||||
ceph
|
||||
cramfs
|
||||
debugfs
|
||||
dlmfs
|
||||
ecryptfs
|
||||
efivarfs
|
||||
erofs
|
||||
ext2
|
||||
ext3
|
||||
f2fs
|
||||
gfs2
|
||||
gfs2-uevents
|
||||
hfs
|
||||
hfsplus
|
||||
hpfs
|
||||
fuse
|
||||
inotify
|
||||
isofs
|
||||
nilfs2
|
||||
nfs/index
|
||||
ntfs
|
||||
ocfs2
|
||||
ocfs2-online-filecheck
|
||||
omfs
|
||||
orangefs
|
||||
overlayfs
|
||||
proc
|
||||
qnx6
|
||||
ramfs-rootfs-initramfs
|
||||
relay
|
||||
romfs
|
||||
squashfs
|
||||
sysfs
|
||||
sysv-fs
|
||||
tmpfs
|
||||
ubifs
|
||||
ubifs-authentication.rst
|
||||
udf
|
||||
virtiofs
|
||||
vfat
|
||||
zonefs
|
||||
|
@ -1,27 +1,36 @@
|
||||
inotify
|
||||
a powerful yet simple file change notification system
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===============================================================
|
||||
Inotify - A Powerful yet Simple File Change Notification System
|
||||
===============================================================
|
||||
|
||||
|
||||
|
||||
Document started 15 Mar 2005 by Robert Love <rml@novell.com>
|
||||
|
||||
Document updated 4 Jan 2015 by Zhang Zhen <zhenzhang.zhang@huawei.com>
|
||||
--Deleted obsoleted interface, just refer to manpages for user interface.
|
||||
|
||||
- Deleted obsoleted interface, just refer to manpages for user interface.
|
||||
|
||||
(i) Rationale
|
||||
|
||||
Q: What is the design decision behind not tying the watch to the open fd of
|
||||
Q:
|
||||
What is the design decision behind not tying the watch to the open fd of
|
||||
the watched object?
|
||||
|
||||
A: Watches are associated with an open inotify device, not an open file.
|
||||
A:
|
||||
Watches are associated with an open inotify device, not an open file.
|
||||
This solves the primary problem with dnotify: keeping the file open pins
|
||||
the file and thus, worse, pins the mount. Dnotify is therefore infeasible
|
||||
for use on a desktop system with removable media as the media cannot be
|
||||
unmounted. Watching a file should not require that it be open.
|
||||
|
||||
Q: What is the design decision behind using an-fd-per-instance as opposed to
|
||||
Q:
|
||||
What is the design decision behind using an-fd-per-instance as opposed to
|
||||
an fd-per-watch?
|
||||
|
||||
A: An fd-per-watch quickly consumes more file descriptors than are allowed,
|
||||
A:
|
||||
An fd-per-watch quickly consumes more file descriptors than are allowed,
|
||||
more fd's than are feasible to manage, and more fd's than are optimally
|
||||
select()-able. Yes, root can bump the per-process fd limit and yes, users
|
||||
can use epoll, but requiring both is a silly and extraneous requirement.
|
||||
@ -65,9 +74,11 @@ A: An fd-per-watch quickly consumes more file descriptors than are allowed,
|
||||
need not be a one-fd-per-process mapping; it is one-fd-per-queue and a
|
||||
process can easily want more than one queue.
|
||||
|
||||
Q: Why the system call approach?
|
||||
Q:
|
||||
Why the system call approach?
|
||||
|
||||
A: The poor user-space interface is the second biggest problem with dnotify.
|
||||
A:
|
||||
The poor user-space interface is the second biggest problem with dnotify.
|
||||
Signals are a terrible, terrible interface for file notification. Or for
|
||||
anything, for that matter. The ideal solution, from all perspectives, is a
|
||||
file descriptor-based one that allows basic file I/O and poll/select.
|
64
Documentation/filesystems/isofs.rst
Normal file
@ -0,0 +1,64 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==================
|
||||
ISO9660 Filesystem
|
||||
==================
|
||||
|
||||
Mount options that are the same as for msdos and vfat partitions.
|
||||
|
||||
========= ========================================================
|
||||
gid=nnn All files in the partition will be in group nnn.
|
||||
uid=nnn All files in the partition will be owned by user id nnn.
|
||||
umask=nnn The permission mask (see umask(1)) for the partition.
|
||||
========= ========================================================
|
||||
|
||||
Mount options that are the same as for vfat partitions. These are only useful
|
||||
when using discs encoded using Microsoft's Joliet extensions.
|
||||
|
||||
============== =============================================================
|
||||
iocharset=name Character set to use for converting from Unicode to
|
||||
ASCII. Joliet filenames are stored in Unicode format, but
|
||||
Unix for the most part doesn't know how to deal with Unicode.
|
||||
There is also an option of doing UTF-8 translations with the
|
||||
utf8 option.
|
||||
utf8 Encode Unicode names in UTF-8 format. Default is no.
|
||||
============== =============================================================
|
||||
|
||||
Mount options unique to the isofs filesystem.
|
||||
|
||||
================= ============================================================
|
||||
block=512 Set the block size for the disk to 512 bytes
|
||||
block=1024 Set the block size for the disk to 1024 bytes
|
||||
block=2048 Set the block size for the disk to 2048 bytes
|
||||
check=relaxed Matches filenames with different cases
|
||||
check=strict Matches only filenames with the exact same case
|
||||
cruft Try to handle badly formatted CDs.
|
||||
map=off Do not map non-Rock Ridge filenames to lower case
|
||||
map=normal Map non-Rock Ridge filenames to lower case
|
||||
map=acorn As map=normal but also apply Acorn extensions if present
|
||||
mode=xxx Sets the permissions on files to xxx unless Rock Ridge
|
||||
extensions set the permissions otherwise
|
||||
dmode=xxx Sets the permissions on directories to xxx unless Rock Ridge
|
||||
extensions set the permissions otherwise
|
||||
overriderockperm Set permissions on files and directories according to
|
||||
'mode' and 'dmode' even though Rock Ridge extensions are
|
||||
present.
|
||||
nojoliet Ignore Joliet extensions if they are present.
|
||||
norock Ignore Rock Ridge extensions if they are present.
|
||||
hide Completely strip hidden files from the file system.
|
||||
showassoc Show files marked with the 'associated' bit
|
||||
unhide Deprecated; showing hidden files is now default;
|
||||
If given, it is a synonym for 'showassoc' which will
|
||||
recreate previous unhide behavior
|
||||
session=x Select number of session on multisession CD
|
||||
sbsector=xxx Session begins from sector xxx
|
||||
================= ============================================================
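As an illustration of how the options in the tables above combine, the following sketch builds a ``mount`` command line in Python. The helper names, the device ``/dev/sr0`` and the mount point ``/mnt/cdrom`` are hypothetical; only the option names and the ``iso9660`` filesystem type are taken from the kernel. The command is built but not executed.

```python
def isofs_options(**options) -> str:
    """Join mount options: True becomes a bare flag, other values 'name=value'."""
    parts = []
    for name, value in options.items():
        if value is True:
            parts.append(name)
        elif value is False or value is None:
            continue  # option not requested
        else:
            parts.append(f"{name}={value}")
    return ",".join(parts)


def mount_argv(device: str, mountpoint: str, **options) -> list:
    """Build (but do not run) an argv list for mounting an ISO 9660 disc."""
    argv = ["mount", "-t", "iso9660"]
    opts = isofs_options(**options)
    if opts:
        argv += ["-o", opts]
    return argv + [device, mountpoint]


# Hypothetical device and mount point.
print(mount_argv("/dev/sr0", "/mnt/cdrom", norock=True, map="off", mode="0444"))
```

Passing the resulting list to a process launcher would need root privileges; the sketch stops at constructing the arguments so that it can be inspected safely.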
|
||||
|
||||
Recommended documents about ISO 9660 standard are located at:
|
||||
|
||||
- http://www.y-adagio.com/
|
||||
- ftp://ftp.ecma.ch/ecma-st/Ecma-119.pdf
|
||||
|
||||
Quoting from the PDF "This 2nd Edition of Standard ECMA-119 is technically
|
||||
identical with ISO 9660.", so it is a valid and gratis substitute of the
|
||||
official ISO specification.
|
@ -1,48 +0,0 @@
|
||||
Mount options that are the same as for msdos and vfat partitions.
|
||||
|
||||
gid=nnn All files in the partition will be in group nnn.
|
||||
uid=nnn All files in the partition will be owned by user id nnn.
|
||||
umask=nnn The permission mask (see umask(1)) for the partition.
|
||||
|
||||
Mount options that are the same as vfat partitions. These are only useful
|
||||
when using discs encoded using Microsoft's Joliet extensions.
|
||||
iocharset=name Character set to use for converting from Unicode to
|
||||
ASCII. Joliet filenames are stored in Unicode format, but
|
||||
Unix for the most part doesn't know how to deal with Unicode.
|
||||
There is also an option of doing UTF-8 translations with the
|
||||
utf8 option.
|
||||
utf8 Encode Unicode names in UTF-8 format. Default is no.
|
||||
|
||||
Mount options unique to the isofs filesystem.
|
||||
block=512 Set the block size for the disk to 512 bytes
|
||||
block=1024 Set the block size for the disk to 1024 bytes
|
||||
block=2048 Set the block size for the disk to 2048 bytes
|
||||
check=relaxed Matches filenames with different cases
|
||||
check=strict Matches only filenames with the exact same case
|
||||
cruft Try to handle badly formatted CDs.
|
||||
map=off Do not map non-Rock Ridge filenames to lower case
|
||||
map=normal Map non-Rock Ridge filenames to lower case
|
||||
map=acorn As map=normal but also apply Acorn extensions if present
|
||||
mode=xxx Sets the permissions on files to xxx unless Rock Ridge
|
||||
extensions set the permissions otherwise
|
||||
dmode=xxx Sets the permissions on directories to xxx unless Rock Ridge
|
||||
extensions set the permissions otherwise
|
||||
overriderockperm Set permissions on files and directories according to
|
||||
'mode' and 'dmode' even though Rock Ridge extensions are
|
||||
present.
|
||||
nojoliet Ignore Joliet extensions if they are present.
|
||||
norock Ignore Rock Ridge extensions if they are present.
|
||||
hide Completely strip hidden files from the file system.
|
||||
showassoc Show files marked with the 'associated' bit
|
||||
unhide Deprecated; showing hidden files is now default;
|
||||
If given, it is a synonym for 'showassoc' which will
|
||||
recreate previous unhide behavior
|
||||
session=x Select number of session on multisession CD
|
||||
sbsector=xxx Session begins from sector xxx
|
||||
|
||||
Recommended documents about ISO 9660 standard are located at:
|
||||
http://www.y-adagio.com/
|
||||
ftp://ftp.ecma.ch/ecma-st/Ecma-119.pdf
|
||||
Quoting from the PDF "This 2nd Edition of Standard ECMA-119 is technically
|
||||
identical with ISO 9660.", so it is a valid and gratis substitute of the
|
||||
official ISO specification.
|
13
Documentation/filesystems/nfs/index.rst
Normal file
@ -0,0 +1,13 @@
|
||||
===============================
|
||||
NFS
|
||||
===============================
|
||||
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
pnfs
|
||||
rpc-cache
|
||||
rpc-server-gss
|
||||
nfs41-server
|
||||
knfsd-stats
|
@ -1,7 +1,9 @@
|
||||
|
||||
============================
|
||||
Kernel NFS Server Statistics
|
||||
============================
|
||||
|
||||
:Authors: Greg Banks <gnb@sgi.com> - 26 Mar 2009
|
||||
|
||||
This document describes the format and semantics of the statistics
|
||||
which the kernel NFS server makes available to userspace. These
|
||||
statistics are available in several text form pseudo files, each of
|
||||
@ -18,7 +20,7 @@ by parsing routines. All other lines contain a sequence of fields
|
||||
separated by whitespace.
|
||||
|
||||
/proc/fs/nfsd/pool_stats
|
||||
------------------------
|
||||
========================
|
||||
|
||||
This file is available in kernels from 2.6.30 onwards, if the
|
||||
/proc/fs/nfsd filesystem is mounted (it almost always should be).
|
||||
@ -109,15 +111,12 @@ this case), or the transport can be enqueued for later attention
|
||||
(sockets-enqueued counts this case), or the packet can be temporarily
|
||||
deferred because the transport is currently being used by an nfsd
|
||||
thread. This last case is not very interesting and is not explicitly
|
||||
counted, but can be inferred from the other counters thus:
|
||||
counted, but can be inferred from the other counters thus::
|
||||
|
||||
packets-deferred = packets-arrived - ( sockets-enqueued + threads-woken )
|
||||
packets-deferred = packets-arrived - ( sockets-enqueued + threads-woken )
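The inference above can be sketched in Python. The sample text and its numbers are made up, and the column order (pool, packets-arrived, sockets-enqueued, threads-woken, threads-timedout) is an assumption that should be checked against the header line of the real file.

```python
# Made-up sample in the assumed layout of /proc/fs/nfsd/pool_stats.
SAMPLE = """\
# pool packets-arrived sockets-enqueued threads-woken threads-timedout
0 1000 40 900 0
1 500 10 480 0
"""


def packets_deferred(text: str) -> dict:
    """Return {pool: packets-deferred} computed from the other counters."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        pool = int(fields[0])
        arrived, enqueued, woken = (int(f) for f in fields[1:4])
        result[pool] = arrived - (enqueued + woken)
    return result


print(packets_deferred(SAMPLE))
```

On a live server the same function could be given the contents of the pool_stats file instead of ``SAMPLE``.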
|
||||
|
||||
|
||||
More
|
||||
----
|
||||
====
|
||||
|
||||
Descriptions of the other statistics files should go here.
|
||||
|
||||
|
||||
Greg Banks <gnb@sgi.com>
|
||||
26 Mar 2009
|
256
Documentation/filesystems/nfs/nfs41-server.rst
Normal file
@ -0,0 +1,256 @@
|
||||
=============================
|
||||
NFSv4.1 Server Implementation
|
||||
=============================
|
||||
|
||||
Server support for minorversion 1 can be controlled using the
|
||||
/proc/fs/nfsd/versions control file. The string output returned
|
||||
by reading this file will contain either "+4.1" or "-4.1"
|
||||
correspondingly.
|
||||
|
||||
Currently, server support for minorversion 1 is enabled by default.
|
||||
It can be disabled at run time by writing the string "-4.1" to
|
||||
the /proc/fs/nfsd/versions control file. Note that to write this
|
||||
control file, the nfsd service must be taken down. You can use rpc.nfsd
|
||||
for this; see rpc.nfsd(8).
|
||||
|
||||
(Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and
|
||||
"-4", respectively. Therefore, code meant to work on both new and old
|
||||
kernels must turn 4.1 on or off *before* turning support for version 4
|
||||
on or off; rpc.nfsd does this correctly.)
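The version tokens and the ordering rule from the warning can be sketched in Python. The helper names and the sample string are hypothetical, and the sketch assumes the control file holds whitespace-separated tokens such as "+4.1". It only builds strings; it does not touch the control file.

```python
def parse_versions(text: str) -> dict:
    """Map each version ('4.1') to True/False from '+4.1' / '-4.1' tokens."""
    state = {}
    for token in text.split():
        if token[0] not in "+-":
            raise ValueError(f"unexpected token: {token!r}")
        state[token[1:]] = token[0] == "+"
    return state


def toggle_string(changes: dict) -> str:
    """Build a token string with minor versions (e.g. 4.1) before major ones."""
    tokens = [("+" if on else "-") + version for version, on in changes.items()]
    # Stable sort: tokens containing '.' come first, so that an older kernel
    # reading "-4.1" as "-4" still sees the intended version 4 setting last.
    tokens.sort(key=lambda token: "." not in token)
    return " ".join(tokens)


# Hypothetical contents of the control file.
print(parse_versions("-2 +3 +4 +4.1\n"))
print(toggle_string({"4": True, "4.1": False}))
```

In practice rpc.nfsd already applies this ordering, so a helper like this is only relevant to code that writes the control file directly.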
|
||||
|
||||
The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based
|
||||
on RFC 5661.
|
||||
|
||||
Of the many new features in NFSv4.1, the current implementation
|
||||
focuses on the mandatory-to-implement NFSv4.1 Sessions, providing
|
||||
"exactly once" semantics and better control and throttling of the
|
||||
resources allocated for each client.
|
||||
|
||||
The table below, taken from the NFSv4.1 document, lists
|
||||
the operations that are mandatory to implement (REQ), optional
|
||||
(OPT), and NFSv4.0 operations that must not be implemented (MNI)
|
||||
in minor version 1. The first column indicates the operations that
|
||||
are not supported yet by the Linux server implementation.
|
||||
|
||||
The OPTIONAL features identified and their abbreviations are as follows:
|
||||
|
||||
- **pNFS** Parallel NFS
|
||||
- **FDELG** File Delegations
|
||||
- **DDELG** Directory Delegations
|
||||
|
||||
The following abbreviations indicate the Linux server implementation status.
|
||||
|
||||
- **I** Implemented NFSv4.1 operations.
|
||||
- **NS** Not Supported.
|
||||
- **NS\*** Unimplemented optional feature.
|
||||
|
||||
Operations
|
||||
==========
|
||||
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| Implementation status | Operation | REQ,REC, OPT or MNI | Feature (REQ, REC or OPT) | Definition |
|
||||
+=======================+======================+=====================+===========================+================+
|
||||
| | ACCESS | REQ | | Section 18.1 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| I | BACKCHANNEL_CTL | REQ | | Section 18.33 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| I | BIND_CONN_TO_SESSION | REQ | | Section 18.34 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | CLOSE | REQ | | Section 18.2 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | COMMIT | REQ | | Section 18.3 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | CREATE | REQ | | Section 18.4 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| I | CREATE_SESSION | REQ | | Section 18.36 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| NS* | DELEGPURGE | OPT | FDELG (REQ) | Section 18.5 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | DELEGRETURN | OPT | FDELG, | Section 18.6 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | | | DDELG, pNFS | |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | | | (REQ) | |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| I | DESTROY_CLIENTID | REQ | | Section 18.50 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| I | DESTROY_SESSION | REQ | | Section 18.37 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| I | EXCHANGE_ID | REQ | | Section 18.35 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| I | FREE_STATEID | REQ | | Section 18.38 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | GETATTR | REQ | | Section 18.7 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| I | GETDEVICEINFO | OPT | pNFS (REQ) | Section 18.40 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| NS* | GETDEVICELIST | OPT | pNFS (OPT) | Section 18.41 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | GETFH | REQ | | Section 18.8 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| NS* | GET_DIR_DELEGATION | OPT | DDELG (REQ) | Section 18.39 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| I | LAYOUTCOMMIT | OPT | pNFS (REQ) | Section 18.42 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| I | LAYOUTGET | OPT | pNFS (REQ) | Section 18.43 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| I | LAYOUTRETURN | OPT | pNFS (REQ) | Section 18.44 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | LINK | OPT | | Section 18.9 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | LOCK | REQ | | Section 18.10 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | LOCKT | REQ | | Section 18.11 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | LOCKU | REQ | | Section 18.12 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | LOOKUP | REQ | | Section 18.13 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | LOOKUPP | REQ | | Section 18.14 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | NVERIFY | REQ | | Section 18.15 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | OPEN | REQ | | Section 18.16 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| NS* | OPENATTR | OPT | | Section 18.17 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | OPEN_CONFIRM | MNI | | N/A |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | OPEN_DOWNGRADE | REQ | | Section 18.18 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | PUTFH | REQ | | Section 18.19 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | PUTPUBFH | REQ | | Section 18.20 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | PUTROOTFH | REQ | | Section 18.21 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | READ | REQ | | Section 18.22 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | READDIR | REQ | | Section 18.23 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | READLINK | OPT | | Section 18.24 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | RECLAIM_COMPLETE | REQ | | Section 18.51 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | RELEASE_LOCKOWNER | MNI | | N/A |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
| | REMOVE | REQ | | Section 18.25 |
|
||||
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
|                       | RENAME               | REQ                 |                           | Section 18.26  |
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|                       | RENEW                | MNI                 |                           | N/A            |
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|                       | RESTOREFH            | REQ                 |                           | Section 18.27  |
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|                       | SAVEFH               | REQ                 |                           | Section 18.28  |
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|                       | SECINFO              | REQ                 |                           | Section 18.29  |
+-----------------------+----------------------+---------------------+---------------------------+----------------+
| I                     | SECINFO_NO_NAME      | REC                 | pNFS files                | Section 18.45, |
|                       |                      |                     | layout (REQ)              | Section 13.12  |
+-----------------------+----------------------+---------------------+---------------------------+----------------+
| I                     | SEQUENCE             | REQ                 |                           | Section 18.46  |
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|                       | SETATTR              | REQ                 |                           | Section 18.30  |
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|                       | SETCLIENTID          | MNI                 |                           | N/A            |
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|                       | SETCLIENTID_CONFIRM  | MNI                 |                           | N/A            |
+-----------------------+----------------------+---------------------+---------------------------+----------------+
| NS                    | SET_SSV              | REQ                 |                           | Section 18.47  |
+-----------------------+----------------------+---------------------+---------------------------+----------------+
| I                     | TEST_STATEID         | REQ                 |                           | Section 18.48  |
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|                       | VERIFY               | REQ                 |                           | Section 18.31  |
+-----------------------+----------------------+---------------------+---------------------------+----------------+
| NS*                   | WANT_DELEGATION      | OPT                 | FDELG (OPT)               | Section 18.49  |
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|                       | WRITE                | REQ                 |                           | Section 18.32  |
+-----------------------+----------------------+---------------------+---------------------------+----------------+
|
||||
|
||||
|
||||
Callback Operations
===================

+-----------------------+-------------------------+----------------------+---------------------------+---------------+
| Implementation status | Operation               | REQ, REC, OPT or MNI | Feature (REQ, REC or OPT) | Definition    |
+=======================+=========================+======================+===========================+===============+
|                       | CB_GETATTR              | OPT                  | FDELG (REQ)               | Section 20.1  |
+-----------------------+-------------------------+----------------------+---------------------------+---------------+
| I                     | CB_LAYOUTRECALL         | OPT                  | pNFS (REQ)                | Section 20.3  |
+-----------------------+-------------------------+----------------------+---------------------------+---------------+
| NS*                   | CB_NOTIFY               | OPT                  | DDELG (REQ)               | Section 20.4  |
+-----------------------+-------------------------+----------------------+---------------------------+---------------+
| NS*                   | CB_NOTIFY_DEVICEID      | OPT                  | pNFS (OPT)                | Section 20.12 |
+-----------------------+-------------------------+----------------------+---------------------------+---------------+
| NS*                   | CB_NOTIFY_LOCK          | OPT                  |                           | Section 20.11 |
+-----------------------+-------------------------+----------------------+---------------------------+---------------+
| NS*                   | CB_PUSH_DELEG           | OPT                  | FDELG (OPT)               | Section 20.5  |
+-----------------------+-------------------------+----------------------+---------------------------+---------------+
|                       | CB_RECALL               | OPT                  | FDELG,                    | Section 20.2  |
|                       |                         |                      | DDELG, pNFS               |               |
|                       |                         |                      | (REQ)                     |               |
+-----------------------+-------------------------+----------------------+---------------------------+---------------+
| NS*                   | CB_RECALL_ANY           | OPT                  | FDELG,                    | Section 20.6  |
|                       |                         |                      | DDELG, pNFS               |               |
|                       |                         |                      | (REQ)                     |               |
+-----------------------+-------------------------+----------------------+---------------------------+---------------+
| NS                    | CB_RECALL_SLOT          | REQ                  |                           | Section 20.8  |
+-----------------------+-------------------------+----------------------+---------------------------+---------------+
| NS*                   | CB_RECALLABLE_OBJ_AVAIL | OPT                  | DDELG, pNFS               | Section 20.7  |
|                       |                         |                      | (REQ)                     |               |
+-----------------------+-------------------------+----------------------+---------------------------+---------------+
| I                     | CB_SEQUENCE             | OPT                  | FDELG,                    | Section 20.9  |
|                       |                         |                      | DDELG, pNFS               |               |
|                       |                         |                      | (REQ)                     |               |
+-----------------------+-------------------------+----------------------+---------------------------+---------------+
| NS*                   | CB_WANTS_CANCELLED      | OPT                  | FDELG,                    | Section 20.10 |
|                       |                         |                      | DDELG, pNFS               |               |
|                       |                         |                      | (REQ)                     |               |
+-----------------------+-------------------------+----------------------+---------------------------+---------------+
|
||||
|
||||
|
||||
Implementation notes:
=====================

SSV:
  The spec claims this is mandatory, but we don't actually know of any
  implementations, so we're ignoring it for now. The server returns
  NFS4ERR_ENCR_ALG_UNSUPP on EXCHANGE_ID, which should be future-proof.

GSS on the backchannel:
  Again, theoretically required but not widely implemented (in
  particular, the current Linux client doesn't request it). We return
  NFS4ERR_ENCR_ALG_UNSUPP on CREATE_SESSION.

DELEGPURGE:
  mandatory only for servers that support CLAIM_DELEGATE_PREV and/or
  CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that
  persist across client reboots). Thus we need not implement this for
  now.

EXCHANGE_ID:
  implementation ids are ignored

CREATE_SESSION:
  backchannel attributes are ignored

SEQUENCE:
  no support for dynamic slot table renegotiation (optional)

Nonstandard compound limitations:
  No support for a sessions fore channel RPC compound that requires both a
  ca_maxrequestsize request and a ca_maxresponsesize reply, so we may
  fail to live up to the promise we made in CREATE_SESSION fore channel
  negotiation.

See also http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues.
|
@ -1,173 +0,0 @@
|
||||
NFSv4.1 Server Implementation

Server support for minorversion 1 can be controlled using the
/proc/fs/nfsd/versions control file. The string output returned
by reading this file will contain either "+4.1" or "-4.1"
correspondingly.

Currently, server support for minorversion 1 is enabled by default.
It can be disabled at run time by writing the string "-4.1" to
the /proc/fs/nfsd/versions control file. Note that to write this
control file, the nfsd service must be taken down. You can use rpc.nfsd
for this; see rpc.nfsd(8).

(Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and
"-4", respectively. Therefore, code meant to work on both new and old
kernels must turn 4.1 on or off *before* turning support for version 4
on or off; rpc.nfsd does this correctly.)
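
As a rough illustration of the paragraphs above, the following Python sketch parses a versions string and orders tokens so that a minor version is toggled before plain version 4. The sample strings and the helper names ``parse_versions`` and ``order_for_write`` are assumptions made for this example; they are not part of nfsd or rpc.nfsd, and the exact set of tokens in the real control file may differ.

```python
def parse_versions(text):
    """Map each version token (e.g. '4.1') to True for '+' or False for '-'."""
    states = {}
    for token in text.split():
        if token[0] in "+-":
            states[token[1:]] = (token[0] == "+")
    return states


def order_for_write(tokens):
    """Put minor-version tokens (containing a dot) before plain versions.

    Older kernels read "+4.1"/"-4.1" as "+4"/"-4", so the 4.1 setting is
    written first and the plain version 4 setting last.
    """
    # sorted() is stable, so tokens keep their relative order within a group
    return sorted(tokens, key=lambda tok: 0 if "." in tok else 1)


if __name__ == "__main__":
    sample = "-2 +3 +4 +4.1"          # hypothetical file content
    print(parse_versions(sample)["4.1"])
    print(order_for_write(["-4", "-4.1"]))
```

The sketch only manipulates strings; actually writing the control file still requires the nfsd service to be down, as noted above.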
|
||||
|
||||
The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based
on RFC 5661.

From the many new features in NFSv4.1 the current implementation
focuses on the mandatory-to-implement NFSv4.1 Sessions, providing
"exactly once" semantics and better control and throttling of the
resources allocated for each client.

The table below, taken from the NFSv4.1 document, lists
the operations that are mandatory to implement (REQ), optional
(OPT), and NFSv4.0 operations that must not be implemented (MNI)
in minor version 1. The first column indicates the operations that
are not supported yet by the Linux server implementation.

The OPTIONAL features identified and their abbreviations are as follows:
    pNFS    Parallel NFS
    FDELG   File Delegations
    DDELG   Directory Delegations

The following abbreviations indicate the Linux server implementation status.
    I       Implemented NFSv4.1 operations.
    NS      Not Supported.
    NS*     Unimplemented optional feature.
|
||||
Operations

   +----------------------+------------+--------------+----------------+
   | Operation            | REQ, REC,  | Feature      | Definition     |
   |                      | OPT, or    | (REQ, REC,   |                |
   |                      | MNI        | or OPT)      |                |
   +----------------------+------------+--------------+----------------+
   | ACCESS               | REQ        |              | Section 18.1   |
I  | BACKCHANNEL_CTL      | REQ        |              | Section 18.33  |
I  | BIND_CONN_TO_SESSION | REQ        |              | Section 18.34  |
   | CLOSE                | REQ        |              | Section 18.2   |
   | COMMIT               | REQ        |              | Section 18.3   |
   | CREATE               | REQ        |              | Section 18.4   |
I  | CREATE_SESSION       | REQ        |              | Section 18.36  |
NS*| DELEGPURGE           | OPT        | FDELG (REQ)  | Section 18.5   |
   | DELEGRETURN          | OPT        | FDELG,       | Section 18.6   |
   |                      |            | DDELG, pNFS  |                |
   |                      |            | (REQ)        |                |
I  | DESTROY_CLIENTID     | REQ        |              | Section 18.50  |
I  | DESTROY_SESSION      | REQ        |              | Section 18.37  |
I  | EXCHANGE_ID          | REQ        |              | Section 18.35  |
I  | FREE_STATEID         | REQ        |              | Section 18.38  |
   | GETATTR              | REQ        |              | Section 18.7   |
I  | GETDEVICEINFO        | OPT        | pNFS (REQ)   | Section 18.40  |
NS*| GETDEVICELIST        | OPT        | pNFS (OPT)   | Section 18.41  |
   | GETFH                | REQ        |              | Section 18.8   |
NS*| GET_DIR_DELEGATION   | OPT        | DDELG (REQ)  | Section 18.39  |
I  | LAYOUTCOMMIT         | OPT        | pNFS (REQ)   | Section 18.42  |
I  | LAYOUTGET            | OPT        | pNFS (REQ)   | Section 18.43  |
I  | LAYOUTRETURN         | OPT        | pNFS (REQ)   | Section 18.44  |
   | LINK                 | OPT        |              | Section 18.9   |
   | LOCK                 | REQ        |              | Section 18.10  |
   | LOCKT                | REQ        |              | Section 18.11  |
   | LOCKU                | REQ        |              | Section 18.12  |
   | LOOKUP               | REQ        |              | Section 18.13  |
   | LOOKUPP              | REQ        |              | Section 18.14  |
   | NVERIFY              | REQ        |              | Section 18.15  |
   | OPEN                 | REQ        |              | Section 18.16  |
NS*| OPENATTR             | OPT        |              | Section 18.17  |
   | OPEN_CONFIRM         | MNI        |              | N/A            |
   | OPEN_DOWNGRADE       | REQ        |              | Section 18.18  |
   | PUTFH                | REQ        |              | Section 18.19  |
   | PUTPUBFH             | REQ        |              | Section 18.20  |
   | PUTROOTFH            | REQ        |              | Section 18.21  |
   | READ                 | REQ        |              | Section 18.22  |
   | READDIR              | REQ        |              | Section 18.23  |
   | READLINK             | OPT        |              | Section 18.24  |
   | RECLAIM_COMPLETE     | REQ        |              | Section 18.51  |
   | RELEASE_LOCKOWNER    | MNI        |              | N/A            |
   | REMOVE               | REQ        |              | Section 18.25  |
   | RENAME               | REQ        |              | Section 18.26  |
   | RENEW                | MNI        |              | N/A            |
   | RESTOREFH            | REQ        |              | Section 18.27  |
   | SAVEFH               | REQ        |              | Section 18.28  |
   | SECINFO              | REQ        |              | Section 18.29  |
I  | SECINFO_NO_NAME      | REC        | pNFS files   | Section 18.45, |
   |                      |            | layout (REQ) | Section 13.12  |
I  | SEQUENCE             | REQ        |              | Section 18.46  |
   | SETATTR              | REQ        |              | Section 18.30  |
   | SETCLIENTID          | MNI        |              | N/A            |
   | SETCLIENTID_CONFIRM  | MNI        |              | N/A            |
NS | SET_SSV              | REQ        |              | Section 18.47  |
I  | TEST_STATEID         | REQ        |              | Section 18.48  |
   | VERIFY               | REQ        |              | Section 18.31  |
NS*| WANT_DELEGATION      | OPT        | FDELG (OPT)  | Section 18.49  |
   | WRITE                | REQ        |              | Section 18.32  |
|
||||
|
||||
Callback Operations

   +-------------------------+-----------+-------------+---------------+
   | Operation               | REQ, REC, | Feature     | Definition    |
   |                         | OPT, or   | (REQ, REC,  |               |
   |                         | MNI       | or OPT)     |               |
   +-------------------------+-----------+-------------+---------------+
   | CB_GETATTR              | OPT       | FDELG (REQ) | Section 20.1  |
I  | CB_LAYOUTRECALL         | OPT       | pNFS (REQ)  | Section 20.3  |
NS*| CB_NOTIFY               | OPT       | DDELG (REQ) | Section 20.4  |
NS*| CB_NOTIFY_DEVICEID      | OPT       | pNFS (OPT)  | Section 20.12 |
NS*| CB_NOTIFY_LOCK          | OPT       |             | Section 20.11 |
NS*| CB_PUSH_DELEG           | OPT       | FDELG (OPT) | Section 20.5  |
   | CB_RECALL               | OPT       | FDELG,      | Section 20.2  |
   |                         |           | DDELG, pNFS |               |
   |                         |           | (REQ)       |               |
NS*| CB_RECALL_ANY           | OPT       | FDELG,      | Section 20.6  |
   |                         |           | DDELG, pNFS |               |
   |                         |           | (REQ)       |               |
NS | CB_RECALL_SLOT          | REQ       |             | Section 20.8  |
NS*| CB_RECALLABLE_OBJ_AVAIL | OPT       | DDELG, pNFS | Section 20.7  |
   |                         |           | (REQ)       |               |
I  | CB_SEQUENCE             | OPT       | FDELG,      | Section 20.9  |
   |                         |           | DDELG, pNFS |               |
   |                         |           | (REQ)       |               |
NS*| CB_WANTS_CANCELLED      | OPT       | FDELG,      | Section 20.10 |
   |                         |           | DDELG, pNFS |               |
   |                         |           | (REQ)       |               |
   +-------------------------+-----------+-------------+---------------+
|
||||
|
||||
Implementation notes:

SSV:
* The spec claims this is mandatory, but we don't actually know of any
  implementations, so we're ignoring it for now. The server returns
  NFS4ERR_ENCR_ALG_UNSUPP on EXCHANGE_ID, which should be future-proof.

GSS on the backchannel:
* Again, theoretically required but not widely implemented (in
  particular, the current Linux client doesn't request it). We return
  NFS4ERR_ENCR_ALG_UNSUPP on CREATE_SESSION.

DELEGPURGE:
* mandatory only for servers that support CLAIM_DELEGATE_PREV and/or
  CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that
  persist across client reboots). Thus we need not implement this for
  now.

EXCHANGE_ID:
* implementation ids are ignored

CREATE_SESSION:
* backchannel attributes are ignored

SEQUENCE:
* no support for dynamic slot table renegotiation (optional)

Nonstandard compound limitations:
* No support for a sessions fore channel RPC compound that requires both a
  ca_maxrequestsize request and a ca_maxresponsesize reply, so we may
  fail to live up to the promise we made in CREATE_SESSION fore channel
  negotiation.

See also http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues.
|
@ -1,4 +1,5 @@
|
||||
Reference counting in pnfs:
|
||||
==========================
|
||||
Reference counting in pnfs
|
||||
==========================
|
||||
|
||||
There are several inter-related caches. We have layouts which can
|
||||
@ -9,7 +10,8 @@ we need to reference count.
|
||||
|
||||
|
||||
struct pnfs_layout_hdr
|
||||
----------------------
|
||||
======================
|
||||
|
||||
The on-the-wire command LAYOUTGET corresponds to struct
|
||||
pnfs_layout_segment, usually referred to by the variable name lseg.
|
||||
Each nfs_inode may hold a pointer to a cache of these layout
|
||||
@ -25,7 +27,8 @@ the reference count, as the layout is kept around by the lseg that
|
||||
keeps it in the list.
|
||||
|
||||
deviceid_cache
|
||||
--------------
|
||||
==============
|
||||
|
||||
lsegs reference device ids, which are resolved per nfs_client and
|
||||
layout driver type. The device ids are held in a RCU cache (struct
|
||||
nfs4_deviceid_cache). The cache itself is referenced across each
|
||||
@ -38,24 +41,26 @@ justification, but seems reasonable given that we can have multiple
|
||||
deviceid's per filesystem, and multiple filesystems per nfs_client.
|
||||
|
||||
The hash code is copied from the nfsd code base. A discussion of
|
||||
hashing and variations of this algorithm can be found at:
|
||||
http://groups.google.com/group/comp.lang.c/browse_thread/thread/9522965e2b8d3809
|
||||
hashing and variations of this algorithm can be found `here
<http://groups.google.com/group/comp.lang.c/browse_thread/thread/9522965e2b8d3809>`_.
|
||||
|
||||
data server cache
|
||||
-----------------
|
||||
=================
|
||||
|
||||
file driver devices refer to data servers, which are kept in a module
|
||||
level cache. Its reference is held over the lifetime of the deviceid
|
||||
pointing to it.
|
||||
|
||||
lseg
|
||||
----
|
||||
====
|
||||
|
||||
lseg maintains an extra reference corresponding to the NFS_LSEG_VALID
|
||||
bit which holds it in the pnfs_layout_hdr's list. When the final lseg
|
||||
is removed from the pnfs_layout_hdr's list, the NFS_LAYOUT_DESTROYED
|
||||
bit is set, preventing any new lsegs from being added.
|
||||
|
||||
layout drivers
|
||||
--------------
|
||||
==============
|
||||
|
||||
pNFS utilizes what are called layout drivers. The STD defines 4 basic
|
||||
layout types: "files", "objects", "blocks", and "flexfiles". For each
|
||||
@ -68,6 +73,6 @@ Blocks-layout-driver code is in: fs/nfs/blocklayout/.. directory
|
||||
Flexfiles-layout-driver code is in: fs/nfs/flexfilelayout/.. directory
|
||||
|
||||
blocks-layout setup
|
||||
-------------------
|
||||
===================
|
||||
|
||||
TODO: Document the setup needs of the blocks layout driver
|
@ -1,9 +1,14 @@
|
||||
This document gives a brief introduction to the caching
|
||||
=========
|
||||
RPC Cache
|
||||
=========
|
||||
|
||||
This document gives a brief introduction to the caching
|
||||
mechanisms in the sunrpc layer that is used, in particular,
|
||||
for NFS authentication.
|
||||
|
||||
CACHES
|
||||
Caches
|
||||
======
|
||||
|
||||
The caching replaces the old exports table and allows for
|
||||
a wide variety of values to be cached.
|
||||
|
||||
@ -12,6 +17,7 @@ quite possibly very different in content and use. There is a corpus
|
||||
of common code for managing these caches.
|
||||
|
||||
Examples of caches that are likely to be needed are:
|
||||
|
||||
- mapping from IP address to client name
|
||||
- mapping from client name and filesystem to export options
|
||||
- mapping from UID to list of GIDs, to work around NFS's limitation
|
||||
@ -21,6 +27,7 @@ Examples of caches that are likely to be needed are:
|
||||
- mapping from network identity to public key for crypto authentication.
|
||||
|
||||
The common code handles such things as:
|
||||
|
||||
- general cache lookup with correct locking
|
||||
- supporting 'NEGATIVE' as well as positive entries
|
||||
- allowing an EXPIRED time on cache items, and removing
|
||||
@ -35,23 +42,25 @@ The common code handles such things as:
|
||||
Creating a Cache
|
||||
----------------
|
||||
|
||||
1/ A cache needs a datum to store. This is in the form of a
|
||||
structure definition that must contain a
|
||||
struct cache_head
|
||||
- A cache needs a datum to store. This is in the form of a
|
||||
structure definition that must contain a struct cache_head
|
||||
as an element, usually the first.
|
||||
It will also contain a key and some content.
|
||||
Each cache element is reference counted and contains
|
||||
expiry and update times for use in cache management.
|
||||
2/ A cache needs a "cache_detail" structure that
|
||||
- A cache needs a "cache_detail" structure that
|
||||
describes the cache. This stores the hash table, some
|
||||
parameters for cache management, and some operations detailing how
|
||||
to work with particular cache items.
|
||||
The operations requires are:
|
||||
struct cache_head *alloc(void)
|
||||
|
||||
The operations are:
|
||||
|
||||
struct cache_head \*alloc(void)
|
||||
This simply allocates appropriate memory and returns
|
||||
a pointer to the cache_head embedded within the
|
||||
structure
|
||||
void cache_put(struct kref *)
|
||||
|
||||
void cache_put(struct kref \*)
|
||||
This is called when the last reference to an item is
|
||||
dropped. The pointer passed is to the 'ref' field
|
||||
in the cache_head. cache_put should release any
|
||||
@ -59,28 +68,32 @@ Creating a Cache
|
||||
is set, any references created by cache_update.
|
||||
It should then release the memory allocated by
|
||||
'alloc'.
|
||||
int match(struct cache_head *orig, struct cache_head *new)
|
||||
|
||||
int match(struct cache_head \*orig, struct cache_head \*new)
|
||||
test if the keys in the two structures match. Return
|
||||
1 if they do, 0 if they don't.
|
||||
void init(struct cache_head *orig, struct cache_head *new)
|
||||
|
||||
void init(struct cache_head \*orig, struct cache_head \*new)
|
||||
Set the 'key' fields in 'new' from 'orig'. This may
|
||||
include taking references to shared objects.
|
||||
void update(struct cache_head *orig, struct cache_head *new)
|
||||
|
||||
void update(struct cache_head \*orig, struct cache_head \*new)
|
||||
Set the 'content' fields in 'new' from 'orig'.
|
||||
int cache_show(struct seq_file *m, struct cache_detail *cd,
|
||||
struct cache_head *h)
|
||||
|
||||
int cache_show(struct seq_file \*m, struct cache_detail \*cd, struct cache_head \*h)
|
||||
Optional. Used to provide a /proc file that lists the
|
||||
contents of a cache. This should show one item,
|
||||
usually on just one line.
|
||||
int cache_request(struct cache_detail *cd, struct cache_head *h,
|
||||
char **bpp, int *blen)
|
||||
|
||||
int cache_request(struct cache_detail \*cd, struct cache_head \*h, char \*\*bpp, int \*blen)
|
||||
Format a request to be sent to user-space for an item
|
||||
to be instantiated. *bpp is a buffer of size *blen.
|
||||
to be instantiated. \*bpp is a buffer of size \*blen.
|
||||
bpp should be moved forward over the encoded message,
|
||||
and *blen should be reduced to show how much free
|
||||
and \*blen should be reduced to show how much free
|
||||
space remains. Return 0 on success or <0 if not
|
||||
enough room or other problem.
|
||||
int cache_parse(struct cache_detail *cd, char *buf, int len)
|
||||
|
||||
int cache_parse(struct cache_detail \*cd, char \*buf, int len)
|
||||
A message from user space has arrived to fill out a
|
||||
cache entry. It is in 'buf' of length 'len'.
|
||||
cache_parse should parse this, find the item in the
|
||||
@ -88,7 +101,7 @@ Creating a Cache
|
||||
with sunrpc_cache_update.
|
||||
|
||||
|
||||
3/ A cache needs to be registered using cache_register(). This
|
||||
- A cache needs to be registered using cache_register(). This
|
||||
includes it on a list of caches that will be regularly
|
||||
cleaned to discard old data.
|
||||
|
||||
@ -107,7 +120,7 @@ cache_check will return -ENOENT in the entry is negative or if an up
|
||||
call is needed but not possible, -EAGAIN if an upcall is pending,
|
||||
or 0 if the data is valid;
|
||||
|
||||
cache_check can be passed a "struct cache_req *". This structure is
|
||||
cache_check can be passed a "struct cache_req\*". This structure is
|
||||
typically embedded in the actual request and can be used to create a
|
||||
deferred copy of the request (struct cache_deferred_req). This is
|
||||
done when the found cache item is not up to date, but there is reason to
|
||||
@ -139,9 +152,11 @@ The 'channel' works a bit like a datagram socket. Each 'write' is
|
||||
passed as a whole to the cache for parsing and interpretation.
|
||||
Each cache can treat the write requests differently, but it is
|
||||
expected that a message written will contain:
|
||||
|
||||
- a key
|
||||
- an expiry time
|
||||
- a content.
|
||||
|
||||
with the intention that an item in the cache with the given key
|
||||
should be created or updated to have the given content, and the
|
||||
expiry time should be set on that item.
|
||||
@ -156,7 +171,8 @@ If there are no more requests to return, read will return EOF, but a
|
||||
select or poll for read will block waiting for another request to be
|
||||
added.
|
||||
|
||||
Thus a user-space helper is likely to:
|
||||
Thus a user-space helper is likely to::
|
||||
|
||||
open the channel.
|
||||
select for readable
|
||||
read a request
|
||||
@ -175,12 +191,13 @@ Each cache should also define a "cache_request" method which
|
||||
takes a cache item and encodes a request into the buffer
|
||||
provided.
|
||||
|
||||
Note: If a cache has no active readers on the channel, and has had no
|
||||
active readers for more than 60 seconds, further requests will not be
|
||||
added to the channel but instead all lookups that do not find a valid
|
||||
entry will fail. This is partly for backward compatibility: The
|
||||
previous nfs exports table was deemed to be authoritative and a
|
||||
failed lookup meant a definite 'no'.
|
||||
.. note::
|
||||
If a cache has no active readers on the channel, and has had no
|
||||
active readers for more than 60 seconds, further requests will not be
|
||||
added to the channel but instead all lookups that do not find a valid
|
||||
entry will fail. This is partly for backward compatibility: The
|
||||
previous nfs exports table was deemed to be authoritative and a
|
||||
failed lookup meant a definite 'no'.
|
||||
|
||||
request/response format
|
||||
-----------------------
|
||||
@ -193,10 +210,11 @@ with precisely one newline character which should be at the end.
|
||||
Fields within the record should be separated by spaces, normally one.
|
||||
If spaces, newlines, or nul characters are needed in a field they
|
||||
must be quoted. Two mechanisms are available:
|
||||
1/ If a field begins '\x' then it must contain an even number of
|
||||
|
||||
- If a field begins '\x' then it must contain an even number of
|
||||
hex digits, and pairs of these digits provide the bytes in the
|
||||
field.
|
||||
2/ otherwise a \ in the field must be followed by 3 octal digits
|
||||
- otherwise a \ in the field must be followed by 3 octal digits
|
||||
which give the code for a byte. Other characters are treated
|
||||
as themselves. At the very least, space, newline, nul, and
|
||||
'\' must be quoted in this way.
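
The two quoting mechanisms described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the kernel's or nfs-utils' implementation: the function names ``quote_octal``, ``quote_hex`` and ``unquote`` are hypothetical, and only the four characters the text names as mandatory (space, newline, nul and backslash) are octal-quoted.

```python
def quote_octal(field: bytes) -> str:
    """Quote a field using backslash plus three octal digits per special byte."""
    special = (0x20, 0x0A, 0x00, 0x5C)  # space, newline, nul, backslash
    out = []
    for b in field:
        if b in special:
            out.append("\\%03o" % b)
        else:
            out.append(chr(b))  # other bytes are passed through unchanged
    return "".join(out)


def quote_hex(field: bytes) -> str:
    """Quote a whole field as '\\x' followed by pairs of hex digits."""
    return "\\x" + field.hex()


def unquote(field: str) -> bytes:
    """Decode a field written with either mechanism."""
    if field.startswith("\\x"):
        return bytes.fromhex(field[2:])  # requires an even number of digits
    out = bytearray()
    i = 0
    while i < len(field):
        if field[i] == "\\":
            out.append(int(field[i + 1:i + 4], 8))  # three octal digits
            i += 4
        else:
            out.append(ord(field[i]))
            i += 1
    return bytes(out)


if __name__ == "__main__":
    raw = b"my client\\name"
    print(quote_octal(raw))
    print(quote_hex(raw))
    assert unquote(quote_octal(raw)) == raw
    assert unquote(quote_hex(raw)) == raw
```

Because an octal-quoted field always encodes a literal backslash as ``\134``, it can never begin with ``\x``, so the decoder can tell the two forms apart by the prefix alone.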
|
@ -1,4 +1,4 @@
|
||||
|
||||
=========================================
|
||||
rpcsec_gss support for kernel RPC servers
|
||||
=========================================
|
||||
|
||||
@ -9,14 +9,17 @@ NFSv4.1 and higher don't require the client to act as a server for the
|
||||
purposes of authentication.)
|
||||
|
||||
RPCGSS is specified in a few IETF documents:
|
||||
|
||||
- RFC2203 v1: http://tools.ietf.org/rfc/rfc2203.txt
|
||||
- RFC5403 v2: http://tools.ietf.org/rfc/rfc5403.txt
|
||||
|
||||
and there is a 3rd version being proposed:
|
||||
|
||||
- http://tools.ietf.org/id/draft-williams-rpcsecgssv3.txt
|
||||
(At draft n. 02 at the time of writing)
|
||||
|
||||
Background
|
||||
----------
|
||||
==========
|
||||
|
||||
The RPCGSS Authentication method describes a way to perform GSSAPI
|
||||
Authentication for NFS. Although GSSAPI is itself completely mechanism
|
||||
@ -29,6 +32,7 @@ depends on GSSAPI extensions that are KRB5 specific.
|
||||
GSSAPI is a complex library, and implementing it completely in kernel is
|
||||
unwarranted. However, GSSAPI operations are fundamentally separable into 2
|
||||
parts:
|
||||
|
||||
- initial context establishment
|
||||
- integrity/privacy protection (signing and encrypting of individual
|
||||
packets)
|
||||
@ -41,7 +45,7 @@ kernel, but leave the initial context establishment to userspace. We
|
||||
need upcalls to request userspace to perform context establishment.
|
||||
|
||||
NFS Server Legacy Upcall Mechanism
|
||||
----------------------------------
|
||||
==================================
|
||||
|
||||
The classic upcall mechanism uses a custom text based upcall mechanism
|
||||
to talk to a custom daemon called rpc.svcgssd that is provided by the
|
||||
@ -62,21 +66,20 @@ groups) due to limitation on the size of the buffer that can be send
|
||||
back to the kernel (4KiB).
|
||||
|
||||
NFS Server New RPC Upcall Mechanism
|
||||
-----------------------------------
|
||||
===================================
|
||||
|
||||
The newer upcall mechanism uses RPC over a unix socket to a daemon
|
||||
called gss-proxy, implemented by a userspace program called Gssproxy.
|
||||
|
||||
The gss_proxy RPC protocol is currently documented here:
|
||||
|
||||
https://fedorahosted.org/gss-proxy/wiki/ProtocolDocumentation
|
||||
The gss_proxy RPC protocol is currently documented `here
|
||||
<https://fedorahosted.org/gss-proxy/wiki/ProtocolDocumentation>`_.
|
||||
|
||||
This upcall mechanism uses the kernel rpc client and connects to the gssproxy
|
||||
userspace program over a regular unix socket. The gssproxy protocol does not
|
||||
suffer from the size limitations of the legacy protocol.
|
||||
|
||||
Negotiating Upcall Mechanisms
|
||||
-----------------------------
|
||||
=============================
|
||||
|
||||
To provide backward compatibility, the kernel defaults to using the
|
||||
legacy mechanism. To switch to the new mechanism, gss-proxy must bind
|
@ -1,5 +1,8 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
======
|
||||
NILFS2
|
||||
------
|
||||
======
|
||||
|
||||
NILFS2 is a log-structured file system (LFS) supporting continuous
|
||||
snapshotting. In addition to versioning capability of the entire file
|
||||
@ -25,9 +28,9 @@ available from the following download page. At least "mkfs.nilfs2",
|
||||
cleaner or garbage collector) are required. Details on the tools are
|
||||
described in the man pages included in the package.
|
||||
|
||||
Project web page: https://nilfs.sourceforge.io/
|
||||
Download page: https://nilfs.sourceforge.io/en/download.html
|
||||
List info: http://vger.kernel.org/vger-lists.html#linux-nilfs
|
||||
:Project web page: https://nilfs.sourceforge.io/
|
||||
:Download page: https://nilfs.sourceforge.io/en/download.html
|
||||
:List info: http://vger.kernel.org/vger-lists.html#linux-nilfs
|
||||
|
||||
Caveats
|
||||
=======
|
||||
@ -47,6 +50,7 @@ Mount options
|
||||
NILFS2 supports the following mount options:
|
||||
(*) == default
|
||||
|
||||
======================= =======================================================
|
||||
barrier(*) This enables/disables the use of write barriers. This
|
||||
nobarrier requires an IO stack which can support barriers, and
|
||||
if nilfs gets an error on a barrier write, it will
|
||||
@ -79,6 +83,7 @@ discard This enables/disables the use of discard/TRIM commands.
|
||||
nodiscard(*) The discard/TRIM commands are sent to the underlying
|
||||
block device when blocks are freed. This is useful
|
||||
for SSD devices and sparse/thinly-provisioned LUNs.
|
||||
======================= =======================================================
|
||||
|
||||
Ioctls
|
||||
======
|
||||
@ -87,9 +92,11 @@ There is some NILFS2 specific functionality which can be accessed by application
|
||||
through the system call interfaces. The list of all NILFS2 specific ioctls is
|
||||
shown in the table below.
|
||||
|
||||
Table of NILFS2 specific ioctls
|
||||
..............................................................................
|
||||
Table of NILFS2 specific ioctls:
|
||||
|
||||
============================== ===============================================
|
||||
Ioctl Description
|
||||
============================== ===============================================
|
||||
NILFS_IOCTL_CHANGE_CPMODE Change mode of given checkpoint between
|
||||
checkpoint and snapshot state. This ioctl is
|
||||
used in chcp and mkcp utilities.
|
||||
@ -142,11 +149,12 @@ Table of NILFS2 specific ioctls
|
||||
NILFS_IOCTL_SET_ALLOC_RANGE Define lower limit of segments in bytes and
|
||||
upper limit of segments in bytes. This ioctl
|
||||
is used by nilfs_resize utility.
|
||||
============================== ===============================================
|
||||
|
||||
NILFS2 usage
|
||||
============
|
||||
|
||||
To use nilfs2 as a local file system, simply:
|
||||
To use nilfs2 as a local file system, simply::
|
||||
|
||||
# mkfs -t nilfs2 /dev/block_device
|
||||
# mount -t nilfs2 /dev/block_device /dir
|
||||
@ -157,18 +165,20 @@ This will also invoke the cleaner through the mount helper program
|
||||
Checkpoints and snapshots are managed by the following commands.
|
||||
Their manpages are included in the nilfs-utils package above.
|
||||
|
||||
==== ===========================================================
|
||||
lscp list checkpoints or snapshots.
|
||||
mkcp make a checkpoint or a snapshot.
|
||||
chcp change an existing checkpoint to a snapshot or vice versa.
|
||||
rmcp invalidate specified checkpoint(s).
|
||||
==== ===========================================================
|
||||
|
||||
To mount a snapshot,
|
||||
To mount a snapshot::
|
||||
|
||||
# mount -t nilfs2 -r -o cp=<cno> /dev/block_device /snap_dir
|
||||
|
||||
where <cno> is the checkpoint number of the snapshot.
|
||||
|
||||
To unmount the NILFS2 mount point or snapshot, simply:
|
||||
To unmount the NILFS2 mount point or snapshot, simply::
|
||||
|
||||
# umount /dir
|
||||
|
||||
@ -181,7 +191,7 @@ Disk format
|
||||
A nilfs2 volume is equally divided into a number of segments except
|
||||
for the super block (SB) and segment #0. A segment is the container
|
||||
of logs. Each log is composed of summary information blocks, payload
|
||||
blocks, and an optional super root block (SR):
|
||||
blocks, and an optional super root block (SR)::
|
||||
|
||||
______________________________________________________
|
||||
| |SB| | Segment | Segment | Segment | ... | Segment | |
|
||||
@ -200,7 +210,7 @@ blocks, and an optional super root block (SR):
|
||||
|_blocks__|_________________|__|
|
||||
|
||||
The payload blocks are organized per file, and each file consists of
|
||||
data blocks and B-tree node blocks:
|
||||
data blocks and B-tree node blocks::
|
||||
|
||||
|<--- File-A --->|<--- File-B --->|
|
||||
_______________________________________________________________
|
||||
@ -213,7 +223,7 @@ files without data blocks or B-tree node blocks.
|
||||
|
||||
The organization of the blocks is recorded in the summary information
|
||||
blocks, which contains a header structure (nilfs_segment_summary), per
|
||||
file structures (nilfs_finfo), and per block structures (nilfs_binfo):
|
||||
file structures (nilfs_finfo), and per block structures (nilfs_binfo)::
|
||||
|
||||
_________________________________________________________________________
|
||||
| Summary | finfo | binfo | ... | binfo | finfo | binfo | ... | binfo |...
|
||||
@ -223,7 +233,7 @@ file structures (nilfs_finfo), and per block structures (nilfs_binfo):
|
||||
The logs include regular files, directory files, symbolic link files
|
||||
and several meta data files. The meta data files are the files used
|
||||
to maintain file system meta data. The current version of NILFS2 uses
|
||||
the following meta data files:
|
||||
the following meta data files::
|
||||
|
||||
1) Inode file (ifile) -- Stores on-disk inodes
|
||||
2) Checkpoint file (cpfile) -- Stores checkpoints
|
||||
@ -232,7 +242,7 @@ the following meta data files:
|
||||
(DAT) block numbers. This file serves to
|
||||
make on-disk blocks relocatable.
|
||||
|
||||
The following figure shows a typical organization of the logs:
|
||||
The following figure shows a typical organization of the logs::
|
||||
|
||||
_________________________________________________________________________
|
||||
| Summary | regular file | file | ... | ifile | cpfile | sufile | DAT |SR|
|
||||
@ -250,7 +260,7 @@ three special inodes, inodes for the DAT, cpfile, and sufile. Inodes
|
||||
of regular files, directories, symlinks and other special files, are
|
||||
included in the ifile. The inode of ifile itself is included in the
|
||||
corresponding checkpoint entry in the cpfile. Thus, the hierarchy
|
||||
among NILFS2 files can be depicted as follows:
|
||||
among NILFS2 files can be depicted as follows::
|
||||
|
||||
Super block (SB)
|
||||
|
|
@ -1,16 +1,18 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
================================
|
||||
The Linux NTFS filesystem driver
|
||||
================================
|
||||
|
||||
|
||||
Table of contents
|
||||
=================
|
||||
.. Table of contents
|
||||
|
||||
- Overview
|
||||
- Web site
|
||||
- Features
|
||||
- Supported mount options
|
||||
- Known bugs and (mis-)features
|
||||
- Using NTFS volume and stripe sets
|
||||
- Overview
|
||||
- Web site
|
||||
- Features
|
||||
- Supported mount options
|
||||
- Known bugs and (mis-)features
|
||||
- Using NTFS volume and stripe sets
|
||||
- The Device-Mapper driver
|
||||
- The Software RAID / MD driver
|
||||
- Limitations when using the MD driver
|
||||
@ -66,8 +68,10 @@ Features
|
||||
partition by creating a large file while in Windows and then loopback
|
||||
mounting the file while in Linux and creating a Linux filesystem on it that
|
||||
is used to install Linux on it.
|
||||
- A comparison of the two drivers using:
|
||||
- A comparison of the two drivers using::
|
||||
|
||||
time find . -type f -exec md5sum "{}" \;
|
||||
|
||||
run three times in sequence with each driver (after a reboot) on a 1.4GiB
|
||||
NTFS partition, showed the new driver to be 20% faster in total time elapsed
|
||||
(from 9:43 minutes on average down to 7:53). The time spent in user space
|
||||
@ -104,6 +108,7 @@ In addition to the generic mount options described by the manual page for the
|
||||
mount command (man 8 mount, also see man 5 fstab), the NTFS driver supports the
|
||||
following mount options:
|
||||
|
||||
======================= =======================================================
|
||||
iocharset=name Deprecated option. Still supported but please use
|
||||
nls=name in the future. See description for nls=name.
|
||||
|
||||
@ -175,16 +180,22 @@ disable_sparse=<BOOL> If disable_sparse is specified, creation of sparse
|
||||
|
||||
errors=opt What to do when critical filesystem errors are found.
|
||||
Following values can be used for "opt":
|
||||
continue: DEFAULT, try to clean-up as much as
|
||||
|
||||
======== =========================================
|
||||
continue DEFAULT, try to clean-up as much as
|
||||
possible, e.g. marking a corrupt inode as
|
||||
bad so it is no longer accessed, and then
|
||||
continue.
|
||||
recover: At present only supported is recovery of
|
||||
recover At present only supported is recovery of
|
||||
the boot sector from the backup copy.
|
||||
If read-only mount, the recovery is done
|
||||
in memory only and not written to disk.
|
||||
Note that the options are additive, i.e. specifying:
|
||||
======== =========================================
|
||||
|
||||
Note that the options are additive, i.e. specifying::
|
||||
|
||||
errors=continue,errors=recover
|
||||
|
||||
means the driver will attempt to recover and if that
|
||||
fails it will clean-up as much as possible and
|
||||
continue.
|
||||
@ -202,12 +213,18 @@ mft_zone_multiplier= Set the MFT zone multiplier for the volume (this
|
||||
In general use the default. If you have a lot of small
|
||||
files then use a higher value. The values have the
|
||||
following meaning:
|
||||
|
||||
===== =================================
|
||||
Value MFT zone size (% of volume size)
|
||||
===== =================================
|
||||
1 12.5%
|
||||
2 25%
|
||||
3 37.5%
|
||||
4 50%
|
||||
===== =================================
|
||||
|
||||
Note this option is irrelevant for read-only mounts.
|
||||
======================= =======================================================
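
The value-to-size mapping for mft_zone_multiplier can be read as "each step adds one eighth of the volume". The following is a minimal sketch of that arithmetic only; the function names are hypothetical and are not part of the NTFS driver.

```python
def mft_zone_percent(multiplier: int) -> float:
    """Return the MFT zone size as a percentage of the volume size."""
    # Only the four values listed in the table are accepted here.
    if multiplier not in (1, 2, 3, 4):
        raise ValueError("mft_zone_multiplier must be 1, 2, 3 or 4")
    return 12.5 * multiplier


def mft_zone_bytes(volume_bytes: int, multiplier: int) -> int:
    """Return the approximate MFT zone size in bytes (12.5% is 1/8)."""
    mft_zone_percent(multiplier)  # reuse the range check
    return volume_bytes * multiplier // 8
```

For instance, on a volume of 8 GiB a multiplier of 2 corresponds to a zone of roughly 2 GiB.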
|
||||
|
||||
|
||||
Known bugs and (mis-)features
|
||||
@ -252,13 +269,13 @@ To create the table describing your volume you will need to know each of its
|
||||
components and their sizes in sectors, i.e. multiples of 512-byte blocks.
|
||||
|
||||
For NT4 fault tolerant volumes you can obtain the sizes using fdisk. So for
|
||||
example if one of your partitions is /dev/hda2 you would do:
|
||||
example if one of your partitions is /dev/hda2 you would do::
|
||||
|
||||
$ fdisk -ul /dev/hda
|
||||
$ fdisk -ul /dev/hda
|
||||
|
||||
Disk /dev/hda: 81.9 GB, 81964302336 bytes
|
||||
255 heads, 63 sectors/track, 9964 cylinders, total 160086528 sectors
|
||||
Units = sectors of 1 * 512 = 512 bytes
|
||||
Disk /dev/hda: 81.9 GB, 81964302336 bytes
|
||||
255 heads, 63 sectors/track, 9964 cylinders, total 160086528 sectors
|
||||
Units = sectors of 1 * 512 = 512 bytes
|
||||
|
||||
Device Boot Start End Blocks Id System
|
||||
/dev/hda1 * 63 4209029 2104483+ 83 Linux
|
||||
@ -271,15 +288,17 @@ And you would know that /dev/hda2 has a size of 37768814 - 4209030 + 1 =
|
||||
For Win2k and later dynamic disks, you can for example use the ldminfo utility
|
||||
which is part of the Linux LDM tools (the latest version at the time of
|
||||
writing is linux-ldm-0.0.8.tar.bz2). You can download it from:
|
||||
|
||||
http://www.linux-ntfs.org/
|
||||
|
||||
Simply extract the downloaded archive (tar xvjf linux-ldm-0.0.8.tar.bz2), go
|
||||
into it (cd linux-ldm-0.0.8) and change to the test directory (cd test). You
|
||||
will find the precompiled (i386) ldminfo utility there. NOTE: You will not be
|
||||
able to compile this yourself easily so use the binary version!
|
||||
|
||||
Then you would use ldminfo in dump mode to obtain the necessary information:
|
||||
Then you would use ldminfo in dump mode to obtain the necessary information::
|
||||
|
||||
$ ./ldminfo --dump /dev/hda
|
||||
$ ./ldminfo --dump /dev/hda
|
||||
|
||||
This would dump the LDM database found on /dev/hda which describes all of your
|
||||
dynamic disks and all the volumes on them. At the bottom you will see the
|
||||
@ -305,42 +324,36 @@ give you the correct information to do this.
|
||||
Assuming you know all your devices and their sizes things are easy.
|
||||
|
||||
For a linear raid the table would look like this (note all values are in
|
||||
512-byte sectors):
|
||||
512-byte sectors)::
|
||||
|
||||
--- cut here ---
|
||||
# Offset into Size of this Raid type Device Start sector
|
||||
# volume device of device
|
||||
0 1028161 linear /dev/hda1 0
|
||||
1028161 3903762 linear /dev/hdb2 0
|
||||
4931923 2103211 linear /dev/hdc1 0
|
||||
--- cut here ---
|
||||
# Offset into Size of this Raid type Device Start sector
|
||||
# volume device of device
|
||||
0 1028161 linear /dev/hda1 0
|
||||
1028161 3903762 linear /dev/hdb2 0
|
||||
4931923 2103211 linear /dev/hdc1 0
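
In the linear table, each row's offset into the volume is the sum of the sizes of all rows before it. The following sketch generates such rows from a list of devices and sizes. The helper name is hypothetical, and it assumes every device is used from sector 0, as in the example above.

```python
def linear_table(devices):
    """Build device-mapper 'linear' rows.

    devices: list of (device_path, size_in_512_byte_sectors) tuples,
    in the order they appear in the volume.
    """
    lines = []
    offset = 0  # running offset into the volume, in sectors
    for path, size in devices:
        # Columns: offset into volume, size, raid type, device, start sector
        lines.append(f"{offset} {size} linear {path} 0")
        offset += size
    return lines


if __name__ == "__main__":
    for row in linear_table([("/dev/hda1", 1028161),
                             ("/dev/hdb2", 3903762),
                             ("/dev/hdc1", 2103211)]):
        print(row)
```

Run with the three devices from the table above, this reproduces the same offsets (0, 1028161 and 4931923).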
|
||||
|
||||
For a striped volume, i.e. raid level 0, you will need to know the chunk size
|
||||
you used when creating the volume. Windows uses 64kiB as the default, so it
|
||||
will probably be this unless you changed the defaults when creating the array.
|
||||
|
||||
For a raid level 0 the table would look like this (note all values are in
|
||||
512-byte sectors):
|
||||
512-byte sectors)::
|
||||
|
||||
--- cut here ---
|
||||
# Offset Size Raid Number Chunk 1st Start 2nd Start
|
||||
# into of the type of size Device in Device in
|
||||
# volume volume stripes device device
|
||||
0 2056320 striped 2 128 /dev/hda1 0 /dev/hdb1 0
|
||||
--- cut here ---
|
||||
# Offset Size Raid Number Chunk 1st Start 2nd Start
|
||||
# into of the type of size Device in Device in
|
||||
# volume volume stripes device device
|
||||
0 2056320 striped 2 128 /dev/hda1 0 /dev/hdb1 0
|
||||
|
||||
If there are more than two devices, just add each of them to the end of the
|
||||
line.
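
The chunk size column is expressed in 512-byte sectors, so the Windows default of 64kiB becomes 128. The sketch below converts the chunk size and assembles the striped row for any number of devices. The helper names are hypothetical, and each device is assumed to start at sector 0.

```python
SECTOR_BYTES = 512


def chunk_sectors(chunk_kib: int) -> int:
    """Convert a chunk size in KiB to 512-byte sectors."""
    return chunk_kib * 1024 // SECTOR_BYTES


def striped_table(volume_sectors: int, chunk_kib: int, devices) -> str:
    """Build a single device-mapper 'striped' row for the given devices."""
    fields = ["0", str(volume_sectors), "striped",
              str(len(devices)), str(chunk_sectors(chunk_kib))]
    for path in devices:
        # Each device is followed by its start sector (assumed 0 here).
        fields += [path, "0"]
    return " ".join(fields)
```

Called as striped_table(2056320, 64, ["/dev/hda1", "/dev/hdb1"]), it yields the row shown in the table above.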
|
||||
|
||||
Finally, for a mirrored volume, i.e. raid level 1, the table would look like
|
||||
this (note all values are in 512-byte sectors):
|
||||
this (note all values are in 512-byte sectors)::
|
||||
|
||||
--- cut here ---
|
||||
# Ofs Size Raid Log Number Region Should Number Source Start Target Start
|
||||
# in of the type type of log size sync? of Device in Device in
|
||||
# vol volume params mirrors Device Device
|
||||
0 2056320 mirror core 2 16 nosync 2 /dev/hda1 0 /dev/hdb1 0
|
||||
--- cut here ---
|
||||
# Ofs Size Raid Log Number Region Should Number Source Start Target Start
|
||||
# in of the type type of log size sync? of Device in Device in
|
||||
# vol volume params mirrors Device Device
|
||||
0 2056320 mirror core 2 16 nosync 2 /dev/hda1 0 /dev/hdb1 0
|
||||
|
||||
If you are mirroring to multiple devices you can specify further targets at the
|
||||
end of the line.
|
||||
@ -353,17 +366,17 @@ to the "Target Device" or if you specified multiple target devices to all of
|
||||
them.
|
||||
|
||||
Once you have your table, save it in a file somewhere (e.g. /etc/ntfsvolume1),
|
||||
and hand it over to dmsetup to work with, like so:
|
||||
and hand it over to dmsetup to work with, like so::
|
||||
|
||||
$ dmsetup create myvolume1 /etc/ntfsvolume1
|
||||
$ dmsetup create myvolume1 /etc/ntfsvolume1
|
||||
|
||||
You can obviously replace "myvolume1" with whatever name you like.
|
||||
|
||||
If it all worked, you will now have the device /dev/device-mapper/myvolume1
|
||||
which you can then just use as an argument to the mount command as usual to
|
||||
mount the ntfs volume. For example:
|
||||
mount the ntfs volume. For example::
|
||||
|
||||
$ mount -t ntfs -o ro /dev/device-mapper/myvolume1 /mnt/myvol1
|
||||
$ mount -t ntfs -o ro /dev/device-mapper/myvolume1 /mnt/myvol1
|
||||
|
||||
(You need to create the directory /mnt/myvol1 first and of course you can use
|
||||
anything you like instead of /mnt/myvol1 as long as it is an existing
|
||||
@ -395,9 +408,9 @@ Windows by default uses a stripe chunk size of 64k, so you probably want the
|
||||
"chunk-size 64k" option for each raid-disk, too.
|
||||
|
||||
For example, if you have a stripe set consisting of two partitions /dev/hda5
|
||||
and /dev/hdb1 your /etc/raidtab would look like this:
|
||||
and /dev/hdb1 your /etc/raidtab would look like this::
|
||||
|
||||
raiddev /dev/md0
|
||||
raiddev /dev/md0
|
||||
raid-level 0
|
||||
nr-raid-disks 2
|
||||
nr-spare-disks 0
|
||||
@ -427,7 +440,9 @@ Once the raidtab is setup, run for example raid0run -a to start all devices or
|
||||
raid0run /dev/md0 to start a particular md device, in this case /dev/md0.
|
||||
|
||||
Then just use the mount command as usual to mount the ntfs volume using for
|
||||
example: mount -t ntfs -o ro /dev/md0 /mnt/myntfsvolume
|
||||
example::
|
||||
|
||||
mount -t ntfs -o ro /dev/md0 /mnt/myntfsvolume
|
||||
|
||||
It is advisable to do the mount read-only to see if the md volume has been
|
||||
setup correctly to avoid the possibility of causing damage to the data on the
|
@ -1,5 +1,8 @@
|
||||
OCFS2 online file check
|
||||
-----------------------
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=====================================
|
||||
OCFS2 file system - online file check
|
||||
=====================================
|
||||
|
||||
This document describes the OCFS2 online file check feature.
|
||||
|
||||
@ -40,7 +43,7 @@ When there are errors in the OCFS2 filesystem, they are usually accompanied
|
||||
by the inode number which caused the error. This inode number would be the
|
||||
input to check/fix the file.
|
||||
|
||||
There is a sysfs directory for each OCFS2 file system mounting:
|
||||
There is a sysfs directory for each OCFS2 file system mounting::
|
||||
|
||||
/sys/fs/ocfs2/<devname>/filecheck
|
||||
|
||||
@ -50,34 +53,36 @@ communicate with kernel space, tell which file(inode number) will be checked or
|
||||
fixed. Currently, three operations are supported, which include checking
|
||||
inode, fixing inode and setting the size of result record history.
|
||||
|
||||
1. If you want to know what error exactly happened to <inode> before fixing, do
|
||||
1. If you want to know what error exactly happened to <inode> before fixing, do::
|
||||
|
||||
# echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/check
|
||||
# cat /sys/fs/ocfs2/<devname>/filecheck/check
|
||||
|
||||
The output is like this:
|
||||
The output is like this::
|
||||
|
||||
INO DONE ERROR
|
||||
39502 1 GENERATION
|
||||
39502 1 GENERATION
|
||||
|
||||
<INO> lists the inode numbers.
|
||||
<DONE> indicates whether the operation has been finished.
|
||||
<ERROR> says what kind of error was found. For the detailed error numbers,
|
||||
please refer to the file linux/fs/ocfs2/filecheck.h.
|
||||
<INO> lists the inode numbers.
|
||||
<DONE> indicates whether the operation has been finished.
|
||||
<ERROR> says what kind of error was found. For the detailed error numbers,
|
||||
please refer to the file linux/fs/ocfs2/filecheck.h.
|
||||
|
||||
2. If you determine to fix this inode, do
|
||||
2. If you determine to fix this inode, do::
|
||||
|
||||
# echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/fix
|
||||
# cat /sys/fs/ocfs2/<devname>/filecheck/fix
|
||||
|
||||
The output is like this:
|
||||
The output is like this::
|
||||
|
||||
INO DONE ERROR
|
||||
39502 1 SUCCESS
|
||||
39502 1 SUCCESS
|
||||
|
||||
This time, the <ERROR> column indicates whether this fix is successful or not.
|
||||
|
||||
3. The record cache is used to store the history of check/fix results. Its
|
||||
default size is 10, and it can be adjusted within the range 10 to 100. You can
|
||||
adjust the size like this:
|
||||
adjust the size like this::
|
||||
|
||||
# echo "<size>" > /sys/fs/ocfs2/<devname>/filecheck/set
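
The check and fix files return a small three-column table, which is easy to consume from a script. The following is a minimal sketch of a parser for that output, plus a range check for the record cache size. The function names are hypothetical, and treating a DONE value of 1 as "finished" is an assumption based on the sample output above.

```python
def parse_filecheck(text: str):
    """Parse the INO / DONE / ERROR table read from a filecheck sysfs file."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, malformed lines and the column header.
        if len(fields) != 3 or fields[0] == "INO":
            continue
        ino, done, error = fields
        records.append({"ino": int(ino),
                        "done": done == "1",   # assumption: 1 means finished
                        "error": error})
    return records


def valid_record_size(size: int) -> bool:
    """The record cache size may be set between 10 and 100."""
    return 10 <= size <= 100
```

A caller would read /sys/fs/ocfs2/<devname>/filecheck/check (or fix) and pass the contents to parse_filecheck().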
|
||||
|
@ -1,5 +1,9 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
================
|
||||
OCFS2 filesystem
|
||||
==================
|
||||
================
|
||||
|
||||
OCFS2 is a general purpose extent based shared disk cluster file
|
||||
system with many similarities to ext3. It supports 64 bit inode
|
||||
numbers, and has automatically extending metadata groups which may
|
||||
@ -14,22 +18,26 @@ OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
|
||||
|
||||
All code copyright 2005 Oracle except when otherwise noted.
|
||||
|
||||
CREDITS:
|
||||
Credits
|
||||
=======
|
||||
|
||||
Lots of code taken from ext3 and other projects.
|
||||
|
||||
Authors in alphabetical order:
|
||||
Joel Becker <joel.becker@oracle.com>
|
||||
Zach Brown <zach.brown@oracle.com>
|
||||
Mark Fasheh <mfasheh@suse.com>
|
||||
Kurt Hackel <kurt.hackel@oracle.com>
|
||||
Tao Ma <tao.ma@oracle.com>
|
||||
Sunil Mushran <sunil.mushran@oracle.com>
|
||||
Manish Singh <manish.singh@oracle.com>
|
||||
Tiger Yang <tiger.yang@oracle.com>
|
||||
|
||||
- Joel Becker <joel.becker@oracle.com>
|
||||
- Zach Brown <zach.brown@oracle.com>
|
||||
- Mark Fasheh <mfasheh@suse.com>
|
||||
- Kurt Hackel <kurt.hackel@oracle.com>
|
||||
- Tao Ma <tao.ma@oracle.com>
|
||||
- Sunil Mushran <sunil.mushran@oracle.com>
|
||||
- Manish Singh <manish.singh@oracle.com>
|
||||
- Tiger Yang <tiger.yang@oracle.com>
|
||||
|
||||
Caveats
|
||||
=======
|
||||
Features which OCFS2 does not support yet:
|
||||
|
||||
- Directory change notification (F_NOTIFY)
|
||||
- Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease)
|
||||
|
||||
@ -37,8 +45,10 @@ Mount options
|
||||
=============
|
||||
|
||||
OCFS2 supports the following mount options:
|
||||
|
||||
(*) == default
|
||||
|
||||
======================= ========================================================
|
||||
barrier=1 This enables/disables barriers. barrier=0 disables it,
|
||||
barrier=1 enables it.
|
||||
errors=remount-ro(*) Remount the filesystem read-only on an error.
|
||||
@ -104,3 +114,4 @@ journal_async_commit Commit block can be written to disk without waiting
|
||||
for descriptor blocks. If enabled older kernels cannot
|
||||
mount the device. This will enable 'journal_checksum'
|
||||
internally.
|
||||
======================= ========================================================
|
112
Documentation/filesystems/omfs.rst
Normal file
@ -0,0 +1,112 @@
.. SPDX-License-Identifier: GPL-2.0

================================
Optimized MPEG Filesystem (OMFS)
================================

Overview
========

OMFS is a filesystem created by SonicBlue for use in the ReplayTV DVR
and Rio Karma MP3 player. The filesystem is extent-based, utilizing
block sizes from 2k to 8k, with hash-based directories. This
filesystem driver may be used to read and write disks from these
devices.

Note, it is not recommended that this FS be used in place of a general
filesystem for your own streaming media device. Native Linux filesystems
will likely perform better.

More information is available at:

    http://linux-karma.sf.net/

Various utilities, including mkomfs and omfsck, are included with
omfsprogs, available at:

    http://bobcopeland.com/karma/

Instructions are included in its README.

Options
=======

OMFS supports the following mount-time options:

============ ========================================
uid=n        make all files owned by specified user
gid=n        make all files owned by specified group
umask=xxx    set permission umask to xxx
fmask=xxx    set umask to xxx for files
dmask=xxx    set umask to xxx for directories
============ ========================================

Disk format
===========

OMFS discriminates between "sysblocks" and normal data blocks. The sysblock
group consists of super block information, file metadata, directory structures,
and extents. Each sysblock has a header containing CRCs of the entire
sysblock, and may be mirrored in successive blocks on the disk. A sysblock may
have a smaller size than a data block, but since they are both addressed by the
same 64-bit block number, any remaining space in the smaller sysblock is
unused.

Sysblock header information::

    struct omfs_header {
        __be64 h_self;          /* FS block where this is located */
        __be32 h_body_size;     /* size of useful data after header */
        __be16 h_crc;           /* crc-ccitt of body_size bytes */
        char h_fill1[2];
        u8 h_version;           /* version, always 1 */
        char h_type;            /* OMFS_INODE_X */
        u8 h_magic;             /* OMFS_IMAGIC */
        u8 h_check_xor;         /* XOR of header bytes before this */
        __be32 h_fill2;
    };
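
The h_check_xor field can be verified from userspace by XOR-ing the header bytes that precede it. The following is a minimal sketch of that check. It assumes the header is packed with no padding, which puts h_check_xor at byte offset 19 of a 24-byte header; the constant and function names are hypothetical.

```python
import struct

# Big-endian, unpadded layout mirroring struct omfs_header:
# h_self, h_body_size, h_crc, h_fill1[2], h_version, h_type,
# h_magic, h_check_xor, h_fill2
HEADER_FMT = ">QIH2sBBBBI"
HEADER_SIZE = struct.calcsize(HEADER_FMT)   # 24 bytes
XOR_OFFSET = 19                             # byte offset of h_check_xor


def header_xor(raw: bytes) -> int:
    """XOR together all header bytes that come before h_check_xor."""
    value = 0
    for byte in raw[:XOR_OFFSET]:
        value ^= byte
    return value


def header_xor_ok(raw: bytes) -> bool:
    """Return True if the stored h_check_xor matches the computed value."""
    return len(raw) >= HEADER_SIZE and header_xor(raw) == raw[XOR_OFFSET]
```

This covers only the XOR byte; the crc-ccitt in h_crc protects the body and would need a separate check.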

Files and directories are both represented by omfs_inode::

    struct omfs_inode {
        struct omfs_header i_head;  /* header */
        __be64 i_parent;            /* parent containing this inode */
        __be64 i_sibling;           /* next inode in hash bucket */
        __be64 i_ctime;             /* ctime, in milliseconds */
        char i_fill1[35];
        char i_type;                /* OMFS_[DIR,FILE] */
        __be32 i_fill2;
        char i_fill3[64];
        char i_name[OMFS_NAMELEN];  /* filename */
        __be64 i_size;              /* size of file, in bytes */
    };

Directories in OMFS are implemented as a large hash table. Filenames are
hashed then prepended into the bucket list beginning at OMFS_DIR_START.
Lookup requires hashing the filename, then seeking across i_sibling pointers
until a match is found on i_name. Empty buckets are represented by block
pointers with all-1s (~0).
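
The bucket-and-sibling scheme can be modelled in memory to see how insertion and lookup interact. The sketch below is a toy model only: the hash function is a stand-in (the real OMFS hash is not reproduced here), block numbers are arbitrary integers, and the class and function names are hypothetical.

```python
EMPTY = 0xFFFFFFFFFFFFFFFF  # all-1s block pointer marks an empty bucket


def toy_hash(name: str, nbuckets: int) -> int:
    """Stand-in hash; NOT the hash used by the OMFS driver."""
    return sum(name.lower().encode()) % nbuckets


class ToyDir:
    def __init__(self, nbuckets: int = 4):
        self.buckets = [EMPTY] * nbuckets
        self.inodes = {}  # block number -> {"name": ..., "sibling": ...}

    def insert(self, block: int, name: str) -> None:
        """Prepend the new inode to its bucket's chain."""
        idx = toy_hash(name, len(self.buckets))
        self.inodes[block] = {"name": name, "sibling": self.buckets[idx]}
        self.buckets[idx] = block

    def lookup(self, name: str):
        """Walk i_sibling-style links until the name matches or the chain ends."""
        block = self.buckets[toy_hash(name, len(self.buckets))]
        while block != EMPTY:
            inode = self.inodes[block]
            if inode["name"] == name:
                return block
            block = inode["sibling"]
        return None
```

Because new entries are prepended, the most recently added file in a bucket is found first, and older entries sit further down the chain.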

A file is an omfs_inode structure followed by an extent table beginning at
OMFS_EXTENT_START::

    struct omfs_extent_entry {
        __be64 e_cluster;       /* start location of a set of blocks */
        __be64 e_blocks;        /* number of blocks after e_cluster */
    };

    struct omfs_extent {
        __be64 e_next;          /* next extent table location */
        __be32 e_extent_count;  /* total # extents in this table */
        __be32 e_fill;
        struct omfs_extent_entry e_entry;       /* start of extent entries */
    };

Each extent holds the block offset followed by number of blocks allocated to
the extent. The final extent in each table is a terminator with e_cluster
being ~0 and e_blocks being ones'-complement of the total number of blocks
in the table.
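
The terminator entry can be derived from the real entries that precede it. The following sketch computes it, reading "total number of blocks in the table" as the sum of e_blocks over the non-terminator entries; that reading, and the helper names, are assumptions rather than something taken from the driver source.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # 64-bit all-1s, i.e. ~0 as an unsigned value


def make_terminator(entries):
    """Return the (e_cluster, e_blocks) terminator for a list of extents.

    entries: list of (e_cluster, e_blocks) tuples, terminator excluded.
    """
    total = sum(blocks for _cluster, blocks in entries)
    # e_cluster is ~0; e_blocks is the ones'-complement of the block total.
    return (MASK64, ~total & MASK64)


def table_is_consistent(entries_with_terminator) -> bool:
    """Check that the last entry is the expected terminator for the rest."""
    if not entries_with_terminator:
        return False
    *body, last = entries_with_terminator
    return last == make_terminator(body)
```

For two extents of 4 and 6 blocks, the terminator's e_blocks is the complement of 10, which is 0xFFFFFFFFFFFFFFF5.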

If this table overflows, a continuation inode is written and pointed to by
e_next. These have a header but lack the rest of the inode structure.
@ -1,106 +0,0 @@
|
||||
Optimized MPEG Filesystem (OMFS)
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
OMFS is a filesystem created by SonicBlue for use in the ReplayTV DVR
|
||||
and Rio Karma MP3 player. The filesystem is extent-based, utilizing
|
||||
block sizes from 2k to 8k, with hash-based directories. This
|
||||
filesystem driver may be used to read and write disks from these
|
||||
devices.
|
||||
|
||||
Note, it is not recommended that this FS be used in place of a general
|
||||
filesystem for your own streaming media device. Native Linux filesystems
|
||||
will likely perform better.
|
||||
|
||||
More information is available at:
|
||||
|
||||
http://linux-karma.sf.net/
|
||||
|
||||
Various utilities, including mkomfs and omfsck, are included with
|
||||
omfsprogs, available at:
|
||||
|
||||
http://bobcopeland.com/karma/
|
||||
|
||||
Instructions are included in its README.
|
||||
|
||||
Options
|
||||
=======
|
||||
|
||||
OMFS supports the following mount-time options:
|
||||
|
||||
uid=n - make all files owned by specified user
|
||||
gid=n - make all files owned by specified group
|
||||
umask=xxx - set permission umask to xxx
|
||||
fmask=xxx - set umask to xxx for files
|
||||
dmask=xxx - set umask to xxx for directories
|
||||
|
||||
Disk format
|
||||
===========
|
||||
|
||||
OMFS discriminates between "sysblocks" and normal data blocks. The sysblock
|
||||
group consists of super block information, file metadata, directory structures,
|
||||
and extents. Each sysblock has a header containing CRCs of the entire
|
||||
sysblock, and may be mirrored in successive blocks on the disk. A sysblock may
|
||||
have a smaller size than a data block, but since they are both addressed by the
|
||||
same 64-bit block number, any remaining space in the smaller sysblock is
|
||||
unused.
|
||||
|
||||
Sysblock header information:
|
||||
|
||||
struct omfs_header {
|
||||
__be64 h_self; /* FS block where this is located */
|
||||
__be32 h_body_size; /* size of useful data after header */
|
||||
__be16 h_crc; /* crc-ccitt of body_size bytes */
|
||||
char h_fill1[2];
|
||||
u8 h_version; /* version, always 1 */
|
||||
char h_type; /* OMFS_INODE_X */
|
||||
u8 h_magic; /* OMFS_IMAGIC */
|
||||
u8 h_check_xor; /* XOR of header bytes before this */
|
||||
__be32 h_fill2;
|
||||
};
|
||||
|
||||
Files and directories are both represented by omfs_inode:
|
||||
|
||||
struct omfs_inode {
|
||||
struct omfs_header i_head; /* header */
|
||||
__be64 i_parent; /* parent containing this inode */
|
||||
__be64 i_sibling; /* next inode in hash bucket */
|
||||
__be64 i_ctime; /* ctime, in milliseconds */
|
||||
char i_fill1[35];
|
||||
char i_type; /* OMFS_[DIR,FILE] */
|
||||
__be32 i_fill2;
|
||||
char i_fill3[64];
|
||||
char i_name[OMFS_NAMELEN]; /* filename */
|
||||
__be64 i_size; /* size of file, in bytes */
|
||||
};
|
||||
|
||||
Directories in OMFS are implemented as a large hash table. Filenames are
|
||||
hashed then prepended into the bucket list beginning at OMFS_DIR_START.
|
||||
Lookup requires hashing the filename, then seeking across i_sibling pointers
|
||||
until a match is found on i_name. Empty buckets are represented by block
|
||||
pointers with all-1s (~0).
|
||||
|
||||
A file is an omfs_inode structure followed by an extent table beginning at
|
||||
OMFS_EXTENT_START:
|
||||
|
||||
struct omfs_extent_entry {
|
||||
__be64 e_cluster; /* start location of a set of blocks */
|
||||
__be64 e_blocks; /* number of blocks after e_cluster */
|
||||
};
|
||||
|
||||
struct omfs_extent {
|
||||
__be64 e_next; /* next extent table location */
|
||||
__be32 e_extent_count; /* total # extents in this table */
|
||||
__be32 e_fill;
|
||||
struct omfs_extent_entry e_entry; /* start of extent entries */
|
||||
};
|
||||
|
||||
Each extent holds the block offset followed by number of blocks allocated to
|
||||
the extent. The final extent in each table is a terminator with e_cluster
|
||||
being ~0 and e_blocks being ones'-complement of the total number of blocks
|
||||
in the table.
|
||||
|
||||
If this table overflows, a continuation inode is written and pointed to by
|
||||
e_next. These have a header but lack the rest of the inode structure.
|
||||
|
@ -1,3 +1,6 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
========
|
||||
ORANGEFS
|
||||
========
|
||||
|
||||
@ -21,25 +24,25 @@ Orangefs features include:
|
||||
* Stateless
|
||||
|
||||
|
||||
MAILING LIST ARCHIVES
|
||||
Mailing List Archives
|
||||
=====================
|
||||
|
||||
http://lists.orangefs.org/pipermail/devel_lists.orangefs.org/
|
||||
|
||||
|
||||
MAILING LIST SUBMISSIONS
|
||||
Mailing List Submissions
|
||||
========================
|
||||
|
||||
devel@lists.orangefs.org
|
||||
|
||||
|
||||
DOCUMENTATION
|
||||
Documentation
|
||||
=============
|
||||
|
||||
http://www.orangefs.org/documentation/
|
||||
|
||||
|
||||
USERSPACE FILESYSTEM SOURCE
|
||||
Userspace Filesystem Source
|
||||
===========================
|
||||
|
||||
http://www.orangefs.org/download
|
||||
@ -48,16 +51,16 @@ Orangefs versions prior to 2.9.3 would not be compatible with the
|
||||
upstream version of the kernel client.
|
||||
|
||||
|
||||
RUNNING ORANGEFS ON A SINGLE SERVER
|
||||
Running ORANGEFS On a Single Server
|
||||
===================================
|
||||
|
||||
OrangeFS is usually run in large installations with multiple servers and
|
||||
clients, but a complete filesystem can be run on a single machine for
|
||||
development and testing.
|
||||
|
||||
On Fedora, install orangefs and orangefs-server.
|
||||
On Fedora, install orangefs and orangefs-server::
|
||||
|
||||
dnf -y install orangefs orangefs-server
|
||||
dnf -y install orangefs orangefs-server
|
||||
|
||||
There is an example server configuration file in
|
||||
/etc/orangefs/orangefs.conf. Change localhost to your hostname if
|
||||
@ -70,29 +73,29 @@ single line. Uncomment it and change the hostname if necessary. This
|
||||
controls clients which use libpvfs2. This does not control the
|
||||
pvfs2-client-core.
|
||||
|
||||
Create the filesystem.
|
||||
Create the filesystem::
|
||||
|
||||
pvfs2-server -f /etc/orangefs/orangefs.conf
|
||||
pvfs2-server -f /etc/orangefs/orangefs.conf
|
||||
|
||||
Start the server.
|
||||
Start the server::
|
||||
|
||||
systemctl start orangefs-server
|
||||
systemctl start orangefs-server
|
||||
|
||||
Test the server.
|
||||
Test the server::
|
||||
|
||||
pvfs2-ping -m /pvfsmnt
|
||||
pvfs2-ping -m /pvfsmnt
|
||||
|
||||
Start the client. The module must be compiled in or loaded before this
|
||||
point.
|
||||
point::
|
||||
|
||||
systemctl start orangefs-client
|
||||
systemctl start orangefs-client
|
||||
|
||||
Mount the filesystem.
|
||||
Mount the filesystem::
|
||||
|
||||
mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
|
||||
mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
|
||||
|
||||
|
||||
BUILDING ORANGEFS ON A SINGLE SERVER
|
||||
Building ORANGEFS on a Single Server
|
||||
====================================
|
||||
|
||||
Where OrangeFS cannot be installed from distribution packages, it may be
|
||||
@ -102,49 +105,51 @@ You can omit --prefix if you don't care that things are sprinkled around
|
||||
in /usr/local. As of version 2.9.6, OrangeFS uses Berkeley DB by
|
||||
default, we will probably be changing the default to LMDB soon.
|
||||
|
||||
./configure --prefix=/opt/ofs --with-db-backend=lmdb
|
||||
::
|
||||
|
||||
make
|
||||
./configure --prefix=/opt/ofs --with-db-backend=lmdb
|
||||
|
||||
make install
|
||||
make
|
||||
|
||||
Create an orangefs config file.
|
||||
make install
|
||||
|
||||
/opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
|
||||
Create an orangefs config file::
|
||||
|
||||
Create an /etc/pvfs2tab file.
|
||||
/opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
|
||||
|
||||
echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \
|
||||
Create an /etc/pvfs2tab file::
|
||||
|
||||
echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \
|
||||
/etc/pvfs2tab
|
||||
|
||||
Create the mount point you specified in the tab file if needed.
|
||||
Create the mount point you specified in the tab file if needed::
|
||||
|
||||
mkdir /pvfsmnt
|
||||
mkdir /pvfsmnt
|
||||
|
||||
Bootstrap the server.
|
||||
Bootstrap the server::
|
||||
|
||||
/opt/ofs/sbin/pvfs2-server -f /etc/pvfs2.conf
|
||||
/opt/ofs/sbin/pvfs2-server -f /etc/pvfs2.conf
|
||||
|
||||
Start the server.
|
||||
Start the server::
|
||||
|
||||
/opt/ofs/sbin/pvfs2-server /etc/pvfs2.conf
|
||||
/opt/ofs/sbin/pvfs2-server /etc/pvfs2.conf
|
||||
|
||||
Now the server should be running. Pvfs2-ls is a simple
|
||||
test to verify that the server is running.
|
||||
test to verify that the server is running::
|
||||
|
||||
/opt/ofs/bin/pvfs2-ls /pvfsmnt
|
||||
/opt/ofs/bin/pvfs2-ls /pvfsmnt
|
||||
|
||||
If stuff seems to be working, load the kernel module and
|
||||
turn on the client core.
|
||||
turn on the client core::
|
||||
|
||||
/opt/ofs/sbin/pvfs2-client -p /opt/ofs/sbin/pvfs2-client-core
|
||||
/opt/ofs/sbin/pvfs2-client -p /opt/ofs/sbin/pvfs2-client-core
|
||||
|
||||
Mount your filesystem.
|
||||
Mount your filesystem::
|
||||
|
||||
mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
|
||||
mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
|
||||
|
||||
|
||||
RUNNING XFSTESTS
|
||||
Running xfstests
|
||||
================
|
||||
|
||||
It is useful to use a scratch filesystem with xfstests. This can be
|
||||
@ -159,21 +164,23 @@ Then there are two FileSystem sections: orangefs and scratch.
|
||||
|
||||
This change should be made before creating the filesystem.
|
||||
|
||||
pvfs2-server -f /etc/orangefs/orangefs.conf
|
||||
::
|
||||
|
||||
To run xfstests, create /etc/xfsqa.config.
|
||||
pvfs2-server -f /etc/orangefs/orangefs.conf
|
||||
|
||||
TEST_DIR=/orangefs
|
||||
TEST_DEV=tcp://localhost:3334/orangefs
|
||||
SCRATCH_MNT=/scratch
|
||||
SCRATCH_DEV=tcp://localhost:3334/scratch
|
||||
To run xfstests, create /etc/xfsqa.config::
|
||||
|
||||
Then xfstests can be run
|
||||
TEST_DIR=/orangefs
|
||||
TEST_DEV=tcp://localhost:3334/orangefs
|
||||
SCRATCH_MNT=/scratch
|
||||
SCRATCH_DEV=tcp://localhost:3334/scratch
|
||||
|
||||
./check -pvfs2
|
||||
Then xfstests can be run::
|
||||
|
||||
./check -pvfs2
|
||||
|
||||
|
||||
OPTIONS
|
||||
Options
|
||||
=======
|
||||
|
||||
The following mount options are accepted:
|
||||
@ -193,32 +200,32 @@ The following mount options are accepted:
|
||||
Distributed locking is being worked on for the future.
|
||||
|
||||
|
||||
DEBUGGING
|
||||
Debugging
|
||||
=========
|
||||
|
||||
If you want the debug (GOSSIP) statements in a particular
|
||||
source file (inode.c for example) go to syslog:
|
||||
source file (inode.c for example) go to syslog::
|
||||
|
||||
echo inode > /sys/kernel/debug/orangefs/kernel-debug
|
||||
|
||||
No debugging (the default):
|
||||
No debugging (the default)::
|
||||
|
||||
echo none > /sys/kernel/debug/orangefs/kernel-debug
|
||||
|
||||
Debugging from several source files:
|
||||
Debugging from several source files::
|
||||
|
||||
echo inode,dir > /sys/kernel/debug/orangefs/kernel-debug
|
||||
|
||||
All debugging:
|
||||
All debugging::
|
||||
|
||||
echo all > /sys/kernel/debug/orangefs/kernel-debug
|
||||
|
||||
Get a list of all debugging keywords:
|
||||
Get a list of all debugging keywords::
|
||||
|
||||
cat /sys/kernel/debug/orangefs/debug-help
|
||||
|
||||
|
||||
PROTOCOL BETWEEN KERNEL MODULE AND USERSPACE
|
||||
Protocol between Kernel Module and Userspace
|
||||
============================================
|
||||
|
||||
Orangefs is a user space filesystem and an associated kernel module.
|
||||
@ -234,7 +241,8 @@ The kernel module implements a pseudo device that userspace
|
||||
can read from and write to. Userspace can also manipulate the
|
||||
kernel module through the pseudo device with ioctl.
|
||||
|
||||
THE BUFMAP:
|
||||
The Bufmap
|
||||
----------
|
||||
|
||||
At startup userspace allocates two page-size-aligned (posix_memalign)
|
||||
mlocked memory buffers, one is used for IO and one is used for readdir
|
||||
@ -250,7 +258,8 @@ copied from user space to kernel space with copy_from_user and is used
|
||||
to initialize the kernel module's "bufmap" (struct orangefs_bufmap), which
|
||||
then contains:
|
||||
|
||||
* refcnt - a reference counter
|
||||
* refcnt
|
||||
- a reference counter
|
||||
* desc_size - PVFS2_BUFMAP_DEFAULT_DESC_SIZE (4194304) - the IO buffer's
|
||||
partition size, which represents the filesystem's block size and
|
||||
is used for s_blocksize in super blocks.
|
||||
@ -259,15 +268,17 @@ then contains:
|
||||
* desc_shift - log2(desc_size), used for s_blocksize_bits in super blocks.
|
||||
* total_size - the total size of the IO buffer.
|
||||
* page_count - the number of 4096 byte pages in the IO buffer.
|
||||
* page_array - a pointer to page_count * (sizeof(struct page*)) bytes
|
||||
* page_array - a pointer to ``page_count * (sizeof(struct page*))`` bytes
|
||||
of kcalloced memory. This memory is used as an array of pointers
|
||||
to each of the pages in the IO buffer through a call to get_user_pages.
|
||||
* desc_array - a pointer to desc_count * (sizeof(struct orangefs_bufmap_desc))
|
||||
* desc_array - a pointer to ``desc_count * (sizeof(struct orangefs_bufmap_desc))``
|
||||
bytes of kcalloced memory. This memory is further initialized:
|
||||
|
||||
user_desc is the kernel's copy of the IO buffer's ORANGEFS_dev_map_desc
|
||||
structure. user_desc->ptr points to the IO buffer.
|
||||
|
||||
::
|
||||
|
||||
pages_per_desc = bufmap->desc_size / PAGE_SIZE
|
||||
offset = 0
|
||||
|
||||
@ -293,7 +304,8 @@ then contains:
|
||||
* readdir_index_lock - a spinlock to protect readdir_index_array during
|
||||
update.
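The size-related bufmap fields listed above are tied together by simple arithmetic. The following is a minimal userspace sketch of that arithmetic, not the kernel module's code. The names ``bufmap_geometry`` and ``bufmap_geometry_init`` are hypothetical, the 4096-byte page size is taken from the page_count description, and the relation ``total_size = desc_count * desc_size`` is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u      /* page size named in the page_count entry */
#define SKETCH_DESC_SIZE 4194304u   /* PVFS2_BUFMAP_DEFAULT_DESC_SIZE */

/* Hypothetical userspace mirror of the size-related bufmap fields. */
struct bufmap_geometry {
	unsigned int desc_size;   /* IO buffer partition size */
	unsigned int desc_shift;  /* log2(desc_size) */
	unsigned int desc_count;  /* number of partitions */
	size_t total_size;        /* assumed: desc_count * desc_size */
	size_t page_count;        /* 4096-byte pages in the IO buffer */
	size_t pages_per_desc;    /* desc_size / PAGE_SIZE */
};

/* Integer log2 of a power of two. */
static unsigned int ilog2_u32(unsigned int v)
{
	unsigned int shift = 0;

	while (v > 1) {
		v >>= 1;
		shift++;
	}
	return shift;
}

/* Derive every size field from the partition count. */
static struct bufmap_geometry bufmap_geometry_init(unsigned int desc_count)
{
	struct bufmap_geometry g;

	assert(desc_count > 0);
	g.desc_size = SKETCH_DESC_SIZE;
	g.desc_shift = ilog2_u32(g.desc_size);
	g.desc_count = desc_count;
	g.total_size = (size_t)desc_count * g.desc_size;
	g.page_count = g.total_size / SKETCH_PAGE_SIZE;
	g.pages_per_desc = g.desc_size / SKETCH_PAGE_SIZE;
	return g;
}
```

With the default partition size, desc_shift works out to 22 and each partition spans 1024 pages, whatever partition count is chosen.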
|
||||
|
||||
OPERATIONS:
|
||||
Operations
|
||||
----------
|
||||
|
||||
The kernel module builds an "op" (struct orangefs_kernel_op_s) when it
|
||||
needs to communicate with userspace. Part of the op contains the "upcall"
|
||||
@ -308,13 +320,19 @@ in flight at any given time.
|
||||
|
||||
Ops are stateful:
|
||||
|
||||
* unknown - op was just initialized
|
||||
* waiting - op is on request_list (upward bound)
|
||||
* inprogr - op is in progress (waiting for downcall)
|
||||
* serviced - op has matching downcall; ok
|
||||
* purged - op has to start a timer since client-core
|
||||
* unknown
|
||||
- op was just initialized
|
||||
* waiting
|
||||
- op is on request_list (upward bound)
|
||||
* inprogr
|
||||
- op is in progress (waiting for downcall)
|
||||
* serviced
|
||||
- op has matching downcall; ok
|
||||
* purged
|
||||
- op has to start a timer since client-core
|
||||
exited uncleanly before servicing op
|
||||
* given up - submitter has given up waiting for it
|
||||
* given up
|
||||
- submitter has given up waiting for it
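The op states listed above can be pictured as a small enumeration. This is only an illustrative sketch: the identifiers ``op_state``, ``op_state_name`` and ``op_awaits_userspace`` are hypothetical, and the numeric values are arbitrary distinct bits chosen for the example, not values copied from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; values are arbitrary distinct bits for this sketch. */
enum op_state {
	OP_STATE_UNKNOWN  = 0,        /* op was just initialized */
	OP_STATE_WAITING  = 1 << 0,   /* op is on request_list (upward bound) */
	OP_STATE_INPROGR  = 1 << 1,   /* op is in progress (waiting for downcall) */
	OP_STATE_SERVICED = 1 << 2,   /* op has matching downcall; ok */
	OP_STATE_PURGED   = 1 << 3,   /* client-core exited before servicing op */
	OP_STATE_GIVEN_UP = 1 << 4    /* submitter has given up waiting for it */
};

/* Map a state to the label used in the list above. */
static const char *op_state_name(enum op_state s)
{
	switch (s) {
	case OP_STATE_UNKNOWN:  return "unknown";
	case OP_STATE_WAITING:  return "waiting";
	case OP_STATE_INPROGR:  return "inprogr";
	case OP_STATE_SERVICED: return "serviced";
	case OP_STATE_PURGED:   return "purged";
	case OP_STATE_GIVEN_UP: return "given up";
	}
	return "invalid";
}

/* True while the op still depends on userspace: queued or awaiting a downcall. */
static bool op_awaits_userspace(enum op_state s)
{
	return s == OP_STATE_WAITING || s == OP_STATE_INPROGR;
}
```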
|
||||
|
||||
When some arbitrary userspace program needs to perform a
|
||||
filesystem operation on Orangefs (readdir, I/O, create, whatever)
|
||||
@ -389,10 +407,15 @@ union of structs, each of which is associated with a particular
|
||||
response type.
|
||||
|
||||
The several members outside of the union are:
|
||||
- int32_t type - type of operation.
|
||||
- int32_t status - return code for the operation.
|
||||
- int64_t trailer_size - 0 unless readdir operation.
|
||||
- char *trailer_buf - initialized to NULL, used during readdir operations.
|
||||
|
||||
``int32_t type``
|
||||
- type of operation.
|
||||
``int32_t status``
|
||||
- return code for the operation.
|
||||
``int64_t trailer_size``
|
||||
- 0 unless readdir operation.
|
||||
``char *trailer_buf``
|
||||
- initialized to NULL, used during readdir operations.
|
||||
|
||||
The appropriate member inside the union is filled out for any
|
||||
particular response.
|
||||
@ -449,18 +472,20 @@ Userspace uses writev() on /dev/pvfs2-req to pass responses to the requests
|
||||
made by the kernel side.
|
||||
|
||||
A buffer_list containing:
|
||||
|
||||
- a pointer to the prepared response to the request from the
|
||||
kernel (struct pvfs2_downcall_t).
|
||||
- and also, in the case of a readdir request, a pointer to a
|
||||
buffer containing descriptors for the objects in the target
|
||||
directory.
|
||||
|
||||
... is sent to the function (PINT_dev_write_list) which performs
|
||||
the writev.
|
||||
|
||||
PINT_dev_write_list has a local iovec array: struct iovec io_array[10];
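As a generic illustration of the gather-write that writev() performs, the sketch below sends two separate buffers with one call. It is not PINT_dev_write_list: the helper name ``send_gathered`` and its two-element array are hypothetical, and the buffers stand in for whatever header and payload a caller wants to send.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a fixed-size header and a payload that live
 * in separate buffers with a single writev() call. Returns the number of
 * bytes written, or -1 on error, exactly as writev() does.
 */
static ssize_t send_gathered(int fd, int32_t *header,
			     void *payload, size_t payload_len)
{
	struct iovec io_array[2];

	assert(header != NULL);

	io_array[0].iov_base = header;
	io_array[0].iov_len = sizeof(*header);
	io_array[1].iov_base = payload;
	io_array[1].iov_len = payload_len;

	return writev(fd, io_array, 2);
}
```

The reader on the other end of the descriptor sees the two buffers as one contiguous message, which is why the element order in the iovec array defines the wire layout.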
|
||||
|
||||
The first four elements of io_array are initialized like this for all
|
||||
responses:
|
||||
responses::
|
||||
|
||||
io_array[0].iov_base = address of local variable "proto_ver" (int32_t)
|
||||
io_array[0].iov_len = sizeof(int32_t)
|
||||
@ -475,7 +500,7 @@ responses:
|
||||
of global variable vfs_request (vfs_request_t)
|
||||
io_array[3].iov_len = sizeof(pvfs2_downcall_t)
|
||||
|
||||
Readdir responses initialize the fifth element of io_array like this:
|
||||
Readdir responses initialize the fifth element of io_array like this::
|
||||
|
||||
io_array[4].iov_base = contents of member trailer_buf (char *)
|
||||
from out_downcall member of global variable
|
||||
@ -517,13 +542,13 @@ from a dentry is cheap, obtaining it from userspace is relatively expensive,
|
||||
hence the motivation to use the dentry when possible.
|
||||
|
||||
The timeout values d_time and getattr_time are jiffy based, and the
|
||||
code is designed to avoid the jiffy-wrap problem:
|
||||
code is designed to avoid the jiffy-wrap problem::
|
||||
|
||||
"In general, if the clock may have wrapped around more than once, there
|
||||
is no way to tell how much time has elapsed. However, if the times t1
|
||||
and t2 are known to be fairly close, we can reliably compute the
|
||||
difference in a way that takes into account the possibility that the
|
||||
clock may have wrapped between times."
|
||||
"In general, if the clock may have wrapped around more than once, there
|
||||
is no way to tell how much time has elapsed. However, if the times t1
|
||||
and t2 are known to be fairly close, we can reliably compute the
|
||||
difference in a way that takes into account the possibility that the
|
||||
clock may have wrapped between times."
|
||||
|
||||
from course notes by instructor Andy Wang
|
||||
from course notes by instructor Andy Wang
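The wrap-safe comparison the quotation describes can be sketched with unsigned subtraction. This is a minimal illustration using a 32-bit tick counter, not the OrangeFS code. The helper names ``tick_after`` and ``ticks_elapsed`` are hypothetical, and the result is only meaningful when the two times are less than 2^31 ticks apart. The signed cast relies on two's complement conversion, which is what common compilers provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if tick count a is later than tick count b, even if the counter
 * wrapped once in between. Valid while the two are within 2^31 ticks.
 */
static bool tick_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Ticks elapsed from t1 to t2; unsigned subtraction absorbs one wrap. */
static uint32_t ticks_elapsed(uint32_t t1, uint32_t t2)
{
	return t2 - t1;
}
```

For example, a counter that moves from 0xFFFFFFF0 to 5 has wrapped, yet ``ticks_elapsed(0xFFFFFFF0, 5)`` still yields 21 and ``tick_after(5, 0xFFFFFFF0)`` is true.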
|
||||
|
@ -1,3 +1,6 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===================
|
||||
The QNX6 Filesystem
|
||||
===================
|
||||
|
||||
@ -14,10 +17,12 @@ Specification
|
||||
|
||||
qnx6fs shares many properties with traditional Unix filesystems. It has the
|
||||
concepts of blocks, inodes and directories.
|
||||
|
||||
On QNX it is possible to create little endian and big endian qnx6 filesystems.
|
||||
This makes it possible to create a filesystem on a host of one endianness
|
||||
and use it on a target platform of the other endianness (QNX is used on quite
|
||||
a range of embedded systems).
|
||||
|
||||
The Linux driver handles endianness transparently. (LE and BE)
|
||||
|
||||
Blocks
|
||||
@ -26,6 +31,7 @@ Blocks
|
||||
The space in the device or file is split up into blocks. These are a fixed
|
||||
size of 512, 1024, 2048 or 4096, which is decided when the filesystem is
|
||||
created.
|
||||
|
||||
Blockpointers are 32bit, so the maximum space that can be addressed is
|
||||
2^32 * 4096 bytes or 16TB
|
||||
|
||||
@ -50,6 +56,7 @@ Each of these root nodes holds information like total size of the stored
|
||||
data and the addressing levels in that specific tree.
|
||||
If the level value is 0, up to 16 direct blocks can be addressed by each
|
||||
node.
|
||||
|
||||
Level 1 adds an additional indirect addressing level where each indirect
|
||||
addressing block holds up to blocksize / 4 pointers (4 bytes each) to data blocks.
|
||||
Level 2 adds an additional indirect addressing block level (so, already up
|
||||
@ -57,11 +64,13 @@ to 16 * 256 * 256 = 1048576 blocks that can be addressed by such a tree).
|
||||
|
||||
Unused block pointers are always set to ~0 - regardless of root node,
|
||||
indirect addressing blocks or inodes.
|
||||
|
||||
Data leaves are always on the lowest level. So no data is stored on upper
|
||||
tree levels.
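The addressing limits quoted above follow from two small formulas, sketched here in C. The function names are hypothetical. The figure of 256 pointers per indirect block in the text corresponds to a 1024-byte block size (1024 / 4), which is the value used in the usage note.

```c
#include <assert.h>
#include <stdint.h>

#define QNX6_SKETCH_DIRECT_PTRS 16u   /* direct block pointers per root node */

/*
 * Maximum number of data blocks one tree can address: 16 direct
 * pointers, multiplied by (blocksize / 4) for every indirect level.
 */
static uint64_t max_tree_blocks(uint32_t blocksize, unsigned int levels)
{
	uint64_t ptrs_per_block;
	uint64_t blocks = QNX6_SKETCH_DIRECT_PTRS;

	assert(blocksize == 512 || blocksize == 1024 ||
	       blocksize == 2048 || blocksize == 4096);

	ptrs_per_block = blocksize / 4;   /* block pointers are 32 bits wide */
	while (levels-- > 0)
		blocks *= ptrs_per_block;
	return blocks;
}

/* Upper bound on addressable space with 32-bit block pointers. */
static uint64_t max_addressable_bytes(uint32_t blocksize)
{
	return ((uint64_t)1 << 32) * blocksize;
}
```

With 1024-byte blocks, ``max_tree_blocks(1024, 2)`` gives 16 * 256 * 256 = 1048576, and ``max_addressable_bytes(4096)`` gives 2^44 bytes, the 16TB limit.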
|
||||
|
||||
The first Superblock is located at 0x2000. (0x2000 is the bootblock size)
|
||||
The Audi MMI 3G first superblock directly starts at byte 0.
|
||||
|
||||
Second superblock position can either be calculated from the superblock
|
||||
information (total number of filesystem blocks) or by taking the highest
|
||||
device address, zeroing the last 3 bytes and then subtracting 0x1000 from
|
||||
@ -84,6 +93,7 @@ Object mode field is POSIX format. (which makes things easier)
|
||||
|
||||
There are also pointers to the first 16 blocks, if the object data can be
|
||||
addressed with 16 direct blocks.
|
||||
|
||||
For more than 16 blocks an indirect addressing in form of another tree is
|
||||
used. (scheme is the same as the one used for the superblock root nodes)
|
||||
|
||||
@ -96,13 +106,18 @@ Directories
|
||||
A directory is a filesystem object and has an inode just like a file.
|
||||
It is a specially formatted file containing records which associate each
|
||||
name with an inode number.
|
||||
|
||||
'.' inode number points to the directory inode
|
||||
|
||||
'..' inode number points to the parent directory inode
|
||||
|
||||
Each filename record additionally has a filename length field.
|
||||
|
||||
Long filenames or subdirectory names are a special case.
|
||||
|
||||
These have the filename length field set to 0xff in the corresponding directory
|
||||
record plus the longfile inode number also stored in that record.
|
||||
|
||||
With that longfilename inode number, the longfilename tree can be walked
|
||||
starting with the superblock longfilename root node pointers.
|
||||
|
||||
@ -111,6 +126,7 @@ Special files
|
||||
|
||||
Symbolic links are also filesystem objects with inodes. They have a specific
|
||||
bit in the inode mode field identifying them as a symbolic link.
|
||||
|
||||
The directory entry file inode pointer points to the target file inode.
|
||||
|
||||
Hard links got an inode, a directory entry, but a specific mode bit set,
|
||||
@ -126,9 +142,11 @@ Long filenames
|
||||
|
||||
Long filenames are stored in a separate addressing tree. The starting point
|
||||
is the longfilename root node in the active superblock.
|
||||
|
||||
Each data block (tree leaves) holds one long filename. That filename is
|
||||
limited to 510 bytes. The first two starting bytes are used as length field
|
||||
for the actual filename.
|
||||
|
||||
Because this structure must fit within every allowed block size, including
|
||||
the smallest (512 bytes), the actual filename stored is limited to 510 bytes.
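The 510-byte limit can be checked with a struct that mirrors the layout just described: two length bytes followed by the name. This is a sketch, not the driver's on-disk definition. The names ``longname_block`` and ``longname_length_le`` are hypothetical, and the decoder shown handles only the little endian case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LONGNAME_MAX 510   /* longest filename a data block can hold */

/* Hypothetical mirror of one long-filename data block. */
struct longname_block {
	uint8_t length[2];          /* length field, in filesystem byte order */
	char    name[LONGNAME_MAX]; /* the actual filename */
};

/* Compile-time check: the structure fills the smallest block size exactly. */
typedef char longname_block_is_512_bytes[
	(sizeof(struct longname_block) == 512) ? 1 : -1];

/* Decode the length field of a little endian filesystem. */
static unsigned int longname_length_le(const struct longname_block *b)
{
	assert(b != NULL);
	return (unsigned int)b->length[0] |
	       ((unsigned int)b->length[1] << 8);
}
```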
|
||||
|
||||
@ -138,6 +156,7 @@ Bitmap
|
||||
The qnx6fs filesystem allocation bitmap is stored in a tree under bitmap
|
||||
root node in the superblock and each bit in the bitmap represents one
|
||||
filesystem block.
|
||||
|
||||
The first block is block 0, which starts 0x1000 after superblock start.
|
||||
So for a normal qnx6fs 0x3000 (bootblock + superblock) is the physical
|
||||
address at which block 0 is located.
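The block-to-address mapping described above can be written as one expression. The sketch below assumes the normal layout, with a 0x2000 bootblock followed by 0x1000 of superblock area, and the function name ``block_to_offset`` is hypothetical. Images whose first superblock starts at byte 0 would need a different base.

```c
#include <assert.h>
#include <stdint.h>

#define BOOTBLOCK_SIZE   0x2000u   /* first superblock starts here */
#define SUPERBLOCK_AREA  0x1000u   /* block 0 starts this far after it */

/* Physical byte offset of a filesystem block in the normal layout. */
static uint64_t block_to_offset(uint32_t block, uint32_t blocksize)
{
	assert(blocksize == 512 || blocksize == 1024 ||
	       blocksize == 2048 || blocksize == 4096);

	return (uint64_t)BOOTBLOCK_SIZE + SUPERBLOCK_AREA +
	       (uint64_t)block * blocksize;
}
```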
|
||||
@ -149,11 +168,14 @@ Bitmap system area
|
||||
------------------
|
||||
|
||||
The bitmap itself is divided into three parts.
|
||||
|
||||
First the system area, that is split into two halves.
|
||||
|
||||
Then userspace.
|
||||
|
||||
The requirement for a static, fixed preallocated system area comes from how
|
||||
qnx6fs deals with writes.
|
||||
|
||||
Each superblock has its own half of the system area. So superblock #1
|
||||
always uses blocks from the lower half while superblock #2 just writes to
|
||||
blocks represented by the upper half bitmap system area bits.
|
@ -1,5 +1,11 @@
|
||||
ramfs, rootfs and initramfs
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===========================
|
||||
Ramfs, rootfs and initramfs
|
||||
===========================
|
||||
|
||||
October 17, 2005
|
||||
|
||||
Rob Landley <rob@landley.net>
|
||||
=============================
|
||||
|
||||
@ -99,14 +105,14 @@ out of that.
|
||||
All this differs from the old initrd in several ways:
|
||||
|
||||
- The old initrd was always a separate file, while the initramfs archive is
|
||||
linked into the linux kernel image. (The directory linux-*/usr is devoted
|
||||
to generating this archive during the build.)
|
||||
linked into the linux kernel image. (The directory ``linux-*/usr`` is
|
||||
devoted to generating this archive during the build.)
|
||||
|
||||
- The old initrd file was a gzipped filesystem image (in some file format,
|
||||
such as ext2, that needed a driver built into the kernel), while the new
|
||||
initramfs archive is a gzipped cpio archive (like tar only simpler,
|
||||
see cpio(1) and Documentation/driver-api/early-userspace/buffer-format.rst). The
|
||||
kernel's cpio extraction code is not only extremely small, it's also
|
||||
see cpio(1) and Documentation/driver-api/early-userspace/buffer-format.rst).
|
||||
The kernel's cpio extraction code is not only extremely small, it's also
|
||||
__init text and data that can be discarded during the boot process.
|
||||
|
||||
- The program run by the old initrd (which was called /initrd, not /init) did
|
||||
@ -139,7 +145,7 @@ and living in usr/Kconfig) can be used to specify a source for the
|
||||
initramfs archive, which will automatically be incorporated into the
|
||||
resulting binary. This option can point to an existing gzipped cpio
|
||||
archive, a directory containing files to be archived, or a text file
|
||||
specification such as the following example:
|
||||
specification such as the following example::
|
||||
|
||||
dir /dev 755 0 0
|
||||
nod /dev/console 644 0 0 c 5 1
|
||||
@ -175,12 +181,12 @@ or extracting your own preprepared cpio files to feed to the kernel build
|
||||
(instead of a config file or directory).
|
||||
|
||||
The following command line can extract a cpio image (either by the above script
|
||||
or by the kernel build) back into its component files:
|
||||
or by the kernel build) back into its component files::
|
||||
|
||||
cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames
|
||||
|
||||
The following shell script can create a prebuilt cpio archive you can
|
||||
use in place of the above config file:
|
||||
use in place of the above config file::
|
||||
|
||||
#!/bin/sh
|
||||
|
||||
@ -202,14 +208,17 @@ use in place of the above config file:
|
||||
exit 1
|
||||
fi
|
||||
|
||||
Note: The cpio man page contains some bad advice that will break your initramfs
|
||||
archive if you follow it. It says "A typical way to generate the list
|
||||
of filenames is with the find command; you should give find the -depth option
|
||||
to minimize problems with permissions on directories that are unwritable or not
|
||||
searchable." Don't do this when creating initramfs.cpio.gz images, it won't
|
||||
work. The Linux kernel cpio extractor won't create files in a directory that
|
||||
doesn't exist, so the directory entries must go before the files that go in
|
||||
those directories. The above script gets them in the right order.
|
||||
.. Note::
|
||||
|
||||
The cpio man page contains some bad advice that will break your initramfs
|
||||
archive if you follow it. It says "A typical way to generate the list
|
||||
of filenames is with the find command; you should give find the -depth
|
||||
option to minimize problems with permissions on directories that are
|
||||
unwritable or not searchable." Don't do this when creating
|
||||
initramfs.cpio.gz images, it won't work. The Linux kernel cpio extractor
|
||||
won't create files in a directory that doesn't exist, so the directory
|
||||
entries must go before the files that go in those directories.
|
||||
The above script gets them in the right order.
|
||||
|
||||
External initramfs images:
|
||||
--------------------------
|
||||
@ -236,9 +245,10 @@ An initramfs archive is a complete self-contained root filesystem for Linux.
|
||||
If you don't already understand what shared libraries, devices, and paths
|
||||
you need to get a minimal root filesystem up and running, here are some
|
||||
references:
|
||||
http://www.tldp.org/HOWTO/Bootdisk-HOWTO/
|
||||
http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html
|
||||
http://www.linuxfromscratch.org/lfs/view/stable/
|
||||
|
||||
- http://www.tldp.org/HOWTO/Bootdisk-HOWTO/
|
||||
- http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html
|
||||
- http://www.linuxfromscratch.org/lfs/view/stable/
|
||||
|
||||
The "klibc" package (http://www.kernel.org/pub/linux/libs/klibc) is
|
||||
designed to be a tiny C library to statically link early userspace
|
||||
@ -255,7 +265,7 @@ name lookups, even when otherwise statically linked.)
|
||||
|
||||
A good first step is to get initramfs to run a statically linked "hello world"
|
||||
program as init, and test it under an emulator like qemu (www.qemu.org) or
|
||||
User Mode Linux, like so:
|
||||
User Mode Linux, like so::
|
||||
|
||||
cat > hello.c << EOF
|
||||
#include <stdio.h>
|
||||
@ -326,8 +336,8 @@ the above threads) is:
|
||||
|
||||
explained his reasoning:
|
||||
|
||||
http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html
|
||||
http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html
|
||||
- http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html
|
||||
- http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html
|
||||
|
||||
and, most importantly, designed and implemented the initramfs code.
|
||||
|
@ -1,3 +1,6 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==================================
|
||||
relay interface (formerly relayfs)
|
||||
==================================
|
||||
|
||||
@ -108,6 +111,7 @@ The relay interface implements basic file operations for user space
|
||||
access to relay channel buffer data. Here are the file operations
|
||||
that are available and some comments regarding their behavior:
|
||||
|
||||
=========== ============================================================
|
||||
open() enables user to open an _existing_ channel buffer.
|
||||
|
||||
mmap() results in channel buffer being mapped into the caller's
|
||||
@ -136,13 +140,16 @@ poll() POLLIN/POLLRDNORM/POLLERR supported. User applications are
|
||||
close() decrements the channel buffer's refcount. When the refcount
|
||||
reaches 0, i.e. when no process or kernel client has the
|
||||
buffer open, the channel buffer is freed.
|
||||
=========== ============================================================
|
||||
|
||||
In order for a user application to make use of relay files, the
|
||||
host filesystem must be mounted. For example,
|
||||
host filesystem must be mounted. For example::
|
||||
|
||||
mount -t debugfs debugfs /sys/kernel/debug
|
||||
|
||||
NOTE: the host filesystem doesn't need to be mounted for kernel
|
||||
.. Note::
|
||||
|
||||
the host filesystem doesn't need to be mounted for kernel
|
||||
clients to create or use channels - it only needs to be
|
||||
mounted when user space applications need access to the buffer
|
||||
data.
|
||||
@ -154,7 +161,7 @@ The relay interface kernel API
|
||||
Here's a summary of the API the relay interface provides to in-kernel clients:
|
||||
|
||||
TBD(curr. line MT:/API/)
|
||||
channel management functions:
|
||||
channel management functions::
|
||||
|
||||
relay_open(base_filename, parent, subbuf_size, n_subbufs,
|
||||
callbacks, private_data)
|
||||
@ -162,17 +169,17 @@ TBD(curr. line MT:/API/)
|
||||
relay_flush(chan)
|
||||
relay_reset(chan)
|
||||
|
||||
channel management typically called on instigation of userspace:
|
||||
channel management typically called on instigation of userspace::
|
||||
|
||||
relay_subbufs_consumed(chan, cpu, subbufs_consumed)
|
||||
|
||||
write functions:
|
||||
write functions::
|
||||
|
||||
relay_write(chan, data, length)
|
||||
__relay_write(chan, data, length)
|
||||
relay_reserve(chan, length)
|
||||
|
||||
callbacks:
|
||||
callbacks::
|
||||
|
||||
subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
|
||||
buf_mapped(buf, filp)
|
||||
@ -180,7 +187,7 @@ TBD(curr. line MT:/API/)
|
||||
create_buf_file(filename, parent, mode, buf, is_global)
|
||||
remove_buf_file(dentry)
|
||||
|
||||
helper functions:
|
||||
helper functions::
|
||||
|
||||
relay_buf_full(buf)
|
||||
subbuf_start_reserve(buf, length)
|
||||
@ -215,41 +222,41 @@ the file(s) created in create_buf_file() and is called during
|
||||
relay_close().
|
||||
|
||||
Here are some typical definitions for these callbacks, in this case
|
||||
using debugfs:
|
||||
using debugfs::
|
||||
|
||||
/*
|
||||
/*
|
||||
* create_buf_file() callback. Creates relay file in debugfs.
|
||||
*/
|
||||
static struct dentry *create_buf_file_handler(const char *filename,
|
||||
static struct dentry *create_buf_file_handler(const char *filename,
|
||||
struct dentry *parent,
|
||||
umode_t mode,
|
||||
struct rchan_buf *buf,
|
||||
int *is_global)
|
||||
{
|
||||
{
|
||||
return debugfs_create_file(filename, mode, parent, buf,
|
||||
&relay_file_operations);
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
/*
|
||||
* remove_buf_file() callback. Removes relay file from debugfs.
|
||||
*/
|
||||
static int remove_buf_file_handler(struct dentry *dentry)
|
||||
{
|
||||
static int remove_buf_file_handler(struct dentry *dentry)
|
||||
{
|
||||
debugfs_remove(dentry);
|
||||
|
||||
return 0;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
/*
|
||||
* relay interface callbacks
|
||||
*/
|
||||
static struct rchan_callbacks relay_callbacks =
|
||||
{
|
||||
static struct rchan_callbacks relay_callbacks =
|
||||
{
|
||||
.create_buf_file = create_buf_file_handler,
|
||||
.remove_buf_file = remove_buf_file_handler,
|
||||
};
|
||||
};
|
||||
|
||||
And an example relay_open() invocation using them:
|
||||
And an example relay_open() invocation using them::
|
||||
|
||||
chan = relay_open("cpu", NULL, SUBBUF_SIZE, N_SUBBUFS, &relay_callbacks, NULL);
|
||||
|
||||
@ -339,13 +346,13 @@ whether or not to actually move on to the next sub-buffer.
|
||||
|
||||
To implement 'no-overwrite' mode, the userspace client would provide
|
||||
an implementation of the subbuf_start() callback something like the
|
||||
following:
|
||||
following::
|
||||
|
||||
static int subbuf_start(struct rchan_buf *buf,
|
||||
static int subbuf_start(struct rchan_buf *buf,
|
||||
void *subbuf,
|
||||
void *prev_subbuf,
|
||||
unsigned int prev_padding)
|
||||
{
|
||||
{
|
||||
if (prev_subbuf)
|
||||
*((unsigned *)prev_subbuf) = prev_padding;
|
||||
|
||||
@ -355,7 +362,7 @@ static int subbuf_start(struct rchan_buf *buf,
|
||||
subbuf_start_reserve(buf, sizeof(unsigned int));
|
||||
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
|
||||
If the current buffer is full, i.e. all sub-buffers remain unconsumed,
|
||||
the callback returns 0 to indicate that the buffer switch should not
|
||||
@ -370,20 +377,20 @@ ready sub-buffers will relay_buf_full() return 0, in which case the
|
||||
buffer switch can continue.
|
||||
|
||||
The implementation of the subbuf_start() callback for 'overwrite' mode
|
||||
would be very similar:
|
||||
would be very similar::
|
||||
|
||||
static int subbuf_start(struct rchan_buf *buf,
|
||||
static int subbuf_start(struct rchan_buf *buf,
|
||||
void *subbuf,
|
||||
void *prev_subbuf,
|
||||
size_t prev_padding)
|
||||
{
|
||||
{
|
||||
if (prev_subbuf)
|
||||
*((unsigned *)prev_subbuf) = prev_padding;
|
||||
|
||||
subbuf_start_reserve(buf, sizeof(unsigned int));
|
||||
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
|
||||
In this case, the relay_buf_full() check is meaningless and the
|
||||
callback always returns 1, causing the buffer switch to occur
|
@ -1,4 +1,8 @@
|
||||
ROMFS - ROM FILE SYSTEM
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=======================
|
||||
ROMFS - ROM File System
|
||||
=======================
|
||||
|
||||
This is a quite dumb, read only filesystem, mainly for initial RAM
|
||||
disks of installation disks. It has grown up by the need of having
|
||||
@ -51,9 +55,9 @@ the 16 byte padding for the name and the contents, also 16+14+15 = 45
|
||||
bytes. This is quite rare however, since most file names are longer
|
||||
than 3 bytes, and shorter than 15 bytes.
|
||||
|
||||
The layout of the filesystem is the following:
|
||||
The layout of the filesystem is the following::
|
||||
|
||||
offset content
|
||||
offset content
|
||||
|
||||
+---+---+---+---+
|
||||
0 | - | r | o | m | \
|
||||
@ -84,9 +88,9 @@ the source. This algorithm was chosen because although it's not quite
|
||||
reliable, it does not require any tables, and it is very simple.
|
||||
|
||||
The following bytes are now part of the file system; each file header
|
||||
must begin on a 16 byte boundary.
|
||||
must begin on a 16 byte boundary::
|
||||
|
||||
offset content
|
||||
offset content
|
||||
|
||||
+---+---+---+---+
|
||||
0 | next filehdr|X| The offset of the next file header
|
||||
@ -114,7 +118,9 @@ file is user and group 0, this should never be a problem for the
|
||||
intended use. The mapping of the 8 possible values to file types is
|
||||
the following:
|
||||
|
||||
== =============== ============================================
|
||||
mapping spec.info means
|
||||
== =============== ============================================
|
||||
0 hard link link destination [file header]
|
||||
1 directory first file's header
|
||||
2 regular file unused, must be zero [MBZ]
|
||||
@ -123,6 +129,7 @@ the following:
|
||||
5 char device - " -
|
||||
6 socket unused, MBZ
|
||||
7 fifo unused, MBZ
|
||||
== =============== ============================================
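Because every file header begins on a 16 byte boundary, the low four bits of a header offset are always zero and are free to carry mode information. The sketch below shows the arithmetic. It assumes the "next filehdr" word has already been converted to host byte order and that the low three bits select one of the 8 file types in the table above. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ROMFS_SKETCH_ALIGN_MASK 15u   /* headers sit on 16 byte boundaries */
#define ROMFS_SKETCH_TYPE_MASK   7u   /* assumed: low three bits = file type */

/* Round a length or offset up to the next 16 byte boundary. */
static uint32_t align16(uint32_t n)
{
	return (n + ROMFS_SKETCH_ALIGN_MASK) & ~ROMFS_SKETCH_ALIGN_MASK;
}

/* Offset of the next file header, with the mode bits masked off. */
static uint32_t next_header_offset(uint32_t next_field)
{
	return next_field & ~ROMFS_SKETCH_ALIGN_MASK;
}

/* File type value (0..7) as listed in the mapping table. */
static unsigned int file_type(uint32_t next_field)
{
	return next_field & ROMFS_SKETCH_TYPE_MASK;
}
```

For a field value of 0x42, the next header is at offset 0x40 and the type is 2, a regular file.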
|
||||
|
||||
Note that hard links are specifically marked in this filesystem, but
|
||||
they will behave as you can expect (i.e. share the inode number).
|
||||
@ -158,24 +165,24 @@ to romfs-subscribe@shadow.banki.hu, the content is irrelevant.
|
||||
Pending issues:
|
||||
|
||||
- Permissions and owner information are pretty essential features of a
|
||||
Un*x like system, but romfs does not provide the full possibilities.
|
||||
I have never found this limiting, but others might.
|
||||
Un*x like system, but romfs does not provide the full possibilities.
|
||||
I have never found this limiting, but others might.
|
||||
|
||||
- The file system is read only, so it can be very small, but in case
|
||||
one would want to write _anything_ to a file system, he still needs
|
||||
a writable file system, thus negating the size advantages. Possible
|
||||
solutions: implement write access as a compile-time option, or a new,
|
||||
similarly small writable filesystem for RAM disks.
|
||||
one would want to write _anything_ to a file system, he still needs
|
||||
a writable file system, thus negating the size advantages. Possible
|
||||
solutions: implement write access as a compile-time option, or a new,
|
||||
similarly small writable filesystem for RAM disks.
|
||||
|
||||
- Since the files are only required to have alignment on a 16 byte
|
||||
boundary, it is currently possibly suboptimal to read or execute files
|
||||
from the filesystem. It might be resolved by reordering file data to
|
||||
have most of it (i.e. except the start and the end) lying at "natural"
|
||||
boundaries, thus it would be possible to directly map a big portion of
|
||||
the file contents to the mm subsystem.
|
||||
boundary, it is currently possibly suboptimal to read or execute files
|
||||
from the filesystem. It might be resolved by reordering file data to
|
||||
have most of it (i.e. except the start and the end) lying at "natural"
|
||||
boundaries, thus it would be possible to directly map a big portion of
|
||||
the file contents to the mm subsystem.
|
||||
|
||||
- Compression might be a useful feature, but memory is quite a
|
||||
limiting factor in my eyes.
|
||||
limiting factor in my eyes.
|
||||
|
||||
- Where is it used?
|
||||
|
||||
@ -183,4 +190,5 @@ limiting factor in my eyes.
|
||||
|
||||
|
||||
Have fun,
|
||||
|
||||
Janos Farkas <chexum@shadow.banki.hu>
|
@ -1,7 +1,11 @@
|
||||
SQUASHFS 4.0 FILESYSTEM
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=======================
|
||||
Squashfs 4.0 Filesystem
|
||||
=======================
|
||||
|
||||
Squashfs is a compressed read-only filesystem for Linux.
|
||||
|
||||
It uses zlib, lz4, lzo, or xz compression to compress files, inodes and
|
||||
directories. Inodes in the system are very small and all blocks are packed to
|
||||
minimise data overhead. Block sizes greater than 4K are supported up to a
|
||||
@ -15,31 +19,33 @@ needed.
|
||||
Mailing list: squashfs-devel@lists.sourceforge.net
|
||||
Web site: www.squashfs.org
|
||||
|
||||
1. FILESYSTEM FEATURES
|
||||
1. Filesystem Features
|
||||
----------------------
|
||||
|
||||
Squashfs filesystem features versus Cramfs:
|
||||
|
||||
============================== ========= ==========
|
||||
Squashfs Cramfs
|
||||
|
||||
Max filesystem size: 2^64 256 MiB
|
||||
Max file size: ~ 2 TiB 16 MiB
|
||||
Max files: unlimited unlimited
|
||||
Max directories: unlimited unlimited
|
||||
Max entries per directory: unlimited unlimited
|
||||
Max block size: 1 MiB 4 KiB
|
||||
Metadata compression: yes no
|
||||
Directory indexes: yes no
|
||||
Sparse file support: yes no
|
||||
Tail-end packing (fragments): yes no
|
||||
Exportable (NFS etc.): yes no
|
||||
Hard link support: yes no
|
||||
"." and ".." in readdir: yes no
|
||||
Real inode numbers: yes no
|
||||
32-bit uids/gids: yes no
|
||||
File creation time: yes no
|
||||
Xattr support: yes no
|
||||
ACL support: no no
|
||||
============================== ========= ==========
|
||||
Max filesystem size 2^64 256 MiB
|
||||
Max file size ~ 2 TiB 16 MiB
|
||||
Max files unlimited unlimited
|
||||
Max directories unlimited unlimited
|
||||
Max entries per directory unlimited unlimited
|
||||
Max block size 1 MiB 4 KiB
|
||||
Metadata compression yes no
|
||||
Directory indexes yes no
|
||||
Sparse file support yes no
|
||||
Tail-end packing (fragments) yes no
|
||||
Exportable (NFS etc.) yes no
|
||||
Hard link support yes no
|
||||
"." and ".." in readdir yes no
|
||||
Real inode numbers yes no
|
||||
32-bit uids/gids yes no
|
||||
File creation time yes no
|
||||
Xattr support yes no
|
||||
ACL support no no
|
||||
============================== ========= ==========
|
||||
|
||||
Squashfs compresses data, inodes and directories. In addition, inode and
|
||||
directory data are highly compacted, and packed on byte boundaries. Each
|
||||
@ -47,7 +53,7 @@ compressed inode is on average 8 bytes in length (the exact length varies on
|
||||
file type, i.e. regular file, directory, symbolic link, and block/char device
|
||||
inodes have different sizes).
|
||||
|
||||
2. USING SQUASHFS
|
||||
2. Using Squashfs
|
||||
-----------------
|
||||
|
||||
As squashfs is a read-only filesystem, the mksquashfs program must be used to
|
||||
@ -58,11 +64,11 @@ obtained from this site also.
|
||||
The squashfs-tools development tree is now located on kernel.org
|
||||
git://git.kernel.org/pub/scm/fs/squashfs/squashfs-tools.git
|
||||
|
||||
3. SQUASHFS FILESYSTEM DESIGN
|
||||
3. Squashfs Filesystem Design
|
||||
-----------------------------
|
||||
|
||||
A squashfs filesystem consists of a maximum of nine parts, packed together on a
|
||||
byte alignment:
|
||||
byte alignment::
|
||||
|
||||
---------------
|
||||
| superblock |
|
||||
@ -229,15 +235,15 @@ location of the xattr list inside each inode, a 32-bit xattr id
|
||||
is stored. This xattr id is mapped into the location of the xattr
|
||||
list using a second xattr id lookup table.
|
||||
|
||||
4. TODOS AND OUTSTANDING ISSUES
|
||||
4. TODOs and Outstanding Issues
|
||||
-------------------------------
|
||||
|
||||
4.1 Todo list
|
||||
4.1 TODO list
|
||||
-------------
|
||||
|
||||
Implement ACL support.
|
||||
|
||||
4.2 Squashfs internal cache
|
||||
4.2 Squashfs Internal Cache
|
||||
---------------------------
|
||||
|
||||
Blocks in Squashfs are compressed. To avoid repeatedly decompressing
|
@ -1,11 +1,15 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
sysfs - _The_ filesystem for exporting kernel objects.
|
||||
=====================================================
|
||||
sysfs - _The_ filesystem for exporting kernel objects
|
||||
=====================================================
|
||||
|
||||
Patrick Mochel <mochel@osdl.org>
|
||||
|
||||
Mike Murphy <mamurph@cs.clemson.edu>
|
||||
|
||||
Revised: 16 August 2011
|
||||
Original: 10 January 2003
|
||||
:Revised: 16 August 2011
|
||||
:Original: 10 January 2003
|
||||
|
||||
|
||||
What it is:
|
||||
@ -24,7 +28,7 @@ Using sysfs
|
||||
~~~~~~~~~~~
|
||||
|
||||
sysfs is always compiled in if CONFIG_SYSFS is defined. You can access
|
||||
it by doing:
|
||||
it by doing::
|
||||
|
||||
mount -t sysfs sysfs /sys
|
||||
|
||||
@ -65,17 +69,17 @@ formatting of data is heavily frowned upon. Doing these things may get
|
||||
you publicly humiliated and your code rewritten without notice.
|
||||
|
||||
|
||||
An attribute definition is simply:
|
||||
An attribute definition is simply::
|
||||
|
||||
struct attribute {
|
||||
struct attribute {
|
||||
char * name;
|
||||
struct module *owner;
|
||||
umode_t mode;
|
||||
};
|
||||
};
|
||||
|
||||
|
||||
int sysfs_create_file(struct kobject * kobj, const struct attribute * attr);
|
||||
void sysfs_remove_file(struct kobject * kobj, const struct attribute * attr);
|
||||
int sysfs_create_file(struct kobject * kobj, const struct attribute * attr);
|
||||
void sysfs_remove_file(struct kobject * kobj, const struct attribute * attr);
|
||||
|
||||
|
||||
A bare attribute contains no means to read or write the value of the
|
||||
@ -83,38 +87,38 @@ attribute. Subsystems are encouraged to define their own attribute
|
||||
structure and wrapper functions for adding and removing attributes for
|
||||
a specific object type.
|
||||
|
||||
For example, the driver model defines struct device_attribute like:
|
||||
For example, the driver model defines struct device_attribute like::
|
||||
|
||||
struct device_attribute {
|
||||
struct device_attribute {
|
||||
struct attribute attr;
|
||||
ssize_t (*show)(struct device *dev, struct device_attribute *attr,
|
||||
char *buf);
|
||||
ssize_t (*store)(struct device *dev, struct device_attribute *attr,
|
||||
const char *buf, size_t count);
|
||||
};
|
||||
};
|
||||
|
||||
int device_create_file(struct device *, const struct device_attribute *);
|
||||
void device_remove_file(struct device *, const struct device_attribute *);
|
||||
int device_create_file(struct device *, const struct device_attribute *);
|
||||
void device_remove_file(struct device *, const struct device_attribute *);
|
||||
|
||||
It also defines this helper for defining device attributes:
|
||||
It also defines this helper for defining device attributes::
|
||||
|
||||
#define DEVICE_ATTR(_name, _mode, _show, _store) \
|
||||
struct device_attribute dev_attr_##_name = __ATTR(_name, _mode, _show, _store)
|
||||
#define DEVICE_ATTR(_name, _mode, _show, _store) \
|
||||
struct device_attribute dev_attr_##_name = __ATTR(_name, _mode, _show, _store)
|
||||
|
||||
For example, declaring
|
||||
For example, declaring::
|
||||
|
||||
static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo);
|
||||
static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo);
|
||||
|
||||
is equivalent to doing:
|
||||
is equivalent to doing::
|
||||
|
||||
static struct device_attribute dev_attr_foo = {
|
||||
static struct device_attribute dev_attr_foo = {
|
||||
.attr = {
|
||||
.name = "foo",
|
||||
.mode = S_IWUSR | S_IRUGO,
|
||||
},
|
||||
.show = show_foo,
|
||||
.store = store_foo,
|
||||
};
|
||||
};
|
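The equivalence stated above comes from token pasting (`##`) and stringification (`#`) in the macro. The following is a minimal userspace sketch of that mechanism, not the kernel's implementation: `DEMO_ATTR`, `DEMO_DEVICE_ATTR`, `DEMO_BUF_SIZE`, and the stripped-down `struct device` are hypothetical stand-ins, and the literal `0644` is used where the kernel text writes `S_IWUSR | S_IRUGO`.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

#define DEMO_BUF_SIZE 64            /* hypothetical; sysfs uses PAGE_SIZE */

/* Userspace stand-ins for the kernel structures quoted above. */
struct device {
    char name[32];
};

struct attribute {
    const char *name;
    unsigned short mode;
};

struct device_attribute {
    struct attribute attr;
    ssize_t (*show)(struct device *dev, struct device_attribute *attr,
                    char *buf);
    ssize_t (*store)(struct device *dev, struct device_attribute *attr,
                     const char *buf, size_t count);
};

/* #_name turns the macro argument into the string used as the file name. */
#define DEMO_ATTR(_name, _mode, _show, _store) {            \
    .attr = { .name = #_name, .mode = (_mode) },            \
    .show = (_show),                                        \
    .store = (_store),                                      \
}

/* dev_attr_##_name pastes the argument into the variable name. */
#define DEMO_DEVICE_ATTR(_name, _mode, _show, _store)       \
    struct device_attribute dev_attr_##_name =              \
        DEMO_ATTR(_name, _mode, _show, _store)

static ssize_t show_foo(struct device *dev, struct device_attribute *attr,
                        char *buf)
{
    (void)attr;
    return snprintf(buf, DEMO_BUF_SIZE, "%s\n", dev->name);
}

static ssize_t store_foo(struct device *dev, struct device_attribute *attr,
                         const char *buf, size_t count)
{
    (void)dev;
    (void)attr;
    (void)buf;
    return (ssize_t)count;          /* pretend the whole write was consumed */
}

/* Expands to: static struct device_attribute dev_attr_foo = { ... }; */
static DEMO_DEVICE_ATTR(foo, 0644, show_foo, store_foo);
```

After expansion, `dev_attr_foo.attr.name` is the string "foo" and the two callbacks point at `show_foo` and `store_foo`, which mirrors the hand-written initializer shown in the text.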
||||
|
||||
Note as stated in include/linux/kernel.h "OTHER_WRITABLE? Generally
|
||||
considered a bad idea." so trying to set a sysfs file writable for
|
||||
@ -127,15 +131,21 @@ readable. The above case could be shortened to:
|
||||
static struct device_attribute dev_attr_foo = __ATTR_RW(foo);
|
||||
|
||||
the list of helpers available to define your wrapper function is:
|
||||
__ATTR_RO(name): assumes default name_show and mode 0444
|
||||
__ATTR_WO(name): assumes a name_store only and is restricted to mode
|
||||
|
||||
__ATTR_RO(name):
|
||||
assumes default name_show and mode 0444
|
||||
__ATTR_WO(name):
|
||||
assumes a name_store only and is restricted to mode
|
||||
0200 that is root write access only.
|
||||
__ATTR_RO_MODE(name, mode): fore more restrictive RO access currently
|
||||
__ATTR_RO_MODE(name, mode):
|
||||
fore more restrictive RO access currently
|
||||
only use case is the EFI System Resource Table
|
||||
(see drivers/firmware/efi/esrt.c)
|
||||
__ATTR_RW(name): assumes default name_show, name_store and setting
|
||||
__ATTR_RW(name):
|
||||
assumes default name_show, name_store and setting
|
||||
mode to 0644.
|
||||
__ATTR_NULL: which sets the name to NULL and is used as end of list
|
||||
__ATTR_NULL:
|
||||
which sets the name to NULL and is used as end of list
|
||||
indicator (see: kernel/workqueue.c)
|
||||
|
||||
Subsystem-Specific Callbacks
|
||||
@ -143,12 +153,12 @@ Subsystem-Specific Callbacks
|
||||
|
||||
When a subsystem defines a new attribute type, it must implement a
|
||||
set of sysfs operations for forwarding read and write calls to the
|
||||
show and store methods of the attribute owners.
|
||||
show and store methods of the attribute owners::
|
||||
|
||||
struct sysfs_ops {
|
||||
struct sysfs_ops {
|
||||
ssize_t (*show)(struct kobject *, struct attribute *, char *);
|
||||
ssize_t (*store)(struct kobject *, struct attribute *, const char *, size_t);
|
||||
};
|
||||
};
|
||||
|
||||
[ Subsystems should have already defined a struct kobj_type as a
|
||||
descriptor for this type, which is where the sysfs_ops pointer is
|
||||
@ -160,14 +170,14 @@ and struct attribute pointers to the appropriate pointer types, and
|
||||
calls the associated methods.
|
||||
|
||||
|
||||
To illustrate:
|
||||
To illustrate::
|
||||
|
||||
#define to_dev(obj) container_of(obj, struct device, kobj)
|
||||
#define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr)
|
||||
#define to_dev(obj) container_of(obj, struct device, kobj)
|
||||
#define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr)
|
||||
|
||||
static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr,
|
||||
static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr,
|
||||
char *buf)
|
||||
{
|
||||
{
|
||||
struct device_attribute *dev_attr = to_dev_attr(attr);
|
||||
struct device *dev = to_dev(kobj);
|
||||
ssize_t ret = -EIO;
|
||||
@ -179,7 +189,7 @@ static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr,
|
||||
dev_attr->show);
|
||||
}
|
||||
return ret;
|
||||
}
|
||||
}
|
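The conversion step in `dev_attr_show()` relies on `container_of()` to get from an embedded member back to the structure that contains it. Below is a userspace sketch of the same dispatch pattern, assuming simplified stand-in structures (the `id` field and `id_show()` are hypothetical, and `container_of()` is written with plain `offsetof()` rather than the kernel's type-checked version).

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Simplified container_of(): step back from a member to its parent. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct kobject {
    const char *name;
};

struct attribute {
    const char *name;
};

struct device {
    int id;                         /* hypothetical payload */
    struct kobject kobj;            /* embedded, not a pointer */
};

struct device_attribute {
    struct attribute attr;          /* embedded, not a pointer */
    ssize_t (*show)(struct device *dev, struct device_attribute *attr,
                    char *buf);
};

#define to_dev(obj) container_of(obj, struct device, kobj)
#define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr)

/* Generic entry point: receives only the embedded generic types. */
static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr,
                             char *buf)
{
    struct device_attribute *dev_attr = to_dev_attr(attr);
    struct device *dev = to_dev(kobj);
    ssize_t ret = -EIO;

    if (dev_attr->show)
        ret = dev_attr->show(dev, dev_attr, buf);
    return ret;
}

/* Type-specific callback: sees the full struct device. */
static ssize_t id_show(struct device *dev, struct device_attribute *attr,
                       char *buf)
{
    (void)attr;
    return snprintf(buf, 32, "%d\n", dev->id);
}
```

The generic layer never needs to know about `struct device`; it only hands the embedded `kobject` and `attribute` pointers to the subsystem's wrapper, which recovers the enclosing objects.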
||||
|
||||
|
||||
|
||||
@ -188,10 +198,10 @@ Reading/Writing Attribute Data
|
||||
|
||||
To read or write attributes, show() or store() methods must be
|
||||
specified when declaring the attribute. The method types should be as
|
||||
simple as those defined for device attributes:
|
||||
simple as those defined for device attributes::
|
||||
|
||||
ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf);
|
||||
ssize_t (*store)(struct device *dev, struct device_attribute *attr,
|
||||
ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf);
|
||||
ssize_t (*store)(struct device *dev, struct device_attribute *attr,
|
||||
const char *buf, size_t count);
|
||||
|
||||
IOW, they should take only an object, an attribute, and a buffer as parameters.
|
||||
@ -251,23 +261,23 @@ Other notes:
|
||||
sure to have a way to check this, if necessary.
|
||||
|
||||
|
||||
A very simple (and naive) implementation of a device attribute is:
|
||||
A very simple (and naive) implementation of a device attribute is::
|
||||
|
||||
static ssize_t show_name(struct device *dev, struct device_attribute *attr,
|
||||
static ssize_t show_name(struct device *dev, struct device_attribute *attr,
|
||||
char *buf)
|
||||
{
|
||||
{
|
||||
return scnprintf(buf, PAGE_SIZE, "%s\n", dev->name);
|
||||
}
|
||||
}
|
||||
|
||||
static ssize_t store_name(struct device *dev, struct device_attribute *attr,
|
||||
static ssize_t store_name(struct device *dev, struct device_attribute *attr,
|
||||
const char *buf, size_t count)
|
||||
{
|
||||
{
|
||||
snprintf(dev->name, sizeof(dev->name), "%.*s",
|
||||
(int)min(count, sizeof(dev->name) - 1), buf);
|
||||
return count;
|
||||
}
|
||||
}
|
||||
|
||||
static DEVICE_ATTR(name, S_IRUGO, show_name, store_name);
|
||||
static DEVICE_ATTR(name, S_IRUGO, show_name, store_name);
|
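The bounded copy in `store_name()` and the bounded format in `show_name()` can be tried outside the kernel. This is a rough userspace sketch under stated assumptions: `struct device` is reduced to a small `name` array, the `attr` parameter is dropped, the buffer size is passed explicitly instead of using `PAGE_SIZE`, and `scnprintf()` is approximated on top of `snprintf()`.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical stand-in: a deliberately tiny name field. */
struct device {
    char name[8];
};

/* Copy at most sizeof(name) - 1 bytes, as the "%.*s" call above does. */
static ssize_t store_name(struct device *dev, const char *buf, size_t count)
{
    size_t max = sizeof(dev->name) - 1;
    size_t n = count < max ? count : max;

    snprintf(dev->name, sizeof(dev->name), "%.*s", (int)n, buf);
    return (ssize_t)count;          /* report the whole write as consumed */
}

/*
 * Approximation of scnprintf(): return the number of bytes actually
 * placed in buf (excluding the NUL), never the would-be length.
 */
static ssize_t show_name(struct device *dev, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%s\n", dev->name);

    if (n < 0 || size == 0)
        return 0;
    if ((size_t)n >= size)
        return (ssize_t)(size - 1);
    return n;
}
```

Writing a 12-byte string into the 8-byte field keeps the first 7 bytes, yet `store_name()` still returns 12, which is the same behaviour as the naive kernel example: the caller is told the full write was accepted.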
||||
|
||||
|
||||
(Note that the real implementation doesn't allow userspace to set the
|
||||
@ -280,23 +290,23 @@ Top Level Directory Layout
|
||||
The sysfs directory arrangement exposes the relationship of kernel
|
||||
data structures.
|
||||
|
||||
The top level sysfs directory looks like:
|
||||
The top level sysfs directory looks like::
|
||||
|
||||
block/
|
||||
bus/
|
||||
class/
|
||||
dev/
|
||||
devices/
|
||||
firmware/
|
||||
net/
|
||||
fs/
|
||||
block/
|
||||
bus/
|
||||
class/
|
||||
dev/
|
||||
devices/
|
||||
firmware/
|
||||
net/
|
||||
fs/
|
||||
|
||||
devices/ contains a filesystem representation of the device tree. It maps
|
||||
directly to the internal kernel device tree, which is a hierarchy of
|
||||
struct device.
|
||||
|
||||
bus/ contains a flat directory layout of the various bus types in the
|
||||
kernel. Each bus's directory contains two subdirectories:
|
||||
kernel. Each bus's directory contains two subdirectories::
|
||||
|
||||
devices/
|
||||
drivers/
|
||||
@ -331,71 +341,71 @@ Current Interfaces
|
||||
The following interface layers currently exist in sysfs:
|
||||
|
||||
|
||||
- devices (include/linux/device.h)
|
||||
----------------------------------
|
||||
Structure:
|
||||
devices (include/linux/device.h)
|
||||
--------------------------------
|
||||
Structure::
|
||||
|
||||
struct device_attribute {
|
||||
struct device_attribute {
|
||||
struct attribute attr;
|
||||
ssize_t (*show)(struct device *dev, struct device_attribute *attr,
|
||||
char *buf);
|
||||
ssize_t (*store)(struct device *dev, struct device_attribute *attr,
|
||||
const char *buf, size_t count);
|
||||
};
|
||||
};
|
||||
|
||||
Declaring:
|
||||
Declaring::
|
||||
|
||||
DEVICE_ATTR(_name, _mode, _show, _store);
|
||||
DEVICE_ATTR(_name, _mode, _show, _store);
|
||||
|
||||
Creation/Removal:
|
||||
Creation/Removal::
|
||||
|
||||
int device_create_file(struct device *dev, const struct device_attribute * attr);
|
||||
void device_remove_file(struct device *dev, const struct device_attribute * attr);
|
||||
int device_create_file(struct device *dev, const struct device_attribute * attr);
|
||||
void device_remove_file(struct device *dev, const struct device_attribute * attr);
|
||||
|
||||
|
||||
- bus drivers (include/linux/device.h)
|
||||
--------------------------------------
|
||||
Structure:
|
||||
bus drivers (include/linux/device.h)
|
||||
------------------------------------
|
||||
Structure::
|
||||
|
||||
struct bus_attribute {
|
||||
struct bus_attribute {
|
||||
struct attribute attr;
|
||||
ssize_t (*show)(struct bus_type *, char * buf);
|
||||
ssize_t (*store)(struct bus_type *, const char * buf, size_t count);
|
||||
};
|
||||
};
|
||||
|
||||
Declaring:
|
||||
Declaring::
|
||||
|
||||
static BUS_ATTR_RW(name);
|
||||
static BUS_ATTR_RO(name);
|
||||
static BUS_ATTR_WO(name);
|
||||
static BUS_ATTR_RW(name);
|
||||
static BUS_ATTR_RO(name);
|
||||
static BUS_ATTR_WO(name);
|
||||
|
||||
Creation/Removal:
|
||||
Creation/Removal::
|
||||
|
||||
int bus_create_file(struct bus_type *, struct bus_attribute *);
|
||||
void bus_remove_file(struct bus_type *, struct bus_attribute *);
|
||||
int bus_create_file(struct bus_type *, struct bus_attribute *);
|
||||
void bus_remove_file(struct bus_type *, struct bus_attribute *);
|
||||
|
||||
|
||||
- device drivers (include/linux/device.h)
|
||||
-----------------------------------------
|
||||
device drivers (include/linux/device.h)
|
||||
---------------------------------------
|
||||
|
||||
Structure:
|
||||
Structure::
|
||||
|
||||
struct driver_attribute {
|
||||
struct driver_attribute {
|
||||
struct attribute attr;
|
||||
ssize_t (*show)(struct device_driver *, char * buf);
|
||||
ssize_t (*store)(struct device_driver *, const char * buf,
|
||||
size_t count);
|
||||
};
|
||||
};
|
||||
|
||||
Declaring:
|
||||
Declaring::
|
||||
|
||||
DRIVER_ATTR_RO(_name)
|
||||
DRIVER_ATTR_RW(_name)
|
||||
DRIVER_ATTR_RO(_name)
|
||||
DRIVER_ATTR_RW(_name)
|
||||
|
||||
Creation/Removal:
|
||||
Creation/Removal::
|
||||
|
||||
int driver_create_file(struct device_driver *, const struct driver_attribute *);
|
||||
void driver_remove_file(struct device_driver *, const struct driver_attribute *);
|
||||
int driver_create_file(struct device_driver *, const struct driver_attribute *);
|
||||
void driver_remove_file(struct device_driver *, const struct driver_attribute *);
|
||||
|
||||
|
||||
Documentation
|
@ -1,25 +1,40 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==================
|
||||
SystemV Filesystem
|
||||
==================
|
||||
|
||||
It implements all of
|
||||
- Xenix FS,
|
||||
- SystemV/386 FS,
|
||||
- Coherent FS.
|
||||
|
||||
To install:
|
||||
|
||||
* Answer the 'System V and Coherent filesystem support' question with 'y'
|
||||
when configuring the kernel.
|
||||
* To mount a disk or a partition, use
|
||||
* To mount a disk or a partition, use::
|
||||
|
||||
mount [-r] -t sysv device mountpoint
|
||||
The file system type names
|
||||
|
||||
The file system type names::
|
||||
|
||||
-t sysv
|
||||
-t xenix
|
||||
-t coherent
|
||||
|
||||
may be used interchangeably, but the last two will eventually disappear.
|
||||
|
||||
Bugs in the present implementation:
|
||||
|
||||
- Coherent FS:
|
||||
|
||||
- The "free list interleave" n:m is currently ignored.
|
||||
- Only file systems with no filesystem name and no pack name are recognized.
|
||||
(See Coherent "man mkfs" for a description of these features.)
|
||||
|
||||
- SystemV Release 2 FS:
|
||||
|
||||
The superblock is only searched in the blocks 9, 15, 18, which
|
||||
corresponds to the beginning of track 1 on floppy disks. No support
|
||||
for this FS on hard disk yet.
|
||||
@ -28,12 +43,14 @@ Bugs in the present implementation:
|
||||
These filesystems are rather similar. Here is a comparison with Minix FS:
|
||||
|
||||
* Linux fdisk reports on partitions
|
||||
|
||||
- Minix FS 0x81 Linux/Minix
|
||||
- Xenix FS ??
|
||||
- SystemV FS ??
|
||||
- Coherent FS 0x08 AIX bootable
|
||||
|
||||
* Size of a block or zone (data allocation unit on disk)
|
||||
|
||||
- Minix FS 1024
|
||||
- Xenix FS 1024 (also 512 ??)
|
||||
- SystemV FS 1024 (also 512 and 2048)
|
||||
@ -45,37 +62,51 @@ These filesystems are rather similar. Here is a comparison with Minix FS:
|
||||
all the block numbers (including the super block) are offset by one track.
|
||||
|
||||
* Byte ordering of "short" (16 bit entities) on disk:
|
||||
|
||||
- Minix FS little endian 0 1
|
||||
- Xenix FS little endian 0 1
|
||||
- SystemV FS little endian 0 1
|
||||
- Coherent FS little endian 0 1
|
||||
|
||||
Of course, this affects only the file system, not the data of files on it!
|
||||
|
||||
* Byte ordering of "long" (32 bit entities) on disk:
|
||||
|
||||
- Minix FS little endian 0 1 2 3
|
||||
- Xenix FS little endian 0 1 2 3
|
||||
- SystemV FS little endian 0 1 2 3
|
||||
- Coherent FS PDP-11 2 3 0 1
|
||||
|
||||
Of course, this affects only the file system, not the data of files on it!
|
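To make the two 32-bit orderings concrete, here is a small sketch of decoding both layouts from raw on-disk bytes. It assumes the digits in the table give each byte's significance (0 = least significant) listed in disk order, so "2 3 0 1" means the high 16-bit word is stored first and each word is itself little endian. The helper names `get_le32()` and `get_pdp32()` are hypothetical.

```c
#include <stdint.h>

/* Little endian, disk order "0 1 2 3": least significant byte first. */
static uint32_t get_le32(const uint8_t b[4])
{
    return (uint32_t)b[0] |
           (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 |
           (uint32_t)b[3] << 24;
}

/* PDP-11, disk order "2 3 0 1": high word first, each word little endian. */
static uint32_t get_pdp32(const uint8_t b[4])
{
    return (uint32_t)b[2] |
           (uint32_t)b[3] << 8 |
           (uint32_t)b[0] << 16 |
           (uint32_t)b[1] << 24;
}
```

Under this reading, the value 0x12345678 is stored as the bytes 78 56 34 12 on a little-endian filesystem and as 34 12 78 56 on Coherent FS.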
||||
|
||||
* Inode on disk: "short", 0 means non-existent, the root dir ino is:
|
||||
- Minix FS 1
|
||||
- Xenix FS, SystemV FS, Coherent FS 2
|
||||
|
||||
================================= ==
|
||||
Minix FS 1
|
||||
Xenix FS, SystemV FS, Coherent FS 2
|
||||
================================= ==
|
||||
|
||||
* Maximum number of hard links to a file:
|
||||
- Minix FS 250
|
||||
- Xenix FS ??
|
||||
- SystemV FS ??
|
||||
- Coherent FS >=10000
|
||||
|
||||
=========== =========
|
||||
Minix FS 250
|
||||
Xenix FS ??
|
||||
SystemV FS ??
|
||||
Coherent FS >=10000
|
||||
=========== =========
|
||||
|
||||
* Free inode management:
|
||||
- Minix FS a bitmap
|
||||
|
||||
- Minix FS
|
||||
a bitmap
|
||||
- Xenix FS, SystemV FS, Coherent FS
|
||||
There is a cache of a certain number of free inodes in the super-block.
|
||||
When it is exhausted, new free inodes are found using a linear search.
|
||||
|
||||
* Free block management:
|
||||
- Minix FS a bitmap
|
||||
|
||||
- Minix FS
|
||||
a bitmap
|
||||
- Xenix FS, SystemV FS, Coherent FS
|
||||
Free blocks are organized in a "free list". Maybe a misleading term,
|
||||
since it is not true that every free block contains a pointer to
|
||||
@ -86,13 +117,18 @@ These filesystems are rather similar. Here is a comparison with Minix FS:
|
||||
0 on Xenix FS and SystemV FS, with a block zeroed out on Coherent FS.
|
||||
|
||||
* Super-block location:
|
||||
- Minix FS block 1 = bytes 1024..2047
|
||||
- Xenix FS block 1 = bytes 1024..2047
|
||||
- SystemV FS bytes 512..1023
|
||||
- Coherent FS block 1 = bytes 512..1023
|
||||
|
||||
=========== ==========================
|
||||
Minix FS block 1 = bytes 1024..2047
|
||||
Xenix FS block 1 = bytes 1024..2047
|
||||
SystemV FS bytes 512..1023
|
||||
Coherent FS block 1 = bytes 512..1023
|
||||
=========== ==========================
|
||||
|
||||
* Super-block layout:
|
||||
- Minix FS
|
||||
|
||||
- Minix FS::
|
||||
|
||||
unsigned short s_ninodes;
|
||||
unsigned short s_nzones;
|
||||
unsigned short s_imap_blocks;
|
||||
@ -101,7 +137,9 @@ These filesystems are rather similar. Here is a comparison with Minix FS:
|
||||
unsigned short s_log_zone_size;
|
||||
unsigned long s_max_size;
|
||||
unsigned short s_magic;
|
||||
- Xenix FS, SystemV FS, Coherent FS
|
||||
|
||||
- Xenix FS, SystemV FS, Coherent FS::
|
||||
|
||||
unsigned short s_firstdatazone;
|
||||
unsigned long s_nzones;
|
||||
unsigned short s_fzone_count;
|
||||
@ -120,23 +158,33 @@ These filesystems are rather similar. Here is a comparison with Minix FS:
|
||||
unsigned short s_interleave_m,s_interleave_n; -- Coherent FS only
|
||||
char s_fname[6];
|
||||
char s_fpack[6];
|
||||
|
||||
then they differ considerably:
|
||||
Xenix FS
|
||||
|
||||
Xenix FS::
|
||||
|
||||
char s_clean;
|
||||
char s_fill[371];
|
||||
long s_magic;
|
||||
long s_type;
|
||||
SystemV FS
|
||||
|
||||
SystemV FS::
|
||||
|
||||
long s_fill[12 or 14];
|
||||
long s_state;
|
||||
long s_magic;
|
||||
long s_type;
|
||||
Coherent FS
|
||||
|
||||
Coherent FS::
|
||||
|
||||
unsigned long s_unique;
|
||||
|
||||
Note that Coherent FS has no magic.
|
||||
|
||||
* Inode layout:
|
||||
- Minix FS
|
||||
|
||||
- Minix FS::
|
||||
|
||||
unsigned short i_mode;
|
||||
unsigned short i_uid;
|
||||
unsigned long i_size;
|
||||
@ -144,7 +192,9 @@ These filesystems are rather similar. Here is a comparison with Minix FS:
|
||||
unsigned char i_gid;
|
||||
unsigned char i_nlinks;
|
||||
unsigned short i_zone[7+1+1];
|
||||
- Xenix FS, SystemV FS, Coherent FS
|
||||
|
||||
- Xenix FS, SystemV FS, Coherent FS::
|
||||
|
||||
unsigned short i_mode;
|
||||
unsigned short i_nlink;
|
||||
unsigned short i_uid;
|
||||
@ -155,38 +205,55 @@ These filesystems are rather similar. Here is a comparison with Minix FS:
|
||||
unsigned long i_mtime;
|
||||
unsigned long i_ctime;
|
||||
|
||||
* Regular file data blocks are organized as
|
||||
- Minix FS
|
||||
7 direct blocks
|
||||
1 indirect block (pointers to blocks)
|
||||
1 double-indirect block (pointer to pointers to blocks)
|
||||
- Xenix FS, SystemV FS, Coherent FS
|
||||
10 direct blocks
|
||||
1 indirect block (pointers to blocks)
|
||||
1 double-indirect block (pointer to pointers to blocks)
|
||||
1 triple-indirect block (pointer to pointers to pointers to blocks)
|
||||
|
||||
* Inode size, inodes per block
|
||||
- Minix FS 32 32
|
||||
- Xenix FS 64 16
|
||||
- SystemV FS 64 16
|
||||
- Coherent FS 64 8
|
||||
* Regular file data blocks are organized as
|
||||
|
||||
- Minix FS:
|
||||
|
||||
- 7 direct blocks
|
||||
- 1 indirect block (pointers to blocks)
|
||||
- 1 double-indirect block (pointer to pointers to blocks)
|
||||
|
||||
- Xenix FS, SystemV FS, Coherent FS:
|
||||
|
||||
- 10 direct blocks
|
||||
- 1 indirect block (pointers to blocks)
|
||||
- 1 double-indirect block (pointer to pointers to blocks)
|
||||
- 1 triple-indirect block (pointer to pointers to pointers to blocks)
|
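The block-pointer tree above bounds how many data blocks one inode can reach. The sketch below adds up that bound for a given number of direct blocks and indirection levels. The pointers-per-block values are assumptions, not taken from this text: 512 for Minix FS (1024-byte blocks holding 2-byte zone numbers) and 256 for the SystemV family (1024-byte blocks holding 4-byte block numbers). Other limits, such as the width of the size field or of the block numbers themselves, may cap a real filesystem well below this figure.

```c
#include <stdint.h>

/*
 * Upper bound on data blocks reachable from one inode:
 * direct + p + p^2 + ... for each level of indirection,
 * where p is the number of block pointers per block.
 */
static uint64_t max_data_blocks(unsigned int direct,
                                unsigned int indirect_levels,
                                uint64_t ptrs_per_block)
{
    uint64_t total = direct;
    uint64_t reach = 1;
    unsigned int level;

    for (level = 1; level <= indirect_levels; level++) {
        reach *= ptrs_per_block;    /* blocks reachable at this depth */
        total += reach;
    }
    return total;
}
```

With those assumed values, `max_data_blocks(7, 2, 512)` gives 262663 for the Minix layout and `max_data_blocks(10, 3, 256)` gives 16843018 for the Xenix/SystemV/Coherent layout.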
||||
|
||||
|
||||
=========== ========== ================
|
||||
Inode size inodes per block
|
||||
=========== ========== ================
|
||||
Minix FS 32 32
|
||||
Xenix FS 64 16
|
||||
SystemV FS 64 16
|
||||
Coherent FS 64 8
|
||||
=========== ========== ================
|
||||
|
||||
* Directory entry on disk
|
||||
- Minix FS
|
||||
|
||||
- Minix FS::
|
||||
|
||||
unsigned short inode;
|
||||
char name[14/30];
|
||||
- Xenix FS, SystemV FS, Coherent FS
|
||||
|
||||
- Xenix FS, SystemV FS, Coherent FS::
|
||||
|
||||
unsigned short inode;
|
||||
char name[14];
|
||||
|
||||
* Dir entry size, dir entries per block
|
||||
- Minix FS 16/32 64/32
|
||||
- Xenix FS 16 64
|
||||
- SystemV FS 16 64
|
||||
- Coherent FS 16 32
|
||||
=========== ============== =====================
|
||||
Dir entry size dir entries per block
|
||||
=========== ============== =====================
|
||||
Minix FS 16/32 64/32
|
||||
Xenix FS 16 64
|
||||
SystemV FS 16 64
|
||||
Coherent FS 16 32
|
||||
=========== ============== =====================
|
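The two size tables are consistent with each other: multiplying an entry size by the number of entries per block gives the block size each row implies. A small sketch of that check follows, using only the numbers from the tables (the 16-byte Minix directory entry variant is used here); the structure and function names are hypothetical.

```c
/* One row per filesystem, values copied from the tables above. */
struct fs_geometry {
    const char *name;
    unsigned int inode_size, inodes_per_block;
    unsigned int dirent_size, dirents_per_block;
};

static const struct fs_geometry geometries[] = {
    { "Minix FS",    32, 32, 16, 64 },
    { "Xenix FS",    64, 16, 16, 64 },
    { "SystemV FS",  64, 16, 16, 64 },
    { "Coherent FS", 64,  8, 16, 32 },
};

/* Block size implied by the inode table. */
static unsigned int block_size_from_inodes(const struct fs_geometry *g)
{
    return g->inode_size * g->inodes_per_block;
}

/* Block size implied by the directory entry table. */
static unsigned int block_size_from_dirents(const struct fs_geometry *g)
{
    return g->dirent_size * g->dirents_per_block;
}
```

Both functions return 1024 for the Minix, Xenix and SystemV rows and 512 for the Coherent row, so the per-block counts in each table describe the same block size.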
||||
|
||||
* How to implement symbolic links such that the host fsck doesn't scream:
|
||||
|
||||
- Minix FS normal
|
||||
- Xenix FS kludge: as regular files with chmod 1000
|
||||
- SystemV FS ??
|
@ -1,3 +1,9 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=====
|
||||
Tmpfs
|
||||
=====
|
||||
|
||||
Tmpfs is a file system which keeps all files in virtual memory.
|
||||
|
||||
|
||||
@ -34,7 +40,7 @@ tmpfs has the following uses:
|
||||
|
||||
2) glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for
|
||||
POSIX shared memory (shm_open, shm_unlink). Adding the following
|
||||
line to /etc/fstab should take care of this:
|
||||
line to /etc/fstab should take care of this::
|
||||
|
||||
tmpfs /dev/shm tmpfs defaults 0 0
|
||||
|
||||
@ -56,15 +62,17 @@ tmpfs has the following uses:
|
||||
|
||||
tmpfs has three mount options for sizing:
|
||||
|
||||
size: The limit of allocated bytes for this tmpfs instance. The
|
||||
========= ============================================================
|
||||
size The limit of allocated bytes for this tmpfs instance. The
|
||||
default is half of your physical RAM without swap. If you
|
||||
oversize your tmpfs instances the machine will deadlock
|
||||
since the OOM handler will not be able to free that memory.
|
||||
nr_blocks: The same as size, but in blocks of PAGE_SIZE.
|
||||
nr_inodes: The maximum number of inodes for this instance. The default
|
||||
nr_blocks The same as size, but in blocks of PAGE_SIZE.
|
||||
nr_inodes The maximum number of inodes for this instance. The default
|
||||
is half of the number of your physical RAM pages, or (on a
|
||||
machine with highmem) the number of lowmem RAM pages,
|
||||
whichever is the lower.
|
||||
========= ============================================================
|
||||
|
||||
These parameters accept a suffix k, m or g for kilo, mega and giga and
|
||||
can be changed on remount. The size parameter also accepts a suffix %
|
||||
@ -82,6 +90,7 @@ tmpfs has a mount option to set the NUMA memory allocation policy for
|
||||
all files in that instance (if CONFIG_NUMA is enabled) - which can be
|
||||
adjusted on the fly via 'mount -o remount ...'
|
||||
|
||||
======================== ==============================================
|
||||
mpol=default use the process allocation policy
|
||||
(see set_mempolicy(2))
|
||||
mpol=prefer:Node prefers to allocate memory from the given Node
|
||||
@ -89,6 +98,7 @@ mpol=bind:NodeList allocates memory only from nodes in NodeList
|
||||
mpol=interleave prefers to allocate from each node in turn
|
||||
mpol=interleave:NodeList allocates from each node of NodeList in turn
|
||||
mpol=local prefers to allocate memory from the local node
|
||||
======================== ==============================================
|
||||
|
||||
NodeList format is a comma-separated list of decimal numbers and ranges,
|
||||
a range being two hyphen-separated decimal numbers, the smallest and
|
||||
@ -98,9 +108,9 @@ A memory policy with a valid NodeList will be saved, as specified, for
|
||||
use at file creation time. When a task allocates a file in the file
|
||||
system, the mount option memory policy will be applied with a NodeList,
|
||||
if any, modified by the calling task's cpuset constraints
|
||||
[See Documentation/admin-guide/cgroup-v1/cpusets.rst] and any optional flags, listed
|
||||
below. If the resulting NodeLists is the empty set, the effective memory
|
||||
policy for the file will revert to "default" policy.
|
||||
[See Documentation/admin-guide/cgroup-v1/cpusets.rst] and any optional flags,
|
||||
listed below. If the resulting NodeLists is the empty set, the effective
|
||||
memory policy for the file will revert to "default" policy.
|
||||
|
||||
NUMA memory allocation policies have optional flags that can be used in
|
||||
conjunction with their modes. These optional flags can be specified
|
||||
@ -109,6 +119,8 @@ See Documentation/admin-guide/mm/numa_memory_policy.rst for a list of
|
||||
all available memory allocation policy mode flags and their effect on
|
||||
memory policy.
|
||||
|
||||
::
|
||||
|
||||
=static is equivalent to MPOL_F_STATIC_NODES
|
||||
=relative is equivalent to MPOL_F_RELATIVE_NODES
|
||||
|
||||
@ -128,9 +140,11 @@ on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'.
|
||||
To specify the initial root directory you can use the following mount
|
||||
options:
|
||||
|
||||
mode: The permissions as an octal number
|
||||
uid: The user id
|
||||
gid: The group id
|
||||
==== ==================================
|
||||
mode The permissions as an octal number
|
||||
uid The user id
|
||||
gid The group id
|
||||
==== ==================================
|
||||
|
||||
These options do not have any effect on remount. You can change these
|
||||
parameters with chmod(1), chown(1) and chgrp(1) on a mounted filesystem.
|
||||
@ -141,9 +155,9 @@ will give you tmpfs instance on /mytmpfs which can allocate 10GB
|
||||
RAM/SWAP in 10240 inodes and it is only accessible by root.
|
||||
|
||||
|
||||
Author:
|
||||
:Author:
|
||||
Christoph Rohland <cr@sap.com>, 1.12.01
|
||||
Updated:
|
||||
:Updated:
|
||||
Hugh Dickins, 4 June 2007
|
||||
Updated:
|
||||
:Updated:
|
||||
KOSAKI Motohiro, 16 Mar 2010
|
@ -1,3 +1,5 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
:orphan:
|
||||
|
||||
.. UBIFS Authentication
|
||||
@@ -92,11 +94,11 @@ UBIFS Index & Tree Node Cache
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Basic on-flash UBIFS entities are called *nodes*. UBIFS knows different types
-of nodes. Eg. data nodes (`struct ubifs_data_node`) which store chunks of file
-contents or inode nodes (`struct ubifs_ino_node`) which represent VFS inodes.
-Almost all types of nodes share a common header (`ubifs_ch`) containing basic
+of nodes. Eg. data nodes (``struct ubifs_data_node``) which store chunks of file
+contents or inode nodes (``struct ubifs_ino_node``) which represent VFS inodes.
+Almost all types of nodes share a common header (``ubifs_ch``) containing basic
 information like node type, node length, a sequence number, etc. (see
-`fs/ubifs/ubifs-media.h`in kernel source). Exceptions are entries of the LPT
+``fs/ubifs/ubifs-media.h`` in kernel source). Exceptions are entries of the LPT
 and some less important node types like padding nodes which are used to pad
 unusable content at the end of LEBs.

@@ -1,5 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+UBI File System
+===============
+
 Introduction
-=============
+============

 UBIFS file-system stands for UBI File System. UBI stands for "Unsorted
 Block Images". UBIFS is a flash file system, which means it is designed
@@ -79,6 +85,7 @@ Mount options

 (*) == default.

+==================== =======================================================
 bulk_read            read more in one go to take advantage of flash
                      media that read faster sequentially
 no_bulk_read (*)     do not bulk-read
@@ -98,6 +105,7 @@ auth_key= specify the key used for authenticating the filesystem.
 auth_hash_name=      The hash algorithm used for authentication. Used for
                      both hashing and for creating HMACs. Typical values
                      include "sha256" or "sha512"
+==================== =======================================================


 Quick usage instructions
@@ -107,12 +115,14 @@ The UBI volume to mount is specified using "ubiX_Y" or "ubiX:NAME" syntax,
 where "X" is UBI device number, "Y" is UBI volume number, and "NAME" is
 UBI volume name.

-Mount volume 0 on UBI device 0 to /mnt/ubifs:
-    $ mount -t ubifs ubi0_0 /mnt/ubifs
+Mount volume 0 on UBI device 0 to /mnt/ubifs::
+
+    $ mount -t ubifs ubi0_0 /mnt/ubifs

 Mount "rootfs" volume of UBI device 0 to /mnt/ubifs ("rootfs" is volume
-name):
-    $ mount -t ubifs ubi0:rootfs /mnt/ubifs
+name)::
+
+    $ mount -t ubifs ubi0:rootfs /mnt/ubifs

 The following is an example of the kernel boot arguments to attach mtd0
 to UBI and mount volume "rootfs":
@@ -122,5 +132,6 @@ References
 ==========

 UBIFS documentation and FAQ/HOWTO at the MTD web site:
-http://www.linux-mtd.infradead.org/doc/ubifs.html
-http://www.linux-mtd.infradead.org/faq/ubifs.html
+
+- http://www.linux-mtd.infradead.org/doc/ubifs.html
+- http://www.linux-mtd.infradead.org/faq/ubifs.html
@@ -1,6 +1,8 @@
-*
-* Documentation/filesystems/udf.txt
-*
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+UDF file system
+===============

 If you encounter problems with reading UDF discs using this driver,
 please report them according to MAINTAINERS file.
@@ -18,8 +20,10 @@ performance due to very poor read-modify-write support supplied internally
 by drive firmware.

 -------------------------------------------------------------------------------
+
 The following mount options are supported:

+=========== ======================================
 gid=        Set the default group.
 umask=      Set the default umask.
 mode=       Set the default file permissions.
@@ -34,6 +38,7 @@ The following mount options are supported:
 longad      Use long ad's (default)
 nostrict    Unset strict conformance
 iocharset=  Set the NLS character set
+=========== ======================================

 The uid= and gid= options need a bit more explaining. They will accept a
 decimal numeric value and all inodes on that mount will then appear as
@@ -47,13 +52,17 @@ the interactive user will always see the files on the disk as belonging to him.

 The remaining are for debugging and disaster recovery:

+===== ================================
 novrs Skip volume sequence recognition
+===== ================================

 The following expect an offset from 0.

+========== =================================================
 session=   Set the CDROM session (default= last session)
 anchor=    Override standard anchor location. (default= 256)
 lastblock= Set the last block of the filesystem/
+========== =================================================

 -------------------------------------------------------------------------------

@@ -62,5 +71,5 @@ For the latest version and toolset see:
 https://github.com/pali/udftools

 Documentation on UDF and ECMA 167 is available FREE from:
-http://www.osta.org/
-http://www.ecma-international.org/
+- http://www.osta.org/
+- http://www.ecma-international.org/
@@ -1,5 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0

+.. _virtiofs_index:
+
 ===================================================
 virtiofs: virtio-fs host<->guest shared file system
 ===================================================
@@ -1,4 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================================================
 ZoneFS - Zone filesystem for Zoned block devices
+================================================

 Introduction
 ============
@@ -29,6 +33,7 @@ Zoned block devices
 Zoned storage devices belong to a class of storage devices with an address
 space that is divided into zones. A zone is a group of consecutive LBAs and all
 zones are contiguous (there are no LBA gaps). Zones may have different types.
+
 * Conventional zones: there are no access constraints to LBAs belonging to
   conventional zones. Any read or write access can be executed, similarly to a
   regular block device.
@@ -158,6 +163,7 @@ Format options
 --------------

 Several optional features of zonefs can be enabled at format time.
+
 * Conventional zone aggregation: ranges of contiguous conventional zones can be
   aggregated into a single larger file instead of the default one file per zone.
 * File ownership: The owner UID and GID of zone files is by default 0 (root)
@@ -249,7 +255,7 @@ permissions.
 Further action taken by zonefs I/O error recovery can be controlled by the user
 with the "errors=xxx" mount option. The table below summarizes the result of
 zonefs I/O error processing depending on the mount option and on the zone
-conditions.
+conditions::

     +--------------+-----------+-----------------------------------------+
     |              |           |            Post error state             |
@@ -275,6 +281,7 @@ conditions.
     +--------------+-----------+-----------------------------------------+

 Further notes:
+
 * The "errors=remount-ro" mount option is the default behavior of zonefs I/O
   error processing if no errors mount option is specified.
 * With the "errors=remount-ro" mount option, the change of the file access
@@ -302,6 +309,7 @@ Mount options
 zonefs defines the "errors=<behavior>" mount option to allow the user to specify
 zonefs behavior in response to I/O errors, inode size inconsistencies or zone
 condition changes. The defined behaviors are as follows:
+
 * remount-ro (default)
 * zone-ro
 * zone-offline
@@ -333,77 +341,77 @@ Examples
 --------

 The following formats a 15TB host-managed SMR HDD with 256 MB zones
-with the conventional zones aggregation feature enabled.
+with the conventional zones aggregation feature enabled::

-# mkzonefs -o aggr_cnv /dev/sdX
-# mount -t zonefs /dev/sdX /mnt
-# ls -l /mnt/
-total 0
-dr-xr-xr-x 2 root root     1 Nov 25 13:23 cnv
-dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq
+    # mkzonefs -o aggr_cnv /dev/sdX
+    # mount -t zonefs /dev/sdX /mnt
+    # ls -l /mnt/
+    total 0
+    dr-xr-xr-x 2 root root     1 Nov 25 13:23 cnv
+    dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq

 The size of the zone files sub-directories indicates the number of files
 existing for each type of zones. In this example, there is only one
 conventional zone file (all conventional zones are aggregated under a single
-file).
+file)::

-# ls -l /mnt/cnv
-total 137101312
--rw-r----- 1 root root 140391743488 Nov 25 13:23 0
+    # ls -l /mnt/cnv
+    total 137101312
+    -rw-r----- 1 root root 140391743488 Nov 25 13:23 0

-This aggregated conventional zone file can be used as a regular file.
+This aggregated conventional zone file can be used as a regular file::

-# mkfs.ext4 /mnt/cnv/0
-# mount -o loop /mnt/cnv/0 /data
+    # mkfs.ext4 /mnt/cnv/0
+    # mount -o loop /mnt/cnv/0 /data

 The "seq" sub-directory grouping files for sequential write zones has in this
-example 55356 zones.
+example 55356 zones::

-# ls -lv /mnt/seq
-total 14511243264
--rw-r----- 1 root root 0 Nov 25 13:23 0
--rw-r----- 1 root root 0 Nov 25 13:23 1
--rw-r----- 1 root root 0 Nov 25 13:23 2
-...
--rw-r----- 1 root root 0 Nov 25 13:23 55354
--rw-r----- 1 root root 0 Nov 25 13:23 55355
+    # ls -lv /mnt/seq
+    total 14511243264
+    -rw-r----- 1 root root 0 Nov 25 13:23 0
+    -rw-r----- 1 root root 0 Nov 25 13:23 1
+    -rw-r----- 1 root root 0 Nov 25 13:23 2
+    ...
+    -rw-r----- 1 root root 0 Nov 25 13:23 55354
+    -rw-r----- 1 root root 0 Nov 25 13:23 55355

 For sequential write zone files, the file size changes as data is appended at
-the end of the file, similarly to any regular file system.
+the end of the file, similarly to any regular file system::

-# dd if=/dev/zero of=/mnt/seq/0 bs=4096 count=1 conv=notrunc oflag=direct
-1+0 records in
-1+0 records out
-4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00044121 s, 9.3 MB/s
+    # dd if=/dev/zero of=/mnt/seq/0 bs=4096 count=1 conv=notrunc oflag=direct
+    1+0 records in
+    1+0 records out
+    4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00044121 s, 9.3 MB/s

-# ls -l /mnt/seq/0
--rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0
+    # ls -l /mnt/seq/0
+    -rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0

 The written file can be truncated to the zone size, preventing any further
-write operation.
+write operation::

-# truncate -s 268435456 /mnt/seq/0
-# ls -l /mnt/seq/0
--rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0
+    # truncate -s 268435456 /mnt/seq/0
+    # ls -l /mnt/seq/0
+    -rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0

 Truncation to 0 size allows freeing the file zone storage space and restarting
-append-writes to the file.
+append-writes to the file::

-# truncate -s 0 /mnt/seq/0
-# ls -l /mnt/seq/0
--rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0
+    # truncate -s 0 /mnt/seq/0
+    # ls -l /mnt/seq/0
+    -rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0

 Since files are statically mapped to zones on the disk, the number of blocks of
-a file as reported by stat() and fstat() indicates the size of the file zone.
+a file as reported by stat() and fstat() indicates the size of the file zone::

-# stat /mnt/seq/0
+    # stat /mnt/seq/0
       File: /mnt/seq/0
       Size: 0               Blocks: 524288     IO Block: 4096   regular empty file
-Device: 870h/2160d    Inode: 50431       Links: 1
-Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
-Access: 2019-11-25 13:23:57.048971997 +0900
-Modify: 2019-11-25 13:52:25.553805765 +0900
-Change: 2019-11-25 13:52:25.553805765 +0900
+    Device: 870h/2160d    Inode: 50431       Links: 1
+    Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
+    Access: 2019-11-25 13:23:57.048971997 +0900
+    Modify: 2019-11-25 13:52:25.553805765 +0900
+    Change: 2019-11-25 13:52:25.553805765 +0900
      Birth: -
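
As a cross-check of the figures in the examples above, the arithmetic can be sketched in userspace C. This is a minimal illustration and not part of zonefs: the helper names ``blocks_to_bytes()`` and ``zone_count()`` are hypothetical. The only facts used are that stat(2) counts blocks in 512-byte units and the values printed in the listings.

```c
#include <assert.h>
#include <stdint.h>

/* stat(2) reports st_blocks in 512-byte units. */
#define STAT_BLOCK_SIZE 512ULL

/* Hypothetical helper: bytes implied by the "Blocks" value of stat. */
static uint64_t blocks_to_bytes(uint64_t st_blocks)
{
        return st_blocks * STAT_BLOCK_SIZE;
}

/* Hypothetical helper: number of whole zones covered by file_bytes. */
static uint64_t zone_count(uint64_t file_bytes, uint64_t zone_bytes)
{
        assert(zone_bytes != 0);
        return file_bytes / zone_bytes;
}
```

With the values shown above, ``blocks_to_bytes(524288)`` gives 268435456 bytes (256 MiB), the same size that was passed to ``truncate``, and ``zone_count(140391743488, 268435456)`` gives 523, which would be the number of conventional zones aggregated into ``/mnt/cnv/0``.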

 The number of blocks of the file ("Blocks") in units of 512B blocks gives the
@@ -207,10 +207,10 @@ DPIO
 CSR firmware support for DMC
 ----------------------------

-.. kernel-doc:: drivers/gpu/drm/i915/intel_csr.c
+.. kernel-doc:: drivers/gpu/drm/i915/display/intel_csr.c
    :doc: csr support for dmc

-.. kernel-doc:: drivers/gpu/drm/i915/intel_csr.c
+.. kernel-doc:: drivers/gpu/drm/i915/display/intel_csr.c
    :internal:

 Video BIOS Table (VBT)
@@ -131,7 +131,6 @@ needed).
    usb/index
    PCI/index
    misc-devices/index
-   mic/index
    scheduler/index

 Architecture-agnostic documentation
@@ -72,6 +72,10 @@ e.g., on Ubuntu for gcc-4.9::

         apt-get install gcc-4.9-plugin-dev

+Or on Fedora::
+
+        dnf install gcc-plugin-devel
+
 Enable a GCC plugin based feature in the kernel config::

         CONFIG_GCC_PLUGIN_CYC_COMPLEXITY = y
@@ -19,6 +19,7 @@ Kernel Build System

     issues
     reproducible-builds
+    gcc-plugins

 .. only::  subproject and html

@@ -601,7 +601,7 @@ Defined in ``include/linux/export.h``

 This is the variant of `EXPORT_SYMBOL()` that allows specifying a symbol
 namespace. Symbol Namespaces are documented in
-``Documentation/core-api/symbol-namespaces.rst``.
+:doc:`../core-api/symbol-namespaces`

 :c:func:`EXPORT_SYMBOL_NS_GPL()`
 --------------------------------
@@ -610,7 +610,7 @@ Defined in ``include/linux/export.h``

 This is the variant of `EXPORT_SYMBOL_GPL()` that allows specifying a symbol
 namespace. Symbol Namespaces are documented in
-``Documentation/core-api/symbol-namespaces.rst``.
+:doc:`../core-api/symbol-namespaces`

 Routines and Conventions
 ========================
@@ -150,17 +150,17 @@ Locking Only In User Context
 If you have a data structure which is only ever accessed from user
 context, then you can use a simple mutex (``include/linux/mutex.h``) to
 protect it. This is the most trivial case: you initialize the mutex.
-Then you can call :c:func:`mutex_lock_interruptible()` to grab the
-mutex, and :c:func:`mutex_unlock()` to release it. There is also a
-:c:func:`mutex_lock()`, which should be avoided, because it will
+Then you can call mutex_lock_interruptible() to grab the
+mutex, and mutex_unlock() to release it. There is also a
+mutex_lock(), which should be avoided, because it will
 not return if a signal is received.

 Example: ``net/netfilter/nf_sockopt.c`` allows registration of new
-:c:func:`setsockopt()` and :c:func:`getsockopt()` calls, with
-:c:func:`nf_register_sockopt()`. Registration and de-registration
+setsockopt() and getsockopt() calls, with
+nf_register_sockopt(). Registration and de-registration
 are only done on module load and unload (and boot time, where there is
 no concurrency), and the list of registrations is only consulted for an
-unknown :c:func:`setsockopt()` or :c:func:`getsockopt()` system
+unknown setsockopt() or getsockopt() system
 call. The ``nf_sockopt_mutex`` is perfect to protect this, especially
 since the setsockopt and getsockopt calls may well sleep.

@@ -170,19 +170,19 @@ Locking Between User Context and Softirqs
 If a softirq shares data with user context, you have two problems.
 Firstly, the current user context can be interrupted by a softirq, and
 secondly, the critical region could be entered from another CPU. This is
-where :c:func:`spin_lock_bh()` (``include/linux/spinlock.h``) is
+where spin_lock_bh() (``include/linux/spinlock.h``) is
 used. It disables softirqs on that CPU, then grabs the lock.
-:c:func:`spin_unlock_bh()` does the reverse. (The '_bh' suffix is
+spin_unlock_bh() does the reverse. (The '_bh' suffix is
 a historical reference to "Bottom Halves", the old name for software
 interrupts. It should really be called spin_lock_softirq() in a
 perfect world).

-Note that you can also use :c:func:`spin_lock_irq()` or
-:c:func:`spin_lock_irqsave()` here, which stop hardware interrupts
+Note that you can also use spin_lock_irq() or
+spin_lock_irqsave() here, which stop hardware interrupts
 as well: see `Hard IRQ Context <#hard-irq-context>`__.

 This works perfectly for UP as well: the spin lock vanishes, and this
-macro simply becomes :c:func:`local_bh_disable()`
+macro simply becomes local_bh_disable()
 (``include/linux/interrupt.h``), which protects you from the softirq
 being run.

@@ -216,8 +216,8 @@ Different Tasklets/Timers
 ~~~~~~~~~~~~~~~~~~~~~~~~~

 If another tasklet/timer wants to share data with your tasklet or timer
-, you will both need to use :c:func:`spin_lock()` and
-:c:func:`spin_unlock()` calls. :c:func:`spin_lock_bh()` is
+, you will both need to use spin_lock() and
+spin_unlock() calls. spin_lock_bh() is
 unnecessary here, as you are already in a tasklet, and none will be run
 on the same CPU.

@@ -234,14 +234,14 @@ The same softirq can run on the other CPUs: you can use a per-CPU array
 going so far as to use a softirq, you probably care about scalable
 performance enough to justify the extra complexity.

-You'll need to use :c:func:`spin_lock()` and
-:c:func:`spin_unlock()` for shared data.
+You'll need to use spin_lock() and
+spin_unlock() for shared data.

 Different Softirqs
 ~~~~~~~~~~~~~~~~~~

-You'll need to use :c:func:`spin_lock()` and
-:c:func:`spin_unlock()` for shared data, whether it be a timer,
+You'll need to use spin_lock() and
+spin_unlock() for shared data, whether it be a timer,
 tasklet, different softirq or the same or another softirq: any of them
 could be running on a different CPU.

@@ -259,38 +259,38 @@ If a hardware irq handler shares data with a softirq, you have two
 concerns. Firstly, the softirq processing can be interrupted by a
 hardware interrupt, and secondly, the critical region could be entered
 by a hardware interrupt on another CPU. This is where
-:c:func:`spin_lock_irq()` is used. It is defined to disable
+spin_lock_irq() is used. It is defined to disable
 interrupts on that cpu, then grab the lock.
-:c:func:`spin_unlock_irq()` does the reverse.
+spin_unlock_irq() does the reverse.

-The irq handler does not to use :c:func:`spin_lock_irq()`, because
+The irq handler does not need to use spin_lock_irq(), because
 the softirq cannot run while the irq handler is running: it can use
-:c:func:`spin_lock()`, which is slightly faster. The only exception
+spin_lock(), which is slightly faster. The only exception
 would be if a different hardware irq handler uses the same lock:
-:c:func:`spin_lock_irq()` will stop that from interrupting us.
+spin_lock_irq() will stop that from interrupting us.

 This works perfectly for UP as well: the spin lock vanishes, and this
-macro simply becomes :c:func:`local_irq_disable()`
+macro simply becomes local_irq_disable()
 (``include/asm/smp.h``), which protects you from the softirq/tasklet/BH
 being run.

-:c:func:`spin_lock_irqsave()` (``include/linux/spinlock.h``) is a
+spin_lock_irqsave() (``include/linux/spinlock.h``) is a
 variant which saves whether interrupts were on or off in a flags word,
-which is passed to :c:func:`spin_unlock_irqrestore()`. This means
+which is passed to spin_unlock_irqrestore(). This means
 that the same code can be used inside a hard irq handler (where
 interrupts are already off) and in softirqs (where the irq disabling is
 required).
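
The save/restore idea can be tried out in userspace with signal masks, as a loose analogy only: blocking signals stands in for disabling interrupts, and the saved mask plays the role of the flags word. The sketch below uses the POSIX ``sigprocmask()`` call; the helper names ``critical_enter()``, ``critical_exit()`` and ``signal_is_blocked()`` are made up for this illustration and are not kernel APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Block every signal and hand back the previous mask (the "flags"). */
static void critical_enter(sigset_t *saved)
{
        sigset_t all;
        int rc;

        sigfillset(&all);
        rc = sigprocmask(SIG_BLOCK, &all, saved);
        assert(rc == 0);
        (void)rc;
}

/* Restore exactly the mask that was in force before critical_enter(). */
static void critical_exit(const sigset_t *saved)
{
        int rc = sigprocmask(SIG_SETMASK, saved, NULL);

        assert(rc == 0);
        (void)rc;
}

/* Returns 1 if signo is currently blocked, 0 if it is not. */
static int signal_is_blocked(int signo)
{
        sigset_t cur;

        /* With a NULL new set, only the current mask is reported. */
        sigprocmask(SIG_BLOCK, NULL, &cur);
        return sigismember(&cur, signo);
}
```

Because ``critical_exit()`` restores the saved mask instead of unblocking everything, a signal that was already blocked before ``critical_enter()`` stays blocked afterwards. That mirrors why the same spin_lock_irqsave() code is safe both where interrupts are already off and where they are on.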

 Note that softirqs (and hence tasklets and timers) are run on return
-from hardware interrupts, so :c:func:`spin_lock_irq()` also stops
-these. In that sense, :c:func:`spin_lock_irqsave()` is the most
+from hardware interrupts, so spin_lock_irq() also stops
+these. In that sense, spin_lock_irqsave() is the most
 general and powerful locking function.

 Locking Between Two Hard IRQ Handlers
 -------------------------------------

 It is rare to have to share data between two IRQ handlers, but if you
-do, :c:func:`spin_lock_irqsave()` should be used: it is
+do, spin_lock_irqsave() should be used: it is
 architecture-specific whether all interrupts are disabled inside irq
 handlers themselves.

@@ -304,11 +304,11 @@ Pete Zaitcev gives the following summary:
    (``copy_from_user*(`` or ``kmalloc(x,GFP_KERNEL)``).

 -  Otherwise (== data can be touched in an interrupt), use
-   :c:func:`spin_lock_irqsave()` and
-   :c:func:`spin_unlock_irqrestore()`.
+   spin_lock_irqsave() and
+   spin_unlock_irqrestore().

 -  Avoid holding spinlock for more than 5 lines of code and across any
-   function call (except accessors like :c:func:`readb()`).
+   function call (except accessors like readb()).

 Table of Minimum Requirements
 -----------------------------
@@ -320,7 +320,7 @@ particular thread can only run on one CPU at a time, but if it needs
 shares data with another thread, locking is required).

 Remember the advice above: you can always use
-:c:func:`spin_lock_irqsave()`, which is a superset of all other
+spin_lock_irqsave(), which is a superset of all other
 spinlock primitives.

 ============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============
@@ -363,13 +363,13 @@ They can be used if you need no access to the data protected with the
 lock when some other thread is holding the lock. You should acquire the
 lock later if you then need access to the data protected with the lock.

-:c:func:`spin_trylock()` does not spin but returns non-zero if it
+spin_trylock() does not spin but returns non-zero if it
 acquires the spinlock on the first try or 0 if not. This function can be
-used in all contexts like :c:func:`spin_lock()`: you must have
+used in all contexts like spin_lock(): you must have
 disabled the contexts that might interrupt you and acquire the spin
 lock.

-:c:func:`mutex_trylock()` does not suspend your task but returns
+mutex_trylock() does not suspend your task but returns
 non-zero if it could lock the mutex on the first try or 0 if not. This
 function cannot be safely used in hardware or software interrupt
 contexts despite not sleeping.
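
The return convention described in this hunk (non-zero on success, 0 on failure, never waiting) can be mimicked in userspace with a C11 ``atomic_flag``. This is only a toy model for experimenting with the convention: ``toy_spinlock``, ``toy_spin_trylock()``, ``toy_spin_unlock()`` and ``toy_spin_demo()`` are hypothetical names, and the code does not reflect how the kernel implements spin_trylock().

```c
#include <assert.h>
#include <stdatomic.h>

struct toy_spinlock {
        atomic_flag locked;
};

#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Returns 1 if the lock was taken on the first try, 0 if already held. */
static int toy_spin_trylock(struct toy_spinlock *lock)
{
        /* test_and_set returns the previous value: false means we got it. */
        return !atomic_flag_test_and_set_explicit(&lock->locked,
                                                  memory_order_acquire);
}

static void toy_spin_unlock(struct toy_spinlock *lock)
{
        atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}

/* Small self-check of the convention: take, fail, release, take again. */
static void toy_spin_demo(void)
{
        struct toy_spinlock lock = TOY_SPINLOCK_INIT;

        assert(toy_spin_trylock(&lock) == 1);
        assert(toy_spin_trylock(&lock) == 0);
        toy_spin_unlock(&lock);
        assert(toy_spin_trylock(&lock) == 1);
        toy_spin_unlock(&lock);
}
```

Note that this is the opposite of ``pthread_mutex_trylock()``, which returns 0 on success, so code ported between the two conventions needs its checks inverted.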
@@ -490,14 +490,14 @@ easy, since we copy the data for the user, and never let them access the
 objects directly.

 There is a slight (and common) optimization here: in
-:c:func:`cache_add()` we set up the fields of the object before
+cache_add() we set up the fields of the object before
 grabbing the lock. This is safe, as no-one else can access it until we
 put it in cache.

 Accessing From Interrupt Context
 --------------------------------

-Now consider the case where :c:func:`cache_find()` can be called
+Now consider the case where cache_find() can be called
 from interrupt context: either a hardware interrupt or a softirq. An
 example would be a timer which deletes an object from the cache.

@@ -566,16 +566,16 @@ which are taken away, and the ``+`` are lines which are added.
              return ret;
      }

-Note that the :c:func:`spin_lock_irqsave()` will turn off
+Note that the spin_lock_irqsave() will turn off
 interrupts if they are on, otherwise does nothing (if we are already in
 an interrupt handler), hence these functions are safe to call from any
 context.

-Unfortunately, :c:func:`cache_add()` calls :c:func:`kmalloc()`
+Unfortunately, cache_add() calls kmalloc()
 with the ``GFP_KERNEL`` flag, which is only legal in user context. I
-have assumed that :c:func:`cache_add()` is still only called in
+have assumed that cache_add() is still only called in
 user context, otherwise this should become a parameter to
-:c:func:`cache_add()`.
+cache_add().

 Exposing Objects Outside This File
 ----------------------------------
@@ -592,7 +592,7 @@ This makes locking trickier, as it is no longer all in one place.
 The second problem is the lifetime problem: if another structure keeps a
 pointer to an object, it presumably expects that pointer to remain
 valid. Unfortunately, this is only guaranteed while you hold the lock,
-otherwise someone might call :c:func:`cache_delete()` and even
+otherwise someone might call cache_delete() and even
 worse, add another object, re-using the same address.

 As there is only one lock, you can't hold it forever: no-one else would
@@ -693,8 +693,8 @@ Here is the code::

 We encapsulate the reference counting in the standard 'get' and 'put'
 functions. Now we can return the object itself from
-:c:func:`cache_find()` which has the advantage that the user can
-now sleep holding the object (eg. to :c:func:`copy_to_user()` to
+cache_find() which has the advantage that the user can
+now sleep holding the object (eg. to copy_to_user() to
 name to userspace).

 The other point to note is that I said a reference should be held for
@@ -710,7 +710,7 @@ number of atomic operations defined in ``include/asm/atomic.h``: these
 are guaranteed to be seen atomically from all CPUs in the system, so no
 lock is required. In this case, it is simpler than using spinlocks,
 although for anything non-trivial using spinlocks is clearer. The
-:c:func:`atomic_inc()` and :c:func:`atomic_dec_and_test()`
+atomic_inc() and atomic_dec_and_test()
 are used instead of the standard increment and decrement operators, and
 the lock is no longer used to protect the reference count itself.

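
A userspace analogue of this get/put pattern can be written with C11 atomics. It is a sketch under assumptions: ``toy_object``, ``toy_alloc()``, ``toy_get()`` and ``toy_put()`` are invented for illustration, and ``atomic_fetch_add()``/``atomic_fetch_sub()`` from ``<stdatomic.h>`` stand in for the kernel's atomic_inc() and atomic_dec_and_test().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct toy_object {
        atomic_int refcnt;
        int id;
};

/* Allocate an object that starts with one reference for the caller. */
static struct toy_object *toy_alloc(int id)
{
        struct toy_object *obj = malloc(sizeof(*obj));

        if (!obj)
                return NULL;
        atomic_init(&obj->refcnt, 1);
        obj->id = id;
        return obj;
}

/* Take an extra reference (counterpart of atomic_inc()). */
static void toy_get(struct toy_object *obj)
{
        assert(obj != NULL);
        atomic_fetch_add(&obj->refcnt, 1);
}

/*
 * Drop a reference. atomic_fetch_sub() returns the old value, so an old
 * value of 1 means the count has just reached zero, which is the
 * condition atomic_dec_and_test() reports. Returns 1 if the object was
 * freed, 0 if other references remain.
 */
static int toy_put(struct toy_object *obj)
{
        assert(obj != NULL);
        if (atomic_fetch_sub(&obj->refcnt, 1) == 1) {
                free(obj);
                return 1;
        }
        return 0;
}
```

The decrement and the zero test happen in a single atomic step, so two threads dropping the last two references cannot both conclude that they should free the object.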
@@ -802,7 +802,7 @@ name to change, there are three possibilities:
 -  You can make ``cache_lock`` non-static, and tell people to grab that
    lock before changing the name in any object.

--  You can provide a :c:func:`cache_obj_rename()` which grabs this
+-  You can provide a cache_obj_rename() which grabs this
    lock and changes the name for the caller, and tell everyone to use
    that function.

@@ -861,11 +861,11 @@ Note that I decide that the popularity count should be protected by the
 ``cache_lock`` rather than the per-object lock: this is because it (like
 the :c:type:`struct list_head <list_head>` inside the object)
 is logically part of the infrastructure. This way, I don't need to grab
-the lock of every object in :c:func:`__cache_add()` when seeking
+the lock of every object in __cache_add() when seeking
 the least popular.

 I also decided that the id member is unchangeable, so I don't need to
-grab each object lock in :c:func:`__cache_find()` to examine the
+grab each object lock in __cache_find() to examine the
 id: the object lock is only used by a caller who wants to read or write
 the name field.

@@ -887,7 +887,7 @@ trivial to diagnose: not a
 stay-up-five-nights-talk-to-fluffy-code-bunnies kind of problem.

 For a slightly more complex case, imagine you have a region shared by a
-softirq and user context. If you use a :c:func:`spin_lock()` call
+softirq and user context. If you use a spin_lock() call
 to protect it, it is possible that the user context will be interrupted
 by the softirq while it holds the lock, and the softirq will then spin
 forever trying to get the same lock.
@@ -985,12 +985,12 @@ you might do the following::


 Sooner or later, this will crash on SMP, because a timer can have just
-gone off before the :c:func:`spin_lock_bh()`, and it will only get
-the lock after we :c:func:`spin_unlock_bh()`, and then try to free
+gone off before the spin_lock_bh(), and it will only get
+the lock after we spin_unlock_bh(), and then try to free
 the element (which has already been freed!).

 This can be avoided by checking the result of
-:c:func:`del_timer()`: if it returns 1, the timer has been deleted.
+del_timer(): if it returns 1, the timer has been deleted.
 If 0, it means (in this case) that it is currently running, so we can
 do::

@@ -1012,9 +1012,9 @@ do::


 Another common problem is deleting timers which restart themselves (by
-calling :c:func:`add_timer()` at the end of their timer function).
+calling add_timer() at the end of their timer function).
 Because this is a fairly common case which is prone to races, you should
-use :c:func:`del_timer_sync()` (``include/linux/timer.h``) to
+use del_timer_sync() (``include/linux/timer.h``) to
 handle this case. It returns the number of times the timer had to be
 deleted before we finally stopped it from adding itself back in.

@@ -1086,7 +1086,7 @@ adding ``new`` to a single linked list called ``list``::
         list->next = new;


-The :c:func:`wmb()` is a write memory barrier. It ensures that the
+The wmb() is a write memory barrier. It ensures that the
 first operation (setting the new element's ``next`` pointer) is complete
 and will be seen by all CPUs, before the second operation is (putting
 the new element into the list). This is important, since modern
@@ -1097,7 +1097,7 @@ rest of the list.

 Fortunately, there is a function to do this for standard
 :c:type:`struct list_head <list_head>` lists:
-:c:func:`list_add_rcu()` (``include/linux/list.h``).
+list_add_rcu() (``include/linux/list.h``).
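
For readers who want to experiment outside the kernel, the same publish ordering can be approximated with C11 atomics, where a release store takes the place of wmb() followed by the pointer assignment. This is an analogy under assumptions, not the implementation of list_add_rcu(); ``toy_node``, ``toy_publish_after()`` and ``toy_next()`` are hypothetical names, and a single writer is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_node {
        int value;
        struct toy_node *_Atomic next;
};

/*
 * Insert new_node right after pos so that a concurrent reader never
 * follows a pointer to a half-initialised element.
 */
static void toy_publish_after(struct toy_node *pos, struct toy_node *new_node)
{
        struct toy_node *succ;

        assert(pos != NULL && new_node != NULL);

        /* Step 1: finish setting up the new element (not yet visible). */
        succ = atomic_load_explicit(&pos->next, memory_order_relaxed);
        atomic_store_explicit(&new_node->next, succ, memory_order_relaxed);

        /* Step 2: publish. Release ordering keeps step 1 ahead of it. */
        atomic_store_explicit(&pos->next, new_node, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above. */
static struct toy_node *toy_next(struct toy_node *pos)
{
        return atomic_load_explicit(&pos->next, memory_order_acquire);
}
```

A reader calling ``toy_next()`` sees either the old successor or the fully initialised new element, which is the property the barrier is there to provide.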

 Removing an element from the list is even simpler: we replace the
 pointer to the old element with a pointer to its successor, and readers
@@ -1108,7 +1108,7 @@ will either see it, or skip over it.
         list->next = old->next;


-There is :c:func:`list_del_rcu()` (``include/linux/list.h``) which
+There is list_del_rcu() (``include/linux/list.h``) which
 does this (the normal version poisons the old object, which we don't
 want).

@ -1116,9 +1116,9 @@ The reader must also be careful: some CPUs can look through the ``next``
|
||||
pointer to start reading the contents of the next element early, but
|
||||
don't realize that the pre-fetched contents is wrong when the ``next``
|
||||
pointer changes underneath them. Once again, there is a
|
||||
:c:func:`list_for_each_entry_rcu()` (``include/linux/list.h``)
|
||||
list_for_each_entry_rcu() (``include/linux/list.h``)
|
||||
to help you. Of course, writers can just use
|
||||
:c:func:`list_for_each_entry()`, since there cannot be two
|
||||
list_for_each_entry(), since there cannot be two
|
||||
simultaneous writers.
|
||||
|
||||
Our final dilemma is this: when can we actually destroy the removed
@@ -1127,14 +1127,14 @@ the list right now: if we free this element and the ``next`` pointer
 changes, the reader will jump off into garbage and crash. We need to
 wait until we know that all the readers who were traversing the list
 when we deleted the element are finished. We use
-:c:func:`call_rcu()` to register a callback which will actually
+call_rcu() to register a callback which will actually
 destroy the object once all pre-existing readers are finished.
-Alternatively, :c:func:`synchronize_rcu()` may be used to block
+Alternatively, synchronize_rcu() may be used to block
 until all pre-existing readers are finished.

 But how does Read Copy Update know when the readers are finished? The
 method is this: firstly, the readers always traverse the list inside
-:c:func:`rcu_read_lock()`/:c:func:`rcu_read_unlock()` pairs:
+rcu_read_lock()/rcu_read_unlock() pairs:
 these simply disable preemption so the reader won't go to sleep while
 reading the list.

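Review note: the hunks above describe publishing a new list element only after its ``next`` pointer is fully written, and readers that must observe a consistent element. As a rough user-space analogy only (this is not the kernel RCU API, and it does not show deferred freeing via call_rcu()), the same ordering idea can be sketched with C11 release/acquire atomics. The names ``struct node``, ``publish_front()`` and ``sum_list()`` are hypothetical, and a single writer is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type for this sketch; not a kernel structure. */
struct node {
    int value;
    struct node *next;
};

/* Head pointer shared between one writer and any number of readers. */
static _Atomic(struct node *) list_head = NULL;

/*
 * Writer side (single writer assumed): fill in the new node first,
 * then publish it. The release store plays the role the text gives
 * to wmb(): the write to new_node->next becomes visible before the
 * node itself is reachable from the head.
 */
static void publish_front(struct node *new_node)
{
    new_node->next = atomic_load_explicit(&list_head, memory_order_relaxed);
    atomic_store_explicit(&list_head, new_node, memory_order_release);
}

/*
 * Reader side: the acquire load pairs with the release store, so a
 * reader that sees the new node also sees its initialised fields.
 * Nodes are never modified after publication in this sketch.
 */
static int sum_list(void)
{
    int sum = 0;
    struct node *n = atomic_load_explicit(&list_head, memory_order_acquire);

    for (; n != NULL; n = n->next)
        sum += n->value;
    return sum;
}
```

In kernel code the documented helpers (list_add_rcu(), list_for_each_entry_rcu()) wrap the equivalent ordering, so this sketch is only meant to make the reasoning concrete.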
@@ -1223,12 +1223,12 @@ this is the fundamental idea.
 }

 Note that the reader will alter the popularity member in
-:c:func:`__cache_find()`, and now it doesn't hold a lock. One
+__cache_find(), and now it doesn't hold a lock. One
 solution would be to make it an ``atomic_t``, but for this usage, we
 don't really care about races: an approximate result is good enough, so
 I didn't change it.

-The result is that :c:func:`cache_find()` requires no
+The result is that cache_find() requires no
 synchronization with any other functions, so is almost as fast on SMP as
 it would be on UP.

@@ -1240,9 +1240,9 @@ and put the reference count.

 Now, because the 'read lock' in RCU is simply disabling preemption, a
 caller which always has preemption disabled between calling
-:c:func:`cache_find()` and :c:func:`object_put()` does not
+cache_find() and object_put() does not
 need to actually get and put the reference count: we could expose
-:c:func:`__cache_find()` by making it non-static, and such
+__cache_find() by making it non-static, and such
 callers could simply call that.

 The benefit here is that the reference count is not written to: the
@@ -1260,11 +1260,11 @@ counter. Nice and simple.
 If that was too slow (it's usually not, but if you've got a really big
 machine to test on and can show that it is), you could instead use a
 counter for each CPU, then none of them need an exclusive lock. See
-:c:func:`DEFINE_PER_CPU()`, :c:func:`get_cpu_var()` and
-:c:func:`put_cpu_var()` (``include/linux/percpu.h``).
+DEFINE_PER_CPU(), get_cpu_var() and
+put_cpu_var() (``include/linux/percpu.h``).

 Of particular use for simple per-cpu counters is the ``local_t`` type,
-and the :c:func:`cpu_local_inc()` and related functions, which are
+and the cpu_local_inc() and related functions, which are
 more efficient than simple code on some architectures
 (``include/asm/local.h``).

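Review note: the hunk above suggests one counter per CPU so that no updater needs an exclusive lock. A minimal user-space sketch of that idea follows, assuming a fixed number of slots in place of real CPUs. ``NR_SLOTS``, ``counter_inc()`` and ``counter_total()`` are hypothetical names, and this does not use the kernel's DEFINE_PER_CPU() machinery.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_SLOTS 4 /* hypothetical stand-in for the number of CPUs */

/* One counter per slot, aligned so slots do not share a cache line. */
struct slot_counter {
    _Alignas(64) atomic_long count;
};

static struct slot_counter counters[NR_SLOTS];

/*
 * Each owner increments only its own slot, so no lock is taken.
 * Relaxed ordering is enough because only the count itself matters.
 */
static void counter_inc(size_t slot)
{
    atomic_fetch_add_explicit(&counters[slot % NR_SLOTS].count, 1,
                              memory_order_relaxed);
}

/*
 * Reading the total walks every slot. If owners are still running,
 * the result is only approximate, which is the usual trade-off for
 * this kind of counter.
 */
static long counter_total(void)
{
    long total = 0;

    for (size_t i = 0; i < NR_SLOTS; i++)
        total += atomic_load_explicit(&counters[i].count,
                                      memory_order_relaxed);
    return total;
}
```

The design choice is to make updates cheap and uncontended at the cost of a slower, approximate read.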
@@ -1289,10 +1289,10 @@ irq handler doesn't use a lock, and all other accesses are done as so::
 enable_irq(irq);
 spin_unlock(&lock);

-The :c:func:`disable_irq()` prevents the irq handler from running
+The disable_irq() prevents the irq handler from running
 (and waits for it to finish if it's currently running on other CPUs).
 The spinlock prevents any other accesses happening at the same time.
-Naturally, this is slower than just a :c:func:`spin_lock_irq()`
+Naturally, this is slower than just a spin_lock_irq()
 call, so it only makes sense if this type of access happens extremely
 rarely.

@@ -1315,22 +1315,22 @@ from user context, and can sleep.

 - Accesses to userspace:

-- :c:func:`copy_from_user()`
+- copy_from_user()

-- :c:func:`copy_to_user()`
+- copy_to_user()

-- :c:func:`get_user()`
+- get_user()

-- :c:func:`put_user()`
+- put_user()

-- :c:func:`kmalloc(GFP_KERNEL) <kmalloc>`
+- kmalloc(GFP_KERNEL)

-- :c:func:`mutex_lock_interruptible()` and
-:c:func:`mutex_lock()`
+- mutex_lock_interruptible() and
+mutex_lock()

-There is a :c:func:`mutex_trylock()` which does not sleep.
+There is a mutex_trylock() which does not sleep.
 Still, it must not be used inside interrupt context since its
-implementation is not safe for that. :c:func:`mutex_unlock()`
+implementation is not safe for that. mutex_unlock()
 will also never sleep. It cannot be used in interrupt context either
 since a mutex must be released by the same task that acquired it.

@@ -1340,11 +1340,11 @@ Some Functions Which Don't Sleep
 Some functions are safe to call from any context, or holding almost any
 lock.

-- :c:func:`printk()`
+- printk()

-- :c:func:`kfree()`
+- kfree()

-- :c:func:`add_timer()` and :c:func:`del_timer()`
+- add_timer() and del_timer()

 Mutex API reference
 ===================
@@ -1400,26 +1400,26 @@ preemption

 bh
 Bottom Half: for historical reasons, functions with '_bh' in them often
-now refer to any software interrupt, e.g. :c:func:`spin_lock_bh()`
+now refer to any software interrupt, e.g. spin_lock_bh()
 blocks any software interrupt on the current CPU. Bottom halves are
 deprecated, and will eventually be replaced by tasklets. Only one bottom
 half will be running at any time.

 Hardware Interrupt / Hardware IRQ
-Hardware interrupt request. :c:func:`in_irq()` returns true in a
+Hardware interrupt request. in_irq() returns true in a
 hardware interrupt handler.

 Interrupt Context
 Not user context: processing a hardware irq or software irq. Indicated
-by the :c:func:`in_interrupt()` macro returning true.
+by the in_interrupt() macro returning true.

 SMP
 Symmetric Multi-Processor: kernels compiled for multiple-CPU machines.
 (``CONFIG_SMP=y``).

 Software Interrupt / softirq
-Software interrupt handler. :c:func:`in_irq()` returns false;
-:c:func:`in_softirq()` returns true. Tasklets and softirqs both
+Software interrupt handler. in_irq() returns false;
+in_softirq() returns true. Tasklets and softirqs both
 fall into the category of 'software interrupts'.

 Strictly speaking a softirq is one of up to 32 enumerated software
@@ -128,6 +128,10 @@ since we already have a valid pointer that we own a refcount for. The
 put needs no lock because nothing tries to get the data without
 already holding a pointer.

+In the above example, kref_put() will be called 2 times in both success
+and error paths. This is necessary because the reference count got
+incremented 2 times by kref_init() and kref_get().
+
 Note that the "before" in rule 1 is very important. You should never
 do something like::

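Review note: the paragraph added in the hunk above says kref_put() must be called twice because kref_init() and kref_get() each contributed one reference. The counting can be checked with a small user-space stand-in. ``struct ref``, ``ref_init()``, ``ref_get()`` and ``ref_put()`` are hypothetical names that only mimic the behaviour described, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct kref. */
struct ref {
    atomic_int count;
};

/* Like kref_init() in the text: the creator starts with one reference. */
static void ref_init(struct ref *r)
{
    atomic_init(&r->count, 1);
}

/* Like kref_get(): take an extra reference before handing the pointer on. */
static void ref_get(struct ref *r)
{
    atomic_fetch_add(&r->count, 1);
}

/*
 * Like kref_put(): drop one reference and call release() only when
 * the last reference goes away. Returns true if release() ran.
 */
static bool ref_put(struct ref *r, void (*release)(struct ref *))
{
    if (atomic_fetch_sub(&r->count, 1) == 1) {
        release(r);
        return true;
    }
    return false;
}

/* Counts release calls so the demonstration can be checked. */
static int released;

static void demo_release(struct ref *r)
{
    (void)r;
    released++;
}
```

After one ``ref_init()`` and one ``ref_get()``, the first ``ref_put()`` leaves the object alive and only the second one triggers the release callback, which matches the two kref_put() calls the text asks for.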
@@ -291,8 +291,8 @@ and QUERYMENU. And G/S_CTRL as well as G/TRY/S_EXT_CTRLS are automatically suppo
 In practice the basic usage as described above is sufficient for most drivers.


-Inheriting Controls
--------------------
+Inheriting Sub-device Controls
+------------------------------

 When a sub-device is registered with a V4L2 driver by calling
 v4l2_device_register_subdev() and the ctrl_handler fields of both v4l2_subdev
@@ -757,8 +757,8 @@ attempting to find another control from the same handler will deadlock.
 It is recommended not to use this function from inside the control ops.


-Inheriting Controls
--------------------
+Preventing Controls inheritance
+-------------------------------

 When one control handler is added to another using v4l2_ctrl_add_handler, then
 by default all controls from one are merged to the other. But a subdev might
@@ -20,4 +20,5 @@ fit into other categories.
 isl29003
 lis3lv02d
 max6875
+mic/index
 xilinx_sdfec