ghidra/GhidraDocs/GhidraFilesystemStorage.html
2023-11-21 09:37:16 -05:00

<!DOCTYPE html>
<html>
<head>
<meta charset = "utf-8">
<title>Ghidra Filesystem Storage</title>
<style media="screen" type="text/css">
pre { margin-left: 120px; }
li { margin-left: 90px; font-family:times new roman; font-size:14pt; }
h1 { color:#000080; font-family:times new roman; font-size:36pt; font-style:italic; font-weight:bold; text-align:center; }
h2 { margin: 10px; margin-top: 40px; color:#984c4c; font-family:times new roman; font-size:18pt; font-weight:bold; }
h3 { margin-left: 37px; margin-top: 20px; color:#0000ff; font-family:times new roman; font-size:14pt; font-weight:bold; }
h4 { margin-left: 70px; margin-top: 10px; font-family:times new roman; font-size:14pt; font-weight:normal; }
h5 { margin-left: 70px; margin-top: 10px; font-family:times new roman; font-size:14pt; font-weight:normal; }
h6 { color:#000080; font-family:times new roman; font-size:18pt; font-style:italic; font-weight:bold; text-align:center; }
p { margin-left: 90px; font-family:times new roman; font-size:14pt; }
body {
counter-reset: section;
}
h2 {
counter-reset: topic;
}
h3 {
counter-reset: rule;
}
h2::before {
counter-increment: section;
content: counter(section) ": ";
}
h3::before {
counter-increment: topic;
content: counter(section) "." counter(topic) " ";
}
h4::before {
counter-increment: rule;
content: counter(section) "." counter(topic) "." counter(rule) " ";
}
div.info {
margin-left: 10px; margin-top: 80px; font-family:times new roman; font-size:18pt; font-weight:bold;
}
</style>
</head>
<body>
<H1>Ghidra Filesystem Storage</H1>
<H2>Introduction</H2>
<P>The purpose of this document is to provide technical details of the file storage scheme(s) employed by
both Ghidra projects and Ghidra Server repositories. As of this writing Ghidra has employed two
different local filesystem storage schemes:</P>
<UL>
<LI>Indexed Filesystem (current)</LI>
<LI>Mangled Filesystem (legacy, pre-9.0 PUBLIC release)</LI>
</UL>
<P>In addition to providing details of each storage scheme, some details are provided about how project/repository database
files are stored within each filesystem as well as some troubleshooting tips to aid in any manual interventions
which may be required.</P>
<P>At this point in time all filesystem implementations rely on the use of a property file for each project defined
file (<I>*.prp</I>). For all database type project files there will be a corresponding subdirectory (<I>~*.db</I>)
which is used to store all content related to a project file database (e.g., ProgramDB). The naming and organization
of these two differ significantly between the two filesystem implementations.</P>
<H3>Name Mangling/Encoding</H3>
<P>Ghidra uses the following name mangling/encoding for both Ghidra Server project repository subdirectories as well as
project folder and file naming within the legacy Mangled Filesystem. The goal of this encoding scheme is to preserve case-sensitive
naming while allowing storage on a single-case or case-insensitive native filesystem. This is achieved by mutating the original
name with the following character substitutions:</P>
<UL>
<LI>All uppercase letters are replaced with an underscore ('_') followed by the lowercase letter, and</LI>
<LI>All underscore ('_') characters are replaced with two underscores ("__").</LI>
</UL>
<P>For example, a Ghidra Server repository named "My_Project" is stored in a folder named "_my___project"
within the server repositories directory.</P>
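<P>The two substitution rules can be sketched in a few lines of Python. This is an illustration only and is not taken from
the Ghidra source: the function names <I>mangle</I> and <I>demangle</I> are hypothetical, and the sketch has only been
checked against ASCII names.</P>

```python
def mangle(name: str) -> str:
    """Encode a case-sensitive name using the two substitution rules above."""
    out = []
    for ch in name:
        if ch == "_":
            out.append("__")              # underscore becomes two underscores
        elif ch.isupper():
            out.append("_" + ch.lower())  # uppercase becomes '_' + lowercase
        else:
            out.append(ch)
    return "".join(out)


def demangle(mangled: str) -> str:
    """Reverse mangle(); raises ValueError on a trailing lone underscore."""
    out = []
    chars = iter(mangled)
    for ch in chars:
        if ch != "_":
            out.append(ch)
            continue
        nxt = next(chars, None)           # character following the underscore
        if nxt is None:
            raise ValueError("dangling underscore in mangled name")
        out.append("_" if nxt == "_" else nxt.upper())
    return "".join(out)
```

<P>Reading the mangled name left to right, every underscore is the start of a two-character pair, so decoding is unambiguous.</P>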
<H3>Ghidra Project Storage</H3>
<P>Ghidra projects employ multiple filesystem storage directories within the top-level project directory (<I>*.rep</I>):</P>
<UL>
<LI>Local project file storage (<I>idata</I>) provides the primary storage area for non-versioned project files
and for versioned files checked-out by the user.</LI>
<LI>Versioned project file storage (<I>versioned</I>) is present with private non-shared projects only and provides
version control storage similar to a Ghidra Server repository.</LI>
<LI>User data storage (<I>user</I>) provides non-versioned storage of user-specific data associated with a specific project file.</LI>
</UL>
<H3>Ghidra Server Storage</H3>
<P>The Ghidra Server employs a separate filesystem storage directory for each project repository using a mangled
name (see <I>Name Mangling/Encoding</I> above). While all newly created project repositories use the current
Indexed Filesystem storage scheme, Ghidra continues to support the legacy Mangled Filesystem, which may be in use
by older Ghidra Server installations. The <I>svrAdmin</I> command provides the ability to migrate an older
Mangled Filesystem to the current Indexed Filesystem (see <I>server/serverREADME.html</I>).</P>
<H2>Indexed Filesystem (current)</H2>
<P>This filesystem overcomes the project file-path length limitations inherent to the legacy Mangled Filesystem and
utilizes an index file to store project file-paths and the corresponding 8-digit hexadecimal identifier for each
(e.g., <I>00001234.prp</I> / <I>~00001234.db</I>). The following files are used to manage the filesystem content
and are located at the root of the filesystem storage directory.</P>
<UL>
<LI>Index File (<I>~index.dat</I>) - current index file</LI>
<LI>Backup Index File (<I>~index.bak</I>) - backup of current index file</LI>
<LI>Temporary Index File (<I>~index.tmp</I>) - temporary index file created during update</LI>
<LI>Index Journal File (<I>~journal.dat</I>) - journal of recent changes not yet reflected by index</LI>
<LI>Backup Journal File (<I>~journal.bak</I>) - backup of journal file created during incremental update</LI>
<LI>Index Lock File (<I>~index.lock</I>) - index lock used to coordinate index access and modification</LI>
</UL>
<P><B>Index Rebuild</B> - If the index file becomes corrupt it can be rebuilt. The associated project must be closed,
or the Ghidra Server stopped, to avoid file access during the repair process. While the filesystem is not
active/in-use, all of the index-related files listed above may be manually deleted from the root of the appropriate
filesystem storage directory (see Ghidra Project Storage and Ghidra Server Storage above). Once the filesystem store is
started (e.g., project opened or Ghidra Server started) the missing index will trigger an automatic rebuild of the
index based upon the details provided by each property file (<I>*.prp</I>) contained within the storage directory.</P>
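<P>As a cautious variation on the manual deletion described above, the index-related files can be moved into a backup
directory instead of being deleted. The following Python sketch assumes the project is closed or the Ghidra Server is
stopped; the function name <I>set_aside_index_files</I> and the backup directory are hypothetical and are not part of Ghidra.</P>

```python
import shutil
from pathlib import Path

# Index-related file names as listed in this section.
INDEX_FILES = (
    "~index.dat", "~index.bak", "~index.tmp",
    "~journal.dat", "~journal.bak", "~index.lock",
)


def set_aside_index_files(storage_root, backup_dir):
    """Move any index-related files out of storage_root into backup_dir.

    Only the files named in INDEX_FILES are touched; numbered storage
    subfolders and their *.prp / ~*.db content are left in place.
    Returns the names of the files that were moved.
    """
    root = Path(storage_root)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in INDEX_FILES:
        src = root / name
        if src.is_file():
            shutil.move(str(src), str(backup / name))
            moved.append(name)
    return moved
```

<P>For a local project the storage root would be one of the directories under <I>*.rep</I>, for example <I>idata</I>.</P>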
<P><B>Locating Project File Storage</B> - Locating individual project files on disk requires interpretation of
the index file (<I>~index.dat</I>) and traversing the numbered storage folders appropriately. When locating a project
file within the index it is important to know both the full Ghidra project folder path and the project filename.
If project filenames are unique you can simply search for the filename within the index; otherwise you will have
to search for the project folder path first. Sample <I>~index.dat</I> file:</P>
<PRE>
VERSION=1
/
00000003:myFile:a701ee4b1c71321380792951888
/A
00000100:anotherFile:a701ee4920f909328022843906
00000105:yetAnotherFile:a701ee48743913628104779815
/A/B
00001234:myFile:a701ee48019210920546276045
00001200:myFile.1:a701ee48491297248400412904
/A/B/C
00000004:someFile:a701ee4a23590324427866017
NEXT-ID:1250
MD5:d41d8cd98f00b204e9800998ecf8427e
</PRE>
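<P>A lookup over an index like the sample above can be sketched in Python as follows. The layout is inferred from the
sample only: lines starting with "/" are treated as folder paths, the <I>VERSION</I>, <I>NEXT-ID</I> and <I>MD5</I> lines
are skipped without validation, and each remaining line is read as <I>number:name:fileID</I>. The function name
<I>parse_index</I> is hypothetical, and splitting the name at the first and last colon is an assumption.</P>

```python
from typing import Dict, Tuple


def parse_index(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each project file path to (storage number, file ID)."""
    entries: Dict[str, Tuple[str, str]] = {}
    folder = None
    for raw in text.splitlines():
        line = raw.strip()                # ignore any leading indentation
        if not line:
            continue
        if line.startswith("/"):
            folder = line                 # start of a new folder section
            continue
        if line.startswith(("VERSION=", "NEXT-ID:", "MD5:")):
            continue                      # header/trailer lines, not validated
        if folder is None:
            raise ValueError("item line before any folder line: " + line)
        number, rest = line.split(":", 1)
        name, sep, file_id = rest.rpartition(":")
        if not sep:
            raise ValueError("malformed item line: " + line)
        path = folder.rstrip("/") + "/" + name
        entries[path] = (number, file_id)
    return entries
```

<P>With the sample index, looking up <I>/A/B/myFile</I> yields storage number <I>00001234</I>, while <I>/myFile</I> yields
<I>00000003</I>, which shows why the folder path matters when filenames are not unique.</P>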
<P>Once the project file of interest has been located within the index file and the corresponding 8-digit hex file number identified,
the storage subfolder name is taken from the fifth and sixth digits of the file number (the two digits immediately preceding the
last two). For example, file number "00001234" will be contained within the subfolder "12". Within this subfolder the "00001234.prp"
file should exist, as will all other numbered files which share those two digits. The above index file would correspond to the
following storage hierarchy:</P>
<PRE>
./~index.dat
./00/
    00000003.prp
    ~00000003.db/
    00000004.prp
    ~00000004.db/
./01/
    00000100.prp
    ~00000100.db/
    00000105.prp
    ~00000105.db/
./12/
    00001200.prp
    ~00001200.db/
    00001234.prp    &lt;-- /A/B/myFile property file
    ~00001234.db/   &lt;-- /A/B/myFile database directory
</PRE>
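<P>The mapping from file number to storage location can be sketched in Python. Judging from the examples in this section
(00000003 in "00", 00000100 in "01", 00001234 in "12"), the subfolder name is the fifth and sixth characters of the 8-digit
number; the sketch below relies on that reading. The function name <I>storage_location</I> is hypothetical.</P>

```python
import string
from pathlib import Path


def storage_location(storage_root, file_number):
    """Return (property file path, database directory path) for a file number."""
    if len(file_number) != 8 or any(c not in string.hexdigits for c in file_number):
        raise ValueError("expected an 8-digit hexadecimal file number")
    subfolder = file_number[4:6]          # e.g. "00001234" -> "12"
    base = Path(storage_root) / subfolder
    return base / (file_number + ".prp"), base / ("~" + file_number + ".db")
```

<P>For example, calling <I>storage_location("idata", "00001234")</I> gives <I>idata/12/00001234.prp</I> and
<I>idata/12/~00001234.db</I>.</P>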
<H2>Mangled Filesystem (legacy)</H2>
<P>This filesystem utilizes mangled naming for all project folders and files and follows the same hierarchy as the project.
For example, a project file with the path <I>"/A_1/B_1/myFile"</I> would be found stored as
<I>"./_a__1/_b__1/my_file.prp"</I> and <I>"./_a__1/_b__1/~my_file.db/"</I>. Due to the file path-length limitations of native
filesystems, this storage scheme is no longer used by default and has been retained only for backward
compatibility with older projects and repositories.</P>
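<P>The path translation for the Mangled Filesystem can be sketched in Python by applying the name mangling rules to each
path element. This is an illustration under the rules stated in this document, not Ghidra's own code; the function names
<I>mangle</I> and <I>mangled_storage_paths</I> are hypothetical.</P>

```python
def mangle(name: str) -> str:
    """Apply the name mangling rules: '_' -> '__', uppercase -> '_' + lowercase."""
    out = []
    for ch in name:
        if ch == "_":
            out.append("__")
        elif ch.isupper():
            out.append("_" + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def mangled_storage_paths(project_path: str):
    """Return (property file, database directory) relative to the storage root."""
    parts = [p for p in project_path.split("/") if p]
    if not parts:
        raise ValueError("project path must name a file")
    *folders, filename = parts
    # Each project folder maps to a mangled directory of the same depth.
    folder_path = "/".join(["."] + [mangle(f) for f in folders])
    stored = mangle(filename)
    return folder_path + "/" + stored + ".prp", folder_path + "/~" + stored + ".db/"
```

<P>Because every uppercase letter and underscore adds a character, deeply nested projects produce longer native paths
than the project paths themselves, which is consistent with the path-length limitation noted above.</P>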
<H2>Removal of Corrupt Files</H2>
<P>If a project or Ghidra Server repository contains a corrupt file it may not be possible to remove the file via the
Ghidra GUI or API. While a detailed triage of a corrupt file may be possible by the Ghidra Development Team, such files
may need to be removed after being copied for triage. For a shared repository this will require stopping the Ghidra Server
and working directly within the appropriately named repository directory. For a local project simply ensure that the project is
not in use.</P>
<P>For the local project case it will be necessary to isolate the storage issue since it could be caused by the
local project store (<I>*.rep/idata/</I>) or by the versioned repository (Ghidra Server, or <I>*.rep/versioned/</I> for a private
non-shared project). The Ghidra Server case can easily be identified by creating another temporary shared project for the same
shared repository and checking the behavior of the project file in question. If the same behavior is observed the issue is likely
on the server. If you need assistance identifying the source of the bad behavior or a recommended resolution please submit a Ghidra
trouble ticket.</P>
<P>As discussed for each filesystem above, the specific <I>*.prp</I> file and <I>~*.db/</I> directory should be
identified and copied for triage. Keeping a copy will enable triage and may enable restoring the file in the
future if possible. Once this file and corresponding directory have been copied they may be removed from the filesystem.
For the indexed filesystem case the index-related files can then be deleted, which will trigger a rebuild of the index
(see Indexed Filesystem above).</P>
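<P>The copy step can be sketched in Python as shown below. It assumes the project is not in use (or the Ghidra Server is
stopped) and that the database directory sits beside the property file with the same base name, as in the examples in
this document. The function name <I>copy_for_triage</I> and the triage directory are hypothetical; the sketch only copies
and leaves removal as a separate manual step.</P>

```python
import shutil
from pathlib import Path


def copy_for_triage(prp_file, triage_dir):
    """Copy a *.prp file and its sibling ~*.db directory into triage_dir.

    Returns the list of copied destination paths. Nothing is deleted.
    """
    prp = Path(prp_file)
    if prp.suffix != ".prp" or not prp.is_file():
        raise FileNotFoundError("property file not found: " + str(prp))
    # "00001234.prp" pairs with "~00001234.db"; "my_file.prp" with "~my_file.db"
    db_dir = prp.with_name("~" + prp.stem + ".db")
    dest = Path(triage_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = [Path(shutil.copy2(prp, dest / prp.name))]
    if db_dir.is_dir():
        # copytree raises if the destination directory already exists
        copied.append(Path(shutil.copytree(db_dir, dest / db_dir.name)))
    return copied
```

<P>Copying before removal keeps the original content available if the file later needs to be examined or restored.</P>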
</body>
</html>