GP-3980 bsim tutorial

James 2023-12-05 16:09:34 +00:00
parent 58e22a6f7b
commit 7a9590469f
27 changed files with 200 additions and 183 deletions

@ -20,7 +20,7 @@
# below (POSTGRES_CONFIG_OPTIONS) may be adjusted if required
# (e.g., build without openssl use, etc.).
#
# See https://www.postgresql.org/docs/10/install-procedure.html
# See https://www.postgresql.org/docs/15/install-procedure.html
# for supported postgresql config options.
#
# Additional packages may need to be installed to perform the

@ -2,21 +2,22 @@
The ``bsim`` command-line utility, located in the ``support`` directory of a Ghidra distribution, is used to create, populate, and manage BSim databases.
It works for all BSim database backends.
This utility offers a number of commands, many of which have several options.
In this section, we cover only a small subset of the possibilities.
Note that running ``bsim`` with no arguments will print a detailed usage message.
Running ``bsim`` with no arguments will print a detailed usage message.
## Generating Signature Files
The first step is to create signature files from the binaries in the Ghidra project.
Signature files are XML files which contain the BSim vectors and other metadata needed by the BSim server.
**Important**: If you have the ``postgres_object_files`` project open in Ghidra, close it now.
Non-shared projects are locked when open, and the lock will prevent the signature-generating process from accessing the project.
**Important**: It's simplest to exit Ghidra before performing the next steps, because:
- The H2-backed database can only be accessed by one process at a time.
- In case you have the ``postgres_object_files`` project open in Ghidra, signature generation will fail.
Non-shared projects are locked when open, and the lock will prevent the signature-generating process from accessing the project.
To generate the signature files, execute the following commands in a shell (adjust as necessary for Windows)
To generate the signature files, execute the following commands in a shell (adjust as necessary for Windows).
```bash
cd <ghidra_install_dir>/support
@ -37,6 +38,8 @@ Now, we commit the signatures to the BSim database with the following command (s
./bsim commitsigs file:/<database_dir>/example ~/bsim_sigs
```
Once the signatures have been committed, start Ghidra again.
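As a rough sketch of how the pieces of the ``commitsigs`` command fit together, the snippet below only assembles and prints the command rather than running it. The install, database, and signature directories are hypothetical placeholders, and it assumes ``<database_dir>`` is an absolute path, so that the URL takes the form ``file:/path/to/dir/example``.

```shell
# Hypothetical locations (not from the tutorial); adjust to your own setup.
ghidra_install_dir="/opt/ghidra"
database_dir="/home/analyst/bsim_dbs"
sig_dir="/home/analyst/bsim_sigs"

# Assumption: <database_dir> is absolute, so "file:" + path yields file:/...
db_url="file:${database_dir}/example"
commit_cmd="./bsim commitsigs ${db_url} ${sig_dir}"

# Dry run: print the commands instead of executing them.
echo "cd ${ghidra_install_dir}/support"
echo "${commit_cmd}"
```

Printing the command first makes it easy to check the database URL before any signatures are committed.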
## Aside: Creating a Database
We continue to use the database ``example``, so this step isn't necessary for the exercises.
@ -64,10 +67,12 @@ For example, you could restrict a BSim query to search only in executables of th
Executable categories in BSim are implemented using *program properties*, and function tags in BSim correspond to function tags in Ghidra. Properties and tags both have uses in Ghidra which are independent of BSim.
So, if we want a BSim database to record a particular category or tag, we must indicate that explicitly.
For example, to inform the database that we wish to record the ``ORIGIN`` category, you would execute the command
For example, to inform the database that we wish to record the ORIGIN category, you would execute the command
```bash
./bsim addexecategory file:/<database_dir>/example ORIGIN
```
Next Section: [Evaluating_Matches](BSimTutorial_Evaluating_Matches.md)
Executable categories can be added to a program using the script ``SetExecutableCategoryScript.java``.
Next Section: [Evaluating Matches and Applying Information](BSimTutorial_Evaluating_Matches.md)

@ -22,7 +22,7 @@ There are a number of ways to initiate a BSim query, including:
- **BSim -> Search Functions...** from the Code Browser.
- Right-click in the Listing and select **BSim -> Search Functions...**
- Click on the BSim icon in the toolbar.
- Click on the BSim icon ![BSim toolbar icon](images/preferences-web-browser-shortcuts.png) in the Code Browser toolbar.
For these cases, the function(s) being queried depend on the current selection.
If there is no selection, the function containing the current address is queried.
@ -44,7 +44,7 @@ From the BSim Search Dialog, you can
- Bound the number of results returned for each function.
- Set query filters.
![](./images/bsim_search_dialog.png)
![](images/bsim_search_dialog.png)
#### Selecting a BSim Database
@ -66,7 +66,7 @@ The respective fields in the dialog set lower bounds for these values for the ma
- Sharing rare features contributes more to this score than sharing common features.
- There is no upper bound for confidence when considered over all pairs of vectors.
However, if you fix a vector *v*, the greatest possible confidence score for a comparison involving *v* occurs when *v* is compared to itself.
The resulting confidence value is called the **self significance** of *v*.
The resulting confidence value is called the **self-significance** of *v*.
Confidence is used to judge the significance of a match.
For example, many executables contain a function which simply returns a constant value.
@ -79,19 +79,18 @@ The results of a BSim query can be sorted by the similarity and/or confidence of
The **Matches per Function** bound controls the number of results returned for a single function.
Note that in large collections, certain small or common functions might have substantial numbers of identical matches.
Filters are discussed in [BSim Filters](BSimTutorial_Filters.md).
#### Performing the Query
Click the **Search** button in the dialog to perform a query.
**Notes**:
1. Filters are discussed in [BSim Filters](BSimTutorial_Filters.md).
1. After successfully issuing a query, you will also see a **Search Function(s)** action (without the ellipsis) in certain contexts.
After successfully issuing a query, you will also see a **Search Function(s)** action (without the ellipsis) in certain contexts.
This will perform a BSim query on the selected functions using the same parameters as the last query (skipping the BSim Search Dialog).
## Exercises:
## Exercises
The database `example` contains vectors from a Linux executable used by Ghidra's GNU demangler.
The database ``example`` contains vectors from a Linux executable used by Ghidra's GNU demangler.
Ghidra ships with several other versions of this executable.
We use these different versions to demonstrate some of the capabilities of BSim.
@ -105,22 +104,19 @@ We use these different versions to demonstrate some of the capabilities of BSim.
- Note that the function names **are** present in ``demangler_gnu_v2_41``.
1. Using the default query options, query `example` for matches to the function at ``140006760``.
1. You should see the following search results:
![results](./images/basic_query.png)
![results](images/basic_query.png)
- In this case, there is exactly one match, the similarity is 1.0, and the matching function has a non-default name (it won't always be this easy).
- **Note**: The results window has two tables: the function-level results (upper table) and the executable-level results (lower table).
The executable-level results are covered in [Executable-level Results](BSimTutorial_Exe_Results.md)
1. Right-click on the row of a match to see the available actions:
![actions](./images/actions.png)
1. Select the **Compare Functions** action to bring up the side-by-side comparison.
- The results window has two tables: the function-level results (upper table) and the executable-level results (lower table).
The executable-level results are covered in [From Matching Functions to Matching Executables](BSimTutorial_Exe_Results.md).
1. Right-click on the row of the match and select the **Compare Functions** action to bring up the side-by-side comparison.
- The **Listing View** tab shows the disassembly.
- The **Decompiler Diff View** tab shows the decompiled code.
- Differences in the code are automatically highlighted in blue.
- Either view can be toggled between a horizontal split and a vertical split using the drop-down menu.
- **Note**: We cover the Decompiler Diff View in greater detail in [Applying Function Signatures](BSimTutorial_Applying_Function_Signatures.md)
- Either view can be toggled between a horizontal split and a vertical split using the drop-down menu.
1. Examine the diff views to verify that the match is valid.
1. Using the `Apply Function Names and Namespaces` action, transfer the name from the search result to the queried function.
1. Using the **Apply Name** action, apply the name from the search result to the queried function.
TODO: explain why there are different apply actions
**Note**: We cover the Decompiler Diff View in greater detail and discuss the various "Apply" actions in [Evaluating Matches and Applying Information](BSimTutorial_Evaluating_Matches.md).
### Exercise 2: Changes to the Source Code
@ -128,7 +124,7 @@ TODO: explain why there are different apply actions
- This executable is based on an earlier version of the source code than the executable in ``example``.
1. Navigate to the function ``expandargv`` in ``demangler_gnu_v2_24`` and issue a BSim query.
1. What differences do you see in the decompiled code?
<details><summary>In demangler_gnu_v2_41...</summary> Answer: The call to dupargv is now in an if clause (and decompiler creates a related local variable) and there are two additional calls to free. </details>
<details><summary>In demangler_gnu_v2_41...</summary> The main differences are that the call to dupargv is now in an if clause (and the decompiler creates a related local variable) and that there are two additional calls to free. </details>
1. The relevant source files are included with the Ghidra distribution:
- ``<ghidra_install_dir>/GPL/DemanglerGnu/src/demangler_gnu_v2_24/c/argv.c``
- ``<ghidra_install_dir>/GPL/DemanglerGnu/src/demangler_gnu_v2_41/c/argv.c``
@ -140,9 +136,10 @@ TODO: explain why there are different apply actions
``<ghidra_install_dir>/GPL/DemanglerGnu/os/mac_arm_64/demangler_gnu_v2_41``.
- This executable is based on the same source code as the executable in `example` but compiled for a different architecture.
- **Note**: this file has the same name as the one used to populate the BSim database, so you will have to give the resulting Ghidra program a different name or import it into a different directory in your Ghidra project.
1. Navigate to ``_expandargv`` and issue a BSim query. What differences do you see regarding ``memmove`` and ``memcpy``?
<details><summary>In the arm64 version...</summary> Answer: In the arm64 version, the compiler replaced these functions with __memmove_chk and __memcpy_chk. The __chk versions have an extra parameter related to preventing buffer overflows. Neither the names nor the bodies of callees are incorporated into BSim signatures, but the arguments of a call are, so this change partly explains why the BSim vectors are not identical.</details>
1. Examine the ``Listing View`` tab and verify that the architectures are different.
1. Navigate to ``_expandargv`` and issue a BSim query.
In the decompiler diff view, what differences do you see regarding ``memmove`` and ``memcpy``?
<details><summary>In the arm64 version...</summary> In the arm64 version, the compiler replaced these functions with __memmove_chk and __memcpy_chk. The __chk versions have an extra parameter related to preventing buffer overflows. Neither the names nor the bodies of callees are incorporated into BSim signatures, but the arguments of a call are, so this change partly explains why the BSim vectors are not identical.</details>
1. Examine the **Listing View** tab and verify that the architectures are different.
## A Remark on Query Thresholds and Indices

@ -10,10 +10,10 @@ Next, perform the following steps from the Ghidra Code Browser:
1. Run the Ghidra script ``CreateH2BSimDatabaseScript.java``.
1. In the resulting dialog:
1. Enter "example" in the `Database Name` field.
1. Select the new directory in the `Database Directory` field.
1. Enter "example" in the **Database Name** field.
1. Select the new directory in the **Database Directory** field.
1. Don't change any of the other fields.
1. Click OK.
1. Click **OK**.
## Populating the Database

@ -12,7 +12,7 @@ To enable BSim, perform the following steps:
1. Click on the ``Configure`` link of the ``BSim`` entry.
1. In the resulting dialog, ensure that the checkbox for ``BSimSearchPlugin`` is checked.
![](./images/configure.png)
![](images/configure.png)
Next Section: [Creating and Populating a BSim Database from the GUI](BSimTutorial_Creating_Database_From_GUI.md)

@ -1,4 +1,4 @@
# Evaluating Matches and Transferring Information
# Evaluating Matches and Applying Information
Summarizing what we've created over the last few sections, we now have:
1. A stripped executable (``postgres``).
@ -23,16 +23,18 @@ The corresponding function in `postgres` should have a default name.
1. Examine this match in the side-by-side decompiler view.
Note that the matching function has better data type information due to the debug information.
1. Q: Why does the placement of the `double` argument differ between the functions?
<details><summary>Answer</summary> Floating point values and integer/pointer values are passed in separate sets of registers.
Neither ordering is wrong since both are consistent with the instructions of the function.
The debug info records a specific signature (and ordering) for the function, which Ghidra applies.
In the version without debug information, the decompiler used heuristics to determine the function's signature.</details>
<details><summary>Answer</summary> Floating point values and integer/pointer values are passed in separate sets of registers.
Neither ordering is wrong since both are consistent with the instructions of the function.
The debug info records a specific signature (and ordering) for the function, which Ghidra applies.
In the version without debug information, the decompiler used heuristics to determine the function's signature.</details>
For matches with a fair number of differences, the decompiler diff panel can get pretty colorful.
Furthermore, as you click around, tokens will gain and lose highlight of various colors.
Furthermore, as you click around, tokens will gain and lose highlights of various colors.
It's worth giving a brief explanation of when highlighting happens and what the different colors mean.
Some terminology: if you click on a token in a decompiler panel, that token becomes the *focused token*.
![Decomp Diff Window](images/decomp_diff.png)
The colors:
- Blue is used to highlight differences between the two functions.
@ -43,36 +45,70 @@ Certain tokens, such as whitespace tokens or tokens used in variable declaration
## Exercise: Locking and Unlocking Scrolling
By default, scrolling in the diff window is synchronized.
This means that scrolling within one window will also scroll within the other window.
In the decompiler diff window, scrolling works by matching one line in the left function with one line in the right function.
The two functions are aligned using those lines.
Initially, the functions are aligned using the functions' signatures.
Before moving on, experiment with locking and unlocking scrolling.
As you click around in either function, the "aligning lines" will change.
If the focused token has a match, the scrolling is re-centered based on the lines containing the matched tokens.
If the focused token does not have a match, the functions will be aligned using the closest token to the focused token which does have a match.
Synchronized scrolling can be toggled using the ![lock icon](images/lock.gif) and ![unlock icon](images/unlock.gif) icons in the toolbar.
Exercise:
1. Experiment with locking and unlocking synchronized scrolling.
## Exercise: Applying Signatures
If you are satisfied with a given match, you might want to apply information about the match to the queried function.
For example, you might want to apply the name or signature of the function.
There are some subtleties which determine how much information is safe to apply.
Hence there are three actions available under the **Apply From Other** menu when you right-click in the left panel:
1. **Function Name** will apply the function's name (and namespace) to the function on the left.
1. **Function Signature** will apply the name and namespace and "skeleton" data types.
Structure and union data types are not transferred.
Instead, empty placeholder structures are created.
1. **Function Signature and Data Types** will apply the name and signature with full data types.
This may result in many data types being imported into the program (e.g., structures which refer to other structures).
**Warning**: You should be absolutely certain that the datatypes are exactly the same before applying signatures and data types.
If there have been any changes to a datatype's definition, you could end up bringing incorrect datatypes into a program, even using BSim matches with 1.0 similarity.
Applying full data types is also problematic for cross-architecture matches.
Exercise:
1. Since we know it's safe, apply the function signature and data types to the left function.
There are similarly-named actions available on rows of the Function Matches table in the BSim Search Results window.
The **Status** column contains information about which rows have had their matches applied.
## Exercise: Comparing Callees
The token matching algorithm matches a function call in one program to a function call in another by considering the data flow into and out of the ``CALL`` instruction, but it does not do anything with the bodies of the callees.
However, given a matched pair of calls, you can bring up a new comparison window and compare their bodies manually.
However, given a matched pair of calls, you can bring up a new comparison window for the callees with the **Compare Matching Callees** action.
Ctrl f in left view
FUN_
find something
1. Click in the left panel of the decompiler diff window and press ``Ctrl-F``.
1. Enter ``FUN_`` and search for matched function calls where the callee in the left window has a default name and the callee in the right window has a non-default name.
1. Right-click on one of the matched tokens and select the **Compare Matching Callees** action.
1. In the comparison of the callees, apply the function signature and data types from the right function to the left function.
Verify that the update is reflected in the decompiler diff view of the callers.
## Exercise: Transferring Signatures
1. Transfer the signatures to the queried function via either:
- The `Apply Function Signature to Other Side` action in the diff window.
- The `Apply Function Names, Namespaces, and Signatures` action in the BSim Search Results window.
**Warning**: You should be absolutely certain that the datatypes are the same before applying signatures.
If there have been any changes to a datatype's definition, you could end up bringing incorrect datatypes into a program, even using BSim matches with 1.0 similarity.
## Exercise: Multiple Comparisons
The function shown in a panel is controlled by a drop-down menu at the top of the panel.
This can be useful when you'd like to evaluate multiple matches to a single function.
Exercise:
1. In the BSim Search Results window, right-click on a table column name, select **Add/Remove Columns**, and enable the **Matches** column.
1. Find two functions in ``postgres``, each of which has exactly two matches.
Select the corresponding four rows in the matches table and perform the **Compare Functions** action.
1. Experiment with the drop-downs in each panel.
In the next section, we discuss the Executable Results table.

@ -1,37 +1,37 @@
# From Matching Functions to Matching Executables
In this section, we discuss the Executable results table.
In this section, we discuss the Executable Results table.
Each row of this table corresponds to one executable in the database.
The information in one row is an aggregation of all of the function-level matches into that row's executable.
Your Executable Results table from the previous query should look similar to the following:
Using the results window from the previous query, sort the Executable results table
by "Function Count" (i.e., the number of results which are in a given executable). You should see the following:
![](./images/exe_results.png)
![executable results](images/exe_results.png)
If you select a single row in the table and right-click on it, you will see the following actions:
![](./images/exe_results_actions.png)
- **Load Executable** will open a read-only copy of the program in the Code Browser.
- **Filter on this Executable** applies a filter which restricts the matches shown in the Function Matches table to matches which occur in the given executable.
- **Load Executable**
Opens a read-only copy of the program in the Code Browser.
- **Filter on this Executable**
Applies a filter which restricts the matches shown in the Function Matches table to matches which occur in the given executable.
## Exercise
1. If you haven't already, sort the Executable results by descending Function Count.
What position is `demangler_gnu_v2_33_1`?
- <details><summary>A:</summary> 7 </details>
1. The Confidence column shows the sum of the confidence scores of all matches into each executable. Sort the Executable results by descending confidence and observe that `demangler_gnu_v2_33_1` is now much further down the list.
- <details><summary>What could explain this?</summary>
If there are many function matches but the sum of all the confidences is relatively low,
it is likely that many of the matches involve small functions. For such a match, it is
more likely that the functions agree by chance rather than being derived from the same
source code.
</details>
1. In the Executable match table, right click on `demangler_gnu_v2_33_1` and apply the filter
action. Sort the filtered function matches by descending confidence. Starting at the top,
perform some code comparisons and convince yourself that the given explanation is correct.
- Note: You can remove the filter using the "Settings" icon in the upper right. We'll discuss this further in [BSim Filters](./BSimTutorial_Filters.md)
1. Sort the Executable results by descending **Function Count**.
An entry in this column shows the number of queried functions which have at least one match in the row's executable (if ``foo`` has 2 or more matches into a given executable, it still only contributes 1 to the function count).
What position is ``demangler_gnu_v2_41``?
<details><summary>In this table...</summary> It's in the first position.</details>
1. An entry in the **Confidence** column shows the sum of the confidence scores of all matches into the corresponding executable.
If ``foo`` has more than one match into a given executable, only the one with the highest (function-level) confidence contributes to the (executable-level) confidence score.
Sort the Executable results by descending confidence and observe that ``demangler_gnu_v2_41`` is now much further down the list.
<details><summary>What could explain this?</summary> If there are many function matches but the sum of all the confidences is relatively low, it is likely that many of the matches involve small functions. For such a match, it is more likely that the functions agree by chance rather than being derived from the same source code. </details>
1. In the Executable match table, right-click on ``demangler_gnu_v2_41`` and apply the filter action.
Sort the filtered function matches by descending confidence.
Starting at the top, perform some code comparisons and convince yourself that the given explanation is correct.
- **Note**: You can remove the filter using the **Filter Results** icon ![Filter Results](images/exec.png) in the upper right.
We'll discuss this further in [BSim Filters](BSimTutorial_Filters.md)
In the next section, describe a technique to restrict queries to functions which are likely to
have "interesting" matches.
From this exercise, we see that unrelated functions can be duplicates of each other, either because they are small or because they perform a common generic action.
Keep in mind that such functions can "pollute" the results of a blanket query.
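The aggregation rules described for the **Function Count** and **Confidence** columns can be sketched with a small `awk` pipeline. This is only an illustration on made-up rows (queried function, executable, function-level confidence), not BSim's actual implementation: each queried function counts once per executable, and only its highest-confidence match into that executable is added to the executable-level confidence.

```shell
# Toy match rows: queried_function executable confidence (all values are made up).
matches='foo exeA 10.0
foo exeA 25.5
bar exeA 4.0
foo exeB 7.5'

summary="$(printf '%s\n' "$matches" | awk '
  {
    key = $2 SUBSEP $1                      # (executable, queried function) pair
    conf = $3 + 0                           # force numeric comparison
    if (!(key in best)) {                   # first match of this function into this executable
      count[$2]++
      best[key] = conf
    } else if (conf > best[key]) {
      best[key] = conf                      # keep only the highest-confidence match
    }
  }
  END {
    for (key in best) {
      split(key, parts, SUBSEP)
      total[parts[1]] += best[key]          # sum the per-function maxima per executable
    }
    for (exe in count) printf "%s %d %.1f\n", exe, count[exe], total[exe]
  }' | sort)"

# Columns: executable, function count, confidence
echo "$summary"
```

Here ``foo`` has two matches into ``exeA``, but it contributes 1 to the function count and only 25.5 to the confidence, so ``exeA`` ends up with a count of 2 and a confidence of 29.5.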
In the next section, we demonstrate a technique to restrict queries to functions which are more likely to have meaningful matches.
Next Section: [Overview Queries](BSimTutorial_Overview.md)
Next Section: [Overview Queries](BSimTutorial_Overview_Queries.md)

@ -1,36 +1,21 @@
# BSim Filters
There are a number of filters that can be applied to BSim queries, involving names, architectures,
compilers, ingest dates, and many other attributes.
There are a number of filters that can be applied to BSim queries, involving names, architectures, compilers, ingest dates, user-defined executable categories, and many other attributes.
Filters can be applied *server-side* or *client-side*. Server-side filters affect the results sent
to Ghidra from a BSim server. Client-side filters apply to the BSim Search results table and can
be added and removed at will. However, to "undo" a server-side filter, you have to issue an
additional BSim query without the filter.
Filters can be applied *server-side* or *client-side*.
Server-side filters affect the query results sent to Ghidra from a BSim server.
Client-side filters apply to the BSim Search results table and can be added and removed at will.
However, to "undo" a server-side filter, you have to issue an additional BSim query without the filter.
Note that overview queries cannot be filtered.
Server-side filters can be applied using the `Filters` drop-down in the BSim Search dialog.
Server-side filters can be applied using the **Filters** drop-down in the BSim Search dialog.
## Exercise: Filters
1. Select all functions in `postgres` and bring up the BSim Search dialog.
1. Use the default query bounds.
1. Apply an `Executable name does not equal` filter with `demangler_gnu_v2_33_1` as the name to
exclude.
1. Perform the query and verify that `demangler_gnu_v2_33_1` is not in the list of executables
with matches.
<p align="center">
<img src="./images/search_info.png"/>
</p>
1. Using the `Search Info` icon, you can see what server-side filters were applied to the query.
1. Select all functions in ``postgres`` and bring up the BSim Search dialog.
1. Apply an **Executable name does not equal** filter with ``demangler_gnu_v2_41`` as the name to exclude.
1. Perform the query and verify ``demangler_gnu_v2_41`` is not in the list of executables with matches.
1. Using the **Search Info** icon ![Search Info](images/information.png) in the BSim Search Results toolbar, you can see the server-side filters applied to the query.
Verify that this information is correct.
<p align="center">
<img src="./images/filter_results.png"/>
</p>
1. Using the `Filter Results` icon, you can apply client-side filters to the query results.
Experiment with applying and removing some client-side filters.
Next Section: [Scripting and Visualization](BSimTutorial_Scripting.md)
1. Using the **Filter Results** icon ![Filter Results](images/exec.png), you can apply client-side filters to the query results. Experiment with applying and removing some client-side filters.
Next Section: [Scripting and Visualization](BSimTutorial_Scripting.md)

@ -1,7 +1,7 @@
# Ghidra Analysis from the Command Line
For the remaining exercises, we need to populate our BSim database with a number of binaries.
We'd like a consistent set of binaries for the tutorial, but we don't want to clutter the Ghidra distribution with dozens of additional executables that aren't actually used by the codebase.
We'd like a consistent set of binaries for the tutorial, but we don't want to clutter the Ghidra distribution with dozens of additional executables.
Fortunately, the BSim plugin includes a script for building the PostgreSQL backend, and that build process creates hundreds of object files.
So we can just build PostgreSQL and harvest the object files we need.
@ -11,6 +11,9 @@ We do not run any PostgreSQL code, we simply analyze some files produced when bu
Note that these files must be built on a machine running Linux.
Windows users can build these files in a Linux virtual machine.
First, download ``postgresql-15.3.tar.gz`` from the PostgreSQL web site.
Put this file in ``<ghidra_install_dir>/Ghidra/Features/BSim``.
To build the files, execute the following commands in a shell: [^1]
[^1]: You may need to install additional packages and/or change some build options in order for PostgreSQL to build successfully.
@ -22,13 +25,12 @@ export CFLAGS="-O2 -g"
./make-postgres.sh
mkdir ~/postgres_object_files
cd build
find . -name pl*.o -exec cp {} ~/postgres_object_files/ \;
find . -name 'p*o' -size +100000c -size -700000c -exec cp {} ~/postgres_object_files/ \;
cd os/linux_x86_64/postgresql/bin
strip -s postgres
```
To continue on Windows, transfer the ``~/postgres_object_files`` directory and the (stripped) ``postgres`` executable to your Windows machine.
To continue on Windows, transfer the ``~/postgres_object_files`` directory and the stripped ``postgres`` executable to your Windows machine.
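To see what the size-filtered `find` command in the build steps selects, the following sketch reproduces it in a scratch directory. The file names and sizes are hypothetical stand-ins for build outputs. The name pattern is quoted here so that the shell passes it to `find` unexpanded.

```shell
# Scratch area so nothing in a real build tree is touched.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/build" "$demo_dir/out"

# Hypothetical stand-ins for object files (names and sizes are made up).
head -c 50000  /dev/zero > "$demo_dir/build/parser.o"    # matches the name, too small
head -c 200000 /dev/zero > "$demo_dir/build/planner.o"   # matches the name, size in range
head -c 800000 /dev/zero > "$demo_dir/build/postgres.o"  # matches the name, too large
head -c 200000 /dev/zero > "$demo_dir/build/analyze.o"   # size in range, name does not match

# Keep files named p*o that are larger than 100000 bytes and smaller than 700000 bytes.
(cd "$demo_dir/build" && \
  find . -name 'p*o' -size +100000c -size -700000c -exec cp {} "$demo_dir/out/" \;)

harvested="$(ls "$demo_dir/out")"
echo "$harvested"
```

Only ``planner.o`` is copied, since it is the one file that satisfies both the name pattern and the two size bounds.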
## Importing and Analyzing the Exercise Files

@ -29,7 +29,7 @@ The index drastically reduces the number of vector comparisons needed and allows
databases holding up to 10 million unique vectors, and a *large* template, intended for databases holding up to 100 million unique vectors.
Querying ``foo`` against a BSim database typically yields a number of potential matches.
Each individual match for ``foo`` can be compared to `foo` in a side-by-side view, and certain information (such as function name) can be quickly transferred from a match to ``foo``.
Each individual match for ``foo`` can be compared to `foo` in a side-by-side view, and certain information (such as function name) can be quickly copied from a match to ``foo``.
We frequently call BSim vectors the *BSim signature* of a function, or just the *signature* when the context is clear.
@ -46,7 +46,7 @@ Using BSim involves the following components:
- A *BSim Client*, i.e., an instance of Ghidra with the BSim plugin enabled.
- This is where the reverse engineering happens.
- A *BSim Database*, which stores the BSim signatures.
- Also stores some metadata about each function and the containing executable.
- Also stores some metadata about each function and its containing executable.
- In particular, stores the ghidra:// URL of the associated Ghidra program.
- Does not store disassembly or decompiled functions.
- A *Ghidra Project*, which stores the analyzed programs used to populate the BSim database.

@ -1,57 +0,0 @@
# Overview Queries
An **Overview Query** queries a BSim database for the number of matches to each
function in an executable. The matching functions themselves are not returned.
Similarity and Confidence thresholds apply to an Overview query, but the
"Matches per Function" bound does not.
To perform an Overview Query, select `BSim -> Perform Overview...` from the Code
Browser.
## Exercise 1: Hit Counts and Self-Similarities
1. Perform an Overview query on `postgres` using the default query bounds. You should see
the following result:
![](./images/overview_window.png)
1. Sort the table by the "Hit Count" column in ascending order. Typically, the functions with the largest hit counts will have low self-similarity. Verify that that is the case for this table.
1. Q: Examine the functions with the highest hit count. Why are there so many matches, and
why do they all have the same BSim feature vector?
- <details><summary>A:</summary> These functions simply return constants. BSim feature vectors
incorporate the fact that varnode is constant but do not incorporate the specific value.</details>
## Exercise 2: Selections and Queries
Using the hit count column, it is possible to exclude functions with large numbers of matches.
1. In the Overview Table, select all functions whose hit count is 5 or less.
1. Right-click on the selection and perform the `Search Selected Functions` action. Sort the
query results by `Function Count` and verify that `demangler_gnu_v2_33_1` is far down the list.
## Exercise 3: Vector Hashes
Suppose `foo` and `bar` have the same number of hits in the Overview table. There are two
possibilities:
- `foo` and `bar` have distinct feature vectors which happen to have the same number of matches.
- `foo` and `bar` have the same feature vector.
An optional column, `Vector Hash`, can be used to distinguish between these two cases.
1. Enable the `Vector Hash` Column in the Overview Table.
1. Sort the hit count column in ascending order, (multi)sort the Self Significance column in
descending order, then (multi)sort the Vector Hash column in ascending order.
1. Q: What are the first functions in the table with the same vector hash?
- <details><summary>A:</summary> `ts_headline_json_byid_opt` and `ts_headline_jsonb_byid_opt`
</details>
1. Examine the decompiled code of these two functions and verify that they should have identical
BSim vectors.
Next Section: [Queries and Filters](BSimTutorial_Filters.md)

View File

@ -0,0 +1,42 @@
# Overview Queries
An **Overview Query** queries a BSim database for the number of matches to each function in an executable.
The matching functions themselves are not returned.
Similarity and Confidence thresholds can be set for an Overview query, but there is no "Matches per Function" bound and no filters can be set.
To perform an Overview Query, select **BSim -> Perform Overview...** from the Code Browser.
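The counting behavior described above can be sketched in a few lines of Python. This is only an illustration, not BSim's implementation: it assumes feature vectors are sparse maps from a feature hash to a weight, that similarity is the cosine of the angle between two vectors, and it ignores the confidence threshold entirely. The function names, the toy data, and the `0.7` threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {feature_hash: weight}."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def overview(query_functions, database_vectors, similarity_threshold=0.7):
    """Return {function_name: hit_count}; the matches themselves are discarded."""
    return {
        name: sum(
            1
            for db_vec in database_vectors
            if cosine_similarity(vec, db_vec) >= similarity_threshold
        )
        for name, vec in query_functions.items()
    }


# Hypothetical toy data: two query functions and five database vectors.
query = {
    "parse_header": {1: 1.0, 2: 1.0, 3: 1.0},
    "return_zero": {9: 1.0},
}
database = [
    {1: 1.0, 2: 1.0, 3: 1.0},  # identical to parse_header
    {1: 1.0, 2: 1.0, 4: 1.0},  # shares two of three features (similarity ~0.67)
    {9: 1.0},
    {9: 1.0},
    {9: 1.0},
]

hit_counts = overview(query, database)
print(hit_counts)
```

Note that only a count is kept per function, which is why an Overview Query has no use for a "Matches per Function" bound.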
## Exercise 1: Hit Counts and Self-Significance
1. Perform an Overview query on ``postgres`` using the default query thresholds.
You should see the following result:
![overview window](images/overview_window.png)
1. Sort the table by the "Hit Count" column in ascending order. Typically, the functions with the largest hit counts will have low self-significance.
Verify that that is the case for this table.
1. Q: Examine the functions with the highest hit count. Why are there so many matches for these functions?
<details><summary>Answer:</summary> These are all instances of PostgreSQL statistics-reporting functions. Their bodies are quite similar.</details>
## Exercise 2: Selections and Queries
Using the hit count column, it is possible to exclude functions with large numbers of matches.
1. In the Overview Table, select all functions whose hit count is 2 or less.
1. Right-click on the selection and perform the **Search Selected Functions** action.
Sort the query results by descending **Function Count** and verify that ``demangler_gnu_v2_41`` is far down the list.
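The selection step in this exercise amounts to a simple filter on the hit count. As a rough sketch, assuming the Overview table has been reduced to a hypothetical mapping from function name to hit count (the names and counts below are invented for illustration):

```python
def low_hit_functions(hit_counts, max_hits=2):
    """Return the names of functions whose hit count is at most max_hits."""
    return sorted(name for name, hits in hit_counts.items() if hits <= max_hits)


# Hypothetical Overview results: function name -> hit count.
hit_counts = {"parse_header": 1, "init_table": 2, "stat_getter": 40}

selected = low_hit_functions(hit_counts)
print(selected)
```

Functions such as the hypothetical `stat_getter`, which match many database entries, are dropped before the follow-up query is run.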
## Exercise 3: Vector Hashes
Suppose ``foo`` and ``bar`` have the same number of hits in the Overview table.
There are two possibilities:
1. ``foo`` and ``bar`` have distinct feature vectors which happen to have the same number of matches.
1. ``foo`` and ``bar`` have the same feature vector.
An optional column, **Vector Hash**, can be used to distinguish between these two cases.
1. Enable the **Vector Hash** Column in the Overview Table.
1. Find two functions with the same vector hash.
1. Select the two corresponding rows in the table and then transfer the selection to the Listing using the ![make selection icon](images/text_align_justify.png) icon in the BSim Overview toolbar.
1. In the Listing, press ``Shift-C`` or right-click and perform the **Compare Selected Functions** action.
1. In the resulting Function Comparison window, convince yourself that these two functions should have the same BSim signature.
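The idea behind this exercise can be sketched as a grouping step: rows with equal hit counts are only guaranteed to share a feature vector when their vector hashes also agree. The sketch below assumes each table row has been reduced to a hypothetical `(function name, hit count, vector hash)` tuple; the hash strings are made up.

```python
from collections import defaultdict


def group_by_vector_hash(rows):
    """Map each vector hash shared by two or more functions to their names."""
    groups = defaultdict(list)
    for name, _hit_count, vector_hash in rows:
        groups[vector_hash].append(name)
    return {h: names for h, names in groups.items() if len(names) > 1}


# Hypothetical rows: all three functions have the same hit count,
# but only foo and bar share a vector hash.
rows = [
    ("foo", 4, "a1b2"),
    ("bar", 4, "a1b2"),
    ("baz", 4, "c3d4"),
]

shared = group_by_vector_hash(rows)
print(shared)
```

Here `baz` has the same hit count as `foo` and `bar` but a different hash, so it falls under the first possibility (distinct vectors with the same number of matches).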
Next Section: [Queries and Filters](BSimTutorial_Filters.md)

View File

@ -6,17 +6,17 @@ Finally, we briefly mention a few other topics related to BSim.
There are a number of example scripts in the ``BSim`` script category, which demonstrate how to interact with BSim programmatically:
![](./images/script_manager.png)
![](images/script_manager.png)
## Visualizing Features
Finally, if you'd like to see the particular BSim features in a function, you can use the BSim Feature Visualizer.
This plugin allows you to highlight regions of the decompiled code corresponding to a particular feature and to display a graph representing the feature.
To use this plugin, first enable the ``BSimFeatureVisualizerPlugin`` via **File -> Configure ** from the Code Browser.
To use this plugin, first enable the ``BSimFeatureVisualizerPlugin`` via **File -> Configure** from the Code Browser.
You can then bring it up via **BSim -> BSim Feature Visualizer**.
![](./images/feature_visualizer.png)
![](images/feature_visualizer.png)
This is the end of the tutorial.

View File

@ -8,14 +8,14 @@ This tutorial demonstrates how to create a small BSim database and walks through so
**Detailed information about BSim can be found in the "BSim" entry of the Ghidra Help**.
1. [Introduction to BSim](BSimTutorial_Intro.md)
1. [Enabling BSim](BSimTutorial_Enabling.md)
1. [Starting Ghidra and Enabling BSim](BSimTutorial_Enabling.md)
1. [Creating and Populating a BSim Database from the GUI](BSimTutorial_Creating_Database_From_GUI.md)
1. [Basic BSim Queries](BSimTutorial_Basic_Queries.md)
1. [Ghidra from the Command Line](BSimTutorial_Ghidra_Command_Line.md)
1. [BSim from the Command Line](BSimTutorial_BSim_Command_Line.md)
1. [Evaluating Matches](BSimTutorial_Evaluating_Matches.md)
1. [From Matching Functions to Matching Executables](BSimTutorial_Exe_Results.md)
1. [Overview Queries](BSimTutorial_Overview.md)
1. [Overview Queries](BSimTutorial_Overview_Queries.md)
1. [BSim Filters](BSimTutorial_Filters.md)
1. [Scripting and Visualization](BSimTutorial_Scripting.md)

Binary file not shown. Before: 54 KiB. After: 42 KiB.

Binary file not shown. After: 185 KiB.

Binary file not shown. Before: 73 KiB. After: 64 KiB.

Binary file not shown. After: 1.0 KiB.

Binary file not shown. Before: 7.0 KiB.

Binary file not shown. After: 778 B.

Binary file not shown. After: 900 B.

Binary file not shown. Before: 88 KiB. After: 43 KiB.

Binary file not shown. After: 955 B.

Binary file not shown. Before: 9.3 KiB.

Binary file not shown. After: 209 B.

Binary file not shown. After: 900 B.

View File

@ -1,7 +1,9 @@
##VERSION: 2.0
##MODULE IP: Creative Commons Attribution 2.5
##MODULE IP: Crystal Clear Icons - LGPL 2.1
##MODULE IP: FAMFAMFAM Icons - CC 2.5
##MODULE IP: LGPL 2.1
##MODULE IP: LGPL 3.0
##MODULE IP: Modified Nuvola Icons - LGPL 2.1
##MODULE IP: Nuvola Icons - LGPL 2.1
##MODULE IP: Public Domain
@ -28,20 +30,25 @@ GhidraClass/BSim/BSimTutorial_Exe_Results.md||GHIDRA||||END|
GhidraClass/BSim/BSimTutorial_Filters.md||GHIDRA||||END|
GhidraClass/BSim/BSimTutorial_Ghidra_Command_Line.md||GHIDRA||||END|
GhidraClass/BSim/BSimTutorial_Intro.md||GHIDRA||||END|
GhidraClass/BSim/BSimTutorial_Overview.md||GHIDRA||||END|
GhidraClass/BSim/BSimTutorial_Overview_Queries.md||GHIDRA||||END|
GhidraClass/BSim/BSimTutorial_Scripting.md||GHIDRA||||END|
GhidraClass/BSim/README.md||GHIDRA||||END|
GhidraClass/BSim/images/actions.png||GHIDRA||||END|
GhidraClass/BSim/images/basic_query.png||GHIDRA||||END|
GhidraClass/BSim/images/bsim_search_dialog.png||GHIDRA||||END|
GhidraClass/BSim/images/configure.png||GHIDRA||||END|
GhidraClass/BSim/images/decomp_diff.png||GHIDRA||||END|
GhidraClass/BSim/images/exe_results.png||GHIDRA||||END|
GhidraClass/BSim/images/exe_results_actions.png||GHIDRA||||END|
GhidraClass/BSim/images/exec.png||Crystal Clear Icons - LGPL 2.1||||END|
GhidraClass/BSim/images/feature_visualizer.png||GHIDRA||||END|
GhidraClass/BSim/images/filter_results.png||GHIDRA||||END|
GhidraClass/BSim/images/information.png||FAMFAMFAM Icons - CC 2.5||||END|
GhidraClass/BSim/images/lock.gif||GHIDRA||||END|
GhidraClass/BSim/images/overview_window.png||GHIDRA||||END|
GhidraClass/BSim/images/preferences-web-browser-shortcuts.png||LGPL 3.0||||END|
GhidraClass/BSim/images/script_manager.png||GHIDRA||||END|
GhidraClass/BSim/images/search_info.png||GHIDRA||||END|
GhidraClass/BSim/images/text_align_justify.png||FAMFAMFAM Icons - CC 2.5||||END|
GhidraClass/BSim/images/unlock.gif||GHIDRA||||END|
GhidraClass/Beginner/Images/GhidraLogo64.png||GHIDRA||||END|
GhidraClass/Beginner/Introduction_to_Ghidra_Student_Guide.html||GHIDRA|||This file contains mostly Ghidra content, but also includes code that is available for distribution, without restrictions, from https://github.com/paulrouget/dzslides.|END|
GhidraClass/Beginner/Introduction_to_Ghidra_Student_Guide_withNotes.html||Public Domain|||Slight modification of code that is available for distribution, without restrictions, (original extremely permissive wtf license allows us to change IP to Public Domain),from https://github.com/paulrouget/dzslides.|END|