mirror of
https://github.com/NationalSecurityAgency/ghidra.git
synced 2024-11-10 06:02:09 +00:00
GP-4009 Introduced BSim functionality including support for postgresql,
elasticsearch and h2 databases. Added BSim correlator to Version Tracking.
This commit is contained in:
parent
f0f5b8f2a4
commit
0865a3dfb0
@@ -1,8 +1,6 @@
##VERSION: 2.0
##MODULE IP: Apache License 2.0
##MODULE IP: Apache License 2.0 with LLVM Exceptions
.classpath||NONE||reviewed||END|
.project||NONE||reviewed||END|
FridaNotes.txt||GHIDRA||||END|
Module.manifest||GHIDRA||||END|
build.gradle||GHIDRA||||END|
@@ -1,8 +1,6 @@
##VERSION: 2.0
##MODULE IP: Apache License 2.0
##MODULE IP: Apache License 2.0 with LLVM Exceptions
.classpath||NONE||reviewed||END|
.project||NONE||reviewed||END|
Module.manifest||GHIDRA||||END|
build.gradle||GHIDRA||||END|
src/llvm-project/lldb/bindings/java/java-typemaps.swig||Apache License 2.0 with LLVM Exceptions||||END|
@@ -1,8 +1,6 @@
##VERSION: 2.0
##MODULE IP: Apache License 2.0
##MODULE IP: Apache License 2.0 with LLVM Exceptions
.classpath||NONE||reviewed||END|
.project||NONE||reviewed||END|
InstructionsForBuildingLLDBInterface.txt||GHIDRA||||END|
Module.manifest||GHIDRA||||END|
build.gradle||GHIDRA||||END|
81
Ghidra/Extensions/BSimElasticPlugin/INSTALL.txt
Executable file
@@ -0,0 +1,81 @@
Installation of the Elasticsearch BSim Plug-in:

In order to use Elasticsearch as the back-end database for a BSim instance,
the lsh plug-in, included with this Ghidra extension, must be installed on
the Elasticsearch cluster.

The lsh plug-in is bundled in the standard plug-in format as the file
'lsh.zip'. It must be installed separately on EVERY node of the cluster,
and each node must be restarted after the install in order for the plug-in to
become active.

For a single node, installation is accomplished with the command-line
'elasticsearch-plugin' script that comes with the standard Elasticsearch
distribution. It expects a URL pointing to the plug-in to be installed.
The basic command, executed in the Elasticsearch installation directory
for the node, is

  bin/elasticsearch-plugin install file:///path/to/ghidra/Ghidra/Extensions/BSimElasticPlugin/data/lsh.zip

Replace the initial portion of the absolute path in the URL to point to your
particular Ghidra installation.

Deployment:

Follow the Elasticsearch documentation to do any additional configuration,
starting, stopping, and management of your Elasticsearch cluster.

To try BSim with a toy deployment, you can start a single node (as per the
documentation) from the command-line by just running

  bin/elasticsearch

This will dump logging messages to the console, and you should see '[lsh]'
listed among the loaded plug-ins as the node starts up.

Once the Elasticsearch node(s) are running, whether they are a toy or a full
deployment, you can immediately proceed to use the 'bsim' command.
The Ghidra/BSim client and 'bsim' command automatically assume an
Elasticsearch server when they see the 'https' protocol in the provided URLs,
although the 'elastic' protocol may also be specified and is equivalent.
The use of the 'http' protocol for Elasticsearch is not supported.
Adjust the hostname, port number, and repository name as appropriate.
Use a command-line similar to the following to create a BSim instance:

  bsim createdatabase elastic://1.2.3.4:9200/repo medium_32

This is equivalent to:

  bsim createdatabase https://1.2.3.4:9200/repo medium_32

Use a command-line like this to generate and commit signatures from a Ghidra Server
repository to the Elasticsearch database created above:

  bsim generatesigs ghidra://1.2.3.4/repo bsim=elastic://1.2.3.4:9200/repo

Within Ghidra's BSim client, enter the same URL into the database connection
panel in order to place queries to your Elasticsearch deployment. See the BSim
documentation included with Ghidra for full details.


Version:

The current BSim plug-in was designed and tested with Elasticsearch version 7.17.4.
A change to the Elasticsearch scripting interface, starting with version 7.15, makes the BSim
plug-in incompatible with previous versions, but the lsh plug-in jars may work without change
across later Elasticsearch versions.

Elasticsearch plug-ins explicitly encode the version of Elasticsearch they work with, and the
plug-in script will refuse to install the lsh plug-in if its version does not match your
particular installation. If your Elasticsearch version is slightly different, you can try
unpacking the zip file, changing the version number to match your software, and then repacking
the zip file. Within the zip archive, the version number is stored in the configuration file

  elasticsearch/plugin-descriptor.properties

The file format is fairly simple: edit the following line so that it names your
Elasticsearch version

  elasticsearch.version=7.17.4

The plug-in may work with other nearby versions, but proceed at your own risk.
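The version edit described in INSTALL.txt can also be scripted. What follows is only a minimal sketch, assuming the text of plugin-descriptor.properties has already been read from the unpacked zip. The class name DescriptorVersionEdit, the method withElasticsearchVersion, and the target version 7.17.5 are made up for illustration and are not part of Ghidra or Elasticsearch. Unpacking and repacking the zip are left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Ghidra or Elasticsearch.
final class DescriptorVersionEdit {

    // Matches the whole "elasticsearch.version=..." line in multi-line text.
    private static final Pattern VERSION_LINE =
        Pattern.compile("(?m)^elasticsearch\\.version=.*$");

    /** Returns the descriptor text with the elasticsearch.version line replaced. */
    static String withElasticsearchVersion(String descriptorText, String newVersion) {
        Matcher matcher = VERSION_LINE.matcher(descriptorText);
        if (!matcher.find()) {
            throw new IllegalArgumentException("no elasticsearch.version line found");
        }
        return matcher.replaceFirst(
            Matcher.quoteReplacement("elasticsearch.version=" + newVersion));
    }

    public static void main(String[] args) {
        // 7.17.5 is an arbitrary example of a "nearby" version
        String original = "name=lsh\nelasticsearch.version=7.17.4\n";
        System.out.print(withElasticsearchVersion(original, "7.17.5"));
    }
}
```

All other descriptor lines pass through unchanged, so the edited text can be written back in place before the archive is repacked.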
0
Ghidra/Extensions/BSimElasticPlugin/Module.manifest
Executable file
99
Ghidra/Extensions/BSimElasticPlugin/build.gradle
Executable file
@@ -0,0 +1,99 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
apply from: "$rootProject.projectDir/gradle/distributableGhidraExtension.gradle"
apply from: "$rootProject.projectDir/gradle/javaProject.gradle"
apply plugin: 'eclipse'
eclipse.project.name = 'Xtra BSimElasticPlugin'
// This module is very different from other Ghidra modules. It is creating a stand-alone jar
// file for an elastic database plugin. It is copying files from other modules into this module
// before building a jar file from the files in this module and the cherry-picked files from
// other modules (This is very brittle and will break if any of the files are renamed or moved.)
project.ext.includeExtensionInInstallation = true

apply plugin: 'java'

sourceSets {
    elasticPlugin {
        java {
            srcDirs = [ 'src', 'srcdummy', 'build/genericSrc', 'build/utilitySrc', 'build/bsimSrc' ]
        }
    }
}
// this dependency block is needed for this code to compile in our eclipse environment. It is not needed
// for the gradle build
dependencies {

    implementation project(':BSim')
}
libsDirName='ziplayout'

task copyGenericTask(type: Copy) {
    from project(':Generic').file('src/main/java')
    into 'build/genericSrc'
    include 'generic/lsh/vector/*.java'
    include 'generic/hash/SimpleCRC32.java'
    include 'ghidra/util/xml/SpecXmlUtils.java'
}

task copyUtilityTask(type: Copy) {
    from project(':Utility').file('src/main/java')
    into 'build/utilitySrc'
    include 'ghidra/xml/XmlPullParser.java'
    include 'ghidra/xml/XmlElement.java'
}

task copyBSimTask(type: Copy) {
    from project(':BSim').file('src/main/java')
    into 'build/bsimSrc'
    include 'ghidra/features/bsim/query/elastic/ElasticUtilities.java'
    include 'ghidra/features/bsim/query/elastic/Base64Lite.java'
    include 'ghidra/features/bsim/query/elastic/Base64VectorFactory.java'
}

task copyPropertiesFile(type: Copy) {
    from 'contribZipExclude/plugin-descriptor.properties'
    into 'build/ziplayout'
}

task elasticPluginJar(type: Jar) {
    from sourceSets.elasticPlugin.output
    archiveBaseName = 'lsh'
    excludes = [
        '**/org/apache',
        '**/org/elasticsearch/common',
        '**/org/elasticsearch/env',
        '**/org/elasticsearch/index',
        '**/org/elasticsearch/indices',
        '**/org/elasticsearch/plugins',
        '**/org/elasticsearch/script',
        '**/org/elasticsearch/search'
    ]
}

task elasticPluginZip(type: Zip) {
    from 'build/ziplayout'
    archiveBaseName = 'lsh'
    destinationDirectory = file("build/data")
}

compileElasticPluginJava.dependsOn copyGenericTask
compileElasticPluginJava.dependsOn copyUtilityTask
compileElasticPluginJava.dependsOn copyBSimTask

elasticPluginZip.dependsOn elasticPluginJar
elasticPluginZip.dependsOn copyPropertiesFile

jar.dependsOn elasticPluginZip
6
Ghidra/Extensions/BSimElasticPlugin/certification.manifest
Executable file
@@ -0,0 +1,6 @@
##VERSION: 2.0
##MODULE IP: Apache License 2.0
INSTALL.txt||GHIDRA||||END|
Module.manifest||GHIDRA||reviewed||END|
contribZipExclude/plugin-descriptor.properties||GHIDRA||||END|
extension.properties||GHIDRA||||END|
@@ -0,0 +1,6 @@
description=Feature Vector Plugin
version=1.0
name=lsh
classname=org.elasticsearch.plugin.analysis.lsh.AnalysisLSHPlugin
java.version=1.11
elasticsearch.version=8.8.1
5
Ghidra/Extensions/BSimElasticPlugin/extension.properties
Executable file
@@ -0,0 +1,5 @@
name=BSimElasticPlugin
description=Elastic search backend for BSim.
author=Ghidra Team
createdOn=11/23/20
version=@extversion@
@@ -0,0 +1,134 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.elasticsearch.plugin.analysis.lsh;

import java.io.IOException;
import java.util.*;

import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.IndexModule;
import org.elasticsearch.index.IndexSettings;
import org.elasticsearch.index.analysis.TokenizerFactory;
import org.elasticsearch.indices.analysis.AnalysisModule.AnalysisProvider;
import org.elasticsearch.plugins.*;
import org.elasticsearch.script.ScriptContext;
import org.elasticsearch.script.ScriptEngine;

import generic.lsh.vector.IDFLookup;
import generic.lsh.vector.WeightFactory;
import ghidra.features.bsim.query.elastic.Base64VectorFactory;
import ghidra.features.bsim.query.elastic.ElasticUtilities;

public class AnalysisLSHPlugin extends Plugin implements AnalysisPlugin, ScriptPlugin {

    public static final String TOKENIZER_SETTINGS_BASE = "index.analysis.tokenizer.lsh_";
    public static String settingString = "";

    static private Map<String, Base64VectorFactory> vecFactoryMap = new HashMap<>();
    private Map<String, AnalysisProvider<TokenizerFactory>> tokFactoryMap;

    public class TokenizerFactoryProvider implements AnalysisProvider<TokenizerFactory> {

        @Override
        public TokenizerFactory get(IndexSettings indexSettings, Environment env, String name,
                Settings settings) throws IOException {
            // settingString = settingString + " : " + indexSettings.getIndex().getName() + '(' + name + ')';
            return new LSHTokenizerFactory(indexSettings, env, name, settings);
        }
    }

    public AnalysisLSHPlugin() {
        TokenizerFactoryProvider provider = new TokenizerFactoryProvider();
        tokFactoryMap = Collections.singletonMap("lsh_tokenizer", provider);
    }

    private static void setupVectorFactory(String name, String idfConfig, String lshWeights) {
        WeightFactory weightFactory = new WeightFactory();
        String[] split = lshWeights.split(" ");
        double[] weightArray = new double[split.length];
        for (int i = 0; i < weightArray.length; ++i) {
            weightArray[i] = Double.parseDouble(split[i]);
        }
        weightFactory.set(weightArray);
        IDFLookup idfLookup = new IDFLookup();
        split = idfConfig.split(" ");
        int[] intArray = new int[split.length];
        for (int i = 0; i < intArray.length; ++i) {
            intArray[i] = Integer.parseInt(split[i]);
        }
        idfLookup.set(intArray);
        Base64VectorFactory vectorFactory = new Base64VectorFactory();
        // Server-side factory is never used to generate signatures,
        // so we don't need to specify settings
        vectorFactory.set(weightFactory, idfLookup, 0);
        vecFactoryMap.put(name, vectorFactory);
    }

    /**
     * Entry point for Tokenizer and Script factories to grab the global vector factory
     * @param name is the name of the tokenizer
     * @return the vector factory used by the tokenizer
     */
    public static Base64VectorFactory getVectorFactory(String name) {
        return vecFactoryMap.get(name);
    }

    @Override
    public void onIndexModule(IndexModule indexModule) {
        super.onIndexModule(indexModule);

        Settings settings = indexModule.getSettings();
        String name = null;
        // Look for the specific kind of tokenizer settings, within the global settings for the index
        for (String key : settings.keySet()) {
            if (key.startsWith(TOKENIZER_SETTINGS_BASE)) {
                // We can have different settings for different indices, distinguished by this name
                int pos = key.indexOf('.', TOKENIZER_SETTINGS_BASE.length() + 1);
                if (pos > 0) {
                    name = key.substring(TOKENIZER_SETTINGS_BASE.length(), pos);
                    break;
                }
            }
        }
        if (name != null) {
            String tokenizerName = "lsh_" + name;
            if (getVectorFactory(tokenizerName) != null) {
                return; // Factory already exists
            }
            settingString = settingString + " : onModule(" + name + ')';
            // If we found LSH tokenizer settings, pull them out and construct an LSHVectorFactory with them
            String baseKey = TOKENIZER_SETTINGS_BASE + name + '.';
            String idfConfig = settings.get(baseKey + ElasticUtilities.IDF_CONFIG);
            String lshWeights = settings.get(baseKey + ElasticUtilities.LSH_WEIGHTS);
            if (idfConfig == null || lshWeights == null) {
                return; // IDF_CONFIG and LSH_WEIGHTS settings must be present to proceed
            }
            setupVectorFactory(tokenizerName, idfConfig, lshWeights);
        }
    }

    @Override
    public ScriptEngine getScriptEngine(Settings settings, Collection<ScriptContext<?>> contexts) {
        return new BSimScriptEngine();
    }

    @Override
    public Map<String, AnalysisProvider<TokenizerFactory>> getTokenizers() {
        return tokFactoryMap;
    }

}
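The settings handling in onIndexModule and setupVectorFactory can be tried in isolation. The sketch below is a stand-alone approximation that works on plain strings instead of the Elasticsearch Settings object. The class LshSettingsSketch, the index name "repo", and the key suffix "some_setting" are invented for illustration; the real suffixes come from ElasticUtilities.IDF_CONFIG and ElasticUtilities.LSH_WEIGHTS, whose values are not shown in this commit excerpt.

```java
// Stand-alone sketch; the plug-in itself reads these values from Elasticsearch Settings.
final class LshSettingsSketch {

    static final String TOKENIZER_SETTINGS_BASE = "index.analysis.tokenizer.lsh_";

    /** Returns the text between the "lsh_" prefix and the next '.', or null if absent. */
    static String tokenizerName(String key) {
        if (!key.startsWith(TOKENIZER_SETTINGS_BASE)) {
            return null;
        }
        int pos = key.indexOf('.', TOKENIZER_SETTINGS_BASE.length() + 1);
        return pos > 0 ? key.substring(TOKENIZER_SETTINGS_BASE.length(), pos) : null;
    }

    /** Parses a space-separated list such as "0.5 1.0 2.0" into doubles. */
    static double[] parseWeights(String text) {
        String[] split = text.split(" ");
        double[] weights = new double[split.length];
        for (int i = 0; i < split.length; ++i) {
            weights[i] = Double.parseDouble(split[i]);
        }
        return weights;
    }

    public static void main(String[] args) {
        // "repo" and "some_setting" are made-up names used only for this demonstration
        System.out.println(tokenizerName("index.analysis.tokenizer.lsh_repo.some_setting"));
        System.out.println(parseWeights("0.5 1.0 2.0").length);
    }
}
```

As in the plug-in, a malformed number makes Double.parseDouble throw a NumberFormatException, so the settings strings are expected to be well formed.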
@@ -0,0 +1,54 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.elasticsearch.plugin.analysis.lsh;

import java.util.*;

import org.elasticsearch.script.*;

public class BSimScriptEngine implements ScriptEngine {
    private final static String ENGINE_NAME = "bsim_scripts";

    @Override
    public <FactoryType> FactoryType compile(String scriptName, String scriptSource,
            ScriptContext<FactoryType> context, Map<String, String> params) {
        // Only score scripts are supported by this engine
        if (context.equals(ScoreScript.CONTEXT) == false) {
            throw new IllegalArgumentException(
                getType() + " scripts cannot be used for context [" + context.name + "]");
        }
        // The script source is treated as the name of a built-in script
        if (VectorCompareScriptFactory.SCRIPT_NAME.equals(scriptSource)) {
            ScoreScript.Factory factory = new VectorCompareScriptFactory();
            return context.factoryClazz.cast(factory);
        }
        throw new IllegalArgumentException("Unknown script name " + scriptSource);
    }

    @Override
    public void close() {
        // Can free up resources
    }

    @Override
    public Set<ScriptContext<?>> getSupportedContexts() {
        return Collections.singleton(ScoreScript.CONTEXT);
    }

    @Override
    public String getType() {
        return ENGINE_NAME;
    }

}
@@ -0,0 +1,293 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.elasticsearch.plugin.analysis.lsh;

import generic.lsh.vector.HashEntry;
import ghidra.features.bsim.query.elastic.Base64Lite;

/**
 * Class for calculating the bin ids on LSHVectors as part of the LSH indexing process
 *
 */
public class LSHBinner {

    private static final char[] hashSignTable = new char[512];
    private static int VEC_SIZE_UPPER = 5; // Size above which to use FFT to calculate dotproduct family
    private static int LSH_HASHBASE = 0xd7e6a299;
    private static int HASH_MULTIPLIER = 1103515245;
    private static int HASH_ADDEND = 12345;

    public static class BytesRef {
        public char[] buffer;
        public BytesRef(int size) { buffer = new char[size]; }
    }

    private int k;                  // Number of bits per bin id
    private int L;                  // Number of binnings
    private double doubleBuffer[];  // Scratch space for dot-product calculation
    private BytesRef tokenList[];   // Final token list used by lucene

    static {
        /**
         * This is a precalculated table for generating dot-products with the random family of vectors directly
         * The first vector r_0 is expressed as a hashing function on the dimension index and the other vectors
         * are derived from r_0 using an FFT.  The table is formed by precalculating the FFT on basis vectors in this table
         */
        int i, j;
        int[] arr = new int[16];
        int hibit0ptr;
        int hibit1ptr;

        for (i = 0; i < 16; ++i) { /* For each 4-bit position */
            hibit0ptr = i * 16;
            hibit1ptr = (i + 16) * 16;
            for (j = 0; j < 16; ++j)
                arr[j] = 0;

            arr[i] = 1;
            hashFft16(arr);
            for (j = 0; j < 16; ++j) {
                if (arr[j] > 0) {
                    hashSignTable[hibit0ptr + j] = '+';
                    hashSignTable[hibit1ptr + j] = '-';
                } else {
                    hashSignTable[hibit0ptr + j] = '-';
                    hashSignTable[hibit1ptr + j] = '+';
                }
            }
        }
    }

    /**
     * Raw Fast Fourier Transform on 16 wide integer array
     * @param arr is the 16-long array
     */
    private static void hashFft16(int[] arr) {
        int x,y;

        x = arr[0]; y = arr[8]; arr[0] = x + y; arr[8] = x - y;
        x = arr[1]; y = arr[9]; arr[1] = x + y; arr[9] = x - y;
        x = arr[2]; y = arr[10]; arr[2] = x + y; arr[10] = x - y;
        x = arr[3]; y = arr[11]; arr[3] = x + y; arr[11] = x - y;
        x = arr[4]; y = arr[12]; arr[4] = x + y; arr[12] = x - y;
        x = arr[5]; y = arr[13]; arr[5] = x + y; arr[13] = x - y;
        x = arr[6]; y = arr[14]; arr[6] = x + y; arr[14] = x - y;
        x = arr[7]; y = arr[15]; arr[7] = x + y; arr[15] = x - y;

        x = arr[0]; y = arr[4]; arr[0] = x + y; arr[4] = x - y;
        x = arr[1]; y = arr[5]; arr[1] = x + y; arr[5] = x - y;
        x = arr[2]; y = arr[6]; arr[2] = x + y; arr[6] = x - y;
        x = arr[3]; y = arr[7]; arr[3] = x + y; arr[7] = x - y;
        x = arr[8]; y = arr[12]; arr[8] = x + y; arr[12] = x - y;
        x = arr[9]; y = arr[13]; arr[9] = x + y; arr[13] = x - y;
        x = arr[10]; y = arr[14]; arr[10] = x + y; arr[14] = x - y;
        x = arr[11]; y = arr[15]; arr[11] = x + y; arr[15] = x - y;

        x = arr[0]; y = arr[2]; arr[0] = x + y; arr[2] = x - y;
        x = arr[1]; y = arr[3]; arr[1] = x + y; arr[3] = x - y;
        x = arr[4]; y = arr[6]; arr[4] = x + y; arr[6] = x - y;
        x = arr[5]; y = arr[7]; arr[5] = x + y; arr[7] = x - y;
        x = arr[8]; y = arr[10]; arr[8] = x + y; arr[10] = x - y;
        x = arr[9]; y = arr[11]; arr[9] = x + y; arr[11] = x - y;
        x = arr[12]; y = arr[14]; arr[12] = x + y; arr[14] = x - y;
        x = arr[13]; y = arr[15]; arr[13] = x + y; arr[15] = x - y;

        x = arr[0]; y = arr[1]; arr[0] = x + y; arr[1] = x - y;
        x = arr[2]; y = arr[3]; arr[2] = x + y; arr[3] = x - y;
        x = arr[4]; y = arr[5]; arr[4] = x + y; arr[5] = x - y;
        x = arr[6]; y = arr[7]; arr[6] = x + y; arr[7] = x - y;
        x = arr[8]; y = arr[9]; arr[8] = x + y; arr[9] = x - y;
        x = arr[10]; y = arr[11]; arr[10] = x + y; arr[11] = x - y;
        x = arr[12]; y = arr[13]; arr[12] = x + y; arr[13] = x - y;
        x = arr[14]; y = arr[15]; arr[14] = x + y; arr[15] = x - y;
    }

    /**
     * Raw Fast Fourier Transform on 16 wide array of doubles
     * @param arr is the 16-long array
     */
    private static void hashFft16(double[] arr) {
        double x,y;

        x = arr[0]; y = arr[8]; arr[0] = x + y; arr[8] = x - y;
        x = arr[1]; y = arr[9]; arr[1] = x + y; arr[9] = x - y;
        x = arr[2]; y = arr[10]; arr[2] = x + y; arr[10] = x - y;
        x = arr[3]; y = arr[11]; arr[3] = x + y; arr[11] = x - y;
        x = arr[4]; y = arr[12]; arr[4] = x + y; arr[12] = x - y;
        x = arr[5]; y = arr[13]; arr[5] = x + y; arr[13] = x - y;
        x = arr[6]; y = arr[14]; arr[6] = x + y; arr[14] = x - y;
        x = arr[7]; y = arr[15]; arr[7] = x + y; arr[15] = x - y;

        x = arr[0]; y = arr[4]; arr[0] = x + y; arr[4] = x - y;
        x = arr[1]; y = arr[5]; arr[1] = x + y; arr[5] = x - y;
        x = arr[2]; y = arr[6]; arr[2] = x + y; arr[6] = x - y;
        x = arr[3]; y = arr[7]; arr[3] = x + y; arr[7] = x - y;
        x = arr[8]; y = arr[12]; arr[8] = x + y; arr[12] = x - y;
        x = arr[9]; y = arr[13]; arr[9] = x + y; arr[13] = x - y;
        x = arr[10]; y = arr[14]; arr[10] = x + y; arr[14] = x - y;
        x = arr[11]; y = arr[15]; arr[11] = x + y; arr[15] = x - y;

        x = arr[0]; y = arr[2]; arr[0] = x + y; arr[2] = x - y;
        x = arr[1]; y = arr[3]; arr[1] = x + y; arr[3] = x - y;
        x = arr[4]; y = arr[6]; arr[4] = x + y; arr[6] = x - y;
        x = arr[5]; y = arr[7]; arr[5] = x + y; arr[7] = x - y;
        x = arr[8]; y = arr[10]; arr[8] = x + y; arr[10] = x - y;
        x = arr[9]; y = arr[11]; arr[9] = x + y; arr[11] = x - y;
        x = arr[12]; y = arr[14]; arr[12] = x + y; arr[14] = x - y;
        x = arr[13]; y = arr[15]; arr[13] = x + y; arr[15] = x - y;

        x = arr[0]; y = arr[1]; arr[0] = x + y; arr[1] = x - y;
        x = arr[2]; y = arr[3]; arr[2] = x + y; arr[3] = x - y;
        x = arr[4]; y = arr[5]; arr[4] = x + y; arr[5] = x - y;
        x = arr[6]; y = arr[7]; arr[6] = x + y; arr[7] = x - y;
        x = arr[8]; y = arr[9]; arr[8] = x + y; arr[9] = x - y;
        x = arr[10]; y = arr[11]; arr[10] = x + y; arr[11] = x - y;
        x = arr[12]; y = arr[13]; arr[12] = x + y; arr[13] = x - y;
        x = arr[14]; y = arr[15]; arr[14] = x + y; arr[15] = x - y;
    }

    public LSHBinner() {
        doubleBuffer = new double[16];
        k = -1;
        L = -1;
        tokenList = null;
    }

    public void setKandL(int k, int L) {
        this.k = k;
        this.L = L;
        int numBits = 1;
        while ((1 << numBits) <= L)
            numBits += 1;
        numBits += k;
        int numChar = numBits / 6;
        if ((numBits % 6) != 0)
            numChar += 1;
        tokenList = new BytesRef[L];
        for (int i = 0; i < L; ++i) {
            tokenList[i] = new BytesRef(numChar);
        }
    }

    public BytesRef[] getTokenList() {
        return tokenList;
    }

    /**
     * Generate a dot product of the hash vector in -vec- with a random family of 16 vectors, { r }
     * r_0 is a randomly generated set of +1 -1 coefficients across all the dimensions (indexed by uint32 vec[i].hash)
     * The coefficient is calculated as a hashing function from the seed -hashcur- and the index (vec[i].hash),
     * so it should be balanced between +1 and -1.
     * All the other vectors are generated from an FFT of r_0.  This allows the dotproduct with vec to be calculated
     * using an FFT if -vec- has many non-zero coefficients.  If -vec- has only a few non-zero coefficients,
     * the dotproduct is calculated with each vector in the family directly for better efficiency.
     * The resulting dotproducts are converted into a 16-long bitvector based on the sign of the dotproduct and
     * placed in -bucket-.  The 16-long scratch array -doubleBuffer- holds the in-place FFT.
     * @param bucket is the (possibly partially filled) accumulator for dotproduct bits
     * @param vec is the HashEntry vector to calculate the dot-products on
     * @param hashcur is the index of the hash subfamily representing r_0
     * @return the bucket with new accumulated dot-product bits
     */
    private int hash16DotProduct(int bucket, HashEntry[] vec, int hashcur) {
        int i, j;
        int rowNum;
        int signPtr;

        for (i = 0; i < 16; ++i)
            doubleBuffer[i] = 0.0; // Initialize the dotproduct results to zero

        if (vec.length < VEC_SIZE_UPPER) { // If there are a small number of non-zero coefficients in -vec-
            for (i = 0; i < vec.length; ++i) {
                rowNum = vec[i].getHash() ^ hashcur; // Calculate the rest of the r_0 hashing function
                rowNum = (rowNum * HASH_MULTIPLIER) + HASH_ADDEND;
                rowNum = (rowNum >>> 24) & 0x1f;
                signPtr = rowNum * 16;
                for (j = 0; j < 16; ++j) { // Based on the precalculated coeff table calculate this portion of dotproduct
                    if (hashSignTable[signPtr + j] == '+')
                        doubleBuffer[j] += vec[i].getCoeff(); // Dot product with +1 coefficient
                    else
                        doubleBuffer[j] -= vec[i].getCoeff(); // Dot product with -1 coefficient
                }
            }
        }
        else { // If we have many non-zero coefficients in -vec-
            for (i = 0; i < vec.length; ++i) {
                rowNum = vec[i].getHash() ^ hashcur; // Calculate the rest of the r_0 hashing function
                rowNum = (rowNum * HASH_MULTIPLIER) + HASH_ADDEND;
                rowNum = (rowNum >>> 24) & 0x1f;
                if (rowNum < 0x10) // Set-up for the FFT
                    doubleBuffer[rowNum] += vec[i].getCoeff();
                else
                    doubleBuffer[rowNum & 0xf] -= vec[i].getCoeff();
            }
            hashFft16(doubleBuffer); // Calculate the remaining dot-products by performing the FFT
        }

        for (i = 0; i < 16; ++i) { // Convert the dot-product results to a bit-vector
            bucket <<= 1;
            if (doubleBuffer[i] > 0.0)
                bucket |= 1;
        }
        return bucket;
    }

    public void generateBinIds(HashEntry[] vec) {
        int bucket = 0;
        int bucketcnt = 0;
        int i, bitsleft;
        int curid;
        int mask, val;
        int hashbase = LSH_HASHBASE;

        for (i = 0; i < L; ++i) {
            curid = i; // Tack-on bits that indicate the particular table this bin id belongs to
            bitsleft = k;
            do {
                if (bucketcnt == 0) {
                    hashbase = (hashbase * HASH_MULTIPLIER) + HASH_ADDEND;
                    bucket = hash16DotProduct(bucket, vec, hashbase);
                    bucketcnt += 16;
                }
                if (bucketcnt >= bitsleft) {
                    curid <<= bitsleft;
                    mask = 1;
                    mask = (mask << bitsleft) - 1;
                    val = bucket >>> (bucketcnt - bitsleft);
                    curid |= (val & mask);
                    bucketcnt -= bitsleft;
                    bitsleft = 0;
                } else {
                    curid <<= bucketcnt;
                    mask = 1;
                    mask = (mask << bucketcnt) - 1;
                    curid |= (bucket & mask);
                    bitsleft -= bucketcnt;
                    bucketcnt = 0;
                }
            } while (bitsleft > 0);
            char[] token = tokenList[i].buffer;
            for (int j = 0; j < token.length; ++j) {
                token[j] = Base64Lite.encode[curid & 0x3f]; // encode 6 bits
                curid >>= 6; // move to next 6 bits
            }
        }
    }
}
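The two unrolled hashFft16 overloads in LSHBinner apply four stages of add/subtract butterflies with strides 8, 4, 2, and 1. The sketch below, which is not part of the commit, writes the same stages as a loop; the class name Hadamard16Sketch and its methods are illustrative. With this pattern, transforming the basis vector that has a single 1 at index i yields +1 at position j when i & j has an even number of set bits and -1 otherwise, which is the sign information the static initializer stores in hashSignTable.

```java
import java.util.Arrays;

// Loop-based sketch of the butterfly pattern in hashFft16; not part of the commit.
final class Hadamard16Sketch {

    /** In-place 16-point transform using strides 8, 4, 2, 1 as in the unrolled code. */
    static void transform(int[] arr) {
        for (int stride = 8; stride >= 1; stride >>= 1) {
            for (int i = 0; i < 16; ++i) {
                if ((i & stride) == 0) { // i is the lower index of a butterfly pair
                    int x = arr[i];
                    int y = arr[i + stride];
                    arr[i] = x + y;
                    arr[i + stride] = x - y;
                }
            }
        }
    }

    /** Transforms the basis vector with a single 1 at the given index. */
    static int[] transformBasis(int index) {
        int[] arr = new int[16];
        arr[index] = 1;
        transform(arr);
        return arr;
    }

    public static void main(String[] args) {
        // Basis vector 1 alternates +1 / -1 across even and odd positions
        System.out.println(Arrays.toString(transformBasis(1)));
    }
}
```

The unrolled form in the commit avoids loop and branch overhead, which is a plausible reason to keep it that way; the loop version is shown only to make the structure easier to read.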
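The token sizing in setKandL and the final packing loop in generateBinIds can be checked with small numbers. The following sketch mirrors both calculations under one loudly stated assumption: the contents of Base64Lite.encode are not shown in this excerpt, so the 64-character ALPHABET below is a stand-in (the conventional Base64 alphabet) and may differ from the real table. The class name BinIdTokenSketch is likewise invented.

```java
// Illustrative only; ALPHABET is an assumption, not the actual Base64Lite.encode table.
final class BinIdTokenSketch {

    // Assumed 64-character alphabet standing in for Base64Lite.encode
    static final char[] ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();

    /** Characters per token: table-index bits plus k hash bits, at 6 bits per character. */
    static int tokenChars(int k, int L) {
        int numBits = 1;
        while ((1 << numBits) <= L) {
            numBits += 1;
        }
        numBits += k;
        return (numBits + 5) / 6; // same as dividing by 6 and rounding up
    }

    /** Emits the low 6 bits first and then shifts, in the same order as generateBinIds. */
    static String encode(int binId, int numChars) {
        char[] token = new char[numChars];
        for (int j = 0; j < numChars; ++j) {
            token[j] = ALPHABET[binId & 0x3f];
            binId >>= 6;
        }
        return new String(token);
    }

    public static void main(String[] args) {
        int chars = tokenChars(10, 4); // 3 index bits + 10 hash bits = 13 bits
        System.out.println(chars + " " + encode(65, chars));
    }
}
```

Because the least significant 6 bits are written first, the character order of a token is the reverse of the usual most-significant-first reading of the bin id.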
@ -0,0 +1,68 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
package org.elasticsearch.plugin.analysis.lsh;
|
||||
|
||||
import java.io.IOException;
|
||||
|
||||
import org.apache.lucene.analysis.Tokenizer;
|
||||
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
|
||||
import org.elasticsearch.plugin.analysis.lsh.LSHBinner.BytesRef;
|
||||
|
||||
import generic.lsh.vector.LSHVector;
|
||||
import ghidra.features.bsim.query.elastic.Base64VectorFactory;
|
||||
|
||||
public class LSHTokenizer extends Tokenizer {
|
||||
private final CharTermAttribute bytesAtt = addAttribute(CharTermAttribute.class);
|
||||
private BytesRef[] tokens;
|
||||
private int pos; // Number of terms/tokens returned so far
|
||||
private Base64VectorFactory vectorFactory;
|
||||
private LSHBinner binner;
|
||||
private char[] vecBuffer;
|
||||
|
||||
public LSHTokenizer(int k,int L,Base64VectorFactory vFactory) {
|
||||
super(DEFAULT_TOKEN_ATTRIBUTE_FACTORY);
|
||||
vectorFactory = vFactory;
|
||||
binner = new LSHBinner();
|
||||
binner.setKandL(k, L);
|
||||
pos = -1;
|
||||
vecBuffer = Base64VectorFactory.allocateBuffer();
|
||||
}
|
||||
|
||||
@Override
|
||||
public boolean incrementToken() throws IOException {
|
||||
clearAttributes();
|
||||
if (pos < 0) {
|
||||
LSHVector vector = vectorFactory.restoreVectorFromBase64(input,vecBuffer);
|
||||
// AnalysisLSHPlugin.settingString = AnalysisLSHPlugin.settingString + " : " + Long.toHexString(vector.calcUniqueHash());
|
||||
binner.generateBinIds(vector.getEntries());
|
||||
tokens = binner.getTokenList();
|
||||
pos = 0;
|
||||
}
|
||||
if (pos < tokens.length) {
|
||||
char[] buffer = tokens[pos].buffer;
|
||||
bytesAtt.copyBuffer(buffer,0,buffer.length);
|
||||
pos += 1;
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
@Override
|
||||
public void reset() throws IOException {
|
||||
super.reset();
|
||||
pos = -1;
|
||||
}
|
||||
}
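The tokenizer above defers all work until the first incrementToken() call and uses pos = -1 as a "not yet computed" sentinel that reset() restores. A minimal standalone sketch of that pattern follows. It is illustrative only: LazyTokenStreamSketch, its Supplier argument, and current() are hypothetical names that stand in for the Lucene Tokenizer, the LSHBinner, and the CharTermAttribute.

```java
import java.util.function.Supplier;

public class LazyTokenStreamSketch {
    private final Supplier<String[]> source; // stands in for vector decoding + binning
    private String[] tokens;
    private int pos = -1;                    // -1 means tokens are not computed yet
    private String current;

    public LazyTokenStreamSketch(Supplier<String[]> source) {
        this.source = source;
    }

    /** Advance to the next token; returns false once all tokens are consumed. */
    public boolean incrementToken() {
        if (pos < 0) {               // first call since construction or reset()
            tokens = source.get();
            pos = 0;
        }
        if (pos < tokens.length) {
            current = tokens[pos];
            pos += 1;
            return true;
        }
        return false;
    }

    public String current() {
        return current;
    }

    /** Forget the computed tokens so the next call recomputes them. */
    public void reset() {
        pos = -1;
    }
}
```

A caller loops on incrementToken() until it returns false, then calls reset() before reusing the instance for new input.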
@ -0,0 +1,44 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.elasticsearch.plugin.analysis.lsh;

import org.apache.lucene.analysis.Tokenizer;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.IndexSettings;
import org.elasticsearch.index.analysis.AbstractTokenizerFactory;

import ghidra.features.bsim.query.elastic.Base64VectorFactory;
import ghidra.features.bsim.query.elastic.ElasticUtilities;

public class LSHTokenizerFactory extends AbstractTokenizerFactory {

    private Base64VectorFactory vectorFactory;
    private int k;
    private int L;

    public LSHTokenizerFactory(IndexSettings indexSettings, Environment environment, String name, Settings settings) {
        super(indexSettings, settings, name);
        k = settings.getAsInt(ElasticUtilities.K_SETTING, -1);
        L = settings.getAsInt(ElasticUtilities.L_SETTING, -1);
        vectorFactory = AnalysisLSHPlugin.getVectorFactory(name);
    }

    @Override
    public Tokenizer create() {
        return new LSHTokenizer(k,L,vectorFactory);
    }
}
@ -0,0 +1,147 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.elasticsearch.plugin.analysis.lsh;

import java.io.*;
import java.util.Map;

import org.apache.lucene.document.Document;
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.script.*;
import org.elasticsearch.script.ScoreScript.LeafFactory;
import org.elasticsearch.search.lookup.SearchLookup;

import generic.lsh.vector.LSHVector;
import generic.lsh.vector.VectorCompare;
import ghidra.features.bsim.query.elastic.Base64VectorFactory;

public class VectorCompareScriptFactory implements ScoreScript.Factory {

    public final static String SCRIPT_NAME = "lsh_compare";
    public final static String FEATURES_NAME = "{\"features\":\"";

    @Override
    public boolean isResultDeterministic() {
        return true;
    }

    @Override
    public LeafFactory newFactory(Map<String, Object> params, SearchLookup lookup) {
        return new VectorCompareLeafFactory(params, lookup);
    }

    private static class VectorCompareLeafFactory implements LeafFactory {

        private final Map<String, Object> params;
        private final SearchLookup lookup;
        private LSHVector baseVector; // Vector being compared to everything
        private final double simthresh; // Similarity threshold
        private final double sigthresh; // Significance threshold
        private final Base64VectorFactory vectorFactory; // Factory used for this particular query

        private VectorCompareLeafFactory(Map<String, Object> params, SearchLookup lookup) {
            this.params = params;
            this.lookup = lookup;
            vectorFactory = AnalysisLSHPlugin.getVectorFactory((String) params.get("indexname"));
            simthresh = (Double) params.get("simthresh");
            sigthresh = (Double) params.get("sigthresh");
            StringReader reader = new StringReader((String) params.get("vector"));
            try {
                baseVector = vectorFactory.restoreVectorFromBase64(reader,
                    Base64VectorFactory.allocateBuffer());
            }
            catch (IOException e) {
                baseVector = null;
            }
        }

        @Override
        public boolean needs_score() {
            return false;
        }

        private static int scanForFeatures(byte[] buffer, int offset) throws IOException {
            int i = 0;
            while (i < FEATURES_NAME.length()) {
                char curChar = FEATURES_NAME.charAt(i);
                int val = buffer[offset];
                if (val == curChar) {
                    i += 1;
                    offset += 1;
                }
                else if (val == ' ' || val == '\t') {
                    offset += 1;
                }
                else {
                    throw new IOException("Document is missing \"features\"");
                }
            }
            return offset;
        }

        private static int scanForLength(BytesRef byteRef, int startOffset) throws IOException {
            int finalLength = 0;
            int maxLength = byteRef.length - (startOffset - byteRef.offset);
            while (finalLength < maxLength) {
                if (byteRef.bytes[finalLength + startOffset] == '\"') {
                    break;
                }
                finalLength += 1;
            }
            // The loop runs out at maxLength (not byteRef.length) when no closing quote is found
            if (finalLength == maxLength) {
                throw new IOException("Document does not contain complete \"features\"");
            }
            return finalLength;
        }

        @Override
        public ScoreScript newInstance(DocReader docReader) throws IOException {
            return new ScoreScript(params, lookup, docReader) {
                @Override
                public double execute(ExplanationHolder explanation) {
                    try {
                        DocValuesDocReader dvReader = (DocValuesDocReader) docReader;
                        Document document =
                            dvReader.getLeafReaderContext().reader().document(_getDocId());
                        BytesRef byteRef = document.getField("_source").binaryValue();
                        int valOffset = scanForFeatures(byteRef.bytes, byteRef.offset);
                        int finalLength = scanForLength(byteRef, valOffset);
                        InputStream inputStream =
                            new ByteArrayInputStream(byteRef.bytes, valOffset, finalLength);
                        Reader reader = new InputStreamReader(inputStream);
                        // Should be sharing the VectorCompare between different calls
                        // but apparently this routine needs to be thread safe, so we allocate it per call
                        VectorCompare vectorCompare = new VectorCompare();
                        LSHVector curVec = vectorFactory.restoreVectorFromBase64(reader,
                            Base64VectorFactory.allocateBuffer());
                        double sim = baseVector.compare(curVec, vectorCompare);
                        if (sim <= simthresh) {
                            return 0.0;
                        }
                        double sig = vectorFactory.calculateSignificance(vectorCompare);
                        if (sig <= sigthresh) {
                            return 0.0;
                        }
                        return sim;
                    }
                    catch (IOException e) {
                        return 0.0;
                    }
                }
            };
        }
    }
}
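The score script locates the Base64 feature vector by scanning the raw _source bytes for the {"features":" prefix and then for the closing quote. A self-contained sketch of that scan is shown below. It is an illustration under stated assumptions, not the plug-in's code: FeatureScanSketch and extractFeatures are hypothetical names, the document is assumed to be ASCII with "features" as its first key, and the sketch returns null instead of throwing IOException. Both scans are bounded by the end of the valid byte region.

```java
import java.nio.charset.StandardCharsets;

public class FeatureScanSketch {
    private static final String PREFIX = "{\"features\":\"";

    /**
     * Return the string value of a leading "features" key, or null if the
     * prefix is missing or the value has no closing quote within the region
     * [offset, offset + length).
     */
    public static String extractFeatures(byte[] bytes, int offset, int length) {
        int end = offset + length;
        int pos = offset;
        int i = 0;
        while (i < PREFIX.length()) {
            if (pos >= end) {
                return null;                 // ran out of data inside the prefix
            }
            int val = bytes[pos];
            if (val == PREFIX.charAt(i)) {
                i += 1;
                pos += 1;
            }
            else if (val == ' ' || val == '\t') {
                pos += 1;                    // tolerate whitespace between JSON tokens
            }
            else {
                return null;                 // some other key comes first
            }
        }
        int start = pos;
        while (pos < end && bytes[pos] != '"') {
            pos += 1;
        }
        if (pos == end) {
            return null;                     // no closing quote found
        }
        return new String(bytes, start, pos - start, StandardCharsets.US_ASCII);
    }
}
```

Scanning the bytes directly avoids parsing the whole JSON document for every scored hit, at the cost of depending on the field order of the stored source.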
@ -0,0 +1,29 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for lucene class
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.lucene.analysis;

import java.io.Closeable;
import java.io.IOException;

import org.apache.lucene.util.AttributeFactory;
import org.apache.lucene.util.AttributeSource;

public abstract class TokenStream extends AttributeSource implements Closeable {
    public static final AttributeFactory DEFAULT_TOKEN_ATTRIBUTE_FACTORY = null;

    public abstract boolean incrementToken() throws IOException;
}
@ -0,0 +1,38 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for lucene class
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.lucene.analysis;

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.util.AttributeFactory;

public abstract class Tokenizer extends TokenStream {
    protected Reader input;

    protected Tokenizer(AttributeFactory factory) {

    }

    @Override
    public void close() throws IOException {
    }

    public void reset() throws IOException {
    }

}
@ -0,0 +1,25 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for lucene interface
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.lucene.analysis.tokenattributes;

import org.apache.lucene.util.Attribute;

public interface CharTermAttribute extends Attribute, CharSequence, Appendable {

    public void copyBuffer(char[] buffer, int offset, int length);

}
@ -0,0 +1,26 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for lucene class
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.lucene.document;

import org.apache.lucene.index.IndexableField;

public class Document {
    public final IndexableField getField(String name) {
        return null;
    }

}
@ -0,0 +1,27 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.lucene.index;

import java.io.Closeable;
import java.io.IOException;

import org.apache.lucene.document.Document;

public abstract class IndexReader implements Closeable {
    public final Document document(int docID) throws IOException {
        return null;
    }
}
@ -0,0 +1,21 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.lucene.index;

public abstract class IndexReaderContext {
    public abstract IndexReader reader();

}
@ -0,0 +1,23 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for lucene interface
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.lucene.index;

import org.apache.lucene.util.BytesRef;

public interface IndexableField {
    public BytesRef binaryValue();
}
@ -0,0 +1,21 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for lucene class
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.lucene.index;

public abstract class LeafReader extends IndexReader {

}
@ -0,0 +1,24 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for lucene class
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.lucene.index;

public final class LeafReaderContext extends IndexReaderContext {
    @Override
    public LeafReader reader() {
        return null;
    }
}
@ -0,0 +1,21 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for lucene interface
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.lucene.util;

public interface Attribute {

}
@ -0,0 +1,20 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.lucene.util;

public abstract class AttributeFactory {

}
@ -0,0 +1,27 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for lucene class
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.lucene.util;

public class AttributeSource {
    public final <T extends Attribute> T addAttribute(Class<T> attClass) {
        return null;
    }

    public final void clearAttributes() {

    }
}
@ -0,0 +1,23 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for lucene class
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.lucene.util;

public class BytesRef {
    public byte[] bytes;
    public int length;
    public int offset;
}
@ -0,0 +1,34 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for elasticsearch class
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.elasticsearch.common.settings;

import java.util.Set;

public class Settings {

    public Integer getAsInt(String setting, Integer defaultValue) {
        return null;
    }

    public String get(String setting) {
        return null;
    }

    public Set<String> keySet() {
        return null;
    }
}
21
Ghidra/Extensions/BSimElasticPlugin/srcdummy/org/elasticsearch/env/Environment.java
vendored
Normal file
@ -0,0 +1,21 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for elasticsearch class
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.elasticsearch.env;

public class Environment {

}
@ -0,0 +1,26 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for elasticsearch class
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.elasticsearch.index;

import org.elasticsearch.common.settings.Settings;

public class IndexModule {

    public Settings getSettings() {
        return null;
    }
}
@ -0,0 +1,21 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for elasticsearch class
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.elasticsearch.index;

public final class IndexSettings {

}
@ -0,0 +1,27 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for elasticsearch class
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.elasticsearch.index.analysis;

import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.index.IndexSettings;

public abstract class AbstractTokenizerFactory implements TokenizerFactory {

    public AbstractTokenizerFactory(IndexSettings indexSettings, Settings settings, String name) {

    }
}
@ -0,0 +1,24 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for elasticsearch interface
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.elasticsearch.index.analysis;

import org.apache.lucene.analysis.Tokenizer;

public interface TokenizerFactory {
    Tokenizer create();

}
@ -0,0 +1,31 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for elasticsearch class
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.elasticsearch.indices.analysis;

import java.io.IOException;

import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.IndexSettings;

public class AnalysisModule {

    public interface AnalysisProvider<T> {
        T get(IndexSettings indexSettings, Environment environment, String name, Settings settings)
            throws IOException;
    }
}
@ -0,0 +1,27 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for elasticsearch interface
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.elasticsearch.plugins;

import java.util.Map;

import org.elasticsearch.index.analysis.TokenizerFactory;
import org.elasticsearch.indices.analysis.AnalysisModule.AnalysisProvider;

public interface AnalysisPlugin {
    Map<String, AnalysisProvider<TokenizerFactory>> getTokenizers();

}
@ -0,0 +1,32 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for elasticsearch class
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.elasticsearch.plugins;

import java.io.Closeable;
import java.io.IOException;

import org.elasticsearch.index.IndexModule;

public abstract class Plugin implements Closeable {
    public void onIndexModule(IndexModule indexModule) {
    }

    @Override
    public void close() throws IOException {

    }
}
@ -0,0 +1,28 @@
/* ###
 * IP: GHIDRA
 * NOTE: Dummy placeholder for elasticsearch interface
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.elasticsearch.plugins;

import java.util.Collection;

import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.script.ScriptContext;
import org.elasticsearch.script.ScriptEngine;

public interface ScriptPlugin {
    ScriptEngine getScriptEngine(Settings settings, Collection<ScriptContext<?>> contexts);

}
|
@ -0,0 +1,21 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
* NOTE: Dummy placeholder for elasticsearch interface
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
package org.elasticsearch.script;
|
||||
|
||||
public interface DocReader {
|
||||
|
||||
}
|
@ -0,0 +1,28 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
* NOTE: Dummy placeholder for elasticsearch class
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
package org.elasticsearch.script;
|
||||
|
||||
import org.apache.lucene.index.LeafReaderContext;
|
||||
|
||||
public class DocValuesDocReader implements DocReader, LeafReaderContextSupplier {
|
||||
|
||||
@Override
|
||||
public LeafReaderContext getLeafReaderContext() {
|
||||
return null;
|
||||
}
|
||||
|
||||
}
|
@ -0,0 +1,23 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
* NOTE: Dummy placeholder for elasticsearch interface
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
package org.elasticsearch.script;
|
||||
|
||||
import org.apache.lucene.index.LeafReaderContext;
|
||||
|
||||
public interface LeafReaderContextSupplier {
|
||||
LeafReaderContext getLeafReaderContext();
|
||||
}
|
@ -0,0 +1,50 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
* NOTE: Dummy placeholder for elasticsearch class
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
package org.elasticsearch.script;
|
||||
|
||||
import java.io.IOException;
|
||||
import java.util.Map;
|
||||
|
||||
import org.elasticsearch.search.lookup.SearchLookup;
|
||||
|
||||
public abstract class ScoreScript {
|
||||
public ScoreScript(Map<String, Object> params, SearchLookup searchLookup, DocReader docReader) {
|
||||
|
||||
}
|
||||
|
||||
public static class ExplanationHolder {
|
||||
|
||||
}
|
||||
|
||||
public static final ScriptContext<ScoreScript.Factory> CONTEXT = null;
|
||||
|
||||
public interface Factory extends ScriptFactory {
|
||||
LeafFactory newFactory(Map<String, Object> params, SearchLookup lookup);
|
||||
}
|
||||
|
||||
public interface LeafFactory {
|
||||
boolean needs_score();
|
||||
|
||||
ScoreScript newInstance(DocReader reader) throws IOException;
|
||||
}
|
||||
|
||||
public int _getDocId() {
|
||||
return 0;
|
||||
}
|
||||
|
||||
public abstract double execute(ExplanationHolder explanation);
|
||||
}
|
@ -0,0 +1,22 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
* NOTE: Dummy placeholder for elasticsearch class
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
package org.elasticsearch.script;
|
||||
|
||||
public final class ScriptContext<T> {
|
||||
public final String name = null;
|
||||
public final Class<T> factoryClazz = null;
|
||||
}
|
@ -0,0 +1,30 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
* NOTE: Dummy placeholder for elasticsearch interface
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
package org.elasticsearch.script;
|
||||
|
||||
import java.io.Closeable;
|
||||
import java.util.Map;
|
||||
import java.util.Set;
|
||||
|
||||
public interface ScriptEngine extends Closeable {
|
||||
String getType();
|
||||
|
||||
<FactoryType> FactoryType compile(String name, String code, ScriptContext<FactoryType> context,
|
||||
Map<String, String> params);
|
||||
|
||||
Set<ScriptContext<?>> getSupportedContexts();
|
||||
}
|
@ -0,0 +1,22 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
* NOTE: Dummy placeholder for elasticsearch class
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
package org.elasticsearch.script;
|
||||
|
||||
public interface ScriptFactory {
|
||||
boolean isResultDeterministic();
|
||||
|
||||
}
|
@ -0,0 +1,21 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
* NOTE: Dummy placeholder for elasticsearch class
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
package org.elasticsearch.search.lookup;
|
||||
|
||||
public class SearchLookup {
|
||||
|
||||
}
|
9
Ghidra/Features/BSim/Module.manifest
Executable file
@ -0,0 +1,9 @@
|
||||
##MODULE IP: Oxygen Icons - LGPL 3.0
|
||||
MODULE FILE LICENSE: postgresql-15.3.tar.gz Postgresql License
|
||||
MODULE FILE LICENSE: lib/postgresql-42.6.0.jar PostgresqlJDBC License
|
||||
MODULE FILE LICENSE: lib/json-simple-1.1.1.jar Apache License 2.0
|
||||
MODULE FILE LICENSE: lib/commons-dbcp2-2.9.0.jar Apache License 2.0
|
||||
MODULE FILE LICENSE: lib/commons-pool2-2.11.1.jar Apache License 2.0
|
||||
MODULE FILE LICENSE: lib/commons-logging-1.2.jar Apache License 2.0
|
||||
MODULE FILE LICENSE: lib/log4j-jcl-2.16.0.jar Apache License 2.0
|
||||
MODULE FILE LICENSE: lib/h2-2.2.220.jar H2 Mozilla License 2.0
|
197
Ghidra/Features/BSim/build.gradle
Executable file
@ -0,0 +1,197 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
apply from: "$rootProject.projectDir/gradle/distributableGhidraModule.gradle"
|
||||
apply from: "$rootProject.projectDir/gradle/javaProject.gradle"
|
||||
apply from: "$rootProject.projectDir/gradle/javaTestProject.gradle"
|
||||
apply from: "$rootProject.projectDir/gradle/nativeProject.gradle"
|
||||
apply from: "$rootProject.projectDir/gradle/helpProject.gradle"
|
||||
|
||||
apply plugin: 'eclipse'
|
||||
eclipse.project.name = 'Features BSim'
|
||||
|
||||
import java.nio.file.Files
|
||||
import org.gradle.util.GUtil
|
||||
|
||||
// NOTE: fetchDependencies.gradle must be updated if postgresql version changes
|
||||
def postgresql_distro = "postgresql-15.3.tar.gz"
|
||||
|
||||
dependencies {
|
||||
api project(":Decompiler")
|
||||
api project(":CodeCompare")
|
||||
|
||||
api "org.postgresql:postgresql:42.6.0"
|
||||
api "org.json.simple:json-simple:1.1.1"
|
||||
api "org.apache.commons:commons-dbcp2:2.9.0"
|
||||
api "org.apache.commons:commons-pool2:2.11.1"
|
||||
api "org.apache.commons:commons-logging:1.2"
|
||||
api "org.apache.logging.log4j:log4j-jcl:2.16.0"
|
||||
api "com.h2database:h2:2.2.220"
|
||||
}
|
||||
|
||||
// Copy postgresql source distro, lshvector plugin source, and make-postgres.sh
|
||||
// into common zip to allow for a rebuild of the postgres server if needed
|
||||
|
||||
rootProject.assembleDistribution {
|
||||
|
||||
String postgresqlDepsFile = "${DEPS_DIR}/BSim/${postgresql_distro}"
|
||||
String postgresqlBinRepoFile = "${BIN_REPO}/Ghidra/Features/BSim/${postgresql_distro}"
|
||||
|
||||
def postgresqlFile = file(postgresqlDepsFile).exists() ? postgresqlDepsFile : postgresqlBinRepoFile
|
||||
|
||||
into (getZipPath(this.project)) {
|
||||
from file("make-postgres.sh")
|
||||
}
|
||||
into (getZipPath(this.project)) {
|
||||
from file(postgresqlFile)
|
||||
}
|
||||
into (getZipPath(this.project) + "/src/lshvector") {
|
||||
from files("src/lshvector")
|
||||
}
|
||||
}
|
||||
|
||||
// Relative to the 'workingDir' Exec task property.
|
||||
def installPoint = "../help/help"
|
||||
|
||||
/**
|
||||
* Build the pdf docs for BSim and place into the '$installPoint' directory.
|
||||
* A build (ex: 'gradle buildLocalTSSI_Release') will place the pdf in the distribution.
|
||||
* There is an associated, auto-generated clean task.
|
||||
**/
|
||||
task buildBSimHelpPdf(type: Exec) {
|
||||
|
||||
workingDir 'src/main/doc'
|
||||
|
||||
def buildDir = "../../../build/BSimDocumentationPdf"
|
||||
|
||||
// Gradle will provide a cleanBuildBSimHelpPdf task that will remove these
|
||||
// declared outputs.
|
||||
outputs.dir "$workingDir/$buildDir"
|
||||
outputs.file "$workingDir/$buildDir/bsim.pdf"
|
||||
|
||||
// 'which' returns the number of failed arguments
|
||||
// Using the 'which' command first will allow the task to fail if the required
|
||||
// executables are not installed.
|
||||
//
|
||||
// The bash commands end with "2>&1" to redirect stderr to stdout and have all
|
||||
// messages print in sequence
|
||||
//
|
||||
// 'commandLine' takes one command, so wrap multiple commands in bash.
|
||||
commandLine 'bash', '-e', '-c', """
|
||||
echo '** Checking if required executables are installed. **'
|
||||
which xsltproc
|
||||
which fop
|
||||
|
||||
echo '** Preparing for xsltproc **'
|
||||
mkdir -p $buildDir/images
|
||||
|
||||
cp $installPoint/topics/BSimDatabasePlugin/images/*.png $buildDir/images
|
||||
|
||||
echo '** Building bsim.fo **'
|
||||
xsltproc --output $buildDir/bsim_withscaling.xml --stringparam profile.condition "withscaling" commonprofile.xsl bsim.xml 2>&1
|
||||
xsltproc --output $buildDir/bsim.fo focustom.xsl $buildDir/bsim_withscaling.xml 2>&1
|
||||
|
||||
echo '** Building bsim.pdf **'
|
||||
fop $buildDir/bsim.fo $buildDir/bsim.pdf 2>&1
|
||||
|
||||
echo '** Done. **'
|
||||
"""
|
||||
|
||||
// Allows doLast block regardless of exit value.
|
||||
ignoreExitValue true
|
||||
|
||||
// Store the output instead of printing to the console.
|
||||
standardOutput = new ByteArrayOutputStream()
|
||||
ext.output = { standardOutput.toString() }
|
||||
ext.errorOutput = { standardOutput.toString() }
|
||||
|
||||
// Check the OS before executing command.
|
||||
doFirst {
|
||||
if (!getCurrentPlatformName().startsWith("linux")) {
|
||||
throw new TaskExecutionException( it, new Exception("The '$it.name' task only works on Linux."))
|
||||
}
|
||||
}
|
||||
|
||||
// Print the output of the commands and check the return value.
|
||||
doLast {
|
||||
println output()
|
||||
if (execResult.exitValue) {
|
||||
logger.error("$it.name: An error occurred. Here is the output:\n" + output())
|
||||
throw new TaskExecutionException( it, new Exception("'$it.name': The command: '${commandLine.join(' ')}'" +
|
||||
" task \nfailed with exit code $execResult.exitValue; see task output for details."))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Build the html docs for BSim and place into the '$installPoint' directory.
|
||||
* A build (ex: 'gradle buildLocalTSSI_Release') will place the html files in the distribution.
|
||||
**/
|
||||
task buildBSimHelpHtml(type: Exec) {
|
||||
|
||||
workingDir 'src/main/doc'
|
||||
|
||||
def buildDir = "../../../build/html"
|
||||
|
||||
// 'which' returns the number of failed arguments
|
||||
// Using the 'which' command first will allow the task to fail if the required
|
||||
// executables are not installed.
|
||||
//
|
||||
// The bash commands end with "2>&1" to redirect stderr to stdout and have all
|
||||
// messages print in sequence
|
||||
//
|
||||
// 'commandLine' takes one command, so wrap multiple commands in bash.
|
||||
commandLine 'bash', '-e', '-c', """
|
||||
echo '** Checking if required executables are installed. **'
|
||||
which xsltproc
|
||||
which sed
|
||||
|
||||
echo '** Removing older html files installed under '$installPoint' **'
|
||||
rm -f $installPoint/topics/BSimDatabasePlugin/*.html
|
||||
|
||||
echo '** Building html files **'
|
||||
xsltproc --output $buildDir/bsim_noscaling.xml --stringparam profile.condition "noscaling" commonprofile.xsl bsim.xml 2>&1
|
||||
xsltproc --stringparam base.dir ${installPoint}/topics/BSimDatabasePlugin/ htmlcustom.xsl $buildDir/bsim_noscaling.xml 2>&1
|
||||
sed -i -e '/DefaultStyle.css/ { p; sQhref=".*"Qhref="../../shared/languages.css"Q; }' ${installPoint}/topics/BSimDatabasePlugin/*.html
|
||||
rm $installPoint/topics/BSimDatabasePlugin/index.html
|
||||
|
||||
echo '** Done. **'
|
||||
"""
|
||||
|
||||
// Allows doLast block regardless of exit value.
|
||||
ignoreExitValue true
|
||||
|
||||
// Store the output instead of printing to the console.
|
||||
standardOutput = new ByteArrayOutputStream()
|
||||
ext.output = { standardOutput.toString() }
|
||||
ext.errorOutput = { standardOutput.toString() }
|
||||
|
||||
// Check the OS before executing command.
|
||||
doFirst {
|
||||
if (!getCurrentPlatformName().startsWith("linux")) {
|
||||
throw new TaskExecutionException( it, new Exception("The '$it.name' task only works on Linux."))
|
||||
}
|
||||
}
|
||||
|
||||
// Print the output of the commands and check the return value.
|
||||
doLast {
|
||||
println output()
|
||||
if (execResult.exitValue) {
|
||||
logger.error("$it.name: An error occurred. Here is the output:\n" + output())
|
||||
throw new TaskExecutionException( it, new Exception("'$it.name': The command: '${commandLine.join(' ')}'" +
|
||||
" task \nfailed with exit code $execResult.exitValue; see task output for details."))
|
||||
}
|
||||
}
|
||||
}
|
51
Ghidra/Features/BSim/certification.manifest
Executable file
@ -0,0 +1,51 @@
|
||||
##VERSION: 2.0
|
||||
##MODULE IP: Apache License 2.0
|
||||
##MODULE IP: Creative Commons Attribution 2.5
|
||||
##MODULE IP: Crystal Clear Icons - LGPL 2.1
|
||||
##MODULE IP: FAMFAMFAM Icons - CC 2.5
|
||||
##MODULE IP: H2 Mozilla License 2.0
|
||||
##MODULE IP: LGPL 2.1
|
||||
##MODULE IP: LGPL 3.0
|
||||
##MODULE IP: Oxygen Icons - LGPL 3.0
|
||||
##MODULE IP: Postgresql License
|
||||
##MODULE IP: PostgresqlJDBC License
|
||||
##MODULE IP: Public Domain
|
||||
Module.manifest||GHIDRA||||END|
|
||||
data/bsim.theme.properties||GHIDRA||||END|
|
||||
data/large_32.xml||GHIDRA||||END|
|
||||
data/lshweights_32.xml||GHIDRA|||Signature data|END|
|
||||
data/lshweights_64.xml||GHIDRA|||Signature data|END|
|
||||
data/lshweights_64_32.xml||GHIDRA|||Signature data|END|
|
||||
data/lshweights_cpool.xml||GHIDRA||||END|
|
||||
data/lshweights_nosize.xml||GHIDRA||||END|
|
||||
data/medium_32.xml||GHIDRA||||END|
|
||||
data/medium_64.xml||GHIDRA||||END|
|
||||
data/medium_cpool.xml||GHIDRA||||END|
|
||||
data/medium_nosize.xml||GHIDRA||||END|
|
||||
data/serverconfig.xml||GHIDRA||||END|
|
||||
src/lshvector/Makefile.lshvector||GHIDRA||||END|
|
||||
src/lshvector/lshvector--1.0.sql||GHIDRA||||END|
|
||||
src/lshvector/lshvector.control||GHIDRA||||END|
|
||||
src/main/help/help/TOC_Source.xml||GHIDRA||||END|
|
||||
src/main/help/help/topics/BSim/BSimOverview.html||GHIDRA||||END|
|
||||
src/main/help/help/topics/BSim/CommandLineReference.html||GHIDRA||||END|
|
||||
src/main/help/help/topics/BSim/DatabaseConfiguration.html||GHIDRA||||END|
|
||||
src/main/help/help/topics/BSim/FeatureWeight.html||GHIDRA||||END|
|
||||
src/main/help/help/topics/BSim/IngestProcess.html||GHIDRA||||END|
|
||||
src/main/help/help/topics/BSimSearchPlugin/BSimSearch.html||GHIDRA||||END|
|
||||
src/main/help/help/topics/BSimSearchPlugin/images/AddServerDialog.png||GHIDRA||||END|
|
||||
src/main/help/help/topics/BSimSearchPlugin/images/ApplyResultsPanel.png||GHIDRA||||END|
|
||||
src/main/help/help/topics/BSimSearchPlugin/images/BSimOverviewDialog.png||GHIDRA||||END|
|
||||
src/main/help/help/topics/BSimSearchPlugin/images/BSimOverviewResults.png||GHIDRA||||END|
|
||||
src/main/help/help/topics/BSimSearchPlugin/images/BSimResultsProvider.png||GHIDRA||||END|
|
||||
src/main/help/help/topics/BSimSearchPlugin/images/BSimSearchDialog.png||GHIDRA||||END|
|
||||
src/main/help/help/topics/BSimSearchPlugin/images/ManageServersDialog.png||GHIDRA||||END|
|
||||
src/main/resources/bsim.log4j.xml||GHIDRA||||END|
|
||||
src/main/resources/images/checkmark_yellow.gif||GHIDRA||||END|
|
||||
src/main/resources/images/flag_green.png||FAMFAMFAM Icons - CC 2.5|||famfamfam silk icon set|END|
|
||||
src/main/resources/images/preferences-desktop-user-password.png||Oxygen Icons - LGPL 3.0|||Oxygen icon theme (dual license; LGPL or CC-SA-3.0)|END|
|
||||
src/main/resources/images/preferences-web-browser-shortcuts-32.png||Oxygen Icons - LGPL 3.0|||Oxygen icon theme (dual license; LGPL or CC-SA-3.0)|END|
|
||||
src/main/resources/images/preferences-web-browser-shortcuts.png||LGPL 3.0|||oxygen|END|
|
||||
src/main/resources/images/view_top_bottom.png||Crystal Clear Icons - LGPL 2.1||||END|
|
||||
src/main/resources/log4j-appender-console.xml||GHIDRA||||END|
|
||||
src/main/resources/log4j-appender-rolling-file.xml||GHIDRA||||END|
|
17
Ghidra/Features/BSim/data/bsim.theme.properties
Normal file
@ -0,0 +1,17 @@
|
||||
|
||||
[Defaults]
|
||||
|
||||
icon.bsim.query.dialog.provider = preferences-web-browser-shortcuts.png
|
||||
|
||||
icon.bsim.change.password = preferences-desktop-user-password.png
|
||||
|
||||
icon.bsim.table.split = view_top_bottom.png
|
||||
|
||||
icon.bsim.results.status.name.applied = checkmark_green.gif
|
||||
icon.bsim.results.status.signature.applied = EMPTY_ICON {checkmark_green.gif[move(-2,-1)]} {checkmark_green.gif [move(4,0)]}
|
||||
icon.bsim.results.status.matches = flag_green.png
|
||||
icon.bsim.results.status.ignored = checkmark_yellow.gif
|
||||
|
||||
icon.bsim.functions.table = FunctionScope.gif
|
||||
|
||||
[Dark Defaults]
|
13
Ghidra/Features/BSim/data/large_32.xml
Executable file
@ -0,0 +1,13 @@
|
||||
<dbconfig>
|
||||
<info>
|
||||
<name>Large 32-bit</name>
|
||||
<owner>Example Owner</owner>
|
||||
<description>A large (~100 million functions) database tuned for 32-bit executables</description>
|
||||
<major>0</major>
|
||||
<minor>0</minor>
|
||||
<settings>0x49</settings>
|
||||
</info>
|
||||
<k>19</k>
|
||||
<L>232</L>
|
||||
<weightsfile>lshweights_32.xml</weightsfile>
|
||||
</dbconfig>
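The template above is plain XML, so its fields can be read with the JDK's built-in DOM parser. The following is a minimal sketch, not the loader BSim itself uses: the class name `DbConfigSketch` and its helper methods are hypothetical, and the XML string is copied from the `large_32.xml` template shown above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper for illustration only; not part of Ghidra or BSim.
public class DbConfigSketch {

    // Content of data/large_32.xml, collapsed onto a few lines.
    static final String LARGE_32 =
        "<dbconfig><info><name>Large 32-bit</name><owner>Example Owner</owner>" +
        "<description>A large (~100 million functions) database tuned for " +
        "32-bit executables</description>" +
        "<major>0</major><minor>0</minor><settings>0x49</settings></info>" +
        "<k>19</k><L>232</L><weightsfile>lshweights_32.xml</weightsfile></dbconfig>";

    // Parses the XML and returns the trimmed text of the first element named 'tag'.
    static String text(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    // Integer.decode accepts both decimal ("19") and hex ("0x49") notation.
    static int intValue(String xml, String tag) throws Exception {
        return Integer.decode(text(xml, tag));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(text(LARGE_32, "name")
            + ": k=" + intValue(LARGE_32, "k")
            + ", L=" + intValue(LARGE_32, "L")
            + ", settings=" + intValue(LARGE_32, "settings")
            + ", weights=" + text(LARGE_32, "weightsfile"));
    }
}
```

Running `main` prints the name, the `k` and `L` values, the decoded `settings` flags, and the weights file name. The same helper should work on the `medium_*.xml` templates, since they share this element layout.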
|
1587
Ghidra/Features/BSim/data/lshweights_32.xml
Executable file
File diff suppressed because it is too large
1587
Ghidra/Features/BSim/data/lshweights_64.xml
Executable file
File diff suppressed because it is too large
1587
Ghidra/Features/BSim/data/lshweights_64_32.xml
Executable file
File diff suppressed because it is too large
1587
Ghidra/Features/BSim/data/lshweights_cpool.xml
Executable file
File diff suppressed because it is too large
1587
Ghidra/Features/BSim/data/lshweights_nosize.xml
Executable file
File diff suppressed because it is too large
13
Ghidra/Features/BSim/data/medium_32.xml
Executable file
@ -0,0 +1,13 @@
|
||||
<dbconfig>
|
||||
<info>
|
||||
<name>Medium 32-bit</name>
|
||||
<owner>Example Owner</owner>
|
||||
<description>A medium sized (~10 million functions) database tuned for 32-bit executables</description>
|
||||
<major>0</major>
|
||||
<minor>0</minor>
|
||||
<settings>0x49</settings>
|
||||
</info>
|
||||
<k>17</k>
|
||||
<L>146</L>
|
||||
<weightsfile>lshweights_32.xml</weightsfile>
|
||||
</dbconfig>
|
13
Ghidra/Features/BSim/data/medium_64.xml
Executable file
@ -0,0 +1,13 @@
|
||||
<dbconfig>
|
||||
<info>
|
||||
<name>Medium 64-bit</name>
|
||||
<owner>Example Owner</owner>
|
||||
<description>A medium sized (~10 million functions) database tuned for 64-bit executables</description>
|
||||
<major>0</major>
|
||||
<minor>0</minor>
|
||||
<settings>0x49</settings>
|
||||
</info>
|
||||
<k>17</k>
|
||||
<L>146</L>
|
||||
<weightsfile>lshweights_64.xml</weightsfile>
|
||||
</dbconfig>
|
13
Ghidra/Features/BSim/data/medium_cpool.xml
Executable file
@ -0,0 +1,13 @@
|
||||
<dbconfig>
|
||||
<info>
|
||||
<name>Medium JVM/Dalvik</name>
|
||||
<owner>Example Owner</owner>
|
||||
<description>A medium sized (~10 million functions) database tuned for java .class or .dex files</description>
|
||||
<major>0</major>
|
||||
<minor>0</minor>
|
||||
<settings>0x49</settings>
|
||||
</info>
|
||||
<k>17</k>
|
||||
<L>146</L>
|
||||
<weightsfile>lshweights_cpool.xml</weightsfile>
|
||||
</dbconfig>
|
13
Ghidra/Features/BSim/data/medium_nosize.xml
Executable file
@ -0,0 +1,13 @@
|
||||
<dbconfig>
|
||||
<info>
|
||||
<name>Medium No Size</name>
|
||||
<owner>Example Owner</owner>
|
||||
<description>A medium sized (~10 million functions) database tuned for executables with different address/register sizes</description>
|
||||
<major>0</major>
|
||||
<minor>0</minor>
|
||||
<settings>0x4d</settings>
|
||||
</info>
|
||||
<k>17</k>
|
||||
<L>146</L>
|
||||
<weightsfile>lshweights_nosize.xml</weightsfile>
|
||||
</dbconfig>
|
14
Ghidra/Features/BSim/data/serverconfig.xml
Executable file
@ -0,0 +1,14 @@
|
||||
<serverconfig> <!-- Runtime parameters for the query server -->
|
||||
<config key="shared_buffers">2GB</config> <!-- Amount of memory the server will use -->
|
||||
<config key="work_mem">16MB</config> <!-- Max memory to use for hash tables and sorts -->
|
||||
<config key="checkpoint_timeout">30min</config> <!-- Amount of time before all database records are flushed to disk -->
|
||||
<config key="listen_addresses">'*'</config> <!-- '*' = all available interfaces, '0.0.0.0' = all IPv4 interfaces only, 'localhost' = local connections only -->
|
||||
<config key="ssl">on</config> <!-- Enable server to connect via SSL -->
|
||||
<!-- <config key="ssl_ciphers">TLSv1.2</config> -->
|
||||
<config key="password_encryption">scram-sha-256</config>
|
||||
|
||||
<!-- <connect db="all" user="all" type="local" method="trust"/> -->
|
||||
<connect db="all" user="all" addr="127.0.0.1/32" type="hostssl" method="trust"/>
|
||||
<connect db="all" user="all" addr="::1/128" type="hostssl" method="trust"/>
|
||||
<connect db="all" user="all" addr="all" type="hostssl" method="trust"/>
|
||||
</serverconfig>
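The attributes of each `<connect>` element above (`type`, `db`, `user`, `addr`, `method`) line up with the columns of a PostgreSQL `pg_hba.conf` record (TYPE DATABASE USER ADDRESS METHOD). The sketch below renders the entries in that column order. It is an illustration under that assumption only: the class `ConnectEntriesSketch` is hypothetical, and how BSim actually consumes `serverconfig.xml` is not shown in this diff.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of Ghidra or BSim.
public class ConnectEntriesSketch {

    // Two of the <connect> entries from serverconfig.xml above.
    static final String SAMPLE =
        "<serverconfig>" +
        "<connect db=\"all\" user=\"all\" addr=\"127.0.0.1/32\" type=\"hostssl\" method=\"trust\"/>" +
        "<connect db=\"all\" user=\"all\" addr=\"::1/128\" type=\"hostssl\" method=\"trust\"/>" +
        "</serverconfig>";

    // Renders each <connect> element as "TYPE DATABASE USER ADDRESS METHOD".
    static List<String> toHbaLines(String xml) throws Exception {
        NodeList nodes = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
            .getElementsByTagName("connect");
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            List<String> fields = new ArrayList<>();
            for (String attr : new String[] { "type", "db", "user", "addr", "method" }) {
                String value = e.getAttribute(attr); // "" when the attribute is absent
                if (!value.isEmpty()) {
                    fields.add(value); // e.g. type="local" entries carry no addr
                }
            }
            lines.add(String.join(" ", fields));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        for (String line : toHbaLines(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

For the sample input, the first rendered line is `hostssl all all 127.0.0.1/32 trust`.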
|
@ -0,0 +1,175 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
//Generate BSim signatures for the current program. The URL for the program is
|
||||
//created from the local storage location. These signatures are intended for the
|
||||
//in-memory database backend.
|
||||
//@category BSim
|
||||
import java.io.File;
|
||||
import java.io.IOException;
|
||||
import java.net.URL;
|
||||
import java.util.Iterator;
|
||||
|
||||
import generic.lsh.vector.LSHVectorFactory;
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.features.base.values.GhidraValuesMap;
|
||||
import ghidra.features.bsim.query.*;
|
||||
import ghidra.features.bsim.query.BSimServerInfo.DBType;
|
||||
import ghidra.features.bsim.query.FunctionDatabase.Error;
|
||||
import ghidra.features.bsim.query.FunctionDatabase.ErrorCategory;
|
||||
import ghidra.features.bsim.query.description.DatabaseInformation;
|
||||
import ghidra.features.bsim.query.description.DescriptionManager;
|
||||
import ghidra.features.bsim.query.file.BSimH2FileDBConnectionManager;
|
||||
import ghidra.features.bsim.query.file.BSimH2FileDBConnectionManager.BSimH2FileDataSource;
|
||||
import ghidra.features.bsim.query.protocol.*;
|
||||
import ghidra.framework.model.DomainFolder;
|
||||
import ghidra.framework.protocol.ghidra.GhidraURL;
|
||||
import ghidra.program.model.listing.Function;
|
||||
import ghidra.program.model.listing.FunctionManager;
|
||||
import ghidra.util.MessageType;
|
||||
import ghidra.util.Msg;
|
||||
|
||||
//@category BSim
|
||||
//Generates and commits the BSim signatures for the currentProgram to the
|
||||
//selected H2 BSim database
|
||||
public class AddProgramToH2BSimDatabaseScript extends GhidraScript {
|
||||
|
||||
private static final String DATABASE = "H2 Database";
|
||||
|
||||
@Override
|
||||
protected void run() throws Exception {
|
||||
if (isRunningHeadless()) {
|
||||
popup("Use the \"bsim\" command-line tool to add programs to a database headlessly");
|
||||
return;
|
||||
}
|
||||
|
||||
if (currentProgram == null) {
|
||||
popup("This script requires that a program be open in the tool");
|
||||
return;
|
||||
}
|
||||
|
||||
GhidraValuesMap values = new GhidraValuesMap();
|
||||
values.defineFile(DATABASE, null, new File(System.getProperty("user.home")));
|
||||
values.setValidator((valueMap, status) -> {
|
||||
File selected = valueMap.getFile(DATABASE);
|
||||
if (selected == null || selected.isDirectory() ||
|
||||
!selected.getAbsolutePath().endsWith(BSimServerInfo.H2_FILE_EXTENSION)) {
|
||||
status.setStatusText("Invalid Database File!", MessageType.ERROR);
|
||||
return false;
|
||||
}
|
||||
return true;
|
||||
});
|
||||
|
||||
askValues("Select Database File", null, values);
|
||||
|
||||
File h2DbFile = values.getFile(DATABASE);
|
||||
|
||||
FunctionDatabase h2Database = null;
|
||||
try {
|
||||
BSimServerInfo serverInfo =
|
||||
new BSimServerInfo(DBType.file, null, 0, h2DbFile.getAbsolutePath());
|
||||
h2Database = BSimClientFactory.buildClient(serverInfo, false);
|
||||
BSimH2FileDataSource bds =
|
||||
BSimH2FileDBConnectionManager.getDataSourceIfExists(h2Database.getServerInfo());
|
||||
if (bds == null) {
|
||||
popup(h2DbFile.getAbsolutePath() + " is not an H2 database file");
|
||||
return;
|
||||
}
|
||||
if (bds.getActiveConnections() > 0) {
|
||||
popup("There is an existing connection to the database.");
|
||||
return;
|
||||
}
|
||||
|
||||
h2Database.initialize();
|
||||
DatabaseInformation dbInfo = h2Database.getInfo();
|
||||
|
||||
LSHVectorFactory vectorFactory = h2Database.getLSHVectorFactory();
|
||||
GenSignatures gensig = null;
|
||||
try {
|
||||
gensig = new GenSignatures(dbInfo.trackcallgraph);
|
||||
gensig.setVectorFactory(vectorFactory);
|
||||
gensig.addExecutableCategories(dbInfo.execats);
|
||||
gensig.addFunctionTags(dbInfo.functionTags);
|
||||
gensig.addDateColumnName(dbInfo.dateColumnName);
|
||||
|
||||
DomainFolder df = currentProgram.getDomainFile().getParent();
|
||||
URL folderURL = df.getSharedProjectURL();
|
||||
if (folderURL == null) {
|
||||
folderURL = df.getLocalProjectURL();
|
||||
}
|
||||
String path = GhidraURL.getProjectPathname(folderURL);
|
||||
|
||||
URL normalizedProjectURL = GhidraURL.getProjectURL(folderURL);
|
||||
String repo = normalizedProjectURL.toExternalForm();
|
||||
|
||||
gensig.openProgram(this.currentProgram, null, null, null, repo, path);
|
||||
final FunctionManager fman = currentProgram.getFunctionManager();
|
||||
final Iterator<Function> iter = fman.getFunctions(true);
|
||||
gensig.scanFunctions(iter, fman.getFunctionCount(), monitor);
|
||||
final DescriptionManager manager = gensig.getDescriptionManager();
|
||||
|
||||
// Call sortCallgraph() on each FunctionDescription to de-duplicate its list of callees.
// Without this, inserting duplicate entries into the callgraph table can cause SQL errors.
|
||||
manager.listAllFunctions().forEachRemaining(fd -> fd.sortCallgraph());
|
||||
|
||||
InsertRequest insertreq = new InsertRequest();
|
||||
insertreq.manage = manager;
|
||||
if (insertreq.execute(h2Database) == null) {
|
||||
Error lastError = h2Database.getLastError();
|
||||
if ((lastError.category == ErrorCategory.Format) ||
|
||||
(lastError.category == ErrorCategory.Nonfatal)) {
|
||||
Msg.showWarn(this, null, "Skipping Insert",
|
||||
currentProgram.getName() + ": " + lastError.message);
|
||||
return;
|
||||
}
|
||||
throw new IOException(currentProgram.getName() + ": " + lastError.message);
|
||||
}
|
||||
|
||||
StringBuffer status = new StringBuffer(currentProgram.getName());
|
||||
status.append(" added to database ");
|
||||
status.append(dbInfo.databasename);
|
||||
status.append("\n\n");
|
||||
QueryExeCount exeCount = new QueryExeCount();
|
||||
ResponseExe countResponse = exeCount.execute(h2Database);
|
||||
if (countResponse != null) {
|
||||
status.append(dbInfo.databasename);
|
||||
status.append(" contains ");
|
||||
status.append(countResponse.recordCount);
|
||||
status.append(" executables.");
|
||||
}
|
||||
else {
|
||||
status.append("null response from QueryExeCount");
|
||||
}
|
||||
popup(status.toString());
|
||||
}
|
||||
finally {
|
||||
if (gensig != null) {
|
||||
gensig.dispose();
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
finally {
|
||||
if (h2Database != null) {
|
||||
h2Database.close();
|
||||
BSimH2FileDataSource bds =
|
||||
BSimH2FileDBConnectionManager.getDataSourceIfExists(h2Database.getServerInfo());
|
||||
if (bds != null) { // the data source may not exist (see the early return above)
	bds.dispose();
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
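The script above returns early from inside its outer try block when no data source is found, and the finally block still runs afterward. The following is a minimal, Ghidra-free sketch of that control flow, showing why cleanup code should tolerate a missing resource. The names `FinallySketch`, `Source`, and `run` are hypothetical and exist only for this illustration; they are not part of the BSim API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Ghidra.
class FinallySketch {

	// Stand-in for a pooled data source that must be disposed after use.
	static final class Source {
		boolean disposed;

		void dispose() {
			disposed = true;
		}
	}

	// 'source' may be null, mirroring a "get if exists" style lookup.
	static void run(Source source, List<String> events) {
		try {
			if (source == null) {
				events.add("early return");
				return; // the finally block below still executes
			}
			events.add("work");
		}
		finally {
			if (source != null) { // guard: nothing to dispose when the lookup failed
				source.dispose();
				events.add("disposed");
			}
			else {
				events.add("nothing to dispose");
			}
		}
	}

	public static void main(String[] args) {
		List<String> events = new ArrayList<>();
		run(null, events);
		run(new Source(), events);
		System.out.println(events);
	}
}
```

Without the null check in the finally block, the early-return path would end in a NullPointerException instead of a clean exit.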
80
Ghidra/Features/BSim/ghidra_scripts/CompareExecutables.java
Executable file
@ -0,0 +1,80 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
// Calculate similarity/significance scores between executables by
|
||||
// combining their function scores.
|
||||
//@category BSim
|
||||
|
||||
import java.net.URL;
|
||||
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.features.bsim.query.BSimClientFactory;
|
||||
import ghidra.features.bsim.query.FunctionDatabase;
|
||||
import ghidra.features.bsim.query.client.*;
|
||||
import ghidra.features.bsim.query.description.ExecutableRecord;
|
||||
|
||||
public class CompareExecutables extends GhidraScript {
|
||||
|
||||
private ExecutableComparison exeCompare;
|
||||
@Override
|
||||
protected void run() throws Exception {
|
||||
URL url = BSimClientFactory.deriveBSimURL("ghidra://localhost/repo");
|
||||
try (FunctionDatabase database = BSimClientFactory.buildClient(url, true)) {
|
||||
// FileScoreCaching cache = new FileScoreCaching("/tmp/test_scorecacher.txt");
|
||||
TableScoreCaching cache = new TableScoreCaching(database);
|
||||
exeCompare =
|
||||
new ExecutableComparison(database, 1000000, "11111111111111111111111111111111",
|
||||
cache,
|
||||
monitor);
|
||||
// Specify the list of executables to compare by giving their md5 hash
|
||||
// exeCompare.addExecutable("22222222222222222222222222222222"); // 32 hex-digit string
|
||||
// exeCompare.addExecutable("33333333333333333333333333333333");
|
||||
exeCompare.addAllExecutables(5000);
|
||||
ExecutableScorer scorer = exeCompare.getScorer();
|
||||
if (!exeCompare.isConfigured()) {
|
||||
exeCompare.resetThresholds(0.7, 10.0);
|
||||
}
|
||||
exeCompare.fillinSelfScores(); // Prefetch self-scores, calculate any we are missing
|
||||
|
||||
exeCompare.performScoring();
|
||||
scorer.commitSelfScore(); // Commit the newly calculated self-score
|
||||
|
||||
println("Maximum cluster size = " + Integer.toString(exeCompare.getMaxHitCount()));
|
||||
println("Hit count exceeded = " + Integer.toString(exeCompare.getExceedCount()));
|
||||
float scoreThresh = 0.01f;
|
||||
int numExe = scorer.numExecutables();
|
||||
ExecutableRecord exeA = scorer.getSingularExecutable();
|
||||
float selfScoreA = scorer.getSingularSelfScore();
|
||||
for (int i = 1; i <= numExe; ++i) {
|
||||
ExecutableRecord exeB = scorer.getExecutable(i);
|
||||
float selfScoreB = scorer.getScore(i);
|
||||
if (selfScoreB == 0.0f) { // This is possible if the executable has no "rare" functions.
|
||||
continue; // as defined by the ExecutableComparison.hitCountThreshold
|
||||
}
|
||||
ExecutableRecord smallRecord = selfScoreA < selfScoreB ? exeA : exeB;
|
||||
ExecutableRecord bigRecord = selfScoreA < selfScoreB ? exeB : exeA;
|
||||
float libScore = scorer.getNormalizedScore(i, true);
|
||||
float totalScore = scorer.getNormalizedScore(i, false);
|
||||
if (libScore < scoreThresh) {
|
||||
continue;
|
||||
}
|
||||
println(smallRecord.getNameExec() + " " + bigRecord.getNameExec());
|
||||
println(" " + Float.toString(libScore) + " library score");
|
||||
println(" " + Float.toString(totalScore) + " total score");
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
}
|
148
Ghidra/Features/BSim/ghidra_scripts/CompareSignatures.java
Executable file
@ -0,0 +1,148 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
// Use the decompiler to generate a signature for the current function containing the cursor
|
||||
// If we remember the last signature that was generated, compare this signature with
|
||||
// the last signature and print the similarity
|
||||
//@category BSim
|
||||
|
||||
import java.io.*;
|
||||
|
||||
import org.xml.sax.SAXException;
|
||||
|
||||
import generic.jar.ResourceFile;
|
||||
import generic.lsh.vector.*;
|
||||
import ghidra.app.decompiler.DecompInterface;
|
||||
import ghidra.app.decompiler.DecompileOptions;
|
||||
import ghidra.app.decompiler.signature.SignatureResult;
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.app.services.ProgramManager;
|
||||
import ghidra.features.bsim.query.GenSignatures;
|
||||
import ghidra.program.model.address.Address;
|
||||
import ghidra.program.model.lang.LanguageID;
|
||||
import ghidra.program.model.listing.Function;
|
||||
import ghidra.program.model.listing.Program;
|
||||
import ghidra.util.xml.SpecXmlUtils;
|
||||
import ghidra.xml.NonThreadedXmlPullParserImpl;
|
||||
import ghidra.xml.XmlPullParser;
|
||||
|
||||
public class CompareSignatures extends GhidraScript {
|
||||
|
||||
private LSHVectorFactory vectorFactory;
|
||||
|
||||
private LSHVector generateVector(Function f, Program program) {
|
||||
DecompInterface decompiler = new DecompInterface();
|
||||
decompiler.setOptions(new DecompileOptions());
|
||||
decompiler.toggleSyntaxTree(false);
|
||||
decompiler.setSignatureSettings(vectorFactory.getSettings());
|
||||
if (!decompiler.openProgram(program)) {
|
||||
println("Unable to initialize the Decompiler interface");
|
||||
println(decompiler.getLastMessage());
|
||||
return null;
|
||||
}
|
||||
SignatureResult sigres = decompiler.generateSignatures(f, false, 10, null);
|
||||
LSHVector vec = vectorFactory.buildVector(sigres.features);
|
||||
return vec;
|
||||
}
|
||||
|
||||
private Program getProgram(Program[] progarray, String name) {
|
||||
if ((name == null) || (progarray == null)) {
|
||||
return null;
|
||||
}
|
||||
for (Program prog : progarray) {
|
||||
if (name.equals(prog.getName())) {
|
||||
return prog;
|
||||
}
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
private static void readWeights(LSHVectorFactory vectorFactory, ResourceFile weightsFile)
|
||||
throws FileNotFoundException, IOException, SAXException {
|
||||
// try-with-resources closes the stream even if parsing throws
try (InputStream input = weightsFile.getInputStream()) {
	XmlPullParser parser = new NonThreadedXmlPullParserImpl(input, "Vector weights parser",
		SpecXmlUtils.getXmlHandler(), false);
	vectorFactory.readWeights(parser);
}
|
||||
}
|
||||
|
||||
private void buildLSHVectorFactory() {
|
||||
vectorFactory = new WeightedLSHCosineVectorFactory();
|
||||
try {
|
||||
LanguageID id = currentProgram.getLanguageID();
|
||||
ResourceFile defaultWeightsFile = GenSignatures.getWeightsFile(id, id);
|
||||
readWeights(vectorFactory, defaultWeightsFile);
|
||||
}
|
||||
catch (FileNotFoundException e) {
|
||||
// TODO Auto-generated catch block
|
||||
e.printStackTrace();
|
||||
}
|
||||
catch (IOException e) {
|
||||
// TODO Auto-generated catch block
|
||||
e.printStackTrace();
|
||||
}
|
||||
catch (SAXException e) {
|
||||
// TODO Auto-generated catch block
|
||||
e.printStackTrace();
|
||||
}
|
||||
}
|
||||
|
||||
@Override
|
||||
protected void run() throws Exception {
|
||||
Function func = this.getFunctionContaining(this.currentAddress);
|
||||
if (func == null) {
|
||||
return;
|
||||
}
|
||||
buildLSHVectorFactory();
|
||||
LSHVector vec = generateVector(func, currentProgram);
|
||||
ProgramManager programManager = state.getTool().getService(ProgramManager.class);
|
||||
Program[] progarray = programManager.getAllOpenPrograms();
|
||||
String lastprogram_string = System.getProperty("ghidra.lastprogram");
|
||||
Program lastprogram = getProgram(progarray, lastprogram_string);
|
||||
VectorCompare veccompare = new VectorCompare();
|
||||
if (lastprogram != null) {
|
||||
String addrstring = System.getProperty("ghidra.lastaddress");
|
||||
if (addrstring != null) {
|
||||
Address addr = lastprogram.getAddressFactory().getAddress(addrstring);
|
||||
Function lastfunction = lastprogram.getFunctionManager().getFunctionAt(addr);
|
||||
if (lastfunction != null) {
|
||||
LSHVector lastvector = generateVector(lastfunction, lastprogram);
|
||||
double sim = lastvector.compare(vec, veccompare);
|
||||
double signif = vectorFactory.calculateSignificance(veccompare);
|
||||
StringBuilder buf = new StringBuilder();
|
||||
buf.append("Comparison results:\n");
|
||||
buf.append(lastprogram.getName());
|
||||
buf.append(".");
|
||||
buf.append(lastfunction.getName());
|
||||
buf.append(" vs. ");
|
||||
buf.append(currentProgram.getName());
|
||||
buf.append(".");
|
||||
buf.append(func.getName());
|
||||
buf.append("\n Similarity: ");
|
||||
buf.append(Double.toString(sim));
|
||||
buf.append("\n Significance: ");
|
||||
buf.append(Double.toString(signif));
|
||||
buf.append("\n");
|
||||
lastvector.compareDetail(vec, buf);
|
||||
println(buf.toString());
|
||||
}
|
||||
}
|
||||
}
|
||||
System.setProperty("ghidra.lastprogram", currentProgram.getName());
|
||||
String addrstring = func.getEntryPoint().toString();
|
||||
System.setProperty("ghidra.lastaddress", addrstring);
|
||||
}
|
||||
|
||||
}
|
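CompareSignatures reports a "similarity" between two LSHVector objects built by a WeightedLSHCosineVectorFactory. As a rough illustration of what a weighted cosine similarity measures, here is a JDK-only sketch over sparse vectors stored as maps from feature hash to weight. This is an assumption about the general technique suggested by the class name, not Ghidra's actual implementation; `CosineSketch`, `norm`, and `cosine` are hypothetical names.

```java
import java.util.Map;

// Hypothetical illustration only; this is not how LSHVector is implemented.
class CosineSketch {

	// Euclidean length of a sparse vector (feature hash -> weight).
	static double norm(Map<Integer, Double> v) {
		double sum = 0.0;
		for (double w : v.values()) {
			sum += w * w;
		}
		return Math.sqrt(sum);
	}

	// Cosine of the angle between two sparse vectors; 0.0 if either is empty.
	static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
		double dot = 0.0;
		for (Map.Entry<Integer, Double> e : a.entrySet()) {
			Double other = b.get(e.getKey());
			if (other != null) { // only shared features contribute to the dot product
				dot += e.getValue() * other;
			}
		}
		double denom = norm(a) * norm(b);
		return denom == 0.0 ? 0.0 : dot / denom;
	}

	public static void main(String[] args) {
		Map<Integer, Double> f1 = Map.of(0x1a2b, 1.0, 0x3c4d, 1.0);
		Map<Integer, Double> f2 = Map.of(0x1a2b, 1.0);
		System.out.println(cosine(f1, f2));
	}
}
```

Two functions sharing all weighted features score 1.0 and functions sharing none score 0.0, which is the intuition behind the similarity value the script prints.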
155
Ghidra/Features/BSim/ghidra_scripts/CompareSignaturesSpecifyWeights.java
Executable file
@ -0,0 +1,155 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
// Compare the BSim feature vectors of two functions.
|
||||
//@category BSim
|
||||
|
||||
import java.io.*;
|
||||
|
||||
import org.xml.sax.SAXException;
|
||||
|
||||
import generic.jar.ResourceFile;
|
||||
import generic.lsh.vector.*;
|
||||
import ghidra.app.decompiler.DecompInterface;
|
||||
import ghidra.app.decompiler.DecompileOptions;
|
||||
import ghidra.app.decompiler.signature.SignatureResult;
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.app.services.ProgramManager;
|
||||
import ghidra.framework.Application;
|
||||
import ghidra.program.model.address.Address;
|
||||
import ghidra.program.model.listing.Function;
|
||||
import ghidra.program.model.listing.Program;
|
||||
import ghidra.util.exception.CancelledException;
|
||||
import ghidra.util.xml.SpecXmlUtils;
|
||||
import ghidra.xml.NonThreadedXmlPullParserImpl;
|
||||
import ghidra.xml.XmlPullParser;
|
||||
|
||||
public class CompareSignaturesSpecifyWeights extends GhidraScript {
|
||||
|
||||
private static final String DEFAULT_LSH_WEIGHTS_FILE = "lshweights_nosize.xml";
|
||||
private LSHVectorFactory vectorFactory;
|
||||
|
||||
private LSHVector generateVector(Function f, Program program) {
|
||||
DecompInterface decompiler = new DecompInterface();
|
||||
decompiler.setOptions(new DecompileOptions());
|
||||
decompiler.setSignatureSettings(vectorFactory.getSettings());
|
||||
decompiler.toggleSyntaxTree(false);
|
||||
if (!decompiler.openProgram(program)) {
|
||||
println("Unable to initialize the Decompiler interface");
|
||||
println(decompiler.getLastMessage());
|
||||
return null;
|
||||
}
|
||||
|
||||
SignatureResult sigres = decompiler.generateSignatures(f, false, 10, null);
|
||||
|
||||
LSHVector vec = vectorFactory.buildVector(sigres.features);
|
||||
return vec;
|
||||
}
|
||||
|
||||
private static void readWeights(LSHVectorFactory vectorFactory, ResourceFile weightsFile)
|
||||
throws FileNotFoundException, IOException, SAXException {
|
||||
// try-with-resources closes the stream even if parsing throws
try (InputStream input = weightsFile.getInputStream()) {
	XmlPullParser parser = new NonThreadedXmlPullParserImpl(input, "Vector weights parser",
		SpecXmlUtils.getXmlHandler(), false);
	vectorFactory.readWeights(parser);
}
|
||||
}
|
||||
|
||||
private boolean buildLSHVectorFactory() {
|
||||
vectorFactory = new WeightedLSHCosineVectorFactory();
|
||||
try {
|
||||
String weightsFile =
|
||||
askString("Enter weights file name", "weights file", DEFAULT_LSH_WEIGHTS_FILE);
|
||||
ResourceFile defaultWeightsFile = Application.findDataFileInAnyModule(weightsFile);
|
||||
readWeights(vectorFactory, defaultWeightsFile);
|
||||
}
|
||||
catch (FileNotFoundException e) {
|
||||
e.printStackTrace();
|
||||
return false;
|
||||
}
|
||||
catch (IOException e) {
|
||||
e.printStackTrace();
|
||||
return false;
|
||||
}
|
||||
catch (SAXException e) {
|
||||
e.printStackTrace();
|
||||
return false;
|
||||
}
|
||||
catch (CancelledException e) {
|
||||
return false;
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
private Program getProgram(Program[] progarray, String name) {
|
||||
if ((name == null) || (progarray == null)) {
|
||||
return null;
|
||||
}
|
||||
for (Program prog : progarray) {
|
||||
if (name.equals(prog.getName())) {
|
||||
return prog;
|
||||
}
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
@Override
|
||||
protected void run() throws Exception {
|
||||
Function func = this.getFunctionContaining(this.currentAddress);
|
||||
if (func == null) {
|
||||
return;
|
||||
}
|
||||
if (!buildLSHVectorFactory()) {
|
||||
return;
|
||||
}
|
||||
LSHVector vec = generateVector(func, currentProgram);
|
||||
ProgramManager programManager = state.getTool().getService(ProgramManager.class);
|
||||
Program[] progarray = programManager.getAllOpenPrograms();
|
||||
String lastprogram_string = System.getProperty("ghidra.lastprogram");
|
||||
Program lastprogram = getProgram(progarray, lastprogram_string);
|
||||
VectorCompare veccompare = new VectorCompare();
|
||||
if (lastprogram != null) {
|
||||
String addrstring = System.getProperty("ghidra.lastaddress");
|
||||
if (addrstring != null) {
|
||||
Address addr = lastprogram.getAddressFactory().getAddress(addrstring);
|
||||
Function lastfunction = lastprogram.getFunctionManager().getFunctionAt(addr);
|
||||
if (lastfunction != null) {
|
||||
LSHVector lastvector = generateVector(lastfunction, lastprogram);
|
||||
double sim = lastvector.compare(vec, veccompare);
|
||||
double signif = vectorFactory.calculateSignificance(veccompare);
|
||||
StringBuilder buf = new StringBuilder();
|
||||
buf.append("Comparison results:\n");
|
||||
buf.append(lastprogram.getName());
|
||||
buf.append(".");
|
||||
buf.append(lastfunction.getName());
|
||||
buf.append(" vs. ");
|
||||
buf.append(currentProgram.getName());
|
||||
buf.append(".");
|
||||
buf.append(func.getName());
|
||||
buf.append("\n Similarity: ");
|
||||
buf.append(Double.toString(sim));
|
||||
buf.append("\n Significance: ");
|
||||
buf.append(Double.toString(signif));
|
||||
buf.append("\n");
|
||||
lastvector.compareDetail(vec, buf);
|
||||
println(buf.toString());
|
||||
}
|
||||
}
|
||||
}
|
||||
System.setProperty("ghidra.lastprogram", currentProgram.getName());
|
||||
String addrstring = func.getEntryPoint().toString();
|
||||
System.setProperty("ghidra.lastaddress", addrstring);
|
||||
}
|
||||
}
|
@ -0,0 +1,170 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
//Creates an empty file-based H2 BSim database
|
||||
//@category BSim
|
||||
import java.io.File;
|
||||
import java.io.IOException;
|
||||
import java.util.*;
|
||||
|
||||
import org.apache.commons.lang3.StringUtils;
|
||||
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.features.base.values.GhidraValuesMap;
|
||||
import ghidra.features.bsim.query.*;
|
||||
import ghidra.features.bsim.query.BSimServerInfo.DBType;
|
||||
import ghidra.features.bsim.query.FunctionDatabase.Error;
|
||||
import ghidra.features.bsim.query.description.DatabaseInformation;
|
||||
import ghidra.features.bsim.query.file.BSimH2FileDBConnectionManager;
|
||||
import ghidra.features.bsim.query.file.BSimH2FileDBConnectionManager.BSimH2FileDataSource;
|
||||
import ghidra.features.bsim.query.protocol.*;
|
||||
import ghidra.util.MessageType;
|
||||
import ghidra.util.Msg;
|
||||
|
||||
public class CreateH2BSimDatabaseScript extends GhidraScript {
|
||||
private static final String NAME = "Database Name";
|
||||
private static final String DIRECTORY = "Database Directory";
|
||||
private static final String DATABASE_TEMPLATE = "Database Template";
|
||||
private static final String FUNCTION_TAGS = "Function Tags (CSV)";
|
||||
private static final String EXECUTABLE_CATEGORIES = "Executable Categories (CSV)";
|
||||
|
||||
private static final String[] templates =
|
||||
{ "medium_nosize", "medium_32", "medium_64", "medium_cpool" };
|
||||
|
||||
@Override
|
||||
protected void run() throws Exception {
|
||||
if (isRunningHeadless()) {
|
||||
popup("Use \"bsim\" to create an H2 BSim database from the command line");
|
||||
return;
|
||||
}
|
||||
|
||||
GhidraValuesMap values = new GhidraValuesMap();
|
||||
values.defineString(NAME, "");
|
||||
values.defineDirectory(DIRECTORY, new File(System.getProperty("user.home")));
|
||||
values.defineChoice(DATABASE_TEMPLATE, "medium_nosize", templates);
|
||||
values.defineString(FUNCTION_TAGS);
|
||||
values.defineString(EXECUTABLE_CATEGORIES);
|
||||
|
||||
values.setValidator((valueMap, status) -> {
|
||||
String databaseName = valueMap.getString(NAME);
|
||||
if (StringUtils.isBlank(databaseName)) {
|
||||
status.setStatusText("Name must be filled in!", MessageType.ERROR);
|
||||
return false;
|
||||
}
|
||||
File directory = valueMap.getFile(DIRECTORY);
|
||||
if (!directory.isDirectory()) {
|
||||
status.setStatusText("Invalid directory!", MessageType.ERROR);
|
||||
return false;
|
||||
}
|
||||
File dbFile = new File(directory, databaseName);
|
||||
File testFile = new File(dbFile.getPath() + BSimServerInfo.H2_FILE_EXTENSION);
|
||||
if (testFile.exists()) {
|
||||
status.setStatusText("Database file already exists!", MessageType.ERROR);
|
||||
return false;
|
||||
}
|
||||
return true;
|
||||
});
|
||||
|
||||
askValues("Enter Database Parameters",
|
||||
"Enter values required to create a new BSim H2 database.", values);
|
||||
|
||||
FunctionDatabase h2Database = null;
|
||||
try {
|
||||
String databaseName = values.getString(NAME);
|
||||
File dbDir = values.getFile(DIRECTORY);
|
||||
String template = values.getChoice(DATABASE_TEMPLATE);
|
||||
String functionTagsCSV = values.getString(FUNCTION_TAGS);
|
||||
List<String> tags = parseCSV(functionTagsCSV);
|
||||
|
||||
String exeCatCSV = values.getString(EXECUTABLE_CATEGORIES);
|
||||
List<String> cats = parseCSV(exeCatCSV);
|
||||
|
||||
File dbFile = new File(dbDir, databaseName);
|
||||
|
||||
BSimServerInfo serverInfo =
|
||||
new BSimServerInfo(DBType.file, null, 0, dbFile.getAbsolutePath());
|
||||
h2Database = BSimClientFactory.buildClient(serverInfo, false);
|
||||
BSimH2FileDataSource bds =
|
||||
BSimH2FileDBConnectionManager.getDataSourceIfExists(h2Database.getServerInfo());
|
||||
if (bds.getActiveConnections() > 0) {
|
||||
//if this happens, there is a connection to the database but the
|
||||
//database file was deleted
|
||||
Msg.showError(this, null, "Connection Error",
|
||||
"There is an existing connection to the database!");
|
||||
return;
|
||||
}
|
||||
|
||||
CreateDatabase command = new CreateDatabase();
|
||||
command.info = new DatabaseInformation();
|
||||
// Put in fields provided by the user in the dialog
|
||||
// If they are null, the template will fill them in
|
||||
command.info.databasename = databaseName;
|
||||
command.config_template = template;
|
||||
command.info.trackcallgraph = true;
|
||||
ResponseInfo response = command.execute(h2Database);
|
||||
if (response == null) {
|
||||
throw new IOException(h2Database.getLastError().message);
|
||||
}
|
||||
|
||||
for (String tag : tags) {
|
||||
InstallTagRequest req = new InstallTagRequest();
|
||||
req.tag_name = tag;
|
||||
ResponseInfo resp = req.execute(h2Database);
|
||||
if (resp == null) {
|
||||
Error lastError = h2Database.getLastError();
|
||||
throw new LSHException(lastError.message);
|
||||
}
|
||||
}
|
||||
|
||||
for (String cat : cats) {
|
||||
InstallCategoryRequest req = new InstallCategoryRequest();
|
||||
req.type_name = cat;
|
||||
ResponseInfo resp = req.execute(h2Database);
|
||||
if (resp == null) {
|
||||
Error lastError = h2Database.getLastError();
|
||||
throw new LSHException(lastError.message);
|
||||
}
|
||||
}
|
||||
popup("Database " + values.getString(NAME) + " created successfully!");
|
||||
}
|
||||
finally {
|
||||
if (h2Database != null) {
|
||||
h2Database.close();
|
||||
BSimH2FileDataSource bds =
|
||||
BSimH2FileDBConnectionManager.getDataSourceIfExists(h2Database.getServerInfo());
|
||||
if (bds != null) { // getDataSourceIfExists may return null
	bds.dispose();
}
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
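//Splits a comma-separated string, trims each entry, drops blank entries,
//removes duplicates, and returns the result sorted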
|
||||
private List<String> parseCSV(String csv) {
|
||||
Set<String> parsed = new HashSet<>();
|
||||
if (StringUtils.isEmpty(csv)) {
|
||||
return new ArrayList<String>();
|
||||
}
|
||||
String[] parts = csv.split(",");
|
||||
for (String p : parts) {
|
||||
if (!StringUtils.isBlank(p)) {
|
||||
parsed.add(p.trim());
|
||||
}
|
||||
}
|
||||
List<String> res = new ArrayList<>(parsed);
|
||||
res.sort(String::compareTo);
|
||||
return res;
|
||||
}
|
||||
|
||||
}
|
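The parseCSV helper in CreateH2BSimDatabaseScript turns the "Function Tags (CSV)" and "Executable Categories (CSV)" fields into a sorted, duplicate-free list. A minimal standalone sketch of the same idea, using only the JDK instead of Apache Commons Lang, might look like this. `CsvListSketch` and `parseCsv` are hypothetical names, and a TreeSet is swapped in for the original HashSet-then-sort approach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical standalone helper; not part of Ghidra.
class CsvListSketch {

	// Split on commas, trim each entry, drop blanks, remove duplicates, sort.
	static List<String> parseCsv(String csv) {
		if (csv == null || csv.isEmpty()) {
			return new ArrayList<>();
		}
		// A TreeSet removes duplicates and keeps natural (String.compareTo) order.
		TreeSet<String> parsed = new TreeSet<>();
		for (String part : csv.split(",")) {
			String trimmed = part.trim();
			if (!trimmed.isEmpty()) {
				parsed.add(trimmed);
			}
		}
		return new ArrayList<>(parsed);
	}

	public static void main(String[] args) {
		// prints [audio, crypto, network]
		System.out.println(parseCsv(" crypto, network ,,crypto,audio"));
	}
}
```

De-duplicating before the tags and categories are installed means each InstallTagRequest or InstallCategoryRequest is issued once per distinct name.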
72
Ghidra/Features/BSim/ghidra_scripts/DebugSignatures.java
Executable file
@ -0,0 +1,72 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
|
||||
import java.util.List;
|
||||
|
||||
import ghidra.app.decompiler.DecompInterface;
|
||||
import ghidra.app.decompiler.DecompileOptions;
|
||||
import ghidra.app.decompiler.signature.DebugSignature;
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.program.model.lang.Language;
|
||||
import ghidra.program.model.listing.Function;
|
||||
|
||||
public class DebugSignatures extends GhidraScript {
|
||||
|
||||
private static final int SIGNATURE_SETTINGS = 0x45;
|
||||
|
||||
@Override
|
||||
protected void run() throws Exception {
|
||||
Function func = this.getFunctionContaining(this.currentAddress);
|
||||
|
||||
if (func == null) {
|
||||
popup("No function selected!");
|
||||
return;
|
||||
}
|
||||
|
||||
DecompInterface decompiler = new DecompInterface();
|
||||
decompiler.setOptions(new DecompileOptions());
|
||||
decompiler.toggleSyntaxTree(false);
|
||||
decompiler.setSignatureSettings(SIGNATURE_SETTINGS);
|
||||
if (!decompiler.openProgram(this.currentProgram)) {
|
||||
println("Unable to initialize the Decompiler interface");
|
||||
println(decompiler.getLastMessage());
|
||||
return;
|
||||
}
|
||||
|
||||
Language language = this.currentProgram.getLanguage();
|
||||
List<DebugSignature> sigres = decompiler.debugSignatures(func, 10, null);
|
||||
|
||||
StringBuffer buf = new StringBuffer();
|
||||
buf.append("\nFunction: ");
|
||||
buf.append(func.getName());
|
||||
buf.append("\nentry: ");
|
||||
buf.append(func.getEntryPoint().toString());
|
||||
buf.append("\n\n");
|
||||
if (sigres == null) {
|
||||
printf("Null sigres!\n");
|
||||
}
|
||||
else {
|
||||
for (int i = 0; i < sigres.size(); ++i) {
|
||||
sigres.get(i).printRaw(language, buf);
|
||||
buf.append("\n");
|
||||
}
|
||||
}
|
||||
printf("%s\n", buf.toString());
|
||||
decompiler.closeProgram();
|
||||
decompiler.dispose();
|
||||
}
|
||||
|
||||
}
|
61
Ghidra/Features/BSim/ghidra_scripts/DumpDebugSignatures.py
Executable file
@ -0,0 +1,61 @@
|
||||
## ###
|
||||
# IP: GHIDRA
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
##
|
||||
# Use the decompiler to generate signatures for the function at the current address, then dump the
|
||||
# signature hashes and debug information to the console
|
||||
# @category: BSim.python
|
||||
|
||||
import ghidra.app.decompiler.tracking.DecompInterfaceTracking as DecompInterfaceTracking
|
||||
import ghidra.app.decompiler.DecompileOptions as DecompileOptions
|
||||
import generic.lsh.vector.WeightedLSHCosineVectorFactory as WeightedLSHCosineVectorFactory
|
||||
import ghidra.features.bsim.query.GenSignatures as GenSignatures
|
||||
import ghidra.xml.NonThreadedXmlPullParserImpl as NonThreadedXmlPullParserImpl
|
||||
import ghidra.util.xml.SpecXmlUtils as SpecXmlUtils
|
||||
|
||||
|
||||
def processFunction(func):
|
||||
decompiler = DecompInterfaceTracking()
|
||||
options = DecompileOptions()
|
||||
decompiler.setOptions(options)
|
||||
decompiler.toggleSyntaxTree(False)
|
||||
decompiler.setSignatureSettings(getSettings())
|
||||
if not decompiler.openProgram(currentProgram):
|
||||
print "Unable to initialize the Decompiler interface!"
|
||||
print "%s" % decompiler.getLastMessage()
|
||||
return
|
||||
language = currentProgram.getLanguage()
|
||||
sigres = decompiler.debugSignatures(func,10,None)
|
||||
for i,res in enumerate(sigres):
|
||||
buf = java.lang.StringBuffer()
|
||||
sigres.get(i).printRaw(language,buf)
|
||||
print "%s" % buf.toString()
|
||||
decompiler.closeProgram()
|
||||
decompiler.dispose()
|
||||
|
||||
def getSettings():
|
||||
vectorFactory = WeightedLSHCosineVectorFactory()
|
||||
id = currentProgram.getLanguageID()
|
||||
defaultWeightsFile = GenSignatures.getWeightsFile(id,id)
|
||||
input = defaultWeightsFile.getInputStream()
|
||||
parser = NonThreadedXmlPullParserImpl(input,"Vector weights parser", SpecXmlUtils.getXmlHandler(),False)
|
||||
vectorFactory.readWeights(parser)
|
||||
input.close()
|
||||
return vectorFactory.getSettings()
|
||||
|
||||
func = currentProgram.getFunctionManager().getFunctionContaining(currentAddress)
|
||||
if func is None:
|
||||
print "no function at current address"
|
||||
else:
|
||||
processFunction(func)
|
115
Ghidra/Features/BSim/ghidra_scripts/DumpSignatures.java
Executable file
@ -0,0 +1,115 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
// Use the decompiler to generate signatures for the function currently containing the cursor
|
||||
// and dump the signature hashes to the console
|
||||
//@category BSim
|
||||
|
||||
import java.io.*;
|
||||
import java.util.List;
|
||||
|
||||
import org.xml.sax.SAXException;
|
||||
|
||||
import generic.jar.ResourceFile;
|
||||
import generic.lsh.vector.LSHVectorFactory;
|
||||
import generic.lsh.vector.WeightedLSHCosineVectorFactory;
|
||||
import ghidra.app.decompiler.DecompInterface;
|
||||
import ghidra.app.decompiler.DecompileOptions;
|
||||
import ghidra.app.decompiler.signature.DebugSignature;
|
||||
import ghidra.app.decompiler.signature.SignatureResult;
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.features.bsim.query.GenSignatures;
|
||||
import ghidra.program.model.lang.Language;
|
||||
import ghidra.program.model.lang.LanguageID;
|
||||
import ghidra.program.model.listing.Function;
|
||||
import ghidra.util.xml.SpecXmlUtils;
|
||||
import ghidra.xml.NonThreadedXmlPullParserImpl;
|
||||
import ghidra.xml.XmlPullParser;
|
||||
|
||||
public class DumpSignatures extends GhidraScript {
|
||||
|
||||
private LSHVectorFactory vectorFactory;
|
||||
|
||||
@Override
|
||||
public void run() throws Exception {
|
||||
Function func = this.getFunctionContaining(this.currentAddress);
|
||||
if (func == null) {
|
||||
return;
|
||||
}
|
||||
buildLSHVectorFactory();
|
||||
boolean debug = false;
|
||||
DecompInterface decompiler = new DecompInterface();
|
||||
decompiler.setOptions(new DecompileOptions());
|
||||
decompiler.setSignatureSettings(vectorFactory.getSettings());
|
||||
decompiler.toggleSyntaxTree(false);
|
||||
if (!decompiler.openProgram(this.currentProgram)) {
|
||||
println("Unable to initialize the Decompiler interface");
|
||||
println(decompiler.getLastMessage());
|
||||
return;
|
||||
}
|
||||
if (!debug) {
|
||||
SignatureResult sigres = decompiler.generateSignatures(func, false, 10, null);
|
||||
StringBuffer buf = new StringBuffer("\n");
|
||||
for (int feature : sigres.features) {
|
||||
buf.append(Integer.toHexString(feature));
|
||||
buf.append("\n");
|
||||
}
|
||||
println(buf.toString());
|
||||
}
|
||||
else {
|
||||
Language language = this.currentProgram.getLanguage();
|
||||
List<DebugSignature> sigres = decompiler.debugSignatures(func, 10, null);
|
||||
StringBuffer buf = new StringBuffer("\n");
|
||||
for (int i = 0; i < sigres.size(); ++i) {
|
||||
sigres.get(i).printRaw(language, buf);
|
||||
buf.append("\n");
|
||||
}
|
||||
println(buf.toString());
|
||||
}
|
||||
decompiler.closeProgram();
|
||||
decompiler.dispose();
|
||||
}
|
||||
|
||||
private static void readWeights(LSHVectorFactory vectorFactory, ResourceFile weightsFile)
|
||||
throws FileNotFoundException, IOException, SAXException {
|
||||
InputStream input = weightsFile.getInputStream();
|
||||
XmlPullParser parser = new NonThreadedXmlPullParserImpl(input, "Vector weights parser",
|
||||
SpecXmlUtils.getXmlHandler(), false);
|
||||
vectorFactory.readWeights(parser);
|
||||
input.close();
|
||||
}
|
||||
|
||||
private void buildLSHVectorFactory() {
|
||||
vectorFactory = new WeightedLSHCosineVectorFactory();
|
||||
try {
|
||||
LanguageID id = currentProgram.getLanguageID();
|
||||
ResourceFile defaultWeightsFile = GenSignatures.getWeightsFile(id, id);
|
||||
readWeights(vectorFactory, defaultWeightsFile);
|
||||
}
|
||||
catch (FileNotFoundException e) {
|
||||
// TODO Auto-generated catch block
|
||||
e.printStackTrace();
|
||||
}
|
||||
catch (IOException e) {
|
||||
// TODO Auto-generated catch block
|
||||
e.printStackTrace();
|
||||
}
|
||||
catch (SAXException e) {
|
||||
// TODO Auto-generated catch block
|
||||
e.printStackTrace();
|
||||
}
|
||||
}
|
||||
|
||||
}
|
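The non-debug branch of `DumpSignatures.run()` prints each feature hash with `Integer.toHexString`. The following is a minimal standalone sketch of that formatting step only, assuming the features arrive as a plain `int[]`; the class name `FeatureHexDump` and the helper `toHexLines` are hypothetical and not part of Ghidra.

```java
public class FeatureHexDump {

    // Formats each 32-bit feature hash as unsigned hex, one per line,
    // with a leading newline as in the script's StringBuffer("\n").
    static String toHexLines(int[] features) {
        StringBuilder buf = new StringBuilder("\n");
        for (int feature : features) {
            // toHexString treats the int as unsigned and does not zero-pad
            buf.append(Integer.toHexString(feature)).append('\n');
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // Sample values are illustrative, not real BSim features
        int[] features = { 0x1f, 0xdeadbeef, -1 };
        System.out.print(toHexLines(features));
    }
}
```

Because `Integer.toHexString` omits leading zeros, hashes print with varying widths; `String.format("%08x", feature)` would be one way to get fixed-width columns if that is preferred.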
61
Ghidra/Features/BSim/ghidra_scripts/DumpSignatures.py
Executable file
@ -0,0 +1,61 @@
|
||||
## ###
|
||||
# IP: GHIDRA
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
##
|
||||
# Use the decompiler to generate signatures for the function at the current address, then dump the
|
||||
# signature hashes to the console
|
||||
# @category BSim.python
|
||||
|
||||
import ghidra.app.decompiler.tracking.DecompInterfaceTracking as DecompInterfaceTracking
|
||||
import ghidra.app.decompiler.DecompileOptions as DecompileOptions
|
||||
import generic.lsh.vector.WeightedLSHCosineVectorFactory as WeightedLSHCosineVectorFactory
|
||||
import ghidra.features.bsim.query.GenSignatures as GenSignatures
|
||||
import ghidra.xml.NonThreadedXmlPullParserImpl as NonThreadedXmlPullParserImpl
|
||||
import ghidra.util.xml.SpecXmlUtils as SpecXmlUtils
|
||||
|
||||
|
||||
def processFunction(func):
|
||||
decompiler = ghidra.app.decompiler.tracking.DecompInterfaceTracking()
|
||||
options = ghidra.app.decompiler.DecompileOptions()
|
||||
decompiler.setOptions(options)
|
||||
decompiler.toggleSyntaxTree(False)
|
||||
decompiler.setSignatureSettings(getSettings())
|
||||
if not decompiler.openProgram(currentProgram):
|
||||
print "Unable to initialize the Decompiler interface!"
|
||||
print "%s" % decompiler.getLastMessage()
|
||||
return
|
||||
sigres = decompiler.generateSignatures(func, False, 10, None)
|
||||
buf = java.lang.StringBuffer()
|
||||
for i,res in enumerate(sigres.features):
|
||||
buf.append(java.lang.Integer.toHexString(sigres.features[i]))
|
||||
buf.append("\n")
|
||||
print buf.toString()
|
||||
decompiler.closeProgram()
|
||||
decompiler.dispose()
|
||||
|
||||
def getSettings():
|
||||
vectorFactory = WeightedLSHCosineVectorFactory()
|
||||
id = currentProgram.getLanguageID()
|
||||
defaultWeightsFile = GenSignatures.getWeightsFile(id,id)
|
||||
input = defaultWeightsFile.getInputStream()
|
||||
parser = NonThreadedXmlPullParserImpl(input,"Vector weights parser", SpecXmlUtils.getXmlHandler(),False)
|
||||
vectorFactory.readWeights(parser)
|
||||
input.close()
|
||||
return vectorFactory.getSettings()
|
||||
|
||||
func = currentProgram.getFunctionManager().getFunctionContaining(currentAddress)
|
||||
if func is None:
|
||||
print "no function at current address"
|
||||
else:
|
||||
processFunction(func)
|
69
Ghidra/Features/BSim/ghidra_scripts/ExampleOverviewQuery.java
Executable file
@ -0,0 +1,69 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
//Example of how to perform an overview query in a script.
|
||||
//@category BSim
|
||||
import java.util.HashSet;
|
||||
|
||||
import generic.lsh.vector.LSHVectorFactory;
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.features.bsim.query.facade.SFOverviewInfo;
|
||||
import ghidra.features.bsim.query.facade.SimilarFunctionQueryService;
|
||||
import ghidra.features.bsim.query.protocol.ResponseNearestVector;
|
||||
import ghidra.features.bsim.query.protocol.SimilarityVectorResult;
|
||||
import ghidra.program.database.symbol.FunctionSymbol;
|
||||
import ghidra.program.model.listing.*;
|
||||
|
||||
|
||||
public class ExampleOverviewQuery extends GhidraScript {
|
||||
private static final double SIMILARITY_BOUND = 0.7;
|
||||
private static final double SIGNIFICANCE_BOUND = 0.0;
|
||||
|
||||
|
||||
@Override
|
||||
protected void run() throws Exception {
|
||||
Program queryingProgram = currentProgram;
|
||||
HashSet<FunctionSymbol> funcsToQuery = new HashSet<>();
|
||||
FunctionIterator fIter = queryingProgram.getFunctionManager().getFunctionsNoStubs(true);
|
||||
for (Function func : fIter){
|
||||
funcsToQuery.add((FunctionSymbol) func.getSymbol());
|
||||
}
|
||||
SFOverviewInfo overviewInfo = new SFOverviewInfo(funcsToQuery);
|
||||
overviewInfo.setSimilarityThreshold(SIMILARITY_BOUND);
|
||||
overviewInfo.setSignificanceThreshold(SIGNIFICANCE_BOUND);
|
||||
|
||||
try (SimilarFunctionQueryService queryService =
|
||||
new SimilarFunctionQueryService(queryingProgram)) {
|
||||
String DATABASE_URL = askString("Enter database URL", "URL:");
|
||||
queryService.initializeDatabase(DATABASE_URL);
|
||||
LSHVectorFactory vectorFactory = queryService.getLSHVectorFactory();
|
||||
|
||||
ResponseNearestVector overviewResults =
|
||||
queryService.overviewSimilarFunctions(overviewInfo, null, monitor);
|
||||
StringBuilder buf = new StringBuilder();
|
||||
buf.append("\n");
|
||||
for (SimilarityVectorResult result : overviewResults.result) {
|
||||
buf.append("Name: ").append(result.getBase().getFunctionName()).append("\n");
|
||||
buf.append("Hit Count: ").append(result.getTotalCount()).append("\n");
|
||||
buf.append("Self-significance: ");
|
||||
buf.append(vectorFactory
|
||||
.getSelfSignificance(result.getBase().getSignatureRecord().getLSHVector()));
|
||||
buf.append("\n\n");
|
||||
}
|
||||
printf("%s\n", buf.toString());
|
||||
}
|
||||
}
|
||||
|
||||
}
|
47
Ghidra/Features/BSim/ghidra_scripts/ExampleOverviewQuery.py
Executable file
@ -0,0 +1,47 @@
|
||||
## ###
|
||||
# IP: GHIDRA
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
##
|
||||
# Example of how to perform an overview query in a script
|
||||
# @category BSim.python
|
||||
|
||||
import ghidra.features.bsim.query.facade.SFOverviewInfo as SFOverviewInfo
|
||||
import ghidra.features.bsim.query.facade.SimilarFunctionQueryService as SimilarFunctionQueryService
|
||||
import java.util.HashSet
|
||||
|
||||
SIMILARITY_BOUND = 0.7
|
||||
SIGNIFICANCE_BOUND = 0.0
|
||||
|
||||
funcsToQuery = java.util.HashSet()
|
||||
fIter = currentProgram.getFunctionManager().getFunctionsNoStubs(True)
|
||||
for func in fIter:
|
||||
funcsToQuery.add(func.getSymbol())
|
||||
|
||||
overviewInfo = SFOverviewInfo(funcsToQuery)
|
||||
overviewInfo.setSimilarityThreshold(SIMILARITY_BOUND)
|
||||
overviewInfo.setSignificanceThreshold(SIGNIFICANCE_BOUND)
|
||||
|
||||
queryService = SimilarFunctionQueryService(currentProgram)
|
||||
DB_URL = askString("Enter database URL", "URL:")
|
||||
queryService.initializeDatabase(DB_URL)
|
||||
vectorFactory = queryService.getLSHVectorFactory()
|
||||
|
||||
overviewResults = queryService.overviewSimilarFunctions(overviewInfo, monitor)
|
||||
|
||||
for result in overviewResults.result:
|
||||
print "Name: %s" % result.getBase().getFunctionName()
|
||||
print "Hit Count: %d" % result.getTotalCount()
|
||||
print "Self-significance: %f\n" % vectorFactory.getSelfSignificance(result.getBase().getSignatureRecord().getLSHVector())
|
||||
|
||||
queryService.dispose()
|
83
Ghidra/Features/BSim/ghidra_scripts/ExampleQueryClient.java
Executable file
@ -0,0 +1,83 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
// Example of connecting to a BSim server and requesting executable and function records
|
||||
//@category BSim
|
||||
|
||||
import java.io.StringWriter;
|
||||
import java.net.URL;
|
||||
import java.util.List;
|
||||
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.features.bsim.query.BSimClientFactory;
|
||||
import ghidra.features.bsim.query.FunctionDatabase;
|
||||
import ghidra.features.bsim.query.description.*;
|
||||
import ghidra.features.bsim.query.protocol.*;
|
||||
import ghidra.util.Msg;
|
||||
|
||||
public class ExampleQueryClient extends GhidraScript {
|
||||
|
||||
@Override
|
||||
protected void run() throws Exception {
|
||||
URL url = BSimClientFactory.deriveBSimURL("ghidra://localhost/repo");
|
||||
try (FunctionDatabase client = BSimClientFactory.buildClient(url, false)) {
|
||||
if (!client.initialize()) {
|
||||
Msg.error(this, "Unable to connect to server");
|
||||
return;
|
||||
}
|
||||
|
||||
QueryInfo query = new QueryInfo();
|
||||
ResponseInfo resp = query.execute(client);
|
||||
StringWriter write = new StringWriter();
|
||||
resp.saveXml(write);
|
||||
write.flush();
|
||||
|
||||
QueryName exequery = new QueryName();
|
||||
exequery.spec.exename = "libdocdoxygenplugin.so";
|
||||
ResponseName respname = exequery.execute(client);
|
||||
if (respname == null) {
|
||||
Msg.error(this, client.getLastError());
|
||||
return;
|
||||
}
|
||||
ExecutableRecord erec = respname.manage.getExecutableRecordSet().first();
|
||||
FunctionDescription funcrec =
|
||||
respname.manage.findFunctionByName("DocDoxygenPlugin::createCatalog", erec);
|
||||
|
||||
QueryChildren childquery = new QueryChildren();
|
||||
childquery.md5sum = funcrec.getExecutableRecord().getMd5();
|
||||
childquery.functionKeys.add(new FunctionEntry(funcrec));
|
||||
|
||||
ResponseChildren respchild = childquery.execute(client);
|
||||
if (respchild == null) {
|
||||
Msg.error(this, client.getLastError());
|
||||
return;
|
||||
}
|
||||
for (int i = 0; i < respchild.correspond.size(); ++i) {
|
||||
FunctionDescription func = respchild.correspond.get(i);
|
||||
List<CallgraphEntry> callgraphRecord = func.getCallgraphRecord();
|
||||
if (callgraphRecord != null) {
|
||||
for (int j = 0; j < callgraphRecord.size(); ++j) {
|
||||
write.write(
|
||||
callgraphRecord.get(j).getFunctionDescription().getFunctionName());
|
||||
write.write('\n');
|
||||
}
|
||||
}
|
||||
}
|
||||
write.flush();
|
||||
Msg.info(this, write.toString());
|
||||
}
|
||||
}
|
||||
|
||||
}
|
73
Ghidra/Features/BSim/ghidra_scripts/GenerateSignatures.java
Executable file
@ -0,0 +1,73 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
// Generate signatures for every function in the current executable and write in XML form to
|
||||
// a file in a user-specified working directory.
|
||||
//@category BSim
|
||||
|
||||
import java.io.*;
|
||||
import java.util.Iterator;
|
||||
|
||||
import generic.lsh.vector.LSHVectorFactory;
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.features.bsim.query.FunctionDatabase;
|
||||
import ghidra.features.bsim.query.GenSignatures;
|
||||
import ghidra.features.bsim.query.client.Configuration;
|
||||
import ghidra.features.bsim.query.description.DescriptionManager;
|
||||
import ghidra.program.model.listing.Function;
|
||||
import ghidra.program.model.listing.FunctionManager;
|
||||
|
||||
public class GenerateSignatures extends GhidraScript {
|
||||
|
||||
@Override
|
||||
public void run() throws Exception {
|
||||
final String md5string = currentProgram.getExecutableMD5();
|
||||
if ((md5string == null) || (md5string.length() < 10)) {
|
||||
throw new IOException("Could not get MD5 on file: " + currentProgram.getName());
|
||||
}
|
||||
final String basename = "sigs_" + md5string;
|
||||
System.setProperty("ghidra.output", basename); // Inform parallel controller of output name
|
||||
File file = null;
|
||||
// This form of askDirectory works for both standalone and parallel execution
|
||||
final File workingdir = askDirectory("GenerateSignatures:", "Working directory");
|
||||
if (!workingdir.isDirectory()) {
|
||||
popup("Must select a working directory!");
|
||||
return;
|
||||
}
|
||||
file = new File(workingdir, basename);
|
||||
|
||||
final LSHVectorFactory vectorFactory = FunctionDatabase.generateLSHVectorFactory();
|
||||
final GenSignatures gensig = new GenSignatures(true);
|
||||
final String templatename =
|
||||
askString("GenerateSignatures:", "Database template", "medium_nosize");
|
||||
final Configuration config = FunctionDatabase.loadConfigurationTemplate(templatename);
|
||||
vectorFactory.set(config.weightfactory, config.idflookup, config.info.settings);
|
||||
gensig.setVectorFactory(vectorFactory);
|
||||
gensig.addExecutableCategories(config.info.execats);
|
||||
gensig.addFunctionTags(config.info.functionTags);
|
||||
gensig.addDateColumnName(config.info.dateColumnName);
|
||||
final String repo = "ghidra://localhost/" + state.getProject().getName();
|
||||
final String path = GenSignatures.getPathFromDomainFile(currentProgram);
|
||||
gensig.openProgram(this.currentProgram, null, null, null, repo, path);
|
||||
final FunctionManager fman = currentProgram.getFunctionManager();
|
||||
final Iterator<Function> iter = fman.getFunctions(true);
|
||||
gensig.scanFunctions(iter, fman.getFunctionCount(), monitor);
|
||||
final FileWriter fwrite = new FileWriter(file);
|
||||
final DescriptionManager manager = gensig.getDescriptionManager();
|
||||
manager.saveXml(fwrite);
|
||||
fwrite.close();
|
||||
}
|
||||
|
||||
}
|
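`GenerateSignatures.run()` closes `fwrite` only when `saveXml` returns normally. The sketch below shows the same write wrapped in try-with-resources so the writer is closed even if an exception is thrown. It is a standalone illustration: `SaveXmlSketch`, `XmlSource`, and `save` are hypothetical names, and `XmlSource` merely stands in for any object that exposes a `saveXml(Writer)` method.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;

public class SaveXmlSketch {

    // Hypothetical stand-in for an object that can serialize itself as XML
    interface XmlSource {
        void saveXml(Writer out) throws IOException;
    }

    // Writes the XML to the file; the writer is closed on success and on failure
    static void save(XmlSource source, File file) throws IOException {
        try (Writer out = new FileWriter(file)) {
            source.saveXml(out);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("sigs_", ".xml");
        tmp.deleteOnExit();
        // Placeholder payload, not real BSim signature XML
        save(out -> out.write("<description/>\n"), tmp);
        System.out.print(new String(Files.readAllBytes(tmp.toPath())));
    }
}
```

The same pattern would apply to the `InputStream` opened in `readWeights` of `DumpSignatures.java`.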
58
Ghidra/Features/BSim/ghidra_scripts/GenerateSignatures.py
Executable file
@ -0,0 +1,58 @@
|
||||
## ###
|
||||
# IP: GHIDRA
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
##
|
||||
#Generate signatures for every function in the current program and write them to an XML file in a user-specified directory
|
||||
#@category BSim.python
|
||||
|
||||
import java.lang.System as System
|
||||
import java.io.File as File
|
||||
import ghidra.features.bsim.query.FunctionDatabase as FunctionDatabase
|
||||
import ghidra.features.bsim.query.GenSignatures as GenSignatures
|
||||
import java.io.FileWriter as FileWriter
import java.io.IOException as IOException
|
||||
|
||||
def run():
|
||||
md5String = currentProgram.getExecutableMD5()
|
||||
if (md5String is None) or (len(md5String) < 10):
|
||||
raise IOException("Could not get MD5 on file: " + currentProgram.getName())
|
||||
basename = "sigs_" + md5String
|
||||
System.setProperty("ghidra.output",basename)
|
||||
workingDir = askDirectory("GenerateSignatures:", "Working Directory")
|
||||
if not workingDir.isDirectory():
|
||||
popup("Must select a working directory")
|
||||
return
|
||||
outfile = File(workingDir,basename)
|
||||
vectorFactory = FunctionDatabase.generateLSHVectorFactory()
|
||||
gensig = GenSignatures(True)
|
||||
templateName = askString("GenerateSignatures:", "Database template", "medium_nosize")
|
||||
config = FunctionDatabase.loadConfigurationTemplate(templateName)
|
||||
vectorFactory.set(config.weightfactory, config.idflookup, config.info.settings)
|
||||
gensig.setVectorFactory(vectorFactory)
|
||||
gensig.addExecutableCategories(config.info.execats)
|
||||
gensig.addFunctionTags(config.info.functionTags)
|
||||
gensig.addDateColumnName(config.info.dateColumnName)
|
||||
repo = "ghidra://localhost/" + state.getProject().getName()
|
||||
path = GenSignatures.getPathFromDomainFile(currentProgram)
|
||||
gensig.openProgram(currentProgram,None,None,None,repo,path)
|
||||
fman = currentProgram.getFunctionManager()
|
||||
iter = fman.getFunctions(True)
|
||||
gensig.scanFunctions(iter, fman.getFunctionCount(), monitor)
|
||||
fwrite = FileWriter(outfile)
|
||||
manager = gensig.getDescriptionManager()
|
||||
manager.saveXml(fwrite)
|
||||
fwrite.close()
|
||||
return
|
||||
|
||||
run()
|
||||
|
443
Ghidra/Features/BSim/ghidra_scripts/LocalBSimQueryScript.java
Normal file
@ -0,0 +1,443 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
//Queries all functions in the current selection (or all functions in the current program if
|
||||
//the current selection is null) against all functions in a user-selected program.
|
||||
//@category BSim
|
||||
|
||||
import java.util.*;
|
||||
|
||||
import org.apache.commons.collections4.IteratorUtils;
|
||||
|
||||
import generic.lsh.vector.*;
|
||||
import ghidra.app.decompiler.DecompileException;
|
||||
import ghidra.app.plugin.core.functioncompare.FunctionComparisonProvider;
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.app.services.FunctionComparisonService;
|
||||
import ghidra.app.tablechooser.*;
|
||||
import ghidra.features.bsim.query.*;
|
||||
import ghidra.features.bsim.query.client.Configuration;
|
||||
import ghidra.features.bsim.query.description.FunctionDescription;
|
||||
import ghidra.program.model.address.Address;
|
||||
import ghidra.program.model.listing.*;
|
||||
|
||||
//TODO: docs
|
||||
|
||||
public class LocalBSimQueryScript extends GhidraScript {
|
||||
|
||||
//functions with self significance below this bound will be skipped
|
||||
private static final double SELF_SIGNIFICANCE_BOUND = 15.0;
|
||||
//bsim database template determining the signature settings
|
||||
private static final String TEMPLATE_NAME = "medium_nosize";
|
||||
//these are analogous to the bounds in a bsim query
|
||||
private static final double MATCH_SIMILARITY_LOWER_BOUND = 0.0;
|
||||
private static final double MATCH_CONFIDENCE_LOWER_BOUND = 0.0;
|
||||
private static final int MATCHES_PER_FUNCTION = 10;
|
||||
//decrease this if you only want to see matches that aren't exact
|
||||
//for instance, when looking for changes between two versions of a program
|
||||
private static final double MATCH_SIMILARITY_UPPER_BOUND = 1.0;
|
||||
|
||||
private TableChooserDialog tableDialog;
|
||||
|
||||
@Override
|
||||
protected void run() throws Exception {
|
||||
if (isRunningHeadless()) {
|
||||
popup("This script cannot be run headlessly.");
|
||||
return;
|
||||
}
|
||||
|
||||
Set<Function> sourceFuncs = new HashSet<>();
|
||||
if (currentSelection == null) {
|
||||
IteratorUtils.forEach(currentProgram.getFunctionManager().getFunctions(true),
|
||||
x -> sourceFuncs.add(x));
|
||||
}
|
||||
else {
|
||||
IteratorUtils.forEach(
|
||||
currentProgram.getFunctionManager().getFunctionsOverlapping(currentSelection),
|
||||
x -> sourceFuncs.add(x));
|
||||
}
|
||||
|
||||
if (sourceFuncs.isEmpty()) {
|
||||
this.popup("No non-stub functions to query!");
|
||||
return;
|
||||
}
|
||||
|
||||
Program targetProgram = askProgram("Select Target Program");
|
||||
if (targetProgram == null) {
|
||||
return;
|
||||
}
|
||||
try {
|
||||
List<LocalBSimMatch> localMatches = null;
|
||||
|
||||
//use special optimized method when the target program is the same as the current program
|
||||
//in that case, a given function might be in both the source and target sets
|
||||
//but we only want to generate signatures for it once
|
||||
if (currentProgram.getUniqueProgramID() == targetProgram.getUniqueProgramID()) {
|
||||
localMatches = getMatchesCurrentProgram(sourceFuncs);
|
||||
}
|
||||
else {
|
||||
//in this case there is no overlap between the source and target functions
|
||||
localMatches = getMatchesTwoPrograms(sourceFuncs, currentProgram, targetProgram);
|
||||
}
|
||||
if (localMatches.isEmpty()) {
|
||||
popup("No matches meeting criteria.");
|
||||
return;
|
||||
}
|
||||
Collections.sort(localMatches);
|
||||
initializeTable(currentProgram, targetProgram);
|
||||
|
||||
//again, use an optimized method for the special case when target program is the same
|
||||
//as the current program
|
||||
if (currentProgram.getUniqueProgramID() == targetProgram.getUniqueProgramID()) {
|
||||
addMatchesOneProgram(localMatches, sourceFuncs);
|
||||
}
|
||||
else {
|
||||
addMatchesTwoPrograms(localMatches);
|
||||
}
|
||||
}
|
||||
finally {
|
||||
targetProgram.release(this);
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Iterate through the list of sorted matches, adding the top MATCHES_PER_FUNCTION elements
|
||||
* to the table for each source function.
|
||||
* @param localMatches matches in decreasing order of confidence
|
||||
*/
|
||||
private void addMatchesTwoPrograms(List<LocalBSimMatch> localMatches) {
|
||||
Map<Function, Integer> matchCounts = new HashMap<>();
|
||||
for (LocalBSimMatch match : localMatches) {
|
||||
int count = matchCounts.getOrDefault(match.getSourceFunc(), 0);
|
||||
if (count >= MATCHES_PER_FUNCTION) {
|
||||
continue;
|
||||
}
|
||||
tableDialog.add(match);
|
||||
matchCounts.put(match.getSourceFunc(), count + 1);
|
||||
}
|
||||
}
|
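The per-function cap applied by `addMatchesTwoPrograms` can be sketched outside Ghidra as follows. This is an illustration under stated assumptions: `TopMatchesSketch`, `Match`, and `capPerSource` are hypothetical names, `Match` stands in for `LocalBSimMatch` with a string key instead of a `Function`, and the input list is assumed to be already sorted best-first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopMatchesSketch {

    // Hypothetical stand-in for LocalBSimMatch
    static final class Match {
        final String source;
        final double significance;

        Match(String source, double significance) {
            this.source = source;
            this.significance = significance;
        }
    }

    // Keeps at most 'limit' matches per source, preserving the input order
    static List<Match> capPerSource(List<Match> sorted, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        List<Match> kept = new ArrayList<>();
        for (Match match : sorted) {
            int count = counts.getOrDefault(match.source, 0);
            if (count >= limit) {
                continue; // this source already has its quota
            }
            kept.add(match);
            counts.put(match.source, count + 1);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Match> sorted = Arrays.asList(new Match("f", 9.0), new Match("f", 8.0),
            new Match("g", 7.0), new Match("f", 6.0));
        for (Match m : capPerSource(sorted, 2)) {
            System.out.println(m.source + " " + m.significance);
        }
    }
}
```

Because the list is sorted before the loop runs, a single pass with a counter per key is enough; no per-function sorting is needed.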
||||
|
||||
/**
|
||||
* Iterate through the list of sorted matches, adding the top MATCHES_PER_FUNCTION elements
|
||||
* to the table for each function in {@code sourceFuncSet}.
|
||||
*
|
||||
* By construction, the matches in this list have the "source" function before the "target"
|
||||
* function (in address order). This is an optimization to prevent essentially the same
|
||||
* data from appearing in the list twice (since the BSim similarity and confidence operations
|
||||
* are commutative). So, for each match, we need to check whether the source or the
|
||||
* target are in {@code sourceFuncSet}.
|
||||
*
|
||||
* @param localMatches matches in decreasing order of confidence
|
||||
* @param sourceFuncSet source functions
|
||||
*/
|
||||
private void addMatchesOneProgram(List<LocalBSimMatch> localMatches,
|
||||
Set<Function> sourceFuncSet) {
|
||||
Map<Function, Integer> matchCounts = new HashMap<>();
|
||||
for (LocalBSimMatch match : localMatches) {
|
||||
Function leftFunc = match.getSourceFunc();
|
||||
int leftCount = matchCounts.getOrDefault(leftFunc, 0);
|
||||
if (sourceFuncSet.contains(leftFunc) && leftCount < MATCHES_PER_FUNCTION) {
|
||||
tableDialog.add(match);
|
||||
matchCounts.put(leftFunc, leftCount + 1);
|
||||
}
|
||||
Function rightFunc = match.getTargetFunc();
|
||||
int rightCount = matchCounts.getOrDefault(rightFunc, 0);
|
||||
if (sourceFuncSet.contains(rightFunc) && rightCount < MATCHES_PER_FUNCTION) {
|
||||
LocalBSimMatch switched = new LocalBSimMatch(rightFunc, leftFunc,
|
||||
match.getSimilarity(), match.getSignificance());
|
||||
tableDialog.add(switched);
|
||||
matchCounts.put(rightFunc, rightCount + 1);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
private List<LocalBSimMatch> getMatchesCurrentProgram(Set<Function> funcs)
|
||||
throws LSHException, DecompileException {
|
||||
List<LocalBSimMatch> bsimMatches = new ArrayList<>();
|
||||
LSHVectorFactory vectorFactory = getVectorFactory();
|
||||
|
||||
//generate the signatures for *all* functions in the program...
|
||||
FunctionManager fman = currentProgram.getFunctionManager();
|
||||
Iterator<Function> iter = fman.getFunctions(true);
|
||||
GenSignatures gensig =
|
||||
generateSignatures(currentProgram, iter, fman.getFunctionCount(), vectorFactory);
|
||||
|
||||
//...but use sourceFuncAddrs to ensure that source functions are in the
|
||||
//funcs set
|
||||
Set<Long> sourceFuncAddrs = new HashSet<>();
|
||||
for (Function func : funcs) {
|
||||
sourceFuncAddrs.add(func.getEntryPoint().getOffset());
|
||||
}
|
||||
Iterator<FunctionDescription> sourceDescripts =
|
||||
gensig.getDescriptionManager().listAllFunctions();
|
||||
VectorCompare vecCompare = new VectorCompare();
|
||||
while (sourceDescripts.hasNext()) {
|
||||
FunctionDescription srcDesc = sourceDescripts.next();
|
||||
//skip if not in selection
|
||||
if (!sourceFuncAddrs.contains(srcDesc.getAddress())) {
|
||||
continue;
|
||||
}
|
||||
//skip if self-significance too small
|
||||
LSHVector srcVector = srcDesc.getSignatureRecord().getLSHVector();
|
||||
if (vectorFactory.getSelfSignificance(srcVector) <= SELF_SIGNIFICANCE_BOUND) {
|
||||
continue;
|
||||
}
|
||||
Iterator<FunctionDescription> targetDescripts =
|
||||
gensig.getDescriptionManager().listAllFunctions();
|
||||
Function srcFunc = getFunction(currentProgram, srcDesc.getAddress());
|
||||
while (targetDescripts.hasNext()) {
|
||||
//skip if target before srcFunc in address order
|
||||
//AND target is one of the source functions (i.e., in funcs)
|
||||
FunctionDescription targetDesc = targetDescripts.next();
|
||||
long targetAddress = targetDesc.getAddress();
|
||||
if (sourceFuncAddrs.contains(targetAddress) &&
|
||||
targetAddress <= srcDesc.getAddress()) {
|
||||
continue;
|
||||
}
|
||||
//skip if self-significance too small
|
||||
LSHVector targetVector = targetDesc.getSignatureRecord().getLSHVector();
|
||||
if (vectorFactory.getSelfSignificance(targetVector) <= SELF_SIGNIFICANCE_BOUND) {
|
||||
continue;
|
||||
}
|
||||
double sim = srcVector.compare(targetVector, vecCompare);
|
||||
double sig = vectorFactory.calculateSignificance(vecCompare);
|
||||
if (sig >= MATCH_CONFIDENCE_LOWER_BOUND && MATCH_SIMILARITY_LOWER_BOUND <= sim &&
|
||||
sim <= MATCH_SIMILARITY_UPPER_BOUND) {
|
||||
Function targetFunc = getFunction(currentProgram, targetDesc.getAddress());
|
||||
bsimMatches.add(new LocalBSimMatch(srcFunc, targetFunc, sim, sig));
|
||||
}
|
||||
}
|
||||
}
|
||||
return bsimMatches;
|
||||
}
|
||||
|
||||
private List<LocalBSimMatch> getMatchesTwoPrograms(Set<Function> srcFuncs,
|
||||
Program sourceProgram, Program targetProgram) throws LSHException, DecompileException {
|
||||
List<LocalBSimMatch> bsimMatches = new ArrayList<>();
|
||||
LSHVectorFactory vectorFactory = getVectorFactory();
|
||||
GenSignatures srcSigs =
|
||||
generateSignatures(sourceProgram, srcFuncs.iterator(), srcFuncs.size(), vectorFactory);
|
||||
FunctionManager targetFuncMan = targetProgram.getFunctionManager();
|
||||
Iterator<Function> targetFuncIter = targetFuncMan.getFunctions(true);
|
||||
GenSignatures targetSigs = generateSignatures(targetProgram, targetFuncIter,
|
||||
targetFuncMan.getFunctionCount(), vectorFactory);
|
||||
Iterator<FunctionDescription> sourceDescripts =
|
||||
srcSigs.getDescriptionManager().listAllFunctions();
|
||||
VectorCompare vecCompare = new VectorCompare();
|
||||
while (sourceDescripts.hasNext()) {
|
||||
FunctionDescription srcDesc = sourceDescripts.next();
|
||||
//skip if self-significance too small
|
||||
LSHVector srcVector = srcDesc.getSignatureRecord().getLSHVector();
|
||||
if (vectorFactory.getSelfSignificance(srcVector) <= SELF_SIGNIFICANCE_BOUND) {
|
||||
continue;
|
||||
}
|
||||
Iterator<FunctionDescription> targetDescripts =
|
||||
targetSigs.getDescriptionManager().listAllFunctions();
|
||||
Function srcFunc = getFunction(sourceProgram, srcDesc.getAddress());
|
||||
while (targetDescripts.hasNext()) {
|
||||
FunctionDescription targetDesc = targetDescripts.next();
|
||||
//skip if self-significance too small
|
||||
LSHVector targetVector = targetDesc.getSignatureRecord().getLSHVector();
|
||||
if (vectorFactory.getSelfSignificance(targetVector) <= SELF_SIGNIFICANCE_BOUND) {
|
||||
continue;
|
||||
}
|
||||
double sim = srcVector.compare(targetVector, vecCompare);
|
||||
double sig = vectorFactory.calculateSignificance(vecCompare);
|
||||
if (sig >= MATCH_CONFIDENCE_LOWER_BOUND && MATCH_SIMILARITY_LOWER_BOUND <= sim &&
|
||||
sim <= MATCH_SIMILARITY_UPPER_BOUND) {
|
||||
Function targetFunc = getFunction(targetProgram, targetDesc.getAddress());
|
||||
bsimMatches.add(new LocalBSimMatch(srcFunc, targetFunc, sim, sig));
|
||||
}
|
||||
}
|
||||
}
|
||||
return bsimMatches;
|
||||
}
|
||||
|
||||
private Function getFunction(Program program, long offset) {
|
||||
Address addr = program.getAddressFactory().getDefaultAddressSpace().getAddress(offset);
|
||||
return program.getFunctionManager().getFunctionAt(addr);
|
||||
}
|
||||
|
||||
private LSHVectorFactory getVectorFactory() throws LSHException {
|
||||
LSHVectorFactory vectorFactory = FunctionDatabase.generateLSHVectorFactory();
|
||||
Configuration config = FunctionDatabase.loadConfigurationTemplate(TEMPLATE_NAME);
|
||||
vectorFactory.set(config.weightfactory, config.idflookup, config.info.settings);
|
||||
return vectorFactory;
|
||||
}
|
||||
|
||||
private GenSignatures generateSignatures(Program program, Iterator<Function> funcs, int count,
|
||||
LSHVectorFactory vectorFactory) throws LSHException, DecompileException {
|
||||
GenSignatures gensig = new GenSignatures(false);
|
||||
gensig.setVectorFactory(vectorFactory);
|
||||
gensig.openProgram(program, null, null, null, null, null);
|
||||
gensig.scanFunctions(funcs, count, monitor);
|
||||
return gensig;
|
||||
}
|
||||
|
||||
class LocalBSimMatch implements Comparable<LocalBSimMatch>, AddressableRowObject {
|
||||
private Function sourceFunc;
|
||||
private Function targetFunc;
|
||||
private double similarity;
|
||||
private double significance;
|
||||
|
||||
public LocalBSimMatch(Function sourceFunc, Function targetFunc, double sim, double signif) {
|
||||
this.sourceFunc = sourceFunc;
|
||||
this.targetFunc = targetFunc;
|
||||
this.similarity = sim;
|
||||
this.significance = signif;
|
||||
}
|
||||
|
||||
public Function getSourceFunc() {
|
||||
return sourceFunc;
|
||||
}
|
||||
|
||||
public Function getTargetFunc() {
|
||||
return targetFunc;
|
||||
}
|
||||
|
||||
public double getSimilarity() {
|
||||
return similarity;
|
||||
}
|
||||
|
||||
public double getSignificance() {
|
||||
return significance;
|
||||
}
|
||||
|
||||
public Program getSourceProgram() {
|
||||
return sourceFunc.getProgram();
|
||||
}
|
||||
|
||||
public Program getTargetProgram() {
|
||||
return targetFunc.getProgram();
|
||||
}
|
||||
|
||||
@Override
|
||||
public int compareTo(LocalBSimQueryScript.LocalBSimMatch o) {
|
||||
return -Double.compare(significance, o.significance);
|
||||
}
|
||||
|
||||
@Override
|
||||
public Address getAddress() {
|
||||
return sourceFunc.getEntryPoint();
|
||||
}
|
||||
}
|
||||
|
||||
/****************************************************************************************
|
||||
* table stuff
|
||||
****************************************************************************************/
|
||||
|
||||
class CompareMatchesExecutor implements TableChooserExecutor {
|
||||
|
||||
private FunctionComparisonService compareService;
|
||||
private FunctionComparisonProvider comparisonProvider;
|
||||
|
||||
public CompareMatchesExecutor() {
|
||||
compareService = state.getTool().getService(FunctionComparisonService.class);
|
||||
}
|
||||
|
||||
@Override
|
||||
public String getButtonName() {
|
||||
return "Compare Selected Matches";
|
||||
}
|
||||
|
||||
@Override
|
||||
public boolean execute(AddressableRowObject rowObject) {
|
||||
LocalBSimMatch match = (LocalBSimMatch) rowObject;
|
||||
if (comparisonProvider == null) {
|
||||
comparisonProvider =
|
||||
compareService.compareFunctions(match.getSourceFunc(), match.getTargetFunc());
|
||||
}
|
||||
else {
|
||||
compareService.compareFunctions(match.getSourceFunc(), match.getTargetFunc(),
|
||||
comparisonProvider);
|
||||
}
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
private void initializeTable(Program sourceProgram, Program targetProgram) {
|
||||
StringBuilder titleBuilder = new StringBuilder("Local BSim Matches: ");
|
||||
titleBuilder.append(sourceProgram.getDomainFile().getPathname());
|
||||
titleBuilder.append(" -> ");
|
||||
titleBuilder.append(targetProgram.getDomainFile().getPathname());
|
||||
tableDialog =
|
||||
createTableChooserDialog(titleBuilder.toString(), new CompareMatchesExecutor());
|
||||
configureTableColumns(tableDialog);
|
||||
tableDialog.setMinimumSize(800, 400);
|
||||
tableDialog.show();
|
||||
tableDialog.setMessage(null);
|
||||
}
|
||||
|
||||
private void configureTableColumns(TableChooserDialog dialog) {
|
||||
|
||||
ColumnDisplay<Double> simColumn = new AbstractComparableColumnDisplay<Double>() {
|
||||
|
||||
@Override
|
||||
public Double getColumnValue(AddressableRowObject rowObject) {
|
||||
return ((LocalBSimMatch) rowObject).getSimilarity();
|
||||
}
|
||||
|
||||
@Override
|
||||
public String getColumnName() {
|
||||
return "Similarity";
|
||||
}
|
||||
};
|
||||
|
||||
ColumnDisplay<Double> sigColumn = new AbstractComparableColumnDisplay<Double>() {
|
||||
|
||||
@Override
|
||||
public Double getColumnValue(AddressableRowObject rowObject) {
|
||||
return ((LocalBSimMatch) rowObject).getSignificance();
|
||||
}
|
||||
|
||||
@Override
|
||||
public String getColumnName() {
|
||||
return "Significance";
|
||||
}
|
||||
};
|
||||
|
||||
StringColumnDisplay sourceFuncColumn = new StringColumnDisplay() {
|
||||
|
||||
@Override
|
||||
public String getColumnValue(AddressableRowObject rowObject) {
|
||||
return ((LocalBSimMatch) rowObject).getSourceFunc().getName(true);
|
||||
}
|
||||
|
||||
@Override
|
||||
public String getColumnName() {
|
||||
return "Source Function";
|
||||
}
|
||||
};
|
||||
|
||||
StringColumnDisplay targetFuncColumn = new StringColumnDisplay() {
|
||||
|
||||
@Override
|
||||
public String getColumnValue(AddressableRowObject rowObject) {
|
||||
return ((LocalBSimMatch) rowObject).getTargetFunc().getName(true);
|
||||
}
|
||||
|
||||
@Override
|
||||
public String getColumnName() {
|
||||
return "Target Function";
|
||||
}
|
||||
};
|
||||
|
||||
dialog.addCustomColumn(simColumn);
|
||||
dialog.addCustomColumn(sigColumn);
|
||||
dialog.addCustomColumn(sourceFuncColumn);
|
||||
dialog.addCustomColumn(targetFuncColumn);
|
||||
}
|
||||
|
||||
}
|
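The single-program matching loop above can be sketched without any Ghidra classes. This is a minimal illustration, not the script's implementation: `LocalMatchSketch`, its threshold values, and the toy scoring (cosine similarity for "similarity", raw dot product for "significance") are all assumptions made here for demonstration. What it does share with the script is the control flow: skip targets at or before the source in address order so each pair is compared once, keep only pairs inside the similarity and significance bounds, and order results by descending significance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the script's matching loop; no Ghidra types are used.
final class LocalMatchSketch {

    // Illustrative thresholds only; the script defines its own constants elsewhere.
    static final double SIM_LOWER = 0.7;
    static final double SIM_UPPER = 1.0;
    static final double SIG_LOWER = 1.0;

    static final class Match implements Comparable<Match> {
        final long source;
        final long target;
        final double sim;
        final double sig;

        Match(long source, long target, double sim, double sig) {
            this.source = source;
            this.target = target;
            this.sim = sim;
            this.sig = sig;
        }

        @Override
        public int compareTo(Match other) {
            // Descending significance, like LocalBSimMatch.compareTo
            return Double.compare(other.sig, sig);
        }
    }

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    static List<Match> findMatches(long[] addrs, double[][] vecs) {
        List<Match> matches = new ArrayList<>();
        for (int i = 0; i < addrs.length; i++) {
            for (int j = 0; j < addrs.length; j++) {
                // Skip self and any target at a lower address: each pair is visited once
                if (addrs[j] <= addrs[i]) {
                    continue;
                }
                double sig = dot(vecs[i], vecs[j]); // toy "significance"
                double norm = Math.sqrt(dot(vecs[i], vecs[i]) * dot(vecs[j], vecs[j]));
                double sim = norm == 0.0 ? 0.0 : Math.min(1.0, sig / norm); // toy "similarity"
                if (sig >= SIG_LOWER && SIM_LOWER <= sim && sim <= SIM_UPPER) {
                    matches.add(new Match(addrs[i], addrs[j], sim, sig));
                }
            }
        }
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) {
        long[] addrs = { 0x3000L, 0x1000L, 0x2000L, 0x4000L };
        double[][] vecs = { { 1, 1, 0 }, { 2, 2, 0 }, { 0, 0, 3 }, { 3, 3, 0 } };
        for (Match m : findMatches(addrs, vecs)) {
            System.out.printf("%x -> %x sim=%.2f sig=%.1f%n", m.source, m.target, m.sim, m.sig);
        }
    }
}
```

The address-order check is what keeps a pair (A, B) from also being reported as (B, A) when both functions belong to the selected set.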
108
Ghidra/Features/BSim/ghidra_scripts/QueryFunction.java
Executable file
@ -0,0 +1,108 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
// Example of querying a BSim database about a single function
|
||||
//@category BSim
|
||||
|
||||
import java.net.URL;
|
||||
import java.util.Iterator;
|
||||
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.features.bsim.query.*;
|
||||
import ghidra.features.bsim.query.description.*;
|
||||
import ghidra.features.bsim.query.protocol.*;
|
||||
import ghidra.program.model.listing.Function;
|
||||
|
||||
|
||||
public class QueryFunction extends GhidraScript {
|
||||
|
||||
//GenSignatures gensig;
|
||||
//FunctionDatabase database;
|
||||
private static final int MATCHES_PER_FUNC = 10;
|
||||
private static final double SIMILARITY_BOUND = 0.7;
|
||||
private static final double CONFIDENCE_BOUND = 0.0;
|
||||
|
||||
@Override
|
||||
public void run() throws Exception {
|
||||
if (currentProgram == null) {
|
||||
return;
|
||||
}
|
||||
Function func = this.getFunctionContaining(this.currentAddress);
|
||||
if (func == null) {
|
||||
popup("No function selected!");
|
||||
return;
|
||||
}
|
||||
|
||||
String DATABASE_URL = askString("Enter Database URL", "URL");
|
||||
URL url = BSimClientFactory.deriveBSimURL(DATABASE_URL);
|
||||
try (FunctionDatabase database = BSimClientFactory.buildClient(url, false)) {
|
||||
if (!database.initialize()) {
|
||||
println(database.getLastError().message);
|
||||
return;
|
||||
}
|
||||
|
||||
GenSignatures gensig = new GenSignatures(false);
|
||||
try {
|
||||
gensig.setVectorFactory(database.getLSHVectorFactory());
|
||||
gensig.openProgram(currentProgram, null, null, null, null, null);
|
||||
|
||||
DescriptionManager manager = gensig.getDescriptionManager();
|
||||
gensig.scanFunction(func);
|
||||
|
||||
QueryNearest query = new QueryNearest();
|
||||
query.manage = manager;
|
||||
query.max = MATCHES_PER_FUNC;
|
||||
query.thresh = SIMILARITY_BOUND;
|
||||
query.signifthresh = CONFIDENCE_BOUND;
|
||||
|
||||
ResponseNearest response = query.execute(database);
|
||||
if (response == null) {
|
||||
println(database.getLastError().message);
|
||||
return;
|
||||
}
|
||||
Iterator<SimilarityResult> iter = response.result.iterator();
|
||||
StringBuilder buf = new StringBuilder();
|
||||
while (iter.hasNext()) {
|
||||
SimilarityResult sim = iter.next();
|
||||
FunctionDescription base = sim.getBase();
|
||||
ExecutableRecord exe = base.getExecutableRecord();
|
||||
buf.append("\nExecutable: ")
|
||||
.append(exe.getNameExec())
|
||||
.append("\nFunction: ")
|
||||
.append(base.getFunctionName())
|
||||
.append('\n');
|
||||
Iterator<SimilarityNote> subiter = sim.iterator();
|
||||
while (subiter.hasNext()) {
|
||||
SimilarityNote note = subiter.next();
|
||||
FunctionDescription fdesc = note.getFunctionDescription();
|
||||
ExecutableRecord exerec = fdesc.getExecutableRecord();
|
||||
buf.append(" Executable: ");
|
||||
buf.append(exerec.getNameExec())
|
||||
.append("\n Matching Function name: ")
|
||||
.append(fdesc.getFunctionName());
|
||||
buf.append("\n Similarity: ").append(note.getSimilarity());
|
||||
buf.append("\n Significance: ").append(note.getSignificance());
|
||||
buf.append("\n\n");
|
||||
}
|
||||
}
|
||||
println(buf.toString());
|
||||
}
|
||||
finally {
|
||||
gensig.dispose();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
}
|
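QueryFunction.java releases its resources in a fixed order: the signature generator is disposed in a finally block, and the database is closed by the enclosing try-with-resources. The sketch below reproduces only that ordering with hypothetical stand-ins (`FakeDatabase`, `FakeGenerator`, and the event names are invented here and are not Ghidra classes), so the behavior on both the normal and the failing path can be checked in isolation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins that record the order in which cleanup happens.
final class CleanupOrderSketch {

    static final class FakeDatabase implements AutoCloseable {
        private final List<String> events;

        FakeDatabase(List<String> events) {
            this.events = events;
        }

        boolean initialize() {
            events.add("initialize");
            return true;
        }

        @Override
        public void close() {
            events.add("database.close");
        }
    }

    static final class FakeGenerator {
        private final List<String> events;

        FakeGenerator(List<String> events) {
            this.events = events;
        }

        void dispose() {
            events.add("gensig.dispose");
        }
    }

    static void runQuery(List<String> events, boolean failDuringQuery) {
        // try-with-resources closes the database even if the query throws
        try (FakeDatabase database = new FakeDatabase(events)) {
            if (!database.initialize()) {
                return;
            }
            FakeGenerator gensig = new FakeGenerator(events);
            try {
                events.add("query");
                if (failDuringQuery) {
                    throw new IllegalStateException("simulated query failure");
                }
            }
            finally {
                // Runs before the database is closed, on success and on failure
                gensig.dispose();
            }
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        runQuery(events, false);
        System.out.println(events);
    }
}
```

The inner finally runs before the resource is closed, so the generator is always released while the database connection is still open.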
78
Ghidra/Features/BSim/ghidra_scripts/QueryFunction.py
Executable file
@ -0,0 +1,78 @@
|
||||
## ###
|
||||
# IP: GHIDRA
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
##
|
||||
# Example of performing a BSim query on a single function
|
||||
# @category BSim.python
|
||||
|
||||
import ghidra.features.bsim.query.BSimClientFactory as BSimClientFactory
|
||||
import ghidra.features.bsim.query.GenSignatures as GenSignatures
|
||||
import ghidra.features.bsim.query.protocol.QueryNearest as QueryNearest
|
||||
|
||||
MATCHES_PER_FUNC = 100
|
||||
SIMILARITY_BOUND = 0.7
|
||||
CONFIDENCE_BOUND = 0.0
|
||||
|
||||
def query(func):
|
||||
DATABASE_URL = askString("Enter Database URL", "URL")
|
||||
url = BSimClientFactory.deriveBSimURL(DATABASE_URL)
|
||||
database = BSimClientFactory.buildClient(url,False)
|
||||
if not database.initialize():
|
||||
print database.getLastError().message
|
||||
return
|
||||
gensig = GenSignatures(False)
|
||||
gensig.setVectorFactory(database.getLSHVectorFactory())
|
||||
gensig.openProgram(currentProgram,None,None,None,None,None)
|
||||
|
||||
gensig.scanFunction(func)
|
||||
|
||||
query = QueryNearest()
|
||||
query.manage = gensig.getDescriptionManager()
|
||||
query.max = MATCHES_PER_FUNC
|
||||
query.thresh = SIMILARITY_BOUND
|
||||
query.signifthresh = CONFIDENCE_BOUND
|
||||
|
||||
response = database.query(query)
|
||||
if response is None:
|
||||
print database.getLastError().message
|
||||
return
|
||||
simIter = response.result.iterator()
|
||||
while simIter.hasNext():
|
||||
sim = simIter.next()
|
||||
base = sim.getBase()
|
||||
exe = base.getExecutableRecord()
|
||||
print "Source executable: %s; source function: %s" % (exe.getNameExec(),base.getFunctionName())
|
||||
subIter = sim.iterator()
|
||||
while subIter.hasNext():
|
||||
note = subIter.next()
|
||||
fdesc = note.getFunctionDescription()
|
||||
exerec = fdesc.getExecutableRecord()
|
||||
print " Executable: %s" % exerec.getNameExec()
|
||||
print " Matching Function name: %s " % fdesc.getFunctionName()
|
||||
print " Similarity: %f" % note.getSimilarity()
|
||||
print " Significance: %f\n" % note.getSignificance()
|
||||
gensig.dispose()
|
||||
database.close()
|
||||
return
|
||||
|
||||
if currentProgram is None:
|
||||
popup("currentProgram is None!")
|
||||
else:
|
||||
func = currentProgram.getFunctionManager().getFunctionContaining(currentAddress)
|
||||
if func is None:
|
||||
popup("Cursor must be in a function!")
|
||||
else:
|
||||
query(func)
|
||||
|
||||
|
333
Ghidra/Features/BSim/ghidra_scripts/QueryWithFiltersScript.java
Executable file
@ -0,0 +1,333 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
//Example of a script to perform a more involved BSim query.
|
||||
//@category BSim
|
||||
import java.util.*;
|
||||
import java.util.function.BiPredicate;
|
||||
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.features.bsim.gui.filters.*;
|
||||
import ghidra.features.bsim.gui.search.results.BSimMatchResult;
|
||||
import ghidra.features.bsim.gui.search.results.ExecutableResult;
|
||||
import ghidra.features.bsim.query.FunctionDatabase;
|
||||
import ghidra.features.bsim.query.FunctionDatabase.ErrorCategory;
|
||||
import ghidra.features.bsim.query.description.FunctionDescription;
|
||||
import ghidra.features.bsim.query.facade.*;
|
||||
import ghidra.features.bsim.query.protocol.BSimFilter;
|
||||
import ghidra.features.bsim.query.protocol.PreFilter;
|
||||
import ghidra.program.database.symbol.FunctionSymbol;
|
||||
import ghidra.program.model.address.Address;
|
||||
import ghidra.program.model.listing.*;
|
||||
import ghidra.program.model.symbol.SourceType;
|
||||
import ghidra.util.exception.CancelledException;
|
||||
|
||||
/**
|
||||
* Script showing how to apply filters to a BSim query. Currently we support three types
|
||||
* of filters, described below:
|
||||
*
|
||||
* 1. QUERY THRESHOLDS
|
||||
* These are the items at the top of the BSim query dialog:
|
||||
* Similarity
|
||||
* Confidence
|
||||
* Matches per Function
|
||||
* These are server-side filters that will be applied when the db is queried.
|
||||
*
|
||||
* 2. PREFILTERS
|
||||
* Allows users to identify functions that meet certain criteria by specifying
|
||||
* {@link BiPredicate}s. Any functions matching the predicate(s) will be included
|
||||
* in the result set.
|
||||
*
|
||||
* 3. EXECUTABLE FILTERS
|
||||
* These are predefined filters that can be applied on the server or on the
|
||||
* client (applied only to the results of a query). On the BSim query
|
||||
* dialog these are the items in the filter pulldown menu.
|
||||
* @see BSimFilterType
|
||||
*
|
||||
* SCRIPT FLOW
|
||||
* This example script does the following:
|
||||
*
|
||||
* 1) Set threshold filters
|
||||
* 2) Set prefilters
|
||||
* 3) Set executable filters
|
||||
* 4) Query the database & print results
|
||||
* 5) Set new executable filters
|
||||
* 6) Print results
|
||||
*
|
||||
* NOTES: 1. You will be queried for the location of the BSim database. This URL
|
||||
* will take the form "ghidra://<ip address>/<database name>"
|
||||
*
|
||||
* 2. This script is only an example - the specific filters demonstrated
|
||||
* here will not necessarily apply to what's in your BSim database.
|
||||
*
|
||||
*/
|
||||
public class QueryWithFiltersScript extends GhidraScript {
|
||||
|
||||
// Threshold settings.
|
||||
private static final int MAX_NUM_FUNCTIONS = 100;
|
||||
private static final double SIMILARITY_BOUND = 0.7;
|
||||
private static final double SIGNIFICANCE_BOUND = 0.0;
|
||||
|
||||
// Restricts the number of results.
|
||||
private static final int NUM_EXES_TO_DISPLAY = 10;
|
||||
|
||||
// Prefilter value we'll be setting.
|
||||
private static final double SELF_SIGNIFICANCE_BOUND = 40.0;
|
||||
|
||||
private HashSet<FunctionSymbol> funcsToQuery;
|
||||
private SimilarFunctionQueryService queryService;
|
||||
private SFQueryInfo queryInfo;
|
||||
private BSimFilter bsimFilter;
|
||||
|
||||
@Override
|
||||
protected void run() throws Exception {
|
||||
|
||||
funcsToQuery = getFunctionsToQuery(currentProgram);
|
||||
queryService = new SimilarFunctionQueryService(currentProgram);
|
||||
queryInfo = new SFQueryInfo(funcsToQuery);
|
||||
bsimFilter = queryInfo.getBsimFilter();
|
||||
|
||||
// Add threshold filters.
|
||||
queryInfo.setMaximumResults(MAX_NUM_FUNCTIONS);
|
||||
queryInfo.setSimilarityThreshold(SIMILARITY_BOUND);
|
||||
queryInfo.setSignificanceThreshold(SIGNIFICANCE_BOUND);
|
||||
|
||||
// Add prefilters.
|
||||
setPrefilters();
|
||||
|
||||
// Add a simple date filter.
|
||||
addBsimFilter(new DateLaterBSimFilterType(""), "01/01/1776");
|
||||
|
||||
// Demonstration of a filter that allows for multiple entries. All filters but the
|
||||
// DateEarlier and DateLater allow this. The effect is that each filter will be OR'd
|
||||
// with the others. This is effectively the same as creating three distinct ArchEquals filters.
|
||||
//
|
||||
// ie: "The architecture can equal x86:LE:64:default OR the architecture can equal
|
||||
// ARM:LE:32:v4 OR ...."
|
||||
addBsimFilter(new ArchitectureBSimFilterType(),
|
||||
"x86:LE:64:default, x86:LE:32:default, ARM:LE:32:v4");
|
||||
|
||||
// Another filter with multiple entries, but in this case since it is a "NotEqual" filter,
|
||||
// the items are AND'd together.
|
||||
//
|
||||
// ie: "The compiler cannot equal windows AND the compiler cannot equal foo_compiler".
|
||||
addBsimFilter(new NotCompilerBSimFilterType(), "windows, foo_compiler");
|
||||
|
||||
//connect to the database
|
||||
try {
|
||||
String dbUrl =
|
||||
askString("", "Enter the URL of the BSim database:", "ghidra://localhost/bsimDb");
|
||||
queryService.initializeDatabase(dbUrl);
|
||||
FunctionDatabase.Error error = queryService.getLastError();
|
||||
if (error != null && error.category == ErrorCategory.Nodatabase) {
|
||||
println("Database [" + dbUrl + "] cannot be found (does it exist?)");
|
||||
return;
|
||||
}
|
||||
}
|
||||
catch (QueryDatabaseException e) {
|
||||
println(e.getMessage());
|
||||
return;
|
||||
}
|
||||
|
||||
// Execute query and print results.
|
||||
List<BSimMatchResult> resultRows = executeQuery(queryInfo);
|
||||
printFunctionQueryResults(resultRows, "\nFunction-level results before filtering");
|
||||
|
||||
// Add some simple post-query filters. These filters will only be applied to the result
|
||||
// set returned from the previous query.
|
||||
addBsimFilter(new Md5BSimFilterType(), currentProgram.getExecutableMD5());
|
||||
addBsimFilter(new CompilerBSimFilterType(), "gcc");
|
||||
addBsimFilter(new FunctionTagBSimFilterType("KNOWN_LIBRARY", queryService),
|
||||
"false");
|
||||
|
||||
// Apply the filters and print results.
|
||||
List<BSimMatchResult> filteredRows =
|
||||
BSimMatchResult.filterMatchRows(bsimFilter, resultRows);
|
||||
printFunctionQueryResults(filteredRows, "\nFunction-level results after filtering");
|
||||
printExecutableInformation(filteredRows);
|
||||
}
|
||||
|
||||
@Override
|
||||
public void cleanup(boolean success) {
|
||||
if (queryService != null) {
|
||||
queryService.dispose();
|
||||
}
|
||||
}
|
||||
|
||||
/***********************************************************************
|
||||
* PRIVATE METHODS
|
||||
***********************************************************************/
|
||||
|
||||
/**
|
||||
* Adds a filter to the given filter container.
|
||||
*
|
||||
* @param filterTemplate the filter type to add
|
||||
* @param value the value of the filter
|
||||
*/
|
||||
private void addBsimFilter(BSimFilterType filterTemplate, String value) {
|
||||
String[] inputs = value.split(",");
|
||||
for (String input : inputs) {
|
||||
if (!input.trim().isEmpty()) {
|
||||
bsimFilter.addAtom(filterTemplate, input.trim());
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Queries the database and returns the results.
|
||||
*
|
||||
* @param qInfo contains all information required for the query
|
||||
* @return list of matches
|
||||
* @throws QueryDatabaseException if there is a problem executing the similar-functions query
|
||||
* @throws CancelledException if the user cancelled the operation
|
||||
*/
|
||||
private List<BSimMatchResult> executeQuery(SFQueryInfo qInfo)
|
||||
throws QueryDatabaseException, CancelledException {
|
||||
|
||||
SFQueryResult queryResults = queryService.querySimilarFunctions(qInfo, null, monitor);
|
||||
List<BSimMatchResult> resultRows =
|
||||
BSimMatchResult.generate(queryResults.getSimilarityResults(), currentProgram);
|
||||
|
||||
return resultRows;
|
||||
}
|
||||
|
||||
/**
|
||||
* Creates predicates that will be used to filter out functions. This example provides three
|
||||
* different methods of doing this:
|
||||
*
|
||||
* - anonymous class
|
||||
* - lambda
|
||||
* - static method
|
||||
*
|
||||
* These are all possible because the filter takes a {@link BiPredicate}, which is a
|
||||
* functional interface.
|
||||
*
|
||||
*/
|
||||
private void setPrefilters() {
|
||||
|
||||
PreFilter preFilter = queryInfo.getPreFilter();
|
||||
|
||||
//
|
||||
// Option 1: Anonymous class
|
||||
// Filters out any functions with a self significance less than a
|
||||
// certain value.
|
||||
//
|
||||
preFilter.addPredicate(new BiPredicate<Program, FunctionDescription>() {
|
||||
@Override
|
||||
public boolean test(Program t, FunctionDescription u) {
|
||||
return queryService.getLSHVectorFactory()
|
||||
.getSelfSignificance(
|
||||
u.getSignatureRecord().getLSHVector()) >= SELF_SIGNIFICANCE_BOUND;
|
||||
}
|
||||
});
|
||||
|
||||
//
|
||||
// Option 2. Lambda expression
|
||||
// Filters out any functions with a self significance less than a
|
||||
// certain value.
|
||||
//
|
||||
preFilter.addPredicate((x, y) -> queryService.getLSHVectorFactory()
|
||||
.getSelfSignificance(
|
||||
y.getSignatureRecord().getLSHVector()) >= SELF_SIGNIFICANCE_BOUND);
|
||||
|
||||
//
|
||||
// Option 3. Static method
|
||||
// Filters out any functions that are of type ANALYSIS.
|
||||
//
|
||||
preFilter.addPredicate(QueryWithFiltersScript::isNotAnalysisSourceType);
|
||||
}
|
||||
|
||||
/**
|
||||
* Returns a set of ALL functions (no stubs) in the given program.
|
||||
*
|
||||
* @param program the program to get the functions from
|
||||
* @return set of function symbols
|
||||
*/
|
||||
private HashSet<FunctionSymbol> getFunctionsToQuery(Program program) {
|
||||
HashSet<FunctionSymbol> functions = new HashSet<>();
|
||||
FunctionIterator fIter = program.getFunctionManager().getFunctionsNoStubs(true);
|
||||
for (Function func : fIter) {
|
||||
functions.add((FunctionSymbol) func.getSymbol());
|
||||
}
|
||||
return functions;
|
||||
}
|
||||
|
||||
/**
|
||||
* Returns true if the given function is NOT an analysis type.
|
||||
*
|
||||
* @param program the current program
|
||||
* @param funcDesc the function description object
|
||||
* @return true if the symbol is NOT an analysis source type
|
||||
*/
|
||||
public static boolean isNotAnalysisSourceType(Program program, FunctionDescription funcDesc) {
|
||||
Address address =
|
||||
program.getAddressFactory().getDefaultAddressSpace().getAddress(funcDesc.getAddress());
|
||||
|
||||
Function function = program.getFunctionManager().getFunctionAt(address);
|
||||
if (function == null || !function.getName().equals(funcDesc.getFunctionName())) {
|
||||
return false;
|
||||
}
|
||||
return function.getSymbol().getSource() != SourceType.ANALYSIS;
|
||||
}
|
||||
|
||||
/**
|
||||
* Prints a sorted list of executables represented in the function matches.
|
||||
*
|
||||
* @param filteredRows list of function results
|
||||
*/
|
||||
private void printExecutableInformation(List<BSimMatchResult> filteredRows) {
|
||||
|
||||
TreeSet<ExecutableResult> execrows = ExecutableResult.generateFromMatchRows(filteredRows);
|
||||
ExecutableResult[] results = new ExecutableResult[execrows.size()];
|
||||
results = execrows.toArray(results);
|
||||
|
||||
Arrays.sort(results, new Comparator<ExecutableResult>() {
|
||||
@Override
|
||||
public int compare(ExecutableResult o1, ExecutableResult o2) {
|
||||
return Double.compare(o2.getSignificanceSum(), o1.getSignificanceSum());
|
||||
}
|
||||
});
|
||||
|
||||
printf("Executable-level results:\n");
|
||||
for (int i = 0, max = Math.min(NUM_EXES_TO_DISPLAY, results.length); i < max; ++i) {
|
||||
printf(" MD5: %s\n", results[i].getExecutableRecord().getMd5());
|
||||
printf(" Executable Name: %s\n", results[i].getExecutableRecord().getNameExec());
|
||||
printf(" Function Count: %d\n", results[i].getFunctionCount());
|
||||
printf(" Significance Sum: %f\n\n", results[i].getSignificanceSum());
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Prints information about each function in the result set.
|
||||
*
|
||||
* @param resultRows the list of rows containing the info to print
|
||||
* @param title the title to print
|
||||
*/
|
||||
private void printFunctionQueryResults(List<BSimMatchResult> resultRows, String title) {
|
||||
printf(title + ": (%d)\n\n", resultRows.size());
|
||||
for (BSimMatchResult resultRow : resultRows) {
|
||||
printf(" queried function: %s\n",
|
||||
resultRow.getOriginalFunctionDescription().getFunctionName());
|
||||
printf(" matching function: %s\n",
|
||||
resultRow.getMatchFunctionDescription().getFunctionName());
|
||||
printf(" executable of matching function: %s\n",
|
||||
resultRow.getMatchFunctionDescription().getExecutableRecord().getNameExec());
|
||||
printf(" similarity: %f\n", resultRow.getSimilarity());
|
||||
printf(" significance: %f\n\n", resultRow.getSignificance());
|
||||
}
|
||||
printf("\n");
|
||||
}
|
||||
|
||||
}
|
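The three ways of supplying a prefilter shown in setPrefilters() (anonymous class, lambda, method reference) all work because the filter accepts a BiPredicate. The sketch below illustrates that with plain Java only. `PreFilterSketch`, `Prog`, and `Desc` are hypothetical stand-ins for the Ghidra types, and combining the predicates with AND (a function must pass every predicate) is an assumption made for this example rather than something the excerpt states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical stand-in for a prefilter container; not the Ghidra PreFilter class.
final class PreFilterSketch {

    static final class Prog {
        final String name;

        Prog(String name) {
            this.name = name;
        }
    }

    static final class Desc {
        final String functionName;
        final double selfSignificance;
        final boolean fromAnalysis;

        Desc(String functionName, double selfSignificance, boolean fromAnalysis) {
            this.functionName = functionName;
            this.selfSignificance = selfSignificance;
            this.fromAnalysis = fromAnalysis;
        }
    }

    private final List<BiPredicate<Prog, Desc>> predicates = new ArrayList<>();

    void addPredicate(BiPredicate<Prog, Desc> predicate) {
        predicates.add(predicate);
    }

    // Assumption: a description is accepted only if every predicate passes
    boolean accepts(Prog program, Desc desc) {
        for (BiPredicate<Prog, Desc> predicate : predicates) {
            if (!predicate.test(program, desc)) {
                return false;
            }
        }
        return true;
    }

    static boolean isNotAnalysis(Prog program, Desc desc) {
        return !desc.fromAnalysis;
    }

    static PreFilterSketch build(final double bound) {
        PreFilterSketch filter = new PreFilterSketch();
        // Option 1: anonymous class
        filter.addPredicate(new BiPredicate<Prog, Desc>() {
            @Override
            public boolean test(Prog program, Desc desc) {
                return desc.selfSignificance >= bound;
            }
        });
        // Option 2: lambda expressing the same check
        filter.addPredicate((program, desc) -> desc.selfSignificance >= bound);
        // Option 3: method reference to a static method
        filter.addPredicate(PreFilterSketch::isNotAnalysis);
        return filter;
    }

    public static void main(String[] args) {
        PreFilterSketch filter = build(40.0);
        Prog program = new Prog("demo");
        System.out.println(filter.accepts(program, new Desc("big_user_func", 55.0, false)));
        System.out.println(filter.accepts(program, new Desc("tiny_func", 5.0, false)));
    }
}
```

Options 1 and 2 are deliberately redundant here, as they are in the script: they show two syntaxes for the same self-significance check.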
173
Ghidra/Features/BSim/ghidra_scripts/QueryWithFiltersScript.py
Executable file
@ -0,0 +1,173 @@
|
||||
## ###
|
||||
# IP: GHIDRA
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
##
|
||||
# Advanced example of BSim querying
|
||||
# @category BSim.python
|
||||
|
||||
import ghidra.features.bsim.query.facade.SimilarFunctionQueryService as SimilarFunctionQueryService
|
||||
import ghidra.features.bsim.query.facade.SFQueryInfo as SFQueryInfo
|
||||
import ghidra.features.bsim.query.FunctionDatabase as FunctionDatabase
|
||||
import ghidra.features.bsim.query.facade.QueryDatabaseException as QueryDatabaseException
|
||||
import java.util.HashSet as HashSet
|
||||
import ghidra.app.plugin.core.query.QueryNearestRow as QueryNearestRow
|
||||
import java.util.function.BiPredicate as BiPredicate
|
||||
import ghidra.query.protocol.FilterTemplate as FilterTemplate
|
||||
import ghidra.features.bsim.gui.search.results.ExecutableResult as ExecutableResult
|
||||
import java.util.Comparator as Comparator
|
||||
import java.util.Arrays as Arrays
|
||||
import java.lang.Double as Double
|
||||
|
||||
#Query thresholds
|
||||
MAX_NUM_FUNCTIONS = 100
|
||||
SIMILARITY_BOUND = 0.7
|
||||
SIGNIFICANCE_BOUND = 0.0
|
||||
|
||||
#limit the number of results displayed
|
||||
NUM_EXES_TO_DISPLAY = 10
|
||||
|
||||
#for prefiltering: this number will be used to filter out small functions
|
||||
SELF_SIGNIFICANCE_BOUND = 40.0
|
||||
|
||||
def run():

    #get the set of functions to query
    funcsToQuery = getFunctionsToQuery()

    #sets up the object required for querying the database
    queryService = SimilarFunctionQueryService(currentProgram)
    queryInfo = SFQueryInfo(funcsToQuery)
    bsimFilter = queryInfo.getBsimFilter()

    #sets the query parameters.
    #change the defined constants to control how fuzzy of
    #a match you're willing to accept, and the maximum number
    #of matches to return for each function
    queryInfo.setMaximumResults(MAX_NUM_FUNCTIONS)
    queryInfo.setSimilarityThreshold(SIMILARITY_BOUND)
    queryInfo.setSignificanceThreshold(SIGNIFICANCE_BOUND)

    #add the prefilters
    setPrefilters(queryService, queryInfo)

    #add a filter on the date
    addBsimFilter(bsimFilter, FilterTemplate.DateLater(""), "01/01/1776")

    #add a filter with multiple values. Since this is an "Equal" filter, the results are OR'd together
    #so a given executable will pass the main filter if it passes at least one of the subfilters
    addBsimFilter(bsimFilter, FilterTemplate.ArchEquals(),"x86:LE:64:default, x86:LE:32:default, ARM:LE:32:v4")

    #now add a "notequal" filter
    #to pass, the compiler can't be windows and it can't be foo_compiler
    addBsimFilter(bsimFilter,FilterTemplate.CompNotEqual(),"windows, foo_compiler")

    #establish a connection to the BSim database
    try:
        dbUrl = askString("","Enter the URL of the BSim database:", "ghidra://localhost/bsimDB")
        queryService.initializeDatabase(dbUrl)
        error = queryService.getDatabase().getLastError()
        #ErrorCategory is not imported above; this assumes it is the enum nested in FunctionDatabase
        if error is not None and error.category == FunctionDatabase.ErrorCategory.Nodatabase:
            print "Database [%s] cannot be found (does it exist?)" % dbUrl
            return
    except QueryDatabaseException as e:
        print e.getMessage()
        return

    resultRows = executeQuery(queryService,queryInfo)
    printFunctionQueryResults(resultRows, "\nFunction-level results before filtering")

    #now add some post-query filters, which filter the result set returned by the previous query

    addBsimFilter(bsimFilter, FilterTemplate.Md5NotEqual(), currentProgram.getExecutableMD5())
    addBsimFilter(bsimFilter, FilterTemplate.CompilerEquals(), "gcc")
    addBsimFilter(bsimFilter, FilterTemplate.FunctionTagTemplate("KNOWN_LIBRARY", queryService), "false")

    #apply the filters and print the results
    filteredRows = QueryNearestRow.filterMatchRows(bsimFilter, resultRows)
    printFunctionQueryResults(filteredRows, "\nFunction-level results after filtering")
    printExecutableInformation(filteredRows)
    return
|
||||
|
||||
|
||||
#collect the functions to query from currentProgram
|
||||
def getFunctionsToQuery():
    functions = HashSet()
    fIter = currentProgram.getFunctionManager().getFunctionsNoStubs(True)
    for func in fIter:
        functions.add(func.getSymbol())
    return functions
|
||||
|
||||
#query the database
|
||||
def executeQuery(queryService,queryInfo):
    queryResults = queryService.querySimilarFunctions(queryInfo,monitor)
    resultRows = QueryNearestRow.generate(queryResults.getSimilarityResults(),currentProgram)
    return resultRows
|
||||
|
||||
def printFunctionQueryResults(resultRows, title):
    print "%s: %d\n\n" % (title, resultRows.size())
    for row in resultRows:
        print " queried function: %s" % row.getOriginalFunctionDescription().getFunctionName()
        print " matching function: %s" % row.getMatchFunctionDescription().getFunctionName()
        print " executable of matching function: %s" % row.getMatchFunctionDescription().getExecutableRecord().getNameExec()
        print " similarity: %f" % row.getSimilarity()
        print " significance: %f\n" % row.getSignificance()
|
||||
|
||||
#Prefilters are used to filter out functions before sending a query to the database
|
||||
#A typical use case would be to collect all functions in a binary, then use a
|
||||
#prefilter to remove the functions with low self-significance (which is the
|
||||
#"BSim way" to remove small functions)
|
||||
def setPrefilters(queryService, queryInfo):
    preFilter = queryInfo.getPreFilter()
    selfSigFilter = ExampleFilter(queryService)
    preFilter.addPredicate(selfSigFilter)
|
||||
|
||||
class ExampleFilter(BiPredicate):

    def __init__(self, queryService):
        self.queryService = queryService

    def test(self,program, fdesc):
        return self.queryService.getLSHVectorFactory().getSelfSignificance(fdesc.getSignatureRecord().getLSHVector()) >= SELF_SIGNIFICANCE_BOUND
|
||||
|
||||
def addBsimFilter(bsimFilter, filterTemplate, values):
    for value in values.split(","):
        if len(value.strip()) > 0:
            bsimFilter.addAtom(filterTemplate, value.strip(), FilterTemplate.Blank())
|
||||
|
||||
#calls the methods to aggregate executable-level information about the matches
|
||||
def printExecutableInformation(filteredRows):
    execrows = ExecutableResult.generateFromMatchRows(filteredRows)
    results = execrows.toArray()
    sorter = Sorter()
    Arrays.sort(results,sorter)
    print "Executable-level results:"
    numExes = min(len(results),NUM_EXES_TO_DISPLAY)
    for i in range(numExes):
        print " MD5: %s" % results[i].getExecutableRecord().getMd5()
        print " Executable Name: %s" % results[i].getExecutableRecord().getNameExec()
        print " Function Count: %d" % results[i].getFunctionCount()
        print " Significance Sum: %f\n" % results[i].getSignificanceSum()
    return
|
||||
|
||||
class Sorter(Comparator):

    def __init__(self):
        return

    #sort in descending order of significance sum
    def compare(self,o1,o2):
        return Double.compare(o2.getSignificanceSum(), o1.getSignificanceSum())
|
||||
|
||||
|
||||
|
||||
run()
|
@ -0,0 +1,45 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
import org.apache.commons.lang3.StringUtils;
|
||||
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.framework.options.Options;
|
||||
import ghidra.program.model.listing.Program;
|
||||
|
||||
//@category BSim
|
||||
//sets a property on the current program which can be used as
|
||||
//an executable category in BSim
|
||||
public class SetExecutableCategoryScript extends GhidraScript {
|
||||
|
||||
@Override
|
||||
protected void run() throws Exception {
|
||||
if (currentProgram == null) {
|
||||
popup("This script requires a program");
|
||||
return;
|
||||
}
|
||||
Options opts = currentProgram.getOptions(Program.PROGRAM_INFO);
|
||||
String name = askString("Enter Property Name", "Name");
|
||||
if (StringUtils.isAllBlank(name)) {
|
||||
return;
|
||||
}
|
||||
String value = askString("Enter Value of Property " + name, "Value");
|
||||
if (StringUtils.isAllBlank(value)) {
|
||||
return;
|
||||
}
|
||||
opts.setString(name, value);
|
||||
}
|
||||
|
||||
}
|
56
Ghidra/Features/BSim/ghidra_scripts/TailoredAnalysis.java
Executable file
@ -0,0 +1,56 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.framework.options.Options;
|
||||
import ghidra.program.model.listing.Program;
|
||||
|
||||
// Set up tailored auto-analysis (in place of the headless analyzer's full auto-analysis)
|
||||
// suitable for BSim ingest process. Intended to be invoked as an analyzeHeadless -preScript
|
||||
//@category BSim
|
||||
|
||||
public class TailoredAnalysis extends GhidraScript {
|
||||
|
||||
@Override
|
||||
public void run() throws Exception {
|
||||
Options pl = currentProgram.getOptions(Program.ANALYSIS_PROPERTIES);
|
||||
pl.setBoolean("Decompiler Parameter ID", false);
|
||||
|
||||
// These analyzers generate lots of cross references, which are not necessary for
|
||||
// signature analysis, and take time to run. On the other hand, you may want
|
||||
// them in general to facilitate general analysis
|
||||
pl.setBoolean("Stack", false);
|
||||
// pl.setBoolean("Windows x86 PE Instruction References", false);
|
||||
// pl.setBoolean("Windows x86 PE C++", false);
|
||||
// pl.setBoolean("Windows x86 PE Preliminary", false);
|
||||
// pl.setBoolean("ELF Scalar Operand References", false);
|
||||
|
||||
// Mangled symbols are good information but you may not be able to count on them being present in all versions
|
||||
// Options analyzerOptions = pl.getOptions("Demangler");
|
||||
// analyzerOptions.setBoolean("Commit Function Signatures", false);
|
||||
|
||||
// You really want these options turned on
|
||||
pl.setBoolean("Shared Return Calls",true);
|
||||
pl.setBoolean("Function Start Search", true);
|
||||
pl.setBoolean("DWARF", false);
|
||||
// Options analyzerOptions = pl.getOptions("Function Start Search");
|
||||
// analyzerOptions.setBoolean("Search Data Blocks", true);
|
||||
// analyzerOptions = pl.getOptions("Function Start Search After Code");
|
||||
// analyzerOptions.setBoolean("Search Data Blocks", true);
|
||||
// analyzerOptions = pl.getOptions("Function Start Search After Data");
|
||||
// analyzerOptions.setBoolean("Search Data Blocks", true);
|
||||
}
|
||||
|
||||
}
|
103
Ghidra/Features/BSim/ghidra_scripts/UpdateBSimMetadata.java
Executable file
@ -0,0 +1,103 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
// Push updated information about function names and other metadata from the current program to a BSim database
|
||||
//@category BSim
|
||||
|
||||
import java.net.URL;
|
||||
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.features.bsim.query.*;
|
||||
import ghidra.features.bsim.query.description.ExecutableRecord;
|
||||
import ghidra.features.bsim.query.description.FunctionDescription;
|
||||
import ghidra.features.bsim.query.protocol.QueryUpdate;
|
||||
import ghidra.features.bsim.query.protocol.ResponseUpdate;
|
||||
import ghidra.program.model.listing.FunctionIterator;
|
||||
import ghidra.program.model.listing.FunctionManager;
|
||||
|
||||
public class UpdateBSimMetadata extends GhidraScript {
|
||||
|
||||
@Override
|
||||
protected void run() throws Exception {
|
||||
if (currentProgram == null) {
|
||||
return;
|
||||
}
|
||||
String bsim_url = System.getProperty("ghidra.bsimurl");
|
||||
if (bsim_url==null || bsim_url.length()==0) {
|
||||
bsim_url = askString("Request Repository", "Select URL of database receiving update");
|
||||
}
|
||||
|
||||
URL url = BSimClientFactory.deriveBSimURL(bsim_url);
|
||||
try (FunctionDatabase database = BSimClientFactory.buildClient(url, true)) {
|
||||
if (!database.initialize()) {
|
||||
println(database.getLastError().message);
|
||||
return;
|
||||
}
|
||||
println("Connected to " + database.getInfo().databasename);
|
||||
|
||||
GenSignatures gensig = new GenSignatures(false);
|
||||
gensig.setVectorFactory(database.getLSHVectorFactory());
|
||||
gensig.openProgram(currentProgram, null, null, null, null, null);
|
||||
|
||||
FunctionManager functionManager = currentProgram.getFunctionManager();
|
||||
FunctionIterator funciter;
|
||||
if (currentSelection != null) {
|
||||
println("Scanning selected functions");
|
||||
funciter = functionManager.getFunctions(currentSelection, true);
|
||||
}
|
||||
else {
|
||||
println("Scanning all functions");
|
||||
funciter = functionManager.getFunctions(true); // If no selection, update all functions
|
||||
}
|
||||
gensig.scanFunctionsMetadata(funciter, monitor);
|
||||
QueryUpdate update = new QueryUpdate();
|
||||
update.manage = gensig.getDescriptionManager();
|
||||
|
||||
ResponseUpdate respup = update.execute(database); // Try to push the update
|
||||
if (respup == null) {
|
||||
println(database.getLastError().message);
|
||||
return;
|
||||
}
|
||||
if (!respup.badexe.isEmpty()) {
|
||||
for (int j = 0; j < respup.badexe.size(); ++j) {
|
||||
ExecutableRecord erec = respup.badexe.get(j);
|
||||
println("Database does not contain executable: " + erec.getNameExec());
|
||||
}
|
||||
}
|
||||
if (!respup.badfunc.isEmpty()) {
|
||||
int max = respup.badfunc.size();
|
||||
if (max > 10) {
|
||||
println(
|
||||
"Could not find " + Integer.toString(respup.badfunc.size()) + " functions");
|
||||
max = 10;
|
||||
}
|
||||
for (int j = 0; j < max; ++j) {
|
||||
FunctionDescription func = respup.badfunc.get(j);
|
||||
println("Could not update function " + func.getFunctionName());
|
||||
}
|
||||
}
|
||||
if (respup.exeupdate > 0) {
|
||||
println("Updated executable metadata");
|
||||
}
|
||||
if (respup.funcupdate > 0) {
|
||||
println("Updated " + Integer.toString(respup.funcupdate) + " functions");
|
||||
}
|
||||
if (respup.exeupdate == 0 && respup.funcupdate == 0) {
|
||||
println("No changes");
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
}
|
126
Ghidra/Features/BSim/make-postgres.sh
Executable file
@ -0,0 +1,126 @@
|
||||
#!/bin/bash
|
||||
## ###
|
||||
# IP: GHIDRA
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
##
|
||||
#
|
||||
# This script may be used to build the postgresql server within
|
||||
# a GHIDRA installation. The postgresql server configuration options
|
||||
# below (POSTGRES_CONFIG_OPTIONS) may be adjusted if required
|
||||
# (e.g., build without openssl use, etc.).
|
||||
#
|
||||
# See https://www.postgresql.org/docs/10/install-procedure.html
|
||||
# for supported postgresql config options.
|
||||
#
|
||||
# Additional packages may need to be installed in order to perform the
|
||||
# postgresql build. Please refer to the following web page for
|
||||
# package dependencies:
|
||||
#
|
||||
# https://wiki.postgresql.org/wiki/Compile_and_Install_from_source_code
|
||||
#
|
||||
# The postgresql source distribution should reside within the BSim module
|
||||
# directory prior to running this script. Within development environments
|
||||
# it will first check the ghidra.bin repo for this source file.
|
||||
#
|
||||
|
||||
POSTGRES=postgresql-15.3
|
||||
POSTGRES_GZ=${POSTGRES}.tar.gz
|
||||
POSTGRES_CONFIG_OPTIONS="--disable-rpath --with-openssl"
|
||||
|
||||
DIR=$(cd "$(dirname "$0")" && pwd)
|
||||
|
||||
POSTGRES_GZ_PATH=${DIR}/../../../../ghidra.bin/Ghidra/Features/BSim/${POSTGRES_GZ}
|
||||
if [ ! -f "${POSTGRES_GZ_PATH}" ]; then
|
||||
POSTGRES_GZ_PATH=${DIR}/${POSTGRES_GZ}
|
||||
if [ ! -f "${POSTGRES_GZ_PATH}" ]; then
|
||||
echo "Postgres source bundle not found: ${POSTGRES_GZ_PATH}"
|
||||
exit -1
|
||||
fi
|
||||
fi
|
||||
|
||||
OS=`uname -s`
|
||||
ARCH=`arch`
|
||||
|
||||
cd ${DIR}
|
||||
|
||||
mkdir -p build > /dev/null
|
||||
|
||||
if [ ! -d build/${POSTGRES} ]; then
|
||||
# Unpack postgres source distro into build
|
||||
echo "Unpacking postgresql source: ${POSTGRES_GZ_PATH}"
|
||||
(cd build && tar -xzf "${POSTGRES_GZ_PATH}")
|
||||
fi
|
||||
|
||||
# Build postgresql
|
||||
|
||||
pushd build/${POSTGRES}
|
||||
|
||||
if [ "$OS" = "Darwin" ]; then
|
||||
export MACOSX_DEPLOYMENT_TARGET=10.5
|
||||
export ARCHFLAGS="-arch x86_64"
|
||||
OSDIR=mac_x86_64
|
||||
elif [ "$ARCH" = "x86_64" ]; then
|
||||
OSDIR=linux_x86_64
|
||||
else
|
||||
echo "Unsupported platform: $OS $ARCH"
|
||||
exit -1
|
||||
fi
|
||||
|
||||
# Install within build/os
|
||||
INSTALL_DIR=${DIR}/build/os/${OSDIR}/postgresql
|
||||
rm -rf ${INSTALL_DIR} > /dev/null
|
||||
|
||||
make distclean
|
||||
|
||||
# Configure postgres
|
||||
|
||||
./configure ${POSTGRES_CONFIG_OPTIONS} --prefix=${INSTALL_DIR}
|
||||
if [ $? != 0 ]; then
|
||||
exit 1
|
||||
fi
|
||||
|
||||
make install
|
||||
if [ $? != 0 ]; then
|
||||
exit 1
|
||||
fi
|
||||
|
||||
make -C contrib/pg_prewarm install
|
||||
if [ $? != 0 ]; then
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "Completed postgresql build"
|
||||
|
||||
# Build lshvector plugin for postgresql
|
||||
|
||||
popd
|
||||
|
||||
rm -rf build/lshvector > /dev/null
|
||||
mkdir build/lshvector
|
||||
|
||||
echo "Building lshvector plugin..."
|
||||
|
||||
cp src/lshvector/* build/lshvector
|
||||
cp src/lshvector/c/* build/lshvector
|
||||
|
||||
cd build/lshvector
|
||||
make -f Makefile.lshvector install PG_CONFIG=${INSTALL_DIR}/bin/pg_config
|
||||
|
||||
if [ $? = 0 ]; then
|
||||
echo "Completed build and install of lshvector postgresql plugin"
|
||||
exit 0
|
||||
fi
|
||||
|
||||
exit -1
|
||||
|
34
Ghidra/Features/BSim/other/testscripts/InstallMetadataTest.java
Executable file
@ -0,0 +1,34 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.framework.options.Options;
|
||||
import ghidra.program.model.listing.Program;
|
||||
|
||||
/**
|
||||
* This script is used by the unit test BSimServerTest
|
||||
*/
|
||||
public class InstallMetadataTest extends GhidraScript {
|
||||
|
||||
@Override
|
||||
protected void run() throws Exception {
|
||||
Options pl = currentProgram.getOptions(Program.PROGRAM_INFO);
|
||||
String value = "static";
|
||||
if (currentProgram.getName().contains(".so"))
|
||||
value = "shared";
|
||||
pl.setString("Test Category", value);
|
||||
}
|
||||
|
||||
}
|
69
Ghidra/Features/BSim/other/testscripts/RegressionSignatures.java
Executable file
@ -0,0 +1,69 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
import java.io.File;
|
||||
import java.io.FileWriter;
|
||||
import java.io.IOException;
|
||||
import java.util.ArrayList;
|
||||
import java.util.Iterator;
|
||||
import java.util.List;
|
||||
|
||||
import generic.lsh.vector.LSHVectorFactory;
|
||||
import ghidra.app.script.GhidraScript;
|
||||
import ghidra.program.model.listing.Function;
|
||||
import ghidra.program.model.listing.FunctionManager;
|
||||
import ghidra.features.bsim.query.FunctionDatabase;
|
||||
import ghidra.features.bsim.query.GenSignatures;
|
||||
import ghidra.features.bsim.query.client.Configuration;
|
||||
import ghidra.features.bsim.query.description.DescriptionManager;
|
||||
|
||||
/**
|
||||
* This script is used by the unit test BSimServerTest
|
||||
*/
|
||||
public class RegressionSignatures extends GhidraScript {
|
||||
|
||||
@Override
|
||||
protected void run() throws Exception {
|
||||
String md5string = currentProgram.getExecutableMD5();
|
||||
if ((md5string == null) || (md5string.length() < 10))
|
||||
throw new IOException("Could not get MD5 on file: " + currentProgram.getName());
|
||||
String basename = "sigs_" + md5string;
|
||||
File file = null;
|
||||
// This form of askDirectory works for both standalone and parallel execution
|
||||
File workingdir = askDirectory("RegressionSignatures:", "Working directory");
|
||||
file = new File(workingdir, basename);
|
||||
|
||||
LSHVectorFactory vectorFactory = FunctionDatabase.generateLSHVectorFactory();
|
||||
Configuration config = FunctionDatabase.loadConfigurationTemplate("medium_64");
|
||||
vectorFactory.set(config.weightfactory, config.idflookup, config.info.settings);
|
||||
GenSignatures gensig = new GenSignatures(true);
|
||||
gensig.setVectorFactory(vectorFactory);
|
||||
|
||||
List<String> names = new ArrayList<String>();
|
||||
names.add("Test Category");
|
||||
gensig.addExecutableCategories(names);
|
||||
String repo = "ghidra://localhost/repo";
|
||||
String path = "/raw";
|
||||
gensig.openProgram(this.currentProgram, null, null, null, repo, path);
|
||||
FunctionManager fman = currentProgram.getFunctionManager();
|
||||
Iterator<Function> iter = fman.getFunctions(true);
|
||||
gensig.scanFunctions(iter, fman.getFunctionCount(), monitor);
|
||||
FileWriter fwrite = new FileWriter(file);
|
||||
DescriptionManager manager = gensig.getDescriptionManager();
|
||||
manager.saveXml(fwrite);
|
||||
fwrite.close();
|
||||
}
|
||||
|
||||
}
|
25
Ghidra/Features/BSim/src/lshvector/Makefile.lshvector
Executable file
@ -0,0 +1,25 @@
|
||||
# Locality Sensitive Hashing package
|
||||
# NOTE: This file cannot be executed in place. It is copied into a temporary
|
||||
# directory with its source code and executed there.
|
||||
|
||||
ifeq ($(PG_CONFIG),)
default:
	echo "You must specify PG_CONFIG"
	false

endif
|
||||
|
||||
MODULE_big = lshvector
|
||||
OBJS= lsh.o weights.o binhash.o crc32.o
|
||||
|
||||
EXTENSION = lshvector
|
||||
DATA = lshvector--1.0.sql
|
||||
|
||||
REGRESS = lshvector
|
||||
|
||||
EXTRA_CLEAN =
|
||||
|
||||
SHLIB_LINK += $(filter -lm, $(LIBS))
|
||||
|
||||
PGXS := $(shell $(PG_CONFIG) --pgxs)
|
||||
include $(PGXS)
|
277
Ghidra/Features/BSim/src/lshvector/c/binhash.c
Executable file
@ -0,0 +1,277 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
#include "lsh.h"
|
||||
|
||||
#define LSH_HASHBASE 0xD7E6A299
|
||||
|
||||
static char hash_signtable[512];
|
||||
|
||||
static void hash_int_fft_16(int32 *arr)
|
||||
|
||||
{
|
||||
int32 x,y;
|
||||
|
||||
x = arr[0]; y = arr[8]; arr[0] = x + y; arr[8] = x - y;
|
||||
x = arr[1]; y = arr[9]; arr[1] = x + y; arr[9] = x - y;
|
||||
x = arr[2]; y = arr[10]; arr[2] = x + y; arr[10] = x - y;
|
||||
x = arr[3]; y = arr[11]; arr[3] = x + y; arr[11] = x - y;
|
||||
x = arr[4]; y = arr[12]; arr[4] = x + y; arr[12] = x - y;
|
||||
x = arr[5]; y = arr[13]; arr[5] = x + y; arr[13] = x - y;
|
||||
x = arr[6]; y = arr[14]; arr[6] = x + y; arr[14] = x - y;
|
||||
x = arr[7]; y = arr[15]; arr[7] = x + y; arr[15] = x - y;
|
||||
|
||||
x = arr[0]; y = arr[4]; arr[0] = x + y; arr[4] = x - y;
|
||||
x = arr[1]; y = arr[5]; arr[1] = x + y; arr[5] = x - y;
|
||||
x = arr[2]; y = arr[6]; arr[2] = x + y; arr[6] = x - y;
|
||||
x = arr[3]; y = arr[7]; arr[3] = x + y; arr[7] = x - y;
|
||||
x = arr[8]; y = arr[12]; arr[8] = x + y; arr[12] = x - y;
|
||||
x = arr[9]; y = arr[13]; arr[9] = x + y; arr[13] = x - y;
|
||||
x = arr[10]; y = arr[14]; arr[10] = x + y; arr[14] = x - y;
|
||||
x = arr[11]; y = arr[15]; arr[11] = x + y; arr[15] = x - y;
|
||||
|
||||
x = arr[0]; y = arr[2]; arr[0] = x + y; arr[2] = x - y;
|
||||
x = arr[1]; y = arr[3]; arr[1] = x + y; arr[3] = x - y;
|
||||
x = arr[4]; y = arr[6]; arr[4] = x + y; arr[6] = x - y;
|
||||
x = arr[5]; y = arr[7]; arr[5] = x + y; arr[7] = x - y;
|
||||
x = arr[8]; y = arr[10]; arr[8] = x + y; arr[10] = x - y;
|
||||
x = arr[9]; y = arr[11]; arr[9] = x + y; arr[11] = x - y;
|
||||
x = arr[12]; y = arr[14]; arr[12] = x + y; arr[14] = x - y;
|
||||
x = arr[13]; y = arr[15]; arr[13] = x + y; arr[15] = x - y;
|
||||
|
||||
x = arr[0]; y = arr[1]; arr[0] = x + y; arr[1] = x - y;
|
||||
x = arr[2]; y = arr[3]; arr[2] = x + y; arr[3] = x - y;
|
||||
x = arr[4]; y = arr[5]; arr[4] = x + y; arr[5] = x - y;
|
||||
x = arr[6]; y = arr[7]; arr[6] = x + y; arr[7] = x - y;
|
||||
x = arr[8]; y = arr[9]; arr[8] = x + y; arr[9] = x - y;
|
||||
x = arr[10]; y = arr[11]; arr[10] = x + y; arr[11] = x - y;
|
||||
x = arr[12]; y = arr[13]; arr[12] = x + y; arr[13] = x - y;
|
||||
x = arr[14]; y = arr[15]; arr[14] = x + y; arr[15] = x - y;
|
||||
}
|
||||
|
||||
static void hash_double_fft_16(double *arr)
|
||||
|
||||
{
|
||||
double x,y;
|
||||
|
||||
x = arr[0]; y = arr[8]; arr[0] = x + y; arr[8] = x - y;
|
||||
x = arr[1]; y = arr[9]; arr[1] = x + y; arr[9] = x - y;
|
||||
x = arr[2]; y = arr[10]; arr[2] = x + y; arr[10] = x - y;
|
||||
x = arr[3]; y = arr[11]; arr[3] = x + y; arr[11] = x - y;
|
||||
x = arr[4]; y = arr[12]; arr[4] = x + y; arr[12] = x - y;
|
||||
x = arr[5]; y = arr[13]; arr[5] = x + y; arr[13] = x - y;
|
||||
x = arr[6]; y = arr[14]; arr[6] = x + y; arr[14] = x - y;
|
||||
x = arr[7]; y = arr[15]; arr[7] = x + y; arr[15] = x - y;
|
||||
|
||||
x = arr[0]; y = arr[4]; arr[0] = x + y; arr[4] = x - y;
|
||||
x = arr[1]; y = arr[5]; arr[1] = x + y; arr[5] = x - y;
|
||||
x = arr[2]; y = arr[6]; arr[2] = x + y; arr[6] = x - y;
|
||||
x = arr[3]; y = arr[7]; arr[3] = x + y; arr[7] = x - y;
|
||||
x = arr[8]; y = arr[12]; arr[8] = x + y; arr[12] = x - y;
|
||||
x = arr[9]; y = arr[13]; arr[9] = x + y; arr[13] = x - y;
|
||||
x = arr[10]; y = arr[14]; arr[10] = x + y; arr[14] = x - y;
|
||||
x = arr[11]; y = arr[15]; arr[11] = x + y; arr[15] = x - y;
|
||||
|
||||
x = arr[0]; y = arr[2]; arr[0] = x + y; arr[2] = x - y;
|
||||
x = arr[1]; y = arr[3]; arr[1] = x + y; arr[3] = x - y;
|
||||
x = arr[4]; y = arr[6]; arr[4] = x + y; arr[6] = x - y;
|
||||
x = arr[5]; y = arr[7]; arr[5] = x + y; arr[7] = x - y;
|
||||
x = arr[8]; y = arr[10]; arr[8] = x + y; arr[10] = x - y;
|
||||
x = arr[9]; y = arr[11]; arr[9] = x + y; arr[11] = x - y;
|
||||
x = arr[12]; y = arr[14]; arr[12] = x + y; arr[14] = x - y;
|
||||
x = arr[13]; y = arr[15]; arr[13] = x + y; arr[15] = x - y;
|
||||
|
||||
x = arr[0]; y = arr[1]; arr[0] = x + y; arr[1] = x - y;
|
||||
x = arr[2]; y = arr[3]; arr[2] = x + y; arr[3] = x - y;
|
||||
x = arr[4]; y = arr[5]; arr[4] = x + y; arr[5] = x - y;
|
||||
x = arr[6]; y = arr[7]; arr[6] = x + y; arr[7] = x - y;
|
||||
x = arr[8]; y = arr[9]; arr[8] = x + y; arr[9] = x - y;
|
||||
x = arr[10]; y = arr[11]; arr[10] = x + y; arr[11] = x - y;
|
||||
x = arr[12]; y = arr[13]; arr[12] = x + y; arr[13] = x - y;
|
||||
x = arr[14]; y = arr[15]; arr[14] = x + y; arr[15] = x - y;
|
||||
}
|
||||
|
||||
/*
|
||||
* This is a precalculated table for generating dotproducts with the random family of vectors directly
|
||||
* The first vector r_0 is expressed as a hashing function on the dimension index and the other vectors
|
||||
* are derived from r_0 using an FFT. The table is formed by precalculating the FFT of each basis vector.
|
||||
*/
|
||||
void lsh_setup_signtable(void)
|
||||
|
||||
{
|
||||
int32 i,j;
|
||||
int32 arr[16];
|
||||
char *hibit0ptr;
|
||||
char *hibit1ptr;
|
||||
|
||||
for(i=0;i<16;++i) { /* For each 4-bit position */
|
||||
hibit0ptr = hash_signtable + i * 16;
|
||||
hibit1ptr = hash_signtable + (i+16) * 16;
|
||||
for(j=0;j<16;++j)
|
||||
arr[j] = 0;
|
||||
|
||||
arr[ i ] = 1;
|
||||
hash_int_fft_16(arr);
|
||||
for(j=0;j<16;++j) {
|
||||
if (arr[j] > 0) {
|
||||
hibit0ptr[j] = '+';
|
||||
hibit1ptr[j] = '-';
|
||||
}
|
||||
else {
|
||||
hibit0ptr[j] = '-';
|
||||
hibit1ptr[j] = '+';
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
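The sign table above relies on the 16-point butterfly turning each basis vector into a row of +1/-1 values. A minimal sketch of that property follows, assuming the butterfly is a Walsh-Hadamard transform in natural order; the names `wht16`, `hadamard_sign` and `basis_rows_match` are hypothetical and not part of lshvector.

```c
#include <stdint.h>

/* Loop form of the unrolled 16-point butterfly (strides 8, 4, 2, 1). */
void wht16(int32_t *arr)
{
    int stride, base, k;

    for (stride = 8; stride >= 1; stride >>= 1) {
        for (base = 0; base < 16; base += 2 * stride) {
            for (k = base; k < base + stride; ++k) {
                int32_t x = arr[k];
                int32_t y = arr[k + stride];
                arr[k] = x + y;
                arr[k + stride] = x - y;
            }
        }
    }
}

/* +1 if (i & j) has an even number of set bits, otherwise -1. */
int hadamard_sign(unsigned i, unsigned j)
{
    unsigned v = i & j;
    int sign = 1;

    while (v != 0) {
        sign = -sign;
        v &= v - 1; /* clear the lowest set bit */
    }
    return sign;
}

/* Returns 1 if transforming every basis vector e_i yields hadamard_sign(i, j). */
int basis_rows_match(void)
{
    int32_t arr[16];
    unsigned i, j;

    for (i = 0; i < 16; ++i) {
        for (j = 0; j < 16; ++j)
            arr[j] = 0;
        arr[i] = 1;
        wht16(arr);
        for (j = 0; j < 16; ++j) {
            if (arr[j] != hadamard_sign(i, j))
                return 0;
        }
    }
    return 1;
}
```

Under this assumption every table entry is exactly +1 or -1, which is why the `arr[j] > 0` test in `lsh_setup_signtable` is enough to pick between '+' and '-'.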
|
||||
/*
|
||||
* Generate a dot product of the hash vector in -vec- with a random family of 16 vectors, { r }
|
||||
* r_0 is a randomly generated set of +1 -1 coefficients across all the dimensions (indexed by uint32 vec[i].hash)
|
||||
* The coefficient is calculated as a hashing function from the seed -hashcur- and the index (vec[i].hash),
|
||||
* so it should be balanced between +1 and -1.
|
||||
* All the other vectors are generated from an FFT of r_0. This allows the dotproduct with vec to be calculated
|
||||
* using an FFT if -vec- has many non-zero coefficients. If -vec- has only a few non-zero coefficients,
|
||||
* the dotproduct is calculated with each vector in the family directly for better efficiency.
|
||||
* The resulting dotproducts are converted into a 16-long bitvector based on the sign of the dotproduct and
|
||||
* placed in -bucket-
|
||||
*/
|
||||
static uint32 hash_16_dotproduct(uint32 bucket,LSH_ITEM *vec,uint32 vecsize,uint32 hashcur,uint32 vecsizeupper)
|
||||
|
||||
{
|
||||
uint32 i,j;
|
||||
uint32 rownum;
|
||||
char *signptr;
|
||||
double res[16];
|
||||
|
||||
for(i=0;i<16;++i)
|
||||
res[i] = 0.0; /* Initialize the dotproduct results to zero */
|
||||
|
||||
if (vecsize < vecsizeupper) { /* If there are a small number of non-zero coefficients in -vec- */
|
||||
for(i=0;i<vecsize;++i) {
|
||||
rownum = vec[i].hash ^ hashcur; /* Calculate the rest of the r_0 hashing function*/
|
||||
rownum = (rownum * 1103515245) + 12345;
|
||||
rownum = (rownum>>24)&0x1f;
|
||||
signptr = hash_signtable + rownum * 16;
|
||||
for(j=0;j<16;++j) { /* Based on the precalculated coeff table calculate this portion of dotproduct */
|
||||
if (signptr[j] == '+')
|
||||
res[j] += vec[i].coeff; /* Dot product with +1 coeff */
|
||||
else
|
||||
res[j] -= vec[i].coeff; /* Dot product with -1 coeff */
|
||||
}
|
||||
}
|
||||
}
|
||||
else { /* If we have many non-zero coeffs in -vec- */
|
||||
for(i=0;i<vecsize;++i) {
|
||||
rownum = vec[i].hash ^ hashcur; /* Calculate the rest of the r_0 hashing function*/
|
||||
rownum = (rownum * 1103515245) + 12345;
|
||||
rownum = (rownum>>24)&0x1f;
|
||||
if (rownum < 0x10) /* Set-up for the FFT */
|
||||
res[rownum] += vec[i].coeff;
|
||||
else
|
||||
res[rownum&0xf] -= vec[i].coeff;
|
||||
}
|
||||
hash_double_fft_16(res); /* Calculate the remaining dotproducts by performing an FFT */
|
||||
}
|
||||
|
||||
for(i=0;i<16;++i) { /* Convert the dotproduct results to a bitvector */
|
||||
bucket <<= 1;
|
||||
if (res[i] > 0.0)
|
||||
bucket |= 1;
|
||||
}
|
||||
return bucket;
|
||||
}
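The two small steps that hash_16_dotproduct repeats, choosing a sign-table row from a feature hash and turning 16 dot products into 16 bits, can be sketched in isolation. This is only an illustrative sketch: the names demo_row_for_hash and demo_signs_to_bits are hypothetical and do not appear in the commit, and the sign table and FFT are left out.

```c
#include <stdint.h>

/* Hypothetical helper: select one of 32 sign-table rows from a feature hash
 * and the current seed, using the same linear congruential step as above. */
uint32_t demo_row_for_hash(uint32_t hash, uint32_t seed)
{
    uint32_t rownum = hash ^ seed;
    rownum = (rownum * 1103515245u) + 12345u;  /* wraps modulo 2^32 */
    return (rownum >> 24) & 0x1f;              /* keep 5 bits: 0..31 */
}

/* Hypothetical helper: shift 16 sign bits into -bucket-, first result in the
 * most significant of the 16 new bits. A bit is 1 only when the dot product
 * is strictly positive. */
uint32_t demo_signs_to_bits(uint32_t bucket, const double res[16])
{
    for (int i = 0; i < 16; ++i) {
        bucket <<= 1;
        if (res[i] > 0.0)
            bucket |= 1;
    }
    return bucket;
}
```

Rows 16 to 31 correspond to the negated copies of rows 0 to 15, which is why the FFT branch above folds them with `rownum&0xf` and a subtraction.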
|
||||
|
||||
void lsh_generate_binids(uint32 *res,LSH_ITEM *vec,uint32 vecsize)
|
||||
|
||||
{
|
||||
uint32 bucket = 0;
|
||||
int32 bucketcnt = 0;
|
||||
int32 i,bitsleft;
|
||||
uint32 curid;
|
||||
uint32 mask,val;
|
||||
uint32 hashbase = LSH_HASHBASE;
|
||||
|
||||
for(i=0;i<lsh_L;++i) {
|
||||
curid = i; /* Tack-on bits that indicate the particular table this binid belongs to */
|
||||
bitsleft = lsh_k;
|
||||
do {
|
||||
if (bucketcnt == 0) {
|
||||
hashbase = (hashbase * 1103515245) + 12345;
|
||||
bucket = hash_16_dotproduct(bucket,vec,vecsize,hashbase,5);
|
||||
bucketcnt += 16;
|
||||
}
|
||||
if (bucketcnt >= bitsleft) {
|
||||
curid <<= bitsleft;
|
||||
mask = 1;
|
||||
mask = (mask << bitsleft)-1;
|
||||
val = bucket >> (bucketcnt - bitsleft);
|
||||
curid |= (val & mask);
|
||||
bucketcnt -= bitsleft;
|
||||
bitsleft = 0;
|
||||
}
|
||||
else {
|
||||
curid <<= bucketcnt;
|
||||
mask = 1;
|
||||
mask = (mask << bucketcnt)-1;
|
||||
curid |= (bucket & mask);
|
||||
bitsleft -= bucketcnt;
|
||||
bucketcnt = 0;
|
||||
}
|
||||
} while(bitsleft > 0);
|
||||
res[ i ] = curid;
|
||||
}
|
||||
}
|
||||
|
||||
void lsh_generate_binids_datum(Datum *res,LSH_ITEM *vec,uint32 vecsize)
|
||||
|
||||
{
|
||||
uint32 bucket = 0;
|
||||
int32 bucketcnt = 0;
|
||||
int32 i,bitsleft;
|
||||
uint32 curid;
|
||||
uint32 mask,val;
|
||||
uint32 hashbase = LSH_HASHBASE;
|
||||
|
||||
for(i=0;i<lsh_L;++i) {
|
||||
curid = i; /* Tack-on bits that indicate the particular table this binid belongs to */
|
||||
bitsleft = lsh_k;
|
||||
do {
|
||||
if (bucketcnt == 0) {
|
||||
hashbase = (hashbase * 1103515245) + 12345;
|
||||
bucket = hash_16_dotproduct(bucket,vec,vecsize,hashbase,5);
|
||||
bucketcnt += 16;
|
||||
}
|
||||
if (bucketcnt >= bitsleft) {
|
||||
curid <<= bitsleft;
|
||||
mask = 1;
|
||||
mask = (mask << bitsleft)-1;
|
||||
val = bucket >> (bucketcnt - bitsleft);
|
||||
curid |= (val & mask);
|
||||
bucketcnt -= bitsleft;
|
||||
bitsleft = 0;
|
||||
}
|
||||
else {
|
||||
curid <<= bucketcnt;
|
||||
mask = 1;
|
||||
mask = (mask << bucketcnt)-1;
|
||||
curid |= (bucket & mask);
|
||||
bitsleft -= bucketcnt;
|
||||
bucketcnt = 0;
|
||||
}
|
||||
} while(bitsleft > 0);
|
||||
res[ i ] = Int32GetDatum((int32)curid);
|
||||
}
|
||||
}
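The bit-packing loop shared by lsh_generate_binids and lsh_generate_binids_datum can be read as "take k bits at a time from a stream delivered in 16-bit chunks, and prefix each result with its table index". The sketch below shows that idea on its own. It assumes a hypothetical bit source, demo_next_16_bits, that simply repeats a fixed 16-bit pattern in place of hash_16_dotproduct; none of the demo_ names come from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for hash_16_dotproduct(): shifts 16 new bits into
 * the low end of -bucket-. Here the bits are a fixed pattern. */
static uint32_t demo_next_16_bits(uint32_t bucket, uint32_t pattern)
{
    return (bucket << 16) | (pattern & 0xffffu);
}

/* Fill res[0..ntables-1] with ids of the form (table index << k) | k bits. */
void demo_pack_binids(uint32_t *res, int ntables, int k, uint32_t pattern)
{
    uint32_t bucket = 0;
    int bucketcnt = 0;               /* unused bits left in -bucket- */

    assert(k >= 1 && k <= 31);
    for (int i = 0; i < ntables; ++i) {
        uint32_t curid = (uint32_t)i;
        int bitsleft = k;
        while (bitsleft > 0) {
            if (bucketcnt == 0) {    /* refill only when fully drained */
                bucket = demo_next_16_bits(bucket, pattern);
                bucketcnt = 16;
            }
            int take = (bucketcnt < bitsleft) ? bucketcnt : bitsleft;
            uint32_t mask = ((uint32_t)1 << take) - 1;
            /* Consume the highest unused bits first */
            curid = (curid << take) | ((bucket >> (bucketcnt - take)) & mask);
            bucketcnt -= take;
            bitsleft -= take;
        }
        res[i] = curid;
    }
}
```

With k = 8 and the pattern 0x1234, the first three ids come out as 0x12, 0x134 and 0x212: each 16-bit chunk feeds two tables, and the table index sits above the 8 hash bits.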
|
101
Ghidra/Features/BSim/src/lshvector/c/crc32.c
Executable file
@ -0,0 +1,101 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
#include "lsh.h"
|
||||
|
||||
#define CRC_UPDATE(REG,VAL) (crc32tab[ ((REG) ^ (VAL))&0xff ] ^ ((REG) >> 8))
|
||||
|
||||
/* Table for bytewise calculation of a 32-bit Cyclic Redundancy Check */
|
||||
uint32 crc32tab[] = {
|
||||
0x0,0x77073096,0xee0e612c,0x990951ba,0x76dc419,0x706af48f,
|
||||
0xe963a535,0x9e6495a3,0xedb8832,0x79dcb8a4,0xe0d5e91e,
|
||||
0x97d2d988,0x9b64c2b,0x7eb17cbd,0xe7b82d07,0x90bf1d91,
|
||||
0x1db71064,0x6ab020f2,0xf3b97148,0x84be41de,0x1adad47d,
|
||||
0x6ddde4eb,0xf4d4b551,0x83d385c7,0x136c9856,0x646ba8c0,
|
||||
0xfd62f97a,0x8a65c9ec,0x14015c4f,0x63066cd9,0xfa0f3d63,
|
||||
0x8d080df5,0x3b6e20c8,0x4c69105e,0xd56041e4,0xa2677172,
|
||||
0x3c03e4d1,0x4b04d447,0xd20d85fd,0xa50ab56b,0x35b5a8fa,
|
||||
0x42b2986c,0xdbbbc9d6,0xacbcf940,0x32d86ce3,0x45df5c75,
|
||||
0xdcd60dcf,0xabd13d59,0x26d930ac,0x51de003a,0xc8d75180,
|
||||
0xbfd06116,0x21b4f4b5,0x56b3c423,0xcfba9599,0xb8bda50f,
|
||||
0x2802b89e,0x5f058808,0xc60cd9b2,0xb10be924,0x2f6f7c87,
|
||||
0x58684c11,0xc1611dab,0xb6662d3d,0x76dc4190,0x1db7106,
|
||||
0x98d220bc,0xefd5102a,0x71b18589,0x6b6b51f,0x9fbfe4a5,
|
||||
0xe8b8d433,0x7807c9a2,0xf00f934,0x9609a88e,0xe10e9818,
|
||||
0x7f6a0dbb,0x86d3d2d,0x91646c97,0xe6635c01,0x6b6b51f4,
|
||||
0x1c6c6162,0x856530d8,0xf262004e,0x6c0695ed,0x1b01a57b,
|
||||
0x8208f4c1,0xf50fc457,0x65b0d9c6,0x12b7e950,0x8bbeb8ea,
|
||||
0xfcb9887c,0x62dd1ddf,0x15da2d49,0x8cd37cf3,0xfbd44c65,
|
||||
0x4db26158,0x3ab551ce,0xa3bc0074,0xd4bb30e2,0x4adfa541,
|
||||
0x3dd895d7,0xa4d1c46d,0xd3d6f4fb,0x4369e96a,0x346ed9fc,
|
||||
0xad678846,0xda60b8d0,0x44042d73,0x33031de5,0xaa0a4c5f,
|
||||
0xdd0d7cc9,0x5005713c,0x270241aa,0xbe0b1010,0xc90c2086,
|
||||
0x5768b525,0x206f85b3,0xb966d409,0xce61e49f,0x5edef90e,
|
||||
0x29d9c998,0xb0d09822,0xc7d7a8b4,0x59b33d17,0x2eb40d81,
|
||||
0xb7bd5c3b,0xc0ba6cad,0xedb88320,0x9abfb3b6,0x3b6e20c,
|
||||
0x74b1d29a,0xead54739,0x9dd277af,0x4db2615,0x73dc1683,
|
||||
0xe3630b12,0x94643b84,0xd6d6a3e,0x7a6a5aa8,0xe40ecf0b,
|
||||
0x9309ff9d,0xa00ae27,0x7d079eb1,0xf00f9344,0x8708a3d2,
|
||||
0x1e01f268,0x6906c2fe,0xf762575d,0x806567cb,0x196c3671,
|
||||
0x6e6b06e7,0xfed41b76,0x89d32be0,0x10da7a5a,0x67dd4acc,
|
||||
0xf9b9df6f,0x8ebeeff9,0x17b7be43,0x60b08ed5,0xd6d6a3e8,
|
||||
0xa1d1937e,0x38d8c2c4,0x4fdff252,0xd1bb67f1,0xa6bc5767,
|
||||
0x3fb506dd,0x48b2364b,0xd80d2bda,0xaf0a1b4c,0x36034af6,
|
||||
0x41047a60,0xdf60efc3,0xa867df55,0x316e8eef,0x4669be79,
|
||||
0xcb61b38c,0xbc66831a,0x256fd2a0,0x5268e236,0xcc0c7795,
|
||||
0xbb0b4703,0x220216b9,0x5505262f,0xc5ba3bbe,0xb2bd0b28,
|
||||
0x2bb45a92,0x5cb36a04,0xc2d7ffa7,0xb5d0cf31,0x2cd99e8b,
|
||||
0x5bdeae1d,0x9b64c2b0,0xec63f226,0x756aa39c,0x26d930a,
|
||||
0x9c0906a9,0xeb0e363f,0x72076785,0x5005713,0x95bf4a82,
|
||||
0xe2b87a14,0x7bb12bae,0xcb61b38,0x92d28e9b,0xe5d5be0d,
|
||||
0x7cdcefb7,0xbdbdf21,0x86d3d2d4,0xf1d4e242,0x68ddb3f8,
|
||||
0x1fda836e,0x81be16cd,0xf6b9265b,0x6fb077e1,0x18b74777,
|
||||
0x88085ae6,0xff0f6a70,0x66063bca,0x11010b5c,0x8f659eff,
|
||||
0xf862ae69,0x616bffd3,0x166ccf45,0xa00ae278,0xd70dd2ee,
|
||||
0x4e048354,0x3903b3c2,0xa7672661,0xd06016f7,0x4969474d,
|
||||
0x3e6e77db,0xaed16a4a,0xd9d65adc,0x40df0b66,0x37d83bf0,
|
||||
0xa9bcae53,0xdebb9ec5,0x47b2cf7f,0x30b5ffe9,0xbdbdf21c,
|
||||
0xcabac28a,0x53b39330,0x24b4a3a6,0xbad03605,0xcdd70693,
|
||||
0x54de5729,0x23d967bf,0xb3667a2e,0xc4614ab8,0x5d681b02,
|
||||
0x2a6f2b94,0xb40bbe37,0xc30c8ea1,0x5a05df1b,0x2d02ef8d };
|
||||
|
||||
uint64 lsh_hash_internal(LSHVECTOR *vec)
|
||||
|
||||
{
|
||||
uint32 reg1,reg2;
|
||||
uint32 curtf,curhash,oldreg1;
|
||||
uint32 i;
|
||||
uint64 res;
|
||||
|
||||
reg1 = 0x12CF93AB;
|
||||
reg2 = 0xEE39B2D6;
|
||||
|
||||
for(i=0;i<vec->numitems;++i) {
|
||||
curtf = vec->items[i].tf;
|
||||
curhash = vec->items[i].hash;
|
||||
oldreg1 = reg1;
|
||||
reg1 = CRC_UPDATE(reg1,curtf);
|
||||
reg1 = CRC_UPDATE(reg1,curhash);
|
||||
reg1 = CRC_UPDATE(reg1,(reg2>>24));
|
||||
reg2 = CRC_UPDATE(reg2,(oldreg1>>24));
|
||||
reg2 = CRC_UPDATE(reg2,(curhash>>8));
|
||||
reg2 = CRC_UPDATE(reg2,(curhash>>16));
|
||||
reg2 = CRC_UPDATE(reg2,(curhash>>24));
|
||||
}
|
||||
res = reg1;
|
||||
res <<= 32;
|
||||
res |= reg2;
|
||||
return res;
|
||||
}
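The values in crc32tab match the widely used reflected CRC-32 table for the polynomial 0xEDB88320, so the table can be regenerated rather than typed in. The following is a minimal sketch under that assumption; demo_crc_table, demo_build_crc_table and demo_crc_update are hypothetical names, and the commit itself ships the table as a literal array.

```c
#include <stdint.h>

static uint32_t demo_crc_table[256];

/* Build the bytewise table for the reflected polynomial 0xEDB88320. */
void demo_build_crc_table(void)
{
    for (uint32_t n = 0; n < 256; ++n) {
        uint32_t c = n;
        for (int bit = 0; bit < 8; ++bit)
            c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        demo_crc_table[n] = c;
    }
}

/* Function form of the CRC_UPDATE macro: fold the low byte of -val-
 * into the 32-bit register -reg-. */
uint32_t demo_crc_update(uint32_t reg, uint32_t val)
{
    return demo_crc_table[(reg ^ val) & 0xff] ^ (reg >> 8);
}
```

Only the low 8 bits of the second argument take part in each update, which is why lsh_hash_internal feeds the hash in one shifted byte at a time.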
|
414
Ghidra/Features/BSim/src/lshvector/c/lsh.c
Executable file
@ -0,0 +1,414 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
#include "lsh.h"
|
||||
#include "fmgr.h"
|
||||
#include "funcapi.h"
|
||||
#include "access/htup_details.h"
|
||||
#include "access/gin.h"
|
||||
#include "libpq/pqformat.h"
|
||||
#include <ctype.h>
|
||||
|
||||
PG_MODULE_MAGIC;
|
||||
|
||||
void _PG_init(void);
|
||||
|
||||
PG_FUNCTION_INFO_V1(lshvector_in);
|
||||
PG_FUNCTION_INFO_V1(lshvector_out);
|
||||
PG_FUNCTION_INFO_V1(lshvector_send);
|
||||
PG_FUNCTION_INFO_V1(lshvector_recv);
|
||||
PG_FUNCTION_INFO_V1(lshvector_hash);
|
||||
PG_FUNCTION_INFO_V1(lshvector_compare);
|
||||
PG_FUNCTION_INFO_V1(lshvector_overlap);
|
||||
|
||||
PG_FUNCTION_INFO_V1(lshvector_gin_extract_value);
|
||||
PG_FUNCTION_INFO_V1(lshvector_gin_extract_query);
|
||||
PG_FUNCTION_INFO_V1(lshvector_gin_consistent);
|
||||
|
||||
PG_FUNCTION_INFO_V1(lsh_load);
|
||||
PG_FUNCTION_INFO_V1(lsh_reload);
|
||||
PG_FUNCTION_INFO_V1(lsh_getweight);
|
||||
|
||||
Datum lshvector_in(PG_FUNCTION_ARGS);
|
||||
Datum lshvector_out(PG_FUNCTION_ARGS);
|
||||
Datum lshvector_send(PG_FUNCTION_ARGS);
|
||||
Datum lshvector_recv(PG_FUNCTION_ARGS);
|
||||
Datum lshvector_hash(PG_FUNCTION_ARGS);
|
||||
Datum lshvector_compare(PG_FUNCTION_ARGS);
|
||||
Datum lshvector_overlap(PG_FUNCTION_ARGS);
|
||||
|
||||
Datum lshvector_gin_extract_value(PG_FUNCTION_ARGS);
|
||||
Datum lshvector_gin_extract_query(PG_FUNCTION_ARGS);
|
||||
Datum lshvector_gin_consistent(PG_FUNCTION_ARGS);
|
||||
|
||||
Datum lsh_load(PG_FUNCTION_ARGS);
|
||||
Datum lsh_reload(PG_FUNCTION_ARGS);
|
||||
Datum lsh_getweight(PG_FUNCTION_ARGS);
|
||||
|
||||
/*
|
||||
* Allocate memory for an LSHVECTOR given the raw count of the number of hash entries in the vector
|
||||
*/
|
||||
static LSHVECTOR *allocate_lshvector(uint32 numentries)
|
||||
|
||||
{
|
||||
LSHVECTOR *out;
|
||||
uint32 maxitems, commonlen;
|
||||
|
||||
/* Maximum number of hashes in a single LSHVECTOR assuming a 1 gigabyte allocation limit */
|
||||
maxitems = (0x3fffffff - HDRSIZELSH) / sizeof(LSH_ITEM);
|
||||
|
||||
if (numentries > maxitems) {
|
||||
ereport(ERROR,(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),errmsg("Exceeded maximum entries for single lshvector")));
|
||||
/* Does not return */
|
||||
}
|
||||
commonlen = HDRSIZELSH + numentries * sizeof(LSH_ITEM);
|
||||
out = (LSHVECTOR *) palloc(commonlen);
|
||||
SET_VARSIZE(out,commonlen);
|
||||
return out;
|
||||
}
|
||||
|
||||
void _PG_init(void)
|
||||
|
||||
{
|
||||
lsh_initialize();
|
||||
}
|
||||
|
||||
Datum lsh_load(PG_FUNCTION_ARGS)
|
||||
|
||||
{
|
||||
if (!weights_loaded) {
|
||||
lsh_load_weights();
|
||||
lsh_load_lookuptable();
|
||||
lsh_load_binconfig();
|
||||
weights_loaded = true;
|
||||
}
|
||||
PG_RETURN_INT32(0);
|
||||
}
|
||||
|
||||
Datum lsh_reload(PG_FUNCTION_ARGS)
|
||||
|
||||
{
|
||||
lsh_load_weights();
|
||||
lsh_load_lookuptable();
|
||||
lsh_load_binconfig();
|
||||
weights_loaded = true;
|
||||
PG_RETURN_INT32(0);
|
||||
}
|
||||
|
||||
Datum lsh_getweight(PG_FUNCTION_ARGS)
|
||||
|
||||
{
|
||||
LSHVECTOR *vec = PG_GETARG_LSHVECTOR_P(0);
|
||||
uint32 arg = PG_GETARG_UINT32(1);
|
||||
double res;
|
||||
|
||||
if (arg >= vec->numitems)
|
||||
res = 0.0;
|
||||
else
|
||||
res = vec->items[arg].coeff;
|
||||
PG_FREE_IF_COPY(vec,0);
|
||||
PG_RETURN_FLOAT8( res );
|
||||
}
|
||||
|
||||
/*
|
||||
* text input
|
||||
*/
|
||||
Datum
|
||||
lshvector_in(PG_FUNCTION_ARGS)
|
||||
{
|
||||
char *buf = (char *) PG_GETARG_POINTER(0);
|
||||
char *ptr,*ptrstart;
|
||||
LSHVECTOR *vec;
|
||||
uint32 numitems = 0;
|
||||
uint32 commacount = 0;
|
||||
uint32 i,j;
|
||||
int32 val;
|
||||
char curc;
|
||||
|
||||
ptr = buf;
|
||||
curc = '\0';
|
||||
while(*ptr) {
|
||||
curc = *ptr;
|
||||
if (isspace(curc)==0) break;
|
||||
++ptr;
|
||||
}
|
||||
if (curc != '(')
|
||||
ereport(ERROR,(errcode(ERRCODE_SYNTAX_ERROR),errmsg("Missing opening '('"))); /* Does not return */
|
||||
++ptr;
|
||||
ptrstart = ptr;
|
||||
while (*ptr) {
|
||||
curc = *ptr;
|
||||
if (curc == ':')
|
||||
numitems += 1;
|
||||
else if (curc == ',')
|
||||
commacount += 1;
|
||||
else if (curc == ')')
|
||||
break;
|
||||
++ptr;
|
||||
}
|
||||
if ((curc != ')')||(numitems != commacount+1))
|
||||
ereport(ERROR,(errcode(ERRCODE_SYNTAX_ERROR),errmsg("Bad delimiters"))); /* Does not return */
|
||||
|
||||
vec = allocate_lshvector(numitems);
|
||||
|
||||
ptr = ptrstart;
|
||||
i = 0;
|
||||
j = 0;
|
||||
while(*ptr && i < numitems) { /* Never write past the items allocated from the ':' count */
|
||||
val = strtol(ptr,&ptr,16);
|
||||
if (j==0) {
|
||||
if ((val<1)||(val>64)) {
|
||||
pfree(vec);
|
||||
ereport(ERROR,(errcode(ERRCODE_SYNTAX_ERROR),errmsg("Term frequency count out of bounds"))); /* Does not return */
|
||||
}
|
||||
vec->items[i].tf = (uint16)val;
|
||||
j = 1;
|
||||
}
|
||||
else {
|
||||
vec->items[i].hash = (uint32)val;
|
||||
vec->items[i].idf = 0;
|
||||
j = 0;
|
||||
i += 1;
|
||||
}
|
||||
while(isspace( *ptr ))
|
||||
ptr++;
|
||||
if (*ptr == ')') break;
|
||||
if (*ptr == ':') {
|
||||
if (j==0) {
|
||||
pfree(vec);
|
||||
ereport(ERROR,(errcode(ERRCODE_SYNTAX_ERROR),errmsg("Expected ','"))); /* Does not return */
|
||||
}
|
||||
ptr++;
|
||||
}
|
||||
else if (*ptr == ',') {
|
||||
if (j==1) {
|
||||
pfree(vec);
|
||||
ereport(ERROR,(errcode(ERRCODE_SYNTAX_ERROR),errmsg("Expected ':'"))); /* Does not return */
|
||||
}
|
||||
ptr++;
|
||||
}
|
||||
}
|
||||
vec->numitems = numitems;
|
||||
lsh_calc_weights(vec);
|
||||
PG_RETURN_POINTER(vec);
|
||||
}
|
||||
|
||||
/*
|
||||
* text output
|
||||
*/
|
||||
Datum
|
||||
lshvector_out(PG_FUNCTION_ARGS)
|
||||
{
|
||||
LSHVECTOR *vec = PG_GETARG_LSHVECTOR_P(0);
|
||||
StringInfoData buf;
|
||||
uint32 i,sz;
|
||||
|
||||
initStringInfo(&buf);
|
||||
|
||||
appendStringInfoChar(&buf,'(');
|
||||
sz = vec->numitems;
|
||||
for(i=0;i<sz;++i) {
|
||||
appendStringInfo(&buf,"%x",(int32)vec->items[i].tf);
|
||||
appendStringInfoChar(&buf,':');
|
||||
appendStringInfo(&buf,"%x",(int32)vec->items[i].hash);
|
||||
if (i+1 < sz)
|
||||
appendStringInfoChar(&buf,',');
|
||||
}
|
||||
appendStringInfoChar(&buf,')');
|
||||
|
||||
PG_FREE_IF_COPY(vec,0);
|
||||
|
||||
PG_RETURN_CSTRING(buf.data);
|
||||
}
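The text form produced by lshvector_out is a parenthesized, comma-separated list of tf:hash pairs, both in lowercase hexadecimal. A standalone sketch of that layout, without the PostgreSQL StringInfo API, might look like the following. The type demo_tf_hash and the function demo_format_lshvector are hypothetical and exist only for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint32_t hash;  /* feature hash */
    uint16_t tf;    /* term frequency, 1..64 */
} demo_tf_hash;

/* Write "(tf:hash,tf:hash,...)" into -out-. Returns the string length,
 * or -1 if the buffer is too small. */
int demo_format_lshvector(char *out, size_t outsize,
                          const demo_tf_hash *items, uint32_t n)
{
    size_t used;
    int w = snprintf(out, outsize, "(");
    if (w < 0 || (size_t)w >= outsize) return -1;
    used = (size_t)w;
    for (uint32_t i = 0; i < n; ++i) {
        w = snprintf(out + used, outsize - used, "%x:%x%s",
                     (unsigned)items[i].tf, (unsigned)items[i].hash,
                     (i + 1 < n) ? "," : "");
        if (w < 0 || (size_t)w >= outsize - used) return -1;
        used += (size_t)w;
    }
    w = snprintf(out + used, outsize - used, ")");
    if (w < 0 || (size_t)w >= outsize - used) return -1;
    return (int)(used + (size_t)w);
}
```

For two items with tf 1 and 64 and hashes 0xdeadbeef and 0x1234, this yields `(1:deadbeef,40:1234)`, which is also the shape that lshvector_in expects to parse.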
|
||||
|
||||
/*
|
||||
* binary output
|
||||
*/
|
||||
Datum
|
||||
lshvector_send(PG_FUNCTION_ARGS)
|
||||
{
|
||||
LSHVECTOR *vec = PG_GETARG_LSHVECTOR_P(0);
|
||||
uint32 i;
|
||||
uint32 numitems;
|
||||
StringInfoData buf;
|
||||
|
||||
numitems = vec->numitems;
|
||||
|
||||
pq_begintypsend(&buf);
|
||||
pq_sendint(&buf,numitems,4);
|
||||
|
||||
for(i=0;i<numitems;++i) {
|
||||
pq_sendint(&buf,vec->items[i].tf,1);
|
||||
pq_sendint(&buf,vec->items[i].hash,4);
|
||||
}
|
||||
PG_FREE_IF_COPY(vec,0);
|
||||
PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
|
||||
}
|
||||
|
||||
/*
|
||||
* binary input
|
||||
*/
|
||||
Datum
|
||||
lshvector_recv(PG_FUNCTION_ARGS)
|
||||
{
|
||||
LSHVECTOR *out;
|
||||
StringInfo buf = (StringInfo) PG_GETARG_POINTER(0);
|
||||
uint32 numitems;
|
||||
uint32 tf;
|
||||
uint32 i;
|
||||
|
||||
numitems = pq_getmsgint(buf,4);
|
||||
out = allocate_lshvector(numitems);
|
||||
|
||||
out->numitems = numitems;
|
||||
for(i=0;i<numitems;++i) {
|
||||
tf = pq_getmsgint(buf,1);
|
||||
if ((tf<1)||(tf>64)) {
|
||||
pfree(out);
|
||||
ereport(ERROR,(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),errmsg("Term frequency is out of range")));
|
||||
/* Does not return */
|
||||
}
|
||||
out->items[i].tf = tf;
|
||||
out->items[i].hash = pq_getmsgint(buf,4);
|
||||
}
|
||||
lsh_calc_weights(out);
|
||||
PG_RETURN_POINTER(out);
|
||||
}
|
||||
|
||||
Datum lshvector_hash(PG_FUNCTION_ARGS)
|
||||
{
|
||||
LSHVECTOR *a = PG_GETARG_LSHVECTOR_P(0);
|
||||
int64 res = (int64)lsh_hash_internal(a);
|
||||
|
||||
PG_FREE_IF_COPY(a,0);
|
||||
|
||||
PG_RETURN_INT64(res);
|
||||
}
|
||||
|
||||
Datum lshvector_compare(PG_FUNCTION_ARGS)
|
||||
{
|
||||
LSHVECTOR *a = PG_GETARG_LSHVECTOR_P(0);
|
||||
LSHVECTOR *b = PG_GETARG_LSHVECTOR_P(1);
|
||||
TupleDesc tupdesc;
|
||||
TupleDesc bless;
|
||||
HeapTuple restuple;
|
||||
Datum dvalues[2];
|
||||
bool nulls[2] = {false, false};
|
||||
double sim,sig;
|
||||
|
||||
sim = lsh_compare_internal(a,b,&sig);
|
||||
PG_FREE_IF_COPY(a,0);
|
||||
PG_FREE_IF_COPY(b,1);
|
||||
|
||||
if (get_call_result_type(fcinfo,NULL,&tupdesc) != TYPEFUNC_COMPOSITE)
|
||||
elog(ERROR,"Could not get composite row type to return");
|
||||
|
||||
bless = BlessTupleDesc(tupdesc);
|
||||
|
||||
dvalues[0] = Float8GetDatum(sim);
|
||||
dvalues[1] = Float8GetDatum(sig);
|
||||
restuple = heap_form_tuple(bless,dvalues,nulls);
|
||||
return HeapTupleGetDatum(restuple);
|
||||
}
|
||||
|
||||
/*
|
||||
* This is the actual operator function being accelerated by the gin index. In truth, the index itself
|
||||
* defines the operator, so the commented out code below emulates the index's key generation process and
|
||||
* looks for overlap in the keys between two vectors. In practice, any query that invokes this operator
|
||||
* will hopefully be going through the index and so doesn't need to evaluate this function. For
|
||||
* cases where postgresql does a recheck after going through the index, there is no query that doesn't send
|
||||
* the results of the operator test to a similarity filter. So there is no reason to actually perform
|
||||
* the overlap test. So we just implement a NOP return that always returns true.
|
||||
*/
|
||||
Datum lshvector_overlap(PG_FUNCTION_ARGS)
|
||||
{
|
||||
/* bool res; */
|
||||
/* int32 i; */
|
||||
/* LSHVECTOR *a = PG_GETARG_LSHVECTOR_P(0); */
|
||||
/* LSHVECTOR *b = PG_GETARG_LSHVECTOR_P(1); */
|
||||
/* uint32 *bina = (uint32 *)palloc( sizeof(uint32) * lsh_L ); */
|
||||
/* uint32 *binb = (uint32 *)palloc( sizeof(uint32) * lsh_L ); */
|
||||
|
||||
/* lsh_generate_binids(bina,a->items,a->numitems); */
|
||||
/* lsh_generate_binids(binb,b->items,b->numitems); */
|
||||
/* PG_FREE_IF_COPY(a,0); */
|
||||
/* PG_FREE_IF_COPY(b,1); */
|
||||
|
||||
/* res = false; /\* Assume no overlap *\/ */
|
||||
/* for(i=0;i<lsh_L;++i) { */
|
||||
/* if (bina[i] == binb[i]) { */
|
||||
/* res = true; /\* We found an overlap, (only need one) *\/ */
|
||||
/* break; */
|
||||
/* } */
|
||||
/* } */
|
||||
/* pfree(bina); */
|
||||
/* pfree(binb); */
|
||||
|
||||
|
||||
PG_RETURN_BOOL(true);
|
||||
}
|
||||
|
||||
Datum lshvector_gin_extract_value(PG_FUNCTION_ARGS)
|
||||
|
||||
{
|
||||
LSHVECTOR *a = PG_GETARG_LSHVECTOR_P(0);
|
||||
int32 *nkeys = (int32 *) PG_GETARG_POINTER(1);
|
||||
Datum *entries = (Datum *)palloc( sizeof(Datum) * lsh_L );
|
||||
|
||||
lsh_generate_binids_datum(entries,a->items,a->numitems);
|
||||
PG_FREE_IF_COPY(a,0);
|
||||
*nkeys = lsh_L;
|
||||
PG_RETURN_POINTER(entries);
|
||||
}
|
||||
|
||||
Datum lshvector_gin_extract_query(PG_FUNCTION_ARGS)
|
||||
|
||||
{
|
||||
LSHVECTOR *a = PG_GETARG_LSHVECTOR_P(0);
|
||||
int32 *nkeys = (int32 *) PG_GETARG_POINTER(1);
|
||||
/* StrategyNumber strategy = PG_GETARG_UINT16(2); */
|
||||
/* bool **pmatch = (bool **) PG_GETARG_POINTER(3); */
|
||||
/* Pointer **extra_data = (Pointer **) PG_GETARG_POINTER(4); */
|
||||
/* bool **nullFlags = (bool **) PG_GETARG_POINTER(5); */
|
||||
/* int32 *searchMode = (int32 *) PG_GETARG_POINTER(6); */
|
||||
Datum *entries = (Datum *)palloc( sizeof(Datum) * lsh_L );
|
||||
|
||||
lsh_generate_binids_datum(entries,a->items,a->numitems);
|
||||
PG_FREE_IF_COPY(a,0);
|
||||
*nkeys = lsh_L;
|
||||
PG_RETURN_POINTER(entries);
|
||||
}
|
||||
|
||||
Datum lshvector_gin_consistent(PG_FUNCTION_ARGS)
|
||||
|
||||
{
|
||||
bool *check = (bool *) PG_GETARG_POINTER(0);
|
||||
/* StrategyNumber strategy = PG_GETARG_UINT16(1); */
|
||||
/* LSHVECTOR *a = PG_GETARG_LSHVECTOR_P(2); */
|
||||
int32 nkeys = PG_GETARG_INT32(3);
|
||||
/* Pointer *extra_data = (Pointer *) PG_GETARG_POINTER(4); */
|
||||
bool *recheck = (bool *) PG_GETARG_POINTER(5);
|
||||
bool res = false;
|
||||
int32 i;
|
||||
|
||||
*recheck = false; /* The operator does NOT need to be recalculated, this routine should exactly match */
|
||||
for(i=0;i<nkeys;++i) {
|
||||
if (check[i]) { /* If ANY hash is present in the indexed lshvector */
|
||||
res = true; /* this is considered an overlap */
|
||||
break; /* and we don't need to look any further */
|
||||
}
|
||||
}
|
||||
PG_RETURN_BOOL(res);
|
||||
}
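The overlap rule used by the index can be stated without any PostgreSQL types: two vectors overlap as soon as they agree on the binid of at least one table. The sketch below restates the commented-out body of lshvector_overlap and the loop in lshvector_gin_consistent as plain C. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the two binid arrays agree in at least one table position. */
bool demo_binids_overlap(const uint32_t *bina, const uint32_t *binb, int ntables)
{
    for (int i = 0; i < ntables; ++i) {
        if (bina[i] == binb[i])
            return true;   /* one shared bin is enough */
    }
    return false;
}

/* True if the index reported any query key as present. */
bool demo_any_key_present(const bool *check, int nkeys)
{
    for (int i = 0; i < nkeys; ++i) {
        if (check[i])
            return true;
    }
    return false;
}
```

Because each binid carries its table index in the high bits, a positional comparison and a set-membership test on the keys describe the same condition.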
|
60
Ghidra/Features/BSim/src/lshvector/c/lsh.h
Executable file
@ -0,0 +1,60 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
#ifndef __LSH_H__
|
||||
#define __LSH_H__
|
||||
|
||||
#include "postgres.h"
|
||||
|
||||
typedef struct
|
||||
{
|
||||
uint32 hash; /* A specific hash */
|
||||
uint16 tf; /* Associated hash(term) frequency */
|
||||
uint16 idf; /* Inverse Document Frequency */
|
||||
double coeff; /* The actual weight of this hash as a coefficient */
|
||||
} LSH_ITEM;
|
||||
|
||||
typedef struct
|
||||
{
|
||||
int32 vl_len_; /* varlena header (do not touch directly!) */
|
||||
uint32 numitems;
|
||||
uint32 hashcount; /* Total number of hashes counting multiplicity */
|
||||
double length; /* Length of vector */
|
||||
LSH_ITEM items[1];
|
||||
} LSHVECTOR;
|
||||
|
||||
#define HDRSIZELSH offsetof(LSHVECTOR,items)
|
||||
|
||||
#define DatumGetLshVectorP(X) ((LSHVECTOR *) PG_DETOAST_DATUM(X))
|
||||
#define PG_GETARG_LSHVECTOR_P(n) DatumGetLshVectorP(PG_GETARG_DATUM(n))
|
||||
|
||||
extern int32 lsh_k;
|
||||
extern int32 lsh_L;
|
||||
extern uint32 crc32tab[];
|
||||
extern bool weights_loaded;
|
||||
|
||||
extern void lsh_calc_weights(LSHVECTOR *vec);
|
||||
extern void lsh_initialize(void);
|
||||
extern void lsh_load_weights(void);
|
||||
extern void lsh_load_lookuptable(void);
|
||||
extern uint64 lsh_hash_internal(LSHVECTOR *vec);
|
||||
extern double lsh_compare_internal(LSHVECTOR *a,LSHVECTOR *b,double *sig);
|
||||
|
||||
extern void lsh_setup_signtable(void);
|
||||
extern void lsh_load_binconfig(void);
|
||||
extern void lsh_generate_binids(uint32 *res,LSH_ITEM *vec,uint32 vecsize);
|
||||
extern void lsh_generate_binids_datum(Datum *res,LSH_ITEM *vec,uint32 vecsize);
|
||||
|
||||
#endif
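The LSHVECTOR layout is a fixed header followed by a variable number of LSH_ITEM entries, and HDRSIZELSH is the byte offset at which those entries start. The size arithmetic used by allocate_lshvector can be sketched outside PostgreSQL as follows. This sketch assumes plain malloc in place of palloc and uses hypothetical demo_ types that only mirror the field order in lsh.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t hash;
    uint16_t tf;
    uint16_t idf;
    double coeff;
} demo_item;

typedef struct {
    int32_t vl_len_;       /* stands in for the varlena header */
    uint32_t numitems;
    uint32_t hashcount;
    double length;
    demo_item items[1];    /* really -numitems- entries */
} demo_vector;

#define DEMO_HDRSIZE offsetof(demo_vector, items)

/* Total bytes needed for a vector holding -numentries- items. */
size_t demo_vector_size(uint32_t numentries)
{
    return DEMO_HDRSIZE + (size_t)numentries * sizeof(demo_item);
}

/* Allocate a vector; returns NULL on failure. The caller frees it. */
demo_vector *demo_vector_alloc(uint32_t numentries)
{
    demo_vector *v = malloc(demo_vector_size(numentries));
    if (v != NULL)
        v->numitems = numentries;
    return v;
}
```

Using offsetof rather than sizeof for the header avoids counting the one placeholder element of items twice.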
|
476
Ghidra/Features/BSim/src/lshvector/c/weights.c
Executable file
@ -0,0 +1,476 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
#include "lsh.h"
|
||||
#include "fmgr.h"
|
||||
#include "executor/spi.h"
|
||||
#include "utils/memutils.h"
|
||||
#include <math.h>
|
||||
|
||||
#define LSH_IDFSIZE 512
|
||||
#define LSH_TFSIZE 64
|
||||
#define LSH_MAX_HASHENTRIES 1048576
|
||||
#define LSH_MAX_K 31
|
||||
#define LSH_MAX_L 1024
|
||||
#define LSH_DEFAULT_K 17
|
||||
#define LSH_DEFAULT_L 146
|
||||
|
||||
int32 lsh_k; /* Number of bits in a binid */
|
||||
int32 lsh_L; /* Number of binnings */
|
||||
|
||||
static double lsh_idfweight[LSH_IDFSIZE]; /* Sorted weights least -> most probable for Inverse Document Freq */
|
||||
static double lsh_tfweight[LSH_TFSIZE]; /* Sorted weights least -> most probable for Term Frequency */
|
||||
static double lsh_weightnorm; /* Normalization of idf weights over raw log(probability) */
|
||||
static double lsh_probflip0; /* Significance penalty for hash flips */
|
||||
static double lsh_probflip1;
|
||||
static double lsh_probdiff0; /* Significance penalty for length differences */
|
||||
static double lsh_probdiff1;
|
||||
static double lsh_scale; /* Final scaling for significance scoring */
|
||||
static double lsh_addend;
|
||||
static double lsh_probflip0_norm;
|
||||
static double lsh_probflip1_norm;
|
||||
static double lsh_probdiff0_norm;
|
||||
static double lsh_probdiff1_norm;
|
||||
|
||||
typedef struct {
|
||||
uint32 hash;
|
||||
uint32 count;
|
||||
} IDFEntry;
|
||||
|
||||
static MemoryContext lsh_mem_ctx;
|
||||
static uint32 lsh_IDFTableMask; /* mask for hash table computation */
|
||||
static IDFEntry *lsh_IDFTable = NULL; /* The IDFLookup table */
|
||||
bool weights_loaded = false;
|
||||
|
||||
static void update_norms(void)
|
||||
|
||||
{
|
||||
int32 i;
|
||||
double scale_sqrt = sqrt(lsh_scale);
|
||||
lsh_probflip0_norm = lsh_probflip0 * lsh_scale;
|
||||
lsh_probflip1_norm = lsh_probflip1 * lsh_scale;
|
||||
lsh_probdiff0_norm = lsh_probdiff0 * lsh_scale;
|
||||
lsh_probdiff1_norm = lsh_probdiff1 * lsh_scale;
|
||||
lsh_weightnorm = lsh_weightnorm / lsh_scale;
|
||||
for(i=0;i<LSH_IDFSIZE;++i) {
|
||||
lsh_idfweight[i] *= scale_sqrt;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Load the IDF and TF weights and other scaling info from the table 'weighttable'
|
||||
* If the table isn't present, return false
|
||||
* This assumes the existence of a table with LSH_IDFSIZE + LSH_TFSIZE + 7 rows constructed with
|
||||
* CREATE TABLE weighttable (id integer,weight double precision);
|
||||
*/
|
||||
static bool load_weights_from_table(void)
|
||||
|
||||
{
|
||||
SPITupleTable *spi_tuptable;
|
||||
TupleDesc spi_tupdesc;
|
||||
uint64 i,proc;
|
||||
int32 ret;
|
||||
char *resstring;
|
||||
int32 resindex;
|
||||
double resweight;
|
||||
|
||||
ret = SPI_connect();
|
||||
|
||||
if (ret < 0)
|
||||
elog(ERROR,"lshvector load_weights_from_table: SPI_connect returned %d",ret);
|
||||
|
||||
/* Check for the existence of weighttable */
|
||||
ret = SPI_execute("SELECT relname from pg_class where relname='weighttable';",true,0);
|
||||
proc = SPI_processed;
|
||||
if ((ret != SPI_OK_SELECT)||(proc != 1)) {
|
||||
elog(WARNING,"lshvector load_weights_from_table: weighttable not present - using default weights");
|
||||
SPI_finish();
|
||||
return false;
|
||||
}
|
||||
|
||||
ret = SPI_execute("SELECT ALL * from weighttable;",true,0); /* Read(only) all rows from table */
|
||||
proc = SPI_processed;
|
||||
|
||||
if ((ret != SPI_OK_SELECT)||(proc != (LSH_IDFSIZE+LSH_TFSIZE + 7))) {
|
||||
elog(WARNING,"lshvector load_weights_from_table: weighttable has incorrect length - reverting to default weights");
|
||||
SPI_finish();
|
||||
return false;
|
||||
}
|
||||
spi_tupdesc = SPI_tuptable->tupdesc;
|
||||
spi_tuptable = SPI_tuptable;
|
||||
|
||||
for(i=0;i<proc;++i) {
|
||||
HeapTuple tuple = spi_tuptable->vals[i];
|
||||
resstring = SPI_getvalue(tuple, spi_tupdesc, 1); /* Column numbers start at 1 */
|
||||
resindex = strtol(resstring,NULL,10);
|
||||
pfree(resstring);
|
||||
resstring = SPI_getvalue(tuple, spi_tupdesc, 2);
|
||||
resweight = atof( resstring );
|
||||
pfree(resstring);
|
||||
if (resindex < LSH_IDFSIZE)
|
||||
lsh_idfweight[resindex] = resweight;
|
||||
else if (resindex < LSH_IDFSIZE + LSH_TFSIZE)
|
||||
lsh_tfweight[resindex - LSH_IDFSIZE] = resweight;
|
||||
else if (resindex == (LSH_IDFSIZE + LSH_TFSIZE))
|
||||
lsh_weightnorm = resweight;
|
||||
else if (resindex == (LSH_IDFSIZE + LSH_TFSIZE + 1))
|
||||
lsh_probflip0 = resweight;
|
||||
else if (resindex == (LSH_IDFSIZE + LSH_TFSIZE + 2))
|
||||
lsh_probflip1 = resweight;
|
||||
else if (resindex == (LSH_IDFSIZE + LSH_TFSIZE + 3))
|
||||
lsh_probdiff0 = resweight;
|
||||
else if (resindex == (LSH_IDFSIZE + LSH_TFSIZE + 4))
|
||||
lsh_probdiff1 = resweight;
|
||||
else if (resindex == (LSH_IDFSIZE + LSH_TFSIZE + 5))
|
||||
lsh_scale = resweight;
|
||||
else if (resindex == (LSH_IDFSIZE + LSH_TFSIZE + 6))
|
||||
lsh_addend = resweight;
|
||||
else {
|
||||
SPI_finish();
|
||||
return false;
|
||||
}
|
||||
}
|
||||
SPI_finish();
|
||||
update_norms();
|
||||
return true;
|
||||
}
|
||||
|
||||
void lsh_load_weights(void)
|
||||
|
||||
{
|
||||
int32 i;
|
||||
if (load_weights_from_table()) /* Try to get weights from table */
|
||||
return;
|
||||
|
||||
/* Provide some sort of reasonable default */
|
||||
for(i=0;i<LSH_IDFSIZE;++i)
|
||||
lsh_idfweight[i] = 1.0;
|
||||
for(i=0;i<LSH_TFSIZE;++i)
|
||||
lsh_tfweight[i] = 1.0;
|
||||
|
||||
lsh_weightnorm = 13.0;
|
||||
lsh_probflip0 = 0.2;
|
||||
lsh_probflip1 = 20.0;
|
||||
lsh_probdiff0 = 0.2;
|
||||
lsh_probdiff1 = 20.0;
|
||||
lsh_scale = 1.0;
|
||||
lsh_addend = 0.0;
|
||||
update_norms();
|
||||
}
|
||||
|
||||
static void initialize_idflookup_hashtable(uint32 size)
|
||||
|
||||
{
|
||||
uint32 i;
|
||||
MemoryContext oldctx;
|
||||
|
||||
lsh_IDFTableMask = 1;
|
||||
while( lsh_IDFTableMask < size )
|
||||
lsh_IDFTableMask <<= 1;
|
||||
|
||||
lsh_IDFTableMask <<= 1;
|
||||
oldctx = MemoryContextSwitchTo(lsh_mem_ctx);
|
||||
lsh_IDFTable = (IDFEntry *) palloc(sizeof(IDFEntry) * lsh_IDFTableMask);
|
||||
for(i=0;i<lsh_IDFTableMask;++i) {
|
||||
lsh_IDFTable[i].count = 0xffffffff; /* Mark all the slots as empty */
|
||||
}
|
||||
|
||||
lsh_IDFTableMask -= 1;
|
||||
MemoryContextSwitchTo(oldctx);
|
||||
}
|
||||
|
||||
static void insert_idflookup_hash(uint32 hash,uint32 count)
|
||||
|
||||
{
|
||||
IDFEntry *ptr;
|
||||
uint32 val = hash & lsh_IDFTableMask;
|
||||
for(;;) {
|
||||
ptr = lsh_IDFTable + val;
|
||||
if (ptr->count == 0xffffffff) /* Found an empty slot */
|
||||
break;
|
||||
val = (val + 1) & lsh_IDFTableMask;
|
||||
}
|
||||
ptr->hash = hash;
|
||||
ptr->count = count;
|
||||
}
|
||||
|
||||
static uint32 get_idflookup_count(uint32 hash)
|
||||
|
||||
{
|
||||
uint32 val;
|
||||
IDFEntry *ptr;
|
||||
if (lsh_IDFTableMask == 0)
|
||||
return 0;
|
||||
val = hash & lsh_IDFTableMask;
|
||||
for(;;) {
|
||||
ptr = lsh_IDFTable + val;
|
||||
if (ptr->count == 0xffffffff) break; /* Is slot empty */
|
||||
if (ptr->hash == hash)
|
||||
return ptr->count;
|
||||
val = (val + 1) & lsh_IDFTableMask;
|
||||
}
|
||||
return 0; /* Entry is not in the table (assume 0 count) */
|
||||
}
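The IDF lookup above is an open-addressing hash table with linear probing: the capacity is a power of two at least twice the entry count, 0xffffffff in the count field marks an empty slot, and a miss is reported as count 0. A self-contained sketch of the same scheme follows. It assumes malloc in place of a PostgreSQL memory context, and the demo_ names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_EMPTY 0xffffffffu   /* count value marking an empty slot */

typedef struct { uint32_t hash; uint32_t count; } demo_entry;
typedef struct { demo_entry *slots; uint32_t mask; } demo_table;

/* Size the table to a power of two at least 2x -nentries-. Returns 0 on success. */
int demo_table_init(demo_table *t, uint32_t nentries)
{
    uint32_t cap = 1;
    while (cap < nentries)
        cap <<= 1;
    cap <<= 1;                   /* spare room so probing always hits an empty slot */
    t->slots = malloc(sizeof(demo_entry) * cap);
    if (t->slots == NULL)
        return -1;
    for (uint32_t i = 0; i < cap; ++i)
        t->slots[i].count = DEMO_EMPTY;
    t->mask = cap - 1;
    return 0;
}

/* Insert by probing forward from hash & mask to the first empty slot. */
void demo_table_insert(demo_table *t, uint32_t hash, uint32_t count)
{
    uint32_t val = hash & t->mask;
    while (t->slots[val].count != DEMO_EMPTY)
        val = (val + 1) & t->mask;
    t->slots[val].hash = hash;
    t->slots[val].count = count;
}

/* Return the stored count, or 0 if the hash is not present. */
uint32_t demo_table_lookup(const demo_table *t, uint32_t hash)
{
    uint32_t val = hash & t->mask;
    for (;;) {
        const demo_entry *ptr = &t->slots[val];
        if (ptr->count == DEMO_EMPTY)
            return 0;
        if (ptr->hash == hash)
            return ptr->count;
        val = (val + 1) & t->mask;
    }
}
```

Keeping the load factor at or below one half guarantees that every probe sequence reaches an empty slot, so neither loop needs an explicit bound.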
|
||||
|
||||
/*
|
||||
* Based on hash and existing idf and tf counts, calculate the final coefficient
|
||||
* Also calculate the vector length and hashcount
|
||||
*/
|
||||
void lsh_calc_weights(LSHVECTOR *vec)
|
||||
|
||||
{
|
||||
uint32 i;
|
||||
LSH_ITEM *ptr;
|
||||
uint32 idf;
|
||||
double length = 0.0;
|
||||
double coeff;
|
||||
uint32 tf;
|
||||
uint32 hashcount = 0;
|
||||
|
||||
ptr = vec->items;
|
||||
for(i=0;i<vec->numitems;++i) {
|
||||
idf = get_idflookup_count(ptr[i].hash);
|
||||
ptr[i].idf = idf;
|
||||
tf = ptr[i].tf;
|
||||
coeff = lsh_idfweight[idf] * lsh_tfweight[ tf - 1 ];
|
||||
ptr[i].coeff = coeff;
|
||||
length += coeff * coeff;
|
||||
hashcount += tf;
|
||||
}
|
||||
vec->length = sqrt(length);
|
||||
vec->hashcount = hashcount;
|
||||
}
|
||||
|
||||
/* Load the most common IDF hashes for lookup and weight generation from the table 'idflookup'
|
||||
* If the table isn't present, return false
|
||||
* This assumes the existence of a table with (approximately) 1000 rows constructed with
|
||||
* CREATE TABLE idflookup( hash bigint, lookup integer);
|
||||
*/
|
||||
static bool load_idflookup_from_table(void)
|
||||
|
||||
{
|
||||
SPITupleTable *spi_tuptable;
|
||||
TupleDesc spi_tupdesc;
|
||||
uint64 i,proc;
|
||||
int32 ret;
|
||||
char *resstring;
|
||||
uint32 rescount;
|
||||
uint32 reshash;
|
||||
|
||||
ret = SPI_connect();
|
||||
|
||||
if (ret < 0)
|
||||
elog(ERROR,"lshvector load_idflookup_from_table: SPI_connect returned %d",ret);
|
||||
|
||||
/* Check for the existence of idflookup */
|
||||
ret = SPI_execute("SELECT relname from pg_class where relname='idflookup';",true,0);
|
||||
proc = SPI_processed;
|
||||
if ((ret != SPI_OK_SELECT)||(proc != 1)) {
|
||||
elog(WARNING,"lshvector load_idflookup_from_table: No IDF hashes present");
|
||||
SPI_finish();
|
||||
return false;
|
||||
}
|
||||
|
||||
ret = SPI_execute("SELECT ALL * from idflookup;",true,0); /* Read(only) all rows from table */
|
||||
proc = SPI_processed;
|
||||
if ((ret != SPI_OK_SELECT)||(proc <= 1)||(proc > LSH_MAX_HASHENTRIES)) {
|
||||
elog(WARNING,"lshvector load_idflookup_from_table: idflookup has invalid size: IDF hashes not loaded");
|
||||
SPI_finish();
|
||||
return false;
|
||||
}
|
||||
initialize_idflookup_hashtable((uint32)proc); /* Allocate the hashtable to hold entries for each row */
|
||||
|
||||
spi_tupdesc = SPI_tuptable->tupdesc;
|
||||
spi_tuptable = SPI_tuptable;
|
||||
|
||||
for(i=0;i<proc;++i) {
|
||||
HeapTuple tuple = spi_tuptable->vals[i];
|
||||
resstring = SPI_getvalue(tuple, spi_tupdesc, 1); /* Column numbers start at 1 */
|
||||
reshash = strtoul(resstring,NULL,10);
|
||||
pfree(resstring);
|
||||
resstring = SPI_getvalue(tuple, spi_tupdesc, 2);
|
||||
rescount = strtoul(resstring,NULL,10);
|
||||
pfree(resstring);
|
||||
insert_idflookup_hash(reshash,rescount);
|
||||
}
|
||||
SPI_finish();
|
||||
return true;
|
||||
}
|
||||
|
||||
void lsh_load_binconfig(void)
|
||||
|
||||
{ /* Load the k and L parameters from the database */
|
||||
SPITupleTable *spi_tuptable;
|
||||
TupleDesc spi_tupdesc;
|
||||
uint64 proc;
|
||||
int32 ret;
|
||||
char *resstring;
|
||||
HeapTuple tuple;
|
||||
|
||||
ret = SPI_connect();
|
||||
|
||||
if (ret < 0)
|
||||
elog(ERROR,"lshvector lsh_load_binconfig: SPI_connect returned %d",ret);
|
||||
|
||||
/* Check for the existence of keyvaluetable */
|
||||
ret = SPI_execute("SELECT relname from pg_class where relname='keyvaluetable';",true,0);
|
||||
proc = SPI_processed;
|
||||
if ((ret != SPI_OK_SELECT)||(proc != 1)) {
|
||||
SPI_finish();
|
||||
lsh_k = LSH_DEFAULT_K; /* Reasonable defaults if configuration parameters don't exist */
|
||||
lsh_L = LSH_DEFAULT_L;
|
||||
return;
|
||||
}
|
||||
|
||||
/* Get the 'k' value */
|
||||
ret = SPI_execute("SELECT value FROM keyvaluetable WHERE key='k';",true,0);
|
||||
proc = SPI_processed;
|
||||
if ((ret != SPI_OK_SELECT)||(proc != 1))
|
||||
elog(ERROR,"lshvector lsh_load_binconfig: Could not load 'k' value from keyvaluetable");
|
||||
|
||||
spi_tupdesc = SPI_tuptable->tupdesc;
|
||||
spi_tuptable = SPI_tuptable;
|
||||
|
||||
tuple = spi_tuptable->vals[0];
|
||||
resstring = SPI_getvalue(tuple,spi_tupdesc, 1); /* First column */
|
||||
lsh_k = strtoul(resstring,NULL,10);
|
||||
pfree(resstring);
|
||||
|
||||
/* Get the 'L' value */
|
||||
ret = SPI_execute("SELECT value FROM keyvaluetable WHERE key='L';",true,0);
|
||||
proc = SPI_processed;
|
||||
if ((ret != SPI_OK_SELECT)||(proc != 1))
|
||||
elog(ERROR,"lshvector lsh_load_binconfig: Could not load 'L' value from keyvaluetable");
|
||||
|
||||
spi_tupdesc = SPI_tuptable->tupdesc;
|
||||
spi_tuptable = SPI_tuptable;
|
||||
|
||||
tuple = spi_tuptable->vals[0];
|
||||
resstring = SPI_getvalue(tuple,spi_tupdesc, 1); /* First column */
|
||||
lsh_L = strtoul(resstring,NULL,10);
|
||||
pfree(resstring);
|
||||
SPI_finish();
|
||||
|
||||
if (lsh_k < 1 || lsh_k > LSH_MAX_K || lsh_L < 1 || lsh_L > LSH_MAX_L)
|
||||
elog(ERROR,"lshvector lsh_load_binconfig: Invalid k and L settings");
|
||||
}
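`lsh_load_binconfig` converts the `value` column with `strtoul(resstring,NULL,10)` and range-checks `lsh_k` and `lsh_L` only afterwards, so text such as `12abc` would be read as 12. A stricter conversion could be sketched as follows, assuming one wants to reject NULL or non-numeric text before the range check. The helper name `demo_parse_bounded` is hypothetical and not part of lshvector.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse an unsigned decimal string into *out, accepting only values in
 * [lo, hi]. Returns 0 on success and -1 on NULL, malformed, or
 * out-of-range input; *out is left untouched on failure. */
static int demo_parse_bounded(const char *text, uint32_t lo, uint32_t hi,
                              uint32_t *out)
{
    char *end;
    unsigned long val;

    if (text == NULL)
        return -1;
    /* Require a leading digit: strtoul itself would accept whitespace and a sign */
    if (*text < '0' || *text > '9')
        return -1;
    errno = 0;
    val = strtoul(text, &end, 10);
    if (errno != 0 || *end != '\0')   /* overflow or trailing garbage */
        return -1;
    if (val < lo || val > hi)
        return -1;
    *out = (uint32_t)val;
    return 0;
}
```

In the extension, a failure return would presumably be reported with `elog(ERROR, ...)`, as the existing checks are. The bounds passed by a caller would be 1 and `LSH_MAX_K` or `LSH_MAX_L`, whose values are not shown in this diff.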
|
||||
|
||||
void lsh_load_lookuptable(void)
|
||||
|
||||
{
|
||||
if (lsh_IDFTable != NULL) {
|
||||
pfree(lsh_IDFTable);
|
||||
lsh_IDFTable = NULL;
|
||||
}
|
||||
|
||||
if (load_idflookup_from_table())
|
||||
return;
|
||||
|
||||
if (lsh_IDFTable != NULL) {
|
||||
pfree(lsh_IDFTable);
|
||||
lsh_IDFTable = NULL;
|
||||
}
|
||||
lsh_IDFTableMask = 0; /* Default lookup, always return 0 */
|
||||
}
|
||||
|
||||
/* Initialize the weight system, the first time the extension is loaded */
|
||||
void lsh_initialize(void)
|
||||
|
||||
{
|
||||
lsh_mem_ctx = AllocSetContextCreate(TopMemoryContext,
|
||||
"IDF weights lookup table",
|
||||
ALLOCSET_DEFAULT_MINSIZE,
|
||||
ALLOCSET_DEFAULT_INITSIZE,
|
||||
ALLOCSET_DEFAULT_MAXSIZE);
|
||||
|
||||
lsh_IDFTable = NULL;
|
||||
weights_loaded = false;
|
||||
|
||||
lsh_setup_signtable();
|
||||
}
|
||||
|
||||
double lsh_compare_internal(LSHVECTOR *a,LSHVECTOR *b,double *sig)
|
||||
|
||||
{
|
||||
double res = 0.0;
|
||||
double dotproduct;
|
||||
int32 intersectcount = 0;
|
||||
uint32 hash1,hash2;
|
||||
LSH_ITEM *aptr,*aend,*bptr,*bend;
|
||||
int32 t1,t2;
|
||||
double w1,w2;
|
||||
uint32 numflip,diff,min,max;
|
||||
|
||||
aptr = a->items;
|
||||
aend = aptr + a->numitems;
|
||||
bptr = b->items;
|
||||
bend = bptr + b->numitems;
|
||||
|
||||
if ((aptr != aend)&&(bptr != bend)) {
|
||||
hash1 = aptr->hash;
|
||||
hash2 = bptr->hash;
|
||||
for(;;) {
|
||||
if (hash1 == hash2) {
|
||||
t1 = aptr->tf;
|
||||
t2 = bptr->tf;
|
||||
if (t1 < t2) { /* a has the smaller number of terms with the same hash */
|
||||
w1 = aptr->coeff; /* Use a's weight */
|
||||
res += w1 * w1;
|
||||
intersectcount += t1; /* All of a's terms are in the intersection, count them */
|
||||
}
|
||||
else {
|
||||
w2 = bptr->coeff; /* Use b's weight */
|
||||
res += w2 * w2;
|
||||
intersectcount += t2; /* All of b's terms are in the intersection, count them */
|
||||
}
|
||||
aptr++;
|
||||
bptr++;
|
||||
if (aptr == aend) break;
|
||||
if (bptr == bend) break;
|
||||
hash1 = aptr->hash;
|
||||
hash2 = bptr->hash;
|
||||
}
|
||||
else if (hash1 < hash2) {
|
||||
aptr++;
|
||||
if (aptr == aend) break;
|
||||
hash1 = aptr->hash;
|
||||
}
|
||||
else { /* hash1 > hash2 */
|
||||
bptr++;
|
||||
if (bptr == bend) break;
|
||||
hash2 = bptr->hash;
|
||||
}
|
||||
}
|
||||
dotproduct = res;
|
||||
res /= (a->length * b->length);
|
||||
}
|
||||
else
|
||||
dotproduct = res;
|
||||
|
||||
if (a->hashcount < b->hashcount) {
|
||||
min = a->hashcount; /* Smallest vector is a */
|
||||
max = b->hashcount;
|
||||
}
|
||||
else {
|
||||
min = b->hashcount;
|
||||
max = a->hashcount;
|
||||
}
|
||||
diff = max - min; /* Subtract to get a positive difference */
|
||||
numflip = min - intersectcount;
|
||||
*sig = dotproduct - numflip * (lsh_probflip0_norm + lsh_probflip1_norm/max)
|
||||
- diff * (lsh_probdiff0_norm + lsh_probdiff1_norm/max) + lsh_addend;
|
||||
return res;
|
||||
}
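`lsh_compare_internal` is a two-pointer merge that relies on both item arrays being sorted by ascending hash. For each shared hash it adds the squared coefficient of the side with the smaller term count, then derives a cosine-style similarity and a significance score penalized by unmatched terms. The sketch below restates that logic in standalone form. The types `CmpItem`, `CmpVector` and `CmpPenalty` are hypothetical stand-ins for `LSH_ITEM`, `LSHVECTOR` and the `lsh_prob*_norm` globals, and the guard for two empty vectors is an addition of this sketch that avoids dividing by zero.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t hash;
    uint32_t tf;        /* term frequency */
    double coeff;       /* precomputed weight */
} CmpItem;

typedef struct {
    const CmpItem *items;   /* sorted by ascending hash, hashes unique */
    size_t numitems;
    double length;          /* Euclidean norm of the coeff values */
    uint32_t hashcount;     /* sum of the tf values */
} CmpVector;

typedef struct {
    double flip0, flip1, diff0, diff1, addend;
} CmpPenalty;

/* Returns the similarity and writes the significance score to *sig. */
static double demo_compare(const CmpVector *a, const CmpVector *b,
                           const CmpPenalty *p, double *sig)
{
    size_t i = 0, j = 0;
    double dot = 0.0, sim = 0.0;
    uint32_t intersect = 0;
    uint32_t min, max, numflip, diff;

    while (i < a->numitems && j < b->numitems) {
        const CmpItem *x = &a->items[i];
        const CmpItem *y = &b->items[j];
        if (x->hash == y->hash) {
            /* Use the side with fewer terms; ties go to b, as above */
            const CmpItem *m = (x->tf < y->tf) ? x : y;
            dot += m->coeff * m->coeff;
            intersect += m->tf;
            ++i;
            ++j;
        } else if (x->hash < y->hash) {
            ++i;
        } else {
            ++j;
        }
    }
    if (a->numitems != 0 && b->numitems != 0)
        sim = dot / (a->length * b->length);

    if (a->hashcount < b->hashcount) {
        min = a->hashcount;
        max = b->hashcount;
    } else {
        min = b->hashcount;
        max = a->hashcount;
    }
    diff = max - min;               /* terms the larger vector has in excess */
    numflip = min - intersect;      /* terms of the smaller vector left unmatched */

    *sig = dot;
    if (max != 0)                   /* added guard: skip penalties for two empty vectors */
        *sig -= numflip * (p->flip0 + p->flip1 / max)
              + diff * (p->diff0 + p->diff1 / max);
    *sig += p->addend;
    return sim;
}
```

The merge visits each item at most once, so a comparison costs time linear in the combined number of items.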
|
||||
|
107
Ghidra/Features/BSim/src/lshvector/lshvector--1.0.sql
Executable file
@ -0,0 +1,107 @@
|
||||
|
||||
|
||||
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
|
||||
\echo Use "CREATE EXTENSION lshvector" to load this file. \quit
|
||||
|
||||
-- Create user-defined type for feature vector
|
||||
|
||||
CREATE FUNCTION lshvector_in(cstring)
|
||||
RETURNS lshvector
|
||||
AS 'MODULE_PATHNAME'
|
||||
LANGUAGE C STABLE STRICT;
|
||||
-- Stable because of configurable weights
|
||||
|
||||
CREATE FUNCTION lshvector_out(lshvector)
|
||||
RETURNS cstring
|
||||
AS 'MODULE_PATHNAME'
|
||||
LANGUAGE C IMMUTABLE STRICT;
|
||||
|
||||
CREATE FUNCTION lshvector_recv(internal)
|
||||
RETURNS lshvector
|
||||
AS 'MODULE_PATHNAME'
|
||||
LANGUAGE C STABLE STRICT;
|
||||
-- Stable because of configurable weights
|
||||
|
||||
CREATE FUNCTION lshvector_send(lshvector)
|
||||
RETURNS bytea
|
||||
AS 'MODULE_PATHNAME'
|
||||
LANGUAGE C IMMUTABLE STRICT;
|
||||
|
||||
CREATE FUNCTION lshvector_hash(lshvector)
|
||||
RETURNS int8
|
||||
AS 'MODULE_PATHNAME'
|
||||
LANGUAGE C IMMUTABLE STRICT;
|
||||
|
||||
CREATE FUNCTION lsh_load()
|
||||
RETURNS int4
|
||||
AS 'MODULE_PATHNAME'
|
||||
LANGUAGE C STRICT;
|
||||
|
||||
CREATE FUNCTION lsh_reload()
|
||||
RETURNS int4
|
||||
AS 'MODULE_PATHNAME'
|
||||
LANGUAGE C STRICT;
|
||||
|
||||
CREATE FUNCTION lsh_getweight(lshvector)
|
||||
RETURNS float8
|
||||
AS 'MODULE_PATHNAME'
|
||||
LANGUAGE C IMMUTABLE STRICT;
|
||||
|
||||
CREATE TYPE lshvector (
|
||||
INTERNALLENGTH = variable,
|
||||
INPUT = lshvector_in,
|
||||
OUTPUT = lshvector_out,
|
||||
RECEIVE = lshvector_recv,
|
||||
SEND = lshvector_send,
|
||||
ALIGNMENT = double,
|
||||
STORAGE = external
|
||||
);
|
||||
|
||||
CREATE TYPE lshvector_comptype AS (
|
||||
sim DOUBLE PRECISION,
|
||||
sig DOUBLE PRECISION
|
||||
);
|
||||
|
||||
CREATE FUNCTION lshvector_compare(lshvector,lshvector)
|
||||
RETURNS lshvector_comptype
|
||||
AS 'MODULE_PATHNAME'
|
||||
LANGUAGE C IMMUTABLE STRICT;
|
||||
|
||||
CREATE FUNCTION lshvector_overlap(lshvector,lshvector)
|
||||
RETURNS bool
|
||||
AS 'MODULE_PATHNAME'
|
||||
LANGUAGE C STABLE STRICT;
|
||||
|
||||
CREATE FUNCTION lshvector_gin_extract_value(lshvector,internal)
|
||||
RETURNS internal
|
||||
AS 'MODULE_PATHNAME'
|
||||
LANGUAGE C STABLE STRICT;
|
||||
|
||||
CREATE FUNCTION lshvector_gin_extract_query(lshvector,internal,int2,internal,internal,internal,internal)
|
||||
RETURNS internal
|
||||
AS 'MODULE_PATHNAME'
|
||||
LANGUAGE C STABLE STRICT;
|
||||
|
||||
CREATE FUNCTION lshvector_gin_consistent(internal, int2, lshvector, int4, internal, internal, internal, internal)
|
||||
RETURNS bool
|
||||
AS 'MODULE_PATHNAME'
|
||||
LANGUAGE C IMMUTABLE STRICT;
|
||||
|
||||
CREATE OPERATOR % (
|
||||
LEFTARG = lshvector,
|
||||
RIGHTARG = lshvector,
|
||||
PROCEDURE = lshvector_overlap,
|
||||
COMMUTATOR = '%',
|
||||
RESTRICT = contsel,
|
||||
JOIN = contjoinsel
|
||||
);
|
||||
|
||||
CREATE OPERATOR CLASS gin_lshvector_ops
|
||||
FOR TYPE lshvector USING gin
|
||||
AS
|
||||
OPERATOR 1 % (lshvector,lshvector),
|
||||
FUNCTION 1 btint4cmp (int4,int4),
|
||||
FUNCTION 2 lshvector_gin_extract_value (lshvector,internal),
|
||||
FUNCTION 3 lshvector_gin_extract_query (lshvector,internal,int2,internal,internal,internal,internal),
|
||||
FUNCTION 4 lshvector_gin_consistent (internal,int2,lshvector,int4,internal,internal,internal,internal),
|
||||
STORAGE int4;
|
6
Ghidra/Features/BSim/src/lshvector/lshvector.control
Executable file
@ -0,0 +1,6 @@
|
||||
# Locality Sensitive Hashing extension
|
||||
comment = 'a feature vector type and a locality sensitive hashing index'
|
||||
default_version = '1.0'
|
||||
module_pathname = '$libdir/lshvector'
|
||||
superuser = false
|
||||
relocatable = true
|
175
Ghidra/Features/BSim/src/main/help/help/TOC_Source.xml
Executable file
@ -0,0 +1,175 @@
|
||||
<?xml version='1.0' encoding='ISO-8859-1' ?>
|
||||
<!--
|
||||
|
||||
This is an XML file intended to be parsed by the Ghidra help system. It is loosely based
|
||||
upon the JavaHelp table of contents document format. The Ghidra help system uses a
|
||||
TOC_Source.xml file to allow a module with help to define how its contents appear in the
|
||||
Ghidra help viewer's table of contents. The main document (in the Base module)
|
||||
defines a basic structure for the
|
||||
Ghidra table of contents system. Other TOC_Source.xml files may use this structure to insert
|
||||
their files directly into this structure (and optionally define a substructure).
|
||||
|
||||
|
||||
In this document, a tag can be either a <tocdef> or a <tocref>. The former is a definition
|
||||
of an XML item that may have a link and may contain other <tocdef> and <tocref> children.
|
||||
<tocdef> items may be referred to in other documents by using a <tocref> tag with the
|
||||
appropriate id attribute value. Using these two tags allows any module to define a place
|
||||
in the table of contents system (<tocdef>), which also provides a place for
|
||||
other TOC_Source.xml files to insert content (<tocref>).
|
||||
|
||||
During the help build time, all TOC_Source.xml files will be parsed and validated to ensure
|
||||
that all <tocref> tags point to valid <tocdef> tags. From these files will be generated
|
||||
<module name>_TOC.xml files, which are table of contents files written in the format
|
||||
desired by the JavaHelp system. Additionally, the generated files will be merged together
|
||||
as they are loaded by the JavaHelp system. In the end, when displaying help in the Ghidra
|
||||
help GUI, there will be one table of contents that has been created from the definitions in
|
||||
all of the modules' TOC_Source.xml files.
|
||||
|
||||
|
||||
Tags and Attributes
|
||||
|
||||
<tocdef>
|
||||
-id - the name of the definition (this must be unique across all TOC_Source.xml files)
|
||||
-text - the display text of the node, as seen in the help GUI
|
||||
-target** - the file to display when the node is clicked in the GUI
|
||||
-sortgroup - this is a string that defines where a given node should appear under a given
|
||||
parent. The string values will be sorted by the JavaHelp system using
|
||||
a java.text.RuleBasedCollator. If this attribute is not specified, then
|
||||
the value of the text attribute will be used.
|
||||
|
||||
<tocref>
|
||||
-id - The id of the <tocdef> that this reference points to
|
||||
|
||||
**The URL for the target is relative and should start with 'help/topics'. This text is
|
||||
used by the Ghidra help system to provide a universal starting point for all links so that
|
||||
they can be resolved at runtime, across modules.
|
||||
|
||||
|
||||
-->
|
||||
|
||||
<tocroot>
|
||||
|
||||
<tocref id="Ghidra Functionality">
|
||||
<tocdef id="BSim"
|
||||
text="BSim"
|
||||
target= "help/topics/BSim/BSimOverview.html">
|
||||
<tocdef id="BSimDatabaseConfiguration" sortgroup="a"
|
||||
text="BSim Database Configuration"
|
||||
target="help/topics/BSim/DatabaseConfiguration.html" >
|
||||
<tocdef id="BSim Overview"
|
||||
sortgroup="a"
|
||||
text="Overview"
|
||||
target="help/topics/BSim/DatabaseConfiguration.html#ConfigOverview" />
|
||||
<tocdef id="BSim Server Configuration"
|
||||
sortgroup="b"
|
||||
text="Server Configuration"
|
||||
target="help/topics/BSim/DatabaseConfiguration.html#ServerConfig" />
|
||||
<tocdef id="Creating a BSim Database"
|
||||
sortgroup="c"
|
||||
text="Creating a Database"
|
||||
target="help/topics/BSim/DatabaseConfiguration.html#CreateDatabase" />
|
||||
<tocdef id="Tailoring BSim Meta-dataX"
|
||||
sortgroup="d"
|
||||
text="Tailoring BSim Meta-data"
|
||||
target="help/topics/BSim/DatabaseConfiguration.html#TailorBSim" />
|
||||
</tocdef>
|
||||
<tocdef id="BSimIngestProcess" sortgroup="b"
|
||||
text="Ingesting Executables"
|
||||
target="help/topics/BSim/IngestProcess.html" >
|
||||
<tocdef id="BSim Ingest Process"
|
||||
sortgroup="a"
|
||||
text="Ingest Process"
|
||||
target="help/topics/BSim/IngestProcess.html#IngestOverview"/>
|
||||
<tocdef id="BSim Tailoring Analysis"
|
||||
sortgroup="b"
|
||||
text="Tailoring Analysis"
|
||||
target="help/topics/BSim/IngestProcess.html#TailorAnalysis"/>
|
||||
<tocdef id="BSim Analysis Effects on Feature Extraction"
|
||||
sortgroup="c"
|
||||
text="Analysis Effects on Feature Extraction"
|
||||
target="help/topics/BSim/IngestProcess.html#AnalysisEffects"/>
|
||||
<tocdef id="BSim Maintenance"
|
||||
sortgroup="d"
|
||||
text="Maintenance"
|
||||
target="help/topics/BSim/IngestProcess.html#Maintenance"/>
|
||||
<tocdef id="BSim Migration"
|
||||
sortgroup="e"
|
||||
text="Migration"
|
||||
target="help/topics/BSim/IngestProcess.html#Migration"/>
|
||||
</tocdef>
|
||||
|
||||
|
||||
<tocdef id="BSimSearch"
|
||||
text="BSim Search"
|
||||
target = "help/topics/BSimSearchPlugin/BSimSearch.html">
|
||||
<tocdef id="Adding_BSim_Plugin"
|
||||
sortgroup="a"
|
||||
text="Enabling the BSim Search Plugin"
|
||||
target = "help/topics/BSimSearchPlugin/BSimSearch.html#Adding_BSim_Plugin">
|
||||
</tocdef>
|
||||
<tocdef id="BSim_Servers_Dialog"
|
||||
sortgroup="b"
|
||||
text="Defining And Managing BSim Database Definitions"
|
||||
target = "help/topics/BSimSearchPlugin/BSimSearch.html#BSim_Servers_Dialog">
|
||||
</tocdef>
|
||||
<tocdef id="BSim_Overview_Dialog"
|
||||
sortgroup="c"
|
||||
text="Overview Query"
|
||||
target = "help/topics/BSimSearchPlugin/BSimSearch.html#BSim_Overview_Dialog">
|
||||
</tocdef>
|
||||
<tocdef id="BSim_Overview_Results"
|
||||
sortgroup="d"
|
||||
text="Overview Query Results"
|
||||
target = "help/topics/BSimSearchPlugin/BSimSearch.html#BSim_Overview_Results">
|
||||
</tocdef>
|
||||
<tocdef id="BSim_Search_Dialog"
|
||||
sortgroup="e"
|
||||
text="Similar Function Search"
|
||||
target = "help/topics/BSimSearchPlugin/BSimSearch.html#BSim_Search_Dialog">
|
||||
</tocdef>
|
||||
<tocdef id="Similar_Functions_Results"
|
||||
sortgroup="f"
|
||||
text="Similar Function Search Results"
|
||||
target = "help/topics/BSimSearchPlugin/BSimSearch.html#Similar_Functions_Results">
|
||||
</tocdef>
|
||||
<tocdef id="BSim_Authentication"
|
||||
sortgroup="g"
|
||||
text="Authentication"
|
||||
target = "help/topics/BSimSearchPlugin/BSimSearch.html#BSim_Authentication">
|
||||
</tocdef>
|
||||
</tocdef>
|
||||
|
||||
<tocdef id="BSimFeatureWeight" sortgroup="d"
|
||||
text="Features and Weights"
|
||||
target="help/topics/BSim/FeatureWeight.html" >
|
||||
|
||||
<tocdef id="BSim Features of Software Functions"
|
||||
sortgroup="a"
|
||||
text="Features of Software Functions"
|
||||
target="help/topics/BSim/FeatureWeight.html#FunctionFeatures"/>
|
||||
<tocdef id="BSim Weighting Software Features"
|
||||
sortgroup="b"
|
||||
text="Weighting Software Features"
|
||||
target="help/topics/BSim/FeatureWeight.html#WeightingSoftware"/>
|
||||
<tocdef id="BSim Comparing Feature Vectors"
|
||||
sortgroup="d"
|
||||
text="Comparing Feature Vectors"
|
||||
target="help/topics/BSim/FeatureWeight.html#CompareVectors"/>
|
||||
</tocdef>
|
||||
|
||||
<tocdef id="BSimCommandLine" sortgroup="e"
|
||||
text="Command-Line Utility Reference"
|
||||
target="help/topics/BSim/CommandLineReference.html" >
|
||||
|
||||
<tocdef id="BSim Control (bsim_ctl)"
|
||||
sortgroup="a"
|
||||
text="BSim Control (bsim_ctl)"
|
||||
target="help/topics/BSim/CommandLineReference.html#BSimCtl"/>
|
||||
<tocdef id="BSim Command (bsim)"
|
||||
sortgroup="b"
|
||||
text="BSim Command (bsim)"
|
||||
target="help/topics/BSim/CommandLineReference.html#BSimCommand"/>
|
||||
</tocdef>
|
||||
</tocdef>
|
||||
</tocref>
|
||||
</tocroot>
|
25
Ghidra/Features/BSim/src/main/help/help/shared/languages.css
Normal file
@ -0,0 +1,25 @@
|
||||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
/*
|
||||
This file contains non-Ghidra style sheet markup. This file will be loaded in addition to
|
||||
DefaultStyle.css.
|
||||
*/
|
||||
|
||||
div.informalexample { margin-left: 50px; margin-top: 10px; }
|
||||
dd { margin-bottom: 20px; }
|
||||
dd p { margin-top: 5px; margin-left: 10px; }
|
||||
span.term { font-family:times new roman; font-size:14pt; font-weight:bold; }
|
||||
span.redtext { color:#CC0033; }
|
@ -0,0 +1,197 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
|
||||
|
||||
<HTML>
|
||||
<HEAD>
|
||||
<META name="generator" content=
|
||||
"HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net">
|
||||
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
|
||||
|
||||
<TITLE>BSim Database</TITLE>
|
||||
<LINK rel="stylesheet" type="text/css" href="help/shared/DefaultStyle.css">
|
||||
<LINK rel="stylesheet" type="text/css" href="../../shared/languages.css">
|
||||
<META name="generator" content="DocBook XSL Stylesheets V1.79.1">
|
||||
<LINK rel="home" href="index.html" title="BSim Database">
|
||||
<LINK rel="up" href="index.html" title="BSim Database">
|
||||
<LINK rel="prev" href="index.html" title="BSim Database">
|
||||
<LINK rel="next" href="DatabaseConfiguration.html" title="Database Configuration">
|
||||
</HEAD>
|
||||
|
||||
<BODY>
|
||||
<DIV class="chapter">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H1 class="title"><A name="DatabaseOverview"></A>BSim Database</H1>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<DIV class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
|
||||
<H3 class="title">Quick Reference Links</H3>
|
||||
|
||||
<DIV class="itemizedlist">
|
||||
<UL class="itemizedlist compact" style="list-style-type: disc;">
|
||||
<LI class="listitem"><A class="link" href="DatabaseConfiguration.html" title=
|
||||
"Database Configuration">Database Configuration</A></LI>
|
||||
|
||||
<LI class="listitem"><A class="link" href="IngestProcess.html" title=
|
||||
"Ingesting Executables">Ingesting Executables</A></LI>
|
||||
|
||||
<LI class="listitem"><A class="link" href="../BSimSearchPlugin/BSimSearch.html" title=
|
||||
"Querying a BSim Database">Querying a BSim Database</A></LI>
|
||||
|
||||
<LI class="listitem"><A class="link" href="FeatureWeight.html" title=
|
||||
"Features and Weights">Features and Weights</A></LI>
|
||||
|
||||
<LI class="listitem"><A class="link" href="CommandLineReference.html" title=
|
||||
"Command-Line Utility Reference">Command-Line Reference</A></LI>
|
||||
</UL>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<DIV class="section">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H2 class="title" style="clear: both"><A name="IntroOverview"></A>Overview</H2>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>Welcome to Ghidra's BSim (Behavioral Similarity) Database. This database technology is
|
||||
designed to allow reverse engineers to ingest metadata about previously analyzed binary
|
||||
executables to a central server or local database, which can then be queried in the
|
||||
course of analyzing new,
|
||||
unknown, executables to quickly discover previously seen functions and libraries.</P>
|
||||
|
||||
<P>The primary record ingested into the database describes a single function. The most
|
||||
novel aspects of the database are that:</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<DIV class="itemizedlist">
|
||||
<UL class="itemizedlist" style="list-style-type: disc;">
|
||||
<LI class="listitem">Queries are tolerant of variations in the compilation of the
|
||||
function.</LI>
|
||||
|
||||
<LI class="listitem">All records are indexed for quick queries, even for very large collections.</LI>
|
||||
</UL>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>The primary feature set used for indexing a function is extracted from a concise
|
||||
description of the data-flow of the function, not the explicit encoding of the machine
|
||||
instructions. The data-flow description is a graph-based (abstract syntax tree)
|
||||
representation, based on Ghidra's intermediate representation language, p-code, and is
|
||||
generated by the Ghidra decompiler. The resulting function descriptions are normalized to
|
||||
minimize the impact of variations due to:</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<DIV class="itemizedlist">
|
||||
<UL class="itemizedlist" style="list-style-type: disc;">
|
||||
<LI class="listitem">Equivalent machine instructions</LI>
|
||||
|
||||
<LI class="listitem">Storage location (registers, stack, memory)</LI>
|
||||
|
||||
<LI class="listitem">Instruction order</LI>
|
||||
|
||||
<LI class="listitem">Many forms of compiler transformation</LI>
|
||||
|
||||
<LI class="listitem">Even some forms of deliberate obfuscation.</LI>
|
||||
</UL>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>Records are indexed using current Text Retrieval strategies, which allow "nearest
|
||||
neighbor" queries. The feature set of an unknown function being queried does not have to
|
||||
exactly match the features of a "hit" in the database, but only a configurable percentage
|
||||
of them. This supplies an additional level of tolerance of "functional difference" on top
|
||||
of the tolerance of "functionally equivalent" variations provided by the decompiler. In
|
||||
other words, there can be some amount of true change in the underlying source code, and the
|
||||
query may still be able to find a match.</P>
|
||||
|
||||
<P>Queries are quick: For a single function, results typically come back in a fraction of a second,
|
||||
even for a database containing millions of functions.</P>
|
||||
</DIV>
|
||||
|
||||
<DIV class="section">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H2 class="title" style="clear: both"><A name="ToolOverview"></A>Overview of
|
||||
Tools</H2>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>A BSim Database is built on top of one of three technologies: PostgreSQL,
|
||||
local H2 database, or Elasticsearch.
|
||||
PostgreSQL is a robust, production-capable server that supports multiple simultaneous
|
||||
connections and is extremely fault tolerant. Elasticsearch is a scalable search engine that
|
||||
allows a database to be distributed across an entire cluster of machines.
|
||||
The local H2 database support is provided for convenience and use with small personal
|
||||
collections. For any of these options, this distribution includes specific reverse
|
||||
engineering extensions and clients that provide the following capabilities.</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<DIV class="itemizedlist">
|
||||
<UL class="itemizedlist" style="list-style-type: disc;">
|
||||
<LI class="listitem">
|
||||
Integration with a Ghidra Server or local project:
|
||||
|
||||
<DIV class="itemizedlist">
|
||||
<UL class="itemizedlist" style="list-style-type: circle;">
|
||||
<LI class="listitem">Executables can be ingested from a Ghidra repository on either a Ghidra Server or a local project.</LI>
|
||||
|
||||
<LI class="listitem">Query results can refer to executables within a
|
||||
repository.</LI>
|
||||
|
||||
<LI class="listitem">Easy command-line ingests using the <CODE class=
|
||||
"filename">bsim</CODE> command script</LI>
|
||||
</UL>
|
||||
</DIV>
|
||||
</LI>
|
||||
|
||||
<LI class="listitem">
|
||||
Client as a Ghidra Plug-in:
|
||||
|
||||
<DIV class="itemizedlist">
|
||||
<UL class="itemizedlist" style="list-style-type: circle;">
|
||||
<LI class="listitem">Ghidra includes a plug-in client that integrates a query
|
||||
dialog and results windows directly into the main code browser.</LI>
|
||||
</UL>
|
||||
</DIV>
|
||||
</LI>
|
||||
|
||||
<LI class="listitem">
|
||||
Query API:
|
||||
|
||||
<DIV class="itemizedlist">
|
||||
<UL class="itemizedlist" style="list-style-type: circle;">
|
||||
<LI class="listitem">Ghidra includes a Java API to the BSim server so that
|
||||
queries (and potentially ingest) can be incorporated into analyst scripts. The
|
||||
API marshals queries and results between an active Ghidra session and a BSim
|
||||
server.</LI>
|
||||
</UL>
|
||||
</DIV>
|
||||
</LI>
|
||||
</UL>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<DIV class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
|
||||
<H3 class="title">Note</H3>
|
||||
|
||||
<P>The PostgreSQL server software is currently only supported for the <SPAN class=
|
||||
"emphasis"><EM>Linux</EM></SPAN> and <SPAN class="emphasis"><EM>MacOS</EM></SPAN>
|
||||
platforms. Elasticsearch server software must be obtained separately. Small local
|
||||
file-based databases are supported on all platforms via an embedded H2 database
|
||||
engine. The BSim client
|
||||
software is supported on all platforms and can connect to servers on a different
|
||||
architecture.</P>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</BODY>
|
||||
</HTML>
|
@ -0,0 +1,820 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
|
||||
|
||||
<HTML>
|
||||
<HEAD>
|
||||
<META name="generator" content=
|
||||
"HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net">
|
||||
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
|
||||
|
||||
<TITLE>Command-Line Utility Reference</TITLE>
|
||||
<LINK rel="stylesheet" type="text/css" href="help/shared/DefaultStyle.css">
|
||||
<LINK rel="stylesheet" type="text/css" href="../../shared/languages.css">
|
||||
<META name="generator" content="DocBook XSL Stylesheets V1.79.1">
|
||||
<LINK rel="home" href="index.html" title="BSim Database">
|
||||
<LINK rel="up" href="index.html" title="BSim Database">
|
||||
<LINK rel="prev" href="FeatureWeight.html" title="Features and Weights">
|
||||
</HEAD>
|
||||
|
||||
<BODY>
|
||||
<DIV class="chapter">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H1 class="title"><A name="CommandLineReference"></A>Command-Line Utility
|
||||
Reference</H1>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<DIV class="section">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H2 class="title" style="clear: both"><A name="BSimCtl"></A><CODE class=
|
||||
"computeroutput">bsim_ctl</CODE></H2>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<PRE>
|
||||
<CODE class="computeroutput">
|
||||
bsim_ctl start </datadir-path> [auth=pki|password|trust] [--noLocalAuth] [cafile=</cacert-path>] [dn=".."]
|
||||
bsim_ctl stop </datadir-path> [--force]
|
||||
bsim_ctl adduser </datadir-path> <username> [dn=".."]
|
||||
bsim_ctl dropuser </datadir-path> <username>
|
||||
bsim_ctl resetpassword <username>
|
||||
bsim_ctl changeauth </datadir-path> [auth=pki|password|trust] [--noLocalAuth] [cafile=</cacert-path>] [dn=".."]
|
||||
bsim_ctl changeprivilege <username> admin|user
|
||||
|
||||
Global Options:
|
||||
port=<portnum>
|
||||
user=<username>
|
||||
cert=</certfile-path>
|
||||
</CODE>
|
||||
</PRE>
|
||||
</DIV>
|
||||
|
||||
<P><SPAN class="command"><STRONG>bsim_ctl</STRONG></SPAN> is a command-line utility for
|
||||
starting and stopping a BSim server using the PostgreSQL back-end that is prepackaged with
|
||||
the Ghidra distribution. All commands must be run on the machine hosting the server.
|
||||
Optional parameters for a given command are indicated by square brackets '[' and ']'.
|
||||
Options with an '=' character require a user-specified value. If the value string contains
|
||||
space characters, it should be enclosed in double quotes.</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<DIV class="variablelist">
|
||||
<DL class="variablelist">
|
||||
<DT><SPAN class="term"><SPAN class="bold"><STRONG>start</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Initializes and starts a PostgreSQL server. The command-line must include a path
|
||||
to the data directory for the server, which must exist. If a server had run
|
||||
previously and populated this directory, this command simply restarts the server
|
||||
using the preexisting data and configuration; otherwise, a new database is
|
||||
initialized. The user performing the initial start is automatically added to the
|
||||
database with <SPAN class="emphasis"><EM>admin</EM></SPAN> privileges.</P>
|
||||
|
||||
<P>During a restart, any authentication options (with the exception of the global
|
||||
<SPAN class="bold"><STRONG>cert=</STRONG></SPAN> option) are unnecessary and will
|
||||
be ignored. The PostgreSQL server will be restarted with the already established
|
||||
settings. To actually change the settings, use the <SPAN class=
|
||||
"bold"><STRONG>changeauth</STRONG></SPAN> command before restarting.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>auth=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>type</EM></SPAN> - specifies the authentication type (<B>pki |
|
||||
password | trust</B>) for a new database: <SPAN class=
|
||||
"emphasis"><EM>trust</EM></SPAN> for no authentication, <SPAN class=
|
||||
"emphasis"><EM>password</EM></SPAN> for password authentication, and <SPAN class=
|
||||
"emphasis"><EM>pki</EM></SPAN> for authentication using public key certificates.
|
||||
With the <SPAN class="emphasis"><EM>pki</EM></SPAN> setting, both the <SPAN class=
|
||||
"bold"><STRONG>cafile=</STRONG></SPAN> and the <SPAN class=
|
||||
"bold"><STRONG>dn=</STRONG></SPAN> options also need to be provided; additionally
|
||||
the <SPAN class="bold"><STRONG>cert=</STRONG></SPAN> option must be provided unless
|
||||
the <SPAN class="bold"><STRONG>--noLocalAuth</STRONG></SPAN> option is also
|
||||
given.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>--noLocalAuth</STRONG></SPAN> - used together with
|
||||
the <SPAN class="command"><STRONG>auth=</STRONG></SPAN> option causes
|
||||
authentication to not be required for local connections, i.e. localhost.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>cafile=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>/cafile-path</EM></SPAN> - specifies an absolute path to a
|
||||
certificate authority file and is required for <SPAN class=
|
||||
"command"><STRONG>auth=pki</STRONG></SPAN>. This file should contain the
|
||||
certificates, in PEM format and concatenated together, that the PostgreSQL server
|
||||
will use to authenticate.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>dn=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>name</EM></SPAN> - specifies the Distinguished Name for the admin
|
||||
user and is required for <SPAN class=
|
||||
"command"><STRONG>auth=pki</STRONG></SPAN>.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>port=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>portnum</EM></SPAN> - specifies the port the PostgreSQL server will
|
||||
listen on. For port numbers other than the default 5432, URLs and other
|
||||
command-lines must explicitly specify the port when connecting to the server. This
|
||||
option only affects the initial start of a server. For subsequent (re)starts this
|
||||
option is ignored, and the server will continue to listen on the same port
|
||||
specified in the initial start. Use <SPAN class=
|
||||
"command"><STRONG>changeauth</STRONG></SPAN> to change the port of a server after
|
||||
its initial start.</P>
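<P>The option rules for <STRONG>start</STRONG> can be sketched as a small Python helper that
assembles, but does not run, an argument list. This is an illustrative sketch only: the
function name, the data directory and the port are hypothetical, and placing the global
options last is an assumption rather than a documented requirement.</P>

```python
from typing import List, Optional

AUTH_TYPES = ("pki", "password", "trust")


def build_start_command(datadir: str,
                        auth: Optional[str] = None,
                        no_local_auth: bool = False,
                        cafile: Optional[str] = None,
                        dn: Optional[str] = None,
                        port: Optional[int] = None,
                        cert: Optional[str] = None) -> List[str]:
    """Assemble (but do not run) an argument list for 'bsim_ctl start'."""
    if auth is not None and auth not in AUTH_TYPES:
        raise ValueError("auth must be one of: " + ", ".join(AUTH_TYPES))
    if no_local_auth and auth is None:
        raise ValueError("--noLocalAuth is used together with auth=")
    if auth == "pki":
        # pki needs both cafile= and dn=.
        if cafile is None or dn is None:
            raise ValueError("auth=pki requires both cafile= and dn=")
        # cert= is also needed unless --noLocalAuth is given.
        if cert is None and not no_local_auth:
            raise ValueError("auth=pki requires cert= unless --noLocalAuth is given")

    args = ["bsim_ctl", "start", datadir]
    if auth is not None:
        args.append("auth=" + auth)
    if no_local_auth:
        args.append("--noLocalAuth")
    if cafile is not None:
        args.append("cafile=" + cafile)
    if dn is not None:
        args.append("dn=" + dn)
    # Placing the global options last is an assumption of this sketch.
    if port is not None:
        args.append("port=" + str(port))
    if cert is not None:
        args.append("cert=" + cert)
    return args


# Hypothetical data directory and port, for illustration only.
print(build_start_command("/srv/bsim/data", auth="password", port=5433))
```

<P>Because the result is an argument list rather than a shell string, a value containing
spaces (such as a Distinguished Name) stays a single argument when the list is passed to
something like <CODE>subprocess.run</CODE>.</P>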
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class="bold"><STRONG>stop</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Stops a currently running PostgreSQL server. The path to the actively used data
|
||||
directory must be provided. By default, shutdown will wait until existing
|
||||
connections to the database have been closed.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>--force</STRONG></SPAN> - causes existing
|
||||
connections to be forcibly closed and the PostgreSQL server to shut down
|
||||
immediately.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class="bold"><STRONG>adduser</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Give a new user permission to access the PostgreSQL server. The path to the
|
||||
actively used data directory and a single username must be specified. The server
|
||||
must be running. New users are given <SPAN class="emphasis"><EM>user</EM></SPAN>
|
||||
(read-only) privileges, unless a subsequent <SPAN class=
|
||||
"command"><STRONG>changeprivilege</STRONG></SPAN> command is used.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>dn=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>name</EM></SPAN> - specifies the Distinguished Name of the new user,
|
||||
which is required if the database enabled <SPAN class=
|
||||
"command"><STRONG>auth=pki</STRONG></SPAN>. This option can be used to provide a
|
||||
Distinguished Name to a preexisting user, if the PostgreSQL server's authentication
|
||||
strategy is changed.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>dropuser</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Remove access to the PostgreSQL server for a specific user. The path to the
|
||||
actively used data directory and a single username must be specified. The server
|
||||
must be running.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>changeauth</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Change the configuration of a previously initialized PostgreSQL server. The path
|
||||
to the server's data directory must be specified. The server must not currently be
|
||||
running to use this command, which only takes effect after a restart. Options have
|
||||
the same meaning as for the <SPAN class="command"><STRONG>start</STRONG></SPAN>
|
||||
command.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>port=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>portnum</EM></SPAN> - changes the port the PostgreSQL server will
|
||||
listen on. If this option is not present, the server will continue to listen on the
|
||||
same port.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>auth=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>type</EM></SPAN> - changes the authentication type (<B>pki |
|
||||
password | trust</B>) used by the PostgreSQL server. No change is made if the
|
||||
option is not present. If the option is present, omitting the <SPAN class=
|
||||
"command"><STRONG>--noLocalAuth</STRONG></SPAN> causes local connections to require
|
||||
authentication. This command does not affect the presence or absence of passwords
|
||||
or Distinguished Names for existing users.</P>
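<P>Since <STRONG>changeauth</STRONG> needs a stopped server and only takes effect after a
restart, a configuration change is naturally a three-step sequence. The Python sketch below
only builds the three argument lists; the function name, data directory and port are
hypothetical, and the <EM>pki</EM>-specific options are left out for brevity.</P>

```python
from typing import List, Optional


def change_auth_sequence(datadir: str,
                         auth: Optional[str] = None,
                         no_local_auth: bool = False,
                         port: Optional[int] = None) -> List[List[str]]:
    """Return the stop / changeauth / start commands, in order, without running them."""
    change = ["bsim_ctl", "changeauth", datadir]
    if auth is not None:
        change.append("auth=" + auth)
        if no_local_auth:
            # Omitting this flag makes local connections require authentication.
            change.append("--noLocalAuth")
    if port is not None:
        change.append("port=" + str(port))
    return [
        ["bsim_ctl", "stop", datadir],    # changeauth requires a stopped server
        change,
        ["bsim_ctl", "start", datadir],   # new settings apply after the restart
    ]


# Hypothetical data directory and port, for illustration only.
for command in change_auth_sequence("/srv/bsim/data", auth="trust", port=5433):
    print(" ".join(command))
```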
|
||||
|
||||
<P><SPAN class="command"><STRONG>dn=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>name</EM></SPAN> - specifies the Distinguished Name for the admin
|
||||
user and is required for <SPAN class=
|
||||
"command"><STRONG>auth=pki</STRONG></SPAN>.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>resetpassword</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Reset the password for a user. A single user must be specified, and the
|
||||
PostgreSQL server must be running. The password will be reset to 'changeme'.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>changeprivilege</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Change access privilege for a user. A single user must be specified followed by
|
||||
<SPAN class="command"><STRONG>admin</STRONG></SPAN> or <SPAN class=
|
||||
"command"><STRONG>user</STRONG></SPAN>, and the PostgreSQL server must be
|
||||
running.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class="bold"><STRONG>--Global
|
||||
Options--</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>These options apply to all the <SPAN class=
|
||||
"command"><STRONG>bsim_ctl</STRONG></SPAN> commands that connect to an active
|
||||
PostgreSQL server: <SPAN class="command"><STRONG>start</STRONG></SPAN>, <SPAN
|
||||
class="command"><STRONG>adduser</STRONG></SPAN>, <SPAN class=
|
||||
"command"><STRONG>dropuser</STRONG></SPAN>, <SPAN class=
|
||||
"command"><STRONG>resetpassword</STRONG></SPAN>, and <SPAN class=
|
||||
"command"><STRONG>changeprivilege</STRONG></SPAN>.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>port=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>portnum</EM></SPAN> - specifies the port on which to connect with
|
||||
the PostgreSQL server.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>user=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>username</EM></SPAN> - specifies a user name to use when connecting
|
||||
to the PostgreSQL server.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>cert=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>/certfile-path</EM></SPAN> - provides the absolute file path to the
|
||||
user's certificate when connecting to a PostgreSQL server that requires PKI
|
||||
authentication.</P>
|
||||
</DD>
|
||||
</DL>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<DIV class="section">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H2 class="title" style="clear: both"><A name="BSimCommand"></A><CODE class=
|
||||
"computeroutput">bsim</CODE></H2>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<PRE>
|
||||
<CODE class="computeroutput">
|
||||
bsim createdatabase <bsimURL> <config_template> [name="<name>"] [owner="<owner>"] [description="<text>"] [--nocallgraph]
|
||||
bsim setmetadata <bsimURL> [name="<name>"] [owner="<owner>"] [description="<text>"]
|
||||
bsim addexecategory <bsimURL> <category_name> [--date]
|
||||
bsim addfunctiontag <bsimURL> <tag_name>
|
||||
bsim dropindex <bsimURL>
|
||||
bsim rebuildindex <bsimURL>
|
||||
bsim prewarm <bsimURL>
|
||||
bsim generatesigs <ghidraURL> </xmldirectory> config=<config_template> [--overwrite]
|
||||
bsim generatesigs <ghidraURL> </xmldirectory> bsim=<bsimURL> [--commit] [--overwrite]
|
||||
bsim generatesigs <ghidraURL> bsim=<bsimURL>
|
||||
bsim commitsigs <bsimURL> </xmldirectory> [md5=<hash>] [override=<ghidraURL>]
|
||||
bsim generateupdates <ghidraURL> </xmldirectory> config=<config_template> [--overwrite]
|
||||
bsim generateupdates <ghidraURL> </xmldirectory> bsim=<bsimURL> [--commit] [--overwrite]
|
||||
bsim generateupdates <ghidraURL> bsim=<bsimURL>
|
||||
bsim commitupdates <bsimURL> </xmldirectory>
|
||||
bsim listexes <bsimURL> [md5=<hash>] [name=<exe_name>] [arch=<languageID>] [compiler=<cspecID>] [sortcol=<column_name>] [limit=<exe_count>] [--includelibs]
|
||||
bsim getexecount <bsimURL> [md5=<hash>] [name=<exe_name>] [arch=<languageID>] [compiler=<cspecID>] [--includelibs]
|
||||
bsim delete <bsimURL> [md5=<hash>] [name=<exe_name> [arch=<languageID>] [compiler=<cspecID>]]
|
||||
bsim listfuncs <bsimURL> [md5=<hash>] [name=<exe_name> [arch=<languageID>] [compiler=<cspecID>]] [--printselfsig] [--callgraph] [--printjustexe] [maxfunc=<max_count>]
|
||||
bsim dumpsigs <bsimURL> </xmldirectory> [md5=<hash>] [name=<exe_name> [arch=<languageID>] [compiler=<cspecID>]]
|
||||
|
||||
Global Options:
|
||||
user=<username>
|
||||
cert=<certfile-path>
|
||||
</CODE>
|
||||
</PRE>
|
||||
</DIV>
|
||||
|
||||
<P>See <A class="xref" href="CommandLineReference.html#URLs">“Ghidra and BSim
|
||||
URLs”</A> below for details about specifying <EM>ghidraURL</EM> and <EM>bsimURL</EM>
|
||||
properly. See <A class="xref" href="DatabaseConfiguration.html">“Database
|
||||
Configuration”</A> for guidance on the various BSim Databases which are
|
||||
supported.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>bsim</STRONG></SPAN> is a command-line utility for
|
||||
managing the generation and ingest of BSim signatures and metadata. Depending on the
|
||||
subcommand, it connects to a Ghidra Server and/or a BSim database server. A <SPAN class=
|
||||
"emphasis"><EM>ghidraURL</EM></SPAN> refers to a Ghidra Server or local project using the
|
||||
<SPAN class="command"><STRONG>ghidra:</STRONG></SPAN> protocol, while <SPAN class=
|
||||
"emphasis"><EM>bsimURL</EM></SPAN> refers to a BSim database server with the appropriate
|
||||
<SPAN class="command"><STRONG>postgresql:</STRONG></SPAN>, <SPAN class=
|
||||
"command"><STRONG>https:</STRONG></SPAN>, or <SPAN class=
|
||||
"command"><STRONG>file:</STRONG></SPAN> protocol specified. The <SPAN class=
|
||||
"command"><STRONG>elastic:</STRONG></SPAN> protocol is equivalent to and may be used in
|
||||
place of the <SPAN class="command"><STRONG>https:</STRONG></SPAN> protocol.</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<DIV class="variablelist">
|
||||
<DL class="variablelist">
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>createdatabase</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Creates a new empty repository. A URL and configuration template (<SPAN class=
|
||||
"bold"><STRONG>config_template</STRONG></SPAN>) are required. The new database name
|
||||
is taken from the path element of the URL.</P>
|
||||
|
||||
<P>Supported configuration templates (<SPAN class=
|
||||
"bold"><STRONG>config_template</STRONG></SPAN>) are defined within the Ghidra
|
||||
installation in XML form. The following configurations are currently defined:
|
||||
(<SPAN class="bold"><STRONG>large_32, medium_32, medium_64, medium_cpool,
|
||||
medium_nosize</STRONG></SPAN>).</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>name=</STRONG></SPAN> - specifies a formal, more
|
||||
descriptive, name for the repository that can be used for the BSim client
|
||||
display.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>owner=</STRONG></SPAN> - gives a descriptive name
|
||||
for the owner of the repository and/or the data it will contain.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>description=</STRONG></SPAN> - specifies a short
|
||||
string describing the intended contents of the new repository.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>--nocallgraph</STRONG></SPAN> - disables storing call
|
||||
relationships between
|
||||
ingested functions. Default is to store call relationships.</P>
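<P>As an illustration, the Python sketch below builds a <STRONG>createdatabase</STRONG>
argument list, checks the template against the names listed above, and shows where the
database name sits in the URL. The helper names and the example URL are hypothetical, and
reading the name as the URL path with its leading slash removed is an assumption of this
sketch.</P>

```python
from typing import List, Optional
from urllib.parse import urlparse

CONFIG_TEMPLATES = ("large_32", "medium_32", "medium_64",
                    "medium_cpool", "medium_nosize")


def database_name(bsim_url: str) -> str:
    """Return the path element of the URL without its leading slash (assumed mapping)."""
    return urlparse(bsim_url).path.lstrip("/")


def build_createdatabase(bsim_url: str,
                         config_template: str,
                         name: Optional[str] = None,
                         owner: Optional[str] = None,
                         description: Optional[str] = None,
                         no_callgraph: bool = False) -> List[str]:
    """Assemble (but do not run) an argument list for 'bsim createdatabase'."""
    if config_template not in CONFIG_TEMPLATES:
        raise ValueError("unknown configuration template: " + config_template)
    args = ["bsim", "createdatabase", bsim_url, config_template]
    if name is not None:
        args.append("name=" + name)
    if owner is not None:
        args.append("owner=" + owner)
    if description is not None:
        args.append("description=" + description)
    if no_callgraph:
        args.append("--nocallgraph")
    return args


# Hypothetical URL, for illustration only.
url = "postgresql://localhost/example_db"
print(database_name(url))
print(build_createdatabase(url, "medium_64", name="Example Repository"))
```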
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>setmetadata</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Change the global <SPAN class="emphasis"><EM>name</EM></SPAN>, <SPAN class=
|
||||
"emphasis"><EM>owner</EM></SPAN>, or <SPAN class=
|
||||
"emphasis"><EM>description</EM></SPAN> metadata associated with a BSim server. A
|
||||
BSim server URL is required.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>name=</STRONG></SPAN> - specifies a formal, more
|
||||
descriptive, name for the repository that can be used for the BSim client
|
||||
display.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>owner=</STRONG></SPAN> - gives a descriptive name
|
||||
for the owner of the repository and/or the data it will contain.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>description=</STRONG></SPAN> - specifies a short
|
||||
string describing the intended contents of the repository.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>addexecategory</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Specify a new executable category to be included with generated metadata. A BSim
|
||||
server URL and the name of the new category are required. This only affects future
|
||||
ingest commands. Executables that have already been ingested are unaffected,
|
||||
although they can be adjusted with a <SPAN class=
|
||||
"command"><STRONG>generateupdates</STRONG></SPAN> command.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>--date</STRONG></SPAN> - indicates the new category
|
||||
holds date/time information.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>addfunctiontag</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Specify a new function tag to be included with generated metadata. A BSim server
|
||||
URL and the name of the new tag are required. This only affects future ingest
|
||||
commands. Functions that have already been ingested are unaffected, although they
|
||||
can be adjusted with a <SPAN class="command"><STRONG>generateupdates</STRONG></SPAN>
|
||||
command.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>dropindex</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Delete the main signature index from a BSim repository (in preparation for new
|
||||
ingest). A BSim repository URL is required. Until the index is rebuilt, normal queries will not complete or
|
||||
will be extremely slow.</P>
|
||||
|
||||
<P><STRONG>NOTE:</STRONG> Not supported by H2 file database</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>rebuildindex</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Recreate the main signature index (that had previously been dropped) for a BSim
|
||||
repository. A BSim server URL is required. After this command completes, normal
|
||||
function queries should be fast.</P>
|
||||
|
||||
<P><STRONG>NOTE:</STRONG> Not supported by H2 file database</P>
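<P>Taken together, <STRONG>dropindex</STRONG> and <STRONG>rebuildindex</STRONG> bracket a
large ingest. The Python sketch below builds that sequence with a
<STRONG>commitsigs</STRONG> step in between. It is a sketch under assumptions: the function
name, URL and directory are hypothetical, and treating a <STRONG>file:</STRONG> URL as the
H2 file database is inferred from the protocols described for <EM>bsimURL</EM>.</P>

```python
from typing import List


def bulk_ingest_sequence(bsim_url: str, xml_dir: str) -> List[List[str]]:
    """Return dropindex / commitsigs / rebuildindex commands, in order, without running them."""
    # Assumption: a 'file:' URL denotes an H2 file database, which supports neither
    # dropindex nor rebuildindex.
    if bsim_url.startswith("file:"):
        raise ValueError("dropindex/rebuildindex are not supported by the H2 file database")
    return [
        ["bsim", "dropindex", bsim_url],            # queries are slow until the rebuild
        ["bsim", "commitsigs", bsim_url, xml_dir],  # ingest previously generated signatures
        ["bsim", "rebuildindex", bsim_url],         # restores fast function queries
    ]


# Hypothetical URL and directory, for illustration only.
for command in bulk_ingest_sequence("postgresql://localhost/example_db", "/tmp/sigs"):
    print(" ".join(command))
```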
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class="bold"><STRONG>prewarm</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Instruct a restarted BSim server to preload pages from the main signature index
|
||||
and function table into RAM. This avoids slow random access disk reads on initial
|
||||
queries. A BSim server URL is required.</P>
|
||||
|
||||
<P><STRONG>NOTE:</STRONG> Not supported by Elasticsearch or H2 file databases</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>generatesigs</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Generates function signatures and metadata for all program files retrieved from
|
||||
a Ghidra Server repository or project as specified by a Ghidra URL. The generated
|
||||
signatures may be retained as XML "sigs_" files within a specified XML storage
|
||||
directory and/or committed to a BSim database specified with the <SPAN
|
||||
class="command"><STRONG>bsim=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>bsimURL</EM></SPAN> option. If an XML storage directory is not
|
||||
specified, a BSim URL must be specified to which the data will be committed.</P>
|
||||
|
||||
<P>The <SPAN class="command"><STRONG>config=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>config-template</EM></SPAN> option may be specified when generating
|
||||
XML "sigs_" signature files in the absence of a BSim database (see
|
||||
<STRONG>createdatabase</STRONG> for supported configurations). The generated files
|
||||
will be written to the specified XML storage directory. Creation of the signature
|
||||
files can also be achieved by specifying the <STRONG>bsim=</STRONG><EM>bsimURL</EM>
|
||||
option instead of the <STRONG>config=</STRONG> option.</P>
|
||||
|
||||
<P>The <SPAN class="command"><STRONG>--overwrite</STRONG></SPAN> <SPAN class=
|
||||
"emphasis">option may be specified when an XML storage directory has also been
|
||||
specified to allow conflicting signature files to be overwritten.</SPAN></P>
|
||||
|
||||
<P>The <SPAN class="command"><STRONG>--commit</STRONG></SPAN> <SPAN class=
|
||||
"emphasis">option may be specified when a BSim URL has also been specified to allow
|
||||
generated signatures to be committed to the BSim database. This option is implied
|
||||
when a BSim URL has been specified without an XML storage directory.</SPAN></P>
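<P>The three documented forms of <STRONG>generatesigs</STRONG> can be summarized in a short
Python sketch that validates an option combination and builds the argument list without
running it. The function name and the example URLs are hypothetical, and requiring exactly
one of <STRONG>config=</STRONG> or <STRONG>bsim=</STRONG> is this sketch's reading of the
usage lines.</P>

```python
from typing import List, Optional


def build_generatesigs(ghidra_url: str,
                       xml_dir: Optional[str] = None,
                       config: Optional[str] = None,
                       bsim_url: Optional[str] = None,
                       commit: bool = False,
                       overwrite: bool = False) -> List[str]:
    """Assemble (but do not run) an argument list for 'bsim generatesigs'."""
    if (config is None) == (bsim_url is None):
        raise ValueError("give exactly one of config= or bsim=")
    if xml_dir is None and bsim_url is None:
        raise ValueError("without an XML directory, bsim= is required")
    if overwrite and xml_dir is None:
        raise ValueError("--overwrite only applies when an XML directory is given")
    if commit and bsim_url is None:
        raise ValueError("--commit only applies when bsim= is given")

    args = ["bsim", "generatesigs", ghidra_url]
    if xml_dir is not None:
        args.append(xml_dir)
    args.append("config=" + config if config is not None else "bsim=" + bsim_url)
    # With bsim= and no XML directory the commit is implied, so no flag is added.
    if commit and xml_dir is not None:
        args.append("--commit")
    if overwrite:
        args.append("--overwrite")
    return args


# Hypothetical URLs and directory, for illustration only.
print(build_generatesigs("ghidra://localhost/ExampleRepo", "/tmp/sigs",
                         bsim_url="postgresql://localhost/example_db", commit=True))
```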
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>commitsigs</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Commit previously generated signatures and metadata (see
|
||||
<STRONG>generatesigs</STRONG>) to a BSim repository. A URL specifying the BSim
|
||||
repository and a path to a directory containing the "sigs_" XML files to commit are
|
||||
required.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>override=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>ghidraURL</EM></SPAN> - causes any Ghidra repository/project URL,
|
||||
describing the storage repository and path of executables that was recorded in the
|
||||
"sigs_" XML files during signature generation, to be overridden during the commit
|
||||
operation with the specified Ghidra URL.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>generateupdates</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Generates updated function metadata for program files from a Ghidra Server
|
||||
repository or project, as specified by a Ghidra URL, which previously had signature
|
||||
and metadata generated (see <STRONG>generatesigs</STRONG>). Only metadata: names,
|
||||
function tags, categories, etc. are changed. Signatures are not affected. The
|
||||
generated updates may be retained as XML "update_" files within a specified XML
|
||||
storage directory and/or committed to a BSim database specified with the
|
||||
<SPAN class="command"><STRONG>bsim=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>bsimURL</EM></SPAN> option. If an XML storage directory is not
|
||||
specified, a BSim URL must be specified to which the data will be committed.</P>
|
||||
|
||||
<P>The <SPAN class="command"><STRONG>config=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>config-template</EM></SPAN> option may be specified when generating
|
||||
XML "update_" files in the absence of a BSim database (see
|
||||
<STRONG>createdatabase</STRONG> for supported configurations). The generated files
|
||||
will be written to the specified XML storage directory. Creation of the update
|
||||
files can also be achieved by specifying the <STRONG>bsim=</STRONG><EM>bsimURL</EM>
|
||||
option instead of the <STRONG>config=</STRONG> option.</P>
|
||||
|
||||
<P>The <SPAN class="command"><STRONG>--overwrite</STRONG></SPAN> <SPAN class=
|
||||
"emphasis">option may be specified when an XML storage directory has also been
|
||||
specified to allow conflicting update files to be overwritten.</SPAN></P>
|
||||
|
||||
<P>The <SPAN class="command"><STRONG>--commit</STRONG></SPAN> <SPAN class=
|
||||
"emphasis">option may be specified when a BSim URL has also been specified to allow
|
||||
generated updates to be committed to the BSim database. This option is implied when
|
||||
a BSim URL has been specified without an XML storage directory.</SPAN></P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>commitupdates</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Update a BSim repository with previously generated update metadata (see
|
||||
<STRONG>generateupdates</STRONG>). A URL specifying the BSim repository and a path
|
||||
to a directory containing the "update_" XML files to commit are required.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>listexes</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>List all executable program records within a specified BSim database repository
|
||||
which satisfy the specified criteria. A BSim URL specifying the repository must be
|
||||
provided, and one of two options, <SPAN class=
|
||||
"command"><STRONG>md5=</STRONG></SPAN> or <SPAN class=
|
||||
"command"><STRONG>name=</STRONG></SPAN>, that indicate the specific executable must
|
||||
also be given. All matching executable records will be listed.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>md5=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>32-hexdigits</EM></SPAN> - specifies an executable via its MD5
|
||||
checksum.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>name=</STRONG></SPAN> - specifies an executable
|
||||
name which may match one or more executable records.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>arch=</STRONG></SPAN> - specifies an architecture
|
||||
as a Ghidra processor id which will be used to filter executables.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>compiler=</STRONG></SPAN> - specifies a compiler
|
||||
specification id which will be used to filter executables.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>sortcol=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>column</EM></SPAN> - Indicates which display column should be used
|
||||
to sort the results (<STRONG>MD5 | NAME</STRONG>; default:
|
||||
<STRONG>MD5</STRONG>).</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>limit=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>max_count</EM></SPAN> - specifies the maximum number of executables
|
||||
to be listed which match the search criteria (default=20, a value of 0 indicates no
|
||||
limit).</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>--includelibs</STRONG> - If specified, executable
|
||||
records which correspond to a referenced Library will be included. Such records
|
||||
have a fabricated MD5 which is based on its name.</SPAN></P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>getexecount</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Get the total number of executable program records within a specified BSim
|
||||
database repository which satisfy the specified criteria. A BSim URL specifying the
|
||||
repository must be provided, and one of two options, <SPAN class=
|
||||
"command"><STRONG>md5=</STRONG></SPAN> or <SPAN class=
|
||||
"command"><STRONG>name=</STRONG></SPAN>, that indicate the specific executable must
|
||||
also be given. The number of matching executable records is reported.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>md5=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>32-hexdigits</EM></SPAN> - specifies an executable via its MD5
|
||||
checksum.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>name=</STRONG></SPAN> - specifies an executable
|
||||
name which may match one or more executable records.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>arch=</STRONG></SPAN> - specifies an architecture
|
||||
as a Ghidra processor id which will be used to filter executables.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>compiler=</STRONG></SPAN> - specifies a compiler
|
||||
specification id which will be used to filter executables.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>--includelibs</STRONG> - If specified, executable
|
||||
records which correspond to a referenced Library will be included. Such records
|
||||
have a fabricated MD5 which is based on its name.</SPAN></P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class="bold"><STRONG>delete</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Remove all records associated with a specific executable from a BSim repository.
|
||||
A BSim URL specifying the repository must be provided, and one of two options,
|
||||
<SPAN class="command"><STRONG>md5=</STRONG></SPAN> or <SPAN class=
|
||||
"command"><STRONG>name=</STRONG></SPAN>, that indicate the specific executable must
|
||||
also be given. All associated executable and function records are removed.
|
||||
If an executable cannot be uniquely identified an error will result.
|
||||
</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>md5=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>32-hexdigits</EM></SPAN> - specifies the executable via its MD5
|
||||
checksum.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>name=</STRONG></SPAN> - specifies an executable
|
||||
name which may match one or more executable records.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>arch=</STRONG></SPAN> - specifies an architecture
|
||||
as a Ghidra processor id, when the <SPAN class=
|
||||
"command"><STRONG>name</STRONG></SPAN> option is not enough to uniquely specify the
|
||||
executable.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>compiler=</STRONG></SPAN> - specifies a compiler
|
||||
id string, when the <SPAN class="command"><STRONG>name</STRONG></SPAN> option is
|
||||
not enough to uniquely specify the executable.</P>
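<P>The executable-selection options shared by <STRONG>delete</STRONG>,
<STRONG>listfuncs</STRONG> and <STRONG>dumpsigs</STRONG> follow the same pattern, which the
Python sketch below checks before building a <STRONG>delete</STRONG> argument list. The
helper names, URL, executable name and option values are hypothetical examples, and
allowing <STRONG>arch=</STRONG> and <STRONG>compiler=</STRONG> only alongside
<STRONG>name=</STRONG> follows the bracket nesting in the usage lines.</P>

```python
import re
from typing import List, Optional

MD5_PATTERN = re.compile(r"[0-9a-fA-F]{32}")


def executable_selector(md5: Optional[str] = None,
                        name: Optional[str] = None,
                        arch: Optional[str] = None,
                        compiler: Optional[str] = None) -> List[str]:
    """Build the md5=/name=/arch=/compiler= options that identify one executable."""
    if md5 is None and name is None:
        raise ValueError("one of md5= or name= must be given")
    if md5 is not None and not MD5_PATTERN.fullmatch(md5):
        raise ValueError("md5= expects exactly 32 hexadecimal digits")
    if name is None and (arch is not None or compiler is not None):
        raise ValueError("arch= and compiler= only refine a name= selection")
    options = []
    if md5 is not None:
        options.append("md5=" + md5)
    if name is not None:
        options.append("name=" + name)
    if arch is not None:
        options.append("arch=" + arch)
    if compiler is not None:
        options.append("compiler=" + compiler)
    return options


def build_delete(bsim_url: str, **selector: Optional[str]) -> List[str]:
    """Assemble (but do not run) an argument list for 'bsim delete'."""
    return ["bsim", "delete", bsim_url] + executable_selector(**selector)


# Hypothetical URL and executable, for illustration only.
print(build_delete("postgresql://localhost/example_db",
                   name="example.exe", arch="x86:LE:64:default"))
```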
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>listfuncs</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>List all function records associated with a specific executable from a BSim
|
||||
repository. A BSim URL specifying the repository must be provided, and one of two
|
||||
options, <SPAN class="command"><STRONG>md5=</STRONG></SPAN> or <SPAN class=
|
||||
"command"><STRONG>name=</STRONG></SPAN>, that indicate the specific executable must
|
||||
also be given. All associated executable and function records are listed. If an
|
||||
executable cannot be uniquely identified an error will result.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>md5=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>32-hexdigits</EM></SPAN> - specifies the executable via its MD5
|
||||
checksum.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>name=</STRONG></SPAN> - specifies an executable
|
||||
name which may match one or more executable records.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>arch=</STRONG></SPAN> - specifies an architecture
|
||||
as a Ghidra processor id, when the <SPAN class=
|
||||
"command"><STRONG>name</STRONG></SPAN> option is not enough to uniquely specify the
|
||||
executable.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>compiler=</STRONG></SPAN> - specifies a compiler
|
||||
id string, when the <SPAN class="command"><STRONG>name</STRONG></SPAN> option is
|
||||
not enough to uniquely specify the executable.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>--printselfsig</STRONG></SPAN> - If specified, each
|
||||
function listed will be prefixed by a calculated self-significance score. This value is
|
||||
expressed as a decimal value.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>--callgraph</STRONG></SPAN> - If specified, a list
|
||||
of all library functions called by the identified executable will be listed after
|
||||
the function list.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>--printjustexe</STRONG> - If specified, only a
|
||||
summary of the executable will be displayed. If <STRONG>--callgraph</STRONG> was
|
||||
also specified only the called libraries will be listed and not the specified
|
||||
functions.</SPAN></P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>maxfunc=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>max_count</EM></SPAN> - specifies the maximum number of functions to
|
||||
be listed which correspond to the identified executable (default=1000, a value of 0
|
||||
indicates no limit).</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>dumpsigs</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>Dump signature and metadata from a BSim repository for a specific executable to
|
||||
a "sigs_" XML file. A BSim server URL and a path to a directory where the new file
|
||||
will be stored must be given. One of two options, <SPAN class=
|
||||
"command"><STRONG>md5=</STRONG></SPAN> or <SPAN class=
|
||||
"command"><STRONG>name=</STRONG></SPAN>, that specify the particular executable
|
||||
must also be given. If an executable cannot be uniquely identified an error will result.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>md5=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>32-hexdigits</EM></SPAN> - specifies an executable via its MD5
|
||||
checksum.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>name=</STRONG></SPAN> - specifies an executable
|
||||
name which may match one or more executable records.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>arch=</STRONG></SPAN> - specifies an architecture
|
||||
as a Ghidra processor id, when the <SPAN class=
|
||||
"command"><STRONG>name</STRONG></SPAN> option is not enough to uniquely specify the
|
||||
executable.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>compiler=</STRONG></SPAN> - specifies a compiler
|
||||
specification id, when the <SPAN class=
|
||||
"command"><STRONG>name</STRONG></SPAN> option is not enough to uniquely specify the
|
||||
executable.</P>
|
||||
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class="bold"><STRONG>--Global
|
||||
Options--</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>These options apply to all <SPAN class="command"><STRONG>bsim</STRONG></SPAN>
|
||||
commands.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>user=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>name</EM></SPAN> - specifies a user to masquerade as when connecting
|
||||
to the server.</P>
|
||||
|
||||
<P><SPAN class="command"><STRONG>cert=</STRONG></SPAN><SPAN class=
|
||||
"emphasis"><EM>path</EM></SPAN> - provides a path to the user's certificate when
|
||||
connecting to a server that requires PKI authentication.</P>
|
||||
</DD>
|
||||
</DL>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<DIV class="section">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H2 class="title" style="clear: both"><A name="URLs"></A>Ghidra and BSim URLs</H2>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>Ghidra utilizes Universal Resource Locators (URLs) to identify both <EM>Ghidra
|
||||
Server/Project Repositories</EM> and <EM>BSim Databases</EM>. See the corresponding sections
|
||||
below for specific formatting details. It is important to note that local <EM>ghidra</EM> and
|
||||
<EM>file</EM> URLs never include a double-slash after the protocol (i.e., "://").</P>
|
||||
|
||||
<DIV class="section">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H3 class="title" style="clear: both"><A name="GhidraURLs"></A>Ghidra Server/Project
|
||||
Repository URLs</H3>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>BSim command-line tools, as well as the Ghidra GUI, utilize a URL to specify the
|
||||
location of a remote Ghidra Server repository or a local Ghidra Project. Both cases work in
|
||||
a very similar fashion other than the format of the URL and potential limitations of a
|
||||
local Project URL. Use of a Ghidra Server repository and corresponding URLs is preferred
|
||||
since any Ghidra URL metadata added to a shared BSim database has the ability to be
|
||||
accessed by other users, while a local Ghidra Project URL is very limited in its visibility
|
||||
and path validity on other systems. For this reason, use of a local Ghidra Project URL
|
||||
should be restricted to use with a local H2 BSim Database file.</P>
|
||||
|
||||
<P>The format of a remote <EM>Ghidra Server URL</EM> is distinctly different from a
|
||||
<EM>Local Ghidra Project URL</EM>. These URLs have the following formats:</P>
|
||||
|
||||
<P><STRONG>Remote Ghidra Server Repository</STRONG><BR>
|
||||
</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<TABLE border="0" class="simplelist">
|
||||
<TR>
|
||||
<TD><CODE class=
|
||||
"computeroutput">ghidra://&lt;hostname&gt;[:&lt;port&gt;]/&lt;repository_name&gt;[/&lt;folder_path&gt;]</CODE></TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
|
||||
<P>If the default Ghidra Server port (13100) is in use, it need not be specified in the URL.
|
||||
The <EM>hostname</EM> may specify either a Fully Qualified Domain Name (FQDN, e.g.,
|
||||
<EM>host.abc.com</EM>) or an IPv4 address (e.g., <EM>1.2.3.4</EM>).</P>
|
||||
<P><STRONG>Local Ghidra Project</STRONG><BR></P>
|
||||
|
||||
|
||||
<DIV class="informalexample">
|
||||
<TABLE border="0" class="simplelist">
|
||||
<TR>
|
||||
<TD><CODE class=
|
||||
"computeroutput">ghidra:[/&lt;directory_path&gt;]/&lt;project_name&gt;[?/&lt;folder_path&gt;]</CODE></TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
|
||||
<P>For local project URLs, the absolute directory path containing the project
|
||||
<EM>*.gpr</EM> locator file must be specified with the project name. The project name
|
||||
should exclude any <EM>.gpr/.rep</EM> suffix. Only the '/' character should be used as a
|
||||
directory separator. In addition, when running on Windows, the directory path should
|
||||
include its drive designation preceded by a '/' (e.g., <CODE class=
|
||||
"computeroutput">ghidra:/C:/mydir/myproject?/folderA/folderB</CODE>).</P>
|
||||
</DIV>
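<P>As an illustration of the two Ghidra URL formats described above, the following Python
sketch assembles each form from its parts. The helper names
<CODE class="computeroutput">ghidra_server_url</CODE> and
<CODE class="computeroutput">ghidra_project_url</CODE> are hypothetical and are not part of
Ghidra; the sketch only mirrors the documented templates and performs no escaping of
special characters.</P>

```python
def ghidra_server_url(hostname, repository, folder_path=None, port=None):
    """Build ghidra://<hostname>[:<port>]/<repository_name>[/<folder_path>]."""
    url = f"ghidra://{hostname}"
    if port is not None:
        # Only a non-default port needs to appear in the URL.
        url += f":{port}"
    url += f"/{repository}"
    if folder_path:
        url += "/" + folder_path.strip("/")
    return url


def ghidra_project_url(directory_path, project_name, folder_path=None):
    """Build ghidra:[/<directory_path>]/<project_name>[?/<folder_path>]."""
    # Only '/' may be used as a directory separator.
    directory = directory_path.replace("\\", "/")
    if not directory.startswith("/"):
        # A Windows drive designation must be preceded by '/'.
        directory = "/" + directory
    # The project name excludes any .gpr/.rep suffix.
    for suffix in (".gpr", ".rep"):
        if project_name.endswith(suffix):
            project_name = project_name[: -len(suffix)]
    url = f"ghidra:{directory.rstrip('/')}/{project_name}"
    if folder_path:
        url += "?/" + folder_path.strip("/")
    return url
```

<P>Note that the local form never contains "//" after the protocol, and that the folder
path is introduced by "?/" rather than a plain "/".</P>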
|
||||
|
||||
<DIV class="section">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H3 class="title" style="clear: both"><A name="BSimURLs"></A>BSim Database URLs</H3>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>BSim command-line tools utilize a URL to specify the type and specific details required
|
||||
to establish a connection to a specific BSim Database. Within the Ghidra GUI the database
|
||||
details are not specified using a URL; they are entered using an interactive form. Each BSim
|
||||
database type has a distinct URL format:</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<TABLE border="0" cellpadding="2" class="simplelist">
|
||||
<TR>
|
||||
<TH>Database Type</TH>
|
||||
|
||||
<TH align="left">URL Format</TH>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD>PostgreSQL</TD>
|
||||
|
||||
<TD><CODE class=
|
||||
"computeroutput">postgresql://&lt;hostname&gt;[:&lt;port&gt;]/&lt;dbname&gt;</CODE></TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD>Elasticsearch</TD>
|
||||
|
||||
<TD><CODE class=
|
||||
"computeroutput">https://&lt;hostname&gt;[:&lt;port&gt;]/&lt;dbname&gt;</CODE></TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD>Elasticsearch</TD>
|
||||
|
||||
<TD><CODE class=
|
||||
"computeroutput">elastic://&lt;hostname&gt;[:&lt;port&gt;]/&lt;dbname&gt;</CODE></TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD>H2 File</TD>
|
||||
|
||||
<TD><CODE class=
|
||||
"computeroutput">file:[/&lt;directory_path&gt;]/&lt;dbname&gt;</CODE></TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
|
||||
<P>The <EM>https</EM> and <EM>elastic</EM> protocols are equivalent.</P>
|
||||
|
||||
<P>For local <EM>file</EM> URLs, the absolute path to the H2 database <EM>*.mv.db</EM> file
|
||||
must be specified without the <EM>*.mv.db</EM> extension. Only the '/' character should be
|
||||
used as a directory separator. In addition, when running on Windows, the directory path
|
||||
should include its drive designation preceded by a '/' (e.g., <CODE class=
|
||||
"computeroutput">file:/C:/mydir/mydb</CODE>).</P>
|
||||
</DIV>
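<P>The mapping from URL protocol to database type in the table above can be sketched in a
few lines of Python. This is a minimal illustration only: the function name
<CODE class="computeroutput">describe_bsim_url</CODE> is hypothetical, and the checks it
performs are assumptions derived from the rules stated in this section, not the validation
that the BSim tools themselves apply.</P>

```python
from urllib.parse import urlsplit

# Scheme-to-database mapping taken from the URL format table.
BSIM_SCHEMES = {
    "postgresql": "PostgreSQL",
    "https": "Elasticsearch",
    "elastic": "Elasticsearch",
    "file": "H2 File",
}


def describe_bsim_url(url):
    """Return a dict describing the database a BSim URL refers to."""
    parts = urlsplit(url)
    if parts.scheme not in BSIM_SCHEMES:
        raise ValueError(f"unsupported BSim URL scheme: {parts.scheme!r}")
    db_type = BSIM_SCHEMES[parts.scheme]
    if db_type == "H2 File":
        # Local file URLs never include "//" after the protocol.
        if url.startswith("file://"):
            raise ValueError("file URLs must not contain '//' after the protocol")
        # The path is given without the .mv.db extension.
        if parts.path.endswith(".mv.db"):
            raise ValueError("omit the .mv.db extension from the path")
        return {"type": db_type, "path": parts.path}
    if not parts.hostname:
        raise ValueError("server URLs require a hostname")
    return {
        "type": db_type,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL does not name a port
        "dbname": parts.path.lstrip("/"),
    }
```

<P>For example, <CODE class="computeroutput">describe_bsim_url("file:/C:/mydir/mydb")</CODE>
reports an H2 file database with the path
<CODE class="computeroutput">/C:/mydir/mydb</CODE>.</P>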
|
||||
</DIV>
|
||||
</BODY>
|
||||
</HTML>
|
@ -0,0 +1,993 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
|
||||
|
||||
<HTML>
|
||||
<HEAD>
|
||||
<META name="generator" content=
|
||||
"HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net">
|
||||
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
|
||||
|
||||
<TITLE>Database Configuration</TITLE>
|
||||
<LINK rel="stylesheet" type="text/css" href="help/shared/DefaultStyle.css">
|
||||
<LINK rel="stylesheet" type="text/css" href="../../shared/languages.css">
|
||||
<META name="generator" content="DocBook XSL Stylesheets V1.79.1">
|
||||
<LINK rel="home" href="index.html" title="BSim Database">
|
||||
<LINK rel="up" href="index.html" title="BSim Database">
|
||||
<LINK rel="prev" href="DatabaseOverview.html" title="BSim Database">
|
||||
<LINK rel="next" href="IngestProcess.html" title="Ingesting Executables">
|
||||
</HEAD>
|
||||
|
||||
<BODY>
|
||||
<DIV class="chapter">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H1 class="title"><A name="DatabaseConfiguration"></A>Database Configuration</H1>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<DIV class="section">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H2 class="title" style="clear: both"><A name="ConfigOverview"></A>Overview</H2>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>The server for the BSim Database is distinct from the traditional Ghidra server,
|
||||
although for many use cases it is convenient to have both running and view the BSim server
|
||||
as a loosely coupled extension to the base Ghidra Server. In terms of start-up, shutdown,
|
||||
and configuration however, the two servers are completely separate.</P>
|
||||
|
||||
<P>There are two choices for deploying a shared server for the BSim Database: PostgreSQL or
|
||||
Elasticsearch. In addition, a local file-based database may be employed which utilizes an
|
||||
integrated H2 Database engine. This file-based database is intended for smaller datasets
|
||||
and its use is limited to a single process.</P>
|
||||
|
||||
<P>PostgreSQL software, including the extension necessary for BSim signature indexing,
|
||||
comes prepackaged with the Ghidra distribution. It runs on a single host and makes
|
||||
efficient use of whatever CPU, memory, and disk resources are made available to it.
|
||||
PostgreSQL is a highly robust and capable server that should perform well on minimally
|
||||
configured workstations up to high-end production hardware.</P>
|
||||
|
||||
<P>An Elasticsearch BSim plug-in is included with the Ghidra distribution, but the core
|
||||
server software must be obtained separately by the database administrator. Elasticsearch is
|
||||
a scalable text search and analytics database. It automatically distributes itself across
|
||||
machines in a cluster, allowing individual database queries and requests to be serviced in
|
||||
parallel. Support for BSim in Elasticsearch should still be considered a prototype, but
|
||||
all major functionality has been implemented, and the BSim schema takes full advantage of
|
||||
Elasticsearch as a distributed database.</P>
|
||||
|
||||
<P>BSim clients included in the base Ghidra distribution can interface to any of these
|
||||
databases.</P>
|
||||
</DIV>
|
||||
|
||||
<DIV class="section">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H2 class="title" style="clear: both"><A name="ServerConfig"></A>Server
|
||||
Configuration</H2>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<DIV class="sect2">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H3 class="title"><A name="PostConfig"></A>PostgreSQL Configuration</H3>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>The base Ghidra distribution comes with the PostgreSQL software and the extensions
|
||||
necessary for supporting a BSim database. The PostgreSQL server is most easily managed
|
||||
using the <SPAN class="bold"><STRONG>bsim_ctl</STRONG></SPAN> command-line script. When
|
||||
<SPAN class="bold"><STRONG>bsim_ctl start</STRONG></SPAN> is run for the first time (see
|
||||
below), the PostgreSQL software is unpacked, depending on the host OS, to either</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">$(ROOT)/Ghidra/Features/BSim/os/linux64/postgresql</CODE>
|
||||
OR</TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD><CODE class=
|
||||
"computeroutput">$(ROOT)/Ghidra/Features/BSim/os/osx64/postgresql</CODE></TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
|
||||
<P>BSim will not operate with PostgreSQL without the Ghidra specific extensions, but
|
||||
otherwise the provided installation is standard. It can be configured just like any other
|
||||
stand-alone PostgreSQL server. PostgreSQL is highly configurable, and there are no direct
|
||||
restrictions on modifying the configuration values. A default configuration is provided
|
||||
with this installation that has been tuned specifically for the BSim Database
|
||||
application, so in practice there may be little reason to modify it. But there are a few
|
||||
standard configuration values for the server that might need adjusting. These do impact
|
||||
important aspects of the server, like the amount of memory allocated to the server and
|
||||
access restrictions.</P>
|
||||
|
||||
<DIV class="sect3">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H4 class="title"><A name="PostStartStop"></A>Starting and Stopping the
|
||||
Server</H4>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>The basic start-up and shut-down is accomplished with the same command-line script,
|
||||
which takes either the keyword <SPAN class="command"><STRONG>start</STRONG></SPAN> or
|
||||
<SPAN class="command"><STRONG>stop</STRONG></SPAN> as the first parameter. The second
|
||||
parameter must be an absolute path to the chosen data directory.</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">$(ROOT)/support/bsim_ctl start
|
||||
/path/to/datadir</CODE></TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">$(ROOT)/support/bsim_ctl start /path/to/datadir
|
||||
port=8000</CODE></TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">$(ROOT)/support/bsim_ctl stop
|
||||
/path/to/datadir</CODE></TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">$(ROOT)/support/bsim_ctl stop /path/to/datadir
|
||||
force</CODE></TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
|
||||
<P>The data directory should already exist and should initially not contain any files.
|
||||
The first time a server is started for a particular data directory, a large number of
|
||||
configuration files and other sub-directories associated with the PostgreSQL server
|
||||
will automatically be created. Upon subsequent restarts the existing configuration will
|
||||
be reused.</P>
|
||||
|
||||
<P>The <SPAN class="bold"><STRONG>start</STRONG></SPAN> command can take an optional
|
||||
<SPAN class="bold"><STRONG>port=</STRONG></SPAN> parameter. This can be used to specify
|
||||
a non-standard port for the PostgreSQL server to listen on. In this case, any
|
||||
subsequent reference to the BSim server, in the Ghidra client, or with the <SPAN class=
|
||||
"command"><STRONG>bsim</STRONG></SPAN> command described below, must specify the port.
|
||||
When using the <SPAN class="command"><STRONG>bsim</STRONG></SPAN> command, a
|
||||
non-default port must be explicitly specified with the BSim <SPAN class=
|
||||
"command"><STRONG>postgresql://</STRONG></SPAN> URL (see <A class="xref" href=
|
||||
"CommandLineReference.html#URLs">“Ghidra and BSim URLs”</A> for more
|
||||
details).</P>
|
||||
|
||||
<P>The <SPAN class="command"><STRONG>stop</STRONG></SPAN> command can take the keyword
|
||||
<SPAN class="command"><STRONG>force</STRONG></SPAN> as an optional parameter. Without
|
||||
this, the shutdown of the server will wait until all currently connected clients finish
|
||||
their sessions. Adding this parameter will cause all clients to be disconnected
|
||||
immediately, rolling back any transactions, and the server will shut down
|
||||
immediately.</P>
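<P>When the server is started and stopped from a script, the command lines shown above can
be assembled programmatically. The following Python sketch builds the argument lists only;
the helper names and the <CODE class="computeroutput">script</CODE> parameter (the path to
the <CODE class="computeroutput">bsim_ctl</CODE> script) are hypothetical, and the sketch
assumes a POSIX-style data directory path.</P>

```python
from pathlib import PurePosixPath


def _require_absolute(datadir):
    # The data directory must be given as an absolute path.
    if not PurePosixPath(datadir).is_absolute():
        raise ValueError(f"data directory must be an absolute path: {datadir!r}")


def bsim_ctl_start(datadir, port=None, script="bsim_ctl"):
    """Argument list for: bsim_ctl start <datadir> [port=<n>]."""
    _require_absolute(datadir)
    args = [script, "start", datadir]
    if port is not None:
        # Optional non-standard port for the PostgreSQL server.
        args.append(f"port={int(port)}")
    return args


def bsim_ctl_stop(datadir, force=False, script="bsim_ctl"):
    """Argument list for: bsim_ctl stop <datadir> [force]."""
    _require_absolute(datadir)
    args = [script, "stop", datadir]
    if force:
        # Disconnects all clients immediately instead of waiting for them.
        args.append("force")
    return args
```

<P>The resulting list can be handed to
<CODE class="computeroutput">subprocess.run(args, check=True)</CODE>. Building a list rather
than a single string avoids shell quoting problems when the data directory path contains
spaces.</P>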
|
||||
</DIV>
|
||||
|
||||
<DIV class="sect3">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H4 class="title"><A name="PostSecurityAuthentication"></A>Security and
|
||||
Authentication</H4>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>BSim makes use of PostgreSQL security mechanisms to enforce privileges and
|
||||
authenticate users. The <SPAN class="command"><STRONG>bsim_ctl</STRONG></SPAN> command
|
||||
wraps the subset of functionality described here, but other adjustments are possible by
|
||||
connecting directly to the server and issuing SQL commands.</P>
|
||||
|
||||
<P>The PostgreSQL server, as configured for BSim, only accepts connections via SSL, so
|
||||
communications in transit are always encrypted regardless of the authentication
|
||||
settings.</P>
|
||||
|
||||
<P>PostgreSQL uses the concept of <SPAN class="emphasis"><EM>roles</EM></SPAN> to grant
|
||||
access privileges based on particular users. Generally, a user's role is determined by
|
||||
the <SPAN class="emphasis"><EM>username</EM></SPAN> used to establish the connection.
|
||||
For BSim, each user role is granted one of two privilege levels: <SPAN class=
|
||||
"command"><STRONG>user</STRONG></SPAN>, which allows read-only access to the server for
|
||||
normal queries, and <SPAN class="command"><STRONG>admin</STRONG></SPAN>, which
|
||||
additionally allows database creation, ingest, update, and deletion.</P>
|
||||
|
||||
<P>BSim supports three different authentication methods, when connecting as a client or
|
||||
during database ingest and maintenance. This method is established for a server by the
|
||||
initial <SPAN class="command"><STRONG>start</STRONG></SPAN> command.</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<DIV class="variablelist">
|
||||
<DL class="variablelist">
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>trust</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P><CODE class="computeroutput">bsim_ctl start /path/to/datadir
|
||||
auth=trust</CODE></P>
|
||||
|
||||
<P>This is currently the default. No authentication is performed and privilege
|
||||
is granted based on the user name presented. Masquerading is possible.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>password</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P><CODE class="computeroutput">bsim_ctl start /path/to/datadir
|
||||
auth=password</CODE></P>
|
||||
|
||||
<P>Users are authenticated via password. A default password 'changeme' is
|
||||
established when the new user is created. Passwords can be changed by the user
|
||||
from the BSim client or can be reset by an administrator via the <SPAN class=
|
||||
"command"><STRONG>resetpassword</STRONG></SPAN> command.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class="bold"><STRONG>pki</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P><CODE class="computeroutput">bsim_ctl start /path/to/datadir auth=pki
|
||||
ca=/path/to/rootcert</CODE></P>
|
||||
|
||||
<P>Users are authenticated by PKI certificates. Upon initialization, the BSim
|
||||
server must be provided (via the <SPAN class=
|
||||
"command"><STRONG>ca=</STRONG></SPAN> option) a file containing the public keys
|
||||
for the certificate authorities used to issue users' certificates. The file
|
||||
consists of the authoritative certificates in PEM format concatenated
|
||||
together.</P>
|
||||
|
||||
<P>BSim users must register their certificate with the Ghidra client using the
|
||||
<SPAN class="emphasis"><EM>Edit->Set PKI Certificate...</EM></SPAN> menu
|
||||
option from the Project dialog. The BSim client will automatically submit the
|
||||
certificate to a server that requests it, and the password to unlock it will be
|
||||
requested as needed. This is the same mechanism used to access a PKI
|
||||
protected Ghidra server, and if a user needs access to both a BSim server and
|
||||
Ghidra server that are PKI protected, the servers should probably be configured
|
||||
with the same certificate authorities so that they will accept the same
|
||||
certificate from the user.</P>
|
||||
|
||||
<P>With PKI authentication enabled, at the time a new user role is established
|
||||
with the server, the X.509 Distinguished Name, as bound to the user's
|
||||
certificate, must be associated with the user name via the <SPAN class=
|
||||
"command"><STRONG>dn=</STRONG></SPAN> option. See <A class="xref" href=
|
||||
"#PostAddUser" title="Adding Users to the Database">“Adding Users to the
|
||||
Database”</A>.</P>
|
||||
</DD>
|
||||
</DL>
|
||||
</DIV>
|
||||
</DIV>
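<P>The relationship between the three authentication methods and their options can be
summarized in a short Python sketch. The function name
<CODE class="computeroutput">auth_options</CODE> is hypothetical; it returns the extra
arguments that would follow the data directory on the
<CODE class="computeroutput">bsim_ctl start</CODE> command line. Rejecting
<CODE class="computeroutput">ca=</CODE> for the non-PKI methods is a choice made for this
sketch, not documented behavior.</P>

```python
AUTH_METHODS = ("trust", "password", "pki")


def auth_options(method, ca=None):
    """Return the auth-related arguments for the chosen method."""
    if method not in AUTH_METHODS:
        raise ValueError(f"unknown authentication method: {method!r}")
    if method == "pki":
        # PKI requires a file of certificate authority certificates (PEM).
        if not ca:
            raise ValueError("auth=pki requires ca=/path/to/rootcert")
        return ["auth=pki", f"ca={ca}"]
    if ca is not None:
        # Sketch-level assumption: ca= is only meaningful with auth=pki.
        raise ValueError("ca= is only used with auth=pki")
    return [f"auth={method}"]
```

<P>For instance, <CODE class="computeroutput">auth_options("pki",
ca="/path/to/rootcert")</CODE> yields the two arguments shown in the
<SPAN class="bold"><STRONG>pki</STRONG></SPAN> example above.</P>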
|
||||
|
||||
<P>The authentication method should be established once, the first time the <SPAN
|
||||
class="command"><STRONG>start</STRONG></SPAN> command is issued for the server on an
|
||||
empty data directory. Subsequent restarts of the server will not change these settings.
|
||||
If the settings really need to be changed, the <SPAN class=
|
||||
"command"><STRONG>changeauth</STRONG></SPAN> command can be issued. It takes the same
|
||||
options as the <SPAN class="command"><STRONG>start</STRONG></SPAN> command and can only
|
||||
be run if the server is shut down first.</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">$(ROOT)/support/bsim_ctl changeauth
|
||||
/datadir/path auth=password</CODE></TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
|
||||
<P>Using the <SPAN class="command"><STRONG>changeauth</STRONG></SPAN> command on a
|
||||
server with an established set of users will likely require other disruptive changes to
|
||||
create passwords or associate Distinguished Names with users, if they didn't exist
|
||||
before.</P>
|
||||
|
||||
<P>If it is determined that only the database administrators have OS level, local,
|
||||
access to the server's host machine, they can choose to use the <SPAN class=
|
||||
"command"><STRONG>noLocalAuth</STRONG></SPAN> option as part of the <SPAN class=
|
||||
"command"><STRONG>start</STRONG></SPAN> or <SPAN class=
|
||||
"command"><STRONG>changeauth</STRONG></SPAN> commands. This disables authentication for
|
||||
users connecting to the server by the 'localhost' interface. This may facilitate the
|
||||
use of scripts for ingest etc., where working with passwords is cumbersome.
|
||||
Authentication is still enforced for any remote connection.</P>
|
||||
</DIV>
|
||||
|
||||
<DIV class="sect3">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H4 class="title"><A name="PostAddUser"></A>Adding Users to the Database</H4>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>The username used to start the server for the first time, causing the initialization
|
||||
of the data directory, becomes the administrator for that server. No other
|
||||
username/role is initially known to the server. New usernames/roles can be added to the
|
||||
server using the following command:</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">$(ROOT)/support/bsim_ctl adduser <SPAN class=
|
||||
"emphasis"><EM>username</EM></SPAN></CODE></TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">$(ROOT)/support/bsim_ctl adduser <SPAN class=
|
||||
"emphasis"><EM>username</EM></SPAN> dn="C=US,ST=MD,CN=Firstname User"</CODE></TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
|
||||
<P>If password authentication has been set for the server, the new user's password will
|
||||
initially be set to 'changeme'. If PKI authentication has been set for the server, the
|
||||
Distinguished Name, as bound to the new user's certificate, must be provided when
|
||||
issuing the <SPAN class="command"><STRONG>adduser</STRONG></SPAN> command, via the
|
||||
<SPAN class="command"><STRONG>dn=</STRONG></SPAN> option. The Distinguished Name must
|
||||
be presented as a string containing a comma separated sequence of attribute/value pairs
|
||||
that uniquely identifies a certificate. Currently, the Common Name (CN=) is the only
|
||||
attribute inspected by the PostgreSQL server, so other attributes can be omitted.</P>
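<P>To make the structure of the Distinguished Name string concrete, the following Python
sketch splits it into attribute/value pairs and extracts the Common Name. The function
names are hypothetical, and the sketch assumes a simple string in which values contain no
escaped commas; it is not the parsing performed by PostgreSQL or by
<CODE class="computeroutput">bsim_ctl</CODE>.</P>

```python
def parse_dn(dn):
    """Split 'C=US,ST=MD,CN=Some User' into [(attribute, value), ...]."""
    pairs = []
    for part in dn.split(","):
        key, sep, value = part.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed attribute/value pair: {part!r}")
        # Attribute names are normalized to upper case for comparison.
        pairs.append((key.strip().upper(), value.strip()))
    return pairs


def common_name(dn):
    """Return the value of the first CN attribute in the DN."""
    for key, value in parse_dn(dn):
        if key == "CN":
            return value
    raise ValueError("Distinguished Name has no CN attribute")
```

<P>Applied to the example above,
<CODE class="computeroutput">common_name("C=US,ST=MD,CN=Firstname User")</CODE> returns
<CODE class="computeroutput">Firstname User</CODE>.</P>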
|
||||
|
||||
<P>New users are by default only given <SPAN class=
|
||||
"command"><STRONG>user</STRONG></SPAN> permissions, meaning that they can only place
|
||||
queries to the database and cannot ingest, update, or delete data. The new user can be
|
||||
given <SPAN class="command"><STRONG>admin</STRONG></SPAN> privileges (by an existing
|
||||
administrator) by issuing the command:</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">$(ROOT)/support/bsim_ctl changeprivilege <SPAN
|
||||
class="emphasis"><EM>username</EM></SPAN> admin</CODE></TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<DIV class="sect3">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H4 class="title"><A name="PostAdditionalConfig"></A>Additional
|
||||
Configuration</H4>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>The relevant configuration files are at the top level of the data directory:</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">postgresql.conf</CODE></TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">pg_hba.conf</CODE></TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
|
||||
<P>The most important configuration parameters in <CODE class=
|
||||
"filename">postgresql.conf</CODE> are:</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<DIV class="variablelist">
|
||||
<DL class="variablelist">
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>shared_buffers</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>This controls the amount of RAM available for caching database pages across
|
||||
all connections to the server. The default should be reasonable in most
|
||||
situations, but for large databases or many simultaneous connections it might
|
||||
make sense to increase this.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>max_wal_size</STRONG></SPAN>,</SPAN> <SPAN class="term"><SPAN
|
||||
class="bold"><STRONG>checkpoint_timeout</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>These control how often the server forces database pages to be written back
|
||||
out to the file-system. The defaults are set to minimize disk writes when
|
||||
ingesting large numbers of records in one session. There should be little
|
||||
reason to change these values.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>ssl_ciphers</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>This controls which ciphers the server allows when negotiating a connection.
|
||||
The defaults are reasonable, but administrators may want more control. The
|
||||
setting 'TLSv1.2', for instance, can be used to restrict the server to TLS 1.2
|
||||
cipher suites.</P>
|
||||
</DD>
|
||||
</DL>
|
||||
</DIV>
|
||||
</DIV>
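<P>Before adjusting any of these parameters it can be useful to check what a data
directory currently specifies. The following Python sketch reads the simple
<CODE class="computeroutput">name = value</CODE> lines of a
<CODE class="filename">postgresql.conf</CODE> file. The function name is hypothetical, and
the parser is deliberately minimal: it assumes that values do not contain a '#' character
and it ignores include directives.</P>

```python
def read_pg_settings(text):
    """Return {name: value} for the uncommented settings in the given text."""
    settings = {}
    for raw in text.splitlines():
        # Everything after '#' is treated as a comment (sketch assumption).
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            # The '=' between name and value is optional in postgresql.conf.
            fields = line.split(None, 1)
            name = fields[0]
            value = fields[1] if len(fields) > 1 else ""
        value = value.strip()
        # Strip one pair of surrounding single quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] == "'":
            value = value[1:-1]
        settings[name.strip().lower()] = value
    return settings
```

<P>A typical use is
<CODE class="computeroutput">read_pg_settings(open("/path/to/datadir/postgresql.conf").read())</CODE>,
followed by a lookup such as
<CODE class="computeroutput">settings.get("shared_buffers")</CODE>. Parameters that are
commented out do not appear in the result, which means the server is using its built-in
default for them.</P>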
|
||||
|
||||
<P>The <CODE class="filename">pg_hba.conf</CODE> file is used to configure which
|
||||
connections the server accepts for a particular outward facing IP address and what
|
||||
security mechanisms are enforced for those connections. Currently all addresses are
|
||||
configured to accept SSL connections only, except possibly for 'localhost'.
|
||||
Administrators <SPAN class="emphasis"><EM>can</EM></SPAN> currently filter connections
|
||||
based on usernames and the particular database (which corresponds to Ghidra's concept
|
||||
of <SPAN class="emphasis"><EM>repository</EM></SPAN>).</P>
|
||||
|
||||
<DIV class="warning" style="margin-left: 0.5in; margin-right: 0.5in;">
|
||||
<H3 class="title">Warning</H3>
|
||||
|
||||
<P>By default, the server accepts all connections from all users.</P>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<DIV class="sect3">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H4 class="title"><A name="ConfigDefaults"></A>Configuration Defaults</H4>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>There is a <CODE class="filename">serverconfig.xml</CODE> file that contains a few of
|
||||
the default configuration values that are most crucial for the BSim Database. <SPAN
|
||||
class="bold"><STRONG>Beware:</STRONG></SPAN> This file is currently parsed only once
|
||||
for the entire <SPAN class="emphasis"><EM>lifetime</EM></SPAN> of a particular data
|
||||
directory: it is read only when the data directory is first initialized, i.e. the first
|
||||
time the <SPAN class="command"><STRONG>bsim_ctl start</STRONG></SPAN> command is
|
||||
invoked on the empty directory. This file is intended to provide reasonable defaults
|
||||
that are different from the standard PostgreSQL defaults. To provide site specific
|
||||
configuration, changes should be made to the normal PostgreSQL configuration files.</P>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<DIV class="sect2">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H3 class="title"><A name="ElasticConfig"></A>Elasticsearch Configuration</H3>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>A full description of how to configure an Elasticsearch cluster, including how to
|
||||
start and stop the server, is beyond the scope of this document. In particular, the <SPAN
|
||||
class="command"><STRONG>bsim_ctl</STRONG></SPAN> command-line, as described in <A class=
|
||||
"xref" href="DatabaseConfiguration.html#PostConfig" title=
|
||||
"PostgreSQL Configuration">“PostgreSQL Configuration”</A>, does not apply to
|
||||
Elasticsearch. Complete documentation is available on-line from the Elasticsearch
|
||||
website.</P>
|
||||
|
||||
<DIV class="sect3">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H4 class="title"><A name="ElasticInstall"></A>Installing the Plug-in</H4>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>In order to make use of Elasticsearch with BSim, the database administrator must
|
||||
install the <SPAN class="emphasis"><EM>lsh.zip</EM></SPAN> plug-in as part of the
|
||||
Elasticsearch deployment. The plug-in is available in the Ghidra add-on named <SPAN
|
||||
class="emphasis"><EM>BSimElasticPlugin</EM></SPAN>, which unpacks into a standard
|
||||
Ghidra installation. The file <SPAN class="emphasis"><EM>lsh.zip</EM></SPAN> is a
|
||||
standard Elasticsearch plug-in that must be installed on every node of the cluster
|
||||
before a BSim repository can be created. The Elasticsearch distribution typically comes
|
||||
preconfigured for a single node deployment. The description below shows how to enable
|
||||
BSim on such a toy deployment, but this will need to be extended to support an entire
|
||||
cluster.</P>
|
||||
|
||||
<P>Assuming the add-on has been unpacked, the plug-in can be installed to a single node
|
||||
using the <SPAN class="emphasis"><EM>elasticsearch-plugin</EM></SPAN> command in the
|
||||
<SPAN class="emphasis"><EM>bin</EM></SPAN> directory of the node's Elasticsearch
|
||||
installation.</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">bin/elasticsearch-plugin install
|
||||
file:///path/to/ghidra/Ghidra/contrib/BSimElasticPlugin/data/lsh.zip</CODE></TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
|
||||
<P>Replace the initial portion of the absolute path in the URL to point to the Ghidra
|
||||
installation. Once the plug-in is installed, the toy deployment can be (re)started from
|
||||
the command-line by running</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">bin/elasticsearch</CODE></TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
|
||||
<P>This will dump logging messages to the console, and you should see <CODE class=
|
||||
"computeroutput">[lsh]</CODE> listed among the loaded plug-ins as the node starts
|
||||
up.</P>
|
||||
</DIV>
|
||||
|
||||
<DIV class="sect3">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H4 class="title"><A name="ElasticURL"></A>The Elasticsearch URL</H4>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>Assuming an Elasticsearch cluster is running and the plug-in has been properly
|
||||
installed, all other parts of BSim interact transparently with the cluster. The <SPAN
|
||||
class="command"><STRONG>bsim</STRONG></SPAN> command, described in <A class="xref"
|
||||
href="IngestProcess.html" title="Ingesting Executables"><I>Ingesting
|
||||
Executables</I></A>, and the Ghidra/BSim client, described in <A class="xref" href=
|
||||
"../BSimSearchPlugin/BSimSearch.html" title="Querying a BSim Database"><I>Querying a BSim
|
||||
Database</I></A>, require no additional configuration to work with Elasticsearch,
|
||||
except users must provide the correct URL to establish a connection. Elasticsearch
|
||||
communicates over <SPAN class="emphasis"><EM>https</EM></SPAN>, and BSim clients
|
||||
automatically assume they are communicating with Elasticsearch when they see this
|
||||
protocol. Alternatively, the protocol may be specified as <SPAN class=
|
||||
"emphasis"><EM>elastic</EM></SPAN> when using the <SPAN class=
|
||||
"command"><STRONG>bsim</STRONG></SPAN> command. Elasticsearch use by BSim assumes a
|
||||
default port of 9200 unless a different port is given along with the server host. See <A
|
||||
class="xref" href="CommandLineReference.html#URLs">“Ghidra and BSim
|
||||
URLs”</A> for additional information about URLs.</P>
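<P>As a rough illustration of the URL conventions just described, the Python sketch below accepts either protocol name and falls back to port 9200 when none is given. The function name <CODE class="computeroutput">parse_elastic_url</CODE>, the example host, and the idea that the database name is the path portion of the URL are all assumptions made for this sketch; the URL reference linked above is the authority on the real format.</P>

```python
from urllib.parse import urlsplit

DEFAULT_ELASTIC_PORT = 9200  # default port stated in the text above


def parse_elastic_url(url):
    """Split an Elasticsearch-style BSim URL into (host, port, database).

    Hypothetical helper, not part of Ghidra.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("https", "elastic"):
        raise ValueError(f"not an Elasticsearch URL: {url!r}")
    if parts.hostname is None:
        raise ValueError(f"missing host in {url!r}")
    # Fall back to the default port when the URL does not name one.
    port = parts.port if parts.port is not None else DEFAULT_ELASTIC_PORT
    # Assumption: the database name is carried in the path component.
    database = parts.path.lstrip("/")
    return parts.hostname, port, database


print(parse_elastic_url("elastic://bsim.example.org/exampledb"))
print(parse_elastic_url("https://bsim.example.org:9201/exampledb"))
```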
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<DIV class="section">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H2 class="title" style="clear: both"><A name="CreateDatabase"></A>Creating a
|
||||
Database</H2>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>If using either PostgreSQL or Elasticsearch, the server must be properly configured and
|
||||
running before a <SPAN class="bold"><STRONG>database</STRONG></SPAN> can be created. In the
|
||||
case of an H2 file-based database there is no server requirement. Only after a database has
|
||||
been created can data be ingested or queries performed. In this context, a database is a
|
||||
single container of reverse engineered functions. Metadata pertaining to executables and
|
||||
call-graph relationships is also stored, but the principal database record describes a
|
||||
<SPAN class="emphasis"><EM>function</EM></SPAN>. A single PostgreSQL or Elasticsearch
|
||||
server can hold multiple independent databases.</P>
|
||||
|
||||
<P>A database is created using the <SPAN class="command"><STRONG>bsim</STRONG></SPAN>
|
||||
command script. The basic command looks like</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">$(ROOT)/support/bsim createdatabase <SPAN class=
|
||||
"emphasis"><EM>bsimURL</EM></SPAN> <SPAN class=
|
||||
"emphasis"><EM>config_template</EM></SPAN></CODE></TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
|
||||
<P>A BSim database is completely distinct from the Ghidra Server or Ghidra project, so the
|
||||
executables and functions contained within do not need to coincide at all.</P>
|
||||
|
||||
<P>The Ghidra GUI client specifies a BSim database with its explicit characteristics (i.e.,
|
||||
DB type, name, host/port if applicable, etc.), while the <SPAN class=
|
||||
"command"><STRONG>bsim</STRONG></SPAN> command accepts a <SPAN class=
|
||||
"emphasis"><EM>bsimURL</EM></SPAN> which includes similar details (see <A class="xref"
|
||||
href="CommandLineReference.html#URLs">“Ghidra and BSim URLs”</A> for more
|
||||
details).</P>
|
||||
|
||||
<P>The <SPAN class="emphasis"><EM>config_template</EM></SPAN> parameter passed to <SPAN
|
||||
class="command"><STRONG>bsim createdatabase</STRONG></SPAN> names a collection of specific
|
||||
configuration values for the newly created database. A standard Ghidra distribution
|
||||
provides a number of predefined templates (See below) designed for specific database use
|
||||
cases. It is simplest to use a predefined template when creating a database, but it is
|
||||
possible to edit an existing template or create a new template (See <A class="xref" href=
|
||||
"DatabaseConfiguration.html#DatabaseTemplates" title=
|
||||
"Creating Database Templates">“Creating Database Templates”</A>).</P>
|
||||
|
||||
<P>There are two critical database properties being determined by the template that need to
|
||||
be kept in mind: the <SPAN class="bold"><STRONG>index tuning</STRONG></SPAN> and the <SPAN
|
||||
class="bold"><STRONG>weighting scheme</STRONG></SPAN> relative to the size of the database.
|
||||
The two pieces of the template name, separated by the '_' character, refer to these
|
||||
concerns.</P>
|
||||
|
||||
<P>The index tuning affects the use of the database by trading off between the time
|
||||
required to perform individual queries, the amount of variation between matching functions
|
||||
a query can tolerate, and the amount of storage required per database record. Ideally, the
|
||||
database is tuned, before the initial ingest occurs, to the <SPAN class=
|
||||
"emphasis"><EM>anticipated size</EM></SPAN> of the database. The database can trade off
|
||||
storage size (per record) and latency for overall query response time, but the decision
|
||||
needs to be made before the database is populated. Currently there is a <SPAN class=
|
||||
"bold"><STRONG>medium</STRONG></SPAN> tuning that is ideal for repositories that will store
|
||||
on the order of 10 million functions. There is also a <SPAN class=
|
||||
"bold"><STRONG>large</STRONG></SPAN> tuning, which uses more storage but can maintain fast
|
||||
query times for databases with 100 million functions or more. There is a large overlap for
|
||||
these tunings, so if it's unclear how large a database might grow, go ahead and use the
|
||||
medium tuning.</P>
|
||||
|
||||
<P>The weighting scheme affects how BSim views the relative importance of individual code
|
||||
constructs within a function. Code constructions are extracted as <SPAN class=
|
||||
"emphasis"><EM>features</EM></SPAN>, and each feature is assigned a weight. The basic
|
||||
schemes are: <SPAN class="bold"><STRONG>32</STRONG></SPAN> for 32-bit compiled code, <SPAN
|
||||
class="bold"><STRONG>64</STRONG></SPAN> for 64-bit code. The scheme that matches the
|
||||
predominant form of code in the repository being ingested should be used. Mixed schemes are
|
||||
possible, but a corpus which is predominantly 32-bit, even with a small number of 64-bit
|
||||
executables mixed in, should still use the 32-bit weights.</P>
|
||||
|
||||
<P>There are some weighting schemes designed for more specialized code. The <SPAN class=
|
||||
"bold"><STRONG>64_32</STRONG></SPAN> scheme is for 64-bit code using 32-bit pointers. The
|
||||
<SPAN class="bold"><STRONG>nosize</STRONG></SPAN> scheme allows better matching of 32-bit
|
||||
functions to 64-bit functions, when they are compiled from the same source. The <SPAN
|
||||
class="bold"><STRONG>cpool</STRONG></SPAN> scheme is designed for Java byte-code or Dalvik
|
||||
executables. For more discussion of weighting, see <A class="xref" href=
|
||||
"FeatureWeight.html#WeightingSoftware" title="Weighting Software Features">“Weighting
|
||||
Software Features”</A>.</P>
|
||||
|
||||
<P>The full template name incorporates both an index tuning and a weight scheme. Some
|
||||
common examples of template names:</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<DIV class="variablelist">
|
||||
<DL class="variablelist">
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>medium_32</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>A medium index tuning with a weighting scheme designed for 32-bit
|
||||
executables.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>medium_64</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>A medium index tuning with a weighting scheme designed for 64-bit
|
||||
executables.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>large_32</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>A 32-bit weighting scheme with tuning for a large database size.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>medium_cpool</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>A medium index tuning with a weighting scheme for Java executables.</P>
|
||||
</DD>
|
||||
|
||||
<DT><SPAN class="term"><SPAN class=
|
||||
"bold"><STRONG>medium_nosize</STRONG></SPAN></SPAN></DT>
|
||||
|
||||
<DD>
|
||||
<P>A medium index tuning with a weighting scheme allowing matches between 32-bit
|
||||
and 64-bit code.</P>
|
||||
</DD>
|
||||
</DL>
|
||||
</DIV>
|
||||
</DIV>
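<P>As a rough illustration of the naming convention, the Python sketch below splits a template name at its first '_' character into the index tuning and the weighting scheme. The helper name <CODE class="computeroutput">split_template_name</CODE> is hypothetical and not part of Ghidra, and splitting at the first underscore is an assumption made so that a scheme name such as 64_32 stays in one piece.</P>

```python
def split_template_name(template):
    """Return (index_tuning, weighting_scheme) for a template name.

    Hypothetical helper: splits at the first '_' only, so a weighting
    scheme that itself contains '_' (such as 64_32) is kept whole.
    """
    tuning, separator, scheme = template.partition("_")
    if not separator or not tuning or not scheme:
        raise ValueError(f"expected <tuning>_<scheme>, got {template!r}")
    return tuning, scheme


for name in ("medium_32", "large_32", "medium_nosize", "medium_cpool"):
    tuning, scheme = split_template_name(name)
    print(f"{name}: tuning={tuning}, weights={scheme}")
```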
|
||||
</DIV>
|
||||
|
||||
<DIV class="section">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H2 class="title" style="clear: both"><A name="TailorBSim"></A>Tailoring BSim
|
||||
Metadata</H2>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>There is some facility to tailor a specific BSim database instance so that it can ingest
|
||||
and/or report information about executables or functions to make results more useful for a
|
||||
specific project or user. Capabilities can be added after a database has been created and
|
||||
is running by issuing specific <SPAN class="command"><STRONG>bsim</STRONG></SPAN> commands,
|
||||
but they can also be added to a <SPAN class="emphasis"><EM>configuration
|
||||
template</EM></SPAN> prior to creating the database, which provides a record of the
|
||||
specific additions should the database instance need to be recreated or multiple tailored
|
||||
instances be deployed. For additions that allow the ingest of more metadata about
|
||||
executables or functions, users must provide additional scripts to Ghidra during the ingest
|
||||
process in order to read in or discover the new metadata.</P>
|
||||
|
||||
<P>The <SPAN class="bold"><STRONG>Name</STRONG></SPAN>, <SPAN class=
|
||||
"bold"><STRONG>Owner</STRONG></SPAN>, and <SPAN class=
|
||||
"bold"><STRONG>Description</STRONG></SPAN> associated with a BSim instance can be trivially
|
||||
tailored with the <SPAN class="command"><STRONG>bsim setmetadata</STRONG></SPAN>
|
||||
command.</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">$(ROOT)/support/bsim setmetadata <SPAN class=
|
||||
"emphasis"><EM>bsimURL</EM></SPAN> "name=BSim Database"</CODE></TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">$(ROOT)/support/bsim setmetadata <SPAN class=
|
||||
"emphasis"><EM>bsimURL</EM></SPAN> "owner=Administrators"</CODE></TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">$(ROOT)/support/bsim setmetadata <SPAN class=
|
||||
"emphasis"><EM>bsimURL</EM></SPAN> "description=Files of interest"</CODE></TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
|
||||
<P>This information is displayed in various windows by the BSim client. The values can be
|
||||
changed at any time and do not otherwise affect the records contained in the database.
|
||||
Multiple command-line parameters can be fed to <SPAN class="command"><STRONG>bsim
|
||||
setmetadata</STRONG></SPAN> so long as each one starts with <SPAN class=
|
||||
"bold"><STRONG>name=</STRONG></SPAN>, <SPAN class="bold"><STRONG>owner=</STRONG></SPAN>, or
|
||||
<SPAN class="bold"><STRONG>description=</STRONG></SPAN> respectively. Quoting may be
|
||||
necessary to get some strings to be interpreted as a single command-line parameter.</P>
|
||||
|
||||
<DIV class="sect2">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H3 class="title"><A name="ExeCat"></A>Executable Categories</H3>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>BSim provides the powerful ability to associate new types of metadata with each
|
||||
executable that the database ingests. Any method of categorizing executables that
|
||||
describes an executable with a simple string value, referred to here as an executable
|
||||
<SPAN class="bold"><STRONG>category</STRONG></SPAN>, can be added as a field to the
|
||||
database. With only minor adjustments to the ingest process, new category values can be
|
||||
automatically attached to incoming executables and are treated like any other executable
|
||||
field that BSim understands. Category values are retrieved with queries, can be used for
|
||||
filtering, and show up as sortable columns in result tables.</P>
|
||||
|
||||
<P>All categories have a formal name (or type), which is used both in the ingest process
|
||||
(See below) and as the label for table columns. The name can contain alphanumeric
|
||||
characters or punctuation from the limited set, ' ._:/()'. For each executable there can
|
||||
be zero, one, or more <SPAN class="emphasis"><EM>string</EM></SPAN> values associated
|
||||
with the category. No value is required for the executable, and any single value can be
|
||||
used for filtering (either the executable is labeled with the value or it is not) even if
|
||||
there are multiple values for that category. If there are multiple values, a query that
|
||||
matches the executable will return all the values as a single sorted column entry.</P>
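<P>The naming rule above can be checked before a category is registered. The following Python sketch is only an illustration: the function name is hypothetical, and treating "alphanumeric" as ASCII letters and digits and rejecting the empty string are assumptions, not documented BSim behavior.</P>

```python
import string

# Punctuation permitted in a category name, as listed in the text above.
ALLOWED_PUNCTUATION = " ._:/()"
# Assumption: "alphanumeric" means ASCII letters and digits.
ALLOWED_CHARACTERS = set(string.ascii_letters + string.digits + ALLOWED_PUNCTUATION)


def is_valid_category_name(name):
    """Return True if every character of a non-empty name is permitted."""
    return bool(name) and all(ch in ALLOWED_CHARACTERS for ch in name)


for candidate in ("Compiler (vendor)", "build/target:arm", "bad*name"):
    print(candidate, is_valid_category_name(candidate))
```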
|
||||
|
||||
<P>It is also possible to create a special time-based category. This category can have
|
||||
any name as above, but instead of associating string values with the executable, it
|
||||
associates a single time-stamp. The time-stamp has precision down to the millisecond and
|
||||
provides filtering and sorting based on time. Internally, this new category repurposes
|
||||
the column storage originally providing an executable's <SPAN class="emphasis"><EM>Ingest
|
||||
Date</EM></SPAN> field. As a result, any BSim instance
|
||||
can have only one time category and only one time-stamp within it. The ingest scripting
|
||||
must provide any actual time-stamp value for the executable, or the database will fill in
|
||||
the "epoch", 12:00 am, Jan 1, 1970.</P>
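<P>To make the millisecond precision and the default value concrete, the Python sketch below models a time-stamp as a count of milliseconds since the epoch. The function name is hypothetical, and interpreting the epoch in UTC is an assumption of this sketch.</P>

```python
from datetime import datetime, timedelta, timezone

# The "epoch" that is filled in when no time-stamp is supplied.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_from_millis(millis=0):
    """Convert milliseconds since the epoch into a datetime.

    Hypothetical helper; the default of 0 mirrors the fallback to the
    epoch described above.
    """
    return EPOCH + timedelta(milliseconds=millis)


stamps = [timestamp_from_millis(1_700_000_000_123), timestamp_from_millis()]
# Time-stamps order naturally, which is what makes sorting and
# range filtering on a time category possible.
for stamp in sorted(stamps):
    print(stamp.isoformat(timespec="milliseconds"))
```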
|
||||
|
||||
<P>A new category can be added to the database at any time using the <SPAN class=
|
||||
"command"><STRONG>bsim addexecategory</STRONG></SPAN> command.</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">$(ROOT)/support/bsim addexecategory <SPAN class=
|
||||
"emphasis"><EM>bsimURL</EM></SPAN> MyCategoryName</CODE></TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">$(ROOT)/support/bsim addexecategory <SPAN class=
|
||||
"emphasis"><EM>bsimURL</EM></SPAN> MyTimeField date</CODE></TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
|
||||
<P>The single time-stamp field can be renamed by appending the keyword "date" to the
|
||||
command after the name of the category. Once a category has been added, the corresponding program
|
||||
options set for any new executables will automatically be read into the database as part of
|
||||
the ingest process. Previously ingested executables, assuming they have the new program
|
||||
options set, can be updated within the BSim database using one of the <SPAN class=
|
||||
"command"><STRONG>bsim updaterepo</STRONG></SPAN> command variants. In either case, the
|
||||
relevant program options typically need to be filled by running a Ghidra script (See <A
|
||||
class="xref" href="IngestProcess.html#IngestExeCat" title=
|
||||
"Ingesting Executable Categories">“Ingesting Executable Categories”</A>).
|
||||
There is currently no method for deleting a category once it has been created.</P>
|
||||
</DIV>
|
||||
|
||||
<DIV class="sect2">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H3 class="title"><A name="FunctionTags"></A>Function Tags</H3>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>BSim can be configured to recognize specific <SPAN class="bold"><STRONG>Function
|
||||
Tags</STRONG></SPAN>, which are named Boolean properties on individual functions within
|
||||
an executable. Within a Ghidra program, any number of different function tags can be
|
||||
established by the user and are used to label individual functions or specific subsets of
|
||||
functions that share a particular property. This would typically be used to label classes
|
||||
of functions that are important to analysts; unpacked functions could be labeled with the
|
||||
tag <SPAN class="emphasis"><EM>UNPACKED</EM></SPAN> for instance.</P>
|
||||
|
||||
<P>In order for BSim to recognize specific function tags, they must be individually
|
||||
registered with the BSim database. These tags are then automatically ingested into the
|
||||
database, along with the other standard metadata describing functions, and can be used to
|
||||
filter match results when querying the database. A function tag has a formal name, which
|
||||
can be displayed as part of the function header within the main code browser and is used
|
||||
for BSim columns and filter labels. Once the tag is created for a program, functions
|
||||
universally have the tag as a Boolean property: either the name applies to a function or
|
||||
it doesn't, and arbitrary subsets can be <SPAN class="emphasis"><EM>tagged</EM></SPAN>
|
||||
with that name.</P>
|
||||
|
||||
<P>A tag must be <SPAN class="emphasis"><EM>registered</EM></SPAN> with a BSim database
|
||||
before it can be used as a filter or seen in results. A tag can be registered at any time
|
||||
with the <SPAN class="command"><STRONG>bsim addfunctiontag</STRONG></SPAN> command.</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">$(ROOT)/support/bsim addfunctiontag <SPAN class=
|
||||
"emphasis"><EM>bsimURL</EM></SPAN> MyTagName</CODE></TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
|
||||
<P>The new tag will automatically be read in when any new executables are ingested. If
|
||||
previously ingested executables already had the new tags before they were registered,
|
||||
their metadata within the BSim database can be updated using the <SPAN class=
|
||||
"command"><STRONG>bsim updaterepo</STRONG></SPAN> command variants. BSim is limited to 29
|
||||
registered tag names, and there is currently no way to remove a tag once it has been
|
||||
registered.</P>
|
||||
</DIV>
|
||||
|
||||
<DIV class="sect2">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H3 class="title"><A name="DatabaseTemplates"></A>Creating Database Templates</H3>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>It is possible to create tailored database configuration templates so that
|
||||
implementors have a permanent and accessible record of a particular set-up and don't need
|
||||
to repeatedly issue <SPAN class="command"><STRONG>bsim setmetadata</STRONG></SPAN> and
|
||||
<SPAN class="command"><STRONG>bsim addexecategory</STRONG></SPAN> when creating a
|
||||
database. Other aspects of a database can also be manipulated, like weighting schemes and
|
||||
index tuning, but doing this properly is beyond the scope of this document. A <SPAN
|
||||
class="bold"><STRONG>database template</STRONG></SPAN> is the basic set of configuration
|
||||
parameters used to set up a BSim database instance. The configuration parameters are
|
||||
established for a particular database when the <SPAN class="command"><STRONG>bsim
|
||||
createdatabase</STRONG></SPAN> command is run (See <A class="xref" href=
|
||||
"DatabaseConfiguration.html#CreateDatabase" title="Creating a Database">“Creating a
|
||||
Database”</A>). The template name passed on the command-line actually identifies an
|
||||
XML file-name, appended with the '.xml' suffix, in the directory:</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">$(ROOT)/Ghidra/Features/BSim/data</CODE></TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
|
||||
<P>The file has a root tag of <SPAN class="emphasis"><EM>&lt;dbconfig&gt;</EM></SPAN>,
|
||||
and the first child tag of this root is the <SPAN class=
|
||||
"emphasis"><EM>&lt;info&gt;</EM></SPAN> tag. This tag contains all the metadata tags that
|
||||
can be easily changed or added to the database. A list of the metadata tags follows. The
|
||||
metadata is provided as formal text content within the tag, and none of the tags
|
||||
currently take attributes.</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<DIV class="table">
|
||||
<TABLE width="80%" frame="none">
|
||||
<COL width="30%">
|
||||
<COL width="70%">
|
||||
|
||||
<THEAD>
|
||||
<TR>
|
||||
<TD><SPAN class="bold"><STRONG>XML Tag</STRONG></SPAN></TD>
|
||||
|
||||
<TD><SPAN class="bold"><STRONG>Description</STRONG></SPAN></TD>
|
||||
</TR>
|
||||
</THEAD>
|
||||
|
||||
<TBODY>
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">&lt;name&gt;</CODE></TD>
|
||||
|
||||
<TD>Name of the database</TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">&lt;owner&gt;</CODE></TD>
|
||||
|
||||
<TD>Owner of the database</TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">&lt;description&gt;</CODE></TD>
|
||||
|
||||
<TD>An overarching description of the database</TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">&lt;major&gt;</CODE></TD>
|
||||
|
||||
<TD>Major decompiler version used for ingest (Should be set to zero)</TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">&lt;minor&gt;</CODE></TD>
|
||||
|
||||
<TD>Minor decompiler version used for ingest (Should be set to zero)</TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">&lt;settings&gt;</CODE></TD>
|
||||
|
||||
<TD>Specific settings for the signature strategy (Should be set to zero)</TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">&lt;execategory&gt;</CODE></TD>
|
||||
|
||||
<TD>The name of an executable category (to be) defined for this database
|
||||
instance</TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">&lt;datename&gt;</CODE></TD>
|
||||
|
||||
<TD>The name of the timestamp column</TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD><CODE class="computeroutput">&lt;functiontag&gt;</CODE></TD>
|
||||
|
||||
<TD>The name of a function tag (to be) registered with this database
|
||||
instance</TD>
|
||||
</TR>
|
||||
</TBODY>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>There can be multiple <SPAN class="emphasis"><EM>&lt;execategory&gt;</EM></SPAN> tags
|
||||
if more than one category is desired and both <SPAN class=
|
||||
"emphasis"><EM>&lt;execategory&gt;</EM></SPAN> and <SPAN class=
|
||||
"emphasis"><EM>&lt;datename&gt;</EM></SPAN> are optional tags. The date column name
|
||||
defaults to 'Ingest Date' and is drawn from the corresponding Ghidra program option. The
|
||||
tag order needs to be preserved. There can be multiple <SPAN class=
|
||||
"emphasis"><EM>&lt;functiontag&gt;</EM></SPAN> tags, one for each function tag to be
|
||||
registered with the database.</P>
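<P>For readers who generate templates programmatically, the Python sketch below assembles only the &lt;info&gt; element, emitting the metadata tags in the order of the table above. It is a hedged illustration: the function name, the example values, and the assumption that the table order is the required tag order are not confirmed by this document, and the remaining tags of a real template are deliberately left out. Copying a shipped template remains the safer route.</P>

```python
import xml.etree.ElementTree as ET


def build_info(name, owner, description,
               categories=(), datename=None, function_tags=()):
    """Build an <info> element with tags in the order of the table above.

    Hypothetical helper; all argument values are illustrative.
    """
    info = ET.Element("info")

    def add(tag, text):
        ET.SubElement(info, tag).text = text

    add("name", name)
    add("owner", owner)
    add("description", description)
    # The table above says these three should be set to zero.
    add("major", "0")
    add("minor", "0")
    add("settings", "0")
    for category in categories:        # optional, may repeat
        add("execategory", category)
    if datename is not None:           # optional, at most one
        add("datename", datename)
    for tag_name in function_tags:     # one per registered function tag
        add("functiontag", tag_name)
    return info


info = build_info("Example Database", "Example Owner", "Files of interest",
                  categories=["Vendor"], datename="Build Date",
                  function_tags=["UNPACKED"])
print(ET.tostring(info, encoding="unicode"))
```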
|
||||
|
||||
<P>It is easiest to copy an existing template and just edit the tags described above. The
|
||||
remaining tags in the file are more dangerous to manipulate. The <SPAN class=
|
||||
"emphasis"><EM>&lt;k&gt;</EM></SPAN> and <SPAN class="emphasis"><EM>&lt;L&gt;</EM></SPAN>
|
||||
tags pertain to the index tuning. The <SPAN class=
|
||||
"emphasis"><EM>&lt;weightsfile&gt;</EM></SPAN> tag gives the name of the weights file,
|
||||
within the same directory, which is itself an XML file. It is simplest to choose from
|
||||
the existing weight files provided with the distribution. See <A class="xref" href=
|
||||
"FeatureWeight.html#WeightingSoftware" title=
|
||||
"Weighting Software Features">“Weighting Software Features”</A>.</P>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</BODY>
|
||||
</HTML>
|
@ -0,0 +1,258 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
|
||||
|
||||
<HTML>
|
||||
<HEAD>
|
||||
<META name="generator" content=
|
||||
"HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net">
|
||||
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
|
||||
|
||||
<TITLE>Features and Weights</TITLE>
|
||||
<LINK rel="stylesheet" type="text/css" href="help/shared/DefaultStyle.css">
|
||||
<LINK rel="stylesheet" type="text/css" href="../../shared/languages.css">
|
||||
<META name="generator" content="DocBook XSL Stylesheets V1.79.1">
|
||||
<LINK rel="home" href="index.html" title="BSim Database">
|
||||
<LINK rel="up" href="index.html" title="BSim Database">
|
||||
<LINK rel="prev" href="DatabaseQuery.html" title="Querying a BSim Database">
|
||||
<LINK rel="next" href="CommandLineReference.html" title="Command-Line Utility Reference">
|
||||
</HEAD>
|
||||
|
||||
<BODY>
|
||||
<DIV class="chapter">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H1 class="title"><A name="FeatureWeight"></A>Features and Weights</H1>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<DIV class="section">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H2 class="title" style="clear: both"><A name="FunctionFeatures"></A>Features of
|
||||
Software Functions</H2>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>The BSim Database uses a standard <SPAN class="bold"><STRONG>Feature
|
||||
Vector</STRONG></SPAN> approach to compare and index software functions. A <SPAN class=
|
||||
"bold"><STRONG>feature</STRONG></SPAN> is an abstraction that simply means a single element
|
||||
or attribute that can be compared quantitatively between two objects. The set of possible
|
||||
features used by a particular approach is fixed, and any object being examined is viewed as
|
||||
some unordered subset of all the possible features. So features are the smallest (atomic)
|
||||
aspect of an object that can be measured: either two objects share a feature or
|
||||
they do not. But within this scheme, because objects generally consist of many individual
|
||||
features, quantitative fine-grained comparisons can be automatically calculated.</P>
|
||||
|
||||
<P>The BSim Database extracts its features from the data-flow representation of a function,
|
||||
after it has been normalized by the Ghidra decompiler. This is the SSA graph representation
|
||||
of the function, with nodes representing the variables and operators of the function, and
|
||||
the edges representing the read/write relationships between them. An individual feature is
|
||||
just a portion of this graph, encompassing some subset of variables and operators and the
|
||||
specific flow between them. Because of the decompilation, a feature can be viewed naturally
|
||||
as a uniform snippet of C source code, a partial extraction of some expression in the
|
||||
source code representation of the function. The full set of features provides uniform (and
|
||||
overlapping) coverage of the graph representation of the entire function.</P>
|
||||
|
||||
<P>Features encode specific aspects of the variables they cover but not others. The size of
|
||||
a variable, the operator that produced it, and the set of operators it is fed into are
|
||||
encoded in the features. But, any name assigned to the variable, its data-type, or even its
|
||||
storage location are <SPAN class="emphasis"><EM>not</EM></SPAN> encoded in the
|
||||
features.</P>
|
||||
|
||||
<P>Within a function, details about the specific subfunctions that it calls are not encoded
|
||||
in any of the features for that function, but information describing where the call is made
|
||||
and the set of parameters it takes is encoded.</P>
|
||||
</DIV>
|
||||
|
||||
<DIV class="section">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H2 class="title" style="clear: both"><A name="WeightingSoftware"></A>Weighting
|
||||
Software Features</H2>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>Some features are more useful for identifying a specific function out of a large corpus
|
||||
than others. With the view that features are just portions of recovered C expressions, some
|
||||
C expressions are simply more common than others. The BSim Database compensates for these
|
||||
differences by assigning a weight to each feature that factors in to the similarity and
|
||||
confidence scores produced when comparing functions. Weighting schemes are considered a
|
||||
configuration parameter of the database and are established for a particular database when
|
||||
it is created. The scheme cannot be changed without creating an entirely new database and
|
||||
reingesting the functions.</P>
|
||||
|
||||
<P>Ghidra comes with precomputed weighting schemes that are calculated using statistics
|
||||
drawn from homogeneous collections of systems and application software. A feature's weight
|
||||
is computed by counting the number of times it occurs across the entire corpus and
|
||||
comparing this with the counts from other features. This allows a direct computation of the
|
||||
information content of the feature; quantitatively, how much have we narrowed down a
|
||||
particular function from the corpus when we know it contains a particular feature.</P>
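<P>The exact formula behind the shipped weights is not given here. As a hedged illustration of "information content", the Python sketch below uses the standard self-information measure, the negative base-2 logarithm of a feature's relative frequency, so that rarer features receive larger weights. The function name and the feature labels are hypothetical.</P>

```python
import math


def information_content(counts):
    """Map each feature to -log2(count / total) for a table of counts.

    Illustrative only: this is textbook self-information, not
    necessarily the computation used to produce the shipped weights.
    """
    total = sum(counts.values())
    return {feature: -math.log2(count / total)
            for feature, count in counts.items()}


# Hypothetical occurrence counts across a tiny corpus.
corpus_counts = {"x = y + 1": 4, "x = y * z": 2, "x = y >> 3": 1, "x = ~y": 1}
for feature, bits in information_content(corpus_counts).items():
    print(f"{feature!r}: {bits:.1f} bits")
```

<P>In this toy corpus the feature seen four times out of eight carries one bit, while each feature seen only once carries three bits: knowing that a function contains a rare feature narrows the search far more.</P>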
|
||||
|
||||
<P>The two primary weighting schemes are called <SPAN class=
|
||||
"bold"><STRONG>32</STRONG></SPAN> and <SPAN class="bold"><STRONG>64</STRONG></SPAN>, based
|
||||
on 32-bit code and 64-bit code, respectively. This means that a particular database
|
||||
instance has better sensitivity for either 32-bit or 64-bit functions. The quantitative
|
||||
scores, similarity and confidence, will be more accurate at distinguishing pairs of
|
||||
functions from one corpus. This does not mean that functions from the <SPAN class=
|
||||
"emphasis"><EM>wrong</EM></SPAN> group cannot be ingested or queried, but the scores may
|
||||
not be as accurate. There is also a <SPAN class="bold"><STRONG>64_32</STRONG></SPAN>
|
||||
weighting scheme for architectures where code is compiled to use 64-bit registers but
|
||||
addresses are still 32-bit.</P>
|
||||
|
||||
<P>The specialized weighting scheme <SPAN class="bold"><STRONG>nosize</STRONG></SPAN>
|
||||
allows BSim to match between 32-bit and 64-bit implementations of a function. It works by
|
||||
making feature hashes blind to the size difference between a 32-bit variable versus a
|
||||
64-bit variable. This compensates for a compiler's tendency to assign a full 64-bit
|
||||
register to a 32-bit variable, which is frequently difficult for the decompiler to
|
||||
automatically resolve in the context of a single function. Because of this blindness, there
|
||||
is a slight loss of sensitivity, when matching 32-bit to 32-bit functions, or when matching
|
||||
64-bit to 64-bit, over the <SPAN class="bold"><STRONG>32</STRONG></SPAN> or <SPAN class=
|
||||
"bold"><STRONG>64</STRONG></SPAN> schemes respectively.</P>
|
||||
|
||||
<P>The weighting scheme <SPAN class="bold"><STRONG>cpool</STRONG></SPAN> should be used for
|
||||
run-time compilation (JIT) architectures, like Java Dalvik or <SPAN class=
|
||||
"emphasis"><EM>.class</EM></SPAN> byte-code executables. These architectures use
|
||||
characteristic <SPAN class="emphasis"><EM>constant pool</EM></SPAN> instructions that delay
|
||||
exact decisions about code and data layout until runtime. The decompiler can still recover
|
||||
data-flow effectively by treating these instructions as black-box operations, so BSim works
|
||||
in the same way as with native code. But a specialized weighting scheme is needed to
|
||||
balance BSim's sensitivity to these operations.</P>
|
||||
</DIV>
|
||||
|
||||
<DIV class="section">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H2 class="title" style="clear: both"><A name="CompareVectors"></A>Comparing Feature
|
||||
Vectors</H2>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>For a particular function, the set of extracted features and their assigned weights make
|
||||
up the formal <SPAN class="bold"><STRONG>feature vector</STRONG></SPAN> associated with the
|
||||
function. When querying a BSim Database, the primary function search is performed by
|
||||
comparing feature vectors. There are two formal scores that are computed on a pair of
|
||||
feature vectors, <SPAN class="emphasis"><EM>similarity</EM></SPAN> and <SPAN class=
|
||||
"emphasis"><EM>confidence</EM></SPAN>.</P>
|
||||
|
||||
<DIV class="sect2">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H3 class="title"><A name="Similarity"></A>Similarity</H3>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>Similarity is a direct calculation of the percentage of features in common between two
|
||||
functions. It varies continuously from 0.0, meaning the functions share no features at
|
||||
all, to 1.0, meaning that the functions have the same feature set. Formally, similarity
|
||||
is defined as the <SPAN class="emphasis"><EM>cosine similarity</EM></SPAN> of the two
|
||||
feature vectors. Weights determine how important individual features are in the score
|
||||
relative to other features, providing a practical and realistic meaning to the score. Two
|
||||
functions can exhibit a few unimportant changes, but the similarity can still be very
|
||||
high because the differences are likely not weighted heavily. Along the same lines, two
|
||||
functions can share most of their features but have a low similarity because they differ
|
||||
in more important features.</P>
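<P>As a rough illustration of the weighted comparison described above, the following Python sketch computes cosine similarity over sparse vectors. The dictionary representation (feature hash mapped to weight), the hash values, and the weights are all assumptions made for this example; they are not BSim's internal data structures or actual weights.</P>

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as {feature_hash: weight}."""
    # Only features present in both vectors contribute to the dot product.
    dot = sum(w * b[h] for h, w in a.items() if h in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical feature hashes and weights, for illustration only.
f = {0x1A: 3.0, 0x2B: 3.0, 0x3C: 0.5}
g = {0x1A: 3.0, 0x2B: 3.0, 0x4D: 0.5}  # differs from f in a lightly weighted feature
h = {0x1A: 3.0, 0x3C: 0.5, 0x5E: 3.0}  # differs from f in a heavily weighted feature

sim_fg = cosine_similarity(f, g)
sim_fh = cosine_similarity(f, h)
```

<P>With these made-up weights, <CODE>f</CODE> shares two of its three features with both <CODE>g</CODE> and <CODE>h</CODE>, yet <CODE>sim_fg</CODE> is close to 1.0 while <CODE>sim_fh</CODE> is roughly 0.5, because the differing feature carries more weight in the second case.</P>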
|
||||
|
||||
<P>When searching for a function, the database sets a particular threshold on similarity,
|
||||
0.7 by default, and returns functions whose similarity with the queried function exceeds
|
||||
that threshold. This can produce <SPAN class="emphasis"><EM>false positive</EM></SPAN>
|
||||
matches for small functions because a small function is described by just a few features
|
||||
and it is then comparatively easy to randomly match a high percentage of these features.
|
||||
Whether a false positive is likely can be decided quantitatively by examining the
|
||||
<SPAN class="emphasis"><EM>confidence</EM></SPAN> score below.</P>
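<P>The thresholding step can be sketched as a simple filter over query results. The <CODE>Match</CODE> record, the function names, and the scores below are hypothetical; only the default similarity threshold of 0.7 comes from the text, and the confidence cutoff is an assumption chosen for this example.</P>

```python
from typing import NamedTuple

class Match(NamedTuple):
    name: str          # hypothetical name of the matching function
    similarity: float  # ranges from 0.0 to 1.0
    confidence: float  # open-ended score

SIMILARITY_THRESHOLD = 0.7  # default stated in the text
LOW_CONFIDENCE = 10.0       # assumed cutoff, for this example only

def filter_matches(matches, threshold=SIMILARITY_THRESHOLD):
    """Keep matches whose similarity exceeds the threshold."""
    return [m for m in matches if m.similarity > threshold]

def needs_review(match, low_confidence=LOW_CONFIDENCE):
    """Flag a match whose confidence is too low to rule out a chance match."""
    return match.confidence < low_confidence

results = [
    Match("parse_header", 0.93, 48.2),
    Match("get_flag", 1.00, 3.1),     # tiny function: perfect similarity, low confidence
    Match("init_table", 0.55, 12.0),  # below the similarity threshold
]
kept = filter_matches(results)
suspect = [m.name for m in kept if needs_review(m)]
```

<P>Here the small function passes the similarity filter with a perfect score but is still flagged, which is the situation the confidence score is meant to expose.</P>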
|
||||
</DIV>
|
||||
|
||||
<DIV class="sect2">
|
||||
<DIV class="titlepage">
|
||||
<DIV>
|
||||
<DIV>
|
||||
<H3 class="title"><A name="Confidence"></A>Confidence</H3>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
|
||||
<P>Confidence is a log likelihood ratio, a weighted count of the set of features that
|
||||
match between two functions minus the set of features that are different. It is an
|
||||
open-ended score, and the higher it gets, the more likely it is that the two functions
|
||||
are a true match. Fixing a threshold for the confidence score provides a more consistent
|
||||
<SPAN class="emphasis"><EM>false positive</EM></SPAN> rate, as opposed to thresholding on
|
||||
similarity. A higher score means the two functions have more features in common as an
|
||||
absolute count, not just a higher percentage. So the chance of randomly matching most of
|
||||
the features continues to go down as confidence goes up.</P>
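<P>To make the contrast with similarity concrete, here is a deliberately simplified Python sketch that treats confidence as the summed weight of matching features minus the summed weight of differing features. This is an assumption based only on the informal description above, not BSim's actual log likelihood computation; the feature hashes and weights are invented.</P>

```python
def confidence_sketch(a, b):
    """Simplified confidence: weight of shared features minus weight of differing ones.

    a and b map feature hash -> weight; a hash is assumed to carry the same
    weight in both functions.
    """
    shared = a.keys() & b.keys()
    differing = a.keys() ^ b.keys()

    def weight(h):
        return a[h] if h in a else b[h]

    return sum(weight(h) for h in shared) - sum(weight(h) for h in differing)

# A two-feature function compared with an identical copy.
small = {1: 2.0, 2: 2.0}
small_score = confidence_sketch(small, small)

# Two thirty-feature functions that share 27 features and differ in 6.
large_a = {i: 2.0 for i in range(30)}
large_b = {i: 2.0 for i in range(3, 33)}
large_score = confidence_sketch(large_a, large_b)
```

<P>Under this simplified model the identical small pair scores far lower than the imperfect large pair, because the score grows with the absolute amount of matching evidence rather than with the percentage matched.</P>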
|
||||
|
||||
<P>A general correspondence between low confidence scores and false positive rates can be
|
||||
somewhat skewed by <SPAN class="emphasis"><EM>wrappers</EM></SPAN> and other small
|
||||
functions, which are always common but whose specific frequency can vary depending on the
|
||||
type of software. BSim fixes the score at 10.0 for a particular wrapper form, providing a
|
||||
convenient boundary between wrappers and more substantial functions where frequencies are
|
||||
more consistent. For scores of 10.0 and greater, we get the following rough
|
||||
correspondence with false positive rate. The rate drops by a factor of 2 for an increase
|
||||
in confidence of between 4 and 5 points.</P>
|
||||
|
||||
<DIV class="informalexample">
|
||||
<DIV class="table">
|
||||
<A name="falsepositive.htmltable"></A>
|
||||
|
||||
<TABLE width="70%" frame="none">
|
||||
<COL width="30%">
|
||||
<COL width="70%">
|
||||
|
||||
<THEAD>
|
||||
<TR>
|
||||
<TD><SPAN class="bold"><STRONG>Confidence</STRONG></SPAN></TD>
|
||||
|
||||
<TD><SPAN class="bold"><STRONG>False Positive Rate
|
||||
(Approximate)</STRONG></SPAN></TD>
|
||||
</TR>
|
||||
</THEAD>
|
||||
|
||||
<TBODY>
|
||||
<TR>
|
||||
<TD>10</TD>
|
||||
|
||||
<TD>1 in 4,000</TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD>26</TD>
|
||||
|
||||
<TD>1 in 100,000</TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD>43</TD>
|
||||
|
||||
<TD>1 in 1,000,000</TD>
|
||||
</TR>
|
||||
|
||||
<TR>
|
||||
<TD>93</TD>
|
||||
|
||||
<TD>1 in 1,000,000,000</TD>
|
||||
</TR>
|
||||
</TBODY>
|
||||
</TABLE>
|
||||
</DIV>
|
||||
</DIV>
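<P>The rows of the table above can be used to check the quoted halving rate and to estimate intermediate values. The Python sketch below assumes the false positive rate falls log-linearly between adjacent rows, which is an interpolation choice made for this example and not something the table itself guarantees.</P>

```python
import math

# (confidence, N) rows from the table above, meaning roughly "1 in N".
TABLE = [(10, 4_000), (26, 100_000), (43, 1_000_000), (93, 1_000_000_000)]

def points_per_halving(lo, hi):
    """Confidence points needed to halve the false positive rate between two rows."""
    (c0, n0), (c1, n1) = lo, hi
    halvings = math.log2(n1 / n0)
    return (c1 - c0) / halvings

def estimate_one_in(confidence):
    """Estimate N in "1 in N" by log-linear interpolation between adjacent rows."""
    if not TABLE[0][0] <= confidence <= TABLE[-1][0]:
        raise ValueError("confidence is outside the range covered by the table")
    for (c0, n0), (c1, n1) in zip(TABLE, TABLE[1:]):
        if c0 <= confidence <= c1:
            t = (confidence - c0) / (c1 - c0)
            return n0 * (n1 / n0) ** t

rates = [points_per_halving(a, b) for a, b in zip(TABLE, TABLE[1:])]
```

<P>The three values in <CODE>rates</CODE> come out at about 3.4, 5.1 and 5.0 points per halving, which is in the neighborhood of the 4 to 5 points quoted in the text.</P>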
|
||||
|
||||
<P>For a single function, there is an upper bound on the confidence that can be achieved
|
||||
by a possible match, its <SPAN class="emphasis"><EM>self significance</EM></SPAN>. This
|
||||
upper bound is of course reached by comparison with a function having 1.0 similarity.
|
||||
Self significance is roughly proportional to the size of the function. So it's impossible
|
||||
to achieve a high confidence for a small function, for single matches viewed in
|
||||
isolation. Of course a medium to low confidence threshold may be enough to produce a
|
||||
unique match if the database is small, and a medium to high confidence threshold may
|
||||
still produce occasional false positives if the database is very large.</P>
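<P>The effect of this upper bound can be sketched under a simplifying assumption: if confidence is treated as the weight of matching features minus the weight of differing ones, then a function's self significance is just the sum of its feature weights. The weights and function sizes below are hypothetical, and BSim's real computation may differ.</P>

```python
def self_significance_sketch(weights):
    """Upper bound on confidence under the simplified model: every feature matches."""
    return sum(weights)

def can_reach(weights, confidence_threshold):
    """True if even a perfect match could meet the given confidence threshold."""
    return self_significance_sketch(weights) >= confidence_threshold

wrapper = [2.0, 2.0, 1.5]  # hypothetical weights for a three-feature function
large = [2.0] * 40         # hypothetical weights for a forty-feature function

wrapper_ok = can_reach(wrapper, 26.0)
large_ok = can_reach(large, 26.0)
```

<P>In this sketch the three-feature function cannot reach a threshold of 26 even against an identical copy, while the forty-feature function can, mirroring the rough proportionality between self significance and function size.</P>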
|
||||
</DIV>
|
||||
</DIV>
|
||||
</DIV>
|
||||
</BODY>
|
||||
</HTML>
|