mirror of
https://github.com/NationalSecurityAgency/ghidra.git
synced 2024-11-10 06:02:09 +00:00
GP-4009 Introduced BSim functionality including support for postgresql,
elasticsearch and h2 databases. Added BSim correlator to Version Tracking.
This commit is contained in:
parent
f0f5b8f2a4
commit
0865a3dfb0
@ -1,8 +1,6 @@
 ##VERSION: 2.0
 ##MODULE IP: Apache License 2.0
 ##MODULE IP: Apache License 2.0 with LLVM Exceptions
-.classpath||NONE||reviewed||END|
-.project||NONE||reviewed||END|
 FridaNotes.txt||GHIDRA||||END|
 Module.manifest||GHIDRA||||END|
 build.gradle||GHIDRA||||END|
@ -1,8 +1,6 @@
 ##VERSION: 2.0
 ##MODULE IP: Apache License 2.0
 ##MODULE IP: Apache License 2.0 with LLVM Exceptions
-.classpath||NONE||reviewed||END|
-.project||NONE||reviewed||END|
 Module.manifest||GHIDRA||||END|
 build.gradle||GHIDRA||||END|
 src/llvm-project/lldb/bindings/java/java-typemaps.swig||Apache License 2.0 with LLVM Exceptions||||END|
@ -1,8 +1,6 @@
 ##VERSION: 2.0
 ##MODULE IP: Apache License 2.0
 ##MODULE IP: Apache License 2.0 with LLVM Exceptions
-.classpath||NONE||reviewed||END|
-.project||NONE||reviewed||END|
 InstructionsForBuildingLLDBInterface.txt||GHIDRA||||END|
 Module.manifest||GHIDRA||||END|
 build.gradle||GHIDRA||||END|
81
Ghidra/Extensions/BSimElasticPlugin/INSTALL.txt
Executable file
@ -0,0 +1,81 @@
|
|||||||
|
Installation of the Elasticsearch BSim Plug-in:
|
||||||
|
|
||||||
|
In order to use Elasticsearch as the back-end database for a BSim instance,
|
||||||
|
the lsh plug-in, included with this Ghidra extension, must be installed on
|
||||||
|
the Elasticsearch cluster.
|
||||||
|
|
||||||
|
The lsh plug-in is bundled in the standard plug-in format as the file
|
||||||
|
'lsh.zip'. It must be installed separately on EVERY node of the cluster,
|
||||||
|
and each node must be restarted after the install in order for the plug-in to
|
||||||
|
become active.
|
||||||
|
|
||||||
|
For a single node, installation is accomplished with the command-line
|
||||||
|
'elasticsearch-plugin' script that comes with the standard Elasticsearch
|
||||||
|
distribution. It expects a URL pointing to the plug-in to be installed.
|
||||||
|
The basic command, executed in the Elasticsearch installation directory
|
||||||
|
for the node, is
|
||||||
|
|
||||||
|
bin/elasticsearch-plugin install file:///path/to/ghidra/Ghidra/Extensions/BSimElasticPlugin/data/lsh.zip
|
||||||
|
|
||||||
|
Replace the initial portion of the absolute path in the URL to point to your
|
||||||
|
particular Ghidra installation.
|
||||||
|
|
||||||
|
Deployment:
|
||||||
|
|
||||||
|
Follow the Elasticsearch documentation to do any additional configuration,
|
||||||
|
starting, stopping, and management of your Elasticsearch cluster.
|
||||||
|
|
||||||
|
To try BSim with a toy deployment, you can start a single node (as per the
|
||||||
|
documentation) from the command-line by just running
|
||||||
|
|
||||||
|
bin/elasticsearch
|
||||||
|
|
||||||
|
This will dump logging messages to the console, and you should see '[lsh]'
|
||||||
|
listed among the loaded plug-ins as the node starts up.
|
||||||
|
|
||||||
|
Once the Elasticsearch node(s) are running, whether they are a toy or a full
|
||||||
|
deployment, you can immediately proceed to the BSim 'bsim' command.
|
||||||
|
The Ghidra/BSim client and 'bsim' command automatically assume an
|
||||||
|
Elasticsearch server when they see the 'https' protocol in the provided URLs,
|
||||||
|
although the 'elastic' protocol may also be specified and is equivalent.
|
||||||
|
The use of the 'http' protocol for Elasticsearch is not supported.
|
||||||
|
Adjust the hostname, port number, and repository name as appropriate.
|
||||||
|
Use a command-line similar to the following to create a BSim instance:
|
||||||
|
|
||||||
|
bsim createdatabase elastic://1.2.3.4:9200/repo medium_32
|
||||||
|
|
||||||
|
This is equivalent to:
|
||||||
|
|
||||||
|
bsim createdatabase https://1.2.3.4:9200/repo medium_32
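The scheme rule stated above ('elastic' and 'https' are equivalent, 'http' is not supported) can be illustrated with a small Java sketch. The class and method names (BsimUrlSchemes, normalize) are hypothetical and are not taken from Ghidra; the sketch only mirrors the behavior this file describes and makes no claim about how the BSim client actually parses URLs.

```java
// Hypothetical helper, not Ghidra's implementation: it only mirrors the
// scheme rule described in this INSTALL file.
class BsimUrlSchemes {

    // Map an 'elastic' URL to its 'https' equivalent and reject plain 'http'.
    static String normalize(String url) {
        if (url.startsWith("elastic://")) {
            return "https://" + url.substring("elastic://".length());
        }
        if (url.startsWith("https://")) {
            return url; // already in the form assumed for Elasticsearch
        }
        if (url.startsWith("http://")) {
            throw new IllegalArgumentException(
                "http is not supported for Elasticsearch: " + url);
        }
        throw new IllegalArgumentException("Not an Elasticsearch URL: " + url);
    }
}
```

Under these assumptions, 'elastic://1.2.3.4:9200/repo' and 'https://1.2.3.4:9200/repo' normalize to the same string.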
|
||||||
|
|
||||||
|
Use a command-line like this to generate and commit signatures from a Ghidra Server
|
||||||
|
repository to the Elasticsearch database created above:
|
||||||
|
|
||||||
|
bsim generatesigs ghidra://1.2.3.4/repo bsim=elastic://1.2.3.4:9200/repo
|
||||||
|
|
||||||
|
Within Ghidra's BSim client, enter the same URL into the database connection
|
||||||
|
panel in order to place queries to your Elasticsearch deployment. See the BSim
|
||||||
|
documentation included with Ghidra for full details.
|
||||||
|
|
||||||
|
|
||||||
|
Version:
|
||||||
|
|
||||||
|
The current BSim plug-in was designed and tested with Elasticsearch version 7.17.4.
|
||||||
|
A change to the Elasticsearch scripting interface, starting with version 7.15, makes the BSim
|
||||||
|
plug-in incompatible with previous versions, but the lsh plug-in jars may work without change
|
||||||
|
across later Elasticsearch versions.
|
||||||
|
|
||||||
|
Elasticsearch plug-ins explicitly encode the version of Elasticsearch they work with, and the
|
||||||
|
plug-in script will refuse to install the lsh plug-in if its version does not match your
|
||||||
|
particular installation. If your Elasticsearch version is slightly different, you can try
|
||||||
|
unpacking the zip file, changing the version number to match your software, and then repacking
|
||||||
|
the zip file. Within the zip archive, the version number is stored in a configuration file
|
||||||
|
|
||||||
|
elasticsearch/plugin-descriptor.properties
|
||||||
|
|
||||||
|
The file format is simple. Edit the following line so that the version matches your installation:
|
||||||
|
|
||||||
|
elasticsearch.version=7.17.4
|
||||||
|
|
||||||
|
The plug-in may work with other nearby versions, but proceed at your own risk.
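The unpack, edit, and repack steps described above can also be scripted. The following is a minimal Java sketch, assuming the descriptor is a UTF-8 text file whose entry name ends in 'plugin-descriptor.properties'. The class and method names (PluginVersionPatcher, patchDescriptor, patchZip) are hypothetical and are not part of Ghidra or Elasticsearch. Entry timestamps and other metadata are not preserved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration only.
class PluginVersionPatcher {

    // Replace the value on the elasticsearch.version line; other lines are untouched.
    static String patchDescriptor(String text, String newVersion) {
        return text.replaceAll("(?m)^elasticsearch\\.version=.*$",
            Matcher.quoteReplacement("elasticsearch.version=" + newVersion));
    }

    // Copy every entry of the archive, rewriting only the descriptor entry.
    static byte[] patchZip(byte[] zipBytes, String newVersion) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes));
                ZipOutputStream out = new ZipOutputStream(result)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                byte[] data = in.readAllBytes(); // reads to the end of the current entry
                if (entry.getName().endsWith("plugin-descriptor.properties")) {
                    String text = new String(data, StandardCharsets.UTF_8);
                    data = patchDescriptor(text, newVersion).getBytes(StandardCharsets.UTF_8);
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                out.write(data);
                out.closeEntry();
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toByteArray();
    }
}
```

A caller would read 'lsh.zip' into a byte array, pass it to patchZip together with the installed Elasticsearch version, and write the returned bytes to a new zip file before running 'elasticsearch-plugin install'.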
|
||||||
|
|
0
Ghidra/Extensions/BSimElasticPlugin/Module.manifest
Executable file
99
Ghidra/Extensions/BSimElasticPlugin/build.gradle
Executable file
@ -0,0 +1,99 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
apply from: "$rootProject.projectDir/gradle/distributableGhidraExtension.gradle"
|
||||||
|
apply from: "$rootProject.projectDir/gradle/javaProject.gradle"
|
||||||
|
apply plugin: 'eclipse'
|
||||||
|
eclipse.project.name = 'Xtra BSimElasticPlugin'
|
||||||
|
// This module is very different from other Ghidra modules. It is creating a stand-alone jar
|
||||||
|
// file for an elastic database plugin. It is copying files from other modules into this module
|
||||||
|
// before building a jar file from the files in this module and the cherry-picked files from
|
||||||
|
// other modules (This is very brittle and will break if any of the files are renamed or moved.)
|
||||||
|
project.ext.includeExtensionInInstallation = true
|
||||||
|
|
||||||
|
apply plugin: 'java'
|
||||||
|
|
||||||
|
sourceSets {
|
||||||
|
elasticPlugin {
|
||||||
|
java {
|
||||||
|
srcDirs = [ 'src', 'srcdummy', 'build/genericSrc', 'build/utilitySrc', 'build/bsimSrc' ]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// this dependency block is needed for this code to compile in our eclipse environment. It is not needed
|
||||||
|
// for the gradle build
|
||||||
|
dependencies {
|
||||||
|
|
||||||
|
implementation project(':BSim')
|
||||||
|
}
|
||||||
|
libsDirName='ziplayout'
|
||||||
|
|
||||||
|
task copyGenericTask(type: Copy) {
|
||||||
|
from project(':Generic').file('src/main/java')
|
||||||
|
into 'build/genericSrc'
|
||||||
|
include 'generic/lsh/vector/*.java'
|
||||||
|
include 'generic/hash/SimpleCRC32.java'
|
||||||
|
include 'ghidra/util/xml/SpecXmlUtils.java'
|
||||||
|
}
|
||||||
|
|
||||||
|
task copyUtilityTask(type: Copy) {
|
||||||
|
from project(':Utility').file('src/main/java')
|
||||||
|
into 'build/utilitySrc'
|
||||||
|
include 'ghidra/xml/XmlPullParser.java'
|
||||||
|
include 'ghidra/xml/XmlElement.java'
|
||||||
|
}
|
||||||
|
|
||||||
|
task copyBSimTask(type: Copy) {
|
||||||
|
from project(':BSim').file('src/main/java')
|
||||||
|
into 'build/bsimSrc'
|
||||||
|
include 'ghidra/features/bsim/query/elastic/ElasticUtilities.java'
|
||||||
|
include 'ghidra/features/bsim/query/elastic/Base64Lite.java'
|
||||||
|
include 'ghidra/features/bsim/query/elastic/Base64VectorFactory.java'
|
||||||
|
}
|
||||||
|
|
||||||
|
task copyPropertiesFile(type: Copy) {
|
||||||
|
from 'contribZipExclude/plugin-descriptor.properties'
|
||||||
|
into 'build/ziplayout'
|
||||||
|
}
|
||||||
|
|
||||||
|
task elasticPluginJar(type: Jar) {
|
||||||
|
from sourceSets.elasticPlugin.output
|
||||||
|
archiveBaseName = 'lsh'
|
||||||
|
excludes = [
|
||||||
|
'**/org/apache',
|
||||||
|
'**/org/elasticsearch/common',
|
||||||
|
'**/org/elasticsearch/env',
|
||||||
|
'**/org/elasticsearch/index',
|
||||||
|
'**/org/elasticsearch/indices',
|
||||||
|
'**/org/elasticsearch/plugins',
|
||||||
|
'**/org/elasticsearch/script',
|
||||||
|
'**/org/elasticsearch/search'
|
||||||
|
]
|
||||||
|
}
|
||||||
|
|
||||||
|
task elasticPluginZip(type: Zip) {
|
||||||
|
from 'build/ziplayout'
|
||||||
|
archiveBaseName = 'lsh'
|
||||||
|
destinationDirectory = file("build/data")
|
||||||
|
}
|
||||||
|
|
||||||
|
compileElasticPluginJava.dependsOn copyGenericTask
|
||||||
|
compileElasticPluginJava.dependsOn copyUtilityTask
|
||||||
|
compileElasticPluginJava.dependsOn copyBSimTask
|
||||||
|
|
||||||
|
elasticPluginZip.dependsOn elasticPluginJar
|
||||||
|
elasticPluginZip.dependsOn copyPropertiesFile
|
||||||
|
|
||||||
|
jar.dependsOn elasticPluginZip
|
6
Ghidra/Extensions/BSimElasticPlugin/certification.manifest
Executable file
@ -0,0 +1,6 @@
|
|||||||
|
##VERSION: 2.0
|
||||||
|
##MODULE IP: Apache License 2.0
|
||||||
|
INSTALL.txt||GHIDRA||||END|
|
||||||
|
Module.manifest||GHIDRA||reviewed||END|
|
||||||
|
contribZipExclude/plugin-descriptor.properties||GHIDRA||||END|
|
||||||
|
extension.properties||GHIDRA||||END|
|
@ -0,0 +1,6 @@
|
|||||||
|
description=Feature Vector Plugin
|
||||||
|
version=1.0
|
||||||
|
name=lsh
|
||||||
|
classname=org.elasticsearch.plugin.analysis.lsh.AnalysisLSHPlugin
|
||||||
|
java.version=1.11
|
||||||
|
elasticsearch.version=8.8.1
|
5
Ghidra/Extensions/BSimElasticPlugin/extension.properties
Executable file
@ -0,0 +1,5 @@
|
|||||||
|
name=BSimElasticPlugin
|
||||||
|
description=Elasticsearch backend for BSim.
|
||||||
|
author=Ghidra Team
|
||||||
|
createdOn=11/23/20
|
||||||
|
version=@extversion@
|
@ -0,0 +1,134 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.plugin.analysis.lsh;
|
||||||
|
|
||||||
|
import java.io.IOException;
|
||||||
|
import java.util.*;
|
||||||
|
|
||||||
|
import org.elasticsearch.common.settings.Settings;
|
||||||
|
import org.elasticsearch.env.Environment;
|
||||||
|
import org.elasticsearch.index.IndexModule;
|
||||||
|
import org.elasticsearch.index.IndexSettings;
|
||||||
|
import org.elasticsearch.index.analysis.TokenizerFactory;
|
||||||
|
import org.elasticsearch.indices.analysis.AnalysisModule.AnalysisProvider;
|
||||||
|
import org.elasticsearch.plugins.*;
|
||||||
|
import org.elasticsearch.script.ScriptContext;
|
||||||
|
import org.elasticsearch.script.ScriptEngine;
|
||||||
|
|
||||||
|
import generic.lsh.vector.IDFLookup;
|
||||||
|
import generic.lsh.vector.WeightFactory;
|
||||||
|
import ghidra.features.bsim.query.elastic.Base64VectorFactory;
|
||||||
|
import ghidra.features.bsim.query.elastic.ElasticUtilities;
|
||||||
|
|
||||||
|
public class AnalysisLSHPlugin extends Plugin implements AnalysisPlugin, ScriptPlugin {
|
||||||
|
|
||||||
|
public static final String TOKENIZER_SETTINGS_BASE = "index.analysis.tokenizer.lsh_";
|
||||||
|
public static String settingString = "";
|
||||||
|
|
||||||
|
static private Map<String, Base64VectorFactory> vecFactoryMap = new HashMap<>();
|
||||||
|
private Map<String, AnalysisProvider<TokenizerFactory>> tokFactoryMap;
|
||||||
|
|
||||||
|
public class TokenizerFactoryProvider implements AnalysisProvider<TokenizerFactory> {
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public TokenizerFactory get(IndexSettings indexSettings, Environment env, String name,
|
||||||
|
Settings settings) throws IOException {
|
||||||
|
// settingString = settingString + " : " + indexSettings.getIndex().getName() + '(' + name + ')';
|
||||||
|
return new LSHTokenizerFactory(indexSettings, env, name, settings);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
public AnalysisLSHPlugin() {
|
||||||
|
TokenizerFactoryProvider provider = new TokenizerFactoryProvider();
|
||||||
|
tokFactoryMap = Collections.singletonMap("lsh_tokenizer", provider);
|
||||||
|
}
|
||||||
|
|
||||||
|
private static void setupVectorFactory(String name, String idfConfig, String lshWeights) {
|
||||||
|
WeightFactory weightFactory = new WeightFactory();
|
||||||
|
String[] split = lshWeights.split(" ");
|
||||||
|
double[] weightArray = new double[split.length];
|
||||||
|
for (int i = 0; i < weightArray.length; ++i) {
|
||||||
|
weightArray[i] = Double.parseDouble(split[i]);
|
||||||
|
}
|
||||||
|
weightFactory.set(weightArray);
|
||||||
|
IDFLookup idfLookup = new IDFLookup();
|
||||||
|
split = idfConfig.split(" ");
|
||||||
|
int[] intArray = new int[split.length];
|
||||||
|
for (int i = 0; i < intArray.length; ++i) {
|
||||||
|
intArray[i] = Integer.parseInt(split[i]);
|
||||||
|
}
|
||||||
|
idfLookup.set(intArray);
|
||||||
|
Base64VectorFactory vectorFactory = new Base64VectorFactory();
|
||||||
|
// Server-side factory is never used to generate signatures,
|
||||||
|
// so we don't need to specify settings
|
||||||
|
vectorFactory.set(weightFactory, idfLookup, 0);
|
||||||
|
vecFactoryMap.put(name, vectorFactory);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Entry point for Tokenizer and Script factories to grab the global vector factory
|
||||||
|
* @param name is the name of the tokenizer
|
||||||
|
* @return the vector factory used by the tokenizer
|
||||||
|
*/
|
||||||
|
public static Base64VectorFactory getVectorFactory(String name) {
|
||||||
|
return vecFactoryMap.get(name);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void onIndexModule(IndexModule indexModule) {
|
||||||
|
super.onIndexModule(indexModule);
|
||||||
|
|
||||||
|
Settings settings = indexModule.getSettings();
|
||||||
|
String name = null;
|
||||||
|
// Look for the specific kind of tokenizer settings, within the global settings for the index
|
||||||
|
for (String key : settings.keySet()) {
|
||||||
|
if (key.startsWith(TOKENIZER_SETTINGS_BASE)) {
|
||||||
|
// We can have different settings for different indices, distinguished by this name
|
||||||
|
int pos = key.indexOf('.', TOKENIZER_SETTINGS_BASE.length() + 1);
|
||||||
|
if (pos > 0) {
|
||||||
|
name = key.substring(TOKENIZER_SETTINGS_BASE.length(), pos);
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (name != null) {
|
||||||
|
String tokenizerName = "lsh_" + name;
|
||||||
|
if (getVectorFactory(tokenizerName) != null) {
|
||||||
|
return; // Factory already exists
|
||||||
|
}
|
||||||
|
settingString = settingString + " : onModule(" + name + ')';
|
||||||
|
// If we found LSH tokenizer settings, pull them out and construct an LSHVectorFactory with them
|
||||||
|
String baseKey = TOKENIZER_SETTINGS_BASE + name + '.';
|
||||||
|
String idfConfig = settings.get(baseKey + ElasticUtilities.IDF_CONFIG);
|
||||||
|
String lshWeights = settings.get(baseKey + ElasticUtilities.LSH_WEIGHTS);
|
||||||
|
if (idfConfig == null || lshWeights == null) {
|
||||||
|
return; // IDF_CONFIG and LSH_WEIGHTS settings must be present to proceed
|
||||||
|
}
|
||||||
|
setupVectorFactory(tokenizerName, idfConfig, lshWeights);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public ScriptEngine getScriptEngine(Settings settings, Collection<ScriptContext<?>> contexts) {
|
||||||
|
return new BSimScriptEngine();
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public Map<String, AnalysisProvider<TokenizerFactory>> getTokenizers() {
|
||||||
|
return tokFactoryMap;
|
||||||
|
}
|
||||||
|
|
||||||
|
}
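The key handling in onIndexModule and the string parsing in setupVectorFactory can be exercised in isolation. The sketch below copies that logic into a standalone class so it runs without Elasticsearch on the classpath. The class name LshSettingsSketch, the example tokenizer name 'medium_32', and the key suffix 'idf_config' are assumptions for illustration; the real suffixes come from ElasticUtilities.IDF_CONFIG and ElasticUtilities.LSH_WEIGHTS, whose values are not shown in this diff.

```java
// Standalone copy of the parsing logic, for illustration only.
class LshSettingsSketch {

    static final String TOKENIZER_SETTINGS_BASE = "index.analysis.tokenizer.lsh_";

    // Extract <name> from a key of the form
    // index.analysis.tokenizer.lsh_<name>.<setting>, or return null.
    static String tokenizerName(String key) {
        if (!key.startsWith(TOKENIZER_SETTINGS_BASE)) {
            return null;
        }
        int pos = key.indexOf('.', TOKENIZER_SETTINGS_BASE.length() + 1);
        if (pos > 0) {
            return key.substring(TOKENIZER_SETTINGS_BASE.length(), pos);
        }
        return null;
    }

    // Parse a single-space separated list of weights, as setupVectorFactory does.
    static double[] parseWeights(String lshWeights) {
        String[] split = lshWeights.split(" ");
        double[] weightArray = new double[split.length];
        for (int i = 0; i < weightArray.length; ++i) {
            weightArray[i] = Double.parseDouble(split[i]);
        }
        return weightArray;
    }
}
```

Because the values are split on a single space, a setting string with doubled spaces would produce an empty token and fail in Double.parseDouble; the sketch keeps that behavior rather than changing it.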
|
@ -0,0 +1,54 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.plugin.analysis.lsh;
|
||||||
|
|
||||||
|
import java.util.*;
|
||||||
|
|
||||||
|
import org.elasticsearch.script.*;
|
||||||
|
|
||||||
|
public class BSimScriptEngine implements ScriptEngine {
|
||||||
|
private final static String ENGINE_NAME = "bsim_scripts";
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public <FactoryType> FactoryType compile(String scriptName, String scriptSource,
|
||||||
|
ScriptContext<FactoryType> context, Map<String, String> params) {
|
||||||
|
if (context.equals(ScoreScript.CONTEXT) == false) {
|
||||||
|
throw new IllegalArgumentException(
|
||||||
|
getType() + " scripts cannot be used for context [" + context.name + "]");
|
||||||
|
}
|
||||||
|
if (VectorCompareScriptFactory.SCRIPT_NAME.equals(scriptSource)) {
|
||||||
|
ScoreScript.Factory factory = new VectorCompareScriptFactory();
|
||||||
|
return context.factoryClazz.cast(factory);
|
||||||
|
}
|
||||||
|
throw new IllegalArgumentException("Unknown script name " + scriptSource);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void close() {
|
||||||
|
// Can free up resources
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public Set<ScriptContext<?>> getSupportedContexts() {
|
||||||
|
return Collections.singleton(ScoreScript.CONTEXT);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public String getType() {
|
||||||
|
return ENGINE_NAME;
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,293 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.plugin.analysis.lsh;
|
||||||
|
|
||||||
|
import generic.lsh.vector.HashEntry;
|
||||||
|
import ghidra.features.bsim.query.elastic.Base64Lite;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Class for calculating the bin ids on LSHVectors as part of the LSH indexing process
|
||||||
|
*
|
||||||
|
*/
|
||||||
|
public class LSHBinner {
|
||||||
|
|
||||||
|
private static final char[] hashSignTable = new char[512];
|
||||||
|
private static int VEC_SIZE_UPPER = 5; // Size above which to use FFT to calculate dotproduct family
|
||||||
|
private static int LSH_HASHBASE = 0xd7e6a299;
|
||||||
|
private static int HASH_MULTIPLIER = 1103515245;
|
||||||
|
private static int HASH_ADDEND = 12345;
|
||||||
|
|
||||||
|
public static class BytesRef {
|
||||||
|
public char[] buffer;
|
||||||
|
public BytesRef(int size) { buffer = new char[size]; }
|
||||||
|
}
|
||||||
|
|
||||||
|
private int k; // Number of bits per bin id
|
||||||
|
private int L; // Number of binnings
|
||||||
|
private double doubleBuffer[]; // Scratch space for dot-product calculation
|
||||||
|
private BytesRef tokenList[]; // Final token list used by lucene
|
||||||
|
|
||||||
|
static {
|
||||||
|
/**
|
||||||
|
* This is a precalculated table for generating dot-products with the random family of vectors directly.
|
||||||
|
* The first vector r_0 is expressed as a hashing function on the dimension index and the other vectors
|
||||||
|
* are derived from r_0 using an FFT. The table is formed by precalculating the FFT of each basis vector.
|
||||||
|
*/
|
||||||
|
int i, j;
|
||||||
|
int[] arr = new int[16];
|
||||||
|
int hibit0ptr;
|
||||||
|
int hibit1ptr;
|
||||||
|
|
||||||
|
for (i = 0; i < 16; ++i) { /* For each 4-bit position */
|
||||||
|
hibit0ptr = i * 16;
|
||||||
|
hibit1ptr = (i + 16) * 16;
|
||||||
|
for (j = 0; j < 16; ++j)
|
||||||
|
arr[j] = 0;
|
||||||
|
|
||||||
|
arr[i] = 1;
|
||||||
|
hashFft16(arr);
|
||||||
|
for (j = 0; j < 16; ++j) {
|
||||||
|
if (arr[j] > 0) {
|
||||||
|
hashSignTable[hibit0ptr + j] = '+';
|
||||||
|
hashSignTable[hibit1ptr + j] = '-';
|
||||||
|
} else {
|
||||||
|
hashSignTable[hibit0ptr + j] = '-';
|
||||||
|
hashSignTable[hibit1ptr + j] = '+';
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Raw Fast Fourier Transform on 16 wide integer array
|
||||||
|
* @param arr is the 16-long array
|
||||||
|
*/
|
||||||
|
private static void hashFft16(int[] arr) {
|
||||||
|
int x,y;
|
||||||
|
|
||||||
|
x = arr[0]; y = arr[8]; arr[0] = x + y; arr[8] = x - y;
|
||||||
|
x = arr[1]; y = arr[9]; arr[1] = x + y; arr[9] = x - y;
|
||||||
|
x = arr[2]; y = arr[10]; arr[2] = x + y; arr[10] = x - y;
|
||||||
|
x = arr[3]; y = arr[11]; arr[3] = x + y; arr[11] = x - y;
|
||||||
|
x = arr[4]; y = arr[12]; arr[4] = x + y; arr[12] = x - y;
|
||||||
|
x = arr[5]; y = arr[13]; arr[5] = x + y; arr[13] = x - y;
|
||||||
|
x = arr[6]; y = arr[14]; arr[6] = x + y; arr[14] = x - y;
|
||||||
|
x = arr[7]; y = arr[15]; arr[7] = x + y; arr[15] = x - y;
|
||||||
|
|
||||||
|
x = arr[0]; y = arr[4]; arr[0] = x + y; arr[4] = x - y;
|
||||||
|
x = arr[1]; y = arr[5]; arr[1] = x + y; arr[5] = x - y;
|
||||||
|
x = arr[2]; y = arr[6]; arr[2] = x + y; arr[6] = x - y;
|
||||||
|
x = arr[3]; y = arr[7]; arr[3] = x + y; arr[7] = x - y;
|
||||||
|
x = arr[8]; y = arr[12]; arr[8] = x + y; arr[12] = x - y;
|
||||||
|
x = arr[9]; y = arr[13]; arr[9] = x + y; arr[13] = x - y;
|
||||||
|
x = arr[10]; y = arr[14]; arr[10] = x + y; arr[14] = x - y;
|
||||||
|
x = arr[11]; y = arr[15]; arr[11] = x + y; arr[15] = x - y;
|
||||||
|
|
||||||
|
x = arr[0]; y = arr[2]; arr[0] = x + y; arr[2] = x - y;
|
||||||
|
x = arr[1]; y = arr[3]; arr[1] = x + y; arr[3] = x - y;
|
||||||
|
x = arr[4]; y = arr[6]; arr[4] = x + y; arr[6] = x - y;
|
||||||
|
x = arr[5]; y = arr[7]; arr[5] = x + y; arr[7] = x - y;
|
||||||
|
x = arr[8]; y = arr[10]; arr[8] = x + y; arr[10] = x - y;
|
||||||
|
x = arr[9]; y = arr[11]; arr[9] = x + y; arr[11] = x - y;
|
||||||
|
x = arr[12]; y = arr[14]; arr[12] = x + y; arr[14] = x - y;
|
||||||
|
x = arr[13]; y = arr[15]; arr[13] = x + y; arr[15] = x - y;
|
||||||
|
|
||||||
|
x = arr[0]; y = arr[1]; arr[0] = x + y; arr[1] = x - y;
|
||||||
|
x = arr[2]; y = arr[3]; arr[2] = x + y; arr[3] = x - y;
|
||||||
|
x = arr[4]; y = arr[5]; arr[4] = x + y; arr[5] = x - y;
|
||||||
|
x = arr[6]; y = arr[7]; arr[6] = x + y; arr[7] = x - y;
|
||||||
|
x = arr[8]; y = arr[9]; arr[8] = x + y; arr[9] = x - y;
|
||||||
|
x = arr[10]; y = arr[11]; arr[10] = x + y; arr[11] = x - y;
|
||||||
|
x = arr[12]; y = arr[13]; arr[12] = x + y; arr[13] = x - y;
|
||||||
|
x = arr[14]; y = arr[15]; arr[14] = x + y; arr[15] = x - y;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Raw Fast Fourier Transform on 16 wide array of doubles
|
||||||
|
* @param arr is the 16-long array
|
||||||
|
*/
|
||||||
|
private static void hashFft16(double[] arr) {
|
||||||
|
double x,y;
|
||||||
|
|
||||||
|
x = arr[0]; y = arr[8]; arr[0] = x + y; arr[8] = x - y;
|
||||||
|
x = arr[1]; y = arr[9]; arr[1] = x + y; arr[9] = x - y;
|
||||||
|
x = arr[2]; y = arr[10]; arr[2] = x + y; arr[10] = x - y;
|
||||||
|
x = arr[3]; y = arr[11]; arr[3] = x + y; arr[11] = x - y;
|
||||||
|
x = arr[4]; y = arr[12]; arr[4] = x + y; arr[12] = x - y;
|
||||||
|
x = arr[5]; y = arr[13]; arr[5] = x + y; arr[13] = x - y;
|
||||||
|
x = arr[6]; y = arr[14]; arr[6] = x + y; arr[14] = x - y;
|
||||||
|
x = arr[7]; y = arr[15]; arr[7] = x + y; arr[15] = x - y;
|
||||||
|
|
||||||
|
x = arr[0]; y = arr[4]; arr[0] = x + y; arr[4] = x - y;
|
||||||
|
x = arr[1]; y = arr[5]; arr[1] = x + y; arr[5] = x - y;
|
||||||
|
x = arr[2]; y = arr[6]; arr[2] = x + y; arr[6] = x - y;
|
||||||
|
x = arr[3]; y = arr[7]; arr[3] = x + y; arr[7] = x - y;
|
||||||
|
x = arr[8]; y = arr[12]; arr[8] = x + y; arr[12] = x - y;
|
||||||
|
x = arr[9]; y = arr[13]; arr[9] = x + y; arr[13] = x - y;
|
||||||
|
x = arr[10]; y = arr[14]; arr[10] = x + y; arr[14] = x - y;
|
||||||
|
x = arr[11]; y = arr[15]; arr[11] = x + y; arr[15] = x - y;
|
||||||
|
|
||||||
|
x = arr[0]; y = arr[2]; arr[0] = x + y; arr[2] = x - y;
|
||||||
|
x = arr[1]; y = arr[3]; arr[1] = x + y; arr[3] = x - y;
|
||||||
|
x = arr[4]; y = arr[6]; arr[4] = x + y; arr[6] = x - y;
|
||||||
|
x = arr[5]; y = arr[7]; arr[5] = x + y; arr[7] = x - y;
|
||||||
|
x = arr[8]; y = arr[10]; arr[8] = x + y; arr[10] = x - y;
|
||||||
|
x = arr[9]; y = arr[11]; arr[9] = x + y; arr[11] = x - y;
|
||||||
|
x = arr[12]; y = arr[14]; arr[12] = x + y; arr[14] = x - y;
|
||||||
|
x = arr[13]; y = arr[15]; arr[13] = x + y; arr[15] = x - y;
|
||||||
|
|
||||||
|
x = arr[0]; y = arr[1]; arr[0] = x + y; arr[1] = x - y;
|
||||||
|
x = arr[2]; y = arr[3]; arr[2] = x + y; arr[3] = x - y;
|
||||||
|
x = arr[4]; y = arr[5]; arr[4] = x + y; arr[5] = x - y;
|
||||||
|
x = arr[6]; y = arr[7]; arr[6] = x + y; arr[7] = x - y;
|
||||||
|
x = arr[8]; y = arr[9]; arr[8] = x + y; arr[9] = x - y;
|
||||||
|
x = arr[10]; y = arr[11]; arr[10] = x + y; arr[11] = x - y;
|
||||||
|
x = arr[12]; y = arr[13]; arr[12] = x + y; arr[13] = x - y;
|
||||||
|
x = arr[14]; y = arr[15]; arr[14] = x + y; arr[15] = x - y;
|
||||||
|
}
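The two unrolled hashFft16 overloads above apply four passes of add/subtract butterflies with strides 8, 4, 2 and 1, which is the pattern of a fast Walsh-Hadamard transform. As a reading aid, the same pairing can be written as a loop. This is a sketch, not the committed code; the class name Butterfly16 is hypothetical.

```java
// Loop form of the unrolled 16-point butterfly network, for illustration only.
class Butterfly16 {

    static void transform(int[] arr) {
        // Strides 8, 4, 2, 1 correspond to the four unrolled groups of statements.
        for (int stride = 8; stride >= 1; stride >>= 1) {
            for (int block = 0; block < 16; block += 2 * stride) {
                for (int j = block; j < block + stride; ++j) {
                    int x = arr[j];
                    int y = arr[j + stride];
                    arr[j] = x + y;          // sum goes to the lower index
                    arr[j + stride] = x - y; // difference goes to the upper index
                }
            }
        }
    }
}
```

Transforming the basis vector with a single 1 at index 1 yields alternating +1 and -1 entries, which is how the static initializer derives one row of sign characters per basis vector.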
|
||||||
|
|
||||||
|
public LSHBinner() {
|
||||||
|
doubleBuffer = new double[16];
|
||||||
|
k = -1;
|
||||||
|
L = -1;
|
||||||
|
tokenList = null;
|
||||||
|
}
|
||||||
|
|
||||||
|
public void setKandL(int k,int L) {
|
||||||
|
this.k = k;
|
||||||
|
this.L = L;
|
||||||
|
int numBits = 1;
|
||||||
|
while( (1 << numBits) <= L )
|
||||||
|
numBits += 1;
|
||||||
|
numBits += k;
|
||||||
|
int numChar = numBits / 6;
|
||||||
|
if ((numBits % 6)!= 0)
|
||||||
|
numChar += 1;
|
||||||
|
tokenList = new BytesRef[L];
|
||||||
|
for(int i=0;i<L;++i) {
|
||||||
|
tokenList[i] = new BytesRef(numChar);
|
||||||
|
}
|
||||||
|
}
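The sizing arithmetic in setKandL can be checked with a worked example. The sketch below reproduces the calculation: a bit count derived from L, plus k bits for the bin id, rounded up to whole 6-bit characters. The division by 6 is consistent with a base-64 style encoding, which is an inference from the Base64Lite import rather than something this part of the diff states. The class name TokenSizeSketch and the sample values of k and L are hypothetical.

```java
// Reproduces the token length calculation from setKandL, for illustration only.
class TokenSizeSketch {

    static int tokenChars(int k, int L) {
        int numBits = 1;
        while ((1 << numBits) <= L) { // bits reserved for the table index
            numBits += 1;
        }
        numBits += k;                 // plus the k bits of the bin id itself
        return (numBits + 5) / 6;     // round up to whole 6-bit characters
    }
}
```

For k = 17 and L = 146, the loop stops at 8 bits, the total is 25 bits, and each token needs 5 characters.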
|
||||||
|
|
||||||
|
public BytesRef[] getTokenList() {
|
||||||
|
return tokenList;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Generate a dot product of the hash vector in -vec- with a random family of 16 vectors, { r }
|
||||||
|
* r_0 is a randomly generated set of +1 -1 coefficients across all the dimensions (indexed by uint32 vec[i].hash)
|
||||||
|
* The coefficient is calculated as a hashing function from the seed -hashcur- and the index (vec[i].hash),
|
||||||
|
* so it should be balanced between +1 and -1.
|
||||||
|
* All the other vectors are generated from an FFT of r_0. This allows the dotproduct with vec to be calculated
|
||||||
|
* using an FFT if -vec- has many non-zero coefficients. If -vec- has only a few non-zero coefficients,
|
||||||
|
* the dotproduct is calculated with each vector in the family directly for better efficiency.
|
||||||
|
* The resulting dotproducts are converted into a 16-long bitvector based on the sign of the dotproduct and
|
||||||
|
* placed in -bucket-
|
||||||
|
* @param bucket is the (possibly partially filled) accumulator for dotproduct bits
|
||||||
|
* @param vec is the HashEntry vector to calculate the dot-products on
|
||||||
|
* @param hashcur is the index of the hash subfamily representing r_0
|
||||||
|
* @param res is space (a 16-long double array) for the in-place FFT
|
||||||
|
* @return the bucket with new accumulated dot-product bits
|
||||||
|
*/
|
||||||
|
private int hash16DotProduct(int bucket,HashEntry[] vec,int hashcur)
|
||||||
|
|
||||||
|
{
|
||||||
|
int i, j;
|
||||||
|
int rowNum;
|
||||||
|
int signPtr;
|
||||||
|
|
||||||
|
for (i = 0; i < 16; ++i)
|
||||||
|
doubleBuffer[i] = 0.0; // Initialize the dotproduct results to zero
|
||||||
|
|
||||||
|
if (vec.length < VEC_SIZE_UPPER) { // If there are a small number of non-zero coefficients in -vec-
|
||||||
|
for (i = 0; i < vec.length; ++i) {
|
||||||
|
rowNum = vec[i].getHash() ^ hashcur; // Calculate the rest of the r_0 hashing function
|
||||||
|
rowNum = (rowNum * HASH_MULTIPLIER) + HASH_ADDEND;
|
||||||
|
rowNum = (rowNum >>> 24) & 0x1f;
|
||||||
|
signPtr = rowNum * 16;
|
||||||
|
for (j = 0; j < 16; ++j) { // Based on the precalculated coeff table calculate this portion of dotproduct
|
||||||
|
if (hashSignTable[signPtr + j] == '+')
|
||||||
|
doubleBuffer[j] += vec[i].getCoeff(); // Dot product with +1 // coeff
|
||||||
|
else
|
||||||
|
doubleBuffer[j] -= vec[i].getCoeff(); // Dot product with -1 // coeff
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else { // If we have many non-zero coefficients in -vec-
|
||||||
|
for (i = 0; i < vec.length; ++i) {
|
||||||
|
rowNum = vec[i].getHash() ^ hashcur; // Calculate the rest of the r_0 hashing function
|
||||||
|
rowNum = (rowNum * HASH_MULTIPLIER) + HASH_ADDEND;
|
||||||
|
rowNum = (rowNum >>> 24) & 0x1f;
|
||||||
|
if (rowNum < 0x10) // Set-up for the FFT
|
||||||
|
doubleBuffer[rowNum] += vec[i].getCoeff();
|
||||||
|
else
|
||||||
|
doubleBuffer[rowNum & 0xf] -= vec[i].getCoeff();
|
||||||
|
}
|
||||||
|
hashFft16(doubleBuffer); // Calculate the remaining dot-products be performing FFT
|
||||||
|
}
|
||||||
|
|
||||||
|
for (i = 0; i < 16; ++i) { // Convert the dot-product results to a bit-vector
|
||||||
|
bucket <<= 1;
|
||||||
|
if (doubleBuffer[i] > 0.0)
|
||||||
|
bucket |= 1;
|
||||||
|
}
|
||||||
|
return bucket;
|
||||||
|
}

	public void generateBinIds(HashEntry[] vec)
	{
		int bucket = 0;
		int bucketcnt = 0;
		int i, bitsleft;
		int curid;
		int mask, val;
		int hashbase = LSH_HASHBASE;

		for (i = 0; i < L; ++i) {
			curid = i; // Tack-on bits that indicate the particular table this bin id belongs to
			bitsleft = k;
			do {
				if (bucketcnt == 0) {
					hashbase = (hashbase * HASH_MULTIPLIER) + HASH_ADDEND;
					bucket = hash16DotProduct(bucket, vec, hashbase);
					bucketcnt += 16;
				}
				if (bucketcnt >= bitsleft) {
					curid <<= bitsleft;
					mask = 1;
					mask = (mask << bitsleft) - 1;
					val = bucket >>> (bucketcnt - bitsleft);
					curid |= (val & mask);
					bucketcnt -= bitsleft;
					bitsleft = 0;
				} else {
					curid <<= bucketcnt;
					mask = 1;
					mask = (mask << bucketcnt) - 1;
					curid |= (bucket & mask);
					bitsleft -= bucketcnt;
					bucketcnt = 0;
				}
			} while (bitsleft > 0);
			char[] token = tokenList[i].buffer;
			for (int j = 0; j < token.length; ++j) {
				token[j] = Base64Lite.encode[curid & 0x3f]; // encode 6 bits
				curid >>= 6; // move to next 6 bits
			}
		}
	}
}
|
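The binner code above turns the sign of each of 16 dot products into one bit of an accumulator and then cuts bin ids out of that bit stream a few bits at a time. The following is a minimal standalone sketch of those two steps, assuming the 16 dot products have already been computed. The class and method names (SignBitSketch, packSigns, takeBits) are hypothetical and are not part of the commit.

```java
final class SignBitSketch {

    // Shift 16 sign bits into the accumulator, first dot product first.
    // As in the final loop of hash16DotProduct, a strictly positive value
    // contributes a 1 bit and anything else (zero or negative) a 0 bit.
    static int packSigns(int bucket, double[] dots) {
        for (int i = 0; i < 16; ++i) {
            bucket <<= 1;
            if (dots[i] > 0.0) {
                bucket |= 1;
            }
        }
        return bucket;
    }

    // Return the top `bits` bits of the low `available` bits of bucket,
    // the same shift-and-mask used when a bin id needs fewer bits than
    // the bucket currently holds.
    static int takeBits(int bucket, int available, int bits) {
        int mask = (1 << bits) - 1;
        return (bucket >>> (available - bits)) & mask;
    }

    public static void main(String[] args) {
        double[] dots = new double[16]; // all zero, so all bits start as 0
        dots[0] = 2.5;   // becomes the most significant of the 16 bits
        dots[3] = -1.0;  // negative, stays 0
        dots[15] = 0.1;  // becomes the least significant bit
        int bucket = packSigns(0, dots);
        System.out.println(Integer.toHexString(bucket));
        System.out.println(takeBits(bucket, 16, 4));
    }
}
```

Because new bits are shifted in from the right, the oldest unused bits are always the high-order ones, which is why the code consumes them with a right shift by (bucketcnt - bitsleft).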
@ -0,0 +1,68 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.plugin.analysis.lsh;
|
||||||
|
|
||||||
|
import java.io.IOException;
|
||||||
|
|
||||||
|
import org.apache.lucene.analysis.Tokenizer;
|
||||||
|
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
|
||||||
|
import org.elasticsearch.plugin.analysis.lsh.LSHBinner.BytesRef;
|
||||||
|
|
||||||
|
import generic.lsh.vector.LSHVector;
|
||||||
|
import ghidra.features.bsim.query.elastic.Base64VectorFactory;
|
||||||
|
|
||||||
|
public class LSHTokenizer extends Tokenizer {
|
||||||
|
private final CharTermAttribute bytesAtt = addAttribute(CharTermAttribute.class);
|
||||||
|
private BytesRef[] tokens;
|
||||||
|
private int pos; // Number of terms/tokens returned so far
|
||||||
|
private Base64VectorFactory vectorFactory;
|
||||||
|
private LSHBinner binner;
|
||||||
|
private char[] vecBuffer;
|
||||||
|
|
||||||
|
public LSHTokenizer(int k,int L,Base64VectorFactory vFactory) {
|
||||||
|
super(DEFAULT_TOKEN_ATTRIBUTE_FACTORY);
|
||||||
|
vectorFactory = vFactory;
|
||||||
|
binner = new LSHBinner();
|
||||||
|
binner.setKandL(k, L);
|
||||||
|
pos = -1;
|
||||||
|
vecBuffer = Base64VectorFactory.allocateBuffer();
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public boolean incrementToken() throws IOException {
|
||||||
|
clearAttributes();
|
||||||
|
if (pos < 0) {
|
||||||
|
LSHVector vector = vectorFactory.restoreVectorFromBase64(input,vecBuffer);
|
||||||
|
// AnalysisLSHPlugin.settingString = AnalysisLSHPlugin.settingString + " : " + Long.toHexString(vector.calcUniqueHash());
|
||||||
|
binner.generateBinIds(vector.getEntries());
|
||||||
|
tokens = binner.getTokenList();
|
||||||
|
pos = 0;
|
||||||
|
}
|
||||||
|
if (pos < tokens.length) {
|
||||||
|
char[] buffer = tokens[pos].buffer;
|
||||||
|
bytesAtt.copyBuffer(buffer,0,buffer.length);
|
||||||
|
pos += 1;
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void reset() throws IOException {
|
||||||
|
super.reset();
|
||||||
|
pos = -1;
|
||||||
|
}
|
||||||
|
}
|
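The tokens returned by incrementToken above are the character buffers filled in by LSHBinner.generateBinIds, which writes each bin id six bits at a time, least significant group first. A small sketch of that encoding step follows. It assumes the conventional Base64 alphabet in place of Base64Lite.encode, whose actual table is not shown in this part of the diff, and the class name BinIdEncodeSketch and the token width are hypothetical.

```java
final class BinIdEncodeSketch {

    // Assumption: a standard Base64 alphabet stands in for Base64Lite.encode
    static final char[] ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();

    // Encode curid into `width` characters, lowest 6 bits first
    static char[] encode(int curid, int width) {
        char[] token = new char[width];
        for (int j = 0; j < width; ++j) {
            token[j] = ALPHABET[curid & 0x3f]; // encode 6 bits
            curid >>= 6;                       // move to next 6 bits
        }
        return token;
    }

    public static void main(String[] args) {
        System.out.println(new String(encode(64, 4)));
    }
}
```

Every token has the same fixed width, so two vectors land in the same bin of a given table exactly when their tokens for that table are equal as strings.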
@ -0,0 +1,44 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.plugin.analysis.lsh;
|
||||||
|
|
||||||
|
import org.apache.lucene.analysis.Tokenizer;
|
||||||
|
import org.elasticsearch.common.settings.Settings;
|
||||||
|
import org.elasticsearch.env.Environment;
|
||||||
|
import org.elasticsearch.index.IndexSettings;
|
||||||
|
import org.elasticsearch.index.analysis.AbstractTokenizerFactory;
|
||||||
|
|
||||||
|
import ghidra.features.bsim.query.elastic.Base64VectorFactory;
|
||||||
|
import ghidra.features.bsim.query.elastic.ElasticUtilities;
|
||||||
|
|
||||||
|
public class LSHTokenizerFactory extends AbstractTokenizerFactory {
|
||||||
|
|
||||||
|
private Base64VectorFactory vectorFactory;
|
||||||
|
private int k;
|
||||||
|
private int L;
|
||||||
|
|
||||||
|
public LSHTokenizerFactory(IndexSettings indexSettings, Environment environment, String name, Settings settings) {
|
||||||
|
super(indexSettings, settings, name);
|
||||||
|
k = settings.getAsInt(ElasticUtilities.K_SETTING, -1);
|
||||||
|
L = settings.getAsInt(ElasticUtilities.L_SETTING, -1);
|
||||||
|
vectorFactory = AnalysisLSHPlugin.getVectorFactory(name);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public Tokenizer create() {
|
||||||
|
return new LSHTokenizer(k,L,vectorFactory);
|
||||||
|
}
|
||||||
|
}
|
@ -0,0 +1,147 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.plugin.analysis.lsh;
|
||||||
|
|
||||||
|
import java.io.*;
|
||||||
|
import java.util.Map;
|
||||||
|
|
||||||
|
import org.apache.lucene.document.Document;
|
||||||
|
import org.apache.lucene.util.BytesRef;
|
||||||
|
import org.elasticsearch.script.*;
|
||||||
|
import org.elasticsearch.script.ScoreScript.LeafFactory;
|
||||||
|
import org.elasticsearch.search.lookup.SearchLookup;
|
||||||
|
|
||||||
|
import generic.lsh.vector.LSHVector;
|
||||||
|
import generic.lsh.vector.VectorCompare;
|
||||||
|
import ghidra.features.bsim.query.elastic.Base64VectorFactory;
|
||||||
|
|
||||||
|
public class VectorCompareScriptFactory implements ScoreScript.Factory {
|
||||||
|
|
||||||
|
public final static String SCRIPT_NAME = "lsh_compare";
|
||||||
|
public final static String FEATURES_NAME = "{\"features\":\"";
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public boolean isResultDeterministic() {
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public LeafFactory newFactory(Map<String, Object> params, SearchLookup lookup) {
|
||||||
|
return new VectorCompareLeafFactory(params, lookup);
|
||||||
|
}
|
||||||
|
|
||||||
|
private static class VectorCompareLeafFactory implements LeafFactory {
|
||||||
|
|
||||||
|
private final Map<String, Object> params;
|
||||||
|
private final SearchLookup lookup;
|
||||||
|
private LSHVector baseVector; // Vector being compared to everything
|
||||||
|
private final double simthresh; // Similarity threshold
|
||||||
|
private final double sigthresh; // Significance threshold
|
||||||
|
private final Base64VectorFactory vectorFactory; // Factory used for this particular query
|
||||||
|
|
||||||
|
private VectorCompareLeafFactory(Map<String, Object> params, SearchLookup lookup) {
|
||||||
|
this.params = params;
|
||||||
|
this.lookup = lookup;
|
||||||
|
vectorFactory = AnalysisLSHPlugin.getVectorFactory((String) params.get("indexname"));
|
||||||
|
simthresh = (Double) params.get("simthresh");
|
||||||
|
sigthresh = (Double) params.get("sigthresh");
|
||||||
|
StringReader reader = new StringReader((String) params.get("vector"));
|
||||||
|
try {
|
||||||
|
baseVector = vectorFactory.restoreVectorFromBase64(reader,
|
||||||
|
Base64VectorFactory.allocateBuffer());
|
||||||
|
}
|
||||||
|
catch (IOException e) {
|
||||||
|
baseVector = null;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public boolean needs_score() {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
private static int scanForFeatures(byte[] buffer, int offset) throws IOException {
|
||||||
|
int i = 0;
|
||||||
|
while (i < FEATURES_NAME.length()) {
|
||||||
|
char curChar = FEATURES_NAME.charAt(i);
|
||||||
|
int val = buffer[offset];
|
||||||
|
if (val == curChar) {
|
||||||
|
i += 1;
|
||||||
|
offset += 1;
|
||||||
|
}
|
||||||
|
else if (val == ' ' || val == '\t') {
|
||||||
|
offset += 1;
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
throw new IOException("Document is missing \"features\"");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return offset;
|
||||||
|
}
|
||||||
|
|
||||||
|
		private static int scanForLength(BytesRef byteRef, int startOffset) throws IOException {
			int finalLength = 0;
			// Number of bytes remaining in the reference, starting at startOffset
			int maxLength = byteRef.length - (startOffset - byteRef.offset);
			while (finalLength < maxLength) {
				if (byteRef.bytes[finalLength + startOffset] == '\"') {
					break;
				}
				finalLength += 1;
			}
			if (finalLength == maxLength) { // Ran out of bytes before finding the closing quote
				throw new IOException("Document does not contain complete \"features\"");
			}
			return finalLength;
		}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public ScoreScript newInstance(DocReader docReader) throws IOException {
|
||||||
|
return new ScoreScript(params, lookup, docReader) {
|
||||||
|
@Override
|
||||||
|
public double execute(ExplanationHolder explanation) {
|
||||||
|
try {
|
||||||
|
DocValuesDocReader dvReader = (DocValuesDocReader) docReader;
|
||||||
|
Document document =
|
||||||
|
dvReader.getLeafReaderContext().reader().document(_getDocId());
|
||||||
|
BytesRef byteRef = document.getField("_source").binaryValue();
|
||||||
|
int valOffset = scanForFeatures(byteRef.bytes, byteRef.offset);
|
||||||
|
int finalLength = scanForLength(byteRef, valOffset);
|
||||||
|
InputStream inputStream =
|
||||||
|
new ByteArrayInputStream(byteRef.bytes, valOffset, finalLength);
|
||||||
|
Reader reader = new InputStreamReader(inputStream);
|
||||||
|
// Should be sharing the VectorCompare between different calls
|
||||||
|
// but apparently this routine needs to be thread safe, so we allocate it per call
|
||||||
|
VectorCompare vectorCompare = new VectorCompare();
|
||||||
|
LSHVector curVec = vectorFactory.restoreVectorFromBase64(reader,
|
||||||
|
Base64VectorFactory.allocateBuffer());
|
||||||
|
double sim = baseVector.compare(curVec, vectorCompare);
|
||||||
|
if (sim <= simthresh) {
|
||||||
|
return 0.0;
|
||||||
|
}
|
||||||
|
double sig = vectorFactory.calculateSignificance(vectorCompare);
|
||||||
|
if (sig <= sigthresh) {
|
||||||
|
return 0.0;
|
||||||
|
}
|
||||||
|
return sim;
|
||||||
|
}
|
||||||
|
catch (IOException e) {
|
||||||
|
return 0.0;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
};
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
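scanForFeatures and scanForLength above locate the Base64 vector by scanning the raw _source bytes directly instead of parsing the JSON. The sketch below shows the same idea as a standalone program with explicit bounds checks. It assumes the document starts with the {"features":" prefix, throws an unchecked IllegalArgumentException where the plugin code throws IOException, and uses hypothetical names (FeatureScanSketch, scanForPrefix, extract) that do not appear in the commit.

```java
import java.nio.charset.StandardCharsets;

final class FeatureScanSketch {

    static final String PREFIX = "{\"features\":\"";

    // Return the offset just past the prefix, skipping spaces and tabs
    // that appear between the prefix characters.
    static int scanForPrefix(byte[] buf, int offset, int end) {
        int i = 0;
        while (i < PREFIX.length()) {
            if (offset >= end) {
                throw new IllegalArgumentException("Document is missing \"features\"");
            }
            int val = buf[offset];
            if (val == PREFIX.charAt(i)) {
                i += 1;
                offset += 1;
            }
            else if (val == ' ' || val == '\t') {
                offset += 1;
            }
            else {
                throw new IllegalArgumentException("Document is missing \"features\"");
            }
        }
        return offset;
    }

    // Return the number of bytes from start up to (not including) the closing quote
    static int scanForLength(byte[] buf, int start, int end) {
        int len = 0;
        while (start + len < end) {
            if (buf[start + len] == '"') {
                return len;
            }
            len += 1;
        }
        throw new IllegalArgumentException("Document does not contain complete \"features\"");
    }

    // Extract the quoted value from the region [offset, offset + length)
    static String extract(byte[] buf, int offset, int length) {
        int end = offset + length;
        int start = scanForPrefix(buf, offset, end);
        int len = scanForLength(buf, start, end);
        return new String(buf, start, len, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] doc = "{\"features\":\"QUJD\",\"x\":1}".getBytes(StandardCharsets.US_ASCII);
        System.out.println(extract(doc, 0, doc.length));
    }
}
```

Passing the end of the valid region explicitly keeps both scans inside the slice described by the offset and length, which matters when the backing array is larger than the document it holds.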
@ -0,0 +1,29 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for lucene class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.apache.lucene.analysis;
|
||||||
|
|
||||||
|
import java.io.Closeable;
|
||||||
|
import java.io.IOException;
|
||||||
|
|
||||||
|
import org.apache.lucene.util.AttributeFactory;
|
||||||
|
import org.apache.lucene.util.AttributeSource;
|
||||||
|
|
||||||
|
public abstract class TokenStream extends AttributeSource implements Closeable {
|
||||||
|
public static final AttributeFactory DEFAULT_TOKEN_ATTRIBUTE_FACTORY = null;
|
||||||
|
|
||||||
|
public abstract boolean incrementToken() throws IOException;
|
||||||
|
}
|
@ -0,0 +1,38 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for lucene class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.apache.lucene.analysis;
|
||||||
|
|
||||||
|
import java.io.IOException;
|
||||||
|
import java.io.Reader;
|
||||||
|
|
||||||
|
import org.apache.lucene.util.AttributeFactory;
|
||||||
|
|
||||||
|
public abstract class Tokenizer extends TokenStream {
|
||||||
|
protected Reader input;
|
||||||
|
|
||||||
|
protected Tokenizer(AttributeFactory factory) {
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void close() throws IOException {
|
||||||
|
}
|
||||||
|
|
||||||
|
public void reset() throws IOException {
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,25 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for lucene interface
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.apache.lucene.analysis.tokenattributes;
|
||||||
|
|
||||||
|
import org.apache.lucene.util.Attribute;
|
||||||
|
|
||||||
|
public interface CharTermAttribute extends Attribute, CharSequence, Appendable {
|
||||||
|
|
||||||
|
public void copyBuffer(char[] buffer, int offset, int length);
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,26 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for lucene class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.apache.lucene.document;
|
||||||
|
|
||||||
|
import org.apache.lucene.index.IndexableField;
|
||||||
|
|
||||||
|
public class Document {
|
||||||
|
public final IndexableField getField(String name) {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,27 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.apache.lucene.index;
|
||||||
|
|
||||||
|
import java.io.Closeable;
|
||||||
|
import java.io.IOException;
|
||||||
|
|
||||||
|
import org.apache.lucene.document.Document;
|
||||||
|
|
||||||
|
public abstract class IndexReader implements Closeable {
|
||||||
|
public final Document document(int docID) throws IOException {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
}
|
@ -0,0 +1,21 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.apache.lucene.index;
|
||||||
|
|
||||||
|
public abstract class IndexReaderContext {
|
||||||
|
public abstract IndexReader reader();
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,23 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for lucene interface
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.apache.lucene.index;
|
||||||
|
|
||||||
|
import org.apache.lucene.util.BytesRef;
|
||||||
|
|
||||||
|
public interface IndexableField {
|
||||||
|
public BytesRef binaryValue();
|
||||||
|
}
|
@ -0,0 +1,21 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for lucene class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.apache.lucene.index;
|
||||||
|
|
||||||
|
public abstract class LeafReader extends IndexReader {
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,24 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for lucene class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.apache.lucene.index;
|
||||||
|
|
||||||
|
public final class LeafReaderContext extends IndexReaderContext {
|
||||||
|
@Override
|
||||||
|
public LeafReader reader() {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
}
|
@ -0,0 +1,21 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for lucene interface
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.apache.lucene.util;
|
||||||
|
|
||||||
|
public interface Attribute {
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,20 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.apache.lucene.util;
|
||||||
|
|
||||||
|
public abstract class AttributeFactory {
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,27 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for lucene class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.apache.lucene.util;
|
||||||
|
|
||||||
|
public class AttributeSource {
|
||||||
|
public final <T extends Attribute> T addAttribute(Class<T> attClass) {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
public final void clearAttributes() {
|
||||||
|
|
||||||
|
}
|
||||||
|
}
|
@ -0,0 +1,23 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for lucene class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.apache.lucene.util;
|
||||||
|
|
||||||
|
public class BytesRef {
|
||||||
|
public byte[] bytes;
|
||||||
|
public int length;
|
||||||
|
public int offset;
|
||||||
|
}
|
@ -0,0 +1,34 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.common.settings;
|
||||||
|
|
||||||
|
import java.util.Set;
|
||||||
|
|
||||||
|
public class Settings {
|
||||||
|
|
||||||
|
public Integer getAsInt(String setting, Integer defaultValue) {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
public String get(String setting) {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
public Set<String> keySet() {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
}
|
21
Ghidra/Extensions/BSimElasticPlugin/srcdummy/org/elasticsearch/env/Environment.java
vendored
Normal file
@ -0,0 +1,21 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.env;
|
||||||
|
|
||||||
|
public class Environment {
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,26 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.index;
|
||||||
|
|
||||||
|
import org.elasticsearch.common.settings.Settings;
|
||||||
|
|
||||||
|
public class IndexModule {
|
||||||
|
|
||||||
|
public Settings getSettings() {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
}
|
@ -0,0 +1,21 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.index;
|
||||||
|
|
||||||
|
public final class IndexSettings {
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,27 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.index.analysis;
|
||||||
|
|
||||||
|
import org.elasticsearch.common.settings.Settings;
|
||||||
|
import org.elasticsearch.index.IndexSettings;
|
||||||
|
|
||||||
|
public abstract class AbstractTokenizerFactory implements TokenizerFactory {
|
||||||
|
|
||||||
|
public AbstractTokenizerFactory(IndexSettings indexSettings, Settings settings, String name) {
|
||||||
|
|
||||||
|
}
|
||||||
|
}
|
@ -0,0 +1,24 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch interface
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.index.analysis;
|
||||||
|
|
||||||
|
import org.apache.lucene.analysis.Tokenizer;
|
||||||
|
|
||||||
|
public interface TokenizerFactory {
|
||||||
|
Tokenizer create();
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,31 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.indices.analysis;
|
||||||
|
|
||||||
|
import java.io.IOException;
|
||||||
|
|
||||||
|
import org.elasticsearch.common.settings.Settings;
|
||||||
|
import org.elasticsearch.env.Environment;
|
||||||
|
import org.elasticsearch.index.IndexSettings;
|
||||||
|
|
||||||
|
public class AnalysisModule {
|
||||||
|
|
||||||
|
public interface AnalysisProvider<T> {
|
||||||
|
T get(IndexSettings indexSettings, Environment environment, String name, Settings settings)
|
||||||
|
throws IOException;
|
||||||
|
}
|
||||||
|
}
|
@ -0,0 +1,27 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch interface
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.plugins;
|
||||||
|
|
||||||
|
import java.util.Map;
|
||||||
|
|
||||||
|
import org.elasticsearch.index.analysis.TokenizerFactory;
|
||||||
|
import org.elasticsearch.indices.analysis.AnalysisModule.AnalysisProvider;
|
||||||
|
|
||||||
|
public interface AnalysisPlugin {
|
||||||
|
Map<String, AnalysisProvider<TokenizerFactory>> getTokenizers();
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,32 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.plugins;
|
||||||
|
|
||||||
|
import java.io.Closeable;
|
||||||
|
import java.io.IOException;
|
||||||
|
|
||||||
|
import org.elasticsearch.index.IndexModule;
|
||||||
|
|
||||||
|
public abstract class Plugin implements Closeable {
|
||||||
|
public void onIndexModule(IndexModule indexModule) {
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void close() throws IOException {
|
||||||
|
|
||||||
|
}
|
||||||
|
}
|
@ -0,0 +1,28 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch interface
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.plugins;
|
||||||
|
|
||||||
|
import java.util.Collection;
|
||||||
|
|
||||||
|
import org.elasticsearch.common.settings.Settings;
|
||||||
|
import org.elasticsearch.script.ScriptContext;
|
||||||
|
import org.elasticsearch.script.ScriptEngine;
|
||||||
|
|
||||||
|
public interface ScriptPlugin {
|
||||||
|
ScriptEngine getScriptEngine(Settings settings, Collection<ScriptContext<?>> contexts);
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,21 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch interface
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.script;
|
||||||
|
|
||||||
|
public interface DocReader {
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,28 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.script;
|
||||||
|
|
||||||
|
import org.apache.lucene.index.LeafReaderContext;
|
||||||
|
|
||||||
|
public class DocValuesDocReader implements DocReader, LeafReaderContextSupplier {
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public LeafReaderContext getLeafReaderContext() {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,23 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch interface
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.script;
|
||||||
|
|
||||||
|
import org.apache.lucene.index.LeafReaderContext;
|
||||||
|
|
||||||
|
public interface LeafReaderContextSupplier {
|
||||||
|
LeafReaderContext getLeafReaderContext();
|
||||||
|
}
|
@ -0,0 +1,50 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.script;
|
||||||
|
|
||||||
|
import java.io.IOException;
|
||||||
|
import java.util.Map;
|
||||||
|
|
||||||
|
import org.elasticsearch.search.lookup.SearchLookup;
|
||||||
|
|
||||||
|
public abstract class ScoreScript {
|
||||||
|
public ScoreScript(Map<String, Object> params, SearchLookup searchLookup, DocReader docReader) {
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
public static class ExplanationHolder {
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
public static final ScriptContext<ScoreScript.Factory> CONTEXT = null;
|
||||||
|
|
||||||
|
public interface Factory extends ScriptFactory {
|
||||||
|
LeafFactory newFactory(Map<String, Object> params, SearchLookup lookup);
|
||||||
|
}
|
||||||
|
|
||||||
|
public interface LeafFactory {
|
||||||
|
boolean needs_score();
|
||||||
|
|
||||||
|
ScoreScript newInstance(DocReader reader) throws IOException;
|
||||||
|
}
|
||||||
|
|
||||||
|
public int _getDocId() {
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
public abstract double execute(ExplanationHolder explanation);
|
||||||
|
}
|
@ -0,0 +1,22 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.script;
|
||||||
|
|
||||||
|
public final class ScriptContext<T> {
|
||||||
|
public final String name = null;
|
||||||
|
public final Class<T> factoryClazz = null;
|
||||||
|
}
|
@ -0,0 +1,30 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch interface
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.script;
|
||||||
|
|
||||||
|
import java.io.Closeable;
|
||||||
|
import java.util.Map;
|
||||||
|
import java.util.Set;
|
||||||
|
|
||||||
|
public interface ScriptEngine extends Closeable {
|
||||||
|
String getType();
|
||||||
|
|
||||||
|
<FactoryType> FactoryType compile(String name, String code, ScriptContext<FactoryType> context,
|
||||||
|
Map<String, String> params);
|
||||||
|
|
||||||
|
Set<ScriptContext<?>> getSupportedContexts();
|
||||||
|
}
|
@ -0,0 +1,22 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.script;
|
||||||
|
|
||||||
|
public interface ScriptFactory {
|
||||||
|
boolean isResultDeterministic();
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,21 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
* NOTE: Dummy placeholder for elasticsearch class
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
package org.elasticsearch.search.lookup;
|
||||||
|
|
||||||
|
public class SearchLookup {
|
||||||
|
|
||||||
|
}
|
9
Ghidra/Features/BSim/Module.manifest
Executable file
@ -0,0 +1,9 @@
|
|||||||
|
##MODULE IP: Oxygen Icons - LGPL 3.0
|
||||||
|
MODULE FILE LICENSE: postgresql-15.3.tar.gz Postgresql License
|
||||||
|
MODULE FILE LICENSE: lib/postgresql-42.6.0.jar PostgresqlJDBC License
|
||||||
|
MODULE FILE LICENSE: lib/json-simple-1.1.1.jar Apache License 2.0
|
||||||
|
MODULE FILE LICENSE: lib/commons-dbcp2-2.9.0.jar Apache License 2.0
|
||||||
|
MODULE FILE LICENSE: lib/commons-pool2-2.11.1.jar Apache License 2.0
|
||||||
|
MODULE FILE LICENSE: lib/commons-logging-1.2.jar Apache License 2.0
|
||||||
|
MODULE FILE LICENSE: lib/log4j-jcl-2.16.0.jar Apache License 2.0
|
||||||
|
MODULE FILE LICENSE: lib/h2-2.2.220.jar H2 Mozilla License 2.0
|
197
Ghidra/Features/BSim/build.gradle
Executable file
@ -0,0 +1,197 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
apply from: "$rootProject.projectDir/gradle/distributableGhidraModule.gradle"
|
||||||
|
apply from: "$rootProject.projectDir/gradle/javaProject.gradle"
|
||||||
|
apply from: "$rootProject.projectDir/gradle/javaTestProject.gradle"
|
||||||
|
apply from: "$rootProject.projectDir/gradle/nativeProject.gradle"
|
||||||
|
apply from: "$rootProject.projectDir/gradle/helpProject.gradle"
|
||||||
|
|
||||||
|
apply plugin: 'eclipse'
|
||||||
|
eclipse.project.name = 'Features BSim'
|
||||||
|
|
||||||
|
import java.nio.file.Files
|
||||||
|
import org.gradle.util.GUtil
|
||||||
|
|
||||||
|
// NOTE: fetchDependencies.gradle must be updated if postgresql version changes
|
||||||
|
def postgresql_distro = "postgresql-15.3.tar.gz"
|
||||||
|
|
||||||
|
dependencies {
|
||||||
|
api project(":Decompiler")
|
||||||
|
api project(":CodeCompare")
|
||||||
|
|
||||||
|
api "org.postgresql:postgresql:42.6.0"
|
||||||
|
api "org.json.simple:json-simple:1.1.1"
|
||||||
|
api "org.apache.commons:commons-dbcp2:2.9.0"
|
||||||
|
api "org.apache.commons:commons-pool2:2.11.1"
|
||||||
|
api "org.apache.commons:commons-logging:1.2"
|
||||||
|
api "org.apache.logging.log4j:log4j-jcl:2.16.0"
|
||||||
|
api "com.h2database:h2:2.2.220"
|
||||||
|
}
|
||||||
|
|
||||||
|
// Copy postgresql source distro, lshvector plugin source, and make-postgres.sh
|
||||||
|
// into common zip to allow for a rebuild of the postgres server if needed
|
||||||
|
|
||||||
|
rootProject.assembleDistribution {
|
||||||
|
|
||||||
|
String postgresqlDepsFile = "${DEPS_DIR}/BSim/${postgresql_distro}"
|
||||||
|
String postgresqlBinRepoFile = "${BIN_REPO}/Ghidra/Features/BSim/${postgresql_distro}"
|
||||||
|
|
||||||
|
def postgresqlFile = file(postgresqlDepsFile).exists() ? postgresqlDepsFile : postgresqlBinRepoFile
|
||||||
|
|
||||||
|
into (getZipPath(this.project)) {
|
||||||
|
from file("make-postgres.sh")
|
||||||
|
}
|
||||||
|
into (getZipPath(this.project)) {
|
||||||
|
from file(postgresqlFile)
|
||||||
|
}
|
||||||
|
into (getZipPath(this.project) + "/src/lshvector") {
|
||||||
|
from files("src/lshvector")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Relative to the 'workingDir' Exec task property.
|
||||||
|
def installPoint = "../help/help"
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Build the pdf docs for BSim and place into the '$installPoint' directory.
|
||||||
|
* A build (ex: 'gradle buildLocalTSSI_Release') will place the pdf in the distribution.
|
||||||
|
* There is an associated, auto-generated clean task.
|
||||||
|
**/
|
||||||
|
task buildBSimHelpPdf(type: Exec) {
|
||||||
|
|
||||||
|
workingDir 'src/main/doc'
|
||||||
|
|
||||||
|
def buildDir = "../../../build/BSimDocumentationPdf"
|
||||||
|
|
||||||
|
// Gradle will provide a cleanBuildBSimHelpPdf task that will remove these
|
||||||
|
// declared outputs.
|
||||||
|
outputs.dir "$workingDir/$buildDir"
|
||||||
|
outputs.file "$workingDir/$buildDir/bsim.pdf"
|
||||||
|
|
||||||
|
// 'which' returns the number of failed arguments
|
||||||
|
// Using the 'which' command first will allow the task to fail if the required
|
||||||
|
// executables are not installed.
|
||||||
|
//
|
||||||
|
// The bash commands end with "2>&1" to redirect stderr to stdout and have all
|
||||||
|
// messages print in sequence
|
||||||
|
//
|
||||||
|
// 'commandLine' takes one command, so wrap multiple commands in bash.
|
||||||
|
commandLine 'bash', '-e', '-c', """
|
||||||
|
echo '** Checking if required executables are installed. **'
|
||||||
|
which xsltproc
|
||||||
|
which fop
|
||||||
|
|
||||||
|
echo '** Preparing for xsltproc **'
|
||||||
|
mkdir -p $buildDir/images
|
||||||
|
|
||||||
|
cp $installPoint/topics/BSimDatabasePlugin/images/*.png $buildDir/images
|
||||||
|
|
||||||
|
echo '** Building bsim.fo **'
|
||||||
|
xsltproc --output $buildDir/bsim_withscaling.xml --stringparam profile.condition "withscaling" commonprofile.xsl bsim.xml 2>&1
|
||||||
|
xsltproc --output $buildDir/bsim.fo focustom.xsl $buildDir/bsim_withscaling.xml 2>&1
|
||||||
|
|
||||||
|
echo '** Building bsim.pdf **'
|
||||||
|
fop $buildDir/bsim.fo $buildDir/bsim.pdf 2>&1
|
||||||
|
|
||||||
|
echo '** Done. **'
|
||||||
|
"""
|
||||||
|
|
||||||
|
// Allows doLast block regardless of exit value.
|
||||||
|
ignoreExitValue true
|
||||||
|
|
||||||
|
// Store the output instead of printing to the console.
|
||||||
|
standardOutput = new ByteArrayOutputStream()
|
||||||
|
ext.output = { standardOutput.toString() }
|
||||||
|
ext.errorOutput = { standardOutput.toString() }
|
||||||
|
|
||||||
|
// Check the OS before executing command.
|
||||||
|
doFirst {
|
||||||
|
if (!getCurrentPlatformName().startsWith("linux")) {
|
||||||
|
throw new TaskExecutionException( it, new Exception("The '$it.name' task only works on Linux."))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Print the output of the commands and check the return value.
|
||||||
|
doLast {
|
||||||
|
println output()
|
||||||
|
if (execResult.exitValue) {
|
||||||
|
logger.error("$it.name: An error occurred. Here is the output:\n" + output())
|
||||||
|
throw new TaskExecutionException( it, new Exception("'$it.name': The command: '${commandLine.join(' ')}'" +
|
||||||
|
" task \nfailed with exit code $execResult.exitValue; see task output for details."))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Build the html docs for BSim and place into the '$installPoint' directory.
|
||||||
|
* A build (ex: 'gradle buildLocalTSSI_Release') will place the html files in the distribution.
|
||||||
|
**/
|
||||||
|
task buildBSimHelpHtml(type: Exec) {
|
||||||
|
|
||||||
|
workingDir 'src/main/doc'
|
||||||
|
|
||||||
|
def buildDir = "../../../build/html"
|
||||||
|
|
||||||
|
// 'which' returns the number of failed arguments
|
||||||
|
// Using the 'which' command first will allow the task to fail if the required
|
||||||
|
// executables are not installed.
|
||||||
|
//
|
||||||
|
// The bash commands end with "2>&1" to redirect stderr to stdout and have all
|
||||||
|
// messages print in sequence
|
||||||
|
//
|
||||||
|
// 'commandLine' takes one command, so wrap multiple commands in bash.
|
||||||
|
commandLine 'bash', '-e', '-c', """
|
||||||
|
echo '** Checking if required executables are installed. **'
|
||||||
|
which xsltproc
|
||||||
|
which sed
|
||||||
|
|
||||||
|
echo '** Removing older html files installed under '$installPoint' **'
|
||||||
|
rm -f $installPoint/topics/BSimDatabasePlugin/*.html
|
||||||
|
|
||||||
|
echo '** Building html files **'
|
||||||
|
xsltproc --output $buildDir/bsim_noscaling.xml --stringparam profile.condition "noscaling" commonprofile.xsl bsim.xml 2>&1
|
||||||
|
xsltproc --stringparam base.dir ${installPoint}/topics/BSimDatabasePlugin/ htmlcustom.xsl $buildDir/bsim_noscaling.xml 2>&1
|
||||||
|
sed -i -e '/DefaultStyle.css/ { p; sQhref=".*"Qhref="../../shared/languages.css"Q; }' ${installPoint}/topics/BSimDatabasePlugin/*.html
|
||||||
|
rm $installPoint/topics/BSimDatabasePlugin/index.html
|
||||||
|
|
||||||
|
echo '** Done. **'
|
||||||
|
"""
|
||||||
|
|
||||||
|
// Allows doLast block regardless of exit value.
|
||||||
|
ignoreExitValue true
|
||||||
|
|
||||||
|
// Store the output instead of printing to the console.
|
||||||
|
standardOutput = new ByteArrayOutputStream()
|
||||||
|
ext.output = { standardOutput.toString() }
|
||||||
|
ext.errorOutput = { standardOutput.toString() }
|
||||||
|
|
||||||
|
// Check the OS before executing command.
|
||||||
|
doFirst {
|
||||||
|
if (!getCurrentPlatformName().startsWith("linux")) {
|
||||||
|
throw new TaskExecutionException( it, new Exception("The '$it.name' task only works on Linux."))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Print the output of the commands and check the return value.
|
||||||
|
doLast {
|
||||||
|
println output()
|
||||||
|
if (execResult.exitValue) {
|
||||||
|
logger.error("$it.name: An error occurred. Here is the output:\n" + output())
|
||||||
|
throw new TaskExecutionException( it, new Exception("'$it.name': The command: '${commandLine.join(' ')}'" +
|
||||||
|
" task \nfailed with exit code $execResult.exitValue; see task output for details."))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
51
Ghidra/Features/BSim/certification.manifest
Executable file
@ -0,0 +1,51 @@
|
|||||||
|
##VERSION: 2.0
|
||||||
|
##MODULE IP: Apache License 2.0
|
||||||
|
##MODULE IP: Creative Commons Attribution 2.5
|
||||||
|
##MODULE IP: Crystal Clear Icons - LGPL 2.1
|
||||||
|
##MODULE IP: FAMFAMFAM Icons - CC 2.5
|
||||||
|
##MODULE IP: H2 Mozilla License 2.0
|
||||||
|
##MODULE IP: LGPL 2.1
|
||||||
|
##MODULE IP: LGPL 3.0
|
||||||
|
##MODULE IP: Oxygen Icons - LGPL 3.0
|
||||||
|
##MODULE IP: Postgresql License
|
||||||
|
##MODULE IP: PostgresqlJDBC License
|
||||||
|
##MODULE IP: Public Domain
|
||||||
|
Module.manifest||GHIDRA||||END|
|
||||||
|
data/bsim.theme.properties||GHIDRA||||END|
|
||||||
|
data/large_32.xml||GHIDRA||||END|
|
||||||
|
data/lshweights_32.xml||GHIDRA|||Signature data|END|
|
||||||
|
data/lshweights_64.xml||GHIDRA|||Signature data|END|
|
||||||
|
data/lshweights_64_32.xml||GHIDRA|||Signature data|END|
|
||||||
|
data/lshweights_cpool.xml||GHIDRA||||END|
|
||||||
|
data/lshweights_nosize.xml||GHIDRA||||END|
|
||||||
|
data/medium_32.xml||GHIDRA||||END|
|
||||||
|
data/medium_64.xml||GHIDRA||||END|
|
||||||
|
data/medium_cpool.xml||GHIDRA||||END|
|
||||||
|
data/medium_nosize.xml||GHIDRA||||END|
|
||||||
|
data/serverconfig.xml||GHIDRA||||END|
|
||||||
|
src/lshvector/Makefile.lshvector||GHIDRA||||END|
|
||||||
|
src/lshvector/lshvector--1.0.sql||GHIDRA||||END|
|
||||||
|
src/lshvector/lshvector.control||GHIDRA||||END|
|
||||||
|
src/main/help/help/TOC_Source.xml||GHIDRA||||END|
|
||||||
|
src/main/help/help/topics/BSim/BSimOverview.html||GHIDRA||||END|
|
||||||
|
src/main/help/help/topics/BSim/CommandLineReference.html||GHIDRA||||END|
|
||||||
|
src/main/help/help/topics/BSim/DatabaseConfiguration.html||GHIDRA||||END|
|
||||||
|
src/main/help/help/topics/BSim/FeatureWeight.html||GHIDRA||||END|
|
||||||
|
src/main/help/help/topics/BSim/IngestProcess.html||GHIDRA||||END|
|
||||||
|
src/main/help/help/topics/BSimSearchPlugin/BSimSearch.html||GHIDRA||||END|
|
||||||
|
src/main/help/help/topics/BSimSearchPlugin/images/AddServerDialog.png||GHIDRA||||END|
|
||||||
|
src/main/help/help/topics/BSimSearchPlugin/images/ApplyResultsPanel.png||GHIDRA||||END|
|
||||||
|
src/main/help/help/topics/BSimSearchPlugin/images/BSimOverviewDialog.png||GHIDRA||||END|
|
||||||
|
src/main/help/help/topics/BSimSearchPlugin/images/BSimOverviewResults.png||GHIDRA||||END|
|
||||||
|
src/main/help/help/topics/BSimSearchPlugin/images/BSimResultsProvider.png||GHIDRA||||END|
|
||||||
|
src/main/help/help/topics/BSimSearchPlugin/images/BSimSearchDialog.png||GHIDRA||||END|
|
||||||
|
src/main/help/help/topics/BSimSearchPlugin/images/ManageServersDialog.png||GHIDRA||||END|
|
||||||
|
src/main/resources/bsim.log4j.xml||GHIDRA||||END|
|
||||||
|
src/main/resources/images/checkmark_yellow.gif||GHIDRA||||END|
|
||||||
|
src/main/resources/images/flag_green.png||FAMFAMFAM Icons - CC 2.5|||famfamfam silk icon set|END|
|
||||||
|
src/main/resources/images/preferences-desktop-user-password.png||Oxygen Icons - LGPL 3.0|||Oxygen icon theme (dual license; LGPL or CC-SA-3.0)|END|
|
||||||
|
src/main/resources/images/preferences-web-browser-shortcuts-32.png||Oxygen Icons - LGPL 3.0|||Oxygen icon theme (dual license; LGPL or CC-SA-3.0)|END|
|
||||||
|
src/main/resources/images/preferences-web-browser-shortcuts.png||LGPL 3.0|||oxygen|END|
|
||||||
|
src/main/resources/images/view_top_bottom.png||Crystal Clear Icons - LGPL 2.1||||END|
|
||||||
|
src/main/resources/log4j-appender-console.xml||GHIDRA||||END|
|
||||||
|
src/main/resources/log4j-appender-rolling-file.xml||GHIDRA||||END|
|
17
Ghidra/Features/BSim/data/bsim.theme.properties
Normal file
@ -0,0 +1,17 @@
|
|||||||
|
|
||||||
|
[Defaults]
|
||||||
|
|
||||||
|
icon.bsim.query.dialog.provider = preferences-web-browser-shortcuts.png
|
||||||
|
|
||||||
|
icon.bsim.change.password = preferences-desktop-user-password.png
|
||||||
|
|
||||||
|
icon.bsim.table.split = view_top_bottom.png
|
||||||
|
|
||||||
|
icon.bsim.results.status.name.applied = checkmark_green.gif
|
||||||
|
icon.bsim.results.status.signature.applied = EMPTY_ICON {checkmark_green.gif[move(-2,-1)]} {checkmark_green.gif [move(4,0)]}
|
||||||
|
icon.bsim.results.status.matches = flag_green.png
|
||||||
|
icon.bsim.results.status.ignored = checkmark_yellow.gif
|
||||||
|
|
||||||
|
icon.bsim.functions.table = FunctionScope.gif
|
||||||
|
|
||||||
|
[Dark Defaults]
|
13
Ghidra/Features/BSim/data/large_32.xml
Executable file
@ -0,0 +1,13 @@
|
|||||||
|
<dbconfig>
|
||||||
|
<info>
|
||||||
|
<name>Large 32-bit</name>
|
||||||
|
<owner>Example Owner</owner>
|
||||||
|
<description>A large (~100 million functions) database tuned for 32-bit executables</description>
|
||||||
|
<major>0</major>
|
||||||
|
<minor>0</minor>
|
||||||
|
<settings>0x49</settings>
|
||||||
|
</info>
|
||||||
|
<k>19</k>
|
||||||
|
<L>232</L>
|
||||||
|
<weightsfile>lshweights_32.xml</weightsfile>
|
||||||
|
</dbconfig>
|
1587
Ghidra/Features/BSim/data/lshweights_32.xml
Executable file
File diff suppressed because it is too large
1587
Ghidra/Features/BSim/data/lshweights_64.xml
Executable file
File diff suppressed because it is too large
1587
Ghidra/Features/BSim/data/lshweights_64_32.xml
Executable file
1587
Ghidra/Features/BSim/data/lshweights_64_32.xml
Executable file
File diff suppressed because it is too large
Load Diff
1587
Ghidra/Features/BSim/data/lshweights_cpool.xml
Executable file
1587
Ghidra/Features/BSim/data/lshweights_cpool.xml
Executable file
File diff suppressed because it is too large
Load Diff
1587
Ghidra/Features/BSim/data/lshweights_nosize.xml
Executable file
1587
Ghidra/Features/BSim/data/lshweights_nosize.xml
Executable file
File diff suppressed because it is too large
Load Diff
13
Ghidra/Features/BSim/data/medium_32.xml
Executable file
@ -0,0 +1,13 @@
<dbconfig>
  <info>
    <name>Medium 32-bit</name>
    <owner>Example Owner</owner>
    <description>A medium sized (~10 million functions) database tuned for 32-bit executables</description>
    <major>0</major>
    <minor>0</minor>
    <settings>0x49</settings>
  </info>
  <k>17</k>
  <L>146</L>
  <weightsfile>lshweights_32.xml</weightsfile>
</dbconfig>
13
Ghidra/Features/BSim/data/medium_64.xml
Executable file
@ -0,0 +1,13 @@
<dbconfig>
  <info>
    <name>Medium 64-bit</name>
    <owner>Example Owner</owner>
    <description>A medium sized (~10 million functions) database tuned for 64-bit executables</description>
    <major>0</major>
    <minor>0</minor>
    <settings>0x49</settings>
  </info>
  <k>17</k>
  <L>146</L>
  <weightsfile>lshweights_64.xml</weightsfile>
</dbconfig>
13
Ghidra/Features/BSim/data/medium_cpool.xml
Executable file
@ -0,0 +1,13 @@
<dbconfig>
  <info>
    <name>Medium JVM/Dalvik</name>
    <owner>Example Owner</owner>
    <description>A medium sized (~10 million functions) database tuned for java .class or .dex files</description>
    <major>0</major>
    <minor>0</minor>
    <settings>0x49</settings>
  </info>
  <k>17</k>
  <L>146</L>
  <weightsfile>lshweights_cpool.xml</weightsfile>
</dbconfig>
13
Ghidra/Features/BSim/data/medium_nosize.xml
Executable file
@ -0,0 +1,13 @@
<dbconfig>
  <info>
    <name>Medium No Size</name>
    <owner>Example Owner</owner>
    <description>A medium sized (~10 million functions) database tuned for executables with different address/register sizes</description>
    <major>0</major>
    <minor>0</minor>
    <settings>0x4d</settings>
  </info>
  <k>17</k>
  <L>146</L>
  <weightsfile>lshweights_nosize.xml</weightsfile>
</dbconfig>
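Note on the dbconfig templates above: every template shares the same element layout and differs only in name, description, settings (0x49 or 0x4d), k, L and weights file. The sketch below reads those fields with the JDK's DOM parser. It is illustrative only: DbConfigSketch and its field names are hypothetical and are not part of Ghidra, and nothing here reflects how BSim itself loads a template.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical reader for a BSim <dbconfig> template; not a Ghidra class.
final class DbConfigSketch {
	final String name;
	final int k;
	final int l;
	final int settings;
	final String weightsFile;

	private DbConfigSketch(String name, int k, int l, int settings, String weightsFile) {
		this.name = name;
		this.k = k;
		this.l = l;
		this.settings = settings;
		this.weightsFile = weightsFile;
	}

	// Parse the template text; any parse problem is reported as an unchecked exception.
	static DbConfigSketch parse(String xml) {
		try {
			Document doc = DocumentBuilderFactory.newInstance()
					.newDocumentBuilder()
					.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
			return new DbConfigSketch(
				text(doc, "name"),
				Integer.parseInt(text(doc, "k")),
				Integer.parseInt(text(doc, "L")),
				Integer.decode(text(doc, "settings")), // accepts the 0x prefix
				text(doc, "weightsfile"));
		}
		catch (Exception e) {
			throw new IllegalArgumentException("Not a readable dbconfig template", e);
		}
	}

	// Text of the first element with the given (case-sensitive) tag name.
	private static String text(Document doc, String tag) {
		return doc.getElementsByTagName(tag).item(0).getTextContent().trim();
	}

	public static void main(String[] args) {
		String xml = "<dbconfig><info><name>Medium No Size</name><major>0</major>" +
			"<minor>0</minor><settings>0x4d</settings></info>" +
			"<k>17</k><L>146</L><weightsfile>lshweights_nosize.xml</weightsfile></dbconfig>";
		DbConfigSketch cfg = parse(xml);
		System.out.println(cfg.name + ": k=" + cfg.k + ", L=" + cfg.l +
			", settings=0x" + Integer.toHexString(cfg.settings));
	}
}
```

Comparing the parsed settings of medium_nosize (0x4d) with the other templates (0x49) shows they differ in a single bit, 0x04; the meaning of that bit is not stated in this diff.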
14
Ghidra/Features/BSim/data/serverconfig.xml
Executable file
@ -0,0 +1,14 @@
<serverconfig>  <!-- Runtime parameters for the query server -->
  <config key="shared_buffers">2GB</config>  <!-- Amount of memory the server will use -->
  <config key="work_mem">16MB</config>  <!-- Max memory to use for hash tables and sorts -->
  <config key="checkpoint_timeout">30min</config>  <!-- Amount of time before all database records are flushed to disk -->
  <config key="listen_addresses">'*'</config>  <!-- '*' = all available, '0.0.0.0' just IPv4, 'localhost' -->
  <config key="ssl">on</config>  <!-- Enable server to connect via SSL -->
  <!-- <config key="ssl_ciphers">TLSv1.2</config> -->
  <config key="password_encryption">scram-sha-256</config>

  <!-- <connect db="all" user="all" type="local" method="trust"/> -->
  <connect db="all" user="all" addr="127.0.0.1/32" type="hostssl" method="trust"/>
  <connect db="all" user="all" addr="::1/128" type="hostssl" method="trust"/>
  <connect db="all" user="all" addr="all" type="hostssl" method="trust"/>
</serverconfig>
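Note on serverconfig.xml above: the attributes of each connect element (type, db, user, addr, method) match the fields of a PostgreSQL pg_hba.conf record, which is written as TYPE DATABASE USER [ADDRESS] METHOD. The sketch below renders one record per entry. That BSim produces pg_hba.conf in this way is an assumption made for illustration, and ConnectEntrySketch is a hypothetical name, not a Ghidra class.

```java
// Hypothetical helper; renders a <connect> entry in pg_hba.conf record order.
final class ConnectEntrySketch {

	// "local" records carry no address, so addr may be null or empty.
	static String toHbaLine(String type, String db, String user, String addr, String method) {
		StringBuilder sb = new StringBuilder();
		sb.append(type).append(' ').append(db).append(' ').append(user);
		if (addr != null && !addr.isEmpty()) {
			sb.append(' ').append(addr);
		}
		sb.append(' ').append(method);
		return sb.toString();
	}

	public static void main(String[] args) {
		// The three active entries from serverconfig.xml
		System.out.println(toHbaLine("hostssl", "all", "all", "127.0.0.1/32", "trust"));
		System.out.println(toHbaLine("hostssl", "all", "all", "::1/128", "trust"));
		System.out.println(toHbaLine("hostssl", "all", "all", "all", "trust"));
	}
}
```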
@ -0,0 +1,175 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
//Generate BSim signatures for the current program. The URL for the program is
//created from the local storage location. These signatures are intended for the
//in-memory database backend.
//@category BSim
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.Iterator;

import generic.lsh.vector.LSHVectorFactory;
import ghidra.app.script.GhidraScript;
import ghidra.features.base.values.GhidraValuesMap;
import ghidra.features.bsim.query.*;
import ghidra.features.bsim.query.BSimServerInfo.DBType;
import ghidra.features.bsim.query.FunctionDatabase.Error;
import ghidra.features.bsim.query.FunctionDatabase.ErrorCategory;
import ghidra.features.bsim.query.description.DatabaseInformation;
import ghidra.features.bsim.query.description.DescriptionManager;
import ghidra.features.bsim.query.file.BSimH2FileDBConnectionManager;
import ghidra.features.bsim.query.file.BSimH2FileDBConnectionManager.BSimH2FileDataSource;
import ghidra.features.bsim.query.protocol.*;
import ghidra.framework.model.DomainFolder;
import ghidra.framework.protocol.ghidra.GhidraURL;
import ghidra.program.model.listing.Function;
import ghidra.program.model.listing.FunctionManager;
import ghidra.util.MessageType;
import ghidra.util.Msg;

//@category BSim
//Generates and commits the BSim signatures for the currentProgram to the
//selected H2 BSim database
public class AddProgramToH2BSimDatabaseScript extends GhidraScript {

	private static final String DATABASE = "H2 Database";

	@Override
	protected void run() throws Exception {
		if (isRunningHeadless()) {
			popup("Use the \"bsim\" command-line tool to add programs to a database headlessly");
			return;
		}

		if (currentProgram == null) {
			popup("This script requires that a program be open in the tool");
			return;
		}

		GhidraValuesMap values = new GhidraValuesMap();
		values.defineFile(DATABASE, null, new File(System.getProperty("user.home")));
		values.setValidator((valueMap, status) -> {
			File selected = valueMap.getFile(DATABASE);
			if (selected.isDirectory() ||
				!selected.getAbsolutePath().endsWith(BSimServerInfo.H2_FILE_EXTENSION)) {
				status.setStatusText("Invalid Database File!", MessageType.ERROR);
				return false;
			}
			return true;
		});

		askValues("Select Database File", null, values);

		File h2DbFile = values.getFile(DATABASE);

		FunctionDatabase h2Database = null;
		try {
			BSimServerInfo serverInfo =
				new BSimServerInfo(DBType.file, null, 0, h2DbFile.getAbsolutePath());
			h2Database = BSimClientFactory.buildClient(serverInfo, false);
			BSimH2FileDataSource bds =
				BSimH2FileDBConnectionManager.getDataSourceIfExists(h2Database.getServerInfo());
			if (bds == null) {
				popup(h2DbFile.getAbsolutePath() + " is not an H2 database file");
				return;
			}
			if (bds.getActiveConnections() > 0) {
				popup("There is an existing connection to the database.");
				return;
			}

			h2Database.initialize();
			DatabaseInformation dbInfo = h2Database.getInfo();

			LSHVectorFactory vectorFactory = h2Database.getLSHVectorFactory();
			GenSignatures gensig = null;
			try {
				gensig = new GenSignatures(dbInfo.trackcallgraph);
				gensig.setVectorFactory(vectorFactory);
				gensig.addExecutableCategories(dbInfo.execats);
				gensig.addFunctionTags(dbInfo.functionTags);
				gensig.addDateColumnName(dbInfo.dateColumnName);

				DomainFolder df = currentProgram.getDomainFile().getParent();
				URL folderURL = df.getSharedProjectURL();
				if (folderURL == null) {
					folderURL = df.getLocalProjectURL();
				}
				String path = GhidraURL.getProjectPathname(folderURL);

				URL normalizedProjectURL = GhidraURL.getProjectURL(folderURL);
				String repo = normalizedProjectURL.toExternalForm();

				gensig.openProgram(this.currentProgram, null, null, null, repo, path);
				final FunctionManager fman = currentProgram.getFunctionManager();
				final Iterator<Function> iter = fman.getFunctions(true);
				gensig.scanFunctions(iter, fman.getFunctionCount(), monitor);
				final DescriptionManager manager = gensig.getDescriptionManager();

				//need to call sortCallGraph on each FunctionDescription
				//this de-dupes the list of callees for each function
				//without this there can be SQL errors due to inserting duplicate
				//entries into the callgraph table
				manager.listAllFunctions().forEachRemaining(fd -> fd.sortCallgraph());

				InsertRequest insertreq = new InsertRequest();
				insertreq.manage = manager;
				if (insertreq.execute(h2Database) == null) {
					Error lastError = h2Database.getLastError();
					if ((lastError.category == ErrorCategory.Format) ||
						(lastError.category == ErrorCategory.Nonfatal)) {
						Msg.showWarn(this, null, "Skipping Insert",
							currentProgram.getName() + ": " + lastError.message);
						return;
					}
					throw new IOException(currentProgram.getName() + ": " + lastError.message);
				}

				StringBuffer status = new StringBuffer(currentProgram.getName());
				status.append(" added to database ");
				status.append(dbInfo.databasename);
				status.append("\n\n");
				QueryExeCount exeCount = new QueryExeCount();
				ResponseExe countResponse = exeCount.execute(h2Database);
				if (countResponse != null) {
					status.append(dbInfo.databasename);
					status.append(" contains ");
					status.append(countResponse.recordCount);
					status.append(" executables.");
				}
				else {
					status.append("null response from QueryExeCount");
				}
				popup(status.toString());
			}
			finally {
				if (gensig != null) {
					gensig.dispose();
				}
			}

		}
		finally {
			if (h2Database != null) {
				h2Database.close();
				BSimH2FileDataSource bds =
					BSimH2FileDBConnectionManager.getDataSourceIfExists(h2Database.getServerInfo());
				// The data source is null when the selected file was not an H2 database
				if (bds != null) {
					bds.dispose();
				}
			}
		}
	}
}
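Note on the sortCallgraph comment in the script above: the comment says the call de-dupes each function's callee list so that the insert does not hit duplicate rows in the callgraph table. The sketch below shows the general sort-and-de-duplicate idea on plain strings. CalleeDedupSketch is a hypothetical stand-in and does not reproduce FunctionDescription.sortCallgraph().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of sorting a callee list and dropping duplicates.
final class CalleeDedupSketch {

	// A TreeSet keeps one copy of each element in natural (sorted) order.
	static List<String> sortAndDedupe(List<String> callees) {
		return new ArrayList<>(new TreeSet<>(callees));
	}

	public static void main(String[] args) {
		List<String> callees = List.of("strlen", "memcpy", "memcpy", "strlen", "free");
		// Each callee appears once, so each (caller, callee) row would be unique
		System.out.println(sortAndDedupe(callees));
	}
}
```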
80
Ghidra/Features/BSim/ghidra_scripts/CompareExecutables.java
Executable file
@ -0,0 +1,80 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
// Calculate similarity/significance scores between executables by
// combining their function scores.
//@category BSim

import java.net.URL;

import ghidra.app.script.GhidraScript;
import ghidra.features.bsim.query.BSimClientFactory;
import ghidra.features.bsim.query.FunctionDatabase;
import ghidra.features.bsim.query.client.*;
import ghidra.features.bsim.query.description.ExecutableRecord;

public class CompareExecutables extends GhidraScript {

	private ExecutableComparison exeCompare;
	@Override
	protected void run() throws Exception {
		URL url = BSimClientFactory.deriveBSimURL("ghidra://localhost/repo");
		try (FunctionDatabase database = BSimClientFactory.buildClient(url, true)) {
//			FileScoreCaching cache = new FileScoreCaching("/tmp/test_scorecacher.txt");
			TableScoreCaching cache = new TableScoreCaching(database);
			exeCompare =
				new ExecutableComparison(database, 1000000, "11111111111111111111111111111111",
					cache,
					monitor);
			// Specify the list of executables to compare by giving their md5 hash
//			exeCompare.addExecutable("22222222222222222222222222222222");	// 32 hex-digit string
//			exeCompare.addExecutable("33333333333333333333333333333333");
			exeCompare.addAllExecutables(5000);
			ExecutableScorer scorer = exeCompare.getScorer();
			if (!exeCompare.isConfigured()) {
				exeCompare.resetThresholds(0.7, 10.0);
			}
			exeCompare.fillinSelfScores();	// Prefetch self-scores, calculate any we are missing

			exeCompare.performScoring();
			scorer.commitSelfScore();	// Commit the newly calculated self-score

			println("Maximum cluster size = " + Integer.toString(exeCompare.getMaxHitCount()));
			println("Hit count exceeded = " + Integer.toString(exeCompare.getExceedCount()));
			float scoreThresh = 0.01f;
			int numExe = scorer.numExecutables();
			ExecutableRecord exeA = scorer.getSingularExecutable();
			float selfScoreA = scorer.getSingularSelfScore();
			for (int i = 1; i <= numExe; ++i) {
				ExecutableRecord exeB = scorer.getExecutable(i);
				float selfScoreB = scorer.getScore(i);
				if (selfScoreB == 0.0f) {	// This is possible if the executable has no "rare" functions,
					continue;				// as defined by the ExecutableComparison.hitCountThreshold
				}
				ExecutableRecord smallRecord = selfScoreA < selfScoreB ? exeA : exeB;
				ExecutableRecord bigRecord = selfScoreA < selfScoreB ? exeB : exeA;
				float libScore = scorer.getNormalizedScore(i, true);
				float totalScore = scorer.getNormalizedScore(i, false);
				if (libScore < scoreThresh) {
					continue;
				}
				println(smallRecord.getNameExec() + " " + bigRecord.getNameExec());
				println(" " + Float.toString(libScore) + " library score");
				println(" " + Float.toString(totalScore) + " total score");
			}
		}
	}

}
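Note on CompareExecutables above: the script identifies executables by MD5 hash, given as a 32 hex-digit string. Anyone replacing the placeholder hashes may want to check the string first. The sketch below is one way to do that with a regular expression. Md5ArgSketch is a hypothetical helper; whether ExecutableComparison performs its own validation is not shown in this diff.

```java
import java.util.regex.Pattern;

// Hypothetical pre-check for the md5 strings passed to the script.
final class Md5ArgSketch {

	// Exactly 32 hexadecimal digits, upper or lower case.
	private static final Pattern MD5_HEX = Pattern.compile("[0-9a-fA-F]{32}");

	static boolean isMd5Hex(String s) {
		return s != null && MD5_HEX.matcher(s).matches();
	}

	public static void main(String[] args) {
		// The placeholder used in the script is well-formed
		System.out.println(isMd5Hex("11111111111111111111111111111111"));
		// Too short, so rejected
		System.out.println(isMd5Hex("1111"));
	}
}
```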
148
Ghidra/Features/BSim/ghidra_scripts/CompareSignatures.java
Executable file
@ -0,0 +1,148 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
// Use the decompiler to generate a signature for the current function containing the cursor
// If we remember the last signature that was generated, compare this signature with
// the last signature and print the similarity
//@category BSim

import java.io.*;

import org.xml.sax.SAXException;

import generic.jar.ResourceFile;
import generic.lsh.vector.*;
import ghidra.app.decompiler.DecompInterface;
import ghidra.app.decompiler.DecompileOptions;
import ghidra.app.decompiler.signature.SignatureResult;
import ghidra.app.script.GhidraScript;
import ghidra.app.services.ProgramManager;
import ghidra.features.bsim.query.GenSignatures;
import ghidra.program.model.address.Address;
import ghidra.program.model.lang.LanguageID;
import ghidra.program.model.listing.Function;
import ghidra.program.model.listing.Program;
import ghidra.util.xml.SpecXmlUtils;
import ghidra.xml.NonThreadedXmlPullParserImpl;
import ghidra.xml.XmlPullParser;

public class CompareSignatures extends GhidraScript {

	private LSHVectorFactory vectorFactory;

	private LSHVector generateVector(Function f, Program program) {
		DecompInterface decompiler = new DecompInterface();
		decompiler.setOptions(new DecompileOptions());
		decompiler.toggleSyntaxTree(false);
		decompiler.setSignatureSettings(vectorFactory.getSettings());
		if (!decompiler.openProgram(program)) {
			println("Unable to initialize the Decompiler interface");
			println(decompiler.getLastMessage());
			return null;
		}
		SignatureResult sigres = decompiler.generateSignatures(f, false, 10, null);
		LSHVector vec = vectorFactory.buildVector(sigres.features);
		return vec;
	}

	private Program getProgram(Program[] progarray, String name) {
		if ((name == null) || (progarray == null)) {
			return null;
		}
		for (Program prog : progarray) {
			if (name.equals(prog.getName())) {
				return prog;
			}
		}
		return null;
	}

	private static void readWeights(LSHVectorFactory vectorFactory, ResourceFile weightsFile)
			throws FileNotFoundException, IOException, SAXException {
		InputStream input = weightsFile.getInputStream();
		XmlPullParser parser = new NonThreadedXmlPullParserImpl(input, "Vector weights parser",
			SpecXmlUtils.getXmlHandler(), false);
		vectorFactory.readWeights(parser);
		input.close();
	}

	private void buildLSHVectorFactory() {
		vectorFactory = new WeightedLSHCosineVectorFactory();
		try {
			LanguageID id = currentProgram.getLanguageID();
			ResourceFile defaultWeightsFile = GenSignatures.getWeightsFile(id, id);
			readWeights(vectorFactory, defaultWeightsFile);
		}
		catch (FileNotFoundException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
		catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
		catch (SAXException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}

	@Override
	protected void run() throws Exception {
		Function func = this.getFunctionContaining(this.currentAddress);
		if (func == null) {
			return;
		}
		buildLSHVectorFactory();
		LSHVector vec = generateVector(func, currentProgram);
		ProgramManager programManager = state.getTool().getService(ProgramManager.class);
		Program[] progarray = programManager.getAllOpenPrograms();
		String lastprogram_string = System.getProperty("ghidra.lastprogram");
		Program lastprogram = getProgram(progarray, lastprogram_string);
		VectorCompare veccompare = new VectorCompare();
		if (lastprogram != null) {
			String addrstring = System.getProperty("ghidra.lastaddress");
			if (addrstring != null) {
				Address addr = lastprogram.getAddressFactory().getAddress(addrstring);
				Function lastfunction = lastprogram.getFunctionManager().getFunctionAt(addr);
				if (lastfunction != null) {
					LSHVector lastvector = generateVector(lastfunction, lastprogram);
					double sim = lastvector.compare(vec, veccompare);
					double signif = vectorFactory.calculateSignificance(veccompare);
					StringBuilder buf = new StringBuilder();
					buf.append("Comparison results:\n");
					buf.append(lastprogram.getName());
					buf.append(".");
					buf.append(lastfunction.getName());
					buf.append(" vs. ");
					buf.append(currentProgram.getName());
					buf.append(".");
					buf.append(func.getName());
					buf.append("\n Similarity: ");
					buf.append(Double.toString(sim));
					buf.append("\n Significance: ");
					buf.append(Double.toString(signif));
					buf.append("\n");
					lastvector.compareDetail(vec, buf);
					println(buf.toString());
				}
			}
		}
		System.setProperty("ghidra.lastprogram", currentProgram.getName());
		String addrstring = func.getEntryPoint().toString();
		System.setProperty("ghidra.lastaddress", addrstring);
	}

}
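Note on the similarity value printed by CompareSignatures above: the vector factory is a WeightedLSHCosineVectorFactory, which suggests the score is a cosine similarity over weighted feature vectors. The sketch below computes plain cosine similarity for two sparse vectors keyed by feature id. It is an assumption-laden illustration only: CosineSketch is hypothetical, and it does not reproduce LSHVector.compare() or the weighting that Ghidra reads from the lshweights files.

```java
import java.util.Map;

// Hypothetical sparse cosine similarity; feature id -> weight.
final class CosineSketch {

	static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
		double dot = 0.0;
		double normA = 0.0;
		double normB = 0.0;
		for (Map.Entry<Integer, Double> e : a.entrySet()) {
			normA += e.getValue() * e.getValue();
			Double other = b.get(e.getKey());
			if (other != null) {
				// Only features present in both vectors contribute to the dot product
				dot += e.getValue() * other;
			}
		}
		for (double v : b.values()) {
			normB += v * v;
		}
		if (normA == 0.0 || normB == 0.0) {
			return 0.0; // an empty vector is treated as similar to nothing
		}
		return dot / Math.sqrt(normA * normB);
	}

	public static void main(String[] args) {
		Map<Integer, Double> f1 = Map.of(1, 1.0, 2, 2.0);
		Map<Integer, Double> f2 = Map.of(1, 1.0, 2, 2.0);
		Map<Integer, Double> f3 = Map.of(3, 4.0);
		System.out.println(cosine(f1, f2)); // identical feature sets
		System.out.println(cosine(f1, f3)); // no shared features
	}
}
```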
155
Ghidra/Features/BSim/ghidra_scripts/CompareSignaturesSpecifyWeights.java
Executable file
@ -0,0 +1,155 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
// Compare the BSim feature vectors of two functions.
//@category BSim

import java.io.*;

import org.xml.sax.SAXException;

import generic.jar.ResourceFile;
import generic.lsh.vector.*;
import ghidra.app.decompiler.DecompInterface;
import ghidra.app.decompiler.DecompileOptions;
import ghidra.app.decompiler.signature.SignatureResult;
import ghidra.app.script.GhidraScript;
import ghidra.app.services.ProgramManager;
import ghidra.framework.Application;
import ghidra.program.model.address.Address;
import ghidra.program.model.listing.Function;
import ghidra.program.model.listing.Program;
import ghidra.util.exception.CancelledException;
import ghidra.util.xml.SpecXmlUtils;
import ghidra.xml.NonThreadedXmlPullParserImpl;
import ghidra.xml.XmlPullParser;

public class CompareSignaturesSpecifyWeights extends GhidraScript {

	private static final String DEFAULT_LSH_WEIGHTS_FILE = "lshweights_nosize.xml";
	private LSHVectorFactory vectorFactory;

	private LSHVector generateVector(Function f, Program program) {
		DecompInterface decompiler = new DecompInterface();
		decompiler.setOptions(new DecompileOptions());
		decompiler.setSignatureSettings(vectorFactory.getSettings());
		decompiler.toggleSyntaxTree(false);
		if (!decompiler.openProgram(program)) {
			println("Unable to initialize the Decompiler interface");
			println(decompiler.getLastMessage());
			return null;
		}

		SignatureResult sigres = decompiler.generateSignatures(f, false, 10, null);

		LSHVector vec = vectorFactory.buildVector(sigres.features);
		return vec;
	}

	private static void readWeights(LSHVectorFactory vectorFactory, ResourceFile weightsFile)
			throws FileNotFoundException, IOException, SAXException {
		InputStream input = weightsFile.getInputStream();
		XmlPullParser parser = new NonThreadedXmlPullParserImpl(input, "Vector weights parser",
			SpecXmlUtils.getXmlHandler(), false);
		vectorFactory.readWeights(parser);
		input.close();
	}

	private boolean buildLSHVectorFactory() {
		vectorFactory = new WeightedLSHCosineVectorFactory();
		try {
			String weightsFile =
				askString("Enter weights file name", "weights file", DEFAULT_LSH_WEIGHTS_FILE);
			ResourceFile defaultWeightsFile = Application.findDataFileInAnyModule(weightsFile);
			readWeights(vectorFactory, defaultWeightsFile);
		}
		catch (FileNotFoundException e) {
			e.printStackTrace();
			return false;
		}
		catch (IOException e) {
			e.printStackTrace();
			return false;
		}
		catch (SAXException e) {
			e.printStackTrace();
			return false;
		}
		catch (CancelledException e) {
			return false;
		}
		return true;
	}

	private Program getProgram(Program[] progarray, String name) {
		if ((name == null) || (progarray == null)) {
			return null;
		}
		for (Program prog : progarray) {
			if (name.equals(prog.getName())) {
				return prog;
			}
		}
		return null;
	}

	@Override
	protected void run() throws Exception {
		Function func = this.getFunctionContaining(this.currentAddress);
		if (func == null) {
			return;
		}
		if (!buildLSHVectorFactory()) {
			return;
		}
		LSHVector vec = generateVector(func, currentProgram);
		ProgramManager programManager = state.getTool().getService(ProgramManager.class);
		Program[] progarray = programManager.getAllOpenPrograms();
		String lastprogram_string = System.getProperty("ghidra.lastprogram");
		Program lastprogram = getProgram(progarray, lastprogram_string);
		VectorCompare veccompare = new VectorCompare();
		if (lastprogram != null) {
			String addrstring = System.getProperty("ghidra.lastaddress");
			if (addrstring != null) {
				Address addr = lastprogram.getAddressFactory().getAddress(addrstring);
				Function lastfunction = lastprogram.getFunctionManager().getFunctionAt(addr);
				if (lastfunction != null) {
					LSHVector lastvector = generateVector(lastfunction, lastprogram);
					double sim = lastvector.compare(vec, veccompare);
					double signif = vectorFactory.calculateSignificance(veccompare);
					StringBuilder buf = new StringBuilder();
					buf.append("Comparison results:\n");
					buf.append(lastprogram.getName());
					buf.append(".");
					buf.append(lastfunction.getName());
					buf.append(" vs. ");
					buf.append(currentProgram.getName());
					buf.append(".");
					buf.append(func.getName());
					buf.append("\n Similarity: ");
					buf.append(Double.toString(sim));
					buf.append("\n Significance: ");
					buf.append(Double.toString(signif));
					buf.append("\n");
					lastvector.compareDetail(vec, buf);
					println(buf.toString());
				}
			}
		}
		System.setProperty("ghidra.lastprogram", currentProgram.getName());
		String addrstring = func.getEntryPoint().toString();
		System.setProperty("ghidra.lastaddress", addrstring);
	}
}
@ -0,0 +1,170 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
//Creates an empty file-based H2 BSim database
//@category BSim
import java.io.File;
import java.io.IOException;
import java.util.*;

import org.apache.commons.lang3.StringUtils;

import ghidra.app.script.GhidraScript;
import ghidra.features.base.values.GhidraValuesMap;
import ghidra.features.bsim.query.*;
import ghidra.features.bsim.query.BSimServerInfo.DBType;
import ghidra.features.bsim.query.FunctionDatabase.Error;
import ghidra.features.bsim.query.description.DatabaseInformation;
import ghidra.features.bsim.query.file.BSimH2FileDBConnectionManager;
import ghidra.features.bsim.query.file.BSimH2FileDBConnectionManager.BSimH2FileDataSource;
import ghidra.features.bsim.query.protocol.*;
import ghidra.util.MessageType;
import ghidra.util.Msg;

public class CreateH2BSimDatabaseScript extends GhidraScript {
    private static final String NAME = "Database Name";
    private static final String DIRECTORY = "Database Directory";
    private static final String DATABASE_TEMPLATE = "Database Template";
    private static final String FUNCTION_TAGS = "Function Tags (CSV)";
    private static final String EXECUTABLE_CATEGORIES = "Executable Categories (CSV)";

    private static final String[] templates =
        { "medium_nosize", "medium_32", "medium_64", "medium_cpool" };

    @Override
    protected void run() throws Exception {
        if (isRunningHeadless()) {
            popup("Use \"bsim\" to create an H2 BSim database from the command line");
            return;
        }

        GhidraValuesMap values = new GhidraValuesMap();
        values.defineString(NAME, "");
        values.defineDirectory(DIRECTORY, new File(System.getProperty("user.home")));
        values.defineChoice(DATABASE_TEMPLATE, "medium_nosize", templates);
        values.defineString(FUNCTION_TAGS);
        values.defineString(EXECUTABLE_CATEGORIES);

        values.setValidator((valueMap, status) -> {
            String databaseName = valueMap.getString(NAME);
            if (StringUtils.isBlank(databaseName)) {
                status.setStatusText("Name must be filled in!", MessageType.ERROR);
                return false;
            }
            File directory = valueMap.getFile(DIRECTORY);
            if (!directory.isDirectory()) {
                status.setStatusText("Invalid directory!", MessageType.ERROR);
                return false;
            }
            File dbFile = new File(directory, databaseName);
            File testFile = new File(dbFile.getPath() + BSimServerInfo.H2_FILE_EXTENSION);
            if (testFile.exists()) {
                status.setStatusText("Database file already exists!", MessageType.ERROR);
                return false;
            }
            return true;
        });

        askValues("Enter Database Parameters",
            "Enter values required to create a new BSim H2 database.", values);

        FunctionDatabase h2Database = null;
        try {
            String databaseName = values.getString(NAME);
            File dbDir = values.getFile(DIRECTORY);
            String template = values.getChoice(DATABASE_TEMPLATE);
            String functionTagsCSV = values.getString(FUNCTION_TAGS);
            List<String> tags = parseCSV(functionTagsCSV);

            String exeCatCSV = values.getString(EXECUTABLE_CATEGORIES);
            List<String> cats = parseCSV(exeCatCSV);

            File dbFile = new File(dbDir, databaseName);

            BSimServerInfo serverInfo =
                new BSimServerInfo(DBType.file, null, 0, dbFile.getAbsolutePath());
            h2Database = BSimClientFactory.buildClient(serverInfo, false);
            BSimH2FileDataSource bds =
                BSimH2FileDBConnectionManager.getDataSourceIfExists(h2Database.getServerInfo());
            if (bds.getActiveConnections() > 0) {
                //if this happens, there is a connection to the database but the
                //database file was deleted
                Msg.showError(this, null, "Connection Error",
                    "There is an existing connection to the database!");
                return;
            }

            CreateDatabase command = new CreateDatabase();
            command.info = new DatabaseInformation();
            // Put in fields provided by the user in the dialog
            // If they are null, the template will fill them in
            command.info.databasename = databaseName;
            command.config_template = template;
            command.info.trackcallgraph = true;
            ResponseInfo response = command.execute(h2Database);
            if (response == null) {
                throw new IOException(h2Database.getLastError().message);
            }

            for (String tag : tags) {
                InstallTagRequest req = new InstallTagRequest();
                req.tag_name = tag;
                ResponseInfo resp = req.execute(h2Database);
                if (resp == null) {
                    Error lastError = h2Database.getLastError();
                    throw new LSHException(lastError.message);
                }
            }

            for (String cat : cats) {
                InstallCategoryRequest req = new InstallCategoryRequest();
                req.type_name = cat;
                ResponseInfo resp = req.execute(h2Database);
                if (resp == null) {
                    Error lastError = h2Database.getLastError();
                    throw new LSHException(lastError.message);
                }
            }
            popup("Database " + values.getString(NAME) + " created successfully!");
        }
        finally {
            if (h2Database != null) {
                h2Database.close();
                BSimH2FileDataSource bds =
                    BSimH2FileDBConnectionManager.getDataSourceIfExists(h2Database.getServerInfo());
                bds.dispose();
            }
        }

    }

    //this de-dupes
    private List<String> parseCSV(String csv) {
        Set<String> parsed = new HashSet<>();
        if (StringUtils.isEmpty(csv)) {
            return new ArrayList<String>();
        }
        String[] parts = csv.split(",");
        for (String p : parts) {
            if (!StringUtils.isBlank(p)) {
                parsed.add(p.trim());
            }
        }
        List<String> res = new ArrayList<>(parsed);
        res.sort(String::compareTo);
        return res;
    }

}
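The `parseCSV` helper in `CreateH2BSimDatabaseScript` trims each comma-separated entry, drops blank entries, removes duplicates, and returns the result sorted. The following is a minimal standalone sketch of that behavior, not the script's code: the class name `CsvFieldParser` is hypothetical, a `TreeSet` stands in for the `HashSet` plus `sort` pair, and `trim().isEmpty()` approximates `StringUtils.isBlank` so that the example needs only the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical standalone helper for illustration; not part of the Ghidra API.
final class CsvFieldParser {

    // Splits on commas, trims each entry, drops blank entries, removes
    // duplicates, and returns the entries in natural String order.
    static List<String> parseCsv(String csv) {
        if (csv == null || csv.isEmpty()) {
            return new ArrayList<>();
        }
        // TreeSet de-duplicates and sorts in one step (assumed substitute
        // for the HashSet + List.sort combination used in the script).
        TreeSet<String> parsed = new TreeSet<>();
        for (String part : csv.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                parsed.add(trimmed);
            }
        }
        return new ArrayList<>(parsed);
    }

    public static void main(String[] args) {
        // The tag names below are made up for the example.
        System.out.println(parseCsv("net, crypto,,net , io")); // prints [crypto, io, net]
    }
}
```

Sorting the de-duplicated entries means tags and categories are installed in a predictable order, regardless of how the user typed them into the dialog.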
72
Ghidra/Features/BSim/ghidra_scripts/DebugSignatures.java
Executable file
@ -0,0 +1,72 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.util.List;

import ghidra.app.decompiler.DecompInterface;
import ghidra.app.decompiler.DecompileOptions;
import ghidra.app.decompiler.signature.DebugSignature;
import ghidra.app.script.GhidraScript;
import ghidra.program.model.lang.Language;
import ghidra.program.model.listing.Function;

public class DebugSignatures extends GhidraScript {

    private static final int SIGNATURE_SETTINGS = 0x45;

    @Override
    protected void run() throws Exception {
        Function func = this.getFunctionContaining(this.currentAddress);

        if (func == null) {
            popup("No function selected!");
            return;
        }

        DecompInterface decompiler = new DecompInterface();
        decompiler.setOptions(new DecompileOptions());
        decompiler.toggleSyntaxTree(false);
        decompiler.setSignatureSettings(SIGNATURE_SETTINGS);
        if (!decompiler.openProgram(this.currentProgram)) {
            println("Unable to initialize the Decompiler interface");
            println(decompiler.getLastMessage());
            return;
        }

        Language language = this.currentProgram.getLanguage();
        List<DebugSignature> sigres = decompiler.debugSignatures(func, 10, null);

        StringBuffer buf = new StringBuffer();
        buf.append("\nFunction: ");
        buf.append(func.getName());
        buf.append("\nentry: ");
        buf.append(func.getEntryPoint().toString());
        buf.append("\n\n");
        if (sigres == null) {
            printf("Null sigres!\n");
        }
        else {
            for (int i = 0; i < sigres.size(); ++i) {
                sigres.get(i).printRaw(language, buf);
                buf.append("\n");
            }
        }
        printf("%s\n", buf.toString());
        decompiler.closeProgram();
        decompiler.dispose();
    }

}
61
Ghidra/Features/BSim/ghidra_scripts/DumpDebugSignatures.py
Executable file
@ -0,0 +1,61 @@
## ###
# IP: GHIDRA
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##
# Use the decompiler to generate signatures for the function at the current address, then dump the
# signature hashes and debug information to the console
# @category: BSim.python

import ghidra.app.decompiler.tracking.DecompInterfaceTracking as DecompInterfaceTracking
import ghidra.app.decompiler.DecompileOptions as DecompileOptions
import generic.lsh.vector.WeightedLSHCosineVectorFactory as WeightedLSHCosineVectorFactory
import ghidra.features.bsim.query.GenSignatures as GenSignatures
import ghidra.xml.NonThreadedXmlPullParserImpl as NonThreadedXmlPullParserImpl
import ghidra.util.xml.SpecXmlUtils as SpecXmlUtils


def processFunction(func):
    decompiler = DecompInterfaceTracking()
    options = DecompileOptions()
    decompiler.setOptions(options)
    decompiler.toggleSyntaxTree(False)
    decompiler.setSignatureSettings(getSettings())
    if not decompiler.openProgram(currentProgram):
        print "Unable to initialize the Decompiler interface!"
        print "%s" % decompiler.getLastMessage()
        return
    language = currentProgram.getLanguage()
    sigres = decompiler.debugSignatures(func,10,None)
    for i,res in enumerate(sigres):
        buf = java.lang.StringBuffer()
        sigres.get(i).printRaw(language,buf)
        print "%s" % buf.toString()
    decompiler.closeProgram()
    decompiler.dispose()

def getSettings():
    vectorFactory = WeightedLSHCosineVectorFactory()
    id = currentProgram.getLanguageID()
    defaultWeightsFile = GenSignatures.getWeightsFile(id,id)
    input = defaultWeightsFile.getInputStream()
    parser = NonThreadedXmlPullParserImpl(input,"Vector weights parser", SpecXmlUtils.getXmlHandler(),False)
    vectorFactory.readWeights(parser)
    input.close()
    return vectorFactory.getSettings()

func = currentProgram.getFunctionManager().getFunctionContaining(currentAddress)
if func is None:
    print "no function at current address"
else:
    processFunction(func)
115
Ghidra/Features/BSim/ghidra_scripts/DumpSignatures.java
Executable file
@ -0,0 +1,115 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
// Use the decompiler to generate signatures for the function currently containing the cursor
// and dump the signature hashes to the console
//@category BSim

import java.io.*;
import java.util.List;

import org.xml.sax.SAXException;

import generic.jar.ResourceFile;
import generic.lsh.vector.LSHVectorFactory;
import generic.lsh.vector.WeightedLSHCosineVectorFactory;
import ghidra.app.decompiler.DecompInterface;
import ghidra.app.decompiler.DecompileOptions;
import ghidra.app.decompiler.signature.DebugSignature;
import ghidra.app.decompiler.signature.SignatureResult;
import ghidra.app.script.GhidraScript;
import ghidra.features.bsim.query.GenSignatures;
import ghidra.program.model.lang.Language;
import ghidra.program.model.lang.LanguageID;
import ghidra.program.model.listing.Function;
import ghidra.util.xml.SpecXmlUtils;
import ghidra.xml.NonThreadedXmlPullParserImpl;
import ghidra.xml.XmlPullParser;

public class DumpSignatures extends GhidraScript {

    private LSHVectorFactory vectorFactory;

    @Override
    public void run() throws Exception {
        Function func = this.getFunctionContaining(this.currentAddress);
        if (func == null) {
            return;
        }
        buildLSHVectorFactory();
        boolean debug = false;
        DecompInterface decompiler = new DecompInterface();
        decompiler.setOptions(new DecompileOptions());
        decompiler.setSignatureSettings(vectorFactory.getSettings());
        decompiler.toggleSyntaxTree(false);
        if (!decompiler.openProgram(this.currentProgram)) {
            println("Unable to initialize the Decompiler interface");
            println(decompiler.getLastMessage());
            return;
        }
        if (!debug) {
            SignatureResult sigres = decompiler.generateSignatures(func, false, 10, null);
            StringBuffer buf = new StringBuffer("\n");
            for (int feature : sigres.features) {
                buf.append(Integer.toHexString(feature));
                buf.append("\n");
            }
            println(buf.toString());
        }
        else {
            Language language = this.currentProgram.getLanguage();
            List<DebugSignature> sigres = decompiler.debugSignatures(func, 10, null);
            StringBuffer buf = new StringBuffer("\n");
            for (int i = 0; i < sigres.size(); ++i) {
                sigres.get(i).printRaw(language, buf);
                buf.append("\n");
            }
            println(buf.toString());
        }
        decompiler.closeProgram();
        decompiler.dispose();
    }

    private static void readWeights(LSHVectorFactory vectorFactory, ResourceFile weightsFile)
            throws FileNotFoundException, IOException, SAXException {
        InputStream input = weightsFile.getInputStream();
        XmlPullParser parser = new NonThreadedXmlPullParserImpl(input, "Vector weights parser",
            SpecXmlUtils.getXmlHandler(), false);
        vectorFactory.readWeights(parser);
        input.close();
    }

    private void buildLSHVectorFactory() {
        vectorFactory = new WeightedLSHCosineVectorFactory();
        try {
            LanguageID id = currentProgram.getLanguageID();
            ResourceFile defaultWeightsFile = GenSignatures.getWeightsFile(id, id);
            readWeights(vectorFactory, defaultWeightsFile);
        }
        catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        catch (SAXException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

}
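`DumpSignatures` prints each signature feature with `Integer.toHexString`, which renders an `int` as unsigned two's-complement hex without leading zeros, so a negative feature appears as eight hex digits. The sketch below isolates that formatting step in plain Java; the class name `FeatureHexDump` and the sample feature values are made up for illustration and do not come from a real decompiler run.

```java
// Hypothetical standalone illustration; the feature values are invented.
final class FeatureHexDump {

    // Formats each feature as unsigned hex, one per line, mirroring the
    // loop over sigres.features in the script.
    static String dump(int[] features) {
        StringBuilder buf = new StringBuilder();
        for (int feature : features) {
            buf.append(Integer.toHexString(feature));
            buf.append('\n');
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // Prints "1f", "ffffffff" and "7fffffff" on separate lines.
        System.out.print(dump(new int[] { 0x1f, -1, 0x7fffffff }));
    }
}
```

If fixed-width output is preferred for comparing dumps line by line, `String.format("%08x", feature)` is one alternative that pads each value to eight digits.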
61
Ghidra/Features/BSim/ghidra_scripts/DumpSignatures.py
Executable file
@ -0,0 +1,61 @@
## ###
# IP: GHIDRA
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##
# Use the decompiler to generate signatures for the function at the current address, then dump the
# signature hashes to the console
# @category: BSim.python

import ghidra.app.decompiler.tracking.DecompInterfaceTracking as DecompInterfaceTracking
import ghidra.app.decompiler.DecompileOptions as DecompileOptions
import generic.lsh.vector.WeightedLSHCosineVectorFactory as WeightedLSHCosineVectorFactory
import ghidra.features.bsim.query.GenSignatures as GenSignatures
import ghidra.xml.NonThreadedXmlPullParserImpl as NonThreadedXmlPullParserImpl
import ghidra.util.xml.SpecXmlUtils as SpecXmlUtils


def processFunction(func):
    decompiler = DecompInterfaceTracking()
    options = DecompileOptions()
    decompiler.setOptions(options)
    decompiler.toggleSyntaxTree(False)
    decompiler.setSignatureSettings(getSettings())
    if not decompiler.openProgram(currentProgram):
        print "Unable to initialize the Decompiler interface!"
        print "%s" % decompiler.getLastMessage()
        return
    sigres = decompiler.generateSignatures(func, False, 10, None)
    buf = java.lang.StringBuffer()
    for i,res in enumerate(sigres.features):
        buf.append(java.lang.Integer.toHexString(sigres.features[i]))
        buf.append("\n")
    print buf.toString()
    decompiler.closeProgram()
    decompiler.dispose()

def getSettings():
    vectorFactory = WeightedLSHCosineVectorFactory()
    id = currentProgram.getLanguageID()
    defaultWeightsFile = GenSignatures.getWeightsFile(id,id)
    input = defaultWeightsFile.getInputStream()
    parser = NonThreadedXmlPullParserImpl(input,"Vector weights parser", SpecXmlUtils.getXmlHandler(),False)
    vectorFactory.readWeights(parser)
    input.close()
    return vectorFactory.getSettings()

func = currentProgram.getFunctionManager().getFunctionContaining(currentAddress)
if func is None:
    print "no function at current address"
else:
    processFunction(func)
69
Ghidra/Features/BSim/ghidra_scripts/ExampleOverviewQuery.java
Executable file
@ -0,0 +1,69 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
//Example of how to perform an overview query in a script.
//@category BSim
import java.util.HashSet;

import generic.lsh.vector.LSHVectorFactory;
import ghidra.app.script.GhidraScript;
import ghidra.features.bsim.query.facade.SFOverviewInfo;
import ghidra.features.bsim.query.facade.SimilarFunctionQueryService;
import ghidra.features.bsim.query.protocol.ResponseNearestVector;
import ghidra.features.bsim.query.protocol.SimilarityVectorResult;
import ghidra.program.database.symbol.FunctionSymbol;
import ghidra.program.model.listing.*;


public class ExampleOverviewQuery extends GhidraScript {
    private static final double SIMILARITY_BOUND = 0.7;
    private static final double SIGNIFICANCE_BOUND = 0.0;


    @Override
    protected void run() throws Exception {
        Program queryingProgram = currentProgram;
        HashSet<FunctionSymbol> funcsToQuery = new HashSet<>();
        FunctionIterator fIter = queryingProgram.getFunctionManager().getFunctionsNoStubs(true);
        for (Function func : fIter){
            funcsToQuery.add((FunctionSymbol) func.getSymbol());
        }
        SFOverviewInfo overviewInfo = new SFOverviewInfo(funcsToQuery);
        overviewInfo.setSimilarityThreshold(SIMILARITY_BOUND);
        overviewInfo.setSignificanceThreshold(SIGNIFICANCE_BOUND);

        try (SimilarFunctionQueryService queryService =
            new SimilarFunctionQueryService(queryingProgram)) {
            String DATABASE_URL = askString("Enter database URL", "URL:");
            queryService.initializeDatabase(DATABASE_URL);
            LSHVectorFactory vectorFactory = queryService.getLSHVectorFactory();

            ResponseNearestVector overviewResults =
                queryService.overviewSimilarFunctions(overviewInfo, null, monitor);
            StringBuilder buf = new StringBuilder();
            buf.append("\n");
            for (SimilarityVectorResult result : overviewResults.result) {
                buf.append("Name: ").append(result.getBase().getFunctionName()).append("\n");
                buf.append("Hit Count: ").append(result.getTotalCount()).append("\n");
                buf.append("Self-significance: ");
                buf.append(vectorFactory
                        .getSelfSignificance(result.getBase().getSignatureRecord().getLSHVector()));
                buf.append("\n\n");
            }
            printf("%s\n", buf.toString());
        }
    }

}
47
Ghidra/Features/BSim/ghidra_scripts/ExampleOverviewQuery.py
Executable file
@ -0,0 +1,47 @@
## ###
# IP: GHIDRA
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##
# Example of how to perform an overview query in a script
# @category BSim.python

import ghidra.features.bsim.query.facade.SFOverviewInfo as SFOverviewInfo
import ghidra.features.bsim.query.facade.SimilarFunctionQueryService as SimilarFunctionQueryService
import java.util.HashSet

SIMILARITY_BOUND = 0.7
SIGNIFICANCE_BOUND = 0.0

funcsToQuery = java.util.HashSet()
fIter = currentProgram.getFunctionManager().getFunctionsNoStubs(True)
for func in fIter:
    funcsToQuery.add(func.getSymbol())

overviewInfo = SFOverviewInfo(funcsToQuery)
overviewInfo.setSimilarityThreshold(SIMILARITY_BOUND)
overviewInfo.setSignificanceThreshold(SIGNIFICANCE_BOUND)

queryService = SimilarFunctionQueryService(currentProgram)
DB_URL = askString("Enter database URL", "URL:")
queryService.initializeDatabase(DB_URL)
vectorFactory = queryService.getLSHVectorFactory()

overviewResults = queryService.overviewSimilarFunctions(overviewInfo, None, monitor)

for result in overviewResults.result:
    print "Name: %s" % result.getBase().getFunctionName()
    print "Hit Count: %d" % result.getTotalCount()
    print "Self-significance: %f\n" % vectorFactory.getSelfSignificance(result.getBase().getSignatureRecord().getLSHVector())

queryService.dispose()
83
Ghidra/Features/BSim/ghidra_scripts/ExampleQueryClient.java
Executable file
@ -0,0 +1,83 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
// Example of connecting to a BSim server and requesting executable and function records
//@category BSim

import java.io.StringWriter;
import java.net.URL;
import java.util.List;

import ghidra.app.script.GhidraScript;
import ghidra.features.bsim.query.BSimClientFactory;
import ghidra.features.bsim.query.FunctionDatabase;
import ghidra.features.bsim.query.description.*;
import ghidra.features.bsim.query.protocol.*;
import ghidra.util.Msg;

public class ExampleQueryClient extends GhidraScript {

    @Override
    protected void run() throws Exception {
        URL url = BSimClientFactory.deriveBSimURL("ghidra://localhost/repo");
        try (FunctionDatabase client = BSimClientFactory.buildClient(url, false)) {
            if (!client.initialize()) {
                Msg.error(this, "Unable to connect to server");
                return;
            }

            QueryInfo query = new QueryInfo();
            ResponseInfo resp = query.execute(client);
            StringWriter write = new StringWriter();
            resp.saveXml(write);
            write.flush();

            QueryName exequery = new QueryName();
            exequery.spec.exename = "libdocdoxygenplugin.so";
            ResponseName respname = exequery.execute(client);
            if (respname == null) {
                Msg.error(this, client.getLastError());
                return;
            }
            ExecutableRecord erec = respname.manage.getExecutableRecordSet().first();
            FunctionDescription funcrec =
                respname.manage.findFunctionByName("DocDoxygenPlugin::createCatalog", erec);

            QueryChildren childquery = new QueryChildren();
            childquery.md5sum = funcrec.getExecutableRecord().getMd5();
            childquery.functionKeys.add(new FunctionEntry(funcrec));

            ResponseChildren respchild = childquery.execute(client);
            if (respchild == null) {
                Msg.error(this, client.getLastError());
                return;
            }
            for (int i = 0; i < respchild.correspond.size(); ++i) {
                FunctionDescription func = respchild.correspond.get(i);
                List<CallgraphEntry> callgraphRecord = func.getCallgraphRecord();
                if (callgraphRecord != null) {
                    for (int j = 0; j < callgraphRecord.size(); ++j) {
                        write.write(
                            callgraphRecord.get(j).getFunctionDescription().getFunctionName());
                        write.write('\n');
                    }
                }
            }
            write.flush();
            Msg.info(this, write.toString());
        }
    }

}
73
Ghidra/Features/BSim/ghidra_scripts/GenerateSignatures.java
Executable file
@ -0,0 +1,73 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
// Generate signatures for every function in the current executable and write in XML form to
|
||||||
|
// a user specified file.
|
||||||
|
//@category BSim
|
||||||
|
|
||||||
|
import java.io.*;
|
||||||
|
import java.util.Iterator;
|
||||||
|
|
||||||
|
import generic.lsh.vector.LSHVectorFactory;
|
||||||
|
import ghidra.app.script.GhidraScript;
|
||||||
|
import ghidra.features.bsim.query.FunctionDatabase;
|
||||||
|
import ghidra.features.bsim.query.GenSignatures;
|
||||||
|
import ghidra.features.bsim.query.client.Configuration;
|
||||||
|
import ghidra.features.bsim.query.description.DescriptionManager;
|
||||||
|
import ghidra.program.model.listing.Function;
|
||||||
|
import ghidra.program.model.listing.FunctionManager;
|
||||||
|
|
||||||
|
public class GenerateSignatures extends GhidraScript {
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void run() throws Exception {
|
||||||
|
final String md5string = currentProgram.getExecutableMD5();
|
||||||
|
if ((md5string == null) || (md5string.length() < 10)) {
|
||||||
|
throw new IOException("Could not get MD5 on file: " + currentProgram.getName());
|
||||||
|
}
|
||||||
|
final String basename = "sigs_" + md5string;
|
||||||
|
System.setProperty("ghidra.output", basename); // Inform parallel controller of output name
|
||||||
|
File file = null;
|
||||||
|
// This form of askString will work for both standalone execution or for parallel
|
||||||
|
final File workingdir = askDirectory("GenerateSignatures:", "Working directory");
|
||||||
|
if (!workingdir.isDirectory()) {
|
||||||
|
popup("Must select a working directory!");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
file = new File(workingdir, basename);
|
||||||
|
|
||||||
|
final LSHVectorFactory vectorFactory = FunctionDatabase.generateLSHVectorFactory();
|
||||||
|
final GenSignatures gensig = new GenSignatures(true);
|
||||||
|
final String templatename =
|
||||||
|
askString("GenerateSignatures:", "Database template", "medium_nosize");
|
||||||
|
final Configuration config = FunctionDatabase.loadConfigurationTemplate(templatename);
|
||||||
|
vectorFactory.set(config.weightfactory, config.idflookup, config.info.settings);
|
||||||
|
gensig.setVectorFactory(vectorFactory);
|
||||||
|
gensig.addExecutableCategories(config.info.execats);
|
||||||
|
gensig.addFunctionTags(config.info.functionTags);
|
||||||
|
gensig.addDateColumnName(config.info.dateColumnName);
|
||||||
|
final String repo = "ghidra://localhost/" + state.getProject().getName();
|
||||||
|
final String path = GenSignatures.getPathFromDomainFile(currentProgram);
|
||||||
|
gensig.openProgram(this.currentProgram, null, null, null, repo, path);
|
||||||
|
final FunctionManager fman = currentProgram.getFunctionManager();
|
||||||
|
final Iterator<Function> iter = fman.getFunctions(true);
|
||||||
|
gensig.scanFunctions(iter, fman.getFunctionCount(), monitor);
|
||||||
|
final DescriptionManager manager = gensig.getDescriptionManager();
|
||||||
|
try (FileWriter fwrite = new FileWriter(file)) { // close the writer even if saveXml throws
|
||||||
|
manager.saveXml(fwrite);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
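The script above names its output after the executable's MD5 and refuses to continue when the digest is missing or implausibly short. Here is a minimal standalone sketch of that naming step in plain Python, outside Ghidra. `signature_basename` and `output_path` are hypothetical helper names, and the digest is computed with `hashlib` as a stand-in for `currentProgram.getExecutableMD5()`.

```python
import hashlib
import os


def signature_basename(md5_hex):
    """Return "sigs_<md5>", rejecting a missing or too-short digest."""
    # Same sanity check as the script: None or fewer than 10 characters is an error.
    if md5_hex is None or len(md5_hex) < 10:
        raise IOError("Could not get MD5")
    return "sigs_" + md5_hex


def output_path(working_dir, data):
    """Build <working_dir>/sigs_<md5> for the given bytes (hypothetical helper)."""
    md5_hex = hashlib.md5(data).hexdigest()
    return os.path.join(working_dir, signature_basename(md5_hex))
```

Deriving the name from the digest means two runs over the same executable write to the same file, which is presumably why the script also publishes the name through the `ghidra.output` property.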
58
Ghidra/Features/BSim/ghidra_scripts/GenerateSignatures.py
Executable file
@ -0,0 +1,58 @@
|
|||||||
|
## ###
# IP: GHIDRA
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##
#Generate signatures for every function in the current program and write them to an XML file in a user-specified directory
#@category BSim.python

import java.lang.System as System
import java.io.File as File
import java.io.FileWriter as FileWriter
import java.io.IOException as IOException
import ghidra.features.bsim.query.FunctionDatabase as FunctionDatabase
import ghidra.features.bsim.query.GenSignatures as GenSignatures

def run():
    md5String = currentProgram.getExecutableMD5()
    if (md5String is None) or (len(md5String) < 10):
        raise IOException("Could not get MD5 on file: " + currentProgram.getName())
    basename = "sigs_" + md5String
    # Inform parallel controller of output name
    System.setProperty("ghidra.output", basename)
    workingDir = askDirectory("GenerateSignatures:", "Working Directory")
    if not workingDir.isDirectory():
        popup("Must select a working directory")
        return
    outfile = File(workingDir, basename)
    vectorFactory = FunctionDatabase.generateLSHVectorFactory()
    gensig = GenSignatures(True)
    templateName = askString("GenerateSignatures:", "Database template", "medium_nosize")
    config = FunctionDatabase.loadConfigurationTemplate(templateName)
    vectorFactory.set(config.weightfactory, config.idflookup, config.info.settings)
    gensig.setVectorFactory(vectorFactory)
    gensig.addExecutableCategories(config.info.execats)
    gensig.addFunctionTags(config.info.functionTags)
    gensig.addDateColumnName(config.info.dateColumnName)
    repo = "ghidra://localhost/" + state.getProject().getName()
    path = GenSignatures.getPathFromDomainFile(currentProgram)
    gensig.openProgram(currentProgram, None, None, None, repo, path)
    fman = currentProgram.getFunctionManager()
    iter = fman.getFunctions(True)
    gensig.scanFunctions(iter, fman.getFunctionCount(), monitor)
    fwrite = FileWriter(outfile)
    manager = gensig.getDescriptionManager()
    manager.saveXml(fwrite)
    fwrite.close()
    return

run()

443
Ghidra/Features/BSim/ghidra_scripts/LocalBSimQueryScript.java
Normal file
@ -0,0 +1,443 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
//Queries all functions in the current selection (or all functions in the current program if
|
||||||
|
//the current selection is null) against all functions in a user-selected program.
|
||||||
|
//@category BSim
|
||||||
|
|
||||||
|
import java.util.*;
|
||||||
|
|
||||||
|
import org.apache.commons.collections4.IteratorUtils;
|
||||||
|
|
||||||
|
import generic.lsh.vector.*;
|
||||||
|
import ghidra.app.decompiler.DecompileException;
|
||||||
|
import ghidra.app.plugin.core.functioncompare.FunctionComparisonProvider;
|
||||||
|
import ghidra.app.script.GhidraScript;
|
||||||
|
import ghidra.app.services.FunctionComparisonService;
|
||||||
|
import ghidra.app.tablechooser.*;
|
||||||
|
import ghidra.features.bsim.query.*;
|
||||||
|
import ghidra.features.bsim.query.client.Configuration;
|
||||||
|
import ghidra.features.bsim.query.description.FunctionDescription;
|
||||||
|
import ghidra.program.model.address.Address;
|
||||||
|
import ghidra.program.model.listing.*;
|
||||||
|
|
||||||
|
//TODO: docs
|
||||||
|
|
||||||
|
public class LocalBSimQueryScript extends GhidraScript {
|
||||||
|
|
||||||
|
//functions with self significance below this bound will be skipped
|
||||||
|
private static final double SELF_SIGNIFICANCE_BOUND = 15.0;
|
||||||
|
//bsim database template determining the signature settings
|
||||||
|
private static final String TEMPLATE_NAME = "medium_nosize";
|
||||||
|
//these are analogous to the bounds in a bsim query
|
||||||
|
private static final double MATCH_SIMILARITY_LOWER_BOUND = 0.0;
|
||||||
|
private static final double MATCH_CONFIDENCE_LOWER_BOUND = 0.0;
|
||||||
|
private static final int MATCHES_PER_FUNCTION = 10;
|
||||||
|
//decrease this if you only want to see matches that aren't exact
|
||||||
|
//for instance, when looking for changes between two versions of a program
|
||||||
|
private static final double MATCH_SIMILARITY_UPPER_BOUND = 1.0;
|
||||||
|
|
||||||
|
private TableChooserDialog tableDialog;
|
||||||
|
|
||||||
|
@Override
|
||||||
|
protected void run() throws Exception {
|
||||||
|
if (isRunningHeadless()) {
|
||||||
|
popup("This script cannot be run headlessly.");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
Set<Function> sourceFuncs = new HashSet<>();
|
||||||
|
if (currentSelection == null) {
|
||||||
|
IteratorUtils.forEach(currentProgram.getFunctionManager().getFunctions(true),
|
||||||
|
x -> sourceFuncs.add(x));
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
IteratorUtils.forEach(
|
||||||
|
currentProgram.getFunctionManager().getFunctionsOverlapping(currentSelection),
|
||||||
|
x -> sourceFuncs.add(x));
|
||||||
|
}
|
||||||
|
|
||||||
|
if (sourceFuncs.isEmpty()) {
|
||||||
|
this.popup("No non-stub functions to query!");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
Program targetProgram = askProgram("Select Target Program");
|
||||||
|
if (targetProgram == null) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
try {
|
||||||
|
List<LocalBSimMatch> localMatches = null;
|
||||||
|
|
||||||
|
//use special optimized method when the target program is the same as the current program
|
||||||
|
//in that case, a given function might be in both the source and target sets
|
||||||
|
//but we only want to generate signatures for it once
|
||||||
|
if (currentProgram.getUniqueProgramID() == targetProgram.getUniqueProgramID()) {
|
||||||
|
localMatches = getMatchesCurrentProgram(sourceFuncs);
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
//in this case there is no overlap between the source and target functions
|
||||||
|
localMatches = getMatchesTwoPrograms(sourceFuncs, currentProgram, targetProgram);
|
||||||
|
}
|
||||||
|
if (localMatches.isEmpty()) {
|
||||||
|
popup("No matches meeting criteria.");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
Collections.sort(localMatches);
|
||||||
|
initializeTable(currentProgram, targetProgram);
|
||||||
|
|
||||||
|
//again, use an optimized method for the special case when target program is the same
|
||||||
|
//as the current program
|
||||||
|
if (currentProgram.getUniqueProgramID() == targetProgram.getUniqueProgramID()) {
|
||||||
|
addMatchesOneProgram(localMatches, sourceFuncs);
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
addMatchesTwoPrograms(localMatches);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
finally {
|
||||||
|
targetProgram.release(this);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Iterate through the list of sorted matches, adding the top MATCHES_PER_FUNCTION elements
|
||||||
|
* to the table for each source function.
|
||||||
|
* @param localMatches matches in decreasing order of confidence
|
||||||
|
*/
|
||||||
|
private void addMatchesTwoPrograms(List<LocalBSimMatch> localMatches) {
|
||||||
|
Map<Function, Integer> matchCounts = new HashMap<>();
|
||||||
|
for (LocalBSimMatch match : localMatches) {
|
||||||
|
int count = matchCounts.getOrDefault(match.getSourceFunc(), 0);
|
||||||
|
if (count >= MATCHES_PER_FUNCTION) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
tableDialog.add(match);
|
||||||
|
matchCounts.put(match.getSourceFunc(), count + 1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Iterate through the list of sorted matches, adding the top MATCHES_PER_FUNCTION elements
|
||||||
|
* to the table for each function in {@code sourceFuncSet}.
|
||||||
|
*
|
||||||
|
* By construction, the matches in this list have the "source" function before the "target"
|
||||||
|
* function (in address order). This is an optimization to prevent essentially the same
|
||||||
|
* data from appearing in the list twice (since the BSim similarity and confidence operations
|
||||||
|
* are commutative). So, for each match, we need to check whether the source or the
|
||||||
|
* target are in {@code sourceFuncSet}.
|
||||||
|
*
|
||||||
|
* @param localMatches matches in decreasing order of confidence
|
||||||
|
* @param sourceFuncSet source functions
|
||||||
|
*/
|
||||||
|
private void addMatchesOneProgram(List<LocalBSimMatch> localMatches,
|
||||||
|
Set<Function> sourceFuncSet) {
|
||||||
|
Map<Function, Integer> matchCounts = new HashMap<>();
|
||||||
|
for (LocalBSimMatch match : localMatches) {
|
||||||
|
Function leftFunc = match.getSourceFunc();
|
||||||
|
int leftCount = matchCounts.getOrDefault(leftFunc, 0);
|
||||||
|
if (sourceFuncSet.contains(leftFunc) && leftCount < MATCHES_PER_FUNCTION) {
|
||||||
|
tableDialog.add(match);
|
||||||
|
matchCounts.put(leftFunc, leftCount + 1);
|
||||||
|
}
|
||||||
|
Function rightFunc = match.getTargetFunc();
|
||||||
|
int rightCount = matchCounts.getOrDefault(rightFunc, 0);
|
||||||
|
if (sourceFuncSet.contains(rightFunc) && rightCount < MATCHES_PER_FUNCTION) {
|
||||||
|
LocalBSimMatch switched = new LocalBSimMatch(rightFunc, leftFunc,
|
||||||
|
match.getSimilarity(), match.getSignificance());
|
||||||
|
tableDialog.add(switched);
|
||||||
|
matchCounts.put(rightFunc, rightCount + 1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
private List<LocalBSimMatch> getMatchesCurrentProgram(Set<Function> funcs)
|
||||||
|
throws LSHException, DecompileException {
|
||||||
|
List<LocalBSimMatch> bsimMatches = new ArrayList<>();
|
||||||
|
LSHVectorFactory vectorFactory = getVectorFactory();
|
||||||
|
|
||||||
|
//generate the signatures for *all* functions in the program...
|
||||||
|
FunctionManager fman = currentProgram.getFunctionManager();
|
||||||
|
Iterator<Function> iter = fman.getFunctions(true);
|
||||||
|
GenSignatures gensig =
|
||||||
|
generateSignatures(currentProgram, iter, fman.getFunctionCount(), vectorFactory);
|
||||||
|
|
||||||
|
//...but use sourceFuncAddrs to ensure that source functions are in the
|
||||||
|
//funcs set
|
||||||
|
Set<Long> sourceFuncAddrs = new HashSet<>();
|
||||||
|
for (Function func : funcs) {
|
||||||
|
sourceFuncAddrs.add(func.getEntryPoint().getOffset());
|
||||||
|
}
|
||||||
|
Iterator<FunctionDescription> sourceDescripts =
|
||||||
|
gensig.getDescriptionManager().listAllFunctions();
|
||||||
|
VectorCompare vecCompare = new VectorCompare();
|
||||||
|
while (sourceDescripts.hasNext()) {
|
||||||
|
FunctionDescription srcDesc = sourceDescripts.next();
|
||||||
|
//skip if not in selection
|
||||||
|
if (!sourceFuncAddrs.contains(srcDesc.getAddress())) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
//skip if self-significance too small
|
||||||
|
LSHVector srcVector = srcDesc.getSignatureRecord().getLSHVector();
|
||||||
|
if (vectorFactory.getSelfSignificance(srcVector) <= SELF_SIGNIFICANCE_BOUND) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
Iterator<FunctionDescription> targetDescripts =
|
||||||
|
gensig.getDescriptionManager().listAllFunctions();
|
||||||
|
Function srcFunc = getFunction(currentProgram, srcDesc.getAddress());
|
||||||
|
while (targetDescripts.hasNext()) {
|
||||||
|
//skip if target before srcFunc in address order
|
||||||
|
//AND target is one of the source functions (i.e., in funcs)
|
||||||
|
FunctionDescription targetDesc = targetDescripts.next();
|
||||||
|
long targetAddress = targetDesc.getAddress();
|
||||||
|
if (sourceFuncAddrs.contains(targetAddress) &&
|
||||||
|
targetAddress <= srcDesc.getAddress()) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
//skip if self-significance too small
|
||||||
|
LSHVector targetVector = targetDesc.getSignatureRecord().getLSHVector();
|
||||||
|
if (vectorFactory.getSelfSignificance(targetVector) <= SELF_SIGNIFICANCE_BOUND) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
double sim = srcVector.compare(targetVector, vecCompare);
|
||||||
|
double sig = vectorFactory.calculateSignificance(vecCompare);
|
||||||
|
if (sig >= MATCH_CONFIDENCE_LOWER_BOUND && MATCH_SIMILARITY_LOWER_BOUND <= sim &&
|
||||||
|
sim <= MATCH_SIMILARITY_UPPER_BOUND) {
|
||||||
|
Function targetFunc = getFunction(currentProgram, targetDesc.getAddress());
|
||||||
|
bsimMatches.add(new LocalBSimMatch(srcFunc, targetFunc, sim, sig));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return bsimMatches;
|
||||||
|
}
|
||||||
|
|
||||||
|
private List<LocalBSimMatch> getMatchesTwoPrograms(Set<Function> srcFuncs,
|
||||||
|
Program sourceProgram, Program targetProgram) throws LSHException, DecompileException {
|
||||||
|
List<LocalBSimMatch> bsimMatches = new ArrayList<>();
|
||||||
|
LSHVectorFactory vectorFactory = getVectorFactory();
|
||||||
|
GenSignatures srcSigs =
|
||||||
|
generateSignatures(sourceProgram, srcFuncs.iterator(), srcFuncs.size(), vectorFactory);
|
||||||
|
FunctionManager targetFuncMan = targetProgram.getFunctionManager();
|
||||||
|
Iterator<Function> targetFuncIter = targetFuncMan.getFunctions(true);
|
||||||
|
GenSignatures targetSigs = generateSignatures(targetProgram, targetFuncIter,
|
||||||
|
targetFuncMan.getFunctionCount(), vectorFactory);
|
||||||
|
Iterator<FunctionDescription> sourceDescripts =
|
||||||
|
srcSigs.getDescriptionManager().listAllFunctions();
|
||||||
|
VectorCompare vecCompare = new VectorCompare();
|
||||||
|
while (sourceDescripts.hasNext()) {
|
||||||
|
FunctionDescription srcDesc = sourceDescripts.next();
|
||||||
|
//skip if self-significance too small
|
||||||
|
LSHVector srcVector = srcDesc.getSignatureRecord().getLSHVector();
|
||||||
|
if (vectorFactory.getSelfSignificance(srcVector) <= SELF_SIGNIFICANCE_BOUND) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
Iterator<FunctionDescription> targetDescripts =
|
||||||
|
targetSigs.getDescriptionManager().listAllFunctions();
|
||||||
|
Function srcFunc = getFunction(sourceProgram, srcDesc.getAddress());
|
||||||
|
while (targetDescripts.hasNext()) {
|
||||||
|
FunctionDescription targetDesc = targetDescripts.next();
|
||||||
|
//skip if self-significance too small
|
||||||
|
LSHVector targetVector = targetDesc.getSignatureRecord().getLSHVector();
|
||||||
|
if (vectorFactory.getSelfSignificance(targetVector) <= SELF_SIGNIFICANCE_BOUND) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
double sim = srcVector.compare(targetVector, vecCompare);
|
||||||
|
double sig = vectorFactory.calculateSignificance(vecCompare);
|
||||||
|
if (sig >= MATCH_CONFIDENCE_LOWER_BOUND && MATCH_SIMILARITY_LOWER_BOUND <= sim &&
|
||||||
|
sim <= MATCH_SIMILARITY_UPPER_BOUND) {
|
||||||
|
Function targetFunc = getFunction(targetProgram, targetDesc.getAddress());
|
||||||
|
bsimMatches.add(new LocalBSimMatch(srcFunc, targetFunc, sim, sig));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return bsimMatches;
|
||||||
|
}
|
||||||
|
|
||||||
|
private Function getFunction(Program program, long offset) {
|
||||||
|
Address addr = program.getAddressFactory().getDefaultAddressSpace().getAddress(offset);
|
||||||
|
return program.getFunctionManager().getFunctionAt(addr);
|
||||||
|
}
|
||||||
|
|
||||||
|
private LSHVectorFactory getVectorFactory() throws LSHException {
|
||||||
|
LSHVectorFactory vectorFactory = FunctionDatabase.generateLSHVectorFactory();
|
||||||
|
Configuration config = FunctionDatabase.loadConfigurationTemplate(TEMPLATE_NAME);
|
||||||
|
vectorFactory.set(config.weightfactory, config.idflookup, config.info.settings);
|
||||||
|
return vectorFactory;
|
||||||
|
}
|
||||||
|
|
||||||
|
private GenSignatures generateSignatures(Program program, Iterator<Function> funcs, int count,
|
||||||
|
LSHVectorFactory vectorFactory) throws LSHException, DecompileException {
|
||||||
|
GenSignatures gensig = new GenSignatures(false);
|
||||||
|
gensig.setVectorFactory(vectorFactory);
|
||||||
|
gensig.openProgram(program, null, null, null, null, null);
|
||||||
|
gensig.scanFunctions(funcs, count, monitor);
|
||||||
|
return gensig;
|
||||||
|
}
|
||||||
|
|
||||||
|
class LocalBSimMatch implements Comparable<LocalBSimMatch>, AddressableRowObject {
|
||||||
|
private Function sourceFunc;
|
||||||
|
private Function targetFunc;
|
||||||
|
private double similarity;
|
||||||
|
private double significance;
|
||||||
|
|
||||||
|
public LocalBSimMatch(Function sourceFunc, Function targetFunc, double sim, double signif) {
|
||||||
|
this.sourceFunc = sourceFunc;
|
||||||
|
this.targetFunc = targetFunc;
|
||||||
|
this.similarity = sim;
|
||||||
|
this.significance = signif;
|
||||||
|
}
|
||||||
|
|
||||||
|
public Function getSourceFunc() {
|
||||||
|
return sourceFunc;
|
||||||
|
}
|
||||||
|
|
||||||
|
public Function getTargetFunc() {
|
||||||
|
return targetFunc;
|
||||||
|
}
|
||||||
|
|
||||||
|
public double getSimilarity() {
|
||||||
|
return similarity;
|
||||||
|
}
|
||||||
|
|
||||||
|
public double getSignificance() {
|
||||||
|
return significance;
|
||||||
|
}
|
||||||
|
|
||||||
|
public Program getSourceProgram() {
|
||||||
|
return sourceFunc.getProgram();
|
||||||
|
}
|
||||||
|
|
||||||
|
public Program getTargetProgram() {
|
||||||
|
return targetFunc.getProgram();
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public int compareTo(LocalBSimQueryScript.LocalBSimMatch o) {
|
||||||
|
return -Double.compare(significance, o.significance);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public Address getAddress() {
|
||||||
|
return sourceFunc.getEntryPoint();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/****************************************************************************************
|
||||||
|
* table stuff
|
||||||
|
****************************************************************************************/
|
||||||
|
|
||||||
|
class CompareMatchesExecutor implements TableChooserExecutor {
|
||||||
|
|
||||||
|
private FunctionComparisonService compareService;
|
||||||
|
private FunctionComparisonProvider comparisonProvider;
|
||||||
|
|
||||||
|
public CompareMatchesExecutor() {
|
||||||
|
compareService = state.getTool().getService(FunctionComparisonService.class);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public String getButtonName() {
|
||||||
|
return "Compare Selected Matches";
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public boolean execute(AddressableRowObject rowObject) {
|
||||||
|
LocalBSimMatch match = (LocalBSimMatch) rowObject;
|
||||||
|
if (comparisonProvider == null) {
|
||||||
|
comparisonProvider =
|
||||||
|
compareService.compareFunctions(match.getSourceFunc(), match.getTargetFunc());
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
compareService.compareFunctions(match.getSourceFunc(), match.getTargetFunc(),
|
||||||
|
comparisonProvider);
|
||||||
|
}
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
private void initializeTable(Program sourceProgram, Program targetProgram) {
|
||||||
|
StringBuilder titleBuilder = new StringBuilder("Local BSim Matches: ");
|
||||||
|
titleBuilder.append(sourceProgram.getDomainFile().getPathname());
|
||||||
|
titleBuilder.append(" -> ");
|
||||||
|
titleBuilder.append(targetProgram.getDomainFile().getPathname());
|
||||||
|
tableDialog =
|
||||||
|
createTableChooserDialog(titleBuilder.toString(), new CompareMatchesExecutor());
|
||||||
|
configureTableColumns(tableDialog);
|
||||||
|
tableDialog.setMinimumSize(800, 400);
|
||||||
|
tableDialog.show();
|
||||||
|
tableDialog.setMessage(null);
|
||||||
|
}
|
||||||
|
|
||||||
|
private void configureTableColumns(TableChooserDialog dialog) {
|
||||||
|
|
||||||
|
ColumnDisplay<Double> simColumn = new AbstractComparableColumnDisplay<Double>() {
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public Double getColumnValue(AddressableRowObject rowObject) {
|
||||||
|
return ((LocalBSimMatch) rowObject).getSimilarity();
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public String getColumnName() {
|
||||||
|
return "Similarity";
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
ColumnDisplay<Double> sigColumn = new AbstractComparableColumnDisplay<Double>() {
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public Double getColumnValue(AddressableRowObject rowObject) {
|
||||||
|
return ((LocalBSimMatch) rowObject).getSignificance();
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public String getColumnName() {
|
||||||
|
return "Significance";
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
StringColumnDisplay sourceFuncColumn = new StringColumnDisplay() {
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public String getColumnValue(AddressableRowObject rowObject) {
|
||||||
|
return ((LocalBSimMatch) rowObject).getSourceFunc().getName(true);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public String getColumnName() {
|
||||||
|
return "Source Function";
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
StringColumnDisplay targetFuncColumn = new StringColumnDisplay() {
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public String getColumnValue(AddressableRowObject rowObject) {
|
||||||
|
return ((LocalBSimMatch) rowObject).getTargetFunc().getName(true);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public String getColumnName() {
|
||||||
|
return "Target Function";
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
dialog.addCustomColumn(simColumn);
|
||||||
|
dialog.addCustomColumn(sigColumn);
|
||||||
|
dialog.addCustomColumn(sourceFuncColumn);
|
||||||
|
dialog.addCustomColumn(targetFuncColumn);
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
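The two table-filling methods in the script above (`addMatchesTwoPrograms` and `addMatchesOneProgram`) cap the number of rows per source function after the matches have been sorted by decreasing significance. Here is a minimal sketch of that selection in plain Python, outside Ghidra. The `Match` tuple and both function names are hypothetical stand-ins for `LocalBSimMatch` and the Java methods, and functions are represented by plain strings.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for LocalBSimMatch.
Match = namedtuple("Match", ["source", "target", "similarity", "significance"])


def top_matches_two_programs(matches, limit):
    """Keep at most `limit` matches per source, highest significance first."""
    ordered = sorted(matches, key=lambda m: m.significance, reverse=True)
    counts = defaultdict(int)
    kept = []
    for match in ordered:
        if counts[match.source] >= limit:
            continue
        kept.append(match)
        counts[match.source] += 1
    return kept


def top_matches_one_program(matches, sources, limit):
    """Same cap, but each stored pair may serve either of its two functions.

    Pairs are assumed to be stored once; when the target is itself a source
    function, a switched copy of the pair is emitted for it.
    """
    ordered = sorted(matches, key=lambda m: m.significance, reverse=True)
    counts = defaultdict(int)
    kept = []
    for match in ordered:
        if match.source in sources and counts[match.source] < limit:
            kept.append(match)
            counts[match.source] += 1
        if match.target in sources and counts[match.target] < limit:
            kept.append(Match(match.target, match.source,
                              match.similarity, match.significance))
            counts[match.target] += 1
    return kept
```

Sorting once and counting per source avoids a separate sort for each function. The one-program variant relies on similarity and significance being symmetric, as the Javadoc above states.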
108
Ghidra/Features/BSim/ghidra_scripts/QueryFunction.java
Executable file
@ -0,0 +1,108 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
// Example of querying a BSim database about a single function
|
||||||
|
//@category BSim
|
||||||
|
|
||||||
|
import java.net.URL;
|
||||||
|
import java.util.Iterator;
|
||||||
|
|
||||||
|
import ghidra.app.script.GhidraScript;
|
||||||
|
import ghidra.features.bsim.query.*;
|
||||||
|
import ghidra.features.bsim.query.description.*;
|
||||||
|
import ghidra.features.bsim.query.protocol.*;
|
||||||
|
import ghidra.program.model.listing.Function;
|
||||||
|
|
||||||
|
|
||||||
|
public class QueryFunction extends GhidraScript {
|
||||||
|
|
||||||
|
//GenSignatures gensig;
|
||||||
|
//FunctionDatabase database;
|
||||||
|
private static final int MATCHES_PER_FUNC = 10;
|
||||||
|
private static final double SIMILARITY_BOUND = 0.7;
|
||||||
|
private static final double CONFIDENCE_BOUND = 0.0;
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void run() throws Exception {
|
||||||
|
if (currentProgram == null) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
Function func = this.getFunctionContaining(this.currentAddress);
|
||||||
|
if (func == null){
|
||||||
|
popup("No function selected!");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
String DATABASE_URL = askString("Enter Database URL", "URL");
|
||||||
|
URL url = BSimClientFactory.deriveBSimURL(DATABASE_URL);
|
||||||
|
try (FunctionDatabase database = BSimClientFactory.buildClient(url, false)) {
|
||||||
|
if (!database.initialize()) {
|
||||||
|
println(database.getLastError().message);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
GenSignatures gensig = new GenSignatures(false);
|
||||||
|
try {
|
||||||
|
gensig.setVectorFactory(database.getLSHVectorFactory());
|
||||||
|
gensig.openProgram(currentProgram, null, null, null, null, null);
|
||||||
|
|
||||||
|
DescriptionManager manager = gensig.getDescriptionManager();
|
||||||
|
gensig.scanFunction(func);
|
||||||
|
|
||||||
|
QueryNearest query = new QueryNearest();
|
||||||
|
query.manage = manager;
|
||||||
|
query.max = MATCHES_PER_FUNC;
|
||||||
|
query.thresh = SIMILARITY_BOUND;
|
||||||
|
query.signifthresh = CONFIDENCE_BOUND;
|
||||||
|
|
||||||
|
ResponseNearest response = query.execute(database);
|
||||||
|
if (response == null) {
|
||||||
|
println(database.getLastError().message);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
Iterator<SimilarityResult> iter = response.result.iterator();
|
||||||
|
StringBuffer buf = new StringBuffer();
|
||||||
|
while (iter.hasNext()) {
|
||||||
|
SimilarityResult sim = iter.next();
|
||||||
|
FunctionDescription base = sim.getBase();
|
||||||
|
ExecutableRecord exe = base.getExecutableRecord();
|
||||||
|
buf.append("\nExecutable: ")
|
||||||
|
.append(exe.getNameExec())
|
||||||
|
.append("\nFunction: ")
|
||||||
|
.append(base.getFunctionName())
|
||||||
|
.append('\n');
|
||||||
|
Iterator<SimilarityNote> subiter = sim.iterator();
|
||||||
|
while (subiter.hasNext()) {
|
||||||
|
SimilarityNote note = subiter.next();
|
||||||
|
FunctionDescription fdesc = note.getFunctionDescription();
|
||||||
|
ExecutableRecord exerec = fdesc.getExecutableRecord();
|
||||||
|
buf.append(" Executable: ");
|
||||||
|
buf.append(exerec.getNameExec())
|
||||||
|
.append("\n Matching Function name: ")
|
||||||
|
.append(fdesc.getFunctionName());
|
||||||
|
buf.append("\n Similarity: ").append(note.getSimilarity());
|
||||||
|
buf.append("\n Significance: ").append(note.getSignificance());
|
||||||
|
buf.append("\n\n");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
println(buf.toString());
|
||||||
|
}
|
||||||
|
finally {
|
||||||
|
gensig.dispose();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
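The query in the script above is bounded by a maximum number of matches (`query.max`) and a similarity threshold (`query.thresh`). Below is a rough sketch of what such a bounded nearest-match lookup does, in plain Python outside Ghidra. It assumes cosine similarity over sparse feature dictionaries as a stand-in for BSim's vector comparison and does not model the significance score. `cosine_similarity`, `query_nearest`, and the in-memory `database` dict are hypothetical, not BSim APIs.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors given as {feature: weight} dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def query_nearest(query_vec, database, max_results, thresh):
    """Return up to `max_results` (name, similarity) pairs with similarity >= thresh."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in database.items()]
    hits = [(name, sim) for name, sim in scored if sim >= thresh]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:max_results]
```

This sketch scans every stored vector, which keeps the two bounds easy to see; a real database backend would use an index instead.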
78
Ghidra/Features/BSim/ghidra_scripts/QueryFunction.py
Executable file
@ -0,0 +1,78 @@
|
|||||||
|
## ###
|
||||||
|
# IP: GHIDRA
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
##
|
||||||
|
# Example of performing a BSim query on a single function
|
||||||
|
# @category BSim.python
|
||||||
|
|
||||||
|
import ghidra.query.BSimClientFactory as BSimClientFactory
|
||||||
|
import ghidra.query.GenSignatures as GenSignatures
|
||||||
|
import ghidra.query.protocol.QueryNearest as QueryNearest
|
||||||
|
|
||||||
|
MATCHES_PER_FUNC = 100
|
||||||
|
SIMILARITY_BOUND = 0.7
|
||||||
|
CONFIDENCE_BOUND = 0.0
|
||||||
|
|
||||||
|
def query(func):
|
||||||
|
DATABASE_URL = askString("Enter Database URL", "URL")
|
||||||
|
url = BSimClientFactory.deriveBSimURL(DATABASE_URL)
|
||||||
|
database = BSimClientFactory.buildClient(url,False)
|
||||||
|
if not database.initialize():
|
||||||
|
print database.getLastError().message
|
||||||
|
return
|
||||||
|
gensig = GenSignatures(False)
|
||||||
|
gensig.setVectorFactory(database.getLSHVectorFactory())
|
||||||
|
gensig.openProgram(currentProgram,None,None,None,None,None)
|
||||||
|
|
||||||
|
gensig.scanFunction(func)
|
||||||
|
|
||||||
|
query = QueryNearest()
|
||||||
|
query.manage = gensig.getDescriptionManager()
|
||||||
|
query.max = MATCHES_PER_FUNC
|
||||||
|
query.thresh = SIMILARITY_BOUND
|
||||||
|
query.signifthresh = CONFIDENCE_BOUND
|
||||||
|
|
||||||
|
response = database.query(query)
|
||||||
|
if response is None:
|
||||||
|
print database.getLastError().message
|
||||||
|
return
|
||||||
|
simIter = response.result.iterator()
|
||||||
|
while simIter.hasNext():
|
||||||
|
sim = simIter.next()
|
||||||
|
base = sim.getBase()
|
||||||
|
exe = base.getExecutableRecord()
|
||||||
|
print "Source executable: %s; source function: %s" % (exe.getNameExec(),base.getFunctionName())
|
||||||
|
subIter = sim.iterator()
|
||||||
|
while subIter.hasNext():
|
||||||
|
note = subIter.next()
|
||||||
|
fdesc = note.getFunctionDescription()
|
||||||
|
exerec = fdesc.getExecutableRecord()
|
||||||
|
print " Executable: %s" % exerec.getNameExec()
|
||||||
|
print " Matching Function name: %s " % fdesc.getFunctionName()
|
||||||
|
print " Similarity: %f" % note.getSimilarity()
|
||||||
|
print " Significance: %f\n" % note.getSignificance()
|
||||||
|
gensig.dispose()
|
||||||
|
database.close()
|
||||||
|
return
|
||||||
|
|
||||||
|
if currentProgram is None:
|
||||||
|
popup("currentProgram is None!")
|
||||||
|
else:
|
||||||
|
func = currentProgram.getFunctionManager().getFunctionContaining(currentAddress)
|
||||||
|
if func is None:
|
||||||
|
popup("Cursor must be in a function!")
|
||||||
|
else:
|
||||||
|
query(func)
|
||||||
|
|
||||||
|
|
333
Ghidra/Features/BSim/ghidra_scripts/QueryWithFiltersScript.java
Executable file
@ -0,0 +1,333 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
//Example of a script to perform a more involved BSim query.
|
||||||
|
//@category BSim
|
||||||
|
import java.util.*;
|
||||||
|
import java.util.function.BiPredicate;
|
||||||
|
|
||||||
|
import ghidra.app.script.GhidraScript;
|
||||||
|
import ghidra.features.bsim.gui.filters.*;
|
||||||
|
import ghidra.features.bsim.gui.search.results.BSimMatchResult;
|
||||||
|
import ghidra.features.bsim.gui.search.results.ExecutableResult;
|
||||||
|
import ghidra.features.bsim.query.FunctionDatabase;
|
||||||
|
import ghidra.features.bsim.query.FunctionDatabase.ErrorCategory;
|
||||||
|
import ghidra.features.bsim.query.description.FunctionDescription;
|
||||||
|
import ghidra.features.bsim.query.facade.*;
|
||||||
|
import ghidra.features.bsim.query.protocol.BSimFilter;
|
||||||
|
import ghidra.features.bsim.query.protocol.PreFilter;
|
||||||
|
import ghidra.program.database.symbol.FunctionSymbol;
|
||||||
|
import ghidra.program.model.address.Address;
|
||||||
|
import ghidra.program.model.listing.*;
|
||||||
|
import ghidra.program.model.symbol.SourceType;
|
||||||
|
import ghidra.util.exception.CancelledException;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Script showing how to apply filters to a BSim query. Currently we support three types
|
||||||
|
* of filters, described below:
|
||||||
|
*
|
||||||
|
* 1. QUERY THRESHOLDS
|
||||||
|
* These are the items at the top of the BSim query dialog:
|
||||||
|
* Similarity
|
||||||
|
* Confidence
|
||||||
|
* Matches per Function
|
||||||
|
* These are server-side filters that will be applied when the db is queried.
|
||||||
|
*
|
||||||
|
* 2. PREFILTERS
|
||||||
|
* Allows users to identify functions that meet certain criteria by specifying
|
||||||
|
* {@link BiPredicate}s. Any functions matching the predicate(s) will be included
|
||||||
|
* in the result set.
|
||||||
|
*
|
||||||
|
* 3. EXECUTABLE FILTERS
|
||||||
|
* These are predefined filters that can be applied on the server or on the
|
||||||
|
* client (applied only to the results of a query). On the BSim query
|
||||||
|
* dialog these are the items in the filter pulldown menu.
|
||||||
|
* @see BSimFilterType
|
||||||
|
*
|
||||||
|
* SCRIPT FLOW
|
||||||
|
* This example script does the following:
|
||||||
|
*
|
||||||
|
* 1) Set threshold filters
|
||||||
|
* 2) Set prefilters
|
||||||
|
* 3) Set executable filters
|
||||||
|
* 4) Query the database & print results
|
||||||
|
* 5) Set new executable filters
|
||||||
|
* 6) Print results
|
||||||
|
*
|
||||||
|
* NOTES: 1. You will be queried for the location of the BSim database. This URL
|
||||||
|
* will take the form "ghidra://<ip address>/<database name>"
|
||||||
|
*
|
||||||
|
* 2. This script is only an example - the specific filters demonstrated
|
||||||
|
* here will not necessarily apply to what's in your BSim database.
|
||||||
|
*
|
||||||
|
*/
|
||||||
|
public class QueryWithFiltersScript extends GhidraScript {
|
||||||
|
|
||||||
|
// Threshold settings.
|
||||||
|
private static final int MAX_NUM_FUNCTIONS = 100;
|
||||||
|
private static final double SIMILARITY_BOUND = 0.7;
|
||||||
|
private static final double SIGNIFICANCE_BOUND = 0.0;
|
||||||
|
|
||||||
|
// Restricts the number of results.
|
||||||
|
private static final int NUM_EXES_TO_DISPLAY = 10;
|
||||||
|
|
||||||
|
// Prefilter value we'll be setting.
|
||||||
|
private static final double SELF_SIGNIFICANCE_BOUND = 40.0;
|
||||||
|
|
||||||
|
private HashSet<FunctionSymbol> funcsToQuery;
|
||||||
|
private SimilarFunctionQueryService queryService;
|
||||||
|
private SFQueryInfo queryInfo;
|
||||||
|
private BSimFilter bsimFilter;
|
||||||
|
|
||||||
|
@Override
|
||||||
|
protected void run() throws Exception {
|
||||||
|
|
||||||
|
funcsToQuery = getFunctionsToQuery(currentProgram);
|
||||||
|
queryService = new SimilarFunctionQueryService(currentProgram);
|
||||||
|
queryInfo = new SFQueryInfo(funcsToQuery);
|
||||||
|
bsimFilter = queryInfo.getBsimFilter();
|
||||||
|
|
||||||
|
// Add threshold filters.
|
||||||
|
queryInfo.setMaximumResults(MAX_NUM_FUNCTIONS);
|
||||||
|
queryInfo.setSimilarityThreshold(SIMILARITY_BOUND);
|
||||||
|
queryInfo.setSignificanceThreshold(SIGNIFICANCE_BOUND);
|
||||||
|
|
||||||
|
// Add prefilters.
|
||||||
|
setPrefilters();
|
||||||
|
|
||||||
|
// Add a simple date filter.
|
||||||
|
addBsimFilter(new DateLaterBSimFilterType(""), "01/01/1776");
|
||||||
|
|
||||||
|
// Demonstration of a filter that allows for multiple entries. All filters but the
|
||||||
|
// DateEarlier and DateLater allow this. The effect is that each filter will be OR'd
|
||||||
|
// with the others. This is effectively the same as creating three distinct ArchEquals filters.
|
||||||
|
//
|
||||||
|
// ie: "The architecture can equal x86:LE:64:default OR the architecture can equal
|
||||||
|
// ARM:LE:32:v4 OR ...."
|
||||||
|
addBsimFilter(new ArchitectureBSimFilterType(),
|
||||||
|
"x86:LE:64:default, x86:LE:32:default, ARM:LE:32:v4");
|
||||||
|
|
||||||
|
// Another filter with multiple entries, but in this case since it is a "NotEqual" filter,
|
||||||
|
// the items are AND'd together.
|
||||||
|
//
|
||||||
|
// ie: "The compiler cannot equal windows AND the compiler cannot equal foo_compiler".
|
||||||
|
addBsimFilter(new NotCompilerBSimFilterType(), "windows, foo_compiler");
|
||||||
|
|
||||||
|
//connect to the database
|
||||||
|
try {
|
||||||
|
String dbUrl =
|
||||||
|
askString("", "Enter the URL of the BSim database:", "ghidra://localhost/bsimDb");
|
||||||
|
queryService.initializeDatabase(dbUrl);
|
||||||
|
FunctionDatabase.Error error = queryService.getLastError();
|
||||||
|
if (error != null && error.category == ErrorCategory.Nodatabase) {
|
||||||
|
println("Database [" + dbUrl + "] cannot be found (does it exist?)");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
catch (QueryDatabaseException e) {
|
||||||
|
println(e.getMessage());
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Execute query and print results.
|
||||||
|
List<BSimMatchResult> resultRows = executeQuery(queryInfo);
|
||||||
|
printFunctionQueryResults(resultRows, "\nFunction-level results before filtering");
|
||||||
|
|
||||||
|
// Add some simple post-query filters. These filters will only be applied to the result
|
||||||
|
// set returned from the previous query.
|
||||||
|
addBsimFilter(new Md5BSimFilterType(), currentProgram.getExecutableMD5());
|
||||||
|
addBsimFilter(new CompilerBSimFilterType(), "gcc");
|
||||||
|
addBsimFilter(new FunctionTagBSimFilterType("KNOWN_LIBRARY", queryService),
|
||||||
|
"false");
|
||||||
|
|
||||||
|
// Apply the filters and print results.
|
||||||
|
List<BSimMatchResult> filteredRows =
|
||||||
|
BSimMatchResult.filterMatchRows(bsimFilter, resultRows);
|
||||||
|
printFunctionQueryResults(filteredRows, "\nFunction-level results after filtering");
|
||||||
|
printExecutableInformation(filteredRows);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void cleanup(boolean success) {
|
||||||
|
if (queryService != null) {
|
||||||
|
queryService.dispose();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/***********************************************************************
|
||||||
|
* PRIVATE METHODS
|
||||||
|
***********************************************************************/
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Adds a filter to the given filter container.
|
||||||
|
*
|
||||||
|
* @param filterTemplate the filter type to add
|
||||||
|
* @param value the value of the filter
|
||||||
|
*/
|
||||||
|
private void addBsimFilter(BSimFilterType filterTemplate, String value) {
|
||||||
|
String[] inputs = value.split(",");
|
||||||
|
for (String input : inputs) {
|
||||||
|
if (!input.trim().isEmpty()) {
|
||||||
|
bsimFilter.addAtom(filterTemplate, input.trim());
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Queries the database and returns the results.
|
||||||
|
*
|
||||||
|
* @param qInfo contains all information required for the query
|
||||||
|
* @return list of matches
|
||||||
|
* @throws QueryDatabaseException if there is a problem executing the similar functions query
|
||||||
|
* @throws CancelledException if the user cancelled the operation
|
||||||
|
*/
|
||||||
|
private List<BSimMatchResult> executeQuery(SFQueryInfo qInfo)
|
||||||
|
throws QueryDatabaseException, CancelledException {
|
||||||
|
|
||||||
|
SFQueryResult queryResults = queryService.querySimilarFunctions(qInfo, null, monitor);
|
||||||
|
List<BSimMatchResult> resultRows =
|
||||||
|
BSimMatchResult.generate(queryResults.getSimilarityResults(), currentProgram);
|
||||||
|
|
||||||
|
return resultRows;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Creates predicates that will be used to filter out functions. This example provides three
|
||||||
|
* different methods of doing this:
|
||||||
|
*
|
||||||
|
* - anonymous class
|
||||||
|
* - lambda
|
||||||
|
* - static method
|
||||||
|
*
|
||||||
|
* These are all possible because the filter takes a {@link BiPredicate}, which is a
|
||||||
|
* functional interface.
|
||||||
|
*
|
||||||
|
*/
|
||||||
|
private void setPrefilters() {
|
||||||
|
|
||||||
|
PreFilter preFilter = queryInfo.getPreFilter();
|
||||||
|
|
||||||
|
//
|
||||||
|
// Option 1: Anonymous class
|
||||||
|
// Filters out any functions with a self significance less than a
|
||||||
|
// certain value.
|
||||||
|
//
|
||||||
|
preFilter.addPredicate(new BiPredicate<Program, FunctionDescription>() {
|
||||||
|
@Override
|
||||||
|
public boolean test(Program t, FunctionDescription u) {
|
||||||
|
return queryService.getLSHVectorFactory()
|
||||||
|
.getSelfSignificance(
|
||||||
|
u.getSignatureRecord().getLSHVector()) >= SELF_SIGNIFICANCE_BOUND;
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
//
|
||||||
|
// Option 2. Lambda expression
|
||||||
|
// Filters out any functions with a self significance less than a
|
||||||
|
// certain value.
|
||||||
|
//
|
||||||
|
preFilter.addPredicate((x, y) -> queryService.getLSHVectorFactory()
|
||||||
|
.getSelfSignificance(
|
||||||
|
y.getSignatureRecord().getLSHVector()) >= SELF_SIGNIFICANCE_BOUND);
|
||||||
|
|
||||||
|
//
|
||||||
|
// Option 3. Static method
|
||||||
|
// Filters out any functions that are of type ANALYSIS.
|
||||||
|
//
|
||||||
|
preFilter.addPredicate(QueryWithFiltersScript::isNotAnalysisSourceType);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns a set of ALL functions (no stubs) in the given program.
|
||||||
|
*
|
||||||
|
* @param program the program to get the functions from
|
||||||
|
* @return set of function symbols
|
||||||
|
*/
|
||||||
|
private HashSet<FunctionSymbol> getFunctionsToQuery(Program program) {
|
||||||
|
HashSet<FunctionSymbol> functions = new HashSet<>();
|
||||||
|
FunctionIterator fIter = program.getFunctionManager().getFunctionsNoStubs(true);
|
||||||
|
for (Function func : fIter) {
|
||||||
|
functions.add((FunctionSymbol) func.getSymbol());
|
||||||
|
}
|
||||||
|
return functions;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns true if the given function is NOT an analysis type.
|
||||||
|
*
|
||||||
|
* @param program the current program
|
||||||
|
* @param funcDesc the function description object
|
||||||
|
* @return true if the symbol is NOT an analysis source type
|
||||||
|
*/
|
||||||
|
public static boolean isNotAnalysisSourceType(Program program, FunctionDescription funcDesc) {
|
||||||
|
Address address =
|
||||||
|
program.getAddressFactory().getDefaultAddressSpace().getAddress(funcDesc.getAddress());
|
||||||
|
|
||||||
|
Function function = program.getFunctionManager().getFunctionAt(address);
|
||||||
|
if (function == null || function.getName().equals(funcDesc.getFunctionName())) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
return function.getSymbol().getSource() != SourceType.ANALYSIS;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Prints a sorted list of executables represented in the function matches.
|
||||||
|
*
|
||||||
|
* @param filteredRows list of function results
|
||||||
|
*/
|
||||||
|
private void printExecutableInformation(List<BSimMatchResult> filteredRows) {
|
||||||
|
|
||||||
|
TreeSet<ExecutableResult> execrows = ExecutableResult.generateFromMatchRows(filteredRows);
|
||||||
|
ExecutableResult[] results = new ExecutableResult[execrows.size()];
|
||||||
|
results = execrows.toArray(results);
|
||||||
|
|
||||||
|
Arrays.sort(results, new Comparator<ExecutableResult>() {
|
||||||
|
@Override
|
||||||
|
public int compare(ExecutableResult o1, ExecutableResult o2) {
|
||||||
|
return Double.compare(o2.getSignificanceSum(), o1.getSignificanceSum());
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
printf("Executable-level results:\n");
|
||||||
|
for (int i = 0, max = Math.min(NUM_EXES_TO_DISPLAY, results.length); i < max; ++i) {
|
||||||
|
printf(" MD5: %s\n", results[i].getExecutableRecord().getMd5());
|
||||||
|
printf(" Executable Name: %s\n", results[i].getExecutableRecord().getNameExec());
|
||||||
|
printf(" Function Count: %d\n", results[i].getFunctionCount());
|
||||||
|
printf(" Significance Sum: %f\n\n", results[i].getSignificanceSum());
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Prints information about each function in the result set.
|
||||||
|
*
|
||||||
|
* @param resultRows the list of rows containing the info to print
|
||||||
|
* @param title the title to print
|
||||||
|
*/
|
||||||
|
private void printFunctionQueryResults(List<BSimMatchResult> resultRows, String title) {
|
||||||
|
printf(title + ": (%d)\n\n", resultRows.size());
|
||||||
|
for (BSimMatchResult resultRow : resultRows) {
|
||||||
|
printf(" queried function: %s\n",
|
||||||
|
resultRow.getOriginalFunctionDescription().getFunctionName());
|
||||||
|
printf(" matching function: %s\n",
|
||||||
|
resultRow.getMatchFunctionDescription().getFunctionName());
|
||||||
|
printf(" executable of matching function: %s\n",
|
||||||
|
resultRow.getMatchFunctionDescription().getExecutableRecord().getNameExec());
|
||||||
|
printf(" similarity: %f\n", resultRow.getSimilarity());
|
||||||
|
printf(" significance: %f\n\n", resultRow.getSignificance());
|
||||||
|
}
|
||||||
|
printf("\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
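The comments in QueryWithFiltersScript.java describe two combination rules for multi-valued filters: the entries of an "Equals" filter are OR'd together, while the entries of a "NotEqual" filter are AND'd together. The stand-alone sketch below illustrates those two rules, plus the comma splitting done by `addBsimFilter`, using plain strings. It does not call the BSim API: `FilterAtomSketch`, `splitAtoms`, `passesEquals`, and `passesNotEquals` are hypothetical names introduced only for illustration, and exact string equality is an assumption standing in for whatever comparison the database actually performs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; none of these names belong to the BSim API. */
public class FilterAtomSketch {

	/** Splits a comma-separated filter value into trimmed, non-empty entries. */
	static List<String> splitAtoms(String value) {
		List<String> atoms = new ArrayList<>();
		for (String input : value.split(",")) {
			String trimmed = input.trim();
			if (!trimmed.isEmpty()) {
				atoms.add(trimmed);
			}
		}
		return atoms;
	}

	/** "Equals" entries are OR'd: the candidate passes if it matches any entry. */
	static boolean passesEquals(String candidate, List<String> atoms) {
		return atoms.stream().anyMatch(candidate::equals);
	}

	/** "NotEqual" entries are AND'd: the candidate passes only if it matches no entry. */
	static boolean passesNotEquals(String candidate, List<String> atoms) {
		return atoms.stream().noneMatch(candidate::equals);
	}

	public static void main(String[] args) {
		List<String> archs = splitAtoms("x86:LE:64:default, x86:LE:32:default, ARM:LE:32:v4");
		List<String> compilers = splitAtoms("windows, foo_compiler");

		System.out.println(passesEquals("ARM:LE:32:v4", archs));      // prints true
		System.out.println(passesEquals("MIPS:BE:32:default", archs)); // prints false
		System.out.println(passesNotEquals("gcc", compilers));         // prints true
		System.out.println(passesNotEquals("windows", compilers));     // prints false
	}
}
```

Note the asymmetry for an empty entry list in this sketch: `passesEquals` rejects every candidate, while `passesNotEquals` accepts every candidate.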
173
Ghidra/Features/BSim/ghidra_scripts/QueryWithFiltersScript.py
Executable file
@ -0,0 +1,173 @@
|
|||||||
|
## ###
|
||||||
|
# IP: GHIDRA
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
##
|
||||||
|
# Advanced example of BSim querying
|
||||||
|
# @category BSim.python
|
||||||
|
|
||||||
|
import ghidra.query.facade.SimilarFunctionQueryService as SimilarFunctionQueryService
|
||||||
|
import ghidra.query.facade.SFQueryInfo as SFQueryInfo
|
||||||
|
import ghidra.query.FunctionDatabase as FunctionDatabase
|
||||||
|
import ghidra.query.facade.QueryDatabaseException as QueryDatabaseException
|
||||||
|
import java.util.HashSet as HashSet
|
||||||
|
import ghidra.app.plugin.core.query.QueryNearestRow as QueryNearestRow
|
||||||
|
import java.util.function.BiPredicate as BiPredicate
|
||||||
|
import ghidra.query.protocol.FilterTemplate as FilterTemplate
|
||||||
|
import ghidra.app.plugin.core.query.ExecutableResult as ExecutableResult
|
||||||
|
import java.util.Comparator as Comparator
|
||||||
|
import java.util.Arrays as Arrays
|
||||||
|
import java.lang.Double as Double
|
||||||
|
|
||||||
|
#Query thresholds
|
||||||
|
MAX_NUM_FUNCTIONS = 100
|
||||||
|
SIMILARITY_BOUND = 0.7
|
||||||
|
SIGNIFICANCE_BOUND = 0.0
|
||||||
|
|
||||||
|
#limit the number of results displayed
|
||||||
|
NUM_EXES_TO_DISPLAY = 10
|
||||||
|
|
||||||
|
#for prefiltering: this number will be used to filter out small functions
|
||||||
|
SELF_SIGNIFICANCE_BOUND = 40.0
|
||||||
|
|
||||||
|
def run():
|
||||||
|
|
||||||
|
#get the set of functions to query
|
||||||
|
funcsToQuery = getFunctionsToQuery()
|
||||||
|
|
||||||
|
#sets up the object required for querying the database
|
||||||
|
queryService = SimilarFunctionQueryService(currentProgram)
|
||||||
|
queryInfo = SFQueryInfo(funcsToQuery)
|
||||||
|
bsimFilter = queryInfo.getBsimFilter()
|
||||||
|
|
||||||
|
#sets the query parameters.
|
||||||
|
#change the defined constants to control how fuzzy of
|
||||||
|
#a match you're willing to accept, and the maximum number
|
||||||
|
#of matches to return for each function
|
||||||
|
queryInfo.setMaximumResults(MAX_NUM_FUNCTIONS)
|
||||||
|
queryInfo.setSimilarityThreshold(SIMILARITY_BOUND)
|
||||||
|
queryInfo.setSignificanceThreshold(SIGNIFICANCE_BOUND)
|
||||||
|
|
||||||
|
#add the prefilters
|
||||||
|
setPrefilters(queryService, queryInfo)
|
||||||
|
|
||||||
|
#add a filter on the date
|
||||||
|
addBsimFilter(bsimFilter, FilterTemplate.DateLater(""), "01/01/1776")
|
||||||
|
|
||||||
|
#add a filter with multiple values. Since this is an "Equal" filter, the results are OR'd together
|
||||||
|
#so a given executable will pass the main filter if it passes at least one of the subfilters
|
||||||
|
addBsimFilter(bsimFilter, FilterTemplate.ArchEquals(),"x86:LE:64:default, x86:LE:32:default, ARM:LE:32:v4")
|
||||||
|
|
||||||
|
#now add a "notequal" filter
|
||||||
|
#to pass, the compiler can't be windows and it can't be foo_compiler
|
||||||
|
addBsimFilter(bsimFilter,FilterTemplate.CompNotEqual(),"windows, foo_compiler")
|
||||||
|
|
||||||
|
#establish a connection to the BSim database
|
||||||
|
try:
|
||||||
|
dbUrl = askString("","Enter the URL of the BSim database:", "ghidra://localhost/bsimDB")
|
||||||
|
queryService.initializeDatabase(dbUrl)
|
||||||
|
error = queryService.getDatabase().getLastError()
|
||||||
|
if error is not None and (error.category == FunctionDatabase.ErrorCategory.Nodatabase):
|
||||||
|
print "Database [%s] cannot be found (does it exist?)" % dbUrl
|
||||||
|
return
|
||||||
|
except QueryDatabaseException as e:
|
||||||
|
print e.getMessage()
|
||||||
|
return
|
||||||
|
|
||||||
|
resultRows = executeQuery(queryService,queryInfo)
|
||||||
|
printFunctionQueryResults(resultRows, "\nFunction-level results before filtering")
|
||||||
|
|
||||||
|
#now add some post-query filters, which filter the result set returned by the previous query
|
||||||
|
|
||||||
|
addBsimFilter(bsimFilter, FilterTemplate.Md5NotEqual(), currentProgram.getExecutableMD5())
|
||||||
|
addBsimFilter(bsimFilter, FilterTemplate.CompilerEquals(), "gcc")
|
||||||
|
addBsimFilter(bsimFilter, FilterTemplate.FunctionTagTemplate("KNOWN_LIBRARY", queryService), "false")
|
||||||
|
|
||||||
|
#apply the filters and print the results
|
||||||
|
filteredRows = QueryNearestRow.filterMatchRows(bsimFilter, resultRows)
|
||||||
|
printFunctionQueryResults(filteredRows, "\nFunction-level results after filtering")
|
||||||
|
printExecutableInformation(filteredRows)
|
||||||
|
return
|
||||||
|
|
||||||
|
|
||||||
|
#collect the functions to query from currentProgram
|
||||||
|
def getFunctionsToQuery():
|
||||||
|
functions = HashSet()
|
||||||
|
fIter = currentProgram.getFunctionManager().getFunctionsNoStubs(True)
|
||||||
|
for func in fIter:
|
||||||
|
functions.add(func.getSymbol())
|
||||||
|
return functions
|
||||||
|
|
||||||
|
#query the database
|
||||||
|
def executeQuery(queryService,queryInfo):
|
||||||
|
queryResults = queryService.querySimilarFunctions(queryInfo,monitor)
|
||||||
|
resultRows = QueryNearestRow.generate(queryResults.getSimilarityResults(),currentProgram)
|
||||||
|
return resultRows
|
||||||
|
|
||||||
|
def printFunctionQueryResults(resultRows, title):
|
||||||
|
print "%s: %d\n\n" % (title, resultRows.size())
|
||||||
|
for row in resultRows:
|
||||||
|
print " queried function: %s" % row.getOriginalFunctionDescription().getFunctionName()
|
||||||
|
print " matching function: %s" % row.getMatchFunctionDescription().getFunctionName()
|
||||||
|
print " executable of matching function: %s" % row.getMatchFunctionDescription().getExecutableRecord().getNameExec()
|
||||||
|
print " similarity: %f" % row.getSimilarity()
|
||||||
|
print " significance: %f\n" % row.getSignificance()
|
||||||
|
|
||||||
|
#Prefilters are used to filter out functions before sending a query to the database
|
||||||
|
#A typical use case would be to collect all functions in a binary, then use a
|
||||||
|
#prefilter to remove the functions with low self-significance (which is the
|
||||||
|
#"BSim way" to remove small functions)
|
||||||
|
def setPrefilters(queryService, queryInfo):
|
||||||
|
preFilter = queryInfo.getPreFilter()
|
||||||
|
selfSigFilter = ExampleFilter(queryService)
|
||||||
|
preFilter.addPredicate(selfSigFilter)
|
||||||
|
|
||||||
|
class ExampleFilter(BiPredicate):
|
||||||
|
|
||||||
|
def __init__(self, queryService):
|
||||||
|
self.queryService = queryService
|
||||||
|
|
||||||
|
def test(self,program, fdesc):
|
||||||
|
return self.queryService.getLSHVectorFactory().getSelfSignificance(fdesc.getSignatureRecord().getLSHVector()) >= SELF_SIGNIFICANCE_BOUND
|
||||||
|
|
||||||
|
def addBsimFilter(bsimFilter, filterTemplate, values):
|
||||||
|
for value in values.split(","):
|
||||||
|
if len(value.strip()) > 0:
|
||||||
|
bsimFilter.addAtom(filterTemplate, value.strip(), FilterTemplate.Blank())
|
||||||
|
|
||||||
|
#calls the methods to aggregate executable-level information about the matches
|
||||||
|
def printExecutableInformation(filteredRows):
|
||||||
|
execrows = ExecutableResult.generateFromMatchRows(filteredRows)
|
||||||
|
results = execrows.toArray()
|
||||||
|
sorter = Sorter()
|
||||||
|
Arrays.sort(results,sorter)
|
||||||
|
print "Executable-level results:"
|
||||||
|
numExes = min(len(results),NUM_EXES_TO_DISPLAY)
|
||||||
|
for i in range (numExes):
|
||||||
|
print " MD5: %s" % results[i].getExecutableRecord().getMd5()
|
||||||
|
print " Executable Name: %s" % results[i].getExecutableRecord().getNameExec()
|
||||||
|
print " Function Count: %d" % results[i].getFunctionCount()
|
||||||
|
print " Significance Sum: %f\n" % results[i].getSignificanceSum()
|
||||||
|
return
|
||||||
|
|
||||||
|
class Sorter(Comparator):
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
return
|
||||||
|
|
||||||
|
def compare(self,o1,o2):
|
||||||
|
return Double.compare(o2.getSignificanceSum(), o1.getSignificanceSum())
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
run()
|
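Both versions of QueryWithFiltersScript finish by sorting the executable-level results in descending order of significance sum and printing at most `NUM_EXES_TO_DISPLAY` of them. A minimal stand-alone sketch of that ranking step follows. It assumes a simplified row type: `ExecutableRankingSketch`, `ExeSummary`, and `topBySignificance` are hypothetical names and not part of BSim, and `Comparator.comparingDouble(...).reversed()` is used here in place of the anonymous `Comparator` in the scripts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration only; ExeSummary stands in for an executable-level result row. */
public class ExecutableRankingSketch {

	static final class ExeSummary {
		final String name;
		final double significanceSum;

		ExeSummary(String name, double significanceSum) {
			this.name = name;
			this.significanceSum = significanceSum;
		}
	}

	/** Returns at most {@code limit} rows, highest significance sum first. */
	static List<ExeSummary> topBySignificance(List<ExeSummary> rows, int limit) {
		// Copy first so the caller's list is left in its original order.
		List<ExeSummary> sorted = new ArrayList<>(rows);
		sorted.sort(Comparator.comparingDouble((ExeSummary r) -> r.significanceSum).reversed());
		return sorted.subList(0, Math.min(limit, sorted.size()));
	}

	public static void main(String[] args) {
		List<ExeSummary> rows = new ArrayList<>();
		rows.add(new ExeSummary("libfoo.so", 120.5));
		rows.add(new ExeSummary("bar.exe", 310.0));
		rows.add(new ExeSummary("baz", 42.0));

		for (ExeSummary row : topBySignificance(rows, 2)) {
			System.out.printf("Executable Name: %s%n", row.name);
			System.out.printf("Significance Sum: %f%n%n", row.significanceSum);
		}
	}
}
```

With the sample rows above, `bar.exe` is listed first and `baz` is dropped because the limit is 2.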
@ -0,0 +1,45 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
import org.apache.commons.lang3.StringUtils;
|
||||||
|
|
||||||
|
import ghidra.app.script.GhidraScript;
|
||||||
|
import ghidra.framework.options.Options;
|
||||||
|
import ghidra.program.model.listing.Program;
|
||||||
|
|
||||||
|
//@category BSim
|
||||||
|
//sets a property on the current program which can be used as
|
||||||
|
//an executable category in BSim
|
||||||
|
public class SetExecutableCategoryScript extends GhidraScript {
|
||||||
|
|
||||||
|
@Override
|
||||||
|
protected void run() throws Exception {
|
||||||
|
if (currentProgram == null) {
|
||||||
|
popup("This script requires a program");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
Options opts = currentProgram.getOptions(Program.PROGRAM_INFO);
|
||||||
|
String name = askString("Enter Property Name", "Name");
|
||||||
|
if (StringUtils.isAllBlank(name)) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
String value = askString("Enter Value of Property " + name, "Value");
|
||||||
|
if (StringUtils.isAllBlank(value)) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
opts.setString(name, value);
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
56
Ghidra/Features/BSim/ghidra_scripts/TailoredAnalysis.java
Executable file
@ -0,0 +1,56 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
import ghidra.app.script.GhidraScript;
|
||||||
|
import ghidra.framework.options.Options;
|
||||||
|
import ghidra.program.model.listing.Program;
|
||||||
|
|
||||||
|
// Set up tailored auto-analysis (in place of the headless analyzer's full auto-analysis)
|
||||||
|
// suitable for BSim ingest process. Intended to be invoked as an analyzeHeadless -preScript
|
||||||
|
//@category BSim
|
||||||
|
|
||||||
|
public class TailoredAnalysis extends GhidraScript {
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void run() throws Exception {
|
||||||
|
Options pl = currentProgram.getOptions(Program.ANALYSIS_PROPERTIES);
|
||||||
|
pl.setBoolean("Decompiler Parameter ID", false);
|
||||||
|
|
||||||
|
// These analyzers generate lots of cross references, which are not necessary for
|
||||||
|
// signature analysis, and take time to run. On the other hand, you may want
|
||||||
|
// them in general to facilitate general analysis
|
||||||
|
pl.setBoolean("Stack", false);
|
||||||
|
// pl.setBoolean("Windows x86 PE Instruction References", false);
|
||||||
|
// pl.setBoolean("Windows x86 PE C++", false);
|
||||||
|
// pl.setBoolean("Windows x86 PE Preliminary", false);
|
||||||
|
// pl.setBoolean("ELF Scalar Operand References", false);
|
||||||
|
|
||||||
|
// Mangled symbols are good information but you may not be able to count on them being present in all versions
|
||||||
|
// Options analyzerOptions = pl.getOptions("Demangler");
|
||||||
|
// analyzerOptions.setBoolean("Commit Function Signatures", false);
|
||||||
|
|
||||||
|
// You really want these options turned on
|
||||||
|
pl.setBoolean("Shared Return Calls",true);
|
||||||
|
pl.setBoolean("Function Start Search", true);
|
||||||
|
pl.setBoolean("DWARF", false);
|
||||||
|
// Options analyzerOptions = pl.getOptions("Function Start Search");
|
||||||
|
// analyzerOptions.setBoolean("Search Data Blocks", true);
|
||||||
|
// analyzerOptions = pl.getOptions("Function Start Search After Code");
|
||||||
|
// analyzerOptions.setBoolean("Search Data Blocks", true);
|
||||||
|
// analyzerOptions = pl.getOptions("Function Start Search After Data");
|
||||||
|
// analyzerOptions.setBoolean("Search Data Blocks", true);
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
103
Ghidra/Features/BSim/ghidra_scripts/UpdateBSimMetadata.java
Executable file
@ -0,0 +1,103 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
// Push updated information about function names and other metadata from the current program to a BSim database
//@category BSim

import java.net.URL;

import ghidra.app.script.GhidraScript;
import ghidra.features.bsim.query.*;
import ghidra.features.bsim.query.description.ExecutableRecord;
import ghidra.features.bsim.query.description.FunctionDescription;
import ghidra.features.bsim.query.protocol.QueryUpdate;
import ghidra.features.bsim.query.protocol.ResponseUpdate;
import ghidra.program.model.listing.FunctionIterator;
import ghidra.program.model.listing.FunctionManager;

public class UpdateBSimMetadata extends GhidraScript {

	@Override
	protected void run() throws Exception {
		if (currentProgram == null) {
			return;
		}
		String bsim_url = System.getProperty("ghidra.bsimurl");
		if (bsim_url==null || bsim_url.length()==0) {
			bsim_url = askString("Request Repository", "Select URL of database receiving update");
		}

		URL url = BSimClientFactory.deriveBSimURL(bsim_url);
		try (FunctionDatabase database = BSimClientFactory.buildClient(url, true)) {
			if (!database.initialize()) {
				println(database.getLastError().message);
				return;
			}
			println("Connected to " + database.getInfo().databasename);

			GenSignatures gensig = new GenSignatures(false);
			gensig.setVectorFactory(database.getLSHVectorFactory());
			gensig.openProgram(currentProgram, null, null, null, null, null);

			FunctionManager functionManager = currentProgram.getFunctionManager();
			FunctionIterator funciter;
			if (currentSelection != null) {
				println("Scanning selected functions");
				funciter = functionManager.getFunctions(currentSelection, true);
			}
			else {
				println("Scanning all functions");
				funciter = functionManager.getFunctions(true); // If no selection, update all functions
			}
			gensig.scanFunctionsMetadata(funciter, monitor);
			QueryUpdate update = new QueryUpdate();
			update.manage = gensig.getDescriptionManager();

			ResponseUpdate respup = update.execute(database); // Try to push the update
			if (respup == null) {
				println(database.getLastError().message);
				return;
			}
			if (!respup.badexe.isEmpty()) {
				for (int j = 0; j < respup.badexe.size(); ++j) {
					ExecutableRecord erec = respup.badexe.get(j);
					println("Database does not contain executable: " + erec.getNameExec());
				}
			}
			if (!respup.badfunc.isEmpty()) {
				int max = respup.badfunc.size();
				if (max > 10) {
					println(
						"Could not find " + Integer.toString(respup.badfunc.size()) + " functions");
					max = 10;
				}
				for (int j = 0; j < max; ++j) {
					FunctionDescription func = respup.badfunc.get(j);
					println("Could not update function " + func.getFunctionName());
				}
			}
			if (respup.exeupdate > 0) {
				println("Updated executable metadata");
			}
			if (respup.funcupdate > 0) {
				println("Updated " + Integer.toString(respup.funcupdate) + " functions");
			}
			if (respup.exeupdate == 0 && respup.funcupdate == 0) {
				println("No changes");
			}
		}
	}

}
126
Ghidra/Features/BSim/make-postgres.sh
Executable file
@ -0,0 +1,126 @@
#!/bin/bash
## ###
# IP: GHIDRA
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##
#
# This script may be used to build the postgresql server within
# a GHIDRA installation.  The postgresql server configuration options
# below (POSTGRES_CONFIG_OPTIONS) may be adjusted if required
# (e.g., build without openssl use, etc.).
#
# See https://www.postgresql.org/docs/10/install-procedure.html
# for supported postgresql config options.
#
# Additional packages may need to be installed to perform the
# postgresql build.  Please refer to the following web page for
# package dependencies:
#
#   https://wiki.postgresql.org/wiki/Compile_and_Install_from_source_code
#
# The postgresql source distribution should reside within the BSim module
# directory prior to running this script.  Within development environments
# it will first check the ghidra.bin repo for this source file.
#

POSTGRES=postgresql-15.3
POSTGRES_GZ=${POSTGRES}.tar.gz
POSTGRES_CONFIG_OPTIONS="--disable-rpath --with-openssl"

DIR=$(cd `dirname $0`; pwd)

POSTGRES_GZ_PATH=${DIR}/../../../../ghidra.bin/Ghidra/Features/BSim/${POSTGRES_GZ}
if [ ! -f "${POSTGRES_GZ_PATH}" ]; then
	POSTGRES_GZ_PATH=${DIR}/${POSTGRES_GZ}
	if [ ! -f "${POSTGRES_GZ_PATH}" ]; then
		echo "Postgres source bundle not found: ${POSTGRES_GZ_PATH}"
		exit -1
	fi
fi

OS=`uname -s`
ARCH=`arch`

cd ${DIR}

mkdir -p build > /dev/null

if [ ! -d build/${POSTGRES} ]; then
	# Unpack postgres source distro into build
	echo "Unpacking postgresql source: ${POSTGRES_GZ_PATH}"
	# Run in a subshell so the working directory of this script is unchanged
	(cd build; tar -xzf "${POSTGRES_GZ_PATH}")
fi

# Build postgresql

pushd build/${POSTGRES}

if [ "$OS" = "Darwin" ]; then
	export MACOSX_DEPLOYMENT_TARGET=10.5
	export ARCHFLAGS="-arch x86_64"
	OSDIR=mac_x86_64
elif [ "$ARCH" = "x86_64" ]; then
	OSDIR=linux_x86_64
else
	echo "Unsupported platform: $OS $ARCH"
	exit -1
fi

# Install within build/os
INSTALL_DIR=${DIR}/build/os/${OSDIR}/postgresql
rm -rf ${INSTALL_DIR} > /dev/null

make distclean

# Configure postgres

./configure ${POSTGRES_CONFIG_OPTIONS} --prefix=${INSTALL_DIR}
# Capture the status first: inside the "if", $? would reflect the [ ] test
rc=$?
if [ $rc != 0 ]; then
	exit $rc
fi

make install
rc=$?
if [ $rc != 0 ]; then
	exit $rc
fi

make -C contrib/pg_prewarm install
rc=$?
if [ $rc != 0 ]; then
	exit $rc
fi

echo "Completed postgresql build"

# Build lshvector plugin for postgresql

popd

rm -rf build/lshvector > /dev/null
mkdir build/lshvector

echo "Building lshvector plugin..."

cp src/lshvector/* build/lshvector
cp src/lshvector/c/* build/lshvector

cd build/lshvector
make -f Makefile.lshvector install PG_CONFIG=${INSTALL_DIR}/bin/pg_config

if [ $? = 0 ]; then
	echo "Completed build and install of lshvector postgresql plugin"
	exit 0
fi

exit -1

34
Ghidra/Features/BSim/other/testscripts/InstallMetadataTest.java
Executable file
@ -0,0 +1,34 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import ghidra.app.script.GhidraScript;
import ghidra.framework.options.Options;
import ghidra.program.model.listing.Program;

/**
 * This script is used by the unit test BSimServerTest
 */
public class InstallMetadataTest extends GhidraScript {

	@Override
	protected void run() throws Exception {
		Options pl = currentProgram.getOptions(Program.PROGRAM_INFO);
		String value = "static";
		if (currentProgram.getName().contains(".so"))
			value = "shared";
		pl.setString("Test Category", value);
	}

}
69
Ghidra/Features/BSim/other/testscripts/RegressionSignatures.java
Executable file
@ -0,0 +1,69 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import generic.lsh.vector.LSHVectorFactory;
import ghidra.app.script.GhidraScript;
import ghidra.program.model.listing.Function;
import ghidra.program.model.listing.FunctionManager;
import ghidra.features.bsim.query.FunctionDatabase;
import ghidra.features.bsim.query.GenSignatures;
import ghidra.features.bsim.query.client.Configuration;
import ghidra.features.bsim.query.description.DescriptionManager;

/**
 * This script is used by the unit test BSimServerTest
 */
public class RegressionSignatures extends GhidraScript {

	@Override
	protected void run() throws Exception {
		String md5string = currentProgram.getExecutableMD5();
		if ((md5string == null) || (md5string.length() < 10))
			throw new IOException("Could not get MD5 on file: " + currentProgram.getName());
		String basename = "sigs_" + md5string;
		// This form of askDirectory will work for both standalone and parallel execution
		File workingdir = askDirectory("RegressionSignatures:", "Working directory");
		File file = new File(workingdir, basename);

		LSHVectorFactory vectorFactory = FunctionDatabase.generateLSHVectorFactory();
		Configuration config = FunctionDatabase.loadConfigurationTemplate("medium_64");
		vectorFactory.set(config.weightfactory, config.idflookup, config.info.settings);
		GenSignatures gensig = new GenSignatures(true);
		gensig.setVectorFactory(vectorFactory);

		List<String> names = new ArrayList<String>();
		names.add("Test Category");
		gensig.addExecutableCategories(names);
		String repo = "ghidra://localhost/repo";
		String path = "/raw";
		gensig.openProgram(this.currentProgram, null, null, null, repo, path);
		FunctionManager fman = currentProgram.getFunctionManager();
		Iterator<Function> iter = fman.getFunctions(true);
		gensig.scanFunctions(iter, fman.getFunctionCount(), monitor);
		// try-with-resources closes the writer even if saveXml throws
		try (FileWriter fwrite = new FileWriter(file)) {
			DescriptionManager manager = gensig.getDescriptionManager();
			manager.saveXml(fwrite);
		}
	}

}
25
Ghidra/Features/BSim/src/lshvector/Makefile.lshvector
Executable file
@ -0,0 +1,25 @@
# Locality Sensitive Hashing package
# NOTE: This file cannot be executed in place.  It is copied into a temporary
# directory with its source code and executed there.

ifeq ($(PG_CONFIG),)
default:
	echo "You must specify PG_CONFIG"
	false

endif

MODULE_big = lshvector
OBJS= lsh.o weights.o binhash.o crc32.o

EXTENSION = lshvector
DATA = lshvector--1.0.sql

REGRESS = lshvector

EXTRA_CLEAN =

SHLIB_LINK += $(filter -lm, $(LIBS))

PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
277
Ghidra/Features/BSim/src/lshvector/c/binhash.c
Executable file
@ -0,0 +1,277 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#include "lsh.h"

#define LSH_HASHBASE 0xD7E6A299

static char hash_signtable[512];

static void hash_int_fft_16(int32 *arr)

{
  int32 x,y;

  x = arr[0]; y = arr[8]; arr[0] = x + y; arr[8] = x - y;
  x = arr[1]; y = arr[9]; arr[1] = x + y; arr[9] = x - y;
  x = arr[2]; y = arr[10]; arr[2] = x + y; arr[10] = x - y;
  x = arr[3]; y = arr[11]; arr[3] = x + y; arr[11] = x - y;
  x = arr[4]; y = arr[12]; arr[4] = x + y; arr[12] = x - y;
  x = arr[5]; y = arr[13]; arr[5] = x + y; arr[13] = x - y;
  x = arr[6]; y = arr[14]; arr[6] = x + y; arr[14] = x - y;
  x = arr[7]; y = arr[15]; arr[7] = x + y; arr[15] = x - y;

  x = arr[0]; y = arr[4]; arr[0] = x + y; arr[4] = x - y;
  x = arr[1]; y = arr[5]; arr[1] = x + y; arr[5] = x - y;
  x = arr[2]; y = arr[6]; arr[2] = x + y; arr[6] = x - y;
  x = arr[3]; y = arr[7]; arr[3] = x + y; arr[7] = x - y;
  x = arr[8]; y = arr[12]; arr[8] = x + y; arr[12] = x - y;
  x = arr[9]; y = arr[13]; arr[9] = x + y; arr[13] = x - y;
  x = arr[10]; y = arr[14]; arr[10] = x + y; arr[14] = x - y;
  x = arr[11]; y = arr[15]; arr[11] = x + y; arr[15] = x - y;

  x = arr[0]; y = arr[2]; arr[0] = x + y; arr[2] = x - y;
  x = arr[1]; y = arr[3]; arr[1] = x + y; arr[3] = x - y;
  x = arr[4]; y = arr[6]; arr[4] = x + y; arr[6] = x - y;
  x = arr[5]; y = arr[7]; arr[5] = x + y; arr[7] = x - y;
  x = arr[8]; y = arr[10]; arr[8] = x + y; arr[10] = x - y;
  x = arr[9]; y = arr[11]; arr[9] = x + y; arr[11] = x - y;
  x = arr[12]; y = arr[14]; arr[12] = x + y; arr[14] = x - y;
  x = arr[13]; y = arr[15]; arr[13] = x + y; arr[15] = x - y;

  x = arr[0]; y = arr[1]; arr[0] = x + y; arr[1] = x - y;
  x = arr[2]; y = arr[3]; arr[2] = x + y; arr[3] = x - y;
  x = arr[4]; y = arr[5]; arr[4] = x + y; arr[5] = x - y;
  x = arr[6]; y = arr[7]; arr[6] = x + y; arr[7] = x - y;
  x = arr[8]; y = arr[9]; arr[8] = x + y; arr[9] = x - y;
  x = arr[10]; y = arr[11]; arr[10] = x + y; arr[11] = x - y;
  x = arr[12]; y = arr[13]; arr[12] = x + y; arr[13] = x - y;
  x = arr[14]; y = arr[15]; arr[14] = x + y; arr[15] = x - y;
}

static void hash_double_fft_16(double *arr)

{
  double x,y;

  x = arr[0]; y = arr[8]; arr[0] = x + y; arr[8] = x - y;
  x = arr[1]; y = arr[9]; arr[1] = x + y; arr[9] = x - y;
  x = arr[2]; y = arr[10]; arr[2] = x + y; arr[10] = x - y;
  x = arr[3]; y = arr[11]; arr[3] = x + y; arr[11] = x - y;
  x = arr[4]; y = arr[12]; arr[4] = x + y; arr[12] = x - y;
  x = arr[5]; y = arr[13]; arr[5] = x + y; arr[13] = x - y;
  x = arr[6]; y = arr[14]; arr[6] = x + y; arr[14] = x - y;
  x = arr[7]; y = arr[15]; arr[7] = x + y; arr[15] = x - y;

  x = arr[0]; y = arr[4]; arr[0] = x + y; arr[4] = x - y;
  x = arr[1]; y = arr[5]; arr[1] = x + y; arr[5] = x - y;
  x = arr[2]; y = arr[6]; arr[2] = x + y; arr[6] = x - y;
  x = arr[3]; y = arr[7]; arr[3] = x + y; arr[7] = x - y;
  x = arr[8]; y = arr[12]; arr[8] = x + y; arr[12] = x - y;
  x = arr[9]; y = arr[13]; arr[9] = x + y; arr[13] = x - y;
  x = arr[10]; y = arr[14]; arr[10] = x + y; arr[14] = x - y;
  x = arr[11]; y = arr[15]; arr[11] = x + y; arr[15] = x - y;

  x = arr[0]; y = arr[2]; arr[0] = x + y; arr[2] = x - y;
  x = arr[1]; y = arr[3]; arr[1] = x + y; arr[3] = x - y;
  x = arr[4]; y = arr[6]; arr[4] = x + y; arr[6] = x - y;
  x = arr[5]; y = arr[7]; arr[5] = x + y; arr[7] = x - y;
  x = arr[8]; y = arr[10]; arr[8] = x + y; arr[10] = x - y;
  x = arr[9]; y = arr[11]; arr[9] = x + y; arr[11] = x - y;
  x = arr[12]; y = arr[14]; arr[12] = x + y; arr[14] = x - y;
  x = arr[13]; y = arr[15]; arr[13] = x + y; arr[15] = x - y;

  x = arr[0]; y = arr[1]; arr[0] = x + y; arr[1] = x - y;
  x = arr[2]; y = arr[3]; arr[2] = x + y; arr[3] = x - y;
  x = arr[4]; y = arr[5]; arr[4] = x + y; arr[5] = x - y;
  x = arr[6]; y = arr[7]; arr[6] = x + y; arr[7] = x - y;
  x = arr[8]; y = arr[9]; arr[8] = x + y; arr[9] = x - y;
  x = arr[10]; y = arr[11]; arr[10] = x + y; arr[11] = x - y;
  x = arr[12]; y = arr[13]; arr[12] = x + y; arr[13] = x - y;
  x = arr[14]; y = arr[15]; arr[14] = x + y; arr[15] = x - y;
}

/*
 * This is a precalculated table for generating dotproducts with the random family of vectors directly
 * The first vector r_0 is expressed as a hashing function on the dimension index and the other vectors
 * are derived from r_0 using an FFT.  The table is formed by precalculating the FFT on basis vectors in this table
 */
void lsh_setup_signtable(void)

{
  int32 i,j;
  int32 arr[16];
  char *hibit0ptr;
  char *hibit1ptr;

  for(i=0;i<16;++i) {		/* For each 4-bit position */
    hibit0ptr = hash_signtable + i * 16;
    hibit1ptr = hash_signtable + (i+16) * 16;
    for(j=0;j<16;++j)
      arr[j] = 0;

    arr[ i ] = 1;
    hash_int_fft_16(arr);
    for(j=0;j<16;++j) {
      if (arr[j] > 0) {
        hibit0ptr[j] = '+';
        hibit1ptr[j] = '-';
      }
      else {
        hibit0ptr[j] = '-';
        hibit1ptr[j] = '+';
      }
    }
  }
}

/*
 * Generate a dot product of the hash vector in -vec- with a random family of 16 vectors, { r }
 * r_0 is a randomly generated set of +1 -1 coefficients across all the dimensions (indexed by uint32 vec[i].hash)
 * The coefficient is calculated as a hashing function from the seed -hashcur- and the index (vec[i].hash),
 * so it should be balanced between +1 and -1.
 * All the other vectors are generated from an FFT of r_0.  This allows the dotproduct with vec to be calculated
 * using an FFT if -vec- has many non-zero coefficients.  If -vec- has only a few non-zero coefficients,
 * the dotproduct is calculated with each vector in the family directly for better efficiency.
 * The resulting dotproducts are converted into a 16-long bitvector based on the sign of the dotproduct and
 * placed in -bucket-
 */
static uint32 hash_16_dotproduct(uint32 bucket,LSH_ITEM *vec,uint32 vecsize,uint32 hashcur,uint32 vecsizeupper)

{
  uint32 i,j;
  uint32 rownum;
  char *signptr;
  double res[16];

  for(i=0;i<16;++i)
    res[i] = 0.0;			/* Initialize the dotproduct results to zero */

  if (vecsize < vecsizeupper) {	/* If there are a small number of non-zero coefficients in -vec- */
    for(i=0;i<vecsize;++i) {
      rownum = vec[i].hash ^ hashcur;	/* Calculate the rest of the r_0 hashing function */
      rownum = (rownum * 1103515245) + 12345;
      rownum = (rownum>>24)&0x1f;
      signptr = hash_signtable + rownum * 16;
      for(j=0;j<16;++j) {		/* Based on the precalculated coeff table calculate this portion of dotproduct */
        if (signptr[j] == '+')
          res[j] += vec[i].coeff;	/* Dot product with +1 coeff */
        else
          res[j] -= vec[i].coeff;	/* Dot product with -1 coeff */
      }
    }
  }
  else {			/* If we have many non-zero coeffs in -vec- */
    for(i=0;i<vecsize;++i) {
      rownum = vec[i].hash ^ hashcur;	/* Calculate the rest of the r_0 hashing function */
      rownum = (rownum * 1103515245) + 12345;
      rownum = (rownum>>24)&0x1f;
      if (rownum < 0x10)		/* Set-up for the FFT */
        res[rownum] += vec[i].coeff;
      else
        res[rownum&0xf] -= vec[i].coeff;
    }
    hash_double_fft_16(res);	/* Calculate the remaining dotproducts by performing an FFT */
  }

  for(i=0;i<16;++i) {		/* Convert the dotproduct results to a bitvector */
    bucket <<= 1;
    if (res[i] > 0.0)
      bucket |= 1;
  }
  return bucket;
}

void lsh_generate_binids(uint32 *res,LSH_ITEM *vec,uint32 vecsize)

{
  uint32 bucket = 0;
  int32 bucketcnt = 0;
  int32 i,bitsleft;
  uint32 curid;
  uint32 mask,val;
  uint32 hashbase = LSH_HASHBASE;

  for(i=0;i<lsh_L;++i) {
    curid = i;			/* Tack-on bits that indicate the particular table this binid belongs to */
    bitsleft = lsh_k;
    do {
      if (bucketcnt == 0) {
        hashbase = (hashbase * 1103515245) + 12345;
        bucket = hash_16_dotproduct(bucket,vec,vecsize,hashbase,5);
        bucketcnt += 16;
      }
      if (bucketcnt >= bitsleft) {
        curid <<= bitsleft;
        mask = 1;
        mask = (mask << bitsleft)-1;
        val = bucket >> (bucketcnt - bitsleft);
        curid |= (val & mask);
        bucketcnt -= bitsleft;
        bitsleft = 0;
      }
      else {
        curid <<= bucketcnt;
        mask = 1;
        mask = (mask << bucketcnt)-1;
        curid |= (bucket & mask);
        bitsleft -= bucketcnt;
        bucketcnt = 0;
      }
    } while(bitsleft > 0);
    res[ i ] = curid;
  }
}

void lsh_generate_binids_datum(Datum *res,LSH_ITEM *vec,uint32 vecsize)

{
  uint32 bucket = 0;
  int32 bucketcnt = 0;
  int32 i,bitsleft;
  uint32 curid;
  uint32 mask,val;
  uint32 hashbase = LSH_HASHBASE;

  for(i=0;i<lsh_L;++i) {
    curid = i;			/* Tack-on bits that indicate the particular table this binid belongs to */
    bitsleft = lsh_k;
    do {
      if (bucketcnt == 0) {
        hashbase = (hashbase * 1103515245) + 12345;
        bucket = hash_16_dotproduct(bucket,vec,vecsize,hashbase,5);
        bucketcnt += 16;
      }
      if (bucketcnt >= bitsleft) {
        curid <<= bitsleft;
        mask = 1;
        mask = (mask << bitsleft)-1;
        val = bucket >> (bucketcnt - bitsleft);
        curid |= (val & mask);
        bucketcnt -= bitsleft;
        bitsleft = 0;
      }
      else {
        curid <<= bucketcnt;
        mask = 1;
        mask = (mask << bucketcnt)-1;
        curid |= (bucket & mask);
        bitsleft -= bucketcnt;
        bucketcnt = 0;
      }
    } while(bitsleft > 0);
    res[ i ] = Int32GetDatum((int32)curid);
  }
}
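The unrolled butterflies in `hash_int_fft_16` and `hash_double_fft_16` above have the structure of a 16-point Walsh-Hadamard transform, which the source comments refer to as an FFT. The following is a minimal loop-based sketch of the same butterfly network, not the code shipped in binhash.c. The names `wht16` and `wht16_sign` are hypothetical, and `int32_t` from `<stdint.h>` stands in for the PostgreSQL `int32` typedef.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical loop form of the unrolled 16-point butterfly network.
 * Stages use strides 8, 4, 2, 1, in the same order as the unrolled code. */
static void wht16(int32_t *arr)
{
  int stride, i;

  for (stride = 8; stride >= 1; stride >>= 1) {
    for (i = 0; i < 16; ++i) {
      if ((i & stride) == 0) {	/* i is the lower index of a butterfly pair */
        int32_t x = arr[i];
        int32_t y = arr[i + stride];
        arr[i] = x + y;
        arr[i + stride] = x - y;
      }
    }
  }
}

/* Closed form for the transform of the basis vector e_i: entry j is
 * +1 when (i & j) has an even number of set bits and -1 otherwise. */
static int wht16_sign(unsigned i, unsigned j)
{
  unsigned bits = i & j;
  int sign = 1;

  assert(i < 16 && j < 16);
  while (bits != 0) {
    sign = -sign;
    bits &= bits - 1;		/* clear the lowest set bit */
  }
  return sign;
}
```

With `arr` set to the basis vector e_i, `wht16(arr)` leaves `arr[j] == wht16_sign(i, j)`. This is the sign pattern that `lsh_setup_signtable` records as '+' or '-' in rows 0 to 15 of `hash_signtable`; rows 16 to 31 hold the negated pattern.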
101
Ghidra/Features/BSim/src/lshvector/c/crc32.c
Executable file
@ -0,0 +1,101 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#include "lsh.h"

/* Macro arguments are parenthesized so that expression arguments expand safely */
#define CRC_UPDATE(REG,VAL) (crc32tab[ ((REG) ^ (VAL))&0xff ] ^ ((REG) >> 8))

/* Table for bytewise calculation of a 32-bit Cyclic Redundancy Check */
uint32 crc32tab[] = {
  0x0,0x77073096,0xee0e612c,0x990951ba,0x76dc419,0x706af48f,
  0xe963a535,0x9e6495a3,0xedb8832,0x79dcb8a4,0xe0d5e91e,
  0x97d2d988,0x9b64c2b,0x7eb17cbd,0xe7b82d07,0x90bf1d91,
  0x1db71064,0x6ab020f2,0xf3b97148,0x84be41de,0x1adad47d,
  0x6ddde4eb,0xf4d4b551,0x83d385c7,0x136c9856,0x646ba8c0,
  0xfd62f97a,0x8a65c9ec,0x14015c4f,0x63066cd9,0xfa0f3d63,
  0x8d080df5,0x3b6e20c8,0x4c69105e,0xd56041e4,0xa2677172,
  0x3c03e4d1,0x4b04d447,0xd20d85fd,0xa50ab56b,0x35b5a8fa,
  0x42b2986c,0xdbbbc9d6,0xacbcf940,0x32d86ce3,0x45df5c75,
  0xdcd60dcf,0xabd13d59,0x26d930ac,0x51de003a,0xc8d75180,
  0xbfd06116,0x21b4f4b5,0x56b3c423,0xcfba9599,0xb8bda50f,
  0x2802b89e,0x5f058808,0xc60cd9b2,0xb10be924,0x2f6f7c87,
  0x58684c11,0xc1611dab,0xb6662d3d,0x76dc4190,0x1db7106,
  0x98d220bc,0xefd5102a,0x71b18589,0x6b6b51f,0x9fbfe4a5,
  0xe8b8d433,0x7807c9a2,0xf00f934,0x9609a88e,0xe10e9818,
  0x7f6a0dbb,0x86d3d2d,0x91646c97,0xe6635c01,0x6b6b51f4,
  0x1c6c6162,0x856530d8,0xf262004e,0x6c0695ed,0x1b01a57b,
  0x8208f4c1,0xf50fc457,0x65b0d9c6,0x12b7e950,0x8bbeb8ea,
  0xfcb9887c,0x62dd1ddf,0x15da2d49,0x8cd37cf3,0xfbd44c65,
  0x4db26158,0x3ab551ce,0xa3bc0074,0xd4bb30e2,0x4adfa541,
  0x3dd895d7,0xa4d1c46d,0xd3d6f4fb,0x4369e96a,0x346ed9fc,
  0xad678846,0xda60b8d0,0x44042d73,0x33031de5,0xaa0a4c5f,
  0xdd0d7cc9,0x5005713c,0x270241aa,0xbe0b1010,0xc90c2086,
  0x5768b525,0x206f85b3,0xb966d409,0xce61e49f,0x5edef90e,
  0x29d9c998,0xb0d09822,0xc7d7a8b4,0x59b33d17,0x2eb40d81,
  0xb7bd5c3b,0xc0ba6cad,0xedb88320,0x9abfb3b6,0x3b6e20c,
  0x74b1d29a,0xead54739,0x9dd277af,0x4db2615,0x73dc1683,
  0xe3630b12,0x94643b84,0xd6d6a3e,0x7a6a5aa8,0xe40ecf0b,
  0x9309ff9d,0xa00ae27,0x7d079eb1,0xf00f9344,0x8708a3d2,
  0x1e01f268,0x6906c2fe,0xf762575d,0x806567cb,0x196c3671,
  0x6e6b06e7,0xfed41b76,0x89d32be0,0x10da7a5a,0x67dd4acc,
  0xf9b9df6f,0x8ebeeff9,0x17b7be43,0x60b08ed5,0xd6d6a3e8,
  0xa1d1937e,0x38d8c2c4,0x4fdff252,0xd1bb67f1,0xa6bc5767,
  0x3fb506dd,0x48b2364b,0xd80d2bda,0xaf0a1b4c,0x36034af6,
  0x41047a60,0xdf60efc3,0xa867df55,0x316e8eef,0x4669be79,
  0xcb61b38c,0xbc66831a,0x256fd2a0,0x5268e236,0xcc0c7795,
  0xbb0b4703,0x220216b9,0x5505262f,0xc5ba3bbe,0xb2bd0b28,
  0x2bb45a92,0x5cb36a04,0xc2d7ffa7,0xb5d0cf31,0x2cd99e8b,
  0x5bdeae1d,0x9b64c2b0,0xec63f226,0x756aa39c,0x26d930a,
  0x9c0906a9,0xeb0e363f,0x72076785,0x5005713,0x95bf4a82,
  0xe2b87a14,0x7bb12bae,0xcb61b38,0x92d28e9b,0xe5d5be0d,
  0x7cdcefb7,0xbdbdf21,0x86d3d2d4,0xf1d4e242,0x68ddb3f8,
  0x1fda836e,0x81be16cd,0xf6b9265b,0x6fb077e1,0x18b74777,
  0x88085ae6,0xff0f6a70,0x66063bca,0x11010b5c,0x8f659eff,
  0xf862ae69,0x616bffd3,0x166ccf45,0xa00ae278,0xd70dd2ee,
  0x4e048354,0x3903b3c2,0xa7672661,0xd06016f7,0x4969474d,
  0x3e6e77db,0xaed16a4a,0xd9d65adc,0x40df0b66,0x37d83bf0,
  0xa9bcae53,0xdebb9ec5,0x47b2cf7f,0x30b5ffe9,0xbdbdf21c,
  0xcabac28a,0x53b39330,0x24b4a3a6,0xbad03605,0xcdd70693,
  0x54de5729,0x23d967bf,0xb3667a2e,0xc4614ab8,0x5d681b02,
  0x2a6f2b94,0xb40bbe37,0xc30c8ea1,0x5a05df1b,0x2d02ef8d };

uint64 lsh_hash_internal(LSHVECTOR *vec)

{
  uint32 reg1,reg2;
  uint32 curtf,curhash,oldreg1;
  uint32 i;
  uint64 res;

  reg1 = 0x12CF93AB;
  reg2 = 0xEE39B2D6;

  for(i=0;i<vec->numitems;++i) {
    curtf = vec->items[i].tf;
    curhash = vec->items[i].hash;
    oldreg1 = reg1;
    reg1 = CRC_UPDATE(reg1,curtf);
    reg1 = CRC_UPDATE(reg1,curhash);
    reg1 = CRC_UPDATE(reg1,(reg2>>24));
    reg2 = CRC_UPDATE(reg2,(oldreg1>>24));
    reg2 = CRC_UPDATE(reg2,(curhash>>8));
    reg2 = CRC_UPDATE(reg2,(curhash>>16));
    reg2 = CRC_UPDATE(reg2,(curhash>>24));
  }
  res = reg1;
  res <<= 32;
  res |= reg2;
  return res;
}
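The entries of `crc32tab` above (0x77073096, 0xee0e612c, ..., 0x2d02ef8d) match the widely used byte-wise CRC-32 table for the reflected polynomial 0xEDB88320. As a rough sketch of where such a table comes from, the code below regenerates it at run time. The names `crc32_build_table` and `crc_update` are hypothetical and not part of crc32.c, and `uint32_t` stands in for the PostgreSQL `uint32` typedef. Note that `lsh_hash_internal` mixes two registers through this table and is not itself a standard CRC-32 of its input.

```c
#include <assert.h>
#include <stdint.h>

#define CRC32_POLY_REFLECTED 0xEDB88320u

/* Fill table[n] with the result of pushing the byte n through 8 shift steps */
static void crc32_build_table(uint32_t table[256])
{
  uint32_t n, reg;
  int bit;

  assert(table != 0);
  for (n = 0; n < 256; ++n) {
    reg = n;
    for (bit = 0; bit < 8; ++bit) {
      if (reg & 1u)
        reg = CRC32_POLY_REFLECTED ^ (reg >> 1);
      else
        reg >>= 1;
    }
    table[n] = reg;
  }
}

/* Same update step as the CRC_UPDATE macro, written as a function */
static uint32_t crc_update(const uint32_t table[256], uint32_t reg, uint32_t val)
{
  return table[(reg ^ val) & 0xff] ^ (reg >> 8);
}
```

Only the low 8 bits of `reg ^ val` select a table entry, so each call to `crc_update` (like each `CRC_UPDATE` in the source) consumes a single byte of `val`.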
414
Ghidra/Features/BSim/src/lshvector/c/lsh.c
Executable file
@ -0,0 +1,414 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
#include "lsh.h"
|
||||||
|
#include "fmgr.h"
|
||||||
|
#include "funcapi.h"
|
||||||
|
#include "access/htup_details.h"
|
||||||
|
#include "access/gin.h"
|
||||||
|
#include "libpq/pqformat.h"
|
||||||
|
#include <ctype.h>
|
||||||
|
|
||||||
|
PG_MODULE_MAGIC;
|
||||||
|
|
||||||
|
void _PG_init(void);
|
||||||
|
|
||||||
|
PG_FUNCTION_INFO_V1(lshvector_in);
|
||||||
|
PG_FUNCTION_INFO_V1(lshvector_out);
|
||||||
|
PG_FUNCTION_INFO_V1(lshvector_send);
|
||||||
|
PG_FUNCTION_INFO_V1(lshvector_recv);
|
||||||
|
PG_FUNCTION_INFO_V1(lshvector_hash);
|
||||||
|
PG_FUNCTION_INFO_V1(lshvector_compare);
|
||||||
|
PG_FUNCTION_INFO_V1(lshvector_overlap);
|
||||||
|
|
||||||
|
PG_FUNCTION_INFO_V1(lshvector_gin_extract_value);
|
||||||
|
PG_FUNCTION_INFO_V1(lshvector_gin_extract_query);
|
||||||
|
PG_FUNCTION_INFO_V1(lshvector_gin_consistent);
|
||||||
|
|
||||||
|
PG_FUNCTION_INFO_V1(lsh_load);
|
||||||
|
PG_FUNCTION_INFO_V1(lsh_reload);
|
||||||
|
PG_FUNCTION_INFO_V1(lsh_getweight);
|
||||||
|
|
||||||
|
Datum lshvector_in(PG_FUNCTION_ARGS);
|
||||||
|
Datum lshvector_out(PG_FUNCTION_ARGS);
|
||||||
|
Datum lshvector_send(PG_FUNCTION_ARGS);
|
||||||
|
Datum lshvector_recv(PG_FUNCTION_ARGS);
|
||||||
|
Datum lshvector_hash(PG_FUNCTION_ARGS);
|
||||||
|
Datum lshvector_compare(PG_FUNCTION_ARGS);
|
||||||
|
Datum lshvector_overlap(PG_FUNCTION_ARGS);
|
||||||
|
|
||||||
|
Datum lshvector_gin_extract_value(PG_FUNCTION_ARGS);
|
||||||
|
Datum lshvector_gin_extract_query(PG_FUNCTION_ARGS);
|
||||||
|
Datum lshvector_gin_consistent(PG_FUNCTION_ARGS);
|
||||||
|
|
||||||
|
Datum lsh_load(PG_FUNCTION_ARGS);
|
||||||
|
Datum lsh_reload(PG_FUNCTION_ARGS);
|
||||||
|
Datum lsh_getweight(PG_FUNCTION_ARGS);
|
||||||
|
|
||||||
|
/*
 * Allocate memory for an LSHVECTOR given the raw count of the number of hash entries in the vector
 */
static LSHVECTOR *allocate_lshvector(uint32 numentries)

{
  LSHVECTOR *out;
  uint32 maxitems, commonlen;

  /* Maximum number of hashes in a single LSHVECTOR assuming a 1 gigabyte allocation limit */
  maxitems = (0x3fffffff - HDRSIZELSH) / sizeof(LSH_ITEM);

  if (numentries > maxitems) {
    ereport(ERROR,(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),errmsg("Exceeded maximum entries for single lshvector")));
    /* Does not return */
  }
  commonlen = HDRSIZELSH + numentries * sizeof(LSH_ITEM);
  out = (LSHVECTOR *) palloc(commonlen);
  SET_VARSIZE(out,commonlen);
  return out;
}
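
The size arithmetic in allocate_lshvector can be tried outside PostgreSQL. The following is only a sketch under stated assumptions: `DemoItem`, `DemoVector`, `DEMO_HDRSIZE`, `DEMO_MAX_ALLOC` and `demo_vector_size` are hypothetical stand-ins that mirror the layout declared in lsh.h, and returning 0 takes the place of the `ereport(ERROR, ...)` call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring LSH_ITEM and LSHVECTOR from lsh.h */
typedef struct {
    uint32_t hash;
    uint16_t tf;
    uint16_t idf;
    double coeff;
} DemoItem;

typedef struct {
    int32_t vl_len_;
    uint32_t numitems;
    uint32_t hashcount;
    double length;
    DemoItem items[1];
} DemoVector;

#define DEMO_HDRSIZE   offsetof(DemoVector, items)
#define DEMO_MAX_ALLOC 0x3fffffffu   /* the 1 gigabyte limit named in the comment */

/*
 * Total bytes needed for a vector holding numentries items,
 * or 0 if that many items would exceed the allocation limit.
 */
size_t demo_vector_size(uint32_t numentries)
{
    uint32_t maxitems = (uint32_t)((DEMO_MAX_ALLOC - DEMO_HDRSIZE) / sizeof(DemoItem));

    if (numentries > maxitems)
        return 0;
    return DEMO_HDRSIZE + (size_t)numentries * sizeof(DemoItem);
}
```

Checking the count against the limit before multiplying is what keeps the 32-bit `commonlen` computation in the real function from wrapping around.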
|
||||||
|
|
||||||
|
void _PG_init(void)
|
||||||
|
|
||||||
|
{
|
||||||
|
lsh_initialize();
|
||||||
|
}
|
||||||
|
|
||||||
|
Datum lsh_load(PG_FUNCTION_ARGS)
|
||||||
|
|
||||||
|
{
|
||||||
|
if (!weights_loaded) {
|
||||||
|
lsh_load_weights();
|
||||||
|
lsh_load_lookuptable();
|
||||||
|
lsh_load_binconfig();
|
||||||
|
weights_loaded = true;
|
||||||
|
}
|
||||||
|
PG_RETURN_INT32(0);
|
||||||
|
}
|
||||||
|
|
||||||
|
Datum lsh_reload(PG_FUNCTION_ARGS)
|
||||||
|
|
||||||
|
{
|
||||||
|
lsh_load_weights();
|
||||||
|
lsh_load_lookuptable();
|
||||||
|
lsh_load_binconfig();
|
||||||
|
weights_loaded = true;
|
||||||
|
PG_RETURN_INT32(0);
|
||||||
|
}
|
||||||
|
|
||||||
|
Datum lsh_getweight(PG_FUNCTION_ARGS)
|
||||||
|
|
||||||
|
{
|
||||||
|
LSHVECTOR *vec = PG_GETARG_LSHVECTOR_P(0);
|
||||||
|
uint32 arg = PG_GETARG_UINT32(1);
|
||||||
|
double res;
|
||||||
|
|
||||||
|
if (arg >= vec->numitems)
|
||||||
|
res = 0.0;
|
||||||
|
else
|
||||||
|
res = vec->items[arg].coeff;
|
||||||
|
PG_FREE_IF_COPY(vec,0);
|
||||||
|
PG_RETURN_FLOAT8( res );
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
 * text input
 */
Datum
lshvector_in(PG_FUNCTION_ARGS)
{
  char *buf = (char *) PG_GETARG_POINTER(0);
  char *ptr,*ptrstart;
  LSHVECTOR *vec;
  uint32 numitems = 0;
  uint32 commacount = 0;
  uint32 i,j;
  int32 val;
  char curc;

  ptr = buf;
  curc = '\0';
  while(*ptr) {	/* Skip leading whitespace */
    curc = *ptr;
    if (isspace((unsigned char)curc)==0) break;
    ++ptr;
  }
  if (curc != '(')
    ereport(ERROR,(errcode(ERRCODE_SYNTAX_ERROR),errmsg("Missing opening '('"))); /* Does not return */
  ++ptr;
  ptrstart = ptr;
  while (*ptr) {	/* First pass: count delimiters to size the vector */
    curc = *ptr;
    if (curc == ':')
      numitems += 1;
    else if (curc == ',')
      commacount += 1;
    else if (curc == ')')
      break;
    ++ptr;
  }
  if ((curc != ')')||(numitems != commacount+1))
    ereport(ERROR,(errcode(ERRCODE_SYNTAX_ERROR),errmsg("Bad delimiters"))); /* Does not return */

  vec = allocate_lshvector(numitems);

  ptr = ptrstart;
  i = 0;
  j = 0;
  while(*ptr) {	/* Second pass: parse the hexadecimal tf:hash pairs */
    val = strtol(ptr,&ptr,16);
    if (j==0) {
      if (i >= numitems) {	/* A missing ',' (e.g. "(1:2 3)") would otherwise write past the allocation */
        pfree(vec);
        ereport(ERROR,(errcode(ERRCODE_SYNTAX_ERROR),errmsg("Bad delimiters"))); /* Does not return */
      }
      if ((val<1)||(val>64)) {
        pfree(vec);
        ereport(ERROR,(errcode(ERRCODE_SYNTAX_ERROR),errmsg("Term frequency count out of bounds"))); /* Does not return */
      }
      vec->items[i].tf = (uint16)val;
      j = 1;
    }
    else {
      vec->items[i].hash = (uint32)val;
      vec->items[i].idf = 0;
      j = 0;
      i += 1;
    }
    while(isspace( (unsigned char)*ptr ))
      ptr++;
    if (*ptr == ')') break;
    if (*ptr == ':') {
      if (j==0) {
        pfree(vec);
        ereport(ERROR,(errcode(ERRCODE_SYNTAX_ERROR),errmsg("Expected ','"))); /* Does not return */
      }
      ptr++;
    }
    else if (*ptr == ',') {
      if (j==1) {
        pfree(vec);
        ereport(ERROR,(errcode(ERRCODE_SYNTAX_ERROR),errmsg("Expected ':'"))); /* Does not return */
      }
      ptr++;
    }
  }
  vec->numitems = numitems;
  lsh_calc_weights(vec);
  PG_RETURN_POINTER(vec);
}
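
The text form accepted by lshvector_in is a parenthesized, comma-separated list of hexadecimal `tf:hash` pairs, with each term frequency between 1 and 64. As a minimal sketch of that grammar, assuming nothing beyond the C standard library, the parser below fills a fixed-size array instead of a palloc'd LSHVECTOR. `DemoItem` and `demo_parse_lshvector` are hypothetical names, and the sketch is a single strict pass rather than the two-pass scan used above.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for LSH_ITEM, reduced to the two parsed fields */
typedef struct {
    uint32_t hash;
    uint16_t tf;
} DemoItem;

/*
 * Parse "(tf:hash,tf:hash,...)" with hexadecimal fields into out[0..max-1].
 * Returns the number of items parsed, or -1 on a syntax or range error.
 */
int demo_parse_lshvector(const char *text, DemoItem *out, int max)
{
    const char *ptr = text;
    char *end;
    int n = 0;

    while (isspace((unsigned char)*ptr))
        ptr++;
    if (*ptr != '(')
        return -1;                      /* missing opening '(' */
    ptr++;
    for (;;) {
        unsigned long tf, hash;

        if (n >= max)
            return -1;                  /* more items than the caller allows */
        tf = strtoul(ptr, &end, 16);
        if (end == ptr || tf < 1 || tf > 64)
            return -1;                  /* term frequency missing or out of bounds */
        ptr = end;
        while (isspace((unsigned char)*ptr))
            ptr++;
        if (*ptr != ':')
            return -1;                  /* expected ':' */
        ptr++;
        hash = strtoul(ptr, &end, 16);
        if (end == ptr || hash > 0xffffffffUL)
            return -1;                  /* hash missing or wider than 32 bits */
        ptr = end;
        while (isspace((unsigned char)*ptr))
            ptr++;
        out[n].tf = (uint16_t)tf;
        out[n].hash = (uint32_t)hash;
        n++;
        if (*ptr == ')')
            return n;                   /* closing delimiter ends the vector */
        if (*ptr != ',')
            return -1;                  /* expected ',' */
        ptr++;
    }
}
```

For example, `"(1:abc,2:ff)"` yields two items, while `"(41:1)"` is rejected because 0x41 is 65, above the term frequency limit.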
|
||||||
|
|
||||||
|
/*
|
||||||
|
* text output
|
||||||
|
*/
|
||||||
|
Datum
|
||||||
|
lshvector_out(PG_FUNCTION_ARGS)
|
||||||
|
{
|
||||||
|
LSHVECTOR *vec = PG_GETARG_LSHVECTOR_P(0);
|
||||||
|
StringInfoData buf;
|
||||||
|
uint32 i,sz;
|
||||||
|
|
||||||
|
initStringInfo(&buf);
|
||||||
|
|
||||||
|
appendStringInfoChar(&buf,'(');
|
||||||
|
sz = vec->numitems;
|
||||||
|
for(i=0;i<sz;++i) {
|
||||||
|
appendStringInfo(&buf,"%x",(int32)vec->items[i].tf);
|
||||||
|
appendStringInfoChar(&buf,':');
|
||||||
|
appendStringInfo(&buf,"%x",(int32)vec->items[i].hash);
|
||||||
|
if (i+1 < sz)
|
||||||
|
appendStringInfoChar(&buf,',');
|
||||||
|
}
|
||||||
|
appendStringInfoChar(&buf,')');
|
||||||
|
|
||||||
|
PG_FREE_IF_COPY(vec,0);
|
||||||
|
|
||||||
|
PG_RETURN_CSTRING(buf.data);
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* binary output
|
||||||
|
*/
|
||||||
|
Datum
|
||||||
|
lshvector_send(PG_FUNCTION_ARGS)
|
||||||
|
{
|
||||||
|
LSHVECTOR *vec = PG_GETARG_LSHVECTOR_P(0);
|
||||||
|
uint32 i;
|
||||||
|
uint32 numitems;
|
||||||
|
StringInfoData buf;
|
||||||
|
|
||||||
|
numitems = vec->numitems;
|
||||||
|
|
||||||
|
pq_begintypsend(&buf);
|
||||||
|
pq_sendint(&buf,numitems,4);
|
||||||
|
|
||||||
|
for(i=0;i<numitems;++i) {
|
||||||
|
pq_sendint(&buf,vec->items[i].tf,1);
|
||||||
|
pq_sendint(&buf,vec->items[i].hash,4);
|
||||||
|
}
|
||||||
|
PG_FREE_IF_COPY(vec,0);
|
||||||
|
PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* binary input
|
||||||
|
*/
|
||||||
|
Datum
|
||||||
|
lshvector_recv(PG_FUNCTION_ARGS)
|
||||||
|
{
|
||||||
|
LSHVECTOR *out;
|
||||||
|
StringInfo buf = (StringInfo) PG_GETARG_POINTER(0);
|
||||||
|
uint32 numitems;
|
||||||
|
uint32 tf;
|
||||||
|
uint32 i;
|
||||||
|
|
||||||
|
numitems = pq_getmsgint(buf,4);
|
||||||
|
out = allocate_lshvector(numitems);
|
||||||
|
|
||||||
|
out->numitems = numitems;
|
||||||
|
for(i=0;i<numitems;++i) {
|
||||||
|
tf = pq_getmsgint(buf,1);
|
||||||
|
if ((tf<1)||(tf>64)) {
|
||||||
|
pfree(out);
|
||||||
|
ereport(ERROR,(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),errmsg("Term frequency is out of range")));
|
||||||
|
/* Does not return */
|
||||||
|
}
|
||||||
|
out->items[i].tf = tf;
|
||||||
|
out->items[i].hash = pq_getmsgint(buf,4);
|
||||||
|
}
|
||||||
|
lsh_calc_weights(out);
|
||||||
|
PG_RETURN_POINTER(out);
|
||||||
|
}
|
||||||
|
|
||||||
|
Datum lshvector_hash(PG_FUNCTION_ARGS)
|
||||||
|
{
|
||||||
|
LSHVECTOR *a = PG_GETARG_LSHVECTOR_P(0);
|
||||||
|
int64 res = (int64)lsh_hash_internal(a);
|
||||||
|
|
||||||
|
PG_FREE_IF_COPY(a,0);
|
||||||
|
|
||||||
|
PG_RETURN_INT64(res);
|
||||||
|
}
|
||||||
|
|
||||||
|
Datum lshvector_compare(PG_FUNCTION_ARGS)
|
||||||
|
{
|
||||||
|
LSHVECTOR *a = PG_GETARG_LSHVECTOR_P(0);
|
||||||
|
LSHVECTOR *b = PG_GETARG_LSHVECTOR_P(1);
|
||||||
|
TupleDesc tupdesc;
|
||||||
|
TupleDesc bless;
|
||||||
|
HeapTuple restuple;
|
||||||
|
Datum dvalues[2];
|
||||||
|
bool nulls[2] = {false, false};
|
||||||
|
double sim,sig;
|
||||||
|
|
||||||
|
sim = lsh_compare_internal(a,b,&sig);
|
||||||
|
PG_FREE_IF_COPY(a,0);
|
||||||
|
PG_FREE_IF_COPY(b,1);
|
||||||
|
|
||||||
|
if (get_call_result_type(fcinfo,NULL,&tupdesc) != TYPEFUNC_COMPOSITE)
|
||||||
|
elog(ERROR,"Could not get composite row type to return");
|
||||||
|
|
||||||
|
bless = BlessTupleDesc(tupdesc);
|
||||||
|
|
||||||
|
dvalues[0] = Float8GetDatum(sim);
|
||||||
|
dvalues[1] = Float8GetDatum(sig);
|
||||||
|
restuple = heap_form_tuple(bless,dvalues,nulls);
|
||||||
|
return HeapTupleGetDatum(restuple);
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
 * This is the actual operator function being accelerated by the gin index.  In truth, the index itself
 * defines the operator, so the commented out code below emulates the index's key generation process and
 * looks for overlap in the keys between two vectors.  In practice, any query that invokes this operator
 * will hopefully be going through the index and so doesn't need to evaluate this function.  For
 * cases where PostgreSQL does a recheck after going through the index, every query sends
 * the results of the operator test to a similarity filter anyway, so there is no reason to actually perform
 * the overlap test.  We just implement a NOP that always returns true.
 */
Datum lshvector_overlap(PG_FUNCTION_ARGS)
{
  /* bool res; */
  /* int32 i; */
  /* LSHVECTOR *a = PG_GETARG_LSHVECTOR_P(0); */
  /* LSHVECTOR *b = PG_GETARG_LSHVECTOR_P(1); */
  /* uint32 *bina = (uint32 *)palloc( sizeof(uint32) * lsh_L ); */
  /* uint32 *binb = (uint32 *)palloc( sizeof(uint32) * lsh_L ); */

  /* lsh_generate_binids(bina,a->items,a->numitems); */
  /* lsh_generate_binids(binb,b->items,b->numitems); */
  /* PG_FREE_IF_COPY(a,0); */
  /* PG_FREE_IF_COPY(b,1); */

  /* res = false; /\* Assume no overlap *\/ */
  /* for(i=0;i<lsh_L;++i) { */
  /*   if (bina[i] == binb[i]) { */
  /*     res = true; /\* We found an overlap, (only need one) *\/ */
  /*     break; */
  /*   } */
  /* } */
  /* pfree(bina); */
  /* pfree(binb); */


  PG_RETURN_BOOL(true);
}
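
The test that the commented-out code would perform can be written as a small standalone function. This is only an illustrative sketch: `demo_binids_overlap` is a hypothetical helper, and the two arrays stand for bin ids that the extension would obtain from `lsh_generate_binids`, one id per binning.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: two vectors overlap if, for at least one of the
 * numbins binnings, they were assigned the same bin id.
 */
bool demo_binids_overlap(const uint32_t *bina, const uint32_t *binb, int numbins)
{
    int i;

    for (i = 0; i < numbins; ++i) {
        if (bina[i] == binb[i])
            return true;    /* one matching position is enough */
    }
    return false;           /* no binning placed the vectors together */
}
```

The comparison is positional: the same id appearing at different positions belongs to different binnings and does not count as an overlap.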
|
||||||
|
|
||||||
|
Datum lshvector_gin_extract_value(PG_FUNCTION_ARGS)
|
||||||
|
|
||||||
|
{
|
||||||
|
LSHVECTOR *a = PG_GETARG_LSHVECTOR_P(0);
|
||||||
|
int32 *nkeys = (int32 *) PG_GETARG_POINTER(1);
|
||||||
|
Datum *entries = (Datum *)palloc( sizeof(Datum) * lsh_L );
|
||||||
|
|
||||||
|
lsh_generate_binids_datum(entries,a->items,a->numitems);
|
||||||
|
PG_FREE_IF_COPY(a,0);
|
||||||
|
*nkeys = lsh_L;
|
||||||
|
PG_RETURN_POINTER(entries);
|
||||||
|
}
|
||||||
|
|
||||||
|
Datum lshvector_gin_extract_query(PG_FUNCTION_ARGS)
|
||||||
|
|
||||||
|
{
|
||||||
|
LSHVECTOR *a = PG_GETARG_LSHVECTOR_P(0);
|
||||||
|
int32 *nkeys = (int32 *) PG_GETARG_POINTER(1);
|
||||||
|
/* StrategyNumber strategy = PG_GETARG_UINT16(2); */
|
||||||
|
/* bool **pmatch = (bool **) PG_GETARG_POINTER(3); */
|
||||||
|
/* Pointer **extra_data = (Pointer **) PG_GETARG_POINTER(4); */
|
||||||
|
/* bool **nullFlags = (bool **) PG_GETARG_POINTER(5); */
|
||||||
|
/* int32 *searchMode = (int32 *) PG_GETARG_POINTER(6); */
|
||||||
|
Datum *entries = (Datum *)palloc( sizeof(Datum) * lsh_L );
|
||||||
|
|
||||||
|
lsh_generate_binids_datum(entries,a->items,a->numitems);
|
||||||
|
PG_FREE_IF_COPY(a,0);
|
||||||
|
*nkeys = lsh_L;
|
||||||
|
PG_RETURN_POINTER(entries);
|
||||||
|
}
|
||||||
|
|
||||||
|
Datum lshvector_gin_consistent(PG_FUNCTION_ARGS)
|
||||||
|
|
||||||
|
{
|
||||||
|
bool *check = (bool *) PG_GETARG_POINTER(0);
|
||||||
|
/* StrategyNumber strategy = PG_GETARG_UINT16(1); */
|
||||||
|
/* LSHVECTOR *a = PG_GETARG_LSHVECTOR_P(2); */
|
||||||
|
int32 nkeys = PG_GETARG_INT32(3);
|
||||||
|
/* Pointer *extra_data = (Pointer *) PG_GETARG_POINTER(4); */
|
||||||
|
bool *recheck = (bool *) PG_GETARG_POINTER(5);
|
||||||
|
bool res = false;
|
||||||
|
int32 i;
|
||||||
|
|
||||||
|
*recheck = false; /* The operator does NOT need to be recalculated, this routine should exactly match */
|
||||||
|
for(i=0;i<nkeys;++i) {
|
||||||
|
if (check[i]) { /* If ANY hash is present in the indexed lshvector */
|
||||||
|
res = true; /* this is considered an overlap */
|
||||||
|
break; /* and we don't need to look any further */
|
||||||
|
}
|
||||||
|
}
|
||||||
|
PG_RETURN_BOOL(res);
|
||||||
|
}
|
60
Ghidra/Features/BSim/src/lshvector/c/lsh.h
Executable file
@ -0,0 +1,60 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
#ifndef __LSH_H__
|
||||||
|
#define __LSH_H__
|
||||||
|
|
||||||
|
#include "postgres.h"
|
||||||
|
|
||||||
|
typedef struct
|
||||||
|
{
|
||||||
|
uint32 hash; /* A specific hash */
|
||||||
|
uint16 tf; /* Associated hash(term) frequency */
|
||||||
|
uint16 idf; /* Inverse Document Frequency */
|
||||||
|
double coeff; /* The actual weight of this hash as a coefficient */
|
||||||
|
} LSH_ITEM;
|
||||||
|
|
||||||
|
typedef struct
|
||||||
|
{
|
||||||
|
int32 vl_len_; /* varlena header (do not touch directly!) */
|
||||||
|
uint32 numitems;
|
||||||
|
uint32 hashcount; /* Total number of hashes counting multiplicity */
|
||||||
|
double length; /* Length of vector */
|
||||||
|
LSH_ITEM items[1];
|
||||||
|
} LSHVECTOR;
|
||||||
|
|
||||||
|
#define HDRSIZELSH offsetof(LSHVECTOR,items)
|
||||||
|
|
||||||
|
#define DatumGetLshVectorP(X) ((LSHVECTOR *) PG_DETOAST_DATUM(X))
|
||||||
|
#define PG_GETARG_LSHVECTOR_P(n) DatumGetLshVectorP(PG_GETARG_DATUM(n))
|
||||||
|
|
||||||
|
extern int32 lsh_k;
|
||||||
|
extern int32 lsh_L;
|
||||||
|
extern uint32 crc32tab[];
|
||||||
|
extern bool weights_loaded;
|
||||||
|
|
||||||
|
extern void lsh_calc_weights(LSHVECTOR *vec);
|
||||||
|
extern void lsh_initialize(void);
|
||||||
|
extern void lsh_load_weights(void);
|
||||||
|
extern void lsh_load_lookuptable(void);
|
||||||
|
extern uint64 lsh_hash_internal(LSHVECTOR *vec);
|
||||||
|
extern double lsh_compare_internal(LSHVECTOR *a,LSHVECTOR *b,double *sig);
|
||||||
|
|
||||||
|
extern void lsh_setup_signtable(void);
|
||||||
|
extern void lsh_load_binconfig(void);
|
||||||
|
extern void lsh_generate_binids(uint32 *res,LSH_ITEM *vec,uint32 vecsize);
|
||||||
|
extern void lsh_generate_binids_datum(Datum *res,LSH_ITEM *vec,uint32 vecsize);
|
||||||
|
|
||||||
|
#endif
|
476
Ghidra/Features/BSim/src/lshvector/c/weights.c
Executable file
@ -0,0 +1,476 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
#include "lsh.h"
|
||||||
|
#include "fmgr.h"
|
||||||
|
#include "executor/spi.h"
|
||||||
|
#include "utils/memutils.h"
|
||||||
|
#include <math.h>
|
||||||
|
|
||||||
|
#define LSH_IDFSIZE 512
|
||||||
|
#define LSH_TFSIZE 64
|
||||||
|
#define LSH_MAX_HASHENTRIES 1048576
|
||||||
|
#define LSH_MAX_K 31
|
||||||
|
#define LSH_MAX_L 1024
|
||||||
|
#define LSH_DEFAULT_K 17
|
||||||
|
#define LSH_DEFAULT_L 146
|
||||||
|
|
||||||
|
int32 lsh_k; /* Number of bits in a binid */
|
||||||
|
int32 lsh_L; /* Number of binnings */
|
||||||
|
|
||||||
|
static double lsh_idfweight[LSH_IDFSIZE]; /* Sorted weights least -> most probable for Inverse Document Freq */
|
||||||
|
static double lsh_tfweight[LSH_TFSIZE]; /* Sorted weights least -> most probable for Term Frequency */
|
||||||
|
static double lsh_weightnorm; /* Normalization of idf weights over raw log(probability) */
|
||||||
|
static double lsh_probflip0; /* Significance penalty for hash flips */
|
||||||
|
static double lsh_probflip1;
|
||||||
|
static double lsh_probdiff0; /* Significance penalty for length differences */
|
||||||
|
static double lsh_probdiff1;
|
||||||
|
static double lsh_scale; /* Final scaling for significance scoring */
|
||||||
|
static double lsh_addend;
|
||||||
|
static double lsh_probflip0_norm;
|
||||||
|
static double lsh_probflip1_norm;
|
||||||
|
static double lsh_probdiff0_norm;
|
||||||
|
static double lsh_probdiff1_norm;
|
||||||
|
|
||||||
|
typedef struct {
|
||||||
|
uint32 hash;
|
||||||
|
uint32 count;
|
||||||
|
} IDFEntry;
|
||||||
|
|
||||||
|
static MemoryContext lsh_mem_ctx;
|
||||||
|
static uint32 lsh_IDFTableMask; /* mask for hash table computation */
|
||||||
|
static IDFEntry *lsh_IDFTable = NULL; /* The IDFLookup table */
|
||||||
|
bool weights_loaded = false;
|
||||||
|
|
||||||
|
static void update_norms(void)
|
||||||
|
|
||||||
|
{
|
||||||
|
int32 i;
|
||||||
|
double scale_sqrt = sqrt(lsh_scale);
|
||||||
|
lsh_probflip0_norm = lsh_probflip0 * lsh_scale;
|
||||||
|
lsh_probflip1_norm = lsh_probflip1 * lsh_scale;
|
||||||
|
lsh_probdiff0_norm = lsh_probdiff0 * lsh_scale;
|
||||||
|
lsh_probdiff1_norm = lsh_probdiff1 * lsh_scale;
|
||||||
|
lsh_weightnorm = lsh_weightnorm / lsh_scale;
|
||||||
|
for(i=0;i<LSH_IDFSIZE;++i) {
|
||||||
|
lsh_idfweight[i] *= scale_sqrt;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
 * Load the IDF and TF weights and other scaling info from the table 'weighttable'
 * If the table isn't present, return false
 * This assumes the existence of a table with LSH_IDFSIZE + LSH_TFSIZE + 7 rows constructed with
 *    CREATE TABLE weighttable (id integer,weight double precision);
 */
static bool load_weights_from_table(void)

{
  SPITupleTable *spi_tuptable;
  TupleDesc spi_tupdesc;
  uint64 i,proc;
  int32 ret;
  char *resstring;
  int32 resindex;
  double resweight;

  ret = SPI_connect();

  if (ret < 0)
    elog(ERROR,"lshvector load_weights_from_table: SPI_connect returned %d",ret);

  /* Check for the existence of weighttable */
  ret = SPI_execute("SELECT relname from pg_class where relname='weighttable';",true,0);
  proc = SPI_processed;
  if ((ret != SPI_OK_SELECT)||(proc != 1)) {
    elog(WARNING,"lshvector load_weights_from_table: weighttable not present - using default weights");
    SPI_finish();
    return false;
  }

  ret = SPI_execute("SELECT ALL * from weighttable;",true,0); /* Read(only) all rows from table */
  proc = SPI_processed;

  if ((ret != SPI_OK_SELECT)||(proc != (LSH_IDFSIZE+LSH_TFSIZE + 7))) {
    elog(WARNING,"lshvector load_weights_from_table: weighttable has incorrect length - reverting to default weights");
    SPI_finish();
    return false;
  }
  spi_tupdesc = SPI_tuptable->tupdesc;
  spi_tuptable = SPI_tuptable;

  for(i=0;i<proc;++i) {
    HeapTuple tuple = spi_tuptable->vals[i];
    resstring = SPI_getvalue(tuple, spi_tupdesc, 1); /* Column numbers start at 1 */
    resindex = strtol(resstring,NULL,10);
    pfree(resstring);
    resstring = SPI_getvalue(tuple, spi_tupdesc, 2);
    resweight = atof( resstring );
    pfree(resstring);
    if (resindex < 0) {	/* A negative id would index before the start of lsh_idfweight */
      SPI_finish();
      return false;
    }
    else if (resindex < LSH_IDFSIZE)
      lsh_idfweight[resindex] = resweight;
    else if (resindex < LSH_IDFSIZE + LSH_TFSIZE)
      lsh_tfweight[resindex - LSH_IDFSIZE] = resweight;
    else if (resindex == (LSH_IDFSIZE + LSH_TFSIZE))
      lsh_weightnorm = resweight;
    else if (resindex == (LSH_IDFSIZE + LSH_TFSIZE + 1))
      lsh_probflip0 = resweight;
    else if (resindex == (LSH_IDFSIZE + LSH_TFSIZE + 2))
      lsh_probflip1 = resweight;
    else if (resindex == (LSH_IDFSIZE + LSH_TFSIZE + 3))
      lsh_probdiff0 = resweight;
    else if (resindex == (LSH_IDFSIZE + LSH_TFSIZE + 4))
      lsh_probdiff1 = resweight;
    else if (resindex == (LSH_IDFSIZE + LSH_TFSIZE + 5))
      lsh_scale = resweight;
    else if (resindex == (LSH_IDFSIZE + LSH_TFSIZE + 6))
      lsh_addend = resweight;
    else {
      SPI_finish();
      return false;
    }
  }
  SPI_finish();
  update_norms();
  return true;
}
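
The row layout that load_weights_from_table expects from weighttable can be summarized as a mapping from the `id` column to a destination slot. The sketch below is illustrative only: `DemoSlotKind` and `demo_classify_weight_id` are hypothetical names, the sizes repeat LSH_IDFSIZE (512) and LSH_TFSIZE (64) from this file, and the sketch treats negative ids as invalid.

```c
#define DEMO_IDFSIZE    512   /* mirrors LSH_IDFSIZE */
#define DEMO_TFSIZE     64    /* mirrors LSH_TFSIZE */
#define DEMO_NUMSCALARS 7     /* weightnorm, probflip0/1, probdiff0/1, scale, addend */

typedef enum {
    DEMO_SLOT_IDF,       /* entry of the IDF weight array */
    DEMO_SLOT_TF,        /* entry of the TF weight array */
    DEMO_SLOT_SCALAR,    /* one of the seven scalar parameters */
    DEMO_SLOT_INVALID    /* id outside the expected range */
} DemoSlotKind;

/*
 * Classify a weighttable row id and report its offset within the
 * destination (array index, or scalar number 0..6).
 */
DemoSlotKind demo_classify_weight_id(int id, int *offset)
{
    if (id < 0)
        return DEMO_SLOT_INVALID;
    if (id < DEMO_IDFSIZE) {
        *offset = id;
        return DEMO_SLOT_IDF;
    }
    if (id < DEMO_IDFSIZE + DEMO_TFSIZE) {
        *offset = id - DEMO_IDFSIZE;
        return DEMO_SLOT_TF;
    }
    if (id < DEMO_IDFSIZE + DEMO_TFSIZE + DEMO_NUMSCALARS) {
        *offset = id - DEMO_IDFSIZE - DEMO_TFSIZE;
        return DEMO_SLOT_SCALAR;
    }
    return DEMO_SLOT_INVALID;
}
```

With these sizes, ids 0 to 511 address IDF weights, 512 to 575 address TF weights, and 576 to 582 address the scalars in the order weightnorm, probflip0, probflip1, probdiff0, probdiff1, scale, addend.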
|
||||||
|
|
||||||
|
void lsh_load_weights(void)
|
||||||
|
|
||||||
|
{
|
||||||
|
int32 i;
|
||||||
|
if (load_weights_from_table()) /* Try to get weights from table */
|
||||||
|
return;
|
||||||
|
|
||||||
|
/* Provide some sort of reasonable default */
|
||||||
|
for(i=0;i<LSH_IDFSIZE;++i)
|
||||||
|
lsh_idfweight[i] = 1.0;
|
||||||
|
for(i=0;i<LSH_TFSIZE;++i)
|
||||||
|
lsh_tfweight[i] = 1.0;
|
||||||
|
|
||||||
|
lsh_weightnorm = 13.0;
|
||||||
|
lsh_probflip0 = 0.2;
|
||||||
|
lsh_probflip1 = 20.0;
|
||||||
|
lsh_probdiff0 = 0.2;
|
||||||
|
lsh_probdiff1 = 20.0;
|
||||||
|
lsh_scale = 1.0;
|
||||||
|
lsh_addend = 0.0;
|
||||||
|
update_norms();
|
||||||
|
}
|
||||||
|
|
||||||
|
static void initialize_idflookup_hashtable(uint32 size)
|
||||||
|
|
||||||
|
{
|
||||||
|
uint32 i;
|
||||||
|
MemoryContext oldctx;
|
||||||
|
|
||||||
|
lsh_IDFTableMask = 1;
|
||||||
|
while( lsh_IDFTableMask < size )
|
||||||
|
lsh_IDFTableMask <<= 1;
|
||||||
|
|
||||||
|
lsh_IDFTableMask <<= 1;
|
||||||
|
oldctx = MemoryContextSwitchTo(lsh_mem_ctx);
|
||||||
|
lsh_IDFTable = (IDFEntry *) palloc(sizeof(IDFEntry) * lsh_IDFTableMask);
|
||||||
|
for(i=0;i<lsh_IDFTableMask;++i) {
|
||||||
|
lsh_IDFTable[i].count = 0xffffffff; /* Mark all the slots as empty */
|
||||||
|
}
|
||||||
|
|
||||||
|
lsh_IDFTableMask -= 1;
|
||||||
|
MemoryContextSwitchTo(oldctx);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void insert_idflookup_hash(uint32 hash,uint32 count)

{
  IDFEntry *ptr;
  uint32 val = hash & lsh_IDFTableMask;
  for(;;) {
    ptr = lsh_IDFTable + val;
    if (ptr->count == 0xffffffff)	/* Found an empty slot */
      break;
    val = (val + 1) & lsh_IDFTableMask;
  }
  ptr->hash = hash;
  ptr->count = count;
}

static uint32 get_idflookup_count(uint32 hash)

{
  uint32 val;
  IDFEntry *ptr;
  if (lsh_IDFTableMask == 0)
    return 0;
  val = hash & lsh_IDFTableMask;
  for(;;) {
    ptr = lsh_IDFTable + val;
    if (ptr->count == 0xffffffff) break;	/* Is slot empty */
    if (ptr->hash == hash)
      return ptr->count;
    val = (val + 1) & lsh_IDFTableMask;
  }
  return 0;	/* Entry is not in the table (assume 0 count) */
}
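
The IDF lookup is an open-addressing hash table with linear probing: the capacity is a power of two at least twice the row count, a count of 0xffffffff marks an empty slot, and a miss reports a count of 0. A self-contained sketch of the same scheme follows, using `malloc` in place of the extension's memory context; `DemoEntry`, `DemoTable` and the `demo_table_*` functions are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_EMPTY 0xffffffffu   /* count value marking an unused slot */

typedef struct {
    uint32_t hash;
    uint32_t count;
} DemoEntry;

typedef struct {
    DemoEntry *slots;
    uint32_t mask;               /* capacity - 1, capacity is a power of two */
} DemoTable;

/* Size the table for `size` entries; returns 0 on success, -1 if out of memory */
int demo_table_init(DemoTable *t, uint32_t size)
{
    uint32_t cap = 1, i;

    while (cap < size)
        cap <<= 1;
    cap <<= 1;                   /* double again so the table stays at most half full */
    t->slots = malloc(sizeof(DemoEntry) * cap);
    if (t->slots == NULL)
        return -1;
    for (i = 0; i < cap; ++i)
        t->slots[i].count = DEMO_EMPTY;
    t->mask = cap - 1;
    return 0;
}

/* Store (hash,count) in the first empty slot at or after hash & mask */
void demo_table_insert(DemoTable *t, uint32_t hash, uint32_t count)
{
    uint32_t val = hash & t->mask;

    while (t->slots[val].count != DEMO_EMPTY)
        val = (val + 1) & t->mask;
    t->slots[val].hash = hash;
    t->slots[val].count = count;
}

/* Probe from hash & mask until the hash or an empty slot is found */
uint32_t demo_table_lookup(const DemoTable *t, uint32_t hash)
{
    uint32_t val = hash & t->mask;

    while (t->slots[val].count != DEMO_EMPTY) {
        if (t->slots[val].hash == hash)
            return t->slots[val].count;
        val = (val + 1) & t->mask;
    }
    return 0;                    /* not present: treat as count 0 */
}
```

Because the table is never more than half full, both probe loops are guaranteed to reach an empty slot and terminate.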
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Based on hash and existing idf and tf counts, calculate the final coefficient
|
||||||
|
* Also calculate the vector length and hashcount
|
||||||
|
*/
|
||||||
|
void lsh_calc_weights(LSHVECTOR *vec)
|
||||||
|
|
||||||
|
{
|
||||||
|
uint32 i;
|
||||||
|
LSH_ITEM *ptr;
|
||||||
|
uint32 idf;
|
||||||
|
double length = 0.0;
|
||||||
|
double coeff;
|
||||||
|
uint32 tf;
|
||||||
|
uint32 hashcount = 0;
|
||||||
|
|
||||||
|
ptr = vec->items;
|
||||||
|
for(i=0;i<vec->numitems;++i) {
|
||||||
|
idf = get_idflookup_count(ptr[i].hash);
|
||||||
|
ptr[i].idf = idf;
|
||||||
|
tf = ptr[i].tf;
|
||||||
|
coeff = lsh_idfweight[idf] * lsh_tfweight[ tf - 1 ];
|
||||||
|
ptr[i].coeff = coeff;
|
||||||
|
length += coeff * coeff;
|
||||||
|
hashcount += tf;
|
||||||
|
}
|
||||||
|
vec->length = sqrt(length);
|
||||||
|
vec->hashcount = hashcount;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Load the most common IDF hashes for lookup and weight generation from the table 'idflookup'
|
||||||
|
* If the table isn't present, return false
|
||||||
|
* This assumes the existence of a table with (approximately) 1000 rows constructed with
|
||||||
|
* CREATE TABLE idflookup( hash bigint, lookup integer);
|
||||||
|
*/
|
||||||
|
static bool load_idflookup_from_table(void)
|
||||||
|
|
||||||
|
{
|
||||||
|
SPITupleTable *spi_tuptable;
|
||||||
|
TupleDesc spi_tupdesc;
|
||||||
|
uint64 i,proc;
|
||||||
|
int32 ret;
|
||||||
|
char *resstring;
|
||||||
|
uint32 rescount;
|
||||||
|
uint32 reshash;
|
||||||
|
|
||||||
|
ret = SPI_connect();
|
||||||
|
|
||||||
|
if (ret < 0)
|
||||||
|
elog(ERROR,"lshvector load_idflookup_from_table: SPI_connect returned %d",ret);
|
||||||
|
|
||||||
|
/* Check for the existence of idflookup */
|
||||||
|
ret = SPI_execute("SELECT relname from pg_class where relname='idflookup';",true,0);
|
||||||
|
proc = SPI_processed;
|
||||||
|
if ((ret != SPI_OK_SELECT)||(proc != 1)) {
|
||||||
|
elog(WARNING,"lshvector load_idflookup_from_table: No IDF hashes present");
|
||||||
|
SPI_finish();
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
ret = SPI_execute("SELECT ALL * from idflookup;",true,0); /* Read(only) all rows from table */
|
||||||
|
proc = SPI_processed;
|
||||||
|
if ((ret != SPI_OK_SELECT)||(proc <= 1)||(proc > LSH_MAX_HASHENTRIES)) {
|
||||||
|
elog(WARNING,"lshvector load_idflookup_from_table: idflookup has invalid size: IDF hashes not loaded");
|
||||||
|
SPI_finish();
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
initialize_idflookup_hashtable((uint32)proc); /* Allocate the hashtable to hold entries for each row */
|
||||||
|
|
||||||
|
spi_tupdesc = SPI_tuptable->tupdesc;
|
||||||
|
spi_tuptable = SPI_tuptable;
|
||||||
|
|
||||||
|
for(i=0;i<proc;++i) {
|
||||||
|
HeapTuple tuple = spi_tuptable->vals[i];
|
||||||
|
resstring = SPI_getvalue(tuple, spi_tupdesc, 1); /* Column numbers start at 1 */
|
||||||
|
reshash = strtoul(resstring,NULL,10);
|
||||||
|
pfree(resstring);
|
||||||
|
resstring = SPI_getvalue(tuple, spi_tupdesc, 2);
|
||||||
|
rescount = strtoul(resstring,NULL,10);
|
||||||
|
pfree(resstring);
|
||||||
|
insert_idflookup_hash(reshash,rescount);
|
||||||
|
}
|
||||||
|
SPI_finish();
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
void lsh_load_binconfig(void)
|
||||||
|
|
||||||
|
{ /* Load the k and L parameters from the database */
|
||||||
|
SPITupleTable *spi_tuptable;
|
||||||
|
TupleDesc spi_tupdesc;
|
||||||
|
uint64 proc;
|
||||||
|
int32 ret;
|
||||||
|
char *resstring;
|
||||||
|
HeapTuple tuple;
|
||||||
|
|
||||||
|
ret = SPI_connect();
|
||||||
|
|
||||||
|
if (ret < 0)
|
||||||
|
elog(ERROR,"lshvector lsh_load_binconfig: SPI_connect returned %d",ret);
|
||||||
|
|
||||||
|
/* Check for the existence of keyvaluetable */
|
||||||
|
ret = SPI_execute("SELECT relname from pg_class where relname='keyvaluetable';",true,0);
|
||||||
|
proc = SPI_processed;
|
||||||
|
if ((ret != SPI_OK_SELECT)||(proc != 1)) {
|
||||||
|
SPI_finish();
|
||||||
|
lsh_k = LSH_DEFAULT_K; /* Reasonable defaults if configuration parameters don't exist */
|
||||||
|
lsh_L = LSH_DEFAULT_L;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Get the 'k' value */
|
||||||
|
ret = SPI_execute("SELECT value FROM keyvaluetable WHERE key='k';",true,0);
|
||||||
|
proc = SPI_processed;
|
||||||
|
if ((ret != SPI_OK_SELECT)||(proc != 1))
|
||||||
|
elog(ERROR,"lshvector lsh_load_binconfig: Could not load 'k' value from keyvaluetable");
|
||||||
|
|
||||||
|
spi_tupdesc = SPI_tuptable->tupdesc;
|
||||||
|
spi_tuptable = SPI_tuptable;
|
||||||
|
|
||||||
|
tuple = spi_tuptable->vals[0];
|
||||||
|
resstring = SPI_getvalue(tuple,spi_tupdesc, 1); /* First column */
|
||||||
|
lsh_k = strtoul(resstring,NULL,10);
|
||||||
|
pfree(resstring);
|
||||||
|
|
||||||
|
/* Get the 'L' value */
|
||||||
|
ret = SPI_execute("SELECT value FROM keyvaluetable WHERE key='L';",true,0);
|
||||||
|
proc = SPI_processed;
|
||||||
|
if ((ret != SPI_OK_SELECT)||(proc != 1))
|
||||||
|
elog(ERROR,"lshvector lsh_load_binconfig: Could not load 'L' value from keyvaluetable");
|
||||||
|
|
||||||
|
spi_tupdesc = SPI_tuptable->tupdesc;
|
||||||
|
spi_tuptable = SPI_tuptable;
|
||||||
|
|
||||||
|
tuple = spi_tuptable->vals[0];
|
||||||
|
resstring = SPI_getvalue(tuple,spi_tupdesc, 1); /* First column */
|
||||||
|
lsh_L = strtoul(resstring,NULL,10);
|
||||||
|
pfree(resstring);
|
||||||
|
SPI_finish();
|
||||||
|
|
||||||
|
if (lsh_k < 1 || lsh_k > LSH_MAX_K || lsh_L < 1 || lsh_L > LSH_MAX_L)
|
||||||
|
elog(ERROR,"lshvector lsh_load_binconfig: Invalid k and L settings");
|
||||||
|
}
|
||||||
|
|
||||||
|
void lsh_load_lookuptable(void)
|
||||||
|
|
||||||
|
{
|
||||||
|
if (lsh_IDFTable != NULL) {
|
||||||
|
pfree(lsh_IDFTable);
|
||||||
|
lsh_IDFTable = NULL;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (load_idflookup_from_table())
|
||||||
|
return;
|
||||||
|
|
||||||
|
if (lsh_IDFTable != NULL) {
|
||||||
|
pfree(lsh_IDFTable);
|
||||||
|
lsh_IDFTable = NULL;
|
||||||
|
}
|
||||||
|
lsh_IDFTableMask = 0; /* Default lookup, always return 0 */
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Initialize the weight system, the first time the extension is loaded */
|
||||||
|
void lsh_initialize(void)
|
||||||
|
|
||||||
|
{
|
||||||
|
lsh_mem_ctx = AllocSetContextCreate(TopMemoryContext,
|
||||||
|
"IDF weights lookup table",
|
||||||
|
ALLOCSET_DEFAULT_MINSIZE,
|
||||||
|
ALLOCSET_DEFAULT_INITSIZE,
|
||||||
|
ALLOCSET_DEFAULT_MAXSIZE);
|
||||||
|
|
||||||
|
lsh_IDFTable = NULL;
|
||||||
|
weights_loaded = false;
|
||||||
|
|
||||||
|
lsh_setup_signtable();
|
||||||
|
}
|
||||||
|
|
||||||
|
double lsh_compare_internal(LSHVECTOR *a,LSHVECTOR *b,double *sig)
|
||||||
|
|
||||||
|
{
|
||||||
|
double res = 0.0;
|
||||||
|
double dotproduct;
|
||||||
|
int32 intersectcount = 0;
|
||||||
|
uint32 hash1,hash2;
|
||||||
|
LSH_ITEM *aptr,*aend,*bptr,*bend;
|
||||||
|
int32 t1,t2;
|
||||||
|
double w1,w2;
|
||||||
|
uint32 numflip,diff,min,max;
|
||||||
|
|
||||||
|
aptr = a->items;
|
||||||
|
aend = aptr + a->numitems;
|
||||||
|
bptr = b->items;
|
||||||
|
bend = bptr + b->numitems;
|
||||||
|
|
||||||
|
if ((aptr != aend)&&(bptr != bend)) {
|
||||||
|
hash1 = aptr->hash;
|
||||||
|
hash2 = bptr->hash;
|
||||||
|
for(;;) {
|
||||||
|
if (hash1 == hash2) {
|
||||||
|
t1 = aptr->tf;
|
||||||
|
t2 = bptr->tf;
|
||||||
|
if (t1 < t2) { /* a has the smallest number of terms with same hash */
|
||||||
|
w1 = aptr->coeff; /* Use a's weight */
|
||||||
|
res += w1 * w1;
|
||||||
|
intersectcount += t1; /* All of a's terms are in the intersection, count them */
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
w2 = bptr->coeff; /* Use b's weight */
|
||||||
|
res += w2 * w2;
|
||||||
|
intersectcount += t2; /* All of b's terms are in the intersection, count them */
|
||||||
|
}
|
||||||
|
aptr++;
|
||||||
|
bptr++;
|
||||||
|
if (aptr == aend) break;
|
||||||
|
if (bptr == bend) break;
|
||||||
|
hash1 = aptr->hash;
|
||||||
|
hash2 = bptr->hash;
|
||||||
|
}
|
||||||
|
else if (hash1 < hash2) {
|
||||||
|
aptr++;
|
||||||
|
if (aptr == aend) break;
|
||||||
|
hash1 = aptr->hash;
|
||||||
|
}
|
||||||
|
else { /* hash1 > hash2 */
|
||||||
|
bptr++;
|
||||||
|
if (bptr == bend) break;
|
||||||
|
hash2 = bptr->hash;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
dotproduct = res;
|
||||||
|
res /= (a->length * b->length);
|
||||||
|
}
|
||||||
|
else
|
||||||
|
dotproduct = res;
|
||||||
|
|
||||||
|
if (a->hashcount < b->hashcount) {
|
||||||
|
min = a->hashcount; /* Smallest vector is a */
|
||||||
|
max = b->hashcount;
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
min = b->hashcount;
|
||||||
|
max = a->hashcount;
|
||||||
|
}
|
||||||
|
diff = max - min; /* Subtract to get a positive difference */
|
||||||
|
numflip = min - intersectcount;
|
||||||
|
*sig = dotproduct - numflip * (lsh_probflip0_norm + lsh_probflip1_norm/max)
|
||||||
|
- diff * (lsh_probdiff0_norm + lsh_probdiff1_norm/max) + lsh_addend;
|
||||||
|
return res;
|
||||||
|
}
|
||||||
|
|
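The merge walk in lsh_compare_internal above can be illustrated with a small standalone sketch. This is not the extension's code: SketchItem, SketchVector and sketch_compare are hypothetical stand-ins for LSH_ITEM, LSHVECTOR and the comparison routine, the zero-length guard is an addition made only for this sketch, and the significance term (which depends on constants not shown in this diff) is left out. It assumes, as the original loop does, that both item arrays are sorted by ascending hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for LSH_ITEM: one hashed feature of a function. */
typedef struct {
    uint32_t hash;  /* feature hash; arrays are sorted by ascending hash */
    int32_t tf;     /* term frequency of the feature */
    double coeff;   /* weight contributed by the feature */
} SketchItem;

/* Hypothetical stand-in for LSHVECTOR; the caller supplies the norm. */
typedef struct {
    const SketchItem *items;
    int numitems;
    double length;  /* Euclidean norm of the coeff values */
} SketchVector;

/*
 * Walk both sorted arrays in step, accumulating the squared weight of each
 * shared hash, then normalize by the two vector lengths (cosine similarity).
 * The number of shared terms is written to *intersectcount.
 */
static double sketch_compare(const SketchVector *a, const SketchVector *b,
                             int32_t *intersectcount)
{
    double dot = 0.0;
    int i = 0, j = 0;

    assert(a != NULL && b != NULL && intersectcount != NULL);
    *intersectcount = 0;

    while (i < a->numitems && j < b->numitems) {
        const SketchItem *x = &a->items[i];
        const SketchItem *y = &b->items[j];
        if (x->hash == y->hash) {
            /* Shared feature: credit the side with the smaller term count. */
            const SketchItem *lesser = (x->tf < y->tf) ? x : y;
            dot += lesser->coeff * lesser->coeff;
            *intersectcount += lesser->tf;
            i++;
            j++;
        } else if (x->hash < y->hash) {
            i++;    /* feature only in a */
        } else {
            j++;    /* feature only in b */
        }
    }

    /* Guard added for this sketch only; the original has no such check. */
    if (a->length == 0.0 || b->length == 0.0)
        return 0.0;
    return dot / (a->length * b->length);
}
```

Because each array is visited once, the cost is linear in the combined number of items. A caller would fill two SketchVector values from hash-sorted arrays and read back both the similarity and the intersection count.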
107
Ghidra/Features/BSim/src/lshvector/lshvector--1.0.sql
Executable file
@ -0,0 +1,107 @@
|
|||||||
|
|
||||||
|
|
||||||
|
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
|
||||||
|
\echo Use "CREATE EXTENSION lshvector" to load this file. \quit
|
||||||
|
|
||||||
|
-- Create user-defined type for feature vector
|
||||||
|
|
||||||
|
CREATE FUNCTION lshvector_in(cstring)
|
||||||
|
RETURNS lshvector
|
||||||
|
AS 'MODULE_PATHNAME'
|
||||||
|
LANGUAGE C STABLE STRICT;
|
||||||
|
-- Stable because of configurable weights
|
||||||
|
|
||||||
|
CREATE FUNCTION lshvector_out(lshvector)
|
||||||
|
RETURNS cstring
|
||||||
|
AS 'MODULE_PATHNAME'
|
||||||
|
LANGUAGE C IMMUTABLE STRICT;
|
||||||
|
|
||||||
|
CREATE FUNCTION lshvector_recv(internal)
|
||||||
|
RETURNS lshvector
|
||||||
|
AS 'MODULE_PATHNAME'
|
||||||
|
LANGUAGE C STABLE STRICT;
|
||||||
|
-- Stable because of configurable weights
|
||||||
|
|
||||||
|
CREATE FUNCTION lshvector_send(lshvector)
|
||||||
|
RETURNS bytea
|
||||||
|
AS 'MODULE_PATHNAME'
|
||||||
|
LANGUAGE C IMMUTABLE STRICT;
|
||||||
|
|
||||||
|
CREATE FUNCTION lshvector_hash(lshvector)
|
||||||
|
RETURNS int8
|
||||||
|
AS 'MODULE_PATHNAME'
|
||||||
|
LANGUAGE C IMMUTABLE STRICT;
|
||||||
|
|
||||||
|
CREATE FUNCTION lsh_load()
|
||||||
|
RETURNS int4
|
||||||
|
AS 'MODULE_PATHNAME'
|
||||||
|
LANGUAGE C STRICT;
|
||||||
|
|
||||||
|
CREATE FUNCTION lsh_reload()
|
||||||
|
RETURNS int4
|
||||||
|
AS 'MODULE_PATHNAME'
|
||||||
|
LANGUAGE C STRICT;
|
||||||
|
|
||||||
|
CREATE FUNCTION lsh_getweight(lshvector)
|
||||||
|
RETURNS float8
|
||||||
|
AS 'MODULE_PATHNAME'
|
||||||
|
LANGUAGE C IMMUTABLE STRICT;
|
||||||
|
|
||||||
|
CREATE TYPE lshvector (
|
||||||
|
INTERNALLENGTH = variable,
|
||||||
|
INPUT = lshvector_in,
|
||||||
|
OUTPUT = lshvector_out,
|
||||||
|
RECEIVE = lshvector_recv,
|
||||||
|
SEND = lshvector_send,
|
||||||
|
ALIGNMENT = double,
|
||||||
|
STORAGE = external
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE TYPE lshvector_comptype AS (
|
||||||
|
sim DOUBLE PRECISION,
|
||||||
|
sig DOUBLE PRECISION
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE FUNCTION lshvector_compare(lshvector,lshvector)
|
||||||
|
RETURNS lshvector_comptype
|
||||||
|
AS 'MODULE_PATHNAME'
|
||||||
|
LANGUAGE C IMMUTABLE STRICT;
|
||||||
|
|
||||||
|
CREATE FUNCTION lshvector_overlap(lshvector,lshvector)
|
||||||
|
RETURNS bool
|
||||||
|
AS 'MODULE_PATHNAME'
|
||||||
|
LANGUAGE C STABLE STRICT;
|
||||||
|
|
||||||
|
CREATE FUNCTION lshvector_gin_extract_value(lshvector,internal)
|
||||||
|
RETURNS internal
|
||||||
|
AS 'MODULE_PATHNAME'
|
||||||
|
LANGUAGE C STABLE STRICT;
|
||||||
|
|
||||||
|
CREATE FUNCTION lshvector_gin_extract_query(lshvector,internal,int2,internal,internal,internal,internal)
|
||||||
|
RETURNS internal
|
||||||
|
AS 'MODULE_PATHNAME'
|
||||||
|
LANGUAGE C STABLE STRICT;
|
||||||
|
|
||||||
|
CREATE FUNCTION lshvector_gin_consistent(internal, int2, lshvector, int4, internal, internal, internal, internal)
|
||||||
|
RETURNS bool
|
||||||
|
AS 'MODULE_PATHNAME'
|
||||||
|
LANGUAGE C IMMUTABLE STRICT;
|
||||||
|
|
||||||
|
CREATE OPERATOR % (
|
||||||
|
LEFTARG = lshvector,
|
||||||
|
RIGHTARG = lshvector,
|
||||||
|
PROCEDURE = lshvector_overlap,
|
||||||
|
COMMUTATOR = '%',
|
||||||
|
RESTRICT = contsel,
|
||||||
|
JOIN = contjoinsel
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE OPERATOR CLASS gin_lshvector_ops
|
||||||
|
FOR TYPE lshvector USING gin
|
||||||
|
AS
|
||||||
|
OPERATOR 1 % (lshvector,lshvector),
|
||||||
|
FUNCTION 1 btint4cmp (int4,int4),
|
||||||
|
FUNCTION 2 lshvector_gin_extract_value (lshvector,internal),
|
||||||
|
FUNCTION 3 lshvector_gin_extract_query (lshvector,internal,int2,internal,internal,internal,internal),
|
||||||
|
FUNCTION 4 lshvector_gin_consistent (internal,int2,lshvector,int4,internal,internal,internal,internal),
|
||||||
|
STORAGE int4;
|
6
Ghidra/Features/BSim/src/lshvector/lshvector.control
Executable file
@ -0,0 +1,6 @@
|
|||||||
|
# Locality Sensitive Hashing extension
|
||||||
|
comment = 'a feature vector type and a locality sensitive hashing index'
|
||||||
|
default_version = '1.0'
|
||||||
|
module_pathname = '$libdir/lshvector'
|
||||||
|
superuser = false
|
||||||
|
relocatable = true
|
175
Ghidra/Features/BSim/src/main/help/help/TOC_Source.xml
Executable file
@ -0,0 +1,175 @@
|
|||||||
|
<?xml version='1.0' encoding='ISO-8859-1' ?>
|
||||||
|
<!--
|
||||||
|
|
||||||
|
This is an XML file intended to be parsed by the Ghidra help system. It is loosely based
|
||||||
|
upon the JavaHelp table of contents document format. The Ghidra help system uses a
|
||||||
|
TOC_Source.xml file to allow a module with help to define how its contents appear in the
|
||||||
|
Ghidra help viewer's table of contents. The main document (in the Base module)
|
||||||
|
defines a basic structure for the
|
||||||
|
Ghidra table of contents system. Other TOC_Source.xml files may use this structure to insert
|
||||||
|
their files directly into this structure (and optionally define a substructure).
|
||||||
|
|
||||||
|
|
||||||
|
In this document, a tag can be either a <tocdef> or a <tocref>. The former is a definition
|
||||||
|
of an XML item that may have a link and may contain other <tocdef> and <tocref> children.
|
||||||
|
<tocdef> items may be referred to in other documents by using a <tocref> tag with the
|
||||||
|
appropriate id attribute value. Using these two tags allows any module to define a place
|
||||||
|
in the table of contents system (<tocdef>), which also provides a place for
|
||||||
|
other TOC_Source.xml files to insert content (<tocref>).
|
||||||
|
|
||||||
|
During the help build time, all TOC_Source.xml files will be parsed and validated to ensure
|
||||||
|
that all <tocref> tags point to valid <tocdef> tags. From these files will be generated
|
||||||
|
<module name>_TOC.xml files, which are table of contents files written in the format
|
||||||
|
desired by the JavaHelp system. Additionally, the generated files will be merged together
|
||||||
|
as they are loaded by the JavaHelp system. In the end, when displaying help in the Ghidra
|
||||||
|
help GUI, there will be one table of contents that has been created from the definitions in
|
||||||
|
all of the modules' TOC_Source.xml files.
|
||||||
|
|
||||||
|
|
||||||
|
Tags and Attributes
|
||||||
|
|
||||||
|
<tocdef>
|
||||||
|
-id - the name of the definition (this must be unique across all TOC_Source.xml files)
|
||||||
|
-text - the display text of the node, as seen in the help GUI
|
||||||
|
-target** - the file to display when the node is clicked in the GUI
|
||||||
|
-sortgroup - this is a string that defines where a given node should appear under a given
|
||||||
|
parent. The string values will be sorted by the JavaHelp system using
|
||||||
|
a java.text.RuleBasedCollator. If this attribute is not specified, then
|
||||||
|
the value of the text attribute will be used.
|
||||||
|
|
||||||
|
<tocref>
|
||||||
|
-id - The id of the <tocdef> that this reference points to
|
||||||
|
|
||||||
|
**The URL for the target is relative and should start with 'help/topics'. This text is
|
||||||
|
used by the Ghidra help system to provide a universal starting point for all links so that
|
||||||
|
they can be resolved at runtime, across modules.
|
||||||
|
|
||||||
|
|
||||||
|
-->
|
||||||
|
|
||||||
|
<tocroot>
|
||||||
|
|
||||||
|
<tocref id="Ghidra Functionality">
|
||||||
|
<tocdef id="BSim"
|
||||||
|
text="BSim"
|
||||||
|
target= "help/topics/BSim/BSimOverview.html">
|
||||||
|
<tocdef id="BSimDatabaseConfiguration" sortgroup="a"
|
||||||
|
text="BSim Database Configuration"
|
||||||
|
target="help/topics/BSim/DatabaseConfiguration.html" >
|
||||||
|
<tocdef id="BSim Overview"
|
||||||
|
sortgroup="a"
|
||||||
|
text="Overview"
|
||||||
|
target="help/topics/BSim/DatabaseConfiguration.html#ConfigOverview" />
|
||||||
|
<tocdef id="BSim Server Configuration"
|
||||||
|
sortgroup="b"
|
||||||
|
text="Server Configuration"
|
||||||
|
target="help/topics/BSim/DatabaseConfiguration.html#ServerConfig" />
|
||||||
|
<tocdef id="Creating a BSim Database"
|
||||||
|
sortgroup="c"
|
||||||
|
text="Creating a Database"
|
||||||
|
target="help/topics/BSim/DatabaseConfiguration.html#CreateDatabase" />
|
||||||
|
<tocdef id="Tailoring BSim Meta-dataX"
|
||||||
|
sortgroup="d"
|
||||||
|
text="Tailoring BSim Meta-data"
|
||||||
|
target="help/topics/BSim/DatabaseConfiguration.html#TailorBSim" />
|
||||||
|
</tocdef>
|
||||||
|
<tocdef id="BSimIngestProcess" sortgroup="b"
|
||||||
|
text="Ingesting Executables"
|
||||||
|
target="help/topics/BSim/IngestProcess.html" >
|
||||||
|
<tocdef id="BSim Ingest Process"
|
||||||
|
sortgroup="a"
|
||||||
|
text="Ingest Process"
|
||||||
|
target="help/topics/BSim/IngestProcess.html#IngestOverview"/>
|
||||||
|
<tocdef id="BSim Tailoring Analysis"
|
||||||
|
sortgroup="b"
|
||||||
|
text="Tailoring Analysis"
|
||||||
|
target="help/topics/BSim/IngestProcess.html#TailorAnalysis"/>
|
||||||
|
<tocdef id="BSim Analysis Effects on Feature Extraction"
|
||||||
|
sortgroup="c"
|
||||||
|
text="Analysis Effects on Feature Extraction"
|
||||||
|
target="help/topics/BSim/IngestProcess.html#AnalysisEffects"/>
|
||||||
|
<tocdef id="BSim Maintenance"
|
||||||
|
sortgroup="d"
|
||||||
|
text="Maintenance"
|
||||||
|
target="help/topics/BSim/IngestProcess.html#Maintenance"/>
|
||||||
|
<tocdef id="BSim Migration"
|
||||||
|
sortgroup="e"
|
||||||
|
text="Migration"
|
||||||
|
target="help/topics/BSim/IngestProcess.html#Migration"/>
|
||||||
|
</tocdef>
|
||||||
|
|
||||||
|
|
||||||
|
<tocdef id="BSimSearch"
|
||||||
|
text="BSim Search"
|
||||||
|
target = "help/topics/BSimSearchPlugin/BSimSearch.html">
|
||||||
|
<tocdef id="Adding_BSim_Plugin"
|
||||||
|
sortgroup="a"
|
||||||
|
text="Enabling the BSim Search Plugin"
|
||||||
|
target = "help/topics/BSimSearchPlugin/BSimSearch.html#Adding_BSim_Plugin">
|
||||||
|
</tocdef>
|
||||||
|
<tocdef id="BSim_Servers_Dialog"
|
||||||
|
sortgroup="b"
|
||||||
|
text="Defining And Managing BSim Database Definitions"
|
||||||
|
target = "help/topics/BSimSearchPlugin/BSimSearch.html#BSim_Servers_Dialog">
|
||||||
|
</tocdef>
|
||||||
|
<tocdef id="BSim_Overview_Dialog"
|
||||||
|
sortgroup="c"
|
||||||
|
text="Overview Query"
|
||||||
|
target = "help/topics/BSimSearchPlugin/BSimSearch.html#BSim_Overview_Dialog">
|
||||||
|
</tocdef>
|
||||||
|
<tocdef id="BSim_Overview_Results"
|
||||||
|
sortgroup="d"
|
||||||
|
text="Overview Query Results"
|
||||||
|
target = "help/topics/BSimSearchPlugin/BSimSearch.html#BSim_Overview_Results">
|
||||||
|
</tocdef>
|
||||||
|
<tocdef id="BSim_Search_Dialog"
|
||||||
|
sortgroup="e"
|
||||||
|
text="Similar Function Search"
|
||||||
|
target = "help/topics/BSimSearchPlugin/BSimSearch.html#BSim_Search_Dialog">
|
||||||
|
</tocdef>
|
||||||
|
<tocdef id="Similar_Functions_Results"
|
||||||
|
sortgroup="f"
|
||||||
|
text="Similar Function Search Results"
|
||||||
|
target = "help/topics/BSimSearchPlugin/BSimSearch.html#Similar_Functions_Results">
|
||||||
|
</tocdef>
|
||||||
|
<tocdef id="BSim_Authentication"
|
||||||
|
sortgroup="g"
|
||||||
|
text="Authentication"
|
||||||
|
target = "help/topics/BSimSearchPlugin/BSimSearch.html#BSim_Authentication">
|
||||||
|
</tocdef>
|
||||||
|
</tocdef>
|
||||||
|
|
||||||
|
<tocdef id="BSimFeatureWeight" sortgroup="d"
|
||||||
|
text="Features and Weights"
|
||||||
|
target="help/topics/BSim/FeatureWeight.html" >
|
||||||
|
|
||||||
|
<tocdef id="BSim Features of Software Functions"
|
||||||
|
sortgroup="a"
|
||||||
|
text="Features of Software Functions"
|
||||||
|
target="help/topics/BSim/FeatureWeight.html#FunctionFeatures"/>
|
||||||
|
<tocdef id="BSim Weighting Software Features"
|
||||||
|
sortgroup="b"
|
||||||
|
text="Weighting Software Features"
|
||||||
|
target="help/topics/BSim/FeatureWeight.html#WeightingSoftware"/>
|
||||||
|
<tocdef id="BSim Comparing Feature Vectors"
|
||||||
|
sortgroup="d"
|
||||||
|
text="Comparing Feature Vectors"
|
||||||
|
target="help/topics/BSim/FeatureWeight.html#CompareVectors"/>
|
||||||
|
</tocdef>
|
||||||
|
|
||||||
|
<tocdef id="BSimCommandLine" sortgroup="e"
|
||||||
|
text="Command-Line Utility Reference"
|
||||||
|
target="help/topics/BSim/CommandLineReference.html" >
|
||||||
|
|
||||||
|
<tocdef id="BSim Control (bsim_ctl)"
|
||||||
|
sortgroup="a"
|
||||||
|
text="BSim Control (bsim_ctl)"
|
||||||
|
target="help/topics/BSim/CommandLineReference.html#BSimCtl"/>
|
||||||
|
<tocdef id="BSim Command (bsim)"
|
||||||
|
sortgroup="b"
|
||||||
|
text="BSim Command (bsim)"
|
||||||
|
target="help/topics/BSim/CommandLineReference.html#BSimCommand"/>
|
||||||
|
</tocdef>
|
||||||
|
</tocdef>
|
||||||
|
</tocref>
|
||||||
|
</tocroot>
|
25
Ghidra/Features/BSim/src/main/help/help/shared/languages.css
Normal file
@ -0,0 +1,25 @@
|
|||||||
|
/* ###
|
||||||
|
* IP: GHIDRA
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
/*
|
||||||
|
This file contains non-Ghidra style sheet markup. This file will be loaded in addition to
|
||||||
|
DefaultStyle.css.
|
||||||
|
*/
|
||||||
|
|
||||||
|
div.informalexample { margin-left: 50px; margin-top: 10px; }
|
||||||
|
dd { margin-bottom: 20px; }
|
||||||
|
dd p { margin-top: 5px; margin-left: 10px; }
|
||||||
|
span.term { font-family:times new roman; font-size:14pt; font-weight:bold; }
|
||||||
|
span.redtext { color:#CC0033; }
|
@ -0,0 +1,197 @@
|
|||||||
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
|
||||||
|
|
||||||
|
<HTML>
|
||||||
|
<HEAD>
|
||||||
|
<META name="generator" content=
|
||||||
|
"HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net">
|
||||||
|
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
|
||||||
|
|
||||||
|
<TITLE>BSim Database</TITLE>
|
||||||
|
<LINK rel="stylesheet" type="text/css" href="help/shared/DefaultStyle.css">
|
||||||
|
<LINK rel="stylesheet" type="text/css" href="../../shared/languages.css">
|
||||||
|
<META name="generator" content="DocBook XSL Stylesheets V1.79.1">
|
||||||
|
<LINK rel="home" href="index.html" title="BSim Database">
|
||||||
|
<LINK rel="up" href="index.html" title="BSim Database">
|
||||||
|
<LINK rel="prev" href="index.html" title="BSim Database">
|
||||||
|
<LINK rel="next" href="DatabaseConfiguration.html" title="Database Configuration">
|
||||||
|
</HEAD>
|
||||||
|
|
||||||
|
<BODY>
|
||||||
|
<DIV class="chapter">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H1 class="title"><A name="DatabaseOverview"></A>BSim Database</H1>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
|
||||||
|
<H3 class="title">Quick Reference Links</H3>
|
||||||
|
|
||||||
|
<DIV class="itemizedlist">
|
||||||
|
<UL class="itemizedlist compact" style="list-style-type: disc;">
|
||||||
|
<LI class="listitem"><A class="link" href="DatabaseConfiguration.html" title=
|
||||||
|
"Database Configuration">Database Configuration</A></LI>
|
||||||
|
|
||||||
|
<LI class="listitem"><A class="link" href="IngestProcess.html" title=
|
||||||
|
"Ingesting Executables">Ingesting Executables</A></LI>
|
||||||
|
|
||||||
|
<LI class="listitem"><A class="link" href="../BSimSearchPlugin/BSimSearch.html" title=
|
||||||
|
"Querying a BSim Database">Querying a BSim Database</A></LI>
|
||||||
|
|
||||||
|
<LI class="listitem"><A class="link" href="FeatureWeight.html" title=
|
||||||
|
"Features and Weights">Features and Weights</A></LI>
|
||||||
|
|
||||||
|
<LI class="listitem"><A class="link" href="CommandLineReference.html" title=
|
||||||
|
"Command-Line Utility Reference">Command-Line Reference</A></LI>
|
||||||
|
</UL>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="section">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H2 class="title" style="clear: both"><A name="IntroOverview"></A>Overview</H2>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>Welcome to Ghidra's BSim (Behavioral Similarity) Database. This database technology is
|
||||||
|
designed to allow reverse engineers to ingest metadata about previously analyzed binary
|
||||||
|
executables to a central server or local database, which can then be queried in the
|
||||||
|
course of analyzing new,
|
||||||
|
unknown executables to quickly discover previously seen functions and libraries.</P>
|
||||||
|
|
||||||
|
<P>The primary record ingested into the database describes a single function. The most
|
||||||
|
novel aspects of the database are that:</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<DIV class="itemizedlist">
|
||||||
|
<UL class="itemizedlist" style="list-style-type: disc;">
|
||||||
|
<LI class="listitem">Queries are tolerant of variations in the compilation of the
|
||||||
|
function.</LI>
|
||||||
|
|
||||||
|
<LI class="listitem">All records are indexed for quick queries, even for very large
|
||||||
|
collections.</LI>
|
||||||
|
</UL>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>The primary feature set used for indexing a function is extracted from a concise
|
||||||
|
description of the data-flow of the function, not the explicit encoding of the machine
|
||||||
|
instructions. The data-flow description is a graph-based (abstract syntax tree)
|
||||||
|
representation, based on Ghidra's intermediate representation language, p-code, and is
|
||||||
|
generated by the Ghidra decompiler. The resulting function descriptions are normalized to
|
||||||
|
minimize the impact of variations due to:</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<DIV class="itemizedlist">
|
||||||
|
<UL class="itemizedlist" style="list-style-type: disc;">
|
||||||
|
<LI class="listitem">Equivalent machine instructions</LI>
|
||||||
|
|
||||||
|
<LI class="listitem">Storage location (registers, stack, memory)</LI>
|
||||||
|
|
||||||
|
<LI class="listitem">Instruction order</LI>
|
||||||
|
|
||||||
|
<LI class="listitem">Many forms of compiler transformation</LI>
|
||||||
|
|
||||||
|
<LI class="listitem">Some forms of deliberate obfuscation</LI>
|
||||||
|
</UL>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>Records are indexed using current Text Retrieval strategies, which allow "nearest
|
||||||
|
neighbor" queries. The feature set of an unknown function being queried does not have to
|
||||||
|
exactly match the features of a "hit" in the database, but only a configurable percentage
|
||||||
|
of them. This supplies an additional level of tolerance of "functional difference" on top
|
||||||
|
of the tolerance of "functionally equivalent" variations provided by the decompiler. In
|
||||||
|
other words, there can be some amount of true change in the underlying source code, and the
|
||||||
|
query may still be able to find a match.</P>
|
||||||
|
|
||||||
|
<P>Queries are quick: For a single function, results typically come back in microseconds,
|
||||||
|
even for a database containing millions of functions.</P>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="section">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H2 class="title" style="clear: both"><A name="ToolOverview"></A>Overview of
|
||||||
|
Tools</H2>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>A BSim Database is built on top of one of three technologies: PostgreSQL,
|
||||||
|
local H2 database, or Elasticsearch.
|
||||||
|
PostgreSQL is a robust, production-capable server that supports multiple simultaneous
|
||||||
|
connections and is extremely fault tolerant. Elasticsearch is a scalable search engine that
|
||||||
|
allows a database to be distributed across an entire cluster of machines.
|
||||||
|
The local H2 database support is provided for convenience and use with small personal
|
||||||
|
collections. For any of these options, this distribution includes specific reverse
|
||||||
|
engineering extensions and clients that provide the following capabilities.</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<DIV class="itemizedlist">
|
||||||
|
<UL class="itemizedlist" style="list-style-type: disc;">
|
||||||
|
<LI class="listitem">
|
||||||
|
Integration with a Ghidra Server or local project:
|
||||||
|
|
||||||
|
<DIV class="itemizedlist">
|
||||||
|
<UL class="itemizedlist" style="list-style-type: circle;">
|
||||||
|
<LI class="listitem">Ingest can be with respect to a Ghidra repository
|
||||||
|
from either a Ghidra Server or local project.</LI>
|
||||||
|
|
||||||
|
<LI class="listitem">Query results can refer to executables within a
|
||||||
|
repository.</LI>
|
||||||
|
|
||||||
|
<LI class="listitem">Easy command-line ingests using the <CODE class=
|
||||||
|
"filename">bsim</CODE> command script</LI>
|
||||||
|
</UL>
|
||||||
|
</DIV>
|
||||||
|
</LI>
|
||||||
|
|
||||||
|
<LI class="listitem">
|
||||||
|
Client as a Ghidra Plug-in:
|
||||||
|
|
||||||
|
<DIV class="itemizedlist">
|
||||||
|
<UL class="itemizedlist" style="list-style-type: circle;">
|
||||||
|
<LI class="listitem">Ghidra includes a plug-in client that integrates a query
|
||||||
|
dialog and results windows directly into the main code browser.</LI>
|
||||||
|
</UL>
|
||||||
|
</DIV>
|
||||||
|
</LI>
|
||||||
|
|
||||||
|
<LI class="listitem">
|
||||||
|
Query API:
|
||||||
|
|
||||||
|
<DIV class="itemizedlist">
|
||||||
|
<UL class="itemizedlist" style="list-style-type: circle;">
|
||||||
|
<LI class="listitem">Ghidra includes a Java API to the BSim server so that
|
||||||
|
queries (and potentially ingest) can be incorporated into analyst scripts. The
|
||||||
|
API marshals queries and results between an active Ghidra session and a BSim
|
||||||
|
server.</LI>
|
||||||
|
</UL>
|
||||||
|
</DIV>
|
||||||
|
</LI>
|
||||||
|
</UL>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
|
||||||
|
<H3 class="title">Note</H3>
|
||||||
|
|
||||||
|
<P>The PostgreSQL server software is currently only supported for the <SPAN class=
|
||||||
|
"emphasis"><EM>Linux</EM></SPAN> and <SPAN class="emphasis"><EM>MacOS</EM></SPAN>
|
||||||
|
architectures. Elasticsearch server software must be obtained separately. Small local
|
||||||
|
file-based databases are supported on all platforms via an embedded H2 database
|
||||||
|
engine. The BSim client
|
||||||
|
software is supported on all platforms and can connect to servers on a different
|
||||||
|
architecture.</P>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</BODY>
|
||||||
|
</HTML>
|
@ -0,0 +1,820 @@
|
|||||||
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
|
||||||
|
|
||||||
|
<HTML>
|
||||||
|
<HEAD>
|
||||||
|
<META name="generator" content=
|
||||||
|
"HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net">
|
||||||
|
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
|
||||||
|
|
||||||
|
<TITLE>Command-Line Utility Reference</TITLE>
|
||||||
|
<LINK rel="stylesheet" type="text/css" href="help/shared/DefaultStyle.css">
|
||||||
|
<LINK rel="stylesheet" type="text/css" href="../../shared/languages.css">
|
||||||
|
<META name="generator" content="DocBook XSL Stylesheets V1.79.1">
|
||||||
|
<LINK rel="home" href="index.html" title="BSim Database">
|
||||||
|
<LINK rel="up" href="index.html" title="BSim Database">
|
||||||
|
<LINK rel="prev" href="FeatureWeight.html" title="Features and Weights">
|
||||||
|
</HEAD>
|
||||||
|
|
||||||
|
<BODY>
|
||||||
|
<DIV class="chapter">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H1 class="title"><A name="CommandLineReference"></A>Command-Line Utility
|
||||||
|
Reference</H1>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="section">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H2 class="title" style="clear: both"><A name="BSimCtl"></A><CODE class=
|
||||||
|
"computeroutput">bsim_ctl</CODE></H2>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<PRE>
|
||||||
|
<CODE class="computeroutput">
|
||||||
|
bsim_ctl start </datadir-path> [auth=pki|password|trust] [--noLocalAuth] [cafile=</cacert-path>] [dn=".."]
|
||||||
|
bsim_ctl stop </datadir-path> [--force]
|
||||||
|
bsim_ctl adduser </datadir-path> <username> [dn=".."]
|
||||||
|
bsim_ctl dropuser </datadir-path> <username>
|
||||||
|
bsim_ctl resetpassword <username>
|
||||||
|
bsim_ctl changeauth </datadir-path> [auth=pki|password|trust] [--noLocalAuth] [cafile=</cacert-path>] [dn=".."]
|
||||||
|
bsim_ctl changeprivilege <username> admin|user
|
||||||
|
|
||||||
|
Global Options:
|
||||||
|
port=<portnum>
|
||||||
|
user=<username>
|
||||||
|
cert=</certfile-path>
|
||||||
|
</CODE>
|
||||||
|
</PRE>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>bsim_ctl</STRONG></SPAN> is a command-line utility for
|
||||||
|
starting and stopping a BSim server using the PostgreSQL back-end that is prepackaged with
|
||||||
|
the Ghidra distribution. All commands must be run on the machine hosting the server.
|
||||||
|
Optional parameters for a given command are indicated by square brackets '[' and ']'.
|
||||||
|
Options with an '=' character require a user specified value. If the value string requires
|
||||||
|
space characters, it should be enclosed in double quotes.</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<DIV class="variablelist">
|
||||||
|
<DL class="variablelist">
|
||||||
|
<DT><SPAN class="term"><SPAN class="bold"><STRONG>start</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Initializes and starts a PostgreSQL server. The command-line must include a path
|
||||||
|
to the data directory for the server, which must exist. If a server had run
|
||||||
|
previously and populated this directory, this command simply restarts the server
|
||||||
|
using the preexisting data and configuration; otherwise, a new database is
|
||||||
|
initialized. The user performing the initial start is automatically added to the
|
||||||
|
database with <SPAN class="emphasis"><EM>admin</EM></SPAN> privileges.</P>
|
||||||
|
|
||||||
|
<P>During a restart, any authentication options (with the exception of the global
|
||||||
|
<SPAN class="bold"><STRONG>cert=</STRONG></SPAN> option) are unnecessary and will
|
||||||
|
be ignored. The PostgreSQL server will be restarted with the already established
|
||||||
|
settings. To actually change the settings, use the <SPAN class=
|
||||||
|
"bold"><STRONG>changeauth</STRONG></SPAN> command before restarting.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>auth=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>type</EM></SPAN> - specifies the authentication type (<B>pki |
|
||||||
|
password | trust</B>) for a new database: <SPAN class=
|
||||||
|
"emphasis"><EM>trust</EM></SPAN> for no authentication, <SPAN class=
|
||||||
|
"emphasis"><EM>password</EM></SPAN> for password authentication, and <SPAN class=
|
||||||
|
"emphasis"><EM>pki</EM></SPAN> for authentication using public key certificates.
|
||||||
|
With the <SPAN class="emphasis"><EM>pki</EM></SPAN> setting, both the <SPAN class=
|
||||||
|
"bold"><STRONG>cafile=</STRONG></SPAN> and the <SPAN class=
|
||||||
|
"bold"><STRONG>dn=</STRONG></SPAN> options also need to be provided; additionally
|
||||||
|
the <SPAN class="bold"><STRONG>cert=</STRONG></SPAN> option must be provided unless
|
||||||
|
the <SPAN class="bold"><STRONG>--noLocalAuth</STRONG></SPAN> option is also
|
||||||
|
given.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>--noLocalAuth</STRONG></SPAN> - used together with
|
||||||
|
the <SPAN class="command"><STRONG>auth=</STRONG></SPAN> option causes
|
||||||
|
authentication to not be required for local connections, i.e. localhost.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>cafile=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>/cafile-path</EM></SPAN> - specifies an absolute path to a
|
||||||
|
certificate authority file and is required for <SPAN class=
|
||||||
|
"command"><STRONG>auth=pki</STRONG></SPAN>. This file should contain the
|
||||||
|
certificates that the PostgreSQL server will use for authentication,
|
||||||
|
concatenated together in PEM format.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>dn=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>name</EM></SPAN> - specifies the Distinguished Name for the admin
|
||||||
|
user and is required for <SPAN class=
|
||||||
|
"command"><STRONG>auth=pki</STRONG></SPAN>.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>port=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>portnum</EM></SPAN> - specifies the port the PostgreSQL server will
|
||||||
|
listen on. For port numbers other than the default 5432, URLs and other
|
||||||
|
command lines must explicitly specify the port when connecting to the server. This
|
||||||
|
option only affects the initial start of a server. For subsequent (re)starts this
|
||||||
|
option is ignored, and the server will continue to listen on the same port
|
||||||
|
specified in the initial start. Use <SPAN class=
|
||||||
|
"command"><STRONG>changeauth</STRONG></SPAN> to change the port of a server after
|
||||||
|
its initial start.</P>
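<P>The option rules above for <STRONG>auth=pki</STRONG> can be sketched as a small Python helper that assembles an argument list before the tool is invoked. This is an illustrative sketch only: the function name, the example paths, and the placement of the data directory directly after <STRONG>start</STRONG> are assumptions, not part of <STRONG>bsim_ctl</STRONG> itself.</P>

```python
def bsim_ctl_start_args(datadir, auth=None, cafile=None, dn=None,
                        cert=None, no_local_auth=False, port=None):
    """Assemble an argument list for `bsim_ctl start` (illustrative sketch)."""
    if auth not in (None, "pki", "password", "trust"):
        raise ValueError("auth must be pki, password, or trust")
    if auth == "pki":
        # cafile= and dn= are both required with auth=pki.
        if cafile is None or dn is None:
            raise ValueError("auth=pki requires cafile= and dn=")
        # cert= is also required unless --noLocalAuth is given.
        if cert is None and not no_local_auth:
            raise ValueError("auth=pki requires cert= unless --noLocalAuth is given")
    # Assumption: the data directory follows the subcommand name.
    args = ["bsim_ctl", "start", datadir]
    for key, value in (("auth", auth), ("cafile", cafile), ("dn", dn),
                       ("cert", cert), ("port", port)):
        if value is not None:
            args.append(f"{key}={value}")
    if no_local_auth:
        args.append("--noLocalAuth")
    return args
```

<P>The returned list could be passed to <CODE>subprocess.run</CODE>. Because each option is a single list element, values such as a Distinguished Name need no extra shell quoting.</P>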
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class="bold"><STRONG>stop</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Stops a currently running PostgreSQL server. The path to the actively used data
|
||||||
|
directory must be provided. By default, shutdown will wait until existing
|
||||||
|
connections to the database have been closed.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>--force</STRONG></SPAN> - causes existing
|
||||||
|
connections to be forcibly closed and the PostgreSQL server to shut down
|
||||||
|
immediately.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class="bold"><STRONG>adduser</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Give a new user permission to access the PostgreSQL server. The path to the
|
||||||
|
actively used data directory and a single username must be specified. The server
|
||||||
|
must be running. New users are given <SPAN class="emphasis"><EM>user</EM></SPAN>
|
||||||
|
(read-only) privileges, unless a subsequent <SPAN class=
|
||||||
|
"command"><STRONG>changeprivilege</STRONG></SPAN> command is used.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>dn=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>name</EM></SPAN> - specifies the Distinguished Name of the new user,
|
||||||
|
which is required if the database enabled <SPAN class=
|
||||||
|
"command"><STRONG>auth=pki</STRONG></SPAN>. This option can be used to provide a
|
||||||
|
Distinguished Name to a preexisting user, if the PostgreSQL server's authentication
|
||||||
|
strategy is changed.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>dropuser</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Remove access to the PostgreSQL server for a specific user. The path to the
|
||||||
|
actively used data directory and a single username must be specified. The server
|
||||||
|
must be running.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>changeauth</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Change the configuration of a previously initialized PostgreSQL server. The path
|
||||||
|
to the server's data directory must be specified. The server must not currently be
|
||||||
|
running to use this command, which only takes effect after a restart. Options have
|
||||||
|
the same meaning as for the <SPAN class="command"><STRONG>start</STRONG></SPAN>
|
||||||
|
command.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>port=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>portnum</EM></SPAN> - changes the port the PostgreSQL server will
|
||||||
|
listen on. If this option is not present, the server will continue to listen on the
|
||||||
|
same port.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>auth=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>type</EM></SPAN> - changes the authentication type (<B>pki |
|
||||||
|
password | trust</B>) used by the PostgreSQL server. No change is made if the
|
||||||
|
option is not present. If the option is present, omitting the <SPAN class=
|
||||||
|
"command"><STRONG>--noLocalAuth</STRONG></SPAN> causes local connections to require
|
||||||
|
authentication. This command does not affect the presence or absence of passwords
|
||||||
|
or Distinguished Names for existing users.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>dn=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>name</EM></SPAN> - specifies the Distinguished Name for the admin
|
||||||
|
user and is required for <SPAN class=
|
||||||
|
"command"><STRONG>auth=pki</STRONG></SPAN>.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>resetpassword</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Reset the password for a user. A single user must be specified, and the
|
||||||
|
PostgreSQL server must be running. The password will be reset to 'changeme'.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>changeprivilege</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Change access privilege for a user. A single user must be specified followed by
|
||||||
|
<SPAN class="command"><STRONG>admin</STRONG></SPAN> or <SPAN class=
|
||||||
|
"command"><STRONG>user</STRONG></SPAN>, and the PostgreSQL server must be
|
||||||
|
running.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class="bold"><STRONG>--Global
|
||||||
|
Options--</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>These options apply to all the <SPAN class=
|
||||||
|
"command"><STRONG>bsim_ctl</STRONG></SPAN> commands that connect to an active
|
||||||
|
PostgreSQL server: <SPAN class="command"><STRONG>start</STRONG></SPAN>, <SPAN
|
||||||
|
class="command"><STRONG>adduser</STRONG></SPAN>, <SPAN class=
|
||||||
|
"command"><STRONG>dropuser</STRONG></SPAN>, <SPAN class=
|
||||||
|
"command"><STRONG>resetpassword</STRONG></SPAN>, and <SPAN class=
|
||||||
|
"command"><STRONG>changeprivilege</STRONG></SPAN>.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>port=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>portnum</EM></SPAN> - specifies the port on which to connect with
|
||||||
|
the PostgreSQL server.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>user=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>username</EM></SPAN> - specifies a user name to use when connecting
|
||||||
|
to the PostgreSQL server.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>cert=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>/certfile-path</EM></SPAN> - provides the absolute file path to the
|
||||||
|
user's certificate when connecting to a PostgreSQL server that requires PKI
|
||||||
|
authentication.</P>
|
||||||
|
</DD>
|
||||||
|
</DL>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="section">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H2 class="title" style="clear: both"><A name="BSimCommand"></A><CODE class=
|
||||||
|
"computeroutput">bsim</CODE></H2>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<PRE>
|
||||||
|
<CODE class="computeroutput">
|
||||||
|
bsim createdatabase <bsimURL> <config_template> [name="<name>"] [owner="<owner>"] [description="<text>"] [--nocallgraph]
|
||||||
|
bsim setmetadata <bsimURL> [name="<name>"] [owner="<owner>"] [description="<text>"]
|
||||||
|
bsim addexecategory <bsimURL> <category_name> [--date]
|
||||||
|
bsim addfunctiontag <bsimURL> <tag_name>
|
||||||
|
bsim dropindex <bsimURL>
|
||||||
|
bsim rebuildindex <bsimURL>
|
||||||
|
bsim prewarm <bsimURL>
|
||||||
|
bsim generatesigs <ghidraURL> </xmldirectory> config=<config_template> [--overwrite]
|
||||||
|
bsim generatesigs <ghidraURL> </xmldirectory> bsim=<bsimURL> [--commit] [--overwrite]
|
||||||
|
bsim generatesigs <ghidraURL> bsim=<bsimURL>
|
||||||
|
bsim commitsigs <bsimURL> </xmldirectory> [md5=<hash>] [override=<ghidraURL>]
|
||||||
|
bsim generateupdates <ghidraURL> </xmldirectory> config=<config_template> [--overwrite]
|
||||||
|
bsim generateupdates <ghidraURL> </xmldirectory> bsim=<bsimURL> [--commit] [--overwrite]
|
||||||
|
bsim generateupdates <ghidraURL> bsim=<bsimURL>
|
||||||
|
bsim commitupdates <bsimURL> </xmldirectory>
|
||||||
|
bsim listexes <bsimURL> [md5=<hash>] [name=<exe_name>] [arch=<languageID>] [compiler=<cspecID>] [sortcol=<column_name>] [limit=<exe_count>] [--includelibs]
|
||||||
|
bsim getexecount <bsimURL> [md5=<hash>] [name=<exe_name>] [arch=<languageID>] [compiler=<cspecID>] [--includelibs]
|
||||||
|
bsim delete <bsimURL> [md5=<hash>] [name=<exe_name> [arch=<languageID>] [compiler=<cspecID>]]
|
||||||
|
bsim listfuncs <bsimURL> [md5=<hash>] [name=<exe_name> [arch=<languageID>] [compiler=<cspecID>]] [--printselfsig] [--callgraph] [--printjustexe] [maxfunc=<max_count>]
|
||||||
|
bsim dumpsigs <bsimURL> </xmldirectory> [md5=<hash>] [name=<exe_name> [arch=<languageID>] [compiler=<cspecID>]]
|
||||||
|
|
||||||
|
Global options:
|
||||||
|
user=<username>
|
||||||
|
cert=<certfile-path>
|
||||||
|
</CODE>
|
||||||
|
</PRE>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>See <A class="xref" href="CommandLineReference.html#URLs">“Ghidra and BSim
|
||||||
|
URLs”</A> below for details about specifying <EM>ghidraURL</EM> and <EM>bsimURL</EM>
|
||||||
|
properly. See <A class="xref" href="DatabaseConfiguration.html">“Database
|
||||||
|
Configuration”</A> for guidance on the various BSim Databases which are
|
||||||
|
supported.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>bsim</STRONG></SPAN> is a command-line utility for
|
||||||
|
managing the generation and ingest of BSim signatures and metadata. Depending on the
|
||||||
|
subcommand, it connects to a Ghidra Server and/or a BSim database server. A <SPAN class=
"emphasis"><EM>ghidraURL</EM></SPAN> refers to a Ghidra Server or local project using the
|
||||||
|
<SPAN class="command"><STRONG>ghidra:</STRONG></SPAN> protocol, while <SPAN class=
|
||||||
|
"emphasis"><EM>bsimURL</EM></SPAN> refers to a BSim database server with the appropriate
|
||||||
|
<SPAN class="command"><STRONG>postgresql:</STRONG></SPAN>, <SPAN class=
|
||||||
|
"command"><STRONG>https:</STRONG></SPAN>, or <SPAN class=
|
||||||
|
"command"><STRONG>file:</STRONG></SPAN> protocol specified. The <SPAN class=
|
||||||
|
"command"><STRONG>elastic:</STRONG></SPAN> protocol is equivalent to and may be used in
|
||||||
|
place of the <SPAN class="command"><STRONG>https:</STRONG></SPAN> protocol.</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<DIV class="variablelist">
|
||||||
|
<DL class="variablelist">
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>createdatabase</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Creates a new empty repository. A URL and configuration template (<SPAN class=
"bold"><STRONG>config_template</STRONG></SPAN>) are required. The new database name
|
||||||
|
is taken from the path element of the URL.</P>
|
||||||
|
|
||||||
|
<P>Supported configuration templates (<SPAN class=
|
||||||
|
"bold"><STRONG>config_template</STRONG></SPAN>) are defined within the Ghidra
|
||||||
|
installation in XML form. The following configurations are currently defined:
|
||||||
|
(<SPAN class="bold"><STRONG>large_32, medium_32, medium_64, medium_cpool,
|
||||||
|
medium_nosize</STRONG></SPAN>).</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>name=</STRONG></SPAN> - specifies a formal, more
|
||||||
|
descriptive, name for the repository that can be used for the BSim client
|
||||||
|
display.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>owner=</STRONG></SPAN> - gives a descriptive name
|
||||||
|
for the owner of the repository and/or the data it will contain.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>description=</STRONG></SPAN> - specifies a short
|
||||||
|
string describing the intended contents of the new repository.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>--nocallgraph</STRONG></SPAN> - disables storing call relationships between
|
||||||
|
ingested functions. Default is to store call relationships.</P>
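<P>As a rough illustration of how the pieces of a <STRONG>createdatabase</STRONG> invocation fit together, the following Python sketch checks the template name against the list above and builds the argument list. The helper name and the example URL are hypothetical, and the bare <STRONG>--nocallgraph</STRONG> flag follows the form shown in the synopsis.</P>

```python
# Template names as listed in this section.
TEMPLATES = ("large_32", "medium_32", "medium_64", "medium_cpool", "medium_nosize")


def createdatabase_args(bsim_url, template, name=None, owner=None,
                        description=None, callgraph=True):
    """Assemble an argument list for `bsim createdatabase` (illustrative sketch)."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown configuration template: {template}")
    args = ["bsim", "createdatabase", bsim_url, template]
    # Optional descriptive metadata, passed as key=value pairs.
    for key, value in (("name", name), ("owner", owner),
                       ("description", description)):
        if value is not None:
            args.append(f"{key}={value}")
    # Call relationships are stored by default; the flag turns that off.
    if not callgraph:
        args.append("--nocallgraph")
    return args
```

<P>For example, <CODE>createdatabase_args("postgresql://localhost/example_db", "medium_nosize", name="Example")</CODE> yields a list suitable for <CODE>subprocess.run</CODE>; the database name <EM>example_db</EM> is taken from the path element of the URL, as described above.</P>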
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>setmetadata</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Change the global <SPAN class="emphasis"><EM>name</EM></SPAN>, <SPAN class=
|
||||||
|
"emphasis"><EM>owner</EM></SPAN>, or <SPAN class=
|
||||||
|
"emphasis"><EM>description</EM></SPAN> metadata associated with a BSim server. A
|
||||||
|
BSim server URL is required.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>name=</STRONG></SPAN> - specifies a formal, more
|
||||||
|
descriptive, name for the repository that can be used for the BSim client
|
||||||
|
display.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>owner=</STRONG></SPAN> - gives a descriptive name
|
||||||
|
for the owner of the repository and/or the data it will contain.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>description=</STRONG></SPAN> - specifies a short
|
||||||
|
string describing the intended contents of the new repository.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>addexecategory</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Specify a new executable category to be included with generated metadata. A BSim
|
||||||
|
server URL and the name of the new category are required. This only affects future
|
||||||
|
ingest commands. Executables that have already been ingested are unaffected,
|
||||||
|
although they can be adjusted with a <SPAN class=
"command"><STRONG>generateupdates</STRONG></SPAN> command.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>--date</STRONG></SPAN> - indicates the new category
|
||||||
|
holds date/time information.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>addfunctiontag</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Specify a new function tag to be included with generated metadata. A BSim server
|
||||||
|
URL and the name of the new tag are required. This only affects future ingest
|
||||||
|
commands. Functions that have already been ingested are unaffected, although they
|
||||||
|
can be adjusted with a <SPAN class="command"><STRONG>generateupdates</STRONG></SPAN>
|
||||||
|
command.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>dropindex</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Delete the main signature index from a BSim repository (in preparation for new
|
||||||
|
ingest). A BSim repository URL is required. Until the index is rebuilt, normal queries will not complete or
|
||||||
|
will be extremely slow.</P>
|
||||||
|
|
||||||
|
<P><STRONG>NOTE:</STRONG> Not supported by H2 file database</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>rebuildindex</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Recreate the main signature index (that had previously been dropped) for a BSim
|
||||||
|
repository. A BSim server URL is required. After this command completes, normal
|
||||||
|
function queries should be fast.</P>
|
||||||
|
|
||||||
|
<P><STRONG>NOTE:</STRONG> Not supported by H2 file database</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class="bold"><STRONG>prewarm</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Instruct a restarted BSim server to preload pages from the main signature index
|
||||||
|
and function table into RAM. This avoids slow random access disk reads on initial
|
||||||
|
queries. A BSim server URL is required.</P>
|
||||||
|
|
||||||
|
<P><STRONG>NOTE:</STRONG> Not supported by Elasticsearch or H2 file databases</P>
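<P>The backend restrictions noted for <STRONG>dropindex</STRONG>, <STRONG>rebuildindex</STRONG>, and <STRONG>prewarm</STRONG> can be checked before a command is issued. The Python sketch below assumes that the URL protocol identifies the backend (<STRONG>postgresql:</STRONG> for PostgreSQL, <STRONG>https:</STRONG> or <STRONG>elastic:</STRONG> for Elasticsearch, <STRONG>file:</STRONG> for an H2 file database); the helper names and example URLs are hypothetical.</P>

```python
# Assumed mapping from URL protocol to database backend.
BACKEND_BY_SCHEME = {
    "postgresql": "PostgreSQL",
    "https": "Elasticsearch",
    "elastic": "Elasticsearch",
    "file": "H2",
}

# Restrictions taken from the NOTE entries in this section.
UNSUPPORTED_BACKENDS = {
    "dropindex": frozenset({"H2"}),
    "rebuildindex": frozenset({"H2"}),
    "prewarm": frozenset({"Elasticsearch", "H2"}),
}


def backend_for(bsim_url):
    """Return the backend name implied by the protocol of a BSim URL."""
    scheme, sep, _ = bsim_url.partition(":")
    backend = BACKEND_BY_SCHEME.get(scheme.lower()) if sep else None
    if backend is None:
        raise ValueError(f"not a recognized BSim URL: {bsim_url!r}")
    return backend


def is_supported(command, bsim_url):
    """Return True unless the command is documented as unsupported there."""
    unsupported = UNSUPPORTED_BACKENDS.get(command, frozenset())
    return backend_for(bsim_url) not in unsupported
```

<P>Commands without a NOTE in this section are treated as supported everywhere, which reflects only what this page states.</P>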
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>generatesigs</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Generates function signatures and metadata for all program files retrieved from
|
||||||
|
a Ghidra Server repository or project as specified by a Ghidra URL. The generated
|
||||||
|
signatures may be retained as XML "sigs_" files within a specified XML storage
|
||||||
|
directory and/or committed to a BSim database specified with the <SPAN
|
||||||
|
class="command"><STRONG>bsim=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>bsimURL</EM></SPAN> option. If an XML storage directory is not
|
||||||
|
specified, a BSim URL must be specified to which the data will be committed.</P>
|
||||||
|
|
||||||
|
<P>The <SPAN class="command"><STRONG>config=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>config-template</EM></SPAN> option may be specified when generating
|
||||||
|
XML "sigs_" signature files in the absence of a BSim database (see
|
||||||
|
<STRONG>createdatabase</STRONG> for supported configurations). The generated files
|
||||||
|
will be written to the specified XML storage directory. Creation of the signature
|
||||||
|
files can also be achieved by specifying the <STRONG>bsim=</STRONG><EM>bsimURL</EM>
|
||||||
|
option instead of the <STRONG>config=</STRONG> option.</P>
|
||||||
|
|
||||||
|
<P>The <SPAN class="command"><STRONG>--overwrite</STRONG></SPAN> <SPAN class=
|
||||||
|
"emphasis">option may be specified when an XML storage directory has also been
|
||||||
|
specified to allow conflicting signature files to be overwritten.</SPAN></P>
|
||||||
|
|
||||||
|
<P>The <SPAN class="command"><STRONG>--commit</STRONG></SPAN> <SPAN class=
|
||||||
|
"emphasis">option may be specified when a BSim URL has also been specified to allow
|
||||||
|
generated signatures to be committed to the BSim database. This option is implied
|
||||||
|
when a BSim URL has been specified without an XML storage directory.</SPAN></P>
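<P>The three <STRONG>generatesigs</STRONG> forms in the synopsis can be summarized with a short Python sketch that validates the option combination and returns the matching argument list. The function name and the example URLs are hypothetical; the checks mirror only the rules stated in this section.</P>

```python
def generatesigs_args(ghidra_url, xml_dir=None, bsim_url=None, config=None,
                      commit=False, overwrite=False):
    """Assemble an argument list for `bsim generatesigs` (illustrative sketch)."""
    if xml_dir is None:
        # Without an XML directory the data must go straight to a database.
        if bsim_url is None:
            raise ValueError("without an XML directory, bsim= is required")
        if config is not None or overwrite:
            raise ValueError("config= and --overwrite need an XML directory")
        # --commit is implied in this form, so it is not added.
        return ["bsim", "generatesigs", ghidra_url, f"bsim={bsim_url}"]
    # With an XML directory, bsim= is given instead of config=, or vice versa.
    if (bsim_url is None) == (config is None):
        raise ValueError("give exactly one of bsim= or config=")
    if commit and bsim_url is None:
        raise ValueError("--commit needs bsim=")
    args = ["bsim", "generatesigs", ghidra_url, xml_dir]
    if bsim_url is not None:
        args.append(f"bsim={bsim_url}")
    else:
        args.append(f"config={config}")
    if commit:
        args.append("--commit")
    if overwrite:
        args.append("--overwrite")
    return args
```

<P>Calling <CODE>generatesigs_args("ghidra://host/repo", bsim_url="postgresql://host/db")</CODE> produces the shortest form, in which signatures are committed directly and no "sigs_" files are retained.</P>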
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>commitsigs</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Commit previously generated signatures and metadata (see
|
||||||
|
<STRONG>generatesigs</STRONG>) to a BSim repository. A URL specifying the BSim
|
||||||
|
repository and a path to a directory containing the "sigs_" XML files to commit are
|
||||||
|
required.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>override=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>ghidraURL</EM></SPAN> - causes any Ghidra repository/project URL,
|
||||||
|
describing the storage repository and path of executables that was recorded in the
|
||||||
|
"sigs_" XML files during signature generation, to be overridden during the commit
|
||||||
|
operation with the specified Ghidra URL.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>generateupdates</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Generates updated function metadata for program files from a Ghidra Server
|
||||||
|
repository or project, as specified by a Ghidra URL, which previously had signature
|
||||||
|
and metadata generated (see <STRONG>generatesigs</STRONG>). Only metadata (names,
function tags, categories, etc.) is changed. Signatures are not affected. The
|
||||||
|
generated updates may be retained as XML "update_" files within a specified XML
|
||||||
|
storage directory and/or committed to a BSim database specified with the
|
||||||
|
<SPAN class="command"><STRONG>bsim=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>bsimURL</EM></SPAN> option. If an XML storage directory is not
|
||||||
|
specified, a BSim URL must be specified to which the data will be committed.</P>
|
||||||
|
|
||||||
|
<P>The <SPAN class="command"><STRONG>config=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>config-template</EM></SPAN> option may be specified when generating
|
||||||
|
XML "update_" files in the absence of a BSim database (see
|
||||||
|
<STRONG>createdatabase</STRONG> for supported configurations). The generated files
|
||||||
|
will be written to the specified XML storage directory. Creation of the update
|
||||||
|
files can also be achieved by specifying the <STRONG>bsim=</STRONG><EM>bsimURL</EM>
|
||||||
|
option instead of the <STRONG>config=</STRONG> option.</P>
|
||||||
|
|
||||||
|
<P>The <SPAN class="command"><STRONG>--overwrite</STRONG></SPAN> <SPAN class=
|
||||||
|
"emphasis">option may be specified when an XML storage directory has also been
|
||||||
|
specified to allow conflicting update files to be overwritten.</SPAN></P>
|
||||||
|
|
||||||
|
<P>The <SPAN class="command"><STRONG>--commit</STRONG></SPAN> <SPAN class=
|
||||||
|
"emphasis">option may be specified when a BSim URL has also been specified to allow
|
||||||
|
generated updates to be committed to the BSim database. This option is implied when
|
||||||
|
a BSim URL has been specified without an XML storage directory.</SPAN></P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>commitupdates</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Update a BSim repository with previously generated update metadata (see
|
||||||
|
<STRONG>generateupdates</STRONG>). A URL specifying the BSim repository and a path
|
||||||
|
to a directory containing the "update_" XML files to commit are required.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>listexes</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>List all executable program records within a specified BSim database repository
|
||||||
|
which satisfy the specified criteria. A BSim URL specifying the repository must be
|
||||||
|
provided, and one of two options, <SPAN class=
|
||||||
|
"command"><STRONG>md5=</STRONG></SPAN> or <SPAN class=
|
||||||
|
"command"><STRONG>name=</STRONG></SPAN>, that indicate the specific executable must
|
||||||
|
also be given. All matching executable records will be listed.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>md5=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>32-hexdigits</EM></SPAN> - specifies an executable via its MD5
|
||||||
|
checksum.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>name=</STRONG></SPAN> - specifies an executable
|
||||||
|
name which may match one or more executable records.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>arch=</STRONG></SPAN> - specifies an architecture
|
||||||
|
as a Ghidra processor id which will be used to filter executables.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>compiler=</STRONG></SPAN> - specifies a compiler
|
||||||
|
specification id which will be used to filter executables.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>sortcol=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>column</EM></SPAN> - Indicates which display column should be used
|
||||||
|
to sort the results (<STRONG>MD5 | NAME</STRONG>; default:
|
||||||
|
<STRONG>MD5</STRONG>).</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>limit=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>max_count</EM></SPAN> - specifies the maximum number of executables
|
||||||
|
to be listed which match the search criteria (default=20, a value of 0 indicates no
|
||||||
|
limit).</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>--includelibs</STRONG> - If specified, executable
|
||||||
|
records which correspond to a referenced Library will be included. Such records
|
||||||
|
have a fabricated MD5 that is based on the library name.</SPAN></P>
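<P>A minimal Python sketch of assembling the <STRONG>listexes</STRONG> filters is shown here. It checks that an MD5 value has the documented 32 hexadecimal digits and that <STRONG>sortcol=</STRONG> is one of the documented values. The function name and the example URL are hypothetical, and accepting only the upper-case spellings of <STRONG>MD5</STRONG> and <STRONG>NAME</STRONG> is an assumption of this sketch.</P>

```python
import re

# An MD5 checksum is written as exactly 32 hexadecimal digits.
MD5_PATTERN = re.compile(r"[0-9a-fA-F]{32}")


def listexes_args(bsim_url, md5=None, name=None, arch=None, compiler=None,
                  sortcol=None, limit=None, include_libs=False):
    """Assemble an argument list for `bsim listexes` (illustrative sketch)."""
    if md5 is not None and not MD5_PATTERN.fullmatch(md5):
        raise ValueError("md5 must be 32 hexadecimal digits")
    # Assumption: only the spellings shown in this section are accepted.
    if sortcol is not None and sortcol not in ("MD5", "NAME"):
        raise ValueError("sortcol must be MD5 or NAME")
    args = ["bsim", "listexes", bsim_url]
    for key, value in (("md5", md5), ("name", name), ("arch", arch),
                       ("compiler", compiler), ("sortcol", sortcol),
                       ("limit", limit)):
        if value is not None:
            args.append(f"{key}={value}")
    if include_libs:
        args.append("--includelibs")
    return args
```

<P>Passing <CODE>limit=0</CODE> requests an unlimited listing, per the description of <STRONG>limit=</STRONG> above.</P>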
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>getexecount</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Get the total number of executable program records within a specified BSim
|
||||||
|
database repository which satisfy the specified criteria. A BSim URL specifying the
|
||||||
|
repository must be provided, and one of two options, <SPAN class=
|
||||||
|
"command"><STRONG>md5=</STRONG></SPAN> or <SPAN class=
|
||||||
|
"command"><STRONG>name=</STRONG></SPAN>, that indicate the specific executable must
|
||||||
|
also be given. All matching executable records will be counted.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>md5=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>32-hexdigits</EM></SPAN> - specifies an executable via its MD5
|
||||||
|
checksum.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>name=</STRONG></SPAN> - specifies an executable
|
||||||
|
name which may match one or more executable records.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>arch=</STRONG></SPAN> - specifies an architecture
|
||||||
|
as a Ghidra processor id which will be used to filter executables.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>compiler=</STRONG></SPAN> - specifies a compiler
|
||||||
|
specification id which will be used to filter executables.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>--includelibs</STRONG> - If specified, executable
|
||||||
|
records which correspond to a referenced Library will be included. Such records
|
||||||
|
have a fabricated MD5 that is based on the library name.</SPAN></P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class="bold"><STRONG>delete</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Remove all records associated with a specific executable from a BSim repository.
|
||||||
|
A BSim URL specifying the repository must be provided, and one of two options,
|
||||||
|
<SPAN class="command"><STRONG>md5=</STRONG></SPAN> or <SPAN class=
|
||||||
|
"command"><STRONG>name=</STRONG></SPAN>, that indicate the specific executable must
|
||||||
|
also be given. All associated executable and function records are removed.
|
||||||
|
If an executable cannot be uniquely identified an error will result.
|
||||||
|
</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>md5=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>32-hexdigits</EM></SPAN> - specifies the executable via its MD5
|
||||||
|
checksum.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>name=</STRONG></SPAN> - specifies an executable
|
||||||
|
name which may match one or more executable records.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>arch=</STRONG></SPAN> - specifies an architecture
|
||||||
|
as a Ghidra processor id, when the <SPAN class=
|
||||||
|
"command"><STRONG>name</STRONG></SPAN> option is not enough to uniquely specify the
|
||||||
|
executable.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>compiler=</STRONG></SPAN> - specifies a compiler
|
||||||
|
id string, when the <SPAN class="command"><STRONG>name</STRONG></SPAN> option is
|
||||||
|
not enough to uniquely specify the executable.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>listfuncs</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>List all function records associated with a specific executable from a BSim
|
||||||
|
repository. A BSim URL specifying the repository must be provided, and one of two
|
||||||
|
options, <SPAN class="command"><STRONG>md5=</STRONG></SPAN> or <SPAN class=
|
||||||
|
"command"><STRONG>name=</STRONG></SPAN>, that indicate the specific executable must
|
||||||
|
also be given. All associated executable and function records are listed. If an
|
||||||
|
executable cannot be uniquely identified an error will result.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>md5=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>32-hexdigits</EM></SPAN> - specifies the executable via its MD5
|
||||||
|
checksum.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>name=</STRONG></SPAN> - specifies an executable
|
||||||
|
name which may match one or more executable records.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>arch=</STRONG></SPAN> - specifies an architecture
|
||||||
|
as a Ghidra processor id, when the <SPAN class=
|
||||||
|
"command"><STRONG>name</STRONG></SPAN> option is not enough to uniquely specify the
|
||||||
|
executable.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>compiler=</STRONG></SPAN> - specifies a compiler
|
||||||
|
id string, when the <SPAN class="command"><STRONG>name</STRONG></SPAN> option is
|
||||||
|
not enough to uniquely specify the executable.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>--printselfsig</STRONG></SPAN> - If specified, each
|
||||||
|
function listed will be prefixed by a calculated self-significance score. This value is
|
||||||
|
expressed as a decimal value.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>--callgraph</STRONG></SPAN> - If specified, a list
|
||||||
|
of all library functions called by the identified executable will be listed after
|
||||||
|
the function list.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>--printjustexe</STRONG> - If specified, only a
|
||||||
|
summary of the executable will be displayed. If <STRONG>--callgraph</STRONG> was
|
||||||
|
also specified only the called libraries will be listed and not the specified
|
||||||
|
functions.</SPAN></P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>maxfunc=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>max_count</EM></SPAN> - specifies the maximum number of functions to
|
||||||
|
be listed which correspond to the identified executable (default=1000, a value of 0
|
||||||
|
indicates no limit).</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>dumpsigs</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>Dump signature and metadata from a BSim repository for a specific executable to
|
||||||
|
a "sigs_" XML file. A BSim server URL and a path to a directory where the new file
|
||||||
|
will be stored must be given. One of two options, <SPAN class=
|
||||||
|
"command"><STRONG>md5=</STRONG></SPAN> or <SPAN class=
|
||||||
|
"command"><STRONG>name=</STRONG></SPAN>, that specify the particular executable
|
||||||
|
must also be given. If an executable cannot be uniquely identified an error will result.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>md5=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>32-hexdigits</EM></SPAN> - specifies an executable via its MD5
|
||||||
|
checksum.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>name=</STRONG></SPAN> - specifies an executable
|
||||||
|
name which may match one or more executable records.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>arch=</STRONG></SPAN> - specifies an architecture
|
||||||
|
as a Ghidra processor id, when the <SPAN class=
|
||||||
|
"command"><STRONG>name</STRONG></SPAN> option is not enough to uniquely specify the
|
||||||
|
executable.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>compiler=</STRONG></SPAN> - specifies a compiler
|
||||||
|
specification id, when the <SPAN class=
|
||||||
|
"command"><STRONG>name</STRONG></SPAN> option is not enough to uniquely specify the
|
||||||
|
executable.</P>
|
||||||
|
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class="bold"><STRONG>--Global
|
||||||
|
Options--</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>These options apply to all <SPAN class="command"><STRONG>bsim</STRONG></SPAN>
|
||||||
|
commands.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>user=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>name</EM></SPAN> - specifies a user to masquerade as when connecting
|
||||||
|
to the server.</P>
|
||||||
|
|
||||||
|
<P><SPAN class="command"><STRONG>cert=</STRONG></SPAN><SPAN class=
|
||||||
|
"emphasis"><EM>path</EM></SPAN> - provides a path to the user's certificate when
|
||||||
|
connecting to a server that requires PKI authentication.</P>
|
||||||
|
</DD>
|
||||||
|
</DL>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="section">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H2 class="title" style="clear: both"><A name="URLs"></A>Ghidra and BSim URLs</H2>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>Ghidra utilizes Universal Resource Locators (URLs) to identify both <EM>Ghidra
|
||||||
|
Server/Project Repositories</EM> and <EM>BSim Databases</EM>. See the corresponding sections
|
||||||
|
below for specific formatting details. It is important to note that local <EM>ghidra</EM> and
|
||||||
|
<EM>file</EM> URLs never include a double-slash after the protocol (i.e., "://").</P>
|
||||||
|
|
||||||
|
<DIV class="section">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H3 class="title" style="clear: both"><A name="GhidraURLs"></A>Ghidra Server/Project
|
||||||
|
Repository URLs</H3>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>BSim command-line tools, as well as the Ghidra GUI, utilize a URL to specify the
|
||||||
|
location of a remote Ghidra Server repository or a local Ghidra Project. Both cases work in
|
||||||
|
a very similar fashion other than the format of the URL and potential limitations of a
|
||||||
|
local Project URL. Use of a Ghidra Server repository and corresponding URLs is preferred
|
||||||
|
since any Ghidra URL metadata added to a shared BSim database has the ability to be
|
||||||
|
accessed by other users, while a local Ghidra Project URL is very limited in its visibility
|
||||||
|
and path validity on other systems. For this reason, use of a local Ghidra Project URL
|
||||||
|
should be restricted to use with a local H2 BSim Database file.</P>
|
||||||
|
|
||||||
|
<P>The format of a remote <EM>Ghidra Server URL</EM> is distinctly different from a
|
||||||
|
<EM>Local Ghidra Project URL</EM>. These URLs have the following formats:</P>
|
||||||
|
|
||||||
|
<P><STRONG>Remote Ghidra Server Repository</STRONG><BR>
|
||||||
|
</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<TABLE border="0" class="simplelist">
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class=
|
||||||
|
"computeroutput">ghidra://<hostname>[:<port>]/<repository_name>[/<folder_path>]</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
</TABLE>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>If the default Ghidra Server port (13100) is in use, it need not be specified in the URL.
|
||||||
|
The <EM>hostname</EM> may specify either a Fully Qualified Domain Name (FQDN, e.g.,
|
||||||
|
<EM>host.abc.com</EM>) or an IPv4 address (e.g., <EM>1.2.3.4</EM>).</P>
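<P>As an illustration of the remote format above, the following shell sketch assembles a Ghidra Server URL from its parts. The function name <CODE class="computeroutput">ghidra_server_url</CODE> and the repository name <EM>myrepo</EM> are hypothetical and not part of Ghidra; the sketch only concatenates strings and never contacts a server.</P>

```shell
# ghidra_server_url HOST PORT REPOSITORY [FOLDER]
# Hypothetical helper (not shipped with Ghidra). Prints a URL of the form
#   ghidra://HOST[:PORT]/REPOSITORY[/FOLDER]
# Pass an empty PORT to rely on the server's default port.
ghidra_server_url() {
    local host="$1" port="$2" repo="$3" folder="$4"
    local url="ghidra://${host}"
    if [ -n "$port" ]; then
        url="${url}:${port}"
    fi
    url="${url}/${repo}"
    if [ -n "$folder" ]; then
        # Accept the folder path with or without a leading '/'.
        url="${url}/${folder#/}"
    fi
    printf '%s\n' "$url"
}
```

<P>For instance, <CODE class="computeroutput">ghidra_server_url host.abc.com "" myrepo</CODE> would print a URL that omits the port, which is appropriate when the server listens on its default port.</P>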
|
||||||
|
<STRONG>Local Ghidra Project</STRONG><BR>
|
||||||
|
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<TABLE border="0" class="simplelist">
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class=
|
||||||
|
"computeroutput">ghidra:[/<directory_path>]/<project_name>[?/<folder_path>]</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
</TABLE>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>For local project URLs, the absolute directory path containing the project
|
||||||
|
<EM>*.gpr</EM> locator file must be specified with the project name. The project name
|
||||||
|
should exclude any <EM>.gpr/.rep</EM> suffix. Only the '/' character should be used as a
|
||||||
|
directory separator. In addition, when running on Windows, the directory path should
|
||||||
|
include its drive designation preceded by a '/' (e.g., <CODE class=
|
||||||
|
"computeroutput">ghidra:/C:/mydir/myproject?/folderA/folderB</CODE>).</P>
|
||||||
|
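<P>The rules for local project URLs can be sketched as a small shell function. The name <CODE class="computeroutput">local_project_url</CODE> is hypothetical; the function only normalizes its arguments according to the rules stated above (forward slashes only, a '/' before a Windows drive designation, no <EM>.gpr/.rep</EM> suffix) and does not check that the project exists.</P>

```shell
# local_project_url DIRECTORY PROJECT [FOLDER]
# Hypothetical helper (not shipped with Ghidra). Prints a URL of the form
#   ghidra:/DIRECTORY/PROJECT[?/FOLDER]
local_project_url() {
    local dir="$1" project="$2" folder="$3"
    # Only '/' may be used as a directory separator.
    dir="$(printf '%s' "$dir" | tr '\\' '/')"
    case "$dir" in
        /*) ;;                              # already absolute
        [A-Za-z]:*) dir="/${dir}" ;;        # Windows drive: prefix with '/'
        *) echo "directory path must be absolute: $dir" >&2; return 1 ;;
    esac
    dir="${dir%/}"
    # The project name must not carry a .gpr or .rep suffix.
    project="${project%.gpr}"
    project="${project%.rep}"
    local url="ghidra:${dir}/${project}"
    if [ -n "$folder" ]; then
        url="${url}?/${folder#/}"
    fi
    printf '%s\n' "$url"
}
```

<P>Called with a Windows-style directory such as <CODE class="computeroutput">'C:\mydir'</CODE>, the project name <EM>myproject.gpr</EM>, and the folder <EM>folderA/folderB</EM>, the sketch reproduces the example URL given above.</P>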
</DIV>
|
||||||
|
|
||||||
|
<DIV class="section">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H3 class="title" style="clear: both"><A name="BSimURLs"></A>BSim Database URLs</H3>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>BSim command-line tools utilize a URL to specify the type and specific details required
|
||||||
|
to establish a connection to a specific BSim Database. Within the Ghidra GUI the database
|
||||||
|
details are not specified using a URL; they are entered using an interactive form. Each BSim
|
||||||
|
database type has a distinct URL format:</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<TABLE border="0" cellpadding="2" class="simplelist">
|
||||||
|
<TR>
|
||||||
|
<TH>Database Type</TH>
|
||||||
|
|
||||||
|
<TH align="left">URL Format</TH>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD>PostgreSQL</TD>
|
||||||
|
|
||||||
|
<TD><CODE class=
|
||||||
|
"computeroutput">postgresql://<hostname>[:<port>]/<dbname></CODE></TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD>Elasticsearch</TD>
|
||||||
|
|
||||||
|
<TD><CODE class=
|
||||||
|
"computeroutput">https://<hostname>[:<port>]/<dbname></CODE></TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD>Elasticsearch</TD>
|
||||||
|
|
||||||
|
<TD><CODE class=
|
||||||
|
"computeroutput">elastic://<hostname>[:<port>]/<dbname></CODE></TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD>H2 File</TD>
|
||||||
|
|
||||||
|
<TD><CODE class=
|
||||||
|
"computeroutput">file:[/<directory_path>]/<dbname></CODE></TD>
|
||||||
|
</TR>
|
||||||
|
</TABLE>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>The <EM>https</EM> and <EM>elastic</EM> protocols are equivalent.</P>
|
||||||
|
|
||||||
|
<P>For local <EM>file</EM> URLs, the absolute path to the H2 database <EM>*.mv.db</EM> file
|
||||||
|
must be specified without the <EM>*.mv.db</EM> extension. Only the '/' character should be
|
||||||
|
used as a directory separator. In addition, when running on Windows, the directory path
|
||||||
|
should include its drive designation preceded by a '/' (e.g., <CODE class=
|
||||||
|
"computeroutput">file:/C:/mydir/mydb</CODE>).</P>
|
||||||
|
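<P>A minimal shell sketch of the four URL formats in the table above follows. The function name <CODE class="computeroutput">bsim_db_url</CODE> and its argument order are assumptions made for illustration; the function only formats strings and does not connect to any database.</P>

```shell
# bsim_db_url TYPE LOCATION DBNAME [PORT]
# Hypothetical helper (not shipped with Ghidra).
#   TYPE     postgresql | elastic | https | file
#   LOCATION hostname for server types, absolute directory for file
bsim_db_url() {
    local type="$1" location="$2" dbname="$3" port="$4"
    case "$type" in
        postgresql|elastic|https)
            local url="${type}://${location}"
            if [ -n "$port" ]; then
                url="${url}:${port}"
            fi
            printf '%s\n' "${url}/${dbname}"
            ;;
        file)
            # Local URLs use '/' separators and no "//" after the protocol.
            location="$(printf '%s' "$location" | tr '\\' '/')"
            case "$location" in
                /*) ;;
                [A-Za-z]:*) location="/${location}" ;;
                *) echo "directory path must be absolute: $location" >&2; return 1 ;;
            esac
            # The database name is given without the .mv.db extension.
            printf 'file:%s/%s\n' "${location%/}" "${dbname%.mv.db}"
            ;;
        *)
            echo "unknown BSim database type: $type" >&2
            return 1
            ;;
    esac
}
```

<P>Keeping the local and server cases in one function makes the difference visible: only the server formats place "//" after the protocol.</P>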
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</BODY>
|
||||||
|
</HTML>
|
@ -0,0 +1,993 @@
|
|||||||
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
|
||||||
|
|
||||||
|
<HTML>
|
||||||
|
<HEAD>
|
||||||
|
<META name="generator" content=
|
||||||
|
"HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net">
|
||||||
|
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
|
||||||
|
|
||||||
|
<TITLE>Database Configuration</TITLE>
|
||||||
|
<LINK rel="stylesheet" type="text/css" href="help/shared/DefaultStyle.css">
|
||||||
|
<LINK rel="stylesheet" type="text/css" href="../../shared/languages.css">
|
||||||
|
<META name="generator" content="DocBook XSL Stylesheets V1.79.1">
|
||||||
|
<LINK rel="home" href="index.html" title="BSim Database">
|
||||||
|
<LINK rel="up" href="index.html" title="BSim Database">
|
||||||
|
<LINK rel="prev" href="DatabaseOverview.html" title="BSim Database">
|
||||||
|
<LINK rel="next" href="IngestProcess.html" title="Ingesting Executables">
|
||||||
|
</HEAD>
|
||||||
|
|
||||||
|
<BODY>
|
||||||
|
<DIV class="chapter">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H1 class="title"><A name="DatabaseConfiguration"></A>Database Configuration</H1>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="section">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H2 class="title" style="clear: both"><A name="ConfigOverview"></A>Overview</H2>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>The server for the BSim Database is distinct from the traditional Ghidra server,
|
||||||
|
although for many use cases it is convenient to have both running and view the BSim server
|
||||||
|
as a loosely coupled extension to the base Ghidra Server. In terms of start-up, shutdown,
|
||||||
|
and configuration however, the two servers are completely separate.</P>
|
||||||
|
|
||||||
|
<P>There are two choices for deploying a shared server for the BSim Database: PostgreSQL or
|
||||||
|
Elasticsearch. In addition, a local file-based database may be employed which utilizes an
|
||||||
|
integrated H2 Database engine. This file-based database is intended for smaller datasets
|
||||||
|
and its use is limited to a single process.</P>
|
||||||
|
|
||||||
|
<P>PostgreSQL software, including the extension necessary for BSim signature indexing,
|
||||||
|
comes prepackaged with the Ghidra distribution. It runs on a single host and makes
|
||||||
|
efficient use of whatever CPU, memory, and disk resources are made available to it.
|
||||||
|
PostgreSQL is a highly robust and capable server that should perform well on minimally
|
||||||
|
configured workstations up to high-end production hardware.</P>
|
||||||
|
|
||||||
|
<P>An Elasticsearch BSim plug-in is included with the Ghidra distribution, but the core
|
||||||
|
server software must be obtained separately by the database administrator. Elasticsearch is
|
||||||
|
a scalable text search and analytics database. It automatically distributes itself across
|
||||||
|
machines in a cluster, allowing individual database queries and requests to be serviced in
|
||||||
|
parallel. Support for BSim in Elasticsearch should still be considered a prototype, but
|
||||||
|
all major functionality has been implemented, and the BSim schema takes full advantage of
|
||||||
|
Elasticsearch as a distributed database.</P>
|
||||||
|
|
||||||
|
<P>BSim clients included in the base Ghidra distribution can interface to any of these
|
||||||
|
databases.</P>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="section">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H2 class="title" style="clear: both"><A name="ServerConfig"></A>Server
|
||||||
|
Configuration</H2>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="sect2">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H3 class="title"><A name="PostConfig"></A>PostgreSQL Configuration</H3>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>The base Ghidra distribution comes with the PostgreSQL software and the extensions
|
||||||
|
necessary for supporting a BSim database. The PostgreSQL server is most easily managed
|
||||||
|
using the <SPAN class="bold"><STRONG>bsim_ctl</STRONG></SPAN> command-line script. When
|
||||||
|
<SPAN class="bold"><STRONG>bsim_ctl start</STRONG></SPAN> is run for the first time (see
|
||||||
|
below), the PostgreSQL software is unpacked, depending on the host OS, to either</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">$(ROOT)/Ghidra/Features/BSim/os/linux64/postgresql
|
||||||
|
OR</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class=
|
||||||
|
"computeroutput">$(ROOT)/Ghidra/Features/BSim/os/osx64/postgresql</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
</TABLE>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>BSim will not operate with PostgreSQL without the Ghidra specific extensions, but
|
||||||
|
otherwise the provided installation is standard. It can be configured just like any other
|
||||||
|
stand-alone PostgreSQL server. PostgreSQL is highly configurable, and there are no direct
|
||||||
|
restrictions on modifying the configuration values. A default configuration is provided
|
||||||
|
with this installation that has been tuned specifically for the BSim Database
|
||||||
|
application, so in practice there may be little reason to modify it. But there are a few
|
||||||
|
standard configuration values for the server that might need adjusting. These do impact
|
||||||
|
important aspects of the server, like the amount of memory allocated to the server and
|
||||||
|
access restrictions.</P>
|
||||||
|
|
||||||
|
<DIV class="sect3">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H4 class="title"><A name="PostStartStop"></A>Starting and Stopping the
|
||||||
|
Server</H4>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>The basic start-up and shut-down is accomplished with the same command-line script,
|
||||||
|
which takes either the keyword <SPAN class="command"><STRONG>start</STRONG></SPAN> or
|
||||||
|
<SPAN class="command"><STRONG>stop</STRONG></SPAN> as the first parameter. The second
|
||||||
|
parameter must be an absolute path to the chosen data directory.</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">$(ROOT)/support/bsim_ctl start
|
||||||
|
/path/to/datadir</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">$(ROOT)/support/bsim_ctl start /path/to/datadir
|
||||||
|
port=8000</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">$(ROOT)/support/bsim_ctl stop
|
||||||
|
/path/to/datadir</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">$(ROOT)/support/bsim_ctl stop /path/to/datadir
|
||||||
|
force</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
</TABLE>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>The data directory should already exist and should initially not contain any files.
|
||||||
|
The first time a server is started for a particular data directory, a large number of
|
||||||
|
configuration files and other sub-directories associated with the PostgreSQL server
|
||||||
|
will automatically be created. Upon subsequent restarts the existing configuration will
|
||||||
|
be reused.</P>
|
||||||
|
|
||||||
|
<P>The <SPAN class="bold"><STRONG>start</STRONG></SPAN> command can take an optional
|
||||||
|
<SPAN class="bold"><STRONG>port=</STRONG></SPAN> parameter. This can be used to specify
|
||||||
|
a non-standard port for the PostgreSQL server to listen on. In this case, any
|
||||||
|
subsequent reference to the BSim server, in the Ghidra client, or with the <SPAN class=
|
||||||
|
"command"><STRONG>bsim</STRONG></SPAN> command described below, must specify the port.
|
||||||
|
When using the <SPAN class="command"><STRONG>bsim</STRONG></SPAN> command, a
|
||||||
|
non-default port must be explicitly specified with the BSim <SPAN class=
|
||||||
|
"command"><STRONG>postgresql://</STRONG></SPAN> URL (see <A class="xref" href=
|
||||||
|
"CommandLineReference.html#URLs">“Ghidra and BSim URLs”</A> for more
|
||||||
|
details).</P>
|
||||||
|
|
||||||
|
<P>The <SPAN class="command"><STRONG>stop</STRONG></SPAN> command can take the keyword
|
||||||
|
<SPAN class="command"><STRONG>force</STRONG></SPAN> as an optional parameter. Without
|
||||||
|
this, the shutdown of the server will wait until all currently connected clients finish
|
||||||
|
their sessions. Adding this parameter will cause all clients to be disconnected
|
||||||
|
immediately, rolling back any transactions, and the server will shut down
|
||||||
|
immediately.</P>
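<P>The start and stop forms described above can be sketched as a dry-run wrapper that prints the command line instead of executing it. The variable <CODE class="computeroutput">GHIDRA_ROOT</CODE> (standing in for $(ROOT)), its fallback value, and the function name <CODE class="computeroutput">bsim_ctl_cmd</CODE> are assumptions made for this illustration.</P>

```shell
# GHIDRA_ROOT is an assumed variable standing in for $(ROOT) in the text;
# /opt/ghidra is only a placeholder default.
GHIDRA_ROOT="${GHIDRA_ROOT:-/opt/ghidra}"

# bsim_ctl_cmd ACTION DATADIR [EXTRA]
# Hypothetical dry-run helper: prints the command rather than running it.
#   start: EXTRA is an optional non-standard port number
#   stop:  EXTRA is the optional keyword "force"
bsim_ctl_cmd() {
    local action="$1" datadir="$2" extra="$3"
    case "$datadir" in
        /*) ;;
        *) echo "data directory must be an absolute path: $datadir" >&2; return 1 ;;
    esac
    case "$action" in
        start)
            if [ -n "$extra" ]; then
                extra="port=${extra}"
            fi
            ;;
        stop)
            if [ -n "$extra" ] && [ "$extra" != "force" ]; then
                echo "stop accepts only the keyword 'force'" >&2
                return 1
            fi
            ;;
        *)
            echo "unknown action: $action" >&2
            return 1
            ;;
    esac
    printf '%s\n' "${GHIDRA_ROOT}/support/bsim_ctl ${action} ${datadir}${extra:+ ${extra}}"
}
```

<P>Printing the command first gives an administrator a chance to review it, which may be useful before a forced stop that disconnects clients.</P>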
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="sect3">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H4 class="title"><A name="PostSecurityAuthentication"></A>Security and
|
||||||
|
Authentication</H4>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>BSim makes use of PostgreSQL security mechanisms to enforce privileges and
|
||||||
|
authenticate users. The <SPAN class="command"><STRONG>bsim_ctl</STRONG></SPAN> command
|
||||||
|
wraps the subset of functionality described here, but other adjustments are possible by
|
||||||
|
connecting directly to the server and issuing SQL commands.</P>
|
||||||
|
|
||||||
|
<P>The PostgreSQL server, as configured for BSim, only accepts connections via SSL, so
|
||||||
|
communications in transit are always encrypted regardless of the authentication
|
||||||
|
settings.</P>
|
||||||
|
|
||||||
|
<P>PostgreSQL uses the concept of <SPAN class="emphasis"><EM>roles</EM></SPAN> to grant
|
||||||
|
access privileges based on particular users. Generally, a user's role is determined by
|
||||||
|
the <SPAN class="emphasis"><EM>username</EM></SPAN> used to establish the connection.
|
||||||
|
For BSim, each user role is granted one of two privilege levels: <SPAN class=
|
||||||
|
"command"><STRONG>user</STRONG></SPAN>, which allows read-only access to the server for
|
||||||
|
normal queries, and <SPAN class="command"><STRONG>admin</STRONG></SPAN>, which
|
||||||
|
additionally allows database creation, ingest, update, and deletion.</P>
|
||||||
|
|
||||||
|
<P>BSim supports three different authentication methods, when connecting as a client or
|
||||||
|
during database ingest and maintenance. This method is established for a server by the
|
||||||
|
initial <SPAN class="command"><STRONG>start</STRONG></SPAN> command.</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<DIV class="variablelist">
|
||||||
|
<DL class="variablelist">
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>trust</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P><CODE class="computeroutput">bsim_ctl start /path/to/datadir
|
||||||
|
auth=trust</CODE></P>
|
||||||
|
|
||||||
|
<P>This is currently the default. No authentication is performed and privilege
|
||||||
|
is granted based on the user name presented. Masquerading is possible.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>password</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P><CODE class="computeroutput">bsim_ctl start /path/to/datadir
|
||||||
|
auth=password</CODE></P>
|
||||||
|
|
||||||
|
<P>Users are authenticated via password. A default password 'changeme' is
|
||||||
|
established when the new user is created. Passwords can be changed by the user
|
||||||
|
from the BSim client or can be reset by an administrator via the <SPAN class=
|
||||||
|
"command"><STRONG>resetpassword</STRONG></SPAN> command.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class="bold"><STRONG>pki</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P><CODE class="computeroutput">bsim_ctl start /path/to/datadir auth=pki
|
||||||
|
ca=/path/to/rootcert</CODE></P>
|
||||||
|
|
||||||
|
<P>Users are authenticated by PKI certificates. Upon initialization, the BSim
|
||||||
|
server must be provided (via the <SPAN class=
|
||||||
|
"command"><STRONG>ca=</STRONG></SPAN> option) a file containing the public keys
|
||||||
|
for the certificate authorities used to issue users' certificates. The file
|
||||||
|
consists of the authoritative certificates in PEM format concatenated
|
||||||
|
together.</P>
|
||||||
|
|
||||||
|
<P>BSim users must register their certificate with the Ghidra client using the
|
||||||
|
<SPAN class="emphasis"><EM>Edit->Set PKI Certificate...</EM></SPAN> menu
|
||||||
|
option from the Project dialog. The BSim client will automatically submit the
|
||||||
|
certificate to a server that requests it, and the password to unlock it will be
|
||||||
|
requested as needed. This is the same mechanism used to access a PKI
|
||||||
|
protected Ghidra server, and if a user needs access to both a BSim server and
|
||||||
|
Ghidra server that are PKI protected, the servers should probably be configured
|
||||||
|
with the same certificate authorities so that they will accept the same
|
||||||
|
certificate from the user.</P>
|
||||||
|
|
||||||
|
<P>With PKI authentication enabled, at the time a new user role is established
|
||||||
|
with the server, the X.509 Distinguished Name, as bound to the user's
|
||||||
|
certificate, must be associated with the user name via the <SPAN class=
|
||||||
|
"command"><STRONG>dn=</STRONG></SPAN> option. See <A class="xref" href=
|
||||||
|
"#PostAddUser" title="Adding Users to the Database">“Adding Users to the
|
||||||
|
Database”</A>.</P>
|
||||||
|
</DD>
|
||||||
|
</DL>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
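<P>The three authentication methods above can be summarized in a short shell sketch. Both function names are hypothetical. The first concatenates PEM certificates into the single file that the <SPAN class="command"><STRONG>ca=</STRONG></SPAN> option expects; the second prints the option string for a chosen method and rejects <SPAN class="command"><STRONG>pki</STRONG></SPAN> when no certificate authority file is supplied.</P>

```shell
# bundle_ca_certs OUT CERT...
# Hypothetical helper: concatenate PEM certificates into one file for ca=.
bundle_ca_certs() {
    local out="$1"
    shift
    cat "$@" > "$out"
}

# bsim_auth_args METHOD [CA_FILE]
# Hypothetical helper: print the auth options to append to "bsim_ctl start".
bsim_auth_args() {
    local method="$1" ca="$2"
    case "$method" in
        trust|password)
            printf 'auth=%s\n' "$method"
            ;;
        pki)
            if [ -z "$ca" ]; then
                echo "auth=pki requires a certificate authority file (ca=)" >&2
                return 1
            fi
            printf 'auth=pki ca=%s\n' "$ca"
            ;;
        *)
            echo "unknown authentication method: $method" >&2
            return 1
            ;;
    esac
}
```

<P>The printed options would be appended to the <SPAN class="command"><STRONG>start</STRONG></SPAN> command line shown for each method.</P>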
|
||||||
|
|
||||||
|
<P>The authentication method should be established once, the first time the <SPAN
|
||||||
|
class="command"><STRONG>start</STRONG></SPAN> command is issued for the server on an
|
||||||
|
empty data directory. Subsequent restarts of the server will not change these settings.
|
||||||
|
If the settings really need to be changed, the <SPAN class=
|
||||||
|
"command"><STRONG>changeauth</STRONG></SPAN> command can be issued. It takes the same
|
||||||
|
options as the <SPAN class="command"><STRONG>start</STRONG></SPAN> command and can only
|
||||||
|
be run after the server has been shut down.</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">$(ROOT)/support/bsim_ctl changeauth
|
||||||
|
/datadir/path auth=password</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
</TABLE>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>Using the <SPAN class="command"><STRONG>changeauth</STRONG></SPAN> command on a
|
||||||
|
server with an established set of users will likely require other disruptive changes to
|
||||||
|
create passwords or associate Distinguished Names with users, if they didn't exist
|
||||||
|
before.</P>
|
||||||
|
|
||||||
|
<P>If it is determined that only the database administrators have OS level, local,
|
||||||
|
access to the server's host machine, they can choose to use the <SPAN class=
|
||||||
|
"command"><STRONG>noLocalAuth</STRONG></SPAN> option as part of the <SPAN class=
|
||||||
|
"command"><STRONG>start</STRONG></SPAN> or <SPAN class=
|
||||||
|
"command"><STRONG>changeauth</STRONG></SPAN> commands. This disables authentication for
|
||||||
|
users connecting to the server by the 'localhost' interface. This may facilitate the
|
||||||
|
use of scripts for ingest etc., where working with passwords is cumbersome.
|
||||||
|
Authentication is still enforced for any remote connection.</P>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="sect3">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H4 class="title"><A name="PostAddUser"></A>Adding Users to the Database</H4>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>The username used to start the server for the first time, causing the initialization
|
||||||
|
of the data directory, becomes the administrator for that server. No other
|
||||||
|
username/role is initially known to the server. New usernames/roles can be added to the
|
||||||
|
server using the following command:</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">$(ROOT)/support/bsim_ctl adduser <SPAN class=
|
||||||
|
"emphasis"><EM>username</EM></SPAN></CODE></TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">$(ROOT)/support/bsim_ctl adduser <SPAN class=
|
||||||
|
"emphasis"><EM>username</EM></SPAN> dn="C=US,ST=MD,CN=Firstname User"</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
</TABLE>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>If password authentication has been set for the server, the new user's password will
|
||||||
|
initially be set to 'changeme'. If PKI authentication has been set for the server, the
|
||||||
|
Distinguished Name, as bound to the new user's certificate, must be provided when
|
||||||
|
issuing the <SPAN class="command"><STRONG>adduser</STRONG></SPAN> command, via the
|
||||||
|
<SPAN class="command"><STRONG>dn=</STRONG></SPAN> option. The Distinguished Name must
|
||||||
|
be presented as a string containing a comma separated sequence of attribute/value pairs
|
||||||
|
that uniquely identifies a certificate. Currently, the Common Name (CN=) is the only
|
||||||
|
attribute inspected by the PostgreSQL server, so other attributes can be omitted.</P>
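<P>As a sketch of the <SPAN class="command"><STRONG>adduser</STRONG></SPAN> forms above, the following hypothetical shell function prints the command line for a new user and, when a Distinguished Name is supplied, checks that it carries a Common Name attribute. The function name and the user name <EM>alice</EM> are illustrative only, and the printed command mirrors the forms shown in this section rather than any other <SPAN class="command"><STRONG>bsim_ctl</STRONG></SPAN> usage.</P>

```shell
# bsim_adduser_cmd USERNAME [DN]
# Hypothetical dry-run helper: prints the adduser command, never runs it.
bsim_adduser_cmd() {
    local username="$1" dn="$2"
    local cmd="bsim_ctl adduser ${username}"
    if [ -n "$dn" ]; then
        # The Common Name is the attribute the server inspects,
        # so a DN without one cannot identify the user.
        case "$dn" in
            *CN=*) ;;
            *) echo "Distinguished Name has no CN= attribute: $dn" >&2; return 1 ;;
        esac
        cmd="${cmd} dn=\"${dn}\""
    fi
    printf '%s\n' "$cmd"
}
```

<P>The Distinguished Name is quoted in the output because it may contain spaces, as in the example above.</P>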
|
||||||
|
|
||||||
|
<P>New users are by default only given <SPAN class=
|
||||||
|
"command"><STRONG>user</STRONG></SPAN> permissions, meaning that they can only place
|
||||||
|
queries to the database and cannot ingest, update, or delete data. The new user can be
|
||||||
|
given <SPAN class="command"><STRONG>admin</STRONG></SPAN> privileges (by an existing
|
||||||
|
administrator) by issuing the command:</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">$(ROOT)/support/bsim_ctl changeprivilege <SPAN
|
||||||
|
class="emphasis"><EM>username</EM></SPAN> admin</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
</TABLE>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="sect3">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H4 class="title"><A name="PostAdditionalConfig"></A>Additional
|
||||||
|
Configuration</H4>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>The relevant configuration files are at the top level of the data directory:</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">postgresql.conf</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">pg_hba.conf</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
</TABLE>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>The most important configuration parameters in <CODE class=
|
||||||
|
"filename">postgresql.conf</CODE> are:</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<DIV class="variablelist">
|
||||||
|
<DL class="variablelist">
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>shared_buffers</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>This controls the amount of RAM available for caching database pages across
|
||||||
|
all connections to the server. The default should be reasonable in most
|
||||||
|
situations, but for large databases or many simultaneous connections it might
|
||||||
|
make sense to increase this.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>max_wal_size</STRONG></SPAN>,</SPAN> <SPAN class="term"><SPAN
|
||||||
|
class="bold"><STRONG>checkpoint_timeout</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>These control how often the server forces database pages to be written back
|
||||||
|
out to the file-system. The defaults are set to minimize disk writes when
|
||||||
|
ingesting large numbers of records in one session. There should be little
|
||||||
|
reason to change these values.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>ssl_ciphers</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>This controls which ciphers the server allows when negotiating a connection.
|
||||||
|
The defaults are reasonable, but administrators may want more control. The
|
||||||
|
setting 'TLSv1.2', for instance, can be used to be compliant with the latest
|
||||||
|
TLS standard.</P>
|
||||||
|
</DD>
|
||||||
|
</DL>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
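<P>Before changing any of the parameters named above, it can help to see which of them are actively set in a given data directory. The following shell sketch lists the uncommented assignments; the function name <CODE class="computeroutput">show_bsim_pg_settings</CODE> is hypothetical, and the function only reads the file.</P>

```shell
# show_bsim_pg_settings DATADIR
# Hypothetical helper: print the active (uncommented) assignments of the
# parameters discussed above from DATADIR/postgresql.conf.
# The trailing "?" lets the cipher pattern match with or without a final "s".
show_bsim_pg_settings() {
    local conf="$1/postgresql.conf"
    grep -E '^[[:space:]]*(shared_buffers|max_wal_size|checkpoint_timeout|ssl_ciphers?)[[:space:]]*=' "$conf"
}
```

<P>Lines that begin with '#' are skipped, so a parameter missing from the output is either commented out or absent from the file.</P>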
|
||||||
|
|
||||||
|
<P>The <CODE class="filename">pg_hba.conf</CODE> file is used to configure which
|
||||||
|
connections the server accepts for a particular outward facing IP address and what
|
||||||
|
security mechanisms are enforced for those connections. Currently all addresses are
|
||||||
|
configured to accept SSL connections only, except possibly for 'localhost'.
|
||||||
|
Administrators <SPAN class="emphasis"><EM>can</EM></SPAN> currently filter connections
|
||||||
|
based on usernames and the particular database (which corresponds to Ghidra's concept
|
||||||
|
of <SPAN class="emphasis"><EM>repository</EM></SPAN>).</P>
|
||||||
|
|
||||||
|
<DIV class="warning" style="margin-left: 0.5in; margin-right: 0.5in;">
|
||||||
|
<H3 class="title">Warning</H3>
|
||||||
|
|
||||||
|
<P>By default, the server accepts all connections from all users.</P>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="sect3">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H4 class="title"><A name="ConfigDefaults"></A>Configuration Defaults</H4>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>The <CODE class="filename">serverconfig.xml</CODE> file contains a few of
|
||||||
|
the default configuration values that are most crucial for the BSim Database. <SPAN
|
||||||
|
class="bold"><STRONG>Beware:</STRONG></SPAN> This file is currently parsed only once
|
||||||
|
for the entire <SPAN class="emphasis"><EM>lifetime</EM></SPAN> of a particular data
|
||||||
|
directory: it is read only when the data directory is first initialized, i.e. the first
|
||||||
|
time the <SPAN class="command"><STRONG>bsim_ctl start</STRONG></SPAN> command is
|
||||||
|
invoked on the empty directory. This file is intended to provide reasonable defaults
|
||||||
|
that are different from the standard PostgreSQL defaults. To provide site specific
|
||||||
|
configuration, changes should be made to the normal PostgreSQL configuration files.</P>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="sect2">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H3 class="title"><A name="ElasticConfig"></A>Elasticsearch Configuration</H3>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>A full description of how to configure an Elasticsearch cluster, including how to
|
||||||
|
start and stop the server, is beyond the scope of this document. In particular, the <SPAN
|
||||||
|
class="command"><STRONG>bsim_ctl</STRONG></SPAN> command-line, as described in <A class=
|
||||||
|
"xref" href="DatabaseConfiguration.html#PostConfig" title=
|
||||||
|
"PostgreSQL Configuration">“PostgreSQL Configuration”</A>, does not apply to
|
||||||
|
Elasticsearch. Complete documentation is available on-line from the Elasticsearch
|
||||||
|
website.</P>
|
||||||
|
|
||||||
|
<DIV class="sect3">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H4 class="title"><A name="ElasticInstall"></A>Installing the Plug-in</H4>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>In order to make use of Elasticsearch with BSim, the database administrator must
|
||||||
|
install the <SPAN class="emphasis"><EM>lsh.zip</EM></SPAN> plug-in as part of the
|
||||||
|
Elasticsearch deployment. The plug-in is available in the Ghidra add-on named <SPAN
|
||||||
|
class="emphasis"><EM>BSimElasticPlugin</EM></SPAN>, which unpacks into a standard
|
||||||
|
Ghidra installation. The file <SPAN class="emphasis"><EM>lsh.zip</EM></SPAN> is a
|
||||||
|
standard Elasticsearch plug-in that must be installed on every node of the cluster
|
||||||
|
before a BSim repository can be created. The Elasticsearch distribution typically comes
|
||||||
|
preconfigured for a single node deployment. The description below shows how to enable
|
||||||
|
BSim on such a toy deployment, but this will need to be extended to support an entire
|
||||||
|
cluster.</P>
|
||||||
|
|
||||||
|
<P>Assuming the add-on has been unpacked, the plug-in can be installed to a single node
|
||||||
|
using the <SPAN class="emphasis"><EM>elasticsearch-plugin</EM></SPAN> command in the
|
||||||
|
<SPAN class="emphasis"><EM>bin</EM></SPAN> directory of the node's Elasticsearch
|
||||||
|
installation.</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">bin/elasticsearch-plugin install
|
||||||
|
file:///path/to/ghidra/Ghidra/contrib/BSimElasticPlugin/data/lsh.zip</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
</TABLE>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>Replace the initial portion of the absolute path in the URL to point to the Ghidra
|
||||||
|
installation. Once the plug-in is installed, the toy deployment can be (re)started from
|
||||||
|
the command-line by running</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">bin/elasticsearch</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
</TABLE>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>This will dump logging messages to the console, and you should see <CODE class=
|
||||||
|
"computeroutput">[lsh]</CODE> listed among the loaded plug-ins as the node starts
|
||||||
|
up.</P>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="sect3">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H4 class="title"><A name="ElasticURL"></A>The Elasticsearch URL</H4>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>Assuming an Elasticsearch cluster is running and the plug-in has been properly
|
||||||
|
installed, all other parts of BSim interact transparently with the cluster. The <SPAN
|
||||||
|
class="command"><STRONG>bsim</STRONG></SPAN> command, described in <A class="xref"
|
||||||
|
href="IngestProcess.html" title="Ingesting Executables"><I>Ingesting
|
||||||
|
Executables</I></A>, and the Ghidra/BSim client, described in <A class="xref" href=
|
||||||
|
"../BSimSearchPlugin/BSimSearch.html" title="Querying a BSim Database"><I>Querying a BSim
|
||||||
|
Database</I></A>, require no additional configuration to work with Elasticsearch,
|
||||||
|
except users must provide the correct URL to establish a connection. Elasticsearch
|
||||||
|
communicates over <SPAN class="emphasis"><EM>https</EM></SPAN>, and BSim clients
|
||||||
|
automatically assume they are communicating with Elasticsearch when they see this
|
||||||
|
protocol. Alternatively, the protocol may be specified as <SPAN class=
|
||||||
|
"emphasis"><EM>elastic</EM></SPAN> when using the <SPAN class=
|
||||||
|
"command"><STRONG>bsim</STRONG></SPAN> command. Elasticsearch use by BSim assumes a
|
||||||
|
default port of 9200 unless another port is given along with the server host. See <A
|
||||||
|
class="xref" href="CommandLineReference.html#URLs">“Ghidra and BSim
|
||||||
|
URLs”</A> for additional information about URLs.</P>
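<P>As a rough illustration of the URL rules described above, the following Python sketch
resolves the host and port from an Elasticsearch-style BSim URL. The function name and the
example hostnames are hypothetical; only the <SPAN class="emphasis"><EM>https</EM></SPAN>
and <SPAN class="emphasis"><EM>elastic</EM></SPAN> protocols and the default port of 9200
come from the text.</P>

```python
from urllib.parse import urlsplit

DEFAULT_ELASTIC_PORT = 9200  # default port stated in the documentation above


def elastic_endpoint(bsim_url):
    """Return (host, port) for an https:// or elastic:// BSim URL.

    Hypothetical helper for illustration; it is not part of Ghidra.
    """
    parts = urlsplit(bsim_url)
    if parts.scheme not in ("https", "elastic"):
        raise ValueError("not an Elasticsearch BSim URL: " + bsim_url)
    # Fall back to the default port when the URL does not name one.
    return parts.hostname, parts.port or DEFAULT_ELASTIC_PORT


# Example hostnames are placeholders.
print(elastic_endpoint("https://bsim.example.org/repo"))
print(elastic_endpoint("elastic://bsim.example.org:9300/repo"))
```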
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="section">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H2 class="title" style="clear: both"><A name="CreateDatabase"></A>Creating a
|
||||||
|
Database</H2>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>If using either PostgreSQL or Elasticsearch, the server must be properly configured and
|
||||||
|
running before a <SPAN class="bold"><STRONG>database</STRONG></SPAN> can be created. In the
|
||||||
|
case of an H2 file-based database there is no server requirement. Only after a database has
|
||||||
|
been created can data be ingested or queries performed. In this context, a database is a
|
||||||
|
single container of reverse engineered functions. Metadata pertaining to executables and
|
||||||
|
call-graph relationships is also stored, but the principal database record describes a
|
||||||
|
<SPAN class="emphasis"><EM>function</EM></SPAN>. A single PostgreSQL or Elasticsearch
|
||||||
|
server can hold multiple independent databases.</P>
|
||||||
|
|
||||||
|
<P>A database is created using the <SPAN class="command"><STRONG>bsim</STRONG></SPAN>
|
||||||
|
command script. The basic command looks like</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">$(ROOT)/support/bsim createdatabase <SPAN class=
|
||||||
|
"emphasis"><EM>bsimURL</EM></SPAN> <SPAN class=
|
||||||
|
"emphasis"><EM>config_template</EM></SPAN></CODE></TD>
|
||||||
|
</TR>
|
||||||
|
</TABLE>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>A BSim database is completely distinct from the Ghidra Server or Ghidra project, so the
|
||||||
|
executables and functions contained within do not need to coincide at all.</P>
|
||||||
|
|
||||||
|
<P>The Ghidra GUI client specifies a BSim database with its explicit characteristics (i.e.,
|
||||||
|
DB type, name, host/port if applicable, etc.), while the <SPAN class=
|
||||||
|
"command"><STRONG>bsim</STRONG></SPAN> command accepts a <SPAN class=
|
||||||
|
"emphasis"><EM>bsimURL</EM></SPAN> which includes similar details (see <A class="xref"
|
||||||
|
href="CommandLineReference.html#URLs">“Ghidra and BSim URLs”</A> for more
|
||||||
|
details).</P>
|
||||||
|
|
||||||
|
<P>The <SPAN class="emphasis"><EM>config_template</EM></SPAN> parameter passed to <SPAN
|
||||||
|
class="command"><STRONG>bsim createdatabase</STRONG></SPAN> names a collection of specific
|
||||||
|
configuration values for the newly created database. A standard Ghidra distribution
|
||||||
|
provides a number of predefined templates (See below) designed for specific database use
|
||||||
|
cases. It is simplest to use a predefined template when creating a database, but it is
|
||||||
|
possible to edit an existing template or create a new template (See <A class="xref" href=
|
||||||
|
"DatabaseConfiguration.html#DatabaseTemplates" title=
|
||||||
|
"Creating Database Templates">“Creating Database Templates”</A>).</P>
|
||||||
|
|
||||||
|
<P>There are two critical database properties being determined by the template that need to
|
||||||
|
be kept in mind: the <SPAN class="bold"><STRONG>index tuning</STRONG></SPAN> and the <SPAN
|
||||||
|
class="bold"><STRONG>weighting scheme</STRONG></SPAN> relative to the size of the database.
|
||||||
|
The two pieces of the template name, separated by the '_' character, refer to these
|
||||||
|
concerns.</P>
|
||||||
|
|
||||||
|
<P>The index tuning affects the use of the database by trading off between the time
|
||||||
|
required to perform individual queries, the amount of variation between matching functions
|
||||||
|
a query can tolerate, and the amount of storage required per database record. Ideally, the
|
||||||
|
database is tuned, before the initial ingest occurs, to the <SPAN class=
|
||||||
|
"emphasis"><EM>anticipated size</EM></SPAN> of the database. The database can trade off
|
||||||
|
storage size (per record) and latency for overall query response time, but the decision
|
||||||
|
needs to be made before the database is populated. Currently there is a <SPAN class=
|
||||||
|
"bold"><STRONG>medium</STRONG></SPAN> tuning that is ideal for repositories that will store
|
||||||
|
on the order of 10 million functions. There is also a <SPAN class=
|
||||||
|
"bold"><STRONG>large</STRONG></SPAN> tuning, which uses more storage but can maintain fast
|
||||||
|
query times for databases with 100 million functions or more. There is a large overlap for
|
||||||
|
these tunings, so if it is unclear how large a database might grow, go ahead and use the
|
||||||
|
medium tuning.</P>
|
||||||
|
|
||||||
|
<P>The weighting scheme affects how BSim views the relative importance of individual code
|
||||||
|
constructs within a function. Code constructions are extracted as <SPAN class=
|
||||||
|
"emphasis"><EM>features</EM></SPAN>, and each feature is assigned a weight. The basic
|
||||||
|
schemes are: <SPAN class="bold"><STRONG>32</STRONG></SPAN> for 32-bit compiled code, <SPAN
|
||||||
|
class="bold"><STRONG>64</STRONG></SPAN> for 64-bit code. The scheme that matches the
|
||||||
|
predominant form of code in the repository being ingested should be used. Mixed schemes are
|
||||||
|
possible, but a corpus which is predominantly 32-bit, even with a small number of 64-bit
|
||||||
|
executables mixed in, should still use the 32-bit weights.</P>
|
||||||
|
|
||||||
|
<P>There are some weighting schemes designed for more specialized code. The <SPAN class=
|
||||||
|
"bold"><STRONG>64_32</STRONG></SPAN> scheme is for 64-bit code using 32-bit pointers. The
|
||||||
|
<SPAN class="bold"><STRONG>nosize</STRONG></SPAN> scheme allows better matching of 32-bit
|
||||||
|
functions to 64-bit functions, when they are compiled from the same source. The <SPAN
|
||||||
|
class="bold"><STRONG>cpool</STRONG></SPAN> scheme is designed for Java byte-code or Dalvik
|
||||||
|
executables. For more discussion of weighting, see <A class="xref" href=
|
||||||
|
"FeatureWeight.html#WeightingSoftware" title="Weighting Software Features">“Weighting
|
||||||
|
Software Features”</A>.</P>
|
||||||
|
|
||||||
|
<P>The full template name incorporates both an index tuning and a weight scheme. Some
|
||||||
|
common examples of template names:</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<DIV class="variablelist">
|
||||||
|
<DL class="variablelist">
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>medium_32</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>A medium index tuning with a weighting scheme designed for 32-bit
|
||||||
|
executables.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>medium_64</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>A medium index tuning with a weighting scheme designed for 64-bit
|
||||||
|
executables.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>large_32</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>A 32-bit weighting scheme with tuning for a large database size.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>medium_cpool</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>A medium index tuning with a weighting scheme for Java executables.</P>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT><SPAN class="term"><SPAN class=
|
||||||
|
"bold"><STRONG>medium_nosize</STRONG></SPAN></SPAN></DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
<P>A medium index tuning with a weighting scheme allowing matches between 32-bit
|
||||||
|
and 64-bit code.</P>
|
||||||
|
</DD>
|
||||||
|
</DL>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
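<P>The naming convention above can be sketched in Python as follows. This is only an
illustration: the helper name is hypothetical, the sets list just the tunings and weighting
schemes mentioned in this section, and it is an assumption that the name splits at the
first '_' character. Whether every combination ships as a predefined template is not
confirmed here.</P>

```python
# Tunings and weighting schemes named in this section of the documentation.
KNOWN_TUNINGS = {"medium", "large"}
KNOWN_SCHEMES = {"32", "64", "64_32", "nosize", "cpool"}


def split_template_name(template):
    """Split a template name such as 'medium_32' into (tuning, scheme).

    Hypothetical helper; assumes the first '_' separates the two pieces.
    """
    tuning, sep, scheme = template.partition("_")
    if not sep or tuning not in KNOWN_TUNINGS or scheme not in KNOWN_SCHEMES:
        raise ValueError("unrecognized template name: " + template)
    return tuning, scheme


print(split_template_name("medium_nosize"))
print(split_template_name("large_32"))
```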
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="section">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H2 class="title" style="clear: both"><A name="TailorBSim"></A>Tailoring BSim
|
||||||
|
Metadata</H2>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>There is some facility to tailor a specific BSim database instance so that it can ingest
|
||||||
|
and/or report information about executables or functions to make results more useful for a
|
||||||
|
specific project or user. Capabilities can be added after a database has been created and
|
||||||
|
is running by issuing specific <SPAN class="command"><STRONG>bsim</STRONG></SPAN> commands,
|
||||||
|
but they can also be added to a <SPAN class="emphasis"><EM>configuration
|
||||||
|
template</EM></SPAN> prior to creating the database, which provides a record of the
|
||||||
|
specific additions should the database instance need to be recreated or multiple tailored
|
||||||
|
instances be deployed. For additions that allow the ingest of more metadata about
|
||||||
|
executables or functions, users must provide additional scripts to Ghidra during the ingest
|
||||||
|
process in order to read in or discover the new metadata.</P>
|
||||||
|
|
||||||
|
<P>The <SPAN class="bold"><STRONG>Name</STRONG></SPAN>, <SPAN class=
|
||||||
|
"bold"><STRONG>Owner</STRONG></SPAN>, and <SPAN class=
|
||||||
|
"bold"><STRONG>Description</STRONG></SPAN> associated with a BSim instance can be trivially
|
||||||
|
tailored with the <SPAN class="command"><STRONG>bsim setmetadata</STRONG></SPAN>
|
||||||
|
command.</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">$(ROOT)/support/bsim setmetadata <SPAN class=
|
||||||
|
"emphasis"><EM>bsimURL</EM></SPAN> "name=BSim Database"</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">$(ROOT)/support/bsim setmetadata <SPAN class=
|
||||||
|
"emphasis"><EM>bsimURL</EM></SPAN> "owner=Administrators"</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">$(ROOT)/support/bsim setmetadata <SPAN class=
|
||||||
|
"emphasis"><EM>bsimURL</EM></SPAN> "description=Files of interest"</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
</TABLE>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>This information is displayed in various windows by the BSim client. The values can be
|
||||||
|
changed at any time and do not otherwise affect the records contained in the database.
|
||||||
|
Multiple command-line parameters can be fed to <SPAN class="command"><STRONG>bsim
|
||||||
|
setmetadata</STRONG></SPAN> so long as each one starts with <SPAN class=
|
||||||
|
"bold"><STRONG>name=</STRONG></SPAN>, <SPAN class="bold"><STRONG>owner=</STRONG></SPAN>, or
|
||||||
|
<SPAN class="bold"><STRONG>description=</STRONG></SPAN> respectively. Quoting may be
|
||||||
|
necessary to get some strings to be interpreted as a single command-line parameter.</P>
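<P>The quoting point can be seen in a small Python sketch that assembles a <SPAN class=
"command"><STRONG>bsim setmetadata</STRONG></SPAN> command line. The helper name is
hypothetical, and the <SPAN class="emphasis"><EM>postgresql://localhost/example</EM></SPAN>
URL is a placeholder whose form is assumed rather than taken from this section.</P>

```python
import shlex


def setmetadata_command(bsim_url, name=None, owner=None, description=None):
    """Build a shell-quoted 'bsim setmetadata' command line.

    Hypothetical helper; each value becomes one key=value parameter.
    """
    args = ["bsim", "setmetadata", bsim_url]
    for key, value in (("name", name), ("owner", owner), ("description", description)):
        if value is not None:
            args.append(f"{key}={value}")
    # shlex.join quotes any parameter containing spaces so that the shell
    # passes it through as a single command-line parameter.
    return shlex.join(args)


print(setmetadata_command("postgresql://localhost/example",
                          name="BSim Database", owner="Administrators"))
```

<P>Only the parameter containing a space is quoted, which is the case the paragraph above
warns about.</P>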
|
||||||
|
|
||||||
|
<DIV class="sect2">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H3 class="title"><A name="ExeCat"></A>Executable Categories</H3>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>BSim provides the powerful ability to associate new types of metadata with each
|
||||||
|
executable that the database ingests. Any method of categorizing executables that
|
||||||
|
describes an executable with a simple string value, referred to here as an executable
|
||||||
|
<SPAN class="bold"><STRONG>category</STRONG></SPAN>, can be added as a field to the
|
||||||
|
database. With only minor adjustments to the ingest process, new category values can be
|
||||||
|
automatically attached to incoming executables and are treated like any other executable
|
||||||
|
field that BSim understands. Category values are retrieved with queries, can be used for
|
||||||
|
filtering, and show up as sortable columns in result tables.</P>
|
||||||
|
|
||||||
|
<P>All categories have a formal name (or type), which is used both in the ingest process
|
||||||
|
(See below) and as the label for table columns. The name can contain alphanumeric
|
||||||
|
characters or punctuation from the limited set, ' ._:/()'. For each executable there can
|
||||||
|
be zero, one, or more <SPAN class="emphasis"><EM>string</EM></SPAN> values associated
|
||||||
|
with the category. No value is required for the executable, and any single value can be
|
||||||
|
used for filtering (either the executable is labeled with the value or it is not) even if
|
||||||
|
there are multiple values for that category. If there are multiple values, a query that
|
||||||
|
matches the executable will return all the values as a single sorted column entry.</P>
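<P>A minimal Python sketch of the naming rule, assuming the quoted punctuation set is
exactly space, period, underscore, colon, slash, and parentheses, and that an empty name is
not allowed. The function name is hypothetical, and BSim itself may apply further
checks.</P>

```python
import re

# Alphanumerics plus the limited punctuation set ' ._:/()' quoted above.
_CATEGORY_NAME = re.compile(r"[A-Za-z0-9 ._:/()]+")


def is_valid_category_name(name):
    """Return True if every character of name is in the allowed set."""
    return _CATEGORY_NAME.fullmatch(name) is not None


print(is_valid_category_name("MyCategoryName"))
print(is_valid_category_name("Build (v1.2)"))
print(is_valid_category_name("bad-name"))
```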
|
||||||
|
|
||||||
|
<P>It is also possible to create a special time-based category. This category can have
|
||||||
|
any name as above, but instead of associating string values with the executable, it
|
||||||
|
associates a single time-stamp. The time-stamp has precision down to the millisecond and
|
||||||
|
provides filtering and sorting based on time. Internally, this new category repurposes
|
||||||
|
the column storage originally providing an executable's <SPAN class="emphasis"><EM>Ingest
|
||||||
|
Date</EM></SPAN> field. As a result, any BSim instance
|
||||||
|
can have only one time category and only one time-stamp within it. The ingest scripting
|
||||||
|
must provide any actual time-stamp value for the executable, or the database will fill in
|
||||||
|
the "epoch", 12:00 am, Jan 1, 1970.</P>
|
||||||
|
|
||||||
|
<P>A new category can be added to the database at any time using the <SPAN class=
|
||||||
|
"command"><STRONG>bsim addexecategory</STRONG></SPAN> command.</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">$(ROOT)/support/bsim addexecategory <SPAN class=
|
||||||
|
"emphasis"><EM>bsimURL</EM></SPAN> MyCategoryName</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">$(ROOT)/support/bsim addexecategory <SPAN class=
|
||||||
|
"emphasis"><EM>bsimURL</EM></SPAN> MyTimeField date</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
</TABLE>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>The single time-stamp field can be renamed by appending the keyword "date" to the
|
||||||
|
command after the name of the category. Once a category is added, the corresponding program
|
||||||
|
options set for any new executables will automatically be read into the database as part of
|
||||||
|
the ingest process. Previously ingested executables, assuming they have the new program
|
||||||
|
options set, can be updated within the BSim database using one of the <SPAN class=
|
||||||
|
"command"><STRONG>bsim updaterepo</STRONG></SPAN> command variants. In either case, the
|
||||||
|
relevant program options typically need to be filled by running a Ghidra script (See <A
|
||||||
|
class="xref" href="IngestProcess.html#IngestExeCat" title=
|
||||||
|
"Ingesting Executable Categories">“Ingesting Executable Categories”</A>).
|
||||||
|
There is currently no method for deleting a category once it has been created.</P>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="sect2">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H3 class="title"><A name="FunctionTags"></A>Function Tags</H3>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>BSim can be configured to recognize specific <SPAN class="bold"><STRONG>Function
|
||||||
|
Tags</STRONG></SPAN>, which are named Boolean properties on individual functions within
|
||||||
|
an executable. Within a Ghidra program, any number of different function tags can be
|
||||||
|
established by the user and are used to label individual functions or specific subsets of
|
||||||
|
functions that share a particular property. This would typically be used to label classes
|
||||||
|
of functions that are important to analysts; unpacked functions could be labeled with the
|
||||||
|
tag <SPAN class="emphasis"><EM>UNPACKED</EM></SPAN> for instance.</P>
|
||||||
|
|
||||||
|
<P>In order for BSim to recognize specific function tags, they must be individually
|
||||||
|
registered with the BSim database. These tags are then automatically ingested into the
|
||||||
|
database, along with the other standard metadata describing functions, and can be used to
|
||||||
|
filter match results when querying the database. A function tag has a formal name, which
|
||||||
|
can be displayed as part of the function header within the main code browser and is used
|
||||||
|
for BSim columns and filter labels. Once the tag is created for a program, functions
|
||||||
|
universally have the tag as a Boolean property, either the name applies to a function or
|
||||||
|
it doesn't, and arbitrary subsets can be <SPAN class="emphasis"><EM>tagged</EM></SPAN>
|
||||||
|
with that name.</P>
|
||||||
|
|
||||||
|
<P>A tag must be <SPAN class="emphasis"><EM>registered</EM></SPAN> with a BSim database
|
||||||
|
before it can be used as a filter or seen in results. A tag can be registered at any time
|
||||||
|
with the <SPAN class="command"><STRONG>bsim addfunctiontag</STRONG></SPAN> command.</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">$(ROOT)/support/bsim addfunctiontag <SPAN class=
|
||||||
|
"emphasis"><EM>bsimURL</EM></SPAN> MyTagName</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
</TABLE>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>The new tag will automatically be read in when any new executables are ingested. If
|
||||||
|
previously ingested executables already had the new tags before they were registered,
|
||||||
|
their metadata within the BSim database can be updated using the <SPAN class=
|
||||||
|
"command"><STRONG>bsim updaterepo</STRONG></SPAN> command variants. BSim is limited to 29
|
||||||
|
registered tag names, and there is currently no way to remove a tag once it has been
|
||||||
|
registered.</P>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="sect2">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H3 class="title"><A name="DatabaseTemplates"></A>Creating Database Templates</H3>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>It is possible to create tailored database configuration templates so that
|
||||||
|
implementors have a permanent and accessible record of a particular set-up and don't need
|
||||||
|
to repeatedly issue <SPAN class="command"><STRONG>bsim setmetadata</STRONG></SPAN> and
|
||||||
|
<SPAN class="command"><STRONG>bsim addexecategory</STRONG></SPAN> when creating a
|
||||||
|
database. Other aspects of a database can also be manipulated, like weighting schemes and
|
||||||
|
index tuning, but doing this properly is beyond the scope of this document. A <SPAN
|
||||||
|
class="bold"><STRONG>database template</STRONG></SPAN> is the basic set of configuration
|
||||||
|
parameters used to set up a BSim database instance. The configuration parameters are
|
||||||
|
established for a particular database when the <SPAN class="command"><STRONG>bsim
|
||||||
|
createdatabase</STRONG></SPAN> command is run (See <A class="xref" href=
|
||||||
|
"DatabaseConfiguration.html#CreateDatabase" title="Creating a Database">“Creating a
|
||||||
|
Database”</A>). The template name passed on the command-line actually identifies an
|
||||||
|
XML file-name, appended with the '.xml' suffix, in the directory:</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<TABLE border="0" summary="Simple list" class="simplelist">
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">$(ROOT)/Ghidra/Features/BSim/data</CODE></TD>
|
||||||
|
</TR>
|
||||||
|
</TABLE>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>The file has a root tag of <SPAN class="emphasis"><EM>&lt;dbconfig&gt;</EM></SPAN>,
|
||||||
|
and the first child tag of this root is the <SPAN class=
|
||||||
|
"emphasis"><EM>&lt;info&gt;</EM></SPAN> tag. This tag contains all the metadata tags that
|
||||||
|
can be easily changed or added to the database. A list of the metadata tags follows. The
|
||||||
|
metadata is provided as formal text content within the tag, and none of the tags
|
||||||
|
currently take attributes.</P>
|
||||||
|
|
||||||
|
<DIV class="informalexample">
|
||||||
|
<DIV class="table">
|
||||||
|
<TABLE width="80%" frame="none">
|
||||||
|
<COL width="30%">
|
||||||
|
<COL width="70%">
|
||||||
|
|
||||||
|
<THEAD>
|
||||||
|
<TR>
|
||||||
|
<TD><SPAN class="bold"><STRONG>XML Tag</STRONG></SPAN></TD>
|
||||||
|
|
||||||
|
<TD><SPAN class="bold"><STRONG>Description</STRONG></SPAN></TD>
|
||||||
|
</TR>
|
||||||
|
</THEAD>
|
||||||
|
|
||||||
|
<TBODY>
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">&lt;name&gt;</CODE></TD>
|
||||||
|
|
||||||
|
<TD>Name of the database</TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">&lt;owner&gt;</CODE></TD>
|
||||||
|
|
||||||
|
<TD>Owner of the database</TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">&lt;description&gt;</CODE></TD>
|
||||||
|
|
||||||
|
<TD>An overarching description of the database</TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">&lt;major&gt;</CODE></TD>
|
||||||
|
|
||||||
|
<TD>Major decompiler version used for ingest (Should be set to zero)</TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">&lt;minor&gt;</CODE></TD>
|
||||||
|
|
||||||
|
<TD>Minor decompiler version used for ingest (Should be set to zero)</TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">&lt;settings&gt;</CODE></TD>
|
||||||
|
|
||||||
|
<TD>Specific settings for the signature strategy (Should be set to zero)</TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">&lt;execategory&gt;</CODE></TD>
|
||||||
|
|
||||||
|
<TD>The name of an executable category (to be) defined for this database
|
||||||
|
instance</TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">&lt;datename&gt;</CODE></TD>
|
||||||
|
|
||||||
|
<TD>The name of the timestamp column</TD>
|
||||||
|
</TR>
|
||||||
|
|
||||||
|
<TR>
|
||||||
|
<TD><CODE class="computeroutput">&lt;functiontag&gt;</CODE></TD>
|
||||||
|
|
||||||
|
<TD>The name of a function tag (to be) registered with this database
|
||||||
|
instance</TD>
|
||||||
|
</TR>
|
||||||
|
</TBODY>
|
||||||
|
</TABLE>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>There can be multiple <SPAN class="emphasis"><EM>&lt;execategory&gt;</EM></SPAN> tags
|
||||||
|
if more than one category is desired and both <SPAN class=
|
||||||
|
"emphasis"><EM>&lt;execategory&gt;</EM></SPAN> and <SPAN class=
|
||||||
|
"emphasis"><EM>&lt;datename&gt;</EM></SPAN> are optional tags. The date column name
|
||||||
|
defaults to 'Ingest Date' and is drawn from the corresponding Ghidra program option. The
|
||||||
|
tag order needs to be preserved. There can be multiple <SPAN class=
|
||||||
|
"emphasis"><EM>&lt;functiontag&gt;</EM></SPAN> tags, one for each function tag to be
|
||||||
|
registered with the database.</P>
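<P>To make the layout of the <SPAN class="emphasis"><EM>&lt;info&gt;</EM></SPAN> tag
concrete, the following Python sketch builds one with the standard library. It assumes the
required tag order is the order of the table above, and all names and values passed in are
made-up examples. It produces only the <SPAN class="emphasis"><EM>&lt;info&gt;</EM></SPAN>
element; the rest of a real template would still be copied from an existing file.</P>

```python
import xml.etree.ElementTree as ET


def build_info(name, owner, description,
               categories=(), datename=None, function_tags=()):
    """Build an <info> element with children in the order of the table above.

    Hypothetical helper; the ordering is an assumption based on that table.
    """
    info = ET.Element("info")

    def add(tag, text):
        ET.SubElement(info, tag).text = text

    add("name", name)
    add("owner", owner)
    add("description", description)
    # The table says these three should be set to zero.
    for tag in ("major", "minor", "settings"):
        add(tag, "0")
    for category in categories:          # optional, may repeat
        add("execategory", category)
    if datename is not None:             # optional, at most one
        add("datename", datename)
    for tag_name in function_tags:       # may repeat
        add("functiontag", tag_name)
    return info


info = build_info("BSim Database", "Administrators", "Files of interest",
                  categories=["MyCategoryName"], datename="MyTimeField",
                  function_tags=["MyTagName"])
print(ET.tostring(info, encoding="unicode"))
```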
|
||||||
|
|
||||||
|
<P>It is easiest to copy an existing template and just edit the tags described above. The
|
||||||
|
remaining tags in the file are more dangerous to manipulate. The <SPAN class=
|
||||||
|
"emphasis"><EM>&lt;k&gt;</EM></SPAN> and <SPAN class="emphasis"><EM>&lt;L&gt;</EM></SPAN>
|
||||||
|
tags pertain to the index tuning. The <SPAN class=
|
||||||
|
"emphasis"><EM>&lt;weightsfile&gt;</EM></SPAN> tag gives the name of the weights file,
|
||||||
|
within the same directory, which is itself an XML file. It is simplest to choose from
|
||||||
|
the existing weight files provided with the distribution. See <A class="xref" href=
|
||||||
|
"FeatureWeight.html#WeightingSoftware" title=
|
||||||
|
"Weighting Software Features">“Weighting Software Features”</A>.</P>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</BODY>
|
||||||
|
</HTML>
|
@ -0,0 +1,258 @@
|
|||||||
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
|
||||||
|
|
||||||
|
<HTML>
|
||||||
|
<HEAD>
|
||||||
|
<META name="generator" content=
|
||||||
|
"HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net">
|
||||||
|
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
|
||||||
|
|
||||||
|
<TITLE>Features and Weights</TITLE>
|
||||||
|
<LINK rel="stylesheet" type="text/css" href="help/shared/DefaultStyle.css">
|
||||||
|
<LINK rel="stylesheet" type="text/css" href="../../shared/languages.css">
|
||||||
|
<META name="generator" content="DocBook XSL Stylesheets V1.79.1">
|
||||||
|
<LINK rel="home" href="index.html" title="BSim Database">
|
||||||
|
<LINK rel="up" href="index.html" title="BSim Database">
|
||||||
|
<LINK rel="prev" href="DatabaseQuery.html" title="Querying a BSim Database">
|
||||||
|
<LINK rel="next" href="CommandLineReference.html" title="Command-Line Utility Reference">
|
||||||
|
</HEAD>
|
||||||
|
|
||||||
|
<BODY>
|
||||||
|
<DIV class="chapter">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H1 class="title"><A name="FeatureWeight"></A>Features and Weights</H1>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="section">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H2 class="title" style="clear: both"><A name="FunctionFeatures"></A>Features of
|
||||||
|
Software Functions</H2>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<P>The BSim Database uses a standard <SPAN class="bold"><STRONG>Feature
|
||||||
|
Vector</STRONG></SPAN> approach to compare and index software functions. A <SPAN class=
|
||||||
|
"bold"><STRONG>feature</STRONG></SPAN> is an abstraction that simply means a single element
|
||||||
|
or attribute that can be compared quantitatively between two objects. The set of possible
|
||||||
|
features used by a particular approach is fixed, and any object being examined is viewed as
|
||||||
|
some unordered subset of all the possible features. So features are the smallest (atomic)
|
||||||
|
aspect of an object that can be measured: either two objects share a feature in common or
|
||||||
|
they do not. But within this scheme, because objects generally consist of many individual
|
||||||
|
features, quantitative fine-grained comparisons can be automatically calculated.</P>
|
||||||
|
|
||||||
|
<P>The BSim Database extracts its features from the data-flow representation of a function,
|
||||||
|
after it has been normalized by the Ghidra decompiler. This is the SSA graph representation
|
||||||
|
of the function, with nodes representing the variables and operators of the function, and
|
||||||
|
the edges representing the read/write relationships between them. An individual feature is
|
||||||
|
just a portion of this graph, encompassing some subset of variables and operators and the
|
||||||
|
specific flow between them. Because of the decompilation, a feature can be viewed naturally
|
||||||
|
as a uniform snippet of C source code, a partial extraction of some expression in the
|
||||||
|
source code representation of the function. The full set of features provides uniform (and
|
||||||
|
overlapping) coverage of the graph representation of the entire function.</P>
|
||||||
|
|
||||||
|
<P>Features encode specific aspects of the variables they cover but not others. The size of
|
||||||
|
a variable, the operator that produced it, and the set of operators it is fed into are
|
||||||
|
encoded in the features. But, any name assigned to the variable, its data-type, or even its
|
||||||
|
storage location are <SPAN class="emphasis"><EM>not</EM></SPAN> encoded in the
|
||||||
|
features.</P>
|
||||||
|
|
||||||
|
<P>Within a function, details about the specific subfunctions that it calls are not encoded
|
||||||
|
in any of the features for that function, but information describing where the call is made
|
||||||
|
and the set of parameters it takes is encoded.</P>
|
||||||
|
</DIV>
|
||||||
|
|
||||||
|
<DIV class="section">
|
||||||
|
<DIV class="titlepage">
|
||||||
|
<DIV>
|
||||||
|
<DIV>
|
||||||
|
<H2 class="title" style="clear: both"><A name="WeightingSoftware"></A>Weighting
|
||||||
|
Software Features</H2>
|
||||||
|
</DIV>
|
||||||
|
</DIV>
|
||||||
|
</DIV>

<P>Some features are more useful than others for identifying a specific function out of a
large corpus. With the view that features are just portions of recovered C expressions, some
C expressions are simply more common than others. The BSim Database compensates for these
differences by assigning a weight to each feature that factors into the similarity and
confidence scores produced when comparing functions. Weighting schemes are considered a
configuration parameter of the database and are established for a particular database when
it is created. The scheme cannot be changed without creating an entirely new database and
reingesting the functions.</P>

<P>Ghidra comes with precomputed weighting schemes that are calculated using statistics
drawn from homogeneous collections of systems and application software. A feature's weight
is computed by counting the number of times it occurs across the entire corpus and
comparing this with the counts from other features. This allows a direct computation of the
information content of the feature: quantitatively, how much the set of candidate functions
in the corpus is narrowed down by knowing that a function contains that feature.</P>
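
<P>As a rough illustration of information content, and not the actual computation behind
Ghidra's precomputed schemes, a weight can be sketched as the base-2 logarithm of the inverse
fraction of corpus functions that contain a feature. The function name
<CODE>information_weights</CODE>, the feature labels, and the toy corpus are all
hypothetical.</P>

```python
import math
from collections import Counter


def information_weights(corpus):
    """Map each feature to log2(total functions / functions containing it).

    `corpus` is a list of feature sets, one per function. This formula is an
    illustrative assumption, not BSim's actual weighting computation.
    """
    total = len(corpus)
    counts = Counter(feature for features in corpus for feature in set(features))
    return {feature: math.log2(total / count) for feature, count in counts.items()}


# Toy corpus: four functions described by hypothetical feature labels.
corpus = [
    {"a+b", "x<<2"},
    {"a+b", "crc_step"},
    {"a+b"},
    {"a+b", "x<<2"},
]
weights = information_weights(corpus)
```

<P>In this sketch a feature present in every function carries no information (weight 0),
while a feature found in only a quarter of the corpus narrows the candidates by a factor of
four (weight 2 bits).</P>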

<P>The two primary weighting schemes are called <SPAN class=
"bold"><STRONG>32</STRONG></SPAN> and <SPAN class="bold"><STRONG>64</STRONG></SPAN>, one
based on 32-bit code and the other on 64-bit code. This means that a particular database
instance has better sensitivity for either 32-bit or 64-bit functions. The quantitative
scores, similarity and confidence, will be more accurate at distinguishing pairs of
functions of the bit width the scheme was based on. This does not mean that functions from
the <SPAN class="emphasis"><EM>wrong</EM></SPAN> group cannot be ingested or queried, but
the scores may not be as accurate. There is also a <SPAN class=
"bold"><STRONG>64_32</STRONG></SPAN> weighting scheme for architectures where code is
compiled to use 64-bit registers but addresses are still 32-bit.</P>

<P>The specialized weighting scheme <SPAN class="bold"><STRONG>nosize</STRONG></SPAN>
allows BSim to match between 32-bit and 64-bit implementations of a function. It works by
making feature hashes blind to the size difference between a 32-bit variable and a
64-bit variable. This compensates for a compiler's tendency to assign a full 64-bit
register to a 32-bit variable, which is frequently difficult for the decompiler to
automatically resolve in the context of a single function. Because of this blindness, there
is a slight loss of sensitivity when matching 32-bit to 32-bit functions, or when matching
64-bit to 64-bit functions, compared with the <SPAN class=
"bold"><STRONG>32</STRONG></SPAN> or <SPAN class="bold"><STRONG>64</STRONG></SPAN> schemes
respectively.</P>
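
<P>The idea of a size-blind feature hash can be sketched as follows. This is only an
illustration under stated assumptions: the feature representation as (operator, size)
pairs, the <CODE>feature_hash</CODE> function, the use of SHA-256, and the size marker are
all hypothetical and do not describe BSim's real hashing.</P>

```python
import hashlib


def feature_hash(ops, size_blind=False):
    """Hash a feature given as (operator, output_size_in_bytes) pairs.

    With size_blind=True, 4-byte and 8-byte sizes are folded into one class,
    so 32-bit and 64-bit variants of the same expression hash identically.
    """
    parts = []
    for operator, size in ops:
        if size_blind and size in (4, 8):
            size = 0  # hypothetical marker meaning "4 or 8 bytes"
        parts.append(f"{operator}:{size}")
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:16]


# The same expression compiled with 32-bit and with 64-bit variables.
expr32 = [("INT_ADD", 4), ("INT_MULT", 4)]
expr64 = [("INT_ADD", 8), ("INT_MULT", 8)]

sized_match = feature_hash(expr32) == feature_hash(expr64)
blind_match = feature_hash(expr32, size_blind=True) == feature_hash(expr64, size_blind=True)
```

<P>The size-aware hashes differ while the size-blind hashes agree, which is also why a
size-blind scheme gives up a little discriminating power within a single bit width.</P>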

<P>The weighting scheme <SPAN class="bold"><STRONG>cpool</STRONG></SPAN> should be used for
run-time compilation (JIT) architectures, like Java Dalvik or <SPAN class=
"emphasis"><EM>.class</EM></SPAN> byte-code executables. These architectures use
characteristic <SPAN class="emphasis"><EM>constant pool</EM></SPAN> instructions that delay
exact decisions about code and data layout until runtime. The decompiler can still recover
data-flow effectively by treating these instructions as black-box operations, so BSim works
in the same way as with native code. But a specialized weighting scheme is needed to
balance BSim's sensitivity to these operations.</P>
</DIV>

<DIV class="section">
<DIV class="titlepage">
<DIV>
<DIV>
<H2 class="title" style="clear: both"><A name="CompareVectors"></A>Comparing Feature
Vectors</H2>
</DIV>
</DIV>
</DIV>

<P>For a particular function, the set of extracted features and their assigned weights make
up the formal <SPAN class="bold"><STRONG>feature vector</STRONG></SPAN> associated with the
function. When querying a BSim Database, the primary function search is performed by
comparing feature vectors. There are two formal scores that are computed on a pair of
feature vectors, <SPAN class="emphasis"><EM>similarity</EM></SPAN> and <SPAN class=
"emphasis"><EM>confidence</EM></SPAN>.</P>

<DIV class="sect2">
<DIV class="titlepage">
<DIV>
<DIV>
<H3 class="title"><A name="Similarity"></A>Similarity</H3>
</DIV>
</DIV>
</DIV>

<P>Similarity is a direct calculation of the percentage of features in common between two
functions. It varies continuously from 0.0, meaning the functions share no features at
all, to 1.0, meaning that the functions have the same feature set. Formally, similarity
is defined as the <SPAN class="emphasis"><EM>cosine similarity</EM></SPAN> of the two
feature vectors. Weights determine how important individual features are in the score
relative to other features, providing a practical and realistic meaning to the score. Two
functions can exhibit a few unimportant changes, but the similarity can still be very
high because the differences are likely not weighted heavily. Along the same lines, two
functions can share most of their features but have a low similarity because they differ
in more important features.</P>
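
<P>The effect of weights on cosine similarity can be sketched with a small example. It
assumes, purely for illustration, that a feature vector is a mapping from feature to
weight and that absent features count as zero; the function name
<CODE>cosine_similarity</CODE> and the feature labels are hypothetical.</P>

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors given as {feature: weight} dicts."""
    dot = sum(weight * v[feature] for feature, weight in u.items() if feature in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical vectors: "rare" is heavily weighted, the "c" features are common.
base = {"rare": 4.0, "c1": 1.0, "c2": 1.0, "c3": 1.0}
minor_change = {"rare": 4.0, "c1": 1.0, "c2": 1.0, "c4": 1.0}   # a common feature differs
major_change = {"other": 4.0, "c1": 1.0, "c2": 1.0, "c3": 1.0}  # the rare feature differs

high = cosine_similarity(base, minor_change)  # 18/19, roughly 0.95
low = cosine_similarity(base, major_change)   # 3/19, roughly 0.16
```

<P>Both comparisons share three of four features, yet the score is high when only a
lightly weighted feature changes and low when the heavily weighted one does.</P>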

<P>When searching for a function, the database sets a particular threshold on similarity,
0.7 by default, and returns functions whose similarity with the queried function exceeds
that threshold. This can produce <SPAN class="emphasis"><EM>false positive</EM></SPAN>
matches for small functions because a small function is described by just a few features
and it is then comparatively easy to randomly match a high percentage of these features.
Whether a false positive is likely can be decided quantitatively by examining the
<SPAN class="emphasis"><EM>confidence</EM></SPAN> score below.</P>
</DIV>

<DIV class="sect2">
<DIV class="titlepage">
<DIV>
<DIV>
<H3 class="title"><A name="Confidence"></A>Confidence</H3>
</DIV>
</DIV>
</DIV>

<P>Confidence is a log likelihood ratio, a weighted count of the set of features that
match between two functions minus the set of features that are different. It is an
open-ended score, and the higher it gets, the more likely it is that the two functions
are a true match. Fixing a threshold for the confidence score provides a more consistent
<SPAN class="emphasis"><EM>false positive</EM></SPAN> rate, as opposed to thresholding on
similarity. A higher score means the two functions have more features in common as an
absolute count, not just a higher percentage. So the chance of randomly matching most of
the features continues to go down as confidence goes up.</P>
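
<P>The difference between an absolute count and a percentage can be sketched with a
deliberately simplified stand-in for the confidence score: the summed weight of shared
features minus the summed weight of differing ones. The names
<CODE>confidence_sketch</CODE> and <CODE>make_pair</CODE> are hypothetical, every weight
is set to 1.0 for clarity, and BSim's real log likelihood ratio is not reproduced
here.</P>

```python
def confidence_sketch(u, v):
    """Summed weight of shared features minus summed weight of differing ones."""
    shared = u.keys() & v.keys()
    differing = u.keys() ^ v.keys()
    merged = {**u, **v}  # assumes a feature has the same weight in both vectors
    return sum(merged[f] for f in shared) - sum(merged[f] for f in differing)


def make_pair(n_shared, n_unique):
    """Build two vectors with n_shared common features and n_unique extras each."""
    a = {f"s{i}": 1.0 for i in range(n_shared)}
    b = dict(a)
    a.update({f"a{i}": 1.0 for i in range(n_unique)})
    b.update({f"b{i}": 1.0 for i in range(n_unique)})
    return a, b


# Both pairs share 75% of each function's features.
small = confidence_sketch(*make_pair(3, 1))    # 3 shared - 2 differing
large = confidence_sketch(*make_pair(30, 10))  # 30 shared - 20 differing
```

<P>The two pairs have the same proportion of shared features, but the larger pair scores
ten times higher in this sketch because it matches on many more features in absolute
terms.</P>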

<P>A general correspondence between low confidence scores and false positive rates can be
somewhat skewed by <SPAN class="emphasis"><EM>wrappers</EM></SPAN> and other small
functions, which are always common but whose specific frequency can vary depending on the
type of software. BSim fixes the score at 10.0 for a particular wrapper form, providing a
convenient boundary between wrappers and more substantial functions where frequencies are
more consistent. For scores of 10.0 and greater, we get the following rough
correspondence with false positive rate. The rate drops by a factor of 2 for an increase
in confidence of between 4 and 5 points.</P>

<DIV class="informalexample">
<DIV class="table">
<A name="falsepositive.htmltable"></A>

<TABLE width="70%" frame="none">
<COL width="30%">
<COL width="70%">

<THEAD>
<TR>
<TD><SPAN class="bold"><STRONG>Confidence</STRONG></SPAN></TD>

<TD><SPAN class="bold"><STRONG>False Positive Rate
(Approximate)</STRONG></SPAN></TD>
</TR>
</THEAD>

<TBODY>
<TR>
<TD>10</TD>

<TD>1 in 4,000</TD>
</TR>

<TR>
<TD>26</TD>

<TD>1 in 100,000</TD>
</TR>

<TR>
<TD>43</TD>

<TD>1 in 1,000,000</TD>
</TR>

<TR>
<TD>93</TD>

<TD>1 in 1,000,000,000</TD>
</TR>
</TBODY>
</TABLE>
</DIV>
</DIV>
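
<P>The stated rate of one halving per 4 to 5 confidence points can be checked against the
table with a little arithmetic. The sketch below only restates the table values; the
helper name <CODE>points_per_halving</CODE> is hypothetical.</P>

```python
import math

# (confidence, N) pairs, where the approximate false positive rate is "1 in N".
table = [(10, 4_000), (26, 100_000), (43, 1_000_000), (93, 1_000_000_000)]


def points_per_halving(row_a, row_b):
    """Confidence points needed to halve the false positive rate between two rows."""
    (conf_a, n_a), (conf_b, n_b) = row_a, row_b
    halvings = math.log2(n_b / n_a)
    return (conf_b - conf_a) / halvings


# Across the whole table, and between each pair of adjacent rows.
overall = points_per_halving(table[0], table[-1])
steps = [points_per_halving(a, b) for a, b in zip(table, table[1:])]
```

<P>Across the whole table this gives about 4.6 points per halving. The individual segments
range from roughly 3.4 to 5.1 points, which is consistent with the table being
approximate.</P>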

<P>For a single function, there is an upper bound to the confidence that can be achieved
by a possible match, its <SPAN class="emphasis"><EM>self significance</EM></SPAN>. This
upper bound is of course reached by comparison with a function having 1.0 similarity.
Self significance is roughly proportional to the size of the function, so it is impossible
to achieve a high confidence for a small function, for single matches viewed in
isolation. Of course, a medium to low confidence threshold may be enough to produce a
unique match if the database is small, and a medium to high confidence threshold may
still produce occasional false positives if the database is very large.</P>
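
<P>Under the simplifying assumption that confidence is the summed weight of matching
features minus that of differing features, a perfect match has no differing features, so
the upper bound is just the sum of the function's weights. The sketch below uses that
assumption; <CODE>self_significance_sketch</CODE>, <CODE>can_reach</CODE>, and the
weights are hypothetical.</P>

```python
def self_significance_sketch(vector):
    """Upper bound on the simplified confidence: the sum of all feature weights."""
    return sum(vector.values())


def can_reach(vector, threshold):
    """True if even a perfect match could score at or above the threshold."""
    return self_significance_sketch(vector) >= threshold


tiny = {f"t{i}": 1.5 for i in range(3)}     # small function with 3 features
large = {f"r{i}": 1.5 for i in range(60)}   # larger function with 60 features

tiny_bound = self_significance_sketch(tiny)    # 4.5
large_bound = self_significance_sketch(large)  # 90.0
```

<P>With these made-up weights, the small function cannot reach a confidence threshold of
10.0 even on an exact match, while the larger function has ample headroom.</P>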
</DIV>
</DIV>
</DIV>
</BODY>
</HTML>