Crate scarf

Scarf is a library for analyzing x86 functions.

The main concept of scarf is that the user gives FuncAnalysis a function, which is then simulated. FuncAnalysis walks through the function’s different execution paths while calling user-defined callbacks, letting user code examine the execution and extract the information it needs. The callbacks are also able to modify state and have some control over execution, allowing scarf to be adapted for cases where the library is not quite able to handle unusual assembly patterns.
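
The walk-and-callback idea can be illustrated with a small standalone model. This is only a sketch under stated assumptions: ToyOp, ToyAnalyzer, walk_function and CollectCalls are hypothetical names invented for this illustration and are not scarf's API, and the toy treats every instruction as one address unit long. The real FuncAnalysis also tracks execution state per branch, which this sketch leaves out.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical toy instruction set; not scarf's `Operation` type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ToyOp {
    /// Any instruction that just falls through to the next address.
    Nop,
    /// Call to a constant address, then fall through.
    Call(u32),
    /// Conditional jump: execution continues at the target or falls through.
    JumpIf(u32),
    /// End of the function.
    Return,
}

/// Hypothetical callback trait standing in for a user-implemented analyzer.
trait ToyAnalyzer {
    fn operation(&mut self, address: u32, op: ToyOp);
}

/// Visits every reachable instruction once, following both sides of each
/// conditional jump, and reports each instruction to the analyzer.
fn walk_function<A: ToyAnalyzer>(code: &HashMap<u32, ToyOp>, entry: u32, analyzer: &mut A) {
    let mut visited = BTreeSet::new();
    let mut pending = vec![entry];
    while let Some(address) = pending.pop() {
        // Skip addresses that were already reached through another path.
        if !visited.insert(address) {
            continue;
        }
        // Addresses outside the toy function are ignored.
        let Some(&op) = code.get(&address) else { continue };
        analyzer.operation(address, op);
        match op {
            ToyOp::Nop | ToyOp::Call(_) => pending.push(address + 1),
            ToyOp::JumpIf(target) => {
                pending.push(address + 1);
                pending.push(target);
            }
            ToyOp::Return => {}
        }
    }
}

/// Example analyzer: records the targets of all calls that were seen.
struct CollectCalls(Vec<u32>);

impl ToyAnalyzer for CollectCalls {
    fn operation(&mut self, _address: u32, op: ToyOp) {
        if let ToyOp::Call(target) = op {
            self.0.push(target);
        }
    }
}
```

Passing a CollectCalls value to walk_function collects the call targets from both sides of every conditional jump, without the analyzer having to know how the branches are discovered.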

Examples of problems that could be answered with scarf:

  • Find all child function calls of a function where one of the arguments is a constant integer between 0x600 and 0x700.
  • If the function writes a 64-bit value to (Base pointer)+0x28, return the base pointer and the value that was written. That is, detect writes to a field of a struct when the field offset is known to be 0x28.
  • Check if the function reads memory at a given constant address, and track the non-stack locations that the read value is passed to.
  • Determine all constant arguments that are passed to a certain function f, by analyzing all of the functions calling f.
  • Find a point where the function checks whether some value x is less than the constant 0x100, and return what expression x is, as well as the jump address and whether the jump has to be changed to always or never jump in order to always take the x < 0x100 branch.
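
Most of the questions above come down to matching the shape of an expression. The following is a minimal sketch of that idea, assuming a hypothetical ToyExpr tree made up for this illustration. Scarf's real Operand type is richer, and the function names here are not part of scarf.

```rust
/// Hypothetical toy expression tree; scarf's real `Operand` type is richer.
#[derive(Clone, Debug, PartialEq, Eq)]
enum ToyExpr {
    Constant(u64),
    Register(u8),
    Add(Box<ToyExpr>, Box<ToyExpr>),
}

/// Returns the constant if `expr` is a constant within `0x600..=0x700`.
fn constant_in_range(expr: &ToyExpr) -> Option<u64> {
    match expr {
        ToyExpr::Constant(c) if (0x600..=0x700).contains(c) => Some(*c),
        _ => None,
    }
}

/// If `address` has the form `base + 0x28` (in either operand order),
/// returns `base`. Any other shape gives `None`.
fn struct_base_for_offset_0x28(address: &ToyExpr) -> Option<&ToyExpr> {
    match address {
        ToyExpr::Add(left, right) => match (&**left, &**right) {
            (base, ToyExpr::Constant(0x28)) | (ToyExpr::Constant(0x28), base) => Some(base),
            _ => None,
        },
        _ => None,
    }
}
```

A callback would run checks like these on call arguments or on the destination addresses of memory writes, and record the matches it cares about.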

In general, scarf is still relatively low-level in its execution representation. Good analysis results often require the user to handle edge cases, which in turn requires iteratively improving the analysis code whenever you come across an executable that the analysis does not quite work on. As the only input to scarf analysis is often just the executable binary, it is a good idea to keep tests that use those binaries, so that scarf-using code does not suddenly regress.

Scarf strives to be fast enough to analyze an average function in less than 1 millisecond, even on slower machines. This makes it quite feasible to brute force every function of even a larger executable in a few minutes, as well as to have more targeted analysis be fast enough that it can be run without anyone noticing. Some of this speed comes from giving up accuracy, and if adversarial code generation wanted to explicitly break scarf, it would likely at least require the user callback to actively help keep scarf from breaking. The simulation accuracy issues do not seem to cause much trouble with regular compiler-generated code, though.

The following are the main types used by scarf:

  • FuncAnalysis - The entry point for scarf analysis. Walks through and keeps track of all branches the execution may take.
  • trait Analyzer - User-implemented trait that FuncAnalysis calls back to, allowing user code to query and manipulate analysis.
  • analysis::Control - A type passed to Analyzer callbacks, providing various ways to query and manipulate analysis state.
  • BinaryFile - Contains sections of the binary, including code that is to be simulated.
    • BinarySection - A single section, practically just Vec<u8> and a base address.
  • VirtualAddress32 / VirtualAddress64 - Integer newtype representing a constant address, usually in BinaryFile.
  • Operand - The main value / expression type of scarf.
  • OperandContext - Allocation arena and interner for Operands. Has to outlive FuncAnalysis, so user code is required to create this and pass it to the rest of scarf.
  • trait ExecutionState - Holds all of the simulated CPU and memory state of one point in analysis’s execution. Concrete types are ExecutionStateX86 and ExecutionStateX86_64.
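
To make the "interner" role of OperandContext more concrete, here is a minimal index-based sketch of interning. It is not how scarf implements OperandContext: ToyContext, ExprId and ExprNode are hypothetical names for this illustration, and the sketch hands out integer ids instead of arena references. The point it shows is that structurally equal expressions end up as the same handle, so comparing two expressions is just comparing two ids.

```rust
use std::collections::HashMap;

/// Hypothetical handle to an interned expression.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ExprId(u32);

/// Hypothetical expression node; children are referred to by id.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum ExprNode {
    Constant(u64),
    Register(u8),
    Add(ExprId, ExprId),
}

/// Toy context owning every node; ids are only meaningful together with it.
#[derive(Default)]
struct ToyContext {
    nodes: Vec<ExprNode>,
    lookup: HashMap<ExprNode, ExprId>,
}

impl ToyContext {
    /// Returns the existing id for `node`, or stores it and returns a new id.
    fn intern(&mut self, node: ExprNode) -> ExprId {
        if let Some(&id) = self.lookup.get(&node) {
            return id;
        }
        let id = ExprId(self.nodes.len() as u32);
        self.nodes.push(node.clone());
        self.lookup.insert(node, id);
        id
    }

    fn constant(&mut self, value: u64) -> ExprId {
        self.intern(ExprNode::Constant(value))
    }

    fn register(&mut self, index: u8) -> ExprId {
        self.intern(ExprNode::Register(index))
    }

    fn add(&mut self, left: ExprId, right: ExprId) -> ExprId {
        self.intern(ExprNode::Add(left, right))
    }

    /// Looks up the node behind an id created by this context.
    fn node(&self, id: ExprId) -> &ExprNode {
        &self.nodes[id.0 as usize]
    }
}
```

Because every handle points into the context, the context has to be created first and kept alive for as long as any handle is in use, which mirrors the lifetime requirement described for OperandContext above.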

Re-exports

pub use crate::analysis::Analyzer;
pub use crate::analysis::FuncAnalysis;
pub use crate::operand::ArithOpType;
pub use crate::operand::MemAccess;
pub use crate::operand::MemAccessSize;
pub use crate::operand::Operand;
pub use crate::operand::OperandType;
pub use crate::operand::OperandContext;
pub use crate::operand::OperandCtx;
pub use crate::operand::ArchId;
pub use crate::exec_state::ExecutionState;
pub use crate::exec_state_x86::ExecutionState as ExecutionStateX86;
pub use crate::exec_state_x86_64::ExecutionState as ExecutionStateX86_64;
pub use crate::VirtualAddress32 as VirtualAddress;

Modules

analysis
Contains FuncAnalysis and related types and traits.
cfg
cfg_dot
exec_state
Traits for abstracting over different CPU architectures, and code that can be shared between them.
exec_state_x86
32-bit x86 architecture state. Re-exported as scarf::ExecutionStateX86.
exec_state_x86_64
64-bit x86 architecture state. Re-exported as scarf::ExecutionStateX86_64.
operand
Operand and its supporting types.
operation_helpers

Structs

BinaryFile
Contains the binary that is to be analyzed, loaded to memory.
BinaryFileWithCachedSection
BinaryFile, but caches the last section from which data was read, allowing faster repeated small reads when the addresses are expected, but not required, to be in the same section.
BinarySection
Single section of a BinaryFile.
DestArchId
DestOperand version of ArchId. The public API is similarly just new with a u32 and getting the u32 back, though this is not tied to OperandCtx.
FlagUpdate
Part of Operation, representing an update to ExecutionState’s flags.
Instruction
OutOfBounds
Zero-sized error type that is returned when reading from BinaryFile by VirtualAddress cannot be done.
Rva
Represents a relative virtual address.
Rva64
Represents a relative virtual address.
SpecialBytes
VirtualAddress32
VirtualAddress32 represents a constant 32-bit memory address.
VirtualAddress64
VirtualAddress64 represents a constant 64-bit memory address.
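
How address newtypes, sections and the out-of-bounds error fit together can be sketched with a small standalone model. Everything below is an assumption made for illustration: ToyAddress, ToyRva, ToySection, ToyBinary and ToyOutOfBounds are hypothetical names that only mirror the roles described above, and the method names are not scarf's API. The sketch reads little-endian values, as x86 does.

```rust
/// Hypothetical newtype for an absolute 32-bit address.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct ToyAddress(u32);

/// Hypothetical newtype for an offset relative to the image base.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ToyRva(u32);

/// Zero-sized error for reads that no section can satisfy.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ToyOutOfBounds;

/// A section: raw bytes plus the address at which they are loaded.
struct ToySection {
    base: ToyAddress,
    data: Vec<u8>,
}

/// A binary: image base plus its sections.
struct ToyBinary {
    base: ToyAddress,
    sections: Vec<ToySection>,
}

impl ToyBinary {
    /// Converts an absolute address to an offset from the image base.
    /// Returns `None` for addresses below the image base.
    fn rva(&self, address: ToyAddress) -> Option<ToyRva> {
        address.0.checked_sub(self.base.0).map(ToyRva)
    }

    /// Reads a little-endian u32 that lies entirely inside one section.
    fn read_u32(&self, address: ToyAddress) -> Result<u32, ToyOutOfBounds> {
        for section in &self.sections {
            // Addresses below this section's base cannot be in it.
            let Some(offset) = address.0.checked_sub(section.base.0) else { continue };
            let tail = section.data.get(offset as usize..);
            if let Some(bytes) = tail.and_then(|s| s.get(..4)) {
                let mut buf = [0u8; 4];
                buf.copy_from_slice(bytes);
                return Ok(u32::from_le_bytes(buf));
            }
        }
        Err(ToyOutOfBounds)
    }
}
```

Keeping addresses and relative offsets as distinct newtypes means the compiler rejects code that mixes them up, and a zero-sized error type keeps failed reads cheap to return.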

Enums

DestOperand
DisasmError
Errors from disassembly (Operation generation)
Error
FlagArith
Operations used by FlagUpdate.
Operation
A sub-operation of a simulated instruction.

Functions

parse
Creates a BinaryFile from a 32-bit Windows executable at filename.
parse_x86_64
Creates a BinaryFile from a 64-bit Windows executable at filename.
raw_bin
Creates a BinaryFile from one or more memory buffers representing the binary sections.