Scarf is a library for analyzing x86 functions.
The main concept of scarf is that the user gives FuncAnalysis a function, which will be simulated. FuncAnalysis walks through the function's different execution paths while calling user-defined callbacks, letting user code examine the execution and extract the information it needs. The callbacks can also modify state and have some control over execution, allowing scarf to be adapted for cases where the library is not quite able to handle unusual assembly patterns.
Examples of problems that could be answered with scarf:
- Find all child function calls of a function where one of the arguments is a constant integer between 0x600 and 0x700.
- If the function writes a 64-bit value to (Base pointer)+0x28, return the base pointer and the value that was written. That is, detect writes to a field of a struct when the field offset is known to be 0x28.
- Check if the function reads memory at a given constant address, and track non-stack locations where the read value is passed to.
- Determine all constant arguments that are passed to a certain function f, by analyzing all of the functions calling f.
- Find a point where the function compares some value x to be less than constant 0x100, and return what expression x is, as well as the jump address and whether the jump has to be changed to always or never jump in order to always go to the x < 0x100 branch.
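The callback-driven model described above can be illustrated with a toy sketch. Nothing in it is scarf's API: `Instr`, `ToyAnalyzer`, `walk_function`, and `FindCalls` are hypothetical names invented for this example, and the instruction set is reduced to three variants. It only shows the shape of the idea, a walker that visits every branch once and hands each instruction to user code, applied to the first problem in the list (the range is assumed to be inclusive).

```rust
use std::collections::HashSet;

/// Toy instruction set invented for this sketch (not scarf's `Operation`).
#[derive(Clone, Copy, Debug)]
enum Instr {
    /// Call the function at `target` with a single constant argument.
    Call { target: u32, arg: u32 },
    /// Conditional jump: execution may continue at `to` or fall through.
    JumpIf { to: usize },
    /// End of the function.
    Ret,
}

/// Hypothetical callback trait, standing in for the role of `Analyzer`.
trait ToyAnalyzer {
    fn instruction(&mut self, index: usize, instr: &Instr);
}

/// Walks every reachable instruction once, following both sides of each
/// conditional jump, and invokes the callback for each instruction.
fn walk_function(code: &[Instr], analyzer: &mut dyn ToyAnalyzer) {
    let mut visited = HashSet::new();
    let mut pending = vec![0usize];
    while let Some(mut pos) = pending.pop() {
        while pos < code.len() && visited.insert(pos) {
            let instr = code[pos];
            analyzer.instruction(pos, &instr);
            match instr {
                Instr::JumpIf { to } => pending.push(to),
                Instr::Ret => break,
                Instr::Call { .. } => {}
            }
            pos += 1;
        }
    }
}

/// User code: collects targets of calls whose argument is in 0x600..=0x700.
struct FindCalls {
    found: Vec<u32>,
}

impl ToyAnalyzer for FindCalls {
    fn instruction(&mut self, _index: usize, instr: &Instr) {
        if let Instr::Call { target, arg } = *instr {
            if (0x600..=0x700).contains(&arg) {
                self.found.push(target);
            }
        }
    }
}

/// Runs the analyzer over a small two-branch function and returns the
/// matching call targets in ascending order.
fn find_calls_demo() -> Vec<u32> {
    let code = [
        Instr::Call { target: 0x401000, arg: 0x5 },
        Instr::JumpIf { to: 4 },
        Instr::Call { target: 0x402000, arg: 0x650 },
        Instr::Ret,
        Instr::Call { target: 0x403000, arg: 0x700 },
        Instr::Ret,
    ];
    let mut analyzer = FindCalls { found: Vec::new() };
    walk_function(&code, &mut analyzer);
    let mut found = analyzer.found;
    found.sort();
    found
}
```

In this sketch `find_calls_demo()` reports the calls on both sides of the conditional jump, because the walker queues the jump target and also continues with the fall-through path. A real analysis works on simulated CPU state rather than pre-decoded constants, so the argument check would inspect an expression instead of a plain integer.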
In general, scarf is still relatively low-level in its execution representation. Good analysis results often require the user to handle edge cases, which tends to mean iteratively improving the analysis code whenever you come across an executable that the analysis does not quite work on. As the only input to scarf analysis is often just the executable binary, it is a good idea to keep tests that use those binaries, so that scarf-using code does not suddenly regress.
Scarf strives to be fast enough to analyze an average function in less than 1 millisecond even on slower machines. This makes it quite feasible to brute force every function of even a larger executable in a few minutes, as well as to have more targeted analysis be fast enough that it can be run without anyone noticing. Some of this speed comes from giving up accuracy, and if adversarial codegen wanted to explicitly break scarf, the user callback would likely have to actively help to keep scarf from breaking. The simulation accuracy issues do not seem to cause much of a problem in regular compiler-generated code, though.
The following are main types used by scarf:
- FuncAnalysis - The entry point for scarf analysis. Walks through and keeps track of all branches the execution may take.
- trait Analyzer - User-implemented trait that FuncAnalysis calls back to, allowing user code to query and manipulate analysis.
- analysis::Control - A type passed to Analyzer callbacks, providing various ways to query and manipulate analysis state.
- BinaryFile - Contains sections of the binary, including code that is to be simulated.
- BinarySection - A single section, practically just Vec<u8> and a base address.

- VirtualAddress32 / VirtualAddress64 - Integer newtype representing a constant address, usually in BinaryFile.
- trait exec_state::VirtualAddress - A trait allowing handling both address sizes generically.

- Operand - The main value / expression type of scarf.
- OperandContext - Allocation arena and interner for Operands. Has to outlive FuncAnalysis, so user code is required to create this and pass it to the rest of scarf.
- trait ExecutionState - Holds all of the simulated CPU and memory state of one point in analysis's execution. Concrete types are ExecutionStateX86 and ExecutionStateX86_64.
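To make the "Vec<u8> and a base address" description and the address newtype more concrete, here is a minimal sketch of reading a value from a section by address. The names `Addr32`, `Section`, `OutOfRange`, and `read_u32` are assumptions made up for this example and are not scarf's types or methods. The sketch only shows how an address is turned into an offset and bounds-checked.

```rust
/// Hypothetical stand-in for a 32-bit address newtype.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Addr32(u32);

/// Hypothetical stand-in for a section: raw bytes plus the address of byte 0.
struct Section {
    base: Addr32,
    data: Vec<u8>,
}

/// Zero-sized error for reads that fall outside the section.
#[derive(Debug, PartialEq, Eq)]
struct OutOfRange;

impl Section {
    /// Reads a little-endian u32 at `addr`, if all four bytes are in bounds.
    fn read_u32(&self, addr: Addr32) -> Result<u32, OutOfRange> {
        // Addresses below the base have no valid offset.
        let offset = addr.0.checked_sub(self.base.0).ok_or(OutOfRange)? as usize;
        let end = offset.checked_add(4).ok_or(OutOfRange)?;
        // `get` returns None when the range extends past the section data.
        let bytes = self.data.get(offset..end).ok_or(OutOfRange)?;
        Ok(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
    }
}

/// Builds a five-byte section based at 0x401000 for demonstration.
fn example_section() -> Section {
    Section {
        base: Addr32(0x0040_1000),
        data: vec![0x78, 0x56, 0x34, 0x12, 0xff],
    }
}
```

Wrapping the integer in a newtype keeps addresses from being mixed up with plain offsets or sizes, and a zero-sized error type keeps the failure path cheap for code that performs many small reads.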
Re-exports§
pub use crate::analysis::Analyzer;
pub use crate::analysis::FuncAnalysis;
pub use crate::operand::ArithOpType;
pub use crate::operand::MemAccess;
pub use crate::operand::MemAccessSize;
pub use crate::operand::Operand;
pub use crate::operand::OperandType;
pub use crate::operand::OperandContext;
pub use crate::operand::OperandCtx;
pub use crate::operand::ArchId;
pub use crate::exec_state::ExecutionState;
pub use crate::exec_state_x86::ExecutionState as ExecutionStateX86;
pub use crate::exec_state_x86_64::ExecutionState as ExecutionStateX86_64;
pub use crate::VirtualAddress32 as VirtualAddress;
Modules§
- analysis - Contains FuncAnalysis and related types and traits.
- cfg
- cfg_dot
- exec_state - Traits for abstracting over different CPU architectures, and code that can be shared between them.
- exec_state_x86 - 32-bit x86 architecture state. Re-exported as scarf::ExecutionStateX86.
- exec_state_x86_64 - 64-bit x86 architecture state. Re-exported as scarf::ExecutionStateX86_64.
- operand - Operand and its supporting types.
- operation_helpers
Structs§
- BinaryFile - Contains the binary that is to be analyzed, loaded to memory.
- BinaryFileWithCachedSection - BinaryFile, but caches the last section from which data was read, allowing faster repeated small reads when the addresses are expected, but not required, to be in the same section.
- BinarySection - Single section of a BinaryFile.
- DestArchId - DestOperand version of ArchId. The public API is similarly just new with u32 and get u32, though this is not tied to OperandCtx.
- FlagUpdate - Part of Operation, representing an update to ExecutionState's flags.
- Instruction
- OutOfBounds - Zero-sized error type that is returned when reading from BinaryFile by VirtualAddress cannot be done.
- Rva - Represents relative virtual address.
- Rva64 - Represents relative virtual address.
- SpecialBytes
- VirtualAddress32 - VirtualAddress32 represents a constant 32-bit memory address.
- VirtualAddress64 - VirtualAddress64 represents a constant 64-bit memory address.
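The idea behind caching the last-read section can be sketched as follows. This is an illustration of the technique only, under the assumption that a lookup normally has to search all sections. `RawSection`, `CachedReader`, `read_u8`, and the `searches` counter are hypothetical names and do not reflect how scarf implements BinaryFileWithCachedSection.

```rust
/// Hypothetical section type: raw bytes plus the address of byte 0.
struct RawSection {
    base: u32,
    data: Vec<u8>,
}

/// Returns the byte at `addr` if it lies inside `section`.
fn byte_in(section: &RawSection, addr: u32) -> Option<u8> {
    let offset = addr.checked_sub(section.base)? as usize;
    section.data.get(offset).copied()
}

/// Reader that remembers which section served the previous read.
struct CachedReader<'a> {
    sections: &'a [RawSection],
    /// Index of the section that satisfied the most recent successful read.
    last: usize,
    /// Number of full searches performed, kept only to make the effect visible.
    searches: u32,
}

impl<'a> CachedReader<'a> {
    fn new(sections: &'a [RawSection]) -> Self {
        CachedReader { sections, last: 0, searches: 0 }
    }

    fn read_u8(&mut self, addr: u32) -> Option<u8> {
        // Fast path: try the section used by the previous read first.
        if let Some(byte) = byte_in(self.sections.get(self.last)?, addr) {
            return Some(byte);
        }
        // Slow path: search every section and remember the one that matched.
        self.searches += 1;
        let (index, byte) = self
            .sections
            .iter()
            .enumerate()
            .find_map(|(i, s)| byte_in(s, addr).map(|b| (i, b)))?;
        self.last = index;
        Some(byte)
    }
}

/// Two small sections used to demonstrate the reader.
fn demo_sections() -> Vec<RawSection> {
    vec![
        RawSection { base: 0x1000, data: vec![1, 2, 3] },
        RawSection { base: 0x2000, data: vec![9, 8] },
    ]
}
```

Consecutive reads from the same section skip the search entirely, while a read from another section still succeeds and simply moves the cache. That matches the "expected, but not required" wording above.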
Enums§
- DestOperand
- DisasmError - Errors from disassembly (Operation generation).
- Error
- FlagArith - Operations used by FlagUpdate.
- Operation - A sub-operation of simulated instruction.
Functions§
- parse - Creates a BinaryFile from 32-bit Windows executable at filename.
- parse_x86_64 - Creates a BinaryFile from 64-bit Windows executable at filename.
- raw_bin - Creates a BinaryFile from memory buffer(s) representing the binary sections.