An Analytical Approach to Modern Binary Deobfuscation
For Reverse Engineers and Malware Analysts
4 Day | 32 CPE Hour Training: February 2022
## Abstract
Code obfuscation has become one of the most prevalent mechanisms for complicating software reverse engineering. It plays a major role in a wide range of domains, from malware threats to protection of intellectual property and digital rights management.
"An Analytical Approach to Modern Binary Deobfuscation" is a curated training that provides an intensive jump-start into the field of code (de)obfuscation. Over the course of this training, students will receive a comprehensive introduction to the most relevant software obfuscation mechanisms, as well as to existing deobfuscation techniques for analyzing, confronting and defeating obfuscated code.
## Key learning objectives
- Obtain a high-level overview of the context and scenarios where code obfuscation is used
- Gain an in-depth understanding of code obfuscation mechanisms
- Build obfuscated code, both from scratch and through available tooling
- Develop an understanding of the main code deobfuscation techniques
- Learn tooling for analyzing obfuscated code and apply deobfuscation techniques
- Become familiar with state-of-the-art (de)obfuscation research literature
## Contents
## Part 1: Code obfuscation
#### Introduction, context and motivation
#### Data-flow based obfuscation
- Constant unfolding
- Dead code insertion
- Encodings
- Pattern-based obfuscation
#### Control-flow based obfuscation
- Function inlining/outlining
- Opaque predicates
- Control-flow flattening
#### Mixing data-flow and control-flow obfuscation
- VM-based obfuscation
- Hardening VM-based obfuscation
#### Mixed Boolean-Arithmetic
- Preliminary concepts
- MBA rewriting
- Insertion of identities
- Opaque constants
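
As a small taste of the Mixed Boolean-Arithmetic topics listed above, the following is a minimal sketch, not course material. It assumes 8-bit arithmetic so that every input can be checked exhaustively, and all function names (`add_plain`, `add_mba`, `opaque_zero`, `opaque_constant`) are hypothetical. It shows an MBA rewriting of addition, an identity that always evaluates to zero, and a constant hidden behind that identity.

```python
# Illustrative sketch only: the 8-bit width and all names are assumptions.
MASK = 0xFF  # 8-bit values keep the exhaustive check cheap


def add_plain(x, y):
    """Reference semantics: plain modular addition."""
    return (x + y) & MASK


def add_mba(x, y):
    """MBA rewriting of x + y as (x ^ y) + 2 * (x & y)."""
    return ((x ^ y) + 2 * (x & y)) & MASK


def opaque_zero(x, y):
    """Identity (x | y) + (x & y) - x - y, which is always 0."""
    return ((x | y) + (x & y) - x - y) & MASK


def opaque_constant(x, y, value=42):
    """Hide a constant by adding an expression that is always 0."""
    return (value + opaque_zero(x, y)) & MASK


def equivalent(f, g):
    """Compare two binary functions on every pair of 8-bit inputs."""
    return all(f(x, y) == g(x, y) for x in range(256) for y in range(256))


if __name__ == "__main__":
    print(equivalent(add_plain, add_mba))
    print(equivalent(opaque_zero, lambda x, y: 0))
    print(equivalent(opaque_constant, lambda x, y: 42))
```

Exhaustive checking only works for such small bit widths; for 32-bit or 64-bit expressions an SMT solver would typically be used instead.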
## Part 2: Code deobfuscation
#### Data-flow analysis
- Reaching definition analysis
- Liveness analysis
#### Dynamic binary instrumentation
- Tracing code execution
- Hooking
- Extracting deobfuscated data
#### SMT-based analysis
- Semantic equivalence checking
- Translating code conditions into SMT solver constraints
- Proving code properties
- Attacking simple MBA and weak cryptography
#### Symbolic execution
- Reasoning about code in a symbolic way
- Working with native code
- Working with intermediate representations
- Plugging an SMT solver
- Attacking opaque predicates
- Attacking MBA obfuscation
- Attacking VM-based obfuscation
#### Program synthesis
- Code syntax vs code semantics
- Oracle-based program synthesis
- Describing semantics through I/O behavior
- Generating I/O pairs
- Attacking MBA obfuscation
- Attacking VM-based obfuscation
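
To make the idea of oracle-based synthesis from I/O pairs more concrete, here is a minimal sketch under stated assumptions. It uses brute-force enumeration over a tiny, hypothetical expression grammar with 8-bit arithmetic; this is a deliberate simplification and is not how the tools listed in this training search for candidates. The `oracle` function stands in for an obfuscated code fragment that is only observed through its inputs and outputs.

```python
import itertools
import random

MASK = 0xFF  # assumption: 8-bit arithmetic


def oracle(x, y):
    """Black-box stand-in for obfuscated code (an MBA form of x ^ y)."""
    return ((x + y) - 2 * (x & y)) & MASK


# Hypothetical toy grammar: two variables and five binary operators.
LEAVES = {
    "x": lambda x, y: x,
    "y": lambda x, y: y,
}
OPS = {
    "+": lambda a, b: (a + b) & MASK,
    "-": lambda a, b: (a - b) & MASK,
    "^": lambda a, b: a ^ b,
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
}


def candidates():
    """Yield (text, function) pairs, smallest expressions first."""
    for name, leaf in LEAVES.items():
        yield name, leaf
    for (ln, lf), (sym, op), (rn, rf) in itertools.product(
        LEAVES.items(), OPS.items(), LEAVES.items()
    ):
        yield (
            f"({ln} {sym} {rn})",
            lambda x, y, lf=lf, op=op, rf=rf: op(lf(x, y), rf(x, y)),
        )


def synthesize(black_box, samples=32, seed=0):
    """Return the first candidate matching the oracle on random I/O pairs."""
    rng = random.Random(seed)
    inputs = [(rng.randrange(256), rng.randrange(256)) for _ in range(samples)]
    pairs = [(inp, black_box(*inp)) for inp in inputs]
    for text, func in candidates():
        if all(func(*inp) == out for inp, out in pairs):
            return text
    return None


if __name__ == "__main__":
    print(synthesize(oracle))
```

The result is only guaranteed to agree with the oracle on the sampled inputs, so a synthesized candidate would normally be confirmed afterwards, for example with a semantic equivalence check.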
#### Conclusions and research directions
## Tools used
* Disassemblers
  - IDA Pro
  - radare2
* Obfuscation
  - Manual obfuscation
  - O-LLVM
  - Tigress
* Dynamic Binary Instrumentation
  - Frida
  - QBDI
* Symbolic execution
  - Miasm
  - Triton
  - Radius
* Program synthesis
  - syntia
  - msynth
  - qsynthesis
* Other tools
  - Z3
  - MBA-Solver
  - Custom tooling
## Teaching methodology
Live classes are designed to be dynamic and engaging, so that students get the most out of the training materials and the instructor's expertise. Concepts are presented clearly and accompanied by illustrative examples and demos. Practice time is allocated for each section: students are given several exercises to work on, with the continuous support of the instructor.
## Prerequisites
- Understanding of basic programming concepts
- Familiarity with x86 assembly, C and Python
- Knowledge of reverse engineering fundamentals
## System requirements
- A working desktop/laptop capable of running virtual machines
- 40 GB free hard disk space
## Students will be provided with
- A Virtual Machine image with all tools, examples and exercises
## About the instructor
Catalan hacker, reverse engineer and mathematician, with an extensive background in code (de)obfuscation research and Mixed Boolean-Arithmetic expressions, as well as industry experience as a senior malware reverse engineer. Founder of Fura Labs (@FuraLabs), a research and education firm on software security and reverse engineering. Co-founder and president of @HackingLliure, a non-profit association and hacking community. Speaker and trainer at several international security conferences.
#### TRAINING SCHEDULE
| Date | Session |
|------------------|-------------------|
| FEB 19 Saturday | Live Lecture (4h) |
| FEB 20 Sunday | Live Lecture (4h) |
| FEB 21 Monday | Live Lecture (4h) |
| FEB 22 Tuesday | Live Lecture (3h) |
| FEB 23 Wednesday | Office Hours (3h) |
| FEB 24 Thursday | Wrap Up (3h) |
##### Live Lecture Timings (FEBRUARY 19,20,21)
| Time | Time zone |
|---------------|-----------------|
| 8 am - 12 pm | US Pacific Time |
| 11 am - 3 pm | US Eastern Time |
| 4 pm - 8 pm | UK |
| 5 pm - 9 pm | CET |
##### Live Lecture Timings (FEBRUARY 22,23,24)
| Time | Time zone |
|---------------|-----------------|
| 8 am - 11 am | US Pacific Time |
| 11 am - 2 pm | US Eastern Time |
| 4 pm - 7 pm | UK |
| 5 pm - 8 pm | CET |
Each lecture shall be split into 2 sessions of 1h45m each, with a 30-minute break in between.
The instructor shall also be available for 2 hours after each live lecture for handling any direct Q&A.
#### LECTURE RECORDINGS
Recordings shall be made available after each lecture, throughout the duration of the course, only for registered students.