Reverse Engineering – Binary Program Analysis
Virtual 32 CPE Hours Training ★ February 2024
WEEK 1 ★ FEB 10-17 // DETAILED SCHEDULE READY
Abstract
Reverse engineering is the art of extracting valuable information from unknown binary programs. Whether we aim to find vulnerabilities in closed-source software, dissect the internals of nation-state malware, or simply bypass copy protection technologies, reverse engineering helps us pinpoint relevant code and data locations, reconstruct high-level constructs from machine code, and thus gain insight into valuable program internals.
In this training, we learn the fundamentals of reverse engineering from scratch, progressing from reconstructing high-level code, through recovering complex data structures and C++ class hierarchies, to analyzing complex malware samples. Along the way, we become proficient in using state-of-the-art tools such as IDA, Ghidra, and GDB. In this way, the training accompanies students through their first reverse engineering steps and prepares them for the long journey ahead.
First, we discuss the layers between machine code and high-level languages, introduce binary file formats, and get to know important tools such as hex editors, disassemblers, decompilers, and debuggers. Afterward, we familiarize ourselves with the X86-64 instruction set architecture, the most common architecture on desktop computers and servers. In the process, we learn how to write assembly code by hand, inspect registers and flags in a debugger, and reconstruct arithmetic calculations and loops in a disassembler.
In the second part, we cover the reconstruction of high-level code constructs from machine code. For this, we compile C code to machine code and compare the two side by side. Using different compilers and optimization levels, we study the many ways in which a high-level construct can be represented. Afterward, we focus on manually recovering high-level functions from compiler-generated code. Finally, we dive into the area of software cracking and deepen our skills by reverse engineering and patching serial validation schemes.
Before we reconstruct complex data structures and C++ classes with Ghidra, we first learn how to identify them manually. Next, we look at how to recover class inheritance relationships, analyze constructors and virtual functions, and resolve virtual function calls.
Finally, we put the knowledge we have gained into practice by analyzing nation-state malware samples. After discussing challenges and strategies for dealing with complex binaries, we identify malware functionality based on API functions and reconstruct class hierarchies of malware modules. To reveal hidden strings in the binary, we script Ghidra to decrypt them automatically.
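The idea behind automated string decryption can be illustrated outside of any tool. The following is a minimal sketch in plain Python, assuming a hypothetical single-byte XOR scheme; real samples use their own schemes, and the course performs this step through Ghidra scripting, whose API is not shown here. The function names, the key, and the sample data are illustrative assumptions.

```python
def xor_decrypt(data: bytes, key: int) -> bytes:
    """Decrypt a buffer XOR-ed with a single-byte key (assumed scheme)."""
    return bytes(b ^ key for b in data)


def find_xor_strings(blob: bytes, key: int, min_len: int = 4) -> list[str]:
    """Decrypt blob and return printable ASCII runs of at least min_len."""
    decrypted = xor_decrypt(blob, key)
    strings, current = [], []
    for b in decrypted:
        if 0x20 <= b < 0x7F:          # printable ASCII range
            current.append(chr(b))
        else:
            if len(current) >= min_len:
                strings.append("".join(current))
            current = []
    if len(current) >= min_len:       # flush a run that ends the buffer
        strings.append("".join(current))
    return strings


# Hypothetical data: two strings separated by a NUL byte, "encrypted" with 0x5A.
KEY = 0x5A
blob = xor_decrypt(b"connect\x00cmd.exe", KEY)
print(find_xor_strings(blob, KEY))
```

In practice the key and the location of the encrypted data have to be recovered from the decryption routine in the binary first; the sketch only covers the final, mechanical step.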
Pedagogy
The training focuses on hands-on sessions. While some lecture parts provide an understanding of how high-level code can be represented in machine code, various hands-on sessions teach how to interact with reverse engineering tools and reconstruct high-level code from binary programs. The trainer actively supports students in solving the exercises. After a task is completed, we discuss different solutions in class. Furthermore, students receive detailed reference solutions that can be used during and after the course.
While this class mostly focuses on the X86-64 architecture, we can optionally take a look at the ARM32 architecture and discuss how the two differ and what they have in common. Since the course teaches reverse engineering in a general way, students will notice that the techniques and tools can also be applied to other architectures.
Key Learning Objectives
- Learn reverse engineering from scratch and understand all layers between machine code and high-level languages
- Become proficient in using state-of-the-art tools like IDA, Ghidra and GDB
- Learn how to reconstruct (nested) conditionals and loops, functions, complex data structures and C++ classes from machine code
- Get to know strategies to analyze complex binaries and apply them to nation-state malware samples
- Deepen your reverse engineering skills in various hands-on sessions
Intended Audience
Anyone working in cyber security who wants or needs to learn about low-level program analysis, including cyber security experts, malware analysts, and forensic analysts.
Detailed Agenda
Introduction to Reverse Engineering
- Motivation
- Application scenarios
- From machine code to high-level languages
- Compilers
- Executable file formats (ELF & PE)
- Static and dynamic program analysis
- Editing ELF files with a hex editor
- Disassembling with IDA
- Decompilation with Ghidra
- Debugging with GDB
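To give a feel for what an executable file format looks like at the byte level, here is a minimal sketch that reads a few fields from the start of an ELF header using Python's `struct` module. It is an illustration under stated assumptions, not course material: the function name `parse_elf_header` and the synthetic header bytes are made up for the example, and only a handful of fields are decoded.

```python
import struct

ELF_MAGIC = b"\x7fELF"


def parse_elf_header(header: bytes) -> dict:
    """Decode class, byte order, type, machine and entry point of an ELF header."""
    if len(header) < 16 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]   # 1 = 32-bit / LSB, 2 = 64-bit / MSB
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or data encoding")
    endian = "<" if ei_data == 1 else ">"
    entry_fmt = "Q" if ei_class == 2 else "I"  # entry point is 8 or 4 bytes wide
    fmt = endian + "HHI" + entry_fmt
    if len(header) < 16 + struct.calcsize(fmt):
        raise ValueError("header too short")
    # The fields e_type, e_machine, e_version, e_entry follow the 16-byte e_ident.
    e_type, e_machine, _e_version, e_entry = struct.unpack_from(fmt, header, 16)
    return {
        "bits": 64 if ei_class == 2 else 32,
        "little_endian": ei_data == 1,
        "type": e_type,
        "machine": e_machine,
        "entry": e_entry,
    }


# Synthetic 64-bit little-endian header: ET_EXEC (2), EM_X86_64 (0x3E),
# with a made-up entry point of 0x401000.
sample = ELF_MAGIC + bytes([2, 1, 1, 0]) + bytes(8)
sample += struct.pack("<HHIQ", 2, 0x3E, 1, 0x401000)
info = parse_elf_header(sample)
print(info["bits"], hex(info["machine"]), hex(info["entry"]))
```

To try it on a real binary, pass the first 64 bytes of the file, read in binary mode, instead of the synthetic `sample`.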
X86-64 Architecture
- Architecture overview
- Registers and data types
- Arithmetic operations and control-flow instructions
- Stack operations and function invocations
- Inspection of registers and flags with GDB
- Implementation of arithmetic operations in assembly code
- Reconstruction of simple calculations
- Loop reconstruction with IDA
Reconstruction of Functions
- Inspection of empty functions on the binary level
- Stack frame analysis with GDB
- Prologue and epilogue identification with IDA and GDB
- Calling conventions
- Basic blocks and control-flow graphs
- Reconstruction of function signatures and arguments
- Reconstruction of recursive functions
- Reconstruction of (nested) conditionals and switch-case statements
- Reconstruction of (nested) loops
- Impact of compiler optimizations
Software Cracking
- Software license checks and keygenning
- Analysis of serial validation schemes with IDA/Ghidra and GDB
- Patching to manipulate control flow
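As a rough illustration of patching to manipulate control flow, the sketch below inverts a short conditional jump in a byte buffer. On x86, the short conditional jumps occupy opcodes 0x70 to 0x7F, and flipping the lowest opcode bit negates the condition (for example, 0x75 `jnz` becomes 0x74 `jz`). The helper name `invert_short_jcc` and the instruction bytes are hypothetical and chosen only for this example.

```python
def invert_short_jcc(code: bytearray, offset: int) -> None:
    """Invert the condition of a short (rel8) conditional jump in place."""
    opcode = code[offset]
    if not 0x70 <= opcode <= 0x7F:
        raise ValueError(f"byte {opcode:#04x} at offset {offset} is not a short Jcc")
    code[offset] = opcode ^ 1  # lowest bit selects the negated condition


# Hypothetical check:  test eax, eax ; jnz +5 ; mov eax, 0 ; ret
code = bytearray.fromhex("85c07505b800000000c3")
invert_short_jcc(code, 2)      # the jnz opcode sits at offset 2
print(code.hex())
```

Locating the right offset is the actual reverse engineering work; the patch itself is a single-byte change, which is why such checks are a common target.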
Reconstruction of Data Structures
- Local and global data structures
- Variables, arrays, strings and structs
- Reconstruction of arrays with IDA/Ghidra
- Reconstruction of structs with IDA/Ghidra
C++ Reverse Engineering
- Function overloading and name mangling
- Class objects and object life cycles
- Identification and reconstruction of class objects
- Reconstruction of class relationships/inheritance
- Static/dynamic dispatching
- Virtual functions and class inheritance
- Identification and analysis of virtual function tables
- Resolving virtual function calls
Malware Reverse Engineering
- Malware types and behavior
- Analysis challenges and strategies
- Identification of malware functionality based on API functions
- Class reconstruction of C++ malware with Ghidra
- Ghidra scripting for automated string decryption
ARM32 Architecture (Optional)
- Architecture overview
- Differences from X86-64
- Registers and data types
- Stack operations
- Arithmetic operations and control-flow instructions
- Subroutines and calling convention
Knowledge Prerequisites
Participants should have some familiarity with low-level programming in C. In particular, a basic understanding of pointers is recommended.
Hardware Requirements
Students should have access to a computer with at least 4 GB of RAM and at least 20 GB of disk space.
Software Requirements
Students should install virtualization software such as VirtualBox or VMware. Students will be provided with a Linux VM containing all necessary tools and configuration.
Tim Blazytko
Tim Blazytko is a well-known binary security researcher and co-founder of emproof. After working on novel methods for code deobfuscation, fuzzing, and root cause analysis during his PhD, Tim now builds code obfuscation schemes tailored to embedded devices. In addition, he teaches training courses on reverse engineering and code deobfuscation, analyzes malware, and performs security audits.