That is correct.
So let's do that.
But before we go any further... I'd like to just add a tiny teeny linker flag.
The problem with making dynamic libraries (or "shared" libraries, as GCC and
GNU ld tend to call them) is that having undefined symbols is a-okay.
For example, if we were to call a function that does not exist from within
stage1:
It would just... ask for an i_do_not_exist symbol. So the error would be
pushed to load time, i.e. whenever the library is dlopened by some program. Or
possibly even later, when i_do_not_exist is called, if the dynamic loader is
feeling particularly lazy.
There's just one problem with that...
The good news is: there is a linker flag for that! (And also, that is the
default behavior for executables).
This may sound trivial, but had I known about it when I started researching this
part, it would've saved me a lot of grief. Also, encore here is doing the
heavy lifting, providing a panic handler and a memory allocator, so there's
less chance of us getting it wrong.
There! That's a good start.
Now all we have to do is turn that dynamic library into a non-relocatable
executable. And we have most of the tools to do that.
In Part 15 of this series, we enabled an option in rust-analyzer, and it turns
out it changed names! If you had rust-analyzer.cargo.loadOutDirsFromCheck in
.vscode/settings.json, you may want to change it.
Well, first it should calculate the convex hull of the guest executable, so that
we know how to lay out the "output" executable.
I thought it was a two-stage plan?
Okay, cool, so, to relink stage1 we're going to need the guest hull, and also
a mutable reference to the Writer, so let's make a function for that:
Rust code
// in `crates/minipak/src/main.rs`

// new!
use core::ops::Range;

#[allow(clippy::unnecessary_wraps)]
// 👇
fn main(env: Env) -> Result<(), Error> {
    let args = cli::Args::parse(&env);

    println!("Packing guest {:?}", args.input);
    let guest_file = File::open(args.input)?;
    let guest_map = guest_file.map()?;
    let guest_obj = Object::new(guest_map.as_ref())?;
    let guest_hull = guest_obj.segments().load_convex_hull()?;

    let mut output = Writer::new(&args.output, 0o755)?;
    relink_stage1(guest_hull, &mut output)?;

    Ok(())
}

fn relink_stage1(guest_hull: Range<u64>, writer: &mut Writer) -> Result<(), Error> {
    println!("Guest hull: {:0x?}", guest_hull);

    let obj = Object::new(include_bytes!(concat!(
        env!("OUT_DIR"),
        "/embeds/libstage1.so"
    )))?;
    // TODO: fill out!

    Ok(())
}
Quick detour through error handling: we might return various types of errors,
from deku, from encore, or from pixie, so let's give minipak an Error type now
and not have to worry about it later:
TOML markup
# in `crates/minipak/Cargo.toml`

[dependencies]
displaydoc = { version = "0.1.7", default-features = false }
Rust code
// in `crates/minipak/src/error.rs`

use encore::prelude::*;
use pixie::{deku::DekuError, PixieError};

#[derive(displaydoc::Display, Debug)]
pub enum Error {
    /// `{0}`
    Encore(EncoreError),
    /// deku error: `{0}`
    Deku(DekuError),
    /// pixie error: `{0}`
    Pixie(PixieError),
}

impl From<EncoreError> for Error {
    fn from(e: EncoreError) -> Self {
        Self::Encore(e)
    }
}

impl From<DekuError> for Error {
    fn from(e: DekuError) -> Self {
        Self::Deku(e)
    }
}

impl From<PixieError> for Error {
    fn from(e: PixieError) -> Self {
        Self::Pixie(e)
    }
}
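Why bother with all those From impls? Because they are what lets the `?` operator convert errors for us. Here's a minimal, standalone sketch of that mechanism. `ParseError`, `ElfError`, `parse_header` and `check_magic` are made-up stand-ins for illustration, not types or functions from deku, encore or pixie:

```rust
// Hypothetical stand-ins for the real error types.
#[derive(Debug, PartialEq)]
struct ParseError(&'static str);

#[derive(Debug, PartialEq)]
struct ElfError(&'static str);

#[derive(Debug, PartialEq)]
enum Error {
    Parse(ParseError),
    Elf(ElfError),
}

impl From<ParseError> for Error {
    fn from(e: ParseError) -> Self {
        Self::Parse(e)
    }
}

impl From<ElfError> for Error {
    fn from(e: ElfError) -> Self {
        Self::Elf(e)
    }
}

// Fails with a `ParseError` when there is nothing to read.
fn parse_header(input: &[u8]) -> Result<u8, ParseError> {
    input.first().copied().ok_or(ParseError("empty input"))
}

// Fails with an `ElfError` when the first byte is not 0x7f.
fn check_magic(byte: u8) -> Result<(), ElfError> {
    if byte == 0x7f {
        Ok(())
    } else {
        Err(ElfError("bad magic"))
    }
}

// Each `?` converts its own error type into `Error` through `From`.
fn run(input: &[u8]) -> Result<(), Error> {
    let first = parse_header(input)?;
    check_magic(first)?;
    Ok(())
}

fn main() {
    println!("{:?}", run(&[0x7f]));
    println!("{:?}", run(&[]));
    println!("{:?}", run(&[0x00]));
}
```

Two different error types, one return type, and no explicit `map_err` anywhere.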
And we just need to use it from main.rs:
Rust code
// in `crates/minipak/src/main.rs`

mod error;
use error::Error;
Does everything still work? Yes?
Shell session
$ cargo run --quiet --release --bin minipak -- ~/go/bin/hugo -o /tmp/hugo.pak
Packing guest "/home/amos/go/bin/hugo"
Guest hull: 400000..3180968
Yes. Good.
Next up: some basic tests. We expect stage1 to be relocatable, so let's check
that expectation. If it is relocatable, then its convex hull starts at 0x0:
Rust code
// in `relink_stage1`

let hull = obj.segments().load_convex_hull()?;
assert_eq!(hull.start, 0, "stage1 must be relocatable");
Then we have a decision to make: where will our executable start? If we're
packing a relocatable executable, we can pick any base address! If we're packing
a non-relocatable executable, we have to pick their base address.
Rust code
// in `relink_stage1`

// Pick a base offset. If our guest is a relocatable executable, pick a
// random one, otherwise, pick theirs.
let base_offset = if guest_hull.start == 0 {
    0x800000 // by fair dice roll
} else {
    guest_hull.start
};
println!("Picked base_offset 0x{:x}", base_offset);

let hull = (hull.start + base_offset)..(hull.end + base_offset);
println!("Stage1 hull: {:x?}", hull);
println!(" Guest hull: {:x?}", guest_hull);
This alone should give us some interesting output:
Shell session
$ cargo run --quiet --release --bin minipak -- ~/go/bin/hugo -o /tmp/hugo.pak
Packing guest "/home/amos/go/bin/hugo"
Picked base_offset 0x400000
Stage1 hull: 400000..40b048
Guest hull: 400000..3180968
Cool! hugo is not relocatable, so we will be mapping stage1 starting from
the same base address.
If we try to pack ls however, which is relocatable:
Shell session
$ cargo run --quiet --release --bin minipak -- /bin/ls -o /tmp/gcc.pak
Packing guest "/bin/ls"
Picked base_offset 0x800000
Stage1 hull: 800000..80b048
Guest hull: 0..24558
Then we pick 0x800000 as a base address.
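If you'd like to poke at that decision in isolation, here's a small standalone sketch. The function names `pick_base_offset` and `shift_hull` and the `DEFAULT_BASE` constant are my own for this example (in minipak the logic lives inline in relink_stage1); the numbers are the ones from the two runs above:

```rust
use std::ops::Range;

/// Base used for relocatable guests (same "fair dice roll" value as above).
const DEFAULT_BASE: u64 = 0x800000;

/// A hull starting at 0 means the guest is relocatable, so any base works.
/// Otherwise we reuse the guest's own base address.
fn pick_base_offset(guest_hull: &Range<u64>) -> u64 {
    if guest_hull.start == 0 {
        DEFAULT_BASE
    } else {
        guest_hull.start
    }
}

/// Moves a hull that starts at 0 to the chosen base.
fn shift_hull(hull: &Range<u64>, base_offset: u64) -> Range<u64> {
    (hull.start + base_offset)..(hull.end + base_offset)
}

fn main() {
    let stage1: Range<u64> = 0x0..0xb048;

    let hugo: Range<u64> = 0x400000..0x3180968;
    let base = pick_base_offset(&hugo);
    println!("hugo: base 0x{:x}, stage1 {:x?}", base, shift_hull(&stage1, base));

    let ls: Range<u64> = 0x0..0x24558;
    let base = pick_base_offset(&ls);
    println!("ls:   base 0x{:x}, stage1 {:x?}", base, shift_hull(&stage1, base));
}
```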
Alright, cool. Next up, we're going to proceed as if we wanted to load stage1
as a library.
It's an ELF object, so it has segments, and so it can be mapped:
Rust code
let mut mapped = MappedObject::new(&obj, None)?;
println!("Loaded stage1");
And then we should relocate it.
Wait, relocate it? Like we did in parts uhhh... in the previous parts?
Exactly! But we'll only care about four relocation types: 64, GlobDat,
JumpSlot, and Relative.
Because that's all we have in our library:
Shell session
$ readelf -Wr ./target/debug/libstage1.so | grep -oE 'R_X86\w+' | sort -u
R_X86_64_GLOB_DAT
R_X86_64_JUMP_SLOT
R_X86_64_RELATIVE
Okay, so 64 hasn't shown up yet, but let's handle it anyway.
Alright!
Relocating is ELF business, and we do our ELF business in the pixie crate.
Let's add a relocate method to MappedObject:
Rust code
// in `crates/pixie/src/lib.rs`

use alloc::boxed::Box;

impl<'a> MappedObject<'a> {
    /// Apply relocations with the given base offset
    pub fn relocate(&mut self, base_offset: u64) -> Result<(), PixieError> {
        if !self.is_relocatable() {
            return Err(PixieError::CannotRelocateNonRelocatableObject);
        }

        let dyn_entries = self.object.read_dynamic_entries()?;
        let syms = dyn_entries.syms()?;

        let relas = dyn_entries
            .find(DynamicTagType::Rela)?
            .parse_all(dyn_entries.find(DynamicTagType::RelaSz)?);
        let plt_relas: Box<dyn Iterator<Item = _>> = match dyn_entries.find(DynamicTagType::JmpRel)
        {
            Ok(jmprel) => Box::new(jmprel.parse_all(dyn_entries.find(DynamicTagType::PltRelSz)?)),
            Err(_) => Box::new(core::iter::empty()) as _,
        };

        for rela in relas.chain(plt_relas) {
            let rela = rela?;
            self.apply_rela(&syms, &rela, base_offset)?;
        }

        Ok(())
    }
}
Okay, uh, we jumped a few steps: there are a lot of symbols we haven't defined
yet in here.
We need a new error variant:
Rust code
// in `crates/pixie/src/lib.rs`

#[derive(displaydoc::Display, Debug)]
/// A pixie error
pub enum PixieError {
    /// `{0}`
    Deku(DekuError),
    /// `{0}`
    Encore(EncoreError),
    /// no segments found
    NoSegmentsFound,
    /// could not find segment of type `{0:?}`
    SegmentNotFound(SegmentType),
    /// cannot map non-relocatable object at fixed position
    CannotMapNonRelocatableObjectAtFixedPosition,

    // 👇 new!
    /// cannot relocate non-relocatable object
    CannotRelocateNonRelocatableObject,
}
And then we need to teach pixie about the kind of entries contained in the
"Dynamic" segment.
Rust code
// in `crates/pixie/src/format/dynamic.rs`

use super::prelude::*;

#[derive(Debug, Clone, DekuRead, DekuWrite)]
pub struct DynamicTag {
    pub typ: DynamicTagType,
    pub addr: u64,
}

#[derive(Debug, DekuRead, DekuWrite, Clone, Copy, PartialEq)]
#[deku(type = "u64")]
pub enum DynamicTagType {
    #[deku(id = "0")]
    Null,
    #[deku(id = "2")]
    PltRelSz,
    #[deku(id = "5")]
    StrTab,
    #[deku(id = "6")]
    SymTab,
    #[deku(id = "7")]
    Rela,
    #[deku(id = "8")]
    RelaSz,
    #[deku(id = "11")]
    SymEnt,
    #[deku(id = "23")]
    JmpRel,
    #[deku(id_pat = "_")]
    Other(u64),
}
That's a new module of pixie::format, so we need to declare it:
Rust code
// in `crates/pixie/src/format/mod.rs`

mod dynamic;
pub use dynamic::*;
Then, we'll need to implement Object::read_dynamic_entries.
Rust code
// in `crates/pixie/src/lib.rs`

impl<'a> Object<'a> {
    /// Read all dynamic entries
    pub fn read_dynamic_entries(&self) -> Result<DynamicEntries<'a>, PixieError> {
        let dyn_seg = self.segments.find(SegmentType::Dynamic)?;
        let mut entries = DynamicEntries::default();
        let mut input = (dyn_seg.slice(), 0);

        loop {
            let (rest, tag) = DynamicTag::from_bytes(input)?;
            if tag.typ == DynamicTagType::Null {
                break;
            }
            entries.items.push(DynamicEntry {
                tag,
                full_slice: &self.slice,
            });
            input = rest;
        }

        Ok(entries)
    }
}
Which returns a type DynamicEntries, very similar to the Segments type we
made before. It just has a bunch of utility methods:
Rust code
// in `crates/pixie/src/lib.rs`

/// Entries in the `DYNAMIC` segment.
#[derive(Default)]
pub struct DynamicEntries<'a> {
    items: Vec<DynamicEntry<'a>>,
}

impl<'a> DynamicEntries<'a> {
    /// Returns a slice of all entries
    pub fn all(&self) -> &[DynamicEntry<'a>] {
        &self.items
    }

    /// Iterates over all entries of a given type
    pub fn of_type(&self, typ: DynamicTagType) -> impl Iterator<Item = &DynamicEntry<'a>> {
        self.items.iter().filter(move |entry| entry.typ() == typ)
    }

    /// Finds the first entry of a given type
    pub fn find(&self, typ: DynamicTagType) -> Result<&DynamicEntry<'a>, PixieError> {
        self.of_type(typ)
            .next()
            .ok_or(PixieError::DynamicEntryNotFound(typ))
    }

    /// Constructs an instance of `Syms`. Requires the presence of the `SymTab`,
    /// `SymEnt` and `StrTab` dynamic entries.
    pub fn syms(&'a self) -> Result<Syms<'a>, PixieError> {
        Ok(Syms {
            symtab: self.find(DynamicTagType::SymTab)?,
            syment: self.find(DynamicTagType::SymEnt)?,
            strtab: self.find(DynamicTagType::StrTab)?,
        })
    }
}
This brings a new error variant — when we can't find a dynamic entry of the
requested type:
Rust code
#[derive(displaydoc::Display, Debug)]
/// A pixie error
pub enum PixieError {
    // (cut)

    /// could not find dynamic entry of type `{0:?}`
    DynamicEntryNotFound(DynamicTagType),
}
The DynamicEntry type holds both a "dynamic tag", and the corresponding data
(which is always more or less a u64, but can be a number, an address, or
something else still):
Rust code
// in `crates/pixie/src/lib.rs`

/// An entry in the `DYNAMIC` section
pub struct DynamicEntry<'a> {
    /// The dynamic tag as read from the `DYNAMIC` section
    tag: DynamicTag,
    /// A slice of the full ELF object
    full_slice: &'a [u8],
}

impl<'a> DynamicEntry<'a> {
    /// Returns the type of this dynamic entry
    pub fn typ(&self) -> DynamicTagType {
        self.tag.typ
    }

    /// Returns a slice of the full file starting with this entry interpreted as
    /// an offset.
    pub fn as_slice(&self) -> &'a [u8] {
        &self.full_slice[self.as_usize()..]
    }

    /// Returns this entry's value as a `usize`
    pub fn as_usize(&self) -> usize {
        self.as_u64() as usize
    }

    /// Returns this entry's value as a `u64`
    pub fn as_u64(&self) -> u64 {
        self.tag.addr
    }

    /// Parses several `T` records, using `self` as the start of the input, and
    /// `len` as the total length of the input.
    pub fn parse_all<T>(
        &self,
        len: &DynamicEntry<'a>,
    ) -> impl Iterator<Item = Result<T, PixieError>> + 'a
    where
        T: DekuContainerRead<'a>,
    {
        let slice = &self.as_slice()[..len.as_usize()];
        let mut input = (slice, 0);

        core::iter::from_fn(move || -> Option<Result<T, PixieError>> {
            if input.0.is_empty() {
                return None;
            }

            let (rest, t) = match T::from_bytes(input) {
                Ok(x) => x,
                Err(e) => return Some(Err(e.into())),
            };
            input = rest;
            Some(Ok(t))
        })
    }

    /// Parses the nth `T` record, using `self` as the start of the input, and
    /// `record_len` as the record length.
    pub fn parse_nth<T>(&self, record_len: &DynamicEntry<'a>, n: usize) -> Result<T, DekuError>
    where
        T: DekuContainerRead<'a>,
    {
        let slice = &self.as_slice()[(record_len.as_usize() * n)..];
        let input = (slice, 0);
        let (_, t) = T::from_bytes(input)?;
        Ok(t)
    }
}
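If the "tag plus u64" layout feels abstract, here's a standalone sketch that walks a dynamic table by hand, with no deku involved. It assumes little-endian, 16-byte entries (an 8-byte tag followed by an 8-byte value), which is the ELF64 layout on x86_64. The `read_dynamic` and `encode` helpers are names invented for this example, and the sample values are only illustrative:

```rust
/// Tag value that terminates the dynamic table (`DT_NULL`).
const DT_NULL: u64 = 0;

/// Reads little-endian `(tag, value)` pairs until a `DT_NULL` tag is found
/// or fewer than 16 bytes of input remain.
fn read_dynamic(mut input: &[u8]) -> Vec<(u64, u64)> {
    let mut entries = Vec::new();
    while input.len() >= 16 {
        let mut tag = [0u8; 8];
        let mut val = [0u8; 8];
        tag.copy_from_slice(&input[..8]);
        val.copy_from_slice(&input[8..16]);

        let tag = u64::from_le_bytes(tag);
        if tag == DT_NULL {
            break;
        }
        entries.push((tag, u64::from_le_bytes(val)));
        input = &input[16..];
    }
    entries
}

/// Test helper: serializes `(tag, value)` pairs the same way.
fn encode(entries: &[(u64, u64)]) -> Vec<u8> {
    let mut bytes = Vec::new();
    for &(tag, val) in entries {
        bytes.extend_from_slice(&tag.to_le_bytes());
        bytes.extend_from_slice(&val.to_le_bytes());
    }
    bytes
}

fn main() {
    // 7 is `Rela`, 8 is `RelaSz`, 0 is `Null`; anything after `Null` is ignored.
    let bytes = encode(&[(7, 0x490), (8, 0xbb8), (0, 0), (5, 1)]);
    println!("{:x?}", read_dynamic(&bytes));
}
```

This is the same loop as read_dynamic_entries, just without the nice types on top.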
Then, we have Syms, which allows looking up symbol names, using the symtab,
syment, and strtab dynamic entries:
Rust code
// in `crates/pixie/src/lib.rs`

/// Allows reading symbols out of an ELF file
pub struct Syms<'a> {
    /// Indicates the start of the symbol table
    symtab: &'a DynamicEntry<'a>,
    /// Indicates the size of a symbol entry
    syment: &'a DynamicEntry<'a>,
    /// Indicates the start of the string table
    strtab: &'a DynamicEntry<'a>,
}

impl<'a> Syms<'a> {
    /// Read the nth symbol
    pub fn nth(&self, n: usize) -> Result<(Sym, &'a str), PixieError> {
        let sym: Sym = self.symtab.parse_nth(&self.syment, n)?;
        let name = unsafe { self.strtab.as_slice().as_ptr().add(sym.name as _).cstr() };
        Ok((sym, name))
    }

    /// Find a symbol by name. Will end up panicking if the symbol
    /// is not found!
    pub fn by_name(&self, name: &str) -> Result<Sym, PixieError> {
        let mut i = 0;
        loop {
            let (sym, sym_name) = self.nth(i)?;
            if sym_name == name {
                return Ok(sym);
            }
            i += 1;
        }
    }
}
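The name lookup in nth goes through a raw pointer, which is fine when we trust our own library. For comparison, here's a bounds-checked sketch of the same idea: a symbol's name field is a byte offset into the string table, and the name runs until the next NUL byte. `name_at` is a hypothetical helper written for this example, not something pixie or encore provides:

```rust
/// Returns the NUL-terminated string starting at `offset` in `strtab`.
/// Returns `None` if the offset is out of range, if there is no terminator,
/// or if the bytes are not valid UTF-8.
fn name_at(strtab: &[u8], offset: usize) -> Option<&str> {
    let tail = strtab.get(offset..)?;
    let len = tail.iter().position(|&b| b == 0)?;
    std::str::from_utf8(&tail[..len]).ok()
}

fn main() {
    // String tables start with a NUL byte, so offset 0 is the empty name.
    let strtab = b"\0entry\0i_do_not_exist\0";
    println!("{:?}", name_at(strtab, 0));
    println!("{:?}", name_at(strtab, 1));
    println!("{:?}", name_at(strtab, 7));
}
```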
Sym is also its own type:
Rust code
// in `crates/pixie/src/format/mod.rs`

mod sym;
pub use sym::*;
Rust code
// in `crates/pixie/src/format/sym.rs`

use super::prelude::*;

#[derive(Debug, DekuRead, DekuWrite, Clone)]
pub struct Sym {
    pub name: u32,

    pub bind: SymBind,
    #[deku(pad_bytes_after = "1")]
    pub typ: SymType,

    pub shndx: u16,
    pub value: u64,
    pub size: u64,
}

#[derive(Debug, DekuRead, DekuWrite, Clone, Copy, PartialEq)]
#[deku(type = "u8", bits = 4)]
pub enum SymBind {
    #[deku(id = "0")]
    Local,
    #[deku(id = "1")]
    Global,
    #[deku(id = "2")]
    Weak,
    #[deku(id_pat = "_")]
    Other(u8),
}

#[derive(Debug, DekuRead, DekuWrite, Clone, Copy, PartialEq)]
#[deku(type = "u8", bits = 4)]
pub enum SymType {
    #[deku(id = "0")]
    None,
    #[deku(id = "1")]
    Object,
    #[deku(id = "2")]
    Func,
    #[deku(id = "3")]
    Section,
    #[deku(id = "4")]
    File,
    #[deku(id = "6")]
    Tls,
    #[deku(id = "10")]
    IFunc,
    #[deku(id_pat = "_")]
    Other(u8),
}
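Those two 4-bit fields are really one byte in the ELF symbol record: the high nibble is the binding, the low nibble is the type. As far as I can tell deku reads bit fields most-significant-bit first, which would be why `bind` is declared before `typ`, but treat that as my reading rather than gospel. Done by hand, the split looks like this (`split_info` is a name made up for the example):

```rust
/// Splits an ELF symbol info byte into `(bind, type)`:
/// the binding is the high nibble, the type is the low nibble.
fn split_info(info: u8) -> (u8, u8) {
    (info >> 4, info & 0x0f)
}

fn main() {
    // 0x12: bind 1 (`Global`), type 2 (`Func`)
    println!("{:?}", split_info(0x12));
    // 0x2a: bind 2 (`Weak`), type 10 (`IFunc`)
    println!("{:?}", split_info(0x2a));
}
```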
Whoa, whoahey, that's a lot of code, isn't it?
Yeah, but it's not that bad, is it? We're just making some nice abstractions, as
usual, and using what deku gives us to parse symbols easily.
Now that we have all that, we can focus on actually applying relocations.
And, remember how we very carefully handled each relocation type differently,
making sure to apply the formula from the SysV ABI?
Yeah well, not this time.
Rust code
// in `crates/pixie/src/lib.rs`

impl<'a> MappedObject<'a> {
    /// Apply a single relocation
    fn apply_rela(&mut self, syms: &Syms, rela: &Rela, base_offset: u64) -> Result<(), PixieError> {
        match rela.typ {
            RelType::_64 | RelType::GlobDat | RelType::JumpSlot | RelType::Relative => {
                // we support these
            }
            _ => {
                return Err(PixieError::UnsupportedRela(rela.clone()));
            }
        }

        // some relocations don't use symbols, we'll just use the 0th symbol
        // for them, which is fine.
        let (sym, _) = syms.nth(rela.sym as _)?;
        let value = base_offset + sym.value + rela.addend;

        let mem_offset = self.vaddr_to_mem_offset(rela.offset);
        unsafe {
            let target = self.mem.as_ptr().add(mem_offset) as *mut u64;
            *target = value;
        }

        Ok(())
    }
}
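The arithmetic in apply_rela can be tried on a plain byte buffer, no mapping required. This is a sketch under a couple of assumptions of mine: the target is little-endian, and I use `wrapping_add` instead of `+` so that an addend stored as a large unsigned value wouldn't trip an overflow check in debug builds. `apply` is a hypothetical free function, not part of pixie:

```rust
/// Writes `base_offset + sym_value + addend` as a little-endian `u64`
/// at `mem_offset` in `mem`. Panics if the 8 bytes don't fit.
fn apply(mem: &mut [u8], mem_offset: usize, base_offset: u64, sym_value: u64, addend: u64) {
    let value = base_offset.wrapping_add(sym_value).wrapping_add(addend);
    mem[mem_offset..mem_offset + 8].copy_from_slice(&value.to_le_bytes());
}

fn main() {
    let mut mem = vec![0u8; 16];

    // A `Relative` relocation: symbol 0 (value 0), addend 0x28e0, base 0x400000.
    apply(&mut mem, 0, 0x400000, 0, 0x28e0);
    // Same kind of relocation with a different base.
    apply(&mut mem, 8, 0x800000, 0, 0x2890);

    println!("{:02x?}", &mem[..8]);
    println!("{:02x?}", &mem[8..]);
}
```

The first eight bytes come out as `e0 28 40 00 00 00 00 00`, i.e. 0x4028e0 in little-endian.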
Turns out: we can compute them all the same way! It's literally always just
"base_offset + symbol value + addend". Some relocations refer to the 0th symbol,
which has a value of 0, and some relocations don't have an addend, so the addend
"field" is 0, and it all ends up being correct.
There are still two missing pieces: yet another error variant, in case we
encounter a relocation type we do not support (which should never happen):
Rust code
// in `crates/pixie/src/lib.rs`

#[derive(displaydoc::Display, Debug)]
/// A pixie error
pub enum PixieError {
    // (cut)

    /// unsupported relocation type `{0:?}`
    UnsupportedRela(Rela),
}
And of course, the Rela and RelType types, which are also part of the ELF
format:
Rust code
// in `crates/pixie/src/format/mod.rs`

mod rela;
pub use rela::*;
Rust code
// in `crates/pixie/src/format/rela.rs`

use super::prelude::*;

#[derive(Debug, DekuRead, DekuWrite, Clone)]
pub struct Rela {
    pub offset: u64,
    pub typ: RelType,
    pub sym: u32,
    pub addend: u64,
}

#[derive(Debug, DekuRead, DekuWrite, Clone, Copy, PartialEq)]
#[deku(type = "u32")]
pub enum RelType {
    #[deku(id = "0")]
    Null,
    #[deku(id = "1")]
    _64,
    #[deku(id = "6")]
    GlobDat,
    #[deku(id = "7")]
    JumpSlot,
    #[deku(id = "8")]
    Relative,
    #[deku(id = "16")]
    DtpMod64,
    #[deku(id_pat = "_")]
    Other(u32),
}
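A note on `typ` and `sym`: in the ELF64 format they're packed into a single 64-bit "info" field, with the symbol index in the high 32 bits and the relocation type in the low 32 bits. Reading two little-endian u32 values in a row, type first, gives the same result. Here's the split done by hand (`split_r_info` is a name made up for this sketch):

```rust
/// Splits a 64-bit relocation info field into `(sym, typ)`:
/// the symbol index is the high half, the relocation type the low half.
fn split_r_info(info: u64) -> (u32, u32) {
    ((info >> 32) as u32, info as u32)
}

fn main() {
    // Info 0x8: no symbol, type 8 (`Relative`)
    println!("{:?}", split_r_info(0x8));
    // Symbol 3, type 7 (`JumpSlot`)
    println!("{:?}", split_r_info(0x0000_0003_0000_0007));
}
```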
And with all that, we can relocate stage1.
To get an idea of the result, let's write the relocated version of stage1
directly to the output:
Rust code
// in `crates/minipak/src/main.rs`

fn relink_stage1(guest_hull: Range<u64>, writer: &mut Writer) -> Result<(), Error> {
    let obj = Object::new(include_bytes!(concat!(
        env!("OUT_DIR"),
        "/embeds/libstage1.so"
    )))?;

    let hull = obj.segments().load_convex_hull()?;
    assert_eq!(hull.start, 0, "stage1 must be relocatable");

    // Pick a base offset. If our guest is a relocatable executable, pick a
    // random one, otherwise, pick theirs.
    let base_offset = if guest_hull.start == 0 {
        0x800000 // by fair dice roll
    } else {
        guest_hull.start
    };
    println!("Picked base_offset 0x{:x}", base_offset);

    let hull = (hull.start + base_offset)..(hull.end + base_offset);
    println!("Stage1 hull: {:x?}", hull);
    println!(" Guest hull: {:x?}", guest_hull);

    // Map stage1 wherever...
    let mut mapped = MappedObject::new(&obj, None)?;
    println!("Loaded stage1");

    // 👇 new code's here

    // ...but relocate it as if it was mapped at `base_offset`
    mapped.relocate(base_offset)?;
    println!("Relocated stage1");

    // Dump the relocated version of the executable segment to disk, for comparison:
    let exec_segment = mapped.vaddr_slice(
        obj.segments()
            .of_type(pixie::SegmentType::Load)
            .find(|x| x.header().flags == (ProgramHeader::EXECUTE | ProgramHeader::READ))
            .unwrap()
            .header()
            .mem_range(),
    );
    writer.write_all(exec_segment)?;

    Ok(())
}
Shell session
$ cargo run --quiet --release --bin minipak -- ~/go/bin/hugo -o /tmp/hugo.pak
Packing guest "/home/amos/go/bin/hugo"
Picked base_offset 0x400000
Stage1 hull: 400000..40b048
Guest hull: 400000..3180968
Loaded stage1
Relocated stage1
Okay! Now let's try to compare the non-relocated and the relocated version of
the first segment. First we need to extract just the right part of the
non-relocated libstage1.so:
Shell session
$ readelf -Wl ./target/release/build/minipak-51b667ed4cbdb6ec/out/embeds/libstage1.so | grep -E 'MemSiz|LOAD'
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x001060 0x001060 R 0x1000
LOAD 0x002000 0x0000000000002000 0x0000000000002000 0x004b0d 0x004b0d R E 0x1000
LOAD 0x007000 0x0000000000007000 0x0000000000007000 0x001b88 0x001b88 R 0x1000
LOAD 0x009750 0x000000000000a750 0x000000000000a750 0x0008c0 0x0008f8 RW 0x1000
$ dd if=./target/release/build/minipak-51b667ed4cbdb6ec/out/embeds/libstage1.so of=/tmp/unrelocated bs=1 skip=$((0x002000)) count=$((0x004b0d))
19213+0 records in
19213+0 records out
19213 bytes (19 kB, 19 KiB) copied, 0.0276595 s, 695 kB/s
Now, if my calculations are correct, the first few bytes should be the same:
Shell session
$ xxd /tmp/unrelocated | head -3
00000000: f30f 1efa 4883 ec08 488b 05e9 8f00 0048 ....H...H......H
00000010: 85c0 7402 ffd0 4883 c408 c300 0000 0000 ..t...H.........
00000020: ff35 c28e 0000 ff25 c48e 0000 0f1f 4000 .5.....%......@.
$ xxd /tmp/hugo.pak | head -3
00000000: f30f 1efa 4883 ec08 488b 05e9 8f00 0048 ....H...H......H
00000010: 85c0 7402 ffd0 4883 c408 c300 0000 0000 ..t...H.........
00000020: ff35 c28e 0000 ff25 c48e 0000 0f1f 4000 .5.....%......@.
Yeah, yes! That looks similar.
Let's find the differences, shall we?
Shell session
$ diff <(xxd /tmp/unrelocated) <(xxd /tmp/hugo.pak)
$
Huh. No output? They're the same? Did our program just... do nothing?
Maybe there's no relocations in the executable segment?
Ohhhhhhhh, right! It probably uses rip-relative addressing to avoid any
relocations touching the executable segment, so that it can be shared across
multiple loads of the same dynamic library.
We've seen that in Part 9, uh, over a year ago.
So then, where are relocations?
Shell session
$ readelf -Wr ./target/release/build/minipak-51b667ed4cbdb6ec/out/embeds/libstage1.so | head
Relocation section '.rela.dyn' at offset 0x490 contains 125 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
👇
000000000000a750 0000000000000008 R_X86_64_RELATIVE 28e0
000000000000a758 0000000000000008 R_X86_64_RELATIVE 2890
000000000000a760 0000000000000008 R_X86_64_RELATIVE 2900
000000000000a778 0000000000000008 R_X86_64_RELATIVE 2a90
000000000000a780 0000000000000008 R_X86_64_RELATIVE 2910
000000000000a788 0000000000000008 R_X86_64_RELATIVE 2a40
000000000000a790 0000000000000008 R_X86_64_RELATIVE 7000
$ readelf -Wl ./target/release/build/minipak-51b667ed4cbdb6ec/out/embeds/libstage1.so | grep -E "MemSiz|LOAD"
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x001060 0x001060 R 0x1000
LOAD 0x002000 0x0000000000002000 0x0000000000002000 0x004b0d 0x004b0d R E 0x1000
LOAD 0x007000 0x0000000000007000 0x0000000000007000 0x001b88 0x001b88 R 0x1000
👇
LOAD 0x009750 0x000000000000a750 0x000000000000a750 0x0008c0 0x0008f8 RW 0x1000
Ah! In the read-write segment.
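We can double-check that by machine instead of by squinting. The sketch below takes the four LOAD segments from the readelf output (VirtAddr up to VirtAddr + MemSiz) and asks which one contains a given address. `segment_containing` and `stage1_segments` are names invented for this example:

```rust
use std::ops::Range;

/// Returns the index of the first range that contains `vaddr`, if any.
fn segment_containing(segments: &[Range<u64>], vaddr: u64) -> Option<usize> {
    segments.iter().position(|seg| seg.contains(&vaddr))
}

/// The four LOAD segments of libstage1.so, as `VirtAddr..VirtAddr + MemSiz`.
fn stage1_segments() -> Vec<Range<u64>> {
    vec![
        0x0000..0x1060, // R
        0x2000..0x6b0d, // R E
        0x7000..0x8b88, // R
        0xa750..0xb048, // RW
    ]
}

fn main() {
    let segments = stage1_segments();
    // Where the first relocation gets written: the RW segment (index 3).
    println!("{:?}", segment_containing(&segments, 0xa750));
    // Where its addend points: the executable segment (index 1).
    println!("{:?}", segment_containing(&segments, 0x28e0));
}
```

So the relocations live in the read-write segment, and many of them point into code.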
Okay then:
Rust code
// in `relink_stage1`

let rw_segment = mapped.vaddr_slice(
    obj.segments()
        .of_type(pixie::SegmentType::Load)
        .find(|x| x.header().flags == (ProgramHeader::READ | ProgramHeader::WRITE))
        .unwrap()
        .header()
        .mem_range(),
);
writer.write_all(rw_segment)?;
Shell session
$ cargo run --quiet --release --bin minipak -- ~/go/bin/hugo -o /tmp/hugo.pak
Packing guest "/home/amos/go/bin/hugo"
Picked base_offset 0x400000
Stage1 hull: 400000..40b048
Guest hull: 400000..3180968
Loaded stage1
Relocated stage1
$ readelf -Wl ./target/release/build/minipak-51b667ed4cbdb6ec/out/embeds/libstage1.so | grep -E "MemSiz|LOAD"
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x001060 0x001060 R 0x1000
LOAD 0x002000 0x0000000000002000 0x0000000000002000 0x004b0d 0x004b0d R E 0x1000
LOAD 0x007000 0x0000000000007000 0x0000000000007000 0x001b88 0x001b88 R 0x1000
👇 👇
LOAD 0x009750 0x000000000000a750 0x000000000000a750 0x0008c0 0x0008f8 RW 0x1000
$ dd if=./target/release/build/minipak-51b667ed4cbdb6ec/out/embeds/libstage1.so of=/tmp/unrelocated bs=1 skip=$((0x009750)) count=$((0x0008c0))
2240+0 records in
2240+0 records out
2240 bytes (2.2 kB, 2.2 KiB) copied, 0.0038677 s, 579 kB/s
Let's diff again:
Shell session
$ diff <(xxd /tmp/unrelocated) <(xxd /tmp/hugo.pak) | head -14
1,6c1,6
👇
< 00000000: e028 0000 0000 0000 9028 0000 0000 0000 .(.......(......
< 00000010: 0029 0000 0000 0000 0800 0000 0000 0000 .)..............
< 00000020: 0800 0000 0000 0000 902a 0000 0000 0000 .........*......
< 00000030: 1029 0000 0000 0000 402a 0000 0000 0000 .)......@*......
< 00000040: 0070 0000 0000 0000 4b00 0000 0000 0000 .p......K.......
< 00000050: 5c01 0000 1300 0000 0029 0000 0000 0000 \........)......
---
👇
> 00000000: e028 4000 0000 0000 9028 4000 0000 0000 .(@......(@.....
> 00000010: 0029 4000 0000 0000 0800 0000 0000 0000 .)@.............
> 00000020: 0800 0000 0000 0000 902a 4000 0000 0000 .........*@.....
> 00000030: 1029 4000 0000 0000 402a 4000 0000 0000 .)@.....@*@.....
> 00000040: 0070 4000 0000 0000 4b00 0000 0000 0000 .p@.....K.......
> 00000050: 5c01 0000 1300 0000 0029 4000 0000 0000 \........)@.....
Ah, there we have it! A bunch of 0s that become 4s.
Is it because hugo has a base address of 0x400000? What would happen if we
operated on /bin/ls instead?
Well, let's try it:
Shell session
$ cargo run --quiet --release --bin minipak -- /bin/ls -o /tmp/ls.pak
Packing guest "/bin/ls"
Picked base_offset 0x800000
Stage1 hull: 800000..80b048
Guest hull: 0..24558
Loaded stage1
Relocated stage1
$ diff <(xxd /tmp/unrelocated) <(xxd /tmp/ls.pak) | head -14
1,6c1,6
👇
< 00000000: e028 0000 0000 0000 9028 0000 0000 0000 .(.......(......
< 00000010: 0029 0000 0000 0000 0800 0000 0000 0000 .)..............
< 00000020: 0800 0000 0000 0000 902a 0000 0000 0000 .........*......
< 00000030: 1029 0000 0000 0000 402a 0000 0000 0000 .)......@*......
< 00000040: 0070 0000 0000 0000 4b00 0000 0000 0000 .p......K.......
< 00000050: 5c01 0000 1300 0000 0029 0000 0000 0000 \........)......
---
👇
> 00000000: e028 8000 0000 0000 9028 8000 0000 0000 .(.......(......
> 00000010: 0029 8000 0000 0000 0800 0000 0000 0000 .)..............
> 00000020: 0800 0000 0000 0000 902a 8000 0000 0000 .........*......
> 00000030: 1029 8000 0000 0000 402a 8000 0000 0000 .)......@*......
> 00000040: 0070 8000 0000 0000 4b00 0000 0000 0000 .p......K.......
> 00000050: 5c01 0000 1300 0000 0029 8000 0000 0000 \........)......
Yup, sure enough! They're now 8.
Alright, well, there's no telling if our relocations are correct yet, but at
least there's definitely something being relocated.
Which means all we need to do now... is generate an ELF object that happens
to be an executable.
So, what, write a header?
Yes! And program headers, everything we need.
And it'll be easy! Because deku not only lets us read binary formats,
it also lets us write binary formats.
Let's go!
Rust code
fn relink_stage1(guest_hull: Range<u64>, writer: &mut Writer) -> Result<(), Error> {
    let obj = Object::new(include_bytes!(concat!(
        env!("OUT_DIR"),
        "/embeds/libstage1.so"
    )))?;

    let hull = obj.segments().load_convex_hull()?;
    assert_eq!(hull.start, 0, "stage1 must be relocatable");

    // Pick a base offset. If our guest is a relocatable executable, pick a
    // random one, otherwise, pick theirs.
    let base_offset = if guest_hull.start == 0 {
        0x800000 // by fair dice roll
    } else {
        guest_hull.start
    };
    println!("Picked base_offset 0x{:x}", base_offset);

    let hull = (hull.start + base_offset)..(hull.end + base_offset);
    println!("Stage1 hull: {:x?}", hull);
    println!(" Guest hull: {:x?}", guest_hull);

    // Map stage1 wherever...
    let mut mapped = MappedObject::new(&obj, None)?;
    println!("Loaded stage1");

    // ...but relocate it as if it was mapped at `base_offset`
    mapped.relocate(base_offset)?;
    println!("Relocated stage1");

    println!("Looking for `entry` in stage1...");
    let entry_sym = mapped.lookup_sym("entry")?;
    let entry_point = base_offset + entry_sym.value;

    // Collect all the load segments
    let mut load_segs = obj
        .segments()
        .of_type(SegmentType::Load)
        .collect::<Vec<_>>();

    // Now write out some ELF!
    let out_header = ObjectHeader {
        class: pixie::ElfClass::Elf64,
        endianness: Endianness::Little,
        version: 1,
        os_abi: OsAbi::SysV,
        typ: ElfType::Exec,
        machine: ElfMachine::X86_64,
        version_bis: 1,
        entry_point,

        flags: 0,
        hdr_size: ObjectHeader::SIZE,
        // Two additional segments: one for `brk` alignment, and GNU_STACK.
        ph_count: load_segs.len() as u16 + 2,
        ph_offset: ObjectHeader::SIZE as _,
        ph_entsize: ProgramHeader::SIZE,
        // We're not adding any sections, our object will be opaque to debuggers
        sh_count: 0,
        sh_entsize: 0,
        sh_nidx: 0,
        sh_offset: 0,
    };
    writer.write_deku(&out_header)?;

    let static_headers = load_segs.iter().map(|seg| {
        let mut ph = seg.header().clone();
        ph.vaddr += base_offset;
        ph.paddr += base_offset;
        ph
    });
    for ph in static_headers {
        writer.write_deku(&ph)?;
    }

    // Insert dummy segment to offset the `brk` to its original position
    // for the guest, if we can.
    {
        let current_hull = align_hull(hull);
        let desired_hull = align_hull(guest_hull);

        let pad_size = if current_hull.end > desired_hull.end {
            println!("WARNING: Guest executable is too small, the `brk` will be wrong.");
            0x0
        } else {
            desired_hull.end - current_hull.end
        };

        let ph = ProgramHeader {
            paddr: current_hull.end,
            vaddr: current_hull.end,
            memsz: pad_size,
            filesz: 0,
            offset: 0,
            align: 0x1000,
            typ: SegmentType::Load,
            flags: ProgramHeader::WRITE | ProgramHeader::READ,
        };
        writer.write_deku(&ph)?;
    }

    // Add a GNU_STACK program header for alignment and to make it
    // non-executable.
    {
        let ph = ProgramHeader {
            paddr: 0,
            vaddr: 0,
            memsz: 0,
            filesz: 0,
            offset: 0,
            align: 0x10,
            typ: SegmentType::GnuStack,
            flags: ProgramHeader::WRITE | ProgramHeader::READ,
        };
        writer.write_deku(&ph)?;
    }

    // Sort load segments by file offset and copy them.
    {
        load_segs.sort_by_key(|&seg| seg.header().offset);

        println!("Copying stage1 segments...");
        let copy_start_offset = writer.offset();
        println!("copy_start_offset = 0x{:x}", copy_start_offset);
        let copied_segments = load_segs
            .into_iter()
            .filter(move |seg| seg.header().offset > copy_start_offset);

        for cp_seg in copied_segments {
            let ph = cp_seg.header();
            println!("copying {:?}", ph);

            // Pad space between segments with zeros:
            writer.pad(ph.offset - writer.offset())?;

            // Then copy.
            let start = ph.vaddr;
            let len = ph.filesz;
            let end = start + len;

            writer.write_all(mapped.vaddr_slice(start..end))?;
        }
    }

    // Pad end of last segment with zeros:
    writer.align(0x1000)?;

    Ok(())
}
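A bit of header arithmetic helps when reading that function. In ELF64, the file header is 64 bytes and each program header is 56 bytes; I'm assuming `ObjectHeader::SIZE` and `ProgramHeader::SIZE` hold those two values. With that, we can predict where the headers end, and which segments the "copy" step will keep. The helper names below are made up for this sketch:

```rust
/// Size of an ELF64 file header, in bytes.
const EHDR_SIZE: u64 = 64;
/// Size of an ELF64 program header, in bytes.
const PHDR_SIZE: u64 = 56;

/// File offset right after the headers, given the number of load segments.
/// Two extra program headers are written: the `brk` padding and GNU_STACK.
fn headers_end(load_segments: u64) -> u64 {
    EHDR_SIZE + PHDR_SIZE * (load_segments + 2)
}

/// Keeps only the segments whose file offset lies past the headers.
fn segments_to_copy(offsets: &[u64], copy_start: u64) -> Vec<u64> {
    offsets.iter().copied().filter(|&o| o > copy_start).collect()
}

fn main() {
    // libstage1.so has four load segments.
    let start = headers_end(4);
    println!("headers end at 0x{:x}", start);

    // The segment at file offset 0 overlaps the headers, so it is skipped.
    let kept = segments_to_copy(&[0x0, 0x2000, 0x7000, 0x9750], start);
    println!("{:x?}", kept);
}
```

With four load segments, that's 64 + 56 × 6 = 400 bytes of headers, or 0x190.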
We've used a couple of helper functions, so let's define them now. First,
align_hull:
Rust code
// in `crates/pixie/src/lib.rs`

/// Align *down* to the nearest 4K boundary
pub fn floor(val: u64) -> u64 {
    val & !0xFFF
}

/// Align *up* to the nearest 4K boundary
pub fn ceil(val: u64) -> u64 {
    if floor(val) == val {
        val
    } else {
        floor(val + 0x1000)
    }
}

/// Given a convex hull, align its start *down* to the nearest 4K boundary and
/// its end *up* to the nearest 4K boundary
pub fn align_hull(hull: Range<u64>) -> Range<u64> {
    floor(hull.start)..ceil(hull.end)
}
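To see what the padding segment ends up looking like, here's a standalone sketch that plugs in the hulls we've been printing. It is not a copy of the pixie code: `ceil` is written in the equivalent "add 0xFFF, then round down" form, and `pad_size` is a hypothetical helper that uses `saturating_sub` to express "zero if stage1 already extends past the guest":

```rust
use std::ops::Range;

/// Align *down* to the nearest 4K boundary.
fn floor(val: u64) -> u64 {
    val & !0xFFF
}

/// Align *up* to the nearest 4K boundary (already-aligned values are kept).
fn ceil(val: u64) -> u64 {
    floor(val + 0xFFF)
}

/// Align the start of a hull down and its end up.
fn align_hull(hull: &Range<u64>) -> Range<u64> {
    floor(hull.start)..ceil(hull.end)
}

/// How much memory the padding segment needs so that the packed executable
/// ends where the guest would have ended. Zero when stage1 ends later.
fn pad_size(stage1_hull: &Range<u64>, guest_hull: &Range<u64>) -> u64 {
    align_hull(guest_hull)
        .end
        .saturating_sub(align_hull(stage1_hull).end)
}

fn main() {
    let stage1: Range<u64> = 0x400000..0x40b048;
    let hugo: Range<u64> = 0x400000..0x3180968;
    println!("aligned stage1: {:x?}", align_hull(&stage1));
    println!("aligned guest:  {:x?}", align_hull(&hugo));
    println!("pad size:       0x{:x}", pad_size(&stage1, &hugo));
}
```

For hugo, stage1 ends at 0x40c000 after alignment and the guest at 0x3181000, so the padding comes out to 0x2d75000 bytes.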
And then MappedObject::lookup_sym, which we use to find the address of entry
in libstage1.so. Luckily this one is trivially expressed using the abstractions
we've already carefully constructed:
Rust code
// in `crates/pixie/src/lib.rs`

impl<'a> MappedObject<'a> {
    /// Looks up a symbol by name. Its `value` is the (non-relocated) vaddr.
    pub fn lookup_sym(&self, name: &str) -> Result<Sym, PixieError> {
        let dyn_entries = self.object.read_dynamic_entries()?;
        dyn_entries.syms()?.by_name(name)
    }
}
And now, well... we should be generating a fully-relocated, statically linked
executable from libstage1.so.
Let's try it?
Shell session
$ cargo run --quiet --release --bin minipak -- ~/go/bin/hugo -o /tmp/hugo.pak
Packing guest "/home/amos/go/bin/hugo"
Picked base_offset 0x400000
Stage1 hull: 400000..40b048
Guest hull: 400000..3180968
Loaded stage1
Relocated stage1
Looking for `entry` in stage1...
Copying stage1 segments...
copy_start_offset = 0x190
copying ProgramHeader { typ: Load, flags: 0x5, offset: 0x2000, vaddr: 0x2000, paddr: 0x2000, filesz: 0x4b0d, memsz: 0x4b0d, align: 0x1000 }
copying ProgramHeader { typ: Load, flags: 0x4, offset: 0x7000, vaddr: 0x7000, paddr: 0x7000, filesz: 0x1b88, memsz: 0x1b88, align: 0x1000 }
copying ProgramHeader { typ: Load, flags: 0x6, offset: 0x9750, vaddr: 0xa750, paddr: 0xa750, filesz: 0x8c0, memsz: 0x8f8, align: 0x1000 }
$ /tmp/hugo.pak
[stage1] Stack top: 0x7fffa8187a40
Hurray!!
...and if we've done our job correctly, it should have a structure that's very
similar to the original guest executable:
Shell session
$ readelf -Wl ~/go/bin/hugo | grep -E 'LOAD|MemSiz'
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x172b4b0 0x172b4b0 R E 0x1000
LOAD 0x172c000 0x0000000001b2c000 0x0000000001b2c000 0x155ceb8 0x155ceb8 R 0x1000
LOAD 0x2c89000 0x0000000003089000 0x0000000003089000 0x0b08c0 0x0f7968 RW 0x1000
$ readelf -Wl /tmp/hugo.pak | grep -E 'LOAD|MemSiz'
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x001060 0x001060 R 0x1000
LOAD 0x002000 0x0000000000402000 0x0000000000402000 0x004b0d 0x004b0d R E 0x1000
LOAD 0x007000 0x0000000000407000 0x0000000000407000 0x001b88 0x001b88 R 0x1000
LOAD 0x009750 0x000000000040a750 0x000000000040a750 0x0008c0 0x0008f8 RW 0x1000
LOAD 0x000000 0x000000000040c000 0x000000000040c000 0x000000 0x2d75000 RW 0x1000
$ gdb -q -ex "set noconfirm" -ex "p/x 0x0000000003089000 + 0x0f7968" -ex "p/x 0x000000000040c000 + 0x2d75000" -ex "quit"
No symbol table is loaded. Use the "file" command.
$1 = 0x3180968
$2 = 0x3181000
Yes! After alignment, both executables end at the same address, and so their brk should be the same.
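The same check can be written down in Rust instead of asking gdb to do the addition. This is just a sketch: `Load`, `aligned_end`, `guest_loads` and `packed_loads` are names made up for the example, and the numbers are copied from the two readelf listings.

```rust
/// One `LOAD` line from `readelf -Wl`: start address and in-memory size.
/// Hypothetical type, for this example only.
pub struct Load {
    pub vaddr: u64,
    pub memsz: u64,
}

/// End of the highest segment, rounded up to the next 4K page.
pub fn aligned_end(loads: &[Load]) -> u64 {
    let end = loads.iter().map(|l| l.vaddr + l.memsz).max().unwrap_or(0);
    (end + 0xFFF) & !0xFFF
}

/// LOAD segments of the original hugo binary.
pub fn guest_loads() -> Vec<Load> {
    vec![
        Load { vaddr: 0x400000, memsz: 0x172b4b0 },
        Load { vaddr: 0x1b2c000, memsz: 0x155ceb8 },
        Load { vaddr: 0x3089000, memsz: 0x0f7968 },
    ]
}

/// LOAD segments of /tmp/hugo.pak.
pub fn packed_loads() -> Vec<Load> {
    vec![
        Load { vaddr: 0x400000, memsz: 0x001060 },
        Load { vaddr: 0x402000, memsz: 0x004b0d },
        Load { vaddr: 0x407000, memsz: 0x001b88 },
        Load { vaddr: 0x40a750, memsz: 0x0008f8 },
        Load { vaddr: 0x40c000, memsz: 0x2d75000 },
    ]
}
```

Both lists end at 0x3181000 once rounded up to a page, which is the property we care about here.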
Now remember, we cannot have stage1 directly load the guest. Well, right now, we're not even writing the compressed guest to our output file, so it's tiny:
Shell session
$ ls -lhA /tmp/hugo.pak
-rwxr-xr-x 1 amos amos 44K May 1 22:59 /tmp/hugo.pak
But still, we cannot have stage1 load the guest, because stage1 itself is mapped where the guest should be.
First, we need to map stage2 out of the way.
But where is "out of the way"?
Well, that's the beauty of it! The whole area where guest will eventually be is already mapped from our executable.
So any call to mmap (without the FIXED flag) will give us a region that's "out of the way": it won't overwrite an already-mapped region.
Well, right now we don't even have a stage2 to map, so, let's make one!
Shell session
$ (cd crates && cargo new --lib stage2)
warning: compiling this new package may not work due to invalid workspace configuration
current package believes it's in a workspace when it's not:
current: /home/amos/ftl/minipak/crates/stage2/Cargo.toml
workspace: /home/amos/ftl/minipak/Cargo.toml
this may be fixable by adding `crates/stage2` to the `workspace.members` array of the manifest located at: /home/amos/ftl/minipak/Cargo.toml
Alternatively, to keep it out of the workspace, add the package to the `workspace.exclude` array, or add an empty `[workspace]` table to the package's manifest.
Created library `stage2` package
Well, it's requested so politely:
TOML markup
# in the top-level `Cargo.toml`
[ workspace ]
members = [
"crates/encore" ,
"crates/pixie" ,
"crates/minipak" ,
"crates/stage1" ,
"crates/stage2" ,
]
Let's also add a build script:
Rust code
// in `crates/stage2/build.rs`
fn main ( ) {
println ! ( "cargo:rustc-link-arg=-Wl,-z,defs" ) ;
}
A dependency on encore, and setting the crate type to cdylib:
TOML markup
# in `crates/stage2/Cargo.toml`
[ lib ]
crate-type = [ "cdylib" ]
[ dependencies ]
encore = { path = "../encore" }
And, well, let's add an entry point to it too!
Rust code
// in `crates/stage2/src/lib.rs`
// Don't use libstd
#![ no_std]
// Allow inline assembly
#![ feature( asm) ]
// Allow naked (no-prelude) functions
#![ feature( naked_functions) ]
// Use the default allocation error handler
#![ feature( default_alloc_error_handler) ]
extern crate alloc;
use encore:: prelude:: * ;
macro_rules! info {
( $( $tokens: tt) * ) => {
println!( "[stage2] {}" , alloc::format!( $( $tokens) * ) ) ;
}
}
#[ no_mangle]
#[ inline( never) ]
/// # Safety
/// Does a raw syscall, initializes the global allocator
unsafe extern "C" fn entry ( stack_top : * mut u8 ) -> ! {
init_allocator ( ) ;
crate :: main ( stack_top)
}
/// # Safety
/// Maps and jmps to another ELF object
#[ inline( never) ]
unsafe fn main ( stack_top : * mut u8 ) -> ! {
info ! ( "Stack top: {:?}" , stack_top) ;
encore:: syscall:: exit ( 0 ) ;
}
Now, let's consider the chain of events: minipak generates its executable from the guest and stage1. So by the time the "packed executable" starts up, stage1 is already mapped.
stage2, however, must be mapped by stage1. So, it must be embedded into the "packed executable" as well.
Luckily, we made this next part very easy for ourselves.
First off, whenever we build minipak, we also want to build stage2, so let's add it to our build script:
Rust code
// in `crates/minipak/build.rs`
// omitted: other functions
fn main ( ) {
for & arg in & [ "-nostartfiles" , "-nodefaultlibs" , "-static" ] {
println ! ( "cargo:rustc-link-arg={}" , arg) ;
}
cargo_build ( & PathBuf:: from ( "../stage1" ) ) ;
// new! 👇
cargo_build ( & PathBuf:: from ( "../stage2" ) ) ;
}
Second, let's add it as a Resource in the pixie manifest:
Rust code
// in `crates/pixie/src/manifest.rs`
#[ derive( Debug, DekuRead, DekuWrite) ]
#[ deku( magic = b"piximani" ) ]
pub struct Manifest {
pub stage2 : Resource ,
pub guest : Resource ,
}
And thirdly, well, thirdly let's embed both libstage2.so and the compressed guest into the output executable.
They'll go right after our "relinked stage1":
Rust code
// in `crates`
#[ allow( clippy::unnecessary_wraps) ]
fn main ( env : Env ) -> Result < ( ) , Error > {
let args = cli:: Args:: parse ( & env) ;
println ! ( "Packing guest {:?}" , args.input) ;
let guest_file = File:: open ( args. input ) ?;
let guest_map = guest_file. map ( ) ?;
let guest_obj = Object:: new ( guest_map. as_ref ( ) ) ?;
let guest_hull = guest_obj. segments ( ) . load_convex_hull ( ) ?;
let mut output = Writer:: new ( & args. output , 0o755 ) ?;
relink_stage1 ( guest_hull, & mut output) ?;
let stage2_slice = include_bytes ! ( concat!( env!( "OUT_DIR" ) , "/embeds/libstage2.so" ) ) ;
let stage2_offset = output. offset ( ) ;
println ! ( "Copying stage2 at 0x{:x}" , stage2_offset) ;
output. write_all ( stage2_slice) ?;
output. align ( 0x8 ) ?;
println ! ( "Compressing guest..." ) ;
let compressed_guest = lz4_flex:: compress_prepend_size ( guest_map. as_ref ( ) ) ;
let guest_offset = output. offset ( ) ;
println ! ( "Copying compressed guest at 0x{:x}" , guest_offset) ;
output. write_all ( & compressed_guest) ?;
output. align ( 0x8 ) ?;
let manifest_offset = output. offset ( ) ;
println ! ( "Writing manifest at 0x{:x}" , manifest_offset) ;
let manifest = Manifest {
stage2 : Resource {
offset : stage2_offset as _ ,
len : stage2_slice. len ( ) ,
} ,
guest : Resource {
offset : guest_offset as _ ,
len : compressed_guest. len ( ) ,
} ,
} ;
output. write_deku ( & manifest) ?;
output. align ( 0x8 ) ?;
println ! ( "Writing end marker" ) ;
let end_marker = EndMarker {
manifest_offset : manifest_offset as _ ,
} ;
output. write_deku ( & end_marker) ?;
println ! ( "Written to ({})" , args.output) ;
Ok ( ( ) )
}
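The pattern in that function (remember the offset, write the bytes, pad to 8) is easier to see with the file I/O stripped away. Here's a sketch using an in-memory buffer; `SketchWriter` and `layout` are invented for this example and only mimic the `offset`, `write_all` and `align` calls used above.

```rust
/// In-memory stand-in for the output file. Hypothetical: the real `Writer`
/// writes to a file and returns `Result`s.
#[derive(Default)]
pub struct SketchWriter {
    buf: Vec<u8>,
}

impl SketchWriter {
    /// Current write position, i.e. how many bytes were written so far.
    pub fn offset(&self) -> u64 {
        self.buf.len() as u64
    }

    pub fn write_all(&mut self, bytes: &[u8]) {
        self.buf.extend_from_slice(bytes);
    }

    /// Pad with zeros until the offset is a multiple of `align`
    /// (which must be a power of two).
    pub fn align(&mut self, align: u64) {
        let aligned = (self.offset() + align - 1) & !(align - 1);
        self.buf.resize(aligned as usize, 0);
    }
}

/// Writes two blobs back to back and returns `(offset, len)` for each,
/// which is the information a manifest entry needs.
pub fn layout(stage2: &[u8], guest: &[u8]) -> [(u64, usize); 2] {
    let mut w = SketchWriter::default();
    let stage2_offset = w.offset();
    w.write_all(stage2);
    w.align(8);
    let guest_offset = w.offset();
    w.write_all(guest);
    w.align(8);
    [(stage2_offset, stage2.len()), (guest_offset, guest.len())]
}
```

Note that the recorded length is the length of the data, not of the padded region: a 3-byte blob followed by a 10-byte blob ends up at offsets 0 and 8, with lengths 3 and 10.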
There! Now minipak is feature-complete. Well, the minipak crate, not the whole project: stage1 and stage2 are still not complete, but our minipak executable does everything we want it to do, and its output is a lot chunkier than before:
Shell session
$ cargo run --quiet --release --bin minipak -- ~/go/bin/hugo -o /tmp/hugo.pak
Packing guest "/home/amos/go/bin/hugo"
Picked base_offset 0x400000
Stage1 hull: 400000..40b048
Guest hull: 400000..3180968
Loaded stage1
Relocated stage1
Looking for `entry` in stage1...
Copying stage1 segments...
copy_start_offset = 0x190
copying ProgramHeader { typ: Load, flags: 0x5, offset: 0x2000, vaddr: 0x2000, paddr: 0x2000, filesz: 0x4b0d, memsz: 0x4b0d, align: 0x1000 }
copying ProgramHeader { typ: Load, flags: 0x4, offset: 0x7000, vaddr: 0x7000, paddr: 0x7000, filesz: 0x1b88, memsz: 0x1b88, align: 0x1000 }
copying ProgramHeader { typ: Load, flags: 0x6, offset: 0x9750, vaddr: 0xa750, paddr: 0xa750, filesz: 0x8c0, memsz: 0x8f8, align: 0x1000 }
Copying stage2 at 0xb000
Compressing guest...
Copying compressed guest at 0x15670
Writing manifest at 0x1eda4f8
Writing end marker
Written to (/tmp/hugo.pak)
$ ls -lhA /tmp/hugo.pak
-rwxr-xr-x 1 amos amos 31M May 2 00:06 /tmp/hugo.pak
Mh. How does it compare with the original binary though?
Shh let's keep that for later. When we actually get it to work.
Okay, so! Clearly stage1 has to read the EndMarker, to find the Manifest, so it knows where stage2 is, and it can map it.
Turns out this is relatively compact:
Rust code
// in `stage1/src/lib.rs`
use pixie:: { Manifest, MappedObject, Object} ;
/// # Safety
/// Maps and calls into another ELF object
#[ inline( never) ]
unsafe fn main ( stack_top : * mut u8 ) -> ! {
info ! ( "Stack top: {:?}" , stack_top) ;
// Open ourselves and read the manifest.
let file = File:: open ( "/proc/self/exe" ) . unwrap ( ) ;
let map = file. map ( ) . unwrap ( ) ;
let slice = map. as_ref ( ) ;
let manifest = Manifest:: read_from_full_slice ( slice) . unwrap ( ) ;
// Load stage2 anywhere in memory
let s2_slice = & slice[ manifest. stage2 . as_range ( ) ] ;
let s2_obj = Object:: new ( s2_slice) . unwrap ( ) ;
let mut s2_mapped = MappedObject:: new ( & s2_obj, None) . unwrap ( ) ;
info ! (
"Mapped stage2 at base 0x{:x} (offset 0x{:x})" ,
s2_mapped.base( ) ,
s2_mapped.base_offset( )
) ;
info ! ( "Relocating stage2..." ) ;
s2_mapped. relocate ( s2_mapped. base_offset ( ) ) . unwrap ( ) ;
info ! ( "Relocating stage2... done!" ) ;
// Find stage2's entry function and call it
let s2_entry = s2_mapped. lookup_sym ( "entry" ) . unwrap ( ) ;
info ! ( "Found entry_sym {:?}" , s2_entry) ;
let entry: unsafe extern "C" fn ( * mut u8 ) -> ! =
core:: mem:: transmute ( s2_mapped. base_offset ( ) + s2_entry. value ) ;
entry ( stack_top) ;
}
Of course this uses some types and functions from pixie, so:
TOML markup
# Cargo.toml
[ dependencies ]
pixie = { path = "../pixie" }
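What `Manifest::read_from_full_slice` has to do can be sketched without deku. Big caveat: the layout below is an assumption made for this example (four little-endian u64 fields for the manifest, and the last 8 bytes of the file holding the manifest offset). The real format has magic values and is defined by the deku derives in pixie.

```rust
#[derive(Debug, PartialEq)]
pub struct Resource {
    pub offset: u64,
    pub len: u64,
}

/// Hypothetical manifest with a made-up on-disk layout, not pixie's.
#[derive(Debug, PartialEq)]
pub struct SketchManifest {
    pub stage2: Resource,
    pub guest: Resource,
}

/// Reads a little-endian u64 at `at`, or `None` if it is out of bounds.
fn read_u64(slice: &[u8], at: usize) -> Option<u64> {
    let bytes = slice.get(at..at.checked_add(8)?)?;
    let mut raw = [0u8; 8];
    raw.copy_from_slice(bytes);
    Some(u64::from_le_bytes(raw))
}

/// Appends the manifest, then an "end marker" that says where it starts.
pub fn append_trailer(file: &mut Vec<u8>, m: &SketchManifest) {
    let manifest_offset = file.len() as u64;
    for v in [m.stage2.offset, m.stage2.len, m.guest.offset, m.guest.len] {
        file.extend_from_slice(&v.to_le_bytes());
    }
    file.extend_from_slice(&manifest_offset.to_le_bytes());
}

/// Walks backwards: end marker first, then the manifest it points to.
pub fn read_manifest(file: &[u8]) -> Option<SketchManifest> {
    let marker_at = file.len().checked_sub(8)?;
    let manifest_at = read_u64(file, marker_at)? as usize;
    let field = |index: usize| -> Option<u64> {
        read_u64(file, manifest_at.checked_add(index * 8)?)
    };
    Some(SketchManifest {
        stage2: Resource { offset: field(0)?, len: field(1)? },
        guest: Resource { offset: field(2)?, len: field(3)? },
    })
}
```

The point of the end marker is that it sits at a position that's known without parsing anything else: the very end of the file. Everything else is found by following offsets from there.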
And now... well, the whole thing won't quite work, but at least we should reach stage2.
Shell session
$ cargo run --quiet --release --bin minipak -- ~/go/bin/hugo -o /tmp/hugo.pak
Packing guest "/home/amos/go/bin/hugo"
(cut)
$ /tmp/hugo.pak
[stage1] Stack top: 0x7ffca05e6de0
[stage1] Mapped stage2 at base 0x7f199bfd2000 (offset 0x7f199bfd2000)
[stage1] Relocating stage2...
[stage1] Relocating stage2... done!
[stage1] Found entry_sym Sym { name: 112, bind: Global, typ: Func, shndx: 7, value: 26240, size: 20 }
[stage2] Stack top: 0x7ffca05e6de0
...and we do!
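The last step of stage1's main, turning "base offset + symbol value" into something we can call, can be tried in isolation. In this sketch, `fake_entry` and `call_at` are stand-ins invented for the example: instead of an address computed from a mapped ELF object, we take the address of a function that's already in our own program.

```rust
/// Stand-in for stage2's `entry`. Unlike the real one, it returns.
pub extern "C" fn fake_entry(x: u64) -> u64 {
    x + 1
}

/// Calls the `extern "C"` function that lives at `addr`.
///
/// # Safety
/// `addr` must be the address of a function with exactly this signature,
/// otherwise this is undefined behavior.
pub unsafe fn call_at(addr: usize, arg: u64) -> u64 {
    // An integer and a function pointer have the same size, so `transmute`
    // accepts the conversion. It checks nothing else for us.
    let f: extern "C" fn(u64) -> u64 = unsafe { core::mem::transmute(addr) };
    f(arg)
}
```

Calling `unsafe { call_at(fake_entry as *const () as usize, 41) }` goes through the address rather than the name. The compiler has no way to verify that the address really points to a function with that signature, which is why all of this lives in unsafe code.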
And now, the pièce de résistance.
Hey, we've done that already!
Why yes, yes we have! But this is our last, and our best.
To launch the executable we have to:
Map ourselves
Read the manifest
Decompress the guest in memory
Map it in place
Adjust the PHDR, PHNUM and ENTRY auxiliary vectors
Jump to the entry point
Oh, what about dynamically-linked executables? That need an interpreter?
Ah, I guess we can do that too! If we find an interpreter segment, we can map
it in memory as well, and jump to its entry point, instead of the guest's.
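The "adjust the auxiliary vectors" step is the least obvious one, so here's a sketch of it on a plain array. The type and function names are made up for this example (the real code goes through encore's `Env`), but the numeric `AT_*` values are the standard Linux ones.

```rust
// Standard Linux auxiliary vector types.
pub const AT_PHDR: u64 = 3;
pub const AT_PHNUM: u64 = 5;
pub const AT_ENTRY: u64 = 9;

/// One auxiliary vector entry. Hypothetical type for this example.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Auxv {
    pub typ: u64,
    pub value: u64,
}

/// Finds the entry of the given type, if there is one.
pub fn find_vector(auxv: &mut [Auxv], typ: u64) -> Option<&mut Auxv> {
    auxv.iter_mut().find(|v| v.typ == typ)
}

/// Makes the auxiliary vectors describe the guest instead of the packer.
/// `mapped_base` is where the guest's first segment landed, `base_offset`
/// is what gets added to its virtual addresses (0 if non-relocatable).
pub fn patch_for_guest(
    auxv: &mut [Auxv],
    mapped_base: u64,
    base_offset: u64,
    ph_offset: u64,
    ph_count: u64,
    entry_point: u64,
) {
    let updates = [
        (AT_PHDR, mapped_base + ph_offset),
        (AT_PHNUM, ph_count),
        (AT_ENTRY, base_offset + entry_point),
    ];
    for (typ, value) in updates {
        if let Some(v) = find_vector(auxv, typ) {
            v.value = value;
        }
    }
}
```

Without this step, the guest (or its interpreter) would look at the program headers and entry point of our packed executable, not its own.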
Let's go!
Rust code
// in `crates/stage2/src/lib.rs`
use pixie:: { Manifest, MappedObject, Object, ObjectHeader} ;
/// # Safety
/// Maps and jmps to another ELF object
#[ inline( never) ]
unsafe fn main ( stack_top : * mut u8 ) -> ! {
info ! ( "Stack top: {:?}" , stack_top) ;
let mut stack = Env:: read ( stack_top as _ ) ;
// Open ourselves and read the manifest.
let file = File:: open ( "/proc/self/exe" ) . unwrap ( ) ;
info ! ( "Mapping self..." ) ;
let map = file. map ( ) . unwrap ( ) ;
info ! ( "Mapping self... done!" ) ;
let slice = map. as_ref ( ) ;
let manifest = Manifest:: read_from_full_slice ( slice) . unwrap ( ) ;
let compressed_guest = & slice[ manifest. guest . as_range ( ) ] ;
let guest = lz4_flex:: decompress_size_prepended ( compressed_guest) . unwrap ( ) ;
let guest_obj = Object:: new ( guest. as_ref ( ) ) . unwrap ( ) ;
let guest_hull = guest_obj. segments ( ) . load_convex_hull ( ) . unwrap ( ) ;
let at = if guest_hull. start == 0 {
// guest is relocatable, load it with the same base as ourselves
let elf_header_address = stack. find_vector ( AuxvType:: PHDR) . value ;
let self_base = elf_header_address - ObjectHeader:: SIZE as u64 ;
Some ( self_base)
} else {
// guest is non-relocatable, it'll be loaded at its preferred offset
None
} ;
let base_offset = at. unwrap_or_default ( ) ;
let guest_mapped = MappedObject:: new ( & guest_obj, at) . unwrap ( ) ;
info ! ( "Mapped guest at 0x{:x}" , guest_mapped.base( ) ) ;
// Set phdr auxiliary vector
let at_phdr = stack. find_vector ( AuxvType:: PHDR) ;
at_phdr. value = guest_mapped. base ( ) + guest_obj. header ( ) . ph_offset ;
// Set phnum auxiliary vector
let at_phnum = stack. find_vector ( AuxvType:: PHNUM) ;
at_phnum. value = guest_obj. header ( ) . ph_count as _ ;
// Set entry auxiliary vector
let at_entry = stack. find_vector ( AuxvType:: ENTRY) ;
at_entry. value = base_offset + guest_obj. header ( ) . entry_point ;
match guest_obj. segments ( ) . find ( SegmentType:: Interp) {
Ok( interp) => {
let interp = core:: str:: from_utf8 ( interp. slice ( ) ) . unwrap ( ) ;
println ! ( "Should load interpreter {}!" , interp) ;
let interp_file = File:: open ( interp) . unwrap ( ) ;
let interp_map = interp_file. map ( ) . unwrap ( ) ;
let interp_obj = Object:: new ( interp_map. as_ref ( ) ) . unwrap ( ) ;
let interp_hull = interp_obj. segments ( ) . load_convex_hull ( ) . unwrap ( ) ;
if interp_hull. start != 0 {
panic ! ( "Expected interpreter to be relocatable" ) ;
}
// Map interpreter anywhere
let interp_mapped = MappedObject:: new ( & interp_obj, None) . unwrap ( ) ;
// Adjust base
let at_base = stack. find_vector ( AuxvType:: BASE) ;
at_base. value = interp_mapped. base ( ) ;
let entry_point = interp_mapped. base ( ) + interp_obj. header ( ) . entry_point ;
info ! ( "Jumping to interpreter's entry point 0x{:x}" , entry_point) ;
pixie:: launch ( stack_top, entry_point) ;
}
Err( _) => {
let entry_point = base_offset + guest_obj. header ( ) . entry_point ;
info ! ( "Jumping to guest's entry point 0x{:x}" , entry_point) ;
pixie:: launch ( stack_top, entry_point) ;
}
}
}
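The base-address decision at the top of that function is worth isolating, since it's what makes both static and position-independent guests work. This sketch assumes that `ObjectHeader::SIZE` is the 64-byte ELF64 header and that the program headers immediately follow it; the function names are invented for the example.

```rust
/// Size of an ELF64 file header, in bytes.
pub const ELF64_HEADER_SIZE: u64 = 64;

/// Where our own executable is mapped, derived from the PHDR auxiliary
/// vector. Assumes the program headers come right after the ELF header.
pub fn self_base_from_phdr(at_phdr: u64) -> u64 {
    at_phdr - ELF64_HEADER_SIZE
}

/// Returns `(at, base_offset)`:
/// - relocatable guest (hull starts at 0): map it at our own base,
/// - non-relocatable guest: let it pick its preferred address, offset 0.
pub fn pick_load_address(guest_hull_start: u64, at_phdr: u64) -> (Option<u64>, u64) {
    let at = if guest_hull_start == 0 {
        Some(self_base_from_phdr(at_phdr))
    } else {
        None
    };
    (at, at.unwrap_or_default())
}
```

For a guest whose hull starts at 0, the base offset is the packed executable's own base. For a guest linked at a fixed address like 0x400000, the base offset stays at 0 and its entry point is used as-is.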
We just need a couple dependencies:
TOML markup
# in `crates/stage2/Cargo.toml`
[ dependencies ]
encore = { path = "../encore" }
# 👇 new!
pixie = { path = "../pixie" }
# 👇 also new!
lz4_flex = { version = "0.7.5" , default-features = false , features = [ "safe-encode" , "safe-decode" ] }
And we're off to the races!
Shell session
$ cargo run --quiet --release --bin minipak -- ~/go/bin/hugo -o /tmp/hugo.pak
Packing guest "/home/amos/go/bin/hugo"
(cut)
$ /tmp/hugo.pak
[stage1] Stack top: 0x7ffc7371c880
[stage1] Mapped stage2 at base 0x7f6cf5481000 (offset 0x7f6cf5481000)
[stage1] Relocating stage2...
[stage1] Relocating stage2... done!
[stage1] Found entry_sym Sym { name: 119, bind: Global, typ: Func, shndx: 7, value: 75936, size: 20 }
[stage2] Stack top: 0x7ffc7371c880
[stage2] Mapping self...
[stage2] Mapping self... done!
[stage2] Mapped guest at 0x400000
[stage2] Jumping to guest's entry point 0x4712a0
Total in 0 ms
Error: Unable to locate config file or config directory. Perhaps you need to create a new site.
Run `hugo help new` for details.
✨✨✨
We did it! We finally did it.
Let's make sure it also works with dynamically-linked executables:
Shell session
$ cargo run --quiet --release --bin minipak -- /bin/ls -o /tmp/ls.pak
Packing guest "/bin/ls"
Picked base_offset 0x800000
Stage1 hull: 800000..81e048
Guest hull: 0..24558
Loaded stage1
Relocated stage1
Looking for `entry` in stage1...
WARNING: Guest executable is too small, the `brk` will be wrong.
Copying stage1 segments...
copy_start_offset = 0x190
copying ProgramHeader { typ: Load, flags: 0x5, offset: 0x3000, vaddr: 0x3000, paddr: 0x3000, filesz: 0x1226d, memsz: 0x1226d, align: 0x1000 }
copying ProgramHeader { typ: Load, flags: 0x4, offset: 0x16000, vaddr: 0x16000, paddr: 0x16000, filesz: 0x485c, memsz: 0x485c, align: 0x1000 }
copying ProgramHeader { typ: Load, flags: 0x6, offset: 0x1b380, vaddr: 0x1c380, paddr: 0x1c380, filesz: 0x1c90, memsz: 0x1cc8, align: 0x1000 }
Copying stage2 at 0x1e000
Compressing guest...
Copying compressed guest at 0x39670
Writing manifest at 0x4e180
Writing end marker
Written to (/tmp/ls.pak)
$ /tmp/ls.pak -lhA
[stage1] Stack top: 0x7fff2495c6e0
[stage1] Mapped stage2 at base 0x7fce6c804000 (offset 0x7fce6c804000)
[stage1] Relocating stage2...
[stage1] Relocating stage2... done!
[stage1] Found entry_sym Sym { name: 119, bind: Global, typ: Func, shndx: 7, value: 75936, size: 20 }
[stage2] Stack top: 0x7fff2495c6e0
[stage2] Mapping self...
[stage2] Mapping self... done!
[stage2] Mapped guest at 0x800000
Should load interpreter /lib64/ld-linux-x86-64.so.2!
[stage2] Jumping to interpreter's entry point 0x7fce6474b090
total 48K
drwxr-xr-x 2 amos amos 4.0K May 1 15:53 .cargo
-rw-r--r-- 1 amos amos 6.5K May 2 00:38 Cargo.lock
-rw-r--r-- 1 amos amos 223 May 1 23:33 Cargo.toml
drwxr-xr-x 7 amos amos 4.0K May 1 23:33 crates
-rw------- 1 amos amos 2.6K May 1 18:11 .gdb_history
drwxr-xr-x 8 amos amos 4.0K May 1 23:28 .git
-rw-r--r-- 1 amos amos 21 Feb 21 20:14 .gitignore
-rw-r--r-- 1 amos amos 117 May 1 15:44 rust-toolchain
drwxr-xr-x 2 amos amos 4.0K May 1 19:45 samples
drwxr-xr-x 4 amos amos 4.0K May 1 19:58 target
drwxr-xr-x 2 amos amos 4.0K Feb 21 18:28 .vscode
Beary cool! Does it really make sense to compress ls though?
Well, no, not really. ls is already so small, our packed version is actually larger:
Shell session
$ ls -lhA /bin/ls
-rwxr-xr-x 1 root root 139K Mar 6 2020 /bin/ls
$ ls -lhA /tmp/ls.pak
-rwxr-xr-x 1 amos amos 313K May 2 00:48 /tmp/ls.pak
...because it contains stage1, stage2 and a compressed version of ls, and both stages are pretty chunky right now:
Shell session
$ ls -lhA ./target/release/build/minipak-51b667ed4cbdb6ec/out/embeds/
total 244K
-rw-r--r-- 1 amos amos 177 May 1 19:58 CACHEDIR.TAG
-rwxr-xr-x 1 amos amos 118K May 2 00:47 libstage1.so
-rwxr-xr-x 1 amos amos 110K May 2 00:47 libstage2.so
drwxr-xr-x 7 amos amos 4.0K May 2 00:47 release
-rw-r--r-- 1 amos amos 1.6K May 1 19:58 .rustc_info.json
So, let's answer a bunch of questions!
Well, first off, ~110K is not that chunky, by desktop computer standards.
It's positively tiny by server computer standards, and it's enormous by
embedded standards, but we're not targeting your smartwatch, so all is well.
Still, I was curious what was in there, so I looked, using Bloaty McBloatface:
Shell session
$ cargo build --release
$ objcopy --strip-all ./target/release/libstage1.so /tmp/libstage1.so
$ bloaty -d symbols -n 0 --debug-file ./target/release/libstage1.so /tmp/libstage1.so | head -30
FILE SIZE VM SIZE
-------------- --------------
6.9% 8.07Ki 0.0% 0 [Unmapped]
6.3% 7.41Ki 6.9% 7.41Ki [section .rela.dyn]
4.7% 5.50Ki 5.1% 5.50Ki [section .rodata]
4.3% 5.02Ki 4.7% 5.02Ki [section .data.rel.ro]
3.2% 3.77Ki 3.5% 3.77Ki _$LT$pixie..format..header..ObjectHeader$u20$as$u20$deku..DekuContainerRead$GT$::from_bytes::hf0dca140941584be
2.7% 3.13Ki 2.9% 3.13Ki _$LT$bitvec..ptr..span..BitSpanError$LT$T$GT$$u20$as$u20$core..fmt..Debug$GT$::fmt::hae2f2e9efb0f4129
2.1% 2.49Ki 2.3% 2.49Ki pixie::MappedObject::relocate::h2fcec852915cca41
2.0% 2.31Ki 2.1% 2.31Ki bitvec::slice::BitSlice$LT$O$C$T$GT$::clone_from_bitslice::hf0e2687b7949ac19
1.7% 2.02Ki 1.9% 2.02Ki [section .text]
1.5% 1.77Ki 1.6% 1.77Ki core::fmt::Formatter::pad::hcb18266da989bb74
1.3% 1.58Ki 1.5% 1.58Ki deku::impls::primitive::_$LT$impl$u20$deku..DekuRead$LT$$LP$deku..ctx..Endian$C$deku..ctx..Size$RP$$GT$$u20$for$u20$u16$GT$::read::h52a077145863edef
1.3% 1.58Ki 1.5% 1.58Ki deku::impls::primitive::_$LT$impl$u20$deku..DekuRead$LT$$LP$deku..ctx..Endian$C$deku..ctx..Size$RP$$GT$$u20$for$u20$u32$GT$::read::h66b6fabedc184b5f
1.3% 1.58Ki 1.5% 1.58Ki deku::impls::primitive::_$LT$impl$u20$deku..DekuRead$LT$$LP$deku..ctx..Endian$C$deku..ctx..Size$RP$$GT$$u20$for$u20$usize$GT$::read::h62ed5fa41ab068c2
1.3% 1.52Ki 1.4% 1.52Ki deku::impls::primitive::_$LT$impl$u20$deku..DekuRead$LT$$LP$deku..ctx..Endian$C$deku..ctx..Size$RP$$GT$$u20$for$u20$u8$GT$::read::h4e61a7d6b96113b7
1.3% 1.50Ki 0.0% 0 [ELF Headers]
1.1% 1.35Ki 1.3% 1.35Ki _$LT$str$u20$as$u20$core..fmt..Debug$GT$::fmt::h06fbb5704eb2e464
1.1% 1.28Ki 1.2% 1.28Ki stage1::main::hdd1a3e200abaead0
1.1% 1.26Ki 1.2% 1.26Ki pixie::Object::new::hb545e2dcce88210a
1.0% 1.23Ki 1.1% 1.23Ki _$LT$pixie..format..sym..Sym$u20$as$u20$deku..DekuContainerRead$GT$::from_bytes::h3cb963e32c924534
1.0% 1.21Ki 1.1% 1.21Ki core::fmt::Formatter::pad_integral::h5030801cc5b3cd80
0.9% 1.05Ki 1.0% 1.05Ki _$LT$pixie..format..program_header..ProgramHeader$u20$as$u20$deku..DekuContainerRead$GT$::from_bytes::h92ac268876508bb9
0.9% 1.04Ki 1.0% 1.04Ki core::str::slice_error_fail::h02d9683ab20ccc40
0.8% 1023 0.9% 1023 _$LT$pixie..manifest..Manifest$u20$as$u20$deku..DekuContainerRead$GT$::from_bytes::h9d75f433f2e12a83
0.8% 920 0.8% 920 linked_list_allocator::hole::HoleList::allocate_first_fit::h2b05751692364505
0.7% 873 0.8% 873 bitvec::vec::api::_$LT$impl$u20$bitvec..vec..BitVec$LT$O$C$T$GT$$GT$::extend_with::h24831e0a831e998d
0.7% 870 0.8% 870 pixie::MappedObject::lookup_sym::hf84feee74de2706b
0.7% 853 0.8% 853 bitvec::slice::BitSlice$LT$O$C$T$GT$::copy_within_unchecked::h460ce8747088367a
0.7% 817 0.7% 817 _$LT$pixie..format..rela..Rela$u20$as$u20$deku..DekuContainerRead$GT$::from_bytes::hef92d06a7ec7525c
Well well well. I won't call out anyone here, but, convenience does come at a
cost, it would seem.
Let's look at stage2:
Shell session
$ objcopy --strip-all ./target/release/libstage2.so /tmp/libstage2.so
$ bloaty -d symbols -n 0 --debug-file ./target/release/libstage2.so /tmp/libstage2.so | head -30
FILE SIZE VM SIZE
-------------- --------------
8.0% 8.77Ki 0.0% 0 [Unmapped]
5.8% 6.40Ki 6.4% 6.40Ki [section .rela.dyn]
5.1% 5.58Ki 5.6% 5.58Ki stage2::main::he49bd1bea95ff619
4.6% 5.08Ki 5.1% 5.08Ki [section .rodata]
3.4% 3.77Ki 3.8% 3.77Ki _$LT$pixie..format..header..ObjectHeader$u20$as$u20$deku..DekuContainerRead$GT$::from_bytes::hf0dca140941584be
3.3% 3.58Ki 3.6% 3.58Ki [section .data.rel.ro]
2.1% 2.31Ki 2.3% 2.31Ki bitvec::slice::BitSlice$LT$O$C$T$GT$::clone_from_bitslice::hf0e2687b7949ac19
1.7% 1.90Ki 1.9% 1.90Ki [section .text]
1.6% 1.77Ki 1.8% 1.77Ki core::fmt::Formatter::pad::hcb18266da989bb74
1.4% 1.58Ki 1.6% 1.58Ki deku::impls::primitive::_$LT$impl$u20$deku..DekuRead$LT$$LP$deku..ctx..Endian$C$deku..ctx..Size$RP$$GT$$u20$for$u20$u16$GT$::read::h52a077145863edef
1.4% 1.58Ki 1.6% 1.58Ki deku::impls::primitive::_$LT$impl$u20$deku..DekuRead$LT$$LP$deku..ctx..Endian$C$deku..ctx..Size$RP$$GT$$u20$for$u20$u32$GT$::read::h66b6fabedc184b5f
1.4% 1.58Ki 1.6% 1.58Ki deku::impls::primitive::_$LT$impl$u20$deku..DekuRead$LT$$LP$deku..ctx..Endian$C$deku..ctx..Size$RP$$GT$$u20$for$u20$usize$GT$::read::h62ed5fa41ab068c2
1.4% 1.57Ki 1.6% 1.57Ki _$LT$bitvec..ptr..span..BitSpanError$LT$T$GT$$u20$as$u20$core..fmt..Debug$GT$::fmt::hae2f2e9efb0f4129
1.4% 1.52Ki 1.5% 1.52Ki deku::impls::primitive::_$LT$impl$u20$deku..DekuRead$LT$$LP$deku..ctx..Endian$C$deku..ctx..Size$RP$$GT$$u20$for$u20$u8$GT$::read::h4e61a7d6b96113b7
1.3% 1.38Ki 0.0% 0 [ELF Headers]
1.2% 1.35Ki 1.4% 1.35Ki _$LT$str$u20$as$u20$core..fmt..Debug$GT$::fmt::h06fbb5704eb2e464
1.1% 1.26Ki 1.3% 1.26Ki pixie::Object::new::hb545e2dcce88210a
1.1% 1.21Ki 1.2% 1.21Ki core::fmt::Formatter::pad_integral::h5030801cc5b3cd80
1.0% 1.05Ki 1.1% 1.05Ki _$LT$pixie..format..program_header..ProgramHeader$u20$as$u20$deku..DekuContainerRead$GT$::from_bytes::h92ac268876508bb9
1.0% 1.04Ki 1.1% 1.04Ki core::str::slice_error_fail::h02d9683ab20ccc40
0.9% 1023 1.0% 1023 _$LT$pixie..manifest..Manifest$u20$as$u20$deku..DekuContainerRead$GT$::from_bytes::h9d75f433f2e12a83
0.8% 920 0.9% 920 linked_list_allocator::hole::HoleList::allocate_first_fit::h2b05751692364505
0.8% 873 0.9% 873 bitvec::vec::api::_$LT$impl$u20$bitvec..vec..BitVec$LT$O$C$T$GT$$GT$::extend_with::h24831e0a831e998d
0.8% 853 0.8% 853 bitvec::slice::BitSlice$LT$O$C$T$GT$::copy_within_unchecked::h460ce8747088367a
0.7% 812 0.8% 812 _$LT$pixie..manifest..EndMarker$u20$as$u20$deku..DekuContainerRead$GT$::from_bytes::h83ebca148e53570a
0.7% 796 0.8% 796 encore::fs::File::raw_open::hdba6f4608ca26b0d
0.7% 772 0.8% 772 _$LT$core..fmt..builders..PadAdapter$u20$as$u20$core..fmt..Write$GT$::write_str::h7ca3568df6f09b6a
0.7% 770 0.8% 770 bitvec::slice::BitSlice$LT$O$C$T$GT$::copy_within_unchecked::h31c15c4829e980b1
Interestingly, lz4_flex (which stage1 does not use) doesn't even show up in the top 30 symbols:
Shell session
$ bloaty -d symbols -n 0 --debug-file ./target/release/libstage2.so /tmp/libstage2.so | grep lz4
0.4% 433 0.4% 433 lz4_flex::block::decompress_safe::duplicate_overlapping_slice::h018f1b297902314e
0.3% 370 0.4% 370 _$LT$lz4_flex..block..DecompressError$u20$as$u20$core..fmt..Debug$GT$::fmt::hb893af1a554d89f6
0.2% 205 0.2% 205 lz4_flex::block::decompress_safe::copy_24::hfa85bd20ab36cca7
Although, maybe it's just been mostly inlined in stage2::main? Hard to tell.
We can always try to ask the compiler to "optimize for size", see if it makes a
difference?
TOML markup
[ profile . release ]
opt-level = "s"
Shell session
$ cargo build --release
$ objcopy --strip-all ./target/release/libstage1.so /tmp/libstage1.so
$ objcopy --strip-all ./target/release/libstage2.so /tmp/libstage2.so
$ ls -lhA /tmp/libstage*
-rwxr-xr-x 1 amos amos 114K May 2 01:30 /tmp/libstage1.so
-rwxr-xr-x 1 amos amos 98K May 2 01:31 /tmp/libstage2.so
It helps a little!
What about optimization level z?
Shell session
$ ls -lhA /tmp/libstage*
-rwxr-xr-x 1 amos amos 118K May 2 01:32 /tmp/libstage1.so
-rwxr-xr-x 1 amos amos 106K May 2 01:32 /tmp/libstage2.so
Mh, nope, s was better for us.
What if we switch from "thin" LTO to "fat" LTO?
TOML markup
[ profile . release ]
lto = "fat"
Shell session
$ ls -lhA /tmp/libstage*
-rwxr-xr-x 1 amos amos 94K May 2 01:33 /tmp/libstage1.so
-rwxr-xr-x 1 amos amos 82K May 2 01:33 /tmp/libstage2.so
Mhh, mhh, small gains. We can even bring down codegen-units to 1, to really take advantage of LTO, as explained here by James.
TOML markup
[ profile . release ]
codegen-units = 1
incremental = false
Shell session
$ ls -lhA /tmp/libstage*
-rwxr-xr-x 1 amos amos 82K May 2 01:36 /tmp/libstage1.so
-rwxr-xr-x 1 amos amos 70K May 2 01:37 /tmp/libstage2.so
Finally, we can force libcore to be built with those settings as well.
To get this to compile, we had to comment out mentions of compiler-builtins in the encore crate.
Also, -Z build-std is a nightly flag, it only works because we ask for a nightly toolchain in the rust-toolchain file.
Shell session
$ cargo build -Z build-std --target x86_64-unknown-linux-gnu --release
Note also that -Z build-std requires --target to be set, and that changes the directory where the libraries are produced:
Shell session
$ objcopy --strip-all ./target/x86_64-unknown-linux-gnu/release/libstage1.so /tmp/libstage1.so
$ objcopy --strip-all ./target/x86_64-unknown-linux-gnu/release/libstage2.so /tmp/libstage2.so
$ ls -lhA /tmp/libstage*
-rwxr-xr-x 1 amos amos 70K May 2 01:43 /tmp/libstage1.so
-rwxr-xr-x 1 amos amos 54K May 2 01:43 /tmp/libstage2.so
Let's try to re-pack ls:
Shell session
$ ./target/x86_64-unknown-linux-gnu/release/minipak /bin/ls -o /tmp/ls.pak
Packing guest "/bin/ls"
(cut)
$ ls -lhA /tmp/ls.pak
-rwxr-xr-x 1 amos amos 237K May 2 01:44 /tmp/ls.pak
Well, that's 25% less than before! Pretty cool.
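For the record, here's the arithmetic behind that number, as a tiny sketch (`percent_saved` is a name made up for this example). Going from 313K to 237K saves a bit over 24%, which rounds to the 25% quoted above.

```rust
/// How much smaller `after` is than `before`, in percent.
pub fn percent_saved(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// Sizes of /tmp/ls.pak in KiB, as reported by `ls -lhA`
/// before and after the size optimizations.
pub fn ls_pak_savings() -> f64 {
    percent_saved(313.0, 237.0)
}
```

It's still larger than the original 139K `ls`, so packing tiny binaries remains a bad deal.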
There's other things we could do!
For example, we could compress stage2 itself. We've seen that adding lz4_flex to the mix didn't make a big difference between stage1 and stage2, and stage2 is actually quite compressible:
Shell session
$ lz4 -9 ./target/x86_64-unknown-linux-gnu/release/libstage2.so /tmp/libstage2.so.lz4
Compressed 701800 bytes into 243010 bytes ==> 34.63%
We could also make stage1 not use pixie at all: we could have minipak do most of the work, generating a list of relocations that we can easily read and process from stage1 without knowledge of the ELF file format. That would probably cut down on stage1's size.
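To make that idea a bit more concrete, here's what such a pre-digested relocation list could look like. This is purely a sketch of a possible design, not something minipak does: `RelativeReloc` and `apply_relocs` are invented names, and it only covers the "store base + addend" kind of relocation (what `R_X86_64_RELATIVE` does).

```rust
/// One pre-digested relocation: "at `offset` bytes into the image, store
/// `base + addend` as a little-endian u64". Hypothetical format.
#[derive(Debug, Clone, Copy)]
pub struct RelativeReloc {
    pub offset: usize,
    pub addend: u64,
}

/// Applies every relocation to `image`, assuming it is loaded at `base`.
/// On failure, returns the offset of the relocation that did not fit.
pub fn apply_relocs(
    image: &mut [u8],
    base: u64,
    relocs: &[RelativeReloc],
) -> Result<(), usize> {
    for r in relocs {
        let end = r.offset.checked_add(8).ok_or(r.offset)?;
        let slot = image.get_mut(r.offset..end).ok_or(r.offset)?;
        slot.copy_from_slice(&base.wrapping_add(r.addend).to_le_bytes());
    }
    Ok(())
}
```

A loop like this needs no ELF parsing, no deku and no bitvec, which are the biggest entries in the bloaty output for stage1.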
And finally, we could switch to a different compression format. I'm not sure LZ4
is the best compromise in terms of compression ratio, decompression speed and
code size.
The last thing I want to do is compare against... well, the only ELF packer I'm aware of: UPX.
Unfortunately, UPX refuses to pack /bin/ls:
Shell session
$ upx -1 /bin/ls -o /tmp/ls.upx1
Ultimate Packer for eXecutables
Copyright (C) 1996 - 2020
UPX git-d7ba31+ Markus Oberhumer, Laszlo Molnar & John Reiser Jan 23rd 2020
File size Ratio Format Name
-------------------- ------ ----------- -----------
upx: /bin/ls: CantPackException: bad DT_GNU_HASH n_bucket=0x15 n_bitmask=0x2 len=0xb0
Packed 0 files.
But it'll happily pack hugo, so, let's get comparing:
Shell session
$ ls -lhA ~/go/bin/hugo /tmp/hugo.*
-rwxr-xr-x 1 amos amos 61M Jan 26 10:44 /home/amos/go/bin/hugo
-rwxr-xr-x 1 amos amos 31M May 2 01:56 /tmp/hugo.pak
-rwxr-xr-x 1 amos amos 29M Jan 26 10:44 /tmp/hugo.upx1
-rwxr-xr-x 1 amos amos 26M Jan 26 10:44 /tmp/hugo.upx9
Honestly? I'm pretty happy with those results.
I'm also curious how long it takes to start up each of these.
I'm on a laptop right now, so this will be, uh, less than scientific, also
there's disk caches involved, and the whole thing is running in WSL2, but still,
let's take a look using hyperfine:
Shell session
$ hyperfine --warmup 5 '~/go/bin/hugo version' '/tmp/hugo.upx1 version' '/tmp/hugo.upx9 version' '/tmp/hugo.pak version'
Benchmark #1: ~/go/bin/hugo version
Time (mean ± σ): 24.9 ms ± 3.6 ms [User: 30.6 ms, System: 16.2 ms]
Range (min … max): 20.5 ms … 41.8 ms 112 runs
Benchmark #2: /tmp/hugo.upx1 version
Time (mean ± σ): 209.2 ms ± 15.5 ms [User: 214.9 ms, System: 17.1 ms]
Range (min … max): 195.8 ms … 253.6 ms 14 runs
Benchmark #3: /tmp/hugo.upx9 version
Time (mean ± σ): 179.8 ms ± 18.8 ms [User: 183.1 ms, System: 20.0 ms]
Range (min … max): 160.9 ms … 232.5 ms 16 runs
Benchmark #4: /tmp/hugo.pak version
Time (mean ± σ): 203.4 ms ± 9.3 ms [User: 179.7 ms, System: 45.8 ms]
Range (min … max): 187.2 ms … 217.5 ms 15 runs
Summary
'~/go/bin/hugo version' ran
7.23 ± 1.28 times faster than '/tmp/hugo.upx9 version'
8.18 ± 1.23 times faster than '/tmp/hugo.pak version'
8.41 ± 1.36 times faster than '/tmp/hugo.upx1 version'
Again, nothing to be ashamed of there: the upx -9 version seems faster than both the upx -1 version and the minipak version, but dang, I'm pretty happy with those results.
Now let's try it on a large Rust binary: futile, which powers this website.
Shell session
$ ls -lhA ~/futile /tmp/futile.*
-rwxr-xr-x 1 amos amos 27M May 2 02:05 /home/amos/futile
-rwxr-xr-x 1 amos amos 12M May 2 02:06 /tmp/futile.pak
-rwxr-xr-x 1 amos amos 11M May 2 02:05 /tmp/futile.upx1
-rwxr-xr-x 1 amos amos 8.3M May 2 02:05 /tmp/futile.upx9
Again, not too bad! upx -9 has a strong lead here too, but keep in mind it's been developed over two decades and its -9 setting uses seven different passes!
What about startup times?
Shell session
$ hyperfine --warmup 15 '~/futile help' '/tmp/futile.upx1 help' '/tmp/futile.upx9 help' '/tmp/futile.pak help'
Benchmark #1: ~/futile help
Time (mean ± σ): 7.0 ms ± 3.5 ms [User: 4.4 ms, System: 6.2 ms]
Range (min … max): 1.8 ms … 16.1 ms 520 runs
Warning: Command took less than 5 ms to complete. Results might be inaccurate.
Benchmark #2: /tmp/futile.upx1 help
Time (mean ± σ): 80.5 ms ± 9.4 ms [User: 76.1 ms, System: 6.1 ms]
Range (min … max): 74.9 ms … 115.1 ms 37 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark #3: /tmp/futile.upx9 help
Time (mean ± σ): 72.0 ms ± 6.7 ms [User: 69.0 ms, System: 4.6 ms]
Range (min … max): 67.7 ms … 103.6 ms 39 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark #4: /tmp/futile.pak help
Time (mean ± σ): 71.9 ms ± 4.9 ms [User: 66.4 ms, System: 6.9 ms]
Range (min … max): 68.3 ms … 94.5 ms 38 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Summary
'~/futile help' ran
10.34 ± 5.26 times faster than '/tmp/futile.pak help'
10.35 ± 5.30 times faster than '/tmp/futile.upx9 help'
11.58 ± 5.99 times faster than '/tmp/futile.upx1 help'
Not bad at all!
(I did my best here, but there were statistical outliers in all the runs. I
closed every program I could afford to, and still I couldn't get it to behave.
Ah well.)
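If you're wondering why the "times faster" figures come with such a wide ± on them, here's a rough sketch of where it comes from. It assumes the usual first-order error propagation for a ratio of two noisy measurements, which I haven't checked against hyperfine's source. Timing and relative_speed are made-up names for illustration:

```rust
/// Mean and standard deviation of one benchmark, in milliseconds.
#[derive(Clone, Copy)]
struct Timing {
    mean: f64,
    stddev: f64,
}

/// Returns (ratio, uncertainty) for "how many times faster is `fast` than `slow`".
/// Assumption: relative errors of a quotient add in quadrature.
fn relative_speed(fast: Timing, slow: Timing) -> (f64, f64) {
    let ratio = slow.mean / fast.mean;
    let rel_err =
        ((fast.stddev / fast.mean).powi(2) + (slow.stddev / slow.mean).powi(2)).sqrt();
    (ratio, ratio * rel_err)
}

fn main() {
    // Rounded figures from the futile benchmark above.
    let plain = Timing { mean: 7.0, stddev: 3.5 };
    let packed = Timing { mean: 71.9, stddev: 4.9 };

    let (ratio, err) = relative_speed(plain, packed);
    println!("{:.2} ± {:.2} times faster", ratio, err);
}
```

Fed the rounded means and deviations, this lands in the same ballpark as the reported 10.34 ± 5.26. Almost all of the uncertainty comes from the uncompressed run: 3.5 ms of deviation on a 7.0 ms mean is a 50% relative error, which dwarfs the packed binary's roughly 7%.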
So, we've done big Go binary, big Rust binary... how about big C++ binary?
Let's compress electron with this!
Compressing with UPX went fine, but minipak crashed:
Shell session
$ ~/ftl/minipak/target/x86_64-unknown-linux-gnu/release/minipak electron -o electron.pak
Packing guest "electron"
Picked base_offset 0x800000
Stage1 hull: 800000..815040
Guest hull: 0..8227a08
Loaded stage1
Relocated stage1
Looking for `entry` in stage1...
Copying stage1 segments...
copy_start_offset = 0x190
copying ProgramHeader { typ: Load, flags: 0x5, offset: 0x2000, vaddr: 0x2000, paddr: 0x2000, filesz: 0xbdc1, memsz: 0xbdc1, align: 0x1000 }
copying ProgramHeader { typ: Load, flags: 0x4, offset: 0xe000, vaddr: 0xe000, paddr: 0xe000, filesz: 0x4224, memsz: 0x4224, align: 0x1000 }
copying ProgramHeader { typ: Load, flags: 0x6, offset: 0x12a20, vaddr: 0x13a20, paddr: 0x13a20, filesz: 0x15e8, memsz: 0x1620, align: 0x1000 }
Copying stage2 at 0x15000
Compressing guest...panicked at 'memory allocation of 150278328 bytes failed', /home/amos/.rustup/toolchains/nightly-2021-04-25-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:386:9
[1] 32021 illegal hardware instruction ~/ftl/minipak/target/x86_64-unknown-linux-gnu/release/minipak electron -o
Well, yup, we used a fixed heap size and it looks like, for this, 128 MiB
weren't enough!
Let's bump that to 512:
Rust code
// in `crates/encore/src/items.rs`
/// Heap size, in megabytes
const HEAP_SIZE_MB: u64 = 512;
And now, compression works!
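For the curious, here's the back-of-the-envelope check behind that bump, as a small sketch. It assumes the heap size is counted in MiB (1024 * 1024 bytes) and ignores whatever else is already living in the heap; mib_to_bytes and fits are hypothetical helpers, not encore's API:

```rust
/// Converts a size in MiB to bytes.
const fn mib_to_bytes(mib: u64) -> u64 {
    mib * 1024 * 1024
}

/// Whether a single request could fit in an otherwise empty heap of `heap_mb` MiB.
/// Simplification: a real allocator also has to account for earlier allocations.
fn fits(heap_mb: u64, request_bytes: u64) -> bool {
    request_bytes <= mib_to_bytes(heap_mb)
}

fn main() {
    // Size taken from the panic message above.
    let request: u64 = 150_278_328;

    for heap_mb in [128, 512] {
        println!(
            "{} MiB heap ({} bytes): fits = {}",
            heap_mb,
            mib_to_bytes(heap_mb),
            fits(heap_mb, request)
        );
    }
}
```

That single allocation is about 143 MiB, so it could never have fit in a 128 MiB heap, even an empty one. A fixed heap keeps things simple, at the cost of having to pick a size big enough for the largest guest you want to pack.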
Let's compare sizes:
Shell session
$ ls -lhA ./electron*
-rwxr-xr-x 1 amos amos 131M May 2 02:15 ./electron
-rwxr-xr-x 1 amos amos 73M May 2 02:23 ./electron.pak
-rwxr-xr-x 1 amos amos 63M May 2 02:15 ./electron.upx1
-rwxr-xr-x 1 amos amos 53M May 2 02:15 ./electron.upx9
And startup times:
Shell session
$ hyperfine --warmup 3 './electron -v' './electron.upx1 -v' './electron.upx9 -v' './electron.pak -v'
Benchmark #1: ./electron -v
Time (mean ± σ): 107.5 ms ± 12.0 ms [User: 72.0 ms, System: 11.5 ms]
Range (min … max): 98.2 ms … 140.1 ms 29 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark #2: ./electron.upx1 -v
Time (mean ± σ): 1.679 s ± 0.124 s [User: 766.7 ms, System: 84.0 ms]
Range (min … max): 1.491 s … 1.807 s 10 runs
Benchmark #3: ./electron.upx9 -v
Time (mean ± σ): 1.511 s ± 0.138 s [User: 704.6 ms, System: 66.0 ms]
Range (min … max): 1.335 s … 1.670 s 10 runs
Benchmark #4: ./electron.pak -v
Time (mean ± σ): 2.079 s ± 0.597 s [User: 558.6 ms, System: 309.1 ms]
Range (min … max): 1.235 s … 2.833 s 10 runs
Summary
'./electron -v' ran
14.06 ± 2.03 times faster than './electron.upx9 -v'
15.63 ± 2.09 times faster than './electron.upx1 -v'
19.34 ± 5.96 times faster than './electron.pak -v'
No surprises there — electron is a beast (but still, 100ms startup time
uncompressed is remarkable, given how much it packs).
But, like... we made an executable packer, that compresses electron.
And it runs, it's the real thing!
That's pretty darn cool.
This concludes my longest series so far, "Making our own executable packer".
All the way back in Part 3, I jokingly predicted that I would never finish it:
This series is never going to end.
In 2060, when I'm 70, and everybody will have switched to using Fuchsia on
the desktop, my friends will still poke fun at me: "Hey amos, remember your
ELF series? When's it gonna end?", and I'll feign a smile, but inside I will
be acutely, painfully aware that I have angered the binary gods and that I
should have left well enough alone.
Well, take that, 2020 me. We did it reddit! Woo!
I'd like to thank everyone for sticking around to see this series to its
conclusion, especially my patrons.
I know you're all probably wondering "what's next??" and the answer is: sleep.
Lots and lots of sleep.
And then, who knows! So many interesting topics. I'm sure y'all will have great
suggestions.
I hope you enjoyed the series, and if you've followed at home, send me
screenshots of your stuff running! That would make me really happy.
Until next time, take care!