So you want to live-reload Rust
👋 This page was last updated ~4 years ago. Just so you know.
Good morning! It is still 2020, and the world is literally on fire, so I guess we could all use a distraction.
This article continues the tradition of me getting shamelessly nerd-sniped - once by Pascal about small strings, then again by a twitch viewer about Rust enum sizes.
This time, Ana was handing out free nerdsnipes, so I got in line, and mine was:
How about you teach us how to hot-reload a dylib whenever the file changes?
And, sure, we can do that.
What's a dylib?
dylib is short for "dynamic library", also called "shared library", "shared object", "DLL", etc.
Let's first look at things that are not dynamic libraries.
We'll start with a C program, using GCC and binutils on Linux.
Say we want to greet many different things and people. For that, we might want a greet function:
$ mkdir greet
$ cd greet/
// in `main.c`
#include <stdio.h>
void greet(const char *name) {
printf("Hello, %s!\n", name);
}
int main(void) {
greet("moon");
return 0;
}
$ gcc -Wall main.c -o main
$ ./main
Hello, moon!
This is not a dynamic library. It's just a function.
We can put that function into another file:
// in `greet.c`
#include <stdio.h>
void greet(const char *name) {
printf("Hello, %s!\n", name);
}
And compile it into an object (.o) file:
$ gcc -Wall -c greet.c
$ file greet.o
greet.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
Then, from main.c, pinky promise that a function named greet will exist at some point in the future:
// in `main.c`
extern void greet(const char *name);
int main(void) {
greet("stars");
return 0;
}
Then compile main.c into an object (.o) file:
$ gcc -Wall -c main.c
$ file main.o
main.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
Now we have two objects: greet.o provides (T) greet, and needs (U) printf:
$ nm greet.o
U _GLOBAL_OFFSET_TABLE_
0000000000000000 T greet
U printf
And we have main.o, which provides (T) main, and needs (U) greet:
$ nm main.o
U _GLOBAL_OFFSET_TABLE_
U greet
0000000000000000 T main
If we try to make an executable out of just greet.o, then... it doesn't work, because main is not provided, and some other object (that GCC magically links in when making executables) wants it:
$ gcc greet.o -o woops
/usr/bin/ld: /usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib/Scrt1.o: in function `_start':
(.text+0x24): undefined reference to `main'
collect2: error: ld returned 1 exit status
If we try to make an executable with just main.o, then... it doesn't work either, because we promised greet would be there, and it's not:
$ gcc main.o -o woops
/usr/bin/ld: main.o: in function `main':
main.c:(.text+0xc): undefined reference to `greet'
collect2: error: ld returned 1 exit status
But if we have both... then it works!
$ gcc main.o greet.o -o main
$ file main
main: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=e1915df00b8bf67e121fbd30f0eaf1fd81ecdeb6, for GNU/Linux 3.2.0, not stripped
$ ./main
Hello, stars!
And we have an executable. Again. But there's still no dynamic library (of ours) involved there.
If we look at the symbols our main executable needs:
$ nm --undefined-only main
w __cxa_finalize@@GLIBC_2.2.5
w __gmon_start__
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
U __libc_start_main@@GLIBC_2.2.5
U printf@@GLIBC_2.2.5
Okay, let's ignore weak (w) symbols for now - mostly, it needs... some startup routine, and printf. Good.
As for the symbols that are defined in main, there's, uh, a lot:
$ nm --defined-only main
00000000000002e8 r __abi_tag
0000000000004030 B __bss_start
0000000000004030 b completed.0
0000000000004020 D __data_start
0000000000004020 W data_start
0000000000001070 t deregister_tm_clones
00000000000010e0 t __do_global_dtors_aux
0000000000003df0 d __do_global_dtors_aux_fini_array_entry
0000000000004028 D __dso_handle
0000000000003df8 d _DYNAMIC
0000000000004030 D _edata
0000000000004038 B _end
00000000000011f8 T _fini
0000000000001130 t frame_dummy
0000000000003de8 d __frame_dummy_init_array_entry
000000000000214c r __FRAME_END__
0000000000004000 d _GLOBAL_OFFSET_TABLE_
0000000000002018 r __GNU_EH_FRAME_HDR
0000000000001150 T greet
0000000000001000 t _init
0000000000003df0 d __init_array_end
0000000000003de8 d __init_array_start
0000000000002000 R _IO_stdin_used
00000000000011f0 T __libc_csu_fini
0000000000001180 T __libc_csu_init
0000000000001139 T main
00000000000010a0 t register_tm_clones
0000000000001040 T _start
0000000000004030 D __TMC_END__
So let's filter it out a little:
$ nm --defined-only ./main | grep ' T '
00000000000011f8 T _fini
0000000000001150 T greet
00000000000011f0 T __libc_csu_fini
0000000000001180 T __libc_csu_init
0000000000001139 T main
0000000000001040 T _start
Oh hey, greet is there.
Does that mean... is our main file also a dynamic library?
Let's try loading it from another executable, at runtime.
How do we load a library at runtime? That's the dynamic linker's job. Instead of making our own, this time we'll use glibc's dynamic linker:
// in `load.c`
// my best guess is that `dlfcn` stands for `dynamic loading functions`
#include <dlfcn.h>
#include <stdio.h>
// C function pointer syntax is... something.
// Let's typedef our way out of this one.
typedef void (*greet_t)(const char *name);
int main(void) {
// what do we want? symbols!
// when do we want them? at an implementation-defined time!
void *lib = dlopen("./main", RTLD_LAZY);
if (!lib) {
fprintf(stderr, "failed to load library\n");
return 1;
}
greet_t greet = (greet_t) dlsym(lib, "greet");
if (!greet) {
fprintf(stderr, "could not look up symbol 'greet'\n");
return 1;
}
greet("venus");
dlclose(lib);
return 0;
}
Let's make an executable out of load.c and:
$ gcc -Wall load.c -o load
/usr/bin/ld: /tmp/ccnvYCz7.o: in function `main':
load.c:(.text+0x15): undefined reference to `dlopen'
/usr/bin/ld: load.c:(.text+0x5a): undefined reference to `dlsym'
collect2: error: ld returned 1 exit status
Oh right, dlopen itself is in a dynamic library - libdl.so:
$ whereis libdl.so
libdl: /usr/lib/libdl.so /usr/lib/libdl.a
Okay, /usr/lib is in GCC's default library path:
$ gcc -x c -E -v /dev/null 2>&1 | grep LIBRARY_PATH
LIBRARY_PATH=/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/:/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib/:/lib/../lib/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../:/lib/:/usr/lib/
...and it does contain dlopen, dlsym and dlclose:
$ nm /usr/lib/libdl.so | grep -E 'dl(open|sym|close)'
nm: /usr/lib/libdl.so: no symbols
Uhh... wait, those are dynamic symbols, so we need nm's -D flag:
$ nm -D /usr/lib/libdl.so | grep -E 'dl(open|sym|close)'
0000000000001450 T dlclose@@GLIBC_2.2.5
0000000000001390 T dlopen@@GLIBC_2.2.5
00000000000014c0 T dlsym@@GLIBC_2.2.5
What's with the @@GLIBC_2.2.5 suffixes?
Oh hey cool bear - those are just versions, don't worry about them.
Say I did want to worry about them, where could I read more about them?
You can check the LSB Core Specification - but be warned, it's a rabbit hole and a half.
So, since libdl.so contains the symbols we need, and it's in GCC's library path, we should be able to link against it with just -ldl:
$ gcc -Wall load.c -o load -ldl
$ file load
load: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=0d246f67c894d7032d0d5093ec01625e58711034, for GNU/Linux 3.2.0, not stripped
Hooray! Now we just have to run it:
$ ./load
failed to load library
Ah. Well let's be thankful our C program had basic error checking this time.
So, main is not a dynamic library?
I guess not.
Is there any way to get a little more visibility into why dlopen fails?
Sure! We can use the LD_DEBUG environment variable.
$ LD_DEBUG=all ./load
160275: symbol=__vdso_clock_gettime; lookup in file=linux-vdso.so.1 [0]
160275: binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_clock_gettime' [LINUX_2.6]
160275: symbol=__vdso_gettimeofday; lookup in file=linux-vdso.so.1 [0]
160275: binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_gettimeofday' [LINUX_2.6]
160275: symbol=__vdso_time; lookup in file=linux-vdso.so.1 [0]
160275: binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_time' [LINUX_2.6]
160275: symbol=__vdso_getcpu; lookup in file=linux-vdso.so.1 [0]
160275: binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_getcpu' [LINUX_2.6]
160275: symbol=__vdso_clock_getres; lookup in file=linux-vdso.so.1 [0]
160275: binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_clock_getres' [LINUX_2.6]
Hold on, hold on - what are these for?
vDSO is for "virtual dynamic shared object" - the short answer is, it makes some syscalls faster. The long answer, you can read on LWN.
160275: file=libdl.so.2 [0]; needed by ./load [0]
160275: find library=libdl.so.2 [0]; searching
160275: search cache=/etc/ld.so.cache
160275: trying file=/usr/lib/libdl.so.2
160275:
160275: file=libdl.so.2 [0]; generating link map
160275: dynamic: 0x00007f5be513dcf0 base: 0x00007f5be5139000 size: 0x0000000000005090
160275: entry: 0x00007f5be513a210 phdr: 0x00007f5be5139040 phnum: 11
Ah, it's loading libdl.so - we asked for that! What's /etc/ld.so.cache though?
Well, libdl.so is a dynamic library, so it's loaded at runtime, so the dynamic linker has to find it first.
There's a config file at /etc/ld.so.conf:
$ cat /etc/ld.so.conf
# Dynamic linker/loader configuration.
# See ld.so(8) and ldconfig(8) for details.
include /etc/ld.so.conf.d/*.conf
$ cat /etc/ld.so.conf.d/*.conf
/usr/lib/libfakeroot
/usr/lib32
/usr/lib/openmpi
And to avoid repeating lookups, there's a cache at /etc/ld.so.cache:
$ xxd /etc/ld.so.cache | tail -60 | head
00030bb0: 4641 7564 696f 2e73 6f00 6c69 6246 4175 FAudio.so.libFAu
00030bc0: 6469 6f2e 736f 002f 7573 722f 6c69 6233 dio.so./usr/lib3
00030bd0: 322f 6c69 6246 4175 6469 6f2e 736f 006c 2/libFAudio.so.l
00030be0: 6962 4547 4c5f 6e76 6964 6961 2e73 6f2e ibEGL_nvidia.so.
00030bf0: 3000 2f75 7372 2f6c 6962 2f6c 6962 4547 0./usr/lib/libEG
00030c00: 4c5f 6e76 6964 6961 2e73 6f2e 3000 6c69 L_nvidia.so.0.li
00030c10: 6245 474c 5f6e 7669 6469 612e 736f 2e30 bEGL_nvidia.so.0
00030c20: 002f 7573 722f 6c69 6233 322f 6c69 6245 ./usr/lib32/libE
00030c30: 474c 5f6e 7669 6469 612e 736f 2e30 006c GL_nvidia.so.0.l
00030c40: 6962 4547 4c5f 6e76 6964 6961 2e73 6f00 ibEGL_nvidia.so.
Let's keep going through our LD_DEBUG=all output:
160275: file=libc.so.6 [0]; needed by ./load [0]
160275: find library=libc.so.6 [0]; searching
160275: search cache=/etc/ld.so.cache
160275: trying file=/usr/lib/libc.so.6
160275:
160275: file=libc.so.6 [0]; generating link map
160275: dynamic: 0x00007f2d14b7a9c0 base: 0x00007f2d149b9000 size: 0x00000000001c82a0
160275: entry: 0x00007f2d149e1290 phdr: 0x00007f2d149b9040 phnum: 14
Same deal - but this time it's loading libc.so.6.
160275: checking for version `GLIBC_2.2.5' in file /usr/lib/libdl.so.2 [0] required by file ./load [0]
160275: checking for version `GLIBC_2.2.5' in file /usr/lib/libc.so.6 [0] required by file ./load [0]
160275: checking for version `GLIBC_PRIVATE' in file /lib64/ld-linux-x86-64.so.2 [0] required by file /usr/lib/libdl.so.2 [0]
160275: checking for version `GLIBC_PRIVATE' in file /usr/lib/libc.so.6 [0] required by file /usr/lib/libdl.so.2 [0]
160275: checking for version `GLIBC_2.4' in file /usr/lib/libc.so.6 [0] required by file /usr/lib/libdl.so.2 [0]
160275: checking for version `GLIBC_2.2.5' in file /usr/lib/libc.so.6 [0] required by file /usr/lib/libdl.so.2 [0]
160275: checking for version `GLIBC_2.2.5' in file /lib64/ld-linux-x86-64.so.2 [0] required by file /usr/lib/libc.so.6 [0]
160275: checking for version `GLIBC_2.3' in file /lib64/ld-linux-x86-64.so.2 [0] required by file /usr/lib/libc.so.6 [0]
160275: checking for version `GLIBC_PRIVATE' in file /lib64/ld-linux-x86-64.so.2 [0] required by file /usr/lib/libc.so.6 [0]
Ah, there are those versions I was asking about earlier.
Yup. As you can see, there's a bunch of them. Also, I'm pretty sure "private" is not very semver, but let's not get distracted.
160275: Initial object scopes
160275: object=./load [0]
160275: scope 0: ./load /usr/lib/libdl.so.2 /usr/lib/libc.so.6 /lib64/ld-linux-x86-64.so.2
160275:
160275: object=linux-vdso.so.1 [0]
160275: scope 0: ./load /usr/lib/libdl.so.2 /usr/lib/libc.so.6 /lib64/ld-linux-x86-64.so.2
160275: scope 1: linux-vdso.so.1
160275:
160275: object=/usr/lib/libdl.so.2 [0]
160275: scope 0: ./load /usr/lib/libdl.so.2 /usr/lib/libc.so.6 /lib64/ld-linux-x86-64.so.2
160275:
160275: object=/usr/lib/libc.so.6 [0]
160275: scope 0: ./load /usr/lib/libdl.so.2 /usr/lib/libc.so.6 /lib64/ld-linux-x86-64.so.2
160275:
160275: object=/lib64/ld-linux-x86-64.so.2 [0]
160275: no scope
Here the dynamic linker is just telling us the order in which it'll look for symbols in various object files. Note that there's a specific order for each object file - they just happen to be mostly the same here.
For ./load, it'll first look in ./load, the executable we're loading, then in libdl, then in libc, then in... the dynamic linker itself.
Wait... it looks for symbols in ./load? An executable?
So executables are also dynamic libraries?
Well... sort of. Let's come back to that later.
160275: relocation processing: /usr/lib/libc.so.6
160275: symbol=_res; lookup in file=./load [0]
160275: symbol=_res; lookup in file=/usr/lib/libdl.so.2 [0]
160275: symbol=_res; lookup in file=/usr/lib/libc.so.6 [0]
160275: binding file /usr/lib/libc.so.6 [0] to /usr/lib/libc.so.6 [0]: normal symbol `_res' [GLIBC_2.2.5]
160275: symbol=stderr; lookup in file=./load [0]
160275: binding file /usr/lib/libc.so.6 [0] to ./load [0]: normal symbol `stderr' [GLIBC_2.2.5]
160275: symbol=error_one_per_line; lookup in file=./load [0]
160275: symbol=error_one_per_line; lookup in file=/usr/lib/libdl.so.2 [0]
160275: symbol=error_one_per_line; lookup in file=/usr/lib/libc.so.6 [0]
160275: binding file /usr/lib/libc.so.6 [0] to /usr/lib/libc.so.6 [0]: normal symbol `error_one_per_line' [GLIBC_2.2.5]
(etc.)
Okay, there's a lot of these, so let's skip ahead. But you can see it looks up symbols in the order it determined earlier: first ./load, then libdl, then libc.
160275: calling init: /lib64/ld-linux-x86-64.so.2
160275:
160275:
160275: calling init: /usr/lib/libc.so.6
160275:
160275:
160275: calling init: /usr/lib/libdl.so.2
160275:
160275:
160275: initialize program: ./load
160275:
160275:
160275: transferring control: ./load
At this point it's done (well, done enough) loading dynamic libraries and initializing them, and it has transferred control to our program, ./load.
160275: symbol=dlopen; lookup in file=./load [0]
160275: symbol=dlopen; lookup in file=/usr/lib/libdl.so.2 [0]
160275: binding file ./load [0] to /usr/lib/libdl.so.2 [0]: normal symbol `dlopen' [GLIBC_2.2.5]
Uhhh Amos, why is it still doing symbol lookups? Wasn't it done loading libdl.so?
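Ehhh, it was "done enough". Remember the RTLD_LAZY flag we passed to dlopen? Lazy binding (looking up a function's address only when it's first called) is also what the dynamic linker does by default for the executable itself, at least on my Linux distro.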
Oh. And I suppose the "implementation-defined time" is now?
Correct.
160275: file=./main [0]; dynamically loaded by ./load [0]
160275: file=./main [0]; generating link map
Oohh, it's actually loading ./main!
Yes, because we called dlopen! It even says that it's "dynamically loaded" by ./load, our test executable.
Well, what happens next? Any error messages?
Unfortunately, there aren't. It just looks up fwrite (which I'm assuming is what our fprintf call compiled to) so we can print our own error message, then calls finalizers and exits:
160275: symbol=fwrite; lookup in file=./load [0]
160275: symbol=fwrite; lookup in file=/usr/lib/libdl.so.2 [0]
160275: symbol=fwrite; lookup in file=/usr/lib/libc.so.6 [0]
160275: binding file ./load [0] to /usr/lib/libc.so.6 [0]: normal symbol `fwrite' [GLIBC_2.2.5]
failed to load library
160275:
160275: calling fini: ./load [0]
160275:
160275:
160275: calling fini: /usr/lib/libdl.so.2 [0]
160275:
So we don't know what went wrong?
Well... remember when we tried to make sure libdl.so had dlopen and friends? We had to use nm's -D flag.
D for "dynamic", yes.
But when we found that main provided the greet symbol, we didn't use -D.
And if we do...
$ nm -D main
w __cxa_finalize@@GLIBC_2.2.5
w __gmon_start__
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
U __libc_start_main@@GLIBC_2.2.5
U printf@@GLIBC_2.2.5
...there's no sign of greet.
Ohh. So for main, greet is in one of the symbol tables, but not the dynamic symbol table.
Correct!
In fact, if we strip main, all those symbols are gone.
Before:
$ nm main | grep " T "
00000000000011f8 T _fini
0000000000001150 T greet
00000000000011f0 T __libc_csu_fini
0000000000001180 T __libc_csu_init
0000000000001139 T main
0000000000001040 T _start
$ stat -c '%s bytes' main
16664 bytes
After:
$ strip main
$ nm main | grep " T "
nm: main: no symbols
$ stat -c '%s bytes' main
14328 bytes
But it still has dynamic symbols right? Even after stripping?
Yes, it needs printf!
$ nm -D main
w __cxa_finalize@@GLIBC_2.2.5
w __gmon_start__
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
U __libc_start_main@@GLIBC_2.2.5
U printf@@GLIBC_2.2.5
Okay, so main has a dynamic symbol table, it just doesn't export anything.
Can we make it somehow both an executable and a dynamic library?
Bear, I'm so glad you asked. Yes we can.
Let's do it, just for fun:
#include <unistd.h>
#include <stdio.h>
// This tells GCC to make a section named `.interp` and store
// `/lib64/ld-linux-x86-64.so.2` (the path of the dynamic linker) in it.
//
// (Normally it would do it itself, but since we're going to be using the
// `-shared` flag, it won't.)
const char interpreter[] __attribute__((section(".interp"))) = "/lib64/ld-linux-x86-64.so.2";
void greet(const char *name) {
printf("Hello, %s!\n", name);
}
// Normally, we'd link with an object file that has its own entry point,
// and *then* calls `main`, but since we're using the `-shared` flag, we're
// linking to *another* object file, and we need to provide our own entry point.
//
// Unlike main, this one does not return an `int`, and we can never return from
// it, we need to call `_exit` or we'll crash.
void entry() {
greet("rain");
_exit(0);
}
And now... we make a dynamic library / executable hybrid:
$ gcc -Wall -shared main.c -o libmain.so -Wl,-soname,libmain.so -Wl,-e,entry
$ file libmain.so
libmain.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=460bf95f9cd22afa074399512bd9290c20b552ff, not stripped
Cool bear's hot tip
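-Wl,-some-option is how we tell GCC to pass options to the linker. -Wl,-foo will pass -foo to GNU ld. -Wl,-foo,bar will pass -foo and bar as two separate arguments, which, for an option that takes a value, means the same thing as -foo=bar.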
-soname isn't technically required for this demo to work, but it's a thing, so we might as well set it.
As for -e entry (which is what -Wl,-e,entry passes), that one is required, otherwise we won't be able to run it as an executable. Remember, we're bringing our own entry point!
And it works as an executable:
$ ./libmain.so
Hello, rain!
And as a library:
// in `load.c`
int main(void) {
// was "main"
void *lib = dlopen("./libmain.so", RTLD_LAZY);
// etc.
}
$ gcc -Wall load.c -o load -ldl
$ ./load
Hello, venus!
Whoa, that's neat! Can we take a look at LD_DEBUG output for this run?
Sure, let's g-
...but this time, can we filter it out a little, so it fits in one or two screens?
Okay, sure.
When LD_DEBUG is set, the dynamic linker (ld-linux-x86-64.so.2, which is also an executable / dynamic library hybrid) outputs debug information to "standard error" (stderr), which has file descriptor number 2. So, if we want to filter it, we'll need to redirect "standard error" to "standard output" with 2>&1 - let's try it out:
$ LD_DEBUG=all ./load 2>&1 | grep 'strcpy'
172425: symbol=strcpy; lookup in file=./load [0]
172425: symbol=strcpy; lookup in file=/usr/lib/libdl.so.2 [0]
172425: symbol=strcpy; lookup in file=/usr/lib/libc.so.6 [0]
172425: binding file /usr/lib/libdl.so.2 [0] to /usr/lib/libc.so.6 [0]: normal symbol `strcpy' [GLIBC_2.2.5]
Yeah, that works!
Next up - all is a bit verbose, so let's try setting LD_DEBUG to files instead. Also, let's pipe everything into wc -l, to count lines:
$ LD_DEBUG=all ./load 2>&1 | wc -l
666
$ LD_DEBUG=files ./load 2>&1 | wc -l
50
Okay, 50 lines! That's much more reasonable:
$ LD_DEBUG=files ./load 2>&1 | head -10
173292:
173292: file=libdl.so.2 [0]; needed by ./load [0]
173292: file=libdl.so.2 [0]; generating link map
173292: dynamic: 0x00007f3a1df6fcf0 base: 0x00007f3a1df6b000 size: 0x0000000000005090
173292: entry: 0x00007f3a1df6c210 phdr: 0x00007f3a1df6b040 phnum: 11
173292:
173292:
173292: file=libc.so.6 [0]; needed by ./load [0]
173292: file=libc.so.6 [0]; generating link map
173292: dynamic: 0x00007f3a1df639c0 base: 0x00007f3a1dda2000 size: 0x00000000001c82a0
Mhh, having the output prefixed by the PID (process identifier, here 173292) is a bit annoying; we can use sed (the "Stream EDitor") to fix that.
By the power vested in me by regular expressions, I filter thee:
$ LD_DEBUG=files ./load 2>&1 | sed -E -e 's/^[[:blank:]]+[[:digit:]]+:[[:blank:]]*//' | head
file=libdl.so.2 [0]; needed by ./load [0]
file=libdl.so.2 [0]; generating link map
dynamic: 0x00007fe98d502cf0 base: 0x00007fe98d4fe000 size: 0x0000000000005090
entry: 0x00007fe98d4ff210 phdr: 0x00007fe98d4fe040 phnum: 11
file=libc.so.6 [0]; needed by ./load [0]
file=libc.so.6 [0]; generating link map
dynamic: 0x00007fe98d4f69c0 base: 0x00007fe98d335000 size: 0x00000000001c82a0
Let's break that down. The -E flag enables extended regular expressions. My advice? Don't bother learning non-extended regular expressions.
-e specifies a script for sed to run. Here, our script has the s/pattern/replacement/ command, which substitutes pattern with replacement.
You can probably make sense of the pattern by just using a cheat sheet, but here it is:
^ matches the beginning of a line
[[:blank:]]+ matches one or more space or tab characters
[[:digit:]]+ matches one or more decimal digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
: matches a literal colon character
[[:blank:]]* matches zero or more space or tab characters
And our replacement is "". The empty string.
Hey, silly question - why are we using ' (a single quote) around sed scripts? Don't you usually use double quotes?
Well, I don't want the shell to expand whatever is inside. See for example:
$ echo "$(whoami)"
amos
Compared to:
$ echo '$(whoami)'
$(whoami)
Since there might be a bunch of strange characters, that are meaningful to my shell, I don't want my shell to interpolate any of it, so, single quotes.
Got it, thanks.
Cool bear's hot tip
Note that the above sed command could also be achieved with a simple cut -f 2.
But where's the fun in that?
So.
We've filtered out a lot of noise, but we're still getting those blank lines - we can use another sed command to filter those out: /pattern/d - where the d stands for "delete".
Our pattern will just be ^$ - it matches the start of a line and the end of a line, with nothing in between, so only empty lines will (should?) match.
$ LD_DEBUG=files ./load 2>&1 | sed -E -e 's/^[[:blank:]]+[[:digit:]]+:[[:blank:]]*//' -e '/^$/d'
file=libdl.so.2 [0]; needed by ./load [0]
file=libdl.so.2 [0]; generating link map
dynamic: 0x00007fc870f73cf0 base: 0x00007fc870f6f000 size: 0x0000000000005090
entry: 0x00007fc870f70210 phdr: 0x00007fc870f6f040 phnum: 11
file=libc.so.6 [0]; needed by ./load [0]
file=libc.so.6 [0]; generating link map
dynamic: 0x00007fc870f679c0 base: 0x00007fc870da6000 size: 0x00000000001c82a0
entry: 0x00007fc870dce290 phdr: 0x00007fc870da6040 phnum: 14
calling init: /lib64/ld-linux-x86-64.so.2
calling init: /usr/lib/libc.so.6
calling init: /usr/lib/libdl.so.2
initialize program: ./load
transferring control: ./load
Here comes the good stuff!
file=./libmain.so [0]; dynamically loaded by ./load [0]
file=./libmain.so [0]; generating link map
dynamic: 0x00007fc870fa6e10 base: 0x00007fc870fa3000 size: 0x0000000000004040
entry: 0x00007fc870fa4150 phdr: 0x00007fc870fa3040 phnum: 11
calling init: ./libmain.so
opening file=./libmain.so [0]; direct_opencount=1
calling fini: ./libmain.so [0]
file=./libmain.so [0]; destroying link map
calling fini: ./load [0]
calling fini: /usr/lib/libdl.so.2 [0]
Hello, venus!
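So, the output is a little out of order here: stderr (standard error) and stdout (standard output) are interleaved, and since stdout is a pipe rather than a terminal, it's buffered, so "Hello, venus!" only gets written out when the program exits. The greet call itself actually happens before the finalizers are called.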
Alright, and what if we wanted a regular dynamic library, one that isn't also an executable?
That's much simpler. We don't need an entry point, we don't need to use a funky GCC attribute to add an .interp section, and we only need the one linker flag.
// in `greet.c`
#include <stdio.h>
void greet(const char *name) {
printf("Hello, %s!\n", name);
}
Do we need to do anything special to export greet?
No, we don't! In C, functions have external linkage by default, so we're all good. If we didn't want to export it, we'd use the static keyword to ask for internal linkage.
And then just use -shared, and specify an output name of libsomething.so:
$ gcc -Wall -shared greet.c -o libgreet.so
And let's just adjust load.c to load libgreet.so (it was loading libmain.so previously):
#include <dlfcn.h>
#include <stdio.h>
typedef void (*greet_t)(const char *name);
int main(void) {
// this was `./libmain.so`
void *lib = dlopen("./libgreet.so", RTLD_LAZY);
if (!lib) {
fprintf(stderr, "failed to load library\n");
return 1;
}
greet_t greet = (greet_t) dlsym(lib, "greet");
if (!greet) {
fprintf(stderr, "could not look up symbol 'greet'\n");
return 1;
}
greet("venus");
dlclose(lib);
return 0;
}
Okay! I think that now, 34 minutes in, we know what a dylib is.
More or less.
And now, some Rust
Let's write some Rust.
$ cargo new greet-rs
Created binary (application) `greet-rs` package
// in `src/main.rs`
fn main() {
greet("fresh coffee");
}
fn greet(name: &str) {
println!("Hello, {}", name);
}
$ cargo run -q
Hello, fresh coffee
It sure greets. But how does it actually work? Is it interpreted? Is it compiled?
I don't think Rust has an interpreter...
Well, actually... how do you think const-eval works?
M.. magic?
No, M is for Miri. It interprets mid-level intermediate representation, and voilà: compile-time evaluation.
I thought Miri was used to detect undefined behavior?
That too! It's a neat tool.
In that case, though, our code is definitely being compiled.
cargo run does two things: first, cargo build, then, run the resulting executable.
$ cargo build
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
$ ./target/debug/greet-rs
Hello, fresh coffee
Now that we have a Linux executable, we can poke at it!
For example, we can look at its symbols:
$ nm ./target/debug/greet-rs | grep " T "
000000000002ccc0 T __divti3
000000000002d0c8 T _fini
000000000002d0c0 T __libc_csu_fini
000000000002d050 T __libc_csu_init
00000000000053f0 T main
00000000000111f0 T __rdos_backtrace_create_state
0000000000010e60 T __rdos_backtrace_pcinfo
00000000000110e0 T __rdos_backtrace_syminfo
0000000000005580 T __rust_alloc
000000000000e470 T rust_begin_unwind
(etc.)
Mh, there's a lot of those. In my version, there are 188 T symbols.
We can also look at the dynamic symbols:
$ nm -D ./target/debug/greet-rs
U abort@@GLIBC_2.2.5
U bcmp@@GLIBC_2.2.5
U bsearch@@GLIBC_2.2.5
U close@@GLIBC_2.2.5
w __cxa_finalize@@GLIBC_2.2.5
w __cxa_thread_atexit_impl@@GLIBC_2.18
U dladdr@@GLIBC_2.2.5
U dl_iterate_phdr@@GLIBC_2.2.5
U __errno_location@@GLIBC_2.2.5
U free@@GLIBC_2.2.5
This time, there are only 79 of them. But, see, it's not that different from our C executable. Since our Rust executable uses the standard library (it's not no_std), it also uses the C library. Here, it's glibc.
But does it export anything?
$ nm -D --defined-only ./target/debug/greet-rs
Mh nope. Does that command even work, though? Is this thing on?
$ nm -D --defined-only /usr/lib/libdl.so
0000000000001dc0 T dladdr@@GLIBC_2.2.5
0000000000001df0 T dladdr1@@GLIBC_2.3.3
0000000000001450 T dlclose@@GLIBC_2.2.5
0000000000001860 T dlerror@@GLIBC_2.2.5
0000000000005040 B _dlfcn_hook@@GLIBC_PRIVATE
0000000000001f20 T dlinfo@@GLIBC_2.3.3
00000000000020b0 T dlmopen@@GLIBC_2.3.4
0000000000001390 T dlopen@@GLIBC_2.2.5
00000000000014c0 T dlsym@@GLIBC_2.2.5
00000000000015c0 W dlvsym@@GLIBC_2.2.5
(etc.)
Okay, so it's not a dynamic library.
Correct!
But can we use a dynamic library from Rust?
Sure we can! That's how we get malloc, free, etc.
But how?
That's a fair question - after all, our test executable load uses dlopen:
$ ltrace -l 'libdl*' ./load
load->dlopen("./libmain.so", 1) = 0x55cf98c282c0
load->dlsym(0x55cf98c282c0, "greet") = 0x7fe0d4074129
Hello, venus!
load->dlclose(0x55cf98c282c0) = 0
+++ exited (status 0) +++
Our greet-rs executable, however, doesn't:
$ ltrace -l 'libdl*' ./greet-rs/target/debug/greet-rs
Hello, fresh coffee
+++ exited (status 0) +++
Exactly, so, how does it load them?
Well... it doesn't. The dynamic linker does it, before our program even starts.
We can use ldd to find out which dynamic libraries an ELF file depends on. Our executable is an ELF file. Dynamic libraries on this system are ELF files too. Even our .o files have been ELF files all along.
Cool bear's hot tip
This wasn't always the case on Linux (or MINIX, or System V).
In the times of yore, there was a.out. There was stabbing involved.
$ ldd ./greet-rs/target/debug/greet-rs
linux-vdso.so.1 (0x00007ffc911e1000)
libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f17479b1000)
libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f174798f000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f1747975000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f17477ac000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f1747a27000)
But I don't love ldd. ldd is just a bash script!
$ head -3 $(which ldd)
#! /usr/bin/bash
# Copyright (C) 1996-2020 Free Software Foundation, Inc.
# This file is part of the GNU C Library.
And how do we feel about bash in this house?
Conflicted.
In fact, ldd just sets an environment variable and calls the dynamic linker instead - which, as we mentioned earlier, is both a dynamic library and an executable:
$ LD_TRACE_LOADED_OBJECTS=1 /lib64/ld-linux-x86-64.so.2 ./greet-rs/target/debug/greet-rs
linux-vdso.so.1 (0x00007ffd5d1c3000)
libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f3c953da000)
libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f3c953b8000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f3c9539e000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f3c951d5000)
/lib64/ld-linux-x86-64.so.2 (0x00007f3c95450000)
The main reason I don't like that is that running ldd on an executable actually loads it, and, if it's a malicious binary, this can result in arbitrary code execution.
Another reason I don't love ldd is that its output is all flat:
$ ldd /bin/bash
linux-vdso.so.1 (0x00007ffcbd1ef000)
libreadline.so.8 => /usr/lib/libreadline.so.8 (0x00007fd0dbabf000)
libdl.so.2 => /usr/lib/libdl.so.2 (0x00007fd0dbab9000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fd0db8f0000)
libncursesw.so.6 => /usr/lib/libncursesw.so.6 (0x00007fd0db87f000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fd0dbc35000)
And I like trees:
$ lddtree /bin/bash
/bin/bash (interpreter => /lib64/ld-linux-x86-64.so.2)
libreadline.so.8 => /usr/lib/libreadline.so.8
libncursesw.so.6 => /usr/lib/libncursesw.so.6
libdl.so.2 => /usr/lib/libdl.so.2
libc.so.6 => /usr/lib/libc.so.6
Fancy. But Amos... isn't lddtree also a bash script?
It is! But it doesn't use ld.so, it uses scanelf, or readelf and objdump.
What if I wanted something that isn't written in bash? For personal reasons?
There's also a C++11 thing and I also did a Go thing, there's plenty of poison from which to pick yours.
None of this tells us how the dynamic linker knows which libraries to load. Reading the bash source of ldd is especially unhelpful, since it just lets ld.so do all the hard work.
However, if we use readelf...
$ readelf --dynamic ./greet-rs/target/debug/greet-rs | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libdl.so.2]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x0000000000000001 (NEEDED) Shared library: [ld-linux-x86-64.so.2]
...we can see that the names of the dynamic libraries we need are right there in the dynamic section of our ELF file.
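To get a feel for what readelf is doing here, here's a rough, stdlib-only Rust sketch that pulls the same NEEDED names out of an ELF file. It walks the section headers, finds the section of type SHT_DYNAMIC, and resolves each DT_NEEDED entry through the string table that the section's sh_link field points at. It assumes a well-formed 64-bit little-endian ELF file that still has its section headers (which a normal gcc or rustc build does), and it skips most of the validation a real tool would do. The names needed_libs, bytes_at and the u*_at helpers are made up for this example.

```rust
use std::convert::TryInto;

// Reads N bytes at `off`, or None if that range falls outside the buffer.
fn bytes_at<const N: usize>(b: &[u8], off: usize) -> Option<[u8; N]> {
    b.get(off..off.checked_add(N)?)?.try_into().ok()
}
fn u16_at(b: &[u8], off: usize) -> Option<u16> { Some(u16::from_le_bytes(bytes_at(b, off)?)) }
fn u32_at(b: &[u8], off: usize) -> Option<u32> { Some(u32::from_le_bytes(bytes_at(b, off)?)) }
fn u64_at(b: &[u8], off: usize) -> Option<u64> { Some(u64::from_le_bytes(bytes_at(b, off)?)) }

/// Lists the DT_NEEDED entries of a 64-bit little-endian ELF image, i.e. what
/// `readelf --dynamic | grep NEEDED` shows. Returns None for anything that
/// isn't such a file (or is truncated).
fn needed_libs(elf: &[u8]) -> Option<Vec<String>> {
    const SHT_DYNAMIC: u32 = 6; // section type of `.dynamic`
    const DT_NULL: u64 = 0; // terminates the dynamic array
    const DT_NEEDED: u64 = 1; // d_val is an offset into the string table

    // e_ident: magic, then EI_CLASS = 2 (64-bit), EI_DATA = 1 (little-endian)
    if elf.get(0..4)? != b"\x7fELF" || *elf.get(4)? != 2 || *elf.get(5)? != 1 {
        return None;
    }
    let shoff = u64_at(elf, 0x28)? as usize; // e_shoff
    let shentsize = u16_at(elf, 0x3a)? as usize; // e_shentsize
    let shnum = u16_at(elf, 0x3c)? as usize; // e_shnum
    let section = |i: usize| shoff + i * shentsize;

    for i in 0..shnum {
        let sh = section(i);
        if u32_at(elf, sh + 4)? != SHT_DYNAMIC {
            continue;
        }
        let dyn_off = u64_at(elf, sh + 24)? as usize; // sh_offset
        let dyn_size = u64_at(elf, sh + 32)? as usize; // sh_size
        // sh_link is the index of the string table section (normally `.dynstr`)
        let strtab = section(u32_at(elf, sh + 40)? as usize);
        let str_off = u64_at(elf, strtab + 24)? as usize;

        let mut libs = Vec::new();
        // each Elf64_Dyn entry is 16 bytes: d_tag, then d_val
        for entry in (dyn_off..dyn_off + dyn_size).step_by(16) {
            match u64_at(elf, entry)? {
                DT_NULL => break,
                DT_NEEDED => {
                    let start = str_off + u64_at(elf, entry + 8)? as usize;
                    let len = elf.get(start..)?.iter().position(|&c| c == 0)?;
                    libs.push(String::from_utf8_lossy(&elf[start..start + len]).into_owned());
                }
                _ => {}
            }
        }
        return Some(libs);
    }
    Some(Vec::new())
}
```

Feeding it the bytes of ./greet-rs/target/debug/greet-rs (via std::fs::read) should list the same names readelf printed above.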
But here's the thing - how did that happen? I don't remember asking for any dynamic library, and yet here they are.
In other words, what do we write in Rust, so that our executable requires another dynamic library?
Well, we write this:
use std::{ffi::CString, os::raw::c_char};
#[link(name = "greet")]
extern "C" {
fn greet(name: *const c_char);
}
fn main() {
let name = CString::new("fresh coffee").unwrap();
unsafe {
greet(name.as_ptr());
}
}
Cool bear's hot tip
There is a gotcha here - if you write the above code like this instead:
let name = CString::new("fresh coffee").unwrap().as_ptr();
unsafe {
greet(name);
}
It compiles, but it doesn't work. Here's what the incorrect version is equivalent to:
let name = {
// builds a new CString
let cstring = CString::new("fresh coffee").unwrap();
// derives a raw pointer (`*const c_char`) from the CString.
// since it's not a reference, it doesn't have a lifetime, nothing
// in the type system links it to `cstring`
let ptr = cstring.as_ptr();
// here, `cstring` goes out of scope and is freed, so `ptr` is now
// dangling
ptr
};
unsafe {
// `name` is what `ptr` was in our inner scope, and it's dangling,
// so this will crash and/or do very naughty things.
greet(name);
}
Esteban wants me to tell you that this is a big gotcha because the Rust compiler doesn't catch this at all, at least not yet, so you have to be careful not to make this mistake. It's a good example of the dangers of unsafe.
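One way to make this mistake harder to write is to keep the raw pointer out of callers' hands entirely: wrap the FFI call in a safe function that takes a &CStr, so the borrow checker has to prove the string outlives the call. In this sketch, c_greet is a hypothetical stand-in for the real extern "C" greet (it returns the greeting instead of printing it), so the example runs without libgreet.so.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical stand-in for the C `greet`: reads the NUL-terminated string
/// behind `name` and returns the greeting instead of printing it.
unsafe fn c_greet(name: *const c_char) -> String {
    format!("Hello, {}!", CStr::from_ptr(name).to_string_lossy())
}

/// Safe wrapper: because it borrows a `&CStr`, the compiler guarantees the
/// string is still alive for the whole call, so no dangling pointer can
/// reach `c_greet`.
fn greet(name: &CStr) -> String {
    // SAFETY: `name` is a valid, NUL-terminated string for the duration of the call.
    unsafe { c_greet(name.as_ptr()) }
}
```

With a wrapper like this, greet(&CString::new("fresh coffee").unwrap()) is fine: the temporary CString lives until the end of the statement, and there's no raw pointer left lying around for us to accidentally keep.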
Note also that this is not a problem with the cstr crate, which returns a &'static CStr, and which we use further down.
But that doesn't quite work, because...
$ cargo build
Compiling greet-rs v0.1.0 (/home/amos/ftl/greet/greet-rs)
error: linking with `cc` failed: exit code: 1
|
= note: "cc" "-Wl,--as-needed" "-Wl,-z,noexecstack" "-m64" (etc.) "-Wl,-Bdynamic" "-ldl" "-lrt" "-lpthread" "-lgcc_s" "-lc" "-lm" "-lrt" "-lpthread" "-lutil" "-ldl" "-lutil"
= note: /usr/bin/ld: cannot find -lgreet
collect2: error: ld returned 1 exit status
error: aborting due to previous error
error: could not compile `greet-rs`.
There are interesting things! Going on! In this error message!
Yes, I see some -Wl,-something command-line flags there. Is it using the same convention to pass linker flags?
It is!
Is it using... the same linker? GNU ld?
And our libgreet.so from earlier is definitely not in any of the default library paths.
So, we have a couple of options at our disposal. We could copy libgreet.so to, say, /usr/lib. Although it would immediately make everything work, this requires root privileges, so we'll try not to do it.
We could set the RUSTFLAGS environment variable when building our binary:
$ RUSTFLAGS="-L ${PWD}/.." cargo build
Compiling greet-rs v0.1.0 (/home/amos/ftl/greet/greet-rs)
Finished dev [unoptimized + debuginfo] target(s) in 0.17s
Cool bear's hot tip
PWD is an environment variable set to the "present working directory", also called "current working directory".
In bash and zsh, variables like $PWD are expanded - but it's often a good idea to enclose the variable name in curly braces, in case it's followed by other characters that are valid in identifiers.
To avoid this:
$ echo "$PWDOOPS"
We do this:
$ echo "${PWD}OOPS"
/home/amos/ftl/greet/greet-rsOOPS
Finally, -L .. would work just as well, but it's also a good idea to pass absolute paths when specifying search paths. Otherwise, if one of the tools involved passes that argument to another tool, and that other tool changes the current directory, our relative path is now incorrect.
So, setting RUSTFLAGS works. Remembering to set it every time we want to compile is no fun, though.
So we can make a build script instead! It goes in build.rs, which lives not in the src/ folder, but next to it:
// in `build.rs`
use std::path::PathBuf;
fn main() {
let manifest_dir =
PathBuf::from(std::env::var_os("CARGO_MANIFEST_DIR").expect("manifest dir should be set"));
let lib_dir = manifest_dir
.parent()
.expect("manifest dir should have a parent");
println!("cargo:rustc-link-search={}", lib_dir.display());
}
And now we can just cargo build away:
$ cargo build
Compiling greet-rs v0.1.0 (/home/amos/ftl/greet/greet-rs)
Finished dev [unoptimized + debuginfo] target(s) in 0.16s
But will it run?
$ ./target/debug/greet-rs
./target/debug/greet-rs: error while loading shared libraries: libgreet.so: cannot open shared object file: No such file or directory
No, it won't!
But I thought... didn't we... didn't we specify -L so the linker could find libgreet.so?
Yes, the static linker (ld) found it. But the dynamic linker (ld.so) also needs to find it, at runtime.
How do we achieve that? Are there more search paths?
There are more search paths.
$ LD_LIBRARY_PATH="${PWD}/.." ./target/debug/greet-rs
Hello, fresh coffee!
Hooray!
This is also a hassle, though. We probably don't want to specify the library path every time we run greet-rs.
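One way around typing it every time is a tiny launcher that sets the variable for a child process. Here's a minimal sketch (library_path_with and command_with_lib_dir are made-up names) that prepends our directory to whatever LD_LIBRARY_PATH is already set. It drops empty entries on purpose, since glibc's ld.so can interpret an empty entry as "the current directory", which is rarely what you want.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;
use std::process::Command;

/// Prepends `dir` to an existing LD_LIBRARY_PATH value (if any), skipping
/// empty entries so we never accidentally add "the current directory".
fn library_path_with(dir: PathBuf, existing: Option<OsString>) -> Result<OsString, env::JoinPathsError> {
    let mut dirs = vec![dir];
    if let Some(existing) = existing {
        dirs.extend(env::split_paths(&existing).filter(|p| !p.as_os_str().is_empty()));
    }
    env::join_paths(dirs)
}

/// Builds (but doesn't spawn) a Command for `exe` whose dynamic linker will
/// also search `lib_dir`.
fn command_with_lib_dir(exe: &str, lib_dir: PathBuf) -> Result<Command, env::JoinPathsError> {
    let value = library_path_with(lib_dir, env::var_os("LD_LIBRARY_PATH"))?;
    let mut cmd = Command::new(exe);
    cmd.env("LD_LIBRARY_PATH", value);
    Ok(cmd)
}
```

Calling .status() on command_with_lib_dir("./target/debug/greet-rs", PathBuf::from("/home/amos/ftl/greet")) should then behave like the LD_LIBRARY_PATH=... invocation above. It only moves the problem into another program, though.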
As usual, we have a couple of options available. Remember /etc/ld.so.conf.d? There are config files in there. We could just make our own:
# in /etc/ld.so.conf.d/greet.conf
# change this unless your name is also amos, in which
# case, welcome to the club.
/home/amos/ftl/greet
And now, everything w-
$ ./target/debug/greet-rs
./target/debug/greet-rs: error while loading shared libraries: libgreet.so: cannot open shared object file: No such file or directory
Wait, isn't all of that cached, in /etc/ld.so.cache?
Oh, right.
$ sudo ldconfig
Password: hunter2
That should do it.
$ ./target/debug/greet-rs
Hello, fresh coffee!
Hurray! I wonder though: is it such a good idea to modify system configuration just for that?
It probably isn't. Which is why we're going to undo our changes.
$ sudo rm /etc/ld.so.conf.d/greet.conf
$ sudo ldconfig
$ ./target/debug/greet-rs
./target/debug/greet-rs: error while loading shared libraries: libgreet.so: cannot open shared object file: No such file or directory
Now we're back to square one hundred.
The good news is: there is a thing in ELF files that tells the dynamic linker "hey by the way look here for libraries" - it's called "RPATH", or "RUNPATH", actually there's a bunch of these with subtle differences, oh no.
The bad news is: short of creating a .cargo/config file, or setting the RUSTFLAGS environment variable, there's no great way to set the RPATH in Rust right now. There's an open issue, though. Feel free to go ahead and contribute there.
Me? I have an article to finish. And the other good news here is that you can set an executable's RPATH after the fact. You can patch it.
$ ./target/debug/greet-rs
./target/debug/greet-rs: error while loading shared libraries: libgreet.so: cannot open shared object file: No such file or directory
$ readelf -d ./target/debug/greet-rs | grep RUNPATH
$ patchelf --set-rpath "${PWD}/.." ./target/debug/greet-rs
$ ./target/debug/greet-rs
Hello, fresh coffee!
$ readelf -d ./target/debug/greet-rs | grep RUNPATH
0x000000000000001d (RUNPATH) Library runpath: [/home/amos/ftl/greet/greet-rs/..]
Heck, we can even make the RPATH relative to our executable's location.
$ patchelf --set-rpath '$ORIGIN/../../..' ./target/debug/greet-rs
Oh, single quotes again! That way $ORIGIN doesn't get expanded by the shell, since it's not an environment variable: it's special syntax just for the dynamic linker.
Yes! And we can make sure we got it right with readelf:
$ readelf -d ./target/debug/greet-rs | grep RUNPATH
0x000000000000001d (RUNPATH) Library runpath: [$ORIGIN/../../..]
$ ./target/debug/greet-rs
Hello, fresh coffee!
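To make the $ORIGIN rule concrete: ld.so substitutes $ORIGIN (or ${ORIGIN}) with the directory containing the executable (or shared object) whose RUNPATH it's processing. Here's a rough Rust model of that substitution for our greet-rs binary. expand_origin and normalize are hypothetical helpers, and collapsing .. lexically is a simplification: the kernel resolves .. against the real directory tree, symlinks included.

```rust
use std::path::{Component, Path, PathBuf};

/// Substitutes `$ORIGIN` / `${ORIGIN}` in one RUNPATH entry with the
/// directory that contains `exe`.
fn expand_origin(entry: &str, exe: &Path) -> PathBuf {
    let origin = exe.parent().unwrap_or_else(|| Path::new("."));
    let origin = origin.to_string_lossy();
    PathBuf::from(entry.replace("${ORIGIN}", &origin).replace("$ORIGIN", &origin))
}

/// Collapses `.` and `..` purely lexically (no filesystem access).
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

For /home/amos/ftl/greet/greet-rs/target/debug/greet-rs, the entry $ORIGIN/../../.. lands in /home/amos/ftl/greet, which is where libgreet.so lives, and it keeps pointing at the right place if we move the whole greet folder somewhere else.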
Okay, the hard part is over. Kind of.
The thing is, we don't really want to "link against" libgreet.so. We want to be able to dynamically reload it. So first, we have to dynamically load it. With dlopen.
But we can take all that knowledge we've just gained and use, like, the easy 10%, because the rest is irrelevant, you'll see why in a minute.
We've just seen how to use functions from a dynamic library - and dlopen and friends are in a dynamic library, libdl.so.
So let's just do that:
use std::{ffi::c_void, ffi::CString, os::raw::c_char, os::raw::c_int};
#[link(name = "dl")]
extern "C" {
fn dlopen(path: *const c_char, flags: c_int) -> *const c_void;
fn dlsym(handle: *const c_void, name: *const c_char) -> *const c_void;
fn dlclose(handle: *const c_void);
}
// had to look that one up in `dlfcn.h`
// in C, it's a #define. in Rust, it's a proper constant
pub const RTLD_LAZY: c_int = 0x00001;
fn main() {
let lib_name = CString::new("../libgreet.so").unwrap();
let lib = unsafe { dlopen(lib_name.as_ptr(), RTLD_LAZY) };
if lib.is_null() {
panic!("could not open library");
}
let greet_name = CString::new("greet").unwrap();
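let greet = unsafe { dlsym(lib, greet_name.as_ptr()) };
if greet.is_null() {
panic!("could not find symbol `greet`");
}
type Greet = unsafe extern "C" fn(name: *const c_char);
use std::mem::transmute;
// only sound because we checked for null: a Rust fn pointer is never null
let greet: Greet = unsafe { transmute(greet) };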
let name = CString::new("fresh coffee").unwrap();
unsafe {
greet(name.as_ptr());
}
unsafe {
dlclose(lib);
}
}
Cool bear's hot tip
On Windows, you'd normally use LoadLibrary instead of dlopen, unless you used some sort of compatibility layer, like Cygwin.
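Finally, we can remove our Cargo build script (build.rs), and we won't have to use patchelf either, since we're giving dlopen a path (it contains a slash), not just a name, so it doesn't search any library paths. Note that a relative path like ../libgreet.so is resolved against the current working directory, so we need to run the program from the greet-rs folder.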
$ cargo build -q
$ ./target/debug/greet-rs
Hello, fresh coffee!
🎉🎉🎉
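One thing our version glosses over: when dlopen fails, it just returns NULL, and the actual reason is only available from dlerror(). Here's a sketch of a helper that surfaces it (open_lib is a made-up name; RTLD_NOW is 0x00002 in glibc's dlfcn.h, right next to the RTLD_LAZY we used above):

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int, c_void};

#[link(name = "dl")]
extern "C" {
    fn dlopen(path: *const c_char, flags: c_int) -> *mut c_void;
    fn dlerror() -> *mut c_char;
    fn dlclose(handle: *mut c_void) -> c_int;
}

// from glibc's `dlfcn.h`: resolve all symbols at load time
const RTLD_NOW: c_int = 0x00002;

/// Opens `path`, turning a NULL handle into the message from dlerror().
fn open_lib(path: &str) -> Result<*mut c_void, String> {
    let c_path = CString::new(path).map_err(|e| e.to_string())?;
    let handle = unsafe { dlopen(c_path.as_ptr(), RTLD_NOW) };
    if handle.is_null() {
        // dlerror describes the most recent dl* failure on this thread,
        // or returns NULL if there is nothing to report.
        let msg = unsafe { dlerror() };
        return Err(if msg.is_null() {
            "unknown dlopen error".to_string()
        } else {
            unsafe { CStr::from_ptr(msg) }.to_string_lossy().into_owned()
        });
    }
    Ok(handle)
}
```

With glibc, a missing file produces something like "./libgreet.so: cannot open shared object file: No such file or directory", which is a lot more helpful than "could not open library".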
Okay, that's a bunch of unsafe code.
Isn't there, you know, a crate for that?
Sure, let's go crate shopping.
Ooh, libloading looks cool, let's give it a shot:
$ cargo add libloading
Adding libloading v0.6.3 to dependencies
use std::{ffi::CString, os::raw::c_char};
use libloading::{Library, Symbol};
fn main() {
let lib = Library::new("../libgreet.so").unwrap();
unsafe {
let greet: Symbol<unsafe extern "C" fn(name: *const c_char)> = lib.get(b"greet").unwrap();
let name = CString::new("fresh coffee").unwrap();
greet(name.as_ptr());
}
}
Mhhh. unwrap salad.
Alright, sure, let's have main return a Result instead, so we can use ? to propagate errors.
use std::{error::Error, ffi::CString, os::raw::c_char};
use libloading::{Library, Symbol};
fn main() -> Result<(), Box<dyn Error>> {
let lib = Library::new("../libgreet.so")?;
unsafe {
let greet: Symbol<unsafe extern "C" fn(name: *const c_char)> = lib.get(b"greet")?;
let name = CString::new("fresh coffee")?;
greet(name.as_ptr());
}
Ok(())
}
Better. But why are we building an instance of CString?
Couldn't we do that at compile-time? Isn't there.. a crate.. that lets us do C-style strings?
Yeah, yes, there's a crate for that, okay, sure.
$ cargo add cstr
Adding cstr v0.2.2 to dependencies
use cstr::cstr;
use std::{error::Error, os::raw::c_char};
use libloading::{Library, Symbol};
fn main() -> Result<(), Box<dyn Error>> {
let lib = Library::new("../libgreet.so")?;
unsafe {
let greet: Symbol<unsafe extern "C" fn(name: *const c_char)> = lib.get(b"greet")?;
greet(cstr!("rust macros").as_ptr());
}
Ok(())
}
Now this I like. Very clean.
Yeah, libloading is cool!
Note that it'll also work on macOS, where dynamic libraries are actually .dylib files, and on Windows, where you have .dll files.
Let's give it a try:
$ cargo build -q
$ ./target/debug/greet-rs
Hello, rust macros!
Works great.
A Rust dynamic library
...but our libgreet.so is still C!
Can't we use Rust for that too?
Let's try it:
$ cargo new --lib libgreet-rs
Created library `libgreet-rs` package
$ cd libgreet-rs/
You sure about that naming convention, buddy?
Not really, no.
Now, if we want our Rust library to be a drop-in replacement for the C library, we need to match that function signature:
void greet(const char *name);
The Rust equivalent of const char * is *const c_char:
// in `libgreet-rs/src/lib.rs`
use std::{ffi::CStr, os::raw::c_char};
fn greet(name: *const c_char) {
let cstr = unsafe { CStr::from_ptr(name) };
println!("Hello, {}!", cstr.to_str().unwrap());
}
$ cargo build
Compiling libgreet-rs v0.1.0 (/home/amos/ftl/greet/libgreet-rs)
warning: function is never used: `greet`
--> src/lib.rs:3:4
|
3 | fn greet(name: *const c_char) {
| ^^^^^
|
= note: `#[warn(dead_code)]` on by default
warning: 1 warning emitted
Finished dev [unoptimized + debuginfo] target(s) in 0.06s
Uh oh... an unused function. Do we need to ask for external linkage?
Oh right!
pub fn greet(name: *const c_char) {
// etc.
}
And also... maybe we should specify a calling convention?
Right again - since we're replacing a C library, let's make our function extern "C". And since we're dealing with raw pointers, it's also unsafe.
And clippy is telling me to document why it's unsafe, so let's do it.
/// # Safety
/// Pointer must be valid, and point to a null-terminated
/// string. What happens otherwise is UB.
pub unsafe extern "C" fn greet(name: *const c_char) {
let cstr = CStr::from_ptr(name);
println!("Hello, {}!", cstr.to_str().unwrap());
}
Is that it? Are we done?
Let's see... if we compile that library, what do we have in our target/debug/ folder?
$ cargo build -q
$ ls ./target/debug/
build deps examples incremental liblibgreet_rs.d liblibgreet_rs.rlib
Bwahaha liblib.
...yeah. Let's fix that real quick.
# in libgreet-rs/Cargo.toml
[package]
name = "greet" # was "libgreet-rs"
$ cargo clean && cargo build -q
$ ls ./target/debug/
build deps examples incremental libgreet.d libgreet.rlib
Better. So, we don't have an .so file. We don't even have an .a file! So it's not a typical static library either.
What is it?
$ file ./target/debug/libgreet.rlib
./target/debug/libgreet.rlib: current ar archive
Oh, a "GNU ar" archive!
readelf can read those:
$ readelf --symbols ./target/debug/libgreet.rlib | tail -10
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS 3rux23h9i3obhoz1
2: 0000000000000000 0 SECTION LOCAL DEFAULT 3
3: 0000000000000000 0 SECTION LOCAL DEFAULT 5
4: 0000000000000000 0 SECTION LOCAL DEFAULT 6
5: 0000000000000000 0 SECTION LOCAL DEFAULT 7
6: 0000000000000000 0 SECTION LOCAL DEFAULT 18
7: 0000000000000000 72 FUNC GLOBAL HIDDEN 3 _ZN4core3fmt9Arg[...]
8: 0000000000000000 34 OBJECT WEAK DEFAULT 4 __rustc_debug_gd[...]
File: ./target/debug/libgreet.rlib(lib.rmeta)
And so can nm:
$ nm ./target/debug/libgreet.rlib | head
nm: lib.rmeta: file format not recognized
greet-01dfe44a33984d16.197vz32lntcvm24o.rcgu.o:
0000000000000000 V __rustc_debug_gdb_scripts_section__
0000000000000000 T _ZN4core3ptr13drop_in_place17h43462a34d923c292E
greet-01dfe44a33984d16.1ilnatflm2f12z98.rcgu.o:
0000000000000000 V DW.ref.rust_eh_personality
0000000000000000 r GCC_except_table0
0000000000000000 V __rustc_debug_gdb_scripts_section__
U rust_eh_personality
Apparently lib.rmeta is not an ELF file. From the file name, I'd say it's metadata. Let's try extracting it using ar x (for eXtract):
$ ar x ./target/debug/libgreet.rlib lib.rmeta --output /tmp
$ file /tmp/lib.rmeta
/tmp/lib.rmeta: data
$ xxd /tmp/lib.rmeta | head
00000000: 7275 7374 0000 0005 0000 05a8 2372 7573 rust........#rus
00000010: 7463 2031 2e34 362e 3020 2830 3434 3838 tc 1.46.0 (04488
00000020: 6166 6533 2032 3032 302d 3038 2d32 3429 afe3 2020-08-24)
00000030: 0373 7464 f2f2 a5a4 fdaf af90 ae01 0002 .std............
00000040: 112d 6366 3066 3333 6166 3361 3930 3137 .-cf0f33af3a9017
00000050: 3738 0463 6f72 658a e799 f18c fcbf 85d2 78.core.........
00000060: 0100 0211 2d39 3734 3937 6332 3666 6464 ....-97497c26fdd
00000070: 6237 3838 3211 636f 6d70 696c 6572 5f62 b7882.compiler_b
00000080: 7569 6c74 696e 73af b98f f482 e282 db47 uiltins........G
00000090: 0002 112d 6631 6139 6438 6334 3433 6532 ...-f1a9d8c443e2
Mhhh, binary format shenanigans. Is there a crate to parse that?
Of course - but it's inside rustc's codebase.
So, that's all well and good, but it's not yet a drop-in replacement for our C dynamic library.
Turns out, there's a bunch of "crate types", which we can set with the lib.crate-type field in our Cargo manifest, Cargo.toml.
bin is for executables, it's the type of our greet-rs project.
What we have right now is lib.
Then there's dylib:
# in libgreet-rs/Cargo.toml
[lib]
crate-type = ["dylib"]
$ cargo clean && cargo build -q
$ ls target/debug/
build deps examples incremental libgreet.d libgreet.so
Eyyy, we got an .so file!
We sure did! Let's try loading it!
// in `greet-rs/src/main.rs`
fn main() -> Result<(), Box<dyn Error>> {
// new path:
let lib = Library::new("../libgreet-rs/target/debug/libgreet.so")?;
// (cut)
}
$ cargo run -q
Error: DlSym { desc: "../libgreet-rs/target/debug/libgreet.so: undefined symbol: greet" }
Awwwwwwww.
Is it still not exported? I thought we made it pub and everything?
I don't know, let's ask nm.
$ nm ../libgreet-rs/target/debug/libgreet.so | grep greet
0000000000000000 N rust_metadata_greet_8d607b42dd0910ba8c251b9991cf8b1
000000000004b2a0 T _ZN5greet5greet17h1155cd3fae6e8167E
That's not very readable... it's as if the output is mangled somehow?
Let's read nm's man page:
-C
--demangle[=style]
Decode (demangle) low-level symbol names into user-level names.
Besides removing any initial underscore prepended by the system,
this makes C++ function names readable. Different compilers have
different mangling styles. The optional demangling style argument
can be used to choose an appropriate demangling style for your
compiler.
Sure, let's try it:
$ nm --demangle ../libgreet-rs/target/debug/libgreet.so | grep greet
0000000000000000 N rust_metadata_greet_8d607b42dd0910ba8c251b9991cf8b1
000000000004b2a0 T greet::greet
Ohhh there's namespacing going on.
I think I've seen this before... try adding #[no_mangle] on greet?
// in `libgreet-rs/src/lib.rs`
use std::{ffi::CStr, os::raw::c_char};
// new!
#[no_mangle]
pub unsafe extern "C" fn greet(name: *const c_char) {
let cstr = CStr::from_ptr(name);
println!("Hello, {}!", cstr.to_str().unwrap());
}
$ (cd ../libgreet-rs && cargo build -q)
$ nm --demangle ../libgreet-rs/target/debug/libgreet.so | grep greet
000000000004b2a0 T greet
0000000000000000 N rust_metadata_greet_8d607b42dd0910ba8c251b9991cf8b1
Better! Let's try nm again, without --demangle, to make sure:
$ nm ../libgreet-rs/target/debug/libgreet.so | grep greet
000000000004b2a0 T greet
0000000000000000 N rust_metadata_greet_8d607b42dd0910ba8c251b9991cf8b1
Wonderful. If my calculations are correct...
$ cargo run -q
Hello, rust macros!
YES!
Nicely done. Does it also work when loaded from C?
Only one way to find out.
// in `load.c`
#include <dlfcn.h>
#include <stdio.h>
typedef void (*greet_t)(const char *name);
int main(void) {
// new path:
void *lib = dlopen("./libgreet-rs/target/debug/libgreet.so", RTLD_LAZY);
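if (!lib) { fprintf(stderr, "%s\n", dlerror()); return 1; }
// the rest is as before: look up `greet`, call it, close the library
greet_t greet = (greet_t) dlsym(lib, "greet");
if (!greet) { fprintf(stderr, "%s\n", dlerror()); return 1; }
greet("venus");
dlclose(lib);
return 0;
}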
$ gcc -Wall load.c -o load -ldl
$ ./load
Hello, venus!
Seems to work well.
However, for a library that's meant to be loaded like a C library, using crate-type=dylib is discouraged, in favor of crate-type=cdylib (notice the leading "c").
Let's see why:
$ cargo clean && cargo build --release -q
$ ls -lhA ./target/release/libgreet.so
-rwxr-xr-x 2 amos amos 3.8M Sep 16 16:24 ./target/release/libgreet.so
$ strip ./target/release/libgreet.so
$ ls -lhA ./target/release/libgreet.so
-rwxr-xr-x 2 amos amos 3.8M Sep 16 16:24 ./target/release/libgreet.so
$ nm -D ./target/release/libgreet.so | grep " T " | wc -l
2084
Now with cdylib:
[lib]
crate-type = ["cdylib"]
$ cargo clean && cargo build --release -q
$ ls -lhA ./target/release/libgreet.so
-rwxr-xr-x 2 amos amos 2.7M Sep 16 16:25 ./target/release/libgreet.so
$ strip ./target/release/libgreet.so
$ ls -lhA ./target/release/libgreet.so
-rwxr-xr-x 2 amos amos 219K Sep 16 16:25 ./target/release/libgreet.so
$ nm -D ./target/release/libgreet.so | grep " T " | wc -l
2
$ nm -D ./target/release/libgreet.so | grep " T "
0000000000004260 T greet
000000000000cd70 T rust_eh_personality
Oooh. Exports only the symbols we care about and it's way smaller? Sign me the heck up.
Same! And it still loads from C!
// in `load.c`
int main(void) {
// was target/debug, now target/release
void *lib = dlopen("./libgreet-rs/target/release/libgreet.so", RTLD_LAZY);
// (the rest is unchanged)
}
$ gcc -Wall load.c -o load -ldl
$ ./load
Hello, venus!
And from Rust!
// in `greet-rs/src/main.rs`
fn main() -> Result<(), Box<dyn Error>> {
// was target/debug, now target/release
let lib = Library::new("../libgreet-rs/target/release/libgreet.so")?;
unsafe {
let greet: Symbol<unsafe extern "C" fn(name: *const c_char)> = lib.get(b"greet")?;
greet(cstr!("thin library").as_ptr());
}
Ok(())
}
$ cargo run -q
Hello, thin library!
And now, some reloading
So far, we've only ever loaded libraries once. But can we reload them?
How do we even unload a library with libloading?
Well, libdl.so had a dlclose function.
Does libloading even close libraries? Ever?
Let's go hunt for info:
$ lddtree ./target/debug/greet-rs
./target/debug/greet-rs (interpreter => /lib64/ld-linux-x86-64.so.2)
libdl.so.2 => /usr/lib/libdl.so.2
libpthread.so.0 => /usr/lib/libpthread.so.0
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1
libc.so.6 => /usr/lib/libc.so.6
ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
Oh, greet-rs depends on libdl.so.
Maybe we can use ltrace to see if it ever calls dlclose?
$ ltrace ./target/debug/greet-rs
Hello, thin library!
+++ exited (status 0) +++
Oh.
Let's debug our program. I wanted to use LLDB for once (the LLVM debugger), but fate has decided against it (it's broken for Rust 1.46 - the fix has already been merged and will land in the next stable).
So let's use GDB:
$ gdb --quiet --args ./target/debug/greet-rs
Reading symbols from ./target/debug/greet-rs...
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /home/amos/ftl/greet/greet-rs/target/debug/greet-rs.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) break dlclose
Function "dlclose" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (dlclose) pending.
(gdb) run
Starting program: /home/amos/ftl/greet/greet-rs/target/debug/greet-rs
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Hello, thin library!
Breakpoint 1, 0x00007ffff7f91450 in dlclose () from /usr/lib/libdl.so.2
(gdb) bt
#0 0x00007ffff7f91450 in dlclose () from /usr/lib/libdl.so.2
#1 0x000055555555fb4e in <libloading::os::unix::Library as core::ops::drop::Drop>::drop (self=0x7fffffffe168)
at /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/libloading-0.6.3/src/os/unix/mod.rs:305
#2 0x000055555555f54f in core::ptr::drop_in_place () at /home/amos/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:184
#3 0x000055555555ad1f in core::ptr::drop_in_place () at /home/amos/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:184
#4 0x000055555555c1fa in greet_rs::main () at src/main.rs:14
It does call dlclose! When the Library is dropped.
Hurray! That means we can do this, if we want:
// in `greet-rs/src/main.rs`
use cstr::cstr;
use std::{error::Error, io::BufRead, os::raw::c_char};
use libloading::{Library, Symbol};
fn main() -> Result<(), Box<dyn Error>> {
let mut line = String::new();
let stdin = std::io::stdin();
loop {
if let Err(e) = load_and_print() {
eprintln!("Something went wrong: {}", e);
}
println!("-----------------------------");
println!("Press Enter to go again, Ctrl-C to exit...");
// clear the buffer, otherwise read_line keeps appending to it
line.clear();
stdin.lock().read_line(&mut line)?;
}
}
fn load_and_print() -> Result<(), libloading::Error> {
let lib = Library::new("../libgreet-rs/target/release/libgreet.so")?;
unsafe {
let greet: Symbol<unsafe extern "C" fn(name: *const c_char)> = lib.get(b"greet")?;
greet(cstr!("reloading").as_ptr());
}
Ok(())
}
$ cargo run -q
Hello, reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...
Hello, reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...
Hello, reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...
^C
Works well. But.. the library is not changing though.
You're right, let me try to actually change it.
Bear, it doesn't work.
So it would appear.
Why doesn't it work?
Don't you think maybe you ought to have started that article with a proof of concept?
...
What can prevent dlclose from unloading a library?
Well - researching this part took me a little while.
I probably spent two entire days debugging this, and reading code from glibc and the Rust standard library. I worked through hypothesis after hypothesis, and also switched from debugger to debugger, as either the debuggers or their frontends abandoned me halfway through the adventure.
Yeah, but now you know how it works!
Bearly.
So, here's the "short" version.
In rtld (the runtime loader - what I've been calling the dynamic linker all this time), every instance of a DSO (dynamic shared object) is reference-counted.
Let's take our simple C library again: if we dlopen it once, it's mapped.
And if we dlclose it once, it's not mapped anymore.
Let's change load.c to showcase that:
// in `load.c`
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
void assert(void *p) {
if (!p) {
fprintf(stderr, "woops");
exit(1);
}
}
// this function is 101% pragmatic, don't @ me
void print_mapping_count() {
const size_t buf_size = 1024;
char buf[buf_size];
printf("mapping count: ");
fflush(stdout);
snprintf(buf, buf_size, "bash -c 'cat /proc/%d/maps | grep libgreet | wc -l'",
getpid());
system(buf);
}
int main(void) {
print_mapping_count();
printf("> dlopen(RTLD_NOW)\n");
void *lib = dlopen("./libgreet.so", RTLD_NOW);
assert(lib);
print_mapping_count();
printf("> dlclose()\n");
dlclose(lib);
print_mapping_count();
return 0;
}
$ gcc -Wall load.c -o load -ldl
$ ./load
mapping count: 0
> dlopen(RTLD_NOW)
mapping count: 5
> dlclose()
mapping count: 0
This is what it looks like when it works.
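For what it's worth, the same count can be taken without spawning bash, by reading /proc/self/maps directly. Here's a minimal, Linux-only Rust sketch (mapping_count and count_mappings are hypothetical helpers) that counts the lines mentioning a given substring, just like grep libgreet | wc -l does:

```rust
use std::fs;
use std::io;

/// Counts lines of a /proc/<pid>/maps dump that mention `needle`.
fn count_mappings(maps: &str, needle: &str) -> usize {
    maps.lines().filter(|line| line.contains(needle)).count()
}

/// Reads this process's own memory mappings and counts matching lines.
fn mapping_count(needle: &str) -> io::Result<usize> {
    let maps = fs::read_to_string("/proc/self/maps")?;
    Ok(count_mappings(&maps, needle))
}
```

Splitting the parsing from the reading keeps count_mappings easy to check against a captured maps dump, while mapping_count("libgreet") gives the live number.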
Now, when you call dlopen multiple times, it doesn't map the same file over and over again. It doesn't actually load it several times.
Let's confirm by trying it:
int main(void) {
print_mapping_count();
printf("> dlopen(RTLD_NOW)\n");
void *lib = dlopen("./libgreet.so", RTLD_NOW);
assert(lib);
print_mapping_count();
// new!
printf("> dlopen(RTLD_NOW), a second time\n");
void *lib2 = dlopen("./libgreet.so", RTLD_NOW);
assert(lib2);
print_mapping_count();
return 0;
}
$ gcc -Wall load.c -o load -ldl && ./load
mapping count: 0
> dlopen(RTLD_NOW)
mapping count: 5
> dlopen(RTLD_NOW), a second time
mapping count: 5
The number of file mappings remained the same. But how does glibc actually do that?
If we look at the dl_open_worker function in glibc 2.31, we can see it calls _dl_map_object:
// in `glibc/elf/dl-open.c`
// in `dl_open_worker()`
/* Load the named object. */
struct link_map *new;
args->map = new = _dl_map_object (call_map, file, lt_loaded, 0,
mode | __RTLD_CALLMAP, args->nsid);
And the first thing _dl_map_object does is check whether the name we're passing matches a name that's already loaded:
// in `glibc/elf/dl-load.c`
// in `_dl_map_object()`
/* Look for this name among those already loaded. */
for (l = GL(dl_ns)[nsid]._ns_loaded; l; l = l->l_next)
{
/* If the requested name matches the soname of a loaded object,
use that object. Elide this check for names that have not
yet been opened. */
if (__glibc_unlikely ((l->l_faked | l->l_removed) != 0))
continue;
if (!_dl_name_match_p (name, l))
{
const char *soname;
if (__glibc_likely (l->l_soname_added)
|| l->l_info[DT_SONAME] == NULL)
continue;
soname = ((const char *) D_PTR (l, l_info[DT_STRTAB])
+ l->l_info[DT_SONAME]->d_un.d_val);
if (strcmp (name, soname) != 0)
continue;
/* We have a match on a new name -- cache it. */
add_name_to_object (l, soname);
l->l_soname_added = 1;
}
/* We have a match. */
return l;
}
Note that it compares both the DT_SONAME (which we covered earlier) and the actual name passed to dlopen. Even if we somehow managed to change both of these between loads, it goes on to compare a "file identifier", in _dl_map_object_from_fd:
// in `glibc/elf/dl-load.c`
// in `_dl_map_object_from_fd()`
/* Look again to see if the real name matched another already loaded. */
for (l = GL(dl_ns)[nsid]._ns_loaded; l != NULL; l = l->l_next)
if (!l->l_removed && _dl_file_id_match_p (&l->l_file_id, &id))
{
/* The object is already loaded.
Just bump its reference count and return it. */
__close_nocancel (fd);
/* If the name is not in the list of names for this object add
it. */
free (realname);
add_name_to_object (l, name);
return l;
}
And on Linux, the "file id" is a struct made up of a device number and an inode number:
// in `glibc/sysdeps/posix/dl-fileid.h`
/* For POSIX.1 systems, the pair of st_dev and st_ino constitute
a unique identifier for a file. */
struct r_file_id
{
dev_t dev;
ino64_t ino;
};
/* Sample FD to fill in *ID. Returns true on success.
On error, returns false, with errno set. */
static inline bool
_dl_get_file_id (int fd, struct r_file_id *id)
{
struct stat64 st;
if (__glibc_unlikely (__fxstat64 (_STAT_VER, fd, &st) < 0))
return false;
id->dev = st.st_dev;
id->ino = st.st_ino;
return true;
}
So, dlopen tries really hard to identify "loading the same file twice".
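The same (st_dev, st_ino) pair is available from Rust's standard library, through std::os::unix::fs::MetadataExt. This sketch (FileId and file_id are made-up names) uses stat on a path where glibc uses fstat on an open file descriptor, but the identity is the same: two different spellings of a path resolve to the same identity, while a copy of the file gets one of its own.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Same idea as glibc's `struct r_file_id`: device number + inode number.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FileId {
    dev: u64,
    ino: u64,
}

/// Like `_dl_get_file_id`, but via `stat` on a path rather than `fstat` on an fd.
fn file_id(path: &Path) -> io::Result<FileId> {
    // follows symlinks, just like opening the file would
    let meta = fs::metadata(path)?;
    Ok(FileId { dev: meta.dev(), ino: meta.ino() })
}
```

So ./libgreet.so and /home/amos/ftl/greet/./libgreet.so are "the same file" as far as rtld is concerned, no matter how the name is spelled.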
When closing, if the same file has been opened more times than it has been closed, nothing happens:
// in `load.c`
int main(void) {
print_mapping_count();
printf("> dlopen(RTLD_NOW), loads the DSO\n");
void *lib = dlopen("./libgreet.so", RTLD_NOW);
assert(lib);
print_mapping_count();
printf("> dlopen(RTLD_NOW), increases the reference count\n");
void *lib2 = dlopen("./libgreet.so", RTLD_NOW);
assert(lib2);
print_mapping_count();
printf("> dlclose(), decreases the reference count\n");
dlclose(lib2);
print_mapping_count();
printf("> dlclose(), reference count falls to 0, the DSO is unloaded\n");
dlclose(lib);
print_mapping_count();
return 0;
}
$ gcc -Wall load.c -o load -ldl && ./load
mapping count: 0
> dlopen(RTLD_NOW), loads the DSO
mapping count: 5
> dlopen(RTLD_NOW), increases the reference count
mapping count: 5
> dlclose(), decreases the reference count
mapping count: 5
> dlclose(), reference count falls to 0, the DSO is unloaded
mapping count: 0
Here's another reason why dlclose might not unload a DSO: if we loaded it with the RTLD_NODELETE flag:
int main(void) {
print_mapping_count();
printf("> dlopen(RTLD_NOW | RTLD_NODELETE), loads the DSO\n");
void *lib = dlopen("./libgreet.so", RTLD_NOW | RTLD_NODELETE);
assert(lib);
print_mapping_count();
printf("> dlclose(), reference count falls to 0, but NODELETE is active\n");
dlclose(lib);
print_mapping_count();
return 0;
}
$ gcc -Wall load.c -o load -ldl && ./load
mapping count: 0
> dlopen(RTLD_NOW | RTLD_NODELETE), loads the DSO
mapping count: 5
> dlclose(), reference count falls to 0, but NODELETE is active
mapping count: 5
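The two rules we've seen so far fit in a toy model. To be clear, this is not glibc's actual bookkeeping (which lives in struct link_map and tracks a lot more); Loader and its methods are made up for illustration, and objects are keyed by the (dev, ino) file id that glibc compares.

```rust
use std::collections::HashMap;

/// File identity, as in glibc's `struct r_file_id`: (st_dev, st_ino).
type FileId = (u64, u64);

/// Toy model (hypothetical) of the rules above: one entry per file id,
/// a reference count, and a sticky NODELETE flag.
#[derive(Default)]
struct Loader {
    loaded: HashMap<FileId, (usize, bool)>, // id -> (refcount, nodelete)
}

impl Loader {
    /// Like dlopen: "maps" the object the first time, then just bumps the count.
    fn open(&mut self, id: FileId, nodelete: bool) {
        let entry = self.loaded.entry(id).or_insert((0, false));
        entry.0 += 1;
        entry.1 |= nodelete; // once NODELETE, always NODELETE
    }

    /// Like dlclose: drops one reference, unmaps at zero unless NODELETE.
    fn close(&mut self, id: FileId) {
        let unmap = match self.loaded.get_mut(&id) {
            Some((count, nodelete)) => {
                *count = count.saturating_sub(1);
                *count == 0 && !*nodelete
            }
            None => false,
        };
        if unmap {
            self.loaded.remove(&id);
        }
    }

    fn is_mapped(&self, id: FileId) -> bool {
        self.loaded.contains_key(&id)
    }
}
```

Opening the same id twice and closing it once leaves it mapped; a second close unmaps it; and anything opened with NODELETE stays mapped no matter how many times it's closed.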
Here's yet another reason why dlclose might not unload a DSO: if we load another DSO, and some of its symbols are bound to symbols from the first DSO, then closing the first DSO will not unload it, since it's needed by the second DSO.
Let's make something that links against libgreet.so:
// in `woops.c`
extern void greet(const char *name);
void woops() {
greet("woops");
}
$ gcc -shared -Wall woops.c -o libwoops.so -L "${PWD}" -lgreet
$ file libwoops.so
libwoops.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=52a0b6f4bc8422b6dfbb4709decb8c3acdf23adf, with debug_info, not stripped
int main(void) {
print_mapping_count();
printf("> dlopen(libgreet, RTLD_NOW)\n");
void *lib = dlopen("./libgreet.so", RTLD_NOW);
assert(lib);
print_mapping_count();
printf("> dlopen(libwoops, RTLD_NOW)\n");
void *lib2 = dlopen("./libwoops.so", RTLD_NOW);
assert(lib2);
print_mapping_count();
printf("> dlclose(libgreet), but libwoops still needs it!\n");
dlclose(lib);
print_mapping_count();
return 0;
}
(Note that we still need to set LD_LIBRARY_PATH: rtld still needs to find libgreet.so on disk before realizing it's already loaded.)
$ gcc -Wall load.c -o load -ldl && LD_LIBRARY_PATH="${PWD}" ./load
mapping count: 0
> dlopen(libgreet, RTLD_NOW)
mapping count: 5
> dlopen(libwoops, RTLD_NOW)
mapping count: 5
> dlclose(libgreet), but libwoops still needs it!
mapping count: 5
If we close libwoops too, then libgreet ends up being unloaded as well, since nothing references it any longer:
int main(void) {
print_mapping_count();
printf("> dlopen(libgreet, RTLD_NOW)\n");
void *lib = dlopen("./libgreet.so", RTLD_NOW);
assert(lib);
print_mapping_count();
printf("> dlopen(libwoops, RTLD_NOW)\n");
void *lib2 = dlopen("./libwoops.so", RTLD_NOW);
assert(lib2);
print_mapping_count();
printf("> dlclose(libgreet), but libwoops still needs it!\n");
dlclose(lib);
print_mapping_count();
printf("> dlclose(libwoops), unloads libgreet\n");
dlclose(lib2);
print_mapping_count();
return 0;
}
$ gcc -Wall load.c -o load -ldl && LD_LIBRARY_PATH="${PWD}" ./load
mapping count: 0
> dlopen(libgreet, RTLD_NOW)
mapping count: 5
> dlopen(libwoops, RTLD_NOW)
mapping count: 5
> dlclose(libgreet), but libwoops still needs it!
mapping count: 5
> dlclose(libwoops), unloads libgreet
mapping count: 0
It doesn't matter in which order we close libgreet and libwoops. Any time we close anything, rtld goes through aaaaaaaall the objects it has loaded, and decides whether they're still needed.
So, we've seen three things that can prevent a DSO from unloading:
- The reference count is > 0
- We've loaded it with RTLD_NODELETE
- It's used by another DSO that is currently loaded
But... but our Rust cdylib is doing none of those.
I know, right?
There is, in fact, a fourth thing.
Before clarifying everything, let me muddy the waters a little more.
Let's change our Rust library to make greet a no-op.
// in `libgreet-rs/src/lib.rs`
use std::os::raw::c_char;
/// # Safety
/// Pointer must be valid, and point to a null-terminated
/// string. What happens otherwise is UB.
#[no_mangle]
pub unsafe extern "C" fn greet(_name: *const c_char) {
// muffin!
}
Then let's rebuild it:
$ cd libgreet-rs
$ cargo build
And load it from our test program:
// in `load.c`
int main(void) {
print_mapping_count();
printf("> dlopen(libgreet, RTLD_NOW)\n");
void *lib = dlopen("./libgreet-rs/target/debug/libgreet.so", RTLD_NOW);
assert(lib);
print_mapping_count();
printf("> dlclose(libgreet), will it work?\n");
dlclose(lib);
print_mapping_count();
return 0;
}
$ gcc -Wall load.c -o load -ldl && ./load
mapping count: 0
> dlopen(libgreet, RTLD_NOW)
mapping count: 6
> dlclose(libgreet), will it work?
mapping count: 0
It... it works. Why does it work?
Well... let's look at the actual code of dlclose - or, rather, let's skip three or four abstraction levels and look directly at _dl_close_worker:
// in `glibc/elf/dl-close.c`
// in `_dl_close_worker()`
/* Check whether this object is still used. */
if (l->l_type == lt_loaded
&& l->l_direct_opencount == 0
&& !l->l_nodelete_active
/* See CONCURRENCY NOTES in cxa_thread_atexit_impl.c to know why
acquire is sufficient and correct. */
&& atomic_load_acquire (&l->l_tls_dtor_count) == 0
&& !used[done_index])
continue;
There's our fourth thing. Did you see it?
Enhance!
&& atomic_load_acquire (&l->l_tls_dtor_count) == 0
Transport Layer Security... something count?
No bear, thread-local storage.
Oh yes.
l_tls_dtor_count counts the number of thread-local destructors.
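To keep all four conditions straight, here's a toy Rust model of that check. DsoState and can_unload are hypothetical names made up for illustration: the real thing is glibc's C struct link_map, with many more fields, and used[] is computed by walking dependencies. So treat this as a sketch of the boolean logic only.

```rust
/// Toy model (hypothetical, not glibc's real `struct link_map`) of the
/// fields `_dl_close_worker` looks at when deciding whether to unload.
#[derive(Clone, Copy, Debug)]
struct DsoState {
    dynamically_loaded: bool, // l_type == lt_loaded
    direct_open_count: u32,   // l_direct_opencount
    nodelete: bool,           // l_nodelete_active
    tls_dtor_count: u32,      // l_tls_dtor_count
    used_by_other_dso: bool,  // used[done_index]
}

impl DsoState {
    /// True only when every condition from the glibc excerpt holds,
    /// i.e. nothing keeps the object alive any more.
    fn can_unload(&self) -> bool {
        self.dynamically_loaded
            && self.direct_open_count == 0
            && !self.nodelete
            && self.tls_dtor_count == 0
            && !self.used_by_other_dso
    }
}
```

Any single field flipping the "wrong" way is enough to keep the object mapped, which is why a non-zero tls_dtor_count alone can pin a DSO that is otherwise unused.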
What are those? Why do we want them?
Well, there are simple cases of thread-local variables, like this one, which uses GCC's __thread extension (C11 spells it _Thread_local):
// in `tls.c`
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
__thread int a = 0;
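void *work(void *arg) {
// use the thread-local `a` here: `for (int a = 0; ...)` would declare
// a new local variable that shadows it
for (a = 0; a < 3; a++) {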
printf("[%lu] a = %d\n", pthread_self() % 10, a);
sleep(1);
}
return NULL;
}
int main(void) {
pthread_t t1, t2, t3;
pthread_create(&t1, NULL, work, NULL);
pthread_create(&t2, NULL, work, NULL);
pthread_create(&t3, NULL, work, NULL);
sleep(4);
return 0;
}
$ gcc -Wall tls.c -o tls -lpthread
$ ./tls
[6] a = 0
[2] a = 0
[8] a = 0
[6] a = 1
[2] a = 1
[8] a = 1
[6] a = 2
[2] a = 2
[8] a = 2
As you can see, each thread has its own copy of a. The space for it is allocated when a thread is created, and deallocated when a thread exits.
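For comparison, here's roughly the same experiment with Rust's thread_local! macro, as a minimal sketch (COUNTER, bump and per_thread_totals are made-up names). Every spawned thread starts from its own zeroed copy, so each one ends at 3 no matter how the threads interleave, and the calling thread's copy is never touched.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Each thread gets its own copy, initialized to 0 on first access,
    // much like `__thread int a = 0;` in tls.c.
    static COUNTER: Cell<u32> = Cell::new(0);
}

/// Increments the *current thread's* COUNTER `n` times and returns its value.
fn bump(n: u32) -> u32 {
    for _ in 0..n {
        COUNTER.with(|c| c.set(c.get() + 1));
    }
    COUNTER.with(|c| c.get())
}

/// Spawns `threads` workers that each bump their own counter 3 times.
fn per_thread_totals(threads: usize) -> Vec<u32> {
    let handles: Vec<_> = (0..threads).map(|_| thread::spawn(|| bump(3))).collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```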
But int is a primitive type. It's nice and simple. There's no need to do any particular cleanup when it's freed. Just release the associated memory and you're good!
Which is not the case... of a RefCell<Option<Box<dyn Write + Send>>>:
// in `rust/src/libstd/io/stdio.rs`
thread_local! {
/// Stdout used by print! and println! macros
static LOCAL_STDOUT: RefCell<Option<Box<dyn Write + Send>>> = {
RefCell::new(None)
}
}
Ohhh. We did use println!
The RefCell isn't the problem. Nor the Option. The problem is the Box.
Sort of!
That heap allocation needs to be freed somehow. Here's Box's Drop implementation, as of Rust 1.46:
// in `rust/src/alloc/boxed.rs`
#[stable(feature = "rust1", since = "1.0.0")]
unsafe impl<#[may_dangle] T: ?Sized> Drop for Box<T> {
fn drop(&mut self) {
// FIXME: Do nothing, drop is currently performed by compiler.
}
}
So, since Box implements Drop, the whole RefCell<Option<Box<dyn Write + Send>>> needs dropping, and a "thread-local destructor" is registered for it.
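One way to see which types are affected is std::mem::needs_drop, which reports whether a type has drop glue. As far as I can tell, libstd's thread-local implementation only registers a destructor when needs_drop::<T>() is true, so this sketch should line up with the explanation above. Note that the docs describe needs_drop as a hint that is allowed to be conservative; the false results below are what current compilers report in practice. drop_glue_report is a made-up helper.

```rust
use std::cell::RefCell;
use std::io::Write;
use std::mem::needs_drop;

/// The type of LOCAL_STDOUT from the libstd excerpt above.
type LocalStdout = RefCell<Option<Box<dyn Write + Send>>>;

/// Returns needs_drop for three types, from "plain data" to "owns a Box".
fn drop_glue_report() -> [bool; 3] {
    [
        needs_drop::<i32>(),                  // plain integer: nothing to clean up
        needs_drop::<RefCell<Option<i32>>>(), // wrappers alone add no drop glue
        needs_drop::<LocalStdout>(),          // contains a Box, so it must be dropped
    ]
}
```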
Let's look at LocalKey::get - it calls try_initialize:
pub unsafe fn get<F: FnOnce() -> T>(&self, init: F) -> Option<&'static T> {
match self.inner.get() {
Some(val) => Some(val),
None => self.try_initialize(init),
}
}
try_initialize, in turn, calls try_register_dtor:
// `try_register_dtor` is only called once per fast thread local
// variable, except in corner cases where thread_local dtors reference
// other thread_local's, or it is being recursively initialized.
unsafe fn try_register_dtor(&self) -> bool {
match self.dtor_state.get() {
DtorState::Unregistered => {
// dtor registration happens before initialization.
register_dtor(self as *const _ as *mut u8, destroy_value::<T>);
self.dtor_state.set(DtorState::Registered);
true
}
DtorState::Registered => {
// recursively initialized
true
}
DtorState::RunningOrHasRun => false,
}
}
And register_dtor, well, it's a thing of beauty:
// in `rust/src/libstd/sys/unix/fast_thread_local.rs`
// Since what appears to be glibc 2.18 this symbol has been shipped which
// GCC and clang both use to invoke destructors in thread_local globals, so
// let's do the same!
//
// Note, however, that we run on lots older linuxes, as well as cross
// compiling from a newer linux to an older linux, so we also have a
// fallback implementation to use as well.
#[cfg(any(
target_os = "linux",
target_os = "fuchsia",
target_os = "redox",
target_os = "emscripten"
))]
pub unsafe fn register_dtor(t: *mut u8, dtor: unsafe extern "C" fn(*mut u8)) {
use crate::mem;
use crate::sys_common::thread_local::register_dtor_fallback;
extern "C" {
#[linkage = "extern_weak"]
static __dso_handle: *mut u8;
#[linkage = "extern_weak"]
static __cxa_thread_atexit_impl: *const libc::c_void;
}
if !__cxa_thread_atexit_impl.is_null() {
type F = unsafe extern "C" fn(
dtor: unsafe extern "C" fn(*mut u8),
arg: *mut u8,
dso_handle: *mut u8,
) -> libc::c_int;
mem::transmute::<*const libc::c_void, F>(__cxa_thread_atexit_impl)(
dtor,
t,
&__dso_handle as *const _ as *mut _,
);
return;
}
register_dtor_fallback(t, dtor);
}
What does __cxa_thread_atexit_impl do? Let's look at the glibc source again:
// in `glibc/stdlib/cxa_thread_atexit_impl.c`
/* Register a destructor for TLS variables declared with the 'thread_local'
keyword. This function is only called from code generated by the C++
compiler. FUNC is the destructor function and OBJ is the object to be
passed to the destructor. DSO_SYMBOL is the __dso_handle symbol that each
DSO has at a unique address in its map, added from crtbegin.o during the
linking phase. */
int
__cxa_thread_atexit_impl (dtor_func func, void *obj, void *dso_symbol)
{
// (cut)
}
"Only called from code generated by the C++ compiler", huh.
So, as soon as we call __cxa_thread_atexit_impl, it's game over. We can never, ever, unload that DSO.
Speaking of... why? Why does glibc check for that before unloading a DSO?
Well... a TLS destructor must be run on the same thread. Here, let me show you.
// in `tls2.c`
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h> // nanosleep, struct timespec
//====================================
// glibc TLS destructor stuff
//====================================
typedef void (*dtor_func)(void *);
extern void *__dso_handle;
// matches glibc's prototype, which returns an int
extern int __cxa_thread_atexit_impl(dtor_func func, void *obj, void *dso_symbol);
//====================================
// Some thread-local data
//====================================
typedef struct {
uint64_t *array;
} data_t;
__thread data_t *data = NULL;
//====================================
// Some helpers
//====================================
// Returns an identifier that's shorter than `pthread_self`,
// easier to distinguish in the program's output. May collide
// though - not a great hash function.
uint8_t thread_id() {
return (pthread_self() >> 8) % 256;
}
// Attempt to sleep for a given amount of milliseconds.
// Only valid for `ms < 1000`: nanosleep rejects tv_nsec >= 1 second (EINVAL).
void sleep_ms(int ms) {
struct timespec ts = { .tv_sec = 0, .tv_nsec = ms * 1000 * 1000 };
nanosleep(&ts, NULL);
}
//====================================
// Our destructor
//====================================
void dtor(void *p) {
printf("[%x] dtor called! data = %p\n", thread_id(), data);
free(data->array);
free(data);
data = NULL;
}
//====================================
// Worker thread function
//====================================
void *work(void *arg) {
printf("[%x] is worker thread\n", thread_id());
const size_t n = 16;
// initialize `data` for this thread
data = malloc(sizeof(data_t));
data->array = malloc(n * sizeof(uint64_t));
printf("[%x] allocated! data = %p\n", thread_id(), data);
printf("[%x] registering destructor\n", thread_id());
// pass the *address* of __dso_handle, like the Rust code above does
__cxa_thread_atexit_impl(dtor, NULL, &__dso_handle);
// compute the Fibonacci sequence
if (n >= 2) {
data->array[0] = 1;
data->array[1] = 1;
}
for (int i = 2; i < n; i++) {
data->array[i] = data->array[i - 2] + data->array[i - 1];
}
// print
for (int i = 0; i < n; i++) {
printf(i > 0 ? ", %lu" : "%lu", data->array[i]);
}
printf("\n");
printf("[%x] thread exiting\n", thread_id());
return NULL;
}
//====================================
// Main function
//====================================
int main(void) {
printf("[%x] is main thread\n", thread_id());
pthread_t t1;
printf("[%x] creating thread\n", thread_id());
pthread_create(&t1, NULL, work, NULL);
sleep_ms(100);
return 0;
}
Everything works fine in this code sample. The destructor is registered from a thread, and called on that same thread, when it exits naturally:
$ gcc -Wall tls2.c -o tls2 -lpthread && ./tls2
[77] is main thread
[77] creating thread
[66] is worker thread
[66] allocated! data = 0x7f1bc8000b60
[66] registering destructor
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987
[66] thread exiting
[66] dtor called! data = 0x7f1bc8000b60
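The same property can be checked from Rust, as a sketch that assumes Linux, where thread-local destructors run while the exiting thread is torn down, before join returns. Guard, TAG, SEEN_TAG and run_worker are hypothetical names. The destructor reads TAG, a plain Cell with no destructor of its own, so the value it sees tells us which thread it ran on: the worker sets it, the calling thread never does.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// Value of TAG observed by Guard's destructor (0 = not run yet).
static SEEN_TAG: AtomicU32 = AtomicU32::new(0);

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        // Runs during thread teardown. TAG has no destructor, so it is
        // still readable here, and we read the copy of whichever thread we're on.
        SEEN_TAG.store(TAG.with(|t| t.get()), Ordering::SeqCst);
    }
}

thread_local! {
    static TAG: Cell<u32> = Cell::new(0);
    static GUARD: Guard = Guard;
}

/// Tags a worker thread with `tag`, touches GUARD (registering its
/// destructor), joins the worker, and reports what the destructor saw.
fn run_worker(tag: u32) -> u32 {
    thread::spawn(move || {
        TAG.with(|t| t.set(tag));
        GUARD.with(|_| {});
    })
    .join()
    .unwrap();
    SEEN_TAG.load(Ordering::SeqCst)
}
```

Calling run_worker(7) should return 7: the destructor ran on the worker, not on the thread that joined it.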
If, however, the destructor is called from another thread, like the main thread, things go terribly wrong:
// in `tls2.c`
void *work(void *arg) {
// (cut)
// commented out:
// printf("[%x] registering destructor\n", thread_id());
// __cxa_thread_atexit_impl(dtor, NULL, &__dso_handle);
// (cut)
}
int main(void) {
printf("[%x] is main thread\n", thread_id());
pthread_t t1;
printf("[%x] creating thread\n", thread_id());
pthread_create(&t1, NULL, work, NULL);
sleep_ms(100);
dtor(NULL);
return 0;
}
$ gcc -Wall tls2.c -o tls2 -lpthread && ./tls2
[e7] is main thread
[e7] creating thread
[d6] is worker thread
[d6] allocated! data = 0x7f522c000b60
[d6] registering destructor
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987
[e7] dtor called! data = (nil)
zsh: segmentation fault (core dumped) ./tls2
Well yeah - the destructor refers to thread-local storage, but it's running on the wrong thread, so it sees the main thread's copy of data, which is NULL.
Yup!
Note that this example code is a bit contrived and uses __cxa_thread_atexit_impl in unintended ways.
In code that a Rust or C++ compiler would emit, the obj argument would be used to pass the this pointer, and so it would never be NULL (or wrong). To reproduce a similar failure, we'd have to have the destructor refer to some other thread-local data, which... well. Don't do that.
Thanks to InBetweenNames on GitHub for the insight!
But say that thread does not exit naturally. Say it's cancelled, for example:
// in `tls2.c`
void *work(void *arg) {
// (cut)
sleep(2);
printf("[%x] thread exiting\n", thread_id());
return NULL;
}
int main(void) {
// (cut)
sleep_ms(100);
pthread_cancel(t1);
return 0;
}
Then what happens?
$ gcc -Wall tls2.c -o tls2 -lpthread && ./tls2
[c7] is main thread
[c7] creating thread
[b6] is worker thread
[b6] allocated! data = 0x7f1bcc000b60
[b6] registering destructor
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987
The destructor isn't called at all?
Well, we still need to join it:
int main(void) {
// (cut)
sleep_ms(100);
pthread_cancel(t1);
pthread_join(t1, NULL);
return 0;
}
$ gcc -Wall tls2.c -o tls2 -lpthread && ./tls2
[67] is main thread
[67] creating thread
[56] is worker thread
[56] allocated! data = 0x7fbd10000b60
[56] registering destructor
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987
[56] dtor called! data = 0x7fbd10000b60
Alright, so, couldn't we just do this?
Depends. Who's "we"?
It's true that, once all TLS destructors run, l_tls_dtor_count falls back to zero, and the DSO can be unloaded.
So technically, one may come up with a scheme:
- Pause all threads
- Go through all threads associated with our DSO (libgreet.so)
- Cancel them, then join them
- That way all the TLS destructors run
- And we can unload it
There's just... several small problems with that.
First off, the only way I can think of to enumerate all threads would be to use the ptrace API, like debuggers do.
This would also need to happen out-of-process, so the whole thing would require spawning another process.
Yay, moving parts!
Second, cancelling threads is not that easy. If we look at the pthread_cancel man page:
The pthread_cancel() function sends a cancellation request to the
thread thread. Whether and when the target thread reacts to the
cancellation request depends on two attributes that are under the
control of that thread: its cancelability state and type.
A thread's cancelability state, determined by
pthread_setcancelstate(3), can be enabled (the default for new
threads) or disabled. If a thread has disabled cancellation, then a
cancellation request remains queued until the thread enables
cancellation. If a thread has enabled cancellation, then its
cancelability type determines when cancellation occurs.
A thread's cancellation type, determined by pthread_setcanceltype(3),
may be either asynchronous or deferred (the default for new threads).
Asynchronous cancelability means that the thread can be canceled at
any time (usually immediately, but the system does not guarantee
this). Deferred cancelability means that cancellation will be
delayed until the thread next calls a function that is a cancellation
point. A list of functions that are or may be cancellation points is
provided in pthreads(7).
So, if the thread's cancellation type is asynchronous, we might be able to cancel it at any time (no guarantees!). But if it's the default, deferred, then it can only be cancelled at a "cancellation point".
And fortunately, according to man 7 pthreads, sleep is one of them.
But what if we're crunching numbers real hard, in a tight loop? Then we
won't be able to cancel that.
Third, what about cleanup? pthreads provides pthread_cleanup_push, which is fine if you expect your threads to be cancelled - but Rust's libstd doesn't expect its threads to be cancelled, at all.
If we search for pthread_cleanup_push usage in Rust's libstd using ripgrep:
$ cd rust/src/libstd
$ rg 'pthread_cleanup_push'
$
No results.
And then, there's a fourth thing.
In libgreet.so, which thread is __cxa_thread_atexit_impl called from?
Let's investigate:
// in `greet-rs/src/main.rs`
fn main() -> Result<(), Box<dyn Error>> {
println!("main thread id = {:?}", std::thread::current().id());
// (cut)
}
$ cd greet-rs/
$ cargo build
Compiling greet-rs v0.1.0 (/home/amos/ftl/greet/greet-rs)
Finished dev [unoptimized + debuginfo] target(s) in 0.29s
// in `libgreet-rs/src/lib.rs`
use std::os::raw::c_char;
/// # Safety
/// Pointer must be valid, and point to a null-terminated
/// string. What happens otherwise is UB.
#[no_mangle]
pub unsafe extern "C" fn greet(_name: *const c_char) {
println!("greeting from thread {:?}", std::thread::current().id());
}
$ cd libgreet-rs/
$ cargo build
Compiling greet v0.1.0 (/home/amos/ftl/greet/libgreet-rs)
Finished dev [unoptimized + debuginfo] target(s) in 0.18s
$ cd greet-rs/
$ ./target/debug/greet-rs
main thread id = ThreadId(1)
greeting from thread ThreadId(1)
-----------------------------
Press Enter to go again, Ctrl-C to exit...
Uh oh.
It's the same thread!
We don't want to cancel the main thread now, do we?
Tonight, at eleven:
I swear to humanity, bear, if you say "pthread_cancel culture"
...doom.
So. It seems we're stuck. I guess we can't reload Rust libraries.
Not as long as we use types that register thread-local destructors. So, no println! for us - in fact, no std::io at all.
Unless... unless we find a way to prevent our Rust library from calling __cxa_thread_atexit_impl.
How do you mean?
Here, let me show you for once. If we declare our own #[no_mangle] function in libgreet-rs...
// in `libgreet-rs/src/lib.rs`
use std::ffi::CStr;
use std::os::raw::c_char;
#[no_mangle]
pub unsafe extern "C" fn __cxa_thread_atexit_impl() {}
#[no_mangle]
pub unsafe extern "C" fn greet(name: *const c_char) {
let s = CStr::from_ptr(name);
println!("greetings, {}", s.to_str().unwrap());
}
$ cd libgreet-rs/
$ cargo b -q
$ cd greet-rs/
$ cargo b -q
$ ./target/debug/greet-rs
greetings, reloading
-----------------------------
Press Enter to go again, Ctrl-C to exit...
Okay now let's change the library...
// in `libgreet-rs/src/lib.rs`
// in `fn greet()`
println!("hello, {}", s.to_str().unwrap());
# session where greet-rs is still running
# (now pressing enter)
greetings, reloading
-----------------------------
Press Enter to go again, Ctrl-C to exit...
Mhh, no, that doesn't work.
See? Not that easy.
If we repeat the operation with LD_DEBUG=all, we can see where rtld gets the __cxa_thread_atexit_impl symbol from:
137666: symbol=__cxa_thread_atexit_impl; lookup in file=./target/debug/greet-rs [0]
137666: symbol=__cxa_thread_atexit_impl; lookup in file=/usr/lib/libdl.so.2 [0]
137666: symbol=__cxa_thread_atexit_impl; lookup in file=/usr/lib/libpthread.so.0 [0]
137666: symbol=__cxa_thread_atexit_impl; lookup in file=/usr/lib/libgcc_s.so.1 [0]
137666: symbol=__cxa_thread_atexit_impl; lookup in file=/usr/lib/libc.so.6 [0]
137666: binding file ./target/debug/greet-rs [0] to /usr/lib/libc.so.6 [0]: normal symbol `__cxa_thread_atexit_impl' [GLIBC_2.18]
Ah, crap, libc wins again.
There's actually a way to make that workaround "work": another dlopen flag:
RTLD_DEEPBIND (since glibc 2.3.4)
Place the lookup scope of the symbols in this shared object
ahead of the global scope. This means that a self-contained
object will use its own symbols in preference to global symbols
with the same name contained in objects that have already been
loaded.
That way, it would look first in libgreet.so to find __cxa_thread_atexit_impl.
...or we could just put the definition in greet-rs instead?
The executable? Sure, that should work - it's the first place rtld looks.
First we need to remove __cxa_thread_atexit_impl from libgreet-rs/src/lib.rs, and then we can add it to greet-rs/src/main.rs:
// in `greet-rs/src/main.rs`
#[no_mangle]
pub unsafe extern "C" fn __cxa_thread_atexit_impl() {}
$ cd libgreet-rs/
$ cargo build -q
$ cd ../greet-rs/
$ cargo build -q
$ ./target/debug/greet-rs
hello, reloading
-----------------------------
Press Enter to go again, Ctrl-C to exit...
Now let's change libgreet-rs (you know the drill by now), and press enter in our greet-rs shell session:
hello again, reloading
-----------------------------
Press Enter to go again, Ctrl-C to exit...
🎉🎉🎉
Finally, we've done it.
Or have we?
Let's run greet-rs through valgrind, just for fun:
$ valgrind --leak-check=full ./target/debug/greet-rs
==141352== Memcheck, a memory error detector
==141352== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==141352== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==141352== Command: ./target/debug/greet-rs
==141352==
hello again, reloading
-----------------------------
Press Enter to go again, Ctrl-C to exit...
==141352== Process terminating with default action of signal 2 (SIGINT)
(cut)
==141352== HEAP SUMMARY:
==141352== in use at exit: 11,205 bytes in 22 blocks
==141352== total heap usage: 38 allocs, 16 frees, 15,231 bytes allocated
==141352==
==141352== 96 (24 direct, 72 indirect) bytes in 1 blocks are definitely lost in loss record 15 of 22
==141352== at 0x483A77F: malloc (vg_replace_malloc.c:307)
(cut)
==141352== by 0x110DA4: greet_rs::load_and_print (main.rs:31)
==141352== by 0x1106DA: greet_rs::main (main.rs:15)
(cut)
==141352== by 0x110E29: main (in /home/amos/ftl/greet/greet-rs/target/debug/greet-rs)
==141352==
==141352== 1,136 (8 direct, 1,128 indirect) bytes in 1 blocks are definitely lost in loss record 21 of 22
==141352== at 0x483A77F: malloc (vg_replace_malloc.c:307)
(cut)
==141352== by 0x110DA4: greet_rs::load_and_print (main.rs:31)
==141352== by 0x1106DA: greet_rs::main (main.rs:15)
(cut)
==141352== by 0x110E29: main (in /home/amos/ftl/greet/greet-rs/target/debug/greet-rs)
==141352==
==141352== LEAK SUMMARY:
==141352== definitely lost: 32 bytes in 2 blocks
(cut)
Oh. It's leaking memory.
Which. Of course it is. We made "registering destructors" a no-op.
Wasn't there a fallback in Rust's libstd?
Yes there was!
But the fallback is only used when __cxa_thread_atexit_impl is not present: for example, if your version of glibc does not provide that symbol. Which can happen!
So... do we patch glibc?
Luckily, we don't need to.
libstd doesn't really check if __cxa_thread_atexit_impl is "provided" or "present". It checks if the address of __cxa_thread_atexit_impl, as provided by rtld during the loading of libgreet.so, is non-zero.
// in the Rust libstd
if !__cxa_thread_atexit_impl.is_null() {
// etc.
}
Oooh, ooh! I have an idea.
Pray tell!
What if we made a symbol, named __cxa_thread_atexit_impl...
Go on...
And injected it in the rtld namespace, before libc.so.6...
With LD_PRELOAD? Sure.
...and it's a constant symbol, and its value is 0.
Is... is that legal? Should we call a lawyer?
Turns out - no lawyers are needed. At first, I tried doing that without involving another dynamic library, but GNU ld was not amused. Not amused at all. In fact, an internal assertion failed, rudely.
But, if we're willing to make another .so file, we can make it work.
How do we make a constant symbol?
I'm not aware of any way to do that in Rust. Or, heck, even in C.
But that's where assembly comes in handy. We've talked about assembly before. In the current implementation, Rust code typically gets compiled to LLVM IR, which is a form of assembly.
In the GNU toolchain (GCC and friends), C code gets compiled to... GNU assembly. AT&T style. And then assembled with gas, the GNU assembler.
So, let's write a bit of assembly:
// in `tls-dtor-fallback.S`
.global __cxa_thread_atexit_impl
__cxa_thread_atexit_impl = 0
Then, let's make an honest shared library out of it:
$ gcc -Wall -shared -nostdlib -nodefaultlibs tls-dtor-fallback.S -o libtls-dtor-fallback.so
Let's check what we have in there:
$ nm -D ./libtls-dtor-fallback.so
0000000000000000 A __cxa_thread_atexit_impl
Wonderful! Just what we need.
Wait... "A"? Shouldn't it be "T"?
"T" is for the ".text" (code) section. "A" is for "absolute". Doesn't matter though. It's still a symbol, and rtld should find it.
Then we just inject it when we run greet-rs, and:
Wait! We forgot to remove the __cxa_thread_atexit_impl from greet-rs/src/main.rs!
Ah right! So, let's remove it from there, and recompile... and then let's inject our library when we run greet-rs:
$ LD_PRELOAD=../libtls-dtor-fallback.so ./target/debug/greet-rs
Here we go!
greetings, reloading
-----------------------------
Press Enter to go again, Ctrl-C to exit...
Then we can change libgreet:
// in `libgreet-rs/src/lib.rs`
#[no_mangle]
pub unsafe extern "C" fn greet(name: *const c_char) {
let s = CStr::from_ptr(name);
println!("hurray for {}!", s.to_str().unwrap());
}
$ cd libgreet-rs/ && cargo build -q
Then, from the session where greet-rs is still running, we press enter:
hurray for reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...
This seems like a good ending, right? The library unloads, we can load it again, everything's fine?
Why do I sense there's trouble afoot?
Because there is, cool bear. There is. We messed up bad.
More like "breakaround"
The thing is, there's a very good reason why glibc doesn't let you unload a DSO if it has registered TLS destructors. We've already seen good reasons, but that wasn't the whole story.
First off, we never checked that we fixed the memory leak:
$ valgrind --leak-check=full --trace-children=yes env LD_PRELOAD=../libtls-dtor-fallback.so ./target/debug/greet-rs
(cut)
==263760== LEAK SUMMARY:
==263760== definitely lost: 32 bytes in 2 blocks
==263760== indirectly lost: 1,200 bytes in 4 blocks
==263760== possibly lost: 0 bytes in 0 blocks
==263760== still reachable: 10,149 bytes in 20 blocks
==263760== suppressed: 0 bytes in 0 blocks
...and that's just when we load libgreet.so once. It leaks 32 bytes directly, and 1,200 bytes indirectly, per load.
But let's ignore that - we could load it about 54K times before it leaks 64 MiB of RAM, so arguably, in development, that's not a huge problem.
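A quick sanity check on that estimate, using the numbers from the valgrind summary (32 bytes direct plus 1,200 bytes indirect per load) and a 64 MiB budget. loads_before_budget is just a made-up helper, and it assumes every load leaks exactly the same amount.

```rust
/// How many loads fit in `budget_bytes` if each load leaks `leak_per_load` bytes.
fn loads_before_budget(budget_bytes: u64, leak_per_load: u64) -> u64 {
    budget_bytes / leak_per_load // integer division rounds down
}
```

With 64 * 1024 * 1024 bytes and 1,232 bytes per load, that comes out to 54,471 loads.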
Right - and at least, the actual .so file is unmapped, so the kernel can free those resources.
True, so it is "better" than before in the "memory leak" department.
The issue with our workaround is much bigger. The reason we're leaking memory is that the TLS destructors registered by libgreet never actually get run.
How do you know?
Days and days of stepping through code with various debuggers?
Oh, that's what you were up to. I thought you were just installing Gentoo.
...that too.
But what would happen if the destructors were actually called?
For TLS destructors to be called, on Linux, with glibc, we need to actually let a thread terminate.
So let's try to call load_and_print from a thread:
// in `greet-rs/src/main.rs`
fn main() -> Result<(), Box<dyn Error>> {
let mut line = String::new();
let stdin = std::io::stdin();
println!("Here we go!");
loop {
// new! was just a regular call:
std::thread::spawn(load_and_print).join().unwrap().unwrap();
println!("-----------------------------");
println!("Press Enter to go again, Ctrl-C to exit...");
line.clear();
stdin.read_line(&mut line).unwrap();
}
Ok(())
}
Mhh, why do we unwrap twice?
JoinHandle::join returns a Result<T, E>, which is Err if the thread panics. But here, the thread also returns a Result<T, E>, so the actual return type is std::thread::Result<Result<(), libloading::Error>>.
Wait, std::thread::Result only has one type parameter? It doesn't take an E for error?
Libraries tend to do that: they define their own Result type, which is an alias over std::result::Result with the error type E set to something from the crate. For std::thread::Result<T>, E is Box<dyn Any + Send + 'static>, the payload of a panic.
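Both points fit in a small sketch. LoadError, fallible_work and run_on_thread are hypothetical stand-ins for libloading::Error and load_and_print. The crate-local Result<T> alias fixes the error type, and ThreadResult<T> spells out by hand what std::thread::Result<T> expands to. Joining the thread then hands back a nested result, which is why the main loop unwraps twice.

```rust
use std::any::Any;
use std::thread;

/// A hypothetical crate error type, standing in for libloading::Error.
#[derive(Debug, PartialEq)]
pub struct LoadError(pub String);

/// The crate-local alias pattern: `E` is fixed, only `T` stays generic.
pub type Result<T> = std::result::Result<T, LoadError>;

/// Written out by hand, this is the same type as `std::thread::Result<T>`.
pub type ThreadResult<T> = std::result::Result<T, Box<dyn Any + Send + 'static>>;

/// Hypothetical stand-in for `load_and_print`: a thread body that can fail.
fn fallible_work(fail: bool) -> Result<u32> {
    if fail {
        Err(LoadError("could not load".to_string()))
    } else {
        Ok(42)
    }
}

/// The outer layer is `Err` only if the thread panicked; the inner layer
/// is whatever `fallible_work` itself returned.
fn run_on_thread(fail: bool) -> ThreadResult<Result<u32>> {
    let joined: thread::Result<Result<u32>> =
        thread::spawn(move || fallible_work(fail)).join();
    // Compiles only because ThreadResult<T> and thread::Result<T> are the same type.
    joined
}
```

So run_on_thread(false).unwrap() is still a Result, and it takes a second unwrap to reach the 42.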
So, now that we call load_and_print from a thread:
$ cargo b -q
$ ./target/debug/greet-rs
Here we go!
hurray for reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...
^C
Seems to work fine?
Woops, forgot to inject our "workaround"
$ LD_PRELOAD=../libtls-dtor-fallback.so ./target/debug/greet-rs
Here we go!
hurray for reloading!
zsh: segmentation fault (core dumped) LD_PRELOAD=../libtls-dtor-fallback.so ./target/debug/greet-rs
Ah, yes.
We can dig a little deeper with LLDB:
$ lldb ./target/debug/greet-rs
(lldb) target create "./target/debug/greet-rs"
Current executable set to '/home/amos/ftl/greet/greet-rs/target/debug/greet-rs' (x86_64).
(lldb) env LD_PRELOAD=../libtls-dtor-fallback.so
(lldb) r
Process 285989 launched: '/home/amos/ftl/greet/greet-rs/target/debug/greet-rs' (x86_64)
Here we go!
hurray for reloading!
Process 285989 stopped
* thread #2, name = 'greet-rs', stop reason = signal SIGSEGV: invalid address (fault address: 0x7ffff7b574f0)
frame #0: 0x00007ffff7b574f0
error: memory read failed for 0x7ffff7b57400
(lldb) bt
* thread #2, name = 'greet-rs', stop reason = signal SIGSEGV: invalid address (fault address: 0x7ffff7b574f0)
* frame #0: 0x00007ffff7b574f0
frame #1: 0x00007ffff7f74201 libpthread.so.0`__nptl_deallocate_tsd at pthread_create.c:302:8
frame #2: 0x00007ffff7f7418a libpthread.so.0`__nptl_deallocate_tsd at pthread_create.c:251
frame #3: 0x00007ffff7f743fc libpthread.so.0`start_thread(arg=0x00007ffff7d85640) at pthread_create.c:474:3
frame #4: 0x00007ffff7e88293 libc.so.6`__clone at clone.S:95
(lldb)
...but even Valgrind would've given us a hint as to what went wrong:
$ valgrind --quiet --leak-check=full --trace-children=yes env LD_PRELOAD=../libtls-dtor-fallback.so ./target/debug/greet-rs
Here we go!
hurray for reloading!
==287464== Thread 2:
==287464== Jump to the invalid address stated on the next line
==287464== at 0x50994F0: ???
==287464== by 0x488C200: __nptl_deallocate_tsd (pthread_create.c:302)
==287464== by 0x488C200: __nptl_deallocate_tsd (pthread_create.c:251)
==287464== by 0x488C3FB: start_thread (pthread_create.c:474)
==287464== by 0x49BF292: clone (clone.S:95)
==287464== Address 0x50994f0 is not stack'd, malloc'd or (recently) free'd
==287464==
==287464== Can't extend stack to 0x484f138 during signal delivery for thread 2:
==287464== no stack segment
==287464==
==287464== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==287464== Access not within mapped region at address 0x484F138
==287464== at 0x50994F0: ???
==287464== by 0x488C200: __nptl_deallocate_tsd (pthread_create.c:302)
==287464== by 0x488C200: __nptl_deallocate_tsd (pthread_create.c:251)
==287464== by 0x488C3FB: start_thread (pthread_create.c:474)
==287464== by 0x49BF292: clone (clone.S:95)
==287464== If you believe this happened as a result of a stack
==287464== overflow in your program's main thread (unlikely but
==287464== possible), you can try to increase the size of the
==287464== main thread stack using the --main-stacksize= flag.
==287464== The main thread stack size used in this run was 8388608
Poor Valgrind is trying its darndest to help us.
But no, that memory range was not stack'd, malloc'd, or recently free'd.
It was, however, recently unmapped.
With our workaround, or "breakaround", as I've recently taken to calling it, we've entered the land of super-duper-undefined behavior, aka SDUB.
Because events are happening in this order:
- T1 (main thread) spawns a second thread, T2
- In T2, libgreet.so is loaded
- In T2, greet() from libgreet.so is called
- In T2, greet() calls println!(), which accesses LOCAL_STDOUT, which is initialized, and for which a TLS destructor is registered (using the fallback, since we hid __cxa_thread_atexit_impl)
- In T2, lib is dropped, so libgreet.so is unloaded
- T2 finishes, so all pthreads TLS key destructors are called (this is how the fallback works)
...however, the destructors' code was in the DSO we just unloaded.
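To see the fallback's mechanism in isolation, here's a sketch that registers a destructor with a pthread TLS key directly, using hand-written FFI declarations. It assumes Linux with glibc, where pthread_key_t is an unsigned int and libstd already links against the library that provides these functions. It also deliberately never deletes the key. key_dtor, SEEN and key_destructor_value are made-up names. The destructor only fires when the worker thread exits, with the value that thread stored, which is exactly why its code must still be mapped at that point.

```rust
use std::os::raw::{c_int, c_uint, c_void};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Assumption: Linux + glibc, where `pthread_key_t` is an `unsigned int`.
#[allow(non_camel_case_types)]
type pthread_key_t = c_uint;

extern "C" {
    fn pthread_key_create(
        key: *mut pthread_key_t,
        destructor: Option<unsafe extern "C" fn(*mut c_void)>,
    ) -> c_int;
    fn pthread_setspecific(key: pthread_key_t, value: *const c_void) -> c_int;
}

// Last value handed to the key destructor (0 = never called).
static SEEN: AtomicUsize = AtomicUsize::new(0);

// Called on the *exiting* thread, with that thread's non-NULL value.
unsafe extern "C" fn key_dtor(value: *mut c_void) {
    SEEN.store(value as usize, Ordering::SeqCst);
}

/// Creates a key, stores `value` under it on a worker thread, and returns
/// what the destructor saw once the worker has been joined.
fn key_destructor_value(value: usize) -> usize {
    let mut key: pthread_key_t = 0;
    let rc = unsafe { pthread_key_create(&mut key, Some(key_dtor)) };
    assert_eq!(rc, 0, "pthread_key_create failed");
    thread::spawn(move || unsafe {
        // The value is only used as an opaque tag, never dereferenced.
        pthread_setspecific(key, value as *const c_void);
    })
    .join()
    .unwrap();
    SEEN.load(Ordering::SeqCst)
}
```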
So... we broke libloading?
We definitely made it insta-unsound.
Because in libloading, Library::new is not unsafe. And neither is dropping a Library. And yet that's where we crash.
Mhh. Couldn't we make sure we call the pthread TLS key destructors before libgreet.so is dropped?
Sure, yes, we can do that.
// in `greet-rs/src/main.rs`
use cstr::cstr;
use std::ffi::c_void;
use std::{error::Error, io::BufRead, os::raw::c_char};
use libloading::{Library, Symbol};
fn main() -> Result<(), Box<dyn Error>> {
let mut line = String::new();
let stdin = std::io::stdin();
println!("Here we go!");
loop {
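// `load_and_print` hands the `Library` back instead of dropping it: by the
// time `join()` returns, T2 has exited and run its TLS destructors
let lib = std::thread::spawn(load_and_print).join().unwrap().unwrap();
drop(lib); // only now do we unload libgreet.so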
println!("-----------------------------");
println!("Press Enter to go again, Ctrl-C to exit...");
line.clear();
stdin.read_line(&mut line).unwrap();
}
Ok(())
}
// now returns a `Library`, instead of dropping it
fn load_and_print() -> Result<Library, libloading::Error> {
let lib = Library::new("../libgreet-rs/target/debug/libgreet.so")?;
unsafe {
let greet: Symbol<unsafe extern "C" fn(name: *const c_char)> = lib.get(b"greet")?;
greet(cstr!("reloading").as_ptr());
}
Ok(lib)
}
$ cargo b -q
$ LD_PRELOAD=../libtls-dtor-fallback.so ./target/debug/greet-rs
Here we go!
hurray for reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...
hurray for reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...
hurray for reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...
^C
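This works because of an ordering guarantee: JoinHandle::join waits until T2 has completely exited, and on Linux with glibc, exiting includes running T2's TLS destructors. Here's a std-only sketch of that ordering, with a hypothetical Guard type standing in for LOCAL_STDOUT and no dylib involved. It assumes a platform where thread-local destructors do run at thread exit, which is true on Linux with glibc but not something Rust's standard library promises everywhere.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicBool, Ordering};

// Set by the destructor below; stands in for "the TLS destructor has run".
static DTOR_RAN: AtomicBool = AtomicBool::new(false);

// Hypothetical stand-in for a thread-local like `LOCAL_STDOUT`.
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        DTOR_RAN.store(true, Ordering::SeqCst);
    }
}

thread_local! {
    // The first access from a thread registers a TLS destructor for it.
    static GUARD: RefCell<Option<Guard>> = RefCell::new(None);
}

/// Spawns a thread that touches the thread-local, joins it, and reports
/// whether that thread's TLS destructor had already run when `join()` returned.
fn dtor_ran_before_join_returned() -> bool {
    DTOR_RAN.store(false, Ordering::SeqCst);
    std::thread::spawn(|| {
        GUARD.with(|g| *g.borrow_mut() = Some(Guard));
    })
    .join()
    .unwrap();
    // In greet-rs, this is the point where dropping the `Library` is safe.
    DTOR_RAN.load(Ordering::SeqCst)
}
```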
But there are many such scenarios. What if, instead of running load_and_print in a thread, we run the whole loop in a thread that isn't the main thread?
// in `greet-rs/src/main.rs`
use cstr::cstr;
use std::ffi::c_void;
use std::{error::Error, io::BufRead, os::raw::c_char};
use libloading::{Library, Symbol};
fn main() -> Result<(), Box<dyn Error>> {
std::thread::spawn(run).join().unwrap();
Ok(())
}
fn run() {
let mut line = String::new();
let stdin = std::io::stdin();
println!("Here we go!");
let n = 3;
for _ in 0..n {
load_and_print().unwrap();
println!("-----------------------------");
println!("Press Enter to go again, Ctrl-C to exit...");
line.clear();
stdin.read_line(&mut line).unwrap();
}
println!("Did {} rounds, stopping", n);
}
fn load_and_print() -> Result<(), libloading::Error> {
let lib = Library::new("../libgreet-rs/target/debug/libgreet.so")?;
unsafe {
let greet: Symbol<unsafe extern "C" fn(name: *const c_char)> = lib.get(b"greet")?;
greet(cstr!("reloading").as_ptr());
}
Ok(())
}
$ lldb ./target/debug/greet-rs
(lldb) target create "./target/debug/greet-rs"
Current executable set to '/home/amos/ftl/greet/greet-rs/target/debug/greet-rs' (x86_64).
(lldb) env LD_PRELOAD=../libtls-dtor-fallback.so
(lldb) r
Process 333436 launched: '/home/amos/ftl/greet/greet-rs/target/debug/greet-rs' (x86_64)
Here we go!
three cheers for reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...
three cheers for reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...
three cheers for reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...
Did 3 rounds, stopping
Process 333436 stopped
* thread #2, name = 'greet-rs', stop reason = signal SIGSEGV: invalid address (fault address: 0x7ffff7b574f0)
frame #0: 0x00007ffff7b574f0
error: memory read failed for 0x7ffff7b57400
(lldb) bt
* thread #2, name = 'greet-rs', stop reason = signal SIGSEGV: invalid address (fault address: 0x7ffff7b574f0)
* frame #0: 0x00007ffff7b574f0
frame #1: 0x00007ffff7f74201 libpthread.so.0`__nptl_deallocate_tsd at pthread_create.c:302:8
frame #2: 0x00007ffff7f7418a libpthread.so.0`__nptl_deallocate_tsd at pthread_create.c:251
frame #3: 0x00007ffff7f743fc libpthread.so.0`start_thread(arg=0x00007ffff7d85640) at pthread_create.c:474:3
frame #4: 0x00007ffff7e88293 libc.so.6`__clone at clone.S:95
(lldb)
So... what's our solution here?
Well, there are a few things we can try.
A little memory leak, as a treat
Listen, sometimes you have to make compromises.
$ cargo new --lib compromise
Created library `compromise` package
Let's 🛒 go 🛒 shopping!
$ cargo add once_cell
Adding once_cell v1.4.1 to dependencies
$ cargo add cstr
Adding cstr v0.2.4 to dependencies
$ cargo add libc
Adding libc v0.2.77 to dependencies
So, this one is going to be a bit convoluted, but stay with me - we can do this.
First off, we don't always want hot reloading to be enabled. When it's disabled, we actually want to register TLS destructors. So we need to maintain some global state that represents whether we're in a hot reloading scenario or not.
We could put it behind a Mutex, but do we really need to? Who knows how a Mutex is even implemented? Maybe it uses thread-local primitives behind the scenes, which we cannot use to implement this.
Let's go for something more minimal - just an AtomicBool.
// in `compromise/src/lib.rs`
use std::{sync::atomic::AtomicBool, sync::atomic::Ordering};
static HOT_RELOAD_ENABLED: AtomicBool = AtomicBool::new(false);
// this one will be called from our executable, so it needs to be `pub`
pub fn set_hot_reload_enabled(enabled: bool) {
HOT_RELOAD_ENABLED.store(enabled, Ordering::SeqCst)
}
// this one can be `pub(crate)`, it'll only be called internally
pub(crate) fn is_hot_reload_enabled() -> bool {
HOT_RELOAD_ENABLED.load(Ordering::SeqCst)
}
Next up: we need an actual mechanism to prevent registration of TLS destructors when hot-reloading is enabled.
Right now we only have an implementation for Linux:
// in `compromise/src/lib.rs`
#[cfg(target_os = "linux")]
pub mod linux;
That's where things get a little... complicated.
Basically, we want to provide a function that:
- Takes the same arguments as __cxa_thread_atexit_impl
- Calls libc's __cxa_thread_atexit_impl, only if hot reloading is disabled.
Which means, if hot reloading is disabled, we need to look up libc's __cxa_thread_atexit_impl.
How do we even do that? Isn't it hidden by our own version?
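Not hidden. Ours just comes first. We can still grab it with dlsym(), using the RTLD_NEXT pseudo-handle, which asks for the next occurrence of the symbol in the search order, after the object making the call.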
How convenient. And we're going to do that on every call?
Well, that's the tricky part.
We don't care a lot about performance, because we don't expect to be registering TLS destructors very often, but still, I'd expect a dlsym call to be sorta costly, so I'd like to cache it.
First, let's define the type of the function we'll be looking up:
// in `compromise/src/lib.rs`
use std::ffi::c_void;
type NextFn = unsafe extern "C" fn(*mut c_void, *mut c_void, *mut c_void);
Next - one way to "only look it up once" would be to declare a static of type once_cell::sync::Lazy<NextFn> - similar to what lazy_static gives us, except using once_cell.
// in `compromise/src/lib.rs`
use cstr::cstr;
use once_cell::sync::Lazy;
use std::mem::transmute;
static NEXT: Lazy<NextFn> = Lazy::new(|| unsafe {
transmute(libc::dlsym(
libc::RTLD_NEXT,
#[allow(clippy::transmute_ptr_to_ref)] // just silencing warnings
cstr!("__cxa_thread_atexit_impl").as_ptr(),
))
});
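The same "look it up once" pattern is available in the standard library these days: std::sync::OnceLock was stabilized in Rust 1.70, well after this article was written. Here's a sketch, with a hypothetical expensive_lookup function (and a simplified signature) standing in for the dlsym call, so we can count how many times the lookup actually runs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Simplified signature for the demo (the real one takes three pointers).
type NextFn = extern "C" fn(u32) -> u32;

// Counts how many times the (hypothetical) expensive lookup ran.
static LOOKUPS: AtomicUsize = AtomicUsize::new(0);

extern "C" fn system_impl(x: u32) -> u32 {
    x + 1
}

// Hypothetical stand-in for `dlsym(RTLD_NEXT, ...)`.
fn expensive_lookup() -> Option<NextFn> {
    LOOKUPS.fetch_add(1, Ordering::SeqCst);
    Some(system_impl as NextFn)
}

// Filled in on first use, then reused by every later call.
static NEXT: OnceLock<Option<NextFn>> = OnceLock::new();

fn next() -> Option<NextFn> {
    *NEXT.get_or_init(expensive_lookup)
}
```

Calling next() any number of times only runs expensive_lookup once; every later call just copies the cached Option out.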
And then we can use it from our own thread_atexit:
// in `compromise/src/lib.rs`
#[allow(clippy::missing_safety_doc)]
pub unsafe fn thread_atexit(func: *mut c_void, obj: *mut c_void, dso_symbol: *mut c_void) {
if crate::is_hot_reload_enabled() {
// avoid registering TLS destructors on purpose, to avoid
// double-frees and general crashiness
} else {
// hot reloading is disabled, attempt to forward TLS destructor
// registration to glibc
// note: we need to deref `NEXT` because it's a `Lazy<T>`
(*NEXT)(func, obj, dso_symbol)
}
}
...but that could crash if the system glibc doesn't have __cxa_thread_atexit_impl - i.e., if dlsym returned a null pointer.
There's worse: building a value of type extern "C" fn foo() that is null is undefined behavior. Compiler optimizations may assume the pointer is non-null and remove any null checks we add.
So, let's not do undefined behavior.
Not even a little? As a treat?
Not even a little.
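Luckily, extern "C" fn foo() is a non-nullable pointer type, and Rust guarantees that Option<T>, when T is a function pointer (or a reference, a Box, or a NonNull), has the same size and layout as T: it's just None when the pointer is null. (That guarantee does not cover raw pointers like *mut T, which can already be null.)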
This is exactly what we want.
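We can check that guarantee directly, no dlsym required. In this sketch, a null address turns into None, the address of a made-up fake_impl function turns back into Some, and a compile-time assert confirms that Option<NextFn> is exactly pointer-sized.

```rust
use std::ffi::c_void;
use std::mem::{size_of, transmute};

type NextFn = unsafe extern "C" fn(*mut c_void, *mut c_void, *mut c_void);

// Guaranteed by the language: same size as a plain pointer.
const _: () = assert!(size_of::<Option<NextFn>>() == size_of::<*mut c_void>());

// Hypothetical function with the right signature, standing in for
// glibc's `__cxa_thread_atexit_impl`.
unsafe extern "C" fn fake_impl(_func: *mut c_void, _obj: *mut c_void, _dso: *mut c_void) {}

/// Interprets a raw symbol address the way the `NEXT` static does.
/// Null becomes `None`; a non-null address must really point to a
/// function with this signature for calling it to be sound.
fn to_next_fn(addr: *mut c_void) -> Option<NextFn> {
    unsafe { transmute::<*mut c_void, Option<NextFn>>(addr) }
}
```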
static NEXT: Lazy<Option<NextFn>> = Lazy::new(|| unsafe {
std::mem::transmute(libc::dlsym(
libc::RTLD_NEXT,
#[allow(clippy::transmute_ptr_to_ref)]
cstr!("__cxa_thread_atexit_impl").as_ptr(),
))
});
Now, onto our thread_atexit function.
Here's our full Linux implementation, with some symbols renamed for clarity:
// `compromise/src/linux.rs` implementation (whole file)
use cstr::cstr;
use once_cell::sync::Lazy;
use std::ffi::c_void;
pub type NextFn = unsafe extern "C" fn(*mut c_void, *mut c_void, *mut c_void);
static SYSTEM_THREAD_ATEXIT: Lazy<Option<NextFn>> = Lazy::new(|| unsafe {
#[allow(clippy::transmute_ptr_to_ref)]
let name = cstr!("__cxa_thread_atexit_impl").as_ptr();
std::mem::transmute(libc::dlsym(
libc::RTLD_NEXT,
#[allow(clippy::transmute_ptr_to_ref)]
name,
))
});
/// Turns glibc's TLS destructor register function, `__cxa_thread_atexit_impl`,
/// into a no-op if hot reloading is enabled.
///
/// # Safety
/// This needs to be public for symbol visibility reasons, but you should
/// never need to call this yourself
pub unsafe fn thread_atexit(func: *mut c_void, obj: *mut c_void, dso_symbol: *mut c_void) {
if crate::is_hot_reload_enabled() {
// avoid registering TLS destructors on purpose, to avoid
// double-frees and general crashiness
} else if let Some(system_thread_atexit) = *SYSTEM_THREAD_ATEXIT {
// hot reloading is disabled, and system provides `__cxa_thread_atexit_impl`,
// so forward the call to it.
system_thread_atexit(func, obj, dso_symbol);
} else {
// hot reloading is disabled *and* we don't have `__cxa_thread_atexit_impl`,
// throw hands up in the air and leak memory.
}
}
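To see the three branches in isolation, here's a std-only sketch that replaces glibc with a hypothetical fake_system_impl that just counts registrations, and takes the "system" function as a parameter instead of getting it from dlsym. Only the branching is meant to match the code above.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

type NextFn = unsafe extern "C" fn(*mut c_void, *mut c_void, *mut c_void);

static HOT_RELOAD_ENABLED: AtomicBool = AtomicBool::new(false);
// How many registrations reached the (fake) system implementation.
static REGISTERED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for glibc's `__cxa_thread_atexit_impl`.
unsafe extern "C" fn fake_system_impl(_: *mut c_void, _: *mut c_void, _: *mut c_void) {
    REGISTERED.fetch_add(1, Ordering::SeqCst);
}

/// Same branching as `compromise::linux::thread_atexit`, except the system
/// function is passed in instead of being looked up with `dlsym`.
unsafe fn thread_atexit(
    system: Option<NextFn>,
    func: *mut c_void,
    obj: *mut c_void,
    dso_symbol: *mut c_void,
) {
    if HOT_RELOAD_ENABLED.load(Ordering::SeqCst) {
        // hot reloading: skip registration on purpose (leak)
    } else if let Some(system_thread_atexit) = system {
        // hot reloading off, system implementation present: forward
        unsafe { system_thread_atexit(func, obj, dso_symbol) };
    } else {
        // no system implementation: leak as well
    }
}
```

Only one of the three cases ever registers anything; the other two are the memory leak this section's title admits to.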
Easy enough!
Mhhhhhhhh. But where do we define our own __cxa_thread_atexit_impl?
This one is just called thread_atexit, and it's mangled.
Good eye! Turns out, if we just define __cxa_thread_atexit_impl, even pub, even #[no_mangle], it's not enough, because when linking, GNU ld picks glibc's version and we never end up calling the one in the compromise crate.
So it only works if it's defined directly in the executable?
Correct.
How do we do that?
Well... there's always macros. Which let us more or less take a bunch of AST (Abstract Syntax Tree) nodes and paste them into the module that calls it.
Let's see how that would work:
// in `compromise/src/lib.rs`
#[macro_export]
macro_rules! register {
() => {
#[cfg(target_os = "linux")]
#[no_mangle]
pub unsafe extern "C" fn __cxa_thread_atexit_impl(
func: *mut c_void,
obj: *mut c_void,
dso_symbol: *mut c_void,
) {
compromise::linux::thread_atexit(func, obj, dso_symbol);
}
};
}
Ohhhh there it is. So the compromise crate only works if the executable's crate calls the compromise::register!() macro?
Yup!
And is that why linux::thread_atexit was pub? Because it'll actually end up being called from greet-rs (outside the compromise crate)?
Yes!! And that's also why, in the macro, it's fully-qualified: compromise::linux::thread_atexit.
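If a macro pasting items into its caller feels abstract, here's a tiny std-only sketch of the same shape: the macro defines hook inside whichever module invokes it, and hook forwards to a library function. The module and function names are made up. It also uses $crate:: instead of a hard-coded crate name, a common variation: $crate always resolves to the crate that defines the macro, even if the caller renamed the dependency.

```rust
// Hypothetical stand-in for `compromise::linux`.
pub mod linux {
    pub fn thread_atexit(x: u32) -> u32 {
        x + 1
    }
}

// Like `compromise::register!`, this pastes an item into whatever
// module invokes it.
macro_rules! register {
    () => {
        pub extern "C" fn hook(x: u32) -> u32 {
            $crate::linux::thread_atexit(x)
        }
    };
}

// Stand-in for the executable's crate: after expansion, `hook` is
// defined here, not in `linux`.
mod executable {
    register!();
}
```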
Alright, I'll let your crimes be for now - just show us how to use them!
Well, first we need to import the crate:
$ cd greet-rs/
$ cargo rm libc
Removing libc from dependencies
$ cargo add ../compromise
Adding compromise (unknown version) to dependencies
Cool bear's hot tip
cargo-edit (which provides the cargo add and cargo rm subcommands) is not doing "magic" here - it's just adding the compromise crate as a path dependency. Here's what the resulting Cargo.toml's dependencies section looks like:
[dependencies]
libloading = "0.6.3"
cstr = "0.2.2"
compromise = { path = "../compromise" }
It's not published to crates.io, it's not vendored: the compromise/ folder has to live on disk next to greet-rs/ or it won't build.
Then, in greet-rs/src/main.rs, we need to register compromise:
// in `greet-rs/src/main.rs`
// ⚠ Important: hot reloading won't work without it.
compromise::register!();
And then, at some point, call compromise::set_hot_reload_enabled(true).
But do we want to call it every time? No! So let's bring in a crate for CLI (command-line interface) argument parsing:
$ cargo add argh
Adding argh v0.1.3 to dependencies
It'll be quick - I swear.
use argh::FromArgs;
#[derive(FromArgs)]
/// Greet
struct Args {
/// whether "hot reloading" should be enabled
#[argh(switch)]
watch: bool,
}
fn main() -> Result<(), Box<dyn Error>> {
let args: Args = argh::from_env();
compromise::set_hot_reload_enabled(args.watch);
if args.watch {
println!("Hot reloading enabled - there will be memory leaks!");
}
std::thread::spawn(run).join().unwrap();
Ok(())
}
Let's give it a shot:
$ cargo b -q
...but before we do - did our trick work?
$ nm -D ./target/debug/greet-rs | grep __cxa
w __cxa_finalize@@GLIBC_2.2.5
000000000000ac40 T __cxa_thread_atexit_impl
Looking good! We can see that __cxa_finalize is only referenced (w marks a weak, undefined symbol) and will come from glibc, as evidenced by the @@GLIBC_2.2.5 version marker, whereas __cxa_thread_atexit_impl is defined in the executable itself (T, for the text section).
We can convince ourselves further by running it in LLDB:
$ lldb ./target/debug/greet-rs
(lldb) target create "./target/debug/greet-rs"
Current executable set to '/home/amos/ftl/greet/greet-rs/target/debug/greet-rs' (x86_64).
(lldb) b __cxa_thread_atexit_impl
Breakpoint 1: where = greet-rs`__cxa_thread_atexit_impl + 18 at lib.rs:26:13, address = 0x000000000000ac52
(lldb) r
Process 21171 launched: '/home/amos/ftl/greet/greet-rs/target/debug/greet-rs' (x86_64)
1 location added to breakpoint 1
Process 21171 stopped
* thread #1, name = 'greet-rs', stop reason = breakpoint 1.1
frame #0: 0x000055555555ec52 greet-rs`__cxa_thread_atexit_impl(func=0x000055555557ae60, obj=0x00007ffff7d87be0, dso_symbol=0x00005555555be088) at lib.rs:26:13
23 obj: *mut c_void,
24 dso_symbol: *mut c_void,
25 ) {
-> 26 compromise::linux::thread_atexit(func, obj, dso_symbol);
27 }
28 };
29 }
Fantastic.
At this point, we almost don't even need to try it out - unless we messed up the conditions in compromise/src/linux.rs, everything should work just fine.
But let's anyway. Here's an asciinema:
That's all well and good...
Yeah, I'm happy we finally got it work-
...but that's not "live" reloading. You still have to press enter.
...FINE.
Having fun
This has been a long and difficult article, so it's time to unwind a little, and reap what we've sown.
Segmentation faults?
No, fun.
First off, bear is 100% right. We're not live reloading right now. We're just loading the library every time we print anything, and unloading it right after.
Let's fix that.
$ cargo add notify --vers 5.0.0-pre.3
Adding notify v5.0.0-pre.3 to dependencies
// in `greet-rs/src/main.rs`
use notify::{RecommendedWatcher, RecursiveMode, Watcher};
use std::path::{Path, PathBuf};
fn main() -> Result<(), Box<dyn Error>> {
let args: Args = argh::from_env();
compromise::set_hot_reload_enabled(args.watch);
if args.watch {
println!("Hot reloading enabled - there will be memory leaks!");
}
let base = PathBuf::from("../libgreet-rs").canonicalize().unwrap();
let libname = "libgreet.so";
let relative_path = PathBuf::from("target").join("debug").join(libname);
let absolute_path = base.join(&relative_path);
let mut watcher: RecommendedWatcher = Watcher::new_immediate({
move |res: Result<notify::Event, _>| match res {
Ok(event) => {
if let notify::EventKind::Create(_) = event.kind {
if event.paths.iter().any(|x| x.ends_with(&relative_path)) {
let res = step(&absolute_path);
if let Err(e) = res {
println!("step error: {}", e);
}
}
}
}
Err(e) => println!("watch error: {}", e),
}
})
.unwrap();
watcher.watch(&base, RecursiveMode::Recursive).unwrap();
loop {
std::thread::sleep(std::time::Duration::from_secs(1));
}
}
fn step(lib_path: &Path) -> Result<(), libloading::Error> {
let lib = Library::new(lib_path)?;
unsafe {
let greet: Symbol<unsafe extern "C" fn(name: *const c_char)> = lib.get(b"greet")?;
#[allow(clippy::transmute_ptr_to_ref)]
greet(cstr!("saturday").as_ptr());
}
Ok(())
}
Now we're having fun!
Wait.. you're not going to explain any of it?
Shush. I'm having fun. Y'all can figure it out.
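For those who do want to figure it out, one detail in the watcher callback is easy to miss: Path::ends_with compares whole path components, not raw strings. So the x.ends_with(&relative_path) filter matches the built library exactly, and, for example, a same-named file under target/debug/deps/ doesn't trigger a reload. The paths below are made up for the demo.

```rust
use std::path::{Path, PathBuf};

/// Mirrors the watcher's filter: does this event path point at our library?
fn is_our_library(event_path: &Path) -> bool {
    let relative_path = PathBuf::from("target").join("debug").join("libgreet.so");
    // Component-wise: the last three components must be exactly these.
    event_path.ends_with(&relative_path)
}
```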
To try it out, let's combine our new file-watching powers with cargo-watch, to recompile libgreet-rs any time we change it.
$ cargo install cargo-watch
(cut: lots and lots of output)
And here's our next demo:
But this isn't fun enough.
Fun, in larger quantities
You know what would be fun? If we could draw stuff. In real time. And have our code be live-reloaded. Now that would be really fun.
Ohhhhh.
But in order for that to work, we probably don't want to be reloading the library every frame.
We don't have graphics yet, but let's prepare for that. First, let's make a plugin module with the implementation details:
mod plugin {
use libloading::{Library, Symbol};
use std::{os::raw::c_char, path::Path};
/// Represents a loaded instance of our plugin
/// We keep the `Library` together with function pointers
/// so that they go out of scope together.
pub struct Plugin {
pub greet: unsafe extern "C" fn(name: *const c_char),
lib: Library,
}
impl Plugin {
pub fn load(lib_path: &Path) -> Result<Self, libloading::Error> {
let lib = Library::new(lib_path)?;
Ok(unsafe {
Plugin {
greet: *(lib.get(b"greet")?),
lib,
}
})
}
}
}
And then let's use it!
Instead of having our watcher directly load the library, we'll have it communicate with our main thread using a std::sync::mpsc::channel.
On every "frame", if a message was sent to the channel, we'll try to reload the plugin. Otherwise, we'll just use it, as usual.
Let's go:
use plugin::Plugin;
use std::ffi::CString;
fn main() -> Result<(), Box<dyn Error>> {
// same as before
let args: Args = argh::from_env();
compromise::set_hot_reload_enabled(args.watch);
if args.watch {
println!("Hot reloading enabled - there will be memory leaks!");
}
let base = PathBuf::from("../libgreet-rs").canonicalize().unwrap();
let libname = "libgreet.so";
let relative_path = PathBuf::from("target").join("debug").join(libname);
let absolute_path = base.join(&relative_path);
// here's our channel to communicate between the watcher thread
// (using `tx`, the "transmitter") and the main thread (using
// `rx`, the "receiver").
let (tx, rx) = std::sync::mpsc::channel::<()>();
let mut watcher: RecommendedWatcher = Watcher::new_immediate({
move |res: Result<notify::Event, _>| match res {
Ok(event) => {
if let notify::EventKind::Create(_) = event.kind {
if event.paths.iter().any(|x| x.ends_with(&relative_path)) {
// signal that we need to reload
tx.send(()).unwrap();
}
}
}
Err(e) => println!("watch error: {}", e),
}
})
.unwrap();
watcher.watch(&base, RecursiveMode::Recursive).unwrap();
// Initial plugin load, before the main loop starts
let mut plugin = Some(Plugin::load(&absolute_path).unwrap());
let start = std::time::SystemTime::now();
// Forever... (or until Ctrl-C)
loop {
std::thread::sleep(std::time::Duration::from_millis(100));
if rx.try_recv().is_ok() {
println!("==== Reloading ====");
// These two lines look funky, but they're needed - we *first*
// need to drop the current plugin (which will call `dlclose`)
// before we load the next one (which will call `dlopen`), otherwise
// we'll just increase the reference count on the already-loaded
// DSO.
plugin = None;
plugin = Some(Plugin::load(&absolute_path)?);
}
if let Some(plugin) = plugin.as_ref() {
let s = format!("We've been running for {:?}", start.elapsed().unwrap());
let s = CString::new(s)?;
unsafe { (plugin.greet)(s.as_ptr()) };
}
}
}
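Here's a std-only sketch of that protocol, with a hypothetical FakePlugin that logs "load" and "unload" instead of calling dlopen and dlclose, and with the watcher's tx.send(()) done inline on chosen frames. It shows that try_recv never blocks a frame, and that assigning None first really does unload the old plugin before the new one is loaded.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};

type Log = Arc<Mutex<Vec<String>>>;

/// Hypothetical stand-in for `Plugin`: logs instead of `dlopen`/`dlclose`.
struct FakePlugin {
    generation: u32,
    log: Log,
}

impl FakePlugin {
    fn load(generation: u32, log: &Log) -> Self {
        log.lock().unwrap().push(format!("load {}", generation));
        FakePlugin { generation, log: Arc::clone(log) }
    }
}

impl Drop for FakePlugin {
    fn drop(&mut self) {
        self.log.lock().unwrap().push(format!("unload {}", self.generation));
    }
}

/// Runs `frames` iterations of the main loop; `reload_at` lists the frames
/// on which the "watcher" asks for a reload.
fn simulate_main_loop(frames: u32, reload_at: &[u32]) -> Vec<String> {
    let log: Log = Arc::new(Mutex::new(Vec::new()));
    let (tx, rx) = mpsc::channel::<()>();
    let mut generation = 0;
    let mut plugin = Some(FakePlugin::load(generation, &log));
    for frame in 0..frames {
        if reload_at.contains(&frame) {
            tx.send(()).unwrap(); // what the watcher thread would do
        }
        // Non-blocking: if nothing was sent, keep using the current plugin.
        if rx.try_recv().is_ok() {
            generation += 1;
            plugin = None; // drop (unload) the old plugin first...
            plugin = Some(FakePlugin::load(generation, &log)); // ...then load
        }
    }
    drop(plugin);
    let entries = log.lock().unwrap().clone();
    entries
}
```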
One more demo?
One more demo.
🎉🎉🎉
Let's draw some stuff
So, we've got the foundation of a very fun playground here.
We can turn our text application into a graphical application with very little effort. But I don't want to spend forever going over various drawing libraries; instead, I think we're going to go with... just a framebuffer.
Raw pixels.
$ cargo new --lib common
Created library `common` package
This library will be used by both greet-rs and libgreet-rs; it'll just define a couple of common data structures.
// in `common/src/lib.rs`
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Pixel {
pub b: u8,
pub g: u8,
pub r: u8,
/// Unused (zero)
pub z: u8,
}
#[repr(C)]
pub struct FrameContext {
pub width: usize,
pub height: usize,
pub pixels: *mut Pixel,
pub ticks: usize,
}
impl FrameContext {
pub fn pixels(&mut self) -> &mut [Pixel] {
unsafe { std::slice::from_raw_parts_mut(self.pixels, self.width * self.height) }
}
}
Cool bear's hot tip
This is not a lesson in FFI (foreign-function interface), but suffice it to say that slice references like &mut [Pixel] have no guaranteed layout (they're not FFI-safe), so their representation could change from one Rust version to the next.
So, we use a raw pointer instead, plus a getter that constructs the slice on the plugin's side.
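Here's a self-contained sketch of that round trip: the host hands over a raw pointer plus dimensions, and the plugin side rebuilds a slice with std::slice::from_raw_parts_mut. The types are copied from common above; draw_blue has the same body as the all-blue draw function we'll give libgreet-rs, called directly here instead of through a dylib, and render_frame is a made-up helper playing the host.

```rust
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Pixel {
    pub b: u8,
    pub g: u8,
    pub r: u8,
    /// Unused (zero)
    pub z: u8,
}

#[repr(C)]
pub struct FrameContext {
    pub width: usize,
    pub height: usize,
    pub pixels: *mut Pixel,
    pub ticks: usize,
}

impl FrameContext {
    pub fn pixels(&mut self) -> &mut [Pixel] {
        // Relies on the host passing a pointer to width * height pixels.
        unsafe { std::slice::from_raw_parts_mut(self.pixels, self.width * self.height) }
    }
}

// Plugin side: paint every pixel blue.
pub extern "C" fn draw_blue(cx: &mut FrameContext) {
    for p in cx.pixels() {
        p.b = 255;
    }
}

// Host side: owns the buffer, lends it out as a raw pointer for one frame.
pub fn render_frame(width: usize, height: usize) -> Vec<Pixel> {
    let mut pixels = vec![Pixel { b: 0, g: 0, r: 0, z: 0 }; width * height];
    let mut cx = FrameContext { width, height, pixels: pixels.as_mut_ptr(), ticks: 0 };
    draw_blue(&mut cx);
    pixels
}
```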
$ cd greet-rs/
$ cargo add minifb
Adding minifb v0.19.1 to dependencies
$ cargo add ../common
Adding common (unknown version) to dependencies
// in `greet-rs/src/main.rs`
use common::{FrameContext, Pixel};
use minifb::{Key, Window, WindowOptions};
fn main() -> Result<(), Box<dyn Error>> {
// omitted: CLI arg parsing, paths, watcher initialization
watcher.watch(&base, RecursiveMode::Recursive).unwrap();
const WIDTH: usize = 640;
const HEIGHT: usize = 360;
// every pixel starts out black
let mut pixels: Vec<Pixel> = vec![Pixel { z: 0, r: 0, g: 0, b: 0 }; WIDTH * HEIGHT];
let mut window = Window::new("Playground", WIDTH, HEIGHT, WindowOptions::default())?;
window.limit_update_rate(Some(std::time::Duration::from_micros(16600)));
let mut plugin = Some(Plugin::load(&absolute_path).unwrap());
let start = std::time::SystemTime::now();
while window.is_open() && !window.is_key_down(Key::Escape) {
if rx.try_recv().is_ok() {
println!("==== Reloading ====");
plugin = None;
plugin = Some(Plugin::load(&absolute_path)?);
}
if let Some(plugin) = plugin.as_ref() {
let mut cx = FrameContext {
width: WIDTH,
height: HEIGHT,
pixels: pixels.as_mut_ptr(),
ticks: start.elapsed().unwrap().as_millis() as usize,
};
unsafe { (plugin.draw)(&mut cx) }
}
window
.update_with_buffer(
#[allow(clippy::transmute_ptr_to_ptr)]
unsafe {
std::mem::transmute(pixels.as_slice())
},
WIDTH,
HEIGHT,
)
.unwrap();
}
Ok(())
}
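About that transmute: it relies on Pixel being four bytes laid out as b, g, r, z, which on a little-endian machine reads as the 0RGB u32 that minifb expects (red in bits 16..24, green in 8..16, blue in 0..8). It also quietly assumes the Vec<Pixel> allocation is 4-byte aligned, which Pixel (alignment 1) doesn't promise. If you'd rather not rely on either, one hedged alternative is to pack into a reusable Vec<u32> each frame, at the cost of a copy. to_0rgb and pack_frame are made-up names.

```rust
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Pixel {
    pub b: u8,
    pub g: u8,
    pub r: u8,
    /// Unused (zero)
    pub z: u8,
}

/// Packs a pixel as 0RGB: red in bits 16..24, green in 8..16, blue in 0..8.
/// Unlike the transmute, this doesn't depend on endianness or alignment.
fn to_0rgb(p: Pixel) -> u32 {
    ((p.r as u32) << 16) | ((p.g as u32) << 8) | (p.b as u32)
}

/// Refills `out` (reused across frames) with the packed framebuffer.
fn pack_frame(pixels: &[Pixel], out: &mut Vec<u32>) {
    out.clear();
    out.extend(pixels.iter().map(|&p| to_0rgb(p)));
}
```

In the main loop, &out would then be passed to update_with_buffer instead of the transmuted slice.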
Our plugin interface has been extended a little:
// in `greet-rs/src/main.rs`
mod plugin {
use common::FrameContext; // new
use libloading::{Library, Symbol};
use std::{os::raw::c_char, path::Path};
/// Represents a loaded instance of our plugin
/// We keep the `Library` together with function pointers
/// so that they go out of scope together.
pub struct Plugin {
pub draw: extern "C" fn(fc: &mut FrameContext), // new
pub greet: unsafe extern "C" fn(name: *const c_char),
lib: Library,
}
impl Plugin {
pub fn load(lib_path: &Path) -> Result<Self, libloading::Error> {
let lib = Library::new(lib_path)?;
Ok(unsafe {
Plugin {
greet: *(lib.get(b"greet")?),
draw: *(lib.get(b"draw")?), // new
lib,
}
})
}
}
}
Running it as-is won't work:
$ cargo run -q -- --watch
Hot reloading enabled - there will be memory leaks!
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: DlSym { desc: "/home/amos/ftl/greet/libgreet-rs/target/debug/libgreet.so: undefined symbol: draw" }', src/main.rs:70:56
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Luckily, libloading is looking out for us.
So, let's add a draw function to libgreet-rs:
$ cd libgreet-rs/
$ cargo add ../common
Adding common (unknown version) to dependencies
For a first try, we'll make the whole screen blue:
// in `libgreet-rs/src/lib.rs`
use common::FrameContext;
#[no_mangle]
pub extern "C" fn draw(cx: &mut FrameContext) {
let pixels = cx.pixels();
// all blue!
for p in pixels {
p.b = 255;
}
}
Let's give it a shot:
$ cd libgreet-rs/
$ cargo b -q
$ cd greet-rs/
$ cargo run -q -- --watch
Mhhhhh, pure blue. Revolting.
It's a proof of concept bear, cool down.
Also what's up with your window decorations?
I don't know, might be the combination of two HiDPI (high pixel density) settings, one zooming out, the other zooming in, or maybe it's just that I'm using gnome-unstable.
Ahah. Living dangerously I see.
Always.
And then... then that's it.
We're pretty much done.
Sure, we could add a lot of other nice things. We could let plugins have state, we could expose more functions, in one direction or the other.
But we have a nice enough playground. Don't believe me?
Just wind me up, and watch me go:
Afterword
What about Windows? or macOS?
Both left as an exercise to the reader. For macOS, I'd imagine a similar strategy applies. For Windows, I'm not sure. It looks like the standard library uses DLL_THREAD_DETACH and DLL_PROCESS_DETACH events, and keeps its own list of destructors, so that approach might not work in some multi-threaded scenarios.
We sure had to do a lot of things to live-reload a Rust library. In comparison, live-reloading a C library was super simple. What gives? I thought Rust was the best thing ever?
That's fair - but we went the long-winded way.
...as we always do.
Right. We could've totally gotten away with avoiding TLS destructors altogether - the host application could've exposed a println function, or we could've used std::io::stdout().write() directly. We had options.
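To make the first option concrete, here's a sketch where the host passes the plugin a table of callbacks, so the plugin never goes through println! (and its thread-local) from inside the DSO. HostApi, host_print, and plugin_greet are hypothetical names, output is captured in a string so it can be checked, and the plugin function is called directly rather than through libloading.

```rust
use std::sync::Mutex;

/// Functions the host exposes to plugins (hypothetical API).
#[repr(C)]
pub struct HostApi {
    /// Prints `len` bytes of UTF-8 starting at `ptr`.
    pub print: extern "C" fn(ptr: *const u8, len: usize),
}

// Captured output; a real host would write to stdout here instead.
static OUTPUT: Mutex<String> = Mutex::new(String::new());

// Host side: the only code that actually produces output.
extern "C" fn host_print(ptr: *const u8, len: usize) {
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    OUTPUT.lock().unwrap().push_str(&String::from_utf8_lossy(bytes));
}

// Plugin side: formats into a local String, then hands the bytes to the host.
pub extern "C" fn plugin_greet(host: &HostApi) {
    let msg = format!("hurray for {}!\n", "reloading");
    (host.print)(msg.as_ptr(), msg.len());
}
```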
Is that family of problems Rust-specific? The __cxa_thread_atexit_impl business?
No, it's not. We would've had the same kind of issues in C++, for which __cxa_thread_atexit_impl was made in the first place.
Would've or could've?
Well, I don't know whether cout and cerr rely on thread-local storage by default, so, "could've", I guess.
Aren't you afraid the readers are going to see the estimated time for this article and just walk away?
Well, they're reading now, aren't they?
...fair. But still, why not split this into a series?
Well, first off, because I want to see just how long I can make articles without splitting them up, without folks just discarding them. Hopefully by now folks know what I'm about, and whether it's worth their time or not.
What's the other reason?
Splitting it into a series involves moving a bunch of assets into a bunch of folders and I'm really tired.
Yes, it is 2020.
So, how long have we been working on this article?
About... two weeks I'd say. One of them full-time.
Do you regret being nerd-sniped like that? Would you try to avoid it in the future?
I don't regret it at all. I wouldn't say stepping through glibc code in LLDB is the epitome of fun, but it's already come in handy several times since I did that.
Would you recommend that readers do the same?
Absolutely - the more you can learn about the layers on which you build: your language's runtime, the operating system, the specifics of memory and processors, it's all useful once in a while.
Do you think they'll actually do it?
Well, by not making complete project sources available for these articles, I'm already sort of trying to reproduce the feeling of absorbing knowledge from a dead tree (i.e. print) book and typing it up by hand, on your own computer, to try and reproduce it.
Is that the real reason, or are you being lazy again?
Eh, 50/50. I don't think you can absorb all that knowledge by just downloading and running sources.
You have to work for it.
Isn't that sorta gatekeepy?
I'm not sure. Is it?
Well, I think readers just want something to play with. Not everyone has the time to sift through the article and apply every code change one by one.
You ought to know - you've been updating the Making our own executable packer series article by article, and it's been taking forever.
Fair enough. You do it then!
Uhhhh...