Good morning! It is still 2020, and the world is literally on fire, so I guess we could all use a distraction.

This article continues the tradition of me getting shamelessly nerd-sniped - once by Pascal about small strings, then again by a twitch viewer about Rust enum sizes.

This time, Ana was handing out free nerdsnipes, so I got in line, and mine was:

How about you teach us how to hot-reload a dylib whenever the file changes?

And, sure, we can do that.

What's a dylib?

dylib is short for "dynamic library", also called "shared library", "shared object", "DLL", etc.

Let's first look at things that are not dynamic libraries.

We'll start with a C program, using GCC and binutils on Linux.

Say we want to greet many different things and people - we might want a greet function:

Shell session
$ mkdir greet
$ cd greet/
C code
// in `main.c`

#include <stdio.h>

void greet(const char *name) {
    printf("Hello, %s!\n", name);
}

int main(void) {
    greet("moon");
    return 0;
}
Shell session
$ gcc -Wall main.c -o main
$ ./main
Hello, moon!

This is not a dynamic library. It's just a function.

We can put that function into another file:

C code
// in `greet.c`

#include <stdio.h>

void greet(const char *name) {
    printf("Hello, %s!\n", name);
}

And compile it into an object (.o) file:

Shell session
$ gcc -Wall -c greet.c
$ file greet.o
greet.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

Then, from main.c, pinky promise that there will be a function named greet that exists at some point in the future:

C code
// in `main.c`

extern void greet(const char *name);

int main(void) {
    greet("stars");
    return 0;
}

Then compile main.c into an object (.o) file:

Shell session
$ gcc -Wall -c main.c
$ file main.o
main.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

Now we have two objects: greet.o provides (T) greet, and needs (U) printf:

Shell session
$ nm greet.o
                 U _GLOBAL_OFFSET_TABLE_
0000000000000000 T greet
                 U printf

And we have main.o, which provides (T) main, and needs (U) greet:

Shell session
$ nm main.o
                 U _GLOBAL_OFFSET_TABLE_
                 U greet
0000000000000000 T main

If we try to make an executable out of just greet.o, then... it doesn't work, because main is not provided, and some other object (that GCC magically links in when making executables) wants it:

Shell session
$ gcc greet.o -o woops
/usr/bin/ld: /usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib/Scrt1.o: in function `_start':
(.text+0x24): undefined reference to `main'
collect2: error: ld returned 1 exit status

If we try to make an executable with just main.o, then... it doesn't work either, because we promised greet would be there, and it's not:

Shell session
$ gcc main.o -o woops
/usr/bin/ld: main.o: in function `main':
main.c:(.text+0xc): undefined reference to `greet'
collect2: error: ld returned 1 exit status

But if we have both... then it works!

Shell session
$ gcc main.o greet.o -o main
$ file main
main: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=e1915df00b8bf67e121fbd30f0eaf1fd81ecdeb6, for GNU/Linux 3.2.0, not stripped
$ ./main
Hello, stars!

And we have an executable. Again. But there's still no dynamic library (of ours) involved there.

If we look at the symbols our main executable needs:

Shell session
$ nm --undefined-only main
                 w __cxa_finalize@@GLIBC_2.2.5
                 w __gmon_start__
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 U __libc_start_main@@GLIBC_2.2.5
                 U printf@@GLIBC_2.2.5

Okay, let's ignore weak (w) symbols for now - mostly, it needs... some startup routine, and printf. Good.

As for the symbols that are defined in main, there's uh, a lot:

Shell session
$ nm --defined-only main
00000000000002e8 r __abi_tag
0000000000004030 B __bss_start
0000000000004030 b completed.0
0000000000004020 D __data_start
0000000000004020 W data_start
0000000000001070 t deregister_tm_clones
00000000000010e0 t __do_global_dtors_aux
0000000000003df0 d __do_global_dtors_aux_fini_array_entry
0000000000004028 D __dso_handle
0000000000003df8 d _DYNAMIC
0000000000004030 D _edata
0000000000004038 B _end
00000000000011f8 T _fini
0000000000001130 t frame_dummy
0000000000003de8 d __frame_dummy_init_array_entry
000000000000214c r __FRAME_END__
0000000000004000 d _GLOBAL_OFFSET_TABLE_
0000000000002018 r __GNU_EH_FRAME_HDR
0000000000001150 T greet
0000000000001000 t _init
0000000000003df0 d __init_array_end
0000000000003de8 d __init_array_start
0000000000002000 R _IO_stdin_used
00000000000011f0 T __libc_csu_fini
0000000000001180 T __libc_csu_init
0000000000001139 T main
00000000000010a0 t register_tm_clones
0000000000001040 T _start
0000000000004030 D __TMC_END__

So let's filter it out a little:

Shell session
$ nm --defined-only ./main | grep ' T '
00000000000011f8 T _fini
0000000000001150 T greet
00000000000011f0 T __libc_csu_fini
0000000000001180 T __libc_csu_init
0000000000001139 T main
0000000000001040 T _start

Oh hey, greet is there.

Does that mean... is our main file also a dynamic library?

Let's try loading it from another executable, at runtime.

How do we load a library at runtime? That's the dynamic linker's job. Instead of making our own, this time we'll use glibc's dynamic linker:

C code
// in `load.c`

// my best guess is that `dlfcn` stands for `dynamic loading functions`
#include <dlfcn.h>
#include <stdio.h>

// C function pointer syntax is... something.
// Let's typedef our way out of this one.
typedef void (*greet_t)(const char *name);

int main(void) {
    // what do we want? symbols!
    // when do we want them? at an implementation-defined time!
    void *lib = dlopen("./main", RTLD_LAZY);
    if (!lib) {
        fprintf(stderr, "failed to load library\n");
        return 1;
    }
    greet_t greet = (greet_t) dlsym(lib, "greet");
    if (!greet) {
        fprintf(stderr, "could not look up symbol 'greet'\n");
        return 1;
    }
    greet("venus");
    dlclose(lib);
    return 0;
}

Let's make an executable out of load.c and:

Shell session
$ gcc -Wall load.c -o load
/usr/bin/ld: /tmp/ccnvYCz7.o: in function `main':
load.c:(.text+0x15): undefined reference to `dlopen'
/usr/bin/ld: load.c:(.text+0x5a): undefined reference to `dlsym'
collect2: error: ld returned 1 exit status

Oh right, dlopen itself is in a dynamic library - libdl.so:

Shell session
$ whereis libdl.so
libdl: /usr/lib/libdl.so /usr/lib/libdl.a

Okay, /usr/lib is in gcc's default library path:

Shell session
$ gcc -x c -E -v /dev/null 2>&1 | grep LIBRARY_PATH
LIBRARY_PATH=/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/:/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib/:/lib/../lib/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../:/lib/:/usr/lib/

...and it does contain dlopen, dlsym and dlclose:

Shell session
$ nm /usr/lib/libdl.so | grep -E 'dl(open|sym|close)'
nm: /usr/lib/libdl.so: no symbols

Uhh... wait, those are dynamic symbols, so we need nm's -D flag:

Shell session
$ nm -D /usr/lib/libdl.so | grep -E 'dl(open|sym|close)'
0000000000001450 T dlclose@@GLIBC_2.2.5
0000000000001390 T dlopen@@GLIBC_2.2.5
00000000000014c0 T dlsym@@GLIBC_2.2.5

What's with the @@GLIBC_2.2.5 suffixes?

Oh hey cool bear - those are just versions, don't worry about them.

Say I did want to worry about them, where could I read more about them?

You can check the LSB Core Specification - but be warned, it's a rabbit hole and a half.

So, since libdl.so contains the symbols we need, and it's in GCC's library path, we should be able to link against it with just -ldl:

Shell session
$ gcc -Wall load.c -o load -ldl
$ file load
load: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=0d246f67c894d7032d0d5093ec01625e58711034, for GNU/Linux 3.2.0, not stripped

Hooray! Now we just have to run it:

Shell session
$ ./load
failed to load library

Ah. Well let's be thankful our C program had basic error checking this time.

So, main is not a dynamic library?

I guess not.

Is there any way to get a little more visibility into why dlopen fails?

Sure! We can use the LD_DEBUG environment variable.

Shell session
$ LD_DEBUG=all ./load
    160275:     symbol=__vdso_clock_gettime;  lookup in file=linux-vdso.so.1 [0]
    160275:     binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_clock_gettime' [LINUX_2.6]
    160275:     symbol=__vdso_gettimeofday;  lookup in file=linux-vdso.so.1 [0]
    160275:     binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_gettimeofday' [LINUX_2.6]
    160275:     symbol=__vdso_time;  lookup in file=linux-vdso.so.1 [0]
    160275:     binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_time' [LINUX_2.6]
    160275:     symbol=__vdso_getcpu;  lookup in file=linux-vdso.so.1 [0]
    160275:     binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_getcpu' [LINUX_2.6]
    160275:     symbol=__vdso_clock_getres;  lookup in file=linux-vdso.so.1 [0]
    160275:     binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_clock_getres' [LINUX_2.6]

Hold on, hold on - what are these for?

vDSO is for "virtual dynamic shared object" - the short answer is, it makes some syscalls faster. The long answer, you can read on LWN.

Shell session
    160275:     file=libdl.so.2 [0];  needed by ./load [0]
    160275:     find library=libdl.so.2 [0]; searching
    160275:      search cache=/etc/ld.so.cache
    160275:       trying file=/usr/lib/libdl.so.2
    160275:
    160275:     file=libdl.so.2 [0];  generating link map
    160275:       dynamic: 0x00007f5be513dcf0  base: 0x00007f5be5139000   size: 0x0000000000005090
    160275:         entry: 0x00007f5be513a210  phdr: 0x00007f5be5139040  phnum:                 11

Ah, it's loading libdl.so - we asked for that! What's /etc/ld.so though?

Well, libdl.so is a dynamic library, so it's loaded at runtime, so the dynamic linker has to find it first.

There's a config file at /etc/ld.so.conf:

Shell session
$ cat /etc/ld.so.conf
# Dynamic linker/loader configuration.
# See ld.so(8) and ldconfig(8) for details.

include /etc/ld.so.conf.d/*.conf
$ cat /etc/ld.so.conf.d/*.conf
/usr/lib/libfakeroot
/usr/lib32
/usr/lib/openmpi

And to avoid repeating lookups, there's a cache at /etc/ld.so.cache:

Shell session
$ xxd /etc/ld.so.cache | tail -60 | head
00030bb0: 4641 7564 696f 2e73 6f00 6c69 6246 4175  FAudio.so.libFAu
00030bc0: 6469 6f2e 736f 002f 7573 722f 6c69 6233  dio.so./usr/lib3
00030bd0: 322f 6c69 6246 4175 6469 6f2e 736f 006c  2/libFAudio.so.l
00030be0: 6962 4547 4c5f 6e76 6964 6961 2e73 6f2e  ibEGL_nvidia.so.
00030bf0: 3000 2f75 7372 2f6c 6962 2f6c 6962 4547  0./usr/lib/libEG
00030c00: 4c5f 6e76 6964 6961 2e73 6f2e 3000 6c69  L_nvidia.so.0.li
00030c10: 6245 474c 5f6e 7669 6469 612e 736f 2e30  bEGL_nvidia.so.0
00030c20: 002f 7573 722f 6c69 6233 322f 6c69 6245  ./usr/lib32/libE
00030c30: 474c 5f6e 7669 6469 612e 736f 2e30 006c  GL_nvidia.so.0.l
00030c40: 6962 4547 4c5f 6e76 6964 6961 2e73 6f00  ibEGL_nvidia.so.

Let's keep going through our LD_DEBUG=all output:

Shell session
    160275:     file=libc.so.6 [0];  needed by ./load [0]
    160275:     find library=libc.so.6 [0]; searching
    160275:      search cache=/etc/ld.so.cache
    160275:       trying file=/usr/lib/libc.so.6
    160275:
    160275:     file=libc.so.6 [0];  generating link map
    160275:       dynamic: 0x00007f2d14b7a9c0  base: 0x00007f2d149b9000   size: 0x00000000001c82a0
    160275:         entry: 0x00007f2d149e1290  phdr: 0x00007f2d149b9040  phnum:                 14

Same deal - but this time it's loading libc.so.6.

Shell session
    160275:     checking for version `GLIBC_2.2.5' in file /usr/lib/libdl.so.2 [0] required by file ./load [0]
    160275:     checking for version `GLIBC_2.2.5' in file /usr/lib/libc.so.6 [0] required by file ./load [0]
    160275:     checking for version `GLIBC_PRIVATE' in file /lib64/ld-linux-x86-64.so.2 [0] required by file /usr/lib/libdl.so.2 [0]
    160275:     checking for version `GLIBC_PRIVATE' in file /usr/lib/libc.so.6 [0] required by file /usr/lib/libdl.so.2 [0]
    160275:     checking for version `GLIBC_2.4' in file /usr/lib/libc.so.6 [0] required by file /usr/lib/libdl.so.2 [0]
    160275:     checking for version `GLIBC_2.2.5' in file /usr/lib/libc.so.6 [0] required by file /usr/lib/libdl.so.2 [0]
    160275:     checking for version `GLIBC_2.2.5' in file /lib64/ld-linux-x86-64.so.2 [0] required by file /usr/lib/libc.so.6 [0]
    160275:     checking for version `GLIBC_2.3' in file /lib64/ld-linux-x86-64.so.2 [0] required by file /usr/lib/libc.so.6 [0]
    160275:     checking for version `GLIBC_PRIVATE' in file /lib64/ld-linux-x86-64.so.2 [0] required by file /usr/lib/libc.so.6 [0]

Ah, there's those versions I was asking about earlier.

Yup. As you can see, there's a bunch of them. Also, I'm pretty sure "private" is not very semver, but let's not get distracted.

Shell session
    160275:     Initial object scopes
    160275:     object=./load [0]
    160275:      scope 0: ./load /usr/lib/libdl.so.2 /usr/lib/libc.so.6 /lib64/ld-linux-x86-64.so.2
    160275:
    160275:     object=linux-vdso.so.1 [0]
    160275:      scope 0: ./load /usr/lib/libdl.so.2 /usr/lib/libc.so.6 /lib64/ld-linux-x86-64.so.2
    160275:      scope 1: linux-vdso.so.1
    160275:
    160275:     object=/usr/lib/libdl.so.2 [0]
    160275:      scope 0: ./load /usr/lib/libdl.so.2 /usr/lib/libc.so.6 /lib64/ld-linux-x86-64.so.2
    160275:
    160275:     object=/usr/lib/libc.so.6 [0]
    160275:      scope 0: ./load /usr/lib/libdl.so.2 /usr/lib/libc.so.6 /lib64/ld-linux-x86-64.so.2
    160275:
    160275:     object=/lib64/ld-linux-x86-64.so.2 [0]
    160275:      no scope

Here the dynamic linker is just telling us the order in which it'll look for symbols in various object files. Note that there's a specific order for each object file - they just happen to be mostly the same here.

For ./load, it'll first look in ./load, the executable we're loading, then in libdl, then in libc, then in.. the dynamic linker itself.

Wait... it looks for symbols in ./load? An executable?

So executables are also dynamic libraries?

Well... sort of. Let's come back to that later.

Shell session
    160275:     relocation processing: /usr/lib/libc.so.6
    160275:     symbol=_res;  lookup in file=./load [0]
    160275:     symbol=_res;  lookup in file=/usr/lib/libdl.so.2 [0]
    160275:     symbol=_res;  lookup in file=/usr/lib/libc.so.6 [0]
    160275:     binding file /usr/lib/libc.so.6 [0] to /usr/lib/libc.so.6 [0]: normal symbol `_res' [GLIBC_2.2.5]
    160275:     symbol=stderr;  lookup in file=./load [0]
    160275:     binding file /usr/lib/libc.so.6 [0] to ./load [0]: normal symbol `stderr' [GLIBC_2.2.5]
    160275:     symbol=error_one_per_line;  lookup in file=./load [0]
    160275:     symbol=error_one_per_line;  lookup in file=/usr/lib/libdl.so.2 [0]
    160275:     symbol=error_one_per_line;  lookup in file=/usr/lib/libc.so.6 [0]
    160275:     binding file /usr/lib/libc.so.6 [0] to /usr/lib/libc.so.6 [0]: normal symbol `error_one_per_line' [GLIBC_2.2.5]
(etc.)

Okay, there's a lot of these, so let's skip ahead. But you can see it looks up in the order it determined earlier: first ./load, then libdl, then libc.

Shell session
    160275:     calling init: /lib64/ld-linux-x86-64.so.2
    160275:
    160275:
    160275:     calling init: /usr/lib/libc.so.6
    160275:
    160275:
    160275:     calling init: /usr/lib/libdl.so.2
    160275:
    160275:
    160275:     initialize program: ./load
    160275:
    160275:
    160275:     transferring control: ./load

At this point it's done (well, done enough) loading dynamic libraries, and initializing them, and it has transferred control to our program, ./load.

Shell session
   160275:     symbol=dlopen;  lookup in file=./load [0]
    160275:     symbol=dlopen;  lookup in file=/usr/lib/libdl.so.2 [0]
    160275:     binding file ./load [0] to /usr/lib/libdl.so.2 [0]: normal symbol `dlopen' [GLIBC_2.2.5]

Uhhh amos, why is it still doing symbol lookups? Wasn't it done loading libdl.so?

Ehhh, it was "done enough". Remember the RTLD_LAZY flag we passed to dlopen? On my Linux distro, it's the default setting for the dynamic loader.

Oh. And I suppose the "implementation-defined time" is now?

Correct.
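
As an aside, if we wanted all of those lookups to happen up front instead, we could pass RTLD_NOW rather than RTLD_LAZY. Here's a minimal sketch of what that variant of load.c could look like - we don't actually need it for this walkthrough:

C code
// in a hypothetical `load_now.c` - like `load.c`, but with eager binding:
// RTLD_NOW makes `dlopen` resolve every undefined symbol right away,
// instead of deferring lookups until a symbol is first used.
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    void *lib = dlopen("./main", RTLD_NOW);
    if (!lib) {
        fprintf(stderr, "failed to load library\n");
        return 1;
    }
    dlclose(lib);
    return 0;
}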

Shell session
    160275:     file=./main [0];  dynamically loaded by ./load [0]
    160275:     file=./main [0];  generating link map

Oohh it's actually loading ./main!

Yes, because we called dlopen! It even says that it's "dynamically loaded" by ./load, our test executable.

Well, what happens next? Any error messages?

Unfortunately, there aren't. It just looks up fwrite (which I'm assuming is what our fprintf call compiled to) so we can print our own error messages, then calls finalizers and exits:

Shell session
    160275:     symbol=fwrite;  lookup in file=./load [0]
    160275:     symbol=fwrite;  lookup in file=/usr/lib/libdl.so.2 [0]
    160275:     symbol=fwrite;  lookup in file=/usr/lib/libc.so.6 [0]
    160275:     binding file ./load [0] to /usr/lib/libc.so.6 [0]: normal symbol `fwrite' [GLIBC_2.2.5]
failed to load library
    160275:
    160275:     calling fini: ./load [0]
    160275:
    160275:
    160275:     calling fini: /usr/lib/libdl.so.2 [0]
    160275:

So we don't know what went wrong?

Well... remember when we tried to make sure libdl.so had dlopen and friends? We had to use nm's -D flag

D for "dynamic", yes.

But when we found that main provided the greet symbol, we didn't use -D.

And if we do...

Shell session
$ nm -D main
                 w __cxa_finalize@@GLIBC_2.2.5
                 w __gmon_start__
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 U __libc_start_main@@GLIBC_2.2.5
                 U printf@@GLIBC_2.2.5

...there's no sign of greet.

Ohh. So for main, greet is in one of the symbol tables, but not the dynamic symbol table.

Correct!

In fact, if we strip main, all those symbols are gone.

Before:

Shell session
$ nm main | grep " T "
00000000000011f8 T _fini
0000000000001150 T greet
00000000000011f0 T __libc_csu_fini
0000000000001180 T __libc_csu_init
0000000000001139 T main
0000000000001040 T _start
$ stat -c '%s bytes' main
16664 bytes

After:

Shell session
$ strip main
$ nm main | grep " T "
nm: main: no symbols
$ stat -c '%s bytes' main
14328 bytes

But it still has dynamic symbols right? Even after stripping?

Yes, it needs printf!

Shell session
$ nm -D main
                 w __cxa_finalize@@GLIBC_2.2.5
                 w __gmon_start__
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 U __libc_start_main@@GLIBC_2.2.5
                 U printf@@GLIBC_2.2.5

Okay, so main has a dynamic symbol table, it just doesn't export anything.

Can we make it somehow both an executable and a dynamic library?

Bear, I'm so glad you asked. Yes we can.

Let's do it, just for fun:

C code
#include <unistd.h>
#include <stdio.h>

// This tells GCC to make a section named `.interp` and store
// `/lib64/ld-linux-x86-64.so.2` (the path of the dynamic linker) in it.
//
// (Normally it would do it itself, but since we're going to be using the
// `-shared` flag, it won't.)
const char interpreter[] __attribute__((section(".interp"))) = "/lib64/ld-linux-x86-64.so.2";

void greet(const char *name) {
    printf("Hello, %s!\n", name);
}

// Normally, we'd link with an object file that has its own entry point,
// and *then* calls `main`, but since we're using the `-shared` flag, we're
// linking to *another* object file, and we need to provide our own entry point.
//
// Unlike main, this one does not return an `int`, and we can never return from
// it, we need to call `_exit` or we'll crash.
void entry() {
    greet("rain");
    _exit(0);
}

And now... we make a dynamic library / executable hybrid:

Shell session
$ gcc -Wall -shared main.c -o libmain.so -Wl,-soname,libmain.so -Wl,-e,entry
$ file libmain.so
libmain.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=460bf95f9cd22afa074399512bd9290c20b552ff, not stripped
Cool bear's hot tip

-Wl,-some-option is how we tell GCC to pass linker options. -Wl,-foo will pass -foo to GNU ld. -Wl,-foo,bar will pass -foo bar (as two separate arguments).

-soname isn't technically required for this demo to work, but it's a thing, so we might as well set it.

As for -e entry, that one is required, otherwise we won't be able to run it as an executable. Remember, we're bringing our own entry point!
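
If we want to double-check that both linker flags took effect, readelf can show us the SONAME entry in the dynamic section, and the entry point address in the ELF header (outputs omitted here):

Shell session
$ readelf -d libmain.so | grep SONAME
$ readelf -h libmain.so | grep 'Entry point'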

And it works as an executable:

Shell session
$ ./libmain.so
Hello, rain!

And as a library:

C code
// in `load.c`

int main(void) {
    // was "main"
    void *lib = dlopen("./libmain.so", RTLD_LAZY);

    // etc.
}
Shell session
$ gcc -Wall load.c -o load -ldl
$ ./load
Hello, venus!

Whoa, that's neat! Can we take a look at LD_DEBUG output for this run?

Sure, let's g-

...but this time, can we filter it out a little, so it fits in one or two screens?

Okay, sure.

When LD_DEBUG is set, the dynamic linker (ld-linux-x86-64.so.2, which is also an executable / dynamic library hybrid) outputs debug information to the "standard error" (stderr), which has file descriptor number 2, so - if we want to filter it, we'll need to redirect "standard error" to "standard output" with 2>&1 - let's try it out:

Shell session
$ LD_DEBUG=all ./load 2>&1 | grep 'strcpy'
    172425:     symbol=strcpy;  lookup in file=./load [0]
    172425:     symbol=strcpy;  lookup in file=/usr/lib/libdl.so.2 [0]
    172425:     symbol=strcpy;  lookup in file=/usr/lib/libc.so.6 [0]
    172425:     binding file /usr/lib/libdl.so.2 [0] to /usr/lib/libc.so.6 [0]: normal symbol `strcpy' [GLIBC_2.2.5]

Yeah, that works!

Next up - all is a bit verbose, let's try setting LD_DEBUG to files instead. Also, let's pipe everything into wc -l, to count lines:

Shell session
$ LD_DEBUG=all ./load 2>&1 | wc -l
666
$ LD_DEBUG=files ./load 2>&1 | wc -l
50

Okay, 50 lines! That's much more reasonable:

Shell session
$ LD_DEBUG=files ./load 2>&1 | head -10
    173292:
    173292:     file=libdl.so.2 [0];  needed by ./load [0]
    173292:     file=libdl.so.2 [0];  generating link map
    173292:       dynamic: 0x00007f3a1df6fcf0  base: 0x00007f3a1df6b000   size: 0x0000000000005090
    173292:         entry: 0x00007f3a1df6c210  phdr: 0x00007f3a1df6b040  phnum:                 11
    173292:
    173292:
    173292:     file=libc.so.6 [0];  needed by ./load [0]
    173292:     file=libc.so.6 [0];  generating link map
    173292:       dynamic: 0x00007f3a1df639c0  base: 0x00007f3a1dda2000   size: 0x00000000001c82a0

Mhh, having the output prefixed by the PID (process identifier, here 173292) is a bit annoying - we can use sed (the "Stream EDitor") to fix that.

By the power vested in me by regular expressions, I filter thee:

Shell session
$ LD_DEBUG=files ./load 2>&1 | sed -E -e 's/^[[:blank:]]+[[:digit:]]+:[[:blank:]]*//' | head

file=libdl.so.2 [0];  needed by ./load [0]
file=libdl.so.2 [0];  generating link map
dynamic: 0x00007fe98d502cf0  base: 0x00007fe98d4fe000   size: 0x0000000000005090
entry: 0x00007fe98d4ff210  phdr: 0x00007fe98d4fe040  phnum:                 11


file=libc.so.6 [0];  needed by ./load [0]
file=libc.so.6 [0];  generating link map
dynamic: 0x00007fe98d4f69c0  base: 0x00007fe98d335000   size: 0x00000000001c82a0

Let's break that down. The -E flag enables extended regular expressions. My advice? Don't bother learning non-extended regular expressions.

-e specifies a script for sed to run. Here, our script has the s/pattern/replacement/ command, which substitutes pattern with replacement.

You can probably make sense of the pattern by just using a cheat sheet, but here's a quick breakdown: ^ anchors at the start of the line, [[:blank:]]+ matches one or more spaces or tabs, [[:digit:]]+ matches the PID, : matches the literal colon, and [[:blank:]]* swallows any trailing whitespace.

And our replacement is "". The empty string.

Hey, silly question - why are we using ' (a single quote) around sed scripts? Don't you usually use double quotes?

Well, I don't want the shell to expand whatever is inside. See for example:

Shell session
$ echo "$(whoami)"
amos

Compared to:

Shell session
$ echo '$(whoami)'
$(whoami)

Since there might be a bunch of strange characters, that are meaningful to my shell, I don't want my shell to interpolate any of it, so, single quotes.

Got it, thanks.

Cool bear's hot tip

Note that the above sed command could also be achieved with a simple cut -f 2.

But where's the fun in that?

So.

We've filtered out a lot of noise, but we're still getting those blank lines - we can use another sed command to filter those out: /pattern/d - where the d stands for "delete".

Our pattern will just be ^$ - it matches the start of a line and the end of a line, with nothing in between, so, only empty lines will (should?) match.

Shell session
$ LD_DEBUG=files ./load 2>&1 | sed -E -e 's/^[[:blank:]]+[[:digit:]]+:[[:blank:]]*//' -e '/^$/d'
file=libdl.so.2 [0];  needed by ./load [0]
file=libdl.so.2 [0];  generating link map
dynamic: 0x00007fc870f73cf0  base: 0x00007fc870f6f000   size: 0x0000000000005090
entry: 0x00007fc870f70210  phdr: 0x00007fc870f6f040  phnum:                 11
file=libc.so.6 [0];  needed by ./load [0]
file=libc.so.6 [0];  generating link map
dynamic: 0x00007fc870f679c0  base: 0x00007fc870da6000   size: 0x00000000001c82a0
entry: 0x00007fc870dce290  phdr: 0x00007fc870da6040  phnum:                 14
calling init: /lib64/ld-linux-x86-64.so.2
calling init: /usr/lib/libc.so.6
calling init: /usr/lib/libdl.so.2
initialize program: ./load
transferring control: ./load

Here comes the good stuff!

Shell session
file=./libmain.so [0];  dynamically loaded by ./load [0]
file=./libmain.so [0];  generating link map
dynamic: 0x00007fc870fa6e10  base: 0x00007fc870fa3000   size: 0x0000000000004040
entry: 0x00007fc870fa4150  phdr: 0x00007fc870fa3040  phnum:                 11
calling init: ./libmain.so
opening file=./libmain.so [0]; direct_opencount=1
calling fini: ./libmain.so [0]
file=./libmain.so [0];  destroying link map
calling fini: ./load [0]
calling fini: /usr/lib/libdl.so.2 [0]
Hello, venus!

So, the output is out of order here a little bit - the stderr (standard error) and stdout (standard output) streams are mixed, so printing Hello, venus actually happens before finalizers are called.

Alright, and what if we wanted a regular dynamic library, one that isn't also an executable?

That's much simpler. We don't need an entry point, we don't need to use a funky GCC attribute to add an .interp section, and we only need the one linker flag.

C code
// in `greet.c`

#include <stdio.h>

void greet(const char *name) {
    printf("Hello, %s!\n", name);
}

Do we need to do anything special to export greet?

No we don't! In C99, by default, functions have external linkage, so we're all good. If we wanted to not export it, we'd use the static keyword, to ask for internal linkage.
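
For instance, in a hypothetical variant of greet.c (helper is a made-up name), a static function stays internal to its translation unit while greet keeps its default external linkage - nm -D on the resulting .so should list greet, but not helper:

C code
// in a hypothetical `greet.c` variant

#include <stdio.h>

// internal linkage: only visible inside this translation unit,
// so it isn't exported from the shared object
static void helper(const char *name) {
    printf("Hello, %s!\n", name);
}

// external linkage (the C99 default): exported, so `dlsym` can find it
void greet(const char *name) {
    helper(name);
}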

And then just use -shared, and specify an output name of libsomething.so:

Shell session
$ gcc -Wall -shared greet.c -o libgreet.so

And, let's just adjust load.c to load libgreet.so (it was loading libmain.so previously):

C code
#include <dlfcn.h>
#include <stdio.h>

typedef void (*greet_t)(const char *name);

int main(void) {
    // this was `./libmain.so`
    void *lib = dlopen("./libgreet.so", RTLD_LAZY);
    if (!lib) {
        fprintf(stderr, "failed to load library\n");
        return 1;
    }
    greet_t greet = (greet_t) dlsym(lib, "greet");
    if (!greet) {
        fprintf(stderr, "could not look up symbol 'greet'\n");
        return 1;
    }
    greet("venus");
    dlclose(lib);
    return 0;
}

Okay! I think that now, 34 minutes in, we know what a dylib is.

More or less.

And now, some Rust

Let's write some Rust.

Shell session
$ cargo new greet-rs
     Created binary (application) `greet-rs` package
Rust code
// in `src/main.rs`

fn main() {
    greet("fresh coffee");
}

fn greet(name: &str) {
    println!("Hello, {}", name);
}
Shell session
$ cargo run -q
Hello, fresh coffee

It sure greets. But how does it actually work? Is it interpreted? Is it compiled?

I don't think Rust has an interpreter..

Well, actually... how do you think const-eval works?

M.. magic?

No, M is for Miri. It interprets mid-level intermediate representation, and voilà: compile-time evaluation.

I thought Miri was used to detect undefined behavior?

That too! It's a neat tool.

In that case, though, our code is definitely being compiled.

cargo run does two things: first, cargo build, then, run the resulting executable.

Shell session
$ cargo build
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
$ ./target/debug/greet-rs
Hello, fresh coffee

Now that we have a Linux executable, we can poke at it!

For example, we can look at its symbols:

Shell session
$ nm ./target/debug/greet-rs | grep " T "
000000000002ccc0 T __divti3
000000000002d0c8 T _fini
000000000002d0c0 T __libc_csu_fini
000000000002d050 T __libc_csu_init
00000000000053f0 T main
00000000000111f0 T __rdos_backtrace_create_state
0000000000010e60 T __rdos_backtrace_pcinfo
00000000000110e0 T __rdos_backtrace_syminfo
0000000000005580 T __rust_alloc
000000000000e470 T rust_begin_unwind
(etc.)

Mh, there's a lot of those. In my version, there's 188 T symbols.

We can also look at the dynamic symbols:

Shell session
$ nm -D ./target/debug/greet-rs
                 U abort@@GLIBC_2.2.5
                 U bcmp@@GLIBC_2.2.5
                 U bsearch@@GLIBC_2.2.5
                 U close@@GLIBC_2.2.5
                 w __cxa_finalize@@GLIBC_2.2.5
                 w __cxa_thread_atexit_impl@@GLIBC_2.18
                 U dladdr@@GLIBC_2.2.5
                 U dl_iterate_phdr@@GLIBC_2.2.5
                 U __errno_location@@GLIBC_2.2.5
                 U free@@GLIBC_2.2.5

This time, there's only 79 of them. But, see, it's not that different from our C executable. Since our Rust executable uses the standard library (it's not no_std), it also uses the C library. Here, it's glibc.

But does it export anything?

Shell session
$ nm -D --defined-only ./target/debug/greet-rs

Mh nope. Does that command even work, though? Is this thing on?

Shell session
$ nm -D --defined-only /usr/lib/libdl.so
0000000000001dc0 T dladdr@@GLIBC_2.2.5
0000000000001df0 T dladdr1@@GLIBC_2.3.3
0000000000001450 T dlclose@@GLIBC_2.2.5
0000000000001860 T dlerror@@GLIBC_2.2.5
0000000000005040 B _dlfcn_hook@@GLIBC_PRIVATE
0000000000001f20 T dlinfo@@GLIBC_2.3.3
00000000000020b0 T dlmopen@@GLIBC_2.3.4
0000000000001390 T dlopen@@GLIBC_2.2.5
00000000000014c0 T dlsym@@GLIBC_2.2.5
00000000000015c0 W dlvsym@@GLIBC_2.2.5
(etc.)

Okay, so it's not a dynamic library.

Correct!

But can we use a dynamic library from Rust?

Sure we can! That's how we get malloc, free, etc.

But how?

That's a fair question - after all, if our test executable load uses dlopen:

Shell session
$ ltrace -l 'libdl*' ./load
load->dlopen("./libmain.so", 1)                                                     = 0x55cf98c282c0
load->dlsym(0x55cf98c282c0, "greet")                                                = 0x7fe0d4074129
Hello, venus!
load->dlclose(0x55cf98c282c0)                                                       = 0
+++ exited (status 0) +++

Our greet-rs executable doesn't:

Shell session
$ ltrace -l 'libdl*' ./greet-rs/target/debug/greet-rs
Hello, fresh coffee
+++ exited (status 0) +++

Exactly, so, how does it load them?

Well... it doesn't. The dynamic linker does it, before our program even starts.

We can use ldd to find out the direct dependencies of an ELF file. Our executable is an ELF file. Dynamic libraries on this system are ELF files too. Even our .o files have been ELF files all along.

Cool bear's hot tip

This wasn't always the case on Linux (or MINIX, or System V).

In the times of yore, there was a.out. There was stabbing involved.

Shell session
$ ldd ./greet-rs/target/debug/greet-rs
        linux-vdso.so.1 (0x00007ffc911e1000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f17479b1000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f174798f000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f1747975000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f17477ac000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f1747a27000)

But I don't love ldd.

ldd is just a bash script!

Shell session
$ head -3 $(which ldd)
#! /usr/bin/bash
# Copyright (C) 1996-2020 Free Software Foundation, Inc.
# This file is part of the GNU C Library.

And how do we feel about bash in this house?

Conflicted.

Correct.

In fact, ldd just sets an environment variable and calls the dynamic linker instead - which, as we mentioned earlier, is both a dynamic library and an executable:

Shell session
$ LD_TRACE_LOADED_OBJECTS=1 /lib64/ld-linux-x86-64.so.2 ./greet-rs/target/debug/greet-rs
        linux-vdso.so.1 (0x00007ffd5d1c3000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f3c953da000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f3c953b8000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f3c9539e000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f3c951d5000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f3c95450000)

The main reason I don't like that is that running ldd on an executable actually loads it, and, if it's a malicious binary, this can result in arbitrary code execution.

Another reason I don't love ldd is that its output is all flat:

Shell session
$ ldd /bin/bash
        linux-vdso.so.1 (0x00007ffcbd1ef000)
        libreadline.so.8 => /usr/lib/libreadline.so.8 (0x00007fd0dbabf000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0x00007fd0dbab9000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007fd0db8f0000)
        libncursesw.so.6 => /usr/lib/libncursesw.so.6 (0x00007fd0db87f000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fd0dbc35000)

And I like trees:

Shell session
$ lddtree /bin/bash
/bin/bash (interpreter => /lib64/ld-linux-x86-64.so.2)
    libreadline.so.8 => /usr/lib/libreadline.so.8
        libncursesw.so.6 => /usr/lib/libncursesw.so.6
    libdl.so.2 => /usr/lib/libdl.so.2
    libc.so.6 => /usr/lib/libc.so.6

Fancy. But Amos... isn't lddtree also a bash script?

It is! But it doesn't use ld.so, it uses scanelf, or readelf and objdump.

What if I wanted something that isn't written in bash? For personal reasons?

There's also a C++11 thing and I also did a Go thing, there's plenty of poison from which to pick yours.

None of which tells us how the dynamic linker knows which libraries to load. Reading the bash source of ldd is especially unhelpful, since it just lets ld.so do all the hard work.

However, if we use readelf...

Shell session
$ readelf --dynamic ./greet-rs/target/debug/greet-rs | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]

...we can see that the names of the dynamic libraries we need are right there in the dynamic section of our ELF file.

But here's the thing - how did that happen? I don't remember asking for any dynamic library, and yet here they are.

In other words, what do we write in Rust, so that our executable requires another dynamic library?

Well, we write this:

Rust code
use std::{ffi::CString, os::raw::c_char};

#[link(name = "greet")]
extern "C" {
    fn greet(name: *const c_char);
}

fn main() {
    let name = CString::new("fresh coffee").unwrap();
    unsafe {
        greet(name.as_ptr());
    }
}
Cool bear's hot tip

There is a gotcha here - if you write the above code like that instead:

Rust code
    let name = CString::new("fresh coffee").unwrap().as_ptr();
    unsafe {
        greet(name);
    }

It doesn't work. Here's the equivalent of the incorrect version:

Rust code
    let name = {
        // builds a new CString
        let cstring = CString::new("fresh coffee").unwrap();
        // derives a raw pointer (`*const c_char`) from the CString.
        // since it's not a reference, it doesn't have a lifetime, nothing
        // in the type system links it to `cstring`
        let ptr = cstring.as_ptr();
        // here, `cstring` goes out of scope and is freed, so `ptr` is now
        // dangling
        ptr
    };
    unsafe {
        // `name` is what `ptr` was in our inner scope, and it's dangling,
        // so this will crash and/or do very naughty things.
        greet(name);
    }

Esteban wants me to tell you that this is a big gotcha because the Rust compiler doesn't catch this at all, at least not yet, so you have to be careful not to make this mistake. It's a good example of the dangers of unsafe.

Note also that this is not a problem with the cstr crate, which returns a &'static CStr, and which we use further down.

But that doesn't quite work, because...

Shell session
$ cargo build
   Compiling greet-rs v0.1.0 (/home/amos/ftl/greet/greet-rs)
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" "-Wl,--as-needed" "-Wl,-z,noexecstack" "-m64" (etc.) "-Wl,-Bdynamic" "-ldl" "-lrt" "-lpthread" "-lgcc_s" "-lc" "-lm" "-lrt" "-lpthread" "-lutil" "-ldl" "-lutil"
  = note: /usr/bin/ld: cannot find -lgreet
          collect2: error: ld returned 1 exit status


error: aborting due to previous error

error: could not compile `greet-rs`.

There are interesting things! Going on! In this error message!

Yes, I see some -Wl,-something command-line flags there. Is it using the same convention to pass linker flags?

It is!

Is it using... the same linker? GNU ld?

Yes! Unless we specifically ask it to use another linker, like gold or lld.

(Not to be confused with ldd.)

And our libgreet.so from earlier is definitely not in any of the default library paths.

So, we have a couple options at our disposal. We could copy libgreet.so to, say, /usr/lib. Although it would immediately make everything work, this requires root privilege, so we'll try not to do it.

We could set the RUSTFLAGS environment variable when building our binary:

Shell session
$ RUSTFLAGS="-L ${PWD}/.." cargo build
   Compiling greet-rs v0.1.0 (/home/amos/ftl/greet/greet-rs)
    Finished dev [unoptimized + debuginfo] target(s) in 0.17s
Cool bear's hot tip

PWD is an environment variable set to the "present working directory", also called "current working directory".

In bash and zsh, variables like $PWD are expanded - but it's often a good idea to enclose the variable name in curly braces, in case it's followed by other characters that are valid in identifiers.

To avoid this:

Shell session
$ echo "$PWDOOPS"

We do this:

Shell session
$ echo "${PWD}OOPS"
/home/amos/ftl/greet/greet-rsOOPS

Finally, -L .. would work just as well, but it's also a good idea to pass absolute paths when specifying search paths. Otherwise, if one of the tools involved passes that argument to another tool, and that other tool changes the current directory, our relative path becomes incorrect.

So, setting RUSTFLAGS works. Remembering to set it every time we want to compile is no fun, though.

So we can make a build script instead! It's in build.rs, not in the src/ folder, but next to the src/ folder:

Rust code
// in `build.rs`

use std::path::PathBuf;

fn main() {
    let manifest_dir =
        PathBuf::from(std::env::var_os("CARGO_MANIFEST_DIR").expect("manifest dir should be set"));
    let lib_dir = manifest_dir
        .parent()
        .expect("manifest dir should have a parent");
    println!("cargo:rustc-link-search={}", lib_dir.display());
}

And now we can just cargo build away:

Shell session
$ cargo build
   Compiling greet-rs v0.1.0 (/home/amos/ftl/greet/greet-rs)
    Finished dev [unoptimized + debuginfo] target(s) in 0.16s

But will it run?

Shell session
$ ./target/debug/greet-rs
./target/debug/greet-rs: error while loading shared libraries: libgreet.so: cannot open shared object file: No such file or directory

No, it won't!

But I thought... didn't we... didn't we specify -L so the linker could find libgreet.so?

Yes, the static linker (ld) found it. But the dynamic linker (ld.so) also needs to find it, at runtime.

How do we achieve that? Are there more search paths?

There are more search paths.

Shell session
$ LD_LIBRARY_PATH="${PWD}/.." ./target/debug/greet-rs
Hello, fresh coffee!

Hooray!

This is also a hassle, though. We probably don't want to specify the library path every time we run greet-rs.

As usual, we have a couple options available. Remember /etc/ld.so? There are config files in there. We could just make our own:

# in /etc/ld.so.conf.d/greet.conf

# change this unless your name is also amos, in which
# case, welcome to the club.
/home/amos/ftl/greet

And now, everything w-

Shell session
$ ./target/debug/greet-rs
./target/debug/greet-rs: error while loading shared libraries: libgreet.so: cannot open shared object file: No such file or directory

Wait, wasn't /etc/ld.so cached?

Oh, right.

Shell session
$ sudo ldconfig
Password: hunter2

That should do it.

Shell session
$ ./target/debug/greet-rs
Hello, fresh coffee!

Hurray! I wonder though: is it such a good idea to modify system configuration just for that?

It probably isn't. Which is why we're going to undo our changes.

Shell session
$ sudo rm /etc/ld.so.conf.d/greet.conf
$ sudo ldconfig
$ ./target/debug/greet-rs
./target/debug/greet-rs: error while loading shared libraries: libgreet.so: cannot open shared object file: No such file or directory

Now we're back to square one hundred.

The good news is: there is a thing in ELF files that tells the dynamic linker "hey by the way look here for libraries" - it's called "RPATH", or "RUNPATH", actually there's a bunch of these with subtle differences, oh no.

The bad news is: short of creating a .cargo/config file, or setting the RUSTFLAGS environment variable, there's no great way to set the RPATH in Rust right now. There's an open issue, though. Feel free to go ahead and contribute there.

Me? I have an article to finish. And the other good news here is that you can set an executable's RPATH after the fact. You can patch it.

Shell session
$ ./target/debug/greet-rs
./target/debug/greet-rs: error while loading shared libraries: libgreet.so: cannot open shared object file: No such file or directory
$ readelf -d ./target/debug/greet-rs | grep RUNPATH
$ patchelf --set-rpath "${PWD}/.." ./target/debug/greet-rs
$ ./target/debug/greet-rs
Hello, fresh coffee!
$ readelf -d ./target/debug/greet-rs | grep RUNPATH
 0x000000000000001d (RUNPATH)            Library runpath: [/home/amos/ftl/greet/greet-rs/..]

Heck, we can even make the RPATH relative to our executable's location.

Shell session
$ patchelf --set-rpath '$ORIGIN/../../..' ./target/debug/greet-rs

Oh, single quotes again! That way $ORIGIN doesn't get expanded by the shell, since it's not an environment variable - it's special syntax just for the dynamic linker.

Yes! And we can make sure we got it right with readelf:

Shell session
$ readelf -d ./target/debug/greet-rs | grep RUNPATH
 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN/../../..]
$ ./target/debug/greet-rs
Hello, fresh coffee!
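
By the way, if we'd rather bake the RPATH in at build time instead of patching the binary afterwards, the RUSTFLAGS route mentioned earlier should also do the trick - something like this (a sketch, untested here; note the single quotes again, so the shell leaves $ORIGIN alone):

Shell session
$ RUSTFLAGS='-C link-arg=-Wl,-rpath,$ORIGIN/../../..' cargo build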

Okay, the hard part is over. Kind of.

The thing is, we don't really want to "link against" libgreet.so.

We want to be able to dynamically reload it. So first, we have to dynamically load it. With dlopen.

But we can take all that knowledge we've just gained and use, like, the easy 10%, because the rest is irrelevant, you'll see why in a minute.

We've just seen how to use functions from a dynamic library - and dlopen and friends are in a dynamic library, libdl.so.

So let's just do that:

Rust code
use std::{ffi::c_void, ffi::CString, os::raw::c_char, os::raw::c_int};

#[link(name = "dl")]
extern "C" {
    fn dlopen(path: *const c_char, flags: c_int) -> *const c_void;
    fn dlsym(handle: *const c_void, name: *const c_char) -> *const c_void;
    fn dlclose(handle: *const c_void);
}

// had to look that one up in `dlfcn.h`
// in C, it's a #define. in Rust, it's a proper constant
pub const RTLD_LAZY: c_int = 0x00001;

fn main() {
    let lib_name = CString::new("../libgreet.so").unwrap();
    let lib = unsafe { dlopen(lib_name.as_ptr(), RTLD_LAZY) };
    if lib.is_null() {
        panic!("could not open library");
    }

    let greet_name = CString::new("greet").unwrap();
    let greet = unsafe { dlsym(lib, greet_name.as_ptr()) };

    type Greet = unsafe extern "C" fn(name: *const c_char);
    use std::mem::transmute;
    let greet: Greet = unsafe { transmute(greet) };

    let name = CString::new("fresh coffee").unwrap();
    unsafe {
        greet(name.as_ptr());
    }

    unsafe {
        dlclose(lib);
    }
}
Cool bear's hot tip

On Windows, you'd normally use LoadLibrary instead of dlopen, unless you used some sort of compatibility layer, like Cygwin.

Finally, we can remove our Cargo build script (build.rs), and we won't have to use patchelf either, since we're giving a full path (not just a name) to dlopen.

Shell session
$ cargo build -q
$ ./target/debug/greet-rs
Hello, fresh coffee!

🎉🎉🎉

Okay, that's a bunch of unsafe code.

Isn't there, you know, a crate for that?

Sure, let's go crate shopping.

Ooh, libloading looks cool, let's give it a shot:

Shell session
$ cargo add libloading
      Adding libloading v0.6.3 to dependencies
Rust code
use std::{ffi::CString, os::raw::c_char};

use libloading::{Library, Symbol};

fn main() {
    let lib = Library::new("../libgreet.so").unwrap();
    unsafe {
        let greet: Symbol<unsafe extern "C" fn(name: *const c_char)> = lib.get(b"greet").unwrap();
        let name = CString::new("fresh coffee").unwrap();
        greet(name.as_ptr());
    }
}

Mhhh. unwrap salad.

Alright, sure, let's have main return a Result instead, so we can use ? to propagate errors instead.

Rust code
use std::{error::Error, ffi::CString, os::raw::c_char};

use libloading::{Library, Symbol};

fn main() -> Result<(), Box<dyn Error>> {
    let lib = Library::new("../libgreet.so")?;
    unsafe {
        let greet: Symbol<unsafe extern "C" fn(name: *const c_char)> = lib.get(b"greet")?;
        let name = CString::new("fresh coffee")?;
        greet(name.as_ptr());
    }

    Ok(())
}

Better. But why are we building an instance of CString?

Couldn't we do that at compile-time? Isn't there.. a crate.. that lets us do C-style strings?

Yeah, yes, there's a crate for that, okay, sure.

Shell session
$ cargo add cstr
      Adding cstr v0.2.2 to dependencies
Rust code
use cstr::cstr;
use std::{error::Error, os::raw::c_char};

use libloading::{Library, Symbol};

fn main() -> Result<(), Box<dyn Error>> {
    let lib = Library::new("../libgreet.so")?;
    unsafe {
        let greet: Symbol<unsafe extern "C" fn(name: *const c_char)> = lib.get(b"greet")?;
        greet(cstr!("rust macros").as_ptr());
    }

    Ok(())
}

Now this I like. Very clean.

Yeah, libloading is cool!

Note that it'll also work on macOS, on which dynamic libraries are actually .dylib, and on Windows, where you have .dll files.
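
If we wanted one code path that picks the right file on every platform, a quick sketch with cfg attributes could look like this (the paths are hypothetical - adjust them to wherever the library actually lives):

Rust code
// pick a platform-appropriate file name for the library
#[cfg(target_os = "linux")]
const GREET_LIB: &str = "../libgreet.so";
#[cfg(target_os = "macos")]
const GREET_LIB: &str = "../libgreet.dylib";
#[cfg(target_os = "windows")]
const GREET_LIB: &str = "../greet.dll";

// ...and then: let lib = Library::new(GREET_LIB)?;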

Let's give it a try:

Shell session
$ cargo build -q
$ ./target/debug/greet-rs
Hello, rust macros!

Works great.

A Rust dynamic library

...but our libgreet.so is still C!

Can't we use Rust for that too?

Let's try it:

Shell session
$ cargo new --lib libgreet-rs
     Created library `libgreet-rs` package
$ cd libgreet-rs/

You sure about that naming convention buddy?

Not really, no

Now, if we want our Rust library to be a drop-in replacement for the C library, we need to match that function signature:

C code
void greet(const char *name);

The Rust equivalent would be *const c_char

Rust code
// in `libgreet-rs/src/lib.rs`

use std::{ffi::CStr, os::raw::c_char};

fn greet(name: *const c_char) {
    let cstr = unsafe { CStr::from_ptr(name) };
    println!("Hello, {}!", cstr.to_str().unwrap());
}
Shell session
$ cargo build
   Compiling libgreet-rs v0.1.0 (/home/amos/ftl/greet/libgreet-rs)
warning: function is never used: `greet`
 --> src/lib.rs:3:4
  |
3 | fn greet(name: *const c_char) {
  |    ^^^^^
  |
  = note: `#[warn(dead_code)]` on by default

warning: 1 warning emitted

    Finished dev [unoptimized + debuginfo] target(s) in 0.06s

Uh oh... an unused function. Do we need to ask for external linkage?

Oh right!

Rust code
pub fn greet(name: *const c_char) {
    // etc.
}

And also... maybe we should specify a calling convention?

Right again - since we're replacing a C library, let's make our function extern "C". And also, we're dealing with raw pointers, it's also unsafe.

And clippy is telling me to document why it's unsafe, so let's do it.

Rust code
/// # Safety
/// Pointer must be valid, and point to a null-terminated
/// string. What happens otherwise is UB.
pub unsafe extern "C" fn greet(name: *const c_char) {
    let cstr = CStr::from_ptr(name);
    println!("Hello, {}!", cstr.to_str().unwrap());
}

Is that it? Are we done?

Let's see... if we compile that library, what do we have in our target/debug/ folder?

Shell session
$ cargo build -q
$ ls ./target/debug/
build  deps  examples  incremental  liblibgreet_rs.d  liblibgreet_rs.rlib

Bwahaha liblib.

...yeah. Let's fix that real quick.

TOML markup
# in libgreet-rs/Cargo.toml

[package]
name = "greet" # was "libgreet-rs"
Shell session
$ cargo clean && cargo build -q
$ ls ./target/debug/
build  deps  examples  incremental  libgreet.d  libgreet.rlib

Better. So, we don't have an .so file. We don't even have an .a file! So it's not a typical static library either.

What is it?

Shell session
$ file ./target/debug/libgreet.rlib
./target/debug/libgreet.rlib: current ar archive

Oh, a "GNU ar" archive!

readelf can read those:

Shell session
$ readelf --symbols ./target/debug/libgreet.rlib | tail -10
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS 3rux23h9i3obhoz1
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    7
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT   18
     7: 0000000000000000    72 FUNC    GLOBAL HIDDEN     3 _ZN4core3fmt9Arg[...]
     8: 0000000000000000    34 OBJECT  WEAK   DEFAULT    4 __rustc_debug_gd[...]

File: ./target/debug/libgreet.rlib(lib.rmeta)

And so can nm:

Shell session
$ nm ./target/debug/libgreet.rlib | head
nm: lib.rmeta: file format not recognized

greet-01dfe44a33984d16.197vz32lntcvm24o.rcgu.o:
0000000000000000 V __rustc_debug_gdb_scripts_section__
0000000000000000 T _ZN4core3ptr13drop_in_place17h43462a34d923c292E

greet-01dfe44a33984d16.1ilnatflm2f12z98.rcgu.o:
0000000000000000 V DW.ref.rust_eh_personality
0000000000000000 r GCC_except_table0
0000000000000000 V __rustc_debug_gdb_scripts_section__
                 U rust_eh_personality

Apparently lib.rmeta is not an ELF file. From the file name, I'd say it's metadata. Let's try extracting it using ar x (for eXtract):

Shell session
$ ar x ./target/debug/libgreet.rlib lib.rmeta --output /tmp
$ file /tmp/lib.rmeta
/tmp/lib.rmeta: data
$ xxd /tmp/lib.rmeta | head
00000000: 7275 7374 0000 0005 0000 05a8 2372 7573  rust........#rus
00000010: 7463 2031 2e34 362e 3020 2830 3434 3838  tc 1.46.0 (04488
00000020: 6166 6533 2032 3032 302d 3038 2d32 3429  afe3 2020-08-24)
00000030: 0373 7464 f2f2 a5a4 fdaf af90 ae01 0002  .std............
00000040: 112d 6366 3066 3333 6166 3361 3930 3137  .-cf0f33af3a9017
00000050: 3738 0463 6f72 658a e799 f18c fcbf 85d2  78.core.........
00000060: 0100 0211 2d39 3734 3937 6332 3666 6464  ....-97497c26fdd
00000070: 6237 3838 3211 636f 6d70 696c 6572 5f62  b7882.compiler_b
00000080: 7569 6c74 696e 73af b98f f482 e282 db47  uiltins........G
00000090: 0002 112d 6631 6139 6438 6334 3433 6532  ...-f1a9d8c443e2

Mhhh, binary format shenanigans. Is there a crate to parse that?

Of course - but it's inside rustc's codebase.

So, that's all well and good, but it's not yet a drop-in replacement for our C dynamic library.

Turns out, there's a bunch of "crate types", which we can set with the lib.crate-type attribute in our Cargo manifest, Cargo.toml.

bin is for executables, it's the type of our greet-rs project. What we have right now is lib.

Then there's dylib:

TOML markup
# in libgreet-rs/Cargo.toml
[lib]
crate-type = ["dylib"]
Shell session
$ cargo clean && cargo build -q
$ ls target/debug/
build  deps  examples  incremental  libgreet.d  libgreet.so

Eyyy, we got an .so file!

We sure did! Let's try loading it!

Rust code
// in `greet-rs/src/main.rs`

fn main() -> Result<(), Box<dyn Error>> {
    // new path:
    let lib = Library::new("../libgreet-rs/target/debug/libgreet.so")?;
    // (cut)
}
Shell session
$ cargo run -q
Error: DlSym { desc: "../libgreet-rs/target/debug/libgreet.so: undefined symbol: greet" }

Awwwwwwww.

Is it still not exported? I thought we made it pub and everything?

I don't know, let's ask nm.

Shell session
$ nm ../libgreet-rs/target/debug/libgreet.so | grep greet
0000000000000000 N rust_metadata_greet_8d607b42dd0910ba8c251b9991cf8b1
000000000004b2a0 T _ZN5greet5greet17h1155cd3fae6e8167E

That's not very readable... it's as if the output is mangled somehow?

Let's read nm's man page:

       -C
       --demangle[=style]
           Decode (demangle) low-level symbol names into user-level names.
           Besides removing any initial underscore prepended by the system,
           this makes C++ function names readable. Different compilers have
           different mangling styles. The optional demangling style argument
           can be used to choose an appropriate demangling style for your
           compiler.

Sure, let's try it:

Shell session
$ nm --demangle ../libgreet-rs/target/debug/libgreet.so | grep greet
0000000000000000 N rust_metadata_greet_8d607b42dd0910ba8c251b9991cf8b1
000000000004b2a0 T greet::greet

Ohhh there's namespacing going on.

I think I've seen this before... try adding #[no_mangle] on greet?

Rust code
// in `libgreet-rs/src/lib.rs`

use std::{ffi::CStr, os::raw::c_char};

// new!
#[no_mangle]
pub unsafe extern "C" fn greet(name: *const c_char) {
    let cstr = CStr::from_ptr(name);
    println!("Hello, {}!", cstr.to_str().unwrap());
}
Shell session
$ (cd ../libgreet-rs && cargo build -q)
$ nm --demangle ../libgreet-rs/target/debug/libgreet.so | grep greet
000000000004b2a0 T greet
0000000000000000 N rust_metadata_greet_8d607b42dd0910ba8c251b9991cf8b1

Better! Let's try nm again, without --demangle, to make sure:

Shell session
$ nm ../libgreet-rs/target/debug/libgreet.so | grep greet
000000000004b2a0 T greet
0000000000000000 N rust_metadata_greet_8d607b42dd0910ba8c251b9991cf8b1

Wonderful. If my calculations are correct..

Shell session
$ cargo run -q
Hello, rust macros!

YES!

Nicely done. Does it also work when loaded from C?

Only one way to find out.

C code
// in `load.c`

#include <dlfcn.h>
#include <stdio.h>

typedef void (*greet_t)(const char *name);

int main(void) {
    // new path:
    void *lib = dlopen("./libgreet-rs/target/debug/libgreet.so", RTLD_LAZY);

    // the rest is as before
}
Shell session
$ gcc -Wall load.c -o load -ldl
$ ./load
Hello, venus!

Seems to work well.

However, using crate-type=dylib is discouraged, in favor of crate-type=cdylib (notice the leading "c").

Let's see why:

Shell session
$ cargo clean && cargo build --release -q
$ ls -lhA ./target/release/libgreet.so
-rwxr-xr-x 2 amos amos 3.8M Sep 16 16:24 ./target/release/libgreet.so
$ strip ./target/release/libgreet.so
$ ls -lhA ./target/release/libgreet.so
-rwxr-xr-x 2 amos amos 3.8M Sep 16 16:24 ./target/release/libgreet.so
$ nm -D ./target/release/libgreet.so | grep " T " | wc -l
2084

Now with cdylib:

TOML markup
[lib]
crate-type = ["cdylib"]
Shell session
$ cargo clean && cargo build --release -q
$ ls -lhA ./target/release/libgreet.so
-rwxr-xr-x 2 amos amos 2.7M Sep 16 16:25 ./target/release/libgreet.so
$ strip ./target/release/libgreet.so
$ ls -lhA ./target/release/libgreet.so
-rwxr-xr-x 2 amos amos 219K Sep 16 16:25 ./target/release/libgreet.so
$ nm -D ./target/release/libgreet.so | grep " T " | wc -l
2
$ nm -D ./target/release/libgreet.so | grep " T "
0000000000004260 T greet
000000000000cd70 T rust_eh_personality

Oooh. Exports only the symbols we care about and it's way smaller? Sign me the heck up.

Same! And it still loads from C!

C code
// in `load.c`

int main(void) {
    // was target/debug, now target/release
    void *lib = dlopen("./libgreet-rs/target/release/libgreet.so", RTLD_LAZY);
}
Shell session
$ gcc -Wall load.c -o load -ldl
$ ./load
Hello, venus!

And from Rust!

Rust code
// in `greet-rs/src/main.rs`

fn main() -> Result<(), Box<dyn Error>> {
    // was target/debug, now target/release
    let lib = Library::new("../libgreet-rs/target/release/libgreet.so")?;

    unsafe {
        let greet: Symbol<unsafe extern "C" fn(name: *const c_char)> = lib.get(b"greet")?;
        greet(cstr!("thin library").as_ptr());
    }

    Ok(())
}
Shell session
$ cargo run -q
Hello, thin library!

And now, some reloading

So far, we've only ever loaded libraries once. But can we reload them?

How do we even unload a library with libloading?

Well, libdl.so had a dlclose function.

Does libloading even close libraries? Ever?

Let's go hunt for info:

Shell session
$ lddtree ./target/debug/greet-rs
./target/debug/greet-rs (interpreter => /lib64/ld-linux-x86-64.so.2)
    libdl.so.2 => /usr/lib/libdl.so.2
    libpthread.so.0 => /usr/lib/libpthread.so.0
    libgcc_s.so.1 => /usr/lib/libgcc_s.so.1
    libc.so.6 => /usr/lib/libc.so.6
    ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2

Oh, greet-rs depends on libdl.so.

Maybe we can use ltrace to see if it ever calls dlclose?

Shell session
$ ltrace ./target/debug/greet-rs
Hello, thin library!
+++ exited (status 0) +++

Oh.

Let's debug our program. I wanted to use LLDB for once (the LLVM debugger), but fate has decided against it (it's broken for Rust 1.46 - the fix has already been merged and will land in the next stable).

So let's use GDB:

Shell session
$ gdb --quiet --args ./target/debug/greet-rs
Reading symbols from ./target/debug/greet-rs...
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /home/amos/ftl/greet/greet-rs/target/debug/greet-rs.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) break dlclose
Function "dlclose" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (dlclose) pending.
(gdb) run
Starting program: /home/amos/ftl/greet/greet-rs/target/debug/greet-rs
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Hello, thin library!

Breakpoint 1, 0x00007ffff7f91450 in dlclose () from /usr/lib/libdl.so.2
(gdb) bt
#0  0x00007ffff7f91450 in dlclose () from /usr/lib/libdl.so.2
#1  0x000055555555fb4e in <libloading::os::unix::Library as core::ops::drop::Drop>::drop (self=0x7fffffffe168)
    at /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/libloading-0.6.3/src/os/unix/mod.rs:305
#2  0x000055555555f54f in core::ptr::drop_in_place () at /home/amos/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:184
#3  0x000055555555ad1f in core::ptr::drop_in_place () at /home/amos/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:184
#4  0x000055555555c1fa in greet_rs::main () at src/main.rs:14

It does call dlclose! When the Library is dropped.

Hurray! That means we can do this, if we want:

Rust code
// in `greet-rs/src/main.rs`

use cstr::cstr;
use std::{error::Error, io::BufRead, os::raw::c_char};

use libloading::{Library, Symbol};

fn main() -> Result<(), Box<dyn Error>> {
    let mut line = String::new();
    let stdin = std::io::stdin();

    loop {
        if let Err(e) = load_and_print() {
            eprintln!("Something went wrong: {}", e);
        }

        println!("-----------------------------");
        println!("Press Enter to go again, Ctrl-C to exit...");
        stdin.lock().read_line(&mut line)?;
    }
}

fn load_and_print() -> Result<(), libloading::Error> {
    let lib = Library::new("../libgreet-rs/target/release/libgreet.so")?;
    unsafe {
        let greet: Symbol<unsafe extern "C" fn(name: *const c_char)> = lib.get(b"greet")?;
        greet(cstr!("reloading").as_ptr());
    }

    Ok(())
}
Shell session
$ cargo run -q
Hello, reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...

Hello, reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...

Hello, reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...
^C

Works well. But... the library isn't actually changing, though.

You're right, let me try to actually change it.

Bear, it doesn't work.

So it would appear.

Why doesn't it work?

Don't you think maybe you ought to have started that article with a proof of concept?

...

What can prevent dlclose from unloading a library?

Well - researching this part took me a little while.

I probably spent two entire days debugging this, and reading code from glibc and the Rust standard library. I worked through hypothesis after hypothesis, and also switched from debugger to debugger, as either the debuggers or their frontends abandoned me halfway through the adventure.

Yeah, but now you know how it works!

Bearly.

So, here's the "short" version.

In rtld (the runtime loader - what I've been calling the dynamic linker all this time), every instance of a DSO (dynamic shared object) is reference-counted.

Let's take our simple C library again: if we dlopen it once, it's mapped. And if we dlclose it once, it's not mapped anymore.

Let's change load.c to showcase that:

C code
// in `load.c`

#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void assert(void *p) {
  if (!p) {
    fprintf(stderr, "woops");
    exit(1);
  }
}

// this function is 101% pragmatic, don't @ me
void print_mapping_count() {
  const size_t buf_size = 1024;
  char buf[buf_size];
  printf("mapping count: ");
  fflush(stdout);
  snprintf(buf, buf_size, "bash -c 'cat /proc/%d/maps | grep libgreet | wc -l'",
           getpid());
  system(buf);
}

int main(void) {
  print_mapping_count();

  printf("> dlopen(RTLD_NOW)\n");
  void *lib = dlopen("./libgreet.so", RTLD_NOW);
  assert(lib);
  print_mapping_count();

  printf("> dlclose()\n");
  dlclose(lib);
  print_mapping_count();

  return 0;
}
Shell session
$ gcc -Wall load.c -o load -ldl
$ ./load
mapping count: 0
> dlopen(RTLD_NOW)
mapping count: 5
> dlclose()
mapping count: 0

This is what it looks like when it works.

Now, when you call dlopen multiple times, it doesn't map the same file over and over again. It doesn't actually load it several times.

Let's confirm by trying it:

C code
int main(void) {
  print_mapping_count();

  printf("> dlopen(RTLD_NOW)\n");
  void *lib = dlopen("./libgreet.so", RTLD_NOW);
  assert(lib);
  print_mapping_count();

  // new!
  printf("> dlopen(RTLD_NOW), a second time\n");
  void *lib2 = dlopen("./libgreet.so", RTLD_NOW);
  assert(lib2);
  print_mapping_count();

  return 0;
}
Shell session
$ gcc -Wall load.c -o load -ldl && ./load
mapping count: 0
> dlopen(RTLD_NOW)
mapping count: 5
> dlopen(RTLD_NOW), a second time
mapping count: 5

The number of file mappings remained the same. But how does glibc actually do that?

If we look at the dl_open_worker function in glibc 2.31, we can see it calls _dl_map_object:

C code
// in `glibc/elf/dl-open.c`
// in `dl_open_worker()`

  /* Load the named object.  */
  struct link_map *new;
  args->map = new = _dl_map_object (call_map, file, lt_loaded, 0,
                    mode | __RTLD_CALLMAP, args->nsid);

And the first thing _dl_map_object does is compare, to see if the name we're passing is similar to a name that's already loaded:

C code
// in `glibc/elf/dl-load.c`
// in `_dl_map_object()`

  /* Look for this name among those already loaded.  */
  for (l = GL(dl_ns)[nsid]._ns_loaded; l; l = l->l_next)
    {
      /* If the requested name matches the soname of a loaded object,
     use that object.  Elide this check for names that have not
     yet been opened.  */
      if (__glibc_unlikely ((l->l_faked | l->l_removed) != 0))
    continue;
      if (!_dl_name_match_p (name, l))
    {
      const char *soname;

      if (__glibc_likely (l->l_soname_added)
          || l->l_info[DT_SONAME] == NULL)
        continue;

      soname = ((const char *) D_PTR (l, l_info[DT_STRTAB])
            + l->l_info[DT_SONAME]->d_un.d_val);
      if (strcmp (name, soname) != 0)
        continue;

      /* We have a match on a new name -- cache it.  */
      add_name_to_object (l, soname);
      l->l_soname_added = 1;
    }

      /* We have a match.  */
      return l;
    }

Note that it compares both the DT_SONAME (which we covered earlier) and the actual name passed to dlopen. Even if we somehow managed to change both of these between loads, it goes on to compare a "file identifier", in _dl_map_object_from_fd:

C code
// in `glibc/elf/dl-load.c`
// in `_dl_map_object_from_fd()`

      /* Look again to see if the real name matched another already loaded.  */
      for (l = GL(dl_ns)[nsid]._ns_loaded; l != NULL; l = l->l_next)
    if (!l->l_removed && _dl_file_id_match_p (&l->l_file_id, &id))
      {
        /* The object is already loaded.
           Just bump its reference count and return it.  */
        __close_nocancel (fd);

        /* If the name is not in the list of names for this object add
           it.  */
        free (realname);
        add_name_to_object (l, name);

        return l;
      }

And on Linux, the "file id" is a struct made up of a device number and an inode number:

C code
// in `glibc/sysdeps/posix/dl-fileid.h`

/* For POSIX.1 systems, the pair of st_dev and st_ino constitute
   a unique identifier for a file.  */
struct r_file_id
  {
    dev_t dev;
    ino64_t ino;
  };

/* Sample FD to fill in *ID.  Returns true on success.
   On error, returns false, with errno set.  */
static inline bool
_dl_get_file_id (int fd, struct r_file_id *id)
{
  struct stat64 st;

  if (__glibc_unlikely (__fxstat64 (_STAT_VER, fd, &st) < 0))
    return false;

  id->dev = st.st_dev;
  id->ino = st.st_ino;
  return true;
}

So, dlopen tries really hard to identify "loading the same file twice".
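
To make that concrete: the "file identifier" is just what stat reports for the file. Here's a tiny standalone Rust sketch (not part of our project - point it at any .so you like) that prints the same (device, inode) pair rtld compares:

Rust code
// standalone sketch: print the (st_dev, st_ino) pair rtld uses as a file id
use std::os::unix::fs::MetadataExt;

fn main() -> std::io::Result<()> {
    let meta = std::fs::metadata("./libgreet.so")?;
    println!("dev = {}, ino = {}", meta.dev(), meta.ino());
    Ok(())
}

Two dlopen calls that end up on the same (dev, ino) pair are treated as the same object, no matter which path they were given.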

When closing, if the same file has been opened more times than it has been closed, nothing happens:

C code
// in `load.c`

int main(void) {
  print_mapping_count();

  printf("> dlopen(RTLD_NOW), loads the DSO\n");
  void *lib = dlopen("./libgreet.so", RTLD_NOW);
  assert(lib);
  print_mapping_count();

  printf("> dlopen(RTLD_NOW), increases the reference count\n");
  void *lib2 = dlopen("./libgreet.so", RTLD_NOW);
  assert(lib2);
  print_mapping_count();

  printf("> dlclose(), decreases the reference count\n");
  dlclose(lib2);
  print_mapping_count();

  printf("> dlclose(), reference count falls to 0, the DSO is unloaded\n");
  dlclose(lib);
  print_mapping_count();

  return 0;
}
Shell session
$ gcc -Wall load.c -o load -ldl && ./load
mapping count: 0
> dlopen(RTLD_NOW), loads the DSO
mapping count: 5
> dlopen(RTLD_NOW), increases the reference count
mapping count: 5
> dlclose(), decreases the reference count
mapping count: 5
> dlclose(), reference count falls to 0, the DSO is unloaded
mapping count: 0

Here's another reason why dlclose might not unload a DSO. If we loaded it with the RTLD_NODELETE flag:

C code
int main(void) {
  print_mapping_count();

  printf("> dlopen(RTLD_NOW | RTLD_NODELETE), loads the DSO\n");
  void *lib = dlopen("./libgreet.so", RTLD_NOW | RTLD_NODELETE);
  assert(lib);
  print_mapping_count();

  printf("> dlclose(), reference count falls to 0, but NODELETE is active\n");
  dlclose(lib);
  print_mapping_count();

  return 0;
}
Shell session
$ gcc -Wall load.c -o load -ldl && ./load
mapping count: 0
> dlopen(RTLD_NOW | RTLD_NODELETE), loads the DSO
mapping count: 5
> dlclose(), reference count falls to 0, but NODELETE is active
mapping count: 5

Here's yet another reason why dlclose might not unload a DSO: if we load another DSO, and some of its symbols are bound to symbols from the first DSO, then closing the first DSO will not unload it, since the second DSO still needs it.

Let's make something that links against libgreet.so:

C code
// in `woops.c`

extern void greet(const char *name);

void woops() {
        greet("woops");
}
Shell session
$ gcc -shared -Wall woops.c -o libwoops.so -L "${PWD}" -lgreet
$ file libwoops.so
libwoops.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=52a0b6f4bc8422b6dfbb4709decb8c3acdf23adf, with debug_info, not stripped
C code
int main(void) {
  print_mapping_count();

  printf("> dlopen(libgreet, RTLD_NOW)\n");
  void *lib = dlopen("./libgreet.so", RTLD_NOW);
  assert(lib);
  print_mapping_count();

  printf("> dlopen(libwoops, RTLD_NOW)\n");
  void *lib2 = dlopen("./libwoops.so", RTLD_NOW);
  assert(lib2);
  print_mapping_count();

  printf("> dlclose(libgreet), but libwoops still needs it!\n");
  dlclose(lib);
  print_mapping_count();

  return 0;
}

(Note that we still need to set LD_LIBRARY_PATH - rtld still needs to find libgreet.so on disk before realizing it's already loaded).

Shell session
$ gcc -Wall load.c -o load -ldl && LD_LIBRARY_PATH="${PWD}" ./load
mapping count: 0
> dlopen(libgreet, RTLD_NOW)
mapping count: 5
> dlopen(libwoops, RTLD_NOW)
mapping count: 5
> dlclose(libgreet), but libwoops still needs it!
mapping count: 5

If we close libwoops too, then libgreet ends up being unloaded as well, since nothing references it any longer:

C code
int main(void) {
  print_mapping_count();

  printf("> dlopen(libgreet, RTLD_NOW)\n");
  void *lib = dlopen("./libgreet.so", RTLD_NOW);
  assert(lib);
  print_mapping_count();

  printf("> dlopen(libwoops, RTLD_NOW)\n");
  void *lib2 = dlopen("./libwoops.so", RTLD_NOW);
  assert(lib2);
  print_mapping_count();

  printf("> dlclose(libgreet), but libwoops still needs it!\n");
  dlclose(lib);
  print_mapping_count();

  printf("> dlclose(libwoops), unloads libgreet\n");
  dlclose(lib2);
  print_mapping_count();


  return 0;
}
Shell session
$ gcc -Wall load.c -o load -ldl && LD_LIBRARY_PATH="${PWD}" ./load
mapping count: 0
> dlopen(libgreet, RTLD_NOW)
mapping count: 5
> dlopen(libwoops, RTLD_NOW)
mapping count: 5
> dlclose(libgreet), but libwoops still needs it!
mapping count: 5
> dlclose(libwoops), unloads libgreet
mapping count: 0

It doesn't matter in which order we close libgreet and libwoops. Any time we close anything, rtld goes through aaaaaaaall the objects it has loaded, and decides whether they're still needed.

So, we've seen three things that can prevent a DSO from unloading: a reference count that hasn't fallen to zero (more dlopen calls than dlclose calls), the RTLD_NODELETE flag, and another loaded DSO that still binds to its symbols.

But... but our Rust cdylib is doing none of those.

I know, right?

There is, in fact, a fourth thing.

Before clarifying everything, let me muddy the waters a little more.

Let's change our Rust library to make greet a no-op.

Rust code
// in `libgreet-rs/src/lib.rs`

use std::os::raw::c_char;

/// # Safety
/// Pointer must be valid, and point to a null-terminated
/// string. What happens otherwise is UB.
#[no_mangle]
pub unsafe extern "C" fn greet(_name: *const c_char) {
    // muffin!
}

Then let's rebuild it:

Shell session
$ cd libgreet-rs
$ cargo build

And load it from our test program:

C code
// in `load.c`

int main(void) {
  print_mapping_count();

  printf("> dlopen(libgreet, RTLD_NOW)\n");
  void *lib = dlopen("./libgreet-rs/target/debug/libgreet.so", RTLD_NOW);
  assert(lib);
  print_mapping_count();

  printf("> dlclose(libgreet), will it work?\n");
  dlclose(lib);
  print_mapping_count();

  return 0;
}
Shell session
$ gcc -Wall load.c -o load -ldl && ./load
mapping count: 0
> dlopen(libgreet, RTLD_NOW)
mapping count: 6
> dlclose(libgreet), will it work?
mapping count: 0

It.. it works. Why does it work?

Well... let's look at the actual code of dlclose - or, rather, let's skip three or four abstraction levels and look directly at _dl_close_worker:

C code
// in `glibc/elf/dl-close.c`
// in `_dl_close_worker()`

      /* Check whether this object is still used.  */
      if (l->l_type == lt_loaded
      && l->l_direct_opencount == 0
      && !l->l_nodelete_active
      /* See CONCURRENCY NOTES in cxa_thread_atexit_impl.c to know why
         acquire is sufficient and correct.  */
      && atomic_load_acquire (&l->l_tls_dtor_count) == 0
      && !used[done_index])
    continue;

There's our fourth thing. Did you see it?

Enhance!

C code
      && atomic_load_acquire (&l->l_tls_dtor_count) == 0

Transport Layer Security... something count?

Oh yes.

l_tls_dtor_count counts the number of thread-local destructors.

What are those? Why do we want them?

Well, there's simple cases of thread-local variables, say, this, in C99:

C code
// in `tls.c`

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

__thread int a = 0;

void *work() {
  for (int a = 0; a < 3; a++) {
    printf("[%lu] a = %d\n", pthread_self() % 10, a);
    sleep(1);
  }
  return NULL;
}

int main(void) {
  pthread_t t1, t2, t3;

  pthread_create(&t1, NULL, work, NULL);
  pthread_create(&t2, NULL, work, NULL);
  pthread_create(&t3, NULL, work, NULL);
  sleep(4);
  return 0;
}
Shell session
$ gcc -Wall tls.c -o tls -lpthread
$ ./tls
[6] a = 0
[2] a = 0
[8] a = 0
[6] a = 1
[2] a = 1
[8] a = 1
[6] a = 2
[2] a = 2
[8] a = 2

As you can see, each thread has its own copy of a. The space for it is allocated when a thread is created, and deallocated when a thread exits.

But int is a primitive type. It's nice and simple. There's no need to do any particular cleanup when it's freed. Just release the associated memory and you're good!

Which is not the case... of a RefCell<Option<Box<dyn Write + Send>>>:

Rust code
// in `rust/src/libstd/io/stdio.rs`

thread_local! {
    /// Stdout used by print! and println! macros
    static LOCAL_STDOUT: RefCell<Option<Box<dyn Write + Send>>> = {
        RefCell::new(None)
    }
}

Ohhh. We did use println!.

The RefCell isn't the problem. Nor the Option. The problem is the Box.

Wait, Box implements Drop?

Sort of!

That heap allocation needs to be freed somehow. Here's the actual implementation, as of Rust 1.46:

Rust code
// in `rust/src/alloc/boxed.rs`

#[stable(feature = "rust1", since = "1.0.0")]
unsafe impl<#[may_dangle] T: ?Sized> Drop for Box<T> {
    fn drop(&mut self) {
        // FIXME: Do nothing, drop is currently performed by compiler.
    }
}

So, since Box implements Drop, a "thread-local destructor" is registered.
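
To see that mechanism in isolation, here's a small standalone sketch (not part of our project) with a thread_local! holding a type that implements Drop. The value gets dropped when the thread that initialized it exits - and on Linux with glibc, that registration is exactly the path we're about to read through.

Rust code
// standalone sketch: a thread-local value with a Drop impl
use std::cell::RefCell;

struct Noisy(&'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        // runs as a thread-local destructor, when the owning thread exits
        println!("dropping Noisy({})", self.0);
    }
}

thread_local! {
    static LOCAL: RefCell<Option<Noisy>> = RefCell::new(None);
}

fn main() {
    std::thread::spawn(|| {
        // first use initializes the thread-local, which registers a destructor
        LOCAL.with(|l| *l.borrow_mut() = Some(Noisy("worker")));
        println!("worker thread done");
    })
    .join()
    .unwrap();
    println!("worker joined - its TLS destructor has already run");
}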

Let's look at LocalKey::get - it calls try_initialize:

Rust code
        pub unsafe fn get<F: FnOnce() -> T>(&self, init: F) -> Option<&'static T> {
            match self.inner.get() {
                Some(val) => Some(val),
                None => self.try_initialize(init),
            }
        }

try_initialize, in turn, calls try_register_dtor:

Rust code
        // `try_register_dtor` is only called once per fast thread local
        // variable, except in corner cases where thread_local dtors reference
        // other thread_local's, or it is being recursively initialized.
        unsafe fn try_register_dtor(&self) -> bool {
            match self.dtor_state.get() {
                DtorState::Unregistered => {
                    // dtor registration happens before initialization.
                    register_dtor(self as *const _ as *mut u8, destroy_value::<T>);
                    self.dtor_state.set(DtorState::Registered);
                    true
                }
                DtorState::Registered => {
                    // recursively initialized
                    true
                }
                DtorState::RunningOrHasRun => false,
            }
        }

And register_dtor, well, it's a thing of beauty:

Rust code
// in `rust/src/libstd/sys/unix/fast_thread_local.rs`

// Since what appears to be glibc 2.18 this symbol has been shipped which
// GCC and clang both use to invoke destructors in thread_local globals, so
// let's do the same!
//
// Note, however, that we run on lots older linuxes, as well as cross
// compiling from a newer linux to an older linux, so we also have a
// fallback implementation to use as well.
#[cfg(any(
    target_os = "linux",
    target_os = "fuchsia",
    target_os = "redox",
    target_os = "emscripten"
))]
pub unsafe fn register_dtor(t: *mut u8, dtor: unsafe extern "C" fn(*mut u8)) {
    use crate::mem;
    use crate::sys_common::thread_local::register_dtor_fallback;

    extern "C" {
        #[linkage = "extern_weak"]
        static __dso_handle: *mut u8;
        #[linkage = "extern_weak"]
        static __cxa_thread_atexit_impl: *const libc::c_void;
    }
    if !__cxa_thread_atexit_impl.is_null() {
        type F = unsafe extern "C" fn(
            dtor: unsafe extern "C" fn(*mut u8),
            arg: *mut u8,
            dso_handle: *mut u8,
        ) -> libc::c_int;
        mem::transmute::<*const libc::c_void, F>(__cxa_thread_atexit_impl)(
            dtor,
            t,
            &__dso_handle as *const _ as *mut _,
        );
        return;
    }
    register_dtor_fallback(t, dtor);
}

What does __cxa_thread_atexit_impl do? Let's look at the glibc source again:

C code
// in `glibc/stdlib/cxa_thread_atexit_impl.c`

/* Register a destructor for TLS variables declared with the 'thread_local'
   keyword.  This function is only called from code generated by the C++
   compiler.  FUNC is the destructor function and OBJ is the object to be
   passed to the destructor.  DSO_SYMBOL is the __dso_handle symbol that each
   DSO has at a unique address in its map, added from crtbegin.o during the
   linking phase.  */
int
__cxa_thread_atexit_impl (dtor_func func, void *obj, void *dso_symbol)
{
    // (cut)
}

"Only called from code generated by the C++ compiler", huh.

So, as soon as we call __cxa_thread_atexit_impl, it's game over. We can never, ever, unload that DSO.

Speaking of... why? Why does glibc check for that before unloading a DSO?

Well... a TLS destructor must be run on the same thread. Here, let me show you.

C code
// in `tls2.c`

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <stdint.h>
#include <stdlib.h>

//====================================
// glibc TLS destructor stuff
//====================================

typedef void (*dtor_func)(void *);

extern void *__dso_handle;
extern void __cxa_thread_atexit_impl(dtor_func func, void *obj, void *dso_symbol);

//====================================
// Some thread-local data
//====================================

typedef struct {
  uint64_t *array;
} data_t;

__thread data_t *data = NULL;

//====================================
// Some helpers
//====================================

// Returns an identifier that's shorter than `pthread_self`,
// easier to distinguish in the program's output. May collide
// though - not a great hash function.
uint8_t thread_id() {
  return (pthread_self() >> 8) % 256;
}

// Attempt to sleep for a given amount of milliseconds.
// Passing `ms > 1000` is UB.
void sleep_ms(int ms) {
  struct timespec ts = { .tv_sec = 0, .tv_nsec = ms * 1000 * 1000 };
  nanosleep(&ts, NULL);
}

//====================================
// Our destructor
//====================================

void dtor(void *p) {
  printf("[%x] dtor called! data = %p\n", thread_id(), data);
  free(data->array);
  free(data);
  data = NULL;
}

//====================================
// Worker thread function
//====================================

void *work() {
  printf("[%x] is worker thread\n", thread_id());

  const size_t n = 16;
  // initialize `data` for this thread
  data = malloc(sizeof(data_t));
  data->array = malloc(n * sizeof(uint64_t));
  printf("[%x] allocated!   data = %p\n", thread_id(), data);

  printf("[%x] registering destructor\n", thread_id());
  __cxa_thread_atexit_impl(dtor, NULL, __dso_handle);

  // compute the Fibonacci sequence
  if (n >= 2) {
    data->array[0] = 1;
    data->array[1] = 1;
  }
  for (int i = 2; i < n; i++) {
    data->array[i] = data->array[i - 2] + data->array[i - 1];
  }

  // print
  for (int i = 0; i < n; i++) {
    printf(i > 0 ? ", %lu" : "%lu", data->array[i]);
  }
  printf("\n");

  printf("[%x] thread exiting\n", thread_id());
  return NULL;
}

//====================================
// Main function
//====================================

int main(void) {
  printf("[%x] is main thread\n", thread_id());

  pthread_t t1;
  printf("[%x] creating thread\n", thread_id());
  pthread_create(&t1, NULL, work, NULL);

  sleep_ms(100);

  return 0;
}

Everything works fine in this code sample. The destructor is registered from a thread, and called on that same thread, when it exits naturally:

Shell session
$ gcc -Wall tls2.c -o tls2 -lpthread && ./tls2
[77] is main thread
[77] creating thread
[66] is worker thread
[66] allocated!   data = 0x7f1bc8000b60
[66] registering destructor
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987
[66] thread exiting
[66] dtor called! data = 0x7f1bc8000b60

If however the destructor is called from another thread, like the main thread, things go terribly wrong:

C code
// in `tls2.c`

void *work() {
  // (cut)

  // commented out:
  // printf("[%x] registering destructor\n", thread_id());
  // __cxa_thread_atexit_impl(dtor, NULL, __dso_handle);

  // (cut)
}

int main(void) {
  printf("[%x] is main thread\n", thread_id());

  pthread_t t1;
  printf("[%x] creating thread\n", thread_id());
  pthread_create(&t1, NULL, work, NULL);

  sleep_ms(100);
  dtor(NULL);

  return 0;
}
Shell session
$ gcc -Wall tls2.c -o tls2 -lpthread && ./tls2
[e7] is main thread
[e7] creating thread
[d6] is worker thread
[d6] allocated!   data = 0x7f522c000b60
[d6] registering destructor
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987
[e7] dtor called! data = (nil)
zsh: segmentation fault (core dumped)  ./tls2

Well yeah - the destructor refers to thread-local storage, but it's running on the wrong thread, so it's reading garbage.

Yup!

Note that this example code is a bit contrived and uses __cxa_thread_atexit_impl in unintended ways.

In code that a Rust or C++ compiler would emit, the obj argument would be used to pass the this pointer, and so it would never be NULL (or wrong). To reproduce a similar failure, we'd have to have the destructor refer to some other thread-local data, which... well. Don't do that.

Thanks to InBetweenNames on GitHub for the insight!

But say that thread does not exit naturally. Say it's cancelled, for example:

C code
// in `tls2.c`

void *work() {
  // (cut)

  sleep(2);
  printf("[%x] thread exiting\n", thread_id());
  return NULL;
}

int main(void) {
  // (cut)

  sleep_ms(100);
  pthread_cancel(t1);

  return 0;
}

Then what happens?

Shell session
$ gcc -Wall tls2.c -o tls2 -lpthread && ./tls2
[c7] is main thread
[c7] creating thread
[b6] is worker thread
[b6] allocated!   data = 0x7f1bcc000b60
[b6] registering destructor
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987

The destructor isn't called at all?

Well, we still need to join it:

C code
int main(void) {
  // (cut)

  sleep_ms(100);
  pthread_cancel(t1);
  pthread_join(t1, NULL);

  return 0;
}
Shell session
$ gcc -Wall tls2.c -o tls2 -lpthread && ./tls2
[67] is main thread
[67] creating thread
[56] is worker thread
[56] allocated!   data = 0x7fbd10000b60
[56] registering destructor
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987
[56] dtor called! data = 0x7fbd10000b60

Alright, so, couldn't we just do this?

Depends. Who's "we"?

It's true that, once all TLS destructors run, l_tls_dtor_count falls back to zero, and the DSO can be unloaded.

So technically, one may come up with a scheme: enumerate every thread that has run code from the library, cancel them all, join them so their TLS destructors get to run, and only then close the library.

There's just... several small problems with that.

First off, the only way I can think of to enumerate all threads would be to use the ptrace API, like debuggers do.

This would also need to happen out-of-process, so the whole thing would require spawning another process.

Yay, moving parts!

Second - cancelling threads is not that easy. If we look at the pthread_cancel man page:

       The pthread_cancel() function sends a cancellation request to the
       thread thread. Whether and when the target thread reacts to the
       cancellation request depends on two attributes that are under the
       control of that thread: its cancelability state and type.

       A thread's cancelability state, determined by
       pthread_setcancelstate(3), can be enabled (the default for new
       threads) or disabled. If a thread has disabled cancellation, then a
       cancellation request remains queued until the thread enables
       cancellation. If a thread has enabled cancellation, then its
       cancelability type determines when cancellation occurs.

       A thread's cancellation type, determined by pthread_setcanceltype(3),
       may be either asynchronous or deferred (the default for new threads).
       Asynchronous cancelability means that the thread can be canceled at
       any time (usually immediately, but the system does not guarantee
       this). Deferred cancelability means that cancellation will be
       delayed until the thread next calls a function that is a cancellation
       point. A list of functions that are or may be cancellation points is
       provided in pthreads(7).

So, if the thread's cancellation type is asynchronous, we might be able to cancel it at any time - no guarantees! But if it's the default, deferred, then it can only be cancelled at a "cancellation point".

Fortunately, sleep is one of those cancellation points, according to man 7 pthreads. But what if we're crunching numbers real hard, in a tight loop? Then we won't be able to cancel that thread.

Third, what about cleanup? pthreads provides pthread_cleanup_push, which is fine if you expect your threads to be cancelled - but the Rust libstd doesn't expect to be cancelled, at all.

If we search for pthread_cleanup_push usage in Rust's libstd using ripgrep:

Shell session
$ cd rust/src/libstd
$ rg 'pthread_cleanup_push'
$

No results.

And then, there's a fourth thing.

In libgreet.so, which thread is __cxa_thread_atexit_impl called from?

Let's investigate:

Rust code
// in `greet-rs/src/main.rs`

fn main() -> Result<(), Box<dyn Error>> {
    println!("main thread id = {:?}", std::thread::current().id());

    // (cut)
}
Shell session
$ cd greet-rs/
$ cargo build
   Compiling greet-rs v0.1.0 (/home/amos/ftl/greet/greet-rs)
    Finished dev [unoptimized + debuginfo] target(s) in 0.29s
Rust code
// in `libgreet-rs/src/lib.rs`

use std::os::raw::c_char;

/// # Safety
/// Pointer must be valid, and point to a null-terminated
/// string. What happens otherwise is UB.
#[no_mangle]
pub unsafe extern "C" fn greet(_name: *const c_char) {
    println!("greeting from thread {:?}", std::thread::current().id());
}
Shell session
$ cd libgreet-rs/
$ cargo build
   Compiling greet v0.1.0 (/home/amos/ftl/greet/libgreet-rs)
    Finished dev [unoptimized + debuginfo] target(s) in 0.18s
Shell session
$ cd greet-rs/
$ ./target/debug/greet-rs
main thread id = ThreadId(1)
greeting from thread ThreadId(1)
-----------------------------
Press Enter to go again, Ctrl-C to exit...

Uh oh.

It's the same thread!

We don't want to cancel the main thread now, do we?

Tonight, at eleven:

I swear to humanity, bear, if you say "pthread_cancel culture"

...doom.

So. It seems we're stuck. I guess we can't reload Rust libraries.

Not as long as we use types that register thread-local destructors. So, no println! for us - in fact, no std::io at all.

Unless... unless we find a way to prevent our Rust library from calling __cxa_thread_atexit_impl.

How do you mean?

Here, let me show you for once. If we declare our own #[no_mangle] function in libgreet-rs...

Rust code
// in `libgreet-rs/src/lib.rs`

#[no_mangle]
pub unsafe extern "C" fn __cxa_thread_atexit_impl() {}

#[no_mangle]
pub unsafe extern "C" fn greet(name: *const c_char) {
    let s = CStr::from_ptr(name);
    println!("greetings, {}", s.to_str().unwrap());
}
Shell session
$ cd libgreet-rs/
$ cargo b -q
Shell session
$ cd greet-rs/
$ cargo b -q
$ ./target/debug/greet-rs
greetings, reloading
-----------------------------
Press Enter to go again, Ctrl-C to exit...

Okay now let's change the library...

Rust code
// in `libgreet-rs/src/lib.rs`
// in `fn greet()`

    println!("hello, {}", s.to_str().unwrap());
Shell session
# session where greet-rs is still running
# (now pressing enter)
greetings, reloading
-----------------------------
Press Enter to go again, Ctrl-C to exit...

Mhh, no, that doesn't work.

See? Not that easy.

If we repeat the operation with LD_DEBUG=all, we can see where rtld takes the __cxa_thread_atexit_impl symbol for libgreet.so:

    137666:	symbol=__cxa_thread_atexit_impl;  lookup in file=./target/debug/greet-rs [0]
    137666:	symbol=__cxa_thread_atexit_impl;  lookup in file=/usr/lib/libdl.so.2 [0]
    137666:	symbol=__cxa_thread_atexit_impl;  lookup in file=/usr/lib/libpthread.so.0 [0]
    137666:	symbol=__cxa_thread_atexit_impl;  lookup in file=/usr/lib/libgcc_s.so.1 [0]
    137666:	symbol=__cxa_thread_atexit_impl;  lookup in file=/usr/lib/libc.so.6 [0]
    137666:	binding file ./target/debug/greet-rs [0] to /usr/lib/libc.so.6 [0]: normal symbol `__cxa_thread_atexit_impl' [GLIBC_2.18]

Ah, crap, libc wins again.

There's actually a way to make that workaround "work". Another dlopen flag:

       RTLD_DEEPBIND (since glibc 2.3.4)
              Place the lookup scope of the symbols in this shared object
              ahead of the global scope. This means that a self-contained
              object will use its own symbols in preference to global symbols
              with the same name contained in objects that have already been
              loaded.

That way, it would look first in libgreet.so to find __cxa_thread_atexit_impl.
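
I'm not going down that route here, but just to show what passing the flag would look like - a minimal, hypothetical sketch using raw libc calls rather than libloading (whose Library::new doesn't take flags); the path and the greeting are placeholders:

Rust code
// hypothetical sketch: dlopen with RTLD_DEEPBIND through the libc crate
use std::ffi::CString;
use std::os::raw::c_char;

fn main() {
    let path = CString::new("../libgreet-rs/target/debug/libgreet.so").unwrap();
    // DEEPBIND: lookups made *from* this object prefer its own symbols
    let lib = unsafe { libc::dlopen(path.as_ptr(), libc::RTLD_NOW | libc::RTLD_DEEPBIND) };
    assert!(!lib.is_null(), "dlopen failed");

    let sym = CString::new("greet").unwrap();
    let greet = unsafe { libc::dlsym(lib, sym.as_ptr()) };
    assert!(!greet.is_null(), "greet not found");

    let greet: unsafe extern "C" fn(*const c_char) = unsafe { std::mem::transmute(greet) };
    let name = CString::new("deepbind").unwrap();
    unsafe { greet(name.as_ptr()) };

    unsafe { libc::dlclose(lib) };
}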

...or we could just put the definition in greet-rs instead?

The executable? Sure, that should work - it's the first place rtld looks.

First we need to remove __cxa_thread_atexit_impl from libgreet-rs/src/lib.rs, and then we can add it to greet-rs/src/main.rs:

Rust code
// in `greet-rs/src/main.rs`

#[no_mangle]
pub unsafe extern "C" fn __cxa_thread_atexit_impl() {}
Shell session
$ cd libgreet-rs/
$ cargo build -q
$ cd ../greet-rs/
$ cargo build -q
$ ./target/debug/greet-rs
hello, reloading
-----------------------------
Press Enter to go again, Ctrl-C to exit...

Now let's change libgreet-rs, you know the drill by now. And press enter in our greet-rs shell session:

Shell session
hello again, reloading
-----------------------------
Press Enter to go again, Ctrl-C to exit...

🎉🎉🎉

Finally, we've done it.

Or have we?

Let's run greet-rs through valgrind, just for fun:

Shell session
$ valgrind --leak-check=full ./target/debug/greet-rs
==141352== Memcheck, a memory error detector
==141352== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==141352== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==141352== Command: ./target/debug/greet-rs
==141352==
hello again, reloading
-----------------------------
Press Enter to go again, Ctrl-C to exit...
==141352== Process terminating with default action of signal 2 (SIGINT)
(cut)
==141352== HEAP SUMMARY:
==141352==     in use at exit: 11,205 bytes in 22 blocks
==141352==   total heap usage: 38 allocs, 16 frees, 15,231 bytes allocated
==141352==
==141352== 96 (24 direct, 72 indirect) bytes in 1 blocks are definitely lost in loss record 15 of 22
==141352==    at 0x483A77F: malloc (vg_replace_malloc.c:307)
(cut)
==141352==    by 0x110DA4: greet_rs::load_and_print (main.rs:31)
==141352==    by 0x1106DA: greet_rs::main (main.rs:15)
(cut)
==141352==    by 0x110E29: main (in /home/amos/ftl/greet/greet-rs/target/debug/greet-rs)
==141352==
==141352== 1,136 (8 direct, 1,128 indirect) bytes in 1 blocks are definitely lost in loss record 21 of 22
==141352==    at 0x483A77F: malloc (vg_replace_malloc.c:307)
(cut)
==141352==    by 0x110DA4: greet_rs::load_and_print (main.rs:31)
==141352==    by 0x1106DA: greet_rs::main (main.rs:15)
(cut)
==141352==    by 0x110E29: main (in /home/amos/ftl/greet/greet-rs/target/debug/greet-rs)
==141352==
==141352== LEAK SUMMARY:
==141352==    definitely lost: 32 bytes in 2 blocks
(cut)

Oh. It's leaking memory.

Which. Of course it is. We made "registering destructors" a no-op.

Wasn't there a fallback in Rust's libstd?

Yes there was!

But the fallback is only used when __cxa_thread_atexit_impl is not present. If, for example, your version of glibc does not provide that symbol. Which can happen!

So... do we patch glibc?

Luckily, we don't need to.

libstd doesn't really check if __cxa_thread_atexit_impl is "provided" or "present". It checks if the address of __cxa_thread_atexit_impl, as provided by rtld during the loading of libgreet.so, is non-zero.

Rust code
    // in the Rust libstd
    if !__cxa_thread_atexit_impl.is_null() {
        // etc.
    }

Oooh, ooh! I have an idea.

Pray tell!

What if we made a symbol, named __cxa_thread_atexit_impl...

Go on...

And injected it in the rtld namespace, before libc.so.6...

With LD_PRELOAD? Sure.

...and it's a constant symbol, and its value is 0.

Is... is that legal? Should we call a lawyer?

Turns out - no lawyers are needed. At first, I tried doing that without involving another dynamic library, but GNU ld was not amused. Not amused at all. In fact, an internal assertion failed, rudely.

But, if we're willing to make another .so file, we can make it work.

How do we make a constant symbol?

I'm not aware of any way to do that in Rust. Or, heck, even in C.

But that's where assembly comes in handy. We've talked about assembly before. In the current implementation, Rust code typically gets compiled to LLVM IR, which is a form of assembly.

In the GNU toolchain (GCC and friends), C code gets compiled to... GNU assembly. AT&T style. And then assembled with gas, the GNU assembler.

So, let's write a bit of assembly:

x86 assembly
// in `tls-dtor-fallback.S`

.global __cxa_thread_atexit_impl

__cxa_thread_atexit_impl = 0

Then, let's make an honest shared library out of it:

Shell session
$ gcc -Wall -shared -nostdlib -nodefaultlibs tls-dtor-fallback.S -o libtls-dtor-fallback.so

Let's check what we have in there:

Shell session
$ nm -D ./libtls-dtor-fallback.so
0000000000000000 A __cxa_thread_atexit_impl

Wonderful! Just what we need.

Wait... "A"? Shouldn't it be "T"?

"T" is for the ".text" (code) section. "A" is for "absolute". Doesn't matter though. It's still a symbol, and rtld should find it.

Then we just inject it when we run greet-rs, and:

Wait! We forgot to remove the __cxa_thread_atexit_impl from greet-rs/src/main.rs

Ah right! So, let's remove it from there, and recompile... and then let's inject our library when we run greet-rs:

Shell session
$ LD_PRELOAD=../libtls-dtor-fallback.so ./target/debug/greet-rs
Here we go!
greetings, reloading
-----------------------------
Press Enter to go again, Ctrl-C to exit...

Then we can change libgreet:

Rust code
// in `libgreet-rs/src/lib.rs`

#[no_mangle]
pub unsafe extern "C" fn greet(name: *const c_char) {
    let s = CStr::from_ptr(name);
    println!("hurray for {}!", s.to_str().unwrap());
}
Shell session
$ cd libgreet-rs/ && cargo build -q

Then from the session where greet-rs is still running, we press enter:

Shell session
hurray for reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...

This seems like a good ending, right? The library unloads, we can load it again, everything's fine?

Why do I sense there's trouble afoot?

Because there is, cool bear. There is. We messed up bad.

More like "breakaround"

The thing is, there's a very good reason why glibc doesn't let you unload a DSO if it has registered TLS destructors. We've already seen good reasons, but that wasn't the whole story.

First off, we never checked that we fixed the memory leak:

Shell session
$ valgrind --leak-check=full --trace-children=yes env LD_PRELOAD=../libtls-dtor-fallback.so ./target/debug/greet-rs
(cut)
==263760== LEAK SUMMARY:
==263760==    definitely lost: 32 bytes in 2 blocks
==263760==    indirectly lost: 1,200 bytes in 4 blocks
==263760==      possibly lost: 0 bytes in 0 blocks
==263760==    still reachable: 10,149 bytes in 20 blocks
==263760==         suppressed: 0 bytes in 0 blocks

...and that's just when we load libgreet.so once. It leaks 32 bytes directly, 1200 bytes indirectly per load.

But let's ignore that - we could load it 447K times before it leaks 64 MiB of RAM, so arguably, in development, that's not a huge problem.

Right - and at least, the actual .so file is unmapped, so the kernel can free those resources.

True, so it is "better" than before in the "memory leak" department.

The issue with our workaround is much bigger. The reason we're leaking memory is that the TLS destructors registered by libgreet never actually get run.

How do you know?

Days and days of stepping through code with various debuggers?

Oh, that's what you were up to. I thought you were just installing Gentoo.

...that too.

But what would happen if the destructors were actually called?

For TLS destructors to be called, on Linux, with glibc, we need to actually let a thread terminate.

So let's try to call load_and_print from a thread:

Rust code
// in `greet-rs/src/main.rs`

fn main() -> Result<(), Box<dyn Error>> {
    let mut line = String::new();
    let stdin = std::io::stdin();

    println!("Here we go!");
    loop {
        // new! was just a regular call:
        std::thread::spawn(load_and_print).join().unwrap().unwrap();

        println!("-----------------------------");
        println!("Press Enter to go again, Ctrl-C to exit...");

        line.clear();
        stdin.read_line(&mut line).unwrap();
    }

    Ok(())
}

Mhh why do we unwrap twice?

JoinHandle::join returns a Result<T, E> - which is Err if the thread panics. But here, the thread also returns a Result<T, E>, so the actual return type is std::thread::Result<Result<(), libloading::Error>>

Wait, std::thread::Result only has one type parameter? It doesn't take an E for error?

Libraries tend to do that - they define their own Result type, which is an alias over std::result::Result with the error type E set to something from the crate.
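
For reference, std::thread::Result<T> is an alias for Result<T, Box<dyn Any + Send + 'static>> - the error side is baked in. Here's the general pattern, with a made-up error type purely for illustration:

Rust code
// illustration only: the "crate-local Result alias" pattern
#[derive(Debug)]
pub struct Error;

// the crate fixes E, so callers only spell out T - the same trick
// std::thread::Result uses
pub type Result<T> = std::result::Result<T, Error>;

pub fn do_thing() -> Result<()> {
    Ok(())
}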

So, now that we do load_and_print from a thread:

Shell session
$ cargo b -q
$ ./target/debug/greet-rs
Here we go!
hurray for reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...
^C

Seems to work fine?

Woops, forgot to inject our "workaround"

Shell session
$ LD_PRELOAD=../libtls-dtor-fallback.so ./target/debug/greet-rs
Here we go!
hurray for reloading!
zsh: segmentation fault (core dumped)  LD_PRELOAD=../libtls-dtor-fallback.so ./target/debug/greet-rs

Ah, yes.

We can dig a little deeper with LLDB:

Shell session
$ lldb ./target/debug/greet-rs
(lldb) target create "./target/debug/greet-rs"
Current executable set to '/home/amos/ftl/greet/greet-rs/target/debug/greet-rs' (x86_64).
(lldb) env LD_PRELOAD=../libtls-dtor-fallback.so
(lldb) r
Process 285989 launched: '/home/amos/ftl/greet/greet-rs/target/debug/greet-rs' (x86_64)
Here we go!
hurray for reloading!
Process 285989 stopped
* thread #2, name = 'greet-rs', stop reason = signal SIGSEGV: invalid address (fault address: 0x7ffff7b574f0)
    frame #0: 0x00007ffff7b574f0
error: memory read failed for 0x7ffff7b57400
(lldb) bt
* thread #2, name = 'greet-rs', stop reason = signal SIGSEGV: invalid address (fault address: 0x7ffff7b574f0)
  * frame #0: 0x00007ffff7b574f0
    frame #1: 0x00007ffff7f74201 libpthread.so.0`__nptl_deallocate_tsd at pthread_create.c:302:8
    frame #2: 0x00007ffff7f7418a libpthread.so.0`__nptl_deallocate_tsd at pthread_create.c:251
    frame #3: 0x00007ffff7f743fc libpthread.so.0`start_thread(arg=0x00007ffff7d85640) at pthread_create.c:474:3
    frame #4: 0x00007ffff7e88293 libc.so.6`__clone at clone.S:95
(lldb)

...but even Valgrind would've given us a hint as to what went wrong:

Shell session
$ valgrind --quiet --leak-check=full --trace-children=yes env LD_PRELOAD=../libtls-dtor-fallback.so ./target/debug/greet-rs
Here we go!
hurray for reloading!
==287464== Thread 2:
==287464== Jump to the invalid address stated on the next line
==287464==    at 0x50994F0: ???
==287464==    by 0x488C200: __nptl_deallocate_tsd (pthread_create.c:302)
==287464==    by 0x488C200: __nptl_deallocate_tsd (pthread_create.c:251)
==287464==    by 0x488C3FB: start_thread (pthread_create.c:474)
==287464==    by 0x49BF292: clone (clone.S:95)
==287464==  Address 0x50994f0 is not stack'd, malloc'd or (recently) free'd
==287464==
==287464== Can't extend stack to 0x484f138 during signal delivery for thread 2:
==287464==   no stack segment
==287464==
==287464== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==287464==  Access not within mapped region at address 0x484F138
==287464==    at 0x50994F0: ???
==287464==    by 0x488C200: __nptl_deallocate_tsd (pthread_create.c:302)
==287464==    by 0x488C200: __nptl_deallocate_tsd (pthread_create.c:251)
==287464==    by 0x488C3FB: start_thread (pthread_create.c:474)
==287464==    by 0x49BF292: clone (clone.S:95)
==287464==  If you believe this happened as a result of a stack
==287464==  overflow in your program's main thread (unlikely but
==287464==  possible), you can try to increase the size of the
==287464==  main thread stack using the --main-stacksize= flag.
==287464==  The main thread stack size used in this run was 8388608

Poor Valgrind is trying its darndest to help us.

But no, that memory range was not stack'd, malloc'd, or recently free'd.

It was, however, recently unmapped.

With our workaround, or "breakaround", as I've recently taken to calling it, we've entered the land of super-duper-undefined behavior, aka SDUB.

Because events are happening in this order: our thread calls into libgreet.so, which registers TLS destructors (through libstd's pthread-based fallback, since we nulled out __cxa_thread_atexit_impl); we drop the Library, which unloads the DSO; then the thread exits, and pthread dutifully runs the registered destructors...

...however, the destructors' code was in the DSO we just unloaded.

So... we broke libloading?

We definitely made it insta-unsound.

Because in libloading, Library::new is not unsafe. And neither is dropping a Library. And yet that's where we crash.

Mhh. Couldn't we make sure we call the pthread TLS key destructors before libgreet.so is dropped?

Sure, yes, we can do that.

Rust code
// in `greet-rs/src/main.rs`

use cstr::cstr;
use std::ffi::c_void;
use std::{error::Error, io::BufRead, os::raw::c_char};

use libloading::{Library, Symbol};

fn main() -> Result<(), Box<dyn Error>> {
    let mut line = String::new();
    let stdin = std::io::stdin();

    println!("Here we go!");
    loop {
        let lib = std::thread::spawn(load_and_print).join().unwrap().unwrap();
        drop(lib); // for clarity

        println!("-----------------------------");
        println!("Press Enter to go again, Ctrl-C to exit...");

        line.clear();
        stdin.read_line(&mut line).unwrap();
    }

    Ok(())
}

// now returns a `Library`, instead of dropping it
fn load_and_print() -> Result<Library, libloading::Error> {
    let lib = Library::new("../libgreet-rs/target/debug/libgreet.so")?;
    unsafe {
        let greet: Symbol<unsafe extern "C" fn(name: *const c_char)> = lib.get(b"greet")?;
        greet(cstr!("reloading").as_ptr());
    }

    Ok(lib)
}
Shell session
$ cargo b -q
$ LD_PRELOAD=../libtls-dtor-fallback.so ./target/debug/greet-rs
Here we go!
hurray for reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...

hurray for reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...

hurray for reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...
^C

But there are many such scenarios. What if we don't run load_and_print in a thread, but instead run the whole loop in a thread that isn't the main thread?

Rust code
// in `greet-rs/src/main.rs`

use cstr::cstr;
use std::ffi::c_void;
use std::{error::Error, io::BufRead, os::raw::c_char};

use libloading::{Library, Symbol};

fn main() -> Result<(), Box<dyn Error>> {
    std::thread::spawn(run).join().unwrap();
    Ok(())
}

fn run() {
    let mut line = String::new();
    let stdin = std::io::stdin();

    println!("Here we go!");

    let n = 3;
    for _ in 0..n {
        load_and_print().unwrap();

        println!("-----------------------------");
        println!("Press Enter to go again, Ctrl-C to exit...");

        line.clear();
        stdin.read_line(&mut line).unwrap();
    }

    println!("Did {} rounds, stopping", n);
}

fn load_and_print() -> Result<(), libloading::Error> {
    let lib = Library::new("../libgreet-rs/target/debug/libgreet.so")?;
    unsafe {
        let greet: Symbol<unsafe extern "C" fn(name: *const c_char)> = lib.get(b"greet")?;
        greet(cstr!("reloading").as_ptr());
    }

    Ok(())
}
Shell session
$ lldb ./target/debug/greet-rs
(lldb) target create "./target/debug/greet-rs"
Current executable set to '/home/amos/ftl/greet/greet-rs/target/debug/greet-rs' (x86_64).
(lldb) env LD_PRELOAD=../libtls-dtor-fallback.so
(lldb) r
Process 333436 launched: '/home/amos/ftl/greet/greet-rs/target/debug/greet-rs' (x86_64)
Here we go!
three cheers for reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...

three cheers for reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...

three cheers for reloading!
-----------------------------
Press Enter to go again, Ctrl-C to exit...

Did 3 rounds, stopping
Process 333436 stopped
* thread #2, name = 'greet-rs', stop reason = signal SIGSEGV: invalid address (fault address: 0x7ffff7b574f0)
    frame #0: 0x00007ffff7b574f0
error: memory read failed for 0x7ffff7b57400
(lldb) bt
* thread #2, name = 'greet-rs', stop reason = signal SIGSEGV: invalid address (fault address: 0x7ffff7b574f0)
  * frame #0: 0x00007ffff7b574f0
    frame #1: 0x00007ffff7f74201 libpthread.so.0`__nptl_deallocate_tsd at pthread_create.c:302:8
    frame #2: 0x00007ffff7f7418a libpthread.so.0`__nptl_deallocate_tsd at pthread_create.c:251
    frame #3: 0x00007ffff7f743fc libpthread.so.0`start_thread(arg=0x00007ffff7d85640) at pthread_create.c:474:3
    frame #4: 0x00007ffff7e88293 libc.so.6`__clone at clone.S:95
(lldb)

So... what's our solution here?

Well, there's a few things we can try.

A little memory leak, as a treat

Listen, sometimes you have to make compromises.

Shell session
$ cargo new --lib compromise
     Created library `compromise` package

Let's 🛒 go 🛒 shopping!

Shell session
$ cargo add once_cell
      Adding once_cell v1.4.1 to dependencies
$ cargo add cstr
      Adding cstr v0.2.4 to dependencies
$ cargo add libc
      Adding libc v0.2.77 to dependencies

So, this one is going to be a bit convoluted, but stay with me - we can do this.

First off, we don't always want hot reloading to be enabled. When it's disabled, we actually want to register TLS destructors. So we need to maintain some global state that represents whether we're in a hot reloading scenario or not.

We could put it behind a Mutex, but do we really need to? Who knows how a Mutex is even implemented? Maybe it uses thread-local primitives behind the scenes, which we cannot use to implement this.

Let's go for something more minimal - just an AtomicBool.

Rust code
// in `compromise/src/lib.rs`

use std::{sync::atomic::AtomicBool, sync::atomic::Ordering};

static HOT_RELOAD_ENABLED: AtomicBool = AtomicBool::new(false);

// this one will be called from our executable, so it needs to be `pub`
pub fn set_hot_reload_enabled(enabled: bool) {
    HOT_RELOAD_ENABLED.store(enabled, Ordering::SeqCst)
}

// this one can be `pub(crate)`, it'll only be called internally
pub(crate) fn is_hot_reload_enabled() -> bool {
    HOT_RELOAD_ENABLED.load(Ordering::SeqCst)
}

Next up: we need an actual mechanism to prevent registration of TLS destructors when hot-reloading is enabled.

Right now we only have an implementation for Linux:

Rust code
// in `compromise/src/lib.rs`

#[cfg(target_os = "linux")]
pub mod linux;

That's where things get a little... complicated.

Basically, we want to provide a function that does nothing when hot reloading is enabled, and that forwards to glibc's real implementation when it's disabled.

Which means, if hot reloading is disabled, we need to look up __cxa_thread_atexit_impl.

How do we even do that? Isn't it hidden by our own version?

Not hidden. Ours just comes first. We can still grab it with dlsym(), using the RTLD_NEXT flag.

How convenient. And we're going to do that on every call?

Well, that's the tricky part.

We don't care a lot about performance, because we don't expect to be registering TLS destructors very often, but still, I'd expect a dlsym call to be sorta costly, so I'd like to cache it.

First, let's define the type of the function we'll be looking up:

Rust code
// in `compromise/src/lib.rs`

use std::ffi::c_void;

type NextFn = unsafe extern "C" fn(*mut c_void, *mut c_void, *mut c_void);

Next - one way to "only look it up once" would be to declare a static of type once_cell::sync::Lazy<NextFn> - similar to what lazy_static gives us, except using once_cell.

Rust code
// in `compromise/src/lib.rs`

use cstr::cstr;
use once_cell::sync::Lazy;
use std::mem::transmute;

#[allow(clippy::transmute_ptr_to_ref)] // just silencing warnings
static NEXT: Lazy<NextFn> = Lazy::new(|| unsafe {
    transmute(libc::dlsym(
        libc::RTLD_NEXT,
        cstr!("__cxa_thread_atexit_impl").as_ptr(),
    ))
});

And then we can use it from our own thread_atexit:

Rust code
// in `compromise/src/lib.rs`

#[allow(clippy::missing_safety_doc)]
pub unsafe fn thread_atexit(func: *mut c_void, obj: *mut c_void, dso_symbol: *mut c_void) {
    if crate::is_hot_reload_enabled() {
        // avoid registering TLS destructors on purpose, to avoid
        // double-frees and general crashiness
    } else {
        // hot reloading is disabled, attempt to forward TLS destructor
        // registration to glibc

        // note: we need to deref `NEXT` because it's a `Lazy<T>`
        (*NEXT)(func, obj, dso_symbol)
    }
}

...but that could crash if the system glibc doesn't have __cxa_thread_atexit_impl - ie, if dlsym returned a null pointer.

There's worse: building a value of type extern "C" fn foo() that is null is undefined behavior. Compiler optimizations may assume the pointer is non-null and remove any null checks we add.

So, let's not do undefined behavior.

Not even a little? As a treat?

Not even a little.

Luckily, extern "C" fn foo() is a pointer type, and Option<T> when T is a pointer type is transparent - it has the same size, same layout, it's just None when the pointer is null.

This is exactly what we want.
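
We can convince ourselves of the size claim with a quick check - function pointers can never be null, so Option can use the null value to mean None:

Rust code
use std::mem::size_of;

fn main() {
    // Option<fn pointer> is exactly pointer-sized: None is represented as null
    assert_eq!(
        size_of::<Option<unsafe extern "C" fn()>>(),
        size_of::<unsafe extern "C" fn()>()
    );
    println!("both are {} bytes", size_of::<unsafe extern "C" fn()>());
}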

Rust code
#[allow(clippy::transmute_ptr_to_ref)]
static NEXT: Lazy<Option<NextFn>> = Lazy::new(|| unsafe {
    std::mem::transmute(libc::dlsym(
        libc::RTLD_NEXT,
        cstr!("__cxa_thread_atexit_impl").as_ptr(),
    ))
});

Now, onto our thread_atexit function.

Here's our full Linux implementation, with some symbols renamed for clarity:

Rust code
// `compromise/src/linux.rs` implementation (whole file)

use cstr::cstr;
use once_cell::sync::Lazy;
use std::ffi::c_void;

pub type NextFn = unsafe extern "C" fn(*mut c_void, *mut c_void, *mut c_void);

#[allow(clippy::transmute_ptr_to_ref)] // just silencing warnings
static SYSTEM_THREAD_ATEXIT: Lazy<Option<NextFn>> = Lazy::new(|| unsafe {
    let name = cstr!("__cxa_thread_atexit_impl").as_ptr();
    std::mem::transmute(libc::dlsym(libc::RTLD_NEXT, name))
});

/// Turns glibc's TLS destructor register function, `__cxa_thread_atexit_impl`,
/// into a no-op if hot reloading is enabled.
///
/// # Safety
/// This needs to be public for symbol visibility reasons, but you should
/// never need to call this yourself
pub unsafe fn thread_atexit(func: *mut c_void, obj: *mut c_void, dso_symbol: *mut c_void) {
    if crate::is_hot_reload_enabled() {
        // avoid registering TLS destructors on purpose, to avoid
        // double-frees and general crashiness
    } else if let Some(system_thread_atexit) = *SYSTEM_THREAD_ATEXIT {
        // hot reloading is disabled, and system provides `__cxa_thread_atexit_impl`,
        // so forward the call to it.
        system_thread_atexit(func, obj, dso_symbol);
    } else {
        // hot reloading is disabled *and* we don't have `__cxa_thread_atexit_impl`,
        // throw hands up in the air and leak memory.
    }
}

Easy enough!

Mhhhhhhhh. But where do we define our own __cxa_thread_atexit_impl? This one is just called thread_atexit, and it's mangled.

Good eye! Turns out, if we just define __cxa_thread_atexit_impl, even pub, even #[no_mangle], it's not enough, because when linking, GNU ld picks glibc's version and we never end up calling the one in the compromise crate.

So it only works if it's defined directly in the executable?

Correct.

How do we do that?

Well... there's always macros. Which let us more or less take a bunch of AST (Abstract Syntax Tree) nodes and paste them into the module that calls it.

Let's see how that would work:

Rust code
// in `compromise/src/lib.rs`

#[macro_export]
macro_rules! register {
    () => {
        #[cfg(target_os = "linux")]
        #[no_mangle]
        pub unsafe extern "C" fn __cxa_thread_atexit_impl(
            func: *mut c_void,
            obj: *mut c_void,
            dso_symbol: *mut c_void,
        ) {
            compromise::linux::thread_atexit(func, obj, dso_symbol);
        }
    };
}

Ohhhh there it is. So the compromise crate only works if the executable's crate calls the compromise::register!() macro?

Yup!

And is that why linux::thread_atexit was pub? Because it'll actually end up being called from greet-rs (outside the compromise crate)?

Yes!! And that's also why, in the macro, it's fully-qualified: compromise::linux::thread_atexit.

Alright, I'll let your crimes be for now - just show us how to use them!

Well, first we need to import the crate:

Shell session
$ cd greet-rs/
$ cargo rm libc
    Removing libc from dependencies
$ cargo add ../compromise
      Adding compromise (unknown version) to dependencies
Cool bear's hot tip

cargo-edit (which provides the cargo add and cargo rm subcommands) is not doing "magic" here - it's just adding the compromise crate as a path. Here's what the resulting Cargo.toml's dependencies section looks like:

TOML markup
[dependencies]
libloading = "0.6.3"
cstr = "0.2.2"
compromise = { path = "../compromise" }

It's not published to crates.io, it's not vendored, the compromise/ folder has to live on disk next to greet-rs/ or it won't build.

Then, in greet-rs/src/main.rs, we need to register compromise:

Rust code
// in `greet-rs/src/main.rs`

// ⚠ Important: hot reloading won't work without it.
compromise::register!();

And then, at some point, call compromise::set_hot_reload_enabled(true).

But do we want to call it every time? No! So let's bring in a crate for CLI (command-line interface) argument parsing:

Shell session
$ cargo add argh
      Adding argh v0.1.3 to dependencies

It'll be quick - I swear.

Rust code
use argh::FromArgs;

#[derive(FromArgs)]
/// Greet
struct Args {
    /// whether "hot reloading" should be enabled
    #[argh(switch)]
    watch: bool,
}

fn main() -> Result<(), Box<dyn Error>> {
    let args: Args = argh::from_env();
    compromise::set_hot_reload_enabled(args.watch);
    if args.watch {
        println!("Hot reloading enabled - there will be memory leaks!");
    }

    std::thread::spawn(run).join().unwrap();
    Ok(())
}

Let's give it a shot:

Shell session
$ cargo b -q

...but before we run it - did our trick work?

Shell session
$ nm -D ./target/debug/greet-rs | grep __cxa
                 w __cxa_finalize@@GLIBC_2.2.5
000000000000ac40 T __cxa_thread_atexit_impl

Looking good! We can see __cxa_finalize was taken from glibc (as evidenced by the @@GLIBC_2.2.5 version marker), and __cxa_thread_atexit_impl is defined in the executable itself.

We can convince ourselves further by running it in LLDB:

Shell session
$ lldb ./target/debug/greet-rs
(lldb) target create "./target/debug/greet-rs"
Current executable set to '/home/amos/ftl/greet/greet-rs/target/debug/greet-rs' (x86_64).
(lldb) b __cxa_thread_atexit_impl
Breakpoint 1: where = greet-rs`__cxa_thread_atexit_impl + 18 at lib.rs:26:13, address = 0x000000000000ac52
(lldb) r
Process 21171 launched: '/home/amos/ftl/greet/greet-rs/target/debug/greet-rs' (x86_64)
1 location added to breakpoint 1
Process 21171 stopped
* thread #1, name = 'greet-rs', stop reason = breakpoint 1.1
    frame #0: 0x000055555555ec52 greet-rs`__cxa_thread_atexit_impl(func=0x000055555557ae60, obj=0x00007ffff7d87be0, dso_symbol=0x00005555555be088) at lib.rs:26:13
   23               obj: *mut c_void,
   24               dso_symbol: *mut c_void,
   25           ) {
-> 26               compromise::linux::thread_atexit(func, obj, dso_symbol);
   27           }
   28       };
   29   }

Fantastic.

At this point, we almost don't even need to try it out - unless we messed up the conditions in compromise/src/linux.rs, everything should work just fine.

But let's anyway. Here's an asciinema:

That's all well and good...

Yeah, I'm happy we finally got it work-

...but that's not "live" reloading. You still have to press enter.

...FINE.

Having fun

This has been a long and difficult article, so it's time to unwind a little, and reap what we've sown.

Segmentation faults?

No, fun.

First off, bear is 100% right. We're not live reloading right now. We're just loading the library every time we print anything, and unloading it right after.

Let's fix that.

Shell session
$ cargo add notify --vers 5.0.0-pre.3
      Adding notify v5.0.0-pre.3 to dependencies
Rust code
// in `greet-rs/src/main.rs`

use notify::{RecommendedWatcher, RecursiveMode, Watcher};

fn main() -> Result<(), Box<dyn Error>> {
    let args: Args = argh::from_env();
    compromise::set_hot_reload_enabled(args.watch);
    if args.watch {
        println!("Hot reloading enabled - there will be memory leaks!");
    }

    let base = PathBuf::from("../libgreet-rs").canonicalize().unwrap();
    let libname = "libgreet.so";
    let relative_path = PathBuf::from("target").join("debug").join(libname);
    let absolute_path = base.join(&relative_path);

    let mut watcher: RecommendedWatcher = Watcher::new_immediate({
        move |res: Result<notify::Event, _>| match res {
            Ok(event) => {
                if let notify::EventKind::Create(_) = event.kind {
                    if event.paths.iter().any(|x| x.ends_with(&relative_path)) {
                        let res = step(&absolute_path);
                        if let Err(e) = res {
                            println!("step error: {}", e);
                        }
                    }
                }
            }
            Err(e) => println!("watch error: {}", e),
        }
    })
    .unwrap();
    watcher.watch(&base, RecursiveMode::Recursive).unwrap();

    loop {
        std::thread::sleep(std::time::Duration::from_secs(1));
    }
}

fn step(lib_path: &Path) -> Result<(), libloading::Error> {
    let lib = Library::new(lib_path)?;
    unsafe {
        let greet: Symbol<unsafe extern "C" fn(name: *const c_char)> = lib.get(b"greet")?;
        greet(cstr!("saturday").as_ptr());
    }

    Ok(())
}

Now we're having fun!

Wait.. you're not going to explain any of it?

Shush. I'm having fun. Y'all can figure it out.

To try it out, let's combine our new file-watching powers with cargo-watch, to recompile libgreet-rs any time we change it.

Shell session
$ cargo install cargo-watch
(cut: lots and lots of output)
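The exact invocation is up to you - something along these lines, run from libgreet-rs/, does the trick (these flags are just one way to do it):

Shell session
$ cd libgreet-rs/
$ cargo watch -x build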

And here's our next demo:

But this isn't fun enough.

Fun, in larger quantities

You know what would be fun? If we could draw stuff. In real time. And have our code be live-reloaded. Now that would be really fun.

Ohhhhh.

But in order for that to work, we probably don't want to be reloading the library every frame.

We don't have graphics yet, but let's prepare for that. First, let's make a plugin module with the implementation details:

Rust code
mod plugin {
    use libloading::{Library, Symbol};
    use std::{os::raw::c_char, path::Path};

    /// Represents a loaded instance of our plugin
    /// We keep the `Library` together with function pointers
    /// so that they go out of scope together.
    pub struct Plugin {
        pub greet: unsafe extern "C" fn(name: *const c_char),
        lib: Library,
    }

    impl Plugin {
        pub fn load(lib_path: &Path) -> Result<Self, libloading::Error> {
            let lib = Library::new(lib_path)?;

            Ok(unsafe {
                Plugin {
                    greet: *(lib.get(b"greet")?),
                    lib,
                }
            })
        }
    }
}

And then let's use it!

Instead of having our watcher directly load the library, we'll have it communicate with our main thread using a std::sync::mpsc::channel.

On every "frame", if a message was sent to the channel, we'll try to reload the plugin. Otherwise, we'll just use it, as usual.

Let's go:

Rust code
use plugin::Plugin;

fn main() -> Result<(), Box<dyn Error>> {
    // same as before
    let args: Args = argh::from_env();
    compromise::set_hot_reload_enabled(args.watch);
    if args.watch {
        println!("Hot reloading enabled - there will be memory leaks!");
    }

    let base = PathBuf::from("../libgreet-rs").canonicalize().unwrap();
    let libname = "libgreet.so";
    let relative_path = PathBuf::from("target").join("debug").join(libname);
    let absolute_path = base.join(&relative_path);

    // here's our channel, to communicate between the watcher thread
    // (using `tx`, the "transmitter") and the main thread (using
    // `rx`, the "receiver").
    let (tx, rx) = std::sync::mpsc::channel::<()>();

    let mut watcher: RecommendedWatcher = Watcher::new_immediate({
        move |res: Result<notify::Event, _>| match res {
            Ok(event) => {
                if let notify::EventKind::Create(_) = event.kind {
                    if event.paths.iter().any(|x| x.ends_with(&relative_path)) {
                        // signal that we need to reload
                        tx.send(()).unwrap();
                    }
                }
            }
            Err(e) => println!("watch error: {}", e),
        }
    })
    .unwrap();
    watcher.watch(&base, RecursiveMode::Recursive).unwrap();

    // Initial plugin load, before the main loop starts
    let mut plugin = Some(Plugin::load(&absolute_path).unwrap());
    let start = std::time::SystemTime::now();

    // Forever... (or until Ctrl-C)
    loop {
        std::thread::sleep(std::time::Duration::from_millis(100));

        if rx.try_recv().is_ok() {
            println!("==== Reloading ====");
            // These two lines look funky, but they're needed - we *first*
            // need to drop the current plugin (which will call `dlclose`)
            // before we load the next one (which will call `dlopen`), otherwise
            // we'll just increase the reference count on the already-loaded
            // DSO.
            plugin = None;
            plugin = Some(Plugin::load(&absolute_path)?);
        }

        if let Some(plugin) = plugin.as_ref() {
            let s = format!("We've been running for {:?}", start.elapsed().unwrap());
            let s = CString::new(s)?;
            unsafe { (plugin.greet)(s.as_ptr()) };
        }
    }
}

One more demo?

One more demo.

🎉🎉🎉

Let's draw some stuff

So, we've got the foundation of a very fun playground here.

We can turn our text application into a graphical application with very little effort. But I don't want to spend forever going over various drawing libraries; instead, I think we're going to go with... just a framebuffer.

Raw pixels.

Shell session
$ cargo new --lib common
     Created library `common` package

This library will be used by both greet-rs and libgreet-rs; it'll just define a common data structure.

Rust code
// in `common/src/lib.rs`

#[repr(C)]
#[derive(Clone, Copy)]
pub struct Pixel {
    pub b: u8,
    pub g: u8,
    pub r: u8,
    /// Unused (zero)
    pub z: u8,
}

#[repr(C)]
pub struct FrameContext {
    pub width: usize,
    pub height: usize,
    pub pixels: *mut Pixel,
    pub ticks: usize,
}

impl FrameContext {
    pub fn pixels(&mut self) -> &mut [Pixel] {
        unsafe { std::slice::from_raw_parts_mut(self.pixels, self.width * self.height) }
    }
}
Cool bear's hot tip

This is not a lesson in FFI (foreign-function interface), but suffice to say that the layout of Rust slices like &mut [Pixel] is not guaranteed to stay the same from one Rust version to the next - and it's not part of the C ABI either.

So, we use a raw pointer instead, and a getter, to construct the slice on the plugin's side.
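Here's a tiny standalone illustration of the difference (just a sketch, not part of our crates): a slice reference is a fat pointer (data pointer plus length) whose exact layout Rust doesn't promise to keep, while a thin raw pointer plus an explicit usize length is plain, C-compatible data.

Rust code
use std::mem::size_of;

// same layout as our `common::Pixel`, redeclared so this sketch stands alone
#[allow(dead_code)]
#[repr(C)]
struct Pixel { b: u8, g: u8, r: u8, z: u8 }

fn main() {
    // a slice reference carries both a data pointer and a length...
    assert_eq!(size_of::<&mut [Pixel]>(), 2 * size_of::<usize>());
    // ...while a raw pointer is just the data pointer; we pass the length
    // separately as a `usize`, which is exactly what `FrameContext` does.
    assert_eq!(size_of::<*mut Pixel>(), size_of::<usize>());
}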

Shell session
$ cd greet-rs/
$ cargo add minifb
      Adding minifb v0.19.1 to dependencies
$ cargo add ../common
      Adding common (unknown version) to dependencies
Rust code
// in `greet-rs/src/main.rs`

use common::{FrameContext, Pixel};
use minifb::{Key, Window, WindowOptions};

fn main() -> Result<(), Box<dyn Error>> {
    // omitted: CLI arg parsing, paths, watcher initialization
    watcher.watch(&base, RecursiveMode::Recursive).unwrap();

    const WIDTH: usize = 640;
    const HEIGHT: usize = 360;
    let mut pixels: Vec<Pixel> = Vec::with_capacity(WIDTH * HEIGHT);
    for _ in 0..pixels.capacity() {
        pixels.push(Pixel {
            z: 0,
            r: 0,
            g: 0,
            b: 0,
        });
    }

    let mut window = Window::new("Playground", WIDTH, HEIGHT, WindowOptions::default())?;
    window.limit_update_rate(Some(std::time::Duration::from_micros(16600)));

    let mut plugin = Some(Plugin::load(&absolute_path).unwrap());
    let start = std::time::SystemTime::now();

    while window.is_open() && !window.is_key_down(Key::Escape) {
        if rx.try_recv().is_ok() {
            println!("==== Reloading ====");
            plugin = None;
            plugin = Some(Plugin::load(&absolute_path)?);
        }

        if let Some(plugin) = plugin.as_ref() {
            let mut cx = FrameContext {
                width: WIDTH,
                height: HEIGHT,
                pixels: &mut pixels[0],
                ticks: start.elapsed().unwrap().as_millis() as usize,
            };
            unsafe { (plugin.draw)(&mut cx) }
        }

        // `Pixel` is `#[repr(C)]` and exactly 4 bytes, so a `&[Pixel]` slice
        // has the same layout as the `&[u32]` buffer minifb expects.
        #[allow(clippy::transmute_ptr_to_ptr)]
        let buffer: &[u32] = unsafe { std::mem::transmute(pixels.as_slice()) };
        window.update_with_buffer(buffer, WIDTH, HEIGHT).unwrap();
    }

    Ok(())
}

Our plugin interface has been extended a little:

Rust code
// in `greet-rs/src/main.rs`

mod plugin {
    use common::FrameContext; // new
    use libloading::{Library, Symbol};
    use std::{os::raw::c_char, path::Path};

    /// Represents a loaded instance of our plugin
    /// We keep the `Library` together with function pointers
    /// so that they go out of scope together.
    pub struct Plugin {
        pub draw: unsafe extern "C" fn(fc: &mut FrameContext), // new
        pub greet: unsafe extern "C" fn(name: *const c_char),
        lib: Library,
    }

    impl Plugin {
        pub fn load(lib_path: &Path) -> Result<Self, libloading::Error> {
            let lib = Library::new(lib_path)?;

            Ok(unsafe {
                Plugin {
                    greet: *(lib.get(b"greet")?),
                    draw: *(lib.get(b"draw")?), // new
                    lib,
                }
            })
        }
    }
}

Running it as-is won't work:

Shell session
$ cargo run -q -- --watch
Hot reloading enabled - there will be memory leaks!
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: DlSym { desc: "/home/amos/ftl/greet/libgreet-rs/target/debug/libgreet.so: undefined symbol: draw" }', src/main.rs:70:56
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Luckily, libloading is looking out for us.

So, let's add a draw function to libgreet-rs:

Shell session
$ cd libgreet-rs/
$ cargo add ../common
      Adding common (unknown version) to dependencies

For a first try, we'll make the whole screen blue:

Rust code
// in `libgreet-rs/src/lib.rs`

use common::FrameContext;

#[no_mangle]
pub extern "C" fn draw(cx: &mut FrameContext) {
    let pixels = cx.pixels();

    // all blue!
    for p in pixels {
        p.b = 255;
    }
}

Let's give it a shot:

Shell session
$ cd libgreet-rs/
$ cargo b -q
$ cd ../greet-rs/
$ cargo run -q -- --watch

Mhhhhh, pure blue. Revolting.

It's a proof of concept bear, cool down.

Also what's up with your window decorations?

I don't know, might be the combination of two HiDPI (high pixel density) settings, one zooming out, the other zooming in, or maybe it's just that I'm using gnome-unstable.

Ahah. Living dangerously I see.

Always.

And then... then that's it.

We're pretty much done.

Sure, we could add a lot of other nice things. We could let plugins have state, we could expose more functions, in one direction or the other.
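For example, even without any of that, you can already hot-swap a draw function like this one - a hypothetical animated gradient, not the exact code from the demo:

Rust code
// in `libgreet-rs/src/lib.rs` - a hypothetical replacement for our all-blue `draw`

use common::FrameContext;

#[no_mangle]
pub extern "C" fn draw(cx: &mut FrameContext) {
    let (width, height, ticks) = (cx.width, cx.height, cx.ticks);
    let pixels = cx.pixels();

    for y in 0..height {
        for x in 0..width {
            let p = &mut pixels[y * width + x];
            // tweak these, save, and the window updates without a restart
            p.r = ((x + ticks) % 256) as u8;
            p.g = ((y + ticks / 2) % 256) as u8;
            p.b = ((x + y) % 256) as u8;
        }
    }
}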

But we have a nice enough playground. Don't believe me?

Just wind me up, and watch me go:

Afterword

What about Windows? or macOS?

Both left as an exercise to the reader. For macOS, I'd imagine a similar strategy applies. For Windows, I'm not sure. It looks like the standard library uses DLL_THREAD_DETACH and DLL_PROCESS_DETACH events, and keeps its own list of destructors, so that approach might not work in some multi-threaded scenarios.

We sure had to do a lot of things to live-reload a Rust library. In comparison, live-reloading a C library was super simple. What gives? I thought Rust was the best thing ever?

That's fair - but we went the long-winded way.

...as we always do.

Right. We could've gotten away with just avoiding TLS destructors altogether - the host application could've exposed a println function, or we could've used std::io::stdout().write() directly. We had options.
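For instance, that first option could have looked something like this - a hypothetical sketch, we never actually built it:

Rust code
// Hypothetical sketch - not what we built above. The host hands the plugin a
// callback, so the plugin never pulls in std's thread-local stdout machinery
// (and thus never registers TLS destructors of its own).

use std::os::raw::c_char;

// a shared signature, e.g. in the `common` crate
pub type HostPrintFn = unsafe extern "C" fn(msg: *const c_char);

// in the plugin: greet through the host instead of `println!`
#[no_mangle]
pub unsafe extern "C" fn greet(name: *const c_char, host_print: HostPrintFn) {
    // the actual I/O happens on the host's side
    host_print(name);
}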

Is that family of problems Rust-specific? The __cxa_thread_atexit_impl business?

No, it's not. We would've had the same kind of issues in C++, for which __cxa_thread_atexit_impl was made in the first place.

Would've or could've?

Well, I don't know whether cout and cerr rely on thread-local storage by default, so, "could've", I guess.

Aren't you afraid the readers are going to see the estimated time for this article and just walk away?

Well, they're reading now, aren't they?

...fair. But still, why not split this into a series?

Well, first off, because I want to see just how long I can make articles without splitting them up, without folks just discarding them. Hopefully by now folks know what I'm about, and whether it's worth their time or not.

What's the other reason?

Splitting it into a series involves moving a bunch of assets into a bunch of folders and I'm really tired.

Yes, it is 2020.

So, how long have we been working on this article?

About... two weeks I'd say. One of them full-time.

Do you regret being nerd-sniped like that? Would you try to avoid it in the future?

I don't regret it at all. I wouldn't say stepping through glibc code in LLDB is the epitome of fun, but it's already come in handy several times since I did that.

Would you recommend that readers do the same?

Absolutely - everything you can learn about the layers you build on (your language's runtime, the operating system, the specifics of memory and processors) comes in handy once in a while.

Do you think they'll actually do it?

Well, by not making complete project sources available for these articles, I'm already sort of trying to reproduce the feeling of absorbing knowledge from a dead tree (ie. print) book and typing it up by hand, on your own computer, to try and reproduce it.

Is that the real reason, or are you being lazy again?

Eh, 50/50. I don't think you can absorb all that knowledge by just downloading and running sources.

You have to work for it.

Isn't that sorta gatekeepy?

I'm not sure. Is it?

Well, I think readers just want something to play with. Not everyone has the time to sift through the article and apply every code change one by one.

You ought to know - you've been updating the "Making our own executable packer" series article by article, and it's been taking forever.

Fair enough. You do it then!

Uhhhh...