Improving error handling - panics vs. proper errors

👋 This page was last updated ~5 years ago. Just so you know.

Before we move on to parsing more of our raw packets, I want to take some time to improve our error handling strategy.

Currently, the ersatz codebase contains a mix of Result<T, E>, and some methods that panic, like unwrap() and expect().

We also have a custom Error enum that lets us return rawsock errors, IO errors, or Win32 errors:

pub enum Error {
    Rawsock(rawsock::Error),
    IO(std::io::Error),
    Win32(u32),
}

First of all, I want to address something: When is it okay to panic? And when are we better off returning a result?

Here's my personal take on this:

  • panicking is fine in application code
  • panicking is fine in library code too, when it indicates either of:
    • the library has a bug (it makes assumptions that turned out to be false)
    • the library is being misused in a way that we could not check with the type system.

In general, panicking is fine if nothing useful can be done when that error occurs except for quitting

For example, as of right now, ersatz is a command-line application, so, in the netinfo package, we take some liberties and use expect(), which will panic:

let entry = table
    .entries()
    .iter()
    .find(|r| r.dest == ipv4::Addr([0, 0, 0, 0]))
    .expect("should have default interface");
    // ^ this can panic!

If we change this code to be wrong, for example if we set the IP address to be 0.0.0.1, we get a panic:

$ cargo run --quiet
thread 'main' panicked at 'should have default interface', src\libcore\option.rs:1190:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

If we set the environment RUST_BACKTRACE to 1, we get a backtrace:

$ RUST_BACKTRACE=1 cargo run --quiet
thread 'main' panicked at 'should have default interface', src\libcore\option.rs:1190:5
stack backtrace:
   0: backtrace::backtrace::trace_unsynchronized
             at C:\Users\VssAdministrator\.cargo\registry\src\github.com-1ecc6299db9ec823\backtrace-0.3.37\src\backtrace\mod.rs:66
   1: std::sys_common::backtrace::_print_fmt
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\sys_common\backtrace.rs:76
   2: std::sys_common::backtrace::_print::{{impl}}::fmt
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\sys_common\backtrace.rs:60
   3: core::fmt::write
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libcore\fmt\mod.rs:1030
   4: std::io::Write::write_fmt<std::sys::windows::stdio::Stderr>
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\io\mod.rs:1412
   5: std::sys_common::backtrace::_print
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\sys_common\backtrace.rs:64
   6: std::sys_common::backtrace::print
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\sys_common\backtrace.rs:49
   7: std::panicking::default_hook::{{closure}}
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\panicking.rs:196
   8: std::panicking::default_hook
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\panicking.rs:210
   9: std::panicking::rust_panic_with_hook
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\panicking.rs:473
  10: std::panicking::continue_panic_fmt
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\panicking.rs:380
  11: std::panicking::rust_begin_panic
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\panicking.rs:307
  12: core::panicking::panic_fmt
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libcore\panicking.rs:85
  13: core::option::expect_failed
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libcore\option.rs:1190
  14: core::option::Option<ersatz::netinfo::IpForwardRow*>::expect<ersatz::netinfo::IpForwardRow*>
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\src\libcore\option.rs:345
  15: ersatz::netinfo::default_nic_guid
             at .\src\netinfo.rs:11
  16: ersatz::main
             at .\src\main.rs:16
  17: std::rt::lang_start::{{closure}}<core::result::Result<(), ersatz::error::Error>>
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\src\libstd\rt.rs:64
  18: std::rt::lang_start_internal::{{closure}}
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\rt.rs:49
  19: std::panicking::try::do_call<closure-0,i32>
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\panicking.rs:292
  20: panic_unwind::__rust_maybe_catch_panic
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libpanic_unwind\lib.rs:80
  21: std::panicking::try
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\panicking.rs:271
  22: std::panic::catch_unwind
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\panic.rs:394
  23: std::rt::lang_start_internal
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\rt.rs:48
  24: std::rt::lang_start<core::result::Result<(), ersatz::error::Error>>
             at /rustc/4560ea788cb760f0a34127156c78e2552949f734\src\libstd\rt.rs:64
  25: main
  26: invoke_main
             at d:\agent\_work\3\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
  27: __scrt_common_main_seh
             at d:\agent\_work\3\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
  28: BaseThreadInitThunk
  29: RtlUserThreadStart
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

If we use a crate like better-panic, we get prettier backtraces:

$ cargo add better-panic
      Adding better-panic v0.2.0 to dependencies
// in `src/main.rs`

fn main() -> Result<(), Error> {
    better_panic::install();

    // cut: rest of main
}
$ RUST_BACKTRACE=1 cargo run --quiet
Backtrace (most recent call first):
  File "rust:/src\libcore\option.rs", line 1190, in core::option::expect_failed
  File "rust:src\libcore\option.rs", line 345, in core::option::Option<ersatz::netinfo::IpForwardRow*>::expect<ersatz::netinfo::IpForwardRow*>
  File "C:\msys64\home\amos\ftl\ersatz\src\netinfo.rs", line 11, in ersatz::netinfo::default_nic_guid
    let entry = table
  File "C:\msys64\home\amos\ftl\ersatz\src\main.rs", line 18, in ersatz::main
    let iface_name = format!(r#"\Device\NPF_{}"#, netinfo::default_nic_guid()?);

The application panicked (crashed).
  should have default interface
in src\libcore\option.rs, line 1190
thread: main

The text doesn't do it justice, because it has colors, so here's a screenshot too:

That said, we're going to be converting ersatz to a library eventually, so that it can be used by sup to speak ICMP.

You may argue that this is still a fine use of panics: if an application relies on a library like ersatz to do networking, it's unlikely the application will be able to do anything useful it we can't get ahold of the default network interface.

Then again, it's not impossible! The application might be a server, and it might use ersatz in a handler - in that case, we wouldn't want it to bring down the whole server. It might also fall back to another method of doing networking (again, unlikely, but not impossible). In that scenario, the application would have to catch the panic with something like std::panic::catch_unwind()

Here's how it could do it:

// in `src/main.rs`

fn main() -> Result<(), Error> {
    let res = std::panic::catch_unwind(|| {
        panic!("This will be caught");
    });
    match res {
        Ok(_) => println!("Call went fine"),
        Err(e) => println!("Call went wrong, error is: {:?}", e),
    }
$ cargo run --quiet
thread 'main' panicked at 'This will be caught', src\main.rs:15:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Call went wrong: Any
Listening for packets...

Mhh, this isn't ideal. First off, the "panic" message is still shown - if we still had better-panic installed, or if we exported RUST_BACKTRACE=1, we'd see the whole backtrace - even though we catch it!

Second, from where we catch it (the Err() arm of the match res), we don't see the error message. The E in Result<T, E> for catch_unwind is Box<dyn Any + Send>.

If we want to see the message, we have to downcast the Any - which can also fail. Here, the Any contains a &str, so we can get it out:

let res = std::panic::catch_unwind(|| {
    panic!("This will be caught");
});
match res {
    Ok(_) => println!("Call went fine"),
    Err(e) => {
        let s: Box<&'static str> = e.downcast().unwrap();
        println!("Call went wrong, error is {:?}", s)
    }
}
$ cargo run --quiet
thread 'main' panicked at 'This will be caught', src\main.rs:15:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Call went wrong, error is "This will be caught"
Listening for packets...

...but that won't be true every time. If we change our panicking code to this:

let res = std::panic::catch_unwind(|| {
    let s: Option<String> = None;
    s.expect("s should not be none");
});

...then we get twice the panics!

$ cargo run --quiet
thread 'main' panicked at 's should not be none', src\libcore\option.rs:1190:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any', src\libcore\result.rs:1165:5

Besides, we're still not able to suppress the output from the panic we capture. I'm also ready to bet that catching a panic is rather expensive (see Why not use exceptions as regular flow of control?).

And to top it all off, this won't even catch all panics:

Note that this function may not catch all panics in Rust. A panic in Rust is not always implemented via unwinding, but can be implemented by aborting the process as well. This function only catches unwinding panics, not those that abort the process.

(From the catch_unwind docs)

Cool bear

Cool bear's hot tip

You can enable "abort on panics" with a few lines in your Cargo.toml, see Aborting on panic.

So, seeing as we plan on ersatz becoming a library, we probably want to get rid of most of its panicky code.

Cool bear

Cool bear's hot tip

Here we're going to go through the codebase manually, and remove panicky code.

You can also give the dont-panic crate a shot, but its caveats make it rather unpractical.

I just wanted to mention it because it's a neat trick.

Reviewing panicky code

So, let's review panicky code in ersatz.

We've got a couple of expect() calls in src/loadlibrary.rs:

impl Library {
    pub fn new(name: &str) -> Option<Self> {
        let name = CString::new(name).expect("invalid library name");
        // etc.
    }

    pub unsafe fn get_proc<T>(&self, name: &str) -> Option<T> {
        let name = CString::new(name).expect("invalid proc name");
        // etc.
    }
}

Now, back when we wrote that code, we made the argument that it was fine to panic here because:

  1. It let us not worry about error handling for now
  2. It's quite unlikely for someone to pass a string with null bytes in them

But making that expect() panic is easy enough:

// in `src/main.rs`
fn main() -> Result<(), Error> {
    better_panic::install();

    let name = "\0UH_OH.dll\0";
    loadlibrary::Library::new(name);

    // etc.
}

If the application uses our loadlibrary module with string constants, if it knows exactly which libraries/symbols to use, then it's no problem.

The panic will probably be caught in testing, and then the application code can be fixed, and everything is fine!

But what if it's a REPL (Read Eval Print Loop)? And it it lets folks load libraries with arbitrary names?

$ fictive-repl
loadlibrary "\0UH_OH.dll"

Then we probably don't want the whole REPL to collapse onto itself whenever there's invalid input.

So, for loadlibrary, we probably want a proper Result<T, E> here, not just an Option<T>.

But I still feel too lazy to make a proper Error enum just for loadlibrary.

Introducing thiserror

Well, good news everyone! At the very moment I was writing Part 6, David Tolnay was busy solving that problem.

If we were to write an Error enum manually, it would look something like:

pub enum Error {
    InvalidLibraryName(&'static str),
    CouldNotOpenLibrary(&'static str),
    InvalidProcName(&'static str),
    CouldNotGetProc(&'static str),
}

use std::fmt;
impl fmt::Debug for Error {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            Self::InvalidLibraryName(name) => write!(f, "invalid library name: {:?}", name),
            Self::CouldNotOpenLibrary(name) => write!(f, "could not open library: {:?}", name),
            Self::InvalidProcName(name) => write!(f, "invalid proc name: {:?}", name),
            Self::CouldNotGetProc(name) => write!(f, "could not get proc: {:?}", name),
        }
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        fmt::Debug::fmt(self, f)
    }
}

impl std::error::Error for Error {}

If we use the thiserror crate however, we only have to write this:

$ cargo add thiserror
      Adding thiserror v1.0.5 to dependencies
use thiserror::Error;

#[derive(Error, Debug)]
pub enum Error {
    #[error("invalid library name {0:?}")]
    InvalidLibraryName(&'static str),
    #[error("could not open library {0:?}")]
    CouldNotOpenLibrary(&'static str),
    #[error("invalid proc name {0:?}")]
    InvalidProcName(&'static str),
    #[error("could not get proc {0:?}")]
    CouldNotGetProc(&'static str),
}

And that's it.

All that's left is to use it:

// in `src/loadlibrary.rs`

impl Library {
    pub fn new(name: &'static str) -> Result<Self, Error> {
        let c_name = match CString::new(name) {
            Ok(x) => x,
            Err(_) => return Err(Error::InvalidLibraryName(name)),
        };
        let res = unsafe { LoadLibraryA(c_name.as_ptr()) };
        res.ok_or(Error::CouldNotOpenLibrary(name))
            .map(|module| Library { module })
    }

    pub unsafe fn get_proc<T>(&self, name: &'static str) -> Result<T, Error> {
        let c_name = match CString::new(name) {
            Ok(x) => x,
            Err(_) => return Err(Error::InvalidProcName(name)),
        };
        let res = GetProcAddress(self.module, c_name.as_ptr());
        res.ok_or(Error::CouldNotGetProc(name)).map(|proc| transmute_copy(&proc))
    }
}

Note that we were using Library::new(...).unwrap() previously, so we don't need to change, say, our bind! macro.

To see the error, we do need to unwrap() explicitly in our sample error-provoking code in main though:

// in `src/main.rs`

fn main() -> Result<(), Error> {
    better_panic::install();

    let name = "\0UH_OH.dll\0";
    loadlibrary::Library::new(name).unwrap();

    // etc.
}
$ cargo run --quiet
Backtrace (most recent call first):
  File "rust:src\libcore\result.rs", line 933, in core::result::Result<ersatz::loadlibrary::Library, ersatz::loadlibrary::Error>::unwrap<ersatz::loadlibrary::Library,ersatz::loadlibrary::Error>
  File "C:\msys64\home\amos\ftl\ersatz\src\main.rs", line 17, in ersatz::main
    loadlibrary::Library::new(name).unwrap();

The application panicked (crashed).
  called `Result::unwrap()` on an `Err` value: InvalidLibraryName("\u{0}UH_OH.dll\u{0}")
in src\libcore\result.rs, line 1165
thread: main

Good! But not quite there yet. We do get a backtrace, but it only shows where we unwrap(), not where the error actually occured.

Introducing failure

If we want to fix that, we can use the failure crate by boats:

$ cargo add failure
      Adding failure v0.1.6 to dependencies

We'll want to enable its backtrace feature:

# in `Cargo.toml`

[dependencies]
# before:
failure = "0.1.6"

# after:
failure = { version = "0.1.6", features = ["backtrace"] }

Then all we have to do is change our return type to failure::Fallible<T>:

impl Library {
    pub fn new(name: &'static str) -> failure::Fallible<Self> {
        let c_name = match CString::new(name) {
            Ok(x) => x,
            // note: the `?` sigil automatically uses the `Into` trait
            // to convert to `failure::Error`
            Err(_) => Err(Error::InvalidLibraryName(name))?,
        };
        let res = unsafe { LoadLibraryA(c_name.as_ptr()) };
        // note: for `ok_or`, we can't use `?`, so we have to call
        // `.into()` explicitly
        res.ok_or(Error::CouldNotOpenLibrary(name).into())
            .map(|module| Library { module })
    }

    pub unsafe fn get_proc<T>(&self, name: &'static str) -> failure::Fallible<T> {
        let c_name = match CString::new(name) {
            Ok(x) => x,
            Err(_) => Err(Error::InvalidProcName(name))?,
        };
        let res = GetProcAddress(self.module, c_name.as_ptr());
        res.ok_or(Error::CouldNotGetProc(name).into()).map(|proc| transmute_copy(&proc))
    }
}

And then we get a stack trace of where the actual error occured:

$ cargo run --quiet
Backtrace (most recent call first):
  File "rust:src\libcore\result.rs", line 933, in core::result::Result<ersatz::loadlibrary::Library, failure::error::Error>::unwrap<ersatz::loadlibrary::Library,failure::error::Error>
  File "C:\msys64\home\amos\ftl\ersatz\src\main.rs", line 17, in ersatz::main
    loadlibrary::Library::new(name).unwrap();

The application panicked (crashed).
  called `Result::unwrap()` on an `Err` value: InvalidLibraryName("\u{0}UH_OH.dll\u{0}")

  stack backtrace:
     0: backtrace::backtrace::dbghelp::trace
               at C:\Users\amos\.cargo\registry\src\github.com-1ecc6299db9ec823\backtrace-0.3.40\src\backtrace\dbghelp.rs:88
     1: backtrace::backtrace::trace_unsynchronized<closure-0>
               at C:\Users\amos\.cargo\registry\src\github.com-1ecc6299db9ec823\backtrace-0.3.40\src\backtrace\mod.rs:66
     2: backtrace::backtrace::trace<closure-0>
               at C:\Users\amos\.cargo\registry\src\github.com-1ecc6299db9ec823\backtrace-0.3.40\src\backtrace\mod.rs:53
     3: backtrace::capture::Backtrace::create
               at C:\Users\amos\.cargo\registry\src\github.com-1ecc6299db9ec823\backtrace-0.3.40\src\capture.rs:164
     4: backtrace::capture::Backtrace::new_unresolved
               at C:\Users\amos\.cargo\registry\src\github.com-1ecc6299db9ec823\backtrace-0.3.40\src\capture.rs:158
     5: failure::backtrace::internal::InternalBacktrace::new
               at C:\Users\amos\.cargo\registry\src\github.com-1ecc6299db9ec823\failure-0.1.6\src\backtrace\internal.rs:46
     6: failure::backtrace::Backtrace::new
               at C:\Users\amos\.cargo\registry\src\github.com-1ecc6299db9ec823\failure-0.1.6\src\backtrace\mod.rs:121
     7: failure::error::error_impl::{{impl}}::from<ersatz::loadlibrary::Error>
               at C:\Users\amos\.cargo\registry\src\github.com-1ecc6299db9ec823\failure-0.1.6\src\error\error_impl.rs:19
     8: failure::error::{{impl}}::from<ersatz::loadlibrary::Error>
               at C:\Users\amos\.cargo\registry\src\github.com-1ecc6299db9ec823\failure-0.1.6\src\error\mod.rs:36
     9: ersatz::loadlibrary::Library::new
               at src\loadlibrary.rs:38
    10: ersatz::main
               at src\main.rs:17
    11: std::rt::lang_start::{{closure}}<core::result::Result<(), ersatz::error::Error>>
               at /rustc/4560ea788cb760f0a34127156c78e2552949f734\src\libstd\rt.rs:64
    12: std::rt::lang_start_internal::{{closure}}
               at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\rt.rs:49
    13: std::panicking::try::do_call<closure-0,i32>
               at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\panicking.rs:292
    14: panic_unwind::__rust_maybe_catch_panic
               at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libpanic_unwind\lib.rs:80
    15: std::panicking::try
               at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\panicking.rs:271
    16: std::panic::catch_unwind
               at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\panic.rs:394
    17: std::rt::lang_start_internal
               at /rustc/4560ea788cb760f0a34127156c78e2552949f734\/src\libstd\rt.rs:48
    18: std::rt::lang_start<core::result::Result<(), ersatz::error::Error>>
               at /rustc/4560ea788cb760f0a34127156c78e2552949f734\src\libstd\rt.rs:64
    19: main
    20: invoke_main
               at d:\agent\_work\3\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
    21: __scrt_common_main_seh
               at d:\agent\_work\3\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    22: BaseThreadInitThunk
    23: RtlUserThreadStart
in src\libcore\result.rs, line 1165
thread: main

I can't help but remark a few things though

  • We now have two backtraces: the location of .unwrap(), and the location where the error was "thrown".
  • The second backtrace is a bit on the verbose side - better-panic is not used there, since it's not a panic!

So we probably don't need better-panic anymore:

$ cargo rm better-panic
    Removing better-panic from dependencies

But we should probably find another solution, because those backtraces are really verbose and hard to read.

Introducing color-backtrace

How about we give the color-backtrace crate a try? It seems it recently got (hacky) support for failure, which we also use!

$ cargo add color-backtrace
      Adding color-backtrace v0.3.0 to dependencies

We'll want to enable the experimental failure-bt feature:

# in `Cargo.toml`

[dependencies]
color-backtrace = { version = "0.3.0", features = ["failure-bt"] }

And also "install" it, just like we did with better-panic:

// in `src/main.rs`

fn main() -> Result<(), Error> {
    color_backtrace::install();

    // etc.
}

And as it turns out, failure support is not quite that magical, so we can't just use unwrap() - instead, we have to use print_backtrace ourselves.

Let's move a few things around to make things easier:

// in `src/main.rs`

// This one does not return a result
fn main() {
    if let Err(e) = do_main() {
        eprintln!("Fatal error: {}", e);

        use color_backtrace::{failure::print_backtrace, Settings};
        unsafe {
            print_backtrace(&e.backtrace(), &mut Settings::new()).unwrap();
        }
    }
}

// This one returns a result
fn do_main() -> failure::Fallible<()> {
    color_backtrace::install();

    let name = "\0UH_OH.dll\0";
    loadlibrary::Library::new(name)?;

    // etc.
}

Does it work?

$ cargo run --quiet
<failure backtrace not captured>

Ah. The hint isn't quite as clear as the default Rust one, but we know what to do:

$ RUST_BACKTRACE=1 cargo run --quiet
Fatal error: invalid library name "\u{0}UH_OH.dll\u{0}"
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                          (7 post panic frames hidden)
 7: failure::error::error_impl::{{impl}}::from<ersatz::loadlibrary::Error>
    at C:\Users\amos\.cargo\registry\src\github.com-1ecc6299db9ec823\failure-0.1.6\src\error\error_impl.rs:19
 8: failure::error::{{impl}}::from<ersatz::loadlibrary::Error>
    at C:\Users\amos\.cargo\registry\src\github.com-1ecc6299db9ec823\failure-0.1.6\src\error\mod.rs:36
 9: ersatz::loadlibrary::Library::new
    at C:\msys64\home\amos\ftl\ersatz\src\loadlibrary.rs:38
10: ersatz::do_main
    at C:\msys64\home\amos\ftl\ersatz\src\main.rs:27
11: ersatz::main
    at C:\msys64\home\amos\ftl\ersatz\src\main.rs:13
                        (13 runtime init frames hidden)

Wonderful!

Here's a screenshot for the full effect:

There's one thing that's still bothering me though. In order to get our custom error type to work, we had to change the signature from:

impl Library {
    pub fn new(name: &str) -> failure::Fallible<Self> { /* etc. */ }
}

To:

impl Library {
    pub fn new(name: &'static str) -> failure::Fallible<Self> { /* etc. */ }
}

i.e, the library and proc names now need to be 'static. Which is okay if you're loading libraries and procs from string constants (already 'static), but not if you're implementing a REPL, where you'll be passing references to strings that were just parsed from user input, and which are short-lived.

Cool bear

Cool bear's hot tip

Why did we change name to have the 'static lifetime in the first place?

Imagine we have a top-level error handler that prints fatal errors, complete with their backtrace. It'll need to own an instance of whatever our Error type is, and that instance needs to own sufficient data to format the error.

This is because the error is actually formatted way after the problematic function has returned and all its locals have been dropped.

So we can't return a &str with a lifetime shorter than 'static:

fn example() -> Result<(), Error> {
    let s = String::from("some string");

    if is_wrong(s) {
        // this borrow outlives `s`, which is dropped
        // on return, so the compiler won't let us do that.
        return Err(MyError(&s))
    }

    Ok(())
}

But we can return a reference with a 'static lifetime:

fn example() -> Result<(), Error> {
    let s = "some string";

    if is_wrong(s) {
        // this is okay - `s` is a string constant, so it's `'static`
        return Err(MyError(s))
    }

    Ok(())
}

Or we can have MyError own the String, in which case the string will be valid as long as we own an instance of MyError:

fn example() -> Result<(), Error> {
    let s = String::from("some string");

    if is_wrong(s) {
        // this is okay - we're moving a clone of `s` into the Error itself,
        // which it now owns.
        return Err(MyError(s.clone()))
    }

    Ok(())
}

Also, it's not clear what the problem is with the Library name! It's clear to us, since we wrote the code, but it wouldn't be clear to a third-party user of the library.

Let's make error handling a whole lot better.

First we'll want to change Library a little, so we can remember its name:

// in `src/loadlibrary.rs`

pub struct Library {
    // new!
    name: String,
    module: HModule,
}

Next, we want to reduce possible errors to three:

  • One for when an invalid name is passed
  • One for when a library cannot be found
  • One for when a proc in a given library cannot be found
// in `src/loadlibrary.rs`

#[derive(Error, Debug)]
pub enum Error {
    #[error("invalid name: {0}")]
    InvalidName(#[from] std::ffi::NulError),

    #[error("could not open library {0:?}")]
    LibraryNotFound(String),

    #[error("could not find proc {proc:?} in {library:?}")]
    ProcNotFound { proc: String, library: String },
}

This uses the #[from] attribute from the thiserror crate, and some more formatting tricks. Go read its whole documentation!

Now we only have to actually return those errors:

impl Library {
    // note: 'static is gone!
    pub fn new(name: &str) -> failure::Fallible<Self> {
        // we can use `?`, because `Error::InvalidName` implements `From`
        // for `std::ffi::NulError`
        let c_name = CString::new(name)?;

        match unsafe { LoadLibraryA(c_name.as_ptr()) } {
            Some(module) => Ok(Self {
                name: name.to_string(),
                module,
            }),
            None => Err(Error::LibraryNotFound(name.to_string()))?,
        }
    }

    pub unsafe fn get_proc<T>(&self, name: &str) -> failure::Fallible<T> {
        let c_name = CString::new(name)?;

        match GetProcAddress(self.module, c_name.as_ptr()) {
            Some(proc) => Ok(transmute_copy(&proc)),
            None => Err(Error::ProcNotFound {
                library: self.name.clone(),
                proc: name.to_string(),
            })?,
        }
    }
}

Does it work?

Well.. kind of. But it seems our "invalid name" message was swallowed.

Why is that happening? Well, with failure, the return type is failure::Fallible<T>, which is equivalent to std::result::Result<T, failure::Error>. And failure::Error contains a dyn failure::Fail, which has a blanket implementation for all types that implement std::error::Error.

And std::ffi::NulError definitely implements std::error::Error. So if we want our failure::Error to contain our error instead, we'll have to be a tiny bit more specific:

impl Library {
    pub fn new(name: &str) -> failure::Fallible<Self> {
        // this `map_err` transforms `NulError` into `Error::InvalidName(NulError)`
        let c_name = CString::new(name).map_err(Error::from)?;

        // etc.
    }

    pub unsafe fn get_proc<T>(&self, name: &str) -> failure::Fallible<T> {
        let c_name = CString::new(name).map_err(Error::from)?;

        // etc.
    }

Better!

But we've ignored something important.

Pattern matching and downcasting

If we did want to handle those loadlibrary errors directly, instead of just bubbling them up to print a backtrace - could we?

Sure! We'd just need downcast_ref and a bit of pattern matching:

// in `src/main.rs`

fn do_main() -> failure::Fallible<()> {
    color_backtrace::install();

    match loadlibrary::Library::new("USER33.dll") {
        Ok(_) => println!("Library loaded fine!"),
        Err(e) => {
            if let Some(loadlibrary::Error::LibraryNotFound(name)) =
                e.downcast_ref::<loadlibrary::Error>()
            {
                println!("Oh no, we don't have {:?}", name);
                // we could fall back to another library here
            } else {
                // bubble up
                return Err(e);
            }
        }
    }

    // etc.
}
$ RUST_BACKTRACE=1 cargo run --quiet
Oh no, we don't have "USER33.dll"
Listening for packets...

Nice.

But this raises the question: do we really need to use failure in loadlibrary?

Probably not. It would be less awkward for users to match directly on loadlibrary::Error rather than have to go through downcast_ref. We can still use failure in application code!

Also, is it right for Library::new() and Library::get_proc() to return the same error type? They have some common errors (invalid name), but it's impossible for new() to return ProcNotFound, and for get_proc() to return LibraryNotFound, which makes it awkward to match for those error conditions.

If we were still writing error types manually, I would be extremely too lazy to do it. But with thiserror, we'll be there in no time.

Let's first split our error types:

// in `src/loadlibrary.rs`

#[derive(Error, Debug)]
pub enum LibraryError {
    #[error("invalid library name: {0}")]
    InvalidName(#[from] std::ffi::NulError),

    #[error("could not open library {0:?}")]
    NotFound(String),
}

#[derive(Error, Debug)]
pub enum ProcError {
    #[error("invalid proc name: {0}")]
    InvalidName(#[from] std::ffi::NulError),

    #[error("could not find proc {proc:?} in {library:?}")]
    NotFound { proc: String, library: String },
}

Cool! Now our error messages are even more precise: we know - without looking at a backtrace - whether it's an invalid library name or an invalid proc name.

We were also able to make variant names shorter to avoid repetition: ProcError::NotFound, vs Error::ProcNotFound.

Now to use them:

impl Library {
    // note that we specify E=LibraryError here...
    pub fn new(name: &str) -> Result<Self, LibraryError> {
        // ...so now we don't have to use `map_err`!
        let c_name = CString::new(name)?;

        match unsafe { LoadLibraryA(c_name.as_ptr()) } {
            Some(module) => Ok(Self {
                name: name.to_string(),
                module,
            }),
            None => Err(LibraryError::NotFound(name.to_string()))?,
        }
    }

    pub fn get_proc<T>(&self, name: &str) -> Result<T, ProcError> {
        let c_name = CString::new(name)?;

        match unsafe { GetProcAddress(self.module, c_name.as_ptr()) } {
            Some(proc) => Ok(unsafe { transmute_copy(&proc) }),
            None => Err(ProcError::NotFound {
                library: self.name.clone(),
                proc: name.to_string(),
            })?,
        }
    }
}

We of course, still want to use failure::Fallible in src/main.rs, and we want to keep our custom top-level error handler:

fn do_main() -> failure::Fallible<()> {
    color_backtrace::install();

    let kernel32 = loadlibrary::Library::new("KERNEL32.dll")?;
    #[allow(non_snake_case)]
    let PreventBugs: unsafe extern "C" fn() = kernel32.get_proc("PreventBugs")?;
    unsafe {
        PreventBugs();
    }

    // etc.
}

Here's the result:

Which is the best of all worlds:

  • We get a clean, colorful backtrace into application code
  • The errors are fully-typed and we could easily match them if we wanted to
  • The ? sigil does what we want in library code
Cool bear

Cool bear's hot tip

There is, of course, no PreventBugs function in KERNEL32.dll (yet!), but there is SetProcessMitigationPolicy, which gets us, like, 20% of the way there.

Cool bear

What did we learn?

The thiserror crate lets you derive Error and Display with custom messages. It also lets you do conversion From other error types (IO, etc.).

The failure crate has an Error types that captures backtraces. It contains a Fail, which has a blanket implementation for all types that implement std::error::Error.

The better-panic crate prints panic backtraces in a friendlier format - but it only works for panics!

The color-backtrace crate also formats backtraces in a nice way - with color, and omitting init and post-panic frames.

All that's left is for us to go through the rest of the code and decide where we want custom errors.

But first, let's replace this:

// in `src/error.rs`

use std::fmt;

pub enum Error {
    Rawsock(rawsock::Error),
    IO(std::io::Error),
    Win32(u32),
}

impl From<rawsock::Error> for Error {
    fn from(e: rawsock::Error) -> Self {
        Self::Rawsock(e)
    }
}

impl From<std::io::Error> for Error {
    fn from(e: std::io::Error) -> Self {
        Self::IO(e)
    }
}

impl From<u32> for Error {
    fn from(e: u32) -> Self {
        Self::Win32(e)
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            Self::Rawsock(e) => write!(f, "{}", e),
            Self::IO(e) => write!(f, "{}", e),
            Self::Win32(e) => write!(f, "Win32 error code {} (0x{:x})", e, e),
        }
    }
}

impl fmt::Debug for Error {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", self)
    }
}

impl std::error::Error for Error {}

With this:

use thiserror::Error;

#[derive(Debug, Error)]
pub enum Error {
    #[error("Raw socket error: {0}")]
    Rawsock(#[from] rawsock::Error),
    #[error("I/O: {0}")]
    IO(#[from] std::io::Error),
    #[error("Win32 error code {0} (0x{0:x})")]
    Win32(u32),
}

and... we don't need to change how Error is used anywhere in the codebase.

Those were equivalent. thiserror fits our use case perfectly.

What else... we've got a couple of unwrap in our bind! macro:

// in `src/loadlibrary.rs`

// in `bind` macro

static FUNCTIONS: once_cell::sync::Lazy<Functions> =
    once_cell::sync::Lazy::new(|| {
        let lib = crate::loadlibrary::Library::new($lib).unwrap();
        Functions {
            $($name: lib.get_proc(stringify!($name)).unwrap()),*
        }
    });

These seem fine. bind! is meant to load libraries and procs with static names, and we have no way to bubble up the error since they're lazily loaded whenever they're first called - their unsafe extern "C" prototype may not even allows us to return an error (and if they do, it's an u32).

But, we still benefit from the error handling work we did on loadlibrary earlier. Previously, if we specified a wrong library, we'd just get a message like this:

$ RUST_BACKTRACE=1 cargo run --quiet
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src\libcore\option.rs:378:21

...and we'd have to rely on the stack trace to guide us towards the bind! invocation that caused the problem.

Now, if we "accidentally" get a proc name wrong:

// in `src/netinfo.rs`

crate::bind! {
    library "IPHLPAPI.dll";

    fn GetIpForwardTable(table: *mut IpForwardTable, size: *mut u32, order: bool) -> u32;
    fn GetInterfaceInfo(info: *mut IpInterfaceInfo, size: *mut u32) -> u32;

    fn FinishArticleSeries() -> ();
}

We get all the detail we want:

And further down the stack:

It doesn't go through our top-level error handler, so the output is not quite as nice (it shows the Debug version of the error, not the Display one), but that's okay.

Next up, we have a couple of expects here:

// in `src/netinfo.rs`

pub fn default_nic_guid() -> Result<String, Error> {
    let table = VLS::new(|ptr, size| GetIpForwardTable(ptr, size, false))?;
    let entry = table
        .entries()
        .iter()
        .find(|r| r.dest == ipv4::Addr([0, 0, 0, 0]))
        .expect("should have default interface");

    let ifaces = VLS::new(|ptr, size| GetInterfaceInfo(ptr, size))?;
    let iface = ifaces
        .adapters()
        .iter()
        .find(|r| r.index == entry.if_index)
        .expect("default interface should exist");

    // etc.
}

...which we can easily add cases for in our top-level error type:

// in `src/error.rs`

use thiserror::Error;

#[derive(Debug, Error)]
pub enum Error {
    #[error("Raw socket error: {0}")]
    Rawsock(#[from] rawsock::Error),
    #[error("I/O: {0}")]
    IO(#[from] std::io::Error),
    #[error("Win32 error code {0} (0x{0:x})")]
    Win32(u32),

    #[error("ersatz could not find the default IP route")]
    DefaultRouteMissing,
    #[error("ersatz could not find the default network interface")]
    DefaultInterfaceMissing,
    #[error("ersatz could not identify the default network interface")]
    DefaultInterfaceUnidentified,
}

and then use:

// in `src/netinfo.rs`

pub fn default_nic_guid() -> Result<String, Error> {
    let table = VLS::new(|ptr, size| GetIpForwardTable(ptr, size, false))?;
    let entry = table
        .entries()
        .iter()
        .find(|r| r.dest == ipv4::Addr([0, 0, 0, 0]))
        .ok_or(Error::DefaultRouteMissing)?;

    let ifaces = VLS::new(|ptr, size| GetInterfaceInfo(ptr, size))?;
    let iface = ifaces
        .adapters()
        .iter()
        .find(|r| r.index == entry.if_index)
        .ok_or(Error::DefaultInterfaceMissing)?;

    let name = iface.name.to_string();
    let guid_start = name.find("{").ok_or(Error::DefaultInterfaceUnidentified)?;
    let guid = &name[guid_start..];
    Ok(guid.to_string())
}

And that's all of our errors so far!

We could go down the rabbit hole further if we intended to match and handle those errors - I'd probably let netinfo have its own error type, and have ersatz's Error type wrap it.

But that's absolutely unnecessary - as was most of this article for an exploratory codebase. It was a nice showcase of the different crates you can use to improve error handling in your codebase though, so I hope you liked it!

Next article, I pinky promise, we'll read some IPv4 packets.

Comment on /r/fasterthanlime

(JavaScript is required to see this. Or maybe my stuff broke)

Here's another article just for you:

Aiming for correctness with types

The Nature weekly journal of science was first published in 1869. And after one and a half century, it has finally completed one cycle of carcinization, by publishing an article about the Rust programming language.

It's a really good article.

What I liked about this article is that it didn't just talk about performance, or even just memory safety - it also talked about correctness.