FFI-safe types in Rust, newtypes and MaybeUninit

👋 This page was last updated ~5 years ago. Just so you know.

It's time to make sup, our own take on ping, use the Win32 APIs to send an ICMP echo. Earlier we discovered that Windows's ping.exe used IcmpSendEcho2Ex. But for our purposes, the simpler IcmpSendEcho will do just fine.

As we mentioned earlier, it's provided by IPHLPAPI.dll, and its C declaration is:

IPHLPAPI_DLL_LINKAGE DWORD IcmpSendEcho(
  HANDLE                 IcmpHandle,
  IPAddr                 DestinationAddress,
  LPVOID                 RequestData,
  WORD                   RequestSize,
  PIP_OPTION_INFORMATION RequestOptions,
  LPVOID                 ReplyBuffer,
  DWORD                  ReplySize,
  DWORD                  Timeout
);

Compared to MessageBoxA, there are a lot more types going on!

WORD is typically a u16, whereas DWORD is a u32.

LPVOID is a Long Pointer to Void, so *const c_void will do. Same goes for HANDLE, according to Windows Data Types.
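As a rough cheat sheet, the Win32 names above can be spelled as Rust type aliases. This is only a sketch: the aliases mirror the C names for readability, and the `alias_sizes` helper is made up for this demonstration, it isn't something we'll use later.

```rust
use std::ffi::c_void;
use std::mem::size_of;

// Aliases mirroring the Win32 names (illustration only).
type WORD = u16;
type DWORD = u32;
type LPVOID = *const c_void;
type HANDLE = *const c_void;

/// Hypothetical helper: sizes in bytes of (WORD, DWORD, LPVOID, HANDLE).
fn alias_sizes() -> (usize, usize, usize, usize) {
    (
        size_of::<WORD>(),
        size_of::<DWORD>(),
        size_of::<LPVOID>(),
        size_of::<HANDLE>(),
    )
}

fn main() {
    let (word, dword, lpvoid, handle) = alias_sizes();
    // the two pointer types are as wide as the platform's pointers
    println!(
        "WORD = {}, DWORD = {}, LPVOID = {}, HANDLE = {}",
        word, dword, lpvoid, handle
    );
}
```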

And then there's IPAddr, which is an IPv4 address (there's a separate family of functions for IPv6). We know that IP addresses are written by humans as x.y.z.w, where each letter is a number between 0 and 255.

Let's make a proper type for that:

// this is a **newtype**
// it has the same memory layout as `[u8; 4]`, but we can
// define our own implementations of traits..
struct IPAddr([u8; 4]);

// ..like this trait for example!
impl fmt::Debug for IPAddr {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // here, self is effectively a tuple with a single
        // element of type `[u8; 4]`. `self.0` accesses the
        // first element of a tuple, and having `[a, b, c, d]`
        // on the left does a destructuring assignment, letting us
        // bind elements of the array to different names.
        // this works because u8 is Copy!
        let [a, b, c, d] = self.0;
        write!(f, "{}.{}.{}.{}", a, b, c, d)
    }
}
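A newtype can pick up conversions too. As a hedged sketch (the `From` impl and the `#[repr(transparent)]` attribute are additions for illustration, not part of the code above), here's the same wrapper built from the standard library's `Ipv4Addr`:

```rust
use std::fmt;
use std::net::Ipv4Addr;

// `repr(transparent)` asks for the same layout as the single field.
#[repr(transparent)]
struct IPAddr([u8; 4]);

impl fmt::Debug for IPAddr {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let [a, b, c, d] = self.0;
        write!(f, "{}.{}.{}.{}", a, b, c, d)
    }
}

// Assumed conversion: `Ipv4Addr::octets` returns the four bytes in order.
impl From<Ipv4Addr> for IPAddr {
    fn from(addr: Ipv4Addr) -> Self {
        IPAddr(addr.octets())
    }
}

fn main() {
    let parsed: Ipv4Addr = "8.8.8.8".parse().unwrap();
    let addr = IPAddr::from(parsed);
    println!("addr = {:?}", addr);
}
```

Since the standard type already knows how to parse text, the newtype gets string parsing for free.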

Now let's take it out for a spin:

fn main() {
    let addr = IPAddr([8, 8, 8, 8]);
    println!("addr = {:?}", addr);

    let addr_as_integer: u32 = unsafe { transmute(addr) };
    println!("addr_as_integer = {}", addr_as_integer);
}

Hey, that looks familiar!

By using transmute, we were able to reinterpret our IPAddr type as a 32-bit integer, and we stumbled upon 134744072, the same value rohitab's API monitor showed us in part 2.
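The same reinterpretation can be done without `unsafe`. A small sketch using `u32::from_ne_bytes` from the standard library (the helper name `as_integer` is made up here):

```rust
/// Hypothetical helper: reinterpret four address bytes as a native-endian u32,
/// which is what the transmute above amounts to.
fn as_integer(octets: [u8; 4]) -> u32 {
    u32::from_ne_bytes(octets)
}

fn main() {
    // All four bytes are equal, so byte order doesn't matter for this one.
    println!("8.8.8.8   = {}", as_integer([8, 8, 8, 8]));

    // For other addresses, the integer depends on the machine's endianness.
    println!("127.0.0.1 = {}", as_integer([127, 0, 0, 1]));
    println!("  (little-endian: {})", u32::from_le_bytes([127, 0, 0, 1]));
    println!("  (big-endian:    {})", u32::from_be_bytes([127, 0, 0, 1]));
}
```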

Next let's take a look at RequestOptions. PIP_OPTION_INFORMATION is a pointer to an IP_OPTION_INFORMATION structure, for which MSDN gives us the following C declaration:

typedef struct ip_option_information32 {
  UCHAR Ttl;
  UCHAR Tos;
  UCHAR Flags;
  UCHAR OptionsSize;
  UCHAR POINTER_32 *OptionsData;
} IP_OPTION_INFORMATION32, *PIP_OPTION_INFORMATION32;

We recognize ttl (time to live). But what's that POINTER_32 thing? We're currently working on a 64-bit Windows, and the docs for IcmpSendEcho say the following:

RequestOptions

A pointer to the IP header options for the request, in the form of an IP_OPTION_INFORMATION structure. On a 64-bit platform, this parameter is in the form for an IP_OPTION_INFORMATION32 structure.

That's why I pulled up the C declaration for IP_OPTION_INFORMATION32. If we look at the other one, IP_OPTION_INFORMATION, which is used on 32-bit Windows:

typedef struct ip_option_information {
  UCHAR  Ttl;
  UCHAR  Tos;
  UCHAR  Flags;
  UCHAR  OptionsSize;
  PUCHAR OptionsData;
} IP_OPTION_INFORMATION, *PIP_OPTION_INFORMATION;

...it's pretty much the same thing, except OptionsData is a regular pointer.

But! A regular pointer on 64-bit would be, well, 64 bits. Whereas on 32-bit it's 32-bit. So by having two different structs, Windows ensures that, whether it's called from a 64-bit process or a 32-bit process, the structure has the exact same size and layout.
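To see the size difference for ourselves, here's a sketch with two made-up Rust mirrors of those C structs (`Options32` and `OptionsNative` are names invented for this example):

```rust
use std::mem::size_of;

// Hypothetical mirror of IP_OPTION_INFORMATION32: the pointer is stored as 32 bits.
#[allow(dead_code)]
#[repr(C)]
struct Options32 {
    ttl: u8,
    tos: u8,
    flags: u8,
    options_size: u8,
    options_data: u32,
}

// Hypothetical mirror of IP_OPTION_INFORMATION: a native-width pointer.
#[allow(dead_code)]
#[repr(C)]
struct OptionsNative {
    ttl: u8,
    tos: u8,
    flags: u8,
    options_size: u8,
    options_data: *const u8,
}

fn main() {
    // same size no matter which target we compile for
    println!("Options32     = {}", size_of::<Options32>());
    // grows with the pointer width (and picks up padding on 64-bit)
    println!("OptionsNative = {}", size_of::<OptionsNative>());
}
```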

But why?

My guess is that this struct eventually gets passed to the kernel, and although, on 64-bit Windows, processes can be either 32-bit or 64-bit, via WoW64, by the time we hit the network stack of the kernel, that distinction is gone, so there has to be a single struct declaration with a single layout. The 32-bit version was there first, so it's used for both architectures, as the common denominator.

Cool bear's hot tip

Amos is making guesses here, but if you know better (for example, you worked at Microsoft at the time these decisions were made), feel free to send him a correction on Twitter.

In this case, we're not really planning on passing "IP options", so we can just use any old 32-bit-wide type. And so our IpOptionInformation type reads:

#[repr(C)]
struct IpOptionInformation {
    ttl: u8,
    tos: u8,
    flags: u8,
    options_size: u8,
    // actually a 32-bit pointer, but, that's a Windows
    // oddity and I couldn't find a built-in Rust type for it.
    options_data: u32,
}

Notice that we used #[repr(C)]. What does that mean?

Well, there are several ways to lay out a struct in memory. Let's take this one:

struct Foo {
    a: u8,
    b: u32,
}

What's the size of that struct? 5 bytes, right? One for a, four for b:

Right??

Wrong.

The layout rustc actually picks takes 8 bytes: padding is added so that b sits at an address that's a multiple of 4.

If we wanted to have the compact representation we were thinking of, we could use repr(packed):

fn main() {
    struct Foo {
        a: u8,
        b: u32,
    }

    #[repr(packed)]
    struct FooPacked {
        a: u8,
        b: u32,
    }

    use std::mem::size_of;
    println!("Foo       = {}", size_of::<Foo>());
    println!("FooPacked = {}", size_of::<FooPacked>());
}

The reason for that is performance. Conventional wisdom says: it's faster to access values that are aligned. So, for a 4-byte value, you'd store it at an address that's a multiple of 4.
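To make the alignment rule concrete, here's a small sketch using `std::mem::align_of` (the struct names are made up, and `repr(C)` is used so the field order is fixed):

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
#[repr(C)]
struct Padded {
    a: u8,
    b: u32,
}

#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
}

fn main() {
    // u32 wants a 4-byte boundary, so 3 padding bytes follow `a`
    println!(
        "Padded: size = {}, align = {}",
        size_of::<Padded>(),
        align_of::<Padded>()
    );
    // packing drops the padding, and the alignment requirement along with it
    println!(
        "Packed: size = {}, align = {}",
        size_of::<Packed>(),
        align_of::<Packed>()
    );
}
```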

Cool bear's hot tip

That's really hard to benchmark correctly.

Also, it doesn't seem all that true for modern x86, and when dealing with larger data sets, padding actually hurts performance, because of caching.

Finally, you need to be aware that "misaligned accesses are okay" is not true for a large number of non-x86 processors, see this answer.

Anyway, different compilers have different ways of laying out structs in memory, and since we're interacting with about 65 million lines of C/C++ code, we use #[repr(C)].

Does this actually make a difference for this struct?

We can check with the memoffset crate.

fn main() {
    // first, let's declare both structs: with Rust repr
    struct IOI_Rust {
        ttl: u8,
        tos: u8,
        flags: u8,
        options_size: u8,
        options_data: u32,
    }

    // and C repr
    #[repr(C)]
    struct IOI_C {
        ttl: u8,
        tos: u8,
        flags: u8,
        options_size: u8,
        options_data: u32,
    }

    use memoffset::span_of;
    use std::mem::size_of;

    // let's make a quick macro, this will make this a lot easier
    macro_rules! print_offset {
        // the macro takes one identifier (the struct's name), then a tuple
        // of identifiers (the field names)
        ($type: ident, ($($field: ident),*)) => {
            // `$type` is an identifier, but we're going to
            // print it out, so we need it as a string instead.
            let t = stringify!($type);

            // this will repeat for each $field
            $(
                let f = stringify!($field);
                let span = span_of!($type, $field);
                println!("{:10} {:15} {:?}", t, f, span);
            )*

            // finally, print the total field size
            let ts = size_of::<$type>();
            println!("{:10} {:15} {}", t, "(total)", ts);
            println!();
        };
    }
    print_offset!(IOI_Rust, (ttl, tos, flags, options_size, options_data));
    print_offset!(IOI_C, (ttl, tos, flags, options_size, options_data));
}

Here's the output from this program:

So it does make a difference in our case. As a diagram now:

So if we hadn't used #[repr(C)], we would have been passing garbage for all the parameters. And that's the scary thing with extern functions. Since we provide our own declarations, and the compiler takes us at our word, we'd better get it right.
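One way to guard against layout mistakes is to assert the offsets we expect. A hedged sketch using `std::ptr::addr_of!` (the `offsets` helper is made up for this example; the expected numbers come from the C declaration, four UCHARs followed by a 32-bit value):

```rust
use std::mem::size_of;
use std::ptr::addr_of;

#[allow(dead_code)]
#[repr(C)]
#[derive(Default)]
struct IpOptionInformation {
    ttl: u8,
    tos: u8,
    flags: u8,
    options_size: u8,
    options_data: u32,
}

/// Hypothetical helper: byte offset of every field, in declaration order.
fn offsets() -> [usize; 5] {
    let v = IpOptionInformation::default();
    let base = addr_of!(v) as usize;
    [
        addr_of!(v.ttl) as usize - base,
        addr_of!(v.tos) as usize - base,
        addr_of!(v.flags) as usize - base,
        addr_of!(v.options_size) as usize - base,
        addr_of!(v.options_data) as usize - base,
    ]
}

fn main() {
    // with repr(C), fields appear in declaration order
    assert_eq!(offsets(), [0, 1, 2, 3, 4]);
    assert_eq!(size_of::<IpOptionInformation>(), 8);
    println!("layout matches the C declaration");
}
```

If someone later drops the `#[repr(C)]` or reorders a field, a check like this has a chance of catching it before Windows does.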

Putting all of this knowledge together, we can tentatively write out the type for IcmpSendEcho as:

type Handle = *const c_void;
type IcmpSendEcho = extern "stdcall" fn(
    handle: Handle,
    dest: IPAddr,
    request_data: *const u8,
    request_size: u16,
    request_options: Option<&IpOptionInformation>,
    reply_buffer: *mut u8,
    reply_size: u32,
    timeout: u32,
) -> u32;

Let's unpack this (ha!). The request_data parameter can contain anything we want when sending ICMP echo messages. Remember, the Windows ping.exe just sends a bunch of letters from the alphabet.
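Since the size travels as a WORD, the payload length has to fit in a u16. A minimal sketch of a checked conversion (the `request_size` helper is hypothetical, nothing in the Win32 API):

```rust
use std::convert::TryFrom;

/// Hypothetical helper: the payload length as a WORD, or None if it doesn't fit.
fn request_size(data: &[u8]) -> Option<u16> {
    u16::try_from(data.len()).ok()
}

fn main() {
    let small = b"hello";
    println!("{:?}", request_size(small));

    // too long for a u16: a plain `as u16` cast would silently wrap instead
    let too_big = vec![0u8; 70_000];
    println!("{:?}", request_size(&too_big));
    println!("as u16 gives {}", too_big.len() as u16);
}
```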

Request options is a pointer to an IpOptionInformation - but it could also be NULL. We could have written that parameter as:

type IcmpSendEcho = extern "stdcall" fn(
    // ...
    request_options: *const IpOptionInformation,

But instead we wrote it as:

type IcmpSendEcho = extern "stdcall" fn(
    // ...
    request_options: Option<&IpOptionInformation>,

Because:

  • Both of those are FFI-safe - they are both just one regular pointer
  • The former (*const X) is a lot more annoying to use from Rust code.
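Here's a sketch of why `Option<&T>` can stand in for a nullable pointer. The `ttl_or_default` function plays the role of a C function but is written in Rust so the example runs anywhere; both it and `raw_address` are made up for this illustration.

```rust
use std::mem::size_of;

/// Stand-in for a C function taking a nullable pointer (hypothetical).
extern "C" fn ttl_or_default(opts: Option<&u8>) -> u8 {
    match opts {
        Some(ttl) => *ttl,
        None => 64,
    }
}

/// Hypothetical helper: the raw address behind an `Option<&u8>`.
fn raw_address(opts: Option<&u8>) -> usize {
    // Option<&T> is pointer-sized, and None is represented as the null pointer
    unsafe { std::mem::transmute::<Option<&u8>, usize>(opts) }
}

fn main() {
    assert_eq!(size_of::<Option<&u8>>(), size_of::<*const u8>());
    println!("None is address {}", raw_address(None));
    println!("ttl = {}", ttl_or_default(Some(&128)));
    println!("ttl = {}", ttl_or_default(None));
}
```

On the Rust side we get to write `Some(&x)` or `None` instead of juggling raw pointers, and the callee still sees a plain pointer.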

Finally, reply_buffer is an output parameter (IcmpSendEcho will write to it), so it needs to be a *mut pointer, not a *const one. We'll get to what's in the reply buffer later, so for now we'll just leave it as raw bytes (hence, *mut u8).

Back to our regularly-scheduled Win32 API calling

It looks like IcmpSendEcho takes a Handle, and after a quick search on MSDN, we find that IcmpCreateFile is the right function for us. Its declaration is a lot simpler:

IPHLPAPI_DLL_LINKAGE HANDLE IcmpCreateFile();

No parameters, great:

type IcmpCreateFile = extern "stdcall" fn() -> Handle;

Alright, it's time to call some functions!

First let's retrieve both their addresses and create an "ICMP file", whatever that means:

fn main() {
    unsafe {
        let h = LoadLibraryA("IPHLPAPI.dll\0".as_ptr());
        let IcmpCreateFile: IcmpCreateFile =
            transmute(GetProcAddress(h, "IcmpCreateFile\0".as_ptr()));
        let IcmpSendEcho: IcmpSendEcho = transmute(GetProcAddress(h, "IcmpSendEcho\0".as_ptr()));

        let handle = IcmpCreateFile();
        println!("handle = {:?}", handle);
    }
}
> cargo run
handle = 0x246c5e0ee30

Looks good! There's a troubling lack of error handling so far, but since we got this far we can be pretty sure that we loaded the right library, and spelled IcmpCreateFile correctly.

We're all set to call IcmpSendEcho:

// in main, in unsafe block:
let handle = IcmpCreateFile();
println!("handle = {:?}", handle);

// let's send some culture down the internet pipes
let data = "O Romeo, Romeo. Reachable art thou Romeo?";

// this will be written to, so it needs to be `mut`.
// I'm picking 128 bytes here because I expect the reply
// to be small
let mut reply = vec![0u8; 128];

let ret = IcmpSendEcho(
    handle,
    IPAddr([8, 8, 8, 8]), // destination
    data.as_ptr(),        // request data
    data.len() as u16,
    Some(&IpOptionInformation {
        ttl: 128, // time to live
        tos: 0,
        flags: 0,
        options_data: 0,
        options_size: 0,
    }),
    reply.as_mut_ptr(), // reply buffer
    reply.len() as u32,
    4000, // timeout (4 seconds)
);
println!("ret = {}", ret);

Did it work?

> cargo run
handle = 0x1bec649ec50
ret = 1

Time for a moment of doubt. I'm pretty sure most Win32 functions return 0 on success, but.. maybe this one is different?

Return Value

The IcmpSendEcho function returns the number of ICMP_ECHO_REPLY structures stored in the ReplyBuffer. The status of each reply is contained in the structure. If the return value is zero, call GetLastError for additional error information.

It is! It is different. Our return value of 1 is actually good news, everyone.
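That convention maps nicely onto a Result. A hedged sketch (the `check_reply_count` wrapper is made up; `io::Error::last_os_error` is the standard library call that reads GetLastError on Windows, and errno elsewhere):

```rust
use std::io;

/// Hypothetical wrapper: turn IcmpSendEcho's return value into a Result.
/// Zero means failure, anything else is the number of replies in the buffer.
fn check_reply_count(ret: u32) -> io::Result<u32> {
    if ret == 0 {
        Err(io::Error::last_os_error())
    } else {
        Ok(ret)
    }
}

fn main() {
    match check_reply_count(1) {
        Ok(n) => println!("{} reply structure(s) in the buffer", n),
        Err(e) => println!("IcmpSendEcho failed: {}", e),
    }
}
```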


So there you have it, we've made our own ping, and as you can see, it works great. Thanks for following the series, next time we'll cover bwahahah sorry I can't type this with a straight face - of course we're not done.

First of all, we haven't examined the reply buffer at all - let's do so, using the pretty-hex crate.

> cargo add pretty-hex
      Adding pretty-hex v0.1.1 to dependencies

Cool bear's hot tip

Note: when this was written, cargo add and cargo rm were not builtins; they were provided by the cargo-edit crate, which I, cool bear, fully endorse. Newer versions of cargo ship cargo add and cargo remove themselves.

Since I don't remember how to use pretty-hex, I'm going to generate and open the docs locally:

> cargo doc --open
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Opening C:\Users\amos\sup\target\doc\sup\index.html

Cool bear's hot tip

This is a cargo built-in, and it's very good.

It works offline, so if you're a TV writer looking for a plot device, you can have a character use this from a fully-isolated basement to get out of a precarious situation.

Oh! That's easy.

// in main, in unsafe block, after `IcmpSendEcho` call:
use pretty_hex::*;
println!("{:?}", reply.hex_dump());

Now that's interesting. Along with some non-text data, we got our request data back!

MSDN docs told us the reply buffer actually contained a series of ICMP_ECHO_REPLY structs, so let's take a look at that declaration:

typedef struct icmp_echo_reply {
  IPAddr                       Address;
  ULONG                        Status;
  ULONG                        RoundTripTime;
  USHORT                       DataSize;
  USHORT                       Reserved;
  PVOID                        Data;
  struct ip_option_information Options;
} ICMP_ECHO_REPLY, *PICMP_ECHO_REPLY;

Heyy, we know almost all of these! We already have IPAddr, and we already have IpOptionInformation. As for ULONG and USHORT, they're just u32 and u16.

Time to get binding:

#[repr(C)]
#[derive(Debug)]
struct IcmpEchoReply {
    address: IPAddr,
    status: u32,
    rtt: u32,
    data_size: u16,
    reserved: u16,
    data: *const u8,
    options: IpOptionInformation,
}

For inspection purposes, we've derived the Debug trait for this struct. Since it contains an IpOptionInformation, we'll need to add #[derive(Debug)] to it as well.

Now, here's one thing we could do. First define IP options separately, to make the IcmpSendEcho call more readable:

let ip_opts = IpOptionInformation {
    ttl: 128,
    tos: 0,
    flags: 0,
    options_data: 0,
    options_size: 0,
};

And then declare a single IcmpEchoReply - but don't initialize it.

// First off, we need to adjust the signature of `IcmpSendEcho` so that it accepts
// a pointer to an IcmpEchoReply, not a u8 slice:
type IcmpSendEcho = extern "stdcall" fn(
    // omitted: other params
    reply_buffer: *mut IcmpEchoReply,
) -> u32;

// Now onto MaybeUninit
use std::mem;
let mut reply: mem::MaybeUninit<IcmpEchoReply> = mem::MaybeUninit::uninit();

let ret = IcmpSendEcho(
    handle,
    IPAddr([8, 8, 8, 8]),
    data.as_ptr(),
    data.len() as u16,
    Some(&ip_opts),
    reply.as_mut_ptr(),
    mem::size_of::<IcmpEchoReply>() as u32,
    4000,
);
if ret == 0 {
    panic!("IcmpSendEcho failed! ret = {}", ret);
}

let reply = reply.assume_init();
println!("{:#?}", reply);

MaybeUninit was stabilized in Rust 1.36 (see the changelog). It lets us tell Rust to allocate space for a value, but to treat it as uninitialized until we call assume_init. It basically leverages the type system to help prevent undefined behavior.
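Here's the pattern in isolation, as a sketch that runs without any Win32 call. `fake_send` stands in for a C function with an output parameter; it, `Reply` and `send` are all made up for this example.

```rust
use std::mem::MaybeUninit;

#[allow(dead_code)]
#[repr(C)]
#[derive(Debug)]
struct Reply {
    status: u32,
    rtt: u32,
}

/// Stand-in for a C function with an output parameter (hypothetical):
/// writes a Reply and returns 1, or returns 0 and leaves `out` untouched.
extern "C" fn fake_send(succeed: bool, out: *mut Reply) -> u32 {
    if !succeed {
        return 0;
    }
    unsafe { out.write(Reply { status: 0, rtt: 12 }) };
    1
}

/// Only treat the value as initialized when the call reports success.
fn send(succeed: bool) -> Option<Reply> {
    let mut reply = MaybeUninit::<Reply>::uninit();
    let ret = fake_send(succeed, reply.as_mut_ptr());
    if ret == 0 {
        None
    } else {
        // sound only because fake_send wrote the whole struct before returning 1
        Some(unsafe { reply.assume_init() })
    }
}

fn main() {
    println!("{:?}", send(true));
    println!("{:?}", send(false));
}
```

Note that `assume_init` is still an unsafe call: the type system reminds us there's a promise to keep, but keeping it is on us.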

Here, we only assume it's initialized if IcmpSendEcho succeeds, which I believe is correct. However, we're out of luck:

...because the "reply" for IcmpSendEcho is weirder. The docs say:

ReplySize

The allocated size, in bytes, of the reply buffer. The buffer should be large enough to hold at least one ICMP_ECHO_REPLY structure plus RequestSize bytes of data.

This buffer should also be large enough to also hold 8 more bytes of data (the size of an ICMP error message).

To recap, this is how IcmpSendEcho stores things in the reply buffer:

Not only did we not reserve the 8 bytes for the ICMP error message, we're also sending some data, so our reply buffer isn't large enough - and that's why it now fails. Note that, in ICMP, the reply data is exactly the data we sent.

To unpack the reply properly, we're going to have to do something slightly more involved. We'll just allocate a vector with enough room, and only later on interpret its contents as either an IcmpEchoReply, an ICMP error, or the reply data.

// Since we changed our mind again, we need to adjust the signature of `IcmpSendEcho`
// *again* so that it accepts once again a pointer to a u8 slice:
type IcmpSendEcho = extern "stdcall" fn(
    // omitted: other params
    reply_buffer: *mut u8,
) -> u32;

// in main, in unsafe block
use std::mem;
let reply_size = mem::size_of::<IcmpEchoReply>();

let reply_buf_size = reply_size + 8 + data.len();
let mut reply_buf = vec![0u8; reply_buf_size];
// note: there's probably a way to use MaybeUninit here / avoid using vec, but
// let's go for something simple.

let ret = IcmpSendEcho(
    handle,
    IPAddr([8, 8, 8, 8]),
    data.as_ptr(),
    data.len() as u16,
    Some(&ip_opts),
    reply_buf.as_mut_ptr(),
    reply_buf_size as u32,
    4000,
);
if ret == 0 {
    panic!("IcmpSendEcho failed! ret = {}", ret);
}

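// reinterpret the start of the buffer as an `IcmpEchoReply`. transmute works,
// but a plain pointer cast (`as *const IcmpEchoReply`) would do as well: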
let reply: &IcmpEchoReply = mem::transmute(&reply_buf[0]);
println!("{:#?}", *reply);

// as it turns out, the "8 bytes for ICMP errors" occur *before* the
// reply data.
let reply_data: *const u8 = mem::transmute(&reply_buf[reply_size + 8]);
// in the previous line, `reply_data` is just a pointer - this turns it
// into a slice.
let reply_data = std::slice::from_raw_parts(reply_data, reply.data_size as usize);

use pretty_hex::*;
println!("{:?}", reply_data.hex_dump());
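The code above relies on the Vec's allocation being suitably aligned for an IcmpEchoReply. As a sketch of an alternative that doesn't need that assumption, here's `std::ptr::read_unaligned` applied to a hand-built buffer. The `Header` struct is a made-up, simplified stand-in (not the real ICMP_ECHO_REPLY layout, and with no 8-byte gap before the data), and `parse` is a hypothetical helper.

```rust
use std::mem::size_of;
use std::ptr;

// Made-up, simplified header; the real ICMP_ECHO_REPLY has more fields.
#[allow(dead_code)]
#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct Header {
    status: u32,
    data_size: u16,
    reserved: u16,
}

/// Split a byte buffer into a Header and the `data_size` bytes that follow it.
/// Returns None if the buffer is too short for either part.
fn parse(buf: &[u8]) -> Option<(Header, &[u8])> {
    if buf.len() < size_of::<Header>() {
        return None;
    }
    // read_unaligned copies the bytes out, so `buf` needs no particular alignment
    let header = unsafe { ptr::read_unaligned(buf.as_ptr() as *const Header) };
    let start = size_of::<Header>();
    let end = start + header.data_size as usize;
    // `get` does the bounds check that from_raw_parts would leave to us
    let data = buf.get(start..end)?;
    Some((header, data))
}

fn main() {
    // hand-built buffer: status = 0, data_size = 3, reserved = 0, then "sup"
    let mut buf = Vec::new();
    buf.extend_from_slice(&0u32.to_ne_bytes());
    buf.extend_from_slice(&3u16.to_ne_bytes());
    buf.extend_from_slice(&0u16.to_ne_bytes());
    buf.extend_from_slice(b"sup");

    let (header, data) = parse(&buf).unwrap();
    println!("{:?} {:?}", header, data);
}
```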

And now, everything works beautifully:

Here's our complete program so far:

use pretty_hex::*;
use std::{
    ffi::c_void,
    fmt,
    mem::{size_of, transmute},
    slice,
};

type HModule = *const c_void;
type FarProc = *const c_void;

extern "stdcall" {
    fn LoadLibraryA(name: *const u8) -> HModule;
    fn GetProcAddress(module: HModule, name: *const u8) -> FarProc;
}

struct IPAddr([u8; 4]);

impl fmt::Debug for IPAddr {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let [a, b, c, d] = self.0;
        write!(f, "{}.{}.{}.{}", a, b, c, d)
    }
}

#[repr(C)]
#[derive(Debug)]
struct IpOptionInformation {
    ttl: u8,
    tos: u8,
    flags: u8,
    options_size: u8,
    options_data: u32,
}

type Handle = *const c_void;

#[repr(C)]
#[derive(Debug)]
struct IcmpEchoReply {
    address: IPAddr,
    status: u32,
    rtt: u32,
    data_size: u16,
    reserved: u16,
    data: *const u8,
    options: IpOptionInformation,
}

type IcmpSendEcho = extern "stdcall" fn(
    handle: Handle,
    dest: IPAddr,
    request_data: *const u8,
    request_size: u16,
    request_options: Option<&IpOptionInformation>,
    reply_buffer: *mut u8,
    reply_size: u32,
    timeout: u32,
) -> u32;
type IcmpCreateFile = extern "stdcall" fn() -> Handle;

fn main() {
    #[allow(non_snake_case)]
    unsafe {
        let h = LoadLibraryA("IPHLPAPI.dll\0".as_ptr());
        let IcmpCreateFile: IcmpCreateFile =
            transmute(GetProcAddress(h, "IcmpCreateFile\0".as_ptr()));
        let IcmpSendEcho: IcmpSendEcho = transmute(GetProcAddress(h, "IcmpSendEcho\0".as_ptr()));

        let handle = IcmpCreateFile();

        let data = "O Romeo, Romeo. Reachable art thou Romeo?";
        let ip_opts = IpOptionInformation {
            ttl: 128,
            tos: 0,
            flags: 0,
            options_data: 0,
            options_size: 0,
        };

        let reply_size = size_of::<IcmpEchoReply>();
        let reply_buf_size = reply_size + 8 + data.len();
        let mut reply_buf = vec![0u8; reply_buf_size];

        let ret = IcmpSendEcho(
            handle,
            IPAddr([8, 8, 8, 8]),
            data.as_ptr(),
            data.len() as u16,
            Some(&ip_opts),
            reply_buf.as_mut_ptr(),
            reply_buf_size as u32,
            4000,
        );
        if ret == 0 {
            panic!("IcmpSendEcho failed! ret = {}", ret);
        }

        let reply: &IcmpEchoReply = transmute(&reply_buf[0]);
        println!("{:#?}", *reply);

        let reply_data: *const u8 = transmute(&reply_buf[reply_size + 8]);
        let reply_data = slice::from_raw_parts(reply_data, reply.data_size as usize);
        println!("{:?}", reply_data.hex_dump());
    }
}

In the next part, we'll refactor our codebase and add some more features!

What did we learn?

Newtypes allow us to provide our own implementation of traits - in this article, we wrapped [u8; 4] (an IPv4 address as represented in the Win32 API) in an IPAddr newtype and gave it a custom Debug implementation.

When it comes to FFI (foreign function interface), struct layout matters. Rust's default representation is unspecified and can differ from C's, so we ask for C's layout with #[repr(C)]; we can also opt into packing with #[repr(packed)]. Both are spelled with the repr attribute, used directly above a struct declaration.

cargo doc allows generating and reading the documentation of third-party crates, even offline. It generates the documentation for all dependencies of the current project.

MaybeUninit allows us to deal with uninitialized data without causing undefined behavior, as long as we only call assume_init once the value really has been written. The type system keeps us from using the value before then.

Option<&T> can be used instead of *const T when passing parameters from Rust to a C function, for ease of use.

Rust slices can be made from a raw pointer + a length, using std::slice::from_raw_parts.
