In my previous article, I said I needed to stop thinking of Rust generics as Java generics, because in Rust, generic types are erased.

Someone gently pointed out that they are also erased in Java, the difference was elsewhere. And so, let's learn the difference together.

Java generics

I learned Java first (a long, long time ago), and their approach to generics made sense to me at the time.

Here's how Java sees generics: Every instance of a class is an object, and every object inherits from Object.

You can store anything in a collection if you have an array of Objects, for example, but that's not very useful:

Java code
class Container {
    public Object[] items;

    public Container(Object[] items) {
        this.items = items;
    }
}

class Dog {
    public Dog() {}
    public void bark() {}
}

class Main {
    public static void main(String[] args) {
        Dog d = new Dog();
        Container c = new Container(new Dog[] {d, d, d});
        c.items[0].bark();
    }
}

This will fail on the last line of main with:

error: cannot find symbol
        c.items[0].bark();
                  ^
  symbol:   method bark()
  location: class Object

This is because, as far as Java is concerned, Container contains instances of Object, not Dog. If we cast it explicitly to Dog, then it'll compile:

Java code
((Dog)c.items[0]).bark();

But then if you put a Cat in your container, that'll fail at runtime.

Java code
class Dog {
    public Dog() {}
    public void bark() {}
}

class Cat {
    public Cat() {}
}

class Main {
    public static void main(String[] args) {
        Container c = new Container(new Object[] {new Cat(), new Dog()});
        ((Dog)c.items[0]).bark();
    }
}

Gets you:

Exception in thread "main" java.lang.ClassCastException: Cat cannot be cast to Dog
    at Main.main(Main.java:8)

So generics in Java add a safety to the equation by not letting you do certain things. If we make our container generic,

Java code
import java.util.ArrayList;

class Container<T> {
    public ArrayList<T> items;

    public Container() {
        this.items = new ArrayList<T>();
    }
}

class Main {
    public static void main(String[] args) {
        Dog d = new Dog();
        Container<Dog> c = new Container();
        c.items.add(new Dog());
        // this line has a compile error
        c.items.add(new Cat());
    }
}

It won't let us add a Cat to a Container<Dog>. But it's a very thin safety layer. At runtime, there's still only one Container type. And it still only contains objects.

In fact, if you want to use primitive types (that are not objects) with your generic type, they need to be boxed.

Java code
class Main {
    public static void main(String[] args) {
        Container<Byte> c = new Container();
        c.items.add((byte) 1);
        c.items.add((byte) 2);
        c.items.add((byte) 3);
    }
}

This compiles and runs fine, and it looks like we're adding ints directly to our container, but do not be fooled! Integer is a class, and 1, 2, and 3 are automatically boxed. The code is equivalent to:

Java code
c.items.add(new Byte((byte) 1));
c.items.add(new Byte((byte) 2));
c.items.add(new Byte((byte) 3));

(Ignoring some performance optimizations relative to auto boxing).

I wanted to show off object sizes but it's non-trivial to do in Java (you have to set up a manifest to be able to use the instrumentation package, and even then it's just an estimate of object sizes), so I won't.

But the mental model is this: there is only one Container type, and no matter what T is, it must extend Object, and the actual Container object only stores references to those objects. A Container of three bytes takes no less space in memory than a container of three dogs.

Note that, in Java, just like in Rust, generic type parameters are erased. That means that once a Collection has been constructed, if it's passed somewhere else without type information, it's impossible to find out what T was.

Java code
void foobar<T>(coll: Collection<T>) {
    // there is *no way* to know about T at runtime.
    // the compiler only wants to enforce that we use
    // coll correctly, but we cannot, for example, create
    // an instance of T. T's class is not stored anywhere
    // inside Collection.
    // At runtime, Collection<T> is Collection<Object>,
    // no matter what T is. It'll always take the same space,
    // and always use the same code.
}

Rust generics

In Rust, types are also erased.

Rust code
struct Container<T> {
    t: T,
}

fn foobar<T>(c: Container<T>) {
    // there is no way to know what T is at runtime.
    // we cannot match T. we cannot have different
    // codepaths for different T. The difference must
    // come from the outside.
}

fn main() {
    let a = Container { t: 42 };
}

However, generics are also reified. A Container<u8> is not the same type as a Container<u32>. It has a different size, it implements the same set of methods but with different code, etc.

Definition of reify (verb):

to consider or represent (something abstract) as a material or concrete thing : to give definite content and form to (a concept or idea)

Source: Merriam-Webster

Quick vocabulary aside: in the Rust community, the term "monomorphization" is used rather than "reification". They both refer to the same thing here, although the former is slightly more specific: the compiler is generating code for a single (mono) form, from polymorphic (multiple forms) code.

Also, in Rust, we can easily measure the size of things. Let's try it out:

Rust code
struct Container<T> {
    items: [T; 3],
}

fn main() {
    use std::mem::size_of_val;

    let cu8 = Container {
        items: [1u8, 2u8, 3u8],
    };
    println!("size of cu8 = {} bytes", size_of_val(&cu8));

    let cu32 = Container {
        items: [1u32, 2u32, 3u32],
    };
    println!("size of cu32 = {} bytes", size_of_val(&cu32));
}

Gives the output:

size of cu8 = 3 bytes
size of cu32 = 12 bytes

Everything behaves as if we had written:

Rust code
struct ContainerU8 {
    items: [u8; 3],
}

struct ContainerU32 {
    items: [u32; 3],
}

fn main() {
    let cu8 = ContainerU8 {
        items: [1, 2, 3],
    };

    let cu32 = ContainerU32 {
        items: [1, 2, 3],
    };

    // etc.
}

...except the compiler generates the different instantiations of Container for us.

The model for Rust is this:

Reification

One negative aspect of reification is it tends to increase binary size.

If our Container above has lots of methods, for example, and we use it with a lot of different Ts, we'll end up generating a lot of variants. This gets worse when combining multiple generic types.

It's important to remember that any local you declare in a function has to be sized, so that it can be properly allocated on the stack.

If we declare a Container<u32>, 12 bytes are reserved on the stack. If we declare a Box<Container<u32>, the container itself is allocated in the heap (an entirely different style of memory management), and only 4 or 8 bytes (for 32-bit and 64-bit) are reserved on the stack, just enough to point to the heap-allocated container.

In fact, let's try to measure the size of allocators.

Rust code
fn main() {
    use std::mem::size_of_val;

    let v1 = vec![1, 2, 3];
    let v2 = vec![4, 5, 6];

    {
        let simple = v1.iter();
        println!("size of simple = {} bytes", size_of_val(&simple));
    }

    {
        let chained = v1.iter().chain(v2.iter());
        println!("size of chained = {} bytes", size_of_val(&chained));
    }

    {
        let vv = vec![v1, v2];
        let flattened = vv.iter().flatten();
        println!("size of flattened = {} bytes", size_of_val(&flattened))
    }
}

Gets us the output:

size of simple = 16 bytes
size of chained = 40 bytes
size of flattened = 64 bytes

simple is a std::slice::Iter<'_, i32>. chained is a std::iter::Chain<std::slice::Iter<'_, i32>, std::slice::Iter<'_, i32>>, etc. They are all generic structs, that implement the Iterator trait. At runtime, they're concrete types. They're all allocated on the stack, taking up a specific size:

If we box all our containers:

Rust code
{
    let simple = Box::new(v1.iter());
    println!("size of boxed simple = {} bytes", size_of_val(&simple));
}

{
    let chained = Box::new(v1.iter().chain(v2.iter()));
    println!("size of boxed chained = {} bytes", size_of_val(&chained));
}

{
    let vv = vec![v1, v2];
    let flattened = Box::new(vv.iter().flatten());
    println!(
        "size of boxed flattened = {} bytes",
        size_of_val(&flattened)
    );
}

We get:

size of boxed simple = 8 bytes
size of boxed chained = 8 bytes
size of boxed flattened = 8 bytes

I'm writing this from a 64-bit Linux machine, and 8 bytes is indeed 64 bits. It checks out.

Where did our iterators go? On the heap.

Can we still measure their size? We can, because Box implements Deref, so we can pass the contents of the box to size_of_val.

Rust code
{
    let simple = Box::new(v1.iter());
    println!("~~ simple ~~");
    println!("box      = {} bytes", size_of_val(&simple));
    println!("contents = {} bytes", size_of_val(&*simple));
}

// etc.

Prints:

~~ simple ~~
box      = 8 bytes
contents = 16 bytes
~~ chained ~~
box      = 8 bytes
contents = 40 bytes
~~ chained ~~
box      = 8 bytes
contents = 64 bytes

I said iterators went on the heap, but I'm just trusting Box's documentation at this point. Is there any other way to find out? Sure. Let's start printing addresses.

Rust code
fn print_addr<T>(name: &str, reference: &T) {
    println!("addr of {} = {:#?}", name, reference as *const _);
}

fn main() {
    use std::mem::size_of_val;

    let v1 = vec![1, 2, 3];
    print_addr("v1      ", &v1);
    let v2 = vec![4, 5, 6];
    print_addr("v2      ", &v2);

    {
        let simple = Box::new(v1.iter());
        println!("~~ simple ~~");
        print_addr("box     ", &simple);
        print_addr("contents", &*simple);
    }

    // etc.
}

We get the following output:

addr of v1       = 0x00007ffff436d070
addr of v2       = 0x00007ffff436d088
~~ simple ~~
addr of box      = 0x00007ffc2bd60120
addr of contents = 0x0000560aca0dea80
~~ chained ~~
addr of box      = 0x00007ffc2bd60158
addr of contents = 0x0000560aca0debe0
~~ chained ~~
addr of box      = 0x00007ffc2bd60208
addr of contents = 0x0000560aca0dec50

Looking at those addresses, it's pretty clear that all values allocated on the stack (such as our two Vecs) are around 0x00007ff........., whereas all values allocated on the heap are in the neighborhood of 0x0000560..........

Our memory now looks something like this:

Thinking back to our recursive iterators, it all makes sense. The size of our iterator depends on the contents of our actual data structure. There's no way to figure out that size at compile time. However, we can easily reserve a pointer's worth of memory, and then have it point to a concrete type that's as large as we need it to be.

Inlining

Reification also has upsides. Knowing the fully instantiated type ahead of time generates a lot of opportunities for optimizing code.

Let's take this small program:

Rust code
fn eq<T>(a: T, b: T) -> bool
where T: PartialEq,
{
    a == b
}

pub fn main() {
    let mut iter = std::env::args();
    let (a, b) = (iter.next().unwrap(), iter.next().unwrap());
    compare(&a, &b);
}

fn compare(a: &str, b: &str) {
  eq(a, b);
  eq(a.len(), b.len());
}

If we compile it on Compiler Explorer without optimizations, we can see that eq is called twice.

We also see that it calls len() to compute the string's lengths (on lines 1387 and 1391).

Now what happens when we turn on optimizations?

Note: I had to tune code a little to avoid dead code elimination

No calls here! eq has completely disappeared, and so have len calls. My assembly is rusty so I'm not able to tell what every line does, but we can clearly see that, for string equality testing, memcmp is called directly (line 269). I'm not sure which je or jne is doing length testing exactly, that seems like a good candidate for a future post.

The important thing here is: the compiler had so much knowledge of the caller (compare) and the callee (eq), that it was able to inline everything away. It would've inlined compare too, if I hadn't used #[inline(never)] on it.

Conclusion

So, gather round folks, what did we learn?

If you liked this article, you can watch me try really hard not to write too much unsafe Rust on twitter at @fasterthanlime, and if you'd like me to write about something in particular, you can poke me there.

Acknowledgements

Thanks to @the_walterpi on Twitter for pointing out that you don't need std::mem::transmute to get a raw pointer, and that the Rust community prefers the term "monomorphization" over "reification".