Rust generics vs Java generics
In my previous article, I said I needed to stop thinking of Rust generics as Java generics, because in Rust, generic types are erased.
Someone gently pointed out that they are also erased in Java; the difference lies elsewhere. So, let's learn the difference together.
Java generics
I learned Java first (a long, long time ago), and their approach to generics made sense to me at the time.
Here's how Java sees generics: every instance of a class is an object, and every object inherits from Object.
If you have an array of Objects, for example, you can store anything in it, but that's not very useful:
```java
class Container {
    public Object[] items;

    public Container(Object[] items) {
        this.items = items;
    }
}

class Dog {
    public Dog() {}

    public void bark() {}
}

class Main {
    public static void main(String[] args) {
        Dog d = new Dog();
        Container c = new Container(new Dog[] {d, d, d});
        c.items[0].bark();
    }
}
```
This will fail on the last line of main with:
```
error: cannot find symbol
        c.items[0].bark();
                  ^
  symbol:   method bark()
  location: class Object
```
This is because, as far as Java is concerned, Container contains instances of Object, not Dog. If we cast it explicitly to Dog, then it'll compile:
```java
((Dog) c.items[0]).bark();
```
But then if you put a Cat in your container, that'll fail at runtime.
```java
class Dog {
    public Dog() {}

    public void bark() {}
}

class Cat {
    public Cat() {}
}

class Main {
    public static void main(String[] args) {
        Container c = new Container(new Object[] {new Cat(), new Dog()});
        ((Dog) c.items[0]).bark();
    }
}
```
Gets you:
```
Exception in thread "main" java.lang.ClassCastException: Cat cannot be cast to Dog
        at Main.main(Main.java:8)
```
So generics in Java add safety to the equation by not letting you do certain things. If we make our container generic:
```java
import java.util.ArrayList;

class Container<T> {
    public ArrayList<T> items;

    public Container() {
        this.items = new ArrayList<T>();
    }
}

class Main {
    public static void main(String[] args) {
        Dog d = new Dog();
        Container<Dog> c = new Container<>();
        c.items.add(new Dog());
        // this line has a compile error
        c.items.add(new Cat());
    }
}
```
It won't let us add a Cat to a Container<Dog>. But it's a very thin safety layer.

At runtime, there's still only one Container type. And it still only contains objects.
In fact, if you want to use primitive types (which are not objects) with your generic type, they need to be boxed.
```java
class Main {
    public static void main(String[] args) {
        Container<Byte> c = new Container<>();
        c.items.add((byte) 1);
        c.items.add((byte) 2);
        c.items.add((byte) 3);
    }
}
```
This compiles and runs fine, and it looks like we're adding bytes directly to our container, but do not be fooled! Byte is a class, and 1, 2, and 3 are automatically boxed. The code is equivalent to:
```java
c.items.add(new Byte((byte) 1));
c.items.add(new Byte((byte) 2));
c.items.add(new Byte((byte) 3));
```
(Ignoring some performance optimizations related to autoboxing.)
I wanted to show off object sizes but it's non-trivial to do in Java (you have to set up a manifest to be able to use the instrumentation package, and even then it's just an estimate of object sizes), so I won't.
But the mental model is this: there is only one Container type, and no matter what T is, it must extend Object, and the actual Container object only stores references to those objects. A Container of three bytes takes no less space in memory than a Container of three dogs.
Note that, in Java, just like in Rust, generic type parameters are erased. That means that once a Collection has been constructed, if it's passed somewhere else without type information, it's impossible to find out what T was.
```java
<T> void foobar(Collection<T> coll) {
    // there is *no way* to know about T at runtime.
    // the compiler only wants to enforce that we use
    // coll correctly, but we cannot, for example, create
    // an instance of T. T's class is not stored anywhere
    // inside Collection.
    // At runtime, Collection<T> is Collection<Object>,
    // no matter what T is. It'll always take the same space,
    // and always use the same code.
}
```
Rust generics
In Rust, types are also erased.
```rust
struct Container<T> {
    t: T,
}

fn foobar<T>(c: Container<T>) {
    // there is no way to know what T is at runtime.
    // we cannot match on T. we cannot have different
    // codepaths for different T. The difference must
    // come from the outside.
}

fn main() {
    let a = Container { t: 42 };
}
```
However, generics are also reified. A Container<u8> is not the same type as a Container<u32>. It has a different size, it implements the same set of methods but with different code, etc.
> Definition of reify (verb): to consider or represent (something abstract) as a material or concrete thing; to give definite content and form to (a concept or idea)
>
> Source: Merriam-Webster
Quick vocabulary aside: in the Rust community, the term "monomorphization" is used rather than "reification". They both refer to the same thing here, although the former is slightly more specific: the compiler is generating code for a single (mono) form, from polymorphic (multiple forms) code.
Also, in Rust, we can easily measure the size of things. Let's try it out:
```rust
struct Container<T> {
    items: [T; 3],
}

fn main() {
    use std::mem::size_of_val;

    let cu8 = Container {
        items: [1u8, 2u8, 3u8],
    };
    println!("size of cu8 = {} bytes", size_of_val(&cu8));

    let cu32 = Container {
        items: [1u32, 2u32, 3u32],
    };
    println!("size of cu32 = {} bytes", size_of_val(&cu32));
}
```
Gives the output:
```
size of cu8 = 3 bytes
size of cu32 = 12 bytes
```
Everything behaves as if we had written:
```rust
struct ContainerU8 {
    items: [u8; 3],
}

struct ContainerU32 {
    items: [u32; 3],
}

fn main() {
    let cu8 = ContainerU8 { items: [1, 2, 3] };
    let cu32 = ContainerU32 { items: [1, 2, 3] };
    // etc.
}
```
...except the compiler generates the different instantiations of Container for us.
The model for Rust is this:

(diagram: reification)
One negative aspect of reification is that it tends to increase binary size. If our Container above has lots of methods, for example, and we use it with a lot of different Ts, we'll end up generating a lot of variants. This gets worse when combining multiple generic types.
It's important to remember that any local variable you declare in a function has to be sized, so that it can be properly allocated on the stack.
If we declare a Container<u32>, 12 bytes are reserved on the stack. If we declare a Box<Container<u32>>, the container itself is allocated on the heap (an entirely different style of memory management), and only 4 or 8 bytes (on 32-bit and 64-bit platforms, respectively) are reserved on the stack, just enough to point to the heap-allocated container.
In fact, let's try to measure the size of iterators.
```rust
fn main() {
    use std::mem::size_of_val;

    let v1 = vec![1, 2, 3];
    let v2 = vec![4, 5, 6];

    {
        let simple = v1.iter();
        println!("size of simple = {} bytes", size_of_val(&simple));
    }

    {
        let chained = v1.iter().chain(v2.iter());
        println!("size of chained = {} bytes", size_of_val(&chained));
    }

    {
        let vv = vec![v1, v2];
        let flattened = vv.iter().flatten();
        println!("size of flattened = {} bytes", size_of_val(&flattened))
    }
}
```
Gets us the output:
```
size of simple = 16 bytes
size of chained = 40 bytes
size of flattened = 64 bytes
```
simple is a std::slice::Iter<'_, i32>. chained is a std::iter::Chain<std::slice::Iter<'_, i32>, std::slice::Iter<'_, i32>>, etc. They are all generic structs that implement the Iterator trait. At runtime, they're concrete types. They're all allocated on the stack, taking up a specific size:
If we box all our iterators:
```rust
{
    let simple = Box::new(v1.iter());
    println!("size of boxed simple = {} bytes", size_of_val(&simple));
}

{
    let chained = Box::new(v1.iter().chain(v2.iter()));
    println!("size of boxed chained = {} bytes", size_of_val(&chained));
}

{
    let vv = vec![v1, v2];
    let flattened = Box::new(vv.iter().flatten());
    println!(
        "size of boxed flattened = {} bytes",
        size_of_val(&flattened)
    );
}
```
We get:
```
size of boxed simple = 8 bytes
size of boxed chained = 8 bytes
size of boxed flattened = 8 bytes
```
I'm writing this from a 64-bit Linux machine, and 8 bytes is indeed 64 bits. It checks out.
Where did our iterators go? On the heap.
Can we still measure their size? We can, because Box implements Deref, so we can pass the contents of the box to size_of_val.
```rust
{
    let simple = Box::new(v1.iter());
    println!("~~ simple ~~");
    println!("box = {} bytes", size_of_val(&simple));
    println!("contents = {} bytes", size_of_val(&*simple));
}
// etc.
```
Prints:
```
~~ simple ~~
box = 8 bytes
contents = 16 bytes
~~ chained ~~
box = 8 bytes
contents = 40 bytes
~~ flattened ~~
box = 8 bytes
contents = 64 bytes
```
I said iterators went on the heap, but I'm just trusting Box's documentation at this point. Is there any other way to find out? Sure. Let's start printing addresses.
```rust
fn print_addr<T>(name: &str, reference: &T) {
    println!("addr of {} = {:#?}", name, reference as *const _);
}

fn main() {
    let v1 = vec![1, 2, 3];
    print_addr("v1      ", &v1);
    let v2 = vec![4, 5, 6];
    print_addr("v2      ", &v2);

    {
        let simple = Box::new(v1.iter());
        println!("~~ simple ~~");
        print_addr("box     ", &simple);
        print_addr("contents", &*simple);
    }
    // etc.
}
```
We get the following output:
```
addr of v1       = 0x00007ffff436d070
addr of v2       = 0x00007ffff436d088
~~ simple ~~
addr of box      = 0x00007ffc2bd60120
addr of contents = 0x0000560aca0dea80
~~ chained ~~
addr of box      = 0x00007ffc2bd60158
addr of contents = 0x0000560aca0debe0
~~ flattened ~~
addr of box      = 0x00007ffc2bd60208
addr of contents = 0x0000560aca0dec50
```
Looking at those addresses, it's pretty clear that all values allocated on the stack (such as our two Vecs) are around 0x00007ff........., whereas all values allocated on the heap are in the neighborhood of 0x0000560..........
Our memory now looks something like this:
Thinking back to our recursive iterators, it all makes sense. The size of our iterator depends on the contents of our actual data structure. There's no way to figure out that size at compile time. However, we can easily reserve a pointer's worth of memory, and then have it point to a concrete type that's as large as we need it to be.
Inlining
Reification also has upsides. Knowing the fully instantiated type ahead of time generates a lot of opportunities for optimizing code.
Let's take this small program:
```rust
fn eq<T>(a: T, b: T) -> bool
where
    T: PartialEq,
{
    a == b
}

pub fn main() {
    let mut iter = std::env::args();
    let (a, b) = (iter.next().unwrap(), iter.next().unwrap());
    compare(&a, &b);
}

fn compare(a: &str, b: &str) {
    eq(a, b);
    eq(a.len(), b.len());
}
```
If we compile it on Compiler Explorer without optimizations, we can see that eq is called twice. We also see that it calls len() to compute the strings' lengths (on lines 1387 and 1391).
Now what happens when we turn on optimizations?
Note: I had to tune code a little to avoid dead code elimination
No calls here! eq has completely disappeared, and so have the len calls.
My assembly is rusty, so I'm not able to tell what every line does, but we can clearly see that, for string equality testing, memcmp is called directly (line 269). I'm not sure which je or jne is doing length testing exactly; that seems like a good candidate for a future post.

The important thing here is: the compiler had so much knowledge of the caller (compare) and the callee (eq) that it was able to inline everything away. It would've inlined compare too, if I hadn't used #[inline(never)] on it.
Conclusion
So, gather round folks, what did we learn?
- It doesn't take much for Amos to write a long follow-up post
- Both Java and Rust generics do type erasure, but...
- In Java, all generic type parameters must be objects (primitives get boxed)
- Whereas in Rust, they can be any (sized) type
- In Rust, generic types and functions are reified, which means specific versions of them are generated...
- ...but if it can, the optimizer will inline everything and throw most of it away anyway, so you're not really paying for it.
If you liked this article, you can watch me try really hard not to write too much unsafe Rust on twitter at @fasterthanlime, and if you'd like me to write about something in particular, you can poke me there.
Acknowledgements
Thanks to @the_walterpi on Twitter for pointing out that you don't need std::mem::transmute to get a raw pointer, and that the Rust community prefers the term "monomorphization" over "reification".