I've been looking for this blog post everywhere, but it doesn't exist, so I guess it's my turn to write about Some Fun with Rust.

The task at hand

Let's say you have a recursive, acyclic data structure, like so:

Rust code
struct Node { values: Vec<i32>, children: Vec<Node>, }

This allows you to represent a tree-like structure:

[1, 2, 3] /\ / \ / \ / \ / \ [4, 5] [6, 7]

Now let's say you want to iterate over the values of the root node and all its children, recursively, so that you get the sequence [1, 2, 3, 4, 5, 6, 7].

A naive implementation

This is what I originally came up with:

Rust code
impl Node { fn values(&self) -> impl Iterator<Item = &i32> { self.values .iter() .chain(self.children.iter().map(|n| n.values()).flatten()) } }

Let's unpack that. So first, we generate an iterator over our own values [1, 2, 3]. Then, we take an iterator over our children, and map it to an iterator of iterators over all of their values.

You can think of the result of map() as an iterator over [[4, 5], [6, 7]]. Then we flatten() it so that it becomes an iterator over [4, 5, 6, 7],

And finally, we chain both iterators together, so that we get an iterator over [1, 2, 3, 4, 5, 6, 7]. Sounds good, right?

Except it doesn't compile. (Not with Rust 1.34.1, at least)

Here's the error message we get:

error[E0720]: opaque type expands to a recursive type --> src/main.rs:8:25 | 8 | fn values(&self) -> impl Iterator<Item = &i32> { | ^^^^^^^^^^^^^^^^^^^^^^^^^^ expands to self-referential type | = note: expanded type is `std::iter::Chain<std::slice::Iter<'_, i32>, std::iter::Flatten<std::iter::Map<std::slice::Iter<'_, Node>, [closure@src/main.rs:11:45: 11:59]>>>`

If we clean up the expanded type a little, we have:

T = Chain( Slice, Flatten( Map(Slice, T) ) )

Which, well, the compiler isn't wrong. The structure could be arbitrarily big at runtime, yet it wants to determine its size at compile time.

This is because - if I'm understanding it correctly - you can't return a trait:

Rust code
// does not compile fn foo() -> fmt::Display { "hi" }

But you can return a concrete type that implements a trait:

Rust code
// should be fine! fn foo() -> impl fmt::Display { "hi" }

But it still needs to know the size of that concrete type at compile time, because when you later do

Rust code
fn main() { let s = foo(); println!("{}", s); }

s is allocated on the stack, so we need to know its size before calling into foo.

So, our impl trait-returning foo behaves sorta like a generic version:

Rust code
// does not compile fn foo<T>() -> T where T: fmt::Display, { "hi" }

...except that version does not compile either because although "hi" (a &'static str) satisfies the constraints for T, it isn't T. In fact, nothing would prevent us from calling foo::<i32>().

In summary: returning an impl trait allows us to have a generic function, except with a return type parameter, that's inferred from the function's body rather than its arguments. Also, it allows us to hide the real type of what it is we return. The caller will only ever see the trait we want them to see, even though after compilation it will allocate the exact right amount for the concrete type.

Phew, hope that makes sense.

A bad solution

So, we can't return a self-referential type, but we can.. build our own iterator.

Rust code
// does not compile struct NodeIter<'a> { viter: Iterator<Item = &'a i32>, citer: Iterator<Item = &'a Node>, }

The plan is to first yield values out of viter (for val iterator), and when we get a None, to get the next value from citer, and replace viter with an iterator over it.

But wait, that's not a valid struct. Our fields can't be traits, just like you can't just return traits from functions. We need to put both of them in a box.

Rust code
struct NodeIter<'a> { viter: Box<Iterator<Item = &'a i32>>, citer: Box<Iterator<Item = &'a Node>>, } impl<'a> Iterator for NodeIter<'a> { type Item = &'a i32; fn next(&mut self) -> Option<Self::Item> { unimplemented!() } }

So, now our values method returns a concrete type:

Rust code
impl Node { fn values<'a>(&'a self) -> NodeIter<'a> { NodeIter { // does not compile viter: Box::new(self.values.iter()), citer: Box::new(self.children.iter()), } } }

But we're already running into other compile errors

error[E0495]: cannot infer an appropriate lifetime for lifetime parameter in function call due to conflicting requirements --> src/main.rs:69:45 | 69 | viter: Box::new(self.values.iter()), | ^^^^ | note: first, the lifetime cannot outlive the lifetime 'a as defined on the method body at 67:23... --> src/main.rs:67:23 | 67 | pub fn values<'a>(&'a self) -> NodeIter<'a> { | ^^ note: ...so that reference does not outlive borrowed content --> src/main.rs:69:33 | 69 | viter: Box::new(self.values.iter()), | ^^^^^^^^^^^ = note: but, the lifetime must be valid for the static lifetime... = note: ...so that the expression is assignable: expected std::boxed::Box<(dyn std::iter::Iterator<Item=&i32> + 'static)> found std::boxed::Box<dyn std::iter::Iterator<Item=&i32>>

It took me a while to figure this one out, but, apparently, there is a way to specify the lifetime of the contents of a box. Lifetimes are part of generic types and functions' signatures, so you can just add a constraint in there:

Rust code
// before struct NodeIter<'a> { viter: Box<Iterator<Item = &'a i32>>, citer: Box<Iterator<Item = &'a Node>>, } // after struct NodeIter<'a> { viter: Box<Iterator<Item = &'a i32> + 'a>, citer: Box<Iterator<Item = &'a Node> + 'a>, }

Which says: not only should our Iterator return references that are valid for 'a , the iterator itself should also live at least as long as 'a. I'm not sure why that distinction has to be made explicitly right now, but I at least now the compiler agrees with us.

Implementing the iterator itself is annoying, but it does work that way:

Rust code
impl<'a> Iterator for NodeIter<'a> { type Item = &'a i32; fn next(&mut self) -> Option<Self::Item> { // if we still have some values of our own to yield... if let Some(val) = self.viter.next() { // then yield them Some(val) } else { // if we're out of values, but we still have children to walk... if let Some(child) = self.citer.next() { // then yield all their values. self.viter = Box::new(child.values()); // call next() again, hopefully immediately yielding the // branch's next value, but if not, we'll just keep recursing // until we find a non-empty branch or run out of nodes. self.next() } else { None } } } }

If we try to use it in an example:

Rust code
fn main() { let n = Node { values: vec![1, 2, 3], children: vec![ Node { values: vec![4, 5], children: vec![], }, Node { values: vec![6, 7], children: vec![], }, ], }; let v: Vec<_> = n.values().collect(); println!("{:?}", v); }

We get the expected output: [1, 2, 3, 4, 5, 6, 7].

Feel free to mess around with that version on the playground. Try removing some stuff you think is unnecessary, see which compiler error messages you get.

A better solution

So let's recap. Our original approach didn't work because rust wanted to be able to allocate the whole iterator on the stack, and it needed to know its exact size.

Can't we just return a boxed iterator? Let's try it out:

Rust code
impl Node { pub fn values<'a>(&'a self) -> Box<Iterator<Item = &i32>> { Box::new( self.values .iter() .chain(self.children.iter().map(|n| n.values()).flatten()), ) } }

At first, it doesn't appear to work

error[E0495]: cannot infer an appropriate lifetime for lifetime parameter in function call due to conflicting requirements --> src/main.rs:33:22 | 33 | .iter() | ^^^^ | note: first, the lifetime cannot outlive the lifetime 'a as defined on the method body at 30:23... --> src/main.rs:30:23 | 30 | pub fn values<'a>(&'a self) -> Box<Iterator<Item = &i32>> { | ^^ note: ...so that reference does not outlive borrowed content --> src/main.rs:32:17 | 32 | self.children | ^^^^^^^^^^^^^ = note: but, the lifetime must be valid for the static lifetime... = note: ...so that the expression is assignable: expected std::boxed::Box<(dyn std::iter::Iterator<Item=&'a i32> + 'static)> found std::boxed::Box<dyn std::iter::Iterator<Item=&i32>>

...but wait, that error seems familiar - we encountered it when implementing our custom iterator. Don't we just need to add a lifetime constraint to our box?

Rust code
// before: pub fn values<'a>(&'a self) -> Box<Iterator<Item = &i32>> { // ... } // after: pub fn values<'a>(&'a self) -> Box<Iterator<Item = &i32> + 'a> { // ... }

Sure enough, this version compiles and runs fine!

You can play with the final code on the Rust playground.

Conclusion

Gather round team: what did we learn?

If you liked this article, you can watch me struggle my way through some Rust in near-realtime on twitter at @fasterthanlime, and if you'd like me to write about something in particular, you can poke me there.

I've written a follow-up to this article: Rust vs Java generics

Acknowledgements

Thanks to @chriskrycho on Twitter for pointing out that both Java and Rust erase generics.

I'm looking for my next job!

August 2020 will be my last month at my current job, and I'm excited to figure out the next step.

If you're a remote-friendly company and you think we'd be a good fit, don't hesitate to get in touch!