Recursive iterators in Rust
👋 This page was last updated ~5 years ago. Just so you know.
I've been looking for this blog post everywhere, but it doesn't exist, so I guess it's my turn to write about Some Fun with Rust.
The task at hand
Let's say you have a recursive, acyclic data structure, like so:
struct Node { values: Vec<i32>, children: Vec<Node>, }
This allows you to represent a tree-like structure:
[1, 2, 3] /\ / \ / \ / \ / \ [4, 5] [6, 7]
Now let's say you want to iterate over the values of the root
node and all its children, recursively, so that you get the
sequence [1, 2, 3, 4, 5, 6, 7]
.
A naive implementation
This is what I originally came up with:
impl Node { fn values(&self) -> impl Iterator<Item = &i32> { self.values .iter() .chain(self.children.iter().map(|n| n.values()).flatten()) } }
Let's unpack that. So first, we generate an iterator over our own values
[1, 2, 3]
. Then, we take an iterator over our children, and map it to an iterator
of iterators over all of their values.
You can think of the result of map()
as an iterator over [[4, 5], [6, 7]]
.
Then we flatten()
it so that it becomes an iterator over [4, 5, 6, 7]
,
And finally, we chain both iterators together, so that we get an iterator over
[1, 2, 3, 4, 5, 6, 7]
. Sounds good, right?
Except it doesn't compile. (Not with Rust 1.34.1, at least)
Here's the error message we get:
error[E0720]: opaque type expands to a recursive type --> src/main.rs:8:25 | 8 | fn values(&self) -> impl Iterator<Item = &i32> { | ^^^^^^^^^^^^^^^^^^^^^^^^^^ expands to self-referential type | = note: expanded type is `std::iter::Chain<std::slice::Iter<'_, i32>, std::iter::Flatten<std::iter::Map<std::slice::Iter<'_, Node>, [closure@src/main.rs:11:45: 11:59]>>>`
If we clean up the expanded type a little, we have:
T = Chain( Slice, Flatten( Map(Slice, T) ) )
Which, well, the compiler isn't wrong. The structure could be arbitrarily big at runtime, yet it wants to determine its size at compile time.
This is because - if I'm understanding it correctly - you can't return a trait:
// does not compile fn foo() -> fmt::Display { "hi" }
But you can return a concrete type that implements a trait:
// should be fine! fn foo() -> impl fmt::Display { "hi" }
But it still needs to know the size of that concrete type at compile time, because when you later do
fn main() { let s = foo(); println!("{}", s); }
s
is allocated on the stack, so we need to know its size before calling
into foo
.
So, our impl trait
-returning foo behaves sorta like a generic version:
// does not compile fn foo<T>() -> T where T: fmt::Display, { "hi" }
...except that version does not compile either because although "hi"
(a
&'static str
) satisfies the constraints for T, it isn't T. In fact, nothing
would prevent us from calling foo::<i32>()
.
In summary: returning an impl trait
allows us to have a generic function,
except with a return type parameter
, that's inferred from the function's body
rather than its arguments. Also, it allows us to hide the real type of what it is
we return. The caller will only ever see the trait we want them to see, even though
after compilation it will allocate the exact right amount for the concrete type.
Phew, hope that makes sense.
A bad solution
So, we can't return a self-referential type
, but we can.. build our own iterator.
// does not compile struct NodeIter<'a> { viter: Iterator<Item = &'a i32>, citer: Iterator<Item = &'a Node>, }
The plan is to first yield values out of viter
(for val iterator), and when
we get a None
, to get the next value from citer
, and replace viter
with
an iterator over it.
But wait, that's not a valid struct. Our fields can't be traits, just like you can't just return traits from functions. We need to put both of them in a box.
struct NodeIter<'a> { viter: Box<Iterator<Item = &'a i32>>, citer: Box<Iterator<Item = &'a Node>>, } impl<'a> Iterator for NodeIter<'a> { type Item = &'a i32; fn next(&mut self) -> Option<Self::Item> { unimplemented!() } }
So, now our values
method returns a concrete type:
impl Node { fn values<'a>(&'a self) -> NodeIter<'a> { NodeIter { // does not compile viter: Box::new(self.values.iter()), citer: Box::new(self.children.iter()), } } }
But we're already running into other compile errors
error[E0495]: cannot infer an appropriate lifetime for lifetime parameter in function call due to conflicting requirements --> src/main.rs:69:45 | 69 | viter: Box::new(self.values.iter()), | ^^^^ | note: first, the lifetime cannot outlive the lifetime 'a as defined on the method body at 67:23... --> src/main.rs:67:23 | 67 | pub fn values<'a>(&'a self) -> NodeIter<'a> { | ^^ note: ...so that reference does not outlive borrowed content --> src/main.rs:69:33 | 69 | viter: Box::new(self.values.iter()), | ^^^^^^^^^^^ = note: but, the lifetime must be valid for the static lifetime... = note: ...so that the expression is assignable: expected std::boxed::Box<(dyn std::iter::Iterator<Item=&i32> + 'static)> found std::boxed::Box<dyn std::iter::Iterator<Item=&i32>>
It took me a while to figure this one out, but, apparently, there is a way to specify the lifetime of the contents of a box. Lifetimes are part of generic types and functions' signatures, so you can just add a constraint in there:
// before struct NodeIter<'a> { viter: Box<Iterator<Item = &'a i32>>, citer: Box<Iterator<Item = &'a Node>>, } // after struct NodeIter<'a> { viter: Box<Iterator<Item = &'a i32> + 'a>, citer: Box<Iterator<Item = &'a Node> + 'a>, }
Which says: not only should our Iterator return references that are valid for 'a
,
the iterator itself should also live at least as long as 'a
. I'm not sure why
that distinction has to be made explicitly right now, but I at least know the compiler
agrees with us.
Implementing the iterator itself is annoying, but it does work that way:
impl<'a> Iterator for NodeIter<'a> { type Item = &'a i32; fn next(&mut self) -> Option<Self::Item> { // if we still have some values of our own to yield... if let Some(val) = self.viter.next() { // then yield them Some(val) } else { // if we're out of values, but we still have children to walk... if let Some(child) = self.citer.next() { // then yield all their values. self.viter = Box::new(child.values()); // call next() again, hopefully immediately yielding the // branch's next value, but if not, we'll just keep recursing // until we find a non-empty branch or run out of nodes. self.next() } else { None } } } }
If we try to use it in an example:
fn main() { let n = Node { values: vec![1, 2, 3], children: vec![ Node { values: vec![4, 5], children: vec![], }, Node { values: vec![6, 7], children: vec![], }, ], }; let v: Vec<_> = n.values().collect(); println!("{:?}", v); }
We get the expected output: [1, 2, 3, 4, 5, 6, 7]
.
Feel free to mess around with that version on the playground. Try removing some stuff you think is unnecessary, see which compiler error messages you get.
A better solution
So let's recap. Our original approach didn't work because rust wanted to be able to allocate the whole iterator on the stack, and it needed to know its exact size.
Can't we just return a boxed iterator? Let's try it out:
impl Node { pub fn values<'a>(&'a self) -> Box<Iterator<Item = &i32>> { Box::new( self.values .iter() .chain(self.children.iter().map(|n| n.values()).flatten()), ) } }
At first, it doesn't appear to work
error[E0495]: cannot infer an appropriate lifetime for lifetime parameter in function call due to conflicting requirements --> src/main.rs:33:22 | 33 | .iter() | ^^^^ | note: first, the lifetime cannot outlive the lifetime 'a as defined on the method body at 30:23... --> src/main.rs:30:23 | 30 | pub fn values<'a>(&'a self) -> Box<Iterator<Item = &i32>> { | ^^ note: ...so that reference does not outlive borrowed content --> src/main.rs:32:17 | 32 | self.children | ^^^^^^^^^^^^^ = note: but, the lifetime must be valid for the static lifetime... = note: ...so that the expression is assignable: expected std::boxed::Box<(dyn std::iter::Iterator<Item=&'a i32> + 'static)> found std::boxed::Box<dyn std::iter::Iterator<Item=&i32>>
...but wait, that error seems familiar - we encountered it when implementing our custom iterator. Don't we just need to add a lifetime constraint to our box?
// before: pub fn values<'a>(&'a self) -> Box<Iterator<Item = &i32>> { // ... } // after: pub fn values<'a>(&'a self) -> Box<Iterator<Item = &i32> + 'a> { // ... }
Sure enough, this version compiles and runs fine!
You can play with the final code on the Rust playground.
Conclusion
Gather round team: what did we learn?
impl trait
return types are cool, but they're not magic.- Amos needs to stop thinking of Rust generics as Java generics. In Rust, generics are reified, which is good for performance, bad for binary size, but mostly it means the compiler needs to figure out a lot more stuff ahead of time.
- Lifetime constraints are not only for references (
&'a str
), but also for all generic type parameters (Box<T + 'a>
). - The rust playground uses GitHub gist for storing code, which is neat. I was wondering how they handled long-term storage of snippets, and seeing it's handled by GitHub makes me more confident in sharing some code that way.
If you liked this article, you can watch me struggle my way through some Rust in near-realtime on twitter at @fasterthanlime, and if you'd like me to write about something in particular, you can poke me there.
I've written a follow-up to this article: Rust vs Java generics
Acknowledgements
Thanks to @chriskrycho on Twitter for pointing out that both Java and Rust erase generics.
If you liked what you saw, please support my work!
Here's another article just for you:
Blog posts that praise Rust are many but funding is generally in short supply.
If even a small percentage of the money Rust saves companies was put back into the ecosystem it would help secure the future of the platform tremendously.
Multiple sources of funding
It is unreasonable going forward to expect the same handful of companies to provide all the funding.