Recursive iterators in Rust

👋 This page was last updated ~6 years ago. Just so you know.

I've been looking for this blog post everywhere, but it doesn't exist, so I guess it's my turn to write about Some Fun with Rust.

The task at hand

Let's say you have a recursive, acyclic data structure, like so:

struct Node {
    values: Vec<i32>,
    children: Vec<Node>,
}

This allows you to represent a tree-like structure:

   [1, 2, 3]
       /\
      /  \
     /    \
    /      \
   /        \
[4, 5]    [6, 7]

Now let's say you want to iterate over the values of the root node and all its children, recursively, so that you get the sequence [1, 2, 3, 4, 5, 6, 7].

A naive implementation

This is what I originally came up with:

impl Node {
    fn values(&self) -> impl Iterator<Item = &i32> {
        self.values
            .iter()
            .chain(self.children.iter().map(|n| n.values()).flatten())
    }
}

Let's unpack that. So first, we generate an iterator over our own values [1, 2, 3]. Then, we take an iterator over our children, and map it to an iterator of iterators over all of their values.

You can think of the result of map() as an iterator over [[4, 5], [6, 7]]. Then we flatten() it so that it becomes an iterator over [4, 5, 6, 7],

And finally, we chain both iterators together, so that we get an iterator over [1, 2, 3, 4, 5, 6, 7]. Sounds good, right?

Except it doesn't compile. (Not with Rust 1.34.1, at least)

Here's the error message we get:

error[E0720]: opaque type expands to a recursive type
 --> src/main.rs:8:25
  |
8 |     fn values(&self) -> impl Iterator<Item = &i32> {
  |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^ expands to self-referential type
  |
  = note: expanded type is `std::iter::Chain<std::slice::Iter<'_, i32>, std::iter::Flatten<std::iter::Map<std::slice::Iter<'_, Node>, [closure@src/main.rs:11:45: 11:59]>>>`

If we clean up the expanded type a little, we have:

T = Chain(
        Slice,
        Flatten(
            Map(Slice, T)
        )
    )

Which, well, the compiler isn't wrong. The structure could be arbitrarily big at runtime, yet it wants to determine its size at compile time.

This is because - if I'm understanding it correctly - you can't return a trait:

// does not compile
fn foo() -> fmt::Display {
    "hi"
}

But you can return a concrete type that implements a trait:

// should be fine!
fn foo() -> impl fmt::Display {
    "hi"
}

But it still needs to know the size of that concrete type at compile time, because when you later do

fn main() {
    let s = foo();
    println!("{}", s);
}

s is allocated on the stack, so we need to know its size before calling into foo.

So, our impl trait-returning foo behaves sorta like a generic version:

// does not compile
fn foo<T>() -> T
where
    T: fmt::Display,
{
    "hi"
}

...except that version does not compile either because although "hi" (a &'static str) satisfies the constraints for T, it isn't T. In fact, nothing would prevent us from calling foo::<i32>().

In summary: returning an impl trait allows us to have a generic function, except with a return type parameter, that's inferred from the function's body rather than its arguments. Also, it allows us to hide the real type of what it is we return. The caller will only ever see the trait we want them to see, even though after compilation it will allocate the exact right amount for the concrete type.

Phew, hope that makes sense.

A bad solution

So, we can't return a self-referential type, but we can.. build our own iterator.

// does not compile
struct NodeIter<'a> {
    viter: Iterator<Item = &'a i32>,
    citer: Iterator<Item = &'a Node>,
}

The plan is to first yield values out of viter (for val iterator), and when we get a None, to get the next value from citer, and replace viter with an iterator over it.

But wait, that's not a valid struct. Our fields can't be traits, just like you can't just return traits from functions. We need to put both of them in a box.

struct NodeIter<'a> {
    viter: Box<Iterator<Item = &'a i32>>,
    citer: Box<Iterator<Item = &'a Node>>,
}

impl<'a> Iterator for NodeIter<'a> {
    type Item = &'a i32;

    fn next(&mut self) -> Option<Self::Item> {
        unimplemented!()
    }
}

So, now our values method returns a concrete type:

impl Node {
    fn values<'a>(&'a self) -> NodeIter<'a> {
        NodeIter {
            // does not compile
            viter: Box::new(self.values.iter()),
            citer: Box::new(self.children.iter()),
        }
    }
}

But we're already running into other compile errors

error[E0495]: cannot infer an appropriate lifetime for lifetime parameter in function call due to conflicting requirements
  --> src/main.rs:69:45
   |
69 |                 viter: Box::new(self.values.iter()),
   |                                             ^^^^
   |
note: first, the lifetime cannot outlive the lifetime 'a as defined on the method body at 67:23...
  --> src/main.rs:67:23
   |
67 |         pub fn values<'a>(&'a self) -> NodeIter<'a> {
   |                       ^^
note: ...so that reference does not outlive borrowed content
  --> src/main.rs:69:33
   |
69 |                 viter: Box::new(self.values.iter()),
   |                                 ^^^^^^^^^^^
   = note: but, the lifetime must be valid for the static lifetime...
   = note: ...so that the expression is assignable:
           expected std::boxed::Box<(dyn std::iter::Iterator<Item=&i32> + 'static)>
              found std::boxed::Box<dyn std::iter::Iterator<Item=&i32>>

It took me a while to figure this one out, but, apparently, there is a way to specify the lifetime of the contents of a box. Lifetimes are part of generic types and functions' signatures, so you can just add a constraint in there:

// before
struct NodeIter<'a> {
    viter: Box<Iterator<Item = &'a i32>>,
    citer: Box<Iterator<Item = &'a Node>>,
}

// after
struct NodeIter<'a> {
    viter: Box<Iterator<Item = &'a i32> + 'a>,
    citer: Box<Iterator<Item = &'a Node> + 'a>,
}

Which says: not only should our Iterator return references that are valid for 'a , the iterator itself should also live at least as long as 'a. I'm not sure why that distinction has to be made explicitly right now, but I at least know the compiler agrees with us.

Implementing the iterator itself is annoying, but it does work that way:

impl<'a> Iterator for NodeIter<'a> {
    type Item = &'a i32;

    fn next(&mut self) -> Option<Self::Item> {
        // if we still have some values of our own to yield...
        if let Some(val) = self.viter.next() {
            // then yield them
            Some(val)
        } else {
            // if we're out of values, but we still have children to walk...
            if let Some(child) = self.citer.next() {
                // then yield all their values.
                self.viter = Box::new(child.values());
                // call next() again, hopefully immediately yielding the
                // branch's next value, but if not, we'll just keep recursing
                // until we find a non-empty branch or run out of nodes.
                self.next()
            } else {
                None
            }
        }
    }
}

If we try to use it in an example:

fn main() {
    let n = Node {
        values: vec![1, 2, 3],
        children: vec![
            Node {
                values: vec![4, 5],
                children: vec![],
            },
            Node {
                values: vec![6, 7],
                children: vec![],
            },
        ],
    };
    let v: Vec<_> = n.values().collect();
    println!("{:?}", v);
}

We get the expected output: [1, 2, 3, 4, 5, 6, 7].

Feel free to mess around with that version on the playground. Try removing some stuff you think is unnecessary, see which compiler error messages you get.

A better solution

So let's recap. Our original approach didn't work because rust wanted to be able to allocate the whole iterator on the stack, and it needed to know its exact size.

Can't we just return a boxed iterator? Let's try it out:

impl Node {
    pub fn values<'a>(&'a self) -> Box<Iterator<Item = &i32>> {
        Box::new(
            self.values
                .iter()
                .chain(self.children.iter().map(|n| n.values()).flatten()),
        )
    }
}

At first, it doesn't appear to work

error[E0495]: cannot infer an appropriate lifetime for lifetime parameter in function call due to conflicting requirements
  --> src/main.rs:33:22
   |
33 |                     .iter()
   |                      ^^^^
   |
note: first, the lifetime cannot outlive the lifetime 'a as defined on the method body at 30:23...
  --> src/main.rs:30:23
   |
30 |         pub fn values<'a>(&'a self) -> Box<Iterator<Item = &i32>> {
   |                       ^^
note: ...so that reference does not outlive borrowed content
  --> src/main.rs:32:17
   |
32 |                 self.children
   |                 ^^^^^^^^^^^^^
   = note: but, the lifetime must be valid for the static lifetime...
   = note: ...so that the expression is assignable:
           expected std::boxed::Box<(dyn std::iter::Iterator<Item=&'a i32> + 'static)>
              found std::boxed::Box<dyn std::iter::Iterator<Item=&i32>>

...but wait, that error seems familiar - we encountered it when implementing our custom iterator. Don't we just need to add a lifetime constraint to our box?

    // before:
    pub fn values<'a>(&'a self) -> Box<Iterator<Item = &i32>> {
        // ...
    }

    // after:
    pub fn values<'a>(&'a self) -> Box<Iterator<Item = &i32> + 'a> {
        // ...
    }

Sure enough, this version compiles and runs fine!

You can play with the final code on the Rust playground.

Conclusion

Gather round team: what did we learn?

  • impl trait return types are cool, but they're not magic.
  • Amos needs to stop thinking of Rust generics as Java generics. In Rust, generics are reified, which is good for performance, bad for binary size, but mostly it means the compiler needs to figure out a lot more stuff ahead of time.
  • Lifetime constraints are not only for references (&'a str), but also for all generic type parameters (Box<T + 'a>).
  • The rust playground uses GitHub gist for storing code, which is neat. I was wondering how they handled long-term storage of snippets, and seeing it's handled by GitHub makes me more confident in sharing some code that way.

If you liked this article, you can watch me struggle my way through some Rust in near-realtime on twitter at @fasterthanlime, and if you'd like me to write about something in particular, you can poke me there.

I've written a follow-up to this article: Rust vs Java generics

Acknowledgements

Thanks to @chriskrycho on Twitter for pointing out that both Java and Rust erase generics.

Comment on /r/fasterthanlime

(JavaScript is required to see this. Or maybe my stuff broke)

Here's another article just for you:

I won free load testing

Long story short: a couple of my articles got really popular on a bunch of sites, and someone, somewhere, went "well, let's see how much traffic that smart-ass can handle", and suddenly I was on the receiving end of a couple DDoS attacks.

It really doesn't matter what the articles were about — the attack is certainly not representative of how folks on either side of any number of debates generally behave.