Failing in Rust: The fun of a lifetime

And today – solving the ETL problem on Exercism.

After figuring out that the the problem description is somewhat misleading 1It suggests that the input format is a mapping of score–>collection-of-characters, and output should be character–>score, but in reality, the test suite is actually looking at mappings of score–>collection-of-string and transforming them to string–>score, it seemed like this should be pretty straightforward:

  • Iterate over the input and for each node, create a set of items – one for each String in the vector – mapping score–>string
    • Since we’re emitting multiple items per each node in the original input, we need to use flat_map
    • Internally, when iterating over the items in the vector, we can use the standard map function
  • Collect the emitted items into a BTreeMap

Ah, yes. Fun with iterators and references – one should always keep in mind what iterators they should use, and what that implies on their iterator methods, and specifically their input types. Fortunately, that’s very aptly and succinctly summarized in the std::iter documentation under “The three forms of iteration” – if you’re not very familiar with the three forms yet, I suggest you go read it right now. So here’s my first failed attempt:

And the corresponding errors:

What is happening here is that I’m using ‘iter()’, which iterates over references, yet I’m using the arguments as if they were non-references. Specifically, both ‘string’ and ‘score’ are references, so when I’m returning ‘(string, score)’ from the closure passed to ‘map’, I’m actually returning a tuple of type (&String, &i32), yet in the ‘collect’ call I’m trying to treat them as tuples of type (String, i32).
The error message from rustc can be a bit confusing, because it says that a reference is expected, and String is found instead, but from my POV, since we specified explicitly that we expect String, I’d expect it to say

Unfortunately, my POV is wrong. If you look closely, the error message clearly marks what the error pertains to (.collect…), so rustc is actually helpful enough to tell me exactly what to look at… that is, assuming I know how to properly read its error messages, which might be a bit of a far-fetched assumption in my case.

OK, we should be able to solve this in no time. I’ll just de-reference my references… oh wait, constructing the tuple from the dereferenced String will transfer ownership of that String to the tuple, which is bad because we don’t want to alter the input (also, we can’t, because it’s not mutable). OK, so I’ll just clone the string (i32 is copied over anyway), no big deal. Here’s my second version of the transform expression:

This should work, right? I’m returning the correct types, and just doing a (relatively) simple mapping. What could go wrong? Well apparently, Rust doesn’t like the fact that the closure I’m passing to “map” encloses the ‘score’ reference, and insists that the reference’s lifetime is shorter than that of the context in which it is used:

After staring at this error for a few minutes, I sort of realized that this is probably related to the fact that iterators in rust are lazy, which means that at the point that the call to flat_map returns, we haven’t actually done (almost) any work. It’s the call to “.collect” that actually iterates through the items and runs the transformation. I’m saying “sort of”, because I still couldn’t fully grasp what the actual problem is…
Initially, I thought that the problem is that ‘score’ exist within the scope of the flat_map call, and so when we return from that call it goes out of scope and reaches the end of its lifetime, yet we need to use it when doing the actual iteration. After some more mental parsing, I concluded that this model is incorrect – ‘score’ does not live within the scope of the call to flat_map; instead, it lives within the scope of the function used by flat_map when we iterate over it. But what does that mean exactly?
I tried to reason about this just by looking at my code, and figured out I’m getting nowhere fast2Well, not that fast, actually, so I decided that I need to dig into how flat_map() and map() actually work:

  • I started out with the documentation for std::iter, and the map and flat_map functions.
    • The documentation for std::iter didn’t teach me anything new per-se, but it did help solidify and reinforce some points. If you haven’t thoroughly read it yet, I recommend doing it – it’s time well spent (and it’s not that long anyway).
    • The documentation for both map and flat_map was very useful, because it introduced me to how the magic of lazy evaluation works for iterator adapters: Both adapters return a struct that implements the Iterator trait (std::iter::Map and std::Iter::FlatMap, respectively) and doing an iteration like

      we are actually iterating over an std::iter::Map, which means we’re calling the .next() method on that instance.
      This means that Map and FlatMap (and basically any other adapter) store the Iterator on which they were originally called, and call that Iterator’s methods as necessary. Such adapters would also store some extra data to allow them to actually perform the adaptation – more often than not, there will at least be a function3Strictly speaking, the iterators and adapter don’t have to store anything – but without storing the original iterator and some extra data, there aren’t many meaningful things it can do. Nevertheless, there are at least some cases where that is handy – e.g. the empty iterator
  • Equipped with that knowledge, I tried to think of how FlatMap works internally, in order to figure out how the two closures interact and what the lifetime of ‘score’ is. I started coming up with some ideas, but quickly realized that I’m going about it the wrong way – I can just look at the source code! To be completely honest, I knew this approach is probably an overkill, but at this point I couldn’t stop – I needed to know how things were actually implemented.
  • Let’s start with map() and it’s corresponding struct ‘Map’. This is pretty straight-forward to implement – it keeps the original iterator, as well as the mapping-function you pass to map(), and whenever next() is called, it just calls next() on the internal iterator and invokes the mapping function on the result.
  • How does FlatMap work? Every time it invokes the mapping-function, it gets back an iterator, rather than an item, and FlatMap is expected to return each of the items of this iterator in sequence4This is the “Flat” in “FlatMap”. This complicates things a bit:
    1. Let the original iterator be ‘It1’
    2. Calling It1.flat_map(Fn1) returns a FlatMap object – ‘FM’ – that holds both ‘It1’ and ‘Fn1’ as members
    3. When we call the FM.next() function:
      1. It basically calls Fn1(It1.next()), which returns an iterator, and stores that iterator as ‘It2′5This is simplified, since it needs to handle the case when mapping returns an empty iterator etc., but the general idea is still the same
      2. It then calls It2.next() and returns the result
      3. In subsequent calls to FM.next(), it would first try to retrieve items from It2, and once It2.next() returns null, it would refresh it by calling It2 = Fn1(It1.next())
      4. This is a bit simplified – we need to also handle edge cases and bi-directional iteration, but the general behavior is as described above. Read the source code for full details.

So, getting back to my lifetime error, what is actually happening here? Now that we have a better understanding of how the underlying mechanisms work we can get a better idea of what the lifetime of ‘score’ is, as well as the lifetime of the closure passed to map():

  1. The call to input.iter().flat_map(…) returns an object of type FlatMap, which takes ownership of input.iter(), and also of C1
  2. When we call collect() on the FlatMap object, it traverses the iterator (by calling next() repeatedly)
  3. When next() is called, FlatMap calls next() on the internal iterator (input.iter()), and then passes the result to C1, which in turn returns a Map object. This is where the problem lies:
    • The Map object takes ownership of strings.iter()
    • It also takes ownership of C2, and C2 stores a reference to ‘score’, which means the Map object stores a reference to ‘score’.
    • At the point when C1 returns, ‘score’ goes out of scope, yet the Map object is returned and lives on (FlatMap took ownership of it), so the reference stored by Map is basically a dangling reference (put differently, Map has a longer lifetime than one of its members)

Eventually, this doesn’t really require a deep understanding of Map/FlatMap, but rather the understanding that iterators are lazy, and the implication that has – specifically, that adapters return objects, and if those objects are returned from some function, then they cannot reference any local bindings from that function, because that would break the lifetime guarantees.

So what’s the right way to do this?
This works:

As does this one, which moves 6To be exact, in this case it copies, since score is &i32, and that means it implements the Copy trait, which causes values to be copied rather than moved ‘score’ to be owned by the map() closure – for more details, see Move Closures section in the Rust Book:

Looking back at this, it seems like this should have been easier to debug than it was – I actually spent almost a full hour digging into this (not including the time to write this post). Nothing like a bit of Rust to make me feel like a complete noob… But as always, it was educating, so time well spent.

Footnotes   [ + ]

1. It suggests that the input format is a mapping of score–>collection-of-characters, and output should be character–>score, but in reality, the test suite is actually looking at mappings of score–>collection-of-string and transforming them to string–>score
2. Well, not that fast, actually
3. Strictly speaking, the iterators and adapter don’t have to store anything – but without storing the original iterator and some extra data, there aren’t many meaningful things it can do. Nevertheless, there are at least some cases where that is handy – e.g. the empty iterator
4. This is the “Flat” in “FlatMap”
5. This is simplified, since it needs to handle the case when mapping returns an empty iterator etc., but the general idea is still the same
6. To be exact, in this case it copies, since score is &i32, and that means it implements the Copy trait, which causes values to be copied rather than moved

Leave a Reply

Your email address will not be published. Required fields are marked *