Crust of Rust: Lifetime Annotations

In the 2019 Rust Survey, a lot of people asked for video content covering intermediate Rust topics. So in this first video (possibly of many), we’re going to investigate a case where you need multiple explicit lifetime annotations. We explore why they are needed, and why we need more than one in this particular case. We also talk about some of the differences between the string types and introduce generics over a self-defined trait in the process.

And don’t worry, I know that what we’re implementing exists in the standard library :)

You can find the final code at https://gist.github.com/jonhoo/2a7fdcf79be03e51a5f95cd326f2a1e8.

Lifetimes

Lending out a reference to a resource that someone else owns can be complicated. For example, imagine this set of operations:

I acquire a handle to some kind of resource.
I lend you a reference to the resource.
I decide I’m done with the resource, and deallocate it, while you still have your reference.
You decide to use the resource.

Uh oh! Your reference is pointing to an invalid resource. This is called a dangling pointer or ‘use after free’, when the resource is memory.

To fix this, we have to make sure that step four never happens after step three. The ownership system in Rust does this through a concept called lifetimes, which describe the scope that a reference is valid for.
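As a rough sketch of the four steps above, the following program performs them in an order the compiler accepts: the reference is used before the resource is deallocated. The helper `first_word` and the variable names are made up for illustration.

```rust
// Hypothetical helper for illustration: borrows the resource and returns
// a reference into it.
fn first_word(resource: &str) -> &str {
    resource.split_whitespace().next().unwrap_or("")
}

fn main() {
    let resource = String::from("hello world"); // step 1: acquire the resource
    let borrowed = first_word(&resource);       // step 2: lend out a reference
    println!("{}", borrowed);                   // step 4: use the reference...
    drop(resource);                             // step 3: ...before deallocating
    // Using `borrowed` after `drop(resource)` would be rejected at compile
    // time, because the reference would outlive the value it points to.
}
```

Moving the `println!` below the `drop` call turns the dangling pointer into a compile error rather than a runtime bug.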

When we have a function that takes an argument by reference, we can be implicit or explicit about the lifetime of the reference:

// implicit
fn foo(x: &i32) {
}

// explicit
fn bar<'a>(x: &'a i32) {
}

The 'a reads ‘the lifetime a’. Technically, every reference has some lifetime associated with it, but the compiler lets you elide them in common cases. Before we get to that, though, let’s break the explicit example down:

fn bar<'a>(...)

We previously talked a little about function syntax, but we didn’t discuss the <>s after a function’s name. A function can have ‘generic parameters’ between the <>s, of which lifetimes are one kind. We’ll discuss other kinds of generics later in the book, but for now, let’s focus on the lifetimes aspect.

We use <> to declare our lifetimes. This says that bar has one lifetime, 'a. If we had two reference parameters with different lifetimes, it would look like this:

fn bar<'a, 'b>(...)

Then in our parameter list, we use the lifetimes we’ve named:

...(x: &'a i32)

If we wanted a &mut reference, we’d do this:

...(x: &'a mut i32)

If you compare &mut i32 to &'a mut i32, they’re the same; it’s just that the lifetime 'a has snuck in between the & and the mut i32. We read &mut i32 as ‘a mutable reference to an i32’ and &'a mut i32 as ‘a mutable reference to an i32 with the lifetime 'a’.
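To see the pieces together, here is a small sketch of a complete function that takes and returns a `&'a mut i32`. The function name `increment` is an assumption made for this example.

```rust
// Hypothetical function: adds one through the mutable reference and hands
// the same reference back, so the output shares the lifetime 'a.
fn increment<'a>(x: &'a mut i32) -> &'a mut i32 {
    *x += 1;
    x
}

fn main() {
    let mut n = 41;
    increment(&mut n);
    println!("{}", n); // prints 42
}
```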

In `struct`s

You’ll also need explicit lifetimes when working with structs that contain references:

struct Foo<'a> {
    x: &'a i32,
}

fn main() {
    let y = &5; // this is the same as `let _y = 5; let y = &_y;`
    let f = Foo { x: y };

    println!("{}", f.x);
}

As you can see, structs can also have lifetimes. In a similar way to functions,

struct Foo<'a> {

declares a lifetime, and

x: &'a i32,

uses it. So why do we need a lifetime here? We need to ensure that any reference to a Foo cannot outlive the reference to an i32 it contains.

`impl` blocks

Let’s implement a method on Foo:

struct Foo<'a> {
    x: &'a i32,
}

impl<'a> Foo<'a> {
    fn x(&self) -> &'a i32 { self.x }
}

fn main() {
    let y = &5; // this is the same as `let _y = 5; let y = &_y;`
    let f = Foo { x: y };

    println!("x is: {}", f.x());
}

As you can see, we need to declare a lifetime for Foo in the impl line. We repeat 'a twice, like on functions: impl<'a> defines a lifetime 'a, and Foo<'a> uses it.

Multiple lifetimes

If you have multiple references, you can use the same lifetime multiple times:

fn x_or_y<'a>(x: &'a str, y: &'a str) -> &'a str {

This says that x and y both are alive for the same scope, and that the return value is also alive for that scope. If you wanted x and y to have different lifetimes, you can use multiple lifetime parameters:

fn x_or_y<'a, 'b>(x: &'a str, y: &'b str) -> &'a str {

In this example, x and y have different valid scopes, but the return value has the same lifetime as x.
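The practical difference shows up at the call site. In the sketch below, the hypothetical function `after_prefix` has the same shape of signature as the two-lifetime `x_or_y`, so its second argument can go out of scope while the result is still in use.

```rust
// Hypothetical function with the same signature shape as the two-lifetime
// `x_or_y`: the result borrows only from `text`, never from `prefix`.
fn after_prefix<'a, 'b>(text: &'a str, prefix: &'b str) -> &'a str {
    text.strip_prefix(prefix).unwrap_or(text)
}

fn main() {
    let text = String::from("lifetime: 'a");
    let rest;
    {
        let prefix = String::from("lifetime: ");
        rest = after_prefix(&text, &prefix);
    } // `prefix` is dropped here; `rest` is still valid because it is tied to `text`
    println!("{}", rest);
}
```

With a single lifetime 'a on both parameters, the same `main` would be rejected, because the result would also be tied to `prefix`.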

Thinking in scopes

A way to think about lifetimes is to visualize the scope that a reference is valid for. For example:

fn main() {
    let y = &5;     // -+ y goes into scope
                    //  |
    // stuff        //  |
                    //  |
}                   // -+ y goes out of scope

Adding in our Foo:

struct Foo<'a> {
    x: &'a i32,
}

fn main() {
    let y = &5;           // -+ y goes into scope
    let f = Foo { x: y }; // -+ f goes into scope
    // stuff              //  |
                          //  |
}                         // -+ f and y go out of scope

Our f lives within the scope of y, so everything works. What if it didn’t? This code won’t work:

struct Foo<'a> {
    x: &'a i32,
}

fn main() {
    let x;                    // -+ x goes into scope
                              //  |
    {                         //  |
        let y = &5;           // ---+ y goes into scope
        let f = Foo { x: y }; // ---+ f goes into scope
        x = &f.x;             //  | | error here
    }                         // ---+ f and y go out of scope
                              //  |
    println!("{}", x);        //  |
}                             // -+ x goes out of scope

Whew! As you can see here, the scopes of f and y are smaller than the scope of x. But when we do x = &f.x, we make x a reference to something that’s about to go out of scope.

Named lifetimes are a way of giving these scopes a name. Giving something a name is the first step towards being able to talk about it.
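The failing example borrows from f itself, which dies at the end of the inner block. One possible way to make a similar program compile, sketched here using the `x` method from the impl section, is to copy out the inner `&'a i32` instead. The variable name `value` is chosen for illustration.

```rust
struct Foo<'a> {
    x: &'a i32,
}

impl<'a> Foo<'a> {
    // Returns the stored reference with its original lifetime 'a,
    // not a lifetime tied to the borrow of `self`.
    fn x(&self) -> &'a i32 {
        self.x
    }
}

fn main() {
    let value = 5;
    let x;
    {
        let f = Foo { x: &value };
        x = f.x(); // borrows from `value`, which outlives this block
    } // `f` goes out of scope here, but `x` does not point into `f`
    println!("{}", x);
}
```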

'static

The lifetime named ‘static’ is a special lifetime. It signals that something has the lifetime of the entire program. Most Rust programmers first come across 'static when dealing with strings:

let x: &'static str = "Hello, world.";

String literals have the type &'static str because the reference is always alive: they are baked into the data segment of the final binary. Another example is globals:

static FOO: i32 = 5;
let x: &'static i32 = &FOO;

This adds an i32 to the data segment of the binary, and x is a reference to it.
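As a small sketch of where 'static tends to show up in practice, the hypothetical function `level_name` below returns string literals. It has no reference parameters to borrow from, so the output lifetime is written out as 'static.

```rust
static GREETING: &str = "Hello, world.";

// Hypothetical function: every branch returns a string literal, so the
// result is valid for the entire program.
fn level_name(level: u8) -> &'static str {
    match level {
        0 => "error",
        1 => "warn",
        _ => "info",
    }
}

fn main() {
    let name: &'static str = level_name(1);
    println!("{} {}", GREETING, name);
}
```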

Lifetime Elision

Rust supports powerful local type inference in the bodies of functions but not in their item signatures. Inference is forbidden there so that types can be reasoned about from the item signature alone. However, for ergonomic reasons, a very restricted secondary inference algorithm called “lifetime elision” does apply to lifetimes. Lifetime elision is concerned solely with inferring lifetime parameters, using three easily memorizable and unambiguous rules. This means lifetime elision acts as a shorthand for writing an item signature, while not hiding away the actual types involved as full local inference would if applied to it.

When talking about lifetime elision, we use the terms input lifetime and output lifetime. An input lifetime is a lifetime associated with a parameter of a function, and an output lifetime is a lifetime associated with the return value of a function. For example, this function has an input lifetime:

fn foo<'a>(bar: &'a str)

This one has an output lifetime:

fn foo<'a>() -> &'a str

This one has a lifetime in both positions:

fn foo<'a>(bar: &'a str) -> &'a str

Here are the three rules:

Each elided lifetime in a function’s arguments becomes a distinct lifetime parameter.
If there is exactly one input lifetime, elided or not, that lifetime is assigned to all elided lifetimes in the return values of that function.
If there are multiple input lifetimes, but one of them is &self or &mut self, the lifetime of self is assigned to all elided output lifetimes.

Otherwise, it is an error to elide an output lifetime.
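The rules can be seen at work in a short compilable sketch. The names `trim_start_spaces`, `Doc`, and `before` are invented for this example, and the comments give the expanded signatures the rules produce.

```rust
// Rules 1 and 2: one input lifetime, which is assigned to the output.
// Expanded: fn trim_start_spaces<'a>(s: &'a str) -> &'a str
fn trim_start_spaces(s: &str) -> &str {
    s.trim_start()
}

struct Doc {
    text: String,
}

impl Doc {
    // Rules 1 and 3: two input lifetimes, and the output takes the one on &self.
    // Expanded: fn before<'s, 'm>(&'s self, marker: &'m str) -> &'s str
    fn before(&self, marker: &str) -> &str {
        match self.text.find(marker) {
            Some(i) => &self.text[..i],
            None => self.text.as_str(),
        }
    }
}

fn main() {
    let doc = Doc { text: String::from("key=value") };
    println!("{}", doc.before("="));
    println!("{}", trim_start_spaces("  hi"));
}
```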

Examples

Here are some examples of functions with elided lifetimes. We’ve paired each example of an elided lifetime with its expanded form.

fn print(s: &str); // elided
fn print<'a>(s: &'a str); // expanded

fn debug(lvl: u32, s: &str); // elided
fn debug<'a>(lvl: u32, s: &'a str); // expanded

// In the preceding example, `lvl` doesn’t need a lifetime because it’s not a
// reference (`&`). Only things relating to references (such as a `struct`
// which contains a reference) need lifetimes.

fn substr(s: &str, until: u32) -> &str; // elided
fn substr<'a>(s: &'a str, until: u32) -> &'a str; // expanded

fn get_str() -> &str; // ILLEGAL, no inputs

fn frob(s: &str, t: &str) -> &str; // ILLEGAL, two inputs
fn frob<'a, 'b>(s: &'a str, t: &'b str) -> &str; // Expanded: Output lifetime is ambiguous

fn get_mut(&mut self) -> &mut T; // elided
fn get_mut<'a>(&'a mut self) -> &'a mut T; // expanded

fn args<T: ToCStr>(&mut self, args: &[T]) -> &mut Command; // elided
fn args<'a, 'b, T: ToCStr>(&'a mut self, args: &'b [T]) -> &'a mut Command; // expanded

fn new(buf: &mut [u8]) -> BufWriter; // elided
fn new<'a>(buf: &'a mut [u8]) -> BufWriter<'a>; // expanded
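The last pair shows that elision also applies to a lifetime parameter on a returned struct. The sketch below uses a made-up type, `SliceWriter`, which is not the standard library’s BufWriter, to show a constructor of that shape compiling.

```rust
// Hypothetical type for illustration; not the standard library's BufWriter.
struct SliceWriter<'a> {
    buf: &'a mut [u8],
    pos: usize,
}

impl<'a> SliceWriter<'a> {
    // With elision this could be written as
    // `fn new(buf: &mut [u8]) -> SliceWriter<'_>`.
    fn new(buf: &'a mut [u8]) -> SliceWriter<'a> {
        SliceWriter { buf, pos: 0 }
    }

    // Copies as many bytes as fit and returns how many were written.
    fn write(&mut self, data: &[u8]) -> usize {
        let n = data.len().min(self.buf.len() - self.pos);
        self.buf[self.pos..self.pos + n].copy_from_slice(&data[..n]);
        self.pos += n;
        n
    }
}

fn main() {
    let mut storage = [0u8; 4];
    let mut writer = SliceWriter::new(&mut storage);
    let written = writer.write(b"hello");
    // `writer` is not used again, so `storage` can be read here.
    println!("{} bytes written: {:?}", written, storage);
}
```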

Understand Basic Lifetime Annotation in Rust

Lifetime annotation in Rust is a relatively unique concept, and it is hard to understand (at least for me). I spent some time on it and just want to share my understanding.

Every reference should have a lifetime.
The compiler wants to know the lifetime of every reference.
When the returned value is a reference, the compiler may fail to know its lifetime.
So, we should specify it.

Ok, let’s start with the example from The Book.

fn longest(x: &str, y: &str) -> &str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

This code will not compile. The error message is as follows:

error[E0106]: missing lifetime specifier
 --> src/main.rs:1:33
  |
1 | fn longest(x: &str, y: &str) -> &str {
  |                                 ^ expected lifetime parameter
  |
  = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `x` or `y`

Well, according to The Book, the reason is:

Rust can’t tell whether the reference being returned refers to x or y

Based on this statement, I started to think: what if I did not return anything related to x and y? So I wrote this code:

fn longest(x: &str, d: & str) -> & str {
    "ddd"
}

This time I got the same error and help message as before. So I started to think about what is behind the help message. Then I noticed that at the very beginning, it says

this function’s return type contains a borrowed value …

So maybe this should be the main reason instead of the x and y stuff 🤔? Ok, it says borrowed value, and yes, I intended to return a borrowed value, a &str. But where could the borrowed value come from, and what lifetime should the returned value have? Remember, according to The Book, every reference should have a lifetime.

I guess that, from the compiler’s perspective, this borrowed value can mainly come from two sources. The first is the parameters the function receives, and the second is any value created within the function (surely it could also come from global variables, e.g. constants). So what is the lifetime in each of these situations? Let’s investigate them one by one, starting with a return value created within the function.

If we want to return a reference to a value created inside the function, the reference needs a long lifetime. Otherwise, the value it points to would be dropped once the function’s scope ends. This reminds me of the 'static lifetime specifier (surely it can also be 'a). According to the documentation, this specifier means the reference is valid for the lifetime of the whole program. So I changed the code to

fn longest(x: &str, d: & str) -> &'static str {
    "ddd"
}

Luckily this program compiled and returned the expected result 🆒. But what if I want to return a reference to a String? So I changed the code to

fn longest(x: &str, d: & str) -> &'static str {
    &String::from("ddd")
}

This time I got a different error message:

2 |     &String::from("ddd")
  |     ^-------------------
  |     ||
  |     |temporary value created here
  |     returns a reference to data owned by the current function

This error message is easy to understand: the heap memory allocated for this String would be dropped once the function’s scope ends. The reference would then become a dangling reference, which violates the compiler’s rules.
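A common way around this error, sketched below, is to return the String by value so that ownership moves to the caller. The function name `make_label` and its body are assumptions made for this example.

```rust
// Hypothetical function: returns an owned String, so nothing borrowed
// from inside the function has to outlive it.
fn make_label(x: &str, d: &str) -> String {
    format!("{}-{}", x, d)
}

fn main() {
    let label = make_label("left", "right");
    println!("{}", label);
}
```

No lifetime annotation is needed here because the return type contains no reference.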

Ok, now let’s have a look at getting the reference from the parameters. Again, let’s think about what the compiler would think when it sees the signature. It sees two parameters, but these two parameters might have different lifetimes. In that case, which lifetime should the compiler use for the returned reference? We don’t know, and neither does the compiler, which is why it complains. So we should specify the lifetimes ourselves. We can just paste the code provided in The Book.

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

Based on the code above, we can see that we are telling the compiler that x, y, and the returned value all have the lifetime 'a. 'a is just a generic parameter that names the lifetime shared by x and y. At the same time, it tells the compiler that the returned reference is valid for the lifetime 'a. Hence, the compiler is happy: it has all the information it needs.
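To see what the shared 'a means for a caller, here is a small usage sketch with two strings that live in different scopes. The variable names are chosen for illustration.

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let outer = String::from("a long-lived string");
    {
        let inner = String::from("short");
        // 'a is inferred as a region where both borrows are valid,
        // so `result` may only be used while `inner` is alive.
        let result = longest(&outer, &inner);
        println!("{}", result);
    }
    // Declaring `result` outside the block and using it here would not
    // compile, even though the longer string is `outer`.
}
```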

One thing to notice is that we usually add 'a to all parameters, which is not always necessary. In the above case, it was needed because the returned value could be either x or y, and no one knows which until runtime. But if we are sure about which one will be returned, say x, we can specify the lifetime of x only:

fn longest<'a>(x: &'a str, y: &str) -> &'a str {
    x
}

We can also specify different lifetimes for different parameters, but note that the returned reference can carry only one of them.

fn longest<'a, 'b>(x: &'a str, y: &'b str) -> &'b str {
    y
}
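The benefit of separate lifetimes shows up in the caller. In this sketch, the hypothetical function `pick_second` has the same signature shape as the example above, and its first argument can be dropped while the result is still in use.

```rust
// Hypothetical function with the same signature shape as above: the
// result is tied to `y` only.
fn pick_second<'a, 'b>(_x: &'a str, y: &'b str) -> &'b str {
    y
}

fn main() {
    let y = String::from("kept");
    let result;
    {
        let x = String::from("temporary");
        result = pick_second(&x, &y);
    } // `x` is dropped here, which is fine because `result` borrows from `y`
    println!("{}", result);
}
```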

So, in conclusion, we should inform the compiler of every reference’s lifetime; otherwise, it will complain. Sometimes the compiler can infer it, and sometimes we have to state it explicitly.

Rust Tutorial - Lifetime Specifiers Explained

Having a hard time understanding lifetime specifiers in the Rust programming language on functions? You are not alone. Look no further, I explain them in plain terms with examples.
