Custom Search

Sunday, June 25, 2017

Introduction to Lifetimes in Rust


Abstract

A language that doesn't affect the way you think about programming is not worth knowing.
    -- Alan J. Perlis
Rust enhances it's variables with lifetime information, which I believe is unique among languages currently in production. This is the most exciting and best language feature I've encountered since recursion. It makes me think about aspects of a program I wasn't thinking about before and catches heisenbugs at compile time. I hope that explaining that to you will help you catch that excitement.

Some motivation

My motivation for looking into new programming languages is to try and become more productive, mostly by catching more bugs at compile time that are hard to find in a debugger. Functional languages bring two features that help with this: immutable variables that help prevent timing-related locking bugs, and algebraic data types that keep you from using error indicators as if they were data.
While Rust isn't a pure functional language like Haskell, it has both immutable variables and algebraic data types, so it makes the cut as far as I'm concerned. Even better, it has ownership, which prevents a wide variety of pointer bugs. That's the third major source of heisenbugs. Garbage collectors also do this, but can leak memory, and have a significant run-time cost that prevents them from being used in embedded systems with limited resources.
Rust ownership keeps track of the single variable that owns a value. When that variable goes out of scope, the value is freed. Reference variables can borrow a value. When those variables go out of scope, the borrow .

Some examples

So let's look at just about the simplest case where this shows up.

A buggy example

I have to apologize for using C here. But I need to demonstrate the type of bug Rust's lifetimes keeps you from writing.
So what do you think the following code outputs?
#include <stdio.h>

int main() {
     int *x;
     {
          int y = 1;
          x = &y;
     }
     {
          int z = 3;
          int *y = &z;
          printf("y points to the value %d\n", *y);
     } 
     printf("x points to the value %d\n", *x);
}
If you said 3, congratulations on being far to familiar with C. Because gcc 6.2.0 does print 3, even though that is clearly wrong. Turning on -Wall and -Wextra just gets a warning about y being unused.
This is a classic bad memory reference bug - x points to memory that's been freed when the printf runs. The second block serves no purpose but to overwrite that value to surface the bug.

Now in Rust

So let's rewrite this example in Rust. We don't need the second block for reasons that will become obvious shortly.
fn main() {
    let x;
    {
      let y = 1;
      x = &y;
    }
    println!("x is now {}", x)
}
Compiling this generates the error message:
error: `y` does not live long enough --> 1-error.rs:6:5
| 5 | x = &y; | - borrow occurs here 6 | } | ^ `y` dropped here while still borrowed 7 | println!("x is now {}", x) 8 | } | - borrowed value needs to live until here  
The first line of the error - and Rust errors are very informative, which sometimes leads to them being long - says that the value assigned to y doesn't live long enough. The next line tells us it has to live as long as the variable x - the "block suffix" is the rest of the block following the statement. The third line says it only lives as long as the value y, with similar wording. So indeed, not long enough.
This is the heart of lifetimes in Rust. The compiler keeps track of how long any value will be valid, and issues a warning if we try saving a reference to that value that could outlive the value. This prevents the same type of bugs that a garbage collector eliminates, but it's all done in the compiler, so there's no runtime overhead, meaning Rust is suitable for the same kind of platforms as C.

A bad fix

So let's fix the error. We'll just take the block delimiters out so both values live to the end of the function:
fn main() {
    let x;
    let y = 1;
    x = &y;

    println!("x is now {}", x)
}
But this doesn't work. Here's the error message:
error: `y` does not live long enough
 --> 1-error.rs:7:1
  |
4 |     x = &y;
  |          - borrow occurs here
...
7 | }
  | ^ `y` dropped here while still borrowed
  |
  = note: values in a scope are dropped in the opposite order they are created
We get pretty much the same error message. This illustrates an annoyance with lifetimes - variables are freed in the reverse order of declaration, and that's enough of an issue that the compiler complains.

A good fix

Fortunately, this is easy to fix: Just swap the two variable declarations.
fn main() {
    let y = 1;
    let x = &y;

    println!("x is now {}", x)
}
By assigning x after y is created, the lifetimes are fine.

Function calls

This is nice, but not all that interesting. So let's extend it to an external function:
fn id(x: &u32) -> &u32 {
    return &x;
}

fn main() {
    let x;
    let c1 = 1;

    x = id(&c1);
    println!("x is now {}", x)
}
And we see the same error as before:
error: `c1` does not live long enough
  --> 2-function.rs:11:1
   |
9  |     x = id(&c1);
   |             -- borrow occurs here
10 |     println!("x is now {}", x)
11 | }
   | ^ `c1` dropped here while still borrowed
   |
   = note: values in a scope are dropped in the opposite order they are created
Checking the same fix of swapping the two variable declarations:
fn id(x: &u32) -> &u32 {
    return &x;
}

fn main() {
    let c1 = 1;
    let x = id(&c1);

    println!("x is now {}", x)
}
And it works fine.

Names

As you can see, in this case the lifetime of the input value is used for the lifetime of the function's return value. Let's try an example without an input value:
fn id() -> &u32 {
    return &0;
}

fn main() {
    let x = id();

    println!("x is now {}", x)
}
This is an error that pretty much every new C programmer makes at least once: returning the address of a local variable. GCC only warns us about that. Rust gives us an error:
error[E0106]: missing lifetime specifier
 --> 3-zeroadic.rs:1:12
  |
1 | fn id() -> &u32 {
  |            ^ expected lifetime parameter
  |
  = help: this function's return type contains a borrowed value, but there is no value for it to be borrowed from
  = help: consider giving it a 'static lifetime
Here's another example of Rust's nice error messages, suggesting a `static lifetime.
fn id() -> &'static u32 {
    return &0;
}

fn main() {
    let x = id();

    println!("x is now {}", x)
}
The 'static added to the type of the return values is a lifetime declaration, which all start with '. They are used with reference types - those prefixed by & - to describe to the compiler what lifetime the variable being declared should have. In this case, we're using 'static, which is the only built-in lifetime. It declares that the value returned by the function has the same lifetime as the program.
But we still get an error:
error: borrowed value does not live long enough
 --> 3-zeroadic.rs:2:13
  |
2 |     return &0;
  |             ^- temporary value only lives until here
  |             |
  |             does not live long enough
  |
  = note: borrowed value must be valid for the static lifetime...
  = note: consider using a `let` binding to increase its lifetime
So let's try the let binding it suggested:
fn id() -> &'static u32 {
    let x = 0;
    return &x;
}

fn main() {
    let x = id();

    println!("x is now {}", x)
}
Given the previous examples, it shouldn't be a surprise that this doesn't work:
error: `x` does not live long enough
 --> 3-zeroadic.rs:3:13
  |
3 |     return &x;
  |             ^ does not live long enough
4 | }
  | - borrowed value only lives until here
  |
  = note: borrowed value must be valid for the static lifetime...
So let's make this a static variable:
fn zero() -> &'static u32 {
    static x: u32 = 0;
    return &x;
}


fn main() {
    let x = zero();

    println!("x is now {}", x)
}
This compiles, but gives us a warning:
warning: static variable `x` should have an upper case name such as `X`, #[warn(non_upper_case_globals)] on by default
 --> 3-zeroadic.rs:2:5
  |
2 |     static x: u32 = 0;
  |     ^^^^^^^^^^^^^^^^^^
It's complaining about violating a convention (that static variables have upper case names), and tells us how to suppress that warning. Another example of Rust's nice error messages.

Functions with lifetimes

That first function
fn id(x: &u32) -> &u32 {
    return &x;
}
uses lifetime elision, which means that the lifetime of a function's return value is the same as it's argument if there's only one argument, or the same as a self argument if it has one.
If we violate the argument count the other way we can see a more common use of lifetimes variables:
fn max(x: &u32, y: &u32) -> &u32 {
    if *x > *y {
        x
    } else {
        y
    }
}


fn main() {
    let x1 = 0;
    let x2 = 1;
    let x = max(&x1, &x2);

    println!("x is now {}", x)
}
And the compiler again complains about needing a lifetime specifier:
error[E0106]: missing lifetime specifier
 --> 4-binary.rs:1:29
  |
1 | fn max(x: &u32, y: &u32) -> &u32 {
  |                             ^ expected lifetime parameter
  |
  = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `x` or `y`
So we can fix that by changing the function's signature:
fn max<'a> (x: &'a u32, y: &'a u32) -> &'a u32 {
The first <'a> declares the lifetime 'a. That then gets used for the arguments, and in the return value the same way 'static was before. This says the lifetime of all three values will be the same.
With that change, things compile and work fine. But let's try an experiment in main:
fn main() {
    let x1 = 0;
    let x;
    let x2 = 1;
    x = max(&x1, &x2);

    println!("x is now {}", x)
}
In this case, the variable that the return value is assigned to has a lifetime in between the two variables that are passed in. So it should give us an error message for that one, but not the other. Sure enough, it does:
error: `x2` does not live long enough
  --> 5-misused.rs:16:1
   |
13 |     x = max(&x1, &x2);
   |                   -- borrow occurs here
...
16 | }
   | ^ `x2` dropped here while still borrowed
   |
   = note: values in a scope are dropped in the opposite order they are created

So let's try another experiment, and see how to declare multiple lifetimes:
fn max<'a, 'b> (x: &'a u32, y: &'b u32) -> &'a u32 {
    if *x > *y {
        x
    } else {
        y
    }
}
In this case, we declare two lifetimes, 'a and 'b. But we can return a value with a lifetime that doesn't match the function signature, and the compiler notices:
error[E0312]: lifetime of reference outlives lifetime of borrowed content...
 --> 6-multilifetime.rs:5:9
  |
5 |         y
  |         ^
  |
note: ...the reference is valid for the lifetime 'a as defined on the body at 1:51...
 --> 6-multilifetime.rs:1:52
  |
1 |   fn max<'a, 'b> (x: &'a u32, y: &'b u32) -> &'a u32 {
  |  ____________________________________________________^ starting here...
2 | |     if *x > *y {
3 | |         x
4 | |     } else {
5 | |         y
6 | |     }
7 | | }
  | |_^ ...ending here
note: ...but the borrowed content is only valid for the lifetime 'b as defined on the body at 1:51
 --> 6-multilifetime.rs:1:52
  |
1 |   fn max<'a, 'b> (x: &'a u32, y: &'b u32) -> &'a u32 {
  |  ____________________________________________________^ starting here...
2 | |     if *x > *y {
3 | |         x
4 | |     } else {
5 | |         y
6 | |     }
7 | | }
  | |_^ ...ending here
help: consider using an explicit lifetime parameter as shown: fn max<'a>(x: &'a u32, y: &'a u32) -> &'a u32
 --> 6-multilifetime.rs:1:1
  |
1 |   fn max<'a, 'b> (x: &'a u32, y: &'b u32) -> &'a u32 {
  |  _^ starting here...
2 | |     if *x > *y {
3 | |         x
4 | |     } else {
5 | |         y
6 | |     }
7 | | }
  | |_^ ...ending here
Did I mention that Rust's error messages can get a bit lengthy?

Structures

Structures (by which I mean composite values like structs, enums and tuples) are values, so have a lifetime associated with them when they are created. However, if a structure includes a reference, then that reference must have a lifetime associated with it. Here's a pretty vanilla definition of a polymorphic list type using algebraic data types:
enum List<T> {
    Cons(T, &List<T>),
    Nil,
}
And sure enough, this gives us an error message:
error[E0106]: missing lifetime specifier
 --> 7-list.rs:2:13
  |
2 |     Cons(T, &List<T>),
  |             ^ expected lifetime parameter
Saying that we need to provide a lifetime for the reference variable in the tuple. Fixing this requires providing a name for the lifetime of the structure, then stating that the type held in the structure as well as the reference variable both have at least that lifetime. It also changes the type signature of the List, which now includes the lifetime:
enum List<'a, T: 'a> {
    Cons(T, &'a List<'a, T>),
    Nil,
}
Type constructors - like Cons in the List example - have behavior similar to functions. The values passed to the constructor must all have lifetimes that exceed the lifetime of the constructed (instead of returned) value, and violating that will cause a compilation error.
There is one new behavior: we can mutate elements in a structure. The new value has the same lifetime constraint as the original value. So something like this:
fn main() {
    let x = 1;
    let nil = List::Nil;
    let mut list = List::Cons(&x, &nil);
    let y = 2;

    if let List::Cons(ref mut elem, _) = list {
        *elem = &y;
    }

    println!("linked list has length: {}", list.len());
    println!("{}", list.stringify());
}
Will also generate an error message:
error: `y` does not live long enough
  --> mutlist.rs:18:1
   |
13 |         *elem = &y;
   |                  - borrow occurs here
...
18 | }
   | ^ `y` dropped here while still borrowed
   |
   = note: values in a scope are dropped in the opposite order they are created
Similar to the function call example, without the need to contort the code to declare list before the value being referenced in it. The fix is the same, though - just change the order of the variable declarations.

Summary

To me, lifetimes are interesting because they catches bugs at compile time instead of blowing up at run time, and do it in a way that can be used for embedded systems. It's exciting because it makes me think about things I don't think about regularly when programming but probably should be, and provides a framework for reasoning about those things.
This hasn't completely covered Rust's lifetime features. You can declare lifetimes in object implementations, allowing lifetimes constraints that span multiple methods to be expressed.
Lifetimes are only part of Rust's ownership features. Reference tracking and borrowing are integral to handling ownership, and you really need to understand those to take proper advantage of it.
All in all, this is an interesting set of features that I hope catches on.