What are move semantics in Rust?

nalply · May 17, 2015 · Viewed 9.6k times

In Rust, there are two ways to take a reference:

  1. Borrow, i.e., take a reference but don't allow mutating the reference destination. The & operator borrows a value immutably; ownership stays with the owner.

  2. Borrow mutably, i.e., take a reference to mutate the destination. The &mut operator borrows a value mutably; ownership again stays with the owner.

The Rust documentation about borrowing rules says:

First, any borrow must last for a scope no greater than that of the owner. Second, you may have one or the other of these two kinds of borrows, but not both at the same time:

  • one or more references (&T) to a resource,
  • exactly one mutable reference (&mut T).
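
As a minimal sketch of the two quoted rules (the helper functions `push_one` and `total` are made up for illustration and are not from the Rust documentation), shared borrows can coexist, while a mutable borrow has to be the only one in use:

```rust
// Appends to the vector through a mutable borrow.
fn push_one(v: &mut Vec<i32>) {
    v.push(1);
}

// Reads the elements through a shared borrow.
fn total(v: &[i32]) -> i32 {
    v.iter().sum()
}

fn main() {
    let mut v = vec![1, 2, 3];

    // Any number of shared borrows may coexist.
    let a = &v;
    let b = &v;
    assert_eq!(total(a) + total(b), 12);

    // One mutable borrow is fine once the shared borrows are no longer used.
    push_one(&mut v);
    assert_eq!(total(&v), 7);

    // Mixing the two kinds at the same time is rejected at compile time:
    // let r = &v;
    // push_one(&mut v); // error[E0502]: cannot borrow `v` as mutable
    // println!("{:?}", r);
}
```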

I believe that taking a reference is creating a pointer to the value and accessing the value by the pointer. This could be optimized away by the compiler if there is a simpler equivalent implementation.

However, I don't understand what move means and how it is implemented.

For types implementing the Copy trait it means copying e.g. by assigning the struct member-wise from the source, or a memcpy(). For small structs or for primitives this copy is efficient.

And for move?

This question is not a duplicate of What are move semantics? because Rust and C++ are different languages and move semantics are different between the two.

Answer

Matthieu M. · May 17, 2015

Semantics

Rust implements what is known as an Affine Type System:

Affine types are a version of linear types imposing weaker constraints, corresponding to affine logic. An affine resource can only be used once, while a linear one must be used once.

Types that are not Copy, and are thus moved, are Affine Types: you may use them either once or never, nothing else.

Rust qualifies this as a transfer of ownership in its Ownership-centric view of the world (*).

(*) Some of the people working on Rust are much more qualified than I am in CS, and they knowingly implemented an Affine Type System; however, unlike Haskell, which exposes the math-y/cs-y concepts, Rust tends to expose more pragmatic concepts.

Note: from my reading, it could be argued that Affine Types returned from a function tagged with #[must_use] are actually Linear Types.
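
To make "once or never" concrete, here is a small sketch. The `Point` type and the `consume` function are hypothetical names chosen for illustration only; the commented-out line shows the kind of use that the compiler rejects.

```rust
// Hypothetical Copy type, for contrast with String.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Takes ownership of its argument: the caller's binding is moved from.
fn consume(s: String) -> usize {
    s.len()
}

fn main() {
    // String is not Copy, so assignment moves it.
    let s = String::from("hello");
    let t = s;
    // println!("{}", s); // error[E0382]: borrow of moved value: `s`

    // `t` is used once here; using it again afterwards would not compile.
    assert_eq!(consume(t), 5);

    // "Never" is also allowed: an unused value is simply dropped at end of scope.
    let _unused = String::from("dropped without being used");

    // Point is Copy, so the source stays usable after assignment.
    let p = Point { x: 1, y: 2 };
    let q = p;
    assert_eq!(p, q);
}
```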


Implementation

It depends. Please keep in mind that Rust is a language built for speed, and there are numerous optimization passes at play here, which will depend on the compiler used (rustc + LLVM, in our case).

Within a function body (playground):

fn main() {
    let s = "Hello, World!".to_string();
    let t = s;
    println!("{}", t);
}

If you check the LLVM IR (in Debug), you'll see:

%_5 = alloca %"alloc::string::String", align 8
%t = alloca %"alloc::string::String", align 8
%s = alloca %"alloc::string::String", align 8

%0 = bitcast %"alloc::string::String"* %s to i8*
%1 = bitcast %"alloc::string::String"* %_5 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* %0, i64 24, i32 8, i1 false)
%2 = bitcast %"alloc::string::String"* %_5 to i8*
%3 = bitcast %"alloc::string::String"* %t to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 24, i32 8, i1 false)

Underneath the covers, rustc invokes a memcpy from the result of "Hello, World!".to_string() to s and then to t. While this might seem inefficient, if you check the same IR in Release mode you will see that LLVM has completely elided the copies (realizing that s was unused).

The same situation occurs when calling a function: in theory you "move" the object into the function's stack frame; in practice, however, if the object is large, rustc might switch to passing a pointer instead.
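
A sketch of the function-call case, assuming a hypothetical 8 KiB type `Big`: the by-value and by-reference signatures differ in who owns the value afterwards, while how the argument is physically passed is left to the compiler and is not visible in the source.

```rust
// Hypothetical large type, used only for illustration.
struct Big {
    data: [u64; 1024],
}

// Takes Big by value: semantically, 8 KiB are moved into the callee.
fn sum_by_value(b: Big) -> u64 {
    b.data.iter().sum()
}

// Takes a shared reference: the caller keeps ownership.
fn sum_by_ref(b: &Big) -> u64 {
    b.data.iter().sum()
}

fn main() {
    let b = Big { data: [1; 1024] };

    // Borrowing leaves `b` usable afterwards.
    assert_eq!(sum_by_ref(&b), 1024);

    // Passing by value moves `b` into the function.
    assert_eq!(sum_by_value(b), 1024);
    // sum_by_ref(&b); // error[E0382]: borrow of moved value: `b`
}
```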

Another situation is returning from a function, but even then the compiler might apply "return value optimization" and build directly in the caller's stack frame -- that is, the caller passes a pointer into which to write the return value, which is used without intermediary storage.
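
For the return case, a sketch along the same lines (again with a hypothetical `Big` type and a made-up `make_big` constructor): the source simply returns the value, and whether it is built directly in the caller's frame is an optimization decision that this code does not control or observe.

```rust
// Hypothetical large type, used only for illustration.
struct Big {
    data: [u64; 1024],
}

// Returns Big by value. Semantically this moves the value to the caller;
// the compiler may construct it in place in the caller's stack frame.
fn make_big(fill: u64) -> Big {
    Big { data: [fill; 1024] }
}

fn main() {
    let b = make_big(7);
    assert_eq!(b.data.len(), 1024);
    assert_eq!(b.data[0], 7);
    assert_eq!(b.data[1023], 7);
}
```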

The ownership/borrowing constraints of Rust enable optimizations that are difficult to reach in C++ (which also has RVO but cannot apply it in as many cases).

So, the digest version:

  • moving large objects is inefficient, but there are a number of optimizations at play that might elide the move altogether
  • moving involves a memcpy of std::mem::size_of::<T>() bytes, so moving a large String is efficient because it only copies the String value itself (24 bytes in the IR above), whatever the size of the allocated buffer it holds onto
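
The last point can be checked with a short sketch. The helper `buffer_addresses` is a hypothetical name introduced here; it records the address of the heap buffer before and after a move to show that only the String value itself is copied.

```rust
use std::mem::{size_of, size_of_val};

// Moves the string into a new binding and reports the heap buffer address
// before and after the move.
fn buffer_addresses(s: String) -> (*const u8, *const u8) {
    let before = s.as_ptr();
    let t = s; // move: only the String value is copied, not the buffer
    let after = t.as_ptr();
    (before, after)
}

fn main() {
    let small = String::from("hi");
    let large = "x".repeat(1_000_000);

    // The String value has the same size regardless of the buffer length.
    assert_eq!(size_of_val(&small), size_of::<String>());
    assert_eq!(size_of_val(&large), size_of::<String>());
    println!("size_of::<String>() = {}", size_of::<String>());

    // The heap buffer stays where it is across the move.
    let (before, after) = buffer_addresses(large);
    assert_eq!(before, after);
}
```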