Programming WebAssembly with Rust, the future of the web?

WebAssembly, the future of the web?

WebAssembly is being billed as the future technology of the web, with the ability to run code in the browser at near native speeds. WebAssembly (often shortened to wasm) is a compact binary format that is compiled from languages like C and C++, as well as Rust — the language we will be demonstrating in this article.

Rust, being a relatively new language itself, has already built up a lot of support for developing, compiling and publishing WebAssembly modules, an effort largely led by the Mozilla Foundation. Development of both Rust and WebAssembly is ongoing, albeit with Rust in a more mature state. Nonetheless, we can expect the APIs discussed in this article to change over time; we’ll attempt to keep this piece up to date as releases are rolled out.

AssemblyScript, a strict subset of Typescript, is another interesting language aimed at helping Javascript developers start adopting Wasm, with a smaller learning curve for front-end developers than Rust or C++. It is a good example of the innovation happening in the WebAssembly ecosystem.

In this piece we will publish a wasm module that makes a fetch request to Github and returns the resulting JSON. This compiled wasm function will then be called from Javascript in a Create React App project.

The full project that coincides with this piece can be found here on Github.

The Rust WebAssembly tools

To build a wasm module with Rust, we’ll utilise two frameworks to package up the module and get the compiled WebAssembly to interact with the browser:

  • wasm-pack (Github): The “one-stop shop” for compiling Rust based WebAssembly for the web. wasm-pack is a CLI tool that can build, test and publish WebAssembly modules
  • wasm-bindgen (Github): A Rust library and CLI tool for facilitating interactions with Javascript and the DOM. In fact, the underlying APIs of wasm-bindgen provide bindings for all the Web APIs, making it possible to manipulate the DOM, listen to events, make fetch requests, use websockets, and more — all with Rust compiled wasm

It is important to stress that WebAssembly cannot directly access the DOM — yet. This will undoubtedly change within the next couple of years, as various proposals now in development address features like multi-threading and direct DOM manipulation, two of the major bottlenecks to WebAssembly adoption today.

For now, wasm-bindgen acts as the bridge for interacting with Javascript APIs, all of which have Rust bindings within the library. This is done with two underlying dependencies of wasm-bindgen: the js-sys and web-sys crates, which expose the entire Javascript standard library and the Web APIs respectively to Rust and WebAssembly.
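
To give a flavour of what this looks like, here is a minimal sketch of calling Web APIs through web-sys from Rust, adapted from the wasm-bindgen dom example. It assumes web-sys is a dependency with the relevant features (Window, Document, Element, HtmlElement, Node) enabled in Cargo.toml:

use wasm_bindgen::prelude::*;

// Runs when the wasm module is instantiated.
#[wasm_bindgen(start)]
pub fn run() -> Result<(), JsValue> {
    // window(), document() and body() are Rust bindings to the Web APIs.
    let window = web_sys::window().expect("no global `window` exists");
    let document = window.document().expect("should have a document on window");
    let body = document.body().expect("document should have a body");

    // Create a <p> element and append it to the page.
    let val = document.create_element("p")?;
    val.set_inner_html("Hello from Rust compiled WebAssembly!");
    body.append_child(&val)?;

    Ok(())
}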

To summarise this introduction, here is what we can most likely expect in the near term:

  • wasm-bindgen to dramatically speed up as more WebAssembly proposals are implemented and support is rolled out in browsers. We can also expect smaller module sizes as boilerplate code becomes unnecessary
  • wasm-bindgen will attempt to maintain its top-level APIs as features are rolled out, but the underlying web-sys will undoubtedly undergo major changes as direct DOM manipulation rolls out
  • js-sys will still be around for interacting with the Javascript standard library and Javascript modules, where the two languages will work hand in hand

This last point will most likely be the main use case of WebAssembly in the short term. Where libraries of modules (and entire web apps) have already been established in Javascript, there will be little incentive to re-build entire projects into WebAssembly.

However, what we can expect is WebAssembly based modules that do specific things very well, wrapped up as NPM packages and imported into Javascript projects. Modules like:

  • Computationally expensive tasks like crunching numbers, rendering 3D objects, or running machine learning algorithms. These tasks can run dramatically faster in wasm than their Javascript counterparts
  • Blockchain light clients and distributed network protocols. With blockchain clients, particularly Parity’s Ethereum and Substrate frameworks, compiling their existing source code into WebAssembly makes sense, having light clients run in the browser as imported modules. These also leverage speed and efficiency, while separating protocol level APIs from your UX code
  • With Rust’s built in memory safety and strict typing, we can also expect mission critical tools such as security features and live chat / real-time web environments to be implemented as WebAssembly modules, with a focus on stability

Javascript is not going anywhere

What does all this mean in terms of Javascript? At this point it is hard to see a future without Javascript being the dominant language of the web. Javascript is not going anywhere, and WebAssembly will more than likely be a complement to Javascript rather than a replacement, speeding up certain parts of applications and providing more ways for codebases from other languages to run in the browser.

With this understanding, let’s now jump into a real-world example. We’ll be slightly modifying the fetch request example from the wasm-bindgen examples hosted on Github.

We’ll then import this module asynchronously into a Create React App project and call our wasm function within a React component, providing webpack the means to recognise wasm based modules in the process without ejecting the project.

Installing Wasm Pack

For the sake of this piece we will clone the git repository that accompanies it, containing a wasm-pack generated Rust project with the fetch request code in place. We’ll also cover some basic setup steps.

A note on VS Code

Visual Studio Code has great support for Rust with the RLS extension. In addition, the WebAssembly extension provides wasm syntax highlighting and the ability to easily preview wasm binaries.

The following instructions assume that you already have Rust installed on your system. If not, head over to the Rust Installation page and download _rustup_.

Let’s firstly install the required CLI tools for our wasm endeavours, starting with wasm-pack. The latest installation instructions will be on their installation page, but installing the package only requires one curl request:

# install wasm-pack
curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh

A wasm-pack binary will be installed in your ~/.cargo/bin directory. The program contains the new, build, test and publish commands, the documentation of which can be found here.

Getting familiar with wasm-bindgen

It is also recommended to clone the entire wasm-bindgen repository, so all the example code is present on your system to test:

# clone wasm-bindgen
git clone https://github.com/rustwasm/wasm-bindgen

The examples/ folder contains a range of projects, albeit not configured to be built as a module. Each example can be viewed online and is accompanied with a dedicated documentation page. Refer to each example’s README.md file to get links to the live demo and documentation.

The simplest wasm-bindgen example is the console_log project, showcasing how to bind the console.log() Javascript function to a Rust function — the simplest implementation of which is the following:

use wasm_bindgen::prelude::*;

#[wasm_bindgen]
extern "C" {
    #[wasm_bindgen(js_namespace = console)]
    fn log(s: &str);
}

We must annotate wasm-bound blocks and functions with #[wasm_bindgen] to let the compiler know this code should have WebAssembly bindings generated for it. wasm_bindgen is what Rust terms an attribute; attributes are also the mechanism behind Conditional Compilation, which we visit shortly.
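
As a quick aside (not part of the console_log example), the same attribute works in the other direction, exporting a Rust function so it can be called from Javascript. A minimal sketch:

use wasm_bindgen::prelude::*;

// Annotating a public function with #[wasm_bindgen] exports it to Javascript;
// wasm-pack generates the JS glue for it when building the module.
#[wasm_bindgen]
pub fn greet(name: &str) -> String {
    format!("Hello, {}!", name)
}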

Note: You may see some WebAssembly examples wrapped in _extern {}_, while others are wrapped in _extern “C” {}_. There is currently no practical difference between the two. One concerned developer opened an issue about the discrepancy, which appears to stem from an auto-formatting feature of rustfmt.

Visiting conditional compilation

Rust has conditional compilation built in via attributes such as #[cfg], which determines whether a block of code should be compiled based on configuration options, some of which are built in, such as the platform your program is being compiled for. Here are a couple of examples taken from the documentation:

// This function is only compiled when compiling for macOS
#[cfg(target_os = "macos")]
fn macos_only() {
    // ...
}

// This function is only compiled when either foo or bar is defined
#[cfg(any(foo, bar))]
fn needs_foo_or_bar() {
    // ...
}

In the first example, target_os is one of a handful of built-in configuration options that determine which platform we’re compiling for. We can even define platform-specific dependencies within Cargo.toml; if I wanted to include macOS core libraries only on that platform, I could do the following in Cargo.toml:

# Cargo.toml
[target.'cfg(target_os = "macos")'.dependencies]
cocoa = "0.18.4"
core-foundation = "0.6"
core-graphics = "0.17.3"
...

You may also have noticed the #[test] attribute in your Rust projects too: functions annotated with #[test] are excluded from normal builds, and are compiled and run in conjunction with cargo test.
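
As a small illustration, a typical test module combines #[cfg(test)] with #[test], so the test code is only built and run by cargo test:

#[cfg(test)]
mod tests {
    #[test]
    fn it_adds() {
        // Runs only under `cargo test`.
        assert_eq!(2 + 2, 4);
    }
}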

So #[wasm_bindgen] is a custom attribute specifically for generating WebAssembly bindings. Revisiting the console.log example, we also use attribute arguments to specify the Javascript namespace to bind our function to:

#[wasm_bindgen(js_namespace = console)]
fn log(s: &str);
...

In the above example, we’re aiming to bind the log() function to the console.log() Javascript function.

On the subject of conditional compilation, WebAssembly projects also commonly use a feature cfg parameter, further filtering what is compiled based on features we define in Cargo.toml:

[features]
default = ["super_mode"]

These features can then be used within #[cfg]:

#[cfg(feature = "super_mode")]
fn super_execute(s: &str) {
   ...
}

Attributes can be fun to work with, adding flexibility to your code while accommodating edge cases you may run into — e.g. custom builds for specific clients. Both wasm-pack and wasm-bindgen rely heavily on these features for custom compilation into wasm. Read more about Attributes in Rust here.

Back to the console.log demo: its documentation also covers how console.log() is polymorphic and can accept multiple arguments. Because of this, we can bind multiple signatures that all call the console.log() function, using the js_name attribute parameter:

// a log function that takes an unsigned integer
#[wasm_bindgen(js_namespace = console, js_name = log)]
fn log_u32(a: u32);
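
Putting the bindings together, here is a minimal sketch (following the upstream console_log example) of several signatures all bound to console.log, plus a Rust function that calls them:

use wasm_bindgen::prelude::*;

#[wasm_bindgen]
extern "C" {
    // All three bindings call the same console.log, with different signatures.
    #[wasm_bindgen(js_namespace = console)]
    fn log(s: &str);

    #[wasm_bindgen(js_namespace = console, js_name = log)]
    fn log_u32(a: u32);

    #[wasm_bindgen(js_namespace = console, js_name = log)]
    fn log_many(a: &str, b: &str);
}

#[wasm_bindgen]
pub fn demo() {
    log("a string");
    log_u32(42);
    log_many("two", "arguments");
}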

This is a great example of how Rust can bind to Javascript. It is the opposite direction to what we are attempting with our fetch example, where Rust compiled wasm is called from the Javascript side.

Beyond running these examples from the repository, it is also good practice to copy functionality from them into your own wasm-pack projects, remembering to include the required dependencies in your own Cargo.toml to match the functions you copy over.

With some Rust understanding cemented, let’s now move onto our wasm fetch demo.

Wasm Fetch Project Structure

Clone the following repository to fetch the example demo we’ll walk through next, which hosts both the Rust module and the React app client:

git clone https://github.com/rossbulat/wasm-fetch-example

The wasm-module directory contains the Rust project, whereas the client directory contains the React client.

I have packaged both projects into a single repository for convenience. In practice it is recommended to keep them in separate repositories, with each project’s config files at the top level (_Cargo.toml_, _package.json_, etc.).

Inside wasm-module

wasm-pack new has been run to generate the project structure, and the fetch functionality has been plugged into our lib.rs.

Upon calling the call_fetch() function (which we’ll do from Javascript), a fetch request is made to the Github wasm-bindgen repository, fetching the latest commits from the repo. The resulting JSON is returned, ready to be deconstructed on the Javascript side.

The folder structure is simple, with our Rust source files within the src/ directory, and Cargo.toml outlining the project dependencies (in the form of Cargo crates) and some initial configuration:

# Rust project structure
src/
   fetch.rs
   lib.rs
   utils.rs
.gitignore
Cargo.toml
...

These are the files we are interested in. Let’s drill down into what this project consists of:

  • The src/ folder contains our Rust code to be compiled. The meat of the project is in lib.rs, with call_fetch() as the function we wish to call from Javascript. This file contains a range of use statements to bring various libraries into scope, including the required wasm_bindgen Javascript bindings and types
  • src/fetch.rs simply contains some structs that will be used to store returned fetch data (see the sketch after this list). In the original wasm-bindgen fetch example, these structs were also defined within lib.rs, along with the rest of the example code. For the sake of readability, I have opted to separate them from the main execution
  • Cargo.toml is a key file to understand, defining the dependencies and “features” to be used in the project. We will visit how these work further down
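
As referenced above, here is a sketch of the kind of structs fetch.rs defines to hold the deserialised Github response, adapted from the upstream wasm-bindgen fetch example. The exact fields in the repository may differ, and serde (with its derive feature) is assumed as a dependency in Cargo.toml:

use serde::{Deserialize, Serialize};

// Mirrors the shape of the Github API response we care about.
#[derive(Debug, Serialize, Deserialize)]
pub struct Branch {
    pub name: String,
    pub commit: Commit,
}

#[derive(Debug, Serialize, Deserialize)]
pub struct Commit {
    pub sha: String,
    pub commit: CommitDetails,
}

#[derive(Debug, Serialize, Deserialize)]
pub struct CommitDetails {
    pub author: Signature,
    pub committer: Signature,
}

#[derive(Debug, Serialize, Deserialize)]
pub struct Signature {
    pub name: String,
    pub email: String,
}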

The execution flow within call_fetch() calls a Javascript fetch request from the wasm-bindgen bindings, before converting the returned Promise into a Rust Future — the Rust equivalent of a JS promise. Once a response is received from Github, it is persisted and formatted via the structs defined in fetch.rs.

This is the full execution flow:

# fetch request execution flow
-> JS calls call_fetch()
-> Fetch request is called via wasm-bindgen bindings, returns promise
-> JS Promise converted into Rust Future
-> Await Github response and store via provided structs
-> Rust Future converted back to JS Promise
-> Return response for use in JS
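
As a hedged sketch of what this flow looks like in lib.rs, adapted from the upstream wasm-bindgen fetch example (the repository's exact code, including the URL fetched, may differ; the web-sys Headers, Request, RequestInit, RequestMode, Response and Window features are assumed in Cargo.toml):

use wasm_bindgen::prelude::*;
use wasm_bindgen::JsCast;
use wasm_bindgen_futures::JsFuture;
use web_sys::{Request, RequestInit, RequestMode, Response};

#[wasm_bindgen]
pub async fn call_fetch() -> Result<JsValue, JsValue> {
    // Build the request options via the web-sys bindings.
    let mut opts = RequestInit::new();
    opts.method("GET");
    opts.mode(RequestMode::Cors);

    let url = "https://api.github.com/repos/rustwasm/wasm-bindgen/branches/master";
    let request = Request::new_with_str_and_init(url, &opts)?;
    request.headers().set("Accept", "application/vnd.github.v3+json")?;

    // window.fetch() returns a JS Promise; JsFuture turns it into a Rust Future.
    let window = web_sys::window().unwrap();
    let resp_value = JsFuture::from(window.fetch_with_request(&request)).await?;
    let resp: Response = resp_value.dyn_into()?;

    // response.json() is another Promise; await it and hand the JSON back to JS.
    let json = JsFuture::from(resp.json()?).await?;
    Ok(json)
}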

As we explored earlier, this process will become more simplified as the WebAssembly specification becomes more capable.

Building the module with wasm-pack build

When running wasm-pack build, the compiled result will be output to a pkg/ directory in your project folder. We can in fact do this now to examine the result:

# build project
cd wasm-fetch-example/wasm-module
wasm-pack build
> [INFO]: 🎯  Checking for the Wasm target...
> [INFO]: 🌀  Compiling to Wasm...
> Compiling proc-macro2 v0.4.30
  ...

The project build time will depend on your system. The resulting pkg/ folder will contain our compiled module:

# pkg contents 
pkg/
   README.md   
   wasm_fetch_example.d.ts  
   wasm_fetch_example_bg.d.ts
   package.json  
   wasm_fetch_example.js  
   wasm_fetch_example_bg.wasm

We can see that wasm-pack does not simply output a .wasm binary for us:

  • A package.json has been generated, treating this directory as a module itself, ready to be published to NPM or another registry (we’ll briefly cover publishing the module to a private registry further down)
  • Type definition files (.d.ts) have been generated for your module and its binding functions, in the event you are importing the module into a Typescript based project. These files contain types for every export of our wasm module — constants, functions, classes, etc.
  • wasm_fetch_example.js contains Javascript bindings to the wasm module itself
  • Your README.md will be copied to the package also, to provide documentation about the module

wasm-pack has done the work of formatting the project ready to be published as a module, and supports both Javascript and its Typescript superset. Upon compiling, you may see this warning in the console:

> Optional fields missing from Cargo.toml: 'description', 'repository', and 'license'. These are not necessary, but recommended

wasm-pack will attempt to take this information from Cargo.toml to populate package.json. You could indeed slot in this information in Cargo.toml under the [package] section:

# Cargo.toml
[package]
description = "An example Rust based WebAssembly project implementing a fetch request via wasm-bindgen."
license = "MIT"
repository = "<your_repo_url>"
...

Note: Subsequent builds are a lot quicker, only re-compiling the changes you have made.

Publishing wasm modules

In the event you wish to publicly publish your WebAssembly module, you can indeed do so with the pack and publish commands wasm-pack provides:

  • wasm-pack pack creates a tarball from the pkg/ directory
  • wasm-pack publish creates a tarball from the pkg/ directory, and then publishes it to the public NPM registry

The _publish_ command is simply a pointer to the _npm publish_ command, the official means of publishing to an NPM registry. So all the flags available to _npm publish_ are also available to _wasm-pack publish_.

# publish to the public npm registry
wasm-pack publish

It is more likely that you’ll want to publish your module to a private registry. I have published an article on exactly how to set up such a registry, using the private proxy registry Verdaccio.

Once you have a private registry set up, simply provide the URL with the --registry flag:

# publish to a private registry
wasm-pack publish --registry "http://<your_registry_ip_or_domain>"

This provides a means of testing WebAssembly libraries internally — a more realistic scenario for organisations that are iteratively developing WebAssembly modules as the specification evolves.

wasm-pack has now played its part; its role stops after the publishing process, with our module ready to be added to projects as a dependency. Let’s now explore how to use wasm modules in a React project.

Importing wasm modules

To test the fetch functionality, I have included a base Create React App project in the accompanying repository of this piece.

To save the reader from publishing the wasm module themselves for use with this project, I have included it within node_modules, ignoring the entire directory apart from this module:

// .gitignore
client/node_modules/*
!/client/node_modules/wasm-fetch-example
...

Amending Create React App to support wasm modules

Create React App does not currently support WebAssembly based modules in its Webpack configuration.

Note: ECMAScript module integration is currently an active WebAssembly proposal — we can expect a more streamlined integration process once these features are finalised and rolled out.

This again is most likely a short term issue, and will be resolved as WebAssembly gains more adoption — but there is a solution.

In order to support our newly published module, we need to amend the Webpack configuration of Create React App, adding a wasm loader. We can indeed do this, without ejecting CRA, with a package called react-app-rewired, along with wasm-loader, adding WebAssembly support to Webpack.

These have been installed as dev dependencies:

yarn add react-app-rewired wasm-loader --dev

A config-overrides.js script has been defined in the client’s root directory, which plugs in support for wasm based modules.

The last amendment here is in package.json, where we are calling react-app-rewired instead of react-scripts when compiling and running the app:

// package.json
"script": {
   "start": "react-app-rewired start",
   "build": "react-app-rewired build",
   "test": "react-app-rewired test",
   ...
}

Asynchronously importing a wasm module

App.tsx demonstrates how we can asynchronously import a wasm module and load it into a component’s state.

Here is the full solution:

import React from 'react';
import logo from './logo.svg';
import './App.css';

const App: React.FC = () => {

  // wasm module will be stored in state once loaded
  const [wasmModule, setWasmModule] = React.useState();
  
  // asynchronous function to fetch module and load into state
  const loadWasm = async () => {
    try {
      const wasm = await import('wasm-fetch-example');
      setWasmModule({ wasm });
      console.log('wasm set');
    } catch (err) {
      console.error(`Unexpected error in loadWasm. [Message: ${err.message}]`);
    }
  };

  // takes our module and calls the call_fetch() function
  const callFetch = async ({ wasm }: { wasm: any }) => {
    console.log('calling fetch');
    const res = await wasm.call_fetch();
    console.log(res);
  }

  // load wasm asynchronously if not yet defined
  wasmModule === undefined && loadWasm();

  // call fetch once module has imported
  if (wasmModule !== undefined) {
    callFetch(wasmModule);
  }
  
  return (
    <div className="App">
    </div>
  )
}

export default App;

App.tsx

In Summary

This article has aimed to introduce the process of building a WebAssembly module from Rust, with wasm-pack as our means of generating and compiling the final module.

In this piece, we have ascertained:

  • That wasm-bindgen provides us with bindings to the Javascript standard library and standard Web API library, giving us access to call Javascript, manipulate the DOM, get window and event data, etc — all from our Rust wasm module
  • wasm-pack is a useful tool for generating a bare-bones Rust based WebAssembly project, and automates the process of compiling wasm and preparing the resulting package to be published as a module
  • Create React App by default does not currently support .wasm module extensions in its Webpack config. To get WebAssembly into your components, the react-app-rewired package can be used to plug a wasm-loader into the Webpack configuration, extending that of CRA’s configuration, without the need to eject the project
  • Importing wasm modules asynchronously via Promises will not interrupt the flow of your app. Loading indicators or prompts can be used to let the user know your module is being loaded into state.

The Rust Programming Language - Understanding Loops in Rust

In this Rust programming language tutorial, we'll cover loops in Rust: the loop, while and for constructs, along with enumerate, ending iteration early, and loop labels.

Rust currently provides three approaches to performing some kind of iterative activity. They are: loop, while and for. Each approach has its own set of uses.

loop

The infinite loop is the simplest form of loop available in Rust. Using the keyword loop, Rust provides a way to loop indefinitely until some terminating statement is reached. Rust's infinite loops look like this:

loop {
    println!("Loop forever!");
}

while

Rust also has a while loop. It looks like this:


let mut x = 5; // mut x: i32
let mut done = false; // mut done: bool

while !done {
    x += x - 3;

    println!("{}", x);

    if x % 5 == 0 {
        done = true;
    }
}

while loops are the correct choice when you’re not sure how many times you need to loop.

If you need an infinite loop, you may be tempted to write this:

while true {

However, loop is far better suited to handle this case:

loop {

Rust’s control-flow analysis treats this construct differently than a while true, since we know that it will always loop. In general, the more information we can give to the compiler, the better it can do with safety and code generation, so you should always prefer loop when you plan to loop infinitely.
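
A small sketch (not from the original text) that makes this concrete: the compiler can prove a binding is initialised inside a loop body, but not inside a while true body:

fn main() {
    // With `loop`, the compiler knows the body is always entered, so it can
    // prove that `x` is initialised before it is used below.
    let x;
    loop {
        x = 5;
        break;
    }
    println!("{}", x);

    // Writing the same thing with `while true` is rejected with error E0381
    // (use of a possibly-uninitialised variable), because the condition of a
    // `while` is not analysed as always being true.
}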

for

The for loop is used to loop a particular number of times. Rust’s for loops work a bit differently than in other systems languages, however. Rust’s for loop doesn’t look like this “C-style” for loop:

for (x = 0; x < 10; x++) {
    printf( "%d\n", x );
}

Instead, it looks like this:


for x in 0..10 {
    println!("{}", x); // x: i32
}

In slightly more abstract terms,

for var in expression {
    code
}

The expression is an item that can be converted into an iterator using IntoIterator. The iterator gives back a series of elements, one element per iteration of the loop. That value is then bound to the name var, which is valid for the loop body. Once the body is over, the next value is fetched from the iterator, and we loop another time. When there are no more values, the for loop is over.

In our example, 0..10 is an expression that takes a start and an end position, and gives an iterator over those values. The upper bound is exclusive, though, so our loop will print 0 through 9, not 10.

Rust does not have the “C-style” for loop on purpose. Manually controlling each element of the loop is complicated and error prone, even for experienced C developers.
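
The expression after in is not limited to ranges; anything that implements IntoIterator works. For example, a small sketch iterating over a vector:

fn main() {
    let names = vec!["alpha", "beta", "gamma"];

    // Iterating over a reference to the vector avoids moving it, so `names`
    // can still be used after the loop.
    for name in &names {
        println!("{}", name);
    }
}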

Enumerate

When you need to keep track of how many times you have already looped, you can use the .enumerate() function.

On ranges:


for (index, value) in (5..10).enumerate() {
    println!("index = {} and value = {}", index, value);
}

Outputs:

index = 0 and value = 5
index = 1 and value = 6
index = 2 and value = 7
index = 3 and value = 8
index = 4 and value = 9

Don't forget to add the parentheses around the range.

On iterators:


let lines = "hello\nworld".lines();

for (linenumber, line) in lines.enumerate() {
    println!("{}: {}", linenumber, line);
}

Outputs:

0: hello
1: world

Ending iteration early

Let’s take a look at that while loop we had earlier:


let mut x = 5;
let mut done = false;

while !done {
    x += x - 3;

    println!("{}", x);

    if x % 5 == 0 {
        done = true;
    }
}

We had to keep a dedicated mut boolean variable binding, done, to know when we should exit out of the loop. Rust has two keywords to help us with modifying iteration: break and continue.

In this case, we can write the loop in a better way with break:


let mut x = 5;

loop {
    x += x - 3;

    println!("{}", x);

    if x % 5 == 0 { break; }
}

We now loop forever with loop and use break to break out early. Issuing an explicit return statement will also serve to terminate the loop early.

continue is similar, but instead of ending the loop, it goes to the next iteration. This will only print the odd numbers:


for x in 0..10 {
    if x % 2 == 0 { continue; }

    println!("{}", x);
}

Loop labels

You may also encounter situations where you have nested loops and need to specify which one your break or continue statement is for. Like most other languages, Rust's break or continue apply to the innermost loop. In a situation where you would like to break or continue for one of the outer loops, you can use labels to specify which loop the break or continue statement applies to.

In the example below, we continue to the next iteration of the outer loop when x is even, while we continue to the next iteration of the inner loop when y is even. So it will execute the println! only when both x and y are odd.


'outer: for x in 0..10 {
    'inner: for y in 0..10 {
        if x % 2 == 0 { continue 'outer; } // Continues the loop over `x`.
        if y % 2 == 0 { continue 'inner; } // Continues the loop over `y`.
        println!("x: {}, y: {}", x, y);
    }
}

The Rust Programming Language - Understanding If in Rust

Rust’s take on if is not particularly complex, but it’s much more like the if you’ll find in a dynamically typed language than in a more traditional systems language. So let’s talk about it, to make sure you grasp the nuances.

if is a specific form of a more general concept, the ‘branch’, whose name comes from a branch in a tree: a decision point, where depending on a choice, multiple paths can be taken.

In the case of if, there is one choice that leads down two paths:


let x = 5;

if x == 5 {
    println!("x is five!");
}

If we changed the value of x to something else, this line would not print. More specifically, if the expression after the if evaluates to true, then the block is executed. If it’s false, then it is not.

If you want something to happen in the false case, use an else:


let x = 5;

if x == 5 {
    println!("x is five!");
} else {
    println!("x is not five :(");
}

If there is more than one case, use an else if:


let x = 5;

if x == 5 {
    println!("x is five!");
} else if x == 6 {
    println!("x is six!");
} else {
    println!("x is not five or six :(");
}

This is all pretty standard. However, you can also do this:


let x = 5;

let y = if x == 5 {
    10
} else {
    15
}; // y: i32

Which we can (and probably should) write like this:


let x = 5;

let y = if x == 5 { 10 } else { 15 }; // y: i32

This works because if is an expression. The value of the expression is the value of the last expression in whichever branch was chosen. An if without an else always results in () as the value.
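
A small sketch showing why the else matters when using if as an expression: both branches must produce the same type, and an if without an else produces ():

fn main() {
    let x = 5;

    // This would not compile: the missing `else` branch evaluates to `()`,
    // which does not match the integer produced by the `if` branch.
    // let y = if x == 5 { 10 };

    let y = if x == 5 { 10 } else { 15 };
    println!("{}", y); // prints 10
}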

The Rust Programming Language - Understanding Functions in Rust

Functions are the building blocks of readable, maintainable, and reusable code.

Every Rust program has at least one function, the main function:

fn main() {
}

This is the simplest possible function declaration. As we mentioned before, fn says ‘this is a function’, followed by the name, some parentheses because this function takes no arguments, and then some curly braces to indicate the body. Here’s a function named foo:


fn foo() {
}

So, what about taking arguments? Here’s a function that prints a number:


fn print_number(x: i32) {
    println!("x is: {}", x);
}

Here’s a complete program that uses print_number:

fn main() {
    print_number(5);
}

fn print_number(x: i32) {
    println!("x is: {}", x);
}

As you can see, function arguments work very similarly to let declarations: you add a type to the argument name, after a colon.

Here’s a complete program that adds two numbers together and prints them:

fn main() {
    print_sum(5, 6);
}

fn print_sum(x: i32, y: i32) {
    println!("sum is: {}", x + y);
}

You separate arguments with a comma, both when you call the function, as well as when you declare it.

Unlike let, you must declare the types of function arguments. This does not work:

fn print_sum(x, y) {
    println!("sum is: {}", x + y);
}

You get this error:

expected one of `!`, `:`, or `@`, found `)`
fn print_sum(x, y) {

This is a deliberate design decision. While full-program inference is possible, languages which have it, like Haskell, often suggest that documenting your types explicitly is a best-practice. We agree that forcing functions to declare types while allowing for inference inside of function bodies is a wonderful sweet spot between full inference and no inference.

What about returning a value? Here’s a function that adds one to an integer:


fn add_one(x: i32) -> i32 {
    x + 1
}

Rust functions return exactly one value, and you declare the type after an ‘arrow’, which is a dash (-) followed by a greater-than sign (>). The last line of a function determines what it returns. You’ll note the lack of a semicolon here. If we added it in:

fn add_one(x: i32) -> i32 {
    x + 1;
}

We would get an error:

error: not all control paths return a value
fn add_one(x: i32) -> i32 {
     x + 1;
}

help: consider removing this semicolon:
     x + 1;
          ^

This reveals two interesting things about Rust: it is an expression-based language, and semicolons are different from semicolons in other ‘curly brace and semicolon’-based languages. These two things are related.

Expressions vs. Statements

Rust is primarily an expression-based language. There are only two kinds of statements, and everything else is an expression.

So what's the difference? Expressions return a value, and statements do not. That’s why we end up with ‘not all control paths return a value’ here: the statement x + 1; doesn’t return a value. There are two kinds of statements in Rust: ‘declaration statements’ and ‘expression statements’. Everything else is an expression. Let’s talk about declaration statements first.

In some languages, variable bindings can be written as expressions, not statements. Like Ruby:

x = y = 5

In Rust, however, using let to introduce a binding is not an expression. The following will produce a compile-time error:

let x = (let y = 5); // Expected identifier, found keyword `let`.

The compiler is telling us here that it was expecting to see the beginning of an expression, and a let can only begin a statement, not an expression.

Note that assigning to an already-bound variable (e.g. y = 5) is still an expression, although its value is not particularly useful. Unlike other languages where an assignment evaluates to the assigned value (e.g. 5 in the previous example), in Rust the value of an assignment is an empty tuple () because the assigned value can have only one owner, and any other returned value would be too surprising:


let mut y = 5;

let x = (y = 6);  // `x` has the value `()`, not `6`.

The second kind of statement in Rust is the expression statement. Its purpose is to turn any expression into a statement. In practical terms, Rust's grammar expects statements to follow other statements. This means that you use semicolons to separate expressions from each other. This means that Rust looks a lot like most other languages that require you to use semicolons at the end of every line, and you will see semicolons at the end of almost every line of Rust code you see.

What is this exception that makes us say "almost"? You saw it already, in this code:


fn add_one(x: i32) -> i32 {
    x + 1
}

Our function claims to return an i32, but with a semicolon, it would return () instead. Rust realizes this probably isn’t what we want, and suggests removing the semicolon in the error we saw before.

Early returns

But what about early returns? Rust does have a keyword for that, return:


fn foo(x: i32) -> i32 {
    return x;

    // We never run this code!
    x + 1
}

Using a return as the last line of a function works, but is considered poor style:


fn foo(x: i32) -> i32 {
    return x + 1;
}

The previous definition without return may look a bit strange if you haven’t worked in an expression-based language before, but it becomes intuitive over time.

Diverging functions

Rust has some special syntax for ‘diverging functions’, which are functions that do not return:


fn diverges() -> ! {
    panic!("This function never returns!");
}

panic! is a macro, similar to println!() that we’ve already seen. Unlike println!(), panic!() causes the current thread of execution to crash with the given message. Because this function will cause a crash, it will never return, and so it has the type ‘!’, which is read ‘diverges’.
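
For reference (a small addition for clarity), a complete program that produces the output shown below might look like this:

fn diverges() -> ! {
    panic!("This function never returns!");
}

fn main() {
    // The program panics as soon as diverges() is called.
    diverges();
}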

If you add a main function that calls diverges() and run it, you’ll get some output that looks like this:

thread ‘main’ panicked at ‘This function never returns!’, hello.rs:2

If you want more information, you can get a backtrace by setting the RUST_BACKTRACE environment variable:

$ RUST_BACKTRACE=1 ./diverges
thread 'main' panicked at 'This function never returns!', hello.rs:2
Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
  hello::diverges
        at ./hello.rs:2
  hello::main
        at ./hello.rs:6

If you want the complete backtrace and filenames:

$ RUST_BACKTRACE=full ./diverges
thread 'main' panicked at 'This function never returns!', hello.rs:2
stack backtrace:
   1:     0x7f402773a829 - sys::backtrace::write::h0942de78b6c02817K8r
   2:     0x7f402773d7fc - panicking::on_panic::h3f23f9d0b5f4c91bu9w
   3:     0x7f402773960e - rt::unwind::begin_unwind_inner::h2844b8c5e81e79558Bw
   4:     0x7f4027738893 - rt::unwind::begin_unwind::h4375279447423903650
   5:     0x7f4027738809 - diverges::h2266b4c4b850236beaa
   6:     0x7f40277389e5 - main::h19bb1149c2f00ecfBaa
   7:     0x7f402773f514 - rt::unwind::try::try_fn::h13186883479104382231
   8:     0x7f402773d1d8 - __rust_try
   9:     0x7f402773f201 - rt::lang_start::ha172a3ce74bb453aK5w
  10:     0x7f4027738a19 - main
  11:     0x7f402694ab44 - __libc_start_main
  12:     0x7f40277386c8 - <unknown>
  13:                0x0 - <unknown>

If you need to override an already set RUST_BACKTRACE, in cases when you cannot just unset the variable, then set it to 0 to avoid getting a backtrace. Any other value (even no value at all) turns on backtrace.

$ export RUST_BACKTRACE=1
...
$ RUST_BACKTRACE=0 ./diverges 
thread 'main' panicked at 'This function never returns!', hello.rs:2
note: Run with `RUST_BACKTRACE=1` for a backtrace.

RUST_BACKTRACE also works with Cargo’s run command:

$ RUST_BACKTRACE=full cargo run
     Running `target/debug/diverges`
thread 'main' panicked at 'This function never returns!', hello.rs:2
stack backtrace:
   1:     0x7f402773a829 - sys::backtrace::write::h0942de78b6c02817K8r
   2:     0x7f402773d7fc - panicking::on_panic::h3f23f9d0b5f4c91bu9w
   3:     0x7f402773960e - rt::unwind::begin_unwind_inner::h2844b8c5e81e79558Bw
   4:     0x7f4027738893 - rt::unwind::begin_unwind::h4375279447423903650
   5:     0x7f4027738809 - diverges::h2266b4c4b850236beaa
   6:     0x7f40277389e5 - main::h19bb1149c2f00ecfBaa
   7:     0x7f402773f514 - rt::unwind::try::try_fn::h13186883479104382231
   8:     0x7f402773d1d8 - __rust_try
   9:     0x7f402773f201 - rt::lang_start::ha172a3ce74bb453aK5w
  10:     0x7f4027738a19 - main
  11:     0x7f402694ab44 - __libc_start_main
  12:     0x7f40277386c8 - <unknown>
  13:                0x0 - <unknown>

A diverging function can be used as any type:


let x: i32 = diverges();
let x: String = diverges();

Function pointers

We can also create variable bindings which point to functions:


let f: fn(i32) -> i32;

f is a variable binding which points to a function that takes an i32 as an argument and returns an i32. For example:


fn plus_one(i: i32) -> i32 {
    i + 1
}

// Without type inference:
let f: fn(i32) -> i32 = plus_one;

// With type inference:
let f = plus_one;

We can then use f to call the function:


let six = f(5);