September 4, 2023

Rust Study Notes

rust

Lifetime

The compiler applies three rules to infer lifetimes that are not written out explicitly (lifetime elision):

  • The compiler assigns its own lifetime parameter to each parameter that's a reference.
  • If there is exactly one input lifetime parameter, that lifetime is assigned to all output lifetime parameters.
  • If there are multiple input lifetime parameters, but one of them is &self or &mut self because this is a method, the lifetime of self is assigned to all output lifetime parameters.

Note that

  • A lifetime that comes from an input reference (a parameter) is called an input lifetime;
  • a lifetime that comes from an output reference (the return value) is called an output lifetime.
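
As a rough illustration of the three rules, the sketch below pairs elided signatures with the lifetimes the compiler is expected to fill in. The names first_word, Parser, rest_after and longest are made up for this note and do not come from any library.

```rust
// Rules 1 and 2: there is one input lifetime, so the output borrows from `s`.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// The same signature with the inferred lifetime written out.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

// Hypothetical type used only to show the method rule.
struct Parser {
    text: String,
}

impl Parser {
    // Rule 3: two input lifetimes (`&self` and `prefix`),
    // the output gets the lifetime of `&self`.
    fn rest_after(&self, prefix: &str) -> &str {
        self.text
            .strip_prefix(prefix)
            .unwrap_or(self.text.as_str())
    }
}

// No rule applies here (two inputs, no `self`), so we must annotate by hand.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() {
        x
    } else {
        y
    }
}
```

Removing the annotations from longest should make the compiler ask for an explicit lifetime, because none of the three rules can decide which input the output borrows from.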

To sum up, we always expect:

Result Type

We can loosely think of the Result type as a Promise in JavaScript (Result is a plain value, nothing asynchronous is involved), in which we have

  • return Ok(...); = resolve(...); and
  • return Err(...); = reject(...);.
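
A minimal sketch of the analogy, assuming a hypothetical parse_port helper that is not part of the notes: the function "resolves" with Ok and "rejects" with Err, and the caller handles both outcomes in a match, much like .then(...) and .catch(...).

```rust
// Hypothetical helper: parse a TCP port and reject port 0.
fn parse_port(s: &str) -> Result<u16, String> {
    match s.trim().parse::<u16>() {
        Ok(0) => Err("port 0 is not allowed".to_string()),
        Ok(port) => Ok(port),         // comparable to resolve(port)
        Err(e) => Err(e.to_string()), // comparable to reject(message)
    }
}

// The caller decides what to do with each outcome.
fn describe(s: &str) -> String {
    match parse_port(s) {
        Ok(port) => format!("listening on {}", port),
        Err(msg) => format!("bad port: {}", msg),
    }
}
```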

The generic type parameters of Result follow the rule:

Result<type returned by Ok, type returned by Err>
struct Config<'a> {
    query: &'a String,
    filename: &'a String,
}

impl<'a> Config<'a> {
    fn new(args: &'a [String]) -> Result<Config<'a>, &'static str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }
        let query = &args[1];
        let filename = &args[2];
        Ok(Config { query, filename })
    }
}

There are two ways to squeeze the Config out of Result enum:

Extraction Method 1: Squeezing by Unwrap

In our program we can unwrap the Result and handle the error gracefully:

1  let args: Vec<String> = env::args().collect();
2  let config = Config::new(&args).unwrap_or_else(|err| {
3      println!("Problem parsing arguments: {}", err);
4      process::exit(1);
5  });
6  run(config);

From line 6 onwards, config is a Config rather than a Result.

Takeaway. We can squeeze a Result<T, E> into a T by calling unwrap() (or a variant such as unwrap_or_else()) once. Note that plain unwrap() panics when the value is an Err.
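
To make the takeaway concrete, here is a small sketch of two unwrap variants from the standard library. The function names parse_count, count_or_zero and count_or are invented for illustration.

```rust
use std::num::ParseIntError;

// Hypothetical helper that may fail.
fn parse_count(s: &str) -> Result<u32, ParseIntError> {
    s.trim().parse::<u32>()
}

// unwrap_or_else: on Err, run a closure and use its return value instead.
fn count_or_zero(s: &str) -> u32 {
    parse_count(s).unwrap_or_else(|err| {
        eprintln!("Problem parsing count: {}", err);
        0
    })
}

// unwrap_or: fall back to a fixed default without inspecting the error.
fn count_or(s: &str, default: u32) -> u32 {
    parse_count(s).unwrap_or(default)
}
```

In both cases the caller receives a plain u32, so no Result needs to be threaded through the rest of the program.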

Extraction Method 2: Assigning by Some

Alternatively, a common pattern is to declare a placeholder variable that starts out empty and assign a value to it once one exists. In Rust this pattern is expressed with the Option enum and its Some variant:

let mut config: Option<Config> = None;

let result = Config::new(&args);
if let Ok(config_) = result {
    config = Some(config_);
};

if let Some(config_unwrapped) = config {
    run(config_unwrapped);
};

Here the error and its message are silently dropped. Depending on the situation, we can combine unwrap_or_else with the = Some(config_) assignment.
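
As a sketch of the two options, assuming a hypothetical parse_pair function that stands in for Config::new: Result::ok() performs the Result-to-Option conversion in one call, while a match keeps the error message available.

```rust
// Hypothetical stand-in for Config::new: needs a query and a filename.
fn parse_pair(args: &[String]) -> Result<(&str, &str), &'static str> {
    if args.len() < 3 {
        return Err("not enough arguments");
    }
    Ok((args[1].as_str(), args[2].as_str()))
}

// Result::ok() maps Ok(v) to Some(v) and Err(_) to None (the error is dropped).
fn pair_as_option(args: &[String]) -> Option<(&str, &str)> {
    parse_pair(args).ok()
}

// A match handles both cases and keeps the error message.
fn summarize(args: &[String]) -> String {
    match parse_pair(args) {
        Ok((query, filename)) => format!("search {} in {}", query, filename),
        Err(msg) => format!("error: {}", msg),
    }
}
```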

If we want to check several Options for None at once, we can match on a tuple:

if let (Some(a_), Some(b_)) = (a, b) {
    // do something
}

For example,

if let (Some(a), Some(b)) = (Some(7), Some(8)) {
    println!("Result: {}", a * b);
}

prints 56.

Throwing Arbitrary Error

Consider the following function:

fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let query = config.query;
    let filename = config.filename;
    let contents = fs::read_to_string(filename)?;
    println!("{}", contents);
    Ok(())
}

  • fs::read_to_string returns a Result. If we want to propagate the error and let the caller (the previous stack frame) handle it, we just append a ?: on Err the function returns early with that error, on Ok the expression evaluates to the wrapped value.
  • The Box<dyn Error> in the return type serves a similar purpose to a throws clause in Java:
    public void someFunction() throws Exception {}
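
A minimal sketch of how ? and Box<dyn Error> work together, using a made-up parse_and_double function: two different error kinds (a ParseIntError and a plain string message) both end up in the same boxed error type.

```rust
use std::error::Error;

// Hypothetical function: parse an integer, reject negatives, double the rest.
fn parse_and_double(s: &str) -> Result<i32, Box<dyn Error>> {
    // `?` returns early on Err, converting ParseIntError into Box<dyn Error>.
    let n: i32 = s.trim().parse()?;
    if n < 0 {
        // A &str message can also be converted into Box<dyn Error>.
        return Err("negative input".into());
    }
    Ok(n * 2)
}
```

The caller only sees Result<i32, Box<dyn Error>> and can print the error with {} regardless of where it originated.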

Handle the Final Execution Error

Assume that we have:

fn main() {
    let args: Vec<String> = env::args().collect();
    let config = Config::new(&args).unwrap_or_else(|err| {
        println!("[Problem parsing arguments] {}", err);
        process::exit(1);
    });

Then the following two ways of finishing main are equivalent:

    // Variant 1: unwrap_or_else on the returned Result
    run(config).unwrap_or_else(|err| {
        println!("Application Error: {}", err);
        process::exit(1);
    });
}

    // Variant 2: if let on the Err case
    if let Err(err) = run(config) {
        println!("Application Error: {}", err);
        process::exit(1);
    }
}

Second Visit to the Multi-threading Web Server Example In Rust Book

The first time I worked through the Rust book, I ran out of energy by the last chapter (you can see how much detail I recorded here before reaching the last chapter on the web server!).

This time I digested the details with a deeper understanding, and I try to record them in this blog post.

fn main()

We start off by writing down the general structure of the program in the main function. The interesting part lies inside lib.rs, i.e., how we define ThreadPool.

use std::io::prelude::*;
use std::net::TcpListener;
use std::net::TcpStream;
use std::time::Duration;
use std::{fs, thread};

use web_server::ThreadPool;

fn main() {
    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();

    let pool = ThreadPool::new(4);

    for stream in listener.incoming() {
        let stream = stream.unwrap();
        pool.execute(|| {
            handle_connection(stream);
        });
    }
}

fn handle_connection(mut stream: TcpStream) {
    let get = b"GET / HTTP/1.1\r\n";
    let sleep = b"GET /sleep HTTP/1.1\r\n";

    let mut buffer = [0; 1024];
    stream.read(&mut buffer).unwrap();

    println!("{}", buffer.starts_with(get));

    let (status_line, filename) = if buffer.starts_with(get) {
        ("HTTP/1.1 200 OK", "hello.html")
    } else if buffer.starts_with(sleep) {
        println!("{}", "sleeping...");
        thread::sleep(Duration::from_secs(5));
        println!("{}", "awake!");
        ("HTTP/1.1 200 OK", "hello.html")
    } else {
        ("HTTP/1.1 404 NOT FOUND", "404.html")
    };

    let contents = fs::read_to_string(filename).unwrap();
    let response = format!(
        "{}\r\nContent-Length: {}\r\n\r\n{}",
        status_line,
        &contents.len(),
        &contents
    );
    stream.write_all(response.as_bytes()).unwrap();
    stream.flush().unwrap();
}
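
The routing and response formatting in handle_connection can be pulled out into pure functions, which makes them easy to check without opening a socket. This is only a sketch: route and build_response are hypothetical names, and the /sleep branch is left out.

```rust
// Pick a status line and a file name from the first bytes of the request.
fn route(request: &[u8]) -> (&'static str, &'static str) {
    if request.starts_with(b"GET / HTTP/1.1\r\n") {
        ("HTTP/1.1 200 OK", "hello.html")
    } else {
        ("HTTP/1.1 404 NOT FOUND", "404.html")
    }
}

// Assemble status line, Content-Length header, blank line and body.
fn build_response(status_line: &str, contents: &str) -> String {
    format!(
        "{}\r\nContent-Length: {}\r\n\r\n{}",
        status_line,
        contents.len(), // length in bytes, which is what the header expects
        contents
    )
}
```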
lib.rs, the web_server::ThreadPool

lib.rs is the root module of the library crate. The binary (main.rs) imports its public items with a path of the form

project_name::{what's defined as pub in lib.rs}

Inside our lib.rs we have

 1  use std::option::Option;
 2  use std::{
 3      sync::{
 4          mpsc::{self, Receiver},
 5          Arc, Mutex,
 6      },
 7      thread,
 8  };
 9
10  pub struct ThreadPool {
11      workers: Vec<Worker>,
12      sender: mpsc::Sender<Message>,
13  }

I would like to pin down the takeaways from this program (instead of introducing the goal of this example and what needs to be done).

14  // a field reached through a mutable reference can at most be borrowed mutably;
15  // we cannot move it out, because moving is not a mutation
16  impl Drop for ThreadPool {
17      fn drop(&mut self) {
18          println!("Terminating all workers");
19          for _ in &self.workers {
20              self.sender.send(Message::Terminate).unwrap();
21          }
22
23          for worker in &mut self.workers {

1st Takeaway. By first instinct we might write line 23 as

for worker in self.workers

instead, but then an error pops up:

`self.workers` moved due to this implicit call to `.into_iter()`
`into_iter` takes ownership of the receiver `self`, which moves `self.workers`
  • A field reached through a mutable reference can at most be borrowed mutably (which we need to spell out as &mut self.workers).
  • The reason is that the for loop implicitly calls .into_iter(self), which takes self.workers by value, i.e., moves it into the function that creates the iterator.
  • Although self is a mutable reference, moving a field out of it is not a mutation: a move transfers ownership and would leave self.workers uninitialized behind the reference, which the borrow checker forbids.
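
The difference between borrowing and moving through &mut self can be sketched with a simplified, hypothetical Pool type whose workers are plain strings. Iterating over &mut self.workers only borrows, while drain(..) is one way to obtain owned items without leaving the field uninitialized.

```rust
// Hypothetical stand-in for ThreadPool, with Strings instead of Workers.
struct Pool {
    workers: Vec<String>,
}

impl Pool {
    // Borrowing: the loop yields `&mut String`, nothing is moved out of self.
    fn shout_all(&mut self) {
        for worker in &mut self.workers {
            worker.make_ascii_uppercase();
        }
    }

    // Owning: drain(..) moves the items out but leaves a valid, empty Vec behind.
    fn remove_all(&mut self) -> Vec<String> {
        self.workers.drain(..).collect()
    }

    // This would not compile, because it tries to move self.workers
    // out from behind a mutable reference:
    // fn broken(&mut self) { for worker in self.workers {} }
}
```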
24              println!("Shutting down worker {}", worker.id);
25
26              if let Some(thread) = worker.thread.take() {
27                  thread.join().unwrap();
28              }
29          }
30      }
31  }

2nd Takeaway. Note that worker is obtained by iterating over the mutable reference &mut self.workers, hence worker itself is again at most a mutable reference (&mut Worker).

However, we want to call worker.thread.join().unwrap(), and join has the signature join(self), i.e., it takes the JoinHandle by value, so worker.thread would have to be moved.

The usual trick in Rust is to wrap T into Option<T>: Option<T>::take() only needs &mut self, leaves None in place, and hands back the original Some(T), so the value can be moved out without any unsafe code on our side.
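
A small sketch of the Option::take trick in isolation, assuming a simplified, hypothetical Worker whose thread returns a number so that the result can be observed:

```rust
use std::thread;

// Hypothetical simplified worker: the thread returns an i32.
struct Worker {
    thread: Option<thread::JoinHandle<i32>>,
}

// We only have `&mut Worker`, yet join(self) needs the handle by value.
fn join_worker(worker: &mut Worker) -> Option<i32> {
    // take() swaps None into the field and returns the previous Some(handle).
    worker
        .thread
        .take()
        .map(|handle| handle.join().unwrap())
}
```

Calling join_worker a second time returns None, because the handle has already been taken out of the field.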

32  enum Message {
33      Job(Box<dyn FnOnce() + Send + 'static>),
34      Terminate,
35  }

3rd Takeaway. In the course of coding this example, instead of implementing enum Message, what we originally implemented was simply

type Job = Box<dyn FnOnce() + Send + 'static>;

and line 40 used to be

let (sender, receiver) = mpsc::channel::<Job>();

The enum became necessary because later on we want to signal not only a Job to the threads, but also a termination.

In plain JavaScript we could naively implement this by sending ["job", job] and ["terminate", null] to the workers, i.e., we add a field to distinguish the messages.

In the Rust approach, we turn the values of that field into enum variants:

  • Job(job) corresponds to ["job", job] (job is a boxed closure)
  • Terminate corresponds to ["terminate", null]

and we group the variants in an enum:

enum Message {
    Job(Box<dyn FnOnce() + Send + 'static>),
    Terminate,
}
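
The message protocol can be tried out with a single worker thread before adding a pool. The sketch below is a simplified assumption, not the book's code: jobs return an i32 so that results can be collected, and run_jobs is a made-up name.

```rust
use std::sync::mpsc;
use std::thread;

// Same idea as the Message enum above, but jobs return a value here.
enum Message {
    Job(Box<dyn FnOnce() -> i32 + Send + 'static>),
    Terminate,
}

// Square every input on one worker thread and collect the results in order.
fn run_jobs(inputs: Vec<i32>) -> Vec<i32> {
    let (tx, rx) = mpsc::channel::<Message>();
    let (out_tx, out_rx) = mpsc::channel::<i32>();

    let worker = thread::spawn(move || loop {
        match rx.recv().unwrap() {
            Message::Job(job) => out_tx.send(job()).unwrap(),
            Message::Terminate => break,
        }
    });

    for n in inputs {
        tx.send(Message::Job(Box::new(move || n * n))).unwrap();
    }
    tx.send(Message::Terminate).unwrap();
    worker.join().unwrap();

    // out_tx was moved into the worker and is dropped once it exits,
    // so this iterator ends after the last result.
    out_rx.iter().collect()
}
```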
36  impl ThreadPool {
37      pub fn new(size: usize) -> Self {
38          assert!(size > 0);
39          let mut workers = Vec::with_capacity(size);
40          let (sender, receiver) = mpsc::channel::<Message>();
41
42          let receiver = Arc::new(Mutex::new(receiver));
43
44          for id in 0..size {
45              workers.push(Worker::new(id, receiver.clone()));
46          }
47
48          ThreadPool { workers, sender }
49      }
50
51      pub fn execute<F>(&self, f: F)
52      where
53          F: FnOnce() + Send + 'static,
54      {
55          let job = Message::Job(Box::new(f));
56          self.sender.send(job).unwrap();
57      }
58  }
59
60  struct Worker {
61      id: usize,
62      thread: Option<thread::JoinHandle<()>>,
63  }
64
65  impl Worker {
66      fn new(id: usize, receiver: Arc<Mutex<Receiver<Message>>>) -> Worker {
67          let thread = thread::spawn(move || loop {
68              let msg = receiver.lock().unwrap().recv().unwrap();
69              match msg {
70                  Message::Job(job) => {
71                      println!("Worker {} got a job; executing.", id);
72                      job();
73                  }
74                  Message::Terminate => {
75                      println!("Terminated!");
76                      break;
77                  }
78              }
79          });
80          Worker {
81              id,
82              thread: Some(thread),
83          }
84      }
85  }

Finally:

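  • Arc is the thread-safe (atomically reference-counted) counterpart of Rc, used to hold multiple references to the same wrapped object.
  • Mutex ensures that only one thread at a time can access the wrapped object; other threads block until the lock is released.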
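
The two points above can be seen together in a minimal counter sketch. The function name parallel_count and its parameters are assumptions made for this illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Each thread increments a shared counter `per_thread` times.
fn parallel_count(threads: usize, per_thread: u32) -> u32 {
    let counter = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            // Arc::clone only bumps the reference count; the data is shared.
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // lock() blocks until no other thread holds the guard.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```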