CS3984 Computer Systems in Rust



What is the course about?

This is a class on Computer Systems.

The course adopts the perspective of a programmer using computer systems, rather than a designer of operating systems.

What languages are systems applications written in?

Databases

PostgreSQL

SQLite

MariaDB

MySQL

What languages are systems applications written in?

Networking

OpenSSH

nginx

Curl

Nmap

Why C and C++ in systems programming?

Primary reason: performance.

Source: The Computer Language Benchmarks Game

Why C and C++ in systems programming?

Primary reason: performance.

C and C++:
  • Compile directly to machine code
  • Allow direct control over memory allocation
  • Allow direct access to hardware and memory
  • Interoperate with other low-level code such as assembly language

Languages that run on a virtual machine:
  • Compile to bytecode that has to be interpreted*
  • Memory is garbage collected
  • Platform access is gated by the virtual machine

* In practice, the bytecode is just-in-time (JIT) compiled to native code and then executed.

But people write web apps in JavaScript, and those applications serve thousands of concurrent users!

Much of modern programming is abstractions built upon abstractions.

Lower-level languages are still needed!

Why not C and C++ in systems programming?

Primary reason: memory unsafety.

Memory safety is a property of programming languages that prevents bugs related to memory access. These include buffer overflows, use after free, and data races*.

Memory unsafety causes

  • 70% of high/critical vulnerabilities in Google’s Chromium
  • 70% of common vulnerabilities and exposures (CVEs) from Microsoft
  • ~94% of high/critical bugs in Mozilla software
  • 67% of zero-day vulnerabilities from Google’s Project Zero

White House Press Release

Source: Office of the National Cyber Director

July 2024 CrowdStrike Incident

CrowdStrike Error
  • Cybersecurity company CrowdStrike distributed a faulty update for its Falcon sensor software.
  • The update caused machines to enter a boot loop or boot into recovery mode, and many required manual fixing.
  • Roughly 8.5 million systems crashed, costing at least US$10 billion in financial damages worldwide.

July 2024 CrowdStrike Incident

Cause? Memory safety error.

…Sensors that received the new version of Channel File 291 carrying the problematic content were exposed to a latent out-of-bounds read issue in the Content Interpreter. At the next IPC notification from the operating system, the new IPC Template Instances were evaluated, specifying a comparison against the 21st input value. The Content Interpreter expected only 20 values. Therefore, the attempt to access the 21st value produced an out-of-bounds memory read beyond the end of the input data array and resulted in a system crash.

  1. Sensor input read into array
  2. Array access not bounds checked
  3. Template specifies a comparison with the 21st input value, but the Content Interpreter expected only 20 values
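
The three steps above can be sketched in Rust. This is an illustration only, not CrowdStrike's code: the function `compare_input` and the 20-element input array are hypothetical. With checked access through `get`, a request for a missing 21st value yields `None` instead of a read past the end of the array.

```rust
/// Hypothetical stand-in for a template comparison against one input value.
/// `index` is zero-based, so the 21st value is index 20.
fn compare_input(inputs: &[u32], index: usize, expected: u32) -> Option<bool> {
    // `get` performs the bounds check and returns None when out of range.
    inputs.get(index).map(|value| *value == expected)
}

fn main() {
    // Assumption for the sketch: only 20 input values are supplied.
    let inputs = [7u32; 20];

    // The 20th value exists, so the comparison runs.
    println!("{:?}", compare_input(&inputs, 19, 7));
    // The 21st value does not exist; the caller gets None, not a stray read.
    println!("{:?}", compare_input(&inputs, 20, 7));
}
```

The caller is forced to decide what a missing input means, because the `Option` has to be handled before the result can be used.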

Why Rust?

Let’s look at marketing from the Rust programming language:

Performance

  1. Compilation to native code with no runtime
    • Allows running on embedded devices
    • Allows interop with other languages
  2. Lack of garbage collection
    • Provides predictable performance
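
One way to see what "predictable" means here: without a garbage collector, a value is released at a point that can be read off the source code, namely the end of its scope. A minimal sketch, where the `Tracked` type and `drop_order` function are invented for illustration:

```rust
use std::cell::RefCell;

/// Records its name in a shared log at the moment it is dropped.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Returns the order in which events happened.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Tracked { name: "outer", log: &log };
        {
            let _inner = Tracked { name: "inner", log: &log };
        } // `_inner` is released here, at the closing brace
        log.borrow_mut().push("between");
    } // `_outer` is released here
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order());
}
```

Cleanup happens at the closing brace of each scope rather than whenever a collector decides to run.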

Case Study: Discord

  1. Implementing a critical data structure (used in the Elixir backend) in Rust improved performance by 820x in the best case and 42,500x in the worst case (source).

  2. Rewriting the “Read States” service from Go to Rust improved every measured metric, including latency, CPU usage, and memory usage (source).

    Note: Go is purple, Rust is blue

Reliability

  1. Buffer overflows
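#include <stdio.h>

int main(void) {
  int array[] = { 1, 2, 3, 4, 5 };
  // Undefined behavior: index 1000 is far past the end of the 5-element array.
  printf("%d\n", array[1000]);
}


fn main() {
  let array: [u32; 5] = [1, 2, 3, 4, 5];
  // Rejected by the compiler: the constant index 1000 is out of bounds.
  // An index computed at run time would panic instead of reading stray memory.
  println!("{}", array[1000]);
}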


Reliability

  1. Dangling pointer
#include <stdio.h>

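int* return_pointer() {
    int x = 5;
    // Returns the address of a local that is destroyed when the function returns.
    return &x;
}

int main(void) {
    int* x = return_pointer();
    // Undefined behavior: dereferences a dangling pointer.
    printf("%d\n", *x);
}


// Does not compile: a returned reference must borrow from something that
// outlives the call, and the local `x` does not.
fn return_reference() -> &i32 {
    let x = 5;
    &x
}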

fn main() {
    let x: &i32 = return_reference();
    println!("{}", *x);
}
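
The Rust version above is rejected at compile time because the reference would outlive `x`. A minimal sketch of two alternatives the compiler accepts, with illustrative function names: return the value itself, or move it to the heap and hand ownership to the caller.

```rust
/// Return the value itself; the caller receives its own copy.
fn return_value() -> i32 {
    let x = 5;
    x
}

/// Return a heap allocation; ownership moves to the caller,
/// so the data lives as long as the returned Box does.
fn return_boxed() -> Box<i32> {
    let x = 5;
    Box::new(x)
}

fn main() {
    let a: i32 = return_value();
    let b: Box<i32> = return_boxed();
    println!("{} {}", a, *b);
}
```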


Undefined Behavior in C

  1. Result of running a program that violates the language specification.
  2. There are no restrictions on the behavior of the program.
  3. Implementations are not required to diagnose undefined behavior.

When the compiler encounters [a given undefined construct] it is legal for it to make demons fly out of your nose

Spot The Overflow

#include <string.h>

void copy_packet(char *packet_data, int packet_len) {
    char buffer[128];
    int bytes_to_copy = packet_len;
    if (bytes_to_copy < 128) {
        strncpy(buffer, packet_data, bytes_to_copy);
    }
}


Source: Stanford CS110L
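
In the C version, a negative `packet_len` passes the `< 128` check and is then converted to a very large unsigned size when handed to `strncpy`. For comparison, here is a sketch of how a similar routine might look in Rust. The signature and the `Option` return type are assumptions for illustration, not part of the Stanford example. A slice carries its own length as an unsigned `usize`, so there is no separate signed length to get wrong.

```rust
/// Copy a packet into a fixed 128-byte buffer.
/// Returns None when the packet does not fit.
fn copy_packet(packet_data: &[u8]) -> Option<[u8; 128]> {
    let mut buffer = [0u8; 128];
    // `get_mut` yields None if the requested range exceeds the buffer.
    let dest = buffer.get_mut(..packet_data.len())?;
    // Source and destination now have the same length by construction.
    dest.copy_from_slice(packet_data);
    Some(buffer)
}

fn main() {
    println!("{}", copy_packet(&[1, 2, 3]).is_some());
    println!("{}", copy_packet(&[0u8; 200]).is_some());
}
```

Unlike the C check, this sketch accepts a packet of exactly 128 bytes; tightening that is a one-line change if the last byte must stay free.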

Spot The Problem

for (size_t i = 0; i < container.size() - 1; i++) {
    // Access element in container at index `i`
    container[i];
}
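
The loop above misbehaves on an empty container: `size()` is unsigned, so `size() - 1` wraps around to the largest possible value. A Rust sketch of the same loop follows, with the hypothetical helper `visit_all_but_last`. Writing `container.len() - 1` directly would panic on an empty slice in a debug build; `saturating_sub` states the intended clamping explicitly.

```rust
/// Visit every element except the last, returning how many were visited.
fn visit_all_but_last(container: &[i32]) -> usize {
    // saturating_sub clamps at 0, so an empty slice gives an empty range.
    let end = container.len().saturating_sub(1);
    let mut visited = 0;
    for i in 0..end {
        let _element = container[i]; // bounds-checked access
        visited += 1;
    }
    visited
}

fn main() {
    println!("{}", visit_all_but_last(&[]));
    println!("{}", visit_all_but_last(&[1, 2, 3]));
}
```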


Tricky C

#include <stdio.h>

int main(void) {
    unsigned char one = 1;
    unsigned char max = 255;

    unsigned char sum = one + max;
    if (sum == one + max) {
        printf("sum = one + max and sum == one + max");
    } else {
        printf("sum = one + max but sum != one + max");
    }
}
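
In the C program, `one + max` is promoted to `int` and evaluates to 256, while `sum` is truncated to 0 when stored in an `unsigned char`, so the comparison fails. Rust has no implicit integer promotion, so each of these choices has to be written out. A small sketch, with `sums` as a made-up helper:

```rust
/// Add two u8 values three ways: checked, wrapping, and widened to u16.
fn sums(one: u8, max: u8) -> (Option<u8>, u8, u16) {
    // checked_add reports the overflow instead of silently wrapping.
    let checked = one.checked_add(max);
    // wrapping_add asks for modular arithmetic explicitly.
    let wrapped = one.wrapping_add(max);
    // Widening must be requested; nothing is promoted behind the scenes.
    let widened = u16::from(one) + u16::from(max);
    (checked, wrapped, widened)
}

fn main() {
    println!("{:?}", sums(1, 255));
}
```

Writing `one + max` directly on two `u8` values panics in a debug build when the result does not fit.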


Undefined Behavior in Rust

  1. Safe Rust cannot cause Undefined Behavior.
  2. Unsafe Rust can cause Undefined Behavior.
  3. The unsafe keyword separates Safe and Unsafe Rust.
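
A minimal sketch of where the boundary falls, using a raw pointer as the example (the function name is illustrative). Creating the pointer is allowed in Safe Rust; dereferencing it is only allowed inside an `unsafe` block, where the programmer takes over responsibility for validity.

```rust
fn read_through_raw_pointer(value: &i32) -> i32 {
    // Creating a raw pointer is allowed in Safe Rust.
    let ptr: *const i32 = value;
    // Dereferencing it is not; the compiler requires an unsafe block.
    // SAFETY: `ptr` was just derived from a reference that is still live.
    unsafe { *ptr }
}

fn main() {
    println!("{}", read_through_raw_pointer(&42));
}
```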

Safe and Unsafe Rust

#include <stdint.h>

int32_t add(int32_t a, int32_t b) {
    return a + b;
}


Safe and Unsafe Rust (2)

extern "C" {
    fn add(a: i32, b: i32) -> i32;
}


Safe and Unsafe Rust (3)

extern "C" {
    fn add(a: i32, b: i32) -> i32;
}

pub fn safe_add(a: i32, b: i32) -> i32 {
    unsafe { add(a, b) }
}


Safe and Unsafe Rust (4)

extern "C" {
    fn add(a: i32, b: i32) -> i32;
}

pub fn safe_add(a: i32, b: i32) -> i32 {
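    // a + b would exceed i32::MAX
    if a >= 0 && (b > i32::MAX - a) {
        return 0;
    }

    // a + b would fall below i32::MIN
    if a < 0 && (b < i32::MIN - a) {
        return 0;
    }

    // SAFETY: the checks above rule out signed overflow,
    // which is undefined behavior in the C implementation of `add`.
    unsafe { add(a, b) }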
}
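
The manual range checks above mirror what the standard library already provides for Rust's own integers. Below is a sketch that assumes the caller would rather receive `None` than a sentinel `0` on overflow. The name `checked_safe_add` is hypothetical, and this version does not call the C function at all.

```rust
/// Add two i32 values, returning None if the result does not fit in an i32.
fn checked_safe_add(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

fn main() {
    println!("{:?}", checked_safe_add(1, 2));
    println!("{:?}", checked_safe_add(i32::MAX, 1));
    println!("{:?}", checked_safe_add(i32::MIN, -1));
}
```

Returning an `Option` keeps a legitimate result of `0` distinguishable from an overflow, which the sentinel value cannot do.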