CS3984 Computer Systems in Rust



Unix File Descriptors

  • A file descriptor is a handle that allows user processes to refer to files, which are abstractions for sequences of bytes
  • Unix represents many different kernel abstractions as files to abstract I/O devices: e.g., disks, terminals, network sockets, IPC channels (pipes), etc.
  • They provide a uniform API, no matter the kind of the underlying object: read(2), write(2), close(2), lseek(2), dup2(2), and more
  • May maintain a read/write position if seekable
    • But note: not all operations work on all kinds of file descriptors
  • Are represented using (small) integers obtained from system calls such as open(2)
  • Are considered low-level I/O
  • Are inherited/cloned by a child process upon fork()
  • Are retained when a process exec()'s another program (unless marked close-on-exec)
  • Are closed when a process exit()s or is killed
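
The uniform API can be tried from Rust, whose std::fs::File wraps a file descriptor. The following is a minimal sketch, not a definitive implementation: the scratch file name and the helper roundtrip are hypothetical, and the comments name the system call each method is expected to map to on Unix.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::os::fd::AsRawFd;

/// Writes `text` to a scratch file, seeks back to the start, and reads it again.
/// Returns the raw descriptor number and the text that was read back.
fn roundtrip(text: &str) -> io::Result<(i32, String)> {
    // Hypothetical scratch path, made unique per process.
    let path = std::env::temp_dir().join(format!("fd_demo_{}.txt", std::process::id()));
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(&path)?; // open(2)
    let fd = file.as_raw_fd(); // the small integer behind the File
    file.write_all(text.as_bytes())?; // write(2)
    file.seek(SeekFrom::Start(0))?; // lseek(2)
    let mut back = String::new();
    file.read_to_string(&mut back)?; // read(2)
    drop(file); // close(2)
    fs::remove_file(&path)?;
    Ok((fd, back))
}

fn main() -> io::Result<()> {
    let (fd, back) = roundtrip("hello, fd")?;
    println!("descriptor {fd} read back {back:?}");
    Ok(())
}
```

The descriptor number printed depends on what the process already has open; only its being a small non-negative integer should be relied on.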

Standard Streams

  • By convention, file descriptors 0, 1, and 2 are used for the standard input, standard output, and standard error streams
  • Programs do not have to open any files to obtain these file descriptors; they are preconnected
  • Thus programs can use them without needing any additional information upfront
  • A control program (such as the shell), or whichever program starts another program, can set them up to refer to a regular file, a terminal device, a pipe, or something else
  • When a program uses them, it accesses the underlying kernel object in the same way as if it had opened that object itself

Rust High-level I/O

  • Access standard streams via high-level I/O (similar to C’s stdio.h)
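use std::io;

fn main() -> io::Result<()> {
    // Read one line from standard input (file descriptor 0).
    let mut buffer = String::new();
    io::stdin().read_line(&mut buffer)?;
    // read_line keeps the trailing newline, so trim it before printing.
    println!("You entered: {}", buffer.trim_end());
    Ok(())
}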


  • std::io::stdin provides a high-level abstraction over these low-level standard streams
    • buffering to reduce the number of system calls
    • decoding of UTF-8 when requested (read_to_string)
    • line-based input
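
The same three services are available for any byte source through std::io::BufReader. The sketch below assumes a hypothetical helper read_lines; it is generic over Read so that it works on standard input, a file, or an in-memory byte slice.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Collects the lines of any byte source, without their line terminators.
/// Fails if the source is not valid UTF-8.
fn read_lines<R: Read>(source: R) -> io::Result<Vec<String>> {
    // BufReader fetches data in large chunks, so most lines need no extra read(2).
    BufReader::new(source).lines().collect()
}

fn main() -> io::Result<()> {
    // The locked stdin handle is already buffered; wrapping it again is
    // harmless and keeps the helper generic.
    for (number, line) in read_lines(io::stdin().lock())?.iter().enumerate() {
        println!("{}: {}", number + 1, line);
    }
    Ok(())
}
```

Collecting into io::Result<Vec<String>> stops at the first error, which is how an invalid UTF-8 sequence surfaces to the caller.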

File Descriptor Management

  • To properly understand file descriptors, one must understand how they are implemented inside the kernel
  • File descriptors use 2 layers of indirection, both of which involve reference counting
    • (integer) file descriptors in a per-process table point to entries in a global open file table
    • per-process file descriptor table has a limit on the number of entries
    • each open file table entry maintains a read/write offset (or position) for the file
    • entries in the open file table point to entries in a global “vnode” table, which contains specialized entries for each file-like object
  • File descriptor tables are (generally) per-process, but processes can duplicate and rearrange entries

In-kernel Management of File Descriptors


  • Double indirection
    • each open file table entry points to a vnode table entry and maintains its own r/w offset, which is advanced on read/write

    • multiple file descriptors in one or more processes may refer to the same entry in the open file table. In this case, any of them can be used to access the file, and each access advances the shared r/w offset

    • new open file entry is created by open(2) for regular files, and various special purpose calls for other types: socket(2), pipe(2), etc.
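
The difference between sharing an open file table entry and sharing only the vnode can be observed from Rust. This is a sketch under stated assumptions: File::try_clone is used as the duplicating operation (on Unix it yields a second descriptor for the same open file table entry), and the scratch file name and helper offsets_after_read are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Read, Seek};
use std::os::fd::AsRawFd;

/// Reads 5 bytes through one descriptor, then returns the offset seen through
/// (a) a duplicate of that descriptor and (b) a second, separate open.
fn offsets_after_read() -> io::Result<(u64, u64)> {
    let path = std::env::temp_dir().join(format!("offset_demo_{}.txt", std::process::id()));
    fs::write(&path, b"hello world")?;

    let mut original = File::open(&path)?; // new open file table entry
    let mut duplicate = original.try_clone()?; // second descriptor, same entry
    let mut reopened = File::open(&path)?; // separate entry, same vnode
    assert_ne!(original.as_raw_fd(), duplicate.as_raw_fd());

    let mut first_five = [0u8; 5];
    original.read_exact(&mut first_five)?; // advances the shared offset

    let shared = duplicate.stream_position()?;
    let independent = reopened.stream_position()?;
    fs::remove_file(&path)?;
    Ok((shared, independent))
}

fn main() -> io::Result<()> {
    let (shared, independent) = offsets_after_read()?;
    // Prints: duplicate sees offset 5, second open sees offset 0
    println!("duplicate sees offset {shared}, second open sees offset {independent}");
    Ok(())
}
```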

System Calls for File Descriptor Manipulation

  • close(fd):
    • clear entry in file descriptor table, decrement refcount in open file table
    • if zero, deallocate entry in open file table and decrement refcount in vnode table
    • if zero, deallocate entry in vnode table and close underlying object
    • for certain objects (pipes, sockets), closing the underlying object has important side effects that occur only if all file descriptors referring to it have been closed
  • dup(int fd): create a new file descriptor referring to the same open file table entry as file descriptor fd, increment refcount; returns lowest available (unused) file descriptor number
  • dup2(int fromfd, int tofd): if tofd is open, close it. Then, assign tofd to the same open file table entry as fromfd (as in dup()) and increment refcount
  • On fork(), the child inherits a copy of the parent’s file descriptor table (and the reference count of each open file table entry is incremented)
  • On exit() (or abnormal termination), all entries are closed
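
The rule that side effects occur only on the last close can be demonstrated without calling dup() or close() directly. The sketch below swaps in two stand-ins, both assumptions of this example: a Unix socket pair (UnixStream::pair) instead of a pipe, and try_clone plus drop instead of raw dup()/close(). The helper eof_after_last_close is hypothetical.

```rust
use std::io::{self, ErrorKind, Read, Write};
use std::os::unix::net::UnixStream;

/// Reports what a reader observes (a) while a duplicate of the writing end is
/// still open and (b) after the last descriptor for that end has been closed.
fn eof_after_last_close() -> io::Result<(bool, usize)> {
    let (mut writer, mut reader) = UnixStream::pair()?; // socketpair(2)
    let writer_dup = writer.try_clone()?; // two descriptors, one open entry
    writer.write_all(b"hi")?;
    drop(writer); // first close: refcount drops to 1

    reader.set_nonblocking(true)?; // report "no data yet" instead of blocking
    let mut buf = [0u8; 16];
    let n = reader.read(&mut buf)?;
    assert_eq!(&buf[..n], b"hi");

    // One descriptor for the writing end remains, so the reader is told
    // "would block" rather than end-of-file.
    let still_open = matches!(
        reader.read(&mut buf),
        Err(ref e) if e.kind() == ErrorKind::WouldBlock
    );

    drop(writer_dup); // last close: the underlying object is shut down
    let at_eof = reader.read(&mut buf)?; // Ok(0) signals end-of-file
    Ok((still_open, at_eof))
}

fn main() -> io::Result<()> {
    let (still_open, at_eof) = eof_after_last_close()?;
    println!("before last close, reader would block: {still_open}");
    println!("after last close, read returned {at_eof} bytes");
    Ok(())
}
```

This is why a forgotten duplicate of a write end keeps a reader waiting forever: end-of-file is delivered only once the reference count reaches zero.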

I/O Safety in Rust

  • Because file descriptors are integers, direct manipulation of file descriptors can be dangerous. In Rust, they are called raw file descriptors, see RawFd.

    pub type RawFd = c_int;

    A RawFd is merely another name for what a C compiler would call int, typically i32.

  • Sources of error are similar to those when manipulating pointers directly:

    • closing an already closed file descriptor (=double free)
    • invoking operations on invalid file descriptors (=dereferencing uninitialized pointers)
    • forgetting to close a file descriptor in a long-running program (=leaking memory)
  • Rust RFC 3128 introduces I/O Safety which applies Rust’s resource management ideas to file descriptors.
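
The types that came out of this work live in std::os::fd: OwnedFd closes its descriptor when dropped, and BorrowedFd<'_> is a lifetime-checked borrow that cannot close it. A minimal sketch, where the scratch file name and the helpers raw_number and ownership_roundtrip are hypothetical:

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::os::fd::{AsFd, AsRawFd, BorrowedFd, OwnedFd};

/// A borrowed descriptor can be inspected or used, but not closed; the
/// borrow checker ensures it does not outlive its owner.
fn raw_number(fd: BorrowedFd<'_>) -> i32 {
    fd.as_raw_fd()
}

/// Moves a descriptor from File to OwnedFd and back, then reads through it.
fn ownership_roundtrip() -> io::Result<String> {
    let path = std::env::temp_dir().join(format!("iosafety_demo_{}.txt", std::process::id()));
    fs::write(&path, b"owned")?;

    let file = File::open(&path)?;
    let before = file.as_raw_fd();
    let owned: OwnedFd = file.into(); // ownership moves; the old `file` is gone
    assert_eq!(raw_number(owned.as_fd()), before);

    let mut file = File::from(owned); // moved back: still exactly one owner
    let mut text = String::new();
    file.read_to_string(&mut text)?;
    drop(file); // the descriptor is closed exactly once, here
    fs::remove_file(&path)?;
    Ok(text)
}

fn main() -> io::Result<()> {
    println!("read back: {}", ownership_roundtrip()?);
    Ok(())
}
```

Because ownership moves, using `owned` after `File::from(owned)` is a compile-time error, which rules out the double-close case from the list above.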

Unix Pipes

A Unix pipe is a FIFO, bounded buffer that provides the abstraction of a unidirectional stream of bytes flowing from writer to reader.

  • Writers:
    • can store data in the pipe as long as there is space
    • block if pipe is full until reader drains pipe
  • Readers:
    • drain the pipe by reading from it
    • block if the pipe is empty until a writer writes data
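
Writer and reader roles can be tried with std::process, which creates pipes when asked for Stdio::piped(). The sketch below makes two assumptions that are not in the slides: a `cat` program is available on PATH, and the input is small enough to fit in the pipe buffers. The helper through_cat is hypothetical.

```rust
use std::io::{self, Read, Write};
use std::process::{Command, Stdio};

/// Sends `input` through a child `cat` process over two pipes and returns
/// whatever comes back.
fn through_cat(input: &str) -> io::Result<String> {
    let mut child = Command::new("cat") // assumption: `cat` is on PATH
        .stdin(Stdio::piped()) // pipe 1: parent writes, child reads
        .stdout(Stdio::piped()) // pipe 2: child writes, parent reads
        .spawn()?;

    let mut to_child = child.stdin.take().expect("stdin was piped");
    to_child.write_all(input.as_bytes())?; // would block if the pipe were full
    drop(to_child); // closing the write end lets `cat` see end-of-file

    let mut output = String::new();
    child
        .stdout
        .take()
        .expect("stdout was piped")
        .read_to_string(&mut output)?; // blocks while the pipe is empty
    child.wait()?;
    Ok(output)
}

fn main() -> io::Result<()> {
    print!("{}", through_cat("hello pipe\n")?);
    Ok(())
}
```

Writing everything before reading anything is only safe for small inputs: with a large input both pipes could fill up, leaving parent and child blocked on each other.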

Unix Pipes (cont’d)

  • Pipes provide a classic “bounded buffer” abstraction that
    • is safe: no race conditions, no shared memory, handled by kernel
    • provides flow control that automatically regulates CPU scheduling: a blocked writer or reader gives up the CPU until the other side makes progress
    • is created unnamed: no need to agree on names beforehand
    • is cleaned up automatically: file descriptor table entries are released upon close(), and also when the processes that have gained access terminate
    • is heavily optimized under the hood by the kernel