Exercise 3
Submission
Submission deadline: 11:59 pm, Thursday, October 31st 2024.
Submission method: Run the included submit.py script from the root of your exercise repository.
You will need Python and Requests installed. If you do not have Python and/or Requests, you can either:
Part 1
Summary
- Implement a serial Counting Codes solution in Rust.
- Implement a parallel version using scoped threads and an atomic variable.
- Implement a parallel version using regular threads and thread handles.
- Implement a parallel version using task parallelism with the `rayon` crate.
- Benchmark your implementations using `hyperfine`.
Details
- Clone the exercise repository.
- In the `part-1-countingcodes` folder, implement the following programs in the following files:
  - In `src/main.rs`, implement the serial version.
  - In `src/bin/scope_atomic.rs`, implement the scoped version.
  - In `src/bin/thread_handle.rs`, implement the regular thread version.
  - In `src/bin/task_parallel.rs`, implement the task parallelism version.
Serial Implementation
You will implement a single-threaded Rust solution for Counting Codes.
You will not need to come up with a solution to the problem; reference solutions in C and C++ are given in the `reference` folder.
You should compile, test, and benchmark the reference binaries just as you do your Rust solutions:
$ gcc -O3 reference/countingcodes.c -o countingcodes_c
$ g++ -O3 reference/countingcodes.cpp -o countingcodes_cpp
The reference solution given uses a recursive, brute-force, backtracking algorithm. You do not need to modify the algorithm used; a 1-1 translation of the code will suffice.
Observe the following hints:
- You will find it helpful to structure your Rust solution similarly to the C++ solution.
- Use equivalent Rust types for the C++ data structures, e.g. `Vec` for `std::vector`.
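
For orientation only, here is a minimal sketch of the recursive shape such a translation usually takes. The `Grid` alias, the `can_place` helper, and the cell-advancing logic below are illustrative placeholders, not the reference solution's actual names or constraint logic; follow the C/C++ code (and the `tests/*.in` format) for those details.

```rust
// Sketch of the recursive backtracking shape only, NOT the real Counting Codes logic.
// `Grid` and `can_place` are assumed names standing in for whatever the reference uses.

type Grid = Vec<Vec<u8>>; // stand-in for the C++ std::vector<std::vector<int>>

fn can_place(_grid: &Grid, _row: usize, _col: usize, _digit: u8) -> bool {
    // Placeholder: the real check mirrors the reference solution's
    // row/column bookkeeping arrays.
    true
}

fn backtrack(grid: &mut Grid, row: usize, col: usize) -> u64 {
    let n = grid.len();
    if row == n {
        return 1; // a complete, valid grid was built
    }
    // Move to the next cell, wrapping to the next row at the end of a row.
    let (next_row, next_col) = if col + 1 == n { (row + 1, 0) } else { (row, col + 1) };

    let mut count = 0;
    for digit in 1..=9u8 {
        if can_place(grid, row, col, digit) {
            grid[row][col] = digit;                       // place the digit
            count += backtrack(grid, next_row, next_col); // recurse
            grid[row][col] = 0;                           // undo the placement
        }
    }
    count
}

fn main() {
    // Hypothetical 2x2 grid just to exercise the skeleton; real input parsing
    // should follow the reference solution and the provided test inputs.
    let mut grid: Grid = vec![vec![0; 2]; 2];
    println!("{}", backtrack(&mut grid, 0, 0));
}
```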
Scoped and Atomic
You will implement a multi-threaded Rust solution for Counting Codes. Read the following carefully:
- The provided algorithm computes the solution through brute force: for each possible digit `d` from 1 to 9 (`for (int d = 1; d <= 9; d++) {`), the digit is placed onto the grid if possible (`inrow[i][d] = true; B[i][j] = d;`), then the count with the placed digit is computed through recursion (`cnt += backtrack(i, j+1);`).
- A simple multi-threaded solution is to spawn a new thread for each recursive call. This requires cloning the data structures used (with the placed digit), then spawning a thread that takes ownership of the new data structure to solve.
- Your solution needs to keep track of the total count generated by all of the threads spawned. Use an atomic integer for this purpose; the atomic can be safely shared with and passed to all spawned threads.
- Your solution needs to know when all threads have finished brute-forcing their subtask. This means all threads should be spawned in a `std::thread::scope`. The scope should be created in your `main` function, and the first thread (running the task representing the entire problem) is spawned in the scope. Since a scope implicitly joins all threads spawned in it before returning, the value of the atomic integer after the scope ends will be the final count (the solution).
- Your solution should limit the number of threads spawned to avoid the overhead associated with creating too many threads. Use an integer to track the current recursion depth; fall back to a regular recursive call once the depth reaches a certain level. (A sketch combining these pieces follows this list.)
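
The following is a rough sketch of how these pieces can fit together: a `std::thread::scope` created in `main`, a shared atomic counter, cloned state for each spawned task, and a depth cutoff. The grid type, the omitted placement check, and the `MAX_SPAWN_DEPTH` value are assumptions for illustration, not part of the exercise skeleton.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Illustrative placeholders: not the real Counting Codes data structures or checks.
type Grid = Vec<Vec<u8>>;

const MAX_SPAWN_DEPTH: usize = 2; // assumption: tune this cutoff experimentally

fn backtrack<'scope, 'env>(
    scope: &'scope thread::Scope<'scope, 'env>,
    total: &'scope AtomicU64,
    grid: Grid,
    pos: usize,
    depth: usize,
) {
    let n = grid.len() * grid.len(); // total number of cells, assuming a square grid
    if pos == n {
        total.fetch_add(1, Ordering::Relaxed); // one complete grid found
        return;
    }
    for digit in 1..=9u8 {
        // Placement check omitted; clone the grid with the digit placed.
        let mut next = grid.clone();
        next[pos / grid.len()][pos % grid.len()] = digit;
        if depth < MAX_SPAWN_DEPTH {
            // Spawn a scoped thread that owns its own copy of the state.
            scope.spawn(move || backtrack(scope, total, next, pos + 1, depth + 1));
        } else {
            // Past the cutoff, fall back to a plain recursive call.
            backtrack(scope, total, next, pos + 1, depth + 1);
        }
    }
}

fn main() {
    let grid: Grid = vec![vec![0; 2]; 2];
    let total = AtomicU64::new(0);
    let total_ref = &total;
    thread::scope(|s| {
        // The first task represents the entire problem.
        s.spawn(move || backtrack(s, total_ref, grid, 0, 0));
    });
    // The scope joins every spawned thread before it returns,
    // so the atomic now holds the final count.
    println!("{}", total.load(Ordering::Relaxed));
}
```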
Regular Threads
Implement a multi-threaded solution using regular threads; it should provide a speedup over the scoped atomic version. Read the following carefully:
- Notice how, after spawning off all tasks (or finishing all recursive calls), the thread currently in `backtrack` has no more work to do.
- Therefore, a simplification of your multi-threaded solution would be to spawn regular threads when spawning new tasks, placing their handles into a vector.
- After spawning all tasks (or finishing all recursive calls), the thread should `join` all handles to obtain the final counts from the tasks spawned.
- The `backtrack` function can then return the count directly, similar to the serial implementation (see the sketch after this list).
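
A rough sketch of this structure, again with placeholder grid logic and an assumed depth cutoff, might look like the following; the point here is only the `JoinHandle` bookkeeping and the direct return of the count.

```rust
use std::thread;

// Illustrative placeholders: not the real Counting Codes data structures or checks.
type Grid = Vec<Vec<u8>>;

const MAX_SPAWN_DEPTH: usize = 2; // assumption: same kind of cutoff as before

fn backtrack(grid: Grid, pos: usize, depth: usize) -> u64 {
    let n = grid.len() * grid.len(); // total number of cells, assuming a square grid
    if pos == n {
        return 1;
    }
    let mut handles = Vec::new(); // one JoinHandle<u64> per spawned subtask
    let mut count = 0u64;
    for digit in 1..=9u8 {
        // Placement check omitted; clone the grid with the digit placed.
        let mut next = grid.clone();
        next[pos / grid.len()][pos % grid.len()] = digit;
        if depth < MAX_SPAWN_DEPTH {
            handles.push(thread::spawn(move || backtrack(next, pos + 1, depth + 1)));
        } else {
            count += backtrack(next, pos + 1, depth + 1);
        }
    }
    // The current thread has no more work of its own: collect the subtask counts.
    for handle in handles {
        count += handle.join().expect("worker thread panicked");
    }
    count
}

fn main() {
    let grid: Grid = vec![vec![0; 2]; 2];
    println!("{}", backtrack(grid, 0, 0));
}
```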
Task Parallelism
The `rayon` crate provides high-level abstractions for data parallelism, which we will use to parallelize the tasks the algorithm performs.
This is primarily achieved with parallel iterators, which you will use in this version of your Counting Codes solution.
Read the following carefully:
- In your Regular Threads implementation, a thread is spawned for each task, with the handles to each thread placed in a vector.
- This version should instead place the task itself into the vector.
- After all tasks are created, use a parallel iterator provided by rayon to run all tasks in parallel, summing the counts returned by the tasks.
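
Assuming `rayon` is already a dependency of the exercise crate, a sketch of this approach could look like the following. Here a "task" is represented simply as the cloned grid state to be solved (a vector of closures would work just as well); the grid logic and the depth cutoff are placeholders as before.

```rust
use rayon::prelude::*;

// Illustrative placeholders: not the real Counting Codes data structures or checks.
type Grid = Vec<Vec<u8>>;

const MAX_TASK_DEPTH: usize = 2; // assumption: depth at which tasks stop being split

fn backtrack(grid: Grid, pos: usize, depth: usize) -> u64 {
    let n = grid.len() * grid.len(); // total number of cells, assuming a square grid
    if pos == n {
        return 1;
    }
    let mut tasks: Vec<Grid> = Vec::new(); // each task is the cloned state to solve
    let mut count = 0u64;
    for digit in 1..=9u8 {
        // Placement check omitted; clone the grid with the digit placed.
        let mut next = grid.clone();
        next[pos / grid.len()][pos % grid.len()] = digit;
        if depth < MAX_TASK_DEPTH {
            tasks.push(next);
        } else {
            count += backtrack(next, pos + 1, depth + 1);
        }
    }
    // Run the collected tasks on rayon's thread pool and sum the counts they return.
    let parallel_count: u64 = tasks
        .into_par_iter()
        .map(|task| backtrack(task, pos + 1, depth + 1))
        .sum();
    count + parallel_count
}

fn main() {
    let grid: Grid = vec![vec![0; 2]; 2];
    println!("{}", backtrack(grid, 0, 0));
}
```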
Testing
Test inputs and expected outputs are provided in the `tests/` directory.
For each test, the input is provided in the `*.in` file, and the expected output is provided in the `*.ans` file.
To run all the tests against a particular binary, run the provided `test_countingcodes.py` script.
Benchmark
You will create a writeup called `writeup.md` in the `part-1-countingcodes` directory.
The performance metric you will measure is the execution time of the programs.
You will use the command-line tool `hyperfine` to perform the benchmark. A local copy of the tool on rlogin is available at `~cs3214/bin/hyperfine`. Read the output of `hyperfine --help` or the README of the project to learn how to use the tool.
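For example (assuming your binaries read their input from standard input, and using a placeholder binary name), an invocation along the lines of `hyperfine --warmup 3 'target/release/<binary> < tests/4x4_zero.in'` performs a few warm-up runs before measuring.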
Your writeup should contain at least the following:
- Statistics from the execution of all 4 implementations of your Counting Codes solution.
- Analysis of the difference in execution time between your solutions. You may find the flamegraph utility helpful for supporting your claims.
Requirements
- You must benchmark the release (optimized) version of your implementations.
- You should benchmark with the `4x4_zero` test, as it runs long enough to keep measurement noise down yet short enough not to require a coffee break each run.
- Your serial implementation must be at most 1.5x slower than the provided C and C++ implementations.
- Your slowest parallel implementation must be at least 2x faster than your serial implementation.
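
Note that `cargo build --release` places the optimized binaries (including those built from `src/bin/`) under `target/release/`, and `cargo run --release --bin <name>` runs a specific one.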
Part 2
You will implement a multi-producer, multi-consumer channel in safe Rust.
Details
- You should already have the exercise repository cloned.
- In the `part-2-channel` folder, fill in the `todo!()` statements.
- Implement a multi-consumer version of `std::sync::mpsc`, with a similar API. Read the included documentation for the corresponding functions and types in `std::sync::mpsc`, as well as the tests, to figure out the details of your implementation.
Testing
An integration test is provided in the `tests` directory.
To run the tests, run `cargo test` (or, if you have nextest installed, `cargo nextest run`).
Requirements and Hints
- You must ensure you do not publicly expose any unnecessary (internal) fields or methods. Hint: see the documentation on visibility keywords.
- Use a `std::collections::VecDeque` for efficient pushing and popping at both ends of the queue.
- Hint: use a mutex to protect access to the underlying shared queue.
- The `Receiver` must not busy-wait when attempting to retrieve an item from the queue. Hint: use a condition variable.
- You cannot use the `std::sync::mpsc` library in your solution; use the `VecDeque` mentioned above and standard library concurrency primitives.
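
To make these hints concrete, here is a minimal, self-contained sketch of the overall shape: a `VecDeque` behind a `Mutex`, a `Condvar` for blocking receivers, and cloneable `Sender`/`Receiver` handles that share the state through an `Arc`. The names, the omitted disconnect/error handling, and the exact API are assumptions; the `todo!()` signatures in the exercise skeleton take precedence.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Shared state: a queue guarded by a mutex, plus a condition variable that
// receivers sleep on when the queue is empty (no busy-waiting).
struct Shared<T> {
    queue: Mutex<VecDeque<T>>,
    available: Condvar,
}

pub struct Sender<T> {
    shared: Arc<Shared<T>>,
}

pub struct Receiver<T> {
    shared: Arc<Shared<T>>,
}

impl<T> Clone for Sender<T> {
    fn clone(&self) -> Self {
        Sender { shared: Arc::clone(&self.shared) }
    }
}

impl<T> Clone for Receiver<T> {
    fn clone(&self) -> Self {
        Receiver { shared: Arc::clone(&self.shared) }
    }
}

impl<T> Sender<T> {
    pub fn send(&self, value: T) {
        // Push under the lock, then wake one waiting receiver.
        self.shared.queue.lock().unwrap().push_back(value);
        self.shared.available.notify_one();
    }
}

impl<T> Receiver<T> {
    pub fn recv(&self) -> T {
        let mut queue = self.shared.queue.lock().unwrap();
        // Wait on the condition variable until an item is available;
        // the loop guards against spurious wakeups.
        loop {
            if let Some(value) = queue.pop_front() {
                return value;
            }
            queue = self.shared.available.wait(queue).unwrap();
        }
    }
}

pub fn channel<T>() -> (Sender<T>, Receiver<T>) {
    let shared = Arc::new(Shared {
        queue: Mutex::new(VecDeque::new()),
        available: Condvar::new(),
    });
    (Sender { shared: Arc::clone(&shared) }, Receiver { shared })
}

fn main() {
    // Hypothetical usage: two producers, two consumers.
    let (tx, rx) = channel::<u32>();
    let (tx2, rx2) = (tx.clone(), rx.clone());
    let producers = vec![
        thread::spawn(move || tx.send(1)),
        thread::spawn(move || tx2.send(2)),
    ];
    let consumers = vec![
        thread::spawn(move || rx.recv()),
        thread::spawn(move || rx2.recv()),
    ];
    for p in producers {
        p.join().unwrap();
    }
    let total: u32 = consumers.into_iter().map(|c| c.join().unwrap()).sum();
    println!("received total: {total}"); // 3
}
```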