This project was created during my High-Performance Computing course as part of the Master's degree in Software Engineering at University College Dublin, 2017/2018. I'm uploading it in the hope that it may be useful to other students.
The goal of the first assignment is to write C programs implementing the following four algorithms for multiplying two n×n dense matrices:
- Straightforward non-blocked ijk algorithm.
- Straightforward BLAS algorithm.
- Blocked ijk algorithm using square b×b blocks.
- Blocked kij algorithm using square b×b blocks.
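As a rough illustration of how the straightforward and blocked variants differ, here is a minimal sketch of a non-blocked ijk loop and a blocked kij loop. It is not taken from the sources in src/. The function names matmul_ijk and matmul_blocked_kij, the row-major double layout and the bs block-size parameter are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward ijk product: C = A * B.
 * All matrices are n x n, stored row-major (an assumption of this sketch). */
void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

static size_t min_size(size_t x, size_t y)
{
    return x < y ? x : y;
}

/* Blocked kij product using square bs x bs blocks.
 * Blocks at the matrix edge are clipped, so n need not be a multiple of bs. */
void matmul_blocked_kij(size_t n, size_t bs,
                        const double *a, const double *b, double *c)
{
    assert(bs > 0);

    /* The kij order accumulates into C, so C has to start at zero. */
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t kk = 0; kk < n; kk += bs) {
        size_t k_end = min_size(kk + bs, n);
        for (size_t ii = 0; ii < n; ii += bs) {
            size_t i_end = min_size(ii + bs, n);
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t j_end = min_size(jj + bs, n);

                /* Multiply one block of A by one block of B. */
                for (size_t k = kk; k < k_end; k++) {
                    for (size_t i = ii; i < i_end; i++) {
                        double a_ik = a[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += a_ik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions compute the same product. The blocked version only changes the order in which elements are visited, so that a small block of each matrix is reused while it is likely to still be in cache.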
The goal of the second assignment is to write Pthreads, OpenMP and MPI C programs that multiply two n×n dense matrices on a p-processor SMP and calculate the norm of the resulting matrix, such that:
- p threads/processes are involved in the computations.
- 1-dimensional parallel algorithm of matrix multiplication is employed:
  - matrix B is vertically partitioned into p equal slices.
  - there is one-to-one mapping between the partitions and the threads.
  - each thread is responsible for computation of the corresponding slice of the resulting matrix.
- Computation of the norm of the resulting matrix employs the mutex synchronization mechanism.
- The norm used is the maximum absolute row sum norm (infinity norm).
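The requirements above can be sketched with Pthreads as follows. This is a minimal illustration, not the code in src/pthread.c. The names slice_task, slice_worker and threaded_matmul_inf_norm are hypothetical, matrices are assumed to be row-major arrays of double, and n is assumed to be divisible by p. Since a vertical slice holds only part of each row, this sketch has every thread add its partial absolute row sums to a shared array under the mutex, which is one possible reading of the mutex requirement.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Work description for one thread: a vertical slice of B and of C. */
typedef struct {
    size_t n;               /* matrix dimension */
    size_t col_begin;       /* first column of the slice */
    size_t col_end;         /* one past the last column of the slice */
    const double *a;
    const double *b;
    double *c;
    double *row_sums;       /* shared: absolute row sums of C */
    pthread_mutex_t *lock;  /* protects row_sums */
} slice_task;

static void *slice_worker(void *arg)
{
    slice_task *t = arg;
    size_t n = t->n;

    /* Compute columns [col_begin, col_end) of C = A * B. */
    for (size_t i = 0; i < n; i++) {
        for (size_t j = t->col_begin; j < t->col_end; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += t->a[i * n + k] * t->b[k * n + j];
            t->c[i * n + j] = sum;
        }
    }

    /* Fold this slice's share of every row sum into the shared array.
     * The lock is held for the whole pass to keep the sketch short. */
    pthread_mutex_lock(t->lock);
    for (size_t i = 0; i < n; i++) {
        double partial = 0.0;
        for (size_t j = t->col_begin; j < t->col_end; j++) {
            double v = t->c[i * n + j];
            partial += v < 0.0 ? -v : v;
        }
        t->row_sums[i] += partial;
    }
    pthread_mutex_unlock(t->lock);
    return NULL;
}

/* Returns the infinity norm of C = A * B, or -1.0 on failure. */
double threaded_matmul_inf_norm(size_t n, size_t p,
                                const double *a, const double *b, double *c)
{
    assert(n > 0 && p > 0 && n % p == 0);

    pthread_t *threads = malloc(p * sizeof *threads);
    slice_task *tasks = malloc(p * sizeof *tasks);
    double *row_sums = calloc(n, sizeof *row_sums);
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    double norm = -1.0;
    size_t started = 0;

    if (threads != NULL && tasks != NULL && row_sums != NULL) {
        size_t width = n / p;

        /* One thread per slice: a one-to-one mapping. */
        for (size_t t = 0; t < p; t++) {
            tasks[t] = (slice_task){ n, t * width, (t + 1) * width,
                                     a, b, c, row_sums, &lock };
            if (pthread_create(&threads[t], NULL, slice_worker, &tasks[t]) != 0)
                break;
            started++;
        }
        for (size_t t = 0; t < started; t++)
            pthread_join(threads[t], NULL);

        /* The norm is the largest absolute row sum. */
        if (started == p) {
            norm = 0.0;
            for (size_t i = 0; i < n; i++)
                if (row_sums[i] > norm)
                    norm = row_sums[i];
        }
    }

    free(threads);
    free(tasks);
    free(row_sums);
    return norm;
}
```

For example, with A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]] the product is [[19, 22], [43, 50]], so the absolute row sums are 41 and 93 and the norm is 93. Depending on the toolchain, the sketch may need the -pthread flag when compiling.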
.
|-- bin/ (compiled binaries)
|-- data/ (txt files with generated data)
|-- src
| |-- lib
| | |-- utils.h
| | |-- matrix.c
| | `-- matrix.h
| |-- mpi.c
| |-- openmp.c
| |-- pthread.c
| |-- blk_seq_ijk.c
| |-- blk_seq_kij.c
| |-- seq_blas.c
| `-- seq_ijk.c
|-- Makefile
|-- random_matrix.py
|-- README.md
|-- test_blocked_seq.sh
|-- test_mpi.sh
|-- test_openmp.sh
|-- test_pthread.sh
|-- test_sequential.sh
`-- test.sh
seq_ijk ............... Sequential ijk multiplication
seq_kij .............. Sequential kij multiplication
seq_blas ............. Sequential BLAS multiplication
blk_seq_ijk ... Blocked sequential ijk multiplication
blk_seq_kij ... Blocked sequential kij multiplication
pthread ...................... Pthreads infinity norm
openmp ......................... OpenMP infinity norm
mpi ............................... MPI infinity norm
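To give an idea of what the openmp program listed above has to do, here is a minimal OpenMP sketch of the same 1-dimensional scheme. It is not the code in src/openmp.c. The function name openmp_matmul_inf_norm is hypothetical, matrices are assumed to be row-major arrays of double, n is assumed to be divisible by p, and a critical section stands in for the mutex.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns the infinity norm of C = A * B, or -1.0 on allocation failure.
 * p is the number of vertical slices and the requested number of threads. */
double openmp_matmul_inf_norm(size_t n, int p,
                              const double *a, const double *b, double *c)
{
    assert(n > 0 && p > 0 && n % (size_t)p == 0);

    double *row_sums = calloc(n, sizeof *row_sums);
    if (row_sums == NULL)
        return -1.0;

    size_t width = n / (size_t)p;

    /* One loop iteration per vertical slice of B and C. */
    #pragma omp parallel for num_threads(p)
    for (int t = 0; t < p; t++) {
        size_t begin = (size_t)t * width;
        size_t end = begin + width;

        /* Compute columns [begin, end) of C = A * B. */
        for (size_t i = 0; i < n; i++) {
            for (size_t j = begin; j < end; j++) {
                double sum = 0.0;
                for (size_t k = 0; k < n; k++)
                    sum += a[i * n + k] * b[k * n + j];
                c[i * n + j] = sum;
            }
        }

        /* Add this slice's share of every absolute row sum;
         * only one thread at a time may update row_sums. */
        #pragma omp critical
        {
            for (size_t i = 0; i < n; i++) {
                double partial = 0.0;
                for (size_t j = begin; j < end; j++) {
                    double v = c[i * n + j];
                    partial += v < 0.0 ? -v : v;
                }
                row_sums[i] += partial;
            }
        }
    }

    /* The norm is the largest absolute row sum. */
    double norm = 0.0;
    for (size_t i = 0; i < n; i++)
        if (row_sums[i] > norm)
            norm = row_sums[i];

    free(row_sums);
    return norm;
}
```

With GCC the pragmas take effect only when compiling with -fopenmp. Without that flag they are ignored and the function runs serially, returning the same result.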
The Python script random_matrix.py generates an n x m random floating-point matrix. The script is based on the uniform function from the random package in Python's standard library. The script can be run manually or, alternatively, the testdata recipe in the Makefile can be run to generate random matrices ranging from 8 x 8 to 4096 x 4096. The Makefile also includes all recipes required to compile and run the tests. Each compilation recipe has a debug variant with verbose message logging. Check lib/utils.h for details.
The Makefile also holds the parameter configuration for GCC and the path to the BLAS library (OpenBLAS by default). The Makefile is intended to be invoked by run.sh and by the specific test suites.
CC=gcc
CFLAGS= -Wall -std=gnu99 -g
LIBS=src/lib/matrix.c
TUNE= -O0
OPEN_BLAS_DIR=/opt/OpenBLAS
OPEN_BLAS=-static -I$(OPEN_BLAS_DIR)/include/ -L$(OPEN_BLAS_DIR)/lib -lopenblas -lpthread
seq_blas:
	$(CC) $(TUNE) $(CFLAGS) -o bin/seq_blas $(LIBS) src/seq_blas.c $(OPEN_BLAS)
debug_seq_blas:
	$(CC) $(TUNE) $(CFLAGS) -DDEBUG -o bin/seq_blas $(LIBS) src/seq_blas.c $(OPEN_BLAS)
...
All binaries are compiled without any compiler optimizations (-O0 option).
The run.sh wrapper script is a shorthand for debugging. It accepts the name of the program to compile, the matrix dimensions and optional program-specific arguments (such as the block size), then compiles the program and runs the test. Run ./run.sh for help and available options.
$ make testdata
$ ./run.sh seq_ijk 8 8 8 --debug
Hardware statistics .................
CPU(s): 8
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Compiling debug seq_ijk .............
Running tests .......................
- test_sequential: Tests performance of straightforward ijk, blocked ijk/kij and BLAS routines in matrix multiplication.
- test_block_size_dependence: Tests dependence of the blocked ijk/kij algorithms on the block size.
- test_mpi: Tests MPI performance in matrix norm calculation, 4 processes.
- test_openmp: Tests OpenMP performance in matrix norm calculation.
- test_pthread: Tests Pthreads performance in matrix norm calculation.
- test_mpi_np_dependence: Tests MPI performance against number of processes.