An Introduction To Parallel Systems
This is a short series of talks about parallel programming I gave for BCCS. I'll try to post the slides for each talk, plus any appropriate links.
Lecture One - Who, What, Why, Where, When?
Lecture Two - Data Parallelism and Vector Processors
- GCC has built-in functions and data attributes for working with SIMD instructions; see GCC manual 5.42 and the relevant architecture in GCC manual 5.47. Alternatively, use the autovectorisation support.
- The Clearspeed CSX600 is the SIMD processor referred to in the slides.
- Versions of GCC from 4.2.0 onwards have support for OpenMP as described here. One application of this is in creating a parallel version of the C++ standard library. The "examples" appendix of the OpenMP standard is particularly good and illustrates not only what the basic concepts are but also how they are used.
- Diego Novillo's slides from the Red Hat Summit 2006 cover both autovectorisation and OpenMP.
- BOINC is one of the most popular massively parallel platforms; it currently powers SETI@home. Folding@home is one of the largest massively parallel networks. The first of these systems was distributed.net.
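To make the GCC links above a little more concrete, here is a minimal sketch of the two approaches: the vector extension (the `vector_size` attribute) for explicit SIMD, and an OpenMP reduction. The function names `saxpy` and `dot` and the 16-byte vector width are my own choices for illustration and are not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four packed 32-bit floats; vector_size is given in bytes. */
typedef float v4sf __attribute__((vector_size(16)));

/* y[i] = a * x[i] + y[i], four lanes at a time.
   n does not have to be a multiple of four. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    const v4sf va = {a, a, a, a};
    size_t i = 0;

    assert(x != NULL && y != NULL);

    for (; i + 4 <= n; i += 4) {
        v4sf vx, vy;
        /* memcpy avoids any alignment assumptions on x and y. */
        memcpy(&vx, x + i, sizeof vx);
        memcpy(&vy, y + i, sizeof vy);
        vy = va * vx + vy;            /* element-wise on all four lanes */
        memcpy(y + i, &vy, sizeof vy);
    }
    for (; i < n; i++)                /* scalar tail */
        y[i] = a * x[i] + y[i];
}

/* Dot product with an OpenMP reduction. Compile with -fopenmp to split
   the loop across threads; without the flag the pragma is ignored and
   the loop simply runs serially. */
double dot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    long i;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long)n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

Something like `gcc -O2 -fopenmp` should build both. Whether the vector arithmetic actually maps to SIMD instructions depends on the target architecture and flags, so it is worth checking the generated assembly.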
Lecture Three - Message Passing Systems
Links for further reference on some of the topics covered:
- The MPI standard(s). At the time of writing, 1.2 is widely implemented / in use and 2.0 is beginning to be adopted.
- MPICH2 is one of the popular, free implementations of MPI. Note that the MPICH2 install guide gives a very good, step-by-step guide to setting up an MPI environment on a network of machines that share their /home.
- Most modern process calculus systems are descendants of Robin Milner's CCS (Calculus of Communicating Systems) or Tony Hoare's CSP (Communicating Sequential Processes); the author's 1985 book of the same name may also be of interest. I'd also strongly recommend the Synchronous Calculus of Resource and Process (SCRaPs), although I'm biased on the issues...
- Model checking tools take a process calculus model of a system and seek to verify properties of it (for example, showing that it doesn't deadlock). SPIN is one of the best known model checkers.
- libnuma, a system for better exploiting NUMA architectures on Linux, is described in the paper A NUMA API for Linux.
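As a rough illustration of the blocking send/receive style that message passing systems are built on, here is a small sketch. Note that it does not use MPI: to keep it buildable with nothing but a C compiler it swaps in POSIX pipes and fork() as a stand-in for two communicating processes. The function name `ping_pong` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send `value` to a child process and store the child's answer
   (value + 1) in *reply. Returns 0 on success, -1 on failure. */
int ping_pong(int value, int *reply)
{
    int to_child[2], to_parent[2];
    pid_t pid;
    int ok;

    assert(reply != NULL);

    if (pipe(to_child) == -1)
        return -1;
    if (pipe(to_parent) == -1) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid = fork();
    if (pid == -1) {
        close(to_child[0]);  close(to_child[1]);
        close(to_parent[0]); close(to_parent[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: blocking receive, compute, blocking send. */
        int msg;
        close(to_child[1]);
        close(to_parent[0]);
        if (read(to_child[0], &msg, sizeof msg) != (ssize_t)sizeof msg)
            _exit(1);
        msg += 1;
        if (write(to_parent[1], &msg, sizeof msg) != (ssize_t)sizeof msg)
            _exit(1);
        _exit(0);
    }

    /* Parent: blocking send, then blocking receive. */
    close(to_child[0]);
    close(to_parent[1]);
    ok = write(to_child[1], &value, sizeof value) == (ssize_t)sizeof value
      && read(to_parent[0], reply, sizeof *reply) == (ssize_t)sizeof *reply;
    close(to_child[1]);
    close(to_parent[0]);
    waitpid(pid, NULL, 0);   /* reap the child */
    return ok ? 0 : -1;
}
```

The order of operations matters: if both sides started with a blocking receive, each would wait for the other forever. That is exactly the kind of deadlock a model checker would be asked to rule out.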
Lecture Four - Shared Resource Parallelism
Links for further reference on some of the topics covered:
- Thread-local storage: see GCC Manual 5.51.
- The pthreads interface is described in the XSH section of the POSIX standard. Alternatively see your system's manpages (try apropos pthread).
- GNU Portable Threads is a nice example of a userspace, co-operative (non pre-emptive) threading library.
- Modern GCCs provide built-in functions for atomic memory access; see GCC Manual 5.44. Note that the Linux kernel and GNU Libc also have their own macros for atomic operations.
- When considering the assumptions given on slide 16, it may be worth investigating the volatile keyword (and how it is implemented), memory barriers, memory consistency models, and Hans Boehm's paper Threads Cannot be Implemented as a Library.
- One other synchronisation primitive of note is the monitor. Also well worth a read are the same author's views on Java's approach to synchronisation: Java's insecure parallelism.
- Futexes are a series of userspace and kernel functions used to implement the POSIX synchronisation primitives in the NPTL (the current POSIX thread implementation in Linux/Glibc). They serve as a good illustration of the basic building blocks used in a wide variety of synchronisation primitives. There are two relevant papers: Fuss, futexes and furwocks: Fast Userlevel Locking in Linux, and Futexes Are Tricky, which explains how (not) to use them. In short: don't use them directly, use the POSIX primitives, but do have a look at them.
- An example of priority inversion and the problems it can cause is the trouble experienced by the Mars Pathfinder spacecraft.
- A set of references and pointers (pun intended) on lock-free libraries is available here. A number of good papers and implementations of lock-free data structures can be found here.
- The ABA problem is described and a solution proposed in this paper.
- The original paper on software transactional memory; a newer version of the idea is contained in a more recent paper by the same authors.
- I forgot to rant about how cool helgrind is; it's part of the rather fantastic valgrind tool set. You may also want to look at the NPTL trace tool.
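For a feel of how the pthreads interface and GCC's atomic built-ins fit together, here is a minimal sketch: several threads increment one shared counter using `__sync_fetch_and_add`, so no updates are lost even though no lock is taken. The names `atomic_count`, `worker`, `struct job` and `MAX_THREADS` are mine, invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

struct job {
    long *counter;   /* shared between all threads */
    long  iters;     /* increments per thread */
};

static void *worker(void *arg)
{
    struct job *j = arg;
    long i;

    for (i = 0; i < j->iters; i++)
        __sync_fetch_and_add(j->counter, 1);   /* atomic read-modify-write */
    return NULL;
}

/* Run nthreads workers, each adding 1 to a shared counter iters times.
   Returns the final counter value, or -1 if the arguments are out of
   range or a thread could not be created. */
long atomic_count(int nthreads, long iters)
{
    pthread_t tid[MAX_THREADS];
    long counter = 0;
    struct job j;
    int started, i;

    if (nthreads < 1 || nthreads > MAX_THREADS || iters < 0)
        return -1;

    j.counter = &counter;
    j.iters = iters;

    for (started = 0; started < nthreads; started++)
        if (pthread_create(&tid[started], NULL, worker, &j) != 0)
            break;

    /* pthread_join also makes the workers' writes visible here. */
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    assert(started <= nthreads);
    return started == nthreads ? counter : -1;
}
```

Build with `gcc -pthread`. Replacing the built-in with a plain `(*j->counter)++` turns this into a data race, which makes it a handy small test case to try under helgrind.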