This view classifies PDC unplugged activities according to the NSF/IEEE-TCPP Curriculum Initiative on Parallel and Distributed Computing (TCPP 2012), which aims to articulate the core PDC topics that a computer science student should cover during their undergraduate education. The TCPP report informed the creation of the PD topic area in CS2013, and lists several cross-cutting topics. For now, our focus is on identifying unplugged activities that satisfy each of the topics suggested for Core courses (CS1, CS2, Systems, and DS/A); Advanced/Elective courses are not covered. TCPP uses Bloom's Taxonomy to indicate the expected level of coverage of each topic:
each topic below is preceded by one of the Bloom levels (Know, Comprehend, or Apply), which gauges the depth at which the topic should be covered in a course.
Topic | Suggested Course | Unplugged Activities |
Comprehend Parallel Taxonomy: Flynn's taxonomy, data vs. control parallelism, shared/distributed memory | Systems | DesertIslandsAnalogy; SieveOfErastothenes (2 total) |
Know Superscalar (ILP): Describe opportunities for multiple instruction issue and execution (different instructions on different data) | Systems | |
Know SIMD/Vector (e.g., AVX, GPU, TPU): Describe uses of SIMD/Vector (same operation on multiple data items), e.g., accelerating graphics for games. | Systems | CoinFlip (1 total) |
Know SIMD/Vector energy effects: Saves energy by sharing one instruction over many instruction executions, whether in parallel (SIMD) or pipelined (vector) | Systems | |
Comprehend Pipelines (Basic Structure and Organization): Describe basic pipelining process (multiple instructions can execute at the same time), describe stages of instruction execution | Systems | KitchenAnalogy; PipelineSort (2 total) |
Know Pipelines (Data and control hazards): Show examples of how one pipe stage can depend on a result from another, or how delayed branch resolution can start the wrong instructions in a pipe, reducing performance | Systems | |
Know Pipeline Streams (e.g., GPU): Know that stream-based architecture exists in GPUs for graphics | Systems | PipelineSort (1 total) |
Know MIMD: Identify MIMD instances in practice (e.g., multicore, cluster), and know the difference between execution of tasks and threads | Systems | CoinFlip; JigsawPuzzle (4 total) |
Know MultiThreading: Distinguish multithreading from multicore (based on which resources are shared) | Systems | |
Comprehend Multicore: Describe how cores share resources (cache, memory) and resolve conflicts | Systems | JigsawPuzzle; MatrixAddition (2 total) |
Know Heterogeneous (e.g., GPU, security, AI, low-power cores): Recognize that the cores on a multicore chip may not all be the same kind of core (mix of organizations, instruction sets; high-level explanation of benefits and costs) | Systems | DesertIslandsAnalogy (1 total) |
Know Heterogeneous Architecture Energy (small vs. big cores, CPU vs. GPU, accelerators): Know that heterogeneity saves energy by using power-efficient cores when there is sufficient parallelism | Systems | |
Know Symmetric Multi-Processor (SMP): Explain concept of uniform access shared memory architecture | Systems | |
Comprehend Buses for shared-memory implementation: Be able to explain issues related to shared memory being a single resource, limited bandwidth and latency, snooping, and scalability | Systems | MatrixAddition (1 total) |
Know Distributed Memory Latency: Know the concept, implications for scaling, impact on work/communication ratio to achieve speedup. Awareness of aspects of latency in parallel systems and networks that contribute to the idea of time accumulating over dependent parts or distance | Systems | ParallelAddition; LongDistancePhoneCall (3 total) |
Know Distributed Memory Bandwidth: Know the concept, how it limits sharing, and considerations of data movement cost | Systems | LongDistancePhoneCall (1 total) |
Comprehend Caching: Know the cache hierarchies, and that shared caches (vs. private caches) result in coherency and performance issues for software | Systems | KitchenAnalogy (1 total) |
Know Atomicity: CS2: Show awareness of the significance of thread-safe data structures in libraries. Systems: Explain the need for atomic operations and their use in synchronization, both in local and distributed contexts. | CS2/Systems | |
Know Interrupts and event handling: CS2: Know that event handling is an aspect of graphical user interfaces. Systems: Know that I/O is mostly interrupt-driven. | CS2/Systems | |
Know Handshaking: Know the difference between synchronous and asynchronous communication protocols, including context of multiple processors. | Systems | |
Know Process ID, System/Node ID: Explain need for a process identifier to enable interprocess communication, and for a node identifier in initialization of multiprocessor systems | Systems | StabalizingLeaderElection (1 total) |
Know Floating Point Range: Understand that range is limited, implications of infinities | CS1/CS2/Systems | |
Know Floating Point Precision: How single and double precision floating point numbers impact software performance, basic ways to avoid loss of precision in calculations (see the sketch following this table) | CS1/CS2/Systems | |
Know Floating Point Error Propagation: Understand NaN, Infinity values and how they affect computations and exception handling | CS2 | |
Know Floating Point IEEE 754 standard: Representation, range, precision, rounding, NaN, infinities, subnormals, comparison, effects of casting to other types | CS1/CS2/Systems | |
Comprehend Instructions Per Cycle: Explain pros and cons of IPC as a performance metric, various pipelined implementations | Systems | |
Know Performance Benchmarks: Awareness of various benchmarks (such as LinPack, NAS parallel) and how they test different aspects of performance in parallel systems | Systems | |
Comprehend Peak Performance: Understanding peak performance, how it is rarely valid for estimating real performance, illustrate fallacies | Systems | |
Know MIPS/FLOPS: Know meaning of terms | Systems | |
Comprehend Peak Vs. Sustained Performance: Know difference between peak and sustained performance, how to define, measure, different benchmarks | Systems | |
Comprehend Power, Energy: Be able to explain difference between power (rate of energy usage) and energy | Systems | |
Know Hardware limitations for data bound computation: Comprehend the hardware limitations in the context of Big data and their most relevant v's (volume, velocity) | Systems | |
Know Pressure imposed by data volume: Know of the limitations in terms of storage, memories, and filesystems when dealing with very large data volumes | Systems | |
Know Pressure imposed by data velocity: Know of the limitations in bandwidth, memory, and processing when processing very fast data streams. Include networking and reliability issues | Systems | |
Know Cost of data movement across memory hierarchies: Comprehend the difference in speed vs cost (latency, energy) of accessing the different memory hierarchies, and of changing their sizes, in the context of multiple processors and hierarchies | Systems | |
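For the "Floating Point Precision" row above, a small worked example can make the loss-of-precision point concrete. The C sketch below is illustrative only (it is not part of the TCPP material or the unplugged activities; the file name and build command are assumptions): it sums the same ten million terms in a 32-bit and a 64-bit accumulator and prints both, so students can see the single-precision result drift away from the exact value while the double-precision result stays close. Pairwise or compensated (Kahan) summation are the standard ways to reduce such drift without moving to wider types.

```c
/* fp_precision.c (illustrative name) -- build with: cc fp_precision.c -o fp_precision */
#include <stdio.h>

int main(void) {
    const int n = 10000000;   /* ten million terms; exact sum is 1,000,000 */
    float  sum_f = 0.0f;      /* 32-bit accumulator */
    double sum_d = 0.0;       /* 64-bit accumulator */

    for (int i = 0; i < n; i++) {
        sum_f += 0.1f;        /* 0.1 is not exactly representable in binary */
        sum_d += 0.1;
    }

    printf("float  accumulator: %.2f\n", sum_f);
    printf("double accumulator: %.2f\n", sum_d);
    return 0;
}
```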
Topic | Suggested Course | Unplugged Activities |
Comprehend Concurrency and Parallelism: Understand that concurrency is an algorithmic property; it exposes potential for parallelization. If concurrency is present in an algorithm, it can be parallelized; without concurrency there is no scope for parallelization. Concurrency can be present in a sequential program; parallelization takes advantage of concurrency to increase performance. | CS1/CS2/DSA | SweetenJuice; PBJinParallel (2 total) |
Know SIMD: Understand common vector operations including element-by-element operations and reductions | CS2/Systems | |
Know SIMD (Processor vector extensions): Know examples, e.g., Intel AVX or Power VSX macros | Systems | |
Apply Shared Memory: Be able to write correct thread-based programs (protecting shared data) and understand how to obtain speedup. | CS2/DSA | ArrayAddition (1 total) |
Know Shared Memory Language Extensions: Know about language extensions for parallel programming. Illustrate with examples from Cilk (spawn/join), Java (Java threads), or other languages. | CS2/DSA | FlowerJoinAnalogy (1 total) |
Comprehend Shared Memory Compiler Directives: Understand what simple directives, such as those of OpenMP, mean (parallel for, concurrent sections); show examples of their use as pragmas | CS2/DSA | |
Comprehend Shared Memory Libraries: Know one in detail, and know of the existence of some other example libraries such as Pthreads, Pfunc, Intel's TBB (Thread building blocks), Microsoft's TPL (Task Parallel Library), C++ threads, etc. | CS2/DSA | |
Know Distributed Memory: Know basic notions of messaging among processes, different ways of message passing, collective operations | Systems/DSA | MessagePassingAcrobats; ByzantineGenerals (5 total) |
Comprehend Client Server and Peer to Peer models: Know notions of invoking and providing services (e.g., RPC, RMI, web services) - understand these as concurrent processes; know about network model of distributed computing (e.g., sockets); know that in distributed systems such handshaking interaction is crucial for the efficient communication between asynchronous processes | Systems | |
Know Hybrid: Know the notion of programming over multiple classes of machines simultaneously (CPU, GPU, TPU, etc.) | Systems | DesertIslandsAnalogy; MatrixAddition (2 total) |
Apply Task/thread spawning: Be able to write correct programs with threads, synchronize (fork-join, producer/consumer, master/worker, etc.), use dynamic thread creation (dynamic in number and possibly recursive) (e.g., Pthreads, Java threads, etc.); builds on the shared memory topic above | CS2/DSA | |
Know Event-Driven Execution: Know about the need for event-driven execution; possible approaches to implementing it. Know about the notion of causality among events (e.g., remote file access, GUI). These effects may be easier to discuss in the context of distributed systems. | CS2 | |
Comprehend SPMD: Understand how an SPMD program is written and how it executes | CS2 | DesertIslandsAnalogy (1 total) |
Comprehend SPMD Notations: Know the existence of highly threaded data parallel notations (e.g., CUDA, OpenCL), message passing (e.g., MPI), and some others (e.g., Global Arrays, BSP library) | CS2/DSA | |
Apply Data parallel: Be able to write a correct data-parallel program for shared-memory machines and get speedup; students should do an exercise. | CS2/DSA | ArrayAddition; SieveOfErastothenes (3 total) |
Apply Parallel loops for shared memory: Know, through an example, one way to implement parallel loops, understand collision/dependencies across iterations (e.g., OpenMP, Intel's TBB) for shared memory | CS2/DSA | |
Know Tasks and threads: Understand what it means to create and assign work to threads/processes in a parallel program, and know of at least one way to do that (e.g., OpenMP, Intel TBB, etc.) (0.5 hours) | CS2/DSA/Systems/ProgLang | MoreProcessorsNotAlwaysBetter; CompanyAnalogy (2 total) |
Know MapReduce: Understand how problems can be solved by mapreduce, and how algorithms can be written using map and reduce | CS2 | |
Know Offloading to accelerators: Know about running parts of applications on accelerators (e.g., GPU, TPU, FPGA) | CS2/Systems | |
Apply Tasks and Threads: Be able to write parallel programs that create and assign work to threads/processes, in at least one parallel environment (e.g., OpenMP, Intel TBB, pthreads, etc.) | CS2/DSA/Systems | |
Apply Synchronization: Be able to write shared memory programs with critical regions, producer- consumer communication, and get speedup; know the notions of mechanisms for concurrency (monitors, semaphores, etc.) | CS2/DSA/Systems | |
Apply Critical regions: Be able to write shared memory programs that use critical regions for synchronization (see the sketch following this table) | CS2/DSA/Systems | ArrayAddition; SweetenJuice (2 total) |
Apply Producer-consumer: Be able to write shared memory programs that use the producer-consumer pattern to share data and synchronize threads | CS2/DSA/Systems | |
Comprehend Concurrency Issues: Understand the notions of deadlock (detection, prevention), race conditions (definition), determinacy/non-determinacy in parallel programs (e.g., if there is a race condition, the correctness of the output may depend on the order of execution) | DSA/Systems | PBJinParallel (1 total) |
Comprehend Deadlock/Livelock: Understand what deadlock and livelock are, and methods for detecting and preventing them; also cast in terms of distributed systems | DSA/Systems | PBJinParallel; OrangeGame (2 total) |
Comprehend Starvation: Understand how starvation (of a thread or process) can occur, in context of an example (e.g., dining philosophers) | DSA/Systems | |
Comprehend Race Condition: Know what a race condition is, and how to use synchronization to prevent it | DSA/Systems | SurvivorAnalogy; ArrayAddition (6 total) |
Comprehend Computation Decomposition Strategies: Understand different ways to assign computations to threads or processes | CS2/DSA | PlantingTrees; StuffingEnvelopes (7 total) |
Comprehend Owner Computes Rule: Understand how to assign loop iterations to threads based on which thread/process owns the data element(s) written in an iteration | CS2/DSA | |
Comprehend Decomposition into Atomic Tasks: Understand how to decompose computations into tasks with communication only at the beginning and end of each task, and assign them to threads/processes | CS2/DSA | |
Know Decomposition into Map and Reduce tasks: Know that divide and conquer can be expressed and programmed as a Map and Reduce decomposition | CS2 | |
Comprehend Load balancing: Understand the effects of load imbalances on performance, and ways to balance load across threads or processes (1 hour) | DSA/Systems | PlantingTrees; FindOldestPenny (5 total) |
Comprehend Scheduling and mapping: Understand the importance of a programmer, compiler and/or runtime system mapping and scheduling computations to threads/processes, both statically and dynamically | DSA/Systems | JigsawPuzzle; CompanyAnalogy (2 total) |
Know Effect of timing failures/delay in distributed systems: Understand that a failure in one node can cause a global failure in a distributed system. For example, one could use waiting on a non-terminating program to illustrate a failure scenario in distributed systems (e.g., in the context of consensus). | CS2 | |
Know Data: Understand impact of data distribution, layout and locality on performance | DSA | JigsawPuzzle (1 total) |
Know Data layout: Know how to lay out data in memory to improve performance (memory hierarchy in a shared memory parallel system) | DSA/Systems | KitchenAnalogy (1 total) |
Know Data False sharing: Know that for cache coherent shared memory systems, data is kept coherent in blocks, not individual words, and how to avoid false sharing of data in a block across threads | DSA/ProgLang | KitchenAnalogy (1 total) |
Know Data Energy Impact: Know the energy cost of loading data into memory from secondary storage (and writing out modified data to secondary storage) | Systems | |
Know Data Representation: Power savings using smaller data representations (64-bit vs. 32-bit vs. 16-bit floating-point precision, 32-bit vs. 16-bit integer). For example, machine learning on GPUs is driving lower (16-bit) floating-point precision. | DSA/Systems | |
Know Data locality: Know what spatial and temporal locality are, and how to organize data to take advantage of them | DSA/ProgLang | KitchenAnalogy (1 total) |
Know Performance impact of data movement: Know the performance cost of moving data to secondary storage for big data analysis, distributed vs centralized analysis. Be aware of specific mechanisms that take advantage of locality (e.g., in-situ processing) | Systems | |
Know Structured vs unstructured data: Know the differences and tradeoffs between these data representations | DSA | |
Know Graph representations and databases: Know of graph representations of data to facilitate graph analysis | DSA | |
Know Distributed file systems: Be aware of existence of distributed file systems and common examples of where they are used and why | Systems | |
Know Performance Tools: Know of tools for runtime monitoring (e.g., gprof, perf, Intel Performance Toolkit, TAU) | DSA/Systems | |
Comprehend Speedup: Understand how to compute speedup, and what it means | CS2/DSA | ParallelAddition; FindSmallestCard (4 total) |
Comprehend Efficiency: Understand how to compute efficiency, and why it matters | CS2/DSA | |
Comprehend Parallel Scalability: Understand that speedup and efficiency are single-point measures for a particular problem size and number of processes/threads. These metrics change as problem size and/or number of processes/threads vary. Understand that scalability is a metric that measures how speedup varies as problem size and/or number of processes/threads vary. | CS2/DSA | |
Know Amdahl's Law: Know that speedup is limited by the sequential portion of a parallel program, if problem size is kept fixed | CS2/DSA | ParallelAddition; MoreProcessorsNotAlwaysBetter (3 total) |
Know Gustafson’s Law: Understand the idea of weak scaling, where problem size increases as the number of processes/threads increases | CS2/DSA | |
Know Power-latency tradeoff: Familiar with the notion that problem decompositions (including their granularity), and active/idle states (e.g., including modulation of CPU frequency) may be exploited to adjust balance among throughput, latency, and energy consumption. | Systems | |
Know Energy efficiency vs. load balancing: Aware that unbalanced work decomposition and communication congestion can prolong computation and reduce energy efficiency. | Systems | |
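Several rows above ("Shared Memory Compiler Directives", "Apply Data parallel", "Apply Critical regions", "Comprehend Race Condition") revolve around the same small program pattern: summing an array with threads. The C/OpenMP sketch below is illustrative only (the file name, array size, and build line are assumptions, and it is not one of the unplugged activities); it contrasts a racy unsynchronized sum with a critical-region version and a reduction-clause version, the usual progression when this topic cluster is shown in code.

```c
/* omp_sum.c (illustrative name) -- build with: cc -fopenmp omp_sum.c -o omp_sum */
#include <stdio.h>

#define N 1000000

static double a[N];

int main(void) {
    for (int i = 0; i < N; i++) a[i] = 1.0;

    double racy = 0.0, safe = 0.0, reduced = 0.0;

    /* Race condition: concurrent read-modify-write of 'racy'; result is
       unpredictable and usually wrong. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        racy += a[i];

    /* Critical region: correct, but every update is serialized, so little speedup. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        #pragma omp critical
        safe += a[i];
    }

    /* Reduction: each thread accumulates a private partial sum that OpenMP
       combines at the end; correct and scalable. */
    #pragma omp parallel for reduction(+:reduced)
    for (int i = 0; i < N; i++)
        reduced += a[i];

    printf("racy=%.0f  critical=%.0f  reduction=%.0f  (expected %d)\n",
           racy, safe, reduced, N);
    return 0;
}
```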
Topic | Suggested Course | Unplugged Activities |
Comprehend Concurrency, Asynchrony, Dependencies, and Nondeterminism: Qualitatively understand the notion of concurrency, asynchrony, dependencies, and nondeterminism through one or more everyday examples illustrating simultaneous events with dependencies. The examples can be non-computational in CS1; for example, preheating the oven and preparing the batter for a cake can proceed concurrently and asynchronously to save time, but both these events must finish before baking starts (dependency). Computational examples of the appropriate level can be used in CS2. The goal for the instructor is to develop a baseline of ideas upon which to build the PDC concepts. | CS1/CS2 | StabalizingLeaderElection; ParallelGarbageCollection (4 total) |
Comprehend Asymptotics: Be able to describe upper (big-O) and lower bounds (big-Omega) in the context of PDC, understanding that the functions whose order is being analyzed may have an additional variable related to the number of processing elements in the PDC context. | DSA/CS2 | |
Comprehend Time: Recognize time as a fundamental computational resource that can be influenced by parallelism. | DSA | PlantingTrees; ParallelAddition (6 total) |
Comprehend Work: Understand the definition of computational work and how it is different from time. Be able to observe its impact on complexity measures such as time, speedup, energy consumption, etc. | DSA | |
Comprehend Space/Memory: Recognize space/memory in the same manner as time | DSA | |
Comprehend Memory and Communication complexity: Understand that data movement (such as memory accesses or communication) may take more wall-time and energy than computations. Understand also that in certain distributed "big data" applications, moving data is cost prohibitive. | DSA | |
Comprehend Speedup (Scalability): Recognize the use of parallelism either to solve a given problem instance faster or to solve a larger instance in the same time. Understand and be able to explain why there is an upper limit on speedup for a given problem of a given size (Amdahl's law; a worked form of the bound follows this table). | DSA | ParallelAddition; FindSmallestCard (8 total) |
Know Efficiency, Scalability, Throughput: Know about the notions of efficiency, strong and weak scaling, and throughput. | DSA | |
Know Model(s) to abstract from architectural details (for example PRAM (shared mem) and/or completely connected (network)): Understand concurrency basics without the trappings of real systems (routing, data alignment etc.). Recognize the PRAM as embodying the simplest forms of parallel computation: Embarrassingly parallel problems can be sped up easily just by employing many processors. Recognize how a completely connected network abstracts away from routing details. Recognize the difference between the model(s) and real systems. Such a MODEL of CHOICE (MoC) is assumed to be adopted by the instructor on which PDC concepts would be discussed. | DSA | |
Know Notions from scheduling: Understand how to decompose a problem into tasks | DSA | StuffingEnvelopes (1 total) |
Apply Dependencies: Understand how dependencies constrain the execution order of sub-computations (thereby lifting one from the limited domain of “embarrassing parallelism” to more complex computational structures) and how deleterious race conditions can occur when dependencies among concurrent tasks are not respected. Also understand that there are situations where not respecting the order of certain computations may result in nondeterministic, but acceptable results. | CS1/CS2/DSA | PlantingTrees; StuffingEnvelopes (6 total) |
Comprehend Task Graphs: See multiple examples of this concrete algorithmic abstraction as a mechanism for exposing inter-task dependencies. These graphs, which are used also in compiler analyses, form the level at which parallelism is exposed and exploited | DSA/SWEng | |
Know Make/Span: Observe analyses in which makespan is identified with parallel time (basically, time to completion) | DSA | |
Apply Divide & Conquer (parallel aspects): Recognize that the same structure that enables divide and conquer (sequential) algorithms exposes opportunities for parallel computation. Examples include mergesort or numerical integration (trapezoid rule, Simpson’s rule) or (at a more advanced level) Strassen's matrix-multiply. | CS2/DSA | ParallelAddition; FindSmallestCard (3 total) |
Know Load Balancing: Understand that equitably sharing the computational/communication load among processors benefits the entire algorithm; that is, the most stretched processor determines the performance of the entire algorithm. Use a simple task graph (for example) with a small number of processors to illustrate the idea. | DSA/CS2 | PlantingTrees; SieveOfErastothenes (2 total) |
Comprehend Reduction: Recognize and use the tree structure implicit in applications such as scalar product, mergesort, histogram, mapreduce, etc. | DSA | |
Apply Synchronization: Recognize that synchronization is necessary for certain algorithms to work correctly. Also recognize that synchronization should be used only when needed as it entails its own overheads. | CS1/CS2/DSA | SweetenJuice; PBJinParallel (2 total) |
Know Parallel Prefix (Scan): Observe, via examples, this "high-level" algorithmic tool. For instance, polynomial evaluation can be done as a prefix product (of powers of x), then a set of multiplications, followed by a reduction. | DSA | |
Know Other multi-party communication patterns: Recognize common multi-party communication patterns such as broadcast, gather, scatter, all-to-all or many-to-many communications, and their use as building blocks for parallel and distributed algorithms. Illustrate using block matrix transpose or shuffle from map-reduce, etc. | CS2/DSA | |
Comprehend Mutual Exclusion and Conflict Resolution: Understand the need to resolve conflicts among concurrent processes competing for a shared resource. Here the computation may have to grant exclusive access to one process to ensure correctness and/or progress. Be able to identify and mitigate problems due to races. Instructors may provide examples such as (a) selecting an airline seat that may be simultaneously competed for by several passengers, (b) selecting which customer gets an item when more than one tries to buy it simultaneously, (c) mutual exclusion in the context of Java threads, (d) Dining philosophers. | CS2/DSA | SweetenJuice (1 total) |
Comprehend Reduction and Broadcast for communication and synchronization: Understand, for example, how recursive doubling can be used for all-to-one reduction, and its dual, one-to-all broadcast, in log(p) steps. The same applies to all-to-all broadcast and all-to-all reduction. Be aware of the synonyms for these operations in the jargon associated with different areas; for example, all-to-all broadcast may be referred to as "gossip" or "total exchange". Recognize that all-to-all broadcast/reduction are synchronizing operations in a distributed (event-driven) environment. | DSA | |
Comprehend Parallel Prefix (Scan): Understand the structure of at least one simple parallel prefix algorithm, for example, on a PRAM-type model. One could consider recursive or iterative approaches (such as those of Ladner-Fischer, Kogge-Stone, Brent-Kung) | DSA | |
Know Termination detection: Observe that, unlike the sequential case, processes in parallel and distributed algorithms may not know when the problem has been solved, or even if their part of the problem is solved; so termination has to be addressed explicitly. In some cases (such as reduction, tree algorithms, divide and conquer) it may be possible for a process to terminate on the basis of local information (for example, a node has passed its information to its parent in a reduction tree). In other cases, a global check may be necessary. | DSA | |
Know Sorting: Observe several sorting algorithms for varied platforms, together with analyses. Parallel merge sort is the simplest example, but equally simple alternatives for rings and meshes might be covered also; more sophisticated algorithms might be covered in more advanced courses | CS2/DSA | NondeterministicSorting; PipelineSort (6 total) |
Know Search: With the help of BFS- or DFS-like parallel search in a tree, graph or solution space, understand speedup anomalies and the fact that certain algorithms don't lend themselves to parallelization without modifying the semantics of the original problem. | DSA | ParallelGarbageCollection (1 total) |
Know Algorithms for streams: Comprehend the notion of efficient algorithms (e.g., Bloom filters, heavy hitters) and structures (e.g., distributed hash tables) for stream data, and the difficulty of dealing with limited space. | CS1/DSA | |
Apply Deeper Algorithmic Experience: Experience through class instruction and assignment/project the design, analysis, and implementation aspects of at least one parallel or distributed algorithm of choice in detail. Master PDC algorithmic concepts through a detailed exploration, including recognizing how algorithm design reflects the structure of the computational problem(s) and the PDC model/environment. Possible computational problems to explore include matrix product, map reduce, sorting, search, convolution, or a graph algorithm of your choice. | CS2/DSA | SieveOfErastothenes; MatrixAddition (2 total) |
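The "Comprehend Speedup (Scalability)" row above attributes the upper limit on speedup for a fixed problem size to Amdahl's law. A short worked form of the standard bound (the serial fraction of 0.1 is an illustrative number, not from the source):

```latex
% Amdahl's law: serial fraction s, p processors
S(p) \;=\; \frac{1}{\,s + \frac{1-s}{p}\,}
\qquad\text{e.g., } s = 0.1:\quad
S(8) \;=\; \frac{1}{0.1 + 0.9/8} \;\approx\; 4.7,
\qquad
\lim_{p \to \infty} S(p) \;=\; \frac{1}{s} \;=\; 10 .
```

Even with unlimited processors, a 10% serial portion caps the speedup at 10x, which is why the rows above stress the sequential fraction of a program rather than raw processor count.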