GPU Programming 2: Parallel Primer

Module 3 of the Introduction To Concurrent Programming course is a primer on parallel programming in Python and C++. How useful this module turns out to be will depend on how much the next two modules build on it.

What I liked most was learning about barriers, which I hadn’t come across before. They look like a fun way to coordinate a large number of threads at once. However, since you can’t change the number of parties once a barrier is created (even when nothing is waiting), it feels like they’ll have limited use.
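For anyone else who hasn’t met barriers, here is a minimal sketch of the idea using Python’s threading.Barrier. The helper run_barrier_demo and its variable names are made up for illustration and are not from the course material. Each worker records its arrival, waits at the barrier, and then counts how many arrivals it can see.

```python
import threading


def run_barrier_demo(parties: int) -> list:
    """Hypothetical demo: returns how many arrivals each worker saw after the barrier."""
    # The party count is fixed here, at construction time.
    barrier = threading.Barrier(parties)
    arrived = []
    arrived_lock = threading.Lock()
    seen_after_barrier = [0] * parties

    def worker(worker_id: int) -> None:
        # Phase 1: record that this worker has arrived.
        with arrived_lock:
            arrived.append(worker_id)
        # Block until all `parties` workers have called wait().
        barrier.wait()
        # Phase 2: every worker has passed phase 1 by now.
        with arrived_lock:
            seen_after_barrier[worker_id] = len(arrived)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(parties)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return seen_after_barrier


if __name__ == "__main__":
    print(run_barrier_demo(4))
```

Because no worker gets past barrier.wait() until all of them have arrived, every worker should see the full arrival count in phase 2. The party count passed to the constructor cannot be adjusted afterwards, which is the limitation mentioned above.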

The two assignments were pitched as “you have N ticketed jobs, make sure that each job runs in the correct order.” Given the classes the lesson spotlighted and the way the assignments were set up, the solutions the lesson plan pushed you towards either had issues with livelocking (continuously re-trying) or had space requirements that scaled with the number of threads. Both are among the biggest weaknesses of using futures, for example. So instead I used the lesson as a jumping-off point to find the best solution: the one that is most space efficient, minimizes livelocking, and is most maintainable. Here’s a self-contained Python approach I came up with:
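A minimal sketch of a condition-variable solution along these lines is below. The ManagedThread and ThreadManager names follow the classes mentioned in this post, but the method names, the ticket counter, and the demo helper run_ticketed_jobs are assumptions made for illustration, so treat it as one possible shape of the approach rather than a definitive listing.

```python
import threading
from typing import Callable


class ThreadManager:
    """Shared state: the ticket currently allowed to run, guarded by one Condition."""

    def __init__(self) -> None:
        self._condition = threading.Condition()
        self._current_ticket = 0

    def run_in_turn(self, ticket: int, job: Callable[[], None]) -> None:
        with self._condition:
            # Sleep until it is this ticket's turn. wait_for() re-checks the
            # predicate each time the thread is notified, instead of spinning.
            self._condition.wait_for(lambda: self._current_ticket == ticket)
            job()
            # Hand over to the next ticket and wake the waiting threads.
            self._current_ticket += 1
            self._condition.notify_all()


class ManagedThread(threading.Thread):
    """A thread that runs its job only when its ticket comes up."""

    def __init__(self, manager: ThreadManager, ticket: int, job: Callable[[], None]) -> None:
        super().__init__()
        self._manager = manager
        self._ticket = ticket
        self._job = job

    def run(self) -> None:
        self._manager.run_in_turn(self._ticket, self._job)


def run_ticketed_jobs(n: int) -> list:
    """Hypothetical demo: start n threads out of order, return the order jobs ran in."""
    manager = ThreadManager()
    order = []  # exists only so the demo can observe the ordering
    threads = [
        ManagedThread(manager, ticket, lambda ticket=ticket: order.append(ticket))
        for ticket in range(n)
    ]
    # Start in reverse so the ordering comes from the tickets, not start order.
    for thread in reversed(threads):
        thread.start()
    for thread in threads:
        thread.join()
    return order


if __name__ == "__main__":
    print(run_ticketed_jobs(8))
```

In this sketch the job runs while the condition’s lock is held, which keeps the code short and is acceptable here because the jobs have to run one at a time anyway.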

The lesson didn’t discuss condition variables at all, but this felt like the best approach. It is also easy to convert to C++, since the std::condition_variable and std::mutex APIs are almost the same as their Python counterparts.

Space Efficiency: O(1). The goal was to create new threads, so besides those threads we only create a small group of variables that we continuously modify.
Livelocking: Eliminated. Each thread does get woken up whenever we increment (because of notify_all()), but each thread is guaranteed to check only once per increment.
Maintainability: I can easily see creating an abstract method in ManagedThread to represent the actual task and adding an addThread() function to the ThreadManager so we can add new threads over time. At that point I’d actually change ManagedThread to some type of task class, to steer people away from submitting long-lived loops, and add a timeout to the wait_for() call. Other engineers would then not have to modify the core classes at all.
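The maintainability idea could look roughly like the sketch below. Everything in it is hypothetical: Task, TicketedTaskManager, add_task(), AppendTask, run_task_demo, and the 5 second default timeout are names and values chosen for illustration. It assumes tickets are handed out in submission order and that a task which times out while waiting simply gives up.

```python
import threading
from abc import ABC, abstractmethod


class Task(ABC):
    """Hypothetical unit of work; other engineers subclass this and nothing else."""

    @abstractmethod
    def run(self) -> None:
        ...


class TicketedTaskManager:
    def __init__(self, wait_timeout: float = 5.0) -> None:
        self._condition = threading.Condition()
        self._current_ticket = 0
        self._next_ticket = 0
        self._wait_timeout = wait_timeout
        self.timed_out_count = 0

    def add_task(self, task: Task) -> threading.Thread:
        # Tickets are assigned in submission order (an assumption of this sketch).
        with self._condition:
            ticket = self._next_ticket
            self._next_ticket += 1
        thread = threading.Thread(target=self._run_in_turn, args=(ticket, task))
        thread.start()
        return thread

    def _run_in_turn(self, ticket: int, task: Task) -> None:
        with self._condition:
            # wait_for() returns False if the timeout expires before our turn.
            my_turn = self._condition.wait_for(
                lambda: self._current_ticket == ticket,
                timeout=self._wait_timeout,
            )
            if not my_turn:
                self.timed_out_count += 1
                return
        try:
            # Run outside the lock so waiting threads can still time out.
            task.run()
        finally:
            with self._condition:
                self._current_ticket += 1
                self._condition.notify_all()


class AppendTask(Task):
    """Example task that records its label in a shared list."""

    def __init__(self, label: int, sink: list) -> None:
        self._label = label
        self._sink = sink

    def run(self) -> None:
        self._sink.append(self._label)


def run_task_demo(n: int) -> list:
    manager = TicketedTaskManager()
    sink = []
    threads = [manager.add_task(AppendTask(i, sink)) for i in range(n)]
    for thread in threads:
        thread.join()
    return sink
```

One consequence of this particular design is that a task which gives up never advances the ticket counter, so every later task will time out as well. Whether to skip the stalled ticket instead is a policy choice the sketch leaves open.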

With that, I’ll be able to jump into the actual CUDA part of the course!
