Dive Into Systems

14.9. Exercises

Implement the entirety of the scalar_multiply program. Time your code using the gettimeofday() function and 100 millions elements. How does the time of the program vary as you increase the number of threads? What if you increase the number of elements to 1 billion? 2 billion?
Improve the original scalar multiply threaded function by placing all the arguments into a struct and passing it through main. Time the performance of this version of the code. Is there any difference? (solution)
Improve the scalar_multiply threaded function by implementing a better load balancing procedure. In other words, implement the load balancing procedure in the note above.
Using what you have learned, try implementing a program that performs matrix vector multiplication. In matrix vector multiplication, each row in the matrix is multiplied by some vector of elements.

Implement a parallel version of the Step 2 of the CountSort algorithm. Time your performance.
Try combining Step 1 and Step 2 of the CountSort program into a single program. To do this, you will need to add another cycle of pthread_create() and pthread_join() to your program.
Time the total performance of the new CountSort program.

OpenMP: The writeElems() function makes the assumption that the user only inputs a number of threads less that MAX. Is there a way to rewrite this code so that it will work, regardless of the number of threads?