Task parallelism (also called "coarse-grained parallelism") is a
powerful way to carry out multiple independent calculations in
parallel, for example running Monte Carlo simulations or "un-rolling"
serial FOR loops. Such calculations are sometimes referred to as
"embarrassingly parallel." Star-P's "ppeval" construct is a function
call that executes your code in parallel on the server.
[Figure: serial processing on one processor]
[Figure: parallel processing on four processors]
With Star-P, there is no need to worry about the details of parallelization:
Star-P takes care of distributing the data across the processors,
executing the computations, and gathering the results at the end
of a computation. "ppeval" is reminiscent of MATLAB®'s "feval", except
that the specified function and all required data are handled on
the parallel server. Consider the example of processing a stack
of MRI brain scans, where we need to run an SVD computation
on each image. In serial fashion, we might write a FOR loop to
process one image at a time (the last 3 lines in the code below).
% Load 12 MRI images, each 256x256 pixels, from file.
% The resultant MATLAB variable MRIdat will be 256-by-256-by-12 in size.
load MRIdata
% get size of the image cube
[xpixel,ypixel,nimage] = size(MRIdat);
% pre-allocate output variables to improve MATLAB® performance
MRI_U = zeros(xpixel,ypixel,nimage);
MRI_S = zeros(xpixel,ypixel,nimage);
MRI_V = zeros(xpixel,ypixel,nimage);
% Loop over the individual slices
for i = 1:nimage
[MRI_U(:,:,i),MRI_S(:,:,i),MRI_V(:,:,i)] = svd(MRIdat(:,:,i));
end
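To try the serial loop without the MRIdata file, the sketch below swaps in a small random image stack and checks that each slice is recovered from its SVD factors. The variable `stack` and its 16x16x4 size are assumptions made only for illustration; they are not part of the original data set.

```matlab
% Hypothetical stand-in for MRIdat: 4 random 16x16 "images".
stack = rand(16, 16, 4);
[xpixel, ypixel, nimage] = size(stack);

% Pre-allocate the outputs (U is xpixel-by-xpixel, V is ypixel-by-ypixel).
U = zeros(xpixel, xpixel, nimage);
S = zeros(xpixel, ypixel, nimage);
V = zeros(ypixel, ypixel, nimage);

maxErr = 0;
for i = 1:nimage
    [U(:,:,i), S(:,:,i), V(:,:,i)] = svd(stack(:,:,i));
    % Each slice should equal U*S*V' up to rounding error.
    recon = U(:,:,i) * S(:,:,i) * V(:,:,i)';
    maxErr = max(maxErr, norm(recon - stack(:,:,i), 'fro'));
end
fprintf('largest reconstruction error: %g\n', maxErr);
```

No iteration reads a result from another iteration, which is the property that makes the loop a candidate for task parallelism.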
Because the operations are independent, we can carry them out
in parallel with the ppeval construct, as in the last line of the
code below:
% Load 12 MRI images, each 256x256 pixels, from file.
% The resultant MATLAB variable MRIdat will be 256-by-256-by-12p in size.
ppload MRIdata
% For the ppeval algorithm it is NOT necessary to
% 1) know the size of the image cube
% 2) pre-allocate the memory for the output
% for successful completion of the algorithm.
% Loop over the individual slices in task parallel
[MRI_U,MRI_S,MRI_V] = ppeval('svd',MRIdat);
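Judging from the example, the call applies the named function to each 256x256 slice of the image cube and stacks the per-slice outputs. The plain-MATLAB sketch below imitates that behavior serially, without Star-P. It is an illustration under that assumption and says nothing about how Star-P actually distributes the work; the names `f`, `data`, `Uc`, `Sc`, and `Vc` are hypothetical.

```matlab
f    = @svd;               % function to apply to every slice
data = rand(8, 8, 5);      % hypothetical stand-in for MRIdat
n    = size(data, 3);

% Collect per-slice results in cell arrays, so the output sizes
% do not have to be known in advance.
Uc = cell(1, n);
Sc = cell(1, n);
Vc = cell(1, n);
for k = 1:n                % each iteration is independent of the others
    [Uc{k}, Sc{k}, Vc{k}] = f(data(:,:,k));
end

% Stack the per-slice results along the third dimension.
U = cat(3, Uc{:});
S = cat(3, Sc{:});
V = cat(3, Vc{:});
```

Because each pass through the loop touches only its own slice, the iterations could be handed to different processors in any order, which is the work that ppeval takes over.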
Now, let’s consider a data set with 240 images, each 256x256
pixels. On a desktop PC with a 2.8 GHz AMD Opteron CPU and 3 GB
of RAM, the serial computation takes 59 seconds. When carried out
in parallel with Star-P on a server with eight processors (2.4 GHz
Opteron) and 32 GB of RAM, the computation takes less than 7 seconds.
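The timings above imply a speedup that is easy to work out. The short sketch below does the arithmetic, using 7 seconds as an upper bound on the parallel time, so the result is a lower bound on the speedup. Note that the two runs used different machines (clock speed and memory differ), so the ratio is a rough comparison rather than a clean measure of parallel efficiency.

```matlab
t_serial   = 59;   % seconds on the desktop PC (from the text)
t_parallel = 7;    % seconds, upper bound reported for the 8-processor server
nproc      = 8;    % processors on the server

speedup  = t_serial / t_parallel;   % at least about 8.4
per_proc = speedup / nproc;         % about 1.05 per processor

fprintf('speedup of at least %.1fx (%.2f per processor)\n', speedup, per_proc);
```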
Finally, it should be noted that Star-P’s powerful abstraction
does not require you to know how many processors are available.
Star-P handles the data distribution, and the same MATLAB® code will
run successfully whether there are 8, 32, or 512 processors available.