Interactive Supercomputing


 

Assessing Star-P® for M

Star-P is a platform that has been built to bring the power of parallel computing to users of interactive desktop tools, such as MATLAB® from The MathWorks. Parallel processing can often speed up computations, and enable larger data sets to be processed. That said, it is important to note that not all algorithms and code structures lend themselves equally well to parallelization.

Although there are no hard and fast rules, this note offers brief guidelines on which kinds of algorithms and code structures parallelize well and which do not. There are two key opportunities for parallelization enabled by Star-P:
  1. Task-parallel computations: Task parallelism (sometimes called "coarse-grained" or "embarrassingly parallel") is a powerful way to carry out many independent calculations in parallel, such as running Monte Carlo simulations or "unrolling" serial FOR loops. For example, in a medical application involving image processing on multiple brain slices, Star-P can distribute the images across several processors and process them simultaneously.
  2. Data-parallel computations: Data parallelism (sometimes called "global array syntax") is used for high-level matrix and vector operations on large data sets. This is critical for many of today's toughest computational problems. (For example, a 10 MB data file generated by airborne radar today may swell to a terabyte-sized data set generated by an array of satellites.)
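
The task-parallel case can be illustrated with a minimal sketch. It is written in Python rather than M, and it does not use Star-P syntax; the function names (`estimate_pi`, `run_tasks`) and the sample counts are hypothetical. Each Monte Carlo task owns its random generator and shares no state with the others, which is what makes the work "embarrassingly parallel". A thread pool stands in for a set of processors here, so the sketch shows the structure of the computation, not a speed-up.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def estimate_pi(seed, samples=20_000):
    """One independent Monte Carlo task: estimate pi from random points."""
    rng = random.Random(seed)  # private generator, so tasks share no state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            hits += 1
    return 4.0 * hits / samples


def run_tasks(seeds, workers=4):
    """Hand each seed to a worker; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(estimate_pi, seeds))


seeds = range(8)
parallel_estimates = run_tasks(seeds)
serial_estimates = [estimate_pi(s) for s in seeds]

# Because the tasks are independent, both schedules give identical results.
mean_estimate = sum(parallel_estimates) / len(parallel_estimates)
print("same results as serial run:", parallel_estimates == serial_estimates)
print("mean estimate of pi:", mean_estimate)
```

Since no task reads another task's result, the tasks can be handed out in any order and to any number of workers without changing the answer.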

At the end of this note, Interactive Supercomputing provides a number of tools that can be used to further assess how Star-P can add value to your work. Engage with Interactive Supercomputing or one of our Solution Partners with these tools to gain more insight into going parallel.

Task Parallel or Data Parallel?

Star-P supports both task- and data-parallel computations. To assess when and where to use them for a given model or application, it is best to think of the M-based script as a sequence of compute blocks (Figure 1), and to focus in particular on the compute-intensive sections. (Strategically placed tic/toc commands can be an effective means of quickly assessing where the compute time is being spent.)
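
The idea of timing each compute block can be sketched as follows. This is a Python analogue of bracketing each block with tic/toc, not M code; the three blocks (`load_data`, `heavy_compute`, `summarize`) are hypothetical stand-ins for sections of a real script.

```python
import time


def time_blocks(blocks):
    """Run each (name, function) block in order and record its wall-clock time."""
    timings = {}
    for name, func in blocks:
        start = time.perf_counter()  # plays the role of tic
        func()
        timings[name] = time.perf_counter() - start  # plays the role of toc
    return timings


# Hypothetical compute blocks standing in for sections of a script.
def load_data():
    return list(range(1000))


def heavy_compute():
    return sum(i * i for i in range(200_000))


def summarize():
    return "done"


timings = time_blocks([
    ("load", load_data),
    ("compute", heavy_compute),
    ("summarize", summarize),
])
slowest = max(timings, key=timings.get)

for name, seconds in timings.items():
    print(f"{name:10s} {seconds:.6f} s")
print("most expensive block:", slowest)
```

The block with the largest share of the run time is the first candidate to examine for parallel execution.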



Figure 1: Typical code structure of an M script

Now, for a given block, the question becomes: can parallel computing help, and if so, which approach? Figure 2 provides some guidelines for this.



Figure 2: Flow-chart for optimizing code structure
  • Serial Execution on the Desktop: Some operations are not worth parallelizing, because the overhead of distributing the data outweighs any gain. These computations are best left on the desktop PC:
    • Operations on small data sets, less than 100 KB in size;
    • Operations that take half a second or less to compute;
    • String operations.

  • Task-Parallel Execution: Moderate data sets, and operations that can be carried out independently of each other, lend themselves well to task parallelism:
    • "Moderate" data set size spans roughly 1 to 100 MB;
    • Desktop execution times should be on the order of seconds or more; computations that take minutes or hours may see the best speed-ups;
    • Code that is "vectorized" - that is, operations work on whole vectors or matrices, rather than individual elements - works best;
    • Task parallelism can be very effective for "unrolling" loops, but only when the code is separable in time, that is, when a given iteration does not depend on the previous one;
    • Star-P also enables users to plug existing codes, written in C, C++, or Fortran, into the Star-P Server, and to access them as functions called from the desktop MATLAB® environment, run on multiple processors in parallel.
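
The "separable in time" condition can be made concrete with a small sketch. It is written in Python rather than M, and the data and function names (`process_slice`, `simulate`) are hypothetical. The first loop treats each item on its own, so the order of iterations does not matter. The second loop carries state from one iteration to the next, so reordering it changes the result and it cannot be unrolled into independent tasks.

```python
def process_slice(pixels):
    """Independent work item: the result depends only on its own input."""
    return sum(p * p for p in pixels)


slices = [[1, 2, 3], [4, 5], [6], [7, 8, 9]]

# Separable loop: iteration i never reads the result of iteration i - 1,
# so the iterations can run in any order (or at the same time).
in_order = [process_slice(s) for s in slices]
reversed_order = [process_slice(s) for s in reversed(slices)][::-1]


def simulate(initial, inputs):
    """Non-separable loop: each step needs the state left by the previous one."""
    state = initial
    history = []
    for u in inputs:
        state = 0.5 * state + u  # reads the value written by the previous iteration
        history.append(state)
    return history


inputs = [1.0, 2.0, 3.0]
forward = simulate(0.0, inputs)
reordered = simulate(0.0, inputs[::-1])[::-1]

print("separable loop, order-independent:", in_order == reversed_order)
print("recurrence, order-independent:", forward == reordered)
```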

  • Data-Parallel Execution:
    • Operations that take minutes or hours, or that cannot be done on a desktop PC at all;
    • Operations on matrices of a million elements or more (e.g., 2,000 x 2,000);
    • Operations on data sets of tens or hundreds of MB, or more;
    • Code must be "vectorized" - that is, operations work on whole vectors or matrices, rather than individual elements.
For an assessment of how much Star-P can accelerate your M codes, simply email the .m files to: sales@interactivesupercomputing.com!
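
The guidelines above can be condensed into a rough decision helper. This is one possible reading of the thresholds in this note, sketched in Python; it is not the flow chart in Figure 2 and not part of Star-P. The function names are hypothetical. Three points are assumptions and are marked in the comments: "at least seconds" is read as one second or more, "tens of MB" is read as 10 MB or more, and the task-parallel test is tried before the data-parallel test where the two size ranges overlap.

```python
KB = 1_000
MB = 1_000_000


def matrix_bytes(rows, cols, bytes_per_element=8):
    """Memory footprint of a dense matrix (8 bytes per double-precision element)."""
    return rows * cols * bytes_per_element


def suggest_execution(data_bytes, desktop_seconds, *, string_ops=False,
                      independent_iterations=False, vectorized=False):
    """Return a rough suggestion based on the guidelines in this note."""
    # Desktop: small data, very short run time, or string handling.
    if string_ops or data_bytes < 100 * KB or desktop_seconds <= 0.5:
        return "serial on desktop"
    # Task-parallel: moderate data (1 to 100 MB) and independent iterations.
    # ASSUMPTION: "at least seconds" is read as 1 second or more.
    # ASSUMPTION: this test is tried first where the size ranges overlap.
    if (independent_iterations and MB <= data_bytes <= 100 * MB
            and desktop_seconds >= 1.0):
        return "task-parallel"
    # Data-parallel: vectorized code on large data sets.
    # ASSUMPTION: "tens of MB" is read as 10 MB or more.
    if vectorized and data_bytes >= 10 * MB:
        return "data-parallel"
    return "no clear recommendation"


print(suggest_execution(50 * KB, 10.0))
print(suggest_execution(20 * MB, 120.0, independent_iterations=True))
print(suggest_execution(matrix_bytes(2000, 2000), 300.0, vectorized=True))
```

A 2,000 x 2,000 matrix of double-precision values occupies 32 MB, which places it in the "tens of MB" range mentioned above.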

Tools for assessment

  • Take the application quiz and receive a FREE high-level technical assessment from Interactive Supercomputing's applications engineers, who have years of experience with VHLLs (very high-level languages) and parallel computing;
  • If you have task-parallel applications, run the Task Parallel Profiler to gain an understanding of the possibilities for speed-up on a 32-core cluster;
  • For a deeper assessment of how well Star-P applies to your specific M code, provide the results of our Function Analyzer to our application team for feedback;
  • To determine the compatibility of Star-P and your cluster, view our Supported Configurations information or run the compatibility script to verify server compatibility.

Summary

  • Leave the computation on the desktop when it involves a small data set or string operations, or when it takes half a second or less;
  • Do you know where in the code the compute-intensive operations are located?
  • Task-parallel:
    • Are data sets in the 1-100 MB range?
    • Do computations take minutes or hours on the desktop?
    • Do operations work on whole vectors or matrices, rather than on individual elements?
    • Are loops separable in time? (If not, task parallelism cannot be used.)
    • Is there existing serial C or Fortran code that can be plugged in and run?
  • Data parallel:
    • Vectorized operations on large data sets (megabytes, gigabytes, and beyond)
  • If you feel your application is well suited to parallelism, you can take Star-P for a test drive using Star-P On-Demand.